# OpenLibrary

We use the [official dumps](https://openlibrary.org/developers/dumps) and
convert and merge them for reference processing.

> 2021-04-21

OL SOLR dump: https://archive.org/details/olsolr8-2021-04-12

Running with docker on aitio fails with:

```
docker: Error response from daemon: failed to update store for object type *libnetwork.endpointCnt: Key not found in store.
ERRO[0000] error waiting for container: context canceled
```

Maybe we do not need to extract the volume up front?

```shell
$ sudo docker run --name ol-solr -e SOLR_JAVA_MEM='-Xms3g -Xmx3g' \
    -v backup/var/lib/solr/data/openlibrary/data/openlibrary:/var/solr/data/openlibrary \
    -v $(pwd)/conf/solr8-ol:/opt/solr/server/solr/configsets/olconfig:ro \
    -p 8983:8983 solr:8.8.2 solr-precreate openlibrary \
    /opt/solr/server/solr/configsets/olconfig
```
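
To see whether the container actually came up, one option is to poll Solr's
CoreAdmin `STATUS` endpoint until it answers. A minimal sketch, assuming Solr is
reachable on `localhost:8983` as in the port mapping above; the helper name
`retry` and the retry counts are made up for illustration.

```shell
# retry TRIES DELAY CMD...: run CMD until it succeeds, at most TRIES times
# (at least once), sleeping DELAY seconds between attempts.
# Hypothetical helper, not part of docker or Solr.
retry() {
    tries=$1
    delay=$2
    shift 2
    i=1
    while :; do
        if "$@"; then
            return 0
        fi
        if [ "$i" -ge "$tries" ]; then
            return 1
        fi
        i=$((i + 1))
        sleep "$delay"
    done
}

# Example (assumes the ol-solr container is running and curl is installed):
# retry 30 2 curl -sf "http://localhost:8983/solr/admin/cores?action=STATUS&core=openlibrary"
```

The function only reports success or failure of the wrapped command; it does not
inspect the response body.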

Create a named volume `ol-solr-data` and extract the dump into it:

```shell
$ wget -c https://archive.org/download/olsolr8-2021-04-12/olsolr8-2021-04-12.tar.gz
$ docker run -v ol-solr-data:/var/lib/solr/data/openlibrary -v $(pwd):/backup ubuntu:xenial tar xzf /backup/olsolr8-2021-04-12.tar.gz
```

The plan is to export the data as JSON via [solrdump](https://github.com/ubleipzig/solrdump/) once the server is running.
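
Before setting up solrdump, a quick way to confirm that documents come back as
JSON is Solr's standard `select` handler. A sketch, assuming the core is named
`openlibrary` (as in the `solr-precreate` call) and Solr listens on
`localhost:8983`; `solr_select_url` is a hypothetical helper, not part of
solrdump.

```shell
# solr_select_url BASE CORE START [ROWS]
# Print a select URL that matches all documents and asks for JSON output.
# ROWS defaults to 10. Hypothetical helper for illustration only.
solr_select_url() {
    base=$1
    core=$2
    start=$3
    rows=${4:-10}
    printf '%s/solr/%s/select?q=*:*&wt=json&rows=%s&start=%s\n' \
        "$base" "$core" "$rows" "$start"
}

# Example (assumes the server is up and curl is installed):
# curl -s "$(solr_select_url http://localhost:8983 openlibrary 0 5)"
```

Paging with `start` and `rows` is fine for a spot check, but it gets slow deep
into a large index, so it is not meant as a replacement for a full export.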