## Upstream Dumps

Open Library publishes monthly bulk dumps: <https://archive.org/details/ol_exports?sort=-publicdate>

Latest works dump: <https://openlibrary.org/data/ol_dump_works_latest.txt.gz>

Each line of the dump is a tab-separated record with five columns:

    type - type of record (/type/edition, /type/work etc.)
    key - unique key of the record. (/books/OL1M etc.)
    revision - revision number of the record
    last_modified - last modified timestamp
    JSON - the complete record in JSON format

To peek at the JSON column (the fifth) of the first few records:

    zcat ol_dump_works_latest.txt.gz | cut -f5 | head | jq .

We are going to want:

- title (with "prefix"?)
- authors
- subtitle
- year
- identifier (work? edition?)
- isbn-13 (if available)
- borrowable or not
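
As a rough sketch of how part of the list above could be pulled from the works dump, the following Python reads the five TSV columns and picks out title, subtitle, and author keys. The field names (`title`, `subtitle`, `authors[].author.key`) are assumptions about the work JSON and should be checked against real records. The function names are hypothetical. Year, ISBN-13, and borrowable status are left out on purpose, since this sketch does not assume they are present on work records.

```python
import gzip
import json


def parse_dump_line(line):
    """Split one dump line into its five tab-separated columns."""
    # maxsplit=4 keeps the JSON column intact even if it contained a tab.
    type_, key, revision, last_modified, raw = line.rstrip("\n").split("\t", 4)
    return {
        "type": type_,
        "key": key,
        "revision": int(revision),
        "last_modified": last_modified,
        "record": json.loads(raw),
    }


def extract_work_fields(row):
    """Pick out a few wanted fields from a parsed work row.

    Assumption: the JSON uses "title", "subtitle", and an "authors" list
    whose entries look like {"author": {"key": "/authors/..."}}.
    """
    record = row["record"]
    author_keys = []
    for entry in record.get("authors", []):
        author = entry.get("author") if isinstance(entry, dict) else None
        if isinstance(author, dict) and "key" in author:
            author_keys.append(author["key"])
        elif isinstance(author, str):
            # Tolerate entries where the author is a bare key string.
            author_keys.append(author)
    return {
        "key": row["key"],
        "title": record.get("title"),
        "subtitle": record.get("subtitle"),
        "author_keys": author_keys,
    }


def iter_works(path, limit=None):
    """Yield extracted fields for each line of a gzipped dump file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for i, line in enumerate(handle):
            if limit is not None and i >= limit:
                break
            yield extract_work_fields(parse_dump_line(line))
```

For example, `for work in iter_works("ol_dump_works_latest.txt.gz", limit=10): print(work)` would print the first ten works, mirroring the `head` in the shell pipeline.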

## SOLR export

One-time export: <https://archive.org/details/olsolr8-2021-04-12>

Start the Open Library Solr instance, then export all documents to JSON Lines with `solrdump`:

```
$ time solrdump -rows 10000 -verbose -sort "key asc" \
    -server http://localhost:8983/solr/openlibrary | \
    jq -rc . | zstd -c9 -T0 > ol.jsonl.zst
```

* 35842305 docs in total, broken down by type:

```
24438138 work
8425773 author
2978394 subject
```