## Upstream Dumps

Open Library does monthly bulk dumps: <https://archive.org/details/ol_exports?sort=-publicdate>

Latest work dump: <https://openlibrary.org/data/ol_dump_works_latest.txt.gz>

TSV columns:

    type - type of record (/type/edition, /type/work etc.)
    key - unique key of the record. (/books/OL1M etc.)
    revision - revision number of the record
    last_modified - last modified timestamp
    JSON - the complete record in JSON format

    zcat ol_dump_works_latest.txt.gz | cut -f5 | head | jq .

We are going to want:

- title (with "prefix"?)
- authors
- subtitle
- year
- identifier (work? edition?)
- isbn-13 (if available)
- borrowable or not

## SOLR export

One time export: <https://archive.org/details/olsolr8-2021-04-12>

Start OL/SOLR, then export to jsonl:

```
$ time solrdump -rows 10000 -verbose -sort "key asc" \
    -server http://localhost:8983/solr/openlibrary | \
    jq -rc . | zstd -c9 -T0 > ol.jsonl.zst
```

* 35842305 docs

```
24438138 work
8425773 author
2978394 subject
```
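
Going back to the works dump under "Upstream Dumps": a rough sketch of how the five TSV columns could be parsed and a few of the wanted fields pulled out. The function names (`parse_dump_line`, `extract_work`, `iter_works`) are made up for illustration, and the JSON field names (`title`, `subtitle`, `authors[].author.key`) are assumptions about the record layout that should be checked against real dump lines first.

```python
import gzip
import json

# Column order as listed under "TSV columns".
COLUMNS = ("type", "key", "revision", "last_modified", "json")


def parse_dump_line(line):
    """Split one dump line into its five columns and decode the JSON record."""
    # Split at most four times so the JSON column stays in one piece.
    parts = line.rstrip("\n").split("\t", 4)
    if len(parts) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} columns, got {len(parts)}")
    row = dict(zip(COLUMNS, parts))
    row["revision"] = int(row["revision"])
    row["json"] = json.loads(row["json"])
    return row


def extract_work(row):
    """Pick out some wanted fields; missing fields come back as None or []."""
    record = row["json"]
    authors = []
    # Assumed shape: [{"author": {"key": "/authors/..."}}, ...]
    for entry in record.get("authors") or []:
        author = entry.get("author") if isinstance(entry, dict) else None
        if isinstance(author, dict) and "key" in author:
            authors.append(author["key"])
        elif isinstance(author, str):
            authors.append(author)
    return {
        "key": row["key"],
        "title": record.get("title"),
        "subtitle": record.get("subtitle"),
        "authors": authors,
    }


def iter_works(path):
    """Stream the gzipped dump without loading it all into memory."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield extract_work(parse_dump_line(line))
```

Something like `for work in iter_works("ol_dump_works_latest.txt.gz"): print(json.dumps(work))` would then produce one JSON object per work. Year, isbn-13 and borrowable status are left out of this sketch; they may need to come from edition records or another source.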