## Upstream Dumps

Open Library does monthly bulk dumps:

Latest work dump:

TSV columns:

- type - type of record (/type/edition, /type/work etc.)
- key - unique key of the record (/books/OL1M etc.)
- revision - revision number of the record
- last_modified - last modified timestamp
- JSON - the complete record in JSON format

To look at the JSON column of the first few records:

    zcat ol_dump_works_latest.txt.gz | cut -f5 | head | jq .

We are going to want:

- title (with "prefix"?)
- authors
- subtitle
- year
- identifier (work? edition?)
- isbn-13 (if available)
- borrowable or not

## SOLR export

One time export: Start OL/SOLR, then export to jsonl:

```
$ time solrdump -rows 10000 -verbose -sort "key asc" \
    -server http://localhost:8983/solr/openlibrary | \
    jq -rc . | zstd -c9 -T0 > ol.jsonl.zst
```

* 35842305 docs

```
24438138 work
8425773 author
2978394 subject
```
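
Going back to the upstream dump format: a rough Python sketch of how the five TSV columns could be split and a few of the wanted fields pulled out of a work record. The function names `extract_work` and `iter_works` are made up for illustration, and the JSON field names (`title`, `subtitle`, `authors[].author.key`) are assumptions about what work records typically contain. Year, isbn-13 and borrowable status are not covered here, since they may only be available on editions or elsewhere.

```python
import gzip
import json
from typing import Iterator, Optional


def extract_work(line: str) -> Optional[dict]:
    """Parse one dump line; return selected fields for works, else None."""
    # Columns: type, key, revision, last_modified, JSON.
    cols = line.rstrip("\n").split("\t", 4)
    if len(cols) != 5:
        return None
    rec_type, key, revision, last_modified, payload = cols
    if rec_type != "/type/work":
        return None
    doc = json.loads(payload)

    # Assumption: authors look like [{"author": {"key": "/authors/OL1A"}}, ...];
    # a plain string under "author" is accepted too, to be defensive.
    author_keys = []
    for entry in doc.get("authors") or []:
        author = entry.get("author") if isinstance(entry, dict) else None
        if isinstance(author, dict):
            author = author.get("key")
        if isinstance(author, str):
            author_keys.append(author)

    return {
        "key": key,
        "revision": int(revision),
        "last_modified": last_modified,
        "title": doc.get("title"),
        "subtitle": doc.get("subtitle"),
        "authors": author_keys,
    }


def iter_works(path: str) -> Iterator[dict]:
    """Stream work records from a gzip-compressed dump file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            work = extract_work(line)
            if work is not None:
                yield work
```

Something like `for work in iter_works("ol_dump_works_latest.txt.gz"): print(json.dumps(work))` would then emit one JSON line per work, which could be piped through `zstd` in the same way as the SOLR export.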