Diffstat (limited to 'python/notes/openlibrary_works.md')
-rw-r--r-- | python/notes/openlibrary_works.md | 27 |
1 files changed, 27 insertions, 0 deletions
diff --git a/python/notes/openlibrary_works.md b/python/notes/openlibrary_works.md
new file mode 100644
index 0000000..25df527
--- /dev/null
+++ b/python/notes/openlibrary_works.md
@@ -0,0 +1,27 @@
+
+## Upstream Dumps
+
+Open Library does monthly bulk dumps: <https://archive.org/details/ol_exports?sort=-publicdate>
+
+Latest work dump: <https://openlibrary.org/data/ol_dump_works_latest.txt.gz>
+
+TSV columns:
+
+    type - type of record (/type/edition, /type/work etc.)
+    key - unique key of the record. (/books/OL1M etc.)
+    revision - revision number of the record
+    last_modified - last modified timestamp
+    JSON - the complete record in JSON format
+
+    zcat ol_dump_works_latest.txt.gz | cut -f5 | head | jq .
+
+We are going to want:
+
+- title (with "prefix"?)
+- authors
+- subtitle
+- year
+- identifier (work? edition?)
+- isbn-13 (if available)
+- borrowable or not
+
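Not part of the commit: a minimal Python sketch of reading the five TSV columns described in the notes and pulling out some of the wanted fields. The function names `parse_dump_lines` and `iter_works` are hypothetical, and the JSON field names (`title`, `subtitle`, `authors[].author.key`) are assumptions about the work record layout that should be checked against a real dump line. Year, ISBN-13 and borrowability are left out because they may not be present on work records at all.

```python
import gzip
import json
from typing import Iterable, Iterator


def parse_dump_lines(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per /type/work row from dump-formatted TSV lines."""
    for line in lines:
        # Five columns: type, key, revision, last_modified, JSON.
        parts = line.rstrip("\n").split("\t", 4)
        if len(parts) != 5:
            continue  # skip rows that do not match the documented layout
        record_type, key, revision, last_modified, raw_json = parts
        if record_type != "/type/work":
            continue
        record = json.loads(raw_json)

        # Assumption: each authors entry looks like {"author": {"key": ...}}.
        author_keys = []
        for entry in record.get("authors") or []:
            author = entry.get("author") if isinstance(entry, dict) else None
            if isinstance(author, dict) and "key" in author:
                author_keys.append(author["key"])
            elif isinstance(author, str):
                author_keys.append(author)

        yield {
            "key": key,
            "revision": int(revision),
            "last_modified": last_modified,
            "title": record.get("title"),        # assumed field name
            "subtitle": record.get("subtitle"),  # assumed field name
            "author_keys": author_keys,
        }


def iter_works(path: str) -> Iterator[dict]:
    """Stream works from a gzipped dump file without loading it into memory."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        yield from parse_dump_lines(handle)
```

With the dump downloaded locally, `next(iter_works("ol_dump_works_latest.txt.gz"))` would return the first work as a dict. Splitting the parser from the file handling keeps the column logic testable on a few sample lines.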