author | Martin Czygan <martin.czygan@gmail.com> | 2021-03-21 01:39:13 +0100 |
---|---|---|
committer | Martin Czygan <martin.czygan@gmail.com> | 2021-03-21 01:39:13 +0100 |
commit | 1eae78c37dcb605c369d977f4ad764694603641b | |
tree | 45c6f500c8d004f341939b2c027fba30147ed7cd /python/notes/openlibrary_works.md | |
parent | 6af4de12553fe1fdbb1e08342df0a84052e985cb | |
add ol and wikipedia notes
Diffstat (limited to 'python/notes/openlibrary_works.md')
-rw-r--r-- | python/notes/openlibrary_works.md | 27 |
1 files changed, 27 insertions, 0 deletions
diff --git a/python/notes/openlibrary_works.md b/python/notes/openlibrary_works.md
new file mode 100644
index 0000000..25df527
--- /dev/null
+++ b/python/notes/openlibrary_works.md
@@ -0,0 +1,27 @@
+
+## Upstream Dumps
+
+Open Library does monthly bulk dumps: <https://archive.org/details/ol_exports?sort=-publicdate>
+
+Latest work dump: <https://openlibrary.org/data/ol_dump_works_latest.txt.gz>
+
+TSV columns:
+
+    type - type of record (/type/edition, /type/work etc.)
+    key - unique key of the record. (/books/OL1M etc.)
+    revision - revision number of the record
+    last_modified - last modified timestamp
+    JSON - the complete record in JSON format
+
+    zcat ol_dump_works_latest.txt.gz | cut -f5 | head | jq .
+
+We are going to want:
+
+- title (with "prefix"?)
+- authors
+- subtitle
+- year
+- identifier (work? edition?)
+- isbn-13 (if available)
+- borrowable or not
+
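
The five-column TSV layout described in the notes could be read in Python roughly as follows. This is a minimal sketch, not code from the repository: the function names `iter_works` and `iter_works_from_file` are hypothetical, and the shape of the `authors` entries (a list of objects with an `author` value that is either a key string or an object holding a `key`) is an assumption about the work records that should be checked against the actual dump. Only `title`, `subtitle`, and author keys are extracted; the sketch makes no assumption about where year, ISBN-13, or borrowable status are stored.

```python
import gzip
import json


def iter_works(lines):
    """Yield one small dict per dump row from an iterable of text lines.

    Each row is expected to have five tab-separated columns:
    type, key, revision, last_modified, JSON.
    """
    for line in lines:
        parts = line.rstrip("\n").split("\t", 4)
        if len(parts) != 5:
            continue  # skip rows that do not match the documented layout
        rec_type, key, revision, last_modified, raw = parts
        doc = json.loads(raw)

        # Assumption: "authors" is a list of {"author": ...} entries, where
        # the value is a key string or an object with a "key" field.
        author_keys = []
        for entry in doc.get("authors") or []:
            author = entry.get("author") if isinstance(entry, dict) else None
            if isinstance(author, dict):
                author = author.get("key")
            if author:
                author_keys.append(author)

        yield {
            "type": rec_type,
            "key": key,
            "revision": int(revision),
            "last_modified": last_modified,
            "title": doc.get("title"),
            "subtitle": doc.get("subtitle"),
            "authors": author_keys,
        }


def iter_works_from_file(path):
    """Stream records from a gzip-compressed dump without loading it fully."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        yield from iter_works(handle)
```

Calling `iter_works_from_file("ol_dump_works_latest.txt.gz")` on a local copy of the dump would stream records one at a time, which matters because the file is too large to parse comfortably in memory.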