author: Martin Czygan <martin.czygan@gmail.com> 2021-03-21 01:39:13 +0100
committer: Martin Czygan <martin.czygan@gmail.com> 2021-03-21 01:39:13 +0100
commit: 1eae78c37dcb605c369d977f4ad764694603641b
tree: 45c6f500c8d004f341939b2c027fba30147ed7cd /python/notes/openlibrary_works.md
parent: 6af4de12553fe1fdbb1e08342df0a84052e985cb
add ol and wikipedia notes
Diffstat (limited to 'python/notes/openlibrary_works.md')
-rw-r--r-- python/notes/openlibrary_works.md 27
1 files changed, 27 insertions, 0 deletions
diff --git a/python/notes/openlibrary_works.md b/python/notes/openlibrary_works.md
new file mode 100644
index 0000000..25df527
--- /dev/null
+++ b/python/notes/openlibrary_works.md
@@ -0,0 +1,27 @@
+
+## Upstream Dumps
+
+Open Library publishes monthly bulk dumps: <https://archive.org/details/ol_exports?sort=-publicdate>
+
+Latest work dump: <https://openlibrary.org/data/ol_dump_works_latest.txt.gz>
+
+TSV columns:
+
+    type - type of record (/type/edition, /type/work etc.)
+    key - unique key of the record (/books/OL1M etc.)
+    revision - revision number of the record
+    last_modified - last modified timestamp
+    JSON - the complete record in JSON format
+
+    zcat ol_dump_works_latest.txt.gz | cut -f5 | head | jq .
+
+We are going to want:
+
+- title (with "prefix"?)
+- authors
+- subtitle
+- year
+- identifier (work? edition?)
+- isbn-13 (if available)
+- borrowable or not
+
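
Note on the file added above: the five TSV columns and the list of wanted fields can be tied together with a small Python sketch. It is only an illustration, not part of the commit. The function names are hypothetical, and the JSON field names (`title`, `subtitle`, and `authors` as a list of objects carrying an `author` key reference) are assumptions about the work record layout that should be checked against a real dump. Year, ISBN-13 and borrowable status are left out because they are not assumed to be present on work records.

```python
import gzip
import json


def parse_dump_line(line):
    """Split one dump line into its five TSV columns and decode the JSON column."""
    # maxsplit=4 keeps the JSON column intact even if it were to contain tabs.
    type_, key, revision, last_modified, raw = line.rstrip("\n").split("\t", 4)
    return {
        "type": type_,
        "key": key,
        "revision": int(revision),
        "last_modified": last_modified,
        "record": json.loads(raw),
    }


def work_summary(row):
    """Pick out a few of the wanted fields (field names are assumptions)."""
    record = row["record"]
    author_keys = []
    for entry in record.get("authors", []):
        # Assumed shape: {"author": {"key": "/authors/..."}}; skip anything else.
        author = entry.get("author") if isinstance(entry, dict) else None
        if isinstance(author, dict) and "key" in author:
            author_keys.append(author["key"])
        elif isinstance(author, str):
            author_keys.append(author)
    return {
        "key": row["key"],
        "title": record.get("title"),
        "subtitle": record.get("subtitle"),
        "author_keys": author_keys,
    }


def iter_works(path):
    """Stream summaries from a gzipped dump file without loading it into memory."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield work_summary(parse_dump_line(line))
```

Calling `iter_works("ol_dump_works_latest.txt.gz")` and taking the first few items gives roughly the same view as the `zcat | cut -f5 | head | jq .` pipeline, reduced to the selected fields.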