path: root/python/notes/openlibrary_works.md
author Martin Czygan <martin.czygan@gmail.com> 2021-04-26 19:43:34 +0200
committer Martin Czygan <martin.czygan@gmail.com> 2021-04-26 19:43:34 +0200
commit 79b95cef49b242500c5f0e967c11de7732bd9514
tree 361e28c5d0714b82074593166f13a1952ed22495 /python/notes/openlibrary_works.md
parent 0eb7cdcbf3986944064446849188533c091e5867
notes: on open library export
Diffstat (limited to 'python/notes/openlibrary_works.md')
-rw-r--r-- python/notes/openlibrary_works.md | 19
1 files changed, 19 insertions, 0 deletions
diff --git a/python/notes/openlibrary_works.md b/python/notes/openlibrary_works.md
index 25df527..8f3e751 100644
--- a/python/notes/openlibrary_works.md
+++ b/python/notes/openlibrary_works.md
@@ -25,3 +25,22 @@ We are going to want:
- isbn-13 (if available)
- borrowable or not
+## SOLR export
+
+One time export: <https://archive.org/details/olsolr8-2021-04-12>
+
+Start OL/SOLR, then export to jsonl:
+
+```
+$ time solrdump -rows 10000 -verbose -sort "key asc" \
+ -server http://localhost:8983/solr/openlibrary | \
+ jq -rc . | zstd -c9 -T0 > ol.jsonl.zst
+```
+
+* 35842305 docs
+
+```
+24438138 work
+8425773 author
+2978394 subject
+```