-rw-r--r-- python/notes/openlibrary_works.md | 19
1 files changed, 19 insertions, 0 deletions
diff --git a/python/notes/openlibrary_works.md b/python/notes/openlibrary_works.md
index 25df527..8f3e751 100644
--- a/python/notes/openlibrary_works.md
+++ b/python/notes/openlibrary_works.md
@@ -25,3 +25,22 @@ We are going to want:
- isbn-13 (if available)
- borrowable or not
+## SOLR export
+
+One-time export: <https://archive.org/details/olsolr8-2021-04-12>
+
+Start OL/SOLR, then export to jsonl:
+
+```
+$ time solrdump -rows 10000 -verbose -sort "key asc" \
+ -server http://localhost:8983/solr/openlibrary | \
+ jq -rc . | zstd -c9 -T0 > ol.jsonl.zst
+```
+
+* 35842305 docs in total, broken down by document type:
+
+```
+24438138 work
+8425773 author
+2978394 subject
+```