Diffstat (limited to 'python')
-rw-r--r-- | python/notes/openlibrary_works.md | 19 |
1 files changed, 19 insertions, 0 deletions
diff --git a/python/notes/openlibrary_works.md b/python/notes/openlibrary_works.md
index 25df527..8f3e751 100644
--- a/python/notes/openlibrary_works.md
+++ b/python/notes/openlibrary_works.md
@@ -25,3 +25,22 @@ We are going to want:
 - isbn-13 (if available)
 - borrowable or not
 
+## SOLR export
+
+One time export: <https://archive.org/details/olsolr8-2021-04-12>
+
+Start OL/SOLR, then export to jsonl:
+
+```
+$ time solrdump -rows 10000 -verbose -sort "key asc" \
+    -server http://localhost:8983/solr/openlibrary | \
+    jq -rc . | zstd -c9 -T0 > ol.jsonl.zst
+```
+
+* 35842305 docs
+
+```
+24438138 work
+8425773 author
+2978394 subject
+```
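
Note on the counts in the added section: the three per-type figures sum to the reported total (24438138 + 8425773 + 2978394 = 35842305). The diff does not show how they were produced, so the following is only a minimal sketch of one way to tally them from the JSONL export. It assumes each document carries a `type` field with values such as `work`, `author`, or `subject`; that field name and the sample records are assumptions, not something the commit confirms.

```python
import json
from collections import Counter


def count_types(lines, field="type"):
    """Tally JSONL documents by the value of `field`.

    `field` defaults to "type", which is an assumed name for the
    Solr field that distinguishes works, authors, and subjects.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the stream
        doc = json.loads(line)
        value = doc.get(field, "(missing)")
        if isinstance(value, list):
            # Multi-valued fields arrive as lists; join them so the
            # value can be used as a dictionary key.
            value = ",".join(str(v) for v in value)
        counts[value] += 1
    return counts


# Hypothetical sample records, only to illustrate the shape of the input.
sample = [
    '{"key": "/works/OL1W", "type": "work"}',
    '{"key": "/works/OL2W", "type": "work"}',
    '{"key": "/authors/OL1A", "type": "author"}',
    '{"key": "/subjects/example", "type": "subject"}',
]

counts = count_types(sample)
for value, n in counts.most_common():
    print(n, value)
```

To run it against the real export, one option would be to pass `sys.stdin` to `count_types` and feed it with `zstd -dc ol.jsonl.zst`, which streams the decompressed lines without writing the full file to disk.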