From 79b95cef49b242500c5f0e967c11de7732bd9514 Mon Sep 17 00:00:00 2001
From: Martin Czygan
Date: Mon, 26 Apr 2021 19:43:34 +0200
Subject: notes: on open library export

---
 python/notes/openlibrary_works.md | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/python/notes/openlibrary_works.md b/python/notes/openlibrary_works.md
index 25df527..8f3e751 100644
--- a/python/notes/openlibrary_works.md
+++ b/python/notes/openlibrary_works.md
@@ -25,3 +25,22 @@ We are going to want:
 - isbn-13 (if available)
 - borrowable or not
 
+## SOLR export
+
+One-time export:
+
+Start OL/SOLR, then export to jsonl:
+
+```
+$ time solrdump -rows 10000 -verbose -sort "key asc" \
+    -server http://localhost:8983/solr/openlibrary | \
+    jq -rc . | zstd -c9 -T0 > ol.jsonl.zst
+```
+
+* 35842305 docs
+
+```
+24438138 work
+8425773 author
+2978394 subject
+```
--
cgit v1.2.3
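The per-type figures in the note add up to the total (24438138 + 8425773 + 2978394 = 35842305), so a per-type tally of the export is a cheap sanity check. The following is a minimal sketch, assuming each exported Solr document is one JSON object per line with a `type` field holding values such as `work`, `author` and `subject`. The script name `count_types.py` and the function `count_types` are hypothetical and not part of the original notes.

```python
# count_types.py: hypothetical helper, not part of the original notes.
# Assumption: each line of the export is one JSON object with a "type"
# field (e.g. "work", "author", "subject").
import json
import sys
from collections import Counter
from typing import Iterable, List


def count_types(lines: Iterable[str]) -> Counter:
    """Count documents per "type" value in JSON-lines input."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        doc = json.loads(line)
        # Documents lacking a "type" field are tallied under "<missing>".
        counts[doc.get("type", "<missing>")] += 1
    return counts


def main(paths: List[str]) -> None:
    counts: Counter = Counter()
    for path in paths:
        with open(path, encoding="utf-8") as f:
            counts.update(count_types(f))
    # Print "<count> <type>", largest first, then the grand total.
    for doc_type, n in counts.most_common():
        print(n, doc_type)
    print(sum(counts.values()), "docs")


# Only run when input paths are given, so the module can also be imported.
if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1:])
```

Since the export is zstd-compressed, it can be streamed rather than unpacked to disk, for example `zstd -dc ol.jsonl.zst | python3 count_types.py /dev/stdin` on Linux. If the `type` assumption holds, the output has the same "count type" layout as the table in the note.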