author | Martin Czygan <martin.czygan@gmail.com> | 2021-04-26 19:43:34 +0200 |
---|---|---|
committer | Martin Czygan <martin.czygan@gmail.com> | 2021-04-26 19:43:34 +0200 |
commit | 79b95cef49b242500c5f0e967c11de7732bd9514 (patch) | |
tree | 361e28c5d0714b82074593166f13a1952ed22495 /python/notes | |
parent | 0eb7cdcbf3986944064446849188533c091e5867 (diff) | |
notes: on open library export
Diffstat (limited to 'python/notes')
-rw-r--r-- | python/notes/openlibrary_works.md | 19 |
1 files changed, 19 insertions, 0 deletions
diff --git a/python/notes/openlibrary_works.md b/python/notes/openlibrary_works.md
index 25df527..8f3e751 100644
--- a/python/notes/openlibrary_works.md
+++ b/python/notes/openlibrary_works.md
@@ -25,3 +25,22 @@ We are going to want:
 - isbn-13 (if available)
 - borrowable or not

+## SOLR export
+
+One time export: <https://archive.org/details/olsolr8-2021-04-12>
+
+Start OL/SOLR, then export to jsonl:
+
+```
+$ time solrdump -rows 10000 -verbose -sort "key asc" \
+    -server http://localhost:8983/solr/openlibrary | \
+    jq -rc . | zstd -c9 -T0 > ol.jsonl.zst
+```
+
+* 35842305 docs
+
+```
+24438138 work
+8425773 author
+2978394 subject
+```
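
The three per-type counts in the note add up to the 35842305 total, so they look like a breakdown of the exported documents by type. A minimal sketch of how such a breakdown could be recomputed from the JSON lines export, assuming each document carries a `type` field with values such as `work`, `author`, or `subject` (the field name is an assumption, not stated in the notes). The helper `count_types` and the sample lines are hypothetical.

```python
import json
from collections import Counter


def count_types(lines, field="type"):
    """Count JSON lines documents by the value of `field`.

    `field="type"` is an assumption about the Solr schema; documents
    without the field are counted under "(missing)".
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        doc = json.loads(line)
        counts[str(doc.get(field, "(missing)"))] += 1
    return counts


# Made-up sample documents, not taken from the actual export.
sample = [
    '{"key": "/works/OL1W", "type": "work"}',
    '{"key": "/authors/OL1A", "type": "author"}',
    '{"key": "/works/OL2W", "type": "work"}',
]

# Print one "count type" line per type, most frequent first.
for doc_type, n in count_types(sample).most_common():
    print(n, doc_type)
```

To run this against the real export, one option is to decompress with `zstd -dc ol.jsonl.zst`, pipe the output into a script, and pass `sys.stdin` to `count_types` in place of `sample`. The function streams line by line, so memory use stays flat regardless of file size.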