author | Bryan Newbold <bnewbold@robocracy.org> | 2018-09-28 11:58:28 -0700 |
---|---|---|
committer | Bryan Newbold <bnewbold@robocracy.org> | 2018-09-28 11:58:28 -0700 |
commit | b51bd93ce7d7ab758b00a938cc665a091d2e2995 | |
tree | fb5340c417f4f0dad7230356fac93974ff233b25 | |
parent | 9857b3347586608cf6d83dc096d5a2f1fc90ed62 | |
document need to set LC_ALL=C.UTF-8 for ES import
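
The commit message does not say why the locale matters. One plausible reason, assumed here and not confirmed by the commit, is that Python derives the default encoding of stdin/stdout from the locale, and older Python 3 releases fall back to ASCII under the plain C locale. The sketch below imitates that situation with an in-memory stream; the record and the helper name `write_with_encoding` are hypothetical and are not taken from `transform_release.py`.

```python
import io
import json

# Hypothetical record with a non-ASCII title (not taken from the real dump).
doc = {"ident": "example", "title": "\u00dcber die Elektrodynamik"}
line = json.dumps(doc, ensure_ascii=False) + "\n"


def write_with_encoding(text, encoding):
    """Write text through a text stream using the given encoding and
    return the resulting bytes, mimicking stdout under a given locale."""
    buf = io.BytesIO()
    stream = io.TextIOWrapper(buf, encoding=encoding, newline="\n")
    stream.write(text)
    stream.flush()
    return buf.getvalue()


# An ASCII stream (assumed to be what a plain C locale can produce on
# older Python 3) cannot encode the non-ASCII title.
try:
    write_with_encoding(line, "ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# A UTF-8 stream (what LC_ALL=C.UTF-8 selects) round-trips the record.
utf8_bytes = write_with_encoding(line, "utf-8")
```

Exporting `LC_ALL=C.UTF-8` before the pipeline is the approach the commit documents; wrapping the streams explicitly inside the script would be an alternative design, not something the commit does.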
-rw-r--r-- | extra/elasticsearch/README.md | 3 |
1 files changed, 2 insertions, 1 deletions
diff --git a/extra/elasticsearch/README.md b/extra/elasticsearch/README.md
index 1e9d58fa..c94c3109 100644
--- a/extra/elasticsearch/README.md
+++ b/extra/elasticsearch/README.md
@@ -43,6 +43,7 @@ Bulk insert from a file on disk:
 
 Or, in a bulk production live-stream conversion:
 
+    export LC_ALL=C.UTF-8
     time zcat /srv/fatcat/snapshots/fatcat_release_dump_expanded.json.gz | ./transform_release.py | esbulk -verbose -size 20000 -id ident -w 8 -index fatcat -type release
     # 2018/09/24 21:42:26 53028167 docs in 1h0m56.853006293s at 14501.039 docs/s with 8 workers
 
@@ -60,7 +61,7 @@ actual query string, and "size" field with the max results to return):
         "default_operator": "AND",
         "analyze_wildcard": true,
         "lenient": true,
-        "fields": ["title^3", "contrib_names^3", "container_title"]
+        "fields": ["title^5", "contrib_names^2", "container_title"]
       }
     },
     "size": 3
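
The second hunk changes the field boosts in the README's example search body: `title` goes from `^3` to `^5` and `contrib_names` from `^3` to `^2`. As a minimal sketch, the body could be assembled in Python as follows. The outer `query` / `query_string` nesting is an assumption based on the standard Elasticsearch query DSL, since the diff only shows the inner fragment; the function name `build_search_body` and the sample query string are hypothetical.

```python
import json


def build_search_body(query, size=3):
    """Build a query_string search body using the boosts from this commit.

    The outer "query"/"query_string" keys are assumed; only the inner
    options and "size" appear in the diff itself.
    """
    return {
        "query": {
            "query_string": {
                "query": query,
                "default_operator": "AND",
                "analyze_wildcard": True,
                "lenient": True,
                # A title match now outweighs a contributor-name match.
                "fields": ["title^5", "contrib_names^2", "container_title"],
            }
        },
        "size": size,
    }


body = build_search_body("coffee")  # hypothetical query string
print(json.dumps(body, indent=2))
```

With the earlier weights, `title` and `contrib_names` were boosted equally; the new values favor title matches over contributor names, and `container_title` keeps the default weight.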