author Bryan Newbold <bnewbold@archive.org> 2021-01-26 00:55:05 -0800
committer Bryan Newbold <bnewbold@archive.org> 2021-01-26 00:55:05 -0800
commit 15c7c9ea0f09b2e30dffa85cd79a9f761ea29607
tree 228075896b62d49c254bb588de1122948f8ccde4 /notes/indexing_pipeline.md
parent 2995379f558e8f5c2712bb17467586644d2d2fb4
sim indexing: new parallel fetch structure
Diffstat (limited to 'notes/indexing_pipeline.md')
-rw-r--r-- notes/indexing_pipeline.md 8
1 files changed, 8 insertions, 0 deletions
diff --git a/notes/indexing_pipeline.md b/notes/indexing_pipeline.md
index f891d27..ce4d687 100644
--- a/notes/indexing_pipeline.md
+++ b/notes/indexing_pipeline.md
@@ -46,3 +46,11 @@ Transform and index both into local elasticsearch:
=> 132635 docs in 2m18.787824205s at 955.667 docs/s with 4 workers
+## Iterated
+
+    # in pipenv shell
+    python -m fatcat_scholar.sim_pipeline run_print_issues \
+    | parallel -j8 --colsep "\t" python -m fatcat_scholar.sim_pipeline run_fetch_issue {1} {2} \
+    | pv -l \
+    | gzip \
+    > data/sim_intermediate.json.gz
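
A quick sanity check on the file this pipeline writes might look like the sketch below. It assumes the output is one JSON document per line, which the `pv -l` line counter and the `.json.gz` name suggest but the commit does not confirm. The helper name `count_json_lines` is hypothetical and is not part of `fatcat_scholar`.

```python
import gzip
import json


def count_json_lines(path: str) -> int:
    """Count non-empty lines in a gzipped file, checking that each parses as JSON.

    Assumption: the file holds one JSON document per line.
    """
    count = 0
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # ignore blank lines rather than failing on them
            try:
                json.loads(line)
            except json.JSONDecodeError as err:
                raise ValueError(
                    f"line {line_number} is not valid JSON: {err}"
                ) from err
            count += 1
    return count
```

Calling `count_json_lines("data/sim_intermediate.json.gz")` after the run should give a total comparable to the line count `pv -l` reported, if the one-document-per-line assumption holds. A `ValueError` would point at a truncated or interleaved record, which is worth ruling out when eight workers write to a shared pipe.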