author | Bryan Newbold <bnewbold@archive.org> | 2021-12-07 14:49:50 -0800 |
---|---|---|
committer | Bryan Newbold <bnewbold@archive.org> | 2021-12-07 14:49:50 -0800 |
commit | 9e6ac281b73825c2ba79212f261b881b7f577a16 (patch) | |
tree | 46c077341634d2aabe6a0f7693fc04bcf591696c /notes/2021-12_sim_update.md | |
parent | 7fa9e7dd83e41f3d331cb6b10df5f950f3d5ec8f (diff) | |
download | fatcat-scholar-9e6ac281b73825c2ba79212f261b881b7f577a16.tar.gz fatcat-scholar-9e6ac281b73825c2ba79212f261b881b7f577a16.zip |
final notes on this SIM pipeline iteration
Diffstat (limited to 'notes/2021-12_sim_update.md')
-rw-r--r-- | notes/2021-12_sim_update.md | 4 |
1 files changed, 4 insertions, 0 deletions
diff --git a/notes/2021-12_sim_update.md b/notes/2021-12_sim_update.md
index 691c916..7610d2e 100644
--- a/notes/2021-12_sim_update.md
+++ b/notes/2021-12_sim_update.md
@@ -316,4 +316,8 @@ Ok, start the dump again:
     | pv -l \
     | pigz \
     > /kubwa/scholar/2021-12-01/sim_intermediate.2021-12-01.json.gz
+    # 43.5M 34:20:45 [ 351 /s]
+Huh. Why is this still only 43 out of 75 million pages? Because of blank pages,
+or something else? Should add counters to indexing process, write out a
+per-issue log of counts and status. But good progress for now, I guess.
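
The counters idea from the note could look roughly like the sketch below. It is not the fatcat-scholar implementation: the class name `IssueCounters`, the status labels (`emitted`, `skipped-blank`), and the issue identifiers are all hypothetical, and the real pipeline may have other reasons for dropping pages. The point is only to tally one outcome per page, grouped by issue, and write one JSON line per issue.

```python
import collections
import io
import json


class IssueCounters:
    """Tally page outcomes per issue (hypothetical helper, not pipeline code)."""

    def __init__(self):
        # issue id -> Counter of status label -> number of pages
        self.counts = collections.defaultdict(collections.Counter)

    def record(self, issue_id, status):
        # status is a free-form label, e.g. "emitted" or "skipped-blank"
        self.counts[issue_id][status] += 1

    def write_log(self, out):
        # one JSON object per issue, sorted by issue id for stable output
        for issue_id in sorted(self.counts):
            row = {"issue": issue_id, **self.counts[issue_id]}
            out.write(json.dumps(row, sort_keys=True) + "\n")

    def totals(self):
        # sum the per-issue counters into one overall Counter
        total = collections.Counter()
        for counter in self.counts.values():
            total.update(counter)
        return total


def demo():
    # made-up pages: (issue id, page text); an empty string stands for a blank page
    pages = [("issue-a", "some text"), ("issue-a", ""), ("issue-b", "more text")]
    counters = IssueCounters()
    for issue_id, text in pages:
        counters.record(issue_id, "emitted" if text.strip() else "skipped-blank")
    buf = io.StringIO()
    counters.write_log(buf)
    return counters.totals(), buf.getvalue()
```

With something like this in place, the overall `emitted` total could be compared against the line count reported by `pv -l`, and the other labels would show how much of the gap to 75 million pages is blank pages versus something else.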