author Bryan Newbold <bnewbold@archive.org> 2021-12-07 14:49:50 -0800
committer Bryan Newbold <bnewbold@archive.org> 2021-12-07 14:49:50 -0800
commit 9e6ac281b73825c2ba79212f261b881b7f577a16
tree 46c077341634d2aabe6a0f7693fc04bcf591696c
parent 7fa9e7dd83e41f3d331cb6b10df5f950f3d5ec8f
final notes on this SIM pipeline iteration
-rw-r--r-- notes/2021-12_sim_update.md | 4
1 files changed, 4 insertions, 0 deletions
diff --git a/notes/2021-12_sim_update.md b/notes/2021-12_sim_update.md
index 691c916..7610d2e 100644
--- a/notes/2021-12_sim_update.md
+++ b/notes/2021-12_sim_update.md
@@ -316,4 +316,8 @@ Ok, start the dump again:
| pv -l \
| pigz \
> /kubwa/scholar/2021-12-01/sim_intermediate.2021-12-01.json.gz
+ # 43.5M 34:20:45 [ 351 /s]
+Huh. Why is this still only 43 out of 75 million pages? Because of blank pages,
+or something else? Should add counters to indexing process, write out a
+per-issue log of counts and status. But good progress for now, I guess.
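
A minimal sketch of the per-issue counters the note proposes, not the actual fatcat-scholar indexing code. The function name `index_with_counts` and the `issue_id` and `text` fields are hypothetical, and it assumes pages arrive grouped by issue. It passes non-blank pages through and writes one JSON line of counts per issue, so a gap like 43.5M out of 75M pages could be attributed to blank pages or to something else.

```python
import json
from collections import Counter
from itertools import groupby
from typing import Dict, Iterable, Iterator, TextIO


def index_with_counts(pages: Iterable[Dict], log: TextIO) -> Iterator[Dict]:
    """Yield non-blank pages; write one JSON count line per issue to `log`.

    Assumes pages are already grouped by the (hypothetical) `issue_id` field.
    """
    for issue_id, issue_pages in groupby(pages, key=lambda p: p["issue_id"]):
        # Start every key at zero so each log line has the same fields.
        counts = Counter(total=0, blank=0, emitted=0)
        for page in issue_pages:
            counts["total"] += 1
            # Treat missing, None, or whitespace-only text as a blank page.
            if not (page.get("text") or "").strip():
                counts["blank"] += 1
                continue
            counts["emitted"] += 1
            yield page
        # One log line per issue, written once the issue's pages are consumed.
        log.write(json.dumps({"issue_id": issue_id, **counts}, sort_keys=True) + "\n")
```

Summing the `total`, `blank`, and `emitted` columns of the log after a run would show where pages drop out. Issues that fail entirely would need a separate status field, which this sketch does not cover.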