From 9e6ac281b73825c2ba79212f261b881b7f577a16 Mon Sep 17 00:00:00 2001
From: Bryan Newbold
Date: Tue, 7 Dec 2021 14:49:50 -0800
Subject: final notes on this SIM pipeline iteration

---
 notes/2021-12_sim_update.md | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/notes/2021-12_sim_update.md b/notes/2021-12_sim_update.md
index 691c916..7610d2e 100644
--- a/notes/2021-12_sim_update.md
+++ b/notes/2021-12_sim_update.md
@@ -316,4 +316,8 @@ Ok, start the dump again:
     | pv -l \
     | pigz \
     > /kubwa/scholar/2021-12-01/sim_intermediate.2021-12-01.json.gz
+    # 43.5M 34:20:45 [ 351 /s]
 
+Huh. Why is this still only 43 out of 75 million pages? Because of blank pages,
+or something else? Should add counters to indexing process, write out a
+per-issue log of counts and status. But good progress for now, I guess.
-- 
cgit v1.2.3
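
The "add counters, write out a per-issue log" idea from the note could look roughly like the following. This is only a sketch in Python under stated assumptions: `count_issue`, `format_log_line`, the status labels, and the idea that each page arrives as a string or `None` are all hypothetical and not taken from the actual indexing code.

```python
import json
from collections import Counter
from typing import Iterable, Optional


def count_issue(issue_id: str, pages: Iterable[Optional[str]]) -> dict:
    """Tally page outcomes for one issue (hypothetical helper).

    `pages` is assumed to yield the extracted text of each page,
    or None when no text could be fetched for that page.
    """
    counts: Counter = Counter()
    for text in pages:
        counts["total"] += 1
        if text is None:
            counts["missing"] += 1   # nothing fetched for this page
        elif not text.strip():
            counts["blank"] += 1     # text present but whitespace-only
        else:
            counts["indexed"] += 1   # page would go into the dump
    # Assumed status labels: "ok" if at least one page was indexed.
    status = "ok" if counts["indexed"] else "empty"
    return {"issue_id": issue_id, "status": status, "counts": dict(counts)}


def format_log_line(record: dict) -> str:
    """Serialize one issue record as a single JSON line."""
    return json.dumps(record, sort_keys=True)


if __name__ == "__main__":
    record = count_issue("example-issue", ["some text", "", None, "  \n"])
    print(format_log_line(record))
```

Writing one JSON line per issue would make it possible to sum the `blank` and `missing` counts afterwards and see whether blank pages really account for the gap between 43 and 75 million.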