author Bryan Newbold <bnewbold@archive.org> 2020-10-11 21:42:24 -0700
committer Bryan Newbold <bnewbold@archive.org> 2020-10-11 21:42:24 -0700
commit ca75f7295c3f5383534b25069ec1e64e4064cef6
tree d8ed4368a4e8a23a4f0de16f54fe3506569b3d76 /notes/ingest
parent ba6a1dd509e5d4835159568c47b3bc234256d6af
download: sandcrawler-ca75f7295c3f5383534b25069ec1e64e4064cef6.tar.gz, sandcrawler-ca75f7295c3f5383534b25069ec1e64e4064cef6.zip
OAI-PMH ingest progress timestamps
Diffstat (limited to 'notes/ingest')
-rw-r--r-- notes/ingest/2020-05_oai_pmh.md 13
1 file changed, 13 insertions, 0 deletions
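The progress timestamps added in the diff below record how the OAI-PMH ingest backlog shrank over roughly four weeks. The samples are unevenly spaced (gaps range from under a day to about eight days), so they are easier to compare once normalized to rows per day. What follows is a minimal sketch in Python, assuming each count is the number of ingest requests still waiting to be processed (the notes do not say exactly what is counted). The helper names `parse`, `interval_rates`, and `overall_rate` are hypothetical and are not part of sandcrawler.

```python
from datetime import datetime

# Progress samples copied from the OAI-PMH ingest notes. Each count is
# ASSUMED to be the backlog still waiting to be processed at that moment.
# The final "done" entry (2020-09-03 14:30) carries no count, so it is omitted.
SAMPLES = [
    ("2020-08-05 14:02", 32_571_018),
    ("2020-08-06 13:49", 31_195_169),
    ("2020-08-07 10:11", 29_986_169),
    ("2020-08-10 10:43", 26_497_196),
    ("2020-08-12 11:02", 23_811_845),
    ("2020-08-17 13:34", 19_460_502),
    ("2020-08-20 09:49", 15_069_507),
    ("2020-08-25 09:56", 9_397_035),
    ("2020-09-02 15:02", 305_889),
]

TIME_FORMAT = "%Y-%m-%d %H:%M"
SECONDS_PER_DAY = 86_400


def parse(samples):
    """Convert (timestamp string, count) pairs into (datetime, count) pairs."""
    return [(datetime.strptime(ts, TIME_FORMAT), count) for ts, count in samples]


def interval_rates(samples):
    """Rows drained per day between each pair of consecutive samples."""
    points = parse(samples)
    rates = []
    for (t0, n0), (t1, n1) in zip(points, points[1:]):
        days = (t1 - t0).total_seconds() / SECONDS_PER_DAY
        rates.append((t0.date(), t1.date(), (n0 - n1) / days))
    return rates


def overall_rate(samples):
    """Average rows drained per day from the first sample to the last."""
    points = parse(samples)
    (t_first, n_first), (t_last, n_last) = points[0], points[-1]
    days = (t_last - t_first).total_seconds() / SECONDS_PER_DAY
    return (n_first - n_last) / days


if __name__ == "__main__":
    for start, end, rate in interval_rates(SAMPLES):
        print(f"{start} -> {end}: {rate:,.0f} rows/day")
    print(f"overall: {overall_rate(SAMPLES):,.0f} rows/day")
```

Run as a script, this prints one line per interval plus the overall average. With these samples the overall figure is roughly 1.15 million rows per day (about 32.3 million rows drained over 28 days and one hour).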
diff --git a/notes/ingest/2020-05_oai_pmh.md b/notes/ingest/2020-05_oai_pmh.md
index de9bfba..fe22c75 100644
--- a/notes/ingest/2020-05_oai_pmh.md
+++ b/notes/ingest/2020-05_oai_pmh.md
@@ -192,6 +192,19 @@ And went from about 42,826,313 rows to 31,773,874 unique URLs to crawl, so
expecting at least 11,052,439 `no-capture` ingest results (and should probably
filter for these or even delete from the ingest request table).
+Ingest progress:
+
+ 2020-08-05 14:02: 32,571,018
+ 2020-08-06 13:49: 31,195,169
+ 2020-08-07 10:11: 29,986,169
+ 2020-08-10 10:43: 26,497,196
+ 2020-08-12 11:02: 23,811,845
+ 2020-08-17 13:34: 19,460,502
+ 2020-08-20 09:49: 15,069,507
+ 2020-08-25 09:56: 9,397,035
+ 2020-09-02 15:02: 305,889 (72k longest queue)
+ 2020-09-03 14:30: done
+
## Post-ingest stats
SELECT ingest_file_result.status, COUNT(*)