author | Bryan Newbold <bnewbold@archive.org> | 2020-05-28 14:28:08 -0700 |
---|---|---|
committer | Bryan Newbold <bnewbold@archive.org> | 2020-05-28 14:28:08 -0700 |
commit | b839dcb734805397b8bf611eb77942b9555f4915 (patch) | |
tree | 1f2046d15216c65e6c3949ef25804eb9297c395e | |
parent | 46c422e4b6d8e6a36ea65af19afd124ab42e457c (diff) | |
ingest: OAI-PMH count table
-rw-r--r-- | notes/ingest/2020-05_oai_pmh.md | 24 |
1 files changed, 24 insertions, 0 deletions
diff --git a/notes/ingest/2020-05_oai_pmh.md b/notes/ingest/2020-05_oai_pmh.md
index 37e7dfc..2f20415 100644
--- a/notes/ingest/2020-05_oai_pmh.md
+++ b/notes/ingest/2020-05_oai_pmh.md
@@ -142,6 +142,30 @@ but doesn't matter because fatcat wasn't importing these anyways):
     ORDER BY COUNT DESC
     LIMIT 20;
 
+             status          |  count
+    -------------------------+----------
+     no-capture              | 42565875
+     success                 |  5227609
+     no-pdf-link             |  2156341
+     redirect-loop           |   559721
+     cdx-error               |   260446
+     wrong-mimetype          |   148871
+     terminal-bad-status     |   109725
+     link-loop               |    92792
+     null-body               |    30688
+                             |    15287
+     petabox-error           |    11109
+     wayback-error           |     6261
+     skip-url-blocklist      |      184
+     gateway-timeout         |       86
+     bad-gzip-encoding       |       25
+     invalid-host-resolution |       24
+     spn2-cdx-lookup-failure |       22
+     bad-redirect            |       15
+     spn2-error              |        4
+     spn2-error:job-failed   |        2
+    (20 rows)
+
 Dump again for crawling:
 
     COPY (
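To put the added count table in proportion, the per-status counts can be converted to shares of the rows shown. The following is a minimal Python sketch, not part of the commit: the numbers are copied from the hunk above, the names `STATUS_COUNTS` and `status_shares` are hypothetical, and the percentages are relative to the 20 rows returned by the `LIMIT 20` query rather than to the full table.

```python
# Counts copied from the status table added in this commit.
# The empty-string key stands for the row whose status column is blank.
STATUS_COUNTS = {
    "no-capture": 42565875,
    "success": 5227609,
    "no-pdf-link": 2156341,
    "redirect-loop": 559721,
    "cdx-error": 260446,
    "wrong-mimetype": 148871,
    "terminal-bad-status": 109725,
    "link-loop": 92792,
    "null-body": 30688,
    "": 15287,
    "petabox-error": 11109,
    "wayback-error": 6261,
    "skip-url-blocklist": 184,
    "gateway-timeout": 86,
    "bad-gzip-encoding": 25,
    "invalid-host-resolution": 24,
    "spn2-cdx-lookup-failure": 22,
    "bad-redirect": 15,
    "spn2-error": 4,
    "spn2-error:job-failed": 2,
}


def status_shares(counts):
    """Return {status: percentage of the summed counts}, rounded to 2 places."""
    total = sum(counts.values())
    return {status: round(100 * n / total, 2) for status, n in counts.items()}


if __name__ == "__main__":
    print(f"rows covered by the top 20 statuses: {sum(STATUS_COUNTS.values())}")
    for status, pct in status_shares(STATUS_COUNTS).items():
        # Show the blank status as "(blank)" so the row stays readable.
        print(f"{status or '(blank)':<24} {pct:6.2f}%")
```

Keeping the blank status as its own key preserves that row instead of silently dropping it from the total.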