Diffstat (limited to 'notes/ingest/2022-03_doaj.md')
-rw-r--r-- notes/ingest/2022-03_doaj.md | 12
1 files changed, 12 insertions, 0 deletions
diff --git a/notes/ingest/2022-03_doaj.md b/notes/ingest/2022-03_doaj.md
index bace480..9722459 100644
--- a/notes/ingest/2022-03_doaj.md
+++ b/notes/ingest/2022-03_doaj.md
@@ -264,3 +264,15 @@ Create seedlist:
Sent off and added to the `TARGETED-ARTICLE-CRAWL-2022-03` heritrix crawl; will
re-ingest when that completes (a week or two?).
+
+
+## Bulk Ingest
+
+After the `TARGETED-ARTICLE-CRAWL-2022-03` crawl wrapped up:
+
+    # 2022-03-22
+    cat /srv/sandcrawler/tasks/doaj_seedlist_2022-03-10.requests.json \
+        | rg -v "\\\\" \
+        | jq . -c \
+        | kafkacat -P -b wbgrp-svc263.us.archive.org -t sandcrawler-prod.ingest-file-requests-bulk -p -1
+
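The `rg -v "\\\\"` stage in the pipeline above drops every request line that contains a literal backslash before anything reaches Kafka. A minimal sketch of a dry-run check follows, for seeing how many lines that filter would keep and drop. It is not part of the original notes: the function name `count_backslash_filter` is hypothetical, and it uses `grep -F` in place of `rg` on the assumption that `grep` is more widely installed.

```shell
# Hypothetical helper (an assumption, not from the original notes):
# report how many lines of a newline-delimited JSON requests file
# the backslash filter would keep and drop.
count_backslash_filter() {
    # $1: path to the requests file
    # grep -c '' counts every line, including an unterminated final line
    total=$(grep -c '' "$1" || true)
    # -F treats the pattern as a fixed string, so '\' is one literal backslash;
    # "|| true" absorbs grep's non-zero exit status when nothing matches
    dropped=$(grep -cF '\' "$1" || true)
    echo "total=$total kept=$((total - dropped)) dropped=$dropped"
}
```

Running `count_backslash_filter /srv/sandcrawler/tasks/doaj_seedlist_2022-03-10.requests.json` before the `kafkacat` step would show the size of the batch about to be published, without sending anything to the topic.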