author:    Bryan Newbold <bnewbold@archive.org>  2022-05-03 17:12:48 -0700
committer: Bryan Newbold <bnewbold@archive.org>  2022-05-03 17:12:48 -0700
commit:    00ae74378413e87f230c88113ff8163a6f969d63 (patch)
tree:      16cdcbde7a002704e80f494b7fd13fc5c19dd695 /RUNBOOK.md
parent:    ef0421567dd67a248d0f92f32ad4e14ae0776920 (diff)
switch default kafka-broker host from wbgrp-svc263 to wbgrp-svc350
Diffstat (limited to 'RUNBOOK.md')
-rw-r--r--  RUNBOOK.md  4
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/RUNBOOK.md b/RUNBOOK.md
index 33d4711..6c4165d 100644
--- a/RUNBOOK.md
+++ b/RUNBOOK.md
@@ -23,7 +23,7 @@ Copy/transfer to a Kafka node; load a sample and then the whole output:
Older example; if this fails, need to re-run entire thing:
- cat /srv/sandcrawler/tasks/regrobid_cdx.split_*.json | pv -l | parallel -j40 --linebuffer --round-robin --pipe ./grobid_tool.py --kafka-env prod --kafka-hosts wbgrp-svc263.us.archive.org:9092,wbgrp-svc284.us.archive.org:9092,wbgrp-svc285.us.archive.org:9092 --kafka-mode --grobid-host http://localhost:8070 -j0 extract-json -
+ cat /srv/sandcrawler/tasks/regrobid_cdx.split_*.json | pv -l | parallel -j40 --linebuffer --round-robin --pipe ./grobid_tool.py --kafka-env prod --kafka-hosts wbgrp-svc350.us.archive.org:9092,wbgrp-svc284.us.archive.org:9092,wbgrp-svc285.us.archive.org:9092 --kafka-mode --grobid-host http://localhost:8070 -j0 extract-json -
TODO: is it possible to use job log with millions of `--pipe` inputs? That
would be more efficient in the event of failure.
@@ -35,7 +35,7 @@ Want to use GNU/Parallel in a mode that will do retries well:
fd .zip /srv/sandcrawler/tasks/crossref-pre-1909-scholarly-works/ | \
sort | \
parallel -j16 --progress --joblog extract_tasks.log --resume-failed \
- './grobid_tool.py --kafka-mode --kafka-env prod --kafka-hosts wbgrp-svc263.us.archive.org:9092,wbgrp-svc284.us.archive.org:9092,wbgrp-svc285.us.archive.org:9092 --grobid-host http://localhost:8070 extract-zipfile {}'
+ './grobid_tool.py --kafka-mode --kafka-env prod --kafka-hosts wbgrp-svc350.us.archive.org:9092,wbgrp-svc284.us.archive.org:9092,wbgrp-svc285.us.archive.org:9092 --grobid-host http://localhost:8070 extract-zipfile {}'
After starting, check that messages are actually getting pushed to kafka
(producer failures can be silent!). If anything goes wrong, run the exact same
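The runbook warns that producer failures can be silent. One quick way to confirm messages are actually landing in Kafka is to tail the output topic with a console consumer. A minimal sketch using `kcat` (formerly `kafkacat`); the topic name here is a hypothetical example, not taken from the diff above:

```shell
# Consume the last 5 messages from the GROBID output topic and exit (-e).
# If fresh messages keep appearing while the extract job runs, the
# producers are healthy.
# NOTE: the topic name below is an assumed example for illustration.
kcat -C \
  -b wbgrp-svc350.us.archive.org:9092 \
  -t sandcrawler-prod.grobid-output \
  -o -5 -e
```

Running this against a live broker requires network access to the Kafka node; no assertion is made here about the actual sandcrawler topic naming scheme.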