author Bryan Newbold <bnewbold@archive.org> 2018-12-10 13:33:41 +0800
committer Bryan Newbold <bnewbold@archive.org> 2018-12-10 13:33:41 +0800
commit 6e8305e625f8b033d2697d40ed31ec15368678f9
tree cec31f542750e922786a1e3bf8a6eb60529ab06e
parent 4736db1b1caca50a83bf7fb0d45e2e8d48d4e233
update notes
Diffstat (limited to 'kafka/grobid_kafka_notes.txt')
-rw-r--r-- kafka/grobid_kafka_notes.txt 6
1 files changed, 6 insertions, 0 deletions
diff --git a/kafka/grobid_kafka_notes.txt b/kafka/grobid_kafka_notes.txt
index d8bb171..b4fa2a8 100644
--- a/kafka/grobid_kafka_notes.txt
+++ b/kafka/grobid_kafka_notes.txt
@@ -41,6 +41,12 @@ Check grobid output:
     kafkacat -C -b localhost:9092 -t sandcrawler-qa.grobid-output
+## Actual Production Commands
+
+    gohdfs get sandcrawler/output-prod/2018-11-30-2125.55-dumpungrobided/part-00000
+    mv part-00000 2018-11-30-2125.55-dumpungrobided.tsv
+    cat 2018-11-30-2125.55-dumpungrobided.tsv | kafkacat -P -b localhost:9092 -t sandcrawler-prod.ungrobided
+
## Performance
On 2018-11-21, using grobid-vm (svc096) with 30 cores, and running with 50x