author | Bryan Newbold <bnewbold@archive.org> | 2018-12-10 13:33:41 +0800 |
---|---|---|
committer | Bryan Newbold <bnewbold@archive.org> | 2018-12-10 13:33:41 +0800 |
commit | 6e8305e625f8b033d2697d40ed31ec15368678f9 (patch) | |
tree | cec31f542750e922786a1e3bf8a6eb60529ab06e /kafka/grobid_kafka_notes.txt | |
parent | 4736db1b1caca50a83bf7fb0d45e2e8d48d4e233 (diff) | |
update notes
Diffstat (limited to 'kafka/grobid_kafka_notes.txt')
-rw-r--r-- | kafka/grobid_kafka_notes.txt | 6 |
1 files changed, 6 insertions, 0 deletions
diff --git a/kafka/grobid_kafka_notes.txt b/kafka/grobid_kafka_notes.txt
index d8bb171..b4fa2a8 100644
--- a/kafka/grobid_kafka_notes.txt
+++ b/kafka/grobid_kafka_notes.txt
@@ -41,6 +41,12 @@ Check grobid output:
 
     kafkacat -C -b localhost:9092 -t sandcrawler-qa.grobid-output
 
+## Actual Production Commands
+
+    gohdfs get sandcrawler/output-prod/2018-11-30-2125.55-dumpungrobided/part-00000
+    mv part-00000 2018-11-30-2125.55-dumpungrobided.tsv
+    cat 2018-11-30-2125.55-dumpungrobided.tsv | kafkacat -P -b localhost:9092 -t sandcrawler-prod.ungrobided
+
 ## Performance
 
 On 2018-11-21, using grobid-vm (svc096) with 30 cores, and running with 50x
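
The three production commands added in this commit (fetch the part file, rename it after its dump directory, pipe it into kafkacat) could be wrapped roughly as below. This is only a sketch: the function names `local_tsv_name` and `push_dump` are hypothetical and not part of sandcrawler, the broker and topic defaults are copied from the notes, and the empty-file check is an added assumption rather than anything the notes do.

```shell
#!/bin/sh
# Sketch only: wraps the production commands from the notes.
# Defaults below are copied from the notes; override via the environment.
BROKER="${BROKER:-localhost:9092}"
TOPIC="${TOPIC:-sandcrawler-prod.ungrobided}"

# Derive the local TSV name from the HDFS part path, i.e. the name of the
# dump directory that contains the part file, with ".tsv" appended.
# (Hypothetical helper, not part of sandcrawler.)
local_tsv_name() {
    printf '%s.tsv\n' "$(basename "$(dirname "$1")")"
}

# Fetch one part file, rename it, and publish each line as one Kafka message.
# (Hypothetical helper; mirrors the gohdfs / mv / kafkacat steps in the notes.)
push_dump() {
    part_path="$1"
    tsv="$(local_tsv_name "$part_path")"
    gohdfs get "$part_path" || return 1
    mv "$(basename "$part_path")" "$tsv" || return 1
    # Assumption: refuse to publish an empty dump.
    [ -s "$tsv" ] || { echo "empty dump: $tsv" >&2; return 1; }
    kafkacat -P -b "$BROKER" -t "$TOPIC" < "$tsv"
}
```

With the functions loaded, the run recorded in the notes would correspond to `push_dump sandcrawler/output-prod/2018-11-30-2125.55-dumpungrobided/part-00000`. Reading the file via redirection replaces the `cat ... |` form and behaves the same way in producer mode.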