Diffstat (limited to 'kafka/grobid_kafka_notes.txt')
-rw-r--r-- kafka/grobid_kafka_notes.txt | 6
1 file changed, 6 insertions, 0 deletions
diff --git a/kafka/grobid_kafka_notes.txt b/kafka/grobid_kafka_notes.txt
index d8bb171..b4fa2a8 100644
--- a/kafka/grobid_kafka_notes.txt
+++ b/kafka/grobid_kafka_notes.txt
@@ -41,6 +41,12 @@ Check grobid output:
kafkacat -C -b localhost:9092 -t sandcrawler-qa.grobid-output
+## Actual Production Commands
+
+ gohdfs get sandcrawler/output-prod/2018-11-30-2125.55-dumpungrobided/part-00000
+ mv part-00000 2018-11-30-2125.55-dumpungrobided.tsv
+ cat 2018-11-30-2125.55-dumpungrobided.tsv | kafkacat -P -b localhost:9092 -t sandcrawler-prod.ungrobided
+
## Performance
On 2018-11-21, using grobid-vm (svc096) with 30 cores, and running with 50x