author Bryan Newbold <bnewbold@robocracy.org> 2018-11-14 18:36:48 -0800
committer Bryan Newbold <bnewbold@robocracy.org> 2018-11-14 18:36:48 -0800
commit cb8067ca6e4abe3bea499fc45ece05a454c794d6 (patch)
tree 8e005e4edb7a58f60547976dab2017be2e4fb1ef
parent 6bcd62005dd7eab94744f5f368d4724732bcfbd9 (diff)
more kafka performance notes
-rw-r--r-- notes/performance/kafka_pipeline.txt 16
1 file changed, 15 insertions, 1 deletion
diff --git a/notes/performance/kafka_pipeline.txt b/notes/performance/kafka_pipeline.txt
index f0862d89..0a503a18 100644
--- a/notes/performance/kafka_pipeline.txt
+++ b/notes/performance/kafka_pipeline.txt
@@ -11,7 +11,21 @@ messages/second. Because this worker consumes from 8x partitions, I have a
feeling it might be consumer group related. kafka-manager shows "0% coverage"
for this topic. Note that this is a single worker process.
-_consumer_offsets is seeing about 36 messages/sec.
+`_consumer_offsets` is seeing about 36 messages/sec.
Oh, looks like I just needed to enable auto_commit and tune parameters in
pykafka!
+
+That helped reduce `_consumer_offsets` churn significantly, but didn't
+increase throughput (or not much). Might want to switch to kafka connect
+(presuming it somehow does faster/bulk inserts/indexing), with a simple worker
+doing the transforms. Probably worth doing a `> /dev/null` version of the
+worker first (with a different consumer group) to make sure the bottleneck isn't
+somewhere else.
+
+Another thing to try is more kafka fetch threads.
+
+elastic-release python processing is at 66% (of one core) CPU! and elastic at
+~30%. Huh.
+
+But, in general, "seems to be working".
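
A rough sketch of the `> /dev/null` check described in the notes: consume and discard messages, then report messages/second, so consumer throughput can be compared against the full worker. This is not the fatcat worker code. The topic name, consumer group, and parameter values are placeholders, and `make_consumer` assumes pykafka's `get_simple_consumer` keyword arguments (`auto_commit_enable`, `auto_commit_interval_ms`, `num_consumer_fetchers`, `consumer_timeout_ms`); check them against the installed pykafka version.

```python
import time


def drain(messages, limit=None, clock=time.monotonic):
    """Consume and discard messages; return (count, seconds, messages/sec)."""
    start = clock()
    count = 0
    for _ in messages:
        count += 1
        if limit is not None and count >= limit:
            break
    elapsed = clock() - start
    rate = count / elapsed if elapsed > 0 else float("inf")
    return count, elapsed, rate


def make_consumer(hosts, topic_name, group):
    """Build a pykafka consumer with auto-commit and extra fetch threads.

    All values below are illustrative guesses, not tuned settings.
    """
    from pykafka import KafkaClient  # imported lazily; needs pykafka installed

    client = KafkaClient(hosts=hosts)
    topic = client.topics[topic_name]
    return topic.get_simple_consumer(
        consumer_group=group,           # separate group from the real worker
        auto_commit_enable=True,        # commit offsets periodically
        auto_commit_interval_ms=30000,  # fewer commits, less offsets churn
        num_consumer_fetchers=4,        # "more kafka fetch threads"
        consumer_timeout_ms=5000,       # stop iterating when the topic is idle
    )


if __name__ == "__main__":
    # Placeholder broker address, topic, and group names.
    consumer = make_consumer("localhost:9092", b"example-topic", b"devnull-test")
    count, seconds, rate = drain(consumer, limit=100000)
    print("%d messages in %.1fs (%.0f msg/sec)" % (count, seconds, rate))
```

If the discard-only rate is about the same as the real worker's, the bottleneck is probably on the Kafka consumer side rather than in the transform or Elasticsearch indexing.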