From cb8067ca6e4abe3bea499fc45ece05a454c794d6 Mon Sep 17 00:00:00 2001
From: Bryan Newbold
Date: Wed, 14 Nov 2018 18:36:48 -0800
Subject: more kafka performance notes

---
 notes/performance/kafka_pipeline.txt | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

(limited to 'notes/performance')

diff --git a/notes/performance/kafka_pipeline.txt b/notes/performance/kafka_pipeline.txt
index f0862d89..0a503a18 100644
--- a/notes/performance/kafka_pipeline.txt
+++ b/notes/performance/kafka_pipeline.txt
@@ -11,7 +11,21 @@ messages/second.
 Because this worker consumes from 8x partitions, I have a feeling it might be
 consumer group related. kafka-manager shows "0% coverage" for this topic. Note
 that this is a single worker process.
-_consumer_offsets is seeing about 36 messages/sec.
+`_consumer_offsets` is seeing about 36 messages/sec.
 
 Oh, looks like I just needed to enable auto_commit and tune parameters in
 pykafka!
+
+That helped reduce `_consumer_offsets` churn significantly, but didn't
+increase throughput (or not much). Might want to switch to kafka connect
+(presuming it somehow does faster/bulk inserts/indexing), with a simple worker
+doing the transforms. Probably worth doing a `> /dev/null` version of the
+worker first (with a different consumer group) to make sure the bottleneck isn't
+somewhere else.
+
+Another thing to try is more kafka fetch threads.
+
+elastic-release python processing is at 66% (of one core) CPU! and elastic at
+~30%. Huh.
+
+But, in general, "seems to be working".
--
cgit v1.2.3
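
A rough sketch of the `> /dev/null` worker idea from the notes above: consume
with a separate consumer group, throw every message away, and report
messages/second, so the consume side can be measured without the transform or
the elasticsearch inserts. The function names, the group name, and every
numeric setting here are made up for illustration and are not the values used
in the real worker. The keyword arguments are pykafka consumer options as I
understand them, so check them against the installed pykafka version.

```python
import time


def consumer_kwargs(group):
    """Keyword arguments for pykafka's topic.get_simple_consumer().

    All values are illustrative guesses, not the tuned settings from the notes.
    """
    return dict(
        consumer_group=group,           # bytes; must differ from the real worker's group
        auto_commit_enable=True,        # commit offsets on a timer, not per message
        auto_commit_interval_ms=30000,  # fewer commits means less _consumer_offsets churn
        queued_max_messages=2000,       # size of the internal prefetch queue
        num_consumer_fetchers=4,        # the "more kafka fetch threads" knob
    )


def drain(messages, limit=None, clock=time.monotonic):
    """Consume and discard messages; return (count, messages_per_second)."""
    start = clock()
    count = 0
    for _msg in messages:
        count += 1
        if limit is not None and count >= limit:
            break
    elapsed = clock() - start
    rate = count / elapsed if elapsed > 0 else float("inf")
    return count, rate


def run_null_worker(hosts, topic_name, group=b"null-sink-test"):
    """Connect to Kafka and drain one topic. Needs pykafka and a live broker."""
    from pykafka import KafkaClient  # imported lazily so drain() works without it

    client = KafkaClient(hosts=hosts)
    topic = client.topics[topic_name]
    consumer = topic.get_simple_consumer(
        consumer_timeout_ms=5000,  # iteration ends after 5 s with no messages
        **consumer_kwargs(group),
    )
    try:
        # Note: the final 5 s of silence is included in the measured time.
        return drain(consumer)
    finally:
        consumer.stop()
```

If the rate from `run_null_worker()` is much higher than what the real worker
manages, the bottleneck is probably in the transform or the inserts rather
than in the consumer. If it is about the same, the consumer settings (or the
broker) are the place to look.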