From 7634f6ecf2361b1cb1cafd4e27fd1fb84d81d130 Mon Sep 17 00:00:00 2001
From: Bryan Newbold
Date: Tue, 13 Nov 2018 23:48:45 -0800
Subject: switch to auto consumer offset updates

This is the classic/correct way to do consumer group offset updates for
higher throughput, when "at least once" semantics are acceptable (as they are
here; double processing should be safe/fine).
---
 notes/performance/kafka_pipeline.txt | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)
 create mode 100644 notes/performance/kafka_pipeline.txt

diff --git a/notes/performance/kafka_pipeline.txt b/notes/performance/kafka_pipeline.txt
new file mode 100644
index 00000000..f0862d89
--- /dev/null
+++ b/notes/performance/kafka_pipeline.txt
@@ -0,0 +1,17 @@
+
+## Early Notes (2018-11-13)
+
+Ran through about 100k crossref objects, resulting in about 77k messages (in
+about 4k editgroups/changelogs).
+
+Have seen tens of messages per second go through trivially.
+
+The elastic-release worker is the current bottleneck, at only some 4.3
+messages/second. Because this worker consumes from 8x partitions, I have a
+feeling it might be consumer group related. kafka-manager shows "0% coverage"
+for this topic. Note that this is a single worker process.
+
+__consumer_offsets is seeing about 36 messages/sec.
+
+Oh, looks like I just needed to enable auto_commit and tune parameters in
+pykafka!
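
Addendum: a minimal sketch of the pykafka change this commit describes,
assuming a managed balanced consumer. The broker address, topic, and group
names below are hypothetical placeholders, not the actual fatcat config.

    from pykafka import KafkaClient

    # Connect to the Kafka cluster (broker address is a placeholder).
    client = KafkaClient(hosts="localhost:9092")
    topic = client.topics[b"release-updates"]  # hypothetical topic name

    # Balanced consumer with background offset auto-commit. Offsets get
    # committed every auto_commit_interval_ms instead of synchronously per
    # message, giving "at least once" delivery: after a crash, messages
    # since the last commit may be re-processed.
    consumer = topic.get_balanced_consumer(
        consumer_group=b"elastic-release",  # hypothetical group name
        managed=True,                   # Kafka-managed group membership API
        auto_commit_enable=True,        # commit offsets in the background...
        auto_commit_interval_ms=30000,  # ...every 30 seconds (tunable)
    )

    for message in consumer:
        if message is not None:
            print(message.offset, message.value)  # stand-in for worker logic

Without auto_commit_enable, pykafka only advances the group offset on an
explicit commit_offsets() call; doing that synchronously for every message
trades throughput for a smaller re-processing window, which the commit
message above argues is the wrong trade here.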