author | Bryan Newbold <bnewbold@robocracy.org> | 2018-11-13 23:48:45 -0800 |
---|---|---|
committer | Bryan Newbold <bnewbold@robocracy.org> | 2018-11-13 23:48:47 -0800 |
commit | 7634f6ecf2361b1cb1cafd4e27fd1fb84d81d130 (patch) | |
tree | 69b18860ed4188c5169e9d9cb174355966b6f7de /notes | |
parent | 7edae5c9d2267ba5e381ecbf00a7c3f7dacf4194 (diff) | |
switch to auto consumer offset updates
This is the classic/correct way to do consumer group updates for higher
throughput, when "at least once" semantics are acceptable (as they are
here; double processing should be safe/fine).
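To illustrate why double processing is tolerable under "at least once" delivery, here is a minimal, broker-free sketch. It is not the fatcat worker code: the `consume` helper, the count-based commit interval (real consumers such as pykafka commit on a timer), and the dict standing in for a search index are all assumptions made for the example.

```python
def consume(log, start_offset, commit_every, crash_after=None):
    """Process messages from start_offset, committing every `commit_every` messages.

    Returns (processed_messages, committed_offset). If `crash_after` is set,
    the consumer "crashes" after that many messages, without a final commit.
    """
    committed = start_offset
    processed = []
    for n, offset in enumerate(range(start_offset, len(log)), start=1):
        if crash_after is not None and n > crash_after:
            break  # simulated crash: work since the last commit is not recorded
        processed.append(log[offset])
        if n % commit_every == 0:
            committed = offset + 1  # periodic "auto" commit
    else:
        committed = len(log)  # clean shutdown commits the final position
    return processed, committed


log = list(range(10))  # message ids 0..9 (hypothetical data)

# First run crashes after 6 messages; the last periodic commit was at offset 4.
first, committed = consume(log, start_offset=0, commit_every=4, crash_after=6)

# The restarted worker resumes from the committed offset, so 4 and 5 are redone.
second, final_offset = consume(log, start_offset=committed, commit_every=4)

duplicates = sorted(set(first) & set(second))

# An idempotent sink (upsert keyed by id) ends up the same despite the replays.
index = {}
for msg_id in first + second:
    index[msg_id] = {"id": msg_id}
```

No message is skipped, and the replayed ones simply overwrite identical entries. That is the trade-off the commit message describes: fewer offset commits in exchange for possible reprocessing after a restart.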
Diffstat (limited to 'notes')
-rw-r--r-- | notes/performance/kafka_pipeline.txt | 17 |
1 files changed, 17 insertions, 0 deletions
diff --git a/notes/performance/kafka_pipeline.txt b/notes/performance/kafka_pipeline.txt
new file mode 100644
index 00000000..f0862d89
--- /dev/null
+++ b/notes/performance/kafka_pipeline.txt
@@ -0,0 +1,17 @@
+
+## Early Notes (2018-11-13)
+
+Ran through about 100k crossref objects, resulting in about 77k messages (in
+about 4k editgroups/changelogs).
+
+Have seen tens of messages per second go through trivially.
+
+The elastic-release worker is the current bottleneck, only some 4.3
+messages/second. Because this worker consumes from 8x partitions, I have a
+feeling it might be consumer group related. kafka-manager shows "0% coverage"
+for this topic. Note that this is a single worker process.
+
+_consumer_offsets is seeing about 36 messages/sec.
+
+Oh, looks like I just needed to enable auto_commit and tune parameters in
+pykafka!
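For reference, enabling auto-commit in pykafka might look roughly like the sketch below. The actual code change is not shown in this diff (it is limited to 'notes'), so the group name, the 5000 ms interval, the helper names, and the choice of `get_simple_consumer` over a balanced consumer are all assumptions. `auto_commit_enable` and `auto_commit_interval_ms` are existing pykafka consumer parameters.

```python
def auto_commit_kwargs(group=b"elastic-release-worker", interval_ms=5000):
    """Build consumer keyword arguments with periodic offset commits enabled.

    The group name and interval are hypothetical placeholders, not values
    taken from the fatcat repository.
    """
    return {
        "consumer_group": group,
        # Let pykafka commit offsets in the background instead of the worker
        # committing after every message.
        "auto_commit_enable": True,
        "auto_commit_interval_ms": interval_ms,
    }


def run_worker(hosts, topic_name, handle):
    """Consume a topic and pass each message value to `handle`.

    Requires a reachable Kafka broker; pykafka is imported lazily so that the
    configuration helper above can be used without it installed.
    """
    from pykafka import KafkaClient

    client = KafkaClient(hosts=hosts)
    topic = client.topics[topic_name.encode("utf-8")]
    consumer = topic.get_simple_consumer(**auto_commit_kwargs())
    for message in consumer:
        if message is not None:
            # Must be safe to repeat: a restart replays messages received
            # since the last automatic commit.
            handle(message.value)
```

A call such as `run_worker("localhost:9092", "some-topic", print)` would then process messages without a per-message commit round trip; the host and topic name here are placeholders.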