Why Apache Kafka Clusters Are Like Galaxies (And Other Cosmic Kafka Quandaries Explored)
Paul Brebner
59 slides
Jun 18, 2024
About This Presentation
Closing talk for the Performance Engineering track at Community Over Code EU (Bratislava, Slovakia, June 5, 2024):
https://eu.communityovercode.org/sessions/2024/why-apache-kafka-clusters-are-like-galaxies-and-other-cosmic-kafka-quandaries-explored/
Instaclustr (now part of NetApp) manages hundreds of Apache Kafka clusters of many different sizes, for a variety of use cases and customers. For the last 7 years I've focused outwardly on exploring Kafka application development challenges, but recently I decided to look inward and see what I could discover about the performance, scalability and resource characteristics of the Kafka clusters themselves. Using a suite of Performance Engineering techniques, I will reveal some surprising discoveries about cosmic Kafka mysteries in our data centres, related to: cluster sizes and distribution (using Zipf's Law), horizontal vs. vertical scalability, and predicting Kafka performance using metrics, modelling and regression techniques. These insights are relevant to Kafka developers and operators.
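Zipf's Law says that, when items are ranked by size, the item of rank r has a size roughly proportional to 1/r^s (s = 1 in the classic form). One common way to check whether a set of cluster sizes follows it is a straight-line least-squares fit of log(size) against log(rank). The sketch below assumes the input is simply a list of positive cluster sizes (for example, brokers per cluster); the function name fit_zipf is hypothetical, and this is not necessarily the analysis used in the talk.

```python
import math
from typing import Sequence, Tuple


def fit_zipf(sizes: Sequence[float]) -> Tuple[float, float]:
    """Fit size ~= C / rank**s via least squares on log(size) vs log(rank).

    Hypothetical helper. Returns (s, C). Sizes are sorted largest first,
    so rank 1 is the biggest cluster.
    """
    ranked = sorted(sizes, reverse=True)
    if len(ranked) < 2 or any(x <= 0 for x in ranked):
        raise ValueError("need at least two positive sizes")
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(size) for size in ranked]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # log(size) = log(C) - s * log(rank)
    intercept = mean_y - slope * mean_x
    return -slope, math.exp(intercept)
```

A fitted s close to 1 matches the classic Zipf shape. Real data usually bends away from the line at the smallest and largest ranks, so it is worth plotting the log-log points as well as trusting the fitted number.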
Overview
1 Kafka Scalability
2 Kafka Clusters and Zipf's Law
3 Kafka Clusters and Storage
4 Top 10 Kafka Clusters and Performance
Thanks to Instaclustr colleagues for Kafka cluster data:
Kafka Clusters & Storage: Alastair Daivis & Kafka Team
Top 10 Clusters: Joseph Clay & Ramana Selvaratnam (Technical Operations Team)
A note on Kafka cluster metrics
Scope: Broker → Cluster → All Clusters
Performance metrics: easy per broker, harder per cluster and across all clusters
Size metrics: available
The focus of our metrics collection is per broker, not per cluster or across all clusters.
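Because metrics are collected per broker, cluster-level and fleet-level views have to be built by rolling broker metrics up. A minimal sketch, assuming a hypothetical record shape (BrokerSample with cluster_id, broker_id and a messages-per-second rate) and exactly one sample per broker for a single time window:

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Set, Tuple


@dataclass(frozen=True)
class BrokerSample:
    """One per-broker metric sample (hypothetical record shape)."""
    cluster_id: str
    broker_id: int
    messages_per_sec: float


def roll_up(
    samples: Iterable[BrokerSample],
) -> Tuple[Dict[str, float], Dict[str, int], float]:
    """Aggregate per-broker samples to per-cluster and fleet-wide values.

    Assumes one sample per broker for the same time window.
    Returns (throughput per cluster, broker count per cluster, total throughput).
    """
    throughput: Dict[str, float] = defaultdict(float)
    brokers: Dict[str, Set[int]] = defaultdict(set)
    for s in samples:
        throughput[s.cluster_id] += s.messages_per_sec  # rates are additive
        brokers[s.cluster_id].add(s.broker_id)          # cluster size = distinct brokers
    sizes = {cid: len(ids) for cid, ids in brokers.items()}
    return dict(throughput), sizes, sum(throughput.values())
```

Summing works for additive rates such as messages per second and for size-style counts, but not for every metric: latency percentiles, for example, cannot simply be summed or averaged across brokers.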
(Image generated with DALL·E 3)
Diagram: a Producer writes to a Topic with Partitions 1 to n; the Consumers in a Consumer Group share the work within the group by reading from different partitions.
Partitions enable consumers to share work within a consumer group (cf. Amish barn raising).
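Within a consumer group, each partition is read by at most one consumer, so the partition count caps how many consumers can do useful work. The sketch below is a deliberately simplified round-robin assignment for a single topic. It is not Kafka's own assignor code (real assignors are chosen with the consumer setting partition.assignment.strategy and handle multiple topics and rebalancing), and the function name assign_round_robin is hypothetical.

```python
from typing import Dict, List


def assign_round_robin(partitions: List[int], consumers: List[str]) -> Dict[str, List[int]]:
    """Simplified round-robin assignment of one topic's partitions to a group.

    Hypothetical helper. Each partition goes to exactly one consumer, so any
    consumers beyond the partition count receive nothing and sit idle.
    """
    if not consumers:
        raise ValueError("consumer group has no members")
    assignment: Dict[str, List[int]] = {c: [] for c in consumers}
    for i, p in enumerate(sorted(partitions)):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment
```

For example, six partitions over three consumers gives each consumer two partitions, while two partitions over three consumers leaves one consumer idle. This is why a topic needs at least as many partitions as there are consumers in the group to keep every consumer busy.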
Partitions are a concurrency mechanism: more is better, until it's not.
You need enough partitions to benefit from the cluster's concurrency, but not so many that the replication overhead reduces overall throughput.
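Chart: Partitions vs. Throughput (M TPS). X axis: partitions, 1 to 10,000 (log scale); Y axis: throughput, 0 to 2.5 million TPS. Series: ZK TPS (M), KRaft TPS (M), 2020 TPS (M). Annotations mark the 2022 results as better and the 2020 results as worse.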
The 2022 results are better due to improvements in both Kafka and the hardware.
Part 4
Performance Metrics for Top Ten Kafka Clusters
Top 10 tallest buildings (Wikipedia)
Most Dangerous Australian Critters?
Ranking can be tricky: is the most "dangerous" the one with the most teeth? The most venomous?
But in reality, more people are killed by horses, cows, dogs and bees than by kangaroos, sharks, snakes, crocodiles, emus, jellyfish, etc.!
(Images: Paul Brebner; Wikimedia)