Pushing Your Streaming Platform to the Limit by Elad Leev

ScyllaDB 65 views 37 slides Mar 04, 2025

About This Presentation

Join Elad for a hands-on session on Chaos Engineering for streaming platforms like Kafka, Pulsar, NATS, and RabbitMQ. Learn to stress test, benchmark, and fine-tune performance to ensure your system stays resilient under pressure.


Slide Content

A ScyllaDB Community
Pushing Your Streaming Platform to the Limit
Elad Leev
Staff Engineer

Elad Leev
#DistributedSystems #DataStream
#Scalability #DataMesh #Kafka
#Flink

No single metric can measure the performance of computer systems on all
applications. System performance varies enormously from one application
domain to another. Each system is typically designed for a few problem domains
and may be incapable of performing other tasks.

… The system that does the job with the lowest cost-of-ownership.

Your Vendor’s Benchmarks = Marketing Job

Know Your System Limits

Disk
Memory
CPU
Network

The Key Criteria for a Good Benchmark

The benchmark should test the exact same deployment and configuration that we plan to use in production.
No special tricks.
Environment Simulation

Research your traffic characteristics and SLAs/SLOs in advance. Always prepare for the unexpected.
Aim for the peak.
Test peak performance
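
To make "aim for the peak" concrete, here is a minimal sketch of turning observed traffic into a benchmark target rate. The sample rates, the nearest-rank percentile, and the 1.5x headroom factor are all illustrative assumptions, not values from the talk.

```python
import math

def rate_percentile(samples, pct):
    """Nearest-rank percentile of observed message rates (msgs/sec)."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(math.ceil(pct / 100.0 * len(ordered)), 1)
    return ordered[rank - 1]

def benchmark_target(samples, headroom=1.5):
    """Rate to drive in the benchmark: observed maximum times a safety factor.

    The headroom value is an assumption; pick it from your own SLOs.
    """
    return max(samples) * headroom

# Hypothetical per-minute production rates, including one traffic spike.
rates = [12_000, 15_500, 14_200, 48_000, 16_100, 13_900]

typical = rate_percentile(rates, 50)   # what the system sees most of the time
target = benchmark_target(rates)       # what the benchmark should sustain
print(f"typical={typical} msgs/s, benchmark target={target:.0f} msgs/s")
```

The gap between the typical rate and the target is the point: a benchmark sized to the median would never exercise the spike.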

Failures are a given and everything will eventually fail over time: from routers to
hard disks, from operating systems to memory units corrupting TCP packets,
from transient errors to permanent failures. This is a given, whether you are
using the highest quality hardware or lowest cost components.


Werner Vogels, AWS CTO, 2016

Our system is a living organism; it scales up and down. Embrace the change, know the impact.
Make it scalable.
Scalable & Portable

The benchmark must be understandable to maintain credibility.
Simplicity is key.
Transparent Design

What to look for?

For every resource in the system, we should check:

Utilization

Saturation

Errors
The USE Method

system, user, idle, loadavg
buffers, cache, mem-free, mem-used
JVM heap, GC
bytes in-out, packet drops
io-ms, io-wait, QueueLength
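
The USE checklist above can be sketched as a small per-resource check. This is an illustration only: the `ResourceSample` type, the sample numbers, and the 80% utilization and queue-length thresholds are hypothetical, and real values would come from your monitoring stack.

```python
from dataclasses import dataclass

@dataclass
class ResourceSample:
    name: str
    utilization: float  # fraction of capacity in use, 0.0 to 1.0
    saturation: float   # queued or waiting work, e.g. run-queue or I/O queue length
    errors: int         # error events seen in the sampling window

def use_findings(sample, max_utilization=0.8, max_saturation=1.0):
    """Return a list of USE findings for one resource (thresholds are assumptions)."""
    findings = []
    if sample.utilization > max_utilization:
        findings.append(f"{sample.name}: utilization {sample.utilization:.0%}")
    if sample.saturation > max_saturation:
        findings.append(f"{sample.name}: saturation {sample.saturation:g}")
    if sample.errors > 0:
        findings.append(f"{sample.name}: {sample.errors} errors")
    return findings

# Hypothetical readings taken during a benchmark run.
samples = [
    ResourceSample("cpu", 0.92, 0.4, 0),
    ResourceSample("disk", 0.55, 3.0, 0),
    ResourceSample("network", 0.30, 0.0, 12),
    ResourceSample("memory", 0.60, 0.0, 0),
]

report = [finding for s in samples for finding in use_findings(s)]
for line in report:
    print(line)
```

Checking all three dimensions matters because they fail independently: in this made-up run the disk looks half idle by utilization, yet its queue shows it is already saturated.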

How to Benchmark

Open Messaging Initiative
https://openmessaging.cloud

OMB Driver: Task and Topic Orchestrator
OMB Worker: The Benchmark Executor

Kubernetes-based solution that's easy to scale. Runs
as a simple Producer/Consumer model, matching
our service architecture.
How to Benchmark?

Kafka
Pulsar


Workload File
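
The slide content for the workload file did not survive extraction, so the following is only a sketch of what one might look like. It renders a YAML workload from Python; the field names follow the example workloads shipped with the OpenMessaging Benchmark repository as best recalled, and the values are illustrative. Both should be checked against the OMB version you run. `render_workload` is a hypothetical helper, not part of OMB.

```python
def render_workload(name, **fields):
    """Render a flat workload definition as YAML text (hypothetical helper)."""
    lines = [f"name: {name}", ""]
    for key, value in fields.items():
        if isinstance(value, str):
            value = f'"{value}"'  # quote string values such as file paths
        lines.append(f"{key}: {value}")
    return "\n".join(lines) + "\n"

# Field names assumed from OMB's sample workloads; values are made up.
workload = render_workload(
    "1 topic / 16 partitions / 1KB",
    topics=1,
    partitionsPerTopic=16,
    messageSize=1024,
    payloadFile="payload/payload-1Kb.data",
    subscriptionsPerTopic=1,
    consumerPerSubscription=1,
    producersPerTopic=1,
    producerRate=50000,
    consumerBacklogSizeGB=0,
    testDurationMinutes=15,
)
print(workload)
```

Generating the file from code makes it easy to sweep one parameter, such as `producerRate`, across runs while keeping everything else fixed.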

Benchmark Insights

Average Latency
Test Our Monitoring & Dashboards
Potential Bottlenecks
Scale-up Estimates
Consumer/Producer Configuration Guidelines
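
As a small illustration of the latency insight, the sketch below summarizes a set of latency samples. The slide names only average latency; reporting the median, p99, and maximum next to it is an addition here, and the sample values are made up.

```python
import math
import statistics

def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = max(math.ceil(pct / 100.0 * len(ordered)), 1)
    return ordered[rank - 1]

def latency_summary(samples_ms):
    """Average latency plus a few order statistics, all in milliseconds."""
    return {
        "avg": statistics.fmean(samples_ms),
        "p50": percentile(samples_ms, 50),
        "p99": percentile(samples_ms, 99),
        "max": max(samples_ms),
    }

# Hypothetical end-to-end latencies with a single slow outlier.
latencies_ms = [4, 5, 5, 6, 5, 4, 6, 5, 5, 255]
summary = latency_summary(latencies_ms)
print(summary)
```

With these numbers one slow message pulls the average to 30 ms while the median stays at 5 ms, so the average is best read alongside the other figures.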

Stay in Touch
Elad Leev
https://leevs.dev
@eladleev
@EladLeev
linkedin.com/in/elad-leev