Superpower Your Apache Kafka Applications Development with Complementary Open Source Technologies

Paul Brebner, 89 slides, Jun 18, 2024

About This Presentation

Kafka Summit talk (Bangalore, India, May 2, 2024, https://events.bizzabo.com/573863/agenda/session/1300469)
Many Apache Kafka use cases take advantage of Kafka’s ability to integrate multiple heterogeneous systems for stream processing and real-time machine learning scenarios. But Kafka also exi...


Slide Content

© 2024 NetApp, Inc. All rights reserved.
Kafka Summit, Bangalore 2024
Superpower your Apache Kafka®
applications development
with complementary
open source technologies
Paul Brebner
Instaclustr Technology Evangelist

Focus on complementary technologies –
different to Kafka
“Colours seem more brilliant when they are in contrast
with their complementary colours.” Monet

Complementary Colours
Matisse, Goldfish - Red/Green complementary colors
(Source: Wikimedia)

Contrasting flowers from the Bengaluru market
Bengaluru market flowers (Paul Brebner)

Complementary Kafka Technologies
Cassandra
PostgreSQL
Superset
Camel
Cadence
OpenTelemetry
TensorFlow
RisingWave
LLMs
Guava EventBus
Kubernetes
Prometheus
Grafana
Parallel Consumer
OpenSearch + Dashboard
Matisse, Goldfish - Red/Green complementary colors
(Source: Wikimedia)

C.f. analogous Kafka technologies
• Apache Pulsar, Flink, Storm, Spark Streaming, Beam, ActiveMQ, RocketMQ, StreamPark, RisingWave, etc.
But we will look at RisingWave
Van Gogh, Sunflowers on Yellow Background
(Source: Wikimedia)

Approach
Use Cases
Technologies
Superpowers

0. Apache Kafka®

Apache Kafka®
Postal Delivery Service
Railway Post Office:
Mail bags snatched by speeding train
(Source: Wikimedia CCL)

Apache Kafka visual introduction
My first Kafka talk: Visual introduction to a Kafka postal service

Christmas tree lights simulation
Christmas 2017
My first Kafka demo application
100% Kafka
A simple simulation to start with

Use case 1: “Kongo” IoT logistics simulation
• Real-time logistics
• IoT transportation and rules checking
• Complex simulation

Design 1: Pure Kafka, many topics
1000s of locations (warehouses, trucks) and millions of goods
Each location has a topic and multiple consumer groups (all goods at that location)
7,000 TPS → SLOW!
Many topics/partitions (without increasing cluster resources) reduced throughput on older versions of Kafka

1. Guava EventBus

Guava EventBus
Telegram messengers
(Source: Wikimedia CCL)

Design 2: One topic + Guava EventBus for notifications
Single topic, one consumer group
Kafka supplemented with Guava EventBus to handle high fan-out notifications
1.2M TPS → FAST!
Uber’s Cadence can be/has been used for scalable notifications
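A minimal sketch of the Design 2 idea: one consumer reads a single topic and fans each event out in-process to whichever objects are interested. This is Python rather than Guava's Java API; `MiniEventBus`, `fan_out`, the location names, and the event shape are all hypothetical stand-ins, and a plain list stands in for the Kafka topic.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List


class MiniEventBus:
    """Tiny in-process publish/subscribe bus (illustrative, not Guava's API)."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Callable[[Any], None]]] = defaultdict(list)

    def register(self, location: str, handler: Callable[[Any], None]) -> None:
        """Subscribe a handler (e.g. one goods object) to events for a location."""
        self._handlers[location].append(handler)

    def post(self, location: str, event: Any) -> int:
        """Deliver an event to every handler for the location; return the count."""
        handlers = self._handlers.get(location, [])
        for handler in handlers:
            handler(event)
        return len(handlers)


def fan_out(bus: MiniEventBus, records) -> int:
    """Stand-in for the single Kafka consumer loop; each record is (location, event)."""
    return sum(bus.post(location, event) for location, event in records)


bus = MiniEventBus()
seen = []
for goods_id in ("goods-1", "goods-2", "goods-3"):
    # All three goods are (hypothetically) loaded on the same truck.
    bus.register("truck-7", lambda event, g=goods_id: seen.append((g, event)))

delivered = fan_out(bus, [("truck-7", {"temp_c": 9}), ("warehouse-1", {"temp_c": 4})])
```

The fan-out happens in memory, so the number of Kafka topics and consumer groups no longer grows with the number of locations.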

Use case 2: Anomaly detection at scale
One of these things is not like the others…
(Source: Shutterstock)

Streaming anomaly detection
Incoming Event Stream
Run Anomaly Check - Quickly!
Persist new event
Get previous 50 events for key
Run algorithm
Fast writes → Cassandra
Application scaling → Kubernetes
Initially single threaded consumers
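The three steps of the check (persist, fetch the previous 50 events for the key, run the algorithm) could be sketched as follows. The slides do not name the algorithm, so the z-score test, the threshold, and the in-memory `history` dictionary standing in for Cassandra are all assumptions made for illustration.

```python
from collections import defaultdict, deque
from statistics import mean, pstdev

WINDOW = 50  # "previous 50 events for key", from the slide

# In-memory stand-in for the per-key event table in Cassandra (assumption).
history = defaultdict(lambda: deque(maxlen=WINDOW + 1))


def check_event(key, value, threshold=3.0):
    """Return True if value looks anomalous versus the previous events for key."""
    events = history[key]
    events.append(value)              # 1. persist new event
    previous = list(events)[:-1]      # 2. get previous (up to) 50 events for key
    if len(previous) < 2:
        return False                  # not enough history to judge
    mu, sigma = mean(previous), pstdev(previous)  # 3. run algorithm (assumed z-score)
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold


# Warm up one key with steady readings, then send an outlier.
for i in range(WINDOW):
    check_event("sensor-1", 10 + i % 2)
flagged = check_event("sensor-1", 100)
```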

2. Apache Cassandra®

Apache Cassandra®
Fast Writes
Office typing pool, 1918
(Source: Wikimedia)

Apache Cassandra®
What?
• NoSQL horizontally scalable key-value database
Superpowers
• Fast writes (lots of typewriters)
• Wide column store
• Good for ML feature stores
• Clustering columns
• Good for hierarchical data modeling (e.g. geospatial)
• In-built multi-DC replication

3. Kubernetes

Kubernetes
Greek Triremes ruled the seas
Captained by Helmsmen (Kubernetes)
(Source: Wikimedia)

Kubernetes
What?
• Automation of containerized applications
Superpowers
• Available on public clouds (e.g. AWS EKS)
• Ephemeral Pods are the unit of concurrency
• Easy to scale applications with more or fewer Pods

But scalability isn’t great

4. Prometheus
5. Grafana

Kubernetes
Abacus counting
(Source: Wikimedia)

Prometheus + Grafana
What?
• Prometheus: Monitoring and alerting
• Grafana: Graphing
Superpowers
• Instrumentation or agents (exporters) to expose application metrics
• Time series data with counter, gauge, histogram, and summary metrics
• Instaclustr monitoring API supports Prometheus metrics for Apache Kafka clusters
• Integration of Kafka cluster metrics and Kafka application metrics (e.g. producers and consumers) is powerful
→ Metrics suggested optimizations

Slow Kafka consumers problem
Slow consumers require more partitions/consumers
(Source: Getty Images)
Little’s Law: Concurrency (Partitions = Consumers) = Time x Throughput
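As a worked example of Little's Law, the snippet below computes how many consumers (and therefore partitions) are needed for a target throughput. The throughput and per-event processing times are made-up numbers for illustration, not figures from the talk, and `required_concurrency` is a hypothetical helper.

```python
import math


def required_concurrency(throughput_per_s: float, processing_time_s: float) -> int:
    """Little's Law: concurrency = time x throughput, rounded up to whole consumers."""
    # round() guards against floating point noise before taking the ceiling.
    return math.ceil(round(throughput_per_s * processing_time_s, 9))


fast = required_concurrency(10_000, 0.010)  # 10 ms per event at 10,000 events/s
slow = required_concurrency(10_000, 0.100)  # 100 ms per event at the same rate
```

With these assumed numbers, making each event ten times slower to process requires ten times as many consumers and partitions for the same throughput.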

2 pool solution
The famous Bondi Ocean Pool in Sydney Australia has 2 pools
(Source: Shutterstock)

Optimize consumer speed/concurrency using 2 stage pipeline
Fewer consumers (around 100) give higher throughput: a surprise!
Hint: Fewer partitions
1. Minimize polling time (thread pool 1)
2. Maximize anomaly detector concurrency (thread pool 2)
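The two-pool idea can be sketched as below: a small polling pool does nothing but pull events and hand them off, while a larger pool runs the slow anomaly checks concurrently. This is an illustrative sketch only; `run_pipeline`, the pool sizes, and the in-memory queue standing in for a Kafka consumer are assumptions.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


def run_pipeline(events, detector, pollers=1, workers=8):
    """Stage 1 polls and hands off immediately; stage 2 runs the detector concurrently."""
    source = queue.Queue()  # stand-in for records returned by a Kafka poll
    for event in events:
        source.put(event)

    results = []
    lock = threading.Lock()

    def detect(event):
        outcome = detector(event)
        with lock:
            results.append((event, outcome))

    with ThreadPoolExecutor(max_workers=workers) as detector_pool:  # thread pool 2

        def poll():
            while True:
                try:
                    event = source.get_nowait()
                except queue.Empty:
                    return
                detector_pool.submit(detect, event)  # keep polling time minimal

        with ThreadPoolExecutor(max_workers=pollers) as polling_pool:  # thread pool 1
            for _ in range(pollers):
                polling_pool.submit(poll)
    return results


results = run_pipeline(range(100), lambda e: e % 7 == 0)
```

Because the pollers never wait on the detector, concurrency is set by the size of the second pool rather than by the number of partitions.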

19 billion checks/day after tuning

6. Kafka Parallel Consumer

Kafka Parallel Consumer
Jacquard Loom, Berlin
Makes multiple ribbons concurrently
(Source: Paul Brebner)

Kafka Parallel Consumer: Multi-threaded consumer
• Multiple ordering options (c.f. default Kafka, which only guarantees order within partitions!)
PARTITION → KEY → UNORDERED
Increasing concurrency →
• Concurrency from 1 to lots, depending on client resources and partition/key space sizes
• KEY has higher concurrency than PARTITION and is ordered by key: a reasonable compromise
• Higher concurrency for fewer partitions/consumers
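To illustrate what KEY ordering means, here is a simplified sketch in which records with the same key always go to the same single-threaded lane, so they stay in order while different keys are processed concurrently. This is not the Parallel Consumer library's API or its internal design; `process_key_ordered`, the hashed lanes, and the record shape are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from zlib import crc32


def process_key_ordered(records, handler, workers=4):
    """records: iterable of (key, value). Same key -> same single-threaded lane."""
    lanes = [ThreadPoolExecutor(max_workers=1) for _ in range(workers)]
    try:
        futures = [
            lanes[crc32(key.encode()) % workers].submit(handler, key, value)
            for key, value in records
        ]
        return [future.result() for future in futures]
    finally:
        for lane in lanes:
            lane.shutdown(wait=True)


keys = [f"key-{i}" for i in range(5)]
seen = {key: [] for key in keys}  # per-key processing order


def handler(key, value):
    seen[key].append(value)
    return value


records = [(keys[i % 5], i) for i in range(50)]
processed = process_key_ordered(records, handler)
```

PARTITION ordering would correspond to one lane per partition, and UNORDERED to a single shared pool with no per-key routing.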

Experimental results
3, 50, and 200 times improvement, unordered best
1 consumer
10 partitions
100 keys
10ms latency

Use case 3: Pipelines
Berlin “Beer” (?) Pipeline
(Source: Paul Brebner)

Kafka® Connect data pipelines
REST Tidal Data to OpenSearch
REST Tidal Data to PostgreSQL + Superset
Alternative sinks
Kafka Connectors

7. OpenSearch
8. Dashboard

OpenSearch + Dashboard
Library of Congress
Card Division 1919
(city block long)
(Source: Wikimedia)

OpenSearch + Dashboard
What?
• Open source version of Elasticsearch
• Based on Lucene: powerful and scalable text searching
Superpowers
• Ingestion, indexing, and searching of JSON documents
• Complex linguistic and geospatial queries
• Integrated dashboard for visualization

9. PostgreSQL®

PostgreSQL®
Elephant vs. tree
Elephants are powerful
(Source: Adobe Stock)

PostgreSQL®
What?
• Powerful SQL database
Superpowers
• Extensible
• JSONB + GIN indexes (efficient storage and search of JSON)

10. Apache Superset™

Apache Superset™
Superhero Supersets
All superheroes (B) are a superset of those who use weapons (A)
(Source: Adobe Stock)

Apache Superset™
What?
• Powerful data visualization tool
Superpowers
• Reads from SQL sources
• Lots of visualization and graph types, including geospatial

11. Apache Camel™

Apache Camel™
Camel train
(Source: Adobe Stock)

Apache Camel™
What?
• Apache Camel: integration framework
• Apache Camel Kafka Connectors
Superpowers
• Large number of open source Kafka Connectors: 179 sources and sinks
• Auto-generated from Camel components

Use case 4: Drone delivery
(Source: Adobe Stock)

12. Uber’s Cadence®

Cadence®
Railway signal “man” (signalwoman!)
(Source: Wikimedia)

Uber’s Cadence®
What?
• Scalable code-as-workflows engine
Superpowers
• Sequenced, stateful, long-running, scheduled steps
• Scalable and reliable using event-sourcing
  o Workflows are fail-proof: after a failure, history is replayed up to the point of failure and execution resumes
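The replay behaviour can be illustrated with a toy sketch: completed steps are recorded in a history, and after a crash the workflow is re-run, taking recorded results from history instead of executing those steps again. This is not Cadence's API; `run_workflow`, the step names, and the in-memory `history` list (standing in for Cadence's durable event history) are hypothetical.

```python
def run_workflow(steps, history, crash_before=None):
    """steps: list of (name, fn). history: list of (name, result), a durable-log stand-in."""
    completed = dict(history)  # results of steps that already ran
    state = None
    for name, step in steps:
        if name in completed:
            state = completed[name]  # replayed from history, not re-executed
            continue
        if name == crash_before:
            raise RuntimeError(f"worker crashed before {name}")
        state = step(state)
        history.append((name, state))  # record the completed step
    return state


executed = []  # which steps really ran, across both attempts


def make_step(name):
    def step(state):
        executed.append(name)
        return (state or 0) + 1
    return step


steps = [(name, make_step(name)) for name in ("take_off", "pick_up", "deliver")]
history = []
try:
    run_workflow(steps, history, crash_before="deliver")  # first attempt fails
except RuntimeError:
    pass
result = run_workflow(steps, history)  # second attempt resumes from history
```

Each step runs exactly once across the two attempts, which is the property that makes long-running workflows safe to resume.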

Drone delivery application
Computationally expensive mission critical calculations
Kafka microservices integration of fast/slow systems

Drone way point flight calculations
Returning to base leg
• Drone flight path is computed in an activity
• Using location, distance, bearing, speed, and charge
• Every 10 seconds
• On failure, the drone won’t crash and will continue flying from the last location
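One way such a way point step might be computed is the standard great-circle destination formula: given the current location, a bearing, and a speed, find the position 10 seconds later. The talk does not show the actual calculation, so `next_waypoint`, the spherical Earth radius, and the sample speed are assumptions; battery charge is left out of this sketch.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, spherical approximation


def next_waypoint(lat_deg, lon_deg, bearing_deg, speed_m_s, step_s=10.0):
    """Return (lat, lon) in degrees after flying step_s seconds on a fixed bearing."""
    angular = speed_m_s * step_s / EARTH_RADIUS_M  # distance as an angle in radians
    lat1, lon1 = math.radians(lat_deg), math.radians(lon_deg)
    theta = math.radians(bearing_deg)
    lat2 = math.asin(
        math.sin(lat1) * math.cos(angular)
        + math.cos(lat1) * math.sin(angular) * math.cos(theta)
    )
    lon2 = lon1 + math.atan2(
        math.sin(theta) * math.sin(angular) * math.cos(lat1),
        math.cos(angular) - math.sin(lat1) * math.sin(lat2),
    )
    return math.degrees(lat2), math.degrees(lon2)


# Fly due north from the equator at an assumed 20 m/s for one 10 second step.
lat, lon = next_waypoint(0.0, 0.0, bearing_deg=0.0, speed_m_s=20.0)
```

Because each step depends only on the last recorded location, a restarted activity can continue from that location.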

Uber’s Cadence + Apache Kafka = similarities
Cadence (Workflows) | Kafka (Streaming Events)
Scalable (event sourcing) | Scalable (partitions, cluster)
Persistent (event sourcing) | Persistent (event replaying)
Reliable workflow execution (event sourcing) | Reliable event delivery
Asynchronous signals | Asynchronous events
Open source | Open source
Available as a managed service | Available as a managed service

Uber’s Cadence =
Orchestration (synchronous/timed sequences)
Different architectural (musical) styles
(Source: Getty Images)

Apache Kafka =
Choreography (asynchronous)
Different architectural (musical) styles
(Source: Getty Images)

Combined Cadence + Kafka = Ballet!
Integrated in a new style

Cadence + Kafka = Complementary timescales
(Source: Getty Images)

Cadence + Kafka = Complementary timescales
Cadence (Slow Workflows) | Kafka (Fast Streaming Events)
Synchronous events | Asynchronous events
Stateful flows | Stateless events
Sequences | One-off events
Slow/long running flows | Fast/instantaneous events
Sleep/schedule events | Real-time processing of events
Complex flow logic | Complex stream processing (Kafka Streams)

Cadence + Kafka =
Integration → Drone Ballet
Drone show, Japan
(Source: Getty Images)

How many drones can we fly?
(Source: Shutterstock)

Cluster details (vCPUs):
Client (8), Cadence (6), Cassandra (18)

Load test:
2,000 drones + 2,000 orders = 4,000 workflows

20 Drones flying
Purple = base
Black = drone
Orange = shop
Red = delivery location
Green = successful delivery

Use case 5: Streaming ML
Shop busy/not busy prediction
Busy! Not Busy!
(Source: Getty Images)

Drone learning problem
Simulation produces streaming spatiotemporal data (drone and order state and locations)
Kafka Streams computes aggregated hourly shop and order details → Busy/NotBusy categorization
Sent to TensorFlow
Train model to predict shop busy/not busy an hour ahead
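The hourly aggregation and Busy/NotBusy labelling step could look roughly like this. It is plain Python rather than the Kafka Streams DSL; the `(shop_id, epoch_seconds)` record shape, the `label_busy_hours` helper, and the busy threshold of 10 orders per hour are assumptions, since the talk does not give the categorization rule.

```python
from collections import Counter

HOUR_S = 3600


def label_busy_hours(orders, busy_threshold=10):
    """orders: iterable of (shop_id, epoch_seconds).

    Returns {(shop_id, hour_start): "Busy" | "NotBusy"} using hourly tumbling windows.
    """
    counts = Counter((shop, ts - ts % HOUR_S) for shop, ts in orders)
    return {
        window: "Busy" if count >= busy_threshold else "NotBusy"
        for window, count in counts.items()
    }


# Twelve orders for shop-1 in one hour, a single order for shop-2 in the next.
orders = [("shop-1", 3600 + i) for i in range(12)] + [("shop-2", 7200)]
labels = label_busy_hours(orders)
```

The resulting labels are the kind of per-shop, per-hour training targets that would then be sent on to the model.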

13. TensorFlow

TensorFlow
What does the
future hold?
(Source: Adobe Stock)

TensorFlow
What?
• Neural network ML library
Superpowers
• Supports incremental ML
• From streaming Kafka data

TensorFlow
Watch out for
• ML over streaming spatiotemporal data with concept drift is tricky
  o Time/space bias
    - Wild model accuracy oscillation
  o Concept shift can result in very low-accuracy models initially
    - Train/use multiple models

Use case 6:
Santa’s elves' toy and box packing
Kafka Streams, ChatGPT, RisingWave, and OpenTelemetry
Streaming joins to match toys and boxes
(Source: Adobe Stock)
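A streaming join that matches toys with boxes can be sketched as two buffers keyed on a shared attribute: each arriving event looks for a waiting partner on the other side, and waits itself if none is found. The join key (a size label here), the `ToyBoxMatcher` class, and the unbounded buffers are assumptions; a real stream processor would normally bound the join with a time window.

```python
from collections import defaultdict, deque


class ToyBoxMatcher:
    """Match each toy with a box that has the same key, in arrival order."""

    def __init__(self):
        self._waiting = {"toy": defaultdict(deque), "box": defaultdict(deque)}

    def on_event(self, kind, key, item_id):
        """kind is "toy" or "box". Returns (toy_id, box_id) on a match, else None."""
        other = "box" if kind == "toy" else "toy"
        candidates = self._waiting[other][key]
        if candidates:
            partner = candidates.popleft()
            return (item_id, partner) if kind == "toy" else (partner, item_id)
        self._waiting[kind][key].append(item_id)  # no partner yet, so wait
        return None


matcher = ToyBoxMatcher()
first = matcher.on_event("toy", "small", "toy-1")    # waits for a small box
second = matcher.on_event("box", "large", "box-1")   # waits for a large toy
third = matcher.on_event("box", "small", "box-2")    # matches toy-1
fourth = matcher.on_event("toy", "large", "toy-2")   # matches box-1
```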

14. OpenTelemetry

OpenTelemetry
X-ray vision!
(Source: Wikimedia Public Domain)

OpenTelemetry
• OpenTelemetry is the new standard for distributed tracing
• Combines tracing (OpenTracing), metrics, and logs
• Automatic instrumentation
• Lots of open source visualization tools
  - Jaeger, SigNoz, Uptrace, etc.
• Used in new client monitoring KIP-714
  - Kafka 3.7.0

SigNoz service map for toy+boxes application

15. RisingWave

RisingWave
Wave processing
(Source: Adobe Stock)

RisingWave
What?
• Stream processing database, also available as a service
Superpowers
• Stateful stream processing
  o SQL syntax
  o Using cloud native storage
  o Potential replacement for Kafka Streams
• PostgreSQL compatible
  o Works with Apache Superset for visualization

16. LLMs

LLMs
The Answer?
(Source: Wikimedia)

LLMs/GenAI
• E.g. ChatGPT
  - not open source
  + there may be suitable open source alternatives for code generation
• Worked well to generate
  + Kafka clients
  + Kafka Streams DSL
  + and test cases
• Not as accurate for RisingWave
  - lack of examples?

Bonus Technologies from my Instaclustr colleagues
● Kafka benchmarking
  ○ Apache JMeter for Kafka benchmarking (Thanks to Anup Shirolkar)
  ○ OpenMessaging (Thanks to Alastair Daivis)
● Strimzi, a Kafka Operator for Kubernetes, and Debezium (CDC using Kafka Connect) (Thanks to Felix Alipaz-Dicke)
● Kafka GUIs (Thanks to Ana-Maria Minda)
  ○ Kafdrop
  ○ AKHQ
  ○ UI for Apache Kafka
  ○ These all work with Kafka + Instaclustr console and provide complementary features

Ballet pattern → Hanoi street intersection pattern
● A working integrated synchronous + asynchronous system

I survived as a pedestrian!

Try us out
• We offer Apache Kafka and these open source technologies as a managed service
• You can use the others with our managed services
• FREE 30-day trial of developer-sized clusters

Paul Brebner | Instaclustr Technology Evangelist
www.Instaclustr.com/paul-brebner → All my blogs
Thank You!