Event-Driven Architecture Masterclass: Engineering a Robust, High-performance EDA

ScyllaDB 84 views 28 slides May 15, 2024

About This Presentation

Discover how to avoid common pitfalls when shifting to an event-driven architecture (EDA) in order to boost system recovery and scalability. We cover Kafka Schema Registry, in-broker transformations, event sourcing, and more.


Slide Content

Engineering a Robust and High-performance EDA with Redpanda
Christina Lin

2
About me
Christina Lin
Developer Advocate, Redpanda
SOA
WebSphere
DB2
Sybase
Oracle
MQ
J2EE
EJB
DevOps
Microservice
EIP
K8s
Agile Integration
Data Mesh
ActiveMQ
Live data stack
Resilience: handle failures and scale gracefully
Elasticity: infrastructure that can scale dynamically
Decentralization: data ownership, empowering individual teams
Performance: low latency and high throughput
Autonomy: self-service, define quality and access
Nimble: efficient data movement
Distributed: distributed data processing for cloud native
Agility: quickly respond to changes in data

3
Robust

Event Driven Architecture
4
Services
Microservices
Databases
IoT Devices
Applications
System B
Team A
Department C
Team D
Group E
Services
Databases
IoT Devices
Applications
System B
Team A
Department C
Team D
Group E
Microservices
Producer
Consumer

Event Driven Architecture
5
Orders
Health records
Restock Signal
CDC Event
Streaming
Table / Materialized view
Data Store
Payroll
Payment
Shipment Signal
Inventory

The Contracts
6
Microservices
Microservices
Databases / CDC
Microservices
Data Lake / Data warehouse
Microservices

Schema Registry
7
Producer
Data structure encoding: Avro, Protobuf, and JSON
Data structure: {name: type}
Serialize
Download the schema (version)
Consumer
Schema Registry
Deserialize
Value (Binary)
Schema ID
Key (Binary)
Value (Binary)
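
The slide shows a schema ID travelling in front of the binary value. A minimal sketch of that framing follows, assuming the Confluent-compatible wire format (one zero "magic" byte, then a 4-byte big-endian schema ID, then the serialized payload). The function names frame and unframe are hypothetical, and the payload is treated as already serialized by Avro, Protobuf, or JSON.

```python
import struct

MAGIC_BYTE = 0  # assumed Confluent-style framing marker


def frame(schema_id: int, payload: bytes) -> bytes:
    """Prefix an already-serialized payload with magic byte + schema ID."""
    # ">bI" = big-endian, 1 signed byte, 4-byte unsigned int (5 bytes total)
    return struct.pack(">bI", MAGIC_BYTE, schema_id) + payload


def unframe(message: bytes) -> tuple:
    """Split a framed message into (schema_id, payload)."""
    if len(message) < 5:
        raise ValueError("message too short to carry a schema ID")
    magic, schema_id = struct.unpack(">bI", message[:5])
    if magic != MAGIC_BYTE:
        raise ValueError(f"unexpected magic byte: {magic}")
    return schema_id, message[5:]


framed = frame(42, b"abc")
schema_id, payload = unframe(framed)
```

A consumer would use the recovered schema ID to download (and usually cache) the matching schema version from the registry before deserializing the payload.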

Schema Registry
8
Server-side validation
Value (Binary)
Schema ID
Key (Binary)
Value (Binary)
Schema Registry
Check if schema ID is valid
Schema Registry
Producer
Compatibility
•Backward
•Forward
•Full
•None
Schema Registry
Version 1
Version 2
Version 3
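
The compatibility modes listed above can be illustrated with a deliberately simplified model. This sketch is an assumption for illustration, not the registry's actual algorithm: a schema is a plain dict of field name to {"type", optional "default"}, and the helper names can_read and is_compatible are hypothetical. Real Avro or Protobuf rules cover more cases (type promotion, unions, aliases).

```python
def can_read(reader: dict, writer: dict) -> bool:
    """True if data written with `writer` can be read using `reader`."""
    for name, spec in reader.items():
        if name in writer:
            if writer[name]["type"] != spec["type"]:
                return False  # same field, different type
        elif "default" not in spec:
            return False  # reader expects a field the writer never wrote
    return True


def is_compatible(new: dict, old: dict, mode: str) -> bool:
    """Check a new schema version against the previous one."""
    if mode == "NONE":
        return True
    backward = can_read(reader=new, writer=old)  # new code reads old data
    forward = can_read(reader=old, writer=new)   # old code reads new data
    if mode == "BACKWARD":
        return backward
    if mode == "FORWARD":
        return forward
    if mode == "FULL":
        return backward and forward
    raise ValueError(f"unknown compatibility mode: {mode}")


v1 = {"id": {"type": "string"}}
v2 = {**v1, "email": {"type": "string", "default": ""}}
v3 = {**v2, "age": {"type": "int"}}  # new field without a default
```

In this model, v2 is fully compatible with v1 because the added field has a default, while v3 breaks backward compatibility with v2 because a reader on v3 has no value to use for "age" when reading older data.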

Schema Registry in Redpanda
9
Service Registry
Service Registry
Restful Endpoint
Restful Endpoint
_schemas
_schemas

Schema Registry
•Assign a default value to any field that you might remove in the future
•Do not rename an existing field; add an alias instead
•When using schema evolution, always provide a default value
•Never delete a required field
When not to use Schema Registry
•You’re certain the schema won’t change in the future
•Hardware resources are limited and low latency is critical (e.g., for IoT), where the added overhead may impact performance
•You want to serialize the data with an unsupported serialization scheme
10
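
The default-value and alias guidelines from the Schema Registry slide can be sketched as a toy resolver. It assumes an Avro-like reader schema expressed as a list of field dicts with optional "aliases" and "default" keys. The function name resolve and the sample fields are hypothetical; a real Avro library performs this resolution for you.

```python
def resolve(record: dict, reader_fields: list) -> dict:
    """Project a record written with an older schema onto the reader schema."""
    out = {}
    for field in reader_fields:
        # Try the current name first, then any older names listed as aliases.
        for name in [field["name"], *field.get("aliases", [])]:
            if name in record:
                out[field["name"]] = record[name]
                break
        else:
            # Field absent from the written data: only a default can save us.
            if "default" not in field:
                raise KeyError(f"no value or default for {field['name']!r}")
            out[field["name"]] = field["default"]
    return out


reader_fields = [
    {"name": "full_name", "aliases": ["name"]},  # renamed via alias
    {"name": "email", "default": None},          # added later, has a default
]
old_record = {"name": "Ada", "phone": "555-0100"}  # "phone" was since removed
resolved = resolve(old_record, reader_fields)
```

The renamed field is found through its alias, the added field falls back to its default, and the removed field is simply ignored, which is why the guidelines above keep old data readable.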

Event validation & DLQ
11
DLQ
Consumer
Correction/Remedy
Validator
DLQ
Correction/Remedy
Validator
DLQ
Correction/Remedy
In-broker validation: how it works
12
Replicate across clusters
customer partition 1
Load to cache
Validate against schema
Transform
Write back to disk with DMA
Customer validated partition 1
Example repo: https://github.com/redpanda-data/redpanda-labs/tree/main/data-transforms/to_avro

In-broker validation & transformation
•First-hand processing, quick filtering
•Simple rerouting determined by the ingested data
•Masking, schema validation
•Stateless, functional processing
When not to use in-broker transformation?
•When it requires external data dependencies
•Windowing or complex processing with multiple input streams
•When the process needs to keep state
13
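
To make "stateless, functional processing" concrete, here is a sketch of the kind of per-record function that suits in-broker transformation: masking plus quick filtering, with no external lookups and no state kept between records. It is written in Python purely for illustration and does not use the Redpanda transform SDK; the field name card_number and the function names are assumptions.

```python
import json


def mask_card(value: str) -> str:
    """Replace all but the last four characters with '*'."""
    return "*" * max(len(value) - 4, 0) + value[-4:]


def transform(record_value: bytes):
    """Return the masked record as bytes, or None to drop the record."""
    try:
        event = json.loads(record_value)
    except ValueError:
        return None  # quick filtering: unparsable input is dropped
    if not isinstance(event, dict) or "card_number" not in event:
        return None
    event["card_number"] = mask_card(str(event["card_number"]))
    return json.dumps(event, sort_keys=True).encode()


masked = transform(b'{"id": 1, "card_number": "4111111111111111"}')
```

The output depends only on the input record, so the function can run once per record on any broker without coordination. Anything that needs joins, windows, or remembered state belongs in a stream processor instead.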

14
High Performance

15
Turning the knobs
Producer
Producer
Producer
Producer
Producer
Producer
Consumer
Consumer
Consumer
Consumer
Consumer
Consumer
Consumer

16
The Broker
Infrastructure
Storage: XFS, NVMe
Network bandwidth
Memory
CPU
Location (Multi-AZ)
OS Disk I/O
read_iops/bandwidth
write_iops/bandwidth
Broker
# Brokers
# Replicas
# Partitions
Log segment size

17
Partitions
Partitions
Producer
Consumer
Consumer
Consumer
Group A
•Round Robin
•Hashing Key Partition
•Custom Partitioner
Overhead
•File handler
•Follower, heartbeat
•Large metadata, quadratic (N^2)
Idempotency
•Order guarantee in partition only
Higher latency
•Producer batch
Consumer rebalance
•RangeAssignor (SW)
•RoundRobinAssignor (SW)
•StickyAssignor (SW)
•CooperativeStickyAssignor
•Static (No Rebalance)
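
The round-robin and key-hashing strategies on the Partitions slide can be sketched as follows. This is an illustration only: zlib.crc32 stands in for the hash function (Kafka clients typically use murmur2), and the function names are hypothetical.

```python
import itertools
import zlib


def hash_partition(key: bytes, num_partitions: int) -> int:
    """Map a key to a partition; equal keys always land together."""
    return zlib.crc32(key) % num_partitions


def round_robin(num_partitions: int):
    """Yield partition numbers in rotation for records without a key."""
    return itertools.cycle(range(num_partitions))


p1 = hash_partition(b"customer-42", 6)
p2 = hash_partition(b"customer-42", 6)

rr = round_robin(3)
first_five = [next(rr) for _ in range(5)]
```

Because ordering is guaranteed within a partition only, hashing by key keeps all events for one customer in order. Note that the mapping depends on the partition count, so adding partitions later can move a key to a different partition.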

18
Producer
Producer
Ack=all
Majority of replicas acknowledge, fsync
write_caching_default=true
flush_bytes, flush_ms
Ack=1
Acknowledgment from the leader
Ack=0
Doesn’t wait for acknowledgments and doesn’t retry sending messages
Producer
batch.size
linger.ms
compression
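
How batch.size and linger.ms interact can be shown with a small simulation. This is a simplified model, not the real client: the Batcher class is hypothetical, time is passed in explicitly, and a batch is sent as soon as its size reaches the threshold (a real producer closes a batch when the next record would not fit).

```python
class Batcher:
    """Toy model of producer batching driven by size and linger time."""

    def __init__(self, batch_size: int, linger_ms: int):
        self.batch_size = batch_size  # bytes that trigger an immediate send
        self.linger_ms = linger_ms    # max wait since the first pending record
        self.pending = []
        self.pending_bytes = 0
        self.first_at = None
        self.sent = []                # list of batches "sent to the broker"

    def add(self, record: bytes, now_ms: int) -> None:
        if self.first_at is None:
            self.first_at = now_ms
        self.pending.append(record)
        self.pending_bytes += len(record)
        if self.pending_bytes >= self.batch_size:
            self.flush()

    def tick(self, now_ms: int) -> None:
        """Called periodically; sends a partial batch once linger expires."""
        if self.pending and now_ms - self.first_at >= self.linger_ms:
            self.flush()

    def flush(self) -> None:
        if self.pending:
            self.sent.append(self.pending)
        self.pending, self.pending_bytes, self.first_at = [], 0, None


b = Batcher(batch_size=10, linger_ms=5)
b.add(b"aaaa", now_ms=0)
b.add(b"bbbb", now_ms=1)
b.add(b"cccc", now_ms=2)   # 12 bytes pending: size threshold reached
b.add(b"dd", now_ms=3)
b.tick(now_ms=7)           # only 4 ms since first pending record: keep waiting
b.tick(now_ms=8)           # 5 ms elapsed: linger expires, partial batch sent
```

Raising either setting tends to pack more records into each request, trading a little latency for throughput, which is the knob the slide is pointing at.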

19
Consumer
Consumer
fetch.min.bytes
max.poll.records
fetch.max.bytes
fetch.max.wait.ms
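
The four consumer settings above can be captured as a small configuration sketch. The baseline values are the Apache Kafka Java consumer defaults as I understand them (verify against your client version), and the override values are illustrative assumptions, not recommendations. The helper consumer_config is hypothetical.

```python
# Baseline: commonly documented Kafka consumer defaults (verify for your client)
KAFKA_DEFAULTS = {
    "fetch.min.bytes": 1,           # reply as soon as any data is available
    "fetch.max.wait.ms": 500,       # max broker wait when min bytes not reached
    "fetch.max.bytes": 52428800,    # 50 MiB cap per fetch response
    "max.poll.records": 500,        # records handed to the app per poll
}

# Illustrative throughput-leaning overrides (assumed values)
THROUGHPUT_OVERRIDES = {
    "fetch.min.bytes": 65536,
    "max.poll.records": 1000,
}


def consumer_config(overrides: dict) -> dict:
    """Merge overrides onto the baseline, rejecting unknown setting names."""
    unknown = set(overrides) - set(KAFKA_DEFAULTS)
    if unknown:
        raise KeyError(f"unknown consumer settings: {sorted(unknown)}")
    return {**KAFKA_DEFAULTS, **overrides}


config = consumer_config(THROUGHPUT_OVERRIDES)
```

Larger fetch.min.bytes makes the broker wait (up to fetch.max.wait.ms) to fill a bigger response, so fewer, fuller fetches are exchanged at the price of added latency.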

High Throughput
•There is no one-size-fits-all; many factors affect performance.
•More partitions allow more parallel processing and hence higher throughput, but they come at a cost. (Avoid over-partitioning or under-partitioning.)
•Experiment with acks settings; enable write caching.
•Explore how the producer batches messages. Increasing batch.size and linger.ms can increase throughput by making the producer add more messages to one batch.
•Explore consumer fetch frequency and message size.
•Start with a baseline configuration and gradually make changes, measuring the impact of each change on performance.
20

21
Robust for Stateful Processes

Beyond just streams of events
22
Databases / CDC
Microservices
Databases / CDC
Databases / CDC
Processor

Beyond just streams of events
23
Databases / CDC
Microservices
Databases / CDC
Databases / CDC
Processor

Limited disk space
24
Event Sourcing
S3
Rehydrate
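
Rehydrating state in an event-sourced system means replaying the event log from the beginning, wherever the older segments are stored. A minimal sketch, assuming a hypothetical inventory domain with "restocked" and "shipped" events; the function names are illustrative.

```python
from functools import reduce


def apply_event(state: dict, event: dict) -> dict:
    """Pure function: return the next state without mutating the input."""
    new_state = dict(state)
    sku, qty = event["sku"], event["qty"]
    if event["type"] == "restocked":
        new_state[sku] = new_state.get(sku, 0) + qty
    elif event["type"] == "shipped":
        new_state[sku] = new_state.get(sku, 0) - qty
    else:
        raise ValueError(f"unknown event type: {event['type']}")
    return new_state


def rehydrate(events) -> dict:
    """Rebuild current state by folding over the full event history."""
    return reduce(apply_event, events, {})


history = [
    {"type": "restocked", "sku": "widget", "qty": 10},
    {"type": "shipped", "sku": "widget", "qty": 3},
    {"type": "restocked", "sku": "gadget", "qty": 5},
]
inventory = rehydrate(history)
```

Because the log is the source of truth, the same replay works whether events come from local disk or from older segments held in object storage; the cost is that replay time grows with the length of the history.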

State Snapshot
25
Microservices
Databases / CDC
Databases / CDC
Databases / CDC
Processor
Snapshot
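
A state snapshot lets a processor recover without replaying the whole log: restore the snapshot, then replay only the events after the recorded offset. A minimal sketch, assuming the log is a list of (key, delta) pairs indexed by offset; the function names are hypothetical.

```python
def take_snapshot(state: dict, offset: int) -> dict:
    """Capture state together with the last offset it includes."""
    return {"offset": offset, "state": dict(state)}


def restore(snapshot: dict, log: list) -> dict:
    """Rebuild state from a snapshot plus the events that came after it."""
    state = dict(snapshot["state"])
    for offset in range(snapshot["offset"] + 1, len(log)):
        key, delta = log[offset]
        state[key] = state.get(key, 0) + delta
    return state


log = [("a", 1), ("b", 2), ("a", 3), ("b", -1)]

# Full replay: an empty snapshot taken "before" offset 0
full = restore({"offset": -1, "state": {}}, log)

# Snapshot taken after processing offsets 0 and 1, then a crash and restart
snap = take_snapshot({"a": 1, "b": 2}, offset=1)
recovered = restore(snap, log)
```

Both paths reach the same state, but the snapshot path reads two events instead of four. Storing the offset with the state is what keeps recovery from double-applying or skipping events.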

Summary
■Use a schema to ensure the data shape for consumers
■When designing, think about compatibility
■Validation ensures consumers always get the correct format
■In-broker transforms are great for simple, functional, stateless processes
■Provision an appropriate number of partitions for your topics
■Depending on your use case, always set the right Ack and buffer for the producer
■For stateful stream processing, use snapshots for fault tolerance
26

On demand example
27
Batch
Every 10 mins
CSV
CSV
Batch pipeline
Batch Processing
Batch
pipeline
Right away!
Stream
CSV

Keep in touch!
Christina Lin
Developer Advocate
Redpanda
[email protected]