Event-Driven Architecture Masterclass: Engineering a Robust, High-performance EDA
ScyllaDB
28 slides
May 15, 2024
About This Presentation
Discover how to avoid common pitfalls when shifting to an event-driven architecture (EDA) in order to boost system recovery and scalability. We cover Kafka Schema Registry, in-broker transformations, event sourcing, and more.
Size: 5.31 MB
Language: en
Added: May 15, 2024
Slides: 28 pages
Slide Content
Engineering a Robust and High-performance EDA with Redpanda
Christina Lin
About me
Christina Lin
Developer Advocate, Redpanda
SOA, WebSphere, DB2, Sybase, Oracle, MQ, J2EE, EJB, DevOps, Microservice, EIP, K8s, Agile Integration, Data Mesh, ActiveMQ
Live data stack
Resilience: handle failures and scale gracefully
Elasticity: infrastructure that can scale dynamically
Decentralization: data ownership, empowering individual teams
Performance: low latency and high throughput
Autonomy: self-service, define quality and access
Nimble: efficient data movement
Distributed: distributed data processing for cloud native
Agility: quickly respond to changes in data
Robust Event-Driven Architecture
[Diagram: producers and consumers exchanging events. Labels: Services, Microservices, Databases, IoT Devices, Applications, System B, Team A, Department C, Team D, Group E]
Event-Driven Architecture
[Diagram labels: Orders, Health records, Restock Signal, CDC Event, Streaming, Table/Materialized view, Data Store, Payroll, Payment, Shipment Signal, Inventory]
The Contracts
[Diagram labels: Microservices, Databases/CDC, Data lake/Data warehouse]
Schema Registry
• Data structure encoding: Avro, Protobuf, and JSON
• Data structure: {name: type}
[Diagram: the producer serializes a record, the consumer deserializes it, and the schema (version) is downloaded from the Schema Registry. Record layout: Key (binary), Schema ID, Value (binary)]
Schema Registry
Server-side validation: when a producer writes a record, the broker checks whether its schema ID is valid in the Schema Registry.
Record layout: Key (binary), Schema ID, Value (binary)
Compatibility modes across schema versions (Version 1, Version 2, Version 3):
• Backward
• Forward
• Full
• None
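To make the four compatibility modes concrete, here is a simplified sketch. It assumes Avro-like record schemas expressed as Python dicts and only looks at field names and defaults; real registries also check types, promotions, and nested records. The names `can_read` and `is_compatible` are illustrative, not part of any registry API.

```python
def _names(schema):
    return {field["name"] for field in schema["fields"]}


def can_read(reader, writer):
    """True if `reader` can decode data written with `writer` (names/defaults only)."""
    written = _names(writer)
    return all(f["name"] in written or "default" in f for f in reader["fields"])


def is_compatible(mode, old, new):
    backward = can_read(new, old)  # new schema reads data written with the old one
    forward = can_read(old, new)   # old schema reads data written with the new one
    return {
        "BACKWARD": backward,
        "FORWARD": forward,
        "FULL": backward and forward,
        "NONE": True,
    }[mode]


v1 = {"fields": [{"name": "id"}, {"name": "name"}]}
v2 = {"fields": [{"name": "id"}, {"name": "name"}, {"name": "email", "default": None}]}
v3 = {"fields": [{"name": "id"}, {"name": "email", "default": None}]}  # drops "name"
```

With these hypothetical versions, going from `v1` to `v2` is fully compatible, while going from `v1` to `v3` stays backward compatible but breaks forward compatibility, because old readers still require `name`.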
Schema Registry in Redpanda
[Diagram labels: Service Registry, RESTful endpoint, _schemas]
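As a rough illustration of the RESTful endpoint in the diagram, the sketch below builds (but does not send) a schema registration request. It assumes the Confluent-compatible route `POST /subjects/{subject}/versions`; the base URL, subject name, and the helper `build_register_request` are hypothetical placeholders.

```python
import json
import urllib.request


def build_register_request(base_url, subject, schema):
    """Build a POST request that registers `schema` under `subject`."""
    # The registry expects the schema itself as a JSON-encoded string.
    body = json.dumps({"schema": json.dumps(schema)}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/subjects/{subject}/versions",
        data=body,
        headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
        method="POST",
    )


customer_schema = {
    "type": "record",
    "name": "Customer",
    "fields": [{"name": "id", "type": "long"}, {"name": "name", "type": "string"}],
}
request = build_register_request("http://localhost:8081", "customer-value", customer_schema)
# To send it against a running registry: urllib.request.urlopen(request)
```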
Schema Registry
• Assign a default value to fields that you might remove in the future
• Do not rename an existing field; add an alias instead
• When using schema evolution, always provide a default value
• Never delete a required field
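The guidelines above can be seen in action with a small sketch of how a reader schema with aliases and defaults copes with older records. This mimics the idea of Avro schema resolution in plain Python and is not a real Avro decoder; the schema, field names, and `read_with` helper are made up for illustration.

```python
def read_with(reader_schema, record):
    """Project an older record onto the reader schema using aliases and defaults."""
    result = {}
    for field in reader_schema["fields"]:
        # Try the current name first, then any older names listed as aliases.
        for name in [field["name"], *field.get("aliases", [])]:
            if name in record:
                result[field["name"]] = record[name]
                break
        else:
            if "default" not in field:
                raise KeyError(f"missing field without default: {field['name']}")
            result[field["name"]] = field["default"]
    return result


reader_v2 = {
    "fields": [
        {"name": "id"},
        {"name": "full_name", "aliases": ["name"]},  # renamed via alias
        {"name": "email", "default": ""},            # new field with a default
    ]
}
old_record = {"id": 1, "name": "Ada"}
```

Reading `old_record` with `reader_v2` yields the renamed `full_name` field and an empty `email`, so older data stays readable.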
When not to use Schema Registry
• You're certain the schema won't change in the future
• Hardware resources are limited and low latency is critical, so the registry may impact performance (e.g., for IoT)
• You want to serialize the data with an unsupported serialization scheme
In-broker validation: how it works
[Diagram labels: Replicate across clusters; "customer" partition 1; Load to cache; Validate against schema; Transform; Write back to disk with DMA; "Customer validated" partition 1]
Example repo: https://github.com/redpanda-data/redpanda-labs/tree/main/data-transforms/to_avro
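The validate step in the flow above might look roughly like the following. This is a plain-Python simulation, not the Redpanda data transforms SDK and not the code in the linked repo; the expected schema and the function names are assumptions made for the example.

```python
import json

# Hypothetical schema for the "customer" topic: field name -> required type.
CUSTOMER_SCHEMA = {"id": int, "name": str}


def is_valid(raw: bytes) -> bool:
    """Check that a raw record is JSON with exactly the expected fields and types."""
    try:
        record = json.loads(raw)
    except ValueError:  # malformed JSON or undecodable bytes
        return False
    if not isinstance(record, dict) or set(record) != set(CUSTOMER_SCHEMA):
        return False
    return all(type(record[k]) is t for k, t in CUSTOMER_SCHEMA.items())


def validate_batch(batch):
    """Keep only valid records; these would be written to the validated topic."""
    return [raw for raw in batch if is_valid(raw)]
```

Because each record is judged on its own, the function needs no state and no external lookups.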
In-broker validation & transformation
• First-hand processing and quick filtering
• Simple rerouting determined by the ingested data
• Masking, schema validation
• Stateless, functional processing
When not to use in-broker transformation?
• When it requires external data dependencies
• For windowing or complex processing with multiple input streams
• When the process needs to keep state
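A stateless transform of the kind listed above (masking plus content-based rerouting) can be sketched as a pure function. The field names, topic names, and masking rule below are hypothetical; a real in-broker transform would be written against the broker's own transform SDK rather than in plain Python.

```python
from typing import Tuple


def mask_email(email: str) -> str:
    """Keep the first character and the domain; hide the rest of the local part."""
    local, _, domain = email.partition("@")
    return local[:1] + "***@" + domain


def transform(record: dict) -> Tuple[str, dict]:
    """Return (output_topic, masked_record) based only on the record itself."""
    masked = dict(record, email=mask_email(record["email"]))
    topic = "orders-eu" if record["region"] == "EU" else "orders-other"
    return topic, masked
```

The output depends only on the input record, which is what makes this a good fit for in-broker processing. Anything that needs joins, windows, or remembered state belongs in a separate stream processor.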
High Throughput
• There is no one-size-fits-all answer; many factors affect performance.
• More partitions allow more parallel processing and hence higher throughput, but they come at a cost. (Avoid over-partitioning or under-partitioning.)
• Experiment with acks settings and enable write caching.
• Explore how the producer batches messages. Increasing the values of batch.size and linger.ms can increase throughput by making the producer add more messages to each batch.
• Explore consumer fetch frequency and message size.
• Start with a baseline configuration and change it gradually, measuring the impact of each change on performance.
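To see why batching settings matter, the sketch below compares a baseline and a tuned producer configuration with a back-of-the-envelope model. The keys `acks`, `batch.size`, and `linger.ms` are standard Kafka producer settings; the values are illustrative only, and the model assumes every batch fills completely, which real workloads rarely do.

```python
import math

# Illustrative settings, not recommendations.
baseline = {"acks": "all", "batch.size": 16384, "linger.ms": 0}
tuned = {**baseline, "batch.size": 131072, "linger.ms": 10}


def batches_needed(num_messages: int, message_bytes: int, batch_size: int) -> int:
    """Best-case number of batches if each one is filled to batch_size."""
    per_batch = max(1, batch_size // message_bytes)
    return math.ceil(num_messages / per_batch)


baseline_batches = batches_needed(10_000, 1024, baseline["batch.size"])
tuned_batches = batches_needed(10_000, 1024, tuned["batch.size"])
```

Fewer, larger batches mean fewer round trips to the broker, at the price of a little extra latency from `linger.ms`. Measure each change against the baseline rather than trusting the model.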
Robust for Stateful Processes
Beyond just streams of events
[Diagram labels: Databases/CDC, Microservices, Processor]
Summary
■ Use schemas to ensure the shape of the data for consumers
■ When designing, think about compatibility
■ Validation ensures consumers always get the correct format
■ In-broker transforms are great for simple, functional, stateless processes
■ Provision an appropriate number of partitions for your topics
■ Depending on your use case, set the right acks and buffer (batching) settings for producers
■ For stateful stream processing, use snapshots for fault tolerance
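The last point, snapshots for fault tolerance, can be illustrated with a toy stateful processor. This is a minimal sketch assuming state small enough to serialize as JSON to a local file; the class name, file layout, and offset handling are hypothetical and far simpler than what a real stream processor does.

```python
import json
import os


class CountingProcessor:
    """Counts events per key and remembers the next offset to read."""

    def __init__(self, counts=None, next_offset=0):
        self.counts = dict(counts or {})
        self.next_offset = next_offset

    def apply(self, offset, key):
        self.counts[key] = self.counts.get(key, 0) + 1
        self.next_offset = offset + 1

    def snapshot(self, path):
        # Write to a temp file first so a crash never leaves a half-written snapshot.
        tmp = path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            json.dump({"counts": self.counts, "next_offset": self.next_offset}, f)
        os.replace(tmp, path)

    @classmethod
    def restore(cls, path):
        with open(path, encoding="utf-8") as f:
            saved = json.load(f)
        return cls(saved["counts"], saved["next_offset"])
```

After a crash, a new instance restored from the snapshot resumes at `next_offset` and replays only the events after it, instead of reprocessing the whole topic.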
On-demand example
[Diagram: a batch pipeline processes CSV files every 10 mins; a stream pipeline processes each CSV right away]
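The contrast in the diagram can be sketched in a few lines: instead of waiting for a scheduled batch job, each CSV row is turned into an event as soon as it is read. The function name and column names are hypothetical, and the events are simply JSON strings rather than records produced to a real topic.

```python
import csv
import json
from typing import Iterable, Iterator


def rows_as_events(lines: Iterable[str]) -> Iterator[str]:
    """Yield one JSON event per CSV row, lazily, as the lines arrive."""
    for row in csv.DictReader(lines):
        yield json.dumps(row)


incoming = ["id,amount\n", "1,9.5\n", "2,12.0\n"]
events = list(rows_as_events(incoming))
```

Because `rows_as_events` is a generator, a caller can publish each event immediately rather than collecting the whole file first.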
Keep in touch!
Christina Lin
Developer Advocate
Redpanda