Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Across Event Streams

ScyllaDB, 15 slides, May 15, 2024

About This Presentation

We start by establishing common ground: why relational databases fall short, and which characteristics are common in EDA, such as the need for real-time response times and schemaless approaches that adapt to recurring changes and on-board new use cases. Next, interact with a sample Rust-ba...


Slide Content

Integrating Distributed Data Stores Across Event Streams Felipe Cardeneti Mendes

Felipe Cardeneti Mendes
- Solution Architect at ScyllaDB
- Published Author
- Linux and Open Source enthusiast

DDD? EDA? Event Sourcing? Event Streaming? SAGA?

Complexity Exists on Both Sides… But for Different Reasons
Highly Encouraged: Martin Kleppmann, Event Sourcing and Stream Processing at Scale

PageView Event Example
- Track whenever a page gets viewed… So what?
  - Who viewed your profile?
  - People also viewed …
  - Reporting
  - Relevance Models (e.g., Result Ranking)
  - Metrics
From Martin Kleppmann, Event Sourcing and Stream Processing at Scale

Ok… so why do we need ScyllaDB then?
This is where we (slightly) diverge from Martin:
- Super slow – But doesn't have to be!
From Martin Kleppmann, Event Sourcing and Stream Processing at Scale

So you're telling me to ditch Stream Processing?
- Well… No. Although you could.
- Data and Domain dependent
  - Full-text searches; Ad-hoc querying; Joins
- Stateless vs Stateful Processing
  - Keep doing Stateless (or semi) as you know
  - Re-think your Stateful Processing strategy
- We are (almost) a decade past 2016

Let's Get to Practice

(Very) High-Level Overview
GitHub Project: fee-mendes/eda-socialnetwork

Tracking Events
- Each event is uniquely identified
  - Great for even distribution and performance
  - Auditing, history of interactions, powering Batch Analytics
  - Bad for aggregations (including sliding windows) – Blocker for Real-time analytics
- Potential workaround: Map event to entity+type, cluster by timestamp
- Actual data model:

    CREATE TABLE IF NOT EXISTS ks.events (
        id uuid,
        ts timestamp,
        event_type text,
        src_page text,
        PRIMARY KEY(id, event_type, ts)
    )
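The workaround named on this slide (map each event to entity+type, cluster by timestamp) can be illustrated with a small in-memory sketch. This is not code from the talk or the GitHub project: `EventsByEntity`, its methods, and the millisecond timestamps are hypothetical names chosen for illustration, with a `HashMap` standing in for partitions and a `BTreeSet` standing in for clustering order. The point is only that once rows for one entity+type are kept sorted by time, a sliding-window count becomes a range scan over a single partition.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical in-memory stand-in for a table partitioned by
/// (entity, event_type) and clustered by timestamp.
#[derive(Default)]
pub struct EventsByEntity {
    // partition key -> rows ordered by clustering key (ts in ms, event id)
    partitions: HashMap<(String, String), BTreeSet<(i64, u128)>>,
}

impl EventsByEntity {
    /// Store one event under its entity+type partition.
    pub fn insert(&mut self, entity: &str, event_type: &str, ts_ms: i64, event_id: u128) {
        self.partitions
            .entry((entity.to_string(), event_type.to_string()))
            .or_default()
            .insert((ts_ms, event_id));
    }

    /// Count events in the half-open window [from_ms, to_ms).
    /// Only one partition is touched, and only the rows inside the window.
    pub fn count_in_window(
        &self,
        entity: &str,
        event_type: &str,
        from_ms: i64,
        to_ms: i64,
    ) -> usize {
        if from_ms >= to_ms {
            return 0; // empty or inverted window
        }
        let key = (entity.to_string(), event_type.to_string());
        match self.partitions.get(&key) {
            Some(rows) => rows.range((from_ms, u128::MIN)..(to_ms, u128::MIN)).count(),
            None => 0,
        }
    }
}
```

With the original `ks.events` layout, where the partition key is the event id, the same question would need a scan across all partitions, which is the aggregation blocker the slide mentions.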

Counter Tables (Post Likes, Profile Views, Page Hits)
- Well… Used for counting things :-)
- The good:
  - Simple
  - Highly performant (no aggregations needed, hooray!)
- Problem: Misses context
  - Who viewed a profile?
  - Which pages were popular for users within a given region?
  - Was a post liked by similar users?
- Hence the importance of defining your events table upfront
- Actual data model:

    CREATE TABLE IF NOT EXISTS ks.post_likes (
        post_id uuid PRIMARY KEY,
        count counter
    )
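To make the "misses context" point concrete, here is a minimal sketch contrasting what a counter can answer with what the underlying events can answer. The `ViewEvent` shape and both function names are assumptions made for illustration and do not come from the talk; plain in-memory collections stand in for the tables.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical event shape: who viewed which profile.
pub struct ViewEvent {
    pub profile: String,
    pub viewer: String,
}

/// What a counter table would hold: one number per profile, nothing else.
pub fn view_counts(events: &[ViewEvent]) -> HashMap<String, u64> {
    let mut counts = HashMap::new();
    for e in events {
        *counts.entry(e.profile.clone()).or_insert(0) += 1;
    }
    counts
}

/// What only the events can answer: the distinct viewers of a profile.
pub fn viewers_of(events: &[ViewEvent], profile: &str) -> BTreeSet<String> {
    events
        .iter()
        .filter(|e| e.profile == profile)
        .map(|e| e.viewer.clone())
        .collect()
}
```

The counter can always be rebuilt from the events, but not the other way around, which is why the events table has to be defined upfront.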

Relationships (Follow and Followers)
- Easily done (and consistent): If Y is followed by X, then X follows Y
- Simply materialize keys in the opposite direction
- Though admittedly it takes a while to get the gist of it
- Actual (base) data model:

    CREATE TABLE IF NOT EXISTS ks.follows (
        id uuid,
        follower uuid,
        ts timestamp,
        PRIMARY KEY(id, follower)
    )

- Swap the two key columns (id, follower) to materialize the opposite direction
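A rough sketch of "materialize keys in the opposite direction": every follow is written twice, once keyed by the followed user and once keyed by the follower, so both questions are a single-key lookup. `SocialGraph` and its methods are hypothetical names, `u64` ids replace the `uuid` columns for brevity, and the two maps are in-memory stand-ins for the base table and its swapped counterpart.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical in-memory stand-in for the follows relationship,
/// kept in both directions.
#[derive(Default)]
pub struct SocialGraph {
    // Mirrors ks.follows: id -> set of followers.
    followers: HashMap<u64, BTreeSet<u64>>,
    // The same pairs with the two keys swapped: follower -> set of ids.
    following: HashMap<u64, BTreeSet<u64>>,
}

impl SocialGraph {
    /// Record that `follower` follows `followed`, in both directions.
    pub fn follow(&mut self, follower: u64, followed: u64) {
        self.followers.entry(followed).or_default().insert(follower);
        self.following.entry(follower).or_default().insert(followed);
    }

    /// Who follows `id`? Single lookup in the base direction.
    pub fn followers_of(&self, id: u64) -> Vec<u64> {
        self.followers
            .get(&id)
            .map(|s| s.iter().copied().collect())
            .unwrap_or_default()
    }

    /// Whom does `id` follow? Single lookup in the swapped direction.
    pub fn following_of(&self, id: u64) -> Vec<u64> {
        self.following
            .get(&id)
            .map(|s| s.iter().copied().collect())
            .unwrap_or_default()
    }
}
```

In a real database the two writes are separate operations, so keeping them in step is part of the consistency discussion on the next slide.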

Improvement Thoughts
- Consistency guarantees – Apply common patterns such as:
  - Outbox – For non-idempotent operations (DON'T overuse it)
  - Listen to Yourself
- Publish database events (CDC):
  - Strongly consistent
  - Push notifications!
  - Feed other downstream app/services (e.g., support full-text, ad-hoc, etc.)
- WebSockets for real-time communication / async callbacks?
- You name it! ;-)
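For readers unfamiliar with the Outbox pattern mentioned above, the following is a generic, in-memory sketch of the idea rather than anything shown in the talk: the state change and the outgoing message are recorded together, and a separate relay step publishes pending messages, removing each one only after it was published. `Service`, `OutboxEntry`, `credit`, and `relay` are hypothetical names, and the sketch assumes the state write and the outbox write can be made atomic in the real store.

```rust
use std::collections::VecDeque;

/// Hypothetical outgoing message waiting to be published.
#[derive(Debug, Clone, PartialEq)]
pub struct OutboxEntry {
    pub seq: u64,
    pub payload: String,
}

/// Hypothetical service holding non-idempotent state plus its outbox.
#[derive(Default)]
pub struct Service {
    pub balance: i64,
    outbox: VecDeque<OutboxEntry>,
    next_seq: u64,
}

impl Service {
    /// Apply the state change and record the outgoing message in one step.
    /// (Assumption: a real store would make these two writes atomic.)
    pub fn credit(&mut self, amount: i64) {
        self.balance += amount;
        self.outbox.push_back(OutboxEntry {
            seq: self.next_seq,
            payload: format!("credited:{}", amount),
        });
        self.next_seq += 1;
    }

    /// Publish pending entries in order. An entry is removed only after
    /// `publish` reports success; on failure the relay stops and the entry
    /// stays queued for the next run. Returns how many entries were sent.
    pub fn relay<F>(&mut self, mut publish: F) -> usize
    where
        F: FnMut(&OutboxEntry) -> bool,
    {
        let mut sent = 0;
        while let Some(entry) = self.outbox.front() {
            if !publish(entry) {
                break;
            }
            self.outbox.pop_front();
            sent += 1;
        }
        sent
    }

    /// Number of entries still waiting to be published.
    pub fn pending(&self) -> usize {
        self.outbox.len()
    }
}
```

The state is changed exactly once, while publishing can be retried safely, which is why the slide reserves the pattern for non-idempotent operations.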

User Extreme (and Inspiring) Stories
- How Numberly Replaced Kafka with Rust + ScyllaDB
- How Palo Alto Networks Replaced Kafka with ScyllaDB
- From 1M to 1B Features Per Second: Scaling ShareChat’s ML Feature Store
- Elasticity vs. State? Exploring Kafka Streams Cassandra State Store

Keep in touch!
Felipe Cardeneti Mendes
Solution Architect, ScyllaDB
Find me on LinkedIn!