Event-Driven Architecture Masterclass: Challenges in Stream Processing

ScyllaDB 174 views 24 slides May 15, 2024

About This Presentation

Discuss the core tradeoffs and considerations involved in order-free and ordered stream processing. Brian Taylor walks through the pros and cons of three different approaches: no data dependency, deferred inter-event data dependency, and streaming inter-event data dependency.

Slide Content

A Tale of 3 Pipelines Brian Taylor

Reference Architectures
Stream to Stream. What it is: Webhook system. How it scales: Money = Performance.
Stream to State. What it is: ODP “Analytic segments”. How it scales: Write: Money = Performance; Read: Data dependency limited.
Stateful Stream to Stream. What it is: ODP “Real-Time Segments”. How it scales: Data dependency limited.

Inter-Event Data Dependency
A property of the stream and the problem. It measures the way that events impact the processing of the events that follow them. For example:
A stream of record mutations that must be applied one after another within a single record id has inter-event data dependency.
A stream with many records and no sequential mutations for any given record id has no data dependency.
A stream with a single record id and only sequential mutations has maximum data dependency.
A topic with a single partition has maximum data dependency (in some sense).
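The examples above can be made concrete with a small sketch that measures chain length per record id. It is illustrative only: `average_chain_length` is a hypothetical helper, and it assumes that every event sharing a record id must be applied in order, so each id forms one data-dependent chain.

```python
from collections import Counter


def average_chain_length(record_ids):
    """Average number of events per record id.

    Assumption: all events for one record id form a single chain that
    must be processed sequentially.
    """
    lengths = Counter(record_ids)
    if not lengths:
        return 0.0
    return sum(lengths.values()) / len(lengths)


# Many records, no repeated record id: no data dependency.
independent = ["a", "b", "c", "d"]
# One record id, only sequential mutations: maximum data dependency.
sequential = ["a", "a", "a", "a"]

print(average_chain_length(independent))  # 1.0
print(average_chain_length(sequential))   # 4.0
```

An average of 1.0 means every event can be processed on its own; the closer the average gets to the stream length, the more the stream behaves like a single sequential job.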

Inter-Event Data Dependency
The average length of the data-dependent chains in your stream decides your average throughput at any scale. This is equivalent to the way “the sequential portion” of a problem constrains the maximum parallel speedup in Amdahl’s Law.
S: max speedup, s: parallelism, p: “data dependency fraction”
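The formula shown on this slide did not survive extraction. As a hedged sketch, the standard form of Amdahl's Law with the slide's symbols would be S = 1 / (p + (1 - p) / s), assuming p (the “data dependency fraction”) plays the role of the sequential portion. The function name `max_speedup` is hypothetical.

```python
def max_speedup(p: float, s: float) -> float:
    """Amdahl's Law: S = 1 / (p + (1 - p) / s).

    p: fraction of the work that is data dependent (assumed sequential).
    s: parallelism applied to the remaining (1 - p) of the work.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if s < 1:
        raise ValueError("s must be at least 1")
    return 1.0 / (p + (1.0 - p) / s)


print(max_speedup(0.0, 8))     # no data dependency: speedup equals parallelism
print(max_speedup(1.0, 8))     # fully data dependent: no speedup
print(max_speedup(0.5, 1000))  # half dependent: speedup stays below 2
```

Under this reading, the speedup can never exceed 1 / p no matter how much parallelism is added, which matches the deck's claim that data dependency caps throughput at any scale.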

Big Idea
“It’s all about the data-dependency, baby”
No data dependency: Smooth scaling
Data dependency: Navigating hell

No Data Dependency (∅DD)

Reference Architecture: Stream to Stream
What it is: Webhook system
How it scales: Money = Performance
Diagram labels: Subscription information, Change Notifications, Delivery Requests

What you can do with ∅DD
Abstractly: Data reshaping; Order-independent enrichment; Non-self joins
Concrete Use Cases: Adapters; Sentiment detectors; Geo-IP mappers; Redaction
If no external data access is required: Redpanda transforms FTW!
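A redaction step is a convenient illustration of ∅DD, because each event is transformed without reference to any other event. The sketch below is a minimal example under stated assumptions: the event strings, the `redact` function, and the simplified email pattern are all hypothetical, and a thread pool stands in for independent workers.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Simplified email pattern, for illustration only.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def redact(event: str) -> str:
    """Replace email addresses in a single event; uses no shared state."""
    return EMAIL.sub("[REDACTED]", event)


events = [
    "login user=ann@example.com",
    "ping",
    "signup user=bob@example.org",
]

# Each event depends only on itself, so workers can process them independently.
with ThreadPoolExecutor(max_workers=3) as pool:
    parallel = list(pool.map(redact, events))

sequential = [redact(e) for e in events]
print(parallel == sequential)  # True
```

Because the transform has no memory of earlier events, the parallel and sequential runs agree, which is the property that lets this class of pipeline scale by adding workers.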

Performance Tradespace
More money = More Throughput
Tactics: Add shards and partitions until you have enough capacity

Deferred Inter-Event Data Dependency (DEDD)

Reference Architecture: Stream to State
What it is: ODP “Analytic segments”; Optimizely Experimentation
How it scales: Write: Money = Performance; Read: Data dependency limited

What you can do with DEDD
Abstractly: Use it when write performance is more important than read performance
Concrete Use Cases: Reporting (especially when users read less than they write); Nightly model training
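One way to picture deferring the data dependency is a store whose writes are order-free appends and whose reads resolve the ordering. This is a sketch under assumptions, not the architecture from the talk: the `StreamToState` class and the per-event `sequence` number are hypothetical.

```python
from collections import defaultdict


class StreamToState:
    """Order-free writes; the data dependency is resolved at read time."""

    def __init__(self):
        # record_id -> list of (sequence, field, value) mutations
        self._log = defaultdict(list)

    def write(self, record_id, sequence, field, value):
        # Append only: no ordering work happens on the write path.
        self._log[record_id].append((sequence, field, value))

    def read(self, record_id):
        # The sequential work is deferred to here: sort, then fold in order.
        state = {}
        mutations = self._log.get(record_id, [])
        for _, field, value in sorted(mutations, key=lambda m: m[0]):
            state[field] = value
        return state


store = StreamToState()
store.write("u1", 2, "plan", "pro")   # arrives before sequence 1
store.write("u1", 1, "plan", "free")
store.write("u1", 3, "seats", 5)
print(store.read("u1"))  # {'plan': 'pro', 'seats': 5}
```

The write path costs the same regardless of chain length, while the read path pays for the full chain of each record it touches, which is why this shape suits workloads that read less than they write.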

Performance Tradespace
Write side: More money = More Speed
Read side: Data-dependency limited
Tactics: Reduce data dependency with finer grained partitioning
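The effect of finer grained partitioning can be sketched by comparing the longest dependent chain under two partition keys. The event fields (`account`, `user`) and the `longest_chain` helper are hypothetical, and the sketch assumes the problem really does allow events for different users to be processed independently.

```python
from collections import Counter

events = [
    {"account": "acme", "user": "u1"},
    {"account": "acme", "user": "u2"},
    {"account": "acme", "user": "u1"},
    {"account": "acme", "user": "u3"},
]


def longest_chain(events, key_fields):
    """Length of the longest run of events that share a partition key."""
    keys = Counter(tuple(e[f] for f in key_fields) for e in events)
    return max(keys.values(), default=0)


coarse = longest_chain(events, ["account"])        # one chain holds every event
fine = longest_chain(events, ["account", "user"])  # chains split per user
print(coarse, fine)  # 4 2
```

The finer key only helps when correctness does not require ordering across users within an account; if it does, the chain length is set by the problem and not by the partitioning scheme.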

Streaming Inter-Event Data Dependency (SEDD)

Reference Architecture: Stateful Stream to Stream
What it is: ODP “Real-Time Segments”
How it scales: Data dependency limited

What you can do with SEDD
Abstractly: Streaming aggregates; Pattern detectors
Concrete Use Cases: Segmentation; Real time model training

Performance Tradespace
Throughput: Data-dependency limited
Tactics for reducing data-dependency: Finer grained partitioning; Accept eventual consistency with CRDTs
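To illustrate the CRDT tactic, here is a minimal sketch of a grow-only counter (G-Counter), one of the simplest CRDTs. It is not taken from the talk; the `GCounter` class and the node names are illustrative. Each node counts independently, and replicas converge through a merge that is commutative, associative, and idempotent.

```python
class GCounter:
    """Grow-only counter CRDT: one monotonically increasing slot per node."""

    def __init__(self, counts=None):
        self.counts = dict(counts or {})

    def increment(self, node, amount=1):
        if amount < 0:
            raise ValueError("a G-Counter can only grow")
        self.counts[node] = self.counts.get(node, 0) + amount

    def merge(self, other):
        # Element-wise maximum, so merge order and repetition do not matter.
        nodes = self.counts.keys() | other.counts.keys()
        return GCounter(
            {n: max(self.counts.get(n, 0), other.counts.get(n, 0)) for n in nodes}
        )

    def value(self):
        return sum(self.counts.values())


a = GCounter()
b = GCounter()
a.increment("node-a", 3)  # processed on one worker
b.increment("node-b", 2)  # processed concurrently on another worker

print(a.merge(b).value())  # 5
print(b.merge(a).value())  # 5
```

The two workers never coordinate while counting, so the streaming aggregate is no longer a single sequential chain; the cost is that any one replica may lag behind the true total until it merges.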

Fundamental Tradeoff
Inter-event data dependency vs. max throughput
If you need SEDD and throughput, welcome to hell.

But…

Query Latency and Data Latency
Query Latency: Time it takes to respond to a request. Driven by: DD work remaining to resolve the request. Impact: The places where it’s suitable to use your query API.
Data Latency: How long it takes for new information to impact a query. Driven by: How you cheated to hide from your data dependency. Impact: How actionable the results from your API are.

“Cheating” out of Hell
Stream to State, followed by Periodic State to Stream
What it is: Everyone else’s “Real-Time Segments”
How it scales: Introduces a data latency / cost tradeoff; Min-data latency is now data dependency limited

Wrapping it Up

Data Dependency Decides Everything
∅DD: Oddly common in example code and marketing materials. Very rarely happens in real life.
DEDD: Practical workaround most of the time. Became truly effective in the last decade as data warehouses have matured.
SEDD: Sounds like “sad” for a reason. A difficult place to be. Hopefully the next decade will bring some meaningful breakthroughs here.

Keep in touch!
Brian Taylor, Director of Engineering, Optimizely
[email protected] @netguy204
