Streaming vs batching (Conundrum AI internal meetup)

MarkAndreev1 10 views 15 slides May 01, 2024

About This Presentation

Discussion about streaming & batching


Slide Content

Streaming vs Batching
Conundrum, May 2020
Mark Andreev

Agenda
-About data processing pipeline
-Batching approach
-Streaming approach
-Architecture
-Tools
-Conclusion
https://clck.ru/VQL9W

Data pipeline
-Event := (timestamp, payload)
-Pipeline := F(Event_1 … Event_n; state)

Examples:
-Web Banner CTR
-Forecast Ads Budget Consumption
-Fraud prevention
-Compute analytics over session windows

Batch: latency of hours
Streaming: latency of minutes or seconds
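
The definitions above can be read as a fold of events into state. A minimal sketch for the Web Banner CTR example follows. The payload fields `banner` and `kind` and the function names are assumptions made for illustration; they do not come from the slides.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class Event:
    timestamp: float  # when the event happened (seconds)
    payload: dict     # hypothetical shape: {"banner": str, "kind": "view" | "click"}


State = Dict[str, Tuple[int, int]]  # banner -> (views, clicks)


def ctr_pipeline(events: Iterable[Event], state: State) -> State:
    """Pipeline := F(Event_1 ... Event_n; state): fold events into counters."""
    for event in events:
        banner = event.payload["banner"]
        views, clicks = state.get(banner, (0, 0))
        if event.payload["kind"] == "view":
            views += 1
        elif event.payload["kind"] == "click":
            clicks += 1
        state[banner] = (views, clicks)
    return state


def ctr(state: State, banner: str) -> float:
    """Click-through rate: clicks divided by views, 0.0 if there are no views."""
    views, clicks = state.get(banner, (0, 0))
    return clicks / views if views else 0.0


events = [
    Event(1.0, {"banner": "a", "kind": "view"}),
    Event(2.0, {"banner": "a", "kind": "view"}),
    Event(3.0, {"banner": "a", "kind": "click"}),
]
state = ctr_pipeline(events, {})
print(ctr(state, "a"))
```

The same function serves both modes: a batch job calls it once over a bounded list, while a streaming job calls it repeatedly with small chunks and carries `state` between calls.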

Batching approach
-Fetch periodically
-Bounded stream approach
-Easy to write (looks like SQL SELECT/INSERT)
-Scheduled by a standalone tool (such as Airflow)
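
A batch job of this kind might look like the sketch below, which aggregates one bounded time window with a single INSERT ... SELECT. It uses an in-memory SQLite database so it runs anywhere. The table names, columns, and `run_batch` function are hypothetical, and the scheduler is not shown: an external tool would simply call `run_batch` once per window.

```python
import sqlite3


def run_batch(conn: sqlite3.Connection, window_start: int, window_end: int) -> None:
    """Aggregate clicks in [window_start, window_end) into a report table."""
    conn.execute(
        """
        INSERT INTO hourly_clicks (window_start, banner, clicks)
        SELECT ?, banner, COUNT(*)
        FROM events
        WHERE kind = 'click' AND ts >= ? AND ts < ?
        GROUP BY banner
        """,
        (window_start, window_start, window_end),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (ts INTEGER, banner TEXT, kind TEXT)")
conn.execute(
    "CREATE TABLE hourly_clicks (window_start INTEGER, banner TEXT, clicks INTEGER)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [(10, "a", "click"), (20, "a", "click"), (30, "b", "click"), (4000, "a", "click")],
)

# Process the first hour only; the event at ts=4000 belongs to the next window.
run_batch(conn, 0, 3600)
rows = conn.execute(
    "SELECT banner, clicks FROM hourly_clicks ORDER BY banner"
).fetchall()
print(rows)
```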

Streaming approach
-Fetch as soon as possible
-Unbounded stream approach
-Requires key-value (KV) storage for state
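
As a rough illustration of these points, the sketch below consumes an unbounded source one event at a time and keeps its running state in a key-value store. A plain `dict` stands in for the KV storage here; that substitution, and the names `stream_counts` and `store`, are assumptions for the example.

```python
import itertools
from typing import Dict, Hashable, Iterable, Iterator, Tuple


def stream_counts(
    events: Iterable[Hashable], store: Dict[Hashable, int]
) -> Iterator[Tuple[Hashable, int]]:
    """Update the per-key counter as each event arrives and emit it at once."""
    for key in events:
        store[key] = store.get(key, 0) + 1
        yield key, store[key]


# An unbounded source: it never ends, so the job cannot wait for "all the data".
source = itertools.cycle(["a", "b"])

store: Dict[Hashable, int] = {}
# Take only the first five updates so the example terminates.
updates = list(itertools.islice(stream_counts(source, store), 5))
print(updates)
```

Because the state lives outside the processing loop, a restarted job could in principle resume from the store instead of reprocessing the whole stream.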

Streaming approach [Watermarks]
-Event time: when the event happened
-Processing time: when the event was fetched by the streaming engine
-Ideal case: no latency (processing time equals event time)
-Outdated event: fetched too late
-Delay: how far the watermark lags behind; events older than the watermark are dropped
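
One common way to realise the watermark idea is to track the largest event time seen so far and subtract a fixed allowed delay. The sketch below assumes that heuristic; the slides do not say how the watermark is computed, and real engines differ in the details. The class and parameter names are hypothetical.

```python
from typing import List


class WatermarkFilter:
    """Drop events whose event time is older than the current watermark."""

    def __init__(self, allowed_delay: float) -> None:
        self.allowed_delay = allowed_delay
        self.max_event_time = float("-inf")

    @property
    def watermark(self) -> float:
        # Assumed heuristic: largest event time seen minus the allowed delay.
        return self.max_event_time - self.allowed_delay

    def accept(self, event_time: float) -> bool:
        """Return True if the event is kept, False if it arrived too late."""
        if event_time < self.watermark:
            return False
        self.max_event_time = max(self.max_event_time, event_time)
        return True


wm = WatermarkFilter(allowed_delay=10)
# Event times listed in arrival (processing) order; 97 and 109 arrive late.
arrivals: List[int] = [100, 105, 97, 120, 109, 111]
kept = [t for t in arrivals if wm.accept(t)]
print(kept)
```

In this run, 97 is late but still within the delay, so it is kept, while 109 arrives after the watermark has advanced to 110 and is dropped.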

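Architecture [Lambda Architecture]
-Data Source feeds both the Speed layer and the Batch layer
-Speed layer produces the Real time view
-Batch layer produces the Precomputed view
-Serve layer answers queries from both views
https://clck.ru/VR4Lg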
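
A toy version of the layers might look like this. It is a sketch under simplifying assumptions: events are `(timestamp, key)` pairs, both views are plain dicts of counts, and a hypothetical `cutoff` timestamp separates what the batch layer has already covered from what only the speed layer has seen.

```python
from typing import Dict, Iterable, Tuple

Event = Tuple[int, str]  # (timestamp, key)


def batch_layer(all_events: Iterable[Event], cutoff: int) -> Dict[str, int]:
    """Recompute the view from scratch over every event older than cutoff."""
    view: Dict[str, int] = {}
    for ts, key in all_events:
        if ts < cutoff:
            view[key] = view.get(key, 0) + 1
    return view


def speed_layer(realtime_view: Dict[str, int], event: Event) -> None:
    """Incrementally fold one recent event into the real-time view."""
    _, key = event
    realtime_view[key] = realtime_view.get(key, 0) + 1


def serve(batch_view: Dict[str, int], realtime_view: Dict[str, int], key: str) -> int:
    """Answer a query by merging the precomputed and real-time views."""
    return batch_view.get(key, 0) + realtime_view.get(key, 0)


events = [(1, "a"), (2, "b"), (3, "a"), (11, "a"), (12, "b")]
cutoff = 10  # the last batch run covered everything before this timestamp

batch_view = batch_layer(events, cutoff)
realtime_view: Dict[str, int] = {}
for event in events:
    if event[0] >= cutoff:
        speed_layer(realtime_view, event)

print(serve(batch_view, realtime_view, "a"))
```

Note that the counting logic is written twice, once per layer, which is the maintenance cost usually associated with this architecture.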

Architecture [Kappa Architecture]
-Data Source feeds the Speed layer only
-Speed layer produces the Real time view
-Serve layer answers queries from the Real time view
https://clck.ru/VR4Bm
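
With a single streaming path, recomputation is typically done by replaying the stored event log through the same code instead of running a separate batch job. The sketch below assumes the log is an ordered list of `(key, amount)` pairs; that shape and the `process` function are illustrative only.

```python
from typing import Dict, Iterable, Optional, Tuple


def process(
    log: Iterable[Tuple[str, int]], state: Optional[Dict[str, int]] = None
) -> Dict[str, int]:
    """The single processing script: fold log entries into per-key totals."""
    state = {} if state is None else state
    for key, amount in log:
        state[key] = state.get(key, 0) + amount
    return state


log = [("a", 1), ("b", 2), ("a", 3)]

# Live operation: events arrive over time and extend the current state.
live = process(log[:2])
live = process(log[2:], live)

# Reprocessing: replay the whole log from the start with the same code.
replayed = process(log)
print(live == replayed)
```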

Architecture [Lambda vs Kappa]
Lambda                                       | Kappa
Batch + Streaming                            | Streaming
Query all data                               | Incremental algorithms on deltas
Batch is reliable, streaming is approximate  | Streaming with consistency
Two scripts (one per approach)               | Single script
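
The row contrasting "query all data" with "incremental algorithms on deltas" can be made concrete with a mean. The sketch below compares a full recomputation with a running update; the choice of the mean as the example statistic is an assumption, not something taken from the slides.

```python
from typing import List


def mean_full(values: List[float]) -> float:
    """Query all data: recompute the mean from every stored value."""
    return sum(values) / len(values)


class RunningMean:
    """Incremental algorithm: update the mean from each new value (delta)."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0

    def update(self, x: float) -> float:
        self.count += 1
        # Shift the mean towards x by 1/count of the difference.
        self.mean += (x - self.mean) / self.count
        return self.mean


values = [2.0, 4.0, 9.0]
running = RunningMean()
for v in values:
    running.update(v)

print(mean_full(values), running.mean)
```

The incremental version needs only two numbers of state regardless of how many events have passed, which is what makes it suitable for a streaming-only design.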

References
-Flink Concepts [https://clck.ru/VQLMU]
-Spark Streaming [https://clck.ru/VQLPA]
-Streaming Systems: The What, Where, When, and How of Large-Scale Data Processing
-Stream Processing with Apache Flink: Fundamentals, Implementation, and Operation of Streaming Applications