Building Planet-Scale Streaming Apps: Proven Strategies with Apache Flink by Sanchay Javeria

ScyllaDB · 14 slides · Oct 15, 2025

About This Presentation

At Pinterest, we use Apache Flink for streaming applications, powering various use cases like real-time metrics reporting for ads from external advertisers and real-time ad budget calculations to optimize ad spend. Ensuring our Flink applications are high-performing and resilient is crucial for us. ...


Slide Content

A ScyllaDB Community
Building Planet-Scale Streaming Apps: Proven Strategies with Apache Flink
Sanchay Javeria
Senior Software Engineer

Sanchay Javeria

Senior Software Engineer at Pinterest
■ @Pinterest: Ads Data Infrastructure, Big Data Platform
■ Specializes in high-performance, scalable distributed data systems
■ In my free time, I enjoy analyzing special-situation and event-driven value investments
● Blog at sanchayjaveria.com

What?
[Figure: A very high-level Flink application flow (source: Flink documentation)]
● Distributed processing engine for stateful computations over unbounded and bounded data streams
● Suitable for high-throughput, low-latency real-time data processing at virtually any scale
● Allows events to be consumed from a variety of “sources” and results to be plugged into a variety of “sinks”
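
A minimal sketch of that source → transformation → sink flow, assuming a local socket source on port 9999 and a print sink purely for illustration (not from the deck):

import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class MinimalPipeline {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // "Source": an unbounded stream of text lines (a Kafka or Kinesis source plugs in the same way).
        DataStream<String> lines = env.socketTextStream("localhost", 9999);

        // Stateless transformation over the stream.
        DataStream<Integer> lengths = lines.map(s -> s.length()).returns(Types.INT);

        // "Sink": print to stdout; in production this would be Kafka, S3, a database, etc.
        lengths.print();

        env.execute("minimal-flink-pipeline");
    }
}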

Running Flink on YARN
[Diagram: Flink-on-YARN deployment flow. (1) Job submission via shell command from the client to the YARN cluster Resource Manager; (2) shell-runner launch in an Application Master container on a Node Manager; (3) user Flink application submission by the shell runner; (4) user application launch across containers on Node Managers 0..m. Supporting components: RocksDB on local disk for state, StatsD for metrics, ZooKeeper for session-based liveness, and AWS S3 for state persistence of checkpoints/savepoints.]

Memory Considerations

Jobs and Scheduling
[Diagram: A keyed streaming job with sources [1]-[2], stateful operators [1]-[3], and sinks [1]-[3]. Records are partitioned by key so that keys (A, B, C), (D, E, F), and (G, H, I) each land on one stateful subtask. The stateful subtasks are scheduled into slots (Slot 0, Slot 1) inside Task Manager JVM processes, so each slot owns the keyed state for its share of the key space.]

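A hedged sketch of what the diagram describes: a keyBy() routes every record for a given key to the same stateful subtask, which keeps a per-key counter in Flink-managed keyed state. The event shape, keys, and counting logic are illustrative assumptions, not taken from the talk.

import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

public class KeyedCountJob {

    // Per-key running count kept in Flink-managed keyed state.
    static class CountPerKey extends RichFlatMapFunction<Tuple2<String, Long>, Tuple2<String, Long>> {
        private transient ValueState<Long> count;

        @Override
        public void open(Configuration parameters) {
            count = getRuntimeContext().getState(new ValueStateDescriptor<>("count", Types.LONG));
        }

        @Override
        public void flatMap(Tuple2<String, Long> event, Collector<Tuple2<String, Long>> out) throws Exception {
            long current = count.value() == null ? 0L : count.value();
            current += event.f1;
            count.update(current);
            out.collect(Tuple2.of(event.f0, current));
        }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(3); // three stateful subtasks, as in the diagram

        env.fromElements(Tuple2.of("A", 1L), Tuple2.of("D", 1L), Tuple2.of("G", 1L), Tuple2.of("A", 1L))
           .keyBy(e -> e.f0, Types.STRING) // key groups decide which subtask/slot owns each key's state
           .flatMap(new CountPerKey())
           .print();                       // sink

        env.execute("keyed-count");
    }
}
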
Flink Memory Model
■ Memory is roughly split between on-heap (Framework & Task heap) and off-heap (RocksDB state store / managed memory, network buffers, and JVM metaspace & overhead)
■ The example below was run with taskmanager.memory.process.size set to 20g

Flink Memory Model
■ Use metrics monitoring tools to observe p99 / max memory usage for the different memory slices, then tune for performance:
taskmanager.memory.task.heap.size=2g
taskmanager.memory.managed.size=18g
taskmanager.memory.network.min=250mb
taskmanager.memory.network.max=250mb
taskmanager.memory.jvm-metaspace.size=200mb
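
One way to experiment with these settings before rolling them out, sketched with a local environment and the exact keys from the slide (the values here are the slide's, not a recommendation; in production they would normally live in flink-conf.yaml or be passed as -D options at submission time):

import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class MemoryTuningSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.setString("taskmanager.memory.task.heap.size", "2g");
        conf.setString("taskmanager.memory.managed.size", "18g");
        conf.setString("taskmanager.memory.network.min", "250mb");
        conf.setString("taskmanager.memory.network.max", "250mb");
        conf.setString("taskmanager.memory.jvm-metaspace.size", "200mb");

        // Local environment that honors the configuration, handy for sanity-checking settings.
        StreamExecutionEnvironment env = StreamExecutionEnvironment.createLocalEnvironment(conf);
        // ... build and execute the job as usual ...
    }
}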

State Management and Checkpointing
■ Configure a reasonable checkpoint timeout (execution.checkpointing.timeout)
■ Choose incremental checkpointing to only persist changes since the last checkpoint (see the configuration sketch after the table below)
■ For keyed state, consider enabling local recovery for fast state restore and recovery from failures
■ Use SSDs for RocksDB storage over, say, NFS or a remote EBS volume for faster throughput
■ With a HashMapStateBackend, set managed memory to 0 to maximize heap memory allocation
Heap (HashMap) state backend vs. RocksDB state backend:
■ Lookups: fastest (no SerDe overhead) vs. relatively slower (all reads/writes go through SerDe)
■ Garbage collection: affected by GC overhead / pauses vs. not affected by GC
■ Checkpointing: no incremental checkpoints vs. incremental checkpointing supported
■ State size: limited by heap memory vs. limited by local disk space
■ Choose the heap backend if performance is the priority; choose RocksDB if scalability and reliability are the priority
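
A hedged sketch of wiring the RocksDB backend with incremental checkpointing, a bounded checkpoint timeout, and durable checkpoint storage; the S3 path, 60-second interval, and 10-minute timeout are illustrative assumptions, not Pinterest's settings:

import org.apache.flink.contrib.streaming.state.EmbeddedRocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointingSetup {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // RocksDB state backend with incremental checkpointing: only the files created
        // since the last checkpoint are uploaded.
        env.setStateBackend(new EmbeddedRocksDBStateBackend(true));

        // Checkpoint every 60s, durable storage on S3, and a bounded timeout so a slow
        // checkpoint fails fast instead of stalling the job (execution.checkpointing.timeout).
        env.enableCheckpointing(60_000L);
        env.getCheckpointConfig().setCheckpointStorage("s3://my-bucket/flink/checkpoints");
        env.getCheckpointConfig().setCheckpointTimeout(10 * 60 * 1000L);

        // Local recovery (state.backend.local-recovery: true in flink-conf.yaml) keeps a copy
        // of keyed state on local disk so restores after failure avoid a full remote download.
        // ... build and execute the job as usual ...
    }
}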

Serializers
■ Serialization is important for the overall throughput of your application
■ Flink applications are expected to run for long durations and need to adapt to changing requirements: schema evolution
■ The POJO serializer is a good choice for evolving applications; avoid the Kryo serializer (see the sketch after the benchmark below)

[Chart: JMH benchmarking with 1,000,000 records]
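
A hedged sketch of that advice: a class that satisfies Flink's POJO rules (public, public no-arg constructor, public fields or getters/setters) so it is handled by the schema-evolvable POJO serializer, plus disabling the Kryo fallback so accidental generic types fail fast. The AdEvent class is an illustrative assumption, not from the talk.

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class SerializerSetup {

    // Meets Flink's POJO rules, so it gets the evolvable POJO serializer
    // (fields can be added or removed across savepoints) instead of Kryo.
    public static class AdEvent {
        public String adId;
        public long impressions;

        public AdEvent() {} // public no-arg constructor required
        public AdEvent(String adId, long impressions) {
            this.adId = adId;
            this.impressions = impressions;
        }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Fail at job-graph build time if any type would silently fall back to Kryo.
        env.getConfig().disableGenericTypes();

        env.fromElements(new AdEvent("ad-1", 10), new AdEvent("ad-2", 5))
           .keyBy(e -> e.adId)
           .print();

        env.execute("pojo-serializer-example");
    }
}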

CPU Considerations

Dealing with Back Pressure
■ Scale out horizontally if your application sees an organic increase in data
■ Disable operator chaining to localize the source of backpressure
■ Look out for data skew by monitoring the number of records processed by each subtask
■ If checkpointing becomes unstable and backpressure is expected to stay high, consider using unaligned checkpoints, but beware of I/O bottlenecks on this path
■ If your application performs asynchronous operations, consider using the egress → side-output → retry → discard pattern instead of potentially wasting CPU cycles
■ Reduce data shuffle by using reinterpretAsKeyedStream() for pre-keyed data and by chaining operators (see the sketch below)
■ Profile your job to better understand heap allocations and CPU utilization
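
A hedged sketch of two of these tactics: disabling chaining job-wide so per-operator metrics point at the slow operator, and reinterpretAsKeyedStream() to skip a redundant shuffle when the stream is already partitioned exactly as keyBy would partition it. The socket source and identity key are illustrative assumptions.

import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.DataStreamUtils;
import org.apache.flink.streaming.api.datastream.KeyedStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class BackPressureTactics {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // 1. Disable chaining so per-operator backpressure metrics in the UI point at the exact
        //    slow operator; re-enable chaining once the culprit is found, since chaining is cheaper.
        env.disableOperatorChaining();

        // Assume this stream is already partitioned the same way Flink's keyBy would partition it.
        DataStream<String> prePartitioned = env.socketTextStream("localhost", 9999);

        // 2. Treat the stream as keyed without paying for another network shuffle.
        KeyedStream<String, String> keyed =
                DataStreamUtils.reinterpretAsKeyedStream(prePartitioned, value -> value, Types.STRING);

        keyed.print();
        env.execute("backpressure-tactics");
    }
}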

Choose the instance type that is right for you!
■ Choose an instance type that works best for your workload
● For our use case, AWS i4i instances made our jobs 40% more efficient (a 40% decrease in CPU consumption for a 10% increase in cost)
● Nitro SSDs with very high IOPS and low tail latencies work well with large stateful applications

Thank you! Let’s connect.
Sanchay Javeria

Email: [email protected]
X / Twitter: @sanchayjaveria
Website: sanchayjaveria.com