Rivian's Push Notification Sub Stream with Mega Filter by Marcus Kim & Saahil Khurana
ScyllaDB
Oct 14, 2025
About This Presentation
Rivian vehicles stream over 5500 signals every 5 seconds, but only about 80 are relevant for push notifications. Without filtering, downstream jobs were overwhelmed by the full firehose. Mega Filter, a Flink application backed by RocksDB state and a Kafka metadata topic, trims traffic by 98 percent, from 172 TB a day to 3 TB. This talk explains how Mega Filter reduces load across Kafka, Flink, and EKS, how new or deprecated signals are handled in real time via DynamoDB and REST APIs, and the impact at scale for more than 100,000 cars.
Slide Content
A ScyllaDB Community
Rivian's Push Notification Sub Stream with Mega Filter
Saahil Khurana, Staff Software Engineer
Marcus Kim, Software Engineer II

Marcus Kim, Software Engineer II
■ Streaming Team at Rivian and Volkswagen Group Technologies
■ Interested in big data & distributed systems
■ Spends time biking & playing volleyball
Saahil Khurana, Staff Software Engineer
■ Founding engineer on the Streaming team at Rivian and Volkswagen Group Technologies
■ Interested in big data systems at scale and excited about improving current systems
■ Loves playing ping pong and tennis
Agenda
1. About RV Tech
2. Data at RV Tech
3. What is Event Watch?
4. Challenges in Streaming
5. Event Watch Mega Filter
6. Results & Future Improvements
Rivian and Volkswagen Group Technologies
A partnership to change the world
Advancing the capabilities of software-defined vehicles to create a world where every vehicle becomes more intelligent, more sustainable, and more enjoyable over time.
■ Currently supporting more than 150,000 Rivian vehicles on the road
■ Rivian vehicles sync more than 5,000 different types of telemetry signals at a 5-second sampling rate
■ Processing ~150 MB of data every second
Data at RV Tech
Data-driven vehicles to empower:
■ Manufacturing
■ Vehicle Delivery
■ Over-the-Air Updates
■ B2B Fleet Management
■ Vehicle Quality Management
■ Charging Infrastructure Management
■ ML and AI platform
Data at RV Tech (Cont’d)
What is Event Watch?
■ RV Tech's vehicle-to-cloud real-time data streaming platform
■ Primarily leverages Flink and Kafka for low latency, with the Event Watch Service managing Flink pipelines (a minimal pipeline sketch follows below)
■ Currently hosts more than 120 data pipelines in Flink session clusters
■ Receives ~500,000 events every second
■ Downstream use cases include analytics, push notifications, geofencing, and more
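Since the slides describe Event Watch as Flink pipelines consuming from Kafka, here is a minimal, self-contained sketch of what one such pipeline's skeleton could look like. This is not the actual Event Watch code; the broker address, topic name, and group id are placeholders.

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class EventWatchPipelineSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Consume vehicle telemetry from Kafka. Broker and topic names are placeholders,
        // not values from the talk.
        KafkaSource<String> source = KafkaSource.<String>builder()
                .setBootstrapServers("msk-broker:9092")
                .setTopics("vehicle-telemetry")
                .setGroupId("event-watch-sketch")
                .setStartingOffsets(OffsetsInitializer.latest())
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();

        DataStream<String> events =
                env.fromSource(source, WatermarkStrategy.noWatermarks(), "telemetry-source");

        // A real Event Watch pipeline would apply its use-case logic here
        // (push notifications, geofencing, analytics, ...).
        events.print();

        env.execute("event-watch-pipeline-sketch");
    }
}
```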
Event Watch Architecture
Challenges
Computing Costs
Increasing computing cost from EKS for hosting an increasing number of pipelines
Data Transfer Costs
Increasing data transfer cost from the Kafka clusters
Stability Issues
Compromised stability from the noisy neighbor problem in our Flink session clusters, caused by the Autoscaler
Underlying driver: an increase in the number of pipelines
Mega Filter
■ Substream layer before Event Watch: reduces the volume of data consumed by Event Watch pipelines
■ Automated workflow: supports dynamic telemetry signal addition, update, and removal in the substreams
■ Leverages the existing Event Watch architecture: reuses the existing Event Watch Service, Flink, and Kafka topics to enable data filtering
Non-Functional Requirements
Availability
● Protects the downstream Event Watch system from high volumes of data
● High availability: all streaming data passes through the Mega Filter
Low Latency
● Should introduce minimal additional latency
Scalability
● Mega Filter should be able to accommodate varying volumes of traffic
Cost Savings
● Reduction in MSK cluster size, EKS nodes for Event Watch, and data transfer costs
API Design
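The API Design slides are diagrams. As a purely hypothetical illustration of the REST surface they describe, here is how a downstream team might call the Event Watch Service cancel endpoint (mentioned later under "Flink State & State Updates") using only the JDK's built-in HTTP client. The host, path, and spec id are all invented for the sketch.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class CancelSpecSketch {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();

        // Hypothetical cancel call: the slides only say downstream teams
        // "call the EW Service cancel endpoint"; this URL is an assumption.
        HttpRequest cancel = HttpRequest.newBuilder()
                .uri(URI.create("https://event-watch.example.internal/v1/specs/low-battery-alert/cancel"))
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();

        HttpResponse<String> resp = client.send(cancel, HttpResponse.BodyHandlers.ofString());
        System.out.println(resp.statusCode());
    }
}
```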
Streaming Design
Flink DAG
Flink State & State Updates
■ Adding a new spec: whenever a new Event Watch job spec is deployed, an add event is published to the signal subscription topic, the Flink state is updated, and the new signal makes its way to the filtered topic.
■ Deprecating a spec: when downstream teams no longer need an existing push notification, they call the EW Service cancel endpoint, which removes the signal from the subscription state. (A sketch of both flows follows below.)
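A minimal sketch of how this add/deprecate flow could look inside the Mega Filter Flink job, assuming both streams are keyed by signal name and the per-signal subscription flag lives in Flink keyed state (keyed state is held in RocksDB when the RocksDB state backend is configured, matching the RocksDB-backed state the abstract describes). SignalEvent and SignalUpdate are hypothetical types; the real schemas are not shown in the slides.

```java
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.co.KeyedCoProcessFunction;
import org.apache.flink.util.Collector;

/** Hypothetical schemas; the talk does not show the real ones. */
record SignalEvent(String signalName, byte[] payload) {}
record SignalUpdate(String signalName, boolean isAdd) {}

public class MegaFilterStateSketch {

    static DataStream<SignalEvent> filter(DataStream<SignalEvent> telemetry,
                                          DataStream<SignalUpdate> subscriptions) {
        return telemetry
                .keyBy(SignalEvent::signalName)
                .connect(subscriptions.keyBy(SignalUpdate::signalName))
                .process(new KeyedCoProcessFunction<String, SignalEvent, SignalUpdate, SignalEvent>() {

                    // Per-signal flag, stored in the configured state backend (e.g. RocksDB).
                    private transient ValueState<Boolean> subscribed;

                    @Override
                    public void open(Configuration parameters) {
                        subscribed = getRuntimeContext().getState(
                                new ValueStateDescriptor<>("subscribed", Types.BOOLEAN));
                    }

                    @Override
                    public void processElement1(SignalEvent event, Context ctx,
                                                Collector<SignalEvent> out) throws Exception {
                        // Forward only signals that some Event Watch spec subscribed to.
                        if (Boolean.TRUE.equals(subscribed.value())) {
                            out.collect(event);
                        }
                    }

                    @Override
                    public void processElement2(SignalUpdate update, Context ctx,
                                                Collector<SignalEvent> out) throws Exception {
                        // "Add" events from the signal subscription topic enable a signal;
                        // deprecation events (via the cancel endpoint) disable it.
                        if (update.isAdd()) {
                            subscribed.update(true);
                        } else {
                            subscribed.clear();
                        }
                    }
                });
    }
}
```

Keying by signal name means the state grows with the number of distinct signals rather than the number of vehicles, and a deprecated signal stops flowing downstream as soon as its state entry is cleared.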
Results
Reduction in Traffic & Costs

Bytes in per topic per second:

                            Amazon   Consumer   Rivian   Fleet
Pre Mega Filter             25M      57M        3.66M    20K
Filtered                    10M      17.5M      1.25M    5.0K
Filtered with Compression   3.25M    6.74M      706K     3.6K
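The "Filtered with Compression" row implies producer-side compression on the filtered topics. A minimal sketch of enabling it on a Flink Kafka sink; the topic name and the zstd codec are assumptions, since the slides name neither.

```java
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.sink.KafkaRecordSerializationSchema;
import org.apache.flink.connector.kafka.sink.KafkaSink;

public class FilteredTopicSinkSketch {
    static KafkaSink<String> build() {
        return KafkaSink.<String>builder()
                .setBootstrapServers("msk-broker:9092")
                // Standard Kafka producer setting; the codec choice is an assumption.
                .setProperty("compression.type", "zstd")
                .setRecordSerializer(KafkaRecordSerializationSchema.builder()
                        .setTopic("filtered-telemetry")
                        .setValueSerializationSchema(new SimpleStringSchema())
                        .build())
                .build();
    }
}
```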
Data Volume Reduction

                   Data Ingress                    Number of Jobs   Ingestion Rate
Pre Mega Filter    3420 MB/second (288 TB/day)     60 jobs          57 MB/s per job
Post Mega Filter   404.4 MB/second (34 TB/day)     60 jobs          6.74 MB/s per job

254 TB/day (88%) reduction in data volume (288 TB − 34 TB = 254 TB; 254 / 288 ≈ 88%)
What Mega Filter Achieves
■ Reduction in messages per second and bytes in per topic in the source topics of Event Watch
■ Reduction in data transfer costs due to less data flowing through the system
■ Reduction in the MSK cluster size needed for hosting Event Watch
■ Getting away from node-selector deployments for Flink jobs, ensuring stability and scalability
■ Reduction in EKS nodes and Flink cluster sizes needed to run the Event Watch clusters
Future Improvements
■ Enable the autoscaler to manage scaling of the Mega Filter cluster based on the resource consumption of Mega Filter Flink pipelines
■ Further optimization of data consumption from common source topics
■ More fine-grained Flink pipeline management to reduce the noisy neighbor problem and provide better observability
Thank you! Let’s connect.
Marcus Kim [email protected]
linkedin.com/in/marcus-kim-966773175
Saahil Khurana [email protected]
linkedin.com/in/saahil-khurana-sk9