Rivian's Push Notification Sub Stream with Mega Filter by Marcus Kim & Saahil Khurana

ScyllaDB · 23 slides · Oct 14, 2025

About This Presentation

Rivian vehicles stream over 5500 signals every 5 seconds, but only about 80 are relevant for push notifications. Without filtering, downstream jobs were overwhelmed by the full firehose. Mega Filter, a Flink application backed by RocksDB state and a Kafka metadata topic, trims traffic by 98 percent,...


Slide Content

A ScyllaDB Community
Rivian's Push Notification Sub Stream with Mega Filter
Saahil Khurana, Staff Software Engineer
Marcus Kim, Software Engineer II

Marcus Kim
Software Engineer II
■ Streaming Team at Rivian and Volkswagen Group Technologies
■ Interested in big data & distributed systems
■ Spends time biking & playing volleyball

Saahil Khurana
Staff Software Engineer
■ Founding engineer on the Streaming team at Rivian and Volkswagen Group Technologies
■ Interested in big data systems at scale and excited about improving current systems
■ Loves playing ping pong and tennis

Agenda
1. About RV Tech
2. Data at RV Tech
3. What is Event Watch?
4. Challenges in Streaming
5. Event Watch Mega Filter
6. Results & Future Improvements

Rivian and Volkswagen Group Technologies
A partnership to change the world
Advancing the capabilities of software-defined vehicles to create a world where every vehicle becomes more intelligent, more sustainable, and more enjoyable over time.

Data at RV Tech
■ Currently supporting more than 150,000 Rivian vehicles on the road
■ Rivian vehicles sync more than 5,000 different types of telemetry signals at a 5-second sampling rate
■ Processing ~150 MB of data every second

Data at RV Tech (Cont’d)
Data-driven vehicles to empower:
■ Manufacturing
■ Vehicle Delivery
■ Over-the-Air Updates
■ B2B Fleet Management
■ Vehicle Quality Management
■ Charging Infrastructure Management
■ ML and AI platform

What is Event Watch?
■ RV Tech’s vehicle-to-cloud real-time data streaming platform
■ Primarily leverages Flink and Kafka for low latency, with the Event Watch Service managing the Flink pipelines (a minimal pipeline sketch follows after this list)
■ Currently hosting > 120 data pipelines in Flink session clusters
■ Receives ~500,000 events every second
■ Downstream use cases include analytics, push notifications, geofencing, and more
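To make the shape of such a pipeline concrete, here is a minimal sketch of a Flink job in the Event Watch style that reads telemetry from a Kafka topic. The broker address, topic name, and consumer group are illustrative assumptions, not Rivian's actual configuration.

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

/** Minimal sketch of an Event Watch-style Flink pipeline; all names are assumptions. */
public class EventWatchPipelineSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        KafkaSource<String> telemetrySource = KafkaSource.<String>builder()
                .setBootstrapServers("kafka:9092")              // assumed broker address
                .setTopics("vehicle-telemetry")                 // assumed source topic
                .setGroupId("event-watch-sketch")               // assumed consumer group
                .setStartingOffsets(OffsetsInitializer.latest())
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();

        env.fromSource(telemetrySource, WatermarkStrategy.noWatermarks(), "telemetry-source")
           // Real Event Watch pipelines fan out to analytics, push notifications,
           // geofencing, and similar consumers; printing keeps the sketch self-contained.
           .print();

        env.execute("event-watch-pipeline-sketch");
    }
}
```

The Event Watch Service deploys many jobs of this general shape into shared Flink session clusters, which is where the cost and noisy-neighbor pressures described in the next section come from.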

Event Watch Architecture

Challenges
An increasing number of pipelines has driven:
■ Computing Costs - increasing EKS compute cost for hosting a growing number of pipelines
■ Data Transfer Costs - increasing data transfer cost from the Kafka clusters
■ Stability Issues - compromised stability due to the noisy neighbor problem that the Autoscaler causes in our Flink session clusters

Mega Filter

Mega Filter
■ Substream layer before Event Watch: reduces the volume of data consumed by Event Watch pipelines
■ Automated workflow: supports dynamic telemetry signal addition, update, and removal in the substreams
■ Leverages the existing Event Watch architecture: reuses the existing Event Watch Service, Flink, and Kafka topics to enable data filtering (see the wiring sketch after this list)
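As a rough illustration of how such a substream layer could be wired up, the sketch below consumes the raw telemetry topic, broadcasts signal-subscription metadata from a Kafka metadata topic to every filter subtask, and writes the trimmed stream to a compressed filtered topic. All topic names, the `SignalFilterFunction` (defined in the state-update section below), and the compression codec are assumptions, not Rivian's actual code; the deck notes the production job keeps its state in RocksDB, which the sketch omits for brevity.

```java
import java.util.Properties;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.connector.base.DeliveryGuarantee;
import org.apache.flink.connector.kafka.sink.KafkaRecordSerializationSchema;
import org.apache.flink.connector.kafka.sink.KafkaSink;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.datastream.BroadcastStream;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

/** Sketch of a Mega Filter-style substream job; topic names and settings are assumptions. */
public class MegaFilterJobSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Raw firehose of telemetry signals (assumed topic name).
        KafkaSource<String> telemetry = KafkaSource.<String>builder()
                .setBootstrapServers("kafka:9092")
                .setTopics("vehicle-telemetry")
                .setGroupId("mega-filter")
                .setStartingOffsets(OffsetsInitializer.latest())
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();

        // Metadata topic carrying which signals downstream specs subscribe to (assumed name).
        KafkaSource<String> subscriptions = KafkaSource.<String>builder()
                .setBootstrapServers("kafka:9092")
                .setTopics("signal-subscriptions")
                .setGroupId("mega-filter-metadata")
                .setStartingOffsets(OffsetsInitializer.earliest())
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();

        // Broadcast the subscription metadata so every filter subtask sees the same allowlist.
        MapStateDescriptor<String, Boolean> allowlist =
                new MapStateDescriptor<>("signal-allowlist", Types.STRING, Types.BOOLEAN);
        BroadcastStream<String> metadata = env
                .fromSource(subscriptions, WatermarkStrategy.noWatermarks(), "signal-subscriptions")
                .broadcast(allowlist);

        DataStream<String> filtered = env
                .fromSource(telemetry, WatermarkStrategy.noWatermarks(), "vehicle-telemetry")
                .connect(metadata)
                .process(new SignalFilterFunction(allowlist)); // sketched in the state-update section

        // Write the trimmed substream to a compressed, filtered topic for Event Watch.
        Properties producerProps = new Properties();
        producerProps.setProperty("compression.type", "zstd"); // assumed codec; the deck only says "with compression"
        filtered.sinkTo(KafkaSink.<String>builder()
                .setBootstrapServers("kafka:9092")
                .setKafkaProducerConfig(producerProps)
                .setRecordSerializer(KafkaRecordSerializationSchema.builder()
                        .setTopic("vehicle-telemetry-filtered")
                        .setValueSerializationSchema(new SimpleStringSchema())
                        .build())
                .setDeliveryGuarantee(DeliveryGuarantee.AT_LEAST_ONCE)
                .build());

        env.execute("mega-filter-sketch");
    }
}
```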

Non Functional Requirements
■ Availability - Protects the downstream Event Watch system from high volumes of data; must be highly available, since all streaming data passes through the Mega Filter
■ Low Latency - Should introduce minimal additional latency
■ Scalability - Mega Filter should be able to accommodate varying volumes of traffic
■ Cost Savings - Reduction in MSK cluster size, EKS nodes for Event Watch, and data transfer costs

API Design

Streaming Design

Flink DAG

Flink State & State Updates

■ Adding a new spec - Whenever a new Event Watch job spec is to be deployed, an add event is published to the signal subscription topic, the Flink state is updated, and the new signal makes its way into the filtered topic.
■ Deprecating a spec - When downstream teams no longer need an existing push notification, they call the EW Service cancel endpoint and the corresponding signal subscription is removed from the Flink state (a broadcast-state sketch of this flow follows below).
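One plausible implementation of this add/remove flow is Flink's broadcast state pattern: metadata events update an allowlist of subscribed signals, and each telemetry record is forwarded only if its signal is on the list. The `ADD:<signal>` / `REMOVE:<signal>` event format and the record layout below are assumptions made to keep the sketch self-contained; this is the `SignalFilterFunction` referenced in the wiring sketch above.

```java
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.streaming.api.functions.co.BroadcastProcessFunction;
import org.apache.flink.util.Collector;

/**
 * Keeps an allowlist of subscribed signal names in broadcast state and drops every
 * telemetry record whose signal is not subscribed by any Event Watch spec.
 * Assumes "ADD:<signal>" / "REMOVE:<signal>" metadata events and telemetry records
 * of the form "<signal>:<payload>" purely for illustration.
 */
public class SignalFilterFunction extends BroadcastProcessFunction<String, String, String> {

    private final MapStateDescriptor<String, Boolean> allowlistDescriptor;

    public SignalFilterFunction(MapStateDescriptor<String, Boolean> allowlistDescriptor) {
        this.allowlistDescriptor = allowlistDescriptor;
    }

    @Override
    public void processBroadcastElement(String event, Context ctx, Collector<String> out) throws Exception {
        // Subscription metadata: update the shared allowlist on every subtask.
        String[] parts = event.split(":", 2);
        if (parts.length != 2) {
            return; // ignore malformed metadata events in this sketch
        }
        if ("ADD".equals(parts[0])) {
            ctx.getBroadcastState(allowlistDescriptor).put(parts[1], true);
        } else if ("REMOVE".equals(parts[0])) {
            ctx.getBroadcastState(allowlistDescriptor).remove(parts[1]);
        }
    }

    @Override
    public void processElement(String record, ReadOnlyContext ctx, Collector<String> out) throws Exception {
        // Telemetry record: forward it only if its signal is currently subscribed.
        String signal = record.split(":", 2)[0];
        if (ctx.getBroadcastState(allowlistDescriptor).contains(signal)) {
            out.collect(record);
        }
    }
}
```

Broadcast state is a natural fit here because every parallel filter subtask needs the same, relatively small view of which signals are subscribed, while the telemetry firehose itself stays partitioned across subtasks.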

Results

Reduction in Traffic & Costs
Bytes in per topic per second:

                             Amazon    Consumer    Rivian    Fleet
Pre Mega Filter              25M       57M         3.66M     20K
Filtered                     10M       17.5M       1.25M     5.0K
Filtered with Compression    3.25M     6.74M       706K      3.6K

Data Volume Reduction

                    Data Ingress                     Number of Jobs    Ingestion Rate per Job
Pre Mega Filter     3,420 MB/second (288 TB/day)     60 jobs           57 MB/second
Post Mega Filter    404.4 MB/second (34 TB/day)      60 jobs           6.74 MB/second

The per-job rate is the total ingress divided across the 60 jobs (3,420 / 60 = 57; 404.4 / 60 = 6.74).
254 TB/day (88%) reduction in data volume

What Mega Filter Achieves
■ Reduction in messages per second and bytes in per topic on the source topics of Event Watch
■ Reduction in data transfer costs due to less data flowing through the system
■ Reduction in the MSK cluster size needed for hosting Event Watch
■ Getting away from node-selector deployments for Flink jobs, ensuring stability and scalability
■ Reduction in EKS nodes and Flink cluster sizes needed to run the Event Watch clusters

Future Improvements
■ Enable the Autoscaler to manage scaling of the Mega Filter cluster based on the resource consumption of the Mega Filter Flink pipelines
■ Further optimization of data consumption from common source topics
■ More fine-grained Flink pipeline management to reduce the noisy neighbor problem and improve observability

Thank you! Let’s connect.
Marcus Kim
[email protected]
linkedin.com/in/marcus-kim-966773175
Saahil Khurana
[email protected]
linkedin.com/in/saahil-khurana-sk9