Netflix's Scalable Page Construction with Real-Time Impression History by Saurabh Jaluka & Tulika Bhatt
ScyllaDB
About This Presentation
Netflix built a Page Construction Architecture to deliver scalable, efficient, and personalized experiences across devices from phones to TVs. This talk introduces the API-driven system that centralizes content and module selection, decouples business logic from clients, and standardizes data models for faster innovation. We’ll highlight the Impression History Service API, which tracks interactions and enables near-real-time page optimization, driving consistency and personalization at scale.
Size: 4.55 MB
Language: en
Added: Oct 15, 2025
Slides: 23 pages
Slide Content
A ScyllaDB Community
Saurabh Jaluka
Senior Software Engineer
Tulika Bhatt
Senior Software Engineer
Netflix's Scalable Page Construction
with Real-Time Impression History
Hello!
■Part of the platform team powering every Netflix page, ensuring
availability and low latency for hundreds of millions of users.
■To me, P99s reflect the real user experience at the edge.
■Outside of work, I enjoy experimenting in the kitchen and
embracing my newest role as a proud new dad.
■I manage core data systems at Netflix that power recommendations for
over 300M subscribers, processing trillions of events each year.
■I see P99s as the real measure of reliability in large-scale data
pipelines.
■Outside of work, I enjoy traveling, exploring new cuisines, and sharing
knowledge through mentoring and speaking.
GROWING, HIGHLY ENGAGED AUDIENCES
> 300M global Netflix household members
18,000 titles in the catalog
190 countries
188B hours viewed annually
Scale At Netflix
■Diverse Ecosystem: Supporting thousands of client devices and various
pages (Home, TV Shows, Movies, Search, etc.).
■Constant Flux: High velocity of A/B experiments and independent service
deployments.
■Resulting Challenge: We can’t precompute and cache every possible
outcome. Every page is assembled dynamically, on-demand.
Page Construction
What is it?
■Assembles responses from various
microservices.
■Dictates client layout from the server for
agility.
■Defines page structure via flexible,
non-hardcoded configurations.
Select Page Layout → Generate Sections → Assemble Page
Select Page Layout: LOLOMO | Gallery | Feed
Generate Sections
Assemble Page
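As a rough illustration of this three-step flow, here is a minimal sketch; the class and method names below are assumptions for the sketch, not Netflix's actual interfaces:

```java
// Hypothetical illustration of the select -> generate -> assemble flow.
// Names and types are assumptions, not Netflix's actual interfaces.
import java.util.List;
import java.util.stream.Collectors;

public class PageConstructionSketch {

    record PageRequest(String profileId, String pageType) {}
    record Layout(String name, List<String> sectionIds) {}
    record Section(String id, List<String> titleIds) {}
    record Page(Layout layout, List<Section> sections) {}

    // Step 1: pick a layout (e.g. LOLOMO, Gallery, Feed) from configuration.
    static Layout selectLayout(PageRequest req) {
        return new Layout("LOLOMO", List.of("continue-watching", "trending-now"));
    }

    // Step 2: generate each section by calling the relevant microservices.
    static Section generateSection(PageRequest req, String sectionId) {
        return new Section(sectionId, List.of("title-1", "title-2"));
    }

    // Step 3: assemble the final page response for the client.
    static Page assemblePage(PageRequest req) {
        Layout layout = selectLayout(req);
        List<Section> sections = layout.sectionIds().stream()
                .map(id -> generateSection(req, id))
                .collect(Collectors.toList());
        return new Page(layout, sections);
    }

    public static void main(String[] args) {
        System.out.println(assemblePage(new PageRequest("profile-123", "home")));
    }
}
```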
How We Guarantee a Fast Experience
Our Defense: Resilience & Fallbacks
●Enforcing a 3-second hard timeout for all page loads.
●Serving cached content instantly if personalization models are
slow.
●Principle: A fast response is always better than a slow one.
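A minimal sketch of that defensive pattern, assuming a hypothetical personalizedPage() call and a pre-computed cached page (illustration only, not Netflix's actual code):

```java
// Sketch of a hard timeout with a cached fallback; names are assumptions.
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

public class TimeoutFallbackSketch {

    static String personalizedPage(String profileId) {
        // Stand-in for the full personalized page construction call.
        return "personalized-page-for-" + profileId;
    }

    static String cachedPage(String profileId) {
        // Stand-in for an unpersonalized or previously computed page.
        return "cached-page-for-" + profileId;
    }

    static CompletableFuture<String> buildPage(String profileId) {
        return CompletableFuture
                .supplyAsync(() -> personalizedPage(profileId))
                // Enforce the hard timeout: fall back to cached content
                // instead of letting a slow call reach the client.
                .completeOnTimeout(cachedPage(profileId), 3, TimeUnit.SECONDS)
                .exceptionally(t -> cachedPage(profileId));
    }

    public static void main(String[] args) {
        System.out.println(buildPage("profile-123").join());
    }
}
```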
How We Guarantee a Fast Experience
Our Offense: Fragmented Construction
●Achieving a 2.1-second P99 latency.
●Building the page in pieces, loading the next fragment
on-demand.
●Result: Faster initial loads and more dynamic personalization.
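One way to picture fragmented construction is a cursor-carrying fragment response, where the client requests the next piece only when it needs it. The request and response shapes below are assumptions for illustration:

```java
// Illustration of building a page in fragments with a continuation cursor.
// The request/response shapes are assumptions, not Netflix's actual API.
import java.util.List;
import java.util.Optional;

public class FragmentedPageSketch {

    record Fragment(List<String> rows, Optional<String> nextCursor) {}

    // Return only the next few rows plus a cursor the client can send back
    // when it needs the following fragment (e.g. as the user scrolls).
    static Fragment fetchFragment(String profileId, Optional<String> cursor) {
        int start = cursor.map(Integer::parseInt).orElse(0);
        List<String> rows = List.of("row-" + start, "row-" + (start + 1));
        Optional<String> next = start + 2 < 10
                ? Optional.of(Integer.toString(start + 2))
                : Optional.empty();
        return new Fragment(rows, next);
    }

    public static void main(String[] args) {
        Optional<String> cursor = Optional.empty();
        do {
            Fragment f = fetchFragment("profile-123", cursor);
            System.out.println(f.rows());
            cursor = f.nextCursor();
        } while (cursor.isPresent());
    }
}
```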
Dynamic Personalization
A Romantic Comedy Movies row may surface for members who just engaged with ‘Set It Up.’
Personalization Under Pressure
●Challenge: How do we deliver deep personalization with a
3-second P99 timeout?
●Constraint: At our scale, real-time personalization is a necessity.
●Bottleneck: Personalization is only as fast as its slowest data
source.
Fast Data for Personalization
●Personalization models consume
most of the latency budget.
●Model speed depends on fast,
real-time data inputs.
●Impression History provides critical
data.
[Diagram: Page Construction draws on Personalization Models, which consume data from the Impression History Service and other data providers]
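One common way to keep the slowest data source from dominating is to fan out to the data providers in parallel and combine their results. The sketch below is an assumed illustration of that idea, not Netflix's actual client code:

```java
// Illustrative fan-out: fetch impression history and other inputs in parallel
// so the models wait for the slowest source once, not serially. Names and
// data shapes are assumptions for illustration.
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class DataFanOutSketch {

    static List<String> impressionHistory(String profileId) {
        return List.of("title-1", "title-42");        // stand-in remote call
    }

    static List<String> otherSignals(String profileId) {
        return List.of("recently-watched:title-7");   // stand-in remote call
    }

    record ModelInputs(List<String> impressions, List<String> signals) {}

    static CompletableFuture<ModelInputs> gatherInputs(String profileId) {
        CompletableFuture<List<String>> impressions =
                CompletableFuture.supplyAsync(() -> impressionHistory(profileId));
        CompletableFuture<List<String>> signals =
                CompletableFuture.supplyAsync(() -> otherSignals(profileId));
        // Combine once both complete; overall latency is the max of the two,
        // which is why every data provider needs to be fast.
        return impressions.thenCombine(signals, ModelInputs::new);
    }

    public static void main(String[] args) {
        System.out.println(gatherInputs("profile-123").join());
    }
}
```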
Impression History Service
What Is an Impression?
■Any image asset that is presented to a user
■A certain % of the boxart being seen
■Up-down and to-and-fro scroll movement
■Cached impressions vs. new-request impressions
■Deduplication key and filters depend on the use case
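Since the slide only says that the deduplication key and filters depend on the use case, the following is a purely hypothetical example of one such filter and key:

```java
// Hypothetical impression event, visibility filter, and dedup key; the real
// key and filters vary by use case, as the slide notes.
import java.time.Instant;
import java.time.temporal.ChronoUnit;

public class ImpressionDedupSketch {

    record Impression(String profileId, String titleId, String row,
                      double percentVisible, Instant shownAt, boolean fromCache) {}

    // Example filter: only count impressions where enough of the boxart was seen.
    static boolean qualifies(Impression imp) {
        return imp.percentVisible() >= 0.5;
    }

    // Example dedup key: the same profile, title, and row within the same hour
    // collapses to a single impression.
    static String dedupKey(Impression imp) {
        Instant bucket = imp.shownAt().truncatedTo(ChronoUnit.HOURS);
        return imp.profileId() + "|" + imp.titleId() + "|" + imp.row() + "|" + bucket;
    }

    public static void main(String[] args) {
        Impression imp = new Impression("profile-123", "title-42", "trending-now",
                0.8, Instant.now(), false);
        if (qualifies(imp)) {
            System.out.println(dedupKey(imp));
        }
    }
}
```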
Scale of Impression History
Read path
■200K+ RPS at peak
■99th percentile latency ~120 ms
Impression History Evolution Timeline
1st Iteration: Impression History Service library
2nd Iteration: gRPC service on top of a database
3rd Iteration: gRPC request/response caching enabled
4th Iteration: Needed another, longer-term fix?
Design Goals
■Reduce read cost: cache what clients frequently read
■Support silent migration
■Easily configurable
■Improve latencies while not storing too much data in the cache
■Reduce read amplification and Cassandra load
Cache Design and Rollout
A raw impression cache that stores time- and size-bounded data with a 1-day lag time
■Started with a small test EVCache cluster
■Shadow testing with 1% of profiles and gradual dial-up from 10% to 100%
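A toy sketch of what "time- and size-bounded" could look like for a per-profile raw impression list before it is written to the cache; the bounds, field names, and layout are assumptions, not the actual EVCache schema:

```java
// Toy time- and size-bounded impression list, trimmed before it is written
// to the cache. Bounds and layout are assumptions for illustration.
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayDeque;
import java.util.Deque;

public class BoundedImpressionCacheSketch {

    record CachedImpression(String titleId, Instant shownAt) {}

    static final int MAX_ENTRIES = 1_000;                 // size bound per profile (assumed)
    static final Duration MAX_AGE = Duration.ofDays(30);  // time bound (assumed)

    // Keep only recent impressions, newest first, capped at MAX_ENTRIES.
    static Deque<CachedImpression> trim(Deque<CachedImpression> impressions, Instant now) {
        Deque<CachedImpression> trimmed = new ArrayDeque<>();
        for (CachedImpression imp : impressions) {
            boolean fresh = Duration.between(imp.shownAt(), now).compareTo(MAX_AGE) <= 0;
            if (fresh && trimmed.size() < MAX_ENTRIES) {
                trimmed.addLast(imp);
            }
        }
        return trimmed;
    }

    public static void main(String[] args) {
        Deque<CachedImpression> raw = new ArrayDeque<>();
        raw.add(new CachedImpression("title-1", Instant.now()));
        raw.add(new CachedImpression("title-2", Instant.now().minus(Duration.ofDays(60))));
        System.out.println(trim(raw, Instant.now())); // the stale entry is dropped
    }
}
```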
Improvements
■P90 latency dropped by more than 50%
■P50 latency improved to ~10ms
■50% reduction in Cassandra read throughput
■50% decrease in Cassandra CPU utilization
Future Work
■Cost optimizations
■Cassandra cluster operational improvements
■Check data retention
■Implement node-level caching to reduce P99 for Page Construction