Netflix's Scalable Page Construction with Real-Time Impression History by Saurabh Jaluka & Tulika Bhatt

ScyllaDB, 23 slides, Oct 15, 2025

About This Presentation

Netflix built a Page Construction Architecture to deliver scalable, efficient, and personalized experiences across devices from phones to TVs. This talk introduces the API-driven system that centralizes content and module selection, decouples business logic from clients, and standardizes data models...


Slide Content

A ScyllaDB Community
Saurabh Jaluka
Senior Software Engineer
Tulika Bhatt
Senior Software Engineer
Netflix's Scalable Page Construction
with Real-Time Impression History

Hello!

■Part of the platform team powering every Netflix page, ensuring
availability and low latency for hundreds of millions of users.
■To me, P99s reflect the real user experience at the edge.
■Outside of work, I enjoy experimenting in the kitchen and
embracing my newest role as a proud new dad.
■I manage core data systems at Netflix that power recommendations for
over 300M subscribers, processing trillions of events each year.
■I see P99s as the real measure of reliability in large-scale data
pipelines.
■Outside of work, I enjoy traveling, exploring new cuisines, and sharing
knowledge through mentoring and speaking.

GROWING, HIGHLY ENGAGED AUDIENCES
■> 300M: Global Netflix household members
■18,000: Number of titles in the catalog
■190: Number of countries
■188B: Number of hours viewed annually

Scale At Netflix

■Diverse Ecosystem: Supporting thousands of client devices and various
pages (Home, TV Shows, Movies, Search, etc.).
■Constant Flux: High velocity of A/B experiments and independent service
deployments.
■Resulting Challenge: We can’t precompute and cache every possible
outcome. Every page is assembled dynamically, on-demand.

Page Construction

What is it?
■Assembles responses from various
microservices.
■Dictates client layout from the server for
agility.
■Defines page structure via flexible,
non-hardcoded configurations.
Select Page Layout → Generate Sections → Assemble Page
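
The three steps on this slide can be sketched in a few lines. This is only an illustration of the flow, not Netflix's implementation: the `LAYOUTS` table, the section names, and the functions `select_layout`, `generate_section`, and `assemble_page` are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical layout configuration: page type -> ordered section kinds.
# Keeping this as data rather than code mirrors the "non-hardcoded
# configurations" idea from the slide.
LAYOUTS = {
    "home": ["billboard", "continue_watching", "trending"],
    "search": ["results"],
}


@dataclass
class Section:
    kind: str
    items: list


@dataclass
class Page:
    page_type: str
    sections: list = field(default_factory=list)


def select_layout(page_type):
    """Step 1: pick the ordered list of section kinds for this page."""
    return LAYOUTS[page_type]


def generate_section(kind, profile_id):
    """Step 2: stand-in for a call to the microservice that owns this section."""
    return Section(kind=kind, items=[f"{kind}-title-for-{profile_id}"])


def assemble_page(page_type, profile_id):
    """Step 3: combine the generated sections into one server-driven response."""
    kinds = select_layout(page_type)
    sections = [generate_section(kind, profile_id) for kind in kinds]
    return Page(page_type=page_type, sections=sections)


page = assemble_page("home", "p1")
print([section.kind for section in page.sections])
```

Because the layout lives in data, changing what a page contains means editing `LAYOUTS`, with no client release involved.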

Select Page Layout
LOLOMO, Gallery, Feed

Generate Sections

Assemble Page

How We Guarantee a Fast Experience
Our Defense: Resilience & Fallbacks

●Enforcing a 3-second hard timeout for all page loads.

●Serving cached content instantly if personalization models are
slow.

●Principle: A fast response is always better than a slow one.
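
A minimal sketch of the timeout-plus-fallback idea, assuming a single time budget around one model call. Only the 3-second figure comes from the slide. The function names and the cached row are hypothetical, and a real system would likely give each downstream call a smaller budget than the overall page timeout.

```python
import asyncio

HARD_TIMEOUT_SECONDS = 3.0  # from the slide; everything else here is assumed


async def personalized_rows(profile_id, delay):
    """Stand-in for a personalization model call that takes `delay` seconds."""
    await asyncio.sleep(delay)
    return [f"personalized-row-for-{profile_id}"]


def cached_rows(profile_id):
    """Stand-in for previously cached, less fresh content."""
    return ["cached-popular-row"]


async def build_rows(profile_id, delay, timeout=HARD_TIMEOUT_SECONDS):
    """Return personalized rows, or cached rows if the model misses the budget."""
    try:
        return await asyncio.wait_for(personalized_rows(profile_id, delay), timeout)
    except asyncio.TimeoutError:
        return cached_rows(profile_id)


# Short timeouts are used here only so the example finishes quickly.
fast = asyncio.run(build_rows("p1", delay=0.0, timeout=0.05))
slow = asyncio.run(build_rows("p1", delay=0.5, timeout=0.05))
print(fast, slow)
```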

How We Guarantee a Fast Experience
Our Offense: Fragmented Construction

●Achieving a 2.1-second P99 latency.

●Building the page in pieces, loading the next fragment
on-demand.

●Result: Faster initial loads and more dynamic personalization.
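
One way to picture fragmented construction is cursor-based paging over the page's sections, sketched below. The cursor scheme, the `build_fragment` helper, and the section names are assumptions for illustration. The slides do not say how fragments are addressed.

```python
def build_fragment(section_kinds, cursor, size):
    """Return one fragment of sections plus the cursor for the next fragment.

    `next_cursor` is None when no sections remain.
    """
    end = cursor + size
    chunk = section_kinds[cursor:end]
    next_cursor = end if end < len(section_kinds) else None
    return {"sections": chunk, "next_cursor": next_cursor}


kinds = ["billboard", "continue_watching", "trending", "comedies", "new_releases"]

# The client renders the first fragment right away and asks for the next
# one only when the member scrolls far enough to need it.
first = build_fragment(kinds, cursor=0, size=2)
second = build_fragment(kinds, cursor=first["next_cursor"], size=2)
third = build_fragment(kinds, cursor=second["next_cursor"], size=2)
print(first, second, third)
```

Since later fragments are built at request time, they can reflect whatever the member did after the first fragment was served.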

Dynamic Personalization
A Romantic Comedy Movies row may surface for members who just engaged with ‘Set It Up.’

Personalization Under Pressure
●Challenge: How do we deliver deep personalization within a
3-second hard timeout?

●Constraint: At our scale, real-time personalization is a necessity.

●Bottleneck: Personalization is only as fast as its slowest data
source.

Fast Data for Personalization
●Personalization models consume
most of the latency budget.

●Model speed depends on fast,
real-time data inputs.

●Impression History provides critical
data.

Page Construction → Personalization Models → Impression History Service, Other Data Providers

Impression History Service

What Is an Impression?
■Any image asset that is presented
to a user
■A certain % of the box art being seen
■Up-down and to-and-fro scroll
movement
■Cached impressions vs. new
request impressions
■Deduplication key and filters
depend on the use case
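
The last bullet can be made concrete with a small sketch in which the caller supplies both the deduplication key and the filter. The `Impression` fields, the 0.5 visibility threshold, and the `dedupe` helper are hypothetical and are not the service's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Impression:
    profile_id: str
    title_id: str
    row_id: str
    visible_fraction: float  # share of the box art that was on screen
    from_cache: bool         # True if the page was served from a cache


def dedupe(impressions, key, keep=lambda imp: True):
    """Keep the first impression per dedup key among those passing the filter."""
    seen = set()
    result = []
    for imp in impressions:
        if not keep(imp):
            continue
        k = key(imp)
        if k not in seen:
            seen.add(k)
            result.append(imp)
    return result


events = [
    Impression("p1", "t1", "row-a", 0.9, False),
    Impression("p1", "t1", "row-b", 0.8, False),  # same title, different row
    Impression("p1", "t2", "row-a", 0.2, False),  # mostly off screen
    Impression("p1", "t1", "row-a", 1.0, True),   # scrolled back, cached page
]

# Use case A: one impression per title, counting only well-visible box art.
by_title = dedupe(
    events,
    key=lambda imp: imp.title_id,
    keep=lambda imp: imp.visible_fraction >= 0.5,
)

# Use case B: one impression per (title, row), ignoring cached pages.
by_title_and_row = dedupe(
    events,
    key=lambda imp: (imp.title_id, imp.row_id),
    keep=lambda imp: not imp.from_cache,
)
print(len(by_title), len(by_title_and_row))
```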

Scale of Impression History
Read path

■200K+ RPS at peak
■99th percentile latency ~ 120 ms

Write path

■1.5K RPS globally
■End-to-end P90 latency ~ 10 s
■P90 latency ~ 300 ms

Impression History Evolution Timeline
1st Iteration: Impression History Service library
2nd Iteration: gRPC service on top of database
3rd Iteration: gRPC request response caching enabled
4th Iteration: Needed another longer term fix ??

Design Goals
■Reduce read cost: cache what clients frequently read.
■Support silent migration
■Easily configurable
■Improve latencies, at the same time, not store too much data in cache
■Reduce read amplification and Cassandra load

Cache Design and Rollout
A raw impression cache that stores time- and size-bounded data with a 1-day
lag time
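
As a rough illustration of "time- and size-bounded", the sketch below keeps a per-profile list capped by both entry age and entry count. It is an in-memory stand-in, not EVCache. The class name, the two limits, and the assumption that entries arrive in timestamp order are all hypothetical, and the 1-day lag is not modeled.

```python
from collections import deque


class BoundedImpressionCache:
    """Per-profile cache bounded by entry age and entry count (assumed knobs)."""

    def __init__(self, max_age_seconds, max_entries):
        self.max_age_seconds = max_age_seconds
        self.max_entries = max_entries
        self._data = {}  # profile_id -> deque of (timestamp, title_id), oldest first

    def add(self, profile_id, timestamp, title_id):
        entries = self._data.setdefault(profile_id, deque())
        entries.append((timestamp, title_id))
        # Size bound: drop the oldest entries beyond max_entries.
        while len(entries) > self.max_entries:
            entries.popleft()

    def get(self, profile_id, now):
        entries = self._data.get(profile_id, deque())
        # Time bound: drop entries older than max_age_seconds.
        while entries and now - entries[0][0] > self.max_age_seconds:
            entries.popleft()
        return [title_id for _, title_id in entries]


cache = BoundedImpressionCache(max_age_seconds=100, max_entries=3)
for ts, title in [(0, "t1"), (50, "t2"), (90, "t3"), (120, "t4")]:
    cache.add("p1", ts, title)

recent = cache.get("p1", now=160)
print(recent)
```

Older impressions that fall outside either bound would still have to come from the backing database, which is how a bounded cache can cut read load without holding the full history.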

■Started with a small test EVCache cluster
■Shadow testing with 1% of profiles and gradual dial-up from 10% to 100%
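
A gradual dial-up like the one described is commonly done by hashing each profile into a stable bucket and comparing it with the current rollout percentage. The sketch below assumes that approach. The slides do not describe the actual mechanism, and `bucket` and `in_rollout` are hypothetical helpers.

```python
import hashlib


def bucket(profile_id):
    """Map a profile to a stable bucket in [0, 100) using a hash of its id."""
    digest = hashlib.sha256(profile_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def in_rollout(profile_id, percent):
    """True if this profile is served by the new cache path at `percent` rollout."""
    return bucket(profile_id) < percent


profiles = [f"profile-{n}" for n in range(1000)]
at_10 = {p for p in profiles if in_rollout(p, 10)}
at_50 = {p for p in profiles if in_rollout(p, 50)}
at_100 = {p for p in profiles if in_rollout(p, 100)}
print(len(at_10), len(at_50), len(at_100))
```

Because the bucket depends only on the profile id, a profile admitted at 10% stays admitted at every later step, so the cache stays warm for it as the dial moves up.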

Improvements
■P90 latency dropped by more than 50%
■P50 latency improved to ~10ms
■50% reduction in Cassandra read throughput
■50% decrease in Cassandra CPU utilization

Future Work
■Cost optimizations
■Cassandra cluster operational improvements
■Check data retention
■Implement node-level caching to reduce P99 for Page Construction

Thank you! Let’s connect.
Saurabh Jaluka
https://www.linkedin.com/in/sjaluka/


Tulika Bhatt
https://www.linkedin.com/in/tulikabhatt/