Netflix's Scalable Page Construction with Real-Time Impression History by Saurabh Jaluka & Tulika Bhatt
ScyllaDB
About This Presentation
Netflix built a Page Construction Architecture to deliver scalable, efficient, and personalized experiences across devices from phones to TVs. This talk introduces the API-driven system that centralizes content and module selection, decouples business logic from clients, and standardizes data models for faster innovation. We’ll highlight the Impression History Service API, which tracks interactions and enables near-real-time page optimization, driving consistency and personalization at scale.
Size: 4.55 MB
Language: en
Added: Oct 15, 2025
Slides: 23 pages
Slide Content
A ScyllaDB Community
Saurabh Jaluka
Senior Software Engineer
Tulika Bhatt
Senior Software Engineer
Netflix's Scalable Page Construction
with Real-Time Impression History
Hello!
■Part of the platform team powering every Netflix page, ensuring
availability and low latency for hundreds of millions of users.
■To me, P99s reflect the real user experience at the edge.
■Outside of work, I enjoy experimenting in the kitchen and
embracing my newest role as a proud new dad.
■I manage core data systems at Netflix that power recommendations for
over 300M subscribers, processing trillions of events each year.
■I see P99s as the real measure of reliability in large-scale data
pipelines.
■Outside of work, I enjoy traveling, exploring new cuisines, and sharing
knowledge through mentoring and speaking.
GROWING, HIGHLY ENGAGED AUDIENCES
> 300M global Netflix household members
18,000 titles in the catalog
190 countries
188B hours viewed annually
Scale At Netflix
■Diverse Ecosystem: Supporting thousands of client devices and various
pages (Home, TV Shows, Movies, Search, etc.).
■Constant Flux: High velocity of A/B experiments and independent service
deployments.
■Resulting Challenge: We can’t precompute and cache every possible
outcome. Every page is assembled dynamically, on-demand.
Page Construction
What is it?
■Assembles responses from various
microservices.
■Dictates client layout from the server for
agility.
■Defines page structure via flexible,
non-hardcoded configurations.
Select Page Layout → Generate Sections → Assemble Page
Select Page Layout: LOLOMO | Gallery | Feed
Generate Sections
Assemble Page
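As a rough illustration of this three-step flow, here is a minimal sketch; the class and method names below are assumptions for the sketch, not Netflix's actual interfaces:

```java
// Hypothetical illustration of the select -> generate -> assemble flow.
// Names and types are assumptions, not Netflix's actual interfaces.
import java.util.List;
import java.util.stream.Collectors;

public class PageConstructionSketch {

    record PageRequest(String profileId, String pageType) {}
    record Layout(String name, List<String> sectionIds) {}
    record Section(String id, List<String> titleIds) {}
    record Page(Layout layout, List<Section> sections) {}

    // Step 1: pick a layout (e.g. LOLOMO, Gallery, Feed) from configuration.
    static Layout selectLayout(PageRequest req) {
        return new Layout("LOLOMO", List.of("continue-watching", "trending-now"));
    }

    // Step 2: generate each section by calling the relevant microservices.
    static Section generateSection(PageRequest req, String sectionId) {
        return new Section(sectionId, List.of("title-1", "title-2"));
    }

    // Step 3: assemble the final page response for the client.
    static Page assemblePage(PageRequest req) {
        Layout layout = selectLayout(req);
        List<Section> sections = layout.sectionIds().stream()
                .map(id -> generateSection(req, id))
                .collect(Collectors.toList());
        return new Page(layout, sections);
    }

    public static void main(String[] args) {
        System.out.println(assemblePage(new PageRequest("profile-123", "home")));
    }
}
```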
How We Guarantee a Fast Experience
Our Defense: Resilience & Fallbacks
●Enforcing a 3-second hard timeout for all page loads.
●Serving cached content instantly if personalization models are
slow.
●Principle: A fast response is always better than a slow one.
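A minimal sketch of that defensive pattern, assuming a hypothetical personalizedPage() call and a pre-computed cached page (illustration only, not Netflix's actual code):

```java
// Sketch of a hard timeout with a cached fallback; names are assumptions.
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

public class TimeoutFallbackSketch {

    static String personalizedPage(String profileId) {
        // Stand-in for the full personalized page construction call.
        return "personalized-page-for-" + profileId;
    }

    static String cachedPage(String profileId) {
        // Stand-in for an unpersonalized or previously computed page.
        return "cached-page-for-" + profileId;
    }

    static CompletableFuture<String> buildPage(String profileId) {
        return CompletableFuture
                .supplyAsync(() -> personalizedPage(profileId))
                // Enforce the hard timeout: fall back to cached content
                // instead of letting a slow call reach the client.
                .completeOnTimeout(cachedPage(profileId), 3, TimeUnit.SECONDS)
                .exceptionally(t -> cachedPage(profileId));
    }

    public static void main(String[] args) {
        System.out.println(buildPage("profile-123").join());
    }
}
```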
How We Guarantee a Fast Experience
Our Offense: Fragmented Construction
●Achieving a 2.1-second P99 latency.
●Building the page in pieces, loading the next fragment
on-demand.
●Result: Faster initial loads and more dynamic personalization.
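One way to picture fragmented construction is a cursor-carrying fragment response, where the client requests the next piece only when it needs it. The request and response shapes below are assumptions for illustration:

```java
// Illustration of building a page in fragments with a continuation cursor.
// The request/response shapes are assumptions, not Netflix's actual API.
import java.util.List;
import java.util.Optional;

public class FragmentedPageSketch {

    record Fragment(List<String> rows, Optional<String> nextCursor) {}

    // Return only the next few rows plus a cursor the client can send back
    // when it needs the following fragment (e.g. as the user scrolls).
    static Fragment fetchFragment(String profileId, Optional<String> cursor) {
        int start = cursor.map(Integer::parseInt).orElse(0);
        List<String> rows = List.of("row-" + start, "row-" + (start + 1));
        Optional<String> next = start + 2 < 10
                ? Optional.of(Integer.toString(start + 2))
                : Optional.empty();
        return new Fragment(rows, next);
    }

    public static void main(String[] args) {
        Optional<String> cursor = Optional.empty();
        do {
            Fragment f = fetchFragment("profile-123", cursor);
            System.out.println(f.rows());
            cursor = f.nextCursor();
        } while (cursor.isPresent());
    }
}
```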
Dynamic Personalization
A Romantic Comedy Movies row may surface for members who just engaged with ‘Set It Up.’
Personalization Under Pressure
●Challenge: How do we deliver deep personalization with a
3-second P99 timeout?
●Constraint: At our scale, real-time personalization is a necessity.
●Bottleneck: Personalization is only as fast as its slowest data
source.
Fast Data for Personalization
●Personalization models consume
most of the latency budget.
●Model speed depends on fast,
real-time data inputs.
●Impression History provides critical
data.
[Diagram: Page Construction draws on Personalization Models, which consume data from the Impression History Service and other data providers]
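One common way to keep the slowest data source from dominating is to fan out to the data providers in parallel and combine their results. The sketch below is an assumed illustration of that idea, not Netflix's actual client code:

```java
// Illustrative fan-out: fetch impression history and other inputs in parallel
// so the models wait for the slowest source once, not serially. Names and
// data shapes are assumptions for illustration.
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class DataFanOutSketch {

    static List<String> impressionHistory(String profileId) {
        return List.of("title-1", "title-42");        // stand-in remote call
    }

    static List<String> otherSignals(String profileId) {
        return List.of("recently-watched:title-7");   // stand-in remote call
    }

    record ModelInputs(List<String> impressions, List<String> signals) {}

    static CompletableFuture<ModelInputs> gatherInputs(String profileId) {
        CompletableFuture<List<String>> impressions =
                CompletableFuture.supplyAsync(() -> impressionHistory(profileId));
        CompletableFuture<List<String>> signals =
                CompletableFuture.supplyAsync(() -> otherSignals(profileId));
        // Combine once both complete; overall latency is the max of the two,
        // which is why every data provider needs to be fast.
        return impressions.thenCombine(signals, ModelInputs::new);
    }

    public static void main(String[] args) {
        System.out.println(gatherInputs("profile-123").join());
    }
}
```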
Impression History Service
What Is an Impression?
■Any image asset that is presented to a user
■A certain % of the boxart being seen
■Up-down and to-and-fro scroll movement
■Cached impressions vs. new-request impressions
■Deduplication key and filters depend on the use case
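Since the slide only says that the deduplication key and filters depend on the use case, the following is a purely hypothetical example of one such filter and key:

```java
// Hypothetical impression event, visibility filter, and dedup key; the real
// key and filters vary by use case, as the slide notes.
import java.time.Instant;
import java.time.temporal.ChronoUnit;

public class ImpressionDedupSketch {

    record Impression(String profileId, String titleId, String row,
                      double percentVisible, Instant shownAt, boolean fromCache) {}

    // Example filter: only count impressions where enough of the boxart was seen.
    static boolean qualifies(Impression imp) {
        return imp.percentVisible() >= 0.5;
    }

    // Example dedup key: the same profile, title, and row within the same hour
    // collapses to a single impression.
    static String dedupKey(Impression imp) {
        Instant bucket = imp.shownAt().truncatedTo(ChronoUnit.HOURS);
        return imp.profileId() + "|" + imp.titleId() + "|" + imp.row() + "|" + bucket;
    }

    public static void main(String[] args) {
        Impression imp = new Impression("profile-123", "title-42", "trending-now",
                0.8, Instant.now(), false);
        if (qualifies(imp)) {
            System.out.println(dedupKey(imp));
        }
    }
}
```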
Scale of Impression History
Read path
■200K+ RPS at peak
■99th percentile latency ~120 ms
Impression History Evolution Timeline
1st Iteration: Impression History Service library
2nd Iteration: gRPC service on top of a database
3rd Iteration: gRPC request/response caching enabled
4th Iteration: Needed another, longer-term fix?
Design Goals
■Reduce read cost: cache what clients frequently read
■Support silent migration
■Easily configurable
■Improve latencies while not storing too much data in the cache
■Reduce read amplification and Cassandra load
Cache Design and Rollout
A raw impression cache that stores time- and size-bounded data with a 1-day lag time
■Started with a small test EVCache cluster
■Shadow testing with 1% of profiles and gradual dial-up from 10% to 100%
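A toy sketch of what "time- and size-bounded" could look like for a per-profile raw impression list before it is written to the cache; the bounds, field names, and layout are assumptions, not the actual EVCache schema:

```java
// Toy time- and size-bounded impression list, trimmed before it is written
// to the cache. Bounds and layout are assumptions for illustration.
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayDeque;
import java.util.Deque;

public class BoundedImpressionCacheSketch {

    record CachedImpression(String titleId, Instant shownAt) {}

    static final int MAX_ENTRIES = 1_000;                 // size bound per profile (assumed)
    static final Duration MAX_AGE = Duration.ofDays(30);  // time bound (assumed)

    // Keep only recent impressions, newest first, capped at MAX_ENTRIES.
    static Deque<CachedImpression> trim(Deque<CachedImpression> impressions, Instant now) {
        Deque<CachedImpression> trimmed = new ArrayDeque<>();
        for (CachedImpression imp : impressions) {
            boolean fresh = Duration.between(imp.shownAt(), now).compareTo(MAX_AGE) <= 0;
            if (fresh && trimmed.size() < MAX_ENTRIES) {
                trimmed.addLast(imp);
            }
        }
        return trimmed;
    }

    public static void main(String[] args) {
        Deque<CachedImpression> raw = new ArrayDeque<>();
        raw.add(new CachedImpression("title-1", Instant.now()));
        raw.add(new CachedImpression("title-2", Instant.now().minus(Duration.ofDays(60))));
        System.out.println(trim(raw, Instant.now())); // the stale entry is dropped
    }
}
```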
Improvements
■P90 latency dropped by more than 50%
■P50 latency improved to ~10ms
■50% reduction in Cassandra read throughput
■50% decrease in Cassandra CPU utilization
Future Work
■Cost optimizations
■Cassandra cluster operational improvements
■Check data retention
■Implement node-level caching to reduce P99 for Page Construction