Zero Downtime Critical Traffic Migration @Netflix Scale
ScyllaDB
25 slides
Jun 24, 2024
About This Presentation
Picture yourself enthralled by the latest episode of your beloved Netflix series, delighting in an uninterrupted, high-definition streaming experience. Behind these perfect moments of entertainment is a complex mechanism, with numerous gears and cogs working in harmony. But what happens when this machinery needs a transformation? This is where large-scale system migrations come into play.
Come join us to learn about how Netflix does these migrations at scale with replay traffic testing, canary analysis, dual writes/reads, and with NO downtime.
Slide Content
Zero Downtime Critical Traffic Migration @Netflix Scale. Abhishek Pandey, Tech Lead at Meta, ex-Senior Engineer at Netflix.
Abhishek Pandey (he/him/his), Tech Lead at Meta. Migrated and modernized a bunch of critical Netflix components. I explore nature with my wife and newborn, travel, and play tennis. Two truths and a lie: caused a global outage at Uber, met Roger Federer, got fired once.
Introduction: Picture yourself enthralled by the latest episode of your beloved Netflix series, delighting in an uninterrupted, high-definition streaming experience. Behind these perfect moments of entertainment is a complex mechanism, with numerous gears and cogs working in harmony. Large-scale system migrations are necessary when this machinery needs transformation.
Introduction: Challenges in transitioning traffic. Netflix's challenge: uninterrupted streaming. Backend systems: orchestrating the product experience. Evolution and optimization of backend systems. Focus of this talk: migration strategies.
Challenges of System Migrations. Main challenge, transitioning traffic with no customer impact: ensuring confidence in the upgraded architecture; strategies to meet Quality-of-Experience metrics. Architecture of backend systems: distributed microservices architecture; migration points across the service call graph; stateless and stateful APIs involved.
Replay Traffic Testing: What is replay traffic testing? Benefits of using replay traffic: sandboxed testing at scale; exercising a diversity of inputs; functional correctness, performance validation, and load testing.
Replay Traffic Testing Components. Component 1, Traffic Duplication and Correlation: clone and fork production traffic; record and correlate responses. Component 2, Comparative Analysis and Reporting: compare and analyze responses; generate comprehensive reports.
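The two components above can be illustrated with a small sketch. This is not Netflix's implementation; it is a minimal, in-process illustration that assumes each service can be modeled as a function from a request dict to a response dict. All names here (`ReplayRecord`, `fork_request`, `build_report`, `legacy_service`, `candidate_service`) are hypothetical and do not come from the slides.

```python
import uuid
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

# Assumption: a service is modeled as a plain function (request -> response).
Handler = Callable[[Dict[str, Any]], Dict[str, Any]]

_MISSING = object()  # sentinel so an absent field differs from a None value


@dataclass
class ReplayRecord:
    """One forked request with both responses, tied by a correlation id."""
    correlation_id: str
    request: Dict[str, Any]
    legacy_response: Dict[str, Any]
    candidate_response: Dict[str, Any]

    @property
    def mismatched_fields(self) -> List[str]:
        # Component 2: field-by-field comparison of the two responses.
        keys = set(self.legacy_response) | set(self.candidate_response)
        return sorted(
            k for k in keys
            if self.legacy_response.get(k, _MISSING)
            != self.candidate_response.get(k, _MISSING)
        )


def fork_request(request: Dict[str, Any], legacy: Handler,
                 candidate: Handler) -> ReplayRecord:
    """Component 1: clone the request, send it down both paths, record both."""
    correlation_id = str(uuid.uuid4())
    legacy_response = legacy(dict(request))  # production path
    try:
        candidate_response = candidate(dict(request))  # replay path
    except Exception as exc:
        # A failure on the replay path is recorded, never propagated.
        candidate_response = {"error": repr(exc)}
    return ReplayRecord(correlation_id, request,
                        legacy_response, candidate_response)


def build_report(records: List[ReplayRecord]) -> Dict[str, Any]:
    """Component 2: summarize mismatches across all recorded pairs."""
    mismatches = [r for r in records if r.mismatched_fields]
    total = len(records)
    return {
        "total": total,
        "mismatched": len(mismatches),
        "mismatch_rate": len(mismatches) / total if total else 0.0,
        "details": {r.correlation_id: r.mismatched_fields for r in mismatches},
    }


# Hypothetical stand-ins for the existing and the migrated service.
def legacy_service(request: Dict[str, Any]) -> Dict[str, Any]:
    return {"title_id": request["title_id"], "quality": "HD"}


def candidate_service(request: Dict[str, Any]) -> Dict[str, Any]:
    quality = "UHD" if request["title_id"] == 2 else "HD"
    return {"title_id": request["title_id"], "quality": quality}


if __name__ == "__main__":
    records = [fork_request({"title_id": i}, legacy_service, candidate_service)
               for i in range(1, 4)]
    print(build_report(records))
```

In this sketch only the legacy response would be returned to the caller; the candidate response exists solely for comparison, which is what keeps the replay path free of customer impact. A real system would fork asynchronously and persist the records rather than hold them in memory.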
Approaches for Replay Traffic Generation: Device-Driven Approach
Approaches for Replay Traffic Generation: Server-Driven Approach
Approaches for Replay Traffic Generation: Dedicated Service Approach
Clean Up: Cleanup and optimization. Removing migration-related code. Documentation for future migrations.
Conclusion: Utilized diverse techniques for various migrations. Achieved success with minimal downtime. Gained valuable insights and refined methods. Customized strategies for unique migration scenarios. Goal: seamless migrations without disruptions.