Building resilient serverless workloads: Navigating through failures
Jimmy Dahlqvist
About This Presentation
My talk "Building resilient serverless workloads: Navigating through failures" from AWS Community Day DACH
Added: Sep 19, 2024
Slides: 35 pages
Slide Content
Building resilient serverless workloads: Navigating through failures JIMMY DAHLQVIST | 2024-09-17
Thank You!
Hello, I'm JIMMY DAHLQVIST: Head of AWS @ Sigma Technology Cloud; Founder of serverless-handbook.com; AWS Serverless Hero ☁️ User Group Leader ☁️ AWS Ambassador
Agenda: What is serverless and resiliency; Architecting resilient systems: good practices; Summary; What did you learn?
What is serverless? Automatic and flexible scaling; No capacity planning; High availability; Pay-for-use billing
What is resiliency? The ability of a software solution to handle the impact of problems, and to recover from turbulent conditions, when other parts of the system fail.
“Everything fails all the time.” Dr. Werner Vogels, CTO, Amazon.com
Understand AWS Services: Everything has a limit; Understand how services work under the hood
Resiliency testing (in prod-ish): Chaos Engineering; AWS Fault Injection Service; Start in QA; Don’t forget about data
Web application
Do we need an immediate response?
Storage-First
Storage-First: Data-centric design; Durability and availability; Scalable system design; Asynchronous processing
Storage-First, things to consider: Architectural complexity; Eventual consistency; Design for idempotency; Risk of over-optimization
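The storage-first and idempotency points above can be illustrated with a minimal sketch. It is not taken from the talk: the class `StorageFirstIngest`, the in-memory `deque` standing in for a durable queue or table, and the caller-supplied idempotency key are all assumptions made for illustration.

```python
import json
from collections import deque


class StorageFirstIngest:
    """Store each incoming request first, then process it later (hypothetical sketch)."""

    def __init__(self):
        self.queue = deque()   # stand-in for a durable queue or table
        self.processed = {}    # idempotency key -> result of the first run

    def accept(self, idempotency_key, payload):
        # Persist the raw message before doing any work, then acknowledge.
        self.queue.append(json.dumps({"key": idempotency_key, "payload": payload}))
        return {"status": "accepted"}

    def process_next(self, handler):
        # Asynchronous worker step: take one stored message and handle it.
        if not self.queue:
            return None
        message = json.loads(self.queue.popleft())
        key = message["key"]
        if key in self.processed:
            # Duplicate delivery: return the earlier result, do not redo the work.
            return self.processed[key]
        result = handler(message["payload"])
        self.processed[key] = result
        return result
```

In this sketch the caller gets an immediate "accepted" answer, and a duplicate message with the same key is handled only once, which is what makes retries and redeliveries safe.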
Retries: Selfish; Exponential backoff; Users can make it worse
Retries with backoff and jitter: No Jitter vs. With Jitter. Image: Amazon Architecture blog (https://tinyurl.com/y48t2v4h)
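A minimal sketch of retries with exponential backoff and jitter follows. It uses the "full jitter" variant, where each delay is drawn uniformly between zero and the capped exponential ceiling. The function names, default values, and the injectable `sleep` and `rng` parameters are assumptions for illustration, not something shown in the slides.

```python
import random
import time


def backoff_delay(attempt, base=0.1, cap=10.0, rng=random):
    """Full jitter: a random delay in [0, min(cap, base * 2**attempt)]."""
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0, ceiling)


def retry(operation, max_attempts=5, base=0.1, cap=10.0, sleep=time.sleep, rng=random):
    """Call operation until it succeeds or max_attempts is reached."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller deal with the failure
            sleep(backoff_delay(attempt, base, cap, rng))
```

The randomness spreads retries from many clients over time, so they do not all hit a recovering service at the same instant.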
DLQ
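The dead-letter queue idea can be sketched as below, assuming a simple in-memory queue. The function `drain`, the `receives` counter, and the `max_receives` limit are hypothetical names chosen for illustration; a managed queue service would normally do this bookkeeping for you.

```python
from collections import deque


def drain(queue, dead_letter_queue, handler, max_receives=3):
    """Process messages; park the ones that keep failing in the DLQ."""
    while queue:
        message = queue.popleft()
        message["receives"] = message.get("receives", 0) + 1
        try:
            handler(message["body"])
        except Exception as error:
            if message["receives"] >= max_receives:
                # Give up: keep the message and the reason for later inspection.
                message["error"] = repr(error)
                dead_letter_queue.append(message)
            else:
                # Put it back for another delivery attempt.
                queue.append(message)
```

A failing message is tried a bounded number of times and then set aside, so it neither blocks the queue nor gets lost.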
Circuit breaker
Circuit breaker: Half Open
Circuit breaker: Avoid cascading failures; Protect system resources; Risk of early circuit break; Good observability required
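A minimal circuit breaker sketch with closed, open, and half-open states is shown below. The class, the threshold and timeout defaults, and the injectable `clock` are assumptions for illustration; a serverless implementation would typically keep this state in a shared store rather than in process memory.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.state = "closed"
        self.opened_at = None

    def call(self, operation):
        if self.state == "open":
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit is open; failing fast")
            self.state = "half_open"  # timeout elapsed: allow one trial call
        try:
            result = operation()
        except Exception:
            self.failures += 1
            if self.state == "half_open" or self.failures >= self.failure_threshold:
                # Trip (or re-trip) the breaker and restart the timeout.
                self.state = "open"
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.state = "closed"
        return result
```

Choosing `failure_threshold` too low is the "risk of early circuit break" noted above, and the state transitions are the natural place to emit metrics or logs for observability.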
Put it all together
Notification Service Payment Service
What we talked about: Design for failure; Buffer and store messages first; Process asynchronously; Level the load; Retry on failures; Break if integrations are not healthy
Quiz with me!
THANK YOU @jimmydahlqvist dahlqvistjimmy https://serverless-handbook.com https://jimmydqv.com