Making sense of service quotas of AWS Serverless services and how to deal with them at AWS SLA London 2024

VadymKazulkin 65 views 66 slides May 03, 2024

About This Presentation

There is a misunderstanding that everything is possible with the Serverless services in AWS: for example, that your Lambda function can scale without limitation. But each AWS service (not only the Serverless ones) has a long list of quotas that everybody needs to be aware of, understand,...


Slide Content

Making sense of service quotas of AWS Serverless services and how to deal with them. Vadym Kazulkin, ip.labs, Serverless Architecture Conference, 9 April 2024

Contact
Vadym Kazulkin, ip.labs GmbH, Bonn, Germany
Co-Organizer of the Java User Group Bonn
[email protected]
@VKazulkin
https://dev.to/vkazulkin
https://github.com/Vadym79/
https://de.slideshare.net/VadymKazulkin/
https://www.linkedin.com/in/vadymkazulkin
https://www.iplabs.de/

ip.labs https://www.iplabs.de/

Image: burst.shopify.com/photos/a-look-across-the-landscape-with-view-of-the-sea
Agenda
- Start with some basic AWS Serverless application
- Look at the various service quotas from different perspectives:
  - (Hyper) scaling
  - Other important quotas to be aware of
- Re-architect our application to be more scalable, performant and resilient

Serverless Application

Service Quotas

Service Quotas

Service Quotas: account and current Region limits

Service Quotas

Service Quotas: Request History

Service Quotas: Request History

Serverless Application

API Gateway: Important Service Quotas
- Default throughput / throttle rate: the maximum number of requests per second that your APIs can receive; value: 10,000
- Throttle burst rate: the maximum number of additional requests per second that you can send in one burst; value: 5,000

Service Quotas
https://aws.amazon.com/de/blogs/compute/building-well-architected-serverless-applications-controlling-serverless-api-access-part-2/
https://docs.aws.amazon.com/apigateway/latest/developerguide/api-gateway-request-throttling.html
The throttle rate determines how many requests are allowed per second; the throttle burst determines how many additional requests are allowed per second (token bucket algorithm).
API Gateway throttling-related settings are applied in the following order:
1. Per-client or per-method throttling limits that you set for an API stage in a usage plan
2. Per-method throttling limits that you set for an API stage
3. Account-level throttling per Region
4. AWS Regional throttling

API Gateway: Important Service Quotas
- Max timeout: the maximum integration timeout; value: 29 seconds
- API payload size: maximum payload size for non-WebSocket APIs; value: 10 MB
Mitigation for the payload size limit:
1. The client makes an HTTP GET request to API Gateway, and the Lambda function generates and returns a presigned S3 URL
2. The client uploads the image to S3 directly, using the presigned S3 URL

Serverless Application

Lambda: Important Service Quotas
- Concurrent executions / concurrency limit: the maximum number of events that functions can process simultaneously in the current Region; value: 1,000; mitigation: re-architect
- Burst concurrency limit: after the initial burst, concurrency scales by 1,000 executions every 10 seconds up to your account concurrency limit; each function within an account now scales independently of the others. Value: US West (Oregon), US East (N. Virginia), Europe (Ireland): 3,000; Asia Pacific (Tokyo), Europe (Frankfurt), US East (Ohio): 1,000; all other Regions: 500. Mitigation: use provisioned concurrency

Lambda Concurrency
https://aws.amazon.com/de/blogs/compute/understanding-aws-lambdas-invoke-throttle-limits/

Lambda Concurrency and TPS
https://docs.aws.amazon.com/lambda/latest/dg/lambda-concurrency.html
Concurrency is the number of in-flight requests your AWS Lambda function is handling at the same time

Lambda Concurrency and TPS
https://docs.aws.amazon.com/lambda/latest/dg/lambda-concurrency.html

Lambda Concurrency and TPS
https://aws.amazon.com/de/blogs/aws/aws-lambda-functions-now-scale-12-times-faster-when-handling-high-volume-requests/

Lambda Concurrency and TPS
https://aws.amazon.com/de/blogs/compute/understanding-aws-lambdas-invoke-throttle-limits/
The Lambda concurrency limit is a limit on the simultaneous in-flight invocations allowed at the same time.
Transactions per second (TPS) = concurrency / function duration in seconds

Lambda Burst Limit and Cold Starts
https://aws.amazon.com/de/blogs/compute/understanding-aws-lambdas-invoke-throttle-limits/
If there are sudden and steep spikes in the number of cold starts, they can put pressure on the invoke services that handle these cold start operations, and also cause undesirable side effects for your application such as increased latencies, reduced cache efficiency and increased fan-out on downstream dependencies. The burst limit exists to protect against such surges of cold starts, especially for accounts that have a high concurrency limit.

Lambda Burst Limit
https://aws.amazon.com/de/blogs/compute/understanding-aws-lambdas-invoke-throttle-limits/
https://docs.aws.amazon.com/lambda/latest/dg/burst-concurrency.html
The chart above shows the burst limit in action with a maximum concurrency limit of 3,000, a maximum burst (B) of 1,000 and a refill rate (r) of 500/minute. The token bucket starts full with 1,000 tokens, which is the available burst headroom.
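The token-bucket behavior described on this slide can be sketched as a small simulation. This is a hypothetical model (parameter names are illustrative, not AWS's actual implementation): the bucket starts full, each unit of new concurrency consumes one token, and tokens refill at a fixed rate per minute.

```javascript
// Hypothetical model of the burst token bucket described above.
function simulateBurst({ bucketSize, refillPerMinute, minutes, demandPerMinute, accountLimit }) {
  let tokens = bucketSize; // the bucket starts full
  let concurrency = 0;
  const trace = [];
  for (let m = 0; m < minutes; m++) {
    // grant as much of the demand as tokens and the account limit allow
    const granted = Math.min(demandPerMinute, tokens, accountLimit - concurrency);
    tokens -= granted;
    concurrency += granted;
    trace.push(concurrency);
    // refill r tokens per minute, capped at the bucket size
    tokens = Math.min(bucketSize, tokens + refillPerMinute);
  }
  return trace;
}

// With B = 1,000, r = 500/minute and an account limit of 3,000, a sudden
// spike is served 1,000 up front, then grows by 500 per minute:
console.log(simulateBurst({
  bucketSize: 1000, refillPerMinute: 500,
  minutes: 5, demandPerMinute: 10000, accountLimit: 3000,
})); // [ 1000, 1500, 2000, 2500, 3000 ]
```

This matches the chart's shape: an immediate jump to the burst headroom, then a linear climb to the concurrency limit.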

Lambda: Important Service Quotas
- TPS (transactions per second): the maximum number of TPS; value: TPS = min(10 × concurrency, concurrency / function duration in seconds)
If the function duration is exactly 100 ms (1/10th of a second), both terms in the min function are equal.
If the function duration is over 100 ms, the second term is lower and TPS is limited by concurrency / function duration.
If the function duration is under 100 ms, the first term is lower and TPS is limited by 10 × concurrency.
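The case analysis above can be made concrete with a one-line sketch of the formula (durations in seconds):

```javascript
// TPS = min(10 x concurrency, concurrency / function duration in seconds)
function maxTps(concurrency, durationSeconds) {
  return Math.min(10 * concurrency, concurrency / durationSeconds);
}

console.log(maxTps(1000, 0.1));  // 100 ms: both terms equal -> 10000
console.log(maxTps(1000, 0.5));  // 500 ms: duration-limited   -> 2000
console.log(maxTps(1000, 0.05)); //  50 ms: capped at 10 x concurrency -> 10000
```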

Lambda TPS Limit
https://aws.amazon.com/de/blogs/compute/understanding-aws-lambdas-invoke-throttle-limits/
https://www.linkedin.com/pulse/how-aws-lambda-works-underneath-shwetabh-shekhar/
The burst limit isn’t a rate limit on the invoke itself, but a rate limit on how quickly concurrency can rise. However, since invoke TPS is a function of concurrency, it also clamps how quickly TPS can rise.

Lambda TPS Limit
https://aws.amazon.com/de/blogs/compute/understanding-aws-lambdas-invoke-throttle-limits/
https://www.linkedin.com/pulse/how-aws-lambda-works-underneath-shwetabh-shekhar/
The TPS limit exists to protect the Invoke Data Plane from the high churn of short-lived invocations. For short invocations of under 100 ms, throughput is capped as though the function duration were 100 ms (at 10 × concurrency). This implies that short-lived invocations may be TPS-limited, rather than concurrency-limited.

Lambda: Important Service Quotas
- Concurrent executions / concurrency limit: the maximum number of events that functions can process simultaneously in the current Region; value: 1,000; mitigation: re-architect
- Burst concurrency limit: after the initial burst, concurrency scales by 1,000 executions every 10 seconds up to your account concurrency limit; each function within an account now scales independently of the others. Value: US West (Oregon), US East (N. Virginia), Europe (Ireland): 3,000; Asia Pacific (Tokyo), Europe (Frankfurt), US East (Ohio): 1,000; all other Regions: 500. Mitigation: use provisioned concurrency

Lambda Function-level Concurrency

General Best Practices for using Lambda
- Optimize for cost-performance: use AWS Lambda Power Tuning
- Reuse AWS service clients/connections outside of the Lambda handler
- Use the newest version of the AWS SDK for the programming language of your choice
- Minimize dependencies and package size: import only the dependencies that you need (especially from the AWS SDK)
- Use a keep-alive directive to maintain persistent connections
- Implement (other) best practices to reduce cold starts
https://docs.aws.amazon.com/lambda/latest/dg/best-practices.html

Lambda: Important Service Quotas
- Function timeout: the maximum timeout that you can configure for a function; value: 15 min
- Synchronous payload: the maximum size of an incoming synchronous invocation request or outgoing response; value: 6 MB
Mitigation for the request: use an API Gateway service proxy to S3, or use a pre-signed S3 URL and upload directly to S3
Mitigation for the response: use response streaming (e.g. with the AWS Lambda Web Adapter)
https://theburningmonk.com/2020/04/hit-the-6mb-lambda-payload-limit-heres-what-you-can-do/

Lambda Response Streaming
https://aws.amazon.com/de/blogs/compute/introducing-aws-lambda-response-streaming/
You can use response streaming to send responses larger than Lambda’s 6 MB response payload limit, up to a soft limit of 20 MB. Response streaming currently supports the Node.js 14.x and subsequent managed runtimes.
To indicate to the runtime that Lambda should stream your function’s responses, you must wrap your function handler with the streamifyResponse() decorator. This tells the runtime to use the correct stream logic path, allowing the function to stream responses:
exports.handler = awslambda.streamifyResponse(
  async (event, responseStream, context) => {
    responseStream.setContentType("text/plain");
    responseStream.write("Hello, world!");
    responseStream.end();
  }
);

Lambda Response Streaming (with AWS Lambda Web Adapter)
https://aws.amazon.com/de/blogs/compute/using-response-streaming-with-aws-lambda-web-adapter-to-optimize-performance/
The Lambda Web Adapter, written in Rust, serves as a universal adapter between the Lambda Runtime API and HTTP APIs. It allows developers to package familiar HTTP 1.1/1.0 web applications, such as Express.js, Next.js, Flask, Spring Boot, or Laravel, and deploy them on AWS Lambda. This replaces the need to modify the web application to accommodate Lambda’s input and output formats, reducing the complexity of adapting code to meet Lambda’s requirements.

Lambda Response Streaming (with AWS Lambda Web Adapter)
https://github.com/awslabs/aws-lambda-web-adapter
https://aws.amazon.com/de/blogs/compute/using-response-streaming-with-aws-lambda-web-adapter-to-optimize-performance/
Spring Boot 3 example with AWS Lambda Web Adapter: https://github.com/Vadym79/AWSLambdaJavaWithSpringBoot/tree/master/spring-boot-3.2-with-lambda-web-adapter

Serverless Application

SQS (Standard): Important Service Quotas
- Throughput per standard queue: standard queues support a nearly unlimited number of transactions per second (TPS) per API action; value: nearly unlimited

Lambda scaling with SQS standard queues
https://aws.amazon.com/about-aws/whats-new/2023/11/aws-lambda-polling-scale-rate-sqs-event-source/?nc1=h_ls
https://aws.amazon.com/blogs/compute/introducing-faster-polling-scale-up-for-aws-lambda-functions-configured-with-amazon-sqs/
When a Lambda function subscribes to an SQS queue, Lambda polls the queue as it waits for messages to arrive. It consumes messages in batches, starting with 5 concurrent invocations at a time. If there are more messages in the queue, Lambda adds up to 300 concurrent executions per minute, up to 1,000 (or up to your account concurrency limit), to consume those messages from the SQS queue. This scaling behavior is managed by AWS and cannot be modified. To process more messages, you can optimize your Lambda configuration for higher throughput.

Lambda scaling with SQS standard queues
https://aws.amazon.com/de/blogs/compute/understanding-how-aws-lambda-scales-when-subscribed-to-amazon-sqs-queues/
https://docs.aws.amazon.com/lambda/latest/dg/with-sqs.html#services-sqs-batchfailurereporting
- Increase the allocated memory for your Lambda function
- Optimize batching behavior: by default, Lambda batches up to 10 messages from a queue to process them during a single Lambda execution. You can increase this number up to 10,000 messages, or up to 6 MB of messages in a single batch for standard SQS queues. If each payload is 256 KB (the maximum message size for SQS), Lambda can only take 23 messages per batch, regardless of the batch size setting
- Implement partial batch responses
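The interplay of the configured batch size and the 6 MB batch payload limit can be sketched as a simple calculation (function name and numbers are illustrative):

```javascript
// The effective batch size is the configured size, additionally capped by
// how many average-sized messages fit into the 6 MB batch payload limit.
function effectiveBatchSize(configuredSize, avgMessageBytes, batchLimitBytes = 6 * 1024 * 1024) {
  return Math.min(configuredSize, Math.floor(batchLimitBytes / avgMessageBytes));
}

// With maximum-size 256 KB messages, raw division gives 24 messages; in
// practice per-message overhead also counts toward the limit, which is why
// the slide cites 23.
console.log(effectiveBatchSize(10000, 256 * 1024)); // 24
```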

SQS (Standard): Important Service Quotas
- Throughput per standard queue: standard queues support a nearly unlimited number of transactions per second (TPS) per API action; value: nearly unlimited
- In-flight messages per standard queue: the number of in-flight messages (received from a queue by a consumer, but not yet deleted from the queue); value: 120,000
- Message size: the size of a message; value: 256 KB

Use BatchWriteItem for storing to DynamoDB
The BatchWriteItem operation puts or deletes multiple items in one or more tables. A single call to BatchWriteItem can transmit up to 16 MB of data over the network, consisting of up to 25 item put or delete operations.
https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_BatchWriteItem.html
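Because a single BatchWriteItem call accepts at most 25 put or delete operations, larger workloads have to be split into chunks first. A minimal sketch of that chunking step (the function name is illustrative):

```javascript
// Split an item array into batches of at most 25, the BatchWriteItem limit.
function chunkForBatchWrite(items, maxPerBatch = 25) {
  const batches = [];
  for (let i = 0; i < items.length; i += maxPerBatch) {
    batches.push(items.slice(i, i + maxPerBatch));
  }
  return batches;
}

const batches = chunkForBatchWrite(Array.from({ length: 60 }, (_, i) => i));
console.log(batches.length);    // 3
console.log(batches[2].length); // 10
```

Each resulting chunk would then be passed to one BatchWriteItem call; remember that the 16 MB payload limit applies per call as well.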

SQS FIFO
https://aws.amazon.com/blogs/compute/solving-complex-ordering-challenges-with-amazon-sqs-fifo-queues/
https://jayendrapatil.com/aws-sqs-standard-vs-fifo-queue/
SQS Standard Queue vs SQS FIFO Queue:
- Ordering: best-effort ordering (Standard) vs first-in-first-out ordering within the message group (FIFO)
- Delivery: at-least-once (Standard) vs exactly-once (FIFO)

SQS (FIFO): Important Service Quotas
- Batched message throughput for FIFO queues: the number of batched transactions per second (TPS) for FIFO queues; value: 3,000
- In-flight messages per FIFO queue: the number of in-flight messages in a FIFO queue; value: 20,000

SQS FIFO
https://aws.amazon.com/blogs/compute/solving-complex-ordering-challenges-with-amazon-sqs-fifo-queues/
https://jayendrapatil.com/aws-sqs-standard-vs-fifo-queue/

SQS FIFO Message Groups and Multiple Consumers
https://aws.amazon.com/blogs/compute/solving-complex-ordering-challenges-with-amazon-sqs-fifo-queues/
https://jayendrapatil.com/aws-sqs-standard-vs-fifo-queue/
The combination of increased messages and extra processing time for the new features means that a single consumer is too slow. The solution is to scale to more consumers and process messages in parallel. To work in parallel, only the messages related to a single auction must be kept in order. FIFO can handle that case with a feature called message groups: each transaction related to auction A is placed by your producer into message group A, and so on.

SQS FIFO Message Groups and Multiple Consumers
https://aws.amazon.com/blogs/compute/solving-complex-ordering-challenges-with-amazon-sqs-fifo-queues/
https://jayendrapatil.com/aws-sqs-standard-vs-fifo-queue/

SQS FIFO High Throughput Mode
https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/high-throughput-fifo.html
High throughput for FIFO queues supports a higher number of requests per API, per second. To increase the number of requests in high throughput mode, you can increase the number of message groups you use: each message group supports 300 requests per second.
- Throughput for FIFO high throughput mode: number of transactions per second (TPS) per API in the high throughput mode of a FIFO queue; value: 2,400-9,000
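Since each message group supports up to 300 requests per second, the achievable TPS in high throughput mode grows with the number of message groups, up to the per-Region cap. A sketch of that relationship (function name is illustrative):

```javascript
// Achievable TPS in FIFO high throughput mode: 300 requests/second per
// message group, capped by the per-Region limit (2,400-9,000).
function fifoHighThroughputTps(messageGroups, regionCapTps) {
  return Math.min(300 * messageGroups, regionCapTps);
}

console.log(fifoHighThroughputTps(5, 9000));   // 1500 (group-limited)
console.log(fifoHighThroughputTps(100, 9000)); // 9000 (Region-capped)
```

This is why spreading messages across more message groups is the primary scaling lever for FIFO queues.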

Serverless Application

DynamoDB: Important Service Quotas
- Table-level read/write throughput limit: the maximum read/write throughput allocated for a table or global secondary index; value: 40,000 RCU / 40,000 WCU; mitigation: ask for a quota increase
- Table-level burst capacity for provisioned capacity mode: during an occasional burst of read or write activity, extra capacity units can be consumed quickly; value: up to 300 seconds of unused RCUs and WCUs
- Partition-level read/write throughput: the maximum read/write throughput allocated for a partition; value: 3,000 RCU / 1,000 WCU; mitigation: use best practices to avoid hot partitions
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Limits.html#default-limits-throughput-capacity-modes
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-partition-key-design.html#bp-partition-key-throughput-bursting

Recommendations for partition keys
- Use high-cardinality attributes: attributes that have distinct values for each item, like email_id, employee_no, customer_id, session_id, order_id, and so on
- Use composite attributes: try to combine more than one attribute to form a unique key, if that meets your access pattern. For example, consider an orders table with customerid#productid#countrycode as the partition key and order_date as the sort key, where the symbol # is used to split the different fields
- Add random numbers or digits from a predetermined range for write-heavy use cases: suppose that you expect a large volume of writes for a partition key (for example, greater than 1,000 writes per second). In this case, use an additional prefix or suffix (a fixed number from a predetermined range, say 0-9) and add it to the partition key, like InvoiceNumber#Random(0-N)
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-partition-key-uniform-load.html
https://aws.amazon.com/de/blogs/database/choosing-the-right-dynamodb-partition-key/
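The write-sharding recommendation above can be sketched as a small helper that appends a random suffix from a fixed range to the base key. The function name is illustrative, and the random source is injectable so the sketch is testable:

```javascript
// Spread writes for a hot partition key across shardCount logical partitions
// by appending a random suffix, e.g. "InvoiceNumber#7".
function shardedPartitionKey(baseKey, shardCount, rng = Math.random) {
  const suffix = Math.floor(rng() * shardCount); // 0 .. shardCount - 1
  return `${baseKey}#${suffix}`;
}

console.log(shardedPartitionKey("InvoiceNumber", 10, () => 0.5)); // "InvoiceNumber#5"
```

The trade-off: reads for one logical key must then fan out across all suffixes (e.g. query `InvoiceNumber#0` through `InvoiceNumber#9`) and merge the results.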

DynamoDB: Important Service Quotas
- Initial throughput for on-demand capacity mode: see further details at https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Limits.html#default-limits-throughput-capacity-modes

DynamoDB On-Demand Capacity Mode
On-demand capacity mode is ideal for:
- Unknown workloads
- Frequently idle workloads
- Unpredictable application traffic
- Low management overhead (truly serverless mode)
Pricing is based on the actual data reads and writes your application performs on your tables.

Initial Throughput for DynamoDB On-Demand Capacity Mode
Newly created table with on-demand capacity mode: a newly created on-demand table can serve up to 4,000 WCUs or 12,000 RCUs. If you exceed double your previous traffic peak within 30 minutes, you might experience throttling. One solution is to pre-warm the table to the anticipated peak capacity of the spike by:
- Performing a load test
- Creating the table in provisioned mode with high enough WCUs/RCUs and then switching to on-demand mode
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.ReadWriteCapacityMode.html#HowItWorks.InitialThroughput

Initial Throughput for DynamoDB On-Demand Capacity Mode
Existing table switched from provisioned to on-demand capacity mode: the previous peak is half the maximum write and read capacity units provisioned since the table was created, or the settings for a newly created on-demand table, whichever is higher. In other words, your table will deliver at least as much throughput as it did prior to switching to on-demand capacity mode.
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.ReadWriteCapacityMode.html#HowItWorks.InitialThroughput
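The "double the previous peak" rule from the two slides above can be approximated in a simplified model (this is a sketch, not AWS's exact accounting; the function name and the 4,000 WCU floor for new tables are taken from the slide):

```javascript
// Simplified model: an on-demand table serves up to double its previous
// write peak instantly, with a floor given by the new-table default.
function onDemandWriteCeiling(previousPeakWcu, newTableDefaultWcu = 4000) {
  return Math.max(2 * previousPeakWcu, newTableDefaultWcu);
}

console.log(onDemandWriteCeiling(1000)); // 4000  (floor applies)
console.log(onDemandWriteCeiling(5000)); // 10000 (double the peak)
```

Traffic above this ceiling within a 30-minute window risks throttling, which is why pre-warming to the anticipated peak matters for spiky launches.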

Serverless Application

Aurora (Serverless v2): Important Service Quotas
- Data API requests per second: the maximum number of requests to the Data API per second
- max_connections: the max_connections value for Aurora Serverless v2 DB instances is based on the memory size derived from the maximum ACUs. However, when you specify a minimum capacity of 0.5 ACUs on PostgreSQL-compatible DB instances, the maximum value of max_connections is capped at 2,000
https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/aurora-serverless-v2.setting-capacity.html#aurora-serverless-v2.max-connections

Aurora (Serverless v2): Important Service Quotas
https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/aurora-serverless-v2.setting-capacity.html#aurora-serverless-v2.max-connections

DynamoDB vs Aurora (Serverless v2)
DynamoDB / DynamoDB + DAX:
- Investment in knowledge: understanding of NoSQL databases and of single-table design principles (same for both)
- DynamoDB + DAX requires putting Lambda into a VPC for access

DynamoDB vs Aurora (Serverless v2)
Aurora Serverless v2 / Aurora Serverless v2 + Data API:
- Investment in knowledge: relational databases are familiar to many (same for both)
- Engine support: MySQL and PostgreSQL (Aurora Serverless v2); currently only PostgreSQL (with the Data API)
- Aurora Serverless v2 requires putting Lambda into a VPC for access and may require Amazon RDS Proxy for connection pooling
https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/Concepts.Aurora_Fea_Regions_DB-eng.Feature.ServerlessV2.html
https://dev.to/aws-builders/data-api-for-amazon-aurora-serverless-v2-with-aws-sdk-for-java-part-1-introduction-and-set-up-of-the-sample-application-3g71

S3: Important Service Quotas
- Maximum number of requests per second per partitioned Amazon S3 prefix: at least 3,500 PUT/COPY/POST/DELETE or 5,500 GET/HEAD
- Maximum file size: individual Amazon S3 objects can range in size from a minimum of 0 bytes to a maximum of 5 TB. The largest object that can be uploaded in a single PUT is 5 GB. For objects larger than 100 MB, customers should consider using the multipart upload capability
https://docs.aws.amazon.com/AmazonS3/latest/userguide/optimizing-performance.html

Other Optimizations: Caching
- Put DynamoDB Accelerator (DAX) in front of DynamoDB (requires putting Lambda into a VPC)
- Put ElastiCache in front of Aurora Serverless (requires putting Lambda into a VPC; no pay-as-you-go pricing)
- Enable API Gateway caching (uses ElastiCache behind the scenes; no pay-as-you-go pricing for ElastiCache)
- Use CloudFront (and its caching capabilities) in front of API Gateway

Other Optimizations: Error Handling and Retries
- Set meaningful timeouts for API Gateway and Lambda
- Retry with exponential backoff and jitter (the AWS SDK supports them out of the box)
- Implement idempotency (AWS Lambda Powertools for Java and Python provides an idempotency module)
https://docs.aws.amazon.com/sdkref/latest/guide/feature-retry-behavior.html
https://aws.amazon.com/builders-library/timeouts-retries-and-backoff-with-jitter/
https://aws.amazon.com/de/blogs/architecture/exponential-backoff-and-jitter/
https://aws.amazon.com/blogs/compute/handling-lambda-functions-idempotency-with-aws-lambda-powertools/
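Exponential backoff with jitter, as described in the AWS blog posts linked above, can be sketched with the "full jitter" strategy: sleep a random amount between 0 and min(cap, base × 2^attempt). The random source is injectable so the sketch is testable (parameter names are illustrative):

```javascript
// "Full jitter" backoff: a random delay in [0, min(cap, base * 2^attempt)).
function backoffWithFullJitter(attempt, baseMs = 100, capMs = 20000, rng = Math.random) {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(rng() * ceiling);
}

console.log(backoffWithFullJitter(3, 100, 20000, () => 0.5));  // 400   (ceiling 800)
console.log(backoffWithFullJitter(20, 100, 20000, () => 0.5)); // 10000 (ceiling capped at 20000)
```

Randomizing the delay spreads retries from many clients over time, avoiding the synchronized retry storms that pure exponential backoff can produce.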

AWS “Virtual Waiting Room” Solution
https://aws.amazon.com/solutions/implementations/virtual-waiting-room-on-aws/

AWS “Virtual Waiting Room” Solution
- Open-source project written in Python that can be integrated into existing applications; source code is available on GitHub
- Different CloudFormation templates to choose from (from minimal to extended solutions)
- Estimated costs for a 50,000-user and a 100,000-user waiting room with an event duration ranging from 2 to 4 hours
- Virtual Waiting Room on AWS has been load tested with a tool called Locust; the simulated event sizes ranged from 10,000 to 100,000 clients
https://docs.aws.amazon.com/solutions/latest/virtual-waiting-room-on-aws/architecture-overview.html
https://docs.aws.amazon.com/pdfs/solutions/latest/virtual-waiting-room-on-aws/virtual-waiting-room-on-aws.pdf

Service Quotas of other Serverless services
More Serverless services, more service quotas:
- CloudFront
- EventBridge
- SNS
- Kinesis
- Step Functions

General Best Practices for Service Quotas
- Know, understand and observe the service quotas
- Architect with service quotas in mind; AWS adjusts them from time to time
- When requesting a quota increase, provide a valid justification for the new desired value
- Service quotas apply per AWS account (per Region):
  - Use different AWS accounts for development and testing
  - Use different AWS accounts for independent (micro)services
  - Separate AWS accounts at the team level
  - Use AWS Organizations

Recent AWS Quota Increases