Lessons Learned from Building a Serverless Notifications System
SrushithR
22 slides
May 17, 2024
About This Presentation
Building a notification campaign might seem easy, and it is easy to get started with a simple setup. But once scale kicks in, it becomes very important to have a resilient architecture that can handle hundreds of thousands of recipients.
This talk focuses on the Serverless services used to build the architecture and on the architectural decisions behind it.
It also covers the challenges in building an architecture of this sort and how we overcame them using Serverless services.
Slide Content
@SrushithR
Srushith Repakula
Head of Engineering, KonfHub
AWS Serverless Hero
Lessons Learned from Building a Serverless Notifications System
Srushith Repakula
AWS Serverless Hero
Head of Engineering at KonfHub
Serverless Consultant
AWS UG Tirupati Lead
Co-organiser of Serverless Group
Host of ServerlessSaturdays
Agenda
1. What did we try to solve?
2. How did we solve it?
3. Why Serverless?
4. Problems and challenges encountered, and how we conquered them
5. Unconquered challenges
What did we try to solve?
● A highly scalable notification system that can deliver notifications to hundreds of thousands of recipients
● Should work for multiple channels - Email, SMS and WhatsApp
● Should support instant launch and scheduled campaigns
● Handle duplicates
● Real time analytics on deliverability, open, and click rates
How did we solve it?
[Architecture diagram. Components shown: Amazon API Gateway, Amazon Cognito, AWS Lambda, Amazon Simple Queue Service (Amazon SQS), Amazon EventBridge, Amazon Simple Email Service (Amazon SES), Amazon Simple Notification Service (Amazon SNS), Amazon DynamoDB, Amazon Aurora, Webhooks]
AWS Services used
● Amazon API Gateway
● Amazon Cognito
● AWS Lambda
● Amazon DynamoDB
● Amazon SQS
● Amazon Aurora
● Amazon EventBridge
● Amazon SES
● Amazon SNS
Problems and Challenges
● Deliverability of emails and domain reputation
● SQS - FIFO queue vs standard queue
● Throttling limits of the notification service providers, for example the number of emails sent per second and per day
● Scalability, cost and time to finish the campaigns
● Lambda function’s max timeout of 15 minutes
● Scheduling of campaigns every minute
Deliverability of emails and domain reputation
Dedicated IP
Each consumer uses a dedicated IP address set for sending emails. For example, one dedicated IP for transactional, one for marketing, etc.
Shared IP
Multiple consumers use the same IP set for sending emails
Pic credit: https://postmarkapp.com/guides/dedicated-vs-shared-ips-for-email-when-to-use-each
Dedicated IP Addresses for Amazon SES
Standard
Refers to dedicated IP addresses that you manually set up and manage, including the option to manually warm them up and scale them out, and to manually move them in and out of IP pools
Managed
Refers to dedicated IP addresses that are automatically set up. SES automatically warms them up for each ISP individually and auto-scales them based on your sending volume
Bring your own IP addresses
Makes it possible to use your own IP addresses to send email through SES
Helpful when you have built a positive IP reputation with an in-house email sending system but want to migrate to Amazon SES
SQS - FIFO queue vs standard queue
Pic credit: https://www.bitslovers.com/sqs-fifo/
SQS - FIFO queue vs standard queue: The case for FIFO
● Handling of duplicates and order
● Easier to understand the status of a campaign
● Unbatched throughput - 300 transactions per second
● Batched transactions per second for FIFO - 3,000
● Batched transactions per second for high throughput FIFO - 70,000*
● In flight messages per queue - 20,000
SQS - FIFO queue vs standard queue: The case for Standard
● In flight messages per queue - 120,000
● Faster to process since there is no order to be maintained
● Cheaper in comparison to FIFO
SQS - FIFO queue vs standard queue: Standard wins!
A Standard queue increases application-layer complexity, for example:
● Implementation of idempotency
● Extra processing for fetching the campaign status
Even so, Standard was the choice because:
● FIFO has only a 5-minute deduplication interval
● FIFO is slower when hundreds of messages are published within the same group
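The idempotency that a Standard queue pushes into the application layer can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the implementation from the talk: the `IdempotencyStore` class, the `campaign_id#recipient` key format and the `send_notification` callback are all hypothetical, and the in-memory dictionary stands in for a durable store (for example a database table where only the first write of a key succeeds).

```python
import time
from typing import Callable, Dict


class IdempotencyStore:
    """In-memory stand-in (hypothetical) for a durable key store in which
    only the first writer of a key succeeds."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.time) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._seen: Dict[str, float] = {}  # key -> expiry timestamp

    def claim(self, key: str) -> bool:
        """Return True only for the first claim of a key within the TTL."""
        now = self._clock()
        expiry = self._seen.get(key)
        if expiry is not None and expiry > now:
            return False  # duplicate delivery inside the retention window
        self._seen[key] = now + self._ttl
        return True

    def release(self, key: str) -> None:
        """Forget a key so that a failed send can be retried later."""
        self._seen.pop(key, None)


def handle_message(store: IdempotencyStore, campaign_id: str, recipient: str,
                   send_notification: Callable[[str], None]) -> bool:
    """Send at most once per (campaign, recipient); return True if sent."""
    key = f"{campaign_id}#{recipient}"  # hypothetical key format
    if not store.claim(key):
        return False  # already handled, skip the duplicate
    try:
        send_notification(recipient)
    except Exception:
        store.release(key)  # let the queue redeliver and retry
        raise
    return True
```

Unlike the fixed 5-minute deduplication interval of a FIFO queue, the retention window here (`ttl_seconds`) is chosen by the application, so it can cover the full duration of a campaign.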
Throttling limits of the notification service providers
Throttling limits are everywhere!
● Daily sending quota - the maximum number of emails that you can send in a 24-hour period
● Maximum send rate - the maximum number of emails that Amazon SES can accept from the account each second. One can exceed this quota for short bursts, but not for sustained periods of time.
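A per-second send rate that tolerates short bursts but not sustained excess maps naturally onto a token bucket. The sketch below is illustrative only: the `TokenBucket` class and the numbers in the usage note are assumptions, not SES quota values or code from the talk.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow `rate_per_second` sends on average, with bursts up to `burst`."""

    def __init__(self, rate_per_second: float, burst: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._rate = rate_per_second
        self._capacity = float(burst)
        self._tokens = float(burst)  # start full so an initial burst is allowed
        self._clock = clock
        self._last = clock()

    def try_acquire(self, n: int = 1) -> bool:
        """Take n tokens if available; return False when the caller must wait."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill in proportion to elapsed time, never above the burst capacity.
        self._tokens = min(self._capacity, self._tokens + elapsed * self._rate)
        if self._tokens >= n:
            self._tokens -= n
            return True
        return False
```

For example, `TokenBucket(rate_per_second=2, burst=3)` accepts three sends immediately, rejects a fourth, and accepts two more after one second. A bucket like this only limits the process it lives in, so when many workers send in parallel the overall rate also depends on how many of them run at once.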
Conquering the throttling limits - Reserved Concurrency
● Reserved concurrency is set aside for a single Lambda function, so execution environments are always available to it and are not drawn from the shared account-level pool
● It is also the maximum number of execution environment instances the function can have
● Invocations are throttled when the reserved concurrency limit is reached
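Because reserved concurrency caps how many execution environments run at once, it can be sized from the provider's send rate. The arithmetic below is a rough sketch under assumptions not stated in the slides: every invocation sends a fixed batch of emails and takes a roughly constant time, and the function name `max_concurrency_for_send_rate` and its inputs are hypothetical.

```python
import math


def max_concurrency_for_send_rate(max_send_rate: float,
                                  emails_per_invocation: int,
                                  seconds_per_invocation: float) -> int:
    """Largest number of concurrent invocations whose combined throughput
    stays at or below max_send_rate (emails per second).

    A result of 0 means a single invocation already exceeds the rate, so the
    batch size or pacing inside the function would have to change instead.
    """
    if (max_send_rate <= 0 or emails_per_invocation <= 0
            or seconds_per_invocation <= 0):
        raise ValueError("all inputs must be positive")
    # Emails per second produced by one continuously busy execution environment.
    per_invocation_rate = emails_per_invocation / seconds_per_invocation
    return math.floor(max_send_rate / per_invocation_rate)
```

With an assumed send rate of 50 emails per second and invocations that send 10 emails in 2 seconds, the function returns 10. That number could then be applied as the function's reserved concurrency, for instance through boto3's `put_function_concurrency` call.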
Conquering the throttling limits - Maximum concurrency for SQS
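Maximum concurrency is configured on the SQS event source mapping rather than on the function itself. The sketch below only builds the arguments for boto3's `create_event_source_mapping` call and does not call AWS. The helper name `sqs_event_source_kwargs`, the ARN and the function name in the usage note are hypothetical, and the 2 to 1000 range reflects the documented limits as understood at the time of writing.

```python
from typing import Any, Dict


def sqs_event_source_kwargs(queue_arn: str, function_name: str,
                            maximum_concurrency: int,
                            batch_size: int = 10) -> Dict[str, Any]:
    """Build keyword arguments for the Lambda client's
    create_event_source_mapping call with a cap on concurrent invocations."""
    # Assumed documented range for MaximumConcurrency on SQS event sources.
    if not 2 <= maximum_concurrency <= 1000:
        raise ValueError("maximum_concurrency must be between 2 and 1000")
    return {
        "EventSourceArn": queue_arn,
        "FunctionName": function_name,
        "BatchSize": batch_size,
        "ScalingConfig": {"MaximumConcurrency": maximum_concurrency},
    }
```

A caller would pass the result on as `boto3.client("lambda").create_event_source_mapping(**kwargs)`, for example with `kwargs = sqs_event_source_kwargs("arn:aws:sqs:us-east-1:123456789012:campaign-queue", "send-email", 10)`.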
Scheduling of campaigns every minute
The Challenges:
● Campaigns can be scheduled to the minute
● Scalability of the scheduler service - hundreds and thousands of campaigns can be scheduled
Common solution - run a cron job every minute to pick up campaigns that are scheduled
Problems with this solution:
● Unnecessary costs involved in empty Lambda function invocations
● Increases application complexity
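One alternative to polling every minute is to register a one-time schedule per campaign, so that nothing runs until a campaign is actually due. The slides list Amazon EventBridge among the services used but do not spell out the mechanism, so the sketch below is an assumption: it builds arguments for the EventBridge Scheduler `create_schedule` call in boto3 without calling AWS, and the helper name, schedule naming scheme and payload shape are hypothetical.

```python
import json
from datetime import datetime
from typing import Any, Dict


def one_time_schedule_kwargs(campaign_id: str, run_at_utc: datetime,
                             target_arn: str, role_arn: str) -> Dict[str, Any]:
    """Build keyword arguments for the EventBridge Scheduler create_schedule
    call that fires once, at minute precision, for a single campaign."""
    # Campaigns are scheduled to the minute, so drop seconds and microseconds.
    run_at = run_at_utc.replace(second=0, microsecond=0)
    return {
        "Name": f"campaign-{campaign_id}",  # hypothetical naming scheme
        "ScheduleExpression": f"at({run_at.strftime('%Y-%m-%dT%H:%M:%S')})",
        "ScheduleExpressionTimezone": "UTC",
        "FlexibleTimeWindow": {"Mode": "OFF"},  # fire at the exact minute
        "ActionAfterCompletion": "DELETE",      # remove the schedule once it has run
        "Target": {
            "Arn": target_arn,    # e.g. the Lambda function that launches the campaign
            "RoleArn": role_arn,  # role the scheduler assumes to invoke the target
            "Input": json.dumps({"campaign_id": campaign_id}),
        },
    }
```

The result would be passed on as `boto3.client("scheduler").create_schedule(**kwargs)`. With this approach the number of invocations tracks the number of campaigns rather than the number of minutes in a day.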
Unconquered Challenges
● Automated end-to-end testing of the entire architecture
● Observability and monitoring
● Automatic handling of service provider limits for “Bring your own provider” use case