Lessons Learned from Building a Serverless Notifications System by Srushith Repakula

ScyllaDB, 23 slides, Mar 11, 2025

About This Presentation

Reaching your audience isn’t just about email. Learn how we built a scalable, cost-efficient notifications system on AWS serverless services that handles SMS, WhatsApp, and more. From architecture to throttling challenges, this talk dives into the key decisions behind high-scale messaging.


Slide Content

A ScyllaDB Community
Lessons Learned from Building a Serverless Notifications System
Srushith Repakula
Head of Engineering

Srushith Repakula (he/him)
■Head of Engineering, KonfHub
■AWS Serverless Hero
■Co-organiser of Serverless Group
■AWS UG Tirupati Lead

Agenda
■What did we try to solve?
■How did we solve it?
■Why Serverless?
■Problems, challenges encountered and how we solved them
■Unconquered challenges

What did we try to solve?

■A highly scalable notification system that can deliver notifications to hundreds of thousands of recipients
■Should work for multiple channels: Email, SMS and WhatsApp
■Should support both instant-launch and scheduled campaigns
■Should handle duplicates
■Real-time analytics on deliverability, open, and click rates

How did we solve it?
[Architecture diagram: Amazon API Gateway, Amazon Cognito, AWS Lambda, Amazon SQS, Amazon EventBridge, Amazon SES, Amazon SNS, Amazon DynamoDB, Amazon Aurora, webhooks]

AWS services used
■Amazon API Gateway
■Amazon Cognito
■AWS Lambda
■Amazon DynamoDB
■Amazon SQS
■Amazon Aurora
■Amazon EventBridge
■Amazon SES
■Amazon SNS

Problems and Challenges
■Deliverability of emails and domain reputation
■SQS: FIFO queue vs standard queue
■Throttling limits of the notification service providers, for example the number of emails that can be sent per second and per day
■Scalability, cost, and time taken to finish campaigns
■Lambda function’s maximum timeout of 15 minutes
■Scheduling of campaigns at one-minute granularity
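Because one invocation cannot run longer than 15 minutes, a large campaign has to be split across several invocations. The sketch below is a minimal version of that idea and assumes a hypothetical handler that receives a recipient list and a starting offset. It sends until the time left, as reported by the Lambda context's `get_remaining_time_in_millis()`, falls below a safety margin. It then returns the offset to resume from, which a caller could, for example, put back on a queue. The function names, the 30-second margin and the fake context are illustrative assumptions, not the approach described in the talk.

```python
from typing import Callable, List, Optional

# Hypothetical safety margin: stop starting new sends when under 30 s remain.
SAFETY_MARGIN_MS = 30_000


def process_campaign_chunk(
    recipients: List[str],
    start: int,
    context,
    send_notification: Callable[[str], None],
) -> Optional[int]:
    """Send to recipients[start:] until time runs low.

    Returns the index to resume from, or None when every recipient was sent.
    """
    for i in range(start, len(recipients)):
        # The Lambda context object reports the execution time left.
        if context.get_remaining_time_in_millis() < SAFETY_MARGIN_MS:
            return i  # caller re-enqueues the campaign with this offset
        send_notification(recipients[i])
    return None


class FakeContext:
    """Stand-in for the Lambda context: each time check consumes 100 s."""

    def __init__(self, remaining_ms: int) -> None:
        self.remaining_ms = remaining_ms

    def get_remaining_time_in_millis(self) -> int:
        value = self.remaining_ms
        self.remaining_ms -= 100_000
        return value


recipients = ["a@example.com", "b@example.com", "c@example.com",
              "d@example.com", "e@example.com"]
sent: List[str] = []
next_offset = process_campaign_chunk(recipients, 0, FakeContext(250_000), sent.append)
print(next_offset, len(sent))  # 3 3
```

The check happens before each send rather than after, so a send is never started when it might not finish before the timeout.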

Deliverability of emails and domain reputation
Dedicated IP
Each consumer uses a dedicated set of IP addresses for sending emails. For example, one dedicated IP for transactional emails, one for marketing, etc.

Shared IP
Multiple consumers use the same set of IP addresses for sending emails.

Pic credit: https://postmarkapp.com/guides/dedicated-vs-shared-ips-for-email-when-to-use-each

Dedicated IP Addresses for Amazon SES
Standard

Dedicated IP addresses that you set up and manage manually, including the option to warm them up manually, scale them out, and move them in and out of IP pools.

Managed

Dedicated IP addresses that are set up automatically. SES warms them up for each ISP individually and auto-scales them based on your sending volume.

Bring your own IP addresses

Makes it possible to use your own IP addresses to send email through SES. Helpful when you have built a positive IP reputation with an in-house email sending system but want to migrate to Amazon SES.
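Sending streams such as transactional and marketing email are usually kept apart by giving each its own dedicated IP pool and routing mail to that pool through a configuration set. The sketch below does this with the SES v2 API via boto3. The pool and configuration set names are hypothetical. `ScalingMode` selects between the Standard and Managed options above. The helpers only build the request parameters so they can be inspected offline; `apply()` makes the real calls and needs AWS credentials.

```python
def dedicated_pool_request(pool_name: str, managed: bool) -> dict:
    # "MANAGED": SES warms up and scales the pool's IPs automatically.
    # "STANDARD": you add, warm up and move the IPs yourself.
    return {"PoolName": pool_name, "ScalingMode": "MANAGED" if managed else "STANDARD"}


def configuration_set_request(config_set: str, pool_name: str) -> dict:
    # Emails sent with this configuration set go out through the named pool.
    return {
        "ConfigurationSetName": config_set,
        "DeliveryOptions": {"SendingPoolName": pool_name},
    }


def apply(requests: list) -> None:
    import boto3  # lazy import so the builders above run without AWS access

    ses = boto3.client("sesv2")
    for pool_req, config_set_req in requests:
        ses.create_dedicated_ip_pool(**pool_req)
        ses.create_configuration_set(**config_set_req)


# Hypothetical layout: pool name -> configuration set name, one per stream.
streams = {"transactional": "txn-emails", "marketing": "campaign-emails"}
requests = [
    (dedicated_pool_request(pool, managed=True), configuration_set_request(cs, pool))
    for pool, cs in streams.items()
]
```

With this layout, a spike in marketing complaints affects only the marketing pool's reputation, and transactional mail is left alone.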

SQS - FIFO queue vs standard queue
Pic credit: https://www.bitslovers.com/sqs-fifo/

SQS - FIFO queue vs standard queue: The case for FIFO
■Built-in handling of duplicates and message order
■Easier to understand the status of a campaign
■Unbatched throughput: 300 transactions per second
■Batched throughput: 3,000 messages per second (batches of up to 10 messages)
■Batched transactions per second with high throughput FIFO: 70,000*
■In-flight messages per queue: 20,000
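A quick calculation shows what these throughput limits mean for campaign duration. This minimal sketch assumes one SQS message per recipient and treats the queue's send rate as the only bottleneck, so consumers, provider quotas and retries are ignored. The rates are the FIFO figures listed above. The campaign size is hypothetical.

```python
import math


def seconds_to_enqueue(recipients: int, messages_per_second: int) -> int:
    """Lower bound on the time needed to publish one message per recipient."""
    return math.ceil(recipients / messages_per_second)


campaign_size = 300_000  # hypothetical campaign
unbatched = seconds_to_enqueue(campaign_size, 300)   # one message per send call
batched = seconds_to_enqueue(campaign_size, 3_000)   # batches of up to 10 messages
print(unbatched, batched)  # 1000 100
```

Even before any notification is delivered, unbatched publishing to a FIFO queue would take about 17 minutes for this campaign. Batching cuts that to under 2 minutes.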

SQS - FIFO queue vs standard queue: The case for Standard
■In-flight messages per queue: 120,000
■Faster to process, since no ordering has to be maintained
■Cheaper than FIFO

SQS - FIFO queue vs standard queue: Standard wins!
Although a Standard queue adds complexity at the application layer, such as
■Implementing idempotency
■Extra processing to fetch the campaign status
■etc.
Standard was the choice because:
■FIFO deduplicates messages only within a 5-minute interval
■FIFO is slower when hundreds of messages are published within the same message group
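Standard queues deliver messages at least once, so a consumer can receive the same message more than once. The sketch below shows consumer-side idempotency under a few assumptions. Each message carries hypothetical `campaign_id`, `recipient` and `channel` fields, and the store supports an atomic "insert if absent". In DynamoDB that would be a conditional `PutItem` with `attribute_not_exists`, and unlike FIFO's 5-minute window the record lasts as long as the table keeps it. Here an in-memory class guarded by a lock stands in for that table. The class and function names are illustrative.

```python
import threading
from typing import Callable, Dict, Set


class IdempotencyStore:
    """In-memory stand-in for a table with a conditional 'put if absent' write."""

    def __init__(self) -> None:
        self._seen: Set[str] = set()
        self._lock = threading.Lock()

    def claim(self, key: str) -> bool:
        # True only for the first caller that claims this key.
        with self._lock:
            if key in self._seen:
                return False
            self._seen.add(key)
            return True

    def release(self, key: str) -> None:
        with self._lock:
            self._seen.discard(key)


def handle_message(body: Dict[str, str], store: IdempotencyStore,
                   send: Callable[[str], None]) -> bool:
    # A redelivered message produces the same key as the original delivery.
    key = f"{body['campaign_id']}#{body['recipient']}#{body['channel']}"
    if not store.claim(key):
        return False  # duplicate: skip sending, but still delete the message
    try:
        send(body["recipient"])
    except Exception:
        store.release(key)  # let a later redelivery retry the send
        raise
    return True


store = IdempotencyStore()
sent = []
msg = {"campaign_id": "c1", "recipient": "+15550100", "channel": "sms"}
first = handle_message(msg, store, sent.append)
second = handle_message(msg, store, sent.append)  # simulated redelivery
print(first, second, sent)  # True False ['+15550100']
```

The claim is released only when the send raises an exception. A process that crashes mid-send still leaves the key claimed, so a production version would typically record a status (pending or sent) with an expiry.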

Throttling limits are everywhere!

Throttling limits of the notification service providers
■Daily sending quota: the maximum number of emails that you can send in a 24-hour period
■Maximum send rate: the maximum number of emails that Amazon SES can accept from your account each second. You can exceed this quota for short bursts, but not for sustained periods of time.
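A sender can avoid hitting both SES quotas by checking them before each send. This minimal sketch uses a token bucket for the per-second send rate and a rolling 24-hour window for the daily quota. The class name and the quota values (14 per second, 50,000 per day) are hypothetical. SES still enforces the real quotas server-side; this code only keeps the sender under them. A clock can be injected so the behaviour is easy to test.

```python
import time
from collections import deque
from typing import Callable, Deque

DAY_SECONDS = 24 * 60 * 60


class SesSendLimiter:
    """Client-side guard for a per-second send rate and a 24-hour sending quota."""

    def __init__(self, max_send_rate: float, daily_quota: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = max_send_rate
        self.capacity = max_send_rate  # burst of at most one second's worth
        self.tokens = max_send_rate
        self.daily_quota = daily_quota
        self.clock = clock
        self.last = clock()
        self.sent_at: Deque[float] = deque()  # send timestamps in the last 24 h

    def try_acquire(self) -> bool:
        now = self.clock()
        # Refill tokens in proportion to elapsed time, capped at the bucket size.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        # Drop sends that have left the rolling 24-hour window.
        while self.sent_at and now - self.sent_at[0] >= DAY_SECONDS:
            self.sent_at.popleft()
        if self.tokens < 1 or len(self.sent_at) >= self.daily_quota:
            return False
        self.tokens -= 1
        self.sent_at.append(now)
        return True


class FakeClock:
    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


clock = FakeClock()
limiter = SesSendLimiter(max_send_rate=14, daily_quota=50_000, clock=clock)
burst = sum(limiter.try_acquire() for _ in range(20))       # 14 allowed, 6 refused
clock.now += 0.5                                             # half a second: 7 tokens
after_wait = sum(limiter.try_acquire() for _ in range(20))
print(burst, after_wait)  # 14 7
```

This limiter only controls a single process. With many concurrent Lambda instances sending in parallel, each instance's share of the rate shrinks, so the total number of concurrent senders also has to be capped.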

Conquering the throttling limits - Reserved Concurrency
■Reserves part of the account's concurrency exclusively for one Lambda function, so execution environments are always available to it and are not taken by other functions in the account
■Also sets the maximum number of execution environment instances for that function
■Invocations are throttled once the reserved concurrency limit is reached
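One way to choose a reserved concurrency value is Little's law: the number of invocations in flight is roughly the throughput times the time each one takes. This minimal sketch assumes each invocation sends exactly one email and that its average duration has been measured. The helper name and numbers are hypothetical. `put_function_concurrency` is the Lambda API call that sets reserved concurrency; it runs only inside `apply()`, which needs AWS credentials.

```python
import math


def reserved_concurrency_for(max_send_rate: float, avg_invocation_seconds: float) -> int:
    """Little's law: concurrency ~= throughput x latency.

    Rounded down so that concurrency / latency never exceeds the send rate.
    """
    return max(1, math.floor(max_send_rate * avg_invocation_seconds))


def apply(function_name: str, concurrency: int) -> None:
    import boto3  # lazy import: only needed when actually calling AWS

    boto3.client("lambda").put_function_concurrency(
        FunctionName=function_name,
        ReservedConcurrentExecutions=concurrency,
    )


# Hypothetical: a 14 emails/s quota and about 0.5 s per single-email invocation.
concurrency = reserved_concurrency_for(14, 0.5)
print(concurrency)  # 7
```

This is an average-rate estimate. Bursts and variation in duration still make a client-side rate check or provider-side retries worthwhile.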

Conquering the throttling limits - Maximum concurrency for SQS
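With an SQS event source, Lambda's pollers can try to invoke a function beyond its reserved concurrency. The throttled messages return to the queue after the visibility timeout and, after repeated failures, can end up in a dead-letter queue. Maximum concurrency is set on the event source mapping (`ScalingConfig.MaximumConcurrency`) and stops the pollers from going over the limit in the first place. This minimal boto3 sketch assumes a hypothetical queue ARN and function name. The request is built separately so it can be checked without AWS access.

```python
def sqs_mapping_request(queue_arn: str, function_name: str,
                        max_concurrency: int, batch_size: int = 10) -> dict:
    # Lambda documents a minimum of 2 for MaximumConcurrency.
    if max_concurrency < 2:
        raise ValueError("MaximumConcurrency must be at least 2")
    return {
        "EventSourceArn": queue_arn,
        "FunctionName": function_name,
        "BatchSize": batch_size,
        # Caps how many concurrent invocations this queue's pollers may start.
        "ScalingConfig": {"MaximumConcurrency": max_concurrency},
    }


def apply(request: dict) -> dict:
    import boto3  # lazy import: only needed when actually calling AWS

    return boto3.client("lambda").create_event_source_mapping(**request)


request = sqs_mapping_request(
    "arn:aws:sqs:ap-south-1:123456789012:email-sends",  # hypothetical queue
    "send-email-worker",                                  # hypothetical function
    max_concurrency=7,
)
```

If the function also has reserved concurrency, keeping it at or above the mapping's maximum concurrency means the mapping is the limit that actually applies. Excess messages then simply wait in the queue instead of being throttled.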

Scheduling of campaigns every minute
The Challenges:
■Campaigns can be scheduled to the minute
■Scalability of the scheduler service: hundreds or even thousands of campaigns can be scheduled
Common solution: run a cron job every minute to pick up the campaigns scheduled for that minute. But:
■Unnecessary costs from empty Lambda function invocations when no campaign is due
■Increased application complexity

Scheduling of campaigns every minute
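Instead of a per-minute cron job, each campaign can have its own schedule that fires once at launch time. The sketch below does this with EventBridge Scheduler's one-time `at()` expressions through boto3. The diagram for this slide is not reproduced here, so this is one possible approach and not necessarily the one used in the talk. The schedule name, target ARN, role ARN and payload are hypothetical. `ActionAfterCompletion="DELETE"` removes the schedule after it fires.

```python
import json
from datetime import datetime, timezone


def campaign_schedule_request(campaign_id: str, launch_at: datetime,
                              target_arn: str, role_arn: str) -> dict:
    if launch_at.tzinfo is None:
        raise ValueError("launch_at must be timezone-aware")
    # One-time schedules use at(yyyy-mm-ddThh:mm:ss) in the given timezone.
    utc = launch_at.astimezone(timezone.utc)
    return {
        "Name": f"campaign-{campaign_id}",
        "ScheduleExpression": f"at({utc.strftime('%Y-%m-%dT%H:%M:%S')})",
        "ScheduleExpressionTimezone": "UTC",
        "FlexibleTimeWindow": {"Mode": "OFF"},  # fire at the scheduled minute
        "Target": {
            "Arn": target_arn,
            "RoleArn": role_arn,
            "Input": json.dumps({"campaign_id": campaign_id}),
        },
        "ActionAfterCompletion": "DELETE",
    }


def apply(request: dict) -> dict:
    import boto3  # lazy import: only needed when actually calling AWS

    return boto3.client("scheduler").create_schedule(**request)


request = campaign_schedule_request(
    "summer-sale",
    datetime(2025, 3, 11, 10, 30, tzinfo=timezone.utc),
    "arn:aws:lambda:ap-south-1:123456789012:function:launch-campaign",
    "arn:aws:iam::123456789012:role/scheduler-invoke",
)
print(request["ScheduleExpression"])  # at(2025-03-11T10:30:00)
```

Each schedule fires only when its campaign is due, so there are no empty invocations between campaigns.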

Unconquered Challenges
■Automated end-to-end testing of the entire architecture
■Observability and monitoring
■Automatic handling of service provider limits for the “Bring your own provider” use case

Stay in Touch
Srushith Repakula
https://x.com/SrushithR
https://github.com/SrushithR
https://www.linkedin.com/in/srushith/