Event-driven architecture patterns in highly scalable image storage solution at GoTo x AWS EDA Day Warsaw 2024
VadymKazulkin
Sep 21, 2024
About This Presentation
ip.labs is the world's leading white label e-commerce software imaging company and processes millions of images every day. Our users' workflows consist of designing, saving, loading and ordering photo products such as prints, photobooks, calendars and gift products, and delivering them to the printing facilities. In this talk we'll explore the motivation and architecture behind the reimplementation of our image storage solution based on AWS Serverless services like API Gateway, Lambda, DynamoDB, SQS, SNS, EventBridge, Kinesis and others. We'll dive deeper into the parts that use event-driven communication, explore patterns such as storage-first, fan-out, change log and cross-account logging, and explain our architectural decisions.
Firdaws Aboulaye & Vadym Kazulkin ip.labs
Image: burst.shopify.com/photos/a-look-across-the-landscape-with-view-of-the-sea
About ip.labs
Agenda
•Application architecture back in 2018-2019 after initial migration to AWS
•Motivation behind the reimplementation of our image storage solution to one based on AWS Serverless services
•Current architecture and APIs of our image storage solution
•Event-driven patterns in our image storage solution
•Challenges and lessons learned
Application architecture back in 2017 in the datacenters
Challenges with the operations and application:
•The same application was deployed in different datacenters across the globe
•Mainly in Europe and Asia
•Different vendors were used for hardware (load balancer, virtualization, NFS storage)
AWS Migration
Main focus of the migration was:
•Use the same AWS services in all regions
•The ability to go global and have complete internal ownership of operations
•Automate everything
•Use Infrastructure as Code
•Provide independent, production-like staging and test environments
Application architecture back in 2018-2019 after initial
migration to AWS
During the migration to AWS we:
•Automated everything
•On the infrastructure and deployment level
•No hot patches, only immutable infrastructure
•Introduced several independent core microservices
•Used different auto-scaling policies for them
Application architecture back in 2018-2019 after initial
migration to AWS
During the migration to AWS we:
•Separated the storage for static web
assets, static product images and
videos and personal images
•Separated image uploads and order downloads by the labs
Motivation behind the reimplementation of our image
storage solution
Technical challenges
remained:
•Web application deployed on each EC2 server had frontend and backend bundled together
•Frontend mainly contained photo product builders/editors and shop frontend
Motivation behind the reimplementation of our image
storage solution
Technical challenges
remained:
•Backend contained nearly all internal services bundled together as a monolith
•Photo product creation, save and load of cloud products, user management, payment, ordering, external integrations
Motivation behind the reimplementation of our image
storage solution
Technical challenges remained:
•Central EFS for image
upload/download by the end user
still used from all EC2 servers
•Scalability issues on the shared EFS will propagate to all EC2 servers and make the main application unavailable
Motivation behind the reimplementation of our image
storage solution
Technical challenges
remained:
•No clearly defined APIs for the image storage solution
•Low testability of the image storage solution
•No clear ownership of the image storage solution
Motivation behind the reimplementation of our image
storage solution
Product/strategic challenges remained:
•Our main application also contained a completely self-written eCommerce system
•Shop frontend, shop backend, user management, payment and ordering workflow, statistics/BI
•Our application contained external APIs to enable user SSO and external cart integrations
•Integration and maintenance costs were high for our customers
Motivation behind the reimplementation of our image
storage solution
Product/strategic challenges remained:
•Business expressed the need to support direct integrations with popular eCommerce solutions
•Magento/Adobe Cloud, Shopify
•Image handling/workflow was a part of this monolithic frontend and backend
•Although it provided generic functionality, it was tightly coupled to the self-written eCommerce solution
Why did we choose AWS Serverless for the image storage solution?
•We had already experimented with AWS Serverless services for new development on a smaller scale since our early days in AWS in 2018
•We saw many benefits in fully utilizing the power of the AWS cloud
•Further increasing the speed of delivery
•Focusing on our core capabilities by relying on AWS managed services
Why did we choose AWS Serverless for the image storage solution?
•It also became a cultural thing
•Developers started to learn, build and share knowledge in AWS Serverless
Current architecture of our image storage solution
File storage modules:
•File API
•Project API
•Ordering API
•Config API
•Support API
•Scheduler Jobs
•Reporting API
•Migration API
•…
Current architecture of our image storage solution
File storage modules:
•File API
•Project API
•Ordering API
•Config API
•Customer Support API
•Scheduler Jobs
•Reporting API
•Migration API
•…
Different workflows and life cycles of images
•(Mobile) upload
•Mobile upload + long term storage
•Saved projects
•Saved projects sharing
•Saved (internal and external) cart items
•Ordered items
•Manual (by the end user) or automated (expirations, user log outs) image deletions and clean ups
Queues and DLQs everywhere
Use SQS for:
•Decoupling image thumbnail(s) creation from the S3 “object created” event
•“Post Upload Lambda” takes care of it in batches
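A minimal sketch of how a Lambda behind such a queue could unpack a batch, assuming S3 sends its “object created” notifications to SQS and each SQS message body carries one S3 notification as JSON. The function names and the `create_thumbnail` placeholder are hypothetical and not taken from the talk.

```python
import json
from urllib.parse import unquote_plus


def extract_uploaded_objects(sqs_event):
    """Return (bucket, key) pairs from a batch of SQS messages wrapping S3 notifications."""
    uploaded = []
    for message in sqs_event.get("Records", []):
        s3_notification = json.loads(message["body"])
        # S3 test events carry no "Records" key, so default to an empty list.
        for record in s3_notification.get("Records", []):
            if not record.get("eventName", "").startswith("ObjectCreated"):
                continue
            bucket = record["s3"]["bucket"]["name"]
            # Object keys arrive URL-encoded in S3 notifications.
            key = unquote_plus(record["s3"]["object"]["key"])
            uploaded.append((bucket, key))
    return uploaded


def create_thumbnail(bucket, key):
    """Hypothetical placeholder: real code would download, resize and store the image."""
    return f"thumbnails/{key}"


def handler(event, context):
    # One invocation handles the whole batch delivered by SQS.
    return [create_thumbnail(b, k) for b, k in extract_uploaded_objects(event)]
```

Because the queue sits between S3 and the Lambda, a failing thumbnail job can be retried or moved to a DLQ without affecting the upload itself.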
Queues and DLQs everywhere
Use SQS for:
•“Files Usage Handling” queue for events signalling that saved projects or files/images were deleted (actively by the user or through expiration)
•“Process File Usage Handling Lambda” checks in batches whether files/images are still referenced in any kind of project
•If not, the Lambda marks them as “not_in_use” in the DynamoDB “File Table”
•The files/images themselves are cleaned up later asynchronously (see the scheduled events section)
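The reference check described above could look roughly like the following sketch. It assumes each queue message lists the affected file IDs under a hypothetical `fileIds` field, and it uses a plain dictionary as a stand-in for the DynamoDB “File Table”; the real message format and table schema are not shown in the slides.

```python
import json


def find_unreferenced_files(sqs_event, is_referenced):
    """Collect file IDs from a batch of deletion events that no project references any more."""
    candidates = set()
    for message in sqs_event.get("Records", []):
        body = json.loads(message["body"])
        # "fileIds" is an assumed field name for this illustration.
        candidates.update(body.get("fileIds", []))
    return sorted(f for f in candidates if not is_referenced(f))


def mark_not_in_use(file_table, file_ids):
    """Flag files only; the actual deletion happens later in a scheduled job."""
    for file_id in file_ids:
        file_table[file_id]["status"] = "not_in_use"
```

Marking instead of deleting keeps the handler cheap and leaves the expensive clean-up to the asynchronous scheduled job.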
User State Observer with SQS (FIFO), EventBridge and
Lambda
•“User State FIFO SQS Queue” to preserve the order of the user state events (user log in, user log out)
•Use “External Events Observer EventBridge” for multiple targets (in our case 2 Lambda functions: Project and File User State Observer)
•In case of “user log out”, the Lambdas clean up mobile upload files not assigned to any project (file level) and cart items that were not purchased (project level)
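As an illustration of how ordering can be preserved per user, the sketch below builds the arguments for an SQS FIFO `send_message` call. Using the user ID as the message group ID is an assumption made for this example, as are the payload fields and the function name.

```python
import json


def build_user_state_message(queue_url, user_id, state, event_id):
    """Build keyword arguments for an SQS FIFO send_message call."""
    return {
        "QueueUrl": queue_url,
        "MessageBody": json.dumps({"userId": user_id, "state": state}),
        # Messages sharing a group ID are delivered in the order they were sent.
        "MessageGroupId": user_id,
        # Messages repeating this ID within the deduplication interval are dropped.
        "MessageDeduplicationId": event_id,
    }
```

With boto3 the result could be passed as `sqs_client.send_message(**kwargs)`. Grouping by user keeps log in and log out events of one user in order while still allowing different users to be processed in parallel.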
Change log Pattern with DynamoDB Streams and Lambda
•Requirement to implement a “change log” for certain operations in the Config and Support APIs:
1. Config API captures all account configuration changes (e.g. how long saved projects are stored)
2. Support API captures all actions performed by the internal support team (e.g. saved project access for repair, or temporary owner change)
Change log Pattern with DynamoDB Streams and Lambda
•Use DynamoDB Streams in combination with filters (via structures called FilterCriteria) to control which events are sent to which Lambda function
•You can control the batch size when processing your DynamoDB stream
•Store all “change log” data in a single DynamoDB table
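A sketch of the two pieces involved, under stated assumptions: a FilterCriteria structure as it could be configured on the Lambda event source mapping (here filtering on MODIFY events only, which is an assumption), and a function that turns stream records into change log entries. The entry layout is hypothetical.

```python
import json

# FilterCriteria as it could be set on the event source mapping.
# Assumption for this example: only item updates (MODIFY) are logged.
FILTER_CRITERIA = {"Filters": [{"Pattern": json.dumps({"eventName": ["MODIFY"]})}]}


def to_change_log_entries(stream_event):
    """Convert DynamoDB stream records into change log entries (hypothetical layout)."""
    entries = []
    for record in stream_event.get("Records", []):
        change = record["dynamodb"]
        entries.append({
            "operation": record["eventName"],
            "keys": change["Keys"],
            # Old and new images are only present if the stream view type includes them.
            "old": change.get("OldImage"),
            "new": change.get("NewImage"),
            "changedAt": change.get("ApproximateCreationDateTime"),
        })
    return entries
```

Filtering at the event source mapping means the Lambda is not invoked for records that do not match the pattern.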
Scheduled Events with CloudWatch Event Rules and
different event-driven patterns with EventBridge and SNS
1. Finding expired saved projects, cart items and long-term mobile upload images, handling their deletion and handling expiration notifications (via SNS topic, see “Handling Expiration Notifications”)
2. Triggering storage calculation for accounting purposes on a per-account basis via EventBridge with 2 Lambda function targets: one for the file level and one for the project level
Handling Expiration Notifications with SNS to multiple
SQS queues fan-out pattern
•Our central systems handle email delivery for their customers
•SNS-SQS fan-out pattern with an SNS message filtering rule based on the customer group number
•Each central system subscribes to its individual SQS queue containing expiration notification messages for its customer group
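The routing idea can be illustrated with a small local simulation. It mimics only the exact-match part of SNS subscription filter policies and is not the SNS implementation; the attribute name `customerGroup`, the queue names and the group numbers are made up for this example.

```python
def matches(filter_policy, message_attributes):
    """Simplified exact-match check: every policy key must match one allowed value."""
    return all(
        message_attributes.get(name) in allowed
        for name, allowed in filter_policy.items()
    )


def fan_out(subscriptions, queues, message, message_attributes):
    """Deliver the message to every queue whose filter policy matches."""
    for queue_name, policy in subscriptions.items():
        if matches(policy, message_attributes):
            queues.setdefault(queue_name, []).append(message)


# Hypothetical setup: one queue per central system, filtered by customer group.
subscriptions = {
    "central-system-a": {"customerGroup": ["100"]},
    "central-system-b": {"customerGroup": ["200", "201"]},
}
queues = {}
fan_out(subscriptions, queues, "project p1 expires soon", {"customerGroup": "201"})
```

The publisher stays unaware of the consumers: adding another central system only requires a new queue subscription with its own filter policy.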
Migration API
Migrate existing projects from the legacy system to the new file storage.
Types of migration:
•Batch mode
•On demand
Different event-driven patterns put together in
Migration API
Solution:
•Export project metadata to S3 (storage-first)
•Send a notification on the S3 “object created” event to the “Migration SQS queue”
•Implemented Lambda message batching from the SQS queue
•Implemented partial batch responses
•Implemented the DynamoDB BatchWriteItem operation. A single call to BatchWriteItem can transmit up to 16 MB of data over the network, consisting of up to 25 item put or delete operations
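The combination of batching, partial batch responses and the 25-item limit could be sketched as follows. `write_batch` is a hypothetical callable standing in for a DynamoDB BatchWriteItem call, and the `items` field in the message body is an assumed format.

```python
import json

MAX_BATCH_WRITE_ITEMS = 25  # BatchWriteItem accepts at most 25 put/delete requests per call


def chunk(items, size=MAX_BATCH_WRITE_ITEMS):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def process_batch(event, write_batch):
    """Process an SQS batch and report only the messages that failed."""
    failures = []
    for message in event.get("Records", []):
        try:
            items = json.loads(message["body"])["items"]
            for batch in chunk(items):
                write_batch(batch)
        except Exception:
            # Only this message becomes visible in the queue again.
            failures.append({"itemIdentifier": message["messageId"]})
    return {"batchItemFailures": failures}
```

For Lambda to honour the returned `batchItemFailures`, the event source mapping has to enable `ReportBatchItemFailures`. A real BatchWriteItem call can also return unprocessed items, which would need to be retried inside `write_batch`.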
Cross Account Logging with CloudWatch Logs, Kinesis Data Firehose and OpenSearch services
•Implemented as CloudWatch Logs cross-account subscription
•First implemented with Kinesis Data Streams
•Then switched to (Kinesis) Data Firehose as it started to support Amazon OpenSearch Service as a destination
https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/CrossAccountSubscriptions.html
https://aws.amazon.com/about-aws/whats-new/2023/05/amazon-kinesis-data-firehose-document-id-amazon-opensearch-service/
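CloudWatch Logs delivers subscription data as gzip-compressed JSON, which arrives base64-encoded when a Lambda function receives it from a stream. The sketch below shows how such a record could be decoded into flat log documents; the output document layout and the function name are assumptions for this example, not the pipeline from the talk.

```python
import base64
import gzip
import json


def decode_subscription_record(data_b64):
    """Decode one CloudWatch Logs subscription record into flat log documents."""
    payload = json.loads(gzip.decompress(base64.b64decode(data_b64)))
    if payload.get("messageType") != "DATA_MESSAGE":
        return []  # control messages carry no log events
    return [
        {
            # "owner" is the ID of the account that produced the logs.
            "account": payload.get("owner"),
            "logGroup": payload["logGroup"],
            "logStream": payload["logStream"],
            "timestamp": log_event["timestamp"],
            "message": log_event["message"],
        }
        for log_event in payload["logEvents"]
    ]
```

Keeping the source account ID on every document makes it possible to tell the accounts apart once all logs end up in one central index.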
Challenges and lessons learned
Architect with AWS Serverless quotas and technical concepts in mind
General architectural decisions: Aurora (Serverless) vs DynamoDB
Challenging to trace all requests
Using X-Ray adds further complexity to the code
When to use SQS, SNS, Kinesis and EventBridge
https://www.serverlessguru.com/tips/sqs-vs-sns-vs-kinesis-vs-eventbridge
Service Quotas
account and current region limits
Service Quotas Request History
API Gateway Token Bucket Algorithm
•The throttle rate determines how many requests are allowed per second
•The throttle burst determines how many additional requests are allowed per second
API Gateway throttling-related settings are applied in the following order:
•Per-client or per-method throttling limits that you set for an API stage in a usage plan
•Per-method throttling limits that you set for an API stage
•Account-level throttling per Region
•AWS Regional throttling
https://aws.amazon.com/de/blogs/compute/building-well-architected-serverless-applications-controlling-serverless-api-access-part-2/
https://docs.aws.amazon.com/apigateway/latest/developerguide/api-gateway-request-throttling.html
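To make the relationship between rate and burst more concrete, here is a generic token bucket sketch. It illustrates the algorithm in general and does not claim to reproduce API Gateway's internal implementation: the bucket capacity plays the role of the burst and the refill speed the role of the rate. The class name and parameters are chosen for this example.

```python
class TokenBucket:
    """Generic token bucket: `rate` tokens are refilled per second, up to `burst` tokens."""

    def __init__(self, rate, burst):
        self.rate = rate
        self.capacity = burst
        self.tokens = float(burst)  # start with a full bucket
        self.last = 0.0

    def allow(self, now):
        """Return True if a request arriving at time `now` (in seconds) is admitted."""
        elapsed = now - self.last
        # Refill according to the elapsed time, but never beyond the capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # the caller would answer with a throttling error
```

With `TokenBucket(rate=2, burst=3)`, three requests arriving at the same instant are admitted and a fourth is rejected; half a second later one more request fits because one token has been refilled.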
API Gateway Important Service Quotas
Quota | Description | Value | Adjustable
Default throughput / Throttle rate | The maximum number of requests per second that your APIs can receive | 10,000 |
Throttle burst rate | The maximum number of additional requests per second that you can send in one burst | 5,000 |
API Gateway Important Service Quotas
Quota | Description | Value | Adjustable | Mitigation
Max timeout | The maximum integration timeout in milliseconds | 29 sec | | 1) Increase the limit 2) Lambda Function URL with response streaming
API payload size | Maximum payload size for non-WebSocket API | 10 MB | | 1) The client makes an HTTP GET request to API Gateway, and the Lambda function generates and returns a presigned S3 URL 2) The client uploads the image to S3 directly, using the presigned S3 URL
Challenges and lessons learned
Architect with AWS Serverless quotas and technical concepts in mind
General architectural decisions: Aurora (Serverless) vs DynamoDB
Challenging to trace all requests
Using X-Ray adds further complexity to the code
DynamoDB
 | DynamoDB | DynamoDB + DAX
Investment in knowledge | Understanding of NoSQL databases; understanding of single-table design principles | Same
Requires putting Lambda into the VPC to access the database | |
Aurora (Serverless v2)
 | Aurora (Serverless v2) + RDS Proxy | Aurora (Serverless v2) + Data API
Investment in knowledge | Relational databases are familiar to many developers | Same
Engine support | MySQL and PostgreSQL | Currently only PostgreSQL
Requires putting Lambda into the VPC to access the database | |
https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/Concepts.Aurora_Fea_Regions_DB-eng.Feature.ServerlessV2.html
https://dev.to/aws-builders/data-api-for-amazon-aurora-serverless-v2-with-aws-sdk-for-java-part-1-introduction-and-set-up-of-the-sample-application-3g71
Challenges and lessons learned
Architect with AWS Serverless quotas and technical concepts in mind
General architectural decisions: Aurora (Serverless) vs DynamoDB
Challenging to trace all requests
Using X-Ray adds further complexity to the code
Current architecture of our image storage solution
File storage modules:
•File API
•Project API
•Ordering API
•Config API
•Support API
•Scheduler Jobs
•Reporting API
•Migration API
•…
Serverless challenges
Serverless reduces the need for (readily available) ops skills but increases the demand for (less readily available) distributed system design skills.
https://architectelevator.com/cloud/serverless-illusion/