How Agoda Scaled 50x Throughput with ScyllaDB by Worakarn Isaratham

ScyllaDB 279 views 23 slides Mar 12, 2025

About This Presentation

Learn about Agoda's performance tuning strategies for ScyllaDB. Worakarn shares how they optimized disk performance, fine-tuned compaction strategies, and adjusted SSTable settings to match their workload for peak efficiency.

Slide Content

A ScyllaDB Community
How Agoda Scaled 50x
Throughput with ScyllaDB
Worakarn Isaratham
Lead Software Engineer

Worakarn Isaratham (he/him)
■Lead Software Engineer, Agoda
■Based in Bangkok, Thailand
■Experience in distributed computing, software testing
■Interested in dependable software systems

Presentation Agenda
■ScyllaDB in Agoda Feature Store
■Capacity Problem
■Potential Solutions

Agoda Feature Store

Online Feature Serving
Client SDK
Cache
ScyllaDB
App Servers
3.5M EPS 1.7M EPS
200k EPS
P99 Latency: 5 ms
P99 Latency: 8 ms
Average: 5 features per entity

Growth
Since the start of 2023
■Server traffic: 50x
Peak server traffic, on the busiest DC

Growth
Since the start of 2023
■Server traffic: 50x
■ScyllaDB traffic: 10x
10K EPS
Peak ScyllaDB traffic, on the busiest DC

A Capacity Problem
■A new use case wanted to onboard
■Problematic usage pattern:
■Bursty traffic from cold cache, hitting ScyllaDB at 120K EPS.
■Many duplicated requests in very quick succession
■Keep retrying any failed requests
12x of the load then
2x of the load now!

A Capacity Problem
■One DC was able to survive this load without errors.
■The other DC had many problems
■Very high error rate
■Took 40 minutes to finish all the retries
■Metrics pointed to slow reads on ScyllaDB nodes

Slow Disks
                  Bad DC             Good DC            Advantage
Disks             SATA SSD, RAID 0   NVMe SSD, RAID 0
Read IOPS         6868               79566              11.6x
Read bandwidth    1.5G               10.1G              6.7x
Write IOPS        6615               41104              6.2x
Write bandwidth   1.9G               6.3G               3.3x
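
The Advantage column is the Good DC figure divided by the Bad DC figure. A minimal Python sketch that reproduces it from the benchmark numbers on this slide (the dictionary layout and function name are illustrative, not from the talk):

```python
# Benchmark figures copied from the Slow Disks slide: (bad DC, good DC).
DISK_BENCHMARKS = {
    "read_iops": (6868, 79566),
    "read_bandwidth_g": (1.5, 10.1),
    "write_iops": (6615, 41104),
    "write_bandwidth_g": (1.9, 6.3),
}


def advantage(bad, good):
    """Return how many times faster the good DC is, rounded to one decimal."""
    return round(good / bad, 1)


for metric, (bad, good) in DISK_BENCHMARKS.items():
    print(f"{metric}: {advantage(bad, good)}x")
```

The largest gap is in read IOPS, which matches the metrics pointing to slow reads on the ScyllaDB nodes.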

Just Buy New Disks?
●New disks were ordered
●Improved user-side caching, reduced this load to 7K.
●How long could we survive?
Capacity

Cache-Avoiding Load Test
■Use artificial, one-time-used load to avoid ScyllaDB caching.
25K 5K
Normal load
ScyllaDB cache
one-time-used entities
BYPASS CACHE
Flush, Restart ScyllaDB
Baseline EPS for SATA
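
One way to build such a load is to use entity ids that are each read exactly once, and to append ScyllaDB's `BYPASS CACHE` clause to the read query. The sketch below only builds the statement text and the ids. The keyspace, table, and column names are hypothetical (the slides do not show the real schema), and running the query through a driver is left out.

```python
import uuid

# Hypothetical names; the real feature store schema is not shown in the slides.
KEYSPACE = "feature_store"
TABLE = "features"


def build_read_statement(keyspace=KEYSPACE, table=TABLE):
    """Build a CQL read that asks ScyllaDB to neither use nor populate its cache."""
    return (
        f"SELECT feature_name, value FROM {keyspace}.{table} "
        "WHERE entity_id = ? BYPASS CACHE"
    )


def one_time_entity_ids(count):
    """Yield artificial entity ids; each is meant to be written once and read once."""
    for _ in range(count):
        yield str(uuid.uuid4())


statement = build_read_statement()
ids = list(one_time_entity_ids(5))
```

Because every id is unique, repeated runs cannot be served from a warm cache, so the measured throughput reflects the disks rather than memory.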

Idea 1: Different Data Modeling
Current: one tall table
Alternative: one table per feature set
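
To make the two layouts concrete, here is a rough Python sketch that generates CQL DDL for each. Everything in it is an assumption for illustration: the slides do not show the actual schema, so the keyspace, column names, types, and primary keys are hypothetical.

```python
# Hypothetical schema; only the "one tall table" versus "one table per
# feature set" distinction comes from the slides.
COLUMNS_TALL = "entity_id text, feature_set text, feature_name text, value blob"
COLUMNS_PER_SET = "entity_id text, feature_name text, value blob"


def tall_table_ddl(keyspace="feature_store"):
    """Current layout: a single table, with the feature set stored as a key column."""
    return (
        f"CREATE TABLE {keyspace}.features ({COLUMNS_TALL}, "
        "PRIMARY KEY ((entity_id, feature_set), feature_name))"
    )


def per_feature_set_ddl(feature_set, keyspace="feature_store"):
    """Alternative layout: the feature set becomes the table name instead of a column."""
    return (
        f"CREATE TABLE {keyspace}.{feature_set} ({COLUMNS_PER_SET}, "
        "PRIMARY KEY (entity_id, feature_name))"
    )


print(tall_table_ddl())
print(per_feature_set_ddl("example_set"))
```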

Idea 1: Different Data Modeling

Idea 2: Change Compaction Strategy
■Our workload is “Read-mostly, many updates”. Size-tiered strategy is recommended.
Prioritized read latency
Slow disk read
Large SSTable files
Size-tiered Compaction → Leveled Compaction
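
In CQL, the compaction strategy is a per-table option. The sketch below builds a `CREATE TABLE` statement with the strategy set at creation time. The table name and columns are hypothetical; only the two strategy class names are standard.

```python
# Standard compaction strategy class names used in the CQL "compaction" option.
COMPACTION_CLASSES = {
    "size_tiered": "SizeTieredCompactionStrategy",
    "leveled": "LeveledCompactionStrategy",
}


def create_table_with_compaction(table, strategy):
    """Build CREATE TABLE CQL for a hypothetical feature table with the given strategy."""
    compaction_class = COMPACTION_CLASSES[strategy]
    return (
        f"CREATE TABLE {table} ("
        "entity_id text, feature_name text, value blob, "
        "PRIMARY KEY (entity_id, feature_name)) "
        f"WITH compaction = {{'class': '{compaction_class}'}}"
    )


print(create_table_with_compaction("feature_store.features_lcs", "leveled"))
```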

Idea 2: Change Compaction Strategy
1.5x

Idea 3: Increase Summary File Size
■ScyllaDB uses summary files to help navigate to index files
summary file size ≈ data file size × summary ratio
High ratio → Larger summary → More efficient index → Less disk I/O
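
The size relation can be worked through numerically. In the sketch below, the ratio values and the data size are illustrative assumptions, not Agoda's settings. ScyllaDB's `sstable_summary_ratio` option in scylla.yaml is assumed to be the setting in question; the slides do not name it.

```python
def summary_size_bytes(data_size_bytes, summary_ratio):
    """Approximate summary size: data file size multiplied by the summary ratio."""
    if not 0 < summary_ratio <= 1:
        raise ValueError("summary_ratio must be in (0, 1]")
    return data_size_bytes * summary_ratio


def data_bytes_per_summary_byte(summary_ratio):
    """How many bytes of data each byte of summary has to cover."""
    return 1 / summary_ratio


DATA_SIZE = 100 * 1024**3  # illustrative 100 GiB of SSTable data

for ratio in (0.0005, 0.005):  # illustrative ratios, an assumption
    size_mib = summary_size_bytes(DATA_SIZE, ratio) / 1024**2
    coverage = data_bytes_per_summary_byte(ratio)
    print(f"ratio={ratio}: summary ~ {size_mib:.1f} MiB, "
          f"1 summary byte per {coverage:.0f} data bytes")
```

A ten times higher ratio gives a ten times larger summary, so each summary entry covers a smaller slice of the index and fewer disk reads are needed to locate a row.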

Idea 3: Increase Summary File Size
4x

NVMe
60x

Rollout
Jul 2023: New summary ratio applied
Oct 2023: Migrated to NVMe disks
Focus shifted to other components.
Still trying out some new ideas on ScyllaDB.
Leveled Compaction: only applied to new tables, needs data migration

Recent Experiments
●Partitioned By Feature Set, clustered by Entity
○Disastrous! 400x worse
●All features as a blob in a single row
○+35% throughput

Lessons
●Fast disks are essential!
●Benchmark your load
●Tailor your data model to fit the needs

Stay in Touch
Worakarn Isaratham
github.com/arkorwan
www.linkedin.com/in/worakarn
