Scaling to 6.6M Read OPS with ScyllaDB on Kubernetes: Achieving Sub-2ms Latency and Robust Recovery by Shubham Sharma
ScyllaDB
About This Presentation
Learn how we achieved 6.6M read OPS with sub-2ms latency on a single ScyllaDB cluster in Kubernetes, optimizing machine types, shard-aware routing, and backup/recovery. We’ll cover how shard-aware drivers and ScyllaDB’s shard-per-core model cut latency to ~900 µs, and how we tuned machine types across Intel, AMD, and Google Axion hardware. The talk also details our GKE deployment with CPU pinning, host networking, and NVMe storage.
Slide Content
A ScyllaDB Community
Scaling to 6.6M Read OPS
With ScyllaDB on Kubernetes:
Achieving Sub-2ms Latency
Shubham Sharma
Staff Systems Engineer
Shubham Sharma
Staff Systems Engineer at Verve
■Scaled a ScyllaDB cluster on Kubernetes to 6.6M reads/sec with P99 latency under 2 ms, powering two real-time apps: one in Java, one in Go.
■P99s are the real truth serum for performance; they expose the tail latency that averages try to hide.
■I love optimising systems until they’re almost too fast for the dashboards to keep up.
■I experiment with new tech and explore the mountains.
Use Case: Audience Enrichment
■Real-time matching & retrieval of audience profiles
■Enables personalised ads & recommendations
■Handles huge request volumes with predictable latency
Flow
Once the audience profile is retrieved, the system can:
■Match the profile against campaign targeting rules (ads).
■Generate personalized content recommendations in real time.
How ScyllaDB helps:
■Supports time-series/event data (recent clicks, searches, purchases).
■Can quickly filter and match profiles against advertiser/ML models (see the lookup sketch below).
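To make the retrieval step concrete, here is a minimal sketch in Go with the gocql driver. The host names, keyspace, table, and columns (audience, profiles, user_id, segment, event_ts) are hypothetical, not from the deck; it assumes a profile is stored as recent events clustered by timestamp.

```go
package main

import (
	"fmt"
	"log"
	"time"

	"github.com/gocql/gocql" // production would use the shard-aware fork, github.com/scylladb/gocql
)

func main() {
	// Hypothetical contact points inside the Kubernetes cluster.
	cluster := gocql.NewCluster("scylla-0.scylla.svc", "scylla-1.scylla.svc", "scylla-2.scylla.svc")
	cluster.Keyspace = "audience"        // hypothetical keyspace
	cluster.Consistency = gocql.LocalOne // favor read latency; RF=3 still gives redundancy
	cluster.Timeout = 10 * time.Millisecond

	session, err := cluster.CreateSession()
	if err != nil {
		log.Fatal(err)
	}
	defer session.Close()

	// Pull the most recent events for one profile; a single partition key
	// (user_id) lets ScyllaDB serve the whole read from one shard.
	iter := session.Query(
		`SELECT segment, event_ts FROM profiles WHERE user_id = ? LIMIT 20`,
		"user-123",
	).Iter()

	var segment string
	var eventTS time.Time
	for iter.Scan(&segment, &eventTS) {
		fmt.Println(segment, eventTS)
	}
	if err := iter.Close(); err != nil {
		log.Fatal(err)
	}
}
```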
The Challenge
■Need: Real-time data streaming with sub-millisecond latency
■Dataset: ~7 TB
■Requirement: Handle millions of read OPS reliably
■Use Case: Audience enrichment at scale
Production Environment Setup
■Platform: GKE v1.31
■Dedicated Node Pool: n2-highmem-64 (64 vCPUs, 512 GB RAM)
■All data in-memory → no cache misses (rough sizing math below)
■3 racks, RF=3, NetworkTopologyStrategy for multi-DC efficiency
■Local NVMe SSD storage for low latency
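A rough sizing sanity check (my arithmetic, not from the deck): ~7 TB of data at RF=3 is ~21 TB of replicated data, and at 512 GB of RAM per n2-highmem-64 node, keeping everything memory-resident needs at least 21 TB / 512 GB ≈ 42 nodes, before OS and ScyllaDB overhead.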
Machine Type Chosen: n2-highmem-64
■Best balance of cost & speed: strong compute performance without overspending; efficient for always-on, latency-sensitive database clusters.
■High memory (512 GB): allows complete dataset caching in memory, reducing disk reads. Critical for ScyllaDB, where in-memory operations significantly improve throughput.
Performance Observations
■Local SSD with ~680k read IOPS: handles the extremely high read/write rates ScyllaDB needs and delivers predictable low latency for large-scale OLTP workloads.
■Scales horizontally: the n2-highmem family fits ScyllaDB’s shard-per-core architecture, making it easy to expand cluster capacity while maintaining performance.
Key Optimizations
■Shard-aware Java & Go drivers (see the Go sketch after this list)
■CPU pinning & host networking
■Prometheus + Grafana for real-time monitoring
■Tuned Kubernetes deployment via ScyllaDB Operator
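Expanding on the shard-aware driver bullet, here is a minimal sketch of how the Go driver is typically wired up; it is one plausible configuration, not necessarily the speakers’ exact setup, and the host name and keyspace are hypothetical. The key line is the token-aware host selection policy, which ScyllaDB’s gocql fork extends with per-shard connections so each request lands on the core that owns the partition.

```go
package main

import (
	"log"
	"time"

	"github.com/gocql/gocql" // use the fork github.com/scylladb/gocql for shard-awareness
)

func newSession() (*gocql.Session, error) {
	// Hypothetical service DNS name for the ScyllaDB nodes in GKE.
	cluster := gocql.NewCluster("scylla-client.scylla.svc")
	cluster.Keyspace = "audience" // hypothetical keyspace

	// Token-aware selection sends each query to a replica that owns the
	// partition key; the ScyllaDB fork additionally routes it to the
	// owning shard (CPU core), skipping a server-side hop.
	cluster.PoolConfig.HostSelectionPolicy = gocql.TokenAwareHostPolicy(
		gocql.RoundRobinHostPolicy(),
	)

	// Keep timeouts tight so tail latency surfaces as errors, not queueing.
	cluster.Timeout = 10 * time.Millisecond

	return cluster.CreateSession()
}

func main() {
	session, err := newSession()
	if err != nil {
		log.Fatal(err)
	}
	defer session.Close()
	log.Println("connected with token-aware (shard-aware with the ScyllaDB fork) routing")
}
```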
The Achievement
■Nearly 11M read OPS sustained
■P99 latency under 2 ms (as low as ~900 µs)
■Powered two applications (Java & Go clients)
Closing
■Strategies to maximize ScyllaDB in cloud-native environments
■Backed by real-world benchmarks and production data
■Future scaling potential with cost efficiency in mind