Scaling to 6.6M Read OPS with ScyllaDB on Kubernetes: Achieving Sub-2ms Latency and Robust Recovery by Shubham Sharma

ScyllaDB · 19 slides · Oct 15, 2025

About This Presentation

Learn how we achieved 6.6M read OPS with sub-2ms latency on a single ScyllaDB cluster in Kubernetes, covering machine-type selection, shard-aware drivers, and backup/recovery. We’ll cover how shard-aware drivers and ScyllaDB’s shard-per-core model cut latency to ~900 µs, and how we tuned machine types...


Slide Content

A ScyllaDB Community
Scaling to 6.6M Read OPS
With ScyllaDB on Kubernetes:
Achieving Sub-2ms Latency
Shubham Sharma
Staff Systems Engineer

Shubham Sharma

Staff Systems Engineer at Verve
■Scaled a ScyllaDB cluster on Kubernetes to hit 6.6M
reads/sec with P99 latency under 2ms, powering two
real-time apps—one in Java, one in Go.
■P99s are the real truth serum for performance—they expose
tail latency that averages try to hide.
■I love optimising systems until they’re almost too fast for the
dashboards to keep up.
■Experiment with new tech, explore the mountains.

Use Case: Audience Enrichment
■Real-time matching & retrieval of audience profiles
■Enables personalised ads & recommendations
■Handles huge request volumes with predictable latency

Flow

Once the audience profile is retrieved, the system can:
■Match the profile against campaign targeting rules (ads).
■Generate personalized content recommendations in real time.

How ScyllaDB helps:
■Supports time-series/event data (recent clicks, searches, purchases).
■Can quickly filter and match profiles against advertiser/ML models.
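
As a sketch of what that event data might look like, here is an illustrative time-series table, one partition per profile, clustered newest-first (the keyspace, table, and column names are assumptions, not from the talk):

```sql
-- Illustrative schema: one partition per audience profile,
-- recent events (clicks, searches, purchases) returned newest-first.
CREATE TABLE IF NOT EXISTS audience.profile_events (
  profile_id uuid,
  event_time timestamp,
  event_type text,        -- e.g. 'click', 'search', 'purchase'
  payload    text,
  PRIMARY KEY (profile_id, event_time)
) WITH CLUSTERING ORDER BY (event_time DESC);
```

With this layout, fetching a profile's most recent events is a single-partition read, which is what keeps the lookup path predictable at high request volumes.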

The Challenge

■Need: Real-time data streaming with sub-millisecond latency
■Dataset: ~7 TB
■Requirement: Handle millions of read OPS reliably
■Use Case: Audience enrichment at scale

Production Environment Setup

■Platform: GKE v1.31
■Dedicated Node Pool: n2-highmem-64 (64 vCPUs, 512 GB RAM; 480 GB allocated to ScyllaDB)
■All data in-memory → no cache misses
■3 racks, RF=3, NetworkTopologyStrategy for multi-DC efficiency
■Local NVMe SSD storage for low latency

Rack Structure
- name: "us-east4-a-2"
  # Number of rack members (nodes)
  members: 9
  # Storage definition
  storage:
    storageClassName: scylladb-local-xfs
    capacity: 2000G
  # Scylla container resource definition
  resources:
    limits:
      cpu: 60
      memory: 480G
    requests:
      cpu: 60
      memory: 480G
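
For context, a rack definition like this lives inside a ScyllaCluster resource managed by the ScyllaDB Operator. A hedged sketch of the surrounding spec (field names per the Operator's v1 API; the cluster name and version are assumptions) showing where CPU pinning and host networking are enabled:

```yaml
apiVersion: scylla.scylladb.com/v1
kind: ScyllaCluster
metadata:
  name: audience-enrichment    # hypothetical name
spec:
  version: 6.0.0               # assumed; use your actual ScyllaDB version
  cpuset: true                 # pin shards to dedicated cores (needs the static CPU manager policy)
  network:
    hostNetworking: true       # bypass the CNI overlay for lower latency
  datacenter:
    name: us-east4
    racks:
      - name: "us-east4-a-2"
        members: 9
        # storage and resources as in the rack definition above
```

Note that `cpuset: true` only takes effect when the pod gets whole, exclusive CPUs, which is why the resource requests and limits above are equal integer values.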

Why NetworkTopologyStrategy

■Ensures reads are served from nodes in the same region
■Optimised for multi-data-center approach
■Improved regional latency and redundancy
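
Concretely, this strategy is set per keyspace. A sketch of the corresponding DDL (the keyspace name and datacenter label are assumptions; the DC name must match the cluster's actual datacenter):

```sql
-- Hypothetical keyspace: 3 replicas placed across the 3 racks of us-east4.
CREATE KEYSPACE IF NOT EXISTS audience
  WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'us-east4': 3
  };
```

With RF=3 spread over three racks, local-quorum reads can always be served within the region, and the loss of a single rack leaves two replicas available.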

Testing Across Machine Types

■n2-highmem-64: 64 vCPUs, 512 GB RAM, Intel Ice Lake
■c3d-standard-90-lssd: 90 vCPUs, 360 GB RAM, AMD Genoa
■c4a-highmem-64-lssd: 64 vCPUs, 512 GB RAM, Google Axion
■z3-highmem-88: 88 vCPUs, 704 GB RAM, Intel Xeon

Instance Configurations

Performance Results

Performance Observations

Machine Type Chosen: n2-highmem-64
■Best balance of cost & speed: strong compute performance without overspending; efficient for always-on, latency-sensitive database clusters.
■High memory (512 GB): allows complete dataset caching in memory, reducing disk reads. Critical for ScyllaDB, where in-memory operations significantly improve throughput.

Performance Observations

■Local SSD with ~680k read IOPS: handles the extremely high read/write rates Scylla needs and ensures predictably low latency for large-scale OLTP workloads.
■Scales horizontally: the n2-highmem family fits well into Scylla's shard-per-core architecture, making it easy to expand cluster capacity while maintaining performance.

Key Optimizations
■Shard-aware Java & Go drivers
■CPU pinning & host networking
■Prometheus + Grafana for real-time monitoring
■Tuned Kubernetes deployment via ScyllaDB Operator
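
The point of shard-awareness is that the driver routes each request straight to the CPU core that owns the token, skipping an extra internal hop. A minimal sketch of ScyllaDB's token-to-shard function, which the shard-aware Java and Go drivers implement internally (shard count and msb-ignore values here are illustrative; the drivers learn the real ones from the server):

```go
package main

import (
	"fmt"
	"math/bits"
)

// shardOf maps a Murmur3 token to a ScyllaDB shard: bias the signed
// token into unsigned space, shift out the most-significant bits Scylla
// ignores, then take the high 64 bits of a 128-bit multiply by the
// shard count.
func shardOf(token int64, nrShards uint64, ignoreMSB uint) uint64 {
	z := uint64(token) + 1<<63 // bias signed token into [0, 2^64)
	z <<= ignoreMSB            // drop ignored most-significant bits
	hi, _ := bits.Mul64(z, nrShards)
	return hi // equivalent to (z * nrShards) >> 64
}

func main() {
	// e.g. 60 shards (one per pinned vCPU), default msb-ignore of 12:
	for _, tok := range []int64{-9223372036854775808, -1, 0, 42} {
		fmt.Printf("token %20d -> shard %d\n", tok, shardOf(tok, 60, 12))
	}
}
```

Because the mapping is pure arithmetic, the driver can compute the owning shard client-side for every statement, which is a large part of how the ~900 µs read path avoids cross-core forwarding.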

Lessons Learned

■Choose vCPU-to-memory ratio wisely
■In-memory dataset eliminates cache misses
■Network topology-aware replication boosts multi-region performance

The Achievement
■Nearly 11M read OPS sustained
■P99 latency under 2 ms (as low as ~900µs)
■Powered two applications (Java & Go clients)

Closing

■Strategies to maximize ScyllaDB in cloud-native environments
■Backed by real-world benchmarks and production data
■Future scaling potential with cost efficiency in mind

Thank you! Let’s connect.
Shubham Sharma
[email protected]