Scaling to 6.6M Read OPS with ScyllaDB on Kubernetes: Achieving Sub-2ms Latency and Robust Recovery by Shubham Sharma
ScyllaDB
About This Presentation
Learn how we achieved 6.6M read OPS with sub-2ms latency on a single ScyllaDB cluster in Kubernetes, optimizing machine types, shard-aware routing, and backup/recovery. We’ll cover how shard-aware drivers and ScyllaDB’s shard-per-core model cut latency to ~900 µs, and how we tuned machine types across Intel, AMD, and Google Axion hardware. The talk also details our GKE deployment with CPU pinning, host networking, and NVMe storage.
Slide Content
A ScyllaDB Community
Scaling to 6.6M Read OPS
With ScyllaDB on Kubernetes:
Achieving Sub-2ms Latency
Shubham Sharma
Staff Systems Engineer
Shubham Sharma
Staff Systems Engineer at Verve
■Scaled a ScyllaDB cluster on Kubernetes to 6.6M reads/sec with P99 latency under 2 ms, powering two real-time apps: one in Java, one in Go.
■P99s are the real truth serum for performance; they expose the tail latency that averages try to hide.
■I love optimising systems until they’re almost too fast for the dashboards to keep up.
■I experiment with new tech and explore the mountains.
Use Case: Audience Enrichment
■Real-time matching & retrieval of audience profiles
■Enables personalised ads & recommendations
■Handles huge request volumes with predictable latency
Flow
Once the audience profile is retrieved, the system can:
■Match the profile against campaign targeting rules (ads).
■Generate personalized content recommendations in real time.
How ScyllaDB helps:
■Supports time-series/event data (recent clicks, searches, purchases).
■Can quickly filter and match profiles against advertiser/ML models (see the lookup sketch below).
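To make the retrieval step concrete, here is a minimal sketch in Go with the gocql driver. The host names, keyspace, table, and columns (audience, profiles, user_id, segment, event_ts) are hypothetical, not from the deck; it assumes a profile is stored as recent events clustered by timestamp.

```go
package main

import (
	"fmt"
	"log"
	"time"

	"github.com/gocql/gocql" // production would use the shard-aware fork, github.com/scylladb/gocql
)

func main() {
	// Hypothetical contact points inside the Kubernetes cluster.
	cluster := gocql.NewCluster("scylla-0.scylla.svc", "scylla-1.scylla.svc", "scylla-2.scylla.svc")
	cluster.Keyspace = "audience"        // hypothetical keyspace
	cluster.Consistency = gocql.LocalOne // favor read latency; RF=3 still gives redundancy
	cluster.Timeout = 10 * time.Millisecond

	session, err := cluster.CreateSession()
	if err != nil {
		log.Fatal(err)
	}
	defer session.Close()

	// Pull the most recent events for one profile; a single partition key
	// (user_id) lets ScyllaDB serve the whole read from one shard.
	iter := session.Query(
		`SELECT segment, event_ts FROM profiles WHERE user_id = ? LIMIT 20`,
		"user-123",
	).Iter()

	var segment string
	var eventTS time.Time
	for iter.Scan(&segment, &eventTS) {
		fmt.Println(segment, eventTS)
	}
	if err := iter.Close(); err != nil {
		log.Fatal(err)
	}
}
```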
The Challenge
■Need: Real-time data streaming with sub-millisecond latency
■Dataset: ~7 TB
■Requirement: Handle millions of read OPS reliably
■Use Case: Audience enrichment at scale
Production Environment Setup
■Platform: GKE v1.31
■Dedicated Node Pool: n2-highmem-64 (64 vCPUs, 512 GB RAM)
■All data in-memory → no cache misses (rough sizing math below)
■3 racks, RF=3, NetworkTopologyStrategy for multi-DC efficiency
■Local NVMe SSD storage for low latency
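A rough sizing sanity check (my arithmetic, not from the deck): ~7 TB of data at RF=3 is ~21 TB of replicated data, and at 512 GB of RAM per n2-highmem-64 node, keeping everything memory-resident needs at least 21 TB / 512 GB ≈ 42 nodes, before OS and ScyllaDB overhead.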
Machine Type Chosen: n2-highmem-64
■Best balance of cost & speed: strong compute performance without overspending; efficient for always-on, latency-sensitive database clusters.
■High memory (512 GB): allows complete dataset caching in memory, reducing disk reads. Critical for ScyllaDB, where in-memory operations significantly improve throughput.
Performance Observations
■Local SSD with ~680k read IOPS: handles the extremely high read/write rates ScyllaDB needs and delivers predictable low latency for large-scale OLTP workloads.
■Scales horizontally: the n2-highmem family fits ScyllaDB’s shard-per-core architecture, making it easy to expand cluster capacity while maintaining performance.
Key Optimizations
■Shard-aware Java & Go drivers (see the Go sketch after this list)
■CPU pinning & host networking
■Prometheus + Grafana for real-time monitoring
■Tuned Kubernetes deployment via ScyllaDB Operator
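Expanding on the shard-aware driver bullet, here is a minimal sketch of how the Go driver is typically wired up; it is one plausible configuration, not necessarily the speakers’ exact setup, and the host name and keyspace are hypothetical. The key line is the token-aware host selection policy, which ScyllaDB’s gocql fork extends with per-shard connections so each request lands on the core that owns the partition.

```go
package main

import (
	"log"
	"time"

	"github.com/gocql/gocql" // use the fork github.com/scylladb/gocql for shard-awareness
)

func newSession() (*gocql.Session, error) {
	// Hypothetical service DNS name for the ScyllaDB nodes in GKE.
	cluster := gocql.NewCluster("scylla-client.scylla.svc")
	cluster.Keyspace = "audience" // hypothetical keyspace

	// Token-aware selection sends each query to a replica that owns the
	// partition key; the ScyllaDB fork additionally routes it to the
	// owning shard (CPU core), skipping a server-side hop.
	cluster.PoolConfig.HostSelectionPolicy = gocql.TokenAwareHostPolicy(
		gocql.RoundRobinHostPolicy(),
	)

	// Keep timeouts tight so tail latency surfaces as errors, not queueing.
	cluster.Timeout = 10 * time.Millisecond

	return cluster.CreateSession()
}

func main() {
	session, err := newSession()
	if err != nil {
		log.Fatal(err)
	}
	defer session.Close()
	log.Println("connected with token-aware (shard-aware with the ScyllaDB fork) routing")
}
```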
The Achievement
■Nearly 11M read OPS sustained
■P99 latency under 2 ms (as low as ~900 µs)
■Powered two applications (Java & Go clients)
Closing
■Strategies to maximize ScyllaDB in cloud-native environments
■Backed by real-world benchmarks and production data
■Future scaling potential with cost efficiency in mind