Shared Nothing Databases at Scale by Nick Van Wiggeren

ScyllaDB · 24 slides · Oct 14, 2025

About This Presentation

This talk will discuss how PlanetScale scaled databases in the cloud, focusing on a shared-nothing architecture that is built around expecting failure. Nick will go into how to deliver low-latency high-throughput systems that span multiple nodes, availability zones, and regions, while maintaining su...


Slide Content

A ScyllaDB Community
Shared Nothing
Databases at Scale
Nick Van Wiggeren
CTO

Nick Van Wiggeren

CTO @ PlanetScale
■Based in Seattle, WA
■I spend a lot of my day thinking about tail latency
■Outside of work, I like getting as far away from
computers as possible

Vitess & PlanetScale

Vitess: Scale-out MySQL
Vitess is a CNCF-graduated project that scales MySQL. With Vitess, an application
can speak to tens of thousands of MySQL instances with a single connection
string.
■Built for OLTP workloads, designed for low-latency and high availability
■Fully MySQL compatible
■Improves on MySQL with transparent query rewriting, caching, connection
pooling and much more

Vitess Architecture

PlanetScale: Applied Vitess
PlanetScale runs petabyte-scale Vitess clusters in AWS and GCP for some of the
web’s largest businesses.
■Individual clusters running at over 5M QPS peak with <3ms p99
●Sustained throughput of 1,000,000s of IOPS and 10s of GB/s of file system throughput
■We execute 1000s of MySQL failovers every day
■This puts us in a unique position of being able to talk about what it takes to
run workloads at scale in the cloud

How?

Shared Nothing

What is “shared nothing?”
Shared nothing is the only way to scale beyond a single machine
■Build self-contained units of scale instead of systems of coordination
■Modern consensus algorithms are great, but they can’t beat the speed of light
■Linear scalability and consensus are like oil and water - you can’t have
one without giving up the other
■When state needs to be shared, keep it isolated, fault tolerant, and off the
query path

What is “shared nothing?”
Do one tiny thing really well, thousands and thousands of times
■A write operation follows a simple path:
●Client -> VTGate -> MySQL Primary -> Disk (+ replicas)
●No consensus, no distributed commits, no cross-node coordination
■At scale, this means:
●10,000 shards = 10,000 independent failure domains
●10,000 shards = 10,000 parallel write paths
●10,000 shards = 1 database that performs like 10,000 databases
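The routing step above can be sketched in a few lines. This is a simplified hash-modulo scheme for illustration; Vitess actually maps rows to keyspace-ID ranges via vindexes, and `NUM_SHARDS` and `shard_for` are hypothetical names.

```python
import hashlib

NUM_SHARDS = 10_000  # illustrative; Vitess uses keyspace-ID ranges, not modulo

def shard_for(sharding_key: str) -> int:
    """Deterministically map a sharding key to one independent shard."""
    digest = hashlib.sha256(sharding_key.encode()).digest()
    keyspace_id = int.from_bytes(digest[:8], "big")
    return keyspace_id % NUM_SHARDS

# Each key routes to exactly one shard, so a single-row write touches one
# failure domain and never waits on the other 9,999.
print(shard_for("user:1001"))
```

Because the mapping is a pure function of the key, any router instance computes the same answer with no coordination on the query path.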

What is “shared nothing?” in Vitess
Build thousands of individual MySQL clusters, not one single huge one
■Every Vitess shard is an isolated MySQL cluster, with its own replication
topology and ability to independently failover
■If a query reads or writes data from an individual shard, only that shard needs
to be available to serve the query.
■If a query reads data from multiple shards, only those involved are contacted

What is “shared nothing?” in Vitess
Deploy horizontally scalable query planners + routers, not a monolithic service
■VTGate reads from the “topology” server (etcd, consul, etc) and builds a view
of the world
■If the system doesn’t change, it can continue to operate without topology
updates
■Horizontally scalable: you can run one, you can run thousands of them, it
scales as far as key/value lookups
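The "read the topology once, keep serving the cached view" pattern might look like the sketch below. The key/value interface (`current_version`, `read`) is hypothetical, standing in for an etcd or consul watch; real VTGate is considerably more involved.

```python
class TopologyCache:
    """Serve a cached view of the world; re-read the topology store only
    when its version changes (assumed KV interface, for illustration)."""
    def __init__(self, kv):
        self.kv = kv
        self.version = None
        self.view = None

    def shard_map(self):
        version = self.kv.current_version()  # cheap version check
        if version != self.version:          # rebuild only on change
            self.version = version
            self.view = self.kv.read("shard_map")
        return self.view                     # steady state: no topology read

class FakeKV:
    """Stand-in topology store for the example."""
    def current_version(self):
        return 1
    def read(self, key):
        return {"-80": "mysql-a", "80-": "mysql-b"}

cache = TopologyCache(FakeKV())
print(cache.shard_map())  # first call reads the store; later calls hit cache
```

This is why the design scales as far as key/value lookups: each router is stateless apart from this cache, so you can run one or thousands.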

Achieving “Shared Nothing”
In order to achieve this, you need to design your data just as much as you need to
pick the right system
■Data needs to be partitioned by a functional unit: user, organization, etc
■Use a separate system for summing and counting: HTAP isn’t real, and you’re
better off pairing the best OLTP database with the best OLAP database
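A toy sketch makes the "summing and counting" point concrete: a point lookup by partition key touches one shard, while a global aggregate is a scatter-gather across every shard. All names here are illustrative, not Vitess internals.

```python
# Each shard is modeled as a dict of rows keyed by user id.
def point_lookup(shards, shard_id, user_id):
    return shards[shard_id].get(user_id)   # contacts exactly 1 shard

def global_count(shards):
    return sum(len(s) for s in shards)     # contacts every shard

shards = [{"u1": "row"}, {"u2": "row"}, {}]
print(point_lookup(shards, 0, "u1"))  # touches shard 0 only
print(global_count(shards))           # touches all 3 shards
```

With 10,000 shards, the aggregate pays 10,000 network round trips and is bounded by the slowest shard, which is why analytics belong in a dedicated OLAP system.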

Lessons Learned

The Network is Slow
In a distributed system, it’s impossible to match the latency of someone running
MySQL on their laptop, and it’s really easy to do a lot worse
■AWS Latencies
●500 microseconds between two EC2 instances in the same zone
●1-2 milliseconds between two EC2 instances in the same region, different zones
■Every “hop” between your user and a write - whether it’s to move data or reach
consensus - is a cost.
■The network is variable: these are median numbers, the P99 has no SLA
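A rough tally using the median figures above shows how quickly hops add up. These are medians with no P99 guarantee, and the hop counts are an assumed write path, not a measured one.

```python
# Median AWS latencies from the slide, in microseconds.
SAME_AZ_US = 500     # EC2 <-> EC2, same zone
CROSS_AZ_US = 1500   # EC2 <-> EC2, same region, different zones (~1-2 ms)

def write_path_us(same_az_hops: int, cross_az_hops: int) -> int:
    """Sum of network hop costs along one write, disk excluded."""
    return same_az_hops * SAME_AZ_US + cross_az_hops * CROSS_AZ_US

# Client -> VTGate -> primary (same zone), plus one cross-zone replica ack:
print(write_path_us(2, 1) / 1000, "ms")  # 2.5 ms of network before any disk I/O
```

Add a consensus round trip and the budget for a low-single-digit-millisecond P99 is already gone, which is the argument for keeping coordination off the query path.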

Network-backed storage is harmful
In order to give our users the performance they demand, we had to abandon
network-attached storage
■AWS gp3 promises “single-digit millisecond latency” 99 percent of the time
●This means you can’t achieve a P99 of < 10ms if a gp3 volume is in your write path!
■You can pay 4-10x more for io2 and still get worse performance than a
handful of consumer SSDs
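The gp3 claim follows from simple tail arithmetic. Assuming (for illustration) that each volume touch independently lands in the slow 1%, the fraction of requests hitting the tail compounds with every serial touch:

```python
P_SLOW = 0.01  # gp3 misses "single-digit millisecond latency" 1% of the time

def p_any_slow(volume_touches: int) -> float:
    """Probability at least one of N serial volume touches is slow,
    assuming independence (a simplification)."""
    return 1 - (1 - P_SLOW) ** volume_touches

for k in (1, 2, 3):
    print(k, round(p_any_slow(k), 4))
# With 3 serial volume touches, ~2.97% of requests hit the slow tail,
# so the request P99 is at least the volume's slow-case latency.
```

Even one touch puts 1% of requests at 10ms or worse, which is exactly why a sub-10ms P99 is unreachable with gp3 on the write path.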

The Cloud is Unpredictable
It’s not just network and storage - it’s everything. In the datacenter, you can touch
your physical drives and know where every machine lives. In the cloud, assume
everything is ephemeral.
■Test your failover paths daily, and not just planned failovers
■Partial failure is a constant
●Are you prepared for an EBS volume to only achieve 10% of its provisioned throughput for 30
minutes?

Your P99 is a lie
Measuring percentiles in steady-state isn’t useful - over a year, your P99 is
dominated by failure scenarios
■Have you measured your P99:
●During peak load
●While orchestrating a failover
●At the same time as an unplanned maintenance event is happening in the cloud
■Measure from the client, not the server: just as much can go wrong
■The fastest P99s I’ve seen are when the database returned errors quickly
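Measuring from the client can be as simple as timing the full call, network included, and computing percentiles over the samples. This is a minimal sketch; the simulated workload is a stand-in for a real query call.

```python
import random
import time

def percentile(samples, p):
    """Nearest-rank percentile over a list of latency samples."""
    s = sorted(samples)
    idx = min(len(s) - 1, int(p / 100 * len(s)))
    return s[idx]

def timed_call(fn):
    """Wall-clock time around the whole call, as the client experiences it."""
    start = time.perf_counter()
    fn()
    return (time.perf_counter() - start) * 1000  # milliseconds

# Stand-in workload; replace the lambda with an actual database query.
samples = [timed_call(lambda: time.sleep(random.uniform(0, 0.002)))
           for _ in range(200)]
print("client-side p99:", round(percentile(samples, 99), 2), "ms")
```

Note the caveat from the slide: a database that fails fast produces flattering latency samples, so percentiles should always be read alongside the error rate.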

In Summary

Scaling Beyond a Single Machine is Hard
■Pick your technology wisely
●The wrong database will add complexity and give you no benefits
■Measure your steady-state P99, care about your edge cases

Thank you! Let’s connect.
Nick Van Wiggeren
[email protected]
@NickVanWig