Fast and Deterministic Full Table Scans at Scale by Felipe Cardeneti Mendes
ScyllaDB
15 slides
Oct 09, 2025
About This Presentation
ScyllaDB’s new tablet replication algorithm replaces static vNodes with dynamic, elastic data distribution that adapts to shifting workloads. This talk discusses how tablets enable fast, predictable full table scans by keeping operations shard-local, balancing load automatically, and scaling linearly through a simple layer of indirection.
Slide Content
A ScyllaDB Community
Fast and Deterministic Full Table Scans at Scale
Felipe Cardeneti Mendes
Technical Director
Felipe Cardeneti Mendes (he/him)
Technical Director at ScyllaDB
■Vibe-coded the simulator for this preso
■Predictability is important
■Previous speaker on Caching :-)
■Father of a 3yo toddler (puppy)
Full Scans
Wrong Way: Single Query
■Single coordinator
■Limited parallelism
■Doesn't exploit concurrency
SELECT * FROM tbl WHERE <...>
Coordinator work
vNodes Way: Non-deterministic
Which value to pick for :X and :Y?
■Tokens range from -2^63 to 2^63 - 1
●Traverse it → done.
■Large token ranges
●Fewer requests, more data per request
●Higher coordination
■Small token ranges
●More requests, less data per request
●Less coordination
SELECT * FROM tbl WHERE token(key) >= :X AND
token(key) < :Y
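A minimal Python sketch of the token-range approach, assuming the token space quoted above and a hypothetical table `tbl` with partition key `key`. It cuts the ring into `n` equal-width ranges, one per worker; raising `n` moves toward the "small token ranges" end of the trade-off.

```python
# Sketch only: equal-width split of the token ring into n worker ranges.
# `tbl`, `key`, and the worker count are hypothetical.
MIN_TOKEN = -(2**63)
MAX_TOKEN = 2**63 - 1


def split_token_ring(n: int) -> list[tuple[int, int]]:
    """Return n contiguous (start, end) pairs covering [MIN_TOKEN, MAX_TOKEN]."""
    if n < 1:
        raise ValueError("n must be at least 1")
    span = MAX_TOKEN - MIN_TOKEN + 1  # 2**64 tokens in total
    starts = [MIN_TOKEN + (span * i) // n for i in range(n)]
    # Each range ends where the next one starts; the last one ends at MAX_TOKEN.
    ends = starts[1:] + [MAX_TOKEN]
    return list(zip(starts, ends))


def range_query(table: str, start: int, end: int, last: bool) -> str:
    """Half-open [start, end), except the last range, which must include
    MAX_TOKEN because 2**63 does not fit in a CQL bigint."""
    op = "<=" if last else "<"
    return (f"SELECT * FROM {table} WHERE token(key) >= {start} "
            f"AND token(key) {op} {end}")


if __name__ == "__main__":
    ranges = split_token_ring(4)
    for i, (start, end) in enumerate(ranges):
        print(range_query("tbl", start, end, last=(i == len(ranges) - 1)))
```

Equal-width cuts say nothing about where the data actually lives, which is why picking :X and :Y remains a guess under vNodes.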
vNodes Way: system.size_estimates
■Per-table estimates for the vNode ranges a replica owns
■Non-deterministic
●Prone to estimation skew
●Range size is partitionsCount * meanPartitionSize
■Used by tools like Spark and Trino
●Manual splitSize tuning
■Divides each range into subranges of roughly splitSize
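The estimate-driven splitting step can be sketched as follows. This is not the Spark or Trino connector code: `SizeEstimate` and `subranges` are hypothetical names, `split_size` is assumed to be in bytes, and wrapping ranges are not handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SizeEstimate:
    """One size_estimates-style row (hypothetical model)."""
    range_start: int
    range_end: int
    partitions_count: int
    mean_partition_size: int  # bytes


def subranges(est: SizeEstimate, split_size: int) -> list[tuple[int, int]]:
    """Cut one estimated range into equal-width token subranges of ~split_size bytes."""
    if est.range_end <= est.range_start:
        raise ValueError("wrapping or empty ranges are not handled in this sketch")
    range_size = est.partitions_count * est.mean_partition_size  # estimated bytes
    k = max(1, -(-range_size // split_size))  # ceil(range_size / split_size)
    width = est.range_end - est.range_start
    # Equal-width cuts in token space only give equal-sized splits if the data
    # is spread evenly across the range, which an estimate cannot guarantee.
    bounds = [est.range_start + (width * i) // k for i in range(k)]
    bounds.append(est.range_end)
    return list(zip(bounds[:-1], bounds[1:]))
```

For example, a range estimated at 1,250 partitions × 2,048 B = 2,560,000 B with a 1 MiB `split_size` becomes three subranges; if the estimate is off, so is every split derived from it.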
Token Ranges
Estimates aren't enough
■Sparse tables waste work
●Most ranges contain little or no data
> SELECT (...) FROM size_estimates WHERE keyspace_name='k' AND table_name='t'
table_name | range_start | range_end | mean_partition_size | partitions_count
-----------+----------------------+----------------------+---------------------+------------------
t | -1078580004477237357 | -966880618140703446 | 2048 | 1250
t | -1085604003861837200 | -1078580004477237357 | 32768 | 156
t | -1117853426191590572 | -1085604003861837200 | 1024 | 3200
t | -1144611754287717220 | -1117853426191590572 | 1048576 | 1000000
t | -1198247555952586603 | -1144611754287717220 | 128 | 10
(...)
■High-density tables require fine-tuning
●Prone to uneven load and contention
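Applying the range-size formula (partitionsCount * meanPartitionSize) to two of the sample rows above shows how far apart ranges can be; the 64 MiB (2^26 B) splitSize is an arbitrary, hypothetical choice:

```latex
\begin{aligned}
\text{rangeSize}_{\text{dense}}  &= 1\,000\,000 \times 1\,048\,576\ \text{B} = 1\,048\,576\,000\,000\ \text{B} \approx 1.05\ \text{TB} \\
\text{rangeSize}_{\text{sparse}} &= 10 \times 128\ \text{B} = 1\,280\ \text{B} \\
\left\lceil \frac{1\,048\,576\,000\,000}{2^{26}} \right\rceil &= 15\,625\ \text{subranges (dense)}, \qquad
\left\lceil \frac{1\,280}{2^{26}} \right\rceil = 1\ \text{subrange (sparse)}
\end{aligned}
```

The sparse range is also roughly twice as wide in token space (53,635,801,664,869,383 vs. 26,758,328,096,126,648 tokens), so a split by token width alone would give it about twice the requests while it holds almost no data.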
Tablets
Recap
■The single unit of replication in ScyllaDB
■Abstraction: smaller table "fragments"
■Each spans a contiguous token range
■Dynamically shrink/expand (geometric average size)
system.tablets – A layer of indirection
■Tablets table
○Each (table, tablet) has its own token range → (node, shard) mapping
○Mapping can change independently of node addition and removal
■A new node can be added without any owned data!
○Different tables can have different tablet counts
○Tablet counts change so that tablet size on disk stays relatively constant
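One way to picture this indirection is a per-table map from token to (node, shard), looked up by binary search. This is a hypothetical in-memory model, not ScyllaDB's implementation or the real system.tablets schema: it assumes tablets are keyed by their last token, that tablet i owns (last_token[i-1], last_token[i]], and that a replica is a (host_id, shard) pair.

```python
from bisect import bisect_left
from dataclasses import dataclass

# Hypothetical shape: a replica is a (host_id, shard) pair.
Replica = tuple[str, int]


@dataclass(frozen=True)
class Tablet:
    last_token: int                # inclusive upper bound of this tablet's range
    replicas: tuple[Replica, ...]  # (host_id, shard) pairs holding a copy


class TabletMap:
    """Per-table token -> replicas lookup over tablets sorted by last_token."""

    def __init__(self, tablets: list[Tablet]):
        self.tablets = sorted(tablets, key=lambda t: t.last_token)
        self.last_tokens = [t.last_token for t in self.tablets]

    def replicas_for(self, token: int) -> tuple[Replica, ...]:
        # The first tablet whose last_token >= token owns the token.
        i = bisect_left(self.last_tokens, token)
        if i == len(self.tablets):
            raise ValueError(f"token {token} lies past the last tablet")
        return self.tablets[i].replicas

    def move(self, index: int, new_replicas: tuple[Replica, ...]) -> None:
        # Re-pointing one tablet changes placement without touching token bounds.
        t = self.tablets[index]
        self.tablets[index] = Tablet(t.last_token, new_replicas)
```

In this model, moving a tablet to a newly added node only rewrites one entry's replicas, so the node can join owning nothing and receive tablets one at a time.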
system.tablets (diagram: Query → Token → Replica Set)
> SELECT … FROM system.tablets WHERE …;
Revisited (and Simplified) Logic
■Clients read system.tablets to retrieve the current tablet mapping
■Tablets owned by the same replica shards are grouped and split together
■Workers set a routingKey on each request so it goes to a replica shard that owns the range
SELECT * FROM tbl WHERE token(key) >= :X AND
token(key) < :Y
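The revisited logic can be sketched in Python under the same kind of hypothetical model: tablets as (last_token, replicas) pairs sorted by last_token, each assumed to own (previous last_token, last_token]. Because of that assumed boundary convention, the generated queries use `>` and `<=` rather than the `>= :X AND < :Y` form above. The `routing_token` field is a stand-in; how a real driver accepts a routing key or token depends on the driver.

```python
from collections import defaultdict

MIN_TOKEN = -(2**63)

# Hypothetical shape: ((host_id, shard), ...), hashable so it can key a dict.
Replicas = tuple[tuple[str, int], ...]


def group_tablets(tablets: list[tuple[int, Replicas]]) -> dict[Replicas, list[tuple[int, int]]]:
    """Group tablet token ranges by the replica shards that own them."""
    groups: dict[Replicas, list[tuple[int, int]]] = defaultdict(list)
    prev = None
    for last, replicas in tablets:
        start = MIN_TOKEN if prev is None else prev
        groups[replicas].append((start, last))
        prev = last
    return dict(groups)


def range_query(start: int, end: int) -> str:
    # The first tablet includes MIN_TOKEN; every other range is (start, end].
    lower = ">=" if start == MIN_TOKEN else ">"
    return f"SELECT * FROM tbl WHERE token(key) {lower} {start} AND token(key) <= {end}"


def plan_scan(tablets: list[tuple[int, Replicas]], workers_per_group: int):
    """Deal each group's ranges round-robin to its workers. Each request carries
    its range's end token, which falls inside the tablet and so maps to its owners."""
    plan = {}
    for replicas, ranges in group_tablets(tablets).items():
        chunks = [ranges[i::workers_per_group] for i in range(workers_per_group)]
        plan[replicas] = [
            [{"query": range_query(s, e), "routing_token": e} for s, e in chunk]
            for chunk in chunks
            if chunk
        ]
    return plan
```

Grouping by replica set means every worker in a group talks to the same shards, so per-shard load follows directly from how many tablets each shard holds rather than from size estimates.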
Summary
■Vibe-coded simulator: github.com/fee-mendes/tablet-fullscans
■Assumes homogeneous topologies
●Mixed-shard clusters left as an exercise :)
■Mimics jitter (sleeps)
■Relatively constant scan time
Thank you! Let’s connect.
Felipe Mendes
@felipemendes.dev
scylladb.com