CTO Avi Kivity shares how scalability is core to ScyllaDB's architecture.
Added: Mar 05, 2025
Slides: 33 pages
A ScyllaDB Community
Architecture for Extreme Scale:
Scalability is a Feature
Avi Kivity
CTO
Avi Kivity
■Linux KVM
■First line written
■Seastar
■First line written
■ScyllaDB
■First line written
Presentation Agenda
■What is Scalability?
■Case Studies
What is Scalability?
Scalability - definition
■The property of a program whose throughput is linear with the amount of
computing resources dedicated to it
■Scalability does not imply fast single-node performance, only that as you
increase resources, so does performance
Practical scalability
■The property of a program whose throughput is near-linear with the
amount of computing resources dedicated to it, up to a reasonable limit,
and for targeted workloads
■Scalability does not imply fast single-node performance, only that as you
increase resources, so does performance
■But a practical system cannot sacrifice low-node-count performance, or it
becomes impractical. A practical system must deliver good performance in a
three-node cluster.
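The near-linear definition can be made concrete as a scaling-efficiency ratio. The throughput figures below are made up for illustration (not ScyllaDB benchmarks); they show scaling that is near-linear at small node counts and tapers at high ones:

```python
# Hypothetical throughput measurements (ops/s) by node count.
measured = {1: 100_000, 3: 295_000, 10: 960_000, 100: 8_500_000}

def scaling_efficiency(nodes: int, throughput: float, single_node: float) -> float:
    """Fraction of ideal linear throughput actually achieved."""
    return throughput / (nodes * single_node)

for n, t in measured.items():
    print(n, round(scaling_efficiency(n, t, measured[1]), 2))
```

An efficiency near 1.0 at three nodes is exactly the "practical" requirement above: the system must not trade away low-node-count performance to reach high scale.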
Horizontal and Vertical Scalability
■Horizontal: Scale with the number of nodes in the system
■Vertical: Scale with the size (compute power) of individual nodes
■We need both!
Why Horizontal
■Eventually any node size is limited
■Needed for geographical distribution
■Needed for availability and fault tolerance
Why Vertical
■With small nodes, overhead quickly dominates
■1000 nodes with 3-year MTBF -> one failure per day
■Upgrade hell
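The failure-rate bullet is simple arithmetic: assuming independent failures, the expected number of node failures per day is the node count divided by the MTBF in days:

```python
def cluster_failures_per_day(nodes: int, mtbf_years: float) -> float:
    """Expected node failures per day across the cluster,
    assuming independent failures: nodes / MTBF-in-days."""
    return nodes / (mtbf_years * 365)

print(cluster_failures_per_day(1000, 3))  # ~0.91: roughly one failure per day
```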
Why both?
■Work for horizontal scaling benefits vertical
scaling (thread-per-core)
■Reasonable cluster sizes reduce maintenance
■Benefit from geographical distribution
■100x 120 TB = 12 PB
■Should be enough for anybody
ScyllaDB Goals
■Fast algorithms and implementation
■Good scalability
■Practical tradeoffs
Fun fact
■The name “Scylla” was chosen to evoke “Scale”
■Then we learned how it’s pronounced
Practical scalability goals
■100 nodes/datacenter
■100 nodes, 60 TB/node -> 6PB/datacenter
■Thousand-node clusters with 1 TB/node are eschewed
■Terrible for maintenance
■256 vcpu / TB-RAM class machines
■Grow with hardware evolution
Components of Performance
■Good algorithms and data structures
■Scalability across the cluster
■Scalability within a node
■Efficient implementation
Seastar and thread-per-core
■Clustered application already solves compute and data distribution
across nodes
■Re-apply the solution towards compute and data distribution within
cores
■Sacrifices some efficiency for high utilization
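The "re-apply the solution" idea can be sketched as hash partitioning applied twice: once to pick an owning node, then again to pick a core (shard) within that node. The hashing below is purely illustrative — ScyllaDB actually uses a token ring and its own shard-selection algorithm, not this function:

```python
import hashlib

def route(key: bytes, n_nodes: int, shards_per_node: int) -> tuple[int, int]:
    """Pick an owning (node, shard) for a key: the same partitioning
    idea is applied across nodes and then across cores within a node."""
    h = int.from_bytes(hashlib.blake2b(key, digest_size=8).digest(), "big")
    return h % n_nodes, (h // n_nodes) % shards_per_node

node, shard = route(b"user:42", n_nodes=100, shards_per_node=64)
print(node, shard)  # deterministic: the same key always lands on the same core
```

Because each core owns a disjoint slice of the data, it can run share-nothing, without locks — the thread-per-core design that makes vertical scaling work.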
Memtable/cache sizing
■How much memory to allocate to memtables?
■How much memory to allocate to cache?
■How much memory to leave free?
Log-Structured Merge Tree Tiers
■Write amplification is O(log N)
■But what is N?
■N ~ (size of largest sstable) / (size of smallest sstable)
■Size of largest sstable: ~ disk size
■Size of smallest sstable: ~ memtable size
■To reduce write amplification, we must have the largest memtable size
we can
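Under the stated assumptions (each tier a fixed `fanout` times larger than the one below, smallest sstable ~ memtable size, largest ~ disk size), the number of tiers — and with it write amplification — falls as the memtable grows. A minimal sketch with illustrative sizes:

```python
import math

def lsm_levels(disk_size: int, memtable_size: int, fanout: int = 10) -> int:
    """Approximate LSM tier count: tiers grow by `fanout` from
    memtable-sized sstables up to disk-sized ones. Write amplification
    grows with the tier count, so a larger memtable means fewer tiers."""
    return math.ceil(math.log(disk_size / memtable_size, fanout))

# 60 TB of disk: 1 GB memtable vs 32 GB memtable (illustrative numbers)
print(lsm_levels(60 * 2**40, 1 * 2**30))   # more tiers
print(lsm_levels(60 * 2**40, 32 * 2**30))  # fewer tiers
```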
Memtable size allocation
■Allocate half of the machine’s memory to memtables
■The other half for cache and “incidentals”
Why half and half?
■Sensitivity analysis
■At 50%/50%, a 5% change has the same impact on both sides
■At 90%/10%, a 5% change has a drastic impact on the smaller side
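The sensitivity argument in numbers (fractions of total memory; the 5% shift is the slide's figure):

```python
def relative_impact(memtable_frac: float, shift: float = 0.05):
    """Relative size change on each side of the memtable/cache split
    when `shift` of total memory moves across it."""
    cache_frac = 1 - memtable_frac
    return shift / memtable_frac, shift / cache_frac

print(relative_impact(0.50))  # both sides swing by ~10%
print(relative_impact(0.90))  # the 10% side swings by ~50%
```

At 50/50 the split is at the point of minimum sensitivity: no side can be hit disproportionately hard by a misestimate.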
Global index implementation choices
Solution review
■Cassandra secondary index
■ScyllaDB Global Index
■Cassandra Storage Attached Index (SAI/SASI)
Cassandra Secondary Index
■Each node indexes its own data
■Index is stored in an index table
■Invalidated/rebuilt/cleaned on node
bootstrap
■Work scales with number of nodes
ScyllaDB Global Index
■Use materialized view to store index
■Impervious to node bootstrap
■Additional hops to access data
■Work does not scale with cluster size
■Low cardinality indexed columns cause
large/hot partitions
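The "additional hops" bullet can be sketched with plain dictionaries standing in for the index view and the base table (an illustrative structure, not ScyllaDB's API): the first hop resolves indexed value to base-table keys, the second fetches the rows.

```python
# hop 1 source: indexed value -> base-table partition keys
index_view = {"color=red": ["pk1", "pk7"]}
# hop 2 source: the base table itself
base_table = {"pk1": {"color": "red", "n": 1},
              "pk7": {"color": "red", "n": 7}}

def query_by_index(value: str):
    keys = index_view.get(value, [])        # hop 1: read the index (view)
    return [base_table[k] for k in keys]    # hop 2: read the base table

print(query_by_index("color=red"))
```

Note how a low-cardinality value (many keys behind one index entry) turns the hop-1 partition into the large, hot partition the slide warns about.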
Data Index
Cassandra SAI/SASI
■Each sstable indexes its own data
■Supports ranges and complex queries
in addition to simple matches
■Index rebuilt during compaction
■Much more memory/CPU efficient
■Work scales with number of sstables
in the cluster
Indexing comparison
Index type              | Antiscaling with | Side effects       | Additional benefits
Local index (Cassandra) | Nodes            | Rebuild/cleanup    |
Local index (ScyllaDB)  | Shards           | Rebuild/cleanup    |
Global index            | -                | Materialized views |
SAI/SASI                | sstables         | None               | Range queries/free text
Vector Indexes
■No partition key can be extracted from the vector
■HNSW computational complexity is O(log N)
■Partitioned index ->
■Antiscaling!
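Why a partitioned vector index antiscales: since no partition key can be derived from the query vector, every partition must be searched, so total work per query is P · O(log(N/P)), which grows with the partition count P. A sketch with an arbitrary constant factor:

```python
import math

def total_search_work(total_vectors: int, partitions: int, c: float = 1.0) -> float:
    """Total work for one ANN query when an O(log n) index (e.g. HNSW)
    is split into `partitions` that must ALL be searched."""
    per_partition = c * math.log2(total_vectors / partitions)
    return partitions * per_partition

n = 10**9
for p in (1, 10, 100):
    print(p, total_search_work(n, p))  # work increases with partitions
```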
Vector Index - Solution
■Vertical scaling - use few large nodes
■Prefer increasing the node size to adding nodes
■Prefer duplicating data to partitioning
■Asymmetric architecture
■Bonus: use GPU/TPU/FPGA
Indexing comparison (with vectors)
Index type              | Antiscaling with | Side effects                             | Additional benefits
Local index (Cassandra) | Nodes            | Rebuild/cleanup                          |
Local index (ScyllaDB)  | Shards           | Rebuild/cleanup                          |
Global index            | -                | Materialized views                       |
SAI/SASI                | sstables         | None                                     | Range queries/free text
Asymmetric Indexing     | nodes            | Increased coordination and orchestration | GPU/TPU/FPGA
Change data capture streams
■Similar to indexes (and materialized views)
■A copy of the data, but with a different key
■CDC = time keyed
■How many streams?
■Streams require user effort
■Keep track of last-consumed entry
How many streams?
■One stream?
■A node becomes a hotspot
■Hard to synchronize
■One stream per node?
■A shard within a node becomes a hotspot
■One stream per shard!
One stream per shard
■Number of shards in the cluster can change
■Need to keep pace and close/start new streams
■Need meta-stream to update about those changes
■Scalability considerations generate most of the work for this feature
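A sketch of the consumer side: a meta-stream announces "generations" of the stream set, and the consumer tracks a last-consumed offset per stream, so nothing is lost when the shard count changes and streams close or open. The data layout here is illustrative, not ScyllaDB's actual CDC API:

```python
# Each meta-stream entry ("generation") lists the streams that exist,
# one per shard; when the cluster grows, a new generation appears.
meta_stream = [
    {"generation": 1, "streams": ["s1", "s2"]},        # two shards
    {"generation": 2, "streams": ["s1", "s2", "s3"]},  # cluster grew
]

def consume(meta, read_stream, offsets):
    """Drain every stream of every generation, resuming each one from
    its last-consumed offset."""
    for gen in meta:
        for s in gen["streams"]:
            for entry in read_stream(s, offsets.get(s, 0)):
                offsets[s] = offsets.get(s, 0) + 1
                yield entry

# Toy stream contents keyed by stream id:
data = {"s1": ["a", "b"], "s2": ["c"], "s3": ["d"]}
print(list(consume(meta_stream, lambda s, off: data[s][off:], {})))
# -> ['a', 'b', 'c', 'd']
```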
Conclusions
■Scalability considerations permeate everything we do
■May sacrifice some efficiency to avoid breaking down at high scale
■But must also be competitive at low scale
Stay in Touch
Avi Kivity [email protected]
@AviKivity
@avikivity
Who uses it?