Cassandra vs. ScyllaDB: Evolutionary Differences

ScyllaDB, 55 slides, Oct 17, 2025

About This Presentation

Comparing design decisions, performance, and use case fit

Apache Cassandra and ScyllaDB are distributed databases designed to handle massive globally-distributed workloads. They were both created with a highly scalable architecture and both use the same CQL query language. But their priorities and ...


Slide Content

Felipe Mendes, Technical Director at ScyllaDB
Guilherme Nogueira, Technical Director at ScyllaDB

Cassandra vs. ScyllaDB
Evolutionary Differences

Introductions
Felipe Mendes, Technical Director at ScyllaDB
+Published Author on Linux and Databases
+Helps teams solve their most challenging problems
+Years of experience with Linux and distributed systems

Guilherme Nogueira, Technical Director at ScyllaDB
+Previously Solutions Architect
+Publishing
+Streaming
+Automotive

Poll
How experienced are you with ScyllaDB/Cassandra?

Similarities and Differences

ScyllaDB & Cassandra – Similarities
+Distributed
+Peer-to-Peer
+Automatic Sharding
+Global Replication
+Cassandra Query Language
+Wide-Column
+Compatible Ecosystem
+Anti-Entropy
+LSM Engine

Beyond Cassandra
+Tablets
+Raft
+Workload Prioritization
+Repair-based Operations
+Incremental Compaction (SSTables, low amplification)
+Row-based Cache
+DynamoDB Compatibility
+Concurrency and Rate-limiters
Intriguing ScyllaDB Capabilities You Might Have Overlooked

Tablets
[Figure: tablets A, B, and C]
+Abstraction: Smaller table "fragments"
+Span a contiguous token range
+Dynamically shrink/expand (geometric avg size)
+Migrated as a single unit
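
The bullets above describe tablets as contiguous token-range fragments that grow and shrink. A toy model of the splitting side might look like the sketch below. It is not ScyllaDB's implementation: the `Tablet` class, the `target_size` threshold, and the "split at the midpoint when larger than twice the target" rule are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Tablet:
    first_token: int  # inclusive start of the contiguous token range
    last_token: int   # inclusive end of the contiguous token range
    size_bytes: int   # data currently stored in this range


def split(tablet: Tablet) -> list[Tablet]:
    """Split one tablet into two halves of its token range (toy rule)."""
    mid = (tablet.first_token + tablet.last_token) // 2
    half = tablet.size_bytes // 2
    return [
        Tablet(tablet.first_token, mid, half),
        Tablet(mid + 1, tablet.last_token, tablet.size_bytes - half),
    ]


def rebalance(tablets: list[Tablet], target_size: int) -> list[Tablet]:
    """Single pass: split every tablet larger than 2x the assumed target size."""
    result: list[Tablet] = []
    for t in tablets:
        if t.size_bytes > 2 * target_size and t.last_token > t.first_token:
            result.extend(split(t))
        else:
            result.append(t)
    return result
```

Because each resulting tablet still covers one contiguous range, it can be moved between nodes as a single unit, which is the property the slide highlights.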

RAFT for metadata
■Strongly consistent
■Fault tolerant storage
[Figure: system.token_metadata shared by nodes A, B, and C; labels: bootstrap, read barrier]

Workload Prioritization
[Figure: two panels, "No prioritization" and "Workload Prioritization"]

Repair? Tombstones? Data Resurrection?
+Worst things a database can do:
+Lose data
+Corrupt data
+Resurrect data
+Not a problem with ScyllaDB
+We take your data seriously
+We know repair is painful
Faster, Safer Node Operations with Repair vs Streaming


Incremental Compaction
[Figure: SSTable runs A…Z and a…z compacted fragment by fragment into A+a, B+b, …]
+We observed problems with legacy compaction strategies:
+STCS has high space amplification (and low write amplification)
+LCS has high write amplification (and low space amplification)
+We wanted to benefit from both approaches
+By borrowing SSTable Runs from LCS
+And applying them over size-tiers
+Merely replacing increasingly larger SSTables with increasingly longer SSTable Runs
Designing Access Methods: The RUM Conjecture
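
To see why swapping large SSTables for runs of fragments helps with space amplification, a rough back-of-the-envelope model can be useful. The sketch below is a simplification under stated assumptions, not ScyllaDB's accounting: it assumes no data is overwritten or expired, that whole-SSTable compaction keeps all inputs until the output is complete, and that fragment-based compaction can release each exhausted input fragment as it goes. The function names and the 1 GB default `fragment_gb` are illustrative.

```python
def peak_temp_space_whole(sstable_sizes_gb):
    """Whole-SSTable compaction: inputs are deleted only after the output
    is fully written, so the output can temporarily duplicate all inputs."""
    return sum(sstable_sizes_gb)


def peak_temp_space_fragmented(sstable_sizes_gb, fragment_gb=1.0):
    """Fragment-based compaction (simplified): at most one partially
    consumed fragment per input run, plus the output fragment being written."""
    return fragment_gb * (len(sstable_sizes_gb) + 1)
```

Under these assumptions, compacting four 100 GB inputs needs up to 400 GB of temporary space in the first model and about 5 GB in the second.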

+ScyllaDB has a fast cache
+Efficient access & maintenance
+Thanks to collocation with replica and design
+Takes care of consistency guarantees
+Handles complexities of data and query model
Row-based Cache
[Figure: read path across RAM (memtable, cache) and disk (sstables)]
We Compared ScyllaDB and Memcached and… We Lost?

+Run DynamoDB-compatible workloads anywhere:
+on AWS
+on Google Cloud, Azure, or
+On-prem
+DynamoDB Streams, Global Tables
+Supports Load Balancing
+ScyllaDB Spark Migrator to move data anywhere
DynamoDB-compatible API (Alternator)
+Cassandra has no comparable feature

Per-Partition Rate-Limiting
Retaining Goodput with Query Rate Limiting
■Malicious/misbehaving users
■Parts of your system going awry due to bugs

The system does not have to satisfy these requests, and they should not affect the whole system too much.
■A maximum per-partition read/write rate can be set for a table.
■ScyllaDB will reject some operations in an effort to keep the rate of successful requests under the limit.


ALTER TABLE ks.tbl
WITH per_partition_rate_limit = {
'max_writes_per_second': 100,
'max_reads_per_second': 200
};
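
The effect of such a limit can be illustrated with a small client-side model. This is only a sketch of the idea (count operations per partition and reject the excess); it is not how ScyllaDB enforces the limit internally. The `PerPartitionLimiter` class and its fixed one-second window are assumptions for illustration.

```python
from collections import defaultdict


class PerPartitionLimiter:
    """Toy fixed-window limiter: at most max_ops_per_second per partition."""

    def __init__(self, max_ops_per_second: int):
        self.max_ops = max_ops_per_second
        # partition key -> (second of the current window, ops seen in it)
        self.window = defaultdict(lambda: (0, 0))

    def allow(self, partition_key: str, now: float) -> bool:
        second = int(now)
        last_second, count = self.window[partition_key]
        if second != last_second:
            count = 0  # a new one-second window starts for this partition
        if count >= self.max_ops:
            self.window[partition_key] = (second, count)
            return False  # over the limit: reject this operation
        self.window[partition_key] = (second, count + 1)
        return True
```

A hot partition is throttled while other partitions in the same table keep being served, which is how a misbehaving client is kept from affecting the whole system.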

Poll
How large are your clusters?

Comparing Performance

Setup
+DB Nodes
+3x AWS i4i.4xlarge (Cassandra 5.0.2, ScyllaDB 2024.2)
+16vCPU, 128GB RAM per node
+1.5TB used (~45%)
+Schema: Blob(key<10>, c0<200>, c1<200>, c2<200>, c3<200>, c4<200>)
+RF=3
+LOCAL_QUORUM
+Loader
+AWS c6in.8xlarge – Rust Latte
+Implied scheduling – see (pkolaczk/latte#120)
+Cassandra configuration: cassandra_latest.yaml

Where do we start?
+Multiple iterations/settings
+Pick the best and carry out remaining tests

Thread Pools
+Quite a pain to fine-tune
+Single funnel
+Writes and Reads won't scale independently

Cache Workload
+Key cache: 2G
+Row cache: 51G
+Bummer: To use or not? :-(

Hot/Cold Overwrites
+Hot set: 40M rows
+Cold set: Remainder
+64% hot reads, 16% hot writes – 16% cold reads, 4% cold writes
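
The operation mix above can be expressed as a small weighted sampler. This is a sketch rather than the actual Latte workload used in the tests: uniform key selection within each set and the `total_rows` parameter are assumptions, since the deck does not state the total row count.

```python
import random

# Percentages taken from the slide; they add up to 100.
MIX = {"hot_read": 64, "hot_write": 16, "cold_read": 16, "cold_write": 4}
HOT_ROWS = 40_000_000  # size of the hot set, from the slide


def next_operation(rng: random.Random, total_rows: int):
    """Pick an operation kind by weight, then a key from the matching set."""
    kind = rng.choices(list(MIX), weights=list(MIX.values()))[0]
    if kind.startswith("hot"):
        key = rng.randrange(HOT_ROWS)              # keys 0 .. HOT_ROWS-1
    else:
        key = rng.randrange(HOT_ROWS, total_rows)  # the remainder is cold
    return kind, key
```

Read this way, 80% of operations hit the hot set and 80% of operations are reads.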

Scaling

+ScyllaDB – Decouples topology changes from streaming
+Add nodes with time ~ 0
+Streaming happens in parallel
+Load gradually shifts, via tablet-aware drivers
+Cassandra – topology changes rely on streaming
+You add/remove a single node, and wait
+Then another, and wait…
+Time grows incrementally
Differences
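
The difference between adding nodes one at a time and adding them in parallel can be put into a simple idealized model. The per-node durations are hypothetical inputs, and the parallel case assumes streams do not contend with each other, which real clusters will not fully achieve.

```python
def sequential_scale_out_hours(per_node_hours):
    """One node joins at a time; each join waits for its streaming to finish."""
    return sum(per_node_hours)


def parallel_scale_out_hours(per_node_hours):
    """All nodes join at once and stream concurrently (idealized, no contention)."""
    return max(per_node_hours, default=0)
```

With three nodes that each need two hours of streaming (made-up figures), the sequential model gives six hours and the parallel model gives two.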

+Cassandra 4.0 docs say:
+"To run any Zero Copy streaming benchmark the stream_throughput_outbound_megabits_per_sec must be set to a really high value"
+Cassandra 5 docs say:
+"To run any Zero Copy streaming benchmark the stream_throughput_outbound must be set to a really high value"
+Instaclustr says:
+entire_sstable_stream_throughput_outbound!
+Hint: Instaclustr is correct.
What Happened?

... and then you’ve got to Cleanup
+Not needed for ScyllaDB
+Boom!

[Chart: Bootstrap and Cleanup times]

Oh! By the way...
+Our Cassandra cluster got inconsistent :-(
+How to benchmark this?
+Fixed after a rolling restart
+Quite annoying

Demo time

●Starting from 3 x i4i.4xlarge
○2TB pre-replication dataset, RF=3
○~56K ops/s for Cassandra
○~200K ops/s for ScyllaDB
●Scaling to:
○ScyllaDB: + 3 x i4i.32xlarge
○Cassandra: + 69 x i4i.4xlarge
Scaling to tackle 2M ops/s

[Chart: Bootstrap and Cleanup times]
Scaling to tackle 2M ops/s
[Chart: Bootstrap and Cleanup times]
26x faster

[Chart: Bootstrap and Cleanup times]
Cassandra scaling time
< 300GB transferred, becomes linear

Cassandra node join process

ScyllaDB Scaling
●Process starts instantly and joins the cluster
●Load balancer continuously distributes tablets and load
●Client drivers are notified and route requests according to tablet movement

Costs

+Throughput
+Spiky and bounded – Batch, ETL
+ScyllaDB offers unparalleled throughput
+Latency sensitive
+Focus of our testing – Real-time and unpredictable
+ScyllaDB reacts faster to opportunities
+Storage dense – Tablets + Advanced Compression allow for up to 90% disk utilization
+Dictionary-based compression
+Data governance / Retention requirements
+ScyllaDB maximizes both disks and cache
Different savings for different workloads

Run df on your Cassandra nodes for a SURPRISE

Observability

Cassandra - DIY, Community or 3rd Party
+No centralized offer for monitoring
+Community versions available (Metrics Collector for Apache Cassandra)
+Lacks 5.0 support
+Each team comes up with their own set of dashboards and alerts
+JMX complexity
+Newer versions expose some metrics in a Prometheus friendly format bypassing JMX
+Still largely needed for Java metrics, maintenance operations
+Monitoring-as-a-Service varies in coverage and detail
+Datadog, AxonOps, Dynatrace, New Relic

ScyllaDB - Out of the box monitoring
+Easy, out of the box with Scylla Monitoring stack
+Prometheus + collection rules and alerts
+Loki + alerts
+Alert Manager + alerting rules
+Grafana + powerful dashboards
+New release = new features = new dashboards
+Stay up-to-date for the latest and greatest

Poll
How do you monitor your clusters?
Which observability tools do you use, whether custom-built or 3rd party?

Operational Simplicity

Cassandra setup and tuning
+sysctl
+ulimit | hugepages | kernel params
+disks
+scheduler | read_ahead_kb | non-rotational
+JVM ☕
+Java version | GC tuning | heap size
+cassandra.yaml
+main configuration file

ScyllaDB setup and tuning
+sysctl ✅ automatic
+scylla_setup
+memory, scheduling, network
+disks parameters ✅ automatic
+iotune, part of scylla_setup
+best concurrency settings to maximize disk utilization
+hardware interrupts handling ✅ automatic
+dedicated vCPUs to handle hardware interrupts via irqbalance
+allows shards to run without interference
+jvm ✅ absent
+scylla.yaml ✅ simple changes
+usually just customized for enabling features (Alternator, encryption settings)

Repairs, backups
ScyllaDB: ScyllaDB Manager
+Backup
+Restore
+Repair

Cassandra:
+Backup/restore: Medusa / K8ssandra
+Repairs: Reaper

Vector functionality

Vector
●Both implement VECTOR<float, dim> type
●Approximate Nearest Neighbour (ANN) queries
●Similar CQL syntax: SELECT … ORDER BY col ANN OF vector LIMIT K
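
For readers new to ANN queries, it may help to see the exact computation they approximate. The sketch below is a brute-force top-K search over in-memory vectors; it is not how either database executes the query, and the choice of cosine similarity and the `top_k` helper are assumptions for illustration only.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(rows, query, k):
    """Exact nearest neighbours: score every row, keep the k most similar.

    rows maps a row id to its vector; ANN indexes trade some accuracy
    to avoid this full scan.
    """
    ranked = sorted(
        rows.items(),
        key=lambda item: cosine_similarity(item[1], query),
        reverse=True,
    )
    return [row_id for row_id, _ in ranked[:k]]
```

An ANN index returns a result close to this ranking without scoring every row, which is why index design and data placement matter for the two implementations compared next.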

However, upon a closer look…

Vector - Cassandra implementation
●Implemented using Storage-Attached Index (SAI) and JVector
○Shared SSTable and compaction lifecycles
●Built on Indexes
○Susceptible to same issues
○Data locality, large partitions
●Shared data paths (storage, chunk cache)

Vector - ScyllaDB Cloud implementation
●External service
●In-memory data
●Rust-based service
●Leverages USearch library
●Fully-managed in the Cloud

Vector - ScyllaDB Cloud implementation

And back to our demo

Wrap Up

+Benchmarks are complicated
+Be wary of sustained latencies on Apache Cassandra
+Measure sustained response times
+Our testing has limitations; it is impossible to test everything

+ScyllaDB outperforms Apache Cassandra 5.0 in every aspect we tested
+Performance, Scaling, Costs
+Admin (Tip: Check out what the process to upgrade to C*5 looks like ;-)

+Plus Workload Prioritization, Alternator, frictionless monitoring, no GC, …
+Both databases evolved on their own paths
+ScyllaDB focused on maintaining high performance, scalability and vector features - all the while lowering costs
+Cassandra is built for commodity hardware, aiming to be a general-purpose NoSQL database for use cases with broader latency tolerance
Summary

Guilherme Nogueira
Keep Learning
Fast Scaling. Max Efficiency. Lower Cost.
8 AM PT - 10 AM PT | 15:00 GMT - 17:00 GMT
ScyllaDB X Cloud
ScyllaDB University Live
LIVE LEARNING
November 12
ONLINE | OCT 22 + 23, 2025
All Things Performance
p99conf.io
scylladb.com/events

Thank you for joining us today.
@scylladb | scylladb/ | slack.scylladb.com | @scylladb | company/scylladb/ | scylladb/