Comparing design decisions, performance, and use case fit
Apache Cassandra and ScyllaDB are distributed databases designed to handle massive globally-distributed workloads. They were both created with a highly scalable architecture and both use the same CQL query language. But their priorities and design decisions have diverged quite a bit over the past decade.
From the start, there have been notable differences in fundamentals such as sharding, caching, and indexing. The differences have become more significant with the most recent releases. For example, Cassandra 5 introduced Storage‑Attached Indexes, Trie‑based Memtables and SSTables, and the Unified Compaction Strategy. Meanwhile, ScyllaDB added tablet-based elasticity (vs. vNodes), Raft, support for up to 90% storage utilization, and ZSTD dictionary-based compression for both network and storage. The two are also now taking distinctly different approaches to vector search.
Join this webinar for a side-by-side analysis of the key differences and what they mean for teams working with high-performance databases. We’ll cover:
- A look inside each database’s architecture and internal operations
- How performance, features, and deployment options compare
- What’s unique in each, and which use cases and technical requirements benefit most from each database’s differentiators
Added: Oct 17, 2025
Slides: 55 pages
Slide Content
Felipe Mendes, Technical Director at ScyllaDB
Guilherme Nogueira, Technical Director at ScyllaDB
Cassandra vs. ScyllaDB
Evolutionary Differences
Introductions
Felipe Mendes, Technical Director at ScyllaDB
+Published Author on Linux and Databases
+Helps teams solve their most challenging problems
+Years of experience with Linux and distributed systems
Guilherme Nogueira, Technical Director at ScyllaDB
+Previously Solutions Architect
+Publishing
+Streaming
+Automotive
Poll
How experienced are you with ScyllaDB/Cassandra?
Similarities and Differences
ScyllaDB & Cassandra – Similarities
+Distributed Peer-to-Peer
+Automatic Sharding
+Global Replication
+Cassandra Query Language
+Wide-Column
+Compatible Ecosystem
+Anti-Entropy
+LSM Engine
Beyond Cassandra
+Tablets
+Raft
+Workload Prioritization
+Repair-based Operations
+Incremental Compaction
+Low Amplification
+Row-based Cache
+DynamoDB Compatibility
+Concurrency and Rate-limiters
Intriguing ScyllaDB Capabilities You Might Have Overlooked
Tablets
+Abstraction: Smaller table "fragments"
+Span a contiguous token range
+Dynamically shrink/expand (geometric avg size)
+Migrated as a single unit
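The tablet properties above can be illustrated with a small model. This is only a sketch under stated assumptions, not ScyllaDB's implementation: the `Tablet` class, the `split`, `split_oversized`, and `migrate` helpers, the 64-bit token bounds, and the "split when larger than twice the target size" policy are all hypothetical. It shows splitting only; merging small tablets would be the reverse operation.

```python
from dataclasses import dataclass, replace

MIN_TOKEN = -2**63       # assumed bounds of a 64-bit token space
MAX_TOKEN = 2**63 - 1
GIB = 2**30

@dataclass(frozen=True)
class Tablet:
    """Hypothetical model: a table fragment owning tokens in (first, last]."""
    first: int
    last: int
    size_bytes: int
    node: str

def split(tablet):
    """Split one tablet into two that still cover the same contiguous range."""
    mid = (tablet.first + tablet.last) // 2
    half = tablet.size_bytes // 2
    return [
        replace(tablet, last=mid, size_bytes=half),
        replace(tablet, first=mid, size_bytes=tablet.size_bytes - half),
    ]

def split_oversized(tablets, target_bytes):
    """Split every tablet larger than twice the target size (assumed policy)."""
    result = []
    for tablet in tablets:
        if tablet.size_bytes > 2 * target_bytes:
            result.extend(split(tablet))
        else:
            result.append(tablet)
    return result

def migrate(tablet, new_node):
    """Move a tablet as a single unit: only its owning node changes."""
    return replace(tablet, node=new_node)

# One 10 GiB tablet on node A is split, then one half is moved to node B.
tablets = split_oversized([Tablet(MIN_TOKEN, MAX_TOKEN, 10 * GIB, "A")], 4 * GIB)
moved = migrate(tablets[1], "B")
```

Because each tablet carries its own token range and owner, moving data to a new node is a matter of reassigning whole tablets rather than recomputing ownership for the entire ring.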
Raft for metadata
■Strongly consistent
■Fault tolerant storage
[Figure: nodes A, B, and C; bootstrap operations and read barriers against system.token_metadata]
Workload Prioritization
[Figure: No prioritization vs. Workload Prioritization]
Repair? Tombstones? Data Resurrection?
+Worst things a database can do:
+Lose data
+Corrupt data
+Resurrect data
+Not a problem with ScyllaDB
+We take your data seriously
+We know repair is painful
Faster, Safer Node Operations with Repair vs Streaming
Incremental Compaction
[Figure: two SSTable runs with fragments A…Z and a…z, merged fragment by fragment (A+a, B+b, …)]
+We observed problems with legacy compaction strategies:
+STCS has high space amplification (and low write amplification)
+LCS has high write amplification (and low space amplification)
+We wanted to benefit from both approaches
+By borrowing SSTable Runs from LCS
+And applying them over size-tiers
+Merely replacing increasingly larger SSTables with increasingly longer SSTable Runs
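One way to see why longer SSTable runs can help with space amplification is to compare the temporary disk space a compaction needs in each model. The sketch below is a simplified illustration, not ScyllaDB's compaction code. It assumes the merge drops no data, that whole-SSTable inputs can only be deleted once the output is complete, and that run fragments are released as soon as an equally sized output fragment is sealed. The function names and the 1 GB fragment size are assumptions made for this example.

```python
def peak_extra_space_whole(input_sizes):
    """Whole SSTables: inputs stay on disk until the full output exists.

    Worst case (nothing is dropped by the merge), the output is as large
    as all inputs together, so that much extra space is needed.
    """
    return sum(input_sizes)

def peak_extra_space_runs(input_sizes, fragment_size):
    """SSTable runs: write one output fragment, then release the input it replaced."""
    total = sum(input_sizes)
    live_input = total   # input bytes still on disk
    written = 0          # output bytes on disk
    peak = 0
    while written < total:
        chunk = min(fragment_size, total - written)
        written += chunk                                # seal one output fragment
        peak = max(peak, live_input + written - total)  # space above the data size
        live_input -= chunk                             # release the replaced input
    return peak

# Four 100 GB inputs, sizes in GB; 1 GB fragments are an assumed value.
inputs_gb = [100, 100, 100, 100]
whole = peak_extra_space_whole(inputs_gb)
runs = peak_extra_space_runs(inputs_gb, fragment_size=1)
```

Under these assumptions the temporary overhead is bounded by the fragment size rather than by the size of the tier being compacted, which is the intuition behind combining size-tiers with runs.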
Designing Access Methods: The RUM Conjecture
+ScyllaDB has a fast cache
+Efficient access & maintenance
+Thanks to collocation with replica and design
+Takes care of consistency guarantees
+Handles complexities of data and query model
Row-based Cache
[Figure: read path through the memtable and cache in RAM and the SSTables on disk]
We Compared ScyllaDB and Memcached and… We Lost?
DynamoDB-compatible API (Alternator)
+Run DynamoDB-compatible workloads anywhere:
+on AWS
+on Google Cloud, Azure, or
+on-prem
+DynamoDB Streams, Global Tables
+Supports Load Balancing
+ScyllaDB Spark Migrator to move data anywhere
+Cassandra has no comparable feature
Per-Partition Rate-Limiting
Retaining Goodput with Query Rate Limiting
■Malicious/misbehaving users
■Parts of your system going awry due to bugs
The system does not have to satisfy these requests, and they should not affect the whole system too much.
■A maximum per-partition read/write rate can be set for a table.
■ScyllaDB will reject some operations in an effort to keep the rate of successful requests under the limit.
ALTER TABLE ks.tbl
WITH per_partition_rate_limit = {
    'max_writes_per_second': 100,
    'max_reads_per_second': 200
};
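The idea of rejecting requests to a hot partition while leaving other partitions alone can be sketched with a fixed one-second window counter per partition key. This is an illustrative model only and is not how ScyllaDB enforces `per_partition_rate_limit` internally. The `PerPartitionLimiter` class and its `allow` method are hypothetical names, and the caller passes the current time explicitly to keep the example deterministic.

```python
class PerPartitionLimiter:
    """Hypothetical limiter: at most max_per_second accepted operations per partition."""

    def __init__(self, max_per_second):
        self.max_per_second = max_per_second
        self._windows = {}  # partition key -> (window start second, accepted count)

    def allow(self, partition_key, now):
        """Return True if the operation is accepted, False if it is rejected."""
        second = int(now)
        start, count = self._windows.get(partition_key, (second, 0))
        if start != second:
            # A new one-second window has begun for this partition.
            start, count = second, 0
        if count >= self.max_per_second:
            return False
        self._windows[partition_key] = (start, count + 1)
        return True

# 150 writes hit one partition within the same second; the limit is 100.
writes = PerPartitionLimiter(max_per_second=100)
accepted = sum(writes.allow("hot-key", now=0.0) for _ in range(150))
other_partition_ok = writes.allow("cold-key", now=0.5)
next_second_ok = writes.allow("hot-key", now=1.2)
```

In this model the excess requests to the hot partition are rejected, while requests to other partitions and requests in the next window are unaffected.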
+ScyllaDB – Decouples topology changes from streaming
+Add nodes with time ~ 0
+Streaming happens in parallel
+Load gradually shifts, via tablet-aware drivers
+Cassandra – topology changes rely on streaming
+You add/remove a single node, and wait
+Then another, and wait…
+Time grows incrementally
Differences
+Cassandra 4.0 docs say:
+"To run any Zero Copy streaming benchmark the stream_throughput_outbound_megabits_per_sec must be set to a really high value"
+Cassandra 5 docs say:
+"To run any Zero Copy streaming benchmark the stream_throughput_outbound must be set to a really high value"
+Instaclustr says:
+entire_sstable_stream_throughput_outbound!
+Hint: Instaclustr is correct.
What Happened?
... and then you’ve got to Cleanup
+Not needed for ScyllaDB
+Boom!
Bootstrap
Bootstrap Cleanup
Oh! By the way...
+Our Cassandra cluster got inconsistent :-(
+How to benchmark this?
+Fixed after a rolling restart
+Quite annoying
Demo time
●Starting from 3 x i4i.4xlarge
○2TB pre-replication dataset, RF=3
○~56K ops/s for Cassandra
○~200K ops/s for ScyllaDB
●Scaling to:
○ScyllaDB: + 3 x i4i.32xlarge
○Cassandra: + 69 x i4i.4xlarge
Scaling to tackle 2M ops/s
Bootstrap
Bootstrap Cleanup
Cassandra scaling time
< 300GB transferred, becomes linear
Cassandra node join process
ScyllaDB Scaling
●Process starts instantly and joins the cluster
●Load balancer continuously distributes tablets and load
●Client drivers are notified and route requests according to tablet movement
Costs
+Throughput
+Spiky and bounded – Batch, ETL
+ScyllaDB offers unparalleled throughput
+Latency sensitive
+Focus of our testing – Real-time and unpredictable
+ScyllaDB reacts faster to opportunities
+Storage dense – Tablets + Advanced Compression allow for up to 90% disk utilization
+Dictionary-based compression
+Data governance / Retention requirements
+ScyllaDB maximizes both disks and cache
Different savings for different workloads
Run df on your Cassandra nodes for a SURPRISE
Observability
Cassandra - DIY, Community or 3rd Party
+No centralized offering for monitoring
+Community versions available (Metrics Collector for Apache Cassandra)
+Lacks 5.0 support
+Each team comes up with their own set of dashboards and alerts
+JMX complexity
+Newer versions expose some metrics in a Prometheus-friendly format, bypassing JMX
+JMX is still largely needed for Java metrics and maintenance operations
+Monitoring-as-a-Service varies in coverage and detail
+Datadog, AxonOps, Dynatrace, New Relic
ScyllaDB - Out of the box monitoring
+Easy, out of the box with the ScyllaDB Monitoring Stack
+Prometheus + collection rules and alerts
+Loki + alerts
+Alertmanager + alerting rules
+Grafana + powerful dashboards
+New release = new features = new dashboards
+Stay up-to-date for the latest and greatest
Poll
How do you monitor your clusters?
Which observability tools do you use, whether custom-built or 3rd party?
ScyllaDB setup and tuning
+sysctl ✅ automatic
+scylla_setup
+memory, scheduling, network
+disk parameters ✅ automatic
+iotune, part of scylla_setup
+best concurrency settings to maximize disk utilization
+hardware interrupts handling ✅ automatic
+dedicated vCPUs to handle hardware interrupts via irqbalance
+allows shards to run without interference
+jvm ✅ absent
+scylla.yaml ✅ simple changes
+usually just customized for enabling features (Alternator, encryption settings)
Vector
●Both implement VECTOR<float, dim> type
●Approximate Nearest Neighbour (ANN) queries
●Similar CQL syntax SELECT … ORDER BY col ANN OF vector LIMIT K
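To make concrete what an ANN query approximates, the sketch below computes the exact top-K neighbours by brute force. Neither database answers queries this way; an ANN index trades some accuracy for speed by avoiding the full scan. The function names are hypothetical, cosine similarity is an assumed choice of metric, and vectors are assumed to be non-zero.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def exact_top_k(rows, query, k):
    """Return the keys of the k rows most similar to the query (full scan).

    rows maps a row key to its vector; this is the exact result that an
    ORDER BY ... ANN OF ... LIMIT k query only approximates.
    """
    ranked = sorted(
        rows,
        key=lambda key: cosine_similarity(rows[key], query),
        reverse=True,
    )
    return ranked[:k]

rows = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.9, 0.1]}
nearest = exact_top_k(rows, query=[1.0, 0.0], k=2)
```

The full scan costs time proportional to the number of rows, which is why both databases rely on an approximate index instead.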
However, upon a closer look…
Vector - Cassandra implementation
●Implemented using Storage-Attached Index (SAI) and JVector
○Shared SSTable and compaction lifecycles
●Built on Indexes
○Susceptible to same issues
○Data locality, large partitions
●Shared data paths (storage, chunk cache)
Vector - ScyllaDB Cloud implementation
●External service
●In-memory data
●Rust-based service
●Leverages USearch library
●Fully-managed in the Cloud
And back to our demo
Wrap Up
+Benchmarks are complicated
+Be wary of sustained latencies on Apache Cassandra
+Measure sustained response times
+Our testing has limitations, it is impossible to test everything
+ScyllaDB outperforms Apache Cassandra 5.0 in every aspect
+Performance, Scaling, Costs
+Admin (Tip: Check out what the process to upgrade to C*5 looks like ;-)
+Plus Workload Prioritization, Alternator, frictionless monitoring, no GC, …
+Both databases evolved on their own paths
+ScyllaDB focused on maintaining high performance, scalability and vector features - all the while lowering costs
+Cassandra is built for commodity hardware, aiming to be a general-purpose NoSQL database for use cases with broader latency tolerance
Summary
Guilherme Nogueira
Keep Learning
Fast Scaling.
Max Efficiency. Lower Cost.
8 AM PT - 10 AM PT | 15:00 GMT - 17:00 GMT
ScyllaDB X Cloud
ScyllaDB University Live
LIVE LEARNING
November 12
ONLINE | OCT 22 + 23, 2025
All Things Performance
p99conf.io
scylladb.com/events
Thank you for joining us today.