Using ScyllaDB for Real-Time Write-Heavy Workloads
ScyllaDB
866 views
44 slides
Jun 27, 2024
Slide 1 of 44
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
About This Presentation
Learn about major architectural shifts that enable new levels of elasticity and operational simplicity
ScyllaDB just launched the first release featuring our new “tablets” architecture. Tablets builds upon a multiyear project to re-architect our legacy ring architecture. Our metadata is now full...
Learn about major architectural shifts that enable new levels of elasticity and operational simplicity
ScyllaDB just launched the first release featuring our new “tablets” architecture. Tablets builds upon a multiyear project to re-architect our legacy ring architecture. Our metadata is now fully consistent, thanks to the assistance of the Raft consensus protocol. Together, these changes enable new levels of elasticity, speed, simplicity, and efficiency. Data is dynamically redistributed as the workload and topology evolve. New nodes can be spun up in parallel and start adapting to the load in near real-time.
Join ScyllaDB Co-founder Dor Laor to learn what this new approach means for:
- Rapidly responding to traffic spikes without overprovisioning
- Performing safe, concurrent, fast bootstrapping
- Simplifying cluster administration (e.g., cleanup, repair, tombstone garbage collection)
- Increasing efficiency and eliminating overprovisioning
Size: 4.2 MB
Language: en
Added: Jun 27, 2024
Slides: 44 pages
Slide Content
ScyllaDB Fast Forward:
True Elastic Scale
Dor Laor, CEO and Co-Founder, ScyllaDB
Felipe Cardeneti Mendes, Solutions Architect++, ScyllaDB
Dor Laor, CEO and Co-Founder
+Puppy Lover
+Open Source, snowboard and MTB Enthusiast
+ScyllaDB passionate!
Why ScyllaDB?
Best High Availability in the industry
Best Disaster Recovery in the industry
Best scalability in the industry
Best Price/Performance in the industry Auto-tune - out of the box performance
Compatible with Cassandra & DynamoDB
The power of Cassandra at the speed of Redis with the usability of DynamoDB
No Lock-in
Open Source Software
+400 Gamechangers Leverage ScyllaDB
Agenda
+Wasn’t ScyllaDB elastic before?
+The value proposition
+Demo time
+Deeper dive
+What's next
< 6.0 ScyllaDB Releases Were Good
2X 2X 2X 2X 2X
3 x 4-vcpu VMs
1B keys
3 x 8-vcpu VMs
2B keys
3 x 16-vcpu VMs
4B keys
3 x 32-vcpu VMs
8B keys
3 x 64-vcpu VMs
16B keys
3 x 128-vcpu VMs
32B keys
< 6.0 Latency Under Repair/Add/Remove Nodes
ScyllaDB < 6.0 Problems
+Eventual Consistency of cluster metadata
+Great architecture for scalable, partition-tolerable, widely distributed DB
+But
+Topology changes are allowed one-at-a-time
+Rely on 30+ second timeouts for poor’s man linearizability
+Node failed/down block scale
+Streaming time is a function of the schema - parsing partitions
+Additional complex operations: Cleanup, rebuild, etc
Our journey
from eventual
to immediate
consistency
& tablets
Speed
+Streaming sstables - 30x faster
+Load balancing of tablets Consistency
+Small tables fit in few (single) nodes
TCO
+Shrink free space
+Reduce static, over provisioned deployments
Demo Time!
Theory Behind Tablets
Raft- Consistent Metadata
Protocol for state machine replication
Total order broadcast of state change commands
X = 0 X += 1
CAS(X, 0, 1)
X = 0
node A
bootstrap
bootstrap
Linearizable Token Metadata
node B
node C
system.token_metadata
Read
barrier
Read
barrier
+Fencing - each write is signed with topology version
+If there is a version mismatch, the write doesn’t go through
Changes in the Data Plane
Replica
Coordinator
Topology
coordinator
Req(V
1)
Req(V
1
)
Fence(V
1)
Drain(V
1)
Req(V
2
)
Linearizable Schema Version
No re-hash of the entire schema on change
10x less CPU with large schemas.
TimeUUID-based Schema versionHash-based schema version
5.x: 6.x:
+Introduce a new layer of indirection - the tablets table
+Each table has its own token range to node mapping
+Mapping can change independently of node addition
and removal
+Different tables can have different tablet counts
+Managed by Raft
Implementation - Metadata
System, tablets
Query
Replica
Set
Token
+Each tablet replica is isolated into its own
memtable+ SSTables
+Forms its own little Log-Structured Merge Tree
+With compaction and stuff
+Can be migrated as a unit
+Migration: copy the unit
+Cleanup: delete the unit
+Split/merge as the table grows/shrinks
Implementation - Data Path
Standard Tables
{R1, R2, R3}
R1
R2
key1key2
Sharding function generates
good load distribution between
CPUs
R3
RAFT group
No. 299238
RAFT group
No. 299236 RAFT group
No. 299237
Raft Tables key1key2
tablet
tablet
replica
tablet
replica
+Source of truth for the cluster
+How many tablets for each table
+Token boundaries for each tablets
+On which nodes and shards do we have replicas
+What kind of transition the tablet is undergoing
+Which nodes and shards will host tablet replicas
after the transition
+Managed using the Raft protocol
+Replicated on every node
The Tablets Table
CREATE TABLE system.tablets (
table_id uuid,
last_token bigint,
keyspace_name text static,
replicas list<tuple<uuid, int>>,
new_replicas list<tuple<uuid, int>>,
session uuid,
stage text,
transition text,
table_name text static,
tablet_count int static,
PRIMARY KEY (table_id, last_token)
);
Tablets - Balancing
Depends on fault-tolerant,
reliable, and fast topology
changes.
+Hosted on one node
+But can be migrated freely if the node is down
+Synchronized via Raft
+Collects statistics on tables and tablets
+Migrates to balance space
+Evacuates nodes to decommission
+Migrates to balance CPU load
+Rebuilds and repairs
Implementation - Load Balancer
Tablet Scheduler
Scheduler globally controls movement,
maintenance operation on a per tablet basis
repairmigrationtablet 0
tablet 1
schedule schedule
repairBackup
Tablets
Resharding is cheap.
SStables split at tablet boundary.
Reassign tablets to shards (logical operation).
Tablets
Cleanup after topology change is cheap.
Just delete SSTables. Runs automatically,
under the hood
Tablets - Streaming (Enterprise)
Send files over RPC. No per
schema, per row processing.
Up to 30x faster, saturate links.
Sstables
files
Sstables
files
Sstables
files
+Drivers are tablet-aware
+But without reading the tablets table
+Driver contacts a random node/shard
+On miss, gets updated routing information for
that tablet
+Ensures fast start-up even with 100,000 entries
in the tablets table
Driver Support
Authentication and Service Levels on Raft
ScyllaDB 5.x was Manual
ScyllaDB 6.x:
+Automatically replicated on every node
+Linearizable with CREATE/DROP
+No denial of service if a node is down
What’s Coming
Tablets - Repair - Fire & Forget
+Continuous, transparent controlled by the load balancer
+Auto GC grace period
Poll
How long does it take for you to scale
your existing database?
Guilherme Nogueira [email protected]
Keep Learning
scylladb.com/category/engineering
Register now at p99conf.io
Visit our
blog for more
on ScyllaDB
engineering
Thank you
for joining us today.
@scylladb scylladb/
slack.scylladb.com@scylladb company/scylladb/
scylladb/
40
New metrics in keyspace dashboard
●scylla-monitoring provides an updated dashboard that shows tablets over time per instance
and tablet count per instance at the moment