Using ScyllaDB for Real-Time Write-Heavy Workloads

ScyllaDB | Jun 27, 2024

About This Presentation

Learn about major architectural shifts that enable new levels of elasticity and operational simplicity
ScyllaDB just launched the first release featuring our new “tablets” architecture. Tablets builds upon a multiyear project to re-architect our legacy ring architecture. Our metadata is now full...


Slide Content

ScyllaDB Fast Forward:
True Elastic Scale
Dor Laor, CEO and Co-Founder, ScyllaDB
Felipe Cardeneti Mendes, Solutions Architect++, ScyllaDB

Poll
How often do you scale your database?

Presenters

Felipe Cardeneti Mendes, Solutions Architect
+Puppy Lover
+Open Source Enthusiast
+ScyllaDB passionate!

Dor Laor, CEO and Co-Founder
+Puppy Lover
+Open Source, snowboard and MTB Enthusiast
+ScyllaDB passionate!

Why ScyllaDB?
Best High Availability in the industry
Best Disaster Recovery in the industry
Best scalability in the industry
Best Price/Performance in the industry
Auto-tune - out of the box performance
Compatible with Cassandra & DynamoDB
The power of Cassandra at the speed of Redis with the usability of DynamoDB
No Lock-in
Open Source Software

400+ Gamechangers Leverage ScyllaDB

Agenda
+Wasn’t ScyllaDB elastic before?
+The value proposition
+Demo time
+Deeper dive
+What's next

< 6.0 ScyllaDB Releases Were Good
Each step doubles both compute and data (2X at every hop):
+3 x 4-vcpu VMs: 1B keys
+3 x 8-vcpu VMs: 2B keys
+3 x 16-vcpu VMs: 4B keys
+3 x 32-vcpu VMs: 8B keys
+3 x 64-vcpu VMs: 16B keys
+3 x 128-vcpu VMs: 32B keys

< 6.0 Latency Under Repair/Add/Remove Nodes

ScyllaDB < 6.0 Problems
+Eventual consistency of cluster metadata
+Great architecture for a scalable, partition-tolerant, widely distributed DB
+But:
+Topology changes are allowed only one at a time
+Relies on 30+ second timeouts for poor man's linearizability
+A failed or down node blocks scaling
+Streaming time is a function of the schema: partitions must be parsed
+Additional complex operations: cleanup, rebuild, etc.

Our journey from eventual to immediate consistency & tablets

ScyllaDB 6.0 Value
Elasticity
+Faster bootstrap/operations
+Incremental bootstrap
+Concurrent node operations

Simplicity
+Decoupled topology operations
+Transparent cleanups
+Semi-transparent repairs
+Parallel maintenance operations

Speed
+Streaming SSTables: 30x faster
+Load balancing of tablets

Consistency
+Small tables fit in a few (or a single) nodes

TCO
+Shrink free space
+Reduce static, over-provisioned deployments

Demo Time!

Theory Behind Tablets

Raft - Consistent Metadata
Protocol for state machine replication
Total order broadcast of state change commands
Example command log, applied in the same order on every replica: X = 0; X += 1; CAS(X, 0, 1)
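A minimal sketch of the idea (illustrative Python, not ScyllaDB's Raft implementation): every replica applies the same totally ordered command log, so all replicas converge to the same state.

# Illustrative only: replicas applying the same totally ordered command log.
def apply(state, command):
    op, *args = command
    if op == "set":                      # X = value
        return args[0]
    if op == "add":                      # X += value
        return state + args[0]
    if op == "cas":                      # CAS(X, expected, new)
        expected, new = args
        return new if state == expected else state
    raise ValueError(f"unknown command: {op}")

log = [("set", 0), ("add", 1), ("cas", 0, 1)]    # the log from the slide

replicas = {}
for name in ("A", "B", "C"):
    state = 0
    for command in log:
        state = apply(state, command)
    replicas[name] = state

assert len(set(replicas.values())) == 1          # all replicas agree
print(replicas)                                  # {'A': 1, 'B': 1, 'C': 1}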

Linearizable Token Metadata
(Diagram: nodes A, B, and C bootstrap concurrently; each bootstrap serializes its view of system.token_metadata through a read barrier.)

Changes in the Data Plane
+Fencing: each write is signed with the topology version
+If there is a version mismatch, the write doesn't go through (sketched below)
(Diagram: the coordinator sends Req(V1) to the replica; the topology coordinator issues Fence(V1) and Drain(V1); subsequent requests are signed Req(V2).)
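To make the fencing rule concrete, here is a conceptual sketch (not ScyllaDB internals; the class and method names are invented): every write is stamped with the topology version its coordinator observed, and a replica fenced at a newer version rejects older stamps.

# Conceptual sketch of fencing; names are invented for illustration.
class FencedReplica:
    def __init__(self):
        self.fence_version = 0
        self.data = {}

    def fence(self, version):
        # Issued by the topology coordinator ahead of a topology change.
        self.fence_version = max(self.fence_version, version)

    def write(self, key, value, topology_version):
        if topology_version < self.fence_version:
            raise RuntimeError("stale topology version, write rejected")
        self.data[key] = value

replica = FencedReplica()
replica.write("k", "v1", topology_version=1)   # Req(V1): accepted
replica.fence(2)                               # Fence(V2)
replica.write("k", "v2", topology_version=2)   # Req(V2): accepted
# replica.write("k", "v3", topology_version=1) would now raise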

Linearizable Schema Version
No re-hash of the entire schema on change.
10x less CPU with large schemas.
5.x: TimeUUID-based schema version
6.x: Hash-based schema version
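One way to picture the hash-based approach (the exact scheme used in 6.x is an assumption here, shown only for illustration): keep a digest per table and combine them, so changing one table does not re-hash every other table's definition.

# Illustrative sketch of a combined, hash-based schema version.
import hashlib

def table_digest(ddl: str) -> bytes:
    return hashlib.sha256(ddl.encode()).digest()

def schema_version(digests: dict) -> str:
    h = hashlib.sha256()
    for name in sorted(digests):                 # deterministic order
        h.update(name.encode())
        h.update(digests[name])
    return h.hexdigest()

digests = {
    "ks.users": table_digest("CREATE TABLE ks.users (id uuid PRIMARY KEY)"),
    "ks.events": table_digest("CREATE TABLE ks.events (id uuid PRIMARY KEY)"),
}
v1 = schema_version(digests)

# Altering one table recomputes only that table's digest:
digests["ks.users"] = table_digest(
    "CREATE TABLE ks.users (id uuid PRIMARY KEY, name text)")
v2 = schema_version(digests)
assert v1 != v2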

Tablet History: ClayDB > Bigtable > ScyllaDB

Consistent Metadata & Tablet Journey
+5.0: Raft
+5.2: Safe schema changes
+5.2+: Safe topology changes, consistent tables
+6.0: Dynamic partitioning: tablets

Implementation - Metadata
+Introduces a new layer of indirection: the tablets table
+Each table has its own token-range-to-node mapping
+The mapping can change independently of node addition and removal
+Different tables can have different tablet counts
+Managed by Raft
(Diagram: a query's token is looked up in system.tablets to find the replica set; see the lookup sketch below.)
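The indirection in action, with hypothetical data (the real lookup is internal to ScyllaDB): given a query's token, find the owning tablet by its last_token boundary and return that tablet's replica set.

# Hypothetical token-to-replica-set lookup for one table.
import bisect

# Tablet i owns the token range (last_tokens[i-1], last_tokens[i]].
last_tokens = [-4611686018427387904, 0,
               4611686018427387903, 9223372036854775807]
replica_sets = [["n1", "n2", "n3"], ["n2", "n3", "n4"],
                ["n3", "n4", "n1"], ["n4", "n1", "n2"]]

def replicas_for(token: int):
    i = bisect.bisect_left(last_tokens, token)
    return replica_sets[i]

print(replicas_for(42))   # ['n3', 'n4', 'n1']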

Implementation - Data Path
+Each tablet replica is isolated into its own memtable + SSTables
+Forms its own little Log-Structured Merge tree, with its own compaction
+Can be migrated as a unit (see the sketch after this list)
+Migration: copy the unit
+Cleanup: delete the unit
+Split/merge as the table grows/shrinks
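A conceptual model of the above (plain Python, not ScyllaDB code): a tablet replica is a self-contained LSM unit, so migration is copying the unit and cleanup is deleting it.

# Conceptual model of a tablet replica as a migratable LSM unit.
from dataclasses import dataclass, field

@dataclass
class TabletReplica:
    memtable: dict = field(default_factory=dict)
    sstables: list = field(default_factory=list)

    def flush(self):
        if self.memtable:
            self.sstables.append(dict(self.memtable))   # freeze the memtable
            self.memtable = {}

def migrate(source: TabletReplica) -> TabletReplica:
    source.flush()
    # Migration: copy the unit's SSTables to the target node...
    target = TabletReplica(sstables=[dict(s) for s in source.sstables])
    # ...and cleanup on the source is just deleting the unit.
    source.sstables = []
    return target

t = TabletReplica()
t.memtable["key1"] = "value1"
print(migrate(t).sstables)   # [{'key1': 'value1'}]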

Standard Tables
(Diagram: keys key1 and key2 map to replica set {R1, R2, R3}; the sharding function generates good load distribution between CPUs.)

Raft Tables
(Diagram: keys key1 and key2 map to tablet replicas, each managed by its own Raft group, e.g., groups No. 299236, 299237, and 299238.)
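A simplified sketch of that sharding idea (ScyllaDB actually uses Murmur3 tokens and a more refined shard-assignment algorithm; the hash below is purely illustrative): hashing keys to per-CPU shards spreads load evenly.

# Simplified shard-per-core routing; the hash function is illustrative only.
import hashlib

SHARDS = 8   # e.g., an 8-vCPU node

def token(key: bytes) -> int:
    return int.from_bytes(hashlib.md5(key).digest()[:8], "big")

def shard_of(key: bytes) -> int:
    return token(key) % SHARDS

counts = [0] * SHARDS
for i in range(100_000):
    counts[shard_of(f"key{i}".encode())] += 1
print(counts)   # roughly uniform across the 8 shards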

The Tablets Table
+Source of truth for the cluster:
+How many tablets each table has
+Token boundaries for each tablet
+On which nodes and shards we have replicas
+What kind of transition the tablet is undergoing
+Which nodes and shards will host tablet replicas after the transition
+Managed using the Raft protocol
+Replicated on every node
CREATE TABLE system.tablets (
table_id uuid,
last_token bigint,
keyspace_name text static,
replicas list<tuple<uuid, int>>,
new_replicas list<tuple<uuid, int>>,
session uuid,
stage text,
transition text,
table_name text static,
tablet_count int static,
PRIMARY KEY (table_id, last_token)
);
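The table can be inspected directly. A minimal sketch with the Python driver (assumes a node reachable at 127.0.0.1 and a release where system.tablets exists; scylla-driver keeps the cassandra package name):

# Reading system.tablets with the Python driver.
from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])
session = cluster.connect()

rows = session.execute(
    "SELECT keyspace_name, table_name, last_token, replicas "
    "FROM system.tablets")
for row in rows:
    print(row.keyspace_name, row.table_name, row.last_token, row.replicas)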

Tablets - Balancing
Depends on fault-tolerant, reliable, and fast topology changes.

Implementation - Load Balancer
+Hosted on one node
+But can be migrated freely if that node is down
+Synchronized via Raft
+Collects statistics on tables and tablets
+Migrates tablets to balance space (see the sketch after this list)
+Evacuates nodes on decommission
+Migrates tablets to balance CPU load
+Rebuilds and repairs
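A toy sketch of the core balancing idea (not the real algorithm, which also weighs space, CPU load, and ongoing transitions): move tablets from the most loaded node to the least loaded until the spread is within one tablet.

# Toy tablet balancing loop; node names and counts are made up.
def balance(tablets_per_node):
    moves = []
    while True:
        src = max(tablets_per_node, key=tablets_per_node.get)
        dst = min(tablets_per_node, key=tablets_per_node.get)
        if tablets_per_node[src] - tablets_per_node[dst] <= 1:
            return moves
        tablets_per_node[src] -= 1
        tablets_per_node[dst] += 1
        moves.append((src, dst))     # schedule one tablet migration

# A new, empty node joins three loaded nodes:
print(balance({"node1": 32, "node2": 32, "node3": 32, "node4": 0}))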

Tablet Scheduler
The scheduler globally controls movement and maintenance operations on a per-tablet basis.
(Diagram: tablet 0 schedules a repair, then a migration; tablet 1 schedules a repair, then a backup.)
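A toy rendering of that per-tablet scheduling (an invented structure, for illustration only): each tablet carries its own queue of maintenance operations, drained under global control.

# Toy per-tablet operation scheduler.
from collections import deque

queues = {
    "tablet 0": deque(["repair", "migration"]),
    "tablet 1": deque(["repair", "backup"]),
}

while any(queues.values()):
    # One global pass: run the next pending operation of each tablet.
    for tablet, ops in queues.items():
        if ops:
            print(f"{tablet}: running {ops.popleft()}")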

Tablets
Resharding is cheap:
SSTables split at tablet boundaries.
Reassigning tablets to shards is a logical operation.

Tablets
Cleanup after a topology change is cheap:
just delete the SSTables. Runs automatically, under the hood.

Tablets - Streaming (Enterprise)
Send SSTable files over RPC, with no per-schema, per-row processing.
Up to 30x faster; saturates links.

Driver Support
+Drivers are tablet-aware
+But without reading the tablets table
+The driver contacts a random node/shard
+On a miss, it gets updated routing information for that tablet (sketched below)
+Ensures fast start-up even with 100,000 entries in the tablets table
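A hypothetical sketch of that flow (the class and method names are invented; real drivers implement this internally): start with an empty cache, route misses to a random node, and learn the tablet's replicas from the response.

# Hypothetical tablet-aware routing cache on the client side.
import random

class TabletRoutingCache:
    def __init__(self, nodes):
        self.nodes = nodes
        self.cache = {}   # (table, first_token, last_token) -> replica list

    def route(self, table, token):
        for (tbl, lo, hi), replicas in self.cache.items():
            if tbl == table and lo < token <= hi:
                return replicas[0]            # hit: go straight to a replica
        return random.choice(self.nodes)      # miss: any node will do

    def learn(self, table, lo, hi, replicas):
        # Called when a response carries routing info for the missed tablet.
        self.cache[(table, lo, hi)] = replicas

The cache fills lazily, which is what keeps start-up fast even with 100,000 entries in the tablets table.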

Authentication and Service Levels on Raft
ScyllaDB 5.x: manual
ScyllaDB 6.x:
+Automatically replicated on every node
+Linearizable with CREATE/DROP
+No denial of service if a node is down

What’s Coming

Tablets - Repair - Fire & Forget
+Continuous and transparent, controlled by the load balancer
+Automatic GC grace period

Serverless (VM-based...)
Typeless, Sizeless, Limitless

Consistent metadata + Elasticity = Much More

Poll
How long does it take for you to scale
your existing database?

Guilherme Nogueira
[email protected]
Keep Learning
scylladb.com/category/engineering
Register now at p99conf.io
Visit our blog for more on ScyllaDB engineering

Thank you
for joining us today.

New metrics in keyspace dashboard
●scylla-monitoring provides an updated dashboard that shows tablets over time per instance, and the current tablet count per instance

Write workload

Growing a cluster - Tablets
(Diagram sequence: tablets spread across the existing nodes keep serving reads/writes while they migrate to the newly added node; once migration completes, the new node starts serving reads/writes.)