ScyllaDB Topology on Raft: An Inside Look

ScyllaDB 360 views 37 slides Jun 21, 2024

About This Presentation

In ScyllaDB 6.0, we complete the transition to strong consistency for all of the cluster metadata. In this session, Konstantin Osipov covers the improvements we introduce along the way for such features as CDC, authentication, service levels, Gossip, and others.


Slide Content

Topology on Raft: An Inside Look Konstantin Osipov, Director of Engineering @ ScyllaDB

Konstantin Osipov
- Seasoned database geek
- Certified Buteyko breather
- Muscovite and a father of three

Presentation Agenda
- Raft recap
- ScyllaDB path to consistency: Schema, Topology
- Manageability

Previous Episodes

Problem Overview

Strong vs Eventual Consistency
Strong consistency (Node 1, Node 2):
1. Write from client
2. Write propagated through cluster
3. Internal acknowledgement
4. Acknowledged to client
- requires a live majority
- always returns latest write
Eventual consistency (Node 1, Node 2):
1. Write from client
2. Acknowledged to client
3. Eventual write propagation
- highly available
- writes must commute
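The two acknowledgement paths above can be sketched as a toy model in Python. This is not ScyllaDB code: `Replica`, `strong_write`, and `eventual_write` are hypothetical names, and "strong" is simplified here to "acknowledge only after a majority of replicas stored the write".

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Toy in-memory replica (hypothetical, for illustration only)."""
    up: bool = True
    data: dict = field(default_factory=dict)

    def accept(self, key, value) -> bool:
        # A replica that is down cannot store or acknowledge anything.
        if not self.up:
            return False
        self.data[key] = value
        return True


def majority(n: int) -> int:
    """Smallest number of replicas that forms a majority of n."""
    return n // 2 + 1


def strong_write(replicas, key, value) -> bool:
    """Acknowledge to the client only after a majority stored the write."""
    acks = sum(r.accept(key, value) for r in replicas)
    return acks >= majority(len(replicas))


def eventual_write(replicas, key, value, pending) -> bool:
    """Acknowledge after the first live replica; queue propagation to the rest."""
    for i, replica in enumerate(replicas):
        if replica.accept(key, value):
            pending.extend((j, key, value) for j in range(len(replicas)) if j != i)
            return True
    return False


# One live replica out of three: no majority, but still writable eventually.
nodes = [Replica(), Replica(up=False), Replica(up=False)]
strong_ok = strong_write(nodes, "x", 1)
eventual_ok = eventual_write(nodes, "x", 1, pending=[])
```

With one of three replicas alive, `strong_ok` is `False` and `eventual_ok` is `True`, which is the trade-off the slide lists: a live majority is required for strong consistency, while eventual consistency stays available.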

Consistency of Metadata: data vs metadata
Metadata | Data
Schema information: table, view, type definitions; topology information: nodes, tokens | Static and regular rows, counters
Replicated everywhere | Partitioned
Not commutative | Commutative
Changes rarely | Changes frequently
(Diagram: ScyllaDB cluster with replication_factor=2)
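The "commutative" versus "not commutative" distinction can be illustrated with a small Python sketch. It assumes last-write-wins timestamps for data writes and a simplified create/drop model for schema changes; `lww_merge` and `apply_schema_ops` are hypothetical helpers, not ScyllaDB functions.

```python
from functools import reduce
from itertools import permutations


def lww_merge(a, b):
    """Last-write-wins: keep the (timestamp, value) pair with the larger timestamp."""
    return max(a, b)


# Data writes: merging in any order gives the same winner, so they commute.
writes = [(3, "c"), (1, "a"), (2, "b")]
results = {reduce(lww_merge, order) for order in permutations(writes)}


def apply_schema_ops(ops):
    """Apply create/drop operations in the given order to an empty schema."""
    tables = set()
    for op, name in ops:
        if op == "create":
            tables.add(name)
        elif op == "drop":
            tables.discard(name)
    return tables


# Metadata changes: the outcome depends on the order, so they do not commute.
create_then_drop = apply_schema_ops([("create", "t"), ("drop", "t")])
drop_then_create = apply_schema_ops([("drop", "t"), ("create", "t")])
```

Here `results` contains a single winner regardless of order, while `create_then_drop` and `drop_then_create` differ, which is why metadata needs an agreed-upon order of changes.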

Raft for Metadata Replication
Node A: Consensus module, State machine, Log: x←1 y←2 z←3
Node B: Consensus module, State machine, Log: x←1 y←2 z←3
Node C: Consensus module, State machine, Log: x←1 y←2 z←3
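A minimal Python sketch of the replicated state machine idea in the diagram: every node applies the same log entries in the same order and therefore reaches the same state. The `Node` class is hypothetical, and the sketch deliberately leaves out leader election, terms, and the consensus protocol itself.

```python
class Node:
    """Toy node holding a log and a key-value state machine (illustrative only)."""

    def __init__(self, name):
        self.name = name
        self.log = []          # ordered list of (key, value) entries
        self.state = {}        # state machine: result of applying the log
        self.last_applied = 0  # number of log entries applied so far

    def append(self, entry):
        self.log.append(entry)

    def apply_up_to(self, commit_index):
        """Apply log entries in order, up to commit_index (or the end of the log)."""
        while self.last_applied < min(commit_index, len(self.log)):
            key, value = self.log[self.last_applied]
            self.state[key] = value
            self.last_applied += 1


log = [("x", 1), ("y", 2), ("z", 3)]
nodes = [Node("A"), Node("B"), Node("C")]
for node in nodes:
    for entry in log:
        node.append(entry)
    node.apply_up_to(len(log))

# A node that has applied only a prefix of the log sees an older, but never a
# divergent, state.
lagging = Node("D")
for entry in log:
    lagging.append(entry)
lagging.apply_up_to(2)
```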

Elements of the Raft State
Groups: Topology, Schema, Backward compatibility
Tables: keyspaces, topology, peers, cdc_generations, columns, tables, tablets, scylla_local, local, topology_requests, auth, service_levels
Version labels in the diagram: 5.2 (three tables), 6.0 (five tables, plus service_levels), 3.0 (three tables)

The Centralized Topology Coordinator
- Runs alongside Raft leader
- Highly available
- Drives the progress
- Performs linearizable reads and writes of the topology
- Request coordinators still use the local view on topology
- No extra coordination when executing user requests

Linearizable topology changes
Operations (in diagram order): bootstrap, bootstrap, tablet migration, backup, repair
+ Simplicity
+ Safety

Automatic Coordinator Failover

Further improvements in schema changes

Dedicated commit log on shard 0
- No need to FLUSH entire schema after changing it
- 10x less IO with large schemas!
(Diagram: Node 1 and Node 2, each with multiple shards and a schema commit log)

Linearizable schema version
- No re-hash of the entire schema on change
- 10x less CPU with large schemas
5.x: hash-based schema version
6.x: TimeUUID-based schema version
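To see why a hash-based version costs more CPU on large schemas, here is a rough Python sketch. It is not the ScyllaDB implementation: the digest function (SHA-256 truncated to 16 bytes), the `schema` dictionary layout, and both function names are assumptions chosen for illustration.

```python
import hashlib
import uuid


def hash_based_version(schema: dict) -> uuid.UUID:
    """Digest every table definition; the work grows with the size of the schema."""
    digest = hashlib.sha256()
    for name in sorted(schema):
        digest.update(name.encode())
        digest.update(schema[name].encode())
    # Truncate to 16 bytes so the result fits in a UUID (an illustrative choice).
    return uuid.UUID(bytes=digest.digest()[:16])


def timeuuid_based_version() -> uuid.UUID:
    """Generate a fresh time-based UUID; the cost does not depend on schema size."""
    return uuid.uuid1()


# Hypothetical schema with many tables.
schema = {f"ks.table_{i}": f"CREATE TABLE ks.table_{i} (id int PRIMARY KEY)" for i in range(1000)}

v_hash = hash_based_version(schema)   # touches all 1000 definitions
v_time = timeuuid_based_version()     # constant work per schema change
```

The hash-based version is deterministic for a given schema but has to walk all definitions on every change. The time-based version is simply assigned to each change, which is feasible once changes are ordered through a single linearizable log.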

Authentication and service levels on Raft
ScyllaDB 5.x:
- Manual: Set the system_auth keyspace replication factor to the number of nodes in the datacenter. For production environments use only NetworkTopologyStrategy.
ScyllaDB 6.x:
- Automatically replicated on every node
- Linearizable with CREATE/DROP
- No denial of service if a node is down

Systems we moved

Features on Raft Can I join? Can I join? ok

CDC generations on Raft
- Quick & reliable propagation of CDC data at boot
- The topology coordinator is responsible for changing the ring
- Prerequisite for quick and concurrent boot

Automated cleanup
- No need to run nodetool cleanup: it is automatic after a topology operation
- Automatic repair is planned with tablets
Previously (manual): "You should run nodetool cleanup whenever you scale-out (expand) your cluster, and new nodes are added to the same DC."

UUID based host identification
- Token metadata
- Hints
Increased safety:
- Removed nodes are banned from the cluster
- Live nodes can’t be removed, only decommissioned

Fast and concurrent bootstrap
- Bootstrap as many nodes as you want, simultaneously
- New cluster assembly takes seconds, not minutes/hours
# DEPRECATED/IGNORED
skip_wait_for_gossip_to_settle: 30

Manageability improvements

New system table for Raft state
cqlsh> select * from system.raft_state;
 group_id                             | disposition | server_id                            | can_vote
--------------------------------------+-------------+--------------------------------------+----------
 7b818380-e9f8-11ed-9316-7c72c96b4bfa |     CURRENT | c3b8f01d-e87f-487f-8e6c-e2c86f8b898b |     True

New REST APIs
localhost:9000/storage_service/cleanup_all
localhost:9000/raft/trigger_snapshot/{group_id}
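As a hedged illustration, the two endpoints could be addressed from Python roughly like this. The paths, host, and port are taken from the slide. The use of the POST method and the helper names `cleanup_all_request` and `trigger_snapshot_request` are assumptions, so check the REST API documentation of your ScyllaDB version before sending anything. The sketch only builds the requests and does not send them.

```python
import urllib.request
import uuid

BASE_URL = "http://localhost:9000"  # host and port as shown on the slide


def cleanup_all_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a request for the cleanup_all endpoint."""
    # Assumption: the endpoint is invoked with POST.
    return urllib.request.Request(f"{base_url}/storage_service/cleanup_all", method="POST")


def trigger_snapshot_request(group_id, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a request to trigger a Raft snapshot for a group."""
    group_id = uuid.UUID(str(group_id))  # fail early on a malformed group id
    # Assumption: the endpoint is invoked with POST.
    return urllib.request.Request(f"{base_url}/raft/trigger_snapshot/{group_id}", method="POST")


# group_id as it appears in the system.raft_state example output.
req = trigger_snapshot_request("7b818380-e9f8-11ed-9316-7c72c96b4bfa")
# To actually send it against a running node:
#   urllib.request.urlopen(req)
```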

Maintenance mode
./scylla --maintenance-mode=true --maintenance-socket=workdir
kostja@hulk:~/work/scylla/db$ cqlsh ./cql.m
Connected to at ./cql.m:9042
[cqlsh 6.2.0 | Scylla 5.5.0~dev-0.20240130.0cbf8f75f016 | CQL spec 3.3.1 | Native protocol v4]
Use HELP for help.
cqlsh>

Enabling Raft
In 6.0 and up Raft is ALWAYS ON
# DEPRECATED/IGNORED
consistent_cluster_management: true

Stay in Touch Konstantin Osipov [email protected] @kostja_osipov @kostja https://www.linkedin.com/in/kostja/
