ScyllaDB Leaps Forward with Dor Laor, CEO of ScyllaDB
ScyllaDB
About This Presentation
Join ScyllaDB’s CEO, Dor Laor, as he introduces the revolutionary tablet architecture that makes one of the fastest databases fully elastic. Dor will also detail the significant advancements in ScyllaDB Cloud’s security and elasticity features as well as the speed boost that ScyllaDB Enterprise 2024.1 received.
Size: 9.73 MB
Language: en
Added: Jun 20, 2024
Slides: 65 pages
Slide Content
ScyllaDB Leaps Forward. Dor Laor, Co-founder & CEO of ScyllaDB
Dor Laor: Hello world, core router, KVM, OSv, ScyllaDB. PhD in Snowboard, aspiring MTB rider. Let's shard!
Learnings from the KVM Hypervisor Dev Period: Layer == overhead; Locking == Evil; Simplicity == True; 1 million op/s == Expectations; Off-by == one || 0x10
ScyllaDB is Proud to Serve!
You’ll hear from many of our customers
Why ScyllaDB? Best high availability in the industry; best disaster recovery in the industry; best scalability in the industry; best price/performance in the industry; auto-tune, out-of-the-box performance; compatible with Cassandra & DynamoDB. The power of Cassandra at the speed of Redis with the usability of DynamoDB. No lock-in: open source software.
Agenda. Part 1: architecture overview; new 2024 results & benchmarks; ScyllaDB Cloud brief. Part 2: ScyllaDB 6.0; tablets; what's coming.
Shard Per Core Architecture (shards vs. threads)
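As a rough illustration of the shard-per-core idea, here is a minimal sketch (Python, not ScyllaDB's actual C++/Seastar code) of routing each partition key to the core that owns it, so that core can process the key without cross-core locking. The hash function, shard count, and modulo placement are assumptions for illustration, not the real token-to-shard mapping.

```python
# Minimal sketch of shard-per-core routing (illustrative, not ScyllaDB code).
# Each partition key hashes to a token; the token picks the owning shard
# (CPU core), so one core handles that key with no cross-core locking.
import hashlib

SMP_COUNT = 16  # assumed number of cores / shards on the node

def token_of(partition_key: bytes) -> int:
    # Stand-in for the Murmur3 token used by ScyllaDB/Cassandra.
    return int.from_bytes(hashlib.blake2b(partition_key, digest_size=8).digest(), "big")

def owning_shard(partition_key: bytes) -> int:
    # Real ScyllaDB biases the high bits of the token; modulo is enough
    # to show the idea of a fixed key-to-core assignment.
    return token_of(partition_key) % SMP_COUNT

for key in [b"user:1", b"user:2", b"user:3"]:
    print(key, "-> shard", owning_shard(key))
```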
ScyllaDB Architecture: homogeneous nodes, ring architecture
ScyllaDB Architecture: homogeneous nodes, ring architecture?@$@#
Can we linearly scale up?
Small vs Large Machines: time between failures (TBF); ease of maintenance; no noisy neighbours; no virtualization or container overhead; no other moving parts. Scale up before out!
Linear Scale Ingestion: 2X, 2X, 2X, 2X, 2X. 3 x 4-vcpu VMs: 1B keys; 3 x 8-vcpu VMs: 2B keys; 3 x 16-vcpu VMs: 4B keys; 3 x 32-vcpu VMs: 8B keys; 3 x 64-vcpu VMs: 16B keys; 3 x 128-vcpu VMs: 32B keys. Credit: Felipe Cardeneti Mendes
Linear Scale Ingestion (2023 vs 2018): constant-time ingestion across each 2X doubling
Ingestion: 17.9k OPS per shard * 16
Ingestion: <2ms P99 per shard
“Nodes must be small, in case they fail” You can scale OPS but can you scale failures?
"Nodes must be small, in case they fail" No they don't! {Replace, Add, Remove} a node at constant time
What’s new in 2024.1?
2024.1 vs 2023.1 vs OSS 5.4
Benchmark: ScyllaDB vs MongoDB
Benchmark: ScyllaDB vs MongoDB. Fair 3rd-party SaaS, zero config; YCSB, standard; 132 wins in 133 workloads; 10x-20x better performance; 10x-20x better latency. MongoDB doesn't scale!
Welcome ScyllaDB. Our journey from eventual to immediate consistency
ScyllaDB today is awesome, but: topology changes are allowed only one at a time; consistency relies on 30+ second timeouts; a failed/down node blocks scaling; streaming time is a function of the schema; additional complex operations: cleanup, repair.
ScyllaDB 6.0 Value. Elasticity: faster bootstrap, concurrent node operations, immediate request serving. Simplicity: transparent cleanups, semi-transparent repairs, auto gc-grace period, parallel maintenance operations. Speed: streaming sstables 30x faster, load balancing of tablets. Consistency. TCO: shrink free space, reduce static, over-provisioned deployments.
Behind the Scenes
Raft, consistent metadata: a protocol for state machine replication; total order broadcast of state-change commands. Example command sequence: X = 0; X += 1; CAS(X, 0, 1); X = 0.
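A minimal sketch of the state-machine-replication idea behind this slide: if every replica applies the same totally ordered command log, all replicas converge to the same metadata value. The command encoding and the apply function below are illustrative, not ScyllaDB internals; only the command sequence comes from the slide.

```python
# Sketch of why total-order broadcast gives consistent metadata: every replica
# applies the same command log in the same order, so all end with the same X.
# The log mirrors the slide's example; this is illustrative, not ScyllaDB code.

def apply(state, cmd):
    op = cmd[0]
    if op == "set":          # X = value
        return cmd[1]
    if op == "add":          # X += value
        return state + cmd[1]
    if op == "cas":          # CAS(X, expected, new)
        _, expected, new = cmd
        return new if state == expected else state
    raise ValueError(op)

log = [("set", 0), ("add", 1), ("cas", 0, 1), ("set", 0)]  # Raft-ordered commands

replicas = {"A": 0, "B": 0, "C": 0}
for name in replicas:
    x = 0
    for cmd in log:
        x = apply(x, cmd)
    replicas[name] = x

print(replicas)  # identical value on every replica
```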
Linearizable token metadata: nodes A, B, and C bootstrap against system.token_metadata, each guarded by a read barrier.
Fencing: each write is signed with the topology version; if there is a version mismatch, the write doesn't go through. Changes in the data plane flow between replica, coordinator, and topology coordinator: Req(V1), Fence(V1), Drain(V1), Req(V2).
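A minimal sketch of the fencing rule just described, using a hypothetical Replica class and in-memory state; the only point it shows is that a write stamped with an older topology version than the replica's fence is rejected.

```python
# Sketch of fencing: every write carries the topology version the coordinator
# saw; a replica that has been fenced to a newer version rejects stale writes.
# Names are illustrative; ScyllaDB does this inside its replica write path.

class Replica:
    def __init__(self):
        self.fence_version = 0
        self.data = {}

    def fence(self, version: int):
        # The topology coordinator raises the fence before changing topology.
        self.fence_version = max(self.fence_version, version)

    def write(self, key, value, topology_version: int) -> bool:
        if topology_version < self.fence_version:
            return False  # stale coordinator view: the write does not go through
        self.data[key] = value
        return True

r = Replica()
print(r.write("k", "v1", topology_version=1))  # True: versions match
r.fence(2)                                     # topology change begins
print(r.write("k", "v2", topology_version=1))  # False: rejected as stale
print(r.write("k", "v2", topology_version=2))  # True: new version accepted
```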
Standard tables: key1 maps to the replica set {R1, R2, R3}; replication metadata is per keyspace.
Standard tables: the sharding function generates good load distribution between CPUs (key1 and key2 spread across replicas R1, R2, R3).
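The standard-table routing above follows the classic token-ring scheme (per-keyspace replication metadata, key mapped to a replica set). Below is a hedged sketch of that lookup, assuming a toy ring of hand-picked tokens, a stand-in hash instead of Murmur3, and RF=3; node names R1-R3 mirror the slide.

```python
# Sketch of how a standard (ring/vnode) table maps a key to its replica set:
# walk the token ring clockwise from the key's token and take RF distinct nodes.
import bisect
import hashlib

RING = sorted([(10, "R1"), (40, "R2"), (70, "R3"), (90, "R1"), (120, "R2"), (150, "R3")])
TOKENS = [t for t, _ in RING]
RF = 3  # replication factor, stored per keyspace

def token_of(key: bytes, modulus: int = 160) -> int:
    # Stand-in hash; real clusters use Murmur3 tokens and many tokens per node.
    return int.from_bytes(hashlib.blake2b(key, digest_size=8).digest(), "big") % modulus

def replicas_for(key: bytes):
    start = bisect.bisect_right(TOKENS, token_of(key)) % len(RING)
    owners, i = [], start
    while len(owners) < RF:
        node = RING[i % len(RING)][1]
        if node not in owners:
            owners.append(node)
        i += 1
    return owners

print(replicas_for(b"key1"))  # e.g. ['R2', 'R3', 'R1']
```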
Raft tables: key1 and key2 each belong to a tablet, and each tablet replica is a member of its own Raft group (e.g. groups No. 299236, 299237, 299238).
Raft tables: good load distribution requires lots of Raft groups.
Tablets - balancing: a table starts with a few tablets, and small tables end there; not fragmented into tiny pieces like with tokens.
Tablets - balancing: when a tablet becomes too heavy (disk, CPU, ...), it is split.
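To make the split rule concrete, here is a small sketch: a tablet tracks its token range and size, and splits at the range midpoint once it crosses a threshold. The 10 GiB target and the even halving of size are assumptions for illustration, not ScyllaDB's actual heuristics.

```python
# Sketch of tablet splitting: a tablet covers a token range; when its size on
# disk (or CPU load) exceeds a target, it is split at the range midpoint into
# two tablets. The threshold below is made up for illustration.

SPLIT_THRESHOLD_BYTES = 10 * 1024**3  # assumed per-tablet size target

class Tablet:
    def __init__(self, first_token, last_token, size_bytes):
        self.first_token, self.last_token, self.size_bytes = first_token, last_token, size_bytes

    def maybe_split(self):
        if self.size_bytes <= SPLIT_THRESHOLD_BYTES:
            return [self]
        mid = (self.first_token + self.last_token) // 2
        left = Tablet(self.first_token, mid, self.size_bytes // 2)
        right = Tablet(mid + 1, self.last_token, self.size_bytes - left.size_bytes)
        return [left, right]

hot = Tablet(0, 1 << 20, size_bytes=24 * 1024**3)
print([(t.first_token, t.last_token, t.size_bytes) for t in hot.maybe_split()])
```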
Tablets - balancing: the load balancer can decide to move tablets.
Tablets - balancing: depends on fault-tolerant, reliable, and fast topology changes.
Tablets: resharding is cheap. SSTables split at tablet boundaries; reassigning tablets to shards is a logical operation.
Tablets: cleanup after a topology change is cheap; just delete SSTables.
Tablet Scheduler: the scheduler globally controls movement and maintenance operations (repair, migration, backup) on a per-tablet basis.
Tablet Scheduler. Goals: maximize throughput (saturate); keep migrations short (don't overload). Rules: migrations-in <= 2 per shard; migrations-out <= 4 per shard.
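A small sketch of how the two scheduler rules above could be enforced; the shard naming and the in-memory counters are illustrative, not the actual scheduler implementation.

```python
# Sketch of the slide's concurrency rules: start a tablet migration only if the
# destination shard has fewer than 2 inbound and the source shard fewer than 4
# outbound migrations in flight.
from collections import Counter

MAX_MIGRATIONS_IN = 2   # per destination shard
MAX_MIGRATIONS_OUT = 4  # per source shard

inbound, outbound = Counter(), Counter()

def try_start_migration(src_shard, dst_shard) -> bool:
    if inbound[dst_shard] >= MAX_MIGRATIONS_IN or outbound[src_shard] >= MAX_MIGRATIONS_OUT:
        return False  # would overload a shard; keep migrations short
    inbound[dst_shard] += 1
    outbound[src_shard] += 1
    return True

def finish_migration(src_shard, dst_shard):
    inbound[dst_shard] -= 1
    outbound[src_shard] -= 1

print(try_start_migration("n1:s0", "n2:s0"))  # True
print(try_start_migration("n1:s0", "n2:s0"))  # True
print(try_start_migration("n1:s0", "n2:s0"))  # False: dst already has 2 inbound
```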
Tablets - Repair: fire & forget; tablet based; continuous and transparent, controlled by the load balancer; auto GC grace period.
Tablets - Streaming (Enterprise): send SSTable files over RPC; no per-schema, per-row processing; 30x faster, saturates links. ScyllaDB Enterprise only.
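To contrast file-based streaming with per-row streaming, a toy sketch follows; the chunk size, the file path, and the local send callback standing in for the RPC are all assumptions for illustration.

```python
# Sketch contrasting the two streaming modes on the slide: per-row streaming
# re-serializes every row, while file-based streaming just ships SSTable bytes
# in large chunks over the wire. The sender below is a local stand-in for RPC.

CHUNK = 1 << 20  # 1 MiB chunks, an assumed transfer unit

def stream_file(path, send):
    """File-based streaming: no per-schema / per-row processing."""
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK):
            send(chunk)

def stream_rows(rows, serialize, send):
    """Row-based streaming: every row is re-serialized before sending."""
    for row in rows:
        send(serialize(row))

sent = []
with open("/tmp/example.sstable", "wb") as f:  # hypothetical SSTable stand-in
    f.write(b"x" * (3 * CHUNK))
stream_file("/tmp/example.sstable", sent.append)
print(len(sent), "chunks sent")  # a few large chunks instead of millions of rows
```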
Post 6.0
Since we have 30x faster streaming: parallel operations; small unit size based on tablet/shard, not hardware; negligible performance impact; incremental serving as you add machines; no reliance on cluster size, instance size, or instance type. Tablets =~ Serverless.
Serverless (charts: volume and throughput over time for i3en/i4i instance types; capacity required vs. on-demand above a base): type-less, size-less, limit-less.
What’s Cooking?
What's in the Oven: full transactional consistency with Raft; HDD and dense standard nodes (d3en); S3 backend; incremental repair; point-in-time backup/restore; tiered storage.
Eventual vs. Immediate Consistency, the old eventual-consistency caveats: do not have any databases before ScyllaDB; always run nodetool cleanup after bootstrapping a new node; run repair within gc-grace seconds; do not bootstrap nodes concurrently, or make any other topology change; do not use SimpleStrategy in a multi-DC setup.