ScyllaDB Leaps Forward with Dor Laor, CEO of ScyllaDB

ScyllaDB · 926 views · 65 slides · Jun 20, 2024

About This Presentation

Join ScyllaDB’s CEO, Dor Laor, as he introduces the revolutionary tablet architecture that makes one of the fastest databases fully elastic. Dor will also detail the significant advancements in ScyllaDB Cloud’s security and elasticity features as well as the speed boost that ScyllaDB Enterprise ...


Slide Content

ScyllaDB Leaps Forward Dor Laor, Co-founder & CEO of ScyllaDB

Dor Laor Hello world, Core router, KVM, OSv, ScyllaDB. PhD in Snowboard, aspiring MTB rider. Let’s shard!

3 Learnings from the KVM Hypervisor Dev Period Layer == overhead Locking == Evil Simplicity == True 1 million op/s == Expectations Off-by == one || 0x10

ScyllaDB is Proud to Serve!

You’ll hear from many of our customers

Why ScyllaDB? Best High Availability in the industry Best Disaster Recovery in the industry Best scalability in the industry Best Price/Performance in the industry Auto-tune - out of the box performance Compatible with Cassandra & DynamoDB The power of Cassandra at the speed of Redis with the usability of DynamoDB No Lock-in Open Source Software
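
Since the deck highlights wire compatibility with both Cassandra (CQL) and DynamoDB, a minimal client sketch might look like the following. This is an illustration only: the hostnames, port, table names, and credentials are placeholders I introduced, and it assumes the `cassandra-driver` and `boto3` packages plus a ScyllaDB deployment with the DynamoDB-compatible (Alternator) API enabled.

```python
# Hedged sketch: talking to ScyllaDB through its two compatible APIs.
# Hostnames, ports, and names below are placeholders, not from the deck.
from cassandra.cluster import Cluster   # CQL / Cassandra-compatible API
import boto3                            # DynamoDB-compatible API (Alternator)

# CQL path: connect exactly as you would to Cassandra.
cluster = Cluster(["scylla-node1.example.com"])
session = cluster.connect()
session.execute(
    "CREATE KEYSPACE IF NOT EXISTS demo "
    "WITH replication = {'class': 'NetworkTopologyStrategy', 'replication_factor': 3}"
)

# DynamoDB path: point the standard AWS SDK at the Alternator endpoint
# (the port is deployment-specific; 8000 is a common default).
dynamodb = boto3.resource(
    "dynamodb",
    endpoint_url="http://scylla-node1.example.com:8000",
    region_name="none",
    aws_access_key_id="none",
    aws_secret_access_key="none",
)
print([t.name for t in dynamodb.tables.all()])
```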

Agenda Part 1 Arch overview New 2024 results & benchmarks ScyllaDB Cloud brief Part 2 ScyllaDB 6.0 Tablets What’s coming

Shard Per Core Architecture Shards Threads

ScyllaDB Architecture Homogeneous nodes Ring Architecture

ScyllaDB Architecture Homogeneous nodes Ring Architecture?@$@#

Can we linearly scale up?

Small vs Large Machines: Time between failures (TBF); Ease of maintenance; No noisy neighbours; No virtualization or container overhead; No other moving parts. Scale up before out!

Linear Scale Ingestion (each step is 2X): 3 x 4-vcpu VMs - 1B keys; 3 x 8-vcpu VMs - 2B keys; 3 x 16-vcpu VMs - 4B keys; 3 x 32-vcpu VMs - 8B keys; 3 x 64-vcpu VMs - 16B keys; 3 x 128-vcpu VMs - 32B keys. Credit: Felipe Cardeneti Mendes

Linear Scale Ingestion (2023 vs 2018) Constant Time Ingestion 2X 2X 2X 2X 2X
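
As a quick sanity check on the linearity claim, the quoted figures double in lockstep: each doubling of vCPUs ingests twice the keys in roughly constant time, so keys per vCPU stays flat. A small Python illustration using only the numbers from the chart:

```python
# The (cluster vCPUs, keys ingested) pairs quoted on the slide.
runs = [(3 * 4, 1e9), (3 * 8, 2e9), (3 * 16, 4e9),
        (3 * 32, 8e9), (3 * 64, 16e9), (3 * 128, 32e9)]

for vcpus, keys in runs:
    # Constant-time ingestion of 2x the data on 2x the vCPUs
    # means keys-per-vCPU (and hence per-core throughput) is flat.
    print(f"{vcpus:4d} vCPUs -> {keys / 1e9:4.0f}B keys, "
          f"{keys / vcpus / 1e6:.1f}M keys per vCPU")
```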

Ingestion: 17.9k OPS per shard * 16

Ingestion: <2ms P99 per shard

“Nodes must be small, in case they fail” You can scale OPS but can you scale failures?

2X 2X 2X 2X 2X “Nodes must be small, in case they fail” No they don’t! {Replace, Add, remove} Node at constant time

What’s new in 2024.1?

2024.1 vs 2023.1 vs OSS 5.4

Benchmark: ScyllaDB vs MongoDB

Benchmark: ScyllaDB vs MongoDB Fair 3rd-party SaaS - zero config YCSB - standard 132 wins in 133 workloads 10x-20x better performance 10x-20x better latency MongoDB doesn’t scale!

Scale Benchmark: ScyllaDB vs MongoDB

ScyllaDB Cloud News

ScyllaDB Cloud - 65% of Customers Terraform providers, launch, scale, manage Multiple networking modes Encryption at rest, BYOK Certifications: SOC 2, ISO, PCI Coming: Azure

Welcome ScyllaDB Our journey from eventual to immediate consistency

Scylla today is awesome, but: topology changes are allowed only one at a time; relies on 30+ second timeouts for consistency; a failed/down node blocks scaling; streaming time is a function of the schema; additional complex operations: cleanup, repair

ScyllaDB 6.0 Consistent schema changes (raft, >= 5.2) Consistent topology changes (raft, 6.0) Tablets
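
For readers who want to try tablets, keyspaces can opt in at creation time. The sketch below is an assumption to verify against the ScyllaDB 6.0 documentation for your release (in particular the exact `tablets` option syntax); hostnames and names are placeholders.

```python
# Hedged sketch: creating a tablets-enabled keyspace from Python.
# The 'tablets' option syntax is an assumption; check your release's docs.
from cassandra.cluster import Cluster

session = Cluster(["scylla-node1.example.com"]).connect()
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS tablet_demo
    WITH replication = {'class': 'NetworkTopologyStrategy', 'replication_factor': 3}
    AND tablets = {'enabled': true}
""")
session.execute("""
    CREATE TABLE IF NOT EXISTS tablet_demo.events (
        id uuid PRIMARY KEY,
        payload text
    )
""")
```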

ScyllaDB 6.0 Value - Elasticity: faster bootstrap, concurrent node operations, immediate request serving. Simplicity: transparent cleanups, semi-transparent repairs, auto gc-grace period, parallel maintenance operations. Speed: streaming sstables - 30x faster, load balancing of tablets. Consistency. TCO: shrink free space, reduce static, over-provisioned deployments.

Behind the Scenes

Raft - Consistent metadata. Protocol for state machine replication; total order broadcast of state change commands. Example command sequence: X = 0; X += 1; CAS(X, 0, 1); X = 0
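
To make "total order broadcast of state change commands" concrete, here is a toy replicated state machine in Python: because every replica applies the same command log in the same order, all replicas end with the same value of X. This is an illustration of the idea only, not ScyllaDB's implementation.

```python
# Toy illustration of total order broadcast: every replica applies the same
# ordered command log, so all replicas converge to the same state.
def apply(state, cmd):
    op = cmd[0]
    if op == "set":            # X = value
        return cmd[1]
    if op == "add":            # X += value
        return state + cmd[1]
    if op == "cas":            # CAS(X, expected, new)
        return cmd[2] if state == cmd[1] else state
    raise ValueError(op)

# The command sequence from the slide: X = 0; X += 1; CAS(X, 0, 1); X = 0
log = [("set", 0), ("add", 1), ("cas", 0, 1), ("set", 0)]

replicas = {"A": None, "B": None, "C": None}
for name in replicas:
    state = None
    for cmd in log:            # same log, same order, on every replica
        state = apply(state, cmd)
    replicas[name] = state

assert len(set(replicas.values())) == 1   # all replicas agree
print(replicas)                           # {'A': 0, 'B': 0, 'C': 0}
```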

Linearizable Token Metadata (diagram): node A, node B, node C bootstrap; system.token_metadata; read barriers

Fencing - each write is signed with a topology version. If there is a version mismatch, the write doesn’t go through. Changes in the data plane (diagram): Replica, Coordinator, Topology coordinator; Req(V1), Req(V1), Fence(V1), Drain(V1), Req(V2)
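
The fencing rule above is easy to sketch: a replica remembers the topology version it has been fenced to and rejects writes stamped with an older version. The class and method names below are illustrative, not ScyllaDB internals, and the "older version" check is a simplification of the version-mismatch rule on the slide.

```python
# Illustrative fencing check: writes carry the topology version they were
# coordinated under; a replica fenced to a newer version rejects stale writes.
class Replica:
    def __init__(self):
        self.fence_version = 0
        self.data = {}

    def fence(self, version):
        # The topology coordinator raises the fence before draining old requests.
        self.fence_version = max(self.fence_version, version)

    def write(self, key, value, topology_version):
        if topology_version < self.fence_version:
            raise RuntimeError("stale topology version, write rejected")
        self.data[key] = value

r = Replica()
r.write("k", "v1", topology_version=1)       # Req(V1) accepted
r.fence(2)                                   # fence raised during a topology change
try:
    r.write("k", "v2", topology_version=1)   # a late Req(V1) now bounces
except RuntimeError as e:
    print(e)
r.write("k", "v3", topology_version=2)       # Req(V2) accepted
```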

Consistent Metadata Journey (5.0 → 5.2 → 5.2+ → 6.0): RAFT, safe schema changes, safe topology changes, dynamic partitioning, consistent tables, tablets

6.0 Tablets FTW

Standard tables: key1 → replicas {R1, R2, R3}; replication metadata is per keyspace

Standard tables: key1, key2 → replicas {R1, R2, R3}. The sharding function generates good load distribution between CPUs
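
A deliberately simplified stand-in for the sharding-function idea: hash the partition key to a token, then map the token onto one of the per-core shards so keys spread evenly across CPUs. ScyllaDB's real algorithm works on token ranges and differs from this sketch; the hash choice and shard count here are my own illustration.

```python
# Simplified stand-in for a sharding function: hash key -> token -> shard.
# Shows only the idea that a good hash spreads keys evenly across per-core shards.
import hashlib
from collections import Counter

SHARDS = 16  # e.g. one shard per core on a 16-vCPU node

def shard_of(key: str) -> int:
    token = int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")
    return token % SHARDS

load = Counter(shard_of(f"key{i}") for i in range(100_000))
print(sorted(load.values()))   # counts per shard are roughly equal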

RAFT tables: key1, key2 → tablet / tablet replicas, each tablet in its own Raft group (e.g. No. 299236, 299237, 299238)

RAFT tables key1 key2 Good load distribution requires lots of RAFT groups.

Tablets - balancing: A table starts with a few tablets; small tables end there, not fragmented into tiny pieces like with tokens

Tablets - balancing When tablet becomes too heavy (disk, CPU, …) it is split
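
The split rule lends itself to a small sketch: when a tablet's share of disk (or CPU) crosses a threshold, cut its token range in half and hand the halves back to the balancer. The threshold value and field names below are made up for illustration.

```python
# Illustrative tablet split: when a tablet grows past a threshold,
# divide its token range in half. Threshold and fields are assumptions.
from dataclasses import dataclass

@dataclass
class Tablet:
    first_token: int
    last_token: int
    size_bytes: int

SPLIT_THRESHOLD = 10 * 1024**3  # e.g. split above 10 GiB

def maybe_split(t: Tablet):
    if t.size_bytes <= SPLIT_THRESHOLD:
        return [t]
    mid = (t.first_token + t.last_token) // 2
    half = t.size_bytes // 2   # assume data is roughly evenly spread
    return [Tablet(t.first_token, mid, half),
            Tablet(mid + 1, t.last_token, t.size_bytes - half)]

print(maybe_split(Tablet(0, 2**63 - 1, 24 * 1024**3)))
```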

Tablets - balancing The load balancer can decide to move tablets

Tablets - balancing Depends on fault-tolerant, reliable, and fast topology changes.

Tablets Resharding is cheap. SSTables split at tablet boundary. Reassign tablets to shards (logical operation).

Tablets Cleanup after topology change is cheap. Just delete SSTables.

Tablet Scheduler: the scheduler globally controls movement and maintenance operations (repair, migration, backup) on a per-tablet basis, scheduling work for tablet 0, tablet 1, …

Tablet Scheduler Goals: maximize throughput (saturate); keep migrations short (don’t overload). Rules: migrations-in <= 2 per shard; migrations-out <= 4 per shard
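
The two rules quoted above (at most 2 concurrent incoming and 4 concurrent outgoing migrations per shard) can be expressed as a simple admission check. The code below is only a sketch of that constraint, not the actual scheduler.

```python
# Sketch of the per-shard migration limits quoted on the slide:
# at most 2 concurrent migrations in, and 4 out, per shard.
from collections import Counter

MAX_IN, MAX_OUT = 2, 4

class MigrationLimits:
    def __init__(self):
        self.inbound = Counter()   # shard -> active incoming migrations
        self.outbound = Counter()  # shard -> active outgoing migrations

    def try_start(self, src_shard, dst_shard) -> bool:
        if (self.outbound[src_shard] >= MAX_OUT or
                self.inbound[dst_shard] >= MAX_IN):
            return False           # would overload a shard; defer this migration
        self.outbound[src_shard] += 1
        self.inbound[dst_shard] += 1
        return True

    def finish(self, src_shard, dst_shard):
        self.outbound[src_shard] -= 1
        self.inbound[dst_shard] -= 1

limits = MigrationLimits()
print([limits.try_start(("nodeA", 0), ("nodeB", 0)) for _ in range(3)])
# -> [True, True, False]: the third migration waits for a free slot
```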

Tablets - Repair - Fire & forget: tablet-based; continuous, transparent, controlled by the load balancer; auto GC grace period

Tablets - Streaming (Enterprise): send SSTable files over RPC; no per-schema, per-row processing; 30x faster, saturates links. ScyllaDB Enterprise only.

Post 6.0

Since we have 30x faster streaming: parallel operations; small unit size - based on tablet/shard, not hardware; negligible performance impact; incremental serving as you add machines; no reliance on cluster size, instance size, or instance type. Tablets =~ Serverless

Serverless (charts: volume and throughput over time; i3en / i4i instance types; capacity required vs. on-demand vs. base). Type less. Size less. Limit less.

What’s Cooking?

What’s in the Oven: Full transactional consistency with Raft; HDD and dense standard nodes (d3en); S3 backend; Incremental repair; Point-in-time backup/restore; Tiered storage. (Old operational rules, repeated on the slide:) Do not have any databases before ScyllaDB. Always run nodetool cleanup after bootstrapping a new node. Run repair within gc-grace seconds. Do not bootstrap nodes concurrently, or make any other topology change. Do not use SimpleStrategy in a multi-DC setup. Eventual → Immediate Consistency

Thank You! Keep Innovating! IoT, Crypto, eCommerce, Telco, Feature Store, Streaming, Fintech, Cyber, Social Network, Graph Storage Layer, Recommendation & Personalization Engine, Fraud & Threat Detection, AI/ML, Analytics, Customer Experience