ScyllaDB Leaps Forward with Dor Laor, CEO of ScyllaDB
ScyllaDB
About This Presentation
Join ScyllaDB’s CEO, Dor Laor, as he introduces the revolutionary tablet architecture that makes one of the fastest databases fully elastic. Dor will also detail the significant advancements in ScyllaDB Cloud’s security and elasticity features as well as the speed boost that ScyllaDB Enterprise 2024.1 received.
Size: 9.73 MB
Language: en
Added: Jun 20, 2024
Slides: 65 pages
Slide Content
ScyllaDB Leaps Forward. Dor Laor, Co-founder & CEO of ScyllaDB
Dor Laor: Hello world, core router, KVM, OSv, ScyllaDB. PhD in Snowboard, aspiring MTB rider. Let's shard!
Learnings from the KVM Hypervisor Dev Period: Layer == overhead; Locking == Evil; Simplicity == True; 1 million op/s == Expectations; Off-by == one || 0x10
ScyllaDB is Proud to Serve!
You’ll hear from many of our customers
Why ScyllaDB? Best high availability in the industry; best disaster recovery in the industry; best scalability in the industry; best price/performance in the industry; auto-tune, out-of-the-box performance; compatible with Cassandra & DynamoDB. The power of Cassandra at the speed of Redis with the usability of DynamoDB. No lock-in: open source software.
Agenda. Part 1: architecture overview; new 2024 results & benchmarks; ScyllaDB Cloud brief. Part 2: ScyllaDB 6.0; tablets; what's coming.
Shard Per Core Architecture (shards vs. threads)
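As a rough illustration of the shard-per-core idea, here is a minimal sketch (Python, not ScyllaDB's actual C++/Seastar code) of routing each partition key to the core that owns it, so that core can process the key without cross-core locking. The hash function, shard count, and modulo placement are assumptions for illustration, not the real token-to-shard mapping.

```python
# Minimal sketch of shard-per-core routing (illustrative, not ScyllaDB code).
# Each partition key hashes to a token; the token picks the owning shard
# (CPU core), so one core handles that key with no cross-core locking.
import hashlib

SMP_COUNT = 16  # assumed number of cores / shards on the node

def token_of(partition_key: bytes) -> int:
    # Stand-in for the Murmur3 token used by ScyllaDB/Cassandra.
    return int.from_bytes(hashlib.blake2b(partition_key, digest_size=8).digest(), "big")

def owning_shard(partition_key: bytes) -> int:
    # Real ScyllaDB biases the high bits of the token; modulo is enough
    # to show the idea of a fixed key-to-core assignment.
    return token_of(partition_key) % SMP_COUNT

for key in [b"user:1", b"user:2", b"user:3"]:
    print(key, "-> shard", owning_shard(key))
```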
ScyllaDB Architecture: homogeneous nodes, ring architecture
ScyllaDB Architecture: homogeneous nodes, ring architecture?@$@#
Can we linearly scale up?
Small vs Large Machines: time between failures (TBF); ease of maintenance; no noisy neighbours; no virtualization or container overhead; no other moving parts. Scale up before out!
Linear Scale Ingestion: 2X, 2X, 2X, 2X, 2X. 3 x 4-vcpu VMs: 1B keys; 3 x 8-vcpu VMs: 2B keys; 3 x 16-vcpu VMs: 4B keys; 3 x 32-vcpu VMs: 8B keys; 3 x 64-vcpu VMs: 16B keys; 3 x 128-vcpu VMs: 32B keys. Credit: Felipe Cardeneti Mendes
Linear Scale Ingestion (2023 vs 2018): constant-time ingestion across each 2X doubling
Ingestion: 17.9k OPS per shard * 16
Ingestion: <2ms P99 per shard
“Nodes must be small, in case they fail” You can scale OPS but can you scale failures?
"Nodes must be small, in case they fail" No they don't! {Replace, Add, Remove} a node at constant time
What’s new in 2024.1?
2024.1 vs 2023.1 vs OSS 5.4
Benchmark: ScyllaDB vs MongoDB
Benchmark: ScyllaDB vs MongoDB. Fair 3rd-party SaaS, zero config; YCSB, standard; 132 wins in 133 workloads; 10x-20x better performance; 10x-20x better latency. MongoDB doesn't scale!
Welcome ScyllaDB. Our journey from eventual to immediate consistency
ScyllaDB today is awesome, but: topology changes are allowed only one at a time; consistency relies on 30+ second timeouts; a failed/down node blocks scaling; streaming time is a function of the schema; additional complex operations: cleanup, repair.
ScyllaDB 6.0 Value. Elasticity: faster bootstrap, concurrent node operations, immediate request serving. Simplicity: transparent cleanups, semi-transparent repairs, auto gc-grace period, parallel maintenance operations. Speed: streaming sstables 30x faster, load balancing of tablets. Consistency. TCO: shrink free space, reduce static, over-provisioned deployments.
Behind the Scenes
Raft, consistent metadata: a protocol for state machine replication; total order broadcast of state-change commands. Example command sequence: X = 0; X += 1; CAS(X, 0, 1); X = 0.
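A minimal sketch of the state-machine-replication idea behind this slide: if every replica applies the same totally ordered command log, all replicas converge to the same metadata value. The command encoding and the apply function below are illustrative, not ScyllaDB internals; only the command sequence comes from the slide.

```python
# Sketch of why total-order broadcast gives consistent metadata: every replica
# applies the same command log in the same order, so all end with the same X.
# The log mirrors the slide's example; this is illustrative, not ScyllaDB code.

def apply(state, cmd):
    op = cmd[0]
    if op == "set":          # X = value
        return cmd[1]
    if op == "add":          # X += value
        return state + cmd[1]
    if op == "cas":          # CAS(X, expected, new)
        _, expected, new = cmd
        return new if state == expected else state
    raise ValueError(op)

log = [("set", 0), ("add", 1), ("cas", 0, 1), ("set", 0)]  # Raft-ordered commands

replicas = {"A": 0, "B": 0, "C": 0}
for name in replicas:
    x = 0
    for cmd in log:
        x = apply(x, cmd)
    replicas[name] = x

print(replicas)  # identical value on every replica
```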
Linearizable token metadata: nodes A, B, and C bootstrap against system.token_metadata, each guarded by a read barrier.
Fencing: each write is signed with the topology version; if there is a version mismatch, the write doesn't go through. Changes in the data plane flow between replica, coordinator, and topology coordinator: Req(V1), Fence(V1), Drain(V1), Req(V2).
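A minimal sketch of the fencing rule just described, using a hypothetical Replica class and in-memory state; the only point it shows is that a write stamped with an older topology version than the replica's fence is rejected.

```python
# Sketch of fencing: every write carries the topology version the coordinator
# saw; a replica that has been fenced to a newer version rejects stale writes.
# Names are illustrative; ScyllaDB does this inside its replica write path.

class Replica:
    def __init__(self):
        self.fence_version = 0
        self.data = {}

    def fence(self, version: int):
        # The topology coordinator raises the fence before changing topology.
        self.fence_version = max(self.fence_version, version)

    def write(self, key, value, topology_version: int) -> bool:
        if topology_version < self.fence_version:
            return False  # stale coordinator view: the write does not go through
        self.data[key] = value
        return True

r = Replica()
print(r.write("k", "v1", topology_version=1))  # True: versions match
r.fence(2)                                     # topology change begins
print(r.write("k", "v2", topology_version=1))  # False: rejected as stale
print(r.write("k", "v2", topology_version=2))  # True: new version accepted
```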
Standard tables: key1 maps to the replica set {R1, R2, R3}; replication metadata is per keyspace.
Standard tables: the sharding function generates good load distribution between CPUs (key1 and key2 spread across replicas R1, R2, R3).
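The standard-table routing above follows the classic token-ring scheme (per-keyspace replication metadata, key mapped to a replica set). Below is a hedged sketch of that lookup, assuming a toy ring of hand-picked tokens, a stand-in hash instead of Murmur3, and RF=3; node names R1-R3 mirror the slide.

```python
# Sketch of how a standard (ring/vnode) table maps a key to its replica set:
# walk the token ring clockwise from the key's token and take RF distinct nodes.
import bisect
import hashlib

RING = sorted([(10, "R1"), (40, "R2"), (70, "R3"), (90, "R1"), (120, "R2"), (150, "R3")])
TOKENS = [t for t, _ in RING]
RF = 3  # replication factor, stored per keyspace

def token_of(key: bytes, modulus: int = 160) -> int:
    # Stand-in hash; real clusters use Murmur3 tokens and many tokens per node.
    return int.from_bytes(hashlib.blake2b(key, digest_size=8).digest(), "big") % modulus

def replicas_for(key: bytes):
    start = bisect.bisect_right(TOKENS, token_of(key)) % len(RING)
    owners, i = [], start
    while len(owners) < RF:
        node = RING[i % len(RING)][1]
        if node not in owners:
            owners.append(node)
        i += 1
    return owners

print(replicas_for(b"key1"))  # e.g. ['R2', 'R3', 'R1']
```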
Raft tables: key1 and key2 each belong to a tablet, and each tablet replica is a member of its own Raft group (e.g. groups No. 299236, 299237, 299238).
Raft tables: good load distribution requires lots of Raft groups.
Tablets - balancing: a table starts with a few tablets, and small tables end there; not fragmented into tiny pieces like with tokens.
Tablets - balancing: when a tablet becomes too heavy (disk, CPU, ...), it is split.
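To make the split rule concrete, here is a small sketch: a tablet tracks its token range and size, and splits at the range midpoint once it crosses a threshold. The 10 GiB target and the even halving of size are assumptions for illustration, not ScyllaDB's actual heuristics.

```python
# Sketch of tablet splitting: a tablet covers a token range; when its size on
# disk (or CPU load) exceeds a target, it is split at the range midpoint into
# two tablets. The threshold below is made up for illustration.

SPLIT_THRESHOLD_BYTES = 10 * 1024**3  # assumed per-tablet size target

class Tablet:
    def __init__(self, first_token, last_token, size_bytes):
        self.first_token, self.last_token, self.size_bytes = first_token, last_token, size_bytes

    def maybe_split(self):
        if self.size_bytes <= SPLIT_THRESHOLD_BYTES:
            return [self]
        mid = (self.first_token + self.last_token) // 2
        left = Tablet(self.first_token, mid, self.size_bytes // 2)
        right = Tablet(mid + 1, self.last_token, self.size_bytes - left.size_bytes)
        return [left, right]

hot = Tablet(0, 1 << 20, size_bytes=24 * 1024**3)
print([(t.first_token, t.last_token, t.size_bytes) for t in hot.maybe_split()])
```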
Tablets - balancing: the load balancer can decide to move tablets.
Tablets - balancing: depends on fault-tolerant, reliable, and fast topology changes.
Tablets: resharding is cheap. SSTables split at tablet boundaries; reassigning tablets to shards is a logical operation.
Tablets: cleanup after a topology change is cheap; just delete SSTables.
Tablet Scheduler: the scheduler globally controls movement and maintenance operations (repair, migration, backup) on a per-tablet basis.
Tablet Scheduler. Goals: maximize throughput (saturate); keep migrations short (don't overload). Rules: migrations-in <= 2 per shard; migrations-out <= 4 per shard.
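A small sketch of how the two scheduler rules above could be enforced; the shard naming and the in-memory counters are illustrative, not the actual scheduler implementation.

```python
# Sketch of the slide's concurrency rules: start a tablet migration only if the
# destination shard has fewer than 2 inbound and the source shard fewer than 4
# outbound migrations in flight.
from collections import Counter

MAX_MIGRATIONS_IN = 2   # per destination shard
MAX_MIGRATIONS_OUT = 4  # per source shard

inbound, outbound = Counter(), Counter()

def try_start_migration(src_shard, dst_shard) -> bool:
    if inbound[dst_shard] >= MAX_MIGRATIONS_IN or outbound[src_shard] >= MAX_MIGRATIONS_OUT:
        return False  # would overload a shard; keep migrations short
    inbound[dst_shard] += 1
    outbound[src_shard] += 1
    return True

def finish_migration(src_shard, dst_shard):
    inbound[dst_shard] -= 1
    outbound[src_shard] -= 1

print(try_start_migration("n1:s0", "n2:s0"))  # True
print(try_start_migration("n1:s0", "n2:s0"))  # True
print(try_start_migration("n1:s0", "n2:s0"))  # False: dst already has 2 inbound
```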
Tablets - Repair: fire & forget; tablet based; continuous and transparent, controlled by the load balancer; auto GC grace period.
Tablets - Streaming (Enterprise): send SSTable files over RPC; no per-schema, per-row processing; 30x faster, saturates links. ScyllaDB Enterprise only.
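To contrast file-based streaming with per-row streaming, a toy sketch follows; the chunk size, the file path, and the local send callback standing in for the RPC are all assumptions for illustration.

```python
# Sketch contrasting the two streaming modes on the slide: per-row streaming
# re-serializes every row, while file-based streaming just ships SSTable bytes
# in large chunks over the wire. The sender below is a local stand-in for RPC.

CHUNK = 1 << 20  # 1 MiB chunks, an assumed transfer unit

def stream_file(path, send):
    """File-based streaming: no per-schema / per-row processing."""
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK):
            send(chunk)

def stream_rows(rows, serialize, send):
    """Row-based streaming: every row is re-serialized before sending."""
    for row in rows:
        send(serialize(row))

sent = []
with open("/tmp/example.sstable", "wb") as f:  # hypothetical SSTable stand-in
    f.write(b"x" * (3 * CHUNK))
stream_file("/tmp/example.sstable", sent.append)
print(len(sent), "chunks sent")  # a few large chunks instead of millions of rows
```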
Post 6.0
Since we have 30x faster streaming: parallel operations; small unit size based on tablet/shard, not hardware; negligible performance impact; incremental serving as you add machines; no reliance on cluster size, instance size, or instance type. Tablets =~ Serverless.
Serverless (charts: volume and throughput over time for i3en/i4i instance types; capacity required vs. on-demand above a base): type-less, size-less, limit-less.
What’s Cooking?
What's in the Oven: full transactional consistency with Raft; HDD and dense standard nodes (d3en); S3 backend; incremental repair; point-in-time backup/restore; tiered storage.
Eventual vs. Immediate Consistency, the old eventual-consistency caveats: do not have any databases before ScyllaDB; always run nodetool cleanup after bootstrapping a new node; run repair within gc-grace seconds; do not bootstrap nodes concurrently, or make any other topology change; do not use SimpleStrategy in a multi-DC setup.