Building a Cloud Native LSM on Object Storage

ScyllaDB · 27 slides · Oct 14, 2024

About This Presentation

Excited to introduce SlateDB, an open-source, cloud-native storage engine. Built as an LSM on object stores like S3/GCS/ABS, it leverages object storage benefits while tackling unique latency and cost challenges. Join us to explore our design decisions and tradeoffs. #DevTalk #SlateDB


Slide Content

A ScyllaDB Community
Building a Cloud Native LSM on
Object Storage
Chris Riccomini
Materialized View Capital
Rohan Desai
Responsive

Chris Riccomini (he/him)
GP at Materialized View Capital
■Investor: Materialized View Capital
■Engineer: ex-LinkedIn, ex-WePay
■Open Source: SlateDB, Apache Samza, Apache Airflow
■Writing: The Missing README, Materialized View

Rohan Desai (he/him)
Co-Founder at Responsive
■Co-Founder: Responsive
■Engineer: ex-Confluent, ex-Yahoo
■Open Source: SlateDB, Responsive, ksqlDB

Let’s talk about SlateDB.
■Backstory
■Overview
■Architecture
■Performance

The Plan

Backstory

Rise of Object Storage
(Chris is an investor)

Latency, Cost, Durability: Pick Two
https://bsky.app/profile/chris.blue/post/3kqipq5bfos2k

The Cloud Storage Triad

The Cloud Storage Triad
https://materializedview.io/p/cloud-storage-triad-latency-cost-durability

We believe that the future of object storage is multi-region, low-latency buckets that support atomic CAS operations. Inspired by The Cloud Storage Triad: Latency, Cost, Durability, we set out to build a storage engine built for the cloud. SlateDB is that storage engine.
https://slatedb.io/docs/introduction

Overview

A cloud native embedded storage engine built on object storage.
■In-process (Rust) library
■Key-Value interface
■All writes go to object storage
■Implemented as a log-structured merge-tree (LSM)


What is SlateDB?
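To make "in-process library with a key-value interface" concrete, here is a minimal usage sketch. The crate paths and method signatures below are assumptions for illustration; see slatedb.io and the GitHub README for the actual API.

```rust
// Minimal usage sketch of an embedded, object-store-backed KV engine.
// Crate paths and signatures are illustrative assumptions, not a verbatim
// copy of the SlateDB API.
use std::sync::Arc;

use object_store::{memory::InMemory, path::Path, ObjectStore};
use slatedb::db::Db; // assumed module path

#[tokio::main]
async fn main() {
    // Any object_store backend works: S3, GCS, ABS, or in-memory for tests.
    let store: Arc<dyn ObjectStore> = Arc::new(InMemory::new());

    // Open the database under a prefix inside the bucket.
    let db = Db::open(Path::from("demo/db"), store).await.expect("open");

    // put() resolves only once the WAL SST holding this write has been
    // flushed to object storage -- hence the ~50-100ms write latency.
    db.put(b"user:42", b"alice").await;

    // get() consults the memtables first, then L0 SSTs and sorted runs.
    let value = db.get(b"user:42").await.expect("get");
    assert!(value.is_some());

    db.close().await.expect("close");
}
```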

■Zero-disk architecture
■Single-writer
■Multi-reader
■Read caching
■Writer fencing
■Snapshot isolation ᵗᵒᵈᵒ
■Transactions ᵗᵒᵈᵒ
■Pluggable compaction ᵗᵒᵈᵒ

Features
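Single-writer plus writer fencing is what keeps a zombie writer from corrupting state after a new writer takes over. Below is a hedged sketch of the general epoch-plus-CAS idea the intro alludes to; the names and structure are illustrative only, not SlateDB's actual implementation: a new writer bumps an epoch in the manifest with a conditional write, and anything still holding the old epoch is fenced.

```rust
// Sketch of epoch-based writer fencing over an object store's
// compare-and-swap (conditional put). Illustrative only.
struct Manifest {
    writer_epoch: u64,
    version: u64, // stand-in for the object version / etag used by the CAS
}

trait ManifestStore {
    /// Read the current manifest together with its version tag.
    fn read(&self) -> Manifest;
    /// Write `new` only if the stored version still equals `expected_version`.
    /// Returns false if another writer updated the manifest first.
    fn compare_and_swap(&self, expected_version: u64, new: Manifest) -> bool;
}

/// Claim writership: bump the epoch with a CAS, retrying on races.
fn fence_previous_writer(store: &dyn ManifestStore) -> u64 {
    loop {
        let current = store.read();
        let my_epoch = current.writer_epoch + 1;
        let new = Manifest { writer_epoch: my_epoch, version: current.version + 1 };
        if store.compare_and_swap(current.version, new) {
            // From here on, any writer holding an older epoch must treat
            // itself as fenced and stop writing.
            return my_epoch;
        }
        // Lost the race; re-read and try again.
    }
}
```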

SlateDB is designed for use cases that are tolerant of 50-100ms write latency, are tolerant of data loss during failure, or are willing to pay for frequent API PUT calls.
■Stream processing
■Serverless functions
■Durable execution
■Workflow orchestration
■Durable caches
■Data lakes
■Online transaction processing ᵗᵒᵈᵒ
Use Cases

Architecture

Block Diagram
[Diagram] put(k, v) lands in the in-memory mutable WAL (frozen into an immutable WAL) and the mutable memtable (frozen into a frozen memtable); get(k) can serve uncommitted reads straight from these in-memory tables. Object storage holds three prefixes: wal/ with counter-named WAL SSTs (e.g. 00000000000000000073.sst), compacted/ with ULID-named SSTs organized into L0 and sorted runs such as SR1 (e.g. 01J53ZKSXP1MCCPENTTFXTQ6HS.sst), and manifest/ with counter-named manifests (e.g. 00000000000000000000.manifest).

Write Path
[Diagram] put(k, v) goes to the in-memory mutable WAL; every flush_ms it is frozen into an immutable WAL and written under wal/ as a counter-named SST (e.g. 00000000000000000073.sst). Once the memtable reaches l0_sst_size_bytes it is frozen and flushed under compacted/ as a ULID-named SST (e.g. 01J53ZKSXP1MCCPENTTFXTQ6HS.sst).

Read Path
[Diagram] get(k) is served from the in-memory WAL and memtables first (uncommitted reads), then from the L0 SSTs and the sorted run SR1 under compacted/ in object storage.

Compactor
[Diagram] The writer produces ULID-named L0 SSTs; the compactor's orchestrator runs a pluggable scheduler (which decides what to compact) and a pluggable executor (which performs the compactions), merging L0 SSTs and sorted runs (SR1) into new sorted runs (SR2). The writer sends db updates, the orchestrator tracks compactions and their status, and writer and compactor coordinate through the manifest, which each reads and writes; the compactor reads source SSTs and writes the compacted output.
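The diagram marks the scheduler and executor as pluggable. A hedged, trait-level sketch of what that split could look like follows; the names are illustrative, not the real SlateDB traits: the scheduler decides which SSTs and sorted runs to merge, and the executor performs the merge.

```rust
// Illustrative sketch of a pluggable compaction split (not the real traits).
/// A unit of work: merge these source SSTs / sorted runs into one sorted run.
struct Compaction {
    sources: Vec<String>, // ids of L0 SSTs and sorted runs to merge
    destination: u32,     // id of the sorted run to produce
}

/// Decides *when* and *what* to compact, based on the current DB state.
trait CompactionScheduler {
    fn maybe_schedule(&self, l0_ssts: &[String], sorted_runs: &[u32]) -> Vec<Compaction>;
}

/// Performs the merge, e.g. streaming SSTs from object storage,
/// merge-sorting them, and writing the new sorted run back.
trait CompactionExecutor {
    fn execute(&self, compaction: Compaction);
}

/// A trivial scheduler: compact whenever there are too many L0 SSTs.
struct SizeTieredLike {
    max_l0: usize,
}

impl CompactionScheduler for SizeTieredLike {
    fn maybe_schedule(&self, l0_ssts: &[String], sorted_runs: &[u32]) -> Vec<Compaction> {
        if l0_ssts.len() >= self.max_l0 {
            let next_sr = sorted_runs.iter().max().copied().unwrap_or(0) + 1;
            vec![Compaction { sources: l0_ssts.to_vec(), destination: next_sr }]
        } else {
            vec![]
        }
    }
}
```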

Want More?
https://github.com/slatedb/slatedb/blob/main/rfcs

Performance

Fencing Simulator
A little simulator to test the fencing protocol. Just a test, early days, no tuning, YMMV, etc.
■Instance: t2.2xlarge (us-east-1)
■Bucket: us-east-1
■Configuration: 1KiB write payload, flush_ms 5ms
■Latency
●Mean: 40.44ms
●Median: 36ms
●99th percentile (p99): 67ms
●Minimum: 28ms
●Maximum: 67ms
https://github.com/slatedb/simluator/issues/1

Benchmark to test compaction speed of a single compaction step (compacting multiple SSTs/SRs to 1 SR)
■Instance: m5.xlarge (4 cores, 16GB RAM, 1.25Gbit baseline network, us-west-2)
■Bucket: us-west-2
■Configuration: 32 1GB SSTs to 1 SR, max_sst_size 1GB
■Duration: 302,491ms => 864Mbps / 108MBps (see the check below)
■Utilization: 1.5 cores
■With 2 parallel compactions we can fully utilize the available network
Compaction Bench
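As a rough sanity check on those throughput numbers (treating each SST as ~1,024MB): 32 × 1,024MB ≈ 32,768MB moved in ~302.5s is about 108MB/s, or ×8 ≈ 864Mbit/s, which matches the reported figures and sits at roughly 70% of the instance's 1.25Gbit baseline, which is why a second parallel compaction can soak up the remaining bandwidth.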

Get Started!
slatedb.io
github.com/slatedb/slatedb

Thank you! Let’s connect.
Chris Riccomini
@criccomini
linkedin.com/in/riccomini
materializedview.io
Rohan Desai
@_RohanDesai
linkedin.com/in/rohanpd

Addendum

Write Path
■Call put on the client
■Write to the mutable, in-memory WAL table
■After flush_ms milliseconds
●Freeze the mutable WAL into an immutable WAL
●Asynchronously write the immutable WAL to object storage
■On WAL write success
●Merge the flushed WAL table into the mutable memtable
●Notify all await'ing writers
●If the memtable is >= l0_sst_size_bytes, freeze it and write it as an L0 SSTable in the object store
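A simplified, hedged sketch of that flow; the type and function names are illustrative, and object-store I/O and writer wakeups are stubbed out.

```rust
use std::collections::BTreeMap;

// Hedged sketch of the write path described above; illustrative only.
type Table = BTreeMap<Vec<u8>, Vec<u8>>;

struct Db {
    wal: Table,      // mutable, in-memory WAL table
    memtable: Table, // mutable memtable
    l0_sst_size_bytes: usize,
}

impl Db {
    fn put(&mut self, key: &[u8], value: &[u8]) {
        // Writes land in the mutable WAL table first.
        self.wal.insert(key.to_vec(), value.to_vec());
    }

    // Called by a background task every flush_ms milliseconds.
    fn flush_tick(&mut self) {
        // Freeze the mutable WAL into an immutable WAL.
        let immutable_wal = std::mem::take(&mut self.wal);
        // Write the immutable WAL to object storage as wal/<counter>.sst.
        write_wal_sst(&immutable_wal);
        // On success: merge it into the memtable and notify awaiting writers,
        // so their put() calls resolve only after the write is durable.
        self.memtable.extend(immutable_wal);
        notify_writers();
        // If the memtable is large enough, freeze it and write an L0 SST.
        if approx_size(&self.memtable) >= self.l0_sst_size_bytes {
            let frozen = std::mem::take(&mut self.memtable);
            write_l0_sst(&frozen); // compacted/<ulid>.sst
        }
    }
}

// Stubs standing in for real object-store writes and writer wakeups.
fn write_wal_sst(_t: &Table) {}
fn write_l0_sst(_t: &Table) {}
fn notify_writers() {}
fn approx_size(t: &Table) -> usize {
    t.iter().map(|(k, v)| k.len() + v.len()).sum()
}
```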

Read Path
■Call get on the client
■Look for the key in order of…
●Mutable memtable
●Immutable memtable
●L0 SSTables (newest to oldest, using bloom filtering)
●Sorted runs (newest to oldest, using bloom filtering)
■Return the first value found, or none if the key doesn't exist or a deletion tombstone is found first
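And a hedged sketch of that lookup order; the types and the empty-value tombstone convention are illustrative assumptions, and the SST stubs stand in for bloom-filter checks plus block reads against object storage.

```rust
use std::collections::BTreeMap;

// Stand-in for an SST or sorted run reachable in object storage.
struct Sst;
impl Sst {
    fn bloom_might_contain(&self, _key: &[u8]) -> bool { true }
    fn read(&self, _key: &[u8]) -> Option<Vec<u8>> { None }
}

struct ReadState {
    memtable: BTreeMap<Vec<u8>, Vec<u8>>,
    frozen_memtable: BTreeMap<Vec<u8>, Vec<u8>>,
    l0_ssts: Vec<Sst>,     // newest first
    sorted_runs: Vec<Sst>, // newest first
}

fn get(db: &ReadState, key: &[u8]) -> Option<Vec<u8>> {
    // 1. Mutable memtable, then immutable (frozen) memtable: newest data wins.
    for table in [&db.memtable, &db.frozen_memtable] {
        if let Some(v) = table.get(key) {
            return strip_tombstone(v);
        }
    }
    // 2. L0 SSTables, newest to oldest, skipping SSTs whose bloom filter
    //    says the key cannot be present.
    for sst in &db.l0_ssts {
        if sst.bloom_might_contain(key) {
            if let Some(v) = sst.read(key) {
                return strip_tombstone(&v);
            }
        }
    }
    // 3. Sorted runs, newest to oldest, with the same bloom-filter skip.
    for sr in &db.sorted_runs {
        if sr.bloom_might_contain(key) {
            if let Some(v) = sr.read(key) {
                return strip_tombstone(&v);
            }
        }
    }
    // 4. Not found anywhere.
    None
}

// Assumed convention: a deletion is stored as an empty value (tombstone),
// so a lookup that hits one returns None instead of falling through.
fn strip_tombstone(v: &[u8]) -> Option<Vec<u8>> {
    if v.is_empty() { None } else { Some(v.to_vec()) }
}
```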