Designing a Query Queue for ScyllaDB by Avi Kivity

ScyllaDB · Oct 14, 2024

About This Presentation

Database queries vary widely—from milliseconds to hours. Optimizing concurrency is a delicate balance of CPU, memory, and stability. Bad design can lead to high latency or crashes. Join us to learn how we designed ScyllaDB's query queue to handle these challenges. #Database #ScyllaDB


Slide Content

A ScyllaDB Community
Avi Kivity, CTO & Co-founder
Designing a Query Queue for ScyllaDB

Avi Kivity

CTO at ScyllaDB
■Linux KVM, Seastar, ScyllaDB

Typical queue presentation
●SPSC/MPSC Queue
●1 instruction/enqueue, 2 instructions/dequeue
●No lock!
●Tricky code involving memory barriers
●Tested on a 435-core machine @ 331,442,336 ops/sec

This is not that presentation!
SPSC queue (and similar)
●Tiny amounts of memory per item
●Fixed size, regular
●Few instructions
●Millions+ ops/sec
●CPU only
This queue
●Megabytes (potentially) per item
●Variable size
●Millions of instructions
●10k ops/sec/vCPU
●CPU + I/O

About ScyllaDB
●Horizontally and vertically scalable NoSQL database
○Horizontally = just add nodes
○Vertically = happy to use fat nodes
●Using the thread-per-core Seastar framework
●Cassandra- and DynamoDB-compatible

ScyllaDB read path overview
[Diagram: coordinators and replicas; a client request fans out from a coordinator to the replicas that own the data]
Client request (“SELECT”)
●Coordinator selects replicas that own the data
●Issues reads to replicas, merges responses

Replica-side read features
●Variable response size
○One 100-byte row
○100 10kB rows
○Zero rows
●Variable effort
○Lookup cache, return row
○Merge data from memtable and 8 sstables
■Many megabytes of parallel and sequential I/O
●Variable timeout
●Variable importance to end-user
●Queries can pause and restart (“paging”)

Why queue?
Hypothesis: let requests execute as they arrive
●Simple
●No time lost waiting in queue
Problems
●Queries use too much memory, going out-of-memory (OOM)
●Queries compete against each other
○Everyone loses
○Queries time out during execution
○Partial work is thrown away
[Diagram: RPC dispatches queries 1–5 straight into execution, with no queue]

Simple, no concurrency queue
●One (at most) executing query
●The rest wait
●Advantages
○Won’t OOM
○Minimal latency for executing query
○Queries in queue can be timed out before starting execution
●Disadvantages
○Disk underutilized
○CPU unused while waiting for disk
[Diagram: queries 1–5 wait in a queue; at most one at a time moves into execution]

Count limits
Allow 100 queries to execute, the rest wait

Advantages
●Saturate CPU, disk
Disadvantages
●Queries use too much memory, going out-of-memory (OOM)
●Queries compete against each other


[Diagram: queries 1–5 flow from the queue into a pool of concurrently executing queries]
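As a concrete illustration of count-limited admission, here is a minimal C++20 sketch built on a counting semaphore. All names (kMaxConcurrent, run_with_limit) are hypothetical, and the real ScyllaDB queue is asynchronous and per-shard rather than thread-blocking; with a limit of 1 this degenerates to the serialized queue from the previous slide.

#include <semaphore>
#include <utility>

// Hypothetical count-limited admission gate: at most kMaxConcurrent
// queries execute at once; the rest block until a slot frees up.
constexpr int kMaxConcurrent = 100;
std::counting_semaphore<kMaxConcurrent> slots(kMaxConcurrent);

template <typename Query>
void run_with_limit(Query&& query) {
    slots.acquire();               // wait for one of the 100 slots
    std::forward<Query>(query)();  // note: memory use is not bounded here
    slots.release();               // hand the slot to the next waiter
}

This caps concurrency but says nothing about memory, which is exactly the gap the next slides address.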

Memory accounting
Account for every byte a query uses
●Disk DMA buffers allocated and deallocated
●Temporary structures used for holding query rows
The queue issues a query for execution only if memory is available
●Memory can still grow past the limit as the query executes
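A sketch of what byte-based admission could look like on a single thread-per-core shard (hence no locking). The names here (memory_gate, admit, consume, release) are illustrative assumptions, not ScyllaDB's API:

#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical byte-based admission for one shard. A query is admitted
// once its estimated memory fits under the limit; once running, it keeps
// accounting but may exceed the limit.
class memory_gate {
    struct waiter {
        std::size_t estimate;
        std::function<void()> start;
    };
    std::size_t _limit;       // bytes the gate may hand out at admission
    std::size_t _in_use = 0;  // bytes currently charged to running queries
    std::deque<waiter> _waiters;

    void maybe_admit() {
        while (!_waiters.empty() && _in_use + _waiters.front().estimate <= _limit) {
            waiter w = std::move(_waiters.front());
            _waiters.pop_front();
            _in_use += w.estimate;
            w.start();
        }
    }

public:
    explicit memory_gate(std::size_t limit_bytes) : _limit(limit_bytes) {}

    // Queue `start`; it runs once `estimate` bytes fit under the limit.
    void admit(std::size_t estimate, std::function<void()> start) {
        _waiters.push_back({estimate, std::move(start)});
        maybe_admit();
    }

    // Running queries account every DMA buffer and row structure they
    // allocate; usage may grow past _limit here, which is why an OOM
    // killer is still needed (see below).
    void consume(std::size_t bytes) { _in_use += bytes; }
    void release(std::size_t bytes) { _in_use -= bytes; maybe_admit(); }
};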

Query caching
SELECT * FROM my_table
●Coordinator assigns each query an ID
●When a query completes a page, store state in a cache
●When the next page is requested, pull state from cache
●Cache has its own memory and count limits
●Avoids per-page index lookup
●Reuses disk buffers and query state
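A toy version of such a cache, with LRU eviction under both count and memory limits. query_state, querier_cache, and all members are hypothetical stand-ins for the real saved reader state:

#include <cstddef>
#include <cstdint>
#include <list>
#include <memory>
#include <unordered_map>
#include <utility>

struct query_state {            // stand-in for saved readers/buffers/position
    std::size_t memory_bytes;
};

class querier_cache {
    std::size_t _max_entries;
    std::size_t _max_bytes;
    std::size_t _bytes = 0;
    std::list<std::pair<std::uint64_t, std::unique_ptr<query_state>>> _lru;
    std::unordered_map<std::uint64_t, decltype(_lru)::iterator> _index;

    void evict_to_fit() {
        while (!_lru.empty() && (_lru.size() > _max_entries || _bytes > _max_bytes)) {
            auto& [id, st] = _lru.back();   // least recently used entry
            _bytes -= st->memory_bytes;
            _index.erase(id);
            _lru.pop_back();
        }
    }

public:
    querier_cache(std::size_t max_entries, std::size_t max_bytes)
        : _max_entries(max_entries), _max_bytes(max_bytes) {}

    // Called when a page completes: park the query state under its ID.
    void store(std::uint64_t query_id, std::unique_ptr<query_state> st) {
        _bytes += st->memory_bytes;
        _lru.emplace_front(query_id, std::move(st));
        _index[query_id] = _lru.begin();
        evict_to_fit();                     // enforce count and memory limits
    }

    // Called when the next page arrives; nullptr means a miss, and the
    // caller redoes the per-page index lookup from scratch.
    std::unique_ptr<query_state> take(std::uint64_t query_id) {
        auto it = _index.find(query_id);
        if (it == _index.end()) {
            return nullptr;
        }
        std::unique_ptr<query_state> st = std::move(it->second->second);
        _bytes -= st->memory_bytes;
        _lru.erase(it->second);
        _index.erase(it);
        return st;
    }
};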

Out-of-memory killer
Memory accounting is great, but memory use is still unbounded
●Add OOM killer
○On threshold 1, stop issuing new queries
○On threshold 2, kill all queries but one
■Allow the user to see slow progress
○On threshold 3, kill everything
■Protect the database
●Rarely triggered
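The threshold ladder can be expressed as a simple policy function; the names and the enum below are assumptions for illustration, not ScyllaDB's code:

#include <cstddef>

// Sketch of the three-threshold policy; the thresholds are assumed to
// satisfy soft < hard < critical.
enum class oom_action { none, stop_admitting, kill_all_but_one, kill_everything };

oom_action check_oom(std::size_t usage,
                     std::size_t soft, std::size_t hard, std::size_t critical) {
    if (usage >= critical) return oom_action::kill_everything;  // protect the database
    if (usage >= hard)     return oom_action::kill_all_but_one; // user still sees progress
    if (usage >= soft)     return oom_action::stop_admitting;   // no new queries
    return oom_action::none;
}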

Disks vs vCPUs
vCPUs
●Only process one thing at a time
●Add more things:
○Latency increases
○Throughput decreases slightly
Disks
●Process many things at a time
●Add more things:
○Latency increases slightly
○Throughput increases (up to a point)

CPU accounting
Reducing query competition
●One queue per vCPU
●Query graph is parallelized
○Count how many leaves are blocked on I/O
○(Each leaf = a disk or memory read)
○If all leaves are blocked, issue one more query
●Serializes cache-only queries
●Stops issuing queries when CPU bound
●Allows concurrency when I/O bound
Advantages
●More queries can be timed out while queued
●Early queries get lower latency
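A minimal sketch of this bookkeeping, assuming each running query reports its parallel leaf reads to a per-vCPU gate (all names are hypothetical):

#include <cstddef>

// Illustrative per-vCPU admission keyed on I/O-blocked leaves. Each running
// query registers its parallel leaf reads (one per disk or memory read) and
// reports when they block on, or return from, I/O.
class cpu_gate {
    std::size_t _leaves = 0;         // leaf reads across all running queries
    std::size_t _blocked_on_io = 0;  // of those, how many wait on disk
public:
    void leaf_started()     { ++_leaves; }
    void leaf_finished()    { --_leaves; }
    void leaf_io_started()  { ++_blocked_on_io; }
    void leaf_io_finished() { --_blocked_on_io; }

    // Admit one more query only when every leaf is blocked on I/O (or no
    // query runs at all): CPU-bound work serializes, I/O-bound work overlaps.
    bool may_admit_one_more() const { return _blocked_on_io == _leaves; }
};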

What about I/O?
I/O scheduling is delegated to the Seastar I/O scheduler

Re-introducing parallelism
Not all queries are equally important to the user
●Solution: one queue per session
○Each with its own memory reserve
○Queries from different queues compete
○Each queue serializes its own queries (when possible)
○Seastar CPU and I/O schedulers assign resources according to priority

[Diagram: separate OLTP and OLAP session queues feeding the database]
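A sketch of per-session queues under stated assumptions: each session carries relative shares and a memory reserve, queries serialize within a session, and a scheduler picks across sessions. The strict-priority picker below is a toy stand-in for Seastar's proportional-share CPU and I/O schedulers:

#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical per-session state: relative priority, a private memory
// reserve, and a FIFO of the session's own (serialized) queries.
struct session_queue {
    float shares;                               // relative priority
    std::size_t memory_reserve;                 // bytes reserved for this session
    std::deque<std::function<void()>> pending;  // serialized within the session
};

class session_registry {
    std::unordered_map<std::string, session_queue> _sessions;
public:
    session_queue& get_or_create(const std::string& name, float shares,
                                 std::size_t reserve) {
        auto [it, inserted] = _sessions.try_emplace(name, session_queue{shares, reserve, {}});
        return it->second;
    }

    // Toy picker: strict priority by shares. The real system uses
    // proportional-share scheduling, which avoids starving low-share work.
    std::function<void()> pick_next() {
        session_queue* best = nullptr;
        for (auto& [name, q] : _sessions) {
            if (!q.pending.empty() && (!best || q.shares > best->shares)) {
                best = &q;
            }
        }
        if (!best) {
            return {};
        }
        auto task = std::move(best->pending.front());
        best->pending.pop_front();
        return task;
    }
};

In this picture an OLTP session would get high shares and an OLAP session low shares, so background analytics cannot starve interactive traffic.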

Gotchas
■A single query page can consume large amounts of CPU
■That stalls all queries queued behind it
■Solution: introduce a user override for CPU concurrency

Conclusions
●Database query queues are complicated
●High-performance queue, yet focused on “administrative” tasks
●Years of effort
●The journey is never complete, but we can approach completion asymptotically

Thank you! Let’s connect.
Avi Kivity
[email protected]
@AviKivity