Designing a Query Queue for ScyllaDB by Avi Kivity

ScyllaDB · Oct 14, 2024

About This Presentation

Database queries vary widely—from milliseconds to hours. Optimizing concurrency is a delicate balance of CPU, memory, and stability. Bad design can lead to high latency or crashes. Join us to learn how we designed ScyllaDB's query queue to handle these challenges. #Database #ScyllaDB


Slide Content

A ScyllaDB Community
Avi Kivity, CTO & Co-founder
Designing a Query Queue for ScyllaDB

Avi Kivity

CTO at ScyllaDB
■Linux KVM, Seastar, ScyllaDB

Typical queue presentation
●SPSC/MPSC Queue
●1 instruction/enqueue, 2 instructions/dequeue
●No lock!
●Tricky code involving memory barriers
●Tested on a 435-core machine @ 331,442,336 ops/sec

This is not that presentation!
SPSC queue (and similar)
●Tiny amounts of memory per item
●Fixed size, regular
●Few instructions
●Millions+ ops/sec
●CPU only
This queue
●Megabytes (potentially) per item
●Variable size
●Millions of instructions
●10k ops/sec/vCPU
●CPU + I/O

About ScyllaDB
●Horizontally and vertically scalable NoSQL database
○Horizontally = just add nodes
○Vertically = happy to use fat nodes
●Using the thread-per-core Seastar framework
●Cassandra- and DynamoDB-compatible

ScyllaDB read path overview
[Diagram: coordinators and replicas; a client request fans out from a coordinator to the replicas that own the data]
Client request (“SELECT”)
●Coordinator selects replicas that own the data
●Issues reads to replicas, merges responses

Replica-side read features
●Variable response size
○One 100-byte row
○100 10kB rows
○Zero rows
●Variable effort
○Lookup cache, return row
○Merge data from memtable and 8 sstables
■Many megabytes of parallel and sequential I/O
●Variable timeout
●Variable importance to end-user
●Queries can pause and restart (“paging”)

Why queue?
Hypothesis: let requests execute as they arrive
●Simple
●No time lost waiting in queue
Problems
●Queries use too much memory, going out-of-memory (OOM)
●Queries compete against each other
○Everyone loses
○Queries time out during execution
○Partial work is thrown away
[Diagram: RPC dispatches queries 1–5 straight into execution, with no queue]

Simple, no concurrency queue
●One (at most) executing query
●The rest wait
●Advantages
○Won’t OOM
○Minimal latency for executing query
○Queries in queue can be timed out before starting execution
●Disadvantages
○Disk underutilized
○CPU unused while waiting for disk
[Diagram: queries 1–5 wait in a queue; at most one at a time moves into execution]

Count limits
Allow 100 queries to execute, the rest wait

Advantages
●Saturate CPU, disk
Disadvantages
●Queries use too much memory, going out-of-memory (OOM)
●Queries compete against each other


[Diagram: queries 1–5 flow from the queue into a pool of concurrently executing queries]
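As a concrete illustration of count-limited admission, here is a minimal C++20 sketch built on a counting semaphore. All names (kMaxConcurrent, run_with_limit) are hypothetical, and the real ScyllaDB queue is asynchronous and per-shard rather than thread-blocking; with a limit of 1 this degenerates to the serialized queue from the previous slide.

#include <semaphore>
#include <utility>

// Hypothetical count-limited admission gate: at most kMaxConcurrent
// queries execute at once; the rest block until a slot frees up.
constexpr int kMaxConcurrent = 100;
std::counting_semaphore<kMaxConcurrent> slots(kMaxConcurrent);

template <typename Query>
void run_with_limit(Query&& query) {
    slots.acquire();               // wait for one of the 100 slots
    std::forward<Query>(query)();  // note: memory use is not bounded here
    slots.release();               // hand the slot to the next waiter
}

This caps concurrency but says nothing about memory, which is exactly the gap the next slides address.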

Memory accounting
Account for every byte a query uses
●Disk DMA buffers allocated and deallocated
●Temporary structures used for holding query rows
The queue issues a query for execution only if memory is available
●Memory can still grow past the limit as the query executes
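A sketch of what byte-based admission could look like on a single thread-per-core shard (hence no locking). The names here (memory_gate, admit, consume, release) are illustrative assumptions, not ScyllaDB's API:

#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical byte-based admission for one shard. A query is admitted
// once its estimated memory fits under the limit; once running, it keeps
// accounting but may exceed the limit.
class memory_gate {
    struct waiter {
        std::size_t estimate;
        std::function<void()> start;
    };
    std::size_t _limit;       // bytes the gate may hand out at admission
    std::size_t _in_use = 0;  // bytes currently charged to running queries
    std::deque<waiter> _waiters;

    void maybe_admit() {
        while (!_waiters.empty() && _in_use + _waiters.front().estimate <= _limit) {
            waiter w = std::move(_waiters.front());
            _waiters.pop_front();
            _in_use += w.estimate;
            w.start();
        }
    }

public:
    explicit memory_gate(std::size_t limit_bytes) : _limit(limit_bytes) {}

    // Queue `start`; it runs once `estimate` bytes fit under the limit.
    void admit(std::size_t estimate, std::function<void()> start) {
        _waiters.push_back({estimate, std::move(start)});
        maybe_admit();
    }

    // Running queries account every DMA buffer and row structure they
    // allocate; usage may grow past _limit here, which is why an OOM
    // killer is still needed (see below).
    void consume(std::size_t bytes) { _in_use += bytes; }
    void release(std::size_t bytes) { _in_use -= bytes; maybe_admit(); }
};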

Query caching
SELECT * FROM my_table
●Coordinator assigns each query an ID
●When a query completes a page, store state in a cache
●When the next page is requested, pull state from cache
●Cache has its own memory and count limits
●Avoids per-page index lookup
●Reuses disk buffers and query state
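A toy version of such a cache, with LRU eviction under both count and memory limits. query_state, querier_cache, and all members are hypothetical stand-ins for the real saved reader state:

#include <cstddef>
#include <cstdint>
#include <list>
#include <memory>
#include <unordered_map>
#include <utility>

struct query_state {            // stand-in for saved readers/buffers/position
    std::size_t memory_bytes;
};

class querier_cache {
    std::size_t _max_entries;
    std::size_t _max_bytes;
    std::size_t _bytes = 0;
    std::list<std::pair<std::uint64_t, std::unique_ptr<query_state>>> _lru;
    std::unordered_map<std::uint64_t, decltype(_lru)::iterator> _index;

    void evict_to_fit() {
        while (!_lru.empty() && (_lru.size() > _max_entries || _bytes > _max_bytes)) {
            auto& [id, st] = _lru.back();   // least recently used entry
            _bytes -= st->memory_bytes;
            _index.erase(id);
            _lru.pop_back();
        }
    }

public:
    querier_cache(std::size_t max_entries, std::size_t max_bytes)
        : _max_entries(max_entries), _max_bytes(max_bytes) {}

    // Called when a page completes: park the query state under its ID.
    void store(std::uint64_t query_id, std::unique_ptr<query_state> st) {
        _bytes += st->memory_bytes;
        _lru.emplace_front(query_id, std::move(st));
        _index[query_id] = _lru.begin();
        evict_to_fit();                     // enforce count and memory limits
    }

    // Called when the next page arrives; nullptr means a miss, and the
    // caller redoes the per-page index lookup from scratch.
    std::unique_ptr<query_state> take(std::uint64_t query_id) {
        auto it = _index.find(query_id);
        if (it == _index.end()) {
            return nullptr;
        }
        std::unique_ptr<query_state> st = std::move(it->second->second);
        _bytes -= st->memory_bytes;
        _lru.erase(it->second);
        _index.erase(it);
        return st;
    }
};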

Out-of-memory killer
Memory accounting is great, but memory use is still unbounded
●Add OOM killer
○On threshold 1, stop issuing new queries
○On threshold 2, kill all queries but one
■Allow the user to see slow progress
○On threshold 3, kill everything
■Protect the database
●Rarely triggered
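The threshold ladder can be expressed as a simple policy function; the names and the enum below are assumptions for illustration, not ScyllaDB's code:

#include <cstddef>

// Sketch of the three-threshold policy; the thresholds are assumed to
// satisfy soft < hard < critical.
enum class oom_action { none, stop_admitting, kill_all_but_one, kill_everything };

oom_action check_oom(std::size_t usage,
                     std::size_t soft, std::size_t hard, std::size_t critical) {
    if (usage >= critical) return oom_action::kill_everything;  // protect the database
    if (usage >= hard)     return oom_action::kill_all_but_one; // user still sees progress
    if (usage >= soft)     return oom_action::stop_admitting;   // no new queries
    return oom_action::none;
}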

Disks vs vCPUs
vCPUs
●Only process one thing at a time
●Add more things:
○Latency increases
○Throughput decreases slightly
Disks
●Process many things at a time
●Add more things:
○Latency increases slightly
○Throughput increases (up to a point)

CPU accounting
Reducing query competition
●One queue per vCPU
●Query graph is parallelized
○Count how many leaves are blocked on I/O
○(Each leaf = a disk or memory read)
○If all leaves are blocked, issue one more query
●Serializes cache-only queries
●Stops issuing queries when CPU bound
●Allows concurrency when I/O bound
Advantages
●More queries can be timed out while queued
●Early queries get lower latency
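A minimal sketch of this bookkeeping, assuming each running query reports its parallel leaf reads to a per-vCPU gate (all names are hypothetical):

#include <cstddef>

// Illustrative per-vCPU admission keyed on I/O-blocked leaves. Each running
// query registers its parallel leaf reads (one per disk or memory read) and
// reports when they block on, or return from, I/O.
class cpu_gate {
    std::size_t _leaves = 0;         // leaf reads across all running queries
    std::size_t _blocked_on_io = 0;  // of those, how many wait on disk
public:
    void leaf_started()     { ++_leaves; }
    void leaf_finished()    { --_leaves; }
    void leaf_io_started()  { ++_blocked_on_io; }
    void leaf_io_finished() { --_blocked_on_io; }

    // Admit one more query only when every leaf is blocked on I/O (or no
    // query runs at all): CPU-bound work serializes, I/O-bound work overlaps.
    bool may_admit_one_more() const { return _blocked_on_io == _leaves; }
};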

What about I/O?
I/O scheduling is delegated to the Seastar I/O scheduler

Re-introducing parallelism
Not all queries are equally important to the user
●Solution: one queue per session
○Each with its own memory reserve
○Queries from different queues compete
○Each queue serializes its own queries (when possible)
○Seastar CPU and I/O schedulers assign resources according to priority

[Diagram: separate OLTP and OLAP session queues feeding the database]
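A sketch of per-session queues under stated assumptions: each session carries relative shares and a memory reserve, queries serialize within a session, and a scheduler picks across sessions. The strict-priority picker below is a toy stand-in for Seastar's proportional-share CPU and I/O schedulers:

#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical per-session state: relative priority, a private memory
// reserve, and a FIFO of the session's own (serialized) queries.
struct session_queue {
    float shares;                               // relative priority
    std::size_t memory_reserve;                 // bytes reserved for this session
    std::deque<std::function<void()>> pending;  // serialized within the session
};

class session_registry {
    std::unordered_map<std::string, session_queue> _sessions;
public:
    session_queue& get_or_create(const std::string& name, float shares,
                                 std::size_t reserve) {
        auto [it, inserted] = _sessions.try_emplace(name, session_queue{shares, reserve, {}});
        return it->second;
    }

    // Toy picker: strict priority by shares. The real system uses
    // proportional-share scheduling, which avoids starving low-share work.
    std::function<void()> pick_next() {
        session_queue* best = nullptr;
        for (auto& [name, q] : _sessions) {
            if (!q.pending.empty() && (!best || q.shares > best->shares)) {
                best = &q;
            }
        }
        if (!best) {
            return {};
        }
        auto task = std::move(best->pending.front());
        best->pending.pop_front();
        return task;
    }
};

In this picture an OLTP session would get high shares and an OLAP session low shares, so background analytics cannot starve interactive traffic.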

Gotchas
■A single query page can consume large amounts of CPU
■That stalls all queries queued behind it
■Solution: introduce a user override for CPU concurrency

Conclusions
●Database query queues are complicated
●High-performance queue, yet focused on “administrative” tasks
●Years of effort
●The journey is never complete, but we can approach completion asymptotically

Thank you! Let’s connect.
Avi Kivity
[email protected]
@AviKivity