As Fast as Possible, But Not Faster: ScyllaDB Flow Control by Nadav Har'El
ScyllaDB
24 slides
Oct 14, 2025
About This Presentation
Pushing requests faster than a system can handle results in rapidly growing queues. If unchecked, it risks depleting memory and system stability. This talk discusses how we engineered ScyllaDB’s flow control for high volume ingestions, allowing it to throttle over-eager clients to exactly the right pace – not so fast that we run out of memory, but also not so slow that we let available resources go to waste.
Size: 3.43 MB
Language: en
Slide Content
A ScyllaDB Community
As Fast As Possible, But Not Faster:
ScyllaDB Flow Control
Nadav Har’El
Distinguished Engineer
Nadav Har’El
Distinguished Engineer at ScyllaDB
■Before Scylla, I wrote a Hebrew spell-checker, a
kernel from scratch, and other eclectic stuff.
■My favorite percentile is 99.
■I’m a mathematician in theory, programmer in
practice.
■I have three children (and also a ?????? and a ?????? ).
October 2025
INTRODUCTION
AND GOALS OF THIS TALK
Ingestion
■We look into ingestion of data into a Scylla cluster,
i.e., a large volume of write requests.
■We would like the ingestion to proceed as fast as
possible, but not faster.
“as fast as possible, but not faster”?
■An over-eager client may send writes faster than the
cluster can complete earlier requests.
■We can absorb a short burst of requests in queues.
■If we allow the client to continue writing at an excessive rate, the backlog of uncompleted writes grows until memory runs out!
Goals of this talk
Understand the
■different causes of queue buildup during writes.
■flow control mechanisms Scylla uses to automatically
cause ingestion to proceed at the optimal pace.
Simulator for experimenting with different workloads and
flow control mechanisms.
https://github.com/nyh/flowsim
Dealing with an over-eager writer
■As the backlog grows, server needs a way to tell the client
to slow down its request rate.
■The CQL protocol does not offer explicit flow-control
mechanisms for the server to slow down a client.
■Two options remain: delaying replies to the client’s
requests, and failing requests.
■Which we can use depends on what drives the workload:
Workload model 1: Fixed concurrency
■Batch workload: Application wishes to write a large batch of
data, as fast as possible (driving the server at 100% utilization).
■A fixed number of client threads, each running a request loop:
prepare some data, make a write request, wait for the response.
■The server can control the request rate by throttling (delaying)
its replies:
●If the server only sends N replies per second, the client will
only send N new requests per second!
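The slide's point can be made concrete with a small model (an illustrative sketch, not ScyllaDB code): a fixed-concurrency client can never have more requests in flight than it has threads, so its steady-state request rate is capped both by its own latency and by the server's reply rate.

```python
# Sketch (assumed model): a fixed number of client threads, each waiting for
# a reply before sending its next request. The server caps the request rate
# simply by capping its reply rate.

def requests_per_second(client_threads: int, server_replies_per_sec: float,
                        request_latency_s: float) -> float:
    """Steady-state request rate of a fixed-concurrency client.

    Each thread has at most one request in flight, so the client can never
    exceed threads / latency; the server further caps it by sending only
    server_replies_per_sec replies per second.
    """
    client_max = client_threads / request_latency_s
    return min(client_max, server_replies_per_sec)

# 100 threads at 10 ms latency could push 10,000 req/s on their own,
# but a server replying at only 6,000/s holds the client to 6,000 req/s.
```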
Workload model 2: Unbounded concurrency
■Interactive workload: The client sends requests driven by some
external events (e.g., activity of real users).
●Request rate unrelated to completion of previous requests.
●If request rate is above cluster capacity, the server can’t slow down
these requests and the backlog grows and grows.
●To avoid that, we must fail some requests.
■To reach optimum throughput, we should fail fresh client
requests. Admission control.
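A minimal sketch of the admission-control idea described above (illustrative only, not ScyllaDB's implementation): when the pending-request queue is full, the fresh request is failed immediately instead of being queued, so the backlog stays bounded and admitted requests still complete.

```python
from collections import deque

class AdmissionQueue:
    """Bounded request queue: admit until full, then fail fresh requests."""

    def __init__(self, max_pending: int):
        self.max_pending = max_pending
        self.pending = deque()

    def try_admit(self, request) -> bool:
        """Admit a fresh request, or fail it right away (overload error)."""
        if len(self.pending) >= self.max_pending:
            return False          # shed load: fail the *new* request
        self.pending.append(request)
        return True

    def complete_one(self) -> None:
        """A request finished; free its slot for future admissions."""
        if self.pending:
            self.pending.popleft()
```

Failing fresh requests (rather than old ones already in progress) is what keeps the throughput of successful writes high: work already admitted is never wasted.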
FLOW CONTROL OF
REGULAR WRITES
Regular writes (i.e., no materialized view)
A coordinator node receives a write and:
■Sends it to RF (e.g., 3) replicas.
■Waits for CL (e.g., 2) of those writes to complete,
■Then replies to the client: “desired consistency-level achieved”.
■The remaining (e.g., 1) writes to replicas will continue
“in the background” without the client waiting.
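The coordinator's behavior on this slide can be sketched as follows (a simplified model, not actual ScyllaDB code): the client is answered when the CL-th replica acknowledges, and the remaining RF − CL writes become background work.

```python
# Sketch of the regular write path: reply at the CL-th ack, and track the
# remaining RF - CL replica writes as "background" work the client
# does not wait for.

def handle_write(replica_ack_times, cl):
    """Return (time the client gets its reply, background-write finish times).

    replica_ack_times: per-replica completion time of this write (RF entries).
    """
    acks = sorted(replica_ack_times)
    reply_time = acks[cl - 1]      # reply when the CL-th ack arrives
    background = acks[cl:]         # these finish without the client waiting
    return reply_time, background

# RF=3, CL=2: replicas ack at t=5, t=3, t=9. The client is answered at
# t=5; one replica write continues in the background until t=9.
```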
Why background writes cause problems
■A batch workload, upon receiving the server’s reply, will send
a new request before these background writes finish.
■If new writes come in faster than we complete background
writes, the number of these background writes can grow
without bound.
Typical cause: a slower node
■3 nodes. One slightly slower:
■RF=3, CL=2.
■Each second:
●10,000 writes are replied when the two faster nodes respond.
●Backlog of background writes to slowest node increases by 100.
[Diagram: three nodes at 10,000 wps, 10,000 wps, and 9,900 wps (1% slower)]
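The arithmetic on this slide is worth spelling out (numbers taken from the example above): replies go out at the pace of the two fast nodes, so the slow node's backlog of background writes grows by the rate difference, every second, without bound.

```python
# Tiny model of the "slower node" example: RF=3, CL=2, two nodes complete
# 10,000 writes/s and one completes 9,900/s. The slow node falls behind by
# the difference, accumulated over time.

def slow_node_backlog(fast_wps: int, slow_wps: int, seconds: int) -> int:
    """Background writes queued up on the slow node after `seconds`."""
    return (fast_wps - slow_wps) * seconds

# After one second the slow node is 100 writes behind; after a minute, 6,000.
```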
Regular writes - Scylla’s solution
A simple, but effective, throttling mechanism:
■When the total memory used by background writes exceeds some
limit (10% of the shard’s memory), the coordinator replies only after
all RF replica writes have completed (not just CL).
■The backlog of background writes does not continue to grow.
■Replies are only sent at the rate we can complete all the work, so a
batch workload will slow down its requests to the same rate.
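This throttling rule reduces to a single decision per write, sketched below (illustrative; the real limit is 10% of a shard's memory, and the check lives inside the coordinator's write path):

```python
# Sketch of Scylla's throttling rule for regular writes: once background
# writes hold too much memory, stop replying at CL and wait for all RF
# replicas instead, so replies slow to the rate all work can complete.

def acks_needed(background_bytes: int, shard_memory: int,
                rf: int, cl: int) -> int:
    """How many replica acks to wait for before replying to the client."""
    limit = shard_memory // 10       # 10% of the shard's memory
    return rf if background_bytes > limit else cl

# Below the limit we reply at CL as usual; above it, we wait for all RF.
```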
Simulation of “slower node” example
[Chart: backlog over time; max backlog = 300]
FLOW CONTROL OF
MATERIALIZED VIEWS WRITES
Write to a table with materialized views
■As before: coordinator sends writes to RF (e.g., 3) replicas,
waits for only first CL (e.g., 2) of those writes to complete.
■Each replica later sends updates to view table(s) -
The client does not wait for these view updates.
●A deliberate, though often-debated, design choice.
Why view writes cause problems
■Again the problem is background work the client doesn’t await.
■There may be a lot of such work - with V views, we typically have
2*V of these background view writes.
■If new writes come in faster than we can finish view updates,
the number of these queued view updates can grow without
bound.
Simulation (problem)
[Chart: pending view writes, pending base writes, and request rate over time]
View updates - solution
■How to prevent view backlog from growing endlessly?
■Add an extra delay to each client write.
●Higher delay: slows client, view-update backlog declines.
●Lower delay: speeds-up client, view-update backlog increases.
■Goal: a controller, changing delay as backlog changes, to
keep backlog in desired range.
View updates - solution
■A simple linear controller:
●delay = α * backlog
■When client concurrency is fixed, this converges on the delay
that keeps the backlog constant:
●If delay is higher, client slows and backlog declines,
causing delay to go down.
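The feedback loop above can be simulated in a few lines (a sketch in the spirit of the talk's flowsim simulator; alpha and all workload numbers here are illustrative, not Scylla's defaults). A fixed-concurrency client's rate falls as the injected delay rises, so the backlog settles exactly where the client's rate matches the view-update capacity.

```python
# Linear controller from the slide: delay = alpha * backlog. With a
# fixed-concurrency client this is negative feedback, so the backlog
# converges instead of growing without bound.

def added_delay(alpha: float, backlog: float) -> float:
    return alpha * backlog

def simulate(alpha=0.0001, threads=100, base_latency=0.01,
             view_capacity=8000.0, steps=2000, dt=0.01):
    """Discrete-time model: client rate = threads / (latency + delay);
    view updates drain at view_capacity writes per second."""
    backlog = 0.0
    for _ in range(steps):
        rate = threads / (base_latency + added_delay(alpha, backlog))
        backlog = max(0.0, backlog + (rate - view_capacity) * dt)
    return backlog, rate
```

At equilibrium, threads / (base_latency + alpha * backlog) = view_capacity; with these numbers the backlog settles at 25 and the client's rate at exactly the view-update capacity of 8,000 writes per second.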
Simulation (solution)
[Chart: pending base writes, pending view writes, and request rate over time]
Conclusions
■Scylla flow-controls a batch workload,
●data is ingested as fast as possible - but not faster than that.
■When client cannot be slowed down to the optimal rate,
●Scylla starts dropping new requests, to achieve the highest
possible throughput of successful writes.
■Automatic, no need for user intervention or configuration.