Queues, Hockey Sticks and Performance by David Collier-Brown
ScyllaDB
30 slides
Oct 15, 2024
About This Presentation
Queues: both a blessing and a curse in computer science. They help predict performance but also signal overload. This talk explores their role in diagnosis, capacity planning, and development using physics concepts and the "hockey-stick" curve. Master queue intuition for better programs. #DevTalk
Slide Content
A ScyllaDB Community
Queues, Hockey Sticks and
Performance
David Collier-Brown
No, Not the Tim Hortons Kind!
This Kind
Two graphs from a textbook:
■The upper one is throughput
■The lower is slowdown under load
(aka queue delay)
−You’ve probably plotted the top
one from a load test
■Both are really hockey-sticks, but the
top one is upside-down
Our Graphs
■They can be computed from one
another
■If you measure response time, you
can build a little mathematical model,
like the one I used to draw these
■And the latter is easy to draw
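A minimal sketch of such a "little mathematical model", assuming a single queueing centre with a fixed mean service time and users who pause for a think time between requests (exact Mean Value Analysis). The slides do not say which model was used to draw the graphs; the name `mva_curves` and the 0.1 s service time and 1 s think time are illustrative assumptions.

```python
def mva_curves(service_time, think_time, max_users):
    """Return (users, throughput, slowdown) rows for 1..max_users.

    Exact Mean Value Analysis for one queueing centre plus think time.
    """
    rows = []
    queue_len = 0.0  # mean queue length with one fewer user in the system
    for users in range(1, max_users + 1):
        response = service_time * (1.0 + queue_len)   # own service plus those ahead
        throughput = users / (response + think_time)  # Little's law, whole loop
        queue_len = throughput * response             # Little's law, queue only
        rows.append((users, throughput, response / service_time))
    return rows


if __name__ == "__main__":
    # Assumed parameters: 0.1 s service time, 1 s think time.
    for users, throughput, slowdown in mva_curves(0.1, 1.0, 20):
        print(f"{users:3d} users  {throughput:6.2f} req/s  slowdown {slowdown:5.2f}")
```

Plotting throughput against users gives the upside-down hockey stick (it flattens near 1 / service_time); plotting slowdown gives the right-way-up one.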
Part 1. Slowdowns in a Benchmark
■That’s a whole collection of hockey-sticks
●Look at the dark-red one, for example
■This is not a nice result
What Did We Expect?
■A flat line around 1
■Rising to 2 at quite a high load
■This is an increase in response
time
■The increase tells us work is
stuck in a queue
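A minimal sketch of how a slowdown curve like this could be derived from load-test measurements, assuming slowdown means response time divided by the response time at the lightest load. The function names and the sample numbers are hypothetical, not taken from the benchmark in the talk.

```python
def slowdown_curve(measurements):
    """Turn (load, mean_response_time) pairs into (load, slowdown) pairs.

    The lightest-load sample is taken as the unloaded baseline, so the
    curve starts at 1.0.
    """
    ordered = sorted(measurements)
    baseline = ordered[0][1]
    return [(load, response / baseline) for load, response in ordered]


def first_load_over(curve, limit=2.0):
    """Return the first load whose slowdown reaches `limit`, or None."""
    for load, slowdown in curve:
        if slowdown >= limit:
            return load
    return None


# Hypothetical load-test results: (users, mean response time in seconds).
samples = [(100, 0.20), (200, 0.22), (300, 0.90)]
curve = slowdown_curve(samples)
print(curve, first_load_over(curve))
```

A healthy system stays near 1 for most of the range; a slowdown of 2 arriving early is the sign that work is queueing.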
What Did We Just See?
The DBA Asked for a CPU Chart
■He’d noticed “DB Writer”
slowing down
●That should never happen
■DB Writer (black) is a critical part of
the database: it updates the disk
■Middleware (yellow), on the other
hand, grows without bound
Middleware vs DB Writer
■Middleware just keeps going
up
■DB writer heads down
             1500     2250     3000     3750
Middleware   7.42%   11.91%   26.38%   31.73%
DB Writer    0.43%    0.67%    0.67%    0.42%
Fixed It!
■We gave DB Writer
guaranteed CPU
■We also doubled the
number of CPU
cores (we had run
out of CPU, too) (:-)
Part 2. Why Does it Happen?
■Because I have more work
than CPUs
■This is what that causes
●the Y axis is queue delay
●the X axis is the start times of
the transaction
●Units are tenths of a second
Why Does it Happen II
■Transaction 2 isn’t done when
3 arrives
■Transaction 3 has to wait
■Ditto 4 and 5
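The pile-up above can be sketched with a tiny single-server, first-come-first-served simulation. The arrival times and the service time of 2 (in tenths of a second) are made-up values chosen so that work arrives faster than it can be served; they are not the numbers behind the slide's chart.

```python
def queue_delays(arrivals, service_time):
    """Queue delay of each transaction at one FIFO server.

    `arrivals` are start times in ascending order; all values share one
    time unit (tenths of a second here).
    """
    delays = []
    server_free_at = 0
    for arrival in arrivals:
        start = max(arrival, server_free_at)  # wait if the server is still busy
        delays.append(start - arrival)
        server_free_at = start + service_time
    return delays


# One transaction arrives every tenth, but each needs two tenths of service.
print(queue_delays([0, 1, 2, 3, 4], service_time=2))
```

Each transaction waits one tenth longer than the one before it, which is the rising diagonal of the hockey stick.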
The Line of Green Boxes Created the Handle
■The horizontal line is the initial
service time
■The diagonal one is the delay we get
from not enough resources
■And the curve between them is from
probability
●The busier we are, the higher the probability
that a transaction will have to wait
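One way to put numbers on that probability is the Erlang C formula. This is a sketch that assumes random (Poisson) arrivals and exponentially distributed service times; the slides do not name the distribution behind their curve, so treat this as one plausible model rather than the talk's own.

```python
from math import factorial


def probability_of_waiting(servers, utilization):
    """Erlang C: chance that an arriving transaction finds every server busy."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    offered = servers * utilization  # offered load, in servers' worth of work
    busy_term = offered ** servers / factorial(servers) / (1 - utilization)
    idle_terms = sum(offered ** k / factorial(k) for k in range(servers))
    return busy_term / (idle_terms + busy_term)


for rho in (0.2, 0.5, 0.8, 0.95):
    print(rho, round(probability_of_waiting(4, rho), 3))
```

With one server the probability of waiting equals the utilization; with more servers it stays low for longer and then climbs steeply, which is what bends the curve between the flat blade and the diagonal handle.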
Why and What For
■We use the slowdown curve in
at least four areas
1)capacity planning
2)diagnosis
3)development and
4)repair
Part 3. Capacity Planning
■The risk is of over-
or under-buying
●Over-buying
wastes money
●Under-buying
causes a
business failure
Capacity Planning, Ctd
■We need to “just
stay ahead of
demand”
●Marketing does
the estimate
●Ops buys
enough
machines
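A back-of-the-envelope sketch of "enough machines", assuming Marketing's estimate arrives as a peak request rate, each request costs a known amount of machine time, and Ops wants to stay at or below a target utilization. The function name, the parameters and the 80% default are assumptions for illustration.

```python
from math import ceil


def machines_needed(peak_requests_per_s, service_time_s, target_utilization=0.8):
    """Machines required to serve the forecast peak at the target utilization."""
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    busy_machines = peak_requests_per_s * service_time_s  # work arriving per second
    return max(1, ceil(busy_machines / target_utilization))


# Hypothetical forecast: 10 requests/s, each needing 0.25 s of machine time.
print(machines_needed(10, 0.25))
```

Buying exactly `busy_machines` would run everything at 100% utilization, on the steep part of the hockey stick; the headroom is what keeps the queue delay down.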
Part 4. Diagnosis
Another slowdown
graph
■The red line is the
new machine
■The blue one is the
old
●Something’s
wrong, the new
one is slower
Old Versus New
■Old was reusing
established
connections,
saving lots of time
■New was not
●It was mis-set to
use HTTP 1.1
Part 5. Bottleneck-Hunting
■Process 1 is the bottleneck
■What happens if we fix it?
●The performance almost
doubles
Where Bottleneck Removal Doesn’t Work
■If we fix process 1,
■We just bottleneck
on process 2
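Both outcomes follow from one rule of thumb: throughput is capped by the stage that needs the most time per transaction. A minimal sketch, with made-up service demands for the two processes (the figures in the slides' charts are not reproduced here):

```python
def max_throughput(service_demands):
    """Find the bottleneck and the throughput ceiling it imposes.

    `service_demands` maps a stage name to the seconds of that stage
    needed per transaction.
    """
    bottleneck = max(service_demands, key=service_demands.get)
    return bottleneck, 1.0 / service_demands[bottleneck]


# Hypothetical demands, in seconds per transaction.
before = {"process 1": 0.100, "process 2": 0.090}
after = {"process 1": 0.050, "process 2": 0.090}  # process 1 made twice as fast
print(max_throughput(before))
print(max_throughput(after))
```

Halving process 1 here barely helps, because process 2 was almost as slow and becomes the new limit. Had process 2 needed only about half of process 1's time, the same fix would have nearly doubled throughput.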
Part 6. Development
Too many cats, all wanting to be fed right now
If you’re writing the program and it uses HTTPS,
then return status 429:
■Tells the client to slow down.
● Browsers report it to the user to retry
● Various packages will resubmit, e.g.
golang’s retryafter
■Is part of HTTP (RFC 6585), and unlike most
4XX codes it invites a retry
■It forces the client to take the time to
re-send, even if the client would like to
ignore it and proceed immediately
What this sends
■429 means “wait”
■Retry-After: 3600 means
“after an hour”
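A minimal sketch of a server that answers 429 with a Retry-After header when it has too much in flight, using only Python's standard library. The limit of 100 and the `in_flight` counter are hypothetical stand-ins; a real service would count its actual concurrent work.

```python
from http.server import BaseHTTPRequestHandler

MAX_IN_FLIGHT = 100          # hypothetical capacity limit
RETRY_AFTER_SECONDS = 3600   # the slide's "after an hour"


def overload_reply(in_flight, limit=MAX_IN_FLIGHT, retry_after=RETRY_AFTER_SECONDS):
    """Return (status, headers) for a request, given the current load."""
    if in_flight >= limit:
        return 429, {"Retry-After": str(retry_after)}
    return 200, {}


class ThrottlingHandler(BaseHTTPRequestHandler):
    # Stand-in for a real counter of requests being processed.
    in_flight = 0

    def do_GET(self):
        status, headers = overload_reply(type(self).in_flight)
        self.send_response(status)
        for name, value in headers.items():
            self.send_header(name, value)
        self.end_headers()
```

To try it, the handler could be served with `http.server.HTTPServer(("", 8080), ThrottlingHandler).serve_forever()`; the port is arbitrary.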
Part 7. Controlling Demand
■Control the sender
●TCP/IP does exactly
that
■When someone sends too
rapidly, they don’t get a
go-ahead from the recipient
●They are delayed,
causing them to slow
down. They do so,
then gradually speed
up until they are
slowed once again
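The slow-down-then-creep-back-up cycle can be sketched as additive increase, multiplicative decrease (AIMD). This is a deliberately simplified illustration, not real TCP congestion control: the capacity, the step sizes and the one-step-per-round timing are all assumptions.

```python
def aimd(capacity, rounds, start=1.0, increase=1.0, backoff=0.5):
    """Sending rate per round under additive-increase, multiplicative-decrease."""
    rate = start
    history = []
    for _ in range(rounds):
        history.append(rate)
        if rate > capacity:   # sent too rapidly: no go-ahead from the recipient
            rate *= backoff   # slow down sharply
        else:
            rate += increase  # then gradually speed up again
    return history


print(aimd(capacity=10, rounds=30))
```

The printed rates form a sawtooth that hovers around the capacity without staying above it.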
When They Cheat
■“Bufferbloat” is excessive
buffering, trying to get more of
the channel than is fair
●The shape of those curves should
be familiar (:-))
How? Controlling Demand
Is the same as managing your
unread books
■First, capture their credit card
■If that doesn’t work, smash
their internet connection
■That’s what CAKE and a
program called LibreQoS do,
less violently
For One Thing, Signal Sooner
■Send “slow down”
headers before
stopping the
acknowledgments
■Stay safely below the
maximum
throughput (around
80% utilization)
CAKE does this for home routers
LibreQoS does it for entire ISPs
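The "signal sooner" idea can be sketched as a threshold on measured utilization. This is an illustration only and not how CAKE or LibreQoS are implemented; the function name, the three signal strings and the way busy time is measured are assumptions, with the 80% default taken from the slide.

```python
def congestion_signal(busy_seconds, window_seconds, warn_at=0.8):
    """Decide what to tell senders, based on utilization over a window."""
    utilization = busy_seconds / window_seconds
    if utilization >= 1.0:
        return "stop"       # out of capacity: acknowledgments would stall
    if utilization >= warn_at:
        return "slow down"  # warn while there is still headroom
    return "ok"


for busy in (5.0, 8.5, 10.0):
    print(busy, congestion_signal(busy, window_seconds=10.0))
```

Warning at the threshold rather than at 100% gives senders time to back off before the queue delay climbs the handle of the hockey stick.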
And That’s It
■You now know everything I’ve learned about queues in the last ten
years (:-))
■Go fix something!
References
■LibreQoS – libreqos.io
■“You Don’t Know Jack” articles (part of a series)
−Application Performance, about queues, https://dl.acm.org/doi/10.1145/3595862
−Bandwidth, about LibreQoS and TCP/IP, https://dl.acm.org/doi/10.1145/3674953
■Two books from my favourite mathie, Neil J. Gunther, at
http://www.perfdynamics.com/
−Analyzing Computer System Performance with Perl::PDQ
− Guerrilla Capacity Planning
■Bufferbloat article, https://dl.acm.org/doi/pdf/10.1145/2063166.2071893
■TeamQuest Predictor,
https://www.fortra.com/resources/datasheets/vcm-enterprise