Patterns of Low Latency by Pekka Enberg

ScyllaDB · 43 slides · Oct 17, 2024

About This Presentation

Building for low latency is important, but the tips and tricks are often part of developer folklore and hard to discover on your own. This talk shares some of the important latency-related patterns you want to know when working on low-latency apps.


Slide Content

ScyllaDB Community
Patterns of Low Latency
Pekka Enberg
Founder/CTO at Turso

Pekka Enberg

Founder/CTO at Turso
■Previously ScyllaDB and Linux
■P99 happens often; the conference should be P99.999

Outline
■Why is latency important?
■Measuring latency
■Reducing latency
■Hiding latency
■Tuning the system
■Summary

Why is latency important?

Low latency = good customer experience
■Bad latency often means unhappy customers.

■It’s very easy to write code with bad tail latency.

■Bad latency in just one component can affect many users.

Users experience tail latency often
■Tail latency is the high percentiles (e.g. 99th) of the latency distribution.

■Users experience tail latency fairly often because latency compounds.

■Example: If you fan out processing to 10 components and wait for all of them
to complete, about 10% of user requests experience the 99th percentile latency.
●P(at least one slow) = 1 − P(all n fast) = 1 − (1 − P(slow request))ⁿ
Dean and Barroso. (2013) The Tail at Scale. Communications of the ACM
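A quick back-of-the-envelope check of that fan-out math, in plain Python (nothing assumed beyond the formula on the slide):

```python
# Probability that at least one of n parallel calls hits the slow tail,
# given each call independently has probability p_slow of being slow.
def p_any_slow(n: int, p_slow: float) -> float:
    return 1.0 - (1.0 - p_slow) ** n

# Fan out to 10 components, each with a 1% chance of a P99-tail response:
print(round(p_any_slow(10, 0.01), 3))   # 0.096, i.e. roughly 10% of requests
```

With 100 components the same formula gives about 63%, which is why fan-out makes the tail everyone's problem.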


Measuring latency

Primer on measuring latency
■Latency is a distribution so measure it as such.

■Average latency hides the tail; in a closed-loop system it is essentially just the inverse of throughput (Little's law), so it says little about user experience.

■Maximum latency is an interesting metric, but hard to optimize.

■The 99th percentile latency and beyond is a good compromise.

Beware coordinated omission
■In coordinated omission, the benchmark accidentally coordinates with the system
being measured: it delays sending the next request while a slow response is outstanding.

■As a result, outliers in the latency distribution are under-counted or not measured at all.
Ivan Prisyazhynyy, On Coordinated Omission (2021)
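A minimal, simulated sketch of the difference (the `service` function and all timings are made up): a closed-loop benchmark measures latency from when it actually sent each request, while the corrected version measures from when the request *should* have gone out on the intended schedule.

```python
# Simulated closed-loop benchmark vs. a coordinated-omission-corrected one.
# The "service" normally takes 1 ms, but stalls for 1000 ms on one request.
# All times are milliseconds on a simulated clock; `service` is illustrative.

def service(i: int) -> float:
    return 1000.0 if i == 5 else 1.0

INTERVAL = 10.0  # intended load: one request every 10 ms
N = 10

naive, corrected = [], []
clock = 0.0
for i in range(N):
    intended_start = i * INTERVAL              # when the request *should* go out
    actual_start = max(clock, intended_start)  # closed loop waits for the reply
    finish = actual_start + service(i)
    naive.append(finish - actual_start)        # hides time spent queued
    corrected.append(finish - intended_start)  # charges the stall to later requests
    clock = finish

print(sum(t > 100 for t in naive), sum(t > 100 for t in corrected))  # 1 vs 5
```

The naive numbers report one bad sample; the corrected ones show that half the scheduled requests were actually affected by the stall.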

Visualizing latency
■Visualizing latency is important

■Histograms are good…

■…but eCDFs can be even better!
Marc Brooker, Histogram vs eCDF (2022)

Examples

Example of a latency histogram (percentile plot):
the x-axis represents percentiles and the y-axis represents latency.

Example of a latency eCDF:
the x-axis represents latency and the y-axis represents the cumulative probability.
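A tiny sketch of how eCDF points are built from raw latency samples (plain Python; the sample values are made up):

```python
# Build empirical CDF points from latency samples: for each sorted sample x_i,
# the eCDF value is (i + 1) / n, the fraction of samples <= x_i.
def ecdf(samples):
    xs = sorted(samples)
    n = len(xs)
    return [(x, (i + 1) / n) for i, x in enumerate(xs)]

latencies_ms = [1.2, 1.1, 1.3, 1.2, 9.8, 1.1, 1.4, 1.2, 1.3, 50.0]
points = ecdf(latencies_ms)
print(points[-2], points[-1])   # the top of the tail: (9.8, 0.9) (50.0, 1.0)
```

Reading the tail straight off the curve: 90% of requests finished within 9.8 ms, and the slowest 10% stretch all the way to 50 ms.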


Reducing latency

Reducing latency
■Latency lurks everywhere.

■To reduce latency:
●Avoid data movement
●Avoid work
●Avoid waiting

Avoiding data movement

Avoiding data movement
■Moving data is slow:
●Network round-trip between New York and London is at least 57 ms.
●Data center network round trip is about 100-500 μs.
●DRAM access latency is 100 ns.

■Move data where it is used:
●Colocation
●Replication
●Caching

Colocation
■Colocation is a technique to reduce latency by moving two components close
to each other.

■For example, move the database to the same machine as your application
logic to eliminate network round-trip latency.

■Colocation is not always possible, unless you have multiple copies…

Replication and caching
■Replication and caching are techniques to reduce latency by keeping a copy of
the data close to where it is used.

■Both techniques have pros and cons around consistency.

■Often a good way to reduce latency if you can deal with the storage
amplification.
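A minimal read-through cache sketch in Python. The `fetch_remote` function is a made-up stand-in for any slow remote lookup; a real cache also needs invalidation, TTLs, and size bounds, which is where the consistency trade-offs above come in.

```python
# Read-through cache: serve from the local copy when possible, fall back to
# the slow "remote" store on a miss, and remember the answer for next time.
remote_calls = 0

def fetch_remote(key):           # illustrative stand-in for a network hop
    global remote_calls
    remote_calls += 1
    return f"value-of-{key}"

cache = {}

def get(key):
    if key not in cache:         # miss: pay the round trip once...
        cache[key] = fetch_remote(key)
    return cache[key]            # ...hits are a local memory access

get("user:42"); get("user:42"); get("user:42")
print(remote_calls)  # 1: only the first read left the machine
```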

Avoiding work

Avoiding work
■You can reduce latency by doing less:
●Tame algorithmic complexity
●Control memory management
●Optimize your code
●Avoid CPU-intensive computation

Taming algorithmic complexity
■Understand if the algorithm you use is suitable for low latency:
●An O(1) algorithm is (probably) fine.
●An O(n²) algorithm or worse is (probably) not fine.

■Data structures you likely see in low latency code:
●Queues and stacks
●Arrays
●Hash tables

■Data structures you probably won’t see in low latency code:
●Linked lists
●Graphs
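As a concrete illustration of the queues point: in Python, `collections.deque` pops from the front in O(1), while `list.pop(0)` shifts every remaining element (O(n)), a classic hidden cost in a hot path. A small timing sketch (N is arbitrary):

```python
# Draining a queue of N items: O(1) per pop with deque vs. O(n) per pop
# with list.pop(0), which shifts all remaining elements every time.
from collections import deque
import timeit

N = 50_000

def drain_list():
    q = list(range(N))
    while q:
        q.pop(0)        # O(n): shifts the whole remaining list

def drain_deque():
    q = deque(range(N))
    while q:
        q.popleft()     # O(1): constant-time removal from the front

t_list = timeit.timeit(drain_list, number=1)
t_deque = timeit.timeit(drain_deque, number=1)
print(t_deque < t_list)  # True: the O(1) structure wins as N grows
```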

Controlling memory management
■Avoid dynamic memory management
●Allocating memory on the fast path is a likely source of latency outliers.
●You can do low latency with a pauseless GC, but you still need to avoid allocating objects.

■Avoid demand paging
●Virtual memory is an illusion of a memory space as large as the disk space.
●This means virtual memory (in the worst case) runs at disk speed.
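One common way to keep allocation off the fast path is a preallocated object pool. A minimal sketch (the pool and buffer sizes are arbitrary, and a real pool would also handle exhaustion instead of letting `popleft` raise):

```python
# Preallocated buffer pool: all allocation happens once at startup; the
# fast path only moves buffers between the free list and the caller.
from collections import deque

POOL_SIZE = 8
BUF_SIZE = 4096

free_bufs = deque(bytearray(BUF_SIZE) for _ in range(POOL_SIZE))

def acquire() -> bytearray:
    return free_bufs.popleft()   # O(1), no allocation (raises if exhausted)

def release(buf: bytearray) -> None:
    free_bufs.append(buf)        # return for reuse instead of freeing

buf = acquire()
buf[:5] = b"hello"               # use the buffer on the fast path
release(buf)
print(len(free_bufs))  # 8: the buffer went back to the pool
```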

Optimizing your code
■Optimizing code can reduce latency
●Reducing CPU cycles, cache misses, and so on is needed for low latency

■Find bottlenecks with a profiler, optimize, and repeat the process.

■Beware of optimizing one thing at the expense of another.
●For example, batching can reduce CPU cycles but increase latency.

Avoid CPU-intensive computation
■CPU-intensive computation can hurt latency.

■Avoid long-running tasks by splitting the work.
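A sketch of the work-splitting idea: process a large input in fixed-size chunks and yield control between chunks, so latency-sensitive work can run in between instead of waiting for the whole task (the chunk size is arbitrary):

```python
# Split a long-running computation into chunks so other work can be
# interleaved between them instead of waiting behind the whole task.
def chunked_sum(data, chunk_size=1000):
    total = 0
    for start in range(0, len(data), chunk_size):
        total += sum(data[start:start + chunk_size])
        yield None          # yield control: an event loop could run here
    yield total             # final result after the last chunk

steps = list(chunked_sum(list(range(10_000))))
print(steps[-1])  # 49995000: same answer, delivered in resumable slices
```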

Avoiding waiting

Avoiding waiting
■Eliminate synchronization

■Use wait-free synchronization

■Don’t wait for the OS

■Don’t wait for the network

Eliminate synchronization
■Synchronization such as mutual exclusion means threads wait

■Partition data to eliminate synchronization
●Thread-per-core is effective because CPUs run independently of each other.

■Make shared data structures read-only when possible
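The partitioning idea can be sketched as hash-routing each key to a fixed shard so that only one worker ever touches it; no locks are needed because no data is shared. The shard count and key format here are illustrative:

```python
# Partition keys across workers so each worker owns its shard exclusively.
import zlib

NUM_SHARDS = 4
shards = [dict() for _ in range(NUM_SHARDS)]   # one private dict per worker

def shard_of(key: str) -> int:
    # Stable hash (unlike Python's seeded hash()) so the same key
    # always routes to the same worker.
    return zlib.crc32(key.encode()) % NUM_SHARDS

def put(key: str, value) -> None:
    shards[shard_of(key)][key] = value         # only the owning shard is touched

put("user:1", "alice")
put("user:2", "bob")
print(shard_of("user:1") == shard_of("user:1"))  # True: routing is stable
```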

Use wait-free synchronization
■Use wait-free synchronization if you can’t partition data.

■Wait-free = every operation completes in a finite number of steps, no waiting

■For example, single-producer, single-consumer queues are a great low-level
primitive for low latency request processing.
Herlihy. (1991) Wait-free synchronization. TOPLAS
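A structural sketch of a bounded single-producer/single-consumer ring buffer. This is illustrative only: a production SPSC queue (in C, C++, or Rust) relies on atomic head/tail indices with acquire/release memory ordering, which Python does not expose; what the sketch shows is the single-writer-per-index discipline that makes the design work.

```python
# Bounded SPSC ring buffer: the producer only ever writes `head`, the
# consumer only ever writes `tail`, so neither side blocks the other.
class SpscQueue:
    def __init__(self, capacity: int):
        self.buf = [None] * capacity
        self.capacity = capacity
        self.head = 0   # written only by the producer
        self.tail = 0   # written only by the consumer

    def push(self, item) -> bool:
        if self.head - self.tail == self.capacity:
            return False                     # full: caller retries, never blocks
        self.buf[self.head % self.capacity] = item
        self.head += 1                       # publish after the write
        return True

    def pop(self):
        if self.tail == self.head:
            return None                      # empty: caller retries, never blocks
        item = self.buf[self.tail % self.capacity]
        self.tail += 1
        return item

q = SpscQueue(2)
print(q.push(1), q.push(2), q.push(3))  # True True False (queue is full)
print(q.pop(), q.pop(), q.pop())        # 1 2 None (queue is empty)
```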

Don’t wait for the OS
■Avoid context switching
●Don’t create too many threads, and go easy on system calls.

■Use non-blocking I/O
●Blocking a kernel thread on I/O is a guaranteed source of bad tail latency.

■Use busy-polling (if energy is not a concern)
●It can be faster to poll for an event than to wait for it.

■Bypass the kernel (if you can)
●For example, XDP and DPDK provide ways to bypass the OS network stack for lower latency.
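A minimal non-blocking I/O sketch using Python’s standard `selectors` module over a local socket pair; a real server would multiplex many connections through the same loop instead of parking one thread per socket:

```python
# Non-blocking I/O: instead of blocking a thread on read(), register the
# socket with the OS event mechanism (epoll/kqueue) and read only when ready.
import selectors
import socket

rd, wr = socket.socketpair()
rd.setblocking(False)            # reads must never park the thread

sel = selectors.DefaultSelector()
sel.register(rd, selectors.EVENT_READ)

wr.send(b"ping")                 # make the read side readable

events = sel.select(timeout=1.0) # returns as soon as data is waiting
for key, _mask in events:
    data = key.fileobj.recv(4096)
    print(data)                  # b'ping'

sel.close(); rd.close(); wr.close()
```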

Don’t wait for the network
■Disable Nagle’s algorithm (use TCP_NODELAY)

■Avoid head-of-line blocking
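Setting TCP_NODELAY is one line on a standard socket. A sketch with Python’s `socket` module (no connect target is shown; this only demonstrates the option itself):

```python
# Disable Nagle's algorithm so small writes go out immediately instead of
# being buffered while waiting for an ACK of previously sent data.
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

# Verify the option took effect on this socket:
print(sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0)  # True

sock.close()
```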


Hiding latency

Hiding latency
■Parallelize request processing
●Perform request processing in parallel instead of serially to reduce latency.

■Hedge requests
●Send the request to multiple servers and use the result from the fastest one.

■Light-weight threads
●For example, GPUs hide latency by executing massive numbers of light-weight threads in
parallel, which hides memory access latency.
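A sketch of request hedging with Python threads: fire the primary, and if it has not answered within a hedge delay, fire a backup and take whichever reply arrives first. The `call_replica` function and its sleep durations simulate a slow primary and are purely illustrative:

```python
# Hedged request: start a backup copy of the request if the primary is slow,
# and return whichever reply arrives first.
import concurrent.futures as cf
import time

def call_replica(name: str, delay_s: float) -> str:
    time.sleep(delay_s)              # stand-in for a network round trip
    return name

HEDGE_AFTER_S = 0.05                 # hedge delay, e.g. the observed P95

with cf.ThreadPoolExecutor(max_workers=2) as pool:
    primary = pool.submit(call_replica, "primary", 0.5)   # slow today
    done, _ = cf.wait([primary], timeout=HEDGE_AFTER_S)
    if not done:                     # primary missed the hedge deadline
        backup = pool.submit(call_replica, "backup", 0.01)
        done, _ = cf.wait([primary, backup], return_when=cf.FIRST_COMPLETED)
    winner = done.pop().result()
    print(winner)                    # "backup": the hedge saved the request
```

Note the cost: hedging trades extra load (duplicate requests) for a shorter tail, so the hedge delay is usually set near a high percentile rather than firing both copies immediately.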


Tuning the system

Tuning the system
■Configure CPU frequency scaling

■Isolate CPUs for application threads

■Disable swap

■Configure network stack interrupt affinity


Summary

tl;dr
■Latency is a distribution; measure and visualize it as such, and watch out for
coordinated omission.

■Avoid data movement, work, and waiting to reduce latency.

■Hide latency if you can’t reduce it.

■Tune your system for low latency.

Thank you! Let’s connect.
Pekka Enberg
penberg@iki.fi
@penberg
http://penberg.org
50% discount:
P992024