Building for low latency is important, but the tips and tricks are often part of developer folklore and hard to discover on your own. This talk shares some of the important latency-related patterns you want to know when working on low-latency apps.
A ScyllaDB Community
Patterns of Low Latency
Pekka Enberg
Founder/CTO at Turso
■Previously ScyllaDB and Linux
■P99 happens often; the conference should be P99.999
■Why is latency important?
■Measuring latency
■Reducing latency
■Hiding latency
■Tuning the system
■Summary
Outline
Why is latency important?
Low latency = good customer experience
■Bad latency often means unhappy customers.
■It’s very easy to write code with bad tail latency.
■Bad latency in just one component can affect many users.
■Tail latency is the high percentiles (e.g. 99th) of the latency distribution.
■Users experience tail latency fairly often because latency compounds.
■Example: If you fan out processing to 10 components and wait for all of them to complete, about 10% of user requests experience the 99th percentile latency of a component.
●P(at least one slow) = 1 - P(fast request)ⁿ = 1 - (1 - P(slow request))ⁿ
Users experience tail latency often
Dean and Barroso. (2013) The Tail at Scale. Communications of the ACM
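The compounding arithmetic on the slide can be checked in a few lines. This sketch (not from the talk) just evaluates the formula above for the fan-out example:

```rust
// P(at least one of n components is slow) = 1 - (1 - p_slow)^n
fn p_any_slow(p_slow: f64, n: i32) -> f64 {
    1.0 - (1.0 - p_slow).powi(n)
}

fn main() {
    // Fan out to 10 components, each slow for 1% of requests:
    let p = p_any_slow(0.01, 10);
    assert!((p - 0.0956).abs() < 1e-3); // roughly 10% of user requests
    println!("{:.1}% of requests see at least one slow component", p * 100.0);
}
```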
Measuring latency
Primer on measuring latency
■Latency is a distribution so measure it as such.
■Average latency is not an interesting metric (it’s the inverse of throughput).
■Maximum latency is an interesting metric, but hard to optimize.
■The 99th percentile latency and beyond is a good compromise.
■In coordinated omission, the benchmark accidentally coordinates with the system being measured.
■As a result, outliers in the latency distribution are not measured.
Beware coordinated omission
Ivan Prisyazhynyy On Coordinated Omission (2021)
■Visualizing latency is important
■Histograms are good…
■…but eCDFs can be even better!
Visualizing latency
Marc Brooker Histogram vs eCDF (2022)
Examples
Example latency histogram: the x-axis shows percentiles and the y-axis shows latency.
Example latency eCDF: the x-axis shows latency and the y-axis shows cumulative probability.
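As a sketch of what an eCDF is (my illustration, not from the slides): sort the samples and give the i-th smallest the cumulative probability (i + 1) / n; reading a percentile is then a scan for the first point at or above it:

```rust
// Build an empirical CDF (eCDF) from raw latency samples.
fn ecdf(samples: &[f64]) -> Vec<(f64, f64)> {
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let n = sorted.len() as f64;
    sorted
        .into_iter()
        .enumerate()
        .map(|(i, x)| (x, (i as f64 + 1.0) / n)) // (latency, cumulative probability)
        .collect()
}

// Read a quantile off the curve: first point whose probability reaches q.
fn quantile(curve: &[(f64, f64)], q: f64) -> f64 {
    curve.iter().find(|&&(_, p)| p >= q).expect("q out of range").0
}

fn main() {
    let samples = [12.0, 1.0, 2.0, 1.5, 1.2, 300.0, 1.1, 1.3, 2.5, 1.8];
    let curve = ecdf(&samples);
    // The median looks harmless; the tail tells the real story.
    println!("p50 = {} ms", quantile(&curve, 0.5));
    println!("p90 = {} ms", quantile(&curve, 0.9));
}
```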
Reducing latency
■Latency lurks everywhere.
■To reduce latency:
●Avoid data movement
●Avoid work
●Avoid waiting
Reducing latency
Avoiding data movement
Avoiding data movement
■Moving data is slow:
●Network round-trip between New York and London is at least 57 ms.
●Data center network round trip is about 100-500 μs.
●DRAM access latency is 100 ns.
■Move data where it is used:
●Colocation
●Replication
●Caching
■Colocation is a technique to reduce latency by moving two components close
to each other.
■For example, move the database onto the same machine as your application logic to eliminate network round-trip latency.
■Colocation is not always possible, unless you have multiple copies…
Colocation
■Replication and caching are techniques to reduce latency by having a copy of
the data close to where it is used.
■Both techniques have their pros and cons around consistency.
■Often a good technique to reduce latency if you can deal with the storage
amplification.
Replication and caching
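A minimal read-through cache illustrates the trade-off: serve from the local copy when possible, fall back to the slow remote source and keep a copy. The `fetch_remote` closure and the key/value names are hypothetical stand-ins for a real remote lookup:

```rust
use std::collections::HashMap;

// Read-through cache sketch: the fast path avoids data movement entirely.
struct Cache {
    local: HashMap<String, String>,
}

impl Cache {
    fn get(&mut self, key: &str, fetch_remote: impl Fn(&str) -> String) -> String {
        if let Some(v) = self.local.get(key) {
            return v.clone(); // fast path: local copy, no network round trip
        }
        let v = fetch_remote(key); // slow path: remote round trip
        self.local.insert(key.to_string(), v.clone()); // keep a copy for next time
        v
    }
}

fn main() {
    let mut cache = Cache { local: HashMap::new() };
    let first = cache.get("user:1", |_| "alice".to_string()); // goes remote
    let second = cache.get("user:1", |_| unreachable!());     // served locally
    assert_eq!(first, second);
}
```

The cons from the slide apply directly: the local copy can go stale (consistency) and every copy costs storage (amplification).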
Avoiding work
■You can reduce latency by doing less:
●Tame algorithmic complexity
●Control memory management
●Optimize your code
●Avoid CPU-intensive computation
Avoiding work
■Understand if the algorithm you use is suitable for low latency:
●O(1) algorithm is (probably) fine.
●O(n^2) algorithm and worse is (probably) not fine.
■Data structures you likely see in low latency code:
●Queues and stacks
●Arrays
●Hash tables
■Data structures you probably won’t see in low latency code:
●Linked lists
●Graphs
Taming algorithmic complexity
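A small sketch of the point in practice (names made up for illustration): build a hash index once instead of scanning per request, turning an O(n) lookup into O(1):

```rust
use std::collections::HashMap;

// Pay O(n) once at startup instead of on every request.
fn build_index(users: &[(u64, &'static str)]) -> HashMap<u64, &'static str> {
    users.iter().copied().collect()
}

fn main() {
    let users = [(1u64, "alice"), (2, "bob"), (3, "carol")];

    // O(n) scan on every request:
    let slow = users.iter().find(|(id, _)| *id == 3).map(|&(_, name)| name);

    // O(1) hash lookup after indexing once:
    let index = build_index(&users);
    let fast = index.get(&3).copied();

    assert_eq!(slow, fast);
}
```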
■Avoid dynamic memory management
●Allocating memory in the fast path is a likely source of latency outliers.
●You can do low latency with a pauseless GC, but you still need to avoid allocating objects.
■Avoid demand paging
●Virtual memory gives the illusion of a memory space as large as the disk.
●This means virtual memory (in the worst case) runs at disk speed.
●Keep hot memory resident (e.g. with mlock) to avoid page faults.
Controlling memory management
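One common way to keep the allocator out of the fast path is a preallocated buffer pool; a minimal sketch (a real pool would handle exhaustion without panicking):

```rust
// All allocation happens at startup; the request loop only recycles buffers.
struct BufferPool {
    free: Vec<Vec<u8>>,
}

impl BufferPool {
    fn new(count: usize, size: usize) -> Self {
        Self {
            free: (0..count).map(|_| Vec::with_capacity(size)).collect(),
        }
    }
    fn acquire(&mut self) -> Vec<u8> {
        self.free.pop().expect("pool exhausted") // no allocator call here
    }
    fn release(&mut self, mut buf: Vec<u8>) {
        buf.clear(); // drop the contents, keep the capacity
        self.free.push(buf);
    }
}

fn main() {
    let mut pool = BufferPool::new(2, 4096);
    let mut buf = pool.acquire();
    buf.extend_from_slice(b"request payload"); // fits in preallocated capacity
    pool.release(buf);
    assert_eq!(pool.free.len(), 2); // buffer returned, nothing new allocated
}
```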
■Optimizing code can reduce latency
●Reducing CPU cycles, cache misses, and so on is needed for low latency
■Find bottlenecks with a profiler, optimize, and repeat the process.
■Beware of optimizing work at the expense of something else.
●For example, batching can reduce CPU cycles, but increase latency.
Optimizing your code
■CPU-intensive computation can hurt latency.
■Avoid long-running tasks by splitting the work.
Avoid CPU-intensive computation
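A sketch of splitting work (my illustration; the checksum stands in for any CPU-heavy loop): process the input in small chunks and yield between them, so a latency-sensitive task never sits behind one monolithic burst of computation:

```rust
// Chunked processing: bounded CPU work between yield points.
fn checksum_in_chunks(data: &[u8], chunk_size: usize) -> u64 {
    let mut sum: u64 = 0;
    for chunk in data.chunks(chunk_size) {
        let part: u64 = chunk.iter().map(|&b| b as u64).sum();
        sum = sum.wrapping_add(part);
        // Yield between chunks; in an async runtime this would be a
        // `yield_now().await` so other tasks get to run.
        std::thread::yield_now();
    }
    sum
}

fn main() {
    assert_eq!(checksum_in_chunks(&[1, 2, 3, 4, 5], 2), 15);
}
```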
Avoiding waiting
■Eliminate synchronization
■Use wait-free synchronization
■Don’t wait for the OS
■Don’t wait for the network
Avoiding waiting
■Synchronization such as mutual exclusion means threads wait
■Partition data to eliminate synchronization
●Thread-per-core is effective because CPUs run independently of each other.
■Make shared data structures read-only when possible
Eliminate synchronization
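A toy sketch of partitioning (simplified: real thread-per-core designs pin threads to CPUs and route by key hash; here routing is a plain modulo): each worker owns its shard outright, so no locks are needed:

```rust
use std::thread;

// Route each key to exactly one worker; each worker touches only its shard.
fn shard_counts(keys: &[u64], workers: usize) -> Vec<usize> {
    let handles: Vec<_> = (0..workers)
        .map(|w| {
            // Partition up front: this worker's keys only.
            let mine: Vec<u64> = keys
                .iter()
                .copied()
                .filter(|k| (*k as usize) % workers == w)
                .collect();
            // No shared mutable state, hence no mutex.
            thread::spawn(move || mine.len())
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    let counts = shard_counts(&[0, 1, 2, 3, 4, 5], 2);
    assert_eq!(counts, vec![3, 3]); // evens on worker 0, odds on worker 1
}
```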
■Use wait-free synchronization if you can’t partition data.
■Wait-free = finite steps, no waiting
■For example, single-producer, single-consumer (SPSC) queues are a great low-level primitive for low latency request processing.
Use wait-free synchronization
Herlihy. (1991) Wait-free synchronization. TOPLAS
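A sketch of a Lamport-style SPSC ring buffer (my illustration, restricted to `u64` payloads so it stays in safe Rust; real implementations use `UnsafeCell` for arbitrary types). Both `push` and `pop` complete in a bounded number of steps and return instead of waiting:

```rust
use std::sync::atomic::{AtomicU64, AtomicUsize, Ordering};

// Wait-free SPSC queue: producer owns `tail`, consumer owns `head`.
// Counters grow monotonically (they would wrap only at usize overflow).
struct Spsc {
    buf: Vec<AtomicU64>,
    head: AtomicUsize, // consumer position
    tail: AtomicUsize, // producer position
}

impl Spsc {
    fn new(capacity: usize) -> Self {
        Self {
            buf: (0..capacity).map(|_| AtomicU64::new(0)).collect(),
            head: AtomicUsize::new(0),
            tail: AtomicUsize::new(0),
        }
    }
    // Producer side: returns false (instead of waiting) when full.
    fn push(&self, v: u64) -> bool {
        let tail = self.tail.load(Ordering::Relaxed); // we own tail
        if tail - self.head.load(Ordering::Acquire) == self.buf.len() {
            return false; // full
        }
        self.buf[tail % self.buf.len()].store(v, Ordering::Relaxed);
        self.tail.store(tail + 1, Ordering::Release); // publish the slot
        true
    }
    // Consumer side: returns None (instead of waiting) when empty.
    fn pop(&self) -> Option<u64> {
        let head = self.head.load(Ordering::Relaxed); // we own head
        if head == self.tail.load(Ordering::Acquire) {
            return None; // empty
        }
        let v = self.buf[head % self.buf.len()].load(Ordering::Relaxed);
        self.head.store(head + 1, Ordering::Release); // free the slot
        Some(v)
    }
}

fn main() {
    let q = Spsc::new(2);
    assert!(q.push(7));
    assert!(q.push(8));
    assert!(!q.push(9)); // full: report it, don't wait
    assert_eq!(q.pop(), Some(7));
    assert_eq!(q.pop(), Some(8));
    assert_eq!(q.pop(), None); // empty: report it, don't wait
}
```

The Release store on `tail` paired with the Acquire load in `pop` is what makes the slot's contents visible before the consumer reads it, and symmetrically for `head`.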
■Avoid context switching
●Don’t create too many threads, be easy on system calls.
■Use non-blocking I/O
●Blocking a kernel thread on I/O guarantees bad tail latency.
■Use busy-polling (if energy is not a concern)
●It can be faster to poll for an event than wait for it.
■Bypass the kernel (if you can)
●For example, XDP and DPDK provide ways to bypass the OS network stack for lower latency.
Don’t wait for the OS
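The busy-polling point can be sketched without any networking (my illustration): spin on a flag instead of blocking in the kernel. Blocking costs a syscall, a context switch, and a wakeup; spinning burns a core but reacts as soon as the event is visible:

```rust
use std::hint;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;

// Busy-poll for an event: never enters the kernel to sleep.
fn wait_busy(ready: &AtomicBool) {
    while !ready.load(Ordering::Acquire) {
        hint::spin_loop(); // hint to the CPU that we are in a spin loop
    }
}

fn main() {
    let ready = Arc::new(AtomicBool::new(false));
    let r = Arc::clone(&ready);
    let producer = thread::spawn(move || r.store(true, Ordering::Release));
    wait_busy(&ready); // returns as soon as the store becomes visible
    producer.join().unwrap();
}
```

As the slide says, this trades energy (and a whole core) for latency, so it only pays off for very short waits.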
■Disable Nagle's algorithm (use TCP_NODELAY)
■Avoid head-of-line blocking
Don’t wait for the network
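Disabling Nagle's algorithm is a one-liner in most socket APIs; a loopback sketch using Rust's standard library (which exposes TCP_NODELAY as `set_nodelay`):

```rust
use std::net::{SocketAddr, TcpListener, TcpStream};

// Nagle's algorithm delays small writes to coalesce them into fewer packets;
// TCP_NODELAY turns that off so small writes go out immediately.
fn connect_no_delay(addr: SocketAddr) -> std::io::Result<TcpStream> {
    let stream = TcpStream::connect(addr)?;
    stream.set_nodelay(true)?; // sets TCP_NODELAY on the socket
    Ok(stream)
}

fn main() -> std::io::Result<()> {
    let listener = TcpListener::bind("127.0.0.1:0")?; // loopback, OS-assigned port
    let stream = connect_no_delay(listener.local_addr()?)?;
    assert!(stream.nodelay()?);
    Ok(())
}
```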
Hiding latency
■Parallelize request processing
●Perform request processing in parallel instead of serially to reduce latency.
■Hedge requests
●Send request to multiple servers and use results from fastest one.
■Light-weight threads
●For example, GPUs hide latency by executing massive amounts of light-weight threads in
parallel, which hides memory access latency.
Hiding latency
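A toy sketch of request hedging (my illustration): fire the same request at two simulated replicas and take whichever answers first. `hedged_request` and the sleep durations are made up; a real implementation would also cancel the loser and usually hedge only after a delay:

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

// Send to both replicas; the first reply on the channel wins.
fn hedged_request() -> u32 {
    let (tx, rx) = mpsc::channel();
    // (simulated latency in ms, answer) for a slow and a fast replica.
    for (delay_ms, answer) in [(50u64, 42u32), (5, 42)] {
        let tx = tx.clone();
        thread::spawn(move || {
            thread::sleep(Duration::from_millis(delay_ms)); // replica "RPC"
            let _ = tx.send(answer); // loser's send is harmlessly ignored
        });
    }
    rx.recv().unwrap() // first reply wins; tail of the slow replica is hidden
}

fn main() {
    assert_eq!(hedged_request(), 42);
}
```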
Tuning the system
■Configure CPU frequency scaling
■Isolate CPUs for application threads
■Disable swap
■Configure network stack interrupt affinity
Tuning the system
Summary
■Latency is a distribution; measure and visualize it as such, and watch out for
coordinated omission.
■Avoid data movement, work, and waiting to reduce latency.