The Tale of Taming TigerBeetle's Tail Latency by Tobias Ziegler
ScyllaDB
32 slides
Oct 17, 2025
About This Presentation
Learn how we reduced TigerBeetle’s tail latency through algorithm engineering. Algorithm engineering goes beyond studying theoretical complexity and considers how algorithms are executed efficiently on modern super-scalar CPUs. Specifically, we will look at Radix Sort and a k-way merge and explore how to implement them efficiently. We then demonstrate how we apply these algorithms incrementally to avoid latency spikes in practice.
Slide Content
A ScyllaDB Community
The Tale of Taming TigerBeetle’s Tail
Tobias Ziegler
Software Engineer
Narrator: Tobias Ziegler (he/him)
■PhD in (Distributed) OLTP Databases
■I love performance engineering
■Sports fanatic & coffee enthusiast
Software Engineer at TigerBeetle
The Villain: TigerBeetle’s Tail
Throughput: 404k tx/s
Batch Latency:
■p50 = 13 ms
■p99 = 103 ms
■p100 = 104 ms
The Protagonist: TigerBeetle
High-Performance Financial Transactions Database (OLTP)
■Debit/Credit primitive instead of SQL (Oasis)
■Reliable, fast, highly available
■Replicated + persistent storage for fault tolerance
Are We I/O Bound?
I/O Is the Bottleneck of the Past
(Figure: network bandwidth and SSD bandwidth growth over time)
CPU Is the New Bottleneck
TODO: Nicer Figure
The Protagonist: TigerBeetle
Single-Core by Design
■Optimized for Updates & Writes
■Power-Law Workloads: Contention Cannot Be Parallelized
“Design is the beauty of turning
constraints into advantages”
So, let us change this!
What happens here?
(Simplified) Processing Model
(Figure: a TXN batch goes through processing into the mutable table)
Algorithm Selection & Design
K-Way Merge: Naive
Merge k sorted runs into one sorted sequence.
Naïve approach:
■Scan all k runs to find the minimum element.
■Repeat until all n elements are merged.
Time complexity: O(n · k)
(Figure: four sorted runs)
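The naive scan-all-runs merge can be sketched in a few lines. This C++ is an illustrative toy with names of my choosing; TigerBeetle itself is written in Zig, so this is not its actual code:

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Naive k-way merge: every output element requires scanning all k runs
// for the minimum head element, giving O(n * k) comparisons overall.
std::vector<int> naive_kway_merge(const std::vector<std::vector<int>>& runs) {
    std::vector<size_t> pos(runs.size(), 0);  // cursor into each run
    std::vector<int> out;
    while (true) {
        int best = -1;
        int best_val = std::numeric_limits<int>::max();
        // Scan every run head: O(k) work per emitted element.
        for (size_t i = 0; i < runs.size(); ++i) {
            if (pos[i] < runs[i].size() && runs[i][pos[i]] < best_val) {
                best_val = runs[i][pos[i]];
                best = static_cast<int>(i);
            }
        }
        if (best < 0) break;  // all runs exhausted
        out.push_back(best_val);
        ++pos[best];
    }
    return out;
}
```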
Merge k sorted runs into one sorted sequence.
Tree-Based approaches:
■Select min-element in log(k).
■Repeat until all n elements are merged.
Time complexity: O(n · log(k))
K-Way Merge: Tree-Based
(Figure: four sorted runs feeding a tournament tree)
Tree of Losers
(Figure: four runs with head elements 1, 2, 4, and 3)
Initializing:
Tree of Losers
(Figure: after initialization, the internal nodes hold the losers 2, 4, and 3; previous winner: 1)
Only one path competes → 1 comparison per level (vs. 2 in a min-heap). Asymptotically optimal.
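To make the "one comparison per level" point concrete, here is a minimal C++ sketch of a tree of losers; the structure and names are mine, not TigerBeetle's (Zig) implementation, and it uses INT_MAX as an exhausted-run sentinel:

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

// Tree of losers: internal nodes store the loser of each match, the
// overall winner sits at tree[0]. After emitting the winner, only the
// path from its leaf to the root is replayed: log2(k) comparisons.
struct LoserTree {
    static constexpr int SENTINEL = std::numeric_limits<int>::max();
    size_t k;
    std::vector<int> tree;              // tree[0] = index of winning run
    std::vector<int> head;              // current head element of each run
    std::vector<size_t> pos;            // cursor into each run
    const std::vector<std::vector<int>>* runs;

    explicit LoserTree(const std::vector<std::vector<int>>& r)
        : k(r.size()), tree(r.size(), -1), head(r.size()),
          pos(r.size(), 0), runs(&r) {
        for (size_t i = 0; i < k; ++i)
            head[i] = r[i].empty() ? SENTINEL : r[i][0];
        for (size_t i = 0; i < k; ++i) replay(i);  // play initial matches
    }

    // Replay matches from leaf `run` up to the root, keeping losers.
    void replay(size_t run) {
        int winner = static_cast<int>(run);
        for (size_t node = (run + k) / 2; node > 0; node /= 2) {
            if (tree[node] < 0) { tree[node] = winner; return; }  // init only
            if (head[tree[node]] < head[winner])
                std::swap(winner, tree[node]);  // previous loser now wins
        }
        tree[0] = winner;
    }

    // Pop the smallest head element, or SENTINEL once all runs are drained.
    int next() {
        int w = tree[0];
        int val = head[w];
        if (val == SENTINEL) return SENTINEL;
        ++pos[w];
        head[w] = pos[w] < (*runs)[w].size() ? (*runs)[w][pos[w]] : SENTINEL;
        replay(static_cast<size_t>(w));  // one comparison per tree level
        return val;
    }
};
```

Note how `replay` touches exactly one node per level: the new head only competes against the stored losers along its own path, which is where the factor-of-two saving over a min-heap comes from.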
Did We Tame The Tail?
Not Really!
Throughput: 398k tx/s
Batch Latency:
■p50 = 13 ms
■p99 = 101 ms
■max = 102 ms
Algorithm Engineering
bridges theory and practice
Map our abstract algorithm onto actual hardware.
■Modern CPUs are quite complex:
●Pipelined, super-scalar, out-of-order, speculative execution
Need a simple mental model to map our algorithm onto the CPU.
■Everyone has a mental model, but how accurate is it?
■To find out, we need two things:
●A representative benchmark (different key and value sizes, …)
●Tooling (perf, Polar Signals, VTune, …)
Methodology & Tooling
CPU Counters
CPUs have lots of performance counters:
■Instructions: machine instructions executed (“retired”)
■Cache/TLB hits/misses: for the different caches and the TLB
■Branch hits/misses: (in)correctly predicted branches
■…
CPU Counters with perf stat
> perf stat ./loser_tree
Ok, so what?
CPU Counters with PerfEvent
■Library that can measure some part of your code
■Normalizes the CPU counters
■This allows you to reason about the algorithm/data structure
> ./loser_tree
Let’s Talk Performance
This is the naive Tree of Losers that we implemented.
TODO perf counters
Optimizing for Cache Misses: Eytzinger Layout
Rule #1: Data must be ready for the CPU to enable instruction-level parallelism!
TODO perf counters
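As an illustration of the Eytzinger (BFS) layout, here is a small C++ sketch: a sorted array is stored as an implicit binary tree (root at index 1, children of i at 2i and 2i+1), so each level of the search lands at a predictable index that can be prefetched. The function names are mine, not TigerBeetle's:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Fill `out` (1-indexed, slot 0 unused) with the Eytzinger layout of
// `sorted` via an in-order walk of the implicit tree.
void eytzinger(const std::vector<int>& sorted, std::vector<int>& out,
               size_t& pos, size_t i = 1) {
    if (i >= out.size()) return;
    eytzinger(sorted, out, pos, 2 * i);      // left subtree
    out[i] = sorted[pos++];                  // this node
    eytzinger(sorted, out, pos, 2 * i + 1);  // right subtree
}

std::vector<int> build_eytzinger(const std::vector<int>& sorted) {
    std::vector<int> t(sorted.size() + 1, 0);
    size_t pos = 0;
    eytzinger(sorted, t, pos);
    return t;
}

// Branch-free lower_bound: descend by pure index arithmetic, then undo
// the trailing right-turns to recover the answer. Returns the Eytzinger
// index of the first element >= key, or 0 if no such element exists.
size_t eytzinger_lower_bound(const std::vector<int>& t, int key) {
    size_t i = 1;
    while (i < t.size())
        i = 2 * i + (t[i] < key);  // no data-dependent branch
    i >>= __builtin_ffsll(static_cast<long long>(~i));
    return i;
}
```

Because level j of the tree occupies indices [2^j, 2^(j+1)), the top levels all share a few cache lines, which is exactly what a repeated tree descent wants.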
Optimizing for TLB Misses: Huge Pages
TLB Misses cost in the order of ~50+ CPU cycles
TODO perf counters
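One way to get huge pages on Linux is transparent huge pages via `madvise(MADV_HUGEPAGE)`; a 2 MiB page covers 512 4 KiB pages, so one TLB entry maps far more of a large merge buffer. A hedged sketch (Linux-specific, best-effort, and not TigerBeetle's actual allocation code; real code would also align the mapping to the huge-page size):

```cpp
#include <sys/mman.h>
#include <cassert>
#include <cstddef>
#include <cstring>

// Allocate an anonymous mapping rounded up to a 2 MiB multiple and hint
// the kernel to back it with transparent huge pages. The hint is
// best-effort: if THP is unavailable, the mapping still works with
// regular 4 KiB pages, so the madvise error is deliberately ignored.
void* alloc_huge(size_t bytes) {
    const size_t huge = 2 * 1024 * 1024;            // typical x86-64 huge page
    size_t len = (bytes + huge - 1) / huge * huge;  // round up
    void* p = mmap(nullptr, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return nullptr;
    madvise(p, len, MADV_HUGEPAGE);  // hint only; errors are non-fatal
    return p;
}
```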
Avoid Branch Mispredictions: Go Branchless
Branch Mispredictions cost around 15-20 cycles.
TODO perf counters
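The branchless idea in a nutshell: compute both outcomes and select with arithmetic, so a mispredictable data-dependent branch becomes a plain data dependency (compilers typically emit a conditional move). An illustrative C++ sketch with names of my choosing:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Select the minimum without a branch: mask is all-ones when a < b,
// all-zeros otherwise, so exactly one operand survives the masking.
inline uint32_t branchless_min(uint32_t a, uint32_t b) {
    uint32_t mask = -static_cast<uint32_t>(a < b);
    return (a & mask) | (b & ~mask);
}

// The same idea in a two-way merge step: advance whichever cursor holds
// the smaller element, without branching on the comparison result.
inline void merge_step(const uint32_t* l, const uint32_t* r,
                       size_t& li, size_t& ri, uint32_t* out, size_t& oi) {
    bool take_left = l[li] <= r[ri];
    out[oi++] = take_left ? l[li] : r[ri];  // compiles to cmov, not a jump
    li += take_left;                        // exactly one cursor advances
    ri += !take_left;
}
```

The trade-off: a conditional move always pays its latency, while a branch is free when predicted correctly, so going branchless only wins when the comparison outcome is effectively random, as it is when merging runs.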
Reduce Instructions: Unrolling and Co.
Make every instruction count!
TODO perf counters
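As a generic illustration of unrolling (not TigerBeetle's code): processing four elements per iteration amortizes the loop-control instructions, and using independent accumulators breaks the serial dependency chain so a super-scalar CPU can overlap the additions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// 4x manually unrolled sum with four independent accumulators.
uint64_t sum_unrolled(const uint32_t* a, size_t n) {
    uint64_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {    // main unrolled loop
        s0 += a[i + 0];
        s1 += a[i + 1];             // separate accumulators let the CPU
        s2 += a[i + 2];             // execute these adds in parallel
        s3 += a[i + 3];
    }
    for (; i < n; ++i) s0 += a[i];  // scalar tail for the remainder
    return s0 + s1 + s2 + s3;
}
```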
Ablation Study and Final Result
TODO perf counters