Building a Fast Lock-free Queue for Trading Systems by Sarthak Sehgal
ScyllaDB
34 slides
Oct 15, 2025
About This Presentation
When every microsecond counts, inter-thread communication must be lean and predictable. This talk dives into the design of a high-performance SPSC bounded queue for ultra-low latency trading systems. We’ll cover eliminating locks with atomics, reducing sync costs with memory ordering, and avoiding traps like false sharing, all backed by real-world measurements and cache analysis. Attendees will gain practical strategies for writing lock-free data structures that deliver under pressure.
Slide Content
A ScyllaDB Community
Building a Fast Lock-free Queue for Trading Systems
Sarthak Sehgal
Tech Lead
Sarthak Sehgal he/him
Tech Lead at Maven Securities
■Working at a high frequency options market making firm
■Interested in finance, low-level programming, and C++ under the hood
■Sometimes, I write about C++ at sartech.substack.com
Overview
■Motivation & Problem Setup
■Building an SPSC queue using std::atomic
■Optimizations
●Memory Ordering
●Cache alignment
●Variable caching
Motivation
■On Nasdaq:
●1 million msgs/s
●Peaks at 3-4 million msgs/s at open/close
Exchange
Source: Stefan Schlamp LinkedIn
Problem Setup
■One producer and one consumer
■Both threads are pinned on separate physical cores
■Fixed size buffer (bounded queue)
Queue using std::atomic
Producer
Consumer
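As a minimal sketch of the producer and consumer sides of a bounded SPSC queue built on std::atomic, the code below uses the default (sequentially consistent) loads and stores. The class name SpscQueue, its members, and the helper run_spsc_demo are hypothetical names chosen for illustration; this is not the speaker's code.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical bounded SPSC ring buffer; all names are illustrative.
// One slot is left unused so that "full" and "empty" are distinguishable.
template <typename T>
class SpscQueue {
public:
    explicit SpscQueue(std::size_t capacity) : buf_(capacity + 1) {}

    // Call from the producer thread only.
    bool try_push(const T& value) {
        const std::size_t w = write_.load();             // seq_cst by default
        const std::size_t next = (w + 1) % buf_.size();
        if (next == read_.load()) return false;          // queue is full
        buf_[w] = value;                                 // write the slot first
        write_.store(next);                              // then publish it
        return true;
    }

    // Call from the consumer thread only.
    bool try_pop(T& out) {
        const std::size_t r = read_.load();
        if (r == write_.load()) return false;            // queue is empty
        out = buf_[r];                                   // read the slot first
        read_.store((r + 1) % buf_.size());              // then hand it back
        return true;
    }

private:
    std::vector<T> buf_;
    std::atomic<std::size_t> read_{0};   // advanced only by the consumer
    std::atomic<std::size_t> write_{0};  // advanced only by the producer
};

// Pushes 0..n-1 on one thread, pops on another, returns the sum received.
long long run_spsc_demo(int n) {
    SpscQueue<int> q(1024);
    long long sum = 0;
    std::thread producer([&] {
        for (int i = 0; i < n; ++i)
            while (!q.try_push(i)) { /* spin until there is room */ }
    });
    std::thread consumer([&] {
        int v = 0;
        for (int i = 0; i < n; ++i) {
            while (!q.try_pop(v)) { /* spin until data arrives */ }
            assert(v == i);  // items arrive in FIFO order
            sum += v;
        }
    });
    producer.join();
    consumer.join();
    return sum;
}
```

The sketch spins when the queue is full or empty, which suits threads pinned to dedicated cores; a general-purpose queue would likely back off or block instead.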
Instruction Execution Order in CPU
Out-of-order execution
If the atomic update and the surrounding memory accesses execute out of order, the queue behaves incorrectly
Aspects of std::atomic
1.Atomicity - operations on the atomic object are indivisible
2.Memory Ordering - determines how memory accesses (atomic and non-atomic) surrounding an atomic operation are ordered. This is used to synchronize memory access across threads.
Fortunately, the default load and store operations use std::memory_order_seq_cst, which ensures that this reordering does not occur. As we will see, the choice of memory ordering has a significant impact on performance.
Memory ordering in atomic
std::memory_order_relaxed
■No ordering constraints for reads or writes around the atomic variable
■Only ensures atomicity
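To make "only ensures atomicity" concrete, here is a small sketch in which two threads bump a shared counter with relaxed ordering. The function name run_relaxed_counter is hypothetical. No increment is lost, because each fetch_add is indivisible, but relaxed ordering says nothing about how other memory accesses are ordered around it, so it cannot be used on its own to publish data to another thread.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Two threads each add `per_thread` to a shared counter using relaxed ordering.
// Hypothetical helper for illustration only.
long run_relaxed_counter(long per_thread) {
    std::atomic<long> counter{0};
    auto work = [&] {
        for (long i = 0; i < per_thread; ++i)
            counter.fetch_add(1, std::memory_order_relaxed);  // atomic, no ordering
    };
    std::thread a(work);
    std::thread b(work);
    a.join();
    b.join();
    // join() already synchronizes with the finished threads, so a relaxed
    // load is enough to read the final value here.
    return counter.load(std::memory_order_relaxed);
}
```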
Optimization: Using relaxed ordering
std::memory_order_acquire / release
Optimization: Using memory ordering
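One plausible way to apply these orderings to an SPSC queue is sketched below; it is an assumption about the shape of the optimization, not a copy of the slides. Each thread loads the index it owns with relaxed ordering, since no other thread writes it, loads the other thread's index with acquire, and publishes its own index with release. The names SpscQueueAcqRel and run_acq_rel_demo are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical SPSC ring buffer using acquire/release instead of seq_cst.
template <typename T>
class SpscQueueAcqRel {
public:
    explicit SpscQueueAcqRel(std::size_t capacity) : buf_(capacity + 1) {}

    // Call from the producer thread only.
    bool try_push(const T& value) {
        // Only the producer writes write_, so a relaxed load is sufficient.
        const std::size_t w = write_.load(std::memory_order_relaxed);
        const std::size_t next = (w + 1) % buf_.size();
        // Acquire pairs with the consumer's release store to read_: the
        // consumer has finished reading any slot it has handed back.
        if (next == read_.load(std::memory_order_acquire)) return false;
        buf_[w] = value;
        // Release makes the slot write visible before the new index is.
        write_.store(next, std::memory_order_release);
        return true;
    }

    // Call from the consumer thread only.
    bool try_pop(T& out) {
        const std::size_t r = read_.load(std::memory_order_relaxed);
        // Acquire pairs with the producer's release store to write_.
        if (r == write_.load(std::memory_order_acquire)) return false;
        out = buf_[r];
        read_.store((r + 1) % buf_.size(), std::memory_order_release);
        return true;
    }

private:
    std::vector<T> buf_;
    std::atomic<std::size_t> read_{0};
    std::atomic<std::size_t> write_{0};
};

// Pushes 0..n-1 on one thread, pops on another, returns the sum received.
long long run_acq_rel_demo(int n) {
    SpscQueueAcqRel<int> q(1024);
    long long sum = 0;
    std::thread producer([&] {
        for (int i = 0; i < n; ++i)
            while (!q.try_push(i)) { /* spin until there is room */ }
    });
    std::thread consumer([&] {
        int v = 0;
        for (int i = 0; i < n; ++i) {
            while (!q.try_pop(v)) { /* spin until data arrives */ }
            assert(v == i);  // items arrive in FIFO order
            sum += v;
        }
    });
    producer.join();
    consumer.join();
    return sum;
}
```

How much this saves depends on the hardware: on strongly ordered CPUs the main cost of a seq_cst store is typically a full fence, which a release store avoids.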
False Sharing
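■False sharing occurs when two or more threads access different variables that are located on the same cache line and at least one of them writes: each write invalidates the line in the other core's cache, even though the threads never touch the same variable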
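A common remedy is to place the producer's and consumer's indices on separate cache lines. The sketch below compares a packed layout with a padded one; the 64-byte line size is an assumption (typical for x86-64, not universal), and the names kCacheLine, PackedIndices, PaddedIndices, packed_gap, and padded_gap are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Assumption: 64-byte cache lines, typical for x86-64 but not guaranteed.
constexpr std::size_t kCacheLine = 64;

// Both indices are adjacent and will usually share one cache line.
struct PackedIndices {
    std::atomic<std::size_t> read{0};
    std::atomic<std::size_t> write{0};
};

// Each index starts on its own cache-line boundary.
struct PaddedIndices {
    alignas(kCacheLine) std::atomic<std::size_t> read{0};
    alignas(kCacheLine) std::atomic<std::size_t> write{0};
};

// Distance in bytes between the two index members of a layout.
template <typename Layout>
std::size_t gap_bytes(const Layout& s) {
    const char* r = reinterpret_cast<const char*>(&s.read);
    const char* w = reinterpret_cast<const char*>(&s.write);
    return static_cast<std::size_t>(w - r);
}

std::size_t packed_gap() {
    PackedIndices p;
    return gap_bytes(p);
}

std::size_t padded_gap() {
    PaddedIndices p;
    return gap_bytes(p);
}
```

With the padded layout the two indices are a full line apart, so the producer's stores to write no longer invalidate the line holding read. The cost is a larger object.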
Results
Check out my slides and blog for the complete code
Resources
■Fedor Pikus’ introductory talk on std::atomic and memory ordering
■Source code of boost spsc lockfree queue
■Herb Sutter’s atomic weapons talk