Building a Fast Lock-free Queue for Trading Systems by Sarthak Sehgal

ScyllaDB, 34 slides, Oct 15, 2025

About This Presentation

When every microsecond counts, inter-thread communication must be lean and predictable. This talk dives into the design of a high-performance SPSC bounded queue for ultra-low latency trading systems. We’ll cover eliminating locks with atomics, reducing sync costs with memory ordering, and avoiding...


Slide Content

A ScyllaDB Community
Building a Fast Lock-free
Queue for Trading Systems
Sarthak Sehgal
Tech Lead

Sarthak Sehgal he/him

Tech Lead at Maven Securities
■Working at a high frequency options market making firm
■Interested in finance, low level programming, and C++ under the hood
■Sometimes, I write about C++ at sartech.substack.com

Overview
■Motivation & Problem Setup
■Building an SPSC queue using std::atomic
■Optimizations
●Memory Ordering
●Cache alignment
●Variable caching

Motivation
■On Nasdaq:
●1 million msgs/s
●Peaks at 3-4 million msgs/s at open/close
Exchange

Source: Stefan Schlamp LinkedIn

Problem Setup
■One producer and one consumer
■Both threads are pinned on separate physical cores
■Fixed size buffer (bounded queue)

Queue using std::atomic

Producer

Consumer
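
The code from the Producer and Consumer slides is not in this transcript. As a stand-in, here is a minimal sketch of a bounded SPSC queue built on std::atomic with the default memory ordering. The class name `SpscQueue`, the member names, and the "one empty slot" convention are assumptions made for illustration, not the speaker's code.

```cpp
#include <atomic>
#include <cstddef>
#include <vector>

// Hypothetical baseline: bounded single-producer/single-consumer ring
// buffer using default (sequentially consistent) atomic loads and stores.
template <typename T>
class SpscQueue {
public:
    // One slot is always left empty so "full" and "empty" can be told apart.
    explicit SpscQueue(std::size_t capacity) : buf_(capacity + 1) {}

    // Called only by the producer thread.
    bool push(const T& value) {
        const std::size_t tail = tail_.load();
        const std::size_t next = (tail + 1) % buf_.size();
        if (next == head_.load()) return false;  // queue is full
        buf_[tail] = value;                      // write the element first
        tail_.store(next);                       // then publish the slot
        return true;
    }

    // Called only by the consumer thread.
    bool pop(T& out) {
        const std::size_t head = head_.load();
        if (head == tail_.load()) return false;  // queue is empty
        out = buf_[head];                        // read the element first
        head_.store((head + 1) % buf_.size());   // then release the slot
        return true;
    }

private:
    std::vector<T> buf_;
    std::atomic<std::size_t> head_{0};  // next slot to read (consumer-owned)
    std::atomic<std::size_t> tail_{0};  // next slot to write (producer-owned)
};
```

Only the producer writes `tail_` and only the consumer writes `head_`, which is why no lock or compare-and-swap is needed in this setup.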

Instruction Execution Order in CPU

Out-of-order execution
If the atomic index update and the memory access to the buffer slot execute out of order, the queue behaves incorrectly

Aspects of std::atomic
1. Atomicity - operations on the atomic object are indivisible
2. Memory Ordering - determines how memory accesses (atomic and non-atomic) surrounding an atomic operation are ordered. This is what synchronizes memory access across threads.
Fortunately, load and store default to std::memory_order_seq_cst, the strongest ordering, which prevents the harmful reordering. As we will uncover, the choice of memory ordering has a significant impact on performance.
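
To make the two aspects concrete, here is a small sketch of a flag-and-payload handoff. The names `ready`, `payload`, `publish`, and `try_consume` are hypothetical. The comments spell out the ordering argument that load() and store() take by default.

```cpp
#include <atomic>

std::atomic<bool> ready{false};  // atomic flag: accesses are indivisible
int payload = 0;                 // plain, non-atomic data guarded by the flag

// Producer side: write the data, then set the flag.
void publish(int value) {
    payload = value;
    // Equivalent to ready.store(true, std::memory_order_seq_cst).
    // The write to payload cannot be reordered after this store.
    ready.store(true);
}

// Consumer side: returns true and fills `out` once the flag is visible.
bool try_consume(int& out) {
    // Equivalent to ready.load(std::memory_order_seq_cst).
    if (!ready.load()) return false;
    // The read of payload cannot be reordered before the load above,
    // so it observes the value written in publish().
    out = payload;
    return true;
}
```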

Memory ordering in atomic

std::memory_order_relaxed
■No ordering constraints for reads or writes around the atomic variable
■Only ensures atomicity
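
A case where relaxed ordering is enough on its own is a statistics counter, where only the total matters and no other data is published through it. This is an illustrative sketch with hypothetical names (`messages_seen`, `on_message`, `total_messages`), not code from the talk.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical message counter. No other memory depends on its value,
// so atomicity alone is sufficient.
std::atomic<std::uint64_t> messages_seen{0};

void on_message() {
    // Indivisible increment, no ordering imposed on surrounding accesses.
    messages_seen.fetch_add(1, std::memory_order_relaxed);
}

std::uint64_t total_messages() {
    return messages_seen.load(std::memory_order_relaxed);
}
```

Relaxed ordering would not be safe for the store that publishes a queue slot, because the consumer relies on that store to make the element write visible.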

Optimization: Using relaxed ordering

std::memory_order_acquire / release

Optimization: Using memory ordering
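
The slide code for this optimization is missing from the transcript. A common way to apply acquire/release and relaxed ordering to an SPSC queue looks roughly like the sketch below. `SpscQueueAcqRel` and `acq_rel_demo` are hypothetical names, and the layout is an assumption rather than the speaker's exact implementation.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

template <typename T>
class SpscQueueAcqRel {
public:
    explicit SpscQueueAcqRel(std::size_t capacity) : buf_(capacity + 1) {}

    bool push(const T& value) {  // producer thread only
        // Relaxed: the producer is the only writer of tail_.
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t next = (tail + 1) % buf_.size();
        // Acquire: pairs with the consumer's release store to head_, so the
        // consumer has finished reading a slot before it is overwritten.
        if (next == head_.load(std::memory_order_acquire)) return false;
        buf_[tail] = value;
        // Release: the element write above is visible before the new tail.
        tail_.store(next, std::memory_order_release);
        return true;
    }

    bool pop(T& out) {  // consumer thread only
        // Relaxed: the consumer is the only writer of head_.
        const std::size_t head = head_.load(std::memory_order_relaxed);
        // Acquire: pairs with the producer's release store to tail_.
        if (head == tail_.load(std::memory_order_acquire)) return false;
        out = buf_[head];
        head_.store((head + 1) % buf_.size(), std::memory_order_release);
        return true;
    }

private:
    std::vector<T> buf_;
    std::atomic<std::size_t> head_{0};
    std::atomic<std::size_t> tail_{0};
};

// Usage sketch: returns the sum seen by the consumer, or -1 if any value
// arrived out of order.
long long acq_rel_demo(int n) {
    SpscQueueAcqRel<int> q(64);
    long long sum = 0;
    bool in_order = true;
    std::thread producer([&q, n] {
        for (int i = 1; i <= n; ++i) {
            while (!q.push(i)) { std::this_thread::yield(); }
        }
    });
    std::thread consumer([&q, &sum, &in_order, n] {
        int v = 0;
        for (int i = 1; i <= n; ++i) {
            while (!q.pop(v)) { std::this_thread::yield(); }
            if (v != i) in_order = false;
            sum += v;
        }
    });
    producer.join();
    consumer.join();
    return in_order ? sum : -1;
}
```

Each thread reads its own index with relaxed ordering and the other thread's index with acquire. The release store on the index is what makes the preceding buffer access visible to the other side.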

False Sharing
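■False sharing occurs when two or more threads access different variables that are located on the same cache line and at least one thread writes. Each write invalidates the line in the other core's cache, even though no data is actually shared.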
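
For the queue, the two indices are the usual victims: the producer writes one and the consumer writes the other. A minimal sketch of the cache-alignment fix follows, assuming a 64-byte cache line (common on x86-64, but an assumption here). The struct names and `kCacheLine` are hypothetical.

```cpp
#include <atomic>
#include <cstddef>

// Assumption: 64-byte cache lines. Adjust for the target CPU.
constexpr std::size_t kCacheLine = 64;

// Both indices are adjacent, so they typically share one cache line and
// the producer and consumer keep invalidating each other's copy.
struct PackedIndices {
    std::atomic<std::size_t> head{0};
    std::atomic<std::size_t> tail{0};
};

// Each index starts on its own cache line, so a write to one does not
// invalidate the line holding the other.
struct PaddedIndices {
    alignas(kCacheLine) std::atomic<std::size_t> head{0};
    alignas(kCacheLine) std::atomic<std::size_t> tail{0};
};

static_assert(alignof(PaddedIndices) == kCacheLine,
              "struct must start on a cache-line boundary");
static_assert(sizeof(PaddedIndices) >= 2 * kCacheLine,
              "head and tail must occupy separate cache lines");
```

C++17 also defines std::hardware_destructive_interference_size in <new> as a portable hint for this value, though compiler support for it varies.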

Results

Check out my slides and blog for the complete code

Resources
■Fedor Pikus’ introductory talk on std::atomic and memory ordering
■Source code of boost spsc lockfree queue
■Herb Sutter’s atomic weapons talk

Thank you! Let’s connect.
Sarthak Sehgal
linkedin/sarthaksehgal99
sartech.substack.com