Concurrency Testing using Custom Linux Schedulers by Jake Hillion & Johannes Bechberger
ScyllaDB
32 slides
Oct 13, 2025
About This Presentation
New features of the Linux kernel allow us to develop our own Linux schedulers. This talk shows how anyone can write a basic Linux scheduler and use it, for example, to fuzz for concurrency bugs or optimize for specific workloads.
Slide Content
A ScyllaDB Community
Jake Hillion
Software Engineer
Johannes Bechberger
OpenJDK Developer
Concurrency Testing using
Custom Linux Schedulers
Heisenbugs
Producers and consumers
Producer Consumer
Best-before Best-before
Crashes on expiry - even in the Linux kernel!
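The best-before idea can be sketched in plain Java. This is only a toy model, assuming each item carries a deadline and that consuming an expired item counts as the crash; the names `BestBeforeQueue`, `Item`, `produce` and `consume` are hypothetical, and time is passed in explicitly as a virtual clock so the behaviour is deterministic.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model (hypothetical names): items expire, and consuming an expired item is the "crash". */
final class BestBeforeQueue {
    /** An item plus the virtual time after which it must no longer be consumed. */
    record Item(String payload, long bestBeforeMs) {}

    private final Deque<Item> items = new ArrayDeque<>();

    /** Producer side: stamp the item with a deadline relative to the current virtual time. */
    void produce(String payload, long nowMs, long shelfLifeMs) {
        items.addLast(new Item(payload, nowMs + shelfLifeMs));
    }

    /** Consumer side: throws if the consumer only got to run after the deadline. */
    String consume(long nowMs) {
        Item item = items.removeFirst();
        if (nowMs > item.bestBeforeMs()) {
            throw new IllegalStateException("expired: " + item.payload());
        }
        return item.payload();
    }

    public static void main(String[] args) {
        BestBeforeQueue queue = new BestBeforeQueue();
        queue.produce("a", 0, 10);
        System.out.println(queue.consume(5));      // consumer was scheduled in time
        queue.produce("b", 5, 10);
        try {
            queue.consume(40);                     // consumer was delayed past the deadline
        } catch (IllegalStateException e) {
            System.out.println("crash: " + e.getMessage());
        }
    }
}
```

Whether the second `consume` call fails depends only on when the consumer gets to run, which is why such bugs tend to disappear under a well-behaved scheduler.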
Scheduling
Work-Conserving
Responsive
Fair
This is predictable.
We need an erratic scheduler!
Not Work-Conserving
Not Responsive
Maybe Fair
This is not predictable.
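To make the contrast concrete, here is a small simulation in plain Java. It is a sketch under stated assumptions, not how scx_chaos works: a strict rotation stands in for the predictable scheduler, and a seeded random pick with random hold-back delays stands in for the erratic one. All names (`ErraticSchedulerDemo`, `maxWaitRoundRobin`, `maxWaitErratic`) and the tick-based time model are hypothetical.

```java
import java.util.Random;

/** Toy tick-based comparison of a predictable and an erratic scheduling policy. */
final class ErraticSchedulerDemo {
    /** Longest number of ticks any task waited between two runs under strict rotation. */
    static int maxWaitRoundRobin(int tasks, int ticks) {
        int[] lastRun = new int[tasks];
        int maxWait = 0;
        for (int tick = 1; tick <= ticks; tick++) {
            int task = (tick - 1) % tasks;           // every task runs once per round
            maxWait = Math.max(maxWait, tick - lastRun[task]);
            lastRun[task] = tick;
        }
        return maxWait;
    }

    /** Same measurement when tasks are picked at random and held back for random delays. */
    static int maxWaitErratic(int tasks, int ticks, int maxDelay, long seed) {
        Random rng = new Random(seed);
        int[] lastRun = new int[tasks];
        int[] blockedUntil = new int[tasks];         // first tick at which a task may run again
        int maxWait = 0;
        for (int tick = 1; tick <= ticks; tick++) {
            int task = rng.nextInt(tasks);           // random pick instead of a rotation
            if (tick < blockedUntil[task]) {
                continue;                            // CPU idles: not work-conserving
            }
            maxWait = Math.max(maxWait, tick - lastRun[task]);
            lastRun[task] = tick;
            blockedUntil[task] = tick + 1 + rng.nextInt(maxDelay + 1);  // random extra delay
        }
        return maxWait;
    }

    public static void main(String[] args) {
        System.out.println("round robin max wait: " + maxWaitRoundRobin(4, 1000));
        System.out.println("erratic max wait:     " + maxWaitErratic(4, 1000, 20, 42L));
    }
}
```

Under rotation the longest wait equals the number of tasks; under the erratic policy it depends on the seed, which is the kind of timing variation that exposes hidden assumptions.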
Building our scheduler
“eBPF is a crazy technology, it’s like putting JavaScript into the Linux kernel”
-Brendan Gregg
https://www.facesofopensource.com/brendan-gregg/
sched_ext
Ease of experimentation and exploration
Customization
Rapid scheduler deployments
Let's create our own
Scheduler dance
[Diagram: tasks T0 and T1, the scheduler, a global queue, and one local queue each for CPU 1 and CPU 2]
@BPF(license = "GPL")
public abstract class SampleScheduler
        extends BPFProgram implements Scheduler {

    // ID of the single dispatch queue (DSQ) shared by all CPUs
    static final long SHARED_DSQ_ID = 0;

    // Create the shared queue; -1 means no specific NUMA node
    @Override
    public int init() {
        return scx_bpf_create_dsq(SHARED_DSQ_ID, -1);
    }

    // A task became runnable: append it to the shared queue
    // with a time slice of 5_000_000 ns (5 ms)
    @Override
    public void enqueue(Ptr<task_struct> p,
            long enq_flags) {
        scx_bpf_dispatch(p, SHARED_DSQ_ID,
                5_000_000, enq_flags);
    }

    // A CPU needs work: move the next task from the shared queue
    // to that CPU's local queue
    @Override
    public void dispatch(int cpu,
            Ptr<task_struct> prev) {
        scx_bpf_consume(SHARED_DSQ_ID);
    }
}
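The flow between the shared queue and the per-CPU local queues can also be modelled in plain Java, without eBPF or sched_ext. The following is a hedged sketch: `FifoSchedulerModel`, `runNext` and the use of strings as tasks are hypothetical stand-ins, and the model ignores time slices and preemption entirely.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Plain-Java model of a FIFO scheduler with one shared queue and per-CPU local queues. */
final class FifoSchedulerModel {
    private final Deque<String> sharedQueue = new ArrayDeque<>();       // stands in for the shared DSQ
    private final List<Deque<String>> localQueues = new ArrayList<>();  // one local queue per CPU

    FifoSchedulerModel(int cpus) {
        for (int i = 0; i < cpus; i++) {
            localQueues.add(new ArrayDeque<>());
        }
    }

    /** Mirrors enqueue(): a runnable task is appended to the shared queue. */
    void enqueue(String task) {
        sharedQueue.addLast(task);
    }

    /** Mirrors dispatch(): move the oldest shared task to this CPU's local queue, if any. */
    boolean dispatch(int cpu) {
        String task = sharedQueue.pollFirst();
        if (task == null) {
            return false;
        }
        localQueues.get(cpu).addLast(task);
        return true;
    }

    /** The CPU runs the head of its local queue, refilling it first when empty (null if idle). */
    String runNext(int cpu) {
        Deque<String> local = localQueues.get(cpu);
        if (local.isEmpty() && !dispatch(cpu)) {
            return null;
        }
        return local.pollFirst();
    }

    public static void main(String[] args) {
        FifoSchedulerModel model = new FifoSchedulerModel(2);
        model.enqueue("T0");
        model.enqueue("T1");
        System.out.println("CPU 1 runs " + model.runNext(0));
        System.out.println("CPU 2 runs " + model.runNext(1));
    }
}
```

Tasks leave the shared queue in arrival order regardless of which CPU asks, which is what makes this policy a simple global FIFO.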
Fixing a kernel driver scheduling bug with scx_chaos
int hsmp_send_message(struct hsmp_message *msg) {
    take_per_socket_lock();
    msg_socket->send_request(msg);
    while (!timed_out) {
        resp = msg_socket->gather_response();
        if (resp.ready)
            break;
        usleep_range(100, 2000); // sleep for 100-2000 micros
    }
    release_per_socket_lock();
}
Defensive code gone wrong
int hsmp_send_message(struct hsmp_message *msg) {
    take_per_socket_lock();
    msg_socket->send_request(msg);
    while (true) {
        resp = msg_socket->gather_response();
        if (resp.ready)
            break;
        if (timed_out)
            break;
        usleep_range(100, 2000); // sleep for 100-2000 micros
    }
    release_per_socket_lock();
}
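The difference between the two loop orders can be reproduced with a virtual clock in plain Java. This is a sketch under assumptions, not the driver code: `TimeoutLoopModel`, `buggySend`, `fixedSend` and all the microsecond values are hypothetical, every sleep is assumed to take the same `sleepUs` (which must be positive), and a very large `sleepUs` stands in for a thread that the scheduler keeps off the CPU.

```java
/** Virtual-clock model of a poll loop with a timeout, in the two check orders shown above. */
final class TimeoutLoopModel {
    /** Timeout is tested before the response is looked at again after a sleep. */
    static boolean buggySend(long readyAtUs, long timeoutUs, long sleepUs) {
        long now = 0;
        boolean ready = false;
        while (now < timeoutUs) {            // while (!timed_out)
            ready = now >= readyAtUs;        // gather_response()
            if (ready) {
                break;
            }
            now += sleepUs;                  // sleep, possibly stretched by the scheduler
        }
        return ready;
    }

    /** Response is always checked once more after waking up, then the timeout. */
    static boolean fixedSend(long readyAtUs, long timeoutUs, long sleepUs) {
        long now = 0;
        while (true) {
            if (now >= readyAtUs) {          // gather_response()
                return true;
            }
            if (now >= timeoutUs) {          // timed_out
                return false;
            }
            now += sleepUs;
        }
    }

    public static void main(String[] args) {
        // Response is ready after 500 us, timeout is 100_000 us, one sleep lasts 200_000 us.
        System.out.println("buggy: " + buggySend(500, 100_000, 200_000));
        System.out.println("fixed: " + fixedSend(500, 100_000, 200_000));
    }
}
```

With short sleeps both variants agree; they only diverge when a single sleep overshoots the timeout, which a normal scheduler rarely produces and an erratic one produces on purpose.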
Reproduction
“Normally we just add sleeps and hope for the best”
-Anonymous kernel developer
Demo
scx_chaos helps to find bugs due to invalid concurrency assumptions.