Concurrency Testing using Custom Linux Schedulers by Jake Hillion & Johannes Bechberger

ScyllaDB, 32 slides, Oct 13, 2025

About This Presentation

New features of the Linux kernel allow us to develop our own Linux schedulers. This talk shows how anyone can write a basic Linux scheduler and use it, for example, to fuzz for concurrency bugs or optimize for specific workloads.


Slide Content

A ScyllaDB Community
Jake Hillion
Software Engineer
Johannes Bechberger
OpenJDK Developer
Concurrency Testing using Custom Linux Schedulers

Heisenbugs

Producers and consumers
Producer Consumer
Best-before Best-before

Crashes on expiry - even in the Linux kernel!
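
The producer/consumer slides can be modelled in a few lines of Java. This is only a sketch under assumptions: the class BestBeforeSketch, the Item type, the one-second shelf life, and the idea of passing the consumer's delay in as a number are all hypothetical and chosen for illustration. It shows why such a program only crashes when the consumer is scheduled late.

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class BestBeforeSketch {
    // Assumed shelf life of an item; the slides do not give a value.
    static final long BEST_BEFORE_MILLIS = 1_000;

    // An item remembers when it was produced (hypothetical type).
    static final class Item {
        final long producedAtMillis;

        Item(long producedAtMillis) {
            this.producedAtMillis = producedAtMillis;
        }
    }

    // The consumer's invariant: an item must be consumed before it expires.
    static void consume(Item item, long nowMillis) {
        long age = nowMillis - item.producedAtMillis;
        if (age > BEST_BEFORE_MILLIS) {
            throw new IllegalStateException("item expired, age " + age + " ms");
        }
    }

    // Returns true if the consumer survives when it only gets to run
    // delayMillis after the producer enqueued the item.
    public static boolean survives(long delayMillis) {
        Queue<Item> queue = new ArrayDeque<>();
        queue.add(new Item(0)); // producer enqueues at t = 0
        try {
            consume(queue.remove(), delayMillis); // consumer runs at t = delay
            return true;
        } catch (IllegalStateException expired) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("prompt consumer survives:  " + survives(10));
        System.out.println("delayed consumer survives: " + survives(5_000));
    }
}
```

Under a fair, responsive scheduler the delay stays small and the check never fires, which is what makes this kind of bug hard to reproduce.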

Scheduling

Work-Conserving
Responsive
Fair

This is predictable.

We need an erratic scheduler!

Not Work-Conserving
Not Responsive
Maybe Fair

This is not predictable.
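
One way to picture an erratic scheduler is to hold every task back by a random amount before it may run. The sketch below is a userspace model, not scheduler code: the class ErraticOrderSketch, the method runOrder, and the choice of uniformly random delays are assumptions made for illustration. It shows that the run order stops following the arrival order once delays are injected.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

public class ErraticOrderSketch {
    // Returns task ids in the order they would run if every task arrived at
    // t = 0 and was then held back by a random delay in [0, maxDelayMillis].
    // A fixed seed keeps a given run reproducible.
    public static List<Integer> runOrder(int taskCount, long maxDelayMillis, long seed) {
        Random random = new Random(seed);
        long[] wakeUp = new long[taskCount];
        List<Integer> order = new ArrayList<>();
        for (int task = 0; task < taskCount; task++) {
            wakeUp[task] = (long) (random.nextDouble() * (maxDelayMillis + 1));
            order.add(task);
        }
        // The sort is stable, so tasks with equal wake-up times keep their
        // arrival order; with maxDelayMillis = 0 nothing is reordered.
        order.sort(Comparator.comparingLong(task -> wakeUp[task]));
        return order;
    }

    public static void main(String[] args) {
        System.out.println("no delays:     " + runOrder(5, 0, 42));
        System.out.println("random delays: " + runOrder(5, 100, 42));
    }
}
```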

Building our scheduler

“eBPF is a crazy technology, it’s like putting JavaScript into the Linux kernel”
-Brendan Gregg

https://www.facesofopensource.com/brendan-gregg/

sched_ext
Ease of experimentation and exploration
Customization
Rapid scheduler deployments

Let's create our own

Scheduler dance
[Diagram: tasks T0 and T1, the scheduler, a global queue, and a local queue for each of CPU 1 and CPU 2]

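@BPF(license = "GPL")
public abstract class SampleScheduler
        extends BPFProgram implements Scheduler {

    // ID of the single dispatch queue (DSQ) shared by all CPUs
    static final long SHARED_DSQ_ID = 0;

    @Override
    public int init() {
        // -1: the queue is not tied to a specific NUMA node
        return scx_bpf_create_dsq(SHARED_DSQ_ID, -1);
    }
}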

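@BPF(license = "GPL")
public abstract class SampleScheduler
        extends BPFProgram implements Scheduler {

    @Override
    public void enqueue(Ptr<task_struct> p,
                        long enq_flags) {
        // Put the runnable task on the shared queue
        // with a time slice of 5_000_000 ns (5 ms)
        scx_bpf_dispatch(p, SHARED_DSQ_ID,
                5_000_000, enq_flags);
    }

}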

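@BPF(license = "GPL")
public abstract class SampleScheduler
        extends BPFProgram implements Scheduler {

    @Override
    public void dispatch(int cpu,
                         Ptr<task_struct> prev) {
        // Move the next task from the shared queue
        // to the local queue of this CPU
        scx_bpf_consume(SHARED_DSQ_ID);
    }

}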
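
The three callbacks amount to a global FIFO: enqueue appends to one shared queue and dispatch takes from its head. The following plain-Java model (no eBPF involved) sketches that behaviour. The class SharedQueueModel, the simulate method, the lock-step rounds, and the rule that a task is re-enqueued once its slice ends are assumptions for illustration; task names are assumed to be unique.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SharedQueueModel {
    private final Deque<String> sharedQueue = new ArrayDeque<>();

    // Models enqueue(): a runnable task goes to the tail of the shared queue.
    public void enqueue(String task) {
        sharedQueue.addLast(task);
    }

    // Models dispatch(): a CPU takes the task at the head, or null if empty.
    public String dispatch() {
        return sharedQueue.pollFirst();
    }

    // Runs every task for slicesPerTask time slices on cpuCount CPUs and
    // returns a trace of "cpuN:task" entries in execution order.
    public static List<String> simulate(String[] tasks, int slicesPerTask, int cpuCount) {
        if (cpuCount < 1) {
            throw new IllegalArgumentException("need at least one CPU");
        }
        SharedQueueModel model = new SharedQueueModel();
        Map<String, Integer> remaining = new HashMap<>();
        for (String task : tasks) {
            remaining.put(task, slicesPerTask);
            model.enqueue(task);
        }
        List<String> trace = new ArrayList<>();
        while (!model.sharedQueue.isEmpty()) {
            // One round: every CPU picks up at most one task.
            List<String> running = new ArrayList<>();
            for (int cpu = 0; cpu < cpuCount; cpu++) {
                String task = model.dispatch();
                if (task == null) {
                    break;
                }
                trace.add("cpu" + cpu + ":" + task);
                running.add(task);
            }
            // Slices end: tasks with work left become runnable again.
            for (String task : running) {
                int left = remaining.merge(task, -1, Integer::sum);
                if (left > 0) {
                    model.enqueue(task);
                }
            }
        }
        return trace;
    }

    public static void main(String[] args) {
        // prints [cpu0:A, cpu1:B, cpu0:C, cpu1:A, cpu0:B, cpu1:C]
        System.out.println(simulate(new String[] {"A", "B", "C"}, 2, 2));
    }
}
```

Every task gets the same slice and waits its turn behind all others, which is why this baseline is fair and predictable.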

Fixing a kernel driver scheduling bug with scx_chaos

int hsmp_send_message(struct hsmp_message *msg) {
    take_per_socket_lock();
    msg_socket->send_request(msg);
    // If the timeout expires during the sleep, the loop ends
    // without checking the response one last time
    while (!timed_out) {
        resp = msg_socket->gather_response();
        if (resp.ready)
            break;
        usleep_range(100, 2000); // sleep for 100-2000 micros
    }
    release_per_socket_lock();
}

Defensive code gone wrong

int hsmp_send_message(struct hsmp_message *msg) {
    take_per_socket_lock();
    msg_socket->send_request(msg);
    while (true) {
        // Check the response first, so one that arrived
        // during a long sleep is not discarded
        resp = msg_socket->gather_response();
        if (resp.ready)
            break;
        if (timed_out)
            break;
        usleep_range(100, 2000); // sleep for 100-2000 micros
    }
    release_per_socket_lock();
}
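
The difference between the two loops can be replayed with a simulated clock. The Java sketch below is a model, not driver code: the class TimeoutLoopSketch, the parameter names, and the fixed sleep length are hypothetical. It assumes that leaving the first loop through its timeout condition is reported as a failure without a final look at the response. It shows that one overlong sleep makes the first loop report a timeout for a response that is already ready, while the reordered loop still succeeds.

```java
public class TimeoutLoopSketch {
    // First version: the timeout is the loop condition, so it is evaluated
    // before the response is checked again after a sleep.
    public static boolean originalLoop(long readyAtMicros, long timeoutMicros, long sleepMicros) {
        requirePositive(sleepMicros);
        long now = 0;
        while (now < timeoutMicros) {      // while (!timed_out)
            if (now >= readyAtMicros) {    // resp.ready
                return true;
            }
            now += sleepMicros;            // sleep, stretched by the scheduler
        }
        return false;                      // assumed to be reported as a timeout
    }

    // Reordered version: always check the response before the timeout.
    public static boolean fixedLoop(long readyAtMicros, long timeoutMicros, long sleepMicros) {
        requirePositive(sleepMicros);
        long now = 0;
        while (true) {
            if (now >= readyAtMicros) {    // resp.ready
                return true;
            }
            if (now >= timeoutMicros) {    // timed_out
                return false;
            }
            now += sleepMicros;
        }
    }

    private static void requirePositive(long sleepMicros) {
        if (sleepMicros <= 0) {
            throw new IllegalArgumentException("sleep must be positive");
        }
    }

    public static void main(String[] args) {
        // Response ready after 500 us, timeout after 100 ms.
        System.out.println("original, 1 ms sleeps:   " + originalLoop(500, 100_000, 1_000));
        System.out.println("original, 200 ms sleep:  " + originalLoop(500, 100_000, 200_000));
        System.out.println("reordered, 200 ms sleep: " + fixedLoop(500, 100_000, 200_000));
    }
}
```

A scheduler that delays wake-ups is exactly what turns the second case from a theoretical possibility into something that can be reproduced.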

Reproduction

“Normally we just add sleeps
and hope for the best”
-Anonymous kernel developer

Demo

scx_chaos helps to find bugs due to invalid concurrency assumptions.

https://github.com/parttimenerd/concurrency-fuzz-scheduler
https://github.com/sched-ext/scx/tree/main/scheds/rust/scx_chaos

Thank you! Let’s connect.
Jake Hillion
@jakehillion.me on Bluesky
https://matrix.to/#/@jake:hillion.co.uk
blog.hillion.co.uk
Johannes Bechberger
@mostlynerdless.de on Bluesky
https://mastodon.social/@parttimenerd
mostlynerdless.de