Finding Performance Needles in Haystacks with APerf by Geoffrey Blake

ScyllaDB 0 views 19 slides Oct 09, 2025

About This Presentation

Finding performance issues in modern software is like finding a needle in a haystack, and intuition about where to look first is often wrong. APerf is an open source tool we have used many times to help with performance debugging by looking "wide" before going "deep". This session wi...


Slide Content

A ScyllaDB Community
Finding Performance Needles in
Haystacks with APerf
Geoffrey Blake
Principal Engineer

Geoffrey Blake (he/him)

Principal Engineer at AWS
■Help AWS and customers optimize on Graviton
■Enjoy solving performance puzzles in all domains
■When not at work, enjoy flying little airplanes to weird destinations

HELP!
We upgraded and
performance is worse!
Weird, it should
be faster!

Groovy web-app P99 latency on AWS Instances
3ms
40ms

13x slower

Now what?
flamegraphs
eBPF
strace
ftrace
tracepoints
JFR+JMC
tcpdump
/proc stats
netstat
mpstat
sysstat
iostat
sysctl
kernel config

Wide then deep debugging
■Intuition/instincts unreliable
■Big gains hide in plain sight
■Breadth first search


■APerf tool for wide then deep






github.com/aws/aperf

Complementary tool in the toolbox
■APerf to look for signals
●Simple to use
●100s of system metrics
●High to low level
■Specific tools to go deep
●eBPF
●Wireshark
●ftrace
●…

APerf quick start
%> wget https://github.com/aws/aperf/releases/download/v0.1.16-alpha/aperf-v0.1.16-alpha-aarch64.tar.gz

%> tar -zxf aperf-v0.1.16-alpha-aarch64.tar.gz
%> export PATH=$PATH:$PWD/aperf-v0.1.16-alpha-aarch64

%> sudo aperf record -r debug_session_1 -i 1 -p 600

%> aperf report -r debug_session_1 -n debug_session_report
<generates debug_session_report.tar.gz>

https://github.com/aws/aperf/blob/main/EXAMPLE.md
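
The quick-start commands above can be wrapped in two small shell functions so the same steps are repeatable across hosts. This is a minimal sketch, not part of the talk: the function names aperf_url and aperf_session are hypothetical, and it assumes that release assets for other versions and architectures follow the same aperf-<version>-<arch>.tar.gz naming pattern shown on the slide.

```shell
#!/bin/sh
# Build the release tarball URL used in the quick start.
# Assumption: every release asset follows the
# aperf-<version>-<arch>.tar.gz pattern shown on the slide.
aperf_url() {
    version="$1"
    arch="${2:-$(uname -m)}"   # default to this machine's architecture
    printf 'https://github.com/aws/aperf/releases/download/%s/aperf-%s-%s.tar.gz\n' \
        "$version" "$version" "$arch"
}

# Record one run and then build its report, mirroring the slide's commands.
# Usage: aperf_session <run_name> [interval_seconds] [period_seconds]
aperf_session() {
    run="$1"
    interval="${2:-1}"
    period="${3:-600}"
    sudo aperf record -r "$run" -i "$interval" -p "$period" &&
        aperf report -r "$run" -n "${run}_report"
}
```

For example, wget "$(aperf_url v0.1.16-alpha aarch64)" fetches the same tarball as the slide, and aperf_session debug_session_1 records for 600 seconds at a 1 second interval before generating debug_session_1_report.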

APerf report UI
10x bugs
2x bugs
<2x bugs

APerf CPU utilization view

APerf PMU stats view

APerf Net Stats view
sysctl-explorer.net/net/ipv4/tcp_autocorking/
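
Once the net stats point at autocorking, the setting can be inspected on the host under test. The sketch below is an assumption about how one might follow up, not a step shown in the slides: autocork_status is a hypothetical helper, and toggling the sysctl is suggested only as an A/B experiment to confirm the signal.

```shell
#!/bin/sh
# Report whether TCP autocorking is enabled.
# The optional path argument is a hypothetical hook for testing;
# by default the function reads the live kernel setting.
autocork_status() {
    file="${1:-/proc/sys/net/ipv4/tcp_autocorking}"
    if [ ! -r "$file" ]; then
        echo "unknown"
        return 1
    fi
    case "$(cat "$file")" in
        0) echo "disabled" ;;
        *) echo "enabled" ;;
    esac
}

# Example A/B experiment (requires root, affects all TCP sockets on the host):
#   autocork_status
#   sudo sysctl -w net.ipv4.tcp_autocorking=0
#   ...rerun the load test and compare P99 latency...
#   sudo sysctl -w net.ipv4.tcp_autocorking=1
```

If P99 latency recovers with autocorking disabled, that narrows the search to the autocork path before going deep with tcpdump.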

TCPDump deep-dive

Why?
net/ipv4/tcp.c:tcp_push()
tcp_push(...) {
    ...
    if (tcp_should_autocork(...)) {
        /* mark the socket throttled, then check for bytes still in flight */
        if (!test_bit(TSQ_THROTTLED, sk)) {
            set_bit(TSQ_THROTTLED, sk);
        }
        if (refcount_read(sk->wmem) > n)
            return;   /* leave the push to TX completion */
    }
    ...
    __tcp_push_pending_frames(...);
}

net/ipv4/tcp_output.c:tcp_wfree()
tcp_wfree(...) {
    ...
    refcount_sub_test(sk->wmem);
    oval = smp_load_acquire(sk);
    do {
        /* flag not seen: nothing is scheduled to push the corked data */
        if (!(oval & TSQ_THROTTLED))
            goto out;
        nval = (oval & ~TSQ_THROTTLED);
    } while (!try_cmpxchg(sk, oval, nval));
    ...
    tasklet_schedule();
    return;
out:
    sk_free(sk);
}

The fix
tcp_push(...) {
    ...
    if (tcp_should_autocork(...)) {
        if (!test_bit(TSQ_THROTTLED, sk)) {
            set_bit(TSQ_THROTTLED, sk);
+           smp_mb__after_atomic();
        }
        if (refcount_read(sk->wmem) > n)
            return;
    }
    ...
    __tcp_push_pending_frames(...);
}

tcp_wfree(...) {
    ...
    refcount_sub_test(sk->wmem);
    oval = smp_load_acquire(sk);
    do {
        if (!(oval & TSQ_THROTTLED))
            goto out;
        nval = (oval & ~TSQ_THROTTLED);
    } while (!try_cmpxchg(sk, oval, nval));
    ...
    tasklet_schedule();
    return;
out:
    sk_free(sk);
}

P99 latency after 1-liner kernel fix

P99s hide anywhere

Wide then deep

APerf can help
github.com/aws/aperf

Thank you! Let’s connect.
Geoffrey Blake
linkedin – Geoffrey Blake