Reliability vs Memory Efficiency of eBPF Instrumentation: Why Not Both? by Dom Delnano

ScyllaDB, 21 slides, Oct 08, 2025

About This Presentation

Zero-instrumentation eBPF tooling promises instant insights, but making it reliable and lightweight is a serious challenge. This talk shows how CNCF Pixie profiled its own memory use, uncovered issues with DWARF-intensive Uprobes, and redesigned them to cut memory usage while preserving probe stabil...


Slide Content

A ScyllaDB Community
Reliability vs Memory Efficiency
of eBPF Instrumentation:
Why Not Both?
Dom Delnano
Co-Founder/CEO

Dom Delnano (he/him)

■Co-Founder/CEO, Cosmic
■CNCF Pixie Maintainer
■Outside of work: weight lifting, snowboarding & racket sports

Pixie Introduction
■Observability tool that provides:
●Microservice protocol traces (latency, full payload)
●CPU flame graphs
■Our eBPF probes have been in production for 7 years

■Throughout this journey, we've learned a thing or two about designing probes

Why Reliable eBPF tracing matters
■Reliable tracing: Stability of probes in the face of target app changes
●Kernel upgrades
●Library changes (OpenSSL, BoringSSL, gRPC, etc.)
●Application changes (language version, code changes, etc.)
■eBPF instrumentation promises "zero-instrumentation"
●Black box tracing loses user trust quickly if tooling is brittle

Sources of eBPF probe instability
■Struct layout instability (OpenSSL v1.1.0 -> v1.1.1)

struct bio_st {
    const BIO_METHOD *method;
+   BIO_callback_fn callback;
+   BIO_callback_fn_ex callback_ex;
    ...

    int num;
}

■API ("Func Arg") instability (Golang gRPC v1.59.0 -> v1.60.0)

func (t *http2Server) operateHeaders(
+   ctx context.Context,
    frame *http2.MetaHeadersFrame,
    handle func(*Stream)
) error

●frame: PT_REGS_PARM2 -> PT_REGS_PARM4
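
The struct layout change above can be made concrete with a small user-space sketch. The two structs below are hypothetical stand-ins for the old and new layouts (the real OpenSSL struct has more fields), and `callback_fn`, `bio_old`, `bio_new` and `num_offset` are names invented for this illustration. It shows why a probe that hard-codes the offset of `num` would read the wrong bytes after the upgrade.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder callback type; the real typedefs live in OpenSSL headers. */
typedef long (*callback_fn)(void);

/* Hypothetical layout before the change. */
struct bio_old {
    const void *method;
    int num;
};

/* Hypothetical layout after two callback fields were inserted. */
struct bio_new {
    const void *method;
    callback_fn callback;
    callback_fn callback_ex;
    int num;
};

enum bio_layout { BIO_LAYOUT_OLD, BIO_LAYOUT_NEW };

/* Byte offset a probe would have to use in order to read `num`. */
size_t num_offset(enum bio_layout layout)
{
    assert(layout == BIO_LAYOUT_OLD || layout == BIO_LAYOUT_NEW);
    return layout == BIO_LAYOUT_NEW ? offsetof(struct bio_new, num)
                                    : offsetof(struct bio_old, num);
}
```

Calling `num_offset(BIO_LAYOUT_OLD)` and `num_offset(BIO_LAYOUT_NEW)` gives two different offsets, so the probe needs to know which library version it is attached to.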

Additional API ("Func Arg") complexity
■eBPF PT_REGS_PARM*() helpers follow the native architecture's calling convention

■Most native code is compatible; Golang is an exception (it uses its own, non-native calling convention)

■Go 1.17 introduced an ABI change: stack-based -> register-based ABI
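
To illustrate the mismatch on x86-64, the sketch below lists the integer argument registers used by the System V convention (which the PT_REGS_PARM*() helpers follow) next to those used by Go's register-based ABI. The function name `arg_register` and the 1-based word index are choices made for this example only.

```c
#include <assert.h>
#include <stddef.h>

/* Integer argument registers on x86-64, in assignment order. */
const char *const sysv_regs[] = { "rdi", "rsi", "rdx", "rcx", "r8", "r9" };
const char *const go_regs[]   = { "rax", "rbx", "rcx", "rdi", "rsi",
                                  "r8", "r9", "r10", "r11" };

/* Returns the register that holds the nth (1-based) integer argument word,
 * or NULL when that word would be passed on the stack instead. */
const char *arg_register(int go_regabi, size_t n)
{
    const char *const *regs = go_regabi ? go_regs : sysv_regs;
    size_t count = go_regabi ? sizeof(go_regs) / sizeof(go_regs[0])
                             : sizeof(sysv_regs) / sizeof(sysv_regs[0]);

    assert(n >= 1);
    return n <= count ? regs[n - 1] : NULL;
}
```

The second argument word sits in `rsi` under System V but in `rbx` under Go's register ABI, so a helper written for one convention reads the wrong register under the other.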

Overview of probe stability

eBPF Probe | Struct Layout Stability | API Stability ("Func Argument") | Overall Stability
Socket syscalls (tracepoint) | ✅ (BTF covers) | ✅ Stable | ✅ High
Kernel tracing (kprobe) | ✅ (BTF covers) | ⚠ Prone to change | ⚠ Medium
TLS Library (uprobe) | ⚠ Prone to change | ⚠ Prone to change | ⚠ Medium
Go gRPC Library (uprobe) | ⚠ Prone to change | ❌ Prone to change + ABI changes | ❌ Low

Complexity in handling instability
■eBPF needs runtime information about the tracing target
●User space must inject it into the eBPF program
+ volatile const bool is_new_frame_pos;

// func (d *s) operateHeaders(frame *http2.MetaHeadersFrame)
// for version 1.60 and above:
+ // func (d *s) operateHeaders(ctx context.Context,
+ // frame *http2.MetaHeadersFrame,
SEC("uprobe/http2Server_operateHeaders")
int uprobe_http2Server_operateHeaders(struct pt_regs *ctx)
{
u64 frame_pos = 2;
+ if (is_new_frame_pos) {
+ frame_pos = 4;
+ }
void *frame_ptr = get_argument(ctx, frame_pos);
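
A minimal sketch of the user-space side of this pattern follows: deciding the value of `is_new_frame_pos` from a detected grpc-go version string. The helper names `parse_version` and `needs_new_frame_pos`, and the choice to fall back to the old position for unparseable input, are assumptions made for this example and not taken from Pixie.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Parses "v1.60.0" or "1.60.0" into major/minor.
 * Returns false when the string does not start with two numbers. */
bool parse_version(const char *s, int *major, int *minor)
{
    assert(s && major && minor);
    if (*s == 'v')
        s++;
    return sscanf(s, "%d.%d", major, minor) == 2;
}

/* Value user space would write into `is_new_frame_pos` before the
 * probe is loaded: true for grpc-go v1.60 and later. */
bool needs_new_frame_pos(const char *grpc_version)
{
    int major = 0, minor = 0;

    if (!parse_version(grpc_version, &major, &minor))
        return false; /* assumption: unknown versions keep the old position */
    return major > 1 || (major == 1 && minor >= 60);
}
```

Every new signature change adds another branch like this one, which is the maintenance cost the slide points at.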

Solution: DWARF-based runtime location
■DWARF solves this for debuggers
■eBPF only needs state at function entry/exit, not a walk through the whole call stack
■Provides:
●Struct offsets
●Func arg location
●ABI agnostic – host ABI, Golang stack & reg ABI, stack arg spill

Without vs. With DWARF

❌ Version specific logic

SEC("uprobe/http2Server_operateHeaders")
int uprobe_http2Server_operateHeaders(
    struct pt_regs *ctx)
{
    u64 frame_pos = 2;
    if (is_new_frame_pos) {
        frame_pos = 4;
    }
    void *frame_ptr = get_argument(
        ctx, frame_pos);

✅ DWARF-Driven Runtime Discovery

SEC("uprobe/http2Server_operateHeaders")
int uprobe_http2Server_operateHeaders(
    struct pt_regs *ctx)
{
    const void* sp = (const void*)ctx->sp;
    uint64_t* regs = go_regabi_regs(ctx);
    void* frame_ptr = NULL;

    assign_arg(
        &frame_ptr, sizeof(frame_ptr),
        symaddrs->frame_location,
        sp, regs);
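
The idea behind the DWARF-driven variant can be sketched in user-space C: user space supplies a location descriptor, and the probe copies the argument from wherever that descriptor points. The `location` struct, the `assign_arg_sketch` name, and the convention that stack offsets are relative to `sp` are all assumptions for this illustration. A real probe would read user memory through a BPF helper instead of `memcpy`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical descriptor that user space derives from DWARF. */
enum loc_type { LOC_STACK, LOC_REGISTER };

struct location {
    enum loc_type type;
    int32_t offset; /* bytes from sp, or bytes into the saved register array */
};

/* Copies `size` bytes of one argument into `dst`.
 * Returns 0 on success and -1 when the descriptor is out of range. */
int assign_arg_sketch(void *dst, size_t size, struct location loc,
                      const void *sp, const uint64_t *regs, size_t nregs)
{
    assert(dst && sp && regs);
    if (loc.offset < 0)
        return -1;
    if (loc.type == LOC_REGISTER) {
        if ((size_t)loc.offset + size > nregs * sizeof(uint64_t))
            return -1;
        memcpy(dst, (const char *)regs + loc.offset, size);
        return 0;
    }
    /* Stack case: the caller guarantees the bytes at sp + offset are readable. */
    memcpy(dst, (const char *)sp + loc.offset, size);
    return 0;
}
```

The probe body stays the same for the stack ABI, the register ABI, and any argument order, because only the descriptor changes.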

Reliable tracing … but at what cost?
■DWARF parsing is memory intensive

■Pixie's memory profiling surfaced this optimization opportunity
# Orchestrated via wrapper script (1)
$ px run -o json -f collect_heap_dumps.pxl > $pxl_heap_output_file

# Memory mapped (/proc/$PID/maps) binaries automatically collected
# for post hoc symbolization.

$ pprof -http=localhost:8888 $heap_profile
(1) Pixie heap profiling routines

Reliable & Efficient Tracing

"The best code is no code at all."
- Jeff Atwood

Balancing Reliability & Efficiency
■Can struct offsets and function arg locations be pre-computed?

■Not viable for all toolchains (Cartesian explosion) but works for Go

■Pixie's memory challenges were specific to Go uprobes

Challenge: Automation for pre-computation
■Pixie can calculate struct offsets and function argument locations if an application binary is provided
●e.g. Go 1.20 application w/ grpc v1.60

■Pixie lacks the ability to discover all versions (Go and dependencies) and to automate collection of all combinations

■For each Go and dependency version:
●Compile the binary and compute the struct and function argument information

OpenTelemetry's Infrastructure
■opentelemetry-go-instrumentation generates pre-computed struct offsets

■Automates compiling sample programs per version
●Stdlib package: Go version
●Module package: Module version

■Generates JSON file containing relevant offsets
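
Once the offsets exist per version, using them at attach time amounts to a table lookup. The sketch below assumes a hypothetical in-memory table keyed by module path and version. The struct, the function name, and the two rows (which reuse the frame positions 2 and 4 from the earlier slide) are illustrative and are not real generated data.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical pre-computed entry: one row per module version. */
struct offset_entry {
    const char *module;  /* Go module path */
    const char *version; /* module version */
    int frame_arg_pos;   /* position of the `frame` argument */
};

/* Illustrative rows only; a real table would be generated offline. */
const struct offset_entry offset_table[] = {
    { "google.golang.org/grpc", "v1.59.0", 2 },
    { "google.golang.org/grpc", "v1.60.0", 4 },
};

/* Returns the matching entry, or NULL when the version was never pre-computed. */
const struct offset_entry *lookup_offsets(const char *module, const char *version)
{
    assert(module && version);
    for (size_t i = 0; i < sizeof(offset_table) / sizeof(offset_table[0]); i++) {
        if (strcmp(offset_table[i].module, module) == 0 &&
            strcmp(offset_table[i].version, version) == 0)
            return &offset_table[i];
    }
    return NULL;
}
```

A lookup like this avoids parsing debug information from the target binary, which is where the memory cost described earlier comes from.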

Extending OpenTelemetry
■Pixie's instrumentation also needed function arg information

■OTel’s BPF programs handle this more manually

■Extend generation to include function argument locations (1)

(1) pixie-io/opentelemetry-go-instrumentation#1

Integration with Pixie
■Uses the same DWARF "runtime" API (pixie#2207)

■Prevents ~100 MiB allocation spikes for large binaries

Key takeaways
■You choose your tradeoff:
●Efficient: pre-computed offsets
●Efficient & Reliable: pre-computed offsets w/ runtime fallback

■No modifications needed for existing eBPF programs

■Pixie + OTel: Better together
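
The two tradeoff modes can be sketched as a resolver chain: try the pre-computed source first, and only invoke the expensive runtime path when it misses. Everything here (`resolver_fn`, `resolve_frame_pos`, and the two stub resolvers) is hypothetical and only models the control flow, not Pixie's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum source { SRC_NONE, SRC_PRECOMPUTED, SRC_DWARF };

/* A resolver fills *pos and returns true when it can answer. */
typedef bool (*resolver_fn)(const char *version, int *pos);

/* Stub for the cheap path: knows a single version (illustrative). */
bool precomputed_stub(const char *version, int *pos)
{
    if (strcmp(version, "v1.60.0") != 0)
        return false;
    *pos = 4;
    return true;
}

/* Stub for the expensive path: stands in for parsing DWARF at runtime. */
bool dwarf_stub(const char *version, int *pos)
{
    (void)version;
    *pos = 4;
    return true;
}

/* Passing fallback == NULL models the "Efficient" mode; passing a
 * fallback models "Efficient & Reliable". */
enum source resolve_frame_pos(const char *version, resolver_fn precomputed,
                              resolver_fn fallback, int *pos)
{
    assert(version && precomputed && pos);
    if (precomputed(version, pos))
        return SRC_PRECOMPUTED;
    if (fallback && fallback(version, pos))
        return SRC_DWARF;
    return SRC_NONE;
}
```

With a fallback installed, an unknown version still resolves, at the cost of the runtime path. Without one, the caller has to decide what to do about `SRC_NONE`.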

Future work
■Upstream OTel function argument location changes
●Benefit: provide argument order agnostic tracing for OTel eBPF probes

■End to end memory profiling in Pixie: missing in-line symbolization & plumbing
$ px run -o json -f collect_heap_dumps.pxl

Thank you! Let’s connect.
Dom Delnano
[email protected]
@ddelnano