Motivation
•My local workflow is simple: add printk(), recompile the kernel, transfer
the image to the board, and test.
•Customer build environments are often complex and tightly controlled.
•The person reporting the bug may need to coordinate with their internal
build team to generate a new image.
•We often work across different time zones, increasing round-trip times
for communication and patch testing.
•This slows down debugging and delays root cause analysis.
•The kernel provides tracing techniques that can extract information
from a running system
MARVELL
ftrace
•ftrace is a powerful dynamic tracing framework built into the Linux
kernel.
•ftrace is where modifying a running kernel began
•Tracing is controlled via a special filesystem called tracefs
•Trace data is written to an internal ring buffer
•Ideal for analyzing performance, call paths, and runtime behavior.
•Lightweight and safe for use in production environments.
•mount -t debugfs none /sys/kernel/debug; cd /sys/kernel/debug/tracing/
ftrace – function graph tracer
•Understand the kernel flow using the function graph tracer
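A minimal sketch of driving the function graph tracer from a shell, assuming tracefs is reachable under /sys/kernel/debug/tracing. The helper names, the TRACEFS variable and the example function otx2_open are illustrative choices, not taken from the slides.

```shell
#!/bin/sh
# TRACEFS is an assumption: override it if tracefs is mounted elsewhere.
TRACEFS="${TRACEFS:-/sys/kernel/debug/tracing}"

# graph_trace_start FUNC: trace FUNC and everything it calls.
graph_trace_start() {
    echo 0 > "$TRACEFS/tracing_on"                  # pause while reconfiguring
    echo "$1" > "$TRACEFS/set_graph_function"       # limit the graph to FUNC
    echo function_graph > "$TRACEFS/current_tracer" # select the tracer
    echo > "$TRACEFS/trace"                         # clear the ring buffer
    echo 1 > "$TRACEFS/tracing_on"                  # start recording
}

# graph_trace_stop: stop recording, print the buffer, reset the tracer.
graph_trace_stop() {
    echo 0 > "$TRACEFS/tracing_on"
    cat "$TRACEFS/trace"
    echo nop > "$TRACEFS/current_tracer"
    echo > "$TRACEFS/set_graph_function"
}
```

A typical session might be: graph_trace_start otx2_open, bring the interface up, then graph_trace_stop to print the recorded call graph.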
ftrace – function graph tracer
•With recent kernels, function arguments and return values can also be traced
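A hedged sketch of how argument and return-value printing might be switched on for the function graph tracer. The option names funcgraph-args and funcgraph-retval are, to my knowledge, what recent kernels expose under options/; whether they exist depends on kernel version and configuration, so the helper (a hypothetical name) checks each file before writing to it.

```shell
#!/bin/sh
# TRACEFS is an assumption: override it if tracefs is mounted elsewhere.
TRACEFS="${TRACEFS:-/sys/kernel/debug/tracing}"

# graph_trace_show_values on|off
# Toggle printing of function arguments and return values.
graph_trace_show_values() {
    case "$1" in
        on)  val=1 ;;
        off) val=0 ;;
        *)   echo "usage: graph_trace_show_values on|off" >&2; return 1 ;;
    esac
    for opt in funcgraph-args funcgraph-retval; do
        if [ -e "$TRACEFS/options/$opt" ]; then
            echo "$val" > "$TRACEFS/options/$opt"
        else
            # Older kernels or configs may lack one or both options.
            echo "option $opt is not available on this kernel" >&2
        fi
    done
}
```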
ftrace – function graph tracer
Tracepoints
•Tracepoints are predefined hooks placed in the kernel source code.
•They allow developers to emit structured trace data at specific locations.
•Can be used with tools like perf, trace-cmd, or bpftrace.
•Ideal for observing kernel events like scheduler activity, memory
management, or device drivers.
•Low overhead and safe for production use.
•Almost every kernel subsystem has tracepoints to help with debugging
Tracepoints
Mailbox flow between VF, PF and AF
Mailbox regions: VF2PF, PF2VF, PF2AF, AF2PF
Stage 1 (VF)
1. Allocate the message in the HW shared mbox region
2. Send the message to the PF by triggering an interrupt to the PF
3. Wait for the response/ack
Stage 2 (PF)
1. Upon receiving the interrupt from the VF, copy the messages to the PF2AF mbox
2. Send the message to the AF by triggering an interrupt to the AF
3. Wait for the response/ack
Stage 3 (AF)
1. Upon receiving the interrupt from the PF, process the message
2. Send the response message to the PF by triggering an interrupt to the PF
Stage 4 (PF)
1. Upon receiving the interrupt from the AF, copy the response messages to the PF2VF mbox
2. Send the response message to the VF by triggering an interrupt to the VF
Stage 5 (VF)
1. Upon receiving the interrupt from the PF, check the ACK/responses
Tracepoints – example
•Example of a tracepoint and its format
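One way to inspect a tracepoint and its format from tracefs, as a small sketch. The helper names and the TRACEFS variable are assumptions; available_events and the per-event format file are standard tracefs entries.

```shell
#!/bin/sh
# TRACEFS is an assumption: override it if tracefs is mounted elsewhere.
TRACEFS="${TRACEFS:-/sys/kernel/debug/tracing}"

# list_events SUBSYSTEM: list the tracepoints a subsystem provides.
# available_events holds one "subsystem:event" entry per line.
list_events() {
    grep "^$1:" "$TRACEFS/available_events"
}

# event_format SUBSYSTEM EVENT: print the field layout and the print
# format string of one tracepoint.
event_format() {
    cat "$TRACEFS/events/$1/$2/format"
}
```

For instance, list_events workqueue followed by event_format workqueue workqueue_execute_start shows which fields that event records.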
Tracepoints – example
Tracepoints – debug example
Problem
•‘ifconfig eth0 up’ is taking a long time
Tracepoints – debug example
•Enable workqueue tracepoints to confirm
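A sketch of how the workqueue tracepoints could be enabled only while the slow command runs. The wrapper name and the TRACEFS variable are assumptions; events/workqueue/enable is the standard per-subsystem switch in tracefs.

```shell
#!/bin/sh
# TRACEFS is an assumption: override it if tracefs is mounted elsewhere.
TRACEFS="${TRACEFS:-/sys/kernel/debug/tracing}"

# workqueue_trace CMD [ARGS...]
# Run CMD with all workqueue tracepoints enabled, then print the trace
# and return CMD's exit status.
workqueue_trace() {
    echo > "$TRACEFS/trace"                       # clear the ring buffer
    echo 1 > "$TRACEFS/events/workqueue/enable"   # enable the subsystem
    "$@"                                          # command under test
    status=$?
    echo 0 > "$TRACEFS/events/workqueue/enable"
    cat "$TRACEFS/trace"
    return "$status"
}
```

With this, workqueue_trace ifconfig eth0 up prints the work items that were queued and executed while the interface came up, with timestamps.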
Fprobes
•fprobes are a newer, more efficient alternative to kprobes, designed to
reduce overhead.
•They allow attaching probes to multiple functions using entry/exit hooks
with minimal performance impact.
•Can be dynamically added and removed at runtime, making them
suitable for live systems.
•When BTF (BPF Type Format) data is available, fprobes can access
function arguments by name, improving readability and ease of use.
•Ideal for tracing large sets of functions with minimal setup.
•Problem – kernel warning when bringing up an interface
fprobes – debug example
•Let’s check the WARN_ON in mm/page_alloc.c at line 4935
fprobes – debug example
•Too many page allocations are happening system-wide
fprobes – debug example
•Let’s add a filter to capture only allocations with a larger order
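A sketch of creating an fprobe event with a filter through tracefs. It assumes a kernel with fprobe events and BTF argument support, so the parameter can be fetched by name. The helper names, the event name big_alloc and the probed symbol are illustrative; the page allocator entry point has different names across kernel versions.

```shell
#!/bin/sh
# TRACEFS is an assumption: override it if tracefs is mounted elsewhere.
TRACEFS="${TRACEFS:-/sys/kernel/debug/tracing}"

# fprobe_add NAME SYMBOL [FETCHARGS...]
# Create the event fprobes/NAME at the entry of SYMBOL.
fprobe_add() {
    name=$1
    shift
    echo "f:fprobes/$name $*" >> "$TRACEFS/dynamic_events"
}

# fprobe_filter NAME EXPR: record only hits for which EXPR is true.
fprobe_filter() {
    echo "$2" > "$TRACEFS/events/fprobes/$1/filter"
}

# fprobe_enable NAME: start recording the event.
fprobe_enable() {
    echo 1 > "$TRACEFS/events/fprobes/$1/enable"
}

# fprobe_del NAME: disable and remove the event again.
fprobe_del() {
    echo 0 > "$TRACEFS/events/fprobes/$1/enable"
    echo "-:fprobes/$1" >> "$TRACEFS/dynamic_events"
}
```

A possible sequence, with the symbol name as an assumption: fprobe_add big_alloc __alloc_pages order, then fprobe_filter big_alloc 'order > 3', then fprobe_enable big_alloc. The threshold 3 is only a placeholder.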
fprobes – debug example
•Check whether it really comes from the interface open call site
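One way to see the call site of each hit is the per-event stacktrace trigger, sketched here. The slides may have used a different mechanism; the helper name and the TRACEFS variable are assumptions.

```shell
#!/bin/sh
# TRACEFS is an assumption: override it if tracefs is mounted elsewhere.
TRACEFS="${TRACEFS:-/sys/kernel/debug/tracing}"

# event_stacktrace GROUP/EVENT on|off
# Record a kernel stack trace every time GROUP/EVENT fires.
event_stacktrace() {
    case "$2" in
        on)  echo 'stacktrace'  > "$TRACEFS/events/$1/trigger" ;;
        off) echo '!stacktrace' > "$TRACEFS/events/$1/trigger" ;;
        *)   echo "usage: event_stacktrace GROUP/EVENT on|off" >&2; return 1 ;;
    esac
}
```

After event_stacktrace fprobes/big_alloc on (the event name is hypothetical), each hit in the trace file is followed by its call chain, so an allocation coming from the interface open path is easy to spot.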
fprobes – debug example
•dma_alloc_attrs has a tracepoint in it; enable it and check its
parameters
fprobes – debug example
•Repeat the same steps on a working system
fprobes – debug example
•Looking at the code, PAGE_SIZE is the only variable between the working
and non-working cases!
fprobes – debug example
Kprobes
•When tracepoints are missing in the code path, kprobes provide a
flexible way to instrument almost any kernel function or instruction.
•Unlike tracepoints, kprobes can be inserted dynamically at runtime,
without requiring any prior instrumentation in the source.
•Internally, kprobes work by placing a breakpoint instruction at the probe
location, which introduces some overhead.
•Note: The mapping of function arguments to registers or stack locations
depends on the architecture-specific ABI.
•It’s better to use perf probe to simplify probe creation and argument
handling.
kprobes-perf
•Use perf to simplify adding a probe (vmlinux is also needed)
•After the probe is created, access it via tracefs
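A sketch of the perf probe workflow, assuming a vmlinux with debug info is available. The helper names, the function otx2_open and its argument name netdev are illustrative. perf places new kprobe events in the probe group by default, which is why the tracefs path contains events/probe.

```shell
#!/bin/sh
# TRACEFS is an assumption: override it if tracefs is mounted elsewhere.
TRACEFS="${TRACEFS:-/sys/kernel/debug/tracing}"

# probe_function VMLINUX FUNC [ARG...]
# Add a kprobe at the entry of FUNC and record the named arguments;
# perf resolves the register or stack location of each ARG from debuginfo.
probe_function() {
    vmlinux=$1
    shift
    perf probe -k "$vmlinux" -a "$*"
}

# probe_enable EVENT: switch the new event on through tracefs.
probe_enable() {
    echo 1 > "$TRACEFS/events/probe/$1/enable"
}

# probe_remove EVENT: delete the probe when done.
probe_remove() {
    perf probe -d "$1"
}
```

For example: probe_function ./vmlinux otx2_open netdev, then probe_enable otx2_open, then read the hits from the trace file.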
kprobes-perf
•Besides a function and its arguments, we can also add a kprobe in the middle of a function
•This helps to check how variables are changing (needs the kernel source!)
kprobes-perf
•Check for all the variables which can be probed at our line of interest
kprobes-perf
•Add two probes at two lines with variable names and enable the probes
kprobes-perf
•Check how the variable changes between the probes
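A sketch of probing inside a function with perf probe, assuming both vmlinux and the matching kernel source tree are at hand. The helper names, event names, line numbers and the variable err are placeholders; line numbers are relative to the start of the function, as perf probe expects.

```shell
#!/bin/sh

# probe_vars VMLINUX SRCDIR FUNC LINE
# List the variables that can be fetched at line LINE of FUNC.
probe_vars() {
    perf probe -k "$1" -s "$2" -V "$3:$4"
}

# probe_line VMLINUX SRCDIR NAME FUNC LINE VAR...
# Add a probe called NAME at FUNC:LINE that records the given variables.
probe_line() {
    vmlinux=$1 src=$2 name=$3 func=$4 line=$5
    shift 5
    perf probe -k "$vmlinux" -s "$src" -a "$name=$func:$line $*"
}
```

Two probes such as probe_line ./vmlinux ./linux before otx2_open 12 err and probe_line ./vmlinux ./linux after otx2_open 30 err can then be enabled under events/probe/ in tracefs, and comparing the two values in the trace shows how the variable changed.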
eBPF (Extended Berkeley Packet Filter)
•Until now, tracing tools like ftrace, kprobes, and tracepoints allowed
us to observe kernel internals.
•With eBPF, we can now execute custom logic inside the kernel when
a probe is hit
•eBPF is a kernel technology that runs sandboxed programs in the Linux
kernel without modifying kernel code or loading modules.
•Programs are compiled to bytecode and executed in a lightweight
eBPF virtual machine inside the kernel.
•The verifier ensures safety by checking for valid memory access, program
termination, and restricted operations.
•Maps are key-value stores used to share data between kernel and
user space or across eBPF programs.
•Programs attach to hook points such as tracepoints, kprobes, network
events, etc.
eBPF – memleak detector
•Let’s write an eBPF-based memory leak detector in C
•Hook simple eBPF programs into the kmalloc and kfree tracepoints
•Track allocations during any custom "alloc command" (e.g., module
insertion, interface up)
•Verify that the corresponding "free command" (e.g., module removal,
interface down) cleaned up all the memory allocated earlier
•No need to dive into memory management internals or allocation
paths
•Lightweight and easy to extend
eBPF – kernel program
Kernel
kmalloc {
    tracepoint;
    return addr;
}
kfree(addr) {
    tracepoint;
}
map_kmalloc
Key     Value
ptr1    calltrace1
ptr2    calltrace2
ptr3    calltrace2
•Store memory block addresses and the corresponding call traces in a map during kmalloc
•During kfree, look up the memory block address as the key and, if found, remove the element
from the map
eBPF – kernel program (maps)
map_kmalloc (BPF_MAP_TYPE_HASH)
Key     Value
ptr1    stackid1
ptr2    stackid2
ptr3    stackid2
smap_kmalloc (BPF_MAP_TYPE_STACK_TRACE)
Key         Value
stackid1    __kmalloc
            ext4_htree_store_dirent
            htree_dirblock_to_tree
            ext4_htree_fill_tree
            ext4_readdir
            iterate_dir
stackid2    __kmalloc_node_track_caller
            kmalloc_reserve
            __alloc_skb
            __napi_alloc_skb
            napi_get_frags
•To get the call trace inside an eBPF program, use the bpf_get_stackid() helper
•The helper takes a map of type BPF_MAP_TYPE_STACK_TRACE as an argument and returns a unique stack id for
the call trace
•The key of the stack map is the stack id, and the value is the array of function addresses (the call trace) that led to kmalloc
eBPF – kernel program
Problem:
The free/teardown sequence also calls kmalloc
eBPF – kernel program (maps)
map_kmalloc (BPF_MAP_TYPE_HASH)
Key     Value
ptr1    stackid1
ptr2    stackid2
ptr3    stackid2
smap_kmalloc (BPF_MAP_TYPE_STACK_TRACE)
Key         Value
stackid1    __kmalloc
            ext4_htree_store_dirent
            htree_dirblock_to_tree
            ext4_htree_fill_tree
            ext4_readdir
            iterate_dir
stackid2    __kmalloc_node_track_caller
            kmalloc_reserve
            __alloc_skb
            __napi_alloc_skb
            napi_get_frags
map_config (BPF_MAP_TYPE_ARRAY)
Key     Value
0       flags (MAP_DO_KMALLOC)
•Use a map of type BPF_MAP_TYPE_ARRAY as a flag to control the eBPF program from user space
eBPF – kernel program
•Let's add another map that acts as a flag telling the kernel program when to track allocations, and set it from the user-space program.
•The user-space program now sets the flag -> system(alloc_cmd) -> clears the flag -> system(free_cmd)
eBPF – kernel program
Improvement
•When kmalloc is called in a loop in a driver, the entire output is
filled with stack traces from the same call site.
•Let's add another map where the key is the call trace/stack id and the value
is a counter that gets incremented
•The tool output then shows a count, i.e. the number of allocations that
happened at the same call trace
eBPF – kernel program (maps)
smap_count (BPF_MAP_TYPE_HASH)
Key         Value
stackid1    1
stackid2    2
map_kmalloc (BPF_MAP_TYPE_HASH)
Key     Value
ptr1    stackid1
ptr2    stackid2
ptr3    stackid2
smap_kmalloc (BPF_MAP_TYPE_STACK_TRACE)
Key         Value
stackid1    __kmalloc
            ext4_htree_store_dirent
            htree_dirblock_to_tree
            ext4_htree_fill_tree
            ext4_readdir
            iterate_dir
stackid2    __kmalloc_node_track_caller
            kmalloc_reserve
            __alloc_skb
            __napi_alloc_skb
            napi_get_frags
map_config (BPF_MAP_TYPE_ARRAY)
Key     Value
0       flags (MAP_DO_KMALLOC)
•Count identical call traces using another map, smap_count, of type BPF_MAP_TYPE_HASH
eBPF – kernel program
•The smap_count map counts identical call traces
eBPF – kernel program
Improvement
•The output is cleaner once identical call traces are shown with a count
instead of one by one
•Many allocations happen system-wide, in addition to my driver's
allocations, in the window between the alloc and free commands
•So the user-space program was enhanced with post-processing: it can take a
text file with my driver's function names and display only the leaks
related to my driver
•Use C code browsing tools to capture all the function names of a
driver/folder
•ctags -x --c-types=f drivers/net/ethernet/marvell/octeontx2/nic/* | cut -f1 -d" " > test.txt
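The post-processing step can be approximated with awk, as a sketch. The slides do the filtering inside the user-space C program; this stand-in assumes the report is saved to a text file in which each call trace is a block of lines separated by blank lines, with symbols possibly printed as name+offset/size. The function name filter_leaks and the file layout are assumptions, and the function list is expected to be non-empty.

```shell
#!/bin/sh

# filter_leaks FUNCS REPORT
# Print only the call-trace blocks of REPORT that mention at least one
# function listed in FUNCS (one name per line, e.g. the ctags output).
filter_leaks() {
    awk -v RS= '
        NR == FNR {                     # first file: driver function names
            for (i = 1; i <= NF; i++) funcs[$i] = 1
            next
        }
        {                               # second file: one call trace per block
            for (i = 1; i <= NF; i++) {
                sym = $i
                sub(/\+.*/, "", sym)    # drop "+0x1c/0x2a0" style offsets
                if (sym in funcs) { print $0 "\n"; next }
            }
        }
    ' "$1" "$2"
}
```

Running filter_leaks test.txt report.txt would then show only the blocks whose call trace passes through the driver.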
eBPF – kernel program
Did it work now?
•Let's leak some memory in the netdevsim driver and check
eBPF – kernel program
Yes!
eBPF-bpftrace
•bpftrace is a high-level tracing language for Linux.
•Provides a quick and easy way for people to write observability-based
eBPF programs, especially those unfamiliar with the
complexities of eBPF.
•Uses LLVM as a backend to compile scripts to eBPF bytecode
•Uses libbpf and bcc to interact with the Linux BPF
subsystem and with existing Linux tracing capabilities: kernel dynamic
tracing (kprobes), user-level dynamic tracing (uprobes),
tracepoints, etc.
•The bpftrace language is inspired by awk and C
•Easy to install on distros such as Ubuntu, Red Hat, etc.
eBPF – bpftrace example
•Find number of tagged packets sent out from all interfaces
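One possible bpftrace script for this, wrapped in a shell function; the script on the slide may differ. It assumes the net:net_dev_start_xmit tracepoint and its vlan_tagged field, which reflects a VLAN tag carried in the skb metadata rather than one embedded in the packet data. The function name and the map name @tagged_tx are illustrative.

```shell
#!/bin/sh

# count_tagged_tx: count VLAN-tagged packets handed to any network
# device for transmission. Runs until interrupted with Ctrl-C, at which
# point bpftrace prints the @tagged_tx counter.
count_tagged_tx() {
    bpftrace -e '
        tracepoint:net:net_dev_start_xmit
        /args->vlan_tagged/
        {
            @tagged_tx = count();
        }'
}
```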