Linux debugging, profiling and tracing training

Uploaded by usualhope3223 on Jul 25, 2024 (300 slides)

© Copyright 2004-2024, Bootlin.
Creative Commons BY-SA 3.0 license.
Latest update: April 29, 2024.
Document updates and training details:
https://bootlin.com/training/debugging
Corrections, suggestions, contributions and translations are welcome!
Send them to [email protected]
embedded Linux and kernel engineering
- Kernel, drivers and embedded Linux - Development, consulting, training and support - https://bootlin.com

Linux debugging, profiling and tracing training
▶These slides are the training materials for Bootlin’s Linux debugging, profiling and tracing training course.
▶If you are interested in following this course with an experienced
Bootlin trainer, we offer:
•Public online sessions, open to individual registration. Dates announced on our site, registration directly online.
•Dedicated online sessions, organized for a team of engineers from the same company at a date/time chosen by our customer.
•Dedicated on-site sessions, organized for a team of engineers from the same company: we send a Bootlin trainer on-site to deliver the training.
▶Details and registrations:
https://bootlin.com/training/debugging
▶Contact: training@bootlin.com
Icon by Eucalyp, Flaticon

About Bootlin

Bootlin engineering services

Bootlin training courses

Bootlin, an open-source contributor
▶Strong contributor to the Linux kernel
•In the top 30 of companies contributing to Linux worldwide
•Contributions in most areas related to hardware support
•Several engineers are maintainers of subsystems/platforms
•8000 patches contributed
•https://bootlin.com/community/contributions/kernel-contributions/
▶Contributor to the Yocto Project
•Maintainer of the official documentation
•Core participant in the QA effort
▶Contributor to Buildroot
•Co-maintainer
•5000 patches contributed
▶Significant contributions to U-Boot, OP-TEE, Barebox, etc.
▶Fully open-source training materials

Generic course information

Supported hardware
Discovery Kits from STMicroelectronics: STM32MP157A-DK1, STM32MP157D-DK1,
STM32MP157C-DK2 or STM32MP157F-DK2
▶STM32MP157 (Dual Cortex-A7 + Cortex-M4) CPU
from STMicroelectronics
▶512 MB DDR3L RAM
▶Gigabit Ethernet port
▶4 USB 2.0 host ports, 1 USB-C OTG port
▶1 Micro SD slot
▶On-board ST-LINK/V2-1 debugger
▶Misc: buttons, LEDs, audio codec
▶LCD touchscreen (DK2 only)
(Picture: DK1 Discovery Kit)
Board and CPU documentation, design files, software: A-DK1, D-DK1, C-DK2, F-DK2

Shopping list: hardware for this course
▶STMicroelectronics STM32MP157D-DK1 Discovery kit
▶USB-C cable for the power supply
▶USB-A to micro B cable for the serial console
▶RJ45 cable for networking
▶A micro SD card with at least 128 MB of capacity

Training quiz and certificate
▶You have been given a quiz to test your knowledge of the topics covered by the course. It is not too late to take it if you haven’t done so yet!
▶At the end of the course, we will submit this quiz to you again. This time, you will see the correct answers.
▶It allows Bootlin to assess the progress you made during the course. It is also a kind of challenge: look for clues throughout the lectures and labs / demos, as all the answers are in the course!
▶Another reason is that we only give training certificates to people who achieve at least a 50% score in the final quiz and who attended all the sessions.

Collaborate!
As in the Free Software and Open Source community, collaboration
between participants is valuable in this training session:
▶Use the dedicated Matrix channel for this session to add
questions.
▶If your session offers practical labs, you can also report issues,
share screenshots and command output there.
▶Don’t hesitate to share your own answers and to help others, especially when the trainer is unavailable.
▶The Matrix channel is also a good place to ask questions outside of training hours, and after the course is over.

Practical lab - Training Setup
Prepare your lab environment
▶Download and extract the lab archive

Debugging, Profiling, Tracing

Debugging, Profiling, Tracing
▶Debugging, profiling and tracing are often used for development purposes
▶All of these methods have different goals, all aimed at perfecting the software being developed
▶They require some knowledge of the underlying mechanisms to correctly identify and fix bugs

Debugging
▶Finding and fixing bugs that might exist in your software/system
▶Use of various tools and methods to achieve that
•Interactive debugging (with GDB, for instance)
•Postmortem analysis (using core dumps, for instance)
•Control flow analysis (with tracing tools)
•Testing (targeted tests)
▶Most commonly done through debuggers in a development environment
▶Generally intrusive, as it allows pausing and resuming execution
“Everyone knows that debugging is twice as hard as writing a program in the first place. So if you’re as clever as you can be when you write it, how will you ever debug it?”
- Brian Kernighan

Profiling
▶Analysis at program runtime to assist performance optimizations
▶Often achieved by sampling counters during execution
▶Uses specific tools, libraries and operating system features to measure performance
•Using perf or OProfile, for instance
▶The first step consists of gathering data from the program execution
•Function call count, memory usage, CPU load, cache misses, etc.
▶Then, meaningful information is extracted from these data and the program is modified to optimize it

Tracing
▶Following the execution flow of an application to understand bottlenecks and problems.
▶Achieved by instrumenting code either at compile time or at runtime.
•Can be done using specific tracers such as LTTng, trace-cmd, SystemTap, etc.
▶Covers everything from the functions called in user space down to the kernel ones
▶Allows identifying the functions and values that are used while the application executes
▶Often works by recording traces during runtime and visualizing the data afterwards.
•Implies a large amount of recorded data, since the complete execution trace is recorded
•Often a bigger overhead than profiling.
▶Can also be used for debugging purposes, since data can be extracted with tracepoints.

Linux Application Stack

Linux Application Stack
User/Kernel mode

User/Kernel mode
▶User mode and kernel mode are the terms commonly used to refer to the privilege level of execution.
▶This mode actually refers to the processor execution mode, which is a hardware mode.
•It might be named differently on each architecture, but the goal is the same
▶Kernel mode allows the kernel to control the full processor state (exception handling, MMU, etc), whereas user space can only perform basic control and executes under the kernel's supervision.

Linux Application Stack
Introduction to Processes and Threads

Processes and Threads (1/2)
▶A process is a group of resources that are allocated by the operating system to allow the execution of a program.
•Memory regions, threads, files, etc.
▶A process is identified by a PID (Process ID) and all the information that is specific to this process is exposed in /proc/<pid>.
•A special file named /proc/self, accessible by the process, points to the proc folder associated with it.
▶When a process starts, it initially has one execution thread, which is represented by a struct task_struct and can be scheduled.
•A process is represented in the kernel by a thread associated with multiple resources.

Linux Application Stack
MMU and memory management

MMU and memory management
▶Under the Linux kernel (when using CONFIG_MMU=y), all addresses that are accessed by the CPU are virtual
▶The Memory Management Unit (MMU) maps these virtual addresses to physical memory (either RAM or I/O)
▶All these mappings are inserted into the page table, which is used by the MMU hardware to translate CPU accesses from virtual to physical addresses
▶The MMU allows restricting access to the page mappings via some attributes
•No Execute, Writable, Readable bits, Privileged/User bit, cacheability
▶The MMU base unit for mappings is called a page
▶The page size is fixed and depends on the architecture/kernel configuration.

Userspace/Kernel memory layout
▶Each process has its own set of virtual memory areas (mm field of struct task_struct).
▶Each process also has its own page table
•But all processes share the same kernel mappings
▶By default, all user mapping addresses are randomized to minimize the attack surface (base of heap, stack, text, data, etc).
•Address Space Layout Randomization (ASLR)
•Can be disabled using the norandmaps kernel command line parameter

Userspace/Kernel memory layout
Multiple processes have different user memory spaces

Kernel memory map
▶The kernel has its own memory mapping.
▶The linear mapping is set up at kernel startup by inserting all the entries in the kernel init page table.
▶Multiple areas are identified and their location differs between architectures.
▶Kernel Address Space Layout Randomization (KASLR) also allows randomizing the kernel address space layout.
•Can be disabled using the nokaslr command line parameter

Userspace memory segments
▶When starting a process, the kernel sets up several Virtual Memory Areas (VMAs), backed by struct vm_area_struct, with different execution attributes.
▶VMAs are actually memory zones that are mapped with specific attributes (R/W/X).
▶A segmentation fault happens when a program tries to access an unmapped area, or a mapped area with an access mode that is not allowed.
•Writing data in a read-only segment
•Executing data from a non-executable segment
▶New memory zones can be created using mmap() (man mmap(2))
▶Per-application mappings are visible in /proc/<pid>/maps
7f1855b2a000-7f1855b2c000 rw-p 00030000 103:01 3408650 ld-2.33.so
7ffc01625000-7ffc01646000 rw-p 00000000 00:00 0 [stack]
7ffc016e5000-7ffc016e9000 r--p 00000000 00:00 0 [vvar]
7ffc016e9000-7ffc016eb000 r-xp 00000000 00:00 0 [vdso]

Userspace memory types

Terms for memory in Linux tools
▶When using Linux tools, four terms are used to describe memory:
•VSS/VSZ: Virtual Set Size (virtual memory size, shared libraries included).
•RSS: Resident Set Size (total physical memory usage, shared libraries included).
•PSS: Proportional Set Size (actual physical memory used, where each shared page is divided by the number of processes mapping it).
•USS: Unique Set Size (physical memory occupied by the process, shared mappings excluded).
▶VSS >= RSS >= PSS >= USS.

Linux Application Stack
The process context

Process context
▶The process context can be seen as the content of the CPU registers associated with a process: instruction pointer, stack pointer...
▶This context also designates an execution state and allows sleeping inside kernel mode.
▶A process that is executing in process context can be preempted.
▶While executing in such a context, the struct task_struct of the current process can be accessed using get_current().

Linux Application Stack
Scheduling

Scheduling
▶The scheduler can be invoked for various reasons
•On a periodic tick caused by an interrupt (HZ)
•On a programmed interrupt on tickless systems (CONFIG_NO_HZ=y)
•Voluntarily, by calling schedule() in code
•Implicitly, by calling functions that can sleep (blocking operations such as kmalloc(), wait_event()).
▶When entering the schedule function, the scheduler elects a new struct task_struct to run and eventually calls the switch_to() macro.
▶switch_to() is defined by architecture code: it saves the process context of the current task and restores the one of the next task to be run, while setting the new current task running.

The Linux Kernel Scheduler
▶The Linux kernel scheduler is a key piece in achieving real-time behaviour
▶It is in charge of deciding which runnable task gets executed
▶It also elects the CPU on which the task runs, and is tightly coupled to CPUidle and CPUFreq
▶It schedules both userspace tasks and kernel tasks
▶Each task is assigned one scheduling class or policy
▶The class determines the algorithm used to elect each task
▶Tasks with different scheduling classes can coexist on the system

Non-Realtime Scheduling Classes
There are 3 non-real-time classes
▶SCHED_OTHER: The default policy, using a time-sharing algorithm
▶SCHED_BATCH: Similar to SCHED_OTHER, but designed for CPU-intensive, non-interactive loads; the scheduler applies a small penalty to their wakeups
▶SCHED_IDLE: Very low priority class. Tasks with this policy will run only if nothing else needs to run.
▶SCHED_OTHER and SCHED_BATCH use the nice value to increase or decrease their scheduling frequency
•A higher nice value means that the task gets scheduled less often

Realtime Scheduling Classes
There are 3 real-time classes
▶Runnable tasks will preempt any other lower-priority task
▶SCHED_FIFO: All tasks with the same priority are scheduled First In, First Out
▶SCHED_RR: Similar to SCHED_FIFO, but with a time-sharing round-robin between tasks with the same priority
▶Both SCHED_FIFO and SCHED_RR can be assigned a priority between 1 and 99
▶SCHED_DEADLINE: For tasks doing recurrent jobs; extra attributes are attached to the task
•A computation time, which represents the time the task needs to complete a job
•A deadline, which is the maximum allowable time to compute the job
•A period, during which only one job can occur
▶Using one of these classes is necessary but not sufficient to get real-time behavior

Changing the Scheduling Class
▶The scheduling class is set per task, and defaults to SCHED_OTHER
▶The sched_setscheduler() syscall (man sched_setscheduler(2)) allows changing the class of a task
▶The chrt tool uses it to allow changing the class of a running task:
chrt -f|-b|-o|-r|-d -p PRIO PID
▶It can also be used to launch a new program with a dedicated class:
chrt -f|-b|-o|-r|-d PRIO CMD
▶To show the current class and priority:
chrt -p PID
▶New processes inherit the class of their parent, unless the SCHED_RESET_ON_FORK flag is set with sched_setscheduler() (man sched_setscheduler(2))
▶See man sched(7) for more information

Linux Application Stack
Context switching

Context switching
▶Context switching is the action of changing the execution mode of the processor (Kernel↔User).
•Explicitly, by executing system call instructions (synchronous request to the kernel from user mode).
•Implicitly, when receiving exceptions (MMU fault, interrupts, breakpoints, etc).
▶This state change ends up in a kernel entry point (often called a vector) that executes the code needed to set up a correct state for kernel mode execution.
▶The kernel takes care of saving registers, switching to the kernel stack and potentially other things depending on the architecture.
•It does not use the user stack but a specific, fixed-size kernel stack, for security purposes.

Exceptions
▶Exceptions are the events that trigger a change of the CPU execution mode so that they can be handled.
▶Two main types of exceptions exist: synchronous and asynchronous.
•Asynchronous exceptions happen when an interrupt is received (either software or hardware), or when a fault is reported independently of the instruction flow (some bus aborts, for instance).
•Synchronous exceptions happen when executing specific instructions (breakpoint, syscall, etc) or when the executed instruction faults (MMU fault, for instance).
▶When such an exception is triggered, the processor jumps to the exception vector and executes the code that was set up for this exception.

Interrupts
▶Interrupts are asynchronous signals that are generated by hardware peripherals.
•They can also be generated by software, using a specific instruction (Inter-Processor Interrupts, for instance).
▶When receiving an interrupt, the CPU changes its execution mode by jumping to a specific vector and switching to kernel mode to handle the interrupt.
▶When multiple CPUs (cores) are present, interrupts are often directed to a single core.
▶This is called “IRQ affinity” and it allows controlling the IRQ load of each CPU
•See core-api/irq/irq-affinity and man irqbalance(1)

Interrupts
▶While handling interrupts, the kernel executes in a specific context named interrupt context.
▶This context does not have access to userspace and should not use get_current().
▶Depending on the architecture, a dedicated IRQ stack might be used.
▶Interrupts are disabled (no nested interrupt support)!

System Calls (1/2)
▶A system call allows user space to request services from the kernel by executing a special instruction that switches to kernel mode (man syscall(2))
•Functions provided by the libc (read(), write(), etc) often end up executing a system call.
▶System calls are identified by a numeric identifier that is passed via the registers.
•The kernel exports some defines (in unistd.h), named __NR_<syscall>, that define the syscall identifiers.
#define __NR_read 63
#define __NR_write 64

System Calls (2/2)
▶The kernel holds a table of function pointers which matches these identifiers, and invokes the correct handler after checking the validity of the syscall.
▶System call parameters are passed via registers (up to 6).
▶When executing this instruction, the CPU changes its execution state and switches to kernel mode.
▶Each architecture uses a specific hardware mechanism (man syscall(2))
mov w8, #__NR_getpid
svc #0
tstne x0, x1

Linux Application Stack
Kernel execution contexts

Kernel execution contexts
▶The kernel runs code in various contexts depending on the event it is handling.
▶Depending on the context, interrupts might be disabled, a specific stack might be used, etc.

Kernel threads
▶Kernel threads (kthreads) are a special kind of struct task_struct that do not have any user resources associated with them (mm == NULL).
▶These processes are cloned from the kthreadd process and can be created using kthread_create().
▶Kernel threads are scheduled and are allowed to sleep, much like a process executing in process context.
▶Kernel threads are visible, and their names are displayed between brackets under ps:
$ ps --ppid 2 -p 2 -o uname,pid,ppid,cmd,cls
USER PID PPID CMD CLS
root 2 0 [kthreadd] TS
root 3 2 [rcu_gp] TS
root 4 2 [rcu_par_gp] TS
root 5 2 [netns] TS
root 7 2 [kworker/0:0H-events_highpri] TS
root 10 2 [mm_percpu_wq] TS
root 11 2 [rcu_tasks_kthread] TS

Workqueues
▶Workqueues allow scheduling some work to be executed at some point in the future
▶Workqueues execute the work functions in kernel threads.
•Sleeping is allowed while executing the deferred work.
•Interrupts are enabled while executing
▶Work can be executed either in dedicated workqueues or in the default workqueue, which is shared by multiple users.

softirq
▶SoftIRQs are a specific kernel mechanism that is executed in software interrupt context.
▶They allow executing code that needs to be deferred after interrupt handling but still needs low latency.
•Executed right after hardware IRQs have been handled, in interrupt context.
•Same context as the interrupt handler, so sleeping is not allowed.
▶Tasklets use softirqs to execute their work, so they run in the same context and the same constraints apply.

Interrupts & Softirqs

Threaded interrupts
▶Threaded interrupts are a mechanism that allows handling an interrupt using a hard IRQ handler and a threaded IRQ handler.
▶A threaded IRQ handler allows executing work that can potentially sleep in a kthread.
▶One kthread is created for each interrupt line that was requested as a threaded IRQ.
•The kthread is named irq/<irq>-<name> and can be seen using ps.

Allocations and context
▶Allocating memory in the kernel can be done using multiple functions:
•void *kmalloc(size_t size, gfp_t gfp_mask);
•void *kzalloc(size_t size, gfp_t gfp_mask);
•unsigned long __get_free_pages(gfp_t gfp_mask, unsigned int order)
▶All allocation functions take a gfp_mask parameter which designates the kind of memory that is needed.
•GFP_KERNEL: Normal allocation, can sleep while allocating memory (cannot be used in interrupt context).
•GFP_ATOMIC: Atomic allocation, won’t sleep while allocating data.

Practical lab - Preparing the system
Prepare the STM32MP157D board
▶Build an image using Buildroot
▶Connect the board
▶Load the kernel from SD card
▶Mount the root filesystem over NFS

Linux Common Analysis & Observability Tools

Linux Common Analysis & Observability Tools
Pseudo Filesystems

Pseudo Filesystems
▶Some virtual filesystems are exposed by the kernel and provide a lot of information on the system.
▶procfs contains information about processes and the system.
•Mounted on /proc
•Often parsed by tools to display raw data in a more user-friendly way.
▶sysfs provides information about hardware/logical devices and the association between devices and drivers.
•Mounted on /sys
▶debugfs exposes debugging-related information.
•Typically mounted on /sys/kernel/debug/
mount -t debugfs none /sys/kernel/debug

procfs
▶procfs exposes information about processes and the system (man proc(5)).

/proc/cpuinfo: CPU information.

/proc/meminfo: memory information (used, free, total, etc).

/proc/sys/ contains system parameters that can be tuned. The list of parameters
that can be modified is available at admin-guide/sysctl/index

/proc/interrupts: interrupt count per CPU for each interrupt in use.
There is also one entry per interrupt in /proc/irq for interrupt-specific
configuration/status.

/proc/<pid>/: process-related information
/proc/<pid>/status: basic process information
/proc/<pid>/maps: process memory mappings
/proc/<pid>/fd: file descriptors of the process
/proc/<pid>/task: descriptors of threads belonging to the process

/proc/self/ refers to the process used to access the file
▶A list of all available procfs files and their content is described in
filesystems/proc and man proc(5)
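Most procfs files are plain text that is meant to be parsed. Below is a minimal Python sketch. It assumes the usual "Key:<TAB>value" layout of /proc/<pid>/status, and the helper names are hypothetical. open_files() only resolves the symbolic links found in /proc/<pid>/fd.

```python
import os

def parse_status(text):
    """Turn /proc/<pid>/status ("Key:<TAB>value" lines) into a dict."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key] = value.strip()
    return info

def read_status(pid="self"):
    """Read and parse /proc/<pid>/status; pid="self" is the calling process."""
    with open(os.path.join("/proc", str(pid), "status")) as f:
        return parse_status(f.read())

def open_files(pid="self"):
    """Map each open file descriptor of <pid> to its target, via /proc/<pid>/fd."""
    fd_dir = os.path.join("/proc", str(pid), "fd")
    result = {}
    for fd in os.listdir(fd_dir):
        try:
            result[int(fd)] = os.readlink(os.path.join(fd_dir, fd))
        except FileNotFoundError:
            # The descriptor used by listdir() itself may already be closed
            continue
    return result
```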

sysfs
▶The sysfs filesystem exposes information about various kernel subsystems, hardware
devices and their association with drivers (man sysfs(5)).
▶This makes it possible to find the link between drivers and devices through a file
hierarchy representing the kernel's internal tree of devices.

/sys/kernel contains interesting files for kernel debugging:

irq, with information about interrupts (mapping, count, etc).

tracing, for tracing control.

admin-guide/abi-stable
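The device/driver association described above is exposed as symbolic links. This hedged sketch assumes the /sys/bus/<bus>/devices/<device>/driver link layout and defaults to the platform bus. device_drivers() is an illustrative helper, not an existing tool.

```python
import os

def device_drivers(devices_dir="/sys/bus/platform/devices"):
    """Map each device of a sysfs bus to the name of its bound driver (or None).

    A bound device directory contains a "driver" symlink pointing into
    /sys/bus/<bus>/drivers/<driver-name>; unbound devices have no such link.
    """
    result = {}
    for dev in sorted(os.listdir(devices_dir)):
        link = os.path.join(devices_dir, dev, "driver")
        result[dev] = os.path.basename(os.readlink(link)) if os.path.islink(link) else None
    return result
```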

debugfs
▶debugfs is a simple RAM-based filesystem which exposes debugging information.
▶Used by some subsystems (clk, block, dma, gpio, etc) to expose debugging
information related to their internals.
▶Usually mounted on /sys/kernel/debug
•Dynamic debug features exposed through /sys/kernel/debug/dynamic_debug (also
exposed in /proc)
•Clock tree exposed through /sys/kernel/debug/clk/clk_summary.

Linux Common Analysis & Observability Tools
ELF file analysis

ELF files
Executable and Linkable Format
▶File starting with a header which holds binary structures
defining the file
▶Collection of segments and sections that contain data

.text section: Code

.data section: Data

.rodata section: Read-only Data

.debug_info section: Contains debugging information
▶Sections can be part of a segment, which is loadable in
memory
▶Same format for all architectures supported by the kernel,
and also the format of the vmlinux kernel image
•Also used by a lot of other operating systems as the
standard executable file format...
(Figure: ELF file layout: ELF header, program header table, .text, .rodata and .data sections, section header table)
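To make the header layout concrete, the sketch below decodes the first fields of an ELF header with Python's struct module, roughly what readelf -h prints first. It only covers e_ident, e_type, e_machine and e_entry. The machine-name table is a partial, illustrative subset.

```python
import struct

E_TYPES = {0: "NONE", 1: "REL", 2: "EXEC", 3: "DYN", 4: "CORE"}
E_MACHINES = {3: "x86", 40: "ARM", 62: "x86-64", 183: "AArch64", 243: "RISC-V"}

def parse_elf_header(data):
    """Decode the start of an ELF header (the first fields `readelf -h` prints)."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file (bad magic)")
    bits = {1: 32, 2: 64}[data[4]]        # e_ident[EI_CLASS]
    endian = {1: "<", 2: ">"}[data[5]]    # e_ident[EI_DATA]: little/big endian
    # e_type (u16), e_machine (u16) and e_version (u32) follow the 16-byte e_ident
    e_type, e_machine, _e_version = struct.unpack_from(endian + "HHI", data, 16)
    # e_entry is an address: 4 bytes in ELF32, 8 bytes in ELF64
    (e_entry,) = struct.unpack_from(endian + ("I" if bits == 32 else "Q"), data, 24)
    return {
        "class": "ELF%d" % bits,
        "endian": "little" if endian == "<" else "big",
        "type": E_TYPES.get(e_type, hex(e_type)),
        "machine": E_MACHINES.get(e_machine, e_machine),
        "entry": e_entry,
    }
```

For example, parse_elf_header(open("/bin/ls", "rb").read(64)) on a typical x86-64 distribution would report an ELF64, little-endian, DYN file.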

binutils for ELF analysis
▶The binutils are used to deal with binary files, either object files or executables.
•Includes ld, as and other useful tools.
▶readelf displays information about ELF files (header, sections, segments, etc).
▶objdump displays information about ELF files and disassembles them.
▶objcopy can convert ELF files or extract/translate some parts of them.
▶nm displays the list of symbols embedded in ELF files.
▶addr2line finds the source file/line pair matching an address, using an ELF file
with debug information

binutils example (1/2)
▶Finding the address of the ksys_read() kernel function using nm:
$ nm vmlinux | grep ksys_read
c02c7040 T ksys_read
▶Using addr2line to match a kernel OOPS address or a symbol name with source
code:
$ addr2line -s -f -e vmlinux ffffffff8145a8b0
queue_wc_show
blk-sysfs.c:516
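nm output can also turn a raw address into the symbol+offset notation found in kernel OOPS messages. This sketch assumes the default three-column nm output. Because it ignores symbol sizes (available with nm -S), an address past the end of the last function would still be attributed to that function. addr2line remains the right tool when line numbers are needed.

```python
import bisect

def load_nm_symbols(nm_output):
    """Parse `nm` output lines ("<hex address> <type> <name>") into a sorted list."""
    symbols = []
    for line in nm_output.splitlines():
        parts = line.split()
        if len(parts) == 3:                 # undefined symbols have no address
            address, _sym_type, name = parts
            symbols.append((int(address, 16), name))
    symbols.sort()
    return symbols

def symbolize(symbols, address):
    """Return "name+0xoffset" for the closest symbol at or below `address`."""
    addresses = [a for a, _ in symbols]
    i = bisect.bisect_right(addresses, address) - 1
    if i < 0:
        return None
    base, name = symbols[i]
    return "%s+0x%x" % (name, address - base)
```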

binutils example (2/2)
▶Display an ELF header with readelf:
$ readelf -h binary
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              DYN (Position-Independent Executable file)
  Machine:                           Advanced Micro Devices X86-64
  ...
▶Convert an ELF file to a flat binary file using objcopy:
$ objcopy -O binary file.elf file.bin

ldd
▶In order to display the shared libraries used by an ELF binary, one can use ldd
(generally packaged with the C library; see man ldd(1)).
▶ldd lists all the libraries that were used at link time.
•Libraries that are loaded at runtime using dlopen() are not displayed.
$ ldd /usr/bin/bash
linux-vdso.so.1 (0x0000fffdf3fc6000)
libreadline.so.8 => /usr/lib/libreadline.so.8 (0x0000ffaedeaef000)
libc.so.6 => /usr/lib/libc.so.6 (0x0000ffaede905000)
libncursesw.so.6 => /usr/lib/libncursesw.so.6 (0x0000ffaede88e000)
/lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x0000ffaedec88000)
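Scripts often need to detect missing dependencies. The following is a minimal parser for the ldd output format shown above. It assumes the usual "name => path (address)" lines, and the function names are hypothetical. Note that man ldd(1) warns that ldd may execute the program, so objdump -p or readelf -d (NEEDED entries) are safer choices for untrusted binaries.

```python
import re

# "name => path (0xaddr)", or "name (0xaddr)" for the vDSO
_LDD_LINE = re.compile(r"^(\S+)(?:\s+=>\s+(\S+))?\s+\((0x[0-9a-fA-F]+)\)$")

def parse_ldd(output):
    """Return (name, resolved_path_or_None, load_address_or_None) per ldd line."""
    libs = []
    for line in output.splitlines():
        line = line.strip()
        if line.endswith("=> not found"):
            libs.append((line.split()[0], None, None))
            continue
        m = _LDD_LINE.match(line)
        if m:
            name, path, address = m.groups()
            libs.append((name, path, int(address, 16)))
    return libs

def missing_libraries(output):
    """Names of the libraries the dynamic loader could not resolve."""
    return [name for name, _path, address in parse_ldd(output) if address is None]
```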

Linux Common Analysis & Observability Tools
Monitoring tools

Monitoring Tools
▶Linux offers many tools to monitor various parts of the system.
▶Most of the time, these are interactive CLI programs.
•Processes with ps, top, htop, etc
•Memory with free, vmstat
•Networking
▶Almost all these tools rely on the sysfs or procfs filesystems to obtain
process, memory and system information, but display it in a more
human-readable way.
•Networking tools use a netlink interface with the networking subsystem of the
kernel.

Linux Common Analysis & Observability Tools
Process and CPU monitoring tools

Processes with ps
▶The ps command displays a snapshot of active processes and their
associated information (man ps(1))
•Lists both user processes and kernel threads.
•Displays PID, CPU usage, memory usage, start time, etc.
•Uses the /proc/<pid>/ directories to obtain process information.
•Present on almost all embedded platforms (provided by Busybox).
▶By default, displays only the processes of the current user on the current terminal.
▶Useful for scripting and parsing since its output is static.
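Since ps reads /proc/<pid>/stat, here is a sketch of how the first fields of that file can be decoded. The subtle point is the command name, which sits between parentheses and may itself contain spaces or parentheses, so the line is split at the last ")". Field positions follow man proc(5). utime and stime are expressed in clock ticks (see os.sysconf("SC_CLK_TCK")). The helper names are hypothetical.

```python
import os

def parse_proc_stat(line):
    """Parse the leading fields of a /proc/<pid>/stat line, as ps does."""
    lpar, rpar = line.index("("), line.rindex(")")
    rest = line[rpar + 2:].split()     # rest[0] is field 3 in man proc(5)
    return {
        "pid": int(line[:lpar]),
        "comm": line[lpar + 1:rpar],
        "state": rest[0],          # field 3: R, S, D, Z, T, I...
        "ppid": int(rest[1]),      # field 4
        "utime": int(rest[11]),    # field 14, in clock ticks
        "stime": int(rest[12]),    # field 15, in clock ticks
    }

def list_processes():
    """Yield parsed stat entries for all processes, like a minimal `ps -e`."""
    for entry in os.listdir("/proc"):
        if entry.isdigit():
            try:
                with open("/proc/%s/stat" % entry) as f:
                    yield parse_proc_stat(f.read())
            except OSError:
                continue   # the process exited in the meantime
```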

Processes with ps
▶Display all processes in a friendly way:
$ ps aux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.0 168864 12800 ?        Ss   09:08   0:00 /sbin/init
root         2  0.0  0.0      0     0 ?        S    09:08   0:00 [kthreadd]
root         3  0.0  0.0      0     0 ?        I<   09:08   0:00 [rcu_gp]
root         4  0.0  0.0      0     0 ?        I<   09:08   0:00 [rcu_par_gp]
root         5  0.0  0.0      0     0 ?        I<   09:08   0:00 [netns]
...
root       914  0.0  0.0 396216 16220 ?        Ssl  09:08   0:04 /usr/libexec/udisks2/udisksd
avahi      929  0.0  0.0   8728   412 ?        S    09:08   0:00 avahi-daemon: chroot helper
root       956  0.0  0.1 260304 19024 ?        Ssl  09:08   0:02 /usr/sbin/NetworkManager --no-daemon
root       960  0.0  0.0  17040  5704 ?        Ss   09:08   0:00 /sbin/wpa_supplicant -u -s -O /run/wpa_suppli
root       962  0.0  0.0 317644 11896 ?        Ssl  09:08   0:00 /usr/sbin/ModemManager
vnstat     987  0.0  0.0   5516  3696 ?        Ss   09:08   0:00 /usr/sbin/vnstatd -n

Processes with top
▶The top command displays information similar to ps, but dynamically and interactively
(man top(1)).
•Also almost always present on embedded platforms (provided by Busybox)
$ top
top - 18:38:11 up  9:29,  1 user,  load average: 2.84, 2.74, 2.02
Tasks: 371 total,   1 running, 370 sleeping,   0 stopped,   0 zombie
%Cpu(s):  5.8 us,  2.1 sy,  0.0 ni, 77.4 id, 14.7 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :  15947.6 total,   1476.9 free,   7685.7 used,   6784.9 buff/cache
MiB Swap:  15259.0 total,  15238.7 free,     20.2 used.   7742.3 avail Mem
    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
   2988 cleger    20   0 5184816   1.2g 430244 S  26.7   7.9  60:24.27 firefox-esr
   4326 cleger    20   0   16.4g 208104  81504 S  26.7   1.3   9:27.33 code
    909 root     -51   0       0      0      0 S  13.3   0.0  15:12.15 irq/104-nvidia
  41704 cleger    20   0   38.4g 373744 116984 S  13.3   2.3  13:25.76 code
  91926 cleger    20   0 2514784 145360  95144 S  13.3   0.9   1:29.85 Web Content

mpstat
▶mpstat displays multiprocessor statistics (man mpstat(1)).
▶Useful to detect unbalanced CPU workloads, bad IRQ affinity, etc.
$ mpstat -P ALL
Linux 6.0.0-1-amd64 (fixe)  19/10/2022  _x86_64_  (4 CPU)
17:02:50  CPU   %usr  %nice   %sys %iowait   %irq  %soft %steal %guest %gnice  %idle
17:02:50  all   6.77   0.00   2.09   11.67   0.00   0.06   0.00   0.00   0.00  79.40
17:02:50    0   6.88   0.00   1.93    8.22   0.00   0.13   0.00   0.00   0.00  82.84
17:02:50    1   4.91   0.00   1.50    8.91   0.00   0.03   0.00   0.00   0.00  84.64
17:02:50    2   6.96   0.00   1.74    7.23   0.00   0.01   0.00   0.00   0.00  84.06
17:02:50    3   9.32   0.00   2.80   54.67   0.00   0.00   0.00   0.00   0.00  33.20
17:02:50    4   5.40   0.00   1.29    4.92   0.00   0.00   0.00   0.00   0.00  88.40
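Percentages like these are derived from the per-CPU counters in /proc/stat, sampled twice. The sketch below shows the principle under stated assumptions. The busy percentage counts everything except idle and iowait. Guest time is left out of the total because the kernel already accounts it in user time. This is not mpstat's exact algorithm.

```python
def parse_cpu_lines(stat_text):
    """Extract the "cpu" lines of /proc/stat as {name: [ticks per state]}."""
    cpus = {}
    for line in stat_text.splitlines():
        if line.startswith("cpu"):
            name, *values = line.split()
            cpus[name] = [int(v) for v in values]
    return cpus

def cpu_busy_percent(before, after):
    """Busy percentage per CPU between two /proc/stat snapshots.

    Field order (man proc(5)): user nice system idle iowait irq softirq steal
    guest guest_nice. guest/guest_nice are already included in user/nice, so
    only the first 8 fields are summed. iowait is counted as idle time.
    """
    t0, t1 = parse_cpu_lines(before), parse_cpu_lines(after)
    result = {}
    for name, values in t1.items():
        delta = [b - a for a, b in zip(t0[name], values)]
        total = sum(delta[:8])
        idle = delta[3] + delta[4]
        result[name] = 100.0 * (total - idle) / total if total else 0.0
    return result
```

In practice, read /proc/stat, wait with time.sleep(1), read it again, and pass both texts to cpu_busy_percent().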

Linux Common Analysis & Observability Tools
Memory monitoring tools

free
▶free is a simple program that displays the amount of free and used memory in the
system (man free(1)).
•Useful to check if the system suffers from memory exhaustion
•Uses /proc/meminfo to obtain memory information.
$ free -h
               total        used        free      shared  buff/cache   available
Mem:            15Gi       7.5Gi       1.4Gi       142Mi       6.6Gi       7.5Gi
Swap:           14Gi        20Mi        14Gi
▶A small free value does not mean that your system suffers from memory
depletion! Linux considers any unused memory as "wasted", so it uses it for
buffers and caches to optimize performance. See also drop_caches in
man proc(5) to observe the impact of buffers/cache on free/available memory
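The columns of free are derived from /proc/meminfo. This sketch follows the classic procps formula, used = total - free - buffers - cache, with SReclaimable counted as cache. Recent free versions may compute "used" differently, so the result should be treated as an approximation.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo into {field: integer value} (most values are in KiB)."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info

def free_summary(info):
    """Approximate the "Mem:" line of free, in KiB."""
    buff_cache = info["Buffers"] + info["Cached"] + info.get("SReclaimable", 0)
    return {
        "total": info["MemTotal"],
        "used": info["MemTotal"] - info["MemFree"] - buff_cache,
        "free": info["MemFree"],
        "buff/cache": buff_cache,
        "available": info.get("MemAvailable"),  # kernel estimate, since Linux 3.14
    }
```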

vmstat
▶vmstat displays information about system virtual memory usage
▶Can also display stats from processes, memory, paging, block IO, traps, disks and
CPU activity (man vmstat(8)).
▶Can be used to gather data at periodic intervals using
vmstat <interval> <number>
$ vmstat 1 6
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd    free   buff   cache   si   so    bi    bo   in   cs us sy id wa st
 3  0 253440 1237236 194936 9286980    3    6   186   540  134  157  3  5 82 10  0
▶Note: vmstat considers a kernel block to be 1024 bytes

pmap

pmap displays process mappings more easily than accessing /proc/<pid>/maps
(man pmap(1)).
# pmap 2002
2002:   /usr/bin/dbus-daemon --session --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
...
00007f3f958bb000     56K r---- libdbus-1.so.3.32.1
00007f3f958c9000    192K r-x-- libdbus-1.so.3.32.1
00007f3f958f9000     84K r---- libdbus-1.so.3.32.1
00007f3f9590e000      8K r---- libdbus-1.so.3.32.1
00007f3f95910000      4K rw--- libdbus-1.so.3.32.1
00007f3f95937000      8K rw---   [ anon ]
00007f3f95939000      8K r---- ld-linux-x86-64.so.2
00007f3f9593b000    152K r-x-- ld-linux-x86-64.so.2
00007f3f95961000     44K r---- ld-linux-x86-64.so.2
00007f3f9596c000      8K r---- ld-linux-x86-64.so.2
00007f3f9596e000      8K rw--- ld-linux-x86-64.so.2
00007ffe13857000    132K rw---   [ stack ]
00007ffe13934000     16K r----   [ anon ]
00007ffe13938000      8K r-x--   [ anon ]
 total            11088K
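pmap essentially reformats /proc/<pid>/maps. Below is a minimal sketch that computes the size of each mapping in KiB. It assumes the standard maps format (range, permissions, offset, device, inode, optional path). Mappings without a path are labelled "[ anon ]" as pmap does, while the permissions are kept in the maps notation (r-xp) rather than pmap's.

```python
def parse_maps(text):
    """Parse /proc/<pid>/maps into (start, size_kib, perms, name) tuples."""
    mappings = []
    for line in text.splitlines():
        fields = line.split(None, 5)   # the path (6th field) may contain spaces
        if len(fields) < 5:
            continue
        start, end = (int(x, 16) for x in fields[0].split("-"))
        name = fields[5].strip() if len(fields) == 6 else "[ anon ]"
        mappings.append((start, (end - start) // 1024, fields[1], name))
    return mappings

def total_kib(mappings):
    """Sum of all mapping sizes, like the "total" line of pmap."""
    return sum(size for _, size, _, _ in mappings)
```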

Linux Common Analysis & Observability Tools
I/O monitoring tools

iostat
▶iostat displays I/O statistics for each device on the system.
▶Useful to see if a device is overloaded by I/Os.
$ iostat
Linux 5.19.0-2-amd64 (fixe)  11/10/2022  _x86_64_  (12 CPU)
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           8.43    0.00    1.52    8.77    0.00   81.28
Device             tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd
nvme0n1          55.89      1096.88       149.33         0.00    5117334     696668          0
sda               0.03         0.92         0.00         0.00       4308          0          0
sdb             104.42       274.55      2126.64         0.00    1280853    9921488          0
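iostat rates come from the cumulative counters of /proc/diskstats, sampled twice. This sketch assumes the field layout documented in Documentation/admin-guide/iostats.rst, where sectors are always 512 bytes. Its tps only counts completed reads and writes, so actual iostat output may differ slightly.

```python
SECTOR_SIZE = 512  # /proc/diskstats always counts 512-byte sectors

def parse_diskstats(text):
    """Return {device: (reads, sectors_read, writes, sectors_written)}."""
    stats = {}
    for line in text.splitlines():
        f = line.split()
        if len(f) >= 10:
            # f[0..2] = major, minor, name; f[3] reads, f[5] sectors read,
            # f[7] writes, f[9] sectors written
            stats[f[2]] = (int(f[3]), int(f[5]), int(f[7]), int(f[9]))
    return stats

def io_rates(before, after, interval_s):
    """(tps, kB_read/s, kB_wrtn/s) per device, similar to `iostat <interval>`."""
    b, a = parse_diskstats(before), parse_diskstats(after)
    rates = {}
    for dev, (r1, sr1, w1, sw1) in a.items():
        if dev not in b:
            continue
        r0, sr0, w0, sw0 = b[dev]
        rates[dev] = (
            (r1 - r0 + w1 - w0) / interval_s,
            (sr1 - sr0) * SECTOR_SIZE / 1024 / interval_s,
            (sw1 - sw0) * SECTOR_SIZE / 1024 / interval_s,
        )
    return rates
```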

iotop
▶iotop displays per-process I/O information, much like top.
▶Useful to find applications generating too much I/O traffic.
•Needs CONFIG_TASKSTATS=y, CONFIG_TASK_DELAY_ACCT=y and
CONFIG_TASK_IO_ACCOUNTING=y to be enabled in the kernel.
# iotop
Total DISK READ:        20.61 K/s | Total DISK WRITE:        51.52 K/s
Current DISK READ:      20.61 K/s | Current DISK WRITE:      24.04 K/s
    TID  PRIO  USER     DISK READ  DISK WRITE>  COMMAND
   2629 be/4 cleger     20.61 K/s   44.65 K/s firefox-esr [Cache2 I/O]
    822 be/3 root        0.00 B/s    3.43 K/s [jbd2/nvme0n1p1-8]
  39055 be/4 cleger      0.00 B/s    3.43 K/s firefox-esr [DOMCacheThread]
      1 be/4 root        0.00 B/s    0.00 B/s init
      2 be/4 root        0.00 B/s    0.00 B/s [kthreadd]
      3 be/0 root        0.00 B/s    0.00 B/s [rcu_gp]
      4 be/0 root        0.00 B/s    0.00 B/s [rcu_par_gp]
...
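The per-process counters used by iotop are also visible in /proc/<pid>/io when CONFIG_TASK_IO_ACCOUNTING is enabled. This sketch computes read and write rates from two samples of that file. read_bytes/write_bytes count storage I/O, while rchar/wchar count all read()/write() traffic, including cache hits. Reading the file for another user's process usually requires privileges. The helper names are hypothetical.

```python
def parse_proc_io(text):
    """Parse /proc/<pid>/io ("key: value" lines) into a dict of integers."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            counters[key.strip()] = int(value)
    return counters

def disk_io_rates(before, after, interval_s):
    """(read, write) storage I/O rates in bytes/s from two /proc/<pid>/io samples."""
    b, a = parse_proc_io(before), parse_proc_io(after)
    return ((a["read_bytes"] - b["read_bytes"]) / interval_s,
            (a["write_bytes"] - b["write_bytes"]) / interval_s)
```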

Practical lab - System Status
Check what is running on a system and its load
▶Observe processes and IOs
▶Display memory mappings
▶Monitor resources

Application Debugging

Application Debugging
Good practices

Good practices
▶Some good practices can save you time before you even need to use a
debugger
▶Compilers are now smart enough to detect a wide range of errors at compile time
using warnings
•Using -Werror, -Wall and -Wextra is recommended if possible, to catch errors as
early as possible
▶Compilers now offer static analysis capabilities
•GCC provides this through the -fanalyzer flag
•LLVM provides dedicated tools that can be used in the build process
▶You can also enable component-specific helpers/hardening
•If you are using the GNU C library, you can for example enable the
_FORTIFY_SOURCE macro to add runtime checks on inputs (e.g. buffers)

Application Debugging
Building with debug information

Debugging with ELF files
▶GDB uses ELF files since they contain the debugging
information
▶Debugging information uses the DWARF format
▶Allows the debugger to match addresses and symbol names,
call sites, etc
▶Debugging information is generated by the compiler and
included in the ELF file when compiling with -g

-g1: minimal debug information (enough for backtraces)

-g2: default debug level when using -g

-g3: includes extra debugging information (macro
definitions)
▶See the GCC documentation about debugging for more
information

Debugging with compiler optimizations
▶Compiler optimizations (-O<level>) can lead to optimizing out some variables
and function calls.
▶Trying to display them with GDB will display

$1 = <value optimized out>
▶If one wants to inspect variables and functions, it is possible to compile the code
using -O0 (no optimization).
•Note: the kernel can only be compiled with -O2 or -Os
▶It is also possible to annotate functions with compiler attributes:

__attribute__((optimize("O0")))
▶Remove the static qualifier from a function to avoid inlining it
•Note: LTO (Link Time Optimization) can defeat this.
▶Declare a specific variable as volatile to prevent the compiler from optimizing it out.

Application Debugging
Instrumenting code crashes

Instrumenting code crashes
▶Displaying a backtrace from your application where the crash happened is useful
for debugging, and can be done using the backtrace() GNU extension function
(man backtrace(3)), together with backtrace_symbols() to resolve symbol names:
int backtrace(void **buffer, int size);
char **backtrace_symbols(void *const *buffer, int size);
▶Thanks to signal() (man signal(3)), we can add hooks on specific signals to print
our backtrace
•This is for example very useful to catch the SIGSEGV signal and dump our current
backtrace
void (*signal(int sig, void (*func)(int)))(int);

Custom code crash report
#include <execinfo.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

void callee(void *ptr) {
    int *uvptr = (int *)ptr;
    printf("Executing suspicious operation\n");
    uvptr[2] = 0;   /* NULL dereference: triggers SIGSEGV */
}

void caller(void) {
    void *ptr = NULL;
    callee(ptr);
}

void segfault_handler(int sig) {
    void *array[20];
    int size;
    (void)sig;
    /* fprintf() and exit() are not async-signal-safe: acceptable for a demo */
    fprintf(stderr, "Segmentation fault!\n");
    size = backtrace(array, 20);
    backtrace_symbols_fd(array, size, STDERR_FILENO);
    exit(1);
}

int main(void) {
    signal(SIGSEGV, segfault_handler);
    printf("Calling a faulty function\n");
    caller();
    return 0;
}
[root@arch-bootlin-alexis custom_backtrace]# ./main
Calling a faulty function
Executing suspicious operation
Segmentation fault!
./main(segfault_handler+0x60)[0x55c624c1723c]
/usr/lib/libc.so.6(+0x38f50)[0x7f2cb0a95f50]
./main(callee+0x2b)[0x55c624c171b4]
./main(caller+0x1c)[0x55c624c171d9]
./main(main+0x2c)[0x55c624c1729a]
/usr/lib/libc.so.6(+0x23790)[0x7f2cb0a80790]
/usr/lib/libc.so.6(__libc_start_main+0x8a)[0x7f2cb0a8084a]
./main(_start+0x25)[0x55c624c170b5]
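Each line of this report follows the module(symbol+offset)[address] layout. The sketch below splits such lines so that frames can be fed to addr2line. It assumes, based on our reading of glibc, that frames without a symbol name carry an offset relative to the module's load address, which is what addr2line expects for PIE binaries and shared libraries. Note also that function names of the main program, such as segfault_handler, generally only appear when linking with -rdynamic (see man backtrace(3)).

```python
import re

# backtrace_symbols() format: module(symbol+offset)[address]; symbol may be empty
_FRAME = re.compile(
    r"^(?P<module>[^(]+)\((?P<symbol>[^+)]*)(?:\+(?P<offset>0x[0-9a-fA-F]+))?\)"
    r"\s*\[(?P<address>0x[0-9a-fA-F]+)\]$"
)

def parse_frame(line):
    """Split one backtrace_symbols() line into its components, or return None."""
    m = _FRAME.match(line.strip())
    if m is None:
        return None
    offset = m.group("offset")
    return {
        "module": m.group("module"),
        "symbol": m.group("symbol") or None,
        "offset": int(offset, 16) if offset else None,
        "address": int(m.group("address"), 16),
    }

def addr2line_cmd(frame):
    """Build an addr2line command for frames printed as module(+offset)[address]."""
    if frame["symbol"] is not None or frame["offset"] is None:
        return None
    return ["addr2line", "-f", "-e", frame["module"], hex(frame["offset"])]
```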

Application Debugging
The ptrace system call

ptrace
▶The ptrace mechanism allows processes to trace other processes by accessing
tracee memory and register contents
▶A tracer can observe and control the execution state of another process
▶Works by attaching to a tracee process using the ptrace() system call (see
man ptrace(2))
▶Can be called directly through ptrace(), but is more often used indirectly
through other tools.
long ptrace(enum __ptrace_request request, pid_t pid, void *addr, void *data);
▶Used by GDB, strace and all debugging tools that need access to the tracee
process state
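Two details often explain why a debugger cannot attach. First, a process can only have one tracer, which is visible in the TracerPid field of /proc/<pid>/status. Second, the Yama security module may restrict ptrace() through /proc/sys/kernel/yama/ptrace_scope. The small sketch below checks both. The descriptions of the scope values are paraphrased from the kernel's Yama documentation, and the helper names are hypothetical.

```python
YAMA_PTRACE_SCOPE = {
    0: "classic: any process of the same user can be traced",
    1: "restricted: only descendants (or explicitly allowed processes) can be traced",
    2: "admin-only: CAP_SYS_PTRACE is required",
    3: "no attach: ptrace attach is disabled",
}

def tracer_pid(status_text):
    """PID of the process currently ptrace()-attached to this one (0 if none)."""
    for line in status_text.splitlines():
        if line.startswith("TracerPid:"):
            return int(line.split(":", 1)[1])
    raise ValueError("TracerPid field not found")

def describe_ptrace_scope(path="/proc/sys/kernel/yama/ptrace_scope"):
    """Explain the Yama ptrace restriction level, if the Yama LSM is enabled."""
    try:
        with open(path) as f:
            return YAMA_PTRACE_SCOPE.get(int(f.read()), "unknown")
    except FileNotFoundError:
        return "Yama not enabled: classic ptrace permission checks only"
```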

Application Debugging
GDB

GDB: GNU Project Debugger
▶The debugger on GNU/Linux, available for most embedded
architectures.
▶Supported languages: C, C++, Pascal, Objective-C, Fortran,
Ada...
▶Command-line interface
▶Integration in many graphical IDEs
▶Can be used to
•control the execution of a running program, set breakpoints or
change internal variables
•see what a program was doing when it crashed: post mortem
analysis

https://www.gnu.org/software/gdb/

https://en.wikipedia.org/wiki/Gdb
▶New alternative: lldb (https://lldb.llvm.org/)
from the LLVM project.

GDB crash course (1/3)
▶GDB is mainly used to debug a process by starting it with gdb

$ gdb <program>
▶GDB can also be attached to running processes using the program PID

$ gdb -p <pid>
▶When using GDB to start a program, the program needs to be run with

(gdb) run [prog_arg1 [prog_arg2] ...]

GDB crash course (2/3)
A few useful GDB commands

break foobar (b)
Put a breakpoint at the entry of function foobar()

break foobar.c:42
Put a breakpoint in foobar.c, line 42

print var, print $reg or print task->files[0].fd (p)
Print the variable var, the register $reg or a more complicated reference. GDB can also
nicely display structures with all their members

info registers
Display architecture registers

GDB crash course (3/3)

continue (c)
Continue the execution after a breakpoint

next (n)
Continue to the next line, stepping over function calls

step (s)
Continue to the next line, entering into subfunctions

stepi (si)
Continue to the next instruction

finish
Execute up to function return

backtrace (bt)
Display the program stack

GDB advanced commands (1/3)

info threads (i threads)
Display the list of threads that are available

info breakpoints (i b)
Display the list of breakpoints/watchpoints

delete <n> (d <n>)
Delete breakpoint <n>

thread <n> (t <n>)
Select thread number <n>

frame <n> (f <n>)
Select a specific frame from the backtrace, the number being the one displayed when
using backtrace at the beginning of each line

GDB advanced commands (2/3)

watch <variable> or watch *<address>
Add a watchpoint on a specific variable/address.

print variable = value (p variable = value)
Modify the content of the specified variable with a new value

break foobar.c:42 if condition == value
Break only if the specified condition is true

watch <variable> if condition == value
Trigger the watchpoint only if the specified condition is true

display <expr>
Automatically prints the expression each time the program stops

x/<n><u> <address>
Display memory at the provided address. n is the number of units to display and u the
unit size (b/h/w/g for 1, 2, 4 or 8 bytes). A format letter can also be added, e.g. i to
display instructions (x/10i $pc).

GDB advanced commands (3/3)

list <expr>
Display the source code associated with the current program counter location.

disassemble <location,start_offset,end_offset> (disas)
Display the assembly code that is currently executed.

p function(arguments)
Execute a function using GDB. NOTE: be careful of any side effects that may happen
when executing the function

p $newvar = value
Declare a new gdb variable that can be used locally or in command sequence

define <command_name>
Define a new command sequence. GDB will prompt for the sequence of commands.

Remote debugging
▶In a non-embedded environment, debugging takes place using gdb or one of its
front-ends.

gdb has direct access to the binary and libraries compiled with debugging symbols.
▶However, in an embedded context, the target platform environment is often too
limited to allow direct debugging with gdb (2.4 MB on x86).
▶Remote debugging is preferred

ARCH-linux-gdb is used on the development workstation, offering all its features.

gdbserver is used on the target system (only 400 KB on arm).

Remote debugging: architecture

Remote debugging: target setup
▶On the target, run a program through gdbserver.
Program execution will not start immediately.
gdbserver :<port> <executable> <args>
gdbserver /dev/ttyS0 <executable> <args>
▶Otherwise, attach gdbserver to an already running program:
gdbserver --attach :<port> <pid>
▶You can also start gdbserver without passing any program to start or attach (and
set the target program later, on client side):
gdbserver --multi :<port>

Remote debugging: host setup
▶Then, on the host, start ARCH-linux-gdb <executable>,
and use the following gdb commands:
•To tell gdb where shared libraries are:
gdb> set sysroot <library-path> (typically path to build space without lib/)
•To connect to the target:
gdb> target remote <ip-addr>:<port> (networking)
gdb> target remote /dev/ttyUSB0 (serial link)
Make sure to replace target remote with target extended-remote if you have
started gdbserver with the --multi option
•If you did not set the program to debug on the gdbserver command line:
gdb> set remote exec-file <path_to_program_on_target>

Coredumps for post mortem analysis
▶When an application crashes due to a segmentation fault and the application was
not under control of a debugger, we get no information about the crash
▶Fortunately, Linux can generate a core file that contains the image of the
application memory at the moment of the crash in the ELF format. gdb can use
this core file to let us analyze the state of the crashed application
▶On the target
•Use ulimit -c unlimited in the shell starting the application, to enable the
generation of a core file when a crash occurs
•The output name for the coredump file can be modified using
/proc/sys/kernel/core_pattern.
•See man core(5)
▶On the host
•After the crash, transfer the core file from the target to the host, and run
ARCH-linux-gdb -c core-file application-binary
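When core_pattern is changed, previewing the resulting file name can help. This sketch expands a template, turning "%%" into "%" and dropping unknown specifiers as described in man core(5). The caller supplies the values (for instance e for the executable name and p for the PID), and patterns starting with "|", which pipe the core to a helper program, are rejected. expand_core_pattern is a hypothetical helper.

```python
import re

def expand_core_pattern(pattern, values):
    """Preview the core file name produced by a core_pattern template.

    `values` maps specifier letters to strings, e.g. {"e": "main", "p": "1234"}.
    "%%" gives a literal "%"; a specifier missing from `values`, or a trailing
    lone "%", expands to nothing.
    """
    if pattern.startswith("|"):
        raise ValueError("piped core_pattern: the core is handed to a helper program")

    def substitute(match):
        spec = match.group(1)
        return "%" if spec == "%" else values.get(spec, "")

    return re.sub(r"%(.?)", substitute, pattern)
```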

minicoredumper
▶Coredumps can be huge for complex applications
▶minicoredumper is a userspace tool based on the standard core dump feature
•Based on the possibility to redirect the core dump output to a user space program
via a pipe
▶Based on a JSON configuration file, it can:
•save only the relevant sections (stack, heap, selected ELF sections)
•compress the output file
•save additional information from /proc

https://github.com/diamon/minicoredumper
▶“Efficient and Practical Capturing of Crash Data on Embedded Systems”
•Presentation by minicoredumper author John Ogness
•Video:TxxvsGddwwwDyVuxuseDcVudwSxcThy=WeffuwwgNMLs
•Slides: elinux.org/images/8/81/Eoss2023_ogness_minicoredumper.pdf

GDB: going further
▶Tutorial: Debugging Embedded Devices using GDB - Chris Simmonds, 2020
•Slides: https://elinux.org/images/0/01/Debugging-with-gdb-csimmonds-elce-2020.pdf
•Video:TxxvsGddwwwDyVuxuseDcVudwSxcThy=MLTIgdeSRC2

GDB Python Extension
▶GDB features a Python integration, allowing some debugging operations to be scripted
▶When executing Python under GDB, a module named gdb is available and all the
GDB-specific classes are accessible under this module
▶Allows adding new types of commands, breakpoints and pretty-printers
•Used by the kernel to create new commands with the Python GDB scripts
▶Allows full control and observability over the debugged program using GDB
capabilities from Python scripts
•Controlling execution, adding breakpoints, watchpoints, etc
•Accessing the process memory, frames, symbols, etc

GDB Python Extension (1/2)
class PrintOpenFD(gdb.FinishBreakpoint):
    def __init__(self, file):
        self.file = file
        super(PrintOpenFD, self).__init__()

    def stop(self):
        # Called when open() returns: return_value holds the new fd
        print("---> File " + self.file + " opened with fd " + str(self.return_value))
        return False

class PrintOpen(gdb.Breakpoint):
    def stop(self):
        # "file" is the name of open()'s first argument in glibc
        PrintOpenFD(gdb.parse_and_eval("file").string())
        return False

class TraceFDs(gdb.Command):
    def __init__(self):
        super(TraceFDs, self).__init__("tracefds", gdb.COMMAND_USER)

    def invoke(self, arg, from_tty):
        print("Hooking open() with custom breakpoint")
        PrintOpen("open")

TraceFDs()

GDB Python Extension (2/2)
▶Python scripts can be loaded using the gdb source command
•Or the script can be named <program>-gdb.py and will be loaded automatically by
GDB
(gdb) source trace_fds.py
(gdb) tracefds
Hooking open() with custom breakpoint
Breakpoint 1 at 0x3320
(gdb) run
Starting program: /usr/bin/touch foo bar
Temporary breakpoint 2 at 0x5555555587da
---> File foo opened with fd 3
Temporary breakpoint 3 at 0x5555555587da
---> File bar opened with fd 0

Common debugging issues
▶You will likely encounter some issues while debugging, like poor address->symbol
conversion, "optimized out" values or functions, empty backtraces...
▶A quick checklist before starting debugging can save you some trouble:
•Make sure your host binary has debug symbols: with gcc, ensure -g is provided, and
use the non-stripped version with host gdb
•Disable optimizations on the final binary (-O0) if possible, or at least use a less
intrusive level (-Og)
Static functions can for example be folded into their caller depending on the optimization
level, so they would be missing from backtraces
•Prevent code optimization from reusing the frame pointer register: with GCC, make sure
the -fno-omit-frame-pointer option is set
Not only true for debugging: any profiling/tracing tool relying on backtraces will
benefit from it
▶Your application is probably composed of multiple libraries: you will need to apply
those configurations to all used components!

Practical lab - Solving an application crash
Debugging an application crash
▶Code generation analysis with compiler-explorer
▶Using GDB and its Python support
▶Analyzing and using a coredump

Application Tracing

Application Tracing
strace

strace
System call tracer - https://strace.io
▶Available on all GNU/Linux systems
Can be built by your cross-compiling toolchain generator or by
your build system.
▶Allows you to see what any of your processes is doing: accessing files,
allocating memory... Often sufficient to find simple bugs.
▶Usage:
strace <command> (starting a new process)
strace -f <command> (follow child processes too)
strace -p <pid> (tracing an existing process)
strace -c <command> (time statistics per system call)
strace -e <expr> <command> (use an expression for advanced
filtering)
See the strace manual for details.
Image credits: https://strace.io/

strace example output
> strace cat Makefile
[...]
fstat64(3, {st_mode=S_IFREG|0644, st_size=111585, ...}) = 0
mmap2(NULL, 111585, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7f69000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/tls/i686/cmov/libc.so.6", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\320h\1\0004\0\0\0\344"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=1442180, ...}) = 0
mmap2(NULL, 1451632, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb7e06000
mprotect(0xb7f62000, 4096, PROT_NONE) = 0
mmap2(0xb7f66000, 9840, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb7f66000
close(3) = 0
[...]
openat(AT_FDCWD, "Makefile", O_RDONLY) = 3
newfstatat(3, "", {st_mode=S_IFREG|0644, st_size=173, ...}, AT_EMPTY_PATH) = 0
fadvise64(3, 0, 0, POSIX_FADV_SEQUENTIAL) = 0
mmap(NULL, 139264, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f7290d28000
read(3, "ifneq ($(KERNELRELEASE),)obj-m "..., 131072) = 173
write(1, "ifneq ($(KERNELRELEASE),)obj-m "..., 173ifneq ($(KERNELRELEASE),)
Hint: follow the open file descriptors returned by open() (or openat()). This tells you which files are
handled by further system calls.
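To practice following file descriptors, a small hypothetical helper like the one below can be called from a test program run under strace. With a recent glibc, open() is usually issued as the openat() system call. The descriptor it returns then reappears in the read() and close() lines. The trace lines in the comments are approximate.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Read up to bufsz bytes from path; returns the byte count or -1 on error. */
ssize_t read_file(const char *path, char *buf, size_t bufsz)
{
    int fd = open(path, O_RDONLY);    /* strace: openat(AT_FDCWD, path, O_RDONLY) = fd */
    if (fd < 0)
        return -1;                    /* strace: e.g. = -1 ENOENT for a missing file */
    ssize_t n = read(fd, buf, bufsz); /* strace: read(fd, ..., bufsz) = n */
    close(fd);                        /* strace: close(fd) = 0 */
    return n;
}
```

In the trace, the value returned by openat() is the number to look for as the first argument of the following read() and close() calls.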

strace -c example output
> strace -c cheese
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
36.24 0.523807 19 27017 poll
28.63 0.413833 5 75287 115 ioctl
25.83 0.373267 6 63092 57321 recvmsg
3.03 0.043807 8 5527 writev
2.69 0.038865 10 3712 read
2.14 0.030927 3 10807 getpid
0.28 0.003977 1 3341 34 futex
0.21 0.002991 3 1030 269 openat
0.20 0.002889 2 1619 975 stat
0.18 0.002534 4 568 mmap
0.13 0.001851 5 356 mprotect
0.10 0.001512 2 784 close
0.08 0.001171 3 461 315 access
0.07 0.001036 2 538 fstat
...

Application Tracing
ltrace

ltrace
A tool to trace shared library calls used by a program and all the signals it receives
▶Very useful complement to strace, which shows only system calls.
▶Of course, works even if you don't have the sources
▶Allows filtering library calls with regular expressions, or just by a list of function
names.
▶With the -S option it shows system calls too!
▶Also offers a summary with its -c option.
▶Manual page: https://linux.die.net/man/1/ltrace
▶Works better with glibc. ltrace used to be broken with uClibc (now fixed), and is
not supported with Musl (Buildroot 2022.11 status).
See https://en.wikipedia.org/wiki/Ltrace for details

ltrace example output
# ltrace ffmpeg -f video4linux2 -video_size 544x288 -input_format mjpeg -i /dev/video0 -pix_fmt rgb565le -f fbdev /dev/fb0
__libc_start_main([ "ffmpeg", "-f", "video4linux2", "-video_size"... ] <unfinished ...>
setvbuf(0xb6a02c80, nil, 2, 0) = 0
av_log_set_flags(1, 0, 1, 0) = 1
strchr("f", ':') = nil
strlen("f") = 1
strncmp("f", "L", 1) = 26
strncmp("f", "h", 1) = -2
strncmp("f", "?", 1) = 39
strncmp("f", "help", 1) = -2
strncmp("f", "-help", 1) = 57
strncmp("f", "version", 1) = -16
strncmp("f", "buildconf", 1) = 4
strncmp("f", "formats", 1) = 0
strlen("formats") = 7
strncmp("f", "muxers", 1) = -7
strncmp("f", "demuxers", 1) = 2
strncmp("f", "devices", 1) = 2
strncmp("f", "codecs", 1) = 3
...
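The strncmp() sequence above comes from ffmpeg looking up an option name in its table. Here is a hedged illustration: the table and find_option() are hypothetical and are not ffmpeg code. A similar lookup produces one strlen()/strncmp() call per candidate under ltrace. Building with -fno-builtin keeps GCC from replacing these library calls with inline code, which ltrace could not see.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical option table, loosely modeled on the names in the trace above. */
static const char *const options[] = { "L", "h", "?", "help", "-help",
                                       "version", "formats", "codecs" };

/* Return the index of the option exactly matching arg, or -1. */
int find_option(const char *arg)
{
    size_t len = strlen(arg);
    for (size_t i = 0; i < sizeof(options) / sizeof(options[0]); i++) {
        /* Compare the first len characters, then require equal lengths
         * so that "f" does not match "formats". */
        if (strncmp(arg, options[i], len) == 0 && strlen(options[i]) == len)
            return (int)i;
    }
    return -1;
}
```

Calling find_option() from a small main() and running the result under ltrace should show the same kind of strlen()/strncmp() pattern as the ffmpeg trace.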

ltrace summary
Example summary at the end of the ltrace output (-c option)
% time seconds usecs/call calls function
------ ----------- ----------- --------- --------------------
52.64 5.958660 5958660 1 __libc_start_main
20.64 2.336331 2336331 1 avformat_find_stream_info
14.87 1.682845 421 3995 strncmp
7.17 0.811210 811210 1 avformat_open_input
0.75 0.085290 584 146 av_freep
0.49 0.055150 434 127 strlen
0.29 0.033008 660 50 av_log
0.22 0.025090 464 54 strcmp
0.20 0.022836 22836 1 avformat_close_input
0.16 0.017788 635 28 av_dict_free
0.15 0.016819 646 26 av_dict_get
0.15 0.016753 440 38 strchr
0.13 0.014536 581 25 memset
...
------ ----------- ----------- --------- --------------------
100.00 11.318773 4762 total

Application Tracing
LD_PRELOAD

Shared libraries
▶Shared libraries are provided as .so files that are actually ELF files
•Loaded at startup by ld.so (the dynamic loader)
•Or at runtime using dlopen() from your code
▶When starting a program (an ELF file actually), the kernel parses it and loads
the interpreter that needs to be invoked.
•Most of the time, the PT_INTERP program header of the ELF file is set to ld-linux.so.
▶At loading time, the dynamic loader ld.so resolves all the symbols that are
present in dynamic libraries.
▶Shared libraries are loaded only once by the OS and then mappings are created for
each application that uses the library.
•This reduces the memory used by libraries.

Hooking Library Calls
▶In order to do some more complex library call hooks, one can use the
LD_PRELOAD environment variable.
▶LD_PRELOAD is used to specify a shared library that will be loaded before any
other library by the dynamic loader.
▶Allows intercepting all library calls by preloading another library.
•Overrides library symbols that have the same name.
•Allows redefining only a few specific symbols.
•"Real" symbols can still be loaded and used with dlsym() (man dlsym(3)), typically with RTLD_NEXT
▶Used by some debugging/tracing libraries (libsegfault, libefence)
▶Works for C and C++.
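Here is a minimal sketch of such a hook, assuming glibc. RTLD_NEXT requires _GNU_SOURCE, and glibc older than 2.34 also needs -ldl at link time. The wrapper counts puts() calls and forwards each one to the next puts() definition found by dlsym(RTLD_NEXT, ...), normally the libc one. The counter and get_puts_calls() are illustrative names.

```c
#define _GNU_SOURCE /* needed for RTLD_NEXT */
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

static unsigned long puts_calls; /* calls seen by the wrapper (not thread-safe) */

/* Same name and prototype as the libc function, so this definition is found
 * first; dlsym(RTLD_NEXT, "puts") returns the next definition in the lookup
 * order, normally the real libc puts(). */
int puts(const char *s)
{
    static int (*real_puts)(const char *);
    if (!real_puts) {
        real_puts = (int (*)(const char *))dlsym(RTLD_NEXT, "puts");
        if (!real_puts)
            return EOF; /* real symbol not found: report an error */
    }
    puts_calls++;
    return real_puts(s);
}

unsigned long get_puts_calls(void)
{
    return puts_calls;
}
```

For example, build it as a shared library with gcc -shared -fPIC -o libcount.so count.c (add -ldl on older glibc). Loaded with LD_PRELOAD=./libcount.so, the same wrapper intercepts the puts() calls of an unmodified program.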

LD_PRELOAD example
▶Library snippet that we want to preload using LD_PRELOAD:
#include <string.h>
#include <unistd.h>

/* Overrides libc's read(): every buffer is filled with 0x42 ('B') */
ssize_t read(int fd, void *data, size_t size) {
    memset(data, 0x42, size);
    return size;
}
▶Compilation of the library for LD_PRELOAD usage:
$ gcc -shared -fPIC -o my_lib.so my_lib.c
▶Preloading the new library using LD_PRELOAD:
$ LD_PRELOAD=./my_lib.so ./exe

Application Tracing
uprobes and perf

uprobes
▶uprobe is a mechanism offered by the kernel allowing to trace userspace code.
▶Tracepoints can be added dynamically on any userspace symbol
•Internally patches the .text section with breakpoints that are handled by the kernel
trace system
▶Exposed by file /sys/kernel/debug/tracing/uprobe_events
▶Often wrapped up by other tools (perf, bcc for instance).
▶See trace/uprobetracer in the kernel documentation

The perf tool
▶The perf tool was started as a tool to profile applications under Linux using
performance counters (man perf(1)).
▶It became much more than that and now allows managing tracepoints, kprobes
and uprobes.
▶perf can profile both user-space and kernel-space execution.
▶perf is based on the perf_event interface that is exposed by the kernel.
▶Provides a set of operations, each having specific arguments (see perf help).
•stat, record, report, top, annotate, ftrace, list, probe, etc
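To illustrate the perf_event interface mentioned above, here is a hedged sketch. It opens one hardware counter through the raw perf_event_open(2) system call, because glibc provides no wrapper. It then counts user-space instructions around a small loop. It returns -1 when the counter is unavailable, for example without a hardware PMU, with a restrictive perf_event_paranoid setting, or without CONFIG_PERF_EVENTS.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/perf_event.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* glibc has no perf_event_open() wrapper: issue the raw system call. */
static int perf_event_open(struct perf_event_attr *attr, pid_t pid, int cpu,
                           int group_fd, unsigned long flags)
{
    return (int)syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}

/* Count user-space instructions retired by a small loop.
 * Returns the count, or -1 if the counter cannot be opened or read. */
long long count_loop_instructions(void)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.size = sizeof(attr);
    attr.type = PERF_TYPE_HARDWARE;
    attr.config = PERF_COUNT_HW_INSTRUCTIONS;
    attr.disabled = 1;       /* created stopped, enabled around the workload */
    attr.exclude_kernel = 1; /* user space only */
    attr.exclude_hv = 1;

    int fd = perf_event_open(&attr, 0 /* this thread */, -1 /* any CPU */, -1, 0);
    if (fd < 0)
        return -1;

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    volatile unsigned int sink = 0;
    for (unsigned int i = 0; i < 100000; i++)
        sink += i;           /* workload being measured */
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    long long count = -1;
    if (read(fd, &count, sizeof(count)) != (ssize_t)sizeof(count))
        count = -1;
    close(fd);
    return count;
}
```

perf stat builds on the same interface, with many events, grouping and multiplexing on top.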

Using perf record
▶perf record allows recording performance events on a per-thread, per-process or
per-CPU basis.
▶Kernel needs to be configured with CONFIG_PERF_EVENTS=y.
▶This is the first command that needs to be run to gather data from program
execution and store it in perf.data.
▶The perf.data file can then be analyzed using perf annotate and perf report.
•Useful on embedded systems to analyze data on another computer.

Probing userspace functions
▶List functions that can be probed in a specific executable:
$ perf probe --source=<source_dir> -x my_app -F
▶List line numbers that can be probed in a specific executable/function:
$ perf probe --source=<source_dir> -x my_app -L my_func
▶Create uprobes on user-space library/executable functions:
$ perf probe -x /lib/libc.so.6 printf
$ perf probe -x app my_func:8 my_var
$ perf probe -x app my_func%return ret=%r0
▶Record the execution of these tracepoints:
$ perf record -e probe_app:my_func -e probe_libc:printf
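The commands above could target a hypothetical app like the one below. noinline keeps real calls to my_func() even when optimizing, so a uprobe placed on it fires on each call. my_var is the parameter that a probe can fetch. Compiling with -g lets perf probe resolve line numbers and variable names.

```c
#include <assert.h>

/* Hypothetical probe target for "perf probe -x app my_func ...". */
__attribute__((noinline)) int my_func(int my_var)
{
    int ret = my_var * 2;   /* a line probe inside my_func can fetch my_var */
    return ret;             /* a my_func%return probe records this value */
}
```

The %r0 register in the return probe above is ARM-specific. perf probe should also accept $retval in return probes, which avoids naming an architecture register.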

Practical lab - Application tracing
Analyzing application interactions
▶Analyze dynamic library calls from an
application using ltrace.
▶Overriding a library function with LD_PRELOAD.
▶Using strace to analyze program syscalls.

Memory Issues

Usual Memory Issues
▶Programming (almost) always involves accessing memory
▶If done incorrectly, a large variety of errors can be triggered
•Segmentation Faults can happen when accessing invalid memory addresses (NULL
pointers or use-after-free for instance)
•Buffer Overflows can happen if accessing a buffer outside its boundaries
•Memory Leaks happen when allocating memory and forgetting to free it after use
▶Fortunately, there are tools to debug these errors

Segmentation Faults
▶Segmentation Faults are generated by the kernel when a program tries to access a
memory area that it is not allowed to access, or accesses it in an incorrect way
•Might be generated by a write on a read-only memory zone
•Can also be triggered when trying to execute memory that is not executable
int *ptr = NULL;
*ptr = 1; /* NULL pointer dereference */
▶Execution will yield a Segmentation fault message in the terminal
$ ./program
Segmentation fault
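To see the kernel side of this, the sketch below assumes Linux and glibc. It maps a read-only page, writes to it, and catches the resulting SIGSEGV with a handler that jumps back using siglongjmp(). Recovering from SIGSEGV like this is only meant for demonstration; a real program should let the fault terminate it.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf recover_point;

/* SIGSEGV handler: jump back to the point saved in write_readonly_page(). */
static void segv_handler(int sig)
{
    (void)sig;
    siglongjmp(recover_point, 1);
}

/* Returns 1 if writing to a read-only page raised SIGSEGV, 0 if the write
 * unexpectedly succeeded, -1 on setup error. */
int write_readonly_page(void)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;
    volatile char *p = mmap(NULL, (size_t)page, PROT_READ,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    struct sigaction sa, old;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = segv_handler;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old);

    int faulted = 0;
    /* savemask=1: siglongjmp() restores the mask, unblocking SIGSEGV again. */
    if (sigsetjmp(recover_point, 1) == 0)
        p[0] = 42;   /* write on a read-only zone: the kernel sends SIGSEGV */
    else
        faulted = 1; /* reached through siglongjmp() from the handler */

    sigaction(SIGSEGV, &old, NULL); /* restore the previous handler */
    munmap((void *)p, (size_t)page);
    return faulted;
}
```

Without the handler, the same write would end the process with the Segmentation fault message shown above.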

Buffer Overflows
▶Buffer Overflows are easily triggered when accessing an array outside of its
boundaries (most often past the end)
▶Such access might generate a crash or not depending on the access
•Writing past the end of a malloc()'ed array will most often overwrite the malloc
data structure, leading to corruption
•Writing past the end of an array allocated on the stack can corrupt data on the stack
•Reading past the end of an array might generate a segfault but not always: this
depends on the area of memory that is accessed
uint32_t *array = malloc(10 * sizeof(*array));
array[10] = 0xDEADBEEF; /* valid indices are 0 to 9 */
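A simple defense is to carry the length along with the pointer and check every index before writing. In this sketch, checked_store() is a hypothetical helper. It rejects the out-of-bounds array[10] write from the example instead of silently corrupting the heap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ARRAY_LEN 10

/* Bounds-checked write: valid indices are 0..len-1.
 * Returns 0 on success, -1 if idx is out of bounds. */
int checked_store(uint32_t *array, size_t len, size_t idx, uint32_t value)
{
    if (idx >= len)
        return -1;   /* out of bounds: refuse the write */
    array[idx] = value;
    return 0;
}
```

Tools such as valgrind or libefence remain useful for code that indexes arrays directly without such checks.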

Memory Leaks
▶Memory leaks are another class of memory errors that will not directly trigger a
crash but will exhaust the system memory (sooner or later)
▶This happens when allocating memory in your program and not releasing it after
using it
▶Can show up in production when the program runs for a very long time
•Better to debug that kind of problem early in the development process
void func1(void) {
    uint32_t *array = malloc(10 * sizeof(*array));
    do_something_with_array(array);
    /* missing free(array): the buffer leaks at each call */
}
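The fix is to pair the malloc() with a free() once the buffer is no longer needed. In the sketch below, do_something_with_array() is a hypothetical stand-in that sums the elements. It takes an explicit length, unlike the one-argument call above. With this version, memcheck should no longer report a "definitely lost" block for func1().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for do_something_with_array(): sums the elements. */
static uint64_t do_something_with_array(const uint32_t *array, size_t len)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += array[i];
    return sum;
}

/* Fixed func1(): returns 0 on success, -1 if the allocation fails. */
int func1(uint64_t *result)
{
    size_t len = 10;
    uint32_t *array = malloc(len * sizeof(*array));
    if (!array)
        return -1;
    for (size_t i = 0; i < len; i++)
        array[i] = (uint32_t)i;
    *result = do_something_with_array(array, len);
    free(array);   /* release the buffer: no leak */
    return 0;
}
```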

Memory Issues
Valgrind memcheck

Valgrind (1/2)
▶Valgrind is an instrumentation framework for building dynamic
analysis tools
▶valgrind is also a tool that is based on this framework and
provides a memory error detector, heap profilers and other
profilers.
▶It is supported on all the popular platforms: Linux on x86,
x86_64, arm (armv7 only), arm64, mips32, s390, ppc32 and
ppc64.

Valgrind (2/2)
▶Works by adding its own instrumentation to your code and then
running it on its own virtual CPU core. Significantly slows down
execution, and is thus suited for debugging and profiling
▶Memcheck is the default valgrind tool and it detects
memory-management errors
•Access to invalid memory zones, use of uninitialized values,
memory leaks, bad freeing of heap blocks, etc
•Can be run on any application, with no need to recompile it
$ valgrind --tool=memcheck --leak-check=full <program>

Valgrind Memcheck usage and report
$ valgrind ./mem_leak
==202104== Memcheck, a memory error detector
==202104== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==202104== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info
==202104== Command: ./mem_leak
==202104==
==202104== Conditional jump or move depends on uninitialised value(s)
==202104== at 0x109161: do_actual_jump (in /home/user/mem_leak)
==202104== by 0x109187: compute_address (in /home/user/mem_leak)
==202104== by 0x1091A2: do_jump (in /home/user/mem_leak)
==202104== by 0x1091D7: main (in /home/user/mem_leak)
==202104==
==202104== HEAP SUMMARY:
==202104== in use at exit: 120 bytes in 1 blocks
==202104== total heap usage: 1 allocs, 0 frees, 120 bytes allocated
==202104==
==202104== LEAK SUMMARY:
==202104== definitely lost: 120 bytes in 1 blocks
==202104== indirectly lost: 0 bytes in 0 blocks
==202104== possibly lost: 0 bytes in 0 blocks
==202104== still reachable: 0 bytes in 0 blocks
==202104== suppressed: 0 bytes in 0 blocks
==202104== Rerun with --leak-check=full to see details of leaked memory

Valgrind and VGDB
▶Valgrind can also act as a GDB server which can receive and process commands.
One can interact with the valgrind gdb server either with a gdb client, or directly with the
vgdb program (provided with valgrind). vgdb can be used in different ways:
•As a standalone CLI program to send "monitor" commands to valgrind
•As a relay between a gdb client and an existing valgrind session
•As a server to drive multiple valgrind sessions from a remote gdb client
▶See man vgdb(1) for available modes, commands and options

Using GDB with Memcheck
▶Valgrind allows attaching GDB to the process that is currently being analyzed.
$ valgrind --tool=memcheck --leak-check=full --vgdb=yes --vgdb-error=0 ./mem_leak
▶Then attach gdb to the valgrind gdbserver using vgdb
$ gdb ./mem_leak
(gdb) target remote | vgdb
▶If valgrind detects an error, it will stop the execution and break into GDB.
(gdb) continue
Continuing.
Program received signal SIGTRAP, Trace/breakpoint trap.
0x0000000000109161 in do_actual_jump (p=0x4a52040) at mem_leak.c:5
5 if (p[1])
(gdb) bt
#0 0x0000000000109161 in do_actual_jump (p=0x4a52040) at mem_leak.c:5
#1 0x0000000000109188 in compute_address (p=0x4a52040) at mem_leak.c:11
#2 0x00000000001091a3 in do_jump (p=0x4a52040) at mem_leak.c:16
#3 0x00000000001091d8 in main () at mem_leak.c:27

Memory Issues
Electric Fence

libefence (1/2)
▶libefence is more lightweight than valgrind but less precise
▶Allows catching two types of common memory errors
•Buffer overflows and use after free
▶libefence will actually trigger a segfault upon the first error encountered in order
to generate a coredump.
▶Uses a shared library that can either be linked in at build time (-lefence) or
preloaded using LD_PRELOAD.
$ gcc -g program.c -o program
$ LD_PRELOAD=libefence.so.0.0 ./program
Electric Fence 2.2 Copyright (C) 1987-1999 Bruce Perens <bruce@perens.com>
Segmentation fault (core dumped)

libefence (2/2)
▶Upon segfault, a coredump will be generated in the current directory
▶This coredump can be opened with GDB and will pinpoint the exact location
where the error happened
$ gdb ./program core-program-3485
Reading symbols from ./libefence...
[New LWP 57462]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `./libefence'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 main () at libefence.c:8
8 data[99] = 1;
(gdb)

Practical lab - Debugging Memory Issues
Debug various memory issues using specific tooling
▶Memory leak and misbehavior detection with
valgrind and vgdb.

Application Profiling

Profiling
▶Profiling is the act of gathering data from a program execution in order to analyze
it and then optimize or fix performance issues.
▶Profiling is achieved by using programs that insert instrumentation in the code or
leverage kernel/userspace mechanisms.
•Profiling function calls and call counts helps optimize performance.
•Profiling processor usage helps optimize performance and reduce power usage.
•Profiling memory usage helps optimize memory consumption.
▶After profiling, the data set must be analyzed to identify potential improvements
(and not the reverse!).

Performance issues
”Premature optimization is the root of all evil”, Donald Knuth
▶Profiling is often useful to identify and fix performance issues.
▶Performance can be affected by memory usage, I/O overload, or CPU usage.
▶Gathering profiling data before trying to fix performance issues is needed to make the
right choices.
▶Profiling is often guided by a first coarse-grained analysis using some classic tools.
▶Once the class of problems has been identified, a fine-grained profiling analysis can
be done.

Profiling metrics
▶Multiple tools allow profiling various metrics.
▶Memory usage with Massif, heaptrack or memusage.
▶Function calls using perf and callgrind.
▶CPU hardware usage (Cache, MMU, etc) using perf.
▶Profiling data can include both the user space application and the kernel.

Visualizing data with flamegraphs
▶Visualization based on hierarchical stacks
▶Allows quickly finding bottlenecks and exploring the call stack
▶Popularized by Brendan Gregg's tools, which allow generating flamegraphs from
perf results.
•Scripts to generate flamegraphs are available at
https://github.com/brendangregg/FlameGraph
Image credits: https://www.brendangregg.com/flamegraphs.html

Going further with Flamegraphs
▶Really nice technical presentation from Brendan Gregg explaining the use of
flamegraphs for various metrics.
•Video: https://www.youtube.com/watch?v=D53T1Ejig1Q
•Slides: https://www.slideshare.net/brendangregg/usenix-atc-2017-visualizing-performance-with-flame-graphs

Application Profiling
Memory profiling

Memory profiling
▶Profiling memory usage (heap/stack) in an application is useful for optimization.
▶Allocating too much memory can lead to system memory exhaustion.
▶Allocating/freeing memory too often can lead to the kernel spending a
considerable amount of time in clear_page().
•The kernel clears pages before giving them to processes to avoid data leakage.
▶Reducing the application memory footprint can improve cache usage and
reduce page misses.

Massif usage
▶Massif is a tool provided by valgrind which allows profiling heap usage during the
program execution (user-space only).
▶Works by making snapshots of allocations.
$ valgrind --tool=massif --time-unit=B program
▶Once executed, a massif.out.<pid> file will be generated in the current directory
▶The ms_print tool can then be used to display a graph of heap allocation
$ ms_print massif.out.275099
▶#: Peak allocation
▶@: Detailed snapshot (count can be adjusted with --detailed-freq)

Massif report
KB
547.0^ # :: : :@ : :: :
| @:#:::::::@::::::::@
| ::@:#:::::::@::::::::@::
| :::::@:#:::::::@::::::::@:::::
| :::::::@:#:::::::@::::::::@:::::::
| :::::::@:#:::::::@::::::::@:::::::
| :::::::@:#:::::::@::::::::@:::::::
| :::::::@:#:::::::@::::::::@:::::::
| @@@@@@@@:::::::@:#:::::::@::::::::@:::::::
| @ :::::::@:#:::::::@::::::::@:::::::
| @ :::::::@:#:::::::@::::::::@:::::::
| :::::::@ :::::::@:#:::::::@::::::::@:::::::
| : @ :::::::@:#:::::::@::::::::@:::::::
| ::::::: @ :::::::@:#:::::::@::::::::@:::::::
| : : @ :::::::@:#:::::::@::::::::@:::::::
| :::::: : @ :::::::@:#:::::::@::::::::@:::::::
| : : : @ :::::::@:#:::::::@::::::::@:::::::
| ::::: : : @ :::::::@:#:::::::@::::::::@:::::::
| :::: : : : @ :::::::@:#:::::::@::::::::@:::::::
| :::: : : : : @ :::::::@:#:::::::@::::::::@:::::::
0 +----------------------------------------------------------------------->KB
0 830.5
Number of snapshots: 52
Detailed snapshots: [9, 19, 22 (peak), 32, 42]

massif-visualizer - Visualizing massif profiling data

heaptrack usage
▶heaptrack is a heap memory profiler for Linux.
•Works with an LD_PRELOAD library.
▶Finer tracking than with Massif, and its visualization tool is more advanced.
•Each allocation is associated with a stack trace.
•Allows finding memory leaks, allocation hotspots and temporary allocations.
▶Results can be seen using a GUI (heaptrack_gui) or a CLI tool (heaptrack_print).
▶https://github.com/KDE/heaptrack
$ heaptrack program
▶This will generate a heaptrack.<process_name>.<pid>.zst file that can be
analyzed using heaptrack_gui on another computer.

heaptrack_gui - Visualizing heaptrack profiling data

heaptrack_gui - Flamegraph view

memusage
▶memusage is a program that leverages libmemusage.so
to profile memory usage (man memusage(1))
(user-space only).
▶Can profile heap, stack and also mmap memory usage.
▶Profiling information can be shown on the console,
logged to a file for post-processing or rendered to a PNG
file.
▶Lightweight solution compared to valgrind's Massif tool
since it uses the LD_PRELOAD mechanism.

memusage usage
$ memusage convert foo.png foo.jpg
Memory usage summary: heap total: 2635857, heap peak: 2250856, stack peak: 83696
total calls total memory failed calls
malloc| 1496 2623648 0
realloc| 6 3744 0 (nomove:0, dec:0, free:0)
calloc| 16 8465 0
free| 1480 2521334
Histogram for block sizes:
0-15 329 21% ==================================================
16-31 239 15% ====================================
32-47 287 18% ===========================================
48-63 321 21% ================================================
64-79 43 2% ======
80-95 141 9% =====================
...
21424-21439 1 <1%
32768-32783 1 <1%
32816-32831 1 <1%
large 3 <1%

Application Profiling
Execution profiling

Execution profiling
▶In order to optimize a program, one may have to understand which hardware
resources are used.
▶Many hardware elements can have an impact on the program execution:
•CPU cache performance can be degraded by an application without spatial memory
locality.
•Page misses due to using too much memory without spatial locality.
•Alignment faults when doing misaligned accesses.
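Two traversals of the same matrix illustrate the effect of spatial locality. sum_rows() walks memory sequentially. sum_cols() jumps a full row of N doubles at each access. Both return the same value, so the difference should only show up in counters such as L1-dcache-load-misses (perf stat) or in a Cachegrind simulation. N = 512 is an arbitrary choice for this sketch.

```c
#include <assert.h>
#include <stddef.h>

#define N 512

static double matrix[N][N];

void fill_matrix(void)
{
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            matrix[i][j] = (double)(i + j);
}

/* Row-major traversal: consecutive accesses touch consecutive addresses,
 * so every cache line that is loaded is fully used. */
double sum_rows(void)
{
    double s = 0.0;
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            s += matrix[i][j];
    return s;
}

/* Column-major traversal of the same data: each access jumps
 * N * sizeof(double) bytes, so spatial locality is poor. */
double sum_cols(void)
{
    double s = 0.0;
    for (size_t j = 0; j < N; j++)
        for (size_t i = 0; i < N; i++)
            s += matrix[i][j];
    return s;
}
```

One way to compare them is to run each variant separately under perf stat -e L1-dcache-load-misses or valgrind --tool=cachegrind. Actual numbers depend on the CPU cache sizes.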

Using perf stat
▶perf stat allows profiling an application by gathering performance counters.
•Using performance counters might require root permissions. This can be modified
using # echo -1 > /proc/sys/kernel/perf_event_paranoid
▶The number of performance counters present on the hardware is often
limited.
▶Requesting more events than possible will result in multiplexing and perf will scale
the results.
▶Collected performance counters are then approximate.
•To acquire more precise numbers, reduce the number of events observed and run
perf multiple times, changing the event set to observe all the expected events.
•See the perf wiki for more information.
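The scaling mentioned above can be written as a small formula. For a multiplexed event, perf extrapolates the raw count using the time the event was enabled versus the time it actually ran on a hardware counter. The sketch below applies that extrapolation; the helper names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Extrapolated value of a multiplexed counter:
 *   estimate = raw_count * time_enabled / time_running */
double scale_count(uint64_t raw_count, uint64_t time_enabled, uint64_t time_running)
{
    if (time_running == 0)
        return 0.0;   /* the event never ran: nothing can be estimated */
    return (double)raw_count * (double)time_enabled / (double)time_running;
}

/* Share of the time the event was really counted (the percentage
 * perf stat prints at the end of each line). */
double running_percentage(uint64_t time_enabled, uint64_t time_running)
{
    return time_enabled ? 100.0 * (double)time_running / (double)time_enabled : 0.0;
}
```

An event counted during only 50% of the run has its raw value doubled. This is why reducing the number of events gives more trustworthy numbers.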

perf stat example (1/2)
$ perf stat convert foo.png foo.jpg
Performance counter stats for 'convert foo.png foo.jpg':
45,52 msec task-clock # 1,333 CPUs utilized
4 context-switches # 87,874 /sec
0 cpu-migrations # 0,000 /sec
1 672 page-faults # 36,731 K/sec
146 154 800 cycles # 3,211 GHz (81,16%)
6 984 741 stalled-cycles-frontend # 4,78% frontend cycles idle (91,21%)
81 002 469 stalled-cycles-backend # 55,42% backend cycles idle (91,36%)
222 687 505 instructions # 1,52 insn per cycle
# 0,36 stalled cycles per insn (41,21%)
37 776 174 branches # 829,884 M/sec (74,51%)
567 408 branch-misses # 1,50% of all branches (70,62%)
0,034156819 seconds time elapsed
0,041509000 seconds user
0,004612000 seconds sys
▶NOTE: the percentage displayed at the end denotes the time during which the
kernel measured the event due to multiplexing

perf stat example (2/2)
▶List all events:
$ perf list
List of pre-defined events (to be used in -e):
branch-instructions OR branches [Hardware event]
branch-misses [Hardware event]
cache-misses [Hardware event]
cache-references [Hardware event]
...
▶Count L1-dcache-load-misses and branch-load-misses events for a specific
command
$ perf stat -e L1-dcache-load-misses,branch-load-misses cat /etc/fstab
...
Performance counter stats for 'cat /etc/fstab':
23 418 L1-dcache-load-misses
7 192 branch-load-misses
...

Cachegrind
▶Cachegrind is a tool provided by valgrind for profiling program interactions with
the instruction and data cache hierarchy.
•Cachegrind also profiles branch prediction success.
▶Simulates a machine with independent I$ and D$ backed by a unified L2 cache.
▶Really helpful to detect cache usage problems (too many misses, etc).
$ valgrind --tool=cachegrind ./my_program
▶It generates a cachegrind.out.<pid> file containing the measurements
▶cg_annotate is a CLI tool used to visualize cachegrind simulation results.
▶It also has a --diff option to compare two measurement files

Kcachegrind - Visualizing Cachegrind profiling data

Callgrind
▶Provided by valgrind, it allows profiling an application call graph (user-space
only).
▶Collects the number of instructions executed during your program execution and
associates this data with the source lines
▶Records the call relationship between functions and their call count.
$ valgrind --tool=callgrind ./my_program
▶callgrind_annotate is a CLI tool used to visualize callgrind simulation results.
▶Kcachegrind can visualize callgrind results too.
▶The cache simulation (done using cachegrind) has some accuracy shortcomings
(see Cachegrind accuracy)

Kcachegrind - Visualizing Callgrind profiling data

Practical lab - Profiling applications
Profiling an application using various tools
▶Profiling application heap using Massif.
▶Profiling an application with Cachegrind,
Callgrind and KCachegrind.
▶Analyzing application performance with perf.

System-wide Profiling & Tracing

System-wide Profiling & Tracing
▶Sometimes, the problems are not tied to an application but rather due to the
usage of multiple layers (drivers, application, kernel).
▶In that case, it might be useful to analyze the whole stack.
▶The kernel already includes a large number of tracepoints that can be recorded
using specific tools.
▶New tracepoints can also be created statically or dynamically using various
mechanisms (kprobes for instance).

System-wide Profiling & Tracing
kprobes

Kprobes
▶Kprobes allows dynamically inserting breakpoints at almost any kernel address to extract debugging and performance information
▶Uses code patching to modify the kernel text and insert calls to specific handlers
•kprobes allow executing specific handlers when the hooked instruction is executed
•kretprobes trigger when returning from a function, allowing to extract the return value of functions but also to display the parameters that were used for the function call
▶Support should be enabled using CONFIG_KPROBES=y
▶Moreover, since probes are inserted using modules, CONFIG_MODULES=y and CONFIG_MODULE_UNLOAD=y must be set to be able to register probes.
▶Also requires CONFIG_KALLSYMS_ALL=y when hooking probes using the symbol_name field
▶See trace/kprobes for more information

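Registering a Kprobe
▶kprobes can be registered dynamically by loading a module that registers a struct kprobe with register_kprobe()
▶Probes should be unregistered at module exit using unregister_kprobe()
#include <linux/kprobes.h>

/* probe_pre() and probe_post() are handlers defined elsewhere in the module */
static struct kprobe probe = {
	.symbol_name = "do_exit",
	.pre_handler = probe_pre,
	.post_handler = probe_post,
};

/* In the module init function: returns 0 on success, a negative error code otherwise */
ret = register_kprobe(&probe);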

Registering a kretprobe
▶kretprobes can be registered the same way as regular probes but using a struct kretprobe with register_kretprobe()
•Provided handlers will be called on function entry and exit
•Probe should be unregistered at module exit using unregister_kretprobe()
typedef int (*kretprobe_handler_t) (struct kretprobe_instance *, struct pt_regs *);

static struct kretprobe probe = {
	.kp.symbol_name = "do_fork",
	.entry_handler = probe_entry,
	.handler = probe_exit,
};
ret = register_kretprobe(&probe);
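A module is not the only way to place such probes. When the kernel is built with CONFIG_KPROBE_EVENTS, kprobes and kretprobes can also be defined by writing to the kprobe_events file in tracefs (see trace/kprobetrace in the kernel documentation). The following is a minimal Python sketch. It assumes tracefs is mounted at /sys/kernel/tracing and that it runs with root privileges. The helper names (kprobe_definition, add_probe, enable_probe) and the event names are hypothetical.

```python
import os


def kprobe_definition(event, symbol, fetchargs=(), retprobe=False):
    """Build one kprobe_events line: 'p:EVENT SYMBOL [ARGS...]' or 'r:EVENT SYMBOL [ARGS...]'."""
    kind = "r" if retprobe else "p"
    return " ".join([f"{kind}:{event}", symbol, *fetchargs])


def add_probe(tracefs, definition):
    """Append one probe definition to kprobe_events."""
    # Append mode matters: opening kprobe_events with truncation
    # (like "echo ... > kprobe_events") deletes every existing probe.
    with open(os.path.join(tracefs, "kprobe_events"), "a") as f:
        f.write(definition + "\n")


def enable_probe(tracefs, event, enable=True):
    """Toggle a probe; events defined without a group name land in the 'kprobes' group."""
    path = os.path.join(tracefs, "events", "kprobes", event, "enable")
    with open(path, "w") as f:
        f.write("1" if enable else "0")
```

On a real system, `add_probe("/sys/kernel/tracing", kprobe_definition("myretprobe", "ksys_read", ["$retval"], retprobe=True))` followed by `enable_probe("/sys/kernel/tracing", "myretprobe")` should make the hits appear in the trace file. `$retval` is the fetch argument for a function's return value in return probes.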

System-wide Profiling & Tracing
perf

perf
▶perf can perform a wide range of tracing and recording operations.
▶The kernel already contains events and tracepoints that can be used. The list is given by perf list.
▶Syscall tracepoints should be enabled in the kernel configuration using CONFIG_FTRACE_SYSCALLS.
▶New tracepoints can be created dynamically on all symbols and registers when debug info is not present.
▶Tracing functions and recording variables and parameter contents using their names requires a kernel compiled with CONFIG_DEBUG_INFO.
▶If perf does not find vmlinux, you have to provide it using -k <vmlinux>.

perf example
▶List all events that match syscalls:*
$ perf list 'syscalls:*'
List of pre-defined events (to be used in -e):
syscalls:sys_enter_accept [Tracepoint event]
syscalls:sys_enter_accept4 [Tracepoint event]
syscalls:sys_enter_access [Tracepoint event]
syscalls:sys_enter_adjtimex_time32 [Tracepoint event]
syscalls:sys_enter_bind [Tracepoint event]
...
▶Record all syscalls:sys_enter_read events for the sha256sum command into the perf.data file.
$ perf record -e syscalls:sys_enter_read sha256sum /bin/busybox
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.018 MB perf.data (215 samples) ]

perf report example
▶Display the collected samples ordered by time spent.
$ perf report
Samples: 591 of event 'cycles', Event count (approx.): 393877062
Overhead  Command      Shared Object            Symbol
  22,88%  firefox-esr  [nvidia]                 [k] _nv031568rm
   3,21%  firefox-esr  ld-linux-x86-64.so.2     [.] __minimal_realloc
   2,00%  firefox-esr  libc.so.6                [.] __stpcpy_ssse3
   1,86%  firefox-esr  libglib-2.0.so.0.7400.0  [.] g_hash_table_lookup
   1,62%  firefox-esr  ld-linux-x86-64.so.2     [.] _dl_strtoul
   1,56%  firefox-esr  [kernel.kallsyms]        [k] clear_page_rep
   1,52%  firefox-esr  libc.so.6                [.] __strncpy_sse2_unaligned
   1,37%  firefox-esr  ld-linux-x86-64.so.2     [.] strncmp
   1,30%  firefox-esr  firefox-esr              [.] malloc
   1,27%  firefox-esr  libc.so.6                [.] __GI___strcasecmp_l_ssse3
   1,23%  firefox-esr  [nvidia]                 [k] _nv013165rm
   1,09%  firefox-esr  [nvidia]                 [k] _nv007298rm
   1,03%  firefox-esr  [kernel.kallsyms]        [k] unmap_page_range
   0,91%  firefox-esr  ld-linux-x86-64.so.2     [.] __minimal_free
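perf report can group samples itself, for instance with --sort dso. When the numbers need further processing, the text output can also be parsed. The following is a minimal sketch that sums the overhead per shared object. It assumes the column layout shown above, as printed by perf report --stdio, with a comma decimal separator and command names without spaces. parse_report_line and overhead_per_object are hypothetical names.

```python
from collections import defaultdict


def parse_report_line(line):
    """Split one perf report row into (overhead %, command, shared object, symbol)."""
    fields = line.split()
    overhead = float(fields[0].rstrip("%").replace(",", "."))
    command, shared_object = fields[1], fields[2]
    # fields[3] is the [k] (kernel) or [.] (user space) marker
    symbol = " ".join(fields[4:])
    return overhead, command, shared_object, symbol


def overhead_per_object(lines):
    """Sum the overhead percentages per shared object, skipping headers."""
    totals = defaultdict(float)
    for line in lines:
        line = line.strip()
        if not line or not line[0].isdigit():
            continue  # header, comment or blank line
        overhead, _, shared_object, _ = parse_report_line(line)
        totals[shared_object] += overhead
    return dict(totals)
```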

perf probe
▶perf allows creating dynamic tracepoints on both kernel functions and user-space functions.
▶In order to be able to insert probes, CONFIG_KPROBES must be enabled in the kernel.
•Note: libelf is required to compile perf with probe command support.
▶New dynamic probes can be created and then used with perf record.
▶Often on embedded platforms, vmlinux is not present on the target and thus only symbols and registers can be used.

perf probe examples (1/3)
▶List all the kernel symbols that can be probed (no debug info needed):
$ perf probe --funcs
▶Create a new probe on do_sys_openat2 with the filename named parameter (debug info required).
$ perf probe --vmlinux=vmlinux_file do_sys_openat2 filename:string
Added new event:
probe:do_sys_openat2 (on do_sys_openat2 with filename:string)
▶Execute tail and capture the previously created probe event:
$ perf record -e probe:do_sys_openat2 tail /var/log/messages
...
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.003 MB perf.data (19 samples) ]

perf probe examples (2/3)
▶Display the recorded tracepoints with perf script:
$ perf script
tail 164 [000] 3552.956573: probe:do_sys_openat2: (c02c3750) filename_string="/etc/ld.so.cache"
tail 164 [000] 3552.956642: probe:do_sys_openat2: (c02c3750) filename_string="/lib/tls/v7l/neon/vfp/libresolv.so.2"
...
▶Create a new probe on the ksys_read return value, using the register r0 (ARM) alias with the "ret" name:
$ perf probe ksys_read%return ret=%r0
▶Execute sha256sum and capture the previously created probe events:
$ perf record -e probe:ksys_read__return sha256sum /etc/fstab
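The perf script lines printed for a probe event follow a regular layout: command, PID, CPU, timestamp, event, probe address, then name=value arguments. This makes them easy to post-process, for instance to list every file opened by the traced command. The following is a minimal Python sketch that assumes exactly that layout and command names without spaces. parse_probe_line is a hypothetical helper.

```python
import re
import shlex

PROBE_LINE = re.compile(
    r"^(?P<comm>\S+)\s+(?P<pid>\d+)\s+\[(?P<cpu>\d+)\]\s+(?P<time>[\d.]+):\s+"
    r"(?P<event>\S+):\s+\((?P<addr>[^)]*)\)\s*(?P<args>.*)$"
)


def parse_probe_line(line):
    """Return a dict describing one probe hit, or None if the line does not match."""
    m = PROBE_LINE.match(line.strip())
    if m is None:
        return None
    # shlex handles the quoted string arguments, e.g. filename_string="/etc/ld.so.cache"
    args = dict(item.split("=", 1) for item in shlex.split(m["args"]) if "=" in item)
    return {
        "comm": m["comm"],
        "pid": int(m["pid"]),
        "cpu": int(m["cpu"]),
        "time": float(m["time"]),
        "event": m["event"],
        "args": args,
    }
```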

perf probe examples (3/3)
▶List all probes that have been created:
$ perf probe -l
probe:ksys_read__return (on ksys_read%return with ret)
▶Remove an existing tracepoint:
$ perf probe -d probe:ksys_read__return

perf record example
▶Record all events for all CPUs (system-wide mode):
$ perf record -a
^C
▶Display recorded events from perf.data using perf script
$ perf script
...
klogd 85 [000] 208.609712: 116584 cycles: b6dd551c memset+0x2c (/lib/libc.so.6)
klogd 85 [000] 208.609898: 121267 cycles: c0a44c84 _raw_spin_unlock_irq+0x34 (vmlinux)
klogd 85 [000] 208.610094: 127434 cycles: c02f32f4 kmem_cache_alloc+0xd0 (vmlinux)
perf 130 [000] 208.610311: 132915 cycles: c0a44c84 _raw_spin_unlock_irq+0x34 (vmlinux)
perf 130 [000] 208.619831: 143834 cycles: c0a44cf4 _raw_spin_unlock_irqrestore+0x3c (vmlinux)
klogd 85 [000] 208.620048: 143834 cycles: c01a0ff8 syslog_print+0x1f0 (vmlinux)
klogd 85 [000] 208.620241: 126328 cycles: c0100184 vector_swi+0x44 (vmlinux)
klogd 85 [000] 208.620434: 128451 cycles: c096f228 unix_dgram_sendmsg+0x46c (vmlinux)
kworker/0:2-mm_ 44 [000] 208.620653: 133104 cycles: c0a44c84 _raw_spin_unlock_irq+0x34 (vmlinux)
perf 130 [000] 208.620859: 138065 cycles: c0198460 lock_acquire+0x184 (vmlinux)
...
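For a quick per-function summary of such system-wide perf script output, the cycles column can be summed per symbol. This is only a sketch. It assumes the default output layout shown above with the cycles event. perf report already performs this kind of aggregation, so a script like this is mostly useful for custom filtering. cycles_per_symbol is a hypothetical name.

```python
import re
from collections import Counter

SAMPLE = re.compile(
    r"^\s*(?P<comm>.+?)\s+(?P<pid>\d+)\s+\[\d+\]\s+[\d.]+:\s+"
    r"(?P<period>\d+)\s+cycles:\s+[0-9a-f]+\s+"
    r"(?P<sym>\S+?)(?:\+0x[0-9a-f]+)?\s+\((?P<dso>[^)]*)\)"
)


def cycles_per_symbol(lines):
    """Sum the sampled cycle counts per (symbol, object) pair, ignoring other lines."""
    totals = Counter()
    for line in lines:
        m = SAMPLE.match(line)
        if m:
            # The +0x... offset is stripped so all samples of a function add up
            totals[(m["sym"], m["dso"])] += int(m["period"])
    return totals
```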

Using perf trace
▶perf trace captures and displays all tracepoints/events that have been triggered when executing a command
$ perf trace -e "net:*" ping -c 1 192.168.1.1
PING 192.168.1.1 (192.168.1.1) 56(84) bytes of data.
0.000 ping/37820 net:net_dev_queue(skbaddr: 0xffff97bbc6a17900, len: 98, name: "enp34s0")
0.005 ping/37820 net:net_dev_start_xmit(name: "enp34s0", skbaddr: 0xffff97bbc6a17900, protocol: 2048, len: 98, network_offset: 14, transport_offset_valid: 1, transport_offset: 34)
0.009 ping/37820 net:net_dev_xmit(skbaddr: 0xffff97bbc6a17900, len: 98, name: "enp34s0")
64 bytes from 192.168.1.1: icmp_seq=1 ttl=64 time=0.867 ms

Using perf top
▶perf top allows a live analysis of the running kernel
▶It periodically samples the running code and displays functions ordered from the most to the least time consuming.
▶This allows profiling the whole system usage
$ perf top
Samples: 19K of event 'cycles', 4000 Hz, Event count (approx.): 4571734204 lost: 0/0 drop: 0/0
Overhead  Shared Object  Symbol
   2,01%  [nvidia]       [k] _nv023368rm
   0,94%  [kernel]       [k] __static_call_text_end
   0,89%  [vdso]         [.] 0x0000000000000655
   0,81%  [nvidia]       [k] _nv027733rm
   0,79%  [kernel]       [k] clear_page_rep
   0,76%  [kernel]       [k] psi_group_change
   0,70%  [kernel]       [k] check_preemption_disabled
   0,69%  code           [.] 0x000000000623108f
   0,60%  code           [.] 0x0000000006231083
   0,59%  [kernel]       [k] preempt_count_add
   0,54%  [kernel]       [k] module_get_kallsym
   0,53%  [kernel]       [k] copy_user_generic_string

System-wide Profiling & Tracing
ftrace and trace-cmd

ftrace
▶ftrace is a tracing framework within the kernel which stands for "Function Tracer".
▶It offers a wide range of tracing capabilities allowing to observe the system behavior.
•Trace static tracepoints already inserted at various locations in the kernel (scheduler, interrupts, etc).
•Relies on GCC mcount() capability and kernel code patching mechanism to call ftrace tracing handlers.
▶All traces are recorded in a ring buffer that is optimized for tracing.
▶Uses the tracefs filesystem to control and display tracing events.
•# mount -t tracefs nodev /sys/kernel/tracing
▶ftrace support must be enabled in the kernel using CONFIG_FTRACE=y.
•CONFIG_DYNAMIC_FTRACE allows function tracing to have almost zero overhead when it is not in use.
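Scripts that drive ftrace need to know where tracefs is mounted. /sys/kernel/tracing is the usual location, while older setups expose it under /sys/kernel/debug/tracing. The following is a minimal sketch that looks it up in /proc/mounts, whose third column is the filesystem type. The function names are hypothetical, and escaped characters in mount paths are not handled.

```python
def tracefs_mountpoint(mounts_text):
    """Return the first tracefs mount point found in /proc/mounts-style text, or None."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # Columns: device, mount point, filesystem type, options, dump, pass
        if len(fields) >= 3 and fields[2] == "tracefs":
            return fields[1]
    return None


def read_mounts(path="/proc/mounts"):
    """Read the current mount table."""
    with open(path) as f:
        return f.read()
```

Typical use: `tracefs_mountpoint(read_mounts())`. A `None` result means the mount command above still has to be run.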

ftrace files
▶ftrace controls are exposed through some specific files located under /sys/kernel/tracing.
•current_tracer: Current tracer that is used.
•available_tracers: List of available tracers that are compiled in the kernel.
•tracing_on: Enable/disable tracing.
•trace: Acquired trace in human readable format. Format will differ depending on the tracer used.
•trace_pipe: Same as trace, but each read consumes the trace as it is read.
•trace_marker{_raw}: Emit comments from userspace in the trace buffer.
•set_ftrace_filter: Filter some specific functions.
•set_graph_function: Graph only the specified functions and their children.
▶Many other files are exposed, see trace/ftrace.
▶trace-cmd CLI and Kernelshark GUI tools allow recording and visualizing tracing data more easily.
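Since these controls are plain files, any language can drive ftrace. The following is a minimal Python sketch of a wrapper around tracing_on, set_ftrace_filter and trace. The Ftrace class is hypothetical. Using it on a real system requires root privileges and a mounted tracefs.

```python
import os


class Ftrace:
    """Tiny wrapper around a few tracefs control files."""

    def __init__(self, root="/sys/kernel/tracing"):
        self.root = root

    def _write(self, name, value, append=False):
        # A plain write replaces the file content (e.g. resets the filter),
        # append mode adds to it, like "echo ... >> file" in a shell.
        with open(os.path.join(self.root, name), "a" if append else "w") as f:
            f.write(value + "\n")

    def tracing_on(self, on=True):
        self._write("tracing_on", "1" if on else "0")

    def set_filter(self, *functions, append=False):
        # set_ftrace_filter accepts space-separated names and globs such as "spi_*"
        self._write("set_ftrace_filter", " ".join(functions), append)

    def read_trace(self):
        with open(os.path.join(self.root, "trace")) as f:
            return f.read()
```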

ftrace tracers
▶ftrace provides several "tracers" which allow tracing different things.
▶The tracer to be used should be written to the current_tracer file
•nop: Trace nothing, used to disable all tracing.
•function: Trace all kernel functions that are called.
•function_graph: Similar to function but traces both entry and exit.
•hwlat: Trace hardware latency.
•irqsoff: Trace sections where interrupts are disabled.
•branch: Trace likely()/unlikely() prediction errors.
•mmiotrace: Trace all accesses to the hardware (read[bwlq]/write[bwlq]).
▶Warning: Some tracers can be expensive!
# echo "function" > /sys/kernel/tracing/current_tracer
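Writing a tracer that is not compiled into the kernel to current_tracer fails with an error, so a script may prefer to check available_tracers first. The following is a minimal sketch of that check. set_tracer is a hypothetical helper, and the root argument is the tracefs mount point.

```python
import os


def set_tracer(root, tracer):
    """Select an ftrace tracer after checking that the kernel provides it."""
    with open(os.path.join(root, "available_tracers")) as f:
        available = f.read().split()
    if tracer not in available:
        raise ValueError(f"tracer {tracer!r} not available (have: {', '.join(available)})")
    with open(os.path.join(root, "current_tracer"), "w") as f:
        f.write(tracer)
    return tracer
```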

function_graph tracer report example
▶The function_graph tracer traces all the functions that executed and their associated call graphs
▶Will display the process, CPU, timestamp and function graph:
$ trace-cmd report
...
dd-113   [000]   304.526590: funcgraph_entry:                  |  sys_write() {
dd-113   [000]   304.526597: funcgraph_entry:                  |    ksys_write() {
dd-113   [000]   304.526603: funcgraph_entry:                  |      __fdget_pos() {
dd-113   [000]   304.526609: funcgraph_entry:        6.541 us  |        __fget_light();
dd-113   [000]   304.526621: funcgraph_exit:       + 18.500 us |      }
dd-113   [000]   304.526627: funcgraph_entry:                  |      vfs_write() {
dd-113   [000]   304.526634: funcgraph_entry:        6.334 us  |        rw_verify_area();
dd-113   [000]   304.526646: funcgraph_entry:        6.208 us  |        write_null();
dd-113   [000]   304.526658: funcgraph_entry:        6.292 us  |        __fsnotify_parent();
dd-113   [000]   304.526669: funcgraph_exit:       + 43.042 us |      }
dd-113   [000]   304.526675: funcgraph_exit:       + 78.833 us |    }
dd-113   [000]   304.526680: funcgraph_exit:       + 91.291 us |  }
dd-113   [000]   304.526689: funcgraph_entry:                  |  sys_read() {
dd-113   [000]   304.526695: funcgraph_entry:                  |    ksys_read() {
dd-113   [000]   304.526702: funcgraph_entry:                  |      __fdget_pos() {
dd-113   [000]   304.526708: funcgraph_entry:        6.167 us  |        __fget_light();
dd-113   [000]   304.526719: funcgraph_exit:       + 18.083 us |      }
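The durations printed by function_graph can be extracted for further analysis, for example to find the slowest leaf functions. The following is a sketch that assumes the trace-cmd report layout above. Leaf calls are reported on a single funcgraph_entry line ending with "();". Non-leaf functions report their duration on their closing funcgraph_exit line, optionally prefixed by a marker such as "+" when the duration exceeds a threshold. The helper names are hypothetical.

```python
import re

LEAF = re.compile(r"funcgraph_entry:\s+([\d.]+) us\s+\|\s+(\w+)\(\);")
EXIT = re.compile(r"funcgraph_exit:\s+[+!#*@$]?\s*([\d.]+) us\s+\|\s+\}")


def leaf_calls(lines):
    """Return (function, duration in us) for every leaf call."""
    calls = []
    for line in lines:
        m = LEAF.search(line)
        if m:
            calls.append((m.group(2), float(m.group(1))))
    return calls


def exit_durations(lines):
    """Return the durations (us) read from the closing '}' lines of non-leaf functions."""
    return [float(m.group(1)) for m in map(EXIT.search, lines) if m]
```

For instance, `sorted(leaf_calls(lines), key=lambda c: c[1], reverse=True)[:10]` would list the ten longest leaf calls.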

irqsoff tracer
▶The ftrace irqsoff tracer allows tracing the IRQ latency caused by interrupts being disabled for too long.
▶Helpful to find why interrupts have high latencies on a system.
▶This tracer records the longest trace with interrupts disabled.
▶This tracer needs to be enabled with CONFIG_IRQSOFF_TRACER=y.
•preemptoff and preemptirqsoff tracers also exist to trace sections of code where preemption is disabled.

irqsoff tracer report example
# latency: 276 us, #104/104, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
#    -----------------
#    | task: stress-ng-114 (uid:0 nice:0 policy:0 rt_prio:0)
#    -----------------
#  => started at: __irq_usr
#  => ended at:   irq_exit
#
#
#                  _------=> CPU#
#                 / _-----=> irqs-off
#                | / _----=> need-resched
#                || / _---=> hardirq/softirq
#                ||| / _--=> preempt-depth
#                |||| /     delay
#  cmd     pid   ||||| time  |   caller
#     \   /      |||||  \    |   /
stress-n-114     0d...    2us : __irq_usr
stress-n-114     0d...    7us : gic_handle_irq <-__irq_usr
stress-n-114     0d...   10us : __handle_domain_irq <-gic_handle_irq
...
stress-n-114     0d...  270us : __local_bh_disable_ip <-__do_softirq
stress-n-114     0d.s.  275us : __do_softirq <-irq_exit
stress-n-114     0d.s.  279us+: tracer_hardirqs_on <-irq_exit
stress-n-114     0d.s.  290us : <stack trace>
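The header of an irqsoff report holds the essentials: the worst latency, the CPU and the task. The following is a sketch that extracts them from the text of the trace file, assuming the header layout shown above. irqsoff_summary is a hypothetical helper. The maximum latency alone is also available in the tracing_max_latency file.

```python
import re

HEADER = re.compile(r"#\s*latency:\s*(\d+)\s*us,\s*#(\d+)/(\d+),\s*CPU#(\d+)")
TASK = re.compile(r"#\s*\|\s*task:\s*(\S+)-(\d+)\s")


def irqsoff_summary(report):
    """Return latency, CPU, task name and PID from an irqsoff trace, or None."""
    lat = HEADER.search(report)
    if lat is None:
        return None
    task = TASK.search(report)
    return {
        "latency_us": int(lat.group(1)),
        "cpu": int(lat.group(4)),
        "task": task.group(1) if task else None,
        "pid": int(task.group(2)) if task else None,
    }
```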

Hardware latency detector
▶The ftrace hwlat tracer helps finding out whether the hardware generates latency.
•System Management Interrupts, for instance, are non-maskable and directly trigger some firmware support feature, suspending CPU execution.
•Interrupts handled by a secure monitor can also cause this kind of latency.
▶If some latency is found with this tracer, the system is probably not suitable for real-time usage.
▶Uses a single core looping while interrupts are disabled and measuring the time elapsed between two consecutive time reads.
▶Needs to be built into the kernel with CONFIG_HWLAT_TRACER=y.
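The measuring principle can be illustrated in user space: read the clock back to back in a tight loop and keep the largest gap between two consecutive reads. This Python sketch only illustrates the idea and does not replace the tracer. User space cannot disable interrupts, so the gaps it reports also include interrupts and scheduler preemption. max_gap_ns is a hypothetical name.

```python
import time


def max_gap_ns(duration_s=0.05):
    """Spin for duration_s seconds and return the largest gap (ns) between two clock reads."""
    end = time.perf_counter_ns() + int(duration_s * 1e9)
    prev = time.perf_counter_ns()
    worst = 0
    while prev < end:
        now = time.perf_counter_ns()
        worst = max(worst, now - prev)
        prev = now
    return worst
```

With the real tracer, the equivalent parameters are set through tracefs (tracing_thresh, and the width and window files under hwlat_detector) before writing hwlat to current_tracer.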

trace_printk()
▶trace_printk() allows emitting strings in the trace buffer
▶Useful to trace some specific conditions in your code and display them in the trace buffer
#include <linux/ftrace.h>

void read_hw(void)
{
	if (condition)
		trace_printk("Condition is true!\n");
}
▶Will display the following in the trace buffer for the function_graph tracer
1)               |  read_hw() {
1)               |    /* Condition is true! */
1)   2.657 us    |  }
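trace_printk() is only available to kernel code. From user space, a similar effect can be obtained by writing to the trace_marker file listed with the other ftrace files. Application events then appear interleaved with kernel events in the same trace. The following is a minimal sketch, assuming tracefs at /sys/kernel/tracing and write access to it. trace_marker() is a hypothetical helper name.

```python
import os


def trace_marker(message, root="/sys/kernel/tracing"):
    """Write one message into the ftrace ring buffer through trace_marker."""
    # One write() call per message, so the kernel records it as a single entry
    fd = os.open(os.path.join(root, "trace_marker"), os.O_WRONLY)
    try:
        os.write(fd, message.encode() + b"\n")
    finally:
        os.close(fd)
```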

trace-cmd
▶trace-cmd is a tool written by Steven Rostedt which allows interacting with ftrace (man trace-cmd(1)).
▶The tracers supported by trace-cmd are those exposed by ftrace.
▶trace-cmd offers multiple commands:
•list: List available plugins/events that can be recorded.
•record: Record a trace into the file trace.dat.
•report: Display trace.dat acquisition results.
▶At the end of recording, a trace.dat file will be generated.

Remote tracing with trace-cmd
▶trace-cmd output can be quite big and thus difficult to store on an embedded platform with limited storage.
▶For that purpose, a listen command is available, which allows sending the acquisitions over the network:
•Run trace-cmd listen -p 6573 on the remote system that will be collecting the traces
•On the target system, use trace-cmd record -N <remote_ip>:6573 to specify the remote system that will collect the traces

trace-cmd examples (1/3)
▶List available tracers
$ trace-cmd list -t
blk mmiotrace function_graph function nop
▶List available events
$ trace-cmd list -e
...
migrate:mm_migrate_pages_start
migrate:mm_migrate_pages
tlb:tlb_flush
syscalls:sys_exit_process_vm_writev
...
▶List available functions for filtering with the function and function_graph tracers
$ trace-cmd list -f
...
wait_for_initramfs
__ftrace_invalid_address___64
calibration_delay_done
calibrate_delay
...

trace-cmd examples (2/3)
▶Start the function tracer and record data globally on the system
$ trace-cmd record -p function
▶Use the function tracer but filter only spi_* functions
$ trace-cmd record -l 'spi_*' -p function
▶Trace the dd command using the function graph tracer:
$ trace-cmd record -p function_graph dd if=/dev/mmcblk0 of=out bs=512 count=10
▶Visualize the data that has been acquired in trace.dat:
$ trace-cmd report

trace-cmd examples (3/3)
▶Reset all the ftrace buffers and remove tracers
$ trace-cmd reset
▶Run the irqsoff tracer on the system:
$ trace-cmd record -p irqsoff
▶Record only irq_handler_exit/irq_handler_entry events on the system:
$ trace-cmd record -e irq:irq_handler_exit -e irq:irq_handler_entry

Adding ftrace tracepoints (1/2)
▶For some custom needs, it might be necessary to add custom tracepoints
▶First, one needs to declare the tracepoint definition in a .h file
#undef TRACE_SYSTEM
#define TRACE_SYSTEM subsys

#if !defined(_TRACE_SUBSYS_H) || defined(TRACE_HEADER_MULTI_READ)
#define _TRACE_SUBSYS_H

#include <linux/tracepoint.h>

DECLARE_TRACE(subsys_eventname,
	TP_PROTO(int firstarg, struct task_struct *p),
	TP_ARGS(firstarg, p));

#endif /* _TRACE_SUBSYS_H */

/* This part must be outside protection */
#include <trace/define_trace.h>

Adding ftrace tracepoints (2/2)
▶Then, emit the tracepoint in a .c file using that header file
#include <trace/events/subsys.h>

#define CREATE_TRACE_POINTS
DEFINE_TRACE(subsys_eventname);

void any_func(void)
{
	...
	trace_subsys_eventname(arg, task);
	...
}
▶See trace/tracepoints for more information

Kernelshark
▶Kernelshark is a Qt-based graphical interface for processing trace-cmd trace.dat reports.
▶Can also set up and acquire data using trace-cmd.
▶Displays CPUs and tasks in different colors along with the recorded events.
▶Useful when a deep analysis is required for a specific bug.

kernelshark

System-wide Profiling & Tracing
eBPF

eBPF (1/2)
▶BPF stands for Berkeley Packet Filter and was initially used for network packet filtering
▶The eBPF framework in the kernel allows running user-written BPF programs within the kernel in a safe and efficient way (added in kernel 3.15)
▶Execution is event-driven and can be hooked using kprobes, tracepoints and other tracing mechanisms
▶Executes complex actions and reports data to userspace for events that took place in the kernel.
▶Used to hook into various places of the kernel: VFS, network stack, syscalls, load balancing, security, etc
Image credits: https://ebpf.io/

eBPF (2/2)
▶Programs are loaded using the bpf() system call (man bpf(2)) and then verified by the kernel BPF verifier before being executed.
•Checks the privileges required to run the BPF program
•Verifies that the BPF program always runs to completion and does not loop forever
▶Almost all architectures have BPF JIT support, which translates BPF bytecode into native CPU instructions, making programs (almost) as fast as natively compiled code
▶BPF programs can store data in maps of various types (hash tables, arrays, etc), which allow sharing data between user space, eBPF programs and kernel space.
▶Only some functions (called helpers) can be called in eBPF programs.
▶eBPF programs are attached to events (invoked on trigger).

Writing eBPF programs
▶eBPF programs can be written in (restricted) C and compiled using the clang compiler
▶BCC (BPF Compiler Collection) provides a toolkit to write BPF programs more easily using C language (also provides Lua and Python front-ends)
•Allows writing tracing and profiling programs easily
▶bpftrace is a high level language allowing to easily write tracing functions

BCC
▶BPF Compiler Collection (BCC) is (as its name suggests) a collection of BPF based tools.
▶BCC provides a large number of ready-to-use tools written in BPF.
▶Also provides an interface to write, load and hook BPF programs more easily than using "raw" BPF language.
▶Available on a large number of architectures (unfortunately, not ARM32).
•On Debian, when installed, all tools are named <tool>-bpfcc.
▶BCC requires a kernel version >= 4.1.
▶BCC evolves quickly, many distributions have old versions: you may need to compile from the latest sources
Image credits:
https://github.com/iovisor/bcc

BCC tools
Image credits: https://www.brendangregg.com/ebpf.html

BCC Tools example
▶profile.py is a CPU profiler capturing stack traces of the current execution. Its output can be used for flame graph generation:
$ git clone https://github.com/brendangregg/FlameGraph.git
$ profile.py -df -F 99 10 | ./FlameGraph/flamegraph.pl > flamegraph.svg
▶The tcpconnect.py script displays all new TCP connections live
$ tcpconnect
PID     COMM          IP SADDR                                    DADDR                DPORT
220321  ssh           6  ::1                                      ::1                  22
220321  ssh           4  127.0.0.1                                127.0.0.1            22
17676   Chrome_Child  6  2a01:cb15:8124:8100:37cf:d45b:d87d:d97d  2606:50c0:8003::154  443
[...]
▶And much more to discover at https://github.com/iovisor/bcc

Using BCC with python
▶BCC Python support allows easily writing and hooking C programs for BPF tracing.
▶Hook with a kprobe on the clone() system call and display "Hello, World!" each time it is called
from bcc import BPF

# define BPF program
prog = """
int hello(void *ctx) {
    bpf_trace_printk("Hello, World!\\n");
    return 0;
}
"""

# load BPF program
b = BPF(text=prog)
b.attach_kprobe(event=b.get_syscall_fnname("clone"), fn_name="hello")
# print the bpf_trace_printk() messages until interrupted (Ctrl-C)
b.trace_print()

bpftrace
▶bpftrace is a high level tracing language allowing to write tracing expressions easily (https://bpftrace.org/)
▶Also provides tools to trace various parts of the kernel
•Internally uses LLVM to compile scripts and BCC to interact with the BPF programs
▶bpftrace is inspired by awk and C, and by predecessor tracers such as DTrace and SystemTap
▶Rich syntax documented at https://github.com/iovisor/bpftrace/blob/master/docs/reference_guide.md
Image credits:
https://bpftrace.org/

bpftrace tools
Image credits: https://www.brendangregg.com/ebpf.html

Using bpftrace
▶Counting all syscalls per process:
$ sudo bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }'
Attaching 1 probe...
^C
@[packagekitd]: 1
@[GUsbEventThread]: 1
@[gvfs-afc-volume]: 1
@[ibus-extension-]: 4
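bpftrace prints its maps on exit as @[key]: value lines. The following is a minimal sketch that turns such text output into a dictionary sorted by decreasing count, assuming single-key maps printed as above. parse_count_map is a hypothetical helper.

```python
import re

MAP_LINE = re.compile(r"^@\[(?P<key>.*)\]:\s*(?P<value>-?\d+)$")


def parse_count_map(output):
    """Parse '@[key]: N' lines into a dict sorted by decreasing value."""
    counts = {}
    for line in output.splitlines():
        m = MAP_LINE.match(line.strip())
        if m:
            counts[m["key"]] = int(m["value"])
    return dict(sorted(counts.items(), key=lambda kv: kv[1], reverse=True))
```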

eBPF: resources
▶A Beginner’s Guide to eBPF Programming - Liz Rice, 2020
•Slides: https://speakerdeck.com/lizrice/beginners-guide-to-ebpf
•Video: https://www.youtube.com/watch?v=lrSExTfS-iQ
•Resources: https://github.com/lizrice/ebpf-beginners

System-wide Profiling & Tracing
LTTng

LTTng (1/2)
▶LTTng is an open source tracing framework for Linux maintained by the EfficiOS company.
▶LTTng allows understanding the interactions between the kernel and applications (C, C++, Java, Python).
•Also exposes a /dev/lttng-logger that can be used from any application.
▶Tracepoints are associated with a payload (data).
▶LTTng is focused on low-overhead tracing.
▶LTTng provides a unified logging of all events (kernel/user).

LTTng (2/2)
▶Uses the CTF trace format (Common Trace Format).
▶LTTng is made of multiple components:
•LTTng-tools: Libraries and command-line interface to control tracing.
•LTTng-modules: Linux kernel modules to instrument and trace the kernel.
•LTTng-UST: Libraries and Java/Python packages to instrument and trace user applications.
▶Already packaged by various distributions (Debian, Fedora, etc) and present in Buildroot and openembedded-core.
▶Uses a single tool, lttng, to control tracing.
▶No need to recompile the kernel, but a few options are needed:
•CONFIG_MODULES, CONFIG_KALLSYMS, CONFIG_HIGH_RES_TIMERS, CONFIG_TRACEPOINTS, CONFIG_KPROBES

LTTng architecture
Image credits: https://lttng.org/

Tracepoints with LTTng
▶LTTng can use and trace the following instrumentation points:
•LTTng kernel tracepoints
•kprobes and kretprobes
•Linux kernel system calls
•Linux user space probes
•User space LTTng tracepoints
▶LTTng works with a session daemon that receives all events from kernel and userspace LTTng tracing components.
▶The session daemon should be started as a daemon and the user should be in the tracing group.

Creating userspace tracepoints with LTTng
▶New userspace tracepoints can be defined using LTTng.
▶Tracepoints have multiple characteristics:
•A provider namespace
•A name identifying the tracepoint
•Parameters of various types (int, char *, etc)
•Fields describing how to display the tracepoint parameters (decimal, hexadecimal, etc)
▶Tracepoints are defined using a tracepoint provider header file template and a tracepoint provider package file.
•The tracepoint provider header file template contains the definition of the tracepoints.
•The tracepoint provider package is the instantiation of the tracepoints.
▶See the lttng-ust man page for types

Defining an LTTng tracepoint (1/2)
▶Tracepoint provider header file (hello-tp.h):
#undef LTTNG_UST_TRACEPOINT_PROVIDER
#define LTTNG_UST_TRACEPOINT_PROVIDER hello_world

#undef LTTNG_UST_TRACEPOINT_INCLUDE
#define LTTNG_UST_TRACEPOINT_INCLUDE "./hello-tp.h"

#if !defined(_HELLO_TP_H) || defined(LTTNG_UST_TRACEPOINT_HEADER_MULTI_READ)
#define _HELLO_TP_H

#include <lttng/tracepoint.h>

LTTNG_UST_TRACEPOINT_EVENT(
	hello_world,
	my_first_tracepoint,
	LTTNG_UST_TP_ARGS(
		int, my_integer_arg,
		char *, my_string_arg
	),
	LTTNG_UST_TP_FIELDS(
		lttng_ust_field_integer(int, my_integer_field, my_integer_arg)
		lttng_ust_field_string(my_string_field, my_string_arg)
	)
)

#endif /* _HELLO_TP_H */

#include <lttng/tracepoint-event.h>

Defining an LTTng tracepoint (2/2)
▶Tracepoint provider package (hello_world-tp.c):
#define LTTNG_UST_TRACEPOINT_CREATE_PROBES
#define LTTNG_UST_TRACEPOINT_DEFINE
#include "hello-tp.h"
▶Tracepoint usage (hello_world.c):
#include <stdio.h>
#include "hello-tp.h"

int main(int argc, char *argv[])
{
	lttng_ust_tracepoint(hello_world, my_first_tracepoint, 23, "hi there!");
	return 0;
}
▶Compilation:
$ gcc -I. hello_world.c hello_world-tp.c -llttng-ust -ldl -o hello_world

Generating tracepoints using lttng-gen-tp
▶Writing both the .h and .c boilerplate can be avoided using lttng-gen-tp.
▶lttng-gen-tp takes a template file (.tp) as input and generates both the provider header and package files (.h, .c and .o files):
LTTNG_UST_TRACEPOINT_EVENT(
	// Tracepoint provider name
	hello_world,
	// Tracepoint/event name
	first_tp,
	// Tracepoint arguments (input)
	LTTNG_UST_TP_ARGS(
		char *, text
	),
	// Tracepoint/event fields (output)
	LTTNG_UST_TP_FIELDS(
		lttng_ust_field_string(message, text)
	)
)

Using LTTng
$ lttng create my-tracing-session --output=./my_traces
$ lttng list --kernel
$ lttng list --userspace
$ lttng enable-event --userspace hello_world:my_first_tracepoint
$ lttng enable-event --kernel --syscall open,close,write
$ lttng start
$ /* Run your application or do something */
$ lttng destroy
$ babeltrace2 ./my_traces

Remote tracing with LTTng
▶LTTng can record traces over the network.
▶Useful for embedded systems with limited storage capabilities.
▶On the remote computer, run the lttng-relayd command:
$ lttng-relayd --output=${PWD}/traces
▶Then on the target, at session creation, use the --set-url option:
$ lttng create my-session --set-url=net://remote-system
▶Traces will then be recorded directly on the remote computer.

System-wide Profiling & Tracing
Choosing the right tool

Choosing the right tool
▶Before starting to profile or trace, one should know which type of tool to use.
▶This choice is guided by the level of profiling.
▶Start by analyzing/optimizing the application level using application
tracing/profiling tools (valgrind, perf, etc.).
▶Then analyze user space + kernel performance.
▶Finally, trace or profile the whole system if the performance problems happen
only when running under a loaded system.
•For “constant” load problems, snapshot tools work fine.
•For sporadic problems, record traces and analyze them.

Practical lab - System wide profiling
Profiling a system from userspace to kernel space
▶Profiling with ftrace, uprobes and kernelshark
▶Profiling with LTTng and trace-compass
▶Profiling with perf

Kernel Debugging
Kernel Debugging
© Copyright 2004-2024, Bootlin.
Creative Commons BY-SA 3.0 license.
Corrections, suggestions, contributions and translations are welcome!  
embedded Linux and kernel engineering

Kernel Debugging
Preventing bugs

Static code analysis
▶Static analysis can be run with the sparse tool
▶sparse works with annotations and can detect various errors at compile time:
•Locking issues (unbalanced locking)
•Address space issues, such as accessing a user space pointer directly
▶Analysis can be run using make C=1 to run only on files that are recompiled
▶Or with make C=2 to run on all files
▶Example of an unbalanced locking scheme:
rzn1_a5psw.c:81:13: warning: context imbalance in 'a5psw_reg_rmw' - wrong count
at exit
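The warning above means a function can exit holding a different number of locks depending on the path taken. As a minimal userspace sketch of the usual fix, assuming a hypothetical counter-based toy_lock()/toy_unlock() pair standing in for a real spinlock and a hypothetical reg_rmw() function, every exit goes through a single unlock label so the lock count is balanced on all paths:

```c
#include <errno.h>

/* Hypothetical stand-in for a spinlock: only counts acquisitions so the
 * balance can be checked in userspace. */
static int lock_depth;

static void toy_lock(void)   { lock_depth++; }
static void toy_unlock(void) { lock_depth--; }

/* Read-modify-write of a fake register. All return paths go through the
 * "out" label, so the lock count is identical on every exit, which is the
 * property sparse's context tracking checks. */
static int reg_rmw(unsigned int *reg, unsigned int mask, unsigned int val)
{
	int ret = 0;

	toy_lock();
	if (!reg) {
		ret = -EINVAL;	/* a plain "return -EINVAL;" here would leak the lock */
		goto out;
	}
	*reg = (*reg & ~mask) | (val & mask);
out:
	toy_unlock();
	return ret;
}
```

The single-exit goto pattern is what most kernel code uses to keep lock/unlock pairs visibly balanced.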

Good practices in kernel development (1/2)
▶When writing driver code, never expect the user to provide correct values. Always
check these values.
▶Use the WARN_ON() macro if you want to display a stack trace when a specific
condition happens.

▶dump_stack() can also be used during debugging to show the current call stack.
static int check_flags(u32 flags)
{
    if (WARN_ON(flags & STATE_INVALID))
        return -EINVAL;
    return 0;
}
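To show how WARN_ON() is meant to be used inside a condition, here is a userspace imitation (not the kernel macro): it reports the failed expression and evaluates to it. STATE_INVALID's value and the warn_on_impl() helper are assumptions made for this sketch only.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Userspace imitation of WARN_ON(): report the condition when it is true
 * and evaluate to it, so it can be used directly in an if(). The kernel
 * version also prints a stack trace and taints the kernel. */
static int warn_count;

static bool warn_on_impl(bool cond, const char *expr, const char *file, int line)
{
	if (cond) {
		warn_count++;
		fprintf(stderr, "WARNING: %s:%d: %s\n", file, line, expr);
	}
	return cond;
}
#define WARN_ON(cond) warn_on_impl(!!(cond), #cond, __FILE__, __LINE__)

#define STATE_INVALID 0x80u	/* hypothetical flag value for this sketch */

/* Returns an errno-style code, hence int rather than bool. */
static int check_flags(uint32_t flags)
{
	if (WARN_ON(flags & STATE_INVALID))
		return -EINVAL;
	return 0;
}
```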

Good practices in kernel development (2/2)
▶If the values can be checked at compile time (configuration input, sizeof(),
structure fields), use the BUILD_BUG_ON() macro, which breaks the build if the condition is true.
BUILD_BUG_ON(sizeof(ctx->__reserved) != sizeof(reserved));
▶If compilation produces warnings about unused variables/parameters,
they must be fixed.
▶Apply checkpatch.pl --strict when possible, as it might find potential
problems in your code.
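Outside the kernel, C11's _Static_assert gives the same compile-time guarantee, with the condition inverted. A minimal sketch, assuming a hypothetical struct hw_ctx whose size is part of some ABI contract, and a hypothetical MY_BUILD_BUG_ON wrapper:

```c
#include <stdint.h>

/* Hypothetical structure whose size is part of an ABI contract. */
struct hw_ctx {
	uint32_t regs[4];
	uint8_t reserved[16];
};

/* BUILD_BUG_ON(cond) breaks the build when cond is TRUE, whereas C11
 * _Static_assert(expr, msg) fails when expr is FALSE: hence the "!". */
#define MY_BUILD_BUG_ON(cond) _Static_assert(!(cond), "BUILD_BUG_ON: " #cond)

MY_BUILD_BUG_ON(sizeof(struct hw_ctx) != 32);
MY_BUILD_BUG_ON(sizeof(((struct hw_ctx *)0)->reserved) != 16);

/* Uncommenting the next line stops compilation with the message above:
 * MY_BUILD_BUG_ON(sizeof(struct hw_ctx) != 64);
 */
```

Because the check happens at build time, a layout change is caught before any code runs.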

Kernel Debugging
Linux Kernel Debugging

Linux Kernel Debugging
▶The Linux kernel features some very useful tools for debugging.
▶These tools are built into the kernel since enabling them often adds
instrumentation code for debugging:
•Erroneous memory access debugging tools (KASAN, Kmemleak, KFENCE)
•Undefined behavior code debugging (UBSAN)
•Locking errors analysis (lockdep)
▶All the debug features are located under the
Kernel hacking -> Kernel debugging menuconfig entry.

▶CONFIG_DEBUG_KERNEL should be set to “y” to enable other debug options.

Kernel Debugging
Debugging using messages

Debugging using messages (1/3)
Three APIs are available:
▶The old printk(), no longer recommended for new debugging messages
▶The pr_*() family of functions: pr_emerg(), pr_alert(), pr_crit(), pr_err(),
pr_warn(), pr_notice(), pr_info(), pr_cont()
and the special pr_debug() (see next pages)
•Defined in include/linux/printk.h
•They take a classic format string with arguments
•Example:
pr_info("Booting CPU %d\n", cpu);
•Here’s what you get in the kernel log:
[ 202.350064] Booting CPU 1

▶print_hex_dump_debug(): useful to dump a buffer with a hexdump-like display

Debugging using messages (2/3)
▶The dev_*() family of functions: dev_emerg(), dev_alert(), dev_crit(),
dev_err(), dev_warn(), dev_notice(), dev_info()
and the special dev_dbg() (see next page)
•They take a pointer to struct device as first argument, and then a format string
with arguments
•Defined in include/linux/dev_printk.h
•To be used in drivers integrated with the Linux device model
•Example:
dev_info(&pdev->dev, "in probe\n");
•Here’s what you get in the kernel log:
[ 25.878382] serial 48024000.serial: in probe
[ 25.884873] serial 481a8000.serial: in probe

▶*_ratelimited() versions exist, which limit the number of prints if called too
often, based on /proc/sys/kernel/printk_ratelimit{_burst} values
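The ratelimit idea (at most a "burst" of messages per time window) can be modeled in a few lines. This is a simplified sketch, not the kernel's ___ratelimit(): the struct rl_state type and rl_allow() function are hypothetical, and time is passed in explicitly instead of reading jiffies.

```c
#include <stdbool.h>

/* Simplified ratelimit model: at most `burst` messages per `interval`
 * time units; extra messages in the same window are suppressed. */
struct rl_state {
	long interval;	/* window length (e.g. seconds) */
	int burst;	/* messages allowed per window */
	long begin;	/* start of the current window */
	int printed;	/* messages emitted in the current window */
	int missed;	/* messages suppressed so far */
	bool started;
};

/* Returns true if the caller may print at time `now`. */
static bool rl_allow(struct rl_state *rs, long now)
{
	if (!rs->started || now - rs->begin >= rs->interval) {
		rs->begin = now;	/* open a new window */
		rs->printed = 0;
		rs->started = true;
	}
	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}
	rs->missed++;
	return false;
}
```

With interval 5 and burst 2, a third message at t=2 is dropped, and printing resumes once t reaches 5.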

Debugging using messages (3/3)
▶The kernel defines many more format specifiers than the standard printf()
ones.

•%p: Display the hashed value of a pointer by default.

•%px: Always display the address of a pointer (use carefully, only on non-sensitive
addresses).

•%pK: Display the hashed pointer value, zeros or the pointer address depending on the
kptr_restrict sysctl value.

•%pOF: Device-tree node format specifier.

•%pr: Resource structure format specifier.

•%pa: Physical address display (works on all 32/64-bit architectures)

•%pe: Error pointer (displays the string corresponding to the error number)

▶/proc/sys/kernel/kptr_restrict should be set to 1 in order to display pointers
which use %pK
▶See core-api/printk-formats for an exhaustive list of supported format
specifiers
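%pe works on "error pointers": kernel functions often return a negative errno encoded in a pointer. A userspace re-creation of that convention (modeled on include/linux/err.h, where the top MAX_ERRNO addresses are never valid pointers) could look like this; lookup() and table[] are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* The last MAX_ERRNO addresses are reserved, so a negative errno value can
 * travel inside a pointer return value. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical lookup returning either a valid pointer or an error. */
static int table[4];

static int *lookup(int idx)
{
	if (idx < 0 || idx >= 4)
		return ERR_PTR(-ENOENT);
	return &table[idx];
}
```

In a driver, printing such a pointer with %pe typically shows the error name (e.g. -ENOENT) instead of a raw number.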

pr_debug() and dev_dbg()
▶When the driver is compiled with DEBUG defined, all these messages are compiled
and printed at the debug level. DEBUG can be defined by #define DEBUG at the
beginning of the driver, or using ccflags-$(CONFIG_DRIVER) += -DDEBUG in the
Makefile
▶When the kernel is compiled with CONFIG_DYNAMIC_DEBUG, these messages
can be dynamically enabled on a per-file, per-module or per-message basis, by
writing commands to /proc/dynamic_debug/control. Note that messages are
not enabled by default.
•Details in admin-guide/dynamic-debug-howto
•Very powerful feature to only get the debug messages you’re interested in.
▶When neither DEBUG nor CONFIG_DYNAMIC_DEBUG is used, these messages are not
compiled in.
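When neither option is set, pr_debug() falls back to a no_printk()-style construct: the call is wrapped in if (0), so the format string is still type-checked but no code runs. A simplified userspace sketch of that pattern (my_pr_debug is a hypothetical name; ##__VA_ARGS__ is a GCC/Clang extension, as in the kernel):

```c
#include <stdio.h>

/* Simplified no_printk(): arguments are type-checked but never evaluated. */
#define no_printk(fmt, ...) \
	do { if (0) printf(fmt, ##__VA_ARGS__); } while (0)

#ifdef DEBUG
#define my_pr_debug(fmt, ...) printf(fmt, ##__VA_ARGS__)
#else
#define my_pr_debug(fmt, ...) no_printk(fmt, ##__VA_ARGS__)
#endif
```

A side effect of this design is that arguments with side effects (such as ++counter) are not evaluated when debugging is compiled out.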

pr_debug() and dev_dbg() usage
▶Debug prints can be enabled using the /proc/dynamic_debug/control file.

▶cat /proc/dynamic_debug/control will display all lines that can be enabled in the
kernel
•Example: init/main.c:1427 [main]run_init_process =p "    %s\012"
▶A dedicated syntax allows enabling individual prints by line, file or module

•echo "file drivers/pinctrl/core.c +p" > /proc/dynamic_debug/control will
enable all debug prints in drivers/pinctrl/core.c

•echo "module pciehp +p" > /proc/dynamic_debug/control will enable the
debug prints located in the pciehp module

•echo "file init/main.c line 1427 +p" > /proc/dynamic_debug/control will
enable the debug print located at line 1427 of file init/main.c
•Replace +p with -p to disable the debug print
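The same queries can be issued from a program. This sketch uses two hypothetical helpers: ddebug_query() formats a query with the syntax shown above, and ddebug_apply() writes it to the control file (root and CONFIG_DYNAMIC_DEBUG required on a real system).

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build a query such as
 * "file init/main.c line 1427 +p". A line <= 0 selects the whole file. */
static int ddebug_query(char *buf, size_t len, const char *file, int line, bool enable)
{
	const char *flags = enable ? "+p" : "-p";

	if (line > 0)
		return snprintf(buf, len, "file %s line %d %s", file, line, flags);
	return snprintf(buf, len, "file %s %s", file, flags);
}

/* Hypothetical helper: write one query to the control file, like echo. */
static int ddebug_apply(const char *control, const char *query)
{
	FILE *f = fopen(control, "w");
	size_t n = strlen(query);
	int ok;

	if (!f)
		return -1;
	ok = fwrite(query, 1, n, f) == n;
	if (fclose(f) != 0)	/* the write is flushed here, so errors surface here */
		ok = 0;
	return ok ? 0 : -1;
}
```

Checking the fclose() result matters because the stdio buffer is only flushed to the control file at close time, which is when the kernel accepts or rejects the query.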

Debug logs troubleshooting
▶When using dynamic debug, make sure that your debug call is enabled: it must be
visible in the control file in debugfs and be activated (=p)
▶Is your log output only in the kernel log buffer?
•You can see it with dmesg
•You can raise the console loglevel to output it to the console directly
•You can also set ignore_loglevel in the kernel command line to force all kernel
logs to the console
▶If you are working on an out-of-tree module, you may prefer to define DEBUG in
your module source or Makefile instead of using dynamic debug
▶If configuration is done through the kernel command line, is it properly interpreted?
•Starting from Linux 5.14, the kernel will let you know about a faulty command line:
Unknown kernel command line parameters foo, will be passed to user
space.
•You may need to take care of escaping special characters (e.g. quotes)
▶Be aware that a few subsystems bring their own logging infrastructure, with
specific configuration/controls, e.g.: drm.debug=0x1ff

Kernel early debug
▶When booting, the kernel sometimes crashes even before displaying the system
messages
▶On ARM, if your kernel doesn’t boot or hangs without any message, you can
activate early debugging options:

•CONFIG_DEBUG_LL=y to enable ARM early serial output capabilities

•CONFIG_EARLYPRINTK=y will allow printk to output the prints earlier

•The earlyprintk command line parameter should be given to enable early printk
output

Kernel Debugging
Kernel crashes and oops

Kernel crashes
▶The kernel is not immune to crashes: many kinds of errors can lead to them
•Memory access errors (NULL pointer, out of bounds access, etc.)
•Voluntarily panicking on error detection (using panic())
•Incorrect kernel execution mode (sleeping in atomic context)
•Deadlocks detected by the kernel (soft lockup/locking problem)
▶On error, the kernel will display a message on the console that is called a “Kernel
oops”
Icon by Peter van Driel, TheNounProject.com

Kernel oops (1/2)
▶The content of this message depends on the architecture that is used.
▶Almost all architectures display at least the following information:
•CPU state when the oops happened
•Registers content with potential interpretation
•Backtrace of function calls that led to the crash
•Stack content (last X bytes)
▶Depending on the architecture, the crash location can be identified using the
content of the PC register (sometimes named IP, EIP, etc.).
▶To have a meaningful backtrace with symbol names, use CONFIG_KALLSYMS=y,
which will embed the symbol names in the kernel image.

Kernel oops (2/2)
▶Symbols are displayed in the backtrace using the following format:

•<symbol_name>+<hex_offset>/<symbol_size>
▶If the oops is not critical (taken in process context), then the kernel will kill
the process and continue its execution
•The kernel stability might be compromised!
▶Tasks that are taking too much time to execute and that are hung can also
generate an oops (CONFIG_DETECT_HUNG_TASK)
▶If KGDB support is present and configured, on oops, the kernel will switch to
KGDB mode.
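When post-processing oops logs with your own tooling, each backtrace entry has to be split according to the format above. A minimal parser sketch; struct sym_ref and parse_sym_ref() are hypothetical names, and entries with a trailing " [module]" suffix are not handled here.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Parsed form of "<symbol_name>+<hex_offset>/<symbol_size>". */
struct sym_ref {
	char name[128];
	unsigned long offset;
	unsigned long size;
};

static bool parse_sym_ref(const char *s, struct sym_ref *out)
{
	const char *plus = strrchr(s, '+');
	const char *slash = plus ? strchr(plus, '/') : NULL;
	char *end;
	size_t n;

	if (!plus || !slash || plus == s)
		return false;
	n = (size_t)(plus - s);
	if (n >= sizeof(out->name))
		return false;
	memcpy(out->name, s, n);
	out->name[n] = '\0';

	out->offset = strtoul(plus + 1, &end, 16);	/* accepts the 0x prefix */
	if (end != slash)
		return false;
	out->size = strtoul(slash + 1, &end, 16);
	return end != slash + 1 && *end == '\0';
}
```

The offset is relative to the start of the symbol, which is exactly what tools such as faddr2line need as input.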

Oops example (1/2)

Oops example (2/2)

Kernel oops debugging: addr2line
▶In order to convert addresses/symbol names from this display to source code lines,
one can use addr2line:

•addr2line -e vmlinux <address>
▶GNU binutils >= 2.39 takes the symbol+offset notation too:

•addr2line -e vmlinux <symbol_name>+<off>
▶The symbol+offset notation can be used with older binutils versions via the
faddr2line script in the kernel sources:

•scripts/faddr2line vmlinux <symbol_name>+<off>
▶The kernel must have been compiled with CONFIG_DEBUG_INFO=y to embed the
debugging information into the vmlinux file.

Kernel oops debugging: decode_stacktrace.sh

▶addr2line decoding of oopses can be automated using the decode_stacktrace.sh
script, which is provided in the kernel sources.
▶This script will translate all symbol names/addresses to the matching file/lines
and will display the assembly code where the crash was triggered.

▶./scripts/decode_stacktrace.sh vmlinux linux_source_path/ < oops_report.txt >
decoded_oops.txt
▶NOTE: the CROSS_COMPILE and ARCH environment variables should be set to obtain
the correct disassembly dump.

Oops behavior configuration
▶Sometimes, a crash might be so bad that the kernel will panic and halt its execution
entirely, stopping application scheduling and staying in a busy loop.
▶Automatic reboot on panic can be enabled via CONFIG_PANIC_TIMEOUT:
•0: never reboots
•Negative value: reboot immediately
•Positive value: seconds to wait before rebooting
▶An oops can be configured to always panic:
•at boot time, adding oops=panic to the command line
•at build time, setting CONFIG_PANIC_ON_OOPS=y
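The three cases of the panic timeout value can be summarized as a small function. This is only a model of the documented semantics (the same value is also adjustable at runtime through the kernel.panic sysctl); panic_reboot_delay() is a hypothetical name.

```c
/* Model of the panic timeout semantics listed above. Returns the delay in
 * seconds before rebooting, or -1 when the system stays halted. */
static int panic_reboot_delay(int timeout)
{
	if (timeout == 0)
		return -1;	/* never reboot: busy loop forever */
	if (timeout < 0)
		return 0;	/* reboot immediately */
	return timeout;		/* wait this many seconds, then reboot */
}
```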

Kernel Debugging
The Magic SysRq

The Magic SysRq
Functionality provided by serial drivers
▶Allows running multiple debug/rescue commands even when the kernel seems to be
in deep trouble
•On embedded: in the console, send a break character
(Picocom: press [Ctrl] + a followed by [Ctrl] + \), then press <character>
•By echoing <character> in /proc/sysrq-trigger
▶Example commands:

•h: show available commands

•s: sync all mounted filesystems

•b: reboot the system

•w: show the kernel stack of blocked (uninterruptible) processes

•t: show the list of current tasks and their kernel stacks

•g: enter kgdb mode

•z: dump the ftrace buffer

•c: trigger a crash (kernel panic)
•You can even register your own!
▶Detailed in admin-guide/sysrq
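Writing to /proc/sysrq-trigger can also be done from a program, for instance from a watchdog or test harness. A sketch with a hypothetical sysrq_send() helper that only accepts the commands listed above; the path is a parameter so the helper can be tried on an ordinary file (on a real system, root is required and some commands such as c or b take effect immediately).

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: send one Magic SysRq command, like
 * "echo h > /proc/sysrq-trigger". */
static int sysrq_send(const char *trigger_path, char cmd)
{
	static const char allowed[] = "hsbwtgzc";	/* commands listed above */
	FILE *f;

	if (cmd == '\0' || !strchr(allowed, cmd))
		return -1;
	f = fopen(trigger_path, "w");
	if (!f)
		return -1;
	if (fputc(cmd, f) == EOF) {
		fclose(f);
		return -1;
	}
	return fclose(f) == 0 ? 0 : -1;
}
```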

Kernel Debugging
Built-in kernel self tests

Kernel memory issue debugging
▶The same kind of memory issues that can happen in user space can be triggered
while writing kernel code:
•Out of bounds accesses
•Use-after-free errors (dereferencing a pointer after kfree())
•Out of memory due to missing kfree()
▶Various tools are present in the kernel to catch these issues
•KASAN to find use-after-free and out-of-bounds memory accesses
•KFENCE to find use-after-free and out-of-bounds accesses in production systems
•Kmemleak to find memory leaks due to missing frees

KASAN
▶Kernel Address Sanitizer
▶Allows finding use-after-free and out-of-bounds memory accesses
▶Uses GCC to instrument the kernel at compile-time
▶Supported by almost all architectures (ARM, ARM64, PowerPC, RISC-V, S390,
Xtensa and X86)
▶Needs to be enabled at kernel configuration with CONFIG_KASAN
▶All files are then instrumented by default; instrumentation can be disabled for
specific files by modifying the Makefile:

•KASAN_SANITIZE_file.o := n for a specific file

•KASAN_SANITIZE := n for all files in the Makefile folder
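To give an idea of what the compiler-inserted checks test, here is a toy model of generic KASAN shadow memory: one shadow byte describes an 8-byte granule (0 means fully accessible, 1 to 7 means only the first N bytes are accessible, negative means poisoned). The real kernel locates the shadow byte at (addr >> 3) + KASAN_SHADOW_OFFSET; the toy heap, the -1 redzone marker and the function names below are assumptions of this sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define GRANULE 8
#define HEAP_SIZE 64

static int8_t shadow[HEAP_SIZE / GRANULE];

/* Mark [0, len) of the toy heap as accessible and the rest as poisoned. */
static void toy_kasan_unpoison(size_t len)
{
	for (size_t g = 0; g < HEAP_SIZE / GRANULE; g++) {
		size_t start = g * GRANULE;

		if (start + GRANULE <= len)
			shadow[g] = 0;				/* fully accessible */
		else if (start < len)
			shadow[g] = (int8_t)(len - start);	/* partially accessible */
		else
			shadow[g] = -1;				/* redzone */
	}
}

/* Check a 1-byte access at byte offset `off`, like an instrumented load. */
static bool toy_kasan_access_ok(size_t off)
{
	int8_t s;

	if (off >= HEAP_SIZE)
		return false;
	s = shadow[off / GRANULE];
	return s == 0 || (s > 0 && (int)(off % GRANULE) < s);
}
```

For a 13-byte allocation, byte 12 is valid while byte 13 falls in the partially accessible granule's unused tail and is reported.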

Kmemleak
▶Kmemleak allows finding memory leaks for dynamically allocated objects with
kmalloc()
•Works by scanning the memory to detect if allocated addresses are no longer
referenced anywhere (large overhead).
▶Once enabled with CONFIG_DEBUG_KMEMLEAK, kmemleak control files will be visible
in debugfs
▶Memory is scanned for leaks every 10 minutes
•can be disabled by unsetting CONFIG_DEBUG_KMEMLEAK_AUTO_SCAN
▶An immediate scan can be triggered using:

•# echo scan > /sys/kernel/debug/kmemleak
▶Results are displayed in debugfs:

•# cat /sys/kernel/debug/kmemleak
▶See dev-tools/kmemleak for more information
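The scanning principle can be illustrated with a greatly simplified toy: an object is considered referenced if a word in the root area, or in an already referenced object, points into it; anything never reached is reported. Real kmemleak scans data/bss sections, stacks and more, and tracks reference counts; struct toy_obj and the functions below are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct toy_obj {
	uintptr_t start;
	size_t size;
	bool referenced;
};

static bool points_into(uintptr_t v, const struct toy_obj *o)
{
	return v >= o->start && v < o->start + o->size;
}

/* Mark every object that one of the scanned words points into. */
static void scan_words(const uintptr_t *words, size_t n,
		       struct toy_obj *objs, size_t nobj, bool *changed)
{
	for (size_t i = 0; i < n; i++)
		for (size_t j = 0; j < nobj; j++)
			if (!objs[j].referenced && points_into(words[i], &objs[j])) {
				objs[j].referenced = true;
				*changed = true;
			}
}

/* Returns the number of unreferenced (leaked) objects. */
static size_t toy_kmemleak_scan(const uintptr_t *roots, size_t nroots,
				struct toy_obj *objs, size_t nobj)
{
	bool changed = false;
	size_t leaks = 0;

	for (size_t j = 0; j < nobj; j++)
		objs[j].referenced = false;
	scan_words(roots, nroots, objs, nobj, &changed);
	while (changed) {	/* follow pointers stored inside reached objects */
		changed = false;
		for (size_t j = 0; j < nobj; j++)
			if (objs[j].referenced)
				scan_words((const uintptr_t *)objs[j].start,
					   objs[j].size / sizeof(uintptr_t),
					   objs, nobj, &changed);
	}
	for (size_t j = 0; j < nobj; j++)
		leaks += !objs[j].referenced;
	return leaks;
}
```

Because any word that happens to look like a pointer counts as a reference, this approach can miss leaks (false negatives) but rarely reports reachable memory.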

Kmemleak report
# cat /sys/kernel/debug/kmemleak
unreferenced object 0x82d43100 (size 64):
comm "insmod", pid 140, jiffies 4294943424 (age 270.420s)
hex dump (first 32 bytes):
b4 bb e1 8f c8 a4 e1 8f 8c ce e1 8f 88 c6 e1 8f ................
10 a5 e1 8f 18 ee e1 8f ac c6 e1 8f 0c c1 e1 8f ................
backtrace:
[<c31f5b59>] slab_post_alloc_hook+0xa8/0x1b8
[<c8e00adb>] kmem_cache_alloc_trace+0xb8/0x104
[<1836406b>] 0xff005038
[<89fff56d>] do_one_initcall+0x80/0x1a8
[<31d908e3>] do_init_module+0x50/0x210
[<e658dd55>] load_module+0x208c/0x211c
[<e1d48f15>] sys_finit_module+0xe4/0xf4
[<1de1e5e9>] ret_fast_syscall+0x0/0x54
[<fee81f34>] 0xfeca8c80

UBSAN
▶UBSAN is a runtime checker for code with undefined behavior, such as:
•Shifting by a value larger than or equal to the type width
•Overflow of integers (signed and unsigned)
•Misaligned pointer access
•Out of bound access to static arrays
•https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html
▶It uses compile-time instrumentation to insert checks that will be executed at
runtime
▶Must be enabled using CONFIG_UBSAN=y
▶Then, it can be enabled for specific files by modifying the Makefile:

•UBSAN_SANITIZE_file.o := y for a specific file

•UBSAN_SANITIZE := y for all files in the Makefile folder
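The shift case is the easiest to reproduce: in C, shifting a value by a count greater than or equal to its width is undefined, which is what UBSAN's shift-out-of-bounds check reports. A sketch of a checked helper (checked_shl() is a hypothetical name) that avoids the undefined operation by validating the count first:

```c
#include <limits.h>
#include <stdbool.h>

/* Refuse shift counts that would be undefined behavior for unsigned int. */
static bool checked_shl(unsigned int value, unsigned int shift, unsigned int *out)
{
	if (shift >= sizeof(value) * CHAR_BIT)
		return false;	/* would be UB: report instead of shifting */
	*out = value << shift;
	return true;
}
```

With a 32-bit unsigned int, a count of 51 (as in the report below) is rejected instead of producing an architecture-dependent result.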

UBSAN: example of UBSAN report
▶Report for undefined behavior due to a shift by a value > 32.
UBSAN: Undefined behaviour in mm/page_alloc.c:3117:19
shift exponent 51 is too large for 32-bit type 'int'
CPU: 0 PID: 6520 Comm: syz-executor1 Not tainted 4.19.0-rc2 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0xd2/0x148 lib/dump_stack.c:113
ubsan_epilogue+0x12/0x94 lib/ubsan.c:159
__ubsan_handle_shift_out_of_bounds+0x2b6/0x30b lib/ubsan.c:425
...
RIP: 0033:0x449fb9
Code: e8 8c 9f 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48
89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d
01 f0 ff ff 0f 83 9b 6b fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007fb52f022c68 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007fb52f0236cc RCX: 0000000000449fb9
RDX: 0000000020000040 RSI: 0000000000000258 RDI: 0000000000000014
RBP: 000000000071bea0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
R13: 0000000000005490 R14: 00000000006ed530 R15: 00007fb52f023700

Debugging locking
▶Lock debugging: prove locking correctness

▶CONFIG_PROVE_LOCKING
•Adds instrumentation to kernel locking code
•Detects violations of locking rules during system life, such as:
Locks acquired in different order (keeps track of locking sequences and compares
them).
Spinlocks acquired in interrupt handlers and also in process context when interrupts
are enabled.
•Not suitable for production systems but acceptable overhead in development.
•See locking/lockdep-design for details

▶CONFIG_DEBUG_ATOMIC_SLEEP allows detecting code that incorrectly sleeps in
an atomic section (typically while holding a lock).
•Warning displayed in dmesg in case of such a violation.
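The "locks acquired in different order" rule can be illustrated with a toy checker that records each "A held while taking B" dependency and flags the reverse order. Unlike real lockdep, this sketch only detects direct AB-BA inversions (no longer chains, lock classes or IRQ states); toy_lockdep_acquire() and MAX_LOCKS are hypothetical.

```c
#include <stdbool.h>

#define MAX_LOCKS 8

/* dep[a][b] is true once lock b has been taken while lock a was held. */
static bool dep[MAX_LOCKS][MAX_LOCKS];

/* Record that `next` is acquired while `held` is held.
 * Returns false if this inverts an order observed earlier. */
static bool toy_lockdep_acquire(int held, int next)
{
	if (dep[next][held])
		return false;	/* next -> held seen before: possible deadlock */
	dep[held][next] = true;
	return true;
}
```

The key point, shared with lockdep, is that the inversion is detected even if the two orders never actually run concurrently, so the deadlock does not need to happen to be reported.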

Concurrency issues
▶Kernel Concurrency SANitizer framework

▶CONFIG_KCSAN, introduced in Linux 5.8.
▶Dynamic race detector relying on compile time instrumentation.
▶Can find concurrency issues (mainly data races) in your system.
▶See dev-tools/kcsan and https://lwn.net/Articles/816850/ for details.

Kernel Debugging
KGDB

kgdb - A kernel debugger

▶CONFIG_KGDB in Kernel hacking.
▶The execution of the kernel is fully controlled by gdb from another machine,
connected through a serial line.
▶Can do almost everything, including inserting breakpoints in interrupt handlers.
▶Feature supported for the most popular CPU architectures

▶CONFIG_GDB_SCRIPTS allows building GDB Python scripts that are provided by the
kernel.
•See dev-tools/gdb-kernel-debugging for more information

kgdb kernel config

▶CONFIG_DEBUG_KERNEL=y to make KGDB support visible

▶CONFIG_KGDB=y to enable KGDB support

▶CONFIG_DEBUG_INFO=y to compile the kernel with debug info (-g)

▶CONFIG_FRAME_POINTER=y to have more reliable stack traces

▶CONFIG_KGDB_SERIAL_CONSOLE=y to enable KGDB support over serial

▶CONFIG_GDB_SCRIPTS=y to enable kernel GDB Python scripts

▶CONFIG_RANDOMIZE_BASE=n to disable KASLR

▶CONFIG_WATCHDOG=n to disable the watchdog

▶CONFIG_MAGIC_SYSRQ=y to enable Magic SysRq support

▶CONFIG_STRICT_KERNEL_RWX=n to disable memory protection on code sections,
thus allowing breakpoints to be set

kgdb pitfalls
▶KASLR should be disabled to avoid confusing gdb with randomized kernel
addresses
•Disable KASLR using the nokaslr command line parameter if it is enabled in your kernel.
▶Disable the platform watchdog to avoid rebooting while debugging.
•When interrupted by KGDB, all interrupts are disabled, thus the watchdog is not
serviced.
•Sometimes, the watchdog is enabled by earlier boot stages. Make sure to disable the
watchdog there too.
▶Cannot interrupt kernel execution from gdb using the interrupt command or
Ctrl + C.
▶Not possible to break everywhere (see CONFIG_KGDB_HONOUR_BLOCKLIST).
▶Need a console driver with polling support.
▶Some architectures lack functionality (no watchpoints on arm32, for instance)
and some instabilities might happen!

Using kgdb (1/2)
▶Details available in the kernel documentation: dev-tools/kgdb
▶You must include a kgdb I/O driver. One of them is kgdb over serial console
(kgdboc: kgdb over console, enabled by CONFIG_KGDB_SERIAL_CONSOLE)
▶Configure kgdboc at boot time by passing to the kernel:

•kgdboc=<tty-device>,<bauds>
•For example: kgdboc=ttyS0,115200
▶Or at runtime using sysfs:

•echo ttyS0 > /sys/module/kgdboc/parameters/kgdboc
•If the console does not have polling support, this command will yield an error.
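The kgdboc value therefore has the form "<tty-device>[,<bauds>]", the baud rate being optional in the sysfs example. A sketch of a hypothetical parse_kgdboc() helper splitting such a value, as a launcher script or test tool might do:

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: split "ttyS0,115200" into "ttyS0" and 115200.
 * A missing baud rate yields *baud == 0. */
static bool parse_kgdboc(const char *val, char *tty, size_t tty_len,
			 unsigned long *baud)
{
	const char *comma = strchr(val, ',');
	size_t n = comma ? (size_t)(comma - val) : strlen(val);
	char *end;

	if (n == 0 || n >= tty_len)
		return false;
	memcpy(tty, val, n);
	tty[n] = '\0';
	*baud = 0;
	if (!comma)
		return true;
	*baud = strtoul(comma + 1, &end, 10);
	return end != comma + 1 && *end == '\0';
}
```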

Using kgdb (2/2)
▶Then also pass kgdbwait to the kernel: it makes kgdb wait for a debugger
connection.
▶Boot your kernel, and when the console is initialized, interrupt the kernel with a
break character and then g in the serial console (see our Magic SysRq
explanations).
▶On your workstation, start gdb as follows:

•arm-linux-gdb ./vmlinux

•(gdb) set serial baud 115200

•(gdb) target remote /dev/ttyS0
▶Once connected, you can debug a kernel the way you would debug an application
program.
▶On the GDB side, the first threads represent the CPU contexts (ShadowCPU<x>),
then each of the other threads represents a task.

Kernel GDB scripts

▶CONFIG_GDB_SCRIPTS allows building a set of Python scripts which ease kernel
debugging by adding new commands and functions.
▶When using gdb vmlinux, the scripts present in the vmlinux-gdb.py file at the root of
the build directory will be loaded automatically.

•lx-symbols: (Re)load symbols for vmlinux and modules

•lx-dmesg: display kernel dmesg

•lx-lsmod: display loaded modules

•lx-device-list-{bus|class|tree}: display device buses, classes and tree

•lx-ps: ps-like view of tasks

•$lx_current() contains the current task_struct

•$lx_per_cpu(var, cpu) returns a per-cpu variable

•apropos lx to display all available functions.

▶dev-tools/gdb-kernel-debugging

KDB

▶CONFIG_KGDB_KDB includes a kgdb frontend named “KDB”
▶This frontend exposes a debug prompt on the serial console which allows
debugging the kernel without the need for an external gdb.
▶KDB can be entered using the same mechanism used for entering kgdb mode.
▶KDB and KGDB can coexist and be used at the same time.
•Use the kgdb command in KDB to enter kgdb mode.
•Send a maintenance packet from gdb using maintenance packet 3 to switch from
kgdb to KDB mode.

kdmx
▶When the system has only a single serial port, it is not possible to use both KGDB
and the serial line as an output terminal since only one program can access that
port.
▶Fortunately, the kdmx tool allows using both KGDB and serial output by splitting
GDB messages and standard console from a single port to 2 slave ptys
(/dev/pts/x)
▶https://git.kernel.org/pub/scm/utils/kernel/kgdb/agent-proxy.git
•Located in the kdmx subdirectory
$ kdmx -n -d -p/dev/ttyACM0 -b115200
serial port: /dev/ttyACM0
Initializing the serial port to 115200 8n1
/dev/pts/6 is slave pty for terminal emulator
/dev/pts/7 is slave pty for gdb
Use <ctrl>C to terminate program

Going further with KGDB
▶Good presentation from Doug Anderson with a lot of demos and explanations
•Video: https://www.youtube.com/watch?v=HBOwoSyRmys
•Slides: https://elinux.org/images/1/1b/ELC19_Serial_kdb_kgdb.pdf

Kernel Debugging
crash

crash
▶crash is a CLI tool for investigating a kernel (dead or alive!)
•Uses /dev/mem or /proc/kcore on live systems
•Requires CONFIG_STRICT_DEVMEM=n
▶Can use a coredump generated using kdump, kvmdump, etc.
▶Based on gdb and provides many specific commands to inspect the kernel state.
•Stack traces, dmesg (log), memory maps of the processes, irqs, virtual memory
areas, etc.
▶Allows examining all the tasks that are running on the system.
▶Hosted at https://github.com/crash-utility/crash

crash example
$ crash vmlinux vmcore
[...]
TASKS: 75
NODENAME: buildroot
RELEASE: 5.13.0
VERSION: #1 SMP PREEMPT Tue Nov 15 14:42:25 CET 2022
MACHINE: armv7l (unknown Mhz)
MEMORY: 512 MB
PANIC: "Unable to handle kernel NULL pointer dereference at virtual address 00000070"
PID: 127
COMMAND: "watchdog"
TASK: c3f163c0 [THREAD_INFO: c3f00000]
CPU: 1
STATE: TASK_RUNNING (PANIC)
crash> mach
MACHINE TYPE: armv7l
MEMORY SIZE: 512 MB
CPUS: 1
PROCESSOR SPEED: (unknown)
HZ: 100
PAGE SIZE: 4096
KERNEL VIRTUAL BASE: c0000000
KERNEL MODULES BASE: bf000000
KERNEL VMALLOC BASE: e0000000
KERNEL STACK SIZE: 8192

Kernel Debugging
Post-mortem analysis

Kernel crash post-mortem analysis
▶Sometimes, accessing the crashed system is not possible or the system can’t stay
offline while waiting to be debugged
▶The kernel can generate crash dumps (a vmcore file) to a remote location, allowing
the system to be restarted quickly while still being able to perform post-mortem
analysis with GDB.
▶This feature relies on kexec and kdump: as soon as the crash occurs, another kernel
is booted, which then dumps the vmcore file.
•The vmcore file can be saved on local storage, via SSH, FTP, etc.

kexec & kdump (1/2)
▶On panic, the kernel kexec support will execute a “dump-capture kernel” directly
from the kernel that crashed
•Most of the time, a specific dump-capture kernel is compiled for that task (minimal
config with specific initramfs/initrd)
▶The kexec system works by reserving some RAM for the kdump kernel execution at
startup

▶The crashkernel parameter should be set to specify the physical memory region
dedicated to the crash kernel
▶kexec-tools are then used to load the dump-capture kernel into this memory zone
using the kexec command
•Internally uses the kexec_load system call (man kexec_load(2))

kexec & kdump (2/2)
▶Finally, on panic, the kernel will reboot into the “dump-capture” kernel, allowing
the user to dump the kernel coredump (/proc/vmcore) onto any media
▶Additional command line options depend on the architecture
▶See admin-guide/kdump/kdump for more comprehensive explanations on how to
set up the kdump kernel with kexec.
▶Additional user-space services and tools allow automatically collecting and dumping
the vmcore file to a remote location.
•See the kdump systemd service and the makedumpfile tool, which can also compress the
vmcore file into a smaller file (only for x86, PPC, IA64, S390).

•https://github.com/makedumpfile/makedumpfile

kdump
Image credits: Wikipedia

kexec config and setup
▶On the standard kernel:
•CONFIG_KEXEC=y to enable kexec support
•kexec-tools to provide the kexec command
•A kernel and a DTB accessible by kexec
▶On the dump-capture kernel:
•CONFIG_CRASH_DUMP=y to enable dumping a crashed kernel
•CONFIG_PROC_VMCORE=y to enable /proc/vmcore support
•CONFIG_AUTO_ZRELADDR=y on ARM32 platforms
▶Set the correct crashkernel command line option:
•crashkernel=size[KMG][@offset[KMG]]
▶Load a dump-capture kernel on the first kernel with kexec:
•kexec --type zImage -p my_zImage --dtb=my_dtb.dtb --initrd=my_initrd --append="command line option"
▶Then simply wait for a crash to happen!
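The crashkernel= option above follows the size[KMG][@offset[KMG]] syntax, where the K, M and G suffixes are binary multiples (KiB, MiB, GiB) and a missing @offset lets the kernel choose the location. To make the syntax concrete, here is a minimal C parser sketch for that single form; parse_crashkernel_arg() and parse_mem_size() are hypothetical names, not the kernel's own code, and the kernel accepts further crashkernel= forms that this sketch rejects.

```c
#include <assert.h>
#include <stdlib.h>

/* Parses a number (decimal, or hex with 0x) with an optional K/M/G suffix. */
static int parse_mem_size(const char *s, unsigned long long *val,
                          const char **end)
{
    char *p;
    unsigned long long v = strtoull(s, &p, 0);

    if (p == s)
        return -1;                     /* no digits at all */
    switch (*p) {
    case 'G': case 'g': v <<= 10;      /* fall through */
    case 'M': case 'm': v <<= 10;      /* fall through */
    case 'K': case 'k': v <<= 10; p++; break;
    default: break;                    /* no suffix: plain bytes */
    }
    *val = v;
    *end = p;
    return 0;
}

/*
 * Parses the value of "crashkernel=size[KMG][@offset[KMG]]".
 * Returns 0 on success, -1 on malformed input.
 * *offset is set to 0 when no "@offset" part is given.
 */
int parse_crashkernel_arg(const char *arg, unsigned long long *size,
                          unsigned long long *offset)
{
    const char *p;

    assert(arg != NULL && size != NULL && offset != NULL);
    if (parse_mem_size(arg, size, &p) != 0)
        return -1;
    *offset = 0;
    if (*p == '@' && parse_mem_size(p + 1, offset, &p) != 0)
        return -1;                     /* "@" not followed by a valid size */
    return *p == '\0' ? 0 : -1;        /* reject trailing garbage */
}
```

For instance, parse_crashkernel_arg("64M@0x20000000", &size, &offset) yields a 64 MiB reservation at physical offset 0x20000000.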

Going further with kexec & kdump
▶Presentation from Steven Rostedt about using kexec, kdump and ftrace, with lots of tips and tricks about using kexec/kdump
•Video:TxxvsGddwwwDyVuxuseDcVudwSxcThy=S(LNjMOv((g
•Slides: https://static.sched.com/hosted_files/ossna2022/c0/Postmortem_%20Kexec%2C%20Kdump%20and%20Ftrace.pdf

Practical lab - Kernel debugging
Debugging kernel crashes and driver problems
▶Debug locking issues using lockdep
▶Use kmemleak to detect memory leaks on the
system
▶Analyze an OOPS message
▶Debug a crash with KGDB
▶Set up kexec and kdump, and extract a kernel
coredump

Going further
© Copyright 2004-2024, Bootlin.
Creative Commons BY-SA 3.0 license.
Corrections, suggestions, contributions and translations are welcome!  
embedded Linux and kernel engineering

Debugging resources
▶Brendan Gregg's Systems Performance book
▶Brendan Gregg's Linux Performance page
▶Tools and Techniques to Debug an Embedded Linux System, talk from Sergio Prado, video, slides
▶Tracing with Ftrace: Critical Tooling for Linux Development, talk from Steven Rostedt, video
▶Tutorial: Debugging Embedded Devices using GDB, tutorial from Chris Simmonds, video

Going further (Tracing & Profiling)
▶Great book from Brendan Gregg, an expert in
tracing and profiling
▶https://www.brendangregg.com/blog/2020-07-15/systems-performance-2nd-edition.html
▶Covers concepts, strategy, tools, and tuning for the Linux kernel and applications.

Going further (BPF)
▶Still from Brendan Gregg!
▶Covers more than 150 tools that use BPF.
▶Explains how to analyze the results from these tools to optimize your system.
▶https://www.brendangregg.com/bpf-performance-tools-book.html

Last slides

Last slide
Thank you!
And may the Source be with you

Rights to copy
© Copyright 2004-2024, Bootlin
License: Creative Commons Attribution - Share Alike 3.0
https://creativecommons.org/licenses/by-sa/3.0/legalcode
You are free:
▶to copy, distribute, display, and perform the work
▶to make derivative works
▶to make commercial use of the work
Under the following conditions:
▶Attribution. You must give the original author credit.
▶Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only
under a license identical to this one.
▶For any reuse or distribution, you must make clear to others the license terms of this work.
▶Any of these conditions can be waived if you get permission from the copyright holder.
Your fair use and other rights are in no way affected by the above.
Document sources: https://github.com/bootlin/training-materials/