Using Libtracecmd to Analyze Your Latency and Performance Troubles

ScyllaDB 322 views 81 slides Jul 01, 2024

About This Presentation

Trying to figure out why your application is responding late can be difficult, especially if it is because of interference from the operating system. This talk will briefly go over how to write a C program that can analyze what in the Linux system is interfering with your application. It will use tr...


Slide Content

Using Libtracecmd to Analyze Latency and Performance Troubles
Steven Rostedt, Software Engineer

Steven Rostedt, Software Engineer
- One of the original authors of the PREEMPT_RT patch set
- Creator and maintainer of ftrace
- Creator of "make localmodconfig"
- Creator and maintainer of the "ktest.pl" Linux testing framework

Tracing kernel latency on your application
- The kernel is a black box
- Your application is at the whims of the kernel scheduler
- Interrupts can cause delays in your application
- Kernel lock contention could add to the latency
- Tracing can give you insight into the happenings of the kernel:
  - Monitor the scheduling decisions the kernel is making
  - Record when interrupts are happening and for how long
  - See how long kernel locks are held

Using cyclictest
- cyclictest tests the latency of the system
  - Calls nanosleep() and compares when it woke up to when it expected to wake up
- The Real Time Linux developers use this to test the jitter of the system
  - Loads are run while cyclictest is measuring the latency
  - Requires the latency to stay below a threshold
  - May run for weeks or months
- I will use cyclictest as an example for this talk, but this works for any application
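The nanosleep-and-compare loop described above can be sketched in a few lines of C. This is a simplified illustration, not cyclictest itself: the real tool also sets SCHED_FIFO priority, calls mlockall(), and runs one thread per CPU. The helper names here (ts_ns, measure_max_latency) are invented for this sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Convert a timespec to nanoseconds for easy arithmetic. */
static int64_t ts_ns(const struct timespec *ts)
{
	return (int64_t)ts->tv_sec * 1000000000LL + ts->tv_nsec;
}

/* Run `loops` iterations of a cyclictest-style measurement loop:
 * sleep to an absolute deadline, then compare the actual wake-up
 * time with the expected one.  Returns the worst latency seen, in ns. */
static int64_t measure_max_latency(int loops, long interval_ns)
{
	struct timespec next, now;
	int64_t max_lat = 0;

	clock_gettime(CLOCK_MONOTONIC, &next);
	for (int i = 0; i < loops; i++) {
		/* Advance the absolute deadline by one interval */
		next.tv_nsec += interval_ns;
		while (next.tv_nsec >= 1000000000L) {
			next.tv_nsec -= 1000000000L;
			next.tv_sec++;
		}
		/* TIMER_ABSTIME avoids drift from time spent outside the sleep */
		clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
		clock_gettime(CLOCK_MONOTONIC, &now);

		int64_t lat = ts_ns(&now) - ts_ns(&next);
		if (lat > max_lat)
			max_lat = lat;
	}
	return max_lat;
}
```

With -i250 semantics this would be called as measure_max_latency(loops, 250 * 1000); on an idle desktop kernel the result is typically tens of microseconds, which is exactly the jitter cyclictest reports in its Max column.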

A typical cyclictest run

~# cyclictest -p80 -i250 -a -t -m -d 0
# /dev/cpu_dma_latency set to 0us
policy: fifo: loadavg: 0.00 0.00 0.00 1/9997 68021
T: 0 (68015) P:80 I:250 C: 18570 Min: 3 Act: 3 Avg: 3 Max: 7
T: 1 (68016) P:80 I:250 C: 18569 Min: 3 Act: 3 Avg: 3 Max: 5
T: 2 (68017) P:80 I:250 C: 18569 Min: 3 Act: 3 Avg: 3 Max: 9
T: 3 (68018) P:80 I:250 C: 18568 Min: 3 Act: 3 Avg: 3 Max: 6
T: 4 (68019) P:80 I:250 C: 18568 Min: 3 Act: 3 Avg: 3 Max: 5
T: 5 (68020) P:80 I:250 C: 18567 Min: 3 Act: 3 Avg: 3 Max: 95
T: 6 (68021) P:80 I:250 C: 18567 Min: 3 Act: 3 Avg: 3 Max: 9

Options used:
-p PRIO      --priority=PRIO   priority of highest prio thread
-i INTV      --interval=INTV   base interval of thread in us, default=1000
-a [CPUSET]  --affinity        run thread #N on processor #N, if possible, or if
                               CPUSET given, pin threads to that set of processors
                               in round-robin order
-t           --threads         one thread per available processor
-m           --mlockall        lock current and future memory allocations
-d DIST      --distance=DIST   distance of thread intervals in us, default=500

cyclictest and tracing
- Can break when a latency is greater than a given threshold
- Will write into the kernel tracing buffer

-b USEC  --breaktrace=USEC   send "breaktrace" command when latency > USEC
--tracemark                  write a "tracemark" when -b latency is exceeded
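The --tracemark option works by writing a message into the ftrace ring buffer through the tracefs trace_marker file, so the marker lands in the same time-ordered stream as the kernel events. A minimal sketch of that mechanism (the helper names format_tracemark and write_tracemark are invented here; the tracefs path varies between /sys/kernel/tracing and /sys/kernel/debug/tracing, and writing requires root):

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Build the same message format cyclictest's --tracemark emits. */
static int format_tracemark(char *buf, size_t size, int lat, int thresh)
{
	return snprintf(buf, size, "hit latency threshold (%d > %d)", lat, thresh);
}

/* Write the marker into the ftrace ring buffer via trace_marker. */
static int write_tracemark(int lat, int thresh)
{
	/* tracefs moved out of debugfs in newer kernels; try both mount points */
	static const char * const paths[] = {
		"/sys/kernel/tracing/trace_marker",
		"/sys/kernel/debug/tracing/trace_marker",
	};
	char buf[64];
	int fd = -1;
	int len;

	for (size_t i = 0; i < sizeof(paths) / sizeof(paths[0]) && fd < 0; i++)
		fd = open(paths[i], O_WRONLY);
	if (fd < 0)
		return -1;	/* tracefs not mounted, or not running as root */

	len = format_tracemark(buf, sizeof(buf), lat, thresh);
	if (write(fd, buf, len) != len) {
		close(fd);
		return -1;
	}
	close(fd);
	return 0;
}
```

This is also why the later slides can search the trace for the literal string "hit latency threshold": the marker text appears verbatim in the tracing_mark_write event.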

Break when greater than 100 microseconds

~# cyclictest -p80 -i250 -a -t -m -d 0 -b 100 --tracemark
# /dev/cpu_dma_latency set to 0us
INFO: debugfs mountpoint: /sys/kernel/debug/tracing/
policy: fifo: loadavg: 0.00 0.00 0.00 1/9998 68073
T: 0 (68066) P:80 I:250 C: 91103 Min: 3 Act: 3 Avg: 3 Max: 6
T: 1 (68067) P:80 I:250 C: 91103 Min: 3 Act: 3 Avg: 3 Max: 7
T: 2 (68068) P:80 I:250 C: 91102 Min: 3 Act: 3 Avg: 3 Max: 10
T: 3 (68069) P:80 I:250 C: 91101 Min: 3 Act: 3 Avg: 3 Max: 8
T: 4 (68070) P:80 I:250 C: 91101 Min: 3 Act: 3 Avg: 3 Max: 9
T: 5 (68071) P:80 I:250 C: 91100 Min: 3 Act: 3 Avg: 3 Max: 11
T: 6 (68072) P:80 I:250 C: 91100 Min: 3 Act: 3 Avg: 3 Max: 47
# Thread Ids: 68066 68067 68068 68069 68070 68071 68072
# Break thread: 68066
# Break value: 561

Show the trace output with trace-cmd

~# trace-cmd show
# tracer: nop
#
# entries-in-buffer/entries-written: 1/1   #P:8
#
#                                _-----=> irqs-off/BH-disabled
#                               / _----=> need-resched
#                              | / _---=> hardirq/softirq
#                              || / _--=> preempt-depth
#                              ||| / _-=> migrate-disable
#                              |||| /     delay
#           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
#              | |         |   |||||     |         |
           <...>-68066   [000] ..... 816424.679452: tracing_mark_write: hit latency threshold (561 > 100)   <- 561 us latency

Tracing cyclictest

~# trace-cmd start -e sched_switch -e sched_waking \      <- scheduling events
                   -e timer \                             <- timer events
                   -e irq -e irq_vectors \                <- interrupt events
       cyclictest -p80 -i250 -a -t -m -d 0 -b 100 --tracemark
# /dev/cpu_dma_latency set to 0us
INFO: debugfs mountpoint: /sys/kernel/debug/tracing/
policy: fifo: loadavg: 0.03 0.01 0.00 1/9998 68166
T: 0 (68160) P:80 I:250 C: 226450 Min: 4 Act: 5 Avg: 4 Max: 8
T: 1 (68161) P:80 I:250 C: 226449 Min: 4 Act: 5 Avg: 4 Max: 52
T: 2 (68162) P:80 I:250 C: 226448 Min: 4 Act: 5 Avg: 4 Max: 11
T: 3 (68163) P:80 I:250 C: 226448 Min: 4 Act: 5 Avg: 4 Max: 10
T: 4 (68164) P:80 I:250 C: 226448 Min: 4 Act: 5 Avg: 4 Max: 12
T: 5 (68165) P:80 I:250 C: 226447 Min: 4 Act: 5 Avg: 5 Max: 48
T: 6 (68166) P:80 I:250 C: 226446 Min: 4 Act: 4 Avg: 4 Max: 11
# Thread Ids: 68160 68161 68162 68163 68164 68165 68166
# Break thread: 68161
# Break value: 121

Tracing cyclictest

~# trace-cmd show | tail -20
          <idle>-0     [006] d.h1. 818035.615492: hrtimer_expire_entry: hrtimer=00000000d57f52c5 function=hrtimer_wakeup now=818051328925800
          <idle>-0     [006] d.h3. 818035.615492: sched_waking: comm=cyclictest pid=68165 prio=19 target_cpu=006
          <idle>-0     [006] dNh1. 818035.615493: hrtimer_expire_exit: hrtimer=00000000d57f52c5
          <idle>-0     [006] dNh1. 818035.615493: local_timer_exit: vector=236
          <idle>-0     [006] d..2. 818035.615495: sched_switch: prev_comm=swapper/6 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=cyclictest next_pid=68165 next_prio=19
      cyclictest-68165 [006] ..... 818035.615496: hrtimer_init: hrtimer=00000000d57f52c5 clockid=CLOCK_MONOTONIC mode=ABS
      cyclictest-68165 [006] d..1. 818035.615496: hrtimer_start: hrtimer=00000000d57f52c5 function=hrtimer_wakeup expires=818051329174969 softexpires=818051329174969 mode=ABS
      cyclictest-68165 [006] d..2. 818035.615500: sched_switch: prev_comm=cyclictest prev_pid=68165 prev_prio=19 prev_state=S ==> next_comm=swapper/6 next_pid=0 next_prio=120
          <idle>-0     [000] d.h1. 818035.615509: local_timer_entry: vector=236
          <idle>-0     [000] d.h2. 818035.615509: hrtimer_cancel: hrtimer=00000000b12dfc3a
          <idle>-0     [000] d.h1. 818035.615509: hrtimer_expire_entry: hrtimer=00000000b12dfc3a function=hrtimer_wakeup now=818051328943482
          <idle>-0     [000] d.h3. 818035.615509: sched_waking: comm=cyclictest pid=68160 prio=19 target_cpu=000
          <idle>-0     [000] dNh1. 818035.615511: hrtimer_expire_exit: hrtimer=00000000b12dfc3a
          <idle>-0     [000] dNh1. 818035.615511: local_timer_exit: vector=236
          <idle>-0     [000] d..2. 818035.615513: sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=cyclictest next_pid=68160 next_prio=19
      cyclictest-68160 [000] ..... 818035.615514: hrtimer_init: hrtimer=00000000b12dfc3a clockid=CLOCK_MONOTONIC mode=ABS
      cyclictest-68160 [000] d..1. 818035.615515: hrtimer_start: hrtimer=00000000b12dfc3a function=hrtimer_wakeup expires=818051329192610 softexpires=818051329192610 mode=ABS
      cyclictest-68160 [000] d..2. 818035.615519: sched_switch: prev_comm=cyclictest prev_pid=68160 prev_prio=19 prev_state=S ==> next_comm=swapper/0 next_pid=0 next_prio=120
            sshd-51820 [001] d..2. 818035.615536: sched_switch: prev_comm=sshd prev_pid=51820 prev_prio=120 prev_state=R+ ==> next_comm=cyclictest next_pid=68161 next_prio=19
      cyclictest-68161 [001] ..... 818035.615546: tracing_mark_write: hit latency threshold (121 > 100)

Tracing cyclictest

~# trace-cmd export -o trace-cyclic.dat
~# trace-cmd report -l trace-cyclic.dat | tail -20
  <idle>-0       6d.h1. 818035.615492: hrtimer_expire_entry: hrtimer=0xffffa2c26d5ffe70 now=818051328925800 function=hrtimer_wakeup/0x0
  <idle>-0       6d.h3. 818035.615492: sched_waking: comm=cyclictest pid=68165 prio=19 target_cpu=006
  <idle>-0       6dNh1. 818035.615493: hrtimer_expire_exit: hrtimer=0xffffa2c26d5ffe70
  <idle>-0       6dNh1. 818035.615493: local_timer_exit: vector=236
  <idle>-0       6d..2. 818035.615495: sched_switch: swapper/6:0 [120] R ==> cyclictest:68165 [19]
  cyclicte-68165 6..... 818035.615496: hrtimer_init: hrtimer=0xffffa2c26d5ffe70 clockid=CLOCK_MONOTONIC mode=0x0
  cyclicte-68165 6d..1. 818035.615496: hrtimer_start: hrtimer=0xffffa2c26d5ffe70 function=hrtimer_wakeup/0x0 expires=818051329174969 softexpires=818051329174969
  cyclicte-68165 6d..2. 818035.615500: sched_switch: cyclictest:68165 [19] S ==> swapper/6:0 [120]
  <idle>-0       0d.h1. 818035.615509: local_timer_entry: vector=236
  <idle>-0       0d.h2. 818035.615509: hrtimer_cancel: hrtimer=0xffffa2c26d5efe70
  <idle>-0       0d.h1. 818035.615509: hrtimer_expire_entry: hrtimer=0xffffa2c26d5efe70 now=818051328943482 function=hrtimer_wakeup/0x0
  <idle>-0       0d.h3. 818035.615509: sched_waking: comm=cyclictest pid=68160 prio=19 target_cpu=000
  <idle>-0       0dNh1. 818035.615511: hrtimer_expire_exit: hrtimer=0xffffa2c26d5efe70
  <idle>-0       0dNh1. 818035.615511: local_timer_exit: vector=236
  <idle>-0       0d..2. 818035.615513: sched_switch: swapper/0:0 [120] R ==> cyclictest:68160 [19]
  cyclicte-68160 0..... 818035.615514: hrtimer_init: hrtimer=0xffffa2c26d5efe70 clockid=CLOCK_MONOTONIC mode=0x0
  cyclicte-68160 0d..1. 818035.615515: hrtimer_start: hrtimer=0xffffa2c26d5efe70 function=hrtimer_wakeup/0x0 expires=818051329192610 softexpires=818051329192610
  cyclicte-68160 0d..2. 818035.615519: sched_switch: cyclictest:68160 [19] S ==> swapper/0:0 [120]
  sshd-51820     1d..2. 818035.615536: sched_switch: sshd:51820 [120] R ==> cyclictest:68161 [19]
  cyclicte-68161 1..... 818035.615546: print: tracing_mark_write: hit latency threshold (121 > 100)   <- cyclictest marker, on CPU 1

Tracing cyclictest

~# trace-cmd report -l --cpu 1 trace-cyclic.dat | tail -20
  cyclicte-68161 1d..2. 818035.614923: sched_switch: cyclictest:68161 [19] S ==> swapper/1:0 [120]
  <idle>-0       1d.h2. 818035.615166: local_timer_entry: vector=236
  <idle>-0       1d.h3. 818035.615166: hrtimer_cancel: hrtimer=0xffffa2c26d5f7e70
  <idle>-0       1d.h2. 818035.615166: hrtimer_expire_entry: hrtimer=0xffffa2c26d5f7e70 now=818051328600710 function=hrtimer_wakeup/0x0
  <idle>-0       1d.h4. 818035.615167: sched_waking: comm=cyclictest pid=68161 prio=19 target_cpu=001
  <idle>-0       1dNh2. 818035.615168: hrtimer_expire_exit: hrtimer=0xffffa2c26d5f7e70
  <idle>-0       1dNh2. 818035.615168: local_timer_exit: vector=236
  <idle>-0       1d..2. 818035.615170: sched_switch: swapper/1:0 [120] R ==> cyclictest:68161 [19]
  cyclicte-68161 1..... 818035.615171: hrtimer_init: hrtimer=0xffffa2c26d5f7e70 clockid=CLOCK_MONOTONIC mode=0x0
  cyclicte-68161 1d..1. 818035.615171: hrtimer_start: hrtimer=0xffffa2c26d5f7e70 function=hrtimer_wakeup/0x0 expires=818051328849935 softexpires=818051328849935   <- time to wake up
  cyclicte-68161 1d..2. 818035.615173: sched_switch: cyclictest:68161 [19] S ==> swapper/1:0 [120]
  <idle>-0       1d..2. 818035.615339: sched_switch: swapper/1:0 [120] R ==> sshd:51820 [120]
  sshd-51820     1d.h1. 818035.615416: local_timer_entry: vector=236
  sshd-51820     1d.h2. 818035.615416: hrtimer_cancel: hrtimer=0xffffa2c26d5f7e70
  sshd-51820     1d.h1. 818035.615417: hrtimer_expire_entry: hrtimer=0xffffa2c26d5f7e70 now=818051328850788 function=hrtimer_wakeup/0x0   <- current time
  sshd-51820     1d.h3. 818035.615417: sched_waking: comm=cyclictest pid=68161 prio=19 target_cpu=001   <- woke up
  sshd-51820     1dNh1. 818035.615419: hrtimer_expire_exit: hrtimer=0xffffa2c26d5f7e70
  sshd-51820     1dNh1. 818035.615419: local_timer_exit: vector=236
  sshd-51820     1d..2. 818035.615536: sched_switch: sshd:51820 [120] R ==> cyclictest:68161 [19]   <- scheduled in
  cyclicte-68161 1..... 818035.615546: print: tracing_mark_write: hit latency threshold (121 > 100)   <- cyclictest marker

libtracecmd to read the trace.dat file
- Automate the latency process
- First search from the end of the trace
  - Find the marker or flag that tells you where the problem happened
- Continue backwards looking for other events
- Search forward to collect timings

int main(int argc, char **argv)
{
	struct tracecmd_input *handle;
	struct tep_handle *tep;
	struct data data;
	cpu_set_t *cpu_set;
	size_t cpu_size;
	int cpus;

	handle = tracecmd_open(argv[1], 0);
	tep = tracecmd_get_tep(handle);
	init_data(tep, &data);

	tracecmd_iterate_events_reverse(handle, NULL, 0, find_trace_marker, &data, false);
	printf("cpu=%d threshold=%d latency=%d\n\n", data.cpu, data.thresh, data.lat);

	/* Now we know what CPU it is on, look at just this CPU */
	cpus = tep_get_cpus(tep);
	cpu_size = CPU_ALLOC_SIZE(cpus);
	cpu_set = CPU_ALLOC(cpus);
	CPU_ZERO_S(cpu_size, cpu_set);
	CPU_SET_S(data.cpu, cpu_size, cpu_set);

	/* Find where cyclictest was scheduled in */
	tracecmd_iterate_events_reverse(handle, cpu_set, cpu_size, find_sched, &data, true);

	/* Find where cyclictest timer went off */
	tracecmd_iterate_events_reverse(handle, cpu_set, cpu_size, find_timer_expire, &data, true);

static int find_trace_marker(struct tracecmd_input *handle, struct tep_record *record,
			     int cpu, void *data)
{
	struct tep_handle *tep = tracecmd_get_tep(handle);
	struct tep_event *marker;
	static struct trace_seq seq;
	struct data *d = data;
	const char *str;
	int thresh;
	int type;
	int lat;

	marker = d->marker;
	type = tep_data_type(tep, record);
	if (type != marker->id)		/* Skip non-marker events */
		return 0;

	/* Make sure that the print has the data we want */
	if (!seq.buffer)
		trace_seq_init(&seq);
	tep_print_event(tep, &seq, record, "%s", TEP_PRINT_INFO);
	trace_seq_terminate(&seq);

	str = strstr(seq.buffer, "hit latency threshold");
	if (!str) {
		/* This is not the string you are looking for */
		trace_seq_reset(&seq);
		return 0;
	}
	/* Parse the tracemark */
	sscanf(str, "hit latency threshold (%d > %d)", &lat, &thresh);

	/* Record which CPU the latency happened on */
	d->cpu = record->cpu;
	d->lat = lat;
	d->thresh = thresh;
	d->pid = tep_data_pid(tep, record);
	d->marker_time = record->ts;

	trace_seq_destroy(&seq);
	seq.buffer = NULL;

	/* Stop the iterator, as we will now only search the current CPU */
	return 1;
}

    str = strstr(seq.buffer, "hit latency threshold");
    if (!str) {
        /* This is not the string you are looking for */
        trace_seq_reset(&seq);
        return 0;
    }
    sscanf(str, "hit latency threshold (%d > %d)", &lat, &thresh);

    d->cpu = record->cpu;
    d->lat = lat;
    d->thresh = thresh;
    d->pid = tep_data_pid(tep, record);
    d->marker_time = record->ts;

    trace_seq_destroy(&seq);
    seq.buffer = NULL;

    /* Stop the iterator, as we will now only search the current CPU */
    return 1;
}

Record which CPU


int main(int argc, char **argv)
{
    struct tracecmd_input *handle;
    struct tep_handle *tep;
    cpu_set_t *cpu_set;
    size_t cpu_size;
    struct data data;
    int cpus;
    int ret;

    handle = tracecmd_open(argv[1], 0);
    tep = tracecmd_get_tep(handle);
    init_data(tep, &data);

    tracecmd_iterate_events_reverse(handle, NULL, 0, find_trace_marker, &data, false);

    printf("cpu=%d threshold=%d latency=%d\n\n", data.cpu, data.thresh, data.lat);

    /* Now we know what CPU it is on, look at just this CPU */
    cpus = tep_get_cpus(tep);
    cpu_size = CPU_ALLOC_SIZE(cpus);
    cpu_set = CPU_ALLOC(cpus);
    CPU_ZERO_S(cpu_size, cpu_set);
    CPU_SET_S(data.cpu, cpu_size, cpu_set);

    /* Find where cyclictest was scheduled in */
    tracecmd_iterate_events_reverse(handle, cpu_set, cpu_size, find_sched, &data, true);

    /* Find where cyclictest timer went off */
    tracecmd_iterate_events_reverse(handle, cpu_set, cpu_size, find_timer_expire, &data, true);

Only follow 1 CPU

int main(int argc, char **argv)
{
    struct tracecmd_input *handle;
    struct tep_handle *tep;
    cpu_set_t *cpu_set;
    size_t cpu_size;
    struct data data;
    int cpus;
    int ret;

    handle = tracecmd_open(argv[1], 0);
    tep = tracecmd_get_tep(handle);
    init_data(tep, &data);

    tracecmd_iterate_events_reverse(handle, NULL, 0, find_trace_marker, &data, false);

    printf("cpu=%d threshold=%d latency=%d\n\n", data.cpu, data.thresh, data.lat);

    /* Now we know what CPU it is on, look at just this CPU */
    cpus = tep_get_cpus(tep);
    cpu_size = CPU_ALLOC_SIZE(cpus);
    cpu_set = CPU_ALLOC(cpus);
    CPU_ZERO_S(cpu_size, cpu_set);
    CPU_SET_S(data.cpu, cpu_size, cpu_set);

    /* Find where cyclictest was scheduled in */
    tracecmd_iterate_events_reverse(handle, cpu_set, cpu_size, find_sched, &data, true);

    /* Find where cyclictest timer went off */
    tracecmd_iterate_events_reverse(handle, cpu_set, cpu_size, find_timer_expire, &data, true);

Look for sched_switch


static int find_sched(struct tracecmd_input *handle,
                      struct tep_record *record, int cpu, void *data)
{
    struct tep_handle *tep = tracecmd_get_tep(handle);
    struct tep_event *sched;
    unsigned long long val;
    struct data *d = data;
    int type;

    sched = d->sched;
    type = tep_data_type(tep, record);
    if (type != sched->id)
        return 0;

    tep_read_number_field(d->next_pid, record->data, &val);
    d->sched_time = record->ts;
    return -1;
}

Exit if not sched_switch

static int find_sched(struct tracecmd_input *handle,
                      struct tep_record *record, int cpu, void *data)
{
    struct tep_handle *tep = tracecmd_get_tep(handle);
    struct tep_event *sched;
    unsigned long long val;
    struct data *d = data;
    int type;

    sched = d->sched;
    type = tep_data_type(tep, record);
    if (type != sched->id)
        return 0;

    tep_read_number_field(d->next_pid, record->data, &val);
    d->sched_time = record->ts;
    return -1;
}

Record sched_switch time

int main(int argc, char **argv)
{
    struct tracecmd_input *handle;
    struct tep_handle *tep;
    cpu_set_t *cpu_set;
    size_t cpu_size;
    struct data data;
    int cpus;
    int ret;

    handle = tracecmd_open(argv[1], 0);
    tep = tracecmd_get_tep(handle);
    init_data(tep, &data);

    tracecmd_iterate_events_reverse(handle, NULL, 0, find_trace_marker, &data, false);

    printf("cpu=%d threshold=%d latency=%d\n\n", data.cpu, data.thresh, data.lat);

    /* Now we know what CPU it is on, look at just this CPU */
    cpus = tep_get_cpus(tep);
    cpu_size = CPU_ALLOC_SIZE(cpus);
    cpu_set = CPU_ALLOC(cpus);
    CPU_ZERO_S(cpu_size, cpu_set);
    CPU_SET_S(data.cpu, cpu_size, cpu_set);

    /* Find where cyclictest was scheduled in */
    tracecmd_iterate_events_reverse(handle, cpu_set, cpu_size, find_sched, &data, true);

    /* Find where cyclictest was woken up */
    tracecmd_iterate_events_reverse(handle, cpu_set, cpu_size, find_wakeup, &data, true);

Look for sched_waking

static int find_wakeup(struct tracecmd_input *handle,
                       struct tep_record *record, int cpu, void *data)
{
    struct tep_handle *tep = tracecmd_get_tep(handle);
    struct tep_format_field *waking_pid;
    struct tep_event *waking;
    unsigned long long val;
    struct data *d = data;
    int flags;
    int type;

    waking = d->waking;
    waking_pid = d->waking_pid;
    type = tep_data_type(tep, record);
    if (type != waking->id)
        return 0;

    tep_read_number_field(waking_pid, record->data, &val);
    if (val != d->pid)
        return 0;

    /* Found the wakeup! */
    d->wakeup_time = record->ts;
    return -1;
}

Record wake up time

int main(int argc, char **argv)
{
    [..]
    /* Find where cyclictest timer went off */
    tracecmd_iterate_events_reverse(handle, cpu_set, cpu_size, find_timer_expire, &data, true);

    /* Find where the timer was added (the start of this cycle) */
    tracecmd_iterate_events_reverse(handle, cpu_set, cpu_size, find_timer_start, &data, true);

    printf("expected expire: %lld\n", data.timer_expires_expect);
    printf("timer expired: %lld\n", data.timer_expire);
    printf("  jitter: (%lld)\n", data.timer_expire - data.timer_expires_expect);
    printf("wake up time: %lld\n", data.wakeup_time);
    printf("  from timer: (%lld)\n", data.wakeup_time - data.timer_expire);
    printf("scheduled time: %lld\n", data.sched_time);
    printf("  from wakeup: (%lld)\n", data.sched_time - data.wakeup_time);
    printf("marker time: %lld\n", data.marker_time);
    printf("  from schedule:(%lld)\n", data.marker_time - data.sched_time);

    tracecmd_close(handle);
    return 0;
}

Look for timer expire event


static int find_timer_expire(struct tracecmd_input *handle,
                             struct tep_record *record, int cpu, void *data)
{
    struct tep_handle *tep = tracecmd_get_tep(handle);
    struct tep_event *timer;
    unsigned long long val;
    struct data *d = data;
    int type;

    timer = d->hrtimer_expire;
    type = tep_data_type(tep, record);
    if (type != timer->id)
        return 0;

    tep_read_number_field(d->hrtimer_expire_ptr, record->data, &val);
    d->hrtimer = val;

    tep_read_number_field(d->timer_now, record->data, &val);
    d->timer_expire = record->ts;
    d->timer_delta = val - record->ts;
    return -1;
}

Calculate timer event time to ring buffer time

int main(int argc, char **argv)
{
    [..]
    /* Find where cyclictest was woken up */
    tracecmd_iterate_events_reverse(handle, cpu_set, cpu_size, find_wakeup, &data, true);

    /* Find where the timer was added (the start of this cycle) */
    tracecmd_iterate_events_reverse(handle, cpu_set, cpu_size, find_timer_start, &data, true);

    printf("expected expire: %lld\n", data.timer_expires_expect);
    printf("timer expired: %lld\n", data.timer_expire);
    printf("  jitter: (%lld)\n", data.timer_expire - data.timer_expires_expect);
    printf("wake up time: %lld\n", data.wakeup_time);
    printf("  from timer: (%lld)\n", data.wakeup_time - data.timer_expire);
    printf("scheduled time: %lld\n", data.sched_time);
    printf("  from wakeup: (%lld)\n", data.sched_time - data.wakeup_time);
    printf("marker time: %lld\n", data.marker_time);
    printf("  from schedule:(%lld)\n", data.marker_time - data.sched_time);

    tracecmd_close(handle);
    return 0;
}

Look for timer start event


static int find_timer_start(struct tracecmd_input *handle,
                            struct tep_record *record, int cpu, void *data)
{
    struct tep_handle *tep = tracecmd_get_tep(handle);
    struct tep_event *timer;
    unsigned long long val;
    struct data *d = data;
    int type;

    timer = d->hrtimer_start;
    type = tep_data_type(tep, record);
    if (type != timer->id)
        return 0;

    tep_read_number_field(d->hrtimer_start_ptr, record->data, &val);
    if (val != d->hrtimer)
        return 0;

    tep_read_number_field(d->hrtimer_start_expires, record->data, &val);
    d->timer_expires_expect = val - d->timer_delta;
    d->timer_start = record->ts;
    return -1;
}

Calculate expected time vs actual time


Now analyze the trace.dat file

 ~# trace-cyclic trace-cyclic.dat
cpu=1 threshold=100 latency=121

expected expire: 818035615415764
timer expired: 818035615416617
  jitter: (853)
wake up time: 818035615416993
  from timer: (376)
scheduled time: 818035615535531
  from wakeup: (118538)
marker time: 818035615546097
  from schedule:(10566)

Time in nanoseconds

Now analyze the trace.dat file

 ~# trace-cyclic trace-cyclic.dat
cpu=1 threshold=100 latency=121

expected expire: 818035615415764
timer expired: 818035615416617
  jitter: (853)
wake up time: 818035615416993
  from timer: (376)
scheduled time: 818035615535531
  from wakeup: (118538)
marker time: 818035615546097
  from schedule:(10566)

Problem here! (118538ns from wakeup to being scheduled)

Can analyze any events you want
This is just a small example of what you can do

Can analyze any events you want
This is just a small example of what you can do
Let's look at locks

Can analyze any events you want
This is just a small example of what you can do
Let's look at locks
Using function tracing, trace all locking functions; function tracing gives you the parent function too, so you can see where a lock was called

Can analyze any events you want
This is just a small example of what you can do
Let's look at locks
Using function tracing, trace all locking functions; function tracing gives you the parent function too, so you can see where a lock was called
Trace both the lock and unlock functions, so you can get the latency of how long they are held

Function trace all lock functions

 ># trace-cmd start -e sched_switch -e sched_waking -e timer -e irq -e irq_vectors \
      -p function -l '*spin_lock*' -l '*spin_unlock*' -l '*spin_try*' -l '*read_lock*' \
      -l '*read_unlock*' -l '*read_try*' -l '*write_lock*' -l '*write_try*' -l '*write_unlock' \
      cyclictest -p80 -i250 -a -t -m -d 0 -b 100 --tracemark
  plugin 'function'
# /dev/cpu_dma_latency set to 0us
INFO: debugfs mountpoint: /sys/kernel/debug/tracing/
policy: fifo: loadavg: 0.07 0.02 0.00 1/9997 69853

T: 0 (69847) P:80 I:250 C: 27298 Min: 6 Act: 6 Avg: 6 Max: 29
T: 1 (69848) P:80 I:250 C: 27296 Min: 6 Act: 6 Avg: 6 Max: 13
T: 2 (69849) P:80 I:250 C: 27295 Min: 6 Act: 6 Avg: 6 Max: 25
T: 3 (69850) P:80 I:250 C: 27294 Min: 6 Act: 6 Avg: 6 Max: 19
T: 4 (69851) P:80 I:250 C: 27293 Min: 6 Act: 6 Avg: 6 Max: 37
T: 5 (69852) P:80 I:250 C: 27291 Min: 6 Act: 9 Avg: 6 Max: 17
T: 6 (69853) P:80 I:250 C: 27291 Min: 6 Act: 6 Avg: 6 Max: 17
# Thread Ids: 69847 69848 69849 69850 69851 69852 69853
# Break thread: 69850
# Break value: 102


int main(int argc, char **argv)
{
    [..]
    /* Find where cyclictest was woken up */
    tracecmd_iterate_events_reverse(handle, cpu_set, cpu_size, find_wakeup, &data, true);

    /* Find where the timer was added (the start of this cycle) */
    tracecmd_iterate_events_reverse(handle, cpu_set, cpu_size, find_timer_start, &data, true);

    /* Now that we know all the times, look for the locks (going forward now) */
    tracecmd_iterate_events(handle, cpu_set, cpu_size, find_locks, &data);

    printf("expected expire: %lld\n", data.timer_expires_expect);
    printf("timer expired: %lld\n", data.timer_expire);
    printf("  jitter: (%lld)\n", data.timer_expire - data.timer_expires_expect);
    printf("wake up time: %lld\n", data.wakeup_time);
    printf("  from timer: (%lld)\n", data.wakeup_time - data.timer_expire);
    printf("scheduled time: %lld\n", data.sched_time);
    printf("  from wakeup: (%lld)\n", data.sched_time - data.wakeup_time);
    printf("marker time: %lld\n", data.marker_time);
    printf("  from schedule:(%lld)\n", data.marker_time - data.sched_time);

Go forward and find all locks

static int find_locks(struct tracecmd_input *handle,
                      struct tep_record *record, int cpu, void *data)
{
    struct tep_handle *tep = tracecmd_get_tep(handle);
    struct data *d = data;
    enum data_state state;

    if (record->ts == d->marker_time)
        return -1;

    if (record->ts < d->timer_expires_expect)
        state = STATE_START;
    else if (record->ts < d->timer_expire)
        state = STATE_EXPECT;
    else if (record->ts < d->wakeup_time)
        state = STATE_EXPIRED;
    else if (record->ts < d->sched_time)
        state = STATE_WAKEUP;
    else
        state = STATE_SCHED;

    handle_locks(tep, record, data, state);
    return 0;
}

Use saved times to know where the lock happened

static int handle_locks(struct tep_handle *tep, struct tep_record *record,
                        struct data *data, enum data_state state)
{
    struct lock_stack *lock;
    struct lock_list *llist;
    unsigned long long pip, ip;
    const char *func;
    bool start = true;
    int type;
    int id;

    switch (state) {
    case STATE_START:
        llist = &data->locks_start;
        break;
    case STATE_EXPECT:
        llist = &data->locks_start_expire;
        break;
    case STATE_EXPIRED:
        llist = &data->locks_timer;
        break;
    case STATE_WAKEUP:
        llist = &data->locks_wakeup;
        break;
    case STATE_SCHED:
        llist = &data->locks_sched;
        break;
    }

    type = tep_data_type(tep, record);
    if (!data->function || data->function->id != type)
        return 0;

Save lock information with associated state

    tep_read_number_field(data->function_ip, record->data, &ip);
    tep_read_number_field(data->function_pip, record->data, &pip);

    func = tep_find_function(tep, ip);
    id = find_start_lock(func);
    if (id < 0) {
        id = find_stop_lock(func);
        if (id < 0)
            return 1;
        start = false;
    }

    if (start) {
        push_lock(data, id, pip, record->ts, func);
        return 1;
    }

    /* Pop the locks until we find our id */
    lock = NULL;
    do {
        if (lock)
            free(lock);
        lock = pop_stack(data);
    } while (lock && lock->id != id);

    if (!lock)
        return 1;

    /* Add the total time of this lock */
    end_timer(&llist->list, pip, record->ts, lock->name);
    start_timer(&llist->list, pip, lock->time, NULL);
    free(lock);
    return 1;
}

Calculate how long lock was held

static const char *locks[] = {
    "spin_",  "raw_spin_",  "_raw_spin_",  "__raw_spin_",
    "read_",  "raw_read_",  "_raw_read_",  "__raw_read_",
    "write_", "raw_write_", "_raw_write_", "__raw_write_",
};

static int find_start_lock(const char *func)
{
    const char *lock;
    int len;
    int i;

    for (i = 0; i < ARRAY_SIZE(locks); i++) {
        lock = locks[i];
        len = strlen(lock);
        if (strncmp(lock, func, len) != 0)
            continue;
        if (strncmp(func + len, "lock", 4) == 0 ||
            strncmp(func + len, "try", 3) == 0)
            return i;
        break;
    }
    return -1;
}

All known lock types

static const char *locks[] = {
    "spin_",  "raw_spin_",  "_raw_spin_",  "__raw_spin_",
    "read_",  "raw_read_",  "_raw_read_",  "__raw_read_",
    "write_", "raw_write_", "_raw_write_", "__raw_write_",
};

static int find_start_lock(const char *func)
{
    const char *lock;
    int len;
    int i;

    for (i = 0; i < ARRAY_SIZE(locks); i++) {
        lock = locks[i];
        len = strlen(lock);
        if (strncmp(lock, func, len) != 0)
            continue;
        if (strncmp(func + len, "lock", 4) == 0 ||
            strncmp(func + len, "try", 3) == 0)
            return i;
        break;
    }
    return -1;
}

See if it is a lock (not unlock)

int main(int argc, char **argv)
{
    [..]
    tracecmd_iterate_events(handle, cpu_set, cpu_size, find_locks, &data);

    printf("expected expire: %lld\n", data.timer_expires_expect);
    printf("timer expired: %lld\n", data.timer_expire);
    printf("  jitter: (%lld)\n", data.timer_expire - data.timer_expires_expect);
    printf("wake up time: %lld\n", data.wakeup_time);
    printf("  from timer: (%lld)\n", data.wakeup_time - data.timer_expire);
    printf("scheduled time: %lld\n", data.sched_time);
    printf("  from wakeup: (%lld)\n", data.sched_time - data.wakeup_time);
    printf("marker time: %lld\n", data.marker_time);
    printf("  from schedule:(%lld)\n", data.marker_time - data.sched_time);

    print_lock_list(tep, "Locks taken and released from start to expected time:",
                    &data.locks_start);
    print_lock_list(tep, "Locks held between expected time and timer:",
                    &data.locks_start_expire);
    print_lock_list(tep, "Locks taken from timer to wakeup:",
                    &data.locks_timer);
    print_lock_list(tep, "Locks taken from wake up to sched:",
                    &data.locks_wakeup);
    print_lock_list(tep, "Locks taken from sched to print:",
                    &data.locks_sched);

static void print_lock_list(struct tep_handle *tep, const char *msg,
                            struct lock_list *llist)
{
    struct time_list *list = llist->list;
    bool printed = false;

    for (; list; list = list->next) {
        if (!list->total)
            continue;
        if (!printed) {
            printf("\n%s\n", msg);
            printed = true;
        }
        print_lock(tep, list->id);
        printf(" total time: %lld (%d time%s) [%s]\n",
               list->total, list->cnt,
               list->cnt != 1 ? "s" : "", list->name);
    }
}


Analyze the locks in the trace

 ~# trace-cyclic trace-cyclic.dat
cpu=4 threshold=100 latency=102

expected expire: 850947083216603
timer expired: 850947083217641
  jitter: (1038)
wake up time: 850947083217999
  from timer: (358)
scheduled time: 850947083316970
  from wakeup: (98971)
marker time: 850947083328710
  from schedule:(11740)

Locks taken and released from start to expected time:
do_nanosleep+0x5f total time: 610 (1 time) [_raw_spin_lock_irqsave]
dequeue_task_rt+0x28 total time: 180 (1 time) [_raw_spin_lock]
finish_task_switch.isra.0+0x9b total time: 1450 (2 times) [_raw_spin_lock]
get_next_timer_interrupt+0x7b total time: 155 (7 times) [_raw_spin_lock]
hrtimer_get_next_event+0x47 total time: 140 (7 times) [_raw_spin_lock_irqsave]
hrtimer_next_event_without+0x67 total time: 161 (7 times) [_raw_spin_lock_irqsave]
sched_ttwu_pending+0xed total time: 1522 (1 time) [_raw_spin_lock]
poll_freewait+0x3d total time: 182 (7 times) [_raw_spin_lock_irqsave]
[..]

Biggest difference: the 98971ns from wakeup to being scheduled


Analyze the locks in the trace

sigprocmask+0x85 total time: 199 (4 times) [_raw_spin_lock_irq]
n_tty_read+0x217 total time: 198 (2 times) [_raw_spin_lock_irqsave]
n_tty_read+0x5d9 total time: 233 (2 times) [_raw_spin_lock_irqsave]
try_to_wake_up+0x77 total time: 183 (1 time) [_raw_spin_lock_irqsave]
__wake_up_common_lock+0x7e total time: 415 (2 times) [_raw_spin_lock_irqsave]
tcp_poll+0x2b total time: 178 (1 time) [_raw_spin_lock_irqsave]
tcp_sendmsg+0x19 total time: 231 (1 time) [_raw_spin_lock_bh]
nf_conntrack_tcp_packet+0x8e4 total time: 447 (1 time) [_raw_spin_lock_bh]
sch_direct_xmit+0x43 total time: 168 (1 time) [_raw_spin_lock]

Locks held between expected time and timer:
__hrtimer_run_queues+0x120 total time: 456 (1 time) [_raw_spin_lock_irqsave]

Locks taken from wake up to sched:
enqueue_task_rt+0x1e8 total time: 169 (1 time) [_raw_spin_lock]
try_to_wake_up+0x246 total time: 1674 (1 time) [_raw_spin_lock]
try_to_wake_up+0x251 total time: 2346 (1 time) [_raw_spin_lock_irqsave]
hrtimer_interrupt+0x11d total time: 222 (1 time) [_raw_spin_lock_irq]
sch_direct_xmit+0x13a total time: 168948 (1 time) [_raw_spin_lock]
__dev_queue_xmit+0x7fe total time: 279 (1 time) [_raw_spin_lock]
rcu_note_context_switch+0x386 total time: 237 (1 time) [_raw_spin_lock]

Locks taken from sched to print:
finish_task_switch.isra.0+0x9b total time: 2056 (1 time) [_raw_spin_lock]

Long lock held! (sch_direct_xmit+0x13a: 168948ns)

Questions? Code is here: https://www.trace-cmd.org https://rostedt.org/code/trace-cyclic.c

Steven Rostedt [email protected] @[email protected] github.com/rostedt Thank you! Let’s connect.