Using Libtracecmd to Analyze Your Latency and Performance Troubles
ScyllaDB
322 views
81 slides
Jul 01, 2024
Slide 1 of 81
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
About This Presentation
Trying to figure out why your application is responding late can be difficult, especially if it is because of interference from the operating system. This talk will briefly go over how to write a C program that can analyze what in the Linux system is interfering with your application. It will use tr...
Trying to figure out why your application is responding late can be difficult, especially if it is because of interference from the operating system. This talk will briefly go over how to write a C program that can analyze what in the Linux system is interfering with your application. It will use trace-cmd to enable kernel trace events as well as tracing lock functions, and it will then go over a quick tutorial on how to use libtracecmd to read the created trace.dat file to uncover what is the cause of interference to you application.
Size: 5.46 MB
Language: en
Added: Jul 01, 2024
Slides: 81 pages
Slide Content
Using Libtracecmd to Analyze Latency and Performance Troubles Steven Rostedt Software Engineer
Steven Rostedt Software Engineer One of the original authors of the PREEMPT_RT patch set Creator and maintainer of ftrace Creator of “make localmodconfig” Creator and maintainer of “ktest.pl” Linux testing framework
Tracing kernel latency on your application The kernel is a black box
Tracing kernel latency on your application The kernel is a black box Your application is at the whims of the kernel scheduler
Tracing kernel latency on your application The kernel is a black box Your application is at the whims of the kernel scheduler Interrupts can cause delays in your application
Tracing kernel latency on your application The kernel is a black box Your application is at the whims of the kernel scheduler Interrupts can cause delays in your application Kernel lock contention could add to the latency
Tracing kernel latency on your application The kernel is a black box Your application is at the whims of the kernel scheduler Interrupts can cause delays in your application Kernel lock contention could add to the latency Tracing can give you insight into the happenings of the kernel Monitor the scheduling decisions the kernel is making Record when interrupts are happening and for how long See how long kernel locks are held
Using cyclictest c yclictest tests the latency of the system Calls nanosleep() and sees when it woke up to when it expected to wake up
Using cyclictest cyclictest tests the latency of the system Calls nanosleep() and sees when it woke up to when it expected to wake up The Real Time Linux developers use this to test the jitter of the system Loads are run while cyclictest is measuring the latency Requires the latency to stay below a threshold May run for weeks or months
Using cyclictest cyclictest tests the latency of the system Calls nanosleep() and sees when it woke up to when it expected to wake up The Real Time Linux developers use this to test the jitter of the system Loads are run while cyclictest is measuring the latency Requires the latency to stay below a threshold May run for weeks or months I will use cyclictest as a example for this talk
Using cyclictest cyclictest tests the latency of the system Calls nanosleep() and sees when it woke up to when it expected to wake up The Real Time Linux developers use this to test the jitter of the system Loads are run while cyclictest is measuring the latency Requires the latency to stay below a threshold May run for weeks or months I will use cyclictest as a example for this talk But this works for any application
A typical cyclictest run ~# cyclictest -p80 -i250 -a -t -m -d 0 # /dev/cpu_dma_latency set to 0us policy: fifo: loadavg: 0.00 0.00 0.00 1/9997 68021 T: 0 (68015) P:80 I:250 C: 18570 Min: 3 Act: 3 Avg: 3 Max: 7 T: 1 (68016) P:80 I:250 C: 18569 Min: 3 Act: 3 Avg: 3 Max: 5 T: 2 (68017) P:80 I:250 C: 18569 Min: 3 Act: 3 Avg: 3 Max: 9 T: 3 (68018) P:80 I:250 C: 18568 Min: 3 Act: 3 Avg: 3 Max: 6 T: 4 (68019) P:80 I:250 C: 18568 Min: 3 Act: 3 Avg: 3 Max: 5 T: 5 (68020) P:80 I:250 C: 18567 Min: 3 Act: 3 Avg: 3 Max: 95 T: 6 (68021) P:80 I:250 C: 18567 Min: 3 Act: 3 Avg: 3 Max: 9 -p PRIO --priority=PRIO priority of highest prio thread -i INTV --interval=INTV base interval of thread in us default=1000 -a [CPUSET] --affinity Run thread #N on processor #N, if possible, or if CPUSET given, pin threads to that set of processors in round- robin order. -t --threads one thread per available processor -m --mlockall lock current and future memory allocations -d DIST --distance=DIST distance of thread intervals in us, default=500
c yclictest and tracing Can break break when a latency is greater than a given threshold
cyclictest and tracing Can break break when a latency is greater than a given threshold Will write into the kernel tracing buffer
cyclictest and tracing Can break break when a latency is greater than a given threshold Will write into the kernel tracing buffer -b USEC --breaktrace=USEC send “breaktrace” command when latency > USEC --tracemark write a “tracemark” when -b latency is exceeded
l ibtracecmd to read the trace.dat file Automate the latency process
libtracecmd to read the trace.dat file Automate the latency process First search from the end of the trace Find the marker or flag that tells you where the problem happened
libtracecmd to read the trace.dat file Automate the latency process First search from the end of the trace Find the marker or flag that tells you where the problem happened Continue backwards looking for other events Search forward to collect timings
int main (int argc, char **argv) { struct tracecmd_input *handle ; struct tep_handle *tep; struct data data; int ret; handle = tracecmd_open(argv[1], 0) ; tep = tracecmd_get_tep(handle); init_data(tep, &data); tracecmd_iterate_events_reverse(handle, NULL, 0, find_trace_marker, &data, false); printf("cpu=%d threshold=%d latency=%d\n\n", data.cpu, data.thresh, data.lat); /* Now we know what CPU it is on, look at just this CPU */ cpus = tep_get_cpus(tep); cpu_size = CPU_ALLOC_SIZE(cpus); cpu_set = CPU_ALLOC(cpus); CPU_ZERO_S(cpu_size, cpu_set); CPU_SET_S(data.cpu, cpu_size, cpu_set); /* Find where cyclictest was scheduled in */ tracecmd_iterate_events_reverse(handle, cpu_set, cpu_size, find_sched, &data, true); /* Find where cyclictest timer went off */ tracecmd_iterate_events_reverse(handle, cpu_set, cpu_size, find_timer_expire, &data, true);
int main (int argc, char **argv) { struct tracecmd_input *handle; struct tep_handle *tep; struct data data ; int ret; handle = tracecmd_open(argv[1], 0); tep = tracecmd_get_tep(handle); init_data (tep, & data ); tracecmd_iterate_events_reverse(handle, NULL, 0, find_trace_marker, &data, false); printf("cpu=%d threshold=%d latency=%d\n\n", data.cpu, data.thresh, data.lat); /* Now we know what CPU it is on, look at just this CPU */ cpus = tep_get_cpus(tep); cpu_size = CPU_ALLOC_SIZE(cpus); cpu_set = CPU_ALLOC(cpus); CPU_ZERO_S(cpu_size, cpu_set); CPU_SET_S(data.cpu, cpu_size, cpu_set); /* Find where cyclictest was scheduled in */ tracecmd_iterate_events_reverse(handle, cpu_set, cpu_size, find_sched, &data, true); /* Find where cyclictest timer went off */ tracecmd_iterate_events_reverse(handle, cpu_set, cpu_size, find_timer_expire, &data, true);
int main (int argc, char **argv) { struct tracecmd_input *handle; struct tep_handle *tep; struct data data; int ret; handle = tracecmd_open(argv[1], 0); tep = tracecmd_get_tep(handle); init_data(tep, &data); tracecmd_iterate_events_reverse ( handle , NULL, 0, find_trace_marker , & data , false ); printf(" cpu=%d threshold=%d latency=%d\n\n ", data . cpu , data . thresh , data . lat ); /* Now we know what CPU it is on, look at just this CPU */ cpus = tep_get_cpus(tep); cpu_size = CPU_ALLOC_SIZE(cpus); cpu_set = CPU_ALLOC(cpus); CPU_ZERO_S(cpu_size, cpu_set); CPU_SET_S(data.cpu, cpu_size, cpu_set); /* Find where cyclictest was scheduled in */ tracecmd_iterate_events_reverse(handle, cpu_set, cpu_size, find_sched, &data, true); /* Find where cyclictest timer went off */ tracecmd_iterate_events_reverse(handle, cpu_set, cpu_size, find_timer_expire, &data, true);
static int find_trace_marker (struct tracecmd_input * handle , struct tep_record * record , int cpu, void * data ) { struct tep_handle * tep = tracecmd_get_tep ( handle ); struct tep_event * marker ; static struct trace_seq seq ; struct data * d = data ; const char * str ; int thresh ; int type ; int lat ; marker = d -> marker ; type = tep_data_type ( tep , record ); if ( type != marker -> id ) return 0; /* Make sure that the print has the data we want */ if (! seq . buffer ) trace_seq_init (& seq ); tep_print_event ( tep , & seq , record , " %s ", TEP_PRINT_INFO ); trace_seq_terminate (& seq );
static int find_trace_marker (struct tracecmd_input * handle , struct tep_record * record , int cpu, void * data ) { struct tep_handle * tep = tracecmd_get_tep ( handle ); struct tep_event * marker ; static struct trace_seq seq ; struct data * d = data ; const char * str ; int thresh ; int type ; int lat ; marker = d -> marker ; type = tep_data_type ( tep , record ); if ( type != marker -> id ) return 0; /* Make sure that the print has the data we want */ if (! seq . buffer ) trace_seq_init (& seq ); tep_print_event ( tep , & seq , record , " %s ", TEP_PRINT_INFO ); trace_seq_terminate (& seq ); Skip non-marker events
str = strstr( seq . buffer , " hit latency threshold "); if (! str ) { /* This is not the string you are looking for */ trace_seq_reset (& seq ); return 0; } sscanf( str , " hit latency threshold (%d > %d) ", & lat , & thresh ); d -> cpu = record -> cpu ; d -> lat = lat ; d -> thresh = thresh ; d -> pid = tep_data_pid ( tep , record ); d -> marker_time = record -> ts ; trace_seq_destroy (& seq ); seq . buffer = NULL; /* Stop the iterator, as we will now only search the current CPU */ return 1; } Parse the tracemarker
str = strstr( seq . buffer , " hit latency threshold "); if (! str ) { /* This is not the string you are looking for */ trace_seq_reset (& seq ); return 0; } sscanf( str , " hit latency threshold (%d > %d) ", & lat , & thresh ); d -> cpu = record -> cpu ; d -> lat = lat ; d -> thresh = thresh ; d -> pid = tep_data_pid ( tep , record ); d -> marker_time = record -> ts ; trace_seq_destroy (& seq ); seq . buffer = NULL; /* Stop the iterator, as we will now only search the current CPU */ return 1; } Record which CPU
int main (int argc, char **argv) { struct tracecmd_input *handle; struct tep_handle *tep; struct data data; int ret; handle = tracecmd_open(argv[1], 0); tep = tracecmd_get_tep(handle); init_data(tep, &data); tracecmd_iterate_events_reverse(handle, NULL, 0, find_trace_marker, &data, false); printf("cpu=%d threshold=%d latency=%d\n\n", data.cpu, data.thresh, data.lat); /* Now we know what CPU it is on, look at just this CPU */ cpus = tep_get_cpus ( tep ); cpu_size = CPU_ALLOC_SIZE( cpus ); cpu_set = CPU_ALLOC( cpus ); CPU_ZERO_S( cpu_size , cpu_set ); CPU_SET_S( data . cpu , cpu_size , cpu_set ); /* Find where cyclictest was scheduled in */ tracecmd_iterate_events_reverse(handle, cpu_set, cpu_size, find_sched, &data, true); /* Find where cyclictest timer went off */ tracecmd_iterate_events_reverse(handle, cpu_set, cpu_size, find_timer_expire, &data, true);
int main (int argc, char **argv) { struct tracecmd_input *handle; struct tep_handle *tep; struct data data; int ret; handle = tracecmd_open(argv[1], 0); tep = tracecmd_get_tep(handle); init_data(tep, &data); tracecmd_iterate_events_reverse(handle, NULL, 0, find_trace_marker, &data, false); printf("cpu=%d threshold=%d latency=%d\n\n", data.cpu, data.thresh, data.lat); /* Now we know what CPU it is on, look at just this CPU */ cpus = tep_get_cpus ( tep ); cpu_size = CPU_ALLOC_SIZE( cpus ); cpu_set = CPU_ALLOC( cpus ); CPU_ZERO_S( cpu_size , cpu_set ); CPU_SET_S( data . cpu , cpu_size , cpu_set ); /* Find where cyclictest was scheduled in */ tracecmd_iterate_events_reverse(handle, cpu_set, cpu_size, find_sched, &data, true); /* Find where cyclictest timer went off */ tracecmd_iterate_events_reverse(handle, cpu_set, cpu_size, find_timer_expire, &data, true); Only follow 1 CPU
int main (int argc, char **argv) { struct tracecmd_input *handle; struct tep_handle *tep; struct data data; int ret; handle = tracecmd_open(argv[1], 0); tep = tracecmd_get_tep(handle); init_data(tep, &data); tracecmd_iterate_events_reverse(handle, NULL, 0, find_trace_marker, &data, false); printf("cpu=%d threshold=%d latency=%d\n\n", data.cpu, data.thresh, data.lat); /* Now we know what CPU it is on, look at just this CPU */ cpus = tep_get_cpus(tep); cpu_size = CPU_ALLOC_SIZE(cpus); cpu_set = CPU_ALLOC(cpus); CPU_ZERO_S(cpu_size, cpu_set); CPU_SET_S(data.cpu, cpu_size, cpu_set); /* Find where cyclictest was scheduled in */ tracecmd_iterate_events_reverse ( handle , cpu_set , cpu_size , find_sched , & data , true ); /* Find where cyclictest timer went off */ tracecmd_iterate_events_reverse(handle, cpu_set, cpu_size, find_timer_expire, &data, true); Look for sched_switch
static int find_sched (struct tracecmd_input * handle , struct tep_record * record , int cpu, void * data ) { struct tep_handle * tep = tracecmd_get_tep ( handle ); struct tep_event * sched ; unsigned long long val ; struct data *d = data ; int type ; sched = d -> sched ; type = tep_data_type ( tep , record ); if ( type != sched -> id ) return 0; tep_read_number_field ( d -> next_pid , record -> data , & val ); d -> sched_time = record -> ts ; return -1; }
static int find_sched (struct tracecmd_input * handle , struct tep_record * record , int cpu, void * data ) { struct tep_handle * tep = tracecmd_get_tep ( handle ); struct tep_event * sched ; unsigned long long val ; struct data *d = data ; int type ; sched = d -> sched ; type = tep_data_type ( tep , record ); if ( type != sched -> id ) return 0; tep_read_number_field ( d -> next_pid , record -> data , & val ); d -> sched_time = record -> ts ; return -1; } Exit if not sched_switch
static int find_sched (struct tracecmd_input * handle , struct tep_record * record , int cpu, void * data ) { struct tep_handle * tep = tracecmd_get_tep ( handle ); struct tep_event * sched ; unsigned long long val ; struct data *d = data ; int type ; sched = d -> sched ; type = tep_data_type ( tep , record ); if ( type != sched -> id ) return 0; tep_read_number_field ( d -> next_pid , record -> data , & val ); d -> sched_time = record -> ts ; return -1; } Record sched_switch time
int main (int argc, char **argv) { struct tracecmd_input *handle; struct tep_handle *tep; struct data data; int ret; handle = tracecmd_open(argv[1], 0); tep = tracecmd_get_tep(handle); init_data(tep, &data); tracecmd_iterate_events_reverse(handle, NULL, 0, find_trace_marker, &data, false); printf("cpu=%d threshold=%d latency=%d\n\n", data.cpu, data.thresh, data.lat); /* Now we know what CPU it is on, look at just this CPU */ cpus = tep_get_cpus(tep); cpu_size = CPU_ALLOC_SIZE(cpus); cpu_set = CPU_ALLOC(cpus); CPU_ZERO_S(cpu_size, cpu_set); CPU_SET_S(data.cpu, cpu_size, cpu_set); /* Find where cyclictest was scheduled in */ tracecmd_iterate_events_reverse(handle, cpu_set, cpu_size, find_sched, &data, true); /* Find where cyclictest was woken up */ tracecmd_iterate_events_reverse ( handle , cpu_set , cpu_size , find_wakeup , & data , true ); Look for sched_waking
static int find_wakeup (struct tracecmd_input * handle , struct tep_record * record , int cpu, void * data ) { struct tep_handle * tep = tracecmd_get_tep ( handle ); struct tep_format_field * waking_pid ; struct tep_event * waking ; unsigned long long val ; struct data * d = data ; int flags ; int type ; waking = d -> waking ; waking_pid = d -> waking_pid ; type = tep_data_type ( tep , record ); if ( type != waking -> id ) return 0; tep_read_number_field ( waking_pid , record -> data , & val ); if ( val != d -> pid ) return 0; /* Found the wakeup! */ d -> wakeup_time = record -> ts ; return -1; } Record wake up time
int main (int argc, char **argv) { [..] /* Find where cyclictest timer went off */ tracecmd_iterate_events_reverse ( handle , cpu_set , cpu_size , find_timer_expire , & data , true ); /* Find where the timer was added (the start of this cycle */ tracecmd_iterate_events_reverse(handle, cpu_set, cpu_size, find_timer_start, &data, true); printf("expected expire: %lld\n", data.timer_expires_expect); printf("timer expired: %lld\n", data.timer_expire); printf(" jitter: (%lld)\n", data.timer_expire - data.timer_expires_expect); printf("wake up time: %lld\n", data.wakeup_time); printf(" from timer: (%lld)\n", data.wakeup_time - data.timer_expire); printf("scehduled time: %lld\n", data.sched_time); printf(" from wakeup: (%lld)\n", data.sched_time - data.wakeup_time); printf("marker time: %lld\n", data.marker_time); printf(" from schedule:(%lld)\n", data.marker_time - data.sched_time); tracecmd_close(handle); return 0; } Look for timer expire event
static int find_timer_expire (struct tracecmd_input * handle , struct tep_record * record , int cpu, void *data) { struct tep_handle * tep = tracecmd_get_tep ( handle ); struct tep_event * timer ; unsigned long long val ; struct data * d = data ; int type ; timer = d -> hrtimer_expire ; type = tep_data_type ( tep , record ); if ( type != timer -> id ) return 0; tep_read_number_field ( d -> hrtimer_expire_ptr , record -> data , & val ); d -> hrtimer = val ; tep_read_number_field ( d -> timer_now , record -> data , & val ); d -> timer_expire = record -> ts ; d -> timer_delta = val - record -> ts ; return -1; }
static int find_timer_expire (struct tracecmd_input * handle , struct tep_record * record , int cpu, void *data) { struct tep_handle * tep = tracecmd_get_tep ( handle ); struct tep_event * timer ; unsigned long long val ; struct data * d = data ; int type ; timer = d -> hrtimer_expire ; type = tep_data_type ( tep , record ); if ( type != timer -> id ) return 0; tep_read_number_field ( d -> hrtimer_expire_ptr , record -> data , & val ); d -> hrtimer = val ; tep_read_number_field ( d -> timer_now , record -> data , & val ); d -> timer_expire = record -> ts ; d -> timer_delta = val - record -> ts ; return -1; } Calculate timer event time to ring buffer time
int main (int argc, char **argv) { [..] /* Find where cyclictest was woken up */ tracecmd_iterate_events_reverse(handle, cpu_set, cpu_size, find_wakeup, &data, true); /* Find where the timer was added (the start of this cycle */ tracecmd_iterate_events_reverse ( handle , cpu_set , cpu_size , find_timer_start , & data , true ); printf("expected expire: %lld\n", data.timer_expires_expect); printf("timer expired: %lld\n", data.timer_expire); printf(" jitter: (%lld)\n", data.timer_expire - data.timer_expires_expect); printf("wake up time: %lld\n", data.wakeup_time); printf(" from timer: (%lld)\n", data.wakeup_time - data.timer_expire); printf("scehduled time: %lld\n", data.sched_time); printf(" from wakeup: (%lld)\n", data.sched_time - data.wakeup_time); printf("marker time: %lld\n", data.marker_time); printf(" from schedule:(%lld)\n", data.marker_time - data.sched_time); tracecmd_close(handle); return 0; } Look for timer start event
static int find_timer_start (struct tracecmd_input * handle , struct tep_record * record , int cpu, void * data ) { struct tep_handle * tep = tracecmd_get_tep ( handle ); struct tep_event * timer ; unsigned long long val ; struct data * d = data ; int type ; timer = d -> hrtimer_start ; type = tep_data_type ( tep , record ); if ( type != timer -> id ) return 0; tep_read_number_field ( d -> hrtimer_start_ptr , record -> data , & val ); if ( val != d -> hrtimer ) return 0; tep_read_number_field ( d -> hrtimer_start_expires , record -> data , & val ); d -> timer_expires_expect = val - d -> timer_delta ; d -> timer_start = record -> ts ; return -1; }
static int find_timer_start (struct tracecmd_input * handle , struct tep_record * record , int cpu, void * data ) { struct tep_handle * tep = tracecmd_get_tep ( handle ); struct tep_event * timer ; unsigned long long val ; struct data * d = data ; int type ; timer = d -> hrtimer_start ; type = tep_data_type ( tep , record ); if ( type != timer -> id ) return 0; tep_read_number_field ( d -> hrtimer_start_ptr , record -> data , & val ); if ( val != d -> hrtimer ) return 0; tep_read_number_field ( d -> hrtimer_start_expires , record -> data , & val ); d -> timer_expires_expect = val - d -> timer_delta ; d -> timer_start = record -> ts ; return -1; } Calculate expected time vs actual time
int main (int argc, char **argv) { [..] /* Find where cyclictest was woken up */ tracecmd_iterate_events_reverse(handle, cpu_set, cpu_size, find_wakeup, &data, true); /* Find where the timer was added (the start of this cycle */ tracecmd_iterate_events_reverse(handle, cpu_set, cpu_size, find_timer_start, &data, true); printf(" expected expire: %lld\n ", data . timer_expires_expect ); printf(" timer expired: %lld\n ", data . timer_expire ); printf(" jitter: (%lld)\n ", data . timer_expire - data . timer_expires_expect ); printf(" wake up time: %lld\n ", data . wakeup_time ); printf(" from timer: (%lld)\n ", data . wakeup_time - data . timer_expire ); printf(" scehduled time: %lld\n ", data . sched_time ); printf(" from wakeup: (%lld)\n ", data . sched_time - data . wakeup_time ); printf(" marker time: %lld\n ", data . marker_time ); printf(" from schedule:(%lld)\n ", data . marker_time - data . sched_time ); tracecmd_close(handle); return 0; }
Now analyze the trace.dat file ~# trace-cyclic trace-cyclic.dat cpu=1 threshold=100 latency=121 expected expire: 818035615415764 timer expired: 818035615416617 jitter: (853) wake up time: 818035615416993 from timer: (376) scehduled time: 818035615535531 from wakeup: (118538) marker time: 818035615546097 from schedule:(10566) Time in nanoseconds
Now analyze the trace.dat file ~# trace-cyclic trace-cyclic.dat cpu=1 threshold=100 latency=121 expected expire: 818035615415764 timer expired: 818035615416617 jitter: (853) wake up time: 818035615416993 from timer: (376) scehduled time: 818035615535531 from wakeup: ( 118538 ) marker time: 818035615546097 from schedule:(10566) Problem here!
Can analyze any events you want This is just a small example of what you can do
Can analyze any events you want This is just a small example of what you can do Let’s look at locks
Can analyze any events you want This is just a small example of what you can do Let’s look at locks Using function tracing, trace all locking functions function tracing gives you the parent function too Can see where a lock was called
Can analyze any events you want This is just a small example of what you can do Let’s look at locks Using function tracing, trace all locking functions function tracing gives you the parent function too Can see where a lock was called Trace both the lock and unlock functions Can get the latency of how long they are held
int main (int argc, char **argv) { [..] /* Find where cyclictest was woken up */ tracecmd_iterate_events_reverse(handle, cpu_set, cpu_size, find_wakeup, &data, true); /* Find where the timer was added (the start of this cycle */ tracecmd_iterate_events_reverse(handle, cpu_set, cpu_size, find_timer_start, &data, true); /* Now that we know all the times, look for the locks (going forward now) */ tracecmd_iterate_events ( handle , cpu_set , cpu_size , find_locks , & data ); printf("expected expire: %lld\n", data.timer_expires_expect); printf("timer expired: %lld\n", data.timer_expire); printf(" jitter: (%lld)\n", data.timer_expire - data.timer_expires_expect); printf("wake up time: %lld\n", data.wakeup_time); printf(" from timer: (%lld)\n", data.wakeup_time - data.timer_expire); printf("scehduled time: %lld\n", data.sched_time); printf(" from wakeup: (%lld)\n", data.sched_time - data.wakeup_time); printf("marker time: %lld\n", data.marker_time); printf(" from schedule:(%lld)\n", data.marker_time - data.sched_time);
int main (int argc, char **argv) { [..] /* Find where cyclictest was woken up */ tracecmd_iterate_events_reverse(handle, cpu_set, cpu_size, find_wakeup, &data, true); /* Find where the timer was added (the start of this cycle */ tracecmd_iterate_events_reverse(handle, cpu_set, cpu_size, find_timer_start, &data, true); /* Now that we know all the times, look for the locks (going forward now) */ tracecmd_iterate_events ( handle , cpu_set , cpu_size , find_locks , & data ); printf("expected expire: %lld\n", data.timer_expires_expect); printf("timer expired: %lld\n", data.timer_expire); printf(" jitter: (%lld)\n", data.timer_expire - data.timer_expires_expect); printf("wake up time: %lld\n", data.wakeup_time); printf(" from timer: (%lld)\n", data.wakeup_time - data.timer_expire); printf("scehduled time: %lld\n", data.sched_time); printf(" from wakeup: (%lld)\n", data.sched_time - data.wakeup_time); printf("marker time: %lld\n", data.marker_time); printf(" from schedule:(%lld)\n", data.marker_time - data.sched_time); Go forward and find all locks
static int find_locks (struct tracecmd_input * handle , struct tep_record * record , int cpu, void * data ) { struct tep_handle * tep = tracecmd_get_tep ( handle ); struct data * d = data ; enum data_state state ; if ( record -> ts == d -> marker_time ) return -1; if ( record -> ts < d -> timer_expires_expect ) state = STATE_START ; else if ( record -> ts < d - > timer_expire ) state = STATE_EXPECT ; else if ( record -> ts < d -> wakeup_time ) state = STATE_EXPIRED ; else if ( record -> ts < d -> sched_time ) state = STATE_WAKEUP ; else state = STATE_SCHED ; handle_locks ( tep , record , data , state ); return 0; } Use saved times to know where the lock happened
static int handle_locks (struct tep_handle * tep , struct tep_record * record , struct data *data, enum data_state state ) { struct lock_stack * lock ; struct lock_list * llist ; unsigned long long pip , ip ; const char * func ; bool start = true ; int type ; int id ; switch ( state ) { case STATE_START : llist = & data -> locks_start ; break; case STATE_EXPECT : llist = & data -> locks_start_expire ; break; case STATE_EXPIRED : llist = & data -> locks_timer ; break; case STATE_WAKEUP : llist = & data -> locks_wakeup ; break; case STATE_SCHED : llist = & data -> locks_sched ; break; } type = tep_data_type ( tep , record ); if (! data -> function || data -> function -> id != type ) return 0; Save lock information with associated state
tep_read_number_field ( data -> function_ip , record -> data , & ip ); tep_read_number_field ( data -> function_pip , record -> data , & pip ); func = tep_find_function ( tep , ip ); id = find_start_lock ( func ); if ( id < 0) { id = find_stop_lock ( func ); if ( id < 0) return 1; start = false; } if ( start ) { push_lock ( data , id , pip , record -> ts , func ); return 1; } /* Pop the locks until we find our id */ lock = NULL; do { if ( lock ) free( lock ); lock = pop_stack ( data ); } while ( lock && lock -> id != id ); if (! lock ) return 1; /* Add the total time of this lock */ end_timer (& llist -> list , pip , record -> ts , lock -> name ); start_timer (& llist -> list , pip , lock -> time , NULL); free( lock ); return 1; } Calculate how long lock was held
static const char * locks [] = { " spin_ ", " raw_spin_ ", " _raw_spin_ ", " __raw_spin_ ", " read_ ", " raw_read_ ", " _raw_read_ ", " __raw_read_ ", " write_ ", " raw_write_ ", " _raw_write_ ", " __raw_write_ ", }; static int find_start_lock (const char * func ) { const char * lock ; int len ; int i; for (i = 0; i < ARRAY_SIZE( locks ); i++) { lock = locks [i]; len = strlen( lock ); if (strncmp( lock , func , len ) != 0) continue; if (strncmp( func + len , " lock ", 4) == 0 || strncmp( func + len , " try ", 3) == 0) return i; break; } return -1; } All known lock types
static const char * locks [] = { " spin_ ", " raw_spin_ ", " _raw_spin_ ", " __raw_spin_ ", " read_ ", " raw_read_ ", " _raw_read_ ", " __raw_read_ ", " write_ ", " raw_write_ ", " _raw_write_ ", " __raw_write_ ", }; static int find_start_lock (const char * func ) { const char * lock ; int len ; int i; for (i = 0; i < ARRAY_SIZE( locks ); i++) { lock = locks [i]; len = strlen( lock ); if (strncmp( lock , func , len ) != 0) continue; if (strncmp( func + len , " lock ", 4) == 0 || strncmp( func + len , " try ", 3) == 0) return i; break; } return -1; } See if it is a lock (not unlock)
int main (int argc, char **argv) { [..] tracecmd_iterate_events(handle, cpu_set, cpu_size, find_locks, &data); printf("expected expire: %lld\n", data.timer_expires_expect); printf("timer expired: %lld\n", data.timer_expire); printf(" jitter: (%lld)\n", data.timer_expire - data.timer_expires_expect); printf("wake up time: %lld\n", data.wakeup_time); printf(" from timer: (%lld)\n", data.wakeup_time - data.timer_expire); printf("scehduled time: %lld\n", data.sched_time); printf(" from wakeup: (%lld)\n", data.sched_time - data.wakeup_time); printf("marker time: %lld\n", data.marker_time); printf(" from schedule:(%lld)\n", data.marker_time - data.sched_time); print_lock_list ( tep , " Locks taken and released from start to expected time: ", & data . locks_start ); print_lock_list ( tep , " Locks held between expected time and timer :", & data . locks_start_expire ); print_lock_list ( tep , " Locks taken from timer to wakeup :", & data . locks_timer ); print_lock_list ( tep , " Locks taken from wake up to sched: ", & data . locks_wakeup ); print_lock_list ( tep , " Locks taken from sched to print: ", & data . locks_sched );
static void print_lock_list (struct tep_handle * tep , const char * msg , struct lock_list * llist ) { struct time_list * list = llist -> list ; bool printed = false; for (; list ; list = list -> next ) { if (! list -> total ) continue; if (! printed ) { printf(" \n%s\n ", msg ); printed = true; } print_lock( tep , list -> id ); printf(" total time: %lld (%d time%s) [%s]\n ", list -> total , list -> cnt , list -> cnt != 1 ? "s" : "", list -> name ); } }
Analyze the locks in the trace ~ # trace-cyclic trace-cyclic.dat cpu=4 threshold=100 latency=102 expected expire: 850947083216603 timer expired: 850947083217641 jitter: (1038) wake up time: 850947083217999 from timer: (358) scehduled time: 850947083316970 from wakeup: (98971) marker time: 850947083328710 from schedule:(11740) Locks taken and released from start to expected time: do_nanosleep+0x5f total time: 610 (1 time) [_raw_spin_lock_irqsave] dequeue_task_rt+0x28 total time: 180 (1 time) [_raw_spin_lock] finish_task_switch.isra.0+0x9b total time: 1450 (2 times) [_raw_spin_lock] get_next_timer_interrupt+0x7b total time: 155 (7 times) [_raw_spin_lock] hrtimer_get_next_event+0x47 total time: 140 (7 times) [_raw_spin_lock_irqsave] hrtimer_next_event_without+0x67 total time: 161 (7 times) [_raw_spin_lock_irqsave] sched_ttwu_pending+0xed total time: 1522 (1 time) [_raw_spin_lock] poll_freewait+0x3d total time: 182 (7 times) [_raw_spin_lock_irqsave] [..]
Analyze the locks in the trace ~ # trace-cyclic trace-cyclic.dat cpu=4 threshold=100 latency=102 expected expire: 850947083216603 timer expired: 850947083217641 jitter: (1038) wake up time: 850947083217999 from timer: (358) scehduled time: 850947083316970 from wakeup: ( 98971 ) marker time: 850947083328710 from schedule:(11740) Locks taken and released from start to expected time: do_nanosleep+0x5f total time: 610 (1 time) [_raw_spin_lock_irqsave] dequeue_task_rt+0x28 total time: 180 (1 time) [_raw_spin_lock] finish_task_switch.isra.0+0x9b total time: 1450 (2 times) [_raw_spin_lock] get_next_timer_interrupt+0x7b total time: 155 (7 times) [_raw_spin_lock] hrtimer_get_next_event+0x47 total time: 140 (7 times) [_raw_spin_lock_irqsave] hrtimer_next_event_without+0x67 total time: 161 (7 times) [_raw_spin_lock_irqsave] sched_ttwu_pending+0xed total time: 1522 (1 time) [_raw_spin_lock] poll_freewait+0x3d total time: 182 (7 times) [_raw_spin_lock_irqsave] [..] Biggest difference
Analyze the locks in the trace sigprocmask+0x85 total time: 199 (4 times) [_raw_spin_lock_irq] n_tty_read+0x217 total time: 198 (2 times) [_raw_spin_lock_irqsave] n_tty_read+0x5d9 total time: 233 (2 times) [_raw_spin_lock_irqsave] try_to_wake_up+0x77 total time: 183 (1 time) [_raw_spin_lock_irqsave] __wake_up_common_lock+0x7e total time: 415 (2 times) [_raw_spin_lock_irqsave] tcp_poll+0x2b total time: 178 (1 time) [_raw_spin_lock_irqsave] tcp_sendmsg+0x19 total time: 231 (1 time) [_raw_spin_lock_bh] nf_conntrack_tcp_packet+0x8e4 total time: 447 (1 time) [_raw_spin_lock_bh] sch_direct_xmit+0x43 total time: 168 (1 time) [_raw_spin_lock] Locks held between expected time and timer: __hrtimer_run_queues+0x120 total time: 456 (1 time) [_raw_spin_lock_irqsave] Locks taken from wake up to sched: enqueue_task_rt+0x1e8 total time: 169 (1 time) [_raw_spin_lock] try_to_wake_up+0x246 total time: 1674 (1 time) [_raw_spin_lock] try_to_wake_up+0x251 total time: 2346 (1 time) [_raw_spin_lock_irqsave] hrtimer_interrupt+0x11d total time: 222 (1 time) [_raw_spin_lock_irq] sch_direct_xmit+0x13a total time: 168948 (1 time) [_raw_spin_lock] __dev_queue_xmit+0x7fe total time: 279 (1 time) [_raw_spin_lock] rcu_note_context_switch+0x386 total time: 237 (1 time) [_raw_spin_lock] Locks taken from sched to print: finish_task_switch.isra.0+0x9b total time: 2056 (1 time) [_raw_spin_lock]
Analyze the locks in the trace sigprocmask+0x85 total time: 199 (4 times) [_raw_spin_lock_irq] n_tty_read+0x217 total time: 198 (2 times) [_raw_spin_lock_irqsave] n_tty_read+0x5d9 total time: 233 (2 times) [_raw_spin_lock_irqsave] try_to_wake_up+0x77 total time: 183 (1 time) [_raw_spin_lock_irqsave] __wake_up_common_lock+0x7e total time: 415 (2 times) [_raw_spin_lock_irqsave] tcp_poll+0x2b total time: 178 (1 time) [_raw_spin_lock_irqsave] tcp_sendmsg+0x19 total time: 231 (1 time) [_raw_spin_lock_bh] nf_conntrack_tcp_packet+0x8e4 total time: 447 (1 time) [_raw_spin_lock_bh] sch_direct_xmit+0x43 total time: 168 (1 time) [_raw_spin_lock] Locks held between expected time and timer: __hrtimer_run_queues+0x120 total time: 456 (1 time) [_raw_spin_lock_irqsave] Locks taken from wake up to sched: enqueue_task_rt+0x1e8 total time: 169 (1 time) [_raw_spin_lock] try_to_wake_up+0x246 total time: 1674 (1 time) [_raw_spin_lock] try_to_wake_up+0x251 total time: 2346 (1 time) [_raw_spin_lock_irqsave] hrtimer_interrupt+0x11d total time: 222 (1 time) [_raw_spin_lock_irq] sch_direct_xmit+0x13a total time: 168948 (1 time) [_raw_spin_lock] __dev_queue_xmit+0x7fe total time: 279 (1 time) [_raw_spin_lock] rcu_note_context_switch+0x386 total time: 237 (1 time) [_raw_spin_lock] Locks taken from sched to print: finish_task_switch.isra.0+0x9b total time: 2056 (1 time) [_raw_spin_lock] Long lock held!
Questions? Code is here: https://www.trace-cmd.org https://rostedt.org/code/trace-cyclic.c