Monitoring the Execution of 14K Tests: Methods Tend to Have One Path that Is Significantly More Executed (FSE 2024)

andrehoraa 56 views 26 slides Jul 18, 2024

About This Presentation

The literature has provided evidence that developers are likely to test some behaviors of the program and avoid other ones. Despite this observation, we still lack empirical evidence from real-world systems. In this paper, we propose to automatically identify the tested paths of a method as a way to...


Slide Content

Monitoring the Execution of 14K Tests:
Methods Tend to Have One Path That Is
Significantly More Executed
Andre Hora
DCC/UFMG
[email protected]
1
FSE 2024
Ideas, Visions and Reflections

Motivation & Problem
Having a good test suite is fundamental to ensuring software quality and
sustainable software evolution
Developers should focus on testing both the expected and unexpected behaviors
of the program to catch more bugs and protect against regressions
● Expected behavior: the normal execution, simpler to test
● Unexpected behavior: the abnormal execution, harder to test
In practice, it is well known that developers are more
likely to test expected behaviors than unexpected ones

Motivation & Problem
However, existing research is mostly restricted to controlled experiments, like case
studies with students and developers
- Students are likely to (naively) test the “happy cases” [7]
- Expert developers may test the “sad cases” [25]

We still lack empirical evidence extracted from
real-world software systems and their test suites
4

5
Email Python Standard Library

6
Email Python Standard Library
Three possible behaviors at runtime:
1. Entering both the for and if blocks
2. Entering the for block but not the if block
3. Not entering the for block

At this point, it is unclear what
behaviors are the most and least
frequently tested by developers
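
The three behaviors can be reproduced with a small stand-in. The function below is hypothetical and is not the code from the email module shown on the slide; it only shares the same shape (one for block containing one if block). The `entered` set and the `blocks_entered` helper are illustrative names added here to make the path visible.

```python
def first_negative(values, entered):
    """Hypothetical method with one for block and one if block."""
    for v in values:
        entered.add("for")       # the for block was entered
        if v < 0:
            entered.add("if")    # the if block was entered
            return v
    return None


def blocks_entered(values):
    """Return the set of blocks that a single call enters."""
    entered = set()
    first_negative(values, entered)
    return frozenset(entered)


# One input per behavior listed on the slide.
path_1 = blocks_entered([3, -1])   # enters both the for and if blocks
path_2 = blocks_entered([3, 4])    # enters the for block but not the if block
path_3 = blocks_entered([])        # does not enter the for block
```

Different inputs can land on the same path: `[-1]` and `[3, -1]` both enter the for and if blocks, so they would count as two calls to Path 1.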


Can you guess?

8

9
Interesting: the large
discrepancy between the
execution frequency of
different paths
Path 1 concentrates most
of the calls (70.9%)

Path 3 receives only 4.4%

Open Question
Are tested paths of real software likely to concentrate calls or do
calls tend to be more distributed among the tested paths?

Provide insights for developers to improve existing test suites
Support the creation of novel testing tools to better understand test suites
Reveal novel empirical data for researchers to quantify the difference between the
execution frequency of distinct paths in real-world software
10

Proposed Work
We propose an empirical study to assess the tested paths quantitatively
We monitor the execution of 14K tests from 25 real-world Python systems,
assessing 11K tested paths from 2,357 methods
11

Study Design
12

Study Design
1. Detecting the tested paths
2. Selecting software systems
3. Research questions
13

Study Design: Detecting the Tested Paths
1. Collecting executed lines of code
We execute an instrumented version of the
test suite that monitors the tests and collects
data from the execution trace
2. Detecting the tested paths
A tested path represents a set of input
values that make the method execute the
same lines of code
3. Ranking the tested paths
For each method with one or more tested
paths, we sort its paths in descending
order of path frequency
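
The three steps above can be sketched with Python's built-in `sys.settrace` hook. This is a minimal illustration under stated assumptions, not the instrumentation used in the study: `record_paths` and `count_negatives` are hypothetical names, each call is treated as one path identified by the set of line offsets it executes, and the inputs are made up.

```python
import sys
from collections import Counter


def record_paths(func, inputs):
    """Call func on each input and count calls per set of executed lines."""
    code = func.__code__
    paths = Counter()
    for args in inputs:
        lines = set()

        def tracer(frame, event, arg):
            # Step 1: collect executed lines, but only inside func itself.
            if frame.f_code is not code:
                return None
            if event == "line":
                lines.add(frame.f_lineno - code.co_firstlineno)
            return tracer

        previous = sys.gettrace()
        sys.settrace(tracer)
        try:
            func(*args)
        finally:
            sys.settrace(previous)
        # Step 2: calls that execute the same lines share one tested path.
        paths[frozenset(lines)] += 1
    return paths


def count_negatives(values):
    """Hypothetical method under test (one for block, one if block)."""
    total = 0
    for v in values:
        if v < 0:
            total += 1
    return total


# Made-up calls standing in for what a test suite would trigger.
calls = [([1, 2],), ([3],), ([4, 5],), ([-1],), ([],)]
paths = record_paths(count_negatives, calls)

# Step 3: rank the tested paths in descending order of frequency.
ranked = paths.most_common()
```

Using a `frozenset` of line offsets means loop iteration counts are ignored: `[1, 2]` and `[3]` execute the same lines and fall on the same path, which matches the definition given in step 2.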
14

Study Design: Selecting Software Systems
25 Python systems
2,357 methods
14,177 tests
11,425 tested paths
15

Study Design: Research Questions
RQ1: Frequency of the most tested paths (top 1 vs. top 2)
RQ2: Frequency of the least tested paths (top 1 vs. top 3+)
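
For a single method, both comparisons reduce to ratios over the ranked call counts. The sketch below assumes that "top 3+" means the combined calls of the third and lower-ranked paths; the slides do not spell this out, nor how the ratios are aggregated across methods. `call_ratios` and the counts are hypothetical.

```python
def call_ratios(counts):
    """Compare the most tested path with the top 2 and the top 3+ paths.

    counts: number of calls received by each tested path of one method.
    """
    ranked = sorted(counts, reverse=True)
    top1 = ranked[0]
    top2 = ranked[1] if len(ranked) > 1 else 0
    rest = sum(ranked[2:])  # assumption: top 3+ is the third path onward, combined
    return {
        "top1_vs_top2": top1 / top2 if top2 else None,
        "top1_vs_top3plus": top1 / rest if rest else None,
    }


# Made-up counts for a method with four tested paths.
ratios = call_ratios([80, 20, 5, 5])
```

A method with a single tested path yields `None` for both ratios, and a method with exactly two tested paths yields `None` for the top 3+ comparison.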
16

Results
17

RQ1: Frequency of the Most Tested Paths
18
Top 1 vs. Top 2

Finding 1: Overall, one tested path tends
to receive most of the calls. Top 1 receives
4x more calls than the Top 2.

Finding 2: In methods with two tested
paths, one path tends to receive close to 5x
more calls than the second one.

Finding 3: Even methods with four or more
tested paths have one path that receives
the majority of the calls.

RQ2: Frequency of the Least Tested Paths
22
Top 1 vs. Top 3+

Finding 4: The top 3+ tested paths receive a
minority of the calls, ranging from 4% to 24%.

Overall, the most tested path of a method has
6.5x more calls than the top 3+.

Summary
We presented an empirical study to assess the tested paths quantitatively
We monitored the execution of over 14K tests and 11K tested paths
Overall, we found that one tested path is prevalent and receives most of the calls,
while others are significantly less executed
Possible applications:
● Provide insights for developers to improve existing test suites
● Support the creation of novel testing tools
● Reveal novel empirical data for researchers
25
