Assessing the Threat of Untracked Changes in Software Evolution (ICSE 2018)
andrehoraa
About This Presentation
While refactoring is extensively performed by practitioners, many Mining Software Repositories (MSR) approaches do not detect nor keep track of refactorings when performing source code evolution analysis. In the best case, keeping track of refactorings could be unnecessary work; in the worst case, these untracked changes could significantly affect the performance of MSR approaches. Since the extent of the threat is unknown, the goal of this paper is to assess whether it is significant. Based on an extensive empirical study, we answer positively: we found that between 10 and 21% of changes at the method level in 15 large Java systems are untracked. This results in a large proportion (25%) of entities that may have their histories split by these changes, and a measurable effect on at least two MSR approaches. We conclude that handling untracked changes should be systematically considered by MSR studies.
Size: 4.61 MB
Language: en
Added: Jul 25, 2024
Slides: 46 pages
Slide Content
Assessing the Threat of Untracked
Changes in Software Evolution
André Hora, Danilo Silva,
Marco Tulio Valente, Romain Robbes
ICSE 2018
MSR researchers are aware of this “threat”, but they often do not assess it
“Our tool is unable to verify if an entity in revision n has been renamed in revision n+1” [48]
“The development history of a file can be lost in case of renaming operations, copy or file split” [3]
“It is possible to miss bug-introducing changes when a file changes its name since the approach does not track such name changes” [38]
“We detect renamed or moved units as units that are removed first and added later” [50]
[2, 5, 6, 7, 12, 22, 26, 27, 28, 29, 34, 36,
42, 45, 53, 54, 59, 61, 62, 66, 67, 68…]
What is the impact of refactoring on MSR studies?
Tracked and Untracked Changes
version 1:
public void foo() {
    obj.print();
}
version 2:
public void foo() {
    obj.println();
}
version 3:
public void bar() {
    obj.println();
}
tracked change (version 1 to version 2): preserves the entity name and modifies its source code
untracked change (version 2 to version 3): modifies the entity name, and may also modify its source code
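The two definitions can be sketched as a small classifier. This is only an illustration under stated assumptions: `ChangeKind`, `EntitySnapshot`, and `ChangeClassifier` are hypothetical names that appear neither in the paper nor in RefDiff, the `UNCHANGED` case is added here for completeness, and the caller is assumed to already know that the two snapshots are the same logical entity.

```java
// Hypothetical names throughout; none of them come from the paper or a tool.
enum ChangeKind { UNCHANGED, TRACKED, UNTRACKED }

// One entity (method or class) as it appears in a single version.
// `name` is assumed to be the qualified name, so a move also changes it.
record EntitySnapshot(String name, String body) {}

final class ChangeClassifier {
    // Assumes `before` and `after` were already matched as the same logical
    // entity, for example by a refactoring detector.
    static ChangeKind classify(EntitySnapshot before, EntitySnapshot after) {
        if (!before.name().equals(after.name())) {
            return ChangeKind.UNTRACKED; // name changed; body may have changed too
        }
        if (!before.body().equals(after.body())) {
            return ChangeKind.TRACKED;   // same name, different source code
        }
        return ChangeKind.UNCHANGED;     // assumption: not a case named on the slide
    }

    public static void main(String[] args) {
        EntitySnapshot v1 = new EntitySnapshot("foo", "obj.print();");
        EntitySnapshot v2 = new EntitySnapshot("foo", "obj.println();");
        EntitySnapshot v3 = new EntitySnapshot("bar", "obj.println();");
        System.out.println(classify(v1, v2)); // TRACKED
        System.out.println(classify(v2, v3)); // UNTRACKED
    }
}
```

A name-based history lookup follows `foo` from version 1 to version 2 but loses it at version 3, which is the situation the slide calls an untracked change.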
Change Graph
version 1:
class Foo {
    mA() {…}
}
class Bar {
    mB() {…}
    mC() {…}
}
version 2:
class Foo {
    mA() {…}
}
class Bar {
    mX() {…}
    mC() {…}
}
version 3:
class Foo {
    mA() {…}
}
class Bar {
    mX() {…}
}
class Qux {
    mC() {…}
}
version 4:
class Foo {
    mA() {…}
}
class Baz {
    mY() {…}
}
class Qux {
    mC() {…}
    mE() {…}
}
Legend: each edge between consecutive versions is either a tracked change or an untracked change
Research Questions
•RQ1. What is the frequency of untracked changes?
•RQ2. What is the extent of untracked changes?
•RQ3. What is the impact of untracked changes on existing MSR-based approaches?
Case Studies
Tracked and Untracked Changes Computation
Refactoring resolution
•RefDiff [Silva et al., MSR 2017]
•Precision: 85.6% - 100%
•Recall: 89.8% - 93.9%
Refactoring types:
1. Rename Class
2. Move Class
3. Extract Superclass
4. Move and Rename Class
5. Extract Interface
6. Rename Method
7. Move Method
8. Extract Method
9. Inline Method
10. Pull Up Method
11. Push Down Method
RQ1
What is the frequency of untracked changes?
RQ1. What is the frequency of untracked changes? (example)
version 1:
class Foo {
    mA() {…}
}
class Bar {
    mB() {…}
    mC() {…}
}
version 2:
class Foo {
    mA() {…}
}
class Bar {
    mX() {…}
    mC() {…}
}
version 3:
class Foo {
    mA() {…}
}
class Bar {
    mX() {…}
}
class Qux {
    mC() {…}
}
version 4:
class Foo {
    mA() {…}
}
class Baz {
    mY() {…}
}
class Qux {
    mC() {…}
    mE() {…}
}
17 changes
12 tracked changes
5 untracked changes
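The counts in this example can be reproduced with a short sketch. The edge list below is one reading of the four-version example, not data taken from the paper: it assumes that mB is renamed to mX, that mC moves from Bar to Qux, that Bar and mX are renamed to Baz and mY, and that mE is extracted from mC. It also counts every name-preserving edge as a tracked change, which is what makes the totals match. `ChangeGraphCounts` and `Change` are hypothetical names.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical sketch; the edges are an assumed reading of the slide's figure.
final class ChangeGraphCounts {
    // A change links an entity in one version to its successor in the next.
    record Change(String from, String to) {
        // Untracked when the qualified name differs across the two versions.
        boolean untracked() { return !from.equals(to); }
    }

    static final List<Change> EXAMPLE = List.of(
        // version 1 -> version 2
        new Change("Foo", "Foo"), new Change("Foo.mA", "Foo.mA"),
        new Change("Bar", "Bar"), new Change("Bar.mC", "Bar.mC"),
        new Change("Bar.mB", "Bar.mX"),          // assumed: Rename Method
        // version 2 -> version 3
        new Change("Foo", "Foo"), new Change("Foo.mA", "Foo.mA"),
        new Change("Bar", "Bar"), new Change("Bar.mX", "Bar.mX"),
        new Change("Bar.mC", "Qux.mC"),          // assumed: Move Method
        // version 3 -> version 4
        new Change("Foo", "Foo"), new Change("Foo.mA", "Foo.mA"),
        new Change("Qux", "Qux"), new Change("Qux.mC", "Qux.mC"),
        new Change("Bar", "Baz"),                // assumed: Rename Class
        new Change("Bar.mX", "Baz.mY"),          // assumed: Rename Method
        new Change("Qux.mC", "Qux.mE"));         // assumed: Extract Method

    static long untracked(List<Change> changes) {
        return changes.stream().filter(Change::untracked).count();
    }

    public static void main(String[] args) {
        long u = untracked(EXAMPLE);
        int total = EXAMPLE.size();
        // Prints: 17 changes, 12 tracked, 5 untracked (29.4%)
        System.out.printf(Locale.ROOT, "%d changes, %d tracked, %d untracked (%.1f%%)%n",
                total, total - u, u, 100.0 * u / total);
    }
}
```

In this toy graph the untracked share is 5 of 17 changes, about 29%; the percentages reported for the studied systems are measured on real histories, not on this example.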
Not desirable: relevant data may be missed!
RQ1. What is the frequency of untracked changes?
Untracked changes:
•Classes: 2% to 15%
•Methods: 10% to 21%
Untracked changes are frequent
RQ1. What is the frequency of untracked changes?
Untracked changes, by refactoring type:
•Rename Method: 26%
•Extract Method: 23%
•Move Method: 22%
•Move Class: 12%
Keeping track of renamings is not enough
RQ2
What is the extent of untracked changes?
RQ2. What is the extent of untracked changes? (example)
version 1:
class Foo {
    mA() {…}
}
class Bar {
    mB() {…}
    mC() {…}
}
version 2:
class Foo {
    mA() {…}
}
class Bar {
    mX() {…}
    mC() {…}
}
version 3:
class Foo {
    mA() {…}
}
class Bar {
    mX() {…}
}
class Qux {
    mC() {…}
}
version 4:
class Foo {
    mA() {…}
}
class Baz {
    mY() {…}
}
class Qux {
    mC() {…}
    mE() {…}
}
7 paths
3 paths: only tracked changes
4 paths: at least one untracked change
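The path counts in this example can be checked with a small traversal. As before, this is a sketch under assumptions: the edges are one reading of the four-version figure (mB renamed to mX, mC moved to Qux, Bar and mX renamed to Baz and mY, mE extracted from mC), a path is taken to be a maximal chain of changes from an entity's first appearance to its last successor, and `HistoryPaths` and `Edge` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; the edges are an assumed reading of the slide's figure.
final class HistoryPaths {
    // Edge from entity `from` in version v to entity `to` in version v + 1.
    record Edge(int v, String from, String to) {
        String source() { return v + ":" + from; }
        String target() { return (v + 1) + ":" + to; }
        boolean untracked() { return !from.equals(to); }
    }

    static final List<Edge> EXAMPLE = List.of(
        new Edge(1, "Foo", "Foo"), new Edge(1, "Foo.mA", "Foo.mA"),
        new Edge(1, "Bar", "Bar"), new Edge(1, "Bar.mB", "Bar.mX"),
        new Edge(1, "Bar.mC", "Bar.mC"),
        new Edge(2, "Foo", "Foo"), new Edge(2, "Foo.mA", "Foo.mA"),
        new Edge(2, "Bar", "Bar"), new Edge(2, "Bar.mX", "Bar.mX"),
        new Edge(2, "Bar.mC", "Qux.mC"),
        new Edge(3, "Foo", "Foo"), new Edge(3, "Foo.mA", "Foo.mA"),
        new Edge(3, "Bar", "Baz"), new Edge(3, "Bar.mX", "Baz.mY"),
        new Edge(3, "Qux", "Qux"), new Edge(3, "Qux.mC", "Qux.mC"),
        new Edge(3, "Qux.mC", "Qux.mE"));

    // Returns {total paths, paths containing at least one untracked change}.
    static int[] countPaths(List<Edge> edges) {
        Map<String, List<Edge>> out = new LinkedHashMap<>();
        Set<String> hasIncoming = new HashSet<>();
        for (Edge e : edges) {
            out.computeIfAbsent(e.source(), k -> new ArrayList<>()).add(e);
            hasIncoming.add(e.target());
        }
        int[] totals = new int[2];
        for (String node : out.keySet()) {
            if (!hasIncoming.contains(node)) {   // a history starts at this node
                walk(node, false, out, totals);
            }
        }
        return totals;
    }

    private static void walk(String node, boolean sawUntracked,
                             Map<String, List<Edge>> out, int[] totals) {
        List<Edge> next = out.get(node);
        if (next == null) {                      // no successor: the path ends here
            totals[0]++;
            if (sawUntracked) totals[1]++;
            return;
        }
        for (Edge e : next) {
            walk(e.target(), sawUntracked || e.untracked(), out, totals);
        }
    }

    public static void main(String[] args) {
        int[] r = countPaths(EXAMPLE);
        // Prints: 7 paths, 3 only tracked, 4 with at least one untracked change
        System.out.println(r[0] + " paths, " + (r[0] - r[1]) + " only tracked, "
                + r[1] + " with at least one untracked change");
    }
}
```

A tool that follows entities by name alone would see each of the 4 affected paths as two or more unrelated histories, which is the split the example illustrates.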
Not desirable: their histories may be split!
RQ2. What is the extent of untracked changes?
•18% to 41% of entities have at least one untracked change in their histories
•22% to 58% when only considering the most changed entities
Untracked changes cause splits in entity histories
RQ3. What is the impact of untracked changes on existing MSR-based approaches?
•Approaches
•Mining API evolution rules (e.g., Vector -> List)
•Mining API co-usage rules (e.g., Map -> HashMap)
•Results
•Number of mined rules: usually increases when untracked changes are taken into account (median: 0% to +7%)
•Quality of mined rules: slightly improves when untracked changes are included (median: -2% to +2%)
The impact of untracked changes is difficult to predict, and needs to be evaluated on a case-by-case basis
Untracked changes are frequent (10-21% at method level)
MSR studies should resolve untracked changes to access potentially relevant new mining data
Keeping track of renamings is not enough (≈26%)
MSR studies should address “extraction” and “moving” for a more complete resolution of untracked changes
Untracked changes cause splits in entity histories (18-41%)
MSR studies should resolve untracked changes when performing traceability analysis, for more precise entity lifespans