How different are different diff algorithms in Git?
yusufsn
12 slides
Jul 22, 2020
About This Presentation
These slides present the detailed findings of an investigation into the different diff algorithms in Git. The work was presented virtually in the Journal-First Track of the 42nd International Conference on Software Engineering (ICSE 2020).
Slide Content
Slide 1: How different are different diff algorithms in Git?
  Yusuf S. Nugroho, Hideaki Hata, Kenichi Matsumoto
Slide 2: diff is essential in the SE research field
  Empirical Software Engineering Research
  git diff [<options>] <commit> <commit> [--] [<path>…]
Slide 3: Git offers 4 diff algorithms
  Myers (default algorithm)
  Minimal (improved Myers)
  Patience (tries to give a contextual diff)
  Histogram (enhanced Patience, normally faster)
  Option (from the Git documentation): --diff-algorithm={algorithm name}
  Histogram was introduced in Git 1.7.7 in 2011.
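A minimal sketch of how the --diff-algorithm option from this slide could be passed to git diff from a script. The helper names build_diff_command and run_diff are hypothetical (they are not part of Git or of the study), and run_diff assumes a git executable on the PATH and a local repository.

```python
import subprocess

# The four algorithms named on the slide, spelled as git expects them.
ALGORITHMS = ("myers", "minimal", "patience", "histogram")


def build_diff_command(old, new, algorithm="myers", paths=()):
    """Build the argument list for `git diff` between two commits.

    Hypothetical helper: it only assembles arguments and does not run git.
    """
    if algorithm not in ALGORITHMS:
        raise ValueError(f"unknown diff algorithm: {algorithm!r}")
    command = ["git", "diff", f"--diff-algorithm={algorithm}", old, new]
    if paths:
        # `--` separates the revisions from the path arguments.
        command += ["--", *paths]
    return command


def run_diff(repo_dir, old, new, algorithm="myers", paths=()):
    """Run the diff inside repo_dir and return the patch text.

    Assumes git is installed and repo_dir is a Git working tree.
    """
    result = subprocess.run(
        build_diff_command(old, new, algorithm, paths),
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling run_diff twice on the same pair of commits, once with "myers" and once with "histogram", gives two patches that can then be compared.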
Slide 4: Different algorithms produce different diff outputs
  Differences:
    Number of changed lines
    Position of changed lines
  Example outputs for the same change: 9 added lines and 2 deleted lines versus 4 deleted lines and 9 added lines.
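To make "number of changed lines" concrete, here is a small sketch that counts added and deleted lines in a unified diff. The function name count_changed_lines is hypothetical, and the two sample patches are hand-written illustrations of one change described two ways; they are not actual output of any particular Git algorithm.

```python
def count_changed_lines(patch_text):
    """Return (added, deleted) line counts for a unified diff."""
    added = deleted = 0
    in_hunk = False
    for line in patch_text.splitlines():
        if line.startswith("@@"):
            in_hunk = True          # hunk body starts after this header
        elif line.startswith("diff --git"):
            in_hunk = False         # next file: skip its ---/+++ headers
        elif in_hunk and line.startswith("+"):
            added += 1
        elif in_hunk and line.startswith("-"):
            deleted += 1
    return added, deleted


# Both patches turn the file [alpha, beta] into [beta, alpha].
PATCH_A = "\n".join([
    "diff --git a/f.txt b/f.txt",
    "--- a/f.txt",
    "+++ b/f.txt",
    "@@ -1,2 +1,2 @@",
    "-alpha",
    " beta",
    "+alpha",
])
PATCH_B = "\n".join([
    "diff --git a/f.txt b/f.txt",
    "--- a/f.txt",
    "+++ b/f.txt",
    "@@ -1,2 +1,2 @@",
    "-alpha",
    "-beta",
    "+beta",
    "+alpha",
])

print(count_changed_lines(PATCH_A))  # (1, 1)
print(count_changed_lines(PATCH_B))  # (2, 2)
```

Both patches are valid descriptions of the same edit, yet they report different numbers of added and deleted lines.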
Slide 5: 2 sequential analyses
  Systematic Mapping Study: How did previous studies use git diff?
  Comparisons: Study the differences in diff outputs between Myers and Histogram
Slide 6: Results of Mapping Study
  3 Journals: TSE, EMSE, TOSEM
  8 Proceedings: FSE, ICSE, OOPSLA, PLDI, ASE, ICSME, ISSTA, MSR
Slide 7: Comparing diff outputs in 3 applications
  Manual Comparison: Patches
  Myers:
    ... some code ...
    - a deleted line
    + an added line
    + an added line
    - a deleted line
    ... some code ...
  Histogram:
    ... some code ...
    - a deleted line
    - a deleted line
    + an added line
    + an added line
    ... some code ...
Slide 8: Different diff algorithms can report different locations for the changed lines of code they identify
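One way to compare the reported locations is to extract the line numbers each patch marks as deleted or added. The sketch below is an illustration, not the study's tooling: the function name changed_line_numbers is hypothetical, it assumes a single-file unified diff, and the sample patches are hand-written rather than real Myers or Histogram output.

```python
import re

# Captures the old-file and new-file start line of a hunk header.
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def changed_line_numbers(patch_text):
    """Return (deleted, added): old-file and new-file line numbers as sets."""
    deleted, added = set(), set()
    old_no = new_no = None
    for line in patch_text.splitlines():
        header = HUNK_HEADER.match(line)
        if header:
            old_no, new_no = int(header.group(1)), int(header.group(2))
        elif line.startswith("diff --git"):
            old_no = new_no = None
        elif old_no is None:
            continue  # file header lines before the first hunk
        elif line.startswith("-"):
            deleted.add(old_no)
            old_no += 1
        elif line.startswith("+"):
            added.add(new_no)
            new_no += 1
        elif line.startswith(" "):
            # Context line: present in both versions of the file.
            old_no += 1
            new_no += 1
    return deleted, added


# Two hand-written patches that both turn [alpha, beta] into [beta, alpha].
FIRST = "\n".join(["@@ -1,2 +1,2 @@", "-alpha", " beta", "+alpha"])
SECOND = "\n".join(["@@ -1,2 +1,2 @@", "-alpha", "-beta", "+beta", "+alpha"])

print(changed_line_numbers(FIRST))   # ({1}, {2})
print(changed_line_numbers(SECOND))  # ({1, 2}, {1, 2})
```

Comparing the two results with set operations (for example, symmetric difference) shows which lines only one of the patches flags as changed.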
Slide 9: Histogram is better for describing code changes
  (Figure: results shown separately for code changes and non-code changes.)
Slide 10: Use the histogram diff algorithm when analyzing code changes
  Different diff algorithms can produce different numbers and locations of changed lines.
  Histogram detects the changed lines in source code more appropriately.
Slide 11: Publication
  Nugroho, Y.S., Hata, H. & Matsumoto, K., "How Different are Different diff Algorithms in Git? Use --histogram for Code Changes", Empirical Software Engineering 25, 790-823 (2020). Available at: https://doi.org/10.1007/s10664-019-09772-z
Slide 12: Application on actual tools
  Git Extensions (feature request): https://github.com/gitextensions/gitextensions/issues/6991
  PyDriller: https://pydriller.readthedocs.io/en/latest/configuration.html#git-diff-algorithms