How different are different diff algorithms in Git?

yusufsn 167 views 12 slides Jul 22, 2020

About This Presentation

This presentation describes the detailed findings of an investigation into the different diff algorithms in Git. It was presented virtually in the Journal-First Track of the 42nd International Conference on Software Engineering (ICSE) 2020.


Slide Content

How different are different diff algorithms in Git?
Yusuf S. Nugroho, Hideaki Hata, Kenichi Matsumoto

diff is essential in the SE research field (Empirical Software Engineering Research).
Command synopsis: git diff [<options>] <commit> <commit> [--] [<path>…]

Git offers 4 diff algorithms:
Myers (the default algorithm, per the documentation)
Minimal (improved Myers)
Patience (tries to give a contextual diff)
Histogram (enhanced Patience, normally faster)
An algorithm is selected with --diff-algorithm={algorithm name}. Histogram was introduced in git 1.7.7 in 2011.
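As a minimal sketch of how the option above fits into the git diff synopsis, the Python helper below assembles the command line for a chosen algorithm. The function name build_diff_command and the constant ALGORITHMS are hypothetical names introduced here for illustration; only the --diff-algorithm option and its four values come from Git itself. The helper only builds the argument list and does not run Git.

```python
# Hypothetical helper (not part of the slides): builds, but does not run,
# a "git diff" command line that selects one of Git's four diff algorithms.
ALGORITHMS = ("myers", "minimal", "patience", "histogram")


def build_diff_command(old_commit, new_commit, algorithm="myers", paths=()):
    """Return the argv list for: git diff --diff-algorithm=<algorithm> <commit> <commit> [-- <path>...]."""
    if algorithm not in ALGORITHMS:
        raise ValueError(f"unknown diff algorithm: {algorithm!r}")
    cmd = ["git", "diff", f"--diff-algorithm={algorithm}", old_commit, new_commit]
    if paths:
        # "--" separates revisions from paths, as in the synopsis.
        cmd += ["--", *paths]
    return cmd


if __name__ == "__main__":
    print(" ".join(build_diff_command("HEAD~1", "HEAD", "histogram")))
```

Inside a repository, the returned list could be passed to subprocess.run(cmd, capture_output=True, text=True) to obtain the diff text for each algorithm in turn.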

Different algorithms produce different diff outputs. The differences are in the number of changed lines and in the position of changed lines. In the example shown, one output has 9 added lines and 2 deleted lines, while the other has 4 deleted lines and 9 added lines.
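The first kind of difference, the number of changed lines, can be measured by counting added and deleted lines in a unified diff. The sketch below is one simple way to do that, not the authors' tooling; count_changed_lines and the SAMPLE patch are made up for illustration.

```python
def count_changed_lines(diff_text):
    """Count (added, deleted) lines inside the hunks of a unified diff."""
    added = deleted = 0
    in_hunk = False
    for line in diff_text.splitlines():
        if line.startswith("diff --git"):
            in_hunk = False   # a new file section begins; headers follow
        elif line.startswith("@@"):
            in_hunk = True    # hunk header; change lines follow
        elif in_hunk and line.startswith("+"):
            added += 1
        elif in_hunk and line.startswith("-"):
            deleted += 1
    return added, deleted


# Illustrative patch, invented for this example.
SAMPLE = "\n".join([
    "diff --git a/f.txt b/f.txt",
    "--- a/f.txt",
    "+++ b/f.txt",
    "@@ -1,3 +1,4 @@",
    " keep",
    "-old",
    "+new",
    "+extra",
    " tail",
])

if __name__ == "__main__":
    print(count_changed_lines(SAMPLE))
```

Running the same counter on the outputs of two algorithms for the same pair of commits shows whether they disagree on the number of changed lines.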

2 sequential analyses:
Systematic Mapping Study: How did previous studies use git diff?
Comparisons: Study the differences in diff outputs between Myers and Histogram.

Results of Mapping Study:
3 Journals: TSE, EMSE, TOSEM
8 Proceedings: FSE, ICSE, OOPSLA, PLDI, ASE, ICSME, ISSTA, MSR

Comparing diff outputs in 3 applications
Manual Comparison: Patches

Myers:
    ... some code ...
    - a deleted line
    + an added line
    + an added line
    - a deleted line
    ... some code ...

Histogram:
    ... some code ...
    - a deleted line
    - a deleted line
    + an added line
    + an added line
    ... some code ...
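The two patches on this slide contain the same number of added and deleted lines but place them in a different order. A minimal sketch of how that could be checked automatically is given below; change_pattern and compare_patches are hypothetical helpers, not the procedure used in the study, which compared patches manually.

```python
from collections import Counter

# The two hunk bodies from the slide, one list entry per line.
MYERS = [
    "... some code ...",
    "- a deleted line",
    "+ an added line",
    "+ an added line",
    "- a deleted line",
    "... some code ...",
]
HISTOGRAM = [
    "... some code ...",
    "- a deleted line",
    "- a deleted line",
    "+ an added line",
    "+ an added line",
    "... some code ...",
]


def change_pattern(hunk_lines):
    """Return the order of change markers in a hunk body as a string of '+' and '-'."""
    return "".join(line[0] for line in hunk_lines if line[:1] in ("+", "-"))


def compare_patches(a, b):
    """Report whether two hunks agree on change counts and on change positions."""
    pa, pb = change_pattern(a), change_pattern(b)
    return {
        "same_counts": Counter(pa) == Counter(pb),
        "same_positions": pa == pb,
    }


if __name__ == "__main__":
    print(change_pattern(MYERS), change_pattern(HISTOGRAM))
    print(compare_patches(MYERS, HISTOGRAM))
```

For the slide's example the counts match while the positions do not, which is the second kind of difference the presentation describes.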

Different diff algorithms can report different locations for the changed lines of code they identify.

Histogram is better for describing code changes.
[Figure: results shown for code changes and non-code changes]

Use the histogram diff algorithm when analyzing code changes. Different diff algorithms can produce different numbers and locations of changed lines. Histogram detects the changed lines in source code more appropriately.

Publication: Nugroho, Y.S., Hata, H. & Matsumoto, K., "How Different are Different diff Algorithms in Git? Use --histogram for Code Changes", Empirical Software Engineering 25, 790-823 (2020). Available at: https://doi.org/10.1007/s10664-019-09772-z

Application in actual tools:
Git Extensions (feature request): https://github.com/gitextensions/gitextensions/issues/6991
PyDriller: https://pydriller.readthedocs.io/en/latest/configuration.html#git-diff-algorithms