On the Effectiveness of Machine Learning-based Call-Graph Pruning: An Empirical Study
About This Presentation
I presented our paper, "On the Effectiveness of Machine Learning-based Call-Graph Pruning: An Empirical Study", at the 21st International Conference on Mining Software Repositories (MSR'24) in Lisbon, Portugal.
Added: Jun 06, 2024
Slides: 22 pages
Slide Content
On the Effectiveness of Machine Learning-based Call-Graph Pruning: An Empirical Study
Amir M. Mir, Mehdi Keshani, Sebastian Proksch
Software Engineering Research Group, TU Delft, Netherlands
Introduction
Call Graphs
Source: Walunj, V., Gharibi, G., Ho, D. H., & Lee, Y. (2019). GraphEvo: Characterizing and understanding software evolution using call graphs. In 2019 IEEE International Conference on Big Data (Big Data) (pp. 4799-4807). IEEE.
Call Graphs
Over-approximation in Call Graphs
Motivation
- Static call graphs are over-approximated
  - Leads to imprecision (superfluous edges)
  - Causes false warnings in downstream analyses
- Reducing the size of call graphs
  - Saves storage and memory
  - Makes downstream analyses more scalable
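The over-approximation described above can be illustrated with a toy example (our own construction, not from the paper): a static analyzer cannot know which implementation a dynamically dispatched call reaches, so it conservatively adds edges to every candidate target, even ones never taken at run time.

```python
# Toy illustration of call-graph over-approximation under dynamic dispatch.
class Shape:
    def area(self):
        raise NotImplementedError

class Circle(Shape):
    def area(self):
        return 3.14159

class Square(Shape):
    def area(self):
        return 1.0

def total_area(shapes):
    # A static analyzer sees `s.area()` and must add call edges to BOTH
    # Circle.area and Square.area, even if only circles occur at run time.
    return sum(s.area() for s in shapes)

# Edges a (hypothetical) static analyzer would report vs. edges actually
# observed in one execution that only ever passes Circle instances:
static_edges = {("total_area", "Circle.area"), ("total_area", "Square.area")}
dynamic_edges = {("total_area", "Circle.area")}
superfluous = static_edges - dynamic_edges  # the imprecision to prune away
print(sorted(superfluous))
```

ML-based call-graph pruning aims to identify and remove exactly these superfluous edges.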
Call Graph Pruning
Related Work
Limitations
- Previous work: CGPruner [1] and AutoPruner [2]
- Trained and evaluated on a dataset without real-world programs
- Highly imbalanced dataset (actual invocations are rare)
- After CG pruning, recall drops substantially (by more than 25%)
- No comparison with context-sensitive CG construction techniques (k-CFA)
Source:
[1] Utture, A., Liu, S., Kalhauge, C. G., & Palsberg, J. (2022). Striking a balance: Pruning false-positives from static call graphs. In Proceedings of the 44th International Conference on Software Engineering (pp. 2043-2055).
[2] Le-Cong, T., Kang, H. J., Nguyen, T. G., Haryono, S. A., Lo, D., Le, X. B. D., & Huynh, Q. T. (2022). AutoPruner: Transformer-based call graph pruning. In Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (pp. 520-532).
Contribution
- A new benchmark dataset, NYXCorpus, containing real-world programs
- An empirical study on the effectiveness of ML-based call-graph pruning that examines current issues, proposes solutions, and evaluates their effects
- Adaptations of existing ML models to support weighted training and customizable pruning through confidence levels
Research Methodology
Extracting Code Features
●Structural
○No. of nodes/edges in call graph
○No. of edges in/out of callee and caller
●Semantic
○Source and target method body
●Signature
●Combined
○Both structural and semantic
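The structural features listed above can be sketched as follows (a minimal illustration with names of our choosing, not the paper's actual feature-extraction code), computing node/edge counts and the in/out degrees of the caller and callee for one candidate edge:

```python
from collections import defaultdict

def structural_features(edges, caller, callee):
    """Structural features for one candidate call edge (caller -> callee)."""
    out_deg = defaultdict(int)  # edges out of each node
    in_deg = defaultdict(int)   # edges into each node
    nodes = set()
    for src, dst in edges:
        out_deg[src] += 1
        in_deg[dst] += 1
        nodes.update((src, dst))
    return {
        "num_nodes": len(nodes),
        "num_edges": len(edges),
        "caller_out": out_deg[caller],
        "caller_in": in_deg[caller],
        "callee_out": out_deg[callee],
        "callee_in": in_deg[callee],
    }

# A small call graph as a list of (caller, callee) edges:
edges = [("main", "parse"), ("main", "run"), ("run", "parse"), ("parse", "tokenize")]
feats = structural_features(edges, "main", "parse")
print(feats)
```

Each candidate edge then yields one feature vector, which an ML pruner classifies as a true or false edge.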
General performance of ML-based CG Pruners
- F2 scores are lower than WALA's
Conservative CG Pruning
● Incorporating weights into the learning process
  ○ A weight of 0.99 on retaining edges
  ○ Large improvement in recall (30%)
  ○ 12% higher F2 scores
● Confidence levels give control over the recall/precision trade-off when pruning
  ○ An F2 score of 76% at the 0.95 confidence level
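The confidence-level idea above can be sketched as follows (our simplification, assuming the model outputs, per edge, a probability that the edge is a false positive; the function name and data are hypothetical): an edge is removed only when the model is at least that confident it is false, so raising the confidence level prunes more conservatively and protects recall.

```python
def prune(edges_with_scores, confidence=0.95):
    """Conservative pruning: drop an edge only when the model is at least
    `confidence` sure that it is a false-positive edge."""
    return [edge for edge, p_false in edges_with_scores if p_false < confidence]

# Candidate edges with the model's "false edge" probability:
scored = [
    (("a", "b"), 0.99),  # model very sure this edge is spurious
    (("a", "c"), 0.60),  # model uncertain -> kept under high confidence
    (("b", "c"), 0.97),
]
print(prune(scored, confidence=0.95))  # only the uncertain edge survives
```

At confidence 1.0 nothing is ever pruned; lowering the level trades recall for a smaller, more precise graph.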
The effect of CG pruning on vulnerability propagation
- With paranoid pruning:
  - 1.4% loss in reachability
  - 22% faster analysis time
  - ~30% smaller CGs
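The reachability measurement above can be sketched as plain graph reachability (a toy construction of ours, not the study's pipeline): a vulnerable method threatens an application if the application's entry point can still reach it in the (possibly pruned) call graph.

```python
from collections import deque

def reachable(edges, start):
    """BFS over a call graph given as (caller, callee) pairs."""
    adj = {}
    for src, dst in edges:
        adj.setdefault(src, []).append(dst)
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

# Hypothetical graphs: pruning removed the edge that reached the vulnerability.
full = [("app", "lib_a"), ("lib_a", "vuln"), ("app", "lib_b")]
pruned = [("app", "lib_a"), ("app", "lib_b")]
print("vuln" in reachable(full, "app"), "vuln" in reachable(pruned, "app"))
```

Comparing such reachability sets before and after pruning quantifies how much vulnerability propagation information a pruned graph loses.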
Discussion
- Ground truth rests on the assumption of high test coverage
- Better code features/representations are needed
- A proper benchmark for different downstream analyses is needed