Graspan: A Big Data System for Big Code Analysis

aftabhussain461 56 views 1 slides Jun 06, 2024
Slide 1
Slide 1 of 1
Slide 1
1

About This Presentation

We built a disk-based parallel graph system, Graspan, that uses a novel edge-pair centric computation model to compute dynamic transitive closures on very large program graphs.

We implement context-sensitive pointer/alias and dataflow analyses on Graspan. An evaluation of these analyses on large c...


Slide Content

A Big Data System for Big Code Analysis
Kai Wang, Aftab Hussain, ZhiqiangZuo, GuoqingXu, ArdalanAmiriSani
Department of Computer Science, University of California, Irvine
Graspan
The Work at a Glance
Why InterproceduralAnalysis?
Research Q&As
We address the scalability problem of inter-procedural static analysis for bug detection inbig code.
Can InterproceduralAnalysis improve checkers?
Is GraspanEfficient and Scalable?
Graspanimplementation v/s old implementation?
Graspanv/s existing graph systems?
Graphs and Program Analysis?
References
We built Graspan, a graph processing system that helped us uncover 85new Null pointer bugs in Linux 4.4.0.
void function1() {
int* ptr1;
if (<condition>) {
ptr1 = fn_explct_ret_null();
}
else {
ptr1 = fnA();
}
if (ptr1!=NULL) {
intb = *ptr1;
}
}
Pattern Checker cannot detect that ptr1can be
NULL, we need interproceduralanalysis, e.g.
dataflow analysis.
void function1() {
int* ptr1;
if (<condition>) {
ptr1= function2();
}
else {
ptr1 = fnA();
}
intb = *ptr1;
}
int* function2() {
int* ptr2 = NULL;
int* ptr3;
if (<condition>) {
ptr3= ptr2;
}
else {
ptr3 = fnB();
}
return ptr3;
}
NULL BUG CHECKER [1]
[1] Nicolas Palix, GaëlThomas, Suman Saha, Christophe Calvès, Julia Lawall, and Gilles
Muller,Faults in linux: ten years later,ASPLOS ‘11
[2] John Whaley and Monica S. Lam, Cloning-based context-sensitive pointer alias
analysis using binary decision diagrams, PLDI ‘04
[3] Thomas Reps, Susan Horwitz, and MoolySagiv,Precise interproceduraldataflow
analysis via graph reachability,POPL '95
[4] Manu Sridharanand RastislavBodík, Refinement-based context-sensitive points-
to analysis for Java,PLDI ‘06
Pattern Checker can detect that
ptr1can be NULL.
Absence of this
check
is a bug.
The Scalability Problem
Thescalabilityproblemofinterproceduralanalysisstemsfrom
producingdistinctsolutionsfordifferentcallingcontexts.
sum (a,b)
{
return a + b;
}
i: x=sum(10,1) j: y=sum(2,2)
(11, 4) (11, 4)
sum (a,b)
{
return a + b;
}
i: x=sum(10,1)
(11)
sum (a,b)
{
return a + b;
}
i: x=sum(2,2)
(4)
Context
Sensitive
(via cloning)
Context
In-sensitive
No.ofcallingcontextsgrowsexponentiallywithprogramsize.
Amoderate-sizedprogramcanhave10distinctcontexts.[2]
Non-Parallelizability
ThomasRepsetal.[3]showedmostinterprocedural
analyses,likedataflowanalysisandpointeranalysis,can
betransformedtoagraphreachabilityproblem.
Future Ambitions
Program Version #LOC
Linux 4.4.0-rc5 16M
PostgreSQL 8.3.9 700K
Apache httpd 2.2.18 300K
Our Analysis
14
imprecise
precise
a b c
Queriescanbesolvedusing
dynamictransitivecomputation.
Implementation Difficulty
Problem Graspan Results
Acknowledgements
This work was done under the supervision of GuoqingXu, University of California, Irvine.
Other contributors of this work are Kai Wang, ZhiqiangZuo, and ArdalanAmiriSani.
The work was submitted to OSDI 2016.
85 new bugs in Linux 4.4.0.
Computations took ~2 toless than 12 hrs, largest
graph generated had 1.1B edges.
Use an API to provide a grammar.
GraphChicrashed in 133 secs with 65M edges added.
Example Bug
(missed by
original checker)
Kl1 l2
Our Approach
Our Design
BenefitsHow it works
Find more bugs
on big code
In your machine
Performance-wise
Scalable
Parallelizable
Easy to Implement
Challenges
Preprocessing
Partition the graphs.
Edge-Pair Centric Computation
Analyze pairs of edges, and perform
DTC computation.
Post-Processing
Repartition oversized partitions
according to a threshold.
Duplicate Edges
Overburdens memory.
How to terminate edge
addition process?
Repartitioning
How do we partition the graph,
and then after computation,
how do we repartition
oversized partitions?
GRAMMAR
RULES
G
gives us dataflow info.
and alias info., which
can be fed into checkers
to find more bugs.
Toaddressscalability,practitionersapproximatetheiranalyses,
however,implementingthemiscomplicated.
InSridharan`sandBodik`s[4]work,morethan75%oftheir
entirecodewasdedicatedtotuningtheanalysis.
Mostexistingtechniquesfrequentlyinvolvedecisionmaking
basedoninformationtheydiscoverdynamically.
Extendsystemsupportforpath-sensitiveanalysis,constraint-
basedanalysesbyencodingconstraintsintoedgevalues.
Weimplementedfullycontext-sensitivedataflow
analysisandpointeranalysisontheseprograms:
l1 l2
K