TMPA-2017: Vellvm - Verifying the LLVM

Vellvm: Verifying the LLVM IR
Steve Zdancewic
University of Pennsylvania
TMPA 2017

Collaborators
• Jianzhou Zhao
• Dmitri Garbuzov
• William Mansky
• ChrisGne Rizkallah
• Richard Zhang
• Milo M.K. MarGn
• Santosh NagarakaLe
• Gil Hur
• Jeehon Kang
• Viktor Vafeiadis

The Science of Deep SpeciﬁcaGon

• Andrew Appel (Princeton)
• Adam Chlipala (MIT)
• Zhong Shao (Yale)
• Benjamin Pierce (U. Penn.)
• Stephanie Weirich (U. Penn.)

The Need For High Assurance So[ware
4
heartbleed
car hacking
stuxnet buﬀer overﬂow aLacks

Deep SpeciﬁcaGons

• Rich – expressive descripGon
• Formal – mathemaGcal, machine-checked
• 2-Sided – tested from both sides
• Live – connected to real, executable code
Goal: Advance the reliability, safety, security, and
cost-eﬀecGveness of so[ware (and hardware).

The Coq InteracGve Theorem Prover
• Based on dependent type theory
• Pure funcGonal language + datatypes
• ConstrucGve proofs ⇒ executable code
• AutomaGon: tacGcs + inference
⇒ formalizaGon tool of choice for DeepSpec team

[Developed at INRIA]

DeepSpec: InterconnecGons

LLVM: Low-Level Virtual Machine

LLVM Compiler Infrastructure
LLVM
Front
Ends
Code
Gen/Jit
OpGmizaGons/
TransformaGons
Typed SSA
IR
Analysis
[LaLner et al. ]

MoGvaGon: So[Bound/CETS
• Buﬀer overﬂow vulnerabiliGes.
• Detect spaGal/temporal memory
safety violaGons in legacy C code.
• Implemented as an LLVM pass.
• What about correctness?
[NagarakaLe, et al. PLDI ’09, ISMM ‘10]
hLp://www.cis.upenn.edu/acg/so[bound/

InspiraGon: CompCert
12
[Xavier Leroy INRIA Rocquencourt]
OpGmizing C Compiler:
proved correct end-to-end
with machine-checked proof in Coq
C language
CompCert
Compiler
ISA
rich, formal,
2-sided, live

Does Such VeriﬁcaGon Work?
LLVM
Csmith Tool
Random
test-case generaGon
{8 other C compilers}
+
CompCert
79 bugs
(25 criGcal)
202 bugs
325 bugs in
total
Source
Programs
[Yang et al. PLDI 2011]

YES! VeriﬁcaGon Works
"The striking thing about our CompCert results is that the
middle-end bugs we found in all other compilers are absent.
As of early 2011, the under-development version of CompCert
is the only compiler we have tested for which Csmith cannot
ﬁnd wrong-code errors. This is not for lack of trying: we have
devoted about six CPU-years to the task. The apparent
unbreakability of CompCert supports a strong argument that
developing compiler opEmizaEons within a proof framework,
where safety checks are explicit and machine-checked, has
tangible beneﬁts for compiler users."
– Regehr et. al 2011

LLVM Compiler Infrastructure
LLVM
Front
Ends
Code
Gen/Jit
OpGmizaGons/
TransformaGons
Typed SSA
IR
Analysis
[LaLner et al. ]

LLVM Compiler Infrastructure
LLVM
Front
Ends
Code
Gen/Jit
OpGmizaGons/
TransformaGons
Typed SSA
IR
Analysis
[LaLner et al.]

The Vellvm Project
OpGmizaGons/
TransformaGons
Typed SSA
IR
Analysis
• Formal semanGcs
• FaciliGes for creaGng
simulaGon proofs
• Implemented in Coq
• Extract passes for use
with LLVM compiler
• Example: veriﬁed
memory safety
instrumentaGon
[Zhao et al. POPL 2012, CPP 2012, PLDI 2013]

Vellvm Framework
Transform
C Source
Code
Other
OpGmizaGons
LLVM
IR
LLVM
IR
Target
LLVM
OCaml Bindings
Printer Parser
Coq
Syntax
OperaGonal
SemanGcs
Memory
Model
Type System
and SSA
Proof Techniques & Metatheory
Extract

Vellvm Framework
C Source
Code
Other
OpGmizaGons
LLVM
IR
LLVM
IR
Target
LLVM
OCaml Bindings
Printer Parser
Coq
Syntax
OperaGonal
SemanGcs
Memory
Model
Type System
and SSA
Proof Techniques & Metatheory
Extract
Veriﬁed
Transform

Plan
• Tour of the LLVM IR
• Vellvm infrastructure
– OperaGonal SemanGcs
– SSA Metatheory + Proof Techniques
• Case studies:
– So[Bound memory safety
– mem2reg
• Conclusion

LLVM IR by Example
entry:
r
0
= ...

r
1
= ...

r
2
= ...

Control-ﬂow Graphs:
+ Labeled blocks

exit:
r
7
= ...
r
8
= r
1
x r
2
r
9
= r
7
+ r
8

loop:
r
3
= ...
r
4
= r
1
x r
2
r
5
= r
3
+ r
4
r
6
= r
5
≥ 100

LLVM IR by Example
entry:
r
0
= ...

r
1
= ...

r
2
= ...

Control-ﬂow Graphs:
+ Labeled blocks
+ Binary OperaGons

exit:
r
7
= ...
r
8
= r
1
x r
2
r
9
= r
7
+ r
8

loop:
r
3
= ...
r
4
= r
1
x r
2
r
5
= r
3
+ r
4
r
6
= r
5
≥ 100

LLVM IR by Example
entry:
r
0
= ...

r
1
= ...

r
2
= ...

br r
0
loop exit
Control-ﬂow Graphs:
+ Labeled blocks
+ Binary OperaGons
+ Branches/Return

exit:
r
7
= ...
r
8
= r
1
x r
2
r
9
= r
7
+ r
8
ret r
9
loop:
r
3
= ...
r
4
= r
1
x r
2
r
5
= r
3
+ r
4
r
6
= r
5
≥ 100
br r
6
loop exit

LLVM IR by Example
entry:
r
0
= ...

r
1
= ...

r
2
= ...

br r
0
loop exit
Control-ﬂow Graphs:
+ Labeled blocks
+ Binary OperaGons
+ Branches/Return
+ StaGc Single Assignment

(each variable assigned
only once, staGcally)

exit:
r
7
= ...
r
8
= r
1
x r
2
r
9
= r
7
+ r
8
ret r
9
loop:
r
3
= ...
r
4
= r
1
x r
2
r
5
= r
3
+ r
4
r
6
= r
5
≥ 100
br r
6
loop exit

LLVM IR by Example
entry:
r
0
= ...

r
1
= ...

r
2
= ...

br r
0
loop exit
Control-ﬂow Graphs:
+ Labeled blocks
+ Binary OperaGons
+ Branches/Return
+ StaGc Single Assignment
+ φ nodes

exit:
r
7
= φ[0;entry][r
5
;loop]
r
8
= r
1
x r
2
r
9
= r
7
+ r
8
ret r
9
loop:
r
3
= φ[0;entry][r
5
;loop]
r
4
= r
1
x r
2
r
5
= r
3
+ r
4
r
6
= r
5
≥ 100
br r
6
loop exit

LLVM IR by Example
entry:
r
0
= ...

r
1
= ...

r
2
= ...

br r
0
loop exit
Control-ﬂow Graphs:
+ Labeled blocks
+ Binary OperaGons
+ Branches/Return
+ StaGc Single Assignment
+ φ nodes

(choose values based
on predecessor blocks)

exit:
r
7
= φ[0;entry][r
5
;loop]
r
8
= r
1
x r
2
r
9
= r
7
+ r
8
ret r
9
loop:
r
3
= φ[0;entry][r
5
;loop]
r
4
= r
1
x r
2
r
5
= r
3
+ r
4
r
6
= r
5
≥ 100
br r
6
loop exit

(UnopGmized) LLVM IR Code
27
example.c
define i32 @factorial(i32 %n) nounwind uwtable ssp {
entry:
%1 = alloca i32, align 4
%acc = alloca i32, align 4
store i32 %n, i32* %1, align 4
store i32 1, i32* %acc, align 4
br label %start
start: ; preds = %entry, %else
%3 = load i32* %1, align 4
%4 = icmp ugt i32 %3, 0
br i1 %4, label %then, label %else
then: ; preds = %start
%6 = load i32* %acc, align 4
%7 = load i32* %1, align 4
%8 = mul i32 %6, %7
store i32 %8, i32* %acc, align 4
%9 = load i32* %1, align 4
%10 = sub i32 %9, 1
store i32 %10, i32* %1, align 4
br label %start
else: ; preds = %start
%12 = load i32* %acc, align 4
ret i32 %12
}
example.ll
unsigned factorial(unsigned n) {
unsigned acc = 1;
while (n > 0) {
acc = acc * n;
n = n -1;
}
return acc;
}

Plan
• Tour of the LLVM IR
• Vellvm infrastructure
– OperaGonal SemanGcs
– SSA Metatheory + Proof Techniques
• Case studies:
– So[Bound memory safety
– mem2reg
• Conclusion

LLVM IR SemanGcs
SSA CFG ≈ funcGonal program
+
• Types & Memory Layout
– structured, recursive types
– type-directed projecGon
– type casts
• Eﬀects
– structured heap load/store
– system calls (I/O)
– nondeterminism
[Appel 1998]
We know how
to model this
and prove
properGes about
the models.

LLVM’s memory model
• Manipulate structured types.
%ST = type {i10,[10 x i8*]}
i10
i8*
i8*
i8*
i8*
i8*
i8*
i8*
i8*
i8*
i8*
High-level
RepresentaGon
%val = load %ST* %ptr
…
store %ST* %ptr, %new

LLVM’s memory model
• Manipulate structured types.
• SemanGcs is given in terms of
byte-oriented low-level
memory.
– padding & alignment
– physical subtyping
%ST = type {i10,[10 x i8*]}
b(10, 136)
0
b(10, 2)
1
uninit
2
uninit
3
ptr(Blk32,0,0)
4
ptr(Blk32,0,1)
5
ptr(Blk32,0,2)
6
ptr(Blk32,0,3)
7
ptr(Blk32,8,0)
8
ptr(Blk32,8,1)
9
ptr(Blk32,8,2)
10
ptr(Blk32,8,3)
11
…
12
…
…
i10
i8*
i8*
i8*
i8*
i8*
i8*
i8*
i8*
i8*
i8*
High-level
RepresentaGon
Low-level
RepresentaGon
%val = load %ST* %ptr
…
store %ST* %ptr, %new

Dynamic Physical Subtyping
b(10, 136)
0
b(10, 2)
1
uninit
2
uninit
3
ptr(Blk32,0,0)
4
ptr(Blk32,0,1)
5
ptr(Blk32,0,2)
6
ptr(Blk32,0,3)
7
ptr(Blk32,8,0)
8
ptr(Blk32,8,1)
9
ptr(Blk32,8,2)
10
ptr(Blk32,8,3)
11
…
12
…
…
Blk0 Blk1 Blk32
b(16, 1)
0
b(16, 0)
1
uninit
2
uninit
3
uninit
4
uninit
5
uninit
6
uninit
7
ptr(Blk1,0,0)
8
ptr(Blk1,0,1)
9
ptr(Blk1,0,2)
10
ptr(Blk1,0,3)
11
…
12
…
…
i10
load i16* 
⇒ 1
✓
load i16* 
⇒ undef
✗
[Nita, et al. POPL ’08]

Fatal Errors Target-dependent Results
Sources of Undeﬁned Behavior
• UniniGalized variables:

• UniniGalized memory:

• Ill-typed memory usage

• Out-of-bounds accesses
• Access dangling
pointers
• Free invalid pointers

• Invalid indirect calls
%v = add i32 %x, undef
%ptr = alloca i32
%v = load (i32*) %ptr
Nondeterminism Stuck States

Target-dependent Results
Sources of Undeﬁned Behavior
• UniniGalized variables:

• UniniGalized memory:

• Ill-typed memory usage

%v = add i32 %x, undef
%ptr = alloca i32
%v = load (i32*) %ptr
Nondeterminism Stuck States
Stuck(f, σ) = BadFree(f, σ)
˅ BadLoad(f, σ)
˅ BadStore(f, σ)
˅ …
˅ …0
Deﬁned by a predicate on
the program conﬁguraGon.

undef
• What is the value of %y a[er running the following?
• One plausible answer: 0
• Not LLVM’s semanGcs!
(LLVM is more liberal to permit more aggressive opGmizaGons)
%x = or i8 undef, 1
%y = xor i8 %x %x

undef
• ParGally deﬁned values are interpreted
nondeterminisEcally as sets of possible values:
⟦%x⟧= {a or b | a∈⟦i8 undef⟧, b ∈⟦1⟧} 
= {1,3,5,…,255}
⟦%y⟧ = {a xor b | a∈⟦%x⟧, b∈⟦%x⟧}
= {0,2,4,…,254}
%x = or i8 undef, 1
%y = xor i8 %x %x
⟦i8 undef⟧ = {0,…,255}
⟦i8 1⟧ = {1}

LLVM
ND
OperaGonal SemanGcs
• Deﬁne a transiGon relaGon:
f ⊢ σ
1
⟼ σ
2
– f is the program
– σ is the program state: pc, locals(δ), stack, heap
• NondeterminisGc
– δ maps local %uids to sets.
– Step relaGon is nondeterminisGc
• Mostly straigh~orward (given the heap model)
– One wrinkle: phi-nodes exectuted atomically

OperaGonal SemanGcs
Small Step Big Step
NondeterminisGc
DeterminisGc
LLVM
ND

DeterminisGc Reﬁnement
Small Step Big Step
NondeterminisGc
DeterminisGc
LLVM
ND
LLVM
D
∋︎
InstanGate ‘undef’ with default value (0 or null) ⇒ determinisGc.

Big-step DeterminisGc Reﬁnements
Small Step Big Step
NondeterminisGc
DeterminisGc
LLVM
ND
LLVM
D
LLVM
Interp ≈︎
∋︎
BisimulaGon up to “observable events”:
•  external funcGon calls

Big-step DeterminisGc Reﬁnements
[Tristan, et al. POPL ’08, Tristan, et al. PLDI ’09]
Small Step Big Step
NondeterminisGc
DeterminisGc
LLVM
ND
LLVM
D
LLVM
*
DFn
LLVM
*
DB
LLVM
Interp ≈︎≿︎ ≿︎
∋︎
SimulaGon up to “observable events”:
•  useful for encapsulaGng behavior of funcGon calls
•  large step evaluaGon of basic blocks

A Taste of Coq FormalizaGon
…

Plan
• Tour of the LLVM IR
• Vellvm infrastructure
– OperaGonal SemanGcs
– SSA Metatheory + Proof Techniques
• Case studies:
– So[Bound memory safety
– mem2reg
• Conclusion

Reasoning About LLVM Code
How do we prove that a program
transformaGon is correct with respect to the
deﬁned operaGonal semanGcs?
• Safety Invariants (preservaGon and progress)
• SimulaGon techniques

Key SSA Invariant
entry:
r
0
= ...

r
1
= ...

r
2
= ...

br r
0
loop exit
exit:
r
7
= φ[0;entry][r
5
;loop]
r
8
= r
1
x r
2
r
9
= r
7
+ r
8
ret r
9
loop:
r
3
= φ[0;entry][r
5
;loop]
r
4
= r
1
x r
2
r
5
= r
3
+ r
4
r
6
= r
5
≥ 100
br r
6
loop exit
DeﬁniGon of r
2
.
Use of r
2
. Uses of r
2
.

Key SSA Invariant
entry:
r
0
= ...

r
1
= ...

r
2
= ...

br r
0
loop exit
exit:
r
7
= φ[0;entry][r
5
;loop]
r
8
= r
1
x r
2
r
9
= r
7
+ r
8
ret r
9
loop:
r
3
= φ[0;entry][r
5
;loop]
r
4
= r
1
x r
2
r
5
= r
3
+ r
4
r
6
= r
5
≥ 100
br r
6
loop exit
DeﬁniGon of r
2
.
Use of r
2
. Uses of r
2
.
The deﬁniGon of a
variable must dominate
its uses.

Safety ProperGes
• A well-formed program never accesses undeﬁned variables.

• Ini=aliza=on:

• Preserva=on:
• Progress:

If ⊢ f and f ⊢ σ
0
⟼* σ then σ is not stuck.
⊢ f program f is well formed
σ program state
f ⊢ σ ⟼* σ evaluaGon of f
If ⊢ f then wf(f, σ
0
).
If ⊢ f and f ⊢ σ ⟼ σ’ and wf(f, σ) then wf(f, σ’)
If ⊢ f and wf(f, σ) then f ⊢ σ ⟼ σ’

Safety ProperGes
• A well-formed program never accesses undeﬁned variables.

• Ini=aliza=on:

• Preserva=on:
• Progress:

If ⊢ f and f ⊢ σ
0
⟼* σ then σ is not stuck.
⊢ f program f is well formed
σ program state
f ⊢ σ ⟼* σ evaluaGon of f
If ⊢ f then wf(f, σ
0
).
If ⊢ f and f ⊢ σ ⟼ σ’ and wf(f, σ) then wf(f, σ’)
If ⊢ f and wf(f, σ) then done(f,σ) or stuck(f,σ) or f ⊢ σ ⟼ σ’

Well-formed States
entry:
r
0
= ...

r
1
= ...

r
2
= ...

br r
0
loop exit
exit:
r
7
= φ[0;entry][r
5
;loop]
r
8
= r
1
x r
2
r
9
= r
7
+ r
8
ret r
9
loop:
r
3
= φ[0;entry][r
5
;loop]
r
4
= r
1
x r
2
r
5
= r
3
+ r
4
r
6
= r
5
≥ 100
br r
6
loop exit
pc
State σ is:
pc = program counter
δ = local values

Well-formed States
entry:
r
0
= ...

r
1
= ...

r
2
= ...

br r
0
loop exit
exit:
r
7
= φ[0;entry][r
5
;loop]
r
8
= r
1
x r
2
r
9
= r
7
+ r
8
ret r
9
loop:
r
3
= φ[0;entry][r
5
;loop]
r
4
= r
1
x r
2
r
5
= r
3
+ r
4
r
6
= r
5
≥ 100
br r
6
loop exit
pc
State σ is:
pc = program counter
δ = local values

sdom(f,pc) = variable defns.
that strictly dominate pc.

Well-formed States
entry:
r
0
= ...

r
1
= ...

r
2
= ...

br r
0
loop exit
exit:
r
7
= φ[0;entry][r
5
;loop]
r
8
= r
1
x r
2
r
9
= r
7
+ r
8
ret r
9
loop:
r
3
= φ[0;entry][r
5
;loop]
r
4
= r
1
x r
2
r
5
= r
3
+ r
4
r
6
= r
5
≥ 100
br r
6
loop exit
pc
State σ contains:
pc = program counter
δ = local values

sdom(f,pc) = variable defns.
that strictly dominate pc.
wf(f,σ) =
∀r∊sdom(f,pc). ∃v. δ(r) = ⎣v⎦
“All variables in scope
are iniGalized.”

Generalizing Safety
• DeﬁniGon of wf:

• Generalize like this:
• Methodology: for a given P prove three theorems:
IniEalizaEon(P)
PreservaEon(P)
Progress(P)
wf(f,(pc, δ)) = ∀r∊sdom(f,pc). ∃v. δ(r) = ⎣v⎦
wf(f,(pc, δ)) = P f (δ|
sdom(f,pc)
)

where P : Program ⟶ Locals ⟶ Prop
Consider only variables in
scope ⇒ P deﬁned
relaGve to the dominator
tree of the CFG.

InstanGaGng
• For usual safety:
• For semanGc properGes:
• Useful for verifying correctness of:
– code moGon, dead variable eliminaGon, common
expression eliminaGon, etc.
P
safety
f δ = ∀r∊dom(δ). ∃v. δ(r) = ⎣v⎦
P
sem
f δ = ∀r. f[r] = ⎣rhs⎦ ⇒ δ(r) = ⟦rhs⟧
δ

Plan
• Tour of the LLVM IR
• Vellvm infrastructure
– OperaGonal SemanGcs
– SSA Metatheory + Proof Techniques
• Case studies:
– So[Bound memory safety
– mem2reg
• Conclusion

So[Bound
So[Bound
C Source
Code
Other
OpGmizaGons
LLVM
IR
LLVM
IR
Target
• Implemented as an LLVM pass.
• Detect spaGal/temporal memory
safety violaGons in legacy C code.
• Good test case:
– Safety CriGcal ⇒ Proof cost warranted
– Non-trivial Memory transformaGon

So[Bound
So[Bound
C Source
Code
Other
OpGmizaGons
LLVM
IR
LLVM
IR
Target
%p = call malloc [10 x i8]
%q = gep %p, i32 0, i32 255
store i8 0, %q
%p = call malloc [10 x i8]
%p_base = gep %p, i32 0
%p_bound = gep %p, i32 0, i32 10
%q = gep %p, i32 0, i32 255
%q_base = %p_base
%q_bound = %p_bound
assert %q_base <= %q
/ %q+1 < % q_bound
store i8 0, %q
Maintain base and bound for all pointers
Propagate metadata on assignment
Check that a pointer is within its
bounds when being accessed

Disjoint Metadata
• Maintain pointer bounds in a separate memory space.
• Key Invariant: Metadata cannot be corrupted by bounds
violaGon.
User memory

Disjoint metadata
%p %p
base
%p
bound
%i
1
%q %q
base
%q
bound
%i
6
%i
3

Proving So[Bound Correct
1. Deﬁne So[Bound(f,σ) = (f
s
,σ
s
)
– TransformaGon pass implemented in Coq.
2. Deﬁne predicate: MemoryViolaGon(f,σ)
3. Construct a non-standard operaGonal semanGcs:
– Builds in safety invariants “by construcGon”
4. Show that the instrumented code simulates the “correct”
code:
SB
f ⊢ σ ⟼ σ’
SB
f ⊢ σ ⟼* σ’ ⇒ ¬MemoryViolaGon(f,σ’)
So[Bound(f,σ) = (f
s
,σ
s
) ⇒ [f ⊢ σ ⟼* σ’] ≿ [f
s
⊢ σ
s
⟼* σ’
s
]
SB

Lessons About So[Bound
• Found several bugs in our C++ implementaGon
– InteracGon of undef, ‘null’, and metadata
iniGalizaGon.
• SimulaGon proofs suggested a redesign of
So[Bound’s handling of stack pointers.
– Use a “shadow stack”
– Simplify the design/implementaGon
– Signiﬁcantly more robust (e.g. varargs)

0%
50%
100%
150%
200%
250%
Run5me overhead
Extracted
Competitive Runtime Overhead
The performance of extracted SoftBound is competitive
with the non-veriﬁed original

Plan
• Tour of the LLVM IR
• Vellvm infrastructure
– OperaGonal SemanGcs
– SSA Metatheory + Proof Techniques
• Case studies:
– So[Bound memory safety
– mem2reg
• Conclusion

mem2reg in LLVM
Front-ends
w/o SSA
construcGon
The LLVM IR
w/o φ-nodes
mem2reg
•  Promote stack allocas to temporaries
•  Insert minimal φ-nodes
•  imperaGve variables <z stack allocas
•  no φ-nodes
•  trivially in SSA form
Backends
SSA-based
opGmizaGons
The LLVM IR in the
minimal SSA form

mem2reg Example
int x = 0;
if (y > 0)  
x = 1;
return x;
l
1
: %p = alloca i32
store 0, %p
%b = %y > 0
br %b, %l
2
, %l
3

l
2
:
store 1, %p
br %l
3

l
3
:
%x = load %p
ret %x
The LLVM IR in the trivial SSA form

mem2reg Example
int x = 0;
if (y > 0)  
x = 1;
return x;
l
1
: %p = alloca i32
store 0, %p
%b = %y > 0
br %b, %l
2
, %l
3

l
2
:
store 1, %p
br %l
3

l
3
:
%x = load %p
ret %x
The LLVM IR in the trivial SSA form
l
1
:
%b = %y > 0
br %b, %l
2
, %l
3

l
2
:

br %l
3

l
3
:
%x = φ[ 1,%l
2
] [ 0,%l
1
]
ret %x
Minimal SSA a[er mem2reg
mem2reg

mem2reg Algorithm
• Main operaGons
– Phi placement (Lengauer-Tarjan algorithm)
– Renaming of the variables
– Removing loads/stores
• Intermediate stage breaks SSA invariant
– Deﬁning semanGcs & well formedness non-trivial

vmem2reg Algorithm
• Incremental algorithm
• Pipeline of "micro transformaGons"
– Preserves SSA semanGcs
– Preserves well-formedness
See: [Aycock & Horspool 2002.]
max φs
LAS/
LAA
DSE
DAE
elim φ
Find
alloca

How to Establish Correctness?
max φs
LAS/
LAA
DSE
DAE
elim φ
Find
alloca
1. Simple aliasing properGes
(e.g. to determine promotability)
2. InstanGate proof technique for
– SubsGtuGon
– Dead InstrucGon EliminaGon
P
DIE
= …
IniGalize(P
DIE
)
PreservaGon(P
DIE
)
Progress(P
DIE
)
4. Put it all together to prove
composiGon of “pipeline”
correct.

Aliasing
ProperGes
subst
DIE

vmem2reg is Correct
Theorem: The vmem2reg algorithm
preserves the semanGcs of the source
program.
Proof:
ComposiGon of simulaGon relaGons from the “mini”
transformaGons, each built using instances of the sdom
proof technique.
(See Coq Vellvm development.) □

RunGme overhead of veriﬁed mem2reg
0%
20%
40%
60%
80%
100%
120%
140%
160%
180%
200%
sjeng
go
compress
ijpeg gzip vpr
mesa
art
ammp equake
libquantum
lbm milc
bzip2 parser twolf
mcf h264
Geo.mean
Speedup Over LLVM-O0
LLVM's mem2reg Extracted mem2reg
Vmem2reg: 77% LLVM’s mem2reg: 81%
(LLVM’s mem2reg promotes allocas used by intrinsics)

Plan
• Tour of the LLVM IR
• Vellvm infrastructure
– OperaGonal SemanGcs
– SSA Metatheory + Proof Techniques
• Case studies:
– So[Bound memory safety
– mem2reg
• Conclusion

Ongoing Work
• Modular SemanGcs
– Factor out memory model [CAV 15]
– Linking/separate compilaGon
• For:
– more extensibility/robustness to changes
– verifying more analyses and opGmizaGons / program transformaGons
– support for (relaxed) concurrency
– beLer support for casts [PLDI 15]

LLVM SSA
core IR
Memory
Model / IO
Concurrency

• Deep SpeciﬁcaGons
– rich, formal, 2-sided, live
• Layers of abstracGon
– Layer Calculus in CerGKOS [Shao et al.]
– Good for proofs!
– Bad for performance?
– ImplicaGons for theory / proof engineering?
• ComposiGonal speciﬁcaGon
– ComposiGonal CompCert [Stewart, et al. PLDI 15]
• So[ware Engineering ⇒ Proof Engineering
– Coq development methodology [CPDT: Chlipala]
What engineering principles enable
large-scale deep specifications?

Conclusions
• Proof techniques for verifying LLVM transformaGons
• Veriﬁed:
– So[bound & vmem2reg
– Similar performance to naGve implementaGons
• Future:
– IntegraGon with other DeepSpec projects
[hLp://www.cis.upenn.edu/~stevez/vellvm/]

TMPA-2017: Vellvm - Verifying the LLVM

About This Presentation

Slide Content

Tags

Categories

Download

Quick Actions

Statistics

Related Slideshows

TMPA-2017: Vellvm - Verifying the LLVM

About This Presentation

Slide Content

Slide 1

Slide 2

Slide 3

Slide 4

Slide 5

Slide 6

Slide 7

Slide 8

Slide 9

Slide 10

Slide 11

Slide 12

Slide 13

Slide 14

Slide 15

Slide 16

Slide 17

Slide 18

Slide 19

Slide 20

Slide 21

Slide 22

Slide 23

Slide 24

Slide 25

Slide 26

Slide 27

Slide 28

Slide 29

Slide 30

Slide 31

Slide 32

Slide 33

Slide 34

Slide 35

Slide 36

Slide 37

Slide 38

Slide 39

Slide 40

Slide 41

Slide 42

Slide 43

Slide 44

Slide 45

Slide 46

Slide 47

Slide 48

Slide 49

Slide 50

Slide 51

Slide 52

Slide 53

Slide 54

Slide 55

Slide 56

Slide 57

Slide 58

Slide 59

Slide 60

Slide 61

Slide 62

Slide 63

Slide 64

Slide 65

Slide 66

Slide 67

Slide 68

Slide 69

Slide 70

Slide 71

Slide 72

Slide 73

Slide 74

Tags

Categories

Download