Clang: More than just a C/C++ Compiler

SamsungOSG 19,454 views 45 slides Oct 06, 2016

About This Presentation

Tilmann Scheller explains why Clang is a great tool for any developer who works in C or C++.


Slide Content

1
Samsung Open Source Group
Clang:
Much more than just a C/C++ Compiler
Tilmann Scheller
Principal Compiler Engineer
[email protected]
Samsung Open Source Group
Samsung Research UK
LinuxCon Europe 2016
Berlin, Germany, October 4 – 6, 2016

2
Overview
●Introduction
●LLVM Overview
●Clang
●Summary

3
Introduction

4
What is LLVM?
●Mature, production-quality compiler framework
●Modular architecture
●Heavily optimizing static and dynamic compiler
●Supports all major architectures (x86, ARM, MIPS, PowerPC, …)
●Powerful link-time optimizations (LTO)
●Permissive license (BSD-like)

5
LLVM sub-projects
●Clang: C/C++/Objective-C frontend and static analyzer
●LLDB: Next generation debugger leveraging the LLVM libraries, e.g. the Clang expression parser
●lld: Framework for creating linkers, will make Clang independent of the system linker in the future
●Polly: Polyhedral optimizer for LLVM, e.g. high-level loop optimizations and data-locality optimizations

6
Which companies are contributing?
[Figure: logos of contributing companies]

7
Who is using LLVM?
●Rust
●Android (NDK, RenderScript)
●Portable NativeClient (PNaCl)
●Majority of OpenCL implementations based on Clang/LLVM
●CUDA
●LLVM on Linux: LLVMLinux, LLVMpipe (software rasterizer in Mesa), AMDGPU drivers in Mesa

8
Clang users
●Default compiler on macOS
●Default compiler on FreeBSD
●Default compiler for native applications on Tizen
●Default compiler on OpenMandriva Lx 3.0
●Debian experimenting with Clang as an additional compiler (94.4% of ~24.5k packages successfully built with Clang 3.8.1)
●Android NDK defaults to Clang

9
LLVM Overview

10
LLVM
●LLVM IR (Intermediate Representation)
●Scalar optimizations
●Interprocedural optimizations
●Auto-vectorizer (BB, Loop and SLP)
●Profile-guided optimizations

11
Compiler architecture
Frontends (C, C++, Fortran) -> Optimizer -> Backends (x86, ARM, MIPS)

12
Compilation steps
●Many steps involved in the translation from C source code to machine code:
–Frontend:
  ●Lexing, Parsing, AST (Abstract Syntax Tree) construction
  ●Translation to LLVM IR
–Middle-end:
  ●Target-independent optimizations (Analyses & Transformations)
–Backend:
  ●Translation into a DAG (Directed Acyclic Graph)
  ●Instruction selection: Pattern matching on the DAG
  ●Instruction scheduling: Assigning an order of execution
  ●Register allocation: Trying to reduce memory traffic
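
One way to look at these stages on a small input, as a sketch: the function scale_and_offset below is a made-up example (not from the talk), and the commands in the comments use Clang driver options that stop after a given stage. The output differs between Clang versions and targets, so none is reproduced here.

```cpp
// a.cc: a tiny translation unit for inspecting the compilation stages.
// scale_and_offset is a hypothetical example function.
//
// Frontend, dump the AST without generating code:
//   clang++ -fsyntax-only -Xclang -ast-dump a.cc
// Frontend output as textual LLVM IR, without optimization:
//   clang++ -O0 -S -emit-llvm a.cc -o a.ll
// Middle-end result as textual LLVM IR, after optimization:
//   clang++ -O2 -S -emit-llvm a.cc -o a.opt.ll
// Backend output as assembly for the default target:
//   clang++ -O2 -S a.cc -o a.s
int scale_and_offset(int x) {
    return 3 * x + 1;
}
```

Comparing a.ll with a.opt.ll is a simple way to see what the target-independent optimizations changed.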

13
LLVM Intermediate Representation
●The representation of the middle-end
●The majority of optimizations are done at LLVM IR level
●Low-level representation which carries type information
●RISC-like three-address code in static single assignment form (SSA) with an infinite number of virtual registers
●Three different formats: bitcode (compact on-disk format), in-memory representation and textual representation (LLVM assembly language)

14
LLVM IR Overview
●Arithmetic: add, sub, mul, udiv, sdiv, ...
–%tmp = add i32 %indvar, -512
●Logical operations: shl, lshr, ashr, and, or, xor
–%shr21 = ashr i32 %mul20, 8
●Memory access: load, store, alloca, getelementptr
–%tmp3 = load i64, i64* %tmp2
●Comparison: icmp, select
–%cmp12 = icmp slt i32 %add, 1024
●Control flow: call, ret, br, switch, ...
–call void @foo(i32 %phitmp)
●Types: integer, floating point, vector, structure, array, ...
–i32, i342, double, <4 x float>, {i8, <2 x i16>}, [40 x i32]
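
LLVM integer types such as i32 carry no signedness; the instruction does (sdiv vs. udiv, ashr vs. lshr, icmp slt vs. icmp ult). The sketch below shows C++ functions for which a frontend would typically pick the signed or the unsigned variant. The function names are made up for illustration, and the instruction names in the comments are the expected choice, not verified compiler output.

```cpp
#include <cstdint>

// Signed operands: typically lowered to sdiv, ashr and icmp slt.
std::int32_t signed_div(std::int32_t a, std::int32_t b) { return a / b; }
// Right shift of a negative value is an arithmetic shift on mainstream
// compilers (and guaranteed to be one since C++20).
std::int32_t signed_shr(std::int32_t a, int n) { return a >> n; }
bool signed_less(std::int32_t a, std::int32_t b) { return a < b; }

// Unsigned operands: typically lowered to udiv, lshr and icmp ult.
std::uint32_t unsigned_div(std::uint32_t a, std::uint32_t b) { return a / b; }
std::uint32_t unsigned_shr(std::uint32_t a, int n) { return a >> n; }
bool unsigned_less(std::uint32_t a, std::uint32_t b) { return a < b; }
```

The same 32-bit pattern gives different results depending on which variant is used, e.g. signed_less(-1, 1) is true while unsigned_less(0xFFFFFFFFu, 1u) is false.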

15
Target-independent code generator
●Part of the backend
●Domain specific language to describe the instruction set, register file, calling conventions (TableGen)
●Pattern matcher is generated automatically
●Backend is a mix of C++ and TableGen
●Usually generates assembly code, direct machine code emission is also possible

16
Clang

17
Clang
●Goals:
–Fast compile time
–Low memory usage
–GCC compatibility
–Expressive diagnostics
●Several tools built on top of Clang:
–Clang static analyzer
–clang-format, clang-tidy

18
Clang Static Analyzer
●Part of Clang
●Tries to find bugs without executing the program
●Slower than compilation
●False positives
●Source annotations
●Works best on C code
●Runs from the command line (scan-build), web interface for results

19
Clang Static Analyzer
●Core Checkers
●C++ Checkers
●Dead Code Checkers
●Security Checkers
●Unix Checkers
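
As an illustration of the kind of defect these checkers target, here is a minimal sketch with a leak on one return path. The helper copy_if_short is hypothetical (not from the talk). A checker from the Unix group, such as unix.Malloc, would be expected to report the early return, but whether and how it is reported depends on the analyzer version.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical helper: returns a malloc'ed copy of `src` if it is at most
// `limit` characters long, otherwise nullptr. The caller frees the result.
char *copy_if_short(const char *src, std::size_t limit) {
    char *buf = static_cast<char *>(std::malloc(limit + 1));
    if (buf == nullptr) {
        return nullptr;
    }
    if (std::strlen(src) > limit) {
        return nullptr;  // Bug: `buf` is never freed on this path.
    }
    std::strcpy(buf, src);
    return buf;
}
```

Running the analyzer through scan-build, e.g. `scan-build clang++ -c a.cc`, analyzes the file during the build and collects any findings in a report.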

20
Clang Static Analyzer

21
Clang Static Analyzer

22
Clang Static Analyzer - Example
...

23
Clang Static Analyzer - Example

24
Clang Static Analyzer - Example

25
clang-format
●Automatic code formatting
●Consistent coding style is important
●Developers spend a lot of time on code formatting (e.g. requesting trivial formatting changes in reviews)
●Supports different coding conventions (~80 settings)
●Includes configurations for LLVM, Google, Chromium, Mozilla and WebKit coding conventions

26
clang-format
●Once the codebase is "clang-format clean" the coding conventions can be enforced automatically
●Simplifies reformatting after automated refactorings
●Uses the Clang lexer
●Supports the following programming languages: C/C++, Java, JavaScript, Objective-C and Protobuf
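
When a codebase is kept clang-format clean, hand-aligned regions can be excluded with the `// clang-format off` and `// clang-format on` comments. The sketch below uses a made-up table and helper (kIdentity3x3, trace3x3) purely for illustration.

```cpp
#include <array>

// Keep the manual 3x3 layout; clang-format leaves this region untouched.
// clang-format off
constexpr std::array<int, 9> kIdentity3x3 = {
    1, 0, 0,
    0, 1, 0,
    0, 0, 1,
};
// clang-format on

// Hypothetical helper: sums the main diagonal of the table above.
int trace3x3() {
    return kIdentity3x3[0] + kIdentity3x3[4] + kIdentity3x3[8];
}
```

A file can then be reformatted in place with one of the predefined styles, e.g. `clang-format -style=LLVM -i a.cc`, or with the settings from a `.clang-format` file in the source tree.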

27
clang-tidy
●Detect bug-prone coding patterns
●Enforce coding conventions
●Advocate modern and maintainable code
●Checks can be more expensive than compilation
●Currently 136 different checks
●Can run static analyzer checks as well
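
A sketch of code that the "modern and maintainable code" checks are aimed at. The functions sum_all and first_or_null are made up for illustration. The check names in the comments (modernize-loop-convert, modernize-use-nullptr) are existing clang-tidy checks, but the exact diagnostics depend on the clang-tidy version and are not shown.

```cpp
#include <cstddef>
#include <vector>

int sum_all(const std::vector<int> &values) {
    int total = 0;
    // Index-based loop over a container: the pattern that
    // modernize-loop-convert suggests turning into a range-based for loop.
    for (std::size_t i = 0; i < values.size(); ++i) {
        total += values[i];
    }
    return total;
}

const int *first_or_null(const std::vector<int> &values) {
    if (values.empty()) {
        return NULL;  // modernize-use-nullptr suggests nullptr here.
    }
    return &values[0];
}
```

A possible invocation is `clang-tidy -checks='-*,modernize-*' a.cc -- -std=c++11`; adding `-fix` lets clang-tidy apply the suggested replacements.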

28
Sanitizers
●LLVM/Clang-based Sanitizer projects:
–AddressSanitizer – Fast memory error detector
–ThreadSanitizer – Detects data races
–LeakSanitizer – Memory leak detector
–MemorySanitizer – Detects reads of uninitialized variables
–UBSanitizer – Detects undefined behavior

29
AddressSanitizer: Stack Buffer Overflow
int main(int argc, char **argv) {
  int stack_array[100];
  stack_array[1] = 0;
  return stack_array[argc + 100];
}
$ clang++ -O1 -fsanitize=address a.cc; ./a.out
==10589== ERROR: AddressSanitizer stack-buffer-overflow
READ of size 4 at 0x7f5620d981b4 thread T0
    #0 0x4024e8 in main a.cc:4
Address 0x7f5620d981b4 is located at offset 436 in frame <main> of T0's stack:
  This frame has 1 object(s):
    [32, 432) 'stack_array'
Example from https://linuxplumbersconf.org/2015/ocw/proposals/3261.html

30
AddressSanitizer: Use-After-Free
int main(int argc, char **argv) {
  int *array = new int[100];
  delete [] array;
  return array[argc];
}
$ clang++ -O1 -fsanitize=address a.cc && ./a.out
==30226== ERROR: AddressSanitizer heap-use-after-free
READ of size 4 at 0x7faa07fce084 thread T0
    #0 0x40433c in main a.cc:4
0x7faa07fce084 is located 4 bytes inside of 400-byte region
freed by thread T0 here:
    #0 0x4058fd in operator delete[](void*) _asan_rtl_
    #1 0x404303 in main a.cc:3
previously allocated by thread T0 here:
    #0 0x405579 in operator new[](unsigned long) _asan_rtl_
    #1 0x4042f3 in main a.cc:2
Example from https://linuxplumbersconf.org/2015/ocw/proposals/3261.html

31
AddressSanitizer: Stack-Use-After-Return
int *g;
void LeakLocal() {
  int local;
  g = &local;
}
int main() {
  LeakLocal();
  return *g;
}
$ clang++ -g -fsanitize=address a.cc
$ ASAN_OPTIONS=detect_stack_use_after_return=1 ./a.out
==19177==ERROR: AddressSanitizer: stack-use-after-return
READ of size 4 at 0x7f473d0000a0 thread T0
    #0 0x461ccf in main a.cc:8
Address is located in stack of thread T0 at offset 32 in frame
    #0 0x461a5f in LeakLocal() a.cc:2
  This frame has 1 object(s):
    [32, 36) 'local' <== Memory access at offset 32
Example from https://linuxplumbersconf.org/2015/ocw/proposals/3261.html

32
MemorySanitizer: Uninitialized Data
int main(int argc, char **argv) {
  int x[10];
  x[0] = 1;
  return x[argc];
}
$ clang -fsanitize=memory a.cc -g; ./a.out
WARNING: Use of uninitialized value
    #0 0x7f1c31f16d10 in main a.cc:4
  Uninitialized value was created by an allocation of 'x' in the stack frame of function 'main'
Example from https://linuxplumbersconf.org/2015/ocw/proposals/3261.html

33
UBSanitizer: Integer Overflow
int main(int argc, char **argv) {
  int t = argc << 16;
  return t * t;
}
$ clang -fsanitize=undefined a.cc -g; ./a.out
a.cc:3:12: runtime error: signed integer overflow: 65536 * 65536 cannot be represented in type 'int'
Example from https://linuxplumbersconf.org/2015/ocw/proposals/3261.html

34
UBSanitizer: Invalid Shift
int main(int argc, char **argv) {
  return (1 << (32 * argc)) == 0;
}
$ clang -fsanitize=undefined a.cc -g; ./a.out
a.cc:2:13: runtime error: shift exponent 32 is too large for 32-bit type 'int'
Example from https://linuxplumbersconf.org/2015/ocw/proposals/3261.html

35
LibFuzzer
●Coverage-guided fuzz testing
●Coverage data provided by SanitizerCoverage (very low overhead, tracking of function-level coverage causes no measurable overhead)
●Best used in combination with the different Sanitizers
●LLVM project has bots which are fuzzing clang-format and Clang continuously
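
A fuzz target for LibFuzzer is a single function with a fixed signature, LLVMFuzzerTestOneInput, which the fuzzing engine calls repeatedly with generated inputs. The sketch below fuzzes a made-up function, count_matched_parens, which stands in for whatever code is under test.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical function under test: counts how many '(' bytes are closed by
// a later ')' byte in the input.
std::size_t count_matched_parens(const std::uint8_t *data, std::size_t size) {
    std::size_t open = 0;
    std::size_t matched = 0;
    for (std::size_t i = 0; i < size; ++i) {
        if (data[i] == '(') {
            ++open;
        } else if (data[i] == ')' && open > 0) {
            --open;
            ++matched;
        }
    }
    return matched;
}

// Entry point that LibFuzzer calls with each generated input.
extern "C" int LLVMFuzzerTestOneInput(const std::uint8_t *data,
                                      std::size_t size) {
    count_matched_parens(data, size);
    return 0;  // Fuzz targets are expected to return 0.
}
```

With recent Clang releases the target can be built together with AddressSanitizer using `clang++ -g -fsanitize=fuzzer,address a.cc`; the build steps for older releases differ.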

36
IDEs/code browsers using Clang
●Code browsers:
–SourceWeb
–Woboq Code Browser
–Coati
–Doxygen
●Editor plugins:
–YouCompleteMe
–clang_complete
–rtags
●IDEs:
–KDevelop
–Qt Creator
–CodeLite
–Geany

37
Summary

38
Summary
●Great compiler infrastructure
●Fast C/C++ compiler with expressive diagnostics
●Bug detection at compile time
●Automated formatting of code
●Detect bugs early with Sanitizers
●Highly accurate source code browsing, code completion

39
Give it a try!
●Visit llvm.org
●Distributions with Clang/LLVM packages:
–Fedora
–Debian/Ubuntu
–openSUSE
–Arch Linux
–...and many more

40
Thank you.

41
Contact Information:
Tilmann Scheller
[email protected]
Samsung Open Source Group
Samsung Research UK

42
Example
zx = zy = zx2 = zy2 = 0;
for (; iter < max_iter && zx2 + zy2 < 4; iter++) {
  zy = 2 * zx * zy + y;
  zx = zx2 - zy2 + x;
  zx2 = zx * zx;
  zy2 = zy * zy;
}
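
The slide shows only the loop. For experimenting with it, a self-contained version might look like the sketch below. The function name, the parameter list and the return value are assumptions. The types (double for the coordinates, an unsigned 32-bit counter) are inferred from the LLVM IR on the following slide, which uses double operations and an unsigned comparison for iter.

```cpp
// Hypothetical wrapper around the loop from the slide: returns the number of
// iterations executed for the point (x, y), capped at max_iter.
unsigned mandelbrot_iterations(double x, double y, unsigned max_iter) {
    double zx, zy, zx2, zy2;
    unsigned iter = 0;
    zx = zy = zx2 = zy2 = 0;
    for (; iter < max_iter && zx2 + zy2 < 4; iter++) {
        zy = 2 * zx * zy + y;  // Uses the old zx, so it must come first.
        zx = zx2 - zy2 + x;    // Uses the cached squares from the last round.
        zx2 = zx * zx;
        zy2 = zy * zy;
    }
    return iter;
}
```

For the origin the loop never escapes and runs max_iter times; for a point far outside, such as (2, 2), it stops after the first iteration.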

43
Example
zx = zy = zx2 = zy2 = 0;
for (; iter < max_iter && zx2 + zy2 < 4; iter++) {
  zy = 2 * zx * zy + y;
  zx = zx2 - zy2 + x;
  zx2 = zx * zx;
  zy2 = zy * zy;
}
loop:
  %zy2.06 = phi double [ %8, %loop ], [ 0.000000e+00, %preheader ]
  %zx2.05 = phi double [ %7, %loop ], [ 0.000000e+00, %preheader ]
  %zy.04 = phi double [ %4, %loop ], [ 0.000000e+00, %preheader ]
  %zx.03 = phi double [ %6, %loop ], [ 0.000000e+00, %preheader ]
  %iter.02 = phi i32 [ %9, %loop ], [ 0, %preheader ]
  %2 = fmul double %zx.03, 2.000000e+00
  %3 = fmul double %2, %zy.04
  %4 = fadd double %3, %y
  %5 = fsub double %zx2.05, %zy2.06
  %6 = fadd double %5, %x
  %7 = fmul double %6, %6
  %8 = fmul double %4, %4
  %9 = add i32 %iter.02, 1
  %10 = icmp ult i32 %9, %max_iter
  %11 = fadd double %7, %8
  %12 = fcmp olt double %11, 4.000000e+00
  %or.cond = and i1 %10, %12
  br i1 %or.cond, label %loop, label %loopexit

44
Example
loop:
  ; zx = zy = zx2 = zy2 = 0;
  %zy2.06 = phi double [ %8, %loop ], [ 0.000000e+00, %preheader ]
  %zx2.05 = phi double [ %7, %loop ], [ 0.000000e+00, %preheader ]
  %zy.04 = phi double [ %4, %loop ], [ 0.000000e+00, %preheader ]
  %zx.03 = phi double [ %6, %loop ], [ 0.000000e+00, %preheader ]
  %iter.02 = phi i32 [ %9, %loop ], [ 0, %preheader ]
  ; zy = 2 * zx * zy + y;
  %2 = fmul double %zx.03, 2.000000e+00
  %3 = fmul double %2, %zy.04
  %4 = fadd double %3, %y
  ; zx = zx2 - zy2 + x;
  %5 = fsub double %zx2.05, %zy2.06
  %6 = fadd double %5, %x
  ; zx2 = zx * zx;
  %7 = fmul double %6, %6
  ; zy2 = zy * zy;
  %8 = fmul double %4, %4
  ; iter++
  %9 = add i32 %iter.02, 1
  ; iter < max_iter
  %10 = icmp ult i32 %9, %max_iter
  ; zx2 + zy2 < 4
  %11 = fadd double %7, %8
  %12 = fcmp olt double %11, 4.000000e+00
  ; &&
  %or.cond = and i1 %10, %12
  br i1 %or.cond, label %loop, label %loopexit
zx = zy = zx2 = zy2 = 0;
for (;
     iter < max_iter
     && zx2 + zy2 < 4;
     iter++) {
  zy = 2 * zx * zy + y;
  zx = zx2 - zy2 + x;
  zx2 = zx * zx;
  zy2 = zy * zy;
}

45
Example
.LBB0_2:
  @ d17 = 2 * zx
  vadd.f64 d17, d12, d12
  @ iter < max_iter
  cmp r1, r0
  @ d17 = (2 * zx) * zy
  vmul.f64 d17, d17, d11
  @ d18 = zx2 - zy2
  vsub.f64 d18, d10, d8
  @ d12 = (zx2 - zy2) + x
  vadd.f64 d12, d18, d0
  @ d11 = (2 * zx * zy) + y
  vadd.f64 d11, d17, d9
  @ zx2 = zx * zx
  vmul.f64 d10, d12, d12
  @ zy2 = zy * zy
  vmul.f64 d8, d11, d11
  bhs .LBB0_5
@ BB#3:
  @ zx2 + zy2
  vadd.f64 d17, d10, d8
  @ iter++
  adds r1, #1
  @ zx2 + zy2 < 4
  vcmpe.f64 d17, d16
  vmrs APSR_nzcv, fpscr
  bmi .LBB0_2
  b .LBB0_5
zx = zy = zx2 = zy2 = 0;
for (;
     iter < max_iter
     && zx2 + zy2 < 4;
     iter++) {
  zy = 2 * zx * zy + y;
  zx = zx2 - zy2 + x;
  zx2 = zx * zx;
  zy2 = zy * zy;
}