Code Optimization
Consideration for Optimization, Scope of Optimization, Optimization Techniques, Flow graph
1
•Compilerfront-end:lexicalanalysis,syntaxanalysis,semanticanalysis
Tasks:understandingthesourcecode,makingsurethesourcecodeis
writtencorrectly
•Compilerback-end:Intermediatecodegeneration/improvement,and
Machinecodegeneration/improvement
Tasks:translatingtheprogramtoasemanticallythesameprogram(in
adifferentlanguage).
Design Of a Compiler
2
What is Code Optimization?
•Optimizationisaprogramtransformationtechnique,whichtriestoimprovethecodethat
consumelessresources(i.e.CPU,Memory)anddeliverhighspeed.
•Inoptimization,high-levelgeneralprogrammingconstructsarereplacedbyveryefficientlow-
levelprogrammingcodes.
Acodeoptimizingprocessmustfollowthethreerulesgivenbelow:
1.Theoutputcodemustnotchangethemeaningoftheprograminanyway.
Shouldnotchangetheoutputproducedforanyinput
Shouldnotintroduceanerror
2.Optimizationshouldincreasethespeedoftheprogramandifpossible,theprogramshould
demandlessresources.
3.Optimizationshoulditselfbefastandshouldnotdelaytheoverallcompilingprocess.
3
•SourceCode:
-Algorithmstransformationcanproducespectacularimprovements
•IntermediateCode:
-Compilercanimproveloops,procedurecallsandaddresscalculations
-Typicallyonlyoptimizingcompilersincludethisphase
•TargetCode:
-Compilerscanuseregistersefficiently
Optimizedcode’s features:
•Executesfaster
•Code size getreduced
•Efficient memoryusage
•Yielding betterperformance
•Reduces the time and space complexity
4
Improvements can be made at various phases:
Organization of an optimizing compiler
5
Control
flow
analysis
Data flow
analysis
Transformation
Code optimizer
Intermediate Code
Generator
Target Code Generator
•Flow analysis is a fundamental prerequisite for many important types of code improvement.
•Generally control flow analysis precedes data flow analysis.
•Control flow analysis (CFA) represents flow of control usually in form of graphs, CFA
constructs such as
•control flow graph -graphical representation ofcontrol flowor computation during the execution.
•Call graph -represents calling relationships between subroutines.
•Data flow analysis (DFA) is the process of asserting and collecting information prior to
program execution about the possible modification, preservation, and use of certain entities
(such as values or attributes of variables) in a computer program.
6
Flow analysis -Organization of an optimizing compiler
Basic Blocks
Basic blocks
•Basic blocks are sequences of intermediate code in which flow of control enters at the beginning and leaves at the
end without any halt or possibility of branching except at the end.
Example
•Sequence of three-address statements forms a basic block:
t1 : = a * a
t2 : = a * b
t3 : = 2 * t2
t4 : = t1 + t3
t5 : = b * b
t6 : = t4 + t5
•Basic blocks are represented as directed acyclic blocks(DAGs), which are in turn represented using the value-
numbering method applied on quadruples
•Optimizationis done on these basicblocks
A basic block begins in one of the followingways:
•the entry point into thefunction.
•the target of a branch (can be alabel)
•the instruction immediately following a branch or areturn
A basic blockends in any of the following ways:
•a jumpstatement
•a conditional or unconditionalbranch
•a returnstatement
Basic Block Representation
•A basic block is represented as a recordconsisting of
1.a count of the number of quadruplesin the block
2.a pointer to the leaderof the block which is
1.first statement of basic blocks
2.Any statement that is the target of a conditional or unconditional gotois a leader.
3.Any statement that immediately follows a gotoor conditional gotostatement is a leader.
3.pointers to the predecessorsof the block –Previous block of current block
4.pointers to the successorsof the block –Next block of current block
•Note that jump statements point to basic blocks and not quadruples so as to make code movement
easy
9
Basic Block Example
{
prod =0;
i=1;
do
{
prod =prod+ a[i] * b[i];
i=i+1;
}
while i< = 20
}
10
•Source code for dot product of two vectors a and
b of length 20 prod = 0
i= 1
L1: t1 = 4* i
t2 = a[t1] /*compute a[i] */
t3 = 4* i
t4 = b[t3] /*compute b[i] */
t5 = t2*t4
t6 = prod+t5
prod = t6
t7 = i+1
i= t7
if i<=20 gotoL1
•Three Address code of source code
CONTROL FLOW GRAPH (CFG)
•A graph representation of three-address statements, called a flow graph.
•The nodes of the CFG are basic blocks
•Nodes in the flow graph represent computations, and the edges represent the flow of control.
•Graph showing the different possible paths of programflow.
•CFG is constructed by dividing the code into basicblocks
•Flow graphs show control flow among basicblocks.
•It is useful for understanding code-generation algorithms, even if the graph is not explicitly
constructed by a code-generation algorithm.
Global Data Flow Analysis
•Collect information about the whole program.
•Distribute the information to each block in the flow graph.
•Data flow information: Information collected by data flow analysis.
•Data flow equations: A set of equations solved by data flow analysis to gather data flow information.
•Data flow analysis should never tell us that a transformation is safe.
•When doing data flow analysis we must be
•Conservative
•Do not consider information that may not preserve the behavior of the program
•Aggressive
•Try to collect information that is as exact as possible, so we can get the greatest benefit from
the optimizations.
13
Global Iterative Data Flow Analysis
•Global:
•Performed on the flow graph
•Goalis to collect information at the beginning and end of each basic block
•Iterative:
•Construct data flow equations that describe how information flows through each basic block and solve
them by iteratively converging on a solution.
•Components of data flow equations
•Sets containing collected information
•in set: information coming into the BB from outside (following flow of data)
•gen set: information generated/collected within the BB
•kill set: information that, due to action within the BB, will affect what has been collected outside
the BB
•out set: information leaving the BB
•Functions (operations on these sets)
•Transfer functionsdescribe how information changes as it flows through a basic block
•Meet functionsdescribe how information from multiple paths is combined.
14
•ACodeoptimizerlocatesbetweentheintermediatecodegeneratorandcode
generator.
–Candocontrolflowanalysis.
–Candodataflowanalysis.
–Doestransformationstoimprovetheintermediatecode.
15
The Golden Rules of Optimization
The 80/20 Rule
In general, 80% percent of a program’s execution time is spent executing 20%
of the code
Major Classifications of Code Optimization techniques
Machine Dependent Optimization
Machine dependent optimizations are based on register allocation and utilization of special machine-instruction sequences.
It involves CPU registers and may have absolute memory references rather than relative references.
Peephole optimization
Register Allocation and Instruction Selection (Special Hardware features)
Machine Independent Optimization
It is a program transformation that improve the target code without taking into consideration any properties (CPU registers
and/or absolute memory locations) of the target machine.
Local optimizations
Global Optimizations –Data flow analysis
Inter-procedural optimizations
Loop optimization
16
Scope OfOptimization
•Peephole analysis
•Within one or a few instructions
•Local analysis
•Within a basic block
•Global analysis
•Entire procedure or within a certain scope
•Inter-procedural analysis
•Beyond a procedure, consider the entire program
17
Classification ofoptimization
There are primarily 3 types ofoptimizations:
(1)Localoptimization
•Apply to a basic block inisolation
(2)Global optimization
•Apply across basic blocks
(3)peep-holeoptimization
•Apply acrossboundaries
Most compilers do (1), many do (2)and very few do(3)
18
ConstantFolding
•Evaluate constant expressions at compile time.
20
c:= 1 + 3 c:= 4
!true false
Example:
In the code fragment below, the expression (3 + 5) can be evaluated at compile time and replaced with the constant 8.
intf ()
{
return 3 + 5;
}
Below is the code fragment afterconstant folding.
intf ()
{
return 8;
}
Expressionswithconstantoperandscanbe
evaluatedatcompiletime,thusimprovingrun-time
performanceandreducingcodesizebyavoiding
evaluationatrun-time.
ConstantPropagation
•Variables that have constant value, e.g. b:= 3
•Later uses of bcan be replaced by the constant, if no change of b inbetween.
21
Example:
•In the code fragment below, the value of xcan be propagated to the use of x.
x = 3;
y = x + 4;
•Below is the code fragment afterconstant propagation and constant folding.
x = 3;
y = 7;
b := 3
c := 1 + b
d := b+ c
b := 3
c := 1 + 3
d := 3+ c
Constants assigned to a variable can be propagated through the flow graph and substituted at the use of the
variable.
Algebraic Simplification
•Use algebraic properties to simplify expressions
•Some expressions can be simplified by replacing them with an equivalent expression that is more efficient.
22
-(-i) i
Example:
The code fragment below contains expressions that can be simplified.
void f (inti)
{
a[0] = i+ 0;
a[1] = i* 0;
a[2] = i-i;
a[3] = 1 + i+ 1;
}
Below is the code fragment after expression simplification.
void f (inti)
{
a[0] = i;
a[1] = 0;
a[2] = 0;
a[3] = 2 + i;
}
Operator StrengthReduction
•Replace expensive operations with simpler ones
•Typical cases of strength reduction occurs in address calculation of array references.
•Example: Multiplications replaced by additions ,
23
y := x * 2 y := x + x
Replacement of an operatorwith a less costly one.
temp = 5;
for i=1 to 10 for i=1 to 10
{ {
… …
x = i* 5 x = temp
… …
temp = temp + 5
} }
CopyPropagation
•Given an assignmentx = y, replace later uses of xwith uses of y, provided there are no
intervening assignments to xor y.
•Example
x[i] = a; x[i] = a;
sum = x[i] + a; sum = a + a;
•Example
x := y; s := y * f(y)
s := x * f(x)
24
After y is assigned to x, use y to replace x till x is assigned again reduce the copying.
If y is reassigned in between, then this action cannot be performed.
Dead CodeElimination
•Dead Code are portion of the program which will not be executed in any path of the program. It can be removed
•Examples:
•No control flows into a basic block
•A variable is dead at a point (i.e) its value is not used anywhere in the program
•An assignment is dead (i.e)assignment assigns a value to a dead variable
•Ineffective statements:
x := y + 1 (x is immediately redefined in 3
rd
line without use, therefore eliminate)
y := 5 y := 5
x := 2 * z x := 2 * z
25
•A variable is dead if it is never used after last definition
•Eliminate assignments to dead variables
•Need to do data flow analysis to find dead variables
•Remove code never reached
26
if (false)
{a := 5}
if (false)
{}
Dead CodeElimination
Other types of Local Optimizations
•The following two optimizations can be applied only on DAG or tree representation of
basic block
•Reordering statements that do not depend on one another
•Reordering computations using algebraic laws
Optimization of Basic Blocks
•Many structure preserving transformations can be implemented by construction of DAGs of
basic blocks
27
Example of a Directed Acyclic Graph (DAG)
28
DAGrepresentation of Basic Block (BB)
•Leaves are labeled with unique identifier (variable name or constant)
•Interior nodes are labeled by an operator symbol
•Nodes optionally have a list of labels (identifiers)
•Edges relates operands to the operator (interior nodes are operator)
•Interior node represents computed value
•Identifier in the label are deemed to hold the value
29
Example: DAGfor BB
30
t
1:= 4 * i
t
1
*
i4
t
1:= 4 * i
t
3:= 4 * i
t
2:= t
1+ t
3
*
i
4
+
t
1, t
3
t
2
if (i <= 20)goto L
1
<=
i
20
(L
1)
Construction of DAGs for BB
•Input: Basic block, B
•Output: A DAGfor Bcontaining the following information:
1)A label for each node
2)For leaves the labels are identifiers or constants
3)For interior nodes the labels are operators
Data structure and functions:
•Node:
1)Label: label of the node
2)Left: pointer to the left child node
3)Right: pointer to the right child node
4)List: list of additional labels (empty for leaves)
•Node (id): returns the most recent node created for id. else return undef
•Create(id,l,r): create a node with labelidwith las left child and ras right
child. land rare optional parameters.
31
Algorithm:
For each 3AC, Ain BB
if A is any of the following forms:
1.x= yop z
2.x= op y
3.x= y
1.if ((n
y= node(y)) == undef)
n
y= Create (y);
if (A= = type 1) and((n
z= node(z)) == undef)
n
z= Create(z);
32
Construction of DAGs for BB
2.If (A= = type 1) //x= yop z
Find a node labelled ‘op’ with left and right as n
yand n
zrespectively [determination of common sub-expression]
If (not found)
n = Create (op, n
y, n
z);
If (A= = type 2) //x= op y
Find a node labelled ‘op’ with a single child as n
y
If (not found)
n = Create (op, n
y);
If (A= = type 3) n = Node (y); //x= y
2. Remove x from Node(x).list
Add xin n.list
Node(x) = n;
Example: DAGconstruction from BB
t
1:= 4 * i
33
*
i4
t
1
Example: DAGconstruction from BB
34
t
1:= 4 * i
t
2:= a [ t
1]
*
i4
t
1
[]
a
t
2
Example: DAGconstruction from BB
35
t
1:= 4 * i
t
2:= a [ t
1]
t
3:= 4 * i
*
i4
t
1, t
3
[]
a
t
2
Example: DAGconstruction from BB
36
t
1:= 4 * i
t
2:= a [ t
1]
t
3:= 4 * i
t
4:= b [ t
3]
*
i4
t
1, t
3
[]
a
t
2
[]
b
t
4
Example: DAGconstruction from BB
37
t
1:= 4 * i
t
2:= a [ t
1]
t
3:= 4 * i
t
4:= b [ t
3]
t
5:= t
2+ t
4
*
i4
t
1, t
3
[]
a
t
2
[]
b
t
4
t
5+
Example: DAGconstruction from BB
38
t
1:= 4 * i
t
2:= a [ t
1]
t
3:= 4 * i
t
4:= b [ t
3]
t
5:= t
2+ t
4
i:= t
5
*
i4
t
1, t
3
[]
a
t
2
[]
b
t
4
t
5,i+
•Observations:
•A leaf node for the initial value of an id
•A node nfor each statement s
•The children of node nare the last definition (prior to s) of the operands of n
Optimization of Basic Blocks
•Common sub-expression elimination: by construction of DAG
•Note:for common sub-expression elimination, we are actually targeting for expressions that compute the
same value.
39
a = b + c
b = b –d
c = c + d
e = b + c
Common expressions
(But do not generate the same result)
•DAG representation identifies expressions that yield the same result
a = b + c
b = b –d
c = c + d
e = b + c
b
0
c
0 d
0
+
++ -a b c
e
•Dead code elimination: Code generation from DAG eliminates dead code.
40
a := b + c
b := a –d
d := a –d
c := d + c
b is not live
c
a := b + c
d := a -d
c := d + c
b
0
c
0
d
0
+
-
+
a
b, d×
Optimization of Basic Blocks
Eliminating Redundant Loads and Stores
•If the target code contains the instruction sequence:
•Instruction2canalwaysberemovedifitdoesnothavealabel.
•If it is labeled, there is no guarantee that step 1 will always be executed
before step 2.
44
MOV R, a
MOV a, R
goto L1
...
...
MOV R, a
L1: MOV a, R
Eliminating Multiple Jumps
•If we have jumps to other jumps, then the unnecessary jumps can be
eliminated in either intermediate code or the target code.
•Example: If we have a jump sequence:
45
if a< bgotoL1
...
L1: gotoL2
can be replaced by:
if a< bgotoL2
...
L1: gotoL2
If there are no jumps toL1, then it may be possible to eliminate the
statement, provided it is preceded by an unconditional jump.
Eliminating Unreachable Code
•An unlabeled instruction that immediately follows an unconditional jump can possibly be
removed, and this operation can be repeated in order to eliminate a sequence of instructions.
46
intdebug = 0
if (debug) {
print debugging information
}
this may be translated as
if debug = = 1 gotoL1
gotoL2
L1: print debugging information
L2:
The statements that print the debugging information are
unreachable and can be eliminated
if debug != 1 gotoL2
print debugging information
L2:
Using Machine Idioms
•The target machine may have hardware instructions to implement certain specific operations
efficiently.
•Detecting situations that permit the use of these instructions can reduce execution time
significantly.
•For example, some machines have auto-increment and auto-decrement addressing modes.
•Using these addressing modes can greatly improve the quality of the code when pushing
or popping a stack.
•These modes can also be used for implementing statements likea=a+ 1.
48
replace Add #1,R
by IncR
Other types of Codeoptimization
1.Function preserving Transformations
Common Sub-expressionRemoval
Copy propagation,
Dead-code elimination,
Constant folding
2.LoopOptimization
3.Two types of basic block optimizations
I. Structure-Preserving Transformations
•Common sub-expression elimination
•Dead code elimination
•Renaming of temporary variables
•Interchange of two independent adjacent statements.
II.Algebraic Transformations
49
Common Sub expressionelimination
•Common Sub expression elimination is a optimization that searches for instances of
identical expressions (i.e. they all evaluate the same value), and
•Analyses whether it is worthwhile replacing with a single variable holding the computed
value.
50
Identify common sub-expression present in different expression, compute once, and use the result in all the places.
Example:
a := b * c temp := b * c
… a := temp
… …
x := b * c + 5 x := temp + 5
Common Sub-expressionelimination
•Common sub-expression elimination
•Example 1:
a := b + c a := b + c
c := b + c c := a
d := b + c d := a
•Example 2: in array index calculations
•c[i+1] := a[i+1] + b[i+1]
•During address computation, i+1 should be reused
•Not visible in high level code, but in intermediate code
52
Common Sub-expressionevaluation
z : = a + b + 10
a : = b
1
2 3
4
“a + b” is not a common
sub-expression in 1 and 4
x : = a + b
Dead codeOptimization:
•DeadCodeeliminationremovescodethatdoesnotaffectaprogram.
•Removingsuchcodehastwobenefits.
•Itshrinksprogramsize.
•Itavoidstheexecutingirrelevantoperations,whichreducesitsrunningtime.
•TwotypesofDeadCodeelimination
•UnreachableCode
•Redundantstatement
53
UnreachableCode -Dead codeOptimization
•In Computer Programming, Unreachable Code or dead code is code that exists in
the source code of a program but can never beexecuted.
Program Code
if(a>b)
m=a
elseif (a<b)
m=b
elseif (a==b)
m=0
else
m=-1
OptimizedCode
if (a>b) m=a
elseif (a<b) m=b
else
m=0
54
RedundantCode -Dead codeOptimization
•Redundant Code is code that is executed but has no effect on the output
from aprogram
main()
{
inta, b, c, r;
a=5;
b=6;
c=a +b;
r=2; r++;
printf(“%d”,c);
}
Adding time & spacecomplexity
55
Code Motion -Loop Optimization
•Example -Computation can be moved to outside of the loop
i=1
s=0
do{
s= s +i
a=5
i= i +1
}
while (i <=n)
i=1
s=0
a =5
do{
s= s + i
i= i +1
}
while (i <=n)
Bringing a=5 outside the do while loop, is called code motion.
57
•Example
for (i=0; i<n; i++)
a[i] = a[i] + x/y;
•Three address code
for (i=0; i<n; i++)
{
c = x/y;
a[i] = a[i] + c;
}
c = x/y;
for (i=0; i<n; i++)
a[i] = a[i] + c;
58
Code Motion -Loop Optimization
Code hoisting -Loop Optimization
•CodeSpacereduction:Similartocommonsub-expressioneliminationbutwiththe
objectivetoreducecodesize.
59
Example: Code hoisting
temp : = x ** 2
if (a< b) then if (a< b) then
z := x ** 2 z := temp
else else
y := x ** 2 + 10 y := temp + 10
“x ** 2“ is computed once in both cases, but the code size in the second case reduces.
Induction variable elimination
•If there are multiple induction variablesin a loop, can eliminate the ones which are used only in
the test condition
60
The code fragment below has three induction
variables (i1, i2, and i3)that can be replaced
with one induction variable
inta[SIZE];
intb[SIZE];
void f (void)
{
inti1, i2, i3;
for (i1 = 0, i2 = 0, i3 = 0; i1 < SIZE; i1++)
a[i2++] = b[i3++];
return;
}
The code fragment below shows the loop
after induction variable elimination.
inta[SIZE];
intb[SIZE];
void f (void)
{
inti1;
for (i1 = 0; i1 < SIZE; i1++)
a[i1] = b[i1];
return;
}
•Example
s := 0;
for (i=0; i<n; i++)
{
s := 4 * i;
…
}
61
Induction variable elimination
s := 0;
e := 4*n;
while (s < e)
{
s := s + 4;
}
//iis not referenced in loop
Loop Fusion -Loop Optimization
•Example
for (i=0; i<n; i++) {
A[i] = B[i] + 1
}
for (i=0; i<n; i++) {
C[i] = A[i] / 2
}
for (i=0; i<n; i++) {
D[i] = 1 / C[i+1]
}
Before Loop Fusion
for (i=0; i<n; i++) {
A[i] = B[i] + 1
C[i] = A[i] / 2
D[i] = 1 / C[i+1]
}
Is this correct?
Actually, cannot fuse the third loop
for (i=0; i<n; i++) {
A[i] = B[i] + 1
C[i] = A[i] / 2
}
for (i=0; i<n; i++) {
D[i] = 1 / C[i+1]
}
Loop unrolling or Loop collapsing -Loop Optimization
•Execute loop body multiple times at each iteration
•Try to get rid of the conditional branches, if possible
•Allow optimization to cross multiple iterations of the loop
•Especially for parallel instruction execution
63
Loop unrolling or Loop collapsing -Loop Optimization
64
Example:
In the code fragment below, the double-nested loop oniand jcan be collapsed into a single-nested loop.
inta[100][300];
for (i= 0; i< 300; i++)
for (j = 0; j < 100; j++)
a[j][i] = 0;
Here is the code fragment after the loop has been collapsed.
inta[100][300];
int*p = &a[0][0];
for (i= 0; i< 30000; i++)
*p++ = 0;
Renaming temporary variables
•A statement t : = b + c ( t is a temporary ) can be changed to u : = b + c (u is a new
temporary) and
•All uses of this instance of t can be changed to u without changing the value of the basic
block.
•Such a block is called a normal-form block.
65
Interchange of two independent adjacent statements
•Suppose a block has the following two adjacent statements:
t1 : = b + c
t2 : = x + y
•We can interchange the two statements without affecting the value of the block if and only
if neither x nor y is t1 and neither b nor c is t2.
Reference
•A.V. Aho, M.S. Lam, R. Sethi, J. D. Ullman, Compilers Principles,
Techniques and Tools, Pearson Edition, 2013.
P. Kuppusamy -Lexical Analyzer