Intermediate code generation1

shashwat2010 5,725 views 56 slides Oct 30, 2010
Slide 1
Slide 1 of 56
Slide 1
1
Slide 2
2
Slide 3
3
Slide 4
4
Slide 5
5
Slide 6
6
Slide 7
7
Slide 8
8
Slide 9
9
Slide 10
10
Slide 11
11
Slide 12
12
Slide 13
13
Slide 14
14
Slide 15
15
Slide 16
16
Slide 17
17
Slide 18
18
Slide 19
19
Slide 20
20
Slide 21
21
Slide 22
22
Slide 23
23
Slide 24
24
Slide 25
25
Slide 26
26
Slide 27
27
Slide 28
28
Slide 29
29
Slide 30
30
Slide 31
31
Slide 32
32
Slide 33
33
Slide 34
34
Slide 35
35
Slide 36
36
Slide 37
37
Slide 38
38
Slide 39
39
Slide 40
40
Slide 41
41
Slide 42
42
Slide 43
43
Slide 44
44
Slide 45
45
Slide 46
46
Slide 47
47
Slide 48
48
Slide 49
49
Slide 50
50
Slide 51
51
Slide 52
52
Slide 53
53
Slide 54
54
Slide 55
55
Slide 56
56

About This Presentation

No description available for this slideshow.


Slide Content

Intermediate code generation Shashwat Shriparv [email protected] InfinitySoft

Intermediate Code Generation Translating source program into an “intermediate language.” Simple CPU Independent, …yet, close in spirit to machine language. Benefits is Retargeting is facilitated Machine independent Code Optimization can be applied.

Intermediate Code Generation Intermediate codes are machine independent codes, but they are close to machine instructions. The given program in a source language is converted to an equivalent program in an intermediate language by the intermediate code generator. Intermediate language can be many different languages, and the designer of the compiler decides this intermediate language. syntax trees can be used as an intermediate language. postfix notation can be used as an intermediate language. three-address code (Quadruples) can be used as an intermediate language we will use quadruples to discuss intermediate code generation quadruples are close to machine instructions, but they are not actual machine instructions.

Types of Intermediate Languages Graphical Representations. Consider the assignment a:=b*-c+b*-c: assign a + * * uminus uminus b c c b assign a + * uminus b c

Syntax Dir. Definition to produce syntax trees for Assignment Statements. PRODUCTION Semantic Rule S  id := E { S .nptr = mknode (‘assign’, mkleaf (id, id .entry ) , E. nptr ) } E  E 1 + E 2 {E. nptr = mknode (‘+’, E 1 . nptr ,E 2 .nptr ) } E  E 1 * E 2 {E. nptr = mknode (‘*’, E 1 . nptr ,E 2 .nptr ) } E  - E 1 {E. nptr = mknode (‘uminus’, E 1 . nptr ) } E  ( E 1 ) {E. nptr = E 1 . nptr } E  id {E. nptr = mkleaf (id, id .entry ) }

Three Address Code Statements of general form x:=y op z No built-up arithmetic expressions are allowed. As a result, x:=y + z * w should be represented as t 1 :=z * w t 2 :=y + t 1 x:=t 2 Observe that given the syntax-tree or the dag of the graphical representation we can easily derive a three address code for assignments as above. In fact three-address code is a linearization of the syntax tree. Three-address code is useful: related to machine-language/ simple/ optimizable. x,y,z - names,constants or compiler-generated temporaries t1 , t2 – compiler generated temporary names

3 address code for the syntax tree and the dag a:=b*-c+b*-c: assign a + * * uminus uminus b c c b assign a + * uminus b c Syntax tree Dag

3-address codes are t 1 :=- c t 2 :=b * t 1 t 5 :=t 2 + t 2 a:=t 5 t 1 :=- c t 2 :=b * t 1 t 3 :=- c t 4 :=b * t 3 t 5 :=t 2 + t 4 a:=t 5 Syntax tree Dag

Types of Three-Address Statements. Assignment Statement: x:=y op z Assignment Statement: x:=op z Copy Statement: x:=z Unconditional Jump: goto L Conditional Jump: if x relop y goto L Stack Operations: Push/pop More Advanced Procedure: param x 1 param x 2 … param x n call p,n Index Assignments: x:=y[ i ] x[ i ]:=y Address and Pointer Assignments: x:=&y x:=*y *x:=y Generated as part of call of proc. p(x1,x2,……,xn)

Syntax-Directed Translation into 3-address code.

Syntax-Directed Translation for 3-address code for assignment statements Use attributes E. place to hold the name of the “place” that will hold the value of E Identifier will be assumed to already have the place attribute defined. For example, the place attribute will be of the form t0, t1, t2, … for identifiers and v0,v1,v2 etc. for the rest. E. code to hold the three address code statements that evaluate E (this is the `translation’ attribute). Use function newtemp that returns a new temporary variable that we can use. Use function gen to generate a single three address statement given the necessary information (variable names and operations).

Syntax-Dir. Definition for 3-address code PRODUCTION Semantic Rule S  id := E { S .code = E. code || gen (id .place ‘=’ E. place ) } E  E 1 + E 2 {E. place = newtemp ; E .code = E 1 . code || E 2 . code || || gen (E. place ‘:=’E 1 . place ‘+’E 2 . place ) } E  E 1 * E 2 {E. place = newtemp ; E .code = E 1 . code || E 2 . code || || gen (E. place ‘=’E 1 . place ‘*’E 2 . place ) } E  - E 1 {E. place = newtemp ; E .code = E 1 . code || || gen (E. place ‘=’ ‘uminus’ E 1 . place ) } E  ( E 1 ) {E. place = E 1 . place ; E. code = E 1 . code } E  id {E. place = id .entry ; E. code = ‘’ } e.g. a := b * - (c+d) ‘||’: string concatenation

while statements E.g. while statements of the form “while E do S” (interpreted as while the value of E is not 0 do S) PRODUCTION S  while E do S 1 Semantic Rule S.begin = newlabel ; S.after = newlabel ; S. code = gen (S. begin ‘:’) || E. code || gen (‘if’ E. place ‘=’ ‘0’ ‘goto’ S. after ) || S 1 . code || gen(‘goto’ S. begin ) || gen(S. after ‘:’) E.code If E.place = 0 goto S.after S1.code Goto S.begin S.begin: S.after: ………………. To mark the 1 st stmt. In code for E stmt. following code S

Implementation of 3 address code Quadruples Triples Indirect triples

Quadruples A quadruple is a record structure with four fields: op , arg1 , arg2 , and result The op field contains an internal code for an operator Statements with unary operators do not use arg2 Operators like param use neither arg2 nor result The target label for conditional and unconditional jumps are in result The contents of fields arg1 , arg2 , and result are typically pointers to symbol table entries

Implementations of 3-address statements Quadruples t 1 :=- c t 2 :=b * t 1 t 3 :=- c t 4 :=b * t 3 t 5 :=t 2 + t 4 a:=t 5 op arg1 arg2 result (0) uminus c t 1 (1) * b t 1 t 2 (2) uminus c (3) * b t 3 t 4 (4) + t 2 t 4 t 5 (5) := t 5 a a:=b*-c+b*-c:

Triples Triples refer to a temporary value by the position of the statement that computes it Statements can be represented by a record with only three fields: op, arg1, and arg2 Avoids the need to enter temporary names into the symbol table Contents of arg1 and arg2: Pointer into symbol table (for programmer defined names) Pointer into triple structure (for temporaries)

Implementations of 3-address statements, II Triples t 1 :=- c t 2 :=b * t 1 t 3 :=- c t 4 :=b * t 3 t 5 :=t 2 + t 4 a:=t 5 op arg1 arg2 (0) uminus c (1) * b (0) (2) uminus c (3) * b (2) (4) + (1) (3) (5) assign a (4) a:=b*-c+b*-c:

Implementations of 3-address statements, III Indirect Triples stmt op arg1 arg2 (14) uminus c (15) * b (14) (16) uminus c (17) * b (16) (18) + (15) (17) (19) assign a (18) stmt (0) (14) (1) (15) (2) (16) (3) (17) (4) (18) (5) (19) t1:=- c t2:=b * t1 t3:=- c t4:=b * t3 t5:=t2 + t4 a:=t5 a:=b*-c+b*-c:

DECLARATIONS Declarations in a procedure Langs. like C , Pascal allows declarations in single procedure to be processed as a group A global variable offset keeps track of the next available relative addresses Before the I st declaration is considered, the value of offset is set to 0. When a new name is seen , name is entered in symbol table with current value as offset , offset incre. by width of data object denoted by name. Procedure enter(name,type,offset) creates symbol table entry for name , gives it type type , and rel.addr. offset in its data area Type , width – denotes no. of memory units taken by objects of that type

SDT to generate ICode for Declarations Using a global variable offset PRODUCTION Semantic Rule P  D { } D  D ; D D  id : T { enter ( id .name, T. type , offset ); offset := offset + T .width } T  integer {T. type = integer ; T. width = 4; } T  real {T.type = real ; T.width = 8} T  array [ num ] of T 1 {T. type = array (1.. num . val ,T 1 . type ) T. width = num .val * T 1 . width } T  ^ T 1 {T. type = pointer (T 1 . type );T 1 . width = 4}

Nested Procedure Declarations For each procedure we should create a symbol table . mktable(previous) – create a new symbol table where previous is the parent symbol table of this new symbol table enter(symtable,name,type,offset) – create a new entry for a variable in the given symbol table. enterproc(symtable,name,newsymbtable) – create a new entry for the procedure in the symbol table of its parent. addwidth(symtable,width) – puts the total width of all entries in the symbol table into the header of that table. We will have two stacks: tblptr – to hold the pointers to the symbol tables offset – to hold the current offsets in the symbol tables in tblptr stack.

SDT to generate ICode for Nested Procedures ( P  M D { addwidth ( top (tblptr), top (offset)); pop (tblptr); pop (offset) } M   { t:= mktable (null); push (t, tblptr); push (0, offset)} D  D 1 ; D 2 ... D  proc id ; N D ; S { t:= top (tblpr); addwidth (t, top (offset)); pop (tblptr); pop (offset); enterproc ( top (tblptr), id .name, t)} N   {t:= mktable ( top (tblptr)); push (t,tblptr); push (0,offset);} D  id : T { enter ( top (tblptr), id .name, T. type , top (offset); top (offset):= top (offset) + T. width Example: proc func1; D; proc func2 D; S; S

SDT to generate ICode for assignment statements Use attributes E. place to hold the name of the “place” that will hold the value of E Identifier will be assumed to already have the place attribute defined. For example, the place attribute will be of the form t0, t1, t2, … for identifiers and v0,v1,v2 etc. for the rest. E. code to hold the three address code statements that evaluate E (this is the `translation’ attribute). Use function newtemp that returns a new temporary variable that we can use. Use function gen to generate a single three address statement given the necessary information (variable names and operations).

Syntax-Dir. Definition for 3-address code PRODUCTION Semantic Rule S  id := E { S .code = E. code || gen (id .place ‘=’ E. place ) } E  E 1 + E 2 {E. place = newtemp ; E .code = E 1 . code || E 2 . code || || gen (E. place ‘:=’E 1 . place ‘+’E 2 . place ) } E  E 1 * E 2 {E. place = newtemp ; E .code = E 1 . code || E 2 . code || || gen (E. place ‘=’E 1 . place ‘*’E 2 . place ) } E  - E 1 {E. place = newtemp ; E .code = E 1 . code || || gen (E. place ‘=’ ‘uminus’ E 1 . place ) } E  ( E 1 ) {E. place = E 1 . place ; E. code = E 1 . code } E  id {E. place = id .entry ; E. code = ‘’ } e.g. a := b * - (c+d)

Boolean Expressions Boolean expressions has 2 purpose To compute Boolean values as a conditional expression for statements Methods of translating boolean expression: ( 2 methods to represent the value of Boolean expn ) Numerical methods: True is represented as 1 and false is represented as 0 Nonzero values are considered true and zero values are considered false By Flow-of-control : Represent the value of a boolean by the position reached in a program Often not necessary to evaluate entire expression

SDT for Numerical Representation for booleans Expressions evaluated left to right using 1 to denote true and 0 to donate false Example: a or b and not c t1 := not c t2 := b and t1 t3 := a or t2 Another example: a < b 100: if a < b goto 103 101: t : = 0 102: goto 104 103: t : = 1 104: …

Emit & nextstat emit fn. – places 3-address stmts into an o/p file in the right format nextstat fn. – gives the index of the next 3 - address stmt in o/p sequence E.place to hold the name of the “place” that will hold the value of E

Production Semantic Rules E  E 1 or E 2 E.place := newtemp; emit(E.place ':=' E 1 .place 'or' E 2 .place) E  E 1 and E 2 E.place := newtemp; emit(E.place ':=' E 1 .place 'and' E 2 .place) E  not E 1 E.place := newtemp; emit(E.place ':=' 'not' E 1 .place) E  (E 1 ) E.place := E1.place; SDT for Numerical Representation for booleans

Production Semantic Rules E  id 1 relop id 2 E.place := newtemp; emit('if' id 1 .place relop.op id 2 .place 'goto' nextstat +3); emit(E.place ':=' '0'); emit('goto' nextstat +2); emit(E.place ':=' '1'); E  true E.place := newtemp; emit(E.place ':=' '1') E  false E.place := newtemp; emit(E.place ':=' '0') SDT for Numerical Representation for booleans nextstat fn. – gives the index of the next 3 - address stmt in o/p sequence

Example: a<b or c<d and e<f 100: if a < b goto 103 101: t1 := 0 102: goto 104 103: t1 := 1 104: if c < d goto 107 105: t2 := 0 106: goto 108 107: t2 := 1 108: if e < f goto 111 109: t3 := 0 110: goto 112 111: t3 := 1 112: t4 := t2 and t3 113: t5 := t1 or t4

Flow of control Stmts S → if E then S 1 | if E then S 1 else S 2 |while E do S 1 Here E is the boolean expn. to be translated We assume that 3-address code can be labeled newlabel returns a symbolic label each time its called. E is associated with 2 labels E.true – label which controls flow if E is true E.false – label which controls flow if E is false S.next – is a label that is attached to the first 3 address instruction to be executed after the code for S

1. Code for if - then E.code S 1 .code to E.true to E.false ……….. S → If E then S 1 Semantic rules E.true := newlabel; E.false := S.next; S1.next := S.next; S.code := E.code || gen(E.true ':') || S1.code E.true: E.false:

2.Code for if-then-else S  if E then S 1 else S 2 E.code S 1 .code to E.true to E.false Semantic rules goto S.next S 2 .code E.true: E.false: ……….. S.next E.true := newlabel; E.false := newlabel; S1.next := S.next; S2.next := S.next; S.code := E.code || gen(E.true ':') || S1.code || gen(‘ goto‘ S.next) || gen ( E.false ‘:’ ) || S2.code

3. Code for while-do S  while E do S 1 E.code S 1 .code to E.true to E.false Semantic rules S.begin := newlabel; E.true := newlabel; E.false := S.next; S1.next := S.begin; S.code := gen(S.begin ':') || E.code || gen(E.true ':') || S1.code || gen('goto' S.begin) goto S.begin E.true: E.false: ……….. S.begin

Jumping code/Short Circuit code for boolean expression Boolean Expressions are translated in a sequence of conditional and unconditional jumps to either E.true or E.false . a < b. The code is of the form: if a < b then goto E.true goto E.false E1 or E2 . If E1 is true then E is true, so E1.true = E.true. Otherwise, E2 must be evaluated, so E1.false is set to the label of the first statement in the code for E2. E1 and E2 . Analogous considerations apply. not E1 . We just interchange the true and false with that for E.

Production Semantic Rules E  E 1 or E 2 E 1 .true := E.true; E 1 .false := newlabel; E 2 .true := E.true; E 2 .false := E.false; E.code := E 1 .code || gen(E 1 .false ':') || E 2 .code E  E 1 and E 2 E 1 .true := newlabel; E 1 .false := E.false; E 2 .true := E.true; E 2 .false := E.false; E.code := E 1 .code || gen(E 1 .true ':') || E 2 .code Control flow translation of boolean expression We will now see the code produced for the boolean expression E

Production Semantic Rules E  not E 1 E 1 .true := E.false; E 1 .false := E.true; E.code := E1.code E  (E 1 ) E 1 .true := E.true; E 1 .false := E.false; E.code := E 1 .code E  id 1 relop id 2 E.code := gen('if' id.place relop.op id2.place 'goto' E.true) || gen('goto' E.false) E  true E.code := gen('goto' E.true) E  false E.code := gen('goto' E.false)

Example while a < b do if c < d then x := y + z else x := y - z

Example Lbegin : if a < b goto L1 goto Lnext L1 : if c < d goto L2 goto L3 L2 : t1 := y + z x := t1 goto Lbegin L3 : t2 := y - z x := t2 goto Lbegin Lnext : while a < b do if c < d then x := y + z else x := y - z

Case Statements Switch <expression> begin case value : statement case value : statement …….. case value : statement default : statement end

Translation of a case stmt code to evaluate E into t goto test L1: code for S1 goto next … Ln-1: code for Sn-1 goto next Ln: code for Sn goto next test: if t = V1 goto L1 … if t = Vn-1 goto Ln-1 goto Ln next:

Backpatching Easiest way to implement Syntax directed defn. is to use 2 passes First, construct syntax tree Walk through syntax tree in depth-first order, computing translations May not know the labels to which control must flow at the time a jump is generated Affect boolean expressions and flow control statements Leave targets of jumps temporarily unspecified Add each such statement to a list of goto statements whose labels will be filled in later This filling in of labels is called back patching How backpatching is implemented in 1 pass….?

Lists of Labels Imagine that we are generating quadruples into a quadruple array. Labels are indices into this array To manipulate this list of labels we use 3 fns. makelist(i) Creates a new list containing only i , and index into the array of quadruples Returns a pointer to the new list merge(p1, p2) Concatenates two lists of labels Returns a pointer to the new list backpatch(p, i) – inserts i as target label for each statement on the list pointed to by p

Boolean Expressions and Markers E  E 1 or M E 2 | E 1 and M E 2 | not E 1 | (E 1 ) | id 1 relop id 2 | true | false M  ε

The New Marker , M Translation scheme suitable for producing quadruples during bottom-up pass The new marker has an associated semantic action which Picks up, at appropriate times, the index of the next quadruple to be generated M.quad := nextquad Nonterminal E will have two new synthesized attributes: E.truelist contains a list of statements that jump when expression is true E.falselist contains a list of statements that jump when expression is false

Example: E  E 1 and M E 2 If E 1 is false: Then E is also false So statements on E 1 .falselist become part of E.falselist If E 1 is true: Still need to test E 2 Target for statements on E 1 .truelist must be the beginning of code generated for E 2 Target is obtained using the marker M

New Syntax-Directed Definition (1) Production Semantic Rules E  E 1 or M E 2 backpatch(E 1 .falselist, M.quad); E.truelist := merge(E 1 .truelist, E 2 .truelist); E.falselist := E 2 .falstlist E  E 1 and M E 2 backpatch(E 1 .truelist, M.quad); E.truelist := E 2 .truelist; E.falselist := merge(E 1 .falselist, E 2 .falselist) E  not E 1 E.truelist := E 1 .falselist; E.falselist := E 1 .truelist E  (E 1 ) E.truelist := E 1 .truelist; E.falselist := E 1 .falselist

New Syntax-Directed Definition (2) Production Semantic Rules E  id 1 relop id 2 E.truelist := makelist(nextquad); E.falselist := makelist(nextquad+1); emit('if' id 1 .place relop.op id 2 .place 'goto _'); emit('goto _') E  true E.truelist := makelist(nextquad); emit('goto _') E  false E.falselist := makelist(nextquad); emit('goto _') M  ε M.quad := nextquad

Example Revisited (1) Reconsider: a<b or c<d and e<f First, a<b will be reduced, generating: 100: if a < b goto _ 101: goto _ Next, the marker M in E  E 1 or M E 2 will be reduced, and M.quad will be set to 102 Next, c<d will be reduced, generating: 102: if c < d goto _ 103: goto _

Example Revisited (2) Next, the marker M in E  E 1 and M E 2 will be reduced, and M.quad will be set to 104 Next, e<f will be reduced, generating: 104: if e < f goto _ 105: goto _ Next, we reduce by E  E 1 and M E 2 Semantic action calls backpatch({102}, 104) E 1 .truelist contains only 102 Line 102 now reads: if c <d goto 104

Example Revisited (3) Next, we reduce by E  E 1 or M E 2 Semantic action calls backpatch({101}, 102) E 1 .falselist contains only 101 Line 101 now reads: goto 102 Statements generated so far: 100: if a < b goto _ 101: goto 102 102: if c < d goto 104 103: goto _ 104: if e < f goto _ 105: goto _ Remaining goto instructions will have their addresses filled in later

Annotated Parse Tree

Procedure Calls Grammar S -> call id ( Elist ) Elist -> Elist , E Elist -> E Semantic Actions S -> call id (Elist) for each item p in queue do { gen(‘param’ p) gen(‘call’ id.place) }

Elist -> Elist , E {append E.place to the end of queue} Elist -> E { initialize queue to contain only E.place} e.g. P (x 1 ,x 2 ,x 3 ,…………….x n ) param x 1 param x 2 …………. …………. param x n call P

Shashwat Shriparv [email protected] InfinitySoft