Compiler design selective dissemination of information syntax direct translation sdd.pptx
457 views
43 slides
Apr 26, 2024
Slide 1 of 43
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
About This Presentation
Compiler design
Size: 405.83 KB
Language: en
Added: Apr 26, 2024
Slides: 43 pages
Slide Content
UNIT IV Syntax Directed Translation Syntax Directed Definitions , Evaluation order of SDD’s , Application of SDD and Syntax Directed Translation Schemes. Intermediate code Syntax tree , Polish notation and Three Address code ,static single assignment , Translation of Expressions ,Control flow Statements and Boolean Expressions Run time storage Storage Organization , Storage allocation Strategies , and Parameter passing techniques.
SDD(Syntax Directed Definition ) Syntax Directed Definition(SDD) is a Context Free Grammar(CFG) with semantic rules attached to the Productions. The Semantic rules define values for attributes associated with the Non-terminal Symbol of the Production. SDD is a CFG which consists of attributes and rules. Attributes=>They are associated with grammar Symbol. Rules=>They are associated with Productions. Example: If ‘X’ is a Grammar Symbol and ‘b’ is one of its Attributes , then X.b denotes the value of ‘b’ at a particular Parse-Tree node labeled as ‘X’. Attributes can be number ,types table reference or Strings. Production S emantic rules E->E+T E.val= E.val+T.val E->T E.val=T.val
Inherited and Synthesized Attributes: Attributes are the values attached to the nodes of a Parse tree . Attributes is any property of a Programming Language construct. Types of Attributes: 1. Synthesized Attribute: If a node takes value from its childen then it is called Synthesized Attribute . Example: Production rule A->x A.val=x.val A.val is Synthesized Attribute because its value depending on the children attribute value. A->BCD here A is parent node , BCD are child nodes A.val=B.val A.val=C.val A.val=D.val 2. Inherited Attribute : If a node takes value from its parent node or siblings. Example : Production Rule A-> xy x.val=A.val y.val=A.val x.val=y.val y . val =x.val A->BCD here A is parent node , BCD are child nodes C.val=A.val C.val=B.val Note: Terminals can have Synthesized Attributes but not Inherited Attributes
Evaluating an SDD at the Nodes of a Parse Tree: A Parse tree that shows the value of its attributes at each node is called an Annotated Parse Tree (APT ).The Process of computing the attribute values at the nodes is called ANNOTATING or DECORATING the parse-tree. Example: Construct Syntax Directed Definition for the following Grammar by using Synthesized Attribute and Construct Annotated P arse T ree for the I nput S tring 2+3*4 E->E+T E->T T->T*F T->F F->num Syntax Directed Definition : Production Semantic rule E->E+T E.val -> E.val+T.val E->T E.val ->T.val T->T*F T.val - >T.val* F.val T->F T.val - >F.val F->num F.val - > num.lexvalue
Syntax Tree : Parse T ree : + E 2 * E + T 3 4 T T * F F F num(4) num(2) num(3) Annotated Parse T ree : E.val=14 E.val=2 + T.val=12 T.val=2 T.val=3 * F.val=4 F.val=2 F.val =3 num(4) num(2) num(3)
Evaluation order of SDD’s The useful tool that can be used for determining an evaluation order for the attribute instances in a given parse tree is Dependency Graph . Annotated parse tree indicates attributes values and Dependency graph helps to determine how those values can be computed . Dependency Graph: Dependency Graph indicates the flow of information among the attributes instances in a Particular Parse tree : if an edge exists from one attribute instances to another , it means that the value of the first is needed to compute the second . These edges are used to express the constraints implied by the semantic rules. Generally, 1.The Dependency Graph has a node for each attribute associated with X ,for each node labeled by Grammar Symbol X. 2.Let ‘P’ be a Production .If a Semantic rule associated with P defines the value of Synthesized attribute A.b in terms of the value of X.a,then the dependency graph has an edge from X.a to A.b . 3.If a semantic rule associated with P defines the value of inherited attribute B.a interms of the value of X.a,then the Dependency Graph has an edge from X.a to B.a . Example: Consider the following production and rule: PRODUCTION SEMANTIC RULE E -> E1 + T E.val= E1.val + T.val At every node N labeled E, with children corresponding to the body of this production ,the synthesized attribute val at N is computed using the values of val at the two children , labeled E and T. Thus, a portion of the Dependency graph for every parse tree .we shall show the parse tree edges as dotted lines , while the edges of the dependency graph are solid .
Types of SDD: 1.S-Attribute SDD or S-Attribute grammar or S-Attribute Definition . 2.L-Attribute SDD or L-Attribute grammar or L-Attribute Definition . S-Attribute L-Attribute 1.A SDD that uses only Synthesized Attribute is called as S-Attribute SDD Ex: A->BCD A.val=B.val A.val=C.val A.val=D.val 1.A SDD that uses both Synthesized and Inherited Attribute is called a L-Attributed SDD,but each Inherited Attribute is restricted to inherits from parent or left sibling only. Ex:A ->xyz {y.val= A.val,y .. val =x.val} Wrong-y.val=z.val 2.Semantic Actions are always placed at right end of the production .It is also called as Postfix SDD. 2.Semantic Actions are placed any where on RHS 3.Attributes are Evaluated with bottom-up parsing 3.Attributes are evaluated by traversing parse tree depth first ,left to right order
S-Attributed Definitions: Attribute grammar can be used to translate the syntax tree directly into code for some specific machine or into some intermediate language. S-Attribute grammars are a class of attribute grammars Characterized by having no inherited attributes. An SDD is S-attributed if every attribute is synthesized. When an SDD is S-attributed, we can evaluate its attributes in any bottom up order of the nodes of the parse tree. It is often especially simple to evaluate the attributes by performing a Postorder traversal of the parse tree and evaluating the attributes at a node N when the traversal leaves N for the last time. That is, we apply the function Postorder , defined below, to the root of the parse tree Postorder (N) { for ( each child C of N, from the left ) Postorder (C); evaluate the attributes associated with node N; } S-attributed definitions can be implemented during bottom-up parsing, since a bottom-up parse corresponds to a Postorder traversal. Specifically, Postorder corresponds exactly to the order in which an LR parser reduces a production body to its head.
L-Attributed Definitions: The second class of SDD's is called L-attributed definitions. The idea behind this class is that, between the attributes associated with a production body, dependency-graph edges can go from left to right, but not from right to left(hence "L-attributed"). More precisely, each attribute must be either 1. Synthesized, or 2. Inherited, but with the rules limited as follows. Suppose that there is a production A->X1X2 …. Xn , and that there is an inherited attribute Xi.a computed by a rule associated with this production. Then the rule may use only: (a) Inherited attributes associated with the head A. (b) Either inherited or synthesized attributes associated with the occurrences of symbols X1 X2 …. Xn located to the left of Xi. (c) Inherited or synthesized attributes associated with this occurrence of Xi itself, but only in such a way that there are no cycles in a dependency graph formed by the attributes of this Xj .
SDT(Syntax Directed Translation): During the process of parsing the evoluation of attributes takes place by putting semantic actions in between { } at the right of the grammar symbol is called SDT. Production Semantic Rule T->T*F T.val={T.val* F.val } T->F F.val={F.val} Example 1: E->E*T E->T T->F-T T->F F->num Production Semantic rule E->E*T { E.val ->E.val* T.val } E->T { E.val ->T.val} T->F -T { T.val -> F.val-T.val } T->F { T.val ->F.val} F->num { F.val -> num.lexvalue }
Input : ((4-(2-4)*2) Parse tree: E E * T T F F - T num(2) num(4) F - T num(2) F num(4)
Exampe 2 : S-> xxW {print(1);} S->y { print(2);} W-> Sz { print(3);} String xxxxyzz Parse tree: S x x W S z x x W S z y Output 23131
Translation Scheme to convert infix to postfix expression String 2+3*4 Production Semantic rule parse tree E->E+T { printf (“+”);} --1 E E->T { } ---2 T->T*F { printf (“*”);}---3 E + T 1 T->F {}---4 F->num { printf ( num.lvalue }—5 T 2 T * F 3 F 4 F 4 num(4) 5 5 3 num(2) num(3)
Applications of SDD One of the major applications of SDD is the Construction of Syntax trees .we Already Know that there are two classes of SDD. 1.S-attributed definition ->Suitable for bottom-up parsing. 2.L-attributed definition ->Suitable for top-up parsing. Each node in a syntax tree represents a construct; the children of the node represent the meaningful components of the construct. A syntax-tree node representing an expression E1 + E2 has label + and two children representing the sub expressions E1 and E2.We shall implement the nodes of a syntax tree by objects with a suitable number of fields. Each object will have an op field that is the label of the node. The objects will have additional fields as follows: If the node is a leaf , an additional field holds the lexical value for the leaf. A constructor function Leaf (op, val ) creates a leaf object. Alternatively, if nodes are viewed as records, then Leaf returns a pointer to a new record for a leaf. If the node is an interior node , there are as many additional fields as the node has children in the syntax tree. A constructor function Node takes two or more arguments: Node(op,c1,c2,... ,ck) creates an object with first field op and k additional fields for the k children c1, c2,... ,ck. Example : Constructs Syntax Trees for a simple expression grammar involving only the binary operators + and -. As usual, these operators are at the same precedence level and are jointly left associative. All non-terminals have one synthesized attribute node, which represents a node of the syntax tree.
The Construction of a Syntax tree for the input a — 4 + c. The nodes of the syntax tree are shown as records, with the op field first. Syntax-tree edges are now shown as solid lines. The underlying parse tree, which need not actually be constructed, is shown with dotted edges. The third type of line, shown dashed, represents the values of E.node and T-node; each line points to the appropriate syntax-tree node. At the bottom we see leaves for a, 4 and c, constructed by Leaf. We suppose that the lexical value id. entry points into the symbol table, and the lexical value num.val is the numerical value of a constant. These leaves, or pointers to them, become the value of T.node at the three parse-tree nodes labeled T, according to rules 5 and 6. Note that by rule 3, the pointer to the leaf for a is also the value of E.node for the leftmost E in the parse tree.
Syntax Directed Translation Schemes(SDTS) Syntax-directed translation schemes are a complementary notation to syntax directed definitions. A syntax-directed translation scheme (SDT) is a context free grammar with program fragments embedded within production bodies . The program fragments are called semantic actions and can appear at any position within a production body. Any SDT can be implemented by first building a parse tree and then performing the actions in a left-to-right depth-first order; that is, during a preorder traversal. Typically, SDT's are implemented during parsing, without building a parse tree. To implement SDT's, two important classes of SDD's: 1. The underlying grammar is LR- parsable , and the SDD is S-attributed. 2. The underlying grammar is LL- parsable , and the SDD is L-attributed. 1. Postfix Translation Schemes: The simplest SDD implementation occurs when we can parse the grammar bottom-up and the SDD is S-attributed. In that case, we can construct an SDT in which each action is placed at the end of the production and is executed along with the reduction of the body to the head of that production. SDT's with all actions at the right ends of the production bodies are called postfix SDT's. Example : Let us Construct Postfix SDT for desk Calculator Here tha action for the production 1 Prints a value. The given grammar is LR , and SDD is S-Attributed.
2. Parser-Stack Implementation of Postfix SDT's: A Stack is being used by the bottom up parser to hold information of the parsed Subtrees .During LR Parsing ,Postfix SDT’s can be implemented by considering the attributes of each grammar on the stack in a place where grammar symbol is found during reduction. Parser stack with a field for Synthesized attributes Here TOS contains three grammar symbols XYZ . They are to be reduced based on the production A->XYZ. Here X.x is on attribute of X and so on . 3. Eliminating Left Recursion from SDT's: Since no grammar with left recursion can be parsed deterministically top-down, we examined left-recursion elimination. When the grammar is part of an SDT, we also need to worry about how the actions are handled. In order to Eliminate Left recursion if: A->A α | β then, Replace them by following Productions : - A-> β R R-> α R| Є , Where β doesn’t start with A.Here ‘R’ stands for ‘remainder’. State/grammar Symbol Synthesized attributes : : X X.x Y Y.y TOP - > Z Z.z : :
Intermediate code
Intermediate code: Intermediate Code is a modified input source program which is stored in some data structure. The front end translates a source program into an intermediate representation from which the back end generates target code. If the compiler directly translates source code into the machine code without generating intermediate code then a full native compiler is required for each new machine. Intermediate code generator receives input from its predecessor phase and semantic analyzer phase. It takes input in the form of an annotated syntax tree. The following are commonly used intermediate code representations(or) Forms of Intermediate Code 1. Syntax Tree 2.Postfix Notation 3. Three-Address Code
1.Syntax Tree: Some of the Front-End of compilers translate the input source program into an intermediate form known as Abstract Syntax tree(AST). Here Leaf node represents the Operands like a,b,5 etc. The Interior nodes correspond to the Operators like +.- etc. The Natural Hierarchical Structure is represented by Syntax Trees. The syntax tree not only condenses the parse tree but also offers an improved visual representation of the program’s syntactic structure, Example: x = (a + b * c) / (a – b * c)
2.Postfix Notation: Also known as reverse Polish notation or suffix notation . In the infix notation, the operator is placed between operands, e.g., a + b. Postfix notation positions the operator at the right end, as in ab + . For any postfix expressions e1 and e2 with a binary operator (+) , applying the operator yields e1e2+. Postfix notation eliminates the need for parentheses, as the operator’s position and arity allow unambiguous expression decoding. In postfix notation, the operator consistently follows the operand. Example 1: The postfix representation of the expression (a + b) * c is : ab + c * Example 2: The postfix representation of the expression (a – b) * (c + d) + (a – b) is : ab – cd + * ab -+ 3. Three-Address Code: A three address statement involves a maximum of three references , consisting of two for operands and one for the result . A sequence of three address statements collectively forms a three address code. The typical form of a three address statement is expressed as x = y op z , where x, y , and z represent memory addresses. Each variable (x, y, z) in a three address statement is associated with a specific memory location. While a standard three address statement includes three references, there are instances where a statement may contain fewer than three references, yet it is still categorized as a three address statement. Example : The three address code for the expression a + b * c + d T1 = b * c T2 = a + T1 T3 = T2 + d; T 1 , T2 , T3 are temporary variables.
Example 1: Write Three Address Code for the following expression a= b+c+d The Three Address Code for the above expression as t1= b+c t2=t1+d a=t2 Example 2: Write Three Address Code for the following expression -( a+b )+( c+d )-( a+b+c+d ) The Three Address Code for the above expression as t1= a+b t2= uminus t1 t3= c+d t4=t2+t3 t5= a+b t6=t3+t5 t7=t4-t6
Representation of Three Address Code : 1.Quadruples 2.Triples 3.Indirect Triples 1.Quadruples: In Quadruples representation each instruction is splitted into the following 4 different fields op arg1 arg2 result Here ,op field used for storing the internal code of the operator The arg1 and arg2 are used for storing the two operands. The result field is used for storing the result of the expression. Example: a=b* -c + b* -c Three address code as Quadruple representation as t1=-c t2=b*t1 t3=-c t4=b*t3 t5=t2+t4 Location op arg1 arg2 result (0) - c t1 (1) * b t1 t2 (2) - c t3 (3) * b t3 t4 (4) + t2 t4 t5
2.Triples : In Triples representation the use of temporary variables are avoided by refering the pointers in the symbol table.each instruction is splitted into the following 3 different fields op arg1 arg2 Here ,op field used for storing the internal code of the operator The arg1 and arg2 are used for storing the two operands. Example: a=b* -c + b* -c Three address code as Triple representation as t1=-c t2=b*t1 t3=-c t4=b*t3 t5=t2+t4 Location op arg1 arg2 (0) - c (1) * b (0) (2) - c (3) * b (2) (4) + (1) (3)
3.Indirect Triples : In this representation the listing of triples is being done and listing pointers are used instead of using statements. Example: a=b* -c + b* -c Three address code as Indirect Triple representation as t1=-c t2=b*t1 t3=-c t4=b*t3 t5=t2+t4 (0) 100 (1) 101 (2) 102 (3) 103 (4) 104
Ex1: Translate the following expression to Quadruples,Triples,Indirect Triples a+b *c/ e↑f+b *c Three address code as 1)Quadruple: t1= e↑f t2=b*c t3=t2/t1 t4=b*a t5=a+t3 T6=t5+t4 2)Triples: 3)Indirect Triples: LOC op arg1 arg2 result (0) ↑ e f t1 (1) * b c t2 (2) / t2 t1 t3 (3) * b a t4 (4) + a t3 t5 (5) + t5 t4 t6 loc op arg1 arg2 (0) ↑ e f (1) * b c (2) / (1) (0) (3) * b a (4) + a (2) (5) + (4) (3) 35 (0) 36 (1) 37 (2) 38 (3) 39 (4) 40 (5)
Ex 2: Translate the following expression to Quadruples,Triples,Indirect Triples Example: ( a+b )*(c-d) Three address code as 1)Quadruple representation as: t1= a+b t2=c-d t3=t1*t2 2)Triples: 3)Indirect Triples: LOC op arg1 arg2 result (0) + a b t1 (1) - c d t2 (2) * t1 t2 t3 loc op arg1 arg2 (0) + a b (1) - c d (2) * (0) (1) (0) (10) (1) (11) (2) (12) op arg1 arg2 (0) + a b (1) - c d (2) * (10) (11)
Ex 3: Translate the following expression to Quadruples,Triples,Indirect Triples Example: R=-( a+b )*( c+d )+( a+b+c ) Three address code as 1)Quadruple representation as: t1:= a+b t2:=-t1 t3:= c+d t4:=t2*t3 t5:=t1+c t6:=t4+t5 R:=t6 LOC op arg1 arg2 result (0) + a b t1 (1) - t1 t2 (2) + c d t3 (3) * t2 t3 t4 (4) + t1 c t5 (5) + t4 t5 t6 (6) = t6 R
R=-( a+b )*( c+d )+( a+b+c ) Three address code as 2)Triples: t1:= a+b t2:=-t1 t3:= c+d t4:=t2*t3 t5:=t1+c t6:=t4+t5 R:=t6 3)Indirect Triples: loc op arg1 arg2 (0) + a b (1) - (0) (2) + c d (3) * (1) (2) (4) + (0) c (5) + (3) (4) (6) = R (5) Triple no Statement (0) (10) (1) (11) (2) (12) (3) (13) (4) (14) (5) (15) (6) (16)
Advantages of Intermediate Code Generation: Easier to implement: Intermediate code generation can simplify the code generation process by reducing the complexity of the input code, making it easier to implement. Facilitates code optimization: Intermediate code generation can enable the use of various code optimization techniques, leading to improved performance and efficiency of the generated code. Platform independence: Intermediate code is platform-independent, meaning that it can be translated into machine code or bytecode for any platform. Code reuse: Intermediate code can be reused in the future to generate code for other platforms or languages. Easier debugging: Intermediate code can be easier to debug than machine code or bytecode , as it is closer to the original source code.
Static Single Assignment : SSA is also an Intermediate representation useful for code optimizations .SSA differs in 3AC as All assignments in SSA are variable with distinct names. Example: if 3AC Its Assignment form is a= b+d a1= b+d e=a-f e1=a1-f a=e*g a2=e1*g e=h-e e2=h-e3
Translation of Expressions: Let us see how to translate expressions.Consider assignment statements and expressions as: Example 1: F->F1+F2 Semantic Rules: F.addr =new temp() F.code = F1.code||F2.code||gen( F.addr ’=‘F1.addr ’+’F2.addr) Example 2: F-> F1*F2 Semantic Rules: F.addr =new temp () F.code = F1.code||F2.code||gen( F.addr ’=‘F1.addr ’*’F2.addr ) Example 3: F->id Semantic Rules : F.addr = top.get ( id,lexme )
Flow of Control Statements: The main idea of converting any flow of control statements to the 3AC is to simulate the Branching of the flow of control using the goto statement. Simple if: S->if(B) then S1 Semantic rule: B.true = newlabel ()-this function generates a three address code B.false = S.next ; S1.next= S.next ; S.code = B.code ||Label( B.true )||S1.code if else: S->if(B) then S1 else S2 Semantic rule: B.true = newlabel () B.false = newlabel () S1.next= S.next ; S2.next= S.next ; S.code = B.code ||Label( B.true )||S1.code||Label( B.false )||S2.code
Translation of boolean expressions: A Boolean expression can be Translated into semantic rules as shown: 1.B->B1 ||B2 B.code =B1.code||label(B1.false)||B2.code; 2. B->B1 &&B2 B.code =B1.code|| label(B1.true)|| B2.code ; 3.B- >!B1 B1.true= B.false B1.false= B.true
Run time Storage
Activation tree: The main aim of runtime systems is to map procedures to their activations. Each execution of the procedure is called an activation . A procedure Definition associates an identifier with a statement or a block of statements. The identifier to the procedure is called the procedure name An activation tree is used to determine the way the control flows between procedures. During program Execution ,the control flow is sequential among procedures. In a procedure ,the execution begins at the starting point of its body . Activation tree for the procedure call fib(4) as fib(4) fib(3) fib(2) fib(2) fib(1)
Storage organization: It deals with the division of memory into different areas ,management of the activation records and layout of the local data. a.Sub division of run-time memory: The underlying hardware and operating system provide a division of memory into the following areas. Code area: This contains the generated code Static area: This contains data whose absolute address can be determined at compile time.In C the global and static variables are kept in this area. Stack area: The local variables and parameters of all ‘C’ procedures would reside in this area. Heap area: This is data entered at run time. Heap area Stack area Static area Code area
Storage Allocation : Different types of storage allocation schemes are followed.They are 1.Static Allocation 2.Stack Allocation 3.Heap allocation 1.Static storage allocation In static allocation, names are bound to storage locations. If memory is created at compile time then the memory will be created in static area and only once. Static allocation supports the dynamic data structure that means memory is created only at compile time and deallocated after program completion. The drawback with static storage allocation is that the size and position of data objects should be known at compile time. Another drawback is restriction of the recursion procedure. 2.Stack Storage Allocation In static storage allocation, storage is organized as a stack. An activation record is pushed into the stack when activation begins and it is popped when the activation end. It works on the basis of last-in-first-out (LIFO) and this allocation supports the recursion process.
3.Heap Storage Allocation Heap allocation is the most flexible allocation scheme. Allocation and deallocation of memory can be done at any time and at any place depending upon the user's requirement. Heap allocation is used to allocate memory to the variables dynamically and when the variables are no more used then claim it back. Heap storage allocation supports the recursion process. Example: fact ( int n) { if (n<=1) return 1; else return (n * fact(n-1)); }
Parameter passing techniques: The communication medium among procedures is known as parameter passing. The values of the variables from a calling procedure are transferred to the called procedure by some mechanism. Before moving ahead , first go through some basic terminologies pertaining to the values in a program. r value The value of the Expression is called its r-value . T he value contained in a single variables also becomes an r-value if it appears on the right-hand side of the assignment operator. r-value can always be assigned to some other variables. l-value The location of the memory where an expression is stored is known as the l-value of that expression . It ialways appears at the left hand side of the assignment operator. Eg : Week = day * 7 Based on the parameters there are various parameter passing methods , the most common methods are 1.Pass by value The actual parameters are evaluated mand their r values are passed to called procedure. 2.Pass by reference T he l value ,the address of parameters passed to the called routines.