COMPILER DESIGN LECTURES -UNIT-2 ST.pptx

ranjith_kssr5 423 views 127 slides May 09, 2024
Slide 1
Slide 1 of 127
Slide 1
1
Slide 2
2
Slide 3
3
Slide 4
4
Slide 5
5
Slide 6
6
Slide 7
7
Slide 8
8
Slide 9
9
Slide 10
10
Slide 11
11
Slide 12
12
Slide 13
13
Slide 14
14
Slide 15
15
Slide 16
16
Slide 17
17
Slide 18
18
Slide 19
19
Slide 20
20
Slide 21
21
Slide 22
22
Slide 23
23
Slide 24
24
Slide 25
25
Slide 26
26
Slide 27
27
Slide 28
28
Slide 29
29
Slide 30
30
Slide 31
31
Slide 32
32
Slide 33
33
Slide 34
34
Slide 35
35
Slide 36
36
Slide 37
37
Slide 38
38
Slide 39
39
Slide 40
40
Slide 41
41
Slide 42
42
Slide 43
43
Slide 44
44
Slide 45
45
Slide 46
46
Slide 47
47
Slide 48
48
Slide 49
49
Slide 50
50
Slide 51
51
Slide 52
52
Slide 53
53
Slide 54
54
Slide 55
55
Slide 56
56
Slide 57
57
Slide 58
58
Slide 59
59
Slide 60
60
Slide 61
61
Slide 62
62
Slide 63
63
Slide 64
64
Slide 65
65
Slide 66
66
Slide 67
67
Slide 68
68
Slide 69
69
Slide 70
70
Slide 71
71
Slide 72
72
Slide 73
73
Slide 74
74
Slide 75
75
Slide 76
76
Slide 77
77
Slide 78
78
Slide 79
79
Slide 80
80
Slide 81
81
Slide 82
82
Slide 83
83
Slide 84
84
Slide 85
85
Slide 86
86
Slide 87
87
Slide 88
88
Slide 89
89
Slide 90
90
Slide 91
91
Slide 92
92
Slide 93
93
Slide 94
94
Slide 95
95
Slide 96
96
Slide 97
97
Slide 98
98
Slide 99
99
Slide 100
100
Slide 101
101
Slide 102
102
Slide 103
103
Slide 104
104
Slide 105
105
Slide 106
106
Slide 107
107
Slide 108
108
Slide 109
109
Slide 110
110
Slide 111
111
Slide 112
112
Slide 113
113
Slide 114
114
Slide 115
115
Slide 116
116
Slide 117
117
Slide 118
118
Slide 119
119
Slide 120
120
Slide 121
121
Slide 122
122
Slide 123
123
Slide 124
124
Slide 125
125
Slide 126
126
Slide 127
127

About This Presentation

NOTES


Slide Content

UNIT-2 SYNTAX ANALYSIS

Introduction Syntax analysis is the second phase of the compiler. It gets the input from the tokens and generates a syntax tree or parse tree. The Syntax of programming language constructs can be specified by Context Free Grammar or BNF (Backus-Naur Form) notation.

Introduction Advantages of Grammar for Syntax Specification A grammar gives a precise and easy-to-understand syntactic specification of a programming language. An efficient parser can be constructed automatically from a properly designed grammar. A grammar imparts a structure to a source program that is useful for its translation into object code and for the detection of errors. New constructs can be added to a language more easily when there is a grammatical description of the language.

Introduction Role of the Parser The parser or syntactic analyzer obtains a string of tokens from the lexical analyzer and verifies that the string can be generated by the grammar for the source language. It reports any syntax errors in the program. It also recovers from commonly occurring errors so that it can continue processing its input.

Introduction Role of the Parser Position of a Parser in Compiler Model

Role of the Parser It verifies the structure generated by the tokens based on the grammar. It constructs the parse tree. It reports the errors. It performs error recovery.

Role of the Parser 3 Types of Parsers for Grammars Universal – Parse any grammar, but too inefficient to use in production compilers Top-down— Builds the parse tree from top (root) to the bottom(leaves) Bottom-up— Starts from the leaves and work their way up to the root In either case , the input to the parser is scanned from left to right, one symbol at a time

Issues : Parser cannot detect errors such as: 1. Variable re-declaration 2. Variable initialization before use 3. Data type mismatch for an operation. The above issues are handled by Semantic Analysis phase

Role of the Parser Representative Grammars The following grammar describes the expressions , terms and factors

Role of the Parser Representative Grammars The above grammar is left recursive. Expression grammar belongs to the class of LR grammars that are suitable for bottom-up parsing This grammar handles the additional operators and additional levels of precedence

Role of the Parser Representative Grammars The following Non-Left recursive variant of the expression grammar will be used for top –down parsing.

Role of the Parser Representative Grammars The following grammar is useful for describing techniques for handling the ambiguities during parsing. This grammar permits more than one parse tree for the expressions like a + b * c E -> E + E | E * E | ( E ) | id

Role of the Parser Syntax Error Handling Planning the error handling right from the start can both simplify the structure of the compiler and improve its handling of errors. Common Programming errors can occur at many different levels 1 Lexical errors include misspelling of identifiers, keywords or operators. 2. Syntactic errors include misplaced semicolons or extra or missing braces . 3. Semantic errors, include type mismatch between operators and operand. 4. Logical , such as an infinitely recursive call .

Role of the Parser Syntax Error Handling The error handler in a parser has the following goals It should report the presence of errors clearly and accurately. It should recover from each error quickly enough to be able to detect subsequent errors. It should not significantly slow down the processing of correct programs.

Role of the Parser Error-Recovery Strategies The different strategies that a parse uses to recover from a syntactic error are: 1 . Panic mode 2. Phrase level 3. Error productions 4. Global correction

Role of the Parser Error-Recovery Strategies 1.Panic mode: On discovering an error, the parser discards input symbols one at a time until a synchronizing token is found. The synchronizing tokens are usually delimiters, such as semicolon or end. It has the advantage of simplicity and does not go into an infinite loop. When multiple errors in the same statement are rare, this method is quite useful 2. Phrase level On discovering an error, the parser performs local correction on the remaining input that allows it to continue. Example: Insert a missing semicolon or delete an extraneous semicolon etc .

Role of the Parser Error-Recovery Strategies 3. Error productions The parser is constructed using augmented grammar with error productions. If an error production is used by the parser, appropriate error diagnostics can be generated to indicate the erroneous constructs recognized by the input 4. Global correction Given an incorrect input string x and grammar G, certain algorithms can be used to find a parse tree for a string y, such that the number of insertions, deletions and changes of tokens is as small as possible. However, these methods are in general too costly in terms of time and space.

Context Free Grammars-Formal Definition Grammars are used to systematically describe the syntax of programming language constructs like expressions and statements . A grammar is a finite set of formal rules for generating syntactically correct sentences or meaningful correct sentences. Context Free Grammar: A Context Free Grammar consists of set of Terminals (T),set of Non-terminals or Variables(V), A Start Symbol (S) and Productions (P) G=(V,T,P,S)

Context-Free Grammar-Formal Definition Ex : stmt  if ( expr ) stmt else stmt 1. Terminals (T) are the basic symbols from which strings are formed. Terminals denotes the tokens returned by lexical anlayzer . Ex: if, else , “(“ , ”)” 2. Non-Terminals (N) are the syntactic variables that denote sets of strings Ex: stmt , expr 3. In a grammar, one non-terminal is distinguished as the start symbol and the set of strings it denotes is the language generated by the grammar . 4 . The productions of a grammar specify the manner in which the terminals and non-terminals can be combined to form strings.

Context-Free Grammar-Formal Definition Each production consists of: a) A non terminal symbol called the head or left side of the production b) The symbol  c) A body or right side consisting of zero or more terminals and non terminals

CFG-Formal Definition Example for arithmetic expression

Context-Free Grammar-Notational Conventions Represents terminal symbols

Context-Free Grammar-Notational Conventions

Context-Free Grammar-Notational Conventions

Context-Free Grammar Example Notation

Context-Free Grammar

Derivations

Derivations

Derivations

Derivations

Derivation-Types

Parse Trees and Derivations The derivations can be shown in the form of a tree, such trees are called derivation trees or parse trees. The left most derivations as well as right most derivations can be represented using derivation trees or parse trees .

Parse Trees and Derivations A parse tree is a graphical representation of a derivation that filters out the order in which productions are applied to replace nonterminals . Each interior node of a parse tree represents the application of a production. The interior node is labeled with the non terminal A in the head of the production; the children of the node are labeled, from left to right, by the symbols in the body of the production

Parse Trees and Derivations

Ambiguous Grammar A grammar that produces more than one parse tree for some sentence is said to be ambiguous. or An ambiguous grammar is one that produces more than one leftmost derivation or more than one rightmost derivation for the same sentence.

Ambiguous Grammar

Verifying the Language Generated by Grammar A proof that a grammar (G) generates a language (L) has 2 parts: Show that every string generated by G is in L Conversely, Show that every string in L can be generated by G.

Context Free Grammar vs Regular Expressions

Context Free Grammar vs Regular Expressions

Writing a Grammar Lexical Analysis vs Syntactic Analysis Reasons for using Regular Expressions to define the lexical syntax of a language are as follows The lexical rules of a language are frequently quite simple, and to describe them we do not need a notation as powerful as grammars. Regular expressions generally provide a more concise and easier-to-understand notation for tokens than grammars. More efficient lexical analyzers can be constructed automatically from regular expressions than from arbitrary grammars.

Lexical Analysis vs Syntactic Analysis Regular expressions are most useful for describing the structure of constructs such as identifiers, constants, keywords, and whitespace. Grammars, on the other hand, are most useful for describing nested structures such as balanced parentheses, matching begin-end's, corresponding if-then-else's, and so on.

Eliminating Ambiguity The ambiguity whether to associate else with the first if statement or second if statement is called dangling else problem There are 2 methods for eliminating ambiguity in CFGs 1. Disambiguity rule 2.Using precedence and associativity of operators

Eliminating Ambiguity

Elimination of Left Recursion A grammar is left recursive if it has a nonterminal A such that there is a derivation A => A α ,for some string α . Top-down parsing methods cannot handle left-recursive grammars, so a transformation is needed to eliminate left recursion.

Elimination of Left Recursion

Elimination of Left Recursion

the left-recursive pair of productions A => A α | β could be replaced by the non-left-recursive productions

Replace the A-productions by

Left Factoring

Left Factoring

Left Factoring

Left Factoring

Left Factoring Algorithm

Types of Parsing

Types of Parsers

Types of Parsers

Top Down Parsing

Top Down Parsing

Top Down Parsing

Top Down Parsing

Recursive-Descent Parser

Recursive-Descent Parser

Recursive-Descent Parser else /* error has occurred */

Recursive-Descent Parser

Recursive-Descent Parser advance input_pointer else error (); End if }

Recursive-Descent Parser

Recursive-Descent Parser

Recursive-Descent Parser

Recursive-Descent Parser

Recursive-Descent Parser

Recursive-Descent Parser

Recursive-Descent Parser

Recursive-Descent Parser

Recursive-Descent Parser

Recursive-Descent Parser

First and Follow

Predictive –LL(1) It is a non recursive top down parser. In LL(1), 1st L represents that the scanning of the Input from Left to Right. Second L shows that in this Parsing technique we are going to use Left most Derivation Tree. 1 represents the number of look ahead, means how many symbols are going to see when you want to make a decision.

Construction of Predictive Parsing Table

Predictive –LL(1) Non-Recursive Predictive Parser

Predictive –LL(1) The predictive parser has an input, a stack, a parsing table, and an output. The input contains the string to be parsed, followed by $, the right end marker. The stack contains a sequence of grammar symbols, preceded by $, the bottom-of stack marker . The Stack holds left most derivation.

Predictive –LL(1) The parsing table is a two dimensional array M[A ,a], where A is a nonterminal, and a is a terminal or the symbol $. The parser is controlled by a program that behaves as follows: The program determines X, the symbol on top of the stack, and a the current input symbol. These two symbols determine the action of the parser.

Predictive –LL(1) There are three possibilities: 1.If X = a = $, the parser halts and announces successful completion of parsing. 2. If X = a ≠ $, the parser pops X off the stack and advances the input pointer to the next input symbol.

If X is a nonterminal , the program consults entry M[X, a] of the parsing table M. This entry will be either an X-production of the grammar or an error entry. If M[X, a] = {X → UVW}, the parser replaces X on top of the stack by WVU (with U on top). If M[X, a] = error, the parser calls an error recovery routine Predictive –LL(1)

Predictive –LL(1)  The Construction of predictive parser is aided by two functions associated with a grammar G.  These Functions, First and Follow allows us to fill the entries of predictive parsing table for grammar G

Predictive –LL(1)

Predictive –LL(1)

Predictive –LL(1)

Bottom-Up Parsing Bottom-up parsing starts from the leaf nodes of a tree and works in upward direction till it reaches the root node. we start from a sentence or input string and then apply production rules in reverse manner (reduction) in order to reach the start symbol. The process of parsing halts successfully as soon as we reach to start symbol.

Bottom-Up Parsing Bottom-up parsing is the process of “reducing “ the string w to the start symbol of the grammar . At each reduction step , a s pecific substring matching the body of production is replaced by the nonterminal at the head of that production. The key decisions during Bottom-up parsing are about when to reduce and what production to apply as the parse proceeds.

Bottom-Up Parsing

A right most derivation in reverse can be obtained by handle pruning.

Shift-Reduce Parsing Shift Reduce Parsing is a form of Bottom-Up Parsing. It generates the Parse Tree from Leaves to the Root. Shift reduce parsing is a process of reducing a string to the start symbol of a grammar. Shift reduce parsing uses a stack to hold the grammar and an input buffer to hold the string..

Shift reduce parsing performs the actions: shift , reduce, accept,error At the shift action, the current symbol in the input string is pushed to a stack. At each reduction, the symbols will replaced by the non-terminals. The symbol is the right side of the production and non-terminal is the left side of the production.

LR-PARSERS LR parsers are used to parse the large class of context free grammars . This technique is called LR(k) parsing . L is left-to-right scanning of the input. R is for constructing a right most derivation in reverse. k is the number of input symbols of lookahead that are used in making parsing decisions.

LR-PARSERS

LR-PARSERS SLR(l) – Simple LR Works on smallest class of grammar. Few number of states, hence very small table Simple and fast construction . LR( 1) – LR parser     Also called as Canonical LR parser.     Works on complete set of LR(l) Grammar.     Generates large table and large number of states.     Slow construction.

LR-PARSERS LALR(l) – Look ahead LR parser o Works on intermediate size of grammar. o Number of states are same as in SLR(l). In all type of LR parsing, input, output and stack are same but parsing table is different. Reasons for attractiveness of LR parser LR parsers can handle a large class of context-free grammars. The LR parsing method is a most general non-back tracking shift-reduce parsing method. An LR parser can detect the syntax errors as soon as they can occur. LR grammars can describe more languages than LL grammars.

LR-PARSERS Drawbacks of LR parsers It is too much work to construct LR parser by hand. It needs an automated parser generator. If the grammar contains ambiguities or other constructs then it is difficult to parse in a left-to-right scan of the input.

MODEL OF LR-PARSER

MODEL OF LR-PARSER

MODEL OF LR-PARSER

Simple LR-(SLR)PARSER

Simple LR-(SLR)PARSER

Simple LR-(SLR)PARSER

Simple LR-(SLR)PARSER

Simple LR-(SLR)PARSER

CLR Parser

LALR Parser
Tags