Introduction Syntax analysis is the second phase of the compiler. It takes the tokens produced by the lexical analyzer as input and generates a syntax tree or parse tree. The syntax of programming language constructs can be specified by a Context-Free Grammar (CFG) or in BNF (Backus-Naur Form) notation.
Introduction Advantages of Grammar for Syntax Specification A grammar gives a precise and easy-to-understand syntactic specification of a programming language. An efficient parser can be constructed automatically from a properly designed grammar. A grammar imparts a structure to a source program that is useful for its translation into object code and for the detection of errors. New constructs can be added to a language more easily when there is a grammatical description of the language.
Introduction Role of the Parser The parser or syntactic analyzer obtains a string of tokens from the lexical analyzer and verifies that the string can be generated by the grammar for the source language. It reports any syntax errors in the program. It also recovers from commonly occurring errors so that it can continue processing its input.
Introduction Role of the Parser: Position of the Parser in the Compiler Model. The lexical analyzer reads the source program and supplies tokens to the parser on demand; the parser builds a parse tree that is passed on to the rest of the front end, and both phases interact with the symbol table.
Role of the Parser It verifies that the structure of the token stream conforms to the grammar. It constructs the parse tree. It reports errors. It performs error recovery.
Role of the Parser 3 Types of Parsers for Grammars: 1. Universal parsers can parse any grammar, but they are too inefficient to use in production compilers. 2. Top-down parsers build the parse tree from the top (root) to the bottom (leaves). 3. Bottom-up parsers start from the leaves and work their way up to the root. In all three cases, the input to the parser is scanned from left to right, one symbol at a time.
Issues: The parser cannot detect errors such as: 1. variable re-declaration, 2. use of a variable before it is initialized, 3. data type mismatch for an operation. These issues are handled by the semantic analysis phase.
Role of the Parser Representative Grammars: The following grammar describes expressions, terms, and factors: E -> E + T | T, T -> T * F | F, F -> ( E ) | id.
Role of the Parser Representative Grammars: The above grammar is left recursive. This expression grammar belongs to the class of LR grammars, which are suitable for bottom-up parsing. The grammar can be extended to handle additional operators and additional levels of precedence.
Role of the Parser Representative Grammars: The following non-left-recursive variant of the expression grammar will be used for top-down parsing: E -> T E', E' -> + T E' | ε, T -> F T', T' -> * F T' | ε, F -> ( E ) | id.
Role of the Parser Representative Grammars: The following grammar is useful for describing techniques for handling ambiguities during parsing, since it permits more than one parse tree for expressions like a + b * c: E -> E + E | E * E | ( E ) | id.
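For instance, the sentence a + b * c (with a, b, and c read as the token id) has two distinct leftmost derivations under this grammar, one grouping the operands as (a + b) * c and the other as a + (b * c):
E => E * E => E + E * E => id + E * E => id + id * E => id + id * id
E => E + E => id + E => id + E * E => id + id * E => id + id * id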
Role of the Parser Syntax Error Handling: Planning the error handling right from the start can both simplify the structure of the compiler and improve its handling of errors. Common programming errors can occur at many different levels: 1. Lexical errors include misspellings of identifiers, keywords, or operators. 2. Syntactic errors include misplaced semicolons or extra or missing braces. 3. Semantic errors include type mismatches between operators and operands. 4. Logical errors, such as an infinitely recursive call.
Role of the Parser Syntax Error Handling: The error handler in a parser has the following goals: It should report the presence of errors clearly and accurately. It should recover from each error quickly enough to be able to detect subsequent errors. It should not significantly slow down the processing of correct programs.
Role of the Parser Error-Recovery Strategies: The different strategies that a parser uses to recover from a syntactic error are: 1. Panic mode 2. Phrase level 3. Error productions 4. Global correction
Role of the Parser Error-Recovery Strategies: 1. Panic mode: On discovering an error, the parser discards input symbols one at a time until a synchronizing token is found. The synchronizing tokens are usually delimiters, such as a semicolon or end. This method has the advantage of simplicity and never gets into an infinite loop; when multiple errors in the same statement are rare, it is quite adequate. 2. Phrase level: On discovering an error, the parser performs a local correction on the remaining input that allows it to continue, for example inserting a missing semicolon or deleting an extraneous one.
Role of the Parser Error-Recovery Strategies: 3. Error productions: The parser is constructed from a grammar augmented with error productions. If an error production is used during parsing, appropriate error diagnostics can be generated to indicate the erroneous construct recognized in the input. 4. Global correction: Given an incorrect input string x and a grammar G, certain algorithms can find a parse tree for a related string y such that the number of insertions, deletions, and changes of tokens required to transform x into y is as small as possible. However, these methods are in general too costly in terms of time and space.
Context-Free Grammars: Formal Definition. Grammars are used to systematically describe the syntax of programming language constructs such as expressions and statements. A grammar is a finite set of formal rules for generating syntactically correct sentences. A context-free grammar G = (V, T, P, S) consists of a set of nonterminals or variables (V), a set of terminals (T), a set of productions (P), and a start symbol (S).
Context-Free Grammar: Formal Definition. Example: stmt -> if ( expr ) stmt else stmt. 1. Terminals (T) are the basic symbols from which strings are formed; they are the tokens returned by the lexical analyzer, e.g. if, else, "(", ")". 2. Nonterminals (V) are syntactic variables that denote sets of strings, e.g. stmt, expr. 3. In a grammar, one nonterminal is distinguished as the start symbol, and the set of strings it denotes is the language generated by the grammar. 4. The productions of a grammar specify the manner in which terminals and nonterminals can be combined to form strings.
Context-Free Grammar: Formal Definition. Each production consists of: a) a nonterminal called the head or left side of the production, b) the symbol ->, and c) a body or right side consisting of zero or more terminals and nonterminals.
CFG Formal Definition: Example for arithmetic expressions. The representative expression grammar given earlier, E -> E + T | T, T -> T * F | F, F -> ( E ) | id, illustrates these components: E, T, and F are the nonterminals, E is the start symbol, and id, +, *, (, ) are the terminals.
Parse Trees and Derivations: Derivations can be shown in the form of a tree; such trees are called derivation trees or parse trees. Both leftmost and rightmost derivations can be represented by parse trees.
Parse Trees and Derivations: A parse tree is a graphical representation of a derivation that filters out the order in which productions are applied to replace nonterminals. Each interior node of a parse tree represents the application of a production: the node is labeled with the nonterminal A in the head of the production, and its children are labeled, from left to right, by the symbols in the body of the production.
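For instance, using the productions E -> E + E and E -> id, the derivation E => E + E => id + E => id + id corresponds to the following parse tree, whose leaves, read from left to right, spell out the sentence id + id:
      E
    / | \
   E  +  E
   |     |
   id    id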
Ambiguous Grammar: A grammar that produces more than one parse tree for some sentence is said to be ambiguous. Equivalently, an ambiguous grammar is one that produces more than one leftmost derivation or more than one rightmost derivation for the same sentence.
Verifying the Language Generated by a Grammar: A proof that a grammar G generates a language L has two parts: show that every string generated by G is in L, and conversely, show that every string in L can be generated by G.
Context-Free Grammars vs Regular Expressions: Grammars are a more powerful notation than regular expressions. Every construct that can be described by a regular expression can also be described by a grammar, but not vice versa.
Writing a Grammar: Lexical Analysis vs Syntactic Analysis. The reasons for using regular expressions to define the lexical syntax of a language are as follows: The lexical rules of a language are frequently quite simple, and to describe them we do not need a notation as powerful as grammars. Regular expressions generally provide a more concise and easier-to-understand notation for tokens than grammars. More efficient lexical analyzers can be constructed automatically from regular expressions than from arbitrary grammars.
Lexical Analysis vs Syntactic Analysis Regular expressions are most useful for describing the structure of constructs such as identifiers, constants, keywords, and whitespace. Grammars, on the other hand, are most useful for describing nested structures such as balanced parentheses, matching begin-end's, corresponding if-then-else's, and so on.
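For example, the language of balanced parentheses, generated by the two-production grammar S -> ( S ) S | ε, is a typical nested structure: it can be described by this grammar but by no regular expression, because recognizing it requires keeping count of the unmatched open parentheses.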
Eliminating Ambiguity: The ambiguity over whether to associate an else with the first or the second if statement is called the dangling-else problem. There are two methods for eliminating ambiguity in CFGs: 1. using disambiguating rules, and 2. rewriting the grammar, for example by using the precedence and associativity of operators; the rewritten grammar for the dangling else is shown below.
Eliminating Ambiguity: The dangling-else ambiguity can be eliminated by rewriting the grammar so that a statement appearing between an if and an else must be matched, that is, it must not end in an if without an else: stmt -> matched_stmt | open_stmt, matched_stmt -> if ( expr ) matched_stmt else matched_stmt | other, open_stmt -> if ( expr ) stmt | if ( expr ) matched_stmt else open_stmt. This rewriting associates each else with the closest previous unmatched if.
Elimination of Left Recursion: A grammar is left recursive if it has a nonterminal A such that there is a derivation A => A α, for some string α. Top-down parsing methods cannot handle left-recursive grammars, so a transformation is needed to eliminate left recursion.
The left-recursive pair of productions A -> A α | β can be replaced by the non-left-recursive productions A -> β A', A' -> α A' | ε without changing the language generated.
More generally, suppose the A-productions are A -> A α1 | A α2 | ... | A αm | β1 | β2 | ... | βn, where no βi begins with A. Replace the A-productions by A -> β1 A' | β2 A' | ... | βn A' and A' -> α1 A' | α2 A' | ... | αm A' | ε.
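The following is a small illustrative sketch in Python of this transformation for a single nonterminal; the function name, the representation of production bodies as lists of symbols, and the marker EPS are assumptions made for the example, not notation from the notes.

EPS = 'ε'   # marker for the empty string

def eliminate_immediate_left_recursion(A, bodies):
    # Split the alternatives into the left-recursive ones (A -> A alpha) and the
    # others (A -> beta); assumes at least one non-recursive alternative exists.
    recursive = [b[1:] for b in bodies if b and b[0] == A]      # the alpha parts
    others    = [b for b in bodies if not b or b[0] != A]       # the beta parts
    if not recursive:
        return {A: bodies}                                      # nothing to do
    A2 = A + "'"
    return {
        A:  [beta + [A2] for beta in others],                   # A  -> beta A'
        A2: [alpha + [A2] for alpha in recursive] + [[EPS]],    # A' -> alpha A' | ε
    }

# E -> E + T | T   becomes   E -> T E',  E' -> + T E' | ε
print(eliminate_immediate_left_recursion('E', [['E', '+', 'T'], ['T']]))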
Left Factoring: Left factoring is a grammar transformation that is useful for producing a grammar suitable for predictive, or top-down, parsing. When the choice between two alternative A-productions is not clear because they begin with the same prefix, the productions can be rewritten to defer the decision until enough of the input has been seen to make the right choice. In general, if A -> α β1 | α β2 are two A-productions whose bodies share the non-empty prefix α, they are replaced by A -> α A' and A' -> β1 | β2. The classic example is the dangling-else construct stmt -> if ( expr ) stmt else stmt | if ( expr ) stmt, whose two alternatives share the prefix if ( expr ) stmt.
Left Factoring Algorithm: For each nonterminal A, find the longest prefix α common to two or more of its alternatives. If α is not ε, replace the A-productions A -> α β1 | α β2 | ... | α βn | γ (where γ stands for the alternatives that do not begin with α) by A -> α A' | γ and A' -> β1 | β2 | ... | βn. Repeat the transformation until no two alternatives for any nonterminal have a common prefix; a small code sketch of one factoring step follows.
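Below is a minimal sketch in Python of a single left-factoring step, assuming that all the alternatives passed in begin with the common prefix; the function names and the list-of-symbols representation of production bodies are illustrative choices, not part of the notes.

EPS = 'ε'   # marker for the empty string

def common_prefix(bodies):
    # Longest prefix of symbols shared by every alternative.
    prefix = []
    for symbols in zip(*bodies):
        if len(set(symbols)) == 1:
            prefix.append(symbols[0])
        else:
            break
    return prefix

def left_factor_once(A, bodies):
    prefix = common_prefix(bodies)
    if len(bodies) < 2 or not prefix:
        return {A: bodies}                        # nothing to factor
    A2 = A + "'"
    suffixes = [b[len(prefix):] or [EPS] for b in bodies]
    return {A: [prefix + [A2]], A2: suffixes}     # A -> prefix A', A' -> suffixes

# stmt -> if ( expr ) stmt else stmt | if ( expr ) stmt
# becomes stmt -> if ( expr ) stmt stmt',  stmt' -> else stmt | ε
print(left_factor_once('stmt',
      [['if', '(', 'expr', ')', 'stmt', 'else', 'stmt'],
       ['if', '(', 'expr', ')', 'stmt']]))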
Types of Parsers: The parsing methods commonly used in compilers are top-down and bottom-up, as outlined earlier; the following sections describe each approach in turn.
Top Down Parsing: Top-down parsing constructs the parse tree for the input string starting from the root, which is labeled with the start symbol, and creating the nodes of the tree in preorder. Equivalently, top-down parsing finds a leftmost derivation for the input string. At each step the key problem is deciding which production to apply for a nonterminal; once a production is chosen, the rest of the parsing consists of matching the symbols in the production body against the input. Recursive-descent parsing is a general form of top-down parsing that may require backtracking; predictive parsing is a special case that needs no backtracking.
Recursive-Descent Parser: A recursive-descent parser consists of a set of procedures, one for each nonterminal. Execution begins with the procedure for the start symbol, and parsing succeeds if that procedure scans the entire input string.
Recursive-Descent Parser: A typical procedure for a nonterminal A first chooses an A-production, say A -> X1 X2 ... Xk, and then processes the symbols of the body from left to right: if Xi is a nonterminal, the procedure for Xi is called; if Xi is a terminal that matches the current input symbol, the input pointer is advanced; otherwise an error has occurred and error() is called. A Python sketch of such a parser for the expression grammar is given below.
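The following is a minimal recursive-descent sketch in Python for the non-left-recursive expression grammar given earlier, written in the backtrack-free (predictive) style; the class name Parser, the representation of the input as a list of tokens, and the use of 'id' as a single token are assumptions made for this example.

class Parser:
    # One procedure per nonterminal of the grammar
    # E -> T E', E' -> + T E' | ε, T -> F T', T' -> * F T' | ε, F -> ( E ) | id.
    def __init__(self, tokens):
        self.tokens = tokens + ['$']          # $ marks the end of the input
        self.pos = 0

    def look(self):
        return self.tokens[self.pos]          # current input symbol

    def match(self, t):
        if self.look() == t:
            self.pos += 1                     # advance the input pointer
        else:
            raise SyntaxError(f'expected {t}, found {self.look()}')

    def E(self):                              # E -> T E'
        self.T(); self.E_()

    def E_(self):                             # E' -> + T E' | ε
        if self.look() == '+':
            self.match('+'); self.T(); self.E_()

    def T(self):                              # T -> F T'
        self.F(); self.T_()

    def T_(self):                             # T' -> * F T' | ε
        if self.look() == '*':
            self.match('*'); self.F(); self.T_()

    def F(self):                              # F -> ( E ) | id
        if self.look() == '(':
            self.match('('); self.E(); self.match(')')
        else:
            self.match('id')

    def parse(self):
        self.E(); self.match('$')             # the whole input must be consumed
        return True

print(Parser(['id', '+', 'id', '*', 'id']).parse())   # True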
First and Follow: The construction of both top-down and bottom-up parsers is aided by two functions associated with a grammar G, FIRST and FOLLOW. FIRST(α) is the set of terminals that begin the strings derivable from α; if α derives ε, then ε is also in FIRST(α). FOLLOW(A), for a nonterminal A, is the set of terminals that can appear immediately to the right of A in some sentential form; if A can be the rightmost symbol of some sentential form, then the end marker $ is also in FOLLOW(A). A sketch of how these sets can be computed follows.
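Below is an illustrative Python sketch that computes FIRST and FOLLOW for the non-left-recursive expression grammar by iterating the defining equations to a fixed point; the names grammar, first_of, EPS, and END, and the dictionary-of-lists representation, are assumptions made for this example.

EPS = 'ε'     # the empty string
END = '$'     # end-of-input marker

# Each nonterminal maps to a list of production bodies (lists of symbols).
grammar = {
    'E':  [['T', "E'"]],
    "E'": [['+', 'T', "E'"], [EPS]],
    'T':  [['F', "T'"]],
    "T'": [['*', 'F', "T'"], [EPS]],
    'F':  [['(', 'E', ')'], ['id']],
}
nonterminals = set(grammar)

def first_of(symbols, first):
    # FIRST of a sequence of grammar symbols.
    result = set()
    for X in symbols:
        if X not in nonterminals:          # terminal (or ε)
            result.add(X)
            return result
        result |= first[X] - {EPS}
        if EPS not in first[X]:
            return result
    result.add(EPS)                        # every symbol in the sequence can derive ε
    return result

# Iterate the FIRST equations until nothing changes.
first = {A: set() for A in nonterminals}
changed = True
while changed:
    changed = False
    for A, bodies in grammar.items():
        for body in bodies:
            add = first_of(body, first)
            if not add <= first[A]:
                first[A] |= add
                changed = True

# Iterate the FOLLOW equations until nothing changes ($ goes into FOLLOW of the start symbol).
follow = {A: set() for A in nonterminals}
follow['E'].add(END)
changed = True
while changed:
    changed = False
    for A, bodies in grammar.items():
        for body in bodies:
            for i, X in enumerate(body):
                if X in nonterminals:
                    trailer = first_of(body[i + 1:], first)
                    add = (trailer - {EPS}) | (follow[A] if EPS in trailer else set())
                    if not add <= follow[X]:
                        follow[X] |= add
                        changed = True

print(first)    # e.g. FIRST(E) = { '(', 'id' }
print(follow)   # e.g. FOLLOW(E) = { ')', '$' }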
Predictive LL(1) Parsing: An LL(1) parser is a non-recursive top-down parser. The first L means that the input is scanned from left to right, the second L means that the parser produces a leftmost derivation, and the 1 is the number of lookahead symbols, that is, how many input symbols are examined when making each parsing decision.
Construction of the Predictive Parsing Table: For each production A -> α of the grammar: 1. for each terminal a in FIRST(α), add A -> α to M[A, a]; 2. if ε is in FIRST(α), add A -> α to M[A, b] for every terminal b in FOLLOW(A), and if $ is in FOLLOW(A), add A -> α to M[A, $] as well. Every entry left undefined by these rules is an error entry. If each entry holds at most one production, the grammar is LL(1); a code sketch of this construction follows.
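Continuing the FIRST/FOLLOW sketch above (it reuses grammar, first, follow, first_of, and EPS defined there), the following illustrative Python fragment fills the predictive parsing table using these two rules.

table = {}
for A, bodies in grammar.items():
    for body in bodies:
        f = first_of(body, first)
        # Rule 1 contributes FIRST(body) minus ε; rule 2 contributes FOLLOW(A)
        # when the body can derive ε ($ is already a member of FOLLOW where appropriate).
        lookaheads = (f - {EPS}) | (follow[A] if EPS in f else set())
        for a in lookaheads:
            if (A, a) in table and table[(A, a)] != body:
                raise ValueError(f'not LL(1): conflict at M[{A}, {a}]')
            table[(A, a)] = body

print(table[('E', 'id')])    # ['T', "E'"]
print(table[("E'", ')')])    # ['ε']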
Predictive LL(1): Non-Recursive Predictive Parser. A non-recursive predictive parser maintains an explicit stack of grammar symbols instead of making recursive procedure calls.
Predictive LL(1): The predictive parser has an input buffer, a stack, a parsing table, and an output. The input buffer contains the string to be parsed, followed by $, the right end marker. The stack holds a sequence of grammar symbols with $, the bottom-of-stack marker, beneath them; initially the start symbol sits on top of $. The sequence of productions applied as the stack is expanded traces out a leftmost derivation.
Predictive LL(1): The parsing table is a two-dimensional array M[A, a], where A is a nonterminal and a is a terminal or the symbol $. The parser is controlled by a program that behaves as follows: it determines X, the symbol on top of the stack, and a, the current input symbol; these two symbols determine the action of the parser.
Predictive LL(1): There are three possibilities: 1. If X = a = $, the parser halts and announces successful completion of parsing. 2. If X = a ≠ $, the parser pops X off the stack and advances the input pointer to the next input symbol.
3. If X is a nonterminal, the program consults entry M[X, a] of the parsing table M. This entry is either an X-production of the grammar or an error entry. If M[X, a] = {X -> UVW}, the parser replaces X on top of the stack by WVU (with U on top). If M[X, a] = error, the parser calls an error-recovery routine.
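Continuing the same sketch (it reuses nonterminals, table, EPS, and END from the fragments above), the following illustrative Python loop implements exactly these three cases.

def parse(tokens):
    tokens = tokens + [END]
    stack = [END, 'E']                     # $ at the bottom, start symbol on top
    i = 0
    while stack:
        X, a = stack[-1], tokens[i]
        if X == a == END:                  # case 1: success
            return True
        if X not in nonterminals:          # terminal on top of the stack
            if X == a:                     # case 2: match, pop and advance
                stack.pop()
                i += 1
            else:
                raise SyntaxError(f'expected {X}, found {a}')
        else:                              # case 3: expand using the table
            body = table.get((X, a))
            if body is None:
                raise SyntaxError(f'no production for {X} on lookahead {a}')
            stack.pop()
            for sym in reversed(body):     # push the body with its first symbol on top
                if sym != EPS:
                    stack.append(sym)
    return False

print(parse(['id', '+', 'id', '*', 'id']))   # True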
Predictive LL(1): The construction of a predictive parser is thus aided by the two functions FIRST and FOLLOW associated with a grammar G; they allow us to fill in the entries of the predictive parsing table for G.
Bottom-Up Parsing: Bottom-up parsing starts from the leaf nodes of the tree and works upward until it reaches the root node. We start from the input string and apply production rules in reverse (reductions) in order to reach the start symbol; the parse halts successfully as soon as the start symbol is reached.
Bottom-Up Parsing: Bottom-up parsing is the process of "reducing" the string w to the start symbol of the grammar. At each reduction step, a specific substring matching the body of a production is replaced by the nonterminal at the head of that production. The key decisions during bottom-up parsing are when to reduce and which production to apply as the parse proceeds.
Bottom-Up Parsing: A handle is a substring of a right-sentential form that matches the body of some production and whose reduction represents one step of a rightmost derivation in reverse. A rightmost derivation in reverse can therefore be obtained by handle pruning: repeatedly locate the handle and reduce it to the head of the corresponding production.
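For example, with the expression grammar E -> E + T | T, T -> T * F | F, F -> ( E ) | id, the input id * id is reduced to the start symbol by pruning one handle at a time:
Right-sentential form    Handle    Reducing production
id * id                  id        F -> id
F * id                   F         T -> F
T * id                   id        F -> id
T * F                    T * F     T -> T * F
T                        T         E -> T
The result of the final reduction is the start symbol E.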
Shift-Reduce Parsing: Shift-reduce parsing is a form of bottom-up parsing. It builds the parse tree from the leaves to the root by reducing the input string to the start symbol of the grammar. A shift-reduce parser uses a stack to hold grammar symbols and an input buffer to hold the string to be parsed.
A shift-reduce parser performs four actions: shift, reduce, accept, and error. A shift pushes the current input symbol onto the stack. A reduce replaces the symbols on top of the stack that form the body (right side) of a production by the nonterminal at its head (left side). Accept announces successful completion of parsing, and error invokes error recovery.
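For example, a shift-reduce parse of id + id with the expression grammar given above proceeds as follows (the stack grows to the right, and $ marks the bottom of the stack and the end of the input):
Stack        Input        Action
$            id + id $    shift
$ id         + id $       reduce by F -> id
$ F          + id $       reduce by T -> F
$ T          + id $       reduce by E -> T
$ E          + id $       shift
$ E +        id $         shift
$ E + id     $            reduce by F -> id
$ E + F      $            reduce by T -> F
$ E + T      $            reduce by E -> E + T
$ E          $            accept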
LR-PARSERS: LR parsers are used to parse a large class of context-free grammars; the technique is called LR(k) parsing. L stands for left-to-right scanning of the input, R for constructing a rightmost derivation in reverse, and k for the number of input symbols of lookahead that are used in making parsing decisions.
LR-PARSERS: SLR(1), simple LR: works on the smallest class of grammars; has a small number of states and hence a very small table; simple and fast construction. LR(1), also called canonical LR: works on the complete set of LR(1) grammars; generates a large table and a large number of states; slow construction.
LR-PARSERS: LALR(1), lookahead LR: works on an intermediate-sized class of grammars; the number of states is the same as in SLR(1). In all types of LR parsing the input, output, and stack are the same, but the parsing table is different. Reasons for the attractiveness of LR parsers: LR parsers can handle a large class of context-free grammars. The LR parsing method is the most general non-backtracking shift-reduce parsing method known. An LR parser can detect a syntax error as soon as it is possible to do so on a left-to-right scan of the input. The class of grammars that can be parsed with LR methods is a proper superset of the class that can be parsed with predictive (LL) methods.
LR-PARSERS: Drawbacks of LR parsers: it is too much work to construct an LR parser by hand, so an automated parser generator is needed. If the grammar contains ambiguities or other constructs that are difficult to parse in a left-to-right scan of the input, extra effort is required to resolve them.