SYLLABUS Role of Parser – Grammars – Error Handling – Context-free grammars – Writing a grammar –Top Down Parsing – General Strategies Recursive Descent Parser Predictive Parser-LL(1) Parser-Shift Reduce Parser-LR Parser – LR(0) Item Construction of SLR Parsing Table –Introduction to LALR Parser – Error Handling and Recovery in Syntax Analyzer- YACC.
Syntax Analysis By design , every programming language has precise rules that prescribe the syntactic structure of well formed programs In c , for example , a program is made up of functions , a function out of declarations and statements , a statement out of expressions and so on. The syntax of programming language constructs can be specified by context-free grammars or BNF ( Backus-Naur-Form)notation. The Parser determines the syntax or structure of a program. That is,it checks whether the input is syntactically correct or not.
Role of Parser Parser also called syntax analyzer is the one which does parsing Parsing is the process of getting tokens from the lexical analyzer and obtains a derivation for the sequence of tokens and builds a parse tree Thus if the program is syntactically correct , the parse tree is generated If a derivation for the sequence of tokens does not exist i.e., if the program is syntactically wrong , it results in syntax error and the parser displays the appropriate error messages. The parse trees are very important in figuring out the meaning of a program or part of the program The parse tree is also called the syntax tree or derivation tree.
Role of a parser In compiler model, the parser obtains a string of tokens from the lexical analyzer and verifies that the string of token names can be generated by the grammar for the source language.
Contd.. A parser is a software component that takes input data and builds a data structure called parse tree. The parser reads the tokens from the lexical analyser and checks whether the sequence of tokens matches the grammar rules of the programming language and generate the parse tree. A syntax error is reported when there is mismatch that is if the token sequence does not match the grammar.
Types of parser Three types of parser Universal parser – the Cocke -Youngster- Kasami algorithm and Earley’s algorithm can parse any grammar, Top-down parser – a top-down parser starts with the root of the parse tree , labelled with the start or goal symbol of the grammar and proceeds down to the leaves. Bottom-up parser – a bottom up parser starts with the leaves and moves to the start symbol or the root.
Error handling in syntax analysis The following are the common errors: Lexical Error: such as misspelling an identifier, keyword or operator Example: misspelling as ‘ whil ’ or ‘ caase ’ Syntactic Error: such as an arithmetic expression with unbalanced paranthesis Example: a=a%+s; or a=(( a+b )/n)*100) Semantic Error: such as operator applied to an incompatible operand. Example: c=true*8; this expression is syntactically right, but it does not has meaning. Logical Error: such as infinitely recursive call. Example: for( i =0;i< n;i ++);
Error recovering strategies The following are the common error recovering strategies: Panic Mode Phrase level Recovery Error production Global correction
1.Panic Mode When a parser encounters an error anywhere in the statement ,it ignores the rest of the statement by not processing input from erroneous input to delimiter. This method often skips a considerable amount of input without checking it for additional errors. It is an easiest way of error-recovery. It prevents the parser from developing infinite loops
Contd.. Example: a= b+c a= b+c // after reach c parser discards input symbol one at a time d= e+f ; The compiler will discard all subsequent token till the semicolon encountered. int a, 5abcd, sum, $2; // After int a, 5abcd , sum, $2 ; // parser discards input symbol one at a time.
Contd.. Advantage: It’s easy to use. The program never falls into the loop. Disadvantage: This technique may lead to semantic error or runtime error in further stages.
2.Phrase Level Recovery On discovering an error, a parser may perform local corrections on the remaining input It may replace a prefix of the remaining input by some string that allows the parser to continue Parser designers have to be careful here one wrong correction can lead to infinite loop Example: in case of an error like the previous ,it will report the error and generate the”;” and continue processing.
3.Error Production It requires good knowledge of common errors that might get encountered, then we can augment the grammar for the corresponding language with error productions that generate the erroneous constructs. If error production is used during parsing, we can generate an appropriate error message to indicate the error that has been recognized in the input. This method is extremely difficult to maintain, because if we change grammar, then it becomes necessary to change the corresponding productions.
Contd.. Example: Suppose the input string is abcd . Grammar: S-> A A-> aA | bA | a | b B-> cd The input string is not obtainable by the above grammar, so we need to add Augmented Grammar. Grammar: E->SB // AUGMENT THE GRAMMAR S-> A A-> aA | bA | a | b B-> cd Now, string abcd is possible to obtain.
Contd.. Advantages: Syntactic phase errors are generally recovered by error productions. Disadvantages: The method is very difficult to maintain because if we change the grammar then it becomes necessary to change the corresponding production. It is difficult to maintain by the developers.
4.Global Correction We often want such a compiler that makes very few changes in processing an incorrect input string to the correct input string. Given an incorrect input string x and grammar G , the algorithm itself can find a parse tree for a related string y (Expected output string); such that a number of insertions, deletions, and changes of token require to transform x into y is as low as possible. Global correction methods increase time & space requirements at parsing time. This is simply a theoretical concept.
CONTEXT FREE GRAMMAR
Context free Grammar(CFG) Context free grammar is a formal grammar which is used to generate all possible strings in a given formal language. Context free grammar G can be defined by four tuples as: G= (V, T, P, S) G Describes a grammar V set of variables or non-terminals T set of terminal symbols P set of production rules S Start symbols