This presentation covers the implementation of a lexical analyzer.
Size: 3.73 MB
Language: en
Added: Jul 16, 2021
Slides: 27 pages
Slide Content
Lexical Analysis - Implementation
Lexical Analysis
Tokens
Tokens -> RE Specification of Tokens: The patterns corresponding to a token are generally specified using a compact notation known as a regular expression. Regular expressions of a language are built by combining members of its alphabet. A regular expression r denotes a set of strings L(r), where L(r) is called a regular set or a regular language and may be infinite.
RegEx A regular expression is defined as follows: the basic regular expression a denotes the set {a}, where a ∈ Σ; that is, L(a) = {a}. The regular expression ε denotes the set {ε}. If r and s are two regular expressions denoting the sets L(r) and L(s), then: r | s denotes the union L(r) ∪ L(s); rs denotes the concatenation L(r)L(s); and r* denotes the Kleene closure L(r)*.
Token Recognition Finite automata are recognizers that can identify the tokens occurring in the input stream. A finite automaton (FA) consists of: a finite set of states; a set of transitions (or moves) between states; a special start state; and a set of final (accepting) states.
Transition Diagrams
Input Buffering Buffer Pairs: Because of the time taken to process each character and the large number of characters that must be processed, specialized buffering techniques have been developed to reduce the overhead of processing a single input character.
Language to Specify LA Lex allows one to specify a lexical analyzer by writing regular expressions that describe the patterns for tokens. The input notation for the Lex tool is referred to as the Lex language, and the tool itself is the Lex compiler. Behind the scenes, the Lex compiler transforms the input patterns into a transition diagram and generates code, in a file called lex.yy.c, that simulates this transition diagram.
Language to Specify LA Creating an LA with Lex: a Lex source program is given to the Lex compiler, which produces lex.yy.c; lex.yy.c is then compiled by the C compiler into an executable that is the working lexical analyzer.
Language to Specify LA In simple words: Lex converts a Lex source program into a lexical analyzer, and the lexical analyzer converts the input stream into tokens.
Lex Source Program - Structure A Lex program has the following form: Auxiliary definitions — regular definitions of the form Di = Ri, where Di is a shorthand name for the regular expression Ri (e.g. D1 = R1, D2 = R2). Translation rules — rules that tell the LA which action to take upon encountering each of these tokens.
Lex Source Program - Structure Auxiliary Definitions:
D1: letter = A | B | C | … | Z | a | b | … | z
D2: digit = 0 | 1 | 2 | … | 9
D3: identifier = letter (letter | digit)*
D4: integer = digit+
D5: sign = + | -
D6: signed-integer = sign integer
Lex Source Program - Structure Translation Rules: Each translation rule has the form Pi { Actioni }. Each pattern Pi is a regular expression, which may use the regular definitions of the declarations section. The actions are fragments of code, typically written in C. Example: for the keyword 'begin' -> begin { return 1; } Example: for a variable -> identifier { install(); return 6; }
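Putting the two sections together, a minimal Lex source file along these lines might look as follows. This is a sketch, not the slides' program; the install() helper is the hypothetical symbol-table routine from the example above and is assumed to be defined elsewhere:

```lex
%{
#include <stdio.h>
/* install() is assumed: it would record the identifier
 * in the symbol table, as in the slide's example action. */
%}

letter   [A-Za-z]
digit    [0-9]

%%
begin                         { return 1; }   /* keyword   */
{letter}({letter}|{digit})*   { return 6; }   /* identifier */
[ \t\n]                       { /* skip whitespace */ }
%%
```

The part above the first %% holds the auxiliary (regular) definitions; the part between the %% markers holds the translation rules, each pattern paired with its C action.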
Implementation of LA Lex generates the LA as its output, taking a Lex program as input. A Lex program is a collection of patterns (REs) and their corresponding actions. The patterns represent the tokens to be recognized by the LA that is to be generated. For each pattern, a corresponding NFA is designed.
Implementation of LA There can be n NFAs for n patterns. A new start state is added and connected to the start states of all the NFAs by ε-transitions. The final state of each NFA indicates that it has found its own token Pi. The combined NFA is then converted into a DFA; the final state reached in the DFA shows which token has been found.
Implementation of LA If a DFA state includes no final state of any NFA, an error is reported. If a final state of the DFA includes the final states of more than one NFA, the pattern that comes first in the translation rules has priority.