This ppt is about lexical analysis of compiler design.
Size: 647.78 KB
Language: en
Added: Jul 16, 2021
Slides: 21 pages
Slide Content
Lexical Analysis
Lexical Analysis 1 st phase of compiler. Part of front end of compiler. It is usually implemented as subroutine or co-routine of parser.
Lexical Analysis Reads the source program as a file of characters and divide them up into tokens. Tokens are recognized using Finite Automata or Transition Diagrams .
Lexical Analysis The vital functions of Lexical Analyzer are:- Separate tokens from program Insert tokens into symbol table Eliminates whitespaces, comments, newline characters, etc. from string. Give tokens to parser (pass their integer no./code to parser) Keep track of line numbers (for error handling). Pre-processing of Macros.
Lexical Analysis It reads characters from the input, groups them into lexeme and Then, passes the tokens formed by the lexemes, together with their attribute value , to the parser. In some situations ,it has to read some characters ahead before it can decide on the token to be returned to the parser.
Lexical Analysis Ex: lexical analyzer for C must read ahead after it sees the character ‘>’. If the next character is = ,then the character sequence >= is the lexeme forming the token for the “Greater than or equal to ” operator. Other wise > is the lexeme forming “ Greater than ” operator ,and the lexical analyzer has read one character too many.
Lexical Analysis The extra character has to be pushed back on to the input , because it can be the beginning of the next lexeme in the input . The lexical analyzer and parser form a producer-Consumer pair. Produced tokens can be held in a token buffer until they are consumed.
Input Buffering The interaction b/w the two is constrained only by the size of the buffer , because:- lexical analyzer can not proceed when the buffer is full. parser can not proceed when the buffer is empty. Commonly ,the buffer holds just one token. Lexical Analyzer can be a procedure called by a parser, returning tokens on demand.
Input Buffering The implementation of reading and pushing back character is usually done by setting up an input buffer. A block of character is read into the buffer at a time. A pointer keeps track of the portion of the input that has been analysed. Pushing back a character is implemented by moving the pointer.
Input Buffering Source program is read from HDD. LA has to access secondary memory everytime it needs to identify tokens. It may be time consuming, hence an input buffer is used. LA scans input string from left to right one char. at a time to identify tokens.
Input Buffering It uses 2 pointers: Begin Pointer ( bptr ) – points to begin of string. Look-ahead Pointer ( lptr ) – moves ahead to search for end of token. Ex: int a, b;
Input Buffering
Input Buffering
Tokens, Lexemes & Patterns Connected with lexical analysis are three important terms with similar meaning. Lexeme Token Patterns
Tokens, Lexemes & Patterns A token is a pair consisting of a token name and an optional attribute value. Token names: Keywords , operators , identifiers , constants , literal strings , punctuation symbols(such as commas, semi-colons)
Tokens, Lexemes & Patterns A lexeme is a sequence of characters in the source program that matches the pattern for a token and is identified by the lexical analyzer as an instance of that token . Ex: Relation Operators {<,<=,>,>=,==,<>}
Tokens, Lexemes & Patterns A pattern is a description of the form that the lexemes of token may take. It gives an informal or formal description of a token . Ex: identifier Gives a precise description/ specification of tokens. Used to automatically generate a lexical analyzer