CSE-3201-Lecture-02-Lexical Analysis-Part-I.pptx

About This Presentation

Lexical analysis phase of a compiler


Slide Content

Lecture-02: Lexical Analysis, Part I
Mostafiz Ahammed, Lecturer
Department of Computer Science and Engineering, Notre Dame University Bangladesh

Lexical Analysis: Basic Concepts & Regular Expressions
What does a lexical analyzer do? How does it work?
Formalizing token definition & recognition
Reviewing finite automata concepts: non-deterministic and deterministic FA
Conversion process: regular expressions to NFA, NFA to DFA
Relating NFAs/DFAs/conversion to lexical analysis

Lexical Analysis
The lexical analyzer reads the input characters of the source program, groups them into lexemes, and produces as output a sequence of tokens, one for each lexeme in the source program. The stream of tokens is sent to the parser for syntax analysis.
Suppose we pass the statement a = b + c ; through the lexical analyzer. It will generate a token sequence like this: id1 = id2 + id3 ; where each id refers to its variable's entry in the symbol table, which holds all of its details.
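As a concrete illustration (not from the slides), here is a minimal C sketch of how a scanner might group the characters of a = b + c ; into lexemes and emit one token per lexeme. The token names (id, assign, plus, semi) and the absence of a real symbol table are simplifications made for this example.

/* Minimal scanner sketch (illustration only): groups the characters of
   "a = b + c;" into lexemes and prints one token per lexeme. */
#include <ctype.h>
#include <stdio.h>

int main(void) {
    const char *src = "a = b + c;";   /* source statement from the slide */
    const char *p = src;
    char lexeme[32];

    while (*p != '\0') {
        if (isspace((unsigned char)*p)) {            /* skip white space */
            p++;
        } else if (isalpha((unsigned char)*p)) {     /* identifier: letter (letter|digit)* */
            int n = 0;
            while (isalnum((unsigned char)*p) && n < 31)
                lexeme[n++] = *p++;
            lexeme[n] = '\0';
            printf("<id, %s> ", lexeme);             /* a real scanner would also enter the lexeme in the symbol table */
        } else {                                     /* single-character operator / separator */
            switch (*p) {
            case '=': printf("<assign> "); break;
            case '+': printf("<plus> ");   break;
            case ';': printf("<semi> ");   break;
            default:  printf("<unknown %c> ", *p); break;
            }
            p++;
        }
    }
    printf("\n");
    return 0;
}

Compiled and run, it prints <id, a> <assign> <id, b> <plus> <id, c> <semi>, which mirrors the id1 = id2 + id3 ; sequence above.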

Lexical Analysis
Sample program in C:
int main() {
    // 2 variables
    int a, b;
    a = 10;
    return 0;
}
All 18 valid tokens are:
'int' 'main' '(' ')' '{' 'int' 'a' ',' 'b' ';' 'a' '=' '10' ';' 'return' '0' ';' '}'
Note: you can observe that the comment has been omitted.

Lexical Analysis
As another example, consider a printf statement in C such as printf("Hello"); . There are 5 valid tokens in this printf statement: 'printf', '(', the string literal, ')', and ';'.

Lexical Analyzer in Perspective
Fig: Interaction between the lexical analyzer and the parser (source program → lexical analyzer → token → parser, with the parser requesting tokens via getNextToken and both components sharing the symbol table).
Important issue: what are the responsibilities of each box? Focus on the lexical analyzer and the parser.

Lexical Analyzer in Perspective
Identify the words: lexical analysis converts a stream of characters (the input program) into a stream of tokens. Also called scanning or tokenizing.
What is a token? A lexical token is a sequence of characters that can be treated as a unit in the grammar of the programming language.
Examples of tokens (a keyword-vs-identifier lookup sketch follows this list):
Keywords: for, while, if, etc.
Identifiers: variable names, function names, etc.
Operators: '+', '++', '-', etc.
Separators: ',', ';', etc.
Identify the sentences: parsing. Derive the structure of sentences: construct parse trees from a stream of tokens.
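Keywords and identifiers match the same letter-followed-by-letters-and-digits shape, so a simple way to tell them apart is to read the whole word and then look it up in a keyword table. The sketch below is illustrative only; the function name classify and the tiny hard-coded keyword list are assumptions made for the example.

/* Sketch: decide whether a scanned word is a keyword or an identifier
   by looking it up in a small keyword table. */
#include <stdio.h>
#include <string.h>

static const char *keywords[] = { "for", "while", "if", "int", "return" };

const char *classify(const char *word) {
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
        if (strcmp(word, keywords[i]) == 0)
            return "keyword";
    return "identifier";
}

int main(void) {
    printf("while -> %s\n", classify("while")); /* keyword    */
    printf("count -> %s\n", classify("count")); /* identifier */
    return 0;
}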

Interaction of Lexical Analyzer with Parser
Secondary tasks of the lexical analyzer:
Strip out comments and white space from the source (a small stripping sketch follows).
Correlate error messages with the source program.
Fig: Input → Scanner → Parser, with a shared symbol table; the scanner pulls characters via Next_char() and the parser pulls tokens via Next_token().
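A minimal sketch of the first secondary task, assuming a simplified source string and C-style block comments only; the function name strip and the fixed-size output buffer are illustrative choices, not part of the slides.

/* Sketch of the "strip comments and white space" secondary task:
   copies src to out, dropping spaces, tabs, newlines and block comments. */
#include <stdio.h>

void strip(const char *src, char *out) {
    while (*src != '\0') {
        if (src[0] == '/' && src[1] == '*') {          /* block comment: skip to the closing marker */
            src += 2;
            while (*src != '\0' && !(src[0] == '*' && src[1] == '/'))
                src++;
            if (*src != '\0') src += 2;
        } else if (*src == ' ' || *src == '\t' || *src == '\n') {
            src++;                                      /* white space is not passed to the parser */
        } else {
            *out++ = *src++;                            /* everything else is kept for tokenizing */
        }
    }
    *out = '\0';
}

int main(void) {
    char out[64];
    strip("a = 10 ; /* two variables */ b = a + 1 ;", out);
    printf("%s\n", out);    /* prints: a=10;b=a+1; */
    return 0;
}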

Lexical Analyzer vs. Parser
LEXICAL ANALYZER:
Scan input
Remove WS, NL, ...
Identify tokens
Create symbol table
Insert tokens into symbol table
Generate errors
Send tokens to parser
PARSER:
Perform syntax analysis
Actions dictated by token order
Update symbol-table entries
Create abstract representation of source
Generate errors
And more... (we'll see later)

Introducing Basic Terminology
TOKEN: A token is a sequence of characters that can be treated as a single logical entity. Typical tokens are: 1) identifiers 2) keywords 3) operators 4) special symbols 5) constants 6) numbers.
PATTERN: There is a set of strings in the input for which the same token is produced as output. This set of strings is described by a rule, called a pattern, associated with the token.
LEXEME: A lexeme is a sequence of characters in the source program that is matched by the pattern for a token. Identifiers: x, count, name, etc. (a pattern-matching sketch follows).
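To make the pattern/lexeme distinction concrete, the sketch below checks one pattern, letter (letter | digit)*, against candidate strings; any string it accepts is a lexeme for the token id. The function name matches_id is an illustrative choice, not part of the slides.

/* Sketch of checking one pattern: an identifier is a letter followed by
   letters and digits.  Every accepted string is a lexeme for token "id". */
#include <ctype.h>
#include <stdio.h>

int matches_id(const char *s) {
    if (!isalpha((unsigned char)s[0]))     /* must start with a letter */
        return 0;
    for (int i = 1; s[i] != '\0'; i++)
        if (!isalnum((unsigned char)s[i])) /* the rest: letters or digits only */
            return 0;
    return 1;
}

int main(void) {
    printf("%d %d %d\n",
           matches_id("count"),   /* 1: lexeme for token id  */
           matches_id("D2"),      /* 1: lexeme for token id  */
           matches_id("2pi"));    /* 0: starts with a digit  */
    return 0;
}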

Introducing Basic Terminology

Token     | Sample Lexemes       | Informal Description of Pattern
const     | const                | const
if        | if                   | characters i, f
relation  | <, <=, =, <>, >, >=  | < or <= or = or <> or >= or >
id        | pi, count, D2        | letter followed by letters and digits
num       | 3.1416, 0, 6.02E23   | any numeric constant
literal   | "core dumped"        | any characters between " and " except "

The pattern classifies lexemes; the actual values are critical. This info is: 1. stored in the symbol table, 2. returned to the parser.

Token Stream
Tokens are the terminal symbols in the grammar for the source language: keywords, operators, identifiers, constants, literal strings, punctuation symbols, numbers, etc.
Source: if ( x == -3.1415 ) /* test x */ then ...
Token stream: < IF > < LPAREN > < ID, "x" > < EQUALS > < NUM, -3.14150000 > < RPAREN > < THEN > ...

Attributes for Tokens
Tokens influence parsing decisions; the attributes influence the translation of tokens. A token usually has only a single attribute: a pointer to the symbol-table entry. Other attributes (e.g. line number, lexeme) can be stored in the symbol table.

Attributes for Tokens
Tokens influence parsing decisions; the attributes influence the translation of tokens.
Example: E = M * C ** 2
<id, pointer to symbol-table entry for E>
<assign_op>
<id, pointer to symbol-table entry for M>
<mult_op>
<id, pointer to symbol-table entry for C>
<exp_op>
<num, integer value 2>
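One common way to realize "token plus attribute" in C is a small record holding the token kind together with either a symbol-table index or a literal value; the sketch below encodes the E = M * C ** 2 stream this way. The type and field names, and the symbol-table indices 0..2 for E, M and C, are assumptions made for this example.

/* Sketch of a token record whose attribute is either a symbol-table index
   (for identifiers) or a literal value (for numbers); operator tokens
   carry no attribute. */
#include <stdio.h>

enum kind { ID, ASSIGN_OP, MULT_OP, EXP_OP, NUM };

struct token {
    enum kind kind;
    int       sym_index;   /* attribute for ID: index into the symbol table */
    double    value;       /* attribute for NUM: the numeric constant       */
};

int main(void) {
    /* Token stream for  E = M * C ** 2  (symbol-table slots 0..2 assumed) */
    struct token stream[] = {
        { ID,        0, 0 },   /* <id, entry for E> */
        { ASSIGN_OP, 0, 0 },   /* <assign_op>       */
        { ID,        1, 0 },   /* <id, entry for M> */
        { MULT_OP,   0, 0 },   /* <mult_op>         */
        { ID,        2, 0 },   /* <id, entry for C> */
        { EXP_OP,    0, 0 },   /* <exp_op>          */
        { NUM,       0, 2 },   /* <num, 2>          */
    };
    printf("%d tokens\n", (int)(sizeof stream / sizeof stream[0]));
    return 0;
}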

Reading Materials
Chapter 3 of your textbook: Compilers: Principles, Techniques, and Tools.
https://www.geeksforgeeks.org/compiler-design-tutorials/
