Lexical analysis - Compiler Design


About This Presentation

Role of the lexical analyzer, specification of tokens, recognition of tokens, lexical analyzer generator, finite automata, and design of a lexical analyzer generator.


Slide Content

Lexical Analyzer / Scanner

Outline
•Role of lexical analyzer
•Specification of tokens
•Recognition of tokens
•Lexical analyzer generator
•Finite automata
•Design of lexical analyzer generator

Lexical analyzer terminologies
Lexeme
Lexemes are the words derived from the character input stream.
A lexeme is a particular instance of a token.
E.g. cout << 3+2-3;
Processing by scanner: {cout} | space | {<<} | space | {3} {+} {2} {-} {3} {;}
Lexemes: cout, <<, 3, +, 2, -, 3, ;
Token
A token is a group of characters having a collective meaning.
Tokens are lexemes mapped into a token-name and an attribute-value.
Syntax: <token-name, attribute-value>   // the attribute value is optional
Tokens: <id1, cout> <operator1, <<> <number1, 3> <operator, +>
<number2, 2> <operator2, -> <number3, 3> <punctuator, ;>
Pattern
A pattern is a rule that describes how a token can be formed.
E.g.
identifier: ([a-z] | [A-Z]) ([a-z] | [A-Z] | [0-9])*
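As a small illustration (not from the slides), a hand-written C check of this identifier pattern could look like the sketch below; matches_identifier is an assumed name:

#include <ctype.h>
/* returns 1 if s matches ([a-z]|[A-Z])([a-z]|[A-Z]|[0-9])*, else 0 */
int matches_identifier(const char *s)
{
    if (!isalpha((unsigned char)*s))          /* first character must be a letter */
        return 0;
    for (s++; *s != '\0'; s++)
        if (!isalnum((unsigned char)*s))      /* the rest: letters or digits only */
            return 0;
    return 1;
}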

Lexical analyzer
The scanner analyses the source program by reading it character by character,
then grouping the characters into individual words and symbols (tokens).
It removes comment lines and whitespace (blank space, tab, newline, etc.).
Identifiers and values are stored in the symbol table.
Types of tokens: constant, identifier, keyword, operator and symbol.
•E.g.
const pi = 3.14159;
Token 1: (const, -)
Token 2: (identifier, 'pi')
Token 3: (=, -)
Token 4: (realnumber, 3.14159)
Token 5: (; , -)

Role of Lexical analyzer
[Diagram: the source program flows into the Lexical Analyzer, which returns a token each time the Parser calls getNextToken(); the Parser passes its result on to semantic analysis, and both phases consult the Symbol table.]
•If the LA identifies a lexeme constituting an identifier, the lexeme needs to be stored in the symbol table.
•Sometimes the type of an identifier may be read from the symbol table.
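As a rough illustration of the interface in the diagram (not from the slides), the parser pulls tokens from the LA one at a time; Token, getNextToken() and the token codes below are assumed names, and the scanner is stubbed out:

#include <stdio.h>

enum { TOK_EOF = 0, TOK_ID, TOK_NUMBER };
typedef struct { int name; int attribute; } Token;

/* stub scanner: a real one would read the source program character by character */
static Token getNextToken(void)
{
    static int calls = 0;
    Token t;
    t.name = (calls++ == 0) ? TOK_ID : TOK_EOF;
    t.attribute = 1;                 /* e.g. index of the identifier in the symbol table */
    return t;
}

int main(void)                       /* the parser would sit here, calling getNextToken() on demand */
{
    for (Token t = getNextToken(); t.name != TOK_EOF; t = getNextToken())
        printf("token %d, attribute %d\n", t.name, t.attribute);
    return 0;
}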

Why do we separate lexical analysis and parsing?
•The lexical analyzer need not be an individual phase.
•But having a separate phase simplifies the design and improves the compiler's efficiency and portability.
Example of tokens
Token        Informal description                      Sample lexemes
if           characters i, f                           if
else         characters e, l, s, e                     else
comparison   < or > or <= or >= or == or !=            <=, !=
id           letter followed by letters and digits     pi, score, D2
number       any numeric constant                      3.14159, 0, 6.02e23
literal      anything but " surrounded by "            "core dumped", "(", "{", ",", "}", "[", ";", etc.

Tokens, their patterns, and attribute values
ws – whitespace (\t, \n, blank space, etc.)
relop – relational operator

Example: Find the tokens & lexemes
1. printf("total = %d\n", score);
Token     Informal description                      Lexemes
id        letter followed by letters and digits     printf, score
literal   anything but " surrounded by "            "total = %d\n"
2. Find the tokens & lexemes for E = M*C^2

Attributes for tokens
•E = M * C ^2
<id, pointer to symbol table entry for E>
<assign-op>
<id, pointer to symbol table entry for M>
<mult-op>
<id, pointer to symbol table entry for C>
<exp-op>
<number, integer value 2>
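One possible C representation of these <token-name, attribute-value> pairs is sketched below; the type names (TokenName, SymEntry, Token) and fields are illustrative assumptions, not from the slides:

typedef enum { ID, ASSIGN_OP, MULT_OP, EXP_OP, NUMBER } TokenName;
typedef struct { char *lexeme; char *type; } SymEntry;   /* symbol-table entry */
typedef struct {
    TokenName name;
    union {
        SymEntry *entry;   /* for ID: pointer to the symbol-table entry */
        int value;         /* for NUMBER: the integer value, e.g. 2 */
    } attr;
} Token;

/* token stream for  E = M * C ^ 2  */
SymEntry E = {"E", "id"}, M = {"M", "id"}, C = {"C", "id"};
Token stream[] = {
    { ID, {.entry = &E} }, { ASSIGN_OP, {0} },
    { ID, {.entry = &M} }, { MULT_OP,  {0} },
    { ID, {.entry = &C} }, { EXP_OP,   {0} },
    { NUMBER, {.value = 2} },
};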

RE terminology:
Alphabet: a finite set of symbols.
E.g. ∑ = {a, b, c}
A string is a finite sequence of symbols over an alphabet ∑
(sometimes a string is also called a sentence or a word).
A language is a set of strings over an alphabet ∑.
Tokens are specified effectively by regular expressions.

Regular expressions
•Ɛ is a regular expression, L(Ɛ) = {Ɛ}
•If a is a symbol in ∑ then a is a regular expression, L(a) = {a}
•Operations on languages (a set):
•Union
(r) | (s) is a regular expression denoting the language L(r) ∪L(s)
•Concatenation
(r) . (s) is a regular expression denoting the language L(r) . L(s)
•Kleene closure
(r)* is a regular expression denoting (L(r))*, where
(L(r))* = L(r)^0 ∪ L(r)^1 ∪ L(r)^2 ∪ …  (union over all i ≥ 0)
•Positive closure
(r)+ is a regular expression denoting (L(r))+, where
(L(r))+ = L(r)^1 ∪ L(r)^2 ∪ L(r)^3 ∪ …  (union over all i ≥ 1)

Regular definition (Pattern)
•Regular expressions are used to represent the patterns for the tokens.
•The regular definition for C identifiers (letters, digits, and underscores):
letter → A | B | C | … | Z | a | b | … | z | _
digit → 0 | 1 | 2 | … | 9
id → letter ( letter | digit )*
•The regular definition for unsigned numbers (integer or floating point)
such as 5280, 0.01234, 6.336E4, or 1.89E-4:
digit → 0 | 1 | 2 | … | 9
digits → digit digit*
optionalFraction → . digits | Ɛ
optionalExponent → ( E ( + | - | Ɛ ) digits ) | Ɛ
number → digits optionalFraction optionalExponent
•More examples: integer constants, string constants, reserved words, operators,
real constants.
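A quick way to sanity-check the number definition above (a sketch, not from the slides) is to write it as a POSIX extended regular expression in C; the ^ and $ anchors force the whole string to match:

#include <regex.h>
#include <stdio.h>

int main(void)
{
    const char *pattern = "^[0-9]+(\\.[0-9]+)?(E[+-]?[0-9]+)?$";
    const char *samples[] = { "5280", "0.01234", "6.336E4", "1.89E-4", "abc" };
    regex_t re;

    regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB);
    for (int i = 0; i < (int)(sizeof samples / sizeof samples[0]); i++)
        printf("%-8s %s\n", samples[i],
               regexec(&re, samples[i], 0, NULL, 0) == 0 ? "number" : "no match");
    regfree(&re);
    return 0;
}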

RECOGNITION OF TOKENS
•Tokens are recognized by state transition diagrams.
•Recognition of identifiers
•Recognition of Whitespace (delimiters)
•Recognition of Relational operator
•Recognition of Keywords
•Recognition of Numbers (integer / float)

RECOGNITION OF TOKENS
Given the grammar of a branching statement (if-then-else with relational expressions),
the patterns for whitespace (ws), keywords & operators are:
ws → (blank | tab | newline)+
if → if
then → then
else → else
relop → < | > | <= | >= | = | <>

Transition diagrams
•Transition diagram for relop
[Diagram: on < the next character selects <=, <> or < (other, i.e. not >, =); on > it selects >= or > (other, i.e. not =); = is recognized directly. Accepting states reached on "other" are marked *, where * indicates input retraction (pull back).]
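A C sketch of how this relop diagram can be coded (the names and the use of ungetc() for retraction are illustrative, not from the slides):

#include <stdio.h>

typedef enum { LT, LE, NE, EQ, GT, GE, NO_RELOP } Relop;

Relop scan_relop(FILE *in)
{
    int c = fgetc(in);
    if (c == '<') {
        int d = fgetc(in);
        if (d == '=') return LE;             /* <=               */
        if (d == '>') return NE;             /* <>               */
        ungetc(d, in);                       /* * : retract, token is <  */
        return LT;
    }
    if (c == '>') {
        int d = fgetc(in);
        if (d == '=') return GE;             /* >=               */
        ungetc(d, in);                       /* * : retract, token is >  */
        return GT;
    }
    if (c == '=') return EQ;
    ungetc(c, in);                           /* not a relational operator */
    return NO_RELOP;
}

int main(void)                               /* e.g. echo "<=" | ./a.out  */
{
    printf("%d\n", scan_relop(stdin));
    return 0;
}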

Transition diagrams (cont.)
•Transition diagram for reserved words and identifiers
[Diagram: a letter, then a loop on letter or digit; the edge "other" (i.e. not a letter or digit) leads to an accepting state with input retraction.]
•Transition diagram for whitespace
[Diagram: a loop on delimiters; the edge "other" (i.e. not a delimiter) leads to an accepting state with input retraction.]
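The usual way to make the reserved-word/identifier diagram serve both purposes is to look the finished lexeme up in a table of reserved words first; a minimal C sketch (the names are assumed, not from the slides):

#include <string.h>

static const char *reserved[] = { "if", "then", "else" };

/* returns the keyword's index, or -1 if the lexeme is an ordinary identifier */
int lookup_reserved(const char *lexeme)
{
    for (int i = 0; i < (int)(sizeof reserved / sizeof reserved[0]); i++)
        if (strcmp(lexeme, reserved[i]) == 0)
            return i;
    return -1;
}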

Transition diagrams (cont.)
•Transition diagram for unsigned numbers
[Diagram: digits, an optional fraction, and an optional exponent; edges labelled "other" (i.e. not a digit) lead to accepting states with input retraction.]

Major Data Structures in a Compiler

Data Structure for Communication among Phases
•TOKENS
•A scanner collects characters into a token, as a value of an enumerated data type for tokens.
•It also preserves the string of characters or other derived information, such as the name of an identifier or the value of a number token.
•A single global variable or an array of tokens may be used.
•SYNTAX TREE
•A standard pointer-based structure generated by the parser.
•Each node represents information collected by the parser, which may be dynamically allocated or stored in the symbol table.
•The nodes require different attributes depending on the kind of language structure, which may be represented as a variant record.

Data Structure for Communication among Phases (cont.)
•SYMBOL TABLE
•Keeps information associated with identifiers: functions, variables, constants, and data types.
•Interacts with almost every phase of the compiler.
•Access operations need to be constant-time.
•One or several hash tables are often used.
•LITERAL TABLE
•Stores constants and strings, reducing the size of the program.
•Quick insertion and lookup are essential.
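The slides only say that hash tables are typically used; as an illustration, a minimal chained-hash symbol table in C might look like the sketch below (the layout and names are assumptions):

#include <stdlib.h>
#include <string.h>

#define NBUCKETS 211

typedef struct Sym {
    char *name;               /* identifier */
    const char *type;         /* e.g. "int", "float", "function" */
    struct Sym *next;         /* chaining for collisions */
} Sym;

static Sym *buckets[NBUCKETS];

static unsigned hash(const char *s)
{
    unsigned h = 0;
    while (*s) h = h * 31 + (unsigned char)*s++;
    return h % NBUCKETS;
}

Sym *symtab_lookup(const char *name)
{
    for (Sym *p = buckets[hash(name)]; p; p = p->next)
        if (strcmp(p->name, name) == 0) return p;
    return NULL;
}

Sym *symtab_insert(const char *name, const char *type)
{
    Sym *p = symtab_lookup(name);
    if (p) return p;                       /* already present: constant-time on average */
    p = malloc(sizeof *p);
    p->name = malloc(strlen(name) + 1);
    strcpy(p->name, name);
    p->type = type;
    unsigned h = hash(name);
    p->next = buckets[h];
    buckets[h] = p;
    return p;
}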

Data Structure for Communication among Phases (cont.)
•INTERMEDIATE CODE
•Kept as an array of text strings, a temporary text file, or a linked list of structures, depending on the kind of intermediate code (e.g. three-address code or p-code (Pascal code)).
•Should be easy to reorganize.
•TEMPORARY FILES
•Hold the products of intermediate steps during compiling.
•They address the problem of memory constraints and of back-patching addresses during code generation.

Bootstrapping in Compiler Design
•Bootstrapping is the process of translating a simple language into a more complicated program, which in turn may handle a more complicated program.
•This complicated program can further handle an even more complicated program, and so on.
•A full bootstrap is necessary when building a new compiler from scratch.
•Writing a compiler for any high-level language is a complicated process.
•It takes more time to write a compiler from scratch.
•Hence a simple language is used to generate the target code in some stages (phases).

Tombstone (T) Diagram
•A T-diagram contains a set of "puzzle pieces" that represent compilers and other related language-processing programs.
[Diagram pieces: program P implemented in L; machine M implemented in hardware; a language interpreter implemented in L; a translator from S to T implemented in L.]
S – language the compiler accepts as input
T – language output by the compiler
L – language the compiler is written in

Tombstone diagram: Combination rules
[Diagram: a program P implemented in M placed on machine M – ok; a program P implemented in L placed directly on machine M – wrong; a translator from S to T implemented in M placed on a machine other than M – wrong; a program P in S passed through an S-to-T translator implemented in M and running on machine M yields P in T – ok. The implementation language of a piece and the machine it runs on must be the same.]

Example: Bootstrapping in Compiler Design
•Write a cross compiler for a new language X. Let the implementation language of this compiler be Y and the target code being generated be in language Z. That is, we create XYZ (a compiler from X to Z written in Y).
•Now, if an existing compiler for Y runs on machine M and generates code for M, it is denoted as YMM.
[T-diagram: Compiler 1 is X → Z implemented in Y; the existing compiler is Y → M implemented in (and running on) M.]

Example: Bootstrapping in Compiler Design (cont.)
•Now, if we run XYZ using YMM, we get a compiler XMZ: a compiler for source language X that generates target code in language Z and runs on machine M.
[T-diagram: Compiler 1 (X → Z in Y) fed through Y → M on machine M yields Compiler 2, the cross compiler X → Z in M.]
•This cross compiler can be used for bootstrapping on machine M, but we do not want to rely on it permanently.

Bootstrap to improve efficiency
The efficiency of programs and compilers:
Efficiency of programs:
-memory usage
-runtime
Efficiency of compilers:
-Efficiency of the compiler itself
-Efficiency of the emitted code

Lexical Analyzer Generator -Lex
Lex is a language (tool) for specifying a lexical analyzer.
The LA is specified by means of regular expressions.
Regular expressions are used to represent the patterns for the tokens.

Create Lexical Analyzer Generator - Lex
Lex source program (lex.l) → Lexical compiler (Lex) → lex.yy.c
lex.yy.c → C compiler → a.out (object code)
Input stream → a.out → sequence of tokens
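On a typical Unix system this pipeline corresponds roughly to running lex lex.l (or flex lex.l) to produce lex.yy.c, and then cc lex.yy.c -ll to produce a.out; the exact command names depend on the tools installed.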

Structure of Lex programs
declarations
%%            (denotes the start of the translation rules)
translation rules        (each of the form  Pattern {Action})
%%            (denotes the end of the translation rules)
auxiliary functions
A Lex program contains 3 sections:
1. Declarations  2. Translation rules  3. Auxiliary functions

Lex : Example
1. Declaration section
i) Declare identifiers and constants:
%{
Declare identifiers, constants, and definitions of manifest constants (LT, LE, EQ, NE, GT, GE, IF, THEN, ELSE, ID, NUMBER, RELOP) here
%}
E.g.:
%{
int a, b;
float s;
%}

Declaration section (cont'd)
ii) Declare regular definitions (written outside the %{ … %} brackets)
/* regular definitions */
delim    [ \t\n]
ws       {delim}+
letter   [A-Za-z]
digit    [0-9]
id       {letter}({letter}|{digit})*
number   {digit}+(\.{digit}+)?(E[+-]?{digit}+)?
? denotes that the preceding item occurs 0 or 1 time

2. Translation rules section
•It starts with %% and ends with %%.
•Syntax: each rule is a pattern followed by an action
%%
Pattern1   {Action1}
Pattern2   {Action2}
………
Pattern n  {Action n}
%%
A pattern is a regular expression and an action is one or more C statements.
E.g.
%%
{ws}      {/* no action and no return */}
if        {return(IF);}
then      {return(THEN);}
else      {return(ELSE);}
{id}      {yylval = (int) installID(); return(ID);}
{number}  {yylval = (int) installNum(); return(NUMBER);}
%%
int installID()
{ /* function to install the lexeme, whose first character is pointed to by yytext and whose length is yyleng, into the symbol table, and return a pointer thereto */
}
int installNum()
{ /* similar to installID, but puts numerical constants into a separate table */
}
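As an illustration only (not from the slides), installID() could be fleshed out roughly as below, returning a table index rather than a pointer; yytext and yyleng are provided by the generated scanner, and string.h/stdlib.h would be included in the %{ … %} declarations:

#define MAXSYMS 1024
static char *symtab[MAXSYMS];
static int   nsyms = 0;

int installID(void)
{
    for (int i = 0; i < nsyms; i++)          /* lexeme already installed? */
        if (strcmp(symtab[i], yytext) == 0)
            return i;
    symtab[nsyms] = malloc(yyleng + 1);      /* copy the current lexeme */
    memcpy(symtab[nsyms], yytext, yyleng);
    symtab[nsyms][yyleng] = '\0';
    return nsyms++;
}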

3. Auxiliary function section
All required functions are defined in this section.
The example above uses two such functions, installID() and installNum(), already shown with the translation rules; they are repeated here for completeness.
E.g.
Some user-defined functions:
void add()
{
// statements
}
void mul()
{
// statements
}
int installID()
{ /* function to install the lexeme, whose first character is pointed to by yytext and whose length is yyleng, into the symbol table, and return a pointer thereto */
}
int installNum()
{ /* similar to installID, but puts numerical constants into a separate table */
}

Example - Lex program to recognize tokens:
Identify keywords, relational operators, and numbers
%{
/* no variable declarations */
%}
letter  [A-Za-z]
digit   [0-9]
id      {letter}({letter}|{digit})*
number  {digit}+(\.{digit}+)?(E[+-]?{digit}+)?
%%
if      {printf("%s is a keyword", yytext);}        // keyword rules must precede {id}, or "if"/"else" would match as identifiers
else    {printf("%s is a keyword", yytext);}
{id}    {printf("%s is an identifier", yytext);}    // yytext holds the matched lexeme; {} around a single action is optional
"<"     {printf("%s is the less-than operator", yytext);}

Lex program to recognize tokens (cont'd)
">"      {printf("%s is the greater-than operator", yytext);}
"<="     {printf("%s is the less-than-or-equal operator", yytext);}
{number} {printf("%s is a number", yytext);}
%%
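The slide stops after the second %%; to turn this into a complete, runnable Lex source file, an auxiliary section roughly like the following is typically appended (yywrap() and main() are standard boilerplate, not part of the slide):

int yywrap(void) { return 1; }     /* report end of input: no further files to scan */

int main(void)
{
    yylex();                       /* scan stdin and print the messages above */
    return 0;
}

It can then be built with, for example, lex tokens.l followed by cc lex.yy.c (tokens.l is a hypothetical file name).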

Reference
•A. V. Aho, M. S. Lam, R. Sethi, J. D. Ullman, Compilers: Principles, Techniques and Tools, Pearson Education, 2013.
•lex & yacc – John R. Levine, Tony Mason, Doug Brown, O'Reilly.