in the presentation we have introduced the compiler and how to its work, the phases of compiler
Size: 523.43 KB
Language: en
Added: Oct 23, 2025
Slides: 27 pages
Slide Content
Compiler
Construction
1
Compiler
A Compiler translate information from One
Language to Another, Say from English to Urdu or
similar.
As this course is related to Computers , so we will
keep in mind computer languages.
A compiler translates from High level language to
Machine Language.
OR
A translator that translate source program code
into target machine code.
E
n
g
r
.S
h
a
u
k
a
t
A
l
i
(
0
3
4
5
-
9
5
3
0
4
6
8
)
2
Compiler
If the target program is an executable machine-
language program, it can then be called by the
user to process inputs and produce outputs
3
COMPILER
Source
Program
Target
Program
TAGET PROGRAMINPUT OUTPUT
Phases of Compiler
A Compiler Basically operates into six Phases
Each Phase is a Logical Activity that transform the
source Program from one representation into another.
4
Phases of a Compiler 5
Source Program
Lexical Analyzer
1
Syntax Analyzer
2
Semantic Analyzer
3
Intermediate Code
Generator
4
Code Optimizer
5
Code Generator
6
Target Program
Symbol-table
Manager
Error Handler
1. Lexical Analysis
It is the first phase of compiler
It is also called linear analyzer or scanner
It takes the source code program as a
stream of characters and reads it from
Left-to-Right and broken them into Tokens
Token: It is sequence of characters
having collective meanings (words in
natural language)
6
Token VS Lexeme
LEXEME: A lexeme is a sequence of characters in the
source program that matches the pattern for a
token. It is the raw text from the code that has been
identified as a single logical unit.
E.g sum , result, person etc
TOKEN: A token is a pair, typically consisting of a
token name (or type) and an optional attribute
value. The token name is an abstract symbol
representing a kind of lexical unit (e.g., keyword,
identifier, operator).
7
1. Lexical Analysis 8
Position := initial + rate * 60 ;______ __ ____ _ ___ _ __ _
id1 := Id2 + Id3 * 60
All are Lexemes Which are
converted to tokens and then passed
to next phase
2. Syntax Analyzer
It is the 2nd phase of compiler
It is also called Parser or Hierarchical Analyzer
It takes the stream of tokens as input and generate
parse tree
9
2. Syntax Analyzer
The hierarchical structure of a program is usually expressed by
recursive rules
Example:
Following are the rules for the definition of expression
1.An identifier is an expression
2.Any number is an expression
3.If expression
1
and expression
2
are expressions then so are :
Expression
1 + expression
2
Expression
1 * expression
2
(Expression
1
)
Thus by rule 1 initial and rate are expressions. By rule 2 60 is an
expression while by rule 3 we can first infer that rate * 60 is an
expression and finally that initial + rate * 60 is an expression.
10
2. Syntax Analyzer 11
identifier
identifier
expression
identifier
expression
number
expression
expression
expression
Assignment
Statement
position
:=
+
*
60
initial
rate
Position := initial + rate * 60
3. Semantic Analyzer
It is the 3rd phase of the compiler
This phase checks the source program for semantic errors
also called meanings. i.e. the operation applied in the source
program are meaningful or not
It uses the hierarchical structure determined by the syntax
analysis phase to identify the operators and operands of
expressions and statements
Its main function is the type checking i.e. Here compiler
checks that each operator has operands that are permitted
by the source language specification
12
3. Semantic Analyzer
Example:
Many programming languages definitions require a compiler to
report an error every time a real number is used to index an array
However, the language specification may permit some operand
coercions. e.g.
When a binary arithmetic operator is applied to an integer and real. In
this case the compiler may need to convert the integer to a real.
E
n
g
r
.S
h
a
u
k
a
t
A
l
i
(
0
3
4
5
-
9
5
3
0
4
6
8
)
13
3. Semantic Analyzer
Example:
Inside a machine the bit pattern representing an integer is
generally different from the bit pattern for a real. Even if the
integer and the real number happen to have the same value
Suppose that all identifiers in figure have been declared to
be real and that 60 by itself is assumed to be an integer.
Type checking of Figure1 show that * is applied to a real
rate and an integer 60
The general approach is to convert the integer to a real. This
has been achieved in Figure2 by creating an extra node for
the operator inttoreal that explicitly converts an integer into a
real
E
n
g
r
.S
h
a
u
k
a
t
A
l
i
(
0
3
4
5
-
9
5
3
0
4
6
8
)
14
3. Semantic Analyzer
E
n
g
r
.S
h
a
u
k
a
t
A
l
i
(
0
3
4
5
-
9
5
3
0
4
6
8
)
15
position
initial
rate
:=
+
*
60
Compressed Tree
Figure 1
position
initial
rate
:=
+
*
inttoreal
60
Conversion Action
Figure 2
4. Intermediate Code Generator
ICG is a program for an Abstract Machine
This code should have two properties
It should be easy to generate
It should be easy to translate into the target program
It can be represented in many form. One of them is called “Three
Address Code” which means that every statement of intermediate
Code can have:
At most Three Operands (Variables) and
At most One Operator in addition to Assignment Operator
It also checks the precedence of Operator
ICG generates temporary variables to hold the result of each
operation
17
4. Intermediate Code Generator
For the parse tree given in Semantic Analysis phase the
intermediate code can be generated as:
temp1 := inttoreal(60)
temp2 := id3 * temp1
temp3 := id2 + temp2
id1 := temp3
18
3 address code
5. Code Optimizer
The Code Optimizer improve the intermediate code, so that
fast running machine code is achieved
The intermediate code given below:
temp1 := inttoreal(60)
temp2 := id3 * temp1
temp3 := id2 + temp2
id1 := temp3
Can be optimized by Code Optimizer as:
temp1 =: id3 * 60.0
id1 =: id2 + temp1
19
6. Code Generation
It is the final phase of compiler which generate either
re-locatable machine code or assembly language code
The next job of the code generator is to select memory
locations for each and every variable used by the
program
Then intermediate instructions are each translated into
a sequence of machine instructions that perform the
same task
For example using register 1 and 2 the translation of
the optimized code into respective code is given below:
20
6. Code Generation
MOVF id3, R2
MULF #60.0, R2
MOVF id2, R1
ADDF R2 , R1
MOVF R1 , id1
The first operand of each instruction specifies a source and
second operand specifies destination. The F in each instruction
tells us that it deals with floating-point numbers.
21
6. Code Generation
The code loads the contents of address id3 into register R2, then
multiplies it with floating-point constant 60.0. The # signifies that
60.0 is to be treated as an immediate constant. The third
instruction moves id2 into register R1 and the fourth adds to it the
value previously computed in register R2. Finally, the value in
register R1 is stored into the address of id1 , so the code correctly
implements the assignment statement
22
Reviewing the Entire Process23
E
R
R
O
R
S
position := initial + rate * 60
Lexical Analyzer
Syntax Analyzer
Semantic Analyzer
Intermediate Code Generator
id1 := id2 + id3 * 60
:=
id1
id2
id3
+
*
60
:=
id1
id2
id3
+
*
inttoreal
60
Symbol Table
position ....
initial ….
rate….
Reviewing the Entire Process24
E
R
R
O
R
S
Intermediate Code Generator
Code Optimizer
Code Generator
temp1 := inttoreal(60)
temp2 := id3 * temp1
temp3 := id2 + temp2
id1 := temp3
temp1 := id3 * 60.0
id1 := id2 + temp1
position ....
initial ….
rate….
Symbol Table
3 Address Code
Symbol Table
This is the informal phase of compiler.
It is a data structure, containing information about identifiers
(variables) used in the source program in the form of various
attributes.
These attribute provide the information about :
Storage allocated
Type
Scope (visibility)
Procedure names (Function names).
In case of procedure names the attribute may be:
Arguments and return type of functions
The names, types and number of arguments.
The methods of passing arguments. i.e. either by
reference or by value
25
Error Handler
Error
An Error is an abnormal condition in the source program which
either stops the compilation or results in undesired output.
Error Handler:
The two basic task of compiler are
Error Detection
Error Recovery
During the compilation process each phase may encounter
errors so error handler is used to handle these errors
Error handler contains error handling routines
26