CS540-2-lecture1.pptgvcxc increment cpp cpp

compengwaelalahmar 7 views 62 slides Jul 19, 2024
Slide 1
Slide 1 of 62
Slide 1
1
Slide 2
2
Slide 3
3
Slide 4
4
Slide 5
5
Slide 6
6
Slide 7
7
Slide 8
8
Slide 9
9
Slide 10
10
Slide 11
11
Slide 12
12
Slide 13
13
Slide 14
14
Slide 15
15
Slide 16
16
Slide 17
17
Slide 18
18
Slide 19
19
Slide 20
20
Slide 21
21
Slide 22
22
Slide 23
23
Slide 24
24
Slide 25
25
Slide 26
26
Slide 27
27
Slide 28
28
Slide 29
29
Slide 30
30
Slide 31
31
Slide 32
32
Slide 33
33
Slide 34
34
Slide 35
35
Slide 36
36
Slide 37
37
Slide 38
38
Slide 39
39
Slide 40
40
Slide 41
41
Slide 42
42
Slide 43
43
Slide 44
44
Slide 45
45
Slide 46
46
Slide 47
47
Slide 48
48
Slide 49
49
Slide 50
50
Slide 51
51
Slide 52
52
Slide 53
53
Slide 54
54
Slide 55
55
Slide 56
56
Slide 57
57
Slide 58
58
Slide 59
59
Slide 60
60
Slide 61
61
Slide 62
62

About This Presentation

How to compile


Slide Content

CS 540 Spring 2013

CS 540 Spring 2013 GMU 2
The Course covers:
•Lexical Analysis
•Syntax Analysis
•Semantic Analysis
•Runtime environments
•Code Generation
•Code Optimization

CS 540 Spring 2013 GMU 3
Pre-requisite courses
•Strongprogramming background in C, C++
or Java –CS 310
•Formal Language (NFAs, DFAs, CFG) –
CS 330
•Assembly Language Programming and
Machine Architecture –CS 367

CS 540 Spring 2013 GMU 4
Operational Information
•Office: Engineering Building, Rm. 5315
•E-mail: [email protected]
•Class Web Page: Blackboard
•Discussion board: Piazza
•Computer Accounts on zeus.vse.gmu.edu (link on
‘Useful Links’)

CS 540 Spring 2013 GMU 5
CS 540 Course Grading
•Programming Assignments (45%)
–5% + 10% + 10% + 20%
•Exams –midterm and final (25%, 30%)

CS 540 Spring 2013 GMU 6
Resources
•Textbooks:
–Compilers: Principles, Techniques and Tools,
Aho, Lam, Sethi & Ullman, 2007 (required)
–lex & yacc, Levine et. al.
•Slides
•Sample code for Lex/YACC (C, C++, Java)

CS 540 Spring 2013 GMU 7
Distance Education
•CS 540 Spring ‘13 session is delivered to the
Internet section (Section 540-DL) online by
NEW
•Students in distance section will access to online
lectures and can play back the lectures and
download the PDF slide files
•The distance education students will be given the
midterm and final exam on campus, on the same
day/time as in class students. Exam locations
will be announced closer to the exam dates.

Lecture 1: Introduction to
Language Processing & Lexical
Analysis
CS 540

CS 540 Spring 2013 GMU 9
What is a compiler?
A program that reads a program written in one
language and translates it into another
language.
Source language Target language
Traditionally, compilers go from high-level
languages to low-level languages.

CS 540 Spring 2013 GMU 10
Compiler Architecture
Front End –
language specific
Back End –
machine specific
Source
Language
Target Language
Intermediate
Language
In more detail:
•Separation of Concerns
•Retargeting

CS 540 Spring 2013 GMU 11
Compiler Architecture
Scanner
(lexical
analysis)
Parser
(syntax
analysis)
Code
Optimizer
Semantic
Analysis
(IC generator)
Code
Generator
Symbol
Table
Source
language
tokens
Syntactic
structure
Intermediate
Language
Target
language
Intermediate
Language

CS 540 Spring 2013 GMU 12
Lexical Analysis -Scanning
Scanner
(lexical
analysis)
Parser
(syntax
analysis)
Code
Optimizer
Semantic
Analysis
(IC generator)
Code
Generator
Symbol
Table
•Tokens described formally
•Breaks input into tokens
•Remove white space
Source
language
tokens

CS 540 Spring 2013 GMU 13
Input: result = a + b * c / d
•Tokens:
‘result’, ‘=‘, ‘a’, ‘+’, ‘b’, ‘*’, ‘c’, ‘/’, ‘d’
identifiers
operators

CS 540 Spring 2013 GMU 14
Static Analysis -Parsing
Scanner
(lexical
analysis)
Parser
(syntax
analysis)
Code
Optimizer
Semantic
Analysis
(IC generator)
Code
Generator
Symbol
Table
Source
language
tokens Syntactic
structure
Target
language
•Syntax described formally
•Tokens organized into syntax tree
that describes structure
•Error checking (syntax)

CS 540 Spring 2013 GMU 15
Exp ::= Exp ‘+’ Exp
| Exp ‘-’ Exp
| Exp ‘*’ Exp
| Exp ‘/’ Exp
| ID
Assign ::= ID ‘=‘ Exp
Assign
ID ‘=‘ Exp
Exp ‘+’Exp
Exp ‘*’Exp
Exp ‘/’Exp
ID
ID
ID ID
Input: result = a + b * c / d

CS 540 Spring 2013 GMU 16
Semantic Analysis
Scanner
(lexical
analysis)
Parser
(syntax
analysis)
Code
Optimizer
Semantic
Analysis
(IC generator)
Code
Generator
Symbol
Table
Source
language
Syntactic
structure
Syntactic/semantic
structure
Target
language
•“Meaning”
•Type/Error Checking
•Intermediate Code Generation –
abstract machine
Syntactic/semantic
structure

CS 540 Spring 2013 GMU 17
Optimization
Scanner
(lexical
analysis)
Parser
(syntax
analysis)
Code
Optimizer
Semantic
Analysis
(IC generator)
Code
Generator
Symbol
Table
Source
language
Syntactic/semantic
structure
Target
language
•Improving efficiency (machine
independent)
•Finding optimal code is NP
Syntactic/semantic
structure

CS 540 Spring 2013 GMU 18
Code Generation
Scanner
(lexical
analysis)
Parser
(syntax
analysis)
Code
Optimizer
Semantic
Analysis
(IC generator)
Code
Generator
Symbol
Table
Source
language
Syntactic/semantic
structure
Target
language
•IC to real machine code
•Memory management, register
allocation, instruction selection,
instruction scheduling, …
Syntactic/semantic
structure

CS 540 Spring 2013 GMU 19
Issues Driving Compiler Design
•Correctness
•Speed (runtime and compile time)
–Degrees of optimization
–Multiple passes
•Space
•Feedback to user
•Debugging

CS 540 Spring 2013 GMU 20
Related to Compilers
•Interpreters (direct execution)
•Assemblers
•Preprocessors
•Text formatters (non-WYSIWYG)
•Analysis tools

CS 540 Spring 2013 GMU 21
Why study compilers?
•Bring together:
–Data structures & Algorithms
–Formal Languages
–Computer Architecture
•Influence:
–Language Design
–Architecture (influence is bi-directional)
•Techniques used influence other areas (program
analysis, testing, …)

CS 540 Spring 2013 GMU 22
Review of Formal Languages
•Regular expressions, NFA, DFA
•Translating between formalisms
•Using these formalisms

CS 540 Spring 2013 GMU 23
What is a language?
•Alphabet–finite character set (S)
•String–finite sequence of characters –can
be e, the empty string (Some texts use l as
the empty string)
•Language–possibly infinite set of strings
over some alphabet –can be { }, the empty
language.

CS 540 Spring 2013 GMU 24
Suppose S = {a,b,c}. Some
languages over S could be:
•{aa,ab,ac,bb,bc,cc}
•{ab,abc,abcc,abccc,. . .}
•{ e }
•{ }
•{a,b,c,e}
•…

CS 540 Spring 2013 GMU 25
Why do we care about Regular
Languages?
•Formally describe tokens in the language
–Regular Expressions
–NFA
–DFA
•Regular Expressions finite automata
•Tools assist in the process

CS 540 Spring 2013 GMU 26
Regular Expressions
The regular expressionsover finite Sare the
strings over the alphabet S+{ ), (, |, * }
such that:
1.{ } (empty set) is a regular expression for the
empty set
2.e is a regular expression denoting { e}
3.ais a regular expression denoting set { a} for
any ain S

CS 540 Spring 2013 GMU 27
Regular Expressions
4. If P and Q are regular expressions over S, then so are:
•P | Q (union)
If P denotes the set {a,…,e}, Q denotes the set {0,…,9} then
P | Q denotes the set {a,…,e,0,…,9}
•PQ (concatenation)
If P denotes the set {a,…,e}, Q denotes the set {0,…,9} then
PQ denotes the set {a0,…,e0,a1,…,e9}
•Q* (closure)
If Q denotes the set {0,…,9} then Q* denotes the set
{e,0,…,9,00,…99,…}

CS 540 Spring 2013 GMU 28
Examples
If S= {a,b}
•(a | b)(a | b)
•(a | b)*b
•a*b*a*
•a*a (also known as a+)
•(ab*)|(a*b)

CS 540 Spring 2013 GMU 29
Nondeterministic Finite Automata
A nondeterministic finite automaton(NFA) is a
mathematical model that consists of
1.A set of states S
2.A set of input symbols S
3.A transition function that maps state/symbol pairs to a
set of states:
S x {S+ e} set of S
4.A special state s
0called the start state
5.A set of states F (subset of S) of final states
INPUT: string
OUTPUT: yes or no

CS 540 Spring 2013 GMU 30
STATE
a b e
00,30 1
1 2
2 3
3
Transition Table:
0 1 2 3
a,b
a
b b
e
S = {0,1,2,3}
S
0= 0
S= {a,b}
F = {3}
Example NFA

CS 540 Spring 2013 GMU 31
NFA Execution
An NFA says ‘yes’ for an input string if there is some path from
the start state to some final state where all input has been
processed.
NFA(int s0, int input) {
if (all input processed && s
0is a final state) return Yes;
if (all input processed && s
0not a final state) return No;
for all states s
1where transition(s
0,table[input]) = s
1
if (NFA(s
1,input_element+1) == Yes) return Yes;
for all states s
1where transition(s
0,e) = s
1
if (NFA(s
1,input_element) == Yes) return Yes;
return No;
}
Uses backtracking to search
all possible paths

CS 540 Spring 2013 GMU 32
Deterministic Finite Automata
A deterministic finite automaton(DFA) is a mathematical
model that consists of
1.A set of states S
2.A set of input symbols S
3.A transition function that maps state/symbol pairs to a
state:
S x SS
4.A special state s
0called the start state
5.A set of states F (subset of S) of final states
INPUT: string
OUTPUT: yes or no

CS 540 Spring 2013 GMU 33
DFA Execution
DFA(int start_state) {
state current = start_state;
input_element = next_token();
while (input to be processed) {
current =
transition(current,table[input_element])
if current is an error state return No;
input_element = next_token();
}
if current is a final state return Yes;
else return No;
}

CS 540 Spring 2013 GMU 34
Regular Languages
1.There is an algorithm for converting any RE into
an NFA.
2.There is an algorithm for converting any NFA to
a DFA.
3.There is an algorithm for converting any DFA to
a RE.
These facts tell us that REs, NFAs and DFAs have
equivalent expressive power. All three describe
the class of regular languages.

CS 540 Spring 2013 GMU 35
Converting Regular
Expressions to NFAs
The regular expressionsover finite Sare the strings
over the alphabet S+ { ), (, |, *} such that:
•{ } (empty set) is a regular expression for the empty set
•Empty string e is a regular expression denoting { e}
•ais a regular expression denoting {a} for any ain S
e
a

CS 540 Spring 2013 GMU 36
Converting Regular
Expressions to NFAs
If P and Q are regular expressions with NFAsN
p, N
q:
P | Q (union)
PQ (concatenation)
N
p
N
q
N
qN
p
e
e e
e
e
ee

CS 540 Spring 2013 GMU 37
Converting Regular
Expressions to NFAs
If Q is a regular expression with NFA N
q:
Q* (closure)
N
q
e e
e
e

CS 540 Spring 2013 GMU 38
Example (ab* | a*b)*
ab* 1
43
2
a*b
a
b
b a
Starting with:
1 2
a
43
5 6
ab* | a*b
e
e e
e
a
b
b

CS 540 Spring 2013 GMU 39
Example (ab* | a*b)*
1 2
a
43
5 6
1 2
a
43
5 6
ab* | a*b
(ab* | a*b)*
7 8
e
e
e
e
e
e
e e
e
e
e
e
a
b
b
b
b
a

CS 540 Spring 2013 GMU 40
Converting NFAs to DFAs
•Idea: Each state in the new DFA will correspond
to some set of states from the NFA. The DFA will
be in state {s
0,s
1,…} after input if the NFA could
be in anyof these states for the same input.
•Input: NFA N with state set S
N, alphabet S, start state s
N,
final states F
N, transition function T
N: S
Nx S+ {e} set
of S
N
•Output: DFA D with state set S
D, alphabet S, start state
s
D= e-closure(s
N), final states F
D, transition function
T
D: S
Dx SS
D

CS 540 Spring 2013 GMU 41
e-closure()
Defn: e-closure(T) = T + all NFA states reachable from
any state in T using only etransitions.
1
5
2
43
b
e
a
e
b
b
a
e-closure({1,2,5}) = {1,2,5}
e-closure({4}) = {1,4}
e-closure({3}) = {1,3,4}
e-closure({3,5}) = {1,3,4,5}

CS 540 Spring 2013 GMU 42
Algorithm: Subset Construction
s
D= e-closure(s
N) --create start state for DFA
S
D= {s
D} (unmarked)
while there is some unmarked state Rin S
D
mark state R
for all ain Sdo
s = e-closure(T
N(R,a));
if s not already in S
Dthen add it (unmarked)
T
D(R,a) = s;
end for
end while
F
D= any element of S
Dthat contains a state in F
N

CS 540 Spring 2013 GMU 43
Example 1: Subset Construction
1
5
2
43
e
b
a
b
a,b
a,b
NFA

CS 540 Spring 2013 GMU 44
Example 1: Subset Construction
1
5
2
43
e
b
a
b
a,b
a,b
NFA
1,2
a b
{1,2}

CS 540 Spring 2013 GMU 45
Example 1: Subset Construction
1
5
2
43
e
b
a
b
a,b
a,b
NFA
1,2 4,5
3,5
b
a
a b
{1,2}{3,5} {4,5}
{3,5}
{4,5}

CS 540 Spring 2013 GMU 46
Example 1: Subset Construction
1
5
2
43
e
b
a
b
a,b
a,b
NFA
1,2 4,5
3,5 4
b
a
b
a b
{1,2}{3,5} {4,5}
{3,5}- {4}
{4,5}
{4}

CS 540 Spring 2013 GMU 47
Example 1: Subset Construction
1
5
2
43
e
b
a
b
a,b
a,b
NFA
1,2 4,5 5
3,5 4
a,bb
a
b
a b
{1,2}{3,5} {4,5}
{3,5}- {4}
{4,5}{5} {5}
{4}
{5}

CS 540 Spring 2013 GMU 48
Example 1: Subset Construction
1
5
2
43
e
b
a
b
a,b
a,b
NFA
1,2 4,5 5
3,5 4
a,b
a,b
b
a
b
a b
{1,2}{3,5} {4,5}
{3,5}- {4}
{4,5}{5} {5}
{4}{5} {5}
{5}- -
All final states since the
NFA final state is included

CS 540 Spring 2013 GMU 49
Example 2: Subset Construction
1
5
2
43
b
e
a
b
b
a
NFA
e

CS 540 Spring 2013 GMU 50
Example 2: Subset Construction
1
5
2
43
b
e
a
b
b
a
1
1,3,4
2
1,4,5
a
b
b
NFA DFA
e
1,3,4,5
a
a
a
b
b
b

CS 540 Spring 2013 GMU 51
Example 3: Subset Construction
1 2
54
e
b
e
1,2,4
NFA DFA
a
b
a 5
3,4 3,5
4
3
3
b
b
a
b
a
b
b
a

CS 540 Spring 2013 GMU 52
Converting DFAs to REs
1.Combine serial links by concatenation
2.Combine parallel links by alternation
3.Remove self-loops by Kleene closure
4.Select a node (other than initial or final) for
removal. Replace it with a set of equivalent
links whose path expressions correspond to the
in and out links
5.Repeat steps 1-4 until the graph consists of a
single link between the entry and exit nodes.

CS 540 Spring 2013 GMU 53
Example
1 2 3 4 5
6 7
d
a
b
c
d
0
d a
d
b
b
c
1 2 3 4 5
6 7
d a|b|c d
0
d a
d
b
b|c
parallel edges become alternation

CS 540 Spring 2013 GMU 54
Example
3 4 5
d (a|b|c) d d
0
a
b (b|c) d
1 2 3 4 5
6 7
d a|b|c d
0
d a
d
b
b|c
serial edges become concatenation

CS 540 Spring 2013 GMU 55
Example
3 4 5
d (a|b|c) d d
0
a
b (b|c) d
3 4 5
d (a|b|c) d d
0
a
b(b|c)da
Find paths that can be “shortened”

CS 540 Spring 2013 GMU 56
Example
3 4 5
d (a|b|c) d d
0
a
b(b|c)da
3 4 5
d (a|b|c) d
(b(b|c)da)*d
0
a
5
d (a|b|c) d(b(b|c)da)*d
0
a
eliminate self-loops
serial edges become concatenation

CS 540 Spring 2013 GMU 57
Describing Regular Languages
•Generate allstrings in the language
•Generate onlystrings in the language
Try the following:
–Strings of {a,b} that end with ‘abb’
–Strings of {a,b} that don’t end with ‘abb’
–Strings of {a,b} where every ais followed by at
least one b

CS 540 Spring 2013 GMU 58
Strings of (a|b)* that end in abb
re: (a|b)*abb
0 1 2 3
a b b
b a
a
b
a
0 1 2 3
a b b
a,b
NFA
DFA

CS 540 Spring 2013 GMU 59
Strings of (a|b)* that don’t end in
abb
re: ??
0 1 2 3
a b b
b a
a
b
a
DFA/NFA

CS 540 Spring 2013 GMU 60
Strings of (a|b)* that don’t end in
abb
0 1 2 3
a b b
b a
a
b
a
0 1 2 3
b*a a*b b
a
b
a
0 1 2
b*a a*b
a
bb
ba
0 1 2
b*a a*b
a*bba |
a*ba
a*bbb

CS 540 Spring 2013 GMU 61
Suggestions for writing
NFA/DFA/RE
•Typically, one of these formalisms is more
natural for the problem. Start with that and
convert if necessary.
•In NFA/DFAs, each state typically captures
some partial solution
•Be sure that you include all relevant edges
(ask –does every state have an outgoing
transition for all alphabet symbols?)

CS 540 Spring 2013 GMU 62
Non-Regular Languages
Not all languages are regular”
•The language ww where w=(a|b)*
Non-regular languages cannot be described
using REs, NFAs and DFAs.
Tags