predatory parser left factorial 4-CFGs.ppt

MadhuCK2 6 views 81 slides Mar 03, 2025
Slide 1
Slide 1 of 81
Slide 1
1
Slide 2
2
Slide 3
3
Slide 4
4
Slide 5
5
Slide 6
6
Slide 7
7
Slide 8
8
Slide 9
9
Slide 10
10
Slide 11
11
Slide 12
12
Slide 13
13
Slide 14
14
Slide 15
15
Slide 16
16
Slide 17
17
Slide 18
18
Slide 19
19
Slide 20
20
Slide 21
21
Slide 22
22
Slide 23
23
Slide 24
24
Slide 25
25
Slide 26
26
Slide 27
27
Slide 28
28
Slide 29
29
Slide 30
30
Slide 31
31
Slide 32
32
Slide 33
33
Slide 34
34
Slide 35
35
Slide 36
36
Slide 37
37
Slide 38
38
Slide 39
39
Slide 40
40
Slide 41
41
Slide 42
42
Slide 43
43
Slide 44
44
Slide 45
45
Slide 46
46
Slide 47
47
Slide 48
48
Slide 49
49
Slide 50
50
Slide 51
51
Slide 52
52
Slide 53
53
Slide 54
54
Slide 55
55
Slide 56
56
Slide 57
57
Slide 58
58
Slide 59
59
Slide 60
60
Slide 61
61
Slide 62
62
Slide 63
63
Slide 64
64
Slide 65
65
Slide 66
66
Slide 67
67
Slide 68
68
Slide 69
69
Slide 70
70
Slide 71
71
Slide 72
72
Slide 73
73
Slide 74
74
Slide 75
75
Slide 76
76
Slide 77
77
Slide 78
78
Slide 79
79
Slide 80
80
Slide 81
81

About This Presentation

predatory parser left factorial


Slide Content

COS 320
Compilers
David Walker

The Front End
•Lexical Analysis: Create sequence of tokens from
characters (Chap 2)
•Parsing: Create abstract syntax tree from
sequence of tokens (Chap 3)
•Type Checking: Check program for well-
formedness constraints
Lexer Parser
stream of
characters
stream of
tokens
abstract
syntax
Type
Checker

Parsing with CFGs
•Context-free grammars are (often) given by BNF
expressions (Backus-Naur Form)
–Appel Chap 3.1
•More powerful than regular expressions
–Matching parens
•wait, we could do nested comments with ML-LEX!
•still can’t describe PL structure effectively in ML-LEX
•CFGs are good for describing the overall
syntactic structure of programs.

Context-Free Grammars
•Context-free grammars consist of:
–Set of symbols:
•terminals that denotes token types
•non-terminals that denotes a set of strings
–Start symbol
–Rules:
•left-hand side: non-terminal
•right-hand side: terminals and/or non-terminals
•rules explain how to rewrite non-terminals (beginning with
start symbol) into terminals
symbol ::= symbol symbol ... symbol

Context-Free Grammars
A string is in the language of the CFG if and only if it is
possible to derive that string using the following
non-deterministic procedure:
1.begin with the start symbol
2.while any non-terminals exist, pick a non-terminal and rewrite it
using a rule <== could be many choices here
3.stop when all you have left are terminals (and check you arrived
at the string your were hoping to)
Parsing is the process of checking that a string is in the
CFG for your programming language. It is usually
coupled with creating an abstract syntax tree.

non-terminals: S, E, Elist
terminals: ID, NUM, PRINT, +, :=, (, ), ;
rules:
S ::= S; S
S ::= ID := E
S ::= PRINT ( Elist )
E ::= ID
E ::= NUM
E ::= E + E
E ::= ( S , Elist )
Elist ::= E
Elist ::= Elist , E

non-terminals: S, E, Elist
terminals: ID, NUM, PRINT, +, :=, (, ), ;
rules:
1. S ::= S; S
2. S ::= ID := E
3. S ::= PRINT ( Elist )
4. E ::= ID
5. E ::= NUM
6. E ::= E + E
7. E ::= ( S , Elist )
8. Elist ::= E
9. Elist ::= Elist , E
ID = NUM ; PRINT ( NUM )
Derive me!

non-terminals: S, E, Elist
terminals: ID, NUM, PRINT, +, :=, (, ), ;
rules:
1. S ::= S; S
2. S ::= ID := E
3. S ::= PRINT ( Elist )
4. E ::= ID
5. E ::= NUM
6. E ::= E + E
7. E ::= ( S , Elist )
8. Elist ::= E
9. Elist ::= Elist , E
S

ID = NUM ; PRINT ( NUM )
Derive me!

non-terminals: S, E, Elist
terminals: ID, NUM, PRINT, +, :=, (, ), ;
rules:
1. S ::= S; S
2. S ::= ID := E
3. S ::= PRINT ( Elist )
4. E ::= ID
5. E ::= NUM
6. E ::= E + E
7. E ::= ( S , Elist )
8. Elist ::= E
9. Elist ::= Elist , E
S
ID = E
ID = NUM ; PRINT ( NUM )
Derive me!

non-terminals: S, E, Elist
terminals: ID, NUM, PRINT, +, :=, (, ), ;
rules:
1. S ::= S; S
2. S ::= ID := E
3. S ::= PRINT ( Elist )
4. E ::= ID
5. E ::= NUM
6. E ::= E + E
7. E ::= ( S , Elist )
8. Elist ::= E
9. Elist ::= Elist , E
S
ID = E
ID = NUM ; PRINT ( NUM )
Derive me!
oops,
can’t make
progress
E

non-terminals: S, E, Elist
terminals: ID, NUM, PRINT, +, :=, (, ), ;
rules:
1. S ::= S; S
2. S ::= ID := E
3. S ::= PRINT ( Elist )
4. E ::= ID
5. E ::= NUM
6. E ::= E + E
7. E ::= ( S , Elist )
8. Elist ::= E
9. Elist ::= Elist , E
S
ID = NUM ; PRINT ( NUM )
Derive me!

non-terminals: S, E, Elist
terminals: ID, NUM, PRINT, +, :=, (, ), ;
rules:
1. S ::= S; S
2. S ::= ID := E
3. S ::= PRINT ( Elist )
4. E ::= ID
5. E ::= NUM
6. E ::= E + E
7. E ::= ( S , Elist )
8. Elist ::= E
9. Elist ::= Elist , E
S
S ; S
ID = NUM ; PRINT ( NUM )
Derive me!

non-terminals: S, E, Elist
terminals: ID, NUM, PRINT, +, :=, (, ), ;
rules:
1. S ::= S; S
2. S ::= ID := E
3. S ::= PRINT ( Elist )
4. E ::= ID
5. E ::= NUM
6. E ::= E + E
7. E ::= ( S , Elist )
8. Elist ::= E
9. Elist ::= Elist , E
S
S ; S
ID := E ; S
ID = NUM ; PRINT ( NUM )
Derive me!

non-terminals: S, E, Elist
terminals: ID, NUM, PRINT, +, :=, (, ), ;
rules:
1. S ::= S; S
2. S ::= ID := E
3. S ::= PRINT ( Elist )
4. E ::= ID
5. E ::= NUM
6. E ::= E + E
7. E ::= ( S , Elist )
8. Elist ::= E
9. Elist ::= Elist , E
S
S ; S
ID = E ; S
ID = NUM ; S
ID = NUM ; PRINT ( Elist )
ID = NUM ; PRINT ( E )
ID = NUM ; PRINT ( NUM )
Derive me!

non-terminals: S, E, Elist
terminals: ID, NUM, PRINT, +, :=, (, ), ;
rules:
1. S ::= S; S
2. S ::= ID := E
3. S ::= PRINT ( Elist )
4. E ::= ID
5. E ::= NUM
6. E ::= E + E
7. E ::= ( S , Elist )
8. Elist ::= E
9. Elist ::= Elist , E
S
S ; S
ID = E ; S
ID = NUM ; S
ID = NUM ; PRINT ( Elist )
ID = NUM ; PRINT ( E )
ID = NUM ; PRINT ( NUM )
Another way to
derive the
same string
S
S ; S
S ; PRINT ( Elist )
S ; PRINT ( E )
S ; PRINT ( NUM )
ID = E ; PRINT ( NUM )
ID = NUM ; PRINT ( NUM )
left-most derivation right-most derivation

Parse Trees
•Representing derivations as trees
–useful in compilers: Parse trees correspond quite
closely (but not exactly) with abstract syntax trees
we’re trying to generate
•difference: abstract syntax vs concrete (parse) syntax
•each internal node is labeled with a non-terminal
•each leaf node is labeled with a terminal
•each use of a rule in a derivation explains how to
generate children in the parse tree from the
parents

Parse Trees
•Example:
S
S S
ID E:=
NUM
NUM
E
L)(PRINT
;
S
S ; S
ID = E ; S
ID = NUM ; S
ID = NUM ; PRINT ( Elist )
ID = NUM ; PRINT ( E )
ID = NUM ; PRINT ( NUM )

Parse Trees
•Example: 2 derivations, but 1 tree
S
S S
ID E:=
NUM
NUM
E
L)(PRINT
;
S
S ; S
ID = E ; S
ID = NUM ; S
ID = NUM ; PRINT ( Elist )
ID = NUM ; PRINT ( E )
ID = NUM ; PRINT ( NUM )
S
S ; S
S ; PRINT ( Elist )
S ; PRINT ( E )
S ; PRINT ( NUM )
ID = E ; PRINT ( NUM )
ID = NUM ; PRINT ( NUM )

Parse Trees
•parse trees have meaning.
–order of children, nesting of subtrees is
significant
S
S
S
ID E:=
NUM
NUM
E
L)(PRINT
;
S
S
S
ID E:=
NUM
NUM
E
L)(PRINT
;

Ambiguous Grammars
•a grammar is ambiguous if the same
sequence of tokens can give rise to two or
more parse trees

Ambiguous Grammars
non-terminals:
E
terminals:
ID
NUM
PLUS
MULT
E ::= ID
| NUM
| E + E
| E * E
characters: 4 + 5 * 6
tokens: NUM(4) PLUS NUM(5) MULT NUM(6)
E
E E
NUM(4)
E
*
E
+
NUM(6)NUM(5)
I like using this notation where
I avoid repeating E ::=

Ambiguous Grammars
non-terminals:
E
terminals:
ID
NUM
PLUS
MULT
E ::= ID
| NUM
| E + E
| E * E
characters: 4 + 5 * 6
tokens: NUM(4) PLUS NUM(5) MULT NUM(6)
E
E E
NUM(4)
E
*
E
+
NUM(6)NUM(5)
E
EE
NUM(6)
E
+
E
*
NUM(5)NUM(4)

Ambiguous Grammars
•problem: compilers use parse trees to interpret the
meaning of parsed expressions
–different parse trees have different meanings
–eg: (4 + 5) * 6 is not 4 + (5 * 6)
–languages with ambiguous grammars are DISASTROUS;
The meaning of programs isn’t well-defined! You can’t tell
what your program might do!
•solution: rewrite grammar to eliminate ambiguity
–fold precedence rules into grammar to disambiguate
–fold associativity rules into grammar to disambiguate
–other tricks as well

Building Parsers
•In theory classes, you might have learned about general
mechanisms for parsing all CFGs
–algorithms for parsing all CFGs are expensive
•actually, with computers getting faster and bigger year over year,
researchers are beginning to dispute this claim.
–for 1/10/100 million-line applications, compilers must be fast.
•even for 20 thousand-line apps, speed is nice
–sometimes 1/3 of compilation time is spent in parsing
•compiler writers have developed specialized algorithms
for parsing the kinds of CFGs that you need to build
effective programming languages
–LL(k), LR(k) grammars can be parsed.

Recursive Descent Parsing
•Recursive Descent Parsing (Appel Chap 3.2):
–aka: predictive parsing; top-down parsing
–simple, efficient
–can be coded by hand in ML quickly
–parses many, but not all CFGs
•parses LL(1) grammars
–Left-to-right parse; Leftmost-derivation; 1 symbol lookahead
–key ideas:
•one recursive function for each non terminal
•each production becomes one clause in the function

1. S ::= IF E THEN S ELSE S
2. | BEGIN S L
3. | PRINT E
4. L ::= END
5. | ; S L
6. E ::= NUM = NUM
non-terminals: S, E, L
terminals: NUM, IF, THEN, ELSE, BEGIN, END, PRINT, ;, =
rules:

1. S ::= IF E THEN S ELSE S
2. | BEGIN S L
3. | PRINT E
4. L ::= END
5. | ; S L
6. E ::= NUM = NUM
non-terminals: S, E, L
terminals: NUM, IF, THEN, ELSE, BEGIN, END, PRINT, ;, =
rules:
datatype token = NUM | IF | THEN | ELSE | BEGIN | END
| PRINT | SEMI | EQ
Step 1: Represent the tokens
Step 2: build infrastructure for reading tokens from lexing stream
val tok = ref (getToken ())
fun advance () = tok := getToken ()
fun eat t = if (! tok = t) then advance () else error ()
function supplied by lexer

1. S ::= IF E THEN S ELSE S
2. | BEGIN S L
3. | PRINT E
4. L ::= END
5. | ; S L
6. E ::= NUM = NUM
non-terminals: S, E, L
terminals: NUM, IF, THEN, ELSE, BEGIN, END, PRINT, ;, =
rules:
fun S () = case !tok of
IF => eat IF; E (); eat THEN; S (); eat ELSE; S ()
| BEGIN => eat BEGIN; S (); L ()
| PRINT => eat PRINT; E ()
and L () = case !tok of
END => eat END
| SEMI => eat SEMI; S (); L ()

and E () = eat NUM; eat EQ; eat NUM
Step 3: write parser => one function per non-terminal; one clause per rule
val tok = ref (getToken ())
fun advance () = tok := getToken ()
fun eat t = if (! tok = t) then advance () else error ()
datatype token = NUM | IF
| THEN | ELSE | BEGIN | END
| PRINT | SEMI | EQ

1.S ::= A EOF
2.A ::= ID := E
3. | PRINT ( L )
4. E ::= ID
5. | NUM
6. L ::= E
7. | L , E
non-terminals: S, A, E, L
rules:
fun S () = A (); eat EOF
and A () = case !tok of
ID => eat ID; eat ASSIGN; E ()
| PRINT => eat PRINT; eat LPAREN; L (); eat RPAREN
and E () = case !tok of
ID => eat ID
| NUM => eat NUM

and L () = case !tok of
ID => ???
| NUM => ???

problem
•predictive parsing only works for
grammars where the first terminal symbol
in the input provides enough information to
choose which production to use
–LL(1)
•when parsing L, if !tok = ID, the parser
cannot determine which production to use:
6. L ::= E (E could be ID)
7. | L , E (L could be E could be ID)

solution
•eliminate left-recursion
•rewrite the grammar so it parses the same
language but the rules are different:
L ::= E
| L , E
S ::= A EOF
A ::= ID := E
| PRINT ( L )
E ::= ID
| NUM
S ::= A EOF
A ::= ID := E
| PRINT ( L )
E ::= ID
| NUM

solution
•eliminate left-recursion
•rewrite the grammar so it parses the same
language but the rules are different:
L ::= E
| L , E
L ::= E M
M ::= , E M
|
S ::= A EOF
A ::= ID := E
| PRINT ( L )
E ::= ID
| NUM
S ::= A EOF
A ::= ID := E
| PRINT ( L )
E ::= ID
| NUM

eliminating single left-recursion
•Original grammar form:
•Transformed grammar:
X ::= base
| X repeat
X ::= base Xnew
Xnew ::= repeat Xnew
|
Strings: base repeat repeat ...
Strings: base repeat repeat ...
Think about: what if you have mutually left-recursive variables X,Y,Z?
What’s the most general pattern of left recursion? How to eliminate it?

Recursive Descent Parsing
•Unfortunately, can’t always eliminate left
recursion
•Questions:
–how do we know when we can parse
grammars using recursive descent?
–Is there an algorithm for generating such
parsers automatically?

Constructing RD Parsers
•To construct an RD parser, we need to know
what rule to apply when
–we are trying to parse a non terminal X
–we see the next terminal a in input
•We apply rule X ::= s when
–a is the first symbol that can be generated by string s,
OR
–s reduces to the empty string (is nullable) and a is the
first symbol in any string that can follow X

Constructing RD Parsers
•To construct an RD parser, we need to know
what rule to apply when
–we are trying to parse a non terminal X
–we see the next terminal a in input
•We apply rule X ::= s when
–a is the first symbol that can be generated by string s,
OR
–s reduces to the empty string (is nullable) and a is the
first symbol in any string that can follow X

Constructing Predictive Parsers
1. Y ::=
2. | bb
3. X ::= c
4. | Y Z
5. Z ::= d
X c
X b
X d
non-terminal
seen
next
terminal rule

Constructing Predictive Parsers
1. Y ::=
2. | bb
3. X ::= c
4. | Y Z
5. Z ::= d
X c 3
X b
X d
non-terminal
seen
next
terminal rule

Constructing Predictive Parsers
1. Y ::=
2. | bb
3. X ::= c
4. | Y Z
5. Z ::= d
X c 3
X b 4
X d
non-terminal
seen
next
terminal rule

Constructing Predictive Parsers
1. Y ::=
2. | bb
3. X ::= c
4. | Y Z
5. Z ::= d
X c 3
X b 4
X d 4
non-terminal
seen
next
terminal rule

Constricting Predictive Parsers
•in general, must compute:
–for each production X ::= s, must determine if
s can derive the empty string.
•if yes, X  Nullable
–for each production X := s, must determine
the set of all first terminals Q derivable from s
•Q  First(X)
–for each non terminal X, determine all
terminals symbols Q that immediately follow X
•Q  Follow(X)

Iterative Analysis
•Many compilers algorithms are iterative
techniques.
•Iterative analysis applies when:
–must compute a set of objects with some property P
–P is defined inductively. ie, there are:
•base cases: objects o1, o2 “obviously” have property P
•inductive cases: if certain objects (o3, o4) have property
P, this implies other objects (f o3; f o4) have property P
–The number of objects in the set is finite
•or we can represent infinite collections using some finite
notation & we can find effective termination conditions

Iterative Analysis
•general form:
–initialize set S with base cases
–applied inductive rules over and over until you reach a fixed
point
•a fixed point is a set that does not change when you
apply an inductive rule (function)
–Nullable, First and Follow sets can be determined through
iteration
–many program analyses & optimizations use iteration
–worst-case complexity is bad
–average-case complexity can be good: iteration “usually”
terminates in a couple of rounds

Iterative Analysis
Base Cases
“obviously”
have the property

Iterative Analysis
apply function
(rule) to things
that have the
property to
produce new things
that have the property
f
f
f
f

Iterative Analysis
Base Cases
+ things you get
by applying rule
to base cases
have the property

Iterative Analysis
Apply rules again

Iterative Analysis
Apply rules again

Iterative Analysis
Finally, you reach
a fixed point

Iterative Analysis
Example:
• axioms are “obviously true”/taken for granted
• rules of logic take basic axioms and prove new
things are true
“Dave teaches cos 320”
“Dave teaches cos 441”
“Dave is a great teacher”
rule r: X is a great teacher /\ X teaches Y => Y is a great class
axioms:

Iterative Analysis
Example:
• axioms are “obviously true”/taken for granted
• rules of logic take basic axioms and prove new
things are true
“Dave teaches cos 320”
“Dave teaches cos 441”
“Dave is a great teacher”
rule r: X is a great teacher /\ X teaches Y => Y is a great class
axioms:
r
“cos 320 is a great class”
r
“cos 441 is a great class”

Iterative Analysis
Example:
• axioms are “obviously true”/taken for granted
• rules of logic take basic axioms and prove new
things are true
“Dave teaches cos 320”
“Dave teaches cos 441”
“Dave is a great teacher”
rule r: X is a great teacher /\ X teaches Y => Y is a great class
axioms:
r
“cos 320 is a great class”
r
“cos 441 is a great class”
Fixed Point Reached!

Nullable Sets
•Non-terminal X is Nullable only if the
following constraints are satisfied
–base case:
•if (X := ) then X is Nullable
–inductive case:
•if (X := ABC...) and A, B, C, ... are all Nullable then
X is Nullable

Computing Nullable Sets
•Compute X is Nullable by iteration:
–Initialization:
•Nullable := { }
•if (X := ) then Nullable := Nullable U {X}
–While Nullable different from last iteration do:
•for all X,
–if (X := ABC...) and A, B, C, ... are all Nullable then
Nullable := Nullable U {X}

First Sets
•First(X) is specified like this:
–base case:
•if T is a terminal symbol then First (T) = {T}
–inductive case:
•if X is a non-terminal and (X:= ABC...) then
–First (X) = First (ABC...)
where First(ABC...) = F1 U F2 U F3 U ... and
»F1 = First (A)
»F2 = First (B), if A is Nullable; emptyset otherwise
»F3 = First (C), if A is Nullable & B is Nullable; emp...
»...

Computing First Sets
•Compute First(X):
–initialize:
•if T is a terminal symbol then First (T) = {T}
•if T is non-terminal then First(T) = { }
–while First(X) changes (for any X) do
•for all X and all rules (X:= ABC...) do
–First (X) := First(X) U First (ABC...)
where First(ABC...) := F1 U F2 U F3 U ... and
»F1 := First (A)
»F2 := First (B), if A is Nullable; emptyset otherwise
»F3 := First (C), if A is Nullable & B is Nullable; emp...
»...

Computing Follow Sets
•Follow(X) is computed iteratively
–base case:
•initially, we assume nothing in particular follows X
–(when computing, Follow (X) is initially { })
–inductive case:
•if (Y := s1 X s2) for any strings s1, s2 then
–Follow (X) = First (s2)
•if (Y := s1 X s2) for any strings s1, s2 then
–Follow (X) = Follow(Y), if s2 is Nullable

building a predictive parser
Z ::= X Y Z
Z ::= d
Y ::= c
Y ::=
X ::= a
X ::= b Y e
nullablefirstfollow
Z
Y
X

building a predictive parser
Z ::= X Y Z
Z ::= d
Y ::= c
Y ::=
X ::= a
X ::= b Y e
nullablefirstfollow
Z no
Y yes
X no
base case

building a predictive parser
Z ::= X Y Z
Z ::= d
Y ::= c
Y ::=
X ::= a
X ::= b Y e
nullablefirstfollow
Z no
Y yes
X no
after one round of induction, we realize we have reached a fixed point

building a predictive parser
Z ::= X Y Z
Z ::= d
Y ::= c
Y ::=
X ::= a
X ::= b Y e
nullablefirstfollow
Z no { }
Y yes { }
X no { }
base case

building a predictive parser
Z ::= X Y Z
Z ::= d
Y ::= c
Y ::=
X ::= a
X ::= b Y e
nullablefirstfollow
Z no d
Y yes c
X no a,b
round 1

building a predictive parser
Z ::= X Y Z
Z ::= d
Y ::= c
Y ::=
X ::= a
X ::= b Y e
nullablefirstfollow
Z no d,a,b
Y yes c
X no a,b
round 2

building a predictive parser
Z ::= X Y Z
Z ::= d
Y ::= c
Y ::=
X ::= a
X ::= b Y e
nullablefirstfollow
Z no d,a,b
Y yes c
X no a,b
after three rounds of iteration, no more changes ==> fixed point

building a predictive parser
Z ::= X Y Z
Z ::= d
Y ::= c
Y ::=
X ::= a
X ::= b Y e
nullablefirstfollow
Z no d,a,b { }
Y yes c { }
X no a,b { }
base case

building a predictive parser
Z ::= X Y Z
Z ::= d
Y ::= c
Y ::=
X ::= a
X ::= b Y e
nullablefirstfollow
Z no d,a,b { }
Y yes c e,d,a,b
X no a,bc,e,d,a,b
after one round of induction, no fixed point

building a predictive parser
Z ::= X Y Z
Z ::= d
Y ::= c
Y ::=
X ::= a
X ::= b Y e
nullablefirstfollow
Z no d,a,b { }
Y yes c e,d,a,b
X no a,bc,e,d,a,b
after two rounds of induction, fixed point
(but notice, computing Follow(X) before Follow (Y) would have required 3
rd
round)

Z ::= X Y Z
Z ::= d
Y ::= c
Y ::=
X ::= a
X ::= b Y e
nullablefirstfollow
Z no d,a,b { }
Y yes c e,d,a,b
X no a,b c,e,d,a,b
Grammar: Computed Sets:
Build parsing table where row X, col T
tells parser which clause to execute in
function X with next-token T:
a b c d e
Z
Y
X
• if T  First(s) then
enter (X ::= s) in row X, col T
• if s is Nullable and T  Follow(X)
enter (X ::= s) in row X, col T

Z ::= X Y Z
Z ::= d
Y ::= c
Y ::=
X ::= a
X ::= b Y e
nullablefirstfollow
Z no d,a,b { }
Y yes c e,d,a,b
X no a,b c,e,d,a,b
Grammar: Computed Sets:
Build parsing table where row X, col T
tells parser which clause to execute in
function X with next-token T:
a b c d e
Z Z ::= XYZZ ::= XYZ
Y
X
• if T  First(s) then
enter (X ::= s) in row X, col T
• if s is Nullable and T  Follow(X)
enter (X ::= s) in row X, col T

Z ::= X Y Z
Z ::= d
Y ::= c
Y ::=
X ::= a
X ::= b Y e
nullablefirstfollow
Z no d,a,b { }
Y yes c e,d,a,b
X no a,b c,e,d,a,b
Grammar: Computed Sets:
Build parsing table where row X, col T
tells parser which clause to execute in
function X with next-token T:
a b c d e
Z Z ::= XYZZ ::= XYZ Z ::= d
Y
X
• if T  First(s) then
enter (X ::= s) in row X, col T
• if s is Nullable and T  Follow(X)
enter (X ::= s) in row X, col T

Z ::= X Y Z
Z ::= d
Y ::= c
Y ::=
X ::= a
X ::= b Y e
nullablefirstfollow
Z no d,a,b { }
Y yes c e,d,a,b
X no a,b c,e,d,a,b
Grammar: Computed Sets:
Build parsing table where row X, col T
tells parser which clause to execute in
function X with next-token T:
a b c d e
Z Z ::= XYZZ ::= XYZ Z ::= d
Y Y ::= c
X
• if T  First(s) then
enter (X ::= s) in row X, col T
• if s is Nullable and T  Follow(X)
enter (X ::= s) in row X, col T

Z ::= X Y Z
Z ::= d
Y ::= c
Y ::=
X ::= a
X ::= b Y e
nullablefirstfollow
Z no d,a,b { }
Y yes c e,d,a,b
X no a,b c,e,d,a,b
Grammar: Computed Sets:
Build parsing table where row X, col T
tells parser which clause to execute in
function X with next-token T:
a b c d e
Z Z ::= XYZZ ::= XYZ Z ::= d
Y Y ::= Y ::= Y ::= c Y ::= Y ::=
X
• if T  First(s) then
enter (X ::= s) in row X, col T
• if s is Nullable and T  Follow(X)
enter (X ::= s) in row X, col T

Z ::= X Y Z
Z ::= d
Y ::= c
Y ::=
X ::= a
X ::= b Y e
nullablefirstfollow
Z no d,a,b { }
Y yes c e,d,a,b
X no a,b c,e,d,a,b
Grammar: Computed Sets:
Build parsing table where row X, col T
tells parser which clause to execute in
function X with next-token T:
a b c d e
Z Z ::= XYZZ ::= XYZ Z ::= d
Y Y ::= Y ::= Y ::= c Y ::= Y ::=
X X ::= aX ::= b Y e
• if T  First(s) then
enter (X ::= s) in row X, col T
• if s is Nullable and T  Follow(X)
enter (X ::= s) in row X, col T

Z ::= X Y Z
Z ::= d
Y ::= c
Y ::=
X ::= a
X ::= b Y e
nullablefirstfollow
Z no d,a,b { }
Y yes c e,d,a,b
X no a,b c,e,d,a,b
Grammar: Computed Sets:
What are the blanks?
a b c d e
Z Z ::= XYZZ ::= XYZ Z ::= d
Y Y ::= Y ::= Y ::= c Y ::= Y ::=
X X ::= aX ::= b Y e

Z ::= X Y Z
Z ::= d
Y ::= c
Y ::=
X ::= a
X ::= b Y e
nullablefirstfollow
Z no d,a,b { }
Y yes c e,d,a,b
X no a,b c,e,d,a,b
Grammar: Computed Sets:
What are the blanks? --> syntax errors
a b c d e
Z Z ::= XYZZ ::= XYZ Z ::= d
Y Y ::= Y ::= Y ::= c Y ::= Y ::=
X X ::= aX ::= b Y e

Z ::= X Y Z
Z ::= d
Y ::= c
Y ::=
X ::= a
X ::= b Y e
nullablefirstfollow
Z no d,a,b { }
Y yes c e,d,a,b
X no a,b c,e,d,a,b
Grammar: Computed Sets:
Is it possible to put 2 grammar rules in the same box?
a b c d e
Z Z ::= XYZZ ::= XYZ Z ::= d
Y Y ::= Y ::= Y ::= c Y ::= Y ::=
X X ::= aX ::= b Y e

Z ::= X Y Z
Z ::= d
Z ::= d e
Y ::= c
Y ::=
X ::= a
X ::= b Y e
nullablefirstfollow
Z no d,a,b { }
Y yes c e,d,a,b
X no a,b c,e,d,a,b
Grammar: Computed Sets:
Is it possible to put 2 grammar rules in the same box?
a b c d e
Z Z ::= XYZZ ::= XYZ Z ::= d
Z ::= d e
Y Y ::= Y ::= Y ::= c Y ::= Y ::=
X X ::= aX ::= b Y e

predictive parsing tables
•if a predictive parsing table constructed this way
contains no duplicate entries, the grammar is
called LL(1)
–Left-to-right parse, Left-most derivation, 1 symbol
lookahead
•if not, of the grammar is not LL(1)
•in LL(k) parsing table, columns include every k-
length sequence of terminals:
aa ab ba bb ac ca ...

another trick
•Previously, we saw that grammars with
left-recursion were problematic, but could
be transformed into LL(1) in some cases
•the example non-LL(1) grammar we just
saw:
•how do we fix it?
Z ::= X Y Z
Z ::= d
Z ::= d e
Y ::= c
Y ::=
X ::= a
X ::= b Y e

another trick
•Previously, we saw that grammars with
left-recursion were problematic, but could
be transformed into LL(1) in some cases
•the example non-LL(1) grammar we just
saw:
•solution here is left-factoring:
Z ::= X Y Z
Z ::= d
Z ::= d e
Y ::= c
Y ::=
X ::= a
X ::= b Y e
Z ::= X Y Z
Z ::= d W Y ::= c
Y ::=
X ::= a
X ::= b Y e
W ::=
W ::= e

summary
•CFGs are good at specifying programming language structure
•parsing general CFGs is expensive so we define parsers for
simple classes of CFG
–LL(k), LR(k)
•we can build a recursive descent parser for LL(k) grammars by:
–computing nullable, first and follow sets
–constructing a parse table from the sets
–checking for duplicate entries, which indicates failure
–creating an ML program from the parse table
•if parser construction fails we can
–rewrite the grammar (left factoring, eliminating left recursion) and try
again
–try to build a parser using some other method
Tags