LL(1) and the LR family of parsers used in compilers

MandarMitra1 7 views 42 slides Dec 11, 2024
Slide 1
Slide 1 of 42
Slide 1
1
Slide 2
2
Slide 3
3
Slide 4
4
Slide 5
5
Slide 6
6
Slide 7
7
Slide 8
8
Slide 9
9
Slide 10
10
Slide 11
11
Slide 12
12
Slide 13
13
Slide 14
14
Slide 15
15
Slide 16
16
Slide 17
17
Slide 18
18
Slide 19
19
Slide 20
20
Slide 21
21
Slide 22
22
Slide 23
23
Slide 24
24
Slide 25
25
Slide 26
26
Slide 27
27
Slide 28
28
Slide 29
29
Slide 30
30
Slide 31
31
Slide 32
32
Slide 33
33
Slide 34
34
Slide 35
35
Slide 36
36
Slide 37
37
Slide 38
38
Slide 39
39
Slide 40
40
Slide 41
41
Slide 42
42

About This Presentation

Based on the material in Compilers Principles Techniques and Tools by Aho, Lam, Sethi and Ullman


Slide Content

Compiler Construction: Parsing
Mandar Mitra
Indian Statistical Institute
M. Mitra (ISI) Parsing 1 / 39

Context-free grammars
Reference: Section 4.2
Formal way of specifying rules about the structure/syntax of a
program
terminals - tokens
non-terminals - represent higher-level structures of a program
start symbol, productions
Example:
E!EopEj(E)j Ejid
op!+j j j=j%
NOTE:recall token vs. lexeme difference
Derivation:starting from the start symbol, use productions to
generate a string (sequence of tokens)
Parse tree:pictorial representation of a derivation
M. Mitra (ISI) Parsing 2 / 39

Ambiguity
Reference: Section 4.2, 4.3
Left-most derivation: at each step, replace left-most non-terminal
Ambiguous grammar:Gis ambiguous if a string has>1left-most (or
right-most) derivation
ALT:Gis ambiguous if>1parse tree can be constructed for a string
Examples:
1.E!E+E E!EflE
2.stmt!ifexprthenstmtj
ifexprthenstmtelsestmt
[SOLUTION:stmt !matchedjunmatched
matched!ifEthenmatchedelsematched
unmatched!ifEthenstmtj
ifEthenmatchedelseunmatched ]
M. Mitra (ISI) Parsing 3 / 39

Top-down parsing - I
Reference: Section 4.4
Recursive descent parsing:
Corresponds to finding a leftmost derivation for an input string
Equivalent to constructing parse tree in pre-order
Example:
Grammar:S!cAd A!abja
Input: cad
Problems:
1.backtracking involved ()buffering of tokens required)
2.left recursion will lead to infinite looping
3.left factors may cause several backtracking steps
M. Mitra (ISI) Parsing 4 / 39

Top-down parsing - I
Reference: Section 4.3
Left recursion:Gistb left recursive if for some non-terminalA,A
+
)A
Simple case I:A!Aj )
A!A

A

!A


Simple case II:
A!A1jA2j: : :jAmj
1j: : :jn
+
A!1A

j: : :jnA

A

!1A

j2A

j: : :jmA


M. Mitra (ISI) Parsing 5 / 39

Top-down parsing - I
General case:
Input:Gwithout any cycles (A
+
)A) or"-productions
Output: equivalent non-recursive grammar
Algorithm:
Let the non-terminals beA1; : : : An.
fori= 1tondo
forj= 1toi1do
replaceAi!AjbyAi!1j2j: : :jk
whereAj!1j2j: : :jkare thecurrentAjproductions
end for
Eliminate immediate left-recursion forAi.
end for
M. Mitra (ISI) Parsing 6 / 39

Top-down parsing - I
Left factoring:
Example:stmt!if (expr)stmtj
if (expr)stmtelsestmt
Algorithm:
whileleft factors existdo
foreach non-terminalAdo
Find longest prefixcommon to2rules
ReplaceA!1j: : :jnj : : :
byA!A

j : : :
A

!1j: : :jn
end for
end while
M. Mitra (ISI) Parsing 7 / 39

Top-down parsing - II
Reference: Section 4.4
Predictive parsing:recursive descent parsing without backtracking
Principle:Given current input symbol and non-terminal, we should be
able to determine which production is to be used
Example:stmt!if (expr): : :j
while: : :j
for: : :
M. Mitra (ISI) Parsing 8 / 39

Top-down parsing - II
Implementation:use transition diagrams (1 per non-terminal)
A!X1X2: : : Xns F
X
1 X
2 X
n
...
1.IfXiis a terminal, match with next input token and advance to next
state.
2.IfXiis a non-terminal, go to the transition diagram forXi, and
continue. On reaching the final state of that transition diagram,
advance to next state of current transition diagram.
Example:E!E+TjT
T!TflFjF
F!(E)jid
M. Mitra (ISI) Parsing 9 / 39

Top-down parsing - II
Non-recursive implementation:X
a
Stack
Input
TABLE
top
Table: 2-d array s.t.
M[A; a]specifies
A-production to be used
if input symbol isa
Algorithm:
0.Initially: stack contains⟨EOFS⟩, input pointer is at start of input
1.ifX=a=EOF, done
2.ifX=a̸=EOF, pop stack and advance input pointer
3.ifXis non-terminal, lookupM[X; a])X!UV W
popX, pushW; V; U
M. Mitra (ISI) Parsing 10 / 39

FIRST and FOLLOW
FIRST(): set of terminals that begin strings derived from
if

)ϵ, thenϵ2FIRST()
FOLLOW(A): set of terminals that can appear immediately to the right of
Ain some sentential form
FOLLOW(A) =fajS

)Aag
ifAis the rightmost symbol in any sentential form, then
EOF2FOLLOW(A)
M. Mitra (ISI) Parsing 11 / 39

FIRST and FOLLOW
FIRST:
1.ifXis a terminal, thenFIRST(X) =fXg
2.ifX!ϵis a production, then addϵtoFIRST(X)
3.ifX!Y1Y2: : : Ykis a production:
ifa2FIRST(Yi)andϵ2FIRST(Y1); : : : ;FIRST(Yi1),
addatoFIRST(X)
ifϵ2FIRST(Yi)8i, addϵtoFIRST(X)
FOLLOW:
1.Add EOF toFOLLOW(S)
2.For each production of the formA!B
(a)addFIRST()nfϵgtoFOLLOW(B)
(b)if=ϵorϵ2FIRST(), then add everything inFOLLOW(A)to
FOLLOW(B)
M. Mitra (ISI) Parsing 12 / 39

Table construction
Algorithm:
1.For each productionA!
(a)for each terminala2FIRST(), addA!toM[A; a]
(b)ifϵ2FIRST(), addA!toM[A; b]for each terminal
b2FOLLOW(A)
2.Mark all other entries “error”
Example:
Grammar:E!E+TjT
T!TflFjF
F!(E)jid
Input: id + id * id
M. Mitra (ISI) Parsing 13 / 39

LL(1) grammars
If the table has no multiply defined entries, grammar istbLL(1)
L- left-to-rightL- leftmost 1 - lookahead
IfGisLL(1), thenGcannot be left-recursive or ambiguous
Example:S!i E t S S

ja
S

!e Sjϵ
E!b
M[S

; e] =fS

!ϵ; S

!e Sg
Some non-LL(1)grammars may be transformed into equivalent
LL(1)grammars
M. Mitra (ISI) Parsing 14 / 39

LL(1) parsing
Error recovery:
1.ifXis a terminal, butX̸=a, popX
2.ifM[X; a]is blank, skipa
3.ifM[X; a] =synch, popX, but do not advance input pointer
Synch sets:
useFOLLOW(A)
add theFIRSTset of a higher-level non-terminal to thesynchset of a
lower-level non-terminal
M. Mitra (ISI) Parsing 15 / 39

Bottom-up parsing: basic idea
Reference: Section 4.5
Grammar:E!E+TjT
T!TflFjF
F!(E)jid
Input:id+idflid
(Right-most) Derivation:
E)E+T (1)
)E+TflF(2)
)E+Tflid(3)
)E+Fflid(4)
)E+idflid(5)
)T+idflid(6)
)F+idflid(7)
)id+idflid(8)
Parse tree:
M. Mitra (ISI) Parsing 16 / 39

Bottom-up parsing: basic idea
Reference: Section 4.5
Grammar:E!E+TjT
T!TflFjF
F!(E)jid
Input:id+idflid
(Right-most) Derivation:
E)E+T (1)
)E+TflF(2)
)E+Tflid(3)
)E+Fflid(4)
)E+idflid(5)
)T+idflid(6)
)F+idflid(7)
)id+idflid(8)
Parse tree:
M. Mitra (ISI) Parsing 16 / 39

Bottom-up parsing: basic idea
Reference: Section 4.5
Grammar:E!E+TjT
T!TflFjF
F!(E)jid
Input:id+idflid
(Right-most) Derivation:
E)E+T (1)
)E+TflF(2)
)E+Tflid(3)
)E+Fflid(4)
)E+idflid(5)
)T+idflid(6)
)F+idflid(7)
)id+idflid(8)
Parse tree:
M. Mitra (ISI) Parsing 16 / 39

Bottom-up parsing: terminology
Sentential form:any stringof terminals/non-terminals s.t.S

)
Handle:for sentential form, handle is a specific substringofs.t.
A!is a production
replacing the occurrence ofinbyAproduces theprevious
right-sentential formin a rightmost derivation of
position ofis important
Example:sentential form:id+idflid, handle:idproduction:F!id
Properties:
1.string to right of handle must consist of terminals only
2.ifGis unambiguous, every right-sentential form has a unique handle
M. Mitra (ISI) Parsing 17 / 39

Bottom-up parsing: basic algorithm
Constructing parse tree forPback-calculating the derivation forP
Input:programPsequence of tokens returned by lexical analyser
Output:parse-tree / reverse derivation ofP
Method:
1.Initialisation: P
2.while(̸=S)// parse tree / reverse derivation not complete
(a)identifyhandleA!in(current sentential form)
(b) replace(; ; A)
Advantages:
1.No backtracking
2.More powerful thanLL(1)/ predictive parsing
How do we identify the handle in a given right-sentential form?
M. Mitra (ISI) Parsing 18 / 39

Bottom-up parsing: basic algorithm
Constructing parse tree forPback-calculating the derivation forP
Input:programPsequence of tokens returned by lexical analyser
Output:parse-tree / reverse derivation ofP
Method:
1.Initialisation: P
2.while(̸=S)// parse tree / reverse derivation not complete
(a)identifyhandleA!in(current sentential form)
(b) replace(; ; A)
Advantages:
1.No backtracking
2.More powerful thanLL(1)/ predictive parsing
How do we identify the handle in a given right-sentential form?
M. Mitra (ISI) Parsing 18 / 39

Identifying handlesI
HandleRHS of some production
Item(LR(0)item): production ofGwith a dot at some position in the
RHS, representing how much of the RHS has already been seen at a
given point in the parse
Itemnotation / representation of parser’s current state
Possible cases:
A! ϵX1X2: : : Xk: just starting to construct a handle
A!: : : Xϵa : : :
handle is partially constructed (uptoX)
to proceed, need to getterminala
if next token returned by lexical analyser isa, updated state:
A!: : : Xaϵ: : :
M. Mitra (ISI) Parsing 19 / 39

Identifying handlesII
A!: : : XϵB : : :
handle is partially constructed (uptoX)
to proceed, need to getnon-terminalB
need to construct sub-treeBfrom the next token(s) returned by lexical
analyser
need to consider all productionsB!1; B!2; : : :
new objective: identify one of the following handles
B! ϵ1; B! ϵ2; : : :
A!X1X2: : : Xkϵ
handle construction complete
reduce/ replace RHSX1X2: : : Xkby LHSA
take sub-trees corresponding toX1; X2; : : : ; Xkand make them child
nodes of a common parentA
M. Mitra (ISI) Parsing 20 / 39

LR(0) / SLR parsing
Grammar augmentation:
Create new start symbolS

; addS

!Sto productions
Closure:
LetIbe a set of items
closure(I) I
repeat until no more changes
for eachA!Bin closure(I)
for each productionB!s.t.B! ̸2closure(I)
addB! to closure(I)
Example: closure(E

! E)
M. Mitra (ISI) Parsing 21 / 39

SLR parsing
Goto construction:
goto(I; X)= closure(fA!XjA!X2Ig)
Example: LetI=fE

!E; E!E+Tg
goto(I;+)= closure(fE!E+Tg)
Canonical collection construction:
1.C fclosure(fS

! Sg)g
2. repeat until no more changes:
for eachI2 C, for each grammar symbolX
if goto(I; X)is not empty and not inC
add goto(I; X)toC
3. LetC=fI0; : : : ; Ingbe the canonical collection ofLR(0)items forG.
4. Create a statesicorresponding to eachIi.
5.Initial state(s0)set containingS

! S
M. Mitra (ISI) Parsing 22 / 39

SLR parsing: table constructionI
Parsing tables
action(s; a)
one row per states
one column per tokena, one column for EOF
goto(s; A)
one row per states
one column per non-terminalA
M. Mitra (ISI) Parsing 23 / 39

SLR parsing: table constructionII
Algorithm
1.IfA!a2Iiand goto(Ii; a) =Ij, then action(si; a)= shiftsj.
2.IfA! 2Ii(A̸=S

), then action(si; a)= reduceA!for all
a2FOLLOW(A).
3.IfS

!S 2Ii, then action(si;EOF)= accept.
4.If goto(Ii; A) =Ij, then goto(si; A) =sj.
5.Mark all blank entries error.
M. Mitra (ISI) Parsing 24 / 39

SLR parsing
Implementation scheme:
0.Use input buffer, stack, and parsing table.
1.Shift0input symbols onto stack until a handleis on top of stack.
2.ReducetoA(i.e. pop symbols ofand pushA).
3.Stop when stack =⟨EOF; S⟩, and input pointer is at EOF.
Stack:soX1s1: : : Xmsm, where eachsirepresents a “state” (current
situation in the parsing process)
Table:
used to guide steps 2 and 3
2-d array indexed by⟨state, input symbol⟩pairs
consists of two parts (action + goto)
M. Mitra (ISI) Parsing 25 / 39

SLR parsing
Algorithm:1. Initially, stack =⟨s0⟩(initial state)
2. Lets- state on top of stack
a- current input symbol
if action[s; a]= shifts

create a leaf node corresponding toa
pushnode foraands

on stack, advance input pointer
if action[s; a]= reduceA!
create a non-leaf node corresponding toA
popjjnode + state pairs
make thejjnodes children ofA
lets

be the new top of stack
pushnode forA, goto[s

; A]on stack
if action[s; a]= accept, done
node paired withsroot of parse tree
else error This is actually a sub-tree of
the parse tree rooted atA. This case occurs only when
s=closure(fS

!Sg)and
a=EOF.
M. Mitra (ISI) Parsing 26 / 39

SLR parsing: conflicts
1.Shift-reduce conflict:stmt!if (expr)stmtj
if (expr)stmtelsestmt
2.Reduce-reduce conflict:stmt!id (param_list) ;
expr!id (expr_list)
.
.
.
param!id
expr!id
3.Shift-shift conflict: not possible
4.Conflicts in reduce table: not possible
cf. definition of goto() function
M. Mitra (ISI) Parsing 27 / 39

SLR parsing: conflicts
Example:
Grammar:S!L=R S!R L! flR L!idR!L
Canonical collection:
I0=fS

! S; : : :gI2=fS!L=R; R!Lg: : :
Table: action(2;=)= shift: : :action(2;=)= reduce: : :
NOTE:SLR(1)grammars are unambiguous, but not vice versa.
M. Mitra (ISI) Parsing 28 / 39

Canonical LR parsers
Motivation:Reduction byA!not necessarily proper even if
a2FOLLOW(A)
)explicitly indicate tokens for which reduction is acceptable
LR(1) item:pair of the form⟨A!; a⟩, whereA!is a
production,ais a terminal orEOF
Properties:
1.⟨A!; a⟩- lookahead has no effect
⟨A!; a⟩- reduce only if input symbol isa
2.faj ⟨A!; a⟩ 2canonical collectiong FOLLOW(A)
M. Mitra (ISI) Parsing 29 / 39

Canonical LR parsers
Closure:
LetIbe a set of items
closure(I) I
repeat until no more changes
for each item⟨A!B; a⟩in closure(I)
for each productionB!and each terminalb2FIRST(a)
if⟨B! ; b⟩ ̸2closure(I)
add⟨B! ; b⟩to closure(I)
Goto:
goto(I; X) =closure(f⟨A!X; a⟩ j ⟨A!X; a⟩ 2Ig)
Canonical collection construction:
1.C fclosure(f⟨S

! S;EOF⟩g)g
2. /* Similar to SLR algorithm */
M. Mitra (ISI) Parsing 30 / 39

Canonical LR parsers
Table construction:
1.If⟨A!a; b⟩ 2Iiand goto(Ii; a)=Ijthen action(i; a)= shiftj.
2.If⟨A!; b⟩ 2Ii, then action(i; b)= reduceA!(A̸=S

).
If⟨S

!S;EOF⟩ 2Ii, then action(i;EOF)= accept.
M. Mitra (ISI) Parsing 31 / 39

LALR parsers
Motivation:try to combine efficiency of SLR parser with power of
canonical method
Core:set ofLR(0)items corresponding to a set ofLR(1)items
Method:
1.Construct canonical collection ofLR(1)items,C=fI0; : : : ; Ing.
2.Merge all sets with the same core. Let the new collection be
C

=fJ0; : : : ; Jmg.
3.Construct the action table as before.
4.IfJ=I1[: : :[Ik, then goto(J; X)= union of all sets with the same
core as goto(I1; X).
M. Mitra (ISI) Parsing 32 / 39

SLR vs LR(1) vs LALR
No. of states:SLR=LALR <=LR(1)(cf. Pascal)
Power:SLR < LALR < LR(1)
SLR vs. LALR:LALRitems can be regarded asSLRitems, with the
core augmented by appropriate subsetsofFOLLOW(A)explicitly
specified
LALR vs. LR(1):
1.If there were no shift-reduce conflicts in theLR(1)table, there will be
no shift-reduce conflicts in theLALRtable.
2.Step 2 may generate reduce-reduce conflicts.
Example:I1=f⟨A!; a⟩;⟨B!; b⟩g
I2=f⟨A!; b⟩;⟨B!; a⟩g
3.Correct inputs:LALRparser mimicsLR(1)parser
Incorrect inputs: incorrect reductions may occur on a lookaheada)
parser goes back to a stateIiin whichAhas just been recognized.
Butacannot followAin this state)error
M. Mitra (ISI) Parsing 33 / 39

Error recovery
Reference: Section 4.8
Detection:
CanonicalLR- errors are immediately detected (no unnecessary
shift/reduce actions)
SLR=LALR- no symbols are shifted onto stack, but reductions may
occur before error is detected
Panic mode recovery:
1.Scan down the stack until a stateswith a goto on a “significant”
non-terminalA(e.g.expr,stmt, etc.) is found.
2.Discard input until a symbolawhich can followAis found.
3.PushA;goto(s; A)and resume parsing.
Expln:sAa )
|{z}
a
location of error
M. Mitra (ISI) Parsing 34 / 39

Error recovery
Phrase-level error recovery:recovery by local correction on remaining
input e.g. insertion, deletion, substitution, etc.
Scheme:
1.Consider each blank entry, and decide what the error is likely to be.
2.Call appropriate recovery method
—should consume input (to avoid infinite loop)
—avoid popping a “significant” non-terminal from stack
Examples:
State:E

! E
Input:+orfl
Action: pushid, goto appropriate
state
Message: missing operand
State:E!E+T
Input:)
Action: skip ’)’ from input
Message: extra ’)’
M. Mitra (ISI) Parsing 35 / 39

Yacc
Reference: Section 4.9
Usage:$ yacc myfile.y (generatesy.tab.c)
File format:
declarations
%%
grammar rules (terminals, non-terminals, start symbol)
%%
auxiliary procedures (C functions)
Semantic actions:
—$$- attribute value associated with LHS non-terminal
$i- attribute value associated withi-th grammar symbol on
RHS
—specified action is executed on reduction by corresponding
production
—default action:$$ = $1
M. Mitra (ISI) Parsing 36 / 39

M. Mitra (ISI) Parsing 37 / 39

Yacc
Lexical analyzer:yylex()must be provided
should return integer code for a token
should setyylvalto attribute value
Usage with lex:
% lex scanner.l
% yacc parser.y
% cc y.tab.c -ly -ll
Add#include "lex.yy.c"to
third section of file
Declared tokens can be used as return values inscanner.l
M. Mitra (ISI) Parsing 38 / 39

Yacc
Implicit conflict resolution:
1.Shift-reduce: choose shift
2.Reduce-reduce: choose the production listed earlier
Explicit conflict resolution:
Precedence:tokens are assigned precedence according to the order
in which they are declared (lowest first)
Associativity:left, right, or nonassoc
Precedence/assoc. of a production=precedence/assoc. of rightmost
terminal or explicitly specified using%prec
GivenA!; a:
if precedence of production is higher, reduce
if precedence of production is same, and associativity of production is
left, reduce
else shift
M. Mitra (ISI) Parsing 39 / 39