LL(1) and the LR family of parsers used in compilers
MandarMitra1
7 views
42 slides
Dec 11, 2024
Slide 1 of 42
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
About This Presentation
Based on the material in Compilers Principles Techniques and Tools by Aho, Lam, Sethi and Ullman
Size: 120.16 KB
Language: en
Added: Dec 11, 2024
Slides: 42 pages
Slide Content
Compiler Construction: Parsing
Mandar Mitra
Indian Statistical Institute
M. Mitra (ISI) Parsing 1 / 39
Context-free grammars
Reference: Section 4.2
Formal way of specifying rules about the structure/syntax of a
program
terminals - tokens
non-terminals - represent higher-level structures of a program
start symbol, productions
Example:
E!EopEj(E)j Ejid
op!+j j j=j%
NOTE:recall token vs. lexeme difference
Derivation:starting from the start symbol, use productions to
generate a string (sequence of tokens)
Parse tree:pictorial representation of a derivation
M. Mitra (ISI) Parsing 2 / 39
Ambiguity
Reference: Section 4.2, 4.3
Left-most derivation: at each step, replace left-most non-terminal
Ambiguous grammar:Gis ambiguous if a string has>1left-most (or
right-most) derivation
ALT:Gis ambiguous if>1parse tree can be constructed for a string
Examples:
1.E!E+E E!EflE
2.stmt!ifexprthenstmtj
ifexprthenstmtelsestmt
[SOLUTION:stmt !matchedjunmatched
matched!ifEthenmatchedelsematched
unmatched!ifEthenstmtj
ifEthenmatchedelseunmatched ]
M. Mitra (ISI) Parsing 3 / 39
Top-down parsing - I
Reference: Section 4.4
Recursive descent parsing:
Corresponds to finding a leftmost derivation for an input string
Equivalent to constructing parse tree in pre-order
Example:
Grammar:S!cAd A!abja
Input: cad
Problems:
1.backtracking involved ()buffering of tokens required)
2.left recursion will lead to infinite looping
3.left factors may cause several backtracking steps
M. Mitra (ISI) Parsing 4 / 39
Top-down parsing - I
Reference: Section 4.3
Left recursion:Gistb left recursive if for some non-terminalA,A
+
)A
Simple case I:A!Aj )
A!A
′
A
′
!A
′
jϵ
Simple case II:
A!A1jA2j: : :jAmj
1j: : :jn
+
A!1A
′
j: : :jnA
′
A
′
!1A
′
j2A
′
j: : :jmA
′
jϵ
M. Mitra (ISI) Parsing 5 / 39
Top-down parsing - I
General case:
Input:Gwithout any cycles (A
+
)A) or"-productions
Output: equivalent non-recursive grammar
Algorithm:
Let the non-terminals beA1; : : : An.
fori= 1tondo
forj= 1toi1do
replaceAi!AjbyAi!1j2j: : :jk
whereAj!1j2j: : :jkare thecurrentAjproductions
end for
Eliminate immediate left-recursion forAi.
end for
M. Mitra (ISI) Parsing 6 / 39
Top-down parsing - I
Left factoring:
Example:stmt!if (expr)stmtj
if (expr)stmtelsestmt
Algorithm:
whileleft factors existdo
foreach non-terminalAdo
Find longest prefixcommon to2rules
ReplaceA!1j: : :jnj : : :
byA!A
′
j : : :
A
′
!1j: : :jn
end for
end while
M. Mitra (ISI) Parsing 7 / 39
Top-down parsing - II
Reference: Section 4.4
Predictive parsing:recursive descent parsing without backtracking
Principle:Given current input symbol and non-terminal, we should be
able to determine which production is to be used
Example:stmt!if (expr): : :j
while: : :j
for: : :
M. Mitra (ISI) Parsing 8 / 39
Top-down parsing - II
Implementation:use transition diagrams (1 per non-terminal)
A!X1X2: : : Xns F
X
1 X
2 X
n
...
1.IfXiis a terminal, match with next input token and advance to next
state.
2.IfXiis a non-terminal, go to the transition diagram forXi, and
continue. On reaching the final state of that transition diagram,
advance to next state of current transition diagram.
Example:E!E+TjT
T!TflFjF
F!(E)jid
M. Mitra (ISI) Parsing 9 / 39
Top-down parsing - II
Non-recursive implementation:X
a
Stack
Input
TABLE
top
Table: 2-d array s.t.
M[A; a]specifies
A-production to be used
if input symbol isa
Algorithm:
0.Initially: stack contains⟨EOFS⟩, input pointer is at start of input
1.ifX=a=EOF, done
2.ifX=a̸=EOF, pop stack and advance input pointer
3.ifXis non-terminal, lookupM[X; a])X!UV W
popX, pushW; V; U
M. Mitra (ISI) Parsing 10 / 39
FIRST and FOLLOW
FIRST(): set of terminals that begin strings derived from
if
fl
)ϵ, thenϵ2FIRST()
FOLLOW(A): set of terminals that can appear immediately to the right of
Ain some sentential form
FOLLOW(A) =fajS
fl
)Aag
ifAis the rightmost symbol in any sentential form, then
EOF2FOLLOW(A)
M. Mitra (ISI) Parsing 11 / 39
FIRST and FOLLOW
FIRST:
1.ifXis a terminal, thenFIRST(X) =fXg
2.ifX!ϵis a production, then addϵtoFIRST(X)
3.ifX!Y1Y2: : : Ykis a production:
ifa2FIRST(Yi)andϵ2FIRST(Y1); : : : ;FIRST(Yi1),
addatoFIRST(X)
ifϵ2FIRST(Yi)8i, addϵtoFIRST(X)
FOLLOW:
1.Add EOF toFOLLOW(S)
2.For each production of the formA!B
(a)addFIRST()nfϵgtoFOLLOW(B)
(b)if=ϵorϵ2FIRST(), then add everything inFOLLOW(A)to
FOLLOW(B)
M. Mitra (ISI) Parsing 12 / 39
Table construction
Algorithm:
1.For each productionA!
(a)for each terminala2FIRST(), addA!toM[A; a]
(b)ifϵ2FIRST(), addA!toM[A; b]for each terminal
b2FOLLOW(A)
2.Mark all other entries “error”
Example:
Grammar:E!E+TjT
T!TflFjF
F!(E)jid
Input: id + id * id
M. Mitra (ISI) Parsing 13 / 39
LL(1) grammars
If the table has no multiply defined entries, grammar istbLL(1)
L- left-to-rightL- leftmost 1 - lookahead
IfGisLL(1), thenGcannot be left-recursive or ambiguous
Example:S!i E t S S
′
ja
S
′
!e Sjϵ
E!b
M[S
′
; e] =fS
′
!ϵ; S
′
!e Sg
Some non-LL(1)grammars may be transformed into equivalent
LL(1)grammars
M. Mitra (ISI) Parsing 14 / 39
LL(1) parsing
Error recovery:
1.ifXis a terminal, butX̸=a, popX
2.ifM[X; a]is blank, skipa
3.ifM[X; a] =synch, popX, but do not advance input pointer
Synch sets:
useFOLLOW(A)
add theFIRSTset of a higher-level non-terminal to thesynchset of a
lower-level non-terminal
M. Mitra (ISI) Parsing 15 / 39
Bottom-up parsing: terminology
Sentential form:any stringof terminals/non-terminals s.t.S
fl
)
Handle:for sentential form, handle is a specific substringofs.t.
A!is a production
replacing the occurrence ofinbyAproduces theprevious
right-sentential formin a rightmost derivation of
position ofis important
Example:sentential form:id+idflid, handle:idproduction:F!id
Properties:
1.string to right of handle must consist of terminals only
2.ifGis unambiguous, every right-sentential form has a unique handle
M. Mitra (ISI) Parsing 17 / 39
Bottom-up parsing: basic algorithm
Constructing parse tree forPback-calculating the derivation forP
Input:programPsequence of tokens returned by lexical analyser
Output:parse-tree / reverse derivation ofP
Method:
1.Initialisation: P
2.while(̸=S)// parse tree / reverse derivation not complete
(a)identifyhandleA!in(current sentential form)
(b) replace(; ; A)
Advantages:
1.No backtracking
2.More powerful thanLL(1)/ predictive parsing
How do we identify the handle in a given right-sentential form?
M. Mitra (ISI) Parsing 18 / 39
Bottom-up parsing: basic algorithm
Constructing parse tree forPback-calculating the derivation forP
Input:programPsequence of tokens returned by lexical analyser
Output:parse-tree / reverse derivation ofP
Method:
1.Initialisation: P
2.while(̸=S)// parse tree / reverse derivation not complete
(a)identifyhandleA!in(current sentential form)
(b) replace(; ; A)
Advantages:
1.No backtracking
2.More powerful thanLL(1)/ predictive parsing
How do we identify the handle in a given right-sentential form?
M. Mitra (ISI) Parsing 18 / 39
Identifying handlesI
HandleRHS of some production
Item(LR(0)item): production ofGwith a dot at some position in the
RHS, representing how much of the RHS has already been seen at a
given point in the parse
Itemnotation / representation of parser’s current state
Possible cases:
A! ϵX1X2: : : Xk: just starting to construct a handle
A!: : : Xϵa : : :
handle is partially constructed (uptoX)
to proceed, need to getterminala
if next token returned by lexical analyser isa, updated state:
A!: : : Xaϵ: : :
M. Mitra (ISI) Parsing 19 / 39
Identifying handlesII
A!: : : XϵB : : :
handle is partially constructed (uptoX)
to proceed, need to getnon-terminalB
need to construct sub-treeBfrom the next token(s) returned by lexical
analyser
need to consider all productionsB!1; B!2; : : :
new objective: identify one of the following handles
B! ϵ1; B! ϵ2; : : :
A!X1X2: : : Xkϵ
handle construction complete
reduce/ replace RHSX1X2: : : Xkby LHSA
take sub-trees corresponding toX1; X2; : : : ; Xkand make them child
nodes of a common parentA
M. Mitra (ISI) Parsing 20 / 39
LR(0) / SLR parsing
Grammar augmentation:
Create new start symbolS
′
; addS
′
!Sto productions
Closure:
LetIbe a set of items
closure(I) I
repeat until no more changes
for eachA!Bin closure(I)
for each productionB!s.t.B! ̸2closure(I)
addB! to closure(I)
Example: closure(E
′
! E)
M. Mitra (ISI) Parsing 21 / 39
SLR parsing
Goto construction:
goto(I; X)= closure(fA!XjA!X2Ig)
Example: LetI=fE
′
!E; E!E+Tg
goto(I;+)= closure(fE!E+Tg)
Canonical collection construction:
1.C fclosure(fS
′
! Sg)g
2. repeat until no more changes:
for eachI2 C, for each grammar symbolX
if goto(I; X)is not empty and not inC
add goto(I; X)toC
3. LetC=fI0; : : : ; Ingbe the canonical collection ofLR(0)items forG.
4. Create a statesicorresponding to eachIi.
5.Initial state(s0)set containingS
′
! S
M. Mitra (ISI) Parsing 22 / 39
SLR parsing: table constructionI
Parsing tables
action(s; a)
one row per states
one column per tokena, one column for EOF
goto(s; A)
one row per states
one column per non-terminalA
M. Mitra (ISI) Parsing 23 / 39
SLR parsing: table constructionII
Algorithm
1.IfA!a2Iiand goto(Ii; a) =Ij, then action(si; a)= shiftsj.
2.IfA! 2Ii(A̸=S
′
), then action(si; a)= reduceA!for all
a2FOLLOW(A).
3.IfS
′
!S 2Ii, then action(si;EOF)= accept.
4.If goto(Ii; A) =Ij, then goto(si; A) =sj.
5.Mark all blank entries error.
M. Mitra (ISI) Parsing 24 / 39
SLR parsing
Implementation scheme:
0.Use input buffer, stack, and parsing table.
1.Shift0input symbols onto stack until a handleis on top of stack.
2.ReducetoA(i.e. pop symbols ofand pushA).
3.Stop when stack =⟨EOF; S⟩, and input pointer is at EOF.
Stack:soX1s1: : : Xmsm, where eachsirepresents a “state” (current
situation in the parsing process)
Table:
used to guide steps 2 and 3
2-d array indexed by⟨state, input symbol⟩pairs
consists of two parts (action + goto)
M. Mitra (ISI) Parsing 25 / 39
SLR parsing
Algorithm:1. Initially, stack =⟨s0⟩(initial state)
2. Lets- state on top of stack
a- current input symbol
if action[s; a]= shifts
′
create a leaf node corresponding toa
pushnode foraands
′
on stack, advance input pointer
if action[s; a]= reduceA!
create a non-leaf node corresponding toA
popjjnode + state pairs
make thejjnodes children ofA
lets
′
be the new top of stack
pushnode forA, goto[s
′
; A]on stack
if action[s; a]= accept, done
node paired withsroot of parse tree
else error This is actually a sub-tree of
the parse tree rooted atA. This case occurs only when
s=closure(fS
′
!Sg)and
a=EOF.
M. Mitra (ISI) Parsing 26 / 39
SLR parsing: conflicts
1.Shift-reduce conflict:stmt!if (expr)stmtj
if (expr)stmtelsestmt
2.Reduce-reduce conflict:stmt!id (param_list) ;
expr!id (expr_list)
.
.
.
param!id
expr!id
3.Shift-shift conflict: not possible
4.Conflicts in reduce table: not possible
cf. definition of goto() function
M. Mitra (ISI) Parsing 27 / 39
Canonical LR parsers
Motivation:Reduction byA!not necessarily proper even if
a2FOLLOW(A)
)explicitly indicate tokens for which reduction is acceptable
LR(1) item:pair of the form⟨A!; a⟩, whereA!is a
production,ais a terminal orEOF
Properties:
1.⟨A!; a⟩- lookahead has no effect
⟨A!; a⟩- reduce only if input symbol isa
2.faj ⟨A!; a⟩ 2canonical collectiong FOLLOW(A)
M. Mitra (ISI) Parsing 29 / 39
Canonical LR parsers
Closure:
LetIbe a set of items
closure(I) I
repeat until no more changes
for each item⟨A!B; a⟩in closure(I)
for each productionB!and each terminalb2FIRST(a)
if⟨B! ; b⟩ ̸2closure(I)
add⟨B! ; b⟩to closure(I)
Goto:
goto(I; X) =closure(f⟨A!X; a⟩ j ⟨A!X; a⟩ 2Ig)
Canonical collection construction:
1.C fclosure(f⟨S
′
! S;EOF⟩g)g
2. /* Similar to SLR algorithm */
M. Mitra (ISI) Parsing 30 / 39
Canonical LR parsers
Table construction:
1.If⟨A!a; b⟩ 2Iiand goto(Ii; a)=Ijthen action(i; a)= shiftj.
2.If⟨A!; b⟩ 2Ii, then action(i; b)= reduceA!(A̸=S
′
).
If⟨S
′
!S;EOF⟩ 2Ii, then action(i;EOF)= accept.
M. Mitra (ISI) Parsing 31 / 39
LALR parsers
Motivation:try to combine efficiency of SLR parser with power of
canonical method
Core:set ofLR(0)items corresponding to a set ofLR(1)items
Method:
1.Construct canonical collection ofLR(1)items,C=fI0; : : : ; Ing.
2.Merge all sets with the same core. Let the new collection be
C
′
=fJ0; : : : ; Jmg.
3.Construct the action table as before.
4.IfJ=I1[: : :[Ik, then goto(J; X)= union of all sets with the same
core as goto(I1; X).
M. Mitra (ISI) Parsing 32 / 39
SLR vs LR(1) vs LALR
No. of states:SLR=LALR <=LR(1)(cf. Pascal)
Power:SLR < LALR < LR(1)
SLR vs. LALR:LALRitems can be regarded asSLRitems, with the
core augmented by appropriate subsetsofFOLLOW(A)explicitly
specified
LALR vs. LR(1):
1.If there were no shift-reduce conflicts in theLR(1)table, there will be
no shift-reduce conflicts in theLALRtable.
2.Step 2 may generate reduce-reduce conflicts.
Example:I1=f⟨A!; a⟩;⟨B!; b⟩g
I2=f⟨A!; b⟩;⟨B!; a⟩g
3.Correct inputs:LALRparser mimicsLR(1)parser
Incorrect inputs: incorrect reductions may occur on a lookaheada)
parser goes back to a stateIiin whichAhas just been recognized.
Butacannot followAin this state)error
M. Mitra (ISI) Parsing 33 / 39
Error recovery
Reference: Section 4.8
Detection:
CanonicalLR- errors are immediately detected (no unnecessary
shift/reduce actions)
SLR=LALR- no symbols are shifted onto stack, but reductions may
occur before error is detected
Panic mode recovery:
1.Scan down the stack until a stateswith a goto on a “significant”
non-terminalA(e.g.expr,stmt, etc.) is found.
2.Discard input until a symbolawhich can followAis found.
3.PushA;goto(s; A)and resume parsing.
Expln:sAa )
|{z}
a
location of error
M. Mitra (ISI) Parsing 34 / 39
Error recovery
Phrase-level error recovery:recovery by local correction on remaining
input e.g. insertion, deletion, substitution, etc.
Scheme:
1.Consider each blank entry, and decide what the error is likely to be.
2.Call appropriate recovery method
—should consume input (to avoid infinite loop)
—avoid popping a “significant” non-terminal from stack
Examples:
State:E
′
! E
Input:+orfl
Action: pushid, goto appropriate
state
Message: missing operand
State:E!E+T
Input:)
Action: skip ’)’ from input
Message: extra ’)’
M. Mitra (ISI) Parsing 35 / 39
Yacc
Reference: Section 4.9
Usage:$ yacc myfile.y (generatesy.tab.c)
File format:
declarations
%%
grammar rules (terminals, non-terminals, start symbol)
%%
auxiliary procedures (C functions)
Semantic actions:
—$$- attribute value associated with LHS non-terminal
$i- attribute value associated withi-th grammar symbol on
RHS
—specified action is executed on reduction by corresponding
production
—default action:$$ = $1
M. Mitra (ISI) Parsing 36 / 39
M. Mitra (ISI) Parsing 37 / 39
Yacc
Lexical analyzer:yylex()must be provided
should return integer code for a token
should setyylvalto attribute value
Usage with lex:
% lex scanner.l
% yacc parser.y
% cc y.tab.c -ly -ll
Add#include "lex.yy.c"to
third section of file
Declared tokens can be used as return values inscanner.l
M. Mitra (ISI) Parsing 38 / 39
Yacc
Implicit conflict resolution:
1.Shift-reduce: choose shift
2.Reduce-reduce: choose the production listed earlier
Explicit conflict resolution:
Precedence:tokens are assigned precedence according to the order
in which they are declared (lowest first)
Associativity:left, right, or nonassoc
Precedence/assoc. of a production=precedence/assoc. of rightmost
terminal or explicitly specified using%prec
GivenA!; a:
if precedence of production is higher, reduce
if precedence of production is same, and associativity of production is
left, reduce
else shift
M. Mitra (ISI) Parsing 39 / 39