Introduction to LR Parsing: Simple LR The most prevalent type of bottom-up parser today is based on a concept called LR(k) parsing; the "L" is for left-to-right scanning of the input, the "R" for constructing a rightmost derivation in reverse, and the k for the number of input symbols of lookahead that are used in making parsing decisions. The cases k = 0 or k = 1 are of practical interest, and we shall only consider LR parsers with k 5 1 here. When (k) is omitted, k is assumed to be 1
This section introduces the basic concepts of LR parsing and the easiest method for constructing shift-reduce parsers, called "simple LR" (or SLR, for short). Some familiarity with the basic concepts is helpful even if the LR parser itself is constructed using an automatic parser generator. We begin with "items" and "parser states;" the diagnostic output from an LR parser generator typically includes parser states, which can be used to isolate the sources of parsing conflicts.
Examples: Example : Consider the augmented expression grammar: E' + E E -+ E+T (T T + T*F 1 F E -+ (E) I id If I is the set of one item {[E' -+ .El}, then CLOSURE(I) contains the set of items I. in Fig. 4.3
To see how the closure is computed, E' -+ -E is put in CLOSURE(I) by rule (1). Since there is an E immediately to the right of a dot, we add the E-productions with dots at the left ends: E -+ .E + T and E -+ ST. Now there is a T immediately to the right of a dot in the latter item, so we add T -+ ST * F and T -+ .F. Next, the F to the right of a dot forces us to add F + .(E) and F -+ -id, but no other items need to be added
1. Kernel items: the initial item, S' -+ .S, and all items whose dots are not at the left end. 2. Nonkernel items: all items with their dots at the left end, except for S' -+ .S.
Table
The codes for the actions are: 1. si means shift and stack state i , 2. rj means reduce by the production numbered j, 3. acc means accept, 4. blank means error. Note that the value of GOTO[S, a] for terminal a is found in the ACTION field connected with the shift action on input a for state s. The GOTO field gives GOTO[S, A] for nonterminals A. Although we have not yet explained how the entries for Fig. 4.37 were selected, we shall deal with this issue short
More Powerful LR Parsers: we shall extend the previous LR parsing techniques to use one symbol of lookahead on the input. There are two different methods: 1. The "canonical-LR" or just "LR" method, which makes full use of the lookahead symbol(s). This method uses a large set of items, called the LR(1) items. 2. The " lookahead -LR" or "LALR" method, which is based on the LR(0) sets of items, and has many fewer states than typical parsers based on the LR(1) items. By carefully introducing lookaheads into the LR(0) items, we can handle many more grammars with the LALR method than with the SLR method, and build parsing tables that are no bigger than the SLR tables. LALR is the method of choice in most situations.
Canonical LR(1) Items : We shall now present the most general technique for constructing an LR parsing table from a grammar. Recall that in the SLR method, state i calls for reduction by A -+ a if the set of items Ii contains item [A --+ as] and a is in FOLLOW(A). In some situations, however, when state i appears on top of the stack, the viable prefix pa on the stack is such that PA cannot be followed by a in any right-sentential form. Thus, the reduction by A -+ a should be invalid on input a.
Examples: Example : Consider the following augmented grammar. We begin by computing the closure of {[St -+ -S, $1). To close, we match the item [St -+ -S, $1 with the item [A -+ a-BP, a] in the procedure CLOSURE. That is, A = St, a = e, B = S, P = e, and a = $. Function CLOSURE tells us to add [B -+ .y, b] for each production B -+ y and terminal b in FIRST(P~). In terms of the present grammar, B -+ y must be S -+ CC, and since ,8 is c and a is $, b may only be $. Thus we add [S -+ .CC, $1
We continue to compute the closure by adding all items [C -+ .y, b] for b in FIRST(C$). That is, matching [S -+ .CC, $1 against [A -+ a.B,O , a], we have A = S, a = 6, B = C, p = C, and a = $. Since C does not derive the empty string, FIRST(C$) = FIRST(C). Since FIRST@) contains terminals c and d, we add items [C -+ - cC , c], [C -+ . cC , dl, [C -t -d, c] and [C -+ -d, dl. None of the new items has a nonterminal immediately to the right of the dot, so we have completed our first set of LR(1) items. The initial set of items is I0: S->.S,$ S -> .CC, $ C -> . cC , c/d C -> .d, c/d
Constructing LALR Parsing Table We now introduce our last parser construction method, the LALR ( Eoolcahead - LR) technique. This method is often used in practice, because the tables obtained by it are considerably smaller than the canonical LR tables, yet most common syntactic constructs of programming languages can be expressed conveniently by an LALR grammar. The same is almost true for SLR grammars, but there are a few constructs that cannot be conveniently handled by SLR techniques