Syntax analysis

EsmeraldaAkshu1 9,953 views 58 slides Jun 04, 2016
Slide 1
Slide 1 of 58
Slide 1
1
Slide 2
2
Slide 3
3
Slide 4
4
Slide 5
5
Slide 6
6
Slide 7
7
Slide 8
8
Slide 9
9
Slide 10
10
Slide 11
11
Slide 12
12
Slide 13
13
Slide 14
14
Slide 15
15
Slide 16
16
Slide 17
17
Slide 18
18
Slide 19
19
Slide 20
20
Slide 21
21
Slide 22
22
Slide 23
23
Slide 24
24
Slide 25
25
Slide 26
26
Slide 27
27
Slide 28
28
Slide 29
29
Slide 30
30
Slide 31
31
Slide 32
32
Slide 33
33
Slide 34
34
Slide 35
35
Slide 36
36
Slide 37
37
Slide 38
38
Slide 39
39
Slide 40
40
Slide 41
41
Slide 42
42
Slide 43
43
Slide 44
44
Slide 45
45
Slide 46
46
Slide 47
47
Slide 48
48
Slide 49
49
Slide 50
50
Slide 51
51
Slide 52
52
Slide 53
53
Slide 54
54
Slide 55
55
Slide 56
56
Slide 57
57
Slide 58
58

About This Presentation

Syntax analysis


Slide Content

SYNTAX
ANALYSIS

Syntax Analyzer
Syntax Analyzercreates the syntactic structure of the given
source program.
This syntactic structure -parse tree.
Syntax Analyzer is also known as parser.
The syntax analyzer (parser) checks whether a given source
program satisfies the rules implied by a context-free grammar
or not.
If it satisfies, the parser creates the parse tree of that program.
Otherwise the parser gives the error messages.

INTRODUCTION
Everyprogramminglanguagehaspreciserulesthatprescribethesyntactic
structureofwell-formedprograms.
Programismadeupoffunctions,afunctionoutofdeclarationsandstatements,a
statementoutofexpressions
Thesyntaxofprogramminglanguageconstructscanbespecifiedbycontext-
freegrammars
A context-free grammar
gives a precise syntactic specification of a programming language.
the design of the grammar is an initial phase of the design of a
compiler.
a grammar can be directly converted into a parser by some tools.

Parser
•Parser works on a stream of tokens.
•The smallest item is a token.
Lexical
Analyzer
Parser
source
program
token
get next token
parse tree

Parsing
Parsing is the process of determining whether a string of tokens can be
generated by a grammar.
Parsing methods
The top-down
Bottom-upmethods.
Top-down parsing, construction starts at the root and proceeds to the
leaves.
Bottom-up parsing, construction starts at the leaves and proceeds towards
the root.
Top-down parsers are easy to build by hand.
Bottom-up parsing,
Can handle a larger class of grammars.
They are not as easy to build, but tools for generating them directly from a grammar are available.
Both top-down and bottom-up parsers scan the input from left to right (one symbol at a time).

Top-Down Parsing
Donebystartingwiththeroot,labeledwiththestartingnonterminalstmt,
andrepeatedlyperformingthefollowingtwosteps.
AtnodeN,labeledwithnonterminalA,selectoneoftheproductionsforAand
constructchildrenatNforthesymbolsintheproductionbody.
Findthenextnodeatwhichasubtreeistobeconstructed,typicallytheleftmost
unexpandednonterminalofthetree.
Thecurrentterminalbeingscannedintheinputisfrequentlyreferredtoas
thelookaheadsymbol.

Top-Down Parsing

Top-Down Parsing

Top-Down Parsing

Top-Down Parsing
Top-Down Parsing is an attempt to find a left-most
derivation for an input string
Example:
S cAd Find a derivation for
A ab | a for w cad
S S Backtrack S
/ | \ / | \  / | \
c A d c A d c A d
/ \ |
a b a

Predictive Parsing
Recursive-descentparsingisatop-downmethodofsyntaxanalysisin
whichasetofrecursiveproceduresisusedtoprocesstheinput.
Simpleformofrecursivedescent–PredictiveParsing

Syntax Error Handling
Goals in error handling
Report the presence of errors clearly and accurately.
Recover from each error quickly enough to detect subsequent errors.
Add minimal overhead to the processing of correct programs.

Error-Recovery Strategies
The simplest approach is for the parser to quit with an informative error
message when it detects the first error.
Panic-mode recovery
Phrase-level recovery
Error-productions
Global-correction.

Panic-Mode Recovery
Theparserdiscardsinputsymbolsoneatatimeuntiloneofadesignatedsetof
synchronizingtokensisfound.
Thesynchronizingtokensareusuallydelimiters,suchas;or}.
Skipsaconsiderableamountofinputwithoutcheckingforadditionalerrors
Ithastheadvantageofsimplicity,andisguaranteednottogointoaninfinite
loop.

Phrase-Level Recovery
Perform local correction on the remaining input;
It may replace a prefix of the remaining input by some string that allows the
parser to continue.
A typical local correction is to replace a comma by a semicolon.
Delete an extraneous semicolon.
Insert a missing semicolon.
Disadvantage in coping with situations in which the actual error has occurred
before the point of detection.

Error Productions
Expand the grammar for the language at hand with productions that generate the
erroneous constructs.
The parser can then generate appropriate error diagnostics about the erroneous
construct that has been recognized in the input.

Global Correction
Compilertomakeasfewchangesaspossibleinprocessinganincorrectinput
string.
GivenanincorrectinputstringxandgrammarG,algorithmswillfindaparse
treeforarelatedstringy,suchthatthenumberofinsertions,deletions,and
changesoftokensrequiredtotransformxintoyisassmallaspossible.
Notimplemented.

Syntax Definition
Agrammardescribesthehierarchicalstructureofprogramminglanguageconstructs.
Eg:if(expression)statementelsestatement
Anif-elsestatementistheconcatenationofthekeywordif,anopeningparenthesis,an
expression,aclosingparenthesis,astatement,thekeywordelse,andanotherstatement.
Stmt->if(expr)stmtelsestmt
Ruleiscalledaproduction.
Inaproduction,lexicalelementsifandtheparenthesesarecalledterminals.
Variableslikeexprandstmtarecallednonterminals.

A Context Free Grammar
A context-free grammar has four components:
A set of terminal symbols, sometimes referred to as "tokens.“
A set of nonterminals, sometimes called "syntactic variables."
A set of productions, where each production consists of a nonterminal,called the head or
left side of the production, an arrow, and a sequence of terminals and/or nonterminals ,
called the body or right side of the production
A designation of one of the nonterminals as the start symbol.

A Context Free Grammar
The terminal symbols are

Notational Conventions
These symbols are terminals:
Lowercase letters early in the alphabet, such as a, b, c.
Operator symbols such as +, *, and so on.
Punctuation symbols such as parentheses, comma, and so on.
The digits 0, 1, . . . , 9.
Boldface strings such as idor if, each of which represents a single terminal
symbol.

Notational Conventions
Thesesymbolsarenonterminals:
Uppercaselettersearlyinthealphabet,suchasA,B,C.
Theletters,which,whenitappears,isusuallythestartsymbol.
Lowercase,italicnamessuchasexprorstmt.
Uppercaselettersmaybeusedtorepresentnonterminalsfortheconstructs.
Forexample,nonterminalsforexpressions,terms,andfactorsareoften
representedbyE,T,andF,respectively.

Notational Conventions
Uppercaseletterslateinthealphabet,suchasX,Y,Z,representgrammar
symbols;thatis,eithernonterminalsorterminals.
Lowercaseletterslateinthealphabet,chieflyu,v,...,z,represent(possibly
empty)stringsofterminals.
Lowercasegreekletters,α,β,γforexample,represent(possiblyempty)strings
ofgrammarsymbols.
Asetofproductionsa->α1,a->α2,...,a->αkwithacommonhead
A(callthema-productions),maybewrittenA->α1Iα2I.,.Iαk·callα1,
α2,...,αkthealternativesforA.
Unlessstatedotherwise,theheadofthefirstproductionisthestartsymbol

Notational Conventions

Derivations
E E+E :E+E derives from E
E E+E id+E id+id
A sequence of replacements of non-terminal symbols is called a derivation
of id+id from E.
Aif there is a production rule Ain our grammar and and
are arbitrary strings of terminal and non-terminal symbols

1 
2 ... 
n (
n derives from 
1 or 
1 derives 
n )
 : derives in one step
 : derives in zero or more steps
 : derives in one or more steps
*
+

CFG -Terminology
L(G) is the language of G(the language generated by G) which is a set of
sentences.
A sentence of L(G)is a string of terminal symbols of G.
If S is the start symbol of G then
is a sentence of L(G) iff S where is a string of terminals of G
If G is a context-free grammar, L(G) is a context-free language.
Two grammars are equivalentif they produce the same language.
S -If contains non-terminals, it is called as a sententialform of G.
-If does not contain non-terminals, it is called as a sentenceof G.
*
*

Derivation Example
E-E -(E) -(E+E) -(id+E) -(id+id)
OR
E-E -(E) -(E+E) -(E+id) -(id+id)
At each derivation step, we can choose any of the non-terminal in the
sentential form of G for the replacement.
If we always choose the left-most non-terminal in each derivation
step, this derivation is called as left-most derivation.
If we always choose the right-most non-terminal in each derivation
step, this derivation is called as right-most derivation.

Left-Most and Right-Most Derivations
Left-Most Derivation
E -E -(E) -(E+E) -(id+E) -(id+id)
Right-Most Derivation
E -E -(E) -(E+E) -(E+id) -(id+id)
We will see that the top-down parsers try to find the left-most derivation of the
given source program.
We will see that the bottom-up parsers try to find the right-most derivation of
the given source program in the reverse order.
lmlmlmlm
lm
rmrmrmrmrm

Parse Trees and Derivations
A parse tree is a graphical representation of a derivation that filters out the
order in which productions are applied to replace nonterminals.
The interior node is labeled with the nonterminal A in the head of the
production;
The children of the node are labeled, from left to right, by the symbols in the
body of the production
The leaves of a parse tree are labeled by nonterminals or terminals
Read from left to right, constitute a sentential form, called the yield or frontier
of the tree.
There is a many-to-one relationship between derivations and parse trees.

Ambiguity
a grammar that produces more than one parse tree for some sentence is said
to be ambiguous

1
2
3
4
a
b
c
d
e
f

Writing a Grammar
Grammars are capable of describing most, of the syntax of programming
languages .
Grammar should be unambiguous.
Left-recursion elimination and left factoring -are useful for rewriting
grammars .
From the resulting grammar we can create top down parsers without
backtracking.
Such parsers are called predictive parsers or recursive-descent parser

Eliminating Ambiguity
ambiguous grammar can be rewritten to eliminate the ambiguity.
stmt -> if exprthen stmt
|if exprthen stmt else stmt
|other
is ambiguous since the string
if E1 then if E2 then S1 else S2 has the two parse trees

Two parse trees for an ambiguous sentence

Eliminating Ambiguity
The general rule is, "Match each else with the closest unmatched then."

Left Recursion
A grammar is left recursiveif it has a non-terminal A such that there is a
derivation.
A Afor some string 
Top-down parsing techniques cannot handle left-recursive grammars.
The left-recursion may appear in a single step of the derivation (immediate left-
recursion), or may appear in more than one step of the derivation.
*

Immediate Left-Recursion
A A |  where does not start with A
 eliminate immediate left recursion
A A

A

A

| 
A A 
1| ... | A 
m| 
1| ... | 
n where 
1... 
ndo not start with A
 eliminate immediate left recursion
A 
1A

| ... | 
nA

A


1A

| ... | 
mA

|  an equivalent grammar
In general,

Left-Recursion --Problem
•A grammar cannot be immediately left-recursive, but it still can
be left-recursive.
S Aa | b
A Sc | d
SAa Sca
ASc Aac causes to a left-recursion

Eliminate Left-Recursion --Algorithm
-Arrange non-terminals in some order: A
1... A
n
-fori from1 to n do{
-forj from1 toi-1 do{
replace each production
A
iA
j
by
A
i
1| ... | 
k
where A
j
1| ... | 
k
}
-eliminate immediate left-recursions among A
iproductions
}

Eliminate Left-Recursion
S Aa | b
A Ac | Sd | f
-Order of non-terminals: S, A
-A Ac | Aad | bd | f
-Eliminate the immediate left-recursion in A
A bdA

| fA

A

cA

| adA

| 
So, the resulting equivalent grammar which is not left-recursive is:
S Aa | b
A bdA

| fA

A

cA

| adA

| 

Eliminate Left-Recursion –Example2
S Aa | b
A Ac | Sd | f
-Order of non-terminals: A, S
-Eliminate the immediate left-recursion in A
A SdA

| fA

A

cA

| 
-Replace S Aa with S SdA

a | fA

a
-Eliminate the immediate left-recursion in S
S fA’aS’ | bS

S

dA

aS’ | 
So, the resulting equivalent grammar which is not left-recursive is:
S fA’aS’ | bS

S

dA

aS’ | 
A SdA

| fA

A

cA

| 

Left-Recursive Grammars III
Here is an example of a (directly) left-recursive grammar:
E E + T | T
T T * F | F
F ( E ) | id
This grammar can be re-written as the following non left-
recursive grammar:
E T E’ E’ + TE’ | є
T F T’ T’ * F T’ | є
F (E) | id

Left Factoring
Left factoring is a grammar transformation that is useful for
producing a grammar suitable for predictive, or top-down,
parsing.
Stmt->if expr then stmt else stmt
|if exprthen stmt
A ->α
1|α
2
So it should be left factored as

Left-Factoring --Algorithm
For each non-terminal A with two or more alternatives (production rules)
with a common non-empty prefix
A 
1| ... | 
n | 
1| ... | 
m
convert it into
A A

| 
1| ... | 
m
A


1| ... | 
n

Left-Factoring –Example1
A abB | aB | cdg | cdeB | cdfB

A aA

| cdg | cdeB | cdfB
A

bB | B

A aA

| cdA
’’
A

bB | B
A
’’
g | eB | fB

Left-Factoring –Example2
A ad | a | ab | abc | b

A aA’ | b
A’ d | | b | bc

A aA’ | b
A’ d | | bA’’
A’’ | c

Top-Down Parsing
The parse tree is created top to bottom.
Top-down parser
Recursive-descent parsing
Backtracking is needed
It is a general parsing technique, but not widely used.
Not efficient
Predictive parsing
No backtracking
Efficient
Needs a special form of grammars -(LL(1) grammars).
Recursive predictive parsing is a special form of recursive descent parsing without
backtracking.
Non-recursive (table driven) predictive parser is also known as LL(1) parser.

Recursive Predictive Parsing
Each non-terminal corresponds to a procedure.
Ex: A aBb
proc A {
-match the current token with a, and move to the next
token;
-call ‘B’;
-match the current token with b, and move to the next
token;
}

Recursive Predictive Parsing (cont.)
A aBb | bAB
proc A {
case of the current token {
‘a’: -match the current token with a, and move to the next token;
-call ‘B’;
-match the current token with b, and move to the next token;
‘b’: -match the current token with b, and move to the next token;
-call ‘A’;
-call ‘B’;
}
}

Top-down parse for id + id * id

FIRST and FOLLOW
FIRST and FOLLOW allow us to choose which production toapply, based on the
next input symbol.
FIRST(α), where αis any string of grammar symbols, to be the set of terminals that
begin strings derived from α.
If α=> ε, then εis also in FIRST(α) .
A => cY, so c is in FIRST(A)
FOLLOW(A) is the set of the terminals which occur immediately after (follow) the
non-terminal A in the strings derived from the starting symbol.
a terminal a is in FOLLOW(A) if S Aa
$ is in FOLLOW(A) if S A
*
*

FIRST
1.IfXisaterminal,thenFIRST(X)={X}.
2.IfXisanonterminalandX->Y
IY
2...Y
kisaproductionforsomek>=1,then
placeainFIRST(X)ifforsomei,aisinFIRST(Y
i),andεisinallof
FIRST(Y
I),...,FIRST(Y
i-I);thatis,Y
IY
2...Y
i-1=>ε.IfεisinFIRST(Y
j)for
allj=1,2,...,k,thenaddεtoFIRST(X).Forexample,everythinginFIRST(Y
1)
issurelyinFIRST(X).IfYidoesnotderiveεthenweaddnothingmoreto
FIRST(X),butifY
1=>ε,thenweaddFIRST(Y
2),andsoon.
3.3.IfX=>εisaproduction,thenaddεtoFIRST(X).
*

FOLLOW

LL ( 1 ) Grammars
L: scanning the input from left to right
L: producing a leftmost derivation
1 : one input symbol of lookahead at each step
A grammar G is LL(1) if and only if whenever A -> α| βare two distinct
productions of G, the following conditions hold:

Construction of a predictive parsing
table.