Theory of computer science
Unit 1
Basic Concepts and Finite Automata
Overview
●Importance of TCS
●Alphabets, Strings, Languages, Closure properties.
●Finite Automata (FA) and Finite State machine (FSM).
●Deterministic Finite Automata (DFA) and Nondeterministic Finite Automata
(NFA): Definitions, transition diagrams and Language recognizers
●NFA to DFA Conversion
●Equivalence between NFA with and without ε- transitions
●Minimization of DFA
●FSM with output: Moore and Mealy machines, Equivalence
●Applications and limitations of FA
What is automata theory ?
●Study of abstract computing devices, or “machines”
●Automaton = an abstract computing device
●Note: A “device” need not even be physical hardware!
●A fundamental question in computer science:
●Find out what different models of machines can do and cannot do
●The theory of computation
Theory of Computation: A Historical Perspective
Alphabet
An alphabet is a finite, non-empty set of symbols
We use the symbol ∑ (sigma) to denote an alphabet
Examples:
1.Binary: ∑ = {0,1}
2.All lower case letters: ∑ = {a,b,c,..z}
3.Alphanumeric: ∑ = {a-z, A-Z, 0-9}
4.DNA molecule letters: ∑ = {a,c,g,t}
Strings
A string or word is a finite sequence of symbols chosen from ∑
1.Empty string is ε (or “epsilon”): the string with zero occurrences of symbols.
2.Length of a string w, denoted “|w|”, is the number of (non-ε) characters in the string.
E.g., x = 010100, |x| = 6
x = 01ε0ε1ε00ε, |x| = ?
NOTE: |ε| = 0
Power of an alphabet
Let ∑ be an alphabet.
∑^k = the set of all strings of length k over ∑
∑* = ∑^0 ∪ ∑^1 ∪ ∑^2 ∪ … is the set of all strings over the alphabet ∑
∑+ = ∑^1 ∪ ∑^2 ∪ ∑^3 ∪ …
NOTE: ∑^0 = {ε} regardless of what alphabet ∑ is, i.e., ε is the only string whose
length is 0.
For ∑ = {0,1}:
∑^1 = {0,1}
∑^2 = {00,01,10,11} and so on….
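As a quick sanity check of these definitions, Σ^k can be enumerated directly (a minimal sketch; the helper name `sigma_k` is mine, not from the notes):

```python
from itertools import product

def sigma_k(alphabet, k):
    """Sigma^k: the set of all strings of length k over the alphabet."""
    return {"".join(p) for p in product(sorted(alphabet), repeat=k)}

sigma = {"0", "1"}
assert sigma_k(sigma, 0) == {""}                      # Sigma^0 = {epsilon}
assert sigma_k(sigma, 2) == {"00", "01", "10", "11"}  # |Sigma^k| = |Sigma|**k
```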
Language
L is said to be a language over alphabet ∑, only if L ⊆ ∑*
This is because ∑* is the set of all strings (of all possible length including 0)
over the given alphabet ∑
Examples:
1.Let L be the language of all strings consisting of n 0’s followed by n 1’s:
L = {ε, 01, 0011, 000111, …}
2.Let L be the language of all strings with an equal number of 0’s and 1’s:
L = {ε, 01, 10, 0011, 1100, 0101, 1010, 1001, …}
Question
Finite Automata
Some Applications:
●Software for designing and checking the behavior of digital circuits
●Lexical analyzer of a typical compiler
●Software for scanning large bodies of text (e.g., web pages) for pattern
finding
●Software for verifying systems of all types that have a finite number of states
(e.g., stock market transaction, communication/network protocol)
●The “Regular Languages” are exactly the ones that can be described by finite automata.
●A finite automaton has a set of states and its control moves from state to state in
response to external inputs.
●One of the crucial distinctions among classes of finite automata is whether that control
is deterministic, meaning that the automaton cannot be in more than one state at any one
time, or nondeterministic, meaning that it may be in several states at once.
●Adding nondeterminism does not let us define any language that cannot be defined by a
deterministic finite automaton, but there can be substantial efficiency in describing an
application using a nondeterministic automaton.
●Nondeterminism allows us to program solutions to problems using a higher-level
language. The nondeterministic finite automaton is then compiled, by an algorithm,
into a deterministic automaton that can be executed on a conventional computer.
Contd..
●Informally, a state diagram that comprehensively captures all possible states and
transitions that a machine can take while responding to a stream or sequence of
input symbols.
●Recognizer for “Regular Languages”
1.Deterministic Finite Automata (DFA)
The machine can exist in only one state at any given time
2.Non-deterministic Finite Automata (NFA)
The machine can exist in multiple states at the same time
Deterministic Finite Automata - Definition
A Deterministic Finite Automaton (DFA) consists of:
Q ==> a finite set of states
Σ ==> a finite set of input symbols (alphabet)
q0 ==> a start state (q0 ∈ Q)
F ==> set of accepting states (F ⊆ Q)
δ ==> a transition function, which is a mapping δ: Q × Σ → Q
The transition function takes as arguments a state and an input symbol and returns a
state.
A DFA is defined by the 5-tuple: (Q, Σ, δ, q0, F)
Graph representation of DFA
●Nodes = states.
●Arcs represent transition function: Arc from state p to state q labeled by all those
input symbols that have transitions from p to q.
●Arrow labeled “Start” to the start state.
●Final states indicated by double circles
What does a DFA do on reading an input string?
●Input: a word w in Σ*
●Question: Is w acceptable by the DFA?
●Steps:
1.Start at the “start state” q0.
2.For every input symbol in the sequence w do, Compute the next state from the current
state, given the current input symbol in w and the transition function.
3.If after all symbols in w are consumed, the current state is one of the accepting states
(F) then accept w;
4.Otherwise, reject w.
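The four steps above can be sketched directly in code (a minimal illustration; the dictionary encoding of δ and the state names q0–q2 are my own choices). The example DFA is the one used later in these notes: it accepts strings containing the substring 01.

```python
def run_dfa(delta, start, accept, w):
    """Simulate a DFA: follow delta from the start state on each symbol of w."""
    state = start
    for sym in w:
        state = delta[(state, sym)]
    return state in accept

# DFA accepting binary strings that contain the substring "01"
delta = {
    ("q0", "0"): "q1", ("q0", "1"): "q0",
    ("q1", "0"): "q1", ("q1", "1"): "q2",
    ("q2", "0"): "q2", ("q2", "1"): "q2",
}
assert run_dfa(delta, "q0", {"q2"}, "11010")      # contains 01 -> accept
assert not run_dfa(delta, "q0", {"q2"}, "1110")   # no 01 -> reject
```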
Simple notations for DFA
●A transition diagram
Example
The transition diagram for the DFA accepting all strings with a substring 01
●A transition table:
Example
Regular Grammar
●Let L(A) be a language recognized by a DFA A, then L(A) is called a “Regular
Language”.
●Locate regular languages in the Chomsky Hierarchy.
The Chomsky Hierarchy
Example 1
●Build a DFA for the following language:
L = {w | w is a binary string that contains 01 as a substring}
●Steps for building a DFA to recognize L:
1.Σ = {0,1}
2.Decide on the states: Q
3.Designate start state and final state(s)
4.δ: Decide on the transitions:
●“Final” states == same as “accepting states”
●Other states == same as “non-accepting states”
DFA for strings containing 01
Extended transition function
The extended transition function δ̂ (“delta hat”) maps a state and a whole string to a
state: δ̂(q, ε) = q, and δ̂(q, wa) = δ(δ̂(q, w), a).
Language of a DFA
A DFA A accepts string w if there is a path from q0 to an accepting (or final)
state that is labeled by w,
i.e., L(A) = { w | δ̂(q0, w) is in F }
i.e., L(A) = all strings that lead to an accepting state from q0
Non Deterministic Finite automata
A Non-deterministic Finite Automaton (NFA) is of course “non-deterministic”
●Implying that the machine can exist in more than one state at the same time
●Transitions could be non-deterministic
Contd...
●The NFAs accept exactly the regular languages, just as DFAs do.
●They are often more succinct and easier to design than DFAs.
●We can always convert an NFA to a DFA, but the latter may have exponentially more
states than the NFA.
●The difference between the DFA and the NFA is in the type of transition function.
Contd..
A Non-deterministic Finite Automaton (NFA) consists of:
Q ==> a finite set of states
Σ ==> a finite set of input symbols (alphabet)
q0 ==> a start state
F ==> set of accepting states (F ⊆ Q)
δ ==> a transition function, which is a mapping δ: Q × Σ → a subset of Q
An NFA is also defined by the 5-tuple: (Q, Σ, δ, q0, F)
How to use an NFA ?
●Input: a word w in Σ*
●Question: Is w acceptable by the NFA?
●Steps:
1.Start at the “start state” q0
2.For every input symbol in the sequence w do:
Determine all possible next states from all current states, given the current input
symbol in w and the transition function
3.If after all symbols in w are consumed and if at least one of the current states is a final
state then accept w;
4.Otherwise, reject w.
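The same steps can be sketched in code, this time tracking the *set* of current states (a minimal illustration; the encoding of δ as a dict of sets is my own choice). The example NFA accepts strings ending in 01, matching the example that follows:

```python
def run_nfa(delta, start, accept, w):
    """Simulate an NFA by tracking the set of all possible current states."""
    current = {start}
    for sym in w:
        # All states reachable from any current state on this symbol.
        current = {q for p in current for q in delta.get((p, sym), set())}
    return bool(current & accept)  # accept if at least one final state is reached

# NFA accepting binary strings ending in "01"
delta = {
    ("q0", "0"): {"q0", "q1"}, ("q0", "1"): {"q0"},
    ("q1", "1"): {"q2"},
}
assert run_nfa(delta, "q0", {"q2"}, "10101")      # ends in 01 -> accept
assert not run_nfa(delta, "q0", {"q2"}, "0110")   # ends in 10 -> reject
```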
Example
An NFA accepting all strings that end in 01
●Transition tables can be used to specify the transition function for an NFA as
well as for a DFA.
●The only difference is that each entry in the table for the NFA is a set, even if
the set is a singleton (has exactly one member).
●Also, when there is no transition at all from a given state on a given input
symbol, the proper entry is Φ, the empty set.
Transition function for input 00101
Extension of δ to NFA paths
Language of a NFA
●An NFA accepts w if there exists at least one path from the start state to an accepting
(or final) state that is labeled by w
●Formally, if N = (Q, Σ, δ, q0, F) is an NFA, then
L(N) = { w | δ̂(q0, w) ∩ F ≠ Φ }
which means that L(N) is the set of strings w in ∑* such that δ̂(q0, w) contains at
least one accepting state.
What is an error state?
NOTE: Omitting to explicitly show error states is just a matter of design convenience
(one that is generally followed for NFAs); this feature should not be confused with
the notion of non-determinism.
●A DFA in practice has about as many states as the NFA, although it often has more
transitions.
●In the worst case, however, the smallest DFA can have 2^n states while the smallest
NFA for the same language has only n states.
●Every language that can be described by some NFA can also be described by some
DFA.
●The proof that DFAs can do whatever NFAs can do involves an important
construction called the subset construction because it involves constructing all subsets
of the set of states of the NFA.
● In general, many proofs about automata involve constructing one automaton from
another.
●It is important for us to observe the subset construction as an example of how one
formally describes one automaton in terms of the states and transitions of another,
without knowing the specifics of the latter automaton
Equivalence of DFA and NFA
Theorem: A language L is accepted by a DFA if and only if it is accepted by an NFA.
Proof:
1.If part:
Prove by showing every NFA can be converted to an equivalent DFA
2.Only-if part is trivial:
Every DFA is a special case of an NFA where each state has exactly one transition for
every input symbol. Therefore, if L is accepted by a DFA, it is accepted by a corresponding
NFA.
PROOF OF IF PART
●Surprisingly, for any NFA there is a DFA that accepts the same language.
●Proof is the subset construction.
●The number of states of the DFA can be exponential in the number of states of
the NFA.
●Thus, NFA’s accept exactly the regular languages.
NFA to DFA subset construction
Notice that this transition table belongs to a deterministic finite automaton. Even though
the entries in the table are sets, the states of the constructed DFA are sets. To make the
point clearer we can invent new names for these states, e.g. A for Φ, B for {q0}, and so on.
This makes it clear that the entries in the table are single states of the DFA.
1.Out of the 8 states in the figure, starting in the start state B, we can only reach
states B, E and F.
2.The other 5 states are inaccessible from the start state.
Resultant DFA constructed from NFA that accepts the strings ending with 01.
NOTE
1.Enumerate all possible subsets.
2.Determine transitions
3.Retain only those states reachable from {q0}
NFA to DFA: Repeating the example using LAZY CREATION
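A minimal sketch of lazy subset construction (the names and dict encodings are mine): only subsets reachable from {q0} are ever created, which is why the example below produces 3 DFA states rather than all 2^3 = 8 subsets.

```python
from collections import deque

def nfa_to_dfa(delta, start, accept):
    """Lazy subset construction: create only the DFA states reachable from {start}."""
    alphabet = {sym for (_, sym) in delta}
    start_set = frozenset({start})
    dfa_delta, seen, queue = {}, {start_set}, deque([start_set])
    while queue:
        S = queue.popleft()
        for sym in alphabet:
            # Union of the NFA moves out of every state in the subset S.
            T = frozenset(q for p in S for q in delta.get((p, sym), set()))
            dfa_delta[(S, sym)] = T
            if T not in seen:
                seen.add(T)
                queue.append(T)
    dfa_accept = {S for S in seen if S & accept}
    return dfa_delta, start_set, dfa_accept

# NFA accepting strings ending in "01" (same machine as in the example above).
delta = {("q0", "0"): {"q0", "q1"}, ("q0", "1"): {"q0"}, ("q1", "1"): {"q2"}}
dfa_delta, d_start, d_accept = nfa_to_dfa(delta, "q0", {"q2"})
assert len({S for (S, _) in dfa_delta}) == 3   # 3 reachable subsets, not 2**3 = 8
```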
Correctness of subset construction
Applications
Few properties of DFA and NFA
FA with epsilon transitions
To simulate any transition:
Step 1) Go to all immediate destination states.
Step 2) From there go to all their epsilon-closure states
as well.
Closure of states
●CL(q) = set of states you can reach from state q following only arcs labeled ε .
●Example: CL(A) = {A}; CL(E) = {B, C, D, E}.
●Closure of a set of states = union of the closure of each state.
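The closure computation is a simple graph search over the ε-arcs (a sketch; the ε-arc table below is a hypothetical one chosen to reproduce the slide’s answer CL(E) = {B, C, D, E}, since the actual diagram is not shown here):

```python
def eps_closure(eps, states):
    """CL: all states reachable from `states` following only epsilon-labeled arcs."""
    stack, result = list(states), set(states)
    while stack:
        p = stack.pop()
        for q in eps.get(p, set()):
            if q not in result:
                result.add(q)
                stack.append(q)
    return result

# Hypothetical epsilon-arcs chosen so that CL(E) = {B, C, D, E}, as in the example.
eps = {"E": {"B", "C"}, "C": {"D"}}
assert eps_closure(eps, {"A"}) == {"A"}
assert eps_closure(eps, {"E"}) == {"B", "C", "D", "E"}
```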
Minimization of DFA (refer to the separately shared PDF for this topic)
Equivalence theorem
1.DISTINGUISHABLE STATES
2.SET PARTITIONING
NOTE: Remove all the states that are dead or unreachable from the initial state via
any sequence of transitions of the DFA, and then start applying the Equivalence theorem.
Refer : https://www.geeksforgeeks.org/minimization-of-dfa/
A finite state machine is a machine that has many states and has a logical way of changing
from one state to another.
1.FSM - without o/p
2.FSM - with output
(A)Mealy machine : output in transition
(B)Moore machine : output on state
Mealy machine
A mealy machine is an FSM whose output depends on the present state as well as the
present input. In mealy machine every transition for a particular input symbol has a fixed
output. It can be described by a 6 tuple:
●Q is a finite set of states.
●∑ is a finite set of symbols called the input alphabet.
●O is a finite set of symbols called the output alphabet.
●δ is the input transition function where δ: Q × ∑ → Q
●X is the output transition function where X: Q × ∑ → O
●q0 is the initial state from where any input is processed (q0 ∈ Q).
Moore machine
In a Moore machine the value of the output function depends on the present state only. A
Moore machine can be described by a 6-tuple (Q, ∑, O, δ, X, q0):
1.Q is a finite set of states.
2.∑ is a finite set of symbols called the input alphabet.
3.O is a finite set of symbols called the output alphabet.
4.δ is the input transition function where δ: Q × ∑ → Q
5.X is the output transition function where X: Q → O
6.q0 is the initial state from where any input is processed (q0 ∈ Q).
Moore to mealy machine
Input - Moore machine
Output - mealy machine
Steps-
1.Take a blank mealy machine transition table format
2.Copy all the Moore machine transition states into this table format.
3.Check the present states and their corresponding outputs in the Moore machine state
table; if for a state Qi output is m, copy it into the output columns of the mealy
machine state table whenever Qi appears in the next state.
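The copy rule above can be sketched as a dictionary transformation (a minimal illustration; the two-state machine and its outputs are hypothetical):

```python
def moore_to_mealy(delta, out):
    """Moore -> Mealy: the output attached to each Mealy transition is the Moore
    output of the transition's destination (next) state."""
    return {(p, a): (q, out[q]) for (p, a), q in delta.items()}

# Hypothetical 2-state Moore machine over {0,1} with outputs "a" and "b".
delta = {("q0", "0"): "q1", ("q0", "1"): "q0",
         ("q1", "0"): "q0", ("q1", "1"): "q1"}
out = {"q0": "a", "q1": "b"}
mealy = moore_to_mealy(delta, out)
assert mealy[("q0", "0")] == ("q1", "b")  # transition into q1 carries q1's output
assert mealy[("q1", "0")] == ("q0", "a")
```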
Mealy to moore machine
●Step 1: For each state (Qi), calculate the number of different outputs that are
available in the transition table of the Mealy machine.
●Step 2: Copy state Qi if all the outputs of Qi are the same. Break Qi into n states
Qi1, …, Qin if it has n distinct outputs, where n = 0, 1, 2, ….
●Step 3: If the output of the initial state is 1, insert a new initial state at the
beginning which gives output 0.
1.For q1, there is only one incident edge, with output 0. So we don't need to split this
state in the Moore machine.
2.For q2, there are 2 incident edges, with outputs 0 and 1. So we will split this state
into q20 (state with output 0) and q21 (state with output 1).
3.For q3, there are 2 incident edges, with outputs 0 and 1. So we will split this state
into q30 (state with output 0) and q31 (state with output 1).
4.For q4, there is only one incident edge, with output 0. So we don't need to split this.
Transition table for moore machine
Transition diagram for moore machine
Limitations
1.The input tape is read-only, and the only memory the machine has is the current
state; since there are a finite number of states, a finite automaton's memory is
strictly finite.
2.Finite automata have only string-pattern (regular expression) recognizing power.
3.The head movement is restricted to one direction, either from left to right or
right to left.
Equivalence of FA
Two automata M0 and M1 are said to be equivalent to each other if both accept exactly the
same set of input strings. Formally, if two automata M0 and M1 are equivalent then
●If there is a path from the start state of M0 to a final state of M0 labeled M01 M02
… … M0k, there is a path from the start state of M1 to a final state of M1 labeled
M01, M02 .. . . . . M0k.
●If there is a path from the start state of M1 to a final state of M1 labeled M11, M12
… . . . . . … . M1k, there is a path from the start state of M0 to a final state of M0
labeled M11, M12,…….. M1k.
Application of FA
We have several applications based on finite automata and finite state machines. Some
are given below:
●Finite automata are highly useful for designing lexical analyzers.
●Finite automata are useful for designing text editors.
●Finite automata are highly useful for designing spell checkers.
●Finite automata are useful in sequential circuit design (transducers).
Equivalence & Minimization of DFAs
Applications of interest
Comparing two DFAs:
L(DFA1) == L(DFA2)?
How to minimize a DFA?
1. Remove unreachable states
2. Identify & condense equivalent states into one
When to call two states in a DFA “equivalent”?
Two states p and q are said to be equivalent iff:
i) Any string w accepted by starting at p is also accepted by starting at q; AND
ii) Any string w rejected by starting at p is also rejected by starting at q.
Then p ≡ q.
(Only the future matters - the past, i.e., how the DFA reached p or q, doesn't.)
Computing equivalent states in a DFA: the Table Filling Algorithm
(The slides show an example DFA over states A through H and its pairwise-comparison
table; the table itself is not reproducible here.)
Pass #0:
1. Mark accepting states ≠ non-accepting states
Pass #1:
1. Compare every pair of states
2. Distinguish by one-symbol transitions
3. Mark = or ≠ or blank (tbd)
Pass #2:
1. Compare every pair of states
2. Distinguish by up to two-symbol transitions (until different or same or tbd)
….
(keep repeating until the table is complete)
Table Filling Algorithm - step by step
(The slides repeat the A-through-H table, filling in one distinguishing mark X at a
time.)
1. Mark X between every accepting vs. non-accepting state pair.
2. Pass 1: look 1 hop away for distinguishing states or strings.
3. Pass 2: look 1 hop away again for distinguishing states or strings.
continue…. (keep repeating until the table is complete)
Equivalences found in the example:
•A ≡ B
•C ≡ H
•D ≡ G
Retain only one copy for each equivalence set of states.
(The slides then show the resultant minimized DFA.)
Table Filling Algorithm - special case
Q) What happens if the input DFA has more than one final state?
Can all final states initially be treated as equivalent to one another?
Putting it all together …
How to minimize a DFA?
Goal: Minimize the number of states in a DFA
Algorithm:
1. Eliminate states unreachable from the start state
(depth-first traversal from the start state)
2. Identify and remove equivalent states
(table filling algorithm)
3. Output the resultant DFA
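The core of step 2, the table-filling pass, can be sketched as follows (a minimal illustration; the 4-state DFA at the bottom is hypothetical, chosen so that exactly one pair of states turns out equivalent):

```python
from itertools import combinations

def equivalent_pairs(states, alphabet, delta, accept):
    """Table-filling algorithm: return the unmarked (i.e. equivalent) state pairs."""
    # Pass 0: mark pairs where exactly one of the two states is accepting.
    marked = {frozenset(p) for p in combinations(states, 2)
              if (p[0] in accept) != (p[1] in accept)}
    changed = True
    while changed:  # keep passing until the table stops changing
        changed = False
        for p, q in combinations(states, 2):
            if frozenset((p, q)) in marked:
                continue
            # Distinguish by a one-symbol transition into an already-marked pair.
            for a in alphabet:
                succ = frozenset((delta[(p, a)], delta[(q, a)]))
                if len(succ) == 2 and succ in marked:
                    marked.add(frozenset((p, q)))
                    changed = True
                    break
    return [p for p in combinations(states, 2) if frozenset(p) not in marked]

# Hypothetical 4-state DFA in which C and D turn out to be equivalent.
delta = {("A", "0"): "B", ("A", "1"): "A", ("B", "0"): "B", ("B", "1"): "C",
         ("C", "0"): "C", ("C", "1"): "D", ("D", "0"): "D", ("D", "1"): "C"}
assert equivalent_pairs("ABCD", "01", delta, {"C", "D"}) == [("C", "D")]
```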
Are Two DFAs Equivalent?
1. Make a new dummy DFA by just putting together both DFAs into one “unified” DFA
(DFA1 with start state q0, DFA2 with start state q0').
2. Run the table-filling algorithm on the unified DFA.
3. IF the start states of both DFAs are found to be equivalent (q0 ≡ q0'),
THEN: DFA1 ≡ DFA2
ELSE: they are not equivalent.
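An equivalent way to compare two DFAs, sketched in code (my own formulation, not from the slides): explore the pairs of states the two machines can reach in lockstep; the languages differ iff some reachable pair disagrees on acceptance.

```python
from collections import deque

def dfas_equivalent(d1, s1, f1, d2, s2, f2, alphabet):
    """L(DFA1) == L(DFA2)? Explore state pairs reachable in lockstep; the
    languages differ iff some reachable pair disagrees on acceptance."""
    seen, queue = {(s1, s2)}, deque([(s1, s2)])
    while queue:
        p, q = queue.popleft()
        if (p in f1) != (q in f2):
            return False  # a distinguishing string reaches this pair
        for a in alphabet:
            nxt = (d1[(p, a)], d2[(q, a)])
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return True

# Two hypothetical DFAs for "binary strings ending in 1"; the second has a
# redundant extra state but accepts the same language.
d1 = {("A", "0"): "A", ("A", "1"): "B", ("B", "0"): "A", ("B", "1"): "B"}
d2 = {("X", "0"): "X", ("X", "1"): "Y", ("Y", "0"): "X", ("Y", "1"): "Z",
      ("Z", "0"): "X", ("Z", "1"): "Z"}
assert dfas_equivalent(d1, "A", {"B"}, d2, "X", {"Y", "Z"}, "01")
assert not dfas_equivalent(d1, "A", {"B"}, d2, "X", {"Y"}, "01")
```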
Regular expressions and Languages
Unit-2
Overview
●Regular Expression (RE)
●Equivalence of RE and FA, Arden‘s Theorem
●RE Applications
●Regular Language (RL)
●Closure properties of RLs
●Decision properties of RLs
●Pumping lemma for RLs
Arden’s Theorem
Arden's Theorem is useful for checking the equivalence of two regular expressions
as well as in the conversion of a DFA to a regular expression.
Arden's Theorem: if R = Q + RP (and ε is not in L(P)), then R = QP*.
Let us see its use in the conversion of a DFA to a regular expression.
The following algorithm is used to build the regular expression from a given DFA.
1. Let q1 be the initial state.
2. There are states q2, q3, q4, …, qn. The final state may be some qj where j ≤ n.
3. Let αji represent the transition from qj to qi.
4. Write an equation for each state qi as the sum of qj αji over all incoming
transitions; if qi is the start state, add ε:
qi = Σj qj αji + ε
5. Solve these equations (using Arden's Theorem) for the final states, which
ultimately gives the regular expression 'r'.
Construct the regular expression for the given DFA
Solution:
Let us write down the equations.
Since q1 is the start state, ε will be added, and the input 0 comes to q1 from q1,
hence we write
q1 = q1 0 + ε
(State = source state of the input × the input coming to it.)
Similarly,
q2 = q1 1 + q2 1
Since the final states are q1 and q2, we are interested in solving q1 and q2 only.
Let us see q1 first:
q1 = ε + q1 0, which is in the form R = Q + RP, and reduces to R = QP*:
q1 = ε.(0)* = 0*         (ε.R* = R*)
Substituting this value into q2, we get
q2 = 0* 1 + q2 1
q2 = 0* 1 (1)*           (R = Q + RP → R = QP*)
The regular expression is given by
r = q1 + q2
  = 0* + 0* 1.1*
r = 0* + 0* 1+           (1.1* = 1+)
Regular Expressions
RE’s: Introduction
Regular expressions are an algebraic way to describe languages.
They describe exactly the regular languages.
If E is a regular expression, then L(E) is the language it defines.
We’ll describe RE’s and their languages recursively.
RE’s: Definition
Basis 1: If a is any symbol, then a is a RE, and L(a) = {a}.
Note: {a} is the language containing one string, and that string is of length 1.
Basis 2: ε is a RE, and L(ε) = {ε}.
Basis 3: ∅ is a RE, and L(∅) = ∅.
Regular Expressions vs. Finite Automata
Regular expressions offer a declarative way to express the pattern of any string we
want to accept
E.g., 01* + 10*
Automata => more machine-like
< input: string, output: [accept/reject] >
Regular expressions => more program syntax-like
Unix environments heavily use regular expressions
E.g., bash shell, grep, vi & other editors, sed
Perl scripting - good for string processing
Lexical analyzers such as Lex or Flex
Regular expressions (syntactical expressions) and finite automata (DFA, NFA, ε-NFA -
automata/machines) are equivalent: among the formal language classes, both describe
exactly the regular languages.
RE’s: Definition - (2)
Induction 1: If E1 and E2 are regular expressions, then E1+E2 is a regular
expression, and L(E1+E2) = L(E1) ∪ L(E2).
Induction 2: If E1 and E2 are regular expressions, then E1E2 is a regular
expression, and L(E1E2) = L(E1)L(E2).
Concatenation: the set of strings wx such that w is in L(E1) and x is in L(E2).
RE’s: Definition - (3)
Induction 3: If E is a RE, then E* is a RE, and L(E*) = (L(E))*.
Closure, or “Kleene closure” = set of strings w1w2…wn, for some n ≥ 0, where each
wi is in L(E).
Note: when n=0, the string is ε.
Examples: RE’s
L(01) = {01}.
L(01+0) = {01, 0}.
L(0(1+0)) = {01, 00}.
Note the order of precedence of operators.
L(0*) = {ε, 0, 00, 000, … }.
L((0+10)*(ε+1)) = all strings of 0’s and 1’s without two consecutive 1’s.
Precedence of Operators
Highest to lowest:
* operator (star)
. (concatenation)
+ operator (union)
Language Operators
Union of two languages:
L ∪ M = all strings that are either in L or in M
Note: a union of two languages produces a third language
Concatenation of two languages:
L . M = all strings of the form xy s.t. x ∈ L and y ∈ M
The dot operator is usually omitted, i.e., LM is the same as L.M
(“i” in L^i below refers to how many strings to concatenate from the parent
language L to produce strings in the language L^i)
Kleene Closure (the * operator)
Kleene Closure of a given language L:
L^0 = {ε}
L^1 = {w | w ∈ L}
L^2 = {w1w2 | w1 ∈ L, w2 ∈ L (duplicates allowed)}
L^i = {w1w2…wi | all w’s chosen are ∈ L (duplicates allowed)}
(Note: the choice of each wi is independent)
L* = ∪i≥0 L^i (arbitrary number of concatenations)
Example:
Let L = {1, 00}
L^0 = {ε}
L^1 = {1, 00}
L^2 = {11, 100, 001, 0000}
L^3 = {111, 1100, 1001, 10000, 0011, 00100, 00001, 000000}
L* = L^0 ∪ L^1 ∪ L^2 ∪ …
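The powers L^i can be computed mechanically for finite languages (a small sketch; the helper names are mine). It reproduces the L = {1, 00} example above:

```python
def concat(L, M):
    """Concatenation of two languages: every string xy with x in L, y in M."""
    return {x + y for x in L for y in M}

def power(L, i):
    """L^i: concatenate i strings, each drawn independently from L; L^0 = {epsilon}."""
    result = {""}
    for _ in range(i):
        result = concat(result, L)
    return result

L = {"1", "00"}
assert power(L, 0) == {""}
assert power(L, 2) == {"11", "100", "001", "0000"}
assert len(power(L, 3)) == 8
```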
Kleene Closure (special notes)
L* is an infinite set iff |L| ≥ 1 and L ≠ {ε}. Why?
If L = {ε}, then L* = {ε}. Why?
If L = Φ, then L* = {ε}. Why?
Σ* denotes the set of all words over an alphabet Σ.
Therefore, an abbreviated way of saying there is an arbitrary language L over an
alphabet Σ is: L ⊆ Σ*
Building Regular Expressions
Let E be a regular expression and the language represented by E be L(E).
Then:
(E) = E
L(E + F) = L(E) ∪ L(F)
L(EF) = L(E)L(F)
L(E*) = (L(E))*
Example: how to use these regular expression properties and language operators?
L = {w | w is a binary string which does not contain two consecutive 0s or
two consecutive 1s anywhere}
E.g., w = 01010101 is in L, while w = 10010 is not in L
Goal: Build a regular expression for L
Four cases for w:
Case A: w starts with 0 and |w| is even
Case B: w starts with 1 and |w| is even
Case C: w starts with 0 and |w| is odd
Case D: w starts with 1 and |w| is odd
Regular expression for the four cases:
Case A: (01)*
Case B: (10)*
Case C: 0(10)*
Case D: 1(01)*
Since L is the union of all 4 cases:
Reg Ex for L = (01)* + (10)* + 0(10)* + 1(01)*
If we introduce ε then the regular expression can be simplified to:
Reg Ex for L = (ε+1)(01)*(ε+0)
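The final expression can be checked against Python’s `re` module (a sketch; note that `re` writes union as `|`, and the optional prefixes ε+1 and ε+0 become `1?` and `0?`):

```python
import re

# The simplified expression (e+1)(01)*(e+0) in Python re syntax.
no_repeats = re.compile(r"(1?)(01)*(0?)")

assert no_repeats.fullmatch("01010101")       # in L
assert no_repeats.fullmatch("10101")          # in L
assert not no_repeats.fullmatch("10010")      # contains "00", not in L
assert no_repeats.fullmatch("")               # epsilon is in L too
```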
Equivalence of RE’s and
Automata
We need to show that for every RE,
there is an automaton that accepts the
same language.
Pick the most powerful automaton type: the
ε-NFA.
And we need to show that for every
automaton, there is a RE defining its
language.
Pick the most restrictive type: the DFA.
Finite Automata (FA) & Regular Expressions (Reg Ex)
To show that they are interchangeable, consider the following theorems:
Theorem 1: For every DFA A there exists a regular expression R such that L(R) = L(A).
Theorem 2: For every regular expression R there exists an ε-NFA E such that
L(E) = L(R).
(Proofs in the book.)
Together with the ε-NFA → NFA → DFA conversions, these two theorems give
Kleene's Theorem.
Converting a RE to an ε-NFA
Proof is an induction on the number of operators (+, concatenation, *) in the RE.
We always construct an automaton of a special form (next slide).
Form of ε-NFA’s constructed:
- a single start state, the only state with external predecessors (no arcs from
outside, no arcs leaving);
- a single “final” state, the only state with external successors.
RE to ε-NFA: Basis
Symbol a: one arc labeled a from the start state to the final state.
ε: one arc labeled ε.
∅: no arc at all between the start and final states.
RE to ε-NFA: Induction 1 - Union
For E1 + E2: a new start state with ε-arcs into the automata for E1 and for E2, and
ε-arcs from their final states into a new final state.
RE to ε-NFA: Induction 2 - Concatenation
For E1E2: an ε-arc from the final state of the automaton for E1 to the start state
of the automaton for E2.
RE to ε-NFA: Induction 3 - Closure
For E*: new start and final states joined by an ε-arc (for zero repetitions), with
ε-arcs into the automaton for E and an ε-arc from its final state back to its start
state (for repeated passes).
RE to ε-NFA construction example:
(0+1)*01(0+1)*
built by concatenating the automata for (0+1)*, then 01, then (0+1)*.
DFA-to-RE
A strange sort of induction.
States of the DFA are assumed to be 1, 2, …, n.
We construct RE’s for the labels of restricted sets of paths.
Basis: single arcs or no arc at all.
Induction: paths that are allowed to traverse the next state in order.
DFA to RE construction
Informally, trace all distinct paths (traversing cycles only once)
from the start state to each of the final states
and enumerate all the expressions along the way.
Example: a three-state DFA q0 → q1 → q2 with self-loops labeled 1 on q0, 0 on q1,
and 0,1 on q2 yields, along the path:
(1*) 0 (0*) 1 (0+1)*
i.e., the RE is 1*00*1(0+1)*
Q) What is the language?
k-Paths
A k-path is a path through the graph of the DFA that goes through no state
numbered higher than k.
Endpoints are not restricted; they can be any state.
Example: k-Paths
(A three-state example DFA over states 1, 2, 3 is shown in the slides.)
0-paths from 2 to 3: RE for labels = 0.
1-paths from 2 to 3: RE for labels = 0+11.
2-paths from 2 to 3: RE for labels = (10)*0+1(01)*1
3-paths from 2 to 3: RE for labels = ??
k-Path Induction
Let R_ij^k be the regular expression for the set of labels of k-paths from state i
to state j.
Basis: k=0. R_ij^0 = sum of labels of arcs from i to j;
∅ if no such arc. But add ε if i=j.
k-Path Inductive Case
A k-path from i to j either:
1. Never goes through state k, or
2. Goes through k one or more times.
R_ij^k = R_ij^(k-1) + R_ik^(k-1) (R_kk^(k-1))* R_kj^(k-1)
where R_ij^(k-1) covers paths that don’t go through k; R_ik^(k-1) goes from i to k
the first time; (R_kk^(k-1))* loops zero or more times from k to k; and R_kj^(k-1)
then goes from k to j.
Illustration of Induction: a k-path from i to j either stays among states < k
(paths not going through k), or reaches k, loops from k back to k several times,
and finally goes from k to j.
Final Step
The RE with the same language as the DFA is the sum (union) of R_ij^n, where:
1. n is the number of states, i.e., paths are unconstrained;
2. i is the start state;
3. j is one of the final states.
Algebraic Laws…
Distributive:
E(F+G) = EF + EG
(F+G)E = FE + GE
Idempotent:
E + E = E
Involving Kleene closures:
(E*)* = E*
Φ* = ε
E+ = EE*
E? = ε + E
Algebraic Laws of Regular Expressions
Commutative:
E+F = F+E
Associative:
(E+F)+G = E+(F+G)
(EF)G = E(FG)
Identity:
E + Φ = E
εE = Eε = E
Annihilator:
ΦE = EΦ = Φ
Identities and Annihilators
∅ is the identity for +: R + ∅ = R.
ε is the identity for concatenation: εR = Rε = R.
∅ is the annihilator for concatenation: ∅R = R∅ = ∅.
Properties of Regular Languages
Topics:
1) How to prove whether a given language is regular or not?
2) Closure properties of regular languages
Some languages are not regular.
When is a language regular?
If we are able to construct one of the following:
a DFA, or an NFA, or an ε-NFA, or a regular expression
When is it not?
If we can show that no FA can be built for the language.
How to prove languages are not regular? What if we cannot come up with any FA?
A) Can it be a language that is not regular?
B) Or is it that we tried wrong approaches?
How do we decisively prove that a language is not regular?
“The hardest thing of all is to find a black cat in a dark room,
especially if there is no cat!” - Confucius
Example of a non-regular language
Let L = {w | w is of the form 0^n 1^n, for all n ≥ 0}
Hypothesis: L is not regular
Intuitive rationale: How do you keep track of a running count in an FA?
A more formal rationale:
By contradiction, if L is regular then there should exist a DFA for L.
Let k = number of states in that DFA.
Consider the special word w = 0^k 1^k => w ∈ L
The DFA is in some state p_i after consuming the first i symbols in w.
(Uses the Pigeon Hole Principle.)
Rationale…
Let {p0, p1, … pk} be the sequence of states that the DFA should have visited after consuming the first k symbols in w, which is 0^k.
But there are only k states in the DFA!
==> At least one state should repeat somewhere along the path (by the Pigeon Hole Principle).
==> Let the repeating state be pi = pj, for i < j.
==> We can fool the DFA by inputting 0^(k-(j-i)) 1^k and still get it to accept (note: k-(j-i) is at most k-1).
==> The DFA accepts strings with unequal numbers of 0s and 1s, implying that the DFA is wrong!
The Pumping Lemma for Regular Languages
What is it? The Pumping Lemma is a property of all regular languages.
How is it used? As a technique to show that a given language is not regular.
Pumping Lemma:
Let L be a regular language. Then there exists some constant N such that for every string w ∈ L s.t. |w| ≥ N, there exists a way to break w into three parts, w = xyz, such that:
1. y ≠ ε
2. |xy| ≤ N
3. For all k ≥ 0, all strings of the form x y^k z ∈ L
This property should hold for all regular languages.
Definition: N is called the "Pumping Lemma Constant".
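A quick sanity check (not part of the slides): the sketch below mechanically verifies, for L = {0^n 1^n} and one chosen N, that every legal split w = xyz of the template w = 0^N 1^N (y non-empty, |xy| ≤ N) already fails condition (3) at k = 0, which is exactly the contradiction used in the proofs that follow. Function names are illustrative.

```python
def in_L(s):
    # membership in L = {0^n 1^n : n >= 0}
    n = len(s) // 2
    return s == "0" * n + "1" * n

def every_split_fails(N):
    w = "0" * N + "1" * N
    for i in range(0, N + 1):            # x = w[:i]
        for j in range(i + 1, N + 1):    # y = w[i:j]: non-empty, |xy| <= N
            x, y, z = w[:i], w[i:j], w[j:]
            if in_L(x + z):              # pumping down (k = 0) stays in L?
                return False             # then this split satisfies the lemma
    return True                          # no split works for this w

print(every_split_fails(7))  # True
```

Since no valid split survives pumping down, no pumping-lemma constant can work for L, matching the hand proof.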
Pumping Lemma: Proof
L is regular => it should have a DFA. Set N := number of states in the DFA.
Any string w ∈ L s.t. |w| ≥ N should have the form: w = a1 a2 … am, where m ≥ N.
Let the states traversed after reading the first N symbols be: {p0, p1, … pN}
==> There are N+1 p-states, while there are only N DFA states
==> at least one state has to repeat, i.e., pi = pj, where 0 ≤ i < j ≤ N (by the PHP)
Pumping Lemma: Proof…
=> We should be able to break w = xyz as follows:
x = a1 a2 .. ai;  y = a(i+1) a(i+2) .. aj;  z = a(j+1) a(j+2) .. am
x's path will be p0 .. pi
y's path will be pi p(i+1) .. pj (but pi = pj, implying a loop)
z's path will be pj p(j+1) .. pm
Now consider another string wk = x y^k z, where k ≥ 0
Case k=0: the DFA skips the loop and reaches the accepting state pm on xz
Case k>0: the DFA will loop for y^k, and finally reach the accepting state pm for z
In either case, wk ∈ L. This proves part (3) of the lemma.
Pumping Lemma: Proof…
For part (1):
Since i < j, y ≠ ε
For part (2):
By the PHP, the repetition of states has to occur within the first N symbols in w
==> |xy| ≤ N
The Purpose of the Pumping Lemma for RLs
To prove that some languages cannot be regular.
How to use the pumping lemma? Think of playing a 2-person game:
Role 1: We claim that the language cannot be regular.
Role 2: An adversary who claims the language is regular.
We show that the adversary's statement will lead to a contradiction, implying that the pumping lemma cannot hold for the language.
We win!!
How to use the pumping lemma? (The Steps)
1. (we) Claim L is not regular.
2. (adv.) Claims that L is regular and gives you a value for N as its P/L constant.
3. (we) Using N, choose a string w ∈ L s.t. |w| ≥ N.
4. (we) Using w as the template, construct other words wk of the form x y^k z and show that at least one such wk ∉ L
=> this implies we have successfully broken the pumping lemma for the language, and hence that the adversary is wrong.
(Note: In this process, we may have to try many values of k, starting with k=0, and then 2, 3, .. so on, until wk ∉ L.)
Note: We don't have any control over N, except that it is positive.
We also don't have any control over how to split w = xyz, but xyz should respect the P/L conditions (1) and (2).
Using the Pumping Lemma
What the Adversary does:
1. Claims L is regular
2. Provides N
What WE do:
3. Using N, we construct our template string w
4. Demonstrate to the adversary, either through pumping up or down on w, that some string wk ∉ L (this should happen regardless of how w is split into xyz)
Example of using the Pumping Lemma to prove that a language is not regular
Note: This N can be anything (it need not necessarily be the # of states in a DFA; it's the adversary's choice).
Let L_eq = {w | w is a binary string with an equal number of 1s and 0s}
Your Claim: L_eq is not regular
Proof:
By contradiction, let L_eq be regular.
(adv.) A P/L constant should exist. Let N = that P/L constant.
(you) Consider input w = 0^N 1^N (your choice for the template string).
By the pumping lemma, we should be able to break w = xyz, such that:
1) y ≠ ε
2) |xy| ≤ N
3) For all k ≥ 0, the string x y^k z is also in L
Template string w = 0^N 1^N = 00…0 11…1 (N 0s followed by N 1s)
Proof…
Because |xy| ≤ N, xy should contain only 0s.
(This, and because y ≠ ε, implies y = 0^+.)
Therefore x can contain at most N-1 0s.
Also, all the N 1s must be inside z.
By (3), any string of the form x y^k z ∈ L_eq, for all k ≥ 0.
Case k=0 (setting k=0 is referred to as "pumping down"):
xz has at most N-1 0s but has N 1s.
Therefore, x y^0 z ∉ L_eq.
This violates the P/L (a contradiction).
Another way of proving this would be to show that if the # of 0s is arbitrarily pumped up (e.g., k=2; setting k>1 is referred to as "pumping up"), then the # of 0s will exceed the # of 1s.
Exercise 2
Prove L = {0^n 1 0^n | n ≥ 1} is not regular.
Note: This n is not to be confused with the pumping lemma constant N. They can be different.
In other words, the above question is the same as proving:
L = {0^m 1 0^m | m ≥ 1} is not regular.
Example 3: Pumping Lemma
Claim: L = { 0^i | i is a perfect square } is not regular
Proof:
By contradiction, let L be regular.
The P/L should apply. Let N = the P/L constant.
Choose w = 0^(N^2).
By the pumping lemma, w = xyz, satisfying all three rules.
By rules (1) & (2), y has between 1 and N 0s.
By rule (3), any string of the form x y^k z is also in L, for all k ≥ 0.
Case k=0:
#zeros(x y^0 z) = #zeros(xyz) - #zeros(y)
N^2 - N ≤ #zeros(x y^0 z) ≤ N^2 - 1
(N-1)^2 < N^2 - N ≤ #zeros(x y^0 z) ≤ N^2 - 1 < N^2
==> x y^0 z ∉ L
But the above will complete the proof ONLY IF N > 1.
… (proof contd. on next slide)
Example 3: Pumping Lemma (proof contd…)
If the adversary picks N=1, then (N-1)^2 ≤ N^2 - N, and therefore #zeros(x y^0 z) could end up being a perfect square!
This means that pumping down (i.e., setting k=0) is not giving us the proof!
So let's try pumping up next…
Case k=2:
#zeros(x y^2 z) = #zeros(xyz) + #zeros(y)
N^2 + 1 ≤ #zeros(x y^2 z) ≤ N^2 + N
N^2 < N^2 + 1 ≤ #zeros(x y^2 z) ≤ N^2 + N < (N+1)^2
==> x y^2 z ∉ L
(Notice that the above holds for all possible values of N > 0. Therefore, this completes the proof.)
Closure properties of Regular Languages
Closure property: if a set of regular languages are combined using an operator, then the resulting language is also regular.
(This is different from Kleene closure.)
Regular languages are closed under:
Union, intersection, complement, difference
Reversal
Kleene closure
Concatenation
Homomorphism
Inverse homomorphism
Now, let's prove all of this!
RLs are closed under union
IF L and M are two RLs THEN:
they both have two corresponding regular expressions, R and S respectively.
(L U M) can be represented using the regular expression R+S.
Therefore, (L U M) is also regular.
How can this be proved using FAs?
RLs are closed under complementation
If L is an RL over ∑, then its complement is L' = ∑* - L.
To show L' is also regular, make the following construction on a DFA for L:
Convert every final state into a non-final state, and every non-final state into a final state.
(Every state's status is flipped, including the start state q0 if it happens to be final.)
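The flip-the-finals construction can be sketched in a few lines. The dict-based DFA encoding and state names here are illustrative, not from the slides:

```python
def complement(states, delta, start, finals):
    # same states and transitions; final and non-final states swap roles
    return states, delta, start, states - finals

def accepts(delta, start, finals, w):
    q = start
    for a in w:
        q = delta[(q, a)]
    return q in finals

# DFA over {0,1} accepting strings that end in 1
states = {"p", "q"}
delta = {("p","0"):"p", ("p","1"):"q", ("q","0"):"p", ("q","1"):"q"}
_, d2, s2, f2 = complement(states, delta, "p", {"q"})
print(accepts(delta, "p", {"q"}, "011"))  # True
print(accepts(d2, s2, f2, "011"))         # False
```

Every string lands in exactly one state, so swapping the final set accepts exactly the strings the original rejected.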
RLs are closed under intersection
A quick, indirect way to prove it:
By DeMorgan's law: L ∩ M = (L' U M')'
Since we know RLs are closed under union and complementation, they are also closed under intersection.
A more direct way would be to construct a finite automaton for L ∩ M.
DFA construction for L ∩ M
A_L = DFA for L = {Q_L, ∑, q_L, F_L, δ_L}
A_M = DFA for M = {Q_M, ∑, q_M, F_M, δ_M}
Build A_(L∩M) = {Q_L x Q_M, ∑, (q_L, q_M), F_L x F_M, δ}, such that:
δ((p,q), a) = (δ_L(p,a), δ_M(q,a)), where p in Q_L and q in Q_M
This construction ensures that a string w will be accepted if and only if w reaches an accepting state in both input DFAs.
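The product construction above, sketched in code (the dict encoding and state names are illustrative; only the pair-states reachable from (q_L, q_M) are built):

```python
def product_dfa(deltaL, startL, finalsL, deltaM, startM, finalsM, alphabet):
    start = (startL, startM)
    delta, states, todo = {}, {start}, [start]
    while todo:                      # explore reachable pair-states only
        p, q = todo.pop()
        for a in alphabet:
            nxt = (deltaL[(p, a)], deltaM[(q, a)])   # both DFAs step together
            delta[((p, q), a)] = nxt
            if nxt not in states:
                states.add(nxt)
                todo.append(nxt)
    finals = {s for s in states if s[0] in finalsL and s[1] in finalsM}
    return delta, start, finals

def accepts(delta, start, finals, w):
    q = start
    for a in w:
        q = delta[(q, a)]
    return q in finals

# L: even number of 0s;  M: strings ending in 1
dL = {("e","0"):"o", ("e","1"):"e", ("o","0"):"e", ("o","1"):"o"}
dM = {("p","0"):"p", ("p","1"):"q", ("q","0"):"p", ("q","1"):"q"}
d, s, f = product_dfa(dL, "e", {"e"}, dM, "p", {"q"}, {"0","1"})
print(accepts(d, s, f, "001"))  # True: two 0s (even) and ends in 1
print(accepts(d, s, f, "01"))   # False: odd number of 0s
```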
[Figure: a DFA for L, a DFA for M, and the product DFA for L ∩ M; each product state (q_i, p_i) pairs one state from each input DFA, and an a-transition goes to (q_j, p_j) when q_i -a-> q_j in L's DFA and p_i -a-> p_j in M's DFA.]
RLs are closed under set difference
We observe: L - M = L ∩ M'
Closed under intersection; closed under complementation.
Therefore, L - M is also regular.
RLs are closed under reversal
Reversal of a string w is denoted by w^R
E.g., w = 00111, w^R = 11100
Reversal of a language: L^R = the language generated by reversing all strings in L
Theorem: If L is regular then L^R is also regular
ε-NFA Construction for L^R
Given a DFA for L, build a new ε-NFA for L^R:
- Add a new start state q', with ε-transitions to each of the old final states.
- Reverse all transitions.
- Convert the old set of final states into non-final states.
- Make the old start state q0 the only new final state.
What to do if q0 was one of the final states in the input DFA?
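The construction above as a sketch. For simplicity the new start state's ε-moves are collapsed: the simulation simply starts from the set of old final states (the encoding and state names are illustrative):

```python
def reverse_dfa(delta, start, finals):
    rdelta = {}                          # NFA: (state, symbol) -> set of states
    for (p, a), q in delta.items():      # reverse every transition p -a-> q
        rdelta.setdefault((q, a), set()).add(p)
    # (transitions, new start set = old finals, new final set = {old start})
    return rdelta, set(finals), {start}

def nfa_accepts(rdelta, starts, finals, w):
    current = set(starts)
    for a in w:
        current = {p for q in current for p in rdelta.get((q, a), set())}
    return bool(current & finals)

# DFA accepting strings ending in "01": states s (start), a ("saw 0"), b (final)
delta = {("s","0"):"a", ("s","1"):"s", ("a","0"):"a",
         ("a","1"):"b", ("b","0"):"a", ("b","1"):"s"}
rdelta, rstarts, rfinals = reverse_dfa(delta, "s", {"b"})
print(nfa_accepts(rdelta, rstarts, rfinals, "100"))  # True: reverse "001" ends in 01
print(nfa_accepts(rdelta, rstarts, rfinals, "01"))   # False
```

Keeping the old start state as the only final state answers the slide's closing question too: if q0 was final in the input DFA, ε was in L, and q0 being both a new-start target (via ε from itself) and the new final state still accepts ε.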
If L is regular, L^R is regular (proof using regular expressions)
Let E be a regular expression for L.
Given E, how to build E^R?
Basis: If E = ε, Ø, or a, then E^R = E
Induction: every part of E (refer to the part as "F") can be in only one of the three following forms:
1. F = F1 + F2  ==>  F^R = F1^R + F2^R
2. F = F1 F2    ==>  F^R = F2^R F1^R
3. F = (F1)*    ==>  F^R = (F1^R)*
Homomorphisms
Substitute each symbol in ∑ (the main alphabet) by a corresponding string in T (another alphabet):
h: ∑ ---> T*
Example:
Let ∑ = {0,1} and T = {a,b}
Let a homomorphic function h on ∑ be: h(0) = ab, h(1) = ε
If w = 10110, then h(w) = ε ab ε ε ab = abab
In general, h(w) = h(a1) h(a2) … h(an)
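The slide's example, executed symbol by symbol:

```python
def apply_hom(h, w):
    # h(w) = h(a1) h(a2) ... h(an)
    return "".join(h[a] for a in w)

h = {"0": "ab", "1": ""}      # "" plays the role of ε
print(apply_hom(h, "10110"))  # abab
```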
RLs are closed under homomorphisms
Theorem: If L is regular, then so is h(L)
Proof: If E is a RE for L, then show L(h(E)) = h(L(E))
Basis: If E = ε, Ø, or a, then the claim holds.
Induction: there are three forms of E:
1. E = E1 + E2
L(h(E)) = L(h(E1) + h(E2)) = L(h(E1)) U L(h(E2)) ----- (1)
h(L(E)) = h(L(E1) + L(E2)) = h(L(E1)) U h(L(E2)) ----- (2)
By the inductive hypothesis, L(h(E1)) = h(L(E1)) and L(h(E2)) = h(L(E2)).
Therefore, L(h(E)) = h(L(E)).
2. E = E1 E2
3. E = (E1)*
Similar arguments. (Alternatively, think of a DFA-based construction: given a DFA for L, how to convert it into an FA for h(L)?)
FA Construction for h(L)
Given a DFA for L:
- Replace every edge labeled "a" by a path labeled h(a) in the new FA.
- i.e., build a new FA that simulates h(a) for every symbol-a transition in the above DFA.
- The resulting FA may or may not be a DFA, but it will be an FA for h(L).
Inverse homomorphism
Let h: ∑ ---> T*
Let M be a language over alphabet T
h^-1(M) = {w | w ∈ ∑* s.t. h(w) ∈ M}
(i.e., the set of strings in ∑* whose homomorphic translation results in strings of M)
Claim: If M is regular, then so is h^-1(M)
Proof:
Let A be a DFA for M.
Construct another DFA A' which encodes h^-1(M).
A' is an exact replica of A, except that its transition function is such that, for any input symbol a in ∑, A' will simulate h(a) in A:
δ'(p,a) = δ(p, h(a))  (δ extended to strings)
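The construction δ'(p,a) = δ(p, h(a)), sketched directly (encoding and names illustrative): A' keeps A's states, but on each symbol a it runs A over the whole string h(a).

```python
def run(delta, q, s):
    # extended transition function: run the DFA from q over string s
    for c in s:
        q = delta[(q, c)]
    return q

def inverse_hom_dfa(delta, states, h, alphabet):
    # A' over the source alphabet; each step simulates h(a) in A
    return {(q, a): run(delta, q, h[a]) for q in states for a in alphabet}

# A: DFA over T = {a,b} for strings ending in b;  h(0) = ab, h(1) = ε
dA = {("n","a"):"n", ("n","b"):"y", ("y","a"):"n", ("y","b"):"y"}
h = {"0": "ab", "1": ""}
dA2 = inverse_hom_dfa(dA, {"n", "y"}, h, {"0", "1"})
# h^-1(M) = binary strings w with h(w) ending in b = strings with at least one 0
print(run(dA2, "n", "110") in {"y"})  # True
print(run(dA2, "n", "111") in {"y"})  # False
```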
Decision properties of regular languages
Any "decision problem" looks like this:
Input (generally a question) ---> [decision problem solver] ---> Yes / No
Membership question
Decision Problem: Given L, is w in L?
Possible answers: Yes or No
Approach:
1. Build a DFA for L
2. Input w to the DFA
3. If the DFA ends in an accepting state, then yes; otherwise no.
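The three membership steps as code (the dict-based DFA encoding is illustrative):

```python
def member(delta, start, finals, w):
    q = start                  # step 1: we are given a DFA for L
    for a in w:
        q = delta[(q, a)]      # step 2: input w to the DFA
    return q in finals         # step 3: yes iff we end in an accepting state

# DFA for binary strings with an even number of 1s
delta = {("e","0"):"e", ("e","1"):"o", ("o","0"):"o", ("o","1"):"e"}
print(member(delta, "e", {"e"}, "1010"))  # True
print(member(delta, "e", {"e"}, "10"))    # False
```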
Emptiness test
Decision Problem: Is L = Ø?
Approach: on a DFA for L:
1. From the start state, run a reachability test, which returns:
   success: if there is at least one final state that is reachable from the start state
   failure: otherwise
2. L = Ø if and only if the reachability test fails
How to implement the reachability test?
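One way to implement the reachability test is a breadth-first search over the transition graph from the start state (encoding illustrative):

```python
from collections import deque

def is_empty(delta, start, finals):
    seen, todo = {start}, deque([start])
    while todo:
        q = todo.popleft()
        if q in finals:
            return False              # a final state is reachable => L != Ø
        for (p, a), r in delta.items():
            if p == q and r not in seen:
                seen.add(r)
                todo.append(r)
    return True                       # no final state reachable => L = Ø

d = {("s","0"):"s", ("s","1"):"t", ("t","0"):"t", ("t","1"):"t"}
print(is_empty(d, "s", {"t"}))   # False: t is reachable
print(is_empty(d, "s", {"u"}))   # True: u is never reached
```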
Finiteness
Decision Problem: Is L finite or infinite?
Approach: on a DFA for L:
1. Remove all states unreachable from the start state
2. Remove all states that cannot lead to any accepting state
3. After removal, check for cycles in the resulting FA
4. L is finite if there are no cycles; otherwise it is infinite
Another approach: build a regular expression and look for a Kleene closure
How to implement steps 2 and 3?
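Steps 1-3 as a sketch: keep only states that are both reachable from the start (forward closure) and able to reach a final state (backward closure), then run a DFS cycle check over the survivors (encoding and names illustrative):

```python
def is_infinite(states, delta, start, finals):
    fwd = {q: set() for q in states}
    bwd = {q: set() for q in states}
    for (p, a), r in delta.items():
        fwd[p].add(r)
        bwd[r].add(p)
    def closure(seed, edges):
        seen, todo = set(seed), list(seed)
        while todo:
            for r in edges[todo.pop()]:
                if r not in seen:
                    seen.add(r)
                    todo.append(r)
        return seen
    live = closure({start}, fwd) & closure(finals, bwd)  # steps 1 and 2
    color = {q: 0 for q in live}  # 0 = new, 1 = on DFS stack, 2 = finished
    def has_cycle(q):             # step 3: any back edge means a cycle
        color[q] = 1
        for r in fwd[q]:
            if r in live and (color[r] == 1 or (color[r] == 0 and has_cycle(r))):
                return True
        color[q] = 2
        return False
    return any(color[q] == 0 and has_cycle(q) for q in live)
```

A cycle among surviving states can be pumped arbitrarily many times on some accepted string, which is exactly why cycles mean an infinite language.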
Finiteness test - examples
Ex 1) [Figure: a DFA in which, after removing unreachable and dead states, no cycle remains] - FINITE
Ex 2) [Figure: a DFA in which a cycle survives among the remaining states] - INFINITE
Equivalence & Minimization of DFAs
Applications of interest
Comparing two DFAs: is L(DFA1) == L(DFA2)?
How to minimize a DFA?
1. Remove unreachable states
2. Identify & condense equivalent states into one
When to call two states in a DFA "equivalent"?
Two states p and q are said to be equivalent (p ≡ q) iff:
i) any string w accepted by starting at p is also accepted by starting at q; AND
ii) any string w rejected by starting at p is also rejected by starting at q.
(Only the future matters - the past doesn't!)
Computing equivalent states in a DFA: the Table Filling Algorithm
Keep a table with one cell for every pair of states; fill each cell with X (distinguishable), = (equivalent), or blank (to be determined).
Pass #0:
1. Mark accepting states ≠ non-accepting states
Pass #1:
1. Compare every pair of states
2. Distinguish by one-symbol transitions
3. Mark = or ≠ or blank (tbd)
Pass #2:
1. Compare every pair of states
2. Distinguish by up to two-symbol transitions (until different or same or tbd)
….
(keep repeating until the table is complete)
[Figure: an example 8-state DFA (states A-H) over {0,1} and its pair table.]
Table Filling Algorithm - step by step
[Figure: the pair table for the example 8-state DFA (A-H), filled in over successive passes.]
1. Mark X between every accepting vs. non-accepting pair of states.
2. Pass 1: look one hop away for distinguishing states or strings.
3. Pass 2: look one hop away again for distinguishing states or strings.
(Continue until no new cell can be marked.)
Equivalences found in the example: A=B, C=H, D=G
Retain only one copy for each equivalence set of states.
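The passes above can be sketched as a fixpoint loop: mark final vs. non-final pairs first, then keep marking any pair whose successors on some symbol are already marked, until nothing changes. Unmarked pairs are the equivalent ones. (State names and the dict encoding are illustrative.)

```python
from itertools import combinations

def equivalent_pairs(states, alphabet, delta, finals):
    marked = {frozenset(pr) for pr in combinations(states, 2)
              if (pr[0] in finals) != (pr[1] in finals)}        # pass #0
    changed = True
    while changed:                                              # passes 1, 2, ...
        changed = False
        for p, q in combinations(states, 2):
            pair = frozenset((p, q))
            if pair in marked:
                continue
            for a in alphabet:
                succ = frozenset((delta[(p, a)], delta[(q, a)]))
                if len(succ) == 2 and succ in marked:           # distinguished
                    marked.add(pair)
                    changed = True
                    break
    return {frozenset(pr) for pr in combinations(states, 2)
            if frozenset(pr) not in marked}                     # unmarked = equal

# A and B differ only in name: both loop on 1 and move to the final state C on 0
d = {("A","0"):"C", ("A","1"):"A", ("B","0"):"C", ("B","1"):"B",
     ("C","0"):"C", ("C","1"):"C"}
print(equivalent_pairs({"A","B","C"}, {"0","1"}, d, {"C"}))
```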
Table Filling Algorithm - special case
Q) What happens if the input DFA has more than one final state?
Can all final states initially be treated as equivalent to one another?
Putting it all together … How to minimize a DFA?
Goal: minimize the number of states in a DFA.
Algorithm:
1. Eliminate states unreachable from the start state (depth-first traversal from the start state)
2. Identify and remove equivalent states (table filling algorithm)
3. Output the resultant DFA
Are Two DFAs Equivalent?
1. Make a new dummy DFA by just putting together both DFAs (a "unified" DFA containing both start states q0 and q0').
2. Run the table-filling algorithm on the unified DFA.
3. Is q0 ≡ q0'?
   IF the start states of both DFAs are found to be equivalent, THEN: DFA1 ≡ DFA2
   ELSE: different
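An equivalent way to decide q0 ≡ q0' (the slides do the same job with the table-filling algorithm on the unified DFA): walk both DFAs in lockstep over every reachable pair of states; the DFAs differ iff some reachable pair disagrees on acceptance. Encoding and names are illustrative.

```python
def dfas_equivalent(d1, s1, f1, d2, s2, f2, alphabet):
    seen, todo = {(s1, s2)}, [(s1, s2)]
    while todo:
        p, q = todo.pop()
        if (p in f1) != (q in f2):
            return False          # a distinguishing string reaches this pair
        for a in alphabet:
            nxt = (d1[(p, a)], d2[(q, a)])
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return True

# dA and dB both accept "even number of 1s"; dC accepts "ends in 1"
dA = {("e","0"):"e", ("e","1"):"o", ("o","0"):"o", ("o","1"):"e"}
dB = {("x","0"):"x", ("x","1"):"y", ("y","0"):"y", ("y","1"):"x"}
dC = {("p","0"):"p", ("p","1"):"q", ("q","0"):"p", ("q","1"):"q"}
print(dfas_equivalent(dA, "e", {"e"}, dB, "x", {"x"}, {"0","1"}))  # True
print(dfas_equivalent(dA, "e", {"e"}, dC, "p", {"q"}, {"0","1"}))  # False
```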
Summary
How to prove languages are not regular? (Pumping lemma & its applications)
Closure properties of regular languages
Simplification of DFAs:
How to remove unreachable states?
How to identify and collapse equivalent states?
How to minimize a DFA?
How to tell whether two DFAs are equivalent?
Grammars
Unit - 3
Overview
1.Grammars & Chomsky hierarchy
2.Regular Grammar (RG)
3.Equivalence of Left & Right linear grammar
4.Equivalence of RG & FA
5.Context Free Grammars (CFG): Definition, Sentential forms, Leftmost & Rightmost
derivations, Parse tree, Ambiguity.
6.Simplification & Applications.
7.Normal Forms: Chomsky Normal Forms (CNF) & Greibach Normal Forms (GNF).
8.CFLs - Pumping lemma, Closure properties
A regular grammar is a type of grammar that describes a regular language. A regular
grammar is a mathematical object, G, which consists of four components, G = (N, ∑, P,
S), where
●N: non-empty, finite set of non-terminal symbols,
●∑: a finite set of terminal symbols (the alphabet),
●P: a set of grammar rules, each one having one of the forms
○A ⇢ aB
○A ⇢ a
○A ⇢ ∈, where ∈ = empty string, A, B ∈ N, a ∈ ∑
●S ∈ N is the start symbol
This grammar can be of two forms:
1.Right Linear Regular Grammar
2.Left Linear Regular Grammar
Right Linear Regular Grammar
In this type of regular grammar, all the non-terminals on the right-hand side exist at the
rightmost place, i.e., at the right ends.
Examples:
A ⇢ a, A ⇢ aB, A ⇢ ∈, where A and B are non-terminals, a is a terminal, and ∈ is the
empty string
S ⇢ 00B | 11S
B ⇢ 0B | 1B | 0 | 1
where S and B are non-terminals, and 0 and 1 are terminals
Left Linear Regular Grammar
In this type of regular grammar, all the non-terminals on the right-hand side exist at the
leftmost place, i.e., at the left ends.
Examples:
A ⇢ a, A ⇢ Ba, A ⇢ ∈, where A and B are non-terminals, a is a terminal, and ∈ is the
empty string
S ⇢ B00 | S11
B ⇢ B0 | B1 | 0 | 1
where S and B are non-terminals, and 0 and 1 are terminals
Context-Free Languages &
Grammars
(CFLs & CFGs)
Not all languages are regular
So what happens to the languages
which are not regular?
Can we still come up with a language
recognizer?
i.e., something that will accept (or reject)
strings that belong (or do not belong) to the
language?
Context-Free Languages
A language class larger than the class of regular
languages
Supports natural, recursive notation called “context-
free grammar”
Applications:
Parse trees, compilers
XML
[Figure: the regular languages (FA/RE) as a proper subset of the context-free languages (PDA/CFG).]
An Example
A palindrome is a word that reads the same from both ends
E.g., madam, redivider, malayalam, 010010010
Let L = { w | w is a binary palindrome}
Is L regular?
No.
Proof:
Let w = 0^N 1 0^N (assuming N to be the p/l constant)
By the Pumping lemma, w can be rewritten as xyz, such that x y^k z is also in L
(for any k ≥ 0)
But |xy| ≤ N and y ≠ ε
==> y = 0^+
==> x y^k z will NOT be in L for k=0
==> Contradiction
But the language of palindromes…
is a CFL, because it supports recursive substitution (in the form of a CFG).
This is because we can construct a "grammar" like this:
1. A ==> ε
2. A ==> 0
3. A ==> 1
4. A ==> 0A0
5. A ==> 1A1
(A is the variable or non-terminal; 0, 1, and ε are terminals; each rule is a production.)
How does this grammar work? Same as: A => 0A0 | 1A1 | 0 | 1 | ε
How does the CFG for palindromes work?
G: A => 0A0 | 1A1 | 0 | 1 | ε
An input string belongs to the language (i.e., is accepted) iff it can be generated by the CFG.
Example: w = 01110
G can generate w as follows:
1. A => 0A0
2.   => 01A10
3.   => 01110
Generating a string from a grammar:
1. Pick and choose a sequence of productions that would allow us to generate the string.
2. At every step, substitute one variable with one of its productions.
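The grammar's productions can be transcribed directly into a recursive membership check (this mirrors the palindrome grammar itself, not a general CFG parser):

```python
def derivable(w):
    if w in ("", "0", "1"):               # A => ε | 0 | 1
        return True
    if len(w) >= 2 and w[0] == w[-1]:     # A => 0A0 | 1A1
        return derivable(w[1:-1])         # peel the matched outer symbols
    return False

print(derivable("01110"))  # True
print(derivable("0101"))   # False
```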
Context-Free Grammar: Definition
A context-free grammar G = (V,T,P,S), where:
V: set of variables or non-terminals
T: set of terminals (= alphabet U {ε})
P: set of productions, each of which is of the form V ==> α1 | α2 | …, where each αi is an arbitrary string of variables and terminals
S: the start variable
CFG for the language of binary palindromes:
G = ({A}, {0,1}, P, A)
P: A ==> 0A0 | 1A1 | 0 | 1 | ε
More examples
Parenthesis matching in code
Syntax checking
In scenarios where there is a general need for:
- matching a symbol with another symbol, or
- matching a count of one symbol with that of another symbol, or
- recursively substituting one symbol with a string of other symbols
Example #2
Language of balanced parenthesis, e.g., ()(((())))((()))….
CFG? G: S => (S) | SS | ε
How would you "interpret" the string "(((()))()())" using this grammar?
Example #3
A grammar for L = {0^m 1^n | m ≥ n}
CFG? G:
S => 0S1 | A
A => 0A | ε
How would you interpret the string "00000111" using this grammar?
Example #4
A program containing if-then(-else) statements:
if Condition then Statement else Statement
(or)
if Condition then Statement
CFG?
More examples
L1 = {0^n | n ≥ 0}
L2 = {0^n | n ≥ 1}
L3 = {0^i 1^j 2^k | i=j or j=k, where i,j,k ≥ 0}
L4 = {0^i 1^j 2^k | i=j or i=k, where i,j,k ≥ 1}
Applications of CFLs & CFGs
Compilers use parsers for syntactic checking; parsers can be expressed as CFGs:
1. Balancing parenthesis: B ==> BB | (B) | Statement; Statement ==> …
2. If-then-else: S ==> SS | if Condition then Statement else Statement | if Condition then Statement | Statement; Condition ==> …; Statement ==> …
3. C parenthesis matching { … }
4. Pascal begin-end matching
5. YACC (Yet Another Compiler-Compiler)
Tag-Markup Languages
Roll ==> <ROLL> Class Students </ROLL>
Class ==> <CLASS> Text </CLASS>
Text ==> Char Text | Char
Char ==> a | b | … | z | A | B | .. | Z
Students ==> Student Students | ε
Student ==> <STUD> Text </STUD>
Here, the left-hand side of each production denotes a non-terminal (e.g., "Roll", "Class", etc.).
Those symbols on the right-hand side for which no productions (i.e., substitutions) are defined are terminals (e.g., 'a', 'b', '<', '>', "ROLL", etc.)
Structure of a production
A ==> α1 | α2 | … | αk
"A" is the head; each αi is a body; "==>" denotes derivation.
The above is the same as:
1. A ==> α1
2. A ==> α2
3. A ==> α3
…
k. A ==> αk
CFG conventions
Terminal symbols <== a, b, c, …
Non-terminal symbols <== A, B, C, …
Terminal or non-terminal symbols <== X, Y, Z
Terminal strings <== w, x, y, z
Arbitrary strings of terminals and non-terminals <== α, β, γ, ..
Syntactic Expressions in Programming Languages
result = a*b + score + 10 * distance + c
(Here "result", "a", "score", etc. are variables; operators are also terminals.)
Regular languages have only terminals:
Reg expression = [a-z][a-z0-1]*
If we allow only letters a & b, and 0 & 1 for constants (for simplification):
Regular expression = (a+b)(a+b+0+1)*
String membership
How to say if a string belongs to the language defined by a CFG?
1. Derivation: head to body
2. Recursive inference: body to head
Both are equivalent forms.
Example: w = 01110. Is w a palindrome?
G: A => 0A0 | 1A1 | 0 | 1 | ε
A => 0A0 => 01A10 => 01110
Simple Expressions…
We can write a CFG for accepting simple expressions:
G = (V,T,P,S)
V = {E,F}
T = {0,1,a,b,+,*,(,)}
S = {E}
P:
E ==> E+E | E*E | (E) | F
F ==> aF | bF | 0F | 1F | a | b | 0 | 1
Generalization of derivation
Derivation is head ==> body.
A ==> X (A derives X in a single step)
A ==>*_G X (A derives X in multiple steps)
Transitivity: IF A ==>*_G B, and B ==>*_G C, THEN A ==>*_G C
Context-Free Language
The language of a CFG, G = (V,T,P,S), denoted by L(G), is the set of terminal strings that have a derivation from the start variable S:
L(G) = { w in T* | S ==>*_G w }
Left-most & Right-most Derivation Styles
G:
E => E+E | E*E | (E) | F
F => aF | bF | 0F | 1F | ε
Derive the string a*(ab+10) from G:  E ==>*_G a*(ab+10)
Left-most derivation (always substitute the leftmost variable):
E ==> E * E ==> F * E ==> aF * E ==> a * E ==> a * (E) ==> a * (E + E)
  ==> a * (F + E) ==> a * (aF + E) ==> a * (abF + E) ==> a * (ab + E)
  ==> a * (ab + F) ==> a * (ab + 1F) ==> a * (ab + 10F) ==> a * (ab + 10)
Right-most derivation (always substitute the rightmost variable):
E ==> E * E ==> E * (E) ==> E * (E + E) ==> E * (E + F) ==> E * (E + 1F)
  ==> E * (E + 10F) ==> E * (E + 10) ==> E * (F + 10) ==> E * (aF + 10)
  ==> E * (abF + 10) ==> E * (ab + 10) ==> F * (ab + 10) ==> aF * (ab + 10)
  ==> a * (ab + 10)
Leftmost vs. Rightmost derivations
Q1) For every leftmost derivation, there is a rightmost derivation, and vice versa. True or False?
True - we will use parse trees to prove this.
Q2) Does every word generated by a CFG have a leftmost and a rightmost derivation?
Yes - easy to prove (reverse direction).
Q3) Could there be words which have more than one leftmost (or rightmost) derivation?
Yes - depending on the grammar.
How to prove that your CFGs are correct? (using induction)
CFG & CFL
Theorem: A string w in (0+1)* is in L(G_pal) if and only if w is a palindrome.
Proof: use induction
- on string length for the IF part
- on length of derivation for the ONLY IF part
G_pal: A => 0A0 | 1A1 | 0 | 1 | ε
Parse trees
Parse Trees
Each derivation in a CFG can be represented using a parse tree:
- Each internal node is labeled by a variable in V
- Each leaf is a terminal symbol
- For a production A ==> X1 X2 … Xk, any internal node labeled A has k children, labeled X1, X2, … Xk from left to right
Examples
Parse tree for 0110 (G: A => 0A0 | 1A1 | 0 | 1 | ε):
the root A has children 0, A, 0; the inner A has children 1, A, 1; the innermost A derives ε.
Parse tree for a + 1 (G: E => E+E | E*E | (E) | F; F => aF | bF | 0F | 1F | 0 | 1 | a | b):
the root E has children E, +, E; the left E derives F => a, and the right E derives F => 1.
(Reading a tree bottom-up is recursive inference; reading it top-down is derivation.)
Parse Trees, Derivations, and Recursive Inferences
For a production A ==> X1 .. Xi .. Xk: a derivation goes from head to body (top-down); recursive inference goes from body to head (bottom-up).
Interchangeability of the different CFG representations:
- Parse tree ==> left-most derivation: DFS left to right
- Parse tree ==> right-most derivation: DFS right to left
- Left-most derivation == right-most derivation (they yield the same parse tree)
- Derivation ==> recursive inference: reverse the order of productions
- Recursive inference ==> parse trees: bottom-up traversal of the parse tree
Connection between CFLs and RLs
CFLs & Regular Languages
A CFG is said to be right-linear if all the productions are of one of the following two forms: A ==> wB (or) A ==> w
Where: A & B are variables, and w is a string of terminals.
Theorem 1: Every right-linear CFG generates a regular language.
Theorem 2: Every regular language has a right-linear grammar.
Theorem 3: Left-linear CFGs also represent RLs.
What kind of grammars result for regular languages?
Some Examples
[Figure: two three-state automata over states A, B, C with 0/1 transitions, and the corresponding right-linear CFG:]
A => 01B | C
B => 11B | 0C | 1A
C => 1A | 0 | 1
Right linear CFG? Finite Automaton?
Ambiguity in CFGs and CFLs
Ambiguity in CFGs
A CFG is said to be ambiguous if there exists a string which has more than one left-most derivation.
Example:
S ==> AS | ε
A ==> A1 | 0A1 | 01
Input string 00111 can be derived in two ways:
LM derivation #1: S => AS => 0A1S => 0A11S => 00111S => 00111
LM derivation #2: S => AS => A1S => 0A11S => 00111S => 00111
Why does ambiguity matter?
E ==> E + E | E * E | (E) | a | b | c | 0 | 1
string = a * b + c
LM derivation #1: E => E + E => E * E + E ==>* a * b + c
(parse tree groups as (a*b)+c)
LM derivation #2: E => E * E => a * E => a * E + E ==>* a * b + c
(parse tree groups as a*(b+c))
The values are different!!!
The calculated value depends on which of the two parse trees is actually used.
Removing Ambiguity in Expression Evaluations
It MAY be possible to remove ambiguity for some CFLs, e.g., in a CFG for expression evaluation, by imposing rules & restrictions such as precedence.
This would imply a rewrite of the grammar. Precedence: (), *, +
Ambiguous version:
E ==> E + E | E * E | (E) | a | b | c | 0 | 1
Modified unambiguous version:
E => E + T | T
T => T * F | F
F => I | (E)
I => a | b | c | 0 | 1
How will this avoid ambiguity?
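One way to see how the layered grammar avoids ambiguity: a recursive-descent sketch with one function per variable (E handles +, T handles *, F handles atoms and parentheses). Because * can only appear below + in the grammar, a * b + c has exactly one reading. The tokenizer-free setup and names are illustrative.

```python
def evaluate(tokens, env):
    pos = [0]  # one-element list so the nested parsers can advance it
    def peek():
        return tokens[pos[0]] if pos[0] < len(tokens) else None
    def E():                      # E => E + T | T
        v = T()
        while peek() == "+":
            pos[0] += 1
            v += T()
        return v
    def T():                      # T => T * F | F  (* binds tighter than +)
        v = F()
        while peek() == "*":
            pos[0] += 1
            v *= F()
        return v
    def F():                      # F => I | (E)
        if peek() == "(":
            pos[0] += 1
            v = E()
            pos[0] += 1           # skip ")"
            return v
        tok = peek()
        pos[0] += 1
        return env[tok] if tok in env else int(tok)
    return E()

print(evaluate(list("a*b+c"), {"a": 2, "b": 3, "c": 4}))    # 10, i.e. (a*b)+c
print(evaluate(list("a*(b+c)"), {"a": 2, "b": 3, "c": 4}))  # 14
```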
Inherently Ambiguous CFLs
However, for some languages, it may not be possible to remove ambiguity.
A CFL is said to be inherently ambiguous if every CFG that describes it is ambiguous.
Example:
L = { a^n b^n c^m d^m | n,m ≥ 1} U {a^n b^m c^m d^n | n,m ≥ 1}
L is inherently ambiguous. Why?
Consider the input string a^n b^n c^n d^n.
Objective
1.What is a linear grammar? What is a left linear grammar? What is a right linear grammar?
2.Left linear grammars are evil - why?
3.What algorithm can be used to convert a left linear grammar to a right linear grammar?
Linear grammar
•A linear grammar is a context-free grammar that has at most one non-terminal symbol on the right hand side of each grammar rule.
–A rule may have just terminal symbols on the right hand side (zero non-terminals).
•Here is a linear grammar:
S → aA
A → aBb
B → Bb
Left linear grammar
•A left linear grammar is a linear grammar in which the non-terminal symbol always occurs on the left side.
•Here is a left linear grammar:
S → Aa
A → ab
Right linear grammar
•A right linear grammar is a linear grammar in which the non-terminal symbol
always occurs at the right end of the right hand side.
•Here is a right linear grammar:
S → abaA
A → ε
Left linear grammars are evil
•Consider this rule from a left linear grammar:
A → Babc
•Can that rule be used to recognize this string:
abbabc
•We need to check the rule for B:
B → Cb | D
•Now we need to check the rules for C and D.
•This is very complicated.
•Left linear grammars require complex parsers.
Right linear grammars are good
•Consider this rule from a right linear grammar:
A → abcB
•Can that rule be used to recognize this string:
abcabb
•We immediately see that the first part of the string – abc – matches the
first part of the rule. Thus, the problem simplifies to this: can the rule
for B be used to recognize this string:
abb
•Parsers for right linear grammars are much simpler.
Convert left linear to right linear
Now we will see an algorithm for converting any left linear grammar to its
equivalent right linear grammar.

left linear:        right linear:
S → Aa              S → abaA
A → ab              A → ε

Both grammars generate this language: {aba}
May need to make a new start symbol
The algorithm on the following slides assumes that the left linear grammar
doesn’t have any rules with the start symbol on the right hand side.
–If the left linear grammar has a rule with the start symbol S on the right
hand side, simply add this rule:
S0 → S
Symbols used by the algorithm
•Let S denote the start symbol
•Let A, B denote non-terminal symbols
•Let p denote zero or more terminal symbols
•Let ε denote the empty symbol
Algorithm
1)If the left linear grammar has a rule S → p, then make that a rule in the
right linear grammar
2)If the left linear grammar has a rule A → p, then add the following rule
to the right linear grammar: S → pA
3)If the left linear grammar has a rule B → Ap, then add the following rule
to the right linear grammar: A → pB
4)If the left linear grammar has a rule S → Ap, then add the following rule
to the right linear grammar: A → p
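The four rules above can be sketched in code. This is an illustrative implementation with an assumed encoding (rules as (head, body-list) pairs, single upper-case letters as non-terminals), not part of the original slides:

```python
# Sketch of the 4-rule left-linear to right-linear conversion.
def left_to_right_linear(rules, start="S"):
    out = []
    for head, body in rules:
        nts = [s for s in body if s.isupper()]   # non-terminals in the body
        if not nts:                               # head -> p (terminals only)
            if head == start:
                out.append((start, body))             # rule 1: keep S -> p
            else:
                out.append((start, body + [head]))    # rule 2: S -> p A
        else:
            A = nts[0]            # left linear: the non-terminal comes first
            p = body[1:]          # the trailing terminals
            if head == start:
                out.append((A, p))                    # rule 4: A -> p
            else:
                out.append((A, p + [head]))           # rule 3: A -> p B
    return out

# The slides' example: S -> Aa, A -> ab  becomes  S -> abA, A -> a
rules = [("S", ["A", "a"]), ("A", ["a", "b"])]
print(left_to_right_linear(rules))
# -> [('A', ['a']), ('S', ['a', 'b', 'A'])]
```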
Convert this left linear grammar

S → Aa
A → ab
left linear

Right hand side has terminals:
2) If the left linear grammar has this rule A → p, then add the following
rule to the right linear grammar: S → pA
S → abA
right linear

Right hand side of S has a non-terminal:
4) If the left linear grammar has S → Ap, then add the following rule to
the right linear grammar: A → p
S → abA
A → a
right linear

Equivalent!
S → Aa          S → abA
A → ab          A → a
left linear     right linear
Both grammars generate this language: {aba}
Convert this left linear grammar

original grammar:
S → Ab
S → Sb
A → Aa
A → a

make a new start symbol:
S0 → S
S → Ab
S → Sb
A → Aa
A → a
left linear

Right hand side has terminals:
2) If the left linear grammar has this rule A → p, then add the following
rule to the right linear grammar: S → pA
(from A → a:)
S0 → aA
right linear

Right hand side has a non-terminal:
3) If the left linear grammar has a rule B → Ap, add the following rule to
the right linear grammar: A → pB
(from S → Ab, A → Aa, S → Sb:)
S0 → aA
A → bS
A → aA
S → bS
right linear

Right hand side of the start symbol has a non-terminal:
4) If the left linear grammar has S → Ap, then add the following rule to
the right linear grammar: A → p
(from S0 → S:)
S0 → aA
A → bS
A → aA
S → bS
S → ε
right linear

Equivalent!
S0 → S          S0 → aA
S → Ab          A → bS
S → Sb          A → aA
A → Aa          S → bS
A → a           S → ε
left linear     right linear
Both grammars generate this language: {a+b+} (one or more a’s followed by
one or more b’s)
Will the algorithm always work?
•We have seen two examples where the algorithm creates a right linear
grammar that is equivalent to the left linear grammar.
•But will the algorithm always produce an equivalent grammar?
•Yes! The following slides show why.

Generate string p
•Let p = a string generated by the left linear grammar.
•We will show that the grammar generated by the algorithm also produces p.
Case 1: the start symbol produces p

Suppose the left linear grammar has this rule: S → p. Then the right linear
grammar will have the same rule (rule 1 of the algorithm). So the right
linear grammar will also produce p.
Case 2: multiple rules needed to produce p

Suppose p is produced by a sequence of n production rules:
S → A1p1
  → A2p2p1
  → A3p3p2p1
  → …
  → An-1pn-1…p3p2p1
  → pnpn-1…p3p2p1        (this final string is p)

Let’s see what right linear rules will be generated by the algorithm for
the rules implied by this production sequence. The left linear rules
implied by the sequence are:
S → A1p1
A1 → A2p2
A2 → A3p3
…
An-1 → pn

Algorithm:
1)If the left linear grammar contains S → p, then put that rule in the
right linear grammar.
2)If the left linear grammar contains A → p, then put this rule in the
right linear grammar: S → pA
3)If the left linear grammar contains B → Ap, then put this rule in the
right linear grammar: A → pB
4)If the left linear grammar contains S → Ap, then put this rule in the
right linear grammar: A → p

Feed the left linear rules to the algorithm, one at a time:

From S → A1p1, the algorithm adds (rule 4):
A1 → p1

From A1 → A2p2, the algorithm adds (rule 3):
A2 → p2A1
From A2 we obtain p2p1:
A2 → p2A1 → p2p1

From A2 → A3p3, the algorithm adds (rule 3):
A3 → p3A2
From A3 we obtain p3p2p1:
A3 → p3A2 → p3p2A1 → p3p2p1

…

From An-1 → pn, the algorithm adds (rule 2):
S → pnAn-1

Case 2 (concluded)

The complete set of right linear rules produced by the algorithm:
A1 → p1
A2 → p2A1
A3 → p3A2
…
An-1 → pn-1An-2
S → pnAn-1

From S we obtain pn…p2p1:
S → pnAn-1
  → pnpn-1An-2
  → …
  → pnpn-1…p3A2
  → pnpn-1…p3p2A1
  → pnpn-1…p3p2p1        (this is the desired string, p)

We have shown that for any string p generated by the left linear grammar,
the right linear grammar produced by the algorithm will also generate p.
How we understand the algorithm

left linear rules:        right linear rules (from the algorithm):
S → A1p1                  A1 → p1
A1 → A2p2                 A2 → p2A1
A2 → A3p3                 A3 → p3A2
…                         …
An-1 → pn                 An-1 → pn-1An-2
                          S → pnAn-1

The left linear rules descend through the non-terminals until reaching a
rule with only terminals on the RHS; those terminals are output, then we
unwind from the descent and output the terminals along the way.
The right linear grammar instead makes the rule with only terminals on the
RHS the starting rule and outputs its terminals first, then ascends through
the other rules. This is exactly what the algorithm’s rules capture:
–If the left linear grammar contains A → p, then put this rule in the
right linear grammar: S → pA
–If the left linear grammar contains B → Ap, then put this rule in the
right linear grammar: A → pB
–If the left linear grammar contains S → Ap, then put this rule in the
right linear grammar: A → p
Left-linear grammars generate Type 3 languages
•Every left-linear grammar can be converted to an equivalent right-linear
grammar.
–“Equivalent right-linear grammar” means the two grammars generate the
same language.
•Right-linear grammars generate Type 3 languages.
•Therefore, every left-linear grammar generates a Type 3 language.
Properties of Context-free Languages

1)Simplifying CFGs, Normal forms
2)Pumping lemma for CFLs
3)Closure and decision properties of CFLs

Three ways to simplify/clean a CFG

1.Eliminate useless symbols (clean)
2.Eliminate ε-productions (A => ε) (simplify)
3.Eliminate unit productions (A => B) (simplify)
Eliminating useless symbols
Grammar cleanup

A symbol X is reachable if there exists a derivation:
S ⇒* αXβ
A symbol X is generating if there exists a derivation:
X ⇒* w, for some w ∈ T*
For a symbol X to be “useful”, it has to be both reachable and generating:
S ⇒* αXβ ⇒* w’, for some w’ ∈ T*
  (reachable)  (generating)

Algorithm to detect useless symbols
1.First, eliminate all symbols that are not generating
2.Next, eliminate all symbols that are not reachable
Is the order of these steps important, or can we switch?
Example: Useless symbols

S → AB | a
A → b

1.A, S are generating
2.B is not generating (and therefore B is useless)
3.==> Eliminating B… (i.e., remove all productions that involve B):
  1.S → a
  2.A → b
4.Now, A is not reachable and therefore is useless
5.Simplified G:
  1.S → a

What would happen if you reverse the order, i.e., test reachability before
generating?
Will fail to remove: A → b
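The two-step cleanup above can be sketched as follows (an illustrative implementation with an assumed encoding: rules as (head, body-string) pairs, upper-case letters as variables, lower-case as terminals; not from the slides):

```python
# Remove useless symbols: first non-generating, then unreachable.
def generating(rules):
    gen = set()
    changed = True
    while changed:                 # fixed-point iteration (the basis is T)
        changed = False
        for head, body in rules:
            if head not in gen and all(s in gen or s.islower() for s in body):
                gen.add(head); changed = True
    return gen

def reachable(rules, start="S"):
    reach, frontier = {start}, [start]
    while frontier:                # graph search from S over rule bodies
        a = frontier.pop()
        for head, body in rules:
            if head == a:
                for s in body:
                    if s.isupper() and s not in reach:
                        reach.add(s); frontier.append(s)
    return reach

def remove_useless(rules, start="S"):
    gen = generating(rules)
    rules = [(h, b) for h, b in rules
             if h in gen and all(s.islower() or s in gen for s in b)]
    reach = reachable(rules, start)
    return [(h, b) for h, b in rules if h in reach]

# The slides' example: S -> AB | a, A -> b  simplifies to  S -> a
print(remove_useless([("S", "AB"), ("S", "a"), ("A", "b")]))
# -> [('S', 'a')]
```

Running the two steps in the reverse order would keep A → b, matching the slide's warning.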
Algorithm to find all generating symbols

Given: G=(V,T,P,S)
Basis:
Every symbol in T is obviously generating.
Induction:
Suppose there is a production A → α, where every symbol of α is generating.
Then, A is also generating.

Algorithm to find all reachable symbols

Given: G=(V,T,P,S)
Basis:
S is obviously reachable (from itself)
Induction:
Suppose there is a production A → X1X2…Xk, where A is reachable.
Then, all symbols on the right hand side, {X1, X2, …, Xk}, are also
reachable.
Eliminating ε-productions

A => ε

Caveat: It is not possible to eliminate ε-productions for languages which
include ε in their word set. So we will target the grammar for the rest of
the language.
Theorem: If G=(V,T,P,S) is a CFG for a language L, then L \ {ε} has a CFG
without ε-productions.

What’s the point of removing ε-productions?
Definition: A is “nullable” if A ⇒* ε
If A is nullable, then any production of the form B → CAD can be simulated
by:
B → CD | CAD
This can allow us to remove ε-productions for A (A → ε).

Algorithm to detect all nullable variables

Basis:
If A → ε is a production in G, then A is nullable
(note: A can still have other productions)
Induction:
If there is a production B → C1C2…Ck, where every Ci is nullable, then B
is also nullable.
Eliminating ε-productions

Given: G=(V,T,P,S)
Algorithm:
1.Detect all nullable variables in G
2.Then construct G1=(V,T,P1,S) as follows:
  i.For each production of the form A → X1X2…Xk, where k ≥ 1, suppose m
    out of the k Xi’s are nullable symbols
  ii.Then G1 will have 2^m versions for this production
    i.e., all combinations where each nullable Xi is either present or
    absent
  iii.Alternatively, if a production is of the form A → ε, then remove it
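The two phases above can be sketched in code (an illustrative implementation with an assumed encoding: rules as (head, body-string) pairs, "" standing for ε, upper-case letters as variables; not from the slides):

```python
from itertools import combinations

# Detect nullable variables by fixed-point iteration.
def nullables(rules):
    null = set()
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            # body == "" (A -> ε) is the basis; all-nullable bodies the induction
            if head not in null and all(s in null for s in body):
                null.add(head); changed = True
    return null

# Emit the 2^m versions of each production, dropping resulting ε-bodies.
def eliminate_epsilon(rules):
    null = nullables(rules)
    new = set()
    for head, body in rules:
        idx = [i for i, s in enumerate(body) if s in null]
        for r in range(len(idx) + 1):
            for drop in combinations(idx, r):   # each nullable present/absent
                b = "".join(s for i, s in enumerate(body) if i not in drop)
                if b:                            # discard A -> ε
                    new.add((head, b))
    return sorted(new)

# The slides' example: S -> AB, A -> aAA | ε, B -> bBB | ε
rules = [("S", "AB"), ("A", "aAA"), ("A", ""), ("B", "bBB"), ("B", "")]
for head, body in eliminate_epsilon(rules):
    print(head, "->", body)
```

The output reproduces the slide's G1: S → A | B | AB, A → a | aA | aAA, B → b | bB | bBB.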
Example: Eliminating ε-productions

Let L be the language represented by the following CFG G:
i.  S → AB
ii. A → aAA | ε
iii.B → bBB | ε

Goal: to construct G1, which is the grammar for L - {ε}
Nullable symbols: {A, B}
G1 can be constructed from G as follows:
B → b | bB | bB | bBB
==> B → b | bB | bBB
Similarly, A → a | aA | aAA
Similarly, S → A | B | AB
Note: L(G) = L(G1) U {ε}

Simplified grammar G1:
•S → A | B | AB
•A → a | aA | aAA
•B → b | bB | bBB
Eliminating unit productions

A => B (B has to be a variable)

What’s the point of removing unit productions?
Will save #substitutions. E.g.,

before:                   after:
A => B | …                A => xxx | yyy | zzz | …
B => C | …                B => xxx | yyy | zzz | …
C => D | …                C => xxx | yyy | zzz | …
D => xxx | yyy | zzz      D => xxx | yyy | zzz
Eliminating unit productions

A unit production is one which is of the form A → B, where both A & B are
variables.
E.g.,
1.E → T | E+T
2.T → F | T*F
3.F → I | (E)
4.I → a | b | Ia | Ib | I0 | I1

How to eliminate unit productions?
Replace E → T with E → F | T*F
Then, upon recursive application wherever there is a unit production:
E → F | T*F | E+T                              (substituting for T)
E → I | (E) | T*F | E+T                        (substituting for F)
E → a | b | Ia | Ib | I0 | I1 | (E) | T*F | E+T  (substituting for I)
Now, E has no unit productions.
Similarly, eliminate for the remainder of the unit productions.
The Unit Pair Algorithm: to remove unit productions

Suppose A → B1 → B2 → … → Bn → α
Action: Replace all the intermediate productions so as to produce α
directly:
i.e., A → α; B1 → α; … Bn → α

Definition: (A,B) is a “unit pair” if A ⇒* B using only unit productions.
We can find all unit pairs inductively:
Basis: Every pair (A,A) is a unit pair (by definition). Similarly, if
A → B is a production, then (A,B) is a unit pair.
Induction: If (A,B) and (B,C) are unit pairs, then (A,C) is also a unit
pair.

The Unit Pair Algorithm: to remove unit productions

Input: G=(V,T,P,S)
Goal: to build G1=(V,T,P1,S) devoid of unit productions
Algorithm:
1.Find all unit pairs in G
2.For each unit pair (A,B) in G:
  1.Add to P1 a new production A → α, for every B → α which is a non-unit
    production
  2.If a resulting production is already there in P, then there is no need
    to add it.
Example: eliminating unit productions

G:
1.E → T | E+T
2.T → F | T*F
3.F → I | (E)
4.I → a | b | Ia | Ib | I0 | I1

Unit pairs, and the non-unit productions to be added to P1:
(E,E): E → E+T
(E,T): E → T*F
(E,F): E → (E)
(E,I): E → a | b | Ia | Ib | I0 | I1
(T,T): T → T*F
(T,F): T → (E)
(T,I): T → a | b | Ia | Ib | I0 | I1
(F,F): F → (E)
(F,I): F → a | b | Ia | Ib | I0 | I1
(I,I): I → a | b | Ia | Ib | I0 | I1

G1:
1.E → E+T | T*F | (E) | a | b | Ia | Ib | I0 | I1
2.T → T*F | (E) | a | b | Ia | Ib | I0 | I1
3.F → (E) | a | b | Ia | Ib | I0 | I1
4.I → a | b | Ia | Ib | I0 | I1
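The unit pair algorithm can be sketched in code (an illustrative implementation with an assumed encoding: productions as (head, body-tuple) pairs; not from the slides):

```python
# Unit pair algorithm: compute all unit pairs, then copy over every
# non-unit production of B to A for each unit pair (A,B).
def eliminate_units(rules, variables):
    is_unit = lambda body: len(body) == 1 and body[0] in variables
    pairs = {(A, A) for A in variables}                       # basis
    pairs |= {(A, body[0]) for A, body in rules if is_unit(body)}
    changed = True
    while changed:                    # induction: transitive closure
        changed = False
        for (A, B) in list(pairs):
            for (B2, C) in list(pairs):
                if B == B2 and (A, C) not in pairs:
                    pairs.add((A, C)); changed = True
    return sorted({(A, body) for (A, B) in pairs
                   for (head, body) in rules
                   if head == B and not is_unit(body)})

# A small slice of the slides' expression grammar.
rules = [("E", ("T",)), ("E", ("E", "+", "T")),
         ("T", ("F",)), ("T", ("T", "*", "F")),
         ("F", ("I",)), ("F", ("(", "E", ")")),
         ("I", ("a",)), ("I", ("b",))]
for head, body in eliminate_units(rules, {"E", "T", "F", "I"}):
    print(head, "->", " ".join(body))
```

The result contains, e.g., E → E+T | T*F | (E) | a | b, with no unit productions left.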
Putting all this together…

Theorem: If G is a CFG for a language that contains at least one string
other than ε, then there is another CFG G1, such that L(G1) = L(G) - {ε},
and G1 has:
  no ε-productions
  no unit productions
  no useless symbols

Algorithm:
Step 1) eliminate ε-productions
Step 2) eliminate unit productions
Step 3) eliminate useless symbols
Again, the order is important!
Normal Forms
Why normal forms?

If all productions of the grammar could be expressed in the same form(s),
then:
a.It becomes easy to design algorithms that use the grammar
b.It becomes easy to show proofs and properties

Chomsky Normal Form (CNF)

Let G be a CFG for some L - {ε}
Definition:
G is said to be in Chomsky Normal Form if all its productions are in one
of the following two forms:
i. A → BC, where A, B, C are variables, or
ii.A → a, where a is a terminal
Also:
  G has no useless symbols
  G has no unit productions
  G has no ε-productions
How to convert a G into CNF?

Assumption: G has no ε-productions, unit productions or useless symbols

1) For every terminal a that appears in the body of a production:
  i. create a unique variable, say Xa, with a production Xa → a, and
  ii.replace all other instances of a in G by Xa
2) Now, all productions will be in one of the following two forms:
  A → B1B2…Bk (k ≥ 2), or A → a
3) Replace each production of the form A → B1B2B3…Bk (k ≥ 3) by a chain of
two-variable productions, using new variables C1, C2, …:
  A → B1C1
  C1 → B2C2
  …
  Ck-2 → Bk-1Bk
and so on…
Example #1

G:
S => AS | BABC
A => A1 | 0A1 | 01
B => 0B | 0
C => 1C | 1

G in CNF:
X0 => 0
X1 => 1
S => AS | BY1
Y1 => AY2
Y2 => BC
A => AX1 | X0Y3 | X0X1
Y3 => AX1
B => X0B | 0
C => X1C | 1

All productions are of the form A => BC or A => a.
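Step 3 of the conversion (breaking long bodies into two-variable chains) can be sketched as follows (an illustrative implementation; the fresh-variable names C1, C2, … are assumed, and terminals are taken to have been replaced by variables already, as in step 1):

```python
# CNF step 3: split A -> B1 B2 ... Bk (k >= 3) into a chain of pairs.
def split_long(rules):
    out, count = [], 0
    for head, body in rules:
        while len(body) > 2:
            count += 1
            fresh = f"C{count}"                 # assumed fresh-variable naming
            out.append((head, (body[0], fresh)))  # A -> B1 C1
            head, body = fresh, body[1:]          # continue with C1 -> B2...Bk
        out.append((head, tuple(body)))
    return out

# S -> A S | B A B C (terminals already replaced by variables)
rules = [("S", ("A", "S")), ("S", ("B", "A", "B", "C"))]
print(split_long(rules))
# -> [('S', ('A', 'S')), ('S', ('B', 'C1')), ('C1', ('A', 'C2')), ('C2', ('B', 'C'))]
```

This mirrors the slide's Y1, Y2 variables for S => BABC, just with different fresh names.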
Example #2

G:
1.E → E+T | T*F | (E) | Ia | Ib | I0 | I1
2.T → T*F | (E) | Ia | Ib | I0 | I1
3.F → (E) | Ia | Ib | I0 | I1
4.I → a | b | Ia | Ib | I0 | I1

Step (1): introduce a variable Xa (with Xa → a) for every terminal a in a
body:
1.E → EX+T | TX*F | X(EX) | IXa | IXb | IX0 | IX1
2.T → TX*F | X(EX) | IXa | IXb | IX0 | IX1
3.F → X(EX) | IXa | IXb | IX0 | IX1
4.I → Xa | Xb | IXa | IXb | IX0 | IX1
5.X+ → +
6.X* → *
7.X( → (
8.X) → )
…….

Step (2): break the long bodies into pairs:
1.E → EC1 | TC2 | X(C3 | IXa | IXb | IX0 | IX1
2.C1 → X+T
3.C2 → X*F
4.C3 → EX)
5.T → ……
….
Languages with ε

For languages that include ε:
  Write down the rest of the grammar in CNF
  Then add the production “S => ε” at the end

E.g., consider G:
S => AS | BABC
A => A1 | 0A1 | 01 | ε
B => 0B | 0 | ε
C => 1C | 1 | ε

G in CNF:
X0 => 0
X1 => 1
S => AS | BY1 | ε
Y1 => AY2
Y2 => BC
A => AX1 | X0Y3 | X0X1
Y3 => AX1
B => X0B | 0
C => X1C | 1
Other Normal Forms

Greibach Normal Form (GNF)
All productions of the form A ==> aα, where a is a terminal and α is a
string of zero or more variables

Return of the Pumping Lemma !!

Think of languages that cannot be CFLs
== think of languages for which a stack will not be enough
e.g., the language of strings of the form ww
Why pumping lemma?
A result that will be useful in proving
languages that are not CFLs
(just like we did for regular languages)
But before we prove the pumping
lemma for CFLs ….
Let us first prove an important property
about parse trees
The “parse tree theorem”

Given:
Suppose we have a parse tree for a string w, according to a CNF grammar
G=(V,T,P,S). Let h be the height of the parse tree.
Implies:
|w| ≤ 2^(h-1)

(Figure: the parse tree for w, with its longest root-to-leaf path
S = A0, A1, A2, …, Ah-1, ending in a terminal a; h = tree height.)

In other words, a CNF parse tree’s string yield (w) can be no longer than
2^(h-1).
Observe that any parse tree generated by a CNF grammar will be a binary
tree, where all internal nodes have exactly two children (except those
nodes connected to the leaves).

Implication of the Parse Tree Theorem (assuming CNF)

Fact:
If the height of a parse tree is h, then |w| ≤ 2^(h-1)
Implication:
If |w| ≥ 2^m, then its parse tree’s height is at least m+1
The Pumping Lemma for CFLs

Let L be a CFL.
Then there exists a constant N, s.t. if z ∈ L and |z| ≥ N, then we can
write z = uvwxy, such that:
1.|vwx| ≤ N
2.vx ≠ ε
3.For all k ≥ 0: u v^k w x^k y ∈ L
Note: we are pumping in two places (v & x)

Proof: Pumping Lemma for CFLs

If L = Φ or contains only ε, then the lemma is trivially satisfied (as it
cannot be violated).
For any other L which is a CFL:
  Let G be a CNF grammar for L
  Let m = number of variables in G
  Choose N = 2^m
  Pick any z ∈ L s.t. |z| ≥ N
==> the parse tree for z has height ≥ m+1
  (by the parse tree theorem)
Parse tree for z

Consider the longest root-to-leaf path in the parse tree for z:
S = A0, A1, A2, …, Ah, where h ≥ m+1 and Ah yields a terminal a.
The last m+1 variables on this path cannot all be distinct (G has only m
variables), so there is a repetition: Ai = Aj, for some h-m ≤ i < j ≤ h.
Meaning: repetition in the last m+1 variables.

Write z = uvwxy, where:
  w = the yield of Aj’s subtree
  vwx = the yield of Ai’s subtree
  u and y = the parts of z to the left and right of vwx
Since the grammar is in CNF and Ai’s subtree properly contains Aj’s
subtree, v and x cannot both be empty.
•Therefore, vx ≠ ε

Extending the parse tree…

Because Ai = Aj, we can replace Aj’s subtree with a copy of Ai’s subtree
(k times), yielding parse trees for u v^k w x^k y for every k ≥ 1.
Or, replacing Ai’s subtree with Aj’s subtree yields a parse tree for
z = uwy (the k = 0 case).
==> For all k ≥ 0: u v^k w x^k y ∈ L

Proof contd..

•Also, since Ai’s subtree is no taller than m+1, the string generated
under Ai’s subtree, which is vwx, cannot be longer than 2^m (= N).
==> |vwx| ≤ N
This completes the proof of the pumping lemma.
Application of the Pumping Lemma for CFLs

Example 1: L = { a^m b^m c^m | m > 0 }
Claim: L is not a CFL
Proof:
  Let N <== the pumping lemma constant
  Pick z = a^N b^N c^N
  Apply the pumping lemma to z and show that there exists at least one
  other string constructed from z (obtained by pumping up or down) that
  is ∉ L

Proof contd…
z = uvwxy
As z = a^N b^N c^N and |vwx| ≤ N and vx ≠ ε
==> v, x cannot together contain all three symbols (a, b, c)
==> we can pump up or pump down to build another string which is ∉ L
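The argument can be checked mechanically for a small N (an illustration, not part of the proof): for every legal decomposition z = uvwxy, pumping up once already leaves L, because v and x together touch at most two of the three symbols.

```python
# Brute-force check of the pumping argument for L = { a^m b^m c^m }, N = 4.
def in_L(s):
    m = len(s) // 3
    return m > 0 and s == "a" * m + "b" * m + "c" * m

N = 4
z = "a" * N + "b" * N + "c" * N
for i in range(len(z) + 1):              # z = u v w x y, with
    for j in range(i, len(z) + 1):       # v = z[i:j], w = z[j:k], x = z[k:l]
        for k in range(j, len(z) + 1):
            for l in range(k, len(z) + 1):
                u, v, w, x, y = z[:i], z[i:j], z[j:k], z[k:l], z[l:]
                if len(v + w + x) <= N and v + x != "":
                    # pumping up once leaves L for EVERY legal decomposition:
                    # some symbol's count stays N while the length grows
                    assert not in_L(u + v * 2 + w + x * 2 + y)
print("no legal decomposition of z survives pumping")
```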
Example #2 for pumping lemma application

L = { ww | w is in {0,1}* }
Show that L is not a CFL
  Try string z = 0^N 0^N — what happens?
  Try string z = 0^N 1^N 0^N 1^N — what happens?

Other examples
L = { 0^(k^2) | k is any integer }
L = { a^i b^j c^k | i < j < k }
CFL Closure Property Results

CFLs are closed under:
  Union
  Concatenation
  Kleene closure operator
  Substitution
  Homomorphism, inverse homomorphism
  Reversal
CFLs are not closed under:
  Intersection
  Difference
  Complementation
(Note: regular languages are closed under all of these operators.)

Strategy for Closure Property Proofs

First prove “closure under substitution”.
Using that result, prove the other closure properties:
CFLs are closed under:
  Union
  Concatenation
  Kleene closure operator
  Substitution   <== prove this first
  Homomorphism, inverse homomorphism
  Reversal
The Substitution operation

For each a ∈ ∑, let s(a) be a language.
If w = a1a2…an ∈ L, then:
s(w) = { x1x2…xn } ⊆ s(L), s.t. xi ∈ s(ai)

Example:
Let ∑ = {0,1}
Let: s(0) = { a^n b^n | n ≥ 1 }, s(1) = {aa, bb}
If w = 01, s(w) = s(0).s(1)
E.g., s(w) contains abaa, abbb, aabbaa, aabbbb, … and so on.

Note: s(L) can use a different alphabet.
CFLs are closed under Substitution

IF L is a CFL, and a substitution s defined on L is s.t. s(a) is a CFL for
every symbol a, THEN s(L) is also a CFL.

What is s(L)? If L = {w1, w2, w3, w4, …}, then
s(L) = s(w1) U s(w2) U s(w3) U s(w4) U …
Note: each s(wi) is itself a set of strings.

Proof idea:
G=(V,T,P,S): CFG for L
Because every s(a) is a CFL, there is a CFG for each s(a):
  Let Ga = (Va, Ta, Pa, Sa)
Construct G’=(V’,T’,P’,S) for s(L).
P’ consists of:
  The productions of P, but with every occurrence of terminal “a” in their
  bodies replaced by Sa.
  All productions in any Pa, for any a ∈ ∑
Parse tree for G’: a parse tree of G for some w = a1a2…an, whose leaves
Sa1, Sa2, …, San are expanded into subtrees yielding x1, x2, …, xn, with
each xi ∈ s(ai).
Substitution of a CFL: example

Let L = the language of binary palindromes, with substitutions for 0 and 1
defined as follows:
s(0) = { a^n b^n | n ≥ 1 }, s(1) = {xx, yy}
Prove that s(L) is also a CFL.

CFG for L:
S => 0S0 | 1S1 | ε
CFG for s(0):
S0 => aS0b | ab
CFG for s(1):
S1 => xx | yy

Therefore, CFG for s(L):
S => S0SS0 | S1SS1 | ε
S0 => aS0b | ab
S1 => xx | yy
CFLs are closed under union

Let L1 and L2 be CFLs. To show: L1 U L2 is also a CFL.

Let us show this using the result on substitution:
Make a new language Lnew = {a, b} s.t. s(a) = L1 and s(b) = L2
==> s(Lnew) is the same as L1 U L2

A more direct, alternative proof:
Let S1 and S2 be the starting variables of the grammars for L1 and L2.
Then, Snew => S1 | S2

CFLs are closed under concatenation

Let L1 and L2 be CFLs.
Let us show this using the result on substitution:
Make Lnew = {ab} s.t. s(a) = L1 and s(b) = L2
==> L1L2 = s(Lnew)
A proof without using substitution?

CFLs are closed under Kleene Closure

Let L be a CFL.
Let Lnew = {a}* and s(a) = L
Then, L* = s(Lnew)
CFLs are closed under Reversal

(We won’t use substitution to prove this result.)
Let L be a CFL, with grammar G=(V,T,P,S)
For L^R, construct G^R=(V,T,P^R,S) s.t.:
If A ==> α is in P, then A ==> α^R is in P^R
(that is, reverse the body of every production)
Some negative closure results

CFLs are not closed under Intersection

Existential proof:
L1 = { 0^n 1^n 2^i | n ≥ 1, i ≥ 1 }
L2 = { 0^i 1^n 2^n | n ≥ 1, i ≥ 1 }
Both L1 and L2 are CFLs (grammars?)
But L1 ∩ L2 = { 0^n 1^n 2^n | n ≥ 1 } cannot be a CFL
Why? (It fails the pumping lemma, just like Example 1.)
We have an example where intersection is not closed.
Therefore, CFLs are not closed under intersection.
CFLs are not closed under complementation

Follows from the fact that CFLs are not closed under intersection.
By De Morgan’s law:
L1 ∩ L2 = complement( complement(L1) U complement(L2) )
Logic: if CFLs were closed under complementation, then
  the whole right hand side would be a CFL (because CFLs are closed under
  union), and so
  the left hand side (the intersection) would also be a CFL,
  but we just showed CFLs are NOT closed under intersection!
Therefore, CFLs cannot be closed under complementation.

CFLs are not closed under difference

Follows from the fact that CFLs are not closed under complementation.
Because, if CFLs were closed under difference, then:
complement(L) = ∑* - L
so complement(L) would have to be a CFL too. Contradiction.
Decision Properties

Decidable problems for CFLs:
  Emptiness test
  Generating test
  Reachability test
  Membership test
  PDA acceptance

“Undecidable” problems for CFLs

Is a given CFG G ambiguous?
Is a given CFL inherently ambiguous?
Is the intersection of two CFLs empty?
Are two CFLs the same?
Is a given L(G) equal to ∑*?
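The membership test above is typically decided with the CYK algorithm, which needs the grammar in CNF. A minimal sketch (the dict-based grammar encoding is an assumption for illustration, not from the slides):

```python
# CYK membership test for a grammar in CNF.
def cyk(word, unary, binary, start="S"):
    """unary: {terminal: set of variables}; binary: {(B,C): set of variables}."""
    n = len(word)
    # table[i][j] = set of variables deriving word[i:i+j+1]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(word):                    # length-1 substrings
        table[i][0] = set(unary.get(ch, ()))
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):           # try every split point
                for B in table[i][split - 1]:
                    for C in table[i + split][length - split - 1]:
                        table[i][length - 1] |= binary.get((B, C), set())
    return start in table[0][n - 1]

# A CNF grammar (assumed) for { 0^n 1^n | n >= 1 }:
# S -> X0 Y | X0 X1, Y -> S X1, X0 -> 0, X1 -> 1
unary = {"0": {"X0"}, "1": {"X1"}}
binary = {("X0", "Y"): {"S"}, ("S", "X1"): {"Y"}, ("X0", "X1"): {"S"}}
print(cyk("000111", unary, binary))  # True
print(cyk("0011", unary, binary))    # True
print(cyk("0101", unary, binary))    # False
```

This also illustrates why CNF is a convenient normal form: every body has exactly one or two symbols, so the dynamic-programming table only ever combines two cells.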