TCS GOLDEN NOTES THEORY OF COMPUTATION .pdf

userqwerty2612 67 views 233 slides Sep 23, 2024

About This Presentation

TCS , THEORY OF COMPUTATION


Slide Content

Theory of computer science
Unit 1
Basic Concepts and Finite Automata

Overview
●Importance of TCS
●Alphabets, Strings, Languages, Closure properties.
●Finite Automata (FA) and Finite State machine (FSM).
●Deterministic Finite Automata (DFA) and Nondeterministic Finite Automata
(NFA): Definitions, transition diagrams and Language recognizers
●NFA to DFA Conversion
●Equivalence between NFA with and without ε- transitions
●Minimization of DFA
●FSM with output: Moore and Mealy machines, Equivalence
●Applications and limitations of FA

What is automata theory ?
●Study of abstract computing devices, or “machines”
●Automaton = an abstract computing device
●Note: A “device” need not even be a physical hardware!
●A fundamental question in computer science:
●Find out what different models of machines can do and cannot do
●The theory of computation

Theory of Computation: A Historical Perspective

Alphabet
An alphabet is a finite, non-empty set of symbols
We use the symbol ∑ (sigma) to denote an alphabet
Examples:
1.Binary: ∑ = {0,1}
2.All lower case letters: ∑ = {a,b,c,..z}
3.Alphanumeric: ∑ = {a-z, A-Z, 0-9}
4.DNA molecule letters: ∑ = {a,c,g,t}

Strings
A string or word is a finite sequence of symbols chosen from ∑
1.Empty string is ε (or “epsilon”): empty string is the string with zero
occurrences of symbols.

2.Length of a string w, denoted by |w|, is equal to the number of (non-ε)
characters in the string
E.g., x = 010100, |x| = 6
x = 01ε0ε1ε00ε, |x| = ? (answer: 6, since the ε's contribute nothing)
NOTE: |ε| = 0

Power of an alphabet
Let ∑ be an alphabet.
∑^k = the set of all strings of length k over ∑
∑* = ∑^0 U ∑^1 U ∑^2 U … is the set of all strings over an alphabet ∑
∑^+ = ∑^1 U ∑^2 U ∑^3 U …
NOTE: ∑^0 = {ε} regardless of what alphabet ∑ is, i.e., ε is the only string whose
length is 0.
For ∑ = {0,1}:
∑^1 = {0,1}
∑^2 = {00,01,10,11} and so on…

Language
L is said to be a language over alphabet ∑, only if L ⊆ ∑*
This is because ∑* is the set of all strings (of all possible length including 0)
over the given alphabet ∑
Examples:
1.Let L be the language of all strings consisting of n 0’s followed by n 1’s:
L = {ε, 01, 0011, 000111, …}
2.Let L be the language of all strings with an equal number of 0’s and 1’s:
L = {ε, 01, 10, 0011, 1100, 0101, 1010, 1001, …}

Question

Finite Automata
Some Applications:
●Software for designing and checking the behavior of digital circuits
●Lexical analyzer of a typical compiler
●Software for scanning large bodies of text (e.g., web pages) for pattern
finding
●Software for verifying systems of all types that have a finite number of states
(e.g., stock market transaction, communication/network protocol)

●“Regular Grammars” are exactly the ones that can be described by finite automata.
●A finite automaton has a set of states, and its control moves from state to state in
response to external inputs.
●One of the crucial distinctions among classes of finite automata is whether that control
is deterministic, meaning that the automaton cannot be in more than one state at any one
time, or nondeterministic, meaning that it may be in several states at once.
●Adding nondeterminism does not let us define any language that cannot be defined by a
deterministic finite automaton, but there can be substantial efficiency in describing an
application using a nondeterministic automaton.
●Nondeterminism allows us to program solutions to problems using a higher-level
language. The nondeterministic finite automaton is then compiled, by an algorithm we
will study, into a deterministic automaton that can be executed on a conventional computer.

Contd..
●Informally, a state diagram that comprehensively captures all possible states and
transitions that a machine can take while responding to a stream or sequence of
input symbols.
●Recognizer for “Regular Languages”
1.Deterministic Finite Automata (DFA)
The machine can exist in only one state at any given time
2.Non-deterministic Finite Automata (NFA)
The machine can exist in multiple states at the same time

Deterministic Finite Automata- Definition

A Deterministic Finite Automaton (DFA) consists of:
Q ==> a finite set of states
Σ ==> a finite set of input symbols (alphabet)
q0 ==> a start state
F ==> set of accepting states (F ⊆ Q)
δ ==> a transition function, which is a mapping Q × Σ ==> Q
The transition function takes as arguments a state and an input symbol & returns a
state
A DFA is defined by the 5-tuple: (Q, Σ, δ, q0, F)

Graph representation of DFA
●Nodes = states.
●Arcs represent transition function: Arc from state p to state q labeled by all those
input symbols that have transitions from p to q.
●Arrow labeled “Start” to the start state.
●Final states indicated by double circles

What does a DFA do on reading an input string?

●Input: a word w in Σ*
●Question: Is w acceptable by the DFA?
●Steps:
1.Start at the “start state” q0.
2.For every input symbol in the sequence w do, Compute the next state from the current
state, given the current input symbol in w and the transition function.
3.If after all symbols in w are consumed, the current state is one of the accepting states
(F) then accept w;
4.Otherwise, reject w.
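The four steps above can be sketched directly in code. This is a minimal illustration, not from the slides; the transition-dict representation and the state names are assumptions (the example DFA is the "contains 01 as a substring" machine discussed later).

```python
# A minimal sketch of the DFA acceptance procedure, assuming the transition
# function is given as a dict mapping (state, symbol) -> state.
def dfa_accepts(delta, start, finals, w):
    """Follow delta from the start state on each symbol of w; accept iff we end in finals."""
    state = start
    for symbol in w:
        state = delta[(state, symbol)]  # deterministic: exactly one next state
    return state in finals

# Illustrative DFA accepting all binary strings containing "01" as a substring
# (q0: nothing useful seen, q1: last symbol was 0, q2: "01" already seen).
delta = {
    ('q0', '0'): 'q1', ('q0', '1'): 'q0',
    ('q1', '0'): 'q1', ('q1', '1'): 'q2',
    ('q2', '0'): 'q2', ('q2', '1'): 'q2',
}
print(dfa_accepts(delta, 'q0', {'q2'}, '1101'))  # True
print(dfa_accepts(delta, 'q0', {'q2'}, '110'))   # False
```

Note that the loop consumes every symbol of w before the membership test, exactly as in steps 2-4 above.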

Simple notations for DFA
●A transition diagram

Example





The transition diagram for the DFA accepting all strings with a substring 01

●A transition table:

Example

Regular Grammar
●Let L(A) be a language recognized by a DFA A, then L(A) is called a “Regular
Language”.
●Locate regular languages in the Chomsky Hierarchy.

The Chomsky Hierarchy

Example 1
●Build a DFA for the following language:
L = {w | w is a binary string that contains 01 as a substring}
●Steps for building a DFA to recognize L:
1.Σ = {0,1}
2.Decide on the states: Q
3.Designate start state and final state(s)
4.δ: Decide on the transitions:
●“Final” states == same as “accepting states”
●Other states == same as “non-accepting states”

DFA for strings containing 01

Extended transition function

Example

Delta hat

Language of a DFA
A DFA A accepts string w if there is a path from q0 to an accepting (or final)
state that is labeled by w
i.e., L(A) = { w | δ̂(q0, w) is in F }
i.e., L(A) = all strings that lead to an accepting state from q0

Non Deterministic Finite automata
A Non-deterministic Finite Automaton (NFA) is of course “non-deterministic”
●Implying that the machine can exist in more than one state at the same time
●Transitions could be non-deterministic

Contd...
●The NFAs accept exactly the regular languages, just as DFAs do.
●They are often more succinct and easier to design than DFAs.
●We can always convert an NFA to a DFA, though the latter may have exponentially more
states than the NFA.
●The difference between the DFA and the NFA is in the type of transition function.

Contd..
A Non-deterministic Finite Automaton (NFA) consists of:
Q ==> a finite set of states
Σ ==> a finite set of input symbols (alphabet)
q0 ==> a start state
F ==> set of accepting states
δ ==> a transition function, which is a mapping Q × Σ ==> 2^Q (a subset of Q)
An NFA is also defined by the 5-tuple: (Q, Σ, δ, q0, F)

How to use an NFA ?
●Input: a word w in Σ*
●Question: Is w acceptable by the NFA?
●Steps:
1.Start at the “start state” q0
2.For every input symbol in the sequence w do:
Determine all possible next states from all current states, given the current input
symbol in w and the transition function
3.If after all symbols in w are consumed and if at least one of the current states is a final
state then accept w;
4.Otherwise, reject w.
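The steps above differ from the DFA case only in tracking a *set* of current states. A minimal sketch, assuming a dict representation where each entry maps to a set of successor states; the example NFA (for strings ending in "01") matches the slide that follows, but its state names are assumptions.

```python
# Sketch of NFA acceptance: maintain the set of all states the machine
# could be in after each symbol.
def nfa_accepts(delta, start, finals, w):
    current = {start}
    for symbol in w:
        # union of all possible next states from all current states
        current = set().union(*(delta.get((q, symbol), set()) for q in current))
    return bool(current & finals)  # accept iff at least one current state is final

# NFA for strings ending in "01": q0 loops on 0/1 and "guesses" when the final 01 starts.
ndelta = {
    ('q0', '0'): {'q0', 'q1'}, ('q0', '1'): {'q0'},
    ('q1', '1'): {'q2'},
}
print(nfa_accepts(ndelta, 'q0', {'q2'}, '11101'))  # True
print(nfa_accepts(ndelta, 'q0', {'q2'}, '0110'))   # False
```

Missing entries in the dict play the role of the empty-set (Φ) entries in the transition table.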

Example
An NFA accepting all strings that end in 01

●Transition tables can be used to specify the transition function for an NFA as
well as for a DFA.
●The only difference is that each entry in the table for the NFA is a set, even if
the set is a singleton (has just one member).
●Also, when there is no transition at all from a given state on a given input
symbol, the proper entry is Φ, the empty set.

Transition function for input 00101

Extension of δ to NFA paths

Language of a NFA
●An NFA accepts w if there exists at least one path from the start state to an accepting
(or final) state that is labeled by w
●Formally, if N = (Q, Σ, δ, q0, F) is an NFA, then
L(N) = { w | δ̂(q0, w) ∩ F ≠ Φ }
which means that L(N) is the set of strings w in ∑* such that δ̂(q0, w) contains at least
one accepting state.

What is an error state?
NOTE: Omitting to explicitly show error states is just a matter of design convenience
(one that is generally followed for NFAs); this feature should not be confused with
the notion of non-determinism.

●A DFA in practice has about as many states as the NFA, although it often has more
transitions.
●In the worst case, however, the smallest DFA can have 2^n states while the smallest
NFA for the same language has only n states.
●Every language that can be described by some NFA can also be described by some
DFA.
●The proof that DFAs can do whatever NFAs can do involves an important
construction called the subset construction because it involves constructing all subsets
of the set of states of the NFA.
● In general, many proofs about automata involve constructing one automaton from
another.
●It is important for us to observe the subset construction as an example of how one
formally describes one automaton in terms of the states and transitions of another,
without knowing the specifics of the latter automaton

Equivalence of DFA and NFA
Theorem: A language L is accepted by a DFA if and only if it is accepted by an NFA.
Proof:
1.If part:
Prove by showing every NFA can be converted to an equivalent DFA
2.Only-if part is trivial:
Every DFA is a special case of an NFA where each state has exactly one transition for
every input symbol. Therefore, if L is accepted by a DFA, it is accepted by a corresponding
NFA.

PROOF OF IF PART

●Surprisingly, for any NFA there is a DFA that accepts the same language.
●Proof is the subset construction.
●The number of states of the DFA can be exponential in the number of states of
the NFA.
●Thus, NFA’s accept exactly the regular languages.

NFA to DFA subset construction

Notice that this transition table belongs to a deterministic finite automaton. Even though the
entries in the table are sets, those sets are the states of the constructed DFA. To make the point
clearer we can invent new names for these states, e.g., A for Φ, B for {q0}, and so on. This makes
it clear that each entry in the table is a single state of the DFA.
1.Out of the 8 states in the figure, starting in the start state B we can only reach states B, E
and F.
2.The other 5 states are inaccessible from the start state.

Resultant DFA constructed from NFA that accepts the strings ending with 01.
NOTE
1.Enumerate all possible subsets.
2.Determine transitions
3.Retain only those states reachable from {q0}
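The "retain only reachable subsets" idea is exactly lazy creation: start from {q0} and only build subset-states as they are discovered. A sketch, assuming the same dict representation as before; the NFA used is the ending-in-01 example:

```python
# Lazy subset construction: DFA states are frozensets of NFA states,
# created only when first reached.
def subset_construction(ndelta, start, finals, alphabet):
    start_set = frozenset({start})
    dfa_delta, todo, seen = {}, [start_set], {start_set}
    while todo:
        S = todo.pop()
        for a in alphabet:
            # DFA successor = union of NFA successors over all states in S
            T = frozenset().union(*(ndelta.get((q, a), set()) for q in S))
            dfa_delta[(S, a)] = T
            if T not in seen:
                seen.add(T); todo.append(T)
    dfa_finals = {S for S in seen if S & finals}
    return dfa_delta, start_set, dfa_finals

# NFA for strings ending in "01"
ndelta = {('q0', '0'): {'q0', 'q1'}, ('q0', '1'): {'q0'}, ('q1', '1'): {'q2'}}
delta, start, finals = subset_construction(ndelta, 'q0', {'q2'}, '01')
print(len({s for s, _ in delta}))  # 3 reachable subset-states, not 2^3 = 8
```

Only 3 of the 8 possible subsets are ever created, mirroring the observation that the other 5 are inaccessible.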

NFA to DFA: Repeating the example using LAZY
CREATION

Correctness of subset creation

Applications

Few properties of DFA and NFA

FA with epsilon transitions

To simulate any transition:
Step 1) Go to all immediate destination states.
Step 2) From there go to all their epsilon-closure states
as well.

Closure of states
●CL(q) = set of states you can reach from state q following only arcs labeled ε .
●Example: CL(A) = {A}; CL(E) = {B, C, D, E}.
●Closure of a set of states = union of the closure of each state.
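CL(q) can be computed by a simple graph search over the ε-arcs. A sketch, assuming ε-transitions are given as a dict from a state to its set of direct ε-successors; the example reproduces the slide's CL(A) = {A} and CL(E) = {B, C, D, E} (the arc structure chosen here is an assumption consistent with those closures):

```python
# ε-closure of a state q: all states reachable from q following only ε-arcs.
def eps_closure(eps, q):
    closure, todo = {q}, [q]
    while todo:
        for nxt in eps.get(todo.pop(), set()):
            if nxt not in closure:
                closure.add(nxt); todo.append(nxt)
    return closure

eps = {'E': {'B', 'C'}, 'B': {'D'}}          # A has no ε-arcs
print(eps_closure(eps, 'A'))                 # {'A'}
print(eps_closure(eps, 'E') == {'B', 'C', 'D', 'E'})  # True
```

The closure of a set of states is then just the union of `eps_closure` over its members.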

Minimization of DFA (refer to the separately shared pdf for this topic)

Equivalence theorem

1.DISTINGUISHABLE STATES
2.SET PARTITIONING

NOTE: Remove all states that are dead or unreachable from the initial state via
any set of transitions of the DFA, and then start applying the Equivalence theorem.
Refer : https://www.geeksforgeeks.org/minimization-of-dfa/

Myhill Nerode Theorem (Table filling method.)
Refer : https://www.tutorialspoint.com/automata_theory/dfa_minimization.html

FSM with output : Mealy and Moore machines

A finite state machine is a machine that has a finite number of states and a logical way of changing
from one state to another.
1.FSM - without output
2.FSM - with output
(A)Mealy machine : output on transition
(B)Moore machine : output on state

Mealy machine
A Mealy machine is an FSM whose output depends on the present state as well as the
present input. In a Mealy machine every transition for a particular input symbol has a fixed
output. It can be described by a 6-tuple (Q, ∑, O, δ, X, q0) where:
●Q is a finite set of states.
●∑ is a finite set of symbols called the input alphabet.
●O is a finite set of symbols called the output alphabet.
●δ is the input transition function where δ: Q × ∑ → Q
●X is the output transition function where X: Q × ∑ → O
●q0 is the initial state from where any input is processed (q0 ∈ Q).

Moore machine
In a Moore machine the value of the output function depends on the present state only. A Moore
machine can be described by a 6-tuple (Q, ∑, O, δ, X, q0) where:
1.Q is a finite set of states.
2.∑ is a finite set of symbols called the input alphabet.
3.O is a finite set of symbols called the output alphabet.
4.δ is the input transition function where δ: Q × ∑ → Q
5.X is the output transition function where X: Q → O
6.q0 is the initial state from where any input is processed (q0 ∈ Q).

Moore to Mealy machine
Input: Moore machine
Output: Mealy machine
Steps:
1.Take a blank Mealy machine transition table format.
2.Copy all the Moore machine transition states into this table format.
3.Check the present states and their corresponding outputs in the Moore machine state
table; if for a state Qi the output is m, copy m into the output columns of the Mealy
machine state table wherever Qi appears in the next-state column.
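Step 3 amounts to saying that the Mealy output on each transition is the Moore output of that transition's destination state. A minimal sketch under that reading; the machine below is purely illustrative:

```python
# Moore -> Mealy conversion sketch: the output attached to a Mealy transition
# (q, a) is the Moore output of the destination state delta[(q, a)].
def moore_to_mealy(delta, moore_out):
    # Mealy table: (state, input) -> (next_state, output written on that transition)
    return {(q, a): (nxt, moore_out[nxt]) for (q, a), nxt in delta.items()}

delta = {('q0', '0'): 'q1', ('q0', '1'): 'q0',
         ('q1', '0'): 'q1', ('q1', '1'): 'q0'}
moore_out = {'q0': 'a', 'q1': 'b'}      # per-state Moore outputs
mealy = moore_to_mealy(delta, moore_out)
print(mealy[('q0', '0')])  # ('q1', 'b')
```

The state set and transition function are unchanged; only where the output is attached moves.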

Mealy to Moore machine
●Step 1: For each state Qi, calculate the number of different outputs that are
associated with Qi in the transition table of the Mealy machine.
●Step 2: Copy state Qi unchanged if all the outputs of Qi are the same. Break Qi into n states
Qin if it has n distinct outputs, where n = 0, 1, 2, …
●Step 3: If the output of the initial state is 1, insert a new initial state at the beginning
which gives output 0.

1.For q1, there is only one incident edge, with output 0. So we don't need to split this
state in the Moore machine.
2.For q2, there are 2 incident edges, with outputs 0 and 1. So we will split this state into
q20 (state with output 0) and q21 (state with output 1).
3.For q3, there are 2 incident edges, with outputs 0 and 1. So we will split this state into
q30 (state with output 0) and q31 (state with output 1).
4.For q4, there is only one incident edge, with output 0. So we don't need to split this.

Transition table for moore machine

Transition diagram for moore machine

Limitations
1.The input tape is read-only, and the only memory the automaton has is the current
state; since there are a finite number of states, a finite automaton has memory
which is strictly finite.
2.Finite automata have only string-pattern (regular expression) recognizing
power.
3.The head movement is restricted to one direction, either from left to right or
right to left.

Equivalence of FA
Two automata M0 and M1 are said to be equivalent to each other if both accept exactly the
same set of input strings. Formally, if two automata M0 and M1 are equivalent, then:
●If there is a path from the start state of M0 to a final state of M0 labeled a1, a2,
…, ak, there is a path from the start state of M1 to a final state of M1 labeled
a1, a2, …, ak.
●If there is a path from the start state of M1 to a final state of M1 labeled b1, b2,
…, bk, there is a path from the start state of M0 to a final state of M0
labeled b1, b2, …, bk.

Application of FA
We have several applications based on finite automata and finite state machines. Some
are given below:
●A finite automaton is highly useful for designing lexical analyzers.
●A finite automaton is useful for designing text editors.
●A finite automaton is highly useful for designing spell checkers.
●A finite automaton is useful in sequential circuit design (transducers).

Equivalence & Minimization of DFAs

Applications of interest
●Comparing two DFAs: L(DFA1) == L(DFA2)?
●How to minimize a DFA?
1.Remove unreachable states
2.Identify & condense equivalent states into one

When to call two states in a DFA “equivalent”?
Two states p and q are said to be equivalent iff:
i) Any string w accepted by starting at p is also accepted by starting at q; AND
ii) Any string w rejected by starting at p is also rejected by starting at q.
Only the future matters, not the past: if both conditions hold, then p ≡ q.

Computing equivalent states in a DFA: Table Filling Algorithm
[Figure: an 8-state DFA (A-H) and its pairwise table, filled in pass by pass]
Pass #0
1. Mark accepting states ≠ non-accepting states
Pass #1
1. Compare every pair of states
2. Distinguish by one-symbol transitions
3. Mark = or ≠ or blank (tbd)
Pass #2
1. Compare every pair of states
2. Distinguish by up to two-symbol transitions (until different or same or tbd)
…
(keep repeating until the table is complete)

Table Filling Algorithm - step by step
[Figures: the same 8-state DFA (A-H) and its pair table, filled in pass by pass; X marks a
distinguishable pair, = marks the diagonal]
1. Mark X between accepting vs. non-accepting states
2. Pass 1: Look 1 hop away for distinguishing states or strings
3. Pass 2: Look 1 hop away again for distinguishing states or strings
continue… (until no new pairs get marked)
Equivalences found: A=B, C=H, D=G
Retain only one copy for each equivalence set of states
[Figure: the resultant DFA after merging each equivalence set into a single state]

Table Filling Algorithm - special case
Q) What happens if the input DFA has more than one final state?
Can all final states initially be treated as equivalent to one another?

Putting it all together …
How to minimize a DFA?
Goal: Minimize the number of states in a DFA.
Algorithm:
1.Eliminate states unreachable from the start state (depth-first traversal from the start state)
2.Identify and remove equivalent states (table filling algorithm)
3.Output the resultant DFA
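The table-filling step can be sketched compactly: mark pairs distinguishable first by accepting/non-accepting status, then repeatedly by one-symbol moves into already-marked pairs, until nothing changes. The representation and the small example DFA below are assumptions for illustration:

```python
# Table-filling sketch: returns the set of equivalent (unmarked) state pairs.
def equivalent_pairs(states, alphabet, delta, finals):
    # Pass 0: accepting vs. non-accepting pairs are distinguishable.
    marked = {frozenset({p, q}) for p in states for q in states
              if p != q and (p in finals) != (q in finals)}
    changed = True
    while changed:  # keep passing until the table stabilizes
        changed = False
        for p in states:
            for q in states:
                pair = frozenset({p, q})
                if p == q or pair in marked:
                    continue
                # One symbol leading into a marked pair distinguishes p and q.
                if any(frozenset({delta[(p, a)], delta[(q, a)]}) in marked
                       for a in alphabet):
                    marked.add(pair); changed = True
    return {tuple(sorted(pq)) for pq in
            (frozenset({p, q}) for p in states for q in states if p != q)
            if pq not in marked}

# A and B are both non-accepting with identical successors -> equivalent.
delta = {('A', '0'): 'B', ('A', '1'): 'C',
         ('B', '0'): 'B', ('B', '1'): 'C',
         ('C', '0'): 'B', ('C', '1'): 'C'}
print(equivalent_pairs({'A', 'B', 'C'}, '01', delta, {'C'}))  # {('A', 'B')}
```

Merging each equivalence class into one state (step 2 of the minimization algorithm) then yields the minimized DFA.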

Are Two DFAs Equivalent?
1.Make a new dummy DFA by just putting together both DFAs (a unified DFA containing
both start states, q0 of DFA1 and q0' of DFA2)
2.Run the table-filling algorithm on the unified DFA
3.IF the start states of both DFAs are found to be equivalent (q0 ≡ q0'),
THEN: DFA1 ≡ DFA2; ELSE: different

Regular expressions and Languages
Unit-2

Overview
●Regular Expression (RE)
●Equivalence of RE and FA, Arden's Theorem
●RE Applications
●Regular Language (RL)
●Closure properties of RLs
●Decision properties of RLs
●Pumping lemma for RLs

Arden’s Theorem
Arden's Theorem is useful for checking the equivalence of two regular expressions,
as well as in the conversion of a DFA to a regular expression.
Let us see its use in the conversion of a DFA to a regular expression.
The following algorithm is used to build the regular expression from a given DFA.
1. Let q1 be the initial state.
2. There are q2, q3, q4 … qn states. The final state may be some qj where j <= n.
3. Let αji represent the transition from qj to qi.

4. Calculate qi such that
qi = αji qj
If qi is the start state then we have:
qi = αji qj + ε
5. Similarly, compute the final state, which ultimately gives the regular expression r.

Construct the regular expression for the given DFA

Solution:
Let us write down the equations. Since q1 is the start state, ε will be added, and the
input 0 comes to q1 from q1, hence we write
(state = source state of input × input coming to it):

q1 = q1 0 + ε
q2 = q1 1 + q2 1
q3 = q2 0 + q3 (0+1)

Since the final states are q1 and q2, we are interested in solving q1 and q2 only.
Let us see q1 first:

q1 = q1 0 + ε

We can rewrite it as q1 = ε + q1 0, which matches the form R = Q + RP with
R = q1, Q = ε, P = 0, and by Arden's Theorem reduces to R = QP*. We get

q1 = ε.(0)* = 0*

Substituting the value into q2, we will get
q2 = 0* 1 + q2 1
q2 = 0* 1 (1)*   (again by R = Q + RP → R = QP*)

The regular expression is given by
r = q1 + q2
= 0* + 0* 1.1*
= 0* + 0* 1+   (since 1.1* = 1+)
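The result can be sanity-checked with Python's `re` module: r = 0* + 0* 1 1* describes strings of 0s optionally followed by a block of 1s. The translation of the regular expression into `re` syntax below is an assumption for testing purposes only.

```python
# Quick check of r = 0* + 0*11* using Python regular expressions.
import re

r = re.compile(r"0*|0*11*")
for w in ["", "000", "0011", "1", "011"]:
    print(w, bool(r.fullmatch(w)))   # all accepted
print(bool(r.fullmatch("010")))      # False: a 0 after a 1 leaves the language
```

Strings of the form 0…01…1 (either block possibly empty) are accepted; anything with a 0 following a 1 is rejected, matching the DFA's behavior.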

Regular Expressions

RE's: Introduction

Regular expressions
are an algebraic
way to describe languages.
They describe exactly the regular
languages.
If E is a regular expression, then L(E) is
the language it defines.
We’ll describe RE’s and their languages
recursively.

RE's: Definition
Basis 1: If a is any symbol, then a is a RE, and L(a) = {a}.
Note: {a} is the language containing one string, and that string is of length 1.
Basis 2: ε is a RE, and L(ε) = {ε}.
Basis 3: ∅ is a RE, and L(∅) = ∅.

Regular Expressions vs. Finite Automata
●Regular expressions offer a declarative way to express the pattern of any
string we want to accept
E.g., 01* + 10*
●Automata => more machine-like
<input: string, output: [accept/reject]>
●Regular expressions => more program syntax-like
●Unix environments heavily use regular expressions
E.g., bash shell, grep, vi & other editors, sed
●Perl scripting - good for string processing
●Lexical analyzers such as Lex or Flex

Regular Expressions
[Diagram: Regular expressions (syntactical expressions) = Finite Automata (DFA, NFA,
ε-NFA) (automata/machines); both define exactly the Regular Languages among the formal
language classes]

RE's: Definition - (2)
Induction 1: If E1 and E2 are regular expressions, then E1+E2 is a regular
expression, and L(E1+E2) = L(E1) ∪ L(E2).
Induction 2: If E1 and E2 are regular expressions, then E1E2 is a regular
expression, and L(E1E2) = L(E1)L(E2).
Concatenation: the set of strings wx such that w is in L(E1) and x is in L(E2).

RE's: Definition - (3)
Induction 3: If E is a RE, then E* is a RE, and L(E*) = (L(E))*.
Closure, or “Kleene closure” = set of strings w1w2…wn, for some n ≥ 0, where each
wi is in L(E).
Note: when n = 0, the string is ε.

Examples: RE's
L(01) = {01}.
L(01+0) = {01, 0}.
L(0(1+0)) = {01, 00}.
Note the order of precedence of operators.
L(0*) = {ε, 0, 00, 000, …}.
L((0+10)*(ε+1)) = all strings of 0's
and 1's without two consecutive 1's.

Precedence of Operators
Highest to lowest:
1.* operator (star)
2.. (concatenation)
3.+ operator
Example: 01* + 1 = (0.((1)*)) + 1

Language Operators
●Union of two languages:
L ∪ M = all strings that are either in L or in M
Note: a union of two languages produces a third language
●Concatenation of two languages:
L . M = all strings of the form xy s.t. x ∈ L and y ∈ M
The dot operator is usually omitted, i.e., LM is the same as L.M

Kleene Closure (the * operator)
Kleene Closure of a given language L:
●L^0 = {ε}
●L^1 = {w | w ∈ L}
●L^2 = {w1w2 | w1 ∈ L, w2 ∈ L (duplicates allowed)}
●L^i = {w1w2…wi | all w's chosen are ∈ L (duplicates allowed)}
(“i” here refers to how many strings to concatenate from the parent language L to
produce strings in the language L^i; the choice of each wi is independent)
●L* = L^0 ∪ L^1 ∪ L^2 ∪ … (an arbitrary number of concatenations)
Example:
Let L = {1, 00}
●L^0 = {ε}
●L^1 = {1, 00}
●L^2 = {11, 100, 001, 0000}
●L^3 = {111, 1100, 1001, 10000, 0011, 00100, 00001, 000000}
●L* = L^0 ∪ L^1 ∪ L^2 ∪ …
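The powers L^i can be computed directly: `itertools.product` enumerates all ways to pick i strings (with repetition) from L, which matches the definition above.

```python
# Compute the i-th power of a finite language L by concatenating
# every choice of i strings from L (duplicates allowed).
from itertools import product

def power(L, i):
    return {"".join(ws) for ws in product(sorted(L), repeat=i)}

L = {"1", "00"}
print(power(L, 0))        # {''}: only the empty string, i.e. epsilon
print(power(L, 2))        # the 4 strings 11, 100, 001, 0000 (in some order)
print(len(power(L, 3)))   # 8
```

Note `repeat=0` yields exactly one (empty) concatenation, which is why L^0 = {ε} even for L = Φ's complement cases; L* itself is infinite and can only be enumerated lazily.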

Kleene Closure (special notes)
●L* is an infinite set iff |L| ≥ 1 and L ≠ {ε}. Why?
●If L = {ε}, then L* = {ε}. Why?
●If L = Φ, then L* = {ε}. Why?
●Σ* denotes the set of all words over an alphabet Σ.
Therefore, an abbreviated way of saying there is an arbitrary language L over an
alphabet Σ is: L ⊆ Σ*

Building Regular Expressions
Let E be a regular expression and the language represented by E is L(E).
Then:
●(E) = E
●L(E + F) = L(E) ∪ L(F)
●L(EF) = L(E)L(F)
●L(E*) = (L(E))*

Example: how to use these regular expression properties and language operators?

L = {w | w is a binary string which does not contain two consecutive 0s or
two consecutive 1s anywhere}

E.g., w = 01010101 is in L, while w = 10010 is not in L

Goal: Build a regular expression for L

Four cases for w:
●Case A: w starts with 0 and |w| is even
●Case B: w starts with 1 and |w| is even
●Case C: w starts with 0 and |w| is odd
●Case D: w starts with 1 and |w| is odd

Regular expression for the four cases:
●Case A: (01)*
●Case B: (10)*
●Case C: 0(10)*
●Case D: 1(01)*

Since L is the union of all 4 cases:
Reg Exp for L = (01)* + (10)* + 0(10)* + 1(01)*

If we introduce ε then the regular expression can be simplified to:
Reg Exp for L = (ε+1)(01)*(ε+0)

8
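The simplified expression can be sanity-checked against the original definition with Python's `re` module, writing "ε + 1" as an optional 1 (a quick sketch; the helper name `alternating` is mine):

```python
import re
from itertools import product

# (ε+1)(01)*(ε+0): an optional leading 1, pairs of 01, an optional trailing 0
pattern = re.compile(r"1?(01)*0?")

def alternating(w):
    # direct definition: no two consecutive equal symbols anywhere
    return all(a != b for a, b in zip(w, w[1:]))

# compare the regex against the definition on all strings up to length 8
for n in range(9):
    for bits in product("01", repeat=n):
        w = "".join(bits)
        assert (pattern.fullmatch(w) is not None) == alternating(w)
```

If the loop finishes without an assertion error, the two descriptions agree on every binary string of length at most 8.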
Equivalence of REs and Automata

We need to show that for every RE,
there is an automaton that accepts the
same language.
Pick the most powerful automaton type: the
ε-NFA.
And we need to show that for every
automaton, there is a RE defining its
language.
Pick the most restrictive type: the DFA.

Finite Automata (FA) & Regular Expressions (Reg Ex)

To show that they are interchangeable,
consider the following theorems:
Theorem 1: For every DFA A there exists a regular
expression R such that L(R) = L(A)
Theorem 2: For every regular expression R there
exists an ε-NFA E such that L(E) = L(R)
(Proofs in the book)

(Diagram: DFA → Reg Ex by Theorem 1; Reg Ex → ε-NFA by Theorem 2.
Together with the NFA / ε-NFA / DFA equivalences, this gives Kleene's Theorem.)
Converting a RE to an ε-NFA

Proof is an induction on the number of
operators (+, concatenation, *) in the
RE.
We always construct an automaton of a
special form (next slide).

Form of ε-NFAs Constructed
Start state: the only state with external predecessors
(no arcs enter it from outside, no arcs leave the final state)
"Final" state: the only state with external successors

RE to ε-NFA: Basis
Symbol a: a two-state automaton with a single arc labeled a
ε: a two-state automaton with a single arc labeled ε
∅: a two-state automaton with no arc at all
RE to ε-NFA: Induction 1 – Union
For E1 + E2: take the automata for E1 and E2; add a new start state with
ε-arcs to both of their start states, and ε-arcs from both of their final
states to a new final state.

RE to ε-NFA: Induction 2 – Concatenation
For E1 E2: link the final state of the automaton for E1 to the start state
of the automaton for E2 with an ε-arc.

RE to ε-NFA: Induction 3 – Closure
For E*: add a new start state and a new final state; ε-arcs run from the
new start state to E's start state and to the new final state (allowing zero
occurrences), and from E's final state back to E's start state and on to
the new final state (allowing repetition).

RE to ε-NFA construction example: (0+1)*01(0+1)*
Build ε-NFAs for (0+1)*, 01, and (0+1)* separately, then join the three
in sequence with ε-arcs.
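The inductive construction can be coded directly. Below is a minimal sketch (my own representation: an NFA is a `(start, final, transitions)` triple, with `None` standing for ε), building the ε-NFA for (0+1)*01(0+1)* and simulating it:

```python
# Sketch of the inductive RE-to-ε-NFA construction.
# trans maps (state, symbol) -> set of successor states; symbol None is ε.
import itertools

_ids = itertools.count()

def _new():
    return next(_ids)

def _merge(*tables):
    out = {}
    for t in tables:
        for k, v in t.items():
            out.setdefault(k, set()).update(v)
    return out

def symbol(a):                      # basis: a single arc labeled a
    s, f = _new(), _new()
    return s, f, {(s, a): {f}}

def union(n1, n2):                  # induction 1: E1 + E2
    s, f = _new(), _new()
    (s1, f1, t1), (s2, f2, t2) = n1, n2
    return s, f, _merge(t1, t2, {(s, None): {s1, s2},
                                 (f1, None): {f}, (f2, None): {f}})

def concat(n1, n2):                 # induction 2: E1 E2 (ε-arc joins them)
    (s1, f1, t1), (s2, f2, t2) = n1, n2
    return s1, f2, _merge(t1, t2, {(f1, None): {s2}})

def star(n):                        # induction 3: E*
    s, f = _new(), _new()
    s1, f1, t = n
    return s, f, _merge(t, {(s, None): {s1, f}, (f1, None): {s1, f}})

def accepts(nfa, w):
    start, final, trans = nfa
    def eclose(states):             # follow ε-arcs to a fixed point
        stack, seen = list(states), set(states)
        while stack:
            q = stack.pop()
            for r in trans.get((q, None), ()):
                if r not in seen:
                    seen.add(r)
                    stack.append(r)
        return seen
    cur = eclose({start})
    for a in w:
        cur = eclose(set().union(*(trans.get((q, a), set()) for q in cur)))
    return final in cur

# (0+1)*01(0+1)*  -- binary strings containing 01 as a substring
zero_or_one = lambda: union(symbol("0"), symbol("1"))
e = concat(star(zero_or_one()),
           concat(concat(symbol("0"), symbol("1")), star(zero_or_one())))
```

Note that `zero_or_one` is called twice so each copy of (0+1)* gets its own fresh states, matching the requirement that the sub-automata be disjoint.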
DFA-to-RE
A strange sort of induction.
States of the DFA are assumed to be
1, 2, …, n.
We construct REs for the labels of
restricted sets of paths.
Basis: single arcs or no arc at all.
Induction: paths that are allowed to
traverse the next-numbered state, in order.

DFA to RE construction (Theorem 1)

Informally, trace all distinct paths (traversing cycles only once)
from the start state to each of the final states,
and enumerate all the expressions along the way.

Example: a DFA with states q0, q1, q2 (q2 final), where
q0 loops on 1 and goes to q1 on 0; q1 loops on 0 and goes to q2 on 1;
q2 loops on both 0 and 1.
Along the path q0 → q1 → q2 we collect: (1*) 0 (0*) 1 (0+1)*

Reg Exp for the DFA = 1*00*1(0+1)*
Q) What is the language?

k-Paths
A k-path is a path through the graph of
the DFA that goes through no state
numbered higher than k.
Endpoints are not restricted; they can
be any state.

Example: k-Paths
(The example DFA has states 1, 2, 3, with arcs:
1→2 on 0, 1→3 on 1, 2→1 on 1, 2→3 on 0, 3→1 on 0, 3→2 on 1.)
0-paths from 2 to 3:
RE for labels = 0.
1-paths from 2 to 3:
RE for labels = 0+11.
2-paths from 2 to 3:
RE for labels = (10)*0+1(01)*1
3-paths from 2 to 3:
RE for labels = ??

k-Path Induction
Let R_ij^k be the regular expression for
the set of labels of k-paths from state i
to state j.
Basis: k=0. R_ij^0 = sum of labels of arcs
from i to j.
∅ if no such arc.
But add ε if i=j.

Example: Basis
R_12^0 = 0.
R_11^0 = ∅ + ε = ε.
(Same example DFA as before.)

k-Path Inductive Case
A k-path from i to j either:
1. Never goes through state k, or
2. Goes through k one or more times.
R_ij^k = R_ij^(k-1) + R_ik^(k-1) (R_kk^(k-1))* R_kj^(k-1)
The first term covers paths that don't go through k; the second covers
going from i to k the first time, looping from k to k zero or more times,
and then going from k to j.

Illustration of Induction
(Figure: a k-path from i to j either stays among states < k, or goes from i
to k, loops from k back to k several times, and finally goes from k to j.)

Final Step
The RE with the same language as the
DFA is the sum (union) of R_ij^n, where:
1. n is the number of states; i.e., paths are
unconstrained.
2. i is the start state.
3. j is one of the final states.

Example
R_23^3 = R_23^2 + R_23^2 (R_33^2)* R_33^2
       = R_23^2 (R_33^2)*
R_23^2 = (10)*0+1(01)*1
R_33^2 = 0(01)*(1+00) + 1(10)*(0+11)
R_23^3 = [(10)*0+1(01)*1] [0(01)*(1+00) + 1(10)*(0+11)]*
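The R_ij^k recurrence translates directly into a dynamic program. This sketch (my own encoding: regexes are Python-`re` strings, `None` stands for ∅, and the empty string for ε) builds a regex from a small DFA and cross-checks it against direct simulation:

```python
# Sketch of the k-path (R_ij^k) DFA-to-RE construction.
# States are 1..n; delta maps (state, symbol) -> state.
import re
from itertools import product

def alt(r, s):                       # union of two REs
    if r is None:
        return s
    if s is None:
        return r
    return f"(?:{r}|{s})"

def cat(r, s):                       # concatenation of two REs
    if r is None or s is None:
        return None
    return f"(?:{r})(?:{s})"

def star(r):                         # Kleene star; ∅* = ε (empty pattern)
    return "" if r is None else f"(?:{r})*"

def dfa_to_re(n, delta, start, finals, alphabet):
    # Basis (k=0): labels of direct arcs, plus ε when i = j
    R = {(i, j): None for i in range(1, n + 1) for j in range(1, n + 1)}
    for i in range(1, n + 1):
        for a in alphabet:
            R[i, delta[i, a]] = alt(R[i, delta[i, a]], a)
        R[i, i] = alt(R[i, i], "")
    # Induction: R_ij^k = R_ij^(k-1) + R_ik^(k-1) (R_kk^(k-1))* R_kj^(k-1)
    for k in range(1, n + 1):
        R = {(i, j): alt(R[i, j], cat(cat(R[i, k], star(R[k, k])), R[k, j]))
             for i in range(1, n + 1) for j in range(1, n + 1)}
    out = None
    for f in finals:                 # final step: union over final states
        out = alt(out, R[start, f])
    return out

# Example DFA (mine): state 1 = even number of 0s (start, final), state 2 = odd
delta = {(1, "0"): 2, (1, "1"): 1, (2, "0"): 1, (2, "1"): 2}
rx = re.compile(dfa_to_re(2, delta, 1, {1}, "01"))

def run(w):                          # direct DFA simulation for comparison
    q = 1
    for a in w:
        q = delta[q, a]
    return q == 1

for length in range(7):
    for bits in product("01", repeat=length):
        w = "".join(bits)
        assert (rx.fullmatch(w) is not None) == run(w)
```

The produced expression is correct but unsimplified; the slides' algebraic laws (next section) are what you would apply by hand to shrink it.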

Algebraic Laws…

Distributive:
E(F+G) = EF + EG
(F+G)E = FE + GE

Idempotent:
E + E = E

Involving Kleene closures:
(E*)* = E*
Φ* = ε
ε* = ε
E+ = EE*
E? = ε + E

Algebraic Laws of Regular Expressions

Commutative:
E+F = F+E

Associative:
(E+F)+G = E+(F+G)
(EF)G = E(FG)

Identity:
E + Φ = E
εE = Eε = E

Annihilator:
ΦE = EΦ = Φ

Identities and Annihilators
∅ is the identity for +.
R + ∅ = R.
ε is the identity for concatenation.
εR = Rε = R.
∅ is the annihilator for concatenation.
∅R = R∅ = ∅.

Properties of Regular Languages

Topics
1) How to prove whether a given language is regular or not?
2) Closure properties of regular languages

Some languages are not regular

When is a language regular?
If we are able to construct one of the following:
DFA or NFA or ε-NFA or regular expression

When is it not?
If we can show that no FA can be built for the language

How to prove languages are not regular?
What if we cannot come up with any FA?
A) Can it be a language that is not regular?
B) Or is it that we tried the wrong approaches?
How do we decisively prove that a language
is not regular?

"The hardest thing of all is to find a black cat in a dark room,
especially if there is no cat!" – Confucius

Example of a non-regular language

Let L = {w | w is of the form 0^n 1^n, for all n ≥ 0}

Hypothesis: L is not regular

Intuitive rationale: How do you keep track
of a running count in an FA?

A more formal rationale:
By contradiction, if L is regular then there should exist a DFA for L.
Let k = number of states in that DFA.
Consider the special word w = 0^k 1^k  ==>  w ∈ L
The DFA is in some state p_i after consuming the first i symbols in w.
(Uses the Pigeonhole Principle)
Rationale…

Let {p0, p1, …, pk} be the sequence of states that the
DFA should have visited after consuming the first
k symbols in w, which is 0^k
But there are only k states in the DFA!
==> at least one state should repeat somewhere
along the path (by the Pigeonhole Principle)
==> Let the repeating state be p_i = p_j for i < j
==> We can fool the DFA by inputting 0^(k-(j-i)) 1^k and
still get it to accept (note: k-(j-i) is at most k-1).
==> The DFA accepts strings with an unequal number of 0s
and 1s, implying that the DFA is wrong!

The Pumping Lemma for Regular Languages

What is it? The Pumping Lemma is a property
of all regular languages.
How is it used? As a technique to show that a given
language is not regular.

Pumping Lemma for Regular Languages

Let L be a regular language. Then there exists
some constant N such that for every
string w ∈ L s.t. |w| ≥ N, there exists
a way to break w into three parts, w = xyz,
such that:
1. y ≠ ε
2. |xy| ≤ N
3. For all k ≥ 0, all strings of the form x y^k z ∈ L

This property should hold for all regular languages.
Definition: N is called the "Pumping Lemma Constant"

Pumping Lemma: Proof

L is regular ==> it should have a DFA.
Set N := number of states in the DFA
Any string w ∈ L, s.t. |w| ≥ N, should have the
form w = a1 a2 … am, where m ≥ N
Let the states traversed after reading the first
N symbols be: {p0, p1, …, pN}
==> There are N+1 p-states, while there are only
N DFA states
==> at least one state has to repeat,
i.e., p_i = p_j where 0 ≤ i < j ≤ N (by PHP)

Pumping Lemma: Proof…

==> We should be able to break w = xyz as follows:
x = a1 a2 .. ai;  y = a(i+1) a(i+2) .. aj;  z = a(j+1) a(j+2) .. am
x's path will be p0 .. pi
y's path will be pi p(i+1) .. pj (but p_i = p_j, implying a loop)
z's path will be pj p(j+1) .. pm

Now consider another string w_k = x y^k z, where k ≥ 0
Case k=0:
The DFA will reach the accepting state pm via x and z (skipping the y-loop)
Case k>0:
The DFA will loop for y^k, and finally reach the accepting state pm for z

In either case, w_k ∈ L. This proves part (3) of the lemma.

Pumping Lemma: Proof…

For part (1):
Since i < j, y ≠ ε

For part (2):
By PHP, the repetition of states has to
occur within the first N symbols in w
==> |xy| ≤ N

The Purpose of the Pumping Lemma for RLs
To prove that some languages cannot be regular.

How to use the pumping lemma?

Think of playing a 2-person game:
Role 1: We claim that the language cannot be regular
Role 2: An adversary who claims the language is regular
We show that the adversary's statement will
lead to a contradiction, implying the pumping
lemma cannot hold for the language.
We win!!

How to use the pumping lemma? (The Steps)

1. (we) Claim L is not regular.
2. (adv.) Claims that L is regular and gives you
a value for N as its P/L constant.
3. (we) Using N, choose a string w ∈ L s.t.:
   1. |w| ≥ N,
   2. Using w as the template, construct other words
      w_k of the form x y^k z and show that at least one
      such w_k ∉ L
==> this implies we have successfully broken the
pumping lemma for the language, and hence that the
adversary is wrong.
(Note: In this process, we may have to try many values of k,
starting with k=0, and then 2, 3, … and so on, until w_k ∉ L.)

Note: We don't have any control over N, except that it is positive.
We also don't have any control over how to split w = xyz,
but xyz should respect the P/L conditions (1) and (2).
Using the Pumping Lemma

What the Adversary does:
1. Claims L is regular
2. Provides N

What WE do:
3. Using N, we construct our template string w
4. Demonstrate to the adversary, either through pumping up or
   down on w, that some string w_k ∉ L
   (this should happen regardless of how w = xyz is split)

Example of using the Pumping Lemma to
prove that a language is not regular

Note: This N can be anything (it need not necessarily be the number of
states in a DFA; it's the adversary's choice).

Let L_eq = {w | w is a binary string with an equal number of 1s and 0s}

Your Claim: L_eq is not regular
Proof:
By contradiction, let L_eq be regular
==> a P/L constant should exist (the adversary's move)
Let N = that P/L constant
Consider input w = 0^N 1^N (your choice for the template string)
By the pumping lemma, we should be able to break
w = xyz, such that:
1) y ≠ ε
2) |xy| ≤ N
3) For all k ≥ 0, the string x y^k z is also in L_eq

Proof…

Template string w = 0^N 1^N = 00…0 11…1 (N 0s followed by N 1s)

Because |xy| ≤ N, xy should contain only 0s
(This, and because y ≠ ε, implies y = 0^+)
Therefore x can contain at most N-1 0s
Also, all the N 1s must be inside z

By (3), any string of the form x y^k z ∈ L_eq for all k ≥ 0
Case k=0:
x y^0 z = xz has at most N-1 0s but has N 1s
Therefore, x y^0 z ∉ L_eq
(Setting k=0 is referred to as "pumping down")
This violates the P/L (a contradiction)

Another way of proving this would be to show that if
the number of 0s is arbitrarily pumped up (e.g., k=2),
then the number of 0s will exceed the number of 1s.
(Setting k>1 is referred to as "pumping up")
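The case analysis can be verified mechanically for a concrete N: every split of w = 0^N 1^N that respects conditions (1) and (2) breaks at k = 0. A quick sketch (the helper name `in_leq` is mine):

```python
# For w = 0^N 1^N, try every split w = xyz with |xy| <= N and y nonempty,
# and check that pumping down (k = 0) always leaves the language L_eq.
def in_leq(s):
    return s.count("0") == s.count("1")

N = 6
w = "0" * N + "1" * N
for i in range(N + 1):             # |x| = i
    for j in range(i + 1, N + 1):  # |xy| = j <= N, so y = w[i:j] is nonempty
        x, y, z = w[:i], w[i:j], w[j:]
        assert y != "" and len(x + y) <= N
        assert not in_leq(x + z)   # x y^0 z has fewer 0s than 1s
```

Since y must consist only of 0s, removing it always leaves N 1s against fewer than N 0s, which is exactly what the proof argues.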

Exercise 2
Prove L = {0^n 1 0^n | n ≥ 1} is not regular
Note: This n is not to be confused with the pumping
lemma constant N. They can be different.
In other words, the above question is the same as
proving:
L = {0^m 1 0^m | m ≥ 1} is not regular

Example 3: Pumping Lemma

Claim: L = {0^i | i is a perfect square} is not regular

Proof:
By contradiction, let L be regular.
P/L should apply.
Let N = P/L constant
Choose w = 0^(N^2)
By the pumping lemma, w = xyz, satisfying all three rules
By rules (1) & (2), y has between 1 and N 0s
By rule (3), any string of the form x y^k z is also in L for all k ≥ 0
Case k=0:
#zeros(x y^0 z) = #zeros(xyz) - #zeros(y)
N^2 - N ≤ #zeros(x y^0 z) ≤ N^2 - 1
(N-1)^2 < N^2 - N ≤ #zeros(x y^0 z) ≤ N^2 - 1 < N^2
==> x y^0 z ∉ L
But the above will complete the proof ONLY IF N > 1.
… (proof contd. next slide)

Example 3: Pumping Lemma

(proof contd…)

If the adversary picks N=1, then (N-1)^2 ≤ N^2 - N, and therefore
#zeros(x y^0 z) could end up being a perfect square!
This means that pumping down (i.e., setting k=0) is not giving us the proof!
So let's try pumping up next…
Case k=2:
#zeros(x y^2 z) = #zeros(xyz) + #zeros(y)
N^2 + 1 ≤ #zeros(x y^2 z) ≤ N^2 + N
N^2 < N^2 + 1 ≤ #zeros(x y^2 z) ≤ N^2 + N < (N+1)^2
==> x y^2 z ∉ L
(Notice that the above holds for all possible values of N > 0. Therefore,
this completes the proof.)
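The pumping-up bound N^2 < N^2 + |y| < (N+1)^2 can be checked for several adversary choices of N at once; a small sketch (`is_square` is my helper):

```python
# For w = 0^(N^2), every split with |xy| <= N and y != ε gives a pumped-up
# string x y^2 z whose length lies strictly between N^2 and (N+1)^2,
# hence is not a perfect square.
import math

def is_square(n):
    r = math.isqrt(n)
    return r * r == n

for N in range(1, 8):              # several adversary choices of N
    w = "0" * (N * N)
    for i in range(N + 1):
        for j in range(i + 1, N + 1):
            x, y, z = w[:i], w[i:j], w[j:]
            pumped = len(x) + 2 * len(y) + len(z)
            assert N * N < pumped < (N + 1) * (N + 1)
            assert not is_square(pumped)
```

Unlike the k = 0 case, this argument works even for N = 1, which is why pumping up completes the proof.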

Closure properties of Regular Languages

Closure properties for Regular Languages (RL)

Closure property:
If a set of regular languages is combined using
an operator, then the resulting language is also
regular.
(This is different from Kleene closure.)

Regular languages are closed under:
Union, intersection, complement, difference
Reversal
Kleene closure
Concatenation
Homomorphism
Inverse homomorphism

Now, let's prove all of this!

RLs are closed under union

IF L and M are two RLs THEN:
they both have two corresponding regular
expressions, R and S respectively
(L U M) can be represented using the regular
expression R+S
Therefore, (L U M) is also regular

How can this be proved using FAs?

RLs are closed under complementation

If L is an RL over ∑, then L̄ = ∑* - L
To show L̄ is also regular, make the following
construction on a DFA for L:
Convert every final state into a non-final state, and
every non-final state into a final state
(The figure shows a DFA with start state q0 and final states qF1 … qFk,
next to the same DFA with the accepting and non-accepting roles swapped.
In the figure q0 is drawn as non-final; if it were final, it too gets swapped.)
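The swap of accepting and non-accepting states is a one-liner on any DFA encoding. A small sketch (a DFA here is a `(start, delta, finals)` triple; the example DFA, for an even number of 0s, is mine, not from the slides):

```python
# Complementation: same states and transitions, accepting set inverted.
def complement(states, dfa):
    start, delta, finals = dfa
    return start, delta, set(states) - set(finals)

def accepts(dfa, w):
    q, delta, finals = dfa
    for a in w:
        q = delta[q, a]
    return q in finals

# DFA over {0,1} accepting strings with an even number of 0s
delta = {("even", "0"): "odd", ("even", "1"): "even",
         ("odd", "0"): "even", ("odd", "1"): "odd"}
L = ("even", delta, {"even"})
Lbar = complement({"even", "odd"}, L)
```

Note the construction relies on the automaton being a DFA (total transition function); applying it to an NFA would not complement the language.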

RLs are closed under intersection

A quick, indirect way to prove:
By DeMorgan's law:
L ∩ M = complement( complement(L) U complement(M) )
Since we know RLs are closed under union
and complementation, they are also closed
under intersection

A more direct way would be to construct a
finite automaton for L ∩ M

DFA construction for L ∩ M

A_L = DFA for L = {Q_L, ∑, q_L, F_L, δ_L}
A_M = DFA for M = {Q_M, ∑, q_M, F_M, δ_M}
Build A_{L∩M} = {Q_L x Q_M, ∑, (q_L, q_M), F_L x F_M, δ}
such that:
δ((p,q), a) = (δ_L(p,a), δ_M(q,a)), where p is in Q_L and q is in Q_M

This construction ensures that a string w will
be accepted if and only if w reaches an
accepting state in both input DFAs.
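The product construction can be written out directly. A sketch with two hypothetical example DFAs of my own (even number of 0s; last symbol is 1), cross-checked against the two inputs:

```python
# Product construction for L ∩ M: states are pairs, finals are F_L x F_M,
# and δ((p,q), a) = (δ_L(p,a), δ_M(q,a)).
from itertools import product

def product_dfa(dfa1, dfa2, alphabet):
    (s1, d1, f1), (s2, d2, f2) = dfa1, dfa2
    states1 = {p for (p, _) in d1}
    states2 = {q for (q, _) in d2}
    delta = {((p, q), a): (d1[p, a], d2[q, a])
             for p in states1 for q in states2 for a in alphabet}
    finals = {(p, q) for p in f1 for q in f2}
    return (s1, s2), delta, finals

def accepts(dfa, w):
    q, delta, finals = dfa
    for a in w:
        q = delta[q, a]
    return q in finals

L = ("e", {("e", "0"): "o", ("e", "1"): "e",     # even number of 0s
           ("o", "0"): "e", ("o", "1"): "o"}, {"e"})
M = ("s", {("s", "0"): "s", ("s", "1"): "t",     # last symbol is 1
           ("t", "0"): "s", ("t", "1"): "t"}, {"t"})
LM = product_dfa(L, M, "01")

# cross-check: LM accepts w iff both L and M accept w
for n in range(6):
    for bits in product("01", repeat=n):
        w = "".join(bits)
        assert accepts(LM, w) == (accepts(L, w) and accepts(M, w))
```

Swapping `finals` for pairs where at least one component is final would give the union instead, which is the FA-based answer to the earlier question about union.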

DFA construction for L ∩ M
(Figure: the DFA for L with states q0 … qF1, the DFA for M with states
p0 … pF1, and the product DFA for L ∩ M whose states are the pairs (qi, pi),
with start state (q0, p0) and accepting state (qF1, pF1); a transition on a
takes (qi, pi) to (qj, pj) exactly when qi→qj on a in L's DFA and pi→pj on a
in M's DFA.)

RLs are closed under set difference

We observe:
L - M = L ∩ complement(M)
(closed under complementation;
closed under intersection)
Therefore, L - M is also regular

RLs are closed under reversal

Reversal of a string w is denoted by w^R
E.g., w = 00111, w^R = 11100
Reversal of a language:
L^R = the language generated by reversing
all strings in L
Theorem: If L is regular then L^R is also
regular


ε-NFA Construction for L^R

Starting from a DFA for L, build a new ε-NFA for L^R:
- Reverse all transitions
- Make the old start state q0 the only new final state
- Convert the old set of final states into non-final states,
  and add a new start state with ε-transitions to each of them

What to do if q0 was one of the final states in the input DFA?

If L is regular, L^R is regular (proof
using regular expressions)

Let E be a regular expression for L
Given E, how to build E^R?
Basis: If E = ε, Ø, or a, then E^R = E
Induction: Every part of E (refer to the part as "F")
can be in only one of the three following forms:
1. F = F1 + F2
   F^R = F1^R + F2^R
2. F = F1 F2
   F^R = F2^R F1^R
3. F = (F1)*
   F^R = (F1^R)*

Homomorphisms

Substitute each symbol in ∑ (the main alphabet)
by a corresponding string over T (another
alphabet):
h: ∑ ---> T*

Example:
Let ∑ = {0,1} and T = {a,b}
Let a homomorphic function h on ∑ be:
h(0) = ab, h(1) = ε
If w = 10110, then h(w) = ε ab ε ε ab = abab
In general,
h(w) = h(a1) h(a2) … h(an)
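Applying h symbol by symbol is direct; a tiny sketch reproducing the slide's example (`apply_h` is my name for the extension of h to strings):

```python
# h(0) = ab, h(1) = ε, extended to strings symbol by symbol:
# h(w) = h(a1) h(a2) ... h(an)
h = {"0": "ab", "1": ""}

def apply_h(w):
    return "".join(h[a] for a in w)

print(apply_h("10110"))   # abab
```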

RLs are closed under homomorphisms

Theorem: If L is regular, then so is h(L)
Proof: If E is a RE for L, then show L(h(E)) = h(L(E))
Basis: If E = ε, Ø, or a, then the claim holds.
Induction: There are three forms of E:
1. E = E1 + E2
   L(h(E)) = L(h(E1) + h(E2)) = L(h(E1)) U L(h(E2)) ----- (1)
   h(L(E)) = h(L(E1) + L(E2)) = h(L(E1)) U h(L(E2)) ----- (2)
   By the inductive hypothesis, L(h(E1)) = h(L(E1)) and L(h(E2)) = h(L(E2))
   Therefore, L(h(E)) = h(L(E))
2. E = E1 E2: similar argument
3. E = (E1)*: similar argument
   (alternatively, think of a DFA-based construction)

FA Construction for h(L)
Given a DFA for L, how to convert it into an FA for h(L)?
- Replace every edge labeled "a" by a path labeled h(a) in the new FA,
  i.e., build a new FA that simulates h(a) for every symbol-a transition
  in the above DFA
- The resulting FA may or may not be a DFA, but will be an FA for h(L)

Inverse homomorphism

Let h: ∑ ---> T*
Let M be a language over alphabet T
h^-1(M) = {w | w ∈ ∑* s.t. h(w) ∈ M}
(the set of strings in ∑* whose homomorphic
translation results in strings of M)

Claim: If M is regular, then so is h^-1(M)
Proof:
Let A be a DFA for M
Construct another DFA A' which encodes h^-1(M)
A' is an exact replica of A, except that its transition
function is such that, for any input symbol a in ∑, A'
simulates reading h(a) in A:
δ'(p, a) = δ̂(p, h(a))   (where δ̂ is A's extended transition function)

Decision properties of regular languages

Any "decision problem" looks like this:
Input (generally a question) → Decision problem solver → Yes / No

Membership question

Decision Problem: Given L, is w in L?
Possible answers: Yes or No
Approach:
1. Build a DFA for L
2. Input w to the DFA
3. If the DFA ends in an accepting state,
then yes; otherwise no.
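The three-step approach is just a DFA run. A minimal sketch (the example DFA, for binary strings ending in 1, is mine):

```python
# Membership test: run the DFA on w and check the final state.
def member(dfa, w):
    start, delta, finals = dfa
    q = start
    for a in w:                  # step 2: feed w to the DFA
        q = delta[q, a]
    return q in finals           # step 3: accepting state => yes

# DFA for binary strings that end in 1
dfa = ("s", {("s", "0"): "s", ("s", "1"): "t",
             ("t", "0"): "s", ("t", "1"): "t"}, {"t"})
```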

Emptiness test

Decision Problem: Is L = Ø?
Approach: On a DFA for L:
1. From the start state, run a reachability test, which returns:
   1. success: if there is at least one final state that is
      reachable from the start state
   2. failure: otherwise
2. L = Ø if and only if the reachability test fails

How to implement the reachability test?
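One standard answer to the question above is breadth-first search from the start state; a sketch (the two example DFAs are mine, chosen so one has an unreachable final state):

```python
# Emptiness test: BFS reachability from the start state;
# L = Ø iff no final state is reachable.
from collections import deque

def is_empty(dfa, alphabet):
    start, delta, finals = dfa
    seen, queue = {start}, deque([start])
    while queue:
        q = queue.popleft()
        if q in finals:
            return False          # a reachable final state => L != Ø
        for a in alphabet:
            r = delta[q, a]
            if r not in seen:
                seen.add(r)
                queue.append(r)
    return True

# final state f is unreachable from start state a => L = Ø
dfa_empty = ("a", {("a", "0"): "a", ("a", "1"): "a",
                   ("b", "0"): "f", ("b", "1"): "a",
                   ("f", "0"): "f", ("f", "1"): "f"}, {"f"})
dfa_nonempty = ("a", {("a", "0"): "f", ("a", "1"): "a",
                      ("f", "0"): "f", ("f", "1"): "f"}, {"f"})
```

Depth-first search works equally well; only reachability matters, not the visit order.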

Finiteness

Decision Problem: Is L finite or infinite?
Approach: On a DFA for L:
1. Remove all states unreachable from the start state
2. Remove all states that cannot lead to any accepting state
3. After removal, check for cycles in the resulting FA
4. L is finite if there are no cycles; otherwise it is infinite

Another approach:
Build a regular expression and look for Kleene closure

How to implement steps 2 and 3?
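One way to implement steps 1 to 3: forward reachability for step 1, backward reachability (over reversed edges) for step 2, and a depth-first cycle check for step 3. A sketch under my own DFA encoding, with two small example DFAs (for 0* and for {ε, 0}):

```python
# Finiteness test: keep only states reachable from the start AND able to
# reach a final state; L is infinite iff a cycle survives the trimming.
def is_infinite(states, start, delta, finals, alphabet):
    succ = {q: {delta[q, a] for a in alphabet} for q in states}
    pred = {q: {p for p in states if q in succ[p]} for q in states}
    def reach(srcs, edges):
        seen, stack = set(srcs), list(srcs)
        while stack:
            q = stack.pop()
            for r in edges[q]:
                if r not in seen:
                    seen.add(r)
                    stack.append(r)
        return seen
    live = reach({start}, succ) & reach(set(finals), pred)  # steps 1 & 2
    # step 3: DFS cycle detection restricted to the live states
    color = {q: 0 for q in live}       # 0=unvisited, 1=on stack, 2=done
    def has_cycle(q):
        color[q] = 1
        for r in succ[q]:
            if r in live and (color[r] == 1 or
                              (color[r] == 0 and has_cycle(r))):
                return True
        color[q] = 2
        return False
    return any(color[q] == 0 and has_cycle(q) for q in live)

# L(A1) = 0* (infinite): the live start/final state has a self-loop
states1 = {"a", "d"}
delta1 = {("a", "0"): "a", ("a", "1"): "d",
          ("d", "0"): "d", ("d", "1"): "d"}
# L(A2) = {ε, 0} (finite): no cycle among the live states
states2 = {"a", "b", "d"}
delta2 = {("a", "0"): "b", ("a", "1"): "d", ("b", "0"): "d",
          ("b", "1"): "d", ("d", "0"): "d", ("d", "1"): "d"}
```

The dead sink state d is trimmed in both examples, so its self-loop never counts as a cycle.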

Finiteness test – examples
Ex 1) Is the language of this DFA finite or infinite?
(After removing the unreachable and dead states, no cycle remains ==> FINITE)
Ex 2) Is the language of this DFA finite or infinite?
(A cycle survives among the useful states ==> INFINITE)

Equivalence & Minimization of DFAs

Applications of interest

Comparing two DFAs:
L(DFA1) == L(DFA2)?

How to minimize a DFA?
1. Remove unreachable states
2. Identify & condense equivalent states into one

When to call two states in a DFA "equivalent"?

Two states p and q are said to be equivalent (p ≡ q) iff:
i) Any string w accepted by starting at p is also accepted by
starting at q;
AND
ii) Any string w rejected by starting at p is also rejected by
starting at q.
(Only the future matters; the past doesn't!)

Computing equivalent states in a DFA:
the Table-Filling Algorithm

(The example is a DFA with states A–H; the algorithm fills a triangular
table with one cell per pair of states, marking pairs that are
distinguishable.)

Pass #0
1. Mark accepting states ≠ non-accepting states
Pass #1
1. Compare every pair of states
2. Distinguish by one-symbol transitions
3. Mark = or ≠ or blank (tbd)
Pass #2
1. Compare every pair of states
2. Distinguish by up to two-symbol transitions (until different or same or tbd)
…
(keep repeating until the table is complete)

Table-Filling Algorithm – step by step

1. Mark X between each accepting vs. non-accepting pair of states
2. Pass 1: look 1 hop away for distinguishing states or strings
3. Pass 2: look 1 hop away again for distinguishing states or strings
…continue until no new pair can be marked.

For the example DFA (states A–H), the unmarked pairs that survive give the
equivalences:
• A=B
• C=H
• D=G
Retain only one copy for each equivalence set of states; the example DFA
shrinks from eight states (A–H) to five (A, C, D, E, F).

Table-Filling Algorithm – special case
Q) What happens if the input DFA has more than one final state?
Can all final states initially be treated as equivalent to one another?
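The marking passes amount to a fixed-point loop: keep marking a pair as distinguishable whenever some symbol sends it to an already-marked pair. A sketch on a small hypothetical DFA of my own (not the slides' A–H example, whose full transition table isn't reproduced here):

```python
# Table-filling algorithm: mark distinguishable pairs until no change;
# the unmarked pairs are the equivalent ones.
from itertools import combinations

def equivalent_pairs(states, alphabet, delta, finals):
    pairs = [frozenset(pq) for pq in combinations(sorted(states), 2)]
    marked = {pq for pq in pairs
              if len(pq & set(finals)) == 1}   # pass 0: final vs non-final
    changed = True
    while changed:                             # passes 1, 2, ...
        changed = False
        for pq in pairs:
            if pq in marked:
                continue
            p, q = tuple(pq)
            for a in alphabet:
                succ = frozenset({delta[p, a], delta[q, a]})
                if len(succ) == 2 and succ in marked:
                    marked.add(pq)
                    changed = True
                    break
    return [pq for pq in pairs if pq not in marked]

# A and B are equivalent: both loop on 1 and fall into accepting sink C on 0
delta = {("A", "0"): "C", ("A", "1"): "A",
         ("B", "0"): "C", ("B", "1"): "B",
         ("C", "0"): "C", ("C", "1"): "C"}
eq = equivalent_pairs({"A", "B", "C"}, "01", delta, {"C"})
```

Pass 0 also answers the special-case question: all final states start out unmarked against each other (tentatively equivalent), and later passes separate them only if some string distinguishes them.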

Putting it all together…
How to minimize a DFA?

Goal: Minimize the number of states in a DFA

Algorithm:
1. Eliminate states unreachable from the start state
   (depth-first traversal from the start state)
2. Identify and remove equivalent states
   (table-filling algorithm)
3. Output the resultant DFA

Are Two DFAs Equivalent?

1. Make a new dummy DFA by just putting together both DFAs
   (a unified DFA with the two start states q0 and q0')
2. Run the table-filling algorithm on the unified DFA
3. IF the start states of both DFAs are found to be equivalent
   (q0 ≡ q0'), THEN DFA1 ≡ DFA2; ELSE they are different.

Summary

How to prove languages are not regular?
Pumping lemma & its applications
Closure properties of regular languages
Simplification of DFAs
How to remove unreachable states?
How to identify and collapse equivalent states?
How to minimize a DFA?
How to tell whether two DFAs are equivalent?

Grammars
Unit - 3

Overview
1.Grammars & Chomsky hierarchy
2.Regular Grammar (RG)
3.Equivalence of Left & Right linear grammar
4.Equivalence of RG & FA
5.Context Free Grammars (CFG): Definition, Sentential forms, Leftmost & Rightmost
derivations, Parse tree, Ambiguity.
6.Simplification & Applications.
7.Normal Forms: Chomsky Normal Forms (CNF) & Greibach Normal Forms (GNF).
8.CFLs - Pumping lemma, Closure properties

Regular grammar is a type of grammar that describes a regular language. A regular
grammar is a mathematical object, G, which consists of four components, G = (N, ∑, P,
S), where
●N: non-empty, finite set of non-terminal symbols,
●∑: a finite set of terminal symbols, or alphabet,
●P: a set of grammar rules, each one having one of the forms
○A ⇢ aB
○A ⇢ a
○A ⇢ ∈, where ∈ = the empty string, A, B ∈ N, and a ∈ ∑
●S ∈ N is the start symbol

This grammar can be of two forms:
1.Right Linear Regular Grammar
2.Left Linear Regular Grammar

Right Linear Regular Grammar
In this type of regular grammar, all the non-terminals on the right-hand side exist at the
rightmost place, i.e., at the right end.
Examples :
A ⇢ a, A ⇢ aB, A ⇢ ∈ where A and B are non-terminals, a is a terminal, and ∈ is the
empty string
S ⇢ 00B | 11S
B ⇢ 0B | 1B | 0 | 1
where, S and B are non-terminals, and 0 and 1 are terminals

Left Linear Regular Grammar
In this type of regular grammar, all the non-terminals on the right-hand side exist at the
leftmost place, i.e., at the left end.
Examples :
A ⇢ a, A ⇢ Ba, A ⇢ ∈
where A and B are non-terminals, a is a terminal, and ∈ is the empty string
S ⇢ B00 | S11
B ⇢ B0 | B1 | 0 | 1
where S and B are non-terminals, and 0 and 1 are terminals
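A right-linear grammar can be animated with a small breadth-first derivation search. This sketch (the function `generate` and its length bound are mine) enumerates the short strings of the grammar S ⇢ 00B | 11S, B ⇢ 0B | 1B | 0 | 1 above:

```python
# Enumerate the strings of length <= max_len derivable from a right-linear
# grammar. Sentential forms are strings; uppercase letters are non-terminals.
from collections import deque

rules = {"S": ["00B", "11S"], "B": ["0B", "1B", "0", "1"]}

def generate(start, max_len):
    out, seen = set(), set()
    queue = deque([start])
    while queue:
        s = queue.popleft()
        if s in seen:
            continue
        seen.add(s)
        nt = next((c for c in s if c.isupper()), None)
        if nt is None:                 # all terminals: a derived string
            if len(s) <= max_len:
                out.add(s)
            continue
        if len(s) - 1 > max_len:       # terminal prefix already too long
            continue
        for body in rules[nt]:         # expand the (single) non-terminal
            queue.append(s.replace(nt, body, 1))
    return out

words = generate("S", 5)
```

The enumeration suggests (and a derivation argument confirms) that the language is (11)* 00 (0|1)+: some 11-pairs, then 00, then at least one further bit.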

Context-Free Languages &
Grammars
(CFLs & CFGs)

Not all languages are regular
So what happens to the languages
which are not regular?
Can we still come up with a language
recognizer?
i.e., something that will accept (or reject)
strings that belong (or do not belong) to the
language?

Context-Free Languages
A language class larger than the class of regular
languages
Supports natural, recursive notation called “context-free grammar”
Applications:
Parse trees, compilers
XML
(Diagram: the regular languages (FA/RE) are properly contained in the
context-free languages (PDA/CFG).)

An Example
A palindrome is a word that reads identical from both
ends
E.g., madam, redivider, malayalam, 010010010
Let L = { w | w is a binary palindrome}
Is L regular?
No.
Proof:
Let w = 0^N 1 0^N (assuming N to be the p/l constant)
By the pumping lemma, w can be rewritten as xyz, such that x y^k z is also in L
(for any k ≥ 0)
But |xy| ≤ N and y ≠ ε
==> y = 0^+
==> x y^k z will NOT be in L for k = 0
==> Contradiction

But the language of
palindromes…
is a CFL, because it supports recursive
substitution (in the form of a CFG)
This is because we can construct a
“grammar” like this:
1.A ==> ε
2.A ==> 0
3.A ==> 1
4.A ==> 0A0
5.A ==> 1A1
(A is a variable or non-terminal; 0 and 1 are terminals; rules 1–5 are productions.)
How does this grammar work?
Same as:
A => 0A0 | 1A1 | 0 | 1 | ε

How does the CFG for
palindromes work?
An input string belongs to the language (i.e., is
accepted) iff it can be generated by the CFG
G:
A => 0A0 | 1A1 | 0 | 1 | ε
Example: w = 01110
G can generate w as follows:
1.A => 0A0
2.  => 01A10
3.  => 01110
Generating a string from a grammar:
1.Pick and choose a sequence
of productions that would
allow us to generate the
string.
2.At every step, substitute one variable
with one of its productions.

Context-Free Grammar:
Definition
A context-free grammar G=(V,T,P,S), where:
V: set of variables or non-terminals
T: set of terminals (= alphabet U {ε})
P: set of productions, each of which is of the form
   V ==> α1 | α2 | …
   where each αi is an arbitrary string of variables and
   terminals
S: the start variable
CFG for the language of binary palindromes:
G=({A},{0,1},P,A)
P: A ==> 0 A 0 | 1 A 1 | 0 | 1 | ε
8
More examples
Parenthesis matching in code
Syntax checking
In scenarios where there is a general need
for:
Matching a symbol with another symbol, or
Matching a count of one symbol with that of
another symbol, or
Recursively substituting one symbol with a string
of other symbols

9
Example #2
Language of balanced paranthesis
e.g., ()(((())))((()))….
CFG?
G:
S => (S) | SS | 
How would you “interpret” the string “(((()))()())” using this grammar?

10
Example #3
A grammar for L = {0
m
1
n
| m≥n}
CFG? G:
S => 0S1 | A
A => 0A | 
How would you interpret the string “00000111”
using this grammar?

Example #4
A program containing if-then(-else) statements
if Condition then Statement else Statement
(Or)
if Condition then Statement
CFG?

More examples
L1 = {0^n | n ≥ 0}
L2 = {0^n | n ≥ 1}
L3 = {0^i 1^j 2^k | i=j or j=k, where i,j,k ≥ 0}
L4 = {0^i 1^j 2^k | i=j or i=k, where i,j,k ≥ 1}

Applications of CFLs & CFGs
Compilers use parsers for syntactic checking
Parsers can be expressed as CFGs
1.Balancing parentheses:
B ==> BB | (B) | Statement
Statement ==> …
2.If-then-else:
S ==> SS | if Condition then Statement else Statement | if Condition
then Statement | Statement
Condition ==> …
Statement ==> …
3.C parenthesis matching { … }
4.Pascal begin-end matching
5.YACC (Yet Another Compiler-Compiler)

More applications
Markup languages
Nested Tag Matching
HTML
<html> …<p> … <a href=…> … </a> </p> … </html>
XML
<PC> … <MODEL> … </MODEL> .. <RAM> …
</RAM> … </PC>

Tag-Markup Languages
Roll ==> <ROLL> Class Students </ROLL>
Class ==> <CLASS> Text </CLASS>
Text ==> Char Text | Char
Char ==> a | b | … | z | A | B | … | Z
Students ==> Student Students | ε
Student ==> <STUD> Text </STUD>
Here, the left-hand side of each production denotes a non-terminal
(e.g., “Roll”, “Class”, etc.)
Those symbols on the right-hand side for which no productions (i.e.,
substitutions) are defined are terminals (e.g., ‘a’, ‘b’, ‘|’, ‘<’, ‘>’, “ROLL”,
etc.)

16
Structure of a production
A ==> α1 | α2 | … | αk
(head ==> body; each substitution step is a derivation)
The above is the same as:
1. A ==> α1
2. A ==> α2
3. A ==> α3
…
k. A ==> αk

17
CFG conventions
Terminal symbols <== a, b, c…
Non-terminal symbols <== A,B,C, …
Terminal or non-terminal symbols <== X, Y, Z
Terminal strings <== w, x, y, z
Arbitrary strings of terminals and non-
terminals <== α, β, γ, …

18
Syntactic Expressions in
Programming Languages
result = a*b + score + 10 * distance + c
Regular languages have only terminals
Reg expression = [a-z][a-z0-9]*
If we allow only letters a & b, and 0 & 1 for
constants (for simplification)
Regular expression = (a+b)(a+b+0+1)*
(terminals: the variables and constants; operators are also
terminals)

19
String membership
How to tell if a string belongs to the language
defined by a CFG?
1.Derivation
Head to body
2.Recursive inference
Body to head
Example:
w = 01110
Is w a palindrome?
Both are equivalent forms
G:
A => 0A0 | 1A1 | 0 | 1 | ε
A => 0A0
=> 01A10
=> 01110
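Both views can be turned into a direct membership test for this grammar: the recursion below peels matching outer symbols (the derivation, read top-down) until one of the base productions applies (recursive inference, read bottom-up). A small sketch; `in_Gpal` is an illustrative name:

```python
def in_Gpal(w):
    # Mirrors the productions of G: A => 0A0 | 1A1 | 0 | 1 | ε.
    # Base cases: A => ε (empty) and A => 0 | 1 (single symbol).
    if len(w) <= 1:
        return all(c in "01" for c in w)
    # A => 0A0 | 1A1: first and last symbols must match.
    return w[0] == w[-1] and w[0] in "01" and in_Gpal(w[1:-1])

print(in_Gpal("01110"))  # True: matches the derivation on the slide
print(in_Gpal("0110"))   # True (even-length palindrome, via A => ε)
print(in_Gpal("0100"))   # False
```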

20
Simple Expressions…
We can write a CFG for accepting simple
expressions
G = (V,T,P,S)
V = {E,F}
T = {0,1,a,b,+,*,(,)}
S = {E}
P:
E ==> E+E | E*E | (E) | F
F ==> aF | bF | 0F | 1F | a | b | 0 | 1

21
Generalization of derivation
Derivation is head ==> body
A ==> X (A derives X in a single step)
A ==>*G X (A derives X in multiple steps)
Transitivity:
IF A ==>*G B, and B ==>*G C, THEN A ==>*G C

22
Context-Free Language
The language of a CFG, G=(V,T,P,S),
denoted by L(G), is the set of terminal
strings that have a derivation from the
start variable S.
L(G) = { w in T* | S ==>*G w }

23
Left-most & Right-most
Derivation Styles
Derive the string a*(ab+10) from G:
E
==> E* E
==> F* E
==> aF* E
==> a * E
==> a * (E)
==> a * (E+ E)
==> a * (F+ E)
==> a * (aF+ E)
==> a * (abF + E)
==> a * (ab + E)
==> a * (ab + F)
==> a * (ab + 1F)
==> a * (ab + 10F)
==> a * (ab + 10)
E ==>*G a*(ab+10)
Left-most
derivation:
E
==> E * E
==> E * (E)
==> E * (E + E)
==> E * (E + F)
==> E * (E + 1F)
==> E * (E + 10F)
==> E * (E+ 10)
==> E * (F+ 10)
==> E * (aF+ 10)
==> E * (abF + 10)
==>E * (ab + 10)
==>F * (ab + 10)
==> aF* (ab + 10)
==> a * (ab + 10)
Right-most
derivation:
G:
E => E+E | E*E | (E) | F
F => aF | bF | 0F | 1F | ε
Always
substitute
leftmost
variable
Always
substitute
rightmost
variable

24
Leftmost vs. Rightmost
derivations
Q1) For every leftmost derivation, there is a rightmost
derivation, and vice versa. True or False?
Q2) Does every word generated by a CFG have a
leftmost and a rightmost derivation?
Q3) Could there be words which have more than one
leftmost (or rightmost) derivation?
True -will use parse trees to prove this
Yes –easy to prove (reverse direction)
Yes –depending on the grammar

How to prove that your CFGs
are correct?
(using induction)
25

26
CFG & CFL
Theorem: A string w in (0+1)* is in
L(Gpal), if and only if, w is a palindrome.
Proof:
Use induction
on string length for the IF part
On length of derivation for the ONLY IF part
Gpal:
A => 0A0 | 1A1 | 0 | 1 | ε

Parse trees
27

28
Parse Trees
Each derivation in a CFG can be represented using a parse tree:
Each internal node is labeled by a variable in V
Each leaf is a terminal symbol
For a production A ==> X1X2…Xk, any internal node
labeled A has k children, which are labeled X1, X2, …, Xk
from left to right
Parse tree for the production (and all other subsequent productions):
A ==> X1..Xi..Xk
(A at the root, with children X1 … Xi … Xk)

29
Examples
Parse tree for a + 1, using G:
E => E+E | E*E | (E) | F
F => aF | bF | 0F | 1F | 0 | 1 | a | b
E at the root with children E, +, E; each child E derives F,
and the F’s derive a and 1.
Parse tree for 0110, using G:
A => 0A0 | 1A1 | 0 | 1 | ε
A at the root with children 0, A, 0; the inner A has children
1, A, 1; the innermost A derives ε.
Reading a parse tree bottom-up is recursive inference;
reading it top-down is derivation.

30
Parse Trees, Derivations, and
Recursive Inferences
Production: A ==> X1..Xi..Xk
Parse tree: A at the root, with children X1 … Xi … Xk
Reading the tree top-down gives a derivation (left-most or
right-most); reading it bottom-up gives a recursive inference.

31
Interchangeability of different
CFG representations
Parse tree ==> left-most derivation
DFS left to right
Parse tree ==> right-most derivation
DFS right to left
==> for a given parse tree, the left-most and right-most
derivations are equivalent
Derivation ==> Recursive inference
Reverse the order of productions
Recursive inference ==> Parse trees
bottom-up traversal of parse tree

Connection between CFLs
and RLs
32

33
CFLs & Regular Languages
A CFG is said to be right-linear if all the
productions are one of the following two
forms: A ==> wB (or) A ==> w
Theorem 1:Every right-linear CFG generates
a regular language
Theorem 2:Every regular language has a
right-linear grammar
Theorem 3:Left-linear CFGs also represent
RLs
Where:
•A & B are variables,
•w is a string of terminals
What kind of grammars result for regular languages?
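Theorem 1 can be made concrete: a right-linear grammar can be run directly, consuming the terminal prefix of each body and recursing on its trailing variable, exactly as a finite automaton follows transitions. A sketch using the grammar from the next slide; the `seen` set is this sketch's addition, guarding against unit-production loops such as A => C:

```python
def matches(gram, var, s, seen=frozenset()):
    # Right-linear CFG membership: every body is a terminal string,
    # optionally followed by a single (uppercase) variable.
    if (var, s) in seen:               # guard against A => C style loops
        return False
    seen = seen | {(var, s)}
    for body in gram[var]:
        if body and body[-1].isupper():            # A -> wB
            w, nxt = body[:-1], body[-1]
            if s.startswith(w) and matches(gram, nxt, s[len(w):], seen):
                return True
        elif s == body:                            # A -> w
            return True
    return False

gram = {"A": ["01B", "C"],
        "B": ["11B", "0C", "1A"],
        "C": ["1A", "0", "1"]}
print(matches(gram, "A", "0110"))  # True: A => 01B => 011A => 011C => 0110
print(matches(gram, "A", "01"))    # False
```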

Some Examples
34
[Figure: two finite automata over states A, B, C with transitions on 0 and 1]
A => 01B | C
B => 11B | 0C | 1A
C => 1A | 0 | 1
Right linear CFG?
Right linear CFG?
Finite Automaton?

Ambiguity in CFGs and CFLs
35

36
Ambiguity in CFGs
A CFG is said to be ambiguous if there
exists a string which has more than one
left-most derivation
LM derivation #1:
S => AS
=> 0A1S
=> 0A11S
=> 00111S
=> 00111
Example:
S ==> AS | ε
A ==> A1 | 0A1 | 01
Input string: 00111
Can be derived in two ways
LM derivation #2:
S => AS
=> A1S
=> 0A11S
=> 00111S
=> 00111

37
Why does ambiguity matter?
string = a * b + c
E ==> E + E | E * E | (E) | a | b | c | 0 | 1
•LM derivation #1:
•E => E + E => E * E + E
==>* a * b + c
•LM derivation #2
•E => E * E => a * E =>
a * E + E ==>* a * b + c
Parse tree #1: E ==> E + E, with the left E ==> E * E;
leaves a, b, c. This tree means (a*b)+c.
Parse tree #2: E ==> E * E, with the right E ==> E + E;
leaves a, b, c. This tree means a*(b+c).
Values are
different !!!
The calculated value depends on which
of the two parse trees is actually used.

38
Removing Ambiguity in
Expression Evaluations
It MAY be possible to remove ambiguity for
some CFLs
E.g., in a CFG for expression evaluation by
imposing rules & restrictions such as precedence
This would imply rewrite of the grammar
Precedence:(), * , +
How will this avoid ambiguity?
E => E + T | T
T => T * F | F
F => I | (E)
I => a | b | c | 0 | 1
Modified unambiguous version:
Ambiguous version:
E ==> E + E | E * E | (E) | a | b | c | 0 | 1
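The precedence layering can be seen operationally: a recursive-descent evaluator with one function per variable of the unambiguous grammar. Left recursion such as E → E + T is realized as iteration (E → T { + T }), which computes the same left-associative result. A sketch, with numeric tokens standing in for the grammar's identifiers:

```python
def evaluate(tokens):
    # Recursive-descent evaluator for the unambiguous grammar:
    #   E -> E + T | T,  T -> T * F | F,  F -> ( E ) | number
    pos = [0]

    def peek():
        return tokens[pos[0]] if pos[0] < len(tokens) else None

    def eat():
        pos[0] += 1
        return tokens[pos[0] - 1]

    def factor():                 # F -> ( E ) | number
        if peek() == "(":
            eat()
            v = expr()
            eat()                 # consume ')'
            return v
        return int(eat())

    def term():                   # T -> F { * F }
        v = factor()
        while peek() == "*":
            eat()
            v *= factor()
        return v

    def expr():                   # E -> T { + T }
        v = term()
        while peek() == "+":
            eat()
            v += term()
        return v

    return expr()

print(evaluate(["2", "*", "3", "+", "4"]))            # 10: (2*3)+4
print(evaluate(["2", "*", "(", "3", "+", "4", ")"]))  # 14
```

Because '*' is handled one level below '+', the ambiguous choice between (2*3)+4 and 2*(3+4) never arises: the grammar forces the first reading.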

39
Inherently Ambiguous CFLs
However, for some languages, it may not be
possible to remove ambiguity
A CFL is said to be inherently ambiguousif
every CFG that describes it is ambiguous
Example:
L = {a^n b^n c^m d^m | n,m ≥ 1} U {a^n b^m c^m d^n | n,m ≥ 1}
L is inherently ambiguous
Why?
Input string: a^n b^n c^n d^n

Objective
1.What is a linear grammar? What is a left linear
grammar? What is a right linear grammar?
2.Left linear grammars are evil –why?
3.What algorithm can be used to convert a left
linear grammar to a right linear grammar?
1

Linear grammar
•A linear grammar is a context-free grammar
that has at most one non-terminal symbol on
the right hand side of each grammar rule.
–A rule may have just terminal symbols on the right
hand side (zero non-terminals).
•Here is a linear grammar:
2
S → aA
A → aBb
B → Bb

Left linear grammar
•A left linear grammar is a linear grammar in
which the non-terminal symbol always occurs
on the left side.
•Here is a left linear grammar:
S → Aa
A → ab
3

Right linear grammar
•A right linear grammar is a linear grammar in
which the non-terminal symbol always occurs
on the right side.
•Here is a right linear grammar:
S → abaA
A → ε
4

Left linear grammars are evil
•Consider this rule from a left linear grammar:
A → Babc
•Can that rule be used to recognize this string:
abbabc
•We need to check the rule for B:
B → Cb | D
•Now we need to check the rules for C and D.
•This is very complicated.
•Left linear grammars require complex parsers.
5

Right linear grammars are good
•Consider this rule from a right linear grammar:
A → abcB
•Can that rule be used to recognize this string:
abcabb
•We immediately see that the first part of the
string, abc, matches the first part of the rule.
Thus, the problem simplifies to this: can the rule
for B be used to recognize this string:
abb
•Parsers for right linear grammars are much
simpler.
6

Convert left linear to right linear
Now we will see an algorithm for converting any
left linear grammar to its equivalent right linear
grammar.
S → Aa
A → ab
left linear
Both grammars generate this language: {aba}
S → abaA
A → ε
right linear
7

May need to make a new start symbol
The algorithm on the following slides assumes
that the left linear grammar doesn’t have any
rules with the start symbol on the right hand
side.
–If the left linear grammar has a rule with the start
symbol S on the right hand side, simply add this
rule:
S0 → S
8

Symbols used by the algorithm
•Let S denote the start symbol
•Let A, B denote non-terminal symbols
•Let p denote zero or more terminal symbols
•Let ε denote the empty symbol
9

Algorithm
1) If the left linear grammar has a rule S → p, then
make that a rule in the right linear grammar
2) If the left linear grammar has a rule A → p, then add
the following rule to the right linear grammar:
S → pA
3) If the left linear grammar has a rule B → Ap, add the
following rule to the right linear grammar:
A → pB
4) If the left linear grammar has a rule S → Ap, then
add the following rule to the right linear grammar:
A → p
10
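The four rules can be sketched directly in code. A minimal version assuming, as on these slides, that variables are uppercase letters, that bodies are plain strings with the variable (if any) leading, and that the start symbol never appears on a right hand side; `left_to_right_linear` is an illustrative name:

```python
def left_to_right_linear(rules, start="S"):
    # The slide's 4-rule conversion.  rules: list of (head, body) pairs
    # from a left linear grammar; a body is either all terminals ("p")
    # or one leading variable followed by terminals ("Ap").
    out = []
    for head, body in rules:
        if body and body[0].isupper():            # head -> A p
            a, p = body[0], body[1:]
            if head == start:
                out.append((a, p))                # rule 4: A -> p
            else:
                out.append((a, p + head))         # rule 3: A -> p B
        else:                                     # head -> p
            if head == start:
                out.append((start, body))         # rule 1: S -> p
            else:
                out.append((start, body + head))  # rule 2: S -> p A
    return out

print(left_to_right_linear([("S", "Aa"), ("A", "ab")]))
# [('A', 'a'), ('S', 'abA')]: the slide's right linear grammar
```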

Convert this left linear grammar
11
S → Aa
A → ab
left linear

Right hand side has terminals
12
S → Aa
A → ab
left linear
2) If the left linear grammar has this rule A → p,
then add the following rule to the right linear
grammar: S → pA
S → abA
right linear

Right hand side of S has non-terminal
13
S → Aa
A → ab
left linear
4) If the left linear grammar has S → Ap, then
add the following rule to the right linear
grammar: A → p
S → abA
A → a
right linear

Equivalent!
14
S → Aa
A → ab
left linear
Both grammars generate this language: {aba}
S → abA
A → a
right linear

Convert this left linear grammar
15
left linear
S0 → S
S → Ab
S → Sb
A → Aa
A → a
S → Ab
S → Sb
A → Aa
A → a
original grammar
make a new
start symbol
Convert this

Right hand side has terminals
16
S0 → S
S → Ab
S → Sb
A → Aa
A → a
left linear
S0 → aA
right linear
2) If the left linear grammar has this rule A → p,
then add the following rule to the right linear
grammar: S → pA

Right hand side has non-terminal
17
S0 → S
S → Ab
S → Sb
A → Aa
A → a
left linear
S0 → aA
A → bS
A → aA
S → bS
right linear
3) If the left linear grammar has a rule B → Ap, add the
following rule to the right linear grammar: A → pB

Right hand side of start symbol has
non-terminal
18
S0 → S
S → Ab
S → Sb
A → Aa
A → a
left linear
S0 → aA
A → bS
A → aA
S → bS
S → ε
right linear
4) If the left linear grammar has S → Ap, then
add the following rule to the right linear
grammar: A → p

Equivalent!
19
S0 → S
S → Ab
S → Sb
A → Aa
A → a
left linear
S0 → aA
A → bS
A → aA
S → bS
S → ε
right linear
Both grammars generate this language: {a^+ b^+}

Will the algorithm always work?
•We have seen two examples where the
algorithm creates a right linear grammar that
is equivalent to the left linear grammar.
•But will the algorithm alwaysproduce an
equivalent grammar?
•Yes! The following slide shows why.
20

Generate string p
•Let p = a string generated by the left linear
grammar.
•We will show that the grammar generated by
the algorithm also produces p.
21

Case 1: the start symbol produces p
Suppose the left linear grammar has this rule:
S →p. Then the right linear grammar will
have the same rule (see 1 below). So the right
linear grammar will also produce p.
1)If the left linear grammar contains S →p, then put that rule in the right linear grammar.
2)If the left linear grammar contains A →p, then put this rule in the right linear grammar: S →pA
3)If the left linear grammar contains B →Ap, then put this rule in the right linear grammar: A →pB
4)If the left linear grammar contains S →Ap, then put this rule in the right linear grammar: A →p
Algorithm:
22

Case 2: multiple rules needed
to produce p
Suppose p is produced by a sequence of n
production rules:
S → A1p1
→ A2p2p1
→ A3p3p2p1
→ …
→ An-1pn-1…p3p2p1
→ pnpn-1…p3p2p1 = p (p is composed of the n parts p1, …, pn)
23

Case 2 (continued)
S → A1p1
→ A2p2p1
→ A3p3p2p1
→ …
→ An-1pn-1…p3p2p1
→ pnpn-1…p3p2p1
Let’s see what right linear rules will be generated
by the algorithm for the rules implied by this
production sequence.
24

Algorithm inputs and outputs
algorithm
left linear rules
right linear rules
25

Case 2 (continued)
1)If the left linear grammar contains S →p, then put that rule in the right linear grammar.
2)If the left linear grammar contains A →p, then put this rule in the right linear grammar: S →pA
3)If the left linear grammar contains B →Ap, then put this rule in the right linear grammar: A →pB
4)If the left linear grammar contains S →Ap, then put this rule in the right linear grammar: A →p
Algorithm:
S → A1p1
→ A2p2p1
→ A3p3p2p1
→ …
→ An-1pn-1…p3p2p1
→ pnpn-1…p3p2p1
From the left linear rule S → A1p1, the algorithm
produces the right linear rule A1 → p1 (see 4 above)
26

Case 2 (continued)
1)If the left linear grammar contains S →p, then put that rule in the right linear grammar.
2)If the left linear grammar contains A →p, then put this rule in the right linear grammar: S →pA
3)If the left linear grammar contains B →Ap, then put this rule in the right linear grammar: A →pB
4)If the left linear grammar contains S →Ap, then put this rule in the right linear grammar: A →p
Algorithm:
S → A1p1
→ A2p2p1
→ A3p3p2p1
→ …
→ An-1pn-1…p3p2p1
→ pnpn-1…p3p2p1
From the left linear rule A1 → A2p2, the algorithm
produces the right linear rule A2 → p2A1 (see 3 above)
27

Case 2 (continued)
S → A1p1
→ A2p2p1
→ A3p3p2p1
→ …
→ An-1pn-1…p3p2p1
→ pnpn-1…p3p2p1
Left linear rules so far:
S → A1p1
A1 → A2p2
Right linear rules produced by the algorithm:
A1 → p1
A2 → p2A1
28

Case 2 (continued)
Left linear rules so far:
S → A1p1
A1 → A2p2
Right linear rules produced by the algorithm:
A1 → p1
A2 → p2A1
From A2 we obtain p2p1:
A2 → p2A1
→ p2p1
29

Case 2 (continued)
1)If the left linear grammar contains S →p, then put that rule in the right linear grammar.
2)If the left linear grammar contains A →p, then put this rule in the right linear grammar: S →pA
3)If the left linear grammar contains B →Ap, then put this rule in the right linear grammar: A →pB
4)If the left linear grammar contains S →Ap, then put this rule in the right linear grammar: A →p
Algorithm:
S → A1p1
→ A2p2p1
→ A3p3p2p1
→ …
→ An-1pn-1…p3p2p1
→ pnpn-1…p3p2p1
From the left linear rule A2 → A3p3, the algorithm
produces the right linear rule A3 → p3A2 (see 3 above)
30

Case 2 (continued)
S → A1p1
→ A2p2p1
→ A3p3p2p1
→ …
→ An-1pn-1…p3p2p1
→ pnpn-1…p3p2p1
Left linear rules so far:
S → A1p1
A1 → A2p2
A2 → A3p3
Right linear rules produced by the algorithm:
A1 → p1
A2 → p2A1
A3 → p3A2
31

Case 2 (continued)
Left linear rules so far:
S → A1p1
A1 → A2p2
A2 → A3p3
Right linear rules produced by the algorithm:
A1 → p1
A2 → p2A1
A3 → p3A2
From A3 we obtain p3p2p1:
A3 → p3A2
→ p3p2A1
→ p3p2p1
32

Case 2 (continued)
1)If the left linear grammar contains S →p, then put that rule in the right linear grammar.
2)If the left linear grammar contains A →p, then put this rule in the right linear grammar: S →pA
3)If the left linear grammar contains B →Ap, then put this rule in the right linear grammar: A →pB
4)If the left linear grammar contains S →Ap, then put this rule in the right linear grammar: A →p
Algorithm:
S → A1p1
→ A2p2p1
→ A3p3p2p1
→ …
→ An-1pn-1…p3p2p1
→ pnpn-1…p3p2p1
From the left linear rule An-1 → pn, the algorithm
produces the right linear rule S → pnAn-1 (see 2 above)
33

Case 2 (concluded)
Left linear rules:
S → A1p1
A1 → A2p2
A2 → A3p3
…
An-1 → pn
Right linear rules produced by the algorithm:
A1 → p1
A2 → p2A1
A3 → p3A2
…
An-1 → pn-1An-2
S → pnAn-1
From S we obtain pn…p2p1:
S → pnAn-1
→ pnpn-1An-2
→ …
→ pnpn-1…p3A2
→ pnpn-1…p3p2A1
→ pnpn-1…p3p2p1 (this is the desired string, p)
34

We have shown that for any string p
generated by the left linear grammar,
the right linear grammar produced by
the algorithm will also generate p.
35

How we understand the algorithm
Left linear rules:
S → A1p1
A1 → A2p2
A2 → A3p3
…
An-1 → pn
Right linear rules produced by the algorithm:
A1 → p1
A2 → p2A1
A3 → p3A2
…
An-1 → pn-1An-2
S → pnAn-1
The left linear rules descend through the non-terminals until
reaching a rule with terminals on the RHS; the terminals
are output, then we unwind from the descent and output
the terminals.
The right linear rules make the rule with terminals on the RHS
the starting rule and output its terminals, then ascend through
the other rules.
36

How we understand the algorithm
Left linear rules:
S → A1p1
A1 → A2p2
A2 → A3p3
…
An-1 → pn
Right linear rules produced by the algorithm:
A1 → p1
A2 → p2A1
A3 → p3A2
…
An-1 → pn-1An-2
S → pnAn-1
If the left linear grammar contains A → p,
then put this rule in the right linear grammar:
S → pA
37

How we understand the algorithm
Left linear rules:
S → A1p1
A1 → A2p2
A2 → A3p3
…
An-1 → pn
Right linear rules produced by the algorithm:
A1 → p1
A2 → p2A1
A3 → p3A2
…
An-1 → pn-1An-2
S → pnAn-1
If the left linear grammar contains B → Ap,
then put this rule in the right linear grammar:
A → pB
38

How we understand the algorithm
Left linear rules:
S → A1p1
A1 → A2p2
A2 → A3p3
…
An-1 → pn
Right linear rules produced by the algorithm:
A1 → p1
A2 → p2A1
A3 → p3A2
…
An-1 → pn-1An-2
S → pnAn-1
If the left linear grammar contains S → Ap,
then put this rule in the right linear grammar:
A → p
39

Left-linear grammars generate
Type 3 languages
•Every left-linear grammar can be converted to
an equivalent right-linear grammar.
–“Equivalent right-linear grammar” means the two
grammars generate the same language.
•Right-linear grammars generate Type 3
languages.
•Therefore, every left-linear grammar
generates a Type 3 language.
40

1
Properties of Context-free
Languages
1)Simplifying CFGs, Normal forms
2)Pumping lemma for CFLs
3)Closure and decision properties
of CFLs

2
Three ways to simplify/clean a CFG
1.Eliminate useless symbols (clean)
2.Eliminate ε-productions (simplify)
3.Eliminate unit productions (simplify)
A => ε
A => B

3
Eliminating useless symbols
Grammar cleanup

4
Eliminating useless symbols
A symbol X is reachable if there exists a derivation:
S ⇒* αXβ
A symbol X is generating if there exists a derivation:
X ⇒* w, for some w ∈ T*
For a symbol X to be “useful”, it has to be both
reachable and generating:
S ⇒* αXβ ⇒* w’, for some w’ ∈ T*
(reachable) (generating)

5
Algorithm to detect useless
symbols
1.First, eliminate all symbols that are not
generating
2.Next, eliminate all symbols that are not
reachable
Is the order of these steps important,
or can we switch?

6
Example: Useless symbols
SAB | a
Ab
1.A, Sare generating
2.B is not generating (and therefore B is useless)
3.==> Eliminating B… (i.e., remove all productions that involve B)
1.Sa
2.A b
4.Now, A is not reachable and therefore is useless
5.Simplified G:
1.S a
What would happen if you reverse the order:
i.e., test reachability before generating?
Will fail to remove:
A b

7
Algorithm to find all generating symbols
Given: G=(V,T,P,S)
Basis:
Every symbol in T is obviously generating.
Induction:
Suppose for a production A → α, every symbol
in α is generating
Then, A is also generating

8
Algorithm to find all reachable symbols
Given: G=(V,T,P,S)
Basis:
S is obviously reachable (from itself)
Induction:
Suppose for a production A → α1α2…αk,
A is reachable
Then, all symbols on the right hand side,
{α1, α2, …, αk}, are also reachable
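Both fixed points can be sketched together on the earlier example (S → AB | a, A → b), with productions given as (head, body) pairs of single-character symbols; the function names are illustrative:

```python
def generating(prods, terminals):
    # Fixed point: A is generating if some body has only terminals
    # and generating variables (the basis + induction above).
    gen = set(terminals)
    changed = True
    while changed:
        changed = False
        for head, body in prods:
            if head not in gen and all(x in gen for x in body):
                gen.add(head)
                changed = True
    return gen

def reachable(prods, start):
    # Fixed point: every symbol in the body of a reachable head
    # is itself reachable.
    reach = {start}
    changed = True
    while changed:
        changed = False
        for head, body in prods:
            if head in reach:
                for x in body:
                    if x not in reach:
                        reach.add(x)
                        changed = True
    return reach

# Example: S -> AB | a,  A -> b
prods = [("S", "AB"), ("S", "a"), ("A", "b")]
gen = generating(prods, {"a", "b"})
prods = [(h, b) for h, b in prods if h in gen and all(x in gen for x in b)]
reach = reachable(prods, "S")
prods = [(h, b) for h, b in prods if h in reach and all(x in reach for x in b)]
print(prods)  # [('S', 'a')]: B, and then A, were useless
```

Running reachability first would keep A → b, which is why the filters must run in this order.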

9
Eliminating ε-productions
A => ε

10
Eliminating ε-productions
Caveat: It is not possible to eliminate ε-productions for
languages which include ε in their word set
Theorem: If G=(V,T,P,S) is a CFG for a language L,
then L\{ε} has a CFG without ε-productions
Definition: A is “nullable” if A ⇒* ε
If A is nullable, then any production of the form
B → CAD can be simulated by:
B → CD | CAD
This allows us to remove productions of the form A → ε
So we will target the grammar for the rest of the language
What’s the point of removing ε-productions?

11
Algorithm to detect all nullable
variables
Basis:
If Ais a production in G, then A is
nullable
(note: A can still have other productions)
Induction:
If there is a production BC
1C
2…C
k,
where every C
iis nullable, then B is also
nullable

12
Eliminating ε-productions
Given: G=(V,T,P,S)
Algorithm:
1. Detect all nullable variables in G
2. Then construct G1=(V,T,P1,S) as follows:
i. For each production of the form A → X1X2…Xk, where
k ≥ 1, suppose m out of the k Xi’s are nullable symbols
ii. Then G1 will have 2^m versions of this production,
i.e., all combinations where each nullable Xi is either present or absent
(excluding any version that leaves an empty body)
iii. Alternatively, if a production is of the form A → ε, then
remove it
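The nullable computation and the 2^m expansion can be sketched as follows, using the example grammar on the next slide (S → AB, A → aAA | ε, B → bBB | ε); ε is encoded as the empty string, and the names are illustrative:

```python
from itertools import product

def nullables(prods):
    # A is nullable if A -> ε, or A -> C1...Ck with every Ci nullable.
    nul = set()
    changed = True
    while changed:
        changed = False
        for head, body in prods:
            if head not in nul and all(x in nul for x in body):
                nul.add(head)          # body == "" covers A -> ε
                changed = True
    return nul

def eliminate_eps(prods):
    nul = nullables(prods)
    new = set()
    for head, body in prods:
        # each nullable symbol may be kept or dropped: 2^m versions
        options = [(x, "") if x in nul else (x,) for x in body]
        for choice in product(*options):
            b = "".join(choice)
            if b:                      # discard the resulting A -> ε rules
                new.add((head, b))
    return new

# Example: S -> AB, A -> aAA | ε, B -> bBB | ε
prods = [("S", "AB"), ("A", "aAA"), ("A", ""), ("B", "bBB"), ("B", "")]
res = eliminate_eps(prods)
print(sorted(res))  # S -> A|B|AB, A -> a|aA|aAA, B -> b|bB|bBB
```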

13
Example: Eliminating ε-productions
Let L be the language represented by the following CFG G:
i. S → AB
ii. A → aAA | ε
iii. B → bBB | ε
Goal: To construct G1, which is the grammar for L−{ε}
Nullable symbols: {A, B}
G1 can be constructed from G as follows:
B → b | bB | bB | bBB
==> B → b | bB | bBB
Similarly, A → a | aA | aAA
Similarly, S → A | B | AB
Note: L(G) = L(G1) U {ε}
G1 (simplified grammar):
• S → A | B | AB
• A → a | aA | aAA
• B → b | bB | bBB

14
Eliminating unit productions
A => B
(B has to be a variable)
What’s the point of removing unit productions?
E.g., before:
A => B | …
B => C | …
C => D | …
D => xxx | yyy | zzz
after:
A => xxx | yyy | zzz | …
B => xxx | yyy | zzz | …
C => xxx | yyy | zzz | …
D => xxx | yyy | zzz
This will save substitutions.

15
Eliminating unit productions
A unit production is one which is of the form A → B, where both A & B are
variables
E.g.,
1. E → T | E+T
2. T → F | T*F
3. F → I | (E)
4. I → a | b | Ia | Ib | I0 | I1
How to eliminate unit productions?
Replace E → T with E → F | T*F
Then, upon recursive application wherever there is a unit production:
E → F | T*F | E+T (substituting for T)
E → I | (E) | T*F | E+T (substituting for F)
E → a | b | Ia | Ib | I0 | I1 | (E) | T*F | E+T (substituting for I)
Now, E has no unit productions
Similarly, eliminate for the remainder of the unit productions

16
The Unit Pair Algorithm:
to remove unit productions
Suppose AB
1B
2… B
n
Action:Replace all intermediate productions to produce 
directly
i.e., A; B
1; … B
n;
Definition: (A,B) to be a “unit pair” if A
*
B
We can find all unit pairs inductively:
Basis:Every pair (A,A) is a unit pair (by definition). Similarly, if
AB is a production, then (A,B) is a unit pair.
Induction:If (A,B) and (B,C) are unit pairs, and AC is also a unit
pair.

17
The Unit Pair Algorithm:
to remove unit productions
Input: G=(V,T,P,S)
Goal: to build G1=(V,T,P1,S) devoid of unit
productions
Algorithm:
1. Find all unit pairs in G
2. For each unit pair (A,B) in G:
1. Add to P1 a new production A → α, for every
B → α which is a non-unit production
2. If a resulting production is already there in P,
then there is no need to add it.
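A sketch of the unit-pair algorithm on the expression grammar (with I → a | b only, to keep it short); variables are single characters, bodies are strings, and the names are illustrative:

```python
def unit_pairs(prods, variables):
    # Basis: (A,A) for every variable; (A,B) for every unit production
    # A -> B.  Induction: transitive closure over the pairs.
    pairs = {(a, a) for a in variables}
    pairs |= {(h, b) for h, b in prods if b in variables}
    changed = True
    while changed:
        changed = False
        for a, b in list(pairs):
            for b2, c in list(pairs):
                if b == b2 and (a, c) not in pairs:
                    pairs.add((a, c))
                    changed = True
    return pairs

def eliminate_units(prods, variables):
    # For each unit pair (A,B), copy every non-unit body of B to A.
    pairs = unit_pairs(prods, variables)
    return {(a, body) for a, b in pairs
            for h, body in prods
            if h == b and body not in variables}

prods = [("E", "T"), ("E", "E+T"), ("T", "F"), ("T", "T*F"),
         ("F", "I"), ("F", "(E)"), ("I", "a"), ("I", "b")]
res = eliminate_units(prods, {"E", "T", "F", "I"})
print(sorted(res))
```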

18
Example: eliminating unit
productions
G:
1. E → T | E+T
2. T → F | T*F
3. F → I | (E)
4. I → a | b | Ia | Ib | I0 | I1
Unit pairs, and the non-unit productions to be added to P1:
(E,E): E → E+T
(E,T): E → T*F
(E,F): E → (E)
(E,I): E → a | b | Ia | Ib | I0 | I1
(T,T): T → T*F
(T,F): T → (E)
(T,I): T → a | b | Ia | Ib | I0 | I1
(F,F): F → (E)
(F,I): F → a | b | Ia | Ib | I0 | I1
(I,I): I → a | b | Ia | Ib | I0 | I1
G1:
1. E → E+T | T*F | (E) | a | b | Ia | Ib | I0 | I1
2. T → T*F | (E) | a | b | Ia | Ib | I0 | I1
3. F → (E) | a | b | Ia | Ib | I0 | I1
4. I → a | b | Ia | Ib | I0 | I1

19
Putting all this together…
Theorem: If G is a CFG for a language that
contains at least one string other than ε, then there
is another CFG G1, such that L(G1) = L(G) − {ε}, and
G1 has:
no ε-productions
no unit productions
no useless symbols
Algorithm:
Step 1) eliminate ε-productions
Step 2) eliminate unit productions
Step 3) eliminate useless symbols
Again,
the order is
important!

20
Normal Forms

21
Why normal forms?
If all productions of the grammar could be
expressed in the same form(s), then:
a.It becomes easy to design algorithms that use
the grammar
b.It becomes easy to show proofs and properties

22
Chomsky Normal Form (CNF)
Let G be a CFG for some L−{ε}
Definition:
G is said to be in Chomsky Normal Form if all
its productions are in one of the following
two forms:
i. A → BC where A,B,C are variables, or
ii. A → a where a is a terminal
G has no useless symbols
G has no unit productions
G has no ε-productions

23
How to convert a G into CNF?
Assumption: G has no ε-productions, unit productions or useless
symbols
1) For every terminal a that appears in the body of a production
(of length ≥ 2):
i. create a unique variable, say Xa, with a production Xa → a, and
ii. replace all such instances of a in G by Xa
2) Now, all productions will be in one of the following
two forms:
A → B1B2…Bk (k ≥ 2) or A → a
3) Replace each production of the form A → B1B2B3…Bk (k ≥ 3) by:
A → B1C1, C1 → B2C2, …, Ck-3 → Bk-2Ck-2, Ck-2 → Bk-1Bk
and so on…
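Steps (1) and (3) can be sketched as a single pass (A → a productions pass through unchanged). The fresh names X_t and Ci below are this sketch's choices; the worked example on the next slide uses Y1, Y2 instead:

```python
def to_cnf(prods, terminals):
    # Steps (1) and (3) of the CNF conversion; assumes the ε-, unit-,
    # and useless-symbol eliminations were already done.  Bodies are
    # strings of single-character symbols; output bodies are lists.
    new, cache, fresh = [], {}, [0]

    def var_for(t):
        # step 1: fresh variable X_t with X_t -> t, created on first use
        if t not in cache:
            cache[t] = f"X_{t}"
            new.append((cache[t], [t]))
        return cache[t]

    for head, body in prods:
        if len(body) == 1:                   # A -> a is already CNF
            new.append((head, list(body)))
            continue
        syms = [var_for(s) if s in terminals else s for s in body]
        while len(syms) > 2:                 # step 3: binarize long bodies
            fresh[0] += 1
            c = f"C{fresh[0]}"
            new.append((head, [syms[0], c]))
            head, syms = c, syms[1:]
        new.append((head, syms))
    return new

# A fragment of the next slide's grammar (A's productions omitted)
prods = [("S", "AS"), ("S", "BABC"), ("B", "0B"), ("B", "0"),
         ("C", "1C"), ("C", "1")]
cnf = to_cnf(prods, {"0", "1"})
print(cnf)
```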

Example #1
24
G:
S => AS | BABC
A => A1 | 0A1 | 01
B => 0B | 0
C => 1C | 1
G in CNF:
X0 => 0
X1 => 1
S => AS | BY1
Y1 => AY2
Y2 => BC
A => AX1 | X0Y3 | X0X1
Y3 => AX1
B => X0B | 0
C => X1C | 1
All productions are of the form: A => BC or A => a

25
Example #2
G:
1. E → E+T | T*F | (E) | Ia | Ib | I0 | I1
2. T → T*F | (E) | Ia | Ib | I0 | I1
3. F → (E) | Ia | Ib | I0 | I1
4. I → a | b | Ia | Ib | I0 | I1
Step (1):
1. E → EX+T | TX*F | X(EX) | IXa | IXb | IX0 | IX1
2. T → TX*F | X(EX) | IXa | IXb | IX0 | IX1
3. F → X(EX) | IXa | IXb | IX0 | IX1
4. I → a | b | IXa | IXb | IX0 | IX1
5. X+ → +
6. X* → *
7. X( → (
8. X) → )
…
Step (2):
1. E → EC1 | TC2 | X(C3 | IXa | IXb | IX0 | IX1
2. C1 → X+T
3. C2 → X*F
4. C3 → EX)
5. T → …
6. …

26
Languages with ε
For languages that include ε:
Write down the rest of the grammar in CNF
Then add the production “S => ε” at the end
E.g., consider G:
S => AS | BABC
A => A1 | 0A1 | 01 | ε
B => 0B | 0 | ε
C => 1C | 1 | ε
G in CNF:
X0 => 0
X1 => 1
S => AS | BY1 | ε
Y1 => AY2
Y2 => BC
A => AX1 | X0Y3 | X0X1
Y3 => AX1
B => X0B | 0
C => X1C | 1

27
Other Normal Forms
Greibach Normal Form (GNF)
All productions are of the form
A ==> aα, where a is a terminal and α is a string of variables

28
Return of the Pumping Lemma !!
Think of languages that cannot be CFL
== think of languages for which a stack will not be enough
e.g., the language of strings of the form ww

29
Why pumping lemma?
A result that will be useful in proving
languages that are not CFLs
(just like we did for regular languages)
But before we prove the pumping
lemma for CFLs ….
Let us first prove an important property
about parse trees

30
The “parse tree theorem”
Given:
Suppose we have a
parse tree for a
string w, according
to a CNF grammar,
G=(V,T,P,S)
Let h be the height of
the parse tree
Implies:
|w| ≤ 2^(h-1)
[Parse tree for w: a longest path S = A0, A1, A2, …, Ah-1, ending
in a terminal a; h = tree height]
In other words, a CNF parse tree’s string yield (w)
can be no longer than 2^(h-1)
Observe that any parse tree generated by a CNF grammar will be a
binary tree, where all internal nodes have exactly two children
(except those nodes connected to the leaves).

31
Implication of the Parse Tree
Theorem (assuming CNF)
Fact:
If the height of a parse tree is h, then
==> |w| ≤ 2^(h-1)
Implication:
If |w| ≥ 2^m, then
its parse tree’s height is at least m+1

32
The Pumping Lemma for CFLs
Let L be a CFL.
Then there exists a constant N, s.t.,
if z ∈ L s.t. |z| ≥ N, then we can write
z=uvwxy, such that:
1. |vwx| ≤ N
2. vx ≠ ε
3. For all k ≥ 0: u v^k w x^k y ∈ L
Note: we are pumping in two places (v & x)

33
Proof: Pumping Lemma for CFL
If L=Φ or contains only ε, then the lemma is
trivially satisfied (as it cannot be violated)
For any other L which is a CFL:
Let G be a CNF grammar for L
Let m = number of variables in G
Choose N=2^m.
Pick any z ∈ L s.t. |z| ≥ N
the parse tree for z should have a height ≥ m+1
(by the parse tree theorem)

34
Parse tree for z, of height h ≥ m+1: along its longest path
S = A0, A1, A2, …, Ah-1, Ah = a,
the last m+1 variables must repeat (G has only m variables):
Ai = Aj, for some h−m ≤ i < j ≤ h
(meaning: repetition in the last m+1 variables)
Writing z = uvwxy around the subtrees rooted at Ai and Aj:
u and y lie outside Ai’s subtree; v and x lie between the yields
of Ai’s and Aj’s subtrees; w is the yield of Aj’s subtree
•Therefore, vx ≠ ε

35
Extending the parse tree…
z = u v^k w x^k y: replacing Aj with Ai (k times) pumps v and x, k times
z = uwy: or, replacing Ai with Aj pumps v and x down to zero
==> For all k ≥ 0: u v^k w x^k y ∈ L

36
Proof contd..
•Also, since Ai’s subtree is no taller than m+1,
the string generated under Ai’s subtree, which is
vwx, cannot be longer than 2^m
But, 2^m = N
==> |vwx| ≤ N
This completes the proof of the pumping lemma.

37
Application of Pumping
Lemma for CFLs
Example 1: L = {a^m b^m c^m | m > 0}
Claim: L is not a CFL
Proof:
Let N <== the pumping lemma constant
Pick z = a^N b^N c^N
Apply the pumping lemma to z and show that there
exists at least one other string constructed from z
(obtained by pumping up or down) that is ∉ L

38
Proof contd…
z = uvwxy
As z = a^N b^N c^N and |vwx| ≤ N and vx ≠ ε
==> v, x cannot contain all three symbols
(a,b,c)
==> we can pump up or pump down to build
another string which is ∉ L
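For a small N this argument can be checked mechanically: enumerate every decomposition z = uvwxy allowed by the lemma and verify that none survives pumping (failing already at some k in {0,1,2} falsifies clause 3). A brute-force sketch with illustrative names; the second call shows the contrast with a genuine CFL, {a^n b^n}, where a surviving decomposition exists:

```python
def pumps(z, N, member):
    # Try every decomposition z = uvwxy with |vwx| <= N and vx != ε;
    # return True if some decomposition keeps u v^k w x^k y in the
    # language for k = 0, 1, 2 (a necessary slice of "for all k >= 0").
    n = len(z)
    for i in range(n + 1):                        # u = z[:i]
        for j in range(i, min(i + N, n) + 1):     # vwx = z[i:j]
            for a in range(i, j + 1):             # v = z[i:a]
                for b in range(a, j + 1):         # w = z[a:b], x = z[b:j]
                    u, v, w, x, y = z[:i], z[i:a], z[a:b], z[b:j], z[j:]
                    if v + x and all(member(u + v * k + w + x * k + y)
                                     for k in (0, 1, 2)):
                        return True
    return False

def in_anbncn(s):                 # L = { a^m b^m c^m | m > 0 }
    m = len(s) // 3
    return m > 0 and s == "a" * m + "b" * m + "c" * m

def in_anbn(s):                   # { a^n b^n | n >= 0 }, a genuine CFL
    m = len(s) // 2
    return s == "a" * m + "b" * m

N = 4
print(pumps("a" * N + "b" * N + "c" * N, N, in_anbncn))  # False
print(pumps("aaabbb", 6, in_anbn))                       # True
```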

39
Example #2 for P/L application
L = { ww | w is in {0,1}* }
Show that L is not a CFL
Try string z = 0^N 0^N: what happens?
Try string z = 0^N 1^N 0^N 1^N: what happens?

40
Other examples
L = { 0^(k^2) | k is any integer }
L = { a^i b^j c^k | i < j < k }

41
CFL Closure Property Results
CFLs are closed under:
Union
Concatenation
Kleene closure operator
Substitution
Homomorphism, inverse homomorphism
reversal
CFLs are not closed under:
Intersection
Difference
Complementation
Note: Regular languages are closed
under these operators

42
Strategy for Closure Property
Proofs
First prove “closure under substitution”
Using the above result, prove other closure properties
CFLs are closed under:
Union
Concatenation
Kleene closure operator
Substitution
Homomorphism, inverse homomorphism
Reversal
Prove
this first

43
The Substitution operation
For each a ∈ ∑, let s(a) be a language
If w = a1a2…an ∈ L, then:
s(w) = { x1x2…xn } ⊆ s(L), s.t., xi ∈ s(ai)
Example:
Let ∑ = {0,1}
Let: s(0) = {a^n b^n | n ≥ 1}, s(1) = {aa,bb}
If w=01, s(w)=s(0).s(1)
E.g., s(w) contains a^1 b^1 aa, a^1 b^1 bb,
a^2 b^2 aa, a^2 b^2 bb,
…and so on.
Note: s(L) can use
a different alphabet

44
CFLs are closed under
Substitution
IF L is a CFL and a substitution s defined
on L is s.t. s(a) is a CFL for every
symbol a, THEN:
s(L) is also a CFL
What is s(L)?
[Figure: each string wi ∈ L maps to a set of strings s(wi);
s(L) is the union of all the s(wi)]
Note: each s(w)
is itself a set of strings

45
CFLs are closed under
Substitution
G=(V,T,P,S): CFG for L
Because every s(a) is a CFL, there is a CFG for each s(a):
Let Ga = (Va,Ta,Pa,Sa)
Construct G’=(V’,T’,P’,S) for s(L)
P’ consists of:
The productions of P, but with every occurrence of terminal “a” in
their bodies replaced by Sa.
All productions in any Pa, for any a ∈ ∑
Parse tree for G’: S derives Sa1 Sa2 … San, and each Sai
derives some string xi ∈ s(ai), yielding x1 x2 … xn

Substitution of a CFL:
example
Let L = the language of binary palindromes, with substitutions for 0
and 1 defined as follows:
s(0) = {a^n b^n | n ≥ 1}, s(1) = {xx, yy}
Prove that s(L) is also a CFL.
CFG for L:
S => 0S0 | 1S1 | ε
CFG for s(0):
S0 => aS0b | ab
CFG for s(1):
S1 => xx | yy
Therefore, CFG for s(L):
S => S0SS0 | S1SS1 | ε
S0 => aS0b | ab
S1 => xx | yy

47
CFLs are closed under union
Let L1 and L2 be CFLs
To show: L1 U L2 is also a CFL
Let us show this by using the result on substitution:
Make a new language:
Lnew = {a,b} s.t., s(a) = L1 and s(b) = L2
==> s(Lnew) == same as == L1 U L2
A more direct, alternative proof:
Let S1 and S2 be the starting variables of the
grammars for L1 and L2
Then, Snew => S1 | S2

48
CFLs are closed under
concatenation
Let L1 and L2 be CFLs
Let us show this by using the result on substitution:
Make Lnew = {ab} s.t.,
s(a) = L1 and s(b) = L2
==> L1L2 = s(Lnew)
A proof without using substitution?

49
CFLs are closed under
Kleene Closure
Let L be a CFL
Let Lnew = {a}* and s(a) = L
Then, L* = s(Lnew)

50
CFLs are closed under
Reversal
Let L be a CFL, with grammar
G=(V,T,P,S)
For L^R, construct G^R=(V,T,P^R,S) s.t.,
if A ==> α is in P, then:
A ==> α^R is in P^R
(that is, reverse every production)
We won’t use substitution to prove this result

51
CFLs are not closed under
Intersection
Some negative closure results
Existential proof:
L1 = {0^n 1^n 2^i | n ≥ 1, i ≥ 1}
L2 = {0^i 1^n 2^n | n ≥ 1, i ≥ 1}
Both L1 and L2 are CFLs
Grammars?
But L1 ∩ L2 cannot be a CFL
Why? (L1 ∩ L2 = {0^n 1^n 2^n | n ≥ 1}, which the
pumping lemma shows is not context-free)
We have an example where intersection is
not closed.
Therefore, CFLs are not closed under
intersection

52
CFLs are not closed under
complementation
Some negative closure results
Follows from the fact that CFLs are not
closed under intersection
By De Morgan’s law:
L1 ∩ L2 = complement( complement(L1) U complement(L2) )
Logic: if CFLs were closed under complementation, then
the whole right hand side would be a CFL (because
CFLs are closed under union), so
the left hand side (the intersection) would also be a CFL,
but we just showed CFLs are
NOT closed under intersection!
So CFLs cannot be closed under complementation.

53
CFLs are not closed under
difference
Some negative closure results
Follows from the fact that CFLs are not
closed under complementation
Because, if CFLs were closed under
difference, then:
complement(L) = ∑* − L
would force complement(L) to be a CFL too
Contradiction

54
Decision Properties
Emptiness test
Generating test
Reachability test
Membership test
PDA acceptance

55
“Undecidable” problems for
CFL
Is a given CFG G ambiguous?
Is a given CFL inherently ambiguous?
Is the intersection of two CFLs empty?
Are two CFLs the same?
Is a given L(G) equal to ∑*?