Complier Design - Operations on Languages, RE, Finite Automata

1mohamedgamal54 57 views 16 slides Sep 20, 2024
Slide 1
Slide 1 of 16
Slide 1
1
Slide 2
2
Slide 3
3
Slide 4
4
Slide 5
5
Slide 6
6
Slide 7
7
Slide 8
8
Slide 9
9
Slide 10
10
Slide 11
11
Slide 12
12
Slide 13
13
Slide 14
14
Slide 15
15
Slide 16
16

About This Presentation

Exploring the basics of Operations on Languages (Sets), Regular Expressions (REs), and Finite Automata for Compiler Design subject.


Slide Content

Dr. Mohamed Gamal Faculty of Computers and Informatics
Contents
1 Operations on Languages (Sets) 2
1.1 L1 Concatenation L2 (L1·L2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
1.2 L1 Union L2 (L1∪L2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
1.3L
3
1
(L1 Concatenated with Itself 3 Times) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
1.4L

1
(Kleene Closure of L1 - Zero or more) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
1.5L
+
1
(Positive Closure of L1 - One or more) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2 Regular Expressions (RE) and Languages 4
2.1 Basic Terminology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.2 C-Language Identifiers and Numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.3 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3 Finite Automata 7
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3.2 Minimizing Number of States of a DFA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Page 1

Dr. Mohamed Gamal Faculty of Computers and Informatics
1 Operations on Languages (Sets)
Operation Definition & Notation
Union ofLandM L ∪M={s|sis inLorsinM}
Concatenation ofLandM LM ={st|sis inLandtis inM}
Kleene closure ofLandM L

=∪

i=0
L
i
Positive closure ofL L
+
=∪

i=1
L
i
Given:
L1={a, b, c, d}andL2={1,2}
1.1 L1 Concatenation L2 (L1·L2)
The concatenation ofL1andL2forms strings by concatenating each element ofL1with each element ofL2.
L1·L2={a1, a2, b1, b2, c1, c2, d1, d2}
1.2 L1 Union L2 (L1∪L2)
The union ofL1andL2combines all elements from both sets.
L1∪L2={a, b, c, d,1,2}
1.3L
3
1(L1 Concatenated with Itself 3 Times)
This is the set of all possible strings formed by concatenating three elements ofL1.
L
3
1=



















aaa, aab, aac, aad, aba, abb, abc, abd, aca, acb, acc, acd, ada, adb, adc, add,
baa, bab, bac, bad, bba, bbb, bbc, bbd, bca, bcb, bcc, bcd, bda, bdb, bdc, bdd,
caa, cab, cac, cad, cba, cbb, cbc, cbd, cca, ccb, ccc, ccd, cda, cdb, cdc, cdd,
daa, dab, dac, dad, dba, dbb, dbc, dbd, dca, dcb, dcc, dcd, dda, ddb, ddc, ddd



















Page 2

Dr. Mohamed Gamal Faculty of Computers and Informatics
1.4L

1(Kleene Closure of L1 - Zero or more)
The Kleene closure ofL1includes all possible stringsincluding the empty string.
L

1=



































































ϵ,
a, b, c, d,
aa, ab, ac, ad,
ba, bb, bc, bd,
ca, cb, cc, cd,
da, db, dc, dd,
aaa, aab, aac, aad, aba, abb, abc, abd, aca, acb, acc, acd, ada, adb, adc, add,
bba, bbb, bbc, bbd, bca, bcb, bcc, bcd, bda, bdb, bdc, bdd,
caa, cab, cac, cad, cba, cbb, cbc, cbd, cca, ccb, ccc, ccd, cda, cdb, cdc, cdd,
daa, dab, dac, dad, dba, dbb, dbc, dbd, dca, dcb, dcc, dcd, dda, ddb, ddc, ddd,
. . .



































































1.5L
+
1(Positive Closure of L1 - One or more)
The positive closure ofL1includes all possible non-empty strings formed by concatenating elements ofL1one
or more times.
L
+
1
=







































a, b, c, d,
aa, ab, ac, ad,
ba, bb, bc, bd,
ca, cb, cc, cd,
da, db, dc, dd,
aaa, aab, aac, aad, aba, abb, abc, abd, aca, acb, acc, acd, ada, adb, adc, add,
. . .







































Page 3

Dr. Mohamed Gamal Faculty of Computers and Informatics
2 Regular Expressions (RE) and Languages
2.1 Basic Terminology
Regular
Expression
Language Denoted Explanation
a {a} The language contains only the string ‘a’.
ab {ab} The language contains only the string “ab”.
(a|b) {a, b} The language contains the strings ‘a’ and ‘b’.
a(b|c) {ab, ac} The language contains the strings “ab” and
“ac”.
a(bc)

{a, abc, abcbc, abcbcbc, . . .} The language contains strings that start with ‘a’
followed by zero or more repetitions of Φc”.
(ab|cd)

{ϵ, ab, cd, abcd, ababcd, cdab, . . .} The language contains strings formed by con-
catenating zero or more repetitions of “ab” or
“cd”.
a(bc|de)

{a, abc, ade, abcde, abcbc, . . .} The language contains strings that start with ‘a’
followed by zero or more repetitions of Φc” or
“de”.
(a|b)

c {c, ac, bc, aac, abc, bbc, . . .} The language contains strings that end with ‘c’
and are preceded by zero or more repetitions of
‘a’ or ‘b’.
a

{ϵ, a, aa, aaa, . . .} The language contains zero or more repetitions
of ‘a’.
a
+
b {ab, aab, aaab, . . .} The language contains strings with one or more
‘a’ followed by ‘b’.
(a|b)

abb {abb, aabb, babb, aaabb, . . .} The language contains strings that end with
“abb” and can have zero or more repetitions of
‘a’ or ‘b’ before it.
(a|b)

{ϵ, a, b, aa, ab, ba, bb, aaa, aab, aba, abb, . . .}The language contains all possible strings (in-
cluding the empty string) made up of ‘a’ and
‘b’.
Page 4

Dr. Mohamed Gamal Faculty of Computers and Informatics
2.2 C-Language Identifiers and Numbers
C-Language Identifiers
DefinitionRegular Expression Example
letter [A-Za-z] a,B,
digit [0-9] 0,1,9
CId letter(letter—digit)

var,myVar123,temp
Unsigned Integer or Floating Point Numbers
DefinitionRegular Expression Example
digit [0-9] 0,1,9
digits digit
+
123,4567
number digits(.digits)? (E [+−]? digits)?42,3.14,2.71E-3
Regular Expressions Validators:
ˆhttps://regexr.com/
ˆhttps://regex101.com/
Page 5

Dr. Mohamed Gamal Faculty of Computers and Informatics
2.3 Examples
Write a regular expression for a language
1.
Solution:R.E =a

b

. . . z

.
2.
P
={a, b}
Solution:R.E =b

a b

a b

.
3.
P
={a, b, c}
Solution:R.E = (b|c)

a(b|c)

.
4.
P
={a, b, c}
Solution:R.E = (b|c)

a? (b|c)

a? (b|c)

a? (b|c)

.
5.
P
={a, b}
Solution:R.E =b

a

.
6.
P
={0, 1}
Solution:R.E = (0|1)

010 (0|1)

.
7.
P
={a, b, c}
Solution:R.E = (b|c)

(aaa)

(b|c)

.
8.
P
={a, b}
Solution:R.E = (a|b)

(a|bb)
+
.
9.
P
={a, b}
Solution:R.E = (aa|bb|ab|ba)

.
10.
P
={0, 1}
Solution:R.E = 1 (0|1)

1.
Page 6

Dr. Mohamed Gamal Faculty of Computers and Informatics
3 Finite Automata
3.1 Introduction
Arecognizerfor a language is a program that takes a string x, and answers “yes” if x is a sentence of that
language, and Ωo” otherwise.
We call the recognizer of the tokens as afinite automatonwhich can be
1.Non-deterministic (NFA):slower, but it may take less space.
1)S: a set of states.
2)
P
: a set of input symbols (alphabet).
3)
4)s0: a start (initial) state.
5)F: a set of accepting states (final states).
Note thatϵ-transitions are allowedin NFA.
Example:
2.Deterministic (DFA):faster recognizer, but it may take more space - widely used.
ˆDFA is a special form of a NFA.
ˆNoϵ-transitions.
ˆfor each symbol a and state s, there isat mostone labeled edge leaving s.
Example:
Two Algorithms:
Algorithm 1:Regular Expression→NFA→DFA
Algorithm 2:Regular Expression→DFA
Page 7

Dr. Mohamed Gamal Faculty of Computers and Informatics
Algorithm 1:Regular Expression→NFA→DFA
1.Thomson’s Construction:Regular Expression→NFA
Operation Decsription
ϵ-closure(s) Set of NFA states reachable from NFA statesonϵ-transitions
alone.
ϵ-closure(T) Set of NFA states reachable from some NFA statesin setTon
ϵ-transitions alone; =∪sinTϵ−closure(s).
move(T, a) Set of NFA states to which there is a transition on input symbol
’a’ from some statesinT.
Page 8

Dr. Mohamed Gamal Faculty of Computers and Informatics
Example to recognize the regular expression (a|b)

a
2. →DFA
Page 9

Dr. Mohamed Gamal Faculty of Computers and Informatics
NFA State DFA Stateab Type
{0,1,2,4,7} S0 S1S2 Start
{1,2,3,4,6,7,8} S1 S1S2 Final
{1,2,4,5,6,7} S2 S1S2 -
Transition table for the DFA
Another Example:
NFA for (a|b)

abb
NFA State DFA Statea b Type
{0,1 ,2 , 4, 7} A B C Start
{1, 2, 3, 4, 6, 7, 8} B B D -
{1, 2, 4, 5, 6, 7} C B C -
{1, 2, 4, 5, 6, 7, 9} D B E -
{1, 2, 3, 5, 6, 7, 10} E B C Final
Transition table for the DFA
Page 10

Dr. Mohamed Gamal Faculty of Computers and Informatics
Resulting DFA
Page 11

Dr. Mohamed Gamal Faculty of Computers and Informatics
Algorithm 2:Regular Expression→DFA
1.
e.g., r→(r)#
2. syntax treefor this augmented regular expression.
ˆall alphabet symbols (plus # and the empty string) will be on the leaves.
ˆall inner nodes will be the operators.
3.
4.
ˆfirstpos(node):set of positions in the string that can start at the node’s subtree.
ˆlastpos(node):set of positions in the string that can end at the node’s subtree.
ˆfollowpos(i):set of positions that can follow position i in the regular expression.
Page 12

Dr. Mohamed Gamal Faculty of Computers and Informatics
Algorithm 1Computingfollowpos
1:foreach nodenin the treedo
2:ifnis a concatenation node with left childc1 and right childc2then
3: foreachiinlastpos(c1)do
4: followpos(i)←followpos(i)∪firstpos(c2)
5: end for
6:else ifnis a star nodethen
7: foreachiinlastpos(n)do
8: followpos(i)←followpos(i)∪firstpos(n)
9: end for
10:end if
11:end for
Example:
We first construct thesyntax treeusing the table above and find thefirstposandlastposfor each node, as
follows
Then we calculatefollowposusing algorithm 1 above, the result is as follows
ˆfollowpos(1) ={1, 2, 3}
ˆfollowpos(2) ={1, 2, 3}
ˆfollowpos(3) ={4}
ˆfollowpos(4) ={}
Page 13

Dr. Mohamed Gamal Faculty of Computers and Informatics
Now we are ready to construct the corresponding DFA using the followpos.
(a
1
|b
2
)

a
3
#
4
followpos(1) ={1, 2, 3}
followpos(2) ={1, 2, 3}
followpos(3) ={4}
followpos(4) ={}
S1=firstpos(root) ={1,2,3}
⇓markS1
a:followpos(1)∪followpos(3) ={1,2,3,4}=S2 move(S1, a) =S2
b:followpos(2) ={1,2,3}=S1 move(S1, b) =S1
⇓markS2
a:followpos(1)∪followpos(3) ={1,2,3,4}=S2 move(S1, a) =S2
b:followpos(2) ={1,2,3}=S1 move(S1, b) =S1
Start State:S1
Final (accepting) State:S2
Final resulting DFA
Page 14

Dr. Mohamed Gamal Faculty of Computers and Informatics
3.2 Minimizing Number of States of a DFA
The process of minimizing a DFA is
1.Initial Partition:divide all the states of your DFA into two groups
ˆG1: Accepting States.
ˆG2: Non-Accepting States.
2.Refining Groups:refine these groups to create subgroups based on transitions
ˆFor each group G (either G1 or G2):
–Break G into smaller subgroups.
–Two states, s1 and s2, will be in the same subgroup if and only if, for every input symbol, they
transition to states within the same subgroup.
3.Start State:the start state of the minimized DFA is the group thatcontains the original start stateof
the DFA. This helps in keeping the starting behavior consistent with the original DFA.
4.Accepting States: the accepting states of the minimized DFA are those groups thatcontain at least
one of the original accepting states. This ensures that the minimized DFA correctly represents all the
accepting conditions of the original DFA.
Example:
Example DFA
1.
G1={4}
G2={1,2,3}
2.
Stateab
1 23
2 23
3 43
4 23
DFA transition table
Page 15

Dr. Mohamed Gamal Faculty of Computers and Informatics
From the transition table we notice that the states 1 and 2 inG2lead to the same input symbols (colored
in), thus, we can group them alone together.
G1={4}G2={1,2,3}{1,2}{3}

{1,2} {3} {4}
3.Start State:{1,2}
4.Final State:{4}
Minimized DFA
Online Resources:
ˆRegular Expression to DFA, NFA, Minimize
Page 16