2_6 Optimization of DFA Based Pattern Matchers.ppt

429 views 17 slides Aug 03, 2024
Slide 1
Slide 1 of 17
Slide 1
1
Slide 2
2
Slide 3
3
Slide 4
4
Slide 5
5
Slide 6
6
Slide 7
7
Slide 8
8
Slide 9
9
Slide 10
10
Slide 11
11
Slide 12
12
Slide 13
13
Slide 14
14
Slide 15
15
Slide 16
16
Slide 17
17

About This Presentation

cd course


Slide Content

OPTIMIZATION OF DFA
BASED PATTERN
MATCHERS

Important States of an NFA

An NFA state is important if it has non-
out transitions

During Subset construction - -closure
(move (T, a)) takes into account only the
important states

Direct construction relates important
states of NFA with symbols in the RE

Augmented RE

Final state is not important

Concatenate an unique right end marker
#

Add a transition on # out of the
accepting state

Converting Regular Expression to DFA
A regular expression can be converted into a
DFA (without creating a NFA first).
First the given regular expression is
augmented by concatenating it with a
special symbol #.
r  (r)#
augmented regular expression
Then a syntax tree is created for this
augmented regular expression.


In this syntax tree, all alphabet symbols, # ,
and the empty string in the augmented
regular expression will be on the leaves,
and

All inner nodes will be the operators

Then each alphabet symbol and # will be
numbered (position numbers).
Converting Regular Expression to DFA

Types of Interior nodes

Cat-node

Star-node

Or-node

Regular Expression  DFA (cont.)
(a|b)
*
a  (a|b)
*
a # augmented regular expression

*

|
b
a
#
a
1
4
3
2
Syntax tree of (a|b)
*
a #
• each symbol is numbered
• each symbol is at a leaf
• inner nodes are operators

followpos
Followpos is defined for the positions
(positions assigned to leaves).
followpos(i) : is the set of positions which can follow
the position i in the strings generated by
the augmented regular expression.
For example, ( a | b)
*
a #
1 2 3 4
followpos(1) = {1,2,3}
followpos(2) = {1,2,3}
followpos(3) = {4}
followpos(4) = {}
followpos is just defined for leaves,
it is not defined for inner nodes.

firstpos, lastpos, nullable
To evaluate followpos, three more functions are to be
defined for the nodes (not just for leaves) of the
syntax tree.

firstpos(n) -- the set of the positions of the first
symbols of strings generated by the sub-expression
rooted at n.

lastpos(n) -- the set of the positions of the last
symbols of strings generated by the sub-expression
rooted at n.

nullable(n) -- true if the empty string is a member of
strings generated by the sub-expression rooted by n
false otherwise

Evaluation of firstpos, lastpos, nullable
n nullable(n)firstpos(n) lastpos(n)
leaf labeled

true  
leaf labeled
with
position i
false {i} {i}
c
1
c
2
nullable(c
1
) or
nullable(c
2
)
firstpos(c
1
) 
firstpos(c
2
)
lastpos(c
1
) 
lastpos(c
2
)


c
1
c
2
nullable(c
1
) and
nullable(c
2)
if (nullable(c
1))
firstpos(c
1
) 
firstpos(c
2)
else firstpos(c
1
)
if (nullable(c
2))
lastpos(c
1
) 
lastpos(c
2)
else lastpos(c
2
)
*
c
1
true firstpos(c
1
) lastpos(c
1
)

Evaluation of followpos
Two-rules define the function followpos:
1.If n is concatenation-node with left child c
1 and right child c
2,and i is
a position in lastpos(c
1
), then all positions in firstpos(c
2
) are in
followpos(i).
2.If n is a star-node, and i is a position in lastpos(n), then all positions
in firstpos(n) are in followpos(i).
If firstpos and lastpos have been computed for each node, followpos of
each position can be computed by making one depth-first traversal of the
syntax tree.

n
C
1 C
2
Cat-node
ilastpos(c
1) firstpos(c
2
)
followpos(i) = { firstpos(c
2) }
Followpos

Followpos
n
C
1
Star-node
ilastpos(n)firstpos(n)
followpos(i) = { firstpos(n) }

Example -- ( a | b)
*
a #

*

|
b
a
#
a
1
4
3
2
{1}{1}
{1,2,3}
{3}
{1,2,3}
{1,2}
{1,2}
{2}
{4}
{4}
{4}{3}
{3}
{1,2}
{1,2}
{2}
red – firstpos
blue – lastpos
followpos(1) = {1,2,3}
followpos(2) = {1,2,3}
followpos(3) = {4}
followpos(4) = {}
• The DFA can now be constructed for the Regular Expression

Algorithm (RE  DFA)
Create the syntax tree of (r) #
Calculate the functions: followpos, firstpos, lastpos, nullable
Put firstpos(root) into the states of DFA as an unmarked state.
while (there is an unmarked state S in the states of DFA) do

mark S

for each input symbol a do

let s
1,...,s
n are positions in S and symbols in those positions are a

S


 followpos(s
1
)  ...  followpos(s
n
)

move(S,a)  S’

if (S’ is not empty and not in the states of DFA)

put S’ into the states of DFA as an unmarked state.
the start state of DFA is firstpos(root)
the accepting states of DFA are all states containing the position of #

Example -- ( a | b)
*
a #
followpos(1)={1,2,3} followpos(2)={1,2,3} followpos(3)={4} followpos(4)={}
S
1=firstpos(root)={1,2,3}
 mark S
1
a: followpos(1)  followpos(3)={1,2,3,4}=S
2
move(S
1
,a)=S
2
b: followpos(2)={1,2,3}=S
1 move(S
1,b)=S
1
 mark S
2
a: followpos(1)  followpos(3)={1,2,3,4}=S
2
move(S
2
,a)=S
2
b: followpos(2)={1,2,3}=S
1
move(S
2
,b)=S
1
start state: S
1
accepting states: {S
2}
1 2 3 4
S
1
S
2
a
b
b
a

Example -- ( a | ) b c
*
#
1 2 3 4
followpos(1)={2} followpos(2)={3,4} followpos(3)={3,4} followpos(4)={}
S
1
=firstpos(root)={1,2}
 mark S
1
a: followpos(1)={2}=S
2
move(S
1
,a)=S
2
b: followpos(2)={3,4}=S
3 move(S
1,b)=S
3
 mark S
2
b: followpos(2)={3,4}=S
3 move(S
2,b)=S
3
 mark S
3
c: followpos(3)={3,4}=S
3
move(S
3
,c)=S
3
start state: S
1
accepting states: {S
3
}
S
3
S
2
S
1
c
a
b
b
Tags