ADA (Algorithm & Design Analysis) PROJECT CO-208 YASH BHARTI 2k16 / CO /A5 365
Knuth-Morris-Pratt Algorithm The KMP matching algorithm uses degenerating property (pattern having same sub-patterns appearing more than once in the pattern) of the pattern and improves the worst case complexity to O(n). The basic idea behind KMP’s algorithm is: whenever we detect a mismatch (after some matches), we already know some of the characters in the text of next window. We take advantage of this information to avoid matching the characters that we know will anyway match.
EXAMPLE: STRING : abxabc SUBSTRING: abcaby First we prepare the LPS(Longest Proper Substring) array for the Substring Intialize j= 0 , i =1 & LPS[0] =0. If ( SubStr [j]!= SubStr [ i ]) then move j to the value stored at LPS[j-1]. If j=0 & ( SubStr [j]!= SubStr [ i ]) increment i & LPS[ i ] = 0. Then If ( SubStr [j]== SubStr [ i ]) increment both I & j and LPS[ i ] = j+1.
1 2 3 4 5 6 7 a b c d a b c a j i SubStr [j] != SubStr [ i ] 1 2 3 4 5 6 7 a b c d a b c a j i SubStr [j] != SubStr [ i ] 1 2 3 4 5 6 7 a b c d a b c a j i SubStr [j] != SubStr [ i ] 1 2 3 4 5 6 7 a b c d a b c a j i
1 2 1 2 3 4 5 6 7 a b c d a b c a j i SubStr [j] = = SubStr [ i ] 1 2 3 1 2 3 4 5 6 7 a b c d a b c a 1 2 3 1 1 2 3 4 5 6 7 a b c d a b c a j i SubStr [j] != SubStr [ i ] 1 2 3 1 2 3 4 5 6 7 a b c d a b c a SubStr [j] = = SubStr [ i ] j i j i
Utilizing the LPS array made to check for patterns Considering the original example 1 2 1 2 3 4 5 a b c a b y a b x a b c a b c a b y
1 2 1 2 3 4 5 a b c a b y a b x a b c a b c a b y Intialize j= 0 & i =0 . If Str [ i ] == SubStr [j] then increment i & j. If Str [ i ] != SubStr [j] then j == LPS[j-1] .if j= 0 then increment i This is because from LPS we know the length of the longest prefix which is also a suffix and thus we can be sure that the index after that is the index from which we need to start comparing. j i 1 2 1 2 3 4 5 a b c a b y a b x a b c a b c a b y j i SubStr [j] = = Str [ i ]
1 2 1 2 3 4 5 a b c a b y a b x a b c a b c a b y j i 1 2 1 2 3 4 5 a b c a b y a b x a b c a b c a b y j i SubStr [j] != Str [ i ] 1 2 1 2 3 4 5 a b c a b y a b x a b c a b c a b y j i 1 2 1 2 3 4 5 a b c a b y a b x a b c a b c a b y j i SubStr [j] = = Str [ i ] SubStr [j] != Str [ i ]
1 2 1 2 3 4 5 a b c a b y a b x a b c a b c a b y 1 2 1 2 3 4 5 a b c a b y a b x a b c a b c a b y SubStr [j] = = Str [ i ] 1 2 1 2 3 4 5 a b c a b y a b x a b c a b c a b y j i 1 2 1 2 3 4 5 a b c a b y a b x a b c a b c a b y SubStr [j] = = Str [ i ] SubStr [j] = = Str [ i ] j i j i j i
1 2 1 2 3 4 5 a b c a b y a b x a b c a b c a b y 1 2 1 2 3 4 5 a b c a b y a b x a b c a b c a b y SubStr [j] = = Str [ i ] 1 2 1 2 3 4 5 a b c a b y a b x a b c a b c a b y 1 2 1 2 3 4 5 a b c a b y a b x a b c a b c a b y SubStr [j] ! = Str [ i ] SubStr [j] = = Str [ i ] j i j i j i j i
1 2 1 2 3 4 5 a b c a b y a b x a b c a b c a b y Pattern Matching finished j i
The worst case complexity of Naive algorithm is O(m(n-m+1)). Time complexity of KMP algorithm is O(n) in worst case. Further the space complexity of the KMP is also O(n). This is an improvement over the Naïve searching algorithm
ANALYSIS The worst case complexity of Naive algorithm is O(m(n-m+1)). Time complexity of KMP algorithm is O(n) in worst case. Further the space complexity of the KMP is also O(n). This is an improvement over the Naïve searching algorithm