RABIN KARP algorithm with hash function and hash collision, analysis, algorithm and code for implementation. Besides it contains applications of RABIN KARP algorithm also
Size: 176.61 KB
Language: en
Added: Jul 09, 2017
Slides: 12 pages
Slide Content
PRESENTED BY ABHISHEK KUMAR SINGH RABIN KARP String Matching Algorithm
Let text string be T of length N Pattern string be P of length M Example T=“hello world”; N=11; P=“llo”; M=3 Problem Statement H E L L O W O R L D L L O
The Rabin-Karp string matching algorithm calculates a hash value for the pattern, as well as for each M-character subsequence of given text to be compared. If the hash values for a particular subsequence are unequal, the algorithm will calculate the hash value for next M-character sequence. If the hash values are equal, the algorithm will do a Brute Force comparison between the pattern and the M-character sequence. Therefore there is only one comparison per text subsequence, and Brute Force is only needed when hash values match. Note: Only one comparison for sub sequence Rabin-Karp Algorithm
Consider an M-character sequence as an M-digit number in the alphabet. The text subsequence t[I ... i+M-1] is mapped to the number x(i) = t[i]*3^0+ t[i+1]*b^1+...+ t[i+M-1]*3^(M-1) Furthermore, given x(i) we can compute x(i+1) for the next subsequence t[i+1 .. i+M] in constant time,as follows: x(i+1) = (x(i)-(t[i]))/3+(3^(M-1))*t[i+M-1] Rolling Hash Function
A B D A B C Example tHash = Hash(“ABD”) = 1*3^0+2*3^1+4*3^2= 43 pHash = Hash(“ABC”) = 1*3^0+2*3^1+3*3^2 = 34 tHash == pHash FALSE tHash = Hash(“BDA”) = 2*3^0+4*3^1+1*3^2 = 23 tHash == pHash FALSE tHash = Hash(“DAB”) = 4*3^0+1*3^1+2*3^2 = 25 tHash == pHash FALSE A B D A B C A B D A B C A B D A B C tHash = Hash(“ABC”) = 30+30+40 = 34 tHash == pHash TRUE
TEXT : PATTERN : HERE PATTERN MATCHES THE SUBSTRING SO INDEX NO. 3 IS RETURNED Incase of HIT A B D A B C A B C
int main() { string s1,s2; cin>>s1>>s2; int l1=s1.size(),l2=s2.size(),i,j,k; int hs1=0,p=0,ar[l1]; for(i=0;i<l2;i++) hs1+=(int)s2[i]*pow(3,i); int hs2=0; for(i=0;i<l1;i++) { if(i==0) { for(k=i;k<l2;k++) hs2+=(int)s1[k]*pow(3,k); } else { hs2-=(int)s1[i-1]; hs2/=3; hs2+=(int)s1[i+l2-1]*pow(3,l2-1); } int c=0; if(hs2==hs1) { for(j=i;j<i+l2;j++) if(s1[j]==s2[j-i]) c++; else break; } if(c==l2) ar[p++]=i; } cout<<l1<<endl; cout<<"Stored index are“\n; for(i=0;i<p;i++) cout<<ar[i]<<" "; return 0; }
Hash to two string match then it is called Hit There is possibility Hash of “abc” is 34 Hash of “dga” is 34 This is called Hash Collision Minimize Collision by Taking mod with prime number HASH COLLISION
Hash of Pattern O(m) Hash Comparison O(n-m+1) = O(n) ; m < n Average Running Time O(m+n) Worst Case Running Time m comparison in each iteration O(mn) ANALYSIS
Keyword matching in large files Good for Plagiarism detection Searching engines Database searching Bioinformatics APPLICATION