Rabin Karp Algorithm

473 views 13 slides Apr 22, 2020
Slide 1
Slide 1 of 13
Slide 1
1
Slide 2
2
Slide 3
3
Slide 4
4
Slide 5
5
Slide 6
6
Slide 7
7
Slide 8
8
Slide 9
9
Slide 10
10
Slide 11
11
Slide 12
12
Slide 13
13

About This Presentation

Algorithm to find whether a given Pattern is present in a Text which is represented in Decimal Form.


Slide Content

Rabin-Karp Algorithm
Dr.Kiran K
Assistant Professor
Department of CSE
UVCE
Bengaluru, India.

Introduction
•Itmakesuseofelementarynumber-theoreticnotionssuchastheEquivalenceofTwo
NumbersModuloaThirdNumber.
•∑={0,1,...,9}-Eachcharacterisadecimaldigitinradix-dnotation.
•d=|∑|.
•kconsecutivecharactersrepresentlength-kdecimalnumber.
•P[1..m]: Pattern
•p : DecimalValuecorrespondingtopatternP[1..m].
•T[1..n]: Text
•t
s : Decimalvalueofthelength-msubstringT[s+1..s+m],for
s=0,1,...,n–m.

Introduction…
•t
s=pifandonlyifT[s+1..s+m]=P[1..m]
→sisavalidshiftifft
s=p
Computingp:
•pcanbecomputedinΘ(m)timeusingHorner’sRule:
p=P[m]+10(P[m–1]+10(P[m–2]+...+10P[2]+P[1]…))
Computing(t
0,t
1,,,t
n–m):
•t
0canbecomputedfromT[1..m]inΘ(m)timeusingHorner’srule.
•However,t
s+1canbecomputedfromt
sinconstanttime:
t
s+1=10(t
s–10
m–1
T[s+1])+T[s+m+1]
→t
1,t
2,,,t
n-mcanbecomputedinΘ(n–m)time.

Introduction…
Eg.:T[1..6]=314152andpatternlengthm=5
Computingt
0usingHorner’sRule:
t
0 = T[5]+10(T[4]+10(T[3]+10(T[2]+10(T[1])))))
= 5+10(1+10(4+10(1+10(3))))
= 5+10(1+10(4+10(1+30)))
= 5+10(1+10(4+10(31)))
= 5+10(1+10(4+310))
= 5+10(1+10(314))
= 5+10(1+3140)
= 5+10(3141)
= 5+31410
= 31415

Introduction…
Computingt
1using:t
s+1=10(t
s–10
m–1
T[s+1])+T[s+m+1]
s=0,m=5,T[1..6]=314152,t
0=31415
t
1 = 10(t
0–10
5–1
(T[1]))+T[0+5+1]
= 10(t
0–10
4
(T[1]))+T[6]
= 10(31415–10000(3))+2
= 10(31415–30000)+2
= 10(1415)+2
= 14150+2
= 14152

Introduction…
•If10
m–1
isprecomputed,thencomputingt
1,t
2,,,t
n-mwilltakeaconstantnumber
ofarithmeticoperations.
•RunningTime:
pcanbecomputedinΘ(m)time.
t
0,t
1,,,t
n-mcanbecomputedinΘ(n–m+1)time.
Hence,alloccurrencesofpatternP[1..m]inthetextT[1..n]canbefound
withΘ(m)preprocessingtimeandΘ(n–m+1)matchingtime.

Introduction…
Note:
Ifpandt
saretoolargethencomputingpandt
smaynothappeninaconstanttime.
Solution:
•Computepandthet
svaluesmoduloasuitablemodulusq.
•Ifqischosentobeaprimesuchthat10qfitswithinonecomputerword,thenall
necessarycomputationscanbeperformedwithsingle-precisionarithmetic.
•Ingeneralwithad-aryalphabet{0,1,...,d–1}qhastobechosensothatdqfits
withinacomputerword.
t
s+1canbecomputedasfollows:
t
s+1=(d(t
s–T[s+1]h)+T[s+m+1])modq
h=d
m–1
(modq)-valueofmostsignificantdigitinthem-digittextwindow.

Example
Text: 2359023141526739921
n : 19
Pattern: 31415
M : 5
q : 13
Computingpandt
0usingHorner’sRule:
P[m]+10(P[m-1]+10(P[m-2]+...+10P[2]+P[1]…))
p=5+10(1+10(4+10(1+10(3))))=31415
t
0=0+10(9+10(5+10(3+10(2))))(mod13)=23590mod13=8

Example…
Computingt
s+1:
t
s+1=(d(t
s–T[s+1]h)+T[s+m+1])modq,h=d
m–1
(modq)
h=10
4
(mod13)=3
s=0,t
0=8,T[1..6]=235902
t
1=(10(8–2(3))+2)(mod13)=(10(2)+2)(mod13)=22mod(13)=9
s=1,t
1=9,T[2..7]=359023
t
2=(10(9–3(3))+3)(mod13)=(10(0)+3)(mod13)=3mod(13)=3
s=2,t
2=3,T[3..8]=590231
t
3=(10(3–5(3))+1)(mod13)=(10(-12)+1)(mod13)=-119mod(13)=11

Example…

Example…
•t
s≡p(modq) - Hit
•t
s≡p(modq)butt
s≠p - SpuriousHit
•t
s≠p(modq)→t
s≠p - InvalidShift
•t
s≡p(modq)andP[1..m]=T[s+1..s+m] - ValidShift,s.

Algorithm
RABIN –KARP MATCHER (T, P, d, q)
n= T.length
m= P.length
h= d
m-1
(mod q)
p= 0
t
0= 0
For (i=1 to m)
p= (dp+ P [i]) (mod q)
t
0= (dt
0+ T[i]) (mod q)
For (s= 0 to n–m)
If (p == t
s)
If (P[1 . . m] == T[s+ 1 . . s+ m])
Print “Pattern occurs with shift ” s
If (s< n –m)
t
s+ 1= (d(t
s–T[s+ 1] h) + T[s+ m+ 1]) (mod q)
Running Time:
PreprocessingTime: Θ(m)
Matching Time: Θ((n –m+ 1) m)

References:
•ThomasHCormen.CharlesELeiserson,RonaldLRivest,CliffordStein,
IntroductiontoAlgorithms,ThirdEdition,TheMITPressCambridge,
MassachusettsLondon,England.