Apriori Algorithm.pptx

292 views 35 slides May 04, 2022
Slide 1
Slide 1 of 35
Slide 1
1
Slide 2
2
Slide 3
3
Slide 4
4
Slide 5
5
Slide 6
6
Slide 7
7
Slide 8
8
Slide 9
9
Slide 10
10
Slide 11
11
Slide 12
12
Slide 13
13
Slide 14
14
Slide 15
15
Slide 16
16
Slide 17
17
Slide 18
18
Slide 19
19
Slide 20
20
Slide 21
21
Slide 22
22
Slide 23
23
Slide 24
24
Slide 25
25
Slide 26
26
Slide 27
27
Slide 28
28
Slide 29
29
Slide 30
30
Slide 31
31
Slide 32
32
Slide 33
33
Slide 34
34
Slide 35
35

About This Presentation

Data mining apriori


Slide Content

• A p riori Al g orithm in D ata M i n i n g

What Is An I t emse t ? • A set o f i t ems to g e t h er i s c a l l ed an i t emse t . I f any i t emset h as k - i t ems i t i s c a l l ed a k - i t emse t . A n i t emset co nsis t s o f t wo o r more i t e m s. An i t e m set t h at o c c urs fre q uen t ly i s c a l led a fre q uent i t emse t . Th u s fre q u ent i t emset m i n i ng i s a d a t a m i n i ng t echn iq u e t o i d e n t i fy t h e i t e m s t h at of t en o c c ur t oget h e r . • Fo r Exam p l e , B read and b utt er, La p to p and A n t i v i rus s o f t ware, e tc .

W h at Is A Frequent I t em s et? • A set of items i s c a l led fre q uent i f i t sa t i sf i es a m i n i mum t h res h o l d v a l u e for su pp o rt and co nf i d ence. S u pp o rt shows t rans a ct i o ns w i t h i t ems p u r c h a s ed to g e t h er i n a sin g l e t rans a ct i o n. C o nf i d ence shows t ransa c t i ons w h ere t h e i t e m s are p ur c h ased one af t er t h e o t h er. Fo r fre q u ent i t emset m i n i ng metho d , we co nsider o nly t h o se t ransa c t i ons w hi c h m e et m i n i mum t h res h o l d s u pp ort and co nf i d ence re q u i rem e n t s. I nsi gh t s fr o m t h ese m i n i ng a l gor i t h ms offer a l ot of b e n ef i t s, c o s t - c u t t i ng and i m p rov e d c om p etiti v e a d v an t age. Th ere i s a t ra d eoff t i me t a k en t o m i ne d a t a a nd t h e v olu me o f d a t a f o r fre q u ent m i n i n g . Th e fre q u ent m i n i ng a l gor i t h m i s a n eff i c i ent a l gor i t h m t o m i ne t h e hi dd en p a t t erns o f i t emsets w i t hi n a s h o rt t i me and less m e mory c ons u m p t i on. • •

Frequent P attern Mining (FPM) • Th e fr eq uent p a t t ern m i n i ng a l g orithm i s one of t h e m o s t i m p or t ant t echn iq u es o f d a t a m i n i ng t o d i s c o v er rel at i o nsh ip s b e t w e en d i fferent i t ems i n a d a t a s e t . Th ese rel at i o nsh ip s are re p resented i n t h e form of a ss oc i a t i on ru l es. I t h el p s t o fi n d t h e i rr e g u l ar i t i es i n d a t a. FP M h as many ap p l i c a t i o ns i n t h e f i eld o f d a t a a na l y s i s, s o f t ware b ugs, c ros s -mar k eti n g , sa l e c am p ai g n ana l ysis, mar k et b as k et an al y s i s, e tc . F re q u ent i t emsets d i s c o v ered t h ro u gh A p r i o ri h ave many a pp lic at i ons i n d a t a m i n i ng t as k s. T as k s s uc h as f i ndi n g i nteres t i ng p a t t erns i n t h e d a t a b a s e, f i n d i ng ou t se q u ence and M i n i ng o f a s s o c i a t i o n ru l es i s t h e most i m p o r t ant o f t h em. A s s oc i a t i on ru l es a pp ly t o s up e r m ar k et t ransa c t i on d a t a, t h at i s, t o exam i ne t h e cu s t o mer b e h av i o r i n t erms o f t h e p u r c h a s ed p ro d uct s. A s soc i a t i o n ru l es d es c r i b e h o w o f t en the i t ems are p u r c h a s ed t oget h er. • • •

A s s o c i at i on R u l es • Asso ci a t i on Ru l e M i nin g i s d e f in e d as: “Let I = { … } be a set of ‘ n ’ bi n ary a tt ri b u t es c al l ed i t e m s. Let D= { … .} be set o f t r ans a c t i on c al l ed da t a b a s e. E a c h t r ans a c t i on i n D has a un i que t r ans a c t i on I D and c on t ai n s a s u bset of t he i t e m s i n I . A r ule i s d e f i n ed as an i m p li c a t i on of f o r m X -> Y w h er e X , Y? I and X ? Y=?. The set o f i t e m s X and Y a r e c al l ed an t e c e d e n t and c onseque n t of t he r u l e r e sp e c t i ve l y . ” Learn i ng o f A s soc i a t i o n ru l es i s u sed t o f i nd rel at i o nsh ip s b e t w e en a t t r i b ut es i n l arge d a t a b a s es. A n a s s o c i a t i o n ru l e, A = > B , w i l l b e o f t h e form” f o r a set o f t rans a ct i o ns, s o me v a l u e of i t emset A d e t erm i nes t h e v a l u es o f i t emset B u n d er t h e c o n d i t i o n i n w h i c h m i n i mum s u pp ort and c onf i d e n c e are m e t ”. •

• S u p p o rt a n d C o nf id en ce can be rep r esen ted by t h e follow i n g e xamp le : Brea d => butter [ s upp o r t=2 % , c o n f id en c e - 6 %] T h e ab ov e s t a tem e n t is an examp l e o f an as s o cia t i o n r u l e. T h is m e ans t h at t h ere is a 2% t r an s ac t i o n t h at b o u g h t b r ead and butter t o g e t h er and t h ere a r e 6 0% o f c us t o m e rs w h o b o u g h t bread as wel l as b u tte r . • S u p p o rt a n d C o nf id en ce fo r I te mse t A a n d B a r e rep r esen ted by f o rmu l a s : • 1. 2. Asso ciati o n ru l e mi n i n g c o ns i s ts o f 2 s tep s : Fi n d a l l t h e f r e q u e n t item s et s . G e n er a te as s o cia t i o n r u l es f r o m t h e ab ov e f r e q u e n t item s et s .

Why F r eq u ent I t emset Mini n g ? • Frequent i t e m s e t or p a tt ern m i n i ng i s bro a d l y u s e d b eca u se of i t s wi d e a p p l i ca t i ons i n m i n i ng a s s o c i a t i on r u l e s , c orrela t i ons a nd gr a ph p a tt er n s c onstra i nt t h a t i s b a s e d on frequent p a tt e rn s , se q u e n t i a l p a tt e rn s , a nd m a ny o t h e r d a t a m i n i ng t a sks.

A p ri o ri A l go ri t hm – F reque n t P at t ern Al g orithms A p r i o ri a lg o ri th m w as t h e f ir s t a lg o ri th m t h at w as p r o p ose d fo r f req u en t item s et mi n i n g. I t was l a t er imp r o v ed by R A ga r w al and R Srikant and came to be k now n as A p r i o ri . T h is a l g o r it h m u s es two s teps “ j o i n ” and “ p r u n e” to red u ce t h e se a r ch s p a c e . I t is an it e r a ti v e a p p r o ach to di s c o v e r t h e m os t f r e q u e n t item s et s . Apr io ri s a y s : T h e p r o babi l ity t h at item I is no t f r e q u e n t is i f : P( I ) < mi n im u m s upp o rt t h re s h ol d, t h e n I is no t f req u en t. P ( I + A ) < mi n imum s upp o r t t h r e s h ol d, t h en I +A is no t f r e q u e n t, w h ere A a l s o b elon g s to ite mse t . I f an item s et s et h as v a l ue l e s s t h an m i n imum s upp o r t t h en a l l o f its s uper s ets w i l l a l s o f a l l b e lo w min s upp o r t, and t h us can be ig no r e d . T h is p r o perty is ca lle d t h e An tim o no t on e p r o perty. • • •

The s t e p s f o l l o w e d i n t h e A p r i ori A l g o r i t h m of d a ta m i n i n g a r e : • J o i n St e p : Thi s s t ep g e n erates (K + 1 ) i t e m set from K - i t e m se t s b y jo i n i ng ea c h i t em w i t h i t se l f . • P r u n e St e p : Thi s s t ep s c ans t h e c o u nt o f ea c h i t em i n t h e d a t a b a s e. I f t h e can d i d a t e i t em d oes not m e et m i n i mum s u pp or t , t h en i t i s re g ar d ed as i nfre q u ent and t h u s it is remov e d . Thi s s t ep i s p erformed t o red u c e t h e size o f t h e c a n d i d a t e i t emset s .

S t eps In Apr i ori A p r i o ri a lg o ri th m is a se q u en ce o f s teps to be f ollowe d to f i n d t h e m os t f r e q u e n t item s et in t h e gi v en da t ab a s e. T h is data mi n i n g techniq u e follow s t h e j o in and t h e p r u n e s teps iter a tiv el y u n til t h e m os t f r e q u e n t item s et is ac h i e v e d. A m i n im u m s upp o rt t h re s h ol d is g i v e n in t h e p r o b l e m o r it is as s um e d by t h e u s e r . • #1 ) I n t h e f i r s t iter a ti o n o f t h e a l g o r it h m, each item is t a k e n as a 1 -item s ets can d ida t e. T h e a l g o r it h m w i l l c o u n t t h e o cc u rr e n c e s o f each ite m . • # 2) L et t h ere be so me mi n imum s upp o r t, mi n _ s up ( e g 2). T h e s et o f 1 – item s ets w h os e o cc u rr e n ce is s a t i sf ying t h e min s up a r e d e termi n e d . On l y t h os e can d ida t es w h ich c o u n t m o r e t h an o r e q ual to mi n _ s up, a r e t a k e n a h ead f o r t h e n ext iter a ti o n and t h e o t h ers a r e p r u n e d .

• # 3) N ext, 2 -item s et f r e q u e n t items w ith mi n _ s up a r e di s c o v ere d . F o r t h is in t h e j o in s tep, t h e 2 -ite mse t is gene r a ted by f o rmi n g a gr o up o f 2 by c o mbi n i n g items w ith it s e l f. • # 4 ) T h e 2 -ite mse t ca n didat e s a r e p r u n e d u s i n g mi n - s up t h re s h ol d v a l u e . N o w t h e t a b l e w i l l h ave 2 – item s ets w ith mi n - s up o n l y . • # 5) T h e n ext iter a ti o n w i l l fo r m 3 – item s ets u s i n g j o in and p r u n e s tep. T h is iter a ti o n w i l l follo w antim o n o t o n e p r o per t y w h ere t h e s ub s ets o f 3 -item s et s , t h at is t h e 2 – ite mse t s u b se ts o f e ach g r o up f a l l in mi n _ s up. I f a l l 2 -ite mse t s ub s ets a r e f r e q u e n t t h en t h e s uper s et w i l l be f r e q u e n t o t h er w i s e it is p r u n e d . • #6 ) N ext s tep w i l l follo w maki n g 4 -item s et by j o i n i n g 3-item s et w ith it s e l f a n d pru n i n g if its s u b se t d o e s no t m ee t t h e mi n _ s up criteri a . T h e a lg o ri th m is s t o p p ed w h en t h e m os t f r e q u e n t item s et is ac h ie v e d .

• Ex am p le o f Apr io r i : S u p p o rt t h r e s h o l d = 5 0%, C o n f i de n c e = 6 0%

T A B LE - 1 So l u t i on : S u p p o rt t h re s h ol d = 5 0% = > . 5 *6 = 3 = > mi n _ s up =3

1 . C ou n t Of E a ch It e m T AB L E -2

2. P ru n e S t e p : T A B L E - 2 s h ow s t h at I 5 item d o es n o t m e et mi n _ s u p = 3 , t h us it is d ele te d , onl y I1 , I 2, I 3 , I 4 m ee t mi n _ s up c o u n t. T AB L E -3

3 . J oin St e p : Fo rm 2 - i t emset. F rom T A BL E - 1 f i nd ou t t h e occu rrences o f 2 - i t emset. T AB L E -4

4. P r un e St e p : TA B LE - 4 s h ows t h at i t em set {I1, I 4} and {I3, I 4} do es not me e t m i n _ s u p , t h u s it i s d ele t e d . T ABL E - 5

• 5 . J o i n and Pru n e S t e p: F o r m 3- i t e ms e t. F r om the T AB LE- 1 f in d o u t o c c u rre n ce s of 3 - i t e ms e t. F r om T AB LE - 5 , f in d o u t the 2 - i t e ms e t s ub s e ts w h i c h s u pp o r t m i n _s u p. W e c an s e e for i t e ms e t {I1, I 2 , I 3 } s ub s e ts, {I1, I 2 }, {I1, I 3 }, {I 2 , I 3 } a r e occ u rr in g i n T AB LE - 5 th u s {I 1 , I 2 , I3} i s f req u e n t. W e c a n s e e for i t e ms e t {I 1 , I 2 , I4} s ub s e ts, {I 1 , I 2 }, {I 1 , I4 } , {I 2 , I4 } , {I 1 , I4} i s n ot f re q u e n t, as i t i s not o c c u rr in g i n T AB L E - 5 t h u s {I1, I 2 , I4} i s n ot f re q u e n t, h e n c e i t i s de l e t ed . T AB L E -6 O n l y { I 1, I 2 , I 3} is f r e qu e n t.

6 . G en e ra t e A ss o c i a t i o n R u le s : Fr o m t h e f req u en t ite mse t di s c o v e red ab ov e t h e as s o cia t i o n c o u l d b e : • • • • • • • • • • • • • { I1 , I 2} = > { I 3 } C o n f id en ce = s upp o r t {I1 , I 2, I 3 } / s upp o r t {I1 , I 2} = ( 3 / 4 )* 1 00 = 75 % {I1 , I 3 } => {I 2} C o nf id en ce = s upp o rt { I1 , I 2, I 3 } / s upp o rt { I1 , I 3 } = ( 3 / 3 )* 1 00 = 1 00% {I 2, I 3 } => {I1 } C o n f id en ce = s upp o r t {I1 , I 2, I 3 } / s upp o r t {I 2, I 3 } = ( 3 / 4 )* 1 00 = 75 % { I1 } = > { I 2, I 3 } C o n f id en ce = s upp o r t {I1 , I 2, I 3 } / s upp o r t {I1 } = ( 3 / 4 )* 1 00 = 75 % {I 2} => {I1 , I 3 } C o n f id en ce = s upp o r t {I1 , I 2, I 3 } / s upp o r t {I 2 = ( 3 / 5 )* 1 00 = 6 0% {I 3 } = > {I1 , I 2} C o nf id en ce = s upp o rt { I1 , I 2, I 3 } / s upp o rt { I 3 } = ( 3 / 4 )* 1 00 = 75 % T h is s h ow s t h at a l l t h e ab ov e as s o cia t i o n r u l es a r e s t r o n g if mi n imum c o n f id en ce t h r e s h ol d is 6 %. • T he Apr io ri A l g o r i thm: P s e ud o Cod e C : Ca n dida t e item s et o f s i z e k L : Freq u en t ite mse t o f s i z e k

Adv a n t a ges • E a sy t o u n d er s t a nd a l gori t hm • Join a nd Prune steps a re ea sy t o i m p l e m e nt on l a rge i t e m s e t s i n large da t a b a s e s Di s a dv a n t a ges • It r e q u i r e s h i gh c ompu t a t i on i f t he i t e m se t s a re very l a rge a nd t he m i n i m u m s u pport i s k e p t very l o w . • The e n t i re d a t a b a se n e e d s t o be s c a n n e d .

Me t hods T o Improve A p riori E f f i c ie n cy • H a s h - B a s ed T e c h n i qu e: T h is m e t h o d u s es a h as h -bas e d s t r ucture ca l l ed a h a s h tab l e fo r g ene r a ti n g t h e k -ite mse ts a n d its c o r r es p on di n g c o u n t. I t u s es a h ash f u n cti o n fo r g e n er a ting t h e t a b l e. T r an s a c t i o n R edu c t i on : T h is m e t h o d r e d uc e s t h e n umb e r o f t r a n s acti on s s ca n n i n g in it e r a ti ons . T h e t r a n s acti on s w h ich do no t c o n t a in f r e q u e n t items a r e ma r k e d o r r e mo v e d . P a rt i t i on i n g : T h is m e t h o d r e q uires o n l y two da t ab a s e s cans to mi n e t h e f req u en t ite mse t s . I t s ays t h at fo r a n y ite mse t to be p o te n ti a ll y f req u en t in t h e da t ab a s e, it s h o u l d be f r e q u e n t in at l ea s t o n e o f t h e p ar ti t i o n s o f t h e da t ab a s e. Sa m p l i ng: T h is m e t h o d pic k s a r a n d o m s amp l e S f r o m D a t ab a s e D a n d t h en s ea r c h es fo r f r e q u e n t item s et in S. I t may be p o ss ib l e to los e a g l o bal f r e q u e n t item s et. T h is can be r e d uc e d by low eri n g t h e mi n _ s up. D ynam i c I t e m s et Co u n t i ng: T h is t e chni q ue can add ne w ca n didate item s ets at any ma r k e d s t ar t p o i n t o f t h e da t ab a s e duri n g t h e s can n i n g o f t h e da t ab a s e. • • • •

Ap p li c atio n s Of Ap r i o ri Algorithm • In Ed u c a t i on Fi e l d: E x t ra ct i ng as s o c i a t i on ru l es i n d a t a m i n i ng of a d m i tt ed s t ud ents t h ro u gh c h ar act er i s t i c s and s p ecia lt i e s . • In t h e M e d i c a l f ie l d: Fo r exam p l e A na l y s i s o f t h e p a t i ent ' s d a t a b a se . • In F o r e s try: A na l y s i s o f p roba b i l i t y and i n t ens i t y o f f o rest f i re t h e forest f i re d a t a . w i t h • Ap r i ori i s u s ed b y many c om p an i es li k e Ama z on i n t h e R e com me n d e r S ys t e m and b y Go o gle for t h e a ut o - co m p l e t e fea t u r e .
Tags