Frequent itemset mining methods


About This Presentation

Frequent itemset mining methods: algorithms and solved examples.


Slide Content

Frequent Itemset Mining Methods

Data Mining:
Data mining is the efficient discovery of valuable, non-obvious information from a large collection of data.

Most important concepts in data mining:
Itemset & frequent itemset
The market basket model

Example of the Market Basket Model:
B1 = {m, c, b}   B2 = {m, p, j}   B3 = {m, b}   B4 = {c, j}
B5 = {m, p, b}   B6 = {m, c, b, j}   B7 = {c, b, j}   B8 = {b, c}
Suppose min support = 3.
Frequent itemsets: {m}:5, {c}:5, {b}:6, {j}:4, {m, b}:4, {c, b}:4, {j, c}:3
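A minimal brute-force sketch of these counts (Python is assumed here; it checks every 1- and 2-itemset against the eight baskets above):

```python
from itertools import combinations

# The eight baskets from the slide.
baskets = [
    {'m', 'c', 'b'}, {'m', 'p', 'j'}, {'m', 'b'}, {'c', 'j'},
    {'m', 'p', 'b'}, {'m', 'c', 'b', 'j'}, {'c', 'b', 'j'}, {'b', 'c'},
]
min_support = 3

# Count the support of every itemset of size 1 and 2.
items = sorted(set().union(*baskets))
for k in (1, 2):
    for itemset in combinations(items, k):
        support = sum(1 for b in baskets if set(itemset) <= b)
        if support >= min_support:
            print(set(itemset), 'support =', support)
```

Running this prints exactly the seven frequent itemsets listed on the slide.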

Association Rule:
Example domain: a medical diagnosis dataset relating symptoms to illnesses.
A rule is defined as an implication of the form X => Y, where X, Y ⊆ I (the set of items). In other words, {i1, i2, …, ik} => j means: if a basket contains all of i1, …, ik, then it is likely to also contain j.
The probability of finding Y that we require in order to accept the rule is called the confidence of the rule:
Conf(X => Y) = supp(X ∪ Y) / supp(X)
Example: {m, b} => c has confidence 2/4 = 50%.
Thus association mining is a two-step process:
1) Find all frequent itemsets.
2) Generate strong association rules from the frequent itemsets.
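The confidence formula can be checked against the baskets above with a short sketch (the supp helper is illustrative, not from the slides):

```python
def supp(itemset, baskets):
    """Number of baskets containing every item of `itemset`."""
    return sum(1 for b in baskets if itemset <= b)

baskets = [
    {'m', 'c', 'b'}, {'m', 'p', 'j'}, {'m', 'b'}, {'c', 'j'},
    {'m', 'p', 'b'}, {'m', 'c', 'b', 'j'}, {'c', 'b', 'j'}, {'b', 'c'},
]
x, y = {'m', 'b'}, {'c'}
conf = supp(x | y, baskets) / supp(x, baskets)   # 2 / 4
print(f"conf = {conf:.0%}")                      # 50%
```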

The Apriori Algorithm
Mines frequent itemsets for Boolean association rules.
Exploits prior knowledge (the Apriori property).
Iterative approach known as level-wise search: frequent k-itemsets are used to explore (k+1)-itemsets.
One full scan of the database is required to find each Lk: L1 contains the items meeting minimum support, L2 the frequent 2-itemsets, and so on.

Two steps:
Join
To find Lk, a set of candidate k-itemsets Ck is generated by joining Lk-1 with itself.
Prune
To reduce the size of Ck, the Apriori property is used: if any (k-1)-subset of a candidate k-itemset is not in Lk-1, then the candidate cannot be frequent either, so it can be removed from Ck (subset testing). A sketch of both steps follows.
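A sketch of the two steps (note one assumption: this uses a simplified set-union join rather than the textbook prefix-based join; the prune step makes the result identical on this data):

```python
from itertools import combinations

def apriori_gen(L_prev, k):
    """Join L(k-1) with itself, then prune by the Apriori property."""
    candidates = {a | b for a in L_prev for b in L_prev if len(a | b) == k}
    # Prune: every (k-1)-subset of a surviving candidate must be in L(k-1).
    return {c for c in candidates
            if all(frozenset(s) in L_prev for s in combinations(c, k - 1))}

# L2 from the worked example on the following slides:
L2 = {frozenset(p) for p in [('I1', 'I2'), ('I1', 'I3'), ('I1', 'I5'),
                             ('I2', 'I3'), ('I2', 'I4'), ('I2', 'I5')]}
for c in apriori_gen(L2, 3):
    print(sorted(c))        # only {I1, I2, I3} and {I1, I2, I5} survive
```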

Join & Prune Step (diagram)

Example:
TID List of item_IDs
T100 I1, I2, I5
T200 I2, I4
T300 I2, I3
T400 I1, I2, I4
T500 I1, I3
T600 I2, I3
T700 I1, I3
T800 I1, I2, I3, I5
T900 I1, I2, I3


Scan D for the count of each candidate:
C1: I1:6, I2:7, I3:6, I4:2, I5:2
Compare candidate support counts with the minimum support count (min_sup = 2):
L1: I1:6, I2:7, I3:6, I4:2, I5:2
Generate C2 candidates from L1 and scan D for the count of each candidate:
C2: {I1, I2}:4, {I1, I3}:4, {I1, I4}:1, …
Compare candidate support counts with the minimum support count:
L2: {I1, I2}:4, {I1, I3}:4, {I1, I5}:2, {I2, I3}:4, {I2, I4}:2, {I2, I5}:2
Generate C3 candidates from L2 using the join and prune steps:
Join: C3 = L2 x L2 = {{I1, I2, I3}, {I1, I2, I5}, {I1, I3, I5}, {I2, I3, I4}, {I2, I3, I5}, {I2, I4, I5}}
Prune: C3 = {{I1, I2, I3}, {I1, I2, I5}}
Scan D for the count of each candidate:
C3: {I1, I2, I3}:2, {I1, I2, I5}:2
Compare candidate support counts with the minimum support count:
L3: {I1, I2, I3}:2, {I1, I2, I5}:2
Generate C4 candidates from L3:
C4 = L3 x L3 = {{I1, I2, I3, I5}}
This itemset is pruned because its subset {I2, I3, I5} is not frequent, so C4 = ∅ and the algorithm terminates.
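The whole level-wise loop can be sketched end to end (the frequent helper is illustrative); run on this database it reproduces L1, L2, and L3 above:

```python
from itertools import combinations

transactions = [
    {'I1', 'I2', 'I5'}, {'I2', 'I4'}, {'I2', 'I3'}, {'I1', 'I2', 'I4'},
    {'I1', 'I3'}, {'I2', 'I3'}, {'I1', 'I3'}, {'I1', 'I2', 'I3', 'I5'},
    {'I1', 'I2', 'I3'},
]
min_sup = 2

def frequent(candidates):
    """One scan of D: keep candidates whose support meets min_sup."""
    return {c: s for c in candidates
            if (s := sum(1 for t in transactions if c <= t)) >= min_sup}

L = frequent({frozenset([i]) for t in transactions for i in t})   # L1
k = 2
while L:
    print(f"L{k - 1}:", {tuple(sorted(c)): s for c, s in L.items()})
    Ck = {a | b for a in L for b in L if len(a | b) == k}          # join
    Ck = {c for c in Ck                                            # prune
          if all(frozenset(s) in L for s in combinations(c, k - 1))}
    L = frequent(Ck)
    k += 1
```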

Generating Association Rules from Frequent Itemsets (from slide 5):
1) Find the frequent itemsets from the transactions in a database D.
2) Generate strong association rules:
confidence(A => B) = P(B|A) = support_count(A ∪ B) / support_count(A)
support_count(A ∪ B): the number of transactions containing the itemset A ∪ B
support_count(A): the number of transactions containing the itemset A

Example:
Let l = {I1, I2, I5}.
The nonempty proper subsets are {I1, I2}, {I1, I5}, {I2, I5}, {I1}, {I2}, {I5}.
Generating association rules:
I1 and I2 => I5, conf = 2/4 = 50%
I1 and I5 => I2, conf = 2/2 = 100%
I2 and I5 => I1, conf = 2/2 = 100%
I1 => I2 and I5, conf = 2/6 = 33%
I2 => I1 and I5, conf = 2/7 = 29%
I5 => I1 and I2, conf = 2/2 = 100%
If min_conf is 70%, only the second, third, and last rules above are output.
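A sketch of this rule generation (the support counts are copied from the worked example above):

```python
from itertools import combinations

# Support counts taken from the Apriori walkthrough.
support = {
    frozenset({'I1'}): 6, frozenset({'I2'}): 7, frozenset({'I5'}): 2,
    frozenset({'I1', 'I2'}): 4, frozenset({'I1', 'I5'}): 2,
    frozenset({'I2', 'I5'}): 2, frozenset({'I1', 'I2', 'I5'}): 2,
}

l = frozenset({'I1', 'I2', 'I5'})
min_conf = 0.70
for r in range(1, len(l)):                      # every nonempty proper subset
    for a in map(frozenset, combinations(l, r)):
        conf = support[l] / support[a]
        keep = 'OUTPUT' if conf >= min_conf else ''
        print(sorted(a), '=>', sorted(l - a), f'conf = {conf:.0%}', keep)
```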

Advantages & Disadvantages:
Advantages:
1) Uses the large-itemset property
2) Easily parallelized
3) Easy to implement
Disadvantages:
1) Assumes the transaction database is memory-resident
2) Requires up to m database scans, where m is the length of the longest frequent itemset

Mining Frequent Itemsets Without Candidate Generation
The candidate generate-and-test method:
may need to generate a huge number of candidate sets
may need to repeatedly scan the database and check a large set of candidates by pattern matching
The frequent-pattern growth method (FP-growth) avoids both problems by using a frequent-pattern tree (FP-tree).

Example:
TID List of item_IDs
T100 I1, I2, I5
T200 I2, I4
T300 I2, I3
T400 I1, I2, I4
T500 I1, I3
T600 I2, I3
T700 I1, I3
T800 I1, I2, I3, I5
T900 I1, I2, I3

Step 1: count the occurrences of each item.
Item  Count
I1    6
I2    7
I3    6
I4    2
I5    2

Step 2: arrange each transaction's items in descending order of count.
TID   List of items (before)   List of items (after)
T100  I1, I2, I5               I2, I1, I5
T200  I2, I4                   I2, I4
T300  I2, I3                   I2, I3
T400  I1, I2, I4               I2, I1, I4
T500  I1, I3                   I1, I3
T600  I2, I3                   I2, I3
T700  I1, I3                   I1, I3
T800  I1, I2, I3, I5           I2, I1, I3, I5
T900  I1, I2, I3               I2, I1, I3
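Steps 1 and 2 can be sketched as follows (ties between equally frequent items are broken alphabetically here, which matches the ordering shown):

```python
from collections import Counter

transactions = [
    ['I1', 'I2', 'I5'], ['I2', 'I4'], ['I2', 'I3'], ['I1', 'I2', 'I4'],
    ['I1', 'I3'], ['I2', 'I3'], ['I1', 'I3'], ['I1', 'I2', 'I3', 'I5'],
    ['I1', 'I2', 'I3'],
]
min_sup = 2

# Step 1: one scan to count every item.
counts = Counter(i for t in transactions for i in t)

# Step 2: keep frequent items, reordered by descending count.
for t in transactions:
    ordered = sorted((i for i in t if counts[i] >= min_sup),
                     key=lambda i: (-counts[i], i))
    print(t, '->', ordered)
```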

FP-Tree (figure: the FP-tree built from the ordered transactions above)

Item  Conditional Pattern Base           Conditional FP-tree    Frequent Patterns Generated
I5    {{I2, I1: 1}, {I2, I1, I3: 1}}     (I2:2, I1:2)           {I2, I5: 2}, {I1, I5: 2}, {I2, I1, I5: 2}
I4    {{I2, I1: 1}, {I2: 1}}             (I2:2)                 {I2, I4: 2}
I3    {{I2, I1: 2}, {I2: 2}, {I1: 2}}    (I2:4, I1:2), (I1:2)   {I2, I3: 4}, {I1, I3: 4}, {I2, I1, I3: 2}
I1    {{I2: 4}}                          (I2:4)                 {I2, I1: 4}
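A sketch of how a conditional pattern base is collected (one simplification assumed: the ordered transactions stand in for the FP-tree's prefix paths, which gives the same counts because each transaction contributes one to every node on its path):

```python
# Ordered transactions from Step 2 above.
ordered = [
    ['I2', 'I1', 'I5'], ['I2', 'I4'], ['I2', 'I3'], ['I2', 'I1', 'I4'],
    ['I1', 'I3'], ['I2', 'I3'], ['I1', 'I3'], ['I2', 'I1', 'I3', 'I5'],
    ['I2', 'I1', 'I3'],
]

def conditional_pattern_base(item):
    """Prefix paths that precede `item`, with how often each occurs."""
    base = {}
    for t in ordered:
        if item in t:
            prefix = tuple(t[:t.index(item)])
            if prefix:
                base[prefix] = base.get(prefix, 0) + 1
    return base

print(conditional_pattern_base('I5'))  # {('I2','I1'): 1, ('I2','I1','I3'): 1}
print(conditional_pattern_base('I4'))  # {('I2','I1'): 1, ('I2',): 1}
```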

Mining Frequent Itemsets Using the Vertical Data Format
Transform the horizontal data format of the transaction database D into a vertical data format:
Itemset  TID_set
I1       {T100, T400, T500, T700, T800, T900}
I2       {T100, T200, T300, T400, T600, T800, T900}
I3       {T300, T500, T600, T700, T800, T900}
I4       {T200, T400}
I5       {T100, T800}
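A sketch of the vertical representation; the support of any k-itemset is the size of the intersection of its TID-sets, so no further database scans are needed:

```python
# Vertical layout from the slide: item -> TID-set.
tidsets = {
    'I1': {'T100', 'T400', 'T500', 'T700', 'T800', 'T900'},
    'I2': {'T100', 'T200', 'T300', 'T400', 'T600', 'T800', 'T900'},
    'I3': {'T300', 'T500', 'T600', 'T700', 'T800', 'T900'},
    'I4': {'T200', 'T400'},
    'I5': {'T100', 'T800'},
}

# Support of {I1, I2} = size of the TID-set intersection.
both = tidsets['I1'] & tidsets['I2']
print(sorted(both), 'support =', len(both))   # support = 4
```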

Example for Practice
Minimum support threshold is 3.

TID  List of items (after ordering)
T1   f, c, a, m, p
T2   f, c, a, b, m
T3   f, b
T4   c, b, p
T5   f, c, a, p, m

FP-Growth Example

Header table:
Item  Frequency
f     4
c     4
a     3
b     3
m     3
p     3

FP-tree:
{}
├─ f:4
│  ├─ c:3
│  │  └─ a:3
│  │     ├─ m:2
│  │     │  └─ p:2
│  │     └─ b:1
│  │        └─ m:1
│  └─ b:1
└─ c:1
   └─ b:1
      └─ p:1

FP-Growth Example

Item  Conditional pattern-base     Conditional FP-tree
f     Empty                        Empty
c     {(f:3)}                      {(f:3)}|c
a     {(fc:3)}                     {(f:3, c:3)}|a
b     {(fca:1), (f:1), (c:1)}      Empty
m     {(fca:2), (fcab:1)}          {(f:3, c:3, a:3)}|m
p     {(fcam:2), (cb:1)}           {(c:3)}|p

FP-Tree Algorithm:
Input: DB, min_support
Output: FP-tree
1. Scan DB and count all frequent items.
2. Create the null root and set it as the current node.
3. For each transaction T:
   Sort T's frequent items in descending frequency order.
   For each sorted item I:
     If the current node has a child for I, follow it; otherwise insert I into the tree as a new child of the current node and connect the new node to I's header list.
     Increment that child's count and make it the current node.
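A runnable sketch of this algorithm (the Node class and priority map are illustrative; the priority map supplies step 1's descending-frequency order from the slides, with infrequent items already absent from the rows below):

```python
class Node:
    """One FP-tree node (item label, parent link, count, children)."""
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count, self.children = 0, {}

def build_fp_tree(db, priority):
    root = Node(None, None)          # 2. the null root
    header = {}                      # item -> its nodes (the header list)
    for t in db:                     # 3. for each transaction T
        node = root
        for item in sorted(t, key=priority.get):   # sort T's items
            if item not in node.children:          # new child, or...
                node.children[item] = Node(item, node)
                header.setdefault(item, []).append(node.children[item])
            node = node.children[item]             # ...follow the shared prefix
            node.count += 1
    return root, header

# Frequent items in descending-frequency order (min_support = 3).
priority = {item: r for r, item in enumerate(['f', 'c', 'a', 'b', 'm', 'p'])}
db = [['f', 'c', 'a', 'm', 'p'], ['f', 'c', 'a', 'b', 'm'], ['f', 'b'],
      ['c', 'b', 'p'], ['f', 'c', 'a', 'p', 'm']]
root, header = build_fp_tree(db, priority)
print({i: sum(n.count for n in nodes) for i, nodes in header.items()})
# {'f': 4, 'c': 4, 'a': 3, 'b': 3, 'm': 3, 'p': 3}
```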

FP-Growth Algorithm:
(Figure: starting from the least frequent item, recursively build each item's conditional pattern base and conditional FP-tree, as in the tables above, until each conditional tree is empty or a single path.)

Advantages & Disadvantages of FP-Growth:
Advantages:
1) Only two passes over the data set
2) No candidate generation
3) Much faster than Apriori
Disadvantages:
1) The FP-tree may not fit in memory
2) The FP-tree is expensive to build

Subjects
1)U.M.L.
2)P.P.L.
3)D.M.D.W.
4)O.S.
5)Programming Languages
6)RDBMS
Mr. Nilesh Magar
Lecturer at MIT, Kothrud, Pune.
9975155310.

Thank You