Mining Frequent Patterns And Association Rules

rashmi_mestri 1,595 views 84 slides Jul 31, 2022

About This Presentation

Association Mining
Frequent Pattern Mining
Example of Apriori Algorithm


Ms. Rashmi Bhat
Mining Frequent Patterns and Association Rules

Does This Look Familiar???

What is a Frequent Pattern?
•A frequent pattern is a pattern that appears frequently in a data set.
•What are these frequent patterns?
•Frequent itemsets
•Frequent sequential patterns
•Frequent structured patterns, etc.

What is Frequent Pattern Mining?
•Searching for recurring relationships in a given data set.
•Discovering interesting associations and correlations between itemsets in transactional databases.

Market Basket Analysis
•Analyzes customer buying habits by finding associations between the different
items that customers place in their “shopping baskets”.
•How does this help retailers?
•Helps to develop marketing strategies by gaining insight into which items are frequently
purchased together by customers.

Market Basket Analysis
•Buying patterns which reflect items frequently purchased or associated together can be represented in rule form, known as association rules.
•e.g.
{Milk} ⇒ {Bread} [support = 5%, confidence = 65%]
•Interestingness measures: support and confidence
•These reflect the usefulness and certainty of discovered rules.
•Association rules are considered interesting if they satisfy both a minimum support threshold and a minimum confidence threshold.
•Thresholds can be set by users or domain experts.

Association Rule
Let
I = {i1, i2, …, im} … the set of items in the shop
D is the task-relevant dataset
T ⊆ I … a transaction
A and B are itemsets (subsets of I)
An association rule is an implication of the form
A ⇒ B, where A ⊂ I, B ⊂ I and A ∩ B = ∅
An association rule A ⇒ B holds in the transaction set D with support s and confidence c.

Support s is the percentage of transactions in D that contain A ∪ B.
Confidence c is the percentage of transactions in D containing A that also contain B.
support(A ⇒ B) = P(A ∪ B)
confidence(A ⇒ B) = P(B | A)
Rules that satisfy both a minimum support threshold (min_sup) and a minimum confidence threshold (min_conf) are called strong rules.

Some Important Terminologies
•An itemset is a set of items.
•An itemset that contains k items is a k-itemset.
•The occurrence frequency of an itemset is the number of transactions that contain the itemset. This is also known as the frequency, support count, or count of the itemset.
•The occurrence frequency is also called the absolute support.
•If an itemset I satisfies a prespecified minimum support threshold, then I is a frequent itemset.
•The set of frequent k-itemsets is commonly denoted by Lk.

confidence(A ⇒ B) = support(A ∪ B) / support(A) = support_count(A ∪ B) / support_count(A)

Frequent Itemset

ID | Transactions
1 | A, B, C, D
2 | B, D, E
3 | A, D, E
4 | A, B, E
5 | C, D, E

Item | Frequency | Support
A | 3 | 3/5 → 0.6
B | 3 | 3/5 → 0.6
C | 2 | 2/5 → 0.4
D | 4 | 4/5 → 0.8
E | 4 | 4/5 → 0.8

Association Rule Mining
•Association rule mining is a two-step process:
1. Find all frequent itemsets.
2. Generate strong association rules from the frequent itemsets.
•The overall performance of mining association rules is determined by the first step.

Itemsets
•Closed Itemset
•An itemset X is closed in a data set D if there exists no proper super-itemset Y such that Y has the same support count as X in D.
•If X is both closed and frequent in D, then X is a closed frequent itemset in D.
•Maximal Itemset
•An itemset X is a maximal frequent itemset (or max-itemset) in D if X is frequent, and there exists no super-itemset Y such that X ⊂ Y and Y is frequent in D.

Closed and Maximal Itemsets
•A frequent itemset X ∈ F is maximal if it does not have any frequent supersets.
•That is, for all Y ⊃ X, Y ∉ F.
•A frequent itemset X ∈ F is closed if it has no immediate superset with the same frequency.
•That is, for all Y ⊃ X, support(Y, D) < support(X, D).

TID | Itemset
1 | {A, C, D}
2 | {B, C, D}
3 | {A, B, C, D}
4 | {B, D}
5 | {A, B, C, D}

min_sup = 3, i.e. min_sup = 60%

Itemset lattice (null at the top, {A,B,C,D} at the bottom) annotated with support counts:
1-itemsets: A:3, B:4, C:4, D:5 — all frequent
2-itemsets: {A,B}:2, {A,C}:3, {A,D}:3, {B,C}:3, {B,D}:4, {C,D}:4 — all frequent except {A,B}
3-itemsets: {A,B,C}:2, {A,B,D}:2, {A,C,D}:3, {B,C,D}:3 — only {A,C,D} and {B,C,D} frequent
4-itemset: {A,B,C,D}:2 — infrequent

Closed frequent itemsets: {D}, {B,D}, {C,D} (closed but not maximal), and {A,C,D}, {B,C,D}
Maximal frequent itemsets: {A,C,D} and {B,C,D}
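The closed and maximal sets listed above can be verified with a brute-force sketch over the same five transactions (enumeration is cheap with only four items):

```python
from itertools import combinations

# Closed/maximal check over the lattice example (min_sup = 3)
db = [{"A", "C", "D"}, {"B", "C", "D"}, {"A", "B", "C", "D"},
      {"B", "D"}, {"A", "B", "C", "D"}]
items = sorted(set().union(*db))
min_sup = 3

def count(s):
    return sum(1 for t in db if s <= t)

# Enumerate every itemset and keep the frequent ones with their counts
frequent = {frozenset(c): count(set(c))
            for k in range(1, len(items) + 1)
            for c in combinations(items, k)
            if count(set(c)) >= min_sup}

# Maximal: no frequent proper superset.  Closed: no proper superset with
# the same support count (such a superset would itself be frequent).
maximal = [s for s in frequent if not any(s < t for t in frequent)]
closed = [s for s in frequent
          if not any(s < t and frequent[t] == frequent[s] for t in frequent)]
print(sorted(map(sorted, maximal)))  # [['A', 'C', 'D'], ['B', 'C', 'D']]
```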

Frequent Pattern Mining
•Frequent pattern mining can be classified in various ways as follows:
•Based on the completeness of patterns to be mined
•Based on the levels of abstraction involved in the rule set
•Based on the number of data dimensions involved in the rule
•Based on the types of values handled in the rule
•Based on the kinds of rules to be mined
•Based on the kinds of patterns to be mined

Frequent Pattern Mining
•Based on the completeness of the patterns to be mined
•The complete set of frequent itemsets
•The closed frequent itemsets, and the maximal frequent itemsets
•The constrained frequent itemsets
•The approximate frequent itemsets
•The near-match frequent itemsets
•The top-k frequent itemsets

Frequent Pattern Mining
•Based on the levels of abstraction involved in the rule set
•We can find rules at differing levels of abstraction
•multilevel association rules
•Based on the number of data dimensions involved in the rule
•Single-dimensional association rule
•Multidimensional association rule
•Based on the types of values handled in the rule
•Boolean association rule
•Quantitative association rule

Frequent Pattern Mining
•Based on the kinds of rules to be mined
•Association rules
•Correlation rules
•Based on the kinds of patterns to be mined
•Frequent itemset mining
•Sequential pattern mining
•Structured pattern mining

Efficient and Scalable Frequent Itemset Mining Methods
•Methods for mining the simplest form of frequent patterns:
•Single-dimensional,
•Single-level,
•Boolean frequent itemsets
•Apriori Algorithm
•The basic algorithm for finding frequent itemsets
•How to generate strong association rules from frequent itemsets?
•Variations of the Apriori algorithm

Apriori Algorithm
•Finds frequent itemsets using candidate generation
•The algorithm uses prior knowledge of frequent itemset properties
•Employs an iterative approach known as a level-wise search
•k-itemsets are used to explore (k+1)-itemsets
•The Apriori property is used to reduce the search space.
•If a set cannot pass a test, all of its supersets will fail the same test as well.
Apriori property: All nonempty subsets of a frequent itemset must also be frequent.

Apriori Algorithm
•The Apriori algorithm follows a two-step process:
•Join Step:
•To find Lk, a set of candidate k-itemsets is generated by joining Lk−1 with itself.
•This set of candidates is denoted Ck.
•Prune Step:
•Ck is a superset of Lk; that is, its members may or may not be frequent, but all of the frequent k-itemsets are included in Ck.
•A scan of the database to determine the count of each candidate in Ck would result in the determination of Lk.
•To reduce the size of Ck, the Apriori property is used:
•Any (k−1)-itemset that is not frequent cannot be a subset of a frequent k-itemset.
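The join and prune steps can be sketched as a small candidate-generation function (an illustrative implementation, not the book's exact pseudocode). Note that with full subset pruning, joining the L2 from the worked example below yields only {Milk, Bread, Butter}; the deck's later C3 slide lists four candidates because it prunes by support counting alone.

```python
from itertools import combinations

def apriori_gen(L_prev, k):
    """Generate candidate k-itemsets C_k from frequent (k-1)-itemsets.

    Join step: merge two (k-1)-itemsets whose first k-2 items agree.
    Prune step: drop candidates having any infrequent (k-1)-subset.
    """
    L_sorted = [tuple(sorted(s)) for s in L_prev]
    L_set = set(L_sorted)
    candidates = set()
    for a in L_sorted:
        for b in L_sorted:
            # join: same (k-2)-prefix, b's last item greater than a's
            if a[:-1] == b[:-1] and a[-1] < b[-1]:
                c = a + (b[-1],)
                # prune using the Apriori property
                if all(tuple(sorted(sub)) in L_set
                       for sub in combinations(c, k - 1)):
                    candidates.add(frozenset(c))
    return candidates

L2 = [{"Bread", "Milk"}, {"Butter", "Milk"}, {"Bread", "Eggs"},
      {"Bread", "Butter"}, {"Bread", "Cheese"}]
print(apriori_gen(L2, 3))  # only {Bread, Butter, Milk} survives the prune
```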

Apriori Algorithm
Transactional data for a retail shop
•Find the frequent itemsets using the Apriori algorithm.
•There are eight transactions in this database, that is, |D| = 8.

T_id | List of Items
1 | { Milk, Eggs, Bread, Butter }
2 | { Milk, Bread }
3 | { Eggs, Bread, Butter }
4 | { Milk, Bread, Butter }
5 | { Milk, Bread, Cheese }
6 | { Eggs, Bread, Cheese }
7 | { Milk, Bread, Butter, Cheese }
8 | { Milk, Eggs }

Apriori Algorithm
Iteration 1:
•Generate candidate itemset C1 (1-itemsets).
•Suppose that the minimum support count required is 3, i.e., min_sup = 3 (relative support is 3/8 = 37.5%).
•The set of frequent 1-itemsets, L1, can then be determined.
•Prune Step: Remove all the itemsets not satisfying minimum support.
•L1 consists of the candidate 1-itemsets satisfying minimum support.

C1
Item | Support Count
{ Milk } | 6
{ Eggs } | 4
{ Bread } | 7
{ Butter } | 4
{ Cheese } | 3

Frequent 1-Itemset L1 (here every candidate satisfies min_sup)
Item | Support Count
{ Milk } | 6
{ Eggs } | 4
{ Bread } | 7
{ Butter } | 4
{ Cheese } | 3

Apriori Algorithm
Iteration 2:
•Join Step: Join L1 × L1 to generate candidate itemset C2.
•min_sup = 3
•The set of frequent 2-itemsets, L2, can then be determined.
•Prune Step: Remove all the itemsets not satisfying minimum support.
•L2 consists of the candidate 2-itemsets satisfying minimum support.

C2
Item | Support Count
{ Milk, Eggs } | 2
{ Milk, Bread } | 5
{ Milk, Butter } | 3
{ Milk, Cheese } | 2
{ Eggs, Bread } | 3
{ Eggs, Butter } | 2
{ Eggs, Cheese } | 1
{ Bread, Butter } | 4
{ Bread, Cheese } | 3
{ Butter, Cheese } | 1

Frequent 2-Itemset L2
Item | Support Count
{ Milk, Bread } | 5
{ Milk, Butter } | 3
{ Eggs, Bread } | 3
{ Bread, Butter } | 4
{ Bread, Cheese } | 3

Apriori Algorithm
Iteration 3:
•Join Step: Join L2 × L2 to generate candidate itemset C3.
•min_sup = 3
•The set of frequent 3-itemsets, L3, can then be determined.
•Prune Step: Remove all the itemsets not satisfying minimum support.
•L3 consists of the candidate 3-itemsets satisfying minimum support.

C3
Item | Support Count
{ Milk, Bread, Butter } | 3
{ Milk, Eggs, Bread } | 1
{ Milk, Bread, Cheese } | 2
{ Eggs, Bread, Butter } | 2

Frequent 3-Itemset L3
Item | Support Count
{ Milk, Bread, Butter } | 3

Algorithm: (Apriori pseudocode figure)
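Since the pseudocode figure did not survive extraction, here is a compact sketch of the whole level-wise loop (an illustrative reimplementation, not the textbook's exact pseudocode) that reproduces L1 through L3 for the retail example above:

```python
from itertools import combinations

# Apriori over the eight-transaction retail example; min_sup count = 3
db = [
    {"Milk", "Eggs", "Bread", "Butter"}, {"Milk", "Bread"},
    {"Eggs", "Bread", "Butter"}, {"Milk", "Bread", "Butter"},
    {"Milk", "Bread", "Cheese"}, {"Eggs", "Bread", "Cheese"},
    {"Milk", "Bread", "Butter", "Cheese"}, {"Milk", "Eggs"},
]
min_sup = 3

def apriori(db, min_sup):
    items = sorted(set().union(*db))
    # C1 -> L1: count single items and keep the frequent ones
    L = {frozenset([i]): c for i in items
         if (c := sum(1 for t in db if i in t)) >= min_sup}
    levels = [L]
    k = 2
    while levels[-1]:
        prev = list(levels[-1])
        # join step: unions of two (k-1)-itemsets that form a k-itemset
        cand = {a | b for a in prev for b in prev if len(a | b) == k}
        Lk = {}
        for c in cand:
            # prune step (Apriori property), then count survivors
            if all(frozenset(s) in levels[-1] for s in combinations(c, k - 1)):
                n = sum(1 for t in db if c <= t)
                if n >= min_sup:
                    Lk[c] = n
        if not Lk:
            break
        levels.append(Lk)
        k += 1
    return levels

levels = apriori(db, min_sup)
print(levels[-1])  # L3: {Milk, Bread, Butter} with count 3
```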

Generating Association Rules from Frequent Itemsets
•To generate strong association rules from frequent itemsets, calculate the confidence of a rule using
confidence(A ⇒ B) = support_count(A ∪ B) / support_count(A)
•Based on this, association rules can be generated as follows:
•For each frequent itemset l, generate all nonempty subsets of l.
•For every nonempty subset s of l, output the rule s ⇒ (l − s) if
support_count(l) / support_count(s) ≥ min_conf,
where min_conf is the minimum confidence threshold.
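The subset-enumeration procedure above can be sketched as follows; `counts` holds support counts taken from the Apriori pass over the retail example.

```python
from itertools import combinations

# Support counts from the worked retail example
counts = {
    frozenset({"Milk"}): 6, frozenset({"Bread"}): 7, frozenset({"Butter"}): 4,
    frozenset({"Milk", "Bread"}): 5, frozenset({"Milk", "Butter"}): 3,
    frozenset({"Bread", "Butter"}): 4,
    frozenset({"Milk", "Bread", "Butter"}): 3,
}

def gen_rules(l, support_count, min_conf):
    """All strong rules s => (l - s) from frequent itemset l."""
    rules = []
    for r in range(1, len(l)):
        for s in map(frozenset, combinations(l, r)):
            conf = support_count[l] / support_count[s]
            if conf >= min_conf:
                rules.append((set(s), set(l - s), conf))
    return rules

l = frozenset({"Milk", "Bread", "Butter"})
for a, b, conf in gen_rules(l, counts, 0.60):
    print(f"{sorted(a)} => {sorted(b)}  conf = {conf:.0%}")
```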

Generating Association Rules from Frequent Itemsets
•E.g. the example contains the frequent itemset l = {Milk, Bread, Butter}. What association rules can be generated from l?
•Nonempty proper subsets of l: {Milk}, {Bread}, {Butter}, {Milk, Bread}, {Milk, Butter}, {Bread, Butter}
•The resulting association rules are:
{Milk, Bread} ⇒ {Butter}    confidence = 3/5 = 60%
{Milk, Butter} ⇒ {Bread}    confidence = 3/3 = 100%
{Bread, Butter} ⇒ {Milk}    confidence = 3/4 = 75%
{Milk} ⇒ {Bread, Butter}    confidence = 3/6 = 50%
{Bread} ⇒ {Milk, Butter}    confidence = 3/7 = 42.86%
{Butter} ⇒ {Milk, Bread}    confidence = 3/4 = 75%

Generating Association Rules from Frequent Itemsets
Minimum Confidence is 60%

Sr. No. | Rule | Confidence | Is Strong Rule?
1 | {Milk, Bread} ⇒ {Butter} | 60.00 | Yes
2 | {Milk, Butter} ⇒ {Bread} | 100.00 | Yes
3 | {Bread, Butter} ⇒ {Milk} | 75.00 | Yes
4 | {Milk} ⇒ {Bread, Butter} | 50.00 | No
5 | {Bread} ⇒ {Milk, Butter} | 42.86 | No
6 | {Butter} ⇒ {Milk, Bread} | 75.00 | Yes

Exercise: repeat the above with Minimum Support = 30% and Minimum Confidence = 65%.

Methods to Improve Efficiency of Apriori
Hash Based Technique
Transaction reduction
Partitioning
Dynamic Itemset Counting
Sampling

Methods to Improve Efficiency of Apriori
•Hash-Based Technique
•Can be used to reduce the size of the candidate k-itemsets, Ck, for k > 1.
•Such a hash-based technique may substantially reduce the number of candidate k-itemsets examined (especially when k = 2).
•In the 2nd iteration, i.e. generation of 2-itemsets, map every combination of two items onto a bucket of a hash table structure and increment the bucket count.
•If the count of a bucket is less than the min_sup count, the itemsets hashing to it are removed from the candidate set.

•Hash-Based Technique

TID | List of Items
T1 | {A, B, E}
T2 | {B, D}
T3 | {B, C}
T4 | {A, B, D}
T5 | {A, C}
T6 | {B, C}
T7 | {A, C}
T8 | {A, B, C, E}
T9 | {A, B, C}

min_sup count = 3

L1
Itemset | Support Count
A | 6
B | 7
C | 6
D | 2
E | 2

Order of items: A = 1, B = 2, C = 3, D = 4 and E = 5
Hash function: h(x, y) = (order of x × 10 + order of y) mod 7

Itemset | Count | Hash Function
{A,B} | 4 | (1×10 + 2) mod 7 = 5
{A,C} | 4 | (1×10 + 3) mod 7 = 6
{A,D} | 1 | (1×10 + 4) mod 7 = 0
{A,E} | 2 | (1×10 + 5) mod 7 = 1
{B,C} | 4 | (2×10 + 3) mod 7 = 2
{B,D} | 2 | (2×10 + 4) mod 7 = 3
{B,E} | 2 | (2×10 + 5) mod 7 = 4
{C,D} | 0 | −
{C,E} | 1 | (3×10 + 5) mod 7 = 0
{D,E} | 0 | −

Hash Table Structure to Generate L2
Bucket Address | Bucket Count | Bucket Content | In L2?
0 | 2 | {A,D} {C,E} | No
1 | 2 | {A,E} {A,E} | No
2 | 4 | {B,C} {B,C} {B,C} {B,C} | Yes
3 | 2 | {B,D} {B,D} | No
4 | 2 | {B,E} {B,E} | No
5 | 4 | {A,B} {A,B} {A,B} {A,B} | Yes
6 | 4 | {A,C} {A,C} {A,C} {A,C} | Yes
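The bucket counting above can be sketched in a few lines, reproducing the slide's hash table in one pass over the transactions:

```python
from itertools import combinations

db = [{"A", "B", "E"}, {"B", "D"}, {"B", "C"}, {"A", "B", "D"}, {"A", "C"},
      {"B", "C"}, {"A", "C"}, {"A", "B", "C", "E"}, {"A", "B", "C"}]
order = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5}
NUM_BUCKETS = 7

def h(x, y):
    # h(x, y) = (order of x * 10 + order of y) mod 7, as on the slide
    return (order[x] * 10 + order[y]) % NUM_BUCKETS

# One scan: hash every 2-item combination of every transaction into a bucket
bucket_count = [0] * NUM_BUCKETS
for t in db:
    for x, y in combinations(sorted(t, key=order.get), 2):
        bucket_count[h(x, y)] += 1

print(bucket_count)  # [2, 2, 4, 2, 2, 4, 4]
# Only buckets with count >= min_sup (3) can hold frequent 2-itemsets:
# buckets 2, 5, 6 -> candidates {B,C}, {A,B}, {A,C}
```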

Methods to Improve Efficiency of Apriori
•Transaction Reduction
•A transaction that does not contain any frequent k-itemsets cannot contain any frequent (k+1)-itemsets.
•Such transactions can be marked or removed from further consideration.

min_sup count = 2

TID | List of Items
T1 | A, B, E
T2 | B, C, D
T3 | C, D
T4 | A, B, C, D

Item counts: A = 2, B = 3, C = 3, D = 3, E = 1
E is infrequent, so its column can be dropped:

TID | A | B | C | D
T1 | 1 | 1 | 0 | 0
T2 | 0 | 1 | 1 | 1
T3 | 0 | 0 | 1 | 1
T4 | 1 | 1 | 1 | 1

•Transaction Reduction (continued)
2-itemset counts per transaction (min_sup count = 2):

TID | A,B | A,C | A,D | B,C | B,D | C,D | per-transaction count
T1 | 1 | 0 | 0 | 0 | 0 | 0 | 1
T2 | 0 | 0 | 0 | 1 | 1 | 1 | 3
T3 | 0 | 0 | 0 | 0 | 0 | 1 | 1
T4 | 1 | 1 | 1 | 1 | 1 | 1 | 6
Column totals: 2 | 1 | 1 | 2 | 2 | 3

Frequent 2-itemsets: {A,B}, {B,C}, {B,D}, {C,D}
T1 and T3 contain only one frequent 2-itemset each, so they cannot contain any frequent 3-itemset and are removed. Only T2 and T4 remain for the 3-itemset scan:

TID | B,C | B,D | C,D
T2 | 1 | 1 | 1
T4 | 1 | 1 | 1

TID | B,C,D
T2 | 1
T4 | 1
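The pruning rule above can be sketched as follows (helper names are illustrative, not from the slides): a transaction with fewer than k+1 frequent k-itemsets is dropped before the (k+1)-itemset scan.

```python
from itertools import combinations

# Transaction-reduction sketch over the example above (min_sup count = 2)
db = {"T1": {"A", "B", "E"}, "T2": {"B", "C", "D"},
      "T3": {"C", "D"}, "T4": {"A", "B", "C", "D"}}
min_sup = 2

def frequent_k(transactions, k):
    """Frequent k-itemsets (as sorted tuples) with count >= min_sup."""
    counts = {}
    for t in transactions.values():
        for c in combinations(sorted(t), k):
            counts[c] = counts.get(c, 0) + 1
    return {c for c, n in counts.items() if n >= min_sup}

F2 = frequent_k(db, 2)   # {('A','B'), ('B','C'), ('B','D'), ('C','D')}
# A transaction holding fewer than 3 frequent 2-itemsets cannot hold any
# frequent 3-itemset, so it can be skipped in the 3-itemset scan.
active = {tid: t for tid, t in db.items()
          if sum(1 for c in combinations(sorted(t), 2) if c in F2) >= 3}
print(sorted(active))  # ['T2', 'T4']
```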

Methods to Improve Efficiency of Apriori
•Partitioning
•Requires just two database scans to mine the frequent itemsets.
Phase I: Divide the transactions in D into n partitions; find the frequent itemsets local to each partition (1 scan); combine all local frequent itemsets to form the candidate itemsets.
Phase II: Find the global frequent itemsets among the candidates (1 scan) over the transactions in D.

•Partitioning

TID | A | B | C | D | E
T1 | 1 | 0 | 0 | 0 | 1
T2 | 0 | 1 | 0 | 1 | 0
T3 | 0 | 0 | 0 | 1 | 1
T4 | 0 | 1 | 1 | 0 | 0
T5 | 0 | 0 | 0 | 0 | 1
T6 | 0 | 1 | 1 | 1 | 0

The database is divided into three partitions, each having two transactions, with support of 20%.

First scan (per partition, support = 20%, local min_sup count = 1):
Partition 1 (T1, T2): A=1, B=1, D=1, E=1; {A,E} = 1, {B,D} = 1
Partition 2 (T3, T4): B=1, C=1, D=1, E=1; {D,E} = 1, {B,C} = 1
Partition 3 (T5, T6): B=1, C=1, D=1, E=1; {B,C}=1, {B,D}=1, {C,D}=1; {B,C,D} = 1

Second scan (support = 20%, global min_sup count = 2):
A=1, B=3, C=2, D=3, E=3
{A,E} = 1, {B,D} = 2, {D,E} = 1, {B,C} = 2, {C,D} = 1, {B,C,D} = 1

Shortlisted frequent itemsets: B=3, C=2, D=3, E=3, {B,D} = 2, {B,C} = 2
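The two-phase scheme above can be sketched as follows; any globally frequent itemset must be locally frequent in at least one partition, so the union of local results is a safe candidate set.

```python
import math
from itertools import combinations

# Two-phase partitioning sketch over the slide's six transactions
db = [{"A", "E"}, {"B", "D"}, {"D", "E"}, {"B", "C"}, {"E"}, {"B", "C", "D"}]
min_frac = 0.2  # 20% minimum support

def local_frequent(part, min_count):
    """All itemsets reaching min_count within one partition."""
    counts = {}
    for t in part:
        for k in range(1, len(t) + 1):
            for c in combinations(sorted(t), k):
                s = frozenset(c)
                counts[s] = counts.get(s, 0) + 1
    return {s for s, n in counts.items() if n >= min_count}

# Phase I: one scan -- local frequent itemsets per partition become candidates
parts = [db[0:2], db[2:4], db[4:6]]
candidates = set()
for p in parts:
    candidates |= local_frequent(p, math.ceil(min_frac * len(p)))

# Phase II: one more full scan keeps only the globally frequent candidates
global_min = math.ceil(min_frac * len(db))
frequent = {s for s in candidates
            if sum(1 for t in db if s <= t) >= global_min}
print(sorted(len(s) for s in frequent))  # four 1-itemsets and two 2-itemsets
```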

Methods to Improve Efficiency of Apriori
•Dynamic Itemset Counting
•The database is partitioned into blocks marked by start points.
•New candidates can be added at any start point.
•This technique uses the count-so-far as a lower bound of the actual count.
•If the count-so-far passes min_sup, the itemset is added into the frequent itemset collection and can be used to generate longer candidates.
•Leads to fewer database scans.

•Dynamic Itemset Counting

TID | List of Items
T1 | A, B
T2 | B, C
T3 | A
T4 | −

Minimum Support = 25%, number of blocks (M) = 2

The itemset lattice over {A, B, C} (from { } down to {A,B,C}) is annotated with four states: confirmed frequent, suspected frequent, confirmed infrequent, and suspected infrequent.
Before scanning: all counts are zero (A = 0, B = 0, C = 0).
After scanning the 1st block (T1, T2): A = 1, B = 2, C = 1; {A,B} = 1, {B,C} = 1.
After scanning the 1st and 2nd blocks (T3, T4): A = 2, B = 2, C = 1; {A,B} = 1, {B,C} = 1.

Methods to Improve Efficiency of Apriori
•Sampling
•Pick a random sample S of the given dataset D.
•Search for frequent itemsets in S instead of D.
•We trade off some degree of accuracy against efficiency.
•We may lose a global frequent itemset, so we use a lower support threshold than the minimum support to find the frequent itemsets local to S, denoted L_S.
•The rest of the database is used to compute the actual frequencies of each itemset in L_S.
•If L_S contains all the frequent itemsets in D, then only one scan of D is required.
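The sampling idea can be sketched as follows; the data and the choice of sample are illustrative, not from the slides.

```python
from itertools import combinations

# Sampling sketch: mine a sample S with a LOWERED threshold, then verify
# the surviving candidates against the full database in one scan.
db = [{"Milk", "Bread"}, {"Milk", "Bread", "Butter"}, {"Bread", "Butter"},
      {"Milk", "Bread"}, {"Milk", "Eggs"}, {"Bread", "Butter"},
      {"Milk", "Bread", "Butter"}, {"Eggs", "Bread"}]

def mine(transactions, min_count):
    """Brute-force frequent itemsets with support count >= min_count."""
    counts = {}
    for t in transactions:
        for k in range(1, len(t) + 1):
            for c in combinations(sorted(t), k):
                counts[frozenset(c)] = counts.get(frozenset(c), 0) + 1
    return {s: n for s, n in counts.items() if n >= min_count}

sample = db[::2]                 # the sample S (every other transaction here)
local = mine(sample, 1)          # lowered threshold -> candidate set L_S
global_min = 4                   # global minimum support count
# One scan of D computes the actual frequency of every member of L_S
frequent = {s: n for s in local
            if (n := sum(1 for t in db if s <= t)) >= global_min}
print(frequent[frozenset({"Milk", "Bread"})])  # 4
```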

FP-Growth
•Although reducing the size of candidate sets may lead to good performance, Apriori can still suffer from two nontrivial costs:
•It may still need to generate a huge number of candidate sets.
•It may need to repeatedly scan the whole database and check a large set of candidates by pattern matching.
•A method is required that will mine the complete set of frequent itemsets without a costly candidate generation process.
•This method is called Frequent Pattern Growth, or FP-Growth.

FP-Growth
•Adopts a divide-and-conquer strategy:
•First, it compresses the database representing frequent items into a frequent pattern tree, or FP-tree, which retains the itemset association information.
•Then it divides the compressed database into a set of conditional databases, each associated with one frequent item or pattern fragment.
•It then mines each such database separately.

FP-Growth

TID | List of Items
T1 | A, B, E
T2 | B, D
T3 | B, C
T4 | A, B, D
T5 | A, C
T6 | B, C
T7 | A, C
T8 | A, B, C, E
T9 | A, B, C

Scan the database and derive the set of frequent 1-itemsets and their support counts (min_sup = 2):

Itemset | Support
A | 6
B | 7
C | 6
D | 2
E | 2

Sort in order of descending support count:
L = {B:7, A:6, C:6, D:2, E:2}

FP-Growth
TIDList of Items
T1 A, B, E
T2 B, D
T3 B, C
T4 A, B, D
T5 A, C
T6 B, C
T7 A, C
T8 A, B, C, E
T9 A, B, C
ItemsetSupport
B 7
A 6
C 6
D 2
E 2
1.Create the root of the
tree, labeled with “null”
2.Scan the database D again.
The itemsin each transaction
are processed in L order
null { }
&#3627408447;={&#3627408437;:7,&#3627408436;:6,&#3627408438;:6,&#3627408439;:2,{&#3627408440;:2}}

FP-Growth
TIDList of Items
T1 A, B, E
T2 B, D
T3 B, C
T4 A, B, D
T5 A, C
T6 B, C
T7 A, C
T8 A, B, C, E
T9 A, B, C
ItemsetSupport
B 7
A 6
C 6
D 2
E 2
1.Create the root of the
tree, labeled with “null”
2.Scan the database D again.
The itemsin each transaction
are processed in L order
null { }
B: 1
A: 1
E: 1
&#3627408447;={&#3627408437;:7,&#3627408436;:6,&#3627408438;:6,&#3627408439;:2,{&#3627408440;:2}}

FP-Growth
TIDList of Items
T1 A, B, E
T2 B, D
T3 B, C
T4 A, B, D
T5 A, C
T6 B, C
T7 A, C
T8 A, B, C, E
T9 A, B, C
Itemset Support
B 7
A 6
C 6
D 2
E 2
1.Create the root of the
tree, labeled with “null”
2.Scan the database D again.
The items in each transaction
are processed in L order
null { }
B: 2
A: 1
E: 1
D: 1
B: 2
D: 1
L = {B:7, A:6, C:6, D:2, E:2}

FP-Growth
TID List of Items
T1 A, B, E
T2 B, D
T3 B, C
T4 A, B, D
T5 A, C
T6 B, C
T7 A, C
T8 A, B, C, E
T9 A, B, C
Itemset Support
B 7
A 6
C 6
D 2
E 2
1.Create the root of the
tree, labeled with “null”
2.Scan the database D again.
The items in each transaction
are processed in L order
null { }
B: 3
A: 1
E: 1
D: 1
C: 1

FP-Growth
TID List of Items
T1 A, B, E
T2 B, D
T3 B, C
T4 A, B, D
T5 A, C
T6 B, C
T7 A, C
T8 A, B, C, E
T9 A, B, C
Itemset Support
B 7
A 6
C 6
D 2
E 2
1.Create the root of the
tree, labeled with “null”
2.Scan the database D again.
The items in each transaction
are processed in L order
null { }
B: 4
A: 2
E: 1
D: 1
C: 1
D: 1

FP-Growth
TID List of Items
T1 A, B, E
T2 B, D
T3 B, C
T4 A, B, D
T5 A, C
T6 B, C
T7 A, C
T8 A, B, C, E
T9 A, B, C
Itemset Support
B 7
A 6
C 6
D 2
E 2
1.Create the root of the
tree, labeled with “null”
2.Scan the database D again.
The items in each transaction
are processed in L order
null { }
B: 4
A: 2
E: 1
D: 1
C: 1
D: 1
A: 1
C: 1

FP-Growth
TID List of Items
T1 A, B, E
T2 B, D
T3 B, C
T4 A, B, D
T5 A, C
T6 B, C
T7 A, C
T8 A, B, C, E
T9 A, B, C
Itemset Support
B 7
A 6
C 6
D 2
E 2
1.Create the root of the
tree, labeled with “null”
2.Scan the database D again.
The items in each transaction
are processed in L order
null { }
B: 5
A: 2
E: 1
D: 1
C: 2
D: 1
A: 1
C: 1

FP-Growth
TID List of Items
T1 A, B, E
T2 B, D
T3 B, C
T4 A, B, D
T5 A, C
T6 B, C
T7 A, C
T8 A, B, C, E
T9 A, B, C
Itemset Support
B 7
A 6
C 6
D 2
E 2
1.Create the root of the
tree, labeled with “null”
2.Scan the database D again.
The items in each transaction
are processed in L order
null { }
B: 5
A: 2
E: 1
D: 1
C: 2
D: 1
A: 2
C: 2

FP-Growth
TID List of Items
T1 A, B, E
T2 B, D
T3 B, C
T4 A, B, D
T5 A, C
T6 B, C
T7 A, C
T8 A, B, C, E
T9 A, B, C
Itemset Support
B 7
A 6
C 6
D 2
E 2
1.Create the root of the
tree, labeled with “null”
2.Scan the database D again.
The items in each transaction
are processed in L order
null { }
B: 6
A: 3
E: 1
D: 1
C: 2
D: 1
C: 1
E: 1
A: 2
C: 2

FP-Growth
TID List of Items
T1 A, B, E
T2 B, D
T3 B, C
T4 A, B, D
T5 A, C
T6 B, C
T7 A, C
T8 A, B, C, E
T9 A, B, C
Itemset Support
B 7
A 6
C 6
D 2
E 2
1.Create the root of the
tree, labeled with “null”
2.Scan the database D again.
The items in each transaction
are processed in L order
null { }
B: 7
A: 4
E: 1
D: 1
C: 2
D: 1
C: 2
E: 1
A:2
C: 2

FP-Growth
TID List of Items
T1 A, B, E
T2 B, D
T3 B, C
T4 A, B, D
T5 A, C
T6 B, C
T7 A, C
T8 A, B, C, E
T9 A, B, C
Itemset Support
B 7
A 6
C 6
D 2
E 2
1.Create the root of the
tree, labeled with “null”
2.Scan the database D again.
The items in each transaction
are processed in L order
null { }
B: 7
A: 4
E: 1
D: 1
C: 2
D: 1
C: 2
E: 1
A:2
C: 2

FP-Growth
Itemset Support Node Link
B 7
A 6
C 6
D 2
E 2
To facilitate tree traversal, an item header table is built so that each item points to occurrences in the tree via a
chain of node-links.
null { }
B: 7
A: 4
E: 1
D: 1
C: 2
D: 1
C: 2
E: 1
A:2
C: 2
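The construction walked through above can be condensed into a short sketch. The `Node` class and function names here are assumptions for illustration, not part of any standard library:

```python
from collections import Counter

class Node:
    """FP-tree node: item label, count, parent link, children by item."""
    def __init__(self, item, parent):
        self.item, self.count, self.parent = item, 0, parent
        self.children = {}

def build_fp_tree(transactions, min_sup):
    # First scan: support counts, then the L order (descending support)
    counts = Counter(i for t in transactions for i in t)
    rank = {i: r for r, i in enumerate(sorted(
        (i for i in counts if counts[i] >= min_sup), key=lambda i: -counts[i]))}
    root = Node(None, None)              # the "null" root
    header = {}                          # item -> node-link list
    # Second scan: insert each transaction, items processed in L order
    for t in transactions:
        node = root
        for i in sorted((i for i in t if i in rank), key=rank.get):
            if i not in node.children:
                node.children[i] = Node(i, node)
                header.setdefault(i, []).append(node.children[i])
            node = node.children[i]
            node.count += 1
    return root, header

# The slide's database, min_sup = 2
transactions = [
    ["A", "B", "E"], ["B", "D"], ["B", "C"], ["A", "B", "D"],
    ["A", "C"], ["B", "C"], ["A", "C"], ["A", "B", "C", "E"],
    ["A", "B", "C"],
]
root, header = build_fp_tree(transactions, 2)
print(root.children["B"].count)   # 7, matching the B:7 node on the slide
```

The header table on the slide corresponds to the `header` dictionary: following an item's node-link list visits every occurrence of that item in the tree.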

Home Work
•Draw the FP-tree for the given database
TID List of Items
T1 {A,B}
T2 {B,C}
T3 {B,C,D}
T4 {A,C,D,E}
T5 {A,D,E}
T6 {A,B,C}
T7 {A,B,C,D}
T8 {A,C}
T9 {A,B,C}
T10 {A,D,E}
T11 {A,E}

Home Work

FP-Growth
•The FP-tree is mined as follows:
•Start from each frequent length-1 pattern (as an initial suffix pattern) and construct its
conditional pattern base (a "sub-database," which consists of the set of prefix paths in the FP-
tree co-occurring with the suffix pattern),
•Then construct its (conditional) FP-tree,
•Perform mining recursively on such a tree.
•The pattern growth is achieved by the concatenation of the suffix pattern with
the frequent patterns generated from a conditional FP-tree.

FP-Growth
Item Conditional Pattern Base Conditional FP-Tree Frequent Patterns Generated
E {B, A: 1}, {B, A, C: 1} ⟨B: 2, A: 2⟩ {B, E: 2}, {A, E: 2}, {B, A, E: 2}
D {B: 1}, {B, A: 1} ⟨B: 2⟩ {B, D: 2}
C {B, A: 2}, {B: 2}, {A: 2} ⟨B: 4, A: 2⟩, ⟨A: 2⟩ {B, C: 4}, {A, C: 4}, {B, A, C: 2}
A {B: 4} ⟨B: 4⟩ {B, A: 4}
B - - -
1.Start with the item having the least support count
2.Generate the conditional pattern base by
identifying the paths to the item
3.Form the conditional FP-tree
4.Generate the frequent patterns
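The conditional-pattern-base step can be sketched as follows, assuming the FP-tree is stored with parent pointers and a header table of node-links (all names and the compact list-based node layout are illustrative):

```python
from collections import Counter

def build_tree(transactions, min_sup):
    # Minimal FP-tree; a node is [item, count, parent, children]
    counts = Counter(i for t in transactions for i in t)
    rank = {i: r for r, i in enumerate(sorted(
        (i for i in counts if counts[i] >= min_sup), key=lambda i: -counts[i]))}
    root, header = [None, 0, None, {}], {}
    for t in transactions:
        node = root
        for i in sorted((i for i in t if i in rank), key=rank.get):
            if i not in node[3]:
                node[3][i] = [i, 0, node, {}]
                header.setdefault(i, []).append(node[3][i])
            node = node[3][i]
            node[1] += 1
    return header

def conditional_pattern_base(item, header):
    """Prefix paths (in L order) leading to each occurrence of `item`."""
    base = []
    for node in header[item]:             # follow the item's node-links
        path, p = [], node[2]
        while p[0] is not None:           # climb up to (not including) the root
            path.append(p[0])
            p = p[2]
        if path:
            base.append((path[::-1], node[1]))
    return base

transactions = [
    ["A", "B", "E"], ["B", "D"], ["B", "C"], ["A", "B", "D"],
    ["A", "C"], ["B", "C"], ["A", "C"], ["A", "B", "C", "E"],
    ["A", "B", "C"],
]
header = build_tree(transactions, 2)
print(conditional_pattern_base("E", header))   # [(['B', 'A'], 1), (['B', 'A', 'C'], 1)]
```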

Mining Frequent Itemsets Using Vertical Data Format
TIDList of Items
T1 A, B, E
T2 B, D
T3 B, C
T4 A, B, D
T5 A, C
T6 B, C
T7 A, C
T8 A, B, C, E
T9 A, B, C
Item TID_set
A {T1, T4, T5, T7, T8, T9}
B {T1, T2, T3, T4, T6, T8, T9}
C {T3, T5, T6, T7, T8, T9}
D {T2, T4}
E {T1, T8}
Horizontal Data Format
{TID: itemset}
Vertical Data Format
{item: TID_set}
Mining can be performed on this data set
by intersecting the TID sets of every pair of
frequent single items.
Item TID_set
A∩B {T1, T4, T8, T9}
A∩C {T5, T7, T8, T9}
A∩D {T4}
A∩E {T1, T8}
B∩C {T3, T6, T8, T9}
B∩D {T2, T4}
B∩E {T1, T8}
C∩D { }
C∩E {T8}
D∩E {}

Mining Frequent Itemsets Using Vertical Data Format
Item TID_set
{A, B} {T1, T4, T8, T9}
{A, C} {T5, T7, T8, T9}
{A, D} {T4}
{A, E} {T1, T8}
{B, C} {T3, T6, T8, T9}
{B, D} {T2, T4}
{B, E} {T1, T8}
{C, E} {T8}
2-Itemset in vertical format
Item TID_set
{A, B, C, E} {T8}
3-Itemset in vertical format
4-Itemset in vertical format
There are only two frequent 3-itemsets:
{A, B, C}: 2 and {A, B, E}: 2
Item TID_set
{A, B} {T1, T4, T8, T9}
{A, C} {T5, T7, T8, T9}
{A, D} {T4}
{A, E} {T1, T8}
{B, C} {T3, T6, T8, T9}
{B, D} {T2, T4}
{B, E} {T1, T8}
{C, E} {T8}
Item TID_set
{A, B, C} {T8, T9}
{A, B, E} {T1, T8}
Item TID_set
{A, B, C, E} {T8}
The support count of an itemset is simply the length of
the TID_set of the itemset.
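The pairwise intersection shown in the tables above can be reproduced directly with set operations (an Eclat-style sketch; TIDs are written as integers for brevity):

```python
from itertools import combinations

# Vertical data format from the slide: item -> set of transaction IDs
vertical = {
    "A": {1, 4, 5, 7, 8, 9},
    "B": {1, 2, 3, 4, 6, 8, 9},
    "C": {3, 5, 6, 7, 8, 9},
    "D": {2, 4},
    "E": {1, 8},
}
min_sup = 2

# 2-itemsets: intersect the TID sets of every pair of frequent 1-itemsets.
# The support count is simply the length of the intersection.
pairs = {}
for x, y in combinations(sorted(vertical), 2):
    tids = vertical[x] & vertical[y]
    if len(tids) >= min_sup:
        pairs[(x, y)] = tids

print(sorted(pairs[("A", "B")]))   # [1, 4, 8, 9]
```

Longer itemsets are found the same way, by intersecting the TID sets of frequent (k-1)-itemsets that share a prefix.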

TID Itemset
1 D, B
2 C, A, B
3 D, A, B, C
4 A, C
5 D, C
6 C, A, E
7 D, C, A
8 D
9 A, B, D
10 B, C, E
11 B, A
Find the frequent itemsets using the FP-Growth algorithm with minimum support = 50%

Mining Multilevel Association Rules
•Strong associations discovered at high levels of abstraction may represent
commonsense knowledge.
•So, data mining systems should provide capabilities for mining association rules at
multiple levels of abstraction, with sufficient flexibility for easy traversal among
different abstraction spaces.

Mining Multilevel Association Rules

Mining Multilevel Association Rules
Concept Hierarchy for Computer Items at Shop
(figure: hierarchy with levels 0 through 4)
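A concept hierarchy like the one in the figure can be represented as a simple child-to-parent map; the item names below are assumptions for illustration, not read from the slide:

```python
# Illustrative concept hierarchy for a computer shop (assumed item names)
hierarchy = {
    "IBM Laptop computer": "Laptop computer",
    "Dell Laptop computer": "Laptop computer",
    "Laptop computer": "Computer",
    "Desktop computer": "Computer",
    "Computer": "All",
}

def ancestors(item):
    """Climb the hierarchy from an item up to the root (level 0)."""
    out = []
    while item in hierarchy:
        item = hierarchy[item]
        out.append(item)
    return out

print(ancestors("IBM Laptop computer"))   # ['Laptop computer', 'Computer', 'All']
```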

Mining Multilevel Association Rules
•It is difficult to find interesting purchase patterns in such raw or primitive-level
data.
•It is easier to find strong associations between generalized abstractions of these
primitive-level items.
•Association rules generated from mining data at multiple levels of abstraction are
called multiple-level or multilevel association rules.
•Multilevel association rules can be mined efficiently using concept hierarchies
under a support-confidence framework.
•A top-down strategy is employed.

Mining Multilevel Association Rules
•Using uniform minimum support for all levels:
•The same minimum support threshold is used when mining at each level of abstraction.
•The search procedure is simplified.
•If min_sup is set too high, it could miss some meaningful associations at low abstraction levels
•If min_sup is set too low, it may generate many uninteresting associations at high abstraction
levels

Mining Multilevel Association Rules
•Using reduced minimum support at lower levels:
•Each level of abstraction has its own minimum support threshold.
•The deeper the level of abstraction, the smaller the corresponding threshold is
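A minimal sketch of the reduced-support scheme, with assumed per-level thresholds and made-up support counts (none of these numbers appear on the slide):

```python
# level -> minimum support count; the deeper the level, the smaller the threshold
min_sup_at = {1: 5, 2: 3, 3: 1}

# (itemset, level, support count) -- illustrative figures
candidates = [
    (("Computer",), 1, 6),
    (("Laptop computer",), 2, 4),
    (("IBM Laptop computer",), 3, 2),
    (("Desktop computer",), 2, 2),
]

# Each candidate is judged against the threshold of its own level
frequent = [c for c in candidates if c[2] >= min_sup_at[c[1]]]
print([c[0] for c in frequent])
```

Note that under a uniform threshold of 5, only ("Computer",) would survive; the reduced thresholds let the deeper, rarer items through.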

Mining Multilevel Association Rules
•Using item or group-based minimum support:
•It is sometimes more desirable to set up user-specific, item-based, or group-based minimum
support thresholds when mining multilevel rules
•e.g., a user could set up the minimum support thresholds based on product price, or on items of
interest, such as by setting particularly low support thresholds for laptop computers and flash
drives in order to pay particular attention to the association patterns containing items in these
categories.
•A serious side effect of mining multilevel association rules is its generation of many
redundant rules across multiple levels of abstraction due to the "ancestor"
relationships among items.

Mining Multilevel Association Rules
•A serious side effect of mining multilevel association rules is its generation of many
redundant rules across multiple levels of abstraction due to the "ancestor"
relationships among items.
buys(X, "Laptop computer") ⇒ buys(X, "HP Printer")
[support = 8%, confidence = 70%]
buys(X, "IBM Laptop computer") ⇒ buys(X, "HP Printer")
[support = 2%, confidence = 72%]
•Does the latter rule really provide any novel information?
•A rule R1 is an ancestor of a rule R2 if R1 can be obtained by replacing the items in
R2 by their ancestors in a concept hierarchy.
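The redundancy question has a standard quantitative check (from the textbook treatment rather than this slide): a descendant rule is uninteresting if its support is roughly what the ancestor rule predicts, given the descendant item's share of the ancestor item's sales. The 25% share below is an assumption chosen for illustration:

```python
# Slide's illustrative figures
ancestor_support = 0.08     # buys(X, "Laptop computer") => buys(X, "HP Printer")
descendant_support = 0.02   # buys(X, "IBM Laptop computer") => buys(X, "HP Printer")
ibm_share = 0.25            # assumed: IBM laptops are 25% of laptop sales

# Expected support of the descendant rule if it adds no new information
expected = ancestor_support * ibm_share
redundant = abs(descendant_support - expected) < 0.005
print(redundant)   # True: the IBM-specific rule is explained by its ancestor
```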

Mining Multidimensional Association Rules
•Single-Dimensional Rule
buys(X, "Milk") ⇒ buys(X, "Butter")
•Instead of considering transaction data only, sales and related information are also
linked with relational data in the data warehouse.
•Such data stores are multidimensional in nature.
•Additional information about the customers who purchased the items may also be stored.
•We can mine association rules containing multiple predicates/dimensions:
age(X, "20-29") ∧ occupation(X, "Housewife") ⇒ buys(X, "Milk")
•Association rules that involve two or more dimensions or predicates are referred
to as multidimensional association rules.

Mining Multidimensional Association Rules
age(X, "20-29") ∧ occupation(X, "Housewife") ⇒ buys(X, "Milk")
•No repeated predicates.
•Association rules that involve no repeated predicates are referred to as
interdimensional association rules.
•Association rules with repeated predicates are called hybrid-dimensional
association rules:
age(X, "20-29") ∧ buys(X, "Milk") ⇒ buys(X, "Bread")
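The inter- vs. hybrid-dimensional distinction reduces to checking whether any predicate name repeats across the rule. A sketch, where the (predicate, value) encoding of a rule is an assumption for illustration:

```python
# A rule is (antecedent, consequent); each side is a list of (predicate, value)
rule_inter = (                   # interdimensional: no repeated predicate
    [("age", "20-29"), ("occupation", "Housewife")],
    [("buys", "Milk")],
)
rule_hybrid = (                  # hybrid-dimensional: "buys" repeats
    [("age", "20-29"), ("buys", "Milk")],
    [("buys", "Bread")],
)

def is_interdimensional(rule):
    """True when no predicate name appears more than once in the rule."""
    preds = [p for side in rule for p, _ in side]
    return len(preds) == len(set(preds))

print(is_interdimensional(rule_inter))    # True
print(is_interdimensional(rule_hybrid))   # False
```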

Mining Multidimensional Association Rules
•Data attributes can be nominal or quantitative.
•Mining multidimensional association rules (with quantitative attributes) can be
categorized into two approaches:
1.Mining multidimensional association rules using static discretization of
quantitative attributes
•Quantitative attributes are discretized using predefined concept hierarchies
•Discretization is done before mining.
2.Mining multidimensional association rules using dynamic discretization of
quantitative attributes
•Quantitative attributes are discretized or clustered into bins based on the data distribution
•Bins may be combined during the mining process, which is why it is a dynamic process.
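Static discretization can be as simple as mapping a quantitative attribute into predefined bins before mining begins; the age bin edges below are illustrative assumptions:

```python
import bisect

# Predefined bins for the quantitative attribute "age" (assumed edges)
edges = [20, 30, 40, 50]
labels = ["<20", "20-29", "30-39", "40-49", "50+"]

def discretize(age):
    """Map a raw age to its predefined interval label."""
    return labels[bisect.bisect_right(edges, age)]

print(discretize(25))   # '20-29'
```

After this step, each discretized value behaves like an ordinary nominal item (e.g. age(X, "20-29")) during rule mining; dynamic discretization would instead choose the bin boundaries from the data distribution during mining.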