Mining Frequent Patterns And Association Rules

rashmi_mestri 1,595 views 84 slides Jul 31, 2022

About This Presentation

Association Mining
Frequent Pattern Mining
Example of Apriori Algorithm


Ms. Rashmi Bhat
Mining Frequent Patterns and Association Rules

Does This Look Familiar???

What is a Frequent Pattern?
•A frequent pattern is a pattern that appears frequently in a data set.
•What are these frequent patterns?
•Frequent itemsets
•Frequent sequential patterns
•Frequent structured patterns, etc.

What is Frequent Pattern Mining?
•Searching for recurring relationships in a given data set.
•Discovering interesting associations and correlations between itemsets in transactional databases.

Market Basket Analysis
•Analyzes customer buying habits by finding associations between the different
items that customers place in their “shopping baskets”.
•How does this help retailers?
•Helps to develop marketing strategies by gaining insight into which items are frequently
purchased together by customers.

Market Basket Analysis
•Buying patterns which reflect items frequently purchased or associated together can be represented in rule form, known as association rules.
•e.g.
{Milk} ⇒ {Bread} [support = 5%, confidence = 65%]
•Interestingness measures: support and confidence
•These reflect the usefulness and certainty of discovered rules.
•Association rules are considered interesting if they satisfy both a minimum support threshold and a minimum confidence threshold.
•Thresholds can be set by users or domain experts.

Association Rule
Let
I = {i1, i2, …, im} … the set of items in the shop
D is the task-relevant dataset
T ⊆ I … a transaction
A and B are itemsets (subsets of I)
An association rule is an implication of the form
A ⇒ B, where A ⊂ I, B ⊂ I and A ∩ B = ∅
An association rule A ⇒ B holds in the transaction set D with support s and confidence c.

Support s is the percentage of transactions in D that contain A ∪ B.
Confidence c is the percentage of transactions in D containing A that also contain B.
support(A ⇒ B) = P(A ∪ B)
confidence(A ⇒ B) = P(B | A)
Rules that satisfy both a minimum support threshold (min_sup) and a minimum confidence threshold (min_conf) are called strong rules.

Some Important Terminologies
•An itemset is a set of items.
•An itemset that contains k items is a k-itemset.
•The occurrence frequency of an itemset is the number of transactions that contain the itemset. This is also known as the frequency, support count, or count of the itemset.
•The occurrence frequency is also called the absolute support.
•If an itemset I satisfies a prespecified minimum support threshold, then I is a frequent itemset.
•The set of frequent k-itemsets is commonly denoted by Lk.

confidence(A ⇒ B) = support(A ∪ B) / support(A) = support_count(A ∪ B) / support_count(A)

Frequent Itemset

ID | Transactions
1 | A, B, C, D
2 | B, D, E
3 | A, D, E
4 | A, B, E
5 | C, D, E

Item | Frequency | Support
A | 3 | 3/5 → 0.6
B | 3 | 3/5 → 0.6
C | 2 | 2/5 → 0.4
D | 4 | 4/5 → 0.8
E | 4 | 4/5 → 0.8

Association Rule Mining
•Association rule mining is a two-step process:
1. Find all frequent itemsets.
2. Generate strong association rules from the frequent itemsets.
•The overall performance of mining association rules is determined by the first step.

Itemsets
•Closed Itemset
•An itemset X is closed in a data set D if there exists no proper super-itemset Y such that Y has the same support count as X in D.
•If X is both closed and frequent in D, then X is a closed frequent itemset in D.
•Maximal Itemset
•An itemset X is a maximal frequent itemset (or max-itemset) in D if X is frequent, and there exists no super-itemset Y such that X ⊂ Y and Y is frequent in D.

Closed and Maximal Itemsets
•A frequent itemset X ∈ F is maximal if it does not have any frequent supersets.
•That is, for all Y ⊃ X, Y ∉ F.
•A frequent itemset X ∈ F is closed if it has no immediate superset with the same frequency.
•That is, for all Y ⊃ X, support(Y, D) < support(X, D).

TID | Itemset
1 | {A, C, D}
2 | {B, C, D}
3 | {A, B, C, D}
4 | {B, D}
5 | {A, B, C, D}

min_sup = 3, i.e. min_sup = 60%

Itemset lattice (null at the top, {A,B,C,D} at the bottom) annotated with support counts:
1-itemsets: A:3, B:4, C:4, D:5 — all frequent
2-itemsets: {A,B}:2, {A,C}:3, {A,D}:3, {B,C}:3, {B,D}:4, {C,D}:4 — all frequent except {A,B}
3-itemsets: {A,B,C}:2, {A,B,D}:2, {A,C,D}:3, {B,C,D}:3 — only {A,C,D} and {B,C,D} frequent
4-itemset: {A,B,C,D}:2 — infrequent

Closed frequent itemsets: {D}, {B,D}, {C,D} (closed but not maximal), and {A,C,D}, {B,C,D}
Maximal frequent itemsets: {A,C,D} and {B,C,D}
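The closed and maximal sets listed above can be verified with a brute-force sketch over the same five transactions (enumeration is cheap with only four items):

```python
from itertools import combinations

# Closed/maximal check over the lattice example (min_sup = 3)
db = [{"A", "C", "D"}, {"B", "C", "D"}, {"A", "B", "C", "D"},
      {"B", "D"}, {"A", "B", "C", "D"}]
items = sorted(set().union(*db))
min_sup = 3

def count(s):
    return sum(1 for t in db if s <= t)

# Enumerate every itemset and keep the frequent ones with their counts
frequent = {frozenset(c): count(set(c))
            for k in range(1, len(items) + 1)
            for c in combinations(items, k)
            if count(set(c)) >= min_sup}

# Maximal: no frequent proper superset.  Closed: no proper superset with
# the same support count (such a superset would itself be frequent).
maximal = [s for s in frequent if not any(s < t for t in frequent)]
closed = [s for s in frequent
          if not any(s < t and frequent[t] == frequent[s] for t in frequent)]
print(sorted(map(sorted, maximal)))  # [['A', 'C', 'D'], ['B', 'C', 'D']]
```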

Frequent Pattern Mining
•Frequent pattern mining can be classified in various ways as follows:
•Based on the completeness of patterns to be mined
•Based on the levels of abstraction involved in the rule set
•Based on the number of data dimensions involved in the rule
•Based on the types of values handled in the rule
•Based on the kinds of rules to be mined
•Based on the kinds of patterns to be mined

Frequent Pattern Mining
•Based on the completeness of the patterns to be mined
•The complete set of frequent itemsets
•The closed frequent itemsets, and the maximal frequent itemsets
•The constrained frequent itemsets
•The approximate frequent itemsets
•The near-match frequent itemsets
•The top-k frequent itemsets

Frequent Pattern Mining
•Based on the levels of abstraction involved in the rule set
•We can find rules at differing levels of abstraction
•multilevel association rules
•Based on the number of data dimensions involved in the rule
•Single-dimensional association rule
•Multidimensional association rule
•Based on the types of values handled in the rule
•Boolean association rule
•Quantitative association rule

Frequent Pattern Mining
•Based on the kinds of rules to be mined
•Association rules
•Correlation rules
•Based on the kinds of patterns to be mined
•Frequent itemset mining
•Sequential pattern mining
•Structured pattern mining

Efficient and Scalable Frequent Itemset Mining Methods
•Methods for mining the simplest form of frequent patterns:
•Single-dimensional,
•Single-level,
•Boolean frequent itemsets
•Apriori Algorithm
•The basic algorithm for finding frequent itemsets
•How to generate strong association rules from frequent itemsets?
•Variations of the Apriori algorithm

Apriori Algorithm
•Finds frequent itemsets using candidate generation
•The algorithm uses prior knowledge of frequent itemset properties
•Employs an iterative approach known as a level-wise search
•k-itemsets are used to explore (k+1)-itemsets
•The Apriori property is used to reduce the search space.
•If a set cannot pass a test, all of its supersets will fail the same test as well.
Apriori property: All nonempty subsets of a frequent itemset must also be frequent.

Apriori Algorithm
•The Apriori algorithm follows a two-step process:
•Join Step:
•To find Lk, a set of candidate k-itemsets is generated by joining Lk−1 with itself.
•This set of candidates is denoted Ck.
•Prune Step:
•Ck is a superset of Lk; that is, its members may or may not be frequent, but all of the frequent k-itemsets are included in Ck.
•A scan of the database to determine the count of each candidate in Ck would result in the determination of Lk.
•To reduce the size of Ck, the Apriori property is used:
•Any (k−1)-itemset that is not frequent cannot be a subset of a frequent k-itemset.
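The join and prune steps can be sketched as a small candidate-generation function (an illustrative implementation, not the book's exact pseudocode). Note that with full subset pruning, joining the L2 from the worked example below yields only {Milk, Bread, Butter}; the deck's later C3 slide lists four candidates because it prunes by support counting alone.

```python
from itertools import combinations

def apriori_gen(L_prev, k):
    """Generate candidate k-itemsets C_k from frequent (k-1)-itemsets.

    Join step: merge two (k-1)-itemsets whose first k-2 items agree.
    Prune step: drop candidates having any infrequent (k-1)-subset.
    """
    L_sorted = [tuple(sorted(s)) for s in L_prev]
    L_set = set(L_sorted)
    candidates = set()
    for a in L_sorted:
        for b in L_sorted:
            # join: same (k-2)-prefix, b's last item greater than a's
            if a[:-1] == b[:-1] and a[-1] < b[-1]:
                c = a + (b[-1],)
                # prune using the Apriori property
                if all(tuple(sorted(sub)) in L_set
                       for sub in combinations(c, k - 1)):
                    candidates.add(frozenset(c))
    return candidates

L2 = [{"Bread", "Milk"}, {"Butter", "Milk"}, {"Bread", "Eggs"},
      {"Bread", "Butter"}, {"Bread", "Cheese"}]
print(apriori_gen(L2, 3))  # only {Bread, Butter, Milk} survives the prune
```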

Apriori Algorithm
Transactional data for a retail shop
•Find the frequent itemsets using the Apriori algorithm.
•There are eight transactions in this database, that is, |D| = 8.

T_id | List of Items
1 | { Milk, Eggs, Bread, Butter }
2 | { Milk, Bread }
3 | { Eggs, Bread, Butter }
4 | { Milk, Bread, Butter }
5 | { Milk, Bread, Cheese }
6 | { Eggs, Bread, Cheese }
7 | { Milk, Bread, Butter, Cheese }
8 | { Milk, Eggs }

Apriori Algorithm
Iteration 1:
•Generate candidate itemset C1 (1-itemsets).
•Suppose that the minimum support count required is 3, i.e., min_sup = 3 (relative support is 3/8 = 37.5%).
•The set of frequent 1-itemsets, L1, can then be determined.
•Prune Step: Remove all the itemsets not satisfying minimum support.
•L1 consists of the candidate 1-itemsets satisfying minimum support.

C1
Item | Support Count
{ Milk } | 6
{ Eggs } | 4
{ Bread } | 7
{ Butter } | 4
{ Cheese } | 3

Frequent 1-Itemset L1 (here every candidate satisfies min_sup)
Item | Support Count
{ Milk } | 6
{ Eggs } | 4
{ Bread } | 7
{ Butter } | 4
{ Cheese } | 3

Apriori Algorithm
Iteration 2:
•Join Step: Join L1 × L1 to generate candidate itemset C2.
•min_sup = 3
•The set of frequent 2-itemsets, L2, can then be determined.
•Prune Step: Remove all the itemsets not satisfying minimum support.
•L2 consists of the candidate 2-itemsets satisfying minimum support.

C2
Item | Support Count
{ Milk, Eggs } | 2
{ Milk, Bread } | 5
{ Milk, Butter } | 3
{ Milk, Cheese } | 2
{ Eggs, Bread } | 3
{ Eggs, Butter } | 2
{ Eggs, Cheese } | 1
{ Bread, Butter } | 4
{ Bread, Cheese } | 3
{ Butter, Cheese } | 1

Frequent 2-Itemset L2
Item | Support Count
{ Milk, Bread } | 5
{ Milk, Butter } | 3
{ Eggs, Bread } | 3
{ Bread, Butter } | 4
{ Bread, Cheese } | 3

Apriori Algorithm
Iteration 3:
•Join Step: Join L2 × L2 to generate candidate itemset C3.
•min_sup = 3
•The set of frequent 3-itemsets, L3, can then be determined.
•Prune Step: Remove all the itemsets not satisfying minimum support.
•L3 consists of the candidate 3-itemsets satisfying minimum support.

C3
Item | Support Count
{ Milk, Bread, Butter } | 3
{ Milk, Eggs, Bread } | 1
{ Milk, Bread, Cheese } | 2
{ Eggs, Bread, Butter } | 2

Frequent 3-Itemset L3
Item | Support Count
{ Milk, Bread, Butter } | 3

Algorithm: (Apriori pseudocode figure)
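Since the pseudocode figure did not survive extraction, here is a compact sketch of the whole level-wise loop (an illustrative reimplementation, not the textbook's exact pseudocode) that reproduces L1 through L3 for the retail example above:

```python
from itertools import combinations

# Apriori over the eight-transaction retail example; min_sup count = 3
db = [
    {"Milk", "Eggs", "Bread", "Butter"}, {"Milk", "Bread"},
    {"Eggs", "Bread", "Butter"}, {"Milk", "Bread", "Butter"},
    {"Milk", "Bread", "Cheese"}, {"Eggs", "Bread", "Cheese"},
    {"Milk", "Bread", "Butter", "Cheese"}, {"Milk", "Eggs"},
]
min_sup = 3

def apriori(db, min_sup):
    items = sorted(set().union(*db))
    # C1 -> L1: count single items and keep the frequent ones
    L = {frozenset([i]): c for i in items
         if (c := sum(1 for t in db if i in t)) >= min_sup}
    levels = [L]
    k = 2
    while levels[-1]:
        prev = list(levels[-1])
        # join step: unions of two (k-1)-itemsets that form a k-itemset
        cand = {a | b for a in prev for b in prev if len(a | b) == k}
        Lk = {}
        for c in cand:
            # prune step (Apriori property), then count survivors
            if all(frozenset(s) in levels[-1] for s in combinations(c, k - 1)):
                n = sum(1 for t in db if c <= t)
                if n >= min_sup:
                    Lk[c] = n
        if not Lk:
            break
        levels.append(Lk)
        k += 1
    return levels

levels = apriori(db, min_sup)
print(levels[-1])  # L3: {Milk, Bread, Butter} with count 3
```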

Generating Association Rules from Frequent Itemsets
•To generate strong association rules from frequent itemsets, calculate the confidence of a rule using
confidence(A ⇒ B) = support_count(A ∪ B) / support_count(A)
•Based on this, association rules can be generated as follows:
•For each frequent itemset l, generate all nonempty subsets of l.
•For every nonempty subset s of l, output the rule s ⇒ (l − s) if
support_count(l) / support_count(s) ≥ min_conf,
where min_conf is the minimum confidence threshold.
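The subset-enumeration procedure above can be sketched as follows; `counts` holds support counts taken from the Apriori pass over the retail example.

```python
from itertools import combinations

# Support counts from the worked retail example
counts = {
    frozenset({"Milk"}): 6, frozenset({"Bread"}): 7, frozenset({"Butter"}): 4,
    frozenset({"Milk", "Bread"}): 5, frozenset({"Milk", "Butter"}): 3,
    frozenset({"Bread", "Butter"}): 4,
    frozenset({"Milk", "Bread", "Butter"}): 3,
}

def gen_rules(l, support_count, min_conf):
    """All strong rules s => (l - s) from frequent itemset l."""
    rules = []
    for r in range(1, len(l)):
        for s in map(frozenset, combinations(l, r)):
            conf = support_count[l] / support_count[s]
            if conf >= min_conf:
                rules.append((set(s), set(l - s), conf))
    return rules

l = frozenset({"Milk", "Bread", "Butter"})
for a, b, conf in gen_rules(l, counts, 0.60):
    print(f"{sorted(a)} => {sorted(b)}  conf = {conf:.0%}")
```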

Generating Association Rules from Frequent Itemsets
•E.g. the example contains the frequent itemset l = {Milk, Bread, Butter}. What association rules can be generated from l?
•Nonempty proper subsets of l: {Milk}, {Bread}, {Butter}, {Milk, Bread}, {Milk, Butter}, {Bread, Butter}
•The resulting association rules are:
{Milk, Bread} ⇒ {Butter}    confidence = 3/5 = 60%
{Milk, Butter} ⇒ {Bread}    confidence = 3/3 = 100%
{Bread, Butter} ⇒ {Milk}    confidence = 3/4 = 75%
{Milk} ⇒ {Bread, Butter}    confidence = 3/6 = 50%
{Bread} ⇒ {Milk, Butter}    confidence = 3/7 = 42.86%
{Butter} ⇒ {Milk, Bread}    confidence = 3/4 = 75%

Generating Association Rules from Frequent Itemsets
Minimum Confidence is 60%

Sr. No. | Rule | Confidence | Is Strong Rule?
1 | {Milk, Bread} ⇒ {Butter} | 60.00 | Yes
2 | {Milk, Butter} ⇒ {Bread} | 100.00 | Yes
3 | {Bread, Butter} ⇒ {Milk} | 75.00 | Yes
4 | {Milk} ⇒ {Bread, Butter} | 50.00 | No
5 | {Bread} ⇒ {Milk, Butter} | 42.86 | No
6 | {Butter} ⇒ {Milk, Bread} | 75.00 | Yes

Exercise: repeat the above with Minimum Support = 30% and Minimum Confidence = 65%.

Methods to Improve Efficiency of Apriori
Hash Based Technique
Transaction reduction
Partitioning
Dynamic Itemset Counting
Sampling

Methods to Improve Efficiency of Apriori
•Hash-Based Technique
•Can be used to reduce the size of the candidate k-itemsets, Ck, for k > 1.
•Such a hash-based technique may substantially reduce the number of candidate k-itemsets examined (especially when k = 2).
•In the 2nd iteration, i.e. generation of 2-itemsets, map every combination of two items onto a bucket of a hash table structure and increment the bucket count.
•If the count of a bucket is less than the min_sup count, the itemsets hashing to it are removed from the candidate set.

•Hash-Based Technique

TID | List of Items
T1 | {A, B, E}
T2 | {B, D}
T3 | {B, C}
T4 | {A, B, D}
T5 | {A, C}
T6 | {B, C}
T7 | {A, C}
T8 | {A, B, C, E}
T9 | {A, B, C}

min_sup count = 3

L1
Itemset | Support Count
A | 6
B | 7
C | 6
D | 2
E | 2

Order of items: A = 1, B = 2, C = 3, D = 4 and E = 5
Hash function: h(x, y) = (order of x × 10 + order of y) mod 7

Itemset | Count | Hash Function
{A,B} | 4 | (1×10 + 2) mod 7 = 5
{A,C} | 4 | (1×10 + 3) mod 7 = 6
{A,D} | 1 | (1×10 + 4) mod 7 = 0
{A,E} | 2 | (1×10 + 5) mod 7 = 1
{B,C} | 4 | (2×10 + 3) mod 7 = 2
{B,D} | 2 | (2×10 + 4) mod 7 = 3
{B,E} | 2 | (2×10 + 5) mod 7 = 4
{C,D} | 0 | −
{C,E} | 1 | (3×10 + 5) mod 7 = 0
{D,E} | 0 | −

Hash Table Structure to Generate L2
Bucket Address | Bucket Count | Bucket Content | In L2?
0 | 2 | {A,D} {C,E} | No
1 | 2 | {A,E} {A,E} | No
2 | 4 | {B,C} {B,C} {B,C} {B,C} | Yes
3 | 2 | {B,D} {B,D} | No
4 | 2 | {B,E} {B,E} | No
5 | 4 | {A,B} {A,B} {A,B} {A,B} | Yes
6 | 4 | {A,C} {A,C} {A,C} {A,C} | Yes
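The bucket counting above can be sketched in a few lines, reproducing the slide's hash table in one pass over the transactions:

```python
from itertools import combinations

db = [{"A", "B", "E"}, {"B", "D"}, {"B", "C"}, {"A", "B", "D"}, {"A", "C"},
      {"B", "C"}, {"A", "C"}, {"A", "B", "C", "E"}, {"A", "B", "C"}]
order = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5}
NUM_BUCKETS = 7

def h(x, y):
    # h(x, y) = (order of x * 10 + order of y) mod 7, as on the slide
    return (order[x] * 10 + order[y]) % NUM_BUCKETS

# One scan: hash every 2-item combination of every transaction into a bucket
bucket_count = [0] * NUM_BUCKETS
for t in db:
    for x, y in combinations(sorted(t, key=order.get), 2):
        bucket_count[h(x, y)] += 1

print(bucket_count)  # [2, 2, 4, 2, 2, 4, 4]
# Only buckets with count >= min_sup (3) can hold frequent 2-itemsets:
# buckets 2, 5, 6 -> candidates {B,C}, {A,B}, {A,C}
```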

Methods to Improve Efficiency of Apriori
•Transaction Reduction
•A transaction that does not contain any frequent k-itemsets cannot contain any frequent (k+1)-itemsets.
•Such transactions can be marked or removed from further consideration.

min_sup count = 2

TID | List of Items
T1 | A, B, E
T2 | B, C, D
T3 | C, D
T4 | A, B, C, D

Item counts: A = 2, B = 3, C = 3, D = 3, E = 1
E is infrequent, so its column can be dropped:

TID | A | B | C | D
T1 | 1 | 1 | 0 | 0
T2 | 0 | 1 | 1 | 1
T3 | 0 | 0 | 1 | 1
T4 | 1 | 1 | 1 | 1

•Transaction Reduction (continued)
2-itemset counts per transaction (min_sup count = 2):

TID | A,B | A,C | A,D | B,C | B,D | C,D | per-transaction count
T1 | 1 | 0 | 0 | 0 | 0 | 0 | 1
T2 | 0 | 0 | 0 | 1 | 1 | 1 | 3
T3 | 0 | 0 | 0 | 0 | 0 | 1 | 1
T4 | 1 | 1 | 1 | 1 | 1 | 1 | 6
Column totals: 2 | 1 | 1 | 2 | 2 | 3

Frequent 2-itemsets: {A,B}, {B,C}, {B,D}, {C,D}
T1 and T3 contain only one frequent 2-itemset each, so they cannot contain any frequent 3-itemset and are removed. Only T2 and T4 remain for the 3-itemset scan:

TID | B,C | B,D | C,D
T2 | 1 | 1 | 1
T4 | 1 | 1 | 1

TID | B,C,D
T2 | 1
T4 | 1
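The pruning rule above can be sketched as follows (helper names are illustrative, not from the slides): a transaction with fewer than k+1 frequent k-itemsets is dropped before the (k+1)-itemset scan.

```python
from itertools import combinations

# Transaction-reduction sketch over the example above (min_sup count = 2)
db = {"T1": {"A", "B", "E"}, "T2": {"B", "C", "D"},
      "T3": {"C", "D"}, "T4": {"A", "B", "C", "D"}}
min_sup = 2

def frequent_k(transactions, k):
    """Frequent k-itemsets (as sorted tuples) with count >= min_sup."""
    counts = {}
    for t in transactions.values():
        for c in combinations(sorted(t), k):
            counts[c] = counts.get(c, 0) + 1
    return {c for c, n in counts.items() if n >= min_sup}

F2 = frequent_k(db, 2)   # {('A','B'), ('B','C'), ('B','D'), ('C','D')}
# A transaction holding fewer than 3 frequent 2-itemsets cannot hold any
# frequent 3-itemset, so it can be skipped in the 3-itemset scan.
active = {tid: t for tid, t in db.items()
          if sum(1 for c in combinations(sorted(t), 2) if c in F2) >= 3}
print(sorted(active))  # ['T2', 'T4']
```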

Methods to Improve Efficiency of Apriori
•Partitioning
•Requires just two database scans to mine the frequent itemsets.
Phase I: Divide the transactions in D into n partitions; find the frequent itemsets local to each partition (1 scan); combine all local frequent itemsets to form the candidate itemsets.
Phase II: Find the global frequent itemsets among the candidates (1 scan) over the transactions in D.

•Partitioning

TID | A | B | C | D | E
T1 | 1 | 0 | 0 | 0 | 1
T2 | 0 | 1 | 0 | 1 | 0
T3 | 0 | 0 | 0 | 1 | 1
T4 | 0 | 1 | 1 | 0 | 0
T5 | 0 | 0 | 0 | 0 | 1
T6 | 0 | 1 | 1 | 1 | 0

The database is divided into three partitions, each having two transactions, with support of 20%.

First scan (per partition, support = 20%, local min_sup count = 1):
Partition 1 (T1, T2): A=1, B=1, D=1, E=1; {A,E} = 1, {B,D} = 1
Partition 2 (T3, T4): B=1, C=1, D=1, E=1; {D,E} = 1, {B,C} = 1
Partition 3 (T5, T6): B=1, C=1, D=1, E=1; {B,C}=1, {B,D}=1, {C,D}=1; {B,C,D} = 1

Second scan (support = 20%, global min_sup count = 2):
A=1, B=3, C=2, D=3, E=3
{A,E} = 1, {B,D} = 2, {D,E} = 1, {B,C} = 2, {C,D} = 1, {B,C,D} = 1

Shortlisted frequent itemsets: B=3, C=2, D=3, E=3, {B,D} = 2, {B,C} = 2
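The two-phase scheme above can be sketched as follows; any globally frequent itemset must be locally frequent in at least one partition, so the union of local results is a safe candidate set.

```python
import math
from itertools import combinations

# Two-phase partitioning sketch over the slide's six transactions
db = [{"A", "E"}, {"B", "D"}, {"D", "E"}, {"B", "C"}, {"E"}, {"B", "C", "D"}]
min_frac = 0.2  # 20% minimum support

def local_frequent(part, min_count):
    """All itemsets reaching min_count within one partition."""
    counts = {}
    for t in part:
        for k in range(1, len(t) + 1):
            for c in combinations(sorted(t), k):
                s = frozenset(c)
                counts[s] = counts.get(s, 0) + 1
    return {s for s, n in counts.items() if n >= min_count}

# Phase I: one scan -- local frequent itemsets per partition become candidates
parts = [db[0:2], db[2:4], db[4:6]]
candidates = set()
for p in parts:
    candidates |= local_frequent(p, math.ceil(min_frac * len(p)))

# Phase II: one more full scan keeps only the globally frequent candidates
global_min = math.ceil(min_frac * len(db))
frequent = {s for s in candidates
            if sum(1 for t in db if s <= t) >= global_min}
print(sorted(len(s) for s in frequent))  # four 1-itemsets and two 2-itemsets
```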

Methods to Improve Efficiency of Apriori
•Dynamic Itemset Counting
•The database is partitioned into blocks marked by start points.
•New candidates can be added at any start point.
•This technique uses the count-so-far as a lower bound of the actual count.
•If the count-so-far passes min_sup, the itemset is added into the frequent itemset collection and can be used to generate longer candidates.
•Leads to fewer database scans.

•Dynamic Itemset Counting

TID | List of Items
T1 | A, B
T2 | B, C
T3 | A
T4 | −

Minimum Support = 25%, number of blocks (M) = 2

The itemset lattice over {A, B, C} (from { } down to {A,B,C}) is annotated with four states: confirmed frequent, suspected frequent, confirmed infrequent, and suspected infrequent.
Before scanning: all counts are zero (A = 0, B = 0, C = 0).
After scanning the 1st block (T1, T2): A = 1, B = 2, C = 1; {A,B} = 1, {B,C} = 1.
After scanning the 1st and 2nd blocks (T3, T4): A = 2, B = 2, C = 1; {A,B} = 1, {B,C} = 1.

Methods to Improve Efficiency of Apriori
•Sampling
•Pick a random sample S of the given dataset D.
•Search for frequent itemsets in S instead of D.
•We trade off some degree of accuracy against efficiency.
•We may lose a global frequent itemset, so we use a lower support threshold than the minimum support to find the frequent itemsets local to S, denoted L_S.
•The rest of the database is used to compute the actual frequencies of each itemset in L_S.
•If L_S contains all the frequent itemsets in D, then only one scan of D is required.
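The sampling idea can be sketched as follows; the data and the choice of sample are illustrative, not from the slides.

```python
from itertools import combinations

# Sampling sketch: mine a sample S with a LOWERED threshold, then verify
# the surviving candidates against the full database in one scan.
db = [{"Milk", "Bread"}, {"Milk", "Bread", "Butter"}, {"Bread", "Butter"},
      {"Milk", "Bread"}, {"Milk", "Eggs"}, {"Bread", "Butter"},
      {"Milk", "Bread", "Butter"}, {"Eggs", "Bread"}]

def mine(transactions, min_count):
    """Brute-force frequent itemsets with support count >= min_count."""
    counts = {}
    for t in transactions:
        for k in range(1, len(t) + 1):
            for c in combinations(sorted(t), k):
                counts[frozenset(c)] = counts.get(frozenset(c), 0) + 1
    return {s: n for s, n in counts.items() if n >= min_count}

sample = db[::2]                 # the sample S (every other transaction here)
local = mine(sample, 1)          # lowered threshold -> candidate set L_S
global_min = 4                   # global minimum support count
# One scan of D computes the actual frequency of every member of L_S
frequent = {s: n for s in local
            if (n := sum(1 for t in db if s <= t)) >= global_min}
print(frequent[frozenset({"Milk", "Bread"})])  # 4
```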

FP-Growth
•Although reducing the size of candidate sets may lead to good performance, Apriori can still suffer from two nontrivial costs:
•It may still need to generate a huge number of candidate sets.
•It may need to repeatedly scan the whole database and check a large set of candidates by pattern matching.
•A method is required that will mine the complete set of frequent itemsets without a costly candidate generation process.
•This method is called Frequent Pattern Growth, or FP-Growth.

FP-Growth
•Adopts a divide-and-conquer strategy:
•First, it compresses the database representing frequent items into a frequent pattern tree, or FP-tree, which retains the itemset association information.
•Then it divides the compressed database into a set of conditional databases, each associated with one frequent item or pattern fragment.
•It then mines each such database separately.

FP-Growth

TID | List of Items
T1 | A, B, E
T2 | B, D
T3 | B, C
T4 | A, B, D
T5 | A, C
T6 | B, C
T7 | A, C
T8 | A, B, C, E
T9 | A, B, C

Scan the database and derive the set of frequent 1-itemsets and their support counts (min_sup = 2):

Itemset | Support
A | 6
B | 7
C | 6
D | 2
E | 2

Sort in order of descending support count:
L = {B:7, A:6, C:6, D:2, E:2}

FP-Growth
TIDList of Items
T1 A, B, E
T2 B, D
T3 B, C
T4 A, B, D
T5 A, C
T6 B, C
T7 A, C
T8 A, B, C, E
T9 A, B, C
ItemsetSupport
B 7
A 6
C 6
D 2
E 2
1.Create the root of the
tree, labeled with “null”
2.Scan the database D again.
The itemsin each transaction
are processed in L order
null { }
&#3627408447;={&#3627408437;:7,&#3627408436;:6,&#3627408438;:6,&#3627408439;:2,{&#3627408440;:2}}

FP-Growth
TIDList of Items
T1 A, B, E
T2 B, D
T3 B, C
T4 A, B, D
T5 A, C
T6 B, C
T7 A, C
T8 A, B, C, E
T9 A, B, C
ItemsetSupport
B 7
A 6
C 6
D 2
E 2
1.Create the root of the
tree, labeled with “null”
2.Scan the database D again.
The itemsin each transaction
are processed in L order
null { }
B: 1
A: 1
E: 1
&#3627408447;={&#3627408437;:7,&#3627408436;:6,&#3627408438;:6,&#3627408439;:2,{&#3627408440;:2}}

FP-Growth
TIDList of Items
T1 A, B, E
T2 B, D
T3 B, C
T4 A, B, D
T5 A, C
T6 B, C
T7 A, C
T8 A, B, C, E
T9 A, B, C
Itemset Support
B 7
A 6
C 6
D 2
E 2
1.Create the root of the
tree, labeled with “null”
2.Scan the database D again.
The items in each transaction
are processed in L order
null { }
B: 2
A: 1
E: 1
D: 1
B: 2
D: 1
L = {B:7, A:6, C:6, D:2, E:2}

FP-Growth
TID List of Items
T1 A, B, E
T2 B, D
T3 B, C
T4 A, B, D
T5 A, C
T6 B, C
T7 A, C
T8 A, B, C, E
T9 A, B, C
Itemset Support
B 7
A 6
C 6
D 2
E 2
1.Create the root of the
tree, labeled with “null”
2.Scan the database D again.
The items in each transaction
are processed in L order
null { }
B: 3
A: 1
E: 1
D: 1
C: 1

FP-Growth
TID List of Items
T1 A, B, E
T2 B, D
T3 B, C
T4 A, B, D
T5 A, C
T6 B, C
T7 A, C
T8 A, B, C, E
T9 A, B, C
Itemset Support
B 7
A 6
C 6
D 2
E 2
1.Create the root of the
tree, labeled with “null”
2.Scan the database D again.
The items in each transaction
are processed in L order
null { }
B: 4
A: 2
E: 1
D: 1
C: 1
D: 1

FP-Growth
TID List of Items
T1 A, B, E
T2 B, D
T3 B, C
T4 A, B, D
T5 A, C
T6 B, C
T7 A, C
T8 A, B, C, E
T9 A, B, C
Itemset Support
B 7
A 6
C 6
D 2
E 2
1.Create the root of the
tree, labeled with “null”
2.Scan the database D again.
The items in each transaction
are processed in L order
null { }
B: 4
A: 2
E: 1
D: 1
C: 1
D: 1
A: 1
C: 1

FP-Growth
TID List of Items
T1 A, B, E
T2 B, D
T3 B, C
T4 A, B, D
T5 A, C
T6 B, C
T7 A, C
T8 A, B, C, E
T9 A, B, C
Itemset Support
B 7
A 6
C 6
D 2
E 2
1.Create the root of the
tree, labeled with “null”
2.Scan the database D again.
The items in each transaction
are processed in L order
null { }
B: 5
A: 2
E: 1
D: 1
C: 2
D: 1
A: 1
C: 1

FP-Growth
TID List of Items
T1 A, B, E
T2 B, D
T3 B, C
T4 A, B, D
T5 A, C
T6 B, C
T7 A, C
T8 A, B, C, E
T9 A, B, C
Itemset Support
B 7
A 6
C 6
D 2
E 2
1.Create the root of the
tree, labeled with “null”
2.Scan the database D again.
The items in each transaction
are processed in L order
null { }
B: 5
A: 2
E: 1
D: 1
C: 2
D: 1
A: 2
C: 2

FP-Growth
TID List of Items
T1 A, B, E
T2 B, D
T3 B, C
T4 A, B, D
T5 A, C
T6 B, C
T7 A, C
T8 A, B, C, E
T9 A, B, C
Itemset Support
B 7
A 6
C 6
D 2
E 2
1.Create the root of the
tree, labeled with “null”
2.Scan the database D again.
The items in each transaction
are processed in L order
null { }
B: 6
A: 3
E: 1
D: 1
C: 2
D: 1
C: 1
E: 1
A: 2
C: 2

FP-Growth
TID List of Items
T1 A, B, E
T2 B, D
T3 B, C
T4 A, B, D
T5 A, C
T6 B, C
T7 A, C
T8 A, B, C, E
T9 A, B, C
Itemset Support
B 7
A 6
C 6
D 2
E 2
1.Create the root of the
tree, labeled with “null”
2.Scan the database D again.
The items in each transaction
are processed in L order
null { }
B: 7
A: 4
E: 1
D: 1
C: 2
D: 1
C: 2
E: 1
A:2
C: 2

FP-Growth
TID List of Items
T1 A, B, E
T2 B, D
T3 B, C
T4 A, B, D
T5 A, C
T6 B, C
T7 A, C
T8 A, B, C, E
T9 A, B, C
Itemset Support
B 7
A 6
C 6
D 2
E 2
1.Create the root of the
tree, labeled with “null”
2.Scan the database D again.
The items in each transaction
are processed in L order
null { }
B: 7
A: 4
E: 1
D: 1
C: 2
D: 1
C: 2
E: 1
A:2
C: 2

FP-Growth
Itemset Support Node Link
B 7
A 6
C 6
D 2
E 2
To facilitate tree traversal, an item header table is built so that each item points to occurrences in the tree via a
chain of node-links.
null { }
B: 7
A: 4
E: 1
D: 1
C: 2
D: 1
C: 2
E: 1
A:2
C: 2
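The construction walked through above can be condensed into a short sketch. The `Node` class and function names here are assumptions for illustration, not part of any standard library:

```python
from collections import Counter

class Node:
    """FP-tree node: item label, count, parent link, children by item."""
    def __init__(self, item, parent):
        self.item, self.count, self.parent = item, 0, parent
        self.children = {}

def build_fp_tree(transactions, min_sup):
    # First scan: support counts, then the L order (descending support)
    counts = Counter(i for t in transactions for i in t)
    rank = {i: r for r, i in enumerate(sorted(
        (i for i in counts if counts[i] >= min_sup), key=lambda i: -counts[i]))}
    root = Node(None, None)              # the "null" root
    header = {}                          # item -> node-link list
    # Second scan: insert each transaction, items processed in L order
    for t in transactions:
        node = root
        for i in sorted((i for i in t if i in rank), key=rank.get):
            if i not in node.children:
                node.children[i] = Node(i, node)
                header.setdefault(i, []).append(node.children[i])
            node = node.children[i]
            node.count += 1
    return root, header

# The slide's database, min_sup = 2
transactions = [
    ["A", "B", "E"], ["B", "D"], ["B", "C"], ["A", "B", "D"],
    ["A", "C"], ["B", "C"], ["A", "C"], ["A", "B", "C", "E"],
    ["A", "B", "C"],
]
root, header = build_fp_tree(transactions, 2)
print(root.children["B"].count)   # 7, matching the B:7 node on the slide
```

The header table on the slide corresponds to the `header` dictionary: following an item's node-link list visits every occurrence of that item in the tree.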

Home Work
•Draw the FP-tree for the given database
TID List of Items
T1 {A,B}
T2 {B,C}
T3 {B,C,D}
T4 {A,C,D,E}
T5 {A,D,E}
T6 {A,B,C}
T7 {A,B,C,D}
T8 {A,C}
T9 {A,B,C}
T10 {A,D,E}
T11 {A,E}

Home Work

FP-Growth
•The FP-tree is mined as follows:
•Start from each frequent length-1 pattern (as an initial suffix pattern) and construct its
conditional pattern base (a "sub-database," which consists of the set of prefix paths in the FP-
tree co-occurring with the suffix pattern),
•Then construct its (conditional) FP-tree,
•Perform mining recursively on such a tree.
•The pattern growth is achieved by the concatenation of the suffix pattern with
the frequent patterns generated from a conditional FP-tree.

FP-Growth
Item Conditional Pattern Base Conditional FP-Tree Frequent Patterns Generated
E {B, A: 1}, {B, A, C: 1} ⟨B: 2, A: 2⟩ {B, E: 2}, {A, E: 2}, {B, A, E: 2}
D {B: 1}, {B, A: 1} ⟨B: 2⟩ {B, D: 2}
C {B, A: 2}, {B: 2}, {A: 2} ⟨B: 4, A: 2⟩, ⟨A: 2⟩ {B, C: 4}, {A, C: 4}, {B, A, C: 2}
A {B: 4} ⟨B: 4⟩ {B, A: 4}
B - - -
1.Start with the item having the least support count
2.Generate the conditional pattern base by
identifying the paths to the item
3.Form the conditional FP-tree
4.Generate the frequent patterns
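The conditional-pattern-base step can be sketched as follows, assuming the FP-tree is stored with parent pointers and a header table of node-links (all names and the compact list-based node layout are illustrative):

```python
from collections import Counter

def build_tree(transactions, min_sup):
    # Minimal FP-tree; a node is [item, count, parent, children]
    counts = Counter(i for t in transactions for i in t)
    rank = {i: r for r, i in enumerate(sorted(
        (i for i in counts if counts[i] >= min_sup), key=lambda i: -counts[i]))}
    root, header = [None, 0, None, {}], {}
    for t in transactions:
        node = root
        for i in sorted((i for i in t if i in rank), key=rank.get):
            if i not in node[3]:
                node[3][i] = [i, 0, node, {}]
                header.setdefault(i, []).append(node[3][i])
            node = node[3][i]
            node[1] += 1
    return header

def conditional_pattern_base(item, header):
    """Prefix paths (in L order) leading to each occurrence of `item`."""
    base = []
    for node in header[item]:             # follow the item's node-links
        path, p = [], node[2]
        while p[0] is not None:           # climb up to (not including) the root
            path.append(p[0])
            p = p[2]
        if path:
            base.append((path[::-1], node[1]))
    return base

transactions = [
    ["A", "B", "E"], ["B", "D"], ["B", "C"], ["A", "B", "D"],
    ["A", "C"], ["B", "C"], ["A", "C"], ["A", "B", "C", "E"],
    ["A", "B", "C"],
]
header = build_tree(transactions, 2)
print(conditional_pattern_base("E", header))   # [(['B', 'A'], 1), (['B', 'A', 'C'], 1)]
```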

Mining Frequent Itemsets Using Vertical Data Format
TIDList of Items
T1 A, B, E
T2 B, D
T3 B, C
T4 A, B, D
T5 A, C
T6 B, C
T7 A, C
T8 A, B, C, E
T9 A, B, C
Item TID_set
A {T1, T4, T5, T7, T8, T9}
B {T1, T2, T3, T4, T6, T8, T9}
C {T3, T5, T6, T7, T8, T9}
D {T2, T4}
E {T1, T8}
Horizontal Data Format
{TID: itemset}
Vertical Data Format
{item: TID_set}
Mining can be performed on this data set
by intersecting the TID sets of every pair of
frequent single items.
Item TID_set
A∩B {T1, T4, T8, T9}
A∩C {T5, T7, T8, T9}
A∩D {T4}
A∩E {T1, T8}
B∩C {T3, T6, T8, T9}
B∩D {T2, T4}
B∩E {T1, T8}
C∩D { }
C∩E {T8}
D∩E {}

Mining Frequent Itemsets Using Vertical Data Format
Item TID_set
{A, B} {T1, T4, T8, T9}
{A, C} {T5, T7, T8, T9}
{A, D} {T4}
{A, E} {T1, T8}
{B, C} {T3, T6, T8, T9}
{B, D} {T2, T4}
{B, E} {T1, T8}
{C, E} {T8}
2-Itemset in vertical format
Item TID_set
{A, B, C, E} {T8}
3-Itemset in vertical format
4-Itemset in vertical format
There are only two frequent 3-itemsets:
{A, B, C}: 2 and {A, B, E}: 2
Item TID_set
{A, B} {T1, T4, T8, T9}
{A, C} {T5, T7, T8, T9}
{A, D} {T4}
{A, E} {T1, T8}
{B, C} {T3, T6, T8, T9}
{B, D} {T2, T4}
{B, E} {T1, T8}
{C, E} {T8}
Item TID_set
{A, B, C} {T8, T9}
{A, B, E} {T1, T8}
Item TID_set
{A, B, C, E} {T8}
The support count of an itemset is simply the length of
the TID_set of the itemset.
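The pairwise intersection shown in the tables above can be reproduced directly with set operations (an Eclat-style sketch; TIDs are written as integers for brevity):

```python
from itertools import combinations

# Vertical data format from the slide: item -> set of transaction IDs
vertical = {
    "A": {1, 4, 5, 7, 8, 9},
    "B": {1, 2, 3, 4, 6, 8, 9},
    "C": {3, 5, 6, 7, 8, 9},
    "D": {2, 4},
    "E": {1, 8},
}
min_sup = 2

# 2-itemsets: intersect the TID sets of every pair of frequent 1-itemsets.
# The support count is simply the length of the intersection.
pairs = {}
for x, y in combinations(sorted(vertical), 2):
    tids = vertical[x] & vertical[y]
    if len(tids) >= min_sup:
        pairs[(x, y)] = tids

print(sorted(pairs[("A", "B")]))   # [1, 4, 8, 9]
```

Longer itemsets are found the same way, by intersecting the TID sets of frequent (k-1)-itemsets that share a prefix.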

TID Itemset
1 D, B
2 C, A, B
3 D, A, B, C
4 A, C
5 D, C
6 C, A, E
7 D, C, A
8 D
9 A, B, D
10 B, C, E
11 B, A
Find the frequent itemsets using the FP-Growth algorithm with minimum support = 50%

Mining Multilevel Association Rules
•Strong associations discovered at high levels of abstraction may represent
commonsense knowledge.
•So, data mining systems should provide capabilities for mining association rules at
multiple levels of abstraction, with sufficient flexibility for easy traversal among
different abstraction spaces.

Mining Multilevel Association Rules

Mining Multilevel Association Rules
Concept Hierarchy for Computer Items at Shop
(figure: hierarchy with levels 0 through 4)
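A concept hierarchy like the one in the figure can be represented as a simple child-to-parent map; the item names below are assumptions for illustration, not read from the slide:

```python
# Illustrative concept hierarchy for a computer shop (assumed item names)
hierarchy = {
    "IBM Laptop computer": "Laptop computer",
    "Dell Laptop computer": "Laptop computer",
    "Laptop computer": "Computer",
    "Desktop computer": "Computer",
    "Computer": "All",
}

def ancestors(item):
    """Climb the hierarchy from an item up to the root (level 0)."""
    out = []
    while item in hierarchy:
        item = hierarchy[item]
        out.append(item)
    return out

print(ancestors("IBM Laptop computer"))   # ['Laptop computer', 'Computer', 'All']
```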

Mining Multilevel Association Rules
•It is difficult to find interesting purchase patterns in such raw or primitive-level
data.
•It is easier to find strong associations between generalized abstractions of these
primitive-level items.
•Association rules generated from mining data at multiple levels of abstraction are
called multiple-level or multilevel association rules.
•Multilevel association rules can be mined efficiently using concept hierarchies
under a support-confidence framework.
•A top-down strategy is employed.

Mining Multilevel Association Rules
•Using uniform minimum support for all levels:
•The same minimum support threshold is used when mining at each level of abstraction.
•The search procedure is simplified.
•If min_sup is set too high, it could miss some meaningful associations at low abstraction levels
•If min_sup is set too low, it may generate many uninteresting associations at high abstraction
levels

Mining Multilevel Association Rules
•Using reduced minimum support at lower levels:
•Each level of abstraction has its own minimum support threshold.
•The deeper the level of abstraction, the smaller the corresponding threshold is
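A minimal sketch of the reduced-support scheme, with assumed per-level thresholds and made-up support counts (none of these numbers appear on the slide):

```python
# level -> minimum support count; the deeper the level, the smaller the threshold
min_sup_at = {1: 5, 2: 3, 3: 1}

# (itemset, level, support count) -- illustrative figures
candidates = [
    (("Computer",), 1, 6),
    (("Laptop computer",), 2, 4),
    (("IBM Laptop computer",), 3, 2),
    (("Desktop computer",), 2, 2),
]

# Each candidate is judged against the threshold of its own level
frequent = [c for c in candidates if c[2] >= min_sup_at[c[1]]]
print([c[0] for c in frequent])
```

Note that under a uniform threshold of 5, only ("Computer",) would survive; the reduced thresholds let the deeper, rarer items through.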

Mining Multilevel Association Rules
•Using item or group-based minimum support:
•It is sometimes more desirable to set up user-specific, item-based, or group-based minimum
support thresholds when mining multilevel rules
•e.g., a user could set up the minimum support thresholds based on product price, or on items of
interest, such as by setting particularly low support thresholds for laptop computers and flash
drives in order to pay particular attention to the association patterns containing items in these
categories.
•A serious side effect of mining multilevel association rules is its generation of many
redundant rules across multiple levels of abstraction due to the "ancestor"
relationships among items.

Mining Multilevel Association Rules
•A serious side effect of mining multilevel association rules is its generation of many
redundant rules across multiple levels of abstraction due to the "ancestor"
relationships among items.
buys(X, "Laptop computer") ⇒ buys(X, "HP Printer")
[support = 8%, confidence = 70%]
buys(X, "IBM Laptop computer") ⇒ buys(X, "HP Printer")
[support = 2%, confidence = 72%]
•Does the latter rule really provide any novel information?
•A rule R1 is an ancestor of a rule R2 if R1 can be obtained by replacing the items in
R2 by their ancestors in a concept hierarchy.
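The redundancy question has a standard quantitative check (from the textbook treatment rather than this slide): a descendant rule is uninteresting if its support is roughly what the ancestor rule predicts, given the descendant item's share of the ancestor item's sales. The 25% share below is an assumption chosen for illustration:

```python
# Slide's illustrative figures
ancestor_support = 0.08     # buys(X, "Laptop computer") => buys(X, "HP Printer")
descendant_support = 0.02   # buys(X, "IBM Laptop computer") => buys(X, "HP Printer")
ibm_share = 0.25            # assumed: IBM laptops are 25% of laptop sales

# Expected support of the descendant rule if it adds no new information
expected = ancestor_support * ibm_share
redundant = abs(descendant_support - expected) < 0.005
print(redundant)   # True: the IBM-specific rule is explained by its ancestor
```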

Mining Multidimensional Association Rules
•Single-Dimensional Rule
buys(X, "Milk") ⇒ buys(X, "Butter")
•Instead of considering transaction data only, sales and related information are also
linked with relational data in the data warehouse.
•Such data stores are multidimensional in nature.
•Additional information about the customers who purchased the items may also be stored.
•We can mine association rules containing multiple predicates/dimensions:
age(X, "20-29") ∧ occupation(X, "Housewife") ⇒ buys(X, "Milk")
•Association rules that involve two or more dimensions or predicates are referred
to as multidimensional association rules.

Mining Multidimensional Association Rules
age(X, "20-29") ∧ occupation(X, "Housewife") ⇒ buys(X, "Milk")
•No repeated predicates.
•Association rules that involve no repeated predicates are referred to as
interdimensional association rules.
•Association rules with repeated predicates are called hybrid-dimensional
association rules:
age(X, "20-29") ∧ buys(X, "Milk") ⇒ buys(X, "Bread")
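The inter- vs. hybrid-dimensional distinction reduces to checking whether any predicate name repeats across the rule. A sketch, where the (predicate, value) encoding of a rule is an assumption for illustration:

```python
# A rule is (antecedent, consequent); each side is a list of (predicate, value)
rule_inter = (                   # interdimensional: no repeated predicate
    [("age", "20-29"), ("occupation", "Housewife")],
    [("buys", "Milk")],
)
rule_hybrid = (                  # hybrid-dimensional: "buys" repeats
    [("age", "20-29"), ("buys", "Milk")],
    [("buys", "Bread")],
)

def is_interdimensional(rule):
    """True when no predicate name appears more than once in the rule."""
    preds = [p for side in rule for p, _ in side]
    return len(preds) == len(set(preds))

print(is_interdimensional(rule_inter))    # True
print(is_interdimensional(rule_hybrid))   # False
```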

Mining Multidimensional Association Rules
•Data attributes can be nominal or quantitative.
•Mining multidimensional association rules (with quantitative attributes) can be
categorized into two approaches:
1.Mining multidimensional association rules using static discretization of
quantitative attributes
•Quantitative attributes are discretized using predefined concept hierarchies
•Discretization is done before mining.
2.Mining multidimensional association rules using dynamic discretization of
quantitative attributes
•Quantitative attributes are discretized or clustered into bins based on the data distribution
•Bins may be combined during the mining process, which is why it is a dynamic process.
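Static discretization can be as simple as mapping a quantitative attribute into predefined bins before mining begins; the age bin edges below are illustrative assumptions:

```python
import bisect

# Predefined bins for the quantitative attribute "age" (assumed edges)
edges = [20, 30, 40, 50]
labels = ["<20", "20-29", "30-39", "40-49", "50+"]

def discretize(age):
    """Map a raw age to its predefined interval label."""
    return labels[bisect.bisect_right(edges, age)]

print(discretize(25))   # '20-29'
```

After this step, each discretized value behaves like an ordinary nominal item (e.g. age(X, "20-29")) during rule mining; dynamic discretization would instead choose the bin boundaries from the data distribution during mining.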