MiningAssociationbestRulespresentation.ppt

18 slides · May 19, 2024


Slide Content

1
Mining Association Rules
Mohamed G. Elfeky

2
Introduction
Data mining is the discovery of knowledge and useful information from the large amounts of data stored in databases.
Association Rules: describing association relationships among the attributes in the set of relevant data.

3
Rules
Body ==> Consequent [Support, Confidence]
Body: represents the examined data.
Consequent: represents a discovered property for the examined data.
Support: the percentage of records satisfying both the body and the consequent.
Confidence: the percentage of records satisfying both the body and the consequent, among those satisfying the body.
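The two measures can be sketched directly from these definitions. A minimal Python illustration (the transaction data and helper names here are hypothetical, chosen to echo the basket example on the next slide):

```python
# Hypothetical mini-database: each transaction is a set of items.
transactions = [
    {"Tea", "Milk", "Sugar"},
    {"Tea", "Milk"},
    {"Tea", "Milk", "Sugar"},
    {"Coffee"},
]

def support(itemset, transactions):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(body, consequent, transactions):
    """Fraction of the transactions satisfying the body that also
    satisfy the consequent: support(body ∪ consequent) / support(body)."""
    return support(body | consequent, transactions) / support(body, transactions)

print(support({"Tea", "Milk", "Sugar"}, transactions))          # 0.5
print(confidence({"Tea", "Milk"}, {"Sugar"}, transactions))     # 2/3
```

So `Tea ^ Milk ==> Sugar` would read [0.5, 0.67] on this toy database.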

4
Association Rules Examples
Basket Data
Tea ^ Milk ==> Sugar [0.3 , 0.9]
Relational Data
x.diagnosis = Heart ^ x.sex = Male ==> x.age > 50 [0.4 , 0.7]
Object-Oriented Data
s.hobbies = { sport , art } ==> s.age() = Young [0.5 , 0.8]

5
Topics of Discussion
Formal Statement of the Problem
Different Algorithms
AIS
SETM
Apriori
AprioriTid
AprioriHybrid
Performance Analysis

6
Formal Statement of the Problem
I = {i1, i2, …, im} is a set of items.
D is a set of transactions T.
Each transaction T is a set of items (a subset of I).
TID is a unique identifier associated with each transaction.
The problem is to generate all association rules that have support and confidence greater than the user-specified minimum support and minimum confidence.

7
Problem Decomposition
The problem can be decomposed into two subproblems:
1. Find all sets of items (itemsets) whose support (number of transactions) is greater than the minimum support (large itemsets).
2. Use the large itemsets to generate the desired rules.
For each large itemset l, find all non-empty subsets, and for each subset a generate a rule a ==> (l - a) if its confidence is greater than the minimum confidence.
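Subproblem 2 can be sketched in a few lines of Python (function and variable names are illustrative; the support counts below are taken from the worked example later in the deck):

```python
from itertools import combinations

def rules_from_itemset(l, support_of, min_conf):
    """For a large itemset l, emit a ==> (l - a) for every non-empty
    proper subset a whose confidence support(l)/support(a) clears min_conf."""
    l = frozenset(l)
    rules = []
    for r in range(1, len(l)):
        for a in map(frozenset, combinations(l, r)):
            conf = support_of[l] / support_of[a]
            if conf >= min_conf:
                rules.append((set(a), set(l - a), conf))
    return rules

# Support counts from the example database (TIDs 100-400).
support_of = {frozenset(s): n for s, n in [
    ({2}, 3), ({3}, 3), ({5}, 3),
    ({2, 3}, 2), ({2, 5}, 3), ({3, 5}, 2),
    ({2, 3, 5}, 2),
]}
for a, c, conf in rules_from_itemset({2, 3, 5}, support_of, min_conf=0.7):
    print(a, "==>", c, round(conf, 2))
```

With min_conf = 0.7 only {2 3} ==> {5} and {3 5} ==> {2} survive (confidence 1.0 each); the other subsets give 2/3.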

8
General Algorithm
1. In the first pass, the support of each individual item is counted, and the large ones are determined.
2. In each subsequent pass, the large itemsets determined in the previous pass are used to generate new itemsets called candidate itemsets.
3. The support of each candidate itemset is counted, and the large ones are determined.
4. This process continues until no new large itemsets are found.
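The level-wise loop above can be sketched as follows. This is an illustrative skeleton, not any of the named algorithms: the candidate-generation line is deliberately naive (unions of large itemsets of the previous pass), which is exactly the step AIS, SETM, and Apriori refine in different ways.

```python
def large_itemsets(transactions, min_support):
    """Level-wise search: pass k counts candidate k-itemsets, keeps the
    large ones, and stops when a pass yields no new large itemsets."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})
    candidates = [frozenset([i]) for i in items]   # pass 1: single items
    large, k = {}, 1
    while candidates:
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        large.update(level)
        # Naive candidate generation for the next pass: unions of large
        # k-itemsets that form a (k+1)-itemset.
        k += 1
        candidates = {a | b for a in level for b in level if len(a | b) == k}
    return large

db = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
for s, n in sorted(large_itemsets(db, 2).items(),
                   key=lambda x: (len(x[0]), sorted(x[0]))):
    print(sorted(s), n)
```

On the deck's four-transaction example with minimum support 2, this finds nine large itemsets, ending with {2 3 5} at support 2.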

9
AIS Algorithm
Candidate itemsets are generated and counted on-the-fly as the database is scanned.
1. For each transaction, it is determined which of the large itemsets of the previous pass are contained in this transaction.
2. New candidate itemsets are generated by extending these large itemsets with other items in this transaction.
The disadvantage is that this unnecessarily generates and counts too many candidate itemsets that turn out to be small.
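One AIS pass can be sketched like this (a minimal illustration with hypothetical names; the `item > max(l)` test stands in for extending an itemset only with items that come later in the item ordering, so each candidate is generated once per transaction):

```python
def ais_pass(transactions, large_prev):
    """One AIS pass: scan the database and, for each transaction, extend
    the large itemsets it contains with later items of the same
    transaction, counting the resulting candidates on the fly."""
    counts = {}
    for t in transactions:
        t = frozenset(t)
        for l in large_prev:
            if l <= t:
                for item in t - l:
                    if item > max(l):          # extend with later items only
                        c = l | {item}
                        counts[c] = counts.get(c, 0) + 1
    return counts

db = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
L1 = [frozenset({i}) for i in (1, 2, 3, 5)]    # large 1-itemsets
print(ais_pass(db, L1))
```

This reproduces the C2 table of the example that follows: eight candidates, including small ones such as {1 4} and {3 4} that Apriori would never generate.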

10
Example
Database
TID   Items
100   1 3 4
200   2 3 5
300   1 2 3 5
400   2 5

L1
Itemset   Support
{1}       2
{2}       3
{3}       3
{5}       3

C2
Itemset   Support
{1 3}*    2
{1 4}     1
{3 4}     1
{2 3}*    2
{2 5}*    3
{3 5}*    2
{1 2}     1
{1 5}     1

C3
Itemset   Support
{1 3 4}   1
{2 3 5}*  2
{1 3 5}   1

(* marks large itemsets)

11
SETM Algorithm
Candidate itemsets are generated on-the-fly as the database is scanned, but counted at the end of the pass.
1. New candidate itemsets are generated the same way as in the AIS algorithm, but the TID of the generating transaction is saved with the candidate itemset in a sequential structure.
2. At the end of the pass, the support count of candidate itemsets is determined by aggregating this sequential structure.
It has the same disadvantage as the AIS algorithm. Another disadvantage is that for each candidate itemset, there are as many entries as its support value.
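A minimal sketch of one SETM pass (hypothetical names; candidate generation reuses the AIS-style extension, and `item > max(l)` again stands in for the item-ordering rule):

```python
def setm_pass(transactions, large_prev, min_support):
    """One SETM pass: generate (candidate itemset, TID) pairs while
    scanning, but defer counting to an end-of-pass aggregation."""
    pairs = []  # sequential structure of (itemset, TID) entries
    for tid, t in transactions:
        t = frozenset(t)
        for l in large_prev:
            if l <= t:
                for item in t - l:
                    if item > max(l):
                        pairs.append((l | {item}, tid))
    # End of pass: aggregate the sequential structure to get supports.
    counts = {}
    for c, _tid in pairs:
        counts[c] = counts.get(c, 0) + 1
    large = {c for c, n in counts.items() if n >= min_support}
    return pairs, large

db = [(100, {1, 3, 4}), (200, {2, 3, 5}), (300, {1, 2, 3, 5}), (400, {2, 5})]
L1 = [frozenset({i}) for i in (1, 2, 3, 5)]
pairs, large = setm_pass(db, L1, 2)
print(len(pairs), large)
```

On the deck's database this yields the 13 (itemset, TID) entries shown in the next slide's C2 table; note {2 5} alone accounts for three of them, which is exactly the per-support-value blow-up the slide warns about.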

12
Example
Database
TID   Items
100   1 3 4
200   2 3 5
300   1 2 3 5
400   2 5

L1
Itemset   Support
{1}       2
{2}       3
{3}       3
{5}       3

C2
Itemset   TID
{1 3}     100
{1 4}     100
{3 4}     100
{2 3}     200
{2 5}     200
{3 5}     200
{1 2}     300
{1 3}     300
{1 5}     300
{2 3}     300
{2 5}     300
{3 5}     300
{2 5}     400

C3
Itemset   TID
{1 3 4}   100
{2 3 5}   200
{1 3 5}   300
{2 3 5}   300

13
Apriori Algorithm
Candidate itemsets are generated using only the large itemsets of the previous pass, without considering the transactions in the database.
1. The large itemsets of the previous pass are joined with themselves to generate all itemsets whose size is larger by one.
2. Each generated itemset that has a subset which is not large is deleted. The remaining itemsets are the candidate ones.
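The join and prune steps can be sketched as follows (an illustrative implementation; the join matches itemsets that agree on their first k-1 items, assuming items are kept sorted):

```python
def apriori_gen(large_k):
    """Apriori candidate generation: join L_k with itself on the first
    k-1 items, then prune any candidate with a k-subset that is not large."""
    large_k = sorted(tuple(sorted(s)) for s in large_k)
    large_set = {frozenset(s) for s in large_k}
    candidates = []
    for i, p in enumerate(large_k):
        for q in large_k[i + 1:]:
            if p[:-1] == q[:-1]:                      # join step
                c = p + (q[-1],)
                # prune step: every k-subset of c must be large
                if all(frozenset(c[:j] + c[j + 1:]) in large_set
                       for j in range(len(c))):
                    candidates.append(frozenset(c))
    return candidates

# L2 from the example that follows: {1 3}, {2 3}, {2 5}, {3 5}
L2 = [{1, 3}, {2, 3}, {2, 5}, {3, 5}]
print(apriori_gen(L2))   # [frozenset({2, 3, 5})]
```

Only {2 3} and {2 5} share a first item, so the join produces just {2 3 5}, whose 2-subsets are all large; C3 has a single candidate, versus AIS's three.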

14
Example
Database
TID   Items
100   1 3 4
200   2 3 5
300   1 2 3 5
400   2 5

L1
Itemset   Support
{1}       2
{2}       3
{3}       3
{5}       3

C2
Itemset   Support
{1 2}     1
{1 3}*    2
{1 5}     1
{2 3}*    2
{2 5}*    3
{3 5}*    2

After the join step: {1 2 3}, {1 3 5}, {2 3 5}
After the prune step:
C3
Itemset   Support
{2 3 5}*  2

(* marks large itemsets)

15
AprioriTid Algorithm
The database is not used at all for counting the support of candidate itemsets after the first pass.
1. The candidate itemsets are generated the same way as in the Apriori algorithm.
2. Another set C' is generated, each member of which holds the TID of a transaction and the candidate itemsets present in that transaction. This set is used to count the support of each candidate itemset.
The advantage is that the number of entries in C' may be smaller than the number of transactions in the database, especially in the later passes.
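One AprioriTid pass over C' can be sketched like this (hypothetical names; the data below is the C'2 table from the example that follows). A (k+1)-candidate is present in a transaction exactly when all of its k-subsets appear in that transaction's C'_k entry:

```python
def next_cbar(cbar_k, candidates_k1):
    """One AprioriTid pass: from C'_k (TID -> k-itemsets present in that
    transaction) count support of the (k+1)-candidates and build C'_{k+1},
    without rereading the database."""
    counts = {c: 0 for c in candidates_k1}
    cbar_k1 = {}
    for tid, present in cbar_k.items():
        # c is in the transaction iff every k-subset of c is in C'_k
        hits = [c for c in candidates_k1
                if all(c - {x} in present for x in c)]
        if hits:
            cbar_k1[tid] = set(hits)
        for c in hits:
            counts[c] += 1
    return counts, cbar_k1

fs = frozenset
cbar2 = {
    100: {fs({1, 3})},
    200: {fs({2, 3}), fs({2, 5}), fs({3, 5})},
    300: {fs({1, 2}), fs({1, 3}), fs({1, 5}),
          fs({2, 3}), fs({2, 5}), fs({3, 5})},
    400: {fs({2, 5})},
}
counts, cbar3 = next_cbar(cbar2, [fs({2, 3, 5})])
print(counts, cbar3)
```

This gives {2 3 5} support 2, and C'3 keeps entries only for TIDs 200 and 300: two entries instead of four transactions, illustrating the shrinking-C' advantage.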

16
Example
Database
TID   Items
100   1 3 4
200   2 3 5
300   1 2 3 5
400   2 5

L1
Itemset   Support
{1}       2
{2}       3
{3}       3
{5}       3

C2
Itemset   Support
{1 2}     1
{1 3}*    2
{1 5}     1
{2 3}*    2
{2 5}*    3
{3 5}*    2

C3
Itemset   Support
{2 3 5}*  2

C'2
TID   Itemsets
100   {1 3}
200   {2 3}, {2 5}, {3 5}
300   {1 2}, {1 3}, {1 5}, {2 3}, {2 5}, {3 5}
400   {2 5}

C'3
TID   Itemsets
200   {2 3 5}
300   {2 3 5}

(* marks large itemsets)

17
Performance Analysis

18
AprioriHybrid Algorithm
Performance analysis shows that:
1. Apriori does better than AprioriTid in the earlier passes.
2. AprioriTid does better than Apriori in the later passes.
Hence, a hybrid algorithm can be designed that uses Apriori in the initial passes and switches to AprioriTid when it expects that the set C' will fit in memory.