Improved Apriori algorithm by FP tree.pptx


About This Presentation

Improved Apriori algorithm by FP tree.


Slide Content

IMPROVISED APRIORI ALGORITHM USING FREQUENT PATTERN TREE
Supervised by Dr. Israa Hadi
Presented by Sura Khalid

Association Rule
Association rule mining: given a set of transactions, find rules that predict the occurrence of an item based on the occurrences of other items in the transaction.
Market-basket transactions
Examples of association rules: {Diaper} → {Beer}, {Milk, Bread} → {Eggs, Coke}, {Beer, Bread} → {Milk}

Key Concepts
Itemset: a collection of one or more items, e.g. {Milk, Bread, Diaper}.
k-itemset: an itemset that contains k items.
Support count (σ): frequency of occurrence of an itemset, e.g. σ({Milk, Bread, Diaper}) = 2.

Rule Evaluation Metrics
Association rule: an implication expression of the form X → Y, where X and Y are itemsets.
Support (s): the fraction of transactions that contain both X and Y.
Confidence (c): measures how often items in Y appear in transactions that contain X.

Example
Given the rule {Milk, Diaper} → {Beer}, find both the support (s) and the confidence (c).
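The market-basket table this example refers to did not survive extraction; a minimal worked sketch in Python, assuming the classic five-transaction dataset that is consistent with σ({Milk, Bread, Diaper}) = 2 from the previous slide:

```python
# Assumed market-basket transactions (the slide's table was lost in extraction);
# this is the classic five-transaction example matching sigma({Milk, Bread, Diaper}) = 2.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def sigma(itemset):
    """Support count: number of transactions containing every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)

X, Y = {"Milk", "Diaper"}, {"Beer"}
s = sigma(X | Y) / len(transactions)   # support    = 2/5 = 0.4
c = sigma(X | Y) / sigma(X)            # confidence = 2/3 ~ 0.67
print(f"support = {s}, confidence = {c:.2f}")
```

With this data, s = σ({Milk, Diaper, Beer}) / 5 = 2/5 = 0.4 and c = σ({Milk, Diaper, Beer}) / σ({Milk, Diaper}) = 2/3 ≈ 0.67.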

Frequent Itemset
A frequent itemset is an itemset whose support is greater than or equal to a minsup threshold.
Given a set of transactions T, the goal of association rule mining is to find all rules having:
- support ≥ minsup threshold
- confidence ≥ minconf threshold

Frequent Itemset Generation
There are many algorithms to generate frequent itemsets. We will study Apriori, improved by using the FP-tree algorithm.
Apriori algorithm: the Apriori principle is an effective way to eliminate some of the candidate itemsets.

Computational Complexity
Given d unique items:
- Total number of itemsets = 2^d
- Total number of possible association rules: R = 3^d − 2^(d+1) + 1
If d = 6, R = 602 rules.
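A one-line check of the reconstructed rule-count formula:

```python
# R = 3**d - 2**(d + 1) + 1: each item is in X, in Y, or in neither (3**d ways),
# minus the assignments where X or Y ends up empty.
d = 6
print(3**d - 2**(d + 1) + 1)  # 602, matching the slide
```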

Frequent Itemset Generation Strategies
Reduce the number of candidates (M):
- Complete search: M = 2^d
- Use pruning techniques to reduce M.
The Apriori principle is an effective way to eliminate some of the candidate itemsets without counting their support values: if an itemset is frequent, then all of its subsets must also be frequent, so if an itemset is infrequent, none of its supersets can be frequent.

Illustrating Apriori Principle

The Apriori algorithm for finding large itemsets efficiently in big DBs:
1: Find all large 1-itemsets
2: For (k = 2; while L_(k-1) is non-empty; k++)
3: { C_k = apriori-gen(L_(k-1))
4:   For each c in C_k, initialise c.count to zero
5:   For all records r in the DB
6:   { C_r = subset(C_k, r); for each c in C_r, c.count++ }
7:   Set L_k := all c in C_k whose count >= minsup
8: } /* end -- return all of the L_k sets */
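A minimal runnable Python sketch of this pseudocode (the function name apriori and the representation of itemsets as frozensets are illustrative choices, not from the slides):

```python
from itertools import combinations

def apriori(transactions, minsup):
    """Return all itemsets with support count >= minsup.
    `transactions` is a list of sets of items."""
    # Line 1 of the pseudocode: find all large 1-itemsets.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    L = {s for s, n in counts.items() if n >= minsup}
    frequent = set(L)
    k = 2
    while L:  # lines 2-8: generate candidates, count them, keep the frequent ones
        # apriori-gen: join frequent (k-1)-itemsets, then prune any candidate
        # that has an infrequent (k-1)-subset (the Apriori principle).
        C = {a | b for a in L for b in L if len(a | b) == k}
        C = {c for c in C
             if all(frozenset(s) in L for s in combinations(c, k - 1))}
        # One pass over the DB to count every surviving candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in C}
        L = {c for c, n in counts.items() if n >= minsup}
        frequent |= L
        k += 1
    return frequent
```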

Discussion of the Apriori Algorithm
The running time is in the worst case O(2^d).
Pruning really prunes in practice.
It makes multiple passes over the dataset: one pass for every level k. Multiple passes over the dataset are inefficient when we have thousands of candidates and millions of transactions.

FP-Tree
An FP-tree is a tree data structure that represents the database in a compact way. It is constructed by mapping each frequency-ordered transaction onto a path in the FP-tree.

Overview of FP-Growth: Ideas
Compress a large database into a compact Frequent-Pattern tree (FP-tree) structure:
- highly compacted, but complete for frequent pattern mining
- avoids costly repeated database scans
Develop an efficient, FP-tree-based frequent pattern mining method (FP-growth):
- a divide-and-conquer methodology: decompose mining tasks into smaller ones
- avoid candidate generation: sub-database test only

Construct FP-tree
Two steps:
1. Scan the transaction DB for the first time, find the frequent items (single-item patterns) and order them into a list L in frequency-descending order, e.g. L = {f:4, c:4, a:3, b:3, m:3, p:3} in the format (item-name: support).
2. Scan the DB a second time; for each transaction, order its frequent items according to the order in L and construct the FP-tree by putting each frequency-ordered transaction onto it.
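A minimal Python sketch of this two-scan construction, under the assumption that transactions are given as sets of items (the Node class, the build_fptree name, and the header-table representation are illustrative, not from the slides):

```python
from collections import defaultdict

class Node:
    def __init__(self, item, parent):
        self.item, self.parent, self.count = item, parent, 1
        self.children = {}

def build_fptree(transactions, minsup):
    """Two scans: (1) count items and build the order list L;
    (2) insert each frequency-ordered transaction as a path from the root."""
    # Scan 1: frequent single items, in support-descending order.
    freq = defaultdict(int)
    for t in transactions:
        for item in t:
            freq[item] += 1
    # Ties are broken arbitrarily here; the slides use L = [f, c, a, b, m, p].
    L = [i for i, n in sorted(freq.items(), key=lambda x: -x[1]) if n >= minsup]
    # Scan 2: insert ordered transactions; the header table keeps the
    # node-links for each item.
    root = Node(None, None)
    header = defaultdict(list)
    for t in transactions:
        node = root
        for item in [i for i in L if i in t]:
            if item in node.children:
                node.children[item].count += 1
            else:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            node = node.children[item]
    return root, header, L

# The example DB from the slides; each string spells out one transaction's items.
db = [set("facdgimp"), set("abcflmo"), set("bfhjo"), set("bcksp"), set("afcelpmn")]
root, header, L = build_fptree(db, minsup=3)
print(L)  # the six frequent items; order among equal counts may differ from the slides
```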

FP-tree Example
Step 1: scan the DB for the first time to generate L. Let min_support = 3.

TID | List of items
100 | {f, a, c, d, g, i, m, p}
200 | {a, b, c, f, l, m, o}
300 | {b, f, h, j, o}
400 | {b, c, k, s, p}
500 | {a, f, c, e, l, p, m, n}

By-product of the first scan of the database (item: frequency): f: 4, c: 4, a: 3, b: 3, m: 3, p: 3.

Step 2: scan the DB for the second time, ordering the frequent items in each transaction.

TID | List of items             | (Ordered) frequent items
100 | {f, a, c, d, g, i, m, p}  | {f, c, a, m, p}
200 | {a, b, c, f, l, m, o}     | {f, c, a, b, m}
300 | {b, f, h, j, o}           | {f, b}
400 | {b, c, k, s, p}           | {c, b, p}
500 | {a, f, c, e, l, p, m, n}  | {f, c, a, m, p}

Step 2: construct the FP-tree by inserting each ordered transaction as a path from the root. Each transaction corresponds to one path in the FP-tree; transactions that share a prefix share the corresponding prefix path, and each shared node's count is incremented. (Figures: the tree after inserting {f, c, a, m, p}, then {f, c, a, b, m}, then {f, b}, then {c, b, p}, then {f, c, a, m, p} again; node-links connect the nodes that carry the same item.)

Final FP-tree (figure): the root has two children, f:4 and c:1. Under f:4 runs the path c:3 → a:3 → m:2 → p:2, with a:3 also having a branch b:1 → m:1, and f:4 also having a child b:1; under c:1 runs the path b:1 → p:1. The header table lists the items f, c, a, b, m, p, each with a head pointer into its node-link chain.
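The header table is what drives the later mining step: following the node-links for an item and walking each node's parents up to the root yields that item's conditional pattern base. A hedged sketch, reusing the illustrative Node and header structures from the construction sketch above:

```python
def conditional_pattern_base(item, header):
    """For each node-link of `item`, collect its prefix path and the node's count."""
    base = []
    for node in header[item]:
        path, parent = [], node.parent
        while parent is not None and parent.item is not None:  # stop at the root
            path.append(parent.item)
            parent = parent.parent
        if path:
            base.append((path[::-1], node.count))
    return base

# For item 'm' in the example tree this yields
# [(['f', 'c', 'a'], 2), (['f', 'c', 'a', 'b'], 1)],
# from which FP-growth recursively builds m's conditional FP-tree.
```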

Advantages of the FP-tree Structure
The most significant advantage of the FP-tree: the DB is scanned only twice.
Completeness: the FP-tree contains all the information needed for mining frequent patterns (given the min-support threshold).
Compactness: the FP-tree does not generate an exponential number of nodes; ordering the items in support-descending order keeps the FP-tree structure highly compact. The size of the tree is bounded by the total occurrences of the frequent items, and the height of the tree is bounded by the maximum number of items in a transaction.

Note: the time complexity is O(n^2) if the number of unique items in the dataset is n. The complexity depends on searching the paths in the FP-tree for each element of the header table, which depends on the depth of the tree; the maximum depth of the tree is upper-bounded by n for each of the conditional trees. Thus the order is O(no. of items in header table × maximum depth of tree) = O(n × n) = O(n^2).

Thank you