FP Growth Algorithm and its Applications

MaleehaSheikh2 98 views 9 slides May 29, 2024
Slide 1
Slide 1 of 9
Slide 1
1
Slide 2
2
Slide 3
3
Slide 4
4
Slide 5
5
Slide 6
6
Slide 7
7
Slide 8
8
Slide 9
9

About This Presentation

FP Growth Algorithm Analysis


Slide Content

1
TITLE
Whatshouldbein
Objective,MethodandSignificant
Thetitleshouldconveytotheobjective/purpose,methodandsignificant
ofthepaper.Thenicerthebetter.
Example:
ASupport-OrderedTrieforFastFrequentItemsetDiscovery
method significant objective

2
TITLE
Thetitleshouldconveytotheobjective/purpose,method,and
significantofthepaper.Thenicerthebetter.
Example:
MiningFrequentPatternsWithoutCandidategeneration:A
Frequent-PatternTreeApproach
JiaWei Han et al. “Mining Frequent Patterns Without Candidate generation: A Frequent-Pattern
Tree Approach”, Data Mining and Knowledge Discovery, 8, 53-87, 2004
Method :Frequent-Pattern Tree
Objective :Mining Frequent Patterns Without Candidate generation
Significant :fast/ efficient ??

3
An abstract should briefly:
• Re-establish the topic of the research.
• Give the research problem and/or main objective of
the research (this usually comes first).
• Indicate the methodology used.
• Present the main findings.
• Present the main conclusions
ABSTRACT

4
ABSTRACT
The Body of the Abstract
The abstract is a very brief overview of your
ENTIREstudy. It tells the reader
-WHAT you did,
-WHYyou did it,
-HOWyou did it,
-WHATyou found, and
-WHATit means.

5
ABSTRACT
Briefly state the purpose of the research
(introduction),how the problem was studied/solved
(methods), the principal findings (results), and what
the findings mean (discussion and conclusion).
It is important to be descriptivebut concise--say only
what is essential, using no more words than necessary
to convey meaning.

6
Common Problems
Toolong.Ifyourabstractistoolong,itmaybebored–abstractis
usuallyaspecifiedmaximumnumberofwords.
Toomuchdetail.Abstractsthataretoolongoftenhaveunnecessary
details.Theabstractisnottheplacefordetailedexplanationsof
methodologyorfordetailsaboutthecontextofyourresearchproblem.
Tooshort.Shorterisnotnecessarilybetter.Ifyourwordlimitis200but
youonlywrite95words,youprobablyhavenotwritteninsufficient
detail.Manywritersdonotgivesufficientinformationabouttheirfindings.
Failuretoincludeimportantinformation.Youneedtobecarefulto
coverthepointslistedabove.Oftenpeopledonotcoverallofthem
becausetheyspendtoolongexplaining,forexample,themethodology
andthendonothaveenoughspacetopresenttheirconclusion.

7
ABSTRACT
Mining frequent patterns in transaction databases, time-series databases, and
many other kinds of databases has been studied popularly in data mining research.
Most of the previous studies adopt an Apriori-like candidate set generation-and-test
approach. However,candidate set generation is still costly, especially when there
exist a large number of patterns and/or long patterns.
In this study, we propose a novel frequent-pattern tree (FP-tree) structure, which is
an extended prefix-tree structure for storing compressed, crucial information about
frequent patterns, and develop an efficient FP-tree based mining method, FP-growth,
for mining the complete set of frequent patterns by pattern fragment growth.
Efficiency of mining is achieved with three techniques:(1) a large database is
compressed into a condensed, smaller data structure, FP-tree which avoids costly,
repeated database scans, (2) our FP-tree-based mining adopts a pattern-fragment
growth method to avoid the costly generation of a large number of candidate sets,
and (3) a partitioning-based, divide-and-conquer method is used to decompose the
mining task into a set of smaller tasks for mining confined patterns in conditional
databases, which dramatically reduces the search space.
Our performance study shows that the FP-growth method is efficient and scalablefor
mining both long and short frequent patterns, and is about an order of magnitude
faster than the Apriorialgorithm and also faster than some recently reported new
frequent-pattern mining methods.

8
ABSTRACT
Miningfrequentpatternsintransactiondatabases,time-seriesdatabases,andmany
otherkindsofdatabaseshasbeenstudiedpopularlyindataminingresearch.Mostof
thepreviousstudiesadoptanApriori-likecandidatesetgeneration-and-test
approach.[WHAT].
However, candidate set generation is still costly, especially when there
exist a large number of patterns and/or long patterns[WHY].
Inthisstudy,weproposeanovelfrequent-patterntree(FP-tree)structure,and
developanefficientFP-treebasedminingmethod,FP-growth,forminingthe
completesetoffrequentpatternsbypatternfragmentgrowth.Efficiencyofminingis
achievedwiththreetechniques:…[HOW].
OurperformancestudyshowsthattheFP-growthmethodisefficientandscalablefor
miningbothlongandshortfrequentpatterns,andisaboutanorderofmagnitude
fasterthantheApriorialgorithmandalsofasterthansomerecentlyreportednew
frequent-patternminingmethods.[WHATYOUFOUND]

9
ABSTRACT
Theimportanceofdataminingisapparentwiththeadventof
powerfuldatacollectionandstoragetools;rawdataissoabundantthat
manualanalysisisnolongerpossible.[WHAT]
Unfortunately,dataminingproblemsaredifficulttosolveandthis
promptedtheintroductionofseveralnoveldatastructurestoimprove
miningefficiency.[WHY]
Here,wewill,criticallyexamineexistingpreprocessingdatastructures
usedinassociationruleminingforenhancingperformanceinanattempt
tounderstandtheirstrengthandweaknesses.Ouranalysisculminatein
apracticalstructurecalledtheSOTrieT(Support-OrderedTrie
Itemset)andtwosynergisticalgorithmstoaccompanyitforthefast
discoveryoffrequentitemsets.[HOW]
Experimentsinvolvingawiderangeofsyntheticdatasetsrevealthatits
algorithmsoutperformFP-growth,arecentassociationrulemining
algorithmwithexcellentperformance,byuptotwoordersofmagnitude
and,thus,verifyingitsefficiencyandviability.[WHATYOUFOUND]