Birch

malla23 2,598 views 12 slides Jun 10, 2013

Slide 1 of 12

About This Presentation

No description available for this slideshow.

Size: 43.76 KB

Language: en

Added: Jun 10, 2013

Slides: 12 pages

Slide Content

Birch Algorithm Presented by: Binod Malla 09BIM012

Introduction B alance I terative R educing & C lustering using H ierarchies

Definition Effective data clustering method for very large databases Each clustering decision is made without scanning all data points unsupervised data mining algorithm used to perform hierarchical clustering

Data Clustering It is portioning of a database into clusters Closely packed group Collection of data objects that are similar to one another & treated collectively in a group

Clustering Factor CF CF is compact storage for data on points in cluster Has enough information to calculate the intra-cluster distances Additive theorem allows to merge sub-clusters

BIRCH goal Minimizes running time and data scans Clustering decision made without scanning whole data Exploit the non uniformity of the data Treat dense areas as one and reduce noise

Algorithm Phase 1: Scan all data and build an initial in-memory CF tree, using the given amount of memory and recycling space on disk. Phase 2: Condense into desirable length by building a smaller CF tree. Phase 3: Global clustering. Phase 4: Cluster refining – this is optional, and requires more passes over the data to refine the results.

Birch- Phase 1 Starts with initial threshold and inserts points into the tree. If it runs out of memory before it finishes scanning the data, it increases the threshold value and rebuilds a new, smaller CF tree, by re-inserting the leaf entries from the older tree and then other values Good initial threshold is important but hard to figure out. Outlier removal (when rebuilding tree).

Birch- Phase2 Optional Preparation for phase 3 Removes outliers and grouping clusters

Birch- Phase3 Problem after phase 1 Input order affects results. Splitting triggered by node size. Phase 3 Cluster all leaf node on the CF values according to an existing algorithm Algorithm used here: agglomerative hierarchical clustering

Birch-Phase4 Optional Additional scans of the datasets attaching each items to the centroids found. Recalculating the centroid and redistributing the items Always converges

Conclusion Birch performs faster than existing algorithms ( CLARANS and KMEANS) on large databases. Scans whole data only once. Handles outliers better. Superior to other algorithms in stability and scalability.

Birch

About This Presentation

Slide Content

Tags

Categories

Download

Quick Actions

Statistics

Related Slideshows

Birch

About This Presentation

Slide Content

Slide 1

Slide 2

Slide 3

Slide 4

Slide 5

Slide 6

Slide 7

Slide 8

Slide 9

Slide 10

Slide 11

Slide 12

Tags

Categories

Download

Quick Actions

Statistics

Related Slideshows

Pray For The Peace Of Jerusalem and You Will Prosper

Don_t_Waste_Your_Life_God.....powerpoint

VILLASUR_FACTORS_TO_CONSIDER_IN_PLATING_SALAD_10-13.pdf

Fertility awareness methods for women in the society

Chapter 5 Arithmetic Functions Computer Organisation and Architecture

syakira bhasa inggris (1) (1).pptx.......