Hierarchical clustering machine learning by arpit_sharma

ErArpitSharma 266 views 12 slides Apr 10, 2020

About This Presentation

Hierarchical clustering in Machine Learning


Slide Content

Hierarchical Clustering Machine Learning Unit-3 Arpit Kumar Sharma AIETM, Jaipur

What is Clustering? Clustering is a technique that groups similar objects so that the objects in the same group are more similar to each other than to the objects in other groups. Each group of similar objects is called a cluster.

Hierarchical Clustering Algorithm: The hierarchical clustering algorithm (HCA) is an unsupervised clustering algorithm that builds clusters with a predominant ordering from top to bottom. For example, all the files and folders on a hard disk are organized in a hierarchy. The algorithm groups similar objects into groups called clusters. The endpoint is a set of clusters in which each cluster is distinct from every other cluster and the objects within each cluster are broadly similar to one another.

HCA Algorithm: How does it work?
1. Make each data point a single-point cluster → forms N clusters.
2. Take the two closest data points and make them one cluster → forms N-1 clusters.
3. Take the two closest clusters and make them one cluster → forms N-2 clusters.
4. Repeat step 3 until you are left with only one cluster.
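The merge steps above can be sketched in a few lines of plain Python. This is only a minimal illustration, not the presenter's code: the function name `agglomerative`, the `target_clusters` parameter, the sample points, and the choice of single linkage with Euclidean distance are all assumptions made for the example.

```python
import math

def agglomerative(points, target_clusters=1):
    """Naive bottom-up clustering; returns (clusters, merge_history)."""
    # Step 1: every point starts as its own cluster (N clusters).
    clusters = [[p] for p in points]
    history = []

    def cluster_distance(a, b):
        # Assumption: single linkage, i.e. the closest pair across the two clusters.
        return min(math.dist(p, q) for p in a for q in b)

    # Steps 2-4: keep merging the two closest clusters.
    while len(clusters) > target_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = cluster_distance(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        history.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        # Delete the higher index first so the lower index stays valid.
        del clusters[j]
        del clusters[i]
        clusters.append(merged)
    return clusters, history

# Hypothetical sample data: two tight pairs and one isolated point.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
clusters, history = agglomerative(points)
print(len(history))  # N points need N-1 merges to reach one cluster
```

Stopping early with `target_clusters=3` leaves the two tight pairs and the isolated point as separate clusters. The nested loops make this sketch cubic in the number of points, so it is suitable only for small examples.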

Example:

There are several ways to measure the distance between clusters when deciding which clusters to merge. They are often called Linkage Methods. Some of the common linkage methods are:
Complete-linkage: the distance between two clusters is defined as the longest distance between a point in one cluster and a point in the other.
Single-linkage: the distance between two clusters is defined as the shortest distance between a point in one cluster and a point in the other. This linkage may be used to detect extreme values in your dataset, which may be outliers, because they tend to be merged at the end.
Average-linkage: the distance between two clusters is defined as the average distance between each point in one cluster and every point in the other cluster.
Centroid-linkage: the distance between two clusters is defined as the distance between the centroid of cluster 1 and the centroid of cluster 2.
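To make the four definitions concrete, here is a small sketch that computes each linkage distance for two tiny clusters. The function names, the sample clusters, and the use of Euclidean distance are assumptions chosen for illustration.

```python
import math
from itertools import product

def single_linkage(a, b):
    # Shortest distance between a point in a and a point in b.
    return min(math.dist(p, q) for p, q in product(a, b))

def complete_linkage(a, b):
    # Longest distance between a point in a and a point in b.
    return max(math.dist(p, q) for p, q in product(a, b))

def average_linkage(a, b):
    # Mean of all pairwise distances between the two clusters.
    return sum(math.dist(p, q) for p, q in product(a, b)) / (len(a) * len(b))

def centroid(cluster):
    # Coordinate-wise mean of the points in the cluster.
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))

def centroid_linkage(a, b):
    # Distance between the two cluster centroids.
    return math.dist(centroid(a), centroid(b))

# Hypothetical clusters lying on the x-axis.
cluster_1 = [(0.0, 0.0), (2.0, 0.0)]
cluster_2 = [(5.0, 0.0), (9.0, 0.0)]

print(single_linkage(cluster_1, cluster_2))    # 3.0
print(complete_linkage(cluster_1, cluster_2))  # 9.0
print(average_linkage(cluster_1, cluster_2))   # 6.0
print(centroid_linkage(cluster_1, cluster_2))  # 6.0
```

Average and centroid linkage happen to agree here because the points are collinear and the clusters do not overlap. In general the two values differ.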

What is a Dendrogram? A dendrogram is a type of tree diagram that shows hierarchical relationships between different sets of data. In hierarchical clustering, it records the order in which clusters are merged.

Types of Hierarchical Clustering Algorithm
Agglomerative Clustering: Bottom Up
Divisive Clustering: Top Down

How to Draw a Dendrogram
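One way to see what a dendrogram contains is to record every merge together with the distance at which it happened, then print the resulting tree as indented text. The sketch below is an illustration under assumptions: the helper names `build_dendrogram` and `render`, the single-linkage rule, and the four labelled one-dimensional points are all hypothetical.

```python
import math

def build_dendrogram(labels, points):
    """Return a nested tree: a leaf is a label string, a node is (height, left, right)."""
    # Each active cluster keeps its member points and the subtree built so far.
    active = [([p], label) for label, p in zip(labels, points)]
    while len(active) > 1:
        best = None
        for i in range(len(active)):
            for j in range(i + 1, len(active)):
                # Assumption: single linkage, used here only for illustration.
                d = min(math.dist(p, q) for p in active[i][0] for q in active[j][0])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        (pts_i, tree_i), (pts_j, tree_j) = active[i], active[j]
        # Delete the higher index first so the lower index stays valid.
        del active[j]
        del active[i]
        active.append((pts_i + pts_j, (d, tree_i, tree_j)))
    return active[0][1]

def render(tree, depth=0):
    """Return the tree as indented text lines, one merge or leaf per line."""
    indent = "  " * depth
    if isinstance(tree, str):
        return [f"{indent}{tree}"]
    height, left, right = tree
    lines = [f"{indent}merge at distance {height:.2f}"]
    lines += render(left, depth + 1)
    lines += render(right, depth + 1)
    return lines

# Hypothetical labelled points on a line.
labels = ["A", "B", "C", "D"]
points = [(0.0,), (1.0,), (5.0,), (7.0,)]
tree = build_dendrogram(labels, points)
print("\n".join(render(tree)))
```

In this example A and B merge first (distance 1), then C and D (distance 2), and the two pairs join last (distance 4). In a drawn dendrogram, those distances become the heights of the horizontal bars that join the branches.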

Thank You