HIERARCHICAL CLUSTER ANALYSIS.pptx

agnivapradhan1 · 17 slides · Apr 10, 2023


Slide Content

HIERARCHICAL CLUSTER ANALYSIS. Submitted to: Prof. Somen Sahu, Dept. of FES. Submitted by: Agniva Pradhan, M.F.Sc. 2nd Semester, Dept. of FNT, M/F/2021/03.

DEFINITION: A hierarchical clustering method works by grouping data into a tree of clusters. Hierarchical clustering begins by treating every data point as a separate cluster. (The slide's illustrative examples are fruits of different kinds.)

Properties: In contrast to non-hierarchical cluster analysis, hierarchical cluster analysis forms clusters iteratively, by successively joining or splitting groups. There are two kinds:
Divisive – starts with the entire data set in one large group and successively splits it into smaller groups until each observation is its own group.
Agglomerative – each observation starts in its own group, and groups are successively paired until, at the end, every observation is in one large group.
Divisive methods are computationally intensive and have had limited application in the social sciences; agglomerative methods have been implemented in many standard software packages.

Agglomerative CLUSTER ANALYSIS: Initially consider every data point as an individual cluster and, at every step, merge the nearest pair of clusters (it is a bottom-up method). At every iteration, clusters merge with other clusters until only one cluster remains. The algorithm for agglomerative hierarchical clustering is:
1. Consider every data point as an individual cluster.
2. Calculate the similarity of each cluster with all the other clusters (compute the proximity matrix).
3. Merge the clusters that are closest (most similar) to each other.
4. Recalculate the proximity matrix for the reduced set of clusters.
5. Repeat steps 3 and 4 until only a single cluster remains.
The result can be represented graphically using a dendrogram, as the sketch below illustrates.
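The slides themselves use SPSS, but the merge loop above is easy to demonstrate in Python. A minimal sketch (the toy points array is illustrative, not from the slides): scipy's linkage() performs steps 1–5, and dendrogram() draws the tree.

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.cluster.hierarchy import linkage, dendrogram

    # Six toy 2-D points; each starts out as its own cluster.
    points = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0],
                       [8.0, 8.0], [1.0, 0.6], [9.0, 11.0]])

    # 'average' linkage: the distance between two clusters is the mean of
    # all pairwise distances between their members (between-groups linkage).
    merges = linkage(points, method="average", metric="euclidean")

    dendrogram(merges)  # graphical representation of the merge sequence
    plt.title("Agglomerative clustering dendrogram")
    plt.show()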

Divisive CLUSTER ANALYSIS: Divisive hierarchical clustering is precisely the opposite of agglomerative hierarchical clustering. We start by taking all of the data points as a single cluster, and in every iteration we split off the data points that are least similar to the rest of their cluster. In the end, we are left with N clusters, one per data point. A rough sketch follows below.
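Common libraries offer no standard divisive routine; a frequent approximation is bisecting k-means, which repeatedly splits the current cluster in two. A rough Python sketch under that assumption (the function name, parameters, and stopping rule are mine, not from the slides):

    import numpy as np
    from sklearn.cluster import KMeans

    def divisive_split(points, depth=0, max_depth=2):
        """Recursively bisect a cluster with 2-means until max_depth;
        returns a list of index arrays, one per leaf cluster."""
        indices = np.arange(len(points))
        if depth == max_depth or len(points) < 2:
            return [indices]
        labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(points)
        leaves = []
        for side in (0, 1):
            mask = labels == side
            for leaf in divisive_split(points[mask], depth + 1, max_depth):
                leaves.append(indices[mask][leaf])  # map back to this level's indices
        return leaves

    rng = np.random.default_rng(0)
    print(divisive_split(rng.normal(size=(12, 2))))  # up to four leaf clusters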

Advantages:
It is easy to understand and implement.
We don't have to pre-specify any particular number of clusters.
Any desired number of clusters can be obtained by cutting the dendrogram at the proper level.
The clusters may correspond to a meaningful classification.
It is easy to decide on the number of clusters by merely looking at the dendrogram.

Disadvantages:
Hierarchical clustering does not work well on vast amounts of data.
Once a decision is made to combine two clusters, it cannot be undone.
All the approaches to calculating the similarity between clusters have their own disadvantages; different measures suffer from one or more of the following: sensitivity to noise and outliers, difficulty handling clusters of different sizes, and a tendency to break large clusters.
The order of the data can have an impact on the final results.

Data SET: Here I have taken the fish-buying behaviour of consumers. There are 5 questions, each answered on a 6-point scale: 1 – Strongly Disagree, 2 – Disagree, 3 – Slightly Disagree, 4 – Slightly Agree, 5 – Agree, 6 – Strongly Agree. The data come from 25 respondents.
Q1: I love to buy fish in the market every day.
Q2: I buy fish at a discounted price.
Q3: I like to bargain while buying.
Q4: I compare the prices of the fish in various shops.
Q5: I enjoy eating fish products.

Respondent   Q1   Q2   Q3   Q4   Q5
 1            1    5    6    6    4
 2            6    5    5    6    6
 3            4    6    3    4    1
 4            2    3    4    6    3
 5            4    5    3    4    6
 6            6    4    6    3    3
 7            5    3    6    3    3
 8            6    3    6    4    1
 9            2    4    3    3    6
10            3    5    3    6    4
11            1    3    2    3    5
12            5    4    5    4    2
13            2    2    1    5    4
14            4    6    4    6    4
15            6    5    4    2    1
16            3    5    4    6    4
17            4    4    7    2    2
18            3    6    2    6    4
19            4    6    3    6    2
20            2    3    2    4    7
21            6    4    7    3    2
22            2    3    1    4    5
23            7    2    6    4    1
24            4    6    4    5    3
25            1    3    2    2    6
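For readers following along outside SPSS, the same data set can be entered in Python and the proximity matrix of the algorithm computed directly. A minimal sketch; squared Euclidean distance is assumed because it is SPSS's default interval measure and it matches the agglomeration coefficients shown later (e.g. the stage-1 coefficient 1.000 is exactly the squared distance between respondents 10 and 16):

    import numpy as np
    from scipy.spatial.distance import pdist, squareform

    # The 25 x 5 response matrix above (rows = respondents, columns = Q1..Q5).
    responses = np.array([
        [1, 5, 6, 6, 4], [6, 5, 5, 6, 6], [4, 6, 3, 4, 1], [2, 3, 4, 6, 3],
        [4, 5, 3, 4, 6], [6, 4, 6, 3, 3], [5, 3, 6, 3, 3], [6, 3, 6, 4, 1],
        [2, 4, 3, 3, 6], [3, 5, 3, 6, 4], [1, 3, 2, 3, 5], [5, 4, 5, 4, 2],
        [2, 2, 1, 5, 4], [4, 6, 4, 6, 4], [6, 5, 4, 2, 1], [3, 5, 4, 6, 4],
        [4, 4, 7, 2, 2], [3, 6, 2, 6, 4], [4, 6, 3, 6, 2], [2, 3, 2, 4, 7],
        [6, 4, 7, 3, 2], [2, 3, 1, 4, 5], [7, 2, 6, 4, 1], [4, 6, 4, 5, 3],
        [1, 3, 2, 2, 6],
    ])

    # Squared Euclidean proximity matrix between all pairs of respondents.
    proximity = squareform(pdist(responses, metric="sqeuclidean"))
    print(proximity[9, 15])   # respondents 10 and 16 (0-based indices) -> 1.0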

ANALYSIS IN SPSS

DATA VIEW: (the slide shows a screenshot of the SPSS Data View window with the 25 × 5 data set entered)

RESULT:

Agglomeration Schedule

Stage   Cluster Combined       Coefficients   Stage Cluster First Appears   Next Stage
        Cluster 1  Cluster 2                  Cluster 1  Cluster 2
 1         10         16          1.000           0          0                  8
 2         11         25          2.000           0          0                 13
 3         14         24          2.000           0          0                 10
 4          8         23          2.000           0          0                 15
 5          6         21          2.000           0          0                  7
 6         13         22          3.000           0          0                 16
 7          6          7          3.000           5          0                 11
 8         10         18          3.500           1          0                 12
 9          9         20          4.000           0          0                 13
10         14         19          4.000           3          0                 12
11          6         12          4.667           7          0                 14
12         10         14          5.000           8         10                 20
13          9         11          5.000           9          2                 16
14          6         17          6.500          11          0                 15
15          6          8          8.800          14          4                 22
16          9         13          9.250          13          6                 23
17          3         15         10.000           0          0                 22
18          1          4         10.000           0          0                 20
19          2          5         12.000           0          0                 21
20          1         10         13.833          18         12                 21
21          1          2         17.875          20         19                 23
22          3          6         18.000          17         15                 24
23          1          9         27.033          21         16                 24
24          1          3         37.972          23         22                  0
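This schedule is consistent with SPSS's default between-groups (average) linkage on squared Euclidean distances, so it can be cross-checked with scipy. A sketch reusing the responses array from the earlier block (note that scipy numbers newly formed clusters above 25, while SPSS relabels a merged cluster with its lowest case number, and ties such as the four coefficients of 2.000 may merge in a different order):

    from scipy.cluster.hierarchy import linkage
    from scipy.spatial.distance import pdist

    # Between-groups (average) linkage on squared Euclidean distances.
    merges = linkage(pdist(responses, metric="sqeuclidean"), method="average")

    # The third column of scipy's output corresponds to SPSS's Coefficients column.
    for stage, (a, b, coeff, size) in enumerate(merges, start=1):
        print(f"Stage {stage:>2}: merge {int(a) + 1:>2} + {int(b) + 1:>2}, "
              f"coefficient {coeff:.3f}, cluster size {int(size)}")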

Cluster Membership

Case   4 Clusters   3 Clusters   2 Clusters
 1         1            1            1
 2         1            1            1
 3         2            2            2
 4         1            1            1
 5         1            1            1
 6         3            2            2
 7         3            2            2
 8         3            2            2
 9         4            3            1
10         1            1            1
11         4            3            1
12         3            2            2
13         4            3            1
14         1            1            1
15         2            2            2
16         1            1            1
17         3            2            2
18         1            1            1
19         1            1            1
20         4            3            1
21         3            2            2
22         4            3            1
23         3            2            2
24         1            1            1
25         4            3            1
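Continuing the Python cross-check, scipy's fcluster can cut the same tree into the 4-, 3-, and 2-cluster solutions shown above (cluster labels are arbitrary, so the numbering may differ from SPSS's even when the groupings agree):

    from scipy.cluster.hierarchy import fcluster

    # 'maxclust' cuts the dendrogram so that at most k clusters remain.
    for k in (4, 3, 2):
        labels = fcluster(merges, t=k, criterion="maxclust")
        print(f"{k} clusters:", labels)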