Cluster Analysis Using SPSS

49 slides, Mar 14, 2021

About This Presentation

Hierarchical & K-means cluster analysis in SPSS



MULTIVARIATE ANALYSIS
Dr. Nisha Arora

Agenda
• About Me
• Concepts
• How it Works?
• Q/A Session

About Me
An educator by heart & a trainer by profession.
• Dr. Nisha Arora is a proficient educator, passionate trainer, YouTuber, occasional writer, and a learner forever.
✓ PhD in Mathematics
✓ Works in the areas of Data Science, Statistical Research, Data Visualization & Storytelling
✓ Creator of various courses
✓ Contributor to various research communities and Q/A forums
✓ Mentor for Women in Tech Global

My Contribution to the Community
http://stats.stackexchange.com/users/79100/learner
https://stackoverflow.com/users/5114585/dr-nisha-arora
https://www.quora.com/profile/Nisha-Arora-9
https://www.researchgate.net/profile/Nisha_Arora2/contributions
http://learnerworld.tumblr.com/
https://www.slideshare.net/NishaArora1
https://scholar.google.com/citations?user=JgCRWh4AAAAJ&hl=en&authuser=1
https://www.youtube.com/channel/UCniyhvrD_8AM2jXki3eEErw
https://groups.google.com/g/dataanalysistraining/search?q=nisha%20arora
https://www.linkedin.com/in/drnishaarora/detail/recent-activity/posts/
✓ Research Queries
✓ Coding Queries
✓ Blog Posts
✓ Slide Decks
✓ My Talks
✓ Publications
✓ Lectures
✓ Explanations in Layman’s Terms
✓ Mentoring
✓ Articles & Much More

My Expertise
❖ Statistics
❖ Data Analysis
❖ Machine Learning
❖ Analytics & Data Science
❖ Data Visualization & Storytelling
❖ Mathematics & Operations Research
❖ Online Teaching
❖ Excel/SPSS/R/Python/Shiny
❖ Tableau/Power BI

Connect With Me
HTTPS://WWW.LINKEDIN.COM/IN/DRNISHAARORA/

Cluster Analysis
USING SPSS

Applications
✓ Clustering active COVID-19 cases
✓ Assigning projects to teams of students so that each team’s members have similar interests
✓Customer segmentation
✓Market Basket Analysis

Clustering Evaluations
✓ Within-group variation should be small (cases in the same cluster are similar)
✓ Between-group variation should be large (different clusters are well separated)
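To make the two criteria concrete, here is a minimal Python sketch (an illustration, not SPSS output) that computes the within-group sum of squares (WSS) and the between-group sum of squares (BSS) for a set of cases with known cluster labels. The points and labels are hypothetical. A good clustering has a small WSS relative to its BSS.

```python
# Hypothetical sketch: within-group (WSS) and between-group (BSS) sums of squares.

def centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    return tuple(sum(coord) / len(points) for coord in zip(*points))


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def wss_bss(points, labels):
    """Return (WSS, BSS) for points grouped by the given cluster labels."""
    overall = centroid(points)
    groups = {}
    for p, g in zip(points, labels):
        groups.setdefault(g, []).append(p)
    wss = bss = 0.0
    for members in groups.values():
        c = centroid(members)
        wss += sum(sq_dist(p, c) for p in members)  # spread inside the cluster
        bss += len(members) * sq_dist(c, overall)   # distance of the cluster from the grand mean
    return wss, bss


points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 8.5)]  # hypothetical cases
labels = [1, 1, 2, 2]                                        # hypothetical memberships
wss, bss = wss_bss(points, labels)
print(f"WSS = {wss:.3f}, BSS = {bss:.3f}")  # WSS = 1.250, BSS = 98.125
```

WSS + BSS always equals the total sum of squares around the grand mean, so for fixed data, lowering one necessarily raises the other.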

Clustering Algorithms
Clustering Techniques
• Hierarchical: Divisive, Agglomerative
• Partitional: Centroid, Model Based, Graph Theoretic, Spectral
• Bayesian: Decision Based, Non-parametric

Available Options
Analyze → Classify →
✓ Hierarchical Cluster
✓ K-Means Cluster
✓ TwoStep Cluster
✓ Cluster Silhouettes

Hierarchical clustering

Hierarchical clustering_ Outputs

Proximity Matrix
It gives the distances (or similarities) between each pair of items.
✓ Double-click the table to activate it
✓ Pivot it to rearrange rows and columns
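The proximity matrix is simply every pairwise distance between cases. A minimal Python sketch, assuming hypothetical two-variable cases; it offers squared Euclidean distance, which to our knowledge is SPSS’s default interval measure for hierarchical clustering, or plain Euclidean distance:

```python
import math


def proximity_matrix(cases, squared=True):
    """Pairwise distances between cases (rows and columns follow case order).
    squared=True gives squared Euclidean distance; False gives plain Euclidean."""
    n = len(cases)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(cases[i], cases[j])
            matrix[i][j] = matrix[j][i] = d * d if squared else d  # symmetric
    return matrix


cases = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]  # hypothetical cases
for row in proximity_matrix(cases, squared=False):
    print(row)
```

The matrix is symmetric with zeros on the diagonal, which is why SPSS’s table can be read from either triangle.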

Hierarchical clustering_ Outputs
Agglomeration Schedule
It displays the cases or clusters combined at each stage, the distances between the cases or clusters being combined, and the last cluster level at which a case (or variable) joined the cluster.
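The schedule can be reproduced in miniature. This Python sketch is an illustration under assumptions, not SPSS’s own code: it uses average linkage (which we take to correspond to SPSS’s between-groups linkage) on squared Euclidean distance, labels each cluster by its lowest case number, and runs on hypothetical one-variable data. Each row is (stage, cluster combined, cluster combined, coefficient).

```python
from itertools import combinations


def sq_euclid(a, b):
    """Squared Euclidean distance between two cases."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def agglomeration_schedule(cases):
    """Average-linkage agglomerative clustering.
    Returns rows (stage, cluster_a, cluster_b, coefficient)."""
    # Each cluster is keyed by its lowest 1-based case number.
    clusters = {i + 1: [i] for i in range(len(cases))}
    schedule = []
    for stage in range(1, len(cases)):
        best = None
        for a, b in combinations(sorted(clusters), 2):
            # average of all pairwise distances between the two clusters
            pair_d = [sq_euclid(cases[i], cases[j]) for i in clusters[a] for j in clusters[b]]
            d = sum(pair_d) / len(pair_d)
            if best is None or d < best[2]:
                best = (a, b, d)
        a, b, d = best
        schedule.append((stage, a, b, d))
        clusters[a].extend(clusters.pop(b))  # merged cluster keeps the lower number
    return schedule


cases = [(1.0,), (2.0,), (6.0,), (7.0,), (20.0,)]  # hypothetical one-variable data
for row in agglomeration_schedule(cases):
    print(row)
```

With n cases there are always n – 1 stages, and the coefficient tends to grow as ever more dissimilar clusters are forced together.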

Hierarchical clustering_ Outputs
Icicle
✓ It displays an icicle plot, including all clusters or a specified range of clusters.
✓ It displays information about how cases are combined into clusters at each iteration of the analysis.

Hierarchical clustering_ Outputs
Icicle
✓ Double-click the plot → Options → Y Axis Reference Line → Position: 10 → Apply

Hierarchical clustering_ Outputs
Dendrograms can be used to assess the cohesiveness of the clusters formed and can provide information about the appropriate number of clusters to keep.
Possible clusters: 2/3/6/…
Cluster sizes?

Hierarchical clustering
Let’s request cluster membership for a range of solutions

Hierarchical clustering _ Output
We get additional output: the cluster membership table

Hierarchical clustering
Let’s restrict the icicle plot to a specified range of clusters

Hierarchical clustering _ Output

Hierarchical clustering_ Outputs
✓ Cluster Membership
✓ We can save cluster memberships for a single solution or a range of solutions.
✓ Saved variables can then be used in subsequent analyses to explore other differences between groups.
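Saving memberships for a range of solutions amounts to stopping the merge sequence early. A Python sketch under assumptions: it replays a hypothetical agglomeration schedule in (stage, cluster_a, cluster_b, coefficient) form, where each merged cluster keeps the lower number, and it numbers the resulting clusters in order of first appearance, which may not match SPSS’s numbering exactly.

```python
def memberships(schedule, n_cases, k):
    """Replay the first n_cases - k merges of an agglomeration schedule
    and return a cluster number (1..k) for every case."""
    current = {case: case for case in range(1, n_cases + 1)}  # case -> cluster id
    for _stage, a, b, _coef in schedule[: n_cases - k]:
        for case, cid in current.items():
            if cid == b:
                current[case] = a  # cluster b is absorbed into cluster a
    # renumber 1..k in order of first appearance (case 1 is in cluster 1)
    renumber = {}
    return [renumber.setdefault(current[c], len(renumber) + 1) for c in range(1, n_cases + 1)]


# Hypothetical schedule for 5 cases
schedule = [(1, 1, 2, 1.0), (2, 3, 4, 1.0), (3, 1, 3, 25.5), (4, 1, 5, 262.5)]
for k in range(2, 5):
    print(k, "clusters:", memberships(schedule, 5, k))
```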

Understanding the clusters
Cross-tabulate rank against cluster membership.
We need to give suitable names to the clusters.
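A crosstab just counts each (rank, cluster) combination. A minimal Python sketch with hypothetical ranks and memberships; the cluster names are the ones given in these slides (Seniors, Others, Adjuncts), applied here as labels before counting:

```python
from collections import Counter


def crosstab(rows, cols):
    """Count each (row value, column value) combination, like a simple crosstab."""
    counts = Counter(zip(rows, cols))
    col_levels = sorted(set(cols))
    return {r: [counts[(r, c)] for c in col_levels] for r in sorted(set(rows))}, col_levels


# Hypothetical data: academic rank and saved cluster membership for six cases
rank = ["Professor", "Professor", "Adjunct", "Assistant", "Adjunct", "Associate"]
cluster = [1, 1, 3, 2, 3, 2]
names = {1: "Seniors", 2: "Others", 3: "Adjuncts"}  # cluster names from the slides

table, levels = crosstab(rank, [names[c] for c in cluster])
print(levels)
for r, counts in table.items():
    print(r, counts)
```

A rank that falls almost entirely in one column is what justifies a name like “Seniors” or “Adjuncts”.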

Understanding the clusters
We need to give suitable names to
the clusters.
We can do this in Variable View by adding value labels to the cluster membership variable
Let’s give the names:
Cluster 1: Seniors
Cluster 3: Adjuncts
Cluster 2: Others

Understanding the clustering
Although the cell counts are too low for the chi-square statistic to be reliable, prima facie there appears to be no association between sex and cluster membership.
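Why the cell counts matter: the chi-square test compares observed counts with counts expected under independence, and a commonly cited rule of thumb is that no more than 20% of cells should have an expected count below 5. A minimal Python sketch on a hypothetical sex × cluster table (the p-value, which needs the chi-square distribution, is left out to keep it standard-library only):

```python
def chi_square(observed):
    """Pearson chi-square for a table of observed counts (list of rows).
    Returns (statistic, degrees of freedom, % cells with expected < 5, minimum expected)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat, low, expected_all = 0.0, 0, []
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / n  # expected count under independence
            expected_all.append(exp)
            low += exp < 5
            stat += (obs - exp) ** 2 / exp
    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, dof, 100 * low / len(expected_all), min(expected_all)


# Hypothetical sex (rows) x cluster (columns) counts
observed = [[4, 2, 3],
            [3, 2, 2]]
stat, dof, pct_low, min_exp = chi_square(observed)
print(f"chi2 = {stat:.3f}, df = {dof}, cells with expected < 5: {pct_low:.0f}%, "
      f"min expected = {min_exp:.2f}")
```

Here every cell has an expected count below 5, which is exactly the situation in which the statistic should not be trusted.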

Validating Hierarchical Clustering
Double-click the ‘Agglomeration Schedule’ table → select the ‘Coefficients’ column → right-click → Create Graph → Line
Look at the plot (like a scree plot in factor analysis): an elbow should be formed.
Find the stage number where the elbow is formed.
Number of clusters = total cases – stage number where the elbow is formed
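Reading the elbow by eye is subjective. One simple automatic stand-in (our assumption, not an SPSS feature) is to take the stage just before the largest jump in the coefficients, then apply the formula above. A Python sketch with a hypothetical ‘Coefficients’ column for 7 cases:

```python
def elbow_clusters(coefficients, n_cases):
    """Pick the stage after which the agglomeration coefficient jumps most,
    and return (elbow_stage, number_of_clusters = n_cases - elbow_stage)."""
    assert len(coefficients) == n_cases - 1, "a schedule has n_cases - 1 stages"
    # jumps[s - 1] is the rise from stage s to stage s + 1 (stages are 1-based)
    jumps = [coefficients[s] - coefficients[s - 1] for s in range(1, len(coefficients))]
    elbow_stage = jumps.index(max(jumps)) + 1
    return elbow_stage, n_cases - elbow_stage


coefficients = [0.5, 0.8, 1.1, 1.5, 9.0, 12.0]  # hypothetical 'Coefficients' column, 7 cases
print(elbow_clusters(coefficients, n_cases=7))  # (4, 3)
```

After stage s of an n-case schedule, n – s clusters remain, which is where the formula comes from.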

K-means clustering
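Limitations:
1. The number of clusters must be specified in advance
2. The solution depends on the initial cluster centers
3. Not all patterns can be segmented (it favours compact, roughly spherical clusters)
4. Based on Euclidean distance
Advantages:
1. Fast (linear time complexity in the number of cases)
2. Easy to understand
3. Most popular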
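The algorithm behind these points fits in a few lines. This is a minimal Python sketch of Lloyd’s k-means (not SPSS’s exact implementation): initial centres are passed in explicitly to show that the result depends on them, distance is squared Euclidean, and `max_iter=10` echoes the iteration count suggested below. The points and starting centres are hypothetical, and labels are 0-based here whereas SPSS numbers clusters from 1.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, initial_centers, max_iter=10):
    """Lloyd's k-means. Returns (labels, centers); labels are 0-based cluster indices."""
    centers = [tuple(c) for c in initial_centers]
    labels = None
    for _ in range(max_iter):
        # assignment step: nearest centre by squared Euclidean distance
        new_labels = [min(range(len(centers)), key=lambda k: sq_dist(p, centers[k]))
                      for p in points]
        if new_labels == labels:
            break  # no case changed cluster: converged
        labels = new_labels
        # update step: move each centre to the mean of its members
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous centre
                centers[k] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return labels, centers


points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 8.5), (1.0, 0.5)]  # hypothetical cases
labels, centers = kmeans(points, initial_centers=[points[0], points[3]])
print(labels, centers)
```

Each iteration costs roughly (cases × clusters × variables) distance terms, which is why the method scales linearly in the number of cases.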

K-means clustering
Number of clusters:
Ideally between 2 and 5
[Subjective]
Number of iterations:
10 or 20 should be enough

K-means clustering
We can save cluster
membership.

K-means clustering
In the ‘Statistics’ sub-dialog box:
Initial cluster centers: chosen by SPSS from the data (by default, well-spaced cases rather than random ones)

K-means clustering _Output

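We get almost the same cluster membership as with hierarchical clustering.
Strictly, we should standardize the scores first: k-means works on Euclidean distance, so variables measured on larger scales would otherwise dominate the clustering.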

To validate K-means clustering
Analyze → Compare Means → One-Way ANOVA → put all variables used for clustering in ‘Dependent List’ and cluster membership in ‘Factor’ → run a Bonferroni or Tukey post hoc test → check whether all p-values are below the level of significance (0.05)
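For one clustering variable, the One-Way ANOVA F statistic compares the spread of cluster means with the spread inside clusters. A minimal Python sketch with a hypothetical variable and saved membership; the p-value (F distribution with k – 1 and n – k degrees of freedom, e.g. via `scipy.stats.f_oneway`) and the post hoc tests are omitted to keep it standard-library only. Because the clusters were built to differ, these tests are best read descriptively.

```python
def one_way_anova_f(values, groups):
    """F statistic for a one-way ANOVA of `values` across the levels of `groups`."""
    by_group = {}
    for v, g in zip(values, groups):
        by_group.setdefault(g, []).append(v)
    n, k = len(values), len(by_group)
    grand_mean = sum(values) / n
    means = {g: sum(vs) / len(vs) for g, vs in by_group.items()}
    ss_between = sum(len(vs) * (means[g] - grand_mean) ** 2 for g, vs in by_group.items())
    ss_within = sum((v - means[g]) ** 2 for g, vs in by_group.items() for v in vs)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical clustering variable and saved cluster membership
score = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
cluster = [1, 1, 1, 2, 2, 2]
print(one_way_anova_f(score, cluster))  # 54.0
```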

How to standardize variables
Analyze → Descriptive Statistics → Descriptives → select variables → check ‘Save standardized values as variables’ → click ‘OK’
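Standardizing turns each value into a z-score, (value – mean) / standard deviation, so every variable has mean 0 and standard deviation 1. A minimal Python sketch on a hypothetical variable; it uses the sample standard deviation (n – 1 denominator), which we believe matches the Z-prefixed variables SPSS saves.

```python
import statistics


def standardize(values):
    """Z-scores using the sample standard deviation (n - 1 denominator)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


salary = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical variable
z_salary = standardize(salary)                     # analogous to a saved Z-variable
print([round(z, 3) for z in z_salary])
```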

How to convert string variables to categorical
Transform → Automatic Recode → double-click the variable State in the left column to move it to the Variable → New Name box → enter a name for the new, recoded variable in the New Name field → click ‘Add New Name’
Check the box for ‘Treat blank string values as user-missing’.
Click OK to finish.
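Conceptually, Automatic Recode gives each distinct string a consecutive integer code. A minimal Python sketch on a hypothetical State variable, assuming ascending sorted order for the codes and treating blank strings as missing (`None`); note that Python sorts by character code, which may differ from SPSS’s collation for mixed-case or accented values.

```python
def automatic_recode(values, blank_as_missing=True):
    """Give each distinct string a code 1, 2, ... in ascending sorted order.
    Blank strings become None (missing) when blank_as_missing is True."""
    def missing(v):
        return blank_as_missing and v.strip() == ""

    levels = sorted({v for v in values if not missing(v)})
    codes = {v: i for i, v in enumerate(levels, start=1)}
    return [None if missing(v) else codes[v] for v in values], codes


state = ["Ohio", "Texas", " ", "Alaska", "Ohio"]  # hypothetical string variable
recoded, codes = automatic_recode(state)
print(codes)    # {'Alaska': 1, 'Ohio': 2, 'Texas': 3}
print(recoded)  # [2, 3, None, 1, 2]
```

The `codes` dictionary plays the role of the value labels SPSS attaches to the new variable.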

How to add an ID column to data
Transform → Compute Variable → give a name to ‘Target Variable’, say, ‘ID’ → type ‘$CASENUM’ in the ‘Numeric Expression’ box (or double-click the $Casenum function in the ‘Functions and Special Variables’ list) → click ‘OK’

Thank You