Cluster analysis using SPSS

Hierarchical & K-means cluster analysis in SPSS

Added: Mar 14, 2021

MULTIVARIATE
ANALYSIS
-Dr Nisha Arora

Agenda
About Me
Concepts
How it Works?
Q/A Session

About Me
An educator by heart & a trainer by profession.
• Dr. Nisha Arora is a proficient educator, passionate trainer, YouTuber, occasional writer, and a learner forever.
✓ PhD in Mathematics
✓ Works in the areas of Data Science, Statistical Research, Data Visualization & Storytelling
✓ Creator of various courses
✓ Contributor to various research communities and Q/A forums
✓ Mentor for Women in Tech Global

My Contribution to the Community
✓ Research Queries
✓ Coding Queries
✓ Blog Posts
✓ Slide Decks
✓ My Talks
✓ Publications
✓ Lectures
✓ Layman’s Term Explanation
✓ Mentoring
✓ Articles & Much More
http://stats.stackexchange.com/users/79100/learner
https://stackoverflow.com/users/5114585/dr-nisha-arora
https://www.quora.com/profile/Nisha-Arora-9
https://www.researchgate.net/profile/Nisha_Arora2/contributions
http://learnerworld.tumblr.com/
https://www.slideshare.net/NishaArora1
https://scholar.google.com/citations?user=JgCRWh4AAAAJ&hl=en&authuser=1
https://www.youtube.com/channel/UCniyhvrD_8AM2jXki3eEErw
https://groups.google.com/g/dataanalysistraining/search?q=nisha%20arora
https://www.linkedin.com/in/drnishaarora/detail/recent-activity/posts/

My Expertise
❖ Statistics
❖ Data Analysis
❖ Machine Learning
❖ Analytics & Data Science
❖ Data Visualization & Storytelling
❖ Mathematics & Operations Research
❖ Online Teaching
❖ Excel/SPSS/R/Python/Shiny
❖ Tableau/Power BI

Connect With Me
https://www.linkedin.com/in/drnishaarora/

Cluster Analysis
USING SPSS

Applications
✓ Clusters of active COVID cases
✓ Assigning projects to teams of students so that the members of each team have similar interests
✓ Customer segmentation
✓ Market basket analysis

Clustering Evaluations
✓ Within-group variation should be small (cases in the same cluster are similar)
✓ Between-group variation should be large (clusters are well separated)
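The two criteria above can be made concrete with sums of squares. The following is a minimal sketch in plain Python, not SPSS output; the two groups and their coordinates are made-up values used only for illustration.

```python
# Hypothetical toy data: two clusters of 2-D points (values invented for illustration).
clusters = {
    "A": [(1.0, 2.0), (2.0, 1.0), (1.5, 1.5)],
    "B": [(8.0, 9.0), (9.0, 8.0), (8.5, 8.5)],
}

def mean_point(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

all_points = [p for pts in clusters.values() for p in pts]
grand_mean = mean_point(all_points)

# Within-group variation: spread of cases around their own cluster mean.
within_ss = sum(sq_dist(p, mean_point(pts)) for pts in clusters.values() for p in pts)
# Between-group variation: spread of cluster means around the grand mean.
between_ss = sum(len(pts) * sq_dist(mean_point(pts), grand_mean) for pts in clusters.values())
total_ss = sum(sq_dist(p, grand_mean) for p in all_points)

print(within_ss, between_ss, total_ss)
```

In this toy example the within-group sum of squares is tiny compared with the between-group one, which is the pattern a good clustering should show. The two always add up to the total sum of squares.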

Clustering Algorithms
Clustering Techniques
✓ Hierarchical
   • Divisive
   • Agglomerative
✓ Partitional
   • Centroid
   • Model Based
   • Graph Theoretic
   • Spectral
✓ Bayesian
   • Decision Based
   • Non-parametric

Available Options
Analyze → Classify →
✓ Hierarchical cluster
✓ K-means cluster
✓ TwoStep cluster
✓ Cluster Silhouettes

Hierarchical clustering

Hierarchical clustering_ Outputs
Proximity Matrix
It gives the distances or similarities between items.
✓ Double-click
✓ Pivot
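As a rough illustration of what a proximity matrix contains, the sketch below computes pairwise squared Euclidean distances in plain Python. The case names and coordinates are hypothetical, and the choice of squared Euclidean distance is an assumption made for the example; the measure is selectable in SPSS.

```python
import math

# Hypothetical cases with two numeric variables each (made-up values).
cases = {"c1": (0.0, 0.0), "c2": (3.0, 4.0), "c3": (6.0, 8.0)}
names = list(cases)

# proximity[a][b] holds the squared Euclidean distance between cases a and b.
proximity = {
    a: {b: math.dist(cases[a], cases[b]) ** 2 for b in names}
    for a in names
}

# Print the matrix row by row; the diagonal is zero and the matrix is symmetric.
for a in names:
    print(a, [round(proximity[a][b], 1) for b in names])
```

Small entries mean similar cases, so c1 and c2 are closer to each other than c1 and c3.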

Hierarchical clustering_ Outputs
Agglomeration schedule
It displays the cases or clusters combined at each stage, the distances between the cases or clusters being combined, and the last cluster level at which a case (or variable) joined the cluster.
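To show how such a schedule arises, here is a small sketch of agglomerative clustering in plain Python that records one row per stage. It assumes average (between-groups) linkage and plain Euclidean distance, and the function name `agglomeration_schedule` and the data are invented for this example; it does not reproduce the exact SPSS table layout.

```python
import math
from itertools import combinations

def agglomeration_schedule(points):
    """Average-linkage clustering; returns rows of (stage, cluster_a, cluster_b, coefficient)."""
    # Each case starts as its own cluster, labelled 1..n (label -> member indices).
    clusters = {i + 1: [i] for i in range(len(points))}
    schedule = []
    stage = 1
    while len(clusters) > 1:
        best = None
        for a, b in combinations(sorted(clusters), 2):
            pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
            # Average distance over all pairs of members of the two clusters.
            coeff = sum(math.dist(points[i], points[j]) for i, j in pairs) / len(pairs)
            if best is None or coeff < best[2]:
                best = (a, b, coeff)
        a, b, coeff = best
        # The merged cluster keeps the lower label.
        clusters[a] = clusters[a] + clusters.pop(b)
        schedule.append((stage, a, b, coeff))
        stage += 1
    return schedule

# Hypothetical one-variable data for five cases.
points = [(0.0,), (1.0,), (5.0,), (6.0,), (20.0,)]
schedule = agglomeration_schedule(points)
for row in schedule:
    print(row)
```

The coefficient column grows as less similar clusters are forced together, which is what the elbow check later in the deck relies on.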

Hierarchical clustering_ Outputs
Icicle
✓ It displays an icicle plot, including all clusters or a specified range of clusters.
✓ It displays information about how cases are combined into clusters at each iteration of the analysis.

Hierarchical clustering_ Outputs
Icicle
✓ Double-click
✓ Options
✓ Y axis reference line
✓ Position – 10
✓ Apply

Hierarchical clustering_ Outputs
Dendrograms can be used to assess the cohesiveness of the clusters formed and can provide information about the appropriate number of clusters to keep.
Possible clusters – 2/3/6/…
Cluster sizes?
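Choosing a number of clusters amounts to cutting the tree at some height. The sketch below replays a merge schedule until k clusters remain and then counts cluster sizes. The schedule literal, the helper name `cut_tree`, and the choice of k values are all hypothetical.

```python
from collections import Counter

def cut_tree(n_cases, schedule, k):
    """Replay the first n_cases - k merges and return a cluster number (1..k) per case."""
    label = {case: case for case in range(1, n_cases + 1)}
    for _, a, b, _ in schedule[: n_cases - k]:
        for case in list(label):
            if label[case] == b:
                label[case] = a  # cluster b is absorbed into cluster a
    # Renumber clusters 1..k in order of first appearance.
    order = {}
    return [order.setdefault(label[c], len(order) + 1) for c in range(1, n_cases + 1)]

# Hypothetical schedule rows: (stage, cluster_a, cluster_b, coefficient).
schedule = [(1, 1, 2, 1.0), (2, 3, 4, 1.0), (3, 1, 3, 5.0), (4, 1, 5, 17.0)]

for k in (2, 3):
    membership = cut_tree(5, schedule, k)
    print(k, membership, dict(Counter(membership)))
```

Comparing the sizes for each candidate k helps spot solutions where one cluster holds nearly every case or a cluster holds a single outlier.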

Hierarchical clustering
Let’s change the number of
possible solutions

Hierarchical clustering _ Output
We get additional output
as cluster membership

Hierarchical clustering
Let’s change the icicle plot to show a
specified range of clusters

Hierarchical clustering_ Outputs
✓ Cluster Membership
✓ We can save cluster memberships for a single solution or a range of solutions.
✓ Saved variables can then be used in subsequent analyses to explore other differences between groups.

Understanding the clusters
Crosstab between rank and cluster membership
We need to give suitable names to the clusters.
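A crosstab of this kind is just a count of cases for every combination of the two variables. The sketch below builds one in plain Python; the rank labels and memberships are invented records, not the data used in the slides.

```python
from collections import Counter

# Hypothetical records: (rank, saved cluster membership). Values are made up.
records = [
    ("Professor", 1), ("Professor", 1), ("Associate", 1), ("Associate", 2),
    ("Assistant", 2), ("Assistant", 2), ("Adjunct", 3), ("Adjunct", 3),
]

counts = Counter(records)
ranks = sorted({rank for rank, _ in records})
cluster_ids = sorted({cluster for _, cluster in records})

# One row per rank, one column per cluster.
table = {rank: [counts[(rank, c)] for c in cluster_ids] for rank in ranks}

print("rank".ljust(10), cluster_ids)
for rank, row in table.items():
    print(rank.ljust(10), row)
```

If most cases of one rank fall into a single cluster, that rank is a natural basis for naming the cluster.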

Understanding the clusters
We need to give suitable names to
the clusters.
We can do it in variable view
Let’s give the names:
Cluster 1: Seniors
Cluster 3: Adjuncts
Cluster 2: Others

Understanding the clustering
Although the cell counts are too low and the chi-square statistic is therefore not reliable, prima facie we still see no association between sex and cluster membership.
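The reasoning above can be sketched numerically. The code below computes the chi-square statistic and the expected counts for a sex by cluster table and counts the cells whose expected count is below 5, a common rule of thumb for when the test becomes unreliable. The observed counts are hypothetical, and the closed-form p-value used here is valid only because this table has 2 degrees of freedom.

```python
import math

# Hypothetical observed counts: rows are sex, columns are clusters 1 to 3.
observed = {"Female": [3, 4, 2], "Male": [4, 3, 2]}

row_totals = {sex: sum(row) for sex, row in observed.items()}
col_totals = [sum(col) for col in zip(*observed.values())]
n = sum(row_totals.values())

# Expected count under independence: row total * column total / n.
expected = {sex: [row_totals[sex] * c / n for c in col_totals] for sex in observed}

chi_square = sum(
    (o - e) ** 2 / e
    for sex in observed
    for o, e in zip(observed[sex], expected[sex])
)
df = (len(observed) - 1) * (len(col_totals) - 1)

# Cells with expected count below 5 make the chi-square approximation doubtful.
low_cells = sum(e < 5 for row in expected.values() for e in row)

# For df == 2 only, the chi-square upper-tail probability is exp(-x / 2).
p_value = math.exp(-chi_square / 2)

print(round(chi_square, 3), df, low_cells, round(p_value, 3))
```

A large p-value together with many low expected counts matches the situation described on the slide: no visible association, but not a result to rely on.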

Validating Hierarchical Clustering
Double-click the ‘Agglomeration Schedule’ table → select ‘Coefficients’ → right-click → Create Graph → Line
Look at the plot (like a scree plot in factor analysis) → an elbow should be formed
Find the stage number where the elbow is formed
Number of clusters = total cases – stage number where the elbow is formed
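The rule above can be sketched in code. Reading the elbow off a plot is a visual judgment; as a stand-in, this example takes the stage just before the largest jump in the coefficients, which is an assumption of the sketch and not something SPSS does for you. The coefficient values are hypothetical.

```python
# Hypothetical agglomeration coefficients for stages 1..9 (10 cases in total).
coefficients = [1.0, 1.2, 1.5, 1.9, 2.4, 3.0, 3.8, 9.5, 14.0]
n_cases = len(coefficients) + 1  # n cases are merged in n - 1 stages

# Increase in the coefficient from each stage to the next.
jumps = [later - earlier for earlier, later in zip(coefficients, coefficients[1:])]

# Elbow stage: the last stage before the largest jump (stages are 1-based).
elbow_stage = jumps.index(max(jumps)) + 1

# Rule from the slide: number of clusters = total cases - elbow stage.
n_clusters = n_cases - elbow_stage

print(elbow_stage, n_clusters)
```

Here the coefficients rise gently up to stage 7 and then jump, so the sketch suggests stopping after stage 7 and keeping 3 clusters.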

K-means clustering
Limitations:
1. The number of clusters must be defined in advance
2. The solution depends on the initial cluster centers
3. Not all patterns can be segmented
4. Based on Euclidean distance
Advantages:
1. Fast (linear time complexity)
2. Easy to understand
3. Most popular

K-means clustering
Number of clusters:
Ideally between 2 and 5
[Subjective]
Number of iterations:
10 to 20 should be enough
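To make the roles of the number of clusters and the iteration limit concrete, here is a minimal sketch of the standard k-means loop (Lloyd's algorithm) in plain Python. It is not the SPSS implementation; the function name `k_means`, the starting centers, and the data are assumptions of this example.

```python
import math

def k_means(points, centers, max_iter=10):
    """Assign each point to its nearest center, recompute centers, repeat until stable."""
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(max_iter):
        # Assignment step: index of the nearest center by Euclidean distance.
        labels = [
            min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for k, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centers.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centers.append(old)  # an empty cluster keeps its center
        if new_centers == centers:
            break  # converged before reaching max_iter
        centers = new_centers
    return labels, centers

# Hypothetical data with two visible groups; k = 2 is fixed by the two starting centers.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centers = k_means(points, centers=[points[0], points[3]], max_iter=10)
print(labels)
print(centers)
```

On well-separated data the loop stops after a couple of passes, which is why a limit of 10 to 20 iterations is usually enough. Different starting centers can lead to a different solution.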

K-means clustering
We can save cluster
membership.

K-means clustering
In the ‘Statistics’ sub-dialog box:
Initial cluster centers:
Randomly chosen
K-means clustering _Output
We get nearly the same cluster membership.
However, we should standardize the scores first, because k-means works on Euclidean distance, which is sensitive to the scale of each variable.

To validate K-means clustering
Analyze → Compare Means → One-Way ANOVA → put all variables
used for clustering in ‘Dependent List’
and cluster membership in ‘Factor’ →
run a Bonferroni or Tukey post hoc test →
check whether all p-values are below the
significance level (0.05)

How to standardize variables
Analyze → Descriptive Statistics → Descriptives → select variables → check
‘Save standardized values as variables’
→ click ‘OK’
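Standardizing replaces each value with its z-score: the value minus the variable's mean, divided by its standard deviation. A minimal sketch in plain Python, using made-up values and the sample standard deviation (n - 1 in the denominator):

```python
from statistics import mean, stdev

def standardize(values):
    """Return z-scores: (value - mean) / sample standard deviation."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation, n - 1 in the denominator
    return [(v - m) / s for v in values]

# Hypothetical raw scores for one variable.
salary = [10.0, 20.0, 30.0]
z_salary = standardize(salary)
print(z_salary)
```

After standardizing, every variable has mean 0 and standard deviation 1, so no single variable dominates the Euclidean distance because of its units.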

How to convert string variables to
categorical
Transform → Automatic Recode →
double-click the variable State in the left
column to move it to the ‘Variable → New Name’ box →
enter a name for the new, recoded variable
in the ‘New Name’ field → click ‘Add New Name’
Check the box ‘Treat blank string
values as user-missing’.
Click ‘OK’ to finish
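Conceptually, this kind of recoding assigns consecutive integer codes to the distinct string values and keeps the original strings as labels. The sketch below does this in plain Python with the values sorted alphabetically. The state names are made up, and mapping blanks to `None` is a simplification of this example that stands in for a missing value; it does not reproduce how SPSS stores user-missing codes.

```python
# Hypothetical string variable with one blank entry.
states = ["Texas", "Ohio", "", "Texas", "Alaska"]

# Distinct non-blank values in alphabetical order get codes 1, 2, 3, ...
non_blank = sorted({s for s in states if s.strip()})
codes = {value: code for code, value in enumerate(non_blank, start=1)}

# Blank strings have no code and come back as None (treated as missing here).
recoded = [codes.get(s) for s in states]

# The reverse mapping plays the role of value labels.
value_labels = {code: value for value, code in codes.items()}

print(recoded)
print(value_labels)
```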

How to add an ID column to data
Transform → Compute Variable →
give a name to the ‘Target Variable’,
say, ‘ID’ → type ‘$CASENUM’ in the
‘Numeric Expression’ box (or double-
click $Casenum in the
‘Functions and Special Variables’
list) → click ‘OK’
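The effect of this step is simply a running case number stored as a new variable. A tiny sketch in plain Python, with hypothetical rows standing in for the data file:

```python
# Hypothetical data rows; each dict is one case.
rows = [{"name": "Asha"}, {"name": "Ravi"}, {"name": "Meera"}]

# Number the cases 1, 2, 3, ... in their current order.
for case_number, row in enumerate(rows, start=1):
    row["ID"] = case_number

print(rows)
```

Because the number reflects the current row order, it is best created before sorting or filtering the data.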
