Discriminant Analysis Using SPSS

3,047 views 60 slides Mar 14, 2021

About This Presentation

Linear & multiple discriminant analysis using SPSS


Slide Content

MULTIVARIATE
ANALYSIS
-Dr Nisha Arora

Agenda
• About Me
• Concepts
• How it Works?
• Q/A Session

About Me
An educator by heart & a trainer by profession.
• Dr. Nisha Arora is a proficient educator, passionate trainer, YouTuber, occasional writer, and a learner forever.
✓ PhD in Mathematics
✓ Works in the areas of Data Science, Statistical Research, Data Visualization & Storytelling
✓ Creator of various courses
✓ Contributor to various research communities and Q/A forums
✓ Mentor for women in Tech Global

http://stats.stackexchange.com/users/79100/learner
https://stackoverflow.com/users/5114585/dr-nisha-arora
https://www.quora.com/profile/Nisha-Arora-9
https://www.researchgate.net/profile/Nisha_Arora2/contributions
http://learnerworld.tumblr.com/
https://www.slideshare.net/NishaArora1
https://scholar.google.com/citations?user=JgCRWh4AAAAJ&hl=en&authuser=1
https://www.youtube.com/channel/UCniyhvrD_8AM2jXki3eEErw
https://groups.google.com/g/dataanalysistraining/search?q=nisha%20arora
https://www.linkedin.com/in/drnishaarora/detail/recent-activity/posts/
✓ Research Queries
✓ Coding Queries
✓ Blog Posts
✓ Slide Decks
✓ My Talks
✓ Publications
✓ Lectures
✓ Layman’s Term Explanation
✓ Mentoring
✓ Articles & Much More
My Contribution to the Community

❖ Statistics
❖ Data Analysis
❖ Machine Learning
❖ Analytics & Data Science
❖ Data Visualization & Storytelling
❖ Mathematics & Operations Research
❖ Online Teaching
❖ Excel/SPSS/R/Python/Shiny
❖ Tableau/PowerBI
My Expertise

Connect With Me
https://www.linkedin.com/in/drnishaarora/
[email protected] .

Discriminant Analysis
USING SPSS

My answer to ‘classification of multiple outcomes
with categorical and continuous predictors’:
https://stats.stackexchange.com/a/513616/79100

When to use LDA?
✓ Non-ordinal response variable
✓ Metric predictors
✓ Works well for low sample sizes
✓ Works well when cases are well separable
✓ More restrictive than logistic regression

Assumptions of LDA
✓ Both LDA and QDA assume the predictor variables X are drawn from a multivariate Gaussian distribution.
✓ LDA assumes equality of covariances among the predictor variables X across all levels of Y.
✓ LDA and QDA require the number of predictor variables (p) to be less than the sample size (n). A simple rule of thumb is to use LDA & QDA on data sets where n ≥ 5p.

Default Prediction
✓ The information on 700 past customers is contained in bankloan.sav.
✓ These are the customers who were previously given loans.
✓ Use a random sample of 80% of these customers to create a discriminant analysis model, setting the remaining customers aside to validate the analysis.
✓ Then use the model to classify the remaining 20% of customers as good or bad credit risks.

Data Preparation for LDA
To replicate my results

Data Preparation for LDA
Creating a new variable for the training and validation sets
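
The selection variable described above can be sketched outside SPSS as follows. This is a minimal illustration, not the deck's actual procedure: the function name, the seed value, and the choice of an independent 80% draw per case are all assumptions. A per-case draw gives roughly, not exactly, an 80/20 split, which is consistent with the 566/134 split reported later in the deck.

```python
import random

def make_selection_flags(n_cases, train_fraction=0.8, seed=12345):
    """Return one 0/1 flag per case: 1 = training case, 0 = hold-out case.

    The seed is an arbitrary, hypothetical value; fixing it makes the
    split reproducible from run to run.
    """
    rng = random.Random(seed)
    return [1 if rng.random() < train_fraction else 0 for _ in range(n_cases)]

# 700 customers, as in bankloan.sav
flags = make_selection_flags(700)
n_train = sum(flags)
n_holdout = len(flags) - n_train
print(n_train, n_holdout)
```

Cases flagged 1 would then play the role of the "Select for value 1" cases in the Discriminant dialog.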

Discriminant Analysis
Analyze → Classify → Discriminant

Discriminant Analysis
✓ Grouping variable – categorical response variable
✓ Define Range – as per the number of categories [see the coding in Variable View]
✓ Independents – metric predictors
✓ How to choose predictors –
  ✓ Domain knowledge
  ✓ Previous research
  ✓ EDA
  ✓ Stepwise

Discriminant Analysis
For a hold-out/validation set:
Use a selection variable
Select for value ‘1’

Discriminant Analysis
Statistics sub-dialog box
Univariate ANOVAs, Box’s M, and Fisher’s standardized score will be used for reporting the results.

Discriminant Analysis
Classify sub-dialog box
Almost always check ‘Compute from group sizes’

Discriminant Analysis
Save sub-dialog box

Case processing summary
✓ No missing values
✓ Here the model is trained on 566 observations & 134 are unselected cases (hold-out set)

Group Statistics
Observe whether the variables discriminate between the levels of the response.
Larger standard deviations indicate issues with predictors, specifically income & debt to income ratio.
You may want to transform these predictors.

Test of equality of group means
✓ All predictors are contributing to the model except household income
✓ Wilks’ lambda is the variation in each predictor left unexplained by the groups of the response variable; higher values indicate a weaker predictor
✓ The table suggests that Debt to income ratio (x100) is best, followed by Years with current employer, Credit card debt in thousands, and Years at current address, and then other debts
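
For a single predictor, Wilks' lambda in this table is the within-groups sum of squares divided by the total sum of squares. The sketch below illustrates that ratio on made-up numbers; the variable names and values are hypothetical and are not taken from bankloan.sav.

```python
def wilks_lambda_univariate(values, groups):
    """Within-groups sum of squares divided by total sum of squares."""
    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    ss_within = 0.0
    for g in set(groups):
        members = [v for v, label in zip(values, groups) if label == g]
        group_mean = sum(members) / len(members)
        ss_within += sum((v - group_mean) ** 2 for v in members)
    return ss_within / ss_total

# Hypothetical toy data: 0 = did not default, 1 = defaulted
default = [0, 0, 0, 1, 1, 1]
debt_ratio = [5.0, 7.0, 6.0, 15.0, 17.0, 16.0]   # groups differ clearly
income = [30.0, 50.0, 40.0, 32.0, 48.0, 41.0]    # groups barely differ

lambda_debt = wilks_lambda_univariate(debt_ratio, default)
lambda_income = wilks_lambda_univariate(income, default)
print(lambda_debt, lambda_income)
```

The well-separated predictor yields a lambda near 0, while the predictor whose group means barely differ yields a lambda near 1.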

Pooled within-group matrices
✓ Multicollinearity may be an issue
✓ Look for differences between the structure matrix and the discriminant function coefficients to be sure.

Box Test
Box's M test
Null hypothesis: equality of covariances across groups
P-value < alpha (0.05)
Null rejected
Use separate matrices to see if it gives radically different classification results.
We will see using separate-groups covariance matrices later.

Summary of Canonical Discriminant Functions
✓ Eigenvalue - the higher the better
✓ Canonical correlation - Pearson's correlation between the discriminant scores and the groups. The higher the better
✓ Wilks' lambda - it measures how well each function separates cases into groups. Smaller values of Wilks' lambda indicate greater discriminatory ability of the function.
✓ Associated chi-square test - Null: the means of the functions listed are equal across groups. P-value < 0.05: the discriminant function does better than chance at separating the groups.
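
The three summary statistics above are tied together through the eigenvalue. The sketch below uses the standard textbook relationships: canonical correlation = sqrt(eigenvalue / (1 + eigenvalue)), Wilks' lambda = product of 1 / (1 + eigenvalue), and Bartlett's chi-square approximation. The eigenvalue, sample size, and predictor count are hypothetical, and the chi-square formula is an assumption about how the test is computed; check it against the SPSS algorithm documentation before relying on it.

```python
import math

def canonical_correlation(eigenvalue):
    """Canonical correlation implied by one discriminant function's eigenvalue."""
    return math.sqrt(eigenvalue / (1.0 + eigenvalue))

def wilks_lambda(eigenvalues):
    """Wilks' lambda for the listed functions: product of 1 / (1 + eigenvalue)."""
    lam = 1.0
    for ev in eigenvalues:
        lam *= 1.0 / (1.0 + ev)
    return lam

def bartlett_chi_square(lam, n_cases, n_predictors, n_groups):
    """Bartlett's approximation for testing all functions together (assumed form)."""
    stat = -(n_cases - 1 - (n_predictors + n_groups) / 2.0) * math.log(lam)
    df = n_predictors * (n_groups - 1)
    return stat, df

# Hypothetical two-group example with a single discriminant function
eigenvalue = 0.25
r = canonical_correlation(eigenvalue)
lam = wilks_lambda([eigenvalue])
stat, df = bartlett_chi_square(lam, n_cases=100, n_predictors=3, n_groups=2)
print(r, lam, stat, df)
```

With a single function, lambda equals 1 minus the squared canonical correlation, so a higher eigenvalue, a higher canonical correlation, and a lower lambda all say the same thing.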

Standardized canonical DF coefficient
Coefficients with large absolute values correspond to variables with greater discriminating ability.
A different order of predictors in the standardized coefficients table and the structure matrix indicates collinearity or the presence of outliers.
In such a case, it’s safer to use the structure matrix.

Standardized canonical DF coefficient
Same order

Canonical Discriminant Function Coefficients
Used for writing the discriminant equation & computing the discriminant score for each case
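
As a rough illustration of how such an equation is applied, the sketch below computes a discriminant score as the constant plus the sum of each unstandardized coefficient times the case's value. The coefficient values and the predictor names employ and debtinc are hypothetical placeholders, not the values from the SPSS output.

```python
def discriminant_score(constant, coefficients, case):
    """D = constant + sum of (coefficient * predictor value) over all predictors."""
    return constant + sum(coefficients[name] * case[name] for name in coefficients)

# Hypothetical unstandardized coefficients and one hypothetical customer
constant = -1.0
coefficients = {"employ": -0.10, "debtinc": 0.08}
case = {"employ": 5.0, "debtinc": 20.0}

score = discriminant_score(constant, coefficients, case)
print(score)
```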

Functions at Group Centroids
Used for determining cut-off value
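
One commonly cited textbook rule for a two-group cut-off weights each centroid by the size of the other group, which reduces to the midpoint when the groups are equal in size. The sketch below assumes that rule; the centroids and group sizes are hypothetical, and SPSS itself classifies from posterior probabilities rather than from an explicit cut-off.

```python
def cutting_score(centroid_a, centroid_b, n_a, n_b):
    """Cut-off between two group centroids, each weighted by the other group's size."""
    return (n_a * centroid_b + n_b * centroid_a) / (n_a + n_b)

def assign_group(score, cut, low_group, high_group):
    """Assign the group whose centroid lies on the same side of the cut-off."""
    return high_group if score > cut else low_group

# Hypothetical centroids and group sizes
cut_unequal = cutting_score(-0.5, 1.5, n_a=300, n_b=100)
cut_equal = cutting_score(-0.5, 1.5, n_a=100, n_b=100)
label = assign_group(1.2, cut_unequal, "no default", "default")
print(cut_unequal, cut_equal, label)
```

With unequal groups the cut-off moves toward the smaller group's centroid, so a case needs a more extreme score to be assigned to the smaller group.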

Discriminant Analysis: Output
Classification functions
✓ The classification functions are used to assign cases to groups.
✓ There is a separate function for each group. For each case, a classification score is computed for each function.
✓ The discriminant model assigns the case to the group whose classification function obtained the highest score.
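
The assignment rule in the bullets above can be sketched in a few lines: compute one linear score per group and pick the largest. The constants, coefficients, and predictor names below are hypothetical and only illustrate the mechanics.

```python
def classify(case, functions):
    """Score the case with every group's function and return the best group."""
    scores = {}
    for group, (constant, coefs) in functions.items():
        scores[group] = constant + sum(coefs[name] * case[name] for name in coefs)
    best_group = max(scores, key=scores.get)
    return best_group, scores

# Hypothetical classification functions: (constant, coefficients) per group
functions = {
    "no default": (-2.0, {"employ": 0.30, "debtinc": 0.10}),
    "default": (-4.0, {"employ": 0.10, "debtinc": 0.35}),
}

risky_group, risky_scores = classify({"employ": 2.0, "debtinc": 20.0}, functions)
safe_group, safe_scores = classify({"employ": 15.0, "debtinc": 5.0}, functions)
print(risky_group, safe_group)
```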

Discriminant Analysis: Output
The within-groups correlation matrix shows the correlations between the predictors. The largest correlations occur between Credit card debt in thousands and the other variables, but it is difficult to tell if they are large enough to be a concern. Look for differences between the structure matrix and discriminant function coefficients to be sure.

Discriminant Analysis: Output
Box's M tests the assumption of equality of covariances across groups. Log determinants are a measure of the variability of the groups. Larger log determinants correspond to more variable groups. Large differences in log determinants indicate groups that have different covariance matrices.
Since Box's M is significant, you should request separate matrices to see if it gives radically different classification results. See the section on specifying separate-groups covariance matrices for more information.

Discriminant Analysis: Output
There are several tables that assess the contribution of each variable to the model, including the tests of equality of group means, the discriminant function coefficients, and the structure matrix.

Discriminant Analysis: Output
The standardized coefficients allow you to compare variables measured on different scales. Coefficients with large absolute values correspond to variables with greater discriminating ability.
This table downgrades the importance of Debt to income ratio (x100), but the order is otherwise the same.

Prior Probabilities for Groups
A prior probability is an estimate of the likelihood that a case belongs to a particular group when no other information about it is available.
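
When priors are computed from group sizes, each prior is simply the group's share of the cases used to build the model. The sketch below uses the group sizes quoted in the classification-table slide of this deck (124 defaulters and 375 non-defaulters); the function name is hypothetical.

```python
def priors_from_group_sizes(counts):
    """Prior probability of each group = group size / total number of cases."""
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}

counts = {"defaulted": 124, "did not default": 375}
priors = priors_from_group_sizes(counts)
print(priors)
```

With equal priors, each of the two groups would instead receive 0.5 regardless of its size.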

Classification Function Coefficients
These are used to compute probabilities for group membership.

Classification Results
Training set accuracy = 82.2%
Validation set accuracy = 78.4%

How to improve model
Use variable selection
In SPSS, the stepwise method
Use separate-groups covariance matrices

Discriminant Analysis
Since Box's M is significant, it's worth running a second analysis to see whether using a separate-groups covariance matrix changes the classification.

Discriminant Analysis: Output
The structure matrix shows the correlation of each predictor variable with the discriminant function. The ordering in the structure matrix is the same as that suggested by the tests of equality of group means and is different from that in the standardized coefficients table. This disagreement is likely due to the collinearity between Years with current employer and Credit card debt in thousands noted in the correlation matrix. Since the structure matrix is unaffected by collinearity, it's safe to say that this collinearity has inflated the importance of Years with current employer and Credit card debt in thousands in the standardized coefficients table. Thus, Debt to income ratio (x100) best discriminates between defaulters and nondefaulters.

Discriminant Analysis: Output
In addition to measures for checking the contribution of individual predictors to your discriminant model, the Discriminant Analysis procedure provides the eigenvalues and Wilks' lambda tables for seeing how well the discriminant model as a whole fits the data.

Discriminant Analysis: Output
The eigenvalues table provides information about the relative efficacy of each discriminant function. When there are two groups, the canonical correlation is the most useful measure in the table, and it is equivalent to Pearson's correlation between the discriminant scores and the groups.

Discriminant Analysis: Output
Wilks' lambda is a measure of how well each function separates cases into groups. It is equal to the proportion of the total variance in the discriminant scores not explained by differences among the groups. Smaller values of Wilks' lambda indicate greater discriminatory ability of the function.
The associated chi-square statistic tests the hypothesis that the means of the functions listed are equal across groups. The small significance value indicates that the discriminant function does better than chance at separating the groups.

Discriminant Analysis: Output
The classification table shows the practical
results of using the discriminant model. Of
the cases used to create the model, 94 of
the 124 people who previously defaulted are
classified correctly. 281 of the 375
nondefaulters are classified correctly.
Overall, 75.2% of the cases are classified
correctly.
Classifications based upon the cases used to
create the model tend to be too "optimistic"
in the sense that their classification rate is
inflated. The cross-validated section of the
table attempts to correct this by classifying
each case while leaving it out from the
model calculations; however, this method is
generally still more "optimistic" than subset
validation.
Subset validation is obtained by classifying
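
The overall rate quoted on this slide can be re-derived from the counts it gives (94 of 124 defaulters and 281 of 375 nondefaulters classified correctly). The sketch below does that arithmetic; the dictionary layout and names are just one convenient, hypothetical way to hold the table.

```python
# Counts quoted on the slide: correctly classified cases and group totals
table = {
    "defaulted": {"correct": 94, "total": 124},
    "did not default": {"correct": 281, "total": 375},
}

total_correct = sum(row["correct"] for row in table.values())
total_cases = sum(row["total"] for row in table.values())

overall_pct = 100.0 * total_correct / total_cases
per_group_pct = {
    group: 100.0 * row["correct"] / row["total"] for group, row in table.items()
}
print(round(overall_pct, 1), per_group_pct)
```

The per-group percentages show how evenly the model performs across defaulters and nondefaulters, which the overall rate alone can hide.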

Discriminant Analysis: Output
All the remaining tables are the same.
The classification results have not changed much, so it's probably not worth using separate covariance matrices. Box's M can be overly sensitive to large data files, which is likely what happened here.

Using z-scores

Get your hands dirty!
Play around with different models & see what works best for your problem

How to report the results?
1. ANOVA table (Univariate ANOVAs in the Statistics sub-dialog box): relation of each individual predictor with the response
2. Box's M (assumption checking): it's not a very strong measure; for large samples, it mostly gives a p-value < 0.05
3. Performance (eigenvalue, Wilks' lambda, classification table)
4. Discriminant equations & centroid scores
5. Relative importance

Multiple Discriminant Analysis
Dataset: Iris.sav
Response variable = Species, 3 categories (stored as a string variable): Iris-setosa, Iris-versicolor, and Iris-virginica
Alternate technique: Multinomial logistic regression
For MDA, we need to convert the string variable to a numerically coded categorical variable
STEPS: Click Transform > Automatic Recode.
Double-click the string variable (here, Species) in the left column to move it to the Variable -> New Name box.
Enter a name for the new, recoded variable in the New Name field, then click Add New Name.
Check the box for Treat blank string values as user-missing.
Click OK to finish.
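
The effect of the recoding steps above can be sketched as follows: each distinct string gets a consecutive integer code in ascending alphabetical order, and blank strings are treated as missing. This is a simplified stand-in, not SPSS's implementation; in particular, None is used here as a placeholder for a user-missing value, and the function name is hypothetical.

```python
def automatic_recode(values, blank_as_missing=True):
    """Map distinct strings to codes 1..k in ascending order; blanks become None."""
    labels = sorted(
        {v for v in values if not (blank_as_missing and v.strip() == "")}
    )
    codes = {label: i for i, label in enumerate(labels, start=1)}
    recoded = [codes.get(v) for v in values]  # unmapped (blank) values give None
    return recoded, codes

species = ["Iris-virginica", "Iris-setosa", "", "Iris-versicolor", "Iris-setosa"]
recoded, codes = automatic_recode(species)
print(codes)
print(recoded)
```

The resulting numeric codes are what the Define Range step in the Discriminant dialog would refer to (here 1 through 3).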

Group Statistics

Box M
No need for separate-groups covariance matrices

Summary of Canonical Discriminant Functions

Variable Importance

Canonical Discriminant Functions

Functions at Group Centroids

Prior Probabilities for Groups

Classification Function Coefficients

Classification Results
Using leave-one-out cross validation
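
Leave-one-out cross-validation classifies each case with a model fitted to all the other cases. The sketch below shows the idea with a deliberately simplified classifier: a single predictor and a nearest-group-mean rule, which is what linear discriminant analysis reduces to with one predictor, equal priors, and a common variance. The measurements are hypothetical toy values, not the Iris.sav data, and this is not the computation SPSS performs.

```python
def nearest_mean_label(x, train_x, train_y):
    """Assign x to the group whose training mean is closest."""
    means = {}
    for label in set(train_y):
        members = [v for v, lab in zip(train_x, train_y) if lab == label]
        means[label] = sum(members) / len(members)
    return min(means, key=lambda label: abs(x - means[label]))

def leave_one_out_accuracy(xs, ys):
    """Classify each case using a rule fitted without that case."""
    hits = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if nearest_mean_label(xs[i], train_x, train_y) == ys[i]:
            hits += 1
    return hits / len(xs)

# Hypothetical petal lengths for three cases per species
petal_length = [1.4, 1.5, 1.3, 4.5, 4.7, 4.4, 6.0, 5.9, 4.9]
species = ["setosa"] * 3 + ["versicolor"] * 3 + ["virginica"] * 3

loo_accuracy = leave_one_out_accuracy(petal_length, species)
print(loo_accuracy)
```

Because every case is scored by a model that never saw it, the resulting rate is less optimistic than the rate computed on the cases used to fit the model.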

Combined Group Plot

Thank You