Discriminant Analysis in Multivariate Data Analysis

apsapssingh9, 39 slides, Feb 28, 2025

About This Presentation

Discriminant analysis in multivariate data analysis: an insight from a research methodology perspective.


Slide Content

Discriminant Analysis (Credit Seminar)

Discriminant Analysis
Discriminant analysis (DA) is a technique for analyzing data when the criterion (dependent) variable is categorical and the predictor (independent) variables are interval in nature. It is a technique to discriminate between two or more mutually exclusive and exhaustive groups on the basis of some explanatory variables.

Types of DA
- Linear DA: when the criterion/dependent variable has two categories, e.g. adopters and non-adopters.
- Multiple DA: when three or more categories are involved, e.g. SHG1, SHG2, SHG3.

Similarities and Differences

                                   ANOVA        Regression   Discriminant
Similarities
  Number of dependent variables    One          One          One
  Number of independent variables  Multiple     Multiple     Multiple
Differences
  Nature of the dependent          Metric       Metric       Categorical
  Nature of the independent        Categorical  Metric       Metric

Assumptions
1. Sample size (n): group sizes of the dependent variable should not be grossly different (e.g. no more unbalanced than 80:20), and the sample should be at least five times the number of independent variables.
2. Normal distribution: each of the independent variables is normally distributed.
3. Homogeneity of variances/covariances: all variables have linear and homoscedastic relationships.
4. Outliers: outliers should not be present in the data; DA is highly sensitive to their inclusion.

5. Non-multicollinearity: there should not be multicollinearity among the independent variables.
6. Mutually exclusive groups: the groups must be mutually exclusive, with every subject or case belonging to only one group.
7. Classification: each of the allocations for the dependent categories in the initial classification is correctly classified.

Discriminant Analysis Model
The discriminant analysis model involves a linear combination of the following form:

D = b0 + b1 X1 + b2 X2 + b3 X3 + ... + bk Xk

where
D = discriminant score
b's = discriminant coefficients (weights)
X's = predictor (independent) variables

The coefficients, or weights (b), are estimated so that the groups differ as much as possible on the values of the discriminant function. Discriminant analysis thus creates an equation that minimizes the possibility of misclassifying cases into their respective groups or categories.
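As a quick illustration, the score D is just this linear combination evaluated for one case; the constant, coefficients, and predictor values below are made up:

```python
import numpy as np

b0 = 0.5                        # constant term (hypothetical)
b = np.array([1.2, -0.4, 0.8])  # discriminant weights b1..bk (hypothetical)
x = np.array([2.0, 1.0, 3.0])   # one case's predictor values X1..Xk

D = b0 + b @ x  # discriminant score for this case
print(round(D, 2))  # → 4.9
```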

Hypothesis
Discriminant analysis tests the following hypotheses:
H0: The group means of a set of independent variables for two or more groups are equal.
H1: The group means for two or more groups are not equal.
A group's mean on the set of variables is referred to as its centroid.

Statistics Associated with Discriminant Analysis
Canonical correlation: measures the extent of association between the discriminant scores and the groups. It is a measure of association between the single discriminant function and the set of dummy variables that define group membership; equivalently, it is the multiple correlation between the predictors and the discriminant function.
Centroid: the mean of the discriminant scores for a particular group. There is one centroid for each group; the means for a group on all the functions are the group centroids.

Classification matrix: sometimes called the confusion or prediction matrix, it contains the number of correctly classified and misclassified cases.
Discriminant function coefficients: the (unstandardized) multipliers of the variables when the variables are in their original units of measurement.
F values and their significance: calculated from a one-way ANOVA, with the grouping variable serving as the categorical independent variable and each predictor, in turn, as the metric dependent variable.

Discriminant scores: the unstandardized coefficients are multiplied by the values of the variables, the products are summed, and the constant term is added to obtain the discriminant score.
Eigenvalue: for each discriminant function, the eigenvalue is the ratio of the between-group to the within-group sum of squares. Large eigenvalues imply superior functions.
Pooled within-group correlation matrix: computed by averaging the separate covariance matrices for all the groups.

Standardized discriminant function coefficients: the discriminant function coefficients obtained when the variables have been standardized, used as the multipliers of the standardized variables.
Structure correlations: also referred to as discriminant loadings, these represent the simple correlations between the predictors and the discriminant function.
Group means and group standard deviations: computed for each predictor for each group.

Wilks' lambda: sometimes called the U statistic, Wilks' λ for each predictor is the ratio of the within-group sum of squares to the total sum of squares. Its value varies between 0 and 1. Large values of λ (near 1) indicate that the group means do not seem to differ; small values (near 0) indicate that they do. For a single function, λ = 1 − R², where R is the canonical correlation. Wilks' λ measures how well each function separates cases into groups, indicates the significance of the discriminant function, and gives the proportion of total variability not explained.
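A minimal sketch of this per-predictor computation on toy data (the function name and data are illustrative): well-separated groups give λ near 0, identical group means give λ near 1.

```python
import numpy as np

def wilks_lambda(x, groups):
    """Wilks' lambda for one predictor: within-group SS / total SS."""
    ss_total = ((x - x.mean()) ** 2).sum()
    ss_within = sum(((x[groups == g] - x[groups == g].mean()) ** 2).sum()
                    for g in np.unique(groups))
    return ss_within / ss_total

# Two clearly separated groups -> lambda close to 0.
x = np.array([1.0, 2.0, 3.0, 11.0, 12.0, 13.0])
g = np.array([0, 0, 0, 1, 1, 1])
print(round(wilks_lambda(x, g), 3))  # → 0.026
```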

Linear discriminant analysis: hypothetical example
Groups based on adoption intention, with predictors quality (x1), accessibility (x2), and price (x3).

Group A (would adopt)      x1   x2   x3
  Person 1                  8    9    6
  Person 2                  6    7    5
  Person 3                 10    6    3
  Person 4                  9    4    4
  Person 5                  4    8    2

Group B (would not adopt)  x1   x2   x3
  Person 6                  5    4    7
  Person 7                  3    7    2
  Person 8                  4    5    5
  Person 9                  2    4    3
  Person 10                 2    2    2
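A minimal NumPy sketch of a two-group discriminant on this data, using Fisher's rule w = S^-1 (mA - mB) with the pooled within-group covariance S. The weights are on an arbitrary scale, so they will not match the standardized SPSS coefficients reported later, but Group A's mean score necessarily exceeds Group B's:

```python
import numpy as np

# Data from the hypothetical example (rows = persons, columns = x1..x3).
A = np.array([[8, 9, 6], [6, 7, 5], [10, 6, 3], [9, 4, 4], [4, 8, 2]], float)
B = np.array([[5, 4, 7], [3, 7, 2], [4, 5, 5], [2, 4, 3], [2, 2, 2]], float)

mA, mB = A.mean(axis=0), B.mean(axis=0)
# Pooled within-group covariance matrix (weighted average of the two).
Sp = (np.cov(A.T) * (len(A) - 1) + np.cov(B.T) * (len(B) - 1)) / (len(A) + len(B) - 2)
# Fisher's linear discriminant weights.
w = np.linalg.solve(Sp, mA - mB)

scores_A, scores_B = A @ w, B @ w
cutoff = (scores_A.mean() + scores_B.mean()) / 2  # midpoint of the centroids
print(scores_A.mean() > scores_B.mean())  # → True
```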

[Slides 14-16: figures illustrating misclassification of non-adopters]

Output:

Function   Eigenvalue   % of variance   Cumulative %   Canonical correlation
1          3.315        100             100            0.877

Test of functions   Wilks' lambda   Chi-square   d.f.   Sig.
1                   0.232           9.504        3      0.023

Standardized canonical discriminant function coefficients:
Function 1:  x1 = 1.110,  x2 = 0.709,  x3 = -0.564

The discriminant function can be written as:
Z = 1.110 x1 + 0.709 x2 - 0.564 x3

Note: a larger eigenvalue and a smaller Wilks' lambda are preferred.

Predicting group membership: The group centroids, calculated as the mean of each group's discriminant scores, are 10.77 and 4.52. The cut-off score is the average of the two centroids: (10.77 + 4.52) / 2 = 7.645 ≈ 7.65. A person's predicted group (adopter or non-adopter) depends on which side of the cut-off their discriminant score falls.
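Using the centroids above, the cut-off rule can be sketched as follows (the classify helper is hypothetical):

```python
# Centroid values as reported on the slide.
centroid_adopters, centroid_non_adopters = 10.77, 4.52
cutoff = (centroid_adopters + centroid_non_adopters) / 2
print(round(cutoff, 3))  # → 7.645 (reported as 7.65 on the slide)

def classify(score, cutoff=cutoff):
    # Scores above the midpoint are assigned to the adopter group.
    return "adopter" if score > cutoff else "non-adopter"

print(classify(9.1), classify(5.0))  # → adopter non-adopter
```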

Multiple discriminant analysis
When we need to discriminate among more than two groups, we use multiple discriminant analysis. This technique requires fitting g - 1 discriminant functions, where g is the number of groups. The assumptions remain the same for this type. The best discriminant function is judged by comparing the functions.
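One refinement worth noting: at most min(g - 1, p) functions can be extracted, where p is the number of predictors, so the g - 1 rule applies when p is at least g - 1. A one-line sketch (the function name is mine):

```python
def n_discriminant_functions(n_groups, n_predictors):
    # At most min(g - 1, p) discriminant functions can be extracted.
    return min(n_groups - 1, n_predictors)

# Three SHG groups with five predictors -> two functions.
print(n_discriminant_functions(3, 5))  # → 2
```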

Case study 1:

Case study 2
Application of Discriminant Function Analysis in Agricultural Extension Research
Ayogu, Chiebonam Justina; Madukwe, Micheal C.; Yekinni, Oyedeji Taofeeq
A research study was carried out to select the variables which could best discriminate between two groups of extension agents: effective extension agents (Group 1) and ineffective extension agents (Group 2).

1. Analyze > Classify > Discriminant

2. Click the Define Range button and enter the lowest and highest code for your groups.

3. Click the Statistics button and select Means, Univariate ANOVAs, and Box's M.

4. Click Save, select Predicted Group Membership and Discriminant Scores, then click Continue.

Findings of case study 2:

Tests of equality of group means:

Variable                              Wilks' Lambda   F        df1   df2   Sig.
Age                                   .999            .069     1     48    .794
Years of experience                   .710            19.625   1     48    .000
Distance of residence to work place   .999            .065     1     48    .799
Communication skills                  .540            40.846   1     48    .000
Positive attitude to work             .589            33.464   1     48    .000

The table provides statistical evidence of significant differences between the means of the effective and ineffective EA groups for years of experience, communication skills, and positive attitude to work, with communication skills and positive attitude to work producing very high F values.
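The F values in this table can be recovered from each row's Wilks' lambda: for a one-way ANOVA, F = ((1 - Λ) / Λ) × (df2 / df1). A sketch (the function name is mine; small discrepancies come from the table's rounded Λ values):

```python
def f_from_lambda(lam, df1, df2):
    # One-way ANOVA identity relating Wilks' lambda to the F statistic.
    return (1 - lam) / lam * (df2 / df1)

# "Years of experience" row: lambda = .710, df1 = 1, df2 = 48.
F = f_from_lambda(0.710, 1, 48)
print(round(F, 2))  # close to the reported 19.625
```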

Wilks' Lambda

Test of Function(s)   Wilks' Lambda   Chi-square   df   Sig.
1                     .350            47.825       5    .000

The significance of the discriminant function is indicated by Wilks' lambda, which provides the proportion of total variability not explained, i.e. it is the complement of the squared canonical correlation.

Pooled Within-Groups Matrices

Correlation                           Age     Years of exp.   Distance   Comm. skills   Attitude
Age                                   1.000   .094            -.149      -.036          .243
Years of experience                   .094    1.000           -.231      .139           .021
Distance of residence to work place   -.149   -.231           1.000      -.198          -.303
Communication skills                  -.036   .139            -.198      1.000          .214
Positive attitude to work             .243    .021            -.303      .214           1.000

The within-groups correlation matrix shows the correlations between the predictors.

Eigenvalues table

Function   Eigenvalue   % of Variance   Cumulative %   Canonical Correlation
1          1.861        100.0           100.0          .807

An eigenvalue provides information on the proportion of variance explained. A canonical correlation of 0.807 suggests the model explains about 65% (0.807² × 100 ≈ 65.1%) of the variation in the grouping variable, i.e. whether an extension agent is effective or ineffective.
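For a single discriminant function, the eigenvalue, canonical correlation, and Wilks' lambda are tied together: R² = λ/(1 + λ) and Λ = 1/(1 + λ). A quick check with the case-study eigenvalue reproduces both reported values:

```python
import math

eigenvalue = 1.861  # from the case-study output
canonical_r = math.sqrt(eigenvalue / (1 + eigenvalue))
wilks = 1 / (1 + eigenvalue)

print(round(canonical_r, 3))  # → 0.807 (matches the eigenvalues table)
print(round(wilks, 3))        # → 0.35 (matches the Wilks' lambda table)
```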

Structure matrix table

                                      Function 1
Communication skills                  .676
Positive attitude to work             .612
Years of experience                   .469
Age                                   .028
Distance of residence to work place   .027

The structure matrix reports the discriminant loadings: the simple correlations between each predictor and the discriminant function. (The unstandardized coefficients (b), by contrast, operate like unstandardized regression coefficients and are used to create the actual prediction equation for classifying new cases.)


The unstandardized discriminant function is:

D = (-0.009 × age) + (0.053 × years of experience in extension work) + (0.175 × distance of residence to work place) + (0.110 × communication skill) + (0.940 × positive attitude to work) - 5.329
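As an arithmetic check, plugging one hypothetical agent's values (illustrative only, not taken from the study) into this function:

```python
# Hypothetical agent: these input values are made up for illustration.
age, years_exp, distance, comm_skill, attitude = 35, 10, 4, 8, 7

# Unstandardized discriminant function from the case study.
D = (-0.009 * age + 0.053 * years_exp + 0.175 * distance
     + 0.110 * comm_skill + 0.940 * attitude - 5.329)
print(round(D, 3))  # → 3.046
```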

Advantages
- Discriminates between different groups.
- The accuracy of classification into groups can be determined.
- Serves as a regression-like technique when the dependent variable is categorical.
- Visual graphics make the two or more categories, and the computational logic, easy to understand.

Limitations
- Linear discrimination cannot be used when subgroup structure is strong.
- The selection of predictor variables is not reliable unless a strong classification exists.
- It cannot be used when there is insufficient data to define the sample means.

Limitations (contd.)
- If the number of observations is small, the discrimination method cannot be used; the sample should be at least five times the number of predictor variables (Meyers et al., Applied Multivariate Research).
- If the overlap between the group distributions is small, the discriminant function separates the groups well; if the overlap is large, the function is a poor discriminator.

Applications

References
Hajong, D. (2014). A study on agri-entrepreneurship behaviour of farmers. PhD thesis, IARI, New Delhi.
Kothari, C. R. (2004). Research methodology: Methods and techniques. New Age International.
Meyers, L. S., Gamst, G., & Guarino, A. J. (2006). Applied multivariate research: Design and interpretation. Sage.
Poulsen, J., & French, A. (2008). Discriminant function analysis. San Francisco State University, San Francisco, CA.
SPSS Chapter 25 Data File B. Retrieved from www.uk.sagepub.com/
www.youtube.com/watch?v=7zYcMZ-61c4

Conclusion
All great people are gifted with intuition; just analysis and reasoning will fructify their contribution.
Thank you.