Discriminant function analysis (DFA) Presented by Fathima Hameed
Outline Introduction Purpose of discriminant analysis Basics of discriminant function analysis Steps in analysis Discriminant analysis model Hypothesis Similarities & differences Types of DA Assumptions Advantages Limitations Applications References
What is discriminant function analysis? DA is a statistical method used by researchers to understand the relationship between a "dependent variable" and one or more "independent variables". DA is similar to regression analysis (RA) and analysis of variance (ANOVA). DFA is useful in determining whether a set of variables is effective in predicting category membership.
What are discriminant functions? Discriminant analysis works by forming one or more linear combinations of the predictors, creating a new latent variable for each combination. These combinations are called "discriminant functions".
Why do we use DA? DA has various benefits as a statistical tool and is quite similar to regression analysis. It can be used to determine which predictor variables are related to the dependent variable and to predict the value of the dependent variable given certain values of the predictor variables.
When to use DA Data must come from different groups. Group membership must be known before the analysis starts. DA is used to analyse differences between groups and to classify new objects.
Purpose of DA The objective of DA is to develop discriminant functions, i.e., linear combinations of the independent variables, that discriminate between the categories of the dependent variable as cleanly as possible.
Basics of DFA Discriminating variables (predictors): the independent variables from which the discriminant function is constructed. Dependent variable (criterion variable): the object of classification on the basis of the independent variables; it must be categorical, and is known as the grouping variable in SPSS.
Steps in analysis
Discriminant analysis model The DA model involves linear combinations of the following form: D = b₀ + b₁x₁ + b₂x₂ + b₃x₃ + … + bₖxₖ, where D = discriminant score, the b's = discriminant coefficients (weights), and the x's = predictor (independent) variables. The coefficients are estimated so that the groups differ as much as possible on the values of the discriminant function; in other words, DA creates an equation that minimizes the chance of misclassifying cases into their respective groups/categories.
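As an illustrative sketch (not from the presentation), the discriminant score for a single case can be computed directly from the equation above; the coefficient values below are hypothetical stand-ins for values estimated from data.

```python
import numpy as np

# Hypothetical coefficients, standing in for estimates from training data
b0 = -1.2                        # constant term b0
b = np.array([0.8, -0.5, 1.1])   # weights b1..b3

x = np.array([2.0, 3.5, 1.0])    # one case's predictor values x1..x3

# Discriminant score: D = b0 + b1*x1 + b2*x2 + b3*x3
D = b0 + b @ x
print(D)  # -1.2 + 1.6 - 1.75 + 1.1 = -0.25
```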
Canonical correlation: measures the extent of association between the discriminant scores and the groups. Centroid: the mean of the discriminant scores for a particular group. Classification matrix: also called the confusion or prediction matrix; it contains the number of correctly classified and misclassified cases. Discriminant scores: the unstandardized coefficients are multiplied by the values of the variables; these products are summed and added to the constant term to obtain the discriminant scores.
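A minimal sketch (assuming scikit-learn; the data are hypothetical) of how discriminant scores, group centroids, and a classification/confusion matrix can be obtained:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import confusion_matrix

# Hypothetical two-group data: 6 cases, 2 predictors
X = np.array([[1.0, 2.1], [1.2, 1.9], [0.9, 2.3],
              [3.1, 0.8], [2.9, 1.1], [3.3, 0.7]])
y = np.array([0, 0, 0, 1, 1, 1])

lda = LinearDiscriminantAnalysis().fit(X, y)

scores = lda.transform(X)                            # discriminant scores
centroids = [scores[y == g].mean() for g in (0, 1)]  # group centroids

# Rows = actual group, columns = predicted group;
# the diagonal counts the correctly classified cases.
print(confusion_matrix(y, lda.predict(X)))
```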
Hypothesis DA tests the following hypotheses: H₀: the group means of a set of independent variables for two or more groups are equal. H₁: the group means for two or more groups are not equal. Here, these group means are referred to as centroids.
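One common way to test this hypothesis in practice is a MANOVA, whose Wilks' lambda statistic tests the equality of group centroids. A hedged sketch using statsmodels (the data frame below is hypothetical):

```python
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

# Hypothetical predictors x1, x2 and a grouping variable
df = pd.DataFrame({
    "x1":    [1.0, 1.2, 0.9, 3.1, 2.9, 3.3],
    "x2":    [2.1, 1.9, 2.3, 0.8, 1.1, 0.7],
    "group": ["A", "A", "A", "B", "B", "B"],
})

# Wilks' lambda (among other statistics) tests H0: equal group centroids
fit = MANOVA.from_formula("x1 + x2 ~ group", data=df)
print(fit.mv_test())
```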
Similarities & differences in analysis

                                    ANOVA        Regression   Discriminant
  Similarities
    No. of dependent variables      One          One          One
    No. of independent variables    Multiple     Multiple     Multiple
  Differences
    Nature of the dependent         Metric       Metric       Categorical
    Nature of the independent       Categorical  Metric       Metric
1) Linear Discriminant Analysis LDA finds a linear combination of features; it was introduced by Ronald Fisher in 1936. The method groups images of the same class and separates images of different classes. To identify an input test image, the projected test image is compared to each projected training image, and the test image is identified as the closest training image. In its simplest form the classification involves 2 target categories and 2 predictor variables. Images are projected from a 2D space to a C-dimensional space, where C is the number of classes of images.
How does LDA work? Step 1: Calculate the separability between the different classes, also called the between-class variance. Step 2: Calculate the distance between the mean and the samples of each class, called the within-class variance. Step 3: Construct the lower-dimensional space that maximizes the between-class variance and minimizes the within-class variance. (A sketch of these three steps appears below.)
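A minimal NumPy sketch of the three steps (illustrative only; a production analysis would typically use a library such as scikit-learn):

```python
import numpy as np

def lda_directions(X, y):
    """Return discriminant directions: eigenvectors of inv(S_w) @ S_b."""
    classes = np.unique(y)
    mean_all = X.mean(axis=0)
    p = X.shape[1]
    S_b = np.zeros((p, p))  # step 1: between-class scatter (variance)
    S_w = np.zeros((p, p))  # step 2: within-class scatter (variance)
    for c in classes:
        Xc = X[y == c]
        d = (Xc.mean(axis=0) - mean_all).reshape(-1, 1)
        S_b += len(Xc) * (d @ d.T)
        S_w += (Xc - Xc.mean(axis=0)).T @ (Xc - Xc.mean(axis=0))
    # Step 3: directions maximizing between- vs. within-class variance
    eigvals, eigvecs = np.linalg.eig(np.linalg.inv(S_w) @ S_b)
    order = np.argsort(eigvals.real)[::-1]
    return eigvecs.real[:, order]
```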
2) Multiple Discriminant Analysis MDA discriminates among more than 2 groups. It requires g − 1 discriminant functions, where g is the number of groups. The best discriminant function is judged by comparing the functions. As with multiple regression, the assumptions remain the same.
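A short sketch (assuming scikit-learn and its bundled Iris data) showing that g = 3 groups yield g − 1 = 2 discriminant functions:

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Iris has g = 3 classes, so LDA yields at most g - 1 = 2 functions
X, y = load_iris(return_X_y=True)
scores = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)
print(scores.shape)  # (150, 2): two discriminant scores per case
```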
Assumptions in DA The predictors are normally distributed. The variance-covariance matrices of the predictors within each group are equal. Other considerations: sample size, normal distribution, homogeneity of variances/covariances, outliers, non-multicollinearity, mutually exclusive groups, classification, variability. (A sketch of simple assumption checks follows.)
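A hedged sketch (using SciPy; the data are hypothetical) of two simple assumption checks, normality within groups and equality of variances. Note that equality of the full covariance matrices is usually checked with Box's M test, which SciPy does not provide directly.

```python
import numpy as np
from scipy.stats import shapiro, levene

# Hypothetical values of one predictor, split by group
g1 = np.array([1.0, 1.2, 0.9, 1.1, 1.3])
g2 = np.array([3.1, 2.9, 3.3, 3.0, 2.8])

print(shapiro(g1))     # normality of the predictor within group 1
print(shapiro(g2))     # normality of the predictor within group 2
print(levene(g1, g2))  # equality of variances across the two groups
```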
Advantages Discriminates between different groups. The accuracy of group classification can be determined. Serves as a regression-like analysis for categorical outcomes. Visual graphics make two or more categories clear and understandable.
Limitations DA cannot be used when the subgroups are not well defined or the predictor variables are weak. It cannot be used when there are insufficient data or too few observations. Small within-group variability gives good discriminant functions between groups; large within-group variability gives poor ones.
Applications DA is used for both prediction and description. Application areas include agriculture, fisheries, crop and yield studies, geoinformatics, bioinformatics, and social-science research; socioeconomics; hydrological and physico-chemical studies of water sources; face recognition; marketing; financial research; and human resources.