Introduction to Factor Analyses staticstics.pptx

AkhilShrivastav4 10 views 50 slides Jun 07, 2024

About This Presentation

Factor analytics


Slide Content

Introduction to Factor Analyses Amitava Bandyopadhyay SQC & OR Division Indian Statistical Institute 1

Contents Concept of latent (unobservable) variables and dimensions. Meaning of factor loadings and their usages 2

Examples The CEO of a company wants to know how the company is perceived by the customers. S/he is aware that perception is a conceptual variable and may have many dimensions. S/he wants to discover the different dimensions of perception on the basis of a variety of questionnaires administered on different members of the customers at different periods of time. Customer satisfaction consists of many different dimensions. A company has proposed certain dimensions and wanted to find customer satisfaction scores by finding answers to specific questions within the proposed dimensions. As an analyst you want to verify whether the suggested questions truly represent the proposed dimensions and whether some scale can be developed to measure customer satisfaction. A project may fail due to many broadly defined causes like excessive complexity; high level of expectation of the customers; difficult working environment and so on. Note that these causes are themselves defined at a rather high level and are not truly observable. It may be necessary to identify variables to measure these causes and eventually develop a cause-effect relationship between these broadly defined causes and occurrence of failure. A manufacturing company may wish to find the major dimensions of the many characteristics it controls so that only a handful of the characteristics may be looked at. 3

Case Example Suppose we have the test scores of students in the following subjects: Mathematics (M), Physics (P), Chemistry (C), History (H), English (E), and Bengali (B). Suppose we assume that the scores are functions of general intelligence, I. In addition, it may be assumed that a student's aptitude differs across subject areas, and that the aptitude for a particular subject is unique to the student. Thus, the score obtained by a particular student is assumed to be a function of (1) the general level of intelligence of the student, and (2) the aptitude of the student for the particular subject. The scores may therefore be expressed in terms of the following equations:

$M = 0.80\,I + A_m$; $P = 0.90\,I + A_p$; $C = 0.70\,I + A_c$; $H = 0.50\,I + A_h$; $E = 0.65\,I + A_e$; $B = 0.60\,I + A_b$   (A)

It may be observed that the score obtained by a student in any subject, say mathematics, is a linear combination of the general level of intelligence and his / her aptitude in that subject. The general intelligence level is called the common factor that impacts the individual variables. The associated coefficients (0.80, 0.90 and so on) are called the factor loadings. Note that the common factor is an unobservable construct. The equations given in (A) may be considered as a set of regression equations on unobserved explanatory variables. These variables are called latent variables or dimensions. They are also referred to as constructs / factors. 4
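The role of the loadings in equations (A) can be illustrated with a short sketch. It assumes, beyond what the slide states, that the scores and I are standardized to unit variance and that the aptitude terms are uncorrelated with I and with each other. Under those assumptions the correlation between two subjects is the product of their loadings, and the squared loading is the share of variance due to I. The function names are illustrative only.

```python
# Loadings on the common factor I, taken from equations (A).
loadings = {"M": 0.80, "P": 0.90, "C": 0.70, "H": 0.50, "E": 0.65, "B": 0.60}

def implied_correlation(a, b):
    """Correlation between subjects a and b implied by the one-factor model
    (assumes standardized scores and uncorrelated aptitude terms)."""
    return loadings[a] * loadings[b]

def shared_variance(a):
    """Proportion of the variance of subject a explained by the common factor."""
    return loadings[a] ** 2

print(round(implied_correlation("M", "P"), 2))  # 0.72
print(round(shared_variance("M"), 2))           # 0.64
```

Under these assumptions Mathematics and Physics would correlate at about 0.72, while History and Bengali would correlate at only 0.30, purely because of their different loadings on I.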

Some Definitions
Domain: The phenomenon of interest in a particular project, such as assessment of the complexity of a product; measuring the level of satisfaction of customers; developing a scale for the measurement of the skill of operators; assessing the level of riskiness of projects, etc. Studies are taken up in specific domains.
Manifest Variables (MV): Variables that can be directly measured from the domain under investigation. Examples of MVs for the domain "customer satisfaction for a telecom service provider" are:
Network performance: strength of signal when calling / receiving calls from indoor locations; clarity of calls when calling / receiving calls from outdoor locations; availability of network in remote areas.
Billing performance: accuracy of bills; timely availability of bills; ease of payment of bills.
What could be the manifest variables in the case of a blast furnace or steel making?
Latent Variable: These variables are essentially conceptual and cannot be measured directly. They are often referred to as constructs or factors. As these variables are not directly observable, scores for these variables cannot be determined precisely. 5

Definitions (Continued….) Basic Principle: There exists a small number of factors in a given domain. These factors influence the manifest variables and consequently they co-vary. Thus, it is assumed that the manifest variables are correlated (have covariance) as they are impacted by the same factor. It is further assumed that variation in an MV and covariance between MVs occur because of the variation of the factors. Note on Factor Analysis: Factor analysis is an interdependency technique. The data are collected as a p X N matrix where p is the number of variables and N gives the number of respondents or entities studied. The information contained in the data are extracted in terms of variances / co-variances or correlation matrix. Factor analysis aims at identifying the structure (i.e. identify the factors and the MVs that are impacted by the factors) that produce the variance / covariance or equivalently the correlations. 6

Factor Loadings Influence of a factor on MVs is measured by factor loading. Factor loadings are similar to regression coefficients, representing the influence of a factor (independent variable) on an MV (dependent variable). Note: A factor is defined by a subset of the manifest variables that are substantially influenced by the factor. These manifest variables have large loadings. Aim of factor analysis: Given the data, we want to determine the number and nature of the underlying factors and their pattern of influence (loadings) on the MVs. 7

Factor Loadings (Continued…) There are two types of loadings. Pattern Loadings: The coefficients of the regression equations with the manifest variables as the dependent variables and the factors as the independent variables are called pattern loadings. Structure Loadings: The correlations between the factors and the manifest variables are known as the structure loadings. Notes: When factors are orthogonal, the pattern loadings and the structure loadings are the same. (In a regression equation with standardized variables, the regression coefficients equal the simple correlations between the independent and dependent variables when all independent variables are uncorrelated.) With orthogonal factors, the correlation between two manifest variables is the sum of the products of their pattern loadings on the different factors. If there are m factors, the correlation between manifest variables j and k is given by $\rho_{jk} = \sum_{i=1}^{m} \lambda_{ji}\lambda_{ki}$. 8

Factor Loadings and Communality Communality is the variance that a manifest variable shares with the common factors. Higher communality indicates a better fit of the factor model, as the variance shared between the common factors and the manifest variables increases. For orthogonal factors, communality is computed as the sum of squares of the structure loadings of a manifest variable on the common factors. For a standardized variable, the communality gives the proportion of variance the manifest variable shares with the common factors. The unique variance of the manifest variable $x_j$ is defined as $V(x_j) - \mathrm{Communality}(x_j)$. 9

Common Factor Model Common Factors: More than one MV is influenced by a common factor. As these factors are common to more than one MV, they are called "common factors". MVs are correlated because they are influenced by the same (one or more) common factors. Unique Factor: Influences only one manifest variable. These factors explain only that part of the MV that is not explained by the common factors. Unique factors do not explain correlations among MVs; they account only for the variance in each MV that is not accounted for by the common factors. Each unique factor has two components: (1) a specific factor that is due to the particular MV only, and (2) error of measurement. 10

Communality and Reliability The observed and unique variances of the Manifest Variables are given by Observed Variance = Common Variance + Unique Variance Unique Variance = Specific Variance + Error variance Communality is defined to be the proportion of observed variance attributable to common factors, i.e. the extent to which the variation of an MV is due to the variation of the common factor that influences it. When communality is large, the common factor has a large influence Communality = Common Variance / Observed Variance = 1 – Unique Variance / Observed Variance Reliability is defined to be proportion of observed variance that is systematic, i.e. does not occur due to measurement error Reliability = (Common Variance + Specific Variance) / Observed variance = 1 – Error Variance / Observed Variance 11
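The two ratios defined on this slide can be checked with a small arithmetic sketch. The variance components below (6, 3 and 1) are hypothetical numbers chosen only for illustration; they do not come from the presentation.

```python
def communality(common, specific, error):
    """Common variance as a proportion of observed variance."""
    observed = common + specific + error
    return common / observed

def reliability(common, specific, error):
    """Systematic (common + specific) variance as a proportion of observed variance."""
    observed = common + specific + error
    return (common + specific) / observed

# Hypothetical variance components for one manifest variable.
common_var, specific_var, error_var = 6.0, 3.0, 1.0

print(communality(common_var, specific_var, error_var))  # 0.6
print(reliability(common_var, specific_var, error_var))  # 0.9
```

Because the specific variance is counted as systematic but not as common, reliability can never be smaller than communality.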

Example of Factor Analysis Suppose we conduct tests on 4 subjects, namely Paragraph Comprehension, Vocabulary, Arithmetic Skills, and Mathematical Problem Solving, on a set of individuals. The paragraph comprehension skill refers to the ability to read a paragraph and answer questions related to its content. The arithmetic skill test involves questions on different types of arithmetic computation and the general sense that an individual has about numbers. Mathematical problem solving skill involves reading and understanding verbally stated problems, formulating them in mathematical terms and solving them. Suppose the correlations between the scores were found to be as given below:

Tests                                 PC     VOC    ARITH   MPS
Paragraph Comprehension (PC)          1.00
Vocabulary (VOC)                      0.49   1.00
Arithmetic Skill (ARITH)              0.14   0.07   1.00
Mathematical Problem Solving (MPS)    0.48   0.42   0.48    1.00

Can we conclude anything about the underlying abilities that may influence the scores? 12

Example (Continued…) Suppose there are two factors, say Factor 1 and Factor 2. Let the loading matrix for the two factors be as follows:

Manifest Variables              Factor 1   Factor 2
Paragraph Comprehension         0.70       0.10
Vocabulary                      0.70       0.00
Arithmetic Skill                0.10       0.70
Mathematical Problem Solving    0.60       0.60

We wish to interpret the two factors and we argue as follows: For Factor 1, we see high loadings for paragraph comprehension, vocabulary and mathematical problem solving. The manifest variable arithmetic skill has a low loading on this factor. We, therefore, theorize that the underlying ability has something to do with comprehending written matter, even if it contains difficult words, and the ability to use the knowledge to formulate problems. However, sense of numbers does not form a part of this ability. We may, therefore, call this ability "verbal ability". What will be the interpretation of Factor 2? 13
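As a check on this example, the sketch below multiplies the loading matrix by its transpose to obtain the correlations implied by the two factors, assuming the factors are orthogonal. It uses NumPy; the variable names are illustrative. For the numbers on these slides the implied off-diagonal correlations coincide with the observed correlation table.

```python
import numpy as np

variables = ["PC", "VOC", "ARITH", "MPS"]

# Loadings on Factor 1 and Factor 2 from the slide.
loadings = np.array([
    [0.70, 0.10],
    [0.70, 0.00],
    [0.10, 0.70],
    [0.60, 0.60],
])

# Observed correlations from the previous slide.
observed = np.array([
    [1.00, 0.49, 0.14, 0.48],
    [0.49, 1.00, 0.07, 0.42],
    [0.14, 0.07, 1.00, 0.48],
    [0.48, 0.42, 0.48, 1.00],
])

# With orthogonal factors, implied correlation = sum of products of loadings.
implied = loadings @ loadings.T

# The diagonal of the implied matrix holds the communalities.
communalities = np.diag(implied)

off_diagonal = ~np.eye(len(variables), dtype=bool)
print(np.allclose(implied[off_diagonal], observed[off_diagonal]))  # True
print(np.round(communalities, 2))  # communalities: 0.5, 0.49, 0.5, 0.72
```

The communalities of roughly 0.5 for three of the tests show that about half of their variance is left to the unique factors, even though the correlations are reproduced exactly.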

Computation of Correlations In factor analyses we often need to find correlations between variables X and Y when one or both of the variables are nominal or ordinal. The correlations to be applied in these situations are given below:

Type of Correlation    Conditions on X and Y
Tetrachoric            Both X and Y are dichotomous, i.e. nominal or ordinal
Polychoric             Both X and Y are ordinal variables representable through an r X c contingency table
Biserial               X is dichotomous (nominal / ordinal) and Y is quantitative
Polyserial             X is ordinal with k levels (k ≥ 2) and Y is quantitative
14

Examining The Correlations First the correlations are examined visually. A substantial number of the correlations should be reasonably large: we expect the absolute values of many correlations to be greater than 0.30. The nature of the relationship between variables may be looked at in terms of partial correlations as well. If "true" factors exist in the data, the partial correlations should be small. When true factors exist, the correlations between manifest variables are due to the underlying factors, so the correlation between two variables after adjusting for the effect of the other variables should be small. If the partial correlations are not small, there are reasons to believe that the factor model is inappropriate. SPSS and SAS provide the anti-image correlation matrix, whose off-diagonal elements are the negatives of the partial correlations. 15

Examining The Correlations (Continued….) Bartlett’s Test of Sphericity: The null hypothesis that the correlation matrix is an identity matrix is tested. A $\chi^2$ statistic is computed from the sample correlations. A low p-value leads to the rejection of the null hypothesis and implies that the correlation matrix differs significantly from the identity matrix. Rejection of the null hypothesis implies that factor analysis may be carried out on the data. Kaiser-Meyer-Olkin (KMO) Measure of Sampling Adequacy (MSA): This is an index used to examine the appropriateness of factor analysis. The index ranges from 0 to 1, reaching 1 when each variable is perfectly predicted without error by the other variables. The measure can be interpreted as follows: > 0.90 – marvelous; > 0.80 but ≤ 0.90 – meritorious; > 0.70 but ≤ 0.80 – middling; > 0.60 but ≤ 0.70 – mediocre; > 0.50 but ≤ 0.60 – miserable; ≤ 0.50 – unacceptable. 16

Some Cautions and Guidelines Regarding MSA The MSA increases as: the sample size increases; the average correlation increases; the number of variables increases; the number of factors decreases. The MSA may be computed for individual variables as well. Overall Guideline: The analyst should first examine the MSA values for each variable and exclude those falling in the unacceptable range. The computations and the logic behind the overall and individual MSA are given in a subsequent slide. Once the individual variables achieve an acceptable level, the overall MSA may be evaluated and finally a decision may be taken regarding continuation of the factor analysis. 17

Bartlett’s Test of Sphericity - Explanation Let R be the sample correlation matrix computed from n observations on p variables. The test statistic for Bartlett’s test of sphericity is computed as

$T = -\left(n - 1 - \frac{2p + 5}{6}\right)\ln(\det(R))$

Under $H_0$ (i.e. the population correlation matrix is an identity matrix), the test statistic T approximately follows a $\chi^2$ distribution with p(p – 1) / 2 degrees of freedom. Note that when R ≈ I, det(R) ≈ 1 and hence T ≈ 0. On the other hand, when the variables are highly correlated, det(R) ≈ 0, so ln(det(R)) is a large negative number and T is large. 18
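The statistic can be computed directly from a correlation matrix. The following is a minimal sketch using NumPy, applied to the four-test correlation matrix from the earlier example; the sample size n = 100 is a hypothetical value, since the slides do not state one. The p-value is not computed here; it could be obtained from the upper tail of the chi-square distribution, for example with `scipy.stats.chi2.sf(statistic, df)` if SciPy is available.

```python
import numpy as np

def bartlett_sphericity(R, n):
    """Return Bartlett's test statistic T and its degrees of freedom
    for a p x p correlation matrix R computed from n observations."""
    R = np.asarray(R, dtype=float)
    p = R.shape[0]
    # slogdet returns (sign, log|det|); the log-determinant is numerically
    # safer than taking log(det(R)) when det(R) is close to zero.
    _, log_det = np.linalg.slogdet(R)
    statistic = -(n - 1 - (2 * p + 5) / 6) * log_det
    df = p * (p - 1) // 2
    return statistic, df

# Correlations of PC, VOC, ARITH and MPS from the earlier example.
R_example = np.array([
    [1.00, 0.49, 0.14, 0.48],
    [0.49, 1.00, 0.07, 0.42],
    [0.14, 0.07, 1.00, 0.48],
    [0.48, 0.42, 0.48, 1.00],
])

n_hypothetical = 100  # assumed sample size, not given in the slides
statistic, df = bartlett_sphericity(R_example, n_hypothetical)
print(f"T = {statistic:.2f} with {df} degrees of freedom")
```

For an identity matrix the log-determinant is zero, so the function returns T = 0, in line with the note on the slide.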

KMO Test – Explanation In the KMO test the correlation matrix is the starting point. The correlation matrix and the partial correlation matrix are computed, and the KMO measure compares the partial correlations with the correlations. Let R be the sample correlation matrix, $R = ((r_{ij}))$, i, j = 1, 2, …, p, where p is the number of variables. Let $R^{-1} = ((\nu_{ij}))$ and let A be the matrix of partial correlations $((a_{ij}))$, where

$a_{ij} = -\dfrac{\nu_{ij}}{\sqrt{\nu_{ii}\,\nu_{jj}}}$

$\mathrm{KMO}_{overall} = \dfrac{\sum\sum_{i \neq j} r_{ij}^2}{\sum\sum_{i \neq j} r_{ij}^2 + \sum\sum_{i \neq j} a_{ij}^2}$

$\mathrm{KMO}_{j} = \dfrac{\sum_{i \neq j} r_{ij}^2}{\sum_{i \neq j} r_{ij}^2 + \sum_{i \neq j} a_{ij}^2}$

$\mathrm{KMO}_{overall}$ gives the overall measure of sampling adequacy and $\mathrm{KMO}_j$ gives the adequacy at the level of variable j. It is easily observed that the MSA increases as the partial correlations become closer and closer to zero, which is why it serves as a measure of the adequacy of factor analysis. 19
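The KMO formulas translate almost line by line into code. The sketch below uses NumPy and follows the definitions on this slide: invert R, scale the inverse to obtain the partial correlations, and compare sums of squared off-diagonal elements. The function name `kmo` is illustrative and is not taken from any particular package.

```python
import numpy as np

def kmo(R):
    """Return (overall KMO, per-variable KMO) for a correlation matrix R."""
    R = np.asarray(R, dtype=float)
    inv = np.linalg.inv(R)                    # R^{-1} = ((nu_ij))
    scale = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(scale, scale)   # a_ij = -nu_ij / sqrt(nu_ii * nu_jj)

    off_diagonal = ~np.eye(R.shape[0], dtype=bool)
    r2 = np.where(off_diagonal, R ** 2, 0.0)        # squared correlations, i != j
    a2 = np.where(off_diagonal, partial ** 2, 0.0)  # squared partial correlations

    overall = r2.sum() / (r2.sum() + a2.sum())
    per_variable = r2.sum(axis=0) / (r2.sum(axis=0) + a2.sum(axis=0))
    return overall, per_variable

# Correlations of PC, VOC, ARITH and MPS from the earlier example.
R_example = np.array([
    [1.00, 0.49, 0.14, 0.48],
    [0.49, 1.00, 0.07, 0.42],
    [0.14, 0.07, 1.00, 0.48],
    [0.48, 0.42, 0.48, 1.00],
])

overall, per_variable = kmo(R_example)
print(round(overall, 3), np.round(per_variable, 3))
```

A useful sanity check is the two-variable case: with only two variables the partial correlation equals the ordinary correlation, so the KMO is exactly 0.5 whatever the correlation is.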

General Note on Factor Analysis The conceptual assumptions underlying factor analysis relate to the set of variables and the sample chosen. A basic assumption is that some underlying structure does exist in the set of selected variables. It is the responsibility of the analyst to ensure that the observed patterns are conceptually valid and appropriate to study using factor analysis. It should be noted that the technique has no means of validation except looking at the correlations among variables. Some examples are given below: Mixing dependent and independent variables in a single factor analysis and then using the derived factors to study dependency relations is inappropriate. The analyst may instead focus on a set of explanatory variables and try to discover factors that provide guidelines regarding the broad dimensions of the explanatory variables. It is inappropriate to mix two or more different segments. For instance, it is inappropriate to apply factor analysis to a sample of males and females on variables that are known to differ between these two segments. When the two subsamples are combined, the resulting correlations and factor structure will be a poor representation of the structure in each group. In these cases, it will usually be better to carry out a separate analysis for each group and compare the analyses with each other, as well as with the findings of the combined group, before taking a decision about the underlying constructs. 20

Common Factor Model Let $X_1, X_2, X_3, \ldots, X_p$ be the manifest variables. Let $F_1, F_2, F_3, \ldots, F_m$ be the m factors, m < p. In practice p should be much higher than m. Let us assume that n subjects are being studied. Thus the data matrix is an n X p matrix and $x_{ij}$ denotes the score on manifest variable j for subject i. Let $f_{ik}$ be the factor score for subject i on factor k; i = 1, 2, …, n and k = 1, 2, …, m. Let $\lambda_{jk}$ be the loading for variable j on factor k; j = 1, 2, …, p and k = 1, 2, …, m. Then

$x_{ij} = \mu_j + \sum_{k=1}^{m} f_{ik}\,\lambda_{jk} + 1 \cdot u_{ij}$

where $\mu_j$ is the mean of manifest variable j and the summation is over the m factors. Note that $u_{ij}$ is the unique factor score for subject i on variable j. By definition, the unique factor impacts manifest variable j only and has a loading of 1.0. Note that $u_{ij} = s_{ij} + e_{ij}$, where $s_{ij}$ denotes the specific factor and $e_{ij}$ represents the error. 21

Deciding About Number Of Factors The number of factors to be extracted needs to be decided in advance. Two methods to determine the number of factors to be extracted are: Substantive knowledge: We may have some idea about the major dimensions in the domain under consideration. For example, human health has dimensions like strength, endurance, balance, sensory organs, ability to coordinate and immunity. Restaurant service may have dimensions like taste, timely service, behaviour, cleanliness etc. Iron and steel making may have known dimensions. Scree Plot: The eigenvalues of the correlation matrix are plotted in decreasing order to see where the curve becomes more or less parallel to the X-axis. The plot is visually examined and a sharp bend (often referred to as the elbow) is detected. The number of eigenvalues appearing above the elbow is taken as the number of factors to be extracted. 22
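The values shown in a scree plot are simply the eigenvalues of the correlation matrix sorted in decreasing order. The sketch below computes them with NumPy for the four-test correlation matrix used in the earlier example and prints a crude text version of the plot; a charting library could be used instead, which is left out here to keep the example self-contained.

```python
import numpy as np

# Correlations of PC, VOC, ARITH and MPS from the earlier example.
R_example = np.array([
    [1.00, 0.49, 0.14, 0.48],
    [0.49, 1.00, 0.07, 0.42],
    [0.14, 0.07, 1.00, 0.48],
    [0.48, 0.42, 0.48, 1.00],
])

# eigvalsh is meant for symmetric matrices and returns ascending eigenvalues.
eigenvalues = np.linalg.eigvalsh(R_example)[::-1]

# Text-mode scree plot: one bar per eigenvalue, largest first.
for rank, value in enumerate(eigenvalues, start=1):
    bar = "#" * int(round(value * 20))
    print(f"{rank}: {value:6.3f} {bar}")
```

The eigenvalues of a correlation matrix always add up to the number of variables, so each eigenvalue divided by p can be read as the share of total variance associated with that component.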

Scree Plot [Figure: scree plot] Exercise: Explain the meaning of a scree plot and how it is interpreted, and explain the chart shown on this slide. 23

Methods of Factor Extraction The three most popular methods of factor extraction in the context of exploratory factor analysis are Principal Component Factoring (PCF), Principal Axis Factoring (PAF) and Maximum Likelihood (ML). PAF and ML are usually better methods. ML should be used when the MVs follow a normal distribution. Otherwise PAF may be used. 24

Principal Axis Factoring In this method attempts are made to estimate the communalities in a step-by-step manner. This is an iterative procedure and the steps are as follows:
1. The prior estimates of the communalities are taken as 1 and the PCF solution is obtained.
2. A decision is taken regarding the number of factors to be extracted, and the loadings, unique variances and communalities are computed.
3. The maximum change in the estimated communalities is computed. To start with, the estimated communalities are compared with 1 and the maximum difference is noted. In subsequent steps the differences between successive estimates are looked at.
4. If the maximum change is greater than a predefined criterion, the original correlation matrix is modified by replacing the diagonal elements by the latest estimated communalities, and steps 2 and 3 are repeated.
25
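The iteration described on this slide can be sketched as follows. This is a minimal illustration using NumPy, not the implementation of any statistical package: the convergence criterion, the iteration cap and the clipping of negative eigenvalues are choices made for this sketch. It is applied to the four-test correlation matrix from the earlier example with two factors.

```python
import numpy as np

def principal_axis_factoring(R, n_factors, tol=1e-4, max_iter=200):
    """Iterative principal axis factoring on a correlation matrix R.
    Returns (loadings, communalities)."""
    R = np.asarray(R, dtype=float)
    communalities = np.ones(R.shape[0])          # step 1: prior estimates of 1
    for _ in range(max_iter):
        reduced = R.copy()
        np.fill_diagonal(reduced, communalities) # put current estimates on the diagonal
        eigvals, eigvecs = np.linalg.eigh(reduced)
        top = np.argsort(eigvals)[::-1][:n_factors]
        # Negative eigenvalues can occur for a reduced matrix; clip them to zero.
        kept = np.clip(eigvals[top], 0.0, None)
        loadings = eigvecs[:, top] * np.sqrt(kept)            # step 2: loadings
        new_communalities = (loadings ** 2).sum(axis=1)       # step 2: communalities
        change = np.max(np.abs(new_communalities - communalities))  # step 3
        communalities = new_communalities
        if change <= tol:                                     # step 4: stop or repeat
            break
    return loadings, communalities

# Correlations of PC, VOC, ARITH and MPS from the earlier example.
R_example = np.array([
    [1.00, 0.49, 0.14, 0.48],
    [0.49, 1.00, 0.07, 0.42],
    [0.14, 0.07, 1.00, 0.48],
    [0.48, 0.42, 0.48, 1.00],
])

paf_loadings, paf_communalities = principal_axis_factoring(R_example, n_factors=2)
print(np.round(paf_loadings, 3))
print(np.round(paf_communalities, 3))
```

The first pass, with ones on the diagonal, is the principal component factoring solution; later passes differ only in the diagonal of the matrix being decomposed. The unrotated loadings returned here need not resemble the loading table in the earlier example, because factor solutions are determined only up to rotation.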

Outputs of Factor Analysis As a result of the factor analysis we get the loadings for the different factors, the standardized scores and certain tests for the goodness of the factor model. Loadings: We need to first examine the loadings and understand which factors impact which manifest variables. Generally, loadings with absolute value > 0.60 are considered high. However, many practitioners take 0.50 and even 0.40 as cut-off points. Whenever the absolute value of the loading of a particular variable on a factor exceeds the cut-off point, it is assumed that the particular variable is impacted by the factor. We look at all the variables that load significantly on each factor and attempt to attach labels or meanings to the factors. Discovering and naming the right factors is the most important objective of factor analysis. 27
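Applying a cut-off to a loading matrix is a mechanical step that can be sketched in a few lines. The example below reuses the two-factor loadings from the earlier four-test example with a cut-off of 0.50; the function name is illustrative.

```python
variables = ["Paragraph Comprehension", "Vocabulary",
             "Arithmetic Skill", "Mathematical Problem Solving"]

# Loadings on Factor 1 and Factor 2 from the earlier example.
loadings = [
    [0.70, 0.10],
    [0.70, 0.00],
    [0.10, 0.70],
    [0.60, 0.60],
]

def variables_per_factor(names, matrix, cutoff):
    """Map each factor number to the variables whose absolute loading
    on that factor exceeds the cut-off."""
    n_factors = len(matrix[0])
    return {
        factor + 1: [name for name, row in zip(names, matrix)
                     if abs(row[factor]) > cutoff]
        for factor in range(n_factors)
    }

assignment = variables_per_factor(variables, loadings, cutoff=0.50)
for factor, members in assignment.items():
    print(f"Factor {factor}: {', '.join(members)}")
```

With this cut-off, Mathematical Problem Solving appears under both factors, which is an example of a cross-loaded variable.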

Structure Matrix - ADM JAVA
Factor 1 2 3
Build Tools                     .789
Code Quality Tools              .827
Core Java                       .697
Cross Cultural Adaptability     .693
Customer Expectation Mgmt.      .678
Drive for Results               .638
IDE                             .720
Interpersonal Effectiveness     .626
Java Script                     .756
Java Servlets                   .797
JDBC                            .555
JSP                             .845
Knowledge Management            .804
Leadership                      .727
ORM Tools                       .789
Process Focus                   .681
Process Management              .793
Professional Development        .720
Quality Assurance               .742
Requirements Management         .691
REST Web services               .707
SCM Focus                       .669
SOA                             .637
Spring Core                     .820
Spring MVC                      .781
Struts                          .650
Testing Frameworks              .748

Dot Net Competencies
S.N  DotNet Competency: Description
1  .NET Design and Architecture: Design and architecture of software using the .NET framework.
2  .NET Dev Tools & Best Practice: Knowledge of various .NET based software and their standardized techniques.
3  .Net Framework fundamentals2.0: Knowledge of the .NET V2.0 programming environment.
4  .Net Framework fundamentals3.5: Knowledge of the .NET V3.5 programming environment.
5  .NET Web Prog 3.5/4.0: Knowledge of web programming using the .NET V3.5/4 programming environment.
6  .NET Windows Prog 3.5/4.0: Knowledge of Windows programming using the .NET V3.5/4 programming environment.
7  Application Support: Range of services providing assistance with computers and software products.
8  Communication: Communication skills.
9  Cross Cultural Adaptability: Adaptability in terms of ethnicity, rationality etc.
10  Customer Expectation Mgmt.: To move customers from satisfied to loyal and then from loyal to advocate by improving customer value and customer experience by implementing CRM.
11  Database Design: Producing a detailed data model of a database.
12  Domain Knowledge: Knowledge used to refer to an area of human endeavour, an autonomous computer activity, or other specialized discipline.
13  Drive for Results: A functional and behavioural quality that, when fully realized, can help lead to professional success.
14  Estimation: Estimation of various parameters of a project.
15  Interpersonal Effectiveness: Skills which help to attend to relationships, balance priorities versus demands, balance the 'wants' and 'shoulds', and build a sense of mastery and self-respect.
16  Knowledge Management: Strategies and practices used to identify, create, represent, distribute and enable adoption of insights and experiences.
17  Leadership: Leadership qualities.
29

Scree Plot 30 ADM – Dot Net Competencies

Study of Correlations

Range of        No. of Correlations          Proportion of Correlations (%)
Correlation     ADM - Java   ADM - DotNet    ADM - Java   ADM - DotNet
< = 0
0 – 0.1         9            7               2.47         2.89
0.1 – 0.2       138          56              37.91        23.14
0.2 – 0.3       82           60              22.53        24.79
> = 0.3         135          119             37.09        49.17
Total           364          242             100          100

It is noted that a large number of correlations are reasonably high and hence there is a prima facie reason to go ahead with factor analysis. 31

KMO and Bartlett’s Test

KMO and Bartlett's Test for ADM - Java
Kaiser-Meyer-Olkin (KMO) Measure of Sampling Adequacy: 0.916
Bartlett's Test of Sphericity: Approx. Chi-Square 4816.062; df 351; Sig. 0.000

KMO and Bartlett's Test for ADM - DotNet
Kaiser-Meyer-Olkin (KMO) Measure of Sampling Adequacy: 0.906
Bartlett's Test of Sphericity: Approx. Chi-Square 2606.386; df 231; Sig. 0.000

Provide your interpretation. 32

Manifest Variables in Identified Factors
Factor 1: Communication; Customer Expectation Mgmt.; Database Design; Domain Knowledge; Drive for Results; Estimation; Interpersonal Effectiveness; Knowledge Management; Leadership; Problem Solving & Decision Making; Quality Assurance; Requirements Management; Technical Documentation
Factor 2: .NET Design and Architecture; .NET Dev Tools & Best Practice; .Net Framework fundamentals3.5; .NET Web Prog 3.5/4.0; .NET Windows Prog 3.5/4.0
Factor 3: Windows Communication Foundtn.; Windows Workflow Foundation; Windows Presentation Foundation
What could be the three identified broad skill dimensions? 33

Methods of Rotation The initial loadings obtained from the factor extraction method may not lead to clean solutions. We say that a solution is clean when: (1) There is a small number of factors, each with high loadings from three or more manifest variables. (2) Generally we do not have free-standing variables, i.e. we do not have factors with only one variable having a high loading. (3) We do not have cross-loaded variables, i.e. generally one manifest variable is highly loaded on one factor only. However, there are exceptions to this rule. We may design the factors in a way such that there is one general factor. In that case all identified variables will be loaded on the general factor anyway. Apart from the general factor, we will have factors on which a few variables load significantly. Note that factor solutions are indeterminate, i.e. we can rotate the axes and obtain a different set of loadings that satisfies all the basic constraints (what are the constraints?). Thus we can often hope to obtain a better (cleaner) solution by rotating the factors. 34

Types of Rotation There are two broad classes of rotation – orthogonal and oblique Orthogonal rotation produces factors that are uncorrelated whereas oblique rotations allow the factors to be correlated. As the factors remain uncorrelated in orthogonal rotation, the pattern and structure loadings remain the same and it is generally easier to interpret compared to oblique rotation. However, in social sciences, in customer satisfaction oriented domains, in risk related domains, and in software engineering, the factors are likely to be correlated and consequently it may be better to use oblique rotation, even if it increases the complexity of interpretation 35

Orthogonal Rotations Varimax Rotation: In this rotation the major objective is to have a factor structure such that each manifest variable loads highly on only one factor. Such a factor structure will result in each factor representing a distinct construct. Example: Recall the example of marks obtained by students in different subjects. One possibility is to explain the marks through two factors, say verbal ability and mathematical ability. (Refer back to the example and convince yourself.) Exercise: What could be the factors leading to "risk" of a project? Quartimax Rotation: In this method the objective is to arrive at a factor structure such that (1) all variables have a fairly high loading on one factor, say the general factor, and (2) each manifest variable has a high loading on one other factor and near zero loadings on all other factors. Notes: In quartimax rotation the presence of a general factor is assumed. Varimax rotation suppresses the general factor and should not be used when the presence of one is suspected. 36

Comparison of Varimax and Quartimax Rotations In the quartimax method of rotation, attempts are made to simplify the rows of the factor matrix. In contrast, in varimax, attempts are made to simplify the column structure. In the varimax approach, columns are most simplified when they consist of only 0’s and 1’s. The equimax approach attempts a compromise between varimax and quartimax. This method did not gain much acceptance and is used rather infrequently. Extensive studies have been carried out to compare the different methods of rotation. The following points are important to note: Rotation finds a cleaner structure. However, it cannot improve the basic aspects of the analysis like the amount of variance extracted or the residual correlations (i.e. the goodness of fit). Within the orthogonal rotations, varimax appears to provide a cleaner structure. It was also found to be more invariant than quartimax when different subsets of variables are analysed. 37
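An orthogonal rotation can be sketched with a widely used SVD-based iteration for the orthomax family. In this formulation, which is an assumption of the sketch rather than something stated in the slides, the parameter `gamma` selects the criterion: `gamma = 1` corresponds to varimax and `gamma = 0` to quartimax. The code uses NumPy and is applied to the two-factor loadings from the earlier four-test example.

```python
import numpy as np

def orthomax(loadings, gamma=1.0, max_iter=100, tol=1e-8):
    """Orthogonally rotate a p x k loading matrix.
    Returns (rotated loadings, rotation matrix)."""
    L = np.asarray(loadings, dtype=float)
    p, k = L.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = L @ rotation
        # Gradient-like target of the orthomax criterion.
        column_ss = np.diag((rotated ** 2).sum(axis=0))
        target = rotated ** 3 - (gamma / p) * rotated @ column_ss
        u, s, vt = np.linalg.svd(L.T @ target)
        rotation = u @ vt                 # nearest orthogonal matrix
        new_criterion = s.sum()
        if new_criterion < criterion * (1 + tol):
            break                         # no meaningful improvement
        criterion = new_criterion
    return L @ rotation, rotation

# Loadings on Factor 1 and Factor 2 from the earlier example.
example_loadings = np.array([
    [0.70, 0.10],
    [0.70, 0.00],
    [0.10, 0.70],
    [0.60, 0.60],
])

rotated, rotation = orthomax(example_loadings, gamma=1.0)
print(np.round(rotated, 3))
```

Because the rotation matrix is orthogonal, the communality of every variable (the row sum of squared loadings) is the same before and after rotation. This is the sense in which rotation can clean up the structure without changing the amount of variance extracted.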

Rotation Methods (Continued…) Oblique Rotation: These are similar to orthogonal rotations, except that oblique rotations allow correlated factors instead of maintaining independence between the rotated factors. Two oblique rotations are most popular – oblimin and promax 38

Introduction to CFA Exploratory Factor Analysis (EFA) does not require proposing prior hypotheses about which indicator (manifest) variables will have zero loadings on which factor. In Confirmatory Factor Analysis (CFA) the factor structure is proposed beforehand In general the factor structure needs to be proposed from substantive knowledge of the practitioner / researcher. CFA attempts to verify (validate or invalidate) the model 39

Note on EFA and CFA It is dangerous to carry out EFA with a blind rotation to decide on the number of factors and the positions of fixed zero loadings, and then check the model by carrying out CFA on the same data. Checking of the model must be carried out on new data, as otherwise factors identified through mere chance may appear to be theoretically valid constructs. Ideally speaking, the model should be specified before the data are collected. 40

Indices of Goodness of Fit (GOF) Investigators check the validity of a model by carrying out different tests of goodness of fit. Certain points need to be noted Bad fit indicates that the proposed model is unusable Good fit does not tell you that the model is true or correct. It is likely to be usable. However, there may be other models that fit the data just as well or even better. The overall fit must be judged using the GOF statistics, primary loadings and substantive knowledge Six indices namely RMSEA, SRMR, TLI (NNFI), CFI, GFI and AGFI are discussed in this course 41

Indices of Goodness of Fit (Continued…) RMSEA (Root Mean Square Error of Approximation): Measures the discrepancy of the model from the observed correlations. Usually denoted by $\varepsilon_a$. Guidelines of good fit are $\varepsilon_a < 0.05$ – excellent; $0.05 \le \varepsilon_a < 0.08$ – good fit; $0.08 \le \varepsilon_a < 0.15$ – mediocre / acceptable; $\varepsilon_a \ge 0.15$ – unacceptable. SRMR (Standardized Root Mean Square Residual): This index is the square root of the mean of the squared residuals $(S_{ij} - I_{ij})$, where $S_{ij}$ denotes the sample correlations and $I_{ij}$ denotes the implied correlations. For a perfect fit, SRMR is zero. In general SRMR ≤ 0.08 indicates an excellent fit. 42
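For correlation matrices the SRMR can be computed directly from the residuals. The sketch below uses NumPy and averages the squared residuals over the lower triangle including the diagonal; this is one common convention, and software packages differ in such details, so the exact value may not match a given package. The matrices in the usage lines are hypothetical.

```python
import numpy as np

def srmr(sample_corr, implied_corr):
    """Root of the mean squared residual between sample and implied
    correlations, taken over the lower triangle including the diagonal."""
    S = np.asarray(sample_corr, dtype=float)
    I = np.asarray(implied_corr, dtype=float)
    rows, cols = np.tril_indices(S.shape[0])
    residuals = S[rows, cols] - I[rows, cols]
    return float(np.sqrt(np.mean(residuals ** 2)))

# Hypothetical two-variable illustration: the model implies 0.5, the sample shows 0.6.
sample = np.array([[1.0, 0.6], [0.6, 1.0]])
implied = np.array([[1.0, 0.5], [0.5, 1.0]])

print(srmr(sample, sample))              # 0.0 for a perfect fit
print(round(srmr(sample, implied), 4))   # 0.0577
```

In the second call only one of the three lower-triangle residuals is non-zero (0.1), so the result is the square root of 0.01 / 3.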

Indices of Goodness of Fit (Continued…) Tucker Lewis Index (TLI) aka NNFI (Non Normed Fit Index): This index compares the implied fit with the null model. A value of 1 indicates perfect fit. General ranges of TLI / NNFI are: TLI > 0.95 indicates excellent fit; 0.90 ≤ TLI < 0.95 indicates a good fit; 0.80 ≤ TLI < 0.90 indicates mediocre fit and lower values are unacceptable. Comparative Fit Index (CFI): Compares with the null model after adjusting for the free parameters. The ranges of CFI are the same as that of TLI GFI (Goodness of Fit Index) and AGFI(Adjusted Goodness of Fit Index) are two other measures that should usually have values greater than 0.80. A model that fits well with respect to all the indices may be considered to be acceptable 43

Correlation Structure for CSAT 44

KMO and Bartlett Test for Senior Management 45 Kaiser's Measure of Sampling Adequacy : Overall MSA = 0.90883877

Proposed Model 46 Dimensions Reliability Assurance Tangibles Empathy Responsiveness Delivery within time and budget Understands business Problem resolution & associated responsiveness Global delivery model working well Thought leadership Interfacing effectively with your team No business risk No loss of control Business transformation engagement Integral partner

Model Adequacy 47

Summary Factor analysis is a technique to identify factors (latent variables) from data. The data are collected in terms of observable manifest (indicator) variables. The factors are concepts that cannot be directly observed. Two types of factor analysis are EFA and CFA. Factor analysis is carried out on the correlations between the manifest variables. The factor model is often called the common factor model. It is assumed that the variation of the manifest variables is accounted for by a handful of factors. The correlations between manifest variables occur because they are impacted by the same common factors. The correlations between the common factors and the manifest variables are called factor loadings. 48

Summary (Continued…) The variability of a manifest variable is divided into two parts: the common variance and the unique variance. The proportion of variance that is explained by the common factors is called the communality. The higher the communality, the better the factor model. The sum of the products of the loadings of two different variables gives the estimated correlation between the variables. This estimate provides an opportunity to assess the goodness of the model fit. As factor analysis is carried out on the correlations between the manifest variables, a certain amount of correlation between the variables is necessary for a successful factor model. It is important to test whether the correlation matrix of the manifest variables is an identity matrix or not. If the matrix does not differ significantly from an identity matrix, factor analysis should not be carried out. 49

Summary (Continued…) While computing the correlation between the manifest variables, the variable type should be given due consideration. The product moment correlation is not applicable when the manifest variables are categorical. Two different types of factor analyses are the EFA and CFA. In EFA the factors are extracted empirically and in CFA, the investigator specifies which manifest variables load on which factors. The loadings of manifest variables on the factors may not lead to a clean structure. In order to have a clean structure the factors may have to be rotated. There are two types of rotation – orthogonal and oblique. After obtaining a clean structure, the investigator may choose to find the factor score. In computing the factor score the standardized score for the manifest variables need to be computed. The factor scores are indeterminate but may be used as a scale or subscale A number of tests exist to assess goodness of proposed factor models. It is important to assess questionnaires from this perspective. Major departure suggests requirements to change the questionnaires 50