Multivariate_Data_Analysis_Session1.pptx

apsapssingh9 14 views 44 slides Feb 27, 2025

About This Presentation

Factor Analysis and its Types


Slide Content

Multivariate Data Analysis & Factor Analysis

What is Multivariate Data?
- Data with multiple variables for each observation
- Example: Customer dataset with age, income, and spending score
- Used in marketing, finance, healthcare, and social sciences

Why Multivariate Analysis?
- Real-world data is complex and multidimensional
- Helps uncover patterns and relationships
- Used for segmentation, prediction, and classification

Types of Multivariate Analysis
- Factor Analysis: Data reduction
- Regression Analysis: Predictive modeling
- Discriminant Analysis: Classification
- Cluster Analysis: Segmentation

What is Factor Analysis?
- Statistical technique to reduce dimensionality
- Identifies hidden relationships between variables
- Example: Grouping customer satisfaction survey questions into key factors

Types of Factor Analysis
1. Exploratory Factor Analysis (EFA)
2. Confirmatory Factor Analysis (CFA)

Exploratory Factor Analysis (EFA)
EFA is used when the researcher has no prior knowledge of the structure of the data or the number of latent factors. The goal is to explore potential underlying factors without imposing a predefined structure.

Characteristics of EFA:
- Data-driven approach: The number of factors is identified from statistical criteria.
- No predefined factor structure: The algorithm determines the factor loadings.
- Used for theory development: Helps identify potential relationships in the data.

Steps in EFA:
- Extracting Factors: Uses techniques such as Principal Component Analysis (PCA) or Principal Axis Factoring (PAF).
- Determining the Number of Factors: Uses eigenvalues, a scree plot, or the Kaiser Criterion (eigenvalue > 1).
- Factor Rotation: Methods such as Varimax (orthogonal) or Promax (oblique) help interpret factors.
- Interpreting Factors: Factors are interpreted from their high-loading variables.

Example of EFA Use Case: In psychology, EFA can be used to uncover underlying personality traits from survey responses (e.g., discovering the Big Five personality traits).

Exploratory Factor Analysis (EFA)
Exploratory Factor Analysis (EFA) is a statistical technique used to uncover the underlying structure in a dataset by identifying latent (hidden) factors that explain the correlations among observed variables. It is commonly used in psychology, social sciences, finance, and machine learning for dimensionality reduction and pattern recognition.

Understanding EFA
EFA is used when:
- The researcher does not know the number of factors beforehand.
- The goal is to explore potential factor structures in the data.
- Variables are assumed to be correlated and influenced by underlying factors.

It helps answer questions like:
- How many latent factors explain the observed data?
- Which variables are strongly associated with which factors?
- Can we reduce many observed variables into fewer meaningful dimensions?

Steps in Exploratory Factor Analysis
EFA consists of the following key steps:

Step 1: Data Collection & Preparation
- Gather data with multiple observed variables (survey responses, test scores, etc.).
- Ensure a large enough sample size (rule of thumb: at least 5-10 observations per variable).
- Check for missing values, outliers, and data normality.

Step 2: Compute Correlation Matrix
- A correlation matrix is calculated to see how variables relate to each other.
- High correlations indicate shared variance, which suggests latent factors.

Step 3: Determine the Number of Factors
The number of factors can be determined using:
- Eigenvalues (Kaiser’s Criterion: factors with eigenvalues > 1 are retained).
- Scree Plot (look for the "elbow" where the eigenvalues level off after a sharp drop).
- Parallel Analysis (compares the observed eigenvalues with eigenvalues obtained from random data).
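Steps 2 and 3 can be sketched in a few lines of NumPy. This is a minimal illustration, not the deck's own workflow: the data is synthetic, and the names `true_loadings`, `X`, and `n_factors_kaiser` are assumptions introduced here.

```python
import numpy as np

# Synthetic data (an assumption for illustration): 300 respondents and
# 6 items driven by 2 hypothetical latent factors plus random noise.
rng = np.random.default_rng(0)
n_obs = 300
latent = rng.normal(size=(n_obs, 2))
true_loadings = np.array([
    [0.8, 0.0], [0.8, 0.0], [0.8, 0.0],
    [0.0, 0.8], [0.0, 0.8], [0.0, 0.8],
])
X = latent @ true_loadings.T + rng.normal(scale=0.6, size=(n_obs, 6))

# Step 2: correlation matrix of the observed variables (one per column).
R = np.corrcoef(X, rowvar=False)

# Step 3: eigenvalues of the correlation matrix, largest first.
eigenvalues = np.sort(np.linalg.eigvalsh(R))[::-1]

# Kaiser's criterion: retain factors whose eigenvalue exceeds 1.
n_factors_kaiser = int(np.sum(eigenvalues > 1.0))

print("eigenvalues:", np.round(eigenvalues, 2))
print("factors retained (Kaiser):", n_factors_kaiser)
```

Plotting `eigenvalues` against their rank gives the scree plot mentioned in Step 3.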

Step 4: Factor Extraction
Extracts latent factors from the dataset using methods such as:
- Principal Component Analysis (PCA): A computational approach that maximizes explained total variance; it is a data-reduction method rather than a true factor model.
- Principal Axis Factoring (PAF): Estimates latent factors from the shared (common) variance only, excluding each variable's unique variance.
- Maximum Likelihood (ML): Uses statistical likelihood to estimate factor loadings.

Step 5: Factor Rotation
Factor rotation simplifies interpretation by adjusting factor loadings. Common rotation methods:
- Orthogonal Rotation (Varimax): Assumes factors are uncorrelated.
- Oblique Rotation (Promax, Oblimin): Allows correlation between factors.

Step 6: Interpretation of Factor Loadings
- Factor loadings show the strength of the relationship between observed variables and latent factors.
- Variables with high loadings (> 0.4) are assigned to factors.
- Factors are named based on common themes among their high-loading variables.

Step 7: Reliability & Validation
- Check Cronbach’s Alpha for reliability (α > 0.7 is considered good).
- Conduct additional tests such as Confirmatory Factor Analysis (CFA) for validation.
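The reliability check in Step 7 uses the standard formula α = k/(k-1) · (1 - Σ item variances / variance of total scores). Below is a minimal sketch with the standard library only; the three item columns are made-up responses used purely for illustration.

```python
from statistics import variance

def cronbach_alpha(items):
    """Cronbach's alpha for a list of item columns (same respondents in each)."""
    k = len(items)
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    # Sum of the sample variances of the individual items.
    item_variance_sum = sum(variance(column) for column in items)
    # Sample variance of each respondent's total score across items.
    totals = [sum(row) for row in zip(*items)]
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))

# Hypothetical 1-5 ratings from six respondents on three items of one factor.
q1 = [4, 5, 3, 2, 4, 1]
q2 = [4, 4, 3, 2, 5, 1]
q3 = [5, 4, 3, 1, 4, 2]

alpha = cronbach_alpha([q1, q2, q3])
print(f"Cronbach's alpha: {alpha:.3f}")  # about 0.943 for this toy data
```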

Example of EFA
Scenario: Identifying Personality Traits
Suppose we conduct a survey to measure personality traits using 10 questions, where participants rate themselves on a scale of 1 to 5.

Step 1: Compute Correlation Matrix
We calculate correlations between the 10 questions and find that:
- Q1, Q2, Q3 are highly correlated → could indicate an Extroversion factor.
- Q4, Q5 are correlated → could represent Openness.
- Q6, Q7 are correlated → could suggest Neuroticism.
- Q8, Q9, Q10 are correlated → might form a Conscientiousness factor.

Step 2: Determine the Number of Factors
- Eigenvalues suggest keeping 4 factors.
- Scree plot shows a clear "elbow" at 4.
- Parallel analysis confirms 4 meaningful factors.
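Parallel analysis, as used in Step 2, can be sketched as follows. This is one common variant (comparing against the 95th percentile of eigenvalues from random normal data), not necessarily the one behind the deck's result. The data is a synthetic stand-in with the same 3/2/2/3 grouping of ten questions, and all names are assumptions.

```python
import numpy as np

def sorted_eigenvalues(data):
    """Eigenvalues of the correlation matrix of `data`, largest first."""
    corr = np.corrcoef(data, rowvar=False)
    return np.sort(np.linalg.eigvalsh(corr))[::-1]

def parallel_analysis(data, n_sims=200, percentile=95, seed=0):
    """Count leading eigenvalues that exceed those of random data."""
    rng = np.random.default_rng(seed)
    n_obs, n_vars = data.shape
    observed = sorted_eigenvalues(data)
    simulated = np.array([
        sorted_eigenvalues(rng.normal(size=(n_obs, n_vars)))
        for _ in range(n_sims)
    ])
    threshold = np.percentile(simulated, percentile, axis=0)
    n_factors = 0
    for obs, thr in zip(observed, threshold):
        if obs <= thr:  # stop at the first eigenvalue that random data matches
            break
        n_factors += 1
    return n_factors, observed, threshold

# Synthetic stand-in for the survey: 10 items grouped 3/2/2/3 on 4 factors.
rng = np.random.default_rng(42)
n_obs = 500
true_loadings = np.zeros((10, 4))
row = 0
for factor, size in enumerate([3, 2, 2, 3]):
    true_loadings[row:row + size, factor] = 0.8
    row += size
latent = rng.normal(size=(n_obs, 4))
X = latent @ true_loadings.T + rng.normal(scale=0.6, size=(n_obs, 10))

n_factors, observed, threshold = parallel_analysis(X)
print("factors suggested by parallel analysis:", n_factors)
```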

Step 3: Factor Extraction & Rotation
Using Principal Axis Factoring and Varimax Rotation, we extract 4 factors:

Step 4: Interpretation
- Factor 1 (Extroversion): High loadings on Q1, Q2, Q3 (e.g., Socialization, Public Speaking).
- Factor 2 (Openness): High loadings on Q4, Q5 (New Experiences, Creativity).
- Factor 3 (Neuroticism): High loadings on Q6, Q7 (Nervousness, Worrying).
- Factor 4 (Conscientiousness): High loadings on Q8, Q9, Q10 (e.g., Routine, Planning).

Thus, our 10 personality-related questions can be grouped into 4 underlying personality traits.
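The interpretation step amounts to assigning each question to the factor on which it loads most strongly, provided that loading clears the 0.4 threshold mentioned earlier. The sketch below uses hypothetical loading values invented for illustration (they are not the deck's results); `assign_items` is likewise a name introduced here.

```python
# Hypothetical rotated loadings (illustrative values only).
# Columns: Extroversion, Openness, Neuroticism, Conscientiousness.
FACTOR_NAMES = ["Extroversion", "Openness", "Neuroticism", "Conscientiousness"]
loadings = {
    "Q1":  [0.78, 0.10, 0.05, 0.12],
    "Q2":  [0.74, 0.08, -0.11, 0.03],
    "Q3":  [0.69, 0.15, 0.02, 0.09],
    "Q4":  [0.12, 0.81, 0.04, 0.07],
    "Q5":  [0.09, 0.77, -0.06, 0.10],
    "Q6":  [-0.05, 0.03, 0.83, -0.08],
    "Q7":  [0.02, -0.04, 0.79, 0.01],
    "Q8":  [0.06, 0.11, -0.03, 0.72],
    "Q9":  [0.10, 0.05, 0.04, 0.76],
    "Q10": [0.04, 0.09, -0.07, 0.70],
}

def assign_items(loadings, factor_names, threshold=0.4):
    """Group items by their strongest factor if |loading| exceeds the threshold."""
    groups = {name: [] for name in factor_names}
    unassigned = []
    for item, row in loadings.items():
        best = max(range(len(row)), key=lambda i: abs(row[i]))
        if abs(row[best]) > threshold:
            groups[factor_names[best]].append(item)
        else:
            unassigned.append(item)
    return groups, unassigned

groups, unassigned = assign_items(loadings, FACTOR_NAMES)
for name, items in groups.items():
    print(f"{name}: {', '.join(items)}")
```

Items that clear the threshold on more than one factor (cross-loadings) would need a separate check; this sketch only looks at the strongest loading.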

Advantages & Disadvantages of EFA
Advantages
- Reduces data dimensionality by identifying key factors.
- Helps in theory development and identifying latent constructs.
- Provides a basis for creating shorter surveys/tests with relevant questions.
Disadvantages
- Requires subjective interpretation of factor loadings.
- Sensitive to sample size and variable selection.
- Rotation method choice can affect results.

Applications of EFA
- Psychology: Identifying personality traits (e.g., Big Five Model).
- Marketing: Understanding customer preferences (e.g., pricing, quality, brand perception).
- Education: Identifying learning styles in students.
- Healthcare: Finding symptoms related to different diseases.

Confirmatory Factor Analysis (CFA)
CFA is used when the researcher has a predefined hypothesis or model about the structure of the data and wants to test whether the collected data fits the expected factor structure.

Characteristics of CFA:
- Hypothesis-driven: The researcher specifies the expected number of factors and the relationships between variables.
- Model testing: Uses statistical techniques such as structural equation modeling (SEM) to assess model fit.
- Confirmatory nature: Validates whether the assumed structure holds in the dataset.

Steps in CFA:
- Specify the Factor Model: Define the number of factors and which observed variables load onto each factor.
- Estimate Factor Loadings: Use methods such as Maximum Likelihood Estimation (MLE) to compute relationships.
- Assess Model Fit: Fit indices such as the Chi-square test, RMSEA, CFI, and TLI indicate whether the model is appropriate.
- Refine the Model: Adjust the model based on modification indices.

Example of CFA Use Case: In education, CFA can validate whether an intelligence test measures distinct abilities such as verbal, spatial, and logical reasoning, as assumed.

Confirmatory Factor Analysis (CFA)
Confirmatory Factor Analysis (CFA) is a statistical technique used to verify the factor structure of a dataset. Unlike Exploratory Factor Analysis (EFA), which discovers latent factors without prior assumptions, CFA tests a predefined hypothesis about the relationship between observed variables and latent constructs. CFA is widely used in psychology, social sciences, education, marketing, and finance to validate measurement models in research.

Key Concepts of CFA
- Latent Variables (Factors): Unobservable concepts that influence observed variables (e.g., intelligence, personality traits).
- Observed Variables (Indicators): Measured variables that are used to infer latent variables.
- Factor Loadings: The strength of relationships between observed variables and their respective latent factors.
- Model Fit: Statistical tests that determine how well the hypothesized model fits the data.

Difference Between CFA & EFA

Steps in Confirmatory Factor Analysis (CFA)

Step 1: Define the Hypothesized Model
- A theoretical model is defined based on prior research or EFA results.
- The number of factors and their corresponding observed variables are specified.

Step 2: Collect and Prepare Data
- Data is collected using surveys, tests, or other measurement tools.
- The dataset is checked for missing values, outliers, normality, and multicollinearity.

Step 3: Specify the CFA Model
- Define which observed variables load onto which latent factors.
- Set relationships between factors (correlated or independent).

Step 4: Estimate Parameters
- Factor Loadings: Measure how strongly each observed variable is associated with its latent factor.
- Error Terms (Residuals): Account for variance in observed variables not explained by latent factors.
- Covariances: If factors are related, their correlation is estimated.

Step 5: Assess Model Fit
Various model fit indices are used to evaluate how well the model represents the data:

Step 6: Modify and Improve the Model
If model fit is poor, modifications can be made:
- Remove indicators with weak factor loadings (< 0.4)
- Allow error terms to correlate (if justified)
- Add or remove factors based on theory

Step 7: Interpret the Results
- Confirmatory Factor Loadings: If high (above 0.5), observed variables strongly represent latent factors.
- Good Model Fit: Suggests the hypothesized model is consistent with the data.
- Poor Model Fit: Requires model modification or reconsideration of theoretical assumptions.
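Three of the fit indices named earlier (RMSEA, CFI, TLI) can be computed from the chi-square statistics of the hypothesized model and of a baseline (independence) model. The sketch below uses the textbook formulas with made-up chi-square values. The function names and inputs are assumptions, and some software uses N rather than N - 1 in the RMSEA denominator.

```python
import math

def rmsea(chi2, df, n_obs):
    """Root mean square error of approximation (N - 1 convention)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n_obs - 1)))

def cfi(chi2_model, df_model, chi2_base, df_base):
    """Comparative fit index relative to the baseline model."""
    d_model = max(chi2_model - df_model, 0.0)
    d_base = max(chi2_base - df_base, 0.0)
    denom = max(d_model, d_base)
    return 1.0 if denom == 0 else 1.0 - d_model / denom

def tli(chi2_model, df_model, chi2_base, df_base):
    """Tucker-Lewis index (non-normed fit index)."""
    ratio_base = chi2_base / df_base
    ratio_model = chi2_model / df_model
    return (ratio_base - ratio_model) / (ratio_base - 1.0)

# Hypothetical values for a 7-indicator, 3-factor model fitted to 300 cases.
chi2_model, df_model = 18.0, 11
chi2_base, df_base = 520.0, 21
n_obs = 300

fit = {
    "RMSEA": rmsea(chi2_model, df_model, n_obs),
    "CFI": cfi(chi2_model, df_model, chi2_base, df_base),
    "TLI": tli(chi2_model, df_model, chi2_base, df_base),
}
for name, value in fit.items():
    print(f"{name}: {value:.3f}")
```

Commonly cited rules of thumb treat RMSEA below roughly 0.06 to 0.08 and CFI/TLI above roughly 0.90 to 0.95 as acceptable, though the exact cutoffs vary by source.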

Example of CFA
Scenario: Measuring Employee Job Satisfaction
A company wants to validate a job satisfaction survey consisting of three factors:
- Work Environment (WE)
- Salary & Benefits (SB)
- Work-Life Balance (WLB)

Step 1: Define the Hypothesized Model
- 3 latent variables (Work Environment, Salary & Benefits, Work-Life Balance)
- Each latent variable has 2-3 observed variables.

Step 2: Data Collection
- Employees answer each question on a 1-5 Likert scale (1 = Strongly Disagree, 5 = Strongly Agree).
- The dataset is checked for missing values and normality.

Step 3: Specify the CFA Model
- Work Environment (WE) → Q1, Q2, Q3
- Salary & Benefits (SB) → Q4, Q5
- Work-Life Balance (WLB) → Q6, Q7
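The specification in Step 3 can be written down as data before it is handed to any SEM package. The sketch below builds a lavaan-style model string (the `=~` "is measured by" notation used by lavaan in R and by some Python SEM packages) and counts the model's degrees of freedom. It assumes correlated factors and marker-variable identification (one loading per factor fixed to 1); both are assumptions of this example, not statements from the deck.

```python
# Measurement model from the example: factor -> indicators.
model_spec = {
    "WE": ["Q1", "Q2", "Q3"],
    "SB": ["Q4", "Q5"],
    "WLB": ["Q6", "Q7"],
}

def to_lavaan_syntax(spec):
    """Render the model as lavaan-style lines, e.g. 'WE =~ Q1 + Q2 + Q3'."""
    return "\n".join(
        f"{factor} =~ {' + '.join(items)}" for factor, items in spec.items()
    )

def degrees_of_freedom(spec):
    """df = unique variances/covariances minus free parameters."""
    p = sum(len(items) for items in spec.values())  # observed indicators
    k = len(spec)                                   # latent factors
    moments = p * (p + 1) // 2
    free_loadings = p - k            # one loading per factor fixed to 1
    residual_variances = p
    factor_variances = k
    factor_covariances = k * (k - 1) // 2
    free = (free_loadings + residual_variances
            + factor_variances + factor_covariances)
    return moments - free

syntax = to_lavaan_syntax(model_spec)
df = degrees_of_freedom(model_spec)
print(syntax)
print("degrees of freedom:", df)
```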

Step 4: Estimate Parameters
Factor loadings are calculated for each observed variable:
All factor loadings are above 0.5, indicating strong relationships.

Step 5: Assess Model Fit
The following model fit indices are obtained:
The model fits well with the data.

Interpretation
- The Work Environment, Salary & Benefits, and Work-Life Balance factors are valid constructs.
- The survey accurately measures job satisfaction.
- No modifications are needed as model fit is excellent.

Applications of CFA
- Psychology & Social Sciences: Validating personality tests (e.g., Big Five Personality Traits).
- Education: Confirming the structure of academic tests.
- Healthcare: Testing the validity of symptom questionnaires.
- Marketing: Measuring brand perception and customer satisfaction.
- Business: Employee engagement and job satisfaction surveys.

Advantages & Disadvantages
Advantages
- Provides strong theoretical validation for latent constructs.
- Ensures measurement reliability and validity.
- Helps confirm relationships between observed variables and factors.
Disadvantages
- Requires a large sample size for accurate estimation.
- Sensitive to model misspecification.
- Poor model fit may require modifications, making interpretation complex.

Steps in Factor Analysis
1. Collect Data
2. Check Factorability (Bartlett’s Test, KMO Test)
3. Extract Factors (PCA, MLE)
4. Rotate Factors (Varimax Rotation)
5. Interpret Results (Factor Loadings, Scree Plot)
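The factorability checks in step 2 can be sketched with NumPy. Bartlett's test of sphericity uses the statistic χ² = -(n - 1 - (2p + 5)/6) · ln|R| with p(p - 1)/2 degrees of freedom, and the Kaiser-Meyer-Olkin (KMO) measure compares squared correlations with squared partial correlations. The correlation matrix and sample size below are hypothetical, and the p-value lookup (which would need a chi-square distribution function) is left out.

```python
import numpy as np

def bartlett_sphericity(R, n_obs):
    """Chi-square statistic and degrees of freedom for Bartlett's test."""
    p = R.shape[0]
    _, logdet = np.linalg.slogdet(R)
    chi2 = -(n_obs - 1 - (2 * p + 5) / 6) * logdet
    df = p * (p - 1) // 2
    return chi2, df

def kmo(R):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    inv = np.linalg.inv(R)
    scale = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(scale, scale)  # partial correlations
    off_diag = ~np.eye(R.shape[0], dtype=bool)
    r2 = np.sum(R[off_diag] ** 2)
    a2 = np.sum(partial[off_diag] ** 2)
    return r2 / (r2 + a2)

# Hypothetical correlation matrix: two blocks of three items, r = 0.64 within.
block = np.full((3, 3), 0.64)
np.fill_diagonal(block, 1.0)
R = np.block([[block, np.zeros((3, 3))], [np.zeros((3, 3)), block]])

chi2, df = bartlett_sphericity(R, n_obs=300)
kmo_value = kmo(R)
print(f"Bartlett chi-square = {chi2:.1f} on {df} df")
print(f"KMO = {kmo_value:.3f}")
```

A large Bartlett statistic relative to its degrees of freedom indicates the variables are correlated enough to factor. KMO values above roughly 0.6 are often treated as adequate.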

Factor Scores in Factor Analysis
Factor scores are numerical values that represent an individual’s position on a latent (unobserved) factor. They indicate how strongly an individual or an observation aligns with a particular factor in Exploratory Factor Analysis (EFA) or Confirmatory Factor Analysis (CFA). In simple terms, factor scores are similar to standardized test scores: they quantify a person’s or an observation's standing on an underlying factor.

How Are Factor Scores Used?
Factor scores are used in various fields such as:
- Psychology: Measuring personality traits from survey responses.
- Marketing: Understanding customer preferences.
- Finance: Assessing risk factors in investment portfolios.
- Education: Evaluating students' cognitive abilities.

How Are Factor Scores Calculated?
Factor scores are computed using different methods, including:

A. Regression Method (Most Common)
- Uses least squares regression to estimate scores as weighted combinations of the observed variables.

B. Bartlett’s Method
- Produces unbiased estimates of the true factor scores; each score correlates only with its own factor.
- More precise but computationally complex.

C. Anderson-Rubin Method
- Produces uncorrelated, standardized factor scores.
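One standard form of the regression method is F = Z R⁻¹ Λ, where Z holds the standardized observations, R is the correlation matrix, and Λ is the loading matrix. This form is stated here as an assumption, not as the slide's own formula. The sketch below applies it to synthetic data, using principal-component loadings as a simple stand-in for the extraction step; all variable names are hypothetical.

```python
import numpy as np

# Synthetic data (illustrative): 200 observations, 6 variables, 2 factors.
rng = np.random.default_rng(1)
n_obs = 200
true_loadings = np.array([
    [0.8, 0.0], [0.7, 0.0], [0.8, 0.0],
    [0.0, 0.7], [0.0, 0.8], [0.0, 0.7],
])
latent = rng.normal(size=(n_obs, 2))
X = latent @ true_loadings.T + rng.normal(scale=0.6, size=(n_obs, 6))

# Standardize the observed variables and compute their correlation matrix.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
R = np.corrcoef(X, rowvar=False)

# Loadings from the two largest principal components (stand-in extraction).
eigvals, eigvecs = np.linalg.eigh(R)
top = np.argsort(eigvals)[::-1][:2]
loadings = eigvecs[:, top] * np.sqrt(eigvals[top])

# Regression-method weights W = R^{-1} Lambda, then scores F = Z W.
weights = np.linalg.solve(R, loadings)
scores = Z @ weights

print("scores for the first three observations:")
print(np.round(scores[:3], 2))
```

With principal-component loadings the resulting scores are uncorrelated with unit variance; with other extraction methods the regression scores are generally correlated and have variance below 1.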

Example of Factor Scores Calculation
Scenario: A university conducts a study on student academic performance based on three factors:
- Intelligence (F1)
- Study Habits (F2)
- Motivation (F3)

After running factor analysis, each student gets a factor score like:

Interpretation:
- Student A scores high in intelligence (1.5) but low in motivation (-0.2).
- Student C has high motivation (1.8) but struggles with study habits (-0.5).
- These scores help educators personalize teaching strategies.

Rotation in Factor Analysis
In Factor Analysis, rotation is a mathematical transformation applied to the factor loadings to achieve a simpler and more interpretable structure. The goal is to make the relationship between observed variables and factors more distinct, so that each variable loads highly on one factor while having minimal loadings on others. Rotation does not change how well the factors reproduce the data (each variable's communality stays the same); it only redistributes the loadings across factors for better clarity.

Why is Rotation Needed?
- Improves interpretability by making factor loadings clearer.
- Reduces complexity so that each variable loads strongly onto one factor.
- Enhances differentiation between factors.

Example: Without rotation, a variable might have moderate loadings on multiple factors. After rotation, it typically loads strongly on one factor and weakly on the others, making interpretation easier.

Types of Rotation
Rotation is categorized into two main types:

A. Orthogonal Rotation (Factors Remain Uncorrelated)
- Maintains 90-degree angles between factors, keeping them uncorrelated.
- Used when factors are assumed to be independent.
- Common methods: Varimax (most widely used), Quartimax, Equamax.

Example: In psychological research, if factors represent independent traits (e.g., extraversion vs. intelligence), Varimax rotation keeps the factors uncorrelated.

B. Oblique Rotation (Factors Can Be Correlated)
- Allows factors to be correlated, meaning they are not forced to stay at 90-degree angles.
- Often more realistic, as factors in real-world data usually have some correlation.
- Common methods: Oblimin, Promax.

Example: In educational research, reading ability and verbal reasoning might be correlated. Using Oblimin rotation, we can allow these factors to overlap naturally.
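As an illustration of orthogonal rotation, the sketch below implements raw Varimax (without Kaiser normalization) using Kaiser's pairwise planar rotations. It is one possible implementation under stated assumptions, not what a particular statistics package does. The unrotated loadings are a hypothetical simple structure deliberately turned by 30 degrees, so the effect of the rotation is easy to see.

```python
import numpy as np

def varimax(loadings, max_sweeps=50, tol=1e-8):
    """Raw varimax via pairwise planar rotations.

    Returns the rotated loadings and the orthogonal rotation matrix T,
    so that rotated = loadings @ T.
    """
    L = np.array(loadings, dtype=float)
    p, k = L.shape
    T = np.eye(k)
    for _ in range(max_sweeps):
        max_angle = 0.0
        for i in range(k - 1):
            for j in range(i + 1, k):
                x, y = L[:, i].copy(), L[:, j].copy()
                u = x ** 2 - y ** 2
                v = 2 * x * y
                A, B = u.sum(), v.sum()
                C = np.sum(u ** 2 - v ** 2)
                D = 2 * np.sum(u * v)
                # Angle that maximizes the varimax criterion for this pair.
                phi = 0.25 * np.arctan2(D - 2 * A * B / p,
                                        C - (A ** 2 - B ** 2) / p)
                c, s = np.cos(phi), np.sin(phi)
                G = np.array([[c, -s], [s, c]])
                L[:, [i, j]] = np.column_stack([x, y]) @ G
                T[:, [i, j]] = T[:, [i, j]] @ G
                max_angle = max(max_angle, abs(phi))
        if max_angle < tol:  # no pair needed further rotation
            break
    return L, T

# Hypothetical simple structure, then turned by 30 degrees to blur it.
simple = np.array([[0.8, 0.0]] * 3 + [[0.0, 0.8]] * 3)
angle = np.deg2rad(30)
turn = np.array([[np.cos(angle), np.sin(angle)],
                 [-np.sin(angle), np.cos(angle)]])
unrotated = simple @ turn

rotated, T = varimax(unrotated)
print("unrotated loadings:\n", np.round(unrotated, 2))
print("rotated loadings:\n", np.round(rotated, 2))
```

Each row's sum of squared loadings (its communality) is the same before and after rotation, which is the sense in which rotation only redistributes loadings.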

Hands-on Factor Analysis
- Perform Factor Analysis using:
  1. Excel
  2. SPSS
  3. Python (PCA)
- Identify key factors from a dataset
- Interpret the results using Scree Plot & Factor Loadings

Real-World Applications of Factor Analysis
- Marketing: Identifying key factors in customer behavior
- Finance: Risk assessment and investment analysis
- Healthcare: Grouping symptoms to diagnose diseases
- Social Sciences: Personality trait identification

Summary & Key Takeaways
- Multivariate Analysis helps understand complex datasets
- Factor Analysis is useful for dimensionality reduction
- Hands-on experience with Excel, SPSS, or Python
- Applications in marketing, finance, healthcare, and more