Applying Factor Analysis on Airline Dataset

AnjaliPrajapati75 20 views 18 slides Jun 30, 2024

About This Presentation

I performed PCA, a widely used ML algorithm, on an airline dataset. Note one error in the analysis: age is a demographic variable, so it should be ignored.


Slide Content

FACTOR ANALYSIS on Airline Data

Objective of our Project
INSIGHTFUL ANALYSIS
OPTIMIZE LOGISTIC REGRESSION
PCA FACTORS EXTRACTION
DERIVING INFERENCES

LET’S UNDERSTAND THE DATA!
Data Set Size
Rows: 25976
Columns: 24
VARIABLE OF INTEREST: CHOOSING 16 VARIABLES
Age
Inflight wifi service
Departure/Arrival time convenient
Cleanliness
Gate location
Online boarding
Seat comfort
Checkin service
Inflight entertainment
Ease of Online booking
On-board service
Leg room service
Baggage handling
Food and drink
Inflight service
Satisfaction

STEP 1: EDA (EXPLORATORY DATA ANALYSIS)
Converting Satisfaction Levels to Binary Values
Creating a Subset DataFrame
Correlation Heatmap
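The three EDA steps can be sketched with pandas roughly as follows. The four-row sample, the lower-case `satisfaction` column name, and the raw label strings are assumptions made for illustration; the deck does not show the actual code.

```python
import pandas as pd

# Hypothetical four-row sample; column names follow the variables listed in the deck.
df = pd.DataFrame({
    "Seat comfort": [4, 2, 5, 1],
    "Cleanliness": [5, 2, 4, 1],
    "Gate location": [3, 3, 2, 4],
    "satisfaction": ["satisfied", "neutral or dissatisfied",
                     "satisfied", "neutral or dissatisfied"],
})

# Convert satisfaction levels to binary values (1 = satisfied, 0 = neutral or dissatisfied).
df["satisfaction"] = (df["satisfaction"] == "satisfied").astype(int)

# Create a subset DataFrame holding only the variables of interest.
subset = df[["Seat comfort", "Cleanliness", "Gate location", "satisfaction"]]

# Pearson correlation matrix; a heatmap is a coloured rendering of this table.
corr = subset.corr()
print(corr.round(2))
```

A plotting library such as seaborn or matplotlib could then draw `corr` as a heatmap.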

STEP 2: Bartlett’s Test of Sphericity (Adequacy Test)
Hypothesis:
H0: The observed variables in the dataset are not correlated, so the correlation matrix is an identity matrix (spherical).
V/S
H1: The observed variables in the dataset are correlated, and the correlation matrix is not an identity matrix (non-spherical).
Statistic: Chi Sq: 154473.615
P value: 0.0
The p-value is less than any conventional significance level. Therefore, we reject the null hypothesis.
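For reference, Bartlett's statistic can be computed directly from the correlation matrix as chi-square = -(n - 1 - (2p + 5)/6) * ln|R| with p(p - 1)/2 degrees of freedom. The sketch below uses synthetic data, and the helper name `bartlett_sphericity` is hypothetical; the deck does not say which implementation produced its numbers.

```python
import numpy as np
from scipy.stats import chi2

def bartlett_sphericity(X):
    """Return (chi-square statistic, degrees of freedom, p-value)."""
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    # slogdet is numerically safer than log(det(R)) for near-singular matrices.
    _, logdet = np.linalg.slogdet(R)
    statistic = -(n - 1 - (2 * p + 5) / 6.0) * logdet
    dof = p * (p - 1) / 2
    p_value = chi2.sf(statistic, dof)
    return statistic, dof, p_value

# Synthetic data: four variables sharing one common component, so they are correlated.
rng = np.random.default_rng(0)
common = rng.normal(size=(500, 1))
X = common + 0.5 * rng.normal(size=(500, 4))

stat, dof, p_value = bartlett_sphericity(X)
print(stat, dof, p_value)
```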

Kaiser-Meyer-Olkin (KMO) Test
H0: The observed variables in the dataset are not suitable for structure detection, indicating that the partial correlations are large relative to the simple correlations.
V/S
H1: The observed variables in the dataset are suitable for structure detection, indicating that the partial correlations are close to zero relative to the simple correlations.
KMO Value = 0.7785712021381315
A KMO value above 0.5 is generally considered acceptable for factor analysis; on Kaiser's scale, a value between 0.7 and 0.8 is rated "middling" and 0.8 or above "meritorious".
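The overall KMO measure compares squared simple correlations with squared partial correlations over all off-diagonal pairs. A minimal sketch on synthetic data follows; the function name `kmo` is hypothetical and not taken from the deck.

```python
import numpy as np

def kmo(X):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    R = np.corrcoef(X, rowvar=False)
    inv = np.linalg.inv(R)
    # Partial correlations come from the scaled, negated inverse correlation matrix.
    d = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(d, d)
    off_diag = ~np.eye(R.shape[0], dtype=bool)
    r2 = np.sum(R[off_diag] ** 2)
    p2 = np.sum(partial[off_diag] ** 2)
    return r2 / (r2 + p2)

# Synthetic data: four variables driven by one shared component.
rng = np.random.default_rng(0)
common = rng.normal(size=(500, 1))
X = common + 0.5 * rng.normal(size=(500, 4))

kmo_value = kmo(X)
print(kmo_value)
```

The value approaches 1 when the partial correlations are small compared with the simple correlations.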

STEP 3: PCA Factor Extraction
We computed the eigenvalues and the individual variance explained by each component.
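As an illustration, the eigenvalues of the correlation matrix and the share of variance each one explains can be obtained with NumPy. The data here is synthetic and only stands in for the airline variables.

```python
import numpy as np

# Synthetic stand-in data: five variables, two correlated pairs.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
X[:, 1] += X[:, 0]
X[:, 3] += X[:, 2]

R = np.corrcoef(X, rowvar=False)

# eigvalsh returns ascending eigenvalues for a symmetric matrix; reverse to descending.
eigenvalues = np.linalg.eigvalsh(R)[::-1]

# Each eigenvalue divided by the total gives the individual variance explained.
explained = eigenvalues / eigenvalues.sum()
cumulative = np.cumsum(explained)

for i, (ev, share, cum) in enumerate(zip(eigenvalues, explained, cumulative), start=1):
    print(f"component {i}: eigenvalue={ev:.3f} explained={share:.3f} cumulative={cum:.3f}")
```

Because the input is a correlation matrix, the eigenvalues sum to the number of variables.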

OBTAINING LOADING MATRIX
Using Equamax factor extraction, we derived 6 factors instead of 4, as this resulted in a more meaningful interpretation.
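With principal-component extraction, the unrotated loading matrix is each retained eigenvector scaled by the square root of its eigenvalue. The sketch below keeps 6 factors from 8 synthetic variables; the helper `pca_loadings` and the data are assumptions for illustration, not the author's code.

```python
import numpy as np

def pca_loadings(R, n_factors):
    """Unrotated principal-component loadings for the leading n_factors."""
    vals, vecs = np.linalg.eigh(R)
    order = np.argsort(vals)[::-1][:n_factors]
    return vecs[:, order] * np.sqrt(vals[order])

# Synthetic stand-in data with 8 variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 8))
X[:, 1] += X[:, 0]
X[:, 3] += X[:, 2]
R = np.corrcoef(X, rowvar=False)

loadings = pca_loadings(R, n_factors=6)   # rows = variables, columns = factors
full = pca_loadings(R, n_factors=8)       # keeping every factor reproduces R exactly
print(loadings.round(2))
```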

Rotation Matrix
A factor rotation matrix is a mathematical transformation applied to the factor loadings matrix to achieve a simpler and more interpretable factor structure.
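One way to obtain such a rotation matrix is the orthomax family, where Equamax corresponds to gamma = k/2 for k factors (gamma = 1 would give varimax). The following is a sketch on random loadings, assuming this orthomax formulation; the software used for the deck may differ in normalization and convergence details.

```python
import numpy as np

def orthomax(loadings, gamma, max_iter=500, tol=1e-10):
    """Orthogonal rotation; returns (rotated loadings, rotation matrix)."""
    p, k = loadings.shape
    rotation = np.eye(k)
    objective = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        column_ss = np.diag(np.sum(rotated ** 2, axis=0))
        target = rotated ** 3 - (gamma / p) * rotated @ column_ss
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt                      # product of orthogonal matrices
        new_objective = s.sum()
        if objective > 0 and new_objective < objective * (1 + tol):
            break                              # criterion stopped improving
        objective = new_objective
    return loadings @ rotation, rotation

# Hypothetical loadings: 8 variables on 3 factors.
rng = np.random.default_rng(0)
loadings = rng.uniform(-1, 1, size=(8, 3))

# Equamax: gamma equals half the number of factors.
rotated, rotation = orthomax(loadings, gamma=loadings.shape[1] / 2)
print(rotation.round(3))
```

Because the rotation is orthogonal, each variable's communality is unchanged; only the distribution of loadings across factors shifts.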

COMMUNALITIES & SPECIFIC VARIANCE
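For standardized variables, a communality is the row sum of squared loadings, and the specific variance is what remains out of 1. A small sketch with hypothetical loadings (not values from the deck):

```python
import numpy as np

# Hypothetical loading matrix: 3 variables on 2 factors.
loadings = np.array([
    [0.8, 0.1],
    [0.7, 0.3],
    [0.2, 0.9],
])

# Communality: variance of each variable explained by the retained factors.
communalities = np.sum(loadings ** 2, axis=1)

# Specific variance: the part left unexplained for a standardized variable.
specific_variance = 1.0 - communalities

print(communalities, specific_variance)
```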

OBTAINING FACTOR SCORES
After completing the factor analysis, we obtained the factor scores.
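The deck does not state how the scores were computed. One common choice is the regression method, where scores = Z R^-1 L for standardized data Z, correlation matrix R, and loadings L. A sketch under that assumption, on synthetic data with unrotated loadings:

```python
import numpy as np

# Synthetic stand-in data: five variables, two correlated pairs.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X[:, 1] += X[:, 0]
X[:, 4] += X[:, 3]

# Standardize each variable (sample standard deviation).
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
R = np.corrcoef(X, rowvar=False)

# Unrotated principal-component loadings for two factors.
vals, vecs = np.linalg.eigh(R)
order = np.argsort(vals)[::-1][:2]
loadings = vecs[:, order] * np.sqrt(vals[order])

# Regression-method weights: solve R W = L instead of inverting R explicitly.
weights = np.linalg.solve(R, loadings)
scores = Z @ weights            # one row of factor scores per observation

print(scores[:5].round(3))
```

With principal-component loadings this yields scores with zero mean and unit variance, which is convenient when they are later used as regression inputs.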

STEP 4: Classification Method
We employed logistic regression in the classification process.
We transformed satisfaction ratings into binary outputs, represented by 0's (neutral or dissatisfied) and 1's (satisfied).

Logistic Regression using Scikit-learn
1. Data Splitting: Split into X (factor scores) and y.
2. Classifier Initialization: Initiate Logistic Regression.
3. Hyperparameter Tuning: Use GridSearchCV.
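The three steps above might look like this in scikit-learn. The synthetic factor scores, the 80/20 split, and the grid over `C` are assumptions; the deck does not list the actual split ratio or search grid.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for six factor scores and a binary satisfaction label.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 6))
logits = X @ np.array([1.0, 1.0, 0.8, 0.4, -0.5, 0.5]) - 0.4
y = (logits + rng.logistic(size=400) > 0).astype(int)

# 1. Data splitting (hypothetical 80/20 split).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# 2. Classifier initialization.
classifier = LogisticRegression(max_iter=1000)

# 3. Hyperparameter tuning over a hypothetical grid of regularization strengths.
param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}
search = GridSearchCV(classifier, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

model = search.best_estimator_
test_accuracy = model.score(X_test, y_test)
first_predictions = model.predict(X_test[:20])   # predictions for the first 20 rows
print(search.best_params_, test_accuracy)
```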

ESTIMATOR INSIGHTS: MODEL, COEFFICIENTS
Model: Sigmoid Function
z = -0.44 (intercept) + 0.95*Overall_Inflight_Experience + 1.03*E-flight_Experience + 0.84*Luggage_Logistics + 0.38*Age - 0.55*Off_flight_Experience + 0.45*Check-IN_Experience
P(satisfied) = 1 / (1 + exp(-z))
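To show how these coefficients turn factor scores into a probability, the sketch below applies the sigmoid to the linear combination. The coefficients are the ones reported on the slide; the helper name `predict_probability` and the example score values are hypothetical.

```python
import math

# Intercept and coefficients as reported on the slide.
INTERCEPT = -0.44
COEFFICIENTS = {
    "Overall_Inflight_Experience": 0.95,
    "E-flight_Experience": 1.03,
    "Luggage_Logistics": 0.84,
    "Age": 0.38,
    "Off_flight_Experience": -0.55,
    "Check-IN_Experience": 0.45,
}

def predict_probability(scores):
    """Probability of 'satisfied' for a dict of factor scores (missing ones count as 0)."""
    z = INTERCEPT + sum(
        coef * scores.get(name, 0.0) for name, coef in COEFFICIENTS.items()
    )
    return 1.0 / (1.0 + math.exp(-z))

# A passenger with all factor scores at zero (the average, for standardized scores).
baseline = predict_probability({})

# Hypothetical passenger one unit above average on overall inflight experience.
improved = predict_probability({"Overall_Inflight_Experience": 1.0})

print(round(baseline, 3), round(improved, 3))
```

A positive coefficient raises the predicted probability as that factor score increases, while the negative coefficient on Off_flight_Experience lowers it.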

ACCURACY
Train: 0.8058
Test: 0.7979
R Sq Value
Confusion Matrix
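Accuracy and the confusion matrix can be computed with `sklearn.metrics`. The labels below are a hypothetical six-row example, not the project's predictions.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical true labels and predictions (1 = satisfied, 0 = neutral or dissatisfied).
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]

# Fraction of predictions that match the true labels.
accuracy = accuracy_score(y_true, y_pred)

# Rows are true classes, columns are predicted classes, in label order [0, 1].
matrix = confusion_matrix(y_true, y_pred, labels=[0, 1])

print(accuracy)
print(matrix)
```

Comparing the same metric on the training and test splits, as the slide does, indicates whether the model overfits.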

Prediction
We generated predictions for the first 20 observations using the trained model.

CONCLUSION
EDA uncovered valuable insights and gave a better understanding of the dataset.
PCA efficiently reduced the dimensionality of the data, capturing its essential features and enhancing interpretability for more efficient analysis and modeling.
Logistic regression proved to be a valuable and interpretable model for this binary classification task.
Deriving inferences from the analysis adds a crucial layer of understanding, translating data patterns into meaningful insights for informed decision-making.

THANK YOU