Graduate Admission Prediction: Comparing Regression and Classification Models

FaizaNoor21 · Apr 28, 2024

About This Presentation

As international graduate students, our primary concern is assessing our admission prospects at reputable universities. To address this, we have developed models using two regression techniques and three classification techniques to predict admission likelihood. This will give an idea of which model is best for prediction. Afte...


Slide Content

Graduate Admission Prediction: Comparing Regression and Classification Models

Data Analytics (ECMP5005D), Team 4: Sebastian Duque Salazar, Faiza Ullah, Muskaan Sultana Shaik

Introduction: As international graduate students, our foremost concern is determining the chances of admission to reputable universities. To address this, a predictive model employing two regression and three classification techniques has been developed. The dataset, sourced from Kaggle, is credited to Mohan S Acharya and draws inspiration from the UCLA dataset. The model is useful to graduate students as a prediction tool for checking their chances of getting into the university of their choice.

Overview: We have developed two regression models, linear regression and KNN, as a tool for students. The classification models are: Decision Tree, Bayesian (Naive Bayes), and KNN. The models predict a student's chances of being accepted based on their characteristics, which helps students save time. The number of graduate applicants has increased in recent years, and many do not know their chances of being admitted to a given university.

Dataset: GRE Score (out of 340), TOEFL Score (out of 120), University Rating (out of 5), Statement of Purpose (SOP) strength (out of 5), Letter of Recommendation (LOR) strength (out of 5), Undergraduate GPA (out of 10), Research Experience (either 0 or 1). Target variable: Chance of Admit (ranging from 0 to 1 for the regression models; three categories, i.e. high, medium, low, for the classification models).

Methodology: data extraction, exploratory data analysis (EDA), data cleaning, model creation and predictions (regression and classification).

Methodology - Data cleaning and data splitting. Data cleaning: null observations were removed; unnecessary features were dropped and variables renamed; categorical variables were transformed into factors; the data was standardized; outliers and high-leverage points were removed. Data splitting: train dataset 80%, test dataset 20%.
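
The slides describe this pipeline without code, so the following is a minimal Python sketch of the splitting and standardization step, assuming the common Kaggle file name Admission_Predict.csv and its column names (the authors' original work appears to have been done in R).

```python
# Hedged sketch of the cleaning/splitting step; file name and column names are assumptions.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("Admission_Predict.csv")               # assumed file name
df = df.dropna()                                        # remove null observations
df = df.drop(columns=["Serial No."])                    # drop an unnecessary identifier column
df = df.rename(columns={"Chance of Admit": "Chance"})   # tidy the target variable name

X = df.drop(columns=["Chance"])
y = df["Chance"]

# 80% train / 20% test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize predictors, fitting the scaler on the training data only
scaler = StandardScaler()
X_train_std = scaler.fit_transform(X_train)
X_test_std = scaler.transform(X_test)
```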

Methodology - Exploratory Data Analysis (EDA): helped us understand the relations and patterns in our data.

Methodology - Regression model creation: Linear Regression. Step-backward selection was used for feature selection. We obtained an adjusted R-squared of 0.823. Outliers were removed. The model satisfied all the assumptions: linearity, normality, homoscedasticity, and no multicollinearity. Final equation: Chance = −1.359 + 0.00216 × GRE + 0.00266 × TOEFL + 0.00782 × Rating + 0.01271 × LOR + 0.12003 × CGPA + 0.02375 × Research.
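
To make the final equation concrete, here is a small worked example that plugs a hypothetical applicant profile into the coefficients reported on the slide (the profile values are invented for illustration).

```python
# Worked example: applying the final regression equation reported on the slide.
def chance_of_admit(gre, toefl, rating, lor, cgpa, research):
    """Backward-selected linear model from the slide (SOP was dropped by selection)."""
    return (-1.359
            + 0.00216 * gre
            + 0.00266 * toefl
            + 0.00782 * rating
            + 0.01271 * lor
            + 0.12003 * cgpa
            + 0.02375 * research)

# Hypothetical applicant: GRE 325, TOEFL 112, rating 4, LOR 4.5, CGPA 9.1, has research experience
print(round(chance_of_admit(325, 112, 4, 4.5, 9.1, 1), 3))   # ≈ 0.845
```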

Methodology - Regression model creation: KNN. 10-fold cross-validation was used for tuning the hyperparameter K, with a 10-40 search grid. The best K was 21, which is very close to √500 ≈ 22.4.
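
A minimal scikit-learn sketch of that tuning step, reusing the split and standardized matrices from the cleaning sketch above (the original tuning was presumably done in R, e.g. with caret):

```python
# Hedged sketch: 10-fold CV over a K = 10..40 grid for KNN regression.
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor

knn_reg = GridSearchCV(
    KNeighborsRegressor(),
    {"n_neighbors": list(range(10, 41))},   # the 10-40 search grid from the slide
    cv=10,                                  # 10-fold cross-validation
    scoring="neg_root_mean_squared_error",
)
knn_reg.fit(X_train_std, y_train)           # matrices from the earlier splitting sketch
print(knn_reg.best_params_)                 # the slide reports a best K of 21
```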

Transformation of target variable: the target variable Chance was transformed into three categories of chance of being admitted: low, medium, and high.
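
The slide does not state the cut points used for the three categories, so the thresholds in this sketch (0.6 and 0.8) are assumptions purely for illustration.

```python
# Hedged sketch of binning the continuous target into low/medium/high;
# the 0.6 and 0.8 boundaries are assumed, not taken from the slides.
import pandas as pd

df["Admit_Class"] = pd.cut(
    df["Chance"],
    bins=[0.0, 0.6, 0.8, 1.0],
    labels=["low", "medium", "high"],
    include_lowest=True,
)
```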

Methodology - Classification model creation: KNN. 10-fold cross-validation was used for tuning the hyperparameter K, with a 10-40 search grid. The best K was 15.
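
The same grid-search idea applies to the classification KNN, now scored by accuracy against the categorical target; again a scikit-learn sketch rather than the authors' code.

```python
# Hedged sketch: 10-fold CV over K = 10..40 for KNN classification.
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

y_class_train = df.loc[X_train.index, "Admit_Class"]    # categorical target from the previous sketch

knn_clf = GridSearchCV(
    KNeighborsClassifier(),
    {"n_neighbors": list(range(10, 41))},
    cv=10,
    scoring="accuracy",
)
knn_clf.fit(X_train_std, y_class_train)
print(knn_clf.best_params_)                              # the slide reports a best K of 15
```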

Methodology - Classification model creation: Decision Tree.
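
The slide only names this model, so the following is a bare-bones decision tree sketch under the same assumed variables; the depth limit is an arbitrary illustration, not a reported setting.

```python
# Hedged sketch of the decision tree classifier; max_depth is an assumption.
from sklearn.tree import DecisionTreeClassifier

y_class_test = df.loc[X_test.index, "Admit_Class"]

tree_clf = DecisionTreeClassifier(max_depth=4, random_state=42)
tree_clf.fit(X_train, y_class_train)                     # trees do not require standardized inputs
print(tree_clf.score(X_test, y_class_test))              # accuracy on the held-out 20%
```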

Methodology - Classification model creation: Bayesian Model. A naive Bayes function was used, and a confusion matrix was used to assess the model's performance.

Confusion matrix (columns are predicted classes):
            Low   Medium   High
Low          19        6
Medium        5       16     14
High          1        3     35

Accuracy was 0.71.
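
A hedged sketch of this evaluation step, using Gaussian naive Bayes as a stand-in for the R naive Bayes function mentioned on the slide, producing a 3x3 confusion matrix and an accuracy figure like the one reported:

```python
# Hedged sketch: naive Bayes classifier plus confusion matrix and accuracy.
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, confusion_matrix

nb = GaussianNB()
nb.fit(X_train, y_class_train)

y_pred = nb.predict(X_test)
print(confusion_matrix(y_class_test, y_pred, labels=["low", "medium", "high"]))
print(accuracy_score(y_class_test, y_pred))              # the slide reports roughly 0.71
```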


What were the results of the regression models?

Regression models (Linear and KNN) were compared using Root Mean Squared Error (RMSE).
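
For reference, RMSE is the square root of the mean squared difference between predicted and actual chance of admission; a small helper makes the comparison metric explicit.

```python
# RMSE: root of the mean squared error between predictions and actual values.
import numpy as np

def rmse(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
```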

Classification models (Decision Tree, KNN, and Bayesian) were compared using accuracy.

Results: Linear Regression vs KNN Regression (bar chart of RMSE). KNN Regression RMSE: 0.0925; Linear Regression RMSE: 0.06197.

Results: Classification models (bar chart comparing the accuracy of the Decision Tree, KNN, and Naive Bayes models).

Feature vs importance for all predictor variables (bar chart). Predictors shown: CGPA, GRE, SOP, TOEFL, LOR, University Rating, Research.

Conclusions: The linear regression model outperforms the KNN regression model. CGPA has the biggest impact on admission chances. Higher-rated universities are more likely to accept applicants, and applicants with higher scores tend to apply to highly rated universities. Research plays an important role for low-rated universities. CGPA, GRE, TOEFL, and research experience are the most important factors. Finally, students with good academic records are more likely to get into graduate programs.

Thank You. Q&A.