Graduate Admission Prediction: Comparing Regression and Classification Models
FaizaNoor21
89 views
23 slides
Apr 28, 2024
About This Presentation
As international graduate students, our primary concern is assessing our admission prospects to reputable universities. To address this, we've developed a model utilizing two regression techniques and two classification techniques to predict admission likelihood. This will give an idea of which model is the best for prediction. After evaluating various models, we've determined the most effective one. The universities are categorized based on their rankings to aid in the shortlisting process. Utilizing a dataset obtained from Kaggle, credited to Mohan S Acharya and inspired by the UCLA dataset, this model assists prospective students in evaluating their admission chances, ultimately saving time and resources otherwise spent on applications.

The dataset that was selected has 500 observations, and the variables include Serial number, GRE Score, TOEFL Score, University Ranking (which ranges from 1 to 5, where 1 is the lowest rank and 5 is the highest), SOP strength, LOR strength, CGPA (out of 10), and Research Experience, a binary variable where 0 indicates no experience and 1 indicates that there is experience. The target variable is the Chance of Admit, which is a probability ranging from 0 to 1 in the case of regression models, and low, medium and high for the classification models.
Slide Content
Graduate Admission Prediction: Comparing Regression and Classification models
Data Analytics (ECMP5005D), Team 4: Sebastian Duque Salazar, Faiza Ullah, Muskaan Sultana Shaik
Introduction
As international graduate students, our foremost concern is determining our chances of admission to reputable universities. To address this, a predictive model employing two regression and three classification techniques has been developed. The dataset, sourced from Kaggle, is credited to Mohan S Acharya and draws inspiration from the UCLA dataset. The model is useful to graduate students as a prediction tool to check their chances of getting into the university of their choice.
Overview
- We developed two regression models: linear regression and KNN.
- We developed three classification models: decision tree, Bayesian model and KNN.
- The models predict a student's chances of being accepted based on their characteristics.
- The number of graduate students has increased in recent years, and many do not know their chances of being admitted to a certain university.
- The models serve as a tool that helps students save time.
Dataset
- GRE Scores (out of 340)
- TOEFL Scores (out of 120)
- University Rating (out of 5)
- Statement of Purpose Strength (out of 5)
- Letter of Recommendation Strength (out of 5)
- Undergraduate GPA (out of 10)
- Research Experience (either 0 or 1)
- Target Variable: Chance of Admit (ranging from 0 to 1 for regression models, and 3 categories, i.e. high, medium, low, for classification models)
Methodology
- Data extraction
- Exploratory Data Analysis (EDA)
- Data cleaning
- Model creation and predictions (regression and classification)
Methodology - Data cleaning and data splitting
Data cleaning:
- Null observations were removed.
- Unnecessary features were removed and variables were renamed.
- Categorical variables were transformed into factors.
- The data was standardized.
- Outliers and high-leverage points were removed.
Data splitting:
- Train dataset: 80%
- Test dataset: 20%
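The splitting and standardization steps can be sketched as follows. This is a minimal illustration, not the team's code: the feature rows are synthetic, the function names are hypothetical, and fitting the scaler on the training rows only is an assumption made here (the slides do not say whether scaling came before or after the split).

```python
import random
from statistics import mean, pstdev

def train_test_split(rows, train_fraction=0.8, seed=42):
    """Shuffle a copy of rows and split it into train and test parts."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def fit_scaler(train_rows):
    """Compute per-column mean and standard deviation on the training rows."""
    columns = list(zip(*train_rows))
    # A constant column would have std 0; fall back to 1.0 to avoid dividing by zero.
    return [mean(c) for c in columns], [pstdev(c) or 1.0 for c in columns]

def standardize(rows, means, stds):
    """Apply z-score scaling, (value - mean) / std, to every column."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

# Synthetic stand-in for the dataset: 500 rows of (GRE, TOEFL, CGPA).
rng = random.Random(0)
data = [[rng.randint(290, 340), rng.randint(92, 120), round(rng.uniform(6.8, 9.9), 2)]
        for _ in range(500)]

train, test = train_test_split(data)      # 80% / 20%
means, stds = fit_scaler(train)           # statistics come from the training part
train_z = standardize(train, means, stds)
test_z = standardize(test, means, stds)   # test rows reuse the training statistics
print(len(train), len(test))
```

Reusing the training statistics for the test rows keeps information about the test set out of the preprocessing step.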
Methodology - Exploratory Data Analysis
EDA helped us understand the relationships and patterns in our data.
Methodology - Regression Models creation: Linear Regression
- Backward stepwise selection was used for feature selection.
- The adjusted R-squared was 0.823.
- Outliers were removed.
- The model satisfied the checked assumptions: linearity, normality of residuals, homoscedasticity, and no multicollinearity.
Final equation:
Chance = −1.359 + 0.00216 × GRE + 0.00266 × TOEFL + 0.00782 × Rating + 0.01271 × LOR + 0.12003 × CGPA + 0.02375 × Research
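To show how the final equation is used, the sketch below applies the reported coefficients to one applicant. The applicant profile is made up, `predict_chance` is a hypothetical helper name, and clamping the result to the 0-1 range is an addition made here, not something the slides describe.

```python
# Coefficients and intercept as reported in the final equation.
INTERCEPT = -1.359
COEFFICIENTS = {
    "GRE": 0.00216,
    "TOEFL": 0.00266,
    "Rating": 0.00782,
    "LOR": 0.01271,
    "CGPA": 0.12003,
    "Research": 0.02375,
}

def predict_chance(applicant):
    """Evaluate the linear equation and clamp the result to [0, 1]."""
    raw = INTERCEPT + sum(coef * applicant[name] for name, coef in COEFFICIENTS.items())
    return min(1.0, max(0.0, raw))  # clamping is an assumption, not from the slides

# Hypothetical applicant profile.
applicant = {"GRE": 320, "TOEFL": 110, "Rating": 3, "LOR": 4, "CGPA": 8.5, "Research": 1}
print(round(predict_chance(applicant), 3))  # 0.743
```

With these coefficients, one extra CGPA point adds about 0.12 to the predicted chance, while ten extra GRE points add about 0.02.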
Methodology - Regression Models creation: KNN
- 10-fold cross-validation was used to tune the hyperparameter K.
- A search grid of K from 10 to 40 was defined.
- The best K was 21, which is close to √500 ≈ 22.4.
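The tuning procedure can be illustrated with a small standard-library sketch: for every K in the 10-40 grid, estimate RMSE with 10-fold cross-validation and keep the K with the lowest error. The data here is synthetic and the helper names are hypothetical, so the selected K will not match the 21 reported above.

```python
import math
import random

def knn_predict(train_x, train_y, query, k):
    """Average the targets of the k training points nearest to query."""
    nearest = sorted(range(len(train_x)),
                     key=lambda i: math.dist(train_x[i], query))[:k]
    return sum(train_y[i] for i in nearest) / k

def cv_rmse(xs, ys, k, folds=10):
    """Estimate RMSE for one value of k using interleaved k-fold cross-validation."""
    squared_errors = []
    for fold in range(folds):
        test_idx = range(fold, len(xs), folds)   # every folds-th row is held out
        held_out = set(test_idx)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        for i in test_idx:
            error = knn_predict(train_x, train_y, xs[i], k) - ys[i]
            squared_errors.append(error ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Synthetic, already standardized features and a noisy linear target.
rng = random.Random(1)
xs = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(200)]
ys = [0.7 + 0.2 * x[0] + 0.05 * x[1] + rng.gauss(0, 0.05) for x in xs]

grid = range(10, 41)                              # the 10-40 search grid
scores = {k: cv_rmse(xs, ys, k) for k in grid}
best_k = min(scores, key=scores.get)
print(best_k, round(scores[best_k], 4))
print(round(math.sqrt(500), 1))                   # square-root rule of thumb: 22.4
```

The square root of the sample size is only a rule of thumb for a starting K; cross-validation is what actually selects the value.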
Transformation of target variable
The target variable, chance of being admitted, was transformed into 3 categories: low, medium and high.
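A transformation of this kind amounts to binning the probability. The slides do not state the cut points, so the thresholds 0.6 and 0.8 below are placeholders chosen only for illustration, and `to_category` is a hypothetical name.

```python
def to_category(chance, low_cut=0.6, high_cut=0.8):
    """Map a probability in [0, 1] to 'low', 'medium' or 'high'.

    low_cut and high_cut are assumed values; the source does not give them.
    """
    if not 0.0 <= chance <= 1.0:
        raise ValueError("chance must be between 0 and 1")
    if chance < low_cut:
        return "low"
    if chance < high_cut:
        return "medium"
    return "high"

print([to_category(c) for c in (0.45, 0.70, 0.92)])  # ['low', 'medium', 'high']
```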
Methodology - Classification Models creation: KNN
- 10-fold cross-validation was used to tune the hyperparameter K.
- A search grid of K from 10 to 40 was defined.
- The best K was 15.
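For classification, KNN replaces the averaging step with a majority vote among the K nearest neighbours. The sketch below uses a tiny made-up training set with two standardized features, so it uses K = 3 rather than the tuned K = 15; `knn_classify` is a hypothetical name.

```python
import math
from collections import Counter

def knn_classify(train_x, train_labels, query, k):
    """Return the most common label among the k training points nearest to query."""
    nearest = sorted(range(len(train_x)),
                     key=lambda i: math.dist(train_x[i], query))[:k]
    votes = Counter(train_labels[i] for i in nearest)
    # On a tie, most_common keeps the label that was counted first (the nearer one).
    return votes.most_common(1)[0][0]

# Made-up standardized (GRE, CGPA) pairs with their admission category.
train_x = [[-1.2, -1.0], [-0.9, -1.1], [-1.0, -0.8],
           [0.0, 0.1], [0.1, -0.1], [-0.1, 0.0],
           [1.0, 1.2], [1.1, 0.9], [0.9, 1.0]]
train_labels = ["low"] * 3 + ["medium"] * 3 + ["high"] * 3

print(knn_classify(train_x, train_labels, [1.0, 1.0], k=3))    # high
print(knn_classify(train_x, train_labels, [-1.0, -1.0], k=3))  # low
```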
Methodology - Classification Models creation: Decision Tree
Methodology - Classification Models creation: Bayesian Model
- The Naive Bayes function was used.
- A confusion matrix was used to assess the model's performance.

Predicted   Low   Medium   High
Low          19        6
Medium        5       16     14
High          1        3     35

Accuracy was 0.71.
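The reported accuracy can be checked against the confusion matrix: accuracy is the sum of the diagonal divided by the total count. One cell of the Low row is missing from the extracted slide; the sketch assumes it is an off-diagonal zero, which is consistent with the stated accuracy but is not confirmed by the source.

```python
labels = ["Low", "Medium", "High"]
matrix = [
    [19, 6, 0],    # third cell is blank in the source; assumed to be 0
    [5, 16, 14],
    [1, 3, 35],
]

correct = sum(matrix[i][i] for i in range(len(labels)))  # diagonal: agreeing cases
total = sum(sum(row) for row in matrix)                  # all test observations
accuracy = correct / total
print(correct, total, round(accuracy, 2))  # 70 99 0.71
```

Most of the errors sit in the Medium row, where 14 cases fall into the High column, so the model mainly struggles to separate medium from high.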
What were the results of the regression models?
Regression Models (Linear, KNN): Root Mean Squared Error (RMSE)
Classification Models (Decision Tree, KNN, Bayesian): Accuracy
Results: Linear Regression vs KNN Regression
(Bar chart of RMSE by model.) KNN Regression: 0.0925. Linear Regression: 0.06197.
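RMSE is the square root of the mean squared difference between actual and predicted values, so a lower value means predictions sit closer to the true chance of admission. The sketch below computes it on four made-up pairs and then compares the two RMSE values reported above; the sample numbers and the `rmse` helper are illustrative only.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# Made-up chances of admission and predictions.
actual = [0.80, 0.65, 0.90, 0.50]
predicted = [0.75, 0.70, 0.85, 0.60]
print(round(rmse(actual, predicted), 4))  # 0.0661

# Relative difference between the reported RMSE values.
linear_rmse, knn_rmse = 0.06197, 0.0925
reduction = 1 - linear_rmse / knn_rmse
print(f"{reduction:.0%}")  # 33%
```

On the reported figures, the linear model's RMSE is about a third lower than the KNN model's.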
Feature vs Importance of All Predictor Values
(Bar chart. Axes: Features and Importance, with importance ranging from 0.0 to 3.0. Features shown: CGPA, GRE, SOP, TOEFL, LOR, Rating, Research.)
Conclusions
- The linear regression model outperforms the KNN regression model.
- CGPA has the biggest impact on admission chances.
- Higher-rated universities are more likely to accept applicants.
- Applicants with higher tend to apply to high-rated universities.
- Research experience plays an important role for low-rated universities.
- CGPA, GRE, TOEFL and research experience are the most important factors.
- Finally, students with good academic records are more likely to get into graduate programs.