Predicting Loan Approval: A Data Science Project

jadavvineet73 716 views 24 slides May 02, 2024

About This Presentation

This project aims to predict whether a loan application will be approved or denied based on various factors such as applicant's income, credit score, loan amount, etc. Using a dataset containing historical loan application data, we employed machine learning algorithms to build a predictive model...


Slide Content

LOAN PREDICTION PROJECT
Presented by: Nicia Dias

ABOUT THE PROJECT
Financial loan services are leveraged by companies across many
industries, from big banks to financial institutions to government
loans. One of the primary objectives of companies with financial
loan services is to decrease payment defaults and ensure that
individuals are paying back their loans as expected. In order to do
this efficiently and systematically, many companies employ
machine learning to predict which individuals are at the highest
risk of defaulting on their loans, so that proper interventions can
be effectively deployed to the right audience.

The dataset contains 255,347 rows and 18 columns in total.
This is a binary classification problem: determine whether a borrower
defaults or does not default.

Statistics of the Numerical Columns
Statistics of the Categorical Columns
No duplicate or null values were found.
The 'DEFAULT' column is the dependent (target) column,
indicating whether a person is a defaulter or not.
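A check of this kind can be sketched in a few lines. The example below is a minimal illustration using only the Python standard library. The function name `count_duplicates_and_nulls` and the sample rows are hypothetical and are not taken from the project's notebook or dataset.

```python
from collections import Counter

def count_duplicates_and_nulls(rows):
    """Return (duplicate_row_count, {column: missing_count}) for a list of dict rows."""
    # A row counts as a duplicate each time it repeats an earlier identical row.
    seen = Counter(tuple(sorted(row.items())) for row in rows)
    duplicates = sum(n - 1 for n in seen.values())
    # Treat None and empty strings as missing values (an assumption of this sketch).
    nulls = Counter()
    for row in rows:
        for column, value in row.items():
            if value is None or value == "":
                nulls[column] += 1
    return duplicates, dict(nulls)

# Made-up rows for illustration only; not taken from the project's dataset.
rows = [
    {"Age": 25, "Income": 40000, "Default": 1},
    {"Age": 25, "Income": 40000, "Default": 1},   # exact repeat of the first row
    {"Age": 52, "Income": None, "Default": 0},    # missing income
]
duplicates, nulls = count_duplicates_and_nulls(rows)
```

On a clean dataset such as the one described here, both results would be empty: zero duplicates and no columns with missing values.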

Analysis of the Default Column
The classes of the target variable are imbalanced: 11.62% of customers
defaulted on their loan and 88.38% did not.
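Class shares like these can be computed directly from the label column. The sketch below is illustrative only: `class_shares` is a hypothetical helper and the labels are made up, with 1 assumed to mean default and 0 no default.

```python
from collections import Counter

def class_shares(labels):
    """Return {label: percentage of rows}, rounded to two decimals."""
    counts = Counter(labels)
    total = len(labels)
    return {label: round(100 * n / total, 2) for label, n in counts.items()}

# Made-up labels (1 = default, 0 = no default), for illustration only.
labels = [0, 0, 0, 0, 0, 0, 0, 1]
shares = class_shares(labels)

# Accuracy (in percent) of a model that always predicts the majority class.
baseline_accuracy = max(shares.values())
```

The majority share doubles as a baseline: with the split reported above, a model that always predicts "no default" would already reach 88.38% accuracy, so model accuracies are best read against that figure.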

Analysis of the Numerical Value Columns

Age Column:
The younger individuals (in the 20-29 and 30-39 age categories) have a higher
frequency of defaults compared to older individuals. The frequency of defaults
decreases as age increases, with the lowest default rates observed among those
aged 60-69. This shows that age could be a significant factor in predicting loan
default risk, with younger age groups potentially representing a higher risk profile
for lenders.
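Because age groups differ in size, comparing default rates per group is more informative than comparing raw frequencies. A minimal sketch of that calculation follows, assuming ten-year age bands and a 0/1 default flag. The function name and sample values are hypothetical.

```python
from collections import defaultdict

def default_rate_by_decade(ages, defaults):
    """Return {"20-29": rate, ...}: the share of defaulters in each ten-year age band."""
    totals = defaultdict(int)
    defaulted = defaultdict(int)
    for age, flag in zip(ages, defaults):
        low = (age // 10) * 10            # e.g. 37 -> 30
        band = f"{low}-{low + 9}"
        totals[band] += 1
        defaulted[band] += flag           # flag is 1 for default, 0 otherwise
    return {band: defaulted[band] / totals[band] for band in sorted(totals)}

# Made-up values for illustration only.
ages = [22, 27, 34, 38, 45, 61, 66]
defaults = [1, 0, 1, 0, 0, 0, 0]
rates = default_rate_by_decade(ages, defaults)
```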

Income Column:
The higher the income, the larger the share of customers
who are not in default. The default bars also decrease
slightly as income rises.

Loan Term Column:
The balance between defaulted and non-defaulted borrowers
remains the same across loan terms from 1 year to 5 years.

Credit Score Column:
The majority of the entries fall into the credit
score range of 300-579. The default rate is
highest in the lowest credit score range (300-579)
and decreases as the credit score range
increases (740-799 and 800-850).
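Assigning scores to ranges with uneven widths can be done with a sorted list of lower edges. The sketch below is an assumption-laden illustration: the 300-579, 740-799 and 800-850 bands come from the slide, while the 580-669 and 670-739 bands are assumed here to fill the gap. `credit_band` is a hypothetical helper.

```python
from bisect import bisect_right

# Lower edge of each band. The 300-579, 740-799 and 800-850 bands appear in the
# slides; the 580-669 and 670-739 bands are assumptions made for this sketch.
BAND_STARTS = [300, 580, 670, 740, 800]
BAND_LABELS = ["300-579", "580-669", "670-739", "740-799", "800-850"]

def credit_band(score):
    """Map a credit score in [300, 850] to its band label."""
    if not 300 <= score <= 850:
        raise ValueError(f"score out of range: {score}")
    # bisect_right counts how many band starts are <= score.
    return BAND_LABELS[bisect_right(BAND_STARTS, score) - 1]
```

For example, `credit_band(579)` falls in the lowest band, while `credit_band(580)` moves to the next one.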

Months Employed Column:
The longer customers have been employed, the more
the default rate gradually decreases and the
non-default rate increases.

No. of Credit Lines Column:
As the number of credit lines increases, the
default rate tends to rise as well. Borrowers
with fewer credit lines (1 or 2) have relatively
lower default rates compared to those with
higher numbers of credit lines (3 or 4).

Loan Amount Column:
As the loan amount increases, the default rate tends to rise as well. The default rate for loans
less than 30k is relatively low and gradually increases as the loan amount categories increase.
Notably, the default rate spikes in the higher loan amount categories, particularly from 180k
onwards.

Debt to Income Column:
Borrowers with a higher DTI percentage (>43%) exhibit a significantly higher default rate
compared to those with lower DTI percentages (<36% and 37-43%). The default rate increases
as the DTI percentage increases, indicating a strong association between high DTI levels and
default propensity. At the same time, the group with a DTI above 43% also contains more
non-default borrowers.

Analysis of the Categorical Value Columns
The target variable classes are almost equally distributed among all
categories of the feature variables. It is a good sign, which indicates
that each and every feature in the dataset is related to the target.

Education Column:
Across all education categories, the majority of individuals have credit scores below 670. The
distribution of credit scores is relatively similar across different education levels.

Loan Purpose Column:
The loan statistics are almost the same
across the different purposes.

Marital Status Column:
Divorced individuals have a higher
proportion of defaults relative to their
non-defaults. Marital status appears to
have some correlation with loan default,
with divorced individuals being at a
relatively higher risk of default compared
to others.

Employment Type Column:
Full-time and Part-time employment have the
highest numbers of both defaulting and not
defaulting individuals. Self-employed individuals
have the lowest default count, indicating a
slightly better financial stability. Unemployed
individuals have the highest default count,
which is expected given the lack of regular
income.
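Counts and rates can tell different stories when categories differ in size, so it may help to look at both. A minimal sketch follows. The helper name and the rows are made up for illustration and do not reproduce the project's figures.

```python
from collections import Counter

def default_counts_and_rates(categories, defaults):
    """Return {category: (default_count, default_rate)}."""
    totals = Counter(categories)
    defaulted = Counter(c for c, flag in zip(categories, defaults) if flag == 1)
    return {c: (defaulted[c], defaulted[c] / totals[c]) for c in totals}

# Made-up rows, for illustration only (1 = default, 0 = no default).
employment = ["Full-time"] * 6 + ["Unemployed"] * 2
defaults = [1, 1, 0, 0, 0, 0, 1, 0]
summary = default_counts_and_rates(employment, defaults)
```

In this toy data the larger group has more defaults in absolute terms but a lower default rate, which is why a count alone does not settle which group is riskier.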

Mortgage Status Column:
The default rate is slightly higher
among individuals with mortgages
compared to those without.

Dependents Status Column:
The group with no dependents has a slightly
higher count of defaults than the group with
dependents. Having dependents seems to
slightly decrease the likelihood of defaulting,
as the group with dependents has a lower
proportion of defaulters compared to the
group without dependents.

Cosigner Status Column:
The presence of a co-signer appears to
have a positive impact on loan repayment,
as fewer defaults are observed among
individuals with co-signers compared to
those without co-signers.

Correlation Table
The strongest correlation with
default status is observed with the 'Age'
of the borrower, indicating that younger
individuals are more likely to default.
'Income' and 'MonthsEmployed' also
correlate with default in the same direction,
suggesting that lower income and
shorter employment are
associated with higher default rates.
Factors such as 'HasCoSigner',
'HasDependents', and 'CreditScore' show
weaker correlations with
default.
'InterestRate' shows the strongest
negative correlation with default,
implying that higher interest rates are
associated with lower default rates.
Other features such as 'LoanAmount'
and 'EmploymentType' also show
negative correlations with default,
although these correlations are relatively
weaker compared to age and income.
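How the sign of a coefficient is read depends on how the target is coded. The sketch below assumes Pearson correlation and a default flag coded as 1. The slides state neither, so both are assumptions, and the sample values are made up.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up values: the two youngest borrowers default (1), the rest do not (0).
ages = [22, 25, 31, 45, 52, 63]
defaults = [1, 1, 0, 0, 0, 0]
r = pearson(ages, defaults)
```

With default coded as 1, a negative coefficient means that higher values of the feature go with fewer defaults, and a positive coefficient means that higher values go with more defaults. In the toy data above, `r` is negative because the defaulters are the youngest borrowers.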

MACHINE LEARNING
Models Used:
01 Decision Tree Model
02 Logistic Regression Model
03 Random Forest Model
04 XGBoost Model
05 Naive Bayes Model

Decision Tree Model
Logistic Regression Model

Random Forest Model
Naive Bayes Model

XGBoost Model

Conclusion:
The Logistic Regression, Random Forest,
XGBoost and Naive Bayes models have
almost the same accuracy rate; the
difference is only a matter of points,
which makes Random Forest the best model.
It can be used further for deployment.
An SVM model was also tried, but it was
the only model that took more than 10
minutes to produce its accuracy rate, so
it was dropped.
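When models are this close on accuracy and the classes are imbalanced, precision and recall on the default class can help separate them. The sketch below is illustrative only: it uses made-up labels and predictions, not the project's results, and `classification_metrics` is a hypothetical helper.

```python
def classification_metrics(y_true, y_pred):
    """Return accuracy, precision and recall for the positive class (1 = default)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)   # defaults caught
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)   # non-defaults kept
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)   # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)   # defaults missed
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

# Made-up labels and predictions, for illustration only.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
always_no_default = [0] * 10                      # predicts the majority class
some_model = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]       # catches one of two defaults

baseline = classification_metrics(y_true, always_no_default)
model = classification_metrics(y_true, some_model)
```

Both sets of predictions score the same accuracy here, yet only the second one identifies any defaulters, which shows up in its recall.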

OF THE RANDOM FOREST MODEL
INPUT
OUTPUT

THANK YOU