Predict diabetes using machine learning with several methods: KNN, Decision Tree, MLP, and Logistic Regression.
Size: 7.26 MB
Language: en
Added: Jun 27, 2024
Slides: 49 pages
Slide Content
21 May 2024
FI4002 Simulasi dan Pemodelan Sistem Fisis (Simulation and Modeling of Physical Systems)
10220003 Bernike Hernita Sofiana
10220027 Annisa Sri Wardifa
10220075 Alyssa Hanifa Dhiyani
Diabetes Prediction
Kharwal, A. (2020, October 23). Predict diabetes with machine learning. thecleverprogrammer. https://thecleverprogrammer.com/2020/07/13/predict-diabetes-with-machine-learning/
Research Based Learning:
Machine Learning
Diabetes
Definition: a chronic disease that occurs when the
pancreas does not produce enough insulin or
when the body cannot effectively use the
insulin it produces.
Number of people with diabetes: rose from
108 million in 1980 to 422 million in 2014.
Prevalence is rising more rapidly in low- and
middle-income countries.
Manifestations: thick skin, high blood pressure,
weight loss, etc.
Loke, A. (2023, April 5). Diabetes. World Health Organization. https://www.who.int/news-room/fact-sheets/detail/diabetes
1
K-Nearest Neighbors
Definition: supervised machine
learning method employed to tackle
classification and regression
problems
Ability: adapt to different patterns
and make predictions based on the
local structure of the data.
Steps:
1. Selecting the optimal K value
2. Calculating distances
3. Finding the nearest neighbors
4. Voting for classification or taking the average for regression
K-Nearest Neighbor(KNN) algorithm. GeeksforGeeks. (2024, January 25). https://www.geeksforgeeks.org/k-nearest-neighbours/
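The four KNN steps above can be sketched with scikit-learn, which performs the distance computation, neighbor search, and voting internally in `fit`/`predict`. This is a minimal sketch on synthetic stand-in data (the real deck uses the Pima diabetes dataset):

```python
# Minimal KNN sketch; make_classification stands in for the diabetes data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in: 768 samples, 8 features, as in the slides.
X, y = make_classification(n_samples=768, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Step 1: choose K; steps 2-4 (distances, neighbors, voting) happen in fit/predict.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
print(knn.score(X_test, y_test))  # fraction of correct test predictions
```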
2
Import Data
3
Data Set Used
768 data points with 9 features each
4
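Loading the dataset might look like the sketch below; the file name `diabetes.csv` and the column names (from the Pima Indians Diabetes dataset) are assumptions, so a fallback keeps the sketch runnable without the file:

```python
import pandas as pd

# Assumed file name and column names (Pima Indians Diabetes dataset).
columns = ["Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
           "Insulin", "BMI", "DiabetesPedigreeFunction", "Age", "Outcome"]
try:
    df = pd.read_csv("diabetes.csv")
except FileNotFoundError:
    # Fallback: empty frame with the expected 9 columns so the sketch runs.
    df = pd.DataFrame(columns=columns)

print(df.shape[1])  # 9 columns: 8 features plus the Outcome target
```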
Outcome
Outcome: the feature to be predicted -> 0 = no diabetes, 1 = diabetes
5
Outcome
Distribution
0: 500 counts
1: 268 counts
6
7
Confirm the connection
between model
complexity & accuracy
8
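The complexity-accuracy check above can be sketched by sweeping K and comparing training against test accuracy (small K means a more complex model). Synthetic stand-in data is used here:

```python
# Sweep K to relate model complexity and accuracy for KNN.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=768, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=66)

train_acc, test_acc = [], []
for k in range(1, 11):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    train_acc.append(knn.score(X_train, y_train))
    test_acc.append(knn.score(X_test, y_test))

# K=1 memorizes the training set: each point is its own nearest neighbor.
print(train_acc[0])
```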
KNN Model
Confusion Matrix
Results
Accuracy of KNN on training set: 79%
Accuracy of KNN on test set: 78%
Accuracy of Decision Tree on training set: 100%
Accuracy of Decision Tree on test set: 71.4%
Overfitting -> apply pre-pruning
9
Decision Tree Classifier
Definition: Non-parametric supervised learning method used for classification and regression
Goal: To create a model that predicts the value of a target variable by learning simple decision rules
Advantage: Able to handle both numerical and categorical data
Disadvantage: Predictions of decision trees are neither smooth nor continuous, and trees are not good at extrapolation
10
Scikit-learn: Machine Learning in Python, Pedregosa et al., JMLR 12, pp. 2825-2830, 2011. https://scikit-learn.org/stable/modules/tree.html
Pre-Pruning
Set max_depth = 3 to decrease overfitting
Accuracy of Decision Tree on training set: 77.3%
Accuracy of Decision Tree on test set: 74%
11
Accuracy of Decision Tree on training set: 76.4%
Accuracy of Decision Tree on test set: 71.9%
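The pre-pruning step above can be sketched by capping `max_depth`, which stops the tree from memorizing the training set. Synthetic stand-in data is used:

```python
# Pre-pruning sketch: unrestricted vs depth-limited decision tree.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=768, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

full = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
pruned = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

print(full.score(X_train, y_train))  # unrestricted tree fits training data perfectly
print(pruned.get_depth())            # pruned tree is at most 3 levels deep
```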
Feature Importance
0: not used at all
1: perfectly predicts the target
How important each feature is to the decisions the decision tree classifier makes.
12
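Reading the importances off a fitted tree is a one-liner: `feature_importances_` gives one value per feature, 0 for unused features, and the values sum to 1. A sketch on synthetic stand-in data:

```python
# Feature importance sketch for a fitted decision tree.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=768, n_features=8, random_state=0)
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

imp = tree.feature_importances_
print(np.round(imp, 3))  # one value per feature, each between 0 and 1
print(imp.sum())         # importances are normalized to sum to 1
```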
Feature Importance
Visualization
(without test size)
13
Feature Importance
Visualization
(with test size)
14
Feature Importance
Visualization Comparison
14
No test size vs. test size = 0.3
Correlation
Matrix
15
Correlation Matrix
16
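The correlation matrix shown on the slide is `df.corr()`; a heatmap (e.g. with matplotlib or seaborn) would render it as pictured. A sketch on a small synthetic stand-in frame (the column names are assumptions):

```python
# Correlation matrix sketch; random columns stand in for the diabetes features.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(768, 3)),
                  columns=["Glucose", "BMI", "Outcome"])

corr = df.corr()
print(corr.shape)                      # square: one row/column per feature
print(corr.loc["Glucose", "Glucose"])  # diagonal entries are 1.0
```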
Multi-Layer Perceptron
to Predict Diabetes
Accuracy on training set: 0.73
Accuracy on test set: 0.72
17
Re-scale the Data
Accuracy on training set: 0.823
Accuracy on test set: 0.802
17
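The re-scaling step above matters because MLPs are sensitive to feature scale; standardizing the inputs typically lifts accuracy, as the slide's numbers show. A minimal sketch on synthetic stand-in data:

```python
# MLP with standardized inputs, as in the re-scaling step.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=768, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# StandardScaler rescales each feature to zero mean and unit variance
# before the data reaches the MLP.
mlp = make_pipeline(StandardScaler(),
                    MLPClassifier(max_iter=1000, random_state=0))
mlp.fit(X_train, y_train)
print(mlp.score(X_test, y_test))
```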
Increasing the Number
of Parameters
Accuracy on training set: 0.806
Accuracy on test set: 0.797
18
Logistic Regression
19
Accuracy
Score
Logistic Regression Model Accuracy: 0.7359307359307359
20
Confusion Matrix
(cell values from the plot: 50, 30, 120, 31)
21
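The logistic-regression evaluation above (accuracy score plus confusion matrix on the held-out test set) can be sketched as follows, again on synthetic stand-in data:

```python
# Logistic regression: accuracy score and confusion matrix on the test set.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=768, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

logreg = LogisticRegression(max_iter=1000).fit(X_train, y_train)
pred = logreg.predict(X_test)

print(accuracy_score(y_test, pred))
cm = confusion_matrix(y_test, pred)
print(cm)  # rows: true class, columns: predicted class
```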
Diabetes
with Equal
Outcome
22
Data Set Used
536 data points with 9 features each
23
Outcome
Distribution
0: 268 counts
1: 268 counts
24
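One way to build this balanced "equal outcome" set is to undersample the majority class (0) down to the 268 minority samples; the slides do not state the exact method, so this is an assumed sketch, with a synthetic frame standing in for the real data:

```python
# Undersampling sketch: trim class 0 from 500 to 268 rows to match class 1.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Synthetic stand-in: 500 negatives and 268 positives, as in the slides.
df = pd.DataFrame({"Glucose": rng.normal(size=768),
                   "Outcome": [0] * 500 + [1] * 268})

neg = df[df["Outcome"] == 0].sample(n=268, random_state=0)
pos = df[df["Outcome"] == 1]
balanced = pd.concat([neg, pos]).sample(frac=1, random_state=0)  # shuffle rows

print(balanced["Outcome"].value_counts())  # 268 of each class, 536 rows total
```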
Outcome: the feature to be predicted ->
0 = no diabetes,
1 = diabetes
Confirm the connection between model
complexity & accuracy
25
Results: Original Data vs Equal Outcome

Equal Outcome   KNN    Decision Tree
Training Set    87%    100%
Test Set        81%    70.8%
26
Pre-Pruning
Set max_depth = 3 to decrease overfitting

Equal Outcome: training set 82.4%, test set 85.7%
Original Data: training set 77.3%, test set 74%
27
Feature Importance
0: not used at all
1: perfectly predicts the target
How important each feature is to the decisions the decision tree classifier makes.
28
Original Data
Equal Outcome
Feature Importance
Visualization
29
Original Data Equal Outcome
Correlation Matrix
30
Results of MLP Classifier

               Original Accuracy   Re-scaled Data   Increased Number of Parameters
Training Set   83%                 84%              99.7%
Test Set       71%                 72.7%            69.6%
31
Confusion Matrix
Logistic Regression Model Accuracy: 0.7453416149068323
32
Summary: Original Data vs Equal Diabetes Outcome

                         KNN                 Decision Tree        MLP Classifier       Logistic      Most Important
                         Training   Test     Training   Test      Training   Test      Regression    Feature          Correlation
Original Data            79%        78%      100%       71.4%     73%        72%       73%           Glucose          Glucose
Equal Diabetes Outcome   87%        81%      82.4%      85.7%     83%        72.7%     74%           Insulin          Glucose
33
Diabetes without
Glucose Feature
34
Confirm the connection between model
complexity & accuracy
35
Results: Original Data vs Data without glucose

Data without glucose   KNN    Decision Tree
Training Set           74%    79%
Test Set               66%    72.7%

Original Data          KNN    Decision Tree
Training Set           79%    100%
Test Set               78%    71.4%
36
Pre-Pruning
Set max_depth = 3 to decrease overfitting

Data without glucose: training set 79%, test set 72.7%
Original Data: training set 77.3%, test set 74%
37
Feature Importance
How important each feature is to the decisions the decision tree classifier makes.
38
Original Data
Data without glucose
Feature Importance
Visualization
39
Correlation Matrix: Original Data vs Data without glucose
41
Confusion Matrix: Data without glucose

Logistic Regression Model Accuracy:
Original Data: 0.7359307359307359
Data without glucose: 0.670995670995671
42
Summary: Original Data vs Data without Glucose

                       KNN                 Decision Tree        MLP Classifier       Logistic      Feature
                       Training   Test     Training   Test      Training   Test      Regression    Importance   Correlation
Original Data          79%        78%      100%       71.4%     73%        72%       73%           Glucose      Glucose
Data without glucose   74%        76.6%    100%       71.9%     72%        69.3%     73%           Age          BMI
43