Heart Disease Prediction: A Data Science Approach

jatniwalafizza786 · 26 slides · Jul 31, 2024

About This Presentation

Join Priyanka as she explores the use of data science in predicting heart disease. This presentation covers the methodologies, algorithms, and data analysis techniques employed to forecast heart disease risks. Gain insights into data preprocessing, feature selection, model building, and evaluation. ...


Slide Content

Heart Disease Prediction

Abstract This project focuses on predicting heart disease using various machine learning models. The goal is to evaluate the performance of different algorithms and identify the most accurate model for predicting the presence of heart disease. The dataset used for this study is publicly available and contains various medical features that are indicative of heart disease.

Table of Contents
- Introduction
- Objective
- Methodology
- Data Preprocessing
- Exploratory Data Analysis
- Feature Selection
- Model Training and Evaluation
- Results and Discussion
- Conclusion

Introduction Heart disease is a leading cause of death worldwide. Early detection and intervention can significantly improve patient outcomes. This project aims to leverage machine learning techniques to predict the presence of heart disease based on medical data. The objectives of this project are to compare the performance of various machine learning models and identify the best-performing model.

Objective The field of heart disease prediction using machine learning has seen significant advancements with the application of various algorithms, datasets, and techniques. The common approaches, including supervised learning, feature selection, and data preprocessing, play crucial roles in enhancing model performance. Ensemble methods and deep learning models have shown great promise in achieving high accuracy. Future research should focus on improving data quality, incorporating more features, and making models interpretable for practical clinical use.

Methodology The methodology section describes the overall approach taken to achieve the project objectives. This includes data collection, preprocessing, model selection, training, evaluation, and comparison.

Data Preprocessing The dataset used in this project is sourced from a publicly available heart disease dataset. Data preprocessing steps include handling missing values, removing duplicate entries, and scaling features.
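The preprocessing steps listed above can be sketched with pandas and scikit-learn. This is a minimal illustration on a tiny hypothetical frame (the column names `age`, `chol`, and `target` follow the common UCI heart disease layout, not the exact dataset used in the slides):

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical mini-frame standing in for the heart disease dataset
df = pd.DataFrame({
    "age":    [63, 37, 41, 63, 56],
    "chol":   [233.0, 250.0, None, 233.0, 236.0],
    "target": [1, 1, 1, 1, 0],
})

df = df.drop_duplicates()                     # remove duplicate rows (row 3 repeats row 0)
df = df.fillna(df.median(numeric_only=True))  # impute missing values with the column median
features = df.drop(columns="target")          # keep only the predictors
scaled = StandardScaler().fit_transform(features)  # scale to zero mean, unit variance
```

Median imputation and standard scaling are one reasonable choice here; the slides do not specify which imputation or scaling method was actually used.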

Exploratory Data Analysis

Box Plot of the Numeric Columns

These are the boxplots after removing the duplicates.
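A boxplot flags points beyond 1.5×IQR from the quartiles as outliers; the same rule can be computed directly. A short sketch with hypothetical cholesterol values (not the actual data behind the slides):

```python
import numpy as np

# Hypothetical cholesterol readings; 564 is an extreme value
chol = np.array([233, 250, 204, 236, 354, 192, 294, 263, 199, 564])

q1, q3 = np.percentile(chol, [25, 75])  # first and third quartiles
iqr = q3 - q1                           # interquartile range
# Points outside [q1 - 1.5*IQR, q3 + 1.5*IQR] are what a boxplot draws as outliers
outliers = chol[(chol < q1 - 1.5 * iqr) | (chol > q3 + 1.5 * iqr)]
```

With these values only the extreme reading 564 falls outside the fences, which is exactly the kind of point a boxplot would render as a lone marker.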

Feature Selection Feature selection involves choosing relevant features for model training. In this project, all features except the target variable are used for training.

Model Training and Evaluation Multiple machine learning models are trained and evaluated on the dataset. The models used include Logistic Regression, Decision Tree, Random Forest, SVM, KNN, Gradient Boosting, AdaBoost, Naive Bayes, and MLP Neural Network.
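Training and comparing several classifiers on a common train/test split can be sketched as below. Synthetic data stands in for the heart disease features (the real dataset is not bundled here), and only three of the nine listed models are shown to keep the sketch short:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in: 13 features echoes the common UCI heart disease layout
X, y = make_classification(n_samples=300, n_features=13, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(random_state=42),
}
# Fit each model and record its held-out accuracy
scores = {name: m.fit(X_tr, y_tr).score(X_te, y_te) for name, m in models.items()}
```

The remaining models (SVM, KNN, Gradient Boosting, AdaBoost, Naive Bayes, MLP) would slot into the same dictionary; the slides do not state the hyperparameters used.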

Training of model without duplicates

Training of model with duplicates

Results and Discussion

Model performance without duplicates: the Decision Tree model performed better, with a testing accuracy of 90.32%, precision of 0.91, recall of 0.90, and F1-score of 0.91, showing reasonable performance but some variability. The AdaBoost model had a testing accuracy of 83.87% with precision of 0.94, recall of 0.93, and F1-score of 0.93, indicating stronger generalization.

Model performance with duplicates: the Random Forest model achieved near-perfect scores (precision 0.99, recall 0.99, F1-score 0.99) and a testing accuracy of 98.05%, indicating overfitting. The Decision Tree model also showed inflated metrics: precision 0.95, recall 0.94, F1-score 0.94, with a testing accuracy of 97.08%.
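The precision, recall, and F1 figures quoted above come from comparing predicted labels against true labels. A minimal sketch with hypothetical predictions (not the project's actual outputs) shows how each metric is computed:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Hypothetical labels: 5 true positives, 1 false negative, 1 false positive, 3 true negatives
y_true = [1, 0, 1, 1, 0, 1, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 1, 1, 0, 1]

acc  = accuracy_score(y_true, y_pred)   # (TP + TN) / total = 8/10
prec = precision_score(y_true, y_pred)  # TP / (TP + FP) = 5/6
rec  = recall_score(y_true, y_pred)     # TP / (TP + FN) = 5/6
f1   = f1_score(y_true, y_pred)         # harmonic mean of precision and recall
```

In an imbalanced medical dataset, recall (catching true heart disease cases) is typically weighted more heavily than raw accuracy.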

Future Improvements: Future work could involve hyperparameter tuning of the best-performing models to further improve accuracy, as well as developing a real-time prediction system for practical deployment.
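The hyperparameter tuning mentioned above is commonly done with a cross-validated grid search. A small sketch on synthetic data (the grid values here are illustrative, not the project's):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the heart disease features
X, y = make_classification(n_samples=200, n_features=13, random_state=0)

# Try each parameter combination with 3-fold cross-validation
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [3, None]},
    cv=3,
)
grid.fit(X, y)
best = grid.best_params_  # the combination with the highest mean CV score
```

The fitted `grid` object then behaves like the best-found model, so it could back the real-time prediction system the slide proposes.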

Conclusion In this heart disease prediction project, models like Random Forest and Decision Tree performed exceptionally well with duplicates, achieving high precision, recall, and F1-scores. Given the sensitivity and critical nature of healthcare data, maintaining duplicates may be necessary to preserve valuable information. The inflated performance metrics indicate that these models can accurately predict heart disease when all data points are considered. Thus, despite the risk of overfitting, keeping duplicates ensures that the models leverage all available data, leading to better predictive accuracy in this healthcare context. This approach highlights the balance between data preprocessing and maintaining data integrity in sensitive applications.

Questions ?

Thank You!