Efficient Medical Diagnosis of Human Heart Diseases Using Machine Learning Techniques With and Without GridSearchCV
Presented by: V Lalitha
Authors: Ghulab Nabi Ahmad, Hira Fatima, Shafiullah, Abdelaziz Salah and Imadadullah
Published in IEEE Access
Introduction Cardiovascular diseases are a significant global health concern, accounting for a substantial portion of annual deaths. Early and accurate diagnosis of heart diseases is essential for effective treatment and management. Machine learning has emerged as a valuable tool for medical diagnosis, offering the potential for improved accuracy and efficiency. The research paper focuses on utilizing machine learning techniques to enhance the diagnosis of human heart diseases. The primary aim is to improve the accuracy of heart disease diagnosis using a variety of techniques and hyperparameter optimization.
Model’s Flow Diagram
1. Data Collection: Two Kaggle datasets are used: the Heart Disease Cleveland, Hungary, Switzerland & Long Beach V dataset and the Heart Disease UCI dataset.
2. Data Pre-Processing: Data cleaning, data transformation, feature selection, data splitting, and handling class imbalance.
3. Data Mining: The process of extracting valuable insights, patterns, and knowledge from large datasets, here related to human heart diseases.
4. Proposed Model: The proposed model applies machine learning algorithms: Logistic Regression, k-Nearest Neighbors (k-NN), Support Vector Machine (SVM), and XGBoost.
Image source: https://ieeexplore.ieee.org/document/9751602/
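The pre-processing stage (step 2) can be sketched as follows. This is a minimal illustration, assuming scikit-learn is available; synthetic data stands in for the Kaggle heart-disease CSVs, which are not bundled with these slides.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 13))      # 13 features, as in the Cleveland data
y = rng.integers(0, 2, size=300)    # binary target: disease / no disease

# Data splitting: hold out 20% for evaluation, stratified on the target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Data transformation: fit the scaler on the training fold only,
# then apply the same scaling to the held-out fold
scaler = StandardScaler().fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

print(X_train.shape, X_test.shape)  # (240, 13) (60, 13)
```

Fitting the scaler on the training fold only avoids leaking information from the test set into the model.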
Dataset Information
Heart Disease Prediction Proposed Model Image source: https://ieeexplore.ieee.org/document/9751602/
Logistic Regression
Logistic Regression is primarily used for binary classification problems, where the goal is to predict one of two possible outcomes.
1. Binary Classification: Logistic Regression is employed to classify data into one of two classes, such as Yes/No, True/False, Spam/Not Spam, or 1/0.
2. Model: It uses a logistic (sigmoid) function to transform a linear combination of the input features into a probability score between 0 and 1.
3. Probability: Values closer to 1 indicate a high probability of belonging to one class, and values closer to 0 indicate a high probability of belonging to the other class.
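The three points above can be seen in a few lines of scikit-learn. This is a toy sketch, not the paper's experiment: one synthetic feature with classes separated around zero.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy binary problem: one feature, class 0 for negative x, class 1 for positive x
X = np.array([[-3.0], [-2.0], [-1.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression().fit(X, y)

# The sigmoid maps the linear score w*x + b into a probability in (0, 1)
proba = clf.predict_proba([[2.5]])[0, 1]   # P(class 1) for x = 2.5
print(round(proba, 3))                      # well above 0.5 for a clearly positive x
```

`predict_proba` returns the sigmoid-transformed score, which is then thresholded (at 0.5 by default) to produce the class label.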
K-Nearest Neighbours (KNN)
K-Nearest Neighbors (KNN) is a machine learning algorithm for classification and regression. It predicts the outcome for a new data point by comparing it with its k nearest neighbors in the training data, so the choice of k is critical.
Image source: https://ieeexplore.ieee.org/document/9751602/
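A minimal KNN sketch on made-up 2-D points (not the paper's data), assuming scikit-learn:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Two well-separated clusters: class 0 near the origin, class 1 near (5, 5)
X = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
y = np.array([0, 0, 0, 1, 1, 1])

# k = 3: each prediction is a majority vote over the 3 nearest neighbors
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
print(knn.predict([[0.5, 0.5], [5.5, 5.5]]))  # [0 1]
```

A small k makes the model sensitive to noise; a large k smooths the decision boundary but can blur class borders, which is why k is usually tuned.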
Support Vector Machine
Support Vector Machine (SVM) is a powerful supervised machine learning algorithm used for classification and regression tasks.
1. Objective: SVM finds the hyperplane that best separates the classes in the feature space while maximizing the margin between them.
2. Margin: The margin is the distance between the hyperplane and the nearest data points from each class; SVM looks for the hyperplane with the largest margin.
3. Kernel Trick: SVM handles both linear and non-linear data by using kernel functions to map the data into higher-dimensional spaces.
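These ideas can be sketched with scikit-learn's `SVC` on toy, linearly separable points (again, not the paper's dataset):

```python
import numpy as np
from sklearn.svm import SVC

# Linearly separable toy data; a linear kernel suffices here
X = np.array([[-2, -2], [-1, -1], [-2, -1], [1, 1], [2, 2], [2, 1]])
y = np.array([0, 0, 0, 1, 1, 1])

svm = SVC(kernel="linear").fit(X, y)

# The support vectors are the training points nearest the separating
# hyperplane; they alone determine the margin
print(svm.support_vectors_)
print(svm.predict([[3, 3], [-3, -3]]))  # [1 0]
```

Swapping `kernel="linear"` for `kernel="rbf"` applies the kernel trick, letting the same API fit non-linear boundaries.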
Extreme Gradient Boosting
Extreme Gradient Boosting, often referred to as XGBoost, is a powerful and popular machine learning algorithm that belongs to the ensemble learning family, specifically gradient boosting.
1. Ensemble Learning: XGBoost combines the predictions of multiple weaker models (typically decision trees) to create a strong predictive model, improving predictive accuracy.
2. Gradient Boosting: XGBoost employs a sequential training framework: it builds a series of decision trees where each tree corrects the errors made by the previous ones, leading to a more accurate and robust model.
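The gradient-boosting idea can be sketched as follows. Since the `xgboost` package may not be installed, this sketch uses scikit-learn's `GradientBoostingClassifier`, which implements the same sequential tree-boosting principle; the paper's XGBoost model exposes a compatible `fit`/`predict` interface via `xgboost.XGBClassifier`. The data here is synthetic, not the heart-disease data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic binary task with 13 features, echoing the heart datasets' width
X, y = make_classification(n_samples=400, n_features=13, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Sequential ensemble: each shallow tree is fit to the residual errors
# of the trees built before it
gb = GradientBoostingClassifier(
    n_estimators=100, max_depth=3, learning_rate=0.1).fit(X_tr, y_tr)

print(round(gb.score(X_te, y_te), 2))  # held-out accuracy
```

`n_estimators`, `max_depth`, and `learning_rate` are exactly the kind of hyperparameters the paper later tunes with GridSearchCV.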
GridSearchCV
Grid Search Cross-Validation (GridSearchCV) is a hyperparameter tuning technique used in machine learning to systematically search for the best combination of hyperparameter values for a given model.
1. Define the Hyperparameter Grid: Specify the hyperparameters to tune and, for each one, a list of candidate values to explore. Together these lists define the hyperparameter grid.
2. Choose a Model: Select the machine learning algorithm to tune, a classifier or a regressor depending on the problem.
GridSearchCV
3. Split the Data: Divide the dataset into a training set for model fitting and a validation set for hyperparameter tuning; the validation set assesses the performance of each hyperparameter combination.
4. Perform Cross-Validation: GridSearchCV uses k-fold cross-validation to evaluate each combination of hyperparameters: it divides the training data into k subsets (folds) and iterates through them, using one fold for validation while training on the other k−1 folds.
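The four steps above map directly onto scikit-learn's `GridSearchCV`. This sketch tunes an SVM on synthetic data; the grid values are illustrative, not the ones reported in the paper.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the heart-disease data
X, y = make_classification(n_samples=300, n_features=13, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)   # step 3

param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}      # step 1
search = GridSearchCV(SVC(), param_grid, cv=5)                     # steps 2 & 4
search.fit(X_tr, y_tr)   # tries all 6 combinations with 5-fold CV

print(search.best_params_)                 # best combination found
print(round(search.score(X_te, y_te), 2)) # refit best model, held-out accuracy
```

After the search, `best_estimator_` is the model refit on the full training set with the winning hyperparameters, which is what gets evaluated on the test set.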
Hyperparameters
The best hyperparameters of XGBoost found with the optimization technique, for Kaggle's Heart Disease Cleveland, Hungary, Switzerland & Long Beach V dataset and for the Heart Disease UCI Kaggle dataset.
Image source: https://ieeexplore.ieee.org/document/9751602/
With vs. Without GridSearchCV
Image source: https://ieeexplore.ieee.org/document/9751602/