This presentation gives a general overview of how the Naive Bayes classifier works in machine learning.
Size: 3.8 MB
Language: en
Added: Aug 22, 2024
Slides: 21 pages
Slide Content
Classification using Naive Bayes Classifier
Group Members
1. Muhammad Zain Ul Abeedin (01-132202-035)
2. Raffeain Khalil (01-132202-037)
3. Raja Muhammad Miraj Ul Islam (01-132202-038)
4. Syed Furquan Haider Zaidi (01-132202-041)
Table of contents
01 What is Naive Bayes Classifier?
02 Our Dataset
03 Data Preprocessing
04 Python Code and Results
01 What is Naive Bayes Classifier? (ML)
Naive Bayes Classifier
Supervised learning classification algorithm
Based on Bayes' Theorem, which describes the conditional probability of an event:
P(A|B) = P(B|A) · P(A) / P(B)
where
P(A|B): probability of event A occurring, given that event B has occurred
P(B|A): probability of event B occurring, given that event A has occurred
P(A): probability of event A
P(B): probability of event B
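As a small worked example of Bayes' Theorem, the sketch below computes P(A|B) from P(B|A), P(A) and P(B). The probabilities are made-up illustration values, not figures from the presentation, and the function name `bayes_posterior` is hypothetical.

```python
def bayes_posterior(p_b_given_a: float, p_a: float, p_b: float) -> float:
    """Return P(A|B) = P(B|A) * P(A) / P(B)."""
    if p_b <= 0:
        raise ValueError("P(B) must be positive")
    return p_b_given_a * p_a / p_b


# Made-up numbers, for illustration only.
p_a = 0.1              # P(A): prior probability of event A
p_b_given_a = 0.8      # P(B|A): probability of B when A has occurred
p_b_given_not_a = 0.2  # P(B|not A): probability of B when A has not occurred

# P(B) via the law of total probability.
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

posterior = bayes_posterior(p_b_given_a, p_a, p_b)
print(f"P(A|B) = {posterior:.4f}")
```

With these values, observing B raises the probability of A from the prior 0.1 to roughly 0.31.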
Contd.
Based on the probability of a hypothesis, given the data and some prior knowledge
Assumes that all features of the dataset are independent of each other, given the class
Fast and easy to implement
Works particularly well for text classification
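The independence assumption means that the score for a class is the class prior multiplied by one likelihood term per feature. A minimal sketch of this idea follows. The training rows are invented, and the feature names, the `posterior` helper and the use of Laplace smoothing (`alpha`) are assumptions made for illustration, not details taken from the presentation.

```python
from collections import Counter, defaultdict

# Toy, made-up training rows: (features, label). Label 1 = positive class.
rows = [
    ({"hypertension": 1, "smoker": 1}, 1),
    ({"hypertension": 1, "smoker": 0}, 1),
    ({"hypertension": 0, "smoker": 1}, 0),
    ({"hypertension": 0, "smoker": 0}, 0),
    ({"hypertension": 0, "smoker": 0}, 0),
    ({"hypertension": 1, "smoker": 0}, 0),
]

# How many rows belong to each class.
class_counts = Counter(label for _, label in rows)

# value_counts[label][feature][value] = number of rows with that combination.
value_counts = defaultdict(lambda: defaultdict(Counter))
for features, label in rows:
    for name, value in features.items():
        value_counts[label][name][value] += 1


def posterior(features, n_values=2, alpha=1.0):
    """Return normalised class probabilities under the naive assumption."""
    scores = {}
    for label, n_label in class_counts.items():
        score = n_label / len(rows)  # prior P(class)
        for name, value in features.items():
            count = value_counts[label][name][value]
            # Smoothed estimate of P(feature = value | class).
            score *= (count + alpha) / (n_label + alpha * n_values)
        scores[label] = score
    total = sum(scores.values())
    return {label: s / total for label, s in scores.items()}


probs = posterior({"hypertension": 1, "smoker": 1})
prediction = max(probs, key=probs.get)
print(probs, prediction)
```

Each feature contributes its own factor and no joint statistics between features are stored, which is what keeps the method fast.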
02 Our Dataset (ML)
Diabetes Prediction
The dataset we are using consists of 8 features, which can be used to predict the possibility of diabetes in a person:
Gender
Age
Hypertension
Heart Disease
Smoking Habit
Body Mass Index (BMI)
HbA1c Level
Contd.
The dataset was downloaded from Kaggle. The URL for it is mentioned in the references section.
100,000 (one hundred thousand) instances
Verified to have no missing values
Out of the 8 features, only “gender” and “smoking history” contain categorical data, which will have to be preprocessed.
Using this dataset, we will implement the classifier to predict whether a person has diabetes.
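The checks described above (no missing values, which columns are categorical) could be carried out with pandas roughly as follows. This is only a sketch: the few rows are invented stand-ins for the Kaggle file, and the column names are assumptions.

```python
import pandas as pd

# Hypothetical stand-in rows. The real data would be loaded from the
# downloaded Kaggle CSV with pd.read_csv; column names here are assumed.
df = pd.DataFrame({
    "gender": ["Female", "Male", "Female"],
    "age": [54.0, 28.0, 36.0],
    "smoking_history": ["never", "current", "former"],
    "bmi": [27.3, 23.5, 30.1],
    "diabetes": [0, 0, 1],
})

# Number of missing values in each column.
missing_per_column = df.isna().sum()

# Non-numeric columns are the ones that need encoding before training.
categorical_columns = df.select_dtypes(exclude="number").columns.tolist()

print(missing_per_column)
print(categorical_columns)
```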
03 Data Preprocessing (ML)
Data Preprocessing
“Data preprocessing refers to the process of cleaning, manipulating, transforming, and deleting data from our datasets before performing any ML based activity on it.”¹
In order for our ML algorithm to successfully “read” and perform on our data, we need to convert it into the appropriate form.
Datasets can also have missing values, which need to be filled, or irrelevant features, which should be dropped.
Pre-processing on our dataset
Most of our dataset’s features have binary or numeric values. Only “Gender” and “Smoking History” contain categorical data, which we will transform.
(a) Gender
This feature contains categorical nominal data, i.e. Male and Female. The order of the categories is of no consequence. We will perform basic encoding to convert it into binary values.
(b) Smoking History
This feature contains categorical data: the answers include “Never” (never a smoker), “Current” (smokes at present) and “Former” (used to smoke). We will perform One Hot Encoding to convert this data into usable form, giving each answer its own binary column.
[Slide images: Gender encoding code; data before preprocessing; data after preprocessing]
[Slide images: Smoking Habits encoding code; data before preprocessing; data after preprocessing]
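The two encoding steps described in this section could be sketched with pandas as follows. This is not the presenters' code: the sample rows, the column names, the 0/1 mapping for gender and the `smoking` prefix are all assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical sample rows; column names and category labels are assumed.
df = pd.DataFrame({
    "gender": ["Female", "Male", "Female", "Male"],
    "smoking_history": ["never", "current", "former", "never"],
})

# (a) Gender: basic binary encoding. Which value maps to 0 or 1 is arbitrary.
df["gender"] = df["gender"].map({"Female": 0, "Male": 1})

# (b) Smoking history: one-hot encoding, one 0/1 column per category.
encoded = pd.get_dummies(
    df, columns=["smoking_history"], prefix="smoking", dtype=int
)

print(encoded)
```

After this step every column is numeric, and each row has exactly one of the `smoking_*` columns set to 1.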
04 Python Code and Results (ML)
Python Code
Result
Our model achieved an accuracy of 90% with the random state set to 0. 80% of the dataset was used to train the model and the remaining 20% was used to test it.
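An 80/20 split with the random state set to 0, followed by training and scoring a Naive Bayes model, might look like the sketch below using scikit-learn. The data is synthetic, and the choice of `GaussianNB` is an assumption, since the presentation does not state which Naive Bayes variant was used. The accuracy printed here therefore says nothing about the 90% figure reported above.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in data (not the Kaggle dataset): 3 numeric features
# and a binary label that depends on the first two features plus noise.
rng = np.random.default_rng(0)
n = 1000
X = rng.normal(size=(n, 3))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n) > 0).astype(int)

# 80% of the rows for training, 20% for testing, random state 0.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# GaussianNB is an assumed choice of Naive Bayes variant.
model = GaussianNB()
model.fit(X_train, y_train)

accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Accuracy on the synthetic test split: {accuracy:.3f}")
```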