Classification using Naive Bayes Algorithm

RaffeainKhalil, 21 slides, Aug 22, 2024

About This Presentation

This presentation gives a general overview of how the Naive Bayes classifier works in machine learning.


Slide Content

Classification using Naive Bayes Classifier

Group Members
1. Muhammad Zain Ul Abeedin (01-132202-035)
2. Raffeain Khalil (01-132202-037)
3. Raja Muhammad Miraj Ul Islam (01-132202-038)
4. Syed Furquan Haider Zaidi (01-132202-041)

Table of contents
01 What is Naive Bayes Classifier?
02 Our Dataset
03 Data Preprocessing
04 Python Code and Results

01 What is Naive Bayes Classifier? (ML)

Naive Bayes Classifier
Supervised learning classification algorithm
Based on Bayes' theorem, which describes the conditional probability of an event:
P(A|B) = P(B|A) P(A) / P(B)
where
P(A|B): probability of event A occurring, given that event B has occurred
P(B|A): probability of event B occurring, given that event A has occurred
P(A): probability of event A
P(B): probability of event B
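As a small worked example of the formula above, the following sketch computes P(A|B) from P(B|A), P(A), and P(B). The numbers are hypothetical and chosen only for illustration, and the function name `bayes_posterior` is our own.

```python
def bayes_posterior(p_b_given_a, p_a, p_b):
    """Return P(A|B) = P(B|A) * P(A) / P(B)."""
    if p_b <= 0:
        raise ValueError("P(B) must be positive")
    return p_b_given_a * p_a / p_b


# Hypothetical probabilities, for illustration only.
p_a = 0.10              # P(A): prior probability of event A
p_b_given_a = 0.80      # P(B|A): probability of B when A has occurred
p_b_given_not_a = 0.20  # P(B|not A): probability of B when A has not occurred

# P(B) via the law of total probability.
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

p_a_given_b = bayes_posterior(p_b_given_a, p_a, p_b)
print(round(p_a_given_b, 4))  # 0.3077
```

Even though B is four times more likely under A than under not-A, the low prior P(A) keeps the posterior at roughly 31%.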

Contd.
Computes the probability of a hypothesis, given the data and some prior knowledge
Assumes that all features of the dataset are independent of each other, given the class
Fast and easy to implement
Often works well for text classification
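The independence assumption means the score for each class is just the prior multiplied by one likelihood per feature. The sketch below illustrates this for a toy text-classification case. The class names, feature names, and all probabilities are hypothetical, and the helper functions are our own.

```python
from math import prod


def naive_bayes_scores(priors, likelihoods, observation):
    """Unnormalized score per class: P(class) * product of P(feature=value | class)."""
    scores = {}
    for label, prior in priors.items():
        scores[label] = prior * prod(
            likelihoods[label][feature][value]
            for feature, value in observation.items()
        )
    return scores


def normalize(scores):
    """Scale the scores so they sum to 1 (this is the division by P(B))."""
    total = sum(scores.values())
    return {label: score / total for label, score in scores.items()}


# Hypothetical priors and per-feature likelihoods, for illustration only.
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": {"has_link": {True: 0.7, False: 0.3}, "all_caps": {True: 0.5, False: 0.5}},
    "ham": {"has_link": {True: 0.2, False: 0.8}, "all_caps": {True: 0.1, False: 0.9}},
}

observation = {"has_link": True, "all_caps": False}
posteriors = normalize(naive_bayes_scores(priors, likelihoods, observation))
prediction = max(posteriors, key=posteriors.get)
print(prediction, posteriors)
```

Because each feature contributes a single lookup and a multiplication, training and prediction stay cheap, which is where the "fast, easy to implement" property comes from.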

02 Our Dataset (ML)

Diabetes Prediction
The dataset we are using consists of 8 features which can be used to predict the possibility of diabetes in a person, including:
Gender
Age
Hypertension
Heart Disease
Smoking Habit
Body Mass Index (BMI)
HbA1c Level

Contd.
The dataset was downloaded from Kaggle. The URL for it is mentioned in the references section.
100,000 (one hundred thousand) instances
Verified to have no missing values
Out of 8 features, only “gender” and “smoking history” have categorical data, which will need preprocessing.
Using this dataset, we will implement the classifier to predict whether a person has diabetes.
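The two checks described above (no missing values, and which columns are categorical) can be sketched as follows. This is a minimal illustration using only the standard library. The column names and the three sample rows are hypothetical stand-ins, not the actual Kaggle file.

```python
import csv
import io

# Hypothetical sample rows; column names are assumed for illustration.
SAMPLE_CSV = """gender,age,hypertension,heart_disease,smoking_history,bmi,HbA1c_level,diabetes
Female,80,0,1,never,25.19,6.6,0
Male,28,0,0,current,27.32,5.7,0
Female,54,0,0,former,27.32,6.6,1
"""


def is_number(text):
    """Return True if the text parses as a number."""
    try:
        float(text)
        return True
    except ValueError:
        return False


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
columns = list(rows[0].keys())

# Count empty or absent cells per column.
missing_counts = {
    col: sum(1 for row in rows if row[col] is None or row[col].strip() == "")
    for col in columns
}

# A column is treated as categorical if any of its values is not numeric.
categorical_columns = [
    col for col in columns if not all(is_number(row[col]) for row in rows)
]

print(missing_counts)
print(categorical_columns)  # ['gender', 'smoking_history']
```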

03 Data Preprocessing (ML)

Data Preprocessing
“Data preprocessing refers to the process of cleaning, manipulating, transforming, and deleting data from our datasets before performing any ML based activity on it.” [1]
For our ML algorithm to “read” and work with our data, we need to convert the data into an appropriate form.
Datasets can also have missing values, which need to be filled, or irrelevant features, which should be dropped.

Pre-processing on our dataset
Our dataset’s features have binary and numeric values. Only “Gender” and “Smoking History” have categorical data, which we will transform.
(a) Gender
This feature contains categorical nominal data, i.e. Male and Female. The order of the categories is of no consequence. We will perform basic encoding to convert it into binary values.
(b) Smoking History
This feature contains categorical data whose answers include “Never” (never a smoker), “Current” (smokes at present), and “Former” (used to smoke). We will perform One Hot Encoding, which gives each category its own binary column and does not impose an order, to convert this data into usable form.

[Figure: Gender. Code, data before preprocessing, data after preprocessing]
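The code from this slide is not reproduced in the text, so the following is only a minimal sketch of what a basic binary encoding of the Gender feature could look like. The mapping of Female to 0 and Male to 1 is an assumption, as are the sample values and the function name.

```python
# Assumed mapping; the slides do not state which value maps to 0 or 1.
GENDER_CODES = {"Female": 0, "Male": 1}


def encode_gender(values, codes=GENDER_CODES):
    """Replace each gender label with its binary code."""
    encoded = []
    for value in values:
        if value not in codes:
            raise ValueError(f"Unexpected gender value: {value!r}")
        encoded.append(codes[value])
    return encoded


# Hypothetical sample column, for illustration only.
gender_before = ["Female", "Male", "Male", "Female"]
gender_after = encode_gender(gender_before)
print(gender_after)  # [0, 1, 1, 0]
```

Raising an error on unexpected labels is a design choice here: it surfaces stray categories early instead of silently encoding them.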

[Figure: Smoking Habits. Code, data before preprocessing, data after preprocessing]
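As with Gender, the slide's code is not available in the text. The sketch below shows one way one-hot encoding of the smoking feature could work, with each category becoming its own 0/1 column. The column prefix, category list, and sample values are assumptions made for illustration.

```python
def one_hot_encode(values, categories, prefix="smoking_history"):
    """Turn each value into a dict of 0/1 columns, one column per category."""
    encoded_rows = []
    for value in values:
        if value not in categories:
            raise ValueError(f"Unexpected category: {value!r}")
        encoded_rows.append(
            {f"{prefix}_{category}": int(value == category) for category in categories}
        )
    return encoded_rows


# Hypothetical sample column, for illustration only.
smoking_before = ["Never", "Current", "Former", "Never"]
smoking_after = one_hot_encode(smoking_before, ["Never", "Current", "Former"])

for before, after in zip(smoking_before, smoking_after):
    print(before, after)
```

Exactly one column is set to 1 in every row, so no category is treated as larger or smaller than another.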

04 Python Code and Results (ML)

Python Code
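The code on this slide is not reproduced in the text, so the following is not the authors' code. It is a minimal from-scratch sketch of a Gaussian Naive Bayes classifier with an 80/20 train/test split and a random seed of 0, run on synthetic two-feature data generated inside the script. The choice of the Gaussian variant, all function names, and the synthetic data are assumptions. In practice a library implementation such as scikit-learn's `GaussianNB` would normally be used instead.

```python
import math
import random
from collections import defaultdict


def split_train_test(rows, labels, test_fraction=0.2, seed=0):
    """Shuffle indices with a fixed seed and hold out a fraction for testing."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = int(len(rows) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(idx):
        return [rows[i] for i in idx], [labels[i] for i in idx]

    return pick(train_idx), pick(test_idx)


def fit_gaussian_nb(rows, labels, var_floor=1e-9):
    """Estimate a log prior plus a per-feature mean and variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    model = {}
    for label, members in by_class.items():
        stats = []
        for column in zip(*members):
            mean = sum(column) / len(column)
            var = sum((x - mean) ** 2 for x in column) / len(column) + var_floor
            stats.append((mean, var))
        model[label] = (math.log(len(members) / len(rows)), stats)
    return model


def predict(model, row):
    """Pick the class with the highest log prior plus summed log likelihoods."""
    def log_posterior(label):
        log_prior, stats = model[label]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
            for x, (mean, var) in zip(row, stats)
        )
    return max(model, key=log_posterior)


def accuracy(model, rows, labels):
    """Fraction of rows whose predicted class matches the true label."""
    correct = sum(predict(model, row) == label for row, label in zip(rows, labels))
    return correct / len(labels)


# Synthetic data (not the Kaggle dataset): two numeric features, binary label.
rng = random.Random(0)
rows, labels = [], []
for _ in range(500):
    label = int(rng.random() < 0.3)
    rows.append([rng.gauss(5.5 + 1.5 * label, 0.6), rng.gauss(27 + 4 * label, 4)])
    labels.append(label)

(train_X, train_y), (test_X, test_y) = split_train_test(rows, labels, 0.2, seed=0)
model = fit_gaussian_nb(train_X, train_y)
test_accuracy = accuracy(model, test_X, test_y)
print(f"Test accuracy: {test_accuracy:.2f}")
```

Working in log space avoids numerical underflow when many small likelihoods are multiplied together. The accuracy printed here describes the synthetic data only and says nothing about the diabetes dataset.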

Result
Our model reached an accuracy of 90% on the test set, with the random state set to 0. 80% of the dataset was used to train the model and 20% was used to test it.

05 References (ML)

References
[1] https://www.geeksforgeeks.org/data-preprocessing-in-data-mining/
[2] https://www.analyticsvidhya.com/blog/2017/09/naive-bayes-explained/
[3] https://www.javatpoint.com/machine-learning-naive-bayes-classifier
[4] https://www.saedsayad.com/categorical_variables.htm#:~:text=There%20are%20two%20types%20of,variable%20has%20a%20clear%20ordering
[5] https://www.benthonlabs.com/AI.html

Thank You!