Classification
Target variable is categorical; predictors can be of any data type.
Algorithms:
- Decision Trees
- Rule Induction
- kNN
- Naive Bayesian
- Neural Networks
- Support Vector Machines
- Ensemble Meta Models
Decision Trees
[Figure: example data set with predictor / attribute columns and a target / class column, and the decision tree built from it]
Tree Split - Entropy
Measure of impurity: every split tries to make the child nodes more pure. Common impurity measures (sketched in code below):
- Gini impurity
- Information Gain (Entropy)
- Misclassification Error
https://www.quora.com/What-are-the-advantages-of-different-Decision-Trees-Algorithms
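A minimal sketch of the three impurity measures, assuming a node's class distribution is given as raw counts (the example counts are hypothetical):

```python
import math

# Impurity of a node, computed from its class counts.
def impurity(counts):
    total = sum(counts)
    probs = [c / total for c in counts]
    gini = 1 - sum(p ** 2 for p in probs)                     # Gini impurity
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)  # entropy
    error = 1 - max(probs)                                    # misclassification error
    return gini, entropy, error

print(impurity([10, 0]))  # pure node: every measure is (near) zero
print(impurity([5, 5]))   # maximally impure 50/50 node: 0.5, 1.0, 0.5
```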
Rule Induction
Tree to Rules
Rule 1: if (Outlook = overcast) then yes
Rule 2: if (Outlook = rain) and (Wind = false) then yes
Rule 3: if (Outlook = rain) and (Wind = true) then no
Rule 4: if (Outlook = sunny) and (Humidity > 77.5) then no
Rule 5: if (Outlook = sunny) and (Humidity ≤ 77.5) then yes
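As an illustration, the five rules translate directly into code (attribute names follow the slide's golf example; the function name is an assumption):

```python
# The five rules above as one function (names are illustrative).
def play(outlook, wind, humidity):
    if outlook == "overcast":                      # Rule 1
        return "yes"
    if outlook == "rain":
        return "no" if wind else "yes"             # Rules 2 and 3
    if outlook == "sunny":
        return "no" if humidity > 77.5 else "yes"  # Rules 4 and 5
    return None  # no rule covers the record

print(play("sunny", wind=False, humidity=80))  # Rule 4 fires -> "no"
```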
Rules: R = {r1 ∨ r2 ∨ … ∨ rk}, where k is the number of disjuncts in the rule set. Each individual disjunct (rule) can be represented as ri: if (antecedent or condition) then (consequent).
Predict your commute time http://www.wired.com/2015/08/pretty-maps-bay-area-hellish-commutes/#slide-2
Bayes’ theorem:
P(Y|X) = P(X|Y) · P(Y) / P(X)
- P(Y|X): posterior probability
- P(X|Y): class conditional probability
- P(Y): probability of the outcome
- P(X): probability of the conditions
Data set
Class conditional probability
Test record
Calculation of posterior probability P(Y|X)
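A minimal sketch of the posterior calculation, assuming a prior and class-conditional probabilities estimated from counts in the classic golf data set (the specific numbers here are illustrative):

```python
# P(Y|X) ∝ P(Y) * Π P(x_i|Y); P(X) cancels when we normalize over classes.
prior = {"yes": 9 / 14, "no": 5 / 14}              # P(Y) from class counts
cond = {                                           # class-conditional P(x_i|Y)
    ("Outlook=sunny", "yes"): 2 / 9, ("Outlook=sunny", "no"): 3 / 5,
    ("Wind=true", "yes"): 3 / 9,     ("Wind=true", "no"): 3 / 5,
}
test = ["Outlook=sunny", "Wind=true"]              # the test record X

score = {}
for y in prior:
    s = prior[y]
    for x in test:
        s *= cond[(x, y)]                          # multiply the conditionals
    score[y] = s

total = sum(score.values())                        # normalizing constant P(X)
posterior = {y: s / total for y, s in score.items()}
print(posterior)  # roughly {'yes': 0.27, 'no': 0.73}
```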
Issues (sketches of the first two fixes follow):
- Incomplete training set (zero counts) -> use Laplace correction
- Continuous numeric attributes -> use a probability density function
- Attribute independence assumption -> remove correlated attributes
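Hedged sketches of the first two fixes (the counts, mean, and standard deviation below are illustrative assumptions):

```python
import math

# Laplace correction: add 1 to every count so an attribute value that never
# appears with a class still gets a small nonzero probability instead of 0.
def laplace(count, class_total, n_values):
    return (count + 1) / (class_total + n_values)

print(laplace(0, 5, 3))  # unseen value -> 1/8, not 0

# Continuous attribute: estimate P(x|Y) with a normal probability density
# function using the attribute's per-class mean and standard deviation.
def normal_pdf(x, mean, std):
    return math.exp(-((x - mean) ** 2) / (2 * std ** 2)) / (std * math.sqrt(2 * math.pi))

print(normal_pdf(75.0, mean=73.0, std=6.2))  # e.g. density of Humidity=75 given "yes"
```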
NEURAL NETWORKS
Model Y = 1 + 2X1 + 3X2 + 4X3
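A minimal sketch of a single neuron computing this model: a weighted sum of the inputs plus a bias, optionally passed through a nonlinear activation (the sigmoid here is one common choice; the function names are assumptions):

```python
import math

# One neuron: bias + weighted sum of inputs, then an activation function.
def neuron(x, weights, bias, activation=lambda z: z):
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return activation(z)

# The slide's linear model Y = 1 + 2*X1 + 3*X2 + 4*X3:
print(neuron([1.0, 1.0, 1.0], weights=[2, 3, 4], bias=1))  # -> 10.0

# The same neuron with a sigmoid activation, as used inside neural networks:
sigmoid = lambda z: 1 / (1 + math.exp(-z))
print(neuron([1.0, 1.0, 1.0], [2, 3, 4], 1, sigmoid))      # -> ~0.99995
```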
Neurons
SUPPORT VECTOR MACHINES
Boundary
Margin
Transforming linearly non-separable data
Optimal hyperplane
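A hedged sketch with scikit-learn's SVC (an assumption; any SVM library would do): the RBF kernel implicitly maps the points to a higher-dimensional space where this XOR-style data becomes linearly separable, and the optimizer then finds the maximum-margin hyperplane there.

```python
from sklearn.svm import SVC

# XOR-style toy data: not linearly separable in the original 2-D space.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]

# The RBF kernel transforms the data implicitly; C trades margin width
# against training errors.
clf = SVC(kernel="rbf", C=1.0, gamma=2.0)
clf.fit(X, y)
print(clf.predict([[0, 1], [1, 1]]))  # expected: [1 0]
```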
Ensemble Learners
Ensemble model
- Wisdom of the Crowd: a meta learner combines the predictions of several base models
- Reduces the model's generalization error (a code sketch follows)
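A hedged sketch of a voting meta learner with scikit-learn (assumed available; the data set and base models are illustrative): several base models vote, and the combined model typically generalizes better than any single one.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)

# Meta learner: majority vote over three different base models.
ensemble = VotingClassifier(estimators=[
    ("tree", DecisionTreeClassifier(max_depth=3)),
    ("nb", GaussianNB()),
    ("lr", LogisticRegression(max_iter=1000)),
])

print(cross_val_score(ensemble, X, y, cv=5).mean())  # cross-validated accuracy
```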