Data Classification in Data Science for Students

PhanthomThomas, Mar 03, 2025

Slide Content

Classification

Models

Classification: the target variable is categorical; the predictors can be of any data type. Algorithms: decision trees, rule induction, kNN, naive Bayesian, neural networks, support vector machines, ensemble meta models.
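
As a rough illustration of this setup (categorical target, predictors of any type), here is a minimal Python sketch assuming scikit-learn is available; the encoded weather features and records are made up for illustration:

# Minimal classification sketch (illustrative data).
from sklearn.tree import DecisionTreeClassifier

# Predictors: numeric humidity plus one-hot encoded outlook; target: play yes/no.
X = [[85, 1, 0, 0],   # sunny
     [80, 0, 1, 0],   # overcast
     [70, 0, 0, 1],   # rain
     [65, 0, 1, 0]]   # overcast
y = ["no", "yes", "yes", "yes"]

model = DecisionTreeClassifier().fit(X, y)
print(model.predict([[75, 1, 0, 0]]))  # predict the class of a new record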

Decision Trees

Decision Trees Predictors / Attributes Target / Class

Decision Tree

Tree Split - Entropy

Measure of impurity: every split tries to make the child nodes more pure. Common measures: Gini impurity, information gain (entropy), misclassification error. https://www.quora.com/What-are-the-advantages-of-different-Decision-Trees-Algorithms
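
For concreteness, all three impurity measures can be computed from the class proportions at a node; a minimal sketch using the standard definitions (not taken from the slides):

import math

def gini(p):
    # Gini impurity: 1 - sum(p_i^2); 0 for a pure node.
    return 1.0 - sum(pi ** 2 for pi in p)

def entropy(p):
    # Entropy: -sum(p_i * log2(p_i)); information gain is the drop
    # in entropy from the parent node to the (weighted) child nodes.
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def misclassification_error(p):
    # Misclassification error: 1 - max(p_i).
    return 1.0 - max(p)

p = [0.5, 0.5]  # a maximally impure two-class node
print(gini(p), entropy(p), misclassification_error(p))  # 0.5 1.0 0.5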

Rule Induction

Tree to Rules
Rule 1: if (Outlook = overcast) then yes
Rule 2: if (Outlook = rain) and (Wind = false) then yes
Rule 3: if (Outlook = rain) and (Wind = true) then no
Rule 4: if (Outlook = sunny) and (Humidity > 77.5) then no
Rule 5: if (Outlook = sunny) and (Humidity ≤ 77.5) then yes

Rules: R = {r1 ∨ r2 ∨ ... ∨ rk}, where k is the number of disjuncts in the rule set and the disjuncts are combined with OR. An individual disjunct can be represented as ri = if (antecedent or condition) then (consequent).
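
The rule set above can be encoded directly; here is a sketch of Rules 1-5 as a Python function, with attribute names following the slides' weather example:

def predict_play(outlook, wind, humidity):
    # Rules 1-5 from the tree; the first matching disjunct fires.
    if outlook == "overcast":
        return "yes"                                # Rule 1
    if outlook == "rain":
        return "no" if wind else "yes"              # Rules 3 and 2
    if outlook == "sunny":
        return "no" if humidity > 77.5 else "yes"   # Rules 4 and 5

print(predict_play("sunny", wind=False, humidity=70))  # yes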

Approaches

Sequential covering
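
Sequential covering learns one rule at a time and removes the records that rule covers before learning the next. A minimal, schematic sketch; learn_one_rule and covers are hypothetical helpers passed in by the caller, not from the slides:

def sequential_covering(records, target_class, learn_one_rule, covers):
    # Greedy loop: grow one rule, add it to the set,
    # discard the records it covers, and repeat.
    rules = []
    remaining = list(records)
    while remaining:
        rule = learn_one_rule(remaining, target_class)
        if rule is None:   # no useful rule can be grown; stop
            break
        rules.append(rule)
        remaining = [r for r in remaining if not covers(rule, r)]
    return rules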

K Nearest Neighbors

Adage: birds of a feather flock together.

Guess the species for A and B

KNN

Measure of Proximity: distance

Other measures of proximity: correlation similarity, simple matching coefficient, Jaccard similarity, cosine similarity
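
Several of these proximity measures fit in a few lines of Python (standard definitions; the binary vectors used for SMC and Jaccard are illustrative):

import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def simple_matching(a, b):
    # Fraction of positions where binary vectors agree (0-0 and 1-1 both count).
    return sum(x == y for x, y in zip(a, b)) / len(a)

def jaccard(a, b):
    # Like the simple matching coefficient, but 0-0 matches are ignored.
    both = sum(x == 1 and y == 1 for x, y in zip(a, b))
    either = sum(x == 1 or y == 1 for x, y in zip(a, b))
    return both / either if either else 0.0

print(euclidean([1, 2], [4, 6]))       # 5.0
print(jaccard([1, 0, 1], [1, 1, 0]))   # 0.333...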

NAÏVE BAYESIAN

Predict your commute time http://www.wired.com/2015/08/pretty-maps-bay-area-hellish-commutes/#slide-2

Bayes’ theorem: class conditional probability, posterior probability, probability of the outcome (the prior), and probability of the conditions (the evidence).
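
Written out in its standard form (added here for reference), the theorem ties these four quantities together:

P(Y \mid X) = \frac{P(X \mid Y)\, P(Y)}{P(X)}

where P(Y|X) is the posterior probability, P(X|Y) the class conditional probability, P(Y) the probability of the outcome, and P(X) the probability of the conditions.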

Data set

Class conditional probability

Test record

Calculation of posterior probability P(Y|X)

Issues:
- Incomplete training set -> use the Laplace correction
- Continuous numeric attributes -> use a probability density function
- Attribute independence assumption -> remove correlated attributes
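
As an illustration of the Laplace correction (assumed here to be standard add-one smoothing), the class conditional estimate avoids a zero probability when an attribute value never occurs with a class in the training set:

def class_conditional(count_value_and_class, count_class, n_values, laplace=True):
    # P(X = value | Y = class); add-one (Laplace) smoothing keeps the
    # estimate nonzero even when count_value_and_class is 0.
    if laplace:
        return (count_value_and_class + 1) / (count_class + n_values)
    return count_value_and_class / count_class

# Attribute value never seen with this class:
print(class_conditional(0, 10, 3, laplace=False))  # 0.0 wipes out the whole product
print(class_conditional(0, 10, 3))                 # 1/13, a small but usable estimate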

NEURAL NETWORKS

Model Y = 1 + 2X1 + 3X2 + 4X3

Neurons
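
A single artificial neuron generalizes the linear model above: a weighted sum of the inputs plus a bias, passed through an activation function. A minimal sketch, with a sigmoid activation chosen for illustration:

import math

def neuron(inputs, weights, bias):
    # Weighted sum plus bias, squashed by a sigmoid activation.
    z = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1.0 / (1.0 + math.exp(-z))

# Same coefficients as the linear model Y = 1 + 2*X1 + 3*X2 + 4*X3.
print(neuron([0.1, 0.2, 0.3], weights=[2, 3, 4], bias=1))  # about 0.95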

SUPPORT VECTOR MACHINES

Boundary

Margin

Transforming linearly non-separable data

Optimal hyperplane
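
A minimal scikit-learn sketch of the ideas above: an RBF kernel implicitly transforms linearly non-separable data into a space where a separating hyperplane exists, and the fitted SVC maximizes the margin around it. The toy XOR-style data and the gamma/C values are made up for illustration:

from sklearn.svm import SVC

# XOR-style data: not linearly separable in the original space.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]

# The RBF kernel maps the points to a space where they become separable.
model = SVC(kernel="rbf", gamma=2.0, C=10.0).fit(X, y)
print(model.predict([[0.9, 0.1]]))  # near (1, 0), so class 1 is expected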

Ensemble Learners

Ensemble model: the wisdom of the crowd. A meta learner combines several base models, which reduces the model's generalization error.
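
A minimal sketch of a voting meta learner in scikit-learn; the base models and toy data are chosen arbitrarily for illustration:

from sklearn.ensemble import VotingClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X = [[0, 0], [0, 1], [1, 0], [1, 1], [0.2, 0.1], [0.9, 0.8]]
y = [0, 1, 1, 0, 0, 0]

# Majority vote over several base models: the crowd tends to err
# less often than any single model.
ensemble = VotingClassifier(estimators=[
    ("tree", DecisionTreeClassifier()),
    ("nb", GaussianNB()),
    ("knn", KNeighborsClassifier(n_neighbors=3)),
]).fit(X, y)
print(ensemble.predict([[0.1, 0.2]]))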

Ensemble models