What is Classification?
[Figure: a training dataset of samples with their labels (CAR, CAR, BIKE, BIKE). Goal: a function f that maps a sample to CAR or BIKE.]
Classification
Given a dataset $D = \{x_1, x_2, x_3, \dots, x_n\}$ and a set of class labels $C = \{c_1, c_2, c_3, \dots, c_k\}$,
the task of classification is to devise a mapping function $f : D \to C$.
Bayesian Classifier
Training data:

#Wheel   Height   Class Label
4        H        CAR
4        H        CAR
4        H        CAR
2        L        BIKE
2        L        BIKE
2        L        BIKE
4        L        BIKE
2        H        CAR

New sample to classify: {2, H} -> ?
Pr(CAR| 4,H) = 100%
Pr(CAR| 2,H) = 100%
Pr(BIKE| 4,L) = 100%
Pr(BIKE| 2,L) = 100%
Pr(CAR| 4,L) = 0%
Pr(BIKE|4,H) = 0%
Pr(CAR| 2,L) = 0%
Pr(BIKE| 2,H) = 0%
Compute $\Pr(c_j \mid x)$ for all $c_j \in C$ and predict the most probable class:
$\text{class}(x) = \arg\max_{c_j} \Pr(c_j \mid x)$
For the sample x = {2, H}:
$\Pr(\text{CAR} \mid \{2, H\}) = 1$
$\Pr(\text{BIKE} \mid \{2, H\}) = 0$
$\text{class}(x) = \arg\max_{c_j} \Pr(c_j \mid x) = \text{CAR}$
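The lookup above can be sketched in a few lines of Python. This is only an illustration: the row pairing in `DATA` is reconstructed from the flattened slide table, and the names `posterior_by_counting` and `classify` are made up for this example.

```python
from collections import Counter
from fractions import Fraction

# Training table as (wheels, height, label).
# Assumption: row pairing reconstructed from the flattened slide table.
DATA = [
    (4, "H", "CAR"), (4, "H", "CAR"), (4, "H", "CAR"),
    (2, "L", "BIKE"), (2, "L", "BIKE"), (2, "L", "BIKE"),
    (4, "L", "BIKE"), (2, "H", "CAR"),
]

def posterior_by_counting(x, data):
    """Estimate Pr(c | x) as the share of rows equal to x that carry label c."""
    matches = [label for wheels, height, label in data if (wheels, height) == x]
    if not matches:
        raise ValueError("x never appears in the training data")
    counts = Counter(matches)
    classes = sorted({label for _, _, label in data})
    return {c: Fraction(counts[c], len(matches)) for c in classes}

def classify(x, data):
    """Return the class with the largest estimated posterior (the argmax)."""
    post = posterior_by_counting(x, data)
    return max(post, key=post.get)

post = posterior_by_counting((2, "H"), DATA)
print(post["CAR"], post["BIKE"])   # posterior of each class for {2, H}
print(classify((2, "H"), DATA))    # predicted label
```

Counting like this only works when the exact feature combination has been seen in training, which is one reason the following slides rewrite the posterior with Bayes rule.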
Bayes Rule
$\Pr(c_j \mid x) = \frac{\Pr(c_j, x)}{\Pr(x)} = \frac{\Pr(x \mid c_j)\,\Pr(c_j)}{\Pr(x)}$
Expanding the denominator over all k classes:
$\Pr(c_j \mid x) = \frac{\Pr(x \mid c_j)\,\Pr(c_j)}{\Pr(x \mid c_1)\Pr(c_1) + \Pr(x \mid c_2)\Pr(c_2) + \dots + \Pr(x \mid c_k)\Pr(c_k)}$
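A minimal sketch of this form of Bayes rule, assuming the likelihoods and priors are already available as dictionaries keyed by class. The function name `bayes_posterior` is hypothetical; the numbers are counted from the training table for x = {2, H}.

```python
from fractions import Fraction

def bayes_posterior(likelihood, prior):
    """Return Pr(c | x) for every class c.

    likelihood[c] is Pr(x | c) and prior[c] is Pr(c).
    The evidence Pr(x) is the sum over classes of Pr(x | c) * Pr(c).
    """
    joint = {c: likelihood[c] * prior[c] for c in prior}
    evidence = sum(joint.values())
    if evidence == 0:
        raise ValueError("Pr(x) is zero, so the posterior is undefined")
    return {c: joint[c] / evidence for c in joint}

# x = {2, H}: one of the four CAR rows matches, none of the four BIKE rows does.
likelihood = {"CAR": Fraction(1, 4), "BIKE": Fraction(0, 4)}
prior = {"CAR": Fraction(4, 8), "BIKE": Fraction(4, 8)}

posterior = bayes_posterior(likelihood, prior)
print(posterior["CAR"], posterior["BIKE"])
```

Because the denominator is a sum over all classes, the returned posteriors always add up to 1.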
Bayesian Classifier
$\Pr(c_j \mid x) = \Pr(c_j \mid \{x_1, x_2, x_3, \dots, x_d\}) = \frac{\Pr(x_1, x_2, x_3, \dots, x_d \mid c_j)\,\Pr(c_j)}{\Pr(x_1, x_2, x_3, \dots, x_d)}$
For x = {4, H}:
$\Pr(\text{CAR} \mid x) = \Pr(\text{CAR} \mid \{4, H\}) = \frac{\Pr(4, H \mid \text{CAR})\,\Pr(\text{CAR})}{\Pr(4, H)} = \frac{0.75 \times 0.5}{0.375} = 1$
$\Pr(\text{BIKE} \mid x) = \Pr(\text{BIKE} \mid \{4, H\}) = \frac{\Pr(4, H \mid \text{BIKE})\,\Pr(\text{BIKE})}{\Pr(4, H)} = \frac{0 \times 0.5}{0.375} = 0$
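The three quantities in these calculations (joint likelihood, prior, evidence) can each be estimated by counting rows. The sketch below reproduces the numbers for x = {4, H}. The helper names are hypothetical, and the row pairing in `DATA` is reconstructed from the flattened slide table.

```python
from fractions import Fraction

# Training table as (wheels, height, label).
# Assumption: row pairing reconstructed from the flattened slide table.
DATA = [
    (4, "H", "CAR"), (4, "H", "CAR"), (4, "H", "CAR"),
    (2, "L", "BIKE"), (2, "L", "BIKE"), (2, "L", "BIKE"),
    (4, "L", "BIKE"), (2, "H", "CAR"),
]

def prior(c, data):
    """Pr(c): share of training rows labelled c."""
    return Fraction(sum(1 for _, _, label in data if label == c), len(data))

def joint_likelihood(x, c, data):
    """Pr(x | c): share of class-c rows whose full feature tuple equals x."""
    in_class = [(w, h) for w, h, label in data if label == c]
    return Fraction(in_class.count(x), len(in_class))

def evidence(x, data):
    """Pr(x): share of all rows whose feature tuple equals x."""
    return Fraction(sum(1 for w, h, _ in data if (w, h) == x), len(data))

def posterior(x, c, data):
    """Pr(c | x) via Bayes rule, using the joint likelihood of all features."""
    return joint_likelihood(x, c, data) * prior(c, data) / evidence(x, data)

x = (4, "H")
for c in ("CAR", "BIKE"):
    print(c, joint_likelihood(x, c, DATA), prior(c, DATA),
          evidence(x, DATA), posterior(x, c, DATA))
```

Exact fractions are used instead of floats so the results can be compared directly with the hand calculation (3/4, 1/2 and 3/8 correspond to 0.75, 0.5 and 0.375).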
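The denominator $\Pr(4, H)$ is the same for every class, so it does not change the argmax and can be dropped:
$\Pr(\text{CAR} \mid x) = \Pr(\text{CAR} \mid \{4, H\}) \propto \Pr(4, H \mid \text{CAR})\,\Pr(\text{CAR})$
$\Pr(\text{BIKE} \mid x) = \Pr(\text{BIKE} \mid \{4, H\}) \propto \Pr(4, H \mid \text{BIKE})\,\Pr(\text{BIKE})$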
If k (the number of classes) is small,
estimating the likelihood $\Pr(x_1, x_2, x_3, \dots, x_d \mid c_j)$ is feasible.
However, if k (the number of classes) is very large,
estimating the likelihood $\Pr(x_1, x_2, x_3, \dots, x_d \mid c_j)$ is a very expensive task over
a large dataset.
By the chain rule:
$\Pr(x_1, x_2, x_3, \dots, x_d \mid c_j) = \Pr(x_1 \mid x_2, x_3, \dots, x_d, c_j) \cdot \Pr(x_2 \mid x_3, x_4, \dots, x_d, c_j) \cdots \Pr(x_d \mid c_j)$
Naïve Bayes Classifier
To simplify the estimation, we make an assumption:
• The features are conditionally independent given the class.
$\Pr(c_j \mid x) \propto \prod_{i=1}^{d} \Pr(x_i \mid c_j)\,\Pr(c_j)$
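$\Pr(\text{CAR} \mid \{4, H\}) \propto \Pr(4 \mid \text{CAR}) \times \Pr(H \mid \text{CAR}) \times \Pr(\text{CAR}) = 0.75 \times 1 \times 0.5 = 0.375$
$\Pr(\text{BIKE} \mid \{4, H\}) \propto \Pr(4 \mid \text{BIKE}) \times \Pr(H \mid \text{BIKE}) \times \Pr(\text{BIKE}) = 0.25 \times 0 \times 0.5 = 0$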
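The two products above can be reproduced with a short counting routine. This is a sketch, not a reference implementation: `naive_bayes_scores` is a hypothetical name, and the row pairing in `DATA` is reconstructed from the flattened slide table.

```python
from fractions import Fraction

# Training table as (wheels, height, label).
# Assumption: row pairing reconstructed from the flattened slide table.
DATA = [
    (4, "H", "CAR"), (4, "H", "CAR"), (4, "H", "CAR"),
    (2, "L", "BIKE"), (2, "L", "BIKE"), (2, "L", "BIKE"),
    (4, "L", "BIKE"), (2, "H", "CAR"),
]

def naive_bayes_scores(x, data):
    """Return the unnormalised score prod_i Pr(x_i | c) * Pr(c) for each class c."""
    scores = {}
    for c in sorted({row[-1] for row in data}):
        rows = [row[:-1] for row in data if row[-1] == c]
        score = Fraction(len(rows), len(data))  # prior Pr(c)
        for i, value in enumerate(x):
            # Pr(x_i | c): share of class-c rows whose i-th feature equals value
            score *= Fraction(sum(1 for r in rows if r[i] == value), len(rows))
        scores[c] = score
    return scores

scores = naive_bayes_scores((4, "H"), DATA)
print({c: float(s) for c, s in scores.items()})  # {'BIKE': 0.0, 'CAR': 0.375}
print(max(scores, key=scores.get))               # CAR
```

The scores are not probabilities (they do not sum to 1), but the class with the largest score is the same one the normalised posterior would pick.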
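Under the conditional independence assumption, the likelihood factorises:
$\Pr(x_1, x_2, x_3, \dots, x_d \mid c_j) \approx \Pr(x_1 \mid c_j) \cdot \Pr(x_2 \mid c_j) \cdots \Pr(x_d \mid c_j) = \prod_{i=1}^{d} \Pr(x_i \mid c_j)$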
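One way to see why the factorised form is cheaper to estimate is to count how many probability entries each form needs. The sketch below is a rough illustration under the assumption that every feature is categorical. The function names and the 20-feature scenario are hypothetical and do not come from the slides.

```python
from math import prod

def joint_table_size(values_per_feature, num_classes):
    """Entries needed to tabulate Pr(x_1, ..., x_d | c) for every class, with no assumption."""
    return num_classes * prod(values_per_feature)

def naive_table_size(values_per_feature, num_classes):
    """Entries needed to tabulate every Pr(x_i | c) under conditional independence."""
    return num_classes * sum(values_per_feature)

# The slide example: two binary features (#Wheel, Height) and two classes.
print(joint_table_size([2, 2], 2), naive_table_size([2, 2], 2))

# Hypothetical larger problem: twenty binary features and two classes.
print(joint_table_size([2] * 20, 2), naive_table_size([2] * 20, 2))
```

For the tiny slide example the two counts coincide, but the joint table grows with the product of the feature sizes while the factorised tables grow only with their sum.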
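What if one of the estimates in the likelihood is zero?
$\Pr(\text{CAR} \mid \{4, L\}) \propto \Pr(4 \mid \text{CAR}) \times \Pr(L \mid \text{CAR}) \times \Pr(\text{CAR}) = 0.75 \times 0 \times 0.5 = 0$
A single zero factor forces the whole product to zero, whatever the other factors are.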
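The slides pose the question without giving a remedy. One common remedy, named here as an addition and not as the slides' method, is add-one (Laplace) smoothing, which adds a small pseudo-count to every feature value. The sketch below assumes that technique; `smoothed_conditional` and its `alpha` parameter are hypothetical names, and the row pairing in `DATA` is reconstructed from the flattened slide table.

```python
from fractions import Fraction

# Training table as (wheels, height, label).
# Assumption: row pairing reconstructed from the flattened slide table.
DATA = [
    (4, "H", "CAR"), (4, "H", "CAR"), (4, "H", "CAR"),
    (2, "L", "BIKE"), (2, "L", "BIKE"), (2, "L", "BIKE"),
    (4, "L", "BIKE"), (2, "H", "CAR"),
]

def smoothed_conditional(feature_index, value, c, data, alpha=1):
    """Estimate Pr(x_i = value | c) with add-alpha smoothing:

    (count + alpha) / (class_size + alpha * number_of_distinct_values)
    """
    rows = [row for row in data if row[-1] == c]
    count = sum(1 for row in rows if row[feature_index] == value)
    distinct = len({row[feature_index] for row in data})
    return Fraction(count + alpha, len(rows) + alpha * distinct)

# Height is feature index 1. No CAR row has height L, so the raw estimate is 0.
raw = smoothed_conditional(1, "L", "CAR", DATA, alpha=0)
smoothed = smoothed_conditional(1, "L", "CAR", DATA, alpha=1)
print(raw, smoothed)
```

With `alpha=0` the function falls back to plain counting. With `alpha=1` the estimate for an unseen value becomes small but non-zero, so the other factors in the product still influence the decision.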
In some machine learning tools, you may find:
• Naïve Bayes with Gaussian likelihoods
• Naïve Bayes with Multinomial likelihoods
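In the Gaussian variant, each $\Pr(x_i \mid c_j)$ for a continuous feature is modelled as a normal density with a per-class mean and variance. The multinomial variant is typically used for count features such as word frequencies. The sketch below illustrates only the Gaussian case for a single feature in plain Python. The data values and function names are made up for illustration and do not come from the slides or from any particular library.

```python
import math
from statistics import mean, pvariance

def gaussian_pdf(x, mu, var):
    """Normal density with mean mu and variance var, evaluated at x."""
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

# Hypothetical continuous feature (vehicle height in metres); values are made up.
HEIGHTS = {
    "CAR": [1.4, 1.5, 1.6, 1.5],
    "BIKE": [1.0, 1.1, 0.9, 1.0],
}

def gaussian_nb_predict(x, samples_by_class):
    """Pick the class maximising Pr(x | c) * Pr(c) with a Gaussian likelihood."""
    total = sum(len(values) for values in samples_by_class.values())
    scores = {}
    for c, values in samples_by_class.items():
        prior = len(values) / total
        scores[c] = gaussian_pdf(x, mean(values), pvariance(values)) * prior
    return max(scores, key=scores.get)

print(gaussian_nb_predict(1.45, HEIGHTS))
print(gaussian_nb_predict(1.05, HEIGHTS))
```

With several continuous features, the same per-feature density would be multiplied across features, exactly as in the categorical product.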