Bayesian Classification
A statistical classifier
Probabilistic prediction
Predict class membership probabilities
Based on Bayes’ Theorem
Naive Bayesian classifier
comparable performance with decision tree and selected neural
network classifiers
Accuracy and speed are good when applied to large databases
Incremental
Bayesian Classification
Naïve Bayesian Classifier
Class Conditional Independence
Effect of an attribute value on a given class is
independent of the values of other attributes
Simplifies Computations
Bayesian Belief Networks
Graphical models
Represent dependencies among subsets of
attributes
Bayesian Theorem: Basics
Let X be a data sample whose class label is unknown
Let H be a hypothesis that X belongs to class C
Classification is to determine P(H|X), the probability that the
hypothesis holds given the observed data sample X
P(H|X) is the posterior probability of H conditioned on X
P(H) (prior probability): the initial probability of H
P(X): the probability that the sample data is observed
P(X|H) (likelihood): the probability of observing the sample X, given that the hypothesis H holds
Example: X is a round, red fruit and H is the hypothesis that X is an apple
Bayesian Theorem
Given training data X, the posterior probability of a hypothesis H, P(H|X), follows Bayes’ theorem:

    P(H|X) = \frac{P(X|H) \, P(H)}{P(X)}

Predicts that X belongs to class C_i iff the probability P(C_i|X) is the highest among all the P(C_k|X) for the k classes
Practical difficulty: requires initial knowledge of many probabilities and has significant computational cost
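As a small worked illustration of the formula, here is the round-and-red-fruit example with assumed numbers (the probabilities below are made up purely for illustration):

```python
# Assumed numbers for the fruit/apple example (illustrative only, not from the slides)
p_h = 0.20           # P(H): prior probability that a random fruit is an apple
p_x_given_h = 0.90   # P(X|H): probability that an apple is round and red
p_x = 0.30           # P(X): probability that any fruit is round and red

# Bayes' theorem: P(H|X) = P(X|H) * P(H) / P(X)
p_h_given_x = p_x_given_h * p_h / p_x
print(p_h_given_x)   # 0.6 -> a round, red fruit is an apple with probability 0.6
```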
Naïve Bayesian Classifier
Let D be a training set of tuples and their associated class labels, and each tuple is represented by an n-D attribute vector X = (x_1, x_2, …, x_n)
Suppose there are m classes C_1, C_2, …, C_m
Classification is to derive the maximum posteriori, i.e., the maximal P(C_i|X)
This can be derived from Bayes’ theorem:

    P(C_i|X) = \frac{P(X|C_i) \, P(C_i)}{P(X)}
Naïve Bayesian Classifier
Since P(X) is constant for all classes, only P(X|C_i) P(C_i) needs to be maximized:

    P(C_i|X) \propto P(X|C_i) \, P(C_i)

If the class prior probabilities are not known, it can be assumed that all classes are equally likely, and only P(X|C_i) is maximized
A simplified assumption: attributes are conditionally independent (i.e., no dependence relation between attributes):

    P(X|C_i) = \prod_{k=1}^{n} P(x_k|C_i) = P(x_1|C_i) \times P(x_2|C_i) \times \cdots \times P(x_n|C_i)
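A minimal sketch of the resulting decision rule, assuming the priors P(C_i) and the per-attribute conditionals P(x_k|C_i) have already been estimated; the class names, attribute values, and numbers below are placeholders:

```python
# Placeholder probabilities (illustrative, not from the slides)
priors = {"C1": 0.6, "C2": 0.4}                 # P(C_i)
conditionals = {                                # P(x_k | C_i) for each observed attribute value
    "C1": {"x1=a": 0.5, "x2=b": 0.3},
    "C2": {"x1=a": 0.2, "x2=b": 0.7},
}

def score(cls, observed):
    """P(C_i) * prod_k P(x_k | C_i): the quantity to be maximized."""
    p = priors[cls]
    for value in observed:
        p *= conditionals[cls][value]
    return p

observed = ["x1=a", "x2=b"]
prediction = max(priors, key=lambda c: score(c, observed))
print(prediction)   # the class with the highest P(X|C_i) P(C_i)
```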
Derivation of Naïve Bayes Classifier
This greatly reduces the computation cost: Only counts the
class distribution
If A_k is categorical, P(x_k|C_i) = s_ik / s_i, where s_ik is the # of tuples in C_i having value x_k for A_k and s_i is the number of training samples belonging to C_i
If A_k is continuous-valued, P(x_k|C_i) is usually computed based on a Gaussian distribution with mean μ and standard deviation σ:

    g(x, \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}

so that P(x_k|C_i) = g(x_k, \mu_{C_i}, \sigma_{C_i})
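The two estimates above can be computed directly from a training set by counting and by fitting a per-class Gaussian; the following sketch uses illustrative variable names and toy data that are not from the slides:

```python
import math

def categorical_estimate(column, labels, value, cls):
    """P(x_k|C_i) = s_ik / s_i for a categorical attribute A_k."""
    s_i = sum(1 for y in labels if y == cls)
    s_ik = sum(1 for x, y in zip(column, labels) if y == cls and x == value)
    return s_ik / s_i

def gaussian_estimate(column, labels, value, cls):
    """P(x_k|C_i) = g(x_k, mu_Ci, sigma_Ci) for a continuous attribute A_k."""
    xs = [x for x, y in zip(column, labels) if y == cls]
    mu = sum(xs) / len(xs)
    sigma = math.sqrt(sum((x - mu) ** 2 for x in xs) / len(xs))
    return math.exp(-((value - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

# Toy usage (made-up values)
labels  = ["yes", "yes", "no", "yes", "no"]
student = ["yes", "no", "no", "yes", "yes"]
age     = [25, 38, 45, 30, 52]
print(categorical_estimate(student, labels, "yes", "yes"))   # 2/3
print(gaussian_estimate(age, labels, 28, "yes"))             # Gaussian density at age 28
```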
Example
Class:
C1:buys_computer = ‘yes’
C2:buys_computer = ‘no’
Data sample
X = (age <=30,
Income = medium,
Student = yes,
Credit_rating = Fair)
age     income   student   credit_rating   buys_computer
<=30    high     no        fair            no
<=30    high     no        excellent       no
31…40   high     no        fair            yes
>40     medium   no        fair            yes
>40     low      yes       fair            yes
>40     low      yes       excellent       no
31…40   low      yes       excellent       yes
<=30    medium   no        fair            no
<=30    low      yes       fair            yes
>40     medium   yes       fair            yes
<=30    medium   yes       excellent       yes
31…40   medium   no        excellent       yes
31…40   high     yes       fair            yes
>40     medium   no        excellent       no
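The naïve Bayesian classifier can be applied to this training set to classify the sample X; the sketch below uses only the counts from the table above (helper names are illustrative):

```python
from collections import Counter

# Training tuples from the table: (age, income, student, credit_rating, buys_computer)
data = [
    ("<=30", "high", "no", "fair", "no"),
    ("<=30", "high", "no", "excellent", "no"),
    ("31…40", "high", "no", "fair", "yes"),
    (">40", "medium", "no", "fair", "yes"),
    (">40", "low", "yes", "fair", "yes"),
    (">40", "low", "yes", "excellent", "no"),
    ("31…40", "low", "yes", "excellent", "yes"),
    ("<=30", "medium", "no", "fair", "no"),
    ("<=30", "low", "yes", "fair", "yes"),
    (">40", "medium", "yes", "fair", "yes"),
    ("<=30", "medium", "yes", "excellent", "yes"),
    ("31…40", "medium", "no", "excellent", "yes"),
    ("31…40", "high", "yes", "fair", "yes"),
    (">40", "medium", "no", "excellent", "no"),
]

x = ("<=30", "medium", "yes", "fair")            # the data sample X
class_counts = Counter(row[-1] for row in data)  # s_i for each class

def score(cls):
    """P(C_i) * prod_k P(x_k|C_i), with P(x_k|C_i) = s_ik / s_i."""
    s_i = class_counts[cls]
    p = s_i / len(data)                          # prior P(C_i)
    for k, value in enumerate(x):
        s_ik = sum(1 for row in data if row[-1] == cls and row[k] == value)
        p *= s_ik / s_i
    return p

for cls in class_counts:
    print(cls, score(cls))
print("prediction:", max(class_counts, key=score))   # buys_computer = 'yes' for this X
```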
Avoiding the 0-Probability Problem
Naïve Bayesian prediction requires each conditional probability to be non-zero; otherwise, the predicted probability

    P(X|C_i) = \prod_{k=1}^{n} P(x_k|C_i)

will be zero
Ex. Suppose a dataset with 1000 tuples: income = low (0), income = medium (990), and income = high (10)
Use Laplacian correction (or Laplacian estimator)
Adding 1 to each case
Prob(income = low) = 1/1003
Prob(income = medium) = 991/1003
Prob(income = high) = 11/1003
The “corrected” prob. estimates are close to their “uncorrected” counterparts
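A small sketch of the Laplacian correction applied to the income counts above (the helper function is illustrative):

```python
def laplace_corrected(counts):
    """Add 1 to each category count before normalizing, so no estimate is zero."""
    total = sum(counts.values()) + len(counts)   # 1000 + 3 = 1003 in this example
    return {value: (count + 1) / total for value, count in counts.items()}

income_counts = {"low": 0, "medium": 990, "high": 10}
print(laplace_corrected(income_counts))
# low = 1/1003, medium = 991/1003, high = 11/1003
```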
Naïve Bayesian Classifier
Advantages
Easy to implement
Good results obtained in most of the cases
Disadvantages
Assumption: class conditional independence, therefore loss of
accuracy
Practically, dependencies exist among variables
E.g., in hospital patient data: profile attributes (age, family history, etc.), symptoms (fever, cough, etc.), and diseases (lung cancer, diabetes, etc.)
Dependencies among these cannot be modeled by the Naïve Bayesian Classifier
Bayesian Belief Networks
Models dependencies between variables
Defined by two components
Directed Acyclic Graph
Conditional Probability Table (CPT) for each variable
Bayesian belief network allows a subset of the
variables to be conditionally independent
Bayesian Belief Networks
A graphical model of causal relationships
Represents dependency among the variables
Gives a specification of joint probability distribution
(Figure: an example DAG with nodes X, Y, Z, and P)
Nodes: random variables
Links: dependency
X and Y are the parents of Z, and Y is
the parent of P
No dependency between Z and P
Has no loops or cycles
Bayesian Belief Network: An Example
(Figure: a belief network over FamilyHistory, Smoker, LungCancer, Emphysema, PositiveXRay, and Dyspnea)

The conditional probability table (CPT) for variable LungCancer:

        (FH, S)   (FH, ~S)   (~FH, S)   (~FH, ~S)
LC      0.8       0.5        0.7        0.1
~LC     0.2       0.5        0.3        0.9

The CPT shows the conditional probability for each possible combination of its parents
Derivation of the probability of a particular combination of values of X, from CPT:

    P(x_1, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid Parents(Y_i))
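The CPT and the product formula above can be represented directly in code. In the sketch below, only the LungCancer CPT comes from the slide; the priors for FamilyHistory and Smoker are assumed placeholders, and FamilyHistory and Smoker are assumed to be root nodes:

```python
# CPT for LungCancer given its parents (FamilyHistory, Smoker), from the slide
p_lc = {
    (True, True): 0.8, (True, False): 0.5,
    (False, True): 0.7, (False, False): 0.1,
}

# Assumed placeholder priors for the parent nodes (not given on the slide)
p_fh, p_s = 0.1, 0.3

def joint(fh, s, lc):
    """P(FH, S, LC) = P(FH) * P(S) * P(LC | FH, S), i.e. the product of P(x_i | Parents(Y_i))."""
    p = (p_fh if fh else 1 - p_fh) * (p_s if s else 1 - p_s)
    p_lc_given_parents = p_lc[(fh, s)]
    return p * (p_lc_given_parents if lc else 1 - p_lc_given_parents)

print(joint(fh=True, s=True, lc=True))   # 0.1 * 0.3 * 0.8 = 0.024
```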
Training Bayesian Networks
Several scenarios:
Given both the network structure and all variables observable:
learn only the CPTs
Network structure known, some hidden variables: gradient
descent (greedy hill-climbing) method, analogous to neural
network learning
Network structure unknown, all variables observable: search
through the model space to reconstruct network topology
Unknown structure, all hidden variables: No good algorithms
known for this purpose
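For the first scenario (structure known and all variables observable), the CPTs can be learned by simple relative-frequency counting; a minimal sketch with illustrative names and toy data:

```python
from collections import defaultdict

def learn_cpt(records, child, parents):
    """Estimate P(child | parents) from fully observed records by counting."""
    joint_counts = defaultdict(int)    # counts of (parent values, child value)
    parent_counts = defaultdict(int)   # counts of parent values
    for r in records:
        key = tuple(r[p] for p in parents)
        joint_counts[(key, r[child])] += 1
        parent_counts[key] += 1
    return {k: c / parent_counts[k[0]] for k, c in joint_counts.items()}

# Toy fully observed records (assumed for illustration)
records = [
    {"FamilyHistory": False, "Smoker": True, "LungCancer": True},
    {"FamilyHistory": False, "Smoker": True, "LungCancer": False},
    {"FamilyHistory": False, "Smoker": False, "LungCancer": False},
    {"FamilyHistory": True, "Smoker": True, "LungCancer": True},
]
print(learn_cpt(records, "LungCancer", ["FamilyHistory", "Smoker"]))
```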