NB classifier to use in your next exam also

kuntalpatra420 · May 01, 2024

Supervised Classification
Bayesian Classification

Bayesian Classification
A statistical classifier: performs probabilistic prediction, i.e., predicts class membership probabilities.
Foundation: based on Bayes' theorem.
Performance: a simple Bayesian classifier, the naïve Bayesian classifier, has performance comparable to that of decision tree and some neural network classifiers.

Bayesian Theorem: Basics
Let X be a data sample.
Let H be the hypothesis that X belongs to class C.
The classification objective is to determine P(H|X), the probability that the hypothesis holds given the observed data sample X.
Example: the probability that customer X will buy a computer given that the customer's age and income are known.

Bayesian Theorem: Basics
P(H) (prior probability): the initial probability.
E.g., X will buy a computer, regardless of age, income, …
P(X) (evidence): the probability that the sample data is observed.
P(X|H) (likelihood): the probability of observing the sample X, given that the hypothesis holds.
E.g., given the hypothesis that X will buy a computer, P(X|H) denotes the probability that X's age is between 31…40 and X has medium income.

Bayesian Theorem
Given training data X, the posterior probability of a hypothesis H, P(H|X), follows Bayes' theorem:
P(H|X) = P(X|H) P(H) / P(X)
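A quick numeric illustration (assumed round numbers, not from the slides): suppose 50% of customers buy a computer, P(H) = 0.5; 30% of buyers have medium income, P(X|H) = 0.3; and 20% of all customers have medium income, P(X) = 0.2. Then
P(H|X) = (0.3 × 0.5) / 0.2 = 0.75
so observing that the customer has medium income raises the probability of a purchase from 0.5 to 0.75.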

Joint Probability Model
What is a joint probability?
Joint probability is a statistical measure of the likelihood of two events occurring together at the same point in time: it is the probability of event Y occurring at the same time that event X occurs.

The Formula for Joint Probability:
Notation for joint probability can take a few different forms. The following notation represents the probability of the events' intersection: P(X ∩ Y), where X, Y are two different events that intersect; it is also written P(X and Y) or P(XY).
Using conditional probability we can write:
P(X ∩ Y) = P(X|Y) P(Y)
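For instance (a made-up illustration, not from the slides): if the probability of rain on a given day is P(Y) = 0.3 and the probability of a traffic jam given rain is P(X|Y) = 0.6, the joint probability of rain and a traffic jam is P(X ∩ Y) = 0.6 × 0.3 = 0.18.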

Let D be a training set of tuples with their associated class labels, and each tuple is represented by an n-attribute vector X = (x_1, x_2, …, x_n).
Suppose there are m classes C_1, C_2, …, C_m.
P(C_i|X) = P(C_i, X) / P(X)   (1)
Or, P(C_i|X) = P(C_i) P(X|C_i) / P(X)   (2)

Since the denominator does not depend on C_i and all the values of the features (i.e., the x_i) are given, the denominator is effectively constant. So we are interested only in the numerator part of the fraction.
The numerator is equivalent to the joint probability model P(C_i, x_1, …, x_n).

Based on Equations 1 and 2, the numerator can be written as:
P(C_i) P(X|C_i) = P(C_i, X)
Or, P(C_i) P(x_1, x_2, …, x_i, …, x_n | C_i) = P(x_1, x_2, …, x_i, …, x_n, C_i)
where X = (x_1, x_2, …, x_i, …, x_n)

Now the "naive" conditional independence assumptions come into play: assume that each feature x_i is conditionally independent of every other feature x_j for j ≠ i, given the category C_i. This means that
P(x_i | x_{i+1}, …, x_n, C_i) = P(x_i | C_i)   (3)

Thus, the joint model can be expressed as
P(C_i | x_1, …, x_n) ∝ P(C_i, x_1, …, x_n)
= P(x_1 | x_2, …, x_n, C_i) P(x_2 | x_3, …, x_n, C_i) … P(x_{n-1} | x_n, C_i) P(x_n | C_i) P(C_i)
= P(C_i) P(x_1 | C_i) P(x_2 | C_i) … P(x_n | C_i)   [using Eqn. 3]
= P(C_i) ∏_{k=1}^{n} P(x_k | C_i)

Constructing a classifier from
the probability model
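From the probability model above, the classifier follows by the standard maximum a posteriori (MAP) decision rule: assign X to the class that maximizes the posterior numerator (this is exactly what the worked example later does):
classify(x_1, …, x_n) = argmax over i in {1, …, m} of P(C_i) ∏_{k=1}^{n} P(x_k | C_i)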

Towards Naïve Bayesian
Classifier

Naïve Bayesian Classifier: Training Dataset
Class:
C1: buys_computer = 'yes'
C2: buys_computer = 'no'
Data sample:
X = (age <= 30, income = medium, student = yes, credit_rating = fair)

age     income   student   credit_rating   buys_computer
<=30    high     no        fair            no
<=30    high     no        excellent       no
31…40   high     no        fair            yes
>40     medium   no        fair            yes
>40     low      yes       fair            yes
>40     low      yes       excellent       no
31…40   low      yes       excellent       yes
<=30    medium   no        fair            no
<=30    low      yes       fair            yes
>40     medium   yes       fair            yes
<=30    medium   yes       excellent       yes
31…40   medium   no        excellent       yes
31…40   high     yes       fair            yes
>40     medium   no        excellent       no

Naïve Bayesian Classifier: An Example
P(C_i):
P(buys_computer = "yes") = 9/14 = 0.643
P(buys_computer = "no") = 5/14 = 0.357
Compute P(X|C_i) for each class:
P(age = "<=30" | buys_computer = "yes") = 2/9 = 0.222
P(age = "<=30" | buys_computer = "no") = 3/5 = 0.6
P(income = "medium" | buys_computer = "yes") = 4/9 = 0.444
P(income = "medium" | buys_computer = "no") = 2/5 = 0.4
P(student = "yes" | buys_computer = "yes") = 6/9 = 0.667
P(student = "yes" | buys_computer = "no") = 1/5 = 0.2
P(credit_rating = "fair" | buys_computer = "yes") = 6/9 = 0.667
P(credit_rating = "fair" | buys_computer = "no") = 2/5 = 0.4

Naïve Bayesian Classifier: An Example
X = (age <= 30, income = medium, student = yes, credit_rating = fair)
P(X|C_i):
P(X|buys_computer = "yes") = 0.222 x 0.444 x 0.667 x 0.667 = 0.044
P(X|buys_computer = "no") = 0.6 x 0.4 x 0.2 x 0.4 = 0.019
P(X|C_i) * P(C_i):
P(X|buys_computer = "yes") * P(buys_computer = "yes") = 0.044 x 0.643 = 0.028
P(X|buys_computer = "no") * P(buys_computer = "no") = 0.019 x 0.357 = 0.007
Therefore, X belongs to class "buys_computer = yes".
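A minimal Python sketch that reproduces this worked example from the training table above (plain Python, no libraries; the variable names are our own):

# Training tuples: (age, income, student, credit_rating, buys_computer)
data = [
    ("<=30",  "high",   "no",  "fair",      "no"),
    ("<=30",  "high",   "no",  "excellent", "no"),
    ("31…40", "high",   "no",  "fair",      "yes"),
    (">40",   "medium", "no",  "fair",      "yes"),
    (">40",   "low",    "yes", "fair",      "yes"),
    (">40",   "low",    "yes", "excellent", "no"),
    ("31…40", "low",    "yes", "excellent", "yes"),
    ("<=30",  "medium", "no",  "fair",      "no"),
    ("<=30",  "low",    "yes", "fair",      "yes"),
    (">40",   "medium", "yes", "fair",      "yes"),
    ("<=30",  "medium", "yes", "excellent", "yes"),
    ("31…40", "medium", "no",  "excellent", "yes"),
    ("31…40", "high",   "yes", "fair",      "yes"),
    (">40",   "medium", "no",  "excellent", "no"),
]

x = ("<=30", "medium", "yes", "fair")  # the sample X to classify

for c in ("yes", "no"):
    rows = [r for r in data if r[-1] == c]
    prior = len(rows) / len(data)                 # P(C_i)
    likelihood = 1.0
    for k, value in enumerate(x):                 # naive product of P(x_k | C_i)
        likelihood *= sum(r[k] == value for r in rows) / len(rows)
    print(c, round(prior * likelihood, 3))        # P(X|C_i) * P(C_i)

# Output: yes 0.028 and no 0.007, matching the slide; predict buys_computer = yes.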

Avoiding the 0-Probability Problem
Naïve Bayesian prediction requires each conditional probability to be non-zero; otherwise, the posterior probability will be zero:
P(X|C_i) = ∏_{k=1}^{n} P(x_k | C_i)
Example: suppose a dataset with 1000 tuples has income = low (0 tuples), income = medium (990), and income = high (10).
Use the Laplacian correction (or Laplacian estimator): add 1 to each case:
P(income = low) = 1/1003
P(income = medium) = 991/1003
P(income = high) = 11/1003
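The same correction in a few lines of Python (the counts and the add-1 constant are from the slide; the helper name is ours):

counts = {"low": 0, "medium": 990, "high": 10}  # observed counts, N = 1000

def laplace(value, counts, alpha=1):
    # add alpha to every category count, so the total grows by alpha * (#categories)
    total = sum(counts.values()) + alpha * len(counts)
    return (counts[value] + alpha) / total

for v in counts:
    print(v, laplace(v, counts))  # 1/1003, 991/1003, 11/1003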

Naïve Bayesian Classifier
Advantages:
Easy to implement.
Good results obtained in most of the cases.
Disadvantages:
The assumption of class-conditional independence is not always valid for real-life problems, since dependencies do exist among variables.
E.g., in hospital data, a patient's age, family history, etc. are interdependent; dependencies among these cannot be modeled by a naïve Bayesian classifier.

Bayesian network
A Bayesian network is a probabilistic graphical
model that represents a set of variables and
their conditional dependencies via a directed
acyclic graph (DAG).
For example, a Bayesian network could
represent the probabilistic relationships
between diseases and symptoms. Given
symptoms, the network can be used to
compute the probabilities of the presence of
various diseases.

Formally, Bayesian networks are DAGs whose
nodes represent variables in the Bayesian
sense: they may be observable quantities,
latent variables, unknown parameters or
hypotheses.
Edges represent conditional dependencies.

Nodes that are not connected represent
variables that are conditionally independent
of each other.
Each node is associated with a probability
function that takes, as input, a particular set
of values for the node's parent variables, and
gives (as output) the probability (or
probability distribution, if applicable) of the
variable represented by the node.

Bayesian Belief Networks
A graphical model of causal relationships
Represents dependency among the variables
Gives a specification of joint probability
distribution
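Concretely (the standard factorization for any Bayesian network, stated here for reference), the network specifies the joint distribution as the product of each node's conditional probability given its parents:
P(X_1, …, X_n) = ∏_{i=1}^{n} P(X_i | Parents(X_i))
Equation 4 below and the grass-wet example later are both instances of this factorization.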

Definition of Conditional Probability:
P(a|b) = P(a, b) / P(b)
Joint Probability:
P(a, b) = P(a|b) P(b)
Bayes Rule:
P(b|a) = P(a|b) P(b) / P(a)

Bayesian Network
• Each node represents a variable. Each variable can be True or False.
• Rain (R) -> Wet Ground (W) means that the probability of the ground being wet depends on rain.
• Since Win Lottery (L) is independent of W and R, the joint probability is
P(L, R, W) = P(L) P(R) P(W|R)

Bayesian Network
• Joint probability of all four variables:
P(L, R, W, S) = P(L) P(R) P(W|R) P(S|W)
• P(S|W, R) would indicate the probability of slipping given that the ground is wet and it is raining. Since we only have to capture the chain of cause and effect (slipping depends directly on wetness alone), P(S|W, R) is simplified to P(S|W).

Bayesian Network
• Joint probability of all four variables, now with Car Wash (C):
P(R, W, S, C) = P(R) P(C) P(W|C, R) P(S|W)   (4)
• P(W|C, R) represents that the ground can be wet due to a car wash, rain, or both.

Inference in Bayesian Network
Suppose we want to calculate P(r|s).
Note: generally, a capital symbol represents a variable (e.g., R) and a small symbol represents a value (e.g., r).
P(r|s) = Σ_w Σ_c P(r, w, s, c) / P(s)
From the above, using Eqn. 4, we can write
P(r|s) ∝ Σ_w Σ_c P(r) P(c) P(w|c, r) P(s|w)
Then we can write
P(r|s) ∝ P(r) Σ_w P(s|w) Σ_c P(c) P(w|c, r)
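A small Python sketch of this enumeration (the structure of the nested sums comes from the derivation above, but the CPT numbers below are made-up placeholders, since the slides do not give them):

# Assumed (illustrative) CPTs for Rain, Car wash, Wet ground, Slip
P_r = {True: 0.2, False: 0.8}                    # P(R)
P_c = {True: 0.1, False: 0.9}                    # P(C)
P_w = {  # P(W=True | C, R): ground is likely wet after a car wash and/or rain
    (True, True): 0.99, (True, False): 0.9,
    (False, True): 0.9, (False, False): 0.01,
}
P_s = {True: 0.3, False: 0.01}                   # P(S=True | W)

def p_rain_given_slip():
    # unnormalized posterior for s = True: P(r) * sum_w P(s|w) * sum_c P(c) * P(w|c,r)
    def unnorm(r):
        return P_r[r] * sum(
            P_s[w] * sum(P_c[c] * (P_w[(c, r)] if w else 1 - P_w[(c, r)])
                         for c in (True, False))
            for w in (True, False)
        )
    return unnorm(True) / (unnorm(True) + unnorm(False))

print(p_rain_given_slip())  # P(R=true | S=true) under the assumed numbers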

Evaluation Tree

An example with conditional
probability tables (CPT)
Suppose that there are two events which could cause
grass to be wet: either the sprinkler is on or it's
raining.
Also, suppose that the rain has a direct effect on the
use of the sprinkler (namely that when it rains, the
sprinkler is usually not turned on).
Then the situation can be modelled with a Bayesian network. All three variables have two possible values, T (for true) and F (for false).

A simple Bayesian network with
conditional probability tables

The joint probability function is:
P(G, S, R) = P(G|S, R) P(S|R) P(R)
where the names of the variables have been abbreviated to G = Grass wet (true/false), S = Sprinkler turned on (true/false), and R = Raining (true/false).
Query: "What is the probability that it is raining, given the grass is wet?"

??????&#3627408453;=&#3627408481;??????=&#3627408481;=
??????(??????=??????,&#3627408453;=??????)
??????(??????=??????)
=
σ
&#3627408454;∈{??????,??????}
??????(??????=??????,&#3627408454;,&#3627408453;=??????)
σ
&#3627408454;,&#3627408453;∈{??????,??????}
??????(??????=??????,&#3627408454;,&#3627408453;)
(5)
Using the expansion for the joint probability function ??????(??????,&#3627408454;,&#3627408453;)and the
conditional probabilities from the conditional probability tables (CPTs)
stated in the diagram, one can evaluate each term in the sums in the
numerator and denominator. For example,
????????????=&#3627408481;,&#3627408454;=&#3627408481;,&#3627408453;=&#3627408481;=????????????=&#3627408481;&#3627408454;=&#3627408481;,&#3627408453;=&#3627408481;??????&#3627408454;=&#3627408481;&#3627408453;=&#3627408481;??????&#3627408453;=&#3627408481;
=0.99∗0.01∗0.2
=.00198

From Eqn. 5,
P(R=T | G=T) = (0.00198_TTT + 0.1584_TFT) / (0.00198_TTT + 0.288_TTF + 0.1584_TFT + 0.0_TFF)
= 891/2491 ≈ 35.77%
where the subscripts give the values (T/F) of G, S, R in each joint-probability term.
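This computation can be checked in a few lines of Python. P(R=T) = 0.2, P(S=T|R=T) = 0.01, and the entries P(G=T|S=T,R=T) = 0.99, P(G=T|S=F,R=T) = 0.8, P(G=T|S=F,R=F) = 0.0 are recoverable from the terms above; the split of the 0.288 term into P(G=T|S=T,R=F) = 0.9 and P(S=T|R=F) = 0.4 follows the standard version of this example and is an assumption here:

# CPTs for the sprinkler network, consistent with the terms above
P_R = {True: 0.2, False: 0.8}                    # P(R)
P_S = {True: 0.01, False: 0.4}                   # P(S=True | R); key is R
P_G = {  # P(G=True | S, R); key is (S, R)
    (True, True): 0.99, (True, False): 0.9,
    (False, True): 0.8, (False, False): 0.0,
}

def joint(g, s, r):
    # P(G, S, R) = P(G|S,R) P(S|R) P(R)
    pg = P_G[(s, r)] if g else 1 - P_G[(s, r)]
    ps = P_S[r] if s else 1 - P_S[r]
    return pg * ps * P_R[r]

TF = (True, False)
num = sum(joint(True, s, True) for s in TF)           # Σ_S P(G=T, S, R=T)
den = sum(joint(True, s, r) for s in TF for r in TF)  # Σ_{S,R} P(G=T, S, R)
print(num / den)  # 0.3577... ≈ 35.77%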

Typical Use of Bayesian Networks
To model and explain a domain.
To update beliefs about the states of certain variables when some other variables are observed, i.e., computing conditional probability distributions, e.g., P(X_23 | X_17 = yes, X_54 = no).
To find the most probable configurations of variables.
To support decision making under uncertainty.
To find good strategies for solving tasks in a domain with uncertainty.