Bayesian Classification
A statistical classifier: performs probabilistic prediction, i.e., predicts class membership probabilities.
Foundation: based on Bayes' theorem.
Performance: a simple Bayesian classifier, the naïve Bayesian classifier, has performance comparable with that of decision tree and some neural network classifiers.
Bayesian Theorem: Basics
Let X be a data sample.
Let H be a hypothesis that X belongs to class C.
The classification objective is to determine P(H|X), the probability that the hypothesis holds given the observed data sample X.
Example: customer X will buy a computer given that the customer's age and income are known.
Bayesian Theorem: Basics
P(H) (prior probability): the initial probability.
E.g., X will buy a computer, regardless of age, income, …
P(X) (evidence): the probability that the sample data is observed.
P(X|H) (likelihood): the probability of observing the sample X, given that the hypothesis holds.
E.g., given the hypothesis that X will buy a computer, P(X|H) denotes the probability that X's age is between 31 and 40 and X has medium income.
Bayesian Theorem
Given training data X, the posterior probability of a hypothesis H, P(H|X), follows Bayes' theorem:

P(H|X) = P(X|H) P(H) / P(X)
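As a quick numeric sketch of the theorem, using made-up numbers for illustration (they are not taken from the slides):

```python
# Bayes' theorem: P(H|X) = P(X|H) * P(H) / P(X)
# Illustrative (assumed) numbers for H = "customer buys a computer".
p_H = 0.3            # prior P(H)
p_X_given_H = 0.5    # likelihood P(X|H)
p_X = 0.4            # evidence P(X)

p_H_given_X = p_X_given_H * p_H / p_X   # posterior P(H|X)
print(round(p_H_given_X, 3))  # 0.375
```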
Joint Probability Model
What Is a Joint Probability?
Joint probability is a statistical measure that calculates the likelihood of two events occurring together, at the same point in time.
Joint probability is the probability of event Y occurring at the same time that event X occurs.
The Formula for Joint Probability:
Notation for joint probability can take a few different forms. The following notation represents the probability of the intersection of two events: P(A ∩ B), where A and B are two different events that intersect; it is also written P(AB).
Using conditional probability we can write:

P(A ∩ B) = P(A|B) P(B)
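For instance (an illustrative example, not from the slides), the probability of drawing two aces in a row from a 52-card deck follows this factorization:

```python
# P(A ∩ B) = P(A|B) * P(B), with two draws without replacement.
p_B = 4 / 52             # P(B): first card drawn is an ace
p_A_given_B = 3 / 51     # P(A|B): second card is an ace, given the first was
p_A_and_B = p_A_given_B * p_B   # joint probability of drawing two aces
print(round(p_A_and_B, 5))  # 0.00452
```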
Let D be a training set of tuples with their associated class labels, where each tuple is represented by an n-attribute vector X = (x1, x2, …, xn).
Suppose there are m classes C1, C2, …, Cm.

P(Ci|X) = P(Ci, X) / P(X)        (1)

Or, P(Ci|X) = P(X|Ci) P(Ci) / P(X)        (2)

Since the denominator does not depend on Ci and all the values of the features (i.e., the xi) are given, the denominator is effectively constant, so we are interested only in the numerator of the fraction.
The numerator is equivalent to the joint probability model P(Ci, x1, …, xn).
Based on Equations (1) and (2), the numerator can be written as:

P(Ci) P(X|Ci) = P(Ci, X)

Or, P(Ci) P(x1, x2, …, xi, …, xn | Ci) = P(x1, x2, …, xi, …, xn, Ci)

where X = (x1, x2, …, xi, …, xn).
Now the "naive" conditional independence assumptions come into play: assume that each feature xi is conditionally independent of every other feature xj for j ≠ i, given the category Ci. This means that

P(xi | xi+1, …, xn, Ci) = P(xi | Ci)        (3)
Naïve Bayesian Classifier: An Example
X = (age <= 30, income = medium, student = yes, credit_rating = fair)
P(X|Ci):
P(X|buys_computer = "yes") = 0.222 × 0.444 × 0.667 × 0.667 = 0.044
P(X|buys_computer = "no") = 0.6 × 0.4 × 0.2 × 0.4 = 0.019
P(X|Ci) × P(Ci):
P(X|buys_computer = "yes") × P(buys_computer = "yes") = 0.028
P(X|buys_computer = "no") × P(buys_computer = "no") = 0.007
Therefore, X belongs to class "buys_computer = yes".
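The arithmetic above can be checked with a short script. The class priors P(yes) = 9/14 and P(no) = 5/14 are assumed here from the usual 14-tuple buys_computer training set, since the slide does not restate them:

```python
import math

# Per-attribute conditional probabilities for
# X = (age <= 30, income = medium, student = yes, credit_rating = fair),
# copied from the slide.
likelihoods = {"yes": [0.222, 0.444, 0.667, 0.667],
               "no":  [0.6, 0.4, 0.2, 0.4]}
# Assumed class priors from the standard 14-tuple training set.
priors = {"yes": 9 / 14, "no": 5 / 14}

scores = {}
for c in likelihoods:
    p_x_given_c = math.prod(likelihoods[c])   # P(X|Ci), independence assumed
    scores[c] = p_x_given_c * priors[c]       # P(X|Ci) * P(Ci)
    print(c, round(p_x_given_c, 3), round(scores[c], 3))

print("predicted:", max(scores, key=scores.get))  # predicted: yes
```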
Avoiding the 0-Probability Problem
Naïve Bayesian prediction requires each conditional probability to be non-zero; otherwise, the posterior probability will be zero:

P(X|Ci) = ∏(k=1 to n) P(xk|Ci)

Example: suppose we have a dataset with 1000 tuples, with income = low (0), income = medium (990), and income = high (10).
Use the Laplacian correction (or Laplacian estimator): add 1 to each case.
P(income = low) = 1/1003
P(income = medium) = 991/1003
P(income = high) = 11/1003
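A minimal sketch of the correction on the slide's numbers:

```python
# Add-one (Laplacian) correction: add 1 to each count and grow the
# denominator by the number of distinct values (3), giving 1000 + 3 = 1003.
counts = {"low": 0, "medium": 990, "high": 10}
n, k = sum(counts.values()), len(counts)
smoothed = {v: (c + 1) / (n + k) for v, c in counts.items()}
for v, p in smoothed.items():
    print(v, p)   # 1/1003, 991/1003, 11/1003 -- none is zero
```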
Naïve Bayesian Classifier
Advantages
Easy to implement.
Good results obtained in most cases.
Disadvantages
The assumption of class-conditional independence is not always valid for real-life problems, since dependencies do exist among variables.
E.g., hospital records: a patient's name, age, family history, etc.
Dependencies among these cannot be modeled by a naïve Bayesian classifier.
Bayesian network
A Bayesian network is a probabilistic graphical
model that represents a set of variables and
their conditional dependencies via a directed
acyclic graph (DAG).
For example, a Bayesian network could
represent the probabilistic relationships
between diseases and symptoms. Given
symptoms, the network can be used to
compute the probabilities of the presence of
various diseases.
Formally, Bayesian networks are DAGs whose
nodes represent variables in the Bayesian
sense: they may be observable quantities,
latent variables, unknown parameters or
hypotheses.
Edges represent conditional dependencies.
Nodes that are not connected represent
variables that are conditionally independent
of each other.
Each node is associated with a probability
function that takes, as input, a particular set
of values for the node's parent variables, and
gives (as output) the probability (or
probability distribution, if applicable) of the
variable represented by the node.
Bayesian Belief Networks
A graphical model of causal relationships.
Represents dependencies among the variables.
Gives a specification of the joint probability distribution.
Bayesian Network
•Each node represents a variable. Each variable can be True or False.
•Rain (R) -> Wet Ground (W) means the probability of the ground being wet depends on rain.
•Since Win Lottery (L) is independent of W and R, the joint probability is

P(L, R, W) = P(L) P(R) P(W|R)
Bayesian Network
•Joint probability of all four variables:

P(L, R, W, S) = P(L) P(R) P(W|R) P(S|W)

•P(S|W, R) indicates the probability of slipping given that the ground is wet and it is raining. Since we only have to capture the chain of cause and effect, the direct dependence on R has been ignored, leaving P(S|W).
Bayesian Network
•Joint probability of all four variables:

P(C, R, W, S) = P(C) P(R) P(W|C, R) P(S|W)        (4)

•P(W|C, R) represents that the ground can be wet due to a car wash (C), rain (R), or both.
Inference in Bayesian network
Suppose we want to calculate P(S|r).
Note: generally a capital symbol represents a variable (e.g., W) and a small symbol represents a value (e.g., r).

P(S|r) = Σ_w Σ_c P(c, r, w, S) / P(r)

From the above, using Equation (4), we can write

P(S|r) ∝ Σ_w Σ_c P(c) P(r) P(w|c, r) P(S|w)

Then we can write

P(S|r) ∝ P(r) Σ_w P(S|w) Σ_c P(c) P(w|c, r)
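A sketch of evaluating the factored sum, using made-up CPT values since the slides give none for this network (C = car wash, R = rain, W = wet ground, S = slip, all Boolean):

```python
# Assumed (illustrative) CPTs -- not from the slides.
P_C = {True: 0.1, False: 0.9}                              # P(c)
P_W_given_CR = {(True, True): 0.99, (True, False): 0.9,
                (False, True): 0.8, (False, False): 0.05}  # P(W=T | c, r)
P_S_given_W = {True: 0.3, False: 0.01}                     # P(S=T | w)

r = True  # condition on R = r; P(r) cancels when we normalise

def p_w(w):
    """Inner sum over c: P(w | r) = sum_c P(c) * P(w | c, r)."""
    p_wet = sum(P_C[c] * P_W_given_CR[(c, r)] for c in (True, False))
    return p_wet if w else 1 - p_wet

# Outer sum over w gives the unnormalised P(S = s | r).
score = {s: sum((P_S_given_W[w] if s else 1 - P_S_given_W[w]) * p_w(w)
                for w in (True, False))
         for s in (True, False)}
p_slip = score[True] / (score[True] + score[False])
print(round(p_slip, 4))
```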
Evaluation Tree
An example with conditional probability tables (CPT)
Suppose that there are two events which could cause the grass to be wet: either the sprinkler is on or it is raining.
Also, suppose that the rain has a direct effect on the use of the sprinkler (namely, when it rains, the sprinkler is usually not turned on).
Then the situation can be modelled with a Bayesian network (shown to the right). All three variables have two possible values, T (for true) and F (for false).
A simple Bayesian network with conditional probability tables
The joint probability function is:

P(G, S, R) = P(G|S, R) P(S|R) P(R)

where the names of the variables have been abbreviated to G = Grass wet (true/false), S = Sprinkler turned on (true/false), and R = Raining (true/false).
Query: "What is the probability that it is raining, given the grass is wet?"

P(R=T | G=T) = P(G=T, R=T) / P(G=T) = Σ_{S∈{T,F}} P(G=T, S, R=T) / Σ_{S,R∈{T,F}} P(G=T, S, R)        (5)

Using the expansion of the joint probability function P(G, S, R) and the conditional probabilities from the conditional probability tables (CPTs) stated in the diagram, one can evaluate each term in the sums in the numerator and denominator. For example,

P(G=T, S=T, R=T) = P(G=T | S=T, R=T) P(S=T | R=T) P(R=T)
= 0.99 × 0.01 × 0.2
= 0.00198
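Equation (5) can be evaluated by enumerating all of its terms. The CPT values below are the ones conventionally used with this sprinkler example (assumed here, since the diagram itself is not reproduced in the text); note that they reproduce the single term computed above, 0.99 × 0.01 × 0.2 = 0.00198:

```python
# Assumed CPTs for the sprinkler network (standard version of this example).
P_R = {True: 0.2, False: 0.8}                             # P(R = r)
P_S_given_R = {True: 0.01, False: 0.4}                    # P(S = T | r)
P_G_given_SR = {(True, True): 0.99, (True, False): 0.9,
                (False, True): 0.8, (False, False): 0.0}  # P(G = T | s, r)

def joint(g, s, r):
    """P(G, S, R) = P(G | S, R) * P(S | R) * P(R)."""
    p_g = P_G_given_SR[(s, r)] if g else 1 - P_G_given_SR[(s, r)]
    p_s = P_S_given_R[r] if s else 1 - P_S_given_R[r]
    return p_g * p_s * P_R[r]

num = sum(joint(True, s, True) for s in (True, False))    # numerator of (5)
den = sum(joint(True, s, r) for s in (True, False) for r in (True, False))
print(round(num / den, 4))  # 0.3577: given wet grass, it rained with prob ~36%
```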
Typical Use of Bayesian networks
To model and explain a domain.
To update beliefs about the states of certain variables when some other variables are observed, i.e., computing conditional probability distributions, e.g., P(X23 | X17 = yes, X54 = no).
To find the most probable configurations of variables.
To support decision making under uncertainty.
To find good strategies for solving tasks in a domain with uncertainty.