Probability Theory and Graphical Models

Dr. M. Thiagarajan
Associate Professor of Mathematics
St Joseph's College, Trichirappalli

Uncertainty in AI
Outline:
Introduction
Basic Probability Theory
Probabilistic Reasoning
Why should we use probability theory?
Dutch Book Theorem

Sources of Uncertainty
Information is partial
Information is not fully reliable.
Representation language is inherently
imprecise.
Information comes from multiple sources and may be conflicting.
Information is approximate
Non-absolute cause-effect relationships exist

Basic Probability
Probability theory enables us to make
rational decisions.
Which mode of transportation is safer:
Car or Plane?
What is the probability of an accident?

Basic Probability Theory
An experiment has a set of potential outcomes, e.g., throwing a die.
The sample space of an experiment is the set of all possible outcomes, e.g., {1, 2, 3, 4, 5, 6}.
An event is a subset of the sample space, e.g.,
{2}
{3, 6}
even = {2, 4, 6}
odd = {1, 3, 5}

Probability as Relative
Frequency
An event has a probability.
Consider a long sequence of experiments. If we
look at the number of times a particular event
occurs in that sequence, and compare it to the
total number of experiments, we can compute a
ratio.
This ratio is one way of estimating the probability
of the event.
P(E) = (# of times E occurred)/(total # of trials)
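A minimal Python sketch of this estimate; the fair six-sided die, the chosen event, and the number of trials are assumptions made for illustration:

import random

def relative_frequency(event, trials=100_000):
    # Estimate P(event) as (# of times event occurred) / (total # of trials).
    hits = sum(event(random.randint(1, 6)) for _ in range(trials))
    return hits / trials

# Estimate P(even) for a fair die; the ratio should approach 3/6 = 0.5.
print(relative_frequency(lambda outcome: outcome in {2, 4, 6}))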

Example
100 attempts are made to swim a length in 30 seconds. The swimmer succeeds on 20 occasions, so the probability that the swimmer can complete the length in 30 seconds is
20/100 = 0.2
P(failure) = 1 − 0.2 = 0.8
The experiment, the sample space and the events must be defined clearly for probability to be meaningful.
What is the probability of an accident?

Theoretical Probability
Principle of Indifference:
Alternatives are always to be judged equiprobable if we have no reason to expect or prefer one over the other.
Each outcome in the sample space is assigned equal probability.
Example: throw a die
P({1}) = P({2}) = ... = P({6}) = 1/6

Law of Large Numbers
As the number of experiments increases, the relative frequency of an event more closely approximates the theoretical probability of the event, provided the theoretical assumptions hold.
Buffon's Needle for Computing π
Draw parallel lines 1 inch apart on a plane.
Throw a 1-inch needle on the plane.
P(needle crossing a line) = 2/π, so π ≈ 2 × (number of throws)/(number of crossings)
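A Monte Carlo sketch of Buffon's needle under the slide's setup (line spacing 1, needle length 1); the sampling scheme below is a standard one and not taken from the slides:

import math
import random

def estimate_pi(throws=1_000_000):
    # Lines are 1 unit apart and the needle is 1 unit long.
    crossings = 0
    for _ in range(throws):
        center = random.uniform(0.0, 0.5)         # distance from needle's center to nearest line
        angle = random.uniform(0.0, math.pi / 2)  # acute angle between needle and the lines
        if center <= 0.5 * math.sin(angle):       # the needle crosses a line
            crossings += 1
    return 2 * throws / crossings                 # pi is approximately 2 * throws / crossings

print(estimate_pi())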


Large Numbers Reveal Untruth in Assumptions
Results of 1,000,000 throws of a die:
Number:    1      2      3      4      5      6
Fraction: .155   .159   .164   .169   .174   .179

Axioms of Probability Theory
Suppose P(.) is a probability function. Then
1. For any event E, 0 ≤ P(E) ≤ 1.
2. P(S) = 1, where S is the sample space.
3. For any two mutually exclusive events E1 and E2, P(E1 ∪ E2) = P(E1) + P(E2).
Any function that satisfies the above three axioms is a probability function.

Joint Probability
Let A, B be two events; the joint probability of both A and B being true is denoted by P(A, B).
Example:
P(spade) is the probability of the top card being a spade.
P(king) is the probability of the top card being a king.
P(spade, king) is the probability of the top card being both a spade and a king, i.e., the king of spades.
P(king, spade) = P(spade, king) ???

Properties of Probability
1. P(¬E) = 1 − P(E)
2. If E1 and E2 are logically equivalent, then P(E1) = P(E2).
E1: Not all philosophers are more than six feet tall.
E2: Some philosopher is not more than six feet tall.
Then P(E1) = P(E2).
3. P(E1, E2) ≤ P(E1).

Conditional Probability
The probability of an event may change
after knowing another event.
The probability of A given B is denoted
by P(A|B).
Example
P(W = space): the probability that a randomly selected word from an English text is 'space'.
P(W = space | W' = outer): the probability of 'space' if the previous word is 'outer'.

Example
A: the top card of a deck of poker cards is the king of spades
P(A) = 1/52
However, if we know
B: the top card is a king
then the probability of A given that B is true is
P(A|B) = 1/4.

How to Compute P(A|B)?
P(A|B) = N(A and B) / N(B)
       = [N(A and B) / N] / [N(B) / N]
       = P(A, B) / P(B)
Example:
P(brown|cow) = N(brown cows) / N(cows) = P(brown cow) / P(cow)

Business Students
Of 100 students completing a course, 20 were business majors. Ten students received As in the course, and three of these were business majors. Suppose A is the event that a randomly selected student got an A in the course, and B is the event that a randomly selected student is a business major. What is the probability of A? What is the probability of A after knowing B is true?
(Diagram: |A ∩ B| = 3, |A ∩ ¬B| = 7, |B| = 20, |¬B| = 80.)
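A short worked check of the two questions, using the counts above (P(A) = 10/100 and P(A|B) = 3/20):

# Counts from the slide: 100 students, 20 business majors (B), 10 As (A),
# and 3 students who are both business majors and received an A.
total, n_B, n_A, n_A_and_B = 100, 20, 10, 3

p_A = n_A / total              # P(A)   = 10/100 = 0.10
p_A_given_B = n_A_and_B / n_B  # P(A|B) = 3/20   = 0.15
print(p_A, p_A_given_B)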

Probabilistic Reasoning
Evidence
What we know about a situation.
Hypothesis
What we want to conclude.
Compute
P( Hypothesis | Evidence )

Credit Card Authorization
E is the data about the applicant's age, job, education, income, credit history, etc.
H is the hypothesis that the credit card
will provide positive return.
The decision of whether to issue the
credit card to the applicant is based on
the probability P(H|E).

Medical Diagnosis
E is a set of symptoms, such as coughing, sneezing, headache, ...
H is a disorder, e.g., common cold, SARS,
flu.
The diagnosis problem is to find an H
(disorder) such that P(H|E) is maximum.

Linda is 31 years old, single, outspoken, and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in antinuclear demonstrations.
Please rank the following statements by their probability, using 1 for the most probable and 8 for the least probable.
a. Linda is a teacher in elementary school.
b. Linda works in a bookstore and takes yoga classes.
c. Linda is active in the feminist movement.
d. Linda is a psychiatric social worker.
e. Linda is a member of the League of Women Voters.
f. Linda is a bank teller.
g. Linda is an insurance salesperson.
h. Linda is a bank teller and is active in the feminist movement.

Example
A patient takes a lab test and the result comes back
positive. The test has a false negative rate of 2%
and false positive rate of 3%. Furthermore, 0.8%
of the entire population have this cancer.
What is the probability of cancer if we know the
test result is positive?

Bayes Theorem
If P(E2)>0, then
P(E1|E2)=P(E2|E1)P(E1)/P(E2)
This can be derived from the definition of
conditional probability.
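Applying the theorem to the lab-test example above; this sketch assumes the test detects the cancer with probability 1 − (false negative rate) = 0.98:

# Numbers from the slide: false negative rate 2%, false positive rate 3%, prior 0.8%.
p_cancer = 0.008
p_pos_given_cancer = 1 - 0.02        # sensitivity = 1 - false negative rate
p_pos_given_no_cancer = 0.03         # false positive rate

# Bayes theorem: P(cancer|pos) = P(pos|cancer) P(cancer) / P(pos)
p_pos = p_pos_given_cancer * p_cancer + p_pos_given_no_cancer * (1 - p_cancer)
print(p_pos_given_cancer * p_cancer / p_pos)   # roughly 0.21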

The Three-Card Problem
Three cards are in a hat. One is red on both sides (the red-red card). One is white on both sides (the white-white card). One is red on one side and white on the other (the red-white card). A single card is drawn randomly and tossed into the air.
a. What is the probability that the red-red card was drawn? (RR)
b. What is the probability that the drawn card lands with a white side up? (W-up)
c. What is the probability that the red-red card was not drawn, assuming that the drawn card lands with a red side up? (not-RR|R-up)
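A small enumeration sketch for checking these answers; since each card has two sides, there are six equally likely (card, side-up) outcomes. Note that the exact value of P(not-RR|R-up) is 1/3, not the intuitive 1/2 assumed in bet (c) below, and that mismatch is what the Dutch book exploits:

from fractions import Fraction

# Six equally likely (card, visible side) outcomes.
outcomes = [("RR", "R"), ("RR", "R"), ("WW", "W"), ("WW", "W"), ("RW", "R"), ("RW", "W")]

def prob(pred, given=lambda card, side: True):
    matching = [o for o in outcomes if given(*o)]
    return Fraction(sum(pred(*o) for o in matching), len(matching))

print(prob(lambda card, side: card == "RR"))                                        # (a) 1/3
print(prob(lambda card, side: side == "W"))                                         # (b) 1/2
print(prob(lambda card, side: card != "RR", given=lambda card, side: side == "R"))  # (c) 1/3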

Fair Bets
A bet is fair to an individual I if, according to the individual's probability assessment, the bet will break even in the long run.
The following three bets are fair:
Bet (a): Win $4.20 if RR; lose $2.10 otherwise. [since you believe P(RR) = 1/3]
Bet (b): Win $2.00 if W-up; lose $2.00 otherwise. [since you believe P(W-up) = 1/2]
Bet (c): Win $4.00 if R-up and not-RR; lose $4.00 if R-up and RR; neither win nor lose if not-R-up. [since you believe P(not-RR|R-up) = 1/2]

Dutch Book
The bets that you accepted have an
interesting property:
No matter what card is drawn in the
three-card problem, and no matter how
it lands, you are guaranteed to lose
money.
This is called a Dutch Book

Verification
There are three possible outcomes:
1. Some card other than red-red is drawn, and it lands with a white side up. That is, W-up and not-RR.
2. Some card other than red-red is drawn, and it lands with a red side up. That is, R-up and not-RR.
3. The red-red card is drawn, and it lands (of course) with a red side up. That is, R-up and RR.

Outcome:   1        2        3
Bet (a):  –$2.10   –$2.10   +$4.20
Bet (b):  +$2.00   –$2.00   –$2.00
Bet (c):  ±$0.00   +$4.00   –$4.00
Total:    –$0.10   –$0.10   –$1.80
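A minimal check of this table: the function below adds up the payoffs of bets (a), (b) and (c) for a given (card, side-up) outcome, confirming a loss in every case:

def total_payoff(card, side_up):
    bet_a = 4.20 if card == "RR" else -2.10
    bet_b = 2.00 if side_up == "W" else -2.00
    if side_up == "R":
        bet_c = -4.00 if card == "RR" else 4.00
    else:
        bet_c = 0.00
    return round(bet_a + bet_b + bet_c, 2)

# Representatives of outcomes 1-3 above; every one loses money.
print(total_payoff("WW", "W"), total_payoff("RW", "R"), total_payoff("RR", "R"))  # -0.1 -0.1 -1.8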

The Dutch Book Theorem
Suppose that an individual I is willing to
accept any bet that is fair for I. Then a
Dutch book can be made against I if and
only if I's assessment of probability
violates Bayesian axiomatization.

Independence: Intuition
Events are independent if one has nothing whatever to do with the others. Therefore, for two independent events, knowing that one happened does not change the probability of the other event happening.
One toss of a coin is independent of another toss (assuming it is a regular coin).
The price of tea in England is independent of the result of a general election in Canada.

Independent or Dependent?
Getting a cold and getting a cat allergy.
Miles per gallon and acceleration.
Size of a person's vocabulary and the person's shoe size.

Independence: Definition
Events A and B are independent iff:
P(A, B) = P(A) x P(B)
which is equivalent to
P(A|B) = P(A) and
P(B|A) = P(B)
when P(A, B) >0.
T1: the first toss is a head.
T2: the second toss is a tail.
P(T2|T1) = P(T2)

Conditional Independence
Dependent events can become independent given certain other events.
Example: shoe size and vocabulary size are dependent, but they become independent given age.
Two events A, B are conditionally independent given a third event C iff
P(A|B, C) = P(A|C)

Conditional Independence:
Definition
Let E1 and E2 be two events, they are
conditionally independent given E iff
P(E1|E, E2)=P(E1|E),
that is the probability of E1 is not
changed after knowing E2, given E is
true.
Equivalent formulations:
P(E1, E2|E)=P(E1|E) P(E2|E)
P(E2|E, E1)=P(E2|E)

Example: Play Tennis?
Outlook    Temperature  Humidity  Windy  Class
sunny      hot          high      false  −
sunny      hot          high      true   −
overcast   hot          high      false  +
rain       mild         high      false  +
rain       cool         normal    false  +
rain       cool         normal    true   −
overcast   cool         normal    true   +
sunny      mild         high      false  −
sunny      cool         normal    false  +
rain       mild         normal    false  +
sunny      mild         normal    true   +
overcast   mild         high      true   +
overcast   hot          normal    false  +
rain       mild         high      true   −

Predict playing tennis when <sunny, cool, high, strong>.
What probability should be used to make the prediction?
How to compute the probability?

Probabilities of Individual Attributes
Given the training set, we can compute the probabilities:

Outlook      +     −       Humidity    +     −
sunny       2/9   3/5      high       3/9   4/5
overcast    4/9    0       normal     6/9   1/5
rain        3/9   2/5

Temperature  +     −       Windy       +     −
hot         2/9   2/5      true       3/9   3/5
mild        4/9   2/5      false      6/9   2/5
cool        3/9   1/5

P(+) = 9/14
P(−) = 5/14

Naïve Bayes Method
Knowledge base contains
A set of hypotheses
A set of evidences
Probability of an evidence given a hypothesis
Given
A subset of the evidences known to be present in a situation
Find
The hypothesis with the highest posterior probability: P(H|E1, E2, …, Ek).
The probability itself does not matter so much.

Naïve Bayes Method
Assumptions
Hypotheses are exhaustive and mutually exclusive:
H1 ∨ H2 ∨ … ∨ Hk
¬(Hi ∧ Hj) for any i ≠ j
Evidences are conditionally independent given a hypothesis:
P(E1, E2, …, Ek|H) = P(E1|H) … P(Ek|H)
P(H|E1, E2, …, Ek)
= P(E1, E2, …, Ek, H) / P(E1, E2, …, Ek)
= P(E1, E2, …, Ek|H) P(H) / P(E1, E2, …, Ek)

Naïve Bayes Method
The goal is to find an H that maximizes P(H|E1, E2, …, Ek).
Since
P(H|E1, E2, …, Ek) = P(E1, E2, …, Ek|H) P(H) / P(E1, E2, …, Ek)
and P(E1, E2, …, Ek) is the same for different hypotheses, maximizing P(H|E1, E2, …, Ek) is equivalent to maximizing
P(E1, E2, …, Ek|H) P(H) = P(E1|H) … P(Ek|H) P(H)

Naïve Bayes Method
Find a hypothesis that maximizes P(E1|H) … P(Ek|H) P(H).

Example: Play Tennis
Compare P(+|sunny, cool, high, strong) vs. P(−|sunny, cool, high, strong), i.e.,
P(sunny|+) P(cool|+) P(high|+) P(strong|+) P(+) vs.
P(sunny|−) P(cool|−) P(high|−) P(strong|−) P(−)

Outlook      +     −       Humidity    +     −
sunny       2/9   3/5      high       3/9   4/5
overcast    4/9    0       normal     6/9   1/5
rain        3/9   2/5

Temperature  +     −       Windy       +     −
hot         2/9   2/5      true       3/9   3/5
mild        4/9   2/5      false      6/9   2/5
cool        3/9   1/5

P(+) = 9/14
P(−) = 5/14
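A minimal sketch that recomputes the two scores from the tables above; it assumes 'strong' in the query corresponds to windy = true in the training data:

# Conditional probabilities read off the tables above.
p_plus, p_minus = 9/14, 5/14
given_plus  = {"sunny": 2/9, "cool": 3/9, "high": 3/9, "true": 3/9}
given_minus = {"sunny": 3/5, "cool": 1/5, "high": 4/5, "true": 3/5}

instance = ["sunny", "cool", "high", "true"]   # <sunny, cool, high, strong>

score_plus, score_minus = p_plus, p_minus
for value in instance:
    score_plus *= given_plus[value]
    score_minus *= given_minus[value]

print(score_plus, score_minus)   # about 0.0053 vs 0.0206, so Naive Bayes predicts "-"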

Application: Spam Detection
Spam
Dear sir, We want to transfer to overseas ($ 126,000.000.00
USD) One hundred and Twenty six million United States
Dollars) from a Bank in Africa, I want to ask you to quietly
look for a reliable and honest person who will be capable
and fit to provide either an existing ……
Legitimate email
Ham: for lack of a better name.

Hypotheses: {Spam, Ham}
Evidence: a document
The document is treated as a set (or bag) of
words
Knowledge
P(Spam)
The prior probability of an e-mail message being a spam.
How to estimate this probability?
P(w|Spam)
the probability that a word is w if we know w is chosen
from a spam.
How to estimate this probability?
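One common way to estimate both quantities is by counting over a labeled training set; the sketch below uses add-one (Laplace) smoothing, which is an assumption of this example rather than something stated on the slide, and the tiny training data are made up for illustration:

from collections import Counter

def estimate_spam_model(spam_docs, ham_docs):
    # P(Spam): fraction of training messages that are spam.
    p_spam = len(spam_docs) / (len(spam_docs) + len(ham_docs))

    # P(w|Spam): relative frequency of w among spam words, with add-one smoothing.
    spam_counts = Counter(w for doc in spam_docs for w in doc)
    total = sum(spam_counts.values())
    vocab = {w for doc in spam_docs + ham_docs for w in doc}

    def p_word_given_spam(w):
        return (spam_counts[w] + 1) / (total + len(vocab))

    return p_spam, p_word_given_spam

p_spam, p_w = estimate_spam_model(
    spam_docs=[["transfer", "million", "dollars"], ["win", "million"]],
    ham_docs=[["meeting", "tomorrow"], ["lunch", "today"], ["project", "report"]],
)
print(p_spam, p_w("million"))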

Limitations of Naïve Bayes
Cannot handle composite hypotheses well.
Suppose H1, …, Hn are independent of each other.
Consider a composite hypothesis H1 ∧ H2.
How to compute the posterior probability P(H1 ∧ H2 | E1, …, El)?

Using Bayes' theorem:
P(H1 ∧ H2 | E1, …, El) = P(E1, …, El | H1 ∧ H2) P(H1 ∧ H2) / P(E1, …, El)
P(H1 ∧ H2) = P(H1) P(H2) because they are independent.
How to compute P(Ej | H1 ∧ H2)?
Assuming the Ej are independent given H1 ∧ H2:
P(E1, …, El | H1 ∧ H2) = P(E1 | H1 ∧ H2) … P(El | H1 ∧ H2)
But this is a very unreasonable assumption.
Need a better representation and a better assumption:
P(H1 ∧ H2 | E1, …, El) = P(H1 | E1, …, El) P(H2 | E1, …, El)?
Assuming H1, …, Hn are independent given E1, …, El?
E: earthquake    B: burglar    A: alarm set off
E and B are independent, but when A is given they are (adversely) dependent, because they become competitors to explain A:
P(B|A, E) << P(B|A)
E explains A away.
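A tiny numeric illustration of explaining away; the priors and the alarm probabilities below are made-up numbers for this sketch, not taken from the slides:

from itertools import product

p_E, p_B = 0.01, 0.01                                             # independent priors
alarm = {(0, 0): 0.001, (0, 1): 0.9, (1, 0): 0.8, (1, 1): 0.95}   # assumed P(A=1 | E, B)

def joint(e, b, a):
    pe = p_E if e else 1 - p_E
    pb = p_B if b else 1 - p_B
    pa = alarm[(e, b)] if a else 1 - alarm[(e, b)]
    return pe * pb * pa

def p_burglar(alarm_on=1, earthquake=None):
    num = den = 0.0
    for e, b in product([0, 1], repeat=2):
        if earthquake is not None and e != earthquake:
            continue
        den += joint(e, b, alarm_on)
        num += joint(e, b, alarm_on) if b else 0.0
    return num / den

print(p_burglar(alarm_on=1))                 # P(B|A): about 0.50
print(p_burglar(alarm_on=1, earthquake=1))   # P(B|A,E): about 0.01, E explains A away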

Cannot handle causal chaining
Ex. A: weather of the year
B: cotton production of the year
C: cotton price of next year
Observed: A influences C
The influence is not direct (A -> B -> C)
P(C|B, A) = P(C|B): instantiation of B
blocks influence of A on C

Summary
Basics of Probability Theory
Experiment, sample space, events
Axioms and properties
Joint Probability
Conditional Probability
Probabilistic Reasoning
Bayes Theorem
Dutch Book Theorem
Independence and Conditional Independence
Naïve Bayes Method