DR.M.THIAGARAJAN
ASSOCIATE PROFESSOR OF
MATHEMATICS
ST JOSEPH’S COLLEGE
TRICHIRAPPALLI
Uncertainty in AI
Outline:
Introduction
Basic Probability Theory
Probabilistic Reasoning
Why should we use probability theory?
Dutch Book Theorem
Sources of Uncertainty
Information is partial
Information is not fully reliable.
Representation language is inherently
imprecise.
Information comes from multiple sources
and it is conflicting.
Information is approximate
Non-absolute cause-effect relationships exist
Basic Probability
Probability theory enables us to make
rational decisions.
Which mode of transportation is safer:
Car or Plane?
What is the probability of an accident?
Basic Probability Theory
An experimenthas a set of potential outcomes,
e.g., throw a dice
Thesample spaceof an experiment is the set
of all possible outcomes, e.g., {1, 2, 3, 4, 5, 6}
An eventis a subset of the sample space.
{2}
{3, 6}
even = {2, 4, 6}
odd = {1, 3, 5}
Probability as Relative
Frequency
An event has a probability.
Consider a long sequence of experiments. If we
look at the number of times a particular event
occurs in that sequence, and compare it to the
total number of experiments, we can compute a
ratio.
This ratio is one way of estimating the probability
of the event.
P(E) = (# of times E occurred)/(total # of trials)
Example
100 attempts are made to swim a length in 30
secs. The swimmer succeeds on 20 occasions
therefore the probability that a swimmer can
complete the length in 30 secs is:
20/100 = 0.2
Failure = 1-.2 or 0.8
The experiments, the sample space and the
events must be defined clearly for probability
to be meaningful
What is the probability of an accident?
Theoretical Probability
Principle of Indifference—
Alternatives are always to be judged
equiprobable if we have no reason
to expect or prefer one over the
other.
Each outcome in the sample space is
assigned equal probability.
Example: throw a dice
P({1})=P({2})= ... =P({6})=1/6
Law of Large Numbers
As the number of experiments increases the
relative frequency of an event more closely
approximates the theoretical probability of
the event.
if the theoretical assumptions hold.
Buffon’s Needle for Computing π
Draw parallel lines 1 inch apart on a plane
Throw a 1-inch needle on the plane
P( needle crossing a line )=2/πnumber of throws
2 number of crossings
Large Number Reveals
Untruth in Assumptions
Results of 1,000,000 throws of a die
Number 123456
Fraction.155.159.164.169.174.179
Axioms of Probability Theory
Suppose P(.) is a probability function, then
1.for any event E, 0≤P(E) ≤1.
2.P(S) = 1, where S is the sample space.
3.for any two mutually exclusive events E1 and
E2,
P(E1 E2) = P(E1) + P(E2)
Any function that satisfies the above three
axioms is a probability function.
Joint Probability
Let A, B be two events, the joint probability of both A and
B being true is denoted by P(A, B).
Example:
P(spade) is the probability of the top card being a spade.
P(king) is the probability of the top card being a king.
P(spade, king) is the probability of the top card being
both a spade and a king, i.e., the king of spade.
P(king, spade)=P(spade, king) ???
Properties of Probability
1.P(E) = 1–P(E)
2.If E1 and E2 are logically equivalent, then
P(E1)=P(E2).
E1: Not all philosophers are more than six feet tall.
E2: Some philosopher is not more that six feet tall.
Then P(E1)=P(E2).
3.P(E1, E2)≤P(E1).
Conditional Probability
The probability of an event may change
after knowing another event.
The probability of A given B is denoted
by P(A|B).
Example
P( W=space ) the probability of a randomly
selected word from an English text is ‘space’
P( W=space | W’=outer) the probability of
‘space’ if the previous word is ‘outer’
Example
A:the top card of a deck of poker cards is a king of
spade
P(A) = 1/52
However, if we know
B:the top card is a king
then, the probability of A given B is true is
P(A|B) = 1/4.
How to Compute P(A|B)?
A
BP(A|B)=
N(A and B)
N(B)
=
N(A and B)
N
N(B)
N
=
P(A, B)
P(B) P(brown|cow)=
N(brown-cows)
N(cows)
=
P(brown-cow)
P(cow)
Business Students
Of 100 students completing a course, 20 were
business major. Ten students received As in the
course, and three of these were business majors.,
suppose A is the event that a randomly selected
student got an A in the course, B is the event that a
randomly selected event is a business major. What
is the probability of A? What is the probability of A
after knowing B is true?3
7
80
20
A
B not B
Probabilistic Reasoning
Evidence
What we know about a situation.
Hypothesis
What we want to conclude.
Compute
P( Hypothesis | Evidence )
Credit Card Authorization
E is the data about the applicant's age,
job, education, income, credit history,
etc,
H is the hypothesis that the credit card
will provide positive return.
The decision of whether to issue the
credit card to the applicant is based on
the probability P(H|E).
Medical Diagnosis
E is a set of symptoms, such as, coughing,
sneezing, headache, ...
H is a disorder, e.g., common cold, SARS,
flu.
The diagnosis problem is to find an H
(disorder) such that P(H|E) is maximum.
Linda is 31 years old, single, outspoken, and very bright.
She majored in philosophy. As a student, she was deeply
concerned with issues of discrimination and social justice,
and also participated in antinuclear demonstrations.
Please rank the following statements by their probability,
using 1 for the most probable and 8 for the least
probable.
a.Linda is a teacher in elementary school.
b.Linda works in a bookstore and takes yoga classes.
c.Linda is active in the feminist movement.
d.Linda is psychiatric social worker.
e.Linda is a member of the League of Women Voters.
f.Linda is a bank teller.
g.Linda is an insurance salesperson.
h.Linda is a bank teller and is active in the feminist
movement.
Example
A patient takes a lab test and the result comes back
positive. The test has a false negative rate of 2%
and false positive rate of 3%. Furthermore, 0.8%
of the entire population have this cancer.
What is the probability of cancer if we know the
test result is positive?
Bayes Theorem
If P(E2)>0, then
P(E1|E2)=P(E2|E1)P(E1)/P(E2)
This can be derived from the definition of
conditional probability.
The Three-Card Problem
Three cards are in a hat. One is red on both sides (the
red-red card). One is white on both sides (the white-
white card). One is red on one side and white on the
other (the red-white card). A single card is drawn
randomly and tossed into the air.
a.What is the probability that the red-red card was
drawn? (RR)
b. What is the probability that the drawn cards lands
with a white side up? (W-up)
c. What is the probability that the red-red card was not
drawn, assuming that the drawn card lands with the
a red side up. (not-RR|R-up)
Fair Bets
A bet is fair to an individual I if, according to the individual's
probability assessment, the bet will break even in the long run.
The following three bet are fair :
Bet (a): Win $4.20 if RR;
lose $2.10
otherwise. [since you believe P(RR)=1/3]
Bet (b): Win $2.00 if W-up;
lose $2.00
otherwise. [since you believe P(W-up)=1/2]
Bet (c): Win $4.00 if R-up and not-RR;
lose $4.00 if R-up and RR;
neither win nor lose if not-R-up.
[since you believe P(not-RR|R-up)=1/2]
Dutch Book
The bets that you accepted have an
interesting property:
No matter what card is drawn in the
three-card problem, and no matter how
it lands, you are guaranteed to lose
money.
This is called a Dutch Book
Verification
there are three possible outcomes
1.Some card other than red-red is drawn, and it lands with
white side up. That is, W-up and not-RR
2.Some card other than red-red is drawn, and it lands with a
red side up. That is, R-up and not-RR.
3. The red-red card is drawn, and it lands (of course) with a
red side up. That is, R-up and RR.
1 2 3
a.–$2.10 –$2.10 +$4.20
b.+$2.00 –$2.00 –$2.00
c.±$0.00 +$4.00 –$4.00
total–$0.10 –$0.10 –$1.80
The Dutch Book Theorem
Suppose that an individual I is willing to
accept any bet that is fair for I. Then a
Dutch book can be made against I if and
only if I's assessment of probability
violates Bayesian axiomatization.
Independence: Intuition
Events are independent if one has nothing
whatever to do with others. Therefore, for
two independent events, knowing one
happening does change the probability of the
other event happening.
one toss of coin is independent of another coin
(assuming it is a regular coin).
price of tea in England is independent of the result
of general election in Canada.
Independent or Dependent?
Getting cold and getting cat-allergy
Mile Per Gallon and acceleration.
Size of a person’s vocabulary the
person’s shoe size.
Independence: Definition
Events A and B are independent iff:
P(A, B) = P(A) x P(B)
which is equivalent to
P(A|B) = P(A) and
P(B|A) = P(B)
when P(A, B) >0.
T1: the first toss is a head.
T2: the second toss is a tail.
P(T2|T1) = P(T2)
Conditional Independence
Dependent events can become
independent given certain other events.
Example,
Size of shoe
Age
Size of vocabulary
Two events A, B are conditionally
independent given a third event C iff
P(A|B, C) = P(A|C)
Conditional Independence:
Definition
Let E1 and E2 be two events, they are
conditionally independent given E iff
P(E1|E, E2)=P(E1|E),
that is the probability of E1 is not
changed after knowing E2, given E is
true.
Equivalent formulations:
P(E1, E2|E)=P(E1|E) P(E2|E)
P(E2|E, E1)=P(E2|E)
Example: Play Tennis?OutlookTemperatureHumidityWindyClass
sunnyhot high false −
sunnyhot high true −
overcasthot high false +
rain mild high false +
rain cool normalfalse +
rain cool normaltrue −
overcastcool normaltrue +
sunnymild high false −
sunnycool normalfalse +
rain mild normalfalse +
sunnymild normaltrue +
overcastmild high true +
overcasthot normalfalse +
rain mild high true −
Predict playing tennis when <sunny, cool, high, strong>
What probability should be used to make the prediction?
How to compute the probability?
Probabilities of Individual
Attributes
Given the training set, we can compute the
probabilitiesOutlook + − Humidity+−
sunny 2/93/5 high 3/94/5
overcast 4/90 normal6/91/5
rain 3/92/5
Tempreature Windy
hot 2/92/5 true 3/93/5
mild 4/92/5 false6/92/5
cool 3/91/5
P(+) = 9/14
P(−) = 5/14
Naïve Bayes Method
Knowledge Base contains
A set of hypotheses
A set of evidences
Probability of an evidence given a hypothesis
Given
A sub set of the evidences known to be present in
a situation
Find
the hypothesis with the highest posterior
probability: P(H|E
1, E
2, …, E
k).
The probability itself does not matter so much.
Naïve Bayes Method
Assumptions
Hypotheses are exhaustive and mutually
exclusive
H
1v H
2 v … v H
k
¬(H
i^ H
j) for any i≠j
Evidences are conditionally independent
given a hypothesis
P(E
1, E
2,…, E
k|H) = P(E
1|H)…P(E
k|H)
P(H | E
1, E
2,…, E
k)
= P(E
1, E
2,…, E
k, H)/P(E
1, E
2,…, E
k)
= P(E
1, E
2,…, E
k|H)P(H)/P(E
1, E
2,…, E
k)
Naïve Bayes Method
The goal is to find H that maximize P(H|E
1, E
2,…, E
k)
Since
P(H|E
1, E
2,…, E
k) = P(E
1, E
2,…, E
k|H)P(H)/P(E
1, E
2,…, E
k)
and P(E
1, E
2,…, E
k) is the same for different hypotheses,
Maximizing P(H|E
1, E
2,…, E
k) is equivalent to maximizing
P(E
1, E
2,…, E
k|H)P(H)= P(E
1|H)…P(E
k|H)P(H)
Naïve Bayes Method
Find a hypothesis that maximizes P(E
1|H)…P(E
k|H)P(H)
Example: Play Tennis
P(+| sunny, cool, high, strong) vs.
P(−| sunny, cool, high, strong)
P(sunny|+)P(cool|+)P(high|+)P(strong|+)P(+) vs.
P(sunny|−)P(cool|−)P(high|−)P(strong|−)P(−)Outlook + − Humidity+ −
sunny 2/93/5 high 3/94/5
overcast 4/9 0 normal6/91/5
rain 3/92/5
Tempreature Windy
hot 2/92/5 true 3/93/5
mild 4/92/5 false 6/92/5
cool 3/91/5
P(+) = 9/14
P(−) = 5/14
Application: Spam Detection
Spam
Dear sir, We want to transfer to overseas ($ 126,000.000.00
USD) One hundred and Twenty six million United States
Dollars) from a Bank in Africa, I want to ask you to quietly
look for a reliable and honest person who will be capable
and fit to provide either an existing ……
Legitimate email
Ham: for lack of better name.
Hypotheses: {Spam, Ham}
Evidence: a document
The document is treated as a set (or bag) of
words
Knowledge
P(Spam)
The prior probability of an e-mail message being a spam.
How to estimate this probability?
P(w|Spam)
the probability that a word is w if we know w is chosen
from a spam.
How to estimate this probability?
Limitations of Naïve Bayesian
Cannot handle hypotheses of composite
hypotheses well
Suppose are independent of each
other
Consider a composite hypothesis
How to compute the posterior probability? ),...,|^(
121 lEEHHP nHH,...,
1 21^HH
Using the Bayes’ Theoremtindependen are they because )()()^(
2121 HPHPHHP ?)^|( compute toHow
21
HHEP
j ),...(
)^()^|,...(
),...,|^(
1
21211
121
l
l
l
EEP
HHPHHEEP
EEHHP 21
211211
^given t,independen are assuming
)^|()^|,...(
HHE
HHEPHHEEP
j
j
l
jl
but this is a very unreasonable assumption
Need a better representation and a better
assumption ),...,|( ),...,|( ),...,|^(
1211121 lll EEHPEEHPEEHHP ?,...,given t,independen are ,..., Assuming
11 ln EEHH
E: earth quake B: burglar
A: alarm set off
E and B are independent
But when A is given, they
are (adversely) dependent
because they become
competitors to explain A
P(B|A, E) <<P(B|A)
E explains away of A
Cannot handle causal chaining
Ex. A: weather of the year
B: cotton production of the year
C: cotton price of next year
Observed: A influences C
The influence is not direct (A -> B -> C)
P(C|B, A) = P(C|B): instantiation of B
blocks influence of A on C
Summary
Basics of Probability Theory
Experiment, sample space, events
Axioms and prosperities
Joint Probability
Conditional Probability
Probabilistic Reasoning
Bayes Theorem
Dutch Book Theorem
Independence and Conditional Independence
Naïve Bayes Method