Bayesian inference


About This Presentation

A short introduction to Bayesian inference, the likelihood function, the prior distribution, and the Naïve Bayes classifier, with applications, advantages, and disadvantages.


Slide Content

BAYESIAN INFERENCE
Chartha Gaglani

CONTENTS
1. Introduction
2. Likelihood function
3. Example
4. Prior probability distribution
5. Introduction to Naïve Bayes
6. Applications
7. Advantages
8. Disadvantages

INTRODUCTION
• Bayesian inference is a method of statistical inference in which Bayes' theorem is used to update the probability for a hypothesis as more evidence or information becomes available.
• Bayesian inference is an important technique in statistics, and especially in mathematical statistics.
• Bayesian inference has found application in a wide range of activities, including science, engineering, philosophy, medicine, sport, and law.
• In the philosophy of decision theory, Bayesian inference is closely related to subjective probability, often called "Bayesian probability".

• Bayes' theorem adjusts probabilities given new evidence in the following way:
P(H0|E) = P(E|H0) P(H0) / P(E)
• Where H0 represents the hypothesis, called a null hypothesis, inferred before new evidence.
• P(H0) is called the prior probability of H0.
• P(E|H0) is called the conditional probability of seeing the evidence E given that the hypothesis H0 is true.
• P(E) is called the marginal probability of E: the probability of witnessing the new evidence.
• P(H0|E) is called the posterior probability of H0 given E.
• The factor P(E|H0)/P(E) represents the impact that the evidence has on the belief in the hypothesis.

• Multiplying the prior probability P(H0) by the factor P(E|H0)/P(E) will never yield a probability that is greater than 1.
• Since P(E) is at least as great as P(E ∩ H0), which equals P(E|H0) * P(H0), replacing P(E) with P(E ∩ H0) in the factor P(E|H0)/P(E) would yield a posterior probability of exactly 1.
• Therefore, the posterior probability could exceed 1 only if P(E) were less than P(E ∩ H0), which is never true.
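
The update rule above can be checked numerically. The following is a minimal sketch in Python (not from the slides; the prior and likelihood values are invented for illustration) showing that multiplying the prior by P(E|H0)/P(E) keeps the result at or below 1:

def posterior(prior_h0, p_e_given_h0, p_e):
    """Bayes' theorem: P(H0|E) = P(E|H0) * P(H0) / P(E)."""
    return p_e_given_h0 * prior_h0 / p_e

# Invented example values: prior P(H0) = 0.4, P(E|H0) = 0.7, P(E|not H0) = 0.2.
p_h0 = 0.4
p_e_given_h0 = 0.7
# P(E) by total probability; it is always >= P(E ∩ H0) = P(E|H0) * P(H0) = 0.28.
p_e = p_e_given_h0 * p_h0 + 0.2 * (1 - p_h0)   # = 0.40

print(posterior(p_h0, p_e_given_h0, p_e))       # 0.7 -- never greater than 1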

LIKELIHOOD FUNCTION
• The probability of E given H0, P(E|H0), can be represented as a function of its second argument with its first argument held at a given value. Such a function is called a likelihood function; it is a function of H0 given E. A ratio of two likelihood functions is called a likelihood ratio,
Λ = L(H0|E) / L(not H0|E) = P(E|H0) / P(E|not H0)
• The marginal probability, P(E), can also be represented as the sum of the products of the probabilities of mutually exclusive hypotheses and the corresponding conditional probabilities:
P(E) = P(E|H0) P(H0) + P(E|not H0) P(not H0)
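
As a minimal sketch of these two quantities (the numbers below are invented, not from the slides), both the likelihood ratio and the marginal probability can be computed directly:

# Hypothetical values for a single piece of evidence E.
p_h0 = 0.5                 # prior P(H0)
p_e_given_h0 = 0.8         # P(E | H0)
p_e_given_not_h0 = 0.2     # P(E | not H0)

# Likelihood ratio Λ = P(E|H0) / P(E|not H0).
likelihood_ratio = p_e_given_h0 / p_e_given_not_h0                # 4.0

# Marginal probability P(E) by the law of total probability.
p_e = p_e_given_h0 * p_h0 + p_e_given_not_h0 * (1 - p_h0)         # 0.5

print(likelihood_ratio, p_e)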

• As a result, we can rewrite Bayes' theorem as
P(H0|E) = P(E|H0) P(H0) / [P(E|H0) P(H0) + P(E|not H0) P(not H0)] = Λ P(H0) / [Λ P(H0) + P(not H0)]
• With two independent pieces of evidence E1 and E2, Bayesian inference can be applied iteratively.
• We could use the first piece of evidence to calculate an initial posterior probability, and then use that posterior probability as a new prior probability to calculate a second posterior probability given the second piece of evidence.

• Independence of evidence implies that
P(E1, E2|H0) = P(E1|H0) * P(E2|H0)
P(E1, E2) = P(E1) * P(E2)
P(E1, E2|not H0) = P(E1|not H0) * P(E2|not H0)
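
A minimal sketch of this iterative update, assuming hypothetical likelihoods for E1 and E2 (the numbers are invented for illustration, not taken from the slides):

def update(prior, p_e_given_h, p_e_given_not_h):
    """One Bayesian update: return P(H|E) given P(H) and the two likelihoods."""
    p_e = p_e_given_h * prior + p_e_given_not_h * (1 - prior)   # total probability
    return p_e_given_h * prior / p_e

prior = 0.5

# First piece of evidence E1: the posterior becomes the new prior.
posterior_1 = update(prior, p_e_given_h=0.9, p_e_given_not_h=0.3)        # 0.75

# Second, independent piece of evidence E2, applied to the updated prior.
posterior_2 = update(posterior_1, p_e_given_h=0.6, p_e_given_not_h=0.4)  # ≈ 0.818

print(posterior_1, posterior_2)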

From which bowl is the cookie?
• Suppose there are two full bowls of cookies. Bowl #1 has 10 chocolate chip and 30 plain cookies, while bowl #2 has 20 of each. Our friend Hardika picks a bowl at random, and then picks a cookie at random. We may assume there is no reason to believe Hardika treats one bowl differently from another, likewise for the cookies. The cookie turns out to be a plain one. How probable is it that Hardika picked it out of bowl #1?
[Figure: Bowl #1 and Bowl #2]

• Intuitively, it seems clear that the answer should be more than a half, since there are more plain cookies in bowl #1.
• The precise answer is given by Bayes' theorem. Let H1 correspond to bowl #1, and H2 to bowl #2.
• It is given that the bowls are identical from Hardika's point of view, thus P(H1) = P(H2), and the two must add up to 1, so both are equal to 0.5.
• D is the observation of a plain cookie.
• From the contents of the bowls, we know that
P(D|H1) = 30/40 = 0.75 and P(D|H2) = 20/40 = 0.5

• Bayes' formula then yields
P(H1|D) = P(H1) P(D|H1) / [P(H1) P(D|H1) + P(H2) P(D|H2)]
= 0.5 * 0.75 / (0.5 * 0.75 + 0.5 * 0.5)
= 0.6
• Before observing the cookie, the probability that Hardika chose bowl #1 is the prior probability, P(H1), which is 0.5. After observing the cookie, we revise this probability to 0.6.
• It is worth noting that our belief that observing the plain cookie should somewhat affect the prior probability P(H1) has formed the posterior probability P(H1|D), increased from 0.5 to 0.6 (the same calculation appears in the code sketch below).

• This reflects our intuition that the cookie is more likely from bowl #1, since it has a higher ratio of plain to chocolate chip cookies than the other.
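
The same calculation in code (a direct transcription of the cookie example above; only the variable names are ours):

# H1 = bowl #1, H2 = bowl #2; D = drawing a plain cookie.
p_h1, p_h2 = 0.5, 0.5              # the bowl is chosen at random
p_d_given_h1 = 30 / 40             # 30 plain out of 40 cookies in bowl #1
p_d_given_h2 = 20 / 40             # 20 plain out of 40 cookies in bowl #2

p_d = p_h1 * p_d_given_h1 + p_h2 * p_d_given_h2   # marginal P(D) = 0.625
p_h1_given_d = p_h1 * p_d_given_h1 / p_d          # posterior P(H1|D)

print(p_h1_given_d)   # 0.6 -- the prior 0.5 is revised upward after seeing a plain cookie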

PRIOR PROBABILITY DISTRIBUTION
• In Bayesian statistical inference, a prior probability distribution, often called simply the prior, of an uncertain quantity p (for example, suppose p is the proportion of voters who will vote for Mr. Narendra Modi in a future election) is the probability distribution that would express one's uncertainty about p before the data (for example, an election poll) are taken into account.
• It is meant to attribute uncertainty rather than randomness to the uncertain quantity.
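
As a concrete illustration (not mentioned on the slide), one common way to express such a prior over a proportion p is a Beta distribution, which can then be updated with poll data; the prior parameters and the poll counts below are invented:

# Illustrative Beta(2, 2) prior over the proportion p (mild belief centred on 0.5),
# updated with a hypothetical poll of 100 voters in which 58 support the candidate.
alpha_prior, beta_prior = 2, 2
successes, failures = 58, 42

# The Beta prior is conjugate to binomial poll data, so the update is just addition.
alpha_post = alpha_prior + successes
beta_post = beta_prior + failures

posterior_mean = alpha_post / (alpha_post + beta_post)
print(posterior_mean)   # ≈ 0.577 -- belief about p after taking the poll into account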

INTRODUCTION TO NAIVE BAYES
• Suppose your data consist of fruits, described by their color and shape.
• Bayesian classifiers operate by saying "If you see a fruit that is red and round, which type of fruit is it most likely to be, based on the observed data sample? In future, classify red and round fruit as that type of fruit."
• A difficulty arises when you have more than a few variables and classes - you would require an enormous number of observations to estimate these probabilities.

• The Naïve Bayes classifier assumes that the effect of a variable value on a given class is independent of the values of the other variables.
• This assumption is called class conditional independence.
• It is made to simplify the computation and is in this sense considered to be "naïve".
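
A minimal sketch of the class conditional independence assumption on the fruit example (all classes and probabilities below are invented for illustration):

# Tiny hand-built Naïve Bayes classifier over two attributes: color and shape.
# P(color, shape | class) is factored as P(color | class) * P(shape | class),
# which is exactly the "naive" class conditional independence assumption.
priors = {"apple": 0.6, "cherry": 0.4}
p_color = {"apple": {"red": 0.7, "green": 0.3}, "cherry": {"red": 0.9, "green": 0.1}}
p_shape = {"apple": {"round": 0.8, "oblong": 0.2}, "cherry": {"round": 0.95, "oblong": 0.05}}

def classify(color, shape):
    # Unnormalised posterior score for each class, then normalise to probabilities.
    scores = {c: priors[c] * p_color[c][color] * p_shape[c][shape] for c in priors}
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}

print(classify("red", "round"))   # roughly {'apple': 0.496, 'cherry': 0.504}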

APPLICATIONS
1. Computer applications
• Bayesian inference has applications in artificial intelligence and expert systems.
• Bayesian inference techniques have been a fundamental part of computerized pattern recognition techniques since the late 1950s.
• Recently, Bayesian inference has gained popularity among the phylogenetics community, in part because a number of applications allow many demographic and evolutionary parameters to be estimated simultaneously.

2. Bioinformatics applications
• Bayesian inference has been applied in different bioinformatics applications, including differential gene expression analysis, single-cell classification, cancer subtyping, etc.

ADVANTAGES
• Including good information should improve prediction.
• Including structure can allow the method to incorporate more data (for example, hierarchical modeling allows partial pooling so that external data can be included in a model even if these external data share only some characteristics with the current data being modeled).

DISADVANTAGES
• If the prior information is wrong, it can send inferences in the wrong direction.
• Bayesian inference combines different sources of information; thus it is no longer an encapsulation of a particular data set (which is sometimes desired, for reasons that go beyond immediate predictive accuracy and instead touch on issues of statistical communication).

THANK YOU.