Bayesian regression algorithm for machine learning


About This Presentation

Bayesian regression is a probabilistic method used in machine learning algorithms.


Slide Content

Bayesian Linear Regression

Bayes' Rule
• Estimates the posterior probability of an event, based on prior knowledge of evidence (observations)
• Event/Hypothesis (h): outcome of an experiment
  Ex: occurrence of Rain: (Rain = YES, Rain = NO)
• Evidence/Data (d): observations
  Ex: climatic conditions: (Dark cloud = YES, Wind = speedy)
• Finds the posterior probability of the Rain event based on prior knowledge of the climatic conditions

Bayes' Rule

    P(h|d) = P(d|h) P(h) / P(d)

where
• P(h): prior belief (probability of the hypothesis before seeing any data)
• P(d|h): likelihood (probability of the data if the hypothesis is true)
• P(d) = Σₕ P(d|h) P(h): evidence (marginal probability of the data)
• P(h|d): posterior (probability of the hypothesis after having seen the data)

Understanding Bayes' rule
• d: data, h: hypothesis (model)
• Rearranging gives P(h|d) P(d) = P(d|h) P(h); both sides equal the same joint probability P(h, d).



• Ex:

    P(Rain = Yes | C) = P(C | Rain = Yes) P(Rain = Yes) / P(C)

where
• P(Rain): prior belief (probability of getting Rain)
• P(C | Rain): likelihood (given the past outcome (Rain), what is the probability of getting the evidence?)
• P(C): prior probability of getting the evidence
• P(Rain | C): posterior (probability of getting Rain based on the climatic conditions)


Example of Bayes' Theorem
• Given:
  • A doctor knows that covid causes fever 50% of the time
  • Prior probability of any patient having covid is 1/50,000
  • Prior probability of any patient having fever is 1/20
• If a patient has fever, what's the probability he/she has covid?

    P(Covid | Fever) = P(Fever | Covid) P(Covid) / P(Fever) = (0.5 × 1/50,000) / (1/20) = 0.0002
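As a quick check, the same computation in Python (a minimal sketch; the probabilities are the ones given above):

```python
# Bayes' rule applied to the covid/fever example.
p_fever_given_covid = 0.5    # P(Fever|Covid): covid causes fever 50% of the time
p_covid = 1 / 50_000         # P(Covid): prior probability of covid
p_fever = 1 / 20             # P(Fever): prior probability of fever

p_covid_given_fever = p_fever_given_covid * p_covid / p_fever
print(p_covid_given_fever)   # 0.0002
```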

Linear Regression
• The linear regression model assumes that the response variable (y) is a linear combination of weights multiplied by a set of predictor variables (x). The full formula also includes an error term to account for random sampling noise. For example, if we have two predictors, the equation is:

    y = β₀ + β₁x₁ + β₂x₂ + ε

• y is the response variable (also called the dependent variable),
• β's are the weights (known as the model parameters),
• x's are the values of the predictor variables, and
• ε is an error term representing random sampling noise or the effect of variables not included in the model.
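To make the formula concrete, here is a minimal simulation of the two-predictor model; the coefficient values and noise level are hypothetical, chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100

# Hypothetical weights and noise level, chosen only for illustration.
beta0, beta1, beta2 = 1.0, 2.0, -0.5
x1, x2 = rng.normal(size=n), rng.normal(size=n)
eps = rng.normal(scale=0.3, size=n)        # random sampling noise

y = beta0 + beta1 * x1 + beta2 * x2 + eps  # y = β0 + β1·x1 + β2·x2 + ε
```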

• We can generalize the linear model to any number of predictors using matrix equations. Adding a constant term of 1 to the predictor matrix to account for the intercept, we can write the matrix formula as:

    y = Xβ + ε

• The goal of learning a linear model from training data is to find the coefficients, β, that best explain the data. In linear regression, the best explanation is taken to mean the coefficients, β, that minimize the Mean Squared Error (MSE) or Residual Sum of Squares (RSS).

• RSS is the total of the squared differences between the known values (y) and the predicted model outputs (ŷ, pronounced y-hat, indicating an estimate). The residual sum of squares is a function of the model parameters:

    RSS(β) = (y − Xβ)ᵀ(y − Xβ)

• This equation has a closed-form solution for the model parameters, β, that minimize the error. This is known as the maximum likelihood estimate of β because it is the value that is the most probable given the inputs, X, and outputs, y.
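The RSS formula translates directly into code; a minimal sketch, assuming X already includes the constant column of 1s for the intercept:

```python
import numpy as np

def rss(beta, X, y):
    """Residual sum of squares: RSS(β) = (y − Xβ)ᵀ(y − Xβ)."""
    residuals = y - X @ beta
    return residuals @ residuals
```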

• The closed-form solution expressed in matrix form is:

    β̂ = (XᵀX)⁻¹ Xᵀ y

• This method of fitting the model parameters by minimizing the RSS/MSE is called Ordinary Least Squares (OLS).
• What we obtain from frequentist linear regression is a single estimate for the model parameters based only on the training data. Our model is completely informed by the data: in this view, everything that we need to know for our model is encoded in the training data we have available.
• Once we have β̂, we can estimate the output value of any new data point by applying our model equation:

    ŷ = xβ̂
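A minimal sketch of the closed-form OLS fit in NumPy, on simulated data with hypothetical coefficients (β₀ = 1.0, β₁ = 2.0, β₂ = −0.5):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(scale=0.3, size=n)

# Add the constant column of 1s for the intercept, then solve the
# normal equations: beta_hat = (X^T X)^{-1} X^T y.
X = np.column_stack([np.ones(n), x1, x2])
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)  # more stable than an explicit inverse

# Estimate the output for a new data point by applying the model equation.
x_new = np.array([1.0, 0.5, -1.0])            # hypothetical new input (leading 1 = intercept)
y_hat = x_new @ beta_hat
print(beta_hat, y_hat)
```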

Bayesian Linear Regression
• In the Bayesian viewpoint, we formulate linear regression using probability distributions rather than point estimates. The response, y, is not estimated as a single value, but is assumed to be drawn from a probability distribution. The model for Bayesian linear regression, with the response sampled from a normal distribution, is:

    y ~ N(βᵀX, σ²I)

• The output, y, is generated from a normal (Gaussian) distribution characterized by a mean and variance. The mean for linear regression is the transpose of the weight matrix multiplied by the predictor matrix. The variance is the square of the standard deviation σ (multiplied by the identity matrix because this is a multi-dimensional formulation of the model).
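A small sketch of this generative view, sampling responses from a normal distribution centered on Xβ; the parameter values below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical parameter values for illustration.
beta = np.array([1.0, 2.0, -0.5])
sigma = 0.3

X = np.column_stack([np.ones(5), rng.normal(size=(5, 2))])

# The response is a draw from N(X @ beta, sigma^2 I), not a fixed value.
y = rng.normal(loc=X @ beta, scale=sigma)
print(y)
```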

• The aim of Bayesian linear regression is not to find the single "best" value of the model parameters, but rather to determine the posterior distribution for the model parameters.
• Not only is the response generated from a probability distribution, but the model parameters are assumed to come from a distribution as well.

• The posterior probability of the model parameters is conditional upon the training inputs and outputs:

    P(β | y, X) = P(y | β, X) P(β | X) / P(y | X)

Here, P(β | y, X) is the posterior probability distribution of the model parameters given the inputs and outputs. This is equal to the likelihood of the data, P(y | β, X), multiplied by the prior probability of the parameters and divided by a normalization constant.

• This is a simple expression of Bayes' theorem, the fundamental underpinning of Bayesian inference:

    Posterior ∝ Likelihood × Prior

• In contrast to OLS, we have a posterior distribution for the model parameters that is proportional to the likelihood of the data multiplied by the prior probability of the parameters.
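In the conjugate case (Gaussian likelihood with known noise standard deviation σ and a Gaussian prior β ~ N(0, τ²I)), this posterior is itself Gaussian and can be computed in closed form. A minimal sketch under those assumptions:

```python
import numpy as np

def posterior_params(X, y, sigma=0.3, tau=10.0):
    """Closed-form Gaussian posterior N(mu_n, Sigma_n) over beta,
    assuming a conjugate prior beta ~ N(0, tau^2 I) and a known noise
    standard deviation sigma (both values here are hypothetical)."""
    d = X.shape[1]
    precision_n = np.eye(d) / tau**2 + (X.T @ X) / sigma**2
    Sigma_n = np.linalg.inv(precision_n)   # posterior covariance
    mu_n = Sigma_n @ (X.T @ y) / sigma**2  # posterior mean
    return mu_n, Sigma_n
```

Outside this conjugate setting (unknown noise variance, non-Gaussian priors), the posterior generally has no closed form and is approximated instead, for example by sampling methods such as MCMC.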

Primary benefits of Bayesian Linear Regression
• Priors: If we have domain knowledge, or a guess for what the model parameters should be, we can include them in our model, unlike in the frequentist approach, which assumes everything there is to know about the parameters comes from the data. If we don't have any estimates ahead of time, we can use non-informative priors for the parameters, such as a broad (high-variance) normal distribution.
• Posterior: The result of performing Bayesian linear regression is a distribution of possible model parameters based on the data and the prior. This allows us to quantify our uncertainty about the model: if we have fewer data points, the posterior distribution will be more spread out, as the sketch below illustrates.
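A short demonstration of that last point, using the same conjugate Gaussian posterior as in the earlier sketch: as the number of data points grows, the posterior standard deviations of the parameters shrink (all values hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)
sigma, tau = 0.3, 10.0                 # hypothetical noise and prior scales

for n in (5, 50, 500):
    x = rng.normal(size=n)
    X = np.column_stack([np.ones(n), x])
    y = 1.0 + 2.0 * x + rng.normal(scale=sigma, size=n)

    # Conjugate Gaussian posterior over beta.
    precision_n = np.eye(2) / tau**2 + (X.T @ X) / sigma**2
    Sigma_n = np.linalg.inv(precision_n)
    mu_n = Sigma_n @ (X.T @ y) / sigma**2

    # Posterior standard deviations shrink as n grows.
    print(n, np.sqrt(np.diag(Sigma_n)))
```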