Module 4_Machine Learning_Evaluating Hypothesis

DrShivashankar1 869 views 24 slides Aug 13, 2024


MACHINE LEARNING (INTEGRATED)
(21ISE62)
Module 4
Dr. Shivashankar
Professor
Department of Information Science & Engineering
GLOBAL ACADEMY OF TECHNOLOGY-Bengaluru
8/13/2024 1Dr. Shivashankar, ISE, GAT
GLOBAL ACADEMY OF TECHNOLOGY
Ideal Homes Township, Rajarajeshwari Nagar, Bengaluru – 560 098
Department of Information Science & Engineering

Course Outcomes
After completion of the course, the student will be able to:
• Illustrate Regression Techniques and the Decision Tree Learning Algorithm.
• Apply SVM, ANN and KNN algorithms to solve appropriate problems.
• Apply Bayesian Techniques and derive effective learning rules.
• Illustrate performance of AI and ML algorithms using evaluation techniques.
• Understand reinforcement learning and its application in real-world problems.
Text Books:
1. Tom M. Mitchell, Machine Learning, McGraw Hill Education, India Edition, 2013.
2. Ethem Alpaydın, Introduction to Machine Learning, MIT Press, Second Edition.
3. Pang-Ning Tan, Michael Steinbach, Vipin Kumar, Introduction to Data Mining, Pearson, First Impression, 2014.

Module 4: Evaluating Hypothesis
• A hypothesis is a mathematical function or model that converts input data into output predictions.
• The hypothesis is typically expressed as a collection of parameters characterizing the behavior of the model.
• Machine learning involves conducting experiments based on past experiences, and these hypotheses are crucial in formulating potential solutions.
• A hypothesis in machine learning is the model's presumption regarding the connection between the input features and the result.
Evaluating the accuracy of hypotheses is fundamental to machine learning. Evaluating a hypothesis means answering the following questions:
• Given the observed accuracy of a hypothesis over a limited sample of data, how well does this estimate its accuracy over additional examples?
• Given that one hypothesis outperforms another over some sample of data, how probable is it that this hypothesis is more accurate in general?
• When data is limited, what is the best way to use this data to both learn a hypothesis and estimate its accuracy?
Because limited samples of data might misrepresent the general distribution of data, estimating true accuracy from such samples can be misleading.

ESTIMATING HYPOTHESIS ACCURACY
• When evaluating a learned hypothesis we are most often interested in estimating the accuracy with which it will classify future instances.
• At the same time, we would like to know the probable error in this accuracy estimate (i.e., what error bars to associate with this estimate).
• Instances x are drawn from a space X according to a probability distribution D. The distribution D says nothing about whether x is a positive or negative example; it only determines the probability that x will be encountered.
• The learning task is to learn the target concept or target function f by considering a space H of possible hypotheses.
• Training examples of the target function f are provided to the learner by a trainer who draws each instance independently, according to the distribution D, and who then forwards the instance x along with its correct target value f(x) to the learner.
• For example, if X is a set of people, the distribution D specifies for each person x the probability that x will be encountered as the next person arriving to the event.
• The target function $f: X \to \{0, 1\}$ classifies each person according to whether or not they plan.


Sample Error and True Error
Two notions of error must be distinguished:
• One is the error rate of the hypothesis over the sample of data that is available.
• The other is the error rate of the hypothesis over the entire unknown distribution D of examples.
• The sample error of a hypothesis with respect to some sample S of instances drawn from X is the fraction of S that it misclassifies.
• The sample error $\mathrm{error}_S(h)$ of hypothesis h with respect to target function f and data sample S is
$$\mathrm{error}_S(h) = \frac{1}{n} \sum_{x \in S} \delta(f(x), h(x))$$
• where n is the number of examples in S, and the quantity $\delta(f(x), h(x))$ is 1 if $f(x) \neq h(x)$, and 0 otherwise.
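
The sample error definition can be sketched in a few lines of Python. This is only an illustration: the functions `target` and `hypothesis` and the sample `S` are hypothetical stand-ins for f, h and S, not part of the course material.

```python
from typing import Callable, Sequence


def sample_error(
    f: Callable[[int], int],
    h: Callable[[int], int],
    sample: Sequence[int],
) -> float:
    """Return the fraction of `sample` on which h disagrees with f."""
    # delta(f(x), h(x)) contributes 1 for a misclassified instance, 0 otherwise
    mistakes = sum(1 for x in sample if f(x) != h(x))
    return mistakes / len(sample)


def target(x: int) -> int:
    """Hypothetical target function f: label 1 when x >= 5."""
    return 1 if x >= 5 else 0


def hypothesis(x: int) -> int:
    """Hypothetical learned hypothesis h: label 1 when x >= 7."""
    return 1 if x >= 7 else 0


S = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  # illustrative sample of n = 10 instances
print(sample_error(target, hypothesis, S))  # h and f disagree on x = 5 and x = 6
```

Here h misclassifies 2 of the 10 instances, so the sample error is r/n = 2/10.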

Conti..
• The true error of a hypothesis is the probability that it will misclassify a single randomly drawn instance from the distribution D.
• The true error $\mathrm{error}_D(h)$ of hypothesis h with respect to target function f and distribution D is the probability that h will misclassify an instance drawn at random according to D:
$$\mathrm{error}_D(h) = \Pr_{x \in D}\left[f(x) \neq h(x)\right]$$
• The notation $\Pr_{x \in D}$ denotes that the probability is taken over the instance distribution D.
• What we usually wish to know is the true error $\mathrm{error}_D(h)$ of the hypothesis, because this is the error we can expect when applying the hypothesis to future examples.
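
In practice D is unknown, so the true error cannot be computed directly. For illustration only, the sketch below assumes a small, fully known distribution `D` over five integer instances (a made-up example) and sums the probability of every instance that the hypothetical `hypothesis` misclassifies.

```python
from typing import Callable, Dict


def true_error(
    f: Callable[[int], int],
    h: Callable[[int], int],
    distribution: Dict[int, float],
) -> float:
    """Probability that an instance drawn from `distribution` is misclassified by h."""
    return sum(prob for x, prob in distribution.items() if f(x) != h(x))


def target(x: int) -> int:
    """Hypothetical target function f: label 1 when x >= 5."""
    return 1 if x >= 5 else 0


def hypothesis(x: int) -> int:
    """Hypothetical learned hypothesis h: label 1 when x >= 7."""
    return 1 if x >= 7 else 0


# Assumed toy distribution D: instance -> probability of being encountered
D = {4: 0.10, 5: 0.20, 6: 0.30, 7: 0.25, 8: 0.15}

# h is wrong on x = 5 and x = 6, so their probabilities are added up
print(true_error(target, hypothesis, D))
```

Note that the true error depends on how likely the misclassified instances are under D, not on how many such instances exist.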

Confidence Intervals for Discrete-Valued Hypotheses
• "How good an estimate of $\mathrm{error}_D(h)$ is provided by $\mathrm{error}_S(h)$?" We answer this for the case in which h is a discrete-valued hypothesis.
• Suppose we wish to estimate the true error for some discrete-valued hypothesis h, based on its observed sample error over a sample S, where:
• the sample S contains n examples drawn independent of one another, and independent of h, according to the probability distribution D;
• $n \geq 30$;
• hypothesis h commits r errors over these n examples (i.e., $\mathrm{error}_S(h) = r/n$).
Under these conditions, statistical theory allows us to make the following assertions:
1. Given no other information, the most probable value of $\mathrm{error}_D(h)$ is $\mathrm{error}_S(h)$.
2. With approximately 95% probability, the true error $\mathrm{error}_D(h)$ lies in the interval
$$\mathrm{error}_S(h) \pm 1.96 \sqrt{\frac{\mathrm{error}_S(h)\,(1 - \mathrm{error}_S(h))}{n}}$$
The 95% confidence interval can be generalized to any desired confidence level. The constant 1.96 is used in case we desire a 95% confidence interval.
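
As a small worked sketch of the 95% interval, the function below plugs r and n into the formula. The values r = 12 and n = 40 are illustrative numbers chosen for this example.

```python
import math
from typing import Tuple


def confidence_interval_95(r: int, n: int) -> Tuple[float, float]:
    """Approximate 95% confidence interval for the true error, given r errors in n examples."""
    error_s = r / n  # observed sample error
    half_width = 1.96 * math.sqrt(error_s * (1 - error_s) / n)
    return error_s - half_width, error_s + half_width


low, high = confidence_interval_95(r=12, n=40)
print(f"sample error = {12 / 40:.2f}, interval = [{low:.3f}, {high:.3f}]")
```

With a sample error of 0.30 on 40 examples, the half-width is about 0.14, so the interval is fairly wide; a larger n narrows it.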

Conti…
A different constant, $z_N$, is used to calculate the N% confidence interval.
The general expression for approximate N% confidence intervals for $\mathrm{error}_D(h)$ is
$$\mathrm{error}_S(h) \pm z_N \sqrt{\frac{\mathrm{error}_S(h)\,(1 - \mathrm{error}_S(h))}{n}}$$
where the constant $z_N$ is chosen depending on the desired confidence level.
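
One way to obtain $z_N$ without a lookup table is the inverse CDF of the standard Normal distribution. The sketch below uses Python's standard-library `statistics.NormalDist`; the helper names `z_value` and `confidence_interval` are assumptions made for this example.

```python
import math
from statistics import NormalDist
from typing import Tuple


def z_value(confidence: float) -> float:
    """Two-sided constant z_N for a confidence level given as a fraction (e.g. 0.95)."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


def confidence_interval(r: int, n: int, confidence: float) -> Tuple[float, float]:
    """Approximate N% confidence interval for the true error, given r errors in n examples."""
    error_s = r / n
    half_width = z_value(confidence) * math.sqrt(error_s * (1 - error_s) / n)
    return error_s - half_width, error_s + half_width


for level in (0.90, 0.95, 0.99):
    low, high = confidence_interval(r=12, n=40, confidence=level)
    print(f"{level:.0%}: z = {z_value(level):.3f}, interval = [{low:.3f}, {high:.3f}]")
```

A higher confidence level gives a larger $z_N$ and therefore a wider interval around the same sample error.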

BASICS OF SAMPLING THEORY
• A random variable can be viewed as the name of an experiment with a probabilistic outcome. Its value is the outcome of the experiment.
• A probability distribution for a random variable Y specifies the probability $\Pr(Y = y_i)$ that Y will take on the value $y_i$, for each possible value $y_i$.
• The expected value, or mean, of a random variable Y is $E[Y] = \sum_i y_i \Pr(Y = y_i)$.
• The variance of a random variable is $\mathrm{Var}(Y) = E[(Y - E[Y])^2]$. The variance characterizes the width or dispersion of the distribution about its mean.
• The standard deviation of Y is $\sqrt{\mathrm{Var}(Y)}$.
• The Binomial distribution gives the probability of observing r heads in a series of n independent coin tosses, if the probability of heads in a single toss is p.
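
The definitions of expected value, variance and standard deviation translate directly into code for a discrete random variable. The sketch below represents a distribution as a dictionary from value to probability; the fair six-sided die is an assumed example.

```python
import math
from typing import Dict


def expected_value(dist: Dict[float, float]) -> float:
    """E[Y] = sum over i of y_i * Pr(Y = y_i)."""
    return sum(y * p for y, p in dist.items())


def variance(dist: Dict[float, float]) -> float:
    """Var(Y) = E[(Y - E[Y])^2]."""
    mean = expected_value(dist)
    return sum(p * (y - mean) ** 2 for y, p in dist.items())


def std_dev(dist: Dict[float, float]) -> float:
    """Standard deviation: the square root of the variance."""
    return math.sqrt(variance(dist))


# Assumed example: a fair six-sided die
die = {face: 1 / 6 for face in range(1, 7)}
print(expected_value(die), variance(die), std_dev(die))
```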

Conti..
• The Normal distribution is a bell-shaped probability distribution that covers many natural phenomena.
• The Central Limit Theorem is a theorem stating that the sum of a large number of independent, identically distributed random variables approximately follows a Normal distribution.
• An estimator is a random variable Y used to estimate some parameter p of an underlying population.
• The estimation bias of Y as an estimator for p is the quantity $(E[Y] - p)$. An unbiased estimator is one for which the bias is zero.
• An N% confidence interval estimate for parameter p is an interval that includes p with probability N%.

Error Estimation and Estimating Binomial Proportions
• We first collect a random sample S of n independently drawn instances from the distribution D, and then measure the sample error $\mathrm{error}_S(h)$.
• If we were to repeat this experiment many times, each time drawing a different random sample $S_i$ of size n, we would expect to observe different values for the various $\mathrm{error}_{S_i}(h)$, depending on random differences in the makeup of the various $S_i$.
• We say in such cases that $\mathrm{error}_{S_i}(h)$, the outcome of the $i$-th such experiment, is a random variable.
• Imagine that we were to run k such random experiments, measuring the random variables $\mathrm{error}_{S_1}(h), \mathrm{error}_{S_2}(h), \mathrm{error}_{S_3}(h), \ldots, \mathrm{error}_{S_k}(h)$.
• As k grows, the observed values follow a particular probability distribution called the Binomial distribution.
• A Binomial distribution gives the probability of observing r heads in a sample of n independent coin tosses, when the probability of heads on a single coin toss is p. It is defined by the probability function:
$$P(r) = \frac{n!}{r!\,(n-r)!}\, p^r (1-p)^{n-r}$$
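
The Binomial probability function can be evaluated directly with `math.comb`. The sketch below tabulates P(r) for assumed illustrative parameters n = 40 and p = 0.3 and reports the most likely number of errors.

```python
import math


def binomial_pmf(r: int, n: int, p: float) -> float:
    """P(r): probability of exactly r heads (errors) in n independent trials."""
    return math.comb(n, r) * p ** r * (1 - p) ** (n - r)


n, p = 40, 0.3  # assumed sample size and true error rate
probabilities = [binomial_pmf(r, n, p) for r in range(n + 1)]

# The probabilities over all possible r sum to 1 (up to floating-point rounding)
print(sum(probabilities))

# The value of r with the highest probability
most_likely = max(range(n + 1), key=lambda r: probabilities[r])
print(most_likely, most_likely / n)
```

The most likely error count corresponds to a sample error equal to the assumed true error rate, which matches the claim that $\mathrm{error}_S(h)$ is the most probable value of $\mathrm{error}_D(h)$.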

Cont…
• If the random variable X follows a Binomial distribution, then:
• The probability $\Pr(X = r)$ that X will take on the value r is given by P(r).
• The expected, or mean, value of X is $E[X] = np$.
• The variance of X is $\mathrm{Var}(X) = np(1-p)$.
• The standard deviation of X is $\sigma_X = \sqrt{np(1-p)}$.
• For sufficiently large values of n the Binomial distribution is closely approximated by a Normal distribution with the same mean and variance.
• Most statisticians recommend using the Normal approximation only when $np(1-p) \geq 5$.
Fig 4.2: A Binomial distribution for k random experiments
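
To see how close the Normal approximation is, the sketch below compares an exact Binomial probability with the corresponding Normal probability for assumed parameters n = 40 and p = 0.3. The half-unit continuity correction is a common refinement added here for the comparison; it is not taken from the slides.

```python
import math
from statistics import NormalDist


def binomial_pmf(r: int, n: int, p: float) -> float:
    """Probability of exactly r successes in n independent trials."""
    return math.comb(n, r) * p ** r * (1 - p) ** (n - r)


def binomial_range_probability(low: int, high: int, n: int, p: float) -> float:
    """Exact Pr(low <= X <= high) for X ~ Binomial(n, p)."""
    return sum(binomial_pmf(r, n, p) for r in range(low, high + 1))


def normal_range_probability(low: int, high: int, n: int, p: float) -> float:
    """Normal approximation with mean np and variance np(1 - p), with continuity correction."""
    approx = NormalDist(mu=n * p, sigma=math.sqrt(n * p * (1 - p)))
    return approx.cdf(high + 0.5) - approx.cdf(low - 0.5)


n, p = 40, 0.3
rule_ok = n * p * (1 - p) >= 5  # rule of thumb from the text
exact = binomial_range_probability(9, 15, n, p)
approximate = normal_range_probability(9, 15, n, p)
print(rule_ok, round(exact, 4), round(approximate, 4))
```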

The Binomial Distribution
• The binomial distribution is the discrete probability distribution of the number of successes in an experiment whose trials each give only two possible results, either Success or Failure.
• Its mean is calculated by multiplying the number of independent trials by the probability of success (np).
• This distribution is also called a binomial probability distribution.
• There are two parameters, n and p, used in a binomial distribution.
• The variable 'n' states the number of times the experiment runs and the variable 'p' tells the probability of success on any one run.
• In the error-estimation setting, the Binomial distribution depends on the specific sample size n and the specific probability p, which is $\mathrm{error}_D(h)$.

Conti…
• The general setting to which the Binomial distribution applies is: a series of n independent trials of the underlying experiment is performed (e.g., n independent coin tosses), producing the sequence of independent, identically distributed random variables $Y_1, Y_2, Y_3, \ldots, Y_n$. Let R denote the number of trials for which $Y_i = 1$ in this series of n experiments:
$$R = \sum_{i=1}^{n} Y_i$$
• The probability that the random variable R will take on a specific value r (e.g., the probability of observing exactly r heads) is given by the Binomial distribution
$$\Pr(R = r) = \frac{n!}{r!\,(n-r)!}\, p^r (1-p)^{n-r}$$
where p is the probability that $Y_i = 1$ on any single trial.
• The Binomial distribution characterizes the probability of observing r heads from n coin flip experiments.
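
The setting above can be checked empirically with a short simulation: each trial $Y_i$ is 1 with probability p, R is their sum, and the observed frequencies of R are compared with the Binomial formula. The parameters and the fixed random seed are assumptions chosen for this sketch.

```python
import math
import random
from typing import List


def binomial_pmf(r: int, n: int, p: float) -> float:
    """Pr(R = r) under the Binomial distribution."""
    return math.comb(n, r) * p ** r * (1 - p) ** (n - r)


def simulate_frequencies(n: int, p: float, experiments: int, seed: int = 0) -> List[float]:
    """Repeat the n-trial experiment many times and return the observed frequency of each R."""
    rng = random.Random(seed)
    counts = [0] * (n + 1)
    for _ in range(experiments):
        # R is the number of trials in which Y_i = 1
        r = sum(1 for _trial in range(n) if rng.random() < p)
        counts[r] += 1
    return [count / experiments for count in counts]


n, p = 10, 0.3
empirical = simulate_frequencies(n, p, experiments=20000)
largest_gap = max(abs(empirical[r] - binomial_pmf(r, n, p)) for r in range(n + 1))
print(f"largest gap between simulation and formula: {largest_gap:.4f}")
```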

Mean and Variance
• The "mean" is the average value of a dataset. It is calculated by adding up all the values in the dataset and dividing by the number of observations.
• The expected value is the average of the values taken on by repeatedly sampling the random variable.
• Mean: Consider a random variable Y that takes on the possible values $y_1, y_2, \ldots, y_n$. The expected value of Y, E[Y], is
$$E[Y] = \sum_{i=1}^{n} y_i \Pr(Y = y_i)$$
• For example, if Y takes on the value 1 with probability 0.7 and the value 2 with probability 0.3, then its expected value is $1 \cdot 0.7 + 2 \cdot 0.3 = 1.3$.

Conti.
In case the random variable Y is governed by a Binomial distribution, then it can be shown that $E[Y] = np$, where n and p are the parameters of the Binomial distribution.
Variance: in the model-fitting sense, variance refers to the changes in the model when using different portions of the training dataset. Simply stated, it is the variability in the model prediction: how much the ML function can adjust depending on the given dataset.
For a random variable, the variance captures the "width" or "spread" of the probability distribution; that is, it captures how far the random variable is expected to vary from its mean value.
The variance of a random variable Y is $\mathrm{Var}[Y] \equiv E[(Y - E[Y])^2]$.
The variance describes the expected squared error in using a single observation of Y to estimate its mean E[Y]. The square root of the variance is called the standard deviation of Y, denoted $\sigma_Y$.
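
As a worked example of the variance definition, take the earlier random variable Y that equals 1 with probability 0.7 and 2 with probability 0.3, so that E[Y] = 1.3:

```latex
\begin{align*}
\mathrm{Var}[Y] &= E\big[(Y - E[Y])^2\big] \\
                &= 0.7\,(1 - 1.3)^2 + 0.3\,(2 - 1.3)^2 \\
                &= 0.7 \cdot 0.09 + 0.3 \cdot 0.49 \\
                &= 0.063 + 0.147 = 0.21, \\
\sigma_Y        &= \sqrt{0.21} \approx 0.458 .
\end{align*}
```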

Estimators, Bias and Variance
$\mathrm{error}_S(h) = \frac{r}{n}$: the estimator
$\mathrm{error}_D(h) = p$: the true value of the parameter
where n is the number of instances in the sample S, r is the number of instances from S misclassified by h, and p is the probability of misclassifying a single instance drawn from D.
Statisticians call $\mathrm{error}_S(h)$ an estimator for the true error $\mathrm{error}_D(h)$.
So we define estimation bias to be the difference between the expected value of the estimator and the true value of the parameter.
Definition: The estimation bias of an estimator Y for an arbitrary parameter p is $E[Y] - p$.
If the estimation bias is zero, we say that Y is an unbiased estimator for p.
The estimation bias is a numerical quantity, whereas the inductive bias is a set of assertions.
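
That r/n is an unbiased estimator of p can be checked numerically: when r follows a Binomial(n, p) distribution, the expected value of r/n works out to p. The sketch below computes this expectation exactly from the probability function for assumed values n = 25 and p = 0.2. It relies on the sample being drawn independently of h, as the text requires.

```python
import math


def binomial_pmf(r: int, n: int, p: float) -> float:
    """Probability of exactly r misclassifications in n independent draws."""
    return math.comb(n, r) * p ** r * (1 - p) ** (n - r)


def expected_sample_error(n: int, p: float) -> float:
    """E[r / n], summing each possible sample error weighted by its probability."""
    return sum((r / n) * binomial_pmf(r, n, p) for r in range(n + 1))


n, p = 25, 0.2  # assumed sample size and true error
bias = expected_sample_error(n, p) - p
print(f"estimation bias = {bias:.2e}")  # zero up to floating-point rounding
```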

Confidence Intervals
• Confidence intervals are a statistical concept used to estimate the uncertainty around a measurement.
• Typically, a confidence interval provides a range of values (lower and upper bound) that is believed to contain the true value of an unknown population parameter.
• Definition: An N% confidence interval for some parameter p is an interval that is expected with probability N% to contain p.
• To derive a 95% confidence interval, we need only find the interval centered around the mean value $\mathrm{error}_D(h)$ which is wide enough to contain 95% of the total probability under the distribution governing $\mathrm{error}_S(h)$.
• This provides an interval surrounding $\mathrm{error}_D(h)$ into which $\mathrm{error}_S(h)$ must fall 95% of the time.
• Equivalently, it provides the size of the interval surrounding $\mathrm{error}_S(h)$ into which $\mathrm{error}_D(h)$ must fall 95% of the time.
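
The meaning of "95% of the time" can be illustrated by simulation: draw many samples, build the approximate 95% interval around each sample error, and count how often the interval contains the true error. The true error of 0.3, the sample size, and the seed below are assumptions for this sketch, and the observed coverage is only approximately 95%.

```python
import math
import random


def interval_covers(p_true: float, n: int, rng: random.Random, z: float = 1.96) -> bool:
    """Draw one sample of size n and report whether its interval contains p_true."""
    r = sum(1 for _ in range(n) if rng.random() < p_true)  # number of misclassifications
    error_s = r / n
    half_width = z * math.sqrt(error_s * (1 - error_s) / n)
    return error_s - half_width <= p_true <= error_s + half_width


def coverage(p_true: float, n: int, experiments: int, seed: int = 0) -> float:
    """Fraction of repeated experiments whose interval contains the true error."""
    rng = random.Random(seed)
    hits = sum(interval_covers(p_true, n, rng) for _ in range(experiments))
    return hits / experiments


rate = coverage(p_true=0.3, n=100, experiments=5000)
print(f"intervals containing the true error: {rate:.1%}")
```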

A General Approach For Deriving Confidence Intervals
So far, confidence interval estimates were derived for one particular case: estimating $\mathrm{error}_D(h)$ for a discrete-valued hypothesis h, based on a sample of n independently drawn instances.
This is an instance of the more general problem of estimating the mean (expected value) of a population based on the mean of a randomly drawn sample of size n.
The general process includes the following steps:
1. Identify the underlying population parameter p to be estimated, for example, $\mathrm{error}_D(h)$.
2. Define the estimator Y (e.g., $\mathrm{error}_S(h)$). It is desirable to choose a minimum-variance, unbiased estimator.
3. Determine the probability distribution $D_Y$ that governs the estimator Y, including its mean and variance.
4. Determine the N% confidence interval by finding thresholds L and U such that N% of the mass in the probability distribution $D_Y$ falls between L and U.
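
Step 4 can be sketched for a discrete distribution by trimming an equal amount of probability from each tail. The example below assumes, purely for illustration, that the distribution of the error count is Binomial with known n = 40 and p = 0.3; the helper `central_thresholds` is a hypothetical name, and the equal-tails rule is one reasonable choice rather than the only one.

```python
import math
from itertools import accumulate
from typing import List, Tuple


def binomial_pmf(r: int, n: int, p: float) -> float:
    """Probability of exactly r errors in n independent trials."""
    return math.comb(n, r) * p ** r * (1 - p) ** (n - r)


def central_thresholds(pmf: List[float], mass: float) -> Tuple[int, int]:
    """Return (L, U) so that at least `mass` of the probability lies in L..U."""
    tail = (1 - mass) / 2  # probability allowed in each tail
    cdf = list(accumulate(pmf))
    # L: first value at which the cumulative probability exceeds the lower allowance
    lower = next(value for value, c in enumerate(cdf) if c > tail)
    # U: first value that leaves at most `tail` probability above it
    upper = next(value for value, c in enumerate(cdf) if c >= 1 - tail)
    return lower, upper


# Steps 1-3 (assumed values): parameter p, estimator Y = r / n, Binomial distribution for r
n, p = 40, 0.3
pmf = [binomial_pmf(r, n, p) for r in range(n + 1)]

# Step 4: thresholds on the error count r, rescaled to error rates
L, U = central_thresholds(pmf, mass=0.95)
covered = sum(pmf[L:U + 1])
print(L / n, U / n, round(covered, 4))
```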

Cont…
Central Limit Theorem
• One fact that simplifies attempts to derive confidence intervals is the Central Limit Theorem.
• Theorem 5.1: Consider a set of independent, identically distributed random variables $Y_1, Y_2, \ldots, Y_n$ governed by an arbitrary probability distribution with mean $\mu$ and finite variance $\sigma^2$.
• Define the sample mean
$$\bar{Y}_n = \frac{1}{n} \sum_{i=1}^{n} Y_i$$
• Then as $n \to \infty$, the distribution governing
$$\frac{\bar{Y}_n - \mu}{\sigma / \sqrt{n}}$$
approaches a Normal distribution, with zero mean and standard deviation equal to 1.
• The Central Limit Theorem describes how the mean and variance of $\bar{Y}_n$ can be used to determine the mean and variance of the individual $Y_i$.
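
The theorem can be observed in a small simulation. The sketch below draws samples from an Exponential distribution with rate 1 (chosen here because it is clearly not Normal; its mean and standard deviation are both 1), standardizes each sample mean as in the theorem, and checks that the results behave like a standard Normal variable. The sample size, number of repetitions and seed are assumptions.

```python
import math
import random
import statistics
from typing import List


def standardized_sample_means(n: int, experiments: int, seed: int = 0) -> List[float]:
    """Standardize the mean of n Exponential(1) draws, repeated `experiments` times."""
    rng = random.Random(seed)
    mu, sigma = 1.0, 1.0  # mean and standard deviation of Exponential(rate=1)
    results = []
    for _ in range(experiments):
        sample_mean = sum(rng.expovariate(1.0) for _ in range(n)) / n
        results.append((sample_mean - mu) / (sigma / math.sqrt(n)))
    return results


z = standardized_sample_means(n=200, experiments=5000)
within = sum(1 for value in z if abs(value) <= 1.96) / len(z)
print(f"mean = {statistics.mean(z):.3f}, std = {statistics.pstdev(z):.3f}, "
      f"share within +/-1.96 = {within:.3f}")
```

The mean is close to 0, the standard deviation close to 1, and roughly 95% of the standardized means fall within ±1.96, as expected for a standard Normal distribution.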

Difference in Error of Two Hypotheses
• Consider the case where we have two hypotheses $h_1$ and $h_2$ for some discrete-valued target function.
• Hypothesis $h_1$ has been tested on a sample $S_1$ containing $n_1$ randomly drawn examples, and $h_2$ has been tested on an independent sample $S_2$ containing $n_2$ examples drawn from the same distribution. Suppose we wish to estimate the difference d between the true errors of these two hypotheses:
$$d = \mathrm{error}_D(h_1) - \mathrm{error}_D(h_2)$$
The obvious choice for an estimator in this case is the difference between the sample errors, which we denote by $\hat{d}$:
$$\hat{d} \equiv \mathrm{error}_{S_1}(h_1) - \mathrm{error}_{S_2}(h_2)$$
It can be shown that $\hat{d}$ gives an unbiased estimate of d; that is, $E[\hat{d}] = d$.

Difference in Error of Two Hypotheses
• It can also be shown that the variance of the distribution of $\hat{d}$ is the sum of the variances of $\mathrm{error}_{S_1}(h_1)$ and $\mathrm{error}_{S_2}(h_2)$.
• Using the approximate variance of each of these distributions, we obtain
$$\sigma_{\hat{d}}^2 \approx \frac{\mathrm{error}_{S_1}(h_1)\,(1 - \mathrm{error}_{S_1}(h_1))}{n_1} + \frac{\mathrm{error}_{S_2}(h_2)\,(1 - \mathrm{error}_{S_2}(h_2))}{n_2}$$
• For a random variable $\hat{d}$ obeying a Normal distribution with mean d and variance $\sigma^2$, the N% confidence interval estimate for d is $\hat{d} \pm z_N \sigma$. Using the approximate variance $\sigma_{\hat{d}}^2$ given above, this approximate N% confidence interval estimate for d is
$$\hat{d} \pm z_N \sqrt{\frac{\mathrm{error}_{S_1}(h_1)\,(1 - \mathrm{error}_{S_1}(h_1))}{n_1} + \frac{\mathrm{error}_{S_2}(h_2)\,(1 - \mathrm{error}_{S_2}(h_2))}{n_2}}$$
• where $z_N$ is the constant chosen for the desired confidence level N%.
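
The interval for the difference d can be sketched as follows. The sample errors (0.30 and 0.20) and sample sizes (100 each) are made-up numbers for illustration, and `difference_interval` is a hypothetical helper name.

```python
import math
from statistics import NormalDist
from typing import Tuple


def difference_interval(
    error_1: float, n_1: int, error_2: float, n_2: int, confidence: float = 0.95
) -> Tuple[float, float]:
    """Approximate confidence interval for d = error_D(h1) - error_D(h2)."""
    d_hat = error_1 - error_2  # difference between the two sample errors
    variance = error_1 * (1 - error_1) / n_1 + error_2 * (1 - error_2) / n_2
    z_n = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z_n * math.sqrt(variance)
    return d_hat - half_width, d_hat + half_width


low, high = difference_interval(error_1=0.30, n_1=100, error_2=0.20, n_2=100)
print(f"d_hat = 0.10, 95% interval = [{low:.4f}, {high:.4f}]")
```

In this example the interval contains zero, so at the 95% level the samples do not establish that the two hypotheses have different true errors.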

Cont…
• In some situations $h_1$ and $h_2$ are tested on a single sample S (where S is still independent of $h_1$ and $h_2$). In this latter case, we redefine $\hat{d}$ as
$$\hat{d} = \mathrm{error}_S(h_1) - \mathrm{error}_S(h_2)$$
• The variance in this new $\hat{d}$ will usually be smaller than the variance given by the equation for independent samples, when we set $S_1$ and $S_2$ to S.
• This is because using a single sample S eliminates the variance due to random differences in the compositions of $S_1$ and $S_2$.
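
The variance reduction can be illustrated with a simulation under stated assumptions: instances are uniform on [0, 1), the target and both hypotheses are simple thresholds (all hypothetical), and the two hypotheses err on overlapping regions. The effect shown here depends on that overlap; hypotheses whose errors fall on unrelated instances may not benefit in the same way.

```python
import random
import statistics
from typing import Callable, List, Tuple


def target(x: float) -> bool:
    """Hypothetical target function f."""
    return x > 0.5


def h1(x: float) -> bool:
    """Hypothetical hypothesis: wrong for 0.4 < x <= 0.5."""
    return x > 0.4


def h2(x: float) -> bool:
    """Hypothetical hypothesis: wrong for 0.35 < x <= 0.5."""
    return x > 0.35


def sample_error(h: Callable[[float], bool], sample: List[float]) -> float:
    """Fraction of the sample that h misclassifies."""
    return sum(1 for x in sample if h(x) != target(x)) / len(sample)


def compare_variances(n: int, experiments: int, seed: int = 0) -> Tuple[float, float]:
    """Variance of d_hat with one shared sample versus two independent samples."""
    rng = random.Random(seed)
    shared, separate = [], []
    for _ in range(experiments):
        s = [rng.random() for _ in range(n)]
        s1 = [rng.random() for _ in range(n)]
        s2 = [rng.random() for _ in range(n)]
        shared.append(sample_error(h1, s) - sample_error(h2, s))
        separate.append(sample_error(h1, s1) - sample_error(h2, s2))
    return statistics.pvariance(shared), statistics.pvariance(separate)


var_shared, var_separate = compare_variances(n=100, experiments=2000)
print(f"shared sample: {var_shared:.5f}, independent samples: {var_separate:.5f}")
```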