Evaluation Metrics for Classification and Regression



Course Outcomes
After completion of this course, students will be able to:
Understand machine-learning concepts.
Understand and implement classification concepts.
Understand and analyse the different regression algorithms.
Apply the concept of unsupervised learning.
Apply the concepts of artificial neural networks.

Topics - Supervised Learning
Classification Techniques:
Naive Bayes Classification
Fitting Multivariate Bernoulli Distribution
Gaussian Distribution and Multinomial Distribution
K-Nearest Neighbours
Decision Tree
Random Forest
Ensemble Learning
Support Vector Machines
Evaluation metrics for Classification Techniques:
Confusion Matrix, Accuracy, Precision, Recall, F1 Score, Threshold, AUC-ROC
Regression Techniques:
Basic concepts and applications of Regression
Simple Linear Regression - Gradient Descent and Normal Equation Method
Multiple Linear Regression
Non-Linear Regression
Linear Regression with Regularization
Overfitting and Underfitting
Hyperparameter tuning
Evaluation Measures for Regression Techniques: MSE, RMSE, MAE, R2

Evaluation metrics for Classification Techniques
Confusion Matrix, Accuracy, Precision, Recall, F1 Score, Threshold, AUC-ROC

Confusion Matrix
True Positive (TP): the model predicts 1 and the actual class is 1
True Negative (TN): the model predicts 0 and the actual class is 0
False Positive (FP): the model predicts 1 but the actual class is 0
False Negative (FN): the model predicts 0 but the actual class is 1
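As a minimal sketch of how these four counts can be computed in practice (assuming scikit-learn is available and labels are encoded as 0/1; the arrays below are hypothetical):

```python
from sklearn.metrics import confusion_matrix

# Hypothetical ground-truth and predicted labels (1 = positive class)
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

# For binary 0/1 labels, confusion_matrix returns [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tp, tn, fp, fn)  # -> 3 3 1 1
```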

Confusion Matrix

Example:
The cases in which the patients actually have heart disease and our model also predicted them as having it are called the True Positives. For our matrix, True Positives = 43.
The cases in which the patients actually did not have heart disease and our model also predicted them as not having it are called the True Negatives. For our matrix, True Negatives = 33.

Example:
However, there are some cases where the patient actually has no heart disease, but our model has predicted that they do. This kind of error is the Type I Error, and we call these values False Positives. For our matrix, False Positives = 8.
Similarly, there are some cases where the patient actually has heart disease, but our model has predicted that he/she doesn't. This kind of error is a Type II Error, and we call these values False Negatives. For our matrix, False Negatives = 7.
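Putting the four counts together, the confusion matrix used in this example (rows = actual class, columns = predicted class) is:

                         Predicted: disease    Predicted: no disease
Actual: disease          TP = 43               FN = 7
Actual: no disease       FP = 8                TN = 33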

Accuracy
Accuracy is the ratio of the total number of correct predictions to the total number of predictions.
Can you guess what the formula for Accuracy will be?
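The formula follows directly from the definition: correct predictions (TP + TN) divided by all predictions. Plugging in the counts from our matrix:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} = \frac{43 + 33}{43 + 33 + 8 + 7} = \frac{76}{91} \approx 0.835$$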

Accuracy
For our model, Accuracy will be = 0.835.
There might be other situations where our accuracy is very high, but our precision or recall is low.

Precision
Precision defines, of all the predictions y = 1, which ones are correct.
In the simplest terms, Precision is the ratio between the True Positives and all the predicted Positives. For our problem statement, that would be the measure of patients that we correctly identify as having heart disease out of all the patients the model predicts as having it. Mathematically:
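Written out, with the values from our confusion matrix substituted:

$$\text{Precision} = \frac{TP}{TP + FP} = \frac{43}{43 + 8} = \frac{43}{51} \approx 0.843$$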

Precision
What is the Precision for our model?
Yes, it is 0.843, or when it predicts that a patient has heart disease, it is correct around 84% of the time.
Precision also gives us a measure of the relevant data points.

Recall / Sensitivity / True Positive Rate
Recall defines, of all the actual y = 1, which ones the model predicted correctly.
Recall is the measure of our model correctly identifying True Positives. Thus, for all the patients who actually have heart disease, recall tells us how many we correctly identified as having heart disease. Mathematically:
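Again with the values from our confusion matrix substituted:

$$\text{Recall} = \frac{TP}{TP + FN} = \frac{43}{43 + 7} = \frac{43}{50} = 0.86$$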

Recall / Sensitivity / True Positive Rate
For our model, Recall = 0.86.
Recall also gives a measure of how accurately our model is able to identify the relevant data.
We refer to it as Sensitivity or True Positive Rate.

F1-Score
F1-Score is a measure combining both precision and recall.
It is generally described as the harmonic mean of the two. The harmonic mean is just another way to calculate an "average" of values, generally described as more suitable for ratios (such as precision and recall) than the traditional arithmetic mean.
The formula used for F1-Score in this case is:
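$$F_1 = 2 \cdot \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$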

F1-Score For our model, F1-Score = ??

Example to understand precision and recall
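As a minimal sketch of the whole calculation, using the four counts quoted on the earlier slides (plain Python, no extra libraries assumed):

```python
# Counts taken from the heart-disease confusion matrix used in these slides
tp, tn, fp, fn = 43, 33, 8, 7

accuracy  = (tp + tn) / (tp + tn + fp + fn)   # correct predictions / all predictions
precision = tp / (tp + fp)                    # of all predicted positives, how many are right
recall    = tp / (tp + fn)                    # of all actual positives, how many are found
f1        = 2 * precision * recall / (precision + recall)

print(round(accuracy, 3), round(precision, 3), round(recall, 3), round(f1, 3))
# -> 0.835 0.843 0.86 0.851
```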

Threshold / The Tradeoff
Precision and Recall are two important measures in evaluating the performance of a machine learning model, especially in classification problems.
Precision focuses on how many of the items selected by the model are relevant.
Recall focuses on how many of the relevant items are selected by the model.
By changing the threshold value for the classifier confidence, one can adjust the precision and recall of the model.

Threshold / The Tradeoff
Want to increase the Precision value: try to get fewer False Positives (FP). But the number of False Negatives (FN) will increase, so we get a lower value for Recall.
Want to increase the Recall value: try to get fewer False Negatives (FN). But the number of False Positives (FP) will increase, so we get a lower value for Precision.

Goal: mark a video as safe for kids (no violent or adult content).
Want to increase the Precision value: videos that are actually safe for kids should be correctly identified as safe.
Try to get fewer False Positives (FP): videos that are not safe for kids but are incorrectly identified as safe by the model.
But the number of False Negatives (FN) will increase: because we are very cautious about marking a video as safe, some safe videos end up wrongly blocked or marked as unsafe.
So we get a lower value for Recall.

Goal: detect disease in patients, making sure that patients who actually have the disease are not missed.
Want to increase the Recall value.
Try to get fewer False Negatives (FN): cases where a patient actually has the disease but the model predicts that they do not.
But the number of False Positives (FP) will increase: to avoid missing anyone, the model flags even small indications as disease, so some healthy patients get marked as having the disease.
So we get a lower value for Precision.

YouTube's restricted mode as an example to explain high precision
Goal: Ensure that videos marked as safe for kids are indeed safe (no violent or adult content).
True Positives (TP): Videos that are actually safe for kids and are correctly identified as safe by the model.
True Negatives (TN): Videos that are not safe for kids and are correctly identified as not safe by the model.
False Positives (FP): Videos that are not safe for kids but are incorrectly identified as safe by the model.
False Negatives (FN): Videos that are safe for kids but are incorrectly identified as not safe by the model.

Example (Contd.)
Precision is the ratio of true positives to the sum of true positives and false positives:
High precision means that when the model identifies a positive case, it is very likely correct.
True Positives (TP): Videos that are actually safe for kids and are correctly identified as safe by the model.
False Positives (FP): Videos that are not safe for kids but are incorrectly identified as safe by the model.
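Written as a formula, the ratio referenced above is:

$$\text{Precision} = \frac{TP}{TP + FP}$$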

Example (Contd.)
To achieve high precision, YouTube's restricted mode model would:
Only mark videos as safe if it is very confident they are appropriate for kids.
This is achieved by being very cautious, which might result in some safe videos being wrongly blocked (more false negatives), but it ensures that almost no inappropriate videos are marked as safe.

Example (Contd.)
If there are more false negatives, the value of Recall will be low.
So, if we want to achieve high Precision, the value of Recall goes down.

YouTube's restricted mode as an example to explain high recall
Goal: Ensure that all inappropriate videos (violent or adult content) are correctly identified and blocked from kids.
True Positives (TP): Videos that are actually not safe for kids and are correctly identified as not safe by the model.
True Negatives (TN): Videos that are safe for kids and are correctly identified as safe by the model.
False Positives (FP): Videos that are safe for kids but are incorrectly identified as not safe by the model.
False Negatives (FN): Videos that are not safe for kids but are incorrectly identified as safe by the model.

YouTube's restricted mode as an example to explain high recall
Goal: Ensure that all inappropriate videos (violent or adult content) are correctly identified and blocked from kids.
High Recall Focus:
High Recall means that the model successfully identifies most, if not all, inappropriate videos. This reduces the number of false negatives (FN).

Example (Contd.)
To achieve high recall, YouTube's restricted mode model would:
Be very inclusive and strict about blocking videos.
Aim to catch every single inappropriate video, even if it means sometimes blocking safe videos.
Result:
Few false negatives (FN): Most inappropriate videos will be correctly identified and blocked.
More false positives (FP): Some safe videos might be wrongly blocked, but this is a tradeoff to ensure high recall.

Example (Contd.)
If there are more false positives, the value of Precision will be low.
So, if we want to achieve high Recall, the value of Precision goes down.

Precision / Recall tradeoff
Unfortunately, you can't have both precision and recall high. If you increase precision, it will reduce recall, and vice versa. This is called the precision/recall tradeoff.
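As a minimal sketch of this tradeoff (assuming scikit-learn; the synthetic dataset and logistic-regression classifier below are placeholders for any model that outputs a confidence score):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score

# Synthetic binary-classification data and a simple probabilistic classifier
X, y = make_classification(n_samples=1000, random_state=0)
clf = LogisticRegression().fit(X, y)
proba = clf.predict_proba(X)[:, 1]  # confidence for the positive class

# Sweep the decision threshold and watch precision and recall move in opposite directions
for threshold in (0.3, 0.5, 0.7):
    y_pred = (proba >= threshold).astype(int)
    print(f"threshold={threshold:.1f}  "
          f"precision={precision_score(y, y_pred):.2f}  "
          f"recall={recall_score(y, y_pred):.2f}")
```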

AUC-ROC
An ROC curve (receiver operating characteristic curve) is a graph showing the performance of a classification model at all classification thresholds. This curve plots two parameters:
True Positive Rate
False Positive Rate
True Positive Rate (TPR) is a synonym for recall and is therefore defined as follows:
False Positive Rate (FPR) is defined as follows:
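In formula form (with the confusion-matrix counts defined earlier):

$$\text{TPR} = \frac{TP}{TP + FN} \qquad\qquad \text{FPR} = \frac{FP}{FP + TN}$$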

AUC-ROC
An ROC curve plots TPR vs. FPR at different classification thresholds. Lowering the classification threshold classifies more items as positive, thus increasing both False Positives and True Positives. The following figure shows a typical ROC curve.

AUC-ROC
AUC stands for "Area Under the ROC Curve." That is, AUC measures the entire two-dimensional area underneath the entire ROC curve (think integral calculus) from (0,0) to (1,1).

AUC-ROC
AUC represents the probability that a random positive (green) example is positioned to the right of a random negative (red) example.
AUC ranges in value from 0 to 1. A model whose predictions are 100% wrong has an AUC of 0.0; one whose predictions are 100% correct has an AUC of 1.0.
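As a minimal sketch of computing the curve points and the area under it (assuming scikit-learn; the labels and scores below are hypothetical):

```python
from sklearn.metrics import roc_curve, roc_auc_score

# Hypothetical true labels and model scores (probabilities for the positive class)
y_true  = [0, 0, 1, 1, 0, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.55, 0.7]

fpr, tpr, thresholds = roc_curve(y_true, y_score)  # one (FPR, TPR) point per threshold
auc = roc_auc_score(y_true, y_score)               # area under the ROC curve, between 0 and 1
print(auc)
```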

Evaluation Measures for Regression Techniques
MSE, RMSE, MAE, R2

Evaluation Measures for Regression Techniques
Common evaluation metrics for regression include:
Mean Absolute Error (MAE)
Mean Squared Error (MSE)
Root Mean Squared Error (RMSE)
R-squared (Coefficient of Determination) (R2)

Why Do We Require Evaluation Metrics?
In simple words, regression can be defined as a machine learning problem where we have to predict continuous values like price, rating, fees, etc.
It is necessary to measure accuracy on the training data, but it is also important to get a genuine, approximate result on unseen data; otherwise the model is of no use.

Mean Absolute Error (MAE)
MAE is a very simple metric which calculates the absolute difference between actual and predicted values.
Each such difference is a mistake made by the model, known as an error.
So, sum all the absolute errors and divide them by the total number of observations: this is MAE.
We aim to get a minimum MAE, because this is a loss.
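In formula form, for n observations with actual values y_i and predicted values ŷ_i:

$$\text{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|\,y_i - \hat{y}_i\,\right|$$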

Mean Squared Error (MSE)
Mean Squared Error is the mean of the squared differences between the actual and predicted values.
It represents the squared distance between actual and predicted values.
We square the differences to avoid the cancellation of negative terms, and this is the benefit of MSE.
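In formula form:

$$\text{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$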

Root Mean Squared Error (RMSE)
As is clear from the name itself, RMSE is simply the square root of the Mean Squared Error.
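In formula form:

$$\text{RMSE} = \sqrt{\text{MSE}} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$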

R Squared (R2) (Coefficient of Determination)
MAE and MSE depend on the context (the scale of the target), as we have seen, whereas the R2 score is independent of context.
With the help of R squared we have a baseline model to compare against, which none of the other metrics provides. This is similar to the threshold of 0.5 that we use as a reference point in classification problems.
Hence, R2 is also known as the Coefficient of Determination, or sometimes as Goodness of Fit.
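In formula form, comparing the model's squared errors against those of the baseline that always predicts the mean ȳ:

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$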

R Squared (R2) (Coefficient of Determination)
The value of R-squared lies between 0 and 1.
We get R-squared equal to 1 when the model perfectly fits the data and there is no difference between the predicted values and the actual values.
However, we get R-squared equal to 0 when the model does not explain any variability in the data and does not learn any relationship between the dependent and independent variables.

R Squared (R2) (Coefficient of Determination)
So we can conclude that as our regression line moves towards perfection, the R2 score moves towards one, and the model performance improves.
The normal case is when the R2 score is between zero and one, like 0.8, which means the model is able to explain 80 percent of the variance in the data.
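As a minimal sketch of computing all four regression metrics (assuming scikit-learn and NumPy; the actual and predicted values below are hypothetical):

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical actual and predicted values from some regression model
y_true = np.array([3.0, 5.0, 2.5, 7.0, 4.5])
y_pred = np.array([2.8, 5.4, 2.0, 6.5, 5.0])

mae  = mean_absolute_error(y_true, y_pred)
mse  = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)              # RMSE is just the square root of MSE
r2   = r2_score(y_true, y_pred)  # 1.0 = perfect fit, 0.0 = no better than predicting the mean
print(mae, mse, rmse, r2)
```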