1_Standard error Experimental Data_ML.ppt

VGaneshKarthikeyan 12 views 20 slides Jun 29, 2024

About This Presentation

Standard Error Experimental Data


Slide Content

Evaluating Hypotheses
• Sample error, true error
• Confidence intervals for observed hypothesis error
• Estimators
• Binomial distribution, Normal distribution, Central Limit Theorem
• Paired t-tests
• Comparing Learning Methods

Problems Estimating Error
1. Bias: If S is the training set, $\mathrm{error}_S(h)$ is optimistically biased:
$$\mathrm{bias} \equiv E[\mathrm{error}_S(h)] - \mathrm{error}_D(h)$$
For an unbiased estimate, h and S must be chosen independently.
2. Variance: Even with an unbiased S, $\mathrm{error}_S(h)$ may still vary from $\mathrm{error}_D(h)$.

Two Definitions of Error
The true error of hypothesis h with respect to target function f and distribution D is the probability that h will misclassify an instance drawn at random according to D:
$$\mathrm{error}_D(h) \equiv \Pr_{x \in D}[f(x) \neq h(x)]$$
The sample error of h with respect to target function f and data sample S is the proportion of examples h misclassifies:
$$\mathrm{error}_S(h) \equiv \frac{1}{n} \sum_{x \in S} \delta(f(x) \neq h(x))$$
where $\delta(f(x) \neq h(x))$ is 1 if $f(x) \neq h(x)$, and 0 otherwise.
How well does $\mathrm{error}_S(h)$ estimate $\mathrm{error}_D(h)$?

Example
Hypothesis h misclassifies 12 of 40 examples in S:
$$\mathrm{error}_S(h) = \frac{12}{40} = .30$$
What is $\mathrm{error}_D(h)$?
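The sample error in this example can be computed directly from its definition. The sketch below is only illustrative: the function name `sample_error` and the toy label lists are assumptions made for the example, not part of the slides.

```python
def sample_error(targets, predictions):
    """Fraction of examples on which the hypothesis disagrees with the target."""
    if len(targets) != len(predictions):
        raise ValueError("targets and predictions must have the same length")
    if not targets:
        raise ValueError("the sample must be non-empty")
    mistakes = sum(1 for f_x, h_x in zip(targets, predictions) if f_x != h_x)
    return mistakes / len(targets)


# Hypothetical sample: 40 examples, 12 of which h misclassifies.
targets = [1] * 40
predictions = [0] * 12 + [1] * 28
print(sample_error(targets, predictions))  # 0.3
```

This only measures the error on S; it says nothing yet about how far the true error may lie from 0.30.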

Estimators
Experiment:
1. Choose sample S of size n according to distribution D
2. Measure $\mathrm{error}_S(h)$
$\mathrm{error}_S(h)$ is a random variable (i.e., result of an experiment)
$\mathrm{error}_S(h)$ is an unbiased estimator for $\mathrm{error}_D(h)$
Given observed $\mathrm{error}_S(h)$, what can we conclude about $\mathrm{error}_D(h)$?

Confidence Intervals
If
• S contains n examples, drawn independently of h and each other
• $n \geq 30$
Then
• With approximately N% probability, $\mathrm{error}_D(h)$ lies in interval
$$\mathrm{error}_S(h) \pm z_N \sqrt{\frac{\mathrm{error}_S(h)(1 - \mathrm{error}_S(h))}{n}}$$
where
N%:   50%   68%   80%   90%   95%   98%   99%
z_N:  0.67  1.00  1.28  1.64  1.96  2.33  2.58

Confidence Intervals
If
• S contains n examples, drawn independently of h and each other
• $n \geq 30$
Then
• With approximately 95% probability, $\mathrm{error}_D(h)$ lies in interval
$$\mathrm{error}_S(h) \pm 1.96 \sqrt{\frac{\mathrm{error}_S(h)(1 - \mathrm{error}_S(h))}{n}}$$

$\mathrm{error}_S(h)$ is a Random Variable
• Rerun experiment with different randomly drawn S (size n)
• Probability of observing r misclassified examples:
$$P(r) = \frac{n!}{r!(n-r)!} \, \mathrm{error}_D(h)^r \, (1 - \mathrm{error}_D(h))^{n-r}$$
[Figure: Binomial distribution for n = 40, p = 0.3; P(r) plotted against r]

Binomial Probability Distribution
[Figure: Binomial distribution for n = 40, p = 0.3; P(r) plotted against r]
$$P(r) = \frac{n!}{r!(n-r)!} \, p^r (1-p)^{n-r}$$
Probability P(r) of r heads in n coin flips, if p = Pr(heads)
• Expected, or mean value of X, E[X], is
$$E[X] \equiv \sum_{i=0}^{n} i P(i) = np$$
• Variance of X is
$$Var(X) \equiv E[(X - E[X])^2] = np(1-p)$$
• Standard deviation of X, $\sigma_X$, is
$$\sigma_X \equiv \sqrt{E[(X - E[X])^2]} = \sqrt{np(1-p)}$$
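The mean and variance formulas can be checked numerically against the probability mass function. This is a small sketch for the n = 40, p = 0.3 case shown in the figure; the helper name `binomial_pmf` is an assumption for the example.

```python
import math


def binomial_pmf(r, n, p):
    """P(r): probability of exactly r successes in n independent trials."""
    return math.comb(n, r) * p**r * (1 - p) ** (n - r)


n, p = 40, 0.3
probs = [binomial_pmf(r, n, p) for r in range(n + 1)]

# Mean and variance computed from the definition, not from the closed forms.
mean = sum(r * pr for r, pr in enumerate(probs))
variance = sum((r - mean) ** 2 * pr for r, pr in enumerate(probs))

print(round(sum(probs), 6), round(mean, 6), round(variance, 6))  # 1.0 12.0 8.4
```

The results agree with np = 12 and np(1 - p) = 8.4.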





Normal Probability Distribution
[Figure: Normal distribution with mean 0, standard deviation 1; x axis from -3 to 3]
$$p(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$$
The probability that X will fall into the interval (a, b) is given by
$$\int_a^b p(x)\,dx$$
• Expected, or mean value of X, E[X], is $E[X] = \mu$
• Variance of X is $Var(X) = \sigma^2$
• Standard deviation of X, $\sigma_X$, is $\sigma_X = \sigma$

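Normal Distribution Approximates Binomial
$\mathrm{error}_S(h)$ follows a Binomial distribution, with
• mean $\mu_{\mathrm{error}_S(h)} = \mathrm{error}_D(h)$
• standard deviation $\sigma_{\mathrm{error}_S(h)}$
$$\sigma_{\mathrm{error}_S(h)} = \sqrt{\frac{\mathrm{error}_D(h)(1 - \mathrm{error}_D(h))}{n}}$$
Approximate this by a Normal distribution with
• mean $\mu_{\mathrm{error}_S(h)} = \mathrm{error}_D(h)$
• standard deviation $\sigma_{\mathrm{error}_S(h)}$
$$\sigma_{\mathrm{error}_S(h)} \approx \sqrt{\frac{\mathrm{error}_S(h)(1 - \mathrm{error}_S(h))}{n}}$$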

Normal Probability Distribution
[Figure: Normal probability density; x axis from -3 to 3]
80% of area (probability) lies in $\mu \pm 1.28\sigma$
N% of area (probability) lies in $\mu \pm z_N \sigma$
N%:   50%   68%   80%   90%   95%   98%   99%
z_N:  0.67  1.00  1.28  1.64  1.96  2.33  2.58

Confidence Intervals, More Correctly
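If
• S contains n examples, drawn independently of h and each other
• $n \geq 30$
Then
• With approximately 95% probability, $\mathrm{error}_S(h)$ lies in interval
$$\mathrm{error}_D(h) \pm 1.96 \sqrt{\frac{\mathrm{error}_D(h)(1 - \mathrm{error}_D(h))}{n}}$$
• equivalently, $\mathrm{error}_D(h)$ lies in interval
$$\mathrm{error}_S(h) \pm 1.96 \sqrt{\frac{\mathrm{error}_D(h)(1 - \mathrm{error}_D(h))}{n}}$$
• which is approximately
$$\mathrm{error}_S(h) \pm 1.96 \sqrt{\frac{\mathrm{error}_S(h)(1 - \mathrm{error}_S(h))}{n}}$$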

Calculating Confidence Intervals
1. Pick parameter p to estimate
• $\mathrm{error}_D(h)$
2. Choose an estimator
• $\mathrm{error}_S(h)$
3. Determine probability distribution that governs estimator
• $\mathrm{error}_S(h)$ governed by Binomial distribution, approximated by Normal when $n \geq 30$
4. Find interval (L, U) such that N% of probability mass falls in the interval
• Use table of $z_N$ values

Central Limit Theorem
Consider a set of independent, identically distributed random variables $Y_1 \ldots Y_n$, all governed by an arbitrary probability distribution with mean $\mu$ and finite variance $\sigma^2$. Define the sample mean
$$\bar{Y} \equiv \frac{1}{n} \sum_{i=1}^{n} Y_i$$
Central Limit Theorem. As $n \to \infty$, the distribution governing $\bar{Y}$ approaches a Normal distribution, with mean $\mu$ and variance $\frac{\sigma^2}{n}$.
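A quick simulation can make the theorem concrete. The sketch below is an illustration under assumed settings, not part of the slides: each $Y_i$ is a 0/1 variable with success probability 0.3 (so $\mu = 0.3$ and $\sigma^2 = 0.21$), and the names `simulate_sample_means`, `trials` and `seed` are made up for the example.

```python
import random
import statistics


def simulate_sample_means(n, trials, p, seed=0):
    """Draw `trials` samples of n Bernoulli(p) variables; return each sample mean."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = []
    for _ in range(trials):
        ys = [1 if rng.random() < p else 0 for _ in range(n)]
        means.append(sum(ys) / n)
    return means


means = simulate_sample_means(n=40, trials=20000, p=0.3)
print(statistics.fmean(means))      # close to mu = 0.3
print(statistics.pvariance(means))  # close to sigma^2 / n = 0.21 / 40 = 0.00525
```

A histogram of `means` would show the bell shape; here only the mean and variance of the sample means are compared with $\mu$ and $\sigma^2 / n$.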

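Difference Between Hypotheses
Test $h_1$ on sample $S_1$, test $h_2$ on $S_2$
1. Pick parameter to estimate
$$d \equiv \mathrm{error}_D(h_1) - \mathrm{error}_D(h_2)$$
2. Choose an estimator
$$\hat{d} \equiv \mathrm{error}_{S_1}(h_1) - \mathrm{error}_{S_2}(h_2)$$
3. Determine probability distribution that governs estimator
$$\sigma_{\hat{d}} \approx \sqrt{\frac{\mathrm{error}_{S_1}(h_1)(1 - \mathrm{error}_{S_1}(h_1))}{n_1} + \frac{\mathrm{error}_{S_2}(h_2)(1 - \mathrm{error}_{S_2}(h_2))}{n_2}}$$
4. Find interval (L, U) such that N% of probability mass falls in the interval
$$\hat{d} \pm z_N \sqrt{\frac{\mathrm{error}_{S_1}(h_1)(1 - \mathrm{error}_{S_1}(h_1))}{n_1} + \frac{\mathrm{error}_{S_2}(h_2)(1 - \mathrm{error}_{S_2}(h_2))}{n_2}}$$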
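The four steps above can be sketched as a short function. The name `difference_confidence_interval` and the mistake counts in the usage lines are hypothetical, chosen only to show the arithmetic; `z` defaults to the 95% value 1.96.

```python
import math


def difference_confidence_interval(mistakes1, n1, mistakes2, n2, z=1.96):
    """Approximate interval for error_D(h1) - error_D(h2).

    h1 is tested on a sample of size n1 and h2 on an independent sample
    of size n2; the variances of the two sample errors add.
    """
    e1 = mistakes1 / n1
    e2 = mistakes2 / n2
    d_hat = e1 - e2
    sigma = math.sqrt(e1 * (1 - e1) / n1 + e2 * (1 - e2) / n2)
    return d_hat - z * sigma, d_hat + z * sigma


# Hypothetical counts: h1 makes 12 mistakes on 40 examples, h2 makes 10 on 50.
low, high = difference_confidence_interval(12, 40, 10, 50)
print(low, high)  # roughly -0.08 and 0.28
```

Because this interval contains 0, these made-up counts would not support the claim that one hypothesis has lower true error than the other.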











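Paired t test to Compare $h_A$, $h_B$
1. Partition data into k disjoint test sets $T_1, T_2, \ldots, T_k$ of equal size, where this size is at least 30.
2. For i from 1 to k, do
$$\delta_i \leftarrow \mathrm{error}_{T_i}(h_A) - \mathrm{error}_{T_i}(h_B)$$
3. Return the value $\bar{\delta}$, where
$$\bar{\delta} \equiv \frac{1}{k} \sum_{i=1}^{k} \delta_i$$
N% confidence interval estimate for d:
$$\bar{\delta} \pm t_{N,k-1} \, s_{\bar{\delta}}$$
$$s_{\bar{\delta}} \equiv \sqrt{\frac{1}{k(k-1)} \sum_{i=1}^{k} (\delta_i - \bar{\delta})^2}$$
Note $\delta_i$ approximately Normally distributed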
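A minimal sketch of the interval computation follows, assuming the per-test-set errors have already been measured. The function name `paired_t_interval` and the error values are hypothetical; the caller supplies $t_{N,k-1}$ from a t table (2.776 is the two-sided 95% value for 4 degrees of freedom).

```python
import math
import statistics


def paired_t_interval(errors_a, errors_b, t_value):
    """Confidence interval for d from paired errors on k test sets.

    errors_a[i] and errors_b[i] are the errors of h_A and h_B on T_i;
    t_value is t_{N,k-1}, looked up by the caller.
    """
    if len(errors_a) != len(errors_b):
        raise ValueError("both hypotheses must be tested on the same k test sets")
    k = len(errors_a)
    if k < 2:
        raise ValueError("need at least two test sets")
    deltas = [a - b for a, b in zip(errors_a, errors_b)]
    delta_bar = statistics.fmean(deltas)
    s_delta_bar = math.sqrt(
        sum((d - delta_bar) ** 2 for d in deltas) / (k * (k - 1))
    )
    return delta_bar - t_value * s_delta_bar, delta_bar + t_value * s_delta_bar


# Hypothetical errors on k = 5 test sets.
errors_a = [0.30, 0.28, 0.33, 0.25, 0.31]
errors_b = [0.25, 0.27, 0.29, 0.24, 0.26]
low, high = paired_t_interval(errors_a, errors_b, t_value=2.776)
print(low, high)  # roughly 0.0066 and 0.0574
```

Pairing matters here: the differences are taken test set by test set, so variation shared by both hypotheses on a given test set cancels out.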

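Comparing Learning Algorithms $L_A$ and $L_B$
1. Partition data $D_0$ into k disjoint test sets $T_1, T_2, \ldots, T_k$ of equal size, where this size is at least 30.
2. For i from 1 to k, do
use $T_i$ for the test set, and the remaining data for training set $S_i$
• $S_i \leftarrow \{D_0 - T_i\}$
• $h_A \leftarrow L_A(S_i)$
• $h_B \leftarrow L_B(S_i)$
• $\delta_i \leftarrow \mathrm{error}_{T_i}(h_A) - \mathrm{error}_{T_i}(h_B)$
3. Return the value $\bar{\delta}$, where
$$\bar{\delta} \equiv \frac{1}{k} \sum_{i=1}^{k} \delta_i$$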
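The procedure above can be sketched as follows. Everything here is an assumption made for illustration: a learner is modelled as a function that takes a list of `(x, label)` pairs and returns a hypothesis `h(x)`, and the two toy learners and the synthetic data stand in for real algorithms and real data.

```python
import random


def error_on(hypothesis, examples):
    """Sample error of a hypothesis on a list of (x, label) pairs."""
    return sum(1 for x, y in examples if hypothesis(x) != y) / len(examples)


def compare_learners(learner_a, learner_b, data, k):
    """Return (mean delta, per-fold deltas) for two learners."""
    fold_size = len(data) // k
    if fold_size < 30:
        raise ValueError("each test set should hold at least 30 examples")
    deltas = []
    for i in range(k):
        test = data[i * fold_size:(i + 1) * fold_size]              # T_i
        train = data[:i * fold_size] + data[(i + 1) * fold_size:]   # S_i = D_0 - T_i
        h_a = learner_a(train)
        h_b = learner_b(train)
        deltas.append(error_on(h_a, test) - error_on(h_b, test))
    return sum(deltas) / k, deltas


def majority_learner(train):
    """Toy learner: always predict the most common training label."""
    ones = sum(y for _, y in train)
    label = 1 if 2 * ones >= len(train) else 0
    return lambda x: label


def threshold_learner(train):
    """Toy learner: predict 1 when x is at least the mean training x."""
    threshold = sum(x for x, _ in train) / len(train)
    return lambda x: int(x >= threshold)


# Synthetic data: 150 points, label 1 when x >= 75, shuffled with a fixed seed.
data = [(x, int(x >= 75)) for x in range(150)]
random.Random(0).shuffle(data)

mean_delta, deltas = compare_learners(majority_learner, threshold_learner, data, k=5)
print(mean_delta > 0)  # True: the majority learner has the higher error here
```

Passing learners as plain functions keeps the sketch independent of any particular library; the same loop applies to any pair of training procedures with that shape.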

Comparing Learning Algorithms $L_A$ and $L_B$
What we would like to estimate:
$$E_{S \subset D}[\mathrm{error}_D(L_A(S)) - \mathrm{error}_D(L_B(S))]$$
where L(S) is the hypothesis output by learner L using training set S
i.e., the expected difference in true error between hypotheses output by learners $L_A$ and $L_B$, when trained using randomly selected training sets S drawn according to distribution D.
But, given limited data $D_0$, what is a good estimator?
• Could partition $D_0$ into training set $S_0$ and test set $T_0$, and measure
$$\mathrm{error}_{T_0}(L_A(S_0)) - \mathrm{error}_{T_0}(L_B(S_0))$$
• even better, repeat this many times and average the results (next slide)

Comparing Learning Algorithms $L_A$ and $L_B$
Notice we would like to use the paired t test on $\bar{\delta}$ to obtain a confidence interval
But not really correct, because the training sets in this algorithm are not independent (they overlap!)
More correct to view algorithm as producing an estimate of
$$E_{S \subset D_0}[\mathrm{error}_D(L_A(S)) - \mathrm{error}_D(L_B(S))]$$
instead of
$$E_{S \subset D}[\mathrm{error}_D(L_A(S)) - \mathrm{error}_D(L_B(S))]$$
but even this approximation is better than no comparison