STA 517 – Introduction: Distribution and Inference
1.4.3 Proportion of Vegetarians
Example
A questionnaire: Alan Agresti asked each of his students
whether he or she was a vegetarian.
Sample size: n = 25
Outcome: y = 0 answered “yes”
Goal: estimate the proportion $\pi$ of vegetarians and give a 95% confidence interval
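Recall
Tests and Confidence Intervals
At significance level $\alpha$, reject $H_0: \pi = \pi_0$ if
$$\left| \frac{\hat{\pi} - \pi_0}{SE} \right| > z_{\alpha/2}$$
The $100(1-\alpha)\%$ confidence interval is the set of $\pi_0$ values that are not rejected:
$$\left\{ \pi_0 : \left| \frac{\hat{\pi} - \pi_0}{SE} \right| < z_{\alpha/2} \right\}$$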
Wald method
MLE: $\hat{\pi} = y/n = 0/25 = 0$
SE: $\sqrt{\hat{\pi}(1-\hat{\pi})/n} = \sqrt{0(1-0)/25} = 0$
95% confidence interval: $\hat{\pi} \pm 1.96 \times SE = (0, 0)$
Wald methods do not provide sensible answers.
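Score interval
From
$$\left| \frac{\hat{\pi} - \pi_0}{\sqrt{\pi_0(1-\pi_0)/n}} \right| < z_{\alpha/2}$$
with $\hat{\pi} = 0$, $y = 0$, $n = 25$, $z_{\alpha/2} = 1.96$, $\alpha = 0.05$, the interval is (0.0, 0.133).
[Figure: score statistic $z_S$ plotted against $\pi_0$, titled "score CI for binomial parameter"; CI = (0, 0.1332)]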
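The score inequality is quadratic in $\pi_0$, so its endpoints have a closed form. The following is a minimal Python sketch of that closed form (standard library only); the function name `score_interval` and the default `z=1.96` are illustrative choices, not part of the course material.

```python
import math

def score_interval(y, n, z=1.96):
    """Closed-form solution of |pihat - pi0| / sqrt(pi0*(1 - pi0)/n) <= z for pi0."""
    pihat = y / n
    denom = 1 + z**2 / n
    center = (pihat + z**2 / (2 * n)) / denom
    half_width = (z / denom) * math.sqrt(pihat * (1 - pihat) / n + z**2 / (4 * n**2))
    # Clip to [0, 1] to absorb floating-point round-off at the boundary.
    return max(0.0, center - half_width), min(1.0, center + half_width)

print(score_interval(0, 25))      # vegetarianism example
print(score_interval(842, 1824))  # problem 1.5
```

Unlike the Wald interval, the standard error here is evaluated at $\pi_0$ rather than at $\hat{\pi}$, which is why the interval does not collapse when y = 0.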
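LR interval
When y = 0 and n = 25, the kernel of the likelihood function is
$$\ell(\pi) = \pi^0 (1-\pi)^{25} = (1-\pi)^{25}$$
The likelihood-ratio statistic is
$$-2 \log\left[ \ell(\pi_0) / \ell(\hat{\pi}) \right] = -2 \log (1-\pi_0)^{25} = -50 \log(1-\pi_0)$$
Solving the inequality $-50 \log(1-\pi_0) \le \chi^2_1(0.05) = 3.84$ gives $\pi_0 \le 1 - e^{-3.84/50} = 0.074$,
i.e. the confidence interval equals (0.0, 0.074).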
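For general y the LR inequality has no closed form, so the endpoints are found numerically. A minimal Python sketch, assuming simple bisection is acceptable; the names `lr_stat` and `lr_interval`, the tolerance, and the rounded critical value 3.84 are illustrative assumptions.

```python
import math

def lr_stat(pi0, y, n):
    """Likelihood-ratio statistic 2*[L(pihat) - L(pi0)] for a binomial count y out of n."""
    pihat = y / n

    def term(count, p_hat, p_null):
        # count * log(p_hat / p_null), using the convention 0 * log(0) = 0
        return 0.0 if count == 0 else count * math.log(p_hat / p_null)

    return 2.0 * (term(y, pihat, pi0) + term(n - y, 1 - pihat, 1 - pi0))

def lr_interval(y, n, crit=3.84, tol=1e-10):
    """All pi0 with lr_stat <= crit; crit = 3.84 is the rounded chi-square(1) 0.05 cutoff."""
    pihat = y / n

    def solve(lo, hi):
        # lr_stat - crit has exactly one sign change between lo and hi
        lo_positive = lr_stat(lo, y, n) - crit > 0
        while hi - lo > tol:
            mid = (lo + hi) / 2
            if (lr_stat(mid, y, n) - crit > 0) == lo_positive:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    eps = 1e-12  # keeps the search strictly inside (0, 1)
    lower = 0.0 if y == 0 else solve(eps, pihat)
    upper = 1.0 if y == n else solve(pihat, 1 - eps)
    return lower, upper

print(lr_interval(0, 25))      # vegetarianism example
print(lr_interval(842, 1824))  # problem 1.5
```

The statistic is zero at $\hat{\pi}$ and increases in both directions, so each side of $\hat{\pi}$ contains exactly one crossing of the critical value.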
Example (problem 1.5)
MLE: $n = 842 + 982 = 1824$, $y = 842$, $\hat{\pi} = 842/1824 = 0.4616$
Wald interval:
$$\hat{\pi} \pm 1.96 \sqrt{\hat{\pi}(1-\hat{\pi})/n} = 0.4616 \pm 1.96 \sqrt{0.4616(1-0.4616)/1824} = (0.4387, 0.4845)$$
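The Wald arithmetic above can be checked directly. A minimal Python sketch; the function name `wald_interval` is an illustrative choice.

```python
import math

def wald_interval(y, n, z=1.96):
    """Wald interval pihat +/- z * sqrt(pihat*(1 - pihat)/n), with SE evaluated at the MLE."""
    pihat = y / n
    se = math.sqrt(pihat * (1 - pihat) / n)
    return pihat - z * se, pihat + z * se

print(wald_interval(842, 1824))  # problem 1.5
print(wald_interval(0, 25))      # vegetarianism example: SE is 0, so the interval degenerates
```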
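Score interval
From
$$\left| \frac{\hat{\pi} - \pi_0}{\sqrt{\pi_0(1-\pi_0)/n}} \right| < z_{\alpha/2}$$
with $\hat{\pi} = 0.4616$, $y = 842$, $n = 1824$, $z_{\alpha/2} = 1.96$, $\alpha = 0.05$, the interval is (0.4388, 0.4846).
[Figure: score statistic $z_S$ plotted against $\pi_0$, titled "score CI for binomial parameter"; CI = (0.43885, 0.48456)]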
LR interval
LR statistic:
$$2\left( 842 \log\frac{0.4616}{\pi_0} + 982 \log\frac{1-0.4616}{1-\pi_0} \right) \le \chi^2_1(0.05) = 3.84$$
CI: (0.4388, 0.4845)
[Figure: LR statistic plotted against $\pi_0$, titled "LR CI for binomial parameter"; CI = (0.43881, 0.48454)]
Comparison

Method        y = 0, n = 25    y = 842, n = 1824
Wald          (0, 0)           (.4387, .4845)
Score         (0, .133)        (.4388, .4846)
LR            (0, .074)        (.4388, .4845)
Wald adjust   (0, .1576)       (.4388, .4846)
Conclusion
When the sample size is large, all three methods give about the same interval.
When $\pi$ is near 0 or 1, the Wald method performs poorly unless n is very large. An adjustment that adds $z_{\alpha/2}^2/2$ observations of each type to the sample before using the Wald formula performs much better (Problem 1.24).
The likelihood-ratio interval is simple in principle but more complex computationally. With current computing power this is not a problem, and the LR interval is preferable.
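The adjusted Wald interval mentioned above can be sketched as follows. This is a minimal Python illustration that assumes the adjustment adds $z_{\alpha/2}^2/2$ pseudo-observations of each type, a reading that reproduces the "Wald adjust" row of the comparison; the function name and the clipping to [0, 1] are illustrative choices.

```python
import math

def adjusted_wald_interval(y, n, z=1.96):
    """Wald formula applied after adding z**2/2 successes and z**2/2 failures."""
    y_adj = y + z**2 / 2   # adjusted number of successes
    n_adj = n + z**2       # adjusted sample size
    p_adj = y_adj / n_adj
    se = math.sqrt(p_adj * (1 - p_adj) / n_adj)
    # Clip to [0, 1], since a proportion cannot lie outside that range.
    return max(0.0, p_adj - z * se), min(1.0, p_adj + z * se)

print(adjusted_wald_interval(0, 25))      # vegetarianism example
print(adjusted_wald_interval(842, 1824))  # problem 1.5
```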
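P-value
At significance level $\alpha$, reject $H_0: \pi = \pi_0$ if
$$\left| \frac{\hat{\pi} - \pi_0}{SE} \right| > z_{\alpha/2}$$
Two-sided P-value for the test of $H_0: \pi = \pi_0$:
$$\Pr\left( |Z| > \left| \frac{\hat{\pi} - \pi_0}{SE} \right| \right)$$
or, equivalently,
$$\Pr\left( \chi^2_1 > \frac{(\hat{\pi} - \pi_0)^2}{SE^2} \right)$$
For the likelihood-ratio test, with log-likelihood $L$ and $LR = -2\,(L(\pi_0) - L(\hat{\pi}))$:
$$\Pr\left( \chi^2_1 > LR \right)$$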
Test statistic and P-value
$H_0: \pi_0 = 0.5$, with $y = 842$, $n = 1824$

Test    Statistic   P-value
Wald    10.8093     0.0010
LR      10.7562     0.0010
Score   10.7456     0.0010

proc IML;
  y = 842; n = 1824; pi0 = 0.5;
  pihat = y/n;                                    /* MLE */
  SE = sqrt(pihat*(1-pihat)/n);                   /* standard error evaluated at the MLE */
  WaldStat = (pihat-pi0)**2 / SE**2;
  pWald = 1 - CDF('CHISQUARE', WaldStat, 1);
  LR = 2*(y*log(pihat/pi0)
         + (n-y)*log((1-pihat)/(1-pi0)));
  pLR = 1 - CDF('CHISQUARE', LR, 1);
  ScoreStat = (pihat-pi0)**2 / (pi0*(1-pi0)/n);   /* variance evaluated at pi0 */
  pScore = 1 - CDF('CHISQUARE', ScoreStat, 1);
  print WaldStat pWald;
  print LR pLR;
  print ScoreStat pScore;
quit;
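For readers without SAS, the same three statistics can be reproduced with the Python standard library. This is a sketch, not a translation endorsed by the course: the helper `chi2_1_upper_tail` relies on the fact that a chi-square variable with 1 df is a squared standard normal, and the code assumes 0 < y < n so that every logarithm is defined.

```python
import math

def chi2_1_upper_tail(x):
    """P(chi-square with 1 df > x) = P(|Z| > sqrt(x)), via the complementary error function."""
    return math.erfc(math.sqrt(x / 2))

def binomial_tests(y, n, pi0):
    """Wald, LR and score chi-square statistics with P-values (assumes 0 < y < n)."""
    pihat = y / n
    wald = (pihat - pi0) ** 2 / (pihat * (1 - pihat) / n)   # variance at the MLE
    lr = 2 * (y * math.log(pihat / pi0)
              + (n - y) * math.log((1 - pihat) / (1 - pi0)))
    score = (pihat - pi0) ** 2 / (pi0 * (1 - pi0) / n)      # variance at pi0
    stats = {"Wald": wald, "LR": lr, "Score": score}
    return {name: (stat, chi2_1_upper_tail(stat)) for name, stat in stats.items()}

for name, (stat, p) in binomial_tests(842, 1824, 0.5).items():
    print(f"{name:6s} {stat:8.4f} {p:.4f}")
```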
SAS code
data D;
  input outcome $ w;
  cards;
Yes 842
No 982
;
proc freq;
  weight w;
  table outcome / all CL
        BINOMIAL(P=0.5
        LEVEL="Yes");
  exact binomial;
run;
Vegetarianism example (n=25, y=0)
Test $H_0: \pi = 0.5$
Score statistic: z = -5.0
Squared score statistic = 25
P-value = 5.733E-7
LR = 34.7, P-value = 3.8463E-9
Wald Z is infinite, because SE = 0 when $\hat{\pi} = 0$
See Problem 1.6
SAS code?
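The boundary case y = 0 needs some care in code, because the Wald standard error is zero and $0 \log 0$ must be read as 0. A minimal Python sketch under those conventions; all variable names are illustrative.

```python
import math

def chi2_1_upper_tail(x):
    """P(chi-square with 1 df > x), via the complementary error function."""
    return math.erfc(math.sqrt(x / 2))

y, n, pi0 = 0, 25, 0.5
pihat = y / n

# Score test: standard error evaluated at pi0, so it is well defined even when y = 0.
score_z = (pihat - pi0) / math.sqrt(pi0 * (1 - pi0) / n)
score_p = chi2_1_upper_tail(score_z ** 2)

# LR test: with y = 0 the log-likelihood kernel at the MLE is 0,
# so the statistic reduces to -2 * n * log(1 - pi0).
lr = -2 * n * math.log(1 - pi0)
lr_p = chi2_1_upper_tail(lr)

# Wald test: SE at the MLE is 0, so the statistic is infinite in magnitude.
se_wald = math.sqrt(pihat * (1 - pihat) / n)
wald_z = math.copysign(math.inf, pihat - pi0) if se_wald == 0 else (pihat - pi0) / se_wald

print("score:", score_z, score_p)
print("LR:   ", lr, lr_p)
print("Wald: ", wald_z)
```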
Example 2: n=100, y=45 (success)

$H_0: \pi_0 = 0.5$
Test    Statistic    P-value
Wald    1.010101     0.3148786
LR      1.0016734    0.3169059
Score   1            0.3173105

Notice how close they are; this is because the sample size is quite large and because the data could reasonably have arisen under the null hypothesis.

$H_0: \pi_0 = 0.8$
Test    Statistic    P-value
Wald    49.494949    1.989E-12
LR      59.493327    1.232E-14
Score   76.5625      0

The test statistics are no longer close to one another because $H_0$ is highly implausible and is very unlikely to have generated the data. But the P-values are all essentially zero, so we are led to the same conclusion regardless of which test we use.
1.4.4 Exact Small-Sample Inference
With modern computational power, it is not necessary
to rely on large-sample approximations for the
distribution of statistics such as $\hat{\pi}$.
Tests and confidence intervals can use the binomial
distribution directly rather than its normal
approximation.
Such inferences occur naturally for small samples, but
apply for any n.
Exact test – vegetarianism example
Score statistic: z = -5.0
Base the null distribution on bin(25, 0.5): $2 \times (0.5)^{25} \approx 0.00000006$ is the exact P-value for this statistic.
$100(1-\alpha)\%$ confidence intervals consist of all $\pi_0$ for which P-values exceed $\alpha$ in exact binomial tests.
The best-known interval (Clopper and Pearson 1934) uses the tail method for forming confidence intervals. It requires each one-sided P-value to exceed $\alpha/2$.
Recall: a confidence interval collects the $\pi_0$ values that the corresponding test does not reject.
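Exact CI
The lower and upper endpoints are the solutions in $\pi_0$ to the equations
$$\sum_{k=y}^{n} \binom{n}{k} \pi_0^k (1-\pi_0)^{n-k} = \frac{\alpha}{2} \quad \text{(lower endpoint)}$$
$$\sum_{k=0}^{y} \binom{n}{k} \pi_0^k (1-\pi_0)^{n-k} = \frac{\alpha}{2} \quad \text{(upper endpoint)}$$
except that the lower endpoint is 0 when y = 0 and the upper endpoint is 1 when y = n.
CI for the vegetarianism example is (0, 0.137)
Comparison: the large-sample score CI is (0, 0.133)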
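The two tail equations can be solved numerically. A minimal Python sketch using bisection on the binomial tail probabilities; the function names are illustrative, and the plain floating-point sums are only intended for small n (large n would need log-scale arithmetic).

```python
import math

def binom_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def upper_tail(y, n, p):
    """P(Y >= y) under bin(n, p)."""
    return sum(binom_pmf(k, n, p) for k in range(y, n + 1))

def lower_tail(y, n, p):
    """P(Y <= y) under bin(n, p)."""
    return sum(binom_pmf(k, n, p) for k in range(0, y + 1))

def bisect_root(f, lo=0.0, hi=1.0, tol=1e-10):
    """Root of an increasing function f with f(lo) < 0 < f(hi)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(y, n, alpha=0.05):
    # P(Y >= y) increases with p; P(Y <= y) decreases with p.
    lower = 0.0 if y == 0 else bisect_root(lambda p: upper_tail(y, n, p) - alpha / 2)
    upper = 1.0 if y == n else bisect_root(lambda p: alpha / 2 - lower_tail(y, n, p))
    return lower, upper

# Vegetarianism example: exact two-sided P-value for H0: pi = 0.5, then the exact CI.
exact_p = min(1.0, 2 * lower_tail(0, 25, 0.5))
print(exact_p)
print(clopper_pearson(0, 25))
```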
SAS Procedure Freq
1.4.5 Inference Based on the Mid-P-Value (Lancaster 1961)
To adjust for discreteness in small-sample distributions,
one can base inference on the mid-P-value
For a test statistic T with observed value $t_o$ and one-sided $H_a$ such that large T contradicts $H_0$,
$$\text{mid-P-value} = \tfrac{1}{2} P(T = t_o) + P(T > t_o)$$
with probabilities calculated from the null distribution.
Compared to the ordinary P-value, the mid-P-value
behaves more like the P-value for a test statistic having
a continuous distribution.
We recommend it both for tests and confidence
intervals with highly discrete distributions to eliminate
problems from discreteness.
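Mid-P-Value Clopper-Pearson CI
The lower and upper endpoints are the solutions in $\pi_0$ to the equations
$$\frac{1}{2}\binom{n}{y}\pi_0^{y}(1-\pi_0)^{n-y} + \sum_{k=y+1}^{n}\binom{n}{k}\pi_0^{k}(1-\pi_0)^{n-k} = \frac{\alpha}{2} \quad \text{(lower endpoint)}$$
$$\frac{1}{2}\binom{n}{y}\pi_0^{y}(1-\pi_0)^{n-y} + \sum_{k=0}^{y-1}\binom{n}{k}\pi_0^{k}(1-\pi_0)^{n-k} = \frac{\alpha}{2} \quad \text{(upper endpoint)}$$
RECALL: the example about the proportion of vegetarians
The mid-P-value is half the ordinary P-value, or 0.00000003.
CI for y = 0:
Lower endpoint: $1 - \frac{1}{2}(1-\pi_0)^{n} = \frac{\alpha}{2}$, so $\pi_0 = 1 - (2-\alpha)^{1/n} = -0.0271$, which is set to 0
Upper endpoint: $\frac{1}{2}(1-\pi_0)^{n} = \frac{\alpha}{2}$, so $\pi_0 = 1 - \alpha^{1/n} = 0.1129$
CI: (0, 0.1129)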
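The mid-P equations can be solved the same way as the ordinary tail equations, with half the probability of the observed count in each tail. A minimal Python sketch, again intended for small n only; the function names, and the rule of setting an endpoint to 0 or 1 when its equation has no root in [0, 1], are illustrative choices consistent with the y = 0 calculation above.

```python
import math

def binom_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def bisect_root(f, lo=0.0, hi=1.0, tol=1e-10):
    """Root of an increasing function f with f(lo) < 0 < f(hi)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def mid_p_interval(y, n, alpha=0.05):
    def upper_mid(p):
        # 0.5 * P(Y = y) + P(Y > y), increasing in p
        return 0.5 * binom_pmf(y, n, p) + sum(binom_pmf(k, n, p) for k in range(y + 1, n + 1))

    def lower_mid(p):
        # 0.5 * P(Y = y) + P(Y < y), decreasing in p
        return 0.5 * binom_pmf(y, n, p) + sum(binom_pmf(k, n, p) for k in range(0, y))

    # If the equation has no root in [0, 1], the endpoint sits on the boundary.
    lower = 0.0 if upper_mid(0.0) >= alpha / 2 else bisect_root(lambda p: upper_mid(p) - alpha / 2)
    upper = 1.0 if lower_mid(1.0) >= alpha / 2 else bisect_root(lambda p: alpha / 2 - lower_mid(p))
    return lower, upper

print(mid_p_interval(0, 25))  # vegetarianism example
```

The resulting interval is shorter than the ordinary Clopper-Pearson interval because only half of the observed outcome's probability is counted in each tail.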