categorical data analysis Chap STA517-2.ppt

AbaMacha 20 views 22 slides Jun 10, 2024

About This Presentation

CDA


Slide Content

1
STA 517 –Introduction: Distribution and Inference
1.4.3 Proportion of Vegetarians
Example
A questionnaire: Alan Agresti asked each of his students
whether he or she was a vegetarian.
Sample size: n=25
Outcome: y=0 answered “yes”
Goal: estimate $\pi$, the proportion of vegetarians, with a 95% confidence interval

2
Recall
Tests and Confidence Intervals
At significance level $\alpha$, reject $H_0: \pi = \pi_0$ if
$$\frac{|\hat{\pi} - \pi_0|}{SE} > z_{\alpha/2}$$
The $100(1-\alpha)\%$ confidence interval is the set of $\pi_0$ values that are not rejected:
$$\left\{ \pi_0 : \frac{|\hat{\pi} - \pi_0|}{SE} < z_{\alpha/2} \right\}$$

3
Wald method
MLE:
$$\hat{\pi} = \frac{y}{n} = \frac{0}{25} = 0$$
SE:
$$SE = \sqrt{\frac{\hat{\pi}(1-\hat{\pi})}{n}} = \sqrt{\frac{0 \times 1}{25}} = 0$$
95% confidence interval: $\hat{\pi} \pm 1.96\,SE = (0, 0)$
Wald methods do not provide sensible answers.
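The Wald computation can be sketched in a few lines. This is a minimal illustration in Python rather than the SAS used elsewhere in these slides; the function name `wald_interval` and the default `z=1.96` are assumptions made for the example.

```python
import math

def wald_interval(y, n, z=1.96):
    """Wald interval: pi_hat +/- z * sqrt(pi_hat * (1 - pi_hat) / n)."""
    pi_hat = y / n                                # MLE of the binomial proportion
    se = math.sqrt(pi_hat * (1 - pi_hat) / n)     # SE evaluated at the MLE
    return pi_hat - z * se, pi_hat + z * se

# Vegetarian example: the SE is 0, so the interval collapses to a single point.
print(wald_interval(0, 25))
# A large-sample case (y=842, n=1824), where the interval is roughly (0.4387, 0.4845).
print(wald_interval(842, 1824))
```

With y=0 the estimated standard error is exactly zero, which is why the interval degenerates regardless of the confidence level.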


4
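Score interval
From the score inequality, which uses the standard error evaluated at $\pi_0$,
$$\frac{|\hat{\pi} - \pi_0|}{\sqrt{\pi_0(1-\pi_0)/n}} < z_{\alpha/2}$$
with $\hat{\pi} = 0$, $y = 0$, $n = 25$, $z_{\alpha/2} = 1.96$, $\alpha = 0.05$, the 95% confidence interval is (0.0, 0.133)
[Figure: score CI for binomial parameter; $z_S$ plotted against $\pi_0$; CI = (0, 0.1332)]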
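Solving the score inequality for $\pi_0$ gives a quadratic with a closed-form solution (often called the Wilson interval). The sketch below, in Python rather than SAS, assumes that closed form; the function name `score_interval` and the clipping to [0, 1] are choices made for this example.

```python
import math

def score_interval(y, n, z=1.96):
    """Score (Wilson) interval: all pi0 with |pi_hat - pi0| / sqrt(pi0(1-pi0)/n) < z."""
    pi_hat = y / n
    z2 = z * z
    denom = 1 + z2 / n
    center = (pi_hat + z2 / (2 * n)) / denom
    half = z * math.sqrt(pi_hat * (1 - pi_hat) / n + z2 / (4 * n * n)) / denom
    # Clip to [0, 1] to absorb floating-point round-off at the boundary.
    return max(0.0, center - half), min(1.0, center + half)

print(score_interval(0, 25))      # upper endpoint is about 0.1332
print(score_interval(842, 1824))  # roughly (0.4388, 0.4846)
```

Unlike the Wald interval, the standard error here is evaluated at $\pi_0$, so the interval stays non-degenerate when y=0.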

5
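LR interval
When y=0 and n=25, the kernel of the likelihood function is
$$\ell(\pi) = \pi^0 (1-\pi)^{25} = (1-\pi)^{25}$$
The likelihood-ratio statistic is
$$-2\,[L(\pi_0) - L(\hat{\pi})] = -50 \log(1-\pi_0)$$
Solve the inequality $-50 \log(1-\pi_0) \le \chi^2_1(0.05) = 3.84$, which gives $\pi_0 \le 1 - \exp(-3.84/50) = 0.074$,
i.e. the confidence interval equals (0.0, 0.074).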
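For general y the likelihood-ratio inequality has no closed form, so the endpoints are found numerically. The following is a minimal Python sketch (not the SAS used in these slides) that assumes plain bisection; `lr_stat`, `lr_interval`, `bisect` and `xlogy` are hypothetical helper names, and the critical value is the 0.95 quantile of the chi-square distribution with 1 df.

```python
import math

CHI2_1_95 = 3.841458820694124  # 0.95 quantile of chi-square with 1 df

def xlogy(x, y):
    """x * log(y), with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)

def lr_stat(y, n, pi0):
    """LR statistic -2 * [L(pi0) - L(pi_hat)] for a binomial proportion."""
    pi_hat = y / n
    loglik = lambda p: xlogy(y, p) + xlogy(n - y, 1 - p)
    return 2 * (loglik(pi_hat) - loglik(pi0))

def bisect(f, lo, hi, tol=1e-10):
    """Root of f on [lo, hi], assuming f(lo) and f(hi) have opposite signs."""
    flo = f(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        fmid = f(mid)
        if (fmid > 0) == (flo > 0):
            lo, flo = mid, fmid
        else:
            hi = mid
    return (lo + hi) / 2

def lr_interval(y, n, crit=CHI2_1_95):
    """All pi0 whose LR statistic does not exceed the critical value."""
    pi_hat = y / n
    eps = 1e-12  # keep the search strictly inside (0, 1)
    g = lambda p: lr_stat(y, n, p) - crit
    lower = 0.0 if y == 0 else bisect(g, eps, pi_hat)
    upper = 1.0 if y == n else bisect(g, pi_hat, 1 - eps)
    return lower, upper

print(lr_interval(0, 25))      # upper endpoint is about 0.074
print(lr_interval(842, 1824))  # roughly (0.4388, 0.4845)
```

Bisection works here because the LR statistic is zero at $\hat{\pi}$ and increases as $\pi_0$ moves away from it in either direction.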

6
LR interval

7
Example (problem 1.5)
$n = 842 + 982 = 1824$, $y = 842$
MLE
$$\hat{\pi} = \frac{842}{1824} = 0.4616$$
Wald interval
$$\hat{\pi} \pm 1.96\sqrt{\frac{\hat{\pi}(1-\hat{\pi})}{n}} = 0.4616 \pm 1.96\sqrt{\frac{0.4616(1-0.4616)}{1824}} = (0.4387, 0.4845)$$


8
Score interval
From the score inequality
$$\frac{|\hat{\pi} - \pi_0|}{\sqrt{\pi_0(1-\pi_0)/n}} < z_{\alpha/2}$$
with $\hat{\pi} = .4616$, $y = 842$, $n = 1824$, $z_{\alpha/2} = 1.96$, $\alpha = 0.05$, the 95% confidence interval is (.4388, .4846)
[Figure: score CI for binomial parameter; $z_S$ plotted against $\pi_0$; CI = (0.43885, 0.48456)]

9
LR interval
LR statistic
$$2\left(842 \log\frac{0.4616}{\pi_0} + 982 \log\frac{1-0.4616}{1-\pi_0}\right) \le \chi^2_1(0.05) = 3.84$$
CI
(0.4388, 0.4845)
[Figure: LR CI for binomial parameter; LR statistic plotted against $\pi_0$; CI = (0.43881, 0.48454)]

10
Comparison
Method        y=0, n=25      y=842, n=1824
Wald          (0, 0)         (.4387, .4845)
Score         (0, .133)      (.4388, .4846)
LR            (0, .074)      (.4388, .4845)
Wald adjust   (0, .1576)     (.4388, .4846)

11
Conclusion
When the sample size is large, all three methods give about the same interval.
When $\pi$ is near 0 or 1, the Wald method performs poorly unless n is very large. An adjustment that adds $z_{\alpha/2}^2/2$ observations of each type to the sample before using this formula performs much better (Problem 1.24).
The likelihood-ratio interval is simple in principle but more complex computationally. With current computing power this is not a problem, and the interval is preferable.
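The adjusted Wald interval mentioned above can be sketched as follows, in Python rather than SAS. The sketch assumes the adjustment adds $z_{\alpha/2}^2/2$ pseudo-observations to each outcome type and then applies the ordinary Wald formula; the name `adjusted_wald_interval` and the truncation of the endpoints to [0, 1] are assumptions of this example.

```python
import math

def adjusted_wald_interval(y, n, z=1.96):
    """Wald interval computed after adding z^2/2 observations of each type."""
    add = z * z / 2                      # pseudo-observations per outcome type
    n_adj = n + 2 * add
    pi_adj = (y + add) / n_adj
    se = math.sqrt(pi_adj * (1 - pi_adj) / n_adj)
    # Truncate to [0, 1]; the untruncated lower endpoint can be negative.
    return max(0.0, pi_adj - z * se), min(1.0, pi_adj + z * se)

print(adjusted_wald_interval(0, 25))      # upper endpoint is about 0.1576
print(adjusted_wald_interval(842, 1824))  # roughly (0.4388, 0.4846)
```

For y=0 and n=25 the adjusted estimate is no longer on the boundary, so the standard error is positive and the interval has sensible width.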

12
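P-value
At significance level $\alpha$, reject $H_0: \pi = \pi_0$ if
$$\frac{|\hat{\pi} - \pi_0|}{SE} > z_{\alpha/2}$$
Two-sided P-value for testing $H_0: \pi = \pi_0$:
$$\Pr\left(|Z| > \frac{|\hat{\pi} - \pi_0|}{SE}\right) \quad\text{or}\quad \Pr\left(\chi^2_1 > \frac{(\hat{\pi} - \pi_0)^2}{SE^2}\right)$$
For the LR test:
$$\Pr\left(\chi^2_1 > -2\,[L(\pi_0) - L(\hat{\pi})]\right)$$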

13
Test statistic and P-value
$H_0: \pi_0 = 0.5$, with $y = 842$, $n = 1824$
Test     Statistic   P-value
Wald     10.8093     0.0010
LR       10.7562     0.0010
Score    10.7456     0.0010
proc iml;
y = 842; n = 1824; pi0 = 0.5;
pihat = y/n;                                    /* MLE */
SE = sqrt(pihat*(1-pihat)/n);                   /* SE evaluated at the MLE */
WaldStat = (pihat-pi0)**2 / SE**2;
pWald = 1 - CDF('CHISQUARE', WaldStat, 1);
LR = 2*(y*log(pihat/pi0)
     + (n-y)*log((1-pihat)/(1-pi0)));
pLR = 1 - CDF('CHISQUARE', LR, 1);
ScoreStat = (pihat-pi0)**2 / (pi0*(1-pi0)/n);   /* variance evaluated under H0 */
pScore = 1 - CDF('CHISQUARE', ScoreStat, 1);
print WaldStat pWald;
print LR pLR;
print ScoreStat pScore;
quit;

14
SAS code
data D;
input outcome $ w;
cards;
Yes 842
No 982
;
proc freq;
weight w;
table outcome / all CL
      BINOMIAL(P=0.5 LEVEL="Yes");
exact binomial;
run;

15
Vegetarianism example (n=25, y=0)
Test $H_0: \pi = 0.5$
Score statistic = -5.0
Squared score statistic = 25
P-value=5.733E-7
LR=34.7, P-value=3.8463e-009
Wald Z is infinite
See Problem 1.6
SAS Code?

16
Example 2: n=100, y=45 (success)

$H_0: \pi_0 = 0.5$
WALDSTAT    PWALD
1.010101    0.3148786
LR          PLR
1.0016734   0.3169059
SCORESTAT   PSCORE
1           0.3173105
Notice how close they are; this is because the sample size is quite large and because the data could reasonably have arisen under the null hypothesis.

$H_0: \pi_0 = 0.8$
WALDSTAT    PWALD
49.494949   1.989E-12
LR          PLR
59.493327   1.232e-14
SCORESTAT   PSCORE
76.5625     0
The test statistics are no longer close to one another because $H_0$ is highly implausible and could not have generated the data. But the P-values are all essentially zero, so we are led to the same conclusion regardless of which test we use.
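The three statistics and their chi-square P-values can be reproduced without SAS. The sketch below is an illustrative Python translation, not the course's code. It assumes the identity $\Pr(\chi^2_1 > x) = \mathrm{erfc}(\sqrt{x/2})$ for the 1-df tail probability, and it treats $0 \cdot \log 0$ as 0 so that y=0 is handled. The names `binomial_tests`, `chi2_1_sf` and `xlogy` are hypothetical.

```python
import math

def chi2_1_sf(x):
    """P(chi-square with 1 df > x), using its link to the normal tail."""
    return math.erfc(math.sqrt(x / 2))

def xlogy(x, y):
    """x * log(y), with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)

def binomial_tests(y, n, pi0):
    """Return {test name: (statistic, P-value)} for H0: pi = pi0."""
    pi_hat = y / n
    diff2 = (pi_hat - pi0) ** 2
    var_hat = pi_hat * (1 - pi_hat) / n           # variance at the MLE (Wald)
    wald = diff2 / var_hat if var_hat > 0 else math.inf
    score = diff2 / (pi0 * (1 - pi0) / n)         # variance under H0 (score)
    lr = 2 * (xlogy(y, pi_hat / pi0) + xlogy(n - y, (1 - pi_hat) / (1 - pi0)))
    stats = {"Wald": wald, "LR": lr, "Score": score}
    return {name: (stat, chi2_1_sf(stat)) for name, stat in stats.items()}

for pi0 in (0.5, 0.8):
    print(pi0, binomial_tests(45, 100, pi0))
# Vegetarian example: the Wald statistic is infinite because the estimated SE is 0.
print(binomial_tests(0, 25, 0.5))
```

Running it with n=100 and y=45 shows the pattern described above: the statistics nearly coincide under $\pi_0 = 0.5$ and diverge under $\pi_0 = 0.8$.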

17
1.4.4 Exact Small-Sample Inference
With modern computational power, it is not necessary
to rely on large-sample approximations for the
distribution of statistics such as $\hat{\pi}$.
Tests and confidence intervals can use the binomial
distribution directly rather than its normal
approximation.
Such inferences occur naturally for small samples, but
apply for any n.

18
Exact test – vegetarianism example
Score statistic for $H_0: \pi = 0.5$: $z = -5.0$
Base null distribution bin(25, 0.5)
$P(|z| \ge 5.0) = P(Y = 0 \text{ or } Y = 25) = 0.5^{25} + 0.5^{25} = 0.00000006$ is the exact P-value for this statistic.
$100(1-\alpha)\%$ confidence intervals consist of all $\pi_0$ for which P-values exceed $\alpha$ in exact binomial tests.
The best-known interval (Clopper and Pearson 1934) uses the tail method for forming confidence intervals. It requires each one-sided P-value to exceed $\alpha/2$.
Recall:

19
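Exact CI
The lower and upper endpoints are the solutions in $\pi_0$ to the equations
$$\sum_{k=y}^{n} \binom{n}{k} \pi_0^{k} (1-\pi_0)^{n-k} = \frac{\alpha}{2} \quad\text{and}\quad \sum_{k=0}^{y} \binom{n}{k} \pi_0^{k} (1-\pi_0)^{n-k} = \frac{\alpha}{2}$$
CI for vegetarianism example is (0, 0.137)
Comparison: the large-sample score CI is (0, 0.133)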
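The exact P-value and the Clopper-Pearson endpoints can be computed directly from binomial tail sums. The sketch below is in Python rather than SAS and makes two assumptions: the tail equations are solved by plain bisection, and the lower endpoint is set to 0 when y=0 (the upper to 1 when y=n). The helper names `exact_two_sided_p`, `clopper_pearson`, `binom_pmf` and `bisect` are hypothetical.

```python
import math

def binom_pmf(k, n, p):
    """P(Y = k) for Y ~ bin(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def exact_two_sided_p(y, n, pi0):
    """Null probability of outcomes whose |score z| is at least the observed value."""
    observed = abs(y / n - pi0)   # same ordering as |z|, since the SE is fixed under H0
    return sum(binom_pmf(k, n, pi0) for k in range(n + 1)
               if abs(k / n - pi0) >= observed - 1e-12)

def bisect(f, lo, hi, tol=1e-10):
    """Root of f on [lo, hi], assuming f(lo) and f(hi) have opposite signs."""
    flo = f(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        fmid = f(mid)
        if (fmid > 0) == (flo > 0):
            lo, flo = mid, fmid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(y, n, alpha=0.05):
    """Tail-method exact interval: each one-sided P-value must exceed alpha/2."""
    upper_tail = lambda p: sum(binom_pmf(k, n, p) for k in range(y, n + 1))  # P(Y >= y)
    lower_tail = lambda p: sum(binom_pmf(k, n, p) for k in range(0, y + 1))  # P(Y <= y)
    lower = 0.0 if y == 0 else bisect(lambda p: upper_tail(p) - alpha / 2, 0.0, 1.0)
    upper = 1.0 if y == n else bisect(lambda p: lower_tail(p) - alpha / 2, 0.0, 1.0)
    return lower, upper

print(exact_two_sided_p(0, 25, 0.5))  # equals 2 * 0.5**25, about 6e-8
print(clopper_pearson(0, 25))         # upper endpoint is about 0.137
```

For y=0 the upper equation reduces to $(1-\pi_0)^n = \alpha/2$, so the bisection result can be checked against $1 - (\alpha/2)^{1/n}$.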

20
SAS Procedure Freq

21
1.4.5 Inference Based on the Mid-P-Value (Lancaster 1961)
To adjust for discreteness in small-sample distributions, one can base inference on the mid-P-value.
For a test statistic T with observed value $t_o$ and one-sided $H_a$ such that large T contradicts $H_0$,
$$\text{mid-P-value} = \tfrac{1}{2} P(T = t_o) + P(T > t_o),$$
with probabilities calculated from the null distribution.
Compared to the ordinary P-value, the mid-P-value behaves more like the P-value for a test statistic having a continuous distribution.
We recommend it both for tests and confidence intervals with highly discrete distributions to eliminate problems from discreteness.

22
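Mid-P-Value Clopper-Pearson CI
The lower and upper endpoints are the solutions in $\pi_0$ to the equations
$$\frac{1}{2}\binom{n}{y}\pi_0^{y}(1-\pi_0)^{n-y} + \sum_{k=y+1}^{n}\binom{n}{k}\pi_0^{k}(1-\pi_0)^{n-k} = \frac{\alpha}{2} \quad\text{(lower endpoint)}$$
$$\frac{1}{2}\binom{n}{y}\pi_0^{y}(1-\pi_0)^{n-y} + \sum_{k=0}^{y-1}\binom{n}{k}\pi_0^{k}(1-\pi_0)^{n-k} = \frac{\alpha}{2} \quad\text{(upper endpoint)}$$
RECALL: the example about the proportion of vegetarians
The mid-P-value is half the ordinary P-value, or 0.00000003.
CI: with y=0 the two equations reduce to
$$1 - \frac{1}{2}(1-\pi_0)^{n} = \frac{\alpha}{2} \;\Rightarrow\; \pi_0 = 1 - (2-\alpha)^{1/n} = -0.0271$$
$$\frac{1}{2}(1-\pi_0)^{n} = \frac{\alpha}{2} \;\Rightarrow\; \pi_0 = 1 - \alpha^{1/n} = 0.1129$$
The lower solution is negative, so the lower endpoint is 0.
CI: (0, 0.1129)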
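The mid-P endpoints can be found numerically in the same way as the ordinary exact interval. This Python sketch (not SAS) assumes bisection on the two mid-P tail equations and sets an endpoint to the boundary 0 or 1 when its equation has no root inside [0, 1]; `mid_p_interval`, `binom_pmf` and `bisect` are hypothetical names.

```python
import math

def binom_pmf(k, n, p):
    """P(Y = k) for Y ~ bin(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def bisect(f, lo, hi, tol=1e-10):
    """Root of f on [lo, hi], assuming f(lo) and f(hi) have opposite signs."""
    flo = f(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        fmid = f(mid)
        if (fmid > 0) == (flo > 0):
            lo, flo = mid, fmid
        else:
            hi = mid
    return (lo + hi) / 2

def mid_p_interval(y, n, alpha=0.05):
    """Interval from mid-P tail equations: half of P(Y = y) plus the strict tail."""
    def lower_eq(p):  # 0.5*P(Y = y) + P(Y > y) - alpha/2, increasing in p
        tail = sum(binom_pmf(k, n, p) for k in range(y + 1, n + 1))
        return 0.5 * binom_pmf(y, n, p) + tail - alpha / 2

    def upper_eq(p):  # 0.5*P(Y = y) + P(Y < y) - alpha/2, decreasing in p
        tail = sum(binom_pmf(k, n, p) for k in range(0, y))
        return 0.5 * binom_pmf(y, n, p) + tail - alpha / 2

    # No root inside [0, 1] means the endpoint sits on the boundary.
    lower = 0.0 if lower_eq(0.0) >= 0 else bisect(lower_eq, 0.0, 1.0)
    upper = 1.0 if upper_eq(1.0) >= 0 else bisect(upper_eq, 0.0, 1.0)
    return lower, upper

print(mid_p_interval(0, 25))  # upper endpoint is about 0.1129
```

For the vegetarian data the result is narrower than the ordinary Clopper-Pearson interval, because only half of the observed outcome's probability is counted in each tail.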