categorical data analysis Chap STA517-3.ppt

Slide Content

Slide 1: STA 517 – Introduction: Distribution and Inference
1.5 Statistical Inference for Multinomial Parameters
Recall the multinomial distribution multi(n, π = (π_1, π_2, …, π_c)).
Suppose that each of n independent, identical trials can have outcome in any of c categories.
Let y_ij = 1 if trial i has outcome in category j, and y_ij = 0 otherwise.
Then y_i = (y_i1, …, y_ic) represents a multinomial trial, with Σ_j y_ij = 1.
Let n_j = Σ_i y_ij denote the number of trials having outcome in category j.
The counts (n_1, n_2, …, n_c) have the multinomial distribution.
Note: the n_j are random variables.
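A minimal sketch (not from the slides) of what such a sample looks like: the SAS step below simulates n = 8023 independent, identical trials with c = 2 categories, using the 0.75/0.25 probabilities of the Mendel example that follows, and tabulates the counts n_j with PROC FREQ.
data sim;
  call streaminit(517);                     /* fix the random number seed           */
  do i = 1 to 8023;                         /* n independent, identical trials      */
    category = rand('TABLE', 0.75, 0.25);   /* outcome 1 or 2 with prob 0.75, 0.25  */
    output;
  end;
run;
proc freq data=sim;
  tables category;                          /* the multinomial counts n_1, n_2      */
run;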

Slide 2: Example: Mendel's Theory
To test Mendel's theories of natural inheritance, Mendel crossed pea plants of a pure yellow strain with plants of a pure green strain.
He predicted that second-generation hybrid seeds would be 75% yellow and 25% green, yellow being the dominant strain.
One experiment produced n = 8023 seeds, with n_1 = 6022 yellow and n_2 = 2001 green observed.
He wanted to test whether the outcome follows the 3:1 (yellow:green) ratio.

Slide 3: 1.5.1 Estimation of Multinomial Parameters
To obtain the MLE, note that the multinomial probability mass function is proportional to the kernel
  ∏_j π_j^{n_j}    (1.14)
The MLE are the {π_j} that maximize (1.14).
Since π_c = 1 − (π_1 + … + π_{c−1}), the log likelihood is
  L(π) = Σ_j n_j log π_j = Σ_{j=1}^{c−1} n_j log π_j + n_c log(1 − Σ_{j=1}^{c−1} π_j).
Differentiating L with respect to π_j gives the likelihood equation
  ∂L/∂π_j = n_j/π_j − n_c/π_c = 0.
The ML solution satisfies n_j/π̂_j = n_c/π̂_c for j = 1, …, c − 1.

Slide 4: MLE
Now, each likelihood equation gives π̂_j = π̂_c (n_j/n_c).
Thus, summing over j: 1 = Σ_j π̂_j = (π̂_c/n_c) Σ_j n_j = π̂_c n/n_c, so π̂_c = n_c/n.
MLE: π̂_j = n_j/n for every j.
The MLE are the sample proportions.
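As a quick illustration (a sketch, not part of the slides), the sample proportions for the Mendel counts can be computed directly in PROC IML:
proc iml;
  n_j   = {6022, 2001};    /* observed counts n_1, n_2 */
  n     = sum(n_j);        /* total number of trials   */
  pihat = n_j / n;         /* MLE = sample proportions */
  print pihat;
quit;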

Slide 5: 1.5.2 Pearson Statistic for Testing a Specified Multinomial
In 1900 the eminent British statistician Karl Pearson
introduced a hypothesis test that was one of the first
inferential methods.
It had a revolutionary impact on categorical data
analysis, which had focused on describing associations.
Pearson’s test evaluates whether multinomial
parameters equal certain specified values.

Slide 6: Pearson Statistic
Consider H_0: π_j = π_j0, j = 1, …, c, where the π_j0 are specified values.
When H_0 is true, the expected values of {n_j}, called expected frequencies, are μ_j = n π_j0, j = 1, …, c.
Pearson proposed the test statistic
  X² = Σ_j (n_j − μ_j)² / μ_j.
Greater differences between the n_j and μ_j produce greater X² values, for fixed n.
Let X²_o denote the observed value of X². The P-value is the null probability P(X² ≥ X²_o); for large n it is approximated by the right-tail chi-squared probability with df = c − 1.
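A minimal IML sketch (not from the slides) of this computation, using the Mendel counts from the earlier slide:
proc iml;
  n_j = {6022, 2001};                  /* observed counts                      */
  pi0 = {0.75, 0.25};                  /* H_0: specified probabilities         */
  n   = sum(n_j);
  mu  = n # pi0;                       /* expected frequencies mu_j = n*pi_j0  */
  X2  = sum( (n_j - mu)##2 / mu );     /* Pearson statistic                    */
  df  = nrow(n_j) - 1;
  p   = 1 - cdf('CHISQUARE', X2, df);  /* asymptotic P-value                   */
  print X2 df p;
quit;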

Slide 7: 1.5.3 Example: Testing Mendel's Theories
Data: n_1 = 6022 yellow, n_2 = 2001 green (n = 8023).
MLE: π̂_1 = 6022/8023 = 0.7506, π̂_2 = 2001/8023 = 0.2494.
To test whether the seeds follow the 3:1 ratio, take H_0: π_10 = 0.75, π_20 = 0.25.
The expected frequencies are μ_1 = 8023(0.75) = 6017.25 and μ_2 = 8023(0.25) = 2005.75, which give X² = 0.015 (df = 1, P ≈ 0.90).
This does not contradict Mendel's hypothesis.

Slide 8: SAS Code
data D;
  input outcome $ w;
  cards;
yellow 6022
green 2001
;
proc freq;
  weight w;
  /* testp values follow the alphabetical level order: green (0.25), yellow (0.75) */
  tables outcome / chisq testp=(0.25 0.75);
run;

Slide 9: Pearson Statistic
When c = 2, it can be shown that the Pearson chi-squared statistic equals the squared score statistic.
PROOF: by symbolic algebra (Symbolic Math Toolbox in MATLAB):
syms y n pi0
f = (y - n*pi0)^2/(n*pi0) + ((n-y) - n*(1-pi0))^2/(n*(1-pi0));  % Pearson X^2 with expected frequencies n*pi0 and n*(1-pi0)
f1 = simplify(f)
% simplifies to (y - n*pi0)^2/(n*pi0*(1-pi0)), which is the squared score statistic
How about c > 2?
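A numerical check (a sketch, not from the slides) of the same c = 2 identity, using the Mendel data:
proc iml;
  y = 6022; n = 8023; pi0 = 0.75;    /* yellow count, total trials, H_0 value */
  pihat = y/n;
  X2    = (y - n*pi0)**2/(n*pi0) + ((n-y) - n*(1-pi0))**2/(n*(1-pi0));
  Score = (pihat - pi0)**2 / (pi0*(1-pi0)/n);
  print X2 Score;                    /* the two values agree                  */
quit;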

Slide 10: 1.5.5 Likelihood-Ratio Chi-Squared
An alternative test for multinomial parameters uses the likelihood-ratio test.
The kernel of the multinomial likelihood is ∏_j π_j^{n_j}.
Under H_0 the likelihood is maximized when π̂_j = π_j0.
In the general case, it is maximized when π̂_j = n_j/n.
The ratio of the likelihoods equals
  Λ = ∏_j (n π_j0 / n_j)^{n_j}.
Thus, the likelihood-ratio statistic is
  G² = −2 log Λ = 2 Σ_j n_j log(n_j / (n π_j0)).
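A minimal IML sketch (not from the slides) computing G² for the Mendel data, with X² alongside for comparison:
proc iml;
  n_j = {6022, 2001};                    /* observed counts             */
  pi0 = {0.75, 0.25};                    /* H_0 probabilities           */
  n   = sum(n_j);
  mu  = n # pi0;                         /* expected frequencies        */
  G2  = 2 * sum( n_j # log(n_j / mu) );  /* likelihood-ratio statistic  */
  X2  = sum( (n_j - mu)##2 / mu );       /* Pearson statistic           */
  df  = nrow(n_j) - 1;
  pG2 = 1 - cdf('CHISQUARE', G2, df);
  print G2 X2 df pG2;
quit;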

Slide 11: LR Statistic
In the general case, the parameter space consists of {π_j} subject to Σ_j π_j = 1, so the dimensionality is c − 1.
Under H_0, the {π_j} are completely specified, so the dimension is 0.
The difference in these dimensions equals c − 1.
For large n, G² has a chi-squared null distribution with df = c − 1.

Slide 12
Both X² and G² have chi-squared null distributions with df = c − 1, and the two statistics are asymptotically equivalent.

Slide 13
Wu, Ma, George (2007)

Slide 14: 1.5.6 Testing with Estimated Expected Frequencies
Pearson's chi-squared statistic was proposed for testing H_0: π_j = π_j0, where the π_j0 are fixed values.
In some applications, π_j0 = π_j0(θ) are functions of a smaller set of unknown parameters θ.
The ML estimate of θ determines ML estimates of {π_j0(θ)} and hence ML estimates of the expected frequencies in X².
Replacing the μ_j by these estimates affects the distribution of X²:
the true df = (c − 1) − dim(θ).

Slide 15: Example
A sample of 156 dairy calves born in Okeechobee County, Florida, was classified according to whether the calves caught pneumonia within 60 days of birth.
Calves that got a pneumonia infection were also classified according to whether they got a secondary infection within 2 weeks after the first infection cleared up.
Hypothesis: the primary infection had an immunizing effect that reduced the likelihood of a secondary infection.
How can we test it?

Slide 16: Data Structure
Calves that did not get a primary infection could not get a secondary infection, so no observations can fall in the category for "no" primary infection and "yes" secondary infection.
That combination is called a structural zero.

Slide 17
Test: whether the probability of primary infection was the same as the conditional probability of secondary infection, given that the calf got the primary infection.
If π_ab denotes the probability that a calf is classified in row a (primary infection) and column b (secondary infection) of this table, the null hypothesis is
  H_0: π_11 + π_12 = π_11 / (π_11 + π_12).
Let π = π_11 + π_12 denote the probability of primary infection. Then, under the null hypothesis, the cell probabilities are
  π_11 = π², π_12 = π(1 − π), π_22 = 1 − π.

Slide 18: MLE and Chi-Squared Test
Likelihood (kernel): (π²)^{n_11} [π(1 − π)]^{n_12} (1 − π)^{n_22}.
Log likelihood: L(π) = (2n_11 + n_12) log π + (n_12 + n_22) log(1 − π).
Differentiating with respect to π and setting the result to zero:
  (2n_11 + n_12)/π − (n_12 + n_22)/(1 − π) = 0.
Solution: π̂ = (2n_11 + n_12) / (2n_11 + 2n_12 + n_22).
For the example, the estimated expected counts for the three cells are nπ̂², nπ̂(1 − π̂), and n(1 − π̂); comparing these with the observed counts gives the Pearson X² with df = (3 − 1) − 1 = 1 (see the sketch below).
Conclusion: the primary infection had an immunizing effect that reduced the likelihood of a secondary infection.
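A sketch of this calculation in IML. The cell counts below (30 calves with both infections, 63 with a primary infection only, 63 with neither) are an assumption taken from Agresti's textbook version of this example, since the data table itself did not survive in the slide text:
proc iml;
  n11 = 30; n12 = 63; n22 = 63;                   /* assumed counts (Agresti's example) */
  n   = n11 + n12 + n22;                          /* 156 calves                         */
  pihat = (2*n11 + n12) / (2*n11 + 2*n12 + n22);  /* ML estimate of pi                  */
  obs = n11 || n12 || n22;
  mu  = n # ( pihat**2 || pihat*(1-pihat) || (1-pihat) );  /* estimated expected counts */
  X2  = sum( (obs - mu)##2 / mu );                /* Pearson statistic                  */
  df  = (3 - 1) - 1;                              /* (c-1) - dim(theta)                 */
  p   = 1 - cdf('CHISQUARE', X2, df);
  print pihat, mu, X2 df p;
quit;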

Slide 19: Standard Error
Since ∂²L/∂π² = −(2n_11 + n_12)/π² − (n_12 + n_22)/(1 − π)²,
the information is the negative of its expected value, which is
  nπ(1 + π)/π² + n(1 − π²)/(1 − π)²,
which simplifies to n(1 + π)/[π(1 − π)].
The asymptotic standard error is the square root of the inverse information, or
  SE(π̂) = √( π̂(1 − π̂) / [n(1 + π̂)] ).

Slide 20
How about confidence limits?
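One answer, as a sketch under the same assumed counts as above: a Wald interval takes the estimate plus or minus a normal quantile times the standard error from the previous slide.
proc iml;
  n11 = 30; n12 = 63; n22 = 63;                   /* assumed counts (Agresti's example) */
  n   = n11 + n12 + n22;
  pihat = (2*n11 + n12) / (2*n11 + 2*n12 + n22);
  SE    = sqrt( pihat*(1-pihat) / (n*(1+pihat)) );  /* inverse-information SE           */
  z     = quantile('NORMAL', 0.975);
  CI    = (pihat - z*SE) || (pihat + z*SE);       /* 95% Wald confidence limits         */
  print pihat SE CI;
quit;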

Slide 21: SAS Code – MLE and Tests for a Binomial
proc iml;
  y = 842; n = 1824; pi0 = 0.5;                /* data and null value       */
  pihat = y/n;                                 /* MLE                       */
  SE = sqrt(pihat*(1-pihat)/n);                /* standard error of the MLE */
  WaldStat = (pihat-pi0)**2/SE**2;             /* Wald statistic            */
  pWald = 1 - cdf('CHISQUARE', WaldStat, 1);
  LR = 2*( y*log(pihat/pi0) + (n-y)*log((1-pihat)/(1-pi0)) );  /* likelihood-ratio statistic */
  pLR = 1 - cdf('CHISQUARE', LR, 1);
  ScoreStat = (pihat-pi0)**2/(pi0*(1-pi0)/n);  /* score statistic           */
  pScore = 1 - cdf('CHISQUARE', ScoreStat, 1);
  print WaldStat pWald;
  print LR pLR;
  print ScoreStat pScore;
quit;

Slide 22: SAS Code – MLE, Test for Binomial (PROC FREQ)
data D;
  input outcome $ w;
  cards;
Yes 842
No 982
;
proc freq;
  weight w;
  tables outcome / all cl binomial(p=0.5 level="Yes");
  exact binomial;
run;

Slide 23: SAS Code – Multinomial
data D;
  input outcome $ w;
  cards;
yellow 6022
green 2001
;
proc freq;
  weight w;
  tables outcome / chisq testp=(0.25 0.75);
run;