categorical data analysis Chap STA517-3.ppt

STA 517 –Introduction: Distribution and Inference
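1.5 STATISTICAL INFERENCE FOR MULTINOMIAL PARAMETERS
Recall multi($n$, $\pi$), with $\pi = (\pi_1, \pi_2, \dots, \pi_c)$.
Suppose that each of $n$ independent, identical trials can have an outcome in any of $c$ categories. Let
$y_{ij} = 1$ if trial $i$ has outcome in category $j$,
$y_{ij} = 0$ otherwise.
Then $y_i = (y_{i1}, y_{i2}, \dots, y_{ic})$ represents a multinomial trial, with $\sum_j y_{ij} = 1$.
Let $n_j = \sum_i y_{ij}$ denote the number of trials having outcome in category $j$.
The counts $(n_1, n_2, \dots, n_c)$ have the multinomial distribution
$P(n_1, \dots, n_c) = \frac{n!}{n_1!\, n_2! \cdots n_c!}\, \pi_1^{n_1} \pi_2^{n_2} \cdots \pi_c^{n_c}$.
Note: the counts $\{n_j\}$ are random variables, with $\sum_j n_j = n$.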
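To make these definitions concrete, here is a minimal Python sketch (Python is used for all added examples; the slides' own code is SAS and MATLAB). The helper names `category_counts` and `multinomial_pmf` are illustrative, not from the course. The sketch forms $n_j = \sum_i y_{ij}$ from indicator vectors and evaluates the multinomial probability on the log scale so that a large $n$ does not overflow the factorials.

```python
import math

def category_counts(trials, c):
    """Turn indicator vectors y_i = (y_i1, ..., y_ic) into counts n_j = sum_i y_ij."""
    counts = [0] * c
    for y_i in trials:
        assert sum(y_i) == 1, "each trial falls in exactly one category"
        for j, y_ij in enumerate(y_i):
            counts[j] += y_ij
    return counts

def multinomial_pmf(counts, probs):
    """n! / (n_1! ... n_c!) * prod_j pi_j ** n_j, computed on the log scale."""
    if len(counts) != len(probs):
        raise ValueError("counts and probs must have the same length")
    if abs(sum(probs) - 1.0) > 1e-12:
        raise ValueError("cell probabilities must sum to 1")
    n = sum(counts)
    log_p = math.lgamma(n + 1)
    for n_j, pi_j in zip(counts, probs):
        log_p -= math.lgamma(n_j + 1)
        if n_j > 0:  # a zero count contributes pi_j ** 0 = 1
            log_p += n_j * math.log(pi_j)
    return math.exp(log_p)

counts = category_counts([(1, 0, 0), (0, 0, 1), (1, 0, 0)], 3)
print(counts, multinomial_pmf(counts, [0.5, 0.25, 0.25]))
```

For the three trials above, the counts are $(2, 0, 1)$, and the probability is $3 \times 0.5^2 \times 0.25 = 0.1875$.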

Example: Mendel’s theory
To test his theories of natural inheritance, Mendel crossed pea plants of a pure yellow strain with plants of a pure green strain.
He predicted that second-generation hybrid seeds would be 75% yellow and 25% green, yellow being the dominant strain.
One experiment produced $n = 8023$ seeds, of which $n_1 = 6022$ were yellow and $n_2 = 2001$ were green.
He wanted to test whether the counts follow the 3:1 ratio.

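1.5.1 Estimation of Multinomial Parameters
To obtain the MLE, note that the multinomial probability mass function is proportional to the kernel
$\prod_j \pi_j^{n_j}$, where all $\pi_j \ge 0$ and $\sum_j \pi_j = 1$. (1.14)
The MLEs are the $\{\pi_j\}$ that maximize (1.14).
Log likelihood (substituting $\pi_c = 1 - \sum_{j=1}^{c-1} \pi_j$ to remove the redundancy):
$L(\pi) = \sum_{j=1}^{c} n_j \log \pi_j = \sum_{j=1}^{c-1} n_j \log \pi_j + n_c \log\left(1 - \sum_{j=1}^{c-1} \pi_j\right)$
Differentiating $L$ with respect to $\pi_j$ gives the likelihood equation
$\frac{\partial L}{\partial \pi_j} = \frac{n_j}{\pi_j} - \frac{n_c}{\pi_c} = 0, \quad j = 1, \dots, c-1.$
The ML solution satisfies $\hat\pi_j / \hat\pi_c = n_j / n_c$.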

MLE
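Now $\sum_j \hat\pi_j = \hat\pi_c \left(\sum_j n_j\right) / n_c = \hat\pi_c\, n / n_c = 1$.
Thus $\hat\pi_c = n_c / n$, and then $\hat\pi_j = n_j / n$.
MLE: the MLEs are the sample proportions.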
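As a quick numerical check of this result, the sketch below uses Python with the Mendel counts from the example. It computes the sample proportions and confirms that moving probability between the two cells lowers the log-likelihood kernel. The helper names are hypothetical.

```python
import math

def multinomial_mle(counts):
    """ML estimates pi_hat_j = n_j / n (the sample proportions)."""
    n = sum(counts)
    return [n_j / n for n_j in counts]

def log_kernel(counts, probs):
    """Log-likelihood kernel L(pi) = sum_j n_j log pi_j (empty cells contribute 0)."""
    return sum(n_j * math.log(p_j) for n_j, p_j in zip(counts, probs) if n_j > 0)

counts = [6022, 2001]  # Mendel: yellow, green
pi_hat = multinomial_mle(counts)
print([round(p, 4) for p in pi_hat])  # [0.7506, 0.2494]

# Perturbing the estimate in either direction lowers the kernel,
# consistent with pi_hat being the maximizer.
best = log_kernel(counts, pi_hat)
for eps in (1e-3, -1e-3):
    perturbed = [pi_hat[0] + eps, pi_hat[1] - eps]
    assert log_kernel(counts, perturbed) < best
```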

1.5.2 Pearson Statistic for Testing a Specified Multinomial
In 1900 the eminent British statistician Karl Pearson
introduced a hypothesis test that was one of the first
inferential methods.
It had a revolutionary impact on categorical data
analysis, which had focused on describing associations.
Pearson’s test evaluates whether multinomial
parameters equal certain specified values.

Pearson Statistic
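Consider $H_0: \pi_j = \pi_{j0}$, $j = 1, \dots, c$, where $\sum_j \pi_{j0} = 1$.
When $H_0$ is true, the expected values of $\{n_j\}$, called expected frequencies, are $\mu_j = n\pi_{j0}$, $j = 1, \dots, c$.
Pearson proposed the test statistic
$X^2 = \sum_j \frac{(n_j - \mu_j)^2}{\mu_j}$.
Greater differences $\{n_j - \mu_j\}$ produce greater $X^2$ values, for fixed $n$.
Let $X_o^2$ denote the observed value of $X^2$. The P-value is the null probability $P(X^2 \ge X_o^2)$. For large $n$, it is approximated using the chi-squared distribution with df $= c - 1$.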
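The following Python sketch implements the Pearson test, assuming the large-sample chi-squared approximation with df $= c - 1$. So that the block needs only the standard library, the tail probability comes from a hand-written incomplete-gamma series. The helper `chi2_sf` is hypothetical; `scipy.stats.chi2.sf` would be the usual library choice.

```python
import math

def chi2_sf(x, df):
    """P(chi^2_df >= x) via the series for the regularized lower incomplete
    gamma function; adequate for the moderate x values in these examples."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    k = 1
    while term > 1e-16 * total:
        term *= z / (a + k)
        total += term
        k += 1
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * total
    return min(1.0, max(0.0, 1.0 - lower))

def pearson_test(counts, pi0):
    """Pearson X^2 for H0: pi_j = pi_j0, with large-sample df = c - 1."""
    n = sum(counts)
    expected = [n * p for p in pi0]  # mu_j = n * pi_j0
    x2 = sum((n_j - mu_j) ** 2 / mu_j for n_j, mu_j in zip(counts, expected))
    df = len(counts) - 1
    return x2, df, chi2_sf(x2, df)

print(pearson_test([6022, 2001], [0.75, 0.25]))
```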

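1.5.3 Example: Testing Mendel's Theories
Data: $n_1 = 6022$ yellow, $n_2 = 2001$ green ($n = 8023$).
MLE: $\hat\pi_1 = 6022/8023 = 0.7506$, $\hat\pi_2 = 2001/8023 = 0.2494$.
Test whether the counts follow the 3:1 ratio, i.e. $H_0: \pi_{10} = 0.75,\ \pi_{20} = 0.25$.
Expected frequencies are $\mu_1 = 8023(0.75) = 6017.25$ and $\mu_2 = 8023(0.25) = 2005.75$, giving
$X^2 = \frac{(6022 - 6017.25)^2}{6017.25} + \frac{(2001 - 2005.75)^2}{2005.75} = 0.015$, df $= 1$, P-value $\approx 0.90$.
This does not contradict Mendel's hypothesis.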
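The arithmetic above can be reproduced with a few lines of Python. This is a sketch that uses the fact that, for df $= 1$, the chi-squared upper tail is $P(\chi^2_1 \ge x) = \mathrm{erfc}(\sqrt{x/2})$, so no statistics library is needed.

```python
import math

n_yellow, n_green = 6022, 2001
n = n_yellow + n_green  # 8023 seeds
pi0_yellow, pi0_green = 0.75, 0.25  # H0: 3:1 ratio

mu_yellow, mu_green = n * pi0_yellow, n * pi0_green  # 6017.25, 2005.75
x2 = ((n_yellow - mu_yellow) ** 2 / mu_yellow
      + (n_green - mu_green) ** 2 / mu_green)

# c = 2, so df = 1 and the upper tail is erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(x2 / 2))
print(f"X2 = {x2:.3f}, P = {p_value:.2f}")  # X2 = 0.015, P = 0.90
```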

SAS code
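data D;
  input outcome $ w;
  cards;
yellow 6022
green 2001
;
proc freq data=D;
  weight w;
  /* levels are ordered alphabetically by default (green, yellow),
     so TESTP= lists the green probability first */
  tables outcome / chisq testp=(0.25 0.75);
run;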

Pearson statistic
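When $c = 2$, it can be proved that the Pearson chi-squared statistic is the squared score statistic. With $y = n_1$ and $z_S = (\hat\pi - \pi_0)/\sqrt{\pi_0(1 - \pi_0)/n}$,
$X^2 = \frac{(y - n\pi_0)^2}{n\pi_0} + \frac{[(n - y) - n(1 - \pi_0)]^2}{n(1 - \pi_0)} = \frac{(y - n\pi_0)^2}{n\pi_0(1 - \pi_0)} = z_S^2$.
PROOF: symbolic simplification (Maple engine in MATLAB):
syms y n pi0
f = (y - n*pi0)^2/(n*pi0) + ((n - y) - n*(1 - pi0))^2/(n*(1 - pi0));  % Pearson X^2 for c = 2
f1 = simplify(f)
% result (printed form may vary by version): -(-y+pi0*n)^2/n/pi0/(-1+pi0)
% i.e. (y - n*pi0)^2/(n*pi0*(1 - pi0)), the squared score statistic
How about $c > 2$?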
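A numerical counterpart to the symbolic proof: this Python sketch evaluates both sides of the identity for a few $(y, n, \pi_0)$ triples and checks that they agree to rounding error. The triples are arbitrary illustrations, except that two of them reuse data from these slides.

```python
import math

def pearson_x2_binary(y, n, pi0):
    """Pearson X^2 for c = 2 categories with counts (y, n - y)."""
    mu1, mu2 = n * pi0, n * (1 - pi0)
    return (y - mu1) ** 2 / mu1 + ((n - y) - mu2) ** 2 / mu2

def score_z(y, n, pi0):
    """Score statistic z_S = (pi_hat - pi0) / sqrt(pi0 (1 - pi0) / n)."""
    pi_hat = y / n
    return (pi_hat - pi0) / math.sqrt(pi0 * (1 - pi0) / n)

for y, n, pi0 in [(6022, 8023, 0.75), (842, 1824, 0.5), (3, 20, 0.3)]:
    x2 = pearson_x2_binary(y, n, pi0)
    zs2 = score_z(y, n, pi0) ** 2
    assert math.isclose(x2, zs2, rel_tol=1e-9), (y, n, pi0)
    print(y, n, pi0, round(x2, 6), round(zs2, 6))
```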

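1.5.5 Likelihood-Ratio Chi-Squared
An alternative test for multinomial parameters uses the likelihood-ratio test.
The kernel of the multinomial likelihood is $\prod_j \pi_j^{n_j}$.
Under $H_0$ the likelihood is maximized when $\hat\pi_j = \pi_{j0}$.
In the general case, it is maximized when $\hat\pi_j = n_j / n$.
The ratio of the likelihoods equals
$\Lambda = \frac{\prod_j \pi_{j0}^{n_j}}{\prod_j (n_j / n)^{n_j}}$.
Thus, the likelihood-ratio statistic is
$G^2 = -2 \log \Lambda = 2 \sum_j n_j \log\left(\frac{n_j}{n\pi_{j0}}\right)$.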

LR
In the general case, the parameter space consists of $\{\pi_j\}$ subject to $\sum_j \pi_j = 1$, so the dimensionality is $c - 1$.
Under $H_0$, the $\{\pi_j\}$ are specified completely, so the dimension is 0. The difference in these dimensions equals $c - 1$.
For large $n$, $G^2$ has a chi-squared null distribution with df $= c - 1$.
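Applied to the Mendel data, $G^2$ can be computed with the following Python sketch (the function name is illustrative). Empty cells are skipped because $n_j \log n_j \to 0$ as $n_j \to 0$, and the erfc tail formula used here is valid only for df $= 1$.

```python
import math

def likelihood_ratio_g2(counts, pi0):
    """G^2 = 2 * sum_j n_j * log(n_j / (n * pi_j0)); empty cells contribute 0."""
    n = sum(counts)
    return 2 * sum(n_j * math.log(n_j / (n * p))
                   for n_j, p in zip(counts, pi0) if n_j > 0)

counts, pi0 = [6022, 2001], [0.75, 0.25]  # Mendel data, H0: 3:1
g2 = likelihood_ratio_g2(counts, pi0)
df = len(counts) - 1  # (c - 1) - 0: no free parameters under H0
p_value = math.erfc(math.sqrt(g2 / 2))  # chi-squared upper tail, df = 1 only
print(f"G2 = {g2:.3f}, df = {df}, P = {p_value:.2f}")  # G2 = 0.015, df = 1, P = 0.90
```

Here $G^2$ and $X^2$ agree to about three decimal places, as expected when the observed counts are close to the null expected frequencies.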

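For large $n$, both $X^2$ and $G^2$ have chi-squared null distributions with df $= c - 1$.
They are asymptotically equivalent: under $H_0$, $X^2 - G^2$ converges in probability to 0 as $n$ grows.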
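A deterministic illustration of this equivalence, sketched in Python: under $H_0$ the deviations $n_j - \mu_j$ are of order $\sqrt{n}$. The sketch therefore uses counts that deviate from $n\pi_{j0}$ by exactly $\pm\sqrt{n}$, an illustrative and non-integer choice rather than real data. With these counts $X^2$ stays fixed at $16/3$ while $G^2$ approaches it.

```python
import math

def pearson_x2(counts, pi0):
    n = sum(counts)
    return sum((n_j - n * p) ** 2 / (n * p) for n_j, p in zip(counts, pi0))

def g2(counts, pi0):
    n = sum(counts)
    return 2 * sum(n_j * math.log(n_j / (n * p))
                   for n_j, p in zip(counts, pi0) if n_j > 0)

pi0 = [0.75, 0.25]
diffs = []
for n in (100, 10_000, 1_000_000):
    shift = math.sqrt(n)  # deviation of order sqrt(n), as under H0
    counts = [0.75 * n + shift, 0.25 * n - shift]
    x2, lr = pearson_x2(counts, pi0), g2(counts, pi0)
    diffs.append(abs(lr - x2))
    print(f"n = {n:>9}: X2 = {x2:.4f}, G2 = {lr:.4f}")
```

The gap shrinks roughly like $1/\sqrt{n}$, because the leading difference term is $-\tfrac{1}{3}\sum_j (n_j - \mu_j)^3 / \mu_j^2$.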

Wu, Ma, George (2007)

1.5.6 Testing with Estimated Expected Frequencies
Pearson's chi-squared statistic was proposed for testing $H_0: \pi_j = \pi_{j0}$, where the $\{\pi_{j0}\}$ are fixed.
In some applications, $\pi_{j0} = \pi_{j0}(\theta)$ are functions of a small set of unknown parameters $\theta$.
ML estimates $\hat\theta$ of $\theta$ determine ML estimates $\{\pi_{j0}(\hat\theta)\}$ of $\{\pi_{j0}\}$, and hence ML estimates $\{\hat\mu_j = n\pi_{j0}(\hat\theta)\}$ of the expected frequencies in $X^2$.
Replacing $\{\mu_j\}$ by the estimates $\{\hat\mu_j\}$ affects the distribution of $X^2$:
the true df $= (c - 1) - \dim(\theta)$.

Example
A sample of 156 dairy calves born in Okeechobee
County, Florida, were classified according to whether
they caught pneumonia within 60 days of birth.
Calves that got a pneumonia infection were also
classified according to whether they got a secondary
infection within 2 weeks after the first infection cleared
up.
Hypothesis: the primary infection had an immunizing
effect that reduced the likelihood of a secondary
infection.
How to test it?

Data structure
Calves that did not get a primary infection could not get
a secondary infection, so no observations can fall in the
category for ‘‘no’’ primary infection and ‘‘yes’’
secondary infection.
That combination is called a structural zero.

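Test whether the probability of primary infection was the same as the conditional probability of secondary infection, given that the calf got the primary infection.
If $\pi_{ab}$ denotes the probability that a calf is classified in row $a$ and column $b$ of this table, the null hypothesis is
$H_0: \pi_{11} + \pi_{12} = \pi_{11} / (\pi_{11} + \pi_{12})$, or equivalently $\pi_{11} = (\pi_{11} + \pi_{12})^2$.
Let $\pi = \pi_{11} + \pi_{12}$ denote the probability of primary infection. Then the hypothesized probabilities are
$\pi_{11} = \pi^2$, $\pi_{12} = \pi(1 - \pi)$, $\pi_{22} = 1 - \pi$ (with $\pi_{21} = 0$, the structural zero).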

MLE and chi-squared test
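Likelihood: $(\pi^2)^{n_{11}}\, [\pi(1 - \pi)]^{n_{12}}\, (1 - \pi)^{n_{22}}$
Log likelihood: $L(\pi) = n_{11} \log \pi^2 + n_{12} \log[\pi(1 - \pi)] + n_{22} \log(1 - \pi)$
Differentiation with respect to $\pi$ gives the likelihood equation
$\frac{2n_{11} + n_{12}}{\pi} - \frac{n_{12} + n_{22}}{1 - \pi} = 0$.
Solution: $\hat\pi = \frac{2n_{11} + n_{12}}{2n_{11} + 2n_{12} + n_{22}}$.
For the example, the expected counts for each cell are estimated by
$\hat\mu_{11} = n\hat\pi^2$, $\hat\mu_{12} = n\hat\pi(1 - \hat\pi)$, $\hat\mu_{22} = n(1 - \hat\pi)$,
and $X^2$ is compared with a chi-squared distribution with df $= (3 - 1) - 1 = 1$.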
Conclusion: the primary infection had an immunizing effect that
reduced the likelihood of a secondary infection.
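The calf computation can be sketched in Python. The cell counts are an ASSUMPTION, because the table did not survive in this copy of the slides. The values used ($n_{11} = 30$, $n_{12} = 63$, $n_{22} = 63$, which sum to the stated 156 calves) are believed to match the original table; replace them if they differ.

```python
import math

# ASSUMPTION: counts believed to match the calf table, which is missing here.
n11, n12, n22 = 30, 63, 63  # (yes, yes), (yes, no), (no, no)
n = n11 + n12 + n22  # 156 calves

# MLE from the likelihood equation (2 n11 + n12)/pi - (n12 + n22)/(1 - pi) = 0
pi_hat = (2 * n11 + n12) / (2 * n11 + 2 * n12 + n22)

# Fitted expected counts under H0: pi11 = pi^2, pi12 = pi(1 - pi), pi22 = 1 - pi
expected = [n * pi_hat ** 2, n * pi_hat * (1 - pi_hat), n * (1 - pi_hat)]
observed = [n11, n12, n22]
x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# df = (c - 1) - dim(theta) = (3 - 1) - 1 = 1, so P = erfc(sqrt(X2 / 2))
df = (3 - 1) - 1
p_value = math.erfc(math.sqrt(x2 / 2))
print(f"pi_hat = {pi_hat:.3f}")
print("expected:", [round(e, 1) for e in expected])
print(f"X2 = {x2:.1f}, df = {df}, P = {p_value:.1e}")
```

Comparing observed with expected counts shows where the misfit lies: many more calves had a primary infection without a secondary one than $H_0$ predicts.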

Standard Error
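Since
$\frac{\partial^2 L}{\partial \pi^2} = -\frac{2n_{11} + n_{12}}{\pi^2} - \frac{n_{12} + n_{22}}{(1 - \pi)^2}$,
the information is the negative of its expected value, which is
$\frac{n[2\pi^2 + \pi(1 - \pi)]}{\pi^2} + \frac{n[\pi(1 - \pi) + (1 - \pi)]}{(1 - \pi)^2}$,
which simplifies to $\frac{n(1 + \pi)}{\pi(1 - \pi)}$.
The asymptotic standard error is the square root of the inverse information, or
$\mathrm{SE}(\hat\pi) = \sqrt{\frac{\pi(1 - \pi)}{n(1 + \pi)}}$.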
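The simplification step can be checked numerically. This Python sketch evaluates the unsimplified expected information, using $E(2n_{11} + n_{12}) = n[2\pi^2 + \pi(1 - \pi)]$ and $E(n_{12} + n_{22}) = n[\pi(1 - \pi) + (1 - \pi)]$, and compares it with $n(1 + \pi)/[\pi(1 - \pi)]$ at a few arbitrary values of $\pi$. The function names are illustrative.

```python
import math

def expected_information(pi, n):
    """E(2 n11 + n12) / pi^2 + E(n12 + n22) / (1 - pi)^2 under H0."""
    e_a = n * (2 * pi ** 2 + pi * (1 - pi))  # E(2 n11 + n12)
    e_b = n * (pi * (1 - pi) + (1 - pi))  # E(n12 + n22)
    return e_a / pi ** 2 + e_b / (1 - pi) ** 2

def asymptotic_se(pi, n):
    """Square root of the inverse of n(1 + pi) / [pi (1 - pi)]."""
    return math.sqrt(pi * (1 - pi) / (n * (1 + pi)))

n = 156  # number of calves in the example
for pi in (0.1, 0.494, 0.9):
    simplified = n * (1 + pi) / (pi * (1 - pi))
    assert math.isclose(expected_information(pi, n), simplified, rel_tol=1e-12)
    assert math.isclose(asymptotic_se(pi, n), 1 / math.sqrt(simplified), rel_tol=1e-12)
```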

How about confidence limits?
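One possible answer is a Wald interval, $\hat\pi \pm z_{0.975}\,\mathrm{SE}(\hat\pi)$, with the standard error evaluated at $\hat\pi$. This is chosen here only for illustration; score or likelihood-ratio intervals are common alternatives. The Python sketch below reuses the ASSUMED calf counts $n_{11} = 30$, $n_{12} = 63$, $n_{22} = 63$, since the original table is missing from this copy.

```python
import math

def wald_interval(pi_hat, n, z=1.959963984540054):
    """pi_hat +/- z * SE, with SE = sqrt(pi(1 - pi) / [n(1 + pi)]) at pi_hat.
    The default z gives a nominal 95% interval."""
    se = math.sqrt(pi_hat * (1 - pi_hat) / (n * (1 + pi_hat)))
    return pi_hat - z * se, pi_hat + z * se

# ASSUMPTION: calf-table counts (n = 156); not reproduced on these slides.
n11, n12, n22 = 30, 63, 63
pi_hat = (2 * n11 + n12) / (2 * n11 + 2 * n12 + n22)
lo, hi = wald_interval(pi_hat, n11 + n12 + n22)
print(f"95% Wald CI for pi: ({lo:.3f}, {hi:.3f})")
```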

SAS code -MLE, test for binomial
proc iml;
  y = 842; n = 1824; pi0 = 0.5;                  /* data and null value */
  pihat = y/n;                                    /* MLE */
  SE = sqrt(pihat*(1-pihat)/n);                   /* SE evaluated at the MLE */
  WaldStat = (pihat-pi0)**2/SE**2;
  pWald = 1 - cdf('CHISQUARE', WaldStat, 1);
  LR = 2*(y*log(pihat/pi0) + (n-y)*log((1-pihat)/(1-pi0)));
  pLR = 1 - cdf('CHISQUARE', LR, 1);
  ScoreStat = (pihat-pi0)**2/(pi0*(1-pi0)/n);     /* SE evaluated at pi0 */
  pScore = 1 - cdf('CHISQUARE', ScoreStat, 1);
  print WaldStat pWald;
  print LR pLR;
  print ScoreStat pScore;
quit;
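For readers without SAS/IML, a Python transcription of the same three statistics might look like the sketch below. The helper `chi2_1_sf` is hypothetical and stands in for `1 - CDF('CHISQUARE', x, 1)`, using $P(\chi^2_1 \ge x) = \mathrm{erfc}(\sqrt{x/2})$.

```python
import math

y, n, pi0 = 842, 1824, 0.5  # same data as the IML program

pi_hat = y / n  # MLE
se = math.sqrt(pi_hat * (1 - pi_hat) / n)  # SE evaluated at the MLE

wald = (pi_hat - pi0) ** 2 / se ** 2
lr = 2 * (y * math.log(pi_hat / pi0) + (n - y) * math.log((1 - pi_hat) / (1 - pi0)))
score = (pi_hat - pi0) ** 2 / (pi0 * (1 - pi0) / n)  # SE evaluated at pi0

def chi2_1_sf(x):
    """Upper-tail probability of chi-squared with df = 1."""
    return math.erfc(math.sqrt(x / 2))

for name, stat in (("Wald", wald), ("LR", lr), ("Score", score)):
    print(f"{name:5s} {stat:8.3f}  P = {chi2_1_sf(stat):.4f}")
```

Here $\hat\pi \approx 0.462$, so $\hat\pi(1 - \hat\pi) < \pi_0(1 - \pi_0) = 0.25$. As a result, the Wald statistic is slightly larger than the score statistic, and the likelihood-ratio value falls between them.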

SAS code -MLE, test for binomial
data D;
  input outcome $ w;
  cards;
Yes 842
No 982
;
proc freq data=D;
  weight w;
  tables outcome / all cl binomial(p=0.5 level="Yes");
  exact binomial;
run;
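The `exact binomial;` statement requests an exact test. The Python sketch below illustrates that idea for $H_0: \pi = 0.5$ using exact integer arithmetic (`math.comb`, Python 3.8+). With $\pi_0 = 1/2$ the null distribution is symmetric, so the two-sided P-value here doubles the smaller tail. The function names are hypothetical, and SAS's exact two-sided definition should be checked in its documentation rather than assumed identical.

```python
from math import comb

def binom_cdf_half(y, n):
    """P(Y <= y) for Y ~ Binomial(n, 1/2), computed with exact integers."""
    return sum(comb(n, k) for k in range(y + 1)) / 2 ** n

def exact_two_sided_p(y, n):
    """Two-sided exact P-value for H0: pi = 1/2 (doubling the smaller tail)."""
    lower = binom_cdf_half(y, n)
    upper = binom_cdf_half(n - y, n)  # P(Y >= y) = P(Y <= n - y) when pi = 1/2
    return min(1.0, 2 * min(lower, upper))

p_exact = exact_two_sided_p(842, 1824)
print(f"exact two-sided P = {p_exact:.5f}")
```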

SAS code –multinomial
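data D;
  input outcome $ w;
  cards;
yellow 6022
green 2001
;
proc freq data=D;
  weight w;
  /* levels are ordered alphabetically by default (green, yellow),
     so TESTP= lists the green probability first */
  tables outcome / chisq testp=(0.25 0.75);
run;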
