Hypothesis testing for nonparametric data

1,536 views 45 slides Aug 09, 2021


Hypothesis testing for nonparametric data
By Emmanuel BIKORIMANA

Overview of most common statistical tests

Numerical data
                          One group                   Two groups (dependent)    Two groups (independent)  >= 3 groups
Normally distributed      One-sample t-test / Z test  Paired two-sample t-test  t-test                    ANOVA
Not normally distributed  Sign test                   Wilcoxon                  Mann-Whitney (Wilcoxon)   Kruskal-Wallis

Categorical data (binary, nominal, ordinal)
Tests: Z-test for proportion, Chi-square goodness of fit, McNemar test, Chi-square, Chi-square for trend

How to check for normality?
•Several methods
•Look at the similarity between mean and median
•Graphical methods (Eyeballing)
•Statistical tests

Look at the mean and median
Statistic  Variable 1  Variable 2
mean       100         130
median     101         100
sd         20          25
Variable 1 is plausibly normally distributed: mean ≈ median.
Variable 2 is not normally distributed: mean is not equal to median.
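The mean-versus-median comparison above can be sketched in a few lines of Python. The two samples below are hypothetical values invented for illustration, and the helper name `mean_median_gap` is an assumption, not something from the slides. A small gap only suggests symmetry; it is a rough screen, not proof of normality.

```python
from statistics import mean, median, stdev

def mean_median_gap(values):
    """Return (mean - median) expressed in standard-deviation units."""
    return (mean(values) - median(values)) / stdev(values)

# Hypothetical samples, invented for illustration only.
roughly_symmetric = [82, 90, 95, 98, 100, 101, 103, 106, 110, 118]
right_skewed = [85, 90, 92, 95, 98, 100, 104, 110, 180, 320]

gap_symmetric = mean_median_gap(roughly_symmetric)
gap_skewed = mean_median_gap(right_skewed)

# A gap near 0 is consistent with a symmetric distribution;
# a clearly positive gap points to a long right tail.
print(round(gap_symmetric, 2), round(gap_skewed, 2))
```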

Graphical methods (eyeballing)
Histogram
[Figure: density histograms of bp_before with a normal curve overlaid, one panel per age group (30-45, 46-59, 60+); x-axis from 140 to 200]

In a clinical trial participants were asked to rate their symptom severity following 6
weeks on the assigned treatment. Symptom severity was measured on a 5 point
ordinal scale with response options: Symptoms got much worse, slightly worse, no
change, slightly improved, or much improved. There are a total of n=20
participants in the trial, randomized to an experimental treatment or placebo, and
the outcome data are distributed as shown in the figure below.

Table of Parametric & Nonparametric Tests
Parametric Test                  Nonparametric Test                      Purpose of Test
Two-sample t-test (either case)  Mann-Whitney/Wilcoxon Rank Sum Test     Compare two independent samples
Paired t-test                    Sign Test or Wilcoxon Signed-Rank Test  Compare dependent samples
One-way ANOVA                    Kruskal-Wallis Test                     Compare ≥ three (k) independent samples

1.Independent samples
Mann-Whitney/Wilcoxon Rank Sum Test
•Alternative to two-sample t-Test
•Use when…
-populations being sampled are not normally
distributed.
-sample sizes are small so assessing normality is
not possible (n_i < 20).
-response is ordinal

Mann-Whitney/Wilcoxon Rank Sum Test
General Hypotheses
Ho: distribution of pop. A and pop. B are the
same, i.e. A = B
HA: distribution of pop. A and pop. B are NOT
the same, i.e. A ≠ B
HA: distribution of pop. A is shifted to the right
of pop. B, i.e. A > B
HA: distribution of pop. A is shifted to the left of
pop. B, i.e. A < B

Mann-Whitney/Wilcoxon Rank Sum Test
Ho: A = B vs. HA: A > B
Q: Is there evidence that the values in
population A are generally larger than
those in population B?

Mann-Whitney/Wilcoxon Rank Sum Test
(Test Procedure)
1.Rank all N = n_A + n_B observations in the combined
sample from both populations in ascending order.
Assign the average rank to tied observations.
2.Sum the ranks of the observations from populations A
and B separately and denote the sums w_A and w_B.
3.For HA: A < B reject Ho if w_A is “small” or w_B is “big”.
For HA: A > B reject Ho if w_A is “big” or w_B is “small”.
4.Use tables to determine how “big” or “small” the rank
sums must be in order to reject Ho, or use software to
conduct the test.

Mann-Whitney/Wilcoxon Rank Sum Test
(Critical Value Table)
This table contains the value the smaller rank sum must be
less than in order to reject Ho in a one-tailed test,
for two significance levels (α = .05 & .01).
Tables exist for the two-tailed tests as well.
n is the sample size of the group with the smaller rank sum.

Example: Huntington’s Disease and
Fasting Glucose Levels
Davidson et al. studied the responses to oral
glucose in patients with Huntington’s disease and
in a group of control subjects. The five-hour
responses are shown below. Is there evidence to
suggest the five-hour glucose (mg percent) is
greater for patients with Huntington’s disease?
Ho: Control = Huntington’s i.e. C = H
HA: Control < Huntington’s i.e. C < H

Example: Observations & Ranks
Control Group (n_A = 10)    Huntington’s Disease (n_B = 11)
Value  Rank                 Value  Rank
83     9                    85     10.5
73     3                    89     15
65     1.5                  86     13
65     1.5                  91     17
90     16                   77     5.5
77     5.5                  93     19
78     7                    100    21
97     20                   82     8
85     10.5                 92     18
75     4                    86     13
                            86     13
w_A = 78                    w_B = 153
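The ranking and rank sums for this example can be reproduced with a short script. This is a minimal sketch using only the standard library; the helper `average_ranks` is a name introduced here for illustration, and it gives tied observations the average of the positions they occupy, as the test procedure describes.

```python
def average_ranks(values):
    """Rank values in ascending order; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # average of 1-based positions start+1 .. end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks

# Five-hour glucose responses from the example.
control = [83, 73, 65, 65, 90, 77, 78, 97, 85, 75]
huntington = [85, 89, 86, 91, 77, 93, 100, 82, 92, 86, 86]

ranks = average_ranks(control + huntington)
w_control = sum(ranks[:len(control)])
w_huntington = sum(ranks[len(control):])
print(w_control, w_huntington)
```

The two sums always add up to N(N+1)/2, here 21 × 22 / 2 = 231, which is a quick check on the ranking.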

Example: Critical Value Table
Here,
n_C = 10 (control)
n_H = 11 (Huntington’s)
we will reject
Ho: C = H
in favor of
HA: C < H
if the rank sum for the control group is less than
86 at the α = .05 level and less than 77 at the
α = .01 level.

Example: Decision/Conclusion
Using the Wilcoxon Rank Sum Test we have
evidence to suggest that the five hour glucose
level for individuals with Huntington’s disease is
greater than that for healthy controls (p < .05).
Note: p < .05 because the observed rank sum for
the control group is less than 86, which is the
critical value for α = .05.
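When no critical value table is at hand, a large-sample normal approximation to the rank sum gives a rough p-value. The sketch below is not the table lookup used in the slides; it assumes the standard mean n_C(N+1)/2 and variance n_C n_H (N+1)/12 of the rank sum under Ho, and it applies neither a tie correction nor a continuity correction.

```python
import math

n_c, n_h = 10, 11          # control and Huntington's sample sizes
w_c = 78                   # observed rank sum for the control group
N = n_c + n_h

mean_w = n_c * (N + 1) / 2            # expected control rank sum under Ho
var_w = n_c * n_h * (N + 1) / 12      # variance under Ho (no tie correction)
z = (w_c - mean_w) / math.sqrt(var_w)

# Lower-tail probability of the standard normal, because HA: C < H
# predicts a small control rank sum.
p_one_sided = 0.5 * (1 + math.erf(z / math.sqrt(2)))
print(round(z, 3), round(p_one_sided, 4))
```

Under these assumptions the approximate p-value falls between .01 and .05, in line with the table-based conclusion.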

2.Dependent Samples
•Sign Test
•Wilcoxon Signed-Rank Test

Sign Test
•The sign test can be used in place of the paired
t-test when we have evidence that the paired
differences are NOT normally distributed.
•It can be used when the response is ordinal.
•Best used when the response is difficult to
quantify and only improvement can be measured,
i.e. subject got better, got worse, or no change.
•Magnitude of the paired difference is lost when
using this test.

Sign Test
•The sign test looks at the number of (+) and (-)
differences amongst the nonzero paired
differences.
•A preponderance of +’s or –’s can indicate that
some type of change has occurred.
•If the null hypothesis of no change is true we
expect +’s and –’s to be equally likely to occur,
i.e. P(+) = P(-) = .50 and the number of each
observed follows a binomial distribution.

Example: Sign Test
Consider a clinical investigation to assess the
effectiveness of a new drug designed to reduce
repetitive behaviors in children affected with
autism. If the drug is effective, children will
exhibit fewer repetitive behaviors on treatment
as compared to when they are untreated. A total
of 8 children with autism enroll in the study.
Each child is observed by the study psychologist
for a period of 3 hours both before treatment
and then again after taking the new drug for 1
week.

•The time that each child is engaged in repetitive
behavior during each 3 hour observation period is
measured. Repetitive behavior is scored on a scale of 0
to 100 and scores represent the percent of the
observation time in which the child is engaged in
repetitive behavior. For example, a score of 0 indicates
that during the entire observation period the child did
not engage in repetitive behavior while a score of 100
indicates that the child was constantly engaged in
repetitive behavior. The data are shown below.

Example: Sign Test
Child  Before treatment  After 1 week of treatment  Difference (Before - After)  Sign
1      85                75                         10                           +
2      70                50                         20                           +
3      40                50                         -10                          -
4      65                40                         25                           +
5      80                20                         60                           +
6      75                65                         10                           +
7      55                40                         15                           +
8      20                25                         -5                           -

•Critical values for the Sign Test are tabulated (see the
Critical Values for the Sign Test table).
•To determine the appropriate critical value we need
the sample size, which is equal to the number of
matched pairs (n=8), and our one-sided level of
significance α=0.05. For this example, the critical
value is 1, and the decision rule is to reject H0 if the
smaller of the number of positive or negative
signs is ≤ 1. We do not reject H0 because 2 > 1. We
do not have sufficient evidence at α=0.05 to show
that there is improvement in repetitive behavior
after taking the drug as compared to before.
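Because the number of signs follows a binomial distribution with P(+) = P(-) = .50 under the null hypothesis, an exact one-sided p-value can be computed directly instead of using a table. The sketch below uses the before/after scores from the example; the variable names are introduced here for illustration.

```python
from math import comb

before = [85, 70, 40, 65, 80, 75, 55, 20]
after = [75, 50, 50, 40, 20, 65, 40, 25]

# Keep only nonzero paired differences (Before - After).
diffs = [b - a for b, a in zip(before, after) if b != a]
n = len(diffs)
n_minus = sum(d < 0 for d in diffs)
n_plus = n - n_minus
smaller = min(n_plus, n_minus)

# One-sided p-value: probability of seeing `smaller` or fewer of the
# rarer sign when each sign has probability 1/2.
p_one_sided = sum(comb(n, k) for k in range(smaller + 1)) / 2 ** n
print(n_plus, n_minus, round(p_one_sided, 4))
```

With 6 plus signs and 2 minus signs the p-value is 37/256, well above 0.05, which agrees with the decision not to reject H0.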

Wilcoxon Signed-Rank Test
•The problem with the sign test is that the
magnitude or size of the paired differences is lost.
•The Wilcoxon Signed-Rank Test uses ranks of
the paired differences to retain some sense of
their size.
•Use when the distribution of the paired
differences is NOT normal or when sample size
is small.
•Can be used with an ordinal response.

Wilcoxon Signed Rank Test
(Test Procedure)
•Exclude any differences which are zero.
•Put the rest of differences in ascending
order ignoring their signs.
•Assign them ranks.
•If any differences are equal, average
their ranks.

Example: Wilcoxon Signed Rank Test
Resting Energy Expenditure (REE) for
Patient with Cystic Fibrosis
•A researcher believes that patients with cystic
fibrosis (CF) expend greater energy during
resting than those without CF. To obtain a fair
comparison she matches 13 patients with CF to
13 patients without CF on the basis of age, sex,
height, and weight.

Example: Wilcoxon Signed Rank Test
Pair  CF (C)  Healthy (H)  d = C - H  Sign  |d|  Rank |d|  Signed Rank
1     1153    996          157        +     157  6         6
2     1132    1080         52         +     52   3         3
3     1165    1182         -17        -     17   2         -2
4     1460    1452         8          +     8    1         1
5     1634    1162         472        +     472  13        13
6     1493    1619         -126       -     126  5         -5
7     1358    1140         218        +     218  9         9
8     1453    1123         330        +     330  11        11
9     1185    1113         72         +     72   4         4
10    1824    1463         361        +     361  12        12
11    1793    1632         161        +     161  7         7
12    1930    1614         316        +     216  8         8
13    2075    1836         239        +     239  10        10
Note: for pair 12 the listed values give d = 1930 - 1614 = 316, while the |d| column and its rank use 216. The rank sums are unaffected: pairs 7, 12 and 13 are all positive and together hold ranks 8, 9 and 10 either way.

Example: Wilcoxon Signed Rank Test
Pair  CF (C)  Healthy (H)  d = C - H  Signed Rank
1     1153    996          157        6
2     1132    1080         52         3
3     1165    1182         -17        -2
4     1460    1452         8          1
5     1634    1162         472        13
6     1493    1619         -126       -5
7     1358    1140         218        9
8     1453    1123         330        11
9     1185    1113         72         4
10    1824    1463         361        12
11    1793    1632         161        7
12    1930    1614         316        8
13    2075    1836         239        10
We then calculate the sum of the positive ranks (T+)
and the sum of the negative ranks (T-).
Here we have
T+ = 6 + 3 + 1 + 13 + 9 + 11 + 4 + 12 + 7 + 8 + 10 = 84
and
T- = 2 + 5 = 7

Wilcoxon Signed Rank Test
(Test Statistic)
•Intuitively we will reject Ho, which
states that there is no difference between
the populations, if either one of these rank
sums is “large” and the other is “small”.
•The Wilcoxon Signed Rank Test uses the
smaller rank sum, T = min(T+, T-), as
the test statistic.
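The signed-rank computation for the cystic fibrosis example can be sketched as follows. The differences are computed directly from the listed CF and healthy values. The simple ranking used here assumes no tied absolute differences, which the script checks before ranking.

```python
cf = [1153, 1132, 1165, 1460, 1634, 1493, 1358, 1453, 1185, 1824, 1793, 1930, 2075]
healthy = [996, 1080, 1182, 1452, 1162, 1619, 1140, 1123, 1113, 1463, 1632, 1614, 1836]

# Paired differences d = C - H, excluding zeros.
diffs = [c - h for c, h in zip(cf, healthy) if c != h]

# This sketch does not average tied ranks, so require distinct |d| values.
assert len({abs(d) for d in diffs}) == len(diffs)

# Order by absolute size; position in this order is the rank of |d|.
by_size = sorted(diffs, key=abs)
t_plus = sum(rank for rank, d in enumerate(by_size, start=1) if d > 0)
t_minus = sum(rank for rank, d in enumerate(by_size, start=1) if d < 0)
t_stat = min(t_plus, t_minus)
print(t_plus, t_minus, t_stat)
```

T+ and T- always sum to n(n+1)/2, here 13 × 14 / 2 = 91.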

Example: Wilcoxon Signed Rank Test
For the cystic fibrosis example we have the
following hypotheses:
Ho: there is no difference in the resting energy
expenditure of individuals with CF and healthy
controls who are the same gender, age, height,
and weight (MEDIAN PAIRED DIFFERENCE = 0).
HA: the resting energy expenditure of individuals
with CF is greater than that of healthy individuals
who are the same gender, age, height, and weight
(MEDIAN PAIRED DIFFERENCE > 0).

Example: Wilcoxon Signed Rank Test
HA: the resting energy expenditure of individuals
with CF is greater than that of healthy individuals
who are the same gender, age, height, and weight.
•The alternative is clearly supported if T+ is
“large” or T- is “small”.
•The test statistic T = min(T+, T-) = 7
•Is T = 7 considered small, i.e. what is the
corresponding p-value?
•To answer this question we need a Wilcoxon
Signed Rank Test table or statistical software.

Example: Wilcoxon Signed Rank Test
This table gives the value that our observed
T = min(T+, T-) must be less than in order to
reject Ho, for both two- and one-tailed tests.
Here we have n = 13 & T = 7.
We can see that our test
statistic is less than 21 (α = .05)
and 12 (α = .01), so we will
reject Ho and we also estimate
that our p-value < .01.
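As an alternative to the table, the exact one-sided p-value can be obtained by counting sign assignments. Under Ho each of the ranks 1..n is equally likely to be positive or negative, so there are 2^n equally likely outcomes. The sketch below counts how many of them give a rank sum of 7 or less, assuming no ties; the function name is introduced here for illustration.

```python
def signed_rank_lower_tail(n, t_obs):
    """P(T <= t_obs) under Ho for the sum T of the ranks carrying one sign,
    when each of the ranks 1..n takes that sign with probability 1/2."""
    max_sum = n * (n + 1) // 2
    counts = [0] * (max_sum + 1)   # counts[s] = number of rank subsets summing to s
    counts[0] = 1
    for rank in range(1, n + 1):
        # Go downward so each rank is used at most once per subset.
        for s in range(max_sum, rank - 1, -1):
            counts[s] += counts[s - rank]
    return sum(counts[: t_obs + 1]) / 2 ** n

p_one_sided = signed_rank_lower_tail(13, 7)
print(p_one_sided)
```

For n = 13 and T = 7 this gives 19 favourable outcomes out of 8192, which supports the table-based estimate of p < .01.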

Example: Wilcoxon Signed Rank Test
•We conclude that individuals with cystic
fibrosis (CF) have a larger resting energy
expenditure when compared to healthy
individuals who are the same gender,
age, height, and weight (p < .01).

Independent Samples
•If we have three or more
populations to compare we use…
Kruskal-Wallis Test

Kruskal-Wallis Test
•One-way ANOVA for a completely randomized
design is based on the assumption of normality and
equality of variance.
•The nonparametric alternative not relying on these
assumptions is called the Kruskal-Wallis Test.
•Like the Mann-Whitney/Wilcoxon Rank Sum Test
we use the sum of the ranks assigned to each group
when considering the combined sample as the basis
for our test statistic.

Kruskal-Wallis Test
Basic Idea:
1) Looking at all observations together,
rank them.
2) Let R_1, R_2, …, R_k be the sum of the ranks
of each group
3) If some R_i’s are much larger than others,
it indicates the response values in
different groups come from different
populations.

Kruskal-Wallis Test
•The test statistic is
\[ H = \frac{12}{N(N+1)} \sum_{i=1}^{k} n_i \left( \frac{R_i}{n_i} - \frac{N+1}{2} \right)^2 \]
where,
N = total sample size = n_1 + n_2 + ... + n_k
R_i / n_i = average rank for group i
(N+1)/2 = overall average rank

Kruskal-Wallis Test
•The test statistic is
\[ H = \frac{12}{N(N+1)} \sum_{i=1}^{k} n_i \left( \frac{R_i}{n_i} - \frac{N+1}{2} \right)^2 \]
•Under the null hypothesis, this has an
approximate chi-square distribution with
df = k - 1, i.e. \( H \sim \chi^2_{k-1} \).
•The approximation is OK when each group
contains at least 5 observations.
•N = total sample size = n_1 + n_2 + ... + n_k

Chi-squared Distribution and p-value
[Figure: \( \chi^2_{k-1} \) density curve with a shaded area labelled "Area = p-value"]

The null and alternative hypotheses are stated
verbally.
For example
Ho: The plans A, B and C are equally
effective.
HA: At least one of the following is true: A is
different from B, A is different from C or B is
different from C

Example: Kruskal-Wallis Test
A clinical trial evaluating the fever reducing effects
of aspirin, ibuprofen, and acetaminophen was
conducted. Study subjects were adults seen in
an ER with diagnoses of flu with body
temperatures between 100°F and 100.9°F.
Subjects were randomly assigned to treatment.
Changes in body temperature were recorded
2 hrs. after administration of treatments.

Example: Kruskal-Wallis Test
Resulting Data: Temperature Decrease (deg. F)
Aspirin  Rank  Ibuprofen  Rank  Acetaminophen  Rank
.95      8     .39        5     .19            4
1.48     14    .44        6     1.02           9
1.33     12    1.31       11    .07            3
1.28     10    2.48       15    .01            2
               1.39       13    .62            7
                                -.39           1   (i.e. temp increase)
N = 15
R_1 = 44, R_2 = 50, R_3 = 26
n_1 = 4, n_2 = 5, n_3 = 6
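The pooled ranking and the per-group rank sums for this example can be checked with a short script. This is a minimal sketch: the dictionary-based lookup assumes all 15 observations are distinct (true for these data), so no averaging of tied ranks is needed.

```python
groups = {
    "aspirin": [0.95, 1.48, 1.33, 1.28],
    "ibuprofen": [0.39, 0.44, 1.31, 2.48, 1.39],
    "acetaminophen": [0.19, 1.02, 0.07, 0.01, 0.62, -0.39],
}

# Rank all observations together, smallest value = rank 1.
pooled = sorted(v for values in groups.values() for v in values)
assert len(set(pooled)) == len(pooled)  # this sketch does not handle ties
rank_of = {value: rank for rank, value in enumerate(pooled, start=1)}

# Sum the pooled ranks within each treatment group.
rank_sums = {name: sum(rank_of[v] for v in values) for name, values in groups.items()}
print(rank_sums)
```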

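Example: Kruskal-Wallis Test
\[
\begin{aligned}
H &= \frac{12}{N(N+1)} \sum_{i=1}^{k} n_i \left( \frac{R_i}{n_i} - \frac{N+1}{2} \right)^2 \\
  &= \frac{12}{15(15+1)} \left[ 4\left( \frac{44}{4} - \frac{15+1}{2} \right)^2 + 5\left( \frac{50}{5} - \frac{15+1}{2} \right)^2 + 6\left( \frac{26}{6} - \frac{15+1}{2} \right)^2 \right] \\
  &= 6.833 \sim \chi^2_2, \text{ i.e. chi-square distribution with } df = 2
\end{aligned}
\]
N = 15
R_1 = 44, R_2 = 50, R_3 = 26
n_1 = 4, n_2 = 5, n_3 = 6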

Chi-squared Distribution and p-value
[Figure: \( \chi^2_2 \) density curve with the observed value 6.833 marked; Area = .033]
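The test statistic and its p-value can be recomputed from the rank sums and group sizes. The sketch below is a plain implementation of the H formula; the function name is introduced here for illustration. The p-value line uses the fact that the upper-tail area of a chi-square distribution with df = 2 is exp(-x/2), so it applies only to the three-group case.

```python
import math

def kruskal_wallis_h(rank_sums, sizes):
    """H = 12 / (N (N + 1)) * sum of n_i * (R_i / n_i - (N + 1) / 2) ** 2."""
    n_total = sum(sizes)
    overall_mean_rank = (n_total + 1) / 2
    spread = sum(
        n * (r / n - overall_mean_rank) ** 2 for r, n in zip(rank_sums, sizes)
    )
    return 12 / (n_total * (n_total + 1)) * spread

# Rank sums and group sizes for aspirin, ibuprofen, acetaminophen.
h = kruskal_wallis_h([44, 50, 26], [4, 5, 6])

# Upper-tail chi-square area; this closed form holds for df = 2 only.
p_value = math.exp(-h / 2)
print(round(h, 3), round(p_value, 3))
```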

Decision/Conclusion
•Using the Kruskal-Wallis test we have evidence to
suggest that the temperature changes after taking
the different drugs are not the same (p = .033).
•Now we might like to know which drugs
significantly differ from one another.