Mann Whitney U Test

mhsgeography 58,127 views 8 slides Jan 10, 2010

About This Presentation

Worked example for Mann Whitney U test for A2 Geography students


Slide Content

Comparing medians: the Mann Whitney U-test
The Mann Whitney U-test is a fairly complicated statistical test
to understand, though it is quite easy to apply to a set of data.
So, while the calculation is relatively easy, knowing when to
apply it, and what the calculation actually means, is a little
more difficult. It is also important not to be put off by the
formula.

What is the Mann Whitney U-test?
The Mann Whitney U-test is a nonparametric test
which is used to analyse the difference between the
medians of two data sets. By using the critical values
tables, it is possible to assess the degree to which
any observed difference is a result of chance or
fluke.
If the answer to each of the following questions is
‘yes’ then you may use the Mann Whitney U-test.
•Are you investigating the difference between two
samples of data?
•Is the data nonparametric?
•Is the data ordinal?
•Are there more than five pieces of data in each
sample?
•Are there 20 or fewer pieces of data in each sample
(recommended)?
KEY TERMS
Nonparametric test: a statistical test that does not assume that the data
is normally distributed.
Ordinal data: data that can be ranked, i.e. put into order from highest
to lowest.

Though this is a nonparametric statistical test, both samples
should have a similar distribution. You can plot the data for
each set on a simple graph to check this.
Like many of the other statistical tests, you have to start with
a null hypothesis (H0). However, unlike some of the other
tests, the null hypothesis (H0) is always the same:
There is no significant difference between the two samples.

Applying the Mann Whitney U-test
Comparing two traffic flows in a town centre
A student was interested in finding out if a new
retail development had an impact upon traffic (and
therefore congestion) in the local area near to the
development. There were two parts to the primary
data collection. The first part was conducted before
the construction of the planned development
(sample x).
Methodology for primary data collection
She recorded the time of day and date.
She counted traffic (in both directions) on 10
streets around the development (selected
randomly). She counted for 10 minutes. She used
a stopwatch for timing and a simple tally chart for
recording the data.
She completed the tally at different times of the
day.
TIP
Is it a one-tailed or a two-tailed test?
This relates to the difference between the data sets. If you assume one
specified data set will be larger than the other, you are investigating a
one-tailed distribution. If you assume differences can operate in both
directions, i.e. up or down, you are investigating a two-tailed
distribution. This is important when you interpret your findings using the
critical values. In this example, the student is assuming traffic can go up
or down in study 2 for all sites, despite the fact that more customers are
likely to be attracted to the development. In this case, this makes it a
two-tailed test.

For the second study (sample y), she waited until 2 months after the
development had been completed. She went to another 10 sites (selected
randomly) and repeated the test.
She then devised the following null hypothesis (H0):
‘There is no significant difference in traffic flows before and after the
development.’
Now let’s take a look at the formula:
$$U_x = n_x n_y + \frac{n_x(n_x + 1)}{2} - \sum r_x$$

$U_x$ is the Mann Whitney calculation for sample x
$n$ is the number in the sample ($n_x$ for sample x, $n_y$ for sample y)
$\sum r_x$ is the sum of ranks for sample x (‘sum of’ just means added
together)

The best way to proceed is to incorporate the findings into a table that also
allows you to calculate the result. When you get two or more equal values,
use the mean rank. Here are the student’s findings:
Total traffic flow in    Rank    Site      Total traffic flow in    Rank
10 minutes (x)           r_x     number    10 minutes (y)           r_y
126                      11      1         194
148                      7       2         128
85                       15.5    3         69
61                       19      4         135
179                      4       5         171
93                       12.5    6         149
45                       20      7         89
189                      3       8         248                      1
85                       15.5    9         79
93                       12.5    10        137
Complete the table by ranking all the data from highest to lowest.

$\sum r_x = 120$
$\sum r_y =$
Ranking puts values in order from highest to lowest.
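The ranking step, including the mean rank for tied values, can be sketched in Python. This is a minimal illustration rather than part of the original worked example: the function name `rank_high_to_low` and the variable names are assumptions, and the figures are the traffic counts from the table. Only the ranks for sample x are printed, so the sample y column is still left for you to complete.

```python
def rank_high_to_low(values):
    """Return ranks (1 = highest); tied values share the mean rank."""
    ordered = sorted(values, reverse=True)
    # Record every 1-based position that each distinct value occupies.
    positions = {}
    for position, value in enumerate(ordered, start=1):
        positions.setdefault(value, []).append(position)
    # A tied value receives the mean of the positions it occupies.
    mean_rank = {v: sum(p) / len(p) for v, p in positions.items()}
    return [mean_rank[v] for v in values]


# Traffic counts from the table (sample x = before, sample y = after).
sample_x = [126, 148, 85, 61, 179, 93, 45, 189, 85, 93]
sample_y = [194, 128, 69, 135, 171, 149, 89, 248, 79, 137]

# Both samples are ranked together as one list of 20 values.
ranks = rank_high_to_low(sample_x + sample_y)
ranks_x = ranks[:len(sample_x)]

print(ranks_x)       # ranks for sample x, in site order
print(sum(ranks_x))  # sum of ranks for sample x
```

The two sites with a count of 93 occupy positions 12 and 13, so each gets the mean rank 12.5, which matches the table.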
Next, she substituted the data into the formula:
$$U_x = 10 \times 10 + \frac{10(10 + 1)}{2} - 120$$
$$U_x = 100 + \frac{110}{2} - 120$$
$$U_x = 100 + 55 - 120$$
$$U_x = 35$$
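The same substitution can be written as a short Python function. This is a sketch only: the function name `mann_whitney_u` and its parameter names are assumptions made for illustration, and the inputs are the sample sizes and sum of ranks from the worked example.

```python
def mann_whitney_u(n_sample, n_other, rank_sum):
    """U for one sample: n1 * n2 + n1 * (n1 + 1) / 2 - sum of its ranks."""
    return n_sample * n_other + n_sample * (n_sample + 1) / 2 - rank_sum


# Sample x: 10 sites in each sample, sum of ranks for x = 120.
u_x = mann_whitney_u(n_sample=10, n_other=10, rank_sum=120)
print(u_x)  # 35.0
```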
Summary
The Mann Whitney U-test can be used to
compare any two data sets that are
not normally distributed. As long as
the data is capable of being ranked,
then the test can be applied.
Other possible uses:
•Investigating differences in
questionnaire responses relating to a
new development.
•Investigating differences in species
diversity near to footpaths.
•Investigating differences in
vegetation cover between two
different slopes
TIP
There is a useful way of checking the accuracy of your calculations.
$U_x + U_y$ should be the same number as $n_x n_y$ (which in this case is
10 x 10 = 100). If it is not, you have made a mistake somewhere.
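This check is easy to automate. The sketch below uses a small made-up data set (not the traffic counts), so the tasks that follow are left for you to work through. All names are assumptions, and the ranking helper is a simplified version that assumes no tied values.

```python
def rank_high_to_low(values):
    """Rank from highest (1) to lowest; assumes no tied values."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


def mann_whitney_u(n_sample, n_other, rank_sum):
    """U for one sample: n1 * n2 + n1 * (n1 + 1) / 2 - sum of its ranks."""
    return n_sample * n_other + n_sample * (n_sample + 1) / 2 - rank_sum


# Hypothetical samples, invented for this illustration only.
x = [12, 15, 9]
y = [14, 8, 20, 22]

ranks = rank_high_to_low(x + y)
sum_rx = sum(ranks[:len(x)])
sum_ry = sum(ranks[len(x):])

u_x = mann_whitney_u(len(x), len(y), sum_rx)
u_y = mann_whitney_u(len(y), len(x), sum_ry)

# The check from the tip: the two U values must add up to n_x * n_y.
check_passed = (u_x + u_y == len(x) * len(y))
print(u_x, u_y, check_passed)
```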

TASKS
•You now have the figure for $U_x$. Use the same method of calculation
to work out the figure for $U_y$. Here is the formula:
$$U_y = n_x n_y + \frac{n_y(n_y + 1)}{2} - \sum r_y$$
•You now need to select the smaller of the two figures and use the
critical values table to decide on the statistical significance of your
result. For this test, if your result is equal to or smaller than the
critical value at the 0.05 level of significance, then you can reject the
null hypothesis (H0).
For 10 figures in each sample, the critical value at the 0.05 level of
significance is 23.
•Do you accept or reject the null hypothesis (H0)? Give reasons for
your answer.
•What do these findings suggest about the difference in traffic flow
before and after the retail development in this study?
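The decision rule in the tasks can be sketched as a small Python function. The function name `reject_null` is an assumption, and the U values in the usage lines are hypothetical so that the tasks are not answered for you. The critical value of 23 is the one quoted above for 10 figures in each sample at the 0.05 level.

```python
def reject_null(u_first, u_second, critical_value):
    """True when the smaller U is equal to or below the critical value."""
    return min(u_first, u_second) <= critical_value


CRITICAL_VALUE = 23  # 10 figures in each sample, 0.05 level of significance

# Hypothetical pairs of U values (each pair adds up to 10 x 10 = 100).
print(reject_null(18, 82, CRITICAL_VALUE))  # True: 18 is not above 23
print(reject_null(40, 60, CRITICAL_VALUE))  # False: 40 is above 23
```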