Why Statistics?
•Statistics is used to take the analysis of
data one stage beyond what can be
achieved with maps and diagrams.
•You can gain an initial insight into
patterns at a glance, but mathematical
manipulation usually gives greater
precision.
•This allows us to discover things which
might otherwise go unnoticed.
The need for justification.
•Justifying mathematical manipulation is
vital.
•It is vital to be aware that statistics is an
aid to analysis and no more.
•Too often students make statistical
calculations in geographical projects
without adequate justification.
•Before statistics is used it is essential to
ask yourself two questions.
Question 1.
•Why am I using this technique?
•In the exam be absolutely clear about what a
statistical test can show and how it
does so.
Question 2.
•Is the data appropriate to this particular
technique?
•Each technique requires data to be arranged in
a particular form.
•If it isn't, the technique cannot be used.
•If your data is not good in the first place, the use
of a complex statistical technique will not help
you.
“Rubbish in, rubbish out.”
Mean, Mode, Median.
•These are used when faced with a large
amount of data.
•For example, the average temperature of a
place every day for two years.
•It makes things far easier when we can
summarise it.
•This is relatively easy to do and there are
three common methods to achieve this.
1- Mean
•What most people call the average is the mean.
•You find it by adding all the numbers together
and then dividing by the total number of data
values.
•The mean is shown by the symbol x̄ (x-bar).
•The mean can be distorted by just one
extreme value, which can be a problem.
•However, it is the most commonly used measure as it can
be used for further mathematical processing.
Find the mean of these data
values.
•3, 4, 4, 4, 6, 6, 9.
3 + 4 + 4 + 4 + 6 + 6 + 9 = 36
x̄ = 36 ÷ 7 = 5.1 (to one decimal place)
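The worked example can be checked with a few lines of Python. This is only a sketch of the add-then-divide method described on the slide; the variable names are chosen for illustration.

```python
# Data values from the worked example.
values = [3, 4, 4, 4, 6, 6, 9]

total = sum(values)     # add all the numbers together
count = len(values)     # total number of data values
mean = total / count    # divide the sum by the count

print(total, count, round(mean, 1))  # 36 7 5.1
```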
2- The Mode.
•The mode is simply the most frequently
occurring event.
•If we are using simple numbers then the mode is
the most frequently occurring number.
•If we are looking at data on the nominal scale
(grouped into categories) the mode is the most
common category.
•The mode is very quick to calculate, but it cannot
be used for further mathematical processing.
•It is not affected by extreme values.
Find the mode of this data set.
•3, 4, 4, 4, 6, 9.
Mode (most frequently occurring number) = 4
Find the mode of this nominal data.
Land Use      Hectares
Clover        10
Rye           12
Vegetables    15
Fruit         3
Wheat         29
Barley        18
Pasture       17
Mode (most frequently occurring category) = Wheat.
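Both mode examples can be reproduced in Python. This is a minimal sketch: it assumes the nominal mode is the land use with the most hectares, as in the example, and the variable names are illustrative only.

```python
from collections import Counter

# Numeric data from the first mode example.
values = [3, 4, 4, 4, 6, 9]
numeric_mode, frequency = Counter(values).most_common(1)[0]

# Land-use data from the table: hectares recorded for each category.
hectares = {
    "Clover": 10, "Rye": 12, "Vegetables": 15, "Fruit": 3,
    "Wheat": 29, "Barley": 18, "Pasture": 17,
}
# The modal category is the one with the largest total.
modal_category = max(hectares, key=hectares.get)

print(numeric_mode, frequency)  # 4 3
print(modal_category)           # Wheat
```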
3- The Median.
•The Median is the central value in a series
of ranked values.
•If there is an even number of values, the
median is the mid point between the two
centrally placed values.
•The median is not affected by extreme
values but it cannot be used for further
mathematical processing.
Find the median of this data set.
3, 4, 4, 4, 6, 9.
Median (the two central values are both 4) = 4.
Now find the median of this data
set.
3, 4, 4, 6, 6, 9.
Median (mid-point between the two central values, 4 and 6) = 5.
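The two median examples can be sketched in Python as follows. The function simply follows the definition given above. The final data set, with 90 in place of 9, is a hypothetical extreme value invented here to show that the median stays put while the mean is pulled upwards.

```python
def median(values):
    """Central value of the ranked data, or the mid-point of the
    two central values when the number of values is even."""
    ranked = sorted(values)
    n = len(ranked)
    middle = n // 2
    if n % 2 == 1:
        return ranked[middle]
    return (ranked[middle - 1] + ranked[middle]) / 2

first = [3, 4, 4, 4, 6, 9]
second = [3, 4, 4, 6, 6, 9]
print(median(first), median(second))  # 4.0 5.0

# Hypothetical extreme value (not from the slides): 90 instead of 9.
with_outlier = [3, 4, 4, 6, 6, 90]
mean_with_outlier = sum(with_outlier) / len(with_outlier)
print(median(with_outlier), round(mean_with_outlier, 1))  # 5.0 18.8
```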
Spread around the median and
mean.
•The median, mean and mode all give us a
summary value for a set of data.
•On their own, however, they give us no
idea of the spread of data around the
summary value, which can be misleading.
•For example…
•I collected the following rainfall data.
Year    Rainfall (mm)
1990    0
1991    0
1992    3
1993    0
1994    97
•The mean for this data is 20mm.
•But that gives an untrue picture of what really happened.
•There is a great “deviation about the mean”.
•Deviation can be measured statistically as follows.
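To see how far each year sits from the mean, the rainfall figures can be run through a short Python sketch. It only subtracts the mean from each value and is an illustration, not one of the formal measures of spread.

```python
# Rainfall data from the table (year -> rainfall in mm).
rainfall = {1990: 0, 1991: 0, 1992: 3, 1993: 0, 1994: 97}

mean = sum(rainfall.values()) / len(rainfall)

# Deviation of each year's rainfall from the mean.
deviations = {year: mm - mean for year, mm in rainfall.items()}

print(mean)  # 20.0
for year, deviation in deviations.items():
    print(year, deviation)
```

No single year is within 15mm of the 20mm mean, and 1994 sits 77mm above it.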
Spread around the median: the
interquartile range.
•The Interquartile range is a measure of the
spread of the values around their median.
•The greater the spread the higher the
interquartile range.
Method.
•Stage 1- Place the values in rank order, smallest to
largest.
•Stage 2- Find the upper quartile. This is found by taking
the highest 25% of the values and finding the mid-point
between the lowest of these and the next lowest value.
•Stage 3- Find the lower quartile. This is found by
taking the lowest 25% of the values and finding the mid-point
between the highest of these and the next highest value.
•Stage 4- Find the difference between the upper and
lower quartiles. This is the interquartile range, a crude
index of the spread of the values around the median.
•The higher the range the greater the spread.
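The four stages can be sketched in Python as below. Two assumptions are made here that are not on the slides: the number of values is a multiple of four, so that 25% of them is a whole number, and the sample data is hypothetical, invented purely for illustration.

```python
def interquartile_range(values):
    """Return (lower quartile, upper quartile, interquartile range)
    following the four stages. Assumes a multiple of four values."""
    ranked = sorted(values)          # Stage 1: rank order
    n = len(ranked)
    if n == 0 or n % 4 != 0:
        raise ValueError("this sketch expects a multiple of four values")
    k = n // 4                       # number of values in each 25% group

    # Stage 2: mid-point between the lowest of the top 25%
    # and the next lowest value.
    upper_quartile = (ranked[n - k] + ranked[n - k - 1]) / 2
    # Stage 3: mid-point between the highest of the bottom 25%
    # and the next highest value.
    lower_quartile = (ranked[k - 1] + ranked[k]) / 2
    # Stage 4: difference between the two quartiles.
    return lower_quartile, upper_quartile, upper_quartile - lower_quartile

# Hypothetical data set, not taken from the slides.
sample = [12, 4, 15, 7, 2, 10, 5, 8]
print(interquartile_range(sample))  # (4.5, 11.0, 6.5)
```

Other textbooks and software packages define quartiles slightly differently, so their answers may not match this method exactly.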
Over to you.
•Copy out the data on the next slide
•Then find the interquartile range, remembering
to follow all four stages.
Spread about the mean: Standard
deviation.
•If we want to obtain some measure of the
spread of our data about its mean we
calculate its standard deviation.
•Two sets of figures can have the same
mean but very different standard
deviations.
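A quick illustration of that last point, using Python's standard `statistics` module. The two data sets are hypothetical, made up here so that they share a mean of 50.

```python
import statistics

# Two hypothetical data sets, invented for illustration only.
set_a = [48, 49, 50, 51, 52]
set_b = [0, 25, 50, 75, 100]

for name, data in (("A", set_a), ("B", set_b)):
    mean = statistics.mean(data)
    # pstdev treats the data as the whole population (it divides by n).
    spread = statistics.pstdev(data)
    print(name, mean, round(spread, 1))
# A 50 1.4
# B 50 35.4
```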
Method.
•Stage 1- Tabulate the values (x) and their
squares (x²). Add up each column to give ∑x and
∑x².
•Stage 2- Find the mean of all the values of x (x̄)
and square it (x̄²).
•Stage 3- Calculate the formula
$$\sigma = \sqrt{\frac{\sum x^2}{n} - \bar{x}^2}$$
σ = standard deviation.
√ = the square root of.
∑ = the sum of.
n = the number of values.
x̄ = the mean of the values.
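The three stages translate into Python roughly as follows. This is a sketch of the formula as given, which divides by n. The function name is illustrative, and the test data is the 1990 to 1994 rainfall set from the earlier example.

```python
import math

def standard_deviation(values):
    """Standard deviation using the three stages (dividing by n)."""
    n = len(values)
    sum_x = sum(values)                          # Stage 1: sum of x
    sum_x_squared = sum(x * x for x in values)   # Stage 1: sum of x squared
    mean = sum_x / n                             # Stage 2: the mean
    mean_squared = mean ** 2                     # Stage 2: the mean squared
    variance = sum_x_squared / n - mean_squared  # Stage 3: inside the root
    # max() guards against a tiny negative value from rounding error.
    return math.sqrt(max(variance, 0.0))

# Rainfall (mm) for 1990 to 1994 from the earlier example.
rainfall = [0, 0, 3, 0, 97]
print(round(standard_deviation(rainfall), 1))  # 38.5
```

A standard deviation of about 38.5mm around a mean of 20mm confirms how misleading the mean is on its own for that data.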
Over to you.
•Number of vehicles passing a traffic count
point.
•Calculate the standard deviation of the
following data.
Day    Number of vehicles
1      50
2      75
3      80
4      92
5      60
6      70
7      63
8      42
9      75
10     82
Answer
•∑x = 689
•∑x² = 49 571
•x̄ = 689 ÷ 10 = 68.9
•x̄² = (68.9)² = 4747.2
$$\sigma = \sqrt{\frac{\sum x^2}{n} - \bar{x}^2} = \sqrt{\frac{49\,571}{10} - 4747.2} = \sqrt{209.9} \approx 14.5$$
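The answer can be cross-checked in Python. The sketch below repeats the hand calculation and then compares it with `statistics.pstdev` from the standard library, which also divides by n.

```python
import math
import statistics

# Vehicle counts for days 1 to 10 from the exercise.
vehicles = [50, 75, 80, 92, 60, 70, 63, 42, 75, 82]

n = len(vehicles)
sum_x = sum(vehicles)
sum_x_squared = sum(x * x for x in vehicles)
mean = sum_x / n
by_formula = math.sqrt(sum_x_squared / n - mean ** 2)

# Library cross-check: pstdev divides by n, like the formula here.
by_library = statistics.pstdev(vehicles)

print(sum_x, sum_x_squared, mean)                  # 689 49571 68.9
print(round(by_formula, 1), round(by_library, 1))  # 14.5 14.5
```

Note that `statistics.stdev` divides by n - 1 instead, so it gives a slightly larger value than the formula used here.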
Phew!!!!!!
•The higher the standard deviation, the
greater the spread of data around the
mean.
•The standard deviation is the best of the
measures of spread as it takes into
account all of the values under
consideration.
Homework.
•Research the following tests of
significance to find out their meaning.
3. The Mann-Whitney U test.
4. The Chi-Squared (χ²) test.