Statistical Methods

guest9fa52 36,955 views 30 slides Mar 04, 2009


Statistical Methods.

Why Statistics?
•Statistics is used to take the analysis of
data one stage beyond what can be
achieved with maps and diagrams.
•You can gain a rough insight into
patterns at a glance, but mathematical
manipulation usually gives greater
precision.
•This allows us to discover things which
might otherwise go unnoticed.

The need for justification.
•Justifying mathematical manipulation is
vital.
•It is vital to be aware that statistics is an
aid to analysis and no more.
•Too often students make statistical
calculations in geographical projects
without adequate justification.
•Before statistics is used it is essential to
ask yourself two questions.

Question 1.
•Why am I using this technique?
•In the exam, be absolutely clear about what a
statistical test can demonstrate and how it
does this.

Question 2.
•Is the data appropriate to this particular
technique?
•Each technique requires data to be arranged in
a particular form.
•If it isn't, the technique cannot be used.
•If your data is not good in the first place, the use
of a complex statistical technique will not help
you.
“Rubbish in, rubbish out.”

Mean, Mode, Median.
•These are used when faced with a large
amount of data.
•For example, the average temperature of a
place every day for two years.
•It makes things far easier when we can
summarise it.
•This is relatively easy to do and there are
three common methods to achieve this.

1- Mean
•What most people call the average is the mean.
•You find it by adding all the numbers together
and then dividing by the total number of data
values.
•The mean is shown by the symbol x̄ (x-bar).
•The mean is distorted if you have just one
extreme value, which can be a problem.
•However, it is the most commonly used measure, as it can
be used for further mathematical processing.

Find the mean of these data
values-
•3, 4, 4, 4, 6, 6, 9.
x̄ = 36 ÷ 7 = 5.1
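The same calculation can be checked with a few lines of Python. This is only a sketch: the variable names are illustrative, and the extra value 90 is a made-up outlier added to show how one extreme value distorts the mean.

```python
from statistics import mean

values = [3, 4, 4, 4, 6, 6, 9]

# Add all the numbers together, then divide by how many there are.
x_bar = sum(values) / len(values)      # 36 / 7
print(round(x_bar, 1))

# A single extreme value (90 is invented for illustration) drags the mean up.
with_outlier = values + [90]
x_bar_outlier = mean(with_outlier)     # 126 / 8
print(x_bar_outlier)
```

With the outlier included, the mean is larger than every value in the set except the outlier itself, which is why the mean alone can mislead.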

2- The Mode.
•The mode is simply the most frequently
occurring event.
•If we are using simple numbers then the mode is
the most frequently occurring number.
•If we are looking at data on the nominal scale
(grouped into categories) the mode is the most
common category.
•The mode is very quick to calculate, but it cannot
be used for further mathematical processing.
•It is not affected by extreme values.

Find the mode of this data set.
•3, 4, 4, 4, 6, 9.
Mode (most frequently occurring number) = 4

Find the mode of this nominal data.

Land Use      Hectares
Pasture       17
Barley        18
Wheat         29
Fruit         3
Vegetables    15
Rye           12
Clover        10

Mode (most frequently occurring category) = wheat.
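Both mode examples can be sketched in Python. For the nominal data, this sketch assumes the mode is the land-use category with the most hectares, as in the answer above. The variable names are illustrative.

```python
from statistics import mode

# Simple numbers: the mode is the most frequently occurring value.
numbers = [3, 4, 4, 4, 6, 9]
numeric_mode = mode(numbers)

# Nominal data: hectares recorded under each land-use category.
land_use_hectares = {
    "Pasture": 17,
    "Barley": 18,
    "Wheat": 29,
    "Fruit": 3,
    "Vegetables": 15,
    "Rye": 12,
    "Clover": 10,
}
# The modal category is the one with the largest total.
modal_category = max(land_use_hectares, key=land_use_hectares.get)

print(numeric_mode, modal_category)
```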

3- The Median.
•The Median is the central value in a series
of ranked values.
•If there is an even number of values, the
median is the mid point between the two
centrally placed values.
•The median is not affected by extreme
values, but it cannot be used for further
mathematical processing.

Find the median of this data set.
3, 4, 4, 4, 6, 9.
Median (central value)= 4.

Now find the median of this data
set.
3, 4, 4, 6, 6, 9.
Median (central value)= 5
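Both data sets have an even number of values, so each median is the mid-point of the two central values. A short Python check, with illustrative variable names:

```python
from statistics import median

first_set = [3, 4, 4, 4, 6, 9]
second_set = [3, 4, 4, 6, 6, 9]

# median() ranks the values and, for an even count,
# returns the mid-point of the two central values.
first_median = median(first_set)     # mid-point of 4 and 4
second_median = median(second_set)   # mid-point of 4 and 6

print(first_median, second_median)
```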

Spread around the median and
mean.
•The median, mean and mode all give us a
summary value for a set of data.
•On their own, however, they give us no
idea of the spread of data around the
summary value, which can be misleading.
•For example…

•I collected the following rainfall data.

Year    Rainfall (mm)
1990    0
1991    0
1992    3
1993    0
1994    97

•The mean for this data is 20mm.
•But that gives an untrue picture of what really happened.
•There is a great “deviation about the mean”.
•Deviation can be measured statistically as follows.
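As a quick illustration of the rainfall figures, the sketch below works out the mean and how far each year sits from it. The dictionary layout and names are only an assumption made for the example.

```python
# Rainfall in mm for each year, taken from the slide.
rainfall_mm = {1990: 0, 1991: 0, 1992: 3, 1993: 0, 1994: 97}

mean_mm = sum(rainfall_mm.values()) / len(rainfall_mm)   # 100 / 5

# Deviation of each year's rainfall from the mean.
deviations = {year: mm - mean_mm for year, mm in rainfall_mm.items()}

print(mean_mm)
for year, deviation in deviations.items():
    print(year, deviation)
```

No single year is close to the 20 mm mean: four years fall well below it and one lies far above it.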

Spread around the median: the
interquartile range.
•The Interquartile range is a measure of the
spread of the values around their median.
•The greater the spread the higher the
interquartile range.

Method.
•Stage 1- Place the values in rank order, smallest to
largest.
•Stage 2- Find the upper quartile. This is found by taking
the 25% highest values and finding the mid-point
between the lowest of these and the next lowest number.
•Stage 3- Find the lower quartile. This is obtained by
taking the 25% lowest values and finding the mid-point
between the highest of these and the next highest value.
•Stage 4- Find the difference between the upper and
lower quartiles. This is the interquartile range, a crude
index of the spread of the values around the median.
•The higher the range the greater the spread.

Over to you.
•Copy out the data on the next slide
•Then find the interquartile range, remembering
to follow all the four stages.

Month        Average temperature
January      4
February     5
March        7
April        9
May          12
June         15
July         17
August       17
September    15
October      11
November     7
December     5

Answer
•Ranked, the data looks like this.
4  5  5  7  7  9  11  12  15  15  17  17

Lower Quartile = 6    Median = 10    Upper Quartile = 15
Interquartile range: (15 - 6) = 9.
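The four stages can be sketched in Python as below. The function name `quartiles_slide_method` is hypothetical, and the sketch assumes the number of values divides exactly by four, as it does for the twelve monthly temperatures. Library quartile routines use interpolation rules and may give slightly different figures.

```python
def quartiles_slide_method(values):
    """Return (lower quartile, upper quartile) using the four-stage method.

    Assumes len(values) is a non-zero multiple of 4 so that
    "25% of the values" is a whole number of values.
    """
    ranked = sorted(values)                      # Stage 1: rank order
    n = len(ranked)
    if n == 0 or n % 4 != 0:
        raise ValueError("this sketch needs a multiple of 4 values")
    k = n // 4                                   # size of each 25% group
    # Stage 3: mid-point between the highest of the lowest 25%
    # and the next highest value.
    lower = (ranked[k - 1] + ranked[k]) / 2
    # Stage 2: mid-point between the lowest of the highest 25%
    # and the next lowest value.
    upper = (ranked[n - k - 1] + ranked[n - k]) / 2
    return lower, upper


monthly_temps = [4, 5, 7, 9, 12, 15, 17, 17, 15, 11, 7, 5]
lower_q, upper_q = quartiles_slide_method(monthly_temps)
iqr = upper_q - lower_q                          # Stage 4: the difference
print(lower_q, upper_q, iqr)
```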

Spread about the mean: Standard
deviation.
•If we want to obtain some measure of the
spread of our data about its mean we
calculate its standard deviation.
•Two sets of figures can have the same
mean but very different standard
deviations.

Method.
•Stage 1- Tabulate the values (x) and their
squares (x²). Add these values (∑x and
∑x²).
•Stage 2- Find the mean of all the values of x (x̄)
and square it (x̄²).
•Stage 3- Calculate the formula
$$\sigma = \sqrt{\frac{\sum x^2}{n} - \bar{x}^2}$$

σ = standard deviation.
√ = the square root of.
∑ = the sum of.
n = the number of values.
x̄ = the mean of the values.
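The formula on this slide is a shortcut for the average squared distance of the values from their mean. It divides by n, so it is the "population" form of the standard deviation. A brief derivation, using the same symbols as above, shows why the two are the same:

$$
\begin{aligned}
\sigma^2 &= \frac{1}{n}\sum (x - \bar{x})^2 \\
         &= \frac{\sum x^2}{n} - 2\bar{x}\,\frac{\sum x}{n} + \bar{x}^2 \\
         &= \frac{\sum x^2}{n} - 2\bar{x}^2 + \bar{x}^2 \\
         &= \frac{\sum x^2}{n} - \bar{x}^2
\end{aligned}
$$

Taking the square root of the last line gives the formula used in Stage 3.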

Over to you.
•Number of vehicles passing a traffic count
point.
•Calculate the standard deviation of the
following data.

Day    Number of vehicles
1      50
2      75
3      80
4      92
5      60
6      70
7      63
8      42
9      75
10     82

Answer.
x      x²
50     2 500
75     5 625
80     6 400
92     8 464
60     3 600
70     4 900
63     3 969
42     1 764
75     5 625
82     6 724

Answer
•∑x = 689
•∑x² = 49 571
•x̄ = 689 divided by 10 = 68.9
•x̄² = (68.9)² = 4747.2
•$$\sigma = \sqrt{\frac{\sum x^2}{n} - \bar{x}^2} = \sqrt{\frac{49\,571}{10} - 4747.2} = \sqrt{209.9} = 14.5$$
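The traffic-count answer can be checked with a short Python sketch that follows the same three stages. The variable names are illustrative. `statistics.pstdev` is the standard-library population standard deviation, which also divides by n, so it should agree with the formula.

```python
from math import sqrt
from statistics import pstdev

vehicles = [50, 75, 80, 92, 60, 70, 63, 42, 75, 82]
n = len(vehicles)

# Stage 1: sum the values and their squares.
sum_x = sum(vehicles)
sum_x2 = sum(v * v for v in vehicles)

# Stage 2: find the mean and square it.
mean_x = sum_x / n
mean_x_squared = mean_x ** 2

# Stage 3: apply the formula.
sigma = sqrt(sum_x2 / n - mean_x_squared)

print(sum_x, sum_x2, round(sigma, 1))
print(round(pstdev(vehicles), 1))   # cross-check with the library routine
```

Note that `statistics.stdev` divides by n - 1 (the sample form) and would give a slightly larger figure than the formula on the slide.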

Phew!!!!!!
•The higher the standard deviation, the
greater the spread of data around the
mean.
•The standard deviation is the best of the
measures of spread as it takes into
account all of the values under
consideration.

Homework.
•Research the following tests of
significance to find out their meaning.
3. The Mann-Whitney U test.
4. The Chi-Squared (χ²) test.