Intro.. Just as we use measures of location to identify the center of the data, we use measures of dispersion to show how far the data spread out from that center. The commonest measures of dispersion are the range, the standard deviation and the coefficient of variation (CV).
Range: The range is the difference between the highest (maximum) and lowest (minimum) values in a set of observations. In bio-statistics the range is often reported by giving the minimum and maximum values, as in “from (minimum) to (maximum)”. We use the heights in meters of eight women, 1.72, 1.6, 1.5, 1.4, 1.6, 1.5, 1.7, 1.55, to demonstrate how to find the range:
Cont..
Step 1: Arrange the data in ascending (or descending) order: 1.4, 1.5, 1.5, 1.55, 1.6, 1.6, 1.7, 1.72
Step 2: Identify the minimum and maximum: minimum = 1.4, maximum = 1.72
Step 3: Calculate the range, the difference between the maximum and the minimum: 1.72 - 1.4 = 0.32
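The three steps above can be sketched in Python. This is only an illustration using the heights from the example; the function name `data_range` is an assumption made here, not something from the slides.

```python
# Heights in meters of the eight women from the worked example.
heights = [1.72, 1.6, 1.5, 1.4, 1.6, 1.5, 1.7, 1.55]


def data_range(values):
    """Return (minimum, maximum, range) for a non-empty list of numbers."""
    if not values:
        raise ValueError("at least one observation is required")
    ordered = sorted(values)                      # Step 1: arrange in ascending order
    minimum, maximum = ordered[0], ordered[-1]    # Step 2: identify minimum and maximum
    return minimum, maximum, maximum - minimum    # Step 3: maximum minus minimum


minimum, maximum, spread = data_range(heights)
# Rounded to two decimals because floating-point subtraction is not exact.
print(f"from {minimum} to {maximum} (range = {spread:.2f})")
```

Reporting the result as "from 1.4 to 1.72" follows the bio-statistics convention described earlier, while the single number 0.32 is the range itself.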
Percentiles, quartiles and interquartile range: A value is said to be the nth percentile of a given set of observations if n percent of the observations fall at or below it. In other words, the percentiles are the 99 values that divide a set of data into 100 equal parts, so that each part represents 1/100th of the sample or population. Common percentiles are the 25th, the 50th and the 75th. The 50th percentile is the median. The 25th and 75th percentiles are also known as the lower and upper quartiles respectively.
Cont.. The 50th percentile (median) is a measure of location (central tendency), while the other percentiles are normally used to gauge the spread of the data. One of the commonest uses of percentiles is in computing the interquartile range.
Example: The distances in kilometers of 11 health units from the district headquarters are 6, 47, 49, 15, 43, 41, 7, 39, 43, 41 and 36. We can calculate the interquartile range from the lower and upper quartiles using the steps below.
Step 1: Arrange in ascending order: 6, 7, 15, 36, 39, 41, 41, 43, 43, 47, 49
Step 2: Identify the positions of the lower and upper quartiles
Cont..
Position of the lower quartile: $Q_1 = \frac{n+1}{4} = \frac{11+1}{4} = 3$
Position of the upper quartile: $Q_3 = \frac{3(n+1)}{4} = \frac{3(11+1)}{4} = 9$
Step 3: Identify the values that occupy positions $Q_1$ and $Q_3$. As with the median, if a value occupies position $Q_1$ then that value is the lower quartile. Similarly, if it occupies position $Q_3$ then it is the upper quartile. The value at position $Q_1$ is 15 and the value at position $Q_3$ is 43.
Cont..
Step 4: Calculate the interquartile range. Subtract the value at position $Q_1$ from the value at position $Q_3$: 43 - 15 = 28
Cont.. If the position of a quartile lies between two observations, the value of the quartile is the value of the lower observation plus the specified fraction of the difference between the two observations. For example, if the position of a quartile is 20¼, it lies between the 20th and 21st observations, so the quartile is the value of the 20th observation plus ¼ of the difference between the 20th and 21st observations. Suppose we had one more health unit at a distance of 51 km, giving n = 12. Then the positions would be $Q_1 = \frac{12+1}{4} = 3¼$ and $Q_3 = \frac{3(12+1)}{4} = 9¾$.
Cont.. The lower quartile would then lie between the 3rd (15) and 4th (36) values: 15 + ¼ × (36 - 15) = 20.25
The upper quartile would lie between the 9th (43) and 10th (47) values: 43 + ¾ × (47 - 43) = 46
The interquartile range = 46 - 20.25 = 25.75
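The quartile procedure above, including the case where a position falls between two observations, can be sketched as follows. The positions (n + 1)/4 and 3(n + 1)/4 are inferred from the worked values in the example; the function names are assumptions made for illustration, and other textbooks and software use slightly different quartile conventions.

```python
import math


def value_at_position(ordered, position):
    """Value at a 1-based, possibly fractional, position in sorted data."""
    lower_index = math.floor(position)         # e.g. 3 for position 3.25
    fraction = position - lower_index          # e.g. 0.25 for position 3.25
    lower_value = ordered[lower_index - 1]     # convert 1-based position to 0-based index
    if fraction == 0:
        return lower_value                     # the position lands exactly on an observation
    upper_value = ordered[lower_index]
    # Lower observation plus the fraction of the gap to the next observation.
    return lower_value + fraction * (upper_value - lower_value)


def quartiles(values):
    """Return (lower quartile, upper quartile, interquartile range)."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 3:
        raise ValueError("this sketch assumes at least 3 observations")
    lower_q = value_at_position(ordered, (n + 1) / 4)
    upper_q = value_at_position(ordered, 3 * (n + 1) / 4)
    return lower_q, upper_q, upper_q - lower_q


distances = [6, 47, 49, 15, 43, 41, 7, 39, 43, 41, 36]
print(quartiles(distances))          # 11 health units: (15, 43, 28)
print(quartiles(distances + [51]))   # with the extra 51 km unit: (20.25, 46.0, 25.75)
```

As far as I am aware, Python's built-in `statistics.quantiles(data, n=4)` with its default `method='exclusive'` uses the same (n + 1)-based positions, so it may serve as a cross-check.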
Variance and Standard Deviation: Variance and standard deviation measure how far the observations lie from the expected value, or mean. The sample variance is obtained by summing the squared differences of the observations from the mean and dividing by n - 1, where n is the number of observations. The unsquared differences always add up to zero, which is why they are squared first: once squared, none of them is negative.
Cont.. The standard deviation is the square root of the variance. It is the most commonly used measure of statistical dispersion. It is non-negative and has the same units as the data. The standard deviation of the population is symbolized by σ, while that of the sample is designated s.
Cont..
Step 1: Calculate the mean, $\bar{x}$
Step 2: Compute the difference between each observation and the mean, $(x_i - \bar{x})$
Step 3: Square the differences, $(x_i - \bar{x})^2$. Squaring eliminates the negative signs, so the differences no longer cancel out to zero when summed
Step 4: Get the sum of the squared differences, $\sum (x_i - \bar{x})^2$
Cont.. Step 5: Since the data are a sample, divide the sum (from step 4) by the number of observations minus one, i.e. (n - 1), where n is the number of observations in the data set. [The term (n - 1) will later be called the degrees of freedom.] This gives the sample variance, usually written $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$, where $x_i$ is an observation and $\bar{x}$ is the mean. The population variance, $\sigma^2$, is instead obtained by dividing the sum of the squared differences by the total number of observations, or population size, N.
Cont.. Step 6: Since we have all along been dealing with squared differences, we now take the square root of the variance to return to the original units of the data. This square root is the standard deviation: $s = \sqrt{s^2}$
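The calculation just described can be sketched in Python as below. The slides give no data set for this section, so reusing the 11 health-unit distances from the quartile example is an assumption made here, as are the function names.

```python
import math


def sum_of_squared_differences(values):
    """Sum of squared differences of each observation from the mean."""
    mean = sum(values) / len(values)                   # calculate the mean
    differences = [x - mean for x in values]           # difference of each observation from the mean
    squared = [d ** 2 for d in differences]            # square the differences
    return sum(squared)                                # sum the squared differences


def sample_variance(values):
    """Divide by n - 1 because the data are treated as a sample."""
    n = len(values)
    if n < 2:
        raise ValueError("a sample variance needs at least 2 observations")
    return sum_of_squared_differences(values) / (n - 1)


def population_variance(values):
    """Divide by N because the data are treated as the whole population."""
    if not values:
        raise ValueError("at least one observation is required")
    return sum_of_squared_differences(values) / len(values)


def sample_standard_deviation(values):
    """The standard deviation is the square root of the variance."""
    return math.sqrt(sample_variance(values))


# Distances (km) reused from the quartile example, purely as illustration.
distances = [6, 47, 49, 15, 43, 41, 7, 39, 43, 41, 36]
print("sample variance:", round(sample_variance(distances), 2))
print("sample standard deviation:", round(sample_standard_deviation(distances), 2))
```

Python's standard library offers `statistics.variance`, `statistics.pvariance` and `statistics.stdev` for the same quantities; the hand-written version is shown only to mirror the calculation step by step.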
Presentation of descriptive data: Suppose that we have 12 students in a bio-statistics course who have each achieved a score in a test. We can present this information in a straightforward, simple way, e.g. arranged in order from the lowest to the highest: 61, 69, 72, 76, 78, 83, 85, 85, 86, 88, 93 and 97.
Cont.. However, presenting the raw data like this poses problems. It is:
Too detailed
Too broad
Difficult to interpret
Plausible only for small data sets
There are methods available for summarizing this information so that the data set can be presented more meaningfully.
Cont.. We can summarize nominal and ordinal data (categorical data) using numbers. We use:
Frequencies
Ratios, rates and proportions
These measures can also be used for numerical data that have been grouped, e.g. into age categories. A commonly used form of proportion is the percentage.
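As a small illustration of frequencies, proportions and percentages for categorical data, consider the sketch below. The blood-group data are hypothetical and do not come from the slides, and `frequency_summary` is a name chosen here.

```python
from collections import Counter

# Hypothetical categorical data (not from the slides): blood groups of 10 patients.
blood_groups = ["O", "A", "O", "B", "A", "O", "AB", "O", "A", "B"]


def frequency_summary(categories):
    """Map each category to (frequency, proportion, percentage)."""
    counts = Counter(categories)            # frequency of each category
    total = sum(counts.values())
    return {
        category: (count, count / total, 100 * count / total)
        for category, count in counts.items()
    }


summary = frequency_summary(blood_groups)
for category, (count, proportion, percentage) in summary.items():
    print(f"{category}: n = {count}, proportion = {proportion:.2f}, {percentage:.0f}%")
```

The same function could be applied to grouped numerical data, for example a list of age-category labels.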
Tables and graphical presentations: We can use tables and graphs to display data. The choice depends on whether the data are numerical or categorical. We can employ:
Frequency tables: We subdivide numerical data into classes, e.g. age groups, and indicate the counts in each group.
Histograms: Frequency is represented by the area of each bar. The continuous variable of interest is on the x-axis, usually in grouped form based on ranges. The ranges must be of uniform width. The frequency of occurrence is on the y-axis.
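A frequency table with uniform class widths can be sketched as below, using the 12 test scores listed earlier. The starting point of 60 and the class width of 10 are assumptions chosen for illustration, and the row of `#` marks is only a rough text stand-in for histogram bars.

```python
# Test scores of the 12 students from the earlier slide.
scores = [61, 69, 72, 76, 78, 83, 85, 85, 86, 88, 93, 97]


def frequency_table(values, start, width):
    """Count integer values in equal-width classes beginning at `start`.

    Returns a list of (class lower limit, class upper limit, frequency).
    """
    if not values:
        raise ValueError("at least one observation is required")
    if min(values) < start:
        raise ValueError("start must not exceed the smallest value")
    counts = {}
    for value in values:
        lower = start + ((value - start) // width) * width   # lower limit of the value's class
        counts[lower] = counts.get(lower, 0) + 1
    table = []
    lower = start
    while lower <= max(values):                              # keep empty classes as zero counts
        table.append((lower, lower + width - 1, counts.get(lower, 0)))
        lower += width
    return table


table = frequency_table(scores, start=60, width=10)
for lower, upper, frequency in table:
    print(f"{lower}-{upper}: {frequency:2d} {'#' * frequency}")
```

For these scores the classes 60-69, 70-79, 80-89 and 90-99 hold 2, 3, 5 and 2 students respectively.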
Cont..
Frequency polygons: A derivative of histograms in which a line is drawn to indicate the frequencies. They are preceded by a frequency table. They are useful when comparing two distributions on the same graph.
Line graphs: They indicate the variation of one discrete or continuous variable with another.
Cont..
Scatter plots
Stem and leaf plots
Box and whisker plots
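Of the plots listed, the stem and leaf plot is simple enough to build as plain text. The sketch below uses the 12 test scores from the earlier slide and assumes non-negative whole numbers, with the tens digit as the stem and the units digit as the leaf; the function name is an assumption made here.

```python
# Test scores of the 12 students from the earlier slide.
scores = [61, 69, 72, 76, 78, 83, 85, 85, 86, 88, 93, 97]


def stem_and_leaf(values):
    """Group non-negative integers by stem (value // 10) with leaves (value % 10)."""
    stems = {}
    for value in sorted(values):                  # sorting keeps leaves in ascending order
        stems.setdefault(value // 10, []).append(value % 10)
    return stems


plot = stem_and_leaf(scores)
for stem, leaves in plot.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

Each row shows one stem with its leaves, so the row for stem 8 (3 5 5 6 8) makes it easy to see that most scores fall in the eighties while every individual score remains readable.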