Basics of Biostatistics
Measures of central tendency and dispersion
Added: Nov 01, 2017
Slides: 60 pages
Measures of Central Tendency and Measures of Dispersion
Dr. Dhaval Chaudhary (M.V.Sc. Scholar, Animal Genetics and Breeding)
Measure of central tendency
Central tendency may be defined as the value of the variate that is most representative of the series or of the distribution as a whole. How well an average represents the data depends on how closely the other observations cluster around it.
Characteristics of a satisfactory average
1) It should be rigidly defined and not left to the judgement of the investigator.
2) Its computation should be based on all the observations.
3) Its general nature should be easily understood; it should not be too abstract mathematically.
4) It should be easy and quick to compute.
5) It should be as little affected by fluctuations of sampling as possible.
6) It should lend itself readily to further algebraic treatment.
Types of average
1) Arithmetic mean
2) Median
3) Mode
4) Geometric mean
5) Harmonic mean
6) Weighted mean
Arithmetic mean
This is the most widely used form of average in practice. Its value depends on all the observations. If $X_1, X_2, X_3, \dots, X_N$ are the N observations, their arithmetic mean $\bar{X}$ is given by:
$\bar{X} = \frac{1}{N}(X_1 + X_2 + X_3 + \dots + X_N) = \frac{\sum_{i=1}^{N} X_i}{N}$
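A minimal Python sketch of the formula above, using only the standard library. The input values are illustrative (not from the slides); replacing one of them with an extreme value shows how strongly a single outlier pulls the mean.

```python
from statistics import fmean

def arithmetic_mean(values):
    """Arithmetic mean: (X1 + X2 + ... + XN) / N."""
    values = list(values)
    if not values:
        raise ValueError("need at least one observation")
    return sum(values) / len(values)

# Illustrative values only (not taken from the slides).
sample = [4, 5, 6, 5, 5]
with_outlier = sample[:-1] + [50]   # replace the last value by an extreme one

print(arithmetic_mean(sample))        # 5.0
print(arithmetic_mean(with_outlier))  # 14.0
print(fmean(sample))                  # standard-library equivalent
```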
Merits and demerits of A.M.
Merits: It is rigidly defined. It is based on all the observations. It is the average least affected by fluctuations of sampling.
Demerits: It is strongly affected by extreme values. Its value may not coincide with any of the observed values. Unlike the median and mode, it cannot be located on the frequency curve, nor can it be obtained by inspection.
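Median
When all the observations are arranged in ascending or descending order of magnitude, the middle value is known as the median. If N is odd, it is the $\frac{N+1}{2}$-th value; if N is even, it is taken as the mean of the $\frac{N}{2}$-th and $\left(\frac{N}{2}+1\right)$-th values.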
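The rule above can be sketched in Python as follows; the inputs are illustrative. For an even number of observations this sketch takes the mean of the two middle values, which is why the median may not coincide with any observed value.

```python
def median(values):
    """Middle value of the sorted data; mean of the two middle values when N is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one observation")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([7, 1, 3]))       # 3
print(median([7, 1, 3, 100]))  # 5.0: the extreme value 100 has no pull on it
```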
Merits and demerits of median
Merits: It can be readily calculated and is rigidly defined. It can be obtained easily even if the extreme values are not known. The median is the same whichever method of computation is applied.
Demerits: It is not a satisfactory average when there is great variation among the items of the population. It cannot be expressed precisely when it falls between two values. It is more affected by fluctuations of sampling than the arithmetic mean.
Mode
The mode is the value of the variable that occurs most frequently, i.e., whose frequency is maximum. If several samples are drawn from a population, the typical value that appears repeatedly in all the samples is also called the mode. In the example on this slide, the mode is 6 (it occurs most often).
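A small sketch of finding the mode by counting frequencies. The helper name `modes` and the inputs are illustrative; returning every value that shares the maximum frequency makes bimodal and multimodal series visible (the standard library's `statistics.multimode` behaves similarly).

```python
from collections import Counter

def modes(values):
    """Return every value with the maximum frequency (more than one => multimodal).

    If every value occurs once, all values are returned: such data have no useful mode.
    """
    counts = Counter(values)
    if not counts:
        raise ValueError("need at least one observation")
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

print(modes([3, 6, 6, 2, 6, 4]))  # [6]
print(modes([1, 1, 2, 2, 3]))     # [1, 2] (bimodal)
```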
Merits of mode
It can be obtained simply by inspection. The extreme values are not needed to compute it, and it is not affected by them. As it is the item of maximum frequency, the same item tends to be the mode in every sample drawn from the population; this peculiarity is present only in the mode and in no other average.
Demerits of mode
In many cases there is no single, well-defined mode. When the series has more than one mode, the mode is difficult and time-consuming to determine. Its computation is not based on all the observations. Unlike the arithmetic mean, the mode multiplied by the number of observations does not give the total of all the observations.
Geometric mean
The arithmetic mean, although it gives equal weight to all the items, tends to be pulled towards the higher values. Sometimes we want an average that leans towards the lower values; in such cases we use the geometric mean. For positive data, the geometric mean of a series is never greater than its arithmetic mean (the two are equal only when all the values are identical). It is defined by the following relations.
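If $a_1, a_2, a_3, \dots, a_N$ are N individual observations and G.M. is their geometric mean, then
$\text{G.M.} = \sqrt[N]{a_1 \times a_2 \times a_3 \times \dots \times a_N} = (a_1 a_2 a_3 \cdots a_N)^{1/N}$
For hand computation, logarithms are used. For values $X_1, X_2, \dots, X_k$ occurring with frequencies $f_1, f_2, \dots, f_k$ (where $N = \sum f_i$):
$\log(\text{G.M.}) = \frac{f_1 \log X_1 + f_2 \log X_2 + \dots + f_k \log X_k}{N} = \frac{\sum f_i \log X_i}{N}$, so $\text{G.M.} = \text{antilog}\left(\frac{\sum f_i \log X_i}{N}\right)$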
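The logarithmic formula can be sketched in Python as below (base-10 logarithms, as in hand computation). The function name `geometric_mean_grouped` and the inputs are illustrative; the check against `statistics.geometric_mean` on the expanded data confirms that weighting each log by its frequency is equivalent to repeating the value.

```python
import math
from statistics import geometric_mean

def geometric_mean_grouped(values, freqs):
    """G.M. = antilog( sum(f_i * log10 X_i) / N ), with N = sum(f_i).

    All values must be positive: the logarithm is undefined for zero or negative items.
    """
    if any(x <= 0 for x in values):
        raise ValueError("geometric mean needs strictly positive values")
    n = sum(freqs)
    log_mean = sum(f * math.log10(x) for x, f in zip(values, freqs)) / n
    return 10 ** log_mean

# Illustrative: 2 and 8 once each -> sqrt(2 * 8) = 4 (up to rounding)
print(geometric_mean_grouped([2, 8], [1, 1]))
# Frequencies (2, 1) give the same result as the expanded data [2, 2, 8]
print(geometric_mean_grouped([2, 8], [2, 1]), geometric_mean([2, 2, 8]))
```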
Merits and demerits of geometric mean
Merits: It is rigidly defined and its computation is based on all the observations.
Demerits: It is not as easy to compute as the arithmetic mean. Because it is rather abstract mathematically, it is not widely used. If any item of the series is zero, the G.M. becomes zero, and if some items are negative, it becomes meaningless (or cannot be computed at all).
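Harmonic mean
The harmonic mean of a set of quantities is the reciprocal of the arithmetic mean of their reciprocals:
$\text{H.M.} = \frac{1}{\frac{1}{N}\sum_{i=1}^{N} \frac{1}{X_i}} = \frac{N}{\sum_{i=1}^{N} \frac{1}{X_i}}$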
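A minimal sketch of the harmonic mean formula, checked against the standard library. The speeds are an illustrative textbook-style case (two equal distances covered at 40 and 60 km/h), not data from the slides; the last line shows the ordering A.M. ≥ G.M. ≥ H.M. that holds for positive data.

```python
from statistics import fmean, geometric_mean, harmonic_mean

def harmonic_mean_manual(values):
    """H.M. = N / sum(1 / X_i): reciprocal of the mean of the reciprocals."""
    if any(x <= 0 for x in values):
        raise ValueError("harmonic mean needs strictly positive values")
    return len(values) / sum(1 / x for x in values)

# Illustrative: average speed over two equal distances at 40 and 60 km/h (about 48)
speeds = [40, 60]
print(harmonic_mean_manual(speeds), harmonic_mean(speeds))

# For positive data the three means are ordered A.M. >= G.M. >= H.M.
print(fmean(speeds), geometric_mean(speeds), harmonic_mean(speeds))
```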
Merits and demerits of H.M.
Merits: It is rigidly defined and its calculation is based on all the observations. It is also not much affected by fluctuations of sampling.
Demerits: The harmonic mean is neither easily calculated nor easily understood. As it gives high weight to smaller values, it is not very useful in the analysis of economic data. It is not a good representative of a set of observations unless the smaller values are meant to receive high weight.
Weighted mean
In the calculation of the arithmetic mean, every item is given equal importance, i.e., equal weight. Sometimes, however, not all the items are of equal importance. The items are then given weights according to their relative importance, and the average calculated using these weights is called the weighted average or weighted mean.
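Formula: Weighted mean
$\bar{X}_w = \frac{w_1 X_1 + w_2 X_2 + \dots + w_N X_N}{w_1 + w_2 + \dots + w_N} = \frac{\sum w_i X_i}{\sum w_i}$, where $w_i$ is the weight attached to the observation $X_i$.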
It is especially useful in the following cases:
(i) When the numbers of individuals in the different classes of a group vary widely.
(ii) When the items in a series are not all of the same importance.
(iii) When ratios, percentages or rates (e.g., quintals per hectare, rupees per kilogram or rupees per metre) are to be averaged.
(iv) When the mean of a series or group is to be obtained from the means of its component parts.
The weighted mean is particularly used in calculating birth rates, death rates, index numbers, average yields, etc.
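The weighted-mean formula, applied to case (iv) (combining the means of component groups), might look like this in Python. The group means and sizes are hypothetical numbers chosen only to show that the unweighted average of group means can mislead when the groups differ in size.

```python
def weighted_mean(values, weights):
    """Weighted mean = sum(w_i * X_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight

# Hypothetical example: two groups of different size with known group means.
group_means = [10.0, 16.0]  # mean of each component group
group_sizes = [30, 10]      # number of individuals per group, used as weights

print(weighted_mean(group_means, group_sizes))  # 11.5, the mean of all 40 individuals
print(sum(group_means) / len(group_means))      # 13.0, the unweighted value misleads
```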
Example
The fibre lengths (mm) of 20 types of wool are given below. Find the arithmetic mean, geometric mean, harmonic mean, median and mode.
138, 138, 132, 149, 164, 146, 147, 152, 150, 168, 176, 154, 132, 146, 147, 140, 144, 161, 142, 145
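Answer
Step 1: Prepare a table of the following type (logarithms to base 10):

| Sr. No. | Fibre length X (mm) | log X | 1/X |
|---|---|---|---|
| 1 | 132 | 2.12 | 0.0076 |
| 2 | 132 | 2.12 | 0.0076 |
| 3 | 138 | 2.14 | 0.0072 |
| 4 | 138 | 2.14 | 0.0072 |
| 5 | 140 | 2.15 | 0.0071 |
| 6 | 142 | 2.15 | 0.0070 |
| 7 | 144 | 2.15 | 0.0069 |
| 8 | 145 | 2.16 | 0.0069 |
| 9 | 146 | 2.16 | 0.0068 |
| 10 | 146 | 2.16 | 0.0068 |
| 11 | 147 | 2.17 | 0.0068 |
| 12 | 147 | 2.17 | 0.0068 |
| 13 | 149 | 2.17 | 0.0067 |
| 14 | 150 | 2.18 | 0.0067 |
| 15 | 152 | 2.18 | 0.0066 |
| 16 | 154 | 2.19 | 0.0065 |
| 17 | 161 | 2.21 | 0.0062 |
| 18 | 164 | 2.21 | 0.0061 |
| 19 | 168 | 2.23 | 0.0060 |
| 20 | 176 | 2.25 | 0.0057 |
| Total | 2971 | 43.41 | 0.1352 |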
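Arithmetic mean (A.M.): $\bar{X} = \frac{\sum X}{N} = \frac{2971}{20} = 148.55$ mm
Geometric mean: $\text{G.M.} = \text{antilog}\left(\frac{\sum \log X}{N}\right) = \text{antilog}\left(\frac{43.41}{20}\right) = \text{antilog}(2.1705) \approx 148.08$ mm
Harmonic mean: $\text{H.M.} = \frac{N}{\sum (1/X)} = \frac{20}{0.1352} \approx 147.93$ mm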
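Median: The data in ascending order of magnitude are 132, 132, 138, 138, 140, 142, 144, 145, 146, 146, 147, 147, 149, 150, 152, 154, 161, 164, 168, 176. Since N = 20 is even, the median is the mean of the 10th and 11th values: Median = (146 + 147) / 2 = 146.5 mm.
Mode: The data are multimodal: 132, 138, 146 and 147 each occur twice, more often than any other value. Hence there are four modes: 132, 138, 146 and 147.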
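The whole wool example can be cross-checked with Python's `statistics` module, a sketch using the 20 fibre lengths as listed in the worked table. The A.M. (148.55) and median (146.5) match the hand values exactly; the G.M. and H.M. come out near 148.1 and 147.7, slightly different from the hand results because the table rounds the logarithms and reciprocals.

```python
from statistics import fmean, geometric_mean, harmonic_mean, median, multimode

# Fibre lengths (mm) of the 20 wool types, as listed in the worked table
lengths = [132, 132, 138, 138, 140, 142, 144, 145, 146, 146,
           147, 147, 149, 150, 152, 154, 161, 164, 168, 176]

am = fmean(lengths)
gm = geometric_mean(lengths)
hm = harmonic_mean(lengths)

print(f"A.M.   = {am:.2f}")
print(f"G.M.   = {gm:.2f}")
print(f"H.M.   = {hm:.2f}")
print(f"Median = {median(lengths)}")
print(f"Modes  = {multimode(lengths)}")  # every value tied for the highest frequency
```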
Measure of dispersion
To study a series fully, it is essential to study the extent to which the observations are scattered (their dispersion) along with their central tendency, as this throws more light on the nature of the series. Simply put, dispersion (also called variability, scatter or spread) is the extent to which a distribution is stretched or squeezed.
Different measures of dispersion
1) Range
2) Mean deviation
3) Standard deviation
4) Variance
5) Quartile deviation
6) Coefficient of variation
7) Standard error
Range
Range is the simplest measure of dispersion. It is the difference between the highest and the lowest values of a series of observations:
$\text{Range} = X_H - X_L$
where $X_H$ = highest variate value and $X_L$ = lowest variate value.
Properties
Its value usually increases as the sample size increases. It is unstable in repeated sampling: it varies considerably from sample to sample, even for samples of the same size, including large ones. It is a very rough measure of dispersion and is unsuitable for precise and accurate studies. The only merits of the range are that it is (i) simple, (ii) easy to understand and (iii) quickly calculated.
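The first property can be illustrated with a short simulation. The normal population (mean 100, S.D. 15) and the seed are assumptions made only for the illustration. Because each smaller sample here is contained in the larger one, its range can never exceed the larger sample's range.

```python
import random

def value_range(values):
    """Range = highest value - lowest value."""
    return max(values) - min(values)

rng = random.Random(42)  # fixed seed so the run is repeatable
population = [rng.gauss(100, 15) for _ in range(1000)]  # simulated, illustrative data

# Nested samples of increasing size: a larger sample can only widen the range.
for n in (10, 100, 1000):
    print(n, round(value_range(population[:n]), 1))
```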
Mean deviation
Deviations taken without regard to their plus or minus signs are known as absolute deviations. The mean of these absolute deviations is called the mean deviation. If the deviations are calculated from the mean, the measure is called the mean deviation about the mean:
$\text{M.D.} = \frac{1}{N}\sum_{i=1}^{N} \lvert X_i - \bar{X} \rvert$
Characteristics of mean deviation
A notable characteristic of the mean deviation is that it is least when calculated about the median. The standard deviation is never less than the mean deviation about the mean, i.e., it is either equal to or greater than it; when greater accuracy is required, the standard deviation is used as the measure of dispersion. When an average other than the A.M. is used as the measure of central tendency, the M.D. about that average is the only suitable measure of dispersion.
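Both characteristics can be checked numerically, here a sketch on the fibre lengths from the wool example. The helper `mean_deviation` is illustrative, and the S.D. uses divisor N. The script gives M.D. about the median 8.25, M.D. about the mean 8.56 and S.D. about 11.20, consistent with M.D. (median) ≤ M.D. (mean) ≤ S.D.

```python
from statistics import fmean, median, pstdev

def mean_deviation(values, about):
    """Mean of the absolute deviations |X_i - about|."""
    return sum(abs(x - about) for x in values) / len(values)

# Fibre lengths (mm) from the wool example
lengths = [132, 132, 138, 138, 140, 142, 144, 145, 146, 146,
           147, 147, 149, 150, 152, 154, 161, 164, 168, 176]

md_mean = mean_deviation(lengths, fmean(lengths))     # M.D. about the mean
md_median = mean_deviation(lengths, median(lengths))  # M.D. about the median
sd = pstdev(lengths)                                  # S.D. with divisor N

print(f"M.D. about median = {md_median:.2f}")
print(f"M.D. about mean   = {md_mean:.2f}")
print(f"S.D.              = {sd:.2f}")
print(md_median <= md_mean <= sd)  # True
```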
Standard deviation
Its calculation is also based on deviations from the arithmetic mean. The sum of the deviations from the arithmetic mean is always zero; the mean deviation overcomes this difficulty by ignoring the plus and minus signs. The standard deviation overcomes it instead by squaring the deviations, averaging the squares and taking the square root.
It is thus defined by the following expression:
Standard deviation (S.D.) $\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$
where X = an observation or variate value, µ = arithmetic mean of the population and N = number of observations.
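Sample (s): $s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$, where $\bar{X}$ is the sample mean and n the number of observations in the sample.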
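The two formulas differ only in the divisor (N for the population, n − 1 for the sample). A minimal sketch, assuming illustrative values whose mean is 5, compares hand-written versions with `statistics.pstdev` and `statistics.stdev`; the last line shows that S.D. = 0 when all the values are the same.

```python
import math
from statistics import pstdev, stdev

def sd_population(values):
    """sigma = sqrt( sum((X - mu)^2) / N )"""
    n = len(values)
    mu = sum(values) / n
    return math.sqrt(sum((x - mu) ** 2 for x in values) / n)

def sd_sample(values):
    """s = sqrt( sum((X - mean)^2) / (n - 1) )"""
    n = len(values)
    if n < 2:
        raise ValueError("the sample S.D. needs at least two observations")
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values, mean = 5

print(sd_population(data), pstdev(data))  # divisor N: 2 for these values
print(sd_sample(data), stdev(data))       # divisor n - 1: about 2.138
print(sd_population([3, 3, 3]))           # 0.0 when all values are equal
```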
Characteristics and uses of S.D.
It is rigidly defined. Its computation is based on all the observations. If all the variate values are the same, S.D. = 0. It is the measure of dispersion least affected by fluctuations of sampling.
It is used in computing other statistical quantities, such as regression coefficients and correlation coefficients.
Variance
Variance is the square of the standard deviation: $\text{Variance} = (\text{S.D.})^2$. It is used very extensively in the statistical analysis of experimental results. The variance of a population is generally denoted by $\sigma^2$ and its unbiased estimate calculated from a sample by $s^2$.
Quartile deviation or semi-inter-quartile range
This measure of dispersion is expressed in terms of quartiles and is known as the quartile deviation or semi-inter-quartile range:
$\text{Q.D.} = \frac{Q_3 - Q_1}{2}$
where $Q_1$ = lower quartile and $Q_3$ = upper quartile.
It is not a measure of deviation from any particular average. For symmetrical and moderately skewed distributions, the quartile deviation is approximately two-thirds of the standard deviation:
$\text{Q.D.} \approx \frac{2}{3}\sigma$
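A sketch of the quartile deviation and of the two-thirds rule. Quartile conventions differ between textbooks and software; this sketch assumes `statistics.quantiles` with its default `'exclusive'` method, which locates $Q_1$ and $Q_3$ at the (N + 1)/4 and 3(N + 1)/4 positions. For an exactly normal curve, Q.D./σ equals the 75th-percentile z-score (about 0.6745); the simulated sample (seeded, illustrative) lands close to it.

```python
import random
from statistics import NormalDist, pstdev, quantiles

def quartile_deviation(values):
    """Q.D. = (Q3 - Q1) / 2; quartiles from statistics.quantiles ('exclusive' method)."""
    q1, _, q3 = quantiles(values, n=4)
    return (q3 - q1) / 2

# Exact value for a normal curve: Q3 lies 0.6745 sigma above the mean.
print(f"{NormalDist().inv_cdf(0.75):.4f}")

# Estimate from a simulated normal sample (illustrative, fixed seed).
rng = random.Random(0)
sample = [rng.gauss(0, 1) for _ in range(20_000)]
ratio = quartile_deviation(sample) / pstdev(sample)
print(f"{ratio:.3f}")
```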
Coefficient of variation
This is a relative measure of dispersion. It is especially important because it is built from the most widely used measures of central tendency and dispersion, i.e., the arithmetic mean and the standard deviation. It is given by
$\text{C.V.} = \frac{\text{S.D.}}{\bar{X}} \times 100$
It is expressed as a percentage and is used to compare the variability of two or more series.
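Because C.V. is unit-free, it can compare series measured on different scales. The sketch below compares the wool fibre lengths with the Example 1 data in these slides; using the population S.D. (divisor N) is an assumption, since the slides do not say which divisor they use. The wool lengths come out near 7.5% and the Example 1 data near 21.5%, so the latter is relatively far more variable.

```python
from statistics import fmean, pstdev

def coefficient_of_variation(values):
    """C.V. (%) = S.D. / A.M. * 100, using the population S.D. (divisor N)."""
    return pstdev(values) / fmean(values) * 100

wool = [132, 132, 138, 138, 140, 142, 144, 145, 146, 146,
        147, 147, 149, 150, 152, 154, 161, 164, 168, 176]     # mm, wool example
example1 = [1.3, 1.1, 1.0, 2.0, 1.7, 2.0, 1.9, 1.8, 1.6, 1.5]  # Example 1 data

print(f"wool:      {coefficient_of_variation(wool):.1f}%")
print(f"example 1: {coefficient_of_variation(example1):.1f}%")
```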
Standard error
The term 'standard error' of an estimate is used for a measure of the average magnitude of the difference between the sample estimate and the population parameter, taken over all possible samples of the same size from the population. It is the standard deviation of the sampling distribution of the estimate. If S is the standard deviation of a sample of size N, the estimated standard error of the mean is
$\text{S.E.}(\bar{X}) = \frac{S}{\sqrt{N}}$
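A sketch of the standard error of the mean, assuming S is the sample S.D. (divisor N − 1), applied to the Example 1 data (about 0.114). The second part checks the definition by simulation: with an assumed normal population (mean 10, σ = 2) and many samples of size 25, the S.D. of the sample means comes out close to σ/√N = 0.4.

```python
import math
import random
from statistics import fmean, pstdev, stdev

def standard_error_of_mean(values):
    """Estimated S.E. of the mean = s / sqrt(N), using the sample S.D. s."""
    return stdev(values) / math.sqrt(len(values))

example1 = [1.3, 1.1, 1.0, 2.0, 1.7, 2.0, 1.9, 1.8, 1.6, 1.5]
print(f"S.E. of mean (Example 1 data) = {standard_error_of_mean(example1):.4f}")

# Simulation check: the spread of many sample means approaches sigma / sqrt(n).
rng = random.Random(1)
sigma, n = 2.0, 25  # illustrative population S.D. and sample size
means = [fmean([rng.gauss(10, sigma) for _ in range(n)]) for _ in range(20_000)]
print(f"S.D. of sample means = {pstdev(means):.3f}, sigma / sqrt(n) = {sigma / math.sqrt(n):.3f}")
```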
Example 1
Find the range, quartile deviation, mean deviation about the mean ($\bar{x}$) and standard deviation, and their relative measures, for the following data:
1.3, 1.1, 1.0, 2.0, 1.7, 2.0, 1.9, 1.8, 1.6, 1.5
Answer: Arranging the data in ascending order, we get 1.0, 1.1, 1.3, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0, 2.0
Range: H - L = 2.0 - 1.0 = 1.0
Quartile Deviation:
Mean Deviation about Mean
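Prepare a table of this kind (here $\bar{x} = 15.9 / 10 = 1.59$):

| S. No. | $x_i$ | $\lvert x_i - \bar{x} \rvert$ | $(x_i - \bar{x})^2$ |
|---|---|---|---|
| 1 | 1.0 | 0.59 | 0.3481 |
| 2 | 1.1 | 0.49 | 0.2401 |
| 3 | 1.3 | 0.29 | 0.0841 |
| 4 | 1.5 | 0.09 | 0.0081 |
| 5 | 1.6 | 0.01 | 0.0001 |
| 6 | 1.7 | 0.11 | 0.0121 |
| 7 | 1.8 | 0.21 | 0.0441 |
| 8 | 1.9 | 0.31 | 0.0961 |
| 9 | 2.0 | 0.41 | 0.1681 |
| 10 | 2.0 | 0.41 | 0.1681 |
| Total | 15.9 | 2.92 | 1.1690 |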
Mean deviation: $\text{M.D.} = \frac{\sum \lvert x_i - \bar{x} \rvert}{N} = \frac{2.92}{10} = 0.292$
Standard deviation:
Coefficient of variation:
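The full set of Example 1 results can be computed as below. Because the slide's own working for these steps is not preserved, the sketch assumes common textbook conventions: quartiles at the (N + 1)-based positions (`statistics.quantiles`, default method), coefficient of range $(H - L)/(H + L)$, coefficient of Q.D. $(Q_3 - Q_1)/(Q_3 + Q_1)$, coefficient of M.D. = M.D./mean, and C.V. computed with divisor N. Under these assumptions it reports range 1.0, Q.D. about 0.34, M.D. 0.292, S.D. about 0.342 (0.360 with N − 1) and C.V. about 21.5%.

```python
from statistics import fmean, pstdev, quantiles, stdev

# Example 1 data
x = [1.3, 1.1, 1.0, 2.0, 1.7, 2.0, 1.9, 1.8, 1.6, 1.5]
n = len(x)
high, low = max(x), min(x)

mean = fmean(x)                           # 15.9 / 10
data_range = high - low
coef_range = (high - low) / (high + low)  # relative measure of range

q1, _, q3 = quantiles(x, n=4)             # (N + 1)-position quartiles
qd = (q3 - q1) / 2
coef_qd = (q3 - q1) / (q3 + q1)           # relative measure of Q.D.

md = sum(abs(v - mean) for v in x) / n    # 2.92 / 10
coef_md = md / mean                       # relative measure of M.D.

sd_pop = pstdev(x)                        # divisor N
sd_samp = stdev(x)                        # divisor N - 1
cv = sd_pop / mean * 100                  # C.V. in percent (divisor N)

print(f"Range = {data_range:.2f}, coefficient of range = {coef_range:.4f}")
print(f"Q1 = {q1:.3f}, Q3 = {q3:.3f}, Q.D. = {qd:.4f}, coefficient of Q.D. = {coef_qd:.4f}")
print(f"M.D. = {md:.3f}, coefficient of M.D. = {coef_md:.4f}")
print(f"S.D. (N) = {sd_pop:.4f}, S.D. (N - 1) = {sd_samp:.4f}, C.V. = {cv:.2f}%")
```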
Example 2
Calculate the range, quartile deviation, mean deviation about the mean, median and mode, standard deviation, and their relative measures, for the following data.

| Lactation period | No. of cows |
|---|---|
| 120-130 | 7 |
| 130-140 | 11 |
| 140-150 | 21 |
| 150-160 | 11 |
| 160-170 | 7 |
| 170-180 | 3 |
Answer: Formulate a table of this kind
Range:
Quartile deviation:
Mean deviation:
Standard deviation:
Coefficient of variation:
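Since the slide's table and results for Example 2 are not preserved, here is a sketch of the usual grouped-data computations in Python. Assumptions: class mid-points represent each class; median and quartiles use the interpolation $L + \frac{pN - cf}{f} h$; the mode uses $L + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} h$; the range is taken from the outer class limits; and divisor N is used for the S.D. With these conventions the script gives mean 146.5, median about 145.71, mode 145, Q.D. about 9.09, M.D. about 10.28, S.D. about 13.14 and C.V. about 8.97%.

```python
import math

# Example 2 frequency table: (lower limit, upper limit, number of cows)
classes = [(120, 130, 7), (130, 140, 11), (140, 150, 21),
           (150, 160, 11), (160, 170, 7), (170, 180, 3)]
h = 10                                    # common class width
freqs = [f for _, _, f in classes]
mids = [(lo + hi) / 2 for lo, hi, _ in classes]
N = sum(freqs)                            # total number of cows

mean = sum(f * m for f, m in zip(freqs, mids)) / N
md = sum(f * abs(m - mean) for f, m in zip(freqs, mids)) / N
sd = math.sqrt(sum(f * (m - mean) ** 2 for f, m in zip(freqs, mids)) / N)


def grouped_quantile(p):
    """L + (p*N - cf) / f * h, where cf = cumulative frequency before the class."""
    target = p * N
    cf = 0
    for lo, _, f in classes:
        if cf + f >= target:
            return lo + (target - cf) / f * h
        cf += f
    raise ValueError("p must lie in (0, 1]")


median = grouped_quantile(0.50)
q1, q3 = grouped_quantile(0.25), grouped_quantile(0.75)
qd = (q3 - q1) / 2

# Mode: L + (f1 - f0) / (2*f1 - f0 - f2) * h for the modal class
# (assumes the modal class is neither the first nor the last class).
k = freqs.index(max(freqs))
f0, f1, f2 = freqs[k - 1], freqs[k], freqs[k + 1]
mode = classes[k][0] + (f1 - f0) / (2 * f1 - f0 - f2) * h

data_range = classes[-1][1] - classes[0][0]  # upper limit of last class - lower limit of first
cv = sd / mean * 100

print(f"N = {N}, mean = {mean:.2f}, median = {median:.2f}, mode = {mode:.2f}")
print(f"Range = {data_range}, Q1 = {q1:.2f}, Q3 = {q3:.2f}, Q.D. = {qd:.2f}")
print(f"M.D. = {md:.2f}, S.D. = {sd:.2f}, C.V. = {cv:.2f}%")
```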
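Normal distribution
The equation of the normal curve is
$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x - \mu)^2}{2\sigma^2}}, \quad -\infty < x < \infty$
where µ is the mean and σ the standard deviation of the distribution.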
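The density formula can be evaluated directly and compared with `statistics.NormalDist.pdf`. The parameters µ = 10 and σ = 2 are illustrative. The last line shows the symmetry of the curve about µ.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """f(x) = 1 / (sigma * sqrt(2*pi)) * exp(-(x - mu)^2 / (2 * sigma^2))"""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

# Illustrative parameters; compare with the standard library's NormalDist.pdf
mu, sigma = 10.0, 2.0
for x in (4.0, 10.0, 13.0):
    print(x, round(normal_pdf(x, mu, sigma), 6), round(NormalDist(mu, sigma).pdf(x), 6))

# Symmetry about mu: points equally far on either side have equal height
print(normal_pdf(mu - 3, mu, sigma) == normal_pdf(mu + 3, mu, sigma))  # True
```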
Normal Distribution Curve
Properties of a normal distribution
The curve is symmetrical about the mean µ and falls rapidly on both sides, tailing off asymptotically to the X-axis in both directions (i.e., the X-axis is tangent to the curve at infinity). It has only two independent parameters, µ and σ. Mean = Median = Mode = µ.
The first and third moments about the mean are zero: $\mu_1 = 0$ and $\mu_3 = 0$. The second moment about the mean is the variance of the distribution: $\mu_2 = \sigma^2$. The fourth moment about the mean is $\mu_4 = 3\sigma^4$. Hence, for the normal distribution, $\beta_1 = \frac{\mu_3^2}{\mu_2^3} = 0$ and $\beta_2 = \frac{\mu_4}{\mu_2^2} = 3$.
(i) The range µ ± σ includes about 68% of the observations.
(ii) The range µ ± 2σ includes about 95% of the observations.
(iii) The range µ ± 3σ includes about 99.7% of the observations.
A remarkable property of the normal distribution is that sums and differences of independent normally distributed variables are also normally distributed.
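These proportions can be checked with `statistics.NormalDist`: since they do not depend on µ or σ, the standard normal curve suffices, giving about 0.6827, 0.9545 and 0.9973. `NormalDist` also supports `+` and `-` between two distributions, treating them as independent; the parameters (1, 2) and (3, 4) below are illustrative. The resulting means add (or subtract), and the variances add in both cases.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal; the proportions do not depend on mu or sigma
coverage = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}
for k, p in coverage.items():
    print(f"mu ± {k} sigma: {p:.4f}")

# Sum and difference of two independent normal variables (illustrative parameters)
total = NormalDist(1, 2) + NormalDist(3, 4)  # mean 4, S.D. sqrt(2^2 + 4^2)
diff = NormalDist(1, 2) - NormalDist(3, 4)   # mean -2, same S.D.
print(total, diff)
```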