Central Tendency for bio statistics and data analysis
snehasapra
56 slides
Sep 02, 2024
Analysis of data
MEASURES OF CENTRAL TENDENCY
CENTRAL TENDENCY
•Condensation of data into a single value, which usually lies near the centre and carries the important properties of the data.
•The values of a variable tend to concentrate around some central value of the observations of an investigation, which can be taken as representative of the whole data. This tendency of the distribution is known as central tendency.
•The measures devised to quantify this tendency are known as measures of central tendency.
•They are also known as measures of location.
Desirable properties of central tendency
•It should be rigidly defined.
•Its computation should be based on all observations.
•It should lend itself to algebraic treatment.
•It should be least affected by extreme observations.
Mathematical Average
•Averages calculated purely by mathematical formulas are known as mathematical averages.
•There are three main types:
1. Arithmetic mean (AM)
2. Geometric mean (GM)
3. Harmonic mean (HM)
1. Arithmetic mean
•It is the most common average, used in day-to-day life.
•It is the average obtained arithmetically.
•It is the sum of all observations divided by the number of observations.
•It is denoted by A.M. or $\bar{x}$.
For ungrouped data
•The mean of n observations $x_1, x_2, \ldots, x_n$ is given by
$$\text{A.M.} = \frac{\text{sum of all observations}}{\text{total number of observations}} = \frac{\sum_{i=1}^{n} x_i}{n}$$
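For grouped data
[A] Discrete frequency distribution
•Let the variable X take the values $x_1, x_2, \ldots, x_n$ and let their frequencies be $f_1, f_2, \ldots, f_n$.
•Then the arithmetic mean is computed by the formula
$$\bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i} \quad \text{or} \quad \bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{N}, \qquad N = \sum_{i=1}^{n} f_i$$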
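A minimal Python sketch of the A.M. for ungrouped data (sum of observations divided by their number) and for a discrete frequency distribution (sum of f·x divided by sum of f). The function names and the toy numbers are hypothetical, chosen only for illustration.

```python
def arithmetic_mean(values):
    """A.M. of ungrouped data: sum of observations / number of observations."""
    if not values:
        raise ValueError("need at least one observation")
    return sum(values) / len(values)


def arithmetic_mean_discrete(xs, fs):
    """A.M. of a discrete frequency distribution: sum(f * x) / sum(f)."""
    if len(xs) != len(fs) or sum(fs) <= 0:
        raise ValueError("xs and fs must pair up and the total frequency must be > 0")
    return sum(f * x for x, f in zip(xs, fs)) / sum(fs)


# Toy data (hypothetical)
print(arithmetic_mean([4, 8, 6, 2]))                   # 20 / 4 = 5.0
print(arithmetic_mean_discrete([1, 2, 3], [2, 3, 5]))  # 23 / 10 = 2.3
```

The discrete version is just the ungrouped formula with each value counted f times, so passing all frequencies as 1 reproduces the ungrouped mean.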
CONTINUOUS FREQUENCY DISTRIBUTION:
•In a continuous frequency distribution, the frequency is not associated with any single specified value but is spread over the entire class. This makes it difficult to find the values $x_1, x_2, \ldots, x_n$.
•To overcome this difficulty, we make the reasonable assumption that the frequency of each class is associated with the mid-value of the class, or that it is distributed uniformly over the class.
•We therefore take the mid-values $m_1, m_2, \ldots, m_n$ in place of the $x_i$:
$$\bar{x} = \frac{\sum_{i=1}^{n} f_i m_i}{\sum_{i=1}^{n} f_i} \quad \text{or} \quad \bar{x} = \frac{\sum_{i=1}^{n} f_i m_i}{N}, \qquad N = \sum_{i=1}^{n} f_i$$
METHOD:
•Write all class intervals (C.I.) in the 1st column and the corresponding frequencies in the 2nd column.
•Compute the mid-values
$$m = \frac{\text{lower limit of C.I.} + \text{upper limit of C.I.}}{2}$$
and write them in the 3rd column.
•Multiply each $f$ by the corresponding $m$ and write the product in the 4th column.
•The sum of this column gives $\sum fm$; dividing it by $\sum f$ gives the mean.
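The four-column method above can be sketched in Python as follows. This is an illustration under stated assumptions: the class intervals and frequencies are hypothetical, and `grouped_mean` is a name chosen here, not something from the slides.

```python
def grouped_mean(class_intervals, freqs):
    """Mean of a continuous frequency distribution via the 4-column method."""
    rows = []
    for (lower, upper), f in zip(class_intervals, freqs):
        m = (lower + upper) / 2                   # column 3: mid-value of the class
        rows.append((lower, upper, f, m, f * m))  # column 4: f * m
    total_f = sum(r[2] for r in rows)
    total_fm = sum(r[4] for r in rows)
    if total_f <= 0:
        raise ValueError("total frequency must be > 0")
    return rows, total_fm / total_f


# Hypothetical class intervals and frequencies
intervals = [(0, 10), (10, 20), (20, 30), (30, 40)]
freqs = [5, 8, 12, 5]
rows, mean = grouped_mean(intervals, freqs)
for lower, upper, f, m, fm in rows:
    print(f"{lower}-{upper}\t{f}\t{m}\t{fm}")
print("mean =", mean)
```

For these toy numbers the mid-values are 5, 15, 25 and 35, so the sum of f·m is 620, the total frequency is 30, and the mean is 620/30, about 20.67.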
MERITS OF A.M.:
•It is easy to calculate and understand.
•It is based on all observations.
•It is familiar to the common man and rigidly defined.
•It is capable of further mathematical treatment.
•It is least affected by sampling fluctuations and hence more stable.
DEMERITS OF A.M.:
•Though it seems to be the best measure of central tendency, it has certain limitations:
1. It can be used only for quantitative data, not for qualitative data such as caste, religion or sex.
2. It is unduly affected by extreme observations.
3. It can't be used for open-ended frequency distributions.
4. Sometimes the A.M. is not equal to any observation in the data series.
5. It can't be determined by inspection, nor can it be located graphically.
6. In an extremely skewed distribution, the arithmetic mean is not representative of the distribution.
2. GEOMETRIC MEAN (GM)
•When the data contain a few extremely large or small values, the arithmetic mean is unsuitable.
•The GM of n positive observations is defined as the positive nth root of their product.
For ungrouped data
$$\text{GM} = \sqrt[n]{x_1 x_2 \cdots x_n} = \text{AntiLog}\left(\frac{1}{n}\sum_{i=1}^{n} \log x_i\right)$$
For grouped data
[A] Discrete frequency distribution
$$\text{GM} = \left(x_1^{f_1} x_2^{f_2} \cdots x_n^{f_n}\right)^{1/N} = \text{AntiLog}\left(\frac{1}{N}\sum_{i=1}^{n} f_i \log x_i\right), \qquad N = \sum_{i=1}^{n} f_i$$
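For grouped data
[B] Continuous freq. distribution
$$\text{GM} = \left(m_1^{f_1} m_2^{f_2} \cdots m_n^{f_n}\right)^{1/N} = \text{AntiLog}\left(\frac{1}{N}\sum_{i=1}^{n} f_i \log m_i\right)$$
where $m_i$ are the class mid-values and $N = \sum_{i=1}^{n} f_i$.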
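A minimal Python sketch of the log-based GM formula, GM = AntiLog[(1/N) Σ f_i log x_i] with N = Σ f_i. It uses base-10 logarithms to mirror log tables (any base gives the same GM). The name `geometric_mean` and the example values are hypothetical; passing no frequencies treats every observation as occurring once (ungrouped data), and for a continuous distribution the class mid-values are passed as `values`.

```python
import math


def geometric_mean(values, freqs=None):
    """GM = AntiLog[(1/N) * sum(f_i * log x_i)], where N = sum(f_i).

    freqs=None means every observation has frequency 1 (ungrouped data).
    For a continuous distribution, pass the class mid-values as `values`.
    """
    if freqs is None:
        freqs = [1] * len(values)
    if any(x <= 0 for x in values):
        raise ValueError("GM is defined only for positive observations")
    n_total = sum(freqs)
    if n_total <= 0:
        raise ValueError("total frequency must be > 0")
    log_sum = sum(f * math.log10(x) for x, f in zip(values, freqs))
    return 10 ** (log_sum / n_total)  # antilog, base 10


print(geometric_mean([2, 8]))          # square root of 16, i.e. 4 up to rounding
print(geometric_mean([2, 3], [1, 2]))  # cube root of 2 * 3 * 3 = 18
```

Working with logs rather than the raw product avoids overflow when there are many large observations; the positivity check reflects the demerit that GM can't be used with zero or negative values.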
MERITS OF G.M.:
•It is determinate, provided all quantities are positive.
•It is based on all observations.
•It is suitable for arithmetic and algebraic manipulation.
•It gives less weight to large items and more to small ones than the AM does. Thus it is not much affected by sampling fluctuations.
•It is particularly useful when dealing with ratios, rates and percentages.
DEMERITS OF G.M.:
•It can't be used when any of the quantities is zero or negative.
•It is less easy to understand and calculate than the AM.
•It may come out to be a value that does not exist in the series.
HARMONIC MEAN:
•It is the reciprocal of the arithmetic mean of the reciprocals of the observations.
For ungrouped data
•The harmonic mean HM of the positive real numbers $x_1, x_2, \ldots, x_n$ is defined to be
$$\text{HM} = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$$
For grouped data
[A] Discrete frequency distribution
$$\text{HM} = \frac{N}{\sum_{i=1}^{n} \frac{f_i}{x_i}}, \qquad N = \sum_{i=1}^{n} f_i$$
For grouped data
[B] Continuous freq. distribution
$$\text{HM} = \frac{N}{\sum_{i=1}^{n} \frac{f_i}{m_i}}$$
where $m_i$ are the class mid-values and $N = \sum_{i=1}^{n} f_i$.
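A minimal Python sketch of the HM formula HM = N / Σ(f_i / x_i), with N = Σ f_i. The function name and the two example rates are hypothetical; with no frequencies it reduces to the ungrouped form n / Σ(1/x_i), and class mid-values can be passed for a continuous distribution.

```python
def harmonic_mean(values, freqs=None):
    """HM = N / sum(f_i / x_i), where N = sum(f_i).

    freqs=None gives the ungrouped formula n / sum(1 / x_i); for a
    continuous distribution, pass the class mid-values as `values`.
    """
    if not values:
        raise ValueError("need at least one observation")
    if freqs is None:
        freqs = [1] * len(values)
    if any(x <= 0 for x in values):
        raise ValueError("HM is used here only for positive observations")
    return sum(freqs) / sum(f / x for x, f in zip(values, freqs))


# Two hypothetical rates, 40 and 60 units per unit time
print(harmonic_mean([40, 60]))  # 2 / (1/40 + 1/60), i.e. 48 up to rounding
```

For these two rates the HM (48) is below the AM (50), since the HM gives more weight to the smaller value.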
MERITS OF H.M.:
•It is useful for studying rates per unit time, such as the rate of respiration, pulse rate or heart rate.
•It is based on all observations.
•It gives less weight to large items and more to small ones than the AM does.
•It is not much affected by sampling fluctuations.
DEMERITS OF H.M.:
•It can't be used when any of the quantities is zero or negative.
•It is less easy to understand and calculate than the AM.
Relationship between AM, GM & HM
For positive observations,
$$\text{AM} \geq \text{GM} \geq \text{HM},$$
with equality throughout only when all the observations are equal.
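The AM ≥ GM ≥ HM relationship can be checked numerically with Python's standard `statistics` module (`geometric_mean` needs Python 3.8 or later). The data values are hypothetical; this is a sanity check on one example, not a proof.

```python
import math
import statistics

# Hypothetical positive observations
data = [2, 4, 8, 16]

am = statistics.mean(data)            # 30 / 4 = 7.5
gm = statistics.geometric_mean(data)  # fourth root of 1024, about 5.66
hm = statistics.harmonic_mean(data)   # 4 / (15/16), about 4.27
print(am, gm, hm)
assert am >= gm >= hm

# With all observations equal, the three means coincide (up to rounding).
same = [5, 5, 5]
print(statistics.mean(same), statistics.geometric_mean(same), statistics.harmonic_mean(same))
```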
WEIGHTED MEAN:
•So far we have treated each item in the data as being of equal importance. Sometimes this is not true: some items are more important than others, and in such cases the usual mean is not a good representative of the data. We therefore obtain a weighted mean by assigning weights to each item according to its importance.
•The weighted mean of the positive real numbers $x_1, x_2, \ldots, x_n$ with weights $w_1, w_2, \ldots, w_n$ is defined to be
$$\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$$
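A minimal Python sketch of the weighted mean Σ w_i x_i / Σ w_i. The marks and weights are hypothetical, chosen only to show how the weighted and unweighted means differ.

```python
def weighted_mean(xs, ws):
    """Weighted mean: sum(w_i * x_i) / sum(w_i)."""
    if len(xs) != len(ws) or sum(ws) <= 0:
        raise ValueError("xs and ws must pair up and the total weight must be > 0")
    return sum(w * x for x, w in zip(xs, ws)) / sum(ws)


# Hypothetical marks with importance weights 1, 2 and 3
marks = [70, 80, 90]
weights = [1, 2, 3]
print(weighted_mean(marks, weights))  # 500 / 6, about 83.33
print(sum(marks) / len(marks))        # unweighted mean, 80.0
```

With equal weights the weighted mean reduces to the ordinary arithmetic mean.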
AVERAGES OF POSITION
•These averages are based on the position of the average in a series of observations arranged in increasing order of magnitude.
•Averages of position are of two types:
1. Median (M)
2. Mode (Z)
MEDIAN (M)
•Definition: when all the observations of a variable are arranged in either ascending or descending order of magnitude, the middle observation is called the median.
•It divides the whole data into two equal portions. In other words, 50% of the observations will be smaller than the median and 50% will be larger than it.
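For ungrouped data (observations arranged in ascending order)
•When n is odd, M = the $\left(\frac{n+1}{2}\right)^{\text{th}}$ observation.
•When n is even, M = the mean of the $\left(\frac{n}{2}\right)^{\text{th}}$ and $\left(\frac{n}{2}+1\right)^{\text{th}}$ observations.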
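A minimal Python sketch of the odd/even rule for the median of ungrouped data: sort the observations, take the ((n+1)/2)th one when n is odd, and the mean of the (n/2)th and (n/2+1)th ones when n is even. The function name and data are hypothetical; Python's built-in `statistics.median` follows the same rule.

```python
def median(values):
    """Median of ungrouped data using the odd/even positional rule."""
    ordered = sorted(values)          # arrange in ascending order
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one observation")
    mid = n // 2                      # 0-based index of the middle item
    if n % 2 == 1:
        return ordered[mid]           # ((n + 1) / 2)th observation
    # mean of the (n/2)th and (n/2 + 1)th observations
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([7, 1, 3]))     # sorted 1, 3, 7 -> 3
print(median([4, 1, 3, 2]))  # sorted 1, 2, 3, 4 -> (2 + 3) / 2 = 2.5
```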
For grouped data:
[A] Discrete freq. distribution
For grouped data:
[B] Continuous freq. distribution
•Obtain the class boundaries.
•Find the less-than cumulative frequencies of all the classes in the data.
•Find the median class, i.e. the class whose less-than cumulative frequency first reaches N/2.
•Compute
$$M = L_1 + \frac{\frac{N}{2} - \text{C.f.}}{f_m} \times h$$
where
$L_1$ = lower boundary of the median class
$N$ = total frequency, i.e. total number of observations
C.f. = cumulative frequency of the class preceding the median class
$f_m$ = frequency of the median class
$h$ = class width
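A minimal Python sketch of the grouped-median interpolation M = L1 + ((N/2 - C.f.) / f_m) × h. It assumes the classes are given as contiguous (lower, upper) boundaries in ascending order; the function name and the toy distribution are hypothetical.

```python
def grouped_median(boundaries, freqs):
    """Median of a continuous distribution: L1 + (N/2 - C.f.) / f_m * h.

    boundaries: list of (lower, upper) class boundaries in ascending order.
    freqs: the corresponding class frequencies.
    """
    n_total = sum(freqs)
    if n_total <= 0:
        raise ValueError("total frequency must be > 0")
    half = n_total / 2
    cum = 0                              # less-than c.f. of the previous classes
    for (lower, upper), f in zip(boundaries, freqs):
        if cum + f >= half:              # first class whose c.f. reaches N/2
            h = upper - lower            # class width
            return lower + (half - cum) / f * h
        cum += f
    raise ValueError("median class not found")


# Hypothetical distribution: N = 30, so N/2 = 15 and the median class is 20-30
boundaries = [(0, 10), (10, 20), (20, 30), (30, 40)]
freqs = [5, 8, 12, 5]
print(grouped_median(boundaries, freqs))  # 20 + (15 - 13) / 12 * 10, about 21.67
```

The loop accumulates the less-than cumulative frequencies, so the median class is found in a single pass over the table.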
Graphical Method:
•The median can be obtained graphically from the ogive curve. For this, plot the "less than" ogive for the given frequency distribution. Calculate the value of N/2 and locate it on the Y axis. From this point, draw a line parallel to the X axis to meet the ogive curve. From the point of intersection, drop a perpendicular onto the X axis.
•The median is the value where this perpendicular cuts the X axis.
•Equivalently, the median is the X coordinate of the point where the "less than" and "more than" ogives intersect.
[Figure: Less than and more than ogive. The median turns out to be 443.94.]
MERITS:
•It is easy to understand and calculate.
•It can be computed for a distribution with open-ended classes.
•It is not affected by extreme observations.
•It is applicable to quantitative data and to qualitative data that can be ranked (intelligence, health, etc.).
•It can be determined graphically.
•Only the values of the middle items need to be known.
•For an odd number of ungrouped observations, it is an actual item present in the data series.
•It can often be located by inspection, with little or no calculation.
DEMERITS:
1. It is not based on all observations, hence it is not a proper representative.
2. It is not as rigidly defined as the A.M.
3. It is not capable of further mathematical treatment.
4. The data must be arrayed (arranged in order). This involves considerable work if the number of items is large.
5. It can't be located with precision when the items are grouped; it can only be estimated, and the estimated value may not be found in the series.
6. The aggregate (total) value of the items cannot be obtained when the median and the number of items are known.
MODE:
•The observation that occurs most frequently in a series is called the MODE (Z).
or
•The mode is the value of the variable for which the frequency is maximum.
Ungrouped data:
•The mode is obtained by inspection.
•The mode of the list (0, 1, 2, 3, 3, 3, 4) is 3.
•The mode is not necessarily well defined. The list (1, 2, 2, 2, 3, 3, 5) has two peaks: 2, which occurs three times (major mode), and 3, which occurs twice (minor mode).
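Finding the mode by inspection amounts to counting how often each value occurs. A minimal Python sketch with `collections.Counter`, using the lists from the slide plus one hypothetical two-mode list; `modes` is a name chosen here for illustration.

```python
from collections import Counter


def modes(values):
    """All values that occur with the maximum frequency."""
    counts = Counter(values)
    if not counts:
        raise ValueError("need at least one observation")
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


print(modes([0, 1, 2, 3, 3, 3, 4]))                   # [3]
print(modes([1, 2, 2, 3, 3, 5]))                      # [2, 3]: two modes
print(Counter([1, 2, 2, 2, 3, 3, 5]).most_common(2))  # [(2, 3), (3, 2)]
```

`most_common` lists values by frequency, which shows the major mode (2) and the minor mode (3) of the slide's list. Python 3.8+ also offers `statistics.multimode` for the same purpose.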
Grouped data
1. Obtain the class boundaries.
2. Locate the modal class, i.e. the class with the maximum frequency.
3. Find the mode using the formula
$$Z = L_1 + \frac{f_m - f_1}{2f_m - f_1 - f_2} \times h$$
where
$L_1$ = lower boundary of the modal class
$f_m$ = frequency of the modal class
$f_1$ = frequency of the pre-modal class
$f_2$ = frequency of the post-modal class
$h$ = width of the modal class
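$$Z = L_1 + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times h$$
where
•$L_1$ = lower boundary of the modal class
•$\Delta_1$ = difference in frequency between the modal class and the class before it
•$\Delta_2$ = difference in frequency between the modal class and the class after it
•$h$ = class interval (width)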
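A minimal Python sketch of the grouped-mode formula Z = L1 + Δ1 / (Δ1 + Δ2) × h, where Δ1 and Δ2 are the differences between the modal-class frequency and the frequencies of the classes before and after it. It assumes contiguous class boundaries in ascending order and a unimodal distribution; the function name and data are hypothetical.

```python
def grouped_mode(boundaries, freqs):
    """Mode of a continuous distribution: L1 + d1 / (d1 + d2) * h."""
    if not freqs:
        raise ValueError("need at least one class")
    k = max(range(len(freqs)), key=lambda i: freqs[i])  # modal class index
    if k == 0 or k == len(freqs) - 1:
        raise ValueError("modal class is at the extreme; formula not applicable")
    lower, upper = boundaries[k]
    d1 = freqs[k] - freqs[k - 1]  # Delta_1: modal minus pre-modal frequency
    d2 = freqs[k] - freqs[k + 1]  # Delta_2: modal minus post-modal frequency
    if d1 + d2 == 0:
        raise ValueError("no single peak; formula not applicable")
    return lower + d1 / (d1 + d2) * (upper - lower)


# Hypothetical distribution: modal class 20-30, Delta_1 = 4, Delta_2 = 7
boundaries = [(0, 10), (10, 20), (20, 30), (30, 40)]
freqs = [5, 8, 12, 5]
print(grouped_mode(boundaries, freqs))  # 20 + 4 / 11 * 10, about 23.64
```

The sketch refuses a modal class in the first or last position, matching the restriction that the mode can't be determined by this formula when the modal class is at the extreme. On ties, `max` picks the first highest class, so it is only meaningful for unimodal data.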
Empirical formula for mode (Z)
$$Z = 3M - 2\bar{x}$$
i.e. Mode = 3 Median - 2 Mean, which holds approximately for moderately skewed distributions.
Example:
•This procedure is applicable to unimodal distributions only.
•The mode can't be determined by this formula if the modal class is at the extreme (the first or last class).
Graphical Method:
•The mode can be demonstrated graphically by plotting a histogram.
Merits:
1. It is applicable to both qualitative and quantitative data.
2. It is not affected by extreme observations.
3. It can be determined even when the distribution has open-ended classes.
4. It can be obtained graphically.
5. It is the most likely (most frequently occurring) value of the variate.
6. Only the values occurring with high frequencies need to be known.
Demerits:
•Compared with the mean and median, the mode has very limited utility:
1. It is not well defined.
2. It is not capable of further arithmetic treatment.
3. Sometimes it is indefinite.
4. It becomes difficult to use in multimodal distributions.
5. It is not based on all observations of a series.
PARTITION VALUES:
•The values that divide the given data into a number of equal parts are called partition values.
•The most commonly used partition values are QUARTILES, QUINTILES, DECILES and PERCENTILES.
QUARTILES:
•The values that divide the given data into four equal parts, when the observations are arranged in order of magnitude, are called quartiles.
•Obviously there will be three quartiles: $Q_1$, $Q_2$ and $Q_3$.
•$Q_1$ (1st quartile): 25% of the observations lie below it and 75% above.
•$Q_2$ (2nd quartile): same as the median; 50% lie above and 50% below.
•$Q_3$ (3rd quartile): 75% lie below it and 25% above.
QUINTILES & DECILES:
•Quintiles: there are four points, which divide the data into five equal parts.
•Deciles: there are nine points, which divide the data into ten equal parts.
PERCENTILES
•There are 99 points, which divide the data into 100 equal parts.
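Python's standard `statistics.quantiles(data, n=k)` (Python 3.8+) returns the k - 1 cut points that split data into k equal parts, so it covers quartiles, quintiles, deciles and percentiles alike. The data below are hypothetical. Note that the function uses its default "exclusive" interpolation method, and textbooks differ on positional conventions, so Q1 and Q3 may not match a hand calculation exactly.

```python
import statistics

# Hypothetical observations (n = 9)
data = [12, 15, 11, 20, 18, 14, 16, 13, 19]

quartiles = statistics.quantiles(data, n=4)      # 3 cut points: Q1, Q2, Q3
quintiles = statistics.quantiles(data, n=5)      # 4 cut points
deciles = statistics.quantiles(data, n=10)       # 9 cut points
percentiles = statistics.quantiles(data, n=100)  # 99 cut points

print("Q1, Q2, Q3 =", quartiles)
print(len(quintiles), len(deciles), len(percentiles))  # 4 9 99
print(quartiles[1] == statistics.median(data))         # Q2 equals the median
```

The counts of cut points (3, 4, 9 and 99) match the number of partition values described above for quartiles, quintiles, deciles and percentiles.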