Verian John Sudario, RMT
Biostatistics and Epidemiology Professor
I. SUMMARIZING NUMERICAL DATA WITH NUMBERS
MEASURE OF CENTRAL TENDENCY
MEASURES OF CENTRAL TENDENCY USED IN MEDICINE & EPIDEMIOLOGY
1. Mean
2. Median
3. Mode
1. MEAN
The average: the sum of the values divided by the number of values.
x̄ = Σx / n
where x denotes the values of the variable, Σ (sigma) means 'the sum of', and n is the number of observations.
2. MEDIAN
The value that divides the distribution in half. If the observations are arranged in increasing order, the median is the middle observation. With an even number of observations, it is the average of the two middle values.
3. MODE
The value that occurs most often.
EXAMPLE
Plasma volumes of eight healthy adult males (litres): 2.75, 2.86, 3.37, 2.76, 2.62, 3.49, 3.05, 3.12
Find: Mean, Median, Mode
Arrange your data first: 2.62, 2.75, 2.76, 2.86, 3.05, 3.12, 3.37, 3.49
ANSWER:
1. Mean: 24.02 / 8 = 3.00
2. Median: (8 + 1) / 2 = 4.5th observation = (2.86 + 3.05) / 2 = 2.955 ≈ 2.96
3. Mode: None
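The worked example can be checked with a short Python sketch. It reads the seventh value as 3.05, as the median working in the example does; the variable names are illustrative only.

```python
from collections import Counter
from statistics import mean, median

# Plasma volumes (litres) of the eight healthy adult males in the example.
volumes = [2.75, 2.86, 3.37, 2.76, 2.62, 3.49, 3.05, 3.12]

ordered = sorted(volumes)          # arrange the data first
sample_mean = mean(volumes)        # sum of values / number of values
sample_median = median(volumes)    # average of the 4th and 5th ordered values

# A mode exists only if some value occurs more than once.
counts = Counter(volumes)
top = max(counts.values())
modes = [v for v, c in counts.items() if c == top] if top > 1 else []

print(ordered)
print(f"mean = {sample_mean:.2f}")
print(f"median = {sample_median:.3f}")
print("mode =", modes if modes else "None")
```

Every value occurs exactly once, so the sketch reports no mode, in agreement with the answer above.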
REMEMBER:
If the mean and the median are equal, the distribution is symmetric.
If the mean is larger than the median, the distribution is skewed to the right.
If the mean is smaller than the median, the distribution is skewed to the left.
REMEMBER:
The mean, median and mode are, on average, equal when the distribution is symmetrical and unimodal.
When the distribution is positively skewed, a geometric mean may be more appropriate than the arithmetic mean.
GUIDELINES: WHICH MEASURE OF CENTRAL TENDENCY IS BEST?
The mean is used for numerical data and for symmetric distributions.
The median is used for ordinal data, or for numerical data if the distribution is skewed.
The mode is used primarily for bimodal distributions.
The geometric mean is generally used for observations measured on a logarithmic scale.
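To illustrate the last guideline, here is a minimal sketch comparing the arithmetic and geometric means on a positively skewed sample. The titre values are hypothetical and were made up for illustration; they are not from the lecture.

```python
from math import exp, log
from statistics import geometric_mean, mean

# HYPOTHETICAL antibody titres, which roughly double from one value to the next.
titres = [10, 20, 40, 80, 640]

arithmetic = mean(titres)
geometric = geometric_mean(titres)

# The geometric mean is the antilog of the mean of the logged values.
via_logs = exp(mean(log(t) for t in titres))

print(f"arithmetic mean = {arithmetic:.1f}")
print(f"geometric mean  = {geometric:.1f}")
```

The single large value pulls the arithmetic mean well above most of the observations, while the geometric mean stays closer to the centre of the data.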
MEASURE OF SPREAD
Measures of Spread/Variation
Range
Standard Deviation
Z-Scores
Standard Error of the Mean
Confidence Intervals
1. RANGE
The difference between the largest and smallest values. Its disadvantage is that it is based on only two of the observations and gives no idea of how the other observations are arranged between these two.
INTERQUARTILE RANGE
Indicates the spread of the middle 50% of the distribution and, together with the median, is a useful adjunct to the range.
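A minimal sketch of both measures, using the eight plasma volumes from the earlier example (seventh value read as 3.05). Quartiles can be interpolated in several ways; this sketch assumes the default "exclusive" method of Python's statistics.quantiles, which may differ slightly from a hand calculation or from other software.

```python
from statistics import quantiles

# Plasma volumes (litres) from the earlier example.
volumes = [2.75, 2.86, 3.37, 2.76, 2.62, 3.49, 3.05, 3.12]

# Range: largest minus smallest value.
data_range = max(volumes) - min(volumes)

# Quartiles cut the ordered data into four equal parts (ASSUMED: default method).
q1, q2, q3 = quantiles(volumes, n=4)
iqr = q3 - q1   # spread of the middle 50% of the distribution

print(f"range = {data_range:.2f} litres")
print(f"Q1 = {q1:.4f}, median = {q2:.3f}, Q3 = {q3:.4f}")
print(f"IQR = {iqr:.3f} litres")
```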
2. STANDARD DEVIATION
Used to describe how observations cluster around the mean, and in many statistical tests.
A measure of the spread of the data about their mean.
Variance: a measure of variation; the standard deviation is the square root of the variance.
COEFFICIENT OF VARIATION
REMEMBER: The coefficient of variation expresses the standard deviation as a percentage of the sample mean. It is a statistical measure of the relative dispersion of data points in a data series around the mean.
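The variance, standard deviation and coefficient of variation can be sketched for the plasma volume example (seventh value read as 3.05). The sketch assumes the usual sample formulas with n − 1 in the denominator, which is what Python's statistics.variance and statistics.stdev compute.

```python
from statistics import mean, stdev, variance

# Plasma volumes (litres) from the earlier example.
volumes = [2.75, 2.86, 3.37, 2.76, 2.62, 3.49, 3.05, 3.12]

m = mean(volumes)
var = variance(volumes)   # sum of squared deviations from the mean / (n - 1)
sd = stdev(volumes)       # square root of the variance
cv = sd / m * 100         # SD as a percentage of the sample mean

print(f"mean = {m:.2f} litres")
print(f"SD = {sd:.2f} litres")
print(f"CV = {cv:.1f}%")
```

The standard deviation comes out at about 0.31 litres, the figure used in the standard error exercise later in this section.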
SAMPLING VARIATION
REMEMBER: The sample mean will not be exactly equal to the population mean. The theoretical distribution called the sampling distribution gives us the spread of values we would get if we took a large number of additional samples; this spread depends on the amount of variation in the underlying population and on our sample size.
3. STANDARD ERROR
Measures how precisely the population mean is estimated by the sample mean. The size of the standard error depends both on how much variation there is in the population and on the size of the sample. The larger the sample size n, the smaller the standard error.
SOLVE: The mean of the eight plasma volumes shown in the previous table is 3.00 litres and the standard deviation is 0.31 litres. Compute the standard error.
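A minimal sketch of the calculation, assuming the usual formula for the standard error of the mean: the sample standard deviation divided by the square root of the sample size.

```python
from math import sqrt

sd = 0.31   # sample standard deviation (litres), given in the exercise
n = 8       # number of plasma volumes

# Standard error of the mean: s / sqrt(n)
se = sd / sqrt(n)

print(f"standard error = {se:.2f} litres")   # prints: standard error = 0.11 litres
```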
REMEMBER:
STANDARD DEVIATION tells us how much variability can be expected among individuals.
STANDARD ERROR of the mean, however, is the standard deviation of the means in a sampling distribution; it tells us how much variability can be expected among means in future samples.
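The distinction can be illustrated by simulation. The sketch below assumes, purely for illustration, a normally distributed population with mean 3.00 and SD 0.31 litres. It draws many samples of size 8 and compares the SD of the sample means with SD / √n. The seed and the number of samples are arbitrary choices.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(1)   # arbitrary seed so the run is repeatable

MU, SIGMA = 3.00, 0.31   # ASSUMED population mean and SD (litres)
N = 8                    # size of each sample
N_SAMPLES = 5000         # number of repeated samples

# Draw repeated samples and record the mean of each one.
sample_means = [
    mean([random.gauss(MU, SIGMA) for _ in range(N)])
    for _ in range(N_SAMPLES)
]

observed = stdev(sample_means)   # variability among the sample means
theoretical = SIGMA / sqrt(N)    # standard error of the mean

print(f"SD of individuals          = {SIGMA:.2f}")
print(f"SD of sample means (sim.)  = {observed:.3f}")
print(f"standard error, SD/sqrt(n) = {theoretical:.3f}")
```

The means vary much less than the individual observations, and their spread is close to the standard error.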
MEASURE OF SHAPE
MEASURE OF SHAPE
The Normal Distribution
Skewness
Kurtosis
1. THE NORMAL DISTRIBUTION
WHY IS IT IMPORTANT?
It can be shown that the sampling distribution of a mean is normal, even when the individual observations are not normally distributed, provided that the sample size is not too small.
This means that calculations based on the normal distribution are used to derive confidence intervals.
The normal distribution also underlies the calculation of P-values, which are used to test hypotheses.
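A. EQUATION OF THE NORMAL CURVE
y = 1 / (σ√(2π)) × exp(−(x − μ)² / (2σ²))
where y = vertical height of the curve at the point x, μ = the mean, and σ = the standard deviation of the normal distribution.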
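As a sketch, the standard form of the normal curve, y = exp(−(x − μ)² / (2σ²)) / (σ√(2π)), can be evaluated directly. The function name normal_curve_height is illustrative; the result is cross-checked against statistics.NormalDist.pdf from Python's standard library.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

def normal_curve_height(x, mu, sigma):
    """Height y of the normal curve at x, for mean mu and standard deviation sigma."""
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))

# The curve is highest at the mean and symmetric about it.
peak = normal_curve_height(0, mu=0, sigma=1)
left = normal_curve_height(-1, mu=0, sigma=1)
right = normal_curve_height(1, mu=0, sigma=1)

# Cross-check against the standard library implementation.
library_peak = NormalDist(mu=0, sigma=1).pdf(0)

print(f"height at the mean = {peak:.4f}")
print(f"height at -1 SD = {left:.4f}, at +1 SD = {right:.4f}")
```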
EXAMPLE 1:
EXAMPLE 2: y is expressed as a proportion and the total area under the curve is 1.
STANDARD NORMAL DISTRIBUTION
A normal distribution with a mean of zero and a standard deviation of 1. The standard normal distribution is centered at zero, and the degree to which a given measurement deviates from the mean is expressed in standard deviations.
AREA UNDER THE CURVE OF THE STANDARD NORMAL DISTRIBUTION
REMEMBER: Since the area under the standard curve = 1, we can begin to define precisely the probabilities of specific observations.
STANDARD NORMAL DISTRIBUTION
Area under the standard curve = 1.
The SND expresses the value of a variable in terms of the number of standard deviations it is away from the mean.
For any given Z-score we can compute the area under the curve greater than or less than a specified value, or "Z value".
Because the total area under the standard curve is 1, we can use the computed area under the curve to calculate probabilities!
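A minimal sketch of the idea, assuming the usual definition z = (x − mean) / SD. The numbers (an observation of 130 with mean 100 and SD 15) are hypothetical and chosen only to give a round Z-score.

```python
from statistics import NormalDist

def z_score(x, mean, sd):
    """Number of standard deviations that x lies away from the mean."""
    return (x - mean) / sd

standard_normal = NormalDist(mu=0, sigma=1)

# HYPOTHETICAL observation: 130 on a scale with mean 100 and SD 15.
z = z_score(130, mean=100, sd=15)

area_below = standard_normal.cdf(z)   # probability of a value less than z
area_above = 1 - area_below           # total area is 1, so the rest lies above z

print(f"z = {z:.2f}")
print(f"area below = {area_below:.4f}, area above = {area_above:.4f}")
```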
AREA IN UPPER TAIL OF DISTRIBUTION
EXAMPLE
The proportion of men who are taller than 180 cm may be derived from the proportion of the area under the normal frequency distribution curve that is above 180 cm.
What area of the curve is above 180 cm?
ANSWER: We conclude that a fraction 0.0951, or equivalently 9.51%, of adult men are taller than 180 cm.
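The mean and standard deviation behind this answer are not stated here. As an assumption, the sketch below uses a mean of 171.5 cm and an SD of 6.5 cm, hypothetical values chosen because they reproduce 9.51% when z is rounded to two decimals, as it would be when reading a printed table.

```python
from statistics import NormalDist

MEAN_HEIGHT = 171.5   # ASSUMED mean height in cm (not given in the text)
SD_HEIGHT = 6.5       # ASSUMED standard deviation in cm (not given in the text)

# Express 180 cm as a number of standard deviations above the mean.
z = (180 - MEAN_HEIGHT) / SD_HEIGHT
z_table = round(z, 2)   # printed tables list z to two decimal places

# Upper-tail area: 1 minus the area below z.
upper_tail = 1 - NormalDist().cdf(z_table)

print(f"z = {z_table}")
print(f"proportion taller than 180 cm = {upper_tail:.4f}")
```

Without rounding z, the area is slightly larger (roughly 0.0955); the difference comes only from table precision.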
AREA IN LOWER TAIL OF DISTRIBUTION
The proportion of men shorter than 160 cm.
What area of the curve is below 160 cm?
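A sketch of the lower-tail calculation. The mean (171.5 cm) and SD (6.5 cm) are assumptions for illustration, since the text does not state them, so the resulting proportion is only as good as those values.

```python
from statistics import NormalDist

# ASSUMED distribution of adult male heights (parameters not given in the text).
heights = NormalDist(mu=171.5, sigma=6.5)

# Z-score for 160 cm: negative, because 160 cm is below the mean.
z = (160 - heights.mean) / heights.stdev

# Lower-tail area: the cumulative probability up to 160 cm.
lower_tail = heights.cdf(160)

print(f"z = {z:.2f}")
print(f"proportion shorter than 160 cm = {lower_tail:.4f}")
```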
AREA OF DISTRIBUTION BETWEEN TWO VALUES
The proportion of men with a height between 165 cm and 175 cm.
Estimate it by finding the proportions of men shorter than 165 cm and taller than 175 cm and subtracting these from 1.
1. SND corresponding to 165 cm. Proportion below this height is:
2. SND corresponding to 175 cm. Proportion above this height is:
3. Proportion of men with heights between 165 cm and 175 cm:
What area of the curve is between 165 and 175 cm?
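The three steps can be sketched as follows, again under the assumption of a mean of 171.5 cm and an SD of 6.5 cm, which are illustrative values not given in the text.

```python
from statistics import NormalDist

# ASSUMED distribution of adult male heights (parameters not given in the text).
heights = NormalDist(mu=171.5, sigma=6.5)

# Step 1: proportion below 165 cm (lower tail).
below_165 = heights.cdf(165)

# Step 2: proportion above 175 cm (upper tail).
above_175 = 1 - heights.cdf(175)

# Step 3: subtract both tails from the total area of 1.
between = 1 - below_165 - above_175

print(f"below 165 cm = {below_165:.4f}")
print(f"above 175 cm = {above_175:.4f}")
print(f"between 165 and 175 cm = {between:.4f}")
```

Under these assumed values, a little over half of the men fall between the two heights.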
MEASURE OF SKEWNESS
MEASURE OF KURTOSIS
Describes the extent of peakedness or flatness of the distribution of the data.
Measured by the coefficient of kurtosis (K), computed as:
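The coefficients themselves are not spelled out here. As an assumption, the sketch below uses the common moment-based definitions: skewness = m3 / m2^1.5 and K = m4 / m2², where mk is the k-th central moment. The lecture may intend a different variant (for example with a sample-size correction, or with 3 subtracted from K), so treat this as one possible reading. The function names are illustrative.

```python
from statistics import mean

def central_moment(data, k):
    """k-th central moment: the average of (x - mean)**k."""
    m = mean(data)
    return sum((x - m) ** k for x in data) / len(data)

def skewness(data):
    """ASSUMED definition: m3 / m2**1.5 (0 for a symmetric distribution)."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def kurtosis(data):
    """ASSUMED definition: m4 / m2**2 (about 3 for a normal distribution)."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]

print(f"symmetric:    skewness = {skewness(symmetric):.2f}, K = {kurtosis(symmetric):.2f}")
print(f"right-skewed: skewness = {skewness(right_skewed):.2f}, K = {kurtosis(right_skewed):.2f}")
```

Under these definitions, a positive skewness value indicates a tail to the right, and a K below 3 indicates a flatter shape than the normal curve.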