Fundamentals of biostatistics

15,741 views 69 slides Aug 02, 2013


Fundamentals of biostatistics
Kingsuk Sarkar, MD, Asst. Prof., Dept. of Community Medicine, DSMCH

Common statistical terms
Statistics: The subject of scientific activity dealing with the theories and methods of collection, compilation, analysis and interpretation of data.
Biostatistics: An art & science of collection, compilation, analysis and interpretation of data.
Data (sing. datum): A set of observations, usually obtained by measurement or counting.

Classification of data:
  Qualitative/Attribute
  Quantitative/Variable: Continuous & Discrete
Qualitative data:
  Cannot be expressed in numbers
  Not measurable
  Can only be categorized under different categories & frequencies
  E.g., religion is an attribute; it can be categorized into Hindu, Muslim, Christian
  Human blood group: A, B, AB or O
  Sex: M/F

Quantitative data/variable:
  In statistical language, any character, characteristic or quality that varies is called a variable
  It has magnitude
Continuous variable:
  Expressed in numbers & can be measured
  Can take an infinite no. of values in a certain range
  E.g., weight, height, blood sugar

Discrete variable:
  Countable only
  Takes only some isolated values
  E.g., number of family members, no. of workers in a factory, no. of persons suffering from a particular disease
According to source:
  Primary data
  Secondary data

Primary data:
  Collected directly from the field of enquiry
  Original in nature
  E.g., measurement of BP, weight, height, blood sugar
Secondary data:
  Collected previously by some other agency/organization
  Used afterwards by another
  E.g., hospital records, census data

Nominal scale, ordinal scale, interval scale, ratio scale
Nominal scale:
  Used when data are classified by major categories or subgroups of the population
  Religion can be assigned to the following categories: Muslim, Hindu, Christian
  Outcome of treatment: cured or not cured; died or survived

Ordinal scale:
  Assigns rank order to categories placed in an order
  E.g., students' rank in a class; grades A, B, C, D
  Literacy status: illiterate, just literate, primary, secondary, higher secondary, graduate, post graduate
  Disease condition: mild, moderate, severe
Interval scale:
  The distance between two measurements is defined, not their ratio
  E.g., intelligence score in IQ tests, temperature in Centigrade

Ratio scale:
  Both the distance & the ratio between two measurements are defined
  E.g., length, weight, incidence of disease, no. of children in a family
Dichotomy/binary scale:
  A scale with only two categories
  E.g., disease → present/absent; sex → male/female
Population:
  An aggregate of objects, animate or inanimate, under study
  A group of units defined according to the aims & objectives of the study
Sample:
  A finite subset or part of the population
  Every member of the population should have an equal chance of being included in the sample

Parameter: a constant that describes a characteristic of the population
Statistic: a function of the observations that describes a sample

                      Statistic      Parameter
Mean                  x̄ (x bar)      µ (mu)
Standard deviation    s              σ (sigma)
No. of subjects       n              N
Proportion            p              P

Sources of data
Main sources for collection of medical statistics are:
Experiments:
  Performed in the laboratories of physiology, biochemistry, pharmacology, clinical pathology
  Hospital wards → for investigations & fundamental research
  Used in preparation of theses/dissertations and scientific papers for publication in scientific journals & books
Surveys:
  Carried out for epidemiological studies in the field by trained teams to find out the incidence or prevalence of health or disease situations in a community
  Used in OR → assessment of existing conditions, how to follow a program, study of the merits of different methods adopted to control a disease
  Provide trends in health status, morbidity, mortality, nutritional status, health practices, environmental hazards
  Provide feedback needed to modify policy
  Provide timely warning of public health hazards

Records:
  Maintained as a routine in registers or books over a long period of time
  Used for keeping vital statistics: births, deaths, marriages, hospitalization following illness
  Used in demography & public health practice
  Collected data are qualitative

Presentation of data
Data → Information
Statistical data are usually presented in tabular form, through different types of tables, and in pictorial form, through diagrams and charts
Methods of presentation:
  Tabulation
  Drawing
Consolidation & summarization

Tabular presentation:
  A form of presenting data from a mass of statistical data
  At first, a frequency distribution table is prepared
  A table can be simple or complex
Frequency distribution table or frequency table:
  The no. of persons in each group is called the frequency of that group
  All frequencies considered together form the “frequency distribution”
  Frequency distributions of most biological variables follow a normal, binomial or Poisson distribution

For qualitative data: there is no notion of magnitude or size of the attribute

Presentation of quantitative data is more cumbersome, as the characteristic has a measured magnitude as well as a frequency

Table x: presentation of quantitative data of height in markings (each //// group stands for a tally of five)

Height group (cm)   Markings                               Frequency
160-162             //// ////                              10
162-164             //// //// ////                         15
164-166             //// //// //// //                      17
166-168             //// //// //// ////                    19
168-170             //// //// //// ////                    20
170-172             //// //// //// //// //// /             26
172-174             //// //// //// //// //// ////          29
174-176             //// //// //// //// //// ////          30
176-178             //// //// //// //// //                 22
178-180             //// //// //                           12
Total                                                      200

Data need consolidation by way of tabulation to express some meaning
Tabulation → a process of summarizing raw data & displaying it in a compact form for further analysis
Orderly arrangement of data in columns & rows

General principles in designing a table:
  The table should be numbered
  A brief & self-explanatory title should be given, mentioning time, place, person
  Headings of columns & rows should be clear & concise
  Data should be presented according to size or importance, chronologically, alphabetically or geographically
  Data must be presented meaningfully
  The table should not be too large
  Footnotes should be given, if necessary
  The total no. of observations (the denominator) should be written
  Information obtained should be summarized in the table

Frequency distribution drawings:
  After classwise or groupwise tabulation, the frequencies of a characteristic can be presented by two kinds of drawings: graphs & diagrams
  May be shown by lines, dots or figures
  Presentation of quantitative data is through graphs
  Presentation of qualitative, discrete, counted data is through diagrams

Presentation of quantitative data:
1. Histogram:
  Graphical presentation of a frequency distribution
  Variable characters of the different groups are indicated on the horizontal line (x-axis), called the abscissa
  The no. of observations is marked on the vertical line (y-axis), called the ordinate
  The frequency of each group forms a rectangle

2. Frequency polygon:
  An area diagram of a frequency distribution developed over a histogram
  Mid points of the class intervals at the height of the frequency are joined by straight lines
  It gives a polygon, a figure with many angles

3. Frequency curve:
  If the no. of observations is very large & the group interval is reduced, the frequency polygon tends to lose its angulation
  This gives rise to a smooth curve → the frequency curve

4. Line chart or graph:
  A frequency polygon presenting variation by a line
  Shows the trend of an event occurring over a period of time
  Shows rise, fall or periodic fluctuations
  The vertical axis need not start from zero; it may start from some point above it

5. Cumulative frequency diagram or “Ogive”:
  Graph of the cumulative frequency distribution
  An ordinary frequency distribution table → relative frequency table
  Cumulative frequency: the total no. of persons in each particular range, from the lowest value of the characteristic up to & including any higher group value
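The cumulative frequencies behind an ogive can be sketched as a running total. This is a minimal illustration, assuming the class intervals and frequencies of the height table shown earlier in this deck; the variable names are illustrative only.

```python
from itertools import accumulate

# Class intervals (cm) and frequencies taken from the height table in this deck.
classes = ["160-162", "162-164", "164-166", "166-168", "168-170",
           "170-172", "172-174", "174-176", "176-178", "178-180"]
freq = [10, 15, 17, 19, 20, 26, 29, 30, 22, 12]

# Running total: each entry counts everyone up to and including that class.
cumulative = list(accumulate(freq))
total = cumulative[-1]

for label, f, cf in zip(classes, freq, cumulative):
    # The last column is the cumulative relative frequency in percent.
    print(f"{label:>8} {f:>4} {cf:>5} {100 * cf / total:6.1f}%")
```

Plotting the cumulative column against the upper limit of each class gives the ogive.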

6. Scatter or dot diagram:
  Prepared after tabulation in which frequencies of at least two variables have been cross-classified
  Shows the nature of correlation between two variable characters in the same person(s) (e.g., height & weight)
  Also called a correlation diagram

Presentation or illustration of qualitative data
1. Bar diagram:
  Graphically presents frequencies of different categories of qualitative data
  Vertical/horizontal
  May be in descending/ascending order
  Widths should be equal
  Spacing between bars should also be equal
i. Simple bar diagram: each bar represents the frequency of a single category, with a distinct gap from one another

ii. Multiple bar diagram: used to show comparison of two or more sets of related statistical data
iii. Component/proportional bar diagram:
  Used to compare sizes of different component parts among themselves
  Also shows the relation between each part & the whole

2. Pie/sector diagram:
  A circle whose area is divided into different segments by straight lines from centre to circumference
  Each segment expresses the proportional component of the attribute
  Angle (°) of a sector = class percentage x 3.6, or (class frequency/total frequency) x 360
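The sector-angle rule can be checked with a short sketch. The blood group counts below are hypothetical, invented only for illustration, and `sector_angles` is an illustrative helper, not part of any library.

```python
def sector_angles(freqs):
    """Return the pie-sector angle in degrees for each category."""
    total = sum(freqs.values())
    # (class frequency / total frequency) x 360
    return {name: f / total * 360 for name, f in freqs.items()}

# Hypothetical counts of blood groups in 100 people (not from the slides).
counts = {"A": 22, "B": 33, "AB": 8, "O": 37}
angles = sector_angles(counts)

for name, angle in angles.items():
    print(f"{name:>2}: {angle:6.1f} degrees")
```

Because the hypothetical total is 100, each count is also the class percentage, so multiplying by 3.6 gives the same angles.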

3. Pictogram/picture diagram: a popular method of showing the common man the frequency of occurrence of events in a population, such as attacks, deaths, numbers operated on, admitted or discharged, accidents, etc.

4. Map diagram/spot map:
  Prepared to visualize the geographic distribution of the frequency of a characteristic
  One point denotes the occurrence of one or more events

Measures of central tendency
  When a series of observations has been tabulated in the form of a frequency distribution, it is felt necessary to convert the series into a single value that describes the characteristics of that distribution → called a measure of central tendency
  All data or values are clustered round it
  These values enable comparisons to be made between one series of observations and another
  Individual values may overlap even when two distributions have different central tendencies
  E.g., the average incubation period of measles is 10 days and that of chicken pox is 15 days

Types: measures of central tendency
  Mean
    Arithmetic mean (AM)
    Geometric mean (GM)
    Harmonic mean (HM)
  Median
  Mode

Arithmetic mean:
  Sum of all observations divided by the number of observations
  Mean (x̄) = Σx/n, where x is a variable taking different observational values & n = no. of observations
  Example: ESR of 7 subjects are 8, 7, 9, 10, 7, 7 & 6 mm for 1st hr. Calculate the mean ESR.
  Mean (x̄) = (8 + 7 + 9 + 10 + 7 + 7 + 6)/7 = 54/7 = 7.7 mm

Median: when observations are arranged in ascending or descending order of magnitude, the middle-most value is known as the median.
  When n is an odd no., Median = {(n + 1)/2}th observation
  When n is an even no., Median = {(n/2)th + (n/2 + 1)th observation}/2
  Problem: From the same example of ESR, observations are first arranged in ascending order: 6, 7, 7, 7, 8, 9, 10
  Median = {(7 + 1)/2}th = (8/2)th = 4th observation, i.e., 7
  Problem: Suppose there are 8 observations of ESR: 5, 6, 7, 7, 7, 8, 9, 10
  Median = {(8/2)th + (8/2 + 1)th}/2 = {4th + 5th observation}/2 = (7 + 7)/2 = 7

Mode: the observation which occurs most frequently in a series
  Problem: ESR of 7 subjects are 8, 7, 9, 10, 7, 7 & 6 mm for 1st hr. Calculate the mode.
  Mode is 7.
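The three ESR results can be reproduced with Python's standard `statistics` module. This is only a quick check of the worked examples, using the ESR values given in the slides.

```python
import statistics

# ESR readings (mm, 1st hr) of the 7 subjects in the worked example.
esr = [8, 7, 9, 10, 7, 7, 6]

mean_esr = statistics.mean(esr)      # sum of observations / n = 54 / 7
median_esr = statistics.median(esr)  # 4th value after sorting
mode_esr = statistics.mode(esr)      # most frequent value

# The even-sized example: the median is the average of the 4th and 5th values.
esr_even = [5, 6, 7, 7, 7, 8, 9, 10]
median_even = statistics.median(esr_even)

print(round(mean_esr, 1), median_esr, mode_esr, median_even)
```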

Calculation of weighted arithmetic mean:
  The following methods are used when the no. of observations is large
For ungrouped data:
  Suppose we have observations x₁, x₂, x₃, …, xₙ with corresponding frequencies f₁, f₂, f₃, …, fₙ
  Mean (x̄) = (f₁x₁ + f₂x₂ + … + fₙxₙ)/(f₁ + f₂ + … + fₙ) = Σfx/Σf

For grouped data:
  Data are arranged in groups & a frequency distribution table is prepared
  The mean (mid) value of each group is multiplied by its frequency
  The sum of the products is divided by the total no. of observations
  The mean so obtained is called the “weighted mean”
  Mean (x̄) = Σfx/Σf, where x = mid value of a group & f = its frequency
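A minimal sketch of the weighted mean, Σfx/Σf, for both cases. The helper name `weighted_mean` is illustrative; the ungrouped data are the ESR readings used earlier, and the grouped data are the marks table from Exercise no. 8 of this deck.

```python
def weighted_mean(values, freqs):
    """Return sum(f * x) / sum(f) for paired values and frequencies."""
    if len(values) != len(freqs):
        raise ValueError("values and freqs must have the same length")
    return sum(f * x for x, f in zip(values, freqs)) / sum(freqs)

# Ungrouped: ESR values 6, 7, 8, 9, 10 occurring 1, 3, 1, 1, 1 times.
esr_mean = weighted_mean([6, 7, 8, 9, 10], [1, 3, 1, 1, 1])

# Grouped: mid values of the marks groups 41-50 ... 81-90 and student counts.
mids = [45.5, 55.5, 65.5, 75.5, 85.5]
students = [5, 18, 15, 7, 5]
marks_mean = weighted_mean(mids, students)

print(round(esr_mean, 1), round(marks_mean, 1))
```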

Geometric mean:
  Used when data contain a few extremely large or small values
  It is the nth root of the product of n observations
  GM = ⁿ√(x₁ · x₂ · x₃ · … · xₙ)
Harmonic mean:
  Reciprocal of the arithmetic mean of the reciprocals of the observations
  Arithmetic mean of reciprocals of observations = Σ(1/x)/n
  HM = n/Σ(1/x)
  Has limited use
AM ≥ GM ≥ HM (for positive observations; equality holds only when all observations are equal)
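The ordering of the three means can be illustrated on the ESR data from the earlier example. This sketch uses `statistics.geometric_mean` and `statistics.harmonic_mean` from the Python standard library (the former needs Python 3.8 or later), alongside the formulas written out by hand.

```python
import math
import statistics

esr = [8, 7, 9, 10, 7, 7, 6]
n = len(esr)

am = sum(esr) / n                        # arithmetic mean
gm = math.prod(esr) ** (1 / n)           # nth root of the product
hm = n / sum(1 / x for x in esr)         # n / sum of reciprocals

# The library functions should agree with the hand-written formulas.
assert math.isclose(gm, statistics.geometric_mean(esr))
assert math.isclose(hm, statistics.harmonic_mean(esr))

print(f"AM={am:.3f}  GM={gm:.3f}  HM={hm:.3f}")
```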

Measures of dispersion
  Measures of central tendency do not provide information about the spread or scatter of values around them
  Measures of dispersion help us to find how individual observations are dispersed or scattered around the mean of a large series of data
Different measures of dispersion are:
  Range
  Mean deviation
  Standard deviation
  Variance
  Coefficient of variation

Range:
  Difference between the highest & lowest value
  Defines the normal value of a biological characteristic
  Problem: Blood pressure (systolic/diastolic, mm of Hg) of 10 medical students is as follows: 140/70, 120/88, 160/90, 140/80, 110/70, 90/60, 124/64, 100/62, 110/70 & 154/90
  Range of systolic BP of the medical students = highest value - lowest value = 160 - 90 = 70 mm of Hg
  Range of diastolic BP = 90 - 60 = 30 mm of Hg

Mean deviation:
  Average of the deviations of observations from the mean value, ignoring signs
  Mean deviation = Σ|x - x̄|/n, where x = observation, x̄ = mean, n = no. of observations

Standard deviation:
  The most frequently used measure of dispersion
  Square root of the arithmetic mean of the squares of deviations taken from the arithmetic mean
  In simple terms, the “Root-Mean-Square Deviation”
  SD (σ) = √{Σ(x - x̄)²/n}
  where x = observation, x̄ = mean, n = no. of observations

To estimate variability in the population from the values of a sample, the degrees of freedom (n - 1) are used in place of the no. of observations (n)
Standard deviation is calculated in the following stages:
  Calculate the mean
  Calculate the difference between each observation & the mean
  Square the differences
  Sum the squared values
  Divide the sum of squares by the no. of observations (n) to get the mean square deviation or variance (σ²)
  Find the square root of the variance to get the “Root-Mean-Square Deviation”
Use:
  Sample size calculation of any study
  Summarizes the deviation of a large series of observations around the mean in a single value

Coefficient of variation:
  Used to compare the variability of two or more different sets of observations
  Coefficient of variation = (SD/Mean) x 100
  Indicates relative variability
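The stages of the standard deviation calculation, together with the other measures of dispersion, can be sketched step by step. This is an illustration only, using the systolic BP values from the range problem in this deck; `ddof` is an assumed parameter name chosen here to switch between dividing by n and by n - 1.

```python
import math

# Systolic BP (mm Hg) of the 10 medical students in the range problem.
systolic = [140, 120, 160, 140, 110, 90, 124, 100, 110, 154]

def dispersion(data, ddof=0):
    """Return range, mean deviation, SD and CV; ddof=1 divides by n - 1."""
    n = len(data)
    mean = sum(data) / n                                   # stage 1: mean
    deviations = [x - mean for x in data]                  # stage 2: differences
    squares = [d * d for d in deviations]                  # stage 3: squares
    sum_sq = sum(squares)                                  # stage 4: sum of squares
    variance = sum_sq / (n - ddof)                         # stage 5: variance
    sd = math.sqrt(variance)                               # stage 6: square root
    return {
        "range": max(data) - min(data),
        "mean_deviation": sum(abs(d) for d in deviations) / n,
        "sd": sd,
        "cv": sd / mean * 100,
    }

population_style = dispersion(systolic)          # divides by n
sample_style = dispersion(systolic, ddof=1)      # divides by n - 1
print(population_style)
print(sample_style)
```

Dividing by n - 1 always gives a slightly larger SD than dividing by n; the difference shrinks as the sample grows.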

Normal distribution
  The most important and useful distribution in theoretical statistics
  Quantitative data can be represented by a histogram; by joining the midpoints of each rectangle in the histogram we get a frequency polygon
  When the no. of observations becomes very large & the class intervals are very much reduced, the frequency polygon loses its angulation → gives rise to a smooth curve known as the frequency curve
  Most biological variables, e.g., height, weight, blood cholesterol, follow a normal distribution, which can be graphically represented by the “normal curve”

If a large no. of observations of any variable, such as height, weight, blood pressure or pulse rate, are taken at random to make a representative sample of the world, and a frequency distribution table is made, it will show the following characteristics:
  Exactly half the observations will lie above & half below the mean, and all observations are symmetrically distributed on either side of the mean
  The maximum no. of frequencies will be seen in the middle, around the mean, and fewer at the extremities, decreasing smoothly on both sides

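Mathematically, this can be expressed as follows:
  Mean ± 1 SD covers about 68.3% of observations
  Mean ± 2 SD covers about 95.4% of observations
  Mean ± 3 SD covers about 99.7% of observations
A distribution of this nature or shape is called a Normal distribution or Gaussian distribution
This distribution pattern is usual for biological variables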

Normal curve: observations of a variable which are normally distributed in a population, when plotted as a frequency curve, give rise to the normal curve
Characteristics of a normal curve:
  Smooth
  Bell shaped
  Bilaterally symmetrical
  Mean, median & mode coincide
  Distribution of observations under the normal curve follows the same pattern of normal distribution as already mentioned

Standard normal curve:
  Each observation under a normal curve has a ‘Z’ value
  ‘Z’, or standard normal variate, relative deviate or critical ratio, is the measure of the distance of the observation from the mean in terms of the standard deviation
  Z = (x - mean)/SD
  If the ‘Z’ score is -2, the observation is 2 S.D. away from the mean on the left-hand side; if it is +2, the observation is 2 S.D. away on the right-hand side
  If all observations of normal curves are replaced by their ‘Z’ scores, virtually all curves become identical
  This standardized curve with mean ‘0’ and variance 1 is known as the “standard normal curve”

It has all the properties of a normal curve:
  Follows a normal distribution with mean ‘0’ & variance 1
  Area under the curve is 1
  Mean, median & mode coincide & they are 0
  Standard deviation is 1
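A short sketch of the Z score and of areas under the standard normal curve, using `statistics.NormalDist` from the Python standard library. The height figures (mean 160 cm, SD 5 cm) are hypothetical values invented for the illustration, and `z_score` is an illustrative helper.

```python
from statistics import NormalDist

def z_score(x, mean, sd):
    """Distance of an observation from the mean, in units of SD."""
    return (x - mean) / sd

# Hypothetical population: heights with mean 160 cm and SD 5 cm.
z = z_score(170, mean=160, sd=5)   # 2 SD to the right of the mean

# Standard normal curve: mean 0, SD (and variance) 1.
standard = NormalDist(mu=0, sigma=1)

# Proportion of observations lying within k SD on either side of the mean.
within = {k: standard.cdf(k) - standard.cdf(-k) for k in (1, 2, 3)}

print(z)
for k, area in within.items():
    print(f"within {k} SD: {100 * area:.1f}%")
```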

Sampling technique
Universe/population:
  Aggregate of units of observation about which certain information is required
  A population is a set of persons (or objects) having a common observable characteristic
  E.g., while recording the pulse rate of boys in a school, all boys in the school constitute the population/universe
Sample: a portion or part of the total population selected in some manner
Sampling frame:
  A complete, non-overlapping list of all the sampling units (persons or objects) of the population from which the sample is to be drawn
  E.g., a telephone directory acts as a frame for conducting an opinion survey in a city

Statistic: a characteristic of a sample, whereas a parameter is a characteristic of a population
Types of sampling: non-probability & probability/random sampling
Non-probability sampling:
  Easier & less expensive to perform
  Sampling is done by choice & not by chance
  Information collected cannot be presumed to be representative of the whole universe
  E.g., quota sampling, convenience sampling, purposive sampling, snowball sampling, case study

Probability/random sampling:
  Samples are selected from the universe by a proper sampling technique
  Each member of the universe has an equal opportunity to get selected
  Composition of the sample from the universe occurs only by chance
Types:
  Simple random sampling
  Stratified random sampling
  Systematic random sampling
  Cluster sampling
  Multistage sampling
  Multiphase sampling
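The slides name the sampling types without describing them, so the sketch below rests on the usual textbook definitions, stated here as assumptions: simple random sampling draws units purely by chance from the frame, systematic sampling takes every k-th unit after a random start, and stratified sampling draws a random sample within each stratum. The sampling frame, strata and function names are all hypothetical.

```python
import random

def simple_random_sample(frame, n, rng):
    """Draw n units from the frame, each equally likely, without replacement."""
    return rng.sample(frame, n)

def systematic_sample(frame, n, rng):
    """Take every k-th unit after a random start, with k = len(frame) // n."""
    if not 0 < n <= len(frame):
        raise ValueError("n must be between 1 and the size of the frame")
    k = len(frame) // n
    start = rng.randrange(k)
    return [frame[start + i * k] for i in range(n)]

def stratified_sample(strata, fraction, rng):
    """Draw the same fraction at random from every stratum."""
    return {name: rng.sample(units, round(len(units) * fraction))
            for name, units in strata.items()}

rng = random.Random(42)                      # fixed seed for repeatability
frame = list(range(1, 101))                  # hypothetical frame: units 1..100
strata = {"boys": list(range(1, 61)),        # hypothetical strata of the frame
          "girls": list(range(61, 101))}

srs = simple_random_sample(frame, 10, rng)
systematic = systematic_sample(frame, 10, rng)
stratified = stratified_sample(strata, 0.1, rng)
print(srs, systematic, stratified, sep="\n")
```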

Thank You

Exercise no. 1
Following are the diastolic blood pressure values (in mmHg) of 10 male adults: 80, 60, 70, 80, 65, 74, 66, 80, 70, 55
Solution:
  Mode = 80
  Arranging in ascending order: 55, 60, 65, 66, 70, 70, 74, 80, 80, 80
  Median = {(10/2)th + (10/2 + 1)th}/2 = {5th + 6th}/2 = (70 + 70)/2 = 70
  Mean = 700/10 = 70

Exercise no. 5
The following table shows the number of children per family in a village. Calculate the measure of central tendency.

No. of children per family   No. of families
0                            30
1                            40
2                            70
3                            30
4                            20
5                            10

Solution:
Table 1.1 showing number of children in families

No. of children in a family (x)   No. of families (f)   Total no. of children (fx)
0                                 30                    0 x 30 = 0
1                                 40                    1 x 40 = 40
2                                 70                    2 x 70 = 140
3                                 30                    3 x 30 = 90
4                                 20                    4 x 20 = 80
5                                 10                    5 x 10 = 50
Total                             200                   400

Average (x̄) no. of children = 400/200 = 2

Exercise no. 8
Marks obtained by 50 students in community medicine in the final MBBS Part-I exam are as follows. Calculate central tendency.

Marks    No. of students
41-50    5
51-60    18
61-70    15
71-80    7
81-90    5

Solution:

Marks obtained   No. of students (f)   Mid value of marks group (x)   Total marks obtained by each group (fx)
41-50            5                     45.5                           227.5
51-60            18                    55.5                           999
61-70            15                    65.5                           982.5
71-80            7                     75.5                           528.5
81-90            5                     85.5                           427.5
Total            50                                                   3165

Average marks obtained by students = 3165/50 = 63.3

Calculation of median:
The frequencies used here are the numbers of students (f), not the group totals (fx)
N = total frequency = 50, so N/2 = 25
Median class = 60.5-70.5 (the first class whose cumulative frequency reaches N/2)
Median = L + {(N/2 - cf) x h}/f
where:
  L = lower boundary of the median class
  h = class width
  N = total frequency
  cf = cumulative frequency of the class previous to the median class
  f = frequency in the median class

Class boundary   Frequency   Cumulative frequency
40.5-50.5        5           5         < N/2
50.5-60.5        18          cf = 23   < N/2
60.5-70.5        f = 15      38        > N/2
70.5-80.5        7           45
80.5-90.5        5           50
Total            50

Median = 60.5 + (25 - 23) x 10/15 = 60.5 + 20/15 = 60.5 + 1.33 = 61.83

Modal class: the class having maximum frequency

Class boundary   Frequency
40.5-50.5        f1 = 5
50.5-60.5        fm = 18    (modal class)
60.5-70.5        f2 = 15
70.5-80.5        7
80.5-90.5        5
Total            50

Mode = L + (fm - f1)/(2fm - f1 - f2) x h
where:
  L = lower boundary of the modal class
  fm = frequency of the modal class
  f1 = frequency of the pre-modal class
  f2 = frequency of the post-modal class
  h = width of the modal class
Mode = 50.5 + (18 - 5)/(2 x 18 - 5 - 15) x 10 = 50.5 + (13/16) x 10 = 50.5 + 8.1 = 58.6
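The grouped-data formulas for median and mode can be sketched as two small functions. This is an illustration under one stated assumption: the class frequencies are the student counts (5, 18, 15, 7, 5; N = 50) from the marks table of Exercise no. 8, with class boundaries 40.5-50.5 and so on. The function names are illustrative.

```python
def grouped_median(lower_bounds, freqs, width):
    """Median = L + (N/2 - cf) * h / f for the median class."""
    half = sum(freqs) / 2
    cf = 0  # cumulative frequency of the classes before the current one
    for lower, f in zip(lower_bounds, freqs):
        if cf + f >= half:          # first class whose cumulative total reaches N/2
            return lower + (half - cf) * width / f
        cf += f
    raise ValueError("frequencies must not be empty")

def grouped_mode(lower_bounds, freqs, width):
    """Mode = L + (fm - f1) / (2*fm - f1 - f2) * h for the modal class."""
    m = max(range(len(freqs)), key=lambda i: freqs[i])   # index of modal class
    fm = freqs[m]
    f1 = freqs[m - 1] if m > 0 else 0                    # pre-modal frequency
    f2 = freqs[m + 1] if m < len(freqs) - 1 else 0       # post-modal frequency
    return lower_bounds[m] + (fm - f1) / (2 * fm - f1 - f2) * width

lower_bounds = [40.5, 50.5, 60.5, 70.5, 80.5]
students = [5, 18, 15, 7, 5]

median_marks = grouped_median(lower_bounds, students, width=10)
mode_marks = grouped_mode(lower_bounds, students, width=10)
print(round(median_marks, 2), round(mode_marks, 2))
```

The sketch does not handle the degenerate case where the modal class and both neighbours have equal frequencies, which would make the denominator zero.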

Exercise no. 11
Calculate measures of dispersion from the following data: 15, 17, 19, 25, 30, 35, 48
Solution:
Range = 48 - 15 = 33
Mean (x̄) = Σx/n = 189/7 = 27

Observation (x)   Deviation (x - x̄)    (x - x̄)²
15                -12                   144
17                -10                   100
19                -8                    64
25                -2                    4
30                3                     9
35                8                     64
48                21                    441
Σx = 189          Σ|x - x̄| = 64         Σ(x - x̄)² = 826

Mean deviation = Σ|x - x̄|/n = 64/7 = 9.14, ignoring - or + signs

Standard deviation: SD = √(826/7) = √118 = 10.86
Using the degrees of freedom (n - 1 = 6) instead gives SD = √(826/6) = √137.67 = 11.73

Coefficient of variation = (SD/Mean) x 100 = (10.86/27) x 100 = 40.2%
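The figures for Exercise no. 11 can be cross-checked with the standard `statistics` module. In this sketch `pstdev` divides by n and `stdev` divides by n - 1; which one applies depends on whether the seven values are treated as a whole population or as a sample.

```python
import statistics

data = [15, 17, 19, 25, 30, 35, 48]
n = len(data)

mean = statistics.mean(data)
data_range = max(data) - min(data)
mean_deviation = sum(abs(x - mean) for x in data) / n
sum_squares = sum((x - mean) ** 2 for x in data)

sd_n = statistics.pstdev(data)        # divides the sum of squares by n
sd_n_minus_1 = statistics.stdev(data) # divides the sum of squares by n - 1
cv = sd_n / mean * 100                # coefficient of variation, in percent

print(mean, data_range, round(mean_deviation, 2), sum_squares)
print(round(sd_n, 2), round(sd_n_minus_1, 2), round(cv, 1))
```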

Exercise no. 20
Data A & B are given below. Calculate the mean deviation & standard deviation.

A - item   B - frequency
10-20      4
20-30      8
30-40      8
40-50      16
50-60      12
60-70      6
70-80      4

Solution:
a = assumed mean = 35, h = class width = 10, d = (x - a)/h
Mean (x̄) = a + (Σfd/N) x h
SD = h x √[{Σfd² - (Σfd)²/N}/(N - 1)]

Data A - class interval   Data B - frequency (f)   Mid value (x)   d = (x - a)/h       fd    fd²
10-20                     4                        15              (15 - 35)/10 = -2   -8    16
20-30                     8                        25              -1                  -8    8
30-40                     8                        a = 35          0                   0     0
40-50                     16                       45              1                   16    16
50-60                     12                       55              2                   24    48
60-70                     6                        65              3                   18    54
70-80                     4                        75              4                   16    64
Total                     N = 58                                                       Σfd = 58   Σfd² = 206

Mean = 35 + (58/58) x 10 = 45
SD = 10 x √{(206 - 58²/58)/(58 - 1)} = 10 x √(148/57) = 10 x √2.60 = 10 x 1.61 = 16.1
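The assumed-mean (step deviation) method can be sketched as below, using all seven classes of the Exercise no. 20 table, an assumed mean a = 35 and the N - 1 divisor that appears in the solution's formula. The mean deviation about the mean, which the exercise asks for, is included as Σf|x - x̄|/N; that grouped form is an assumption of this sketch, since the slides give the formula only for ungrouped data.

```python
import math

lower = [10, 20, 30, 40, 50, 60, 70]   # lower limits of the class intervals
freqs = [4, 8, 8, 16, 12, 6, 4]        # Data B: frequencies
h = 10                                 # class width
a = 35                                 # assumed mean (mid value of 30-40)

mids = [lo + h / 2 for lo in lower]    # mid value x of each class
d = [(x - a) / h for x in mids]        # step deviation d = (x - a) / h

N = sum(freqs)
sum_fd = sum(f * di for f, di in zip(freqs, d))
sum_fd2 = sum(f * di ** 2 for f, di in zip(freqs, d))

mean = a + sum_fd / N * h
sd = h * math.sqrt((sum_fd2 - sum_fd ** 2 / N) / (N - 1))

# Mean deviation about the mean for grouped data (assumed form, see lead-in).
mean_deviation = sum(f * abs(x - mean) for f, x in zip(freqs, mids)) / N

print(N, sum_fd, sum_fd2)
print(mean, round(sd, 2), round(mean_deviation, 2))
```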