Biostatistics

116,459 views 76 slides Apr 18, 2017

About This Presentation

General statistics, the importance of statistics in healthcare, types of statistics, methods of sampling, errors in sampling, different types of tests, measures of dispersion, correlation, and types of correlation


Slide Content

BIOSTATISTICS PRESENTED BY: PRABLEEN ARORA, MDS STUDENT

STATISTICS is the science of compiling, classifying and tabulating numerical data and expressing the results in mathematical and graphical form.
BIOSTATISTICS is the branch of statistics concerned with the mathematical facts and data related to biological events.

Constant: Quantities that do not vary, e.g. in biostatistics the mean and standard deviation are considered constant for a population.
Variable: A characteristic which takes different values for different persons, places or things, such as height, weight or blood pressure.

Parameter: A constant that describes a population, e.g. in a college 40% of the students are girls. This describes the population, hence it is a parameter.
Statistic: A constant that describes a sample, e.g. out of 200 students of the same college, 45% are girls. This 45% is a statistic, as it describes the sample.
Attribute: A characteristic on the basis of which the population can be divided into categories or classes, e.g. gender, caste, religion.

HISTORY
The science of statistics is said to have originated from two main sources:
1. Government records
2. Mathematics
It developed from the registration of heads of families in ancient Egypt to the Roman census on military strength, births and deaths, etc., and gradually found its application in the field of health and medicine.

John Graunt, who was neither a physician nor a mathematician, is the FATHER OF HEALTH STATISTICS.

WHAT IS STATISTICS?
The following essential features of statistics are evident from various definitions of statistics:
a) Principles and methods for the collection, presentation, analysis and interpretation of numerical data of different kinds:
1. Observational data, qualitative data.
2. Data that has been obtained by a repetitive operation.
3. Data affected to a marked degree by a multiplicity of causes.
b) The science and art of dealing with variation in such a way as to obtain reliable results.

c) Controlled objective methods whereby group trends are abstracted from observations on many separate individuals.
d) The science of experimentation, which may be regarded as mathematics applied to observational data.

WHY STATISTICS?
Variability in measurement can be handled using statistics. E.g. an investigator makes observations according to his judgement of the situation (depending upon his skills, knowledge and experience).
Epidemiology and biostatistics are sister sciences or disciplines. Epidemiology collects facts relating to groups of population in places, times and situations. Biostatistics converts all the facts into figures and at the end translates them back into facts, interpreting the significance of the results.

Epidemiology and biostatistics both deal with the facts-figures-facts sequence: QUANTITATIVE METHODOLOGY

USES OF BIOSTATISTICS
To test whether the difference between two populations is real or a chance occurrence.
To study the correlation between attributes in the same population.
To evaluate the efficacy of vaccines.
To measure mortality and morbidity.
To evaluate the achievements of public health programs.
To fix priorities in public health programs.
To help promote health legislation and create administrative standards for oral health.

COLLECTION OF DATA
The collective recording of observations, either numerical or otherwise, is called data. Demographic data comprises details of population size, geographic distribution, ethnic groups, socio-economic factors and their trends over time. It is obtained from the census and other public service reports.

Depending upon the nature of the variable, data is classified into:
1. Qualitative data: attributes or qualities.
2. Quantitative data: obtained through measurement, e.g. using calipers. It may be a) discrete or b) continuous.

Sources of statistical data
Data can be collected from:
EXPERIMENTS: Performed to collect data for investigations and research by one or more workers.
SURVEYS: Carried out for epidemiological studies in the field by trained teams to find the incidence or prevalence of health or disease in a community.
RECORDS: Maintained as a routine in registers and books over a long period of time; they provide ready-made data.
Data may be:
PRIMARY: Data obtained by the investigator himself.
SECONDARY: Data that has already been recorded, e.g. hospital records.

Primary data can be obtained using any one of the following methods:
Direct personal interviews: Face-to-face contact with the person. Suited to subjective phenomena. Accurate, and any ambiguity can be clarified. Cannot be used in extensive studies.
Oral health examination: Used when information is needed on health status. Cannot be used in extensive studies. Includes treatment.
Questionnaire method: A list of questions pertaining to the survey (the "questionnaire") is prepared. Various informants are requested to supply the information.

Sampling and sample design
Population: the group of all individuals who are the focus of the investigation.
Census enumeration: when the information is obtained from each and every individual in the population.
Sample: the group of individuals who are actually available for investigation.
Sampling units: the individual entities that form the focus of the study.
Sampling frame/list: the list of sampling units.

Sample selection
Purposive selection: Aims at representing the population as a whole. There is a great temptation to deliberately or purposively select the individuals who seem to represent the population under study. Easy to carry out. Does not need the preparation of a sampling frame.
Random selection: A sample of units is selected in such a way that all the characteristics of the population are reflected in the sample. Random indicates the chance of a population unit being selected in the sample.

Sampling Design
BASED UPON THE TYPE AND NATURE OF THE POPULATION AND THE OBJECTIVES OF THE INVESTIGATION.
Simple random sampling
Systematic random sampling
Stratified random sampling
Cluster sampling
Multiphase sampling
Pathfinder survey

Simple random sampling
Each and every unit in the population has an equal chance of being included in the sample. Selection of a unit is by chance only.
Two methods:
Lottery method: Population units are numbered on separate slips, which are shuffled and selected blindfold.
Table of random numbers: A random arrangement of the digits 0-9 in rows and columns. Selection is done in either a horizontal or a vertical direction.

Systematic random sampling
Select one unit at random and then select additional units at evenly spaced intervals until a sample of the required size has been drawn.
Stratified random sampling
The population to be sampled is subdivided into groups (age/sex/genetic) known as strata, such that each group is homogeneous in its characteristics. A simple random selection is then done from each stratum. It is more representative, provides greater accuracy and can concentrate on a wider geographical area.
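
The three designs described so far can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sampling frame of 100 numbered units, the two strata, the sample sizes and the function names are all hypothetical and do not come from the slides.

```python
import random

def simple_random_sample(units, n, rng):
    """Lottery-style draw: every unit has an equal chance of selection."""
    return rng.sample(units, n)

def systematic_sample(units, n, rng):
    """Pick a random start, then take every k-th unit until n units are drawn."""
    k = len(units) // n           # sampling interval
    start = rng.randrange(k)      # random start within the first interval
    return units[start::k][:n]

def stratified_sample(strata, n_per_stratum, rng):
    """Draw a simple random sample of fixed size from each stratum."""
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

rng = random.Random(42)                   # fixed seed so the draw is repeatable
population = list(range(1, 101))          # hypothetical sampling frame of 100 units
strata = {"under_40": population[:60],    # hypothetical strata
          "40_and_over": population[60:]}

srs = simple_random_sample(population, 10, rng)
sys_sample = systematic_sample(population, 10, rng)
strat = stratified_sample(strata, 5, rng)
```

In the systematic draw only the starting unit is random; every later unit follows from the fixed interval, which is why the method is quick but depends on the frame having no periodic pattern.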

Cluster sampling
The population forms natural groups or clusters such as villages, wards, blocks or the children of a school. A sample of the clusters is selected and then all the units in each of the selected clusters are surveyed. Simpler, with less time and cost. Gives a higher standard error.

Multiphase sampling
Part of the information is collected from the whole sample and part from a sub-sample.
First phase: All the children in a school are surveyed.
Second phase: Only the ones with oral health problems are examined.
Third phase: The section that needs treatment is selected.
The sub-samples become smaller and smaller. Adopted when the interest is in a specific disease.

Multistage sampling First stage is to select the groups or clusters. Then subsamples are taken in as many subsequent stages as necessary to obtain the desired sample.

Errors in sampling
Sampling errors: Faulty sample design. Small sample size.
Non-sampling errors:
Coverage errors: due to non-response or non-cooperation of the informant.
Observational errors: interview bias, imperfect experimental technique.
Processing errors: errors in statistical analysis.

Data presentation
Two main types of data presentation are:
Tabulation
Graphic representation: charts and diagrams
Tabulation
Tables are a simple device used for the presentation of statistical data.
PRINCIPLES:
Tables should be as simple as possible (2-3 small tables).
Data should be presented according to size or importance, chronologically or alphabetically.
Tables should be self-explanatory.
Each row and column should be labelled concisely and clearly.

The specific unit of measure for the data should be given.
The title should be clear, concise and to the point.
Totals should be shown.
Every table should contain a title stating what is depicted in the table.
In a small table, vertical lines separating the columns may not be necessary.
If the data are not original, their source should be given in a footnote.

TYPES OF TABLES
MASTER TABLE: Contains all the data obtained from a survey.
SIMPLE TABLE: One-way tables which supply the answer to questions about one characteristic of the data only.
FREQUENCY DISTRIBUTION TABLE: A two-column frequency table. The first column lists the classes into which the data are grouped. The second column lists the frequency for each class.

Charts and diagrams
The most convincing and appealing ways of depicting statistical results.
Principles:
Every diagram must be given a title that is self-explanatory.
It should be simple and consistent with the data.
The values of the variable are presented on the horizontal or X-axis and the frequency on the vertical or Y-axis.
The number of lines drawn in any graph should not be many.
The scale of presentation for the X-axis and Y-axis should be mentioned.
The scale of division of both axes should be proportional, and the divisions should be marked along with the details of the variable and frequencies presented on the axes.

Bar chart
Represents qualitative data. Bars can be either vertical or horizontal. A suitable scale is chosen. Bars are usually equally spaced.
They are of three types:
Simple bar chart: represents only one variable.
Multiple bar chart: for each category of a variable there is a set of bars.
Component/proportional bar chart: each individual bar is divided into 2 or more parts.

Pie chart Entire graph looks like a pie. It is divided into different sectors corresponding to the frequencies.

Line diagram
Useful to study changes in the values of a variable over time, and the simplest type of diagram. Time may be in hours, days, weeks, months or years.

Histogram
Pictorial presentation of a frequency distribution.
There is no space between the cells of a histogram.
The class intervals are given on the horizontal axis and the frequencies on the vertical axis.
The area of each rectangle is proportional to the frequency.

Frequency polygon
Obtained by joining the midpoints of the histogram blocks at the height of the frequency by straight lines, usually forming a polygon.

Frequency curve
When the number of observations is very large and the class interval is reduced, the frequency polygon loses its angulations and becomes a smooth curve known as the frequency curve.

Pictogram Popular method of presenting data to the common man through small pictures or symbols. Spot map/shaded map/Cartogram These maps are prepared to show geographic distribution of frequencies of characteristics

Measures of statistical averages or central tendency
The central value around which all the other observations are distributed. The main objective is to condense the entire mass of data and to facilitate comparison. The most common measures of central tendency used in dental sciences are:
Mean
Median
Mode

Mean
Refers to the arithmetic mean. It is obtained by adding the individual observations and dividing by the total number of observations.
Advantages: it is easy to calculate and is the most useful of all the averages.
Disadvantages: it is influenced by abnormal values.

Median
When all the observations are arranged in either ascending or descending order, the middle observation is known as the median. In the case of an even number of observations, the average of the two middle values is taken. The median is a better indicator of the central value when there are extreme values, as it is not affected by them.

Mode
The most frequently occurring observation in a set of data is called the mode. It is not often used in medical statistics.
EXAMPLE
Number of decayed teeth in 10 children: 2, 2, 4, 1, 3, 0, 10, 2, 3, 8
Mean = 35 / 10 = 3.5
Median: arranged in order (0, 1, 2, 2, 2, 3, 3, 4, 8, 10), the two middle values are 2 and 3, so median = (2 + 3) / 2 = 2.5
Mode = 2 (occurs 3 times)
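
The worked example can be checked with Python's standard `statistics` module. This is a minimal sketch; the variable names are illustrative only.

```python
import statistics

# Number of decayed teeth in the 10 children from the example
decayed_teeth = [2, 2, 4, 1, 3, 0, 10, 2, 3, 8]

total = sum(decayed_teeth)                  # the ten values add up to 35
mean = statistics.mean(decayed_teeth)       # total / number of observations
median = statistics.median(decayed_teeth)   # average of the two middle values (2 and 3)
mode = statistics.mode(decayed_teeth)       # most frequent value

print(total, mean, median, mode)
```

The single child with 10 decayed teeth pulls the mean above the median, which illustrates why the median is preferred when extreme values are present.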

Types of variability
There are three types of variability:
Biological variability
Real variability
Experimental variability

Biological variability
It is the natural difference which occurs between individuals due to age, gender and other inherent attributes. This difference is small, occurs by chance and is within certain accepted biological limits, e.g. vertical dimension may vary from patient to patient.

Real Variability
Such variability is more than the normal biological limits. The cause of the difference is not inherent or natural but is due to some external factor, e.g. the difference in the incidence of cancer between smokers and non-smokers may be due to excessive smoking and not due to chance only.

Experimental Variability
It occurs due to the experimental study. The errors are of three types:
Observer error: the investigator may alter some information or not record the measurement correctly.
Instrumental error: this is due to defects in the measuring instrument. Both observer and instrumental errors are called non-sampling errors.
Sampling error or error of bias: this is the error which occurs when the samples are not chosen at random from the population. Thus the sample does not truly represent the population.

MEASURES OF DISPERSION
Dispersion is the degree of spread or variation of the variable about a central value. It helps to know how widely the observations are spread on either side of the average. The most common measures of dispersion are:
RANGE
MEAN DEVIATION
STANDARD DEVIATION

RANGE
Defined as the difference between the value of the largest item and the smallest item. Gives no information about the values that lie between the extreme values.
MEAN DEVIATION
It is the average of the deviations from the arithmetic mean, taken without regard to sign.
$$\text{M.D.} = \frac{\sum |\bar{X} - X_i|}{n}$$
where $\sum$ = sum of, $\bar{X}$ = arithmetic mean, $X_i$ = value of each observation in the data, $n$ = number of observations in the data.
STANDARD DEVIATION
The most important and widely used measure of dispersion. The greater the S.D., the greater the magnitude of dispersion from the mean. A smaller S.D. means a higher degree of uniformity of the observations.
$$\text{S.D.} = \sqrt{\frac{\sum (\bar{X} - X_i)^2}{n}}$$

Coefficient of variation
It is used to compare attributes having two different units of measurement, e.g. height and weight. It is denoted by CV and is expressed as a percentage:
CV = (SD × 100) / Mean
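
As a rough sketch, the four measures can be computed for the decayed-teeth data (2, 2, 4, 1, 3, 0, 10, 2, 3, 8). The code assumes the divisor n for the standard deviation, as in the formula on the slide, and takes deviations without regard to sign for the mean deviation; many textbooks use n - 1 for small samples, which would give a slightly larger S.D.

```python
import math

decayed_teeth = [2, 2, 4, 1, 3, 0, 10, 2, 3, 8]
n = len(decayed_teeth)
mean = sum(decayed_teeth) / n

# Range: largest value minus smallest value
value_range = max(decayed_teeth) - min(decayed_teeth)

# Mean deviation: average absolute deviation from the arithmetic mean
mean_deviation = sum(abs(mean - x) for x in decayed_teeth) / n

# Standard deviation with divisor n (assumption: population form, as on the slide)
std_dev = math.sqrt(sum((mean - x) ** 2 for x in decayed_teeth) / n)

# Coefficient of variation, expressed as a percentage of the mean
cv = std_dev * 100 / mean

print(value_range, mean_deviation, round(std_dev, 3), round(cv, 1))
```

Because CV is a ratio of S.D. to mean, it has no unit, which is what allows a comparison such as height against weight.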

Normal distribution / normal curve / Gaussian distribution
When data are collected from a very large number of people and a frequency distribution is made with narrow class intervals, the resulting curve is smooth and symmetrical: the NORMAL CURVE. In such a curve about 68% of the observations lie within mean ± 1 SD, about 95% within mean ± 2 SD and about 99.7% within mean ± 3 SD. These limits on either side of the mean are called confidence limits.

STANDARD NORMAL CURVE
There may be many normal curves but only one standard normal curve.
Characteristics:
Bell shaped.
Perfectly symmetrical.
The frequency increases from one side, reaches its highest point and decreases exactly the way it had increased.
The total area under the curve is one, its mean is zero and its standard deviation is one.
The highest point denotes the mean, median and mode, which coincide.
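
The area of the standard normal curve lying within a given number of standard deviations of the mean can be obtained from the error function in Python's `math` module. The sketch below is an illustration only; the helper name `area_within` is made up for this example.

```python
import math

def area_within(k):
    """Proportion of a normal distribution lying within mean ± k SD."""
    return math.erf(k / math.sqrt(2))

# Areas for the commonly quoted limits
areas = {k: area_within(k) for k in (1, 1.96, 2, 3)}

for k, area in areas.items():
    print(f"mean ± {k} SD covers {area:.4f} of the total area")
```

The value for 1.96 is the one that matters for tests of significance: it leaves 5% of the area in the two tails combined.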

Z-TEST Used to test the significance of difference in means for large samples. Criteria: Sample must be randomly selected. Data must be quantitative. The variable is assumed to follow a normal distribution in the population. Samples should be larger than 30.

Tests of significance
When different samples are drawn from the same population, the estimates might differ: this is sampling variability. Tests of significance deal with techniques to know how far the difference between the estimates of different samples is due to sampling variation.
Standard error of mean
Standard error of proportion
Standard error of difference between two means
Standard error of difference between two proportions

Standard error of mean: Gives the standard deviation of the means of several samples from the same population. Example : Let us suppose, we obtained a random sample of 25 males, age 20-24 years whose mean temperature was 98.14 deg. F with a standard deviation of 0.6. What can we say of the true mean of the universe from which the sample was drawn?
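
The question posed in the example can be approached as follows. This is a sketch assuming the usual formula SE = SD / √n and the rule of thumb that the true mean lies within the sample mean ± 2 SE in about 95% of samples; for a sample as small as 25, a multiplier taken from the t distribution would be slightly larger.

```python
import math

n = 25                # sample size from the example
sample_mean = 98.14   # mean temperature in deg. F
sd = 0.6              # standard deviation of the sample

# Standard error of the mean
se = sd / math.sqrt(n)

# Approximate 95% confidence limits (assumption: mean ± 2 SE rule of thumb)
lower = sample_mean - 2 * se
upper = sample_mean + 2 * se

print(f"SE = {se:.2f}, limits = {lower:.2f} to {upper:.2f}")
```

Under these assumptions the true mean of the universe would be expected to lie between roughly 97.90 and 98.38 deg. F.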

Standard Error of Proportion Standard error of proportion may be defined as a unit that measures variation which occurs by chance in the proportions of a character from sample to sample or from sample to population or vice versa in a qualitative data.
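
A minimal sketch of the calculation, assuming the usual formula SE = √(pq / n) with p and q expressed as percentages. The figures used (40% in a sample of 100) are hypothetical and chosen only to show the arithmetic.

```python
import math

def se_proportion(p_percent, n):
    """Standard error of a proportion, with p given as a percentage."""
    q_percent = 100 - p_percent          # proportion without the character
    return math.sqrt(p_percent * q_percent / n)

# Hypothetical sample: 40% of 100 individuals show the character
se = se_proportion(40, 100)
print(f"SE of proportion = {se:.2f}%")
```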

Standard Error of Difference Between two Means
The standard error of the difference between the two means is 7.5. The actual difference between the two means is 370 - 318 = 52, which is more than twice the standard error of the difference between the two means, and therefore "significant".

Standard Error of Difference Between Proportions The standard error of difference is 6 whereas the observed difference (24.4 - 16.2) was 8.2. In other words the observed difference between the two groups is less than twice the S.E. of difference, i.e., 2 x 6. There was no strong evidence of any difference between the efficacy of the two vaccines. Therefore, the observed difference might be easily due to chance.
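
Both examples apply the same rule: divide the observed difference by its standard error and call the difference significant if the ratio exceeds 2. The sketch below uses only the figures quoted in the two examples; the function name is illustrative.

```python
def difference_in_se_units(observed_difference, se_of_difference):
    """How many standard errors the observed difference amounts to."""
    return observed_difference / se_of_difference

# Difference between two means: 370 - 318 = 52, SE of difference = 7.5
means_ratio = difference_in_se_units(370 - 318, 7.5)
means_significant = means_ratio > 2

# Difference between two proportions: 24.4 - 16.2 = 8.2, SE of difference = 6
props_ratio = difference_in_se_units(24.4 - 16.2, 6)
props_significant = props_ratio > 2

print(round(means_ratio, 2), means_significant)
print(round(props_ratio, 2), props_significant)
```

The first ratio is close to 7, far beyond 2, while the second is under 1.4, which is why only the difference between the means is treated as significant.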

A null hypothesis or hypothesis of no difference (H0) asserts that there is no real difference between the sample and the population in the particular matter under consideration, and that the difference found is accidental and arose out of sampling variation. The alternative hypothesis of significant difference (H1) states that there is a difference between the two groups compared.

A test of significance such as the Z-test is performed to accept the null hypothesis H0, or to reject it and accept the alternative hypothesis H1. To make the minimum error in rejection or acceptance of H0, we divide the sampling distribution, or the area under the normal curve, into two regions or zones:
i. A zone of acceptance
ii. A zone of rejection

The distance from the mean at which H0 is rejected is called the level of significance. It falls in the zone of rejection for H0 (the shaded areas under the curve) and is denoted by the letter P, which indicates the probability or relative frequency of occurrence of the difference by chance. The greater the Z value, the smaller the P.

i. Zone of acceptance: If the result of a sample falls in the plain area, i.e. within the mean ± 1.96 SE, the null hypothesis is accepted; hence this area is called the zone of acceptance for the null hypothesis.
ii. Zone of rejection: If the result of a sample falls in the shaded area, i.e. beyond the mean ± 1.96 SE, it is significantly different from the universe value. Hence the H0 of no difference is rejected and the alternative H1 is accepted. This shaded area is therefore called the zone of rejection for the null hypothesis.
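
The two zones translate directly into a decision rule. The sketch below is an assumption-laden illustration: the function name and the input values (a sample value of 52 or 51, a universe value of 50, a standard error of 0.8) are hypothetical, and the two-sided P value is obtained from the complementary error function.

```python
import math

def z_test(sample_value, universe_value, standard_error, critical_z=1.96):
    """Return Z, the two-sided P value, and whether H0 is rejected."""
    z = (sample_value - universe_value) / standard_error
    p = math.erfc(abs(z) / math.sqrt(2))    # area in both tails beyond |Z|
    reject_h0 = abs(z) > critical_z         # falls in the zone of rejection
    return z, p, reject_h0

# Hypothetical figures, for illustration only
z1, p1, reject1 = z_test(52, 50, 0.8)   # Z = 2.5, beyond 1.96
z2, p2, reject2 = z_test(51, 50, 0.8)   # Z = 1.25, within 1.96

print(round(z1, 2), round(p1, 4), reject1)
print(round(z2, 2), round(p2, 4), reject2)
```

The larger Z gives the smaller P, in line with the statement that the greater the Z value, the smaller the P.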

Degree of freedom: Defined as the number of independent members in the sample.
EXAMPLE: (X + Y + Z) / 3 = 5
Out of the 3 values, we can choose only 2 freely; the choice of the third depends upon the fact that the total of the three values must be 15.

SIGNIFICANCE OF DIFFERENCE BETWEEN MEANS OF SMALL SAMPLES BY STUDENT'S t-TEST
Small samples or their Z values do not follow the normal distribution as large ones do. So the Z value based on the normal distribution will not give the correct level of significance or probability of a small sample value occurring by chance. In the case of small samples, the t-test is applied instead of the Z-test. It was designed by W. S. Gosset, whose pen name was Student. Hence this test is also called Student's t-test.

There are two types of Student's t-test:
Unpaired t-test
Paired t-test
Criteria for applying the t-test:
1. Random samples
2. Quantitative data
3. Variable normally distributed
4. Sample size less than 30

Unpaired t-test
This test is applied to unpaired data of independent observations made on individuals of two different or separate groups or samples drawn from two populations, to test whether the difference between the two means is real or can be attributed to sampling variability. EXAMPLE: the difference between the means of the control and experimental groups.

Paired t-test
It is applied to paired data of dependent observations from one sample only, when each individual gives a pair of observations, e.g. an observation before and an observation after taking a drug.
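
Both forms of the test can be sketched with the standard library. The assumptions here are the textbook formulas: for paired data, t is the mean of the differences divided by its standard error with n - 1 degrees of freedom; for unpaired data, a pooled variance is used with n1 + n2 - 2 degrees of freedom. All data values and function names are hypothetical.

```python
import math
import statistics

def paired_t(before, after):
    """t statistic and degrees of freedom for paired observations."""
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)   # SE of the mean difference
    return statistics.mean(diffs) / se, n - 1

def unpaired_t(group_a, group_b):
    """t statistic and degrees of freedom for two independent groups."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * statistics.variance(group_a)
                  + (nb - 1) * statistics.variance(group_b)) / (na + nb - 2)
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / se, na + nb - 2

# Hypothetical readings before and after a drug in the same 5 individuals
t_paired, df_paired = paired_t([10, 12, 9, 11, 13], [8, 11, 7, 10, 10])

# Hypothetical experimental and control groups
t_unpaired, df_unpaired = unpaired_t([5, 7, 9], [1, 3, 5])

print(round(t_paired, 3), df_paired)
print(round(t_unpaired, 3), df_unpaired)
```

The calculated t is then compared with the tabulated t value for the same degrees of freedom to read off the probability.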

THE CHI-SQUARE TEST FOR QUALITATIVE DATA (χ² TEST)
Developed by Karl Pearson. The chi-square (χ²) test offers an alternative method of testing the significance of the difference between two proportions. It has the advantage that it can also be used when more than two groups are to be compared. It is most commonly used when data are in frequencies, such as the number of responses in two or more categories.

Important applications in medical statistics as a test of:
1. Proportion
2. Association
3. Goodness of fit
Test of Proportions
An alternative test to find the significance of the difference between two or more proportions.

Test of Association
The test of association between two events in binomial or multinomial samples is the most important application of the test in statistical methods. It measures the probability of association between two discrete attributes. Two events can often be studied for their association, such as smoking and cancer, treatment and outcome of a disease, vaccination and immunity, nutrition and intelligence, etc.

Test of Goodness of Fit
The chi-square (χ²) test is also applied as a test of "goodness of fit", to determine whether the actual numbers are similar to the expected or theoretical numbers (goodness of fit to a theory).
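
As an illustration of the test of association, the statistic can be computed from a table of observed frequencies by comparing each cell with its expected frequency (row total × column total / grand total). The 2 × 2 table below is hypothetical, and the function name is made up for this sketch.

```python
def chi_square(table):
    """Chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Hypothetical counts: rows = two groups, columns = outcome present / absent
observed = [[20, 30],
            [30, 20]]

chi2, df = chi_square(observed)
print(chi2, df)
```

With 1 degree of freedom the tabulated value at the 5% level is 3.84, so a calculated value above that would be taken as evidence of association.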

Analysis of Variance (ANOVA) Test
Not confined to comparing two sample means; it can compare more than two samples drawn from corresponding normal populations. E.g. experimental situations where several different treatments (various therapeutic approaches to a specific problem, or various dose levels of a particular drug) are under comparison. It is the best way to test the equality of three or more means.

Requirements:
Data for each group are assumed to be independent and normally distributed.
Sampling should be at random.
One-way ANOVA: Where only one factor affects the result between the groups.
Two-way ANOVA: Where 2 factors affect the result or outcome.
Multi-way ANOVA: Where three or more factors affect the result or outcome between groups.
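
For the one-way case, the test statistic F is the ratio of the variation between group means to the variation within groups. The following is a minimal sketch using the textbook sums of squares; the three groups of readings are hypothetical.

```python
def one_way_anova(groups):
    """F statistic with between-group and within-group degrees of freedom."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)

    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical readings under three different treatments
f_value, df_b, df_w = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(round(f_value, 2), df_b, df_w)
```

The calculated F is compared with the tabulated F for the two degrees of freedom; a large F suggests that at least one group mean differs from the others.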

CORRELATION AND REGRESSION
Correlation: When dealing with measurements of 2 variables in the same person, one variable may be related to the other in some way (i.e. a change in one variable may result in a change in the value of the other variable). Correlation is the relationship between two sets of variables. The correlation coefficient is the magnitude or degree of relationship between 2 variables (it varies from -1 to +1).

Obtained by plotting a scatter diagram (i.e. one variable on the x-axis and the other on the y-axis).
Perfect Positive Correlation: The two variables, denoted by the letters X and Y, are directly proportional and fully correlated with each other. The correlation coefficient (r) = +1, i.e. both variables rise or fall in the same proportion.
Perfect Negative Correlation: The values are inversely proportional to each other, i.e. when one rises, the other falls in the same proportion, and the correlation coefficient (r) = -1.
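
The two extreme cases can be reproduced with a short calculation of the correlation coefficient. This sketch assumes Pearson's product-moment formula, which the slides do not name explicitly; the data values are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4]
r_positive = pearson_r(x, [2, 4, 6, 8])   # Y rises in proportion to X
r_negative = pearson_r(x, [8, 6, 4, 2])   # Y falls in proportion as X rises

print(r_positive, r_negative)
```

Real data rarely give exactly +1 or -1; values in between indicate partial correlation, and a value near 0 indicates no linear relationship.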

TYPES OF CORRELATION

Regression
To know, in an individual case, the value of one variable from the value of the other, we calculate what is known as the regression coefficient of one measurement on the other. It is customary to denote the independent variate by x and the dependent variate by y. In the regression equation y = a + bx, the value of b is called the regression coefficient of y upon x. Similarly, we can obtain the regression of x upon y.
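
A minimal sketch of how the regression coefficient of y upon x might be calculated, assuming the least-squares line y = a + bx; the function name and the four data points are hypothetical.

```python
def regression_y_on_x(x, y):
    """Least-squares intercept a and regression coefficient b for y = a + b*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx               # regression coefficient of y upon x
    a = mean_y - b * mean_x     # intercept
    return a, b

# Hypothetical paired measurements
a, b = regression_y_on_x([1, 2, 3, 4], [3, 5, 7, 9])

# Predicting y for an individual whose x value is 5
predicted_y = a + b * 5
print(a, b, predicted_y)
```

Swapping the two lists in the call gives the regression of x upon y, which in general is a different line.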


THANK YOU