BIOSTATISTICS.pptx

GaneshPavanKumarKarr, 88 slides, Oct 15, 2025

Good morning

BIOSTATISTICS. Presented by R. Priya Darshini, 1st Year M.D.S., Department of Prosthodontics

CONTENTS: Introduction, Terminology, Data, Source of data, Collection of data, Sampling and sampling design, Sample size, Errors in sampling, Presentation of data, Measures of central tendency

CONTENTS: Measures of dispersion, Normal distribution curve, Probability or p value, Tests of significance, Parametric tests, Non-parametric tests, One-tailed tests, Two-tailed tests, Conclusion, References

INTRODUCTION: Biostatistics is the branch of statistics applied to the biological and medical sciences. The word statistics comes from the Italian word statista (statesman) and the German word Statistik (political state). The science of statistics originated from government records and mathematics. The father of health statistics is John Graunt (1620–1674).

Biostatistics: "Bio" refers to biology, and "statistics" involves the accumulation, tracking, analysis and application of data. Biostatistics is the method of collecting, organizing, analyzing, tabulating and interpreting data related to living organisms and human beings.

TERMINOLOGY: Constant: a quantity that does not vary. For example, in biostatistics the mean and standard deviation are considered constant for a population. Variable: a name denoting a condition, occurrence or effect that can assume different values.

TERMINOLOGY: Population: all persons, events and objects under study. It may be finite or infinite. Sample: a part of a population, generally selected so as to be representative of the population whose variables are under study. The individual entities that form the focus of the study are termed sampling units, and the list of sampling units is known as the sampling list or sampling frame.

TERMINOLOGY: Parameter: a constant that describes a population. Examples are that 80% of the students in a college are girls, or the average age of dental patients in 2010. Because this describes the population, it is a parameter. Statistic: a constant that describes a sample. For example, if 80 out of 100 students sampled from the same college are girls, this 80% is a statistic because it describes the sample.

TERMINOLOGY: Attribute: a characteristic on the basis of which the population can be divided into categories or classes, e.g. gender, caste, religion.

DATA: A collective recording of observations, numerical or otherwise, is called data. Observations can be collected by recording, for example, the number of patients who have a removable prosthesis, or the age or gender of each patient. In each case an observation is made of a characteristic that varies from person to person. This characteristic is called a variable.

TYPES OF DATA: Data are of two types: qualitative data and quantitative data.

QUALITATIVE DATA: Data collected on the basis of an attribute or quality, such as sex, occlusion or the presence of cavities. In such data there is no notion of magnitude or size, because the attribute cannot be measured. What varies and can be counted is the number of persons having the same attribute. For example, out of 100 people, 75 have Class I occlusion, 15 have Class II occlusion and 10 have Class III occlusion. Classes I, II and III are attributes that cannot be expressed in figures; only the number of people having each one can be determined.

QUANTITATIVE DATA: Data collected through measurement, e.g. arch length and width measured with calipers, or fluoride concentration in the water supply. Here the attribute has a magnitude, and both the attribute and the number of persons having it vary. Quantitative data may be continuous or discrete. In continuous data, the variable can take any value in a given range, including decimals or fractions. In discrete data, the variable takes only fixed values such as whole numbers (e.g. DMF teeth). Example: freeway space varies from patient to patient. It is a measurable quantity with a different value for each individual. It is continuous because it can take any value between 2 and 4 mm, such as 2.10, 2.55 or 3.07 mm.

Source of data: The main sources for the collection of data are experiments, surveys and records.

SOURCE OF DATA: Experiments: performed by one or more workers to collect data for investigations and research. Records: maintained routinely in registers and books over a long period of time. They provide ready-made data.

Surveys: Carried out in the field by trained teams for epidemiological studies, to find the incidence or prevalence of health or disease in a community. These epidemiological studies can be either descriptive or analytical.

Descriptive epidemiological study: Descriptive epidemiological studies use cross-sectional or longitudinal designs to measure a disease in terms of magnitude. The incidence of a disease is obtained from a longitudinal study, and the prevalence of a disease is obtained from a cross-sectional study.

Cross-sectional study: Measurements of exposure and effect are made at the same time. This gives the relationship between a disease and other variables of interest as they exist at one point in time. For example, in a cross-sectional study of oral cancer, data on age, sex, occupation, habits and tobacco use can also be collected during the same survey.

Longitudinal study: A study conducted over a long period of time. It is done on samples drawn from the population, with observations made at periodic intervals. Longitudinal studies are useful for studying the natural history of a disease and its outcome, for identifying risk factors associated with the disease, and for calculating the incidence rate of the disease.

Analytical epidemiological study: The cause of a disease may be an event, a condition, a characteristic or a combination of these factors, and it plays an important role in producing the disease. Before a factor is accepted as a cause, several observations have to be made on the so-called "exposure". These observations make up the procedure of analytical epidemiology.

Cohort study: Beginning with the exposure and searching for its effects prospectively in time is referred to as a cohort study. It is an observational study that attempts to study the relationship between a purported cause and the subsequent risk of developing disease. A cohort study has two distinguishing features. First, the group of persons to be studied is defined in terms of characteristics manifest before the appearance of the disease under investigation. Second, the study groups are observed over a period of time to determine the frequency of disease among them.

Case–control study: Beginning with the disease and searching for causes in the past is referred to as a case–control study. It is done alongside another group of individuals who have not developed the condition, called the controls. Case–control studies are primarily used to assess risks and to study the causes of diseases.

Data can be collected through primary sources or secondary sources.

SAMPLING: A sample may be selected as a purposive sample or a random sample. According to R. A. Fisher, the advantages of sampling are adaptability, speed, economy and an enhanced scientific approach.

SAMPLING DESIGN: Simple random sampling, systematic random sampling, stratified random sampling, cluster sampling, multiphase sampling and pathfinder surveys.

SIMPLE RANDOM SAMPLING: Each and every unit in the population has an equal chance of being included in the sample, and the selection of units is determined by chance. To ensure randomness, either the lottery method or a table of random numbers can be used.
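In software, the lottery method is usually mimicked with a pseudo-random number generator. The following is a minimal Python sketch. It assumes a hypothetical sampling frame of 500 patient IDs and uses the standard library's `random.sample`, which draws without replacement so that every unit has the same chance of selection.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n units without replacement; every unit has an equal chance."""
    rng = random.Random(seed)  # seeded generator so the draw can be reproduced
    return rng.sample(frame, n)

# Hypothetical sampling frame: patient IDs 1..500
frame = list(range(1, 501))
sample = simple_random_sample(frame, 20, seed=42)
print(sorted(sample))
```

Fixing the seed is only for reproducibility. Omit it for a fresh draw each time.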

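SYSTEMATIC RANDOM SAMPLING: One unit is selected at random from the first k units, where the sampling interval k is the population size divided by the sample size. Additional units are then selected at evenly spaced intervals until a sample of the required size has been formed. This method is used when a complete list of the population is available.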
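Below is a minimal sketch of systematic selection in Python. It assumes the frame is an ordered list and takes k as the integer part of N / n. The frame of 500 units is hypothetical.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Random start within the first k units, then every k-th unit after it."""
    if not 0 < n <= len(frame):
        raise ValueError("n must be between 1 and the size of the frame")
    k = len(frame) // n              # sampling interval
    rng = random.Random(seed)
    start = rng.randrange(k)         # random start: index 0 .. k-1
    return [frame[start + i * k] for i in range(n)]

frame = list(range(1, 501))          # hypothetical ordered list of 500 units
sample = systematic_sample(frame, 25, seed=1)
print(sample[:5])
```

With N = 500 and n = 25 the interval is k = 20, so consecutive selected units are always 20 apart.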

STRATIFIED RANDOM SAMPLING: The population to be sampled is subdivided into groups known as strata, such that each group is homogeneous with respect to the characteristic of interest. A simple random sample is then chosen from each stratum. This type of sampling is used when the population is heterogeneous with regard to the characteristic under study. For example, to know the prevalence of DMF teeth in different age groups, the age groups form the strata, and a random sample is chosen from each stratum (i.e. each age group).
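This Python sketch takes a simple random sample within each stratum. It assumes equal allocation, meaning the same number of units is drawn per stratum. Proportional allocation is a common alternative. The age groups and ID ranges are hypothetical.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Draw a simple random sample of fixed size from every stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(units, n_per_stratum) for name, units in strata.items()}

# Hypothetical strata: patient IDs grouped by age group
strata = {
    "5-14": list(range(1, 201)),
    "15-34": list(range(201, 451)),
    "35-54": list(range(451, 601)),
}
sample = stratified_sample(strata, 10, seed=7)
for group, ids in sample.items():
    print(group, sorted(ids))
```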

CLUSTER SAMPLING: This method is used when the population forms natural groups or clusters, such as villages, wards or schools. First a sample of clusters is selected, and then all units in each selected cluster are surveyed. This method is simpler and involves less time and cost, but it gives a higher standard error.
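The key difference from stratified sampling is that whole clusters are chosen at random and every unit inside them is surveyed. A minimal Python sketch, assuming 12 hypothetical schools with 30 children each:

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select whole clusters, then include every unit inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # names of selected clusters
    return {name: clusters[name] for name in chosen}

# Hypothetical clusters: schools and their enrolled children
clusters = {f"school_{i}": [f"s{i}_child{j}" for j in range(30)] for i in range(12)}
surveyed = cluster_sample(clusters, 3, seed=3)
print(list(surveyed), sum(len(v) for v in surveyed.values()))
```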

MULTIPHASE SAMPLING: Part of the information is collected from the whole sample and part from a sub-sample. For example, all children in a school are surveyed in the first phase. In the second phase, only those with oral health problems are selected, and in the third phase only those who need treatment are selected. Thus, by the third and fourth phases, the sub-samples become smaller and smaller.

PATHFINDER SURVEY: Pathfinder surveys can be either pilot or national, depending on the number and type of sampling sites and the age groups included. A national pathfinder survey incorporates enough examination sites to cover all important subgroups of the population that may have differing disease levels or treatment needs. It also covers at least three of the age groups or index ages. This type of survey design is suitable for collecting data for planning and monitoring services in all countries, whatever the level of disease, availability of resources or complexity of services.

In a large country with many geographic and population subdivisions and a complex service structure, a larger number of sampling sites is needed. The basic principle of using index ages and standard samples at each site within a stratified approach, however, remains valid.

SAMPLE SIZE: The bigger the sample, the higher the precision of the estimates obtained from it. For example, if a field survey is conducted to estimate the prevalence rate of a disease, the sample size is calculated by the formula $n = \frac{z_\alpha^2 \, p\,(1-p)}{L^2}$. Here n is the sample size, p is the approximate prevalence rate of the disease, L is the permissible error in the estimation of p, and $z_\alpha$ is the standard normal value for the chosen probability level (1.96 for 95%).
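The formula can be evaluated directly. Below is a minimal Python sketch that assumes p and L are both proportions, with L an absolute error in the same units as p, and that rounds up to a whole subject. The prevalence of 30% and error of 5 percentage points are hypothetical inputs.

```python
import math

def sample_size_prevalence(p, L, z=1.96):
    """n = z^2 * p * (1 - p) / L^2, rounded up to the next whole subject."""
    return math.ceil(z ** 2 * p * (1 - p) / L ** 2)

# Hypothetical: expected prevalence 30%, permissible error 5 percentage points
print(sample_size_prevalence(0.30, 0.05))  # 323
```

Doubling the permissible error to 10 percentage points cuts the required sample to roughly a quarter (81). This illustrates why precision is expensive.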

SAMPLING ERRORS: There are two types of errors. Sampling error is due to the sampling process and can arise because of faulty sampling design or a small sample size. Non-sampling errors are of three kinds. Coverage error is due to non-response or non-cooperation of the informant. Observational error is due to interviewer bias, imperfect experimental technique, or an interaction of both. Processing error is due to errors in statistical analysis.

Data presentation: Statistical data, once collected, should be systematically arranged and presented. This serves to arouse the interest of readers, to reduce the data, to bring out important points clearly and strikingly, to allow an easy grasp and meaningful conclusions, to facilitate further analysis and to facilitate communication.

The two main types of data presentation are tabulation and graphic representation with charts and diagrams.

TABULATION: The most common method, in which data are presented in columns and rows. Tables can be of the following types: simple tables and frequency distribution tables.

SIMPLE TABLE
Month       No. of patients at VDC, BVRM
January     2800
February    3000
March       2500

Frequency distribution table: In a frequency distribution table, the data are first split into convenient groups (class intervals), and the number of items (frequency) that occur in each group is shown in the adjacent column.
No. of cavities    No. of patients
0-3                100
3-6                67
6-9                32
9 & above          20
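The class intervals above share their boundaries (3 appears in both 0-3 and 3-6), so any count needs a rule for values that fall on a boundary. This Python sketch assumes the common convention that the lower limit is inclusive and the upper limit exclusive. It tallies a hypothetical list of cavity counts into the same four classes.

```python
from collections import Counter

def class_label(x):
    """Assign a count to a class interval (lower limit inclusive, upper exclusive)."""
    if x < 3:
        return "0-3"
    if x < 6:
        return "3-6"
    if x < 9:
        return "6-9"
    return "9 & above"

# Hypothetical cavity counts for 12 patients
cavities = [0, 2, 5, 3, 7, 1, 12, 4, 9, 2, 6, 8]
freq = Counter(class_label(x) for x in cavities)
for label in ["0-3", "3-6", "6-9", "9 & above"]:
    print(label, freq[label])
```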

GRAPHIC REPRESENTATION: Charts and diagrams are a useful method of presenting statistical data and have a powerful impact on the imagination of people.

TYPES OF CHARTS AND DIAGRAMS: Bar chart, histogram, frequency polygon, line diagram, pie diagram, and spot map (map diagram or cartogram).

BAR CHART: The length of the bars, drawn vertically or horizontally, is proportional to the frequency of the variable. Bar charts are used to represent qualitative data. A suitable scale is chosen, and the bars are usually equally spaced. There are three types: simple bar, multiple bar and component bar charts.

SIMPLE BAR CHART: Represents only one variable.
[Figure: simple bar chart, "No. of CD patients"]

MULTIPLE BAR CHART: Two or more variables are grouped together.

COMPONENT BAR CHART: Each bar is divided into two or more parts. Each part represents a certain item and is proportional to the magnitude of that item.

HISTOGRAM: A pictorial presentation of a frequency distribution, used to depict quantitative data of the continuous type. It consists of a series of rectangles, with the class intervals on the horizontal axis and the frequencies on the vertical axis. The area of each rectangle is proportional to the frequency.

FREQUENCY POLYGON: Represents the frequency distribution of quantitative data. It is obtained by joining, with straight lines, the midpoints of the tops of the histogram blocks (at the height of each frequency), which usually forms a polygon.

LINE DIAGRAM: Line diagrams are used to show the trend of events with the passage of time.

PIE CHART: The frequencies of the groups are shown as segments of a circle. Pie charts are used to represent qualitative data. The angle of each segment denotes the frequency and is calculated as: angle = (no. of observations in the specific group × 360°) / total observations in all groups.
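The angle formula can be checked on the occlusion example from the qualitative-data slide (75, 15 and 10 out of 100 people). A short Python sketch follows. The function name is illustrative.

```python
def pie_angles(counts):
    """Angle in degrees = observations in group * 360 / total observations."""
    total = sum(counts.values())
    return {group: n * 360 / total for group, n in counts.items()}

occlusion = {"Class I": 75, "Class II": 15, "Class III": 10}
print(pie_angles(occlusion))  # {'Class I': 270.0, 'Class II': 54.0, 'Class III': 36.0}
```

The angles always add up to 360°, which is a quick check on the arithmetic.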

CARTOGRAMS (spot map or map diagram): These maps are prepared to show the geographic distribution of frequencies of a characteristic.

MEASURES OF STATISTICAL AVERAGES OR CENTRAL TENDENCY: The average value in a distribution is the one central value around which all the other observations are concentrated. The average value helps to find the most characteristic value of a set of measurements. It also shows which group is better off when the average of one group is compared with that of the other. The most commonly used averages are the mean, median and mode.

Objectives of central tendency: to condense the entire mass of data and to facilitate comparison. A good measure of central tendency should meet five requirements. It should be easy to understand and compute, be based on each and every item in the series, not be affected by extreme values, be capable of further statistical computation, and have sampling stability.

MEAN: Refers to the arithmetic mean, which is the sum of all the observations divided by the total number of observations (n). It is denoted by $\bar{x}$ for a sample and µ for a population: $\bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n}$. Advantage: it is easy to calculate. Disadvantage: it is influenced by extreme values.
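The following Python sketch computes the arithmetic mean and shows its main disadvantage. A single extreme value pulls it noticeably. The decayed-teeth counts are hypothetical.

```python
def arithmetic_mean(values):
    """Sum of all observations divided by the number of observations."""
    if not values:
        raise ValueError("need at least one observation")
    return sum(values) / len(values)

teeth = [1, 3, 2, 0, 4, 2]            # hypothetical decayed-teeth counts
print(arithmetic_mean(teeth))         # 2.0
print(arithmetic_mean(teeth + [20]))  # one extreme value pulls the mean above 4.5
```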

MEDIAN: When all the observations are arranged in ascending or descending order, the middle observation is known as the median. With an even number of observations, the average of the two middle values is taken. The median is a better indicator of the central value when the data contain extreme values, because it is not affected by them.
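Below is a minimal Python sketch of the median rule, covering both the odd and even cases, using hypothetical decayed-teeth counts. Python's standard `statistics.median` applies the same rule.

```python
def median(values):
    """Middle value of the sorted data; mean of the two middle values if n is even."""
    if not values:
        raise ValueError("need at least one observation")
    s = sorted(values)
    mid = len(s) // 2
    if len(s) % 2:                      # odd n: a single middle observation
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2    # even n: average the two middle values

print(median([1, 3, 2, 0, 4, 2]))       # even n
print(median([1, 3, 2, 0, 4, 2, 20]))   # odd n; the extreme value 20 barely matters
```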

MODE: The most frequently occurring observation in a set of data is called the mode. It is not often used in medical statistics. Example: number of decayed teeth in 10 children: 2, 2, 4, 1, 3, 0, 10, 2, 3, …; mode = 2 (occurs 3 times).
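Python's standard library (3.8 and later) provides `statistics.multimode`, which returns every value tied for the highest count. This sketch applies it to the nine values listed on the slide. The tenth value is not shown, so it is left out here.

```python
from statistics import multimode

# The nine decayed-teeth values listed on the slide (the tenth is not shown)
decayed = [2, 2, 4, 1, 3, 0, 10, 2, 3]
modes = multimode(decayed)   # every value tied for the highest frequency
print(modes, decayed.count(modes[0]))  # [2] 3
```

Using `multimode` rather than a single-value function makes ties (bimodal data) visible instead of hiding them.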

VARIABILITY: There are three types of variability: biological variability, real variability and experimental variability.

BIOLOGICAL VARIABILITY: The natural difference that occurs between individuals due to age, gender and other inherent attributes. This difference is small, occurs by chance and lies within certain accepted biological limits. For example, vertical dimension may vary from patient to patient.

REAL VARIABILITY: Variability greater than the normal biological limits. The cause of the difference is not inherent or natural but is due to some external factor. For example, the difference in the incidence of cancer between smokers and non-smokers may be due to excessive smoking and not to chance alone.

EXPERIMENTAL VARIABILITY: Occurs due to the experimental study itself. It is of three types. Observer error: the investigator may alter some information or not record a measurement correctly. Instrumental error: due to defects in the measuring instrument. Observer and instrumental errors together are called non-sampling errors. Sampling error (error of bias): occurs when the samples are not chosen at random from the population, so the sample does not truly represent the population.

Measures of variation or dispersion: Biological data collected by measurement show variation. Dispersion is the degree of spread or variation of a variable about a central value. For example, an individual's BP can vary even when it is taken by a standardized method and measured by the same person. Thus one should know what the normal variation is and how to measure it.

Measures of dispersion are mainly used to determine the reliability of an average, to serve as a basis for controlling variability, to compare two or more series in relation to their variability, and to facilitate further statistical analysis.

The various measures of variation or dispersion are the range, the mean (or average) deviation, the standard deviation and the coefficient of variation.

RANGE: The simplest measure, defined as the difference between the highest and the lowest figures in a sample. It defines the normal limits of a biological characteristic; for example, freeway space ranges between 2 and 4 mm. It is not satisfactory because it is based on only the two extreme values.

MEAN DEVIATION: The average of the differences or deviations from the mean in a distribution, ignoring the + or – sign. It is denoted by MD: $MD = \frac{\sum |X - \bar{x}|}{n}$, where X = observation, $\bar{x}$ = mean and n = number of observations.
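The sign is ignored because the signed deviations from the mean always sum to zero. This Python sketch uses a small hypothetical data set whose mean is 5.

```python
def mean_deviation(values):
    """Average absolute deviation from the arithmetic mean: sum(|X - mean|) / n."""
    n = len(values)
    m = sum(values) / n
    return sum(abs(x - m) for x in values) / n

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical observations, mean = 5
print(mean_deviation(data))       # 1.5
```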

STANDARD DEVIATION: Also called the root mean square deviation. It is an improvement over the mean deviation and is the measure most commonly used in statistical analysis. It is denoted by SD or s for a sample and σ for a population: $SD = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$, with n − 1 used in place of n when the SD is calculated from a sample. The greater the standard deviation, the greater the dispersion from the mean. A small standard deviation means a high degree of uniformity in the observations. Usually, measurements beyond the range of mean ± 2 SD are considered rare or unusual in any distribution.
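Here is a minimal Python sketch of both divisors. It uses n for a population and n − 1 for a sample, applied to a hypothetical data set. The standard library's `statistics.pstdev` and `statistics.stdev` give the same two results.

```python
import math

def standard_deviation(values, sample=True):
    """Root mean square deviation from the mean; divisor n - 1 for a sample, n for a population."""
    n = len(values)
    m = sum(values) / n
    ss = sum((x - m) ** 2 for x in values)   # sum of squared deviations
    return math.sqrt(ss / (n - 1 if sample else n))

data = [2, 4, 4, 4, 5, 5, 7, 9]               # hypothetical observations
print(standard_deviation(data, sample=False)) # population SD: 2.0
print(standard_deviation(data))               # sample SD, slightly larger
```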

Uses of standard deviation: It summarizes the deviation of a large distribution from its mean. It helps in finding a suitable sample size; for example, a greater deviation indicates the need for a larger sample to draw meaningful conclusions. It also helps in calculating the standard error, which is used to determine whether the difference between two samples is due to chance or is real.

COEFFICIENT OF VARIATION: Used to compare attributes that have different units of measurement, e.g. height and weight. It is denoted by CV: $CV = \frac{SD \times 100}{\text{Mean}}$, expressed as a percentage. The higher the CV, the greater the variation in the series of data.
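Because the CV is unit-free, it allows heights in cm to be compared with weights in kg. A short Python sketch with hypothetical means and SDs:

```python
def coefficient_of_variation(sd, mean):
    """CV (%) = SD * 100 / mean; unit-free, so different measurements can be compared."""
    return sd * 100 / mean

# Hypothetical: height mean 160 cm (SD 8 cm) vs weight mean 60 kg (SD 9 kg)
cv_height = coefficient_of_variation(8, 160)
cv_weight = coefficient_of_variation(9, 60)
print(cv_height, cv_weight)   # 5.0 15.0
```

In this made-up example, weight is three times as variable as height, although the raw SDs (8 and 9) look similar.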

Normal distribution or normal curve: So much physiological variation occurs in any observation that it becomes necessary to define normal limits. It is also necessary to determine the chances of an observation being normal and the proportion of observations that lie within a given range. The normal distribution, or normal curve, is used most commonly in statistics and helps us find these. A large number of observations with a narrow class interval gives a frequency curve called the normal curve.

The normal curve has the following characteristics. It is bell-shaped and bilaterally symmetrical. The frequency increases from one side, reaches its highest point and decreases exactly the way it increased. The highest point denotes the mean, median and mode, which coincide. The maximum number of observations is at the value of the variable corresponding to the mean. The number of observations gradually decreases on either side, with few observations at the extreme points.

The area under the curve between any two points corresponds to the number of observations between those two values of the variate. It can be found in terms of the mean and standard deviation. Mean ± 1 SD includes 68.27% of all observations; such observations are fairly common. Mean ± 2 SD includes 95.45% of all observations. By convention, values beyond this range are uncommon or rare, and their chance of being normal is only 100 − 95.45 = 4.55%. Mean ± 3 SD includes 99.73% of observations. Values beyond this range are very rare, and their chance of being normal is only 0.27%. This relationship is used for fixing confidence intervals.
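These percentages follow from the standard normal curve. The proportion within mean ± k SD equals erf(k/√2), which Python's `math.erf` computes directly. A short sketch that reproduces the three figures:

```python
import math

def proportion_within(k):
    """Share of a normal distribution lying within mean ± k SD: erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"mean ± {k} SD: {100 * proportion_within(k):.2f}%")
```

This prints 68.27%, 95.45% and 99.73%, matching the values quoted above.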

These limits on either side of a measurement are called confidence limits. The shape of the frequency distribution curve may vary depending on the mean and SD, so it becomes necessary to standardize it. For example, if one study has an SD of 3 and another an SD of 2, it is difficult to compare them. The normal curve is therefore standardized by using the standard deviation as the unit for placing any measurement with reference to the mean. The curve that emerges from this procedure is called the standard normal curve.

Relative or standard normal deviate: When a variable X follows a normal distribution with mean $\bar{x}$ and standard deviation S, the relative or standard normal deviate Z is given by $Z = \frac{X - \bar{x}}{S}$, i.e. Z = (observation − mean) / SD. The values of Z for the various values of X form a normal distribution with mean 0 and SD 1.
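The following Python sketch computes Z for a single observation and then standardizes a whole data set. The standardized values have mean 0 and SD 1. The freeway-space figures and the data set are hypothetical.

```python
def z_score(x, mean, sd):
    """Standard normal deviate: Z = (observation - mean) / SD."""
    return (x - mean) / sd

# Hypothetical: freeway space 3.5 mm in a group with mean 3.0 mm, SD 0.5 mm
print(z_score(3.5, 3.0, 0.5))        # 1.0, i.e. one SD above the mean

data = [2, 4, 4, 4, 5, 5, 7, 9]      # hypothetical data: mean 5, population SD 2
z = [z_score(x, 5, 2) for x in data]
print(z)                             # standardized values: mean 0, SD 1
```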

Probability or p value: Probability is the chance of occurrence of any event or permutation and combination. It is denoted by p for a sample and P for a population. In tests of significance we often want to know whether the observed difference between two samples is real or is due to chance (sampling variation). The probability or p value is used for this purpose.

Probability P ranges from 0 to 1. P = 0 means there is no chance that the observed difference could be due to sampling variation. P = 1 means it is absolutely certain that the observed difference between two samples is due to sampling variation. However, such extreme values are rare. P = 0.4 means the chance that the difference is due to sampling variation is 4 in 10, and the chance that it is not due to sampling variation is 6 in 10.

Probability: The essence of any test of significance is to find the p value and draw an inference. If the p value is 0.05 or more, it is customary to accept that the difference is due to chance (sampling variation), and the observed difference is said to be statistically not significant. If the p value is less than 0.05, the observed difference is taken to be due to some external factor rather than chance, and it is said to be statistically significant.

The p value can be found in three ways. From the shape of the normal curve: about 95% of observations lie within mean ± 2 SD, so the probability of a value beyond this range is about 5%. From probability tables: the p value can be read from probability tables, e.g. for Student's t test or the chi-square test. From the area under the normal curve: the standard normal deviate (z) is calculated, and the area under the curve between 0 and z is read from the table (A). The probability is then given by P = 2(0.5 − A).
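The table lookup for A can be replaced by a calculation. The area between 0 and z under the standard normal curve is 0.5·erf(|z|/√2). Below is a minimal Python sketch of P = 2(0.5 − A). The function names are illustrative.

```python
import math

def area_zero_to_z(z):
    """Area under the standard normal curve between 0 and |z| (the table value A)."""
    return 0.5 * math.erf(abs(z) / math.sqrt(2))

def p_value_from_z(z):
    """Two-sided probability P = 2 * (0.5 - A)."""
    return 2 * (0.5 - area_zero_to_z(z))

print(round(p_value_from_z(1.96), 3))  # close to 0.05
print(round(p_value_from_z(2.58), 3))  # close to 0.01
```

The familiar cut-offs z = 1.96 and z = 2.58 correspond to P ≈ 0.05 and P ≈ 0.01.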

References: Soben Peter. Essentials of Preventive and Community Dentistry, 2nd edition. G. N. Prabhakara. Biostatistics. T. Bhaskara Rao. Methods of Biostatistics.

Tests of significance: Classified as parametric tests and non-parametric tests. They can also be divided into one-tailed and two-tailed tests.

Parametric tests: Parametric tests are those in which certain assumptions are made about the population. First, the population from which the sample is drawn has a normal distribution. Second, the variances of the samples do not differ significantly. Third, the observations are truly numerical, so arithmetic procedures such as addition, division and multiplication can be used. Since these tests make assumptions about population parameters, they are called parametric tests. They are usually used to test differences. They include Student's t test (paired or unpaired), ANOVA, and the test of significance between two means.

Non-parametric tests: In many biological investigations, the research worker may not know the nature of the distribution or other required values of the population. Also, some biological measurements may not be true numerical values, so arithmetic procedures are not possible. In such cases, distribution-free or non-parametric tests are used, in which no assumptions are made about the population parameters. Examples are the Mann–Whitney test, chi-square test, phi coefficient test, Fisher's exact test, sign test and Friedman's test.

Two-tailed test: This test determines whether there is a difference between two groups without specifying whether the difference is higher or lower. It includes both ends, or tails, of the normal distribution, hence the name two-tailed test. A two-tailed test is used when the objective is to conclude whether two samples come from the same population, without considering the direction of the difference between the means. For example, one may want to know whether the mean IQ of malnourished children is different from that of well-nourished children, without specifying whether it is higher or lower.

One-tailed test: In some tests of significance, one specifically wants to know whether the difference between two groups is higher or lower, i.e. the direction (plus or minus side) is specified. In that case one tail of the distribution is excluded. A one-tailed test is used when the objective is to conclude whether or not the mean of one sample is larger than the other. For example, to test whether malnourished children have a lower mean IQ than well-nourished children, the higher side of the distribution is excluded. Such a test of significance is called a one-tailed test.
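The practical difference between the two tests shows up in the p value. A two-tailed test counts both tails, while a one-tailed test counts only the tail in the stated direction, so it gives half the p value when the difference lies in that direction. This Python sketch assumes a standard normal (z) test. The value z = −1.80 is hypothetical, standing for malnourished children scoring below well-nourished children.

```python
import math

def upper_tail(z):
    """P(Z > z) for a standard normal deviate."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def p_value(z, tails=2, direction="greater"):
    """Two-tailed: both tails count. One-tailed: only the tail in the stated direction."""
    if tails == 2:
        return 2 * upper_tail(abs(z))
    if direction == "greater":
        return upper_tail(z)
    if direction == "less":
        return upper_tail(-z)
    raise ValueError("direction must be 'greater' or 'less'")

z = -1.80  # hypothetical z for "malnourished mean IQ is lower"
print(round(p_value(z, tails=2), 3))                    # two-tailed, about 0.072
print(round(p_value(z, tails=1, direction="less"), 3))  # one-tailed, about 0.036
```

At the 0.05 level, the same data are not significant with a two-tailed test but are significant with a one-tailed test. For this reason, the direction must be decided before the data are examined.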