Good morning
BIOSTATISTICS PRESENTED BY R. PRIYA DARSHINI, 1st YEAR M.D.S., DEPARTMENT OF PROSTHODONTICS
CONTENTS: Probability or p value; Tests of significance; Parametric tests; Non-parametric tests; Conclusion; References
Probability or p value Probability is the relative frequency or chance of occurrence of an event. It is denoted by p for a sample and P for a population. In various tests of significance we often want to know whether the observed difference between 2 samples is real or merely due to chance (sampling variation). Here the probability or p value is used.
Probability p ranges from 0 to 1. p = 0 means there is no chance that the observed difference is due to sampling variation; p = 1 means it is absolutely certain that the observed difference between the 2 samples is due to sampling variation. Such extreme values are rare, however. If p = 0.4, the chance that the difference is due to sampling variation is 4 in 10, and the chance that it is not due to sampling variation is 6 in 10; the latter is denoted by q.
Probability The essence of any test of significance is to find the p value and draw an inference. If the p value is 0.05 or more, it is customary to accept that the difference is due to chance (sampling variation); the observed difference is then said to be statistically not significant. If the p value is less than 0.05, the observed difference is taken to be not due to chance but due to the role of some external factor; the observed difference is then said to be statistically significant.
Probability can be determined in three ways. From the shape of the normal curve: 95% of observations lie within mean ± 2SD, so the probability of a value outside this range is 5%. From probability tables: the p value is read from tables in the case of the Student t test or the chi-square test. From the area under the normal curve: the standard normal deviate z is calculated, the area A under the curve between the mean and z is read off, and the two-tailed probability is given by p = 2(0.5 − A). A short sketch of this last method is given below.
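As an illustration, here is a minimal Python sketch of the area-under-the-curve method, assuming only the standard library; the function name two_tailed_p and the example value z = 1.96 are my own.

```python
import math

# Phi(z) = 0.5 * (1 + erf(z / sqrt(2))) is the cumulative normal area up to z,
# so A = 0.5 * erf(|z| / sqrt(2)) is the area between the mean and z.
def two_tailed_p(z):
    """Two-tailed p value for a standard normal deviate z: p = 2(0.5 - A)."""
    area_a = 0.5 * math.erf(abs(z) / math.sqrt(2))
    return 2 * (0.5 - area_a)

print(round(two_tailed_p(1.96), 3))  # ~0.05, matching the mean ± 2SD rule
```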
Stages in performing a test of significance: state the null hypothesis; state the alternative hypothesis; accept or reject the null hypothesis; finally, determine the p value.
State the null hypothesis The null hypothesis is a hypothesis of no difference between a statistic of a sample and the parameter of the population, or between the statistics of two samples. It nullifies the claim that the experimental result is different from, or better than, the one already observed.
One tailed test In a test of significance, when one specifically wants to know whether one group is higher or lower than the other, i.e. the direction (plus or minus side) is specified, one tail of the distribution is excluded. If the objective is to conclude whether the mean of one sample is larger than the other, a one tailed test is used. E.g., if one wants to know whether malnourished children have a lower mean IQ than well-nourished children, the higher side of the distribution is excluded. Such a test of significance is called a one tailed test.
Two tailed test This test determines whether there is a difference between the two groups without specifying whether the difference is higher or lower; it includes both ends or tails of the normal distribution. If the objective is to conclude whether 2 samples are from the same population or not, without considering the direction of the difference between means, a two tailed test is used. E.g., when one wants to know whether the mean IQ of malnourished children is different from that of well-nourished children, without specifying whether it is more or less.
Sampling error Sampling errors are due to chance and concern the incorrect rejection or acceptance of the null hypothesis. There are 2 types of sampling error: Type I or alpha error, and Type II or beta error.
Type I error Also called an "error of the first kind". Definition: the rejection of a null hypothesis that is actually true. Statistical testing is concerned predominantly with controlling this error. In our studies we use an alpha error of 0.05 (5%) as the cut-off for rejecting the null hypothesis. Multiple comparisons and repeated testing for significance increase the likelihood of a type I error.
Type II error Also called an "error of the second kind". Definition: the acceptance of a null hypothesis as true when it is actually false. The beta error determines the power of a study, which is equal to 1 − beta. The power is the probability that the study would reject a null hypothesis as false when it is actually false. As the difference between the false null hypothesis and the true alternative hypothesis increases, the probability of accepting the null hypothesis decreases and the power of the study increases.
State the alternative hypothesis If the difference between two samples is large enough (i.e., there is a significant difference) to disprove the null hypothesis, we accept the alternative hypothesis. It is the hypothesis stating that the sample result is different, i.e., larger or smaller than the value for the population, or that the statistic of one sample is different from that of the other.
Accept or reject the null hypothesis The null hypothesis is accepted or rejected depending on whether the result falls in the zone of acceptance or the zone of rejection. If the result of a sample falls within the area of mean ± 2SE, the null hypothesis is accepted; this area of the normal curve is called the zone of acceptance for the null hypothesis. If the result falls beyond the area of mean ± 2SE, the null hypothesis of no difference is rejected and the alternative hypothesis is accepted; this area of the normal curve is called the zone of rejection for the null hypothesis.
'p' value The p value is determined using any of the previously mentioned methods. If p > 0.05 the difference is due to chance and is not statistically significant, but if p < 0.05 the difference is due to some external factor and is statistically significant.
Tests of significance Classified as parametric tests and non-parametric tests.
Parametric tests Parametric tests are those in which certain assumptions are made about the population: the population from which the sample is drawn has a normal distribution; the variances of the samples do not differ significantly; and the observations are truly numerical, so arithmetic procedures such as addition, division, and multiplication can be used. Since these tests make assumptions about the population parameters, they are called parametric tests. They are usually used to test the difference between means or proportions.
Parametric tests Calculation of the standard error of the mean, SE(x̄). Standard error of the difference between 2 means: for large samples (more than 30 values) the Z test, for small samples (fewer than 30 values) the t test. Standard error of a proportion. Standard error of the difference between two proportions. For application of these tests the sample must meet the following criteria: samples should be selected randomly, and there should be homogeneity of variance in the 2 samples.
Standard error of the mean Used for quantitative data. Suppose a large number of samples are drawn from the population, each with a specific number of individuals, and the mean and SD of each sample are calculated. The distribution of these means is the sampling distribution, and the variability of this sampling distribution is measured by the standard error of the mean.
The SE of the mean is the SD of the sample divided by the square root of the number of observations in the sample: SE = SD / √n. Applications of SE: to determine whether a sample is drawn from a known population or not, when its mean µ is known; and to work out the limits of desired confidence within which the population mean would lie. A sketch is given below.
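A minimal Python sketch of SE = SD / √n and the mean ± 2SE confidence limits just described; the height data and the function name mean_and_se are hypothetical.

```python
import math

def mean_and_se(values):
    """Mean and standard error of the mean, SE = SD / sqrt(n)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))  # sample SD
    return mean, sd / math.sqrt(n)

# hypothetical sample of 8 height measurements (cm)
mean, se = mean_and_se([162, 158, 171, 165, 160, 168, 173, 159])
print(f"mean = {mean:.1f} cm, SE = {se:.2f} cm")
print(f"95% confidence limits: {mean - 2*se:.1f} to {mean + 2*se:.1f} cm")
```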
SE of the difference between 2 means Used for quantitative data. It measures the variability of the difference between the means of two samples drawn from the same population, and helps to judge the significance of a difference obtained by 2 research workers for the same investigation. SE(x̄1 − x̄2) = √(SD1²/n1 + SD2²/n2).
Standard error of proportion A unit which measures variation in the proportion of a character from sample to sample. In the case of qualitative data, where the character remains the same but its frequency varies, we express it as a proportion instead of a mean. p is the proportion of individuals having the special character, and q the proportion not having it; p + q = 1 (or 100 if expressed as a percentage). The standard error of proportion is the unit which measures variation in the proportion of a character from sample to population.
SE of proportion = √(p × q / n), where p = proportion with the character (positive), q = proportion without the character (negative), and n = sample size. Also, population proportion = sample proportion ± 2 SEP; thus one can determine whether the proportion of the sample is within the limits of the population proportion.
Uses of SEP By calculating sample proportions we can find the confidence limits of population proportions; determine whether a sample is drawn from a known population or not, when the population proportion is known; find the standard error of the difference between 2 proportions; and find the sample size needed when planning a survey. A sketch of the confidence limits is given below.
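A minimal Python sketch of the SEP formula √(pq/n) and the p ± 2 SEP confidence limits; the survey counts are hypothetical.

```python
import math

def se_proportion(p, n):
    """SE of a proportion: sqrt(p * q / n), with q = 1 - p."""
    return math.sqrt(p * (1 - p) / n)

# hypothetical survey: 120 of 400 subjects show the character
p, n = 120 / 400, 400
sep = se_proportion(p, n)
print(f"p = {p:.2f}, SE(p) = {sep:.4f}")
# confidence limits for the population proportion: p +/- 2 SEP
print(f"population proportion lies between {p - 2*sep:.3f} and {p + 2*sep:.3f}")
```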
Standard error of the difference between 2 proportions, SE(P1 − P2) Criteria for use of this test: N1 and N2 are sufficiently large, and the samples are randomly selected. The significance of the difference is found by the Z test: Z = observed difference / SE of difference = (P1 − P2) / SE(P1 − P2).
Calculation: SE(P1 − P2) = √(p1q1/N1 + p2q2/N2), or with the pooled proportion P (and Q = 1 − P), SE(P1 − P2) = √(PQ (1/N1 + 1/N2)). Then z is calculated as Z = (p1 − p2) / SE(p1 − p2). For small samples a continuity correction of ½(1/N1 + 1/N2) is applied, so z = [(p1 − p2) − ½(1/N1 + 1/N2)] / √(PQ (1/N1 + 1/N2)). A sketch is given below.
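A minimal Python sketch of the z test for two proportions using the pooled-proportion SE, including the small-sample continuity correction from the slide; the function name z_two_proportions and the counts are hypothetical.

```python
import math

def z_two_proportions(x1, n1, x2, n2, small_sample=False):
    """z for the difference between two proportions, pooled SE."""
    p1, p2 = x1 / n1, x2 / n2
    pooled_p = (x1 + x2) / (n1 + n2)
    pooled_q = 1 - pooled_p
    se = math.sqrt(pooled_p * pooled_q * (1 / n1 + 1 / n2))
    diff = abs(p1 - p2)
    if small_sample:                       # continuity correction from the slide
        diff -= 0.5 * (1 / n1 + 1 / n2)
    return diff / se

# hypothetical example: 45 of 100 vs 30 of 90 subjects with the character
print(round(z_two_proportions(45, 100, 30, 90), 2))
```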
Test of significance for small samples: the 't' test Used for calculation of the standard error of the difference between means of small samples. Criteria for applying the 't' test: random sample; quantitative data; variables are normally distributed; sample size less than 30.
Unpaired 't' test Applied to unpaired data of observations made on individuals of 2 separate groups, to find the significance of the difference between 2 means when the sample size is less than 30. E.g., the difference in accuracy of impressions made using two different impression materials.
Steps in the unpaired t test: Calculate the means of the two samples. Calculate the combined standard deviation: SD² = [∑(x − x̄)² of group 1 + ∑(x − x̄)² of group 2] / (N1 + N2 − 2). Calculate the standard error of the difference: SED = SD √(1/n1 + 1/n2). Calculate the observed difference between the means, x̄1 − x̄2. Calculate the t value = observed difference / standard error of the difference.
Determine the degrees of freedom, which is one less than the number of observations in a sample (n − 1); here the combined degrees of freedom = (n1 − 1) + (n2 − 1), or df = N1 + N2 − 2. Refer to the table and find the probability of the t value corresponding to the degrees of freedom. p < 0.05 means the difference is significant; p > 0.05 means the difference is not significant. The steps above are sketched below.
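A minimal Python sketch of the unpaired t test steps above; the accuracy scores are hypothetical. (For real work, scipy.stats.ttest_ind computes the same pooled-variance t.)

```python
import math

def unpaired_t(a, b):
    """Unpaired t test: returns (t value, degrees of freedom)."""
    n1, n2 = len(a), len(b)
    m1, m2 = sum(a) / n1, sum(b) / n2
    ss1 = sum((x - m1) ** 2 for x in a)            # sum of squares, group 1
    ss2 = sum((x - m2) ** 2 for x in b)            # sum of squares, group 2
    sd = math.sqrt((ss1 + ss2) / (n1 + n2 - 2))    # combined SD
    sed = sd * math.sqrt(1 / n1 + 1 / n2)          # SE of the difference
    return (m1 - m2) / sed, n1 + n2 - 2

# hypothetical accuracy scores for two impression materials
t, df = unpaired_t([8.1, 7.9, 8.4, 8.0, 8.2], [7.5, 7.8, 7.6, 7.9, 7.4])
print(f"t = {t:.2f}, df = {df}")  # compare with a t table at this df
```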
Paired 't' test Based on the standard error of the difference between 2 means of correlated data. It is applied to paired data of observations from one sample only, and is used for samples of less than 30. Each individual gives a pair of observations, i.e. an observation before and after taking a drug. The steps involved are: calculate the difference in each paired observation, i.e. before and after: y = x1 − x2; calculate the mean of these differences, ȳ.
Calculate the SD of the differences, then calculate SE = SD / √n. Determine t = observed mean difference / SE. Determine the degrees of freedom: since there is one sample, df = n − 1. Refer to the table and find the probability of the t value corresponding to the degrees of freedom. p < 0.05 means the difference is significant; p > 0.05 means it is not. A sketch is given below.
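A matching Python sketch of the paired t test; the before/after readings are hypothetical.

```python
import math

def paired_t(before, after):
    """Paired t test on before/after observations: returns (t, df)."""
    diffs = [x1 - x2 for x1, x2 in zip(before, after)]
    n = len(diffs)
    mean_d = sum(diffs) / n                                   # mean difference
    sd = math.sqrt(sum((d - mean_d) ** 2 for d in diffs) / (n - 1))
    se = sd / math.sqrt(n)                                    # SE = SD / sqrt(n)
    return mean_d / se, n - 1                                 # t value, df = n - 1

# hypothetical pocket-depth readings before and after treatment
t, df = paired_t([5.1, 4.8, 5.5, 5.0, 4.9, 5.2], [4.2, 4.0, 4.6, 4.3, 4.1, 4.4])
print(f"t = {t:.2f}, df = {df}")
```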
ANOVA ANOVA is a powerful statistical technique in which the main emphasis is on separating the total variation occurring in a variable into different components and then examining the factors affecting the variable under consideration. It is the technique of partitioning variance into parts so as to yield independent estimates of the population variance. Simultaneous comparison of the means of several treatment groups is possible. E.g., we might like to compare the difference in vertical dimension obtained using 3 or more methods, such as phonetics, swallowing, and Niswonger's method.
The 'F' distribution, which describes the distribution of the ratio of two independent estimates of a population variance, plays a key role in ANOVA. Assumptions underlying the use of ANOVA: individuals in the various groups should be selected on the basis of random sampling; the variable under study should follow a normal distribution; the variances of the groups should be homogeneous (this should be tested by the variance ratio test); and the samples comprising the groups should be independent.
Types of ANOVA tests One way classification (one way ANOVA): only one factor affects the result between the groups. Two way classification (two way ANOVA): 2 factors affect the result or outcome. Three way classification (multi way ANOVA): three or more factors affect the result or outcomes between groups.
One way ANOVA Calculate the total sum of squares over all values: SST = ∑X² − (∑X)²/N. Calculate the between-groups sum of squares (SSB). Calculate the within-groups sum of squares (SSW): SSW = SST − SSB; equivalently, SSW = group A ∑x² + group B ∑x² + group C ∑x², where each ∑x² is taken about its own group mean.
Calculate the mean sums of squares by dividing each sum of squares by its degrees of freedom: MSB = SSB / df between, MSW = SSW / df within. Then apply the F test: F = mean sum of squares between groups / mean sum of squares within groups. Calculation of df: df for the total = total number of cases − 1; df between groups = number of groups − 1; df within groups = total number of cases − number of groups. A sketch of the whole computation is given below.
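A minimal Python sketch putting the one way ANOVA pieces together; the three groups of vertical-dimension readings are hypothetical.

```python
def one_way_anova(*groups):
    """One way ANOVA F ratio from the sums of squares described above."""
    values = [x for g in groups for x in g]
    n_total, k = len(values), len(groups)
    grand_mean = sum(values) / n_total
    sst = sum((x - grand_mean) ** 2 for x in values)                  # total SS
    ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ssw = sst - ssb                                                   # within SS
    msb = ssb / (k - 1)               # mean square between, df = k - 1
    msw = ssw / (n_total - k)         # mean square within, df = N - k
    return msb / msw                  # compare with an F table at (k-1, N-k) df

# hypothetical vertical-dimension readings for three methods
print(round(one_way_anova([66, 68, 67], [70, 71, 69], [64, 65, 66]), 2))
```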
'F' test Used to test homogeneity of variance: F = SD1² / SD2², where SD1² is the larger of the 2 sample variances and SD2² the smaller. Using the 'F' test we evaluate the null hypothesis of no difference between 2 population variances. It is the basis of ANOVA, where the test itself involves a ratio of variance estimates, and it is useful for testing the assumption, underlying many statistical tests, that two population variances are equal.
Chi square test The most commonly used test when data are expressed as frequencies, proportions, or percentages. Useful for discrete data; any continuous data can also be reduced to categories and the chi-square test applied.
Calculation: Make a contingency (e.g. 2 × 2) table; this is called a fourfold table. Note the frequencies observed (O) in the 4 cells. Calculate the expected (E) value in each cell on the assumption of the null hypothesis. Find the difference between observed and expected frequency in each cell (O − E). Chi-square: X² = ∑ (O − E)² / E, summed over all cells. Calculate df = (columns − 1)(rows − 1). Direct formula for a 2 × 2 table: X² = (ad − bc)² N / [(a+b)(c+d)(a+c)(b+d)], where N = a + b + c + d. A sketch of the direct formula is given below.
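A minimal Python sketch of the direct 2 × 2 formula from the slide; the cell counts a, b, c, d are hypothetical.

```python
def chi_square_2x2(a, b, c, d):
    """Direct 2 x 2 chi-square: (ad - bc)^2 * N / [(a+b)(c+d)(a+c)(b+d)]."""
    n = a + b + c + d
    return (a * d - b * c) ** 2 * n / ((a + b) * (c + d) * (a + c) * (b + d))

# hypothetical fourfold table: rows = treatment groups, columns = outcome
print(round(chi_square_2x2(30, 20, 15, 35), 2))  # df = (2-1)(2-1) = 1
```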
Characteristics of X² The test is based on frequencies or events, as against the 'z' and 't' tests, which are based on parameters like the mean and SD. Although the chi-square distribution is continuous, the test is applied to discrete variables also. In testing the significance between a single expected and observed proportion or percentage we use the binomial test; in the X² test we test the significance of the difference between an entire set of expected and observed frequencies.
For every degree of freedom there is one chi-square distribution. The z and t distributions cannot be used when several characteristics are involved in the distributions, but the chi-square test can. The chi-square test is applied to draw inferences only (significant or not); it is not useful for estimation, as the z and t tests are.
Non-parametric tests In many biological investigations the research worker may not know the nature of the distribution or other required values of the population. Also, some biological measurements are not true numerical values, so arithmetic procedures are not possible in such cases. Distribution-free or non-parametric tests are then used, in which no assumptions are made about the population parameters.
Non-parametric or distribution-free statistics The population studied need not fulfil the assumption of normal distribution. Advantages: simplicity of derivation; ease of application; speed of application; scope of application; and the types of measurement required (distribution-free statistics usually require ordinal, and sometimes only nominal, data, whereas parametric statistics usually need interval data and ratio scales).
Influence of sample size With parametric statistics in small samples (around 10), even a slight violation of the assumptions gives wrong results; this is not so with non-parametric statistics, so they are advised for samples of less than 10. In large samples, non-parametric statistics are less efficient and more laborious. Statistical efficiency: when the assumptions of non-parametric tests are met and those of parametric tests are not, non-parametric tests give superior results. Disadvantages: not useful in large samples; lower statistical efficiency.
Why distribution-free statistics? Rigid assumptions are not necessary regarding the type of population distribution (whereas the 'z' and 't' tests need a normal distribution). Calculation of the mean and SD is not needed; the tests are based only on degrees of freedom. They are simple to understand, can be used with small samples, and can be used with simple ranking of values, so they are useful where the data are not exact.
Pearson's correlation A one-to-one relation; it is not necessary to assume the variables to be independent. It is used to measure the degree of linear relationship between two variables, represented by 'r', which is called Pearson's product moment correlation coefficient. (In the chi-square test we cannot get the degree of association; we only learn whether the variables are dependent on or independent of each other.)
Correlation may be due to some direct relationship between the two variables, or to some inherent factor common to both. Correlation is measured in terms of a coefficient which takes into consideration the co-variation between the 2 variables in relation to the variation of each of the 2 variables.
r = ∑xy / √(∑x² × ∑y²), i.e. the sum of the products of the deviations of the x and y pairs from their respective means, divided by the square root of (the sum of squared deviations of x from the x mean) times (the sum of squared deviations of y from the y mean). r ranges from −1 to +1: 0 indicates no linear relationship, +1 a perfect positive correlation, and −1 a perfect negative correlation. A sketch is given below.
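A minimal Python sketch of Pearson's r computed from deviations about the means, as in the formula above; the paired data are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson's product moment correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # sum of products
    sxx = sum((x - mx) ** 2 for x in xs)                    # sum of squares, x
    syy = sum((y - my) ** 2 for y in ys)                    # sum of squares, y
    return sxy / math.sqrt(sxx * syy)

# hypothetical paired data: age (years) vs. attachment loss (mm)
print(round(pearson_r([25, 32, 40, 48, 55], [1.1, 1.6, 2.0, 2.7, 3.1]), 3))
```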
References Soben Peter: Essentials of Preventive and Community Dentistry, 2nd edition. G.N. Prabhakara: Biostatistics. T. Bhaskara Rao: Methods of Biostatistics.