INFERENTIAL STATISTICS
Dr. Dalia El-Shafei, Assist. Prof., Community Medicine Department, Zagazig University
http://www.slideshare.net/daliaelshafei
Definition of statistics: Branch of mathematics concerned with the collection, summarization, presentation, analysis, and interpretation of data.
Types of statistics
Inference
Inference involves making a generalization about a larger group of individuals on the basis of a subset or sample.
Confidence level & confidence interval ("interval estimate")
Hypothesis testing
To find out whether the observed variation among samples is explained by chance (sampling variation) or reflects a real difference between groups. The method of assessing hypotheses is known as a "significance test": a method for assessing whether a result is likely to be due to chance or to a real effect.
Null & alternative hypotheses: In hypothesis testing, a specific hypothesis is formulated and data are collected to accept or reject it. The null hypothesis H0: x1 = x2 means that there is no difference between x1 and x2. If we reject the null hypothesis, i.e. there is a difference between the two readings, the alternative is either H1: x1 < x2 or H1: x1 > x2; the null hypothesis is rejected because x1 differs from x2.
Example: A trial compared the smoking cessation rates for smokers randomly assigned to use a nicotine patch versus a placebo patch. Null hypothesis: the cessation rate in the nicotine patch group = the cessation rate in the placebo patch group. Alternative hypothesis: the cessation rates in the two groups differ (2 tailed), OR the cessation rate in the nicotine patch group is higher than in the placebo patch group (1 tailed).
Decision errors
Type I error "α" = false positive = rejecting a true H0.
Type II error "β" = false negative = accepting a false H0.
In statistics, there are 2 ways to determine whether the evidence is likely or unlikely given the initial assumption: Critical value approach (favored in many of the older textbooks). P-value approach (what is used most often in research, journal articles, and statistical software).
If the data are not consistent with the null hypothesis, the difference is said to be "statistically significant". If the data are consistent with the null hypothesis, we accept it, i.e. the result is statistically insignificant. In medicine, we usually consider differences significant if the probability is <0.05. This means that if the null hypothesis is true, we will make a wrong decision fewer than 5 times in 100.
Critical value
The critical value is the z-score that separates sample statistics likely to occur from those unlikely to occur. The number z(α/2) is the z-score that separates a region of area α/2 from the rest of the standard normal curve.
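As a minimal Python sketch (with a hypothetical observed test statistic, assuming a two-tailed z-test at α = 0.05), the critical value approach and the P-value approach lead to the same decision:

```python
# Two equivalent ways to judge significance for a two-tailed z-test
# (z = 2.3 is a hypothetical observed test statistic; alpha = 0.05).
from scipy.stats import norm

z = 2.3
alpha = 0.05

# Critical value approach: reject H0 if |z| exceeds z(alpha/2).
z_crit = norm.ppf(1 - alpha / 2)   # ~1.96
reject_by_critical = abs(z) > z_crit

# P-value approach: reject H0 if the two-tailed p-value < alpha.
p_value = 2 * (1 - norm.cdf(abs(z)))
reject_by_p = p_value < alpha

print(round(z_crit, 3), round(p_value, 4), reject_by_critical, reject_by_p)
```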
Tests of significance
Analysis of quantitative variables
Z test or SND “standard normal deviate”
Z test or SND ("standard normal deviate")
Used for comparing 2 means of large samples (>60) using the normal distribution.
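A short sketch of the two-mean Z test, using made-up summary statistics (means, SDs, and sample sizes are all hypothetical):

```python
# Z test (SND) comparing two means from large samples (n > 60).
# All numbers below are hypothetical summary statistics.
from math import sqrt

mean1, sd1, n1 = 120.0, 15.0, 100   # hypothetical group 1
mean2, sd2, n2 = 126.0, 14.0, 110   # hypothetical group 2

se = sqrt(sd1**2 / n1 + sd2**2 / n2)   # standard error of the difference
z = (mean1 - mean2) / se

# |z| > 1.96 -> significant at the 0.05 level (two tailed)
significant = abs(z) > 1.96
print(round(z, 2), significant)
```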
Student’s t-test
Student's t-test
Used for comparing two means of small samples (<60) using the t distribution instead of the normal distribution.
Unpaired t-test
X̄1 = mean of the 1st sample; X̄2 = mean of the 2nd sample; n1 = size of the 1st sample; n2 = size of the 2nd sample; SD1 = SD of the 1st sample; SD2 = SD of the 2nd sample. Degrees of freedom (df) = (n1 + n2) - 2.
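The symbols above can be turned into a small sketch of the pooled-variance unpaired t statistic (which matches df = (n1 + n2) - 2); the summary values below are hypothetical:

```python
# Unpaired t-test from summary statistics, pooled-variance form,
# with df = (n1 + n2) - 2. The sample values are hypothetical.
from math import sqrt
from scipy.stats import ttest_ind_from_stats

x1, sd1, n1 = 10.0, 2.0, 12    # hypothetical sample 1
x2, sd2, n2 = 12.0, 2.5, 15    # hypothetical sample 2

df = n1 + n2 - 2
sp2 = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df   # pooled variance
t = (x1 - x2) / sqrt(sp2 * (1 / n1 + 1 / n2))

# scipy computes the same pooled-variance t-test from summary stats
t_scipy, p = ttest_ind_from_stats(x1, sd1, n1, x2, sd2, n2)
print(round(t, 3), round(t_scipy, 3), df)
```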
Student's t-test
The value of t is compared to the values in the t-distribution table at the corresponding degrees of freedom. If the calculated t is less than the tabulated value, the difference between samples is insignificant. If the calculated t is larger than the tabulated value, the difference is significant, i.e. the null hypothesis is rejected.
Student's t-test
Suppose that the calculated t = 1.75 and df = 3. Calculated t (1.75) < tabulated t (3.182), so the difference between samples is insignificant, i.e. the null hypothesis is accepted.
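The tabulated value used in this example (3.182 at df = 3) is the two-tailed 0.05 critical value, which can be looked up with scipy instead of a printed table:

```python
# Look up the two-tailed 0.05 critical t value at df = 3,
# then compare it with the calculated t from the example above.
from scipy.stats import t

t_calc = 1.75                        # calculated t from the example
t_crit = t.ppf(1 - 0.05 / 2, df=3)   # ~3.182

# calculated < tabulated -> difference not significant, H0 accepted
print(round(t_crit, 3), t_calc < t_crit)
```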
Paired t-test
Used for comparing repeated observations in the same individuals, or differences between paired data. The analysis is carried out using the mean & SD of the difference between each pair.
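A sketch of the paired analysis on hypothetical before/after readings for the same six individuals, showing that the manual calculation from the pairwise differences matches scipy's paired t-test:

```python
# Paired t-test: the statistic is built from the mean and SD of the
# pairwise differences. The readings below are hypothetical.
from statistics import mean, stdev
from math import sqrt
from scipy.stats import ttest_rel

before = [140, 152, 138, 145, 150, 142]   # hypothetical readings
after  = [135, 150, 132, 140, 146, 141]

diffs = [b - a for b, a in zip(before, after)]
n = len(diffs)
t_manual = mean(diffs) / (stdev(diffs) / sqrt(n))

t_scipy, p = ttest_rel(before, after)
print(round(t_manual, 3), round(t_scipy, 3))
```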
Analysis of variance (ANOVA)
Used for comparing several means. Comparing >2 means with several t-tests consumes more time and leads to spurious significant results, so we must use analysis of variance (ANOVA).
Analysis of variance (ANOVA) There are two main types:
The main idea in ANOVA is that we have to take into account the variability within the groups and between the groups; the value of F equals the ratio of the between-groups mean square to the within-groups mean square: F = between-groups MS / within-groups MS.
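A minimal one-way ANOVA sketch on three hypothetical groups (all values made up), using scipy's F test:

```python
# One-way ANOVA comparing three hypothetical group means;
# F = between-groups mean square / within-groups mean square.
from scipy.stats import f_oneway

g1 = [5.1, 4.9, 5.4, 5.0, 5.2]   # hypothetical groups
g2 = [5.8, 6.0, 5.7, 6.1, 5.9]
g3 = [5.0, 5.3, 4.8, 5.1, 5.2]

F, p = f_oneway(g1, g2, g3)
print(round(F, 2), p < 0.05)
```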
Analysis of qualitative variables
Chi-square test
Chi-square test
Tests relationships between categorical variables. Qualitative data are arranged in a table formed by rows & columns.

Variables      Obese   Non-Obese   Total
Diabetic         62        63       125
Non-diabetic     51        44       105
Total           113       107       220
O = observed value in the table; E = expected value.
Expected (E) = (Row total × Column total) / Grand total
Degrees of freedom = (rows - 1) × (columns - 1)
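Applying the expected-count formula to the obesity/diabetes table above can be sketched in a few lines of Python:

```python
# Expected counts for the obesity/diabetes 2x2 table:
# E = row total * column total / grand total for each cell.
observed = [[62, 63],    # diabetic: obese, non-obese
            [51, 44]]    # non-diabetic: obese, non-obese

row_totals = [sum(row) for row in observed]          # [125, 105]
col_totals = [sum(col) for col in zip(*observed)]    # [113, 107]
grand = sum(row_totals)                              # 220

expected = [[r * c / grand for c in col_totals] for r in row_totals]
print([[round(e, 1) for e in row] for row in expected])
```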
Example (hypothetical study): Two groups of patients are treated using different spinal manipulation techniques, Gonstead vs. Diversified. The presence or absence of pain after treatment is the outcome measure. There are two categorical variables: technique used and pain after treatment.
Gonstead vs. Diversified example - Results

Technique      Pain: Yes   Pain: No   Row Total
Gonstead            9          21         30
Diversified        11          29         40
Column Total       20          50         70 (Grand Total)

9 out of 30 (30%) still had pain after Gonstead treatment and 11 out of 40 (27.5%) still had pain after Diversified, but is this difference statistically significant?
First find the expected values for each cell: multiply the row total by the column total, then divide by the grand total. Expected (E) = (Row total × Column total) / Grand total. For cell a (Gonstead, pain = yes), E = 30 × 20 / 70; the remaining cells are found similarly.
Find E for all cells:

Technique      Pain: Yes                  Pain: No                   Row Total
Gonstead        9 (E = 30×20/70 = 8.6)    21 (E = 30×50/70 = 21.4)      30
Diversified    11 (E = 40×20/70 = 11.4)   29 (E = 40×50/70 = 28.6)      40
Column Total   20                         50                            70 (Grand Total)
Use the χ² formula with each cell and then add them together:
χ² = (9 - 8.6)²/8.6 + (21 - 21.4)²/21.4 + (11 - 11.4)²/11.4 + (29 - 28.6)²/28.6
χ² = 0.0186 + 0.0075 + 0.0140 + 0.0056 ≈ 0.046
The calculated χ² value (≈0.046) < tabulated value (3.841) at df = (2 - 1)(2 - 1) = 1. Therefore, χ² is not statistically significant, so we accept the null hypothesis.
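The same test can be run with scipy; with unrounded expected values the uncorrected statistic comes out at about 0.05 (hand calculations with rounded expecteds differ slightly), still far below the df = 1 critical value of 3.841:

```python
# Chi-square test on the Gonstead vs. Diversified table.
# correction=False gives the plain (uncorrected) chi-square statistic.
from scipy.stats import chi2_contingency

observed = [[9, 21],    # Gonstead: pain yes / no
            [11, 29]]   # Diversified: pain yes / no

chi2, p, df, expected = chi2_contingency(observed, correction=False)
print(round(chi2, 4), df, round(p, 3))   # not significant: p >> 0.05
```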
Z test for comparing 2 percentages “Proportion Z-Test”
Z test for comparing 2 percentages ("Proportion Z-test")
p1 = % in the 1st group; p2 = % in the 2nd group; q1 = 100 - p1; q2 = 100 - p2; n1 = sample size of the 1st group; n2 = sample size of the 2nd group.
Z = (p1 - p2) / √(p1q1/n1 + p2q2/n2)
The Z test is significant (at the 0.05 level) if the result is >2.
Example: The number of anemic patients in group 1 (50 patients) is 5, and the number of anemic patients in group 2 (60 patients) is 20. To test whether groups 1 & 2 differ in the prevalence of anemia, we calculate the Z test:
p1 = 5/50 = 10%, p2 = 20/60 = 33%
q1 = 100 - 10 = 90, q2 = 100 - 33 = 67
Z = |10 - 33| / √(10×90/50 + 33×67/60) = 23 / √(18 + 36.85) = 23 / 7.4 = 3.1
So there is a statistically significant difference between the percentages of anemia in the studied groups (because Z > 2).
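The anemia example above can be sketched directly from the slide's formula (working in percentages, with q = 100 - p; exact fractions give a z very close to the rounded hand calculation):

```python
# Proportion Z-test for the anemia example, using percentages.
from math import sqrt

p1, n1 = 5 / 50 * 100, 50     # 10% anemic in group 1
p2, n2 = 20 / 60 * 100, 60    # ~33% anemic in group 2
q1, q2 = 100 - p1, 100 - p2

z = abs(p1 - p2) / sqrt(p1 * q1 / n1 + p2 * q2 / n2)
print(round(z, 1))   # > 2 -> significant at the 0.05 level
```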
Correlation & regression
Correlation & regression
Correlation measures the closeness of the association between 2 continuous variables, while linear regression gives the equation of the straight line that best describes the relation & enables the prediction of one variable from the other.
Correlation is not causation!!!
Linear regression
Correlation
Correlation is measured by the correlation coefficient, r. The value of r ranges between +1 and -1. "1" means perfect correlation, while "0" means no correlation. An r value near zero means weak correlation, while a value near one means strong correlation. The sign (+ or -) denotes the direction of the correlation.
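A minimal sketch of computing r on hypothetical data (values chosen so that y is roughly 2x, giving a strong positive correlation):

```python
# Pearson correlation coefficient r for two hypothetical variables.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical data
y = np.array([2.1, 3.9, 6.2, 8.0, 9.8])   # roughly y = 2x

r = np.corrcoef(x, y)[0, 1]
print(round(r, 3))   # near +1: strong positive correlation
```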
Regression
Linear regression
Used to determine the relation and to predict the change in one variable due to changes in another. For linear regression, the independent variable (x) must be distinguished from the dependent variable (y). It also allows prediction of the dependent variable for a particular value of the independent variable.
Scatterplots
An X-Y graph with symbols that represent the values of the 2 variables, together with the regression line.
Linear regression
However, regression should not be used for prediction outside the range of the original data. The t-test is also used to assess the level of significance. The dependent variable in linear regression must be continuous.
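A sketch of fitting the regression line and predicting within the range of the data, using the same kind of hypothetical x/y values as the correlation example:

```python
# Simple linear regression of y on x with scipy's linregress.
# The data are hypothetical (y roughly 2x).
from scipy.stats import linregress

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.0, 9.8]

res = linregress(x, y)
# Predict y for a new x *within* the range of the original data
y_pred = res.intercept + res.slope * 3.5
print(round(res.slope, 2), round(res.intercept, 2), round(y_pred, 2))
```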
Multiple linear regression
Models the dependency of a dependent variable on several independent variables, not just one. The test of significance used is ANOVA (F test).
Example: Suppose neonatal birth weight depends on these factors: gestational age, length of the baby, and head circumference, and each factor correlates significantly with birth weight (i.e. has a positive linear correlation). We can perform a multiple regression analysis to obtain a mathematical equation by which we can predict the birth weight of any neonate if we know the values of these factors.
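The birth-weight idea can be sketched as a least-squares fit; the measurements below are made-up illustrative numbers, not real neonatal data:

```python
# Multiple linear regression sketch: predict birth weight (y) from
# three predictors. All data below are hypothetical.
import numpy as np

# columns: gestational age (weeks), length (cm), head circumference (cm)
X = np.array([[38, 48, 33],
              [40, 51, 35],
              [36, 46, 32],
              [39, 50, 34],
              [41, 52, 35],
              [37, 47, 33]], dtype=float)
y = np.array([3.0, 3.6, 2.6, 3.3, 3.8, 2.9])   # birth weight (kg), made up

# Add an intercept column and solve by least squares
A = np.column_stack([np.ones(len(y)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

y_hat = A @ coef   # predicted birth weights from the fitted equation
print(np.round(coef, 3), np.round(y_hat - y, 3))
```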