Inferential statistics

8,261 views 102 slides Apr 11, 2021

About This Presentation

Inference involves making a generalization about a larger group of individuals on the basis of a subset or sample.


Slide Content

INFERENTIAL STATISTICS. Dr. Dalia El-Shafei, Assist. Prof., Community Medicine Department, Zagazig University. http://www.slideshare.net/daliaelshafei

Definition of statistics: the branch of mathematics concerned with the collection, summarization, presentation, analysis, and interpretation of data.

Types of statistics

Inference: making a generalization about a larger group of individuals on the basis of a subset or sample.

Confidence level & confidence interval ("interval estimate")

Hypothesis testing: used to find out whether the observed variation among samples is explained by sampling variation (chance) or reflects a real difference between groups. The method of assessing hypotheses is known as a "significance test": a method for assessing whether a result is likely to be due to chance or due to a real effect.

Null & alternative hypotheses: In hypothesis testing, a specific hypothesis is formulated & data are collected to accept or reject it. The null hypothesis H0: x1 = x2 means there is no difference between x1 & x2. If we reject the null hypothesis, i.e. there is a difference between the two readings, the alternative is either H1: x1 < x2 or H1: x1 > x2. The null hypothesis is rejected because x1 is different from x2.

Example: a trial compared the smoking cessation rates for smokers randomly assigned to use a nicotine patch versus a placebo patch. Null hypothesis: smoking cessation rate in the nicotine patch group = smoking cessation rate in the placebo patch group. Alternative hypothesis: the two cessation rates differ (2-tailed), OR the cessation rate in the nicotine patch group is higher than in the placebo patch group (1-tailed).

Decision errors: Type I error "α" = false positive = rejecting a true H0. Type II error "β" = false negative = accepting a false H0.

In statistics, there are 2 ways to determine whether the evidence is likely or unlikely given the initial assumption: the critical value approach (favored in many older textbooks) and the P-value approach (used most often in research, journal articles, and statistical software).

If the data are not consistent with the null hypothesis, the difference is said to be "statistically significant". If the data are consistent with the null hypothesis, we accept it, i.e. the result is statistically insignificant. In medicine, we usually consider differences significant if the probability is <0.05. This means that if the null hypothesis is true, we will make a wrong decision fewer than 5 times in 100.

Critical value

The critical value is the z-score that separates sample statistics likely to occur from those unlikely to occur. The number Zα/2 is the z-score that separates a region of area α/2 from the rest of the standard normal curve.
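
As a quick check (not part of the original slides), the two-tailed critical value Zα/2 for α = 0.05 can be computed with SciPy's inverse normal CDF:

```python
from scipy.stats import norm

alpha = 0.05
# Two-tailed critical value: the z-score with area alpha/2 in the upper tail
z_crit = norm.ppf(1 - alpha / 2)
print(round(z_crit, 3))  # 1.96
```

This reproduces the familiar 1.96 cutoff used throughout the deck.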

Tests of significance

Analysis of quantitative variables

Z test or SND “standard normal deviate”

Z test or SND "standard normal deviate": used for comparing 2 means of large samples (>60) using the normal distribution.
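
A minimal sketch of this comparison, using the standard large-sample formula z = (x̄1 - x̄2) / √(SD1²/n1 + SD2²/n2) with made-up summary statistics (the data are hypothetical, not from the slides):

```python
import math

def z_two_means(m1, sd1, n1, m2, sd2, n2):
    """Standard normal deviate for comparing two large-sample means."""
    se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    return (m1 - m2) / se

# Hypothetical example: two samples of 100 subjects each
z = z_two_means(120, 15, 100, 115, 14, 100)
print(round(z, 2))  # 2.44, which exceeds 1.96, so significant at the 0.05 level
```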

Student’s t-test

Student's t-test: used for comparing two means of small samples (<60) by the t distribution instead of the normal distribution.

Unpaired t-test: X1 = mean of the 1st sample; X2 = mean of the 2nd sample; n1 = sample size of the 1st sample; n2 = sample size of the 2nd sample; SD1 = SD of the 1st sample; SD2 = SD of the 2nd sample. Degrees of freedom (df) = (n1 + n2) - 2.

Student's t-test: the calculated value of t is compared to the value in the t-distribution table at the corresponding degrees of freedom. If the calculated t is less than the tabulated value, the difference between samples is insignificant. If the calculated t is larger than the tabulated value, the difference is significant, i.e. the null hypothesis is rejected.

Student's t-test example: suppose the calculated t = 1.75 and df = 3. Calculated t (1.75) < tabulated t (3.182), so the difference between samples is insignificant, i.e. the null hypothesis is accepted.
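
The tabulated value can be looked up programmatically rather than in a printed table; this sketch reproduces the slide's numbers with SciPy:

```python
from scipy.stats import t

df = 3
t_calculated = 1.75
# Two-tailed critical value at the 0.05 level for df = 3
t_tabulated = t.ppf(1 - 0.05 / 2, df)
print(round(t_tabulated, 3))           # 3.182
print(t_calculated < t_tabulated)      # True -> difference not significant
```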

Paired t-test: compares repeated observations in the same individual, or differences between paired data. The analysis is carried out using the mean & SD of the difference between each pair.
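
A small sketch of a paired comparison with SciPy, using hypothetical before/after measurements on the same individuals (the values are invented for illustration):

```python
from scipy.stats import ttest_rel

# Hypothetical before/after measurements on the same 5 individuals
before = [140, 135, 150, 145, 160]
after = [132, 130, 148, 140, 150]

# ttest_rel analyzes the per-pair differences, as described above
stat, p = ttest_rel(before, after)
print(p < 0.05)  # True -> the paired difference is significant here
```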

Analysis of variance (ANOVA)

Used for comparing several means. Comparing >2 means with several t-tests consumes more time & leads to spurious significant results, so we must use analysis of variance (ANOVA) instead.

Analysis of variance (ANOVA): there are two main types, one-way ANOVA and two-way ANOVA.

The main idea of ANOVA is that we take into account the variability within the groups and between the groups; the value of F equals the ratio of the between-groups mean square to the within-groups mean square. F = between-groups MS / within-groups MS.
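
A minimal one-way ANOVA sketch with SciPy, on hypothetical data from three groups (the numbers are invented; `f_oneway` computes the F ratio described above):

```python
from scipy.stats import f_oneway

# Hypothetical measurements from three independent groups
g1 = [23, 25, 21, 24]
g2 = [30, 28, 31, 29]
g3 = [22, 20, 23, 21]

# F = between-groups mean square / within-groups mean square
f_stat, p = f_oneway(g1, g2, g3)
print(p < 0.05)  # True -> at least one group mean differs
```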

Analysis of qualitative variables

Chi-square test

Chi-square test: tests relationships between categorical variables. Qualitative data are arranged in a table formed by rows & columns.

Variables      Obese   Non-Obese   Total
Diabetic        62        63        125
Non-diabetic    51        44        105
Total          113       107        220

O = observed value in the table; E = expected value. Expected (E) = (row total × column total) / grand total. Degrees of freedom = (rows - 1) × (columns - 1). The test statistic is χ² = Σ (O - E)² / E.

Example (hypothetical study): two groups of patients are treated using different spinal manipulation techniques, Gonstead vs. Diversified. The presence or absence of pain after treatment is the outcome measure. Two categorical variables: technique used, and pain after treatment.

Gonstead vs. Diversified example - results:

Technique      Pain: Yes   Pain: No   Row total
Gonstead           9          21         30
Diversified       11          29         40
Column total      20          50         70 (grand total)

9 out of 30 (30%) still had pain after Gonstead treatment and 11 out of 40 (27.5%) still had pain after Diversified, but is this difference statistically significant?

First find the expected value for each cell. To find E for cell a (and similarly for the rest), multiply the row total by the column total and divide by the grand total: Expected (E) = (row total × column total) / grand total.

Find E for all cells:

Technique      Pain: Yes                  Pain: No                   Row total
Gonstead        9  (E = 30×20/70 = 8.6)   21 (E = 30×50/70 = 21.4)      30
Diversified    11  (E = 40×20/70 = 11.4)  29 (E = 40×50/70 = 28.6)      40
Column total   20                         50                            70 (grand total)

Use the χ² formula for each cell and then add the terms together:

(9 - 8.6)² / 8.6 = 0.0186
(21 - 21.4)² / 21.4 = 0.0075
(11 - 11.4)² / 11.4 = 0.0140
(29 - 28.6)² / 28.6 = 0.0056

χ² = 0.0186 + 0.0075 + 0.0140 + 0.0056 = 0.0457

Therefore, χ² is not statistically significant, so we accept the null hypothesis: the calculated χ² value is far smaller than the tabulated critical value (3.841 at df = 1, α = 0.05).
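
The whole worked example can be checked with SciPy. Note that `chi2_contingency` uses unrounded expected values (and here we disable the Yates continuity correction to match the plain formula), so the statistic differs slightly from a hand calculation with rounded E values:

```python
from scipy.stats import chi2, chi2_contingency

table = [[9, 21],    # Gonstead: pain yes / no
         [11, 29]]   # Diversified: pain yes / no

# correction=False applies the plain chi-square formula without Yates' correction
stat, p, df, expected = chi2_contingency(table, correction=False)
print(df)                  # 1
crit = chi2.ppf(0.95, df)  # tabulated critical value at alpha = 0.05
print(round(crit, 3))      # 3.841
print(stat < crit)         # True -> not significant, accept the null hypothesis
```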

Z test for comparing 2 percentages “Proportion Z-Test”

Z test for comparing 2 percentages ("proportion Z-test"): p1 = % in the 1st group; p2 = % in the 2nd group; q1 = 100 - p1; q2 = 100 - p2; n1 = sample size of the 1st group; n2 = sample size of the 2nd group. Z = (p1 - p2) / √(p1q1/n1 + p2q2/n2). The Z test is significant (at the 0.05 level) if the result is >2.

Example: the number of anemic patients in group 1 (which includes 50 patients) is 5, and the number of anemic patients in group 2 (which contains 60 patients) is 20. To find out whether groups 1 & 2 differ statistically in the prevalence of anemia, we calculate the Z test:

p1 = 5/50 = 10%, p2 = 20/60 = 33%
q1 = 100 - 10 = 90, q2 = 100 - 33 = 67
Z = |10 - 33| / √(10×90/50 + 33×67/60) = 23 / √(18 + 36.85) = 23 / 7.4 = 3.1

So there is a statistically significant difference between the percentages of anemia in the studied groups (because Z > 2).
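
The same calculation as a small function, following the slide's formula with percentages (a sketch of the steps above, nothing more):

```python
import math

def z_two_percentages(p1, n1, p2, n2):
    """Z test for two percentages, following the slide's formula (values in %)."""
    q1, q2 = 100 - p1, 100 - p2
    return abs(p1 - p2) / math.sqrt(p1 * q1 / n1 + p2 * q2 / n2)

# Anemia example: 5/50 = 10% in group 1 vs 20/60 = 33% in group 2
z = z_two_percentages(10, 50, 33, 60)
print(round(z, 1))  # 3.1, which is > 2, so the difference is significant
```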

Correlation & regression

Correlation & regression: correlation measures the closeness of the association between 2 continuous variables, while linear regression gives the equation of the straight line that best describes the relationship & enables prediction of one variable from the other.

Correlation is not causation!!!

Linear regression

Correlation is measured by the correlation coefficient, r. The value of r ranges between +1 and -1: |r| = 1 means perfect correlation, while 0 means no correlation. An r value near zero indicates weak correlation; near one, strong correlation. The sign (+ or -) denotes the direction of the correlation.
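
A minimal sketch computing r with SciPy on hypothetical paired measurements (the data are invented to show a strong positive correlation):

```python
from scipy.stats import pearsonr

# Hypothetical paired measurements of two continuous variables
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 8.0, 9.8]

# pearsonr returns the correlation coefficient r and its p-value
r, p = pearsonr(x, y)
print(round(r, 3))  # 0.999 -> very strong positive correlation
```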

Regression

Linear regression is used to determine the relationship between variables & predict the change in one variable due to changes in another. For linear regression, the independent variable (x) must be distinguished from the dependent variable (y). It also allows prediction of the dependent variable for a particular value of the independent variable.

Scatterplot: an X-Y graph with symbols that represent the values of the 2 variables, with the regression line drawn through them.

However, regression should not be used for prediction outside the range of the original data. A t-test is also used to assess the level of significance. The dependent variable in linear regression must be a continuous one.
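
A short sketch of fitting and using a regression line with SciPy's `linregress`, on hypothetical data; note the prediction is made inside the observed range of x, per the caution above:

```python
from scipy.stats import linregress

# Hypothetical data: y depends roughly linearly on x
x = [1, 2, 3, 4, 5]
y = [2.0, 4.1, 5.9, 8.2, 10.0]

res = linregress(x, y)
print(round(res.slope, 2))  # 2.01

# Predict y for x = 3.5 -- inside the range of the original data
pred = res.intercept + res.slope * 3.5
```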

Multiple linear regression: models the dependency of a dependent variable on several independent variables, not just one. The test of significance used is the ANOVA (F test).

Example: suppose neonatal birth weight depends on these factors: gestational age, length of the baby, and head circumference. Each factor correlates significantly with birth weight (i.e. has a positive linear correlation). We can perform multiple regression analysis to obtain a mathematical equation by which we can predict the birth weight of any neonate if we know the values of these factors.
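
A sketch of this idea with NumPy least squares. All numbers are invented for illustration (not real neonatal data); the point is only the mechanics of fitting one dependent variable against several independent variables and then predicting:

```python
import numpy as np

# Hypothetical neonates: gestational age (wk), length (cm), head circumference (cm)
X = np.array([[38, 48, 33],
              [40, 51, 35],
              [36, 46, 32],
              [39, 50, 34],
              [41, 52, 35]], dtype=float)
y = np.array([3.0, 3.6, 2.6, 3.3, 3.7])  # birth weight (kg)

# Add an intercept column and fit the coefficients by least squares
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

# Predict birth weight for a new neonate from the fitted equation
new = np.array([1, 39, 49, 34])  # intercept term, GA, length, HC
print(round(float(new @ coef), 2))  # 3.3
```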