6Tests of significance Parametric and Non Parametric tests.ppt
72 slides
Nov 15, 2024
About This Presentation
Parametric tests
Size: 1.08 MB
Language: en
Added: Nov 15, 2024
Slides: 72 pages
Slide Content
Tests of significance: Parametric
and Non Parametric tests
PRESENTER: Dr Bhanu M
GUIDE: Mr Shivraj
ACKNOWLEDGEMENT: Prof NS Murthy
1 of 72
CONTENTS
•Introduction & Terminologies
•Parametric
-Assumptions
•Non-Parametric tests
-Assumptions
•Tests for large samples
•Tests for small samples
•Various non-parametric tests
•References
2
Introduction
•Until the 18th century, statistics was used only to a limited extent in the field of medicine.
•During the 19th century, Karl Pearson, known as the father of modern statistics, changed the concept of descriptive statistics to inferential statistics.
3
Introduction…
•He argued that mathematics can be applied to
biological problems and that analysis of
statistical data could answer many questions
about the life of plants, animals and humans.
•Statistics thus came to be used to obtain analytical answers to problems posed by scientific data.
4
Tests of Significance
•Tests of significance are mathematical methods for finding the probability (P) that an observed difference occurred by chance.
•Thus all tests of significance are aimed at finding the value of P.
5
•P (level of significance) is the probability that sampling variation, or the chance factor, is responsible for the difference in the sample estimates when the samples are from a common population.
[Applied Statistics in Health Sciences by NSN Rao & NS Murthy].
6
PARAMETRIC TESTS- Assumptions
•The observations must be drawn from normally distributed populations
•These populations must have the same variances
•The observations must be independent
•The population parameters involved are the mean and standard deviation
•Require an interval scale or ratio scale (whole numbers or fractions). Example: height in inches: 72, 60.5, 54.7; temperature (30-34 degrees Celsius)
8
PARAMETRIC TESTS…
•Critical ratio/ Z-test
•Paired t-test
•Unpaired t-test
•One way ANOVA
•Two way ANOVA
9
Tests for large samples
•SE of mean = $s/\sqrt{n}$
•SE of proportion = $\sqrt{pq/n}$
•SE of difference between 2 means = $\sqrt{s_1^2/n_1 + s_2^2/n_2}$
•SE of difference between 2 proportions = $\sqrt{p_1 q_1/n_1 + p_2 q_2/n_2}$
10
•s= standard deviation of respective samples
•n= sample size of respective samples
•p= estimate of proportion of events in
respective samples
•q= (1-p)
11
1)Set up the null hypothesis that the 2 samples
are from the same population and that the
difference between the 2 sample estimates is due
to sampling variation.
12
•2)Calculate the 2 means $\bar{x}_1$ and $\bar{x}_2$, or the 2 proportions $p_1$ and $p_2$, corresponding to the 2 samples with sample sizes $n_1$ and $n_2$ respectively.
•3)Calculate the standard deviations of the 2 samples and their standard errors $SE_1$ and $SE_2$ respectively.
13
•4)Calculate the standard error of the difference between the 2 sample estimates as $\sqrt{SE_1^2 + SE_2^2}$
•5)Calculate the critical ratio (CR) or z value:
•z = (difference between sample estimates) / (SE of difference)
14
•6)Refer to the normal distribution table and find the probability P corresponding to the calculated value of z.
•7)If P is ≤ 0.05, reject the null hypothesis and conclude that the difference between the 2 sample estimates is significant.
15
Example
In an epidemic of gastroenteritis in an area, the numbers of cases reported in 2 populations consuming water from different sources were as follows:
Source of water    No. of people consuming water from the source    No. of cases of gastroenteritis
Tap water          800                                               35
Hand pump          2400                                              120
Total              3200                                              155
We have to find out whether the proportions of cases in the 2 groups differ significantly.
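The large-sample steps can be applied to this example roughly as follows. This is a minimal sketch, not the presenter's own working: the function name `two_proportion_z` is hypothetical, and the two-sided P (read from the normal distribution via `math.erfc`) is an assumption, since the slides do not say whether a one-sided or two-sided P is intended.

```python
import math

def two_proportion_z(cases1, n1, cases2, n2):
    """Critical ratio (z) for the difference between two sample proportions."""
    p1, p2 = cases1 / n1, cases2 / n2
    q1, q2 = 1 - p1, 1 - p2
    # SE of difference between 2 proportions: sqrt(p1*q1/n1 + p2*q2/n2)
    se_diff = math.sqrt(p1 * q1 / n1 + p2 * q2 / n2)
    z = (p1 - p2) / se_diff
    # Two-sided P from the standard normal distribution (assumption: two-sided)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Tap water: 35 cases among 800; hand pump: 120 cases among 2400
z, p_value = two_proportion_z(35, 800, 120, 2400)
print(f"z = {z:.3f}, P = {p_value:.3f}")
```

With these figures P comes out well above 0.05, so under this sketch the null hypothesis would not be rejected.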
16
Tests for small samples
Assumptions for t- distribution:
•the means of the 2 samples are normally
distributed
•the means of the 2 samples are independently
distributed
•the variances of the 2 samples are equal.
17
t- test for paired observation
•The formula to calculate the t statistic is
•$t = \bar{d}/s_{md}$ with $(n-1)$ degrees of freedom
•$n$ = number of observations in the sample
•$\bar{d}$ = mean of the differences in the values of the variable of the sample observations before & after treatment
18
•$s_{md}$ = SE of the mean difference, $s_{md} = s_d/\sqrt{n}$
•$s_d$ = standard deviation of the values of d (the differences in the variable before & after the treatment)
19
1)Set up the null hypothesis that $\bar{d} = 0$
2)Calculate the difference $d_i$ for each pair of observations before & after treatment and compute their mean $\bar{d}$
20
3)Calculate the SD of these differences:
$s_d = \sqrt{\sum_i (d_i - \bar{d})^2 / (n-1)}$
n = number of pairs of observations
4)Calculate the SE $s_{md}$ from the formula $s_{md} = s_d/\sqrt{n}$
21
5)Calculate the value of the t statistic as $t = \bar{d}/s_{md}$
6)Compute the degrees of freedom as (n-1)
7)From the t-distribution table, find the probability level corresponding to this value of t with (n-1) degrees of freedom
22
8)If P < 0.05, reject the null hypothesis and conclude that the difference between the before and after treatment values is significant.
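The paired t-test steps can be sketched as below. The before/after readings are made-up illustrative values, not data from the presentation, and `paired_t` is a hypothetical helper name. The critical value 2.262 is the standard two-sided 5% t-table entry for 9 degrees of freedom.

```python
import math
import statistics

def paired_t(before, after):
    """Return (t, degrees of freedom) for paired observations."""
    d = [b - a for b, a in zip(before, after)]   # step 2: differences
    n = len(d)                                   # number of pairs
    d_bar = statistics.mean(d)                   # mean difference
    s_d = statistics.stdev(d)                    # step 3: SD with (n-1) divisor
    s_md = s_d / math.sqrt(n)                    # step 4: SE of mean difference
    return d_bar / s_md, n - 1                   # steps 5 and 6

# Hypothetical readings for 10 subjects before and after treatment
before = [120, 118, 130, 125, 140, 135, 128, 132, 126, 138]
after = [115, 116, 124, 121, 133, 130, 127, 125, 122, 131]

t, df = paired_t(before, after)
T_CRITICAL_DF9 = 2.262  # two-sided 5% point of t with 9 df, from a t-table
print(f"t = {t:.3f} with {df} df; significant at 5%: {abs(t) > T_CRITICAL_DF9}")
```

Comparing |t| with the tabulated critical value is equivalent to checking whether P < 0.05 in step 8.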
23
Unpaired t-test
$t = (\bar{x}_1 - \bar{x}_2)/s_{md}$
$s_{md}$ is the estimated standard error of the difference between the 2 sample means
$s_{md} = \sqrt{\frac{n_1+n_2}{n_1 n_2} \cdot \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}}$
$s_1$ and $s_2$ are the SDs of the 2 samples (so $s_1^2$ and $s_2^2$ are their variances)
$n_1$ and $n_2$ are the respective sample sizes
25
$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{n_1+n_2}{n_1 n_2} \cdot \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}}}$
with $(n_1 + n_2 - 2)$ degrees of freedom
26
In an experiment to know whether there is any difference in the length of the small intestines between males and females, the observations recorded were as follows:
                                       Males    Females
No. of observations                    17       15
Mean length of the small intestines    157      146
SD of the observations                 34       31
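The unpaired t formula can be applied to these summary figures as in the sketch below. The function name `unpaired_t` is hypothetical; the critical value 2.042 is the standard two-sided 5% t-table entry for 30 degrees of freedom.

```python
import math

def unpaired_t(mean1, sd1, n1, mean2, sd2, n2):
    """Pooled-variance t statistic from summary statistics of 2 samples."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    # Estimated SE of the difference between the 2 sample means
    s_md = math.sqrt((n1 + n2) / (n1 * n2) * pooled_var)
    return (mean1 - mean2) / s_md, n1 + n2 - 2

# Males: n = 17, mean = 157, SD = 34; females: n = 15, mean = 146, SD = 31
t, df = unpaired_t(157, 34, 17, 146, 31, 15)
T_CRITICAL_DF30 = 2.042  # two-sided 5% point of t with 30 df, from a t-table
print(f"t = {t:.3f} with {df} df; significant at 5%: {abs(t) > T_CRITICAL_DF30}")
```

Here t is well below the tabulated value, so on these figures the difference in mean length would not be judged significant at the 5% level.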
27
ANOVA
Assumptions:
•the effects under different groups are additive
•the experimental errors are normally distributed
•the experimental errors are independently distributed
•the variance is the same in the various groups
28
One way ANOVA
1)Calculate the sum of observations of each group (village) = $T_i$
2)Calculate the sum of all observations $\sum x_{ij}$, where $x_{ij}$ represents each observation
3)Calculate the square of the sum of all observations = $(\sum x_{ij})^2$
29
4)Calculate the total sum of squares, ie $\sum x_{ij}^2 - (\sum x_{ij})^2/n$; n = total no. of observations.
$(\sum x_{ij})^2/n$ is called the correction factor (CF)
5)Calculate the ‘sum of squares between groups’, ie $\sum_i (T_i^2/k_i) - CF$; $T_i$ = sum of observations in each group; $k_i$ = no. of observations in each group
30
6)The sum of squares within groups is obtained as the difference between the ‘total sum of squares’ and the ‘sum of squares between villages’, ie (total sum of squares) - (sum of squares between villages)
ANALYSIS OF VARIANCE TABLE
Source of sum of squares    Degrees of freedom    Sum of squares    Mean sum of squares    F
Between villages            3                     349.525           116.5083               4.544
Within villages             36                    922.950           25.6375
Total                       39                    1272.475
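The one way ANOVA steps can be sketched as follows. The village-wise observations behind the table above are not included in this transcript, so the three groups below are small made-up values used only to show the arithmetic; `one_way_anova` is a hypothetical helper name.

```python
def one_way_anova(groups):
    """Sums of squares and F ratio following the correction-factor method."""
    all_obs = [x for g in groups for x in g]
    n = len(all_obs)
    cf = sum(all_obs) ** 2 / n                            # correction factor
    ss_total = sum(x * x for x in all_obs) - cf           # total sum of squares
    ss_between = sum(sum(g) ** 2 / len(g) for g in groups) - cf
    ss_within = ss_total - ss_between                     # obtained by difference
    df_between, df_within = len(groups) - 1, n - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, f_ratio

# Hypothetical observations for 3 groups (illustration only)
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
ss_between, ss_within, f_ratio = one_way_anova(groups)
print(f"between = {ss_between:.1f}, within = {ss_within:.1f}, F = {f_ratio:.1f}")

# The F in the table above is the same ratio of mean sums of squares
f_table = (349.525 / 3) / (922.950 / 36)
print(f"F from the table's sums of squares = {f_table:.3f}")
```

The last two lines only re-derive the tabulated F from the tabulated sums of squares and degrees of freedom.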
33
Two way ANOVA
•Used to study the differences within the sub-classifications of the groups as well.
34
Antenatal period    Villages                                                              Total of each
(trimester)         1      2      3      4      5      6      7      8      9      10     trimester (R_i)
I                   11.5   19.5   18.5   12.5   18.5   18.5   26.5   18.5   16.0   24.5   182.5
II                  27.0   28.0   22.0   21.0   15.0   15.0   20.0   26.0   30.0   28.5   237.0
III                 28.0   30.0   26.0   30.0   24.5   28.5   26.0   30.0   27.0   25.5   275.5
Total of village    66.5   77.5   66.5   63.5   58.0   64.5   72.5   74.5   73.0   78.5   695.0
35
Analysis of variance table
Source of SS          df    Sum of squares    MSS       F
Between villages      9     134.17            14.908    0.855
Between trimesters    2     436.72            218.36    12.526
Residual              18    313.78            17.432
Total                 29    884.67
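The two way analysis can be reproduced from the trimester-by-village data with the sketch below. One caution: the printed village 6 entries for trimesters I and II (18.5 and 15.0) do not add up to the printed row and column totals. The sketch assumes 16.5 and 19.5 instead, because those values reproduce every printed total and the sums of squares in the analysis of variance table; that substitution is an inference, not something the slides state.

```python
# Rows = trimesters I, II, III; columns = villages 1..10.
# Village 6 in rows I and II uses the assumed values 16.5 and 19.5 (see note).
rows = [
    [11.5, 19.5, 18.5, 12.5, 18.5, 16.5, 26.5, 18.5, 16.0, 24.5],
    [27.0, 28.0, 22.0, 21.0, 15.0, 19.5, 20.0, 26.0, 30.0, 28.5],
    [28.0, 30.0, 26.0, 30.0, 24.5, 28.5, 26.0, 30.0, 27.0, 25.5],
]
r, c = len(rows), len(rows[0])
n = r * c

grand_total = sum(sum(row) for row in rows)
cf = grand_total ** 2 / n                                   # correction factor

ss_total = sum(x * x for row in rows for x in row) - cf
ss_trimesters = sum(sum(row) ** 2 for row in rows) / c - cf
ss_villages = sum(sum(col) ** 2 for col in zip(*rows)) / r - cf
ss_residual = ss_total - ss_trimesters - ss_villages        # by difference

ms_residual = ss_residual / ((r - 1) * (c - 1))
f_trimesters = (ss_trimesters / (r - 1)) / ms_residual
f_villages = (ss_villages / (c - 1)) / ms_residual

print(f"SS villages = {ss_villages:.2f}, SS trimesters = {ss_trimesters:.2f}")
print(f"SS residual = {ss_residual:.2f}, SS total = {ss_total:.2f}")
print(f"F villages = {f_villages:.3f}, F trimesters = {f_trimesters:.3f}")
```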
36
NON-PARAMETRIC TESTS-
Assumptions
•The observations need not follow a normal distribution
•The populations need not have the same variances
•The observations may or may not be independent
•The population parameters involved are the median and mode
•Data may be nominally or ordinally scaled
Example: nominal (male/female); ordinal (good-better-best)
38
NON-PARAMETRIC TESTS…
•Chi-square test
•Fisher’s exact test
•The Sign test
•Wilcoxon’s Signed Rank test
•Mann Whitney U test
•Kruskal- Wallis H test
•Friedman test
•McNemar test
39
Chi-square test
•$\chi^2 = \sum \frac{(\text{observed frequency} - \text{expected frequency})^2}{\text{expected frequency}}$
•This ratio is calculated for each cell in the table & the sum of these ratios is the total chi-square value.
•df = (c-1)(r-1), where c and r are the numbers of columns and rows
40
Blood group    Non leprosy    Lepromatous leprosy    Non lepromatous leprosy    Total
A              30             49                     52                         131
B              60             49                     36                         145
O              47             59                     48                         154
AB             13             12                     16                         41
Total          150            169                    152                        471
41
Blood group    Non leprosy              Lepromatous leprosy      Non lepromatous leprosy
A              (131/471)x150 = 41.7     (131/471)x169 = 47.0     (131/471)x152 = 42.3
B              (145/471)x150 = 46.2     (145/471)x169 = 52.0     (145/471)x152 = 46.8
O              (154/471)x150 = 49.0     (154/471)x169 = 55.3     (154/471)x152 = 49.7
AB             (41/471)x150 = 13.1      (41/471)x169 = 14.7      (41/471)x152 = 13.2
42
Blood group    Non leprosy    Lepromatous leprosy    Non lepromatous leprosy    Total
A              3.28           0.09                   2.22                       5.59
B              4.12           0.17                   2.49                       6.78
O              0.08           0.25                   0.06                       0.39
AB             0.00           0.50                   0.59                       1.09
Total          7.48           1.01                   5.36                       13.85
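The three tables above (observed frequencies, expected frequencies, and cell-wise contributions) correspond to the computation sketched below. The function name `chi_square` is hypothetical. Because the sketch keeps the expected frequencies unrounded, its total differs slightly from the 13.85 obtained with one-decimal expected values. The 12.59 cut-off is the standard 5% chi-square table entry for 6 degrees of freedom.

```python
def chi_square(observed):
    """Chi-square statistic and df for an r x c table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total   # expected frequency
            chi2 += (o - e) ** 2 / e
    df = (len(col_totals) - 1) * (len(row_totals) - 1)
    return chi2, df

# Rows: blood groups A, B, O, AB
# Columns: non leprosy, lepromatous leprosy, non lepromatous leprosy
observed = [
    [30, 49, 52],
    [60, 49, 36],
    [47, 59, 48],
    [13, 12, 16],
]
chi2, df = chi_square(observed)
CHI2_CRITICAL_DF6 = 12.59  # 5% point of chi-square with 6 df, from a table
print(f"chi-square = {chi2:.2f} with {df} df; "
      f"significant at 5%: {chi2 > CHI2_CRITICAL_DF6}")
```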
43
Chi-square test for a 2x2 table
•$\chi^2 = \frac{(ad-bc)^2 \times G}{(a+b)(c+d)(a+c)(b+d)}$
•a, b, c & d = cell frequencies in the 2x2 table; G = grand total
•df = (2-1)(2-1) = 1
44
Filariasis infestation    Male    Female    Total
Yes                       28      20        48
No                        237     222       459
Total                     265     242       507
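The 2x2 shortcut formula can be applied to the filariasis table as in this sketch. The function name `chi_square_2x2` is hypothetical; 3.84 is the standard 5% chi-square table entry for 1 degree of freedom.

```python
def chi_square_2x2(a, b, c, d):
    """Chi-square for a 2x2 table [[a, b], [c, d]] using the shortcut formula."""
    g = a + b + c + d                      # grand total
    return (a * d - b * c) ** 2 * g / ((a + b) * (c + d) * (a + c) * (b + d))

# Filariasis: yes row = (28 males, 20 females); no row = (237 males, 222 females)
chi2 = chi_square_2x2(28, 20, 237, 222)
CHI2_CRITICAL_DF1 = 3.84  # 5% point of chi-square with 1 df, from a table
print(f"chi-square = {chi2:.3f}; significant at 5%: {chi2 > CHI2_CRITICAL_DF1}")
```

On these counts the statistic falls below the tabulated value, so the difference between males and females would not be judged significant at the 5% level.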
45
Fisher’s Exact probability test
$P = \frac{(a+c)!\,(b+d)!\,(a+b)!\,(c+d)!}{n!\,a!\,b!\,c!\,d!}$
•a, b, c & d = cell frequencies in 2x2 table
•n = total number of observations
46
Habit                        Boys    Girls    Total
Exercise regularly           2       8        10
Do not exercise regularly    10      4        14
Totals                       12      12       24
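The exact probability formula can be evaluated for the exercise table as sketched below. The formula gives the probability of one particular table with fixed margins. Summing it over the observed table and the more extreme tables in the same direction to get a one-tailed P is the usual next step, but the slides do not show it, so it appears here as an assumption. Both function names are hypothetical.

```python
from math import factorial

def fisher_table_probability(a, b, c, d):
    """Exact probability of one 2x2 table [[a, b], [c, d]] with fixed margins."""
    n = a + b + c + d
    numerator = (factorial(a + c) * factorial(b + d)
                 * factorial(a + b) * factorial(c + d))
    denominator = (factorial(n) * factorial(a) * factorial(b)
                   * factorial(c) * factorial(d))
    return numerator / denominator

def fisher_lower_tail(a, b, c, d):
    """Sum the probabilities of tables whose first cell is <= a (assumed tail)."""
    total = 0.0
    for x in range(a + 1):
        shift = a - x                       # move counts while keeping margins
        bb, cc, dd = b + shift, c + shift, d - shift
        if min(bb, cc, dd) >= 0:
            total += fisher_table_probability(x, bb, cc, dd)
    return total

# Exercise regularly: 2 boys, 8 girls; do not exercise regularly: 10 boys, 4 girls
p_table = fisher_table_probability(2, 8, 10, 4)
p_tail = fisher_lower_tail(2, 8, 10, 4)
print(f"P(observed table) = {p_table:.5f}, one-tailed P = {p_tail:.5f}")
```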
47
Sign test
•The significance of the difference can be tested using the usual $\chi^2$ test, which is as follows:
•$\chi^2 = (|a-b| - 1)^2 / n$ with 1 df
•a & b = no. of (+) & (-) respectively
•n = (a+b)
48
•All ‘0’ differences, ie cases where the 2 tests are identical in their results, are omitted from the calculation so that ‘n’ is always equal to (a+b).
•P is obtained from the $\chi^2$ distribution table & the conclusion about the significance is made as usual
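A minimal sketch of the sign test calculation follows. The list of differences is invented for illustration (12 positive, 4 negative, 2 zero) and `sign_test_chi2` is a hypothetical helper name; 3.84 is the standard 5% chi-square table entry for 1 degree of freedom.

```python
def sign_test_chi2(differences):
    """Chi-square form of the sign test; zero differences are dropped."""
    a = sum(1 for d in differences if d > 0)   # number of (+) signs
    b = sum(1 for d in differences if d < 0)   # number of (-) signs
    n = a + b                                  # zeros are omitted
    return (abs(a - b) - 1) ** 2 / n, a, b

# Hypothetical paired differences: 12 positive, 4 negative, 2 ties
differences = [1] * 12 + [-1] * 4 + [0] * 2
chi2, a, b = sign_test_chi2(differences)
CHI2_CRITICAL_DF1 = 3.84  # 5% point of chi-square with 1 df, from a table
print(f"a = {a}, b = {b}, chi-square = {chi2:.4f}; "
      f"significant at 5%: {chi2 > CHI2_CRITICAL_DF1}")
```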
49
Wilcoxon’s Signed Rank test
•This is useful in testing the significance of differences in paired observations, when the data are quantitative in nature.
51
Kruskal Wallis H test
•This is the non-parametric equivalent of the F test used in one way ANOVA. This test is used to test whether or not the groups of independent samples have been drawn from the same population. It is calculated using the formula:
•$H = \frac{12}{n(n+1)} \sum_i \frac{R_i^2}{n_i} - 3(n+1)$
56
Group 1          Group 2          Group 3
Score    Rank    Score    Rank    Score    Rank
10       1       11       2       30       20
12       3       14       4       32       22
15       5       22       12      24       14
17       7       28       18      31       21
23       13      20       10      26       16
18       8       16       6       34       24
19       9       29       19      27       17
                 21       11      33       23
                                  25       15
$n_1 = 7$, $R_1 = 46$; $n_2 = 8$, $R_2 = 82$; $n_3 = 9$, $R_3 = 172$
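The H statistic for the three groups above can be computed as in this sketch, which ranks the pooled scores and then applies the formula. It assumes there are no tied scores (true for this data); `kruskal_wallis_h` is a hypothetical helper name, and 5.99 is the standard 5% chi-square table entry for 2 degrees of freedom.

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H from raw scores; assumes no tied scores."""
    pooled = sorted(x for g in groups for x in g)
    rank = {x: i + 1 for i, x in enumerate(pooled)}   # rank 1 = smallest score
    n = len(pooled)
    rank_sums = [sum(rank[x] for x in g) for g in groups]
    h = (12 / (n * (n + 1))
         * sum(r * r / len(g) for r, g in zip(rank_sums, groups))
         - 3 * (n + 1))
    return h, rank_sums

groups = [
    [10, 12, 15, 17, 23, 18, 19],
    [11, 14, 22, 28, 20, 16, 29, 21],
    [30, 32, 24, 31, 26, 34, 27, 33, 25],
]
h, rank_sums = kruskal_wallis_h(groups)
CHI2_CRITICAL_DF2 = 5.99  # 5% point of chi-square with 2 df, from a table
print(f"rank sums = {rank_sums}, H = {h:.3f}; "
      f"significant at 5%: {h > CHI2_CRITICAL_DF2}")
```

H is referred to the chi-square table with (number of groups - 1) degrees of freedom, which is why the 2 df entry is used here.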
57
Friedman Test
•This is the non-parametric equivalent of two way ANOVA, where each group may have further sub-divisions. df = k-1
•The formula used is as follows:
•$\chi^2 = \frac{12 \sum_j R_j^2}{nk(k+1)} - 3n(k+1)$
•$R_j$ = sum of ranks in column j
•n = no. of rows
•k = no. of columns
58
Group of        Need 1           Need 2           Need 3           Need 4
care givers     Score    Rank    Score    Rank    Score    Rank    Score    Rank
Group I         13       2       9        4       10       3       16       1
Group II        10       2       4        3       3        4       12       1
Group III       12       1       6        3       2        4       10       2
Rank sums                5                10               11               4
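The Friedman statistic for the care-giver table can be sketched as below. Following the table, rank 1 is given to the highest score within each row; the sketch assumes no tied scores within a row. `friedman_chi2` is a hypothetical helper name, and 7.81 is the standard 5% chi-square table entry for 3 degrees of freedom.

```python
def friedman_chi2(scores):
    """Friedman chi-square; rows = groups, columns = conditions, no ties."""
    n, k = len(scores), len(scores[0])
    rank_sums = [0] * k
    for row in scores:
        # Rank within the row, rank 1 = highest score (as in the table)
        order = sorted(range(k), key=lambda j: row[j], reverse=True)
        for rank, j in enumerate(order, start=1):
            rank_sums[j] += rank
    chi2 = 12 * sum(r * r for r in rank_sums) / (n * k * (k + 1)) - 3 * n * (k + 1)
    return chi2, rank_sums

# Rows: care-giver groups I, II, III; columns: needs 1..4
scores = [
    [13, 9, 10, 16],
    [10, 4, 3, 12],
    [12, 6, 2, 10],
]
chi2, rank_sums = friedman_chi2(scores)
CHI2_CRITICAL_DF3 = 7.81  # 5% point of chi-square with 3 df, from a table
print(f"rank sums = {rank_sums}, chi-square = {chi2:.2f}; "
      f"significant at 5%: {chi2 > CHI2_CRITICAL_DF3}")
```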
59
McNemar test
•It is a type of 2x2 chi-square test. It is used to compare variables from matched pairs & uses information only from the discordant pairs (the variables are not independent).
•Reduces type 1 error
•$\chi^2 = (|f-g| - 1)^2 / (f+g)$
•f & g = numbers of the two kinds of discordant pairs
•df = 1
60
                                  Alcohol addiction (cases)
                                  Yes     No      Total
Alcohol addiction      Yes        10      30      40
(controls)             No         20      40      60
                       Total      30      70      100
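The McNemar formula uses only the two discordant cells of this table (30 and 20), as the sketch below shows. The function name `mcnemar_chi2` is hypothetical; 3.84 is the standard 5% chi-square table entry for 1 degree of freedom.

```python
def mcnemar_chi2(f, g):
    """McNemar chi-square (with continuity correction) from discordant pairs."""
    return (abs(f - g) - 1) ** 2 / (f + g)

# Discordant pairs from the table: control yes / case no = 30,
# control no / case yes = 20. The concordant cells (10 and 40) are not used.
chi2 = mcnemar_chi2(30, 20)
CHI2_CRITICAL_DF1 = 3.84  # 5% point of chi-square with 1 df, from a table
print(f"chi-square = {chi2:.2f}; significant at 5%: {chi2 > CHI2_CRITICAL_DF1}")
```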
61
Non-parametric tests: Advantages over
parametric tests
•Can be used to test the data measured on
nominal & ordinal scales
•Used when the observations are described in
terms of ranks or hierarchy
•Easier to compute, understand & explain
•Make fewer assumptions about distribution of
samples
•Can be used even for very small sample sizes
62
Advantages…
•Need not involve population parameters, ie can be used to analyze data from populations where the mean & standard deviation are not available or cannot be determined
•Results may be as exact as parametric
procedures when used even on populations
following normal distribution
63
Advantages…
•Have applications in the field of sociology &
educational statistics where the socio-
economic data are not normally distributed
•The computational burden is so light that they
are known as “short cut” methods
64
Non-parametric tests: Disadvantages
•Can test the statistical hypothesis but cannot
estimate the parameter
•Low power of tests
•Difficult to compute by hand for large samples
•Tables are not widely available
65
Disadvantages…
•Waste of time & data if all assumptions of a
statistical model are satisfied & if there is a
suitable parametric test
66
NON-PARAMETRIC COMPLEMENTS OF PARAMETRIC TESTS
PARAMETRIC TESTS    NON-PARAMETRIC TESTS
Paired t test       Wilcoxon’s signed rank test
Unpaired t test     Mann Whitney U test
One way ANOVA       Kruskal Wallis test
Two way ANOVA       Friedman test
67
References
1] Rao NSN, Murthy NS. Applied statistics in health sciences. 1st ed. New Delhi: Jaypee Brothers Medical Publishers (P) Ltd; 2008.
2] Dixit JV. Principles and practice of biostatistics. 3rd ed. Jabalpur: M/S Banarsidas Bhanot Publishers; 2005.
68
3] Hennekens CH, Buring JE. Epidemiology in medicine. 1st ed. Boston, Toronto: Little Brown & Company.
4] Altman DG. Practical statistics for medical research. UK: Chapman & Hall/CRC; 1990.
5] Last JM, editor. A dictionary of epidemiology. Toronto: Oxford University Press; 1983.
69
6] Maxwell FP. A-Z of medical statistics: a companion for critical appraisal. 2nd ed. Hodder Arnold; 2008.
7] Health research methodology: A guide for training in research methods. 1st ed. Manila: World Health Organization; 1992.
70
9] Driscoll P, Lecky F. An introduction to hypothesis testing. Parametric comparison of two groups-1. Emerg Med J 2001; 18:124-130.
10] Driscoll P, Lecky F. An introduction to hypothesis testing. Non-parametric comparison of two groups-1. Emerg Med J 2001; 18:276-282.
71