Public health and Epidemiology sample size estimation
Sujit72
40 slides
Aug 14, 2024
About This Presentation
Sample size estimation for public health personnel; useful for all students
Size: 149.23 KB
Language: en
Slide Content
Sample Size Estimation
Adapted from slides developed by the World Health Organization
Objectives
•Understand the relationship between sample size and
power
•Determine sample sizes necessary to achieve a given level
of power for estimating a simple proportion, and other
measures of effect
Steps in Estimating Sample Size
•Identify major study variable
•Determine type of estimate (%, mean, ratio,...)
•Indicate expected frequency of factor of interest
•Decide on desired precision of the estimate
•Decide on acceptable risk that estimate will fall outside its real
population value
•Adjust for population size
•Adjust for estimated design effect
•Adjust for expected response rate
Why are Sample Size & Power Important?
•Sample size and the resulting statistical power are essential for
evaluating the role of chance as an alternative
explanation of study findings
•If a study has an inadequate sample size, then a result
without a statistically significant association between
exposure & disease is uninformative
•A true lack of association will be difficult or impossible to
distinguish from a true association that cannot be detected
statistically because of inadequate power
Sample Size
•How reliable the final estimate should be
•Reliability based on accuracy and completeness
•Trade off between ideal sample size and survey cost
•Sample sufficient to accomplish the purpose but no more than
necessary
SAMPLE SIZE
Depending on:
1) variability in the target population. If unknown, assume
maximum variability
2) desired precision in the estimate
3) desired confidence in the estimate
4) feasibility
In most cases the required sample size is independent of the size of the original
population
α and Confidence Level
• The significance level of a test (α): the probability of
rejecting the null hypothesis when it is true (or the
probability of making a Type I error).
•Confidence level: The probability that an estimate of a
population parameter is within certain specified limits of
the true value; commonly denoted by “1-α”.
Power and β
•Power: The probability of correctly rejecting the null
hypothesis when it is false; commonly denoted by “1-β”.
• β: The probability of failing to reject the null hypothesis
when it is false (or the probability of making a Type II error).
Precision
A measure of how close an estimate is to the true value of a
population parameter. It may be expressed in absolute
terms or relative to the estimate.
SAMPLE SIZE
Sample Size Required for Estimating Population Mean
•The objective in interval estimation is to obtain narrow intervals
with high reliability
•The width of the interval is determined by the magnitude of the
quantity
(reliability coefficient, z) × (standard error, σ/√n)
Sample Size Required for Estimating Population
Mean
•If we fix z, the only way to reduce the width of the interval is
to reduce the standard error σ/√n
•Since the standard error is equal to σ/√n, and since σ is a constant,
the only way to obtain a small standard error is to take a large
sample
•How large a sample? That depends on the size of σ, the desired degree of
reliability, and the desired interval width
Sample Size Required for Estimating Population
Mean
•Suppose we want an interval that extends d units on either side of
the estimator
d = (reliability coefficient) × (standard error)
•If sampling is with replacement, from a population sufficiently
large to warrant ignoring the finite population correction, the
equation is:
$$d = z \, \frac{\sigma}{\sqrt{n}}$$
•When solved for n, this gives:
$$n = \frac{z^2 \sigma^2}{d^2}$$
Example 1 (1/3)
What Sample Size Do I Need If . . . ?
•I want to estimate the true immunization coverage in a community
of school children
•Previous studies tell us that immunization coverage should be
somewhere around 80%
•Precision (absolute): we’d like the result to be within 4% of the
true value
•Confidence level: conventional = 95% = 1 - α; therefore, α = 0.05
and z(1-α/2) = 1.96 = value of the standard normal distribution
corresponding to a significance level of α (1.96 for a 2-sided test
at the 0.05 level)
Example 1 (2/3)
•d = absolute precision = 0.04
•p = expected proportion in the population = 0.80
•z(1-α/2) = 1.96 = value of the standard normal distribution
corresponding to a significance level of α (1.96 for a 2-sided
test at the 0.05 level)
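Plugging these values into the standard single-proportion formula n = z²·p·(1-p)/d² (the same form this deck uses for proportions) can be sketched as follows. The function name is an illustrative assumption, not something from the slides.

```python
import math


def n_for_proportion(p, d, z=1.96):
    """Sample size to estimate a proportion p within +/- d (absolute precision).

    Assumes simple random sampling from a large population
    (no finite population correction, no design effect).
    """
    return (z ** 2) * p * (1 - p) / (d ** 2)


# Example 1 inputs: expected coverage 80%, absolute precision 4 points.
n = n_for_proportion(p=0.80, d=0.04)
print(round(n, 2))   # raw value, about 384.16
print(math.ceil(n))  # rounded up to a whole number of children
```

The deck quotes this figure as 384 (rounded to the nearest integer); rounding up, the more conservative choice, gives 385.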
Sample Size Required for Estimating Population Mean
•The formulas for sample size require knowledge of σ².
However, as a rule,
the population variance is unknown and has to be estimated:
–A pilot or preliminary sample. Observations used in the pilot can be
counted as part of the final sample
–Estimates may be available from previous studies
–If the population is thought to be approximately normally distributed,
we may use the fact that the range (R) is approximately equal to 6
standard deviations:
σ ≈ R/6
Example 1a (1/4)
What Sample Size Do I Need If . . . ?
•Previous studies tell us that there is 80% immunization
coverage in this village
•Relative Precision: relative difference between sample
coverage & true population coverage (we determine that we
want this to be +/-10% of the anticipated population
proportion or 80%)
•Confidence level: conventional α = 0.05; z(1-α/2) = 1.96 = value
of the standard normal distribution corresponding to a
significance level of α (1.96 for a 2-sided test at the 0.05
level)
Example 1a (2/4)
• ε = relative precision = 10%
•p = expected proportion in the population = 0.80
•Design effect = 2
•z(1-α/2) = 1.96 = value of the standard normal distribution
corresponding to a significance level of α (1.96 for a 2-sided
test at the 0.05 level)
Example 1a (3/4)
Sample Size
For a relative precision of 10%
$$n = \frac{z^2 \, p \, (1-p)}{(\varepsilon \, p)^2} = \frac{(1.96)^2 (0.80)(0.20)}{(0.10 \times 0.80)^2} = 96$$
Example 1a (4/4)
Sample Size
For a relative precision of 5%
$$n = \frac{z^2 \, p \, (1-p)}{(\varepsilon \, p)^2} = \frac{(1.96)^2 (0.80)(0.20)}{(0.05 \times 0.80)^2} = 384$$
Note: for p = 0.80, this is the same as an absolute
precision of 0.04
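As a quick check of the two relative-precision results, here is a minimal sketch of the formula n = z²·p·(1-p)/(ε·p)². The function name and the symbol `eps` for relative precision are illustrative choices.

```python
def n_relative_precision(p, eps, z=1.96):
    """Sample size for a proportion p estimated within +/- eps * p.

    eps is the relative precision (0.10 means within 10% of p).
    Assumes simple random sampling from a large population.
    """
    return (z ** 2) * p * (1 - p) / ((eps * p) ** 2)


for eps in (0.10, 0.05):
    n = n_relative_precision(p=0.80, eps=eps)
    print(f"relative precision {eps:.0%}: n = {n:.2f}")
```

Halving the relative precision from 10% to 5% quadruples the sample size (about 96 versus about 384), because precision enters the formula squared.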
Sample Size Required for Estimating Population Mean
•Example: A health department nutritionist, wishing to conduct a survey
among a population of teenage girls to determine the average daily protein
intake, is seeking advice on the size of sample that should be taken.
What information is needed to estimate the sample size?
•The nutritionist must provide three items of information: the desired width
of the confidence interval, the level of confidence desired, and the
magnitude of the population variance
Sample Size Required for Estimating Population Mean
•Solution: The nutritionist would like an interval about 10 grams wide; that
is, the estimate should be within about 5 grams of the true value in either
direction. A confidence coefficient of .95 is decided on and, from past
experience, the nutritionist feels that the population standard deviation is
probably about 20 grams.
•Summarizing the information: z = 1.96, σ = 20, and d = 5
•Calculation:
$$n = \frac{(1.96)^2 (20)^2}{(5)^2} = 61.47$$
•Since n must be a whole number, round up to 62
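The nutritionist calculation can be reproduced with a short sketch of n = z²σ²/d². The helper names are illustrative, and `sigma_from_range` simply encodes the R/6 rule of thumb mentioned earlier for roughly normal populations.

```python
import math


def n_for_mean(sigma, d, z=1.96):
    """Sample size to estimate a mean within +/- d, given population SD sigma.

    Assumes sampling from a population large enough to ignore the
    finite population correction.
    """
    return (z * sigma / d) ** 2


def sigma_from_range(value_range):
    """Rough SD guess for an approximately normal population: range / 6."""
    return value_range / 6


n = n_for_mean(sigma=20, d=5)
print(round(n, 2))   # raw value from the formula
print(math.ceil(n))  # whole subjects, rounded up
```

Rounding up rather than to the nearest integer keeps the interval at least as narrow as requested.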
Sample Size Required for Estimating Proportions
•The formula requires the knowledge of p, the proportion in the
population possessing the characteristic of interest. However,
this is what we are trying to estimate and is unknown
–A pilot or preliminary sample. Observations used in the pilot can be
counted as part of the final sample
–Estimates may be available from previous studies and the upper
bound of p can be used in the formula
–If it is impossible to come up with a better estimate, set p = .5 in the formula
to yield the maximum value of n
Sample Size Required for Estimating Proportions
The method is essentially the same as for the population mean. Assuming random sampling
and approximate normality in the distribution of p, we arrive at the formula for n when
sampling is with replacement, from a population sufficiently large to warrant ignoring
the finite population correction:
$$n = \frac{z^2 p q}{d^2}$$
where q = 1 - p
If the finite population correction cannot be disregarded:
$$n = \frac{N z^2 p q}{d^2 (N-1) + z^2 p q}$$
When n/N < .05 the finite population correction can be ignored
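To see how much the finite-population version changes the answer, here is a minimal sketch of both formulas. The function names are illustrative, and the inputs (p = 0.5, d = 0.05) are assumed values chosen for maximum variability, not taken from a slide.

```python
def n_infinite(p, d, z=1.96):
    """n = z^2 * p * q / d^2 for a population treated as effectively infinite."""
    return (z ** 2) * p * (1 - p) / (d ** 2)


def n_finite(N, p, d, z=1.96):
    """n = N * z^2 * p * q / (d^2 * (N - 1) + z^2 * p * q) for a population of size N."""
    zpq = (z ** 2) * p * (1 - p)
    return N * zpq / ((d ** 2) * (N - 1) + zpq)


print(f"no correction: {n_infinite(0.5, 0.05):.1f}")
for N in (100_000, 10_000, 1_000):
    print(f"N = {N:>7}: {n_finite(N, 0.5, 0.05):.1f}")
```

For large N the two results nearly coincide, which is why the correction is usually ignored when n/N is small.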
Finite Population Correction
•FPC = (N - n) / (N - 1)
–N = population size
–n = sample size
•Can be ignored when sample size is small in comparison
with the population size
•Use when n / N ≥ .05
Finite Population Correction
N        n     n/N       FPC = (N-n)/(N-1)   n × FPC
100000   384   0.00384   0.996               383
50000    384   0.00768   0.992               381
20000    384   0.0192    0.981               377
10000    384   0.0384    0.962               369
5000     384   0.0768    0.923               355
1000     384   0.384     0.617               237
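The table values can be regenerated with a few lines of Python. This sketch assumes the last column is n multiplied by the FPC and rounded to the nearest integer, a reading inferred from the numbers themselves rather than stated on the slide.

```python
def fpc(N, n):
    """Finite population correction factor (N - n) / (N - 1)."""
    return (N - n) / (N - 1)


n = 384  # sample size before correction, as in the table
for N in (100000, 50000, 20000, 10000, 5000, 1000):
    factor = fpc(N, n)
    # Columns: N, n, sampling fraction, FPC, n scaled by the FPC
    print(f"{N:>6}  {n}  {n / N:.5f}  {factor:.3f}  {round(n * factor)}")
```

The correction barely matters until the sampling fraction n/N passes a few percent, consistent with the .05 threshold given above.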
Sample Size Required for Estimating Proportions
•Example: a survey is being planned to determine what proportion of
families in a certain area are medically indigent. It is believed that the
proportion cannot be greater than .35. A 95 percent confidence interval is
desired with d = .05. What sample size should be selected?
•Summary: z = 1.96, p = .35, and d = .05
$$n = \frac{(1.96)^2 (0.35)(1 - 0.35)}{(0.05)^2} = 349.6$$
•Round up to 350 families
Design Effect
•A bias in the variance introduced in the sampling design,
by selecting subjects whose results are not independent
from each other; relative change (increase) in the variance
due to the use of clusters.
•The design effect can be calculated after study completion,
but should be accounted for at the design stage.
–The design effect is 1 (i.e., no design effect) when taking
a simple random sample.
–The design effect varies using cluster sampling; it is
usually estimated that the design effect is 2 in
immunization cluster surveys.
Design Effect
Global variance
p(1-p)
Var srs = ----------
n
Cluster variance
p= global proportion
pi= proportion in each cluster
n= number of subjects
k= number of clusters
Σ (pi-p)²
Var clus = -------------
k(k-1)
variance (cluster)
Design effect = -----------------------
variance (srs)
Sample Size Formula in
Descriptive Survey, with Design Effect
z: alpha risk expressed as a z-score
p: expected prevalence
q: 1 - p
d: absolute precision
g: design effect
Simple random / systematic sampling
$$n = \frac{z^2 p q}{d^2} = \frac{1.96^2 \times 0.15 \times 0.85}{0.03^2} = 544$$
Cluster sampling
$$n = g \, \frac{z^2 p q}{d^2} = \frac{2 \times 1.96^2 \times 0.15 \times 0.85}{0.03^2} = 1088$$
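The two results above, plus the response-rate adjustment listed among the estimation steps, can be combined in one small function. Dividing by the expected response rate is a common convention and an assumption here, since the slides name the step but give no formula; the function and parameter names are also illustrative.

```python
import math


def survey_sample_size(p, d, z=1.96, design_effect=1.0, response_rate=1.0):
    """Descriptive-survey sample size with optional adjustments.

    Base:          n = z^2 * p * q / d^2   (simple random sampling)
    Design effect: multiply by g            (e.g. 2 for cluster surveys)
    Response rate: divide by expected rate  (assumed convention, not from the slides)
    """
    n = (z ** 2) * p * (1 - p) / (d ** 2)
    n *= design_effect
    n /= response_rate
    return n


print(round(survey_sample_size(0.15, 0.03)))                    # simple random sampling
print(round(survey_sample_size(0.15, 0.03, design_effect=2)))   # cluster sampling
print(math.ceil(survey_sample_size(0.15, 0.03, design_effect=2, response_rate=0.9)))
```

The first two calls reproduce the slide's 544 and 1088; the third shows how a 90% expected response rate (a hypothetical figure) would raise the target further.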
What You Need to Calculate Sample Size for
Analytic Studies
•Desired values for the probabilities of α and β
•The proportion of the baseline (controls or non-exposed)
population
–EXPOSED (for case-control studies), or
–DISEASED (for cohort/intervention studies)
–Often based on previous studies or reports
•Magnitude of the expected effect (RR, OR)
–Often based on previous studies or reports
–Minimum effect that investigator considers worth detecting
•Formula: different formulae depending on study design,
research question, and type of data
Example 2 (1/3)
What Sample Size Do I Need If . . . ?
•Cohort study of oral contraceptive (OC) use in relation to risk of MI
among women of childbearing age
•Previous studies
–Proportion of non-OC users who are diseased = 0.15
–Proportion of OC users who are diseased = 0.25
•Conventional α = 0.05 (two-sided)
•Conventional β = 0.20 (80% power to detect a difference if one truly
exists)
•Assume equal sample sizes (n1 = n2)
Example 2 (2/3)
•p0 = proportion of non-OC users who are diseased = 0.15
•p1 = proportion of OC users who are diseased = 0.25
•q0 = (1-p0) = 1.0 - 0.15 = 0.85
•q1 = (1-p1) = 1.0 - 0.25 = 0.75
•z(1-α/2) = 1.96 = value of the standard normal distribution
corresponding to a significance level of α (1.96 for a 2-sided
test at the 0.05 level)
•z(1-β) = 0.84 = value of the standard normal distribution
corresponding to the desired level of power (0.84 for a
power of 80%)
Example 2 (3/3)
$$n \text{ (each group)} = \frac{(p_0 q_0 + p_1 q_1)\,(z_{1-\alpha/2} + z_{1-\beta})^2}{(p_1 - p_0)^2}$$
$$= \frac{[(0.15)(0.85) + (0.25)(0.75)]\,(1.96 + 0.84)^2}{(0.25 - 0.15)^2} = \frac{(0.315)(7.84)}{0.01} = 246.96$$
Therefore: 247 OC users (and 247 non-OC users)
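The cohort calculation, and the link between sample size and power that the objectives mention, can be sketched as follows. `approx_power` inverts the same formula to give the approximate power for a chosen n; it is a rearrangement under the slide's normal approximation, and both function names are illustrative.

```python
from statistics import NormalDist


def n_per_group(p0, p1, z_alpha=1.96, z_beta=0.84):
    """Per-group n to compare two proportions (equal group sizes)."""
    variance_term = p0 * (1 - p0) + p1 * (1 - p1)
    return variance_term * (z_alpha + z_beta) ** 2 / (p1 - p0) ** 2


def approx_power(n, p0, p1, z_alpha=1.96):
    """Approximate power with n per group, from solving the formula for z_beta."""
    variance_term = p0 * (1 - p0) + p1 * (1 - p1)
    z_beta = abs(p1 - p0) * (n / variance_term) ** 0.5 - z_alpha
    return NormalDist().cdf(z_beta)


print(round(n_per_group(0.15, 0.25), 2))  # Example 2 inputs
for n in (100, 247, 400):
    print(n, round(approx_power(n, 0.15, 0.25), 2))
```

With these inputs, 247 per group gives power of about 0.80, and power rises with n.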
Example 3 (1/3)
What Size Sample Do I Need If . . . ?
•Case-control study of oral contraceptive (OC) use in relation to risk of
MI among women of childbearing age
•Previous studies: 10% of women use OCs
•OR of MI associated with current OC use = 1.8
•Conventional α = 0.05 (two-sided)
•Conventional β = 0.20 (80% power to detect a difference if one truly
exists)
•Assume equal sample sizes (n1 = n2)
Example 3 (2/3)
•p0 = proportion of controls who are current OC users = 0.10
•p1 = proportion of cases who are current OC users = 0.18
•q0 = (1-p0) = 1.0 - 0.10 = 0.90
•q1 = (1-p1) = 1.0 - 0.18 = 0.82
•z(1-α/2) = 1.96 = value of the standard normal distribution corresponding to
a significance level of α (1.96 for a 2-sided test at the 0.05 level)
•z(1-β) = 0.84 = value of the standard normal distribution corresponding to
the desired level of power (80%)
Example 3 (3/3)
$$n \text{ (each group)} = \frac{(p_0 q_0 + p_1 q_1)\,(z_{1-\alpha/2} + z_{1-\beta})^2}{(p_1 - p_0)^2}$$
$$= \frac{[(0.10)(0.90) + (0.18)(0.82)]\,(1.96 + 0.84)^2}{(0.18 - 0.10)^2} = \frac{(0.2376)(7.84)}{0.0064} = 291.06$$
Therefore: 291 cases and 291 controls
Sample Sizes: Case-Control Study of OC Use and MI
OR Required sample sizes
1.2 3834
1.3 1769
1.5 682
1.8 291
2.0 196
2.5 97
3.0 59
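The table can be regenerated from the Example 3 formula. The slide numbers are consistent with taking the exposure proportion among cases as p1 = p0 × OR, so this sketch does the same; that shortcut is inferred from the numbers, not stated in the slides. The exact conversion from an odds ratio, p1 = p0·OR / (1 + p0·(OR - 1)), is included for comparison. Function names are illustrative.

```python
def n_per_group(p0, p1, z_alpha=1.96, z_beta=0.84):
    """Per-group n to compare two proportions (equal group sizes)."""
    variance_term = p0 * (1 - p0) + p1 * (1 - p1)
    return variance_term * (z_alpha + z_beta) ** 2 / (p1 - p0) ** 2


def p1_shortcut(p0, odds_ratio):
    """Approximation implied by the slide values: p1 = p0 * OR."""
    return p0 * odds_ratio


def p1_exact(p0, odds_ratio):
    """Exact exposure proportion among cases for a given odds ratio."""
    return p0 * odds_ratio / (1 + p0 * (odds_ratio - 1))


p0 = 0.10  # proportion of controls who are current OC users
for odds_ratio in (1.2, 1.3, 1.5, 1.8, 2.0, 2.5, 3.0):
    n_slide = round(n_per_group(p0, p1_shortcut(p0, odds_ratio)))
    n_exact = round(n_per_group(p0, p1_exact(p0, odds_ratio)))
    print(f"OR {odds_ratio}: shortcut n = {n_slide}, exact-p1 n = {n_exact}")
```

The shortcut column matches the table. The exact conversion yields a smaller p1 and therefore a larger required n, so the tabulated sizes are best read as lower bounds.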
The 10% Rule
•Note that sample-size estimates should be interpreted as
providing merely a MINIMUM estimate of the sample sizes
necessary for the study
•The formula takes into account only the overall crude
association between exposure & disease; i.e., no
confounders are considered
•10% rule: increase the sample size 10% for each
confounder/variable added
What About Unequal Sample Sizes?
•This is easy to do; formula changes slightly
•n1 is the sample size for the first group
•k * n1 is the sample size of the second group (where k is
pre-specified, e.g., 2x, 3x as many controls as cases)
•Very easy to do in Epi Info
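The slides do not show the modified formula. One way to extend the equal-size formula used in Examples 2 and 3, offered here as an assumption rather than as what Epi Info computes (its method may differ, for example by applying a continuity correction), is to divide the second group's variance term by k:

```python
def n_first_group(p0, p1, k=1.0, z_alpha=1.96, z_beta=0.84):
    """Size n1 of the first group (e.g. cases, proportion p1) when the
    second group (e.g. controls, proportion p0) has k * n1 subjects.

    Derived from Var(p1_hat - p0_hat) = p1*q1/n1 + p0*q0/(k*n1);
    with k = 1 it reduces to the equal-size formula.
    """
    variance_term = p1 * (1 - p1) + p0 * (1 - p0) / k
    return variance_term * (z_alpha + z_beta) ** 2 / (p1 - p0) ** 2


# Example 3 inputs with 1, 2 and 3 controls per case
for k in (1, 2, 3):
    n1 = n_first_group(p0=0.10, p1=0.18, k=k)
    print(f"k = {k}: cases = {n1:.1f}, controls = {k * n1:.1f}")
```

Adding controls reduces the number of cases needed but raises the total number of subjects, which is usually worthwhile only when cases are scarce or costly to recruit.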