Statistical estimation and sample size determination

MikaPop 64 views 69 slides Jun 04, 2024

About This Presentation

Sample size
Mean and proportion


Slide Content

Statistical estimation and sample size
determination
By Derara Girma (BSC., MPH/Epi.)

Course outline
•Parameter, statistic, inference, and estimation
•Point and interval estimations
•Construction and interpretation of confidence interval for a
single proportion
•Sample size determination
30/05/2024 2

Learning outcome
•At the end of this chapter students should be able to:
•Define parameter, statistic, inference, and estimation
•Identify the point and interval estimations used to make inferences
•Construct and interpret confidence interval for a single proportion
•Determine sample sizes

Definition
•A statistic is a characteristic or measure obtained by using the
data values from a sample.
•A parameter is a characteristic or measure obtained by using
all the data values from a specific population.

Definition…
•Sample statistic:
•Sample mean (x̄)
•Sample variance (S²)
•Sample standard deviation (SD)
•Sample proportion (p̂)
•Population parameter:
•Population mean (μ)
•Population variance (σ²)
•Population standard deviation (σ)
•Population proportion (P or π)

Definition…
•Statistical inference is the procedure by which we reach a
conclusion about a population on the basis of the information
contained in a sample drawn from that population.
•Methods of inference usually fall into one of the two broad
categories:
•Estimation and Hypothesis testing.

Definition…
•Example:
•An administrator of a large hospital is interested in the mean age of
patients admitted to the hospital during a given year.
•It would be too expensive to go through the records of all patients
admitted during that particular year.
•He therefore chooses to examine a sample of the records, from
which he can compute an estimate of the mean age of patients
admitted to the hospital that year.

Definition…
•Estimation is concerned with estimating the values of specific
population parameters based on sample statistics.
•It is using sample data to make estimates about population parameters
•The true population parameter value is usually unknown
•The statistic itself is called an estimator and can be of two
types: point or interval.
•The value or values that the estimator assumes are called
estimates.

Definition…
•The estimate is a single computed value, but the estimator is
the rule that tells us how to compute this value, or estimate.
•Estimators are usually presented as formulas.
•For example, the sample mean,
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
•is an estimator of the population mean, μ

Definition…
•The sample mean x̄, calculated using data in a sample of size n, is a
point estimator of the population mean μ.
•If x̄ = 10, the value 10 is called a point estimate of the population
mean μ.
•The sample mean (x̄) is an unbiased estimator of the population mean μ:
E(x̄) = µ

Three Properties of a Good Estimator
1.The estimator should be an unbiased estimator.
•That is, the expected value or the mean of the estimates obtained from samples
of a given size is equal to the parameter being estimated.
2.The estimator should be consistent.
•For a consistent estimator, as sample size increases, the value of the estimator
approaches the value of the parameter estimated.
3.The estimator should be a relatively efficient estimator.
•That is, of all the statistics that can be used to estimate a parameter, the
relatively efficient estimator has the smallest variance.
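The properties above can be illustrated with a small simulation. This is only a sketch under assumed values: the population (normal, mean 50, standard deviation 10), the function name `simulate_sample_means`, and the seed are hypothetical choices, not part of the original slides. The average of the sample means stays near μ (unbiasedness) and their spread shrinks as n grows (consistency).

```python
import random
import statistics


def simulate_sample_means(mu, sigma, n, repetitions, seed=0):
    """Draw `repetitions` samples of size n from Normal(mu, sigma); return their means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(repetitions)
    ]


# Hypothetical population: mean 50, standard deviation 10.
for n in (5, 50, 500):
    means = simulate_sample_means(mu=50, sigma=10, n=n, repetitions=1000)
    # Average of the estimates (should be close to 50) and their spread.
    print(n, round(statistics.fmean(means), 2), round(statistics.stdev(means), 2))
```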

Methods of estimation
•There are two methods of estimation:
•Point estimation
•Interval estimation
•Point estimation involves the calculation of a single value to
estimate the population parameter.
•Interval estimation specifies a range of values assumed to
include the population parameter.

Point estimation
•A point estimate is a single numerical value used to estimate
the corresponding population parameter.
•A point estimate of some population parameter θ is a single
value θ̂ of a sample statistic.
•To each sample statistic there corresponds a population
parameter.

Point estimation…
•Point estimator: single best guess
•It has the form: [ Value ]

Interval estimation
•A point estimate does not give any indication of how far away
from the parameter it lies.
•A more useful method of estimation is to compute an interval
which has a high probability of containing the parameter.
•An interval estimate is a statement that a population parameter
has a value lying between two specified limits.

Interval estimation…
•Interval estimator: it has the form of a "range of plausible
values"
•It has the form: [lower limit, upper limit]

Interval estimation…
•It specifies a range of reasonable values for the population
parameter based on a point estimate.
•An interval estimate provides more information about a
population characteristic than a point estimate.
•Such interval estimates are called Confidence Intervals (CI)

Interval estimation…
•A Confidence Interval: Tells about variability
•Gives information about closeness to unknown population parameters
• Stated in terms of level of confidence
•Never 100% sure

Confidence interval (CI)
•CIs also give information about the precision of an estimate.
•How much uncertainty is associated with a point estimate of a
population parameter?
•When sampling variability is high, the CI will be wide to reflect
the uncertainty of the observation.
•Wider CIs indicate less certainty.

The general formula for all CI
point estimate ± (measure of how confident we want to be) × (standard error)
•Point estimate: the value of the statistic in the sample (e.g., mean,
proportion, etc.)
•Measure of how confident we want to be: from a Z table or a t table,
depending on the sampling distribution of the statistic
•Standard error: the standard error of the statistic

The general formula for all CI…
•Lower limit = Point Estimate - (Critical Value) × (Standard Error)
•Upper limit = Point Estimate + (Critical Value) × (Standard Error)
•A wide interval suggests imprecision of estimation.
•A narrow CI reflects a large sample size, low variability, or both.
•Note:
•Measure of how confident we want to be = critical value = confidence
coefficient
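The lower and upper limit formulas can be written as one small helper. This is a minimal sketch; the function name `confidence_interval` and the input numbers are hypothetical and chosen only to show the arithmetic.

```python
def confidence_interval(point_estimate, critical_value, standard_error):
    """Return (lower, upper) = point estimate -/+ critical value x standard error."""
    margin = critical_value * standard_error
    return point_estimate - margin, point_estimate + margin


# Hypothetical inputs: estimate 100, critical value 1.96, standard error 2.
lower, upper = confidence_interval(100.0, 1.96, 2.0)
print(round(lower, 2), round(upper, 2))
```

Doubling the standard error in this call doubles the width of the interval, which is the "wide interval suggests imprecision" point in numeric form.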

Confidence Level
•It is the degree of confidence that the interval will contain the unknown
population parameter.
•A percentage (less than 100%)
•Example: 90%, 95%, 99%
•Also written (1 - α) = 0.95, or 100(1 - α)% = 95%
•α is chosen by the researcher; the most common values of α are 0.05, 0.01,
0.001 and 0.1
•Definition: we are 100(1 - α)% [e.g., 95%] confident that the single
computed interval contains the unknown population parameter.

Interval Estimate components
•Estimator ± Margin of error
•Estimator ± (Reliability coefficient) × (Standard error)
•Precision of the estimate or margin of error (d) = reliability coefficient
× standard error
•Where:
•Reliability coefficient (RC) is the 100(1 - α/2)th percentile of the given
probability distribution.
•Standard error (SE) is the standard deviation of the sampling distribution of
the statistic (the point estimator)

Confidence Level…
•The critical value is the standardized z or t value corresponding to the
given level of confidence.
•Z = 1.645 if your confidence level is 90%.
•Z = 1.96 if your confidence level is 95%.
•Z = 2.58 if your confidence level is 99%.
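The z values quoted above are rounded quantiles of the standard normal distribution. As a sketch, they can be recomputed with Python's standard library; the helper name `z_critical` is a hypothetical choice, while `statistics.NormalDist.inv_cdf` is the standard-library quantile function.

```python
from statistics import NormalDist


def z_critical(confidence_level):
    """Two-sided critical value: the (1 - alpha/2) quantile of the standard normal."""
    alpha = 1 - confidence_level
    return NormalDist().inv_cdf(1 - alpha / 2)


for level in (0.90, 0.95, 0.99):
    print(level, round(z_critical(level), 3))
```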

Interpreting Confidence Intervals
1) Probabilistic interpretation:
•In repeated sampling from a normally distributed population
with a known standard deviation, 100(1-α) percent of all
intervals of the form
•Estimator ± (reliability coefficient) × (standard error) will in the
long run include the population parameter of interest.

Interpreting Confidence Intervals…
2) Practical interpretation:
•When sampling is from a normally distributed population with a
known standard deviation, we are 100(1-α) percent confident
that the single computed interval,
•Estimator ± (reliability coefficient) x (standard error), contains
the population parameter of interest.
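The repeated-sampling interpretation can be checked by simulation. The sketch below assumes a hypothetical normal population (mean 10, known standard deviation 2) and samples of size 25; the function name `coverage` and the seed are illustrative. It counts how often the interval x̄ ± z × σ/√n contains the true mean.

```python
import math
import random
from statistics import NormalDist, fmean


def coverage(mu, sigma, n, confidence_level, repetitions, seed=1):
    """Fraction of simulated z-intervals for the mean that contain the true mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    se = sigma / math.sqrt(n)  # standard error with known sigma
    hits = 0
    for _ in range(repetitions):
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - z * se <= mu <= xbar + z * se:
            hits += 1
    return hits / repetitions


# Hypothetical population; the result should be close to 0.95.
print(coverage(mu=10, sigma=2, n=25, confidence_level=0.95, repetitions=5000))
```

Any single interval either contains μ or it does not; the 95% describes the long-run share of intervals that do.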

Estimation for Single Population

Central Limit Theorem
•As the sample size n increases without limit, the shape of the
distribution of the sample means taken with replacement from a
population with mean μ and standard deviation σ will approach
a normal distribution.
•The standard deviation of the sampling distribution of means is
called the standard error of the mean and is given by
$\sigma_{\bar{x}} = \sigma / \sqrt{n}$
•Large sample size, n>30

1.CI for a Single Population Mean (normally distributed)
A. Known variance (large sample size)
•There are 3 elements to a CI:
1.Point estimate
2.SE of the point estimate
3.Confidence coefficient
•Consider the task of computing a CI estimate of μ for a
population distribution that is normal with σ known.
•The available data come from a random sample of size n.

Assumptions
•Population standard deviation (σ) is known
•Population is normally distributed
•If population is not normal, use large sample
•A 100(1-α)% C.I. for μ is:
$\bar{x} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$
•α is to be chosen by the researcher, most common values of α
are 0.05, 0.01 and 0.1.

Assumptions…

Finding the Critical Value

Margin of Error (Precision of the estimate)

Factors Affecting Margin of Error
•The width of the CI for the mean (twice the margin of error) is determined
by n, σ, and α.
•As n increases, the length of the CI decreases.
•As σ increases, the length of the CI increases.
•As the confidence level increases (α decreases), the length of the CI
increases.
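Each of these effects can be seen numerically. The sketch below assumes the known-σ margin of error z × σ/√n; the function name `margin_of_error` and the input values are hypothetical.

```python
import math
from statistics import NormalDist


def margin_of_error(sigma, n, confidence_level=0.95):
    """Margin of error for a mean with known sigma: z * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    return z * sigma / math.sqrt(n)


base = margin_of_error(sigma=1.5, n=20)
print("baseline        ", round(base, 3))
print("4x sample size  ", round(margin_of_error(1.5, 80), 3))        # halves
print("2x sigma        ", round(margin_of_error(3.0, 20), 3))        # doubles
print("99% confidence  ", round(margin_of_error(1.5, 20, 0.99), 3))  # widens
```

Because n enters through a square root, halving the margin of error requires four times the sample size.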

Example:
•Waiting times (in hours) at a particular hospital are believed to
be approximately normally distributed with a variance of 2.25
hr².
A) A sample of 20 outpatients revealed a mean waiting time of 1.52
hours. Construct the 95% CI for the estimate of the population mean.
B) Suppose that the mean of 1.52 hours had resulted from a sample of
32 patients. Find the 95% CI.
C) What effect does larger sample size have on the CI?

Solution a
$1.52 \pm 1.96\sqrt{\frac{2.25}{20}} = 1.52 \pm 1.96(0.335) = 1.52 \pm 0.66 = (0.86, 2.18)$
•We are 95% confident that the true mean waiting time is between
0.86 and 2.18 hrs.
•Although the true mean may or may not be in this interval, 95% of
the intervals formed in this manner will contain the true mean.
•An incorrect interpretation is that there is a 95% probability that this
interval contains the true population mean.

Solution b
$1.52 \pm 1.96\sqrt{\frac{2.25}{32}} = 1.52 \pm 1.96(0.265) = 1.52 \pm 0.52 = (1.00, 2.04)$
Solution c
•A larger sample size makes the CI narrower (more precise).

Confidence Interval…
•When constructing CIs, it has been assumed that the standard
deviation of the underlying population, σ, is known
•What if σ is not known?
•In practice, if the population mean μ is unknown, then the
standard deviation, σ, is probably unknown as well.

Confidence Interval…
•In this case, the population standard deviation σ can be replaced by the sample
standard deviation s if the sample size is large enough (n>30). With a large
sample size, we assume a normal distribution.
•Example: In a sample of 35 patients, patients were on average 17.2 minutes late
for appointments, with an SD of 8 minutes. What is the 90% CI for µ?
Ans: (15.0, 19.4).
•Since the sample size is fairly large (>30) and the population SD is unknown, we
assume the sample mean to be normally distributed based on the
CLT and use the sample SD in place of the population σ.
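The stated answer can be checked directly. This sketch plugs the sample SD into the z-based formula, as the slide describes for large samples; the variable names are illustrative.

```python
import math
from statistics import NormalDist

n, sample_mean, sample_sd = 35, 17.2, 8.0

z = NormalDist().inv_cdf(0.95)   # 90% confidence: 5% in each tail
se = sample_sd / math.sqrt(n)    # sample SD used in place of sigma
lower, upper = sample_mean - z * se, sample_mean + z * se

print(f"90% CI: ({lower:.1f}, {upper:.1f})")  # 90% CI: (15.0, 19.4)
```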

B. Unknown variance (small sample size, n < 30)
•What if the σ for the underlying population is unknown and the
sample size is small?
•As an alternative we use Student’s t distribution.
•Population standard deviation is unknown
•Population is normally distributed
•If the population is not normal, use large sample

B. Unknown variance (small sample size, n < 30)…
•Use Student’s t distribution
•Confidence interval estimate:
$\bar{x} \pm t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}$

Student’s t Distribution
•The t is a family of distributions
•Bell Shaped
•Symmetric about zero (the mean)
•Flatter than the Normal (0,1). This means
•The variability of a t is greater than that of a Z that is normal (0,1)
•Thus, there is more area under the tails and less at centre
•Because variability is greater, resulting confidence intervals will be
wider.

Student’s t Distribution…
Note: the t distribution approaches the z distribution as n increases

Student’s t Distribution…
[Figure: t-distribution and standard normal Z distribution; density against value, from -5 to 5, for the Z distribution and t with 60 d.f.]
•As the df gets larger, the Student’s t-distribution looks more and more like
the standard normal distribution (SND), with mean = 0 and variance = 1.

Student’s t Distribution…
•What happens to the CI as the sample gets larger?
•For large samples, Z and t values become almost identical, so CIs are almost
identical.

Student’s t Distribution…
•t distribution values
•With comparison to the Z value

Example
•A random sample of n = 20 durations (in minutes) of cardiac bypass
surgeries has a mean duration of x̄ = 267 minutes and a variance of s² =
36,700 minutes². Assuming the underlying distribution is normal with
unknown variance, construct a 90% CI estimate of the unknown true mean,
µ.
Standard error = $s/\sqrt{n} = \sqrt{36{,}700/20} \approx 42.84$
t-value for a 90% CI at 19 df = 1.729

Solution
•Putting this all together:
•Lower limit
= (point estimate) - (confidence coefficient) × (SE of point estimate)
= 267 - (1.729)(42.84)
= 192.93
•Upper limit
= (point estimate) + (confidence coefficient) × (SE of point estimate)
= 267 + (1.729)(42.84)
= 341.07
Thus, a 90% CI for the true mean duration of surgery is (192.93, 341.07) minutes

Exercise
•Compute a 95% CI for the mean birth weight based on n = 10,
sample mean = 116.9 oz and s =21.70. From the t Table, t9, 0.975
= 2.262
Ans: (101.4, 132.4)
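The birth-weight exercise can be verified with a short helper. In this sketch the t critical value is passed in from the t table (2.262 for 9 df, as given in the exercise) rather than computed, because the Python standard library has no t quantile function; the function name `t_interval_for_mean` is hypothetical.

```python
import math


def t_interval_for_mean(sample_mean, sample_sd, n, t_critical):
    """CI for a mean with unknown variance; t_critical is read from a t table (n - 1 df)."""
    se = sample_sd / math.sqrt(n)
    return sample_mean - t_critical * se, sample_mean + t_critical * se


lower, upper = t_interval_for_mean(116.9, 21.70, 10, t_critical=2.262)
print(f"95% CI: ({lower:.1f}, {upper:.1f}) oz")  # 95% CI: (101.4, 132.4) oz
```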

2. CIs for single population proportion, p
•An interval estimate for the population proportion (P) can be
calculated by adding an allowance for uncertainty to the
sample proportion
• It is based on three elements of CI.
•Point estimate
•SE of point estimate
•Confidence coefficient

2. CIs for single population proportion, p…
•The distribution of the sample proportion is approximately
normal if the sample size is large, with standard deviation
$\sqrt{P(1-P)/n}$
•This can be estimated from the sample data as $\sqrt{\hat{p}(1-\hat{p})/n}$

2. CIs for single population proportion, p…
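•The CI for population proportion is calculated by:
$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
•Where p̂ is the sample proportion, n is the sample size, and $z_{\alpha/2}$ is
the critical value for the chosen confidence level.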

Example 1
•A random sample of 100 people shows that 25 are left-handed.
Form a 95% CI for the true proportion of left-handers.
•Solution:
Interpretation:

Example 2
•It was found that 28.1% of 153 cervical-cancer cases had never
had a Pap smear test prior to the time of case’s diagnosis.
Calculate a 95% CI for the percentage of cervical-cancer cases
who never had a Pap test.
•Solution:

Example 3
•Suppose that among 10,000 female operating-room nurses, 60
women have developed breast cancer over five years. Find the
95% CI for p based on the point estimate.
•Solution:
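The three proportion examples can be worked with one function. This is a sketch of the large-sample (normal approximation) interval p̂ ± z × √(p̂(1 - p̂)/n); the function name `proportion_interval` and the dictionary labels are hypothetical, while the inputs are taken from the examples.

```python
import math
from statistics import NormalDist


def proportion_interval(p_hat, n, confidence_level=0.95):
    """Normal-approximation CI for a single population proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    se = math.sqrt(p_hat * (1 - p_hat) / n)  # estimated standard error
    return p_hat - z * se, p_hat + z * se


examples = {
    "Example 1 (left-handed)": (25 / 100, 100),
    "Example 2 (never had a Pap test)": (0.281, 153),
    "Example 3 (breast cancer)": (60 / 10_000, 10_000),
}
for label, (p_hat, n) in examples.items():
    lower, upper = proportion_interval(p_hat, n)
    print(f"{label}: ({lower:.4f}, {upper:.4f})")
```

For Example 1 this gives roughly 0.165 to 0.335, so one would say with 95% confidence that between about 16.5% and 33.5% of the population is left-handed.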

Sample Size Determination

Sample size
•Determining the sample size for a study is a crucial component
of study design.
•The goal is to include sufficient numbers of subjects so that
statistically significant results can be detected.

Sample size…
•Among the questions that a researcher should ask when
planning a survey or study is "How large a sample do I
need?" in order to answer the study objectives.
•If the study is too small, we may fail to detect important effects,
or may estimate effects too imprecisely.
•If the sample size is too large, resources are wasted.
•The answer will depend on the aims, nature and scope of the
study and on the expected result.

Sample size…
•Sample size depends on:
•The type of data analysis to be performed
•The desired precision of the estimates one wishes to achieve
•The kind and number of comparisons that will be made
•The number of variables that have to be examined simultaneously
•How heterogeneous the sampled population is, etc.

1. Sample Size for Single Population Mean
•For continuous outcome variable
•Standard deviation of the population: It is rare that a
researcher knows the exact standard deviation of the
population.
•Typically, the standard deviation of the population is estimated:
•From the results of a previous survey
•From a pilot study
•From secondary data
•From the judgment of the researcher

1. Sample Size for Single Population Mean
•Maximum acceptable difference(w): This is the maximum
amount of error that you are willing to accept.
•Desired confidence level: The confidence level is your level
of certainty that the sample mean does not differ from the true
population mean by more than the maximum acceptable
difference. Commonly we use a 95% confidence level.

1. Sample Size for Single Population Mean
•The sample size determination formula for single population mean is
defined by:
$n = \frac{(z_{\alpha/2})^2\,\sigma^2}{w^2}$
•Where
•α = The level of significance, which can be obtained as 1 - confidence level.
•σ = Standard deviation of the population
•w = Maximum acceptable difference
•z α/2 = The value from the standard normal table for the given confidence level

Example
•A researcher wishes to estimate mean haemoglobin level in a defined
community. From preliminary contact he thinks this mean is about 150
mg/l with a standard deviation of 32 mg/l. If he is willing to tolerate a
sampling error of up to 5 mg/l in his estimate, how many subjects
should be included in his study? (α =5%, two sided)
Solution:
•If the population size is assumed to be very large, the required sample
size would be:
$n = \frac{(1.96)^2 (32)^2}{(5)^2} = 157.4 \approx 158$ persons
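The haemoglobin calculation can be reproduced as follows. This is a sketch of n = z²σ²/w², rounded up to the next whole person; the function name `sample_size_for_mean` is hypothetical, and a very large population is assumed, as in the example.

```python
import math
from statistics import NormalDist


def sample_size_for_mean(sigma, w, confidence_level=0.95):
    """Smallest n whose margin of error for the mean is at most w (large population)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    return math.ceil((z * sigma / w) ** 2)  # always round up


print(sample_size_for_mean(sigma=32, w=5))  # 158
```

Sample sizes are rounded up rather than to the nearest integer, since rounding down would leave the margin of error slightly above the tolerated value.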

2. Sample Size for Single Population Proportion
•When the variable of interest is categorical.
•The possible sources of this proportion are:
•From the results of a previous study,
•From a pilot study,
•Taking 50%, if there is no previous study,
•From the judgment of the researcher.

2. Sample Size for Single Population Proportion
•Then the formula for the sample size of single population
proportion is defined as:
$n = \frac{(z_{\alpha/2})^2\,p(1-p)}{d^2}$
•Where p is the anticipated proportion and d is the margin of error.

Example
•A public health student wants to conduct research on the
prevalence of ANC utilization among mothers in Fiche town. Given
that the prevalence in a previous study was found to be 45.7%,
what sample size is needed to address the
objective? (Margin of error = 5%, CL of 95%)
Solution
•A confidence level of 95% gives Zα/2 = 1.96.
•Then using the formula:
$n = \frac{(1.96)^2 (0.457)(1 - 0.457)}{(0.05)^2} = 381.3 \approx 382$
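The ANC example can be checked the same way. This sketch uses the stated prevalence of 45.7% and a 5% margin of error in n = z² p(1 - p)/d²; the function name `sample_size_for_proportion` is hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(p, d, confidence_level=0.95):
    """Smallest n whose margin of error for a proportion is at most d (large population)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    return math.ceil(z ** 2 * p * (1 - p) / d ** 2)


print(sample_size_for_proportion(p=0.457, d=0.05))  # 382
print(sample_size_for_proportion(p=0.50, d=0.05))   # 385
```

The second call shows why 50% is taken when no previous estimate exists: p(1 - p) is largest at p = 0.5, so it gives the most conservative (largest) sample size.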

Considerations

Reading assignment
•Sample size for comparison of two populations
•Proportion and mean

Thank you!