Learning outcome
•At the end of this chapter students should be able to:
•Define parameter, statistic, inference, and estimation
•Identify the point and interval estimates used to make inferences
•Construct and interpret a confidence interval for a single proportion
•Determine sample sizes
30/05/2024 3
Definition
•A statistic is a characteristic or measure obtained by using the
data values from a sample.
•A parameter is a characteristic or measure obtained by using
all the data values from a specific population.
Definition…
•Sample statistic:
•Sample mean (x̄)
•Sample variance (S²)
•Sample Standard deviation (SD)
•Sample proportion (p̂)
•Population parameter:
•population mean (μ)
•Population variance (σ²)
•Population standard deviation (σ)
•Population proportion (P or π)
Definition…
•Statistical inference is the procedure by which we reach a
conclusion about a population on the basis of the information
contained in a sample drawn from that population.
•Methods of inference usually fall into one of the two broad
categories:
•Estimation and Hypothesis testing.
Definition…
•Example:
•An administrator of a large hospital is interested in the mean age of
patients admitted to the hospital during a given year.
•It would be too expensive to go through the records of all patients
admitted during that particular year.
•He therefore chooses to examine a sample of the records, from
which he can compute an estimate of the mean age of patients
admitted to the hospital that year.
Definition…
•Estimation is concerned with estimating the values of specific
population parameters based on sample statistics.
•It uses sample data to make estimates about population parameters.
•The true population parameter value is usually unknown
•The statistic itself is called an estimator and can be of two
types: point or interval.
•The value or values that the estimator assumes are called
estimates.
Definition…
•The estimate is a single computed value, but the estimator is
the rule that tells us how to compute this value, or estimate.
•Estimators are usually presented as formulas.
•For example, the sample mean, $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$,
is an estimator of the population mean, μ
Definition…
•Sample mean x̄, calculated using data in a sample of size n, is a
point estimator of the population mean μ.
•If x̄ = 10, the value 10 is called a point estimate of the population
mean μ.
•Sample mean (x̄) is an unbiased estimator of population mean μ:
E(x̄) = µ
Three Properties of a Good Estimator
1.The estimator should be an unbiased estimator.
•That is, the expected value or the mean of the estimates obtained from samples
of a given size is equal to the parameter being estimated.
2.The estimator should be consistent.
•For a consistent estimator, as sample size increases, the value of the estimator
approaches the value of the parameter estimated.
3.The estimator should be a relatively efficient estimator.
•That is, of all the statistics that can be used to estimate a parameter, the
relatively efficient estimator has the smallest variance.
Methods of estimation
•There are two methods of estimation:
•Point estimation
•Interval estimation
•Point estimation involves the calculation of a single value to
estimate the population parameter.
•Interval estimation specifies a range of values assumed to
include the population parameter.
Point estimation
•A point estimate is a single numerical value used to estimate
the corresponding population parameter.
•A point estimate of some population parameter θ is a single
value θ̂ of a sample statistic.
•To each sample statistic there corresponds a population
parameter.
Point estimation…
•Point estimator: single best guess
•It has the form: [ Value ]
Interval estimation
•A point estimate does not give any indication of how far away
the parameter lies.
•A more useful method of estimation is to compute an interval
which has a high probability of containing the parameter.
•An interval estimate is a statement that a population parameter
has a value lying between two specified limits.
Interval estimation…
•Interval estimator: It has the form of a "range of plausible
values"
•It has the form: [lower limit, upper limit]
Interval estimation…
•It specifies a range of reasonable values for the population
parameter based on a point estimate.
•An interval estimate provides more information about a
population characteristic than a point estimate.
•Such interval estimates are called Confidence Intervals (CI)
Interval estimation…
•A Confidence Interval: Tells about variability
•Gives information about closeness to unknown population parameters
• Stated in terms of level of confidence
•Never 100% sure
Confidence interval (CI)
•CIs also give information about the precision of an estimate.
•How much uncertainty is associated with a point estimate of a
population parameter?
•When sampling variability is high, the CI will be wide to reflect
the uncertainty of the observation.
•A wider CI indicates less certainty.
The general formula for all CI
•CI = point estimate ± (measure of how confident we want to be) × (standard error)
•Point estimate: the value of the statistic in the sample (e.g., mean, proportion, etc.)
•Measure of how confident we want to be: taken from a Z table or a t table, depending on
the sampling distribution of the statistic.
•Standard error: the standard error of the statistic
The general formula for all CI…
•Lower limit = Point Estimate - (Critical Value) x (Standard Error)
•Upper limit = Point Estimate + (Critical Value) x (Standard Error)
•A wide interval suggests imprecision of estimation.
•A narrow CI reflects a large sample size, low variability, or both.
•Note:
•Measure of how confident we want to be = critical value = confidence
coefficient
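The lower and upper limit formulas can be sketched in a few lines of Python. This is only an illustration: the function name `confidence_interval` and the numbers used (estimate 50, critical value 1.96, standard errors 1 and 4) are hypothetical and do not come from the slides.

```python
def confidence_interval(point_estimate, critical_value, standard_error):
    """Return (lower limit, upper limit) from the general CI formula."""
    margin = critical_value * standard_error  # margin of error
    return point_estimate - margin, point_estimate + margin

# Hypothetical inputs: same estimate and critical value, different standard errors.
narrow = confidence_interval(50.0, 1.96, 1.0)
wide = confidence_interval(50.0, 1.96, 4.0)
print("SE = 1:", narrow)
print("SE = 4:", wide)
```

With the same critical value, the larger standard error gives the wider interval, which is the imprecision described above.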
Confidence Level
•It is the degree of confidence that the interval will contain the unknown
population parameter.
•A percentage (less than 100%)
•Example: 90%, 95%, 99%
•Also written (1 - α) = 0.95, 100 (1-α) =95%
•α is to be chosen by the researcher, most common values of α are 0.05, 0.01,
0.001 and 0.1
•Definition: we are 100(1-α)% [e.g., 95%] confident that the single
computed interval contains the unknown population parameter.
Interval Estimate components
•Estimator ± Margin of error
•Estimator ± (Reliability coefficient) x (Standard error)
•Precision of the estimate or Margin of error (d)= reliability coefficient
x standard error
•Where:
•Reliability Coefficient (RC) is the 100(1-α/2)th percentile of the given probability
distribution.
•Standard Error (SE) is the standard deviation of the sampling distribution of
the statistic (the point estimator)
Confidence Level…
•The standardized z or t value corresponding to the given level
of confidence.
•Z = 1.645 (often rounded to 1.64) if your confidence level is 90%.
•Z = 1.96 if your confidence level is 95%.
•Z = 2.58 if your confidence level is 99%.
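These Z values do not have to be read from a table. As a small sketch, Python's standard library `statistics.NormalDist` can return the standard normal quantile for a two-sided confidence level; the helper name `z_critical` is an assumption made for this illustration.

```python
from statistics import NormalDist

def z_critical(conf_level):
    """Two-sided critical value z_{alpha/2} of the standard normal distribution."""
    alpha = 1 - conf_level
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_critical(level):.3f}")
```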
Interpreting Confidence Intervals
1) Probabilistic interpretation:
•In repeated sampling from a normally distributed population
with a known standard deviation, 100(1-α) percent of all
intervals of the form
•Estimator ± (reliability coefficient) x (standard error) will in the
long-run include the population parameter of interest.
Interpreting Confidence Intervals…
2) Practical interpretation:
•When sampling is from a normally distributed population with a
known standard deviation, we are 100(1-α) percent confident
that the single computed interval,
•Estimator ± (reliability coefficient) x (standard error), contains
the population parameter of interest.
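The probabilistic interpretation can be checked with a small simulation. The sketch below repeatedly samples from a normal population with known σ, builds the interval each time, and counts how often it covers μ. The population values (μ = 1.5, σ = 1.5), the sample size, the number of repetitions, and the seed are all arbitrary choices for illustration.

```python
import random
from statistics import NormalDist, fmean

def coverage(mu, sigma, n, conf, reps, seed=1):
    """Fraction of simulated z-intervals that contain the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    margin = z * sigma / n ** 0.5  # sigma is treated as known
    hits = 0
    for _ in range(reps):
        xbar = fmean([rng.gauss(mu, sigma) for _ in range(n)])
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / reps

# Assumed population for illustration only.
print(coverage(mu=1.5, sigma=1.5, n=20, conf=0.95, reps=5000))
```

The printed fraction should be close to 0.95: about 95% of the intervals cover μ, while any single interval either does or does not.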
Estimation for Single Population
Central Limit Theorem
•As the sample size n increases without limit, the shape of the
distribution of the sample means taken with replacement from a
population with mean μ and standard deviation σ will approach
a normal distribution.
•The standard deviation of the sampling distribution of means is
called the standard error of the mean and is given by $\sigma_{\bar{x}} = \sigma/\sqrt{n}$
•A sample size of n > 30 is usually considered large
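A quick simulation can illustrate the standard error of the mean. The sketch below draws repeated samples from a skewed (exponential) population with mean 1 and standard deviation 1, then compares the standard deviation of the sample means with σ/√n. The choice of population, n = 40, the number of repetitions, and the seed are assumptions for illustration.

```python
import random
from statistics import fmean, stdev

def sd_of_sample_means(n, reps, seed=1):
    """Standard deviation of `reps` sample means, each from a sample of size n."""
    rng = random.Random(seed)
    means = [fmean([rng.expovariate(1.0) for _ in range(n)]) for _ in range(reps)]
    return stdev(means)

n = 40
observed = sd_of_sample_means(n, reps=4000)
expected = 1.0 / n ** 0.5  # sigma / sqrt(n), with sigma = 1 for this population
print(round(observed, 3), round(expected, 3))
```

Even though the population is not normal, the two printed values should be close to each other.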
1.CI for a Single Population Mean (normally distributed)
A. Known variance (large sample size)
•There are 3 elements to a CI:
1.Point estimate
2.SE of the point estimate
3.Confidence coefficient
•Consider the task of computing a CI estimate of μ for a
population distribution that is normal with σ known.
•The available data come from a random sample of size n.
Assumptions
•Population standard deviation (σ) is known
•Population is normally distributed
•If population is not normal, use large sample
•A 100(1-α)% C.I. for μ is: $\bar{x} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$
•α is to be chosen by the researcher, most common values of
α are 0.05, 0.01 and 0.1.
Assumptions…
Finding the Critical Value
Margin of Error (Precision of the estimate)
Factors Affecting Margin of Error
•The length of the CI for the mean (twice the margin of error) is determined by n, σ, and α.
•As n increases, the length of the CI decreases.
•As σ increases, the length of the CI increases.
•As the confidence level increases, α decreases and the length of the CI
increases.
Example:
•Waiting times (in hours) at a particular hospital are believed to
be approximately normally distributed with a variance of 2.25
hr².
A) A sample of 20 outpatients revealed a mean waiting time of 1.52
hours. Construct the 95% CI for the estimate of the population mean.
B) Suppose that the mean of 1.52 hours had resulted from a sample of
32 patients. Find the 95% CI.
C) What effect does larger sample size have on the CI?
Solution a
$1.52 \pm 1.96\,\frac{\sqrt{2.25}}{\sqrt{20}} = 1.52 \pm 1.96(0.33) = 1.52 \pm 0.65 = (0.87, 2.17)$
•We are 95% confident that the true mean waiting time is between
0.87 and 2.17 hrs.
•Although the true mean may or may not be in this interval, 95% of
the intervals formed in this manner will contain the true mean.
•An incorrect interpretation is that there is 95% probability that this
interval contains the true population mean.
Solution b
$1.52 \pm 1.96\,\frac{\sqrt{2.25}}{\sqrt{32}} = 1.52 \pm 1.96(0.27) = 1.52 \pm 0.53 = (0.99, 2.05)$
•Solution c
•A larger sample size makes the CI narrower (more precise).
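The waiting-time example can be reproduced with a short script. This is a sketch that assumes the known-variance formula x̄ ± z·σ/√n; the function name `z_interval` is hypothetical. It carries full precision, so its limits can differ in the second decimal place from a hand calculation that rounds the standard error first.

```python
import math
from statistics import NormalDist

def z_interval(xbar, sigma, n, conf=0.95):
    """CI for a mean when the population standard deviation sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    margin = z * sigma / math.sqrt(n)
    return xbar - margin, xbar + margin

sigma = math.sqrt(2.25)  # the example gives the variance, 2.25
for n in (20, 32):
    lower, upper = z_interval(1.52, sigma, n)
    print(f"n = {n}: ({lower:.2f}, {upper:.2f})")
```

The interval for n = 32 is narrower than the one for n = 20, which is the answer to part C.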
Confidence Interval…
•When constructing CIs, it has been assumed that the standard
deviation of the underlying population, σ, is known
•What if σ is not known?
•In practice, if the population mean μ is unknown, then the
standard deviation, σ, is probably unknown as well.
Confidence Interval…
•In this case, the population SD (σ) can be replaced by the sample SD (s) if
the sample size is large enough (n>30). With a large sample size, we assume a
normal distribution.
•Example: It was found that a sample of 35 patients were 17.2 minutes late for
appointments, on average, with an SD of 8 minutes. What is the 90% CI for µ?
Ans: (15.0, 19.4).
•Since the sample size is fairly large (>30) and the population SD is unknown, we
assume the sample mean to be normally distributed based on the
CLT and use the sample SD in place of the population σ.
B. Unknown variance (small sample size, n < 30)
•What if σ for the underlying population is unknown and the
sample size is small?
•As an alternative we use Student’s t distribution.
•Population standard deviation is unknown
•Population is normally distributed
•If the population is not normal, use large sample
B. Unknown variance (small sample size, n < 30)…
•Use Student’s t distribution
•Confidence Interval estimate: $\bar{x} \pm t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}$
Student’s t Distribution
•The t is a family of distributions
•Bell Shaped
•Symmetric about zero (the mean)
•Flatter than the Normal (0,1). This means
•The variability of a t is greater than that of a Z that is normal (0,1)
•Thus, there is more area under the tails and less at centre
•Because variability is greater, resulting confidence intervals will be
wider.
Student’s t Distribution…
Note: the t distribution approaches the Z distribution as n increases
Student’s t Distribution…
[Figure: T-distribution and Standard Normal Z distribution. Density plotted against value (-5 to 5) for the Z distribution and t with 60 d.f.]
•As the df gets larger, the Student’s t-distribution looks more and more like
the standard normal distribution (SND) with mean=0 and variance=1.
Student’s t Distribution…
•What happens to CI as sample gets larger?
For large samples: Z and t values become
almost identical, so CIs are almost
identical.
Student’s t Distribution…
•t distribution values
•With comparison to the Z value
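The comparison between t and Z values can be reproduced numerically. The sketch below uses only the Python standard library: it integrates the t density with Simpson's rule and finds the quantile by bisection. The helper names (`t_pdf`, `t_cdf`, `t_critical`, `z_critical`) and the degrees of freedom shown are choices made for this illustration; in practice a statistics package or a printed t table would normally be used.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom at x."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=1000):
    """P(T <= x), using Simpson's rule on [0, |x|] and symmetry about zero."""
    a = abs(x)
    if a == 0:
        return 0.5
    h = a / steps  # steps is even, as Simpson's rule requires
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area

def t_critical(conf, df):
    """Two-sided critical value of t for confidence level conf, by bisection."""
    target = 1 - (1 - conf) / 2
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def z_critical(conf):
    """Two-sided critical value of the standard normal distribution."""
    return NormalDist().inv_cdf(1 - (1 - conf) / 2)

# 95% critical values: t for several degrees of freedom next to Z.
for df in (5, 9, 19, 30, 60, 120):
    print(df, round(t_critical(0.95, df), 3), round(z_critical(0.95), 3))
```

As the degrees of freedom grow, the t value in each row moves toward the Z value.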
Example
•A random sample of n = 20 durations (minutes) of cardiac bypass
surgeries has a mean duration of x̄ = 267 minutes and variance s² =
36,700 minutes². Assuming the underlying distribution is normal with
unknown variance, construct a 90% CI estimate of the unknown true mean,
µ.
Standard error = s/√n = √(36,700/20) ≈ 42.84
t-value at 90% CI at 19 df = 1.729
Solution
•Putting this all together:
•Lower limit
= (point estimate) - (confidence coefficient)(SE of point estimate)
= 267 - (1.729)(42.84)
= 192.93
•Upper limit
= (point estimate) + (confidence coefficient)(SE of point estimate)
= 267 + (1.729)(42.84)
= 341.07
Thus, a 90% CI for the true mean duration of surgery is (192.93, 341.07) minutes
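The arithmetic of the cardiac bypass example can be checked with a few lines of Python. This sketch takes the t value 1.729 (19 df, 90% confidence) as given from the table rather than computing it, and the function name `t_interval` is hypothetical. Carrying full precision, the standard error √(36,700/20) comes out at about 42.84.

```python
import math

def t_interval(xbar, s, n, t_crit):
    """CI for a mean with unknown variance; t_crit is read from a t table."""
    se = s / math.sqrt(n)  # standard error of the sample mean
    margin = t_crit * se
    return xbar - margin, xbar + margin, se

lower, upper, se = t_interval(xbar=267, s=math.sqrt(36700), n=20, t_crit=1.729)
print(round(se, 2), round(lower, 1), round(upper, 1))
```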
Exercise
•Compute a 95% CI for the mean birth weight based on n = 10,
sample mean = 116.9 oz and s = 21.70 oz. From the t table, $t_{9,\,0.975} = 2.262$
Ans: (101.4, 132.4)
2. CIs for single population proportion, p
•An interval estimate for the population proportion (P) can be
calculated by adding an allowance for uncertainty to the
sample proportion.
• It is based on three elements of CI.
•Point estimate
•SE of point estimate
•Confidence coefficient
2. CIs for single population proportion, p…
•The distribution of the sample proportion is approximately
normal if the sample size is large, with standard deviation $\sqrt{P(1-P)/n}$
•It can be estimated from the sample data as $\sqrt{\hat{p}(1-\hat{p})/n}$
2. CIs for single population proportion, p…
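•The CI for population proportion is calculated by: $\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
•Where p̂ is the sample proportion, n is the sample size, and $z_{\alpha/2}$ is the reliability coefficient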
Example 1
•A random sample of 100 people shows that 25 are left-handed.
Form a 95% CI for the true proportion of left-handers.
•Solution:
Interpretation:
Example 2
•It was found that 28.1% of 153 cervical-cancer cases had never
had a Pap smear test prior to the time of case’s diagnosis.
Calculate a 95% CI for the percentage of cervical-cancer cases
who never had a Pap test.
•Solution:
Example 3
•Suppose that among 10,000 female operating-room nurses, 60
women have developed breast cancer over five years. Find the
95% CI for p based on the point estimate.
•Solution:
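The three proportion examples can be worked with one small function. This is a sketch that assumes the large-sample normal approximation, p̂ ± z·√(p̂(1 − p̂)/n); the function name `proportion_ci` and the dictionary labels are hypothetical.

```python
import math
from statistics import NormalDist

def proportion_ci(p_hat, n, conf=0.95):
    """Large-sample (normal approximation) CI for a single proportion."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    se = math.sqrt(p_hat * (1 - p_hat) / n)  # estimated standard error
    return p_hat - z * se, p_hat + z * se

examples = {
    "Example 1 (left-handed)": (25 / 100, 100),
    "Example 2 (never had a Pap smear)": (0.281, 153),
    "Example 3 (breast cancer)": (60 / 10000, 10000),
}
for label, (p_hat, n) in examples.items():
    lower, upper = proportion_ci(p_hat, n)
    print(f"{label}: ({lower:.4f}, {upper:.4f})")
```

For Example 1 this gives roughly (0.165, 0.335), so we are 95% confident that the true proportion of left-handers lies between about 16.5% and 33.5%.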
Sample Size Determination
Sample size
•Determining the sample size for a study is a crucial component
of study design.
•The goal is to include sufficient numbers of subjects so that
statistically significant results can be detected.
Sample size…
•Among the questions that a researcher should ask when
planning a survey or study is "How large a sample do I
need?" in order to answer the study objectives.
•If the study is too small we may fail to detect important effects,
or may estimate effects too imprecisely.
•If the sample size is too large, resources are wasted.
•The answer will depend on the aims, nature and scope of the
study and on the expected result.
Sample size…
•Sample size depends on:
•The type of data analysis to be performed
•The desired precision of the estimates one wishes to achieve
•The kind and number of comparisons that will be made
•The number of variables that have to be examined simultaneously
•How heterogeneous the sampled population is, etc.
1. Sample Size for Single Population Mean
•For continuous outcome variable
•Standard deviation of the population: It is rare that a
researcher knows the exact standard deviation of the
population.
•Typically, the standard deviation of the population is estimated:
•From the results of a previous survey, From a pilot study, From
secondary data, From judgment of the researcher.
1. Sample Size for Single Population Mean
•Maximum acceptable difference(w): This is the maximum
amount of error that you are willing to accept.
•Desired confidence level: The confidence level is your level
of certainty that the sample mean does not differ from the true
population mean by more than the maximum acceptable
difference. Commonly we use a 95% confidence level.
1. Sample Size for Single Population Mean
•The sample size determination formula for single population mean is
defined by: $n = \frac{(z_{\alpha/2})^2\,\sigma^2}{w^2}$
•Where
•α = The level of significance, which can be obtained as 1 - confidence level.
•σ = Standard deviation of the population
•w = Maximum acceptable difference
•$z_{\alpha/2}$ = The value from the standard normal table for the given confidence level
Example
•A researcher wishes to estimate mean haemoglobin level in a defined
community. From preliminary contact he thinks this mean is about 150
mg/l with a standard deviation of 32 mg/l. If he is willing to tolerate a
sampling error of up to 5 mg/l in his estimate, how many subjects
should be included in his study? (α =5%, two sided)
Solution:
•If the population size is assumed to be very large, the required sample
size would be:
$n = \frac{(1.96)^2 (32)^2}{(5)^2} = 157.4 \approx 158$ persons
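The haemoglobin calculation can be verified with a short function. This sketch assumes the formula n = z²σ²/w² for a very large population and rounds up to the next whole person; the function name `sample_size_mean` is hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_mean(sigma, w, conf=0.95):
    """Sample size needed to estimate a mean to within w, rounded up."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return math.ceil((z * sigma / w) ** 2)

n_required = sample_size_mean(sigma=32, w=5)
print(n_required)
```

Rounding up rather than to the nearest integer keeps the margin of error at or below the tolerated 5 mg/l.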
2. Sample Size for Single Population Proportion
•When the variable of interest is categorical.
•The possible sources of the anticipated proportion are:
•The results of a previous study,
•A pilot study,
•Taking 50%, if there is no previous study,
•The judgment of the researcher.
2. Sample Size for Single Population Proportion
•Then the formula for the sample size of single population
proportion is defined as: $n = \frac{(z_{\alpha/2})^2\,p(1-p)}{d^2}$, where p is the anticipated proportion and d is the margin of error
Example
•A public health student wants to conduct research on the
prevalence of ANC utilization among mothers in Fiche town. Given
that the prevalence in a previous study was found to be 45.7%,
what sample size is needed to address the
objective? (Margin of error = 5%, CL of 95%)
Solution
•A confidence level of 95% will give the value of Zα/2=1.96.
•Then using the formula:
$n = \frac{(1.96)^2 (0.457)(1-0.457)}{(0.05)^2} = 381.3 \approx 382$
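The ANC example can be checked the same way. This sketch assumes the formula n = z²p(1 − p)/d² and rounds up; the function name `sample_size_proportion` is hypothetical. It also shows the conservative choice p = 0.5, used when no previous estimate is available.

```python
import math
from statistics import NormalDist

def sample_size_proportion(p, d, conf=0.95):
    """Sample size needed to estimate a proportion to within d, rounded up."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return math.ceil(z ** 2 * p * (1 - p) / d ** 2)

n_anc = sample_size_proportion(p=0.457, d=0.05)
n_conservative = sample_size_proportion(p=0.5, d=0.05)
print(n_anc, n_conservative)
```

Because p(1 − p) is largest at p = 0.5, taking 50% gives the largest sample size for a given margin of error.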
Considerations
Reading assignment
•Sample size for comparison of two populations
•Proportion and mean