[Diagram: the population distribution (Binomial, Poisson, Normal) has a population summary (parameter); the sample summary (statistic/estimator) has a sampling distribution (Z, t, chi-square, F). Inference runs from the sample side to the population side.]
Example
•Mean amount spent is a random variable
•Proportion of youth customers ordering pizza is a random
variable
Concept of sampling distribution
Suppose you select all possible random samples of
customers. Each of those samples will yield a value
of the average amount spent (x̄). If you construct a
histogram of those values, what you will get is
precisely the sampling distribution of the mean
amount spent.
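The idea can be tried out on a tiny population where every possible sample can actually be listed. The sketch below is only an illustration: the five spending amounts, the sample size of 2, and sampling without replacement are all assumptions, not part of the slides.

```python
from collections import Counter
from itertools import combinations
from statistics import fmean

# Hypothetical amounts spent by a population of five customers (assumed values).
population = [10, 20, 30, 40, 50]
n = 2  # assumed sample size

# Every possible sample of size n, drawn without replacement.
samples = list(combinations(population, n))

# One sample mean per sample: together these form the sampling distribution.
sample_means = [fmean(sample) for sample in samples]

# Tabulate the sampling distribution as a text histogram.
distribution = Counter(sample_means)
for value in sorted(distribution):
    print(f"{value:5.1f}  {'*' * distribution[value]}")

print("population mean:     ", fmean(population))
print("mean of sample means:", fmean(sample_means))
```

In this small case the mean of all the sample means coincides with the population mean, which is the property E(x̄) = µ.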
Sampling Distributions
•t-distribution
•Chi-square distribution
•F distribution
[Figure: the t distribution with various degrees of freedom (df)]
Chi-square
The chi-square distribution depends on one parameter, its degrees of freedom (df
or ν). As df gets large, the curve becomes less skewed and closer to normal.
F Distribution
Car Mileage Case
Hybrid and electric cars are a vital part of reducing the US's gasoline consumption.
The most effective way to conserve gasoline is to design gasoline-powered cars that are
more fuel efficient. Virtually every gasoline-powered midsize car equipped with
automatic transmission has an EPA combined city and highway mileage estimate of
26 miles/gallon or less. Suppose that the government has decided to offer a tax credit to
any automaker selling a midsize model which achieves an EPA estimate of at least 31 mpg.
Consider an automaker that has recently introduced a new midsize model and wants to
show that it qualifies for the tax credit. Consider the population of all cars of this type that will
or could potentially be produced. The automaker will choose a sample of 50 of
these cars. The manufacturer's production operation runs 8-hour shifts, with 100
midsize cars produced on each shift. When all start-up problems have been
corrected, the automaker selects 1 car at random from each of 50 shifts, and these cars are
subjected to the EPA test.
Sampling Distribution of the Sample Mean
The sampling distribution of the sample mean is the
probability distribution of the population of the sample
means obtainable from all possible samples of size n from a
population
Example: The Population of Sample Means
Example: A Graph of the Probability
Distribution
Standard Error
•Variation in the values of a statistic from sample to
sample is called sampling fluctuation and is
measured by the STANDARD ERROR
Sampling Distribution of Mean
$E(\bar{x}) = \mu$
$\mathrm{s.e.}(\bar{x}) = \frac{\sigma}{\sqrt{n}}$
As sample size increases, standard error decreases
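A short numeric check of how the standard error σ/√n shrinks as n grows. This is a minimal sketch; the value σ = 0.3 and the sample sizes are assumed purely for illustration, and `standard_error` is a hypothetical helper name.

```python
import math

def standard_error(sigma, n):
    """Standard error of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

sigma = 0.3  # assumed population standard deviation, for illustration only
for n in (1, 4, 16, 64):
    print(f"n = {n:3d}  s.e. = {standard_error(sigma, n):.4f}")
```

Each fourfold increase in n halves the standard error, because n enters through its square root.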
Result
If $X \sim N(\mu, \sigma^2)$, then $\bar{x} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$
Example
The foreman of a bottling plant has observed that the amount
of soda in each “32-ounce” bottle is actually a normally
distributed random variable, with a mean of 32.2 ounces and
a standard deviation of .3 ounce.
If a customer buys one bottle, what is the probability that the
bottle will contain more than 32 ounces?
Example
We want to find P(X > 32), where X is normally distributed with µ = 32.2 and σ = .3
$P(X > 32) = P\left(\frac{X - \mu}{\sigma} > \frac{32 - 32.2}{.3}\right) = P(Z > -.67) = 1 - .2514 = .7486$
“There is about a 75% chance that a single bottle of soda contains more than 32 oz.”
Example
The foreman of a bottling plant has observed that the amount
of soda in each “32-ounce” bottle is actually a normally
distributed random variable, with a mean of 32.2 ounces and
a standard deviation of .3 ounce.
If a customer buys a carton of four bottles, what is the
probability that the mean amount of the four bottles will be
greater than 32 ounces?
Example
We want to find P(x̄ > 32), where X is normally distributed
with µ = 32.2 and σ = .3
Things we know:
X is normally distributed, therefore so is x̄.
The mean of x̄ is µ = 32.2 oz.
The standard error of x̄ is σ/√n = .3/√4 = .15 oz.
Example
If a customer buys a carton of four bottles, what is the probability that
the mean amount of the four bottles will be greater than 32 ounces?
$P(\bar{x} > 32) = P\left(Z > \frac{32 - 32.2}{.3/\sqrt{4}}\right) = P(Z > -1.33) = .9082$
“There is about a 91% chance the mean of the four bottles will exceed
32 oz.”
[Figure: the two questions compared. The probability that one bottle will contain more than 32 ounces, and the probability that the mean of four bottles will exceed 32 oz; both distributions are centered at mean = 32.2.]
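Both bottle probabilities can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library with the µ, σ and n given in the example. It does not round z to two decimals, so the results can differ from the table-based slide values in the third decimal place.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 32.2, 0.3   # mean and standard deviation from the example

# One bottle: X ~ N(mu, sigma^2)
single_bottle = NormalDist(mu, sigma)
p_single = 1 - single_bottle.cdf(32)

# Mean of four bottles: x-bar ~ N(mu, sigma^2 / n)
n = 4
mean_of_four = NormalDist(mu, sigma / sqrt(n))
p_mean = 1 - mean_of_four.cdf(32)

print(f"P(X > 32)     = {p_single:.4f}")
print(f"P(x-bar > 32) = {p_mean:.4f}")
```

The second probability is larger because the mean of four bottles has a smaller standard error (.15 rather than .3), so less of its distribution lies below 32.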
Central Limit Theorem (CLT)
If a random sample of size n is drawn from a
population with mean µ and standard deviation σ,
the distribution of the sample mean x̄ approaches a
normal distribution with mean µ and standard
deviation σ/√n as the sample size (n) increases.
If the population is normal, the distribution of the
sample mean is normal regardless of sample size.
i.e. $\bar{x} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$
How Large?
•How large is “large enough?”
•If the sample size is at least 30, then for most
populations, the sampling distribution of sample
means is approximately normal
•For skewed distribution, it may be even 50 or more
•For heavy tailed it may be even more (100 or more)
•If the population is normal, then the sampling
distribution of sample mean is normal regardless of
the sample size
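A small simulation can make the effect of sample size visible for a skewed population. This is only a sketch under assumptions of my own: an exponential population with mean 1 and standard deviation 1, 2000 repetitions per sample size, and a simple moment-based skewness as a rough measure of departure from normality.

```python
import random
from statistics import fmean, stdev

random.seed(0)  # fixed seed so the run is repeatable

def sample_means(n, reps):
    """Means of `reps` samples of size n from an exponential population (mean 1, sd 1)."""
    return [fmean(random.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

def skewness(values):
    """Moment-based skewness; close to 0 for a symmetric distribution."""
    m, s = fmean(values), stdev(values)
    return fmean(((v - m) / s) ** 3 for v in values)

for n in (2, 30, 100):
    means = sample_means(n, reps=2000)
    print(f"n = {n:3d}  mean = {fmean(means):.3f}  sd = {stdev(means):.3f}  "
          f"skewness = {skewness(means):.2f}")
```

As n grows, the simulated sample means should stay centered near 1, their spread should track 1/√n, and the skewness should move toward 0.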
Data Analysis
Mean 31.56
Standard Error 0.112812
Median 31.55
Mode 31.4
Standard Deviation 0.797701
Sample Variance 0.636327
Kurtosis -0.51125
Skewness -0.03422
Range 3.5
Minimum 29.8
Maximum 33.3
Sum 1578
Count 50
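The entries in this summary are linked by simple formulas, which can be verified directly. The sketch below only re-derives figures from the printed output; the variable names are illustrative.

```python
import math

# Values copied from the summary output above.
total, count = 1578, 50
std_dev = 0.797701
minimum, maximum = 29.8, 33.3

mean = total / count                          # Mean = Sum / Count
variance = std_dev ** 2                       # Sample Variance = (Standard Deviation)^2
standard_error = std_dev / math.sqrt(count)   # Standard Error = s / sqrt(n)
value_range = maximum - minimum               # Range = Maximum - Minimum

print(f"mean           = {mean:.2f}")
print(f"variance       = {variance:.6f}")
print(f"standard error = {standard_error:.6f}")
print(f"range          = {value_range:.1f}")
```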
How to estimate parameters?
Already seen that
$\hat{\mu} = \bar{x}$ (the sample mean)
For the car mileage case sample mean=31.56
How to estimate population Standard
deviation σ?
$\hat{\sigma} = s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$
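The two point estimates can be computed as follows. This is a minimal sketch on made-up data: the five mileage readings are hypothetical, not the 50 values from the car mileage case.

```python
import math
from statistics import fmean, stdev

# Hypothetical mileage readings (mpg); not the actual case data.
x = [30.8, 31.7, 30.1, 31.6, 32.1]
n = len(x)

x_bar = fmean(x)  # point estimate of the population mean
# Point estimate of sigma: divide the sum of squared deviations by n - 1.
s = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))

print(f"x-bar = {x_bar:.3f}")
print(f"s     = {s:.3f}")
print(f"statistics.stdev(x) = {stdev(x):.3f}")
```

`statistics.stdev` also uses the n − 1 divisor, so it should agree with the hand-written formula.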
Note: These estimates are point estimates and may not
equal the true parameter values.
Use “Interval Estimates”
Confidence Intervals
Interval Estimate =
Point Estimate ±Margin of Error
Margin of Error = (point of the sampling
distribution) × (standard error)
Confidence Intervals for a Mean: σ Known
•A confidence interval for a population mean is an
interval constructed around the sample mean so we
are reasonably sure that it contains the population
mean
•Any confidence interval is based on a confidence
level
Elements of Interval Estimation
[Figure: a confidence interval around the statistic (e.g. sample mean), running from the lower confidence limit to the upper confidence limit. The confidence level is the probability that the population parameter falls somewhere within the interval.]
The Car Mileage Case
•Automaker conducted mileage tests on n=50 cars
•Sample mean is 31.56
•This is a point estimate of the population mean
•Do not know how good this estimate is
•Will use a confidence interval
The Car Mileage Case
•There were many samples of 50 cars
•Each would give different means
•Consider the probability distribution of all the
sample means
•Called the sampling distribution
$\mathrm{se}(\bar{x}) = \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
The Car Mileage Case
1.Because the sampling distribution of the sample mean is
a normal distribution, we can use the normal distribution
to compute probabilities about the sample mean
2.The 95 percent confidence interval is
$\bar{x} \pm 1.96\,\sigma_{\bar{x}} = \bar{x} \pm 1.96\frac{\sigma}{\sqrt{n}}$
[Figure: Sampling Distribution of the Mean, f(x̄) plotted against x̄, with limits at −1.96σ/√n and +1.96σ/√n about the center. 95% fall within the interval, 2.5% fall below the interval, and 2.5% fall above the interval.]
What is happening?
2/6/2021 Statistical Inference
Generalizing
•The probability that the confidence interval will contain the
population mean μ is denoted by 1 − α
•1 − α is referred to as the confidence coefficient
•(1 − α) 100% is called the confidence level
•Usual to use two decimal point probabilities for 1 − α
•Here, focus on 1 − α = 0.95 or 0.99
General Confidence Interval
•In general, the probability is 1 − α that the population
mean μ is contained in the interval
$\bar{x} \pm z_{\alpha/2}\,\sigma_{\bar{x}} = \bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
•The normal point z_{α/2} gives a right hand tail area under
the standard normal curve equal to α/2
•The normal point −z_{α/2} gives a left hand tail area under
the standard normal curve equal to α/2
•The area under the standard normal curve between −z_{α/2}
and z_{α/2} is 1 − α
General Confidence Interval
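•If a population has standard deviation σ (known),
•and if the population is normal or if sample size is large (n ≥
30), then …
•… a (1 − α)100% confidence interval for μ is
$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = \left[\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right]$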
99% Confidence Interval
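•For 99% confidence, need the normal point z_{0.005}
•(1 − 0.99) / 2 = 0.005
•z_{0.005} = 2.575
•The 99% confidence interval is
$\bar{x} \pm z_{0.005}\,\sigma_{\bar{x}} = \bar{x} \pm 2.575\frac{\sigma}{\sqrt{n}} = \left[\bar{x} - 2.575\frac{\sigma}{\sqrt{n}},\ \bar{x} + 2.575\frac{\sigma}{\sqrt{n}}\right]$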
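The normal points for common confidence levels can be looked up in code rather than in a table. The sketch below uses the standard library's `NormalDist().inv_cdf`; `z_point` is a hypothetical helper name. The computed 99% value is about 2.576, so it differs slightly from table-interpolated values such as 2.575.

```python
from statistics import NormalDist

def z_point(confidence):
    """Normal point z_{alpha/2} for a confidence coefficient 1 - alpha."""
    alpha = 1 - confidence
    # The point with right hand tail area alpha/2 has cumulative area 1 - alpha/2.
    return NormalDist().inv_cdf(1 - alpha / 2)

for confidence in (0.95, 0.99):
    print(f"confidence {confidence:.2f}: z = {z_point(confidence):.3f}")
```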
The Effect of α on Confidence Interval Width
t-Based Confidence Intervals for a Mean:
σ Unknown
•If σ is unknown (which is usually the case), we can construct a
confidence interval for μ based on the sampling distribution of
$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$
•If the population is normal, then for any sample size n, this sampling
distribution is called the t distribution
The t Distribution
•The curve of the t distribution is similar to that of the
standard normal curve
•Symmetrical and bell-shaped
•The t distribution is more spread out than the standard
normal distribution
•The spread of the t distribution depends on the number of degrees of
freedom, which is determined by the sample size
•Denoted by df
•For a sample of size n, there is one fewer degree of
freedom, that is, df = n − 1
Degrees of Freedom and the
t-Distribution
As the number of degrees of freedom increases, the spread
of the t distribution decreases and the t curve approaches
the standard normal curve
t and Right Hand Tail Areas
•Use a t point denoted by t_α
•t_α is the point on the horizontal axis under the t curve that
gives a right hand tail area equal to α
•So the value of t_α in a particular situation depends on the
right hand tail area α and the number of degrees of freedom
•df = n − 1
•1 − α is the specified confidence coefficient
t and Right Hand Tail Areas
t-Based Confidence Intervals for a Mean:
σ Unknown
•If the sampled population is normally distributed with
mean μ, then a (1 − α)100% confidence interval for μ is
$\bar{x} \pm t_{\alpha/2}\frac{s}{\sqrt{n}}$
•t_{α/2} is the t point giving a right-hand tail area of α/2
under the t curve having n − 1 degrees of freedom
Car Mileage estimation:
•Recall from the previous example, x̄ = 31.56 mpg
for a sample of size n = 50 and s = 0.8
$t_{0.025,\,49} = 2.010$
$\mathrm{s.e.}(\bar{x}) = \frac{s}{\sqrt{n}} = \frac{0.8}{\sqrt{50}} = 0.113$
Car Mileage: 95% Confidence interval of
mean mileage
$95\%\ \text{CI of}\ \mu = \bar{x} \pm t_{\alpha/2;\,n-1}\frac{s}{\sqrt{n}} = 31.56 \pm (2.010 \times 0.113) = 31.56 \pm 0.22713 = [31.33,\ 31.79]$
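The arithmetic of this interval can be reproduced step by step. The sketch below takes x̄, s, n and the t point 2.010 from the slides rather than computing the t point, since the Python standard library has no t distribution.

```python
import math

x_bar, s, n = 31.56, 0.8, 50   # sample mean, sample sd and sample size from the case
t_point = 2.010                # t point for tail area 0.025 and 49 df, as quoted in the slides

standard_error = s / math.sqrt(n)
margin = t_point * standard_error
lower, upper = x_bar - margin, x_bar + margin

print(f"standard error  = {standard_error:.3f}")
print(f"margin of error = {margin:.3f}")
print(f"95% CI for the mean mileage: [{lower:.2f}, {upper:.2f}]")
```

The whole interval lies above 31 mpg, the threshold named in the tax credit scenario.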
Practice Problem 1:
•A manufacturer of light bulbs claims that its light bulbs have a mean life of μ hours
with a standard deviation of 85 hours. A random sample of 40 such bulbs is
selected for testing. If the sample produces a mean value of 1505 hours, find the
95% confidence interval of μ.
Solution: Given n = 40 (large), σ = 85 (known), 1 − α = 0.95, α = 0.05, x̄ = 1505, and z_{α/2} = z_{0.025} = 1.96,
the 95% CI of μ is given by
$\left[1505 - 1.96\frac{85}{\sqrt{40}},\ 1505 + 1.96\frac{85}{\sqrt{40}}\right] = [1478.66,\ 1531.34]$
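The same calculation as a small reusable function. This is a sketch, not a library routine: `z_confidence_interval` is a hypothetical helper, and it assumes σ is known and the sample is large or the population normal.

```python
import math
from statistics import NormalDist

def z_confidence_interval(x_bar, sigma, n, confidence):
    """(1 - alpha)100% confidence interval for the mean when sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # normal point z_{alpha/2}
    margin = z * sigma / math.sqrt(n)         # margin of error
    return x_bar - margin, x_bar + margin

# Light bulb problem: n = 40, sigma = 85, sample mean = 1505, 95% confidence.
lower, upper = z_confidence_interval(x_bar=1505, sigma=85, n=40, confidence=0.95)
print(f"95% CI: [{lower:.2f}, {upper:.2f}]")
```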
Practice Problem 2:
•Waiting times (in hours) at a popular restaurant are found to have a mean
of 1.52 hours with a standard deviation of 2.25 hours for a sample of 50 customers.
Construct the 99% confidence interval for the estimate of the population mean.
Solution: Given n = 50 (large), s = 2.25 (estimated), 1 − α = 0.99, α = 0.01, x̄ = 1.52, and z_{α/2} = z_{0.005} = 2.58.
Therefore,
the 99% CI of μ is given by
$\left[1.52 - 2.58\frac{2.25}{\sqrt{50}},\ 1.52 + 2.58\frac{2.25}{\sqrt{50}}\right] = [0.70,\ 2.34]$
Use the t-based confidence interval and observe the difference (assuming a normal
population).
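One way to carry out the suggested comparison is sketched below. The z point 2.58 is the one used in the solution. The t point 2.680 for tail area 0.005 and 49 degrees of freedom is not given in the slides: it is an assumed table value and should be checked against a t table.

```python
import math

x_bar, s, n = 1.52, 2.25, 50
standard_error = s / math.sqrt(n)

z_point = 2.58    # z_{0.005}, as used in the solution
t_point = 2.680   # assumed table value for t_{0.005} with n - 1 = 49 df

def interval(point):
    """Point estimate plus or minus (distribution point x standard error)."""
    margin = point * standard_error
    return x_bar - margin, x_bar + margin

z_lower, z_upper = interval(z_point)
t_lower, t_upper = interval(t_point)

print(f"z-based 99% CI: [{z_lower:.2f}, {z_upper:.2f}]")
print(f"t-based 99% CI: [{t_lower:.2f}, {t_upper:.2f}]")
```

The t-based interval comes out slightly wider, reflecting the extra uncertainty from estimating σ with s. With n = 50 the difference is small.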