Sampling distribution and Estimation_Reading.pdf

shubhamc16 20 views 53 slides Aug 18, 2024



Slide Content

Sampling Distribution and Estimation

[Diagram: the population distribution (Binomial, Poisson, Normal) has a population summary (parameter); the sample summary (statistic/estimator) has a sampling distribution (Z, t, chi-square, F); inference links the sample back to the population.]

Case
A restaurant chain has made improvements to its pizzas by including a new soft and tasty crust, more and bigger toppings, more cheese and a new heavier imported tomato sauce. The chain runs a promotional campaign offering the product to its customers at half price, expecting a large footfall of youths and families in the outlets. The offer is limited to one month.
The manager of one of the chain's outlets wants to measure the effectiveness of the campaign. She has been running the restaurant for the last ten years and claims from her experience that 75% of its customers are youths and that customers spend an average of INR 350 on pizza.

Case
In an attempt to measure the persisting impact of the marketing campaign on the amount that customers spend on pizza, and on the proportion of youths visiting the restaurant, the manager conducts a survey of 40 of her pizza customers. The survey is carried out two months after the promotion ends, to eliminate bias in the experiment. The survey reveals that 32 (80%) of the customers are youths, and that a customer spends an average of INR 375 on pizza. Moreover, the estimated standard deviation is found to be INR 50.
The manager wants to use the above data to
a) obtain a 95% confidence interval for the average amount spent on pizza.
b) obtain a 95% confidence interval for the proportion of youth customers ordering pizza.
c) test whether the average amount spent has increased due to the campaign.
d) test whether the proportion of youths buying pizza has increased.
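Tasks a) and b) can be sketched numerically as follows. This is an illustration under stated assumptions, not the deck's own solution: it uses the normal point for both intervals because n = 40 is large, and for the proportion it uses the normal-approximation interval p̂ ± z·sqrt(p̂(1 − p̂)/n), which this part of the deck does not spell out.

```python
from math import sqrt
from statistics import NormalDist

# Survey figures from the case.
n = 40          # customers surveyed
x_bar = 375.0   # average amount spent (INR)
s = 50.0        # estimated standard deviation (INR)
p_hat = 32 / n  # sample proportion of youth customers

# Normal point with right-hand tail area 0.025 (about 1.96).
z = NormalDist().inv_cdf(0.975)

# a) interval for the mean: point estimate +/- z * standard error
mean_margin = z * s / sqrt(n)
mean_ci = (x_bar - mean_margin, x_bar + mean_margin)

# b) interval for the proportion (assumption: normal approximation)
prop_margin = z * sqrt(p_hat * (1 - p_hat) / n)
prop_ci = (p_hat - prop_margin, p_hat + prop_margin)

print("95% CI for mean spend:", mean_ci)
print("95% CI for youth proportion:", prop_ci)
```

Under these assumptions the interval for the mean lies entirely above the claimed INR 350, while the interval for the proportion still contains the claimed 75%.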

Example
•Mean amount spent is a random variable
•Proportion of youth customers ordering pizza is a random
variable

Concept of sampling distribution
Suppose you select all possible random samples of customers. Each of those samples will yield a value of the average amount spent ($\bar{x}$). If you construct a histogram of those values, what you will get is precisely the sampling distribution of the mean amount spent.
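Enumerating all possible samples is rarely feasible, but the idea can be approximated by simulation. The sketch below draws many random samples from a hypothetical normal population (the values 350 and 50 are illustrative assumptions, not data from the case) and collects the sample means.

```python
import random
from statistics import mean, stdev

rng = random.Random(0)            # fixed seed so the run is repeatable

POP_MEAN, POP_SD = 350.0, 50.0    # hypothetical population (assumption)
SAMPLE_SIZE = 40
NUM_SAMPLES = 5000                # stands in for "all possible samples"

def one_sample_mean():
    """Draw one random sample and return its mean."""
    sample = [rng.gauss(POP_MEAN, POP_SD) for _ in range(SAMPLE_SIZE)]
    return mean(sample)

sample_means = [one_sample_mean() for _ in range(NUM_SAMPLES)]

# A histogram of sample_means approximates the sampling distribution.
centre = mean(sample_means)
spread = stdev(sample_means)
print("mean of sample means:", centre)
print("std. dev. of sample means:", spread)
```

The centre of the simulated means should sit close to the population mean, and their spread should be close to 50/sqrt(40), roughly 7.9.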

Sampling Distributions
•t-distribution
•Chi-square distribution
•F distribution

•The t distribution with various degrees of freedom (df)

Chi-square
The distribution of chi-square depends on one parameter, its degrees of freedom (df or ν). As df gets large, the curve becomes less skewed and more normal.

F Distribution

Car Mileage Case
Hybrid and electric cars are a vital part of reducing the US's gasoline consumption. The most effective way to conserve gasoline is to design gasoline-powered cars that are more fuel efficient. Virtually every gasoline-powered midsize car equipped with an automatic transmission has an EPA combined city and highway mileage estimate of 26 miles/gallon or less. Suppose that the government has decided to offer a tax credit to any automaker selling a midsize model which achieves an EPA estimate of at least 31 mpg.
Consider an automaker that has recently introduced a new midsize model that qualifies for the tax credit. Consider the population of all cars of this type that will or could potentially be produced. The automaker will choose a sample of 50 of these cars. The manufacturer's production operation runs 8-hour shifts, with 100 midsize cars produced on each shift. When all start-up problems have been corrected, the automaker selects one car at random from each of 50 shifts, and these cars are subjected to the EPA test.

Sampling Distribution of the Sample Mean
The sampling distribution of the sample mean is the probability distribution of the population of the sample means obtainable from all possible samples of size n from a population.

Example: The Population of Sample Means

Example: A Graph of the Probability
Distribution

Standard Error
•Variation in the values of statistic from sample to
sample is called sampling fluctuation and is
measured by STANDARD ERROR

Sampling Distribution of Mean
$E(\bar{x}) = \mu_{\bar{x}} = \mu$
$\mathrm{s.e.}(\bar{x}) = \dfrac{\sigma}{\sqrt{n}}$
As sample size increases, standard error decreases
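A short sketch of how the standard error σ/√n shrinks as n grows. The function name `standard_error` and the value σ = 0.3 are illustrative choices.

```python
from math import sqrt

def standard_error(sigma, n):
    """Standard error of the sample mean: sigma / sqrt(n)."""
    return sigma / sqrt(n)

sigma = 0.3  # illustrative population standard deviation
errors = {n: standard_error(sigma, n) for n in (1, 4, 16, 64)}

for n, se in errors.items():
    print(f"n = {n:2d}  s.e. = {se:.4f}")
```

Quadrupling the sample size halves the standard error, since n enters through its square root.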

Result
If $X \sim N(\mu, \sigma^2)$, then $\bar{x} \sim N\left(\mu, \dfrac{\sigma^2}{n}\right)$

Example
The foreman of a bottling plant has observed that the amount
of soda in each “32-ounce” bottle is actually a normally
distributed random variable, with a mean of 32.2 ounces and
a standard deviation of .3 ounce.
If a customer buys one bottle, what is the probability that the
bottle will contain more than 32 ounces?

Example
We want to find P(X > 32), where X is normally distributed with µ = 32.2 and σ = .3
$P(X > 32) = P\left(\dfrac{X - \mu}{\sigma} > \dfrac{32 - 32.2}{.3}\right) = P(Z > -.67) = 1 - .2514 = .7486$
“There is about a 75% chance that a single bottle of soda contains more than 32 oz.”
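The single-bottle probability can be checked with Python's standard-library `statistics.NormalDist`. This is a sketch; the result differs slightly from the table-based .7486 because the tables round z to −.67.

```python
from statistics import NormalDist

# Amount of soda in one bottle: X ~ N(32.2, 0.3^2)
bottle = NormalDist(mu=32.2, sigma=0.3)

# P(X > 32) = 1 - P(X <= 32)
p_single = 1 - bottle.cdf(32)
print(f"P(X > 32) = {p_single:.4f}")
```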

Example
The foreman of a bottling plant has observed that the amount
of soda in each “32-ounce” bottle is actually a normally
distributed random variable, with a mean of 32.2 ounces and
a standard deviation of .3 ounce.
If a customer buys a carton of four bottles, what is the
probability that the mean amount of the four bottles will be
greater than 32 ounces?

Example
We want to find $P(\bar{X} > 32)$, where X is normally distributed
with µ = 32.2 and σ = .3
Things we know:
X is normally distributed, therefore so is $\bar{X}$.
$\mu_{\bar{X}} = \mu$ = 32.2 oz.
$\sigma_{\bar{X}} = \sigma/\sqrt{n} = .3/\sqrt{4}$ = .15 oz.

Example
If a customer buys a carton of four bottles, what is the probability that
the mean amount of the four bottles will be greater than 32 ounces?
$P(\bar{X} > 32) = P\left(Z > \dfrac{32 - 32.2}{.3/\sqrt{4}}\right) = P(Z > -1.33) = 1 - .0918 = .9082$
“There is about a 91% chance the mean of the four bottles will exceed
32oz.”
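The same standard-library check works for the carton of four, using the standard error σ/√n in place of σ. A sketch; the exact value differs slightly from a table lookup at z = −1.33.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 32.2, 0.3, 4

# Mean of four bottles: X-bar ~ N(mu, (sigma / sqrt(n))^2)
carton_mean = NormalDist(mu=mu, sigma=sigma / sqrt(n))

p_mean = 1 - carton_mean.cdf(32)
print(f"P(mean of 4 bottles > 32) = {p_mean:.4f}")
```

The probability is higher than for a single bottle because the sample mean varies less around 32.2 than an individual bottle does.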

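[Figure: the distribution of a single bottle (what is the probability that one bottle will contain more than 32 ounces?) compared with the distribution of the mean of four bottles (what is the probability that the mean of four bottles will exceed 32 oz?); both are centred at mean = 32.2.]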

Central Limit Theorem (CLT)
If a random sample of size n is drawn from a population with mean µ and standard deviation σ, the distribution of the sample mean ($\bar{x}$) approaches a normal distribution with mean µ and standard deviation $\sigma/\sqrt{n}$ as the sample size (n) increases, i.e. $\bar{x} \sim N\left(\mu, \dfrac{\sigma^2}{n}\right)$.
If the population is normal, the distribution of the sample mean is normal regardless of sample size.
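The theorem can be illustrated by simulation. The sketch below samples from a strongly right-skewed population (an exponential distribution with mean 1, chosen here purely as an illustrative assumption) and measures how the skewness of the sample means shrinks toward 0, the value for a normal curve, as n grows.

```python
import random
from math import sqrt

def skewness(values):
    """Moment-based skewness: mean of cubed standardised deviations."""
    k = len(values)
    m = sum(values) / k
    sd = sqrt(sum((v - m) ** 2 for v in values) / k)
    return sum(((v - m) / sd) ** 3 for v in values) / k

def simulate_sample_means(rng, n, reps):
    """Means of `reps` random samples of size n from Exponential(1)."""
    return [sum(rng.expovariate(1.0) for _ in range(n)) / n
            for _ in range(reps)]

rng = random.Random(1)  # fixed seed so the run is repeatable
skew_by_n = {n: skewness(simulate_sample_means(rng, n, 4000))
             for n in (2, 10, 50)}

for n, sk in skew_by_n.items():
    print(f"n = {n:2d}  skewness of sample means = {sk:.2f}")
```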

WHY CLT IS USEFUL
•When the sampling distribution of $\bar{x}$ is approximately normal, we can use the Empirical rule to predict how close sample means will be to the true population mean.
•Since the CLT holds for a large number of population distributions, it helps us to make inferences about the population means regardless of the shape of the population distribution. This is often helpful in practice since we usually do not know the true shape of the population distribution (and often it is skewed).

How Large?
•How large is “large enough?”
•If the sample size is at least 30, then for most
populations, the sampling distribution of sample
means is approximately normal
•For skewed distribution, it may be even 50 or more
•For heavy tailed it may be even more (100 or more)
•If the population is normal, then the sampling
distribution of sample mean is normal regardless of
the sample size

Data Analysis
Mean 31.56
Standard Error 0.112812
Median 31.55
Mode 31.4
Standard Deviation 0.797701
Sample Variance 0.636327
Kurtosis -0.51125
Skewness -0.03422
Range 3.5
Minimum 29.8
Maximum 33.3
Sum 1578
Count 50
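The entries in this summary are linked by simple identities, which the sketch below checks using only the reported figures: mean = sum / count, sample variance = (standard deviation)², and standard error = standard deviation / √count.

```python
from math import sqrt

# Figures copied from the summary output above.
count = 50
total = 1578.0
std_dev = 0.797701

mean_mpg = total / count               # reported mean: 31.56
variance = std_dev ** 2                # reported sample variance: 0.636327
std_error = std_dev / sqrt(count)      # reported standard error: 0.112812

print(mean_mpg, variance, std_error)
```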

How to estimate parameters?
Already seen that
$\hat{\mu} = \bar{x}$ (the sample mean estimates the population mean)
For the car mileage case, sample mean = 31.56

How to estimate population Standard deviation σ?
$\hat{\sigma} = s = \sqrt{\dfrac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$
Note: These estimates are point estimates and may not be perfect.
Use “Interval Estimates”
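A minimal sketch of the two point estimates using the standard library, where `statistics.stdev` applies the n − 1 divisor. The ten mileage readings are hypothetical values made up for illustration; they are not the deck's sample of 50 cars.

```python
from statistics import mean, stdev

# Hypothetical mileage readings in mpg (illustrative only).
readings = [30.8, 31.7, 30.1, 31.6, 32.1, 33.3, 31.3, 31.0, 32.0, 32.4]

mu_hat = mean(readings)      # point estimate of the population mean
sigma_hat = stdev(readings)  # point estimate of sigma (divides by n - 1)

print(f"mu-hat = {mu_hat:.2f}, sigma-hat = {sigma_hat:.3f}")
```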

Confidence Intervals

Interval Estimate = Point Estimate ± Margin of Error
Margin of Error = (sampling distribution point) × (standard error)

Confidence Intervals for a Mean: σ Known
•A confidence interval for a population mean is an interval constructed around the sample mean so that we are reasonably sure that it contains the population mean
•Any confidence interval is based on a confidence level

Elements of Interval Estimation
[Figure: a confidence interval drawn around a statistic (e.g. the sample mean), bounded by the lower and upper confidence limits; the confidence level is the probability that the population parameter falls somewhere within the interval.]

The Car Mileage Case
•Automaker conducted mileage tests on n=50 cars
•Sample mean is 31.56
•This is a point estimate of the population mean
•Do not know how good this estimate is
•Will use a confidence interval

The Car Mileage Case
•There were many samples of 50 cars
•Each would give different means
•Consider the probability distribution of all the
sample means
•Called the sampling distribution, with $\mu_{\bar{x}} = \mu$ and $se(\bar{x}) = \sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$

The Car Mileage Case
1.Because the sampling distribution of the sample mean is
a normal distribution, we can use the normal distribution
to compute probabilities about the sample mean
2.The 95 percent confidence interval is $\left[\bar{x} \pm 1.96\,\sigma_{\bar{x}}\right] = \left[\bar{x} \pm 1.96\dfrac{\sigma}{\sqrt{n}}\right]$

What is happening?
[Figure: sampling distribution of the mean, $f(\bar{x})$ against $\bar{x}$. 95% of the sample means fall within the interval from $\mu - 1.96\,\sigma/\sqrt{n}$ to $\mu + 1.96\,\sigma/\sqrt{n}$; 2.5% fall below the interval and 2.5% fall above it.]

Generalizing
•The probability that the confidence interval will contain the
population mean μ is denoted by 1 – α
•1 – α is referred to as the confidence coefficient
•(1 – α) 100% is called the confidence level
•Usual to use two decimal point probabilities for 1 – α
•Here, focus on 1 – α = 0.95 or 0.99

General Confidence Interval
•In general, the probability is 1 – α that the population mean μ is contained in the interval $\left[\bar{x} \pm z_{\alpha/2}\,\sigma_{\bar{x}}\right] = \left[\bar{x} \pm z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}\right]$
•The normal point $z_{\alpha/2}$ gives a right hand tail area under the standard normal curve equal to α/2
•The normal point $-z_{\alpha/2}$ gives a left hand tail area under the standard normal curve equal to α/2
•The area under the standard normal curve between $-z_{\alpha/2}$ and $z_{\alpha/2}$ is 1 – α

General Confidence Interval
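•If a population has standard deviation σ (known),
•and if the population is normal or if sample size is large (n ≥ 30), then …
•… a (1 – α)100% confidence interval for μ is $\bar{x} \pm z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}} = \left[\bar{x} - z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}\right]$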
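The general z-based interval can be sketched as a small function. The name `z_confidence_interval` and the example figures (x̄ = 50, σ = 10, n = 100) are hypothetical, chosen only to show the calculation.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(x_bar, sigma, n, confidence=0.95):
    """(1 - alpha)100% interval for the mean when sigma is known."""
    alpha = 1 - confidence
    # Normal point with right-hand tail area alpha / 2.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin

# Hypothetical inputs (illustrative only).
lower, upper = z_confidence_interval(x_bar=50.0, sigma=10.0, n=100)
print(f"95% CI: [{lower:.2f}, {upper:.2f}]")
```

The function assumes, as the conditions above require, that the population is normal or that n is large.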

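95% Confidence Interval
$\left[\bar{x} \pm z_{.025}\,\sigma_{\bar{x}}\right] = \left[\bar{x} \pm 1.96\dfrac{\sigma}{\sqrt{n}}\right] = \left[\bar{x} - 1.96\dfrac{\sigma}{\sqrt{n}},\ \bar{x} + 1.96\dfrac{\sigma}{\sqrt{n}}\right]$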

99% Confidence Interval
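•For 99% confidence, need the normal point $z_{0.005}$
•(1 – 0.99) / 2 = 0.005
•$z_{0.005}$ = 2.575
•The 99% confidence interval is $\left[\bar{x} \pm z_{.005}\,\sigma_{\bar{x}}\right] = \left[\bar{x} \pm 2.575\dfrac{\sigma}{\sqrt{n}}\right] = \left[\bar{x} - 2.575\dfrac{\sigma}{\sqrt{n}},\ \bar{x} + 2.575\dfrac{\sigma}{\sqrt{n}}\right]$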
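The normal points for different confidence levels can be computed rather than read from a table. The sketch below uses the standard-library `NormalDist.inv_cdf`; the computed 99% point is about 2.576, which tables commonly report as 2.575 or 2.58.

```python
from statistics import NormalDist

std_normal = NormalDist()

# z point with right-hand tail area alpha/2 for each confidence level.
z_points = {level: std_normal.inv_cdf(1 - (1 - level) / 2)
            for level in (0.90, 0.95, 0.99)}

for level, z in z_points.items():
    print(f"{level:.0%} confidence: z = {z:.3f}")
```

For a fixed σ and n, the interval width is 2·z·σ/√n, so a higher confidence level (smaller α) gives a wider interval.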

The Effect of α on Confidence Interval Width

t-Based Confidence Intervals for a Mean: σ Unknown
•If σ is unknown (which is usually the case), we can construct a confidence interval for μ based on the sampling distribution of $t = \dfrac{\bar{x} - \mu}{s/\sqrt{n}}$
•If the population is normal, then for any sample size n, this sampling distribution is called the t distribution

The t Distribution
•The curve of the t distribution is similar to that of the
standard normal curve
•Symmetrical and bell-shaped
•The t distribution is more spread out than the standard
normal distribution
•The spread of the t is determined by the number of degrees of
freedom, which depends on the sample size
•Denoted by df
•For a sample of size n, there are one fewer degrees of
freedom, that is, df = n – 1

Degrees of Freedom and the
t-Distribution
As the number of degrees of freedom increases, the spread
of the t distribution decreases and the t curve approaches
the standard normal curve
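This behaviour can be checked numerically from the t density itself. The sketch below implements the standard t probability density with the standard library only; the helper name `t_pdf` is an illustrative choice. It compares the density in the tail (at 3) and at the centre (at 0) with the standard normal curve.

```python
from math import exp, lgamma, log, pi
from statistics import NormalDist

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_const = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_const - (df + 1) / 2 * log(1 + x * x / df))

normal = NormalDist()

for df in (2, 10, 30, 1000):
    print(f"df = {df:4d}  density at 3: {t_pdf(3, df):.5f}  "
          f"density at 0: {t_pdf(0, df):.5f}")
print(f"normal     density at 3: {normal.pdf(3):.5f}  "
      f"density at 0: {normal.pdf(0):.5f}")
```

Smaller df puts more density in the tails (more spread); as df grows, the values approach those of the standard normal curve.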

t and Right Hand Tail Areas
•Use a t point denoted by $t_{\alpha}$
•$t_{\alpha}$ is the point on the horizontal axis under the t curve that
gives a right hand tail equal to α
•So the value of $t_{\alpha}$ in a particular situation depends on the
right hand tail area α and the number of degrees of freedom
•df = n – 1
•1 – α is the specified confidence coefficient

t and Right Hand Tail Areas

t-Based Confidence Intervals for a Mean: σ Unknown
•If the sampled population is normally distributed with mean μ, then a (1 – α)100% confidence interval for μ is $\left[\bar{x} \pm t_{\alpha/2}\dfrac{s}{\sqrt{n}}\right]$
•$t_{\alpha/2}$ is the t point giving a right-hand tail area of α/2 under the t curve having n – 1 degrees of freedom

Car Mileage estimation:
•Recall from the previous example, $\bar{x}$ = 31.56 mpg for a sample of size n = 50 and s = 0.8
$\dfrac{s}{\sqrt{n}} = \dfrac{0.8}{\sqrt{50}} = 0.113$
$t_{0.025,\,49} = 2.010$


Car Mileage: 95% Confidence interval of mean mileage
$\bar{x} \pm t_{\alpha/2;\,n-1}\dfrac{s}{\sqrt{n}} = 31.56 \pm (2.010 \times 0.113) = 31.56 \pm 0.22713$
95% CI of μ = [31.33, 31.79]

Practice Problem 1:
•A manufacturer of light bulbs claims that its light bulbs have a mean life of μ hours with a standard deviation of 85 hours. A random sample of 40 such bulbs is selected for testing. If the sample produces a mean value of 1505 hours, find the 95% confidence interval of μ.
Solution: Given n = 40 (large), σ = 85 (known), $\bar{x}$ = 1505, 1 – α = 0.95, α = 0.05, $z_{\alpha/2} = z_{0.025} = 1.96$.
The 95% CI of μ is given by
$\left[1505 - 1.96\dfrac{85}{\sqrt{40}},\ 1505 + 1.96\dfrac{85}{\sqrt{40}}\right] = [1478.66,\ 1531.34]$

Practice Problem 2:
•Waiting times (in hours) at a popular restaurant are found to have a mean of 1.52 hours with a standard deviation of 2.25 hours for a sample of 50 customers. Construct the 99% confidence interval for the estimate of the population mean.
Solution: Given n = 50 (large), s = 2.25 (estimated), $\bar{x}$ = 1.52, 1 – α = 0.99, α = 0.01. Therefore $z_{\alpha/2} = z_{0.005} = 2.58$.
The 99% CI of μ is given by
$\left[1.52 - 2.58\dfrac{2.25}{\sqrt{50}},\ 1.52 + 2.58\dfrac{2.25}{\sqrt{50}}\right] = [0.70,\ 2.34]$
Use the t-based confidence interval and observe the difference (assuming a normal population).
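The suggested comparison can be sketched as follows. The z point 2.58 is the one used in the solution; the t point 2.680 for a right-hand tail area of 0.005 with 49 degrees of freedom is an assumed table value that does not appear in the slides.

```python
from math import sqrt

x_bar, s, n = 1.52, 2.25, 50
std_error = s / sqrt(n)

z_point = 2.58   # normal point used in the solution above
t_point = 2.680  # assumed table value for t with 49 df, tail area 0.005

z_ci = (x_bar - z_point * std_error, x_bar + z_point * std_error)
t_ci = (x_bar - t_point * std_error, x_bar + t_point * std_error)

print(f"z-based 99% CI: [{z_ci[0]:.2f}, {z_ci[1]:.2f}]")
print(f"t-based 99% CI: [{t_ci[0]:.2f}, {t_ci[1]:.2f}]")
```

The t-based interval is slightly wider, reflecting the extra uncertainty from estimating σ by s.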