Chapter Four: Sampling and Sampling Distributions

aschalewshiferaw, 36 slides, Jun 29, 2024

About This Presentation

This chapter covers sampling and sampling distributions. It is well suited to teaching introductory statistics.


Slide Content

5-1
Chapter 4
Sampling and Sampling Distributions

5-2
• Using Statistics
• Sample Statistics as Estimators of Population Parameters
• Sampling Distributions
• Estimators and Their Properties
• Degrees of Freedom
Sampling and Sampling Distributions

5-3
• Statistical Inference:
Predict and forecast values of population parameters, test hypotheses about values of population parameters, and make decisions, on the basis of sample statistics derived from limited and incomplete sample information.
Make generalizations about the characteristics of a population on the basis of observations of a sample, a part of a population.
4-1 Using Statistics

5-4
[Figure: Two samples from a population of Democrats and Republicans. Biased, unrepresentative sample drawn from people who have cars and/or telephones and/or read the Digest. Unbiased, representative sample drawn at random from the entire population.]
The Literary Digest Poll (1936)

5-5
• An estimator of a population parameter is a sample statistic used to estimate or predict the population parameter.
• An estimate of a parameter is a particular numerical value of a sample statistic obtained through sampling.
• A point estimate is a single value used as an estimate of a population parameter.
A population parameter is a numerical measure of a summary characteristic of a population.
4-2 Sample Statistics as Estimators of Population Parameters
A sample statistic is a numerical measure of a summary characteristic of a sample.

5-6
• The sample mean, $\bar{X}$, is the most common estimator of the population mean, $\mu$.
• The sample variance, $s^2$, is the most common estimator of the population variance, $\sigma^2$.
• The sample standard deviation, $s$, is the most common estimator of the population standard deviation, $\sigma$.
• The sample proportion, $\hat{p}$, is the most common estimator of the population proportion, $p$.
Estimators
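As a minimal sketch, the four point estimators listed above can be computed with Python's standard `statistics` module. The sample values and the "category of interest" (values above 50) are hypothetical, chosen only for illustration.

```python
import statistics

# Hypothetical sample of n = 8 observations (illustration only)
sample = [52, 47, 55, 61, 49, 58, 44, 50]
n = len(sample)

x_bar = statistics.mean(sample)    # estimates the population mean, mu
s2 = statistics.variance(sample)   # divisor (n - 1); estimates sigma^2
s = statistics.stdev(sample)       # square root of s2; estimates sigma

# Sample proportion: share of observations in a hypothetical
# category of interest (values above 50); estimates p
p_hat = sum(1 for x in sample if x > 50) / n

print(f"x_bar = {x_bar}, s^2 = {s2:.4f}, s = {s:.4f}, p_hat = {p_hat}")
```

Each printed number is an estimate: a particular value of the corresponding estimator for this one sample.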

5-7
• The population proportion is equal to the number of elements in the population belonging to the category of interest, divided by the total number of elements in the population: $p = \frac{X}{N}$
• The sample proportion is the number of elements in the sample belonging to the category of interest, divided by the sample size: $\hat{p} = \frac{x}{n}$
Population and Sample Proportions
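A hypothetical population of N = 1,000 elements, 300 of which belong to the category of interest, makes the two formulas concrete: $p = X/N$ is a fixed parameter, while $\hat{p} = x/n$ changes from sample to sample. The population, the seed, and the sample size of 50 are assumptions for illustration.

```python
import random

# Hypothetical population: 1 = in the category of interest, 0 = not
N = 1000
population = [1] * 300 + [0] * (N - 300)

X = sum(population)       # elements of the population in the category
p = X / N                 # population proportion (a parameter)

rng = random.Random(42)   # fixed seed so the run is reproducible
n = 50
sample = rng.sample(population, n)   # simple random sample without replacement
x = sum(sample)           # elements of the sample in the category
p_hat = x / n             # sample proportion (an estimate of p)

print(f"p = {p}, p_hat = {p_hat}")
```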

5-8
[Figure: Frequency distribution of the population, with the population mean ($\mu$), the sample points (marked X), and the sample mean ($\bar{X}$).]
A Population Distribution, a Sample from a Population, and the Population and Sample Means

5-9
• Stratified sampling: in stratified sampling, the population is partitioned into two or more subpopulations called strata, and from each stratum a desired sample size is selected at random.
• Cluster sampling: in cluster sampling, the population is divided into groups called clusters, a random sample of the clusters is selected, and then samples are obtained from the selected clusters.
• Systematic sampling: in systematic sampling, we start at a random point in the sampling frame and, from this point, select every kth value in the frame to form the sample.
Other Sampling Methods
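The three schemes can be sketched with Python's standard `random` module. The sampling frame of 100 numbered units, the two strata, the ten clusters of ten, and the helper names `stratified_sample`, `cluster_sample`, and `systematic_sample` are all hypothetical. The cluster helper is a one-stage design that keeps every unit of each chosen cluster; a two-stage design would subsample within the chosen clusters instead.

```python
import random

def stratified_sample(strata, sizes, rng):
    """Draw a simple random sample of the requested size from each stratum."""
    return {name: rng.sample(units, sizes[name]) for name, units in strata.items()}

def cluster_sample(clusters, k, rng):
    """Pick k clusters at random and keep every unit in them (one-stage)."""
    chosen = rng.sample(clusters, k)
    return [unit for cluster in chosen for unit in cluster]

def systematic_sample(frame, k, rng):
    """Start at a random position among the first k units, then take every kth."""
    start = rng.randrange(k)
    return frame[start::k]

rng = random.Random(1)
frame = list(range(1, 101))                               # units numbered 1..100
strata = {"A": frame[:60], "B": frame[60:]}               # two hypothetical strata
clusters = [frame[i:i + 10] for i in range(0, 100, 10)]   # ten clusters of ten

print(stratified_sample(strata, {"A": 6, "B": 4}, rng))
print(cluster_sample(clusters, 2, rng))
print(systematic_sample(frame, 10, rng))
```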

5-10
• The sampling distribution of a statistic is the probability distribution of all possible values the statistic may assume, when computed from random samples of the same size, drawn from a specified population.
• The sampling distribution of $\bar{X}$ is the probability distribution of all possible values the random variable $\bar{X}$ may assume when a sample of size n is taken from a specified population.
4-3 Sampling Distributions

5-11
Uniform population of integers from 1 to 8:

$X$ | $P(X)$ | $XP(X)$ | $X - \mu_X$ | $(X - \mu_X)^2$ | $P(X)(X - \mu_X)^2$
1 | 0.125 | 0.125 | -3.5 | 12.25 | 1.53125
2 | 0.125 | 0.250 | -2.5 | 6.25 | 0.78125
3 | 0.125 | 0.375 | -1.5 | 2.25 | 0.28125
4 | 0.125 | 0.500 | -0.5 | 0.25 | 0.03125
5 | 0.125 | 0.625 | 0.5 | 0.25 | 0.03125
6 | 0.125 | 0.750 | 1.5 | 2.25 | 0.28125
7 | 0.125 | 0.875 | 2.5 | 6.25 | 0.78125
8 | 0.125 | 1.000 | 3.5 | 12.25 | 1.53125
Total | 1.000 | 4.500 | | | 5.25000

[Figure: Uniform Distribution (1,8), bar chart of $P(X)$ against $X = 1, \dots, 8$.]
$E(X) = \mu = 4.5$
$V(X) = \sigma^2 = 5.25$
$SD(X) = \sigma = 2.2913$
Sampling Distributions (Continued)
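The table's totals can be reproduced exactly with Python's `fractions` module, which avoids floating-point rounding. A minimal sketch:

```python
from fractions import Fraction
import math

values = range(1, 9)    # population: the integers 1..8
prob = Fraction(1, 8)   # each value equally likely

mu = sum(x * prob for x in values)                # E(X) = sum of X P(X)
var = sum((x - mu) ** 2 * prob for x in values)   # V(X) = sum of P(X)(X - mu)^2
sd = math.sqrt(var)

print(float(mu), float(var), round(sd, 4))   # 4.5 5.25 2.2913
```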

5-12
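There are 8 × 8 = 64 different but equally likely samples of size 2 that can be drawn (with replacement) from a uniform population of the integers from 1 to 8:

Samples of Size 2 from Uniform (1,8) (rows: first value drawn; columns: second value drawn)
  | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
1 | 1,1 | 1,2 | 1,3 | 1,4 | 1,5 | 1,6 | 1,7 | 1,8
2 | 2,1 | 2,2 | 2,3 | 2,4 | 2,5 | 2,6 | 2,7 | 2,8
3 | 3,1 | 3,2 | 3,3 | 3,4 | 3,5 | 3,6 | 3,7 | 3,8
4 | 4,1 | 4,2 | 4,3 | 4,4 | 4,5 | 4,6 | 4,7 | 4,8
5 | 5,1 | 5,2 | 5,3 | 5,4 | 5,5 | 5,6 | 5,7 | 5,8
6 | 6,1 | 6,2 | 6,3 | 6,4 | 6,5 | 6,6 | 6,7 | 6,8
7 | 7,1 | 7,2 | 7,3 | 7,4 | 7,5 | 7,6 | 7,7 | 7,8
8 | 8,1 | 8,2 | 8,3 | 8,4 | 8,5 | 8,6 | 8,7 | 8,8

Each of these samples has a sample mean. For example, the mean of the sample (1,4) is 2.5, and the mean of the sample (8,4) is 6.

Sample Means from Uniform (1,8), n = 2
  | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
1 | 1.0 | 1.5 | 2.0 | 2.5 | 3.0 | 3.5 | 4.0 | 4.5
2 | 1.5 | 2.0 | 2.5 | 3.0 | 3.5 | 4.0 | 4.5 | 5.0
3 | 2.0 | 2.5 | 3.0 | 3.5 | 4.0 | 4.5 | 5.0 | 5.5
4 | 2.5 | 3.0 | 3.5 | 4.0 | 4.5 | 5.0 | 5.5 | 6.0
5 | 3.0 | 3.5 | 4.0 | 4.5 | 5.0 | 5.5 | 6.0 | 6.5
6 | 3.5 | 4.0 | 4.5 | 5.0 | 5.5 | 6.0 | 6.5 | 7.0
7 | 4.0 | 4.5 | 5.0 | 5.5 | 6.0 | 6.5 | 7.0 | 7.5
8 | 4.5 | 5.0 | 5.5 | 6.0 | 6.5 | 7.0 | 7.5 | 8.0
Sampling Distributions (Continued)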
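The 64 samples and their means can be enumerated with `itertools.product`, which generates ordered pairs. That matches sampling with replacement, where (1,4) and (4,1) count as different samples. A short sketch:

```python
from collections import Counter
from itertools import product

population = range(1, 9)
samples = list(product(population, repeat=2))   # all ordered pairs, with replacement
means = [sum(s) / 2 for s in samples]           # sample mean of each pair

counts = Counter(means)   # how many of the 64 samples give each mean
print(len(samples))               # 64
print(counts[4.5], counts[1.0])   # 8 1
```

Dividing each count by 64 gives the probabilities of the sampling distribution of the mean.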

5-13
Sampling Distribution of the Mean
The probability distribution of the sample mean is called the sampling distribution of the sample mean.

[Figure: Sampling Distribution of the Mean, bar chart of $P(\bar{X})$ against $\bar{X} = 1.0, 1.5, \dots, 8.0$.]

$\bar{X}$ | $P(\bar{X})$ | $\bar{X}P(\bar{X})$ | $\bar{X} - \mu_{\bar{X}}$ | $(\bar{X} - \mu_{\bar{X}})^2$ | $P(\bar{X})(\bar{X} - \mu_{\bar{X}})^2$
1.0 | 0.015625 | 0.015625 | -3.5 | 12.25 | 0.191406
1.5 | 0.031250 | 0.046875 | -3.0 | 9.00 | 0.281250
2.0 | 0.046875 | 0.093750 | -2.5 | 6.25 | 0.292969
2.5 | 0.062500 | 0.156250 | -2.0 | 4.00 | 0.250000
3.0 | 0.078125 | 0.234375 | -1.5 | 2.25 | 0.175781
3.5 | 0.093750 | 0.328125 | -1.0 | 1.00 | 0.093750
4.0 | 0.109375 | 0.437500 | -0.5 | 0.25 | 0.027344
4.5 | 0.125000 | 0.562500 | 0.0 | 0.00 | 0.000000
5.0 | 0.109375 | 0.546875 | 0.5 | 0.25 | 0.027344
5.5 | 0.093750 | 0.515625 | 1.0 | 1.00 | 0.093750
6.0 | 0.078125 | 0.468750 | 1.5 | 2.25 | 0.175781
6.5 | 0.062500 | 0.406250 | 2.0 | 4.00 | 0.250000
7.0 | 0.046875 | 0.328125 | 2.5 | 6.25 | 0.292969
7.5 | 0.031250 | 0.234375 | 3.0 | 9.00 | 0.281250
8.0 | 0.015625 | 0.125000 | 3.5 | 12.25 | 0.191406
Total | 1.000000 | 4.500000 | | | 2.625000

$E(\bar{X}) = \mu_{\bar{X}} = 4.5$
$V(\bar{X}) = \sigma^2_{\bar{X}} = 2.625$
$SD(\bar{X}) = \sigma_{\bar{X}} = 1.6202$
Sampling Distributions (Continued)
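The mean, variance, and standard deviation of this sampling distribution follow from the exact probabilities count/64. This sketch rebuilds the distribution from scratch so that it runs on its own, using `Fraction` for exact arithmetic:

```python
from collections import Counter
from fractions import Fraction
from itertools import product
import math

pairs = product(range(1, 9), repeat=2)
dist = Counter(Fraction(a + b, 2) for a, b in pairs)   # sample mean -> count
total = sum(dist.values())                             # 64 equally likely samples

mean = sum(m * Fraction(c, total) for m, c in dist.items())
var = sum((m - mean) ** 2 * Fraction(c, total) for m, c in dist.items())

print(float(mean), float(var), round(math.sqrt(var), 4))   # 4.5 2.625 1.6202
```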

5-14
• Comparing the population distribution and the sampling distribution of the mean:
The sampling distribution is more bell-shaped (both distributions are symmetric).
Both have the same center.
The sampling distribution of the mean is more compact, with a smaller variance.
[Figure: side-by-side bar charts of the Uniform Distribution (1,8), $P(X)$ against $X$, and the Sampling Distribution of the Mean, $P(\bar{X})$ against $\bar{X}$.]
Properties of the Sampling Distribution of the Sample Mean

5-15
The expected value of the sample mean is equal to the population mean: $E(\bar{X}) = \mu_{\bar{X}} = \mu_X$
The variance of the sample mean is equal to the population variance divided by the sample size (for independent observations, as when sampling with replacement): $V(\bar{X}) = \sigma^2_{\bar{X}} = \frac{\sigma^2_X}{n}$
The standard deviation of the sample mean, known as the standard error of the mean, is equal to the population standard deviation divided by the square root of the sample size: $SD(\bar{X}) = \sigma_{\bar{X}} = \frac{\sigma_X}{\sqrt{n}}$
Relationships between Population Parameters and the Sampling Distribution of the Sample Mean
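These relationships can be checked exactly for the Uniform(1,8) population ($\sigma^2 = 5.25$) by enumerating every ordered sample, drawn with replacement, of size n = 1, 2, and 3. In this sketch the helper `exact_mean_and_variance` is a name introduced here for illustration:

```python
from fractions import Fraction
from itertools import product

def exact_mean_and_variance(population, n):
    """E and V of the sample mean over all len(population)**n equally likely samples."""
    means = [Fraction(sum(s), n) for s in product(population, repeat=n)]
    k = len(means)
    e = sum(means) / k
    v = sum((m - e) ** 2 for m in means) / k
    return e, v

pop = range(1, 9)
sigma2 = Fraction(21, 4)   # population variance, 5.25
for n in (1, 2, 3):
    e, v = exact_mean_and_variance(pop, n)
    print(n, float(e), float(v), v == sigma2 / n)
```

Every row should report a mean of 4.5 and `True`, since the variance shrinks exactly as $\sigma^2/n$ (5.25, 2.625, 1.75).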

5-16
When sampling from a normal population with mean $\mu$ and standard deviation $\sigma$, the sample mean, $\bar{X}$, has a normal sampling distribution:
$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$
This means that, as the sample size increases, the sampling distribution of the sample mean remains centered on the population mean, but becomes more compactly distributed around that population mean.
[Figure: Sampling Distribution of the Sample Mean, densities $f(\bar{X})$ of the normal population and of the sampling distributions for n = 2, n = 4, and n = 16.]
Sampling from a Normal Population
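A quick Monte Carlo sketch shows the same shrinking spread. The population parameters ($\mu = 50$, $\sigma = 10$), the seed, and the 20,000 repetitions are arbitrary assumptions; the simulated standard deviations should land close to $\sigma/\sqrt{n}$ but will not match it exactly.

```python
import math
import random
import statistics

rng = random.Random(7)
mu, sigma = 50.0, 10.0   # hypothetical normal population
reps = 20_000            # simulated samples per sample size

results = {}
for n in (2, 4, 16):
    means = [statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
             for _ in range(reps)]
    results[n] = (statistics.fmean(means), statistics.pstdev(means))
    center, spread = results[n]
    print(f"n={n:2d}  mean={center:.3f}  SD={spread:.3f}  "
          f"sigma/sqrt(n)={sigma / math.sqrt(n):.3f}")
```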

5-17
When sampling from a population with mean $\mu$ and finite standard deviation $\sigma$, the sampling distribution of the sample mean will tend to a normal distribution with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$ as the sample size becomes large. A common rule of thumb treats n > 30 as large.
For “large enough” n: $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$ approximately.
[Figure: sampling distribution of $\bar{X}$ for n = 5 and n = 20 (probabilities $P(\bar{X})$) and for large n (density $f(\bar{X})$, approximately normal and centered at $\mu$).]
The Central Limit Theorem

5-18
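[Figure: for Normal, Uniform, Skewed, and General populations, the population distribution and the sampling distribution of $\bar{X}$ for n = 2 and n = 30.]
The Central Limit Theorem Applies to Sampling Distributions from Any Population (with Finite Variance)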
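To see the theorem at work on a skewed population, this sketch draws from an exponential distribution, an assumed example population with mean 1, standard deviation 1, and skewness 2. It measures the skewness of the simulated sample means. For this population the skewness of $\bar{X}$ is $2/\sqrt{n}$, so it should fall from about 1.4 at n = 2 to about 0.37 at n = 30; simulated values will be close but not exact. The helper `skewness`, the seed, and the repetition count are illustrative choices.

```python
import random
import statistics

def skewness(data):
    """Moment coefficient of skewness: E[(X - mean)^3] / SD^3."""
    m = statistics.fmean(data)
    sd = statistics.pstdev(data)
    return statistics.fmean((x - m) ** 3 for x in data) / sd ** 3

rng = random.Random(3)
reps = 20_000
results = {}
for n in (2, 30):
    means = [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
             for _ in range(reps)]
    results[n] = skewness(means)
    print(f"n={n:2d}  skewness of X-bar = {results[n]:.2f}")
```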

5-19
Mercury makes a 2.4 liter V-6 engine, the Laser XRi, used in speedboats.
The company’s engineers believe the engine delivers an average power of
220 horsepower and that the standard deviation of power delivered is 15 HP.
A potential buyer intends to sample 100 engines (each engine is to be run a
single time). What is the probability that the sample mean will be less than
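217 HP?

$P(\bar{X} < 217) = P\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < \frac{217 - \mu}{\sigma/\sqrt{n}}\right) = P\left(Z < \frac{217 - 220}{15/\sqrt{100}}\right) = P\left(Z < \frac{217 - 220}{15/10}\right) = P(Z < -2) = 0.0228$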
The Central Limit Theorem
(Example 4-1)
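The same computation can be sketched in Python, using the standard library's `statistics.NormalDist` (Python 3.8+) in place of a normal table. The central limit theorem justifies the normal approximation for n = 100.

```python
import math
from statistics import NormalDist

mu, sigma, n = 220, 15, 100   # claimed mean HP, SD, and sample size
se = sigma / math.sqrt(n)     # standard error of the mean: 15/10 = 1.5
z = (217 - mu) / se           # standardized value: -3/1.5 = -2.0
prob = NormalDist().cdf(z)    # P(Z < -2)

print(z, round(prob, 4))      # -2.0 0.0228
```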

5-20
Example 4-2 EPS Mean Distribution
[Figure: histogram of Frequency (axis from 0 to 25) by Range of EPS means, in bins 2.00-2.49, 2.50-2.99, ..., 7.50-7.99.]

5-21
If the population standard deviation, $\sigma$, is unknown, replace $\sigma$ with the sample standard deviation, $s$. If the population is normal, the resulting statistic
$t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$
has a t distribution with $(n - 1)$ degrees of freedom.
• The t is a family of bell-shaped and symmetric distributions, one for each number of degrees of freedom.
• The expected value of t is 0 (for df > 1).
• The variance of t is greater than 1, but approaches 1 as the number of degrees of freedom increases; for df > 2 it equals df/(df - 2). The t is flatter and has fatter tails than the standard normal.
• The t distribution approaches a standard normal as the number of degrees of freedom increases.
[Figure: densities of the standard normal, t with df = 20, and t with df = 10.]
Student’s t Distribution
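A minimal sketch of computing the t statistic when $\sigma$ is unknown, using a hypothetical sample of five values and a hypothetical population mean $\mu = 12$. Turning t into a probability would need a t table or a library such as SciPy, which this sketch does not use.

```python
import math
import statistics

sample = [12, 15, 11, 14, 13]   # hypothetical data
mu = 12                         # hypothetical population mean
n = len(sample)

x_bar = statistics.mean(sample)   # 13
s = statistics.stdev(sample)      # sample SD, divisor n - 1
t = (x_bar - mu) / (s / math.sqrt(n))
df = n - 1

print(round(t, 4), df)            # 1.4142 4
```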

5-22
The sample proportion is the proportion of successes in n binomial trials. It is the number of successes, X, divided by the number of trials, n:
$\hat{p} = \frac{X}{n}$
As the sample size, n, increases, the sampling distribution of $\hat{p}$ approaches a normal distribution with mean p and standard deviation
$\sqrt{\frac{p(1-p)}{n}}$
[Figure: Sample proportion. Binomial probabilities $P(X)$ for p = 0.3 with n = 2, n = 10, and n = 15; for n = 15 the X axis is also labeled with $\hat{p} = X/15$, from 0/15 to 15/15.]
The Sampling Distribution of the Sample Proportion, $\hat{p}$
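Because X is binomial, the exact sampling distribution of $\hat{p} = X/n$ can be tabulated with `math.comb` (Python 3.8+). This sketch does so for the n = 15, p = 0.3 panel and compares its mean and standard deviation with p and $\sqrt{p(1-p)/n}$; the helper name `phat_distribution` is introduced here for illustration.

```python
import math

def phat_distribution(n, p):
    """Exact P(p_hat = k/n) for k = 0..n, from the binomial distribution."""
    return {k / n: math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}

n, p = 15, 0.3
dist = phat_distribution(n, p)
mean = sum(v * pr for v, pr in dist.items())
sd = math.sqrt(sum((v - mean) ** 2 * pr for v, pr in dist.items()))

print(round(mean, 6), round(sd, 6), round(math.sqrt(p * (1 - p) / n), 6))
```

The mean and standard deviation match p and $\sqrt{p(1-p)/n}$ exactly for any n; what improves with larger n is how closely the shape resembles a normal curve.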

5-23
In recent years, convertible sports coupes have become very popular in Japan. Toyota is
currently shipping Celicas to Los Angeles, where a customizer does a roof lift and ships
them back to Japan. Suppose that 25% of all Japanese in a given income and lifestyle
category are interested in buying Celica convertibles. A random sample of 100
Japanese consumers in the category of interest is to be selected. What is the probability
that at least 20% of those in the sample will express an interest in a Celica convertible?
$n = 100$, $p = 0.25$
$E(\hat{p}) = p = 0.25$
$V(\hat{p}) = \frac{p(1-p)}{n} = \frac{(0.25)(0.75)}{100} = 0.001875$
$SD(\hat{p}) = \sqrt{0.001875} = 0.04330127$

$P(\hat{p} > 0.20) = P\left(\frac{\hat{p} - p}{\sqrt{p(1-p)/n}} > \frac{0.20 - p}{\sqrt{p(1-p)/n}}\right) = P\left(z > \frac{0.20 - 0.25}{\sqrt{(0.25)(0.75)/100}}\right) = P\left(z > \frac{-0.05}{0.0433}\right) = P(z > -1.15) = 0.8749$
Sample Proportion (Example 4-3)
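A sketch of the same normal-approximation calculation with `statistics.NormalDist`. It reports two probabilities: one with z rounded to two decimals, as when reading a printed table, which reproduces 0.8749; and one with the unrounded z of about -1.1547, which gives roughly 0.876.

```python
import math
from statistics import NormalDist

n, p = 100, 0.25
se = math.sqrt(p * (1 - p) / n)   # SD(p_hat), about 0.0433
z = (0.20 - p) / se               # about -1.1547

prob_table = 1 - NormalDist().cdf(round(z, 2))   # z rounded to -1.15, as with a table
prob_exact = 1 - NormalDist().cdf(z)             # unrounded z

print(round(se, 8), round(z, 4), round(prob_table, 4), round(prob_exact, 4))
```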

5-24
An estimator of a population parameter is a sample statistic used to estimate the parameter. The most commonly used estimators are:
Population Parameter | Sample Statistic (Estimator)
Mean ($\mu$) | Mean ($\bar{X}$)
Variance ($\sigma^2$) | Variance ($s^2$)
Standard Deviation ($\sigma$) | Standard Deviation ($s$)
Proportion ($p$) | Proportion ($\hat{p}$)
•Desirable properties of estimators include:
Unbiasedness
Efficiency
Consistency
Sufficiency
4-4 Estimators and Their Properties

5-25
An estimator is said to be unbiased if its expected value is equal to the population parameter it estimates.
For example, $E(\bar{X}) = \mu$, so the sample mean is an unbiased estimator of the population mean. Unbiasedness is an average or long-run property. The mean of any single sample will probably not equal the population mean, but the average of the means of repeated independent samples from a population will approach the population mean as the number of samples grows.
Any systematic deviation of the estimator from the population parameter of interest is called a bias.
Unbiasedness

5-26
An unbiased estimator is on target on average.
A biased estimator is off target on average.
[Figure: the gap between a biased estimator's average and the target is labeled Bias.]
Unbiased and Biased Estimators

5-27
An estimator is efficient if it has a relatively small variance (and standard deviation).
An efficient estimator is, on average, closer to the parameter being estimated.
An inefficient estimator is, on average, farther from the parameter being estimated.
Efficiency

5-28
An estimator is said to be consistent if its probability of being close to the parameter it estimates increases as the sample size increases.
An estimator is said to be sufficient if it contains all the information in the data about the parameter it estimates.
[Figure: Consistency, sampling distributions of an estimator for n = 10 and n = 100.]
Consistency and Sufficiency

5-29
For a normal population, both the sample mean and the sample median are unbiased estimators of the population mean, but the sample mean is both more efficient (because it has a smaller variance) and sufficient. Every observation in the sample is used in the calculation of the sample mean, but only the middle value is used to find the sample median.
In general, the sample mean is the best estimator of the population mean. For a normal population, it is the most efficient unbiased estimator of the population mean. It is also a consistent estimator.
Properties of the Sample Mean
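A small simulation under assumed settings (standard normal population, n = 25, 20,000 samples, fixed seed) compares the two unbiased estimators. Both averages should sit near $\mu = 0$, while the median's variance should come out roughly 1.5 times the mean's; for large n the ratio approaches $\pi/2 \approx 1.57$, a known result for normal data.

```python
import random
import statistics

rng = random.Random(11)
n, reps = 25, 20_000

means, medians = [], []
for _ in range(reps):
    sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
    means.append(statistics.fmean(sample))
    medians.append(statistics.median(sample))

var_mean = statistics.pvariance(means)       # about 1/25 = 0.04
var_median = statistics.pvariance(medians)   # larger: the median is less efficient
print(round(statistics.fmean(means), 3), round(statistics.fmean(medians), 3))
print(round(var_mean, 4), round(var_median, 4), round(var_median / var_mean, 2))
```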

5-30
The sample variance (the sum of the squared deviations from the sample mean divided by $(n - 1)$) is an unbiased estimator of the population variance. In contrast, the average squared deviation from the sample mean is a biased (though consistent) estimator of the population variance.
$E(s^2) = E\left[\frac{\sum (x - \bar{x})^2}{n - 1}\right] = \sigma^2$
$E\left[\frac{\sum (x - \bar{x})^2}{n}\right] = \frac{n - 1}{n}\sigma^2 \neq \sigma^2$
Properties of the Sample Variance
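For the Uniform(1,8) population used earlier ($\sigma^2 = 5.25$), both claims can be verified exactly by averaging over all 64 equally likely samples of size n = 2 drawn with replacement. A sketch:

```python
from fractions import Fraction
from itertools import product

population = range(1, 9)
n = 2

unbiased, biased = [], []
for s in product(population, repeat=n):
    x_bar = Fraction(sum(s), n)
    ss = sum((x - x_bar) ** 2 for x in s)   # sum of squared deviations
    unbiased.append(ss / (n - 1))           # s^2, divisor n - 1
    biased.append(ss / n)                   # divisor n

k = len(unbiased)
print(float(sum(unbiased) / k), float(sum(biased) / k))   # 5.25 2.625
```

The divisor-n version averages $\frac{n-1}{n}\sigma^2 = 2.625$, half the true variance when n = 2.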

5-31
Consider a sample of size n = 4 containing the following data points:
$x_1 = 10 \quad x_2 = 12 \quad x_3 = 16 \quad x_4 = ?$
and for which the sample mean is:
$\bar{x} = \frac{\sum x}{n} = 14$
Given the values of three data points and the sample mean, the value of the fourth data point can be determined:
$\bar{x} = \frac{\sum x}{n} = \frac{10 + 12 + 16 + x_4}{4} = 14$
$10 + 12 + 16 + x_4 = 56$
$x_4 = 56 - 10 - 12 - 16 = 18$
4-5 Degrees of Freedom

5-32
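If only two data points and the sample mean are known:
$x_1 = 10 \quad x_2 = 12 \quad x_3 = ? \quad x_4 = ?$
The values of the remaining two data points cannot be uniquely determined:
$\bar{x} = \frac{\sum x}{n} = \frac{10 + 12 + x_3 + x_4}{4} = 14$
$x_3 + x_4 = 56 - 10 - 12 = 34$
Any pair of values summing to 34 is consistent with the known information.
Degrees of Freedom (Continued)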
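The arithmetic on these two slides is easy to reproduce. This sketch works from the listed data points ($x_1 = 10$, $x_2 = 12$, $x_3 = 16$) and $\bar{x} = 14$; the mean fixes the total, so each known value removes one free choice.

```python
n, x_bar = 4, 14
total = n * x_bar        # the mean fixes the sum of all four points: 56

# Three points known: the fourth is forced
known = [10, 12, 16]
x4 = total - sum(known)
print(x4)                # 18

# Two points known: only the sum of the remaining two is fixed
known2 = [10, 12]
remaining_sum = total - sum(known2)
print(remaining_sum)     # 34, e.g. (17, 17) or (20, 14)
```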

5-33
The number of degrees of freedom is equal to the total number of measurements (these are not always raw data points), less the total number of restrictions on the measurements. A restriction is a quantity computed from the measurements.
The sample mean is a restriction on the sample measurements, so after calculating the sample mean there are only $(n - 1)$ degrees of freedom remaining with which to calculate the sample variance. The sample variance is based on only $(n - 1)$ free data points:
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
Degrees of Freedom (Continued)

5-34
A sample of size 10 is given below. We are to choose three different numbers from which the deviations are to be taken. The first number is to be used for the first five sample points; the second number is to be used for the next three sample points; and the third number is to be used for the last two sample points.
Example 4-4
Sample # | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
Sample Point | 93 | 97 | 60 | 72 | 96 | 83 | 59 | 66 | 88 | 53
i. What three numbers should we choose in order to minimize the SSD (sum of squared deviations)?
• Note: $SSD = \sum (x - \bar{x})^2$

5-35
Solution: Choose the means of the corresponding sample points. These are 83.6, 69.33, and 70.5.
ii. Calculate the SSD with the chosen numbers.
Solution: SSD = 2030.367. See the table on the next slide for the calculations.
iii. What is the df for the calculated SSD?
Solution: df = 10 - 3 = 7.
iv. Calculate an unbiased estimate of the population variance.
Solution: An unbiased estimate of the population variance is SSD/df = 2030.367/7 = 290.05.
Example 4-4 (continued)

5-36
Example 4-4 (continued)
Sample # | Sample Point | Mean | Deviation | Deviation Squared
1 | 93 | 83.6 | 9.4 | 88.36
2 | 97 | 83.6 | 13.4 | 179.56
3 | 60 | 83.6 | -23.6 | 556.96
4 | 72 | 83.6 | -11.6 | 134.56
5 | 96 | 83.6 | 12.4 | 153.76
6 | 83 | 69.33 | 13.6667 | 186.7778
7 | 59 | 69.33 | -10.3333 | 106.7778
8 | 66 | 69.33 | -3.3333 | 11.1111
9 | 88 | 70.5 | 17.5 | 306.25
10 | 53 | 70.5 | -17.5 | 306.25
SSD | | | | 2030.367
SSD/df | | | | 290.0524
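The full calculation for Example 4-4 can be sketched as follows. Each group's deviations are taken from its own mean, and one degree of freedom is lost for each of the three means estimated.

```python
import statistics

data = [93, 97, 60, 72, 96, 83, 59, 66, 88, 53]
groups = [data[:5], data[5:8], data[8:]]   # first five, next three, last two

centers = [statistics.fmean(g) for g in groups]   # 83.6, 69.33..., 70.5
ssd = sum((x - c) ** 2 for g, c in zip(groups, centers) for x in g)
df = len(data) - len(groups)                      # 10 - 3 = 7
estimate = ssd / df                               # unbiased estimate of sigma^2

print([round(c, 2) for c in centers], round(ssd, 3), df, round(estimate, 4))
```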