Unit 4a- Sampling Distribution (Slides - up to slide 21).pdf


About This Presentation

Probability: Sampling Distribution


Slide Content

Sampling Distributions
UNIT 4
OPRE 6359
1

Statistical Inference and Sampling
When we study a population, we are often interested in estimating a parameter.
Because the population is typically a large group that is not easily accessible, an exhaustive census is difficult and expensive.
Thus, we use representative samples to study the characteristics of the population and then make inferences about the parameter based on the value of the sample statistic. Sampling is therefore critical to statistical inference.
2

Statistical Inference – Concept
Statistical inference allows us to use samples to draw conclusions about populations.
A sample provides an estimate of the population parameter.
Generally we are interested in the Mean, the Proportion, and the Variance/Standard Deviation of the population.
3

Sampling Distribution – an Example
Suppose we are sampling from a population that has a mean of μ = 5 and is right-skewed.

library(dplyr)
library(ggplot2)

# Population is an Exponential distribution with mean = 5 (rate = 0.2)
PopDist <- data.frame(x = seq(0, 20, length = 10000)) %>%
  mutate(density = dexp(x, rate = 0.2))
ggplot(PopDist, aes(x = x, y = density)) +
  geom_area(fill = 'salmon') +
  ggtitle('Population Distribution')
4

Sampling Distribution – an Example
Consider now a sample of size 2 from this population. Denote the two independent observations by X1 and X2, and let their average be X̄ = (X1 + X2)/2.
5
Rules about Expectations and Variances
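Using the rules for expectations and variances of independent random variables, a short derivation (a sketch; only the headline results appear on the later slides) gives the mean and variance of X̄ for n = 2:

\[
E(\bar{X}) = \tfrac{1}{2}\bigl(E(X_1) + E(X_2)\bigr) = \tfrac{1}{2}(\mu + \mu) = \mu,
\qquad
V(\bar{X}) = \tfrac{1}{4}\bigl(V(X_1) + V(X_2)\bigr) = \tfrac{1}{4}(\sigma^2 + \sigma^2) = \frac{\sigma^2}{2}.
\]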

Sampling Distribution – an Example
Now suppose we want to estimate the population mean μ by taking a random sample of n = 5.
Notice that the sample mean is never exactly 5, but it is close to it.

n <- 5  # our sample size
mosaic::do(3) * {
  Sample.Data <- data.frame(x = rexp(n, rate = 0.2))
  Sample.Data %>% summarise(xbar = mean(x))
}

      xbar
1 4.515265
2 5.430437
3 3.416277
6

Sampling Distribution – a Simulation

n <- 5
SampDist <- mosaic::do(10000) * {
  Sample.Data <- data.frame(x = rexp(n, rate = 0.2))
  Sample.Data %>% summarise(xbar = mean(x))
}
ggplot() +
  geom_area(data = PopDist, aes(x = x, y = density), fill = 'salmon') +
  geom_histogram(data = SampDist, aes(x = xbar, y = ..density..),
                 binwidth = 0.1, alpha = 0.6)
7

Conclusions from the Graphs
1. The sampling distribution of X̄ is centered at the population mean μ.
2. The sampling distribution of X̄ has less spread than the population distribution.
3. The sampling distribution of X̄ is less skewed than the population distribution.
Thus problems with skewness and departures from normality in the data are reduced or removed when working with sampling distributions.
8

Sampling Distribution of the Mean
What do you think will happen to the sampling distribution of the
mean as we increase the sample size, n?
Let us investigate…
9

Sampling Distribution and Sample Size
10
[Figure: sampling distributions of X̄ for n = 5, n = 20, n = 50, n = 100]

Sampling Distribution and Sample Size
11
[Figure: sampling distributions of X̄ for n = 5, n = 20, n = 50, n = 100]
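The panels above can be reproduced with a short simulation. This is a minimal sketch (the plotting code for these slides is not shown); it reuses the Exponential(rate = 0.2) population from the earlier slides and facets the simulated sampling distributions by sample size:

library(dplyr)
library(ggplot2)

# Simulate the sampling distribution of xbar for several sample sizes
sim_xbar <- function(n, reps = 10000) {
  data.frame(n = n, xbar = replicate(reps, mean(rexp(n, rate = 0.2))))
}
SimData <- bind_rows(lapply(c(5, 20, 50, 100), sim_xbar))

ggplot(SimData, aes(x = xbar)) +
  geom_histogram(binwidth = 0.1) +
  facet_wrap(~ n, labeller = label_both) +
  ggtitle('Sampling Distribution of the Mean by Sample Size')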

Some important facts
1. The expected value of the sample mean, E(X̄) = μ, does not depend on the sample size n.
2. The variance V(X̄) does depend on n, and it shrinks to zero as n approaches ∞.
3. These calculations of the mean and variance do not depend on the distribution of the population.
12

Sampling Distribution of the Mean
These relationships define the sampling distribution of X̄:
E(X̄) = μ
V(X̄) = σ²/n
Standard error: SE(X̄) = √(σ²/n) = σ/√n
13
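As a quick sanity check (a sketch, not on the slides), the standard deviation of the simulated xbar values from the do(10000) simulation above should be close to σ/√n; for the Exponential population with rate 0.2, σ = 5:

library(dplyr)
n <- 5
SampDist <- mosaic::do(10000) * {
  Sample.Data <- data.frame(x = rexp(n, rate = 0.2))
  Sample.Data %>% summarise(xbar = mean(x))
}
sd(SampDist$xbar)  # simulated standard error of the mean
5 / sqrt(n)        # theoretical sigma / sqrt(n) = 2.236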

Central Limit Theorem
Let X1, …, Xn be independent observations collected from a distribution with expectation μ and variance σ². Then the distribution of X̄ converges to a normal distribution with expectation μ and variance σ²/n as n → ∞.
In practice this means that if n is large (usually n > 30 is sufficient), then X̄ is approximately N(μ, σ²/n).
14

Central Limit Theorem
If the population from which successive samples are taken has a normal distribution, then X̄ ~ N(μ, σ²/n) exactly.
If the population is not normally distributed, then:
For any infinite population with mean μ and variance σ², the sampling distribution of X̄ is well approximated by the normal distribution with mean μ and variance σ²/n, provided that n is sufficiently large.
How large is "sufficiently large" depends on the extent of non-normality of X (heavily skewed, multimodal). In general, the larger the sample size, the more closely the sampling distribution of X̄ will resemble a normal distribution (n > 30 is a common rule of thumb).
15
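Equivalently (a standard restatement, not shown verbatim on the slides), the standardized sample mean is approximately standard normal for large n, which is the form behind the probability calculations that follow:

\[
Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \;\approx\; N(0, 1) \quad \text{for large } n.
\]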

Finite Population Correction Factor
For a finite population of size N, the standard error of X̄ should be corrected to:
SE(X̄) = (σ/√n) · √((N − n)/(N − 1))
The usual rule of thumb is to treat N as effectively infinite (no correction needed) if it is at least 20 times larger than n.
16
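A small numeric illustration of the correction (the values of σ, n, and N below are hypothetical, chosen only for this sketch; they are not from the slides):

# Hypothetical numbers chosen only to illustrate the correction factor
sigma <- 10; n <- 50; N <- 500
se_infinite <- sigma / sqrt(n)                          # sigma / sqrt(n)
se_corrected <- se_infinite * sqrt((N - n) / (N - 1))   # with finite population correction
c(se_infinite = se_infinite, se_corrected = se_corrected)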

Sampling Distribution of the Mean – Example
A researcher is studying salaries of recently graduated MS students from a very large population. She sampled 3 alumni and assumes their answers are independent. The population is N(μ = 70000, σ = 12000).
1. What is the probability that the first observation is greater than $80,000?

1 - pnorm(80000, mean = 70000, sd = 12000)
[1] 0.2023284
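Written out as a z-score calculation, this is the same computation as the pnorm call above:

\[
P(X_1 > 80000) = P\!\left(Z > \frac{80000 - 70000}{12000}\right) = P(Z > 0.833) \approx 0.2023.
\]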
17

Sampling Distribution of the Mean – Example
2. What is the probability that the sample mean is greater than $80,000?

std_error <- 12000 / sqrt(3)
std_error
[1] 6928.203
1 - pnorm(80000, mean = 70000, sd = std_error)
[1] 0.07445734
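The same calculation written out with the standard error of the mean:

\[
P(\bar{X} > 80000) = P\!\left(Z > \frac{80000 - 70000}{12000/\sqrt{3}}\right) = P(Z > 1.443) \approx 0.0745.
\]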
18

To Calculate in R – Example

# To create the graph with the Normal distribution of X (salaries in $1000s)
distr <- data.frame(x = seq(22, 118, length = 1000)) %>%
  mutate(density = dnorm(x, mean = 70, sd = 12),
         group = ifelse(x <= 80, 'Lower', 'Higher'))
ggplot(distr, aes(x = x, y = density, fill = group)) +
  geom_line() +
  geom_area() +
  theme_bw()

# To create the graph with the sampling distribution of xbar (in $1000s)
distr <- data.frame(x_bar = seq(22, 118, length = 1000)) %>%
  mutate(density = dnorm(x_bar, mean = 70, sd = 6.93),
         group = ifelse(x_bar <= 80, 'Lower', 'Higher'))
ggplot(distr, aes(x = x_bar, y = density, fill = group)) +
  geom_line() +
  geom_area() +
  theme_bw()
19

20
Compare X and X̄

21
Compare X and X̄

Sampling Distribution of the Proportion
The central limit theorem also applies to “sample proportions.”
Let X be a binomial random variable with parameters n and p. Since each trial results in either a “success” or a “failure,” we can define for trial i a variable Xi that equals 1 if we have a success and 0 otherwise.
Then the proportion of trials that resulted in a success is given by:
p̂ = (X1 + X2 + … + Xn)/n = X/n
22

Sampling Distribution of the Proportion
The sample proportion p̂ is approximately normally distributed with mean p, provided np and n(1−p) are both at least 10.
Standard error of the proportion: SE(p̂) = √(p(1−p)/n)
23

To Calculate in R – Example
A company hired 50 people from a pool of qualified candidates. If the pool contains 30% females, and only 5 out of the 50 hired were females, can we conclude that there is gender discrimination in hiring?
p̂ is defined as the proportion of hires that are female; here p̂ = 5/50 = 0.10.

sd_prop <- sqrt((0.3 * 0.7) / 50)
pnorm(0.10, mean = 0.30, sd = sd_prop)
[1] 0.001014116

This is the probability of observing 10% or fewer female hires when the proportion of females in the pool is 30%. Such a small probability makes this a very rare occurrence under fair hiring and provides evidence of gender discrimination.
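For comparison, the exact tail probability can be computed directly from the binomial distribution (a quick check, not shown on the slides):

# Exact probability of 5 or fewer female hires out of 50 when p = 0.30,
# to compare with the normal approximation above
pbinom(5, size = 50, prob = 0.30)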
24

To Calculate in R – Example

# Simulate 10,000 sample proportions with n = 50 and p = 0.30
successes100 <- rbinom(10000, size = 50, prob = 0.30)
proportion100 <- successes100 / 50
hist(proportion100, breaks = 20, right = FALSE, xlim = c(0, 0.7),
     col = "lightblue", xlab = "Sample proportion")
25

Alternative Solution with R – Example

prop.test(5, 50, p = 0.30, alternative = "less", correct = FALSE)

        1-sample proportions test without continuity correction

data:  5 out of 50, null probability 0.3
X-squared = 9.5238, df = 1, p-value = 0.001014
alternative hypothesis: true p is less than 0.3
95 percent confidence interval:
 0.0000000 0.1915375
sample estimates:
  p
0.1

Note that the p-value (0.001014) matches the earlier pnorm() calculation, because the test without continuity correction is based on the same normal approximation.
26