Statistical Inference & Hypothesis Testing.pdf

ManashKumarMondal · 29 slides · Aug 18, 2024


Statistical Inference
and
Hypothesis Testing

Why?
•Samples are drawn from an infinite population
•Features of the population may differ from those of a sample
•Sample statistics vary from sample to sample
•The question is whether the sample properties satisfactorily reflect the population properties
•Parameter -> a population characteristic (mean, variance etc.)
•Statistic -> a sample characteristic

Statistical Inference
•Statistical inference: the process of drawing conclusions about unknown population parameters from estimated sample statistics; it is possible with random sampling

•2 problems ->
•no idea about the feature of the population – estimation, then testing of hypothesis
•tentative idea about the feature of the population – testing of hypothesis

Estimation
•Interval estimation -> a range of values within which the unknown parameter has a chance to belong

•Point estimation -> estimate a particular value for the unknown parameter – then test whether the estimate is satisfactory/reliable – hypothesis testing

•Testing the reliability (statistical significance) of an estimate needs knowledge of the probability distribution function

Probability
•Probability of an event = (Number of cases favourable to the event) / (Total number of cases)
•Example: a coin tossed 2 times
•Outcomes: HH, HT, TH, TT
•Pr(the two tosses differ, i.e. HT or TH) = 2/4 = 1/2
•In the relative-frequency (empirical) approach to probability, the relative frequency of an event is taken as the event's probability of occurrence when the total number of cases is very large (n -> ∞)
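The relative-frequency idea can be checked with a quick simulation; the numbers here (100,000 repetitions, seed 0) are illustrative only:

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

# Toss two fair coins n times and count how often the faces differ (HT or TH).
# Classically Pr(different faces) = 2/4 = 1/2; the relative frequency should
# settle near that value as n grows.
n = 100_000
different = sum(random.choice("HT") != random.choice("HT") for _ in range(n))
rel_freq = different / n
print(rel_freq)  # close to 0.5
```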

Probability distribution
•Y is a random variable => each value of Y occurs with some probability
•As probability is defined by relative frequency, the distribution of Y in the infinite population is represented by its probabilities => probabilities of occurrence of different values of Y are presented against those values of Y
•Y is discrete -> for any particular value of Y (say c), f(c) = Pr(Y = c)
•f(Y) is the pmf (probability mass function) of Y if
• f(Y) ≥ 0, for any Y
• ∑ f(Y) = 1, summing over all values that Y can assume
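A minimal sketch of the pmf conditions, using Y = number of heads in the two-toss coin experiment above (exact fractions avoid rounding):

```python
from fractions import Fraction

# pmf of Y = number of heads in two tosses of a fair coin
# (outcomes HH, HT, TH, TT, each with probability 1/4)
f = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

assert all(p >= 0 for p in f.values())  # f(Y) >= 0 for any Y
assert sum(f.values()) == 1             # sum of f(Y) over all values of Y = 1
print(f[1])  # Pr(Y = 1) = 1/2
```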

Probability distribution
•Y is continuous -> for two values of Y, a (lower) and b (upper),
• Pr(a ≤ Y ≤ b) = ∫ f(Y)dY, integrated from a to b
•f(Y) is the pdf (probability density function) of Y if
• f(Y) ≥ 0, for any Y
• ∫ f(Y)dY = 1, integrating over all values of Y, (-∞, +∞)
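The interval probability Pr(a ≤ Y ≤ b) can be approximated numerically. This sketch uses the normal pdf (introduced later in the deck) and a midpoint Riemann sum; the choice a = -1, b = 1 is illustrative:

```python
import math

def f(y, mu=0.0, sigma=1.0):
    """Normal probability density function."""
    return math.exp(-((y - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# Pr(a <= Y <= b) = integral of f(Y) dY from a to b, here by a midpoint sum
a, b, steps = -1.0, 1.0, 100_000
h = (b - a) / steps
prob = sum(f(a + (i + 0.5) * h) for i in range(steps)) * h
print(round(prob, 4))  # ≈ 0.6827, the familiar "within one sigma" probability
```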

Distribution function
•Statistical inference involves inferring the nature of a population (central tendency – mean, variance, skewness and kurtosis) from the nature of a sample
•-> we take help of the distribution function
•For any value c of the continuous variable Y,
• F(c) = Pr(Y ≤ c) = ∫ f(Y)dY, integrated from -∞ to c,
is called the cumulative distribution function, or distribution function, of Y

Theoretical distribution
•-> f (a theoretical distribution) gives a fairly close approximation to the actual distribution of the population variable
•-> the inference problem is related to having an idea about (or estimating) the numerical values of the parameters appearing in f
•-> we get an idea of the central tendency of the variable concerned (mean, variance etc.) from its theoretical distribution

Normal Distribution
•Most used distribution for a continuous variable because
•– it has very simple properties – comparatively easy to deal with
•– many non-normal distributions become asymptotically normal
•– transformations of variables often make them follow the normal distribution
•Central limit theorem and law of large numbers
•f(Y) = (1/(σ√(2π))) exp[-(Y-μ)²/(2σ²)], -∞ < Y < +∞
•Mean(Y) = μ, Var(Y) = σ²
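The density formula can be coded directly; the values μ = 10, σ = 2 below are arbitrary illustrations. The peak of the curve sits at Y = μ with height 1/(σ√(2π)), and the curve is symmetric about μ:

```python
import math

def normal_pdf(y, mu, sigma):
    # f(Y) = (1/(sigma*sqrt(2*pi))) * exp(-(Y - mu)^2 / (2*sigma^2))
    return math.exp(-((y - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

mu, sigma = 10.0, 2.0
peak = normal_pdf(mu, mu, sigma)  # the maximum, attained at Y = mu
print(round(peak, 4))  # 0.1995, i.e. 1/(2*sqrt(2*pi))

# symmetry about the mean: f(mu + c) = f(mu - c)
assert math.isclose(normal_pdf(mu + 1.5, mu, sigma), normal_pdf(mu - 1.5, mu, sigma))
```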

Normal Distribution

Properties of normal distribution
f(Y) > 0 for all values of Y
∫ f(Y)dY = 1, integrated from -∞ to +∞
The distribution is symmetrical
=> mean and median are the same
Any linear function of independent normal variables is also normally distributed.
=> If Y1, Y2, Y3 are independent and normally distributed, (Y1 + Y2 + Y3)/3 is also normally distributed

Normal Distribution -> Standard Normal Distribution
•Population distribution -> sampling distribution
•Population (mean, variance) -> sample (mean, variance)
•The distribution of the sample is also normal
•If Y follows a normal distribution with mean = μ and variance = σ², then
τ = (Y - μ)/σ
follows the standard normal distribution, with mean = 0 and variance = 1
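Standardisation can be checked by simulation; the values μ = 50, σ = 5 are arbitrary:

```python
import random
import statistics

random.seed(1)  # reproducible draws

mu, sigma = 50.0, 5.0
ys = [random.gauss(mu, sigma) for _ in range(200_000)]

# tau = (Y - mu)/sigma should have mean 0 and variance 1
taus = [(y - mu) / sigma for y in ys]
print(statistics.fmean(taus), statistics.pvariance(taus))  # close to 0 and 1
```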

Standard Normal Distribution

Statistical Inference
Let us denote by τ_α a value of τ such that
Pr[τ > τ_α] = α
and Pr[τ < τ_{1-α}] = α
⇒ Pr[τ < τ_α] = 1 - α
For statistical inference, α is normally taken as 0.01 (99 per cent) or 0.05 (95 per cent),
and Pr[-τ_{α/2} < τ < τ_{α/2}] = 1 - α
The endpoints of this interval are the confidence limits, and (1 - α) is the confidence coefficient
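The two-sided critical values τ_{α/2} can be recovered from the inverse of the standard normal cdf (Python's stdlib `statistics.NormalDist`):

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, variance 1

# two-sided critical value tau_{alpha/2}: Pr(-tau < Z < tau) = 1 - alpha
for alpha in (0.05, 0.01):
    tau = z.inv_cdf(1 - alpha / 2)
    print(alpha, round(tau, 3))
# alpha = 0.05 -> 1.960 (95 per cent), alpha = 0.01 -> 2.576 (99 per cent)
```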

Interval estimation
•Constructing a range of values (interval) within which the unknown parameter has a chance to belong
•If y is the sample mean (the sample being y_1, y_2, ..., y_n), it can be shown that E(y) = μ, Var(y) = σ²/n (sampling with replacement)
•If y_1, y_2, ..., y_n follow a normal distribution, y also follows a normal distribution
•Define τ = (y - μ)/(σ/√n), i.e. τ = [y - Mean(y)]/√Var(y)
•τ follows the standard normal distribution

Interval estimation
The 99 per cent confidence interval of μ is obtained from
Pr[-2.576 ≤ (y - μ)/(σ/√n) ≤ 2.576] = 0.99
Pr[-2.576(σ/√n) ≤ (y - μ) ≤ 2.576(σ/√n)] = 0.99
Pr[-y - 2.576(σ/√n) ≤ -μ ≤ -y + 2.576(σ/√n)] = 0.99

=> Pr[y - 2.576(σ/√n) ≤ μ ≤ y + 2.576(σ/√n)] = 0.99

=> In repeated sampling, in 99 per cent of the cases the above interval will include μ (the probability that the above interval will include μ is 0.99)
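The "99 per cent of cases" interpretation can be demonstrated by repeated sampling; the population values (μ = 100, σ = 15, n = 25) are made up for this sketch:

```python
import random
import statistics
from statistics import NormalDist

random.seed(2)  # reproducible draws

mu, sigma, n = 100.0, 15.0, 25
z = NormalDist().inv_cdf(0.995)   # 2.576 for a 99 per cent interval
half_width = z * sigma / n ** 0.5

# Draw many samples; the interval y ± 2.576*(sigma/sqrt(n)) should
# cover the true mu in about 99 per cent of them.
trials = 10_000
covered = 0
for _ in range(trials):
    y = statistics.fmean(random.gauss(mu, sigma) for _ in range(n))
    if y - half_width <= mu <= y + half_width:
        covered += 1
print(covered / trials)  # close to 0.99
```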

Hypothesis testing
•Assume a value μ_0 for the unknown μ and test whether μ = μ_0 with the help of the sample mean y

•Test the Null Hypothesis H_0: μ = μ_0
•against the Alternative Hypothesis H_1: μ ≠ μ_0

•The test statistic is
τ = (y - μ_0)/(σ/√n), where σ is known
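A worked test with made-up numbers (sample mean y = 52.3 from n = 36 observations, known σ = 6, null value μ_0 = 50):

```python
from statistics import NormalDist

# Hypothetical data for illustration
y, mu0, sigma, n = 52.3, 50.0, 6.0, 36

tau_c = (y - mu0) / (sigma / n ** 0.5)
print(round(tau_c, 2))  # 2.3

crit = NormalDist().inv_cdf(0.995)  # 2.576 at the 1 per cent level
print(abs(tau_c) > crit)            # False: cannot reject H0 at the 1 per cent level
```

At the 5 per cent level the critical value is 1.96, so the same τ_c would lead to rejecting H_0 there; the conclusion depends on the chosen α.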

Hypothesis testing
If the calculated τ (τ_c) is outside the confidence interval, we reject the null hypothesis.
When the confidence coefficient is 0.99, τ_{α/2} = 2.576
Thus, if τ_c > τ_{α/2} (= 2.576), we reject H_0: μ = μ_0 and conclude that μ is not equal to μ_0
⇒ If H_0 is true, out of 100 samples drawn, in only one case would τ_c fall outside the interval (the probability of wrongly rejecting a true H_0 is 0.01).

When the calculated τ_c is < 0, use τ_c < -2.576 to reject H_0

If the null hypothesis H_0: μ = μ_0 is tested
against the alternative H_1: μ > μ_0 or H_1: μ < μ_0,
we choose the one-sided critical value τ_α instead of τ_{α/2}
Here (α = 0.01) the critical (tabulated) value is 2.326

t-distribution
•When σ is unknown, it is replaced by its sample estimate s'.
•Then the ratio (y - μ)/(s'/√n) follows the t-distribution with n - 1 degrees of freedom
•The 100(1 - α) per cent confidence interval is
Pr[y - t_{α/2,n-1}(s'/√n) ≤ μ ≤ y + t_{α/2,n-1}(s'/√n)] = 1 - α
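A sketch of the t-interval on a small made-up sample; SciPy is assumed to be available for the t quantile:

```python
import math
import statistics
from scipy import stats  # assumed installed, for the t quantile

# Hypothetical sample; sigma is unknown, so the sample standard deviation s'
# (computed with divisor n - 1) replaces it
sample = [12.1, 11.8, 12.6, 12.3, 11.9, 12.4, 12.0, 12.5]
n = len(sample)
y = statistics.fmean(sample)
s = statistics.stdev(sample)

alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, n - 1)  # t_{alpha/2, n-1}
half = t_crit * s / math.sqrt(n)
print(round(y - half, 3), round(y + half, 3))  # the 95 per cent interval for mu
```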

Hypothesis testing in a bivariate regression
The bivariate regression equation is
Y_i = a + bX_i + U_i
We estimate this equation by Ordinary Least Squares (OLS), subject to fulfilment of some conditions regarding U_i, and get a_est and b_est.
We have to test whether X_i really affects Y_i, i.e., whether b_est is significantly non-zero.
Calculate t_c = (b_est – 0)/√Var(b_est) = b_est/s.e.(b_est)
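A pure-Python sketch of these OLS quantities on simulated data; the true values a = 1, b = 2 and the standard-normal errors are made up for illustration:

```python
import math
import random
import statistics

random.seed(3)  # reproducible draws

# Simulate Y_i = a + b*X_i + U_i with a = 1, b = 2
n = 50
xs = [random.uniform(0, 10) for _ in range(n)]
ys = [1.0 + 2.0 * x + random.gauss(0, 1) for x in xs]

# OLS estimates of slope and intercept
xbar, ybar = statistics.fmean(xs), statistics.fmean(ys)
sxx = sum((x - xbar) ** 2 for x in xs)
b_est = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
a_est = ybar - b_est * xbar

# Estimated Var(b_est) = s^2 / sxx, with s^2 from the residuals (df = n - 2)
resid = [y - (a_est + b_est * x) for x, y in zip(xs, ys)]
s2 = sum(e ** 2 for e in resid) / (n - 2)
se_b = math.sqrt(s2 / sxx)

t_c = b_est / se_b  # the statistic for H0: b = 0
print(round(b_est, 2), round(t_c, 1))  # b_est near 2; t_c far above the critical value
```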

Hypothesis testing in a bivariate regression
So, now the null and alternative hypotheses (stated about the population parameter b) are
H_0: b = 0
H_1: b ≠ 0
The decision rule is
if |t_c| > t_{α/2,(n-2)}, reject H_0
=> X_i significantly affects Y_i
An Example – Population

Two samples drawn from the
Population

Two SRFs estimated from Two
Samples

Another Example

Regressions

Test of Hypothesis
H_0: β_2 = 0
H_1: β_2 ≠ 0
t_c = β_2/s.e.(β_2) = 0.0020/0.00032 = 6.25
t_{0.05/2,(34-2)} = t_{0.025,32} ≈ 2.04 < t_c
t_{0.01/2,(34-2)} = t_{0.005,32} ≈ 2.74 < t_c
H_0 is rejected at both the 5 per cent and the 1 per cent level of significance
=> Demand for cellphones depends positively and significantly on per capita income
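Re-running this arithmetic (note that 0.0020/0.00032 = 6.25) and the critical values with n − 2 = 32 degrees of freedom; SciPy is assumed to be available:

```python
from scipy import stats  # assumed installed, for the t quantiles

# The cellphone-demand example: estimate 0.0020, s.e. 0.00032, n = 34
b_est, se_b, n = 0.0020, 0.00032, 34

t_c = b_est / se_b
print(round(t_c, 2))  # 6.25

# two-sided critical values with n - 2 = 32 degrees of freedom
for alpha in (0.05, 0.01):
    crit = stats.t.ppf(1 - alpha / 2, n - 2)
    print(alpha, round(crit, 3), abs(t_c) > crit)  # True at both levels: reject H0
```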