•Small sample theory: the study of statistical inference with small samples (i.e., n ≤ 30). It includes the t-distribution and the F-distribution, both defined in terms of the number of degrees of freedom.
•Degrees of freedom ν: Number of useful items of information
generated by a sample of given size with respect to the estimation
of a given population parameter.
OR
Total number of observations minus the number of independent
constraints imposed on the observations.
n - no. of observations
k - no. of independent constants
then n - k = no. of degrees of freedom
Example:- X = A + B + C , (10 = 2 + 3 + C , so C = 5)
n = 4 , k = 3
n – k = 1 , so 1 degree of freedom.
Introduction
t - Distribution
•William Sealy Gosset published the t-distribution in 1908 in Biometrika under the pen name “Student”.
•When the sample size is larger than 30, the sampling distribution of the mean follows the normal distribution.
•When the sample size is less than 30, the sample statistic follows the t-distribution.
•Probability density function of t-distribution:
f(t) = Y₀ (1 + t²/ν)^(−(ν+1)/2)
where Y₀ is a constant depending on n such that the area under the curve is 1.
•The t-table gives the probability integral of the t-distribution.
Properties of t-Distribution
•Ranges from –∞ to ∞
•Bell-shaped and symmetrical around mean zero.
•Its shape changes as the no. of degrees of freedom
changes. Hence ν is a parameter of t-distribution.
•Variance is always greater than one and is defined only when ν ≥ 3, given as Var(t) = ν/(ν − 2).
•It is more platykurtic (less peaked at the centre and higher in the tails) than the normal distribution.
•It has greater dispersion than the normal distribution. As n gets larger, the t-distribution approaches the normal form.
Steps involved in testing of hypothesis
1. Establish the null hypothesis.
2. Suggest an alternative hypothesis.
3. Calculate the t value.
4. Find the degrees of freedom.
5. Set up a suitable significance level.
6. From the t-table, find the critical value of t using α (the risk of type I error, i.e., the significance level) and ν, the degrees of freedom.
7. If the calculated t value is less than the critical value obtained from the table, the null hypothesis is accepted; otherwise, the alternative hypothesis is accepted.
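The steps above can be sketched in Python; the sample data and the critical value 2.776 (the two-tailed 5% t-table value for 4 degrees of freedom) are illustrative assumptions, not values from the slides:

```python
import math
import statistics

# Hypothetical sample and hypothesised mean (illustrative values only)
sample = [12, 11, 13, 12, 12]
mu0 = 11                       # Step 1: H0: mu = 11; Step 2: H1: mu != 11

n = len(sample)
xbar = statistics.mean(sample)
s = statistics.stdev(sample)   # sample standard deviation (divisor n - 1)

t = (xbar - mu0) / (s / math.sqrt(n))   # Step 3: calculated t value
df = n - 1                              # Step 4: degrees of freedom

# Steps 5-6: alpha = 0.05 (two-tailed); t-table critical value for df = 4
critical = 2.776
# Step 7: compare calculated t with the critical value
reject_null = abs(t) > critical
print(round(t, 4), df, reject_null)
```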
Applications of t - distribution
1. Test of hypothesis about the population mean.
2. Test of hypothesis about the difference between two means.
3. Test of hypothesis about the difference between two means with dependent samples.
4. Test of hypothesis about the coefficient of correlation.
1. Test of Hypothesis about the
population mean (σ unknown and small sample size)
• Null hypothesis: H₀: μ = μ₀
• t value is given as: t = (x̄ − μ₀) / (s/√n)
• Standard deviation of the sample is given as: s = √( Σ(x − x̄)² / (n − 1) )
• Degrees of freedom = n − 1
• Calculate the table value at the specified significance level & d.f.
• If the calculated value is more than the table value, the null hypothesis is rejected.
• 100(1−α)% confidence interval for the population mean: x̄ ± t(α/2, n − 1) · s/√n
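A minimal sketch of the interval x̄ ± t(α/2, n − 1)·s/√n, with made-up data and an assumed table value t(0.025, 4) = 2.776:

```python
import math
import statistics

sample = [10, 12, 14, 11, 13]     # hypothetical observations
n = len(sample)
xbar = statistics.mean(sample)
s = statistics.stdev(sample)      # sample standard deviation (divisor n - 1)

t_crit = 2.776                    # assumed t(0.025, 4) from a t-table
half_width = t_crit * s / math.sqrt(n)
ci = (xbar - half_width, xbar + half_width)
print(round(ci[0], 3), round(ci[1], 3))
```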
Test of hypothesis about the
difference between two means
When the population variances are unknown, the t-test can be used in two cases:
(a) when the variances are equal;
(b) when the variances are not equal.
(a) Case of equal variances
• Null hypothesis: μ₁ = μ₂
• t value is given as: t = (x̄₁ − x̄₂) / ( s · √(1/n₁ + 1/n₂) )
where the pooled variance is s² = [ (n₁ − 1)s₁² + (n₂ − 1)s₂² ] / (n₁ + n₂ − 2),
and s₁² = Σ(x₁ − x̄₁)²/(n₁ − 1), s₂² = Σ(x₂ − x̄₂)²/(n₂ − 1).
• Degrees of freedom: n₁ + n₂ − 2
• Calculate the table value at the specified significance level & d.f.
• If the calculated value is more than the table value, the null hypothesis is rejected.
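The pooled-variance calculation can be sketched as follows (sample values are hypothetical):

```python
import math
import statistics

# Hypothetical independent samples (illustrative values only)
x1 = [4, 6, 8]
x2 = [5, 7, 9, 11]
n1, n2 = len(x1), len(x2)

s1_sq = statistics.variance(x1)   # unbiased sample variance, divisor n - 1
s2_sq = statistics.variance(x2)

# Pooled variance s^2 = [(n1-1)s1^2 + (n2-1)s2^2] / (n1 + n2 - 2)
s_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
s = math.sqrt(s_sq)

t = (statistics.mean(x1) - statistics.mean(x2)) / (s * math.sqrt(1/n1 + 1/n2))
df = n1 + n2 - 2
print(round(t, 4), df)
```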
(b) Case of unequal variances
•When the population variances are not equal, we use the unbiased estimators s₁² and s₂² in place of σ₁² and σ₂².
•Here the sampling distribution has greater variability than the population variability.
•t value: t = (x̄₁ − x̄₂) / √( s₁²/n₁ + s₂²/n₂ )
•Degrees of freedom:
d.f. = ( s₁²/n₁ + s₂²/n₂ )² / [ (s₁²/n₁)²/(n₁ − 1) + (s₂²/n₂)²/(n₂ − 1) ]
•Calculate the table value at the specified significance level & d.f.
•If the calculated value is more than the table value, the null hypothesis is rejected.
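A sketch of the unequal-variance statistic and its degrees of freedom (this d.f. formula is the Welch–Satterthwaite approximation); the sample values are hypothetical:

```python
import math
import statistics

# Hypothetical independent samples (illustrative values only)
x1 = [4, 6, 8]
x2 = [5, 7, 9, 11]
n1, n2 = len(x1), len(x2)
v1 = statistics.variance(x1) / n1   # s1^2 / n1
v2 = statistics.variance(x2) / n2   # s2^2 / n2

t = (statistics.mean(x1) - statistics.mean(x2)) / math.sqrt(v1 + v2)

# Welch-Satterthwaite degrees of freedom
df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
print(round(t, 4), round(df, 2))
```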
Confidence interval for the
difference between two means
Two samples of sizes n₁ and n₂ are randomly and independently drawn from two normally distributed populations with unknown but equal variances. The 100(1−α)% confidence interval for μ₁ − μ₂ is given by:
(x̄₁ − x̄₂) ± t(α/2, n₁ + n₂ − 2) · s · √(1/n₁ + 1/n₂)
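As a sketch, with hypothetical samples and an assumed table value t(0.025, 5) = 2.571:

```python
import math
import statistics

x1 = [4, 6, 8]          # hypothetical samples, equal-variance assumption
x2 = [5, 7, 9, 11]
n1, n2 = len(x1), len(x2)

# Pooled standard deviation
s_sq = ((n1 - 1) * statistics.variance(x1)
        + (n2 - 1) * statistics.variance(x2)) / (n1 + n2 - 2)
s = math.sqrt(s_sq)

t_crit = 2.571          # assumed t(0.025, 5) from a t-table
diff = statistics.mean(x1) - statistics.mean(x2)
half = t_crit * s * math.sqrt(1/n1 + 1/n2)
ci = (diff - half, diff + half)
print(round(ci[0], 3), round(ci[1], 3))
```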
(3) Test of hypothesis about the
difference between two means with
dependent samples (paired t-test)
•Samples are dependent: each observation in one sample is associated with some particular observation in the second sample.
•Observations in the two samples should be collected in the form called matched pairs.
•The two samples should have the same number of units.
•Instead of two samples we can take one random sample of pairs, and the two measurements associated with a pair will be related to each other. Example: in before-and-after type experiments, or when observations are matched by some criterion.
•Null hypothesis: μ₁ = μ₂
•t value is given as: t = d̄ √n / s_d
where the mean of the differences is d̄ = Σd/n,
and the standard deviation of the differences is
s_d = √( Σd²/(n − 1) − (Σd)²/(n(n − 1)) )
•Degrees of freedom = n − 1
•Calculate the table value at the specified significance level & d.f.
•If the calculated value is more than the table value, the null hypothesis is rejected.
•Confidence interval for the mean of the differences: d̄ ± t(α/2, n − 1) · s_d/√n
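The paired computation can be sketched as follows (the before/after values are made up):

```python
import math

# Hypothetical before/after measurements on the same units
before = [10, 12, 9, 11]
after = [12, 13, 11, 12]
d = [a - b for a, b in zip(after, before)]   # paired differences
n = len(d)

d_bar = sum(d) / n
sum_d, sum_d2 = sum(d), sum(x * x for x in d)
s_d = math.sqrt(sum_d2 / (n - 1) - sum_d ** 2 / (n * (n - 1)))

t = d_bar * math.sqrt(n) / s_d
df = n - 1
print(round(t, 4), df)
```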
(4) Testing of hypothesis about
coefficient of correlation.
•Case 1: testing the hypothesis when the population coefficient of correlation equals zero, i.e., H₀: ρ = 0
•Case 2: testing the hypothesis when the population coefficient of correlation equals some value other than zero, i.e., H₀: ρ = ρ₀
•Case 3: testing the hypothesis for the difference between two independent correlation coefficients.
Case 1: testing the hypothesis when the
population coefficient of correlation
equals zero, i.e., H₀: ρ = 0
•Null hypothesis: there is no correlation in the population, i.e., H₀: ρ = 0
•t value is given as: t = r √(n − 2) / √(1 − r²)
•Degrees of freedom: n − 2
•Calculate the table value at the specified significance level & d.f.
•If the calculated value is more than the table value, the null hypothesis is rejected and there is a linear relationship between the variables.
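A minimal sketch of this test statistic, with an illustrative r and n:

```python
import math

r = 0.6    # hypothetical sample correlation coefficient
n = 27     # hypothetical sample size

# t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom
t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
df = n - 2
print(round(t, 4), df)
```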
Case 2: testing the hypothesis when the
population coefficient of correlation equals
some value other than zero, i.e., H₀: ρ = ρ₀
•When ρ ≠ 0, a test based on the t-distribution is not appropriate; Fisher’s z-transformation is applicable instead:
z = 0.5 logₑ[(1 + r)/(1 − r)]
OR
z = 1.1513 log₁₀[(1 + r)/(1 − r)]
•z is approximately normally distributed with mean
z_ρ = 0.5 logₑ[(1 + ρ)/(1 − ρ)]
•Standard deviation: σ_z = 1/√(n − 3)
•This test is more applicable if the sample size is large (at least 10).
•Null hypothesis: H₀: ρ = ρ₀
•Test statistic: Z = (z − z_ρ)/σ_z
which follows approximately the standard normal distribution.
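A sketch of the z-transformation test with assumed illustrative values of r, ρ₀ and n:

```python
import math

r, rho0, n = 0.5, 0.3, 28   # hypothetical values for illustration

z_r = 0.5 * math.log((1 + r) / (1 - r))          # Fisher's z of sample r
z_rho = 0.5 * math.log((1 + rho0) / (1 - rho0))  # mean under H0: rho = rho0
sigma_z = 1 / math.sqrt(n - 3)                   # standard deviation

Z = (z_r - z_rho) / sigma_z   # approximately standard normal under H0
print(round(Z, 4))
```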
Case 3: testing the hypothesis for the
difference between two independent
correlation coefficients
•To test the hypothesis about two correlation coefficients derived from two separate samples, compare the difference of the two corresponding values of z with the standard error of that difference.
•Formula used:
Z = (z₁ − z₂) / √( 1/(n₁ − 3) + 1/(n₂ − 3) )
where z₁ = 0.5 logₑ[(1 + r₁)/(1 − r₁)] = 1.1513 log₁₀[(1 + r₁)/(1 − r₁)]
and z₂ = 0.5 logₑ[(1 + r₂)/(1 − r₂)] = 1.1513 log₁₀[(1 + r₂)/(1 − r₂)]
•If the absolute value of this statistic is greater than 1.96, the difference is significant at the 5% significance level.
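A sketch of the comparison, with hypothetical r₁, n₁, r₂, n₂:

```python
import math

r1, n1 = 0.5, 28   # hypothetical first sample
r2, n2 = 0.3, 19   # hypothetical second sample

z1 = 0.5 * math.log((1 + r1) / (1 - r1))   # Fisher's z for each sample
z2 = 0.5 * math.log((1 + r2) / (1 - r2))
se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))  # standard error of z1 - z2

Z = (z1 - z2) / se
significant = abs(Z) > 1.96   # 5% significance level
print(round(Z, 4), significant)
```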
The F - Distribution
•Named in honour of R.A. Fisher who studied it in 1924.
•It is defined in terms of the ratio of the variances of two normally distributed populations, so it is sometimes also called the variance ratio.
•F – distribution:
F = (s₁²/σ₁²) / (s₂²/σ₂²)
where s₁², s₂² are unbiased estimators of σ₁², σ₂² respectively:
s₁² = Σ(x₁ − x̄₁)²/(n₁ − 1) and s₂² = Σ(x₂ − x̄₂)²/(n₂ − 1)
•Degrees of freedom: v₁ = n₁ − 1, v₂ = n₂ − 1
•If σ₁² = σ₂², then F = s₁²/s₂²
•It depends on v₁ and v₂ for the numerator and denominator respectively, so v₁ and v₂ are the parameters of the F distribution.
•For different values of v₁ and v₂ we get different distributions.
Probability density function
•Probability density function of F-distribution:
f(F) = Y₀ · F^((v₁/2) − 1) / (1 + v₁F/v₂)^((v₁ + v₂)/2)
where Y₀ is a constant depending on v₁ and v₂ such that the area under the curve is 1.
Properties of F-distribution
•It is positively skewed and its skewness decreases with increase in v₁ and v₂.
•Value of F must always be positive or zero, since variances are
squares. So its value lies between 0 and ∞.
•Mean and variance of F-distribution:
Mean = v₂/(v₂ − 2), for v₂ > 2
Variance = 2v₂²(v₁ + v₂ − 2) / [ v₁(v₂ − 2)²(v₂ − 4) ], for v₂ > 4
•Shape of F-distribution depends upon the number of degrees of
freedom.
•The areas in the left-hand tail of the distribution can be found by taking the reciprocal of the F values corresponding to the right-hand tail, with the numerator and denominator degrees of freedom interchanged. This is known as the reciprocal property:
F(1 − α; v₁, v₂) = 1/F(α; v₂, v₁)
So we can find lower-tail F values from the corresponding upper-tail F values, which are given in the appendix.
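A small numeric illustration of the reciprocal property; the upper-tail value 3.07 is an assumed table entry for F(0.05; 8, 10):

```python
# Reciprocal property: F(1 - alpha; v1, v2) = 1 / F(alpha; v2, v1)
F_upper = 3.07            # assumed table value F(0.05; v1=8, v2=10)
F_lower = 1 / F_upper     # lower-tail value F(0.95; v1=10, v2=8)
print(round(F_lower, 4))
```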
Testing of hypothesis for equality of
two variances
It is based on the variances in two independently
selected random samples drawn from two
normal populations.
•Null hypothesis H₀: σ₁² = σ₂²
•F = (s₁²/σ₁²) / (s₂²/σ₂²), which reduces to F = s₁²/s₂² under the null hypothesis;
place the larger sample variance in the numerator.
•Degrees of freedom v₁ = n₁ − 1 and v₂ = n₂ − 1.
•Find the table value using v₁ and v₂.
•If calculated F value exceeds table F value, null
hypothesis is rejected.
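A sketch of the test, with hypothetical sample variances and an assumed table value F(0.05; 10, 8) ≈ 3.35:

```python
# Hypothetical F-test for equality of two variances (illustrative numbers)
s1_sq, n1 = 9.8, 11   # larger sample variance goes in the numerator
s2_sq, n2 = 4.2, 9

F = s1_sq / s2_sq
v1, v2 = n1 - 1, n2 - 1

# Assumed table value F(0.05; 10, 8), approximately 3.35
critical = 3.35
reject_null = F > critical
print(round(F, 4), v1, v2, reject_null)
```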
Confidence interval for the ratio of
two variances
•100(1-α)% confidence interval for the ratio
of the variances of two normally
distributed populations is given by:
(s₁²/s₂²) · (1/F(α/2; v₁, v₂)) < σ₁²/σ₂² < (s₁²/s₂²) · F(α/2; v₂, v₁)
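A sketch of this interval; the sample variances and both F-table values below are assumed for illustration only:

```python
# Hedged sketch of the confidence interval for sigma1^2 / sigma2^2
s1_sq, s2_sq = 9.8, 4.2          # hypothetical sample variances
F_a_v1v2 = 4.30                  # assumed table value F(0.025; v1, v2)
F_a_v2v1 = 3.85                  # assumed table value F(0.025; v2, v1)

ratio = s1_sq / s2_sq
lower = ratio / F_a_v1v2         # divide by upper-tail F for (v1, v2)
upper = ratio * F_a_v2v1         # multiply by upper-tail F for (v2, v1)
print(round(lower, 4), round(upper, 4))
```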