Multicollinearity econometrics semester 4 Delhi University


Multicollinearity
Rimpy Kaushal
PGDAV College

Multicollinearity: Introduction
• One of the assumptions of the classical linear regression model (CLRM) is that there is no perfect multicollinearity, i.e., no exact linear relationships among the explanatory variables included in a multiple regression.
• In practice we rarely encounter perfect multicollinearity, but cases of near or very high multicollinearity, where the explanatory variables are approximately linearly related, arise frequently in applications.
• It is important to know what problems these correlated variables pose for the ordinary least squares (OLS) estimation of multiple regression models.

The Case of Perfect Collinearity
• We first consider the case of exact collinearity, where the explanatory variables are perfectly correlated. Suppose that the true relationship is
$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i$  (1)
• and that there is an exact linear relationship between X2 and X3 given by
$X_{3i} = \delta + \mu X_{2i}$  (2)
• Note that $(X_{3i} - \bar{X}_3) = ([\delta + \mu X_{2i}] - [\delta + \mu \bar{X}_2]) = \mu(X_{2i} - \bar{X}_2)$. Hence, writing $x_{2i} = X_{2i} - \bar{X}_2$ and $y_i = Y_i - \bar{Y}$,
$\sum (X_{3i} - \bar{X}_3)^2 = \mu^2 \sum (X_{2i} - \bar{X}_2)^2 = \mu^2 \sum x_{2i}^2$  (3)
$\sum (X_{3i} - \bar{X}_3)(Y_i - \bar{Y}) = \mu \sum (X_{2i} - \bar{X}_2)(Y_i - \bar{Y}) = \mu \sum x_{2i} y_i$  (4)
$\sum (X_{3i} - \bar{X}_3)(X_{2i} - \bar{X}_2) = \mu \sum (X_{2i} - \bar{X}_2)^2 = \mu \sum x_{2i}^2$  (5)

The Case of Perfect Collinearity
• Substituting for X3 in the OLS expression for b2, we obtain
$b_2 = \dfrac{\sum x_{2i} y_i \sum x_{3i}^2 - \sum x_{3i} y_i \sum x_{3i} x_{2i}}{\sum x_{2i}^2 \sum x_{3i}^2 - \left(\sum x_{3i} x_{2i}\right)^2} = \dfrac{\sum x_{2i} y_i \cdot \mu^2 \sum x_{2i}^2 - \mu \sum x_{2i} y_i \cdot \mu \sum x_{2i}^2}{\sum x_{2i}^2 \cdot \mu^2 \sum x_{2i}^2 - \left(\mu \sum x_{2i}^2\right)^2} = \dfrac{0}{0}$
• In cases of a perfect linear relationship, or perfect multicollinearity, among the explanatory variables, we cannot obtain unique estimates of all the parameters. And since we cannot obtain their unique estimates, we cannot draw any statistical inferences (i.e., hypothesis testing) about them from a given sample (a numerical sketch follows below).
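
As a quick numerical check, here is a minimal Python sketch on simulated data (the values of delta, mu, and the sample itself are hypothetical, chosen only for illustration): when X3 is an exact linear function of X2, the numerator and denominator of the OLS formula for b2 both collapse to zero, and the cross-product matrix X'X becomes singular.

```python
import numpy as np

# Hypothetical simulated sample: X3 is an exact linear function of X2 (perfect collinearity).
rng = np.random.default_rng(0)
X2 = rng.normal(10.0, 2.0, size=50)
delta, mu = 1.0, 2.0
X3 = delta + mu * X2
Y = 5.0 + 1.5 * X2 + 0.8 * X3 + rng.normal(0.0, 1.0, size=50)

# Deviations from sample means (the lower-case x2i, x3i, yi used on the slides)
x2, x3, y = X2 - X2.mean(), X3 - X3.mean(), Y - Y.mean()

# Numerator and denominator of the OLS expression for b2
num = np.sum(x2 * y) * np.sum(x3 ** 2) - np.sum(x3 * y) * np.sum(x3 * x2)
den = np.sum(x2 ** 2) * np.sum(x3 ** 2) - np.sum(x3 * x2) ** 2
print(num, den)  # both are zero up to floating-point error, so b2 = 0/0 is indeterminate

# Equivalently, X'X is singular (numerically, its condition number explodes),
# so a unique OLS solution for (b1, b2, b3) does not exist.
X = np.column_stack([np.ones_like(X2), X2, X3])
print(np.linalg.cond(X.T @ X))
```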

Perfect and Imperfect Multicollinearity: An Illustration
• Consider the data on demand for widgets (Y), price (X2), and two measures of income, X3 and X4, given in the accompanying data table.
• The regression models specified by two researchers are:
$Y_i = A_1 + A_2 X_{2i} + A_3 X_{3i} + u_i$  (6)
$Y_i = B_1 + B_2 X_{2i} + B_3 X_{4i} + u_i$  (7)

Perfect and Imperfect Multicollinearity: An Illustration
• When a researcher tried to estimate model (6), the software refused to estimate it. In fact, there is an exact relationship between X2 and X3. Regressing X3 on X2, she obtained the following result:
$X_{3i} = 300 - 2X_{2i}; \quad R^2 = 1.00$  (8)
• In other words, the income variable (X3) and the price variable (X2) are perfectly linearly related; that is, there is perfect collinearity. Because of the relationship in (8), one cannot estimate regression (6). Substituting (8) into (6) gives
$Y_i = A_1 + A_2 X_{2i} + A_3(300 - 2X_{2i}) + u_i$
$Y_i = (A_1 + 300A_3) + (A_2 - 2A_3)X_{2i} + u_i$
$Y_i = C_1 + C_2 X_{2i} + u_i$  (9)
• where $C_1 = A_1 + 300A_3$ and $C_2 = A_2 - 2A_3$. Regression model (6) cannot be estimated, but one can estimate (9), which is a simple linear regression (see the sketch below).
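
To make this concrete, here is a small simulated sketch (the coefficient values A1, A2, A3 and the data are hypothetical): when X3 = 300 − 2X2 holds exactly, regressing Y on X2 alone recovers the combinations C1 = A1 + 300A3 and C2 = A2 − 2A3, but not the individual parameters A1, A2, and A3.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical parameter values and data, for illustration only.
A1, A2, A3 = 10.0, -1.5, 0.2
rng = np.random.default_rng(5)
X2 = rng.uniform(10.0, 30.0, size=40)
X3 = 300.0 - 2.0 * X2                      # exact collinearity, as in equation (8)
Y = A1 + A2 * X2 + A3 * X3 + rng.normal(0.0, 1.0, size=40)

# The simple regression (9): Y on a constant and X2 only.
res = sm.OLS(Y, sm.add_constant(X2)).fit()
print(res.params)                          # approximately [C1, C2]
print(A1 + 300.0 * A3, A2 - 2.0 * A3)      # C1 = A1 + 300*A3, C2 = A2 - 2*A3
```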

Perfect and Imperfect Multicollinearity: An Illustration
• Although C1 and C2 can be obtained, one cannot recover the original parameters A1, A2, and A3 from them.
• The results of regression (9), based on the data given in the table above, are:
$\hat{Y}_i = 49.667 - 2.1576X_{2i}$  (10)
se = (0.746) (0.1203)
t = (66.538) (−17.935)
$R^2 = 0.9757$
• The regression results for (7) are:
$\hat{Y}_i = 145.37 - 2.7975X_{2i} - 0.3191X_{4i}$  (11)
se = (120.06) (0.8122) (0.4033)
t = (1.2107) (−3.4444) (−0.7971)
$R^2 = 0.9778$

Perfect and Imperfect Multicollinearity: An Illustration
• Comparing the results in (10) and (11), we can observe that:
• Although regression (6) cannot be estimated, one can estimate regression (7), even though the difference between X3i and X4i is very small.
• The price coefficient (b2) is negative in both (10) and (11), and the numerical difference between the two is not vast. Both estimates are statistically significantly different from zero, but the |t| value of b2 in (10) is much greater than the corresponding |t| value in (11), and the standard error of b2 in (10) is much smaller than that in (11).
• The R² value in (10), with one explanatory variable, is 0.9757, whereas in (11), with two explanatory variables, it is 0.9778, an increase of only 0.0021.
• The coefficient of X4i is statistically insignificant and, moreover, has the wrong sign.
• Despite the insignificance of X4i, if we conduct the F test of the joint hypothesis that β2 = β3 = 0, H0 can be rejected.
• This is a case of a near-perfect linear relationship, or near-perfect multicollinearity: there exists a high degree of collinearity between X2 and X4 (the simulated sketch below reproduces this pattern).
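
The same pattern can be reproduced with a small simulation (the numbers below are hypothetical and are not the widget data from the table): adding a regressor that is almost, but not exactly, a linear function of X2 barely changes R², yet it inflates the standard error of the price coefficient and shrinks its |t| value, while the joint F test remains significant.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical data mimicking the illustration: X4 is nearly (not exactly) collinear with X2.
rng = np.random.default_rng(42)
X2 = rng.uniform(10.0, 30.0, size=30)                    # "price"
X4 = 300.0 - 2.0 * X2 + rng.normal(0.0, 0.5, size=30)    # near-perfect collinearity with X2
Y = 50.0 - 2.0 * X2 + rng.normal(0.0, 2.0, size=30)      # "demand"

simple = sm.OLS(Y, sm.add_constant(X2)).fit()                        # analogue of (10)
multi = sm.OLS(Y, sm.add_constant(np.column_stack([X2, X4]))).fit()  # analogue of (11)

print(simple.bse[1], multi.bse[1])          # the standard error of b2 jumps once X4 enters
print(simple.tvalues[1], multi.tvalues[1])  # ... and its |t| value falls
print(simple.rsquared, multi.rsquared)      # R^2 barely changes
print(multi.f_pvalue)                       # the joint F test can still reject both slopes = 0
```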

Theoretical Consequences of Multicollinearity
• Even in the presence of near collinearity, the OLS estimators are unbiased. But remember that unbiasedness is a repeated-sampling property. In practice we cannot say much about the properties of the estimates in any single sample, since we rarely have the luxury of replicating samples.
• Near collinearity does not destroy the minimum-variance property of the OLS estimators: in the class of all linear unbiased estimators, OLS estimators have minimum variance. But minimum variance does not mean that the numerical value of the variance will be small.
• Multicollinearity is essentially a sample phenomenon, in the sense that even if the X variables are not linearly related in the population, they can be so related in a particular sample. This happens because most economic data are not obtained from controlled laboratory experiments. Data on variables such as GDP, prices, unemployment, profits, and dividends are observed as they occur rather than generated experimentally. If these data could be obtained experimentally to begin with, we would not allow collinearity to exist.

Practical Consequences of Multicollinearity
• Large variances and standard errors of the OLS estimators. As the standard error of an estimator increases, it becomes more difficult to estimate the true value of the corresponding parameter precisely; that is, the precision of the OLS estimators falls.
• Wider confidence intervals. Because of the large standard errors, confidence intervals for the relevant population parameters tend to be wide.
• Insignificant t ratios. In cases of high collinearity the estimated standard errors increase dramatically, making the t values smaller. Therefore, in such cases we will increasingly accept (fail to reject) the null hypothesis that the relevant true population coefficient is zero.
• A high R² value but few significant t ratios.
• OLS estimators and their standard errors become very sensitive to small changes in the data; that is, they tend to be unstable.
• Wrong signs for regression coefficients.
• Difficulty in assessing the individual contributions of explanatory variables to the explained sum of squares (ESS) or R².

Detection of Multicollinearity
• A high R² but few significant t ratios.
• High pairwise correlations among the explanatory variables. Suppose we have three explanatory variables, X2, X3, and X4. Let r23, r24, and r34 denote the pairwise correlations between X2 and X3, between X2 and X4, and between X3 and X4, respectively. Suppose r23 = 0.90, indicating high collinearity between X2 and X3.
• Examination of partial correlations. Consider the partial correlation coefficient r23.4, the correlation between X2 and X3 holding the influence of X4 constant.
• Subsidiary, or auxiliary, regressions. One way of finding out which X variable is highly collinear with the other X variables in the model is to regress each X variable on the remaining X variables and compute the corresponding R². Each of these regressions is called a subsidiary, or auxiliary, regression (see the sketch below).
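
A minimal detection sketch in Python, using simulated (hypothetical) data: inspect the pairwise correlation matrix of the regressors, then run an auxiliary regression of one X on the remaining X's and look at its R².

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical regressors: X2 is built to be highly collinear with X3.
rng = np.random.default_rng(1)
X3 = rng.normal(0.0, 1.0, size=100)
X4 = rng.normal(0.0, 1.0, size=100)
X2 = 0.9 * X3 + 0.1 * X4 + rng.normal(0.0, 0.2, size=100)

# Pairwise correlations (r23, r24, r34) among the explanatory variables.
print(np.corrcoef(np.column_stack([X2, X3, X4]), rowvar=False))

# Auxiliary regression of X2 on the remaining regressors: a high R^2 flags X2
# as the variable most strongly (linearly) related to the other X's.
aux = sm.OLS(X2, sm.add_constant(np.column_stack([X3, X4]))).fit()
print(aux.rsquared)
```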

Detection of Multicollinearity
• The variance-inflating factor (VIF). The variances of the slope coefficients can also be written as
$\operatorname{var}(b_2) = \dfrac{\sigma^2}{\sum x_{2i}^2 (1 - R_2^2)} = \dfrac{\sigma^2}{\sum x_{2i}^2}\, VIF$  (12)
$\operatorname{var}(b_3) = \dfrac{\sigma^2}{\sum x_{3i}^2 (1 - R_2^2)} = \dfrac{\sigma^2}{\sum x_{3i}^2}\, VIF$  (13)
In these formulas $R_2^2$ is the coefficient of determination in the auxiliary regression of X2 on X3, and $\frac{1}{1 - R_2^2}$ is called the variance inflation factor (VIF), because as $R_2^2$ increases, the variances and standard errors of b2 and b3 increase. Note, however, that the variances of b2 and b3 depend not only on the VIF but also on the variance of ui, σ², and on the variation in X2 and X3 (see the numerical check below).

Is Multicollinearity Necessarily Bad?
•The answer to this question depends on the purpose of the study. If the goal of
the study is to use the model to predict or forecast the future mean value of the
dependent variable, collinearity may not be bad.
•If the objective of the study is not only prediction but also reliable estimation of
the individual parameters of the chosen model, then serious collinearity may be
bad, because we have seen that it leads to large standard errors of the estimators.
• If the objective of the study is to estimate a group of coefficients (e.g., the sum or difference of two coefficients) fairly accurately, this can be done even in the presence of multicollinearity; in that case multicollinearity may not be a problem (see the sketch below).
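
A brief simulated sketch of the last point (all numbers hypothetical): when X2 and X3 are nearly identical, each slope is estimated imprecisely, yet their sum b2 + b3 has a small standard error because the two estimators are strongly negatively correlated.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical data: X3 is almost identical to X2, i.e., severe (but not perfect) collinearity.
rng = np.random.default_rng(3)
X2 = rng.normal(0.0, 1.0, size=200)
X3 = X2 + rng.normal(0.0, 0.05, size=200)
Y = 1.0 + 2.0 * X2 + 3.0 * X3 + rng.normal(0.0, 1.0, size=200)

res = sm.OLS(Y, sm.add_constant(np.column_stack([X2, X3]))).fit()
cov = np.asarray(res.cov_params())                 # covariance matrix of the estimated coefficients
var_sum = cov[1, 1] + cov[2, 2] + 2 * cov[1, 2]    # var(b2 + b3)

print(res.bse[1], res.bse[2])                      # individual slopes: large standard errors
print(np.sqrt(var_sum))                            # their sum: a much smaller standard error
```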

Remedial Measures
• Dropping Variables from the Model: The simplest solution might seem to be to drop one or more of the collinear variables. But the best practical advice is not to drop a variable from an economically viable model just because the collinearity problem is serious.
• Acquiring Additional Data or a New Sample: Since multicollinearity is a sample feature, it is possible that in another sample involving the same variables collinearity may not be as serious as in the first sample.
• Rethinking the Model: Sometimes a model chosen for empirical analysis is not carefully thought out; perhaps some important variables are omitted, or the functional form of the model is incorrectly chosen. Rethinking the model can therefore be a remedial measure.
• Prior Information about Some Parameters: Sometimes a particular phenomenon, such as a demand function, is investigated time and again, and from prior studies we may have some knowledge of the values of one or more parameters. This knowledge can be profitably used in the current sample.
• Transformation of Variables: Occasionally, transforming the variables included in the model can minimize, if not solve, the problem of collinearity.

References
• Christopher Dougherty, Introduction to Econometrics, Fourth ed., Oxford University Press.
• Damodar N. Gujarati, Dawn C. Porter, Essentials of Econometrics, Fourth ed., McGraw Hill International.
• Damodar N. Gujarati, Dawn C. Porter, Basic Econometrics, Fifth ed., McGraw Hill International.