Perfect Multicollinearity Perfect multicollinearity is a violation of Classical Assumption VI. It is the case where the variation in one explanatory variable can be completely explained by movements in another explanatory variable. Such a case between two independent variables would be: X1i = α0 + α1X2i, where the X's are independent variables in: Yi = β0 + β1X1i + β2X2i + εi, and the α's are constants.
Perfect Multicollinearity (continued) Other examples of perfect linear relationships. Real-world examples: -The distance between two cities (e.g., measured in miles and measured in kilometers). -The percent of voters voting in favor of a proposition and the percent voting against it (the two always sum to 100).
Perfect Multicollinearity (continued) OLS is incapable of generating estimates of the regression coefficients when perfect multicollinearity is present, because you cannot "hold all the other independent variables in the equation constant." A special case related to perfect multicollinearity is a dominant variable. A dominant variable is so highly correlated with the dependent variable that it masks the effects of the other independent variables. Don't confuse dominant variables with highly significant variables.
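A minimal numerical sketch (Python/NumPy with synthetic data, not from the text) of why OLS breaks down here: when X2 is an exact linear function of X1, the columns of the data matrix are linearly dependent, X'X is singular, and the normal equations have no unique solution.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = 3.0 + 2.0 * x1                        # X2 is an exact linear function of X1
y = 1.0 + 0.5 * x1 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])  # constant, X1, X2

# The data matrix has rank 2 instead of 3, so X'X is singular and the
# OLS normal equations have no unique solution.
print("rank of X:", np.linalg.matrix_rank(X))       # 2, not 3
print("det(X'X):", np.linalg.det(X.T @ X))          # essentially zero, up to rounding

# lstsq still returns *a* solution, but it is only one of infinitely many.
beta, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
print("reported rank:", rank, "coefficients:", beta)
```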
Imperfect Multicollinearity Imperfect multicollinearity: a linear functional relationship between two or more independent variables that is so strong that it can significantly affect the estimation of the coefficients. It occurs when two (or more) independent variables are imperfectly linearly related, as in Equation (8.7): X1i = α0 + α1X2i + ui. Note ui, the stochastic error term in Equation (8.7); it is what makes the relationship imperfect rather than perfect.
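A small sketch (synthetic data; the coefficients are chosen only for illustration) of the idea behind Equation (8.7): X2 tracks X1 closely but not exactly, because of the stochastic term u.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
u = rng.normal(scale=0.3, size=n)   # stochastic error term, as in Eq. (8.7)
x2 = 4.0 + 1.5 * x1 + u             # imperfect linear relationship

r = np.corrcoef(x1, x2)[0, 1]
print(f"simple correlation r = {r:.3f}")  # high (close to 1) but below 1
```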
The Consequences of Multicollinearity The major consequences of multicollinearity are: 1. Estimates will remain unbiased. 2. The variances and standard errors of the estimates will increase. 3. The computed t-scores will fall. 4. Estimates will become sensitive to changes in specification. 5. The overall fit of the equation and estimation of the coefficients of nonmulticollinear variables will be largely unaffected.
The Consequences of Multicollinearity (continued) 1. Estimates will remain unbiased. Even if an equation has significant multicollinearity, the estimates of β will be unbiased as long as the first six Classical Assumptions hold. 2. The variances and standard errors of the estimates will increase. With multicollinearity, it becomes difficult to precisely identify the separate effects of the multicollinear variables. OLS is still BLUE with multicollinearity, but the "minimum variances" can be fairly large.
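A Monte Carlo sketch (Python/NumPy, synthetic data and coefficients of my own choosing) of points 1 and 2: across repeated samples the estimate of β1 averages out to the true value whether or not the regressors are collinear, but its sampling spread is much larger when they are.

```python
import numpy as np

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

rng = np.random.default_rng(2)
n, reps, true_b1 = 100, 2000, 1.0

est_corr, est_orth = [], []
for _ in range(reps):
    x1 = rng.normal(size=n)
    x2_corr = x1 + rng.normal(scale=0.2, size=n)   # highly collinear with x1
    x2_orth = rng.normal(size=n)                   # unrelated to x1
    eps = rng.normal(size=n)
    y_corr = 2.0 + true_b1 * x1 + 0.5 * x2_corr + eps
    y_orth = 2.0 + true_b1 * x1 + 0.5 * x2_orth + eps
    est_corr.append(ols(np.column_stack([np.ones(n), x1, x2_corr]), y_corr)[1])
    est_orth.append(ols(np.column_stack([np.ones(n), x1, x2_orth]), y_orth)[1])

# Both means sit near the true value of 1.0 (unbiasedness), but the standard
# deviation of the estimates is far larger in the collinear case.
print("collinear:  mean =", round(np.mean(est_corr), 3), " sd =", round(np.std(est_corr), 3))
print("orthogonal: mean =", round(np.mean(est_orth), 3), " sd =", round(np.std(est_orth), 3))
```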
The Consequences of Multicollinearity (continued) 3. The computed t-scores will fall. Multicollinearity tends to decrease t-scores mainly because of the formula for the t-statistic, t = β̂k/SE(β̂k) under the usual null hypothesis that βk = 0: if the standard error increases, the t-score must fall. Confidence intervals also widen because the standard errors increase.
The Consequences of Multicollinearity (continued) 4. Estimates will become sensitive to changes in specification. Adding or dropping variables and/or observations will often cause major changes in the β estimates when significant multicollinearity exists. This occurs because, with severe multicollinearity, OLS is forced to emphasize small differences between the variables in order to distinguish the effect of one multicollinear variable from that of another.
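A short sketch (synthetic data, illustrative only) of this sensitivity: with two nearly identical regressors, the estimated coefficient on X1 can change sharply when X2 is dropped from the specification.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)        # severe (imperfect) multicollinearity
y = 2.0 + 1.0 * x1 + 1.0 * x2 + rng.normal(size=n)

X_full = np.column_stack([np.ones(n), x1, x2])
X_drop = np.column_stack([np.ones(n), x1])

b_full = np.linalg.lstsq(X_full, y, rcond=None)[0]
b_drop = np.linalg.lstsq(X_drop, y, rcond=None)[0]

# The coefficient on X1 typically swings a lot once X2 is dropped, because X1
# then picks up almost all of X2's effect as well.
print("beta1 with X2 in the equation:", round(b_full[1], 3))
print("beta1 with X2 dropped:        ", round(b_drop[1], 3))
```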
The Consequences of Multicollinearity (continued) 5. The overall fit of the equation and estimation of the coefficients of nonmulticollinear variables will be largely unaffected. The adjusted R² will not fall much, if at all, with significant multicollinearity. A combination of a high adjusted R² and no statistically significant individual variables is an indication of multicollinearity. It is possible for an F-test of overall significance to reject the null hypothesis even though none of the individual t-tests do.
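A sketch (synthetic data; the sample size and noise level are my own choices) of that symptom: overall fit and the F-statistic are large, yet neither slope's t-score is individually significant.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 40
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)       # nearly redundant regressor
y = 1.0 + 0.5 * x1 + 0.5 * x2 + rng.normal(scale=0.5, size=n)

X = np.column_stack([np.ones(n), x1, x2])
k = X.shape[1]
beta = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta
s2 = resid @ resid / (n - k)                   # residual variance estimate
se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))
t_scores = beta / se

tss = np.sum((y - y.mean()) ** 2)
r2 = 1 - (resid @ resid) / tss
f_stat = (r2 / (k - 1)) / ((1 - r2) / (n - k))

# Typical outcome: R-squared is high and F is far above its critical value,
# yet neither slope's |t| clears the usual ~2.0 threshold.
print("R^2 =", round(r2, 3), " F =", round(f_stat, 2))
print("t-scores on X1 and X2:", np.round(t_scores[1:], 2))
```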
Two Examples of the Consequences of Multicollinearity Example: Student consumption function (Equation 8.9): COi = β0 + β1Ydi + β2LAi + εi, where: COi = annual consumption expenditures of the ith student on items other than tuition and room and board; Ydi = annual disposable income (including gifts) of that student; LAi = liquid assets (savings, etc.) of the ith student; εi = stochastic error term.
Two Examples of the Consequences of Multicollinearity (continued) Estimate Equation 8.9 with OLS; then re-estimate it including only disposable income (dropping LAi) and compare the two sets of coefficient estimates.
Two Examples of the Consequences of Multicollinearity (continued) Example: Demand for gasoline by state (Equation 8.12): PCONi = β0 + β1UHMi + β2TAXi + β3REGi + εi, where: PCONi = petroleum consumption in the ith state (trillions of BTUs); UHMi = urban highway miles within the ith state; TAXi = the gasoline tax in the ith state (cents per gallon); REGi = motor vehicle registrations in the ith state (thousands).
Two Examples of the Consequences of Multicollinearity (continued) Estimate Equation 8.12 with OLS; then re-estimate it after dropping UHMi and compare the coefficient estimates.
The Detection of Multicollinearity Multicollinearity exists in every equation; the important question is how much exists. The severity can change from sample to sample. There are no generally accepted, true statistical tests for multicollinearity. Researchers develop a general feeling for the severity of multicollinearity by examining a number of characteristics. Two common ones are: 1. High simple correlation coefficients 2. High variance inflation factors
High Simple Correlation Coefficients The simple correlation coefficient, r, is a measure of the strength and direction of the linear relationship between two variables. r ranges from −1 to +1, and its sign indicates the direction of the correlation. If r is high in absolute value, then the two variables are quite correlated and multicollinearity is a potential problem.
High Simple Correlation Coefficients (continued) How high is "high"? Some researchers pick an arbitrary cutoff, such as 0.80. A better answer might be that r is high if it causes unacceptably large variances. The use of r to detect multicollinearity has a major limitation: groups of variables acting together can cause multicollinearity without any single simple correlation coefficient being high.
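A minimal sketch (synthetic data; the variables X1, X2, X3 are hypothetical) of the usual first check: compute the matrix of simple correlations among the explanatory variables and scan the off-diagonal entries for large absolute values.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + rng.normal(scale=0.4, size=n)   # strongly related to x1
x3 = rng.normal(size=n)                         # unrelated to the others

# Correlation matrix of the explanatory variables (columns = variables).
corr = np.corrcoef(np.column_stack([x1, x2, x3]), rowvar=False)
print(np.round(corr, 2))   # scan off-diagonal entries for |r| near, say, 0.8
```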
High Variance Inflation Factors (VIFs) The variance inflation factor (VIF) is a method of detecting the severity of multicollinearity by looking at the extent to which a given explanatory variable can be explained by all the other explanatory variables in the equation. Suppose the following model with K independent variables: Yi = β0 + β1X1i + β2X2i + ... + βKXKi + εi. We need to calculate a VIF for each of the K independent variables.
High Variance Inflation Factors (VIFs) (continued) To calculate the VIFs: 1. Run an OLS regression that has Xi as a function of all the other explanatory variables in the equation (for example, for X1: X1i = α0 + α2X2i + α3X3i + ... + αKXKi + vi). 2. Calculate the variance inflation factor for β̂i: VIF(β̂i) = 1/(1 − Ri²), where Ri² is the unadjusted R² from the auxiliary regression in step 1.
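A sketch of that two-step procedure in Python/NumPy (synthetic data; the helper function vif is my own, not from the text): each column is regressed on the remaining columns plus a constant, and 1/(1 − Ri²) is reported.

```python
import numpy as np

def vif(X):
    """VIF for each column of X: regress it on the remaining columns
    (plus a constant) and return 1 / (1 - R_i^2)."""
    n, k = X.shape
    out = []
    for i in range(k):
        y_i = X[:, i]
        others = np.column_stack([np.ones(n), np.delete(X, i, axis=1)])
        b = np.linalg.lstsq(others, y_i, rcond=None)[0]
        resid = y_i - others @ b
        r2_i = 1 - (resid @ resid) / np.sum((y_i - y_i.mean()) ** 2)
        out.append(1.0 / (1.0 - r2_i))
    return np.array(out)

# Synthetic illustration: X2 nearly duplicates X1, while X3 is independent.
rng = np.random.default_rng(6)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.2, size=200)
x3 = rng.normal(size=200)
print(np.round(vif(np.column_stack([x1, x2, x3])), 1))  # X1, X2 well above 5; X3 near 1
```

For comparison, statsmodels provides a variance_inflation_factor function (in statsmodels.stats.outliers_influence) that performs the same auxiliary-regression calculation; the design matrix passed to it should normally include the constant column.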
High Variance Inflation Factors (VIFs) (continued) The higher the VIF, the more severe the effects of multicollinearity . But, there are no formal critical VIF values. A common rule of thumb: if VIF > 5, multicollinearity is severe. It’s possible to have large multicollinearity effects without having a large VIF.
Remedies for Multicollinearity Remedy 1: Do nothing. The existence of multicollinearity might not mean anything (i.e., the coefficients may still be significant and meet expectations). If you delete a multicollinear variable that belongs in the model, you cause specification bias. And every time a regression is rerun, we risk encountering a specification that accidentally works on the specific sample.
Remedies for Multicollinearity (continued) Remedy 2: Drop a redundant variable. Two or more variables in an equation that measure essentially the same thing might be called redundant. Dropping a redundant variable is nothing more than making up for a specification error: the variable should not have been included in the first place. In a case of severe multicollinearity, it makes no statistical difference which variable is dropped, so the theoretical underpinnings of the model should be the basis for choosing which redundant variable to drop.
Remedies for Multicollinearity (continued) Example: the student consumption function (Equation 8.9).
Remedies for Multicollinearity (continued) Remedy 3: Increase the size of the sample. Normally, a larger sample will reduce the variance of the estimated coefficients, diminishing the impact of multicollinearity. Unfortunately, while this is a useful alternative to consider, obtaining a larger sample may be impossible.
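A small sketch (synthetic data; the degree of collinearity is held fixed while only n grows) of why a larger sample helps: the estimated standard error of β̂1 shrinks roughly with the square root of the sample size even though the regressors stay just as collinear.

```python
import numpy as np

def se_beta1(n, seed):
    rng = np.random.default_rng(seed)
    x1 = rng.normal(size=n)
    x2 = x1 + rng.normal(scale=0.2, size=n)          # same degree of collinearity
    y = 2.0 + 1.0 * x1 + 0.5 * x2 + rng.normal(size=n)
    X = np.column_stack([np.ones(n), x1, x2])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    s2 = resid @ resid / (n - X.shape[1])
    return np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])  # SE of the coefficient on x1

for n in (50, 200, 800):
    print(f"n = {n:4d}   SE(beta1) approx {se_beta1(n, seed=7):.3f}")
```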
An Example of Why Multicollinearity Often is Best Left Unadjusted Example: Impact of marketing on soft drink sales: St = β0 + β1Pt + β2At + β3Bt + εt, where: St = sales of the soft drink in year t; Pt = average relative price of the drink in year t; At = advertising expenditures for the company in year t; Bt = advertising expenditures for the company's main competitor in year t.
An Example of Why Multicollinearity Often is Best Left Unadjusted (continued) If the variable Bt is dropped, the equation becomes: St = β0 + β1Pt + β2At + εt. Note the expected bias in the estimated coefficient of At: expected bias = βB · α1, where α1 is the slope from a regression of the omitted Bt on the included At, so the bias has the same sign as βB times the correlation between At and Bt. If, as expected, βB is negative and At and Bt are positively correlated, the estimate of the coefficient of At is biased downward.
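A sketch of that omitted-variable bias (purely synthetic data; the coefficient values and the link between the two advertising series are my own illustrative assumptions, not the textbook's estimates): with B dropped, the coefficient on A absorbs part of B's negative effect and is pulled below its true value.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 120
a = rng.normal(loc=10.0, scale=2.0, size=n)            # own advertising A_t
b = 0.8 * a + rng.normal(scale=1.0, size=n)            # competitor's advertising tracks A_t
p = rng.normal(loc=1.0, scale=0.1, size=n)             # relative price P_t
s = 50 - 20 * p + 3.0 * a - 2.0 * b + rng.normal(scale=2.0, size=n)

X_full = np.column_stack([np.ones(n), p, a, b])
X_drop = np.column_stack([np.ones(n), p, a])           # B_t dropped

b_full = np.linalg.lstsq(X_full, s, rcond=None)[0]
b_drop = np.linalg.lstsq(X_drop, s, rcond=None)[0]

# With B omitted, the coefficient on A absorbs B's (negative) effect:
# roughly beta_A + beta_B * 0.8 = 3.0 + (-2.0)(0.8) = 1.4 in expectation.
print("coef on A, full model:", round(b_full[2], 2))
print("coef on A, B dropped: ", round(b_drop[2], 2))
```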