This is chapter three of introductory econometrics.
Added: Jul 01, 2024
Slides: 52 pages
Chapter Three
Multiple Linear Regression
7/1/2024
Why more than one predictor variable?
– More than one variable influences a dependent variable.
– Predictors may themselves be correlated, but the interest is in the independent contribution of each variable to explaining the variation in the dependent variable.
Three fundamental aspects of linear regression
Model selection
What is the most parsimonious set of predictors that explains the most variation in the dependent variable?
Evaluation of assumptions
Have we met the assumptions of the regression model?
Model validation
Validating the model results
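Multiple Linear Regression Model
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_{k-1} X_{k-1,i} + \varepsilon_i$$
$\beta_0$: intercept
$\beta_1, \ldots, \beta_{k-1}$: partial regression slope coefficients
$\varepsilon_i$: error term associated with the ith observation
Here k counts all parameters, including the intercept, so the data matrix X in the matrix form has k columns.
This model gives the expected value of Y conditional on the fixed values of $X_1, X_2, \ldots, X_{k-1}$, plus an error term.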
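Matrix Representation
• For a sample of size n, the regression model is best described as a system of equations:
$$
\begin{aligned}
Y_1 &= \beta_0 + \beta_1 X_{11} + \beta_2 X_{21} + \cdots + \beta_{k-1} X_{k-1,1} + \varepsilon_1\\
Y_2 &= \beta_0 + \beta_1 X_{12} + \beta_2 X_{22} + \cdots + \beta_{k-1} X_{k-1,2} + \varepsilon_2\\
&\;\;\vdots\\
Y_n &= \beta_0 + \beta_1 X_{1n} + \beta_2 X_{2n} + \cdots + \beta_{k-1} X_{k-1,n} + \varepsilon_n
\end{aligned}
$$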
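• We can rewrite these equations in matrix form as:
$$
\begin{bmatrix} Y_1\\ Y_2\\ \vdots\\ Y_n \end{bmatrix}
=
\begin{bmatrix}
1 & X_{11} & X_{21} & \cdots & X_{k-1,1}\\
1 & X_{12} & X_{22} & \cdots & X_{k-1,2}\\
\vdots & \vdots & \vdots & & \vdots\\
1 & X_{1n} & X_{2n} & \cdots & X_{k-1,n}
\end{bmatrix}
\begin{bmatrix} \beta_0\\ \beta_1\\ \vdots\\ \beta_{k-1} \end{bmatrix}
+
\begin{bmatrix} \varepsilon_1\\ \varepsilon_2\\ \vdots\\ \varepsilon_n \end{bmatrix}
$$
$$\underset{(n\times 1)}{Y} = \underset{(n\times k)}{X}\;\underset{(k\times 1)}{\beta} + \underset{(n\times 1)}{\varepsilon}$$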
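A minimal NumPy sketch of the matrix form Y = Xβ + ε, using simulated data: the sample size, coefficient values, noise scale and the helper name `simulate_mlr` are hypothetical choices for illustration. The first column of X is a column of ones for the intercept, so with β of length k the data matrix is n × k.

```python
import numpy as np


def simulate_mlr(n=100, beta=(1.0, 2.0, -0.5), sigma=1.0, seed=0):
    """Simulate Y = X beta + eps with a leading column of ones (hypothetical data)."""
    rng = np.random.default_rng(seed)
    beta = np.asarray(beta, dtype=float)
    k = beta.size                                  # parameters, incl. intercept
    regressors = rng.normal(size=(n, k - 1))       # X_1, ..., X_{k-1}
    X = np.column_stack([np.ones(n), regressors])  # (n x k) data matrix
    eps = rng.normal(scale=sigma, size=n)          # (n x 1) error vector
    Y = X @ beta + eps                             # (n x 1) response
    return X, Y, eps, beta


X, Y, eps, beta = simulate_mlr()
print(X.shape, Y.shape)  # (100, 3) (100,)
```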
3.2. CLRM Assumptions
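• Assumption 1: The expected value of the error vector is 0:
$$E(\varepsilon) = E\begin{bmatrix}\varepsilon_1\\ \varepsilon_2\\ \vdots\\ \varepsilon_n\end{bmatrix} = \begin{bmatrix}0\\ 0\\ \vdots\\ 0\end{bmatrix}$$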
• Assumption 2: There is no correlation between the ith and jth error terms:
$$E(\varepsilon_i \varepsilon_j) = 0 \quad \text{for } i \neq j$$
• This is called no autocorrelation.
• Assumption 3: The errors exhibit constant variance:
$$E(\varepsilon\varepsilon') = \sigma^2 I$$
• This is called homoscedasticity.
• If the errors do not exhibit constant variance, we call it heteroscedasticity.
Assumption 4: Covariance between the X's and the error terms is 0:
$$\operatorname{cov}(X, \varepsilon) = 0$$
Usually satisfied if the predictor variables are fixed and non-stochastic.
Such an X is called an exogenous variable.
If the variable is not exogenous, it is called an endogenous variable.
Assumption 5: The rank of the data matrix X is k, the number of columns:
$$r(X) = k$$
For this to happen, k < n, the number of observations.
There are no exact linear relationships among the X variables.
This is the assumption of no (perfect) multicollinearity.
This is called an identification condition.
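Assumption 5 can be checked numerically. The sketch below (hypothetical helper `has_full_column_rank`, simulated data) uses `numpy.linalg.matrix_rank`, which estimates rank from singular values with a floating-point tolerance; appending a column that is an exact linear combination of the others drops the rank below k.

```python
import numpy as np


def has_full_column_rank(X):
    """Assumption 5: k < n and rank(X) = k, the number of columns."""
    n, k = X.shape
    return bool(k < n and np.linalg.matrix_rank(X) == k)


rng = np.random.default_rng(1)
x1, x2 = rng.normal(size=20), rng.normal(size=20)
X_ok = np.column_stack([np.ones(20), x1, x2])
# Hypothetical fourth column that is an exact linear combination of x1 and x2
X_bad = np.column_stack([X_ok, 2 * x1 - x2])

print(has_full_column_rank(X_ok))    # True
print(has_full_column_rank(X_bad))   # False
print(np.linalg.matrix_rank(X_bad))  # 3
```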
3.3. Least Squares Estimation
• Sample-based counterpart to the population regression model:
$$Y = Xb + e$$
• LS requires choosing values of b such that the residual sum of squares (RSS) is as small as possible:
$$\min_b \; e'e = (Y - Xb)'(Y - Xb)$$
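The solution for the “b’s”
• It should be apparent how to solve for the unknown parameters: minimizing e'e gives the normal equations $X'Xb = X'Y$.
• Pre-multiply both sides by the inverse of $X'X$ (which exists under Assumption 5):
$$(X'X)^{-1}X'Xb = (X'X)^{-1}X'Y \;\Rightarrow\; b = (X'X)^{-1}X'Y$$
• This is the fundamental outcome of OLS theory.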
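A short sketch of the OLS solution, assuming NumPy and simulated data (the helper name `ols` and the coefficient values are hypothetical). Solving the normal equations with `numpy.linalg.solve` gives the same b as forming (X'X)^{-1} explicitly, and `numpy.linalg.lstsq` reaches it through a more numerically stable factorization.

```python
import numpy as np


def ols(X, Y):
    """b = (X'X)^{-1} X'Y, obtained by solving the normal equations X'X b = X'Y."""
    return np.linalg.solve(X.T @ X, X.T @ Y)


rng = np.random.default_rng(2)
X = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])
Y = X @ np.array([1.0, 2.0, -0.5]) + 0.1 * rng.normal(size=50)  # hypothetical data

b = ols(X, Y)
b_inv = np.linalg.inv(X.T @ X) @ X.T @ Y        # literal (X'X)^{-1} X'Y
b_lstsq = np.linalg.lstsq(X, Y, rcond=None)[0]  # factorization-based least squares
print(np.allclose(b, b_inv), np.allclose(b, b_lstsq))  # True True
```

Avoiding the explicit inverse is the usual choice in practice, especially when X'X is close to singular.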
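ANOVA Table
$$
\begin{array}{lcccc}
\text{Source of variation} & \text{Sum of squares} & \text{df} & \text{Mean square} & F\text{-ratio}\\
\hline
\text{Regression} & b'X'Y - n\bar{Y}^2 & k-1 & MSR = \dfrac{b'X'Y - n\bar{Y}^2}{k-1} & MSR/MSE\\
\text{Residual} & Y'Y - b'X'Y & n-k & MSE = \dfrac{Y'Y - b'X'Y}{n-k} & \\
\text{Total} & Y'Y - n\bar{Y}^2 & n-1 & &
\end{array}
$$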
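The ANOVA quantities can be computed directly from the matrix formulas in the table. This is a sketch assuming NumPy, simulated data, and an X whose first column is ones (the b'X'Y − nȲ² form of the regression sum of squares relies on the intercept); `anova_table` is a hypothetical helper.

```python
import numpy as np


def anova_table(X, Y):
    """Sums of squares, degrees of freedom and F-ratio from the matrix formulas."""
    n, k = X.shape
    b = np.linalg.solve(X.T @ X, X.T @ Y)
    bXY = b @ X.T @ Y                # b'X'Y
    nYbar2 = n * Y.mean() ** 2       # n * Ybar^2
    ess = bXY - nYbar2               # regression SS
    rss = Y @ Y - bXY                # residual SS
    tss = Y @ Y - nYbar2             # total SS
    msr, mse = ess / (k - 1), rss / (n - k)
    return {"ESS": ess, "RSS": rss, "TSS": tss,
            "df": (k - 1, n - k, n - 1), "MSR": msr, "MSE": mse, "F": msr / mse}


rng = np.random.default_rng(3)
X = np.column_stack([np.ones(40), rng.normal(size=(40, 2))])
Y = X @ np.array([0.5, 1.0, 0.0]) + rng.normal(size=40)  # hypothetical data
table = anova_table(X, Y)
print(table["df"], round(table["F"], 3))
```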
Test of Multiple Restrictions
• Tests the null hypothesis:
$$H_0: \beta_1 = \beta_2 = \cdots = \beta_{k-1} = 0$$
• The null hypothesis is known as a joint or simultaneous hypothesis because it compares the values of all $\beta_j$ simultaneously.
• This tests the overall significance of the regression model.
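• Hypothesis testing
– The significance of individual regression coefficients can be tested using the t-statistic.
– The overall significance of the SRF can be tested as:
$$H_0: \beta_1 = \beta_2 = \cdots = \beta_{k-1} = 0 \qquad H_1: \text{at least one } \beta_j \neq 0$$
$$F = \frac{ESS/df_{ESS}}{RSS/df_{RSS}} = \frac{ESS/(k-1)}{RSS/(n-k)} = \frac{MSR}{MSE}$$
where ESS is the explained (regression) sum of squares and RSS is the residual sum of squares.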
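The F-test statistic and $R^2$ vary directly
$$
\begin{aligned}
F &= \frac{(b'X'Y - n\bar{Y}^2)/(k-1)}{(Y'Y - b'X'Y)/(n-k)} = \frac{ESS/(k-1)}{RSS/(n-k)}\\[4pt]
  &= \frac{ESS/(k-1)}{(TSS - ESS)/(n-k)} = \frac{n-k}{k-1}\cdot\frac{ESS/TSS}{1 - ESS/TSS}\\[4pt]
  &= \frac{R^2/(k-1)}{(1-R^2)/(n-k)}
\end{aligned}
$$
since $R^2 = ESS/TSS$. For given n and k, a larger $R^2$ therefore means a larger F.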
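A quick numerical check of the identity above on hypothetical simulated data: F computed from ESS and RSS equals F computed from R² alone. The helper `f_from_r2` is a hypothetical name for the last line of the derivation.

```python
import numpy as np


def f_from_r2(r2, n, k):
    """F = [R^2/(k-1)] / [(1-R^2)/(n-k)]."""
    return (r2 / (k - 1)) / ((1 - r2) / (n - k))


rng = np.random.default_rng(4)
n, k = 60, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
Y = X @ np.array([1.0, 0.8, -0.3]) + rng.normal(size=n)  # hypothetical data

b = np.linalg.lstsq(X, Y, rcond=None)[0]
e = Y - X @ b
rss = e @ e                        # residual sum of squares
tss = np.sum((Y - Y.mean()) ** 2)  # total sum of squares
ess = tss - rss                    # explained sum of squares
r2 = ess / tss

f_direct = (ess / (k - 1)) / (rss / (n - k))
print(np.isclose(f_direct, f_from_r2(r2, n, k)))  # True
```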
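Test statistic
$$t = \frac{b_i}{s\sqrt{c_{ii}}}$$
where $c_{ii}$ is the element in the ith row and ith column of $(X'X)^{-1}$, and $s^2 = e'e/(n-k)$ is the MSE.
• Under $H_0: \beta_i = 0$, it follows a t distribution with n − k df.
• The 100(1 − α)% confidence interval is obtained from
$$b_i \pm t_{\alpha/2;\,n-k}\; s\sqrt{c_{ii}}$$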
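A sketch of the coefficient t statistics and confidence intervals, assuming NumPy and simulated data. To keep it dependency-light, the critical value t_{α/2; n−k} is passed in by the caller; 2.052 is the tabulated two-sided 5% value for 27 df. `coef_t_stats` is a hypothetical helper name.

```python
import numpy as np


def coef_t_stats(X, Y, t_crit):
    """t_i = b_i / (s*sqrt(c_ii)) and intervals b_i +/- t_crit * s*sqrt(c_ii).

    t_crit is t_{alpha/2; n-k}, supplied by the caller (e.g. from a t table).
    """
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)     # c_ii are its diagonal elements
    b = XtX_inv @ X.T @ Y
    e = Y - X @ b
    s2 = (e @ e) / (n - k)               # s^2 = e'e/(n-k) = MSE
    se = np.sqrt(s2 * np.diag(XtX_inv))  # s * sqrt(c_ii)
    t = b / se
    ci = np.column_stack([b - t_crit * se, b + t_crit * se])
    return b, se, t, ci


rng = np.random.default_rng(7)
n = 30
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
Y = X @ np.array([1.0, 0.5, 0.0]) + rng.normal(size=n)  # hypothetical data
b, se, t, ci = coef_t_stats(X, Y, t_crit=2.052)         # t_{0.025; 27}
```

If SciPy is available, `scipy.stats.t.ppf(1 - alpha / 2, n - k)` returns the same critical value.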
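• Equality of regression coefficients
$$H_0: \beta_3 = \beta_4 \qquad H_1: \beta_3 \neq \beta_4$$
$$t = \frac{\hat\beta_3 - \hat\beta_4}{Se(\hat\beta_3 - \hat\beta_4)}, \qquad Se(\hat\beta_3 - \hat\beta_4) = \sqrt{\operatorname{var}(\hat\beta_3) + \operatorname{var}(\hat\beta_4) - 2\operatorname{cov}(\hat\beta_3, \hat\beta_4)}$$
• Test of restrictions, e.g. in
$$\ln y_i = \beta_1 + \beta_2 \ln x_{2i} + \beta_3 \ln x_{3i} + e_i$$
$$H_0: \beta_2 + \beta_3 = 1 \qquad H_1: \beta_2 + \beta_3 \neq 1$$
$$t = \frac{\hat\beta_2 + \hat\beta_3 - 1}{Se(\hat\beta_2 + \hat\beta_3)}$$
Under $H_0$, each statistic follows a t distribution with n − k df.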
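Both tests above are special cases of a t test on one linear combination r'β = q, with Var(b) = s²(X'X)^{-1}. The sketch assumes NumPy and simulated data in logs; the helper `linear_restriction_t` and the coefficient values are hypothetical. Because this model has only two slopes, the equality test is applied to β2 = β3 rather than β3 = β4.

```python
import numpy as np


def linear_restriction_t(X, Y, r, q):
    """t statistic for H0: r'beta = q, using Var(b) = s^2 (X'X)^{-1}."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ Y
    e = Y - X @ b
    s2 = (e @ e) / (n - k)
    r = np.asarray(r, dtype=float)
    se = np.sqrt(s2 * (r @ XtX_inv @ r))  # Se(r'b)
    return (r @ b - q) / se


rng = np.random.default_rng(8)
n = 80
ln_x2, ln_x3 = rng.normal(size=n), rng.normal(size=n)
X = np.column_stack([np.ones(n), ln_x2, ln_x3])
# Hypothetical data generated with beta2 + beta3 = 1
ln_y = X @ np.array([0.2, 0.6, 0.4]) + 0.3 * rng.normal(size=n)

t_sum = linear_restriction_t(X, ln_y, r=[0, 1, 1], q=1.0)  # H0: beta2 + beta3 = 1
t_eq = linear_restriction_t(X, ln_y, r=[0, 1, -1], q=0.0)  # H0: beta2 = beta3
```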
The simplest case of MLR, the two-explanatory-variable regression model, is given as:
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$$
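Model Specification and Interpretation
• Let
$$\ln w_i = \beta_1 + \beta_2 E_i + u_i$$
where w is the wage and E is years of education. Then
$$\frac{d\ln w}{dE} = \beta_2 \;\Rightarrow\; \frac{dw/w}{dE} = \beta_2 \;\Rightarrow\; \frac{dw}{dE} = \beta_2 w$$
• In this case, a one-unit change in E will result in approximately a $(100\cdot\beta_2)\%$ change in wage.
• The semi-log model implies a non-constant increase in wage: $dw/dE = \beta_2 w$ grows with the wage level.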
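A small numeric illustration of the semi-log interpretation, with hypothetical coefficients β1 = 1.5 and β2 = 0.08. The slope dw/dE = β2·w rises with the wage, while the proportional change per extra year stays fixed at exp(β2) − 1 ≈ 8.3%, which 100·β2 = 8% approximates.

```python
import math


def semilog_wage(E, b1, b2):
    """Deterministic part of ln w = b1 + b2*E, i.e. w = exp(b1 + b2*E)."""
    return math.exp(b1 + b2 * E)


b1, b2 = 1.5, 0.08  # hypothetical coefficients
rows = []
for E in (8, 12, 16):
    w = semilog_wage(E, b1, b2)
    slope = b2 * w                                     # dw/dE = b2 * w
    pct = 100 * (semilog_wage(E + 1, b1, b2) / w - 1)  # exact % change for one more year
    rows.append((E, slope, pct))
    print(E, round(slope, 3), round(pct, 3))
```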
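• Given
$$q_x = a\,p_x^{\beta} e^{u} \;\Rightarrow\; \ln q_x = \ln a + \beta \ln p_x + u, \qquad \beta = \frac{d\ln q_x}{d\ln p_x} = \frac{dq_x/q_x}{dp_x/p_x}$$
– Demand for x is a non-constant decreasing function of price ($\beta < 0$).
– This implies that a 1% increase in price will decrease quantity demanded by $|\beta|\%$ ($\beta$ is the constant price elasticity of demand).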
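A sketch of the constant-elasticity property of the log-log model, assuming hypothetical values a = 100 and β = −1.5 (helper names `demand` and `log_elasticity` are also hypothetical). The numerical d ln q / d ln p is the same at every price, while the ordinary slope dq/dp = βq/p is not.

```python
import math


def demand(p, a=100.0, beta=-1.5):
    """Constant-elasticity demand q = a * p**beta (a and beta are hypothetical)."""
    return a * p ** beta


def log_elasticity(p, h=1e-6, **params):
    """Numerical d ln q / d ln p at price p, via a small proportional price change."""
    p1 = p * (1 + h)
    return ((math.log(demand(p1, **params)) - math.log(demand(p, **params)))
            / (math.log(p1) - math.log(p)))


for p in (1.0, 5.0, 20.0):
    slope = -1.5 * demand(p) / p  # dq/dp = beta * q / p changes with p
    print(p, round(log_elasticity(p), 6), round(slope, 4))
```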
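• Dummy independent variables
$$y_i = \beta_0 + \beta_1 U_i + \varepsilon_i, \qquad U_i = \begin{cases} 1 & \text{urban} \\ 0 & \text{rural} \end{cases}$$
$$E(y_i \mid U_i = 1) = \beta_0 + \beta_1, \qquad E(y_i \mid U_i = 0) = \beta_0$$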
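With a single dummy regressor, OLS reproduces the group means: the intercept estimate is the mean of y for rural observations, and intercept plus slope is the urban mean. The data below are simulated with hypothetical coefficients.

```python
import numpy as np

rng = np.random.default_rng(5)
U = (rng.random(200) < 0.4).astype(float)  # 1 = urban, 0 = rural
y = 10 + 3 * U + rng.normal(size=200)      # hypothetical beta0 = 10, beta1 = 3

X = np.column_stack([np.ones_like(U), U])
b0, b1 = np.linalg.lstsq(X, y, rcond=None)[0]

print(np.isclose(b0, y[U == 0].mean()))       # True: rural mean
print(np.isclose(b0 + b1, y[U == 1].mean()))  # True: urban mean
```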
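• Dummy independent variables
Let R = region i, i = 1, 2, 3, and define
$$d_{2i} = \begin{cases}1 & \text{region } 2\\ 0 & \text{otherwise}\end{cases} \qquad d_{3i} = \begin{cases}1 & \text{region } 3\\ 0 & \text{otherwise}\end{cases}$$
$$y_i = \beta_1 + \beta_2 d_{2i} + \beta_3 d_{3i} + \varepsilon_i$$
$$E(y_i \mid R = 1) = \beta_1, \qquad E(y_i \mid R = 2) = \beta_1 + \beta_2, \qquad E(y_i \mid R = 3) = \beta_1 + \beta_3$$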
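A sketch of building the region dummies with region 1 as the base category, using small made-up data (`region_design` is a hypothetical helper). It also shows why only two dummies are used for three regions: adding d1 alongside the intercept makes the columns linearly dependent (d1 + d2 + d3 = 1), which violates Assumption 5.

```python
import numpy as np


def region_design(region):
    """Intercept plus d2, d3 for region codes 1, 2, 3 (region 1 = base)."""
    region = np.asarray(region)
    d2 = (region == 2).astype(float)
    d3 = (region == 3).astype(float)
    return np.column_stack([np.ones(len(region)), d2, d3])


region = np.array([1, 2, 3, 1, 2, 3, 1, 2, 3, 3])
y = np.array([5.0, 7.0, 9.0, 5.5, 6.5, 9.5, 4.5, 7.5, 8.5, 9.0])  # made-up data
X = region_design(region)
b = np.linalg.lstsq(X, y, rcond=None)[0]
print(np.round(b, 6))  # region means are 5, 7, 9, so b is about [5, 2, 4]

d1 = (region == 1).astype(float)
X_trap = np.column_stack([X, d1])     # intercept + all three dummies
print(np.linalg.matrix_rank(X_trap))  # 3, not 4: the dummy-variable trap
```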
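• Models with interaction terms, say urban and male
$$y_i = \beta_0 + \beta_1 u_i + \beta_2 m_i + \beta_3 u_i m_i + \varepsilon_i$$
$$u_i = \begin{cases}1 & \text{urban}\\ 0 & \text{rural}\end{cases} \qquad m_i = \begin{cases}1 & \text{male}\\ 0 & \text{female}\end{cases}$$
$$
\begin{aligned}
E(y_i \mid u_i = 0, m_i = 0) &= \beta_0\\
E(y_i \mid u_i = 1, m_i = 0) &= \beta_0 + \beta_1\\
E(y_i \mid u_i = 0, m_i = 1) &= \beta_0 + \beta_2\\
E(y_i \mid u_i = 1, m_i = 1) &= \beta_0 + \beta_1 + \beta_2 + \beta_3
\end{aligned}
$$
e.g., the urban effect on females: $E(y_i \mid u_i = 1, m_i = 0) - E(y_i \mid u_i = 0, m_i = 0) = \beta_1$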
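The four conditional means can be tabulated mechanically from the coefficients. The values passed to the hypothetical `cell_means` helper are made up; the point is that the urban effect is β1 for females and β1 + β3 for males.

```python
def cell_means(b0, b1, b2, b3):
    """E(y | u, m) for y = b0 + b1*u + b2*m + b3*u*m, with u = 1 urban and m = 1 male."""
    return {(u, m): b0 + b1 * u + b2 * m + b3 * u * m
            for u in (0, 1) for m in (0, 1)}


means = cell_means(b0=10.0, b1=2.0, b2=1.0, b3=0.5)   # hypothetical coefficients
urban_effect_females = means[(1, 0)] - means[(0, 0)]  # equals b1
urban_effect_males = means[(1, 1)] - means[(0, 1)]    # equals b1 + b3
print(urban_effect_females, urban_effect_males)       # 2.0 2.5
```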
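• Models with interaction between a dummy and a continuous variable
$$y_i = \beta_0 + \beta_1 u_i + \beta_2 E_i + \beta_3 u_i E_i + \varepsilon_i$$
where E is years of education and $u_i = 1$ if urban, 0 if rural.
$$\frac{\partial y_i}{\partial E_i} = \beta_2 \;\;\text{(rural, } u_i = 0\text{)}, \qquad \frac{\partial y_i}{\partial E_i} = \beta_2 + \beta_3 \;\;\text{(urban, } u_i = 1\text{)}$$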
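A sketch with simulated data (hypothetical coefficients) showing that the education slope is β2 for rural and β2 + β3 for urban observations. Because the model is fully interacted, these slopes coincide with the slopes from separate simple regressions fitted to each group; `simple_slope` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 300
u = (rng.random(n) < 0.5).astype(float)           # 1 = urban, 0 = rural
educ = rng.integers(0, 17, size=n).astype(float)  # years of education
# Hypothetical coefficients: b0 = 2, b1 = 1, b2 = 0.5, b3 = 0.3
y = 2 + 1.0 * u + 0.5 * educ + 0.3 * u * educ + rng.normal(size=n)

X = np.column_stack([np.ones(n), u, educ, u * educ])
b0, b1, b2, b3 = np.linalg.lstsq(X, y, rcond=None)[0]
rural_slope, urban_slope = b2, b2 + b3


def simple_slope(x, yy):
    """Slope from a one-regressor OLS fit (np.polyfit returns [slope, intercept])."""
    return np.polyfit(x, yy, 1)[0]


print(round(rural_slope, 3), round(urban_slope, 3))
```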
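• Models with interaction between two continuous variables
$$y_i = \beta_0 + \beta_1 n_i + \beta_2 E_i + \beta_3 n_i E_i + \varepsilon_i$$
where n is nutrition status and E is years of education.
$$\frac{\partial y_i}{\partial n_i} = \beta_1 + \beta_3 E_i, \qquad \frac{\partial y_i}{\partial E_i} = \beta_2 + \beta_3 n_i$$
• In such a case, graphing the findings will give a better analysis.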
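With two interacted continuous variables, each marginal effect depends on the other variable, which is why a graph helps. The sketch assumes hypothetical coefficients (b1 = 0.2, b2 = 0.5, b3 = 0.05) and hypothetical helper names, and evaluates ∂y/∂n = β1 + β3E over a grid of education values; plotting `dy_dn` against `E_grid` would give such a graph.

```python
import numpy as np


def marginal_effects(n_status, educ, b1, b2, b3):
    """For y = b0 + b1*n + b2*E + b3*n*E: dy/dn = b1 + b3*E and dy/dE = b2 + b3*n."""
    return b1 + b3 * educ, b2 + b3 * n_status


def y_hat(n_status, educ, b0=1.0, b1=0.2, b2=0.5, b3=0.05):
    """Fitted value under hypothetical coefficients."""
    return b0 + b1 * n_status + b2 * educ + b3 * n_status * educ


E_grid = np.arange(0, 17, 4)  # 0, 4, 8, 12, 16 years of education
dy_dn, _ = marginal_effects(n_status=0.0, educ=E_grid, b1=0.2, b2=0.5, b3=0.05)
print(dy_dn)

# Finite-difference check of dy/dn at E = 8 (y is linear in n for fixed E)
h = 1e-3
fd = (y_hat(1.0 + h, 8.0) - y_hat(1.0, 8.0)) / h
print(fd)
```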