Chapter 3 Multiple linear regression.ppt

About This Presentation

This is Chapter Three of an introductory econometrics course.


Slide Content

Chapter Three
Multiple Linear Regression

Why more than one predictor variable?
– More than one variable influences the dependent variable.
– Predictors may themselves be correlated, but the interest is in the independent contribution of each variable to explaining the variation in the dependent variable.

Three fundamental aspects of linear regression
– Model selection: what is the most parsimonious set of predictors that explains the most variation in the dependent variable?
– Evaluation of assumptions: have we met the assumptions of the regression model?
– Model validation: validating the model results.

Multiple Linear Regression Model

Y_i = β_0 + β_1 X_1i + β_2 X_2i + ... + β_k X_ki + ε_i

β_0 – Intercept
β_1, ..., β_k – Partial regression slope coefficients
ε_i – Error term associated with the ith observation

This model gives the expected value of Y conditional on the fixed values of X_1, X_2, ..., X_k, plus error.

Matrix Representation
• For a sample of size n, the regression model is best described as a system of equations:

Y_1 = β_0 + β_1 X_11 + ... + β_k X_k1 + ε_1
Y_2 = β_0 + β_1 X_12 + ... + β_k X_k2 + ε_2
...
Y_n = β_0 + β_1 X_1n + ... + β_k X_kn + ε_n

• We can re-write these equations in matrix form as:

Y = [Y_1, Y_2, ..., Y_n]'
β = [β_0, β_1, ..., β_k]'
ε = [ε_1, ε_2, ..., ε_n]'
X = [[1, X_11, X_21, ..., X_k1],
     [1, X_12, X_22, ..., X_k2],
     ...,
     [1, X_1n, X_2n, ..., X_kn]]

Y = X β + ε
(n × 1) = (n × k)(k × 1) + (n × 1)
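A minimal numpy sketch of this matrix setup, using simulated data (the sample size, regressors, and coefficient values are illustrative only, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

n = 100                                      # sample size
X1 = rng.normal(size=n)                      # first regressor
X2 = rng.normal(size=n)                      # second regressor
X = np.column_stack([np.ones(n), X1, X2])    # n x k design matrix, first column of ones

beta = np.array([2.0, 0.5, -1.0])            # coefficient vector (beta_0, beta_1, beta_2)
eps = rng.normal(size=n)                     # error vector
Y = X @ beta + eps                           # Y = X beta + eps

print(X.shape, beta.shape, Y.shape)          # (100, 3) (3,) (100,)
```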

3.2. CLRM Assumptions
• Assumption 1: The expected value of the error vector is 0

E(ε) = E[ε_1, ε_2, ..., ε_n]' = [0, 0, ..., 0]'

3.2. CLRM Assumptions
• Assumption 2: There is no correlation between the ith and jth error terms
• This is called no autocorrelation

E(ε_i ε_j) = 0 for i ≠ j

3.2. CLRM Assumptions
• Assumption 3: The errors exhibit constant variance
• This is called homoscedasticity
• If the errors do not exhibit constant variance, we call it heteroscedasticity

E(ε ε') = σ² I

3.2. CLRM Assumptions
Assumption 4: The covariance between the X's and the error terms is 0
Usually satisfied if the predictor variables are fixed and non-stochastic
X is then called an exogenous variable
If a variable is not exogenous, it is called an endogenous variable

cov(X, ε) = 0

3.2. CLRM Assumptions
Assumption 5: The rank of the data matrix X is k, the number of columns
For this to happen, k < n, the number of observations
No exact linear relationships among the X variables
This is the assumption of no multicollinearity, also called an identification condition

rank(X) = k
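A quick numerical check of this rank condition, sketched with numpy on a small simulated design matrix (illustrative data only):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])

k = X.shape[1]
print("rank(X) =", np.linalg.matrix_rank(X), "k =", k)   # full column rank: identified

# An exact linear relationship (perfect multicollinearity) reduces the rank below k:
X_bad = np.column_stack([X, X[:, 1] + X[:, 2]])          # last column is a linear combination
print(np.linalg.matrix_rank(X_bad) < X_bad.shape[1])     # True
```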

3.2. CLRM Assumptions
• If these assumptions hold…
– Then the OLS estimators are in the class of unbiased linear estimators
– They are also minimum-variance estimators
– In this case we say that the OLS estimators are BLUE (Best Linear Unbiased Estimators)

3.2. CLRM Assumptions
• What does it mean to be BLUE?
– It allows us to compute a number of statistics.
– OLS estimation

3.2. CLRM Assumptions
Assumption 6: The error terms are normally distributed, ε_i ~ N(0, σ²)
– Not strictly necessary, but it eases statistical analysis.
Assumption 7: DGP for X
– X may be fixed or random, but it is generated by a mechanism that is not related to ε.

3.3. Least Squares Estimation
• Sample-based counterpart to the population regression model: Y = Xb + e
• LS requires choosing values of b such that the residual sum-of-squares (SSR) is as small as possible.


The solution for the "b's"
• It should be apparent how to solve for the unknown parameters.
• Pre-multiply the normal equations X'Xb = X'Y by the inverse of X'X:

(X'X)⁻¹ X'Xb = (X'X)⁻¹ X'Y  ⟹  b = (X'X)⁻¹ X'Y

• This is the fundamental outcome of OLS theory.
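A minimal numpy sketch of this estimator on simulated data (solving the normal equations is numerically preferable to forming the inverse explicitly; names and values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
beta_true = np.array([2.0, 0.5, -1.0])
Y = X @ beta_true + rng.normal(size=n)

# b = (X'X)^{-1} X'Y, computed as the solution of the normal equations X'X b = X'Y
b = np.linalg.solve(X.T @ X, X.T @ Y)
print(b)                                        # close to beta_true

# Equivalent and numerically more robust:
b_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(b_lstsq)
```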

Assessment of "Goodness-of-Fit"
Use the coefficient of determination (the R² statistic), given by:

R² = SSE / SST

the explained sum of squares as a share of the total sum of squares. It represents the proportion of variability in the response variable that is accounted for by the explanatory variables.

0 ≤ R² ≤ 1

A good fit of the model means that R² will be close to one.
A poor fit means that R² will be near 0.

R² – Coefficient of Determination

TSS = Y'Y − nȲ²          (total sum of squares)
ESS = β̂'X'Y − nȲ²        (explained sum of squares)
RSS = Y'Y − β̂'X'Y        (residual sum of squares)

R² = ESS / TSS = (β̂'X'Y − nȲ²) / (Y'Y − nȲ²)
   = 1 − RSS / TSS = 1 − (Y − Ŷ)'(Y − Ŷ) / (Y − Ȳ)'(Y − Ȳ)

Adjusted R² = 1 − (1 − R²)(n − 1) / (n − k)

Critique of R² in Multiple Regression
• R² is inflated by increasing the number of explanatory variables in the model
• One should also analyze the residual values from the model (MSR)
• Alternatively, use the adjusted R²

Adjusted R²

R̄² = 1 − [(Y − Ŷ)'(Y − Ŷ) / (n − k)] / [(Y − Ȳ)'(Y − Ȳ) / (n − 1)] = 1 − MSR / MST

For k ≥ 1, R̄² ≤ R².
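A short numpy sketch of R² and adjusted R² computed from these sums of squares (simulated data, illustrative names):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 100, 3                                   # k columns of X, including the constant
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
Y = X @ np.array([2.0, 0.5, -1.0]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ Y)
Y_hat = X @ b

tss = np.sum((Y - Y.mean()) ** 2)               # total sum of squares
rss = np.sum((Y - Y_hat) ** 2)                  # residual sum of squares
ess = tss - rss                                 # explained sum of squares

r2 = ess / tss                                  # = 1 - rss / tss
r2_adj = 1 - (1 - r2) * (n - 1) / (n - k)
print(r2, r2_adj)
```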

How does adjusted R-square work?
• The total sum-of-squares is fixed, since it is independent of the number of explanatory variables
• The numerator, SSR, decreases as the number of variables increases
• So R² is artificially inflated by adding explanatory variables to the model
• Use adjusted R² to compare different regressions
• Adjusted R² takes into account the number of predictors in the model

3.4. Statistical Inference
• Inference can be made using:
– 1) hypothesis testing
– 2) interval estimation
• To make inference we will need to impose distributional assumptions on the error terms
• It turns out that the probability distribution of the OLS estimators depends on the probability distribution of the error terms, ε.

ANOVA Approach
• Decomposition of the total sum-of-squares into components relating to
– explained variance (regression)
– unexplained variance (error)

ANOVA Table

Source of Variation | Sums-of-Squares | df    | Mean Square              | F-ratio
Regression          | b'X'Y − nȲ²     | k − 1 | (b'X'Y − nȲ²) / (k − 1)  | MSR / MSE
Residual            | Y'Y − b'X'Y     | n − k | (Y'Y − b'X'Y) / (n − k)  |
Total               | Y'Y − nȲ²       | n − 1 |                          |

Test of Multiple Restrictions
• Tests the null hypothesis H_0: β_1 = β_2 = ... = β_k = 0
• The null hypothesis is known as a joint or simultaneous hypothesis, because it compares the values of all β_i simultaneously
• This tests the overall significance of the regression model

• Hypothesis testing
– Significance of individual regression coefficients can be tested using the t-statistic
– Overall significance of the SRF can be tested as:

H_0: β_1 = β_2 = ... = β_k = 0
H_1: at least one β_i ≠ 0

Test statistic: F = MSR / MSE = (ESS / df) / (RSS / df) = [ESS / (k − 1)] / [RSS / (n − k)]

The F-test statistic and R² vary directly

F = [(b'X'Y − nȲ²) / (k − 1)] / [(Y'Y − b'X'Y) / (n − k)]
  = [ESS / (k − 1)] / [RSS / (n − k)]
  = [ESS / (k − 1)] / [(TSS − ESS) / (n − k)]
  = [R² / (k − 1)] / [(1 − R²) / (n − k)]        since R² = ESS / TSS
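A numpy sketch verifying that the sums-of-squares form and the R² form of the F statistic agree (simulated data; scipy is assumed available only for the p-value):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, k = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
Y = X @ np.array([2.0, 0.5, -1.0]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ Y)
resid = Y - X @ b
tss = np.sum((Y - Y.mean()) ** 2)
rss = np.sum(resid ** 2)
ess = tss - rss
r2 = ess / tss

F_from_ss = (ess / (k - 1)) / (rss / (n - k))          # sums-of-squares form
F_from_r2 = (r2 / (k - 1)) / ((1 - r2) / (n - k))      # R-squared form
p_value = stats.f.sf(F_from_ss, k - 1, n - k)          # upper-tail p-value
print(F_from_ss, F_from_r2, p_value)                   # the two F values coincide
```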

Test statistic

t = b_i / (s √c_ii)

where c_ii is the element of the ith row and ith column of (X'X)⁻¹, and s √c_ii is the estimated standard error of b_i.
• Follows a t distribution with n − k df.
• The 100(1 − α)% confidence interval is obtained from:

b_i ± t_{α/2}(n − k) · s √c_ii
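A numpy/scipy sketch of these standard errors, t-statistics, and confidence intervals (simulated data; the 95% level, α = 0.05, is an illustrative choice):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, k = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
Y = X @ np.array([2.0, 0.5, -1.0]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ Y
resid = Y - X @ b
s2 = resid @ resid / (n - k)                 # estimate of the error variance
se = np.sqrt(s2 * np.diag(XtX_inv))          # standard errors s * sqrt(c_ii)

t_stats = b / se                             # t = b_i / (s sqrt(c_ii))
t_crit = stats.t.ppf(0.975, df=n - k)        # t_{alpha/2}(n - k) for alpha = 0.05
ci = np.column_stack([b - t_crit * se, b + t_crit * se])
print(t_stats)
print(ci)                                    # 95% confidence intervals
```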

• Equality of regression coefficients

H_0: β_3 = β_4
H_1: β_3 ≠ β_4
Test statistic: t = (β̂_3 − β̂_4) / Se(β̂_3 − β̂_4)

• Test of restrictions

y_i = β_1 x_2i^(β_2) x_3i^(β_3) e^(ε_i)   ⟹   ln y_i = ln β_1 + β_2 ln x_2i + β_3 ln x_3i + ε_i
H_0: β_2 + β_3 = 1
H_1: β_2 + β_3 ≠ 1
Test statistic: t = (β̂_2 + β̂_3 − 1) / Se(β̂_2 + β̂_3)
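A sketch of the β_2 + β_3 = 1 restriction test on simulated log-log data, using Var(β̂_2 + β̂_3) = Var(β̂_2) + Var(β̂_3) + 2 Cov(β̂_2, β̂_3) from the estimated covariance matrix (scipy is assumed only for the p-value):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 200
lnx2 = rng.normal(size=n)
lnx3 = rng.normal(size=n)
X = np.column_stack([np.ones(n), lnx2, lnx3])
lny = X @ np.array([1.0, 0.6, 0.4]) + rng.normal(scale=0.1, size=n)   # true b2 + b3 = 1

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ lny
resid = lny - X @ b
s2 = resid @ resid / (n - X.shape[1])
cov_b = s2 * XtX_inv                                  # estimated covariance of the estimates

# t-test of H0: b2 + b3 = 1
est = b[1] + b[2] - 1.0
se = np.sqrt(cov_b[1, 1] + cov_b[2, 2] + 2 * cov_b[1, 2])
t = est / se
p = 2 * stats.t.sf(abs(t), df=n - X.shape[1])
print(t, p)                                           # fail to reject when the restriction holds
```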

The simplest case of MLR, the two-explanatory-variable regression model, is given as:

Y_i = β_0 + β_1 X_1i + β_2 X_2i + ε_i

Questions
1. Fit the regression model (estimate the parameters and express the estimated equation).
2. Find the estimator of the population error variance.
3. Compute and interpret the coefficient of determination.
4. Test the adequacy of the model.
5. Does food price significantly affect per capita food consumption? Why?
6. Does per capita income significantly affect food consumption? Why?
7. Interpret the results.
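A statsmodels sketch of how questions like these could be answered, using placeholder arrays in place of the example's data (food_price, income, and consumption are hypothetical stand-ins, not the slide data):

```python
import numpy as np
import statsmodels.api as sm

# Placeholder data; replace with the per capita food consumption, food price,
# and per capita income series from the example.
rng = np.random.default_rng(0)
food_price = rng.uniform(1, 10, size=30)
income = rng.uniform(100, 500, size=30)
consumption = 50 - 2 * food_price + 0.1 * income + rng.normal(size=30)

X = sm.add_constant(np.column_stack([food_price, income]))
res = sm.OLS(consumption, X).fit()

print(res.params)                  # Q1: estimated coefficients
print(res.mse_resid)               # Q2: estimate of the error variance
print(res.rsquared)                # Q3: coefficient of determination
print(res.fvalue, res.f_pvalue)    # Q4: overall adequacy (F test)
print(res.tvalues, res.pvalues)    # Q5, Q6: individual significance
```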


Model Specification and Interpretation
• Linearity
– Suppose we are modeling returns to schooling: W_i = β_1 + β_2 E_i + ε_i
– where W_i represents earnings and E_i represents years of schooling
– The coefficient of E_i is interpreted as: an increase in E by one unit will result in an increase in W by β_2
– This means that the increase in W_i is constant, say for going from 5th to 6th grade and from 11th to 12th grade, which is counterintuitive
– It is more intuitive if returns change in constant percentage rather than constant absolute terms

Model Specification and Interpretation
• Let ln W_i = β_1 + β_2 E_i + ε_i
• Then d ln W / dE = (dW / W) / dE = β_2, so a one-unit change in E will result in a (100 · β_2)% change in wage
• The semi-log model implies a non-constant increase in wage

Model Specification and Interpretation
• Given q_x = a p_x^(β) e^(ε_i), taking logs gives ln q_x = ln a + β ln p_x + ε_i
– Demand for x is a non-constant decreasing function of price
– β = (∂q_x / q_x) / (∂p_x / p_x), which implies that a 1% increase in price will decrease quantity demanded by β%

Model Specification and Interpretation
• Dummy independent variables

y_i = β_0 + β_1 U_i + ε_i,   where U_i = 1 if urban and 0 if rural
E(y_i | U_i = 0) = β_0
E(y_i | U_i = 1) = β_0 + β_1

Model Specification and Interpretation
• Dummy independent variables

Let R_i = region, R_i ∈ {1, 2, 3}
d_2i = 1 if region 2, 0 otherwise
d_3i = 1 if region 3, 0 otherwise

y_i = β_1 + β_2 d_2i + β_3 d_3i + ε_i
E(y_i | region 1) = β_1
E(y_i | region 2) = β_1 + β_2
E(y_i | region 3) = β_1 + β_3
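A small numpy sketch of this region-dummy coding (illustrative data; region 1 is the omitted base category):

```python
import numpy as np

rng = np.random.default_rng(0)
region = rng.integers(1, 4, size=90)          # regions coded 1, 2, 3

d2 = (region == 2).astype(float)              # 1 if region 2, else 0
d3 = (region == 3).astype(float)              # 1 if region 3, else 0
X = np.column_stack([np.ones(region.size), d2, d3])

y = 10 + 2 * d2 - 1 * d3 + rng.normal(size=region.size)
b = np.linalg.solve(X.T @ X, X.T @ y)

# b[0] estimates E(y | region 1); b[1], b[2] are the region 2 and 3 differentials.
print(b)
```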

Model Specification and Interpretation
• Models with interaction terms, say urban and male

u_i = 1 if urban, 0 if rural;   m_i = 1 if male, 0 if female
y_i = β_0 + β_1 u_i + β_2 m_i + β_3 u_i m_i + ε_i

E(y_i | u = 0, m = 0) = β_0
E(y_i | u = 1, m = 0) = β_0 + β_1
E(y_i | u = 0, m = 1) = β_0 + β_2
E(y_i | u = 1, m = 1) = β_0 + β_1 + β_2 + β_3

e.g., the urban effect on females: E(y_i | u = 1, m = 0) − E(y_i | u = 0, m = 0) = β_1

Model Specification and Interpretation
• Models with interaction between a dummy and a continuous variable

u_i = 1 if urban, 0 if rural;   E_i = years of education
y_i = β_0 + β_1 u_i + β_2 E_i + β_3 u_i E_i + ε_i

∂y_i / ∂E_i = β_2 if u_i = 0 (rural)
∂y_i / ∂E_i = β_2 + β_3 if u_i = 1 (urban)
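A numpy sketch of the dummy-by-continuous interaction (illustrative data; the urban slope is the sum of the education and interaction coefficients):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 120
urban = rng.integers(0, 2, size=n).astype(float)   # u_i: 1 urban, 0 rural
educ = rng.uniform(0, 16, size=n)                  # E_i: years of education

X = np.column_stack([np.ones(n), urban, educ, urban * educ])
y = 5 + 1.0 * urban + 0.4 * educ + 0.3 * urban * educ + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)
rural_slope = b[2]            # dy/dE for rural observations
urban_slope = b[2] + b[3]     # dy/dE for urban observations
print(rural_slope, urban_slope)
```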

Model Specification and Interpretation
• Models with interaction between two continuous variables

n_i = nutrition status;   E_i = years of education
y_i = β_0 + β_1 n_i + β_2 E_i + β_3 n_i E_i + ε_i

∂y_i / ∂n_i = β_1 + β_3 E_i
∂y_i / ∂E_i = β_2 + β_3 n_i

• In such a case, graphing the finding (e.g., β_2 + β_3 n_i against n_i) will give a better analysis

End of chapter three