INTRODUCTORY ECONOMETRICS LECTURE NOTES


Economics course reading material


INTRODUCTION TO ECONOMETRICS (ECON. 352)
HASSEN A. (M.Sc.), JIMMA UNIVERSITY, 2008/09

CHAPTER ONE
INTRODUCTION
1.1 The Econometric Approach
1.2 Models, Economic Models & Econometric Models
1.3 Types of Data for Econometric Analysis

1.1 The Econometric Approach

WHAT IS ECONOMETRICS?
- Econometrics means "economic measurement".
- In simple terms, econometrics deals with the application of statistical methods to economics.
- More fully, it is the application of mathematical & statistical techniques to data in order to collect evidence on questions of interest to economics.
- Unlike economic statistics, which mainly collects & summarizes statistical data, econometrics combines economic theory, mathematical economics, economic statistics & mathematical statistics:
  - economic theory: provides the theory, or imposes a logical structure on the form of the question (e.g., when price goes up, quantity demanded goes down);
  - mathematical economics: expresses economic theory using math (mathematical form);
  - economic statistics: data presentation & description;
  - mathematical statistics: estimation & testing techniques.

Goals/uses of econometrics:
- Estimation/measurement of economic parameters or relationships, which may be needed for policy- or decision-making;
- Testing (& possibly refining) economic theory;
- Forecasting/prediction of future values of economic magnitudes; &
- Evaluation of policies/programs.

1.2 Models, Economic Models & Econometric Models

- Model: a simplified representation of real-world phenomena.
- An econometric model combines the economic model with assumptions about the random nature of the data.
  [Diagram: MODEL -> ECONOMIC MODEL -> ECONOMETRIC MODEL]

Steps in an econometric study:
1. Economic theory or model
2. Econometric model: a statement of the economic theory in an empirically testable form
3. Data
4. Some a priori information
5. Estimation of the model
6. Tests of any hypothesis suggested by the economic model
7. Interpreting results & using the model for prediction & policy

1. Statement of theory or hypothesis. e.g. Theory: people increase consumption as income increases, but not by as much as the increase in their income.

2. Specification of the mathematical model:
   C = α + βY;  0 < β < 1,
   where C = consumption, Y = income, β = slope = MPC = ΔC/ΔY, α = intercept.

3. Specification of the econometric (statistical) model:
   C = α + βY + ɛ;  0 < β < 1,
   where α = intercept = autonomous consumption and ɛ = error/stochastic/disturbance term. The error term captures several factors:
   - omitted variables,
   - measurement error in the dependent variable and/or wrong functional form,
   - randomness of human behavior.

4. Obtain data.

5. Estimate the parameters of the model. How? 3 methods! Suppose the estimated model is
   Ĉ_i = 184.08 + 0.8·Y_i.

6. Hypothesis testing: is 0.8 statistically < 1?

7. Interpret the results & use the model for policy or forecasting:
   - A 1 Br. increase in income induces an 80 cent rise in consumption, on average.
   - If Y = 0, then average C = 184.08.
   - Predict the level of C for a given Y.
   - Pick the value of the control variable (Y) to get a desired value of the target variable (C), …

1.3 Types of Data for Econometric Analysis

- Time series data: a set of observations on the values that a variable takes at different times, e.g. money supply, unemployment rate, … over years.
- Cross-sectional data: data on one or more variables collected at the same point in time.
- Pooled data: cross-sectional observations collected over time, but the units don't have to be the same.
- Longitudinal/panel data: a special type of pooled data in which the same cross-sectional unit (say, a family or a firm) is surveyed over time.

CHAPTER TWO
SIMPLE LINEAR REGRESSION
2.1 The Concept of Regression Analysis
2.2 The Simple Linear Regression Model
2.3 The Method of Least Squares
2.4 Properties of Least-Squares Estimators and the Gauss-Markov Theorem
2.5 Residuals and Goodness of Fit
2.6 Confidence Intervals and Hypothesis Testing in Regression Analysis
2.7 Prediction with the Simple Linear Regression

2.1 The Concept of Regression Analysis

- Origin of the word regression!
- Our objective in regression analysis is to find out how the average value of the dependent variable (or the regressand) varies with the given values of the explanatory variable (or the regressor/s).
- Compare regression & correlation! (dependence vs. association).
- The key concept underlying regression analysis is the conditional expectation function (CEF), or population regression function (PRF):
  E[Y|X_i] = f(X_i).
- For empirical purposes, it is the stochastic PRF that matters:
  Y_i = E[Y|X_i] + ɛ_i.
- The stochastic disturbance term ɛ_i plays a critical role in estimating the PRF.
- The PRF is an idealized concept, since in practice one rarely has access to the entire population. Usually, one has just a sample of observations.
- Hence, we use the stochastic sample regression function (SRF) to estimate the PRF, i.e., we use Y_i = Ŷ_i + e_i, with Ŷ_i = f(X_i), to estimate Y_i = E[Y|X_i] + ɛ_i, with E[Y|X_i] = f(X_i).

2.2 The Simple Linear Regression Model

- We assume linear PRFs, i.e., regressions that are linear in the parameters (α and β). They may or may not be linear in the variables (Y or X).
- Simple because we have only one regressor (X).
- Accordingly, we use the SRF
  Ŷ_i = α̂ + β̂X_i
  to estimate the PRF
  E[Y|X_i] = α + βX_i   (equivalently, Y_i = α + βX_i + ɛ_i),
  where α̂ and β̂ are sample estimates of α and β, respectively, and e_i estimates ɛ_i.
- Using the theoretical relationship between X and Y, Y_i can be decomposed into its non-stochastic component α + βX_i and its random component ɛ_i.
- This is a theoretical decomposition because we do not know the values of α and β, or the values of ɛ.
- An operational decomposition of Y (used for practical purposes) is with reference to the fitted line Ŷ_i = α̂ + β̂X_i: the actual value of Y equals the fitted value plus the residual e_i.
- The residuals e_i serve a similar purpose as the stochastic term ɛ_i, but the two are not identical.
- From the PRF: Y_i = E[Y|X_i] + ɛ_i, so ɛ_i = Y_i − E[Y|X_i]; but E[Y|X_i] = α + βX_i, hence ɛ_i = Y_i − α − βX_i.
- From the SRF: Y_i = Ŷ_i + e_i, so e_i = Y_i − Ŷ_i; but Ŷ_i = α̂ + β̂X_i, hence e_i = Y_i − α̂ − β̂X_i.

[Figure: the population regression line E[Y|X_i] = α + βX_i plotted against X, with observed points P_1, …, P_4 at X_1, …, X_4 and their disturbances ɛ_1, …, ɛ_4 measured vertically from the line.]

[Figure: the PRF Y_i = α + βX_i and the SRF Ŷ_i = α̂ + β̂X_i drawn together. Each observation has a disturbance ɛ_i (vertical distance to the PRF) and a residual e_i (vertical distance to the SRF); ɛ_i and e_i are not identical: in the figure ɛ_1 < e_1, ɛ_2 = e_2, ɛ_3 < e_3 and ɛ_4 > e_4.]

2.3 The Method of Least Squares

- Remember that our sample is only one of a large number of possibilities.
- Implication: the SRF line in the figure above is just one of many possible such lines, and each SRF line has unique values of α̂ and β̂.
- Then, which of these lines should we choose? Generally, we will look for the SRF which is very close to the PRF.
- But how can we devise a rule that makes the SRF as close as possible to the PRF? Equivalently, how can we choose the best technique to estimate the parameters of interest (α and β)?
- Generally speaking, there are 3 methods of estimation:
  - method of least squares,
  - method of moments, and
  - maximum likelihood estimation.
- The most common method for fitting a regression line is the method of least squares. We will use the least-squares estimator, specifically Ordinary Least Squares (OLS), in Chapters 2 and 3. What does OLS do?
- A line gives a good fit to a set of data if the points (actual observations) are close to it. That is, the predicted values obtained by using the line should be close to the values that were actually observed; in other words, the residuals should be small. Therefore, when assessing the fit of a line, the vertical distances of the points to the line are the only distances that matter, because errors are measured as vertical distances.
- The OLS method calculates the best-fitting line for the observed data by minimizing the sum of the squares of the vertical deviations from each data point to the line (the RSS).

- Minimize RSS = Σ e_i² (i = 1, …, n).
- We could think of minimizing RSS by successively choosing pairs of values for α̂ and β̂ until RSS is made as small as possible.
- But we will use differential calculus (which turns out to be a lot easier).
- Why the squares of the residuals? Why not just minimize the sum of the residuals? To prevent negative residuals from cancelling positive ones. Because the deviations are first squared and then summed, there are no cancellations between positive and negative values.
- If we used Σ e_i, all the error terms e_i would receive equal importance no matter how close or how widely scattered the individual observations are from the SRF.
- A consequence of this is that it is quite possible that the algebraic sum of the e_i is small (even zero) although the e_i's are widely scattered about the SRF.
- Besides, the OLS estimates possess desirable properties of estimators under some assumptions.
- OLS technique: minimize with respect to α̂ and β̂
  Σ e_i² = Σ (Y_i − Ŷ_i)² = Σ (Y_i − α̂ − β̂X_i)².
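The idea of "successively choosing pairs of values until RSS is as small as possible" can be checked numerically. The following is a minimal sketch (not part of the original notes), assuming Python with NumPy and SciPy are available; it minimizes the RSS for the advertising-sales data used later in this chapter and recovers the same α̂ and β̂ as the calculus-based formulas derived below.

```python
import numpy as np
from scipy.optimize import minimize

# Advertising (X, hundreds of Birr) and sales (Y, thousands of Birr) from the
# numerical example later in this chapter (firms 1-10).
X = np.array([10, 7, 10, 5, 8, 8, 6, 7, 9, 10], dtype=float)
Y = np.array([11, 10, 12, 6, 10, 7, 9, 10, 11, 10], dtype=float)

def rss(params):
    a, b = params
    return np.sum((Y - a - b * X) ** 2)       # residual sum of squares

# Numerical minimization ("try pairs of values until RSS is smallest")
res = minimize(rss, x0=[0.0, 0.0])
a_num, b_num = res.x

# Closed-form OLS estimates (derived via calculus below)
b_ols = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
a_ols = Y.mean() - b_ols * X.mean()

print(a_num, b_num)   # ≈ 3.6, 0.75
print(a_ols, b_ols)   # 3.6, 0.75
```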

F.O.C. (1): differentiate RSS with respect to α̂ and set to zero:
∂(Σe_i²)/∂α̂ = ∂[Σ(Y_i − α̂ − β̂X_i)²]/∂α̂ = 0
⇒ 2·Σ(Y_i − α̂ − β̂X_i)(−1) = 0
⇒ Σ(Y_i − α̂ − β̂X_i) = 0
⇒ ΣY_i − nα̂ − β̂ΣX_i = 0
⇒ α̂ = Ȳ − β̂X̄.

F.O.C. (2): differentiate RSS with respect to β̂ and set to zero:
∂(Σe_i²)/∂β̂ = ∂[Σ(Y_i − α̂ − β̂X_i)²]/∂β̂ = 0
⇒ 2·Σ(Y_i − α̂ − β̂X_i)(−X_i) = 0
⇒ Σ(Y_i − α̂ − β̂X_i)X_i = 0
⇒ ΣX_iY_i − α̂ΣX_i − β̂ΣX_i² = 0
⇒ ΣX_iY_i = α̂ΣX_i + β̂ΣX_i².

Solve the two normal equations simultaneously:
(1) α̂ = Ȳ − β̂X̄
(2) ΣX_iY_i = α̂ΣX_i + β̂ΣX_i²
Substituting (1) into (2):
ΣX_iY_i = (Ȳ − β̂X̄)ΣX_i + β̂ΣX_i²
⇒ ΣX_iY_i = ȲΣX_i − β̂X̄ΣX_i + β̂ΣX_i²
⇒ ΣX_iY_i − ȲΣX_i = β̂(ΣX_i² − X̄ΣX_i)
⇒ ΣX_iY_i − nX̄Ȳ = β̂(ΣX_i² − nX̄²)   [since X̄ = ΣX_i/n, i.e. ΣX_i = nX̄]

Thus,
1. β̂ = (ΣX_iY_i − nX̄Ȳ) / (ΣX_i² − nX̄²).
Alternative expressions for β̂:
2. β̂ = Σ(X_i − X̄)(Y_i − Ȳ) / Σ(X_i − X̄)²
3. β̂ = cov(X, Y) / var(X)
4. β̂ = [nΣX_iY_i − (ΣX_i)(ΣY_i)] / [nΣX_i² − (ΣX_i)²]
To easily recall the formula:
β̂ = Σx_iy_i / Σx_i²,  where x_i = X_i − X̄ and y_i = Y_i − Ȳ.

For α̂ just use:
α̂ = Ȳ − β̂X̄.
Or, if you wish, substitute the formula for β̂:
α̂ = Ȳ − X̄·[(ΣX_iY_i − nX̄Ȳ) / (ΣX_i² − nX̄²)]
  = [Ȳ(ΣX_i² − nX̄²) − X̄(ΣX_iY_i − nX̄Ȳ)] / (ΣX_i² − nX̄²)
  = (ȲΣX_i² − X̄ΣX_iY_i) / (ΣX_i² − nX̄²)
  = [(ΣY_i)(ΣX_i²) − (ΣX_i)(ΣX_iY_i)] / [nΣX_i² − (ΣX_i)²].

Previously, we came across the following two normal equations:
1. Σ(Y_i − α̂ − β̂X_i) = 0, which is equivalent to Σe_i = 0;
2. Σ(Y_i − α̂ − β̂X_i)X_i = 0, equivalently Σe_iX_i = 0.
Note also the following property: since Y_i = Ŷ_i + e_i,
ΣY_i = ΣŶ_i + Σe_i, and because Σe_i = 0, the mean of the fitted values Ŷ_i equals Ȳ.
The facts that Y and Ŷ have the same average and that this average value is achieved at the average value of X (since α̂ = Ȳ − β̂X̄, i.e. Ȳ = α̂ + β̂X̄) together imply that the sample regression line Ŷ_i = α̂ + β̂X_i passes through the sample mean/average values of X and Y, the point (X̄, Ȳ).

Assumptions Underlying the Method of Least Squares
- To obtain the estimates of α and β, assuming that our model is correctly specified and that the systematic and stochastic components in the equation are independent suffices.
- But the objective in regression analysis is not only to obtain α̂ and β̂ but also to draw inferences about the true α and β. For example, we would like to know how close α̂ and β̂ are to α and β, or how close Ŷ_i is to E[Y|X_i].
- To that end, we must not only specify the functional form of the model, but also make certain assumptions about the manner in which the Y_i are generated.
- The PRF Y_i = α + βX_i + ɛ_i shows that Y_i depends on both X_i and ɛ_i.
- Therefore, unless we are specific about how X_i and ɛ_i are created or generated, there is no way we can make any statistical inference about the Y_i, and also about α and β.
- Thus, the assumptions made about the X variable and the error term are extremely critical to the valid interpretation of the regression estimates.

THE ASSUMPTIONS:
1. Zero mean value of the disturbance ɛ_i: E(ɛ_i|X_i) = 0. Or equivalently, E[Y_i|X_i] = α + βX_i.
2. Homoscedasticity, or equal variance of ɛ_i. Given the value of X, the variance of ɛ_i is the same (a finite positive constant σ²) for all observations. That is,
   var(ɛ_i|X_i) = E[ɛ_i − E(ɛ_i|X_i)]² = E(ɛ_i²) = σ².
   By implication, var(Y_i|X_i) = σ²:
   var(Y_i|X_i) = E{α + βX_i + ɛ_i − (α + βX_i)}² = E(ɛ_i²) = σ²  for all i.
3. No autocorrelation between the disturbance terms. Each random error term ɛ_i has zero covariance with, or is uncorrelated with, each and every other random error term ɛ_s (for s ≠ i):
   cov(ɛ_i, ɛ_s|X_i, X_s) = E{[ɛ_i − E(ɛ_i)]|X_i}·{[ɛ_s − E(ɛ_s)]|X_s} = E(ɛ_i|X_i)(ɛ_s|X_s) = 0.
   Equivalently, cov(Y_i, Y_s|X_i, X_s) = 0 (for all s ≠ i).
4. The disturbance ɛ and the explanatory variable X are uncorrelated: cov(ɛ_i, X_i) = 0.
   cov(ɛ_i, X_i) = E[ɛ_i − E(ɛ_i)][X_i − E(X_i)] = E[ɛ_i(X_i − E(X_i))] = E(ɛ_iX_i) − E(X_i)E(ɛ_i) = E(ɛ_iX_i) = 0.
5. The error terms are normally and independently distributed, i.e., ɛ_i ~ NID(0, σ²).
   Assumptions 1 to 3 together imply that ɛ_i ~ IID(0, σ²). The normality assumption enables us to derive the sampling distributions of the OLS estimators (α̂ and β̂). This simplifies the task of establishing confidence intervals and testing hypotheses.
6. X is assumed to be non-stochastic, and must take at least two different values.
7. The number of observations n must be greater than the number of parameters to be estimated: n > 2 in this case.

Numerical Example: explaining sales = f(advertising). Sales are in thousands of Birr & advertising expenses are in hundreds of Birr.

Firm (i):                   1    2    3    4    5    6    7    8    9   10
Advertising Expense (X_i): 10    7   10    5    8    8    6    7    9   10
Sales (Y_i):               11   10   12    6   10    7    9   10   11   10

Computations (x_i = X_i − X̄, y_i = Y_i − Ȳ):

i        1      2      3      4     5     6      7      8     9     10      Ʃ
X_i     10      7     10      5     8     8      6      7     9     10     80
Y_i     11     10     12      6    10     7      9     10    11     10     96
x_i      2     −1      2     −3     0     0     −2     −1     1      2      0
y_i    1.4    0.4    2.4   −3.6   0.4  −2.6   −0.6    0.4   1.4    0.4      0
x_iy_i 2.8   −0.4    4.8   10.8     0     0    1.2   −0.4   1.4    0.8     21
x_i²     4      1      4      9     0     0      4      1     1      4     28
y_i²  1.96   0.16   5.76  12.96  0.16  6.76   0.36   0.16  1.96   0.16   30.4

X̄ = ΣX_i/n = 80/10 = 8;   Ȳ = ΣY_i/n = 96/10 = 9.6.

β̂ = Σx_iy_i / Σx_i² = 21/28 = 0.75
α̂ = Ȳ − β̂X̄ = 9.6 − 0.75(8) = 3.6

Fitted values Ŷ_i = 3.6 + 0.75X_i and residuals e_i = Y_i − Ŷ_i:

i        1       2      3       4     5      6     7       8       9     10      Ʃ
Ŷ_i   11.10    8.85  11.10    7.35  9.60   9.60  8.10    8.85   10.35  11.10     96
e_i   −0.10    1.15   0.90   −1.35  0.40  −2.60  0.90    1.15    0.65  −1.10      0
e_i²   0.01  1.3225   0.81  1.8225  0.16   6.76  0.81  1.3225  0.4225   1.21  14.65

Σe_i² = 14.65,  Σŷ_i² = 15.75,  Σy_i² = 30.4,  and  Σx_i = Σy_i = Σe_i = 0.
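As a cross-check (not in the original notes), the whole computation above can be reproduced in a few lines of Python/NumPy; the variable names are illustrative only.

```python
import numpy as np

# Advertising expense X (hundreds of Birr) and sales Y (thousands of Birr), firms 1-10
X = np.array([10, 7, 10, 5, 8, 8, 6, 7, 9, 10], dtype=float)
Y = np.array([11, 10, 12, 6, 10, 7, 9, 10, 11, 10], dtype=float)

x, y = X - X.mean(), Y - Y.mean()            # deviations from the means
beta_hat = (x * y).sum() / (x ** 2).sum()    # Σxy / Σx² = 21/28 = 0.75
alpha_hat = Y.mean() - beta_hat * X.mean()   # 9.6 - 0.75*8 = 3.6

Y_fit = alpha_hat + beta_hat * X             # fitted values
e = Y - Y_fit                                # residuals
print(beta_hat, alpha_hat)                   # 0.75 3.6
print(round(e.sum(), 10), round((e ** 2).sum(), 4))  # 0.0 14.65
```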

2.4 Properties of OLS Estimators and the Gauss-Markov Theorem

- Given the assumptions of the classical linear regression model, the least-squares estimators possess some ideal or optimum properties.
- These statistical properties are extremely important because they provide criteria for choosing among alternative estimators.
- These properties are contained in the well-known Gauss-Markov Theorem.
- Gauss-Markov Theorem: under the above assumptions of the linear regression model, the estimators α̂ and β̂ have the smallest variance of all linear and unbiased estimators of α and β. That is, the OLS estimators are the Best Linear Unbiased Estimators (BLUE) of α and β.
- The Gauss-Markov Theorem does not depend on the assumption of normality (of the error terms).
- Let us prove that β̂ is the BLUE of β!

Linearity of β̂ (in the stochastic variable Y_i, or ɛ_i):
β̂ = Σx_iy_i / Σx_i² = Σx_i(Y_i − Ȳ)/Σx_i² = Σx_iY_i/Σx_i² − Ȳ·Σx_i/Σx_i²
  = Σx_iY_i / Σx_i²   (since Σx_i = 0)
⇒ β̂ = Σk_iY_i,  where k_i = x_i/Σx_i²
⇒ β̂ = k_1Y_1 + k_2Y_2 + … + k_nY_n.

Note the following properties of the k_i:
(1) Σx_i² is a constant.
(2) Because x_i is non-stochastic, k_i is also non-stochastic.
(3) Σk_i = Σ(x_i/Σx_i²) = (Σx_i)/Σx_i² = 0.
(4) Σk_ix_i = Σx_i²/Σx_i² = 1.
(5) Σk_i² = Σ[x_i/Σx_i²]² = Σx_i²/(Σx_i²)² = 1/Σx_i².
(6) Σk_iX_i = Σx_i(x_i + X̄)/Σx_i² = (Σx_i² + X̄Σx_i)/Σx_i² = 1.

Unbiasedness of β̂:
β̂ = Σk_iY_i = Σk_i(α + βX_i + ɛ_i)
  = αΣk_i + βΣk_iX_i + Σk_iɛ_i
  = β + Σk_iɛ_i   [because Σk_i = 0 and Σk_iX_i = 1]
E(β̂) = E(β + k_1ɛ_1 + k_2ɛ_2 + … + k_nɛ_n) = β + Σk_i·E(ɛ_i) = β + (Σk_i)(0)
⇒ E(β̂) = β.

Efficiency: suppose β̃ is another unbiased linear estimator of β. Then var(β̂) ≤ var(β̃).
Proof:
var(β̂) = var(Σk_iY_i) = var(k_1Y_1 + k_2Y_2 + … + k_nY_n)
        = var(k_1Y_1) + var(k_2Y_2) + … + var(k_nY_n)   {since cov(Y_i, Y_s) = 0 for all i ≠ s}
        = k_1²var(Y_1) + k_2²var(Y_2) + … + k_n²var(Y_n)
        = σ²Σk_i² = σ²/Σx_i².
Now suppose β̃ = Σw_iY_i, where the w_i are coefficients. Then
β̃ = Σw_i(α + βX_i + ɛ_i), so
E(β̃) = α(Σw_i) + β(Σw_iX_i) + Σw_iE(ɛ_i) = α(Σw_i) + β(Σw_iX_i).
For β̃ to be an unbiased estimator of β, we need Σw_i = 0 and Σw_iX_i = 1.

var(β̃) = var(Σw_iY_i) = var(w_1Y_1 + w_2Y_2 + … + w_nY_n)
        = w_1²var(Y_1) + w_2²var(Y_2) + … + w_n²var(Y_n)   {since cov(Y_i, Y_s) = 0 for i ≠ s}
        = σ²Σw_i².
Let us now compare var(β̂) and var(β̃)! Suppose w_i ≠ k_i, and let the relationship between them be given by d_i = w_i − k_i. Then
Σw_i² = Σ(k_i + d_i)² = Σk_i² + Σd_i² + 2Σk_id_i,  with Σk_id_i = Σ(x_i/Σx_i²)d_i.
Because both Σw_i and Σk_i equal zero: Σd_i = Σw_i − Σk_i = 0.
Because both Σw_ix_i and Σk_ix_i equal one: Σd_ix_i = Σw_ix_i − Σk_ix_i = 1 − 1 = 0.

⇒ Σw_i² = Σk_i² + Σd_i² + 2(1/Σx_i²)(Σd_ix_i) = Σk_i² + Σd_i² + 2(1/Σx_i²)(0) = Σk_i² + Σd_i²
⇒ Σw_i² > Σk_i²   (given w_i ≠ k_i, not all d_i are zero, and thus Σd_i² > 0)
⇒ σ²Σw_i² > σ²Σk_i²
⇒ var(β̃) > var(β̂).
[var(β̃) = var(β̂) if and only if all the d_i are zero, i.e., Σd_i² = 0.]

Linearity of α̂:
α̂ = Ȳ − β̂X̄ = Ȳ − X̄{Σk_iY_i}
  = (1/n)ΣY_i − X̄{k_1Y_1 + k_2Y_2 + … + k_nY_n}
  = (1/n − X̄k_1)Y_1 + (1/n − X̄k_2)Y_2 + … + (1/n − X̄k_n)Y_n
⇒ α̂ = f_1Y_1 + f_2Y_2 + … + f_nY_n,  where f_i = 1/n − X̄k_i.

Unbiasedness of α̂:
α̂ = Ȳ − β̂X̄, with Ȳ = α + βX̄ + ɛ̄ and β̂ = β + Σk_iɛ_i, so
α̂ = (α + βX̄ + ɛ̄) − (β + Σk_iɛ_i)X̄ = α + ɛ̄ − X̄Σk_iɛ_i
E(α̂) = α + E(ɛ̄) − X̄Σk_iE(ɛ_i) = α
⇒ E(α̂) = α.

Efficiency: suppose α̃ is another unbiased linear estimator of α. Then var(α̂) ≤ var(α̃).
Proof:
var(α̂) = var(Σf_iY_i) = var(f_1Y_1 + … + f_nY_n)
        = f_1²var(Y_1) + … + f_n²var(Y_n)   {since cov(Y_i, Y_s) = 0 for i ≠ s}
        = σ²Σf_i².
Note that Σf_i = Σ(1/n − X̄k_i) = 1 − X̄Σk_i = 1, and
Σf_i² = Σ(1/n − X̄k_i)² = Σ(1/n² − 2X̄k_i/n + X̄²k_i²) = 1/n − (2X̄/n)Σk_i + X̄²Σk_i² = 1/n + X̄²/Σx_i².
Hence
var(α̂) = σ²(1/n + X̄²/Σx_i²),  or equivalently  var(α̂) = σ²ΣX_i²/(nΣx_i²).

Now suppose α̃ = Σz_iY_i, where the z_i are coefficients. Then
α̃ = Σz_i(α + βX_i + ɛ_i), so
E(α̃) = α(Σz_i) + β(Σz_iX_i) + Σz_iE(ɛ_i) = α(Σz_i) + β(Σz_iX_i).
For α̃ to be an unbiased estimator of α, we need Σz_i = 1 and Σz_iX_i = 0.
var(α̃) = var(Σz_iY_i) = var(z_1Y_1 + … + z_nY_n)
        = z_1²var(Y_1) + … + z_n²var(Y_n)   {since cov(Y_i, Y_s) = 0 for i ≠ s}
        = σ²Σz_i².
Let us now compare var(α̂) and var(α̃)! Suppose z_i ≠ f_i, and let the relationship between them be given by d_i = z_i − f_i.

Because Σz_i = 1 and Σz_iX_i = 0:
Σz_ix_i = Σz_i(X_i − X̄) = Σz_iX_i − X̄Σz_i = 0 − X̄(1) = −X̄.
Then, with f_i = 1/n − X̄x_i/Σx_i²,
Σd_i² = Σ(z_i − f_i)² = Σz_i² + Σf_i² − 2Σz_if_i
      = Σz_i² + Σf_i² − 2{(1/n)Σz_i − (X̄/Σx_i²)Σz_ix_i}
      = Σz_i² + Σf_i² − 2{1/n + X̄²/Σx_i²}
      = Σz_i² + Σf_i² − 2Σf_i² = Σz_i² − Σf_i²
⇒ Σz_i² = Σf_i² + Σd_i² > Σf_i²   (unless all d_i are zero)
⇒ σ²Σz_i² > σ²Σf_i²
⇒ var(α̃) > var(α̂).
[var(α̃) = var(α̂) if and only if all the d_i are zero, i.e., Σd_i² = 0.]

2.5 Residuals and Goodness of Fit

Decomposing the variation in Y:
- One measure of the variation in Y is the sum of its squared deviations around its sample mean, often described as the Total Sum of Squares, TSS.
- TSS, the total sum of squares of Y, can be decomposed into ESS, the 'explained' sum of squares, and RSS, the residual ('unexplained') sum of squares:
  TSS = ESS + RSS,  i.e.,  Σ(Y_i − Ȳ)² = Σ(Ŷ_i − Ȳ)² + Σe_i².
- Derivation: since Y_i = Ŷ_i + e_i, we have Y_i − Ȳ = (Ŷ_i − Ȳ) + e_i, i.e., y_i = ŷ_i + e_i, so
  Σy_i² = Σ(ŷ_i + e_i)² = Σŷ_i² + Σe_i² + 2Σŷ_ie_i.
- The last term equals zero:
  Σŷ_ie_i = Σ(Ŷ_i − Ȳ)e_i = ΣŶ_ie_i − ȲΣe_i = Σ(α̂ + β̂X_i)e_i − ȲΣe_i = α̂Σe_i + β̂ΣX_ie_i − ȲΣe_i = 0,
  since Σe_i = 0 and ΣX_ie_i = 0.

- Hence Σy_i² = Σŷ_i² + Σe_i², i.e., TSS = ESS + RSS. In the numerical example: 30.4 = 15.75 + 14.65.
- Coefficient of Determination (R²): the proportion of the variation in the dependent variable that is explained by the model.
- The OLS regression coefficients are chosen in such a way as to minimize the sum of the squares of the residuals. Thus it automatically follows that they maximize R².
- Dividing TSS = ESS + RSS through by TSS gives 1 = ESS/TSS + RSS/TSS, so ESS/TSS = 1 − RSS/TSS. Equivalent expressions:
  1. R² = ESS/TSS = Σŷ_i²/Σy_i²
  2. R² = β̂²Σx_i²/Σy_i²   (since ŷ_i = β̂x_i)
  3. R² = 1 − Σe_i²/Σy_i²

- Further expressions for the Coefficient of Determination (R²):
  4. R² = β̂Σx_iy_i/Σy_i²   (since β̂²Σx_i² = β̂Σx_iy_i)
  5. R² = (Σx_iy_i)² / (Σx_i²·Σy_i²)
  6. R² = [cov(X, Y)]² / [var(X)·var(Y)]
- For the numerical example: R² = ESS/TSS = Σŷ_i²/Σy_i² = 15.75/30.4 = 0.5181.

- A natural criterion of goodness of fit is the correlation between the actual and fitted values of Y. The least squares principle also maximizes this.
- In fact, R² = (r_{y,ŷ})² = (r_{x,y})², where r_{y,ŷ} and r_{x,y} are the coefficients of correlation between Y & Ŷ and between X & Y, defined as
  r_{x,y} = cov(X, Y)/(σ_X·σ_Y)  and  r_{y,ŷ} = cov(Y, Ŷ)/(σ_Y·σ_Ŷ), respectively.
- Note: RSS = (1 − R²)Σy_i².
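A short Python sketch (an addition, not from the notes) that reproduces the decomposition TSS = ESS + RSS and the R² of 0.5181 for the advertising-sales example, and confirms that R² equals the squared correlation between actual and fitted Y.

```python
import numpy as np

# Continuing the advertising-sales example: decomposition of the variation in Y
X = np.array([10, 7, 10, 5, 8, 8, 6, 7, 9, 10], dtype=float)
Y = np.array([11, 10, 12, 6, 10, 7, 9, 10, 11, 10], dtype=float)

beta = ((X - X.mean()) * (Y - Y.mean())).sum() / ((X - X.mean()) ** 2).sum()
alpha = Y.mean() - beta * X.mean()
Y_fit = alpha + beta * X

TSS = ((Y - Y.mean()) ** 2).sum()        # 30.4
ESS = ((Y_fit - Y.mean()) ** 2).sum()    # 15.75
RSS = ((Y - Y_fit) ** 2).sum()           # 14.65
R2 = ESS / TSS                           # ≈ 0.5181
r_y_yhat = np.corrcoef(Y, Y_fit)[0, 1]   # correlation between actual and fitted Y
print(R2, r_y_yhat ** 2)                 # both ≈ 0.5181
```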

To sum up:
- Use Ŷ_i = α̂ + β̂X_i to estimate E[Y|X_i] = α + βX_i.
- OLS: minimize over α̂ and β̂ the residual sum of squares Σe_i² = Σ(Y_i − Ŷ_i)² = Σ(Y_i − α̂ − β̂X_i)², which yields
  β̂ = Σx_iy_i/Σx_i²  and  α̂ = Ȳ − β̂X̄.
- Given the assumptions of the linear regression model, the estimators α̂ and β̂ have the smallest variance of all linear and unbiased estimators of α and β, with
  var(β̂) = σ²/Σx_i²  and  var(α̂) = σ²(1/n + X̄²/Σx_i²) = σ²ΣX_i²/(nΣx_i²).
- For the numerical example:
  var(β̂) = σ²/28 ≈ 0.0357σ²  and  var(α̂) = σ²(1/10 + 64/28) ≈ 2.3857σ².
- TSS = ESS + RSS: Σy_i² = Σŷ_i² + Σe_i², with Σŷ_i² = β̂Σx_iy_i = β̂²Σx_i².
- R² = ESS/TSS = Σŷ_i²/Σy_i²;  RSS = (1 − R²)Σy_i².
- But σ² = ?

An unbiased estimator for σ²:
E(RSS) = E(Σe_i²) = (n − 2)σ².
Thus, if we define σ̂² = Σe_i²/(n − 2), then
E(σ̂²) = [1/(n − 2)]·E(Σe_i²) = [1/(n − 2)]·(n − 2)σ² = σ²,
i.e., σ̂² = Σe_i²/(n − 2) is an unbiased estimator of σ².

2.6 Confidence Intervals and Hypothesis Testing in Regression Analysis

Why is the Error Normality Assumption Important?
- The normality assumption permits us to derive the functional form of the sampling distributions of α̂, β̂ & σ̂².
- Knowing the form of the sampling distributions enables us to derive feasible test statistics for the OLS coefficient estimators.
- These feasible test statistics enable us to conduct statistical inference, i.e.,
  1) to construct confidence intervals for α, β & σ²;
  2) to test hypotheses about the values of α, β & σ².

Sampling distributions:
ɛ_i ~ N(0, σ²)  ⇒  Y_i ~ N(α + βX_i, σ²)
β̂ ~ N(β, σ²/Σx_i²),  so  (β̂ − β)/(σ/√Σx_i²) ~ N(0, 1)
α̂ ~ N(α, σ²ΣX_i²/(nΣx_i²))
Replacing σ² by its estimate σ̂² = Σe_i²/(n − 2):
(β̂ − β)/ŝe(β̂) ~ t_{n−2},  with  ŝe(β̂) = √(σ̂²/Σx_i²);
(α̂ − α)/ŝe(α̂) ~ t_{n−2},  with  ŝe(α̂) = √(σ̂²ΣX_i²/(nΣx_i²)).

Confidence intervals for α and β:
P{−t_{α/2, n−2} ≤ (β̂ − β)/ŝe(β̂) ≤ t_{α/2, n−2}} = 1 − α
⇒ 100(1 − α)% two-sided CI for β:  β̂ ± t_{α/2, n−2}·ŝe(β̂).
Similarly, the 100(1 − α)% two-sided CI for α is  α̂ ± t_{α/2, n−2}·ŝe(α̂).

CI for σ²:
(n − 2)σ̂²/σ² ~ χ²_{n−2}
P{χ²_{1−α/2, n−2} ≤ (n − 2)σ̂²/σ² ≤ χ²_{α/2, n−2}} = 1 − α
⇒ P{1/χ²_{α/2, n−2} ≤ σ²/[(n − 2)σ̂²] ≤ 1/χ²_{1−α/2, n−2}} = 1 − α.

CI for σ² (continued):
⇒ P{(n − 2)σ̂²/χ²_{α/2, n−2} ≤ σ² ≤ (n − 2)σ̂²/χ²_{1−α/2, n−2}} = 1 − α
⇒ 100(1 − α)% two-sided CI for σ²:
  [(n − 2)σ̂²/χ²_{α/2, n−2},  (n − 2)σ̂²/χ²_{1−α/2, n−2}]
OR, since (n − 2)σ̂² = Σe_i² = RSS:
  [RSS/χ²_{α/2, n−2},  RSS/χ²_{1−α/2, n−2}].

Let us continue with our earlier example. We have:
n = 10, α̂ = 3.6, β̂ = 0.75, R² = 0.5181, var(α̂) ≈ 2.3857σ², var(β̂) ≈ 0.0357σ², Σe_i² = 14.65.
σ² is estimated by σ̂² = Σe_i²/(n − 2) = 14.65/8 = 1.83125, so σ̂ = √1.83125 ≈ 1.3532. Thus,
vâr(α̂) ≈ 2.3857(1.83125) ≈ 4.3688  ⇒  ŝe(α̂) ≈ √4.3688 ≈ 2.09;
vâr(β̂) ≈ 0.0357(1.83125) ≈ 0.0654  ⇒  ŝe(β̂) ≈ √0.0654 ≈ 0.256.

95% CIs for α and β (1 − α = 0.95, α/2 = 0.025, t_{0.025, 8} = 2.306):
95% CI for α: 3.6 ± 2.306(2.09) = 3.6 ± 4.8195  ⇒  [−1.2195, 8.4195];
95% CI for β: 0.75 ± 2.306(0.256) = 0.75 ± 0.5903  ⇒  [0.1597, 1.3403].
HASSEN ABDA 2.6 Confidence Intervals and Hypothesis Testing in Regression Analysis
⇒95% CI for :
::::
2
s
for CI %95 ⇒2
s83125 .1 ˆ
2
=
s
6.72] [0.84,
=
:
2
2 ;2 /-na
c
5. 17
2
8; 025.0
=
c
18.2
2
8; 975.0
=
c
:
2
2 );2 /(1- -n
a
c
]
ˆ)2 (
,
ˆ)2 (
[
2
2 );2/ (1
2
2
2 );2/ (
2
- - -
- -
n n
n n
a a
c
s
c
s

]
18.2
65. 14
,
5. 17
65. 14
[ =JIMMA UNIVERSITY
2008/09 CHAPTER 2 - 65 HASSEN A.

- The confidence intervals we have constructed for α, β & σ² are two-sided intervals. Sometimes we want either the upper or the lower limit only, in which case we construct one-sided intervals.
- For instance, let us construct a one-sided (upper limit) 95% confidence interval for β. From the t-table, t_{0.05, 8} = 1.86. Hence
  β̂ + t_{0.05, 8}·ŝe(β̂) = 0.75 + 1.86(0.256) = 0.75 + 0.48 = 1.23,
  and the confidence interval is (−∞, 1.23].
- Similarly, for the lower limit:
  β̂ − t_{0.05, 8}·ŝe(β̂) = 0.75 − 1.86(0.256) = 0.75 − 0.48 = 0.27,
  hence the 95% CI is [0.27, ∞).

Hypothesis Testing:
- Use our example to test the following hypotheses. Estimation result (standard errors in parentheses):
  Ŷ_i = 3.6 + 0.75X_i
        (2.09) (0.256)
1. Test the claim that sales do not depend on advertising expense (at the 5% level of significance).

- H0: β = 0 against Ha: β ≠ 0  (α = 0.05, so α/2 = 0.025).
- Test statistic: t_c = (β̂ − β)/ŝe(β̂) = (0.75 − 0)/0.256 = 2.93.
- Critical value: t_t = t_{α/2, n−2} = t_{0.025, 8} = 2.306.
- Since t_c > t_t, we reject the null (the alternative is supported). That is, the slope coefficient is statistically significantly different from zero: advertising has a significant influence on sales.

2. Test whether the intercept is greater than 3.5.
- H0: α = 3.5 against Ha: α > 3.5  (at the 5% level of significance, α = 0.05).
- Test statistic: t_c = (α̂ − α)/ŝe(α̂) = (3.6 − 3.5)/2.09 = 0.1/2.09 = 0.05.
- Critical value: t_t = t_{α, n−2} = t_{0.05, 8} = 1.86.
- Since t_c < t_t, we do not reject the null (the null is supported). That is, the intercept is not statistically significantly greater than 3.5.

3. Can you reject the claim that a unit increase in advertising expense raises sales by one unit? If so, at what level of significance?
- H0: β = 1 against Ha: β ≠ 1.
- Test statistic: t_c = (β̂ − 1)/ŝe(β̂) = (0.75 − 1)/0.256 = −0.98.
- At α = 0.05, t_{0.025, 8} = 2.306, and thus H0 can't be rejected.
- Similarly, at α = 0.10 (t_{0.05, 8} = 1.86), H0 can't be rejected.
- At α = 0.20, t_{0.10, 8} = 1.397, and thus H0 can't be rejected.
- At α = 0.50 (t_{0.25, 8} = 0.706), H0 is rejected.

For what level of significance (probability) is the t value tabulated for 8 df as extreme as t_c = 0.98? I.e., find P for which P{t > 0.98 or t < −0.98} = ?
- P{t > 0.706} = 0.25 and P{t > 1.397} = 0.10.
- 0.98 lies between these two numbers (0.706 and 1.397), so P{t > 0.98} is somewhere between 0.25 & 0.10.
- 1.397 − 0.706 = 0.691, and 0.98 is 0.98 − 0.706 = 0.274 units above 0.706. Thus, by linear interpolation, the P-value for 0.98, P{t > 0.98}, is (0.274/0.691)(0.25 − 0.10) ≈ 0.06 units below 0.25.
- That is, P{t > 0.98} ≈ 0.25 − 0.06 ≈ 0.19, and hence P{|t| > 0.98} = 2·P{t > 0.98} ≈ 0.38.
- For our H0 to be rejected, the minimum level of significance (the probability of Type I error) would have to be as high as 38%. To conclude, H0 is retained!
- The p-value associated with the calculated sample value of the test statistic is defined as the lowest significance level at which H0 can be rejected. Small p-values constitute strong evidence against H0.
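As a check on the interpolation (my addition, not in the notes), scipy.stats gives the exact tail probability of the t distribution with 8 degrees of freedom at 0.98; the exact two-sided p-value is about 0.36, close to the interpolated 0.38.

```python
from scipy import stats

t_c, df = 0.98, 8
p_one_sided = 1 - stats.t.cdf(t_c, df)   # P{t > 0.98} ≈ 0.18
p_two_sided = 2 * p_one_sided            # ≈ 0.36, close to the interpolated 0.38
print(p_one_sided, p_two_sided)
```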

- There is a correspondence between the confidence intervals derived earlier and tests of hypotheses.
- For instance, the 95% CI we derived earlier for β is (0.16 < β < 1.34). Any hypothesis that says β = c, where c is in this interval, will not be rejected at the 5% level for a two-sided test.
- For instance, the hypothesis β = 1 was not rejected, but the hypothesis β = 0 was.
- For one-sided tests we consider one-sided confidence intervals.

2.7 Prediction with the Simple Linear Regression

- The estimated regression equation Ŷ_i = α̂ + β̂X_i is used for predicting the value (or the average value) of Y for given values of X.
- Let X_0 be the given value of X. Then we predict the corresponding value Y_P of Y by Ŷ_P = α̂ + β̂X_0.
- The true value Y_P is given by Y_P = α + βX_0 + ɛ_P.
- Hence the prediction error is
  Ŷ_P − Y_P = (α̂ − α) + (β̂ − β)X_0 − ɛ_P,
  and E(Ŷ_P − Y_P) = E(α̂ − α) + X_0·E(β̂ − β) − E(ɛ_P) = 0:
  Ŷ_P = α̂ + β̂X_0 is an unbiased predictor of Y (BLUP!).
- The variance of the prediction error is
  var(Ŷ_P − Y_P) = var(α̂) + X_0²var(β̂) + 2X_0cov(α̂, β̂) + var(ɛ_P)
                 = σ²ΣX_i²/(nΣx_i²) + X_0²σ²/Σx_i² − 2X_0X̄σ²/Σx_i² + σ²
                 = σ²[1 + 1/n + (X_0 − X̄)²/Σx_i²].
- Thus, the variance increases the farther away the value of X_0 is from X̄, the mean of the observations on the basis of which α̂ & β̂ have been computed.

- That is, prediction is more precise for values nearer to the mean (as compared to extreme values).
- Within-sample prediction (interpolation): if X_0 lies within the range of the sample observations on X.
- Out-of-sample prediction (extrapolation): if X_0 lies outside the range of the sample observations. Not recommended!
- Sometimes, we would be interested in predicting the mean of Y given X_0, i.e., E[Y|X_0] = α + βX_0. We use the same predictor as before: Ŷ_P = α̂ + β̂X_0.
- The prediction error is then: Ŷ_P − E[Y|X_0] = (α̂ − α) + (β̂ − β)X_0.
- The variance of this prediction error is
  var(Ŷ_P − E[Y|X_0]) = var(α̂) + X_0²var(β̂) + 2X_0cov(α̂, β̂) = σ²[1/n + (X_0 − X̄)²/Σx_i²].
- Again, the variance increases the farther away the value of X_0 is from X̄.
- The variance (the standard error) of the prediction error is smaller in this case (predicting the average value of Y, given X) than when predicting an individual value of Y, given X.

Predict (a) the value of sales, and (b) the average value of sales, for a firm with an advertising expense of six hundred Birr.

a. From Ŷ_i = 3.6 + 0.75X_i, at X_i = 6:
   Point prediction: Ŷ = 3.6 + 0.75(6) = 8.1, i.e., [sales value | advertising of 600 Birr] = 8,100 Birr.
   Interval prediction (95% CI), with t_{0.025, 8} = 2.306:
   ŝe(Ŷ_P*) = √{σ̂²[1 + 1/n + (X_0 − X̄)²/Σx_i²]} = 1.35·√(1 + 1/10 + (6 − 8)²/28) = 1.35(1.115) = 1.508.
   Hence, 95% CI: 8.1 ± 2.306(1.508) = [4.62, 11.58].

b. From Ŷ_i = 3.6 + 0.75X_i, at X_i = 6:
   Point prediction: Ŷ = 3.6 + 0.75(6) = 8.1, i.e., [average sales | advertising of 600 Birr] = 8,100 Birr.
   Interval prediction (95% CI):
   ŝe(Ŷ_P*) = √{σ̂²[1/n + (X_0 − X̄)²/Σx_i²]} = 1.35·√(1/10 + (6 − 8)²/28) = 1.35(0.493) = 0.667,
   so 95% CI: 8.1 ± 2.306(0.667) = [6.56, 9.64].
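A minimal Python sketch (not from the notes) of both 95% interval predictions at X_0 = 6, using the quantities computed in the example.

```python
import numpy as np
from scipy import stats

# 95% prediction intervals at X0 = 6 for the advertising-sales example
n, X0, Xbar, sum_x2 = 10, 6.0, 8.0, 28.0
alpha_hat, beta_hat, sigma2_hat = 3.6, 0.75, 1.83125

y_pred = alpha_hat + beta_hat * X0                    # 8.1
t_crit = stats.t.ppf(0.975, df=n - 2)                 # 2.306

se_indiv = np.sqrt(sigma2_hat * (1 + 1/n + (X0 - Xbar) ** 2 / sum_x2))  # ≈ 1.508
se_mean = np.sqrt(sigma2_hat * (1/n + (X0 - Xbar) ** 2 / sum_x2))       # ≈ 0.667

print(y_pred - t_crit * se_indiv, y_pred + t_crit * se_indiv)  # ≈ [4.62, 11.58]
print(y_pred - t_crit * se_mean, y_pred + t_crit * se_mean)    # ≈ [6.56, 9.64]
```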

Notes on interpreting the coefficient of X in simple linear regression

1. Level-level: Y = α + βX + ɛ
   ⇒ dY/dX = β (the slope): β is the (AVERAGE) change in Y resulting from a unit change in X.
   (Absolute change in Y per absolute change in X.)

2. Log-level: ln Y = α + βX + ɛ  (i.e., Y = e^(α + βX + ɛ))
   ⇒ d(ln Y)/dX = (dY/Y)/dX = β, so (dY/Y)·100 = (β·100)·dX:
   (β × 100) is the (AVERAGE) percentage change in Y resulting from a unit change in X.
   (Relative change in Y per absolute change in X.)

3. Log-log: ln Y = α + β·ln X + ɛ  (i.e., Y = A·X^β·e^ɛ, with ln A = α)
   ⇒ d(ln Y)/d(ln X) = (dY/Y)/(dX/X) = %change in Y / %change in X = β = elasticity:
   β is the (AVERAGE) percentage change in Y resulting from a percentage change in X.
   (Relative change in Y per relative change in X.)

4. Level-log: Y = α + β·ln X + ɛ  (i.e., e^Y = A·X^β·E, with α = ln A and ɛ = ln E)
   ⇒ dY/d(ln X) = dY/(dX/X) = β  (absolute change in Y per relative change in X)
   ⇒ dY = (β × 0.01)·(% change in X): (β × 0.01) is the (AVERAGE) change in Y resulting from a percentage change in X.
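A small numerical illustration (my addition, with made-up parameter values) of the log-level rule above: with a = 1 and b = 0.05, a one-unit increase in X raises Y by roughly 100·b percent.

```python
import numpy as np

# Log-level model ln(Y) = a + b*X: a unit increase in X should raise Y by ~100*b %.
a, b = 1.0, 0.05
X = np.array([10.0, 11.0])            # X rises by one unit
Y = np.exp(a + b * X)                 # deterministic part of the model
pct_change = 100 * (Y[1] - Y[0]) / Y[0]
print(pct_change)                     # ≈ 5.13, close to 100*b = 5
```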

STATA SESSION

CHAPTER THREE
THE MULTIPLE LINEAR REGRESSION
3.1 Introduction: The Multiple Linear Regression
3.2 Assumptions of the Multiple Linear Regression
3.3 Estimation: The Method of OLS
3.4 Properties of OLS Estimators
3.5 Partial Correlations and Coefficients of Multiple Determination
3.6 Statistical Inferences in Multiple Linear Regression
3.7 Prediction with Multiple Linear Regression

3.1 Introduction: The Multiple Linear Regression

The relationship between a dependent variable & two or more independent variables is a linear function:
  Y_i = β_0 + β_1X_{1i} + β_2X_{2i} + … + β_KX_{Ki} + ɛ_i,
where β_0 is the population Y-intercept, β_1, …, β_K are the population slopes, Y is the dependent (response) variable, the X_j are the independent (explanatory) variables, and ɛ_i is the random error.
For the sample: Y_i = β̂_0 + β̂_1X_{1i} + β̂_2X_{2i} + … + β̂_KX_{Ki} + e_i, where e_i is the residual.

What changes as we move from simple to multiple regression?
1. Potentially more explanatory power with more variables;
2. The ability to control for other variables (and the interaction of the various explanatory variables: correlations and multicollinearity);
3. Harder to visualize drawing a line through three or more (n)-dimensional space;
4. The R² is no longer simply the square of the correlation coefficient between Y and X.

- Slope (β_j): ceteris paribus, Y changes by β_j for every 1 unit change in X_j, on average.
- Y-intercept (β_0): the average value of Y when all the X_j's are zero (may not be meaningful all the time).
- A multiple linear regression model is defined to be linear in the regression parameters rather than in the explanatory variables.
- Thus, the definition of multiple linear regression includes polynomial regression, e.g.
  Y_i = β_0 + β_1X_{1i} + β_2X_{2i} + β_3X_{1i}² + β_4X_{1i}X_{2i} + ɛ_i.

3.2 Assumptions of the Multiple Linear Regression

Assumptions 1-7 from Chapter Two carry over:
1. E(ɛ_i|X_{ji}) = 0  (for all i = 1, 2, …, n; j = 1, …, K).
2. var(ɛ_i|X_{ji}) = σ²  (homoscedastic errors).
3. cov(ɛ_i, ɛ_s|X_{ji}, X_{js}) = 0 for i ≠ s  (no autocorrelation).
4. cov(ɛ_i, X_{ji}) = 0  (errors are orthogonal to the Xs).
5. X_j is non-stochastic, and must assume different values.
6. n > K + 1 (number of observations > number of parameters to be estimated; the number of parameters is K + 1 in this case: β_0, β_1, …, β_K).
7. ɛ_i ~ N(0, σ²)  (normally distributed errors).

Additional assumption:
8. No perfect multicollinearity: that is, no exact linear relation exists between any subset of the explanatory variables.
- In the presence of a perfect (deterministic) linear relationship between/among any set of the X_j's, the impact of a single variable (β_j) cannot be identified.
- More on multicollinearity in a later chapter!

3.3 Estimation: The Method of OLS

The Case of Two Regressors (X_1 and X_2)
- The fitted equation is Ŷ_i = β̂_0 + β̂_1X_{1i} + β̂_2X_{2i}; in deviation form, ŷ_i = β̂_1x_{1i} + β̂_2x_{2i} and e_i = y_i − ŷ_i.
- Minimize the RSS with respect to β̂_1 & β̂_2:
  RSS = Σe_i² = Σ(y_i − β̂_1x_{1i} − β̂_2x_{2i})².
- F.O.C.: ∂RSS/∂β̂_j = −2Σ(y_i − β̂_1x_{1i} − β̂_2x_{2i})x_{ji} = 0, i.e., Σe_ix_{ji} = 0 for j = 1, 2:
  1. Σx_{1i}y_i = β̂_1Σx_{1i}² + β̂_2Σx_{1i}x_{2i}
  2. Σx_{2i}y_i = β̂_1Σx_{1i}x_{2i} + β̂_2Σx_{2i}².
- Solve for the coefficients: in matrix form the two normal equations are F = A·β̂, with
  F = [Σx_{1i}y_i, Σx_{2i}y_i]'  and  A = [Σx_{1i}²  Σx_{1i}x_{2i}; Σx_{1i}x_{2i}  Σx_{2i}²],  so  β̂ = A⁻¹F.
- Determinant: |A| = (Σx_{1i}²)(Σx_{2i}²) − (Σx_{1i}x_{2i})².
- To find β̂_1, substitute the first column of A by the elements of F, find |A_1|, and finally β̂_1 = |A_1|/|A|:
  β̂_1 = [(Σx_{1i}y_i)(Σx_{2i}²) − (Σx_{2i}y_i)(Σx_{1i}x_{2i})] / [(Σx_{1i}²)(Σx_{2i}²) − (Σx_{1i}x_{2i})²].
- Similarly, to find β̂_2, substitute the second column of A by the elements of F, find |A_2|, and β̂_2 = |A_2|/|A|:
  β̂_2 = [(Σx_{2i}y_i)(Σx_{1i}²) − (Σx_{1i}y_i)(Σx_{1i}x_{2i})] / [(Σx_{1i}²)(Σx_{2i}²) − (Σx_{1i}x_{2i})²].
- Finally, β̂_0 = Ȳ − β̂_1X̄_1 − β̂_2X̄_2.

The Case of K Explanatory Variables
- The number of parameters to be estimated is K + 1 (β_0, β_1, β_2, …, β_K). Writing the sample regression for each observation:
  Y_1 = β̂_0 + β̂_1X_{11} + β̂_2X_{21} + … + β̂_KX_{K1} + e_1
  Y_2 = β̂_0 + β̂_1X_{12} + β̂_2X_{22} + … + β̂_KX_{K2} + e_2
  …
  Y_n = β̂_0 + β̂_1X_{1n} + β̂_2X_{2n} + … + β̂_KX_{Kn} + e_n
- In matrix notation, Y = Xβ̂ + e, where Y is n×1, X is the n×(K+1) matrix whose first column is a column of ones and whose remaining columns are the regressors, β̂ is (K+1)×1 and e is n×1. The residual vector is e = Y − Xβ̂.
- RSS = Σe_i² = e'e = (Y − Xβ̂)'(Y − Xβ̂)
              = Y'Y − β̂'X'Y − Y'Xβ̂ + β̂'X'Xβ̂
              = Y'Y − 2β̂'X'Y + β̂'X'Xβ̂   (since Y'Xβ̂ is a scalar, Y'Xβ̂ = β̂'X'Y).
- F.O.C.: ∂RSS/∂β̂ = −2X'Y + 2X'Xβ̂ = 0
  ⇒ X'Xβ̂ = X'Y (the normal equations), equivalently X'e = X'(Y − Xβ̂) = 0, i.e.,
  1. Σe_i = 0  and  2. Σe_iX_{ji} = 0 (j = 1, 2, …, K).
- Written out, X'X has the elements
  [ n        ΣX_1       ΣX_2      …   ΣX_K
    ΣX_1     ΣX_1²      ΣX_1X_2   …   ΣX_1X_K
    ΣX_2     ΣX_2X_1    ΣX_2²     …   ΣX_2X_K
    …
    ΣX_K     ΣX_KX_1    ΣX_KX_2   …   ΣX_K² ],
  and X'Y = [ΣY, ΣX_1Y, ΣX_2Y, …, ΣX_KY]'.
- Hence β̂ = (X'X)⁻¹X'Y, where β̂ is (K+1)×1, (X'X)⁻¹ is (K+1)×(K+1), and X'Y is (K+1)×1.
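A minimal sketch (not from the notes) of the matrix formula β̂ = (X'X)⁻¹X'Y in Python/NumPy, using made-up data; it also verifies the normal equations X'e = 0.

```python
import numpy as np

# Illustrative data with two regressors (values are made up)
X1 = np.array([2.0, 3.0, 5.0, 7.0, 8.0, 10.0])
X2 = np.array([1.0, 4.0, 2.0, 6.0, 5.0, 9.0])
Y  = np.array([5.0, 9.0, 10.0, 17.0, 18.0, 26.0])

X = np.column_stack([np.ones_like(X1), X1, X2])   # n x (K+1) design matrix
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)      # solves (X'X) beta = X'Y
e = Y - X @ beta_hat                              # residuals

print(beta_hat)                  # [beta0_hat, beta1_hat, beta2_hat]
print(X.T @ e)                   # ≈ 0: the normal equations X'e = 0 hold
```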

3.4 Properties of OLS Estimators

- Given the assumptions of the classical linear regression model (in Section 3.2), the OLS estimators of the partial regression coefficients are BLUE: linear, unbiased and with minimum variance in the class of all linear unbiased estimators (the Gauss-Markov Theorem).
- In cases where the small-sample desirable properties (BLUE) may not be found, we look for asymptotic (or large-sample) properties, like consistency and asymptotic normality (CLT).
- The OLS estimators are consistent: plim_{n→∞}(β̂ − β) = 0 and lim_{n→∞} var(β̂) = 0.

3.5 Partial Correlations and Coefficients of Determination
) In the multiple regression equation with 2 regressors (X_1 and X_2), $Y_i = \hat{\beta}_0 + \hat{\beta}_1 X_{1i} + \hat{\beta}_2 X_{2i} + e_i$, we can talk of:
¾ the joint effect of X_1 and X_2 on Y, and
¾ the partial effect of X_1 or X_2 on Y.
) The partial effect of X_1 is measured by $\hat{\beta}_1$ and the partial effect of X_2 is measured by $\hat{\beta}_2$.
) Partial effect: holding the other variable constant, or after eliminating the effect of the other variable.
) Thus, $\hat{\beta}_1$ is interpreted as measuring the effect of X_1 on Y after eliminating the effect of X_2 on X_1.
) Similarly, $\hat{\beta}_2$ measures the effect of X_2 on Y after eliminating the effect of X_1 on X_2.
) Thus, we can derive the estimator of $\beta_1$ in two steps (by estimating two separate regressions):
) Step 1: Regress X_1 on X_2 (an auxiliary regression to eliminate the effect of X_2 from X_1). Let the regression equation be:
$$X_1 = a + b_{12} X_2 + e_{12}$$
Or, in deviation form: $x_1 = b_{12} x_2 + e_{12}$. Then,
$$b_{12} = \frac{\sum x_1 x_2}{\sum x_2^2}$$
) $e_{12}$ is the part of X_1 which is free from the influence of X_2.
) Step 2: Regress Y on $e_{12}$ (residualized X_1). Let the regression equation be $y = b_{ye} e_{12} + v$, in deviation form. Then,
$$b_{ye} = \frac{\sum y\, e_{12}}{\sum e_{12}^2}$$
) $b_{ye}$ is the same as $\hat{\beta}_1$ in the multiple regression $y = \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + e$, i.e., $b_{ye} = \hat{\beta}_1$.
) Proof: (You may skip the proof!)
$$b_{ye} = \frac{\sum y\, e_{12}}{\sum e_{12}^2} = \frac{\sum y\,(x_1 - b_{12} x_2)}{\sum (x_1 - b_{12} x_2)^2} \;\Rightarrow\; b_{ye} = \frac{\sum x_1 y - b_{12}\sum x_2 y}{\sum x_1^2 - 2 b_{12}\sum x_1 x_2 + b_{12}^2\sum x_2^2},$$
where $b_{12} = \dfrac{\sum x_1 x_2}{\sum x_2^2}$.
) Substituting $b_{12} = \sum x_1 x_2 / \sum x_2^2$ into the last expression:
$$b_{ye} = \frac{\sum x_1 y - \dfrac{\sum x_1 x_2}{\sum x_2^2}\sum x_2 y}{\sum x_1^2 - 2\dfrac{(\sum x_1 x_2)^2}{\sum x_2^2} + \dfrac{(\sum x_1 x_2)^2}{\sum x_2^2}} = \frac{\left[\sum x_2^2\sum x_1 y - \sum x_1 x_2\sum x_2 y\right]/\sum x_2^2}{\left[\sum x_1^2\sum x_2^2 - (\sum x_1 x_2)^2\right]/\sum x_2^2}$$
$$\Rightarrow\; b_{ye} = \frac{\sum x_2^2\sum x_1 y - \sum x_1 x_2\sum x_2 y}{\sum x_1^2\sum x_2^2 - (\sum x_1 x_2)^2} = \hat{\beta}_1.$$
) Alternatively, we can derive the estimator of $\beta_1$ as follows:
) Step 1: regress Y on X_2, & save the residuals, $e_{y2}$:
$$1.\quad y = b_{y2} x_2 + e_{y2} \qquad [e_{y2} = \text{residualized } Y]$$
) Step 2: regress X_1 on X_2, & save the residuals, $e_{12}$:
$$2.\quad x_1 = b_{12} x_2 + e_{12} \qquad [e_{12} = \text{residualized } X_1]$$
) Step 3: regress $e_{y2}$ (that part of Y cleared of the influence of X_2) on $e_{12}$ (the part of X_1 cleared of the influence of X_2):
$$3.\quad e_{y2} = \alpha_{12} e_{12} + u$$
) Then, in regression (3), $\alpha_{12} = \hat{\beta}_1$, where $\hat{\beta}_1$ is the coefficient from the multiple regression $y = \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + e$!
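The two-step and three-step derivations above can be checked numerically. Below is a minimal Python/NumPy sketch (an illustration, not part of the slides) using the same salary data: the slope from regressing residualized Y on residualized X_1 coincides with β̂_1 from the full multiple regression.

```python
import numpy as np

Y  = np.array([30.0, 20.0, 36.0, 24.0, 40.0])
X1 = np.array([4.0, 3.0, 6.0, 4.0, 8.0])
X2 = np.array([10.0, 8.0, 11.0, 9.0, 12.0])

def ols(y, *regressors):
    """OLS coefficients of y on a constant and the given regressors."""
    Z = np.column_stack([np.ones(len(y))] + list(regressors))
    return np.linalg.solve(Z.T @ Z, Z.T @ y)

def resid(y, *regressors):
    """Residuals from regressing y on a constant and the given regressors."""
    Z = np.column_stack([np.ones(len(y))] + list(regressors))
    return y - Z @ ols(y, *regressors)

beta_full = ols(Y, X1, X2)          # [-23.75, -0.25, 5.5]

e_12 = resid(X1, X2)                # residualized X1 (free of X2)
e_y2 = resid(Y, X2)                 # residualized Y  (free of X2)

alpha_12 = (e_y2 @ e_12) / (e_12 @ e_12)   # Step 3 slope (no intercept needed: both residuals have mean ~0)
print(beta_full[1], alpha_12)              # both equal -0.25
```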

) Suppose we have a dependent variable, Y, and two regressors, X_1 and X_2.
) Suppose also: $r_{y1}^2$ and $r_{y2}^2$ are the squares of the simple correlation coefficients between Y & X_1 and Y & X_2, respectively.
) Then,
$r_{y1}^2$ = the proportion of TSS that X_1 alone explains;
$r_{y2}^2$ = the proportion of TSS that X_2 alone explains.
) On the other hand, $R_{y\cdot 12}^2$ is the proportion of the variation in Y that X_1 & X_2 jointly explain.
) We would also like to measure something else.
For instance:
a) How much does X_2 explain after X_1 is already included in the regression equation? Or,
b) How much does X_1 explain after X_2 is included?
) These are measured by the coefficients of partial determination: $r_{y2\cdot 1}^2$ and $r_{y1\cdot 2}^2$, respectively.
) Partial correlation coefficients of the first order: $r_{y2\cdot 1}$ & $r_{y1\cdot 2}$.
) Order = number of X's already in the model.
$$r_{y2\cdot 1} = \frac{r_{y2} - r_{y1}\, r_{12}}{\sqrt{(1 - r_{y1}^2)(1 - r_{12}^2)}}, \qquad r_{y1\cdot 2} = \frac{r_{y1} - r_{y2}\, r_{12}}{\sqrt{(1 - r_{y2}^2)(1 - r_{12}^2)}}$$
On Simple and Partial Correlation Coefficients
1. Even if r_y1 = 0, r_y1.2 will not be zero unless r_y2 or r_12 or both are zero.
2. If r_y1 = 0, and r_y2 ≠ 0 and r_12 ≠ 0 are of the same sign, then r_y1.2 < 0, whereas if they are of opposite signs, r_y1.2 > 0.
Example: Let Y = crop yield, X_1 = rainfall, X_2 = temperature. Assume: r_y1 = 0 (no association between crop yield and rainfall); r_y2 > 0 & r_12 < 0. Then, r_y1.2 > 0, i.e., holding temperature constant, there is a positive association between yield and rainfall.
3. Since temperature affects both yield & rainfall, in order to find out the net relationship between crop yield and rainfall, we need to remove the influence of temperature. Thus, the simple coefficient of correlation (CC) is misleading.
4. r_y1.2 & r_y1 need not have the same sign.
5. Interrelationship among the 3 zero-order CCs:
$$0 \;\leq\; r_{y1}^2 + r_{y2}^2 + r_{12}^2 - 2\, r_{y1} r_{y2} r_{12} \;\leq\; 1$$
6. r_y2 = r_12 = 0 does not mean that r_y1 = 0. That Y & X_2 and X_1 & X_2 are uncorrelated does not mean that Y and X_1 are uncorrelated.
) The partial r², $r_{y2\cdot 1}^2$, measures the (square of the) mutual relationship between Y and X_2 after the influence of X_1 is eliminated from both Y and X_2.
) Partial correlations are important in deciding whether or not to include more regressors.
e.g. Suppose we have two regressors (X_1 & X_2), $r_{y2}^2 = 0.95$, and $r_{y2\cdot 1}^2 = 0.01$.
) To explain Y, X_2 alone can do a good job (high simple correlation coefficient between Y & X_2).
) But after X_1 is already included, X_2 does not add much – X_1 has done the job of X_2 (very low partial correlation coefficient between Y & X_2).
) If we regress Y on X_1 alone, then we would have:
$$RSS_{SIMP} = (1 - R_{y1}^2)\sum y_i^2$$
i.e., of the total variation in Y, an amount $(1 - R_{y1}^2)\sum y_i^2$ remains unexplained (by X_1 alone).
) If we regress Y on X_1 and X_2, the variation in Y (TSS) that would be left unexplained is:
$$RSS_{MULT} = (1 - R_{y\cdot 12}^2)\sum y_i^2$$
) Adding X_2 to the model reduces the RSS by:
$$RSS_{SIMP} - RSS_{MULT} = (1 - R_{y1}^2)\sum y_i^2 - (1 - R_{y\cdot 12}^2)\sum y_i^2 = (R_{y\cdot 12}^2 - R_{y1}^2)\sum y_i^2$$
) If we now regress that part of Y freed from the effect of X_1 (residualized Y) on the part of X_2 freed from the effect of X_1 (residualized X_2), we will be able to explain the following proportion of the RSS_SIMP:
$$r_{y2\cdot 1}^2 = \frac{RSS_{SIMP} - RSS_{MULT}}{RSS_{SIMP}} = \frac{(R_{y\cdot 12}^2 - R_{y1}^2)\sum y_i^2}{(1 - R_{y1}^2)\sum y_i^2} = \frac{R_{y\cdot 12}^2 - R_{y1}^2}{1 - R_{y1}^2}$$
) This is the Coefficient of Partial Determination (the square of the coefficient of partial correlation).
) We include X_2 if the reduction in RSS (or the increase in ESS) is significant.
) But, when exactly? We will see later!
) The amount $(R_{y\cdot 12}^2 - R_{y1}^2)\sum y_i^2$ represents the incremental contribution of X_2 in explaining the TSS.
1. $R_{y\cdot 12}^2$ = the proportion of $\sum y_i^2$ explained by X_1 & X_2 jointly.
2. $R_{y1}^2$ = the proportion of $\sum y_i^2$ explained by X_1 alone.
3. $(1 - R_{y1}^2)$ = the proportion of $\sum y_i^2$ that X_1 leaves unexplained.
4. $r_{y2\cdot 1}^2 = \dfrac{R_{y\cdot 12}^2 - R_{y1}^2}{1 - R_{y1}^2}$ = the proportion of the incremental contribution of X_2 in explaining the unexplained part of $\sum y_i^2$.
) Coefficient of Determination (in Simple Linear Regression):
$$R^2 = \frac{\hat{\beta}\sum xy}{\sum y^2} \qquad \left(\text{e.g., } R_{y2}^2 = \frac{\hat{\beta}\sum x_2 y}{\sum y^2} \text{ from regressing Y on X}_2 \text{ alone}\right)$$
) Coefficient of Multiple Determination:
$$R_{y\cdot 12}^2 = \frac{\hat{\beta}_1\sum x_1 y + \hat{\beta}_2\sum x_2 y}{\sum y^2}, \qquad R_y^2 = R_{y\cdot 12\ldots K}^2 = \frac{\sum_{j=1}^{K}\hat{\beta}_j\left\{\sum_{i=1}^{n} x_{ji}\, y_i\right\}}{\sum_{i=1}^{n} y_i^2}$$
) Coefficients of Partial Determination:
$$r_{y1\cdot 2}^2 = \frac{R_{y\cdot 12}^2 - R_{y2}^2}{1 - R_{y2}^2}, \qquad r_{y2\cdot 1}^2 = \frac{R_{y\cdot 12}^2 - R_{y1}^2}{1 - R_{y1}^2}$$
) The coefficient of multiple determination (R²) measures the proportion of the variation in the dependent variable explained by (the set of all the regressors in) the model.
) However, the R² can be used to compare the goodness-of-fit of alternative regression equations only if the regression models satisfy two conditions.
1) The models must have the same dependent variable.
Reason: TSS, ESS, and RSS depend on the units in which the regressand Y_i is measured. For instance, the TSS for Y is not the same as the TSS for log(Y).
2) The models must have the same number of regressors and parameters (the same value of K).
Reason: Adding a variable to a model will never raise the RSS (or, will never lower ESS or R²) even if the new variable is not very relevant.
) The adjusted R-squared, $\bar{R}^2$, attaches a penalty to adding more variables.
) It is modified to account for changes/differences in degrees of freedom (df): due to differences in the number of regressors (K) and/or sample size (n).
) If adding a variable raises $\bar{R}^2$ for a regression, then this is a better indication that it has improved the model than if it merely raises R².
$$R^2 = \frac{\sum \hat{y}^2}{\sum y^2} = 1 - \frac{\sum e^2}{\sum y^2}$$
$$\bar{R}^2 = 1 - \frac{\sum e^2 \,/\, [n - (K+1)]}{\sum y^2 \,/\, [n - 1]}$$
(Dividing TSS and RSS by their df.)
) K+1 represents the number of parameters to be estimated.
$$\bar{R}^2 = 1 - \left[\frac{\sum e^2}{\sum y^2}\right]\frac{n-1}{n-K-1} = 1 - (1 - R^2)\,\frac{n-1}{n-K-1}$$
) As n grows larger (relative to K), $\bar{R}^2 \to R^2$. In general, as long as $K \geq 1$, $\bar{R}^2 < R^2$.
1. While R² is always non-negative, $\bar{R}^2$ can be positive or negative.
2. $\bar{R}^2$ can be used to compare the goodness-of-fit of two regression models only if the models have the same regressand.
3. Including more regressors reduces both the RSS and the df; $\bar{R}^2$ rises only if the former effect dominates.
4. $\bar{R}^2$ should never be the sole criterion for choosing between/among models:
) Consider expected signs & values of coefficients,
) Look for results consistent with economic theory or reasoning (possible explanations), ...
Numerical Example:
Y (Salary in '000 Dollars)   X1 (Years of post-high-school Education)   X2 (Years of Experience)
        30                                   4                                    10
        20                                   3                                     8
        36                                   6                                    11
        24                                   4                                     9
        40                                   8                                    12
    ΣY = 150                             ΣX1 = 25                             ΣX2 = 50
Numerical Example (computations):
  X1·Y    X2·Y    X1²    X2²    X1·X2     Y²
  120     300     16     100     40      900
   60     160      9      64     24      400
  216     396     36     121     66     1296
   96     216     16      81     36      576
  320     480     64     144     96     1600
Sums: ΣX1Y = 812, ΣX2Y = 1552, ΣX1² = 141, ΣX2² = 510, ΣX1X2 = 262, ΣY² = 4772
(with n = 5, ΣX1 = 25, ΣX2 = 50, ΣY = 150)
$$\hat{\beta} = (X'X)^{-1}X'Y$$
$$\begin{pmatrix}\hat{\beta}_0\\ \hat{\beta}_1\\ \hat{\beta}_2\end{pmatrix} = \begin{pmatrix} n & \sum X_1 & \sum X_2\\ \sum X_1 & \sum X_1^2 & \sum X_1X_2\\ \sum X_2 & \sum X_1X_2 & \sum X_2^2 \end{pmatrix}^{-1}\begin{pmatrix}\sum Y\\ \sum X_1Y\\ \sum X_2Y\end{pmatrix} = \begin{pmatrix} 5 & 25 & 50\\ 25 & 141 & 262\\ 50 & 262 & 510 \end{pmatrix}^{-1}\begin{pmatrix}150\\ 812\\ 1552\end{pmatrix}$$
$$= \begin{pmatrix} 40.825 & 4.375 & -6.25\\ 4.375 & 0.625 & -0.75\\ -6.25 & -0.75 & 1 \end{pmatrix}\begin{pmatrix}150\\ 812\\ 1552\end{pmatrix} = \begin{pmatrix}-23.75\\ -0.25\\ 5.5\end{pmatrix}$$
$$\hat{Y} = -23.75 - 0.25\,X_1 + 5.5\,X_2$$
. )One more year of experience, after controlling
for years of education, results in $5500 rise in
salary, on average.
)Or, if we consider two persons with the same
level of education, the one with one more year of
experience is expected to have a higher salary of
$5500.
)Similarly, for two people with the same level of
experience, the one with an education of one
more year is expected to have a lower annual
salary of $250.
)Experience looks far more important than
education (which has a negative sign).

) The constant term -23.75 is the salary one would get with no experience and no education.
) But, a negative salary is impossible.
) Then, what is wrong?
1. The sample must have been drawn from a subgroup. We have persons with experience ranging from 8 to 12 years (and post-high-school education ranging from 3 to 8 years). So we cannot extrapolate the results too far out of this sample range.
2. Model specification: is our model correctly specified (variables, functional form); does our data set meet the underlying assumptions?
1. $TSS = \sum y^2 = \sum Y^2 - n\bar{Y}^2 = 4772 - 5(30)^2 \;\Rightarrow\; TSS = 272$
2. $ESS = \sum \hat{y}^2 = \sum(\hat{\beta}_1 x_1 + \hat{\beta}_2 x_2)^2 = \hat{\beta}_1^2\sum x_1^2 + \hat{\beta}_2^2\sum x_2^2 + 2\hat{\beta}_1\hat{\beta}_2\sum x_1 x_2$
$$ESS = \hat{\beta}_1^2\big(\sum X_1^2 - n\bar{X}_1^2\big) + \hat{\beta}_2^2\big(\sum X_2^2 - n\bar{X}_2^2\big) + 2\hat{\beta}_1\hat{\beta}_2\big(\sum X_1X_2 - n\bar{X}_1\bar{X}_2\big)$$
$$= (-0.25)^2\big[141 - 5(5)^2\big] + (5.5)^2\big[510 - 5(10)^2\big] + 2(-0.25)(5.5)\big[262 - 5(5)(10)\big] \;\Rightarrow\; ESS = 270.5$$
3. $RSS = TSS - ESS = 272 - 270.5 \;\Rightarrow\; RSS = 1.5$
OR: $ESS = \hat{\beta}_1\sum x_1 y + \hat{\beta}_2\sum x_2 y = \hat{\beta}_1\big(\sum X_1Y - n\bar{X}_1\bar{Y}\big) + \hat{\beta}_2\big(\sum X_2Y - n\bar{X}_2\bar{Y}\big)$
$\Rightarrow\; ESS = -0.25(62) + 5.5(52) = 270.5$
4. $R^2 = \dfrac{ESS}{TSS} = \dfrac{270.5}{272} = 0.9945$
Our model (education and experience together) explains about 99.45% of the wage differential.
5. $\bar{R}^2 = 1 - \dfrac{RSS/(n-K-1)}{TSS/(n-1)} = 1 - \dfrac{1.5/2}{272/4} \;\Rightarrow\; \bar{R}^2 = 0.9890$
6. Regressing Y on X_1 alone:
$$\hat{\beta} = \frac{\sum x_1 y}{\sum x_1^2} = \frac{\sum X_1Y - n\bar{X}_1\bar{Y}}{\sum X_1^2 - n\bar{X}_1^2} = \frac{62}{16} = 3.875$$
$$R_{y1}^2 = \frac{ESS_{SIMP}}{TSS} = \frac{\hat{\beta}\sum x_1 y}{\sum y^2} = \frac{3.875 \times 62}{272} = 0.8833$$
X_1 (education) alone explains about 88.33% of the differences in wages, and leaves about 11.67% (= 31.75) unexplained.
7. $R_{y\cdot 12}^2 - R_{y1}^2 = 0.9945 - 0.8833 = 0.1112$
$$(R_{y\cdot 12}^2 - R_{y1}^2)\sum y^2 = 0.1112(272) = 30.25, \qquad RSS_{SIMP} = (1 - 0.8833)(272) = 31.75$$
about explaining of on contributi (marginal) extra
an with equation the wage enters e) (experienc X
2
=
9528.0
8833.01
8833.09945.0
1
.8
2
1
2
1
2
12 2
12
=


=


=

• •

y
y y
y
R
RR
r
31.75).( d unexplaine left
has X that aldifferenti the wage of 30.25)(
95.28% about explains e) (experienc X Or,
1
2
=
=
.X of) influence the from(free to related not is
whichX of partthe of on contributi the is this thatNote
1
2JIMMA UNIVERSITY
2008/09 CHAPTER 3 - 44 HASSEN A.
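The goodness-of-fit numbers in points 4-8 can be reproduced with a short Python/NumPy sketch (illustrative only, not part of the slides; it reuses the same salary data):

```python
import numpy as np

Y  = np.array([30.0, 20.0, 36.0, 24.0, 40.0])
X1 = np.array([4.0, 3.0, 6.0, 4.0, 8.0])
X2 = np.array([10.0, 8.0, 11.0, 9.0, 12.0])
n, K = len(Y), 2

def r2(y, *regressors):
    """R-squared from regressing y on a constant and the given regressors."""
    Z = np.column_stack([np.ones(len(y))] + list(regressors))
    b = np.linalg.solve(Z.T @ Z, Z.T @ y)
    rss = np.sum((y - Z @ b) ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    return 1 - rss / tss

R2_full = r2(Y, X1, X2)                               # 0.9945
R2_adj  = 1 - (1 - R2_full) * (n - 1) / (n - K - 1)   # 0.9890
R2_y1   = r2(Y, X1)                                   # 0.8833
r2_y2_1 = (R2_full - R2_y1) / (1 - R2_y1)             # 0.9528 (coefficient of partial determination)

print(R2_full, R2_adj, R2_y1, r2_y2_1)
```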

3.6 Statistical Inferences in Multiple Linear Regression
) The case of two regressors (X_1 & X_2), with $\varepsilon_i \sim N(0, \sigma^2)$:
$$\hat{\beta}_0 \sim N\big(\beta_0, \text{var}(\hat{\beta}_0)\big); \quad \hat{\beta}_1 \sim N\big(\beta_1, \text{var}(\hat{\beta}_1)\big); \quad \hat{\beta}_2 \sim N\big(\beta_2, \text{var}(\hat{\beta}_2)\big)$$
$$\text{var}(\hat{\beta}_0) = \frac{\sigma^2}{n} + \bar{X}_1^2\,\text{var}(\hat{\beta}_1) + \bar{X}_2^2\,\text{var}(\hat{\beta}_2) + 2\bar{X}_1\bar{X}_2\,\text{cov}(\hat{\beta}_1, \hat{\beta}_2)$$
$$\text{var}(\hat{\beta}_1) = \frac{\sigma^2}{\sum x_{1i}^2\,(1 - r_{12}^2)}, \qquad \text{var}(\hat{\beta}_2) = \frac{\sigma^2}{\sum x_{2i}^2\,(1 - r_{12}^2)}$$
$$\text{cov}(\hat{\beta}_1, \hat{\beta}_2) = \frac{-r_{12}\,\sigma^2}{(1 - r_{12}^2)\sqrt{\sum x_{1i}^2}\sqrt{\sum x_{2i}^2}}, \qquad r_{12}^2 = \frac{(\sum x_{1i}x_{2i})^2}{\sum x_{1i}^2\sum x_{2i}^2}$$
) $\sum x_{1i}^2(1 - r_{12}^2)$ is the RSS from regressing X_1 on X_2.
) $\sum x_{2i}^2(1 - r_{12}^2)$ is the RSS from regressing X_2 on X_1.
) $\hat{\sigma}^2 = \dfrac{RSS}{n - 3}$ is an unbiased estimator of $\sigma^2$ (with two regressors).
$$\text{var-cov}(\hat{\beta}) = \sigma^2(X'X)^{-1} = \sigma^2\begin{pmatrix} n & \sum X_{1i} & \cdots & \sum X_{Ki}\\ \sum X_{1i} & \sum X_{1i}^2 & \cdots & \sum X_{1i}X_{Ki}\\ \vdots & & \ddots & \vdots\\ \sum X_{Ki} & \sum X_{Ki}X_{1i} & \cdots & \sum X_{Ki}^2 \end{pmatrix}^{-1}$$
) Note that:
(a) (X'X)^(-1) is the same matrix we use to derive the OLS estimates, and
(b) $\hat{\sigma}^2 = \dfrac{RSS}{n-3}$ in the case of two regressors.
is an unbiased estimator of .
Note:
)Ceteris paribus, the higher the correlation
coefficient between X
1
& X
2
( ), the less
precise will the estimates be, i.e., the CIs
for the parameters will be wider.
)Ceteris paribus, the higher the degree of
variation of the X
js(the more X
jsvary in our
sample), the more precise will the estimates be –
narrow CIsfor population parameters.
21
ˆ
&
ˆββ12
r
3.6 Statistical Inferences in Multiple Linear Regression
1
ˆ
2
−−
=
Kn
RSS
σ
2
σ
21
&
β
βJIMMA UNIVERSITY
2008/09 CHAPTER 3 - 48 HASSEN A.

. )The above two points are contained in:
where RSS
j
is the RSS from an auxiliary regres-
sion of X
j
on all other (K–1) X's and a constant.
)We use t testto test about single parameters and
single linear functions of parameters.
)To test hypotheses about & construct intervals
for individual use:
.,...,1,0;~
)
ˆ

ˆ
1
*
Kjt
es
Kn
j
jj
=∀

−−
β
ββ
3.6 Statistical Inferences in Multiple Linear Regression
.,...,2,1 );,(~
ˆ
2
Kj
RSS
N
j
j j
=∀
σ
ββ
j
βJIMMA UNIVERSITY
2008/09 CHAPTER 3 - 49 HASSEN A.

) Tests about, and interval estimation of, the error variance $\sigma^2$ are based on:
$$\frac{RSS}{\sigma^2} = \frac{(n-K-1)\,\hat{\sigma}^2}{\sigma^2} \sim \chi^2_{\,n-K-1}$$
) Tests of several parameters and several linear functions of parameters are F-tests.
Procedures for Conducting F-tests:
1. Compute the RSS from regressing Y on all X_j's (URSS = Unrestricted Residual Sum of Squares).
2. Compute the RSS from the regression with the hypothesized/specified values of the parameters (β's) (RRSS = Restricted RSS).
3. Under H_0 (if the restriction is correct):
$$\frac{(RRSS - URSS)/J}{URSS/(n-K-1)} \sim F_{J,\;n-K-1}$$
where J is the number of restrictions imposed.
If F-calculated is greater than the F-tabulated, then the RRSS is (significantly) greater than the URSS, and thus we reject the null.
) A special F-test of common interest is to test the null that none of the X's influence Y (i.e., that our regression is useless!):
Test H_0: $\beta_1 = \beta_2 = \cdots = \beta_K = 0$ vs. H_1: H_0 is not true. Equivalently,
$$\frac{(R_U^2 - R_R^2)/J}{(1 - R_U^2)/(n-K-1)} \sim F_{J,\;n-K-1}$$
) With reference to our example on wages, test the following at the 5% level of significance:
a) $\beta_0 = 0$; b) $\beta_1 = 0$; c) $\beta_2 = 0$;
d) the overall significance of the model; and
e) $\beta_1 = \beta_2$.
) For the test of overall significance:
$$URSS = \sum(y_i - \hat{y}_i)^2 = (1 - R^2)\sum y_i^2 = \sum y_i^2 - \sum_{j=1}^{K}\hat{\beta}_j\Big\{\sum_{i=1}^{n} x_{ji}\, y_i\Big\}, \qquad RRSS = \sum y_i^2.$$
$$F = \frac{(RRSS - URSS)/K}{URSS/(n-K-1)} = \frac{R^2/K}{(1 - R^2)/(n-K-1)} \sim F_{K,\;n-K-1}$$
$$\text{var-cov}(\hat{\beta}) = \sigma^2(X'X)^{-1}$$
$$(X'X)^{-1} = \begin{pmatrix} 5 & 25 & 50\\ 25 & 141 & 262\\ 50 & 262 & 510 \end{pmatrix}^{-1} = \begin{pmatrix} 40.825 & 4.375 & -6.25\\ 4.375 & 0.625 & -0.75\\ -6.25 & -0.75 & 1 \end{pmatrix}$$
$\sigma^2$ is estimated by $\hat{\sigma}^2 = \dfrac{RSS}{n-K-1} = \dfrac{1.5}{2} = 0.75$, so
$$\widehat{\text{var-cov}}(\hat{\beta}) = 0.75\begin{pmatrix} 40.825 & 4.375 & -6.25\\ 4.375 & 0.625 & -0.75\\ -6.25 & -0.75 & 1 \end{pmatrix} = \begin{pmatrix} 30.61875 & 3.28125 & -4.6875\\ 3.28125 & 0.46875 & -0.5625\\ -4.6875 & -0.5625 & 0.75 \end{pmatrix}$$
) The diagonal elements of this matrix are $\widehat{\text{var}}(\hat{\beta}_0)$, $\widehat{\text{var}}(\hat{\beta}_1)$ and $\widehat{\text{var}}(\hat{\beta}_2)$; the off-diagonal elements are the covariances, e.g. $\widehat{\text{cov}}(\hat{\beta}_1, \hat{\beta}_2) = -0.5625$.
a) $t_c = \dfrac{\hat{\beta}_0 - 0}{se(\hat{\beta}_0)} = \dfrac{-23.75}{\sqrt{30.61875}} \approx -4.29$; $t_{tab} = t_{0.025}(2) \approx 4.30$. Since $|t_{cal}| \leq t_{tab}$, we do not reject the null!
b) $t_c = \dfrac{\hat{\beta}_1 - 0}{se(\hat{\beta}_1)} = \dfrac{-0.25}{\sqrt{0.46875}} \approx -0.37$. Since $|t_{cal}| \leq t_{tab}$, we do not reject the null.
c) $t_c = \dfrac{\hat{\beta}_2 - 0}{se(\hat{\beta}_2)} = \dfrac{5.5}{\sqrt{0.75}} \approx 6.35$. Since $t_{cal} > t_{tab}$, we reject the null.
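The standard errors and t-ratios above can be verified with the following Python/NumPy sketch (illustrative, not part of the slides; the same data set is assumed):

```python
import numpy as np

Y = np.array([30.0, 20.0, 36.0, 24.0, 40.0])
X = np.column_stack([np.ones(5),
                     [4.0, 3.0, 6.0, 4.0, 8.0],       # X1: education
                     [10.0, 8.0, 11.0, 9.0, 12.0]])   # X2: experience
n, K = X.shape[0], X.shape[1] - 1

XtX_inv  = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ (X.T @ Y)
rss      = np.sum((Y - X @ beta_hat) ** 2)
sigma2   = rss / (n - K - 1)                  # 0.75

vcov = sigma2 * XtX_inv                       # estimated var-cov matrix of beta_hat
se   = np.sqrt(np.diag(vcov))                 # standard errors
t    = beta_hat / se                          # t-ratios: about -4.29, -0.37, 6.35

print(sigma2)
print(vcov.round(5))
print(t.round(2))
```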

d) $F_c = \dfrac{R^2/K}{(1-R^2)/(n-K-1)} = \dfrac{0.9945/2}{0.0055/2} \approx 180.82$; $F_{tab} = F_{2,2}(0.05) \approx 19$. Since $F_{cal} > F_{tab}$, we reject the null.
e) From $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_{1i} + \hat{\beta}_2 X_{2i}$, now impose $\beta_1 = \beta_2$ and run $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}\,(X_{1i} + X_{2i})$.
$$\Rightarrow\; RRSS = 12.08, \qquad URSS = 1.5$$
$$F_c = \frac{(RRSS - URSS)/J}{URSS/(n-K-1)} = \frac{(12.08 - 1.5)/1}{1.5/2} \approx 14.11; \qquad F_{tab} = F_{1,2}(0.05) \approx 18.51$$
Since $F_{cal} \leq F_{tab}$, we do not reject the null.
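The restricted-versus-unrestricted F-test in part (e) can be sketched in Python/NumPy as follows (illustrative only; the restricted model is built by regressing Y on the sum X1 + X2, which is what imposing β1 = β2 amounts to):

```python
import numpy as np

Y  = np.array([30.0, 20.0, 36.0, 24.0, 40.0])
X1 = np.array([4.0, 3.0, 6.0, 4.0, 8.0])
X2 = np.array([10.0, 8.0, 11.0, 9.0, 12.0])
n, K, J = 5, 2, 1                      # J = number of restrictions (beta1 = beta2)

def rss(y, *regressors):
    """Residual sum of squares from regressing y on a constant and the given regressors."""
    Z = np.column_stack([np.ones(len(y))] + list(regressors))
    b = np.linalg.solve(Z.T @ Z, Z.T @ y)
    return np.sum((y - Z @ b) ** 2)

urss = rss(Y, X1, X2)                  # unrestricted RSS  (about 1.5)
rrss = rss(Y, X1 + X2)                 # restricted RSS under beta1 = beta2 (about 12.08)

F = ((rrss - urss) / J) / (urss / (n - K - 1))
print(urss, rrss, F)                   # F is about 14.11 < F(1, 2; 0.05) = 18.51
```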

) Note that we can also use a t-test to test the single restriction that $\beta_1 = \beta_2$ (equivalently, $\beta_1 - \beta_2 = 0$):
$$t = \frac{\hat{\beta}_1 - \hat{\beta}_2 - 0}{se(\hat{\beta}_1 - \hat{\beta}_2)} = \frac{\hat{\beta}_1 - \hat{\beta}_2}{\sqrt{\widehat{\text{var}}(\hat{\beta}_1) + \widehat{\text{var}}(\hat{\beta}_2) - 2\,\widehat{\text{cov}}(\hat{\beta}_1, \hat{\beta}_2)}} \sim t_{n-K-1}$$
$$t_c = \frac{-5.75}{\sqrt{0.46875 + 0.75 - 2(-0.5625)}} \approx -3.76; \qquad t_{tab} = t_{0.025}(1) = 12.706$$
Since $|t_{cal}| < t_{tab}$, we do not reject the null.
) This gives the same conclusion as the F-test, but the F-test is often easier to handle.
. 3.6 Statistical Inferences in Multiple Linear Regression
To sum up To sum up
::
Assuming that our model is correctly specified
and all the assumptions are satisfied,
)Education (after controlling for experience)
doesn’t have a significant influence on wages.
)In contrast, experience (after controlling for
education) is a significant determinant of wages.
)The intercept parameter is also insignificant
(though at the margin).Less Important!
)Overall, the model explains a significant portion
of the observed wage pattern.
)We cannot reject the claim that the coefficients
of the two regressors are equal.

. )In Chapter 2, we used the estimated simple
linear regression model for prediction: (i) mean
prediction(i.e., predicting the point on the
population regression function (PRF)), and (ii)
individual prediction(i.e., predicting an
individual value of Y), given the value of the
regressor X (say, X = X
0
).
)The formulas for prediction are also similar to
those in the case of simple regression except
that, to compute the standard error of the
predicted value, we need the variances and
covariances of all the regression coefficients.
3.7 Prediction with Multiple Linear Regression

Note:
) Even if the R² for the SRF is very high, it does not necessarily mean that our forecasts are good.
)The accuracy of our prediction depends on the
stability of the coefficients between the period
used for estimation and the period used for
prediction.
)More care must be taken when the values of the
regressors (X's) themselves are forecasts.
3.7 Prediction with Multiple Linear Regression

CHAPTER FOUR
VIOLATING THE ASSUMPTIONS OF THE CLASSICAL LINEAR REGRESSION MODEL (CLRM)
4.1 Introduction
) The estimates derived using OLS techniques and the inferences based on those estimates are valid only under certain conditions.
) In general, these conditions amount to the regression model being "well-specified".
) A regression model is statistically well-specified for an estimator (say, OLS) if all of the assumptions required for the optimality of the estimator are satisfied.
) The model will be statistically misspecified if one/more of the assumptions are not satisfied.
. )Before we proceed to testin
g
for violations of (or
relaxin
g
) the assumptions of the CLRM
sequentiall
y
, let us recall: (i) the basic steps in a
scientific enquiry & (ii) the assumptions made.
I. The Major Steps Followed in a Scientific Study I. The Major Steps Followed in a Scientific Study
:
1.Specifying a statistical modelconsistent with
theor
y
(or a model representin
g
the theoretical
relationship between a set of variables).
)This involves at least two choices to be made:
A.The choice of variablesto be included into
the model, and
4.1 Introduction 4.1 IntroductionJIMMA UNIVERSITY
2008/09 CHAPTER 4 - 3 HASSEN A.

. B.The choice of the functional formof the link
(linear in variables, linear in lo
g
arithms of
the variables, polynomial in regressors, etc.)
2.Selecting an estimatorwith certain desirable
properties (provided that the re
g
ression model
in question satisfies a given set of conditions).
3.Estimating the model. When can one estimate a
model? (sample size? perfect multicollinearit
y
?)
4.Testingfor the validity of assumptions made. 5.a) If there is no evidence of misspecification, go
on to conducting statistical inferences.
4.1 Introduction 4.1 IntroductionJIMMA UNIVERSITY
2008/09 CHAPTER 4 - 4 HASSEN A.

. 5.b) If the tests show evidence of misspecification
in one or more relevant forms, then there are two possible courses of action implied:
)If the precise form of model misspecification
can be established, then it ma
y
be possible to
find an alternative estimatorthat is optimal
under the particular sort of misspecification.
)Re
g
ard statistical misspecification as an
indication of a defective model. Then, search
an alternative, well-specified re
g
ression
model , and start over (return to Step 1).
4.1 Introduction 4.1 IntroductionJIMMA UNIVERSITY
2008/09 CHAPTER 4 - 5 HASSEN A.

4.1 Introduction
II. The Assumptions of the CLRM:
A1: n > K+1. Otherwise, estimation is not possible.
A2: No perfect multicollinearity among the X's.
Implication: any X must have some variation.
A3: ɛ_i|X_ji ~ IID(0, σ²), i.e.,
$$E(\varepsilon_t \varepsilon_s \mid X_j) = \begin{cases}\sigma^2 & \text{for } t = s\\ 0 & \text{for } t \neq s\end{cases}$$
A3.1: var(ɛ_i|X_j) = σ² (0 < σ² < ∞).
A3.2: cov(ɛ_i, ɛ_s|X_j) = 0, for all i ≠ s; s = 1, …, n.
A4: ɛ_i's are normally distributed: ɛ_i|X_j ~ N(0, σ²).
A5: E(ɛ_i|X_j) = E(ɛ_i) = 0; i = 1, …, n & j = 1, …, K.
A5.1: E(ɛ_i) = 0 and X's are non-stochastic, or
A5.2: E(ɛ_i X_ji) = 0 or E(ɛ_i|X_j) = E(ɛ_i) with stochastic X's.
Implication: ɛ is independent of X_j & thus cov(ɛ, X_j) = 0.
. )Generall
y
speakin
g
, the several tests for the
violations of the assumptions of the CLRM are
tests of model misspecification.
)The values of the test statistics for testin
g

particular H
0
's tend to reject these H
0
's when
the model is misspecified in some way.
e.
g
., tests for heteroskedasticit
y
or autocorrelation
are sensitive to omission of relevant variables.
)A significant test statistic may indicate hetero-
skedastic(or autocorrelated) errors, but it ma
y

also reflect omission of relevant variables.
4.1 Introduction 4.1 IntroductionJIMMA UNIVERSITY
2008/09 CHAPTER 4 - 7 HASSEN A.

4.1 Introduction
Outline:
1. Small Samples (A1?)
2. Multicollinearity (A2?)
3. Non-Normal Errors (A4?)
4. Non-IID Errors (A3?):
   A. Heteroskedasticity (A3.1?)
   B. Autocorrelation (A3.2?)
5. Endogeneity (A5?):
   A. Stochastic Regressors and Measurement Error
   B. Model Specification Errors:
      a. Omission of Relevant Variables
      b. Wrong Functional Form
      c. Inclusion of Irrelevant Variables
      d. Stability of Parameters
   C. Simultaneity (or Reverse Causality)
. )Requirement for estimation: n > K+1. )If the number of data points ( n) is small, it ma
y

be difficult to detect violations of assumptions.
)With small n, it is hard to detect heteroskedast-
icityor nonnormality of ɛ
i'seven when present.
)Thou
g
h none of the assumptions is violated, a
linear regression with small nma
y
not have
sufficient power to reject β
j
= 0, even if β
j
≠0.
)If [(K+1)/n]> 0.4, it will often be difficult to fit
a reliable model.
)Rule of thumb
: aim to have n ≥6X& ideally n ≥10X.
4.2 Sample Size: Problems with Few Data Points 4.2 Sample Size: Problems with Few Data PointsJIMMA UNIVERSITY
2008/09 CHAPTER 4 - 9 HASSEN A.

. )Man
y
social research studies use a lar
g
e
number of predictors.
)Problems arise when the various predictors are
highly and linearly related (highly collinear).
)Recall that, in a multiple re
g
ression, onl
y
the
independent variation in a re
g
ressor (an X) is
used in estimating the coefficient of that X.
)If two X's (X
1
& X
2
) are hi
g
hl
y
correlated with
each other, then the coefficients of X
1
& X
2
will
be determined b
y
the minorit
y
of cases where
they don’t vary together (or overlap).
4.3 Multicollinearity 4.3 MulticollinearityJIMMA UNIVERSITY
2008/09 CHAPTER 4 - 10 HASSEN A.

4.3 Multicollinearity
) Perfect multicollinearity: occurs when one (or more) of the regressors in a model (e.g., X_K) is a linear function of other/s (X_i, i = 1, 2, …, K-1).
) For instance, if X_2 = 2X_1, then there is a perfect (an exact) multicollinearity between X_1 & X_2.
) Suppose, PRF: Y = β_0 + β_1X_1 + β_2X_2, & X_2 = 2X_1.
) The OLS technique yields 3 normal equations:
$$\sum Y_i = n\hat{\beta}_0 + \hat{\beta}_1\sum X_{1i} + \hat{\beta}_2\sum X_{2i}$$
$$\sum Y_iX_{1i} = \hat{\beta}_0\sum X_{1i} + \hat{\beta}_1\sum X_{1i}^2 + \hat{\beta}_2\sum X_{1i}X_{2i}$$
$$\sum Y_iX_{2i} = \hat{\beta}_0\sum X_{2i} + \hat{\beta}_1\sum X_{1i}X_{2i} + \hat{\beta}_2\sum X_{2i}^2$$
) But, substituting 2X_1 for X_2 in the 3rd equation yields the 2nd equation.
) That is, one of the normal equations is in fact redundant.
) Thus, we have only 2 independent equations (1 & 2, or 1 & 3) but 3 unknowns (β's) to estimate.
) As a result, the normal equations will reduce to:
$$\sum Y_i = n\hat{\beta}_0 + [\hat{\beta}_1 + 2\hat{\beta}_2]\sum X_{1i}$$
$$\sum Y_iX_{1i} = \hat{\beta}_0\sum X_{1i} + [\hat{\beta}_1 + 2\hat{\beta}_2]\sum X_{1i}^2$$
)The number of β's to be estimated is
g
reater
than the number of independent equations.
)So, if two or more X's are perfectl
y
correlated, it
is not possible to find the estimates for all β's.
i.e., we cannot find separately, but .
&








+






=









∑∑



21
0
2
1 1
1
1
2
ββ
βˆˆ
ˆ
.
i i
i
ii
i
XX
Xn
XY
Y
21
β
ˆ

ˆ
21
β
ˆ
2 β
ˆ
+
2
1
2
1i
11ii
21
XnX
YXnXY
β
ˆ
2 β
ˆ
αˆ


=+=


121 0
X] β
ˆ
2 β
ˆ
[Y β
ˆ
+−=JIMMA UNIVERSITY
2008/09 CHAPTER 4 - 13 HASSEN A.

. )High, but not perfect, multicollinearity
: two or
more re
g
ressors in a model are hi
g
hl
y
(but
imperfectly) correlated. e.g. X
1
= 3 –5X
K
+ u
i.
)This makes it difficult to isolate the effect of
each of the highly collinear X's on Y.
)If there is inexact but strong multicollinearity:
*The collinear re
g
ressors (X's) explain the
same variation in the regressand (Y).
*Estimated coefficients chan
g
e dramaticall
y
,
dependin
g
on the inclusion/exclusion of
other predictor/s into (or out of) the model.
4.3 Multicollinearity 4.3 MulticollinearityJIMMA UNIVERSITY
2008/09 CHAPTER 4 - 14 HASSEN A.

. 4.3 Multicollinearity 4.3 Multicollinearity
*. tend to be ver
y
shak
y
from one sample to
another.
*Standard errors of will be inflated.
*As a result, t-tests will be insignificant & CIs
wide ( rejectingH
0
: β
j
= 0becomes very rare).
*We get low t-ratiosbut high R
2
(or F): there
is not enou
g
h individual variation in the X's,
but a lot of common variation.
)Yet, the OLS estimators are BLUE BLUE
.
)BLUE –a property of repeated-sampling –sa
y
s
nothing about estimates from a single sample.
s'β
ˆ
s'β
ˆJIMMA UNIVERSITY
2008/09 CHAPTER 4 - 15 HASSEN A.

. 4.3 Multicollinearity 4.3 Multicollinearity
)But, multicollinearit
y
is not a problem if the
principal aim is prediction,
g
iven that the same
pattern of multicollinearit
y
persists into the
forecast period.
Sources of Multicollinearity :
)Improper use of dummy variables
. (Later!)
)Includin
g
the same (or almost the same)
variable twice (e.
g
. different operationaliaztions
of a single concept used together).
)Method of data collection used (e.
g
. samplin
g

over a limited range of X values).JIMMA UNIVERSITY
2008/09 CHAPTER 4 - 16 HASSEN A.

. 4.3 Multicollinearity 4.3 Multicollinearity
)Includin
g
a variable computed from other
variables in the model (e.
g
. usin
g
famil
y
income,
mother’s income & father’s income together).
)Adding many polynomial terms
to a model,
especially if the range of the X variable is small.
)Or, it ma
y

j
ust happen that variables are hi
g
hl
y

correlated (without any fault of the researcher).
Detecting Multicollinearity :
)The classic case of multicollinearit
y
occurs
when R
2
is hi
g
h (& si
g
nificant), but none of X's
is si
g
nificant (some of the X's ma
y
even have
wrong sign).JIMMA UNIVERSITY
2008/09 CHAPTER 4 - 17 HASSEN A.

4.3 Multicollinearity
) Detecting the presence of multicollinearity is more difficult in the less clear-cut cases.
) Sometimes, simple or partial coefficients of correlation among regressors are used.
) However, serious multicollinearity may exist even if these correlation coefficients are low.
) A statistic commonly used for detecting multicollinearity is the VIF (Variance Inflation Factor).
) From a simple linear regression of Y on X_j we have:
$$\text{var}(\hat{\beta}_j) = \frac{\sigma^2}{\sum x_{ji}^2}$$
) From the multiple linear regression of Y on all the X's:
$$\text{var}(\hat{\beta}_j) = \frac{\sigma^2}{\sum x_{ji}^2\,(1 - R_j^2)}$$
where $R_j^2$ is the R² from regressing X_j on all the other X's.
) The difference between the variance of $\hat{\beta}_j$ in the two cases arises from the correlation between X_j and the other X's, and is captured by:
$$VIF_j = \frac{1}{1 - R_j^2}, \qquad \text{var}(\hat{\beta}_j) = \frac{1}{(1 - R_j^2)}\cdot\frac{\sigma^2}{\sum x_{ji}^2} = VIF_j\cdot\frac{\sigma^2}{\sum x_{ji}^2}$$
) If X_j is not correlated with the other X's, $R_j^2 = 0$, $VIF_j = 1$, and the two variances will be identical.
) As $R_j^2$ increases, VIF_j rises.
) If X_j is perfectly correlated with the other X's, VIF_j = ∞. Implication for precision (or CIs)???
) Thus, a large VIF is a sign of serious/severe (or "intolerable") multicollinearity.
) There is no cutoff point on VIF (or any other measure) beyond which multicollinearity is taken as intolerable.
) A rule of thumb: VIF > 10 is a sign of severe multicollinearity.
#In stata (after regression): vif
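Besides the Stata one-liner, the VIF can be computed directly from its definition. The following Python/NumPy sketch is illustrative only (the function and array names are hypothetical): it regresses each regressor on all the others and reports 1/(1 - R_j²).

```python
import numpy as np

def vif(X):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 is from regressing column j on the other columns.
    X: (n x K) array of regressors (without the constant)."""
    n, k = X.shape
    out = []
    for j in range(k):
        y = X[:, j]
        Z = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        b = np.linalg.lstsq(Z, y, rcond=None)[0]
        rss = np.sum((y - Z @ b) ** 2)
        tss = np.sum((y - y.mean()) ** 2)
        r2_j = 1 - rss / tss
        out.append(1.0 / (1.0 - r2_j))
    return np.array(out)

# Hypothetical example: X2 is almost a linear function of X1, so both VIFs are large
rng = np.random.default_rng(0)
X1 = rng.normal(size=100)
X2 = 2 * X1 + rng.normal(scale=0.05, size=100)
print(vif(np.column_stack([X1, X2])))    # two large values (rule of thumb: > 10 is severe)
```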

Solutions to Multicollinearity:
) Solutions depend on the sources of the problem.
) The formula below is indicative of some solutions:
$$\widehat{\text{var}}(\hat{\beta}_j) = \frac{\hat{\sigma}^2}{\sum x_{ji}^2\,(1 - R_j^2)}, \qquad \hat{\sigma}^2 = \frac{\sum e_i^2}{n - (K+1)}$$
) More precision is attained with lower variances of coefficients. This may result from:
a) Smaller RSS (or variance of the error term) – less "noise", ceteris paribus (cp);
b) Larger sample size (n) relative to the number of parameters (K+1), cp;
. 4.3 Multicollinearity 4.3 Multicollinearity
c)Greater variation in values of each X
j, cp;
d)Less correlation between regressors, cp.
)Thus, serious multicollinearit
y
ma
y
be solved b
y

using one/more of the following:
1.“Increasing sample size”(if possible). ??? 2.Utilizin
g
a priori information on parameters
(from theory or prior research).
3.Transforming variables or functional form:
a)Using differences (ΔX) instead of levels (X)
in time series data where the cause ma
y
be
X's moving in the same direction over time. JIMMA UNIVERSITY
2008/09 CHAPTER 4 - 22 HASSEN A.

. 4.3 Multicollinearity 4.3 Multicollinearity
b)In pol
y
nomial re
g
ressions, usin
g
deviations
of regressors from their means ( (X
j–X̅
j)
instead of X
j) tends to reduce collinearity.
c)Usually, logs are less collinear than levels.
4.Pooling cross-sectional and time-series data. 5.Dropping one of the collinear predictors. ???
However, this ma
y
lead to the omitted variable
bias (misspecification) if theor
y
tells us that the
dropped variable should be incorporated.
6.To be aware of its existence and emplo
y
in
g

cautious interpretation of results.JIMMA UNIVERSITY
2008/09 CHAPTER 4 - 23 HASSEN A.

. 4.4 Non 4.4 Non
--
normality of the Error Term normality of the Error Term
)Normality is not required to get BLUE of β's. )The CLRM merely requires errors to be IID. )Normalit
y
of errors is required onl
y
for valid
hypothesis testing, i.e., validity of t-and F-tests.
)In small samples, if the errors are not normall
y

distributed, the estimated parameters will not
follow normal distribution, which complicates
inference.
)NB: there is no obli
g
ation on X's to be normall
y

distributed.
#In stata (after regression):
kdensity residual , normalJIMMA UNIVERSITY
2008/09 CHAPTER 4 - 24 HASSEN A.

. 4.4 Non 4.4 Non
--
normality of the Error Term normality of the Error Term
)A formal test of normality is the Shapiro-Wilk
test [H
0
: errors are normally distributed].
)Large p-value shows that H
0
cannot be rejected.
#In stata:
swilk residual
)If H
0
is rejected, transforming the regressand or
re-specifying (the functional form of) the model
may help.
)With large samples, thanks to the central limit
theorem, hypothesis testing may proceed even if distribution of errors deviates from normality.
)Tests are generally asymptotically valid.JIMMA UNIVERSITY
2008/09 CHAPTER 4 - 25 HASSEN A.

. )The assumption of IIDerrors is violated if a
(simple) random sampling cannot be assumed.
)More specifically, the assumption of IIDerrors
fails if the errors:
1) are not identically
distributed, i.e., if var( ε
i|X
ji
)
varies with observations –heteroskedasticity .
2) are not independently
distributed, i.e., if errors
are correlated to each other – serial correlation
.
3) are both heteroskedastic & autocorrelated.
This is common in panel & time series data.
4.5 Non 4.5 Non
--
IID Errors IID ErrorsJIMMA UNIVERSITY
2008/09 CHAPTER 4 - 26 HASSEN A.

. )One of the assumptions of the CLRM is homo-
skedasticity, i. e., var( ε
i|X) = var( ε
i) = σ
2
.
)This will be true if the observations of the error term are drawn from identical distributions. )Heteroskedasticity is present if var( ε
i)=σ
i
2
≠σ
2
:
different variances for different se
g
ments of the
population (segments by the values of the X's).
e.g.: Variability
of consumption rises with rise in
income, i.e., people with higher incomes
displa
y

greater variability
in consumption.
)Heteroskedasticity is more likely in cross- sectional than time-series data.
4.5.1 4.5.1
Heteroskedasticity HeteroskedasticityJIMMA UNIVERSITY
2008/09 CHAPTER 4 - 27 HASSEN A.

. )With a correctly specified model (in any other
aspect), but heteroskedastic errors, the OLS
coefficient estimators are unbiased& consistent
but inefficient .
)Reason: OLS estimator for σ
2
(and thus for the
standard errors of the coefficients) are biased.
)Hence, confidence intervals based on biased
standard errors will be wrong, and the t & F tests will be misleading/invalid.
NB
: Heteroskedasticity could be a symptom of
other problems (e.g. omitted variables).
4.5.1 4.5.1
Heteroskedasticity HeteroskedasticityJIMMA UNIVERSITY
2008/09 CHAPTER 4 - 28 HASSEN A.

. )If heteroskedasticit
y
is a result (or a reflection)
of specification error (sa
y
, omitted variables),
OLS estimators will be biased & inconsistent.
)In the presence of heteroskedasticit
y
, OLS is
not optimal as it
g
ives equal wei
g
ht to all
observations, when, in fact, observations with
larger error variances (σ
i
2
) contain less
information than those with smaller σ
i
2
.
)To correct,
g
ive less wei
g
ht to data points with
greater σ
i
2
and more wei
g
ht to those with
smaller
σ
i
2
. [i.e., use GLS ( WLSor FGLS)].
4.5.1 4.5.1
Heteroskedasticity HeteroskedasticityJIMMA UNIVERSITY
2008/09 CHAPTER 4 - 29 HASSEN A.

. Detecting Heteroskedasticity: Detecting Heteroskedasticity:
A. Graphical Method
)Run OLS and plot squared residuals versus
fitted value of Y ( Ŷ) or against each X.
#In stata (after regression)
: rvfplot
)The
g
raph ma
y
show some relationship (linear,
quadratic, …), which provides clues as to the nature of the problem and a possible remedy.
e.g. let, the plot of ũ
2
(from Y = α+ βX + u) a
g
ainst
X signifies that var(u
i) increases proportional
to X
2
; (var(u
i)=σ
i
2
=cX
i
2
). What is the Solution?
4.5.1 4.5.1
Heteroskedasticity HeteroskedasticityJIMMA UNIVERSITY
2008/09 CHAPTER 4 - 30 HASSEN A.

4.5.1 Heteroskedasticity
) Now, transform the model by dividing Y, α, X and u by X:
$$\frac{Y}{X} = \alpha\frac{1}{X} + \beta + \frac{u}{X}, \qquad \text{i.e.,} \quad y^* = \alpha x^* + \beta + u^*$$
) Now, u* is homoskedastic: var(u_i*) = c; i.e., using WLS solves heteroskedasticity!
) WLS yields BLUE for the transformed model.
) If the pattern of heteroskedasticity is unknown, log transformation of both sides (compressing the scale of measurement of the variables) usually solves heteroskedasticity.
) This cannot be used with 0 or negative values.
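A minimal Python/NumPy sketch of the weighted-least-squares idea just described (illustrative only; it assumes var(u_i) = c·X_i², so dividing the whole equation by X_i makes the error homoskedastic):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(1.0, 10.0, size=200)
u = rng.normal(scale=0.5 * X)            # error standard deviation proportional to X
Y = 2.0 + 3.0 * X + u                    # true model: Y = alpha + beta*X + u

# WLS for this pattern: divide the whole equation by X
y_star = Y / X                           # Y/X = alpha*(1/X) + beta + u/X
x_star = 1.0 / X
Z = np.column_stack([np.ones(len(Y)), x_star])
b = np.linalg.solve(Z.T @ Z, Z.T @ y_star)
print("beta  (intercept of transformed model):", b[0])   # close to 3.0
print("alpha (slope on 1/X):                  ", b[1])   # close to 2.0
```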

4.5.1 Heteroskedasticity
B. A Formal Test:
) The most-often used test for heteroskedasticity is the Breusch-Pagan (BP) test.
H_0: homoskedasticity vs. H_a: heteroskedasticity
) Regress ũ² on Ŷ, or ũ² on the original X's, the X²'s and, if there is enough data, the cross-products of the X's.
) H_0 will be rejected for high values of the test statistic [n·R² ~ χ²_q] or for low p-values.
) n & R² are obtained from the auxiliary regression of ũ² on the q (number of) predictors.
#In stata (after regression): hettest or hettest, rhs
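The BP test logic can also be written out by hand. The sketch below (Python/NumPy with SciPy, illustrative only; it mirrors the auxiliary-regression version of the test with q regressors) computes the n·R² statistic and its χ² p-value:

```python
import numpy as np
from scipy import stats

def breusch_pagan(y, X):
    """X: (n x q) regressors without the constant. Returns (n*R^2, p-value)."""
    n = len(y)
    Z = np.column_stack([np.ones(n), X])
    b = np.linalg.lstsq(Z, y, rcond=None)[0]
    u2 = (y - Z @ b) ** 2                         # squared OLS residuals
    g = np.linalg.lstsq(Z, u2, rcond=None)[0]     # auxiliary regression of u^2 on the X's
    fitted = Z @ g
    r2 = 1 - np.sum((u2 - fitted) ** 2) / np.sum((u2 - u2.mean()) ** 2)
    q = X.shape[1]
    stat = n * r2
    return stat, 1 - stats.chi2.cdf(stat, df=q)

# Hypothetical data with heteroskedastic errors
rng = np.random.default_rng(2)
X = rng.uniform(1, 10, size=(200, 1))
y = 1 + 2 * X[:, 0] + rng.normal(scale=X[:, 0])   # error sd grows with X
print(breusch_pagan(y, X))    # large statistic, small p-value -> reject homoskedasticity
```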

. 4.5.1 4.5.1
Heteroskedasticity Heteroskedasticity
)The B-P test as specified above:
9uses the regression of ũ
2
on Ŷor on X's;
9and thus consumes less degrees of freedom; 9but tests for linear heteroskedasticity only; 9and has problems when the errors are not
normally distributed.
#Alternatively, use:hettest, iidor hettest, rhsiid
This doesn’t need the assumption of normality.
)If
y
ou want to include squares & cross products
of X's, generate these variables first and use:
#hettestvarlistor hettestvarlist, iid )The
hettestvarlist, iid
version of B-P test is the
same as White’s test for heteroskedasticity : JIMMA UNIVERSITY
2008/09 CHAPTER 4 - 33 HASSEN A.

. 4.5.1 4.5.1
Heteroskedasticity Heteroskedasticity
#In stata (after regression):
imtest, white
Solutions to (or Estimatio n with) Heteroskedasticity )If heteroskedasticit
y
is detected, first check for
some other specification error in the model
(omitted variables, wrong functional form, …).
)If it persists even after correctin
g
for other
specification errors, use one of the following:
1.Use better method of estimation (WLS/FGLS); 2.Stick to OLS but use robust (heteroskedasticity
consistent) standard errors. #In stata:
regY X
1
…X
K
, robust
This is OK even with homoskedastic errors. JIMMA UNIVERSITY
2008/09 CHAPTER 4 - 34 HASSEN A.

. 4.5.2 Autocorrelation 4.5.2 Autocorrelation
)Error terms are autocorrelated if error terms
from different (usuall
y
ad
j
acent) time periods
(cross-sectional units) are correlated, E( ε

j)≠0.
)Autocorrelation in cross-sectional data is called
spatial autocorrelation (in space, not over time).
)However, spatial autocorrelation is uncommon
since cross-sectional data do not usuall
y
have
some ordering logic, or economic interest.
)Serial correlationoccurs in time-series studies
when the errors associated with a
g
iven time
period carry over into future time periods.JIMMA UNIVERSITY
2008/09 CHAPTER 4 - 35 HASSEN A.

. 4.5.2 Autocorrelation 4.5.2 Autocorrelation
)e
t
are correlated with lagged values: e
t-1
, e
t-2
, …
)Effects of autocorrelation are similar to those
of heteroskedasticity:
)OLS coefficients are unbiased and consistent, but inefficient; the estimate of σ
2
is biased, and
thus inferences are invalid.
Detecting Autocorrelation
)Whenever
y
ou do on time series data, set up
your data as a time-series (i.e., identif
y
the
variable that represents time or the sequential order of observations).
#In stata
: tssetvarnameJIMMA UNIVERSITY
2008/09 CHAPTER 4 - 36 HASSEN A.

. 4.5.2 Autocorrelation 4.5.2 Autocorrelation
)Then, plottin
g
OLS residuals a
g
ainst the time
variable, or a formal test could be used to check
for autocorrelation.
#In stata (after regression and predicting residuals):
scatter residualtime
The Breusch-Godfrey Test
)Commonly-used general test of autocorrelation. )It tests for autocorrelation of first or hi
g
her
order, and works with stochastic regressors.
Steps Steps
:
1.Re
g
ress OLS residuals on X's and la
gg
ed
residuals: e
t
= f(X
1t
,...,X
Kt
, e
t-1
,…,e
t-j
)JIMMA UNIVERSITY
2008/09 CHAPTER 4 - 37 HASSEN A.

. 4.5.2 Autocorrelation 4.5.2 Autocorrelation
2.Test the
j
oint h
y
pothesis that all the estimated
coefficients on la
gg
ed residuals are zero. Use the
test statistic: jF
cal
~
χ
2
j
;
3.Alternativel
y
, test the overall si
g
nificance of the
auxiliary regression using nR
2
~ χ
2
(k+j)
.
4.Reject H
0
: no serial correlationfor hi
g
h values
of the test statistic or for small p-values.
#In stata (after regression)
: bgodfrey, lags(#)
Eg.
bgodfrey, lags(2)
tests for 2
nd
order auto in error
terms (e
t'sup to 2 periods apart) like e
t, e
t-1
, e
t-2
;
while
bgodfrey, lags(1/4)
tests for 1
st
, 2
nd
, 3
rd
& 4
th
order autocorrelations.JIMMA UNIVERSITY
2008/09 CHAPTER 4 - 38 HASSEN A.

. 4.5.2 Autocorrelation 4.5.2 Autocorrelation
Estimation in the Presence of Serial Correlation:
)Solutions to autocorrelation depend on the
sources of the problem.
)Autocorrelation may result from:
)Model misspecification (e.
g
. Omitted
variables, a wrong functional form, …)
)Misspecified d
y
namics (e.
g
. static model
estimated when dependence is dynamic), …
)If autocorrelation is si
g
nificant, check for model
specification errors, & consider re-specification.JIMMA UNIVERSITY
2008/09 CHAPTER 4 - 39 HASSEN A.

. 4.5.2 Autocorrelation 4.5.2 Autocorrelation
)If the revised model passes other specification
tests, but still fails tests of autocorrelation, the
following are the key solutions:
1. FGLS: Prais-Winston regression, …. #In stata
:
praisY X
1
…X
K
2. OLS with robust standard errors: #In stata
:
neweyY X
1
…X
K
, lags(#)JIMMA UNIVERSITY
2008/09 CHAPTER 4 - 40 HASSEN A.

. 4.6 Endogenous Regressors: 4.6 Endogenous Regressors:
E(E(
ɛɛ
ii|X|X
jj) ) ≠≠00
)A ke
y
assumption maintained in the previous
lessons is that the model, E(Y|X) = X βor
, was correctly specified.
)The model Y = X β+ εis correctly specified if:
1.εis ortho
g
onal to the X's, enters the model
with an additivel
y
(separable effect on Y),
and this effect equals zero on average; and,
2. E(Y|X) is linear in stable parameters ( β's).
)If the assumption E(ε
i|X
j) = 0is violated, the
OLS estimators will be biased & inconsistent.

=
+=
K
i
ii
Xββ E(Y|X)
1
0JIMMA UNIVERSITY
2008/09 CHAPTER 4 - 41 HASSEN A.

. )Assumin
g
exo
g
enous re
g
ressors (ortho
g
onal
errors & X's) is unrealistic in many situations.
)The possible sources of endogeneity are:
1.stochastic regressors & measurement error; 2.specification errors: omission of relevant
variables or using a wrong functional form;
3.nonlinearity in & instability of parameters; and 4.bidirectional link between the X's and Y
(simultaneity or reverse causality);
)Recall two versions of exogeneity assumption: 1. E(ɛ
i) = 0 and X’s are fixed (non-stochastic),
2. E(ɛ
iX
j) = 0 or E(ɛ
i|X
j) = 0 with stochastic X’s.
4.6 Endogenous Regressors: 4.6 Endogenous Regressors:
E(E(
ɛɛ
ii|X|X
jj) ) ≠≠00JIMMA UNIVERSITY
2008/09 CHAPTER 4 - 42 HASSEN A.

. )The assumption E(ε
i) = 0amounts to: “We do
not systematically over- or under-estimate the
PRF,”or the overall impact of all the excluded
variables is random/unpredictable.
)This assumption cannot be tested as residuals
will alwa
y
s have zero mean if the model has an
intercept.
)If there is no intercept, some information can
be obtained by plotting the residuals.
)If E(E(ɛɛ
i i) = ) =
μμ(a constant (a constant ≠≠0) & 0) & X's are fixed, the the
estimators of all estimators of all ββ's, except 's, except ββ
00
, will be OK! , will be OK!
))But, can we assume non But, can we assume non-
-
stochastic regressors? stochastic regressors?
4.6 Endogenous Regressors: 4.6 Endogenous Regressors:
E(E(
ɛɛ
ii|X|X
jj) ) ≠≠00JIMMA UNIVERSITY
2008/09 CHAPTER 4 - 43 HASSEN A.

. A. Stochastic Regressors A. Stochastic Regressors
)Man
y
economic variables are stochastic, and it
is only for ease that we assumed fixed X's.
)For instance, the set of regressors may include:
*a lagged dependent variable (Y
t-1
), or
*an X characterized by a measurement error.
)In both of these cases, it is not reasonable to
assume fixed regressors.
)As lon
g
as no other assumption is violated, OLS
retains its desirable properties even if X's are
stochastic.
4.6.1 Stochastic Regressors and Measurement Error 4.6.1 Stochastic Regressors and Measurement ErrorJIMMA UNIVERSITY
2008/09 CHAPTER 4 - 44 HASSEN A.

. )In
g
eneral, stochastic re
g
ressors ma
y
or ma
y

not be correlated with the model error term.
1. If X & ɛare independently distributed, E(ɛ|X)
= 0, OLS retains all its desirable properties.
2. If X & ɛare not independent but are either
contemporaneously uncorrelated, [E(ɛ
i|X
i±s
) ≠
0for s = 1, 2, …butE(ɛ
i|X
i) = 0],or ɛ& X are
as
y
mptoticall
y
uncorrelated, OLS retains its
large sample properties: estimators are biased,
but consistentand asymptotically efficient .
)The basis for valid statistical inference remains
but inferences must be based on large samples.
4.6.1 Stochastic Regressors and Measurement Error 4.6.1 Stochastic Regressors and Measurement ErrorJIMMA UNIVERSITY
2008/09 CHAPTER 4 - 45 HASSEN A.

. 3.If X & ɛare not independent and are
correlated even as
y
mptoticall
y
, then OLS
estimators are biased and inconsistent.
)SOLUTION
: IV/2SLS
REGRESSION!
)Thus, it is not the stochastic (or fixed) nature of
re
g
ressors b
y
itself that matters, but the nature
of the correlation between X's & ɛ. B. Measurement Error
)Measurement error in the re
g
ressand (Y) onl
y

does not cause bias in OLS estimators as lon
g

as the measurement error is not s
y
stematicall
y

related to one or more of the regressors.
4.6.1 Stochastic Regressors and Measurement Error 4.6.1 Stochastic Regressors and Measurement ErrorJIMMA UNIVERSITY
2008/09 CHAPTER 4 - 46 HASSEN A.

. )If the measurement error in Y is uncorrelated
with X's, OLS is perfectl
y
applicable (thou
g
h
with less precision or higher variances).
)If there is a measurement error in a re
g
ressor
and this error is correlated with the measured
variable, then OLS estimators will be biased
and inconsistent.
)SOLUTION
: IV/2SLS REGRESSION!
4.6.1 Stochastic Regressors and Measurement Error 4.6.1 Stochastic Regressors and Measurement ErrorJIMMA UNIVERSITY
2008/09 CHAPTER 4 - 47 HASSEN A.

. )Model misspecification may result from:
)omission of relevant variable/s, )using a wrong functional form, or )inclusion of irrelevant variable/s.
1.Omission of relevant variables
: when one/more
relevant variables are omitted from a model.
)Omitted-variable bias
: bias in parameter
estimates when the assumed specification is
incorrect in that it omits a re
g
ressor that must
be in the model.
)e.g. estimating Y=β
0

1
X
1

2
X
2
+uwhen the
correct model is Y=β
0
+
β
1
X
1

2
X
2

3
Z+u.
4.6.2 Specification Errors 4.6.2 Specification ErrorsJIMMA UNIVERSITY
2008/09 CHAPTER 4 - 48 HASSEN A.

. 4.6.2 Specification Errors 4.6.2 Specification Errors
)Wrongly omitting a variable (Z) is equivalent
to imposing β
3
= 0when in fact β
3
≠0.
)If a relevant regressor (Z) is missing from a
model, OLS estimators of β's(β
0
, β
1
& β
2
) will
be biased, except if cov(Z,X
1
) = cov(Z,X
2
) = 0.
)Even if cov(Z,X
1
) = cov(Z,X
2
) = 0, the estimate
for β
0
is biased.
)The OLS estimators for σ
2
and for the
standard errors of the 's are also biased.
)Consequently, t-and F-tests will not be valid.
)In general, OLS estimators will be biased,
inconsistentand the inferenceswill be invalid .
β
ˆJIMMA UNIVERSITY
2008/09 CHAPTER 4 - 49 HASSEN A.

. )These consequences of wron
g
l
y
excludin
g

variables are clearly very serious and thus,
attempt should be made to include all the
relevant regressors.
)The decision to include/exclude variables should be
g
uided b
y
economic theor
y
and
reasoning.
4.6.2 Specification Errors 4.6.2 Specification ErrorsJIMMA UNIVERSITY
2008/09 CHAPTER 4 - 50 HASSEN A.

. 2. Error in the algebraic form of the relationship
:
a model that includes all the appropriate
re
g
ressors ma
y
still be misspecified due to
error in the functional form relating Y to X's.
)e.
g
. usin
g
a linear functional form when the
true relationship is logarithmic (log-lo
g
) or
semi-logarithmic (lin-log or log-lin).
)The effects of functional form misspecification
are the same as those of omittin
g
of relevant
variables, plus misleading inferences.
)A
g
ain, rel
y
on economic theor
y
, and not
j
ust on
statistical tests.
4.6.2 Specification Errors 4.6.2 Specification ErrorsJIMMA UNIVERSITY
2008/09 CHAPTER 4 - 51 HASSEN A.

. Testing for Omitted Variables and Functional
Form Misspecification
1. Examination of Residuals
)Most often, we use the plot of residuals versus
fitted valuesto have a quick
g
lance at problems
like nonlinearity.
)Ideall
y
, we would like to see residuals rather
randomly scattered around zero.
#In stata (after regression)
:rvfplot, yline(0)
)If in fact there are such errors as omitted
variables or incorrect functional form, a plot of
the residuals will exhibit distinct
p
atterns.
4.6.2 Specification Errors 4.6.2 Specification ErrorsJIMMA UNIVERSITY
2008/09 CHAPTER 4 - 52 HASSEN A.

. 2. Ramsey’s Regression Equation Specification
Error Test (RESET)
)It tests for misspecification due to omitted
variables or a wrong functional form.
)Steps
:
1. Regress Y on X's , and get Ŷ& ũ. 2. Regress:
a)Y on X'sŶ
2
& Ŷ
3
, or
b)ũon X's, Ŷ
2
& Ŷ
3
, or
c)ũon X's, X
2
's, X
i*X
j's(i ≠j).
3. If the new regressors (
Ŷ
2
& Ŷ
3
or
X
2
's, X
i*X
j's
)
are significant (as judged by F test), then re j
ect
H
0
, and conclude that there is misspecification.
4.6.2 Specification Errors 4.6.2 Specification ErrorsJIMMA UNIVERSITY
2008/09 CHAPTER 4 - 53 HASSEN A.

. #In stata(after regression):
ovtestorovtest, rhs
)If the original model is misspecified, then try
another model: look for some variables which
are left out and/or try a different functional
form like log-linear ( butbased on some theory).
)The test (by rejecting the null) does not suggest an alternative specification. 3.Inclusion of irrelevant variables
: when one/more
irrelevant variables are wrongly included in the model. e.g. estimating Y=β
0

1
X
1

2
X
2

3
X
3
+u
when the correct model is Y=
β
0

1
X
1

2
X
2
+u
.
4.6.2 Specification Errors 4.6.2 Specification ErrorsJIMMA UNIVERSITY
2008/09 CHAPTER 4 - 54 HASSEN A.

. )The consequence is that the OLS estimators will
remain unbiasedand consistentbut inefficient
(compared to OLS applied to the right model).

2
is correctl
y
estimated, and the conventional
hypothesis-testing methods are still valid.
)The onl
y
penalt
y
we pa
y
for the inclusion of the
superfluous variable/s is that the estimated variances of the coefficients are larger.
)As a result, our probabilit
y
inferences about the
parameters are less precise, i.e.,precision is lost
if the correct restriction β
3
= 0 is not imposed.
4.6.2 Specification Errors 4.6.2 Specification ErrorsJIMMA UNIVERSITY
2008/09 CHAPTER 4 - 55 HASSEN A.

. )To test for the presence of irrelevant variables,
use F-tests(based on RRSS & URSS) if you
have some ‘correct’model in your mind.
)Do not eliminate variables from a model based
on insignificance implied by t-tests.
)In particular, do not drop a variable with |t| > 1. )Do not drop two or more variables at once (on
the basis of t-tests) even if each has |t| < 1.
)The t statistic corresponding to an X (X
j) may
radically change once another (X
i) is dropped.
)A useful tool in judging the extra contribution
of regressors is the added variable plot.
4.6.2 Specification Errors 4.6.2 Specification ErrorsJIMMA UNIVERSITY
2008/09 CHAPTER 4 - 56 HASSEN A.

. )The added variable plot shows the (mar
g
inal)
effect of addin
g
a variable to the model after all
other variables have been included.
)In a multiple re
g
ression, the added variable plot
for a predictor, say X
j, is the plot showin
g
the
residuals of Y on all predictors except X
j
against the residuals of X
j
on all other X's.
#In stata (after regression)
: avplotsor avplotvarnarnes
)In
g
eneral, model misspecification due to the
inclusion of irrelevant variables is less serious
than that due to omission of relevant variable/s.
4.6.2 Specification Errors 4.6.2 Specification ErrorsJIMMA UNIVERSITY
2008/09 CHAPTER 4 - 57 HASSEN A.

. )Takin
g
bias as a more undesirable outcome
than inefficienc
y
, if one is in doubt about which
variables to include in a re
g
ression model, it is
better to err by including irrelevant variables.
)This is one reason behind the advocac
y
of
Hendry’s “general-to-specific”methodology.
)This preference is reinforced b
y
the fact that
standard errors are incorrect if variables are
wron
g
l
y
excluded, but not if variables are
wrongly included.
4.6.2 Specification Errors 4.6.2 Specification ErrorsJIMMA UNIVERSITY
2008/09 CHAPTER 4 - 58 HASSEN A.

• In general, the specification problem is less serious when the research aim is model comparison (to see which model fits the data better) than when the task is to justify (and use) a single model and assess the relative importance of the independent variables.
4.6.3 Stability of Parameters and the Dummy Variables Regression (DVR)
• So far we have assumed that the intercept and all the slope coefficients (βj's) are the same/stable for the whole set of observations: Y = Xβ + e.
• But structural shifts and/or group differences are common in the real world. It may be that:
   - the intercept differs/changes, or
   - the (partial) slope differs/changes, or
   - both the intercept and slope differ/change across categories or time periods.
• Two methods for testing parameter stability: (i) using Chow tests, or (ii) using DVR.
A. The Chow Tests
• Using an F-test to determine whether a single regression is more efficient than two (or more) separate regressions on sub-samples.
• The stages in running the Chow test are:
1. Run two separate regressions on the data (say, before and after a war or policy reform, …) and save the RSS's: RSS1 & RSS2.
   • RSS1 has n1 − (K+1) df and RSS2 has n2 − (K+1) df.
   • The sum RSS1 + RSS2 gives the URSS with n1 + n2 − 2(K+1) df.
2. Estimate the pooled/combined model (under H0: no significant change/difference in the β's).
   • The RSS from this model is the RRSS with n − (K+1) df, where n = n1 + n2.
3. Then, under H0, the test statistic is:
   F_cal = [(RRSS − URSS)/(K+1)] / [URSS/(n − 2(K+1))]
4. Find the critical value F(K+1, n − 2(K+1)) from the table.
5. Reject the null of stable parameters (in favor of Ha: there is a structural break) if F_cal > F_tab.
Example: Suppose we have the following results from the OLS estimation of real consumption on real disposable income:
i. For the period 1974-1991: cons_i = α1 + β1*inc_i + u_i
   Consumption = 153.95 + 0.75*Income
   p-value:       (0.000)  (0.000)
   RSS = 4340.26114; R² = 0.9982
ii. For the period 1992-2005: cons_i = α2 + β2*inc_i + u_i
   Consumption = 1.95 + 0.806*Income
   p-value:     (0.975)  (0.000)
   RSS = 10706.2127; R² = 0.9949
iii. For the period 1974-2005: cons_i = α + β*inc_i + u_i
   Consumption = 77.64 + 0.79*Income
   t-ratio:     (4.96)   (155.56)
   RSS = 22064.6663; R² = 0.9987
1. URSS = RSS1 + RSS2 = 15046.474
2. RRSS = 22064.6663
   • K = 1 and K + 1 = 2; n1 = 18, n2 = 15, n = 33.
3. Thus, F_cal = [(22064.6663 − 15046.474)/2] / [15046.474/29] = 6.7632981
4. p-value = Prob(F(2, 29) > 6.7632981) = 0.003883
5. So, reject the null that there is no structural break at the 1% level of significance.
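• The arithmetic of steps 1-5 is easy to reproduce. A rough Python sketch on simulated data (the original consumption series are not reproduced in the notes, so the numbers will differ):

# Sketch: Chow test via pooled vs. separate regressions on two sub-samples.
import numpy as np
from scipy import stats

def rss(y, X):
    # residual sum of squares from an OLS fit of y on X
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.sum((y - X @ beta) ** 2)

rng = np.random.default_rng(3)
n1, n2, K = 18, 15, 1
inc1 = rng.uniform(100, 200, n1); cons1 = 150 + 0.75 * inc1 + rng.normal(0, 5, n1)
inc2 = rng.uniform(200, 300, n2); cons2 = 2 + 0.80 * inc2 + rng.normal(0, 5, n2)

X1 = np.column_stack([np.ones(n1), inc1])
X2 = np.column_stack([np.ones(n2), inc2])
Xp = np.vstack([X1, X2]); yp = np.concatenate([cons1, cons2])

urss = rss(cons1, X1) + rss(cons2, X2)          # RSS1 + RSS2
rrss = rss(yp, Xp)                              # pooled RSS
df2 = n1 + n2 - 2 * (K + 1)
F = ((rrss - urss) / (K + 1)) / (urss / df2)
p = 1 - stats.f.cdf(F, K + 1, df2)
print(f"F({K+1},{df2}) = {F:.3f}, p-value = {p:.4f}")   # small p => structural break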
• The pooled consumption model is an inadequate specification, and thus we should run separate regressions for the two periods.
• The above method of calculating the Chow test breaks down if either n1 < K+1 or n2 < K+1.
• Solution: use Chow's second (predictive) test!
• If, for instance, n2 < K+1, then the F-statistic is altered as follows.
• Replace URSS by RSS1 and use the statistic:
   F_cal = [(RRSS − RSS1)/n2] / [RSS1/(n1 − (K+1))]
• The Chow test tells if the parameters differ on average, but not which parameters differ.
• The Chow test requires that all groups have the same error variance.
• This assumption is questionable: if the parameters can differ across groups, then so can the error variances.
• One method of correcting for unequal error variances is to use the dummy variable approach with White's robust standard errors.
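• A minimal sketch of that correction (invented data; fit(cov_type="HC1") is statsmodels' White-type heteroskedasticity-robust covariance):

# Sketch: dummy-variable regression with White (robust) standard errors.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 200
d = (np.arange(n) < 120).astype(float)             # 1 = first regime, 0 = second
x = rng.uniform(0, 10, n)
sigma = np.where(d == 1, 1.0, 3.0)                 # unequal error variances by group
y = 1 + 0.5 * d + 2 * x + 0.3 * d * x + rng.normal(0, sigma)

X = np.column_stack([np.ones(n), d, x, d * x])     # const, D, x, D*x
robust = sm.OLS(y, X).fit(cov_type="HC1")          # robust to the unequal variances
print(robust.summary())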
B. The Dummy Variables Regression
I. Introduction:
• Not all information can easily be quantified, so we need to incorporate qualitative information, e.g.:
1. Effect of belonging to a certain group:
   - gender, location, status, occupation
   - beneficiary of a program/policy
2. Ordinal variables:
   - answers to yes/no (or scaled) questions ...
• The effect of some quantitative variable may differ between groups/categories:
   - returns to education may differ between sexes or between ethnic groups …
• Interest in the determinants of belonging to a group:
   - determinants of being poor …
   - dummy dependent variable models (logit, probit, …)
• Dummy variable: a variable devised to use qualitative information in regression analysis.
• A dummy variable takes 2 values: usually 0/1.
e.g. Y_i = β0 + β1*D + u, where D = 1 for i ∈ group 1 and D = 0 for i ∉ group 1.
   - If D = 0, E(Y) = E(Y|D = 0) = β0
   - If D = 1, E(Y) = E(Y|D = 1) = β0 + β1
• Thus, the difference between the two groups (in mean values of Y) is E(Y|D = 1) − E(Y|D = 0) = β1.
• So, the significance of the difference between the groups is tested by a t-test of β1 = 0.
e.g. Wage differential between male and female workers.
• Two possible ways: a male dummy or a female dummy.
1. Define a male dummy (male = 1 & female = 0).
   In Stata: reg wage male
   Result: Y_i = 9.45 + 172.84*D + û_i
   p-value:     (0.000)  (0.000)
• Interpretation: the monthly wage of a male worker is, on average, $172.84 higher than that of a female worker.
• This difference is significant at the 1% level.
2. Define a female dummy (female = 1 & male = 0).
   In Stata: reg wage female
   Result: Y_i = 182.29 − 172.84*D + û_i
   p-value:      (0.000)   (0.000)
• Interpretation: the monthly wage of a female worker is, on average, $172.84 lower than that of a male worker.
• This difference is significant at the 1% level.
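• The dummy coefficient is just a difference in group means, which a quick sketch confirms (invented wage figures, not the data behind the results above):

# Sketch: regression on a 0/1 dummy recovers group means and their difference.
import numpy as np

rng = np.random.default_rng(4)
n = 400
male = rng.integers(0, 2, n)                       # 1 = male, 0 = female
wage = 180 + 170 * male + rng.normal(0, 30, n)     # invented wage process

X = np.column_stack([np.ones(n), male])
b0, b1 = np.linalg.lstsq(X, wage, rcond=None)[0]

print("intercept vs. female mean:", b0, wage[male == 0].mean())
print("dummy coef vs. mean difference:", b1,
      wage[male == 1].mean() - wage[male == 0].mean())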
II. Using the DVR to Test for Structural Break:
• Recall the example of the consumption function:
   period 1: cons_i = α1 + β1*inc_i + u_i   vs.   period 2: cons_i = α2 + β2*inc_i + u_i
• Let's define a dummy variable D1, where D1 = 1 for the period 1974-1991 and D1 = 0 for the period 1992-2005.
• Then: cons_i = α0 + α1*D1 + β0*inc_i + β1*(D1*inc_i) + u_i
For period 1: cons_i = (α0 + α1) + (β0 + β1)*inc_i + u_i
   Intercept = α0 + α1; Slope (= MPC) = β0 + β1.
For period 2 (the base category): cons_i = α0 + β0*inc_i + u_i
   Intercept = α0; Slope (= MPC) = β0.
• Regressing cons on inc, D1 and (D1*inc) gives:
   cons = 1.95 + 152*D1 + 0.806*inc − 0.056*(D1*inc)
   p-value: (0.968) (0.010)  (0.000)     (0.002)
• Substituting D1 = 1 for i ∈ period 1 and D1 = 0 for i ∈ period 2:
   period 1 (1974-1991): cons = 153.95 + 0.75*inc
   period 2 (1992-2005): cons = 1.95 + 0.806*inc
• The Chow test is equivalent to testing α1 = β1 = 0 in:
   cons = 1.95 + 152*D1 + 0.806*inc − 0.056*(D1*inc)
• In Stata (after regression): test D1 = D1*inc = 0
• This gives F(2, 29) = 6.76; p-value = 0.0039.
• Then, reject H0! There is a structural break!
• Comparing the two methods, it is preferable to use the method of dummy variables regression.
• This is because with the method of DVR:
1. we run only one regression;
2. we can test whether the change is in the intercept only, in the slope only, or in both (see the sketch below).
   In our example, the change is in both. Why?
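• A rough Python sketch of the DVR break test (simulated data standing in for the consumption series; the joint restriction is tested with an F test):

# Sketch: dummy-variable regression test for a structural break.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n1, n2 = 18, 15
inc = np.concatenate([rng.uniform(100, 200, n1), rng.uniform(200, 300, n2)])
d1 = np.concatenate([np.ones(n1), np.zeros(n2)])            # 1 = first period
cons = np.where(d1 == 1, 154 + 0.75 * inc, 2 + 0.80 * inc) + rng.normal(0, 5, n1 + n2)

X = np.column_stack([np.ones(n1 + n2), d1, inc, d1 * inc])  # const, D1, inc, D1*inc
fit = sm.OLS(cons, X).fit()

# Joint test that the coefficients on D1 and D1*inc are both zero (no break);
# the individual t-tests show whether the intercept, the slope, or both shift.
R = np.zeros((2, 4)); R[0, 1] = 1; R[1, 3] = 1
print(fit.f_test(R))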
• For a total of m categories, use m − 1 dummies!
• Including m dummies (one for each group) results in perfect multicollinearity (the dummy variable trap), e.g. 2 groups & 2 dummies:
• constant = D1 + D2 !!!
   X = [constant  X  D1  D2] =
   [ 1  X11  1  0 ]
   [ 1  X12  1  0 ]
   [ 1  X13  0  1 ]
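• The trap can also be seen as a rank problem: with a constant plus a dummy for every group, the design matrix loses full column rank and X'X is singular. A tiny sketch with invented data:

# Sketch: the dummy variable trap as a rank problem.
import numpy as np

group = np.array([0, 0, 1, 1, 0, 1])                # two groups
d1 = (group == 0).astype(float)
d2 = (group == 1).astype(float)
x = np.arange(6, dtype=float)

X_trap = np.column_stack([np.ones(6), x, d1, d2])   # constant + both dummies
X_ok = np.column_stack([np.ones(6), x, d1])         # drop one dummy

print("rank with both dummies:", np.linalg.matrix_rank(X_trap), "of", X_trap.shape[1])
print("rank with m-1 dummies: ", np.linalg.matrix_rank(X_ok), "of", X_ok.shape[1])
print("det(X'X) in the trap:", np.linalg.det(X_trap.T @ X_trap))   # (numerically) zero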
4.6.4 Simultaneity Bias
• Simultaneity occurs when an equation is part of a simultaneous equations system, such that causation runs from Y to X as well as from X to Y.
• In such a case, cov(X, ε) ≠ 0 and the OLS estimators are biased and inconsistent.
• Such situations are pervasive in economic models, so simultaneity bias is a vital issue.
e.g. the simple Keynesian consumption function.
• Structural-form model: consists of the national accounts identity and a basic consumption function, i.e., a pair of simultaneous equations:
   Y_t = C_t + I_t
   C_t = α + β*Y_t + U_t
• Y_t & C_t are endogenous (simultaneously determined) and I_t is exogenous.
• Reduced form: expresses each endogenous variable as a function of exogenous variables (and/or predetermined variables – lagged endogenous variables, if present) and the random error term(s).
• The reduced form is:
   Y_t = [1/(1 − β)]*(α + I_t + U_t)
   C_t = [1/(1 − β)]*(α + β*I_t + U_t)
• The reduced form equation for Y_t shows that Y_t, in C_t = α + β*Y_t + U_t, is correlated with U_t:
   cov(Y_t, U_t) = cov{[1/(1 − β)]*(α + I_t + U_t), U_t}
                 = [1/(1 − β)]*[cov(α, U_t) + cov(I_t, U_t) + cov(U_t, U_t)]
                 = [1/(1 − β)]*var(U_t) = σ²_U/(1 − β) ≠ 0
• So the OLS estimators for β (the MPC) & α (autonomous consumption) are biased and inconsistent.
• Solution: IV/2SLS.
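• The bias and the IV remedy can be illustrated with a small simulation (all numbers invented): OLS on the consumption function overstates the MPC because cov(Y_t, U_t) > 0, while using I_t as an instrument recovers it.

# Sketch: simultaneity bias in the Keynesian consumption function, and the IV fix.
import numpy as np

rng = np.random.default_rng(6)
n, alpha, beta = 5000, 50.0, 0.8

inv = rng.uniform(20, 60, n)               # exogenous investment I_t
U = rng.normal(0, 5, n)                    # structural error U_t
Y = (alpha + inv + U) / (1 - beta)         # reduced form for income
C = alpha + beta * Y + U                   # structural consumption function

X = np.column_stack([np.ones(n), Y])
beta_ols = np.linalg.lstsq(X, C, rcond=None)[0][1]     # biased upward

beta_iv = np.cov(C, inv)[0, 1] / np.cov(Y, inv)[0, 1]  # simple IV, I_t as instrument

print("true MPC:", beta, " OLS:", round(beta_ols, 3), " IV:", round(beta_iv, 3))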
… THE END …
GOOD LUCK!