Econ 423 Lecture Notes:
Additional Topics in Time Series [1]
John C. Chao
April 25, 2017
[1] These notes are based in large part on Chapter 16 of Stock and Watson (2011).
They are for instructional purposes only and are not to be distributed outside of the
classroom.
John C. Chao () April 25, 2017 1 / 34
Vector Autoregression (VAR)
Motivation: One may be interested in forecasting two or more
variables, such as the rate of inflation, the rate of unemployment, the
growth rate of GDP, and interest rates. In this case, it is beneficial to
develop a single model that allows you to forecast all of these
variables jointly, as a system.
Definition: A VAR($p$), i.e., a vector autoregression of order $p$, is a
set of $m$ time series regressions in which the regressors are the $p$
lagged values of each of the $m$ time series variables.
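Example: $m = 2$ case
$$
\begin{aligned}
Y_{1t} &= \beta_{10} + \beta_{11} Y_{1,t-1} + \cdots + \beta_{1p} Y_{1,t-p}
          + \gamma_{11} Y_{2,t-1} + \cdots + \gamma_{1p} Y_{2,t-p} + u_{1t}, \\
Y_{2t} &= \beta_{20} + \beta_{21} Y_{1,t-1} + \cdots + \beta_{2p} Y_{1,t-p}
          + \gamma_{21} Y_{2,t-1} + \cdots + \gamma_{2p} Y_{2,t-p} + u_{2t}.
\end{aligned}
$$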
Estimation and Inference
Algebraically, the VAR model is simply a system of $m$ linear
regressions; or, to put it another way, it is a multivariate linear
regression model.
The coefficients of the VAR can be estimated by estimating each
equation by OLS.
Under appropriate conditions, the OLS estimators are consistent and
have a joint normal distribution in large samples in the stationary case.
In consequence, in the stationary case, inference can proceed in the
usual way; for example, a 95% confidence interval for a coefficient can
be constructed based on the usual rule:
estimated coefficient $\pm$ 1.96 $\times$ standard error.
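As a rough illustration of equation-by-equation OLS and the 1.96 rule, the following Python sketch simulates a stationary two-variable VAR(1) and then re-estimates it. The function names, the simulated coefficient values, and the use of homoskedasticity-only standard errors are assumptions made for this illustration; they do not come from the textbook.

```python
import numpy as np

def simulate_var1(A, c, T, seed=0):
    """Simulate Y_t = c + A Y_{t-1} + u_t with i.i.d. N(0, I) errors."""
    rng = np.random.default_rng(seed)
    Y = np.zeros((T, len(c)))
    for t in range(1, T):
        Y[t] = c + A @ Y[t - 1] + rng.standard_normal(len(c))
    return Y

def fit_var_ols(Y, p):
    """OLS for each equation of a VAR(p); Y has shape (T, m)."""
    T, m = Y.shape
    # Regressors: intercept, then lag 1 of all m variables, ..., lag p.
    lags = [Y[p - k:T - k] for k in range(1, p + 1)]
    X = np.column_stack([np.ones(T - p)] + lags)
    Y_dep = Y[p:]
    B = np.linalg.lstsq(X, Y_dep, rcond=None)[0]   # column i = equation i
    resid = Y_dep - X @ B
    # Homoskedasticity-only standard errors (an assumption of this sketch).
    n, k = X.shape
    sigma2 = (resid ** 2).sum(axis=0) / (n - k)
    se = np.sqrt(np.outer(np.diag(np.linalg.inv(X.T @ X)), sigma2))
    return B, se

A = np.array([[0.5, 0.1], [0.2, 0.3]])   # hypothetical lag coefficients
c = np.array([1.0, 0.0])                 # hypothetical intercepts
Y = simulate_var1(A, c, T=5000)
B, se = fit_var_ols(Y, p=1)
ci_low, ci_high = B - 1.96 * se, B + 1.96 * se   # 95% confidence intervals
```

Each column of `B` holds the coefficients of one equation, so `B[1:].T` can be compared directly with the matrix `A` used in the simulation.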
Estimation and Inference (con’t)
An Advantage of the VAR: By modeling the dynamics of $m$
variables as a system, one can test joint hypotheses that involve
restrictions across multiple equations.
Example: In a two-variable VAR(1), one might be interested in
testing the null hypothesis
$$
H_0: \beta_{11} - \beta_{21} = 0
$$
on the unrestricted model
$$
\begin{aligned}
Y_{1t} &= \beta_{10} + \beta_{11} Y_{1,t-1} + \gamma_{11} Y_{2,t-1} + u_{1t}, \\
Y_{2t} &= \beta_{20} + \beta_{21} Y_{1,t-1} + \gamma_{21} Y_{2,t-1} + u_{2t}.
\end{aligned}
$$
Since the estimated coefficients have a jointly normal large sample
distribution, the restrictions on these coefficients can be tested by
computing the t- or the F-statistic.
Importantly, many hypotheses of interest to economists can be
formulated as cross-equation restrictions.
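One way to compute a t-statistic for the cross-equation restriction in the example is sketched below in Python. The sketch assumes homoskedastic errors; in that case, because both equations share the same regressors, the covariance between the OLS coefficient vectors of equations $i$ and $j$ is the $(i,j)$ error covariance times $(X'X)^{-1}$. The function names and the simulated parameter values are hypothetical.

```python
import numpy as np

def cross_equation_t(Y):
    """t-statistic for H0: beta_11 - beta_21 = 0 in a two-variable VAR(1)."""
    X = np.column_stack([np.ones(len(Y) - 1), Y[:-1]])  # 1, Y1_{t-1}, Y2_{t-1}
    Y_dep = Y[1:]
    B = np.linalg.solve(X.T @ X, X.T @ Y_dep)           # column i = equation i
    U = Y_dep - X @ B
    n, k = X.shape
    Sigma = U.T @ U / (n - k)                           # error covariance estimate
    v = np.linalg.inv(X.T @ X)[1, 1]                    # entry for Y1_{t-1}
    diff = B[1, 0] - B[1, 1]                            # beta_11_hat - beta_21_hat
    # Var(b_11 - b_21) = (s_11 + s_22 - 2 s_12) * v under homoskedasticity.
    var = (Sigma[0, 0] + Sigma[1, 1] - 2 * Sigma[0, 1]) * v
    return diff / np.sqrt(var)

def simulate(A, T=2000, seed=0):
    """Simulate a zero-intercept VAR(1) with i.i.d. N(0, I) errors."""
    rng = np.random.default_rng(seed)
    Y = np.zeros((T, 2))
    for t in range(1, T):
        Y[t] = A @ Y[t - 1] + rng.standard_normal(2)
    return Y

# Hypothetical coefficient matrices: first column holds (beta_11, beta_21).
t_null = cross_equation_t(simulate(np.array([[0.4, 0.1], [0.4, 0.2]])))   # H0 true
t_alt = cross_equation_t(simulate(np.array([[0.5, 0.1], [-0.3, 0.2]])))   # H0 false
```

In large samples the statistic would be compared with standard normal critical values, e.g. rejecting at the 5% level when its absolute value exceeds 1.96.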
Modeling Issues
How many variables should be included in a VAR?
(i) Including more variables means more coefficients to estimate,
which, in turn, increases estimation error and can result in a
deterioration of forecast accuracy.
(ii) For example, a VAR with 5 variables and 4 lags has 21
coefficients to estimate in each equation, leading to a total of 105
coefficients that must be estimated.
(iii) The practical implication is to keep $m$ relatively small and to
make sure that the variables included are plausibly related to each
other, so that they will be useful in forecasting one another.
(iv) For example, the inflation rate, the
unemployment rate, and the short-term interest rate are related to one
another, suggesting that it would be useful to model these variables
together in a VAR system.
Modeling Issues (con’t)
Determining the lag order in VAR’s
(i) The lag order can be chosen using either F-tests or an
information criterion, but the latter is preferred as it trades off
goodness of fit against the dimension of the model, whereas the F-test
does not.
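(ii) BIC in the vector case:
$$
\mathrm{BIC}(p) = \ln\left[\det\left(\widehat{\Sigma}_u\right)\right] + m(mp+1)\,\frac{\ln T}{T},
$$
where $\widehat{\Sigma}_u$ is an estimate of $\Sigma_u$, the $m \times m$ covariance matrix of the
VAR errors. Let $\widehat{u}_{it}$ and $\widehat{u}_{jt}$ be the OLS residuals for the
$i$-th and $j$-th equations, respectively, and note that the $(i,j)$-th element of
$\widehat{\Sigma}_u$ is given by
$$
\widehat{\Sigma}_u(i,j) = \frac{1}{T} \sum_{t=1}^{T} \widehat{u}_{it}\,\widehat{u}_{jt},
$$
i.e., an estimate of $\mathrm{Cov}(u_{it}, u_{jt})$. Moreover, $\det(\widehat{\Sigma}_u)$ denotes the
determinant of the matrix $\widehat{\Sigma}_u$.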
Modeling Issues (con’t)
Determining the lag order in VAR’s (con’t)
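(iii) AIC in the vector case: Analogous to BIC,
$$
\mathrm{AIC}(p) = \ln\left[\det\left(\widehat{\Sigma}_u\right)\right] + m(mp+1)\,\frac{2}{T}.
$$
(iv) In both criteria, the penalty term is proportional to $m(mp+1)$,
which is the total number of coefficients in an $m$-variable VAR($p$)
model, as there are $m$ equations, each having an intercept as well as $p$
lags of the $m$ variables.
(v) Increasing the lag order by one therefore adds
$m^2$ coefficients to estimate.
(vi) Let $\mathcal{P} = \{0, 1, 2, \ldots, \bar{p}\}$ be the set of candidate lag orders, where
$\bar{p}$ is the largest lag order considered. As in the univariate case, we can select the
lag order based on BIC or AIC using the following estimation rule:
$$
\widehat{p}_{BIC} = \arg\min_{p \in \mathcal{P}} \mathrm{BIC}(p),
\qquad
\widehat{p}_{AIC} = \arg\min_{p \in \mathcal{P}} \mathrm{AIC}(p).
$$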
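The lag selection rule can be sketched in Python as follows. This is a minimal illustration, not the textbook's code: the function name, the simulated VAR(1) data, and the choice to fit every candidate model on the same sample (reserving the first `p_max` observations as initial values so that $T$ is the same for all $p$) are assumptions of the sketch.

```python
import numpy as np

def var_information_criteria(Y, p_max):
    """Return {p: (BIC(p), AIC(p))} for p = 0, ..., p_max; Y has shape (T, m)."""
    T_full, m = Y.shape
    Y_dep = Y[p_max:]                 # common estimation sample for all p
    T = Y_dep.shape[0]
    out = {}
    for p in range(p_max + 1):
        lags = [Y[p_max - k:T_full - k] for k in range(1, p + 1)]
        X = np.column_stack([np.ones(T)] + lags)
        B = np.linalg.lstsq(X, Y_dep, rcond=None)[0]
        U = Y_dep - X @ B
        Sigma_hat = U.T @ U / T       # (i, j) entry: (1/T) * sum_t u_it * u_jt
        logdet = np.linalg.slogdet(Sigma_hat)[1]
        n_coef = m * (m * p + 1)      # total number of VAR coefficients
        out[p] = (logdet + n_coef * np.log(T) / T, logdet + n_coef * 2 / T)
    return out

# Hypothetical data: a two-variable VAR(1) with i.i.d. N(0, I) errors.
rng = np.random.default_rng(0)
A = np.array([[0.5, 0.1], [0.2, 0.3]])
Y = np.zeros((1000, 2))
for t in range(1, 1000):
    Y[t] = A @ Y[t - 1] + rng.standard_normal(2)

ic = var_information_criteria(Y, p_max=4)
p_bic = min(ic, key=lambda p: ic[p][0])   # arg min of BIC over candidate lags
p_aic = min(ic, key=lambda p: ic[p][1])   # arg min of AIC over candidate lags
```

Because $\ln T > 2$ for any realistic sample size, the BIC penalizes additional lags more heavily than the AIC, so the BIC choice is never larger than the AIC choice when both are computed on the same sample.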
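Empirical Example: A VAR Model of the Rates of Inflation
and Unemployment
Estimating a VAR(4) model for $\Delta Inf_t$ and $Unemp_t$ using data from
1982:I to 2004:IV gives the following result (standard errors in
parentheses):
$$
\begin{aligned}
\widehat{\Delta Inf}_t ={}& \underset{(0.55)}{1.47}
  - \underset{(0.12)}{0.64}\,\Delta Inf_{t-1}
  - \underset{(0.10)}{0.64}\,\Delta Inf_{t-2}
  - \underset{(0.11)}{0.13}\,\Delta Inf_{t-3} \\
& - \underset{(0.09)}{0.13}\,\Delta Inf_{t-4}
  - \underset{(0.58)}{3.49}\,Unemp_{t-1}
  + \underset{(0.94)}{2.80}\,Unemp_{t-2} \\
& + \underset{(1.07)}{2.44}\,Unemp_{t-3}
  - \underset{(0.55)}{2.03}\,Unemp_{t-4},
\end{aligned}
$$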
Empirical Example (con’t)
Granger Causality Tests (con’t)
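(c)
for changes in inflation, given lags in inflation.
(d) Similarly, one can test the null hypothesis that lags of the change
in inflation do not help to predict the
unemployment rate, i.e.,
$$
H_0: \beta_{21} = \beta_{22} = \beta_{23} = \beta_{24} = 0.
$$
Here, $F = 0.16$ with $p$-value $= 0.96$, so that $H_0$ is not rejected in this
case.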
Cointegration
Intuitive Notion of Common Stochastic Trend:
It is possible that two or more time series with stochastic trends can
move together so closely over the long run that they appear to have
the same trend component. In this case, they are said to share a
common stochastic trend.
Orders of Integration, Differencing, and Stationarity
1. If $Y_t$ is integrated of order one (denoted $Y_t \sim I(1)$), then its first
difference $\Delta Y_t$ is stationary, i.e., $\Delta Y_t \sim I(0)$. In this case, $Y_t$ has a
unit autoregressive root.
2. If $Y_t$ is integrated of order two (denoted $Y_t \sim I(2)$), then its second
difference $\Delta^2 Y_t$ is stationary. In this case, $\Delta Y_t \sim I(1)$.
3. If $Y_t$ is integrated of order $d$ (denoted $Y_t \sim I(d)$), then $\Delta^d Y_t$ is
stationary, i.e., $Y_t$ must be differenced $d$ times in order to produce a
series that is stationary.
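The relationship between the order of integration and differencing can be checked on simulated data. In the Python sketch below, the series names and the sample size are arbitrary choices made for illustration: a random walk is built by cumulating i.i.d. shocks once, an I(2) series by cumulating twice, and differencing the appropriate number of times recovers the stationary shocks.

```python
import numpy as np

rng = np.random.default_rng(0)
e = rng.standard_normal(500)   # stationary I(0) shocks
y1 = np.cumsum(e)              # random walk, I(1)
y2 = np.cumsum(y1)             # cumulated random walk, I(2)

d1 = np.diff(y1)               # first difference of the I(1) series
d2 = np.diff(y2, n=2)          # second difference of the I(2) series
```

Here `d1` reproduces `e[1:]` and `d2` reproduces `e[2:]` up to rounding error, while a single difference of `y2` only gives back the random walk `y1[1:]`, which is still I(1).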
Cointegration (con’t)
Definition of Cointegration:
Suppose that $X_t$ and $Y_t$ are integrated of order one. If, for some
coefficient $\theta$, $Z_t = Y_t - \theta X_t$ is integrated of order zero, then $X_t$ and
$Y_t$ are said to be cointegrated. The coefficient $\theta$ is called the
cointegrating coefficient.
Remark:
If $X_t$ and $Y_t$ are cointegrated, then they have the same, or common,
stochastic trend. Computing the difference $Y_t - \theta X_t$ then eliminates
this common stochastic trend.
Deciding If Variables Are Cointegrated
Three ways to decide whether two variables are cointegrated:
1. Use expert knowledge and economic theory.
2. Graph the series and see whether they appear to have a common
stochastic trend.
3. Perform a statistical test for cointegration.
Testing for Cointegration
Some Observations: Let $Y_t$ and $X_t$ be two time series such that
$Y_t \sim I(1)$ and $X_t \sim I(1)$.
1. If $Y_t$ and $X_t$ are cointegrated with cointegrating coefficient $\theta$, then
$Y_t - \theta X_t \sim I(0)$.
2. On the other hand, if $Y_t$ and $X_t$ are not cointegrated, then
$Y_t - \theta X_t \sim I(1)$.
3. Observations 1 and 2 suggest that we can test for the presence of
cointegration by testing
$$
H_0: Y_t - \theta X_t \sim I(1) \quad \text{versus} \quad H_1: Y_t - \theta X_t \sim I(0).
$$
Two Cases
1. $\theta$ is known, i.e., a value for $\theta$ is suggested by expert knowledge or by
economic theory. In this case, one can simply construct the time series
$$
Z_t = Y_t - \theta X_t
$$
and test the null hypothesis $H_0: Y_t - \theta X_t \sim I(1)$ using the
augmented Dickey-Fuller test.
Testing for Cointegration (con’t)
2. $\theta$ is unknown: In this case, perhaps the easiest approach is to adopt a
two-step procedure.
Step 1: Estimate the cointegrating coefficient $\theta$ by OLS estimation of
the regression
$$
Y_t = \alpha + \theta X_t + Z_t
$$
and obtain the residual series $\widehat{Z}_t = Y_t - \widehat{\alpha} - \widehat{\theta} X_t$.
Step 2: Apply a unit root test, such as the augmented Dickey-Fuller
test, to test whether the residual series $\widehat{Z}_t$ is an $I(1)$ process (Engle
and Granger, 1987; Phillips and Ouliaris, 1990).
Testing for Cointegration (con’t)
3. Remark: A complication which arises when $\theta$ is unknown is that,
under $H_0$, $\widehat{Z}_t \sim I(1)$, so that the regression of $Y_t$ on $X_t$ is a spurious
regression, which implies, in particular, that $\widehat{\theta}$ is not a consistent
estimator. As a result, we cannot use the same critical values which
apply in Case 1 discussed earlier.
4. The residual-based test can be extended
to cases with more than one regressor (e.g., the case with $k$ regressors
$X_{1t}, \ldots, X_{kt}$) by running the multiple regression
$$
Y_t = \alpha + \theta_1 X_{1t} + \cdots + \theta_k X_{kt} + Z_t
$$
and testing the residual process $\widehat{Z}_t = Y_t - \widehat{\alpha} - \widehat{\theta}_1 X_{1t} - \cdots - \widehat{\theta}_k X_{kt}$
for the presence of a unit root. Critical values for the residual-based
cointegration test do depend on the number of regressors, however.
Testing for Cointegration (con’t)
Table: Critical Values for Residual-Based Tests for Cointegration

# of X's in the regression     10%      5%      1%
1                            -3.12   -3.41   -3.96
2                            -3.52   -3.80   -4.36
3                            -3.84   -4.16   -4.73
4                            -4.20   -4.49   -5.07
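The two-step residual-based test can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the second step uses a Dickey-Fuller regression with an intercept and no lagged differences (the non-augmented special case), the function names are hypothetical, and the data are simulated so that the series are cointegrated with $\theta = 2$. The constant `CRIT_5PCT_ONE_X` is the 5% entry of the table for one regressor.

```python
import numpy as np

CRIT_5PCT_ONE_X = -3.41   # 5% critical value from the table, one regressor

def ols(X, y):
    """Least-squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def df_t_stat(z):
    """Dickey-Fuller t-statistic: regress dz_t on an intercept and z_{t-1}."""
    dz = np.diff(z)
    X = np.column_stack([np.ones(len(dz)), z[:-1]])
    b = ols(X, dz)
    resid = dz - X @ b
    s2 = resid @ resid / (len(dz) - 2)
    return b[1] / np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])

def engle_granger(y, x):
    """Step 1: OLS of y on (1, x). Step 2: DF t-statistic of the residuals."""
    X = np.column_stack([np.ones(len(x)), x])
    alpha_hat, theta_hat = ols(X, y)
    z_hat = y - alpha_hat - theta_hat * x
    return theta_hat, df_t_stat(z_hat)

rng = np.random.default_rng(0)
x = np.cumsum(rng.standard_normal(500))          # I(1) regressor
y = 1.0 + 2.0 * x + rng.standard_normal(500)     # cointegrated with theta = 2
theta_hat, t_stat = engle_granger(y, x)
reject_no_cointegration = t_stat < CRIT_5PCT_ONE_X
```

The null hypothesis of no cointegration is rejected when the statistic is more negative than the critical value. In applied work one would normally add lagged differences of the residuals to the second-step regression, as in the augmented Dickey-Fuller test.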
Vector Error Correction Model
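Suppose that $X_t \sim I(1)$ and $Y_t \sim I(1)$, and suppose that $X_t$ and $Y_t$
are cointegrated. Then, it turns out that a bivariate VAR model in
terms of the first differences $\Delta X_t$ and $\Delta Y_t$ is misspecified.
The correct model will include the term $Y_{t-1} - \theta X_{t-1}$ in addition to
the lagged values of $\Delta X_t$ and $\Delta Y_t$.
More specifically, the correct model is of the form
$$
\begin{aligned}
\Delta Y_t &= \beta_{10} + \beta_{11} \Delta Y_{t-1} + \cdots + \beta_{1p} \Delta Y_{t-p}
  + \gamma_{11} \Delta X_{t-1} + \cdots + \gamma_{1p} \Delta X_{t-p}
  + \alpha_1 (Y_{t-1} - \theta X_{t-1}) + u_{1t}, \\
\Delta X_t &= \beta_{20} + \beta_{21} \Delta Y_{t-1} + \cdots + \beta_{2p} \Delta Y_{t-p}
  + \gamma_{21} \Delta X_{t-1} + \cdots + \gamma_{2p} \Delta X_{t-p}
  + \alpha_2 (Y_{t-1} - \theta X_{t-1}) + u_{2t}.
\end{aligned}
$$
This model is known as the vector error correction model (VECM),
and the term $Y_{t-1} - \theta X_{t-1}$ is called the error correction term.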
Vector Error Correction Model (con’t)
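Remarks:
1. In a VECM, past values of the error correction term $Y_t - \theta X_t$ help to
predict future values of $\Delta Y_t$ and/or $\Delta X_t$.
2. Note also that a VAR model in first differences is misspecified in this
case precisely because it omits the error correction term.
In the case where $\theta$ is known, set $Z_{t-1} = Y_{t-1} - \theta X_{t-1}$, and we have
$$
\begin{aligned}
\Delta Y_t &= \beta_{10} + \beta_{11} \Delta Y_{t-1} + \cdots + \beta_{1p} \Delta Y_{t-p}
  + \gamma_{11} \Delta X_{t-1} + \cdots + \gamma_{1p} \Delta X_{t-p}
  + \alpha_1 Z_{t-1} + u_{1t}, \\
\Delta X_t &= \beta_{20} + \beta_{21} \Delta Y_{t-1} + \cdots + \beta_{2p} \Delta Y_{t-p}
  + \gamma_{21} \Delta X_{t-1} + \cdots + \gamma_{2p} \Delta X_{t-p}
  + \alpha_2 Z_{t-1} + u_{2t},
\end{aligned}
$$
so that the parameters of the VECM can be estimated by linear least
squares in this case.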
Vector Error Correction Model (con’t)
In the case where $\theta$ is unknown, the VECM is nonlinear in the
parameters (because $\theta$ enters multiplied by $\alpha_1$ and $\alpha_2$), so that
one cannot directly apply linear least squares.
In this case, there are a few different approaches to estimating the
parameters of a VECM.
1. Approach 1: Two-step procedure.
(i) Step 1: Estimate $\theta$ by a preliminary OLS regression
$$
Y_t = \alpha + \theta X_t + Z_t
$$
and obtain the residual $\widehat{Z}_{t-1} = Y_{t-1} - \widehat{\theta} X_{t-1}$.
Vector Error Correction Model (con’t)
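(ii) Step 2: Plug $\widehat{Z}_{t-1}$ into the VECM specification to obtain
$$
\begin{aligned}
\Delta Y_t &= \beta_{10} + \beta_{11} \Delta Y_{t-1} + \cdots + \beta_{1p} \Delta Y_{t-p}
  + \gamma_{11} \Delta X_{t-1} + \cdots + \gamma_{1p} \Delta X_{t-p}
  + \alpha_1 \widehat{Z}_{t-1} + \widehat{u}_{1t}, \\
\Delta X_t &= \beta_{20} + \beta_{21} \Delta Y_{t-1} + \cdots + \beta_{2p} \Delta Y_{t-p}
  + \gamma_{21} \Delta X_{t-1} + \cdots + \gamma_{2p} \Delta X_{t-p}
  + \alpha_2 \widehat{Z}_{t-1} + \widehat{u}_{2t}.
\end{aligned}
$$
The remaining parameters of the VECM can then be estimated by
linear least squares. Note that this approach exploits the fact that $\widehat{\theta}$ is a
consistent estimator of $\theta$ if the assumption of cointegration is correct.
Moreover, the rate of convergence of this estimator is $T$, which is faster
than the usual $\sqrt{T}$ convergence rate.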
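The two-step procedure can be sketched in Python as follows. The function name and the simulated data are assumptions of this illustration. In the simulated design, $X_t$ is a random walk and $Y_t = \theta X_t + Z_t$ with $Z_t = 0.5 Z_{t-1} + e_t$, which implies $\alpha_1 = -0.5$ and $\alpha_2 = 0$ in the VECM.

```python
import numpy as np

def vecm_two_step(y, x, p=1):
    """Two-step estimation of a bivariate VECM with p lagged differences.

    Returns theta_hat and a (2 + 2p, 2) coefficient array. Column 0 is the
    dY equation and column 1 the dX equation; rows are the intercept, the
    lags of dY, the lags of dX, and the error correction coefficient.
    """
    T = len(y)
    # Step 1: preliminary OLS of Y_t on an intercept and X_t.
    X1 = np.column_stack([np.ones(T), x])
    theta_hat = np.linalg.lstsq(X1, y, rcond=None)[0][1]
    z_hat = y - theta_hat * x
    # Step 2: OLS of each differenced series on lagged differences and z_hat.
    dy, dx = np.diff(y), np.diff(x)            # dy[i] is the change at time i + 1
    dep = np.column_stack([dy[p:], dx[p:]])
    lags_dy = [dy[p - k:len(dy) - k] for k in range(1, p + 1)]
    lags_dx = [dx[p - k:len(dx) - k] for k in range(1, p + 1)]
    ect = z_hat[p:T - 1]                       # Z_{t-1}, aligned with dy[p:]
    X2 = np.column_stack([np.ones(len(dep))] + lags_dy + lags_dx + [ect])
    coefs = np.linalg.lstsq(X2, dep, rcond=None)[0]
    return theta_hat, coefs

# Hypothetical cointegrated data with theta = 2.
rng = np.random.default_rng(0)
T, theta = 3000, 2.0
x = np.cumsum(rng.standard_normal(T))
e = rng.standard_normal(T)
z = np.zeros(T)
for t in range(1, T):
    z[t] = 0.5 * z[t - 1] + e[t]
y = theta * x + z

theta_hat, coefs = vecm_two_step(y, x, p=1)
alpha1_hat, alpha2_hat = coefs[-1]             # error correction coefficients
```

A negative `alpha1_hat` means that when $Y_{t-1}$ is above its long-run relationship with $X_{t-1}$, $Y_t$ tends to fall back toward it, which is the error correction mechanism.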
Vector Error Correction Model (con’t)
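2. Approach 2: A more efficient approach is to estimate all the
parameters $\theta$,
$\beta_{10}, \ldots, \beta_{1p}$, $\beta_{20}, \ldots, \beta_{2p}$,
$\gamma_{11}, \ldots, \gamma_{1p}$, $\gamma_{21}, \ldots, \gamma_{2p}$,
and $(\alpha_1, \alpha_2)$ in the model
$$
\begin{aligned}
\Delta Y_t &= \beta_{10} + \beta_{11} \Delta Y_{t-1} + \cdots + \beta_{1p} \Delta Y_{t-p}
  + \gamma_{11} \Delta X_{t-1} + \cdots + \gamma_{1p} \Delta X_{t-p}
  + \alpha_1 (Y_{t-1} - \theta X_{t-1}) + u_{1t}, \\
\Delta X_t &= \beta_{20} + \beta_{21} \Delta Y_{t-1} + \cdots + \beta_{2p} \Delta Y_{t-p}
  + \gamma_{21} \Delta X_{t-1} + \cdots + \gamma_{2p} \Delta X_{t-p}
  + \alpha_2 (Y_{t-1} - \theta X_{t-1}) + u_{2t}
\end{aligned}
$$
jointly by full system maximum likelihood. This is the approach that
has been developed by Soren Johansen (see Johansen 1988, 1991).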
Models of Conditional Heteroskedasticity - Motivation
Consider again the AR(1) model
$$
Y_t = \beta Y_{t-1} + u_t,
$$
where $|\beta| < 1$ and $\{u_t\} \sim \text{i.i.d.}(0, \sigma^2)$.
Note that for this model
$$
E[Y_{t+1}] = 0
$$
but
$$
E[Y_{t+1} \mid Y_t, Y_{t-1}, \ldots] = E[Y_{t+1} \mid Y_t] = \beta Y_t,
$$
so that by using information about current and past values of $Y_t$, this
model allows one to improve on one's forecast of the mean level of
$Y_{t+1}$ over that which can be obtained when this information is not
used.
Models of Conditional Heteroskedasticity - Motivation
Shortcoming of this model: The same improvement is not achieved
when forecasting the error variance with this model since
$$
E\left[u_{t+1}^2 \mid Y_t, Y_{t-1}, \ldots\right] = E\left[u_{t+1}^2\right] = \sigma^2.
$$
Observation: This model is not rich enough to allow for better
prediction of the error variance based on past information. In
particular, the independence assumption on the errors precludes any
forecast improvement.
On the other hand, many financial and macroeconomic time series
exhibit "volatility clustering." Volatility clustering suggests the
possible presence of time-dependent variance or time-varying
heteroskedasticity that may be forecastable. Interestingly, this can
occur even if the time series itself is close to being serially
uncorrelated so that the mean level is difficult to forecast.
Models of Conditional Heteroskedasticity - Empirical
Motivation
Why would there be interest in forecasting variance?
First, in finance, the variance of the return to an asset is a measure of
the risk of owning that asset. Hence, investors, particularly those who
are risk averse, would naturally be interested in predicting return
variances.
Secondly, the value of some financial derivatives, such as options,
depends on the variance of the underlying assets. Thus, an options
trader would want to obtain good forecasts of future volatility to help
her or him decide on the price at which to buy or sell options.
Thirdly, being able to forecast variance could allow one to have more
accurate forecast intervals that adapt to changing economic
conditions.
AutoRegressive Conditional Heteroskedasticity (ARCH)
Models
Here, we will discuss two frequently used models of time-varying
heteroskedasticity: the autoregressive conditional
heteroskedasticity (ARCH) model and its extension, the
generalized ARCH (or GARCH) model.
ARCH(1) process: Consider the ADL(1,1) regression
$$
Y_t = \beta_0 + \beta_1 Y_{t-1} + \gamma_1 X_{t-1} + u_t.
$$
Instead of modeling $\{u_t\}$ as an independent sequence of random
variables, as we have before, the ARCH(1) process takes
$$
u_t = \varepsilon_t \left(\alpha_0 + \alpha_1 u_{t-1}^2\right)^{1/2},
$$
where $\alpha_0 > 0$, $0 < \alpha_1 < 1$, and $\{\varepsilon_t\} \sim \text{i.i.d. } N(0,1)$.
Remark: We have described here an ADL(1,1) model with ARCH
errors; but, in principle, an ARCH process can be applied to model
the error variance for any time series regression.
ARCH Models
Some Moment Calculations
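(i) Conditional Mean:
$$
\begin{aligned}
E[u_t \mid u_{t-1}, u_{t-2}, \ldots]
  &= \left[\alpha_0 + \alpha_1 u_{t-1}^2\right]^{1/2} E[\varepsilon_t \mid u_{t-1}, u_{t-2}, \ldots] \\
  &= \left[\alpha_0 + \alpha_1 u_{t-1}^2\right]^{1/2} E[\varepsilon_t] \\
  &= 0.
\end{aligned}
$$
(ii) Unconditional Mean:
$$
\begin{aligned}
E[u_t] &= E\left(E[u_t \mid u_{t-1}, u_{t-2}, \ldots]\right)
  \quad \text{(by the law of iterated expectations)} \\
  &= E[0] \\
  &= 0.
\end{aligned}
$$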
ARCH Models
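(iii) Conditional Variance:
$$
\begin{aligned}
E\left[u_t^2 \mid u_{t-1}, u_{t-2}, \ldots\right]
  &= \left[\alpha_0 + \alpha_1 u_{t-1}^2\right] E\left[\varepsilon_t^2 \mid u_{t-1}, u_{t-2}, \ldots\right] \\
  &= \left[\alpha_0 + \alpha_1 u_{t-1}^2\right] E\left[\varepsilon_t^2\right] \\
  &= \alpha_0 + \alpha_1 u_{t-1}^2.
\end{aligned}
$$
(iv) Autocovariances: Let $j$ be any positive integer, and note that
$$
\begin{aligned}
E[u_t u_{t-j}] &= E\left[u_{t-j}\, E[u_t \mid u_{t-1}, u_{t-2}, \ldots]\right]
  \quad \text{(by the law of iterated expectations)} \\
  &= E[u_{t-j} \cdot 0] \\
  &= 0.
\end{aligned}
$$
Remark: Interestingly, an ARCH process is serially uncorrelated but
not independent. These features are important for the modeling of
asset returns.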
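The remark can be illustrated by simulation. The Python sketch below uses hypothetical parameter values ($\alpha_0 = 1$, $\alpha_1 = 0.3$) and hypothetical function names; it simulates an ARCH(1) process and compares the first-order sample autocorrelation of $u_t$ with that of $u_t^2$.

```python
import numpy as np

def simulate_arch1(a0, a1, T, seed=0):
    """Simulate u_t = eps_t * sqrt(a0 + a1 * u_{t-1}**2), eps_t i.i.d. N(0, 1)."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(T)
    u = np.zeros(T)
    for t in range(1, T):
        u[t] = eps[t] * np.sqrt(a0 + a1 * u[t - 1] ** 2)
    return u

def autocorr(x, lag=1):
    """Sample autocorrelation of x at the given lag."""
    x = x - x.mean()
    return (x[lag:] @ x[:-lag]) / (x @ x)

u = simulate_arch1(a0=1.0, a1=0.3, T=50_000)
rho_u = autocorr(u)         # autocorrelation of the levels
rho_u2 = autocorr(u ** 2)   # autocorrelation of the squares
```

In such a simulation `rho_u` should be close to zero while `rho_u2` should be clearly positive: the level of $u_t$ cannot be predicted linearly from its past, but its squared value, and hence its volatility, can.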
ARCH Models
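More Moments: It can also be shown that
$$
\mathrm{Var}(u_t) = E\left[u_t^2\right] = \frac{\alpha_0}{1 - \alpha_1},
$$
$$
E\left[u_t^4\right] = \left[\frac{3\alpha_0^2}{(1 - \alpha_1)^2}\right] \frac{1 - \alpha_1^2}{1 - 3\alpha_1^2}.
$$
(Note: we assume that $\alpha_0 > 0$ and $0 < \alpha_1 < 1$; the fourth-moment
formula additionally requires $3\alpha_1^2 < 1$.)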
ARCH Models
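Remark: Note that since
$$
\frac{1 - \alpha_1^2}{1 - 3\alpha_1^2} > 1,
$$
we have that
$$
E\left[u_t^4\right]
  = \left[\frac{3\alpha_0^2}{(1 - \alpha_1)^2}\right] \frac{1 - \alpha_1^2}{1 - 3\alpha_1^2}
  > \frac{3\alpha_0^2}{(1 - \alpha_1)^2}
  = 3\left(E\left[u_t^2\right]\right)^2.
$$
On the other hand, if $u_t$ had been normally distributed, say
$\{u_t\} \sim \text{i.i.d. } N(0, \sigma^2)$, then we would have
$E\left[u_t^4\right] = 3\left(E\left[u_t^2\right]\right)^2 = 3\sigma^4$. Hence, the ARCH error process has
"fatter tails" than that implied by the normal distribution.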
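A quick numerical check of the moment formulas, with hypothetical parameter values, is sketched below in Python. The function name is an assumption of this sketch, and the guard $3\alpha_1^2 < 1$ is added because the fourth-moment expression is only positive and finite in that range.

```python
def arch1_moments(a0, a1):
    """Second moment, fourth moment, and kurtosis of an ARCH(1) error."""
    if not (a0 > 0 and 0 < a1 < 1 and 3 * a1 ** 2 < 1):
        raise ValueError("need a0 > 0, 0 < a1 < 1 and 3 * a1**2 < 1")
    second = a0 / (1 - a1)
    fourth = 3 * a0 ** 2 / (1 - a1) ** 2 * (1 - a1 ** 2) / (1 - 3 * a1 ** 2)
    kurtosis = fourth / second ** 2   # equals 3 for a normal distribution
    return second, fourth, kurtosis

second, fourth, kurtosis = arch1_moments(a0=1.0, a1=0.5)
```

With $\alpha_0 = 1$ and $\alpha_1 = 0.5$ the variance is 2, the fourth moment is 36, and the kurtosis is 9, three times the value of 3 implied by normality.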
ARCH Models
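ARCH($p$) process: A straightforward extension of the ARCH(1)
model is the $p$-th order ARCH process given by
$$
u_t = \varepsilon_t \left(\alpha_0 + \alpha_1 u_{t-1}^2 + \cdots + \alpha_p u_{t-p}^2\right)^{1/2},
$$
where $\{\varepsilon_t\} \sim \text{i.i.d. } N(0,1)$; $\alpha_i > 0$ for $i = 0, 1, \ldots, p$; and
$\alpha_1 + \cdots + \alpha_p < 1$.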
GARCH Models
GARCH($p$,$q$) process: A useful generalization of the ARCH model
is the following GARCH model due to Bollerslev (1986):
$$
u_t = h_t^{1/2} \varepsilon_t,
$$
where
$$
h_t = \alpha_0 + \alpha_1 u_{t-1}^2 + \cdots + \alpha_p u_{t-p}^2 + \delta_1 h_{t-1} + \cdots + \delta_q h_{t-q}.
$$
Assumptions:
(i) $\{\varepsilon_t\} \sim \text{i.i.d. } N(0,1)$;
(ii) $\alpha_0 > 0$ and $\alpha_i \geq 0$ for $i = 1, \ldots, p$;
(iii) $\delta_j \geq 0$ for $j = 1, \ldots, q$.
Remark: Note that even a GARCH(1,1) model will allow $h_t$ to
depend on $u_t^2$ from the distant past. Thus, GARCH provides a clever
way of capturing slowly changing variances without having to specify
a model that has a lot of parameters to estimate.
Remark: Both ARCH and GARCH can be estimated using the
method of maximum likelihood.
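To see why a GARCH(1,1) model lets $h_t$ depend on squared errors from the distant past, one can substitute the recursion into itself. The following derivation is a sketch that additionally assumes $0 \leq \delta_1 < 1$ and that the remainder term $\delta_1^k h_{t-k}$ vanishes as $k \to \infty$; neither condition is stated in these notes.

$$
\begin{aligned}
h_t &= \alpha_0 + \alpha_1 u_{t-1}^2 + \delta_1 h_{t-1} \\
    &= \alpha_0 (1 + \delta_1) + \alpha_1 \left(u_{t-1}^2 + \delta_1 u_{t-2}^2\right) + \delta_1^2 h_{t-2} \\
    &= \cdots \\
    &= \frac{\alpha_0}{1 - \delta_1} + \alpha_1 \sum_{j=0}^{\infty} \delta_1^{j}\, u_{t-1-j}^2 .
\end{aligned}
$$

Under these assumptions, a GARCH(1,1) behaves like an ARCH process of infinite order whose weights decline geometrically, yet it has only three parameters to estimate.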
Empirical Illustration
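A simple model of stock returns with time-varying volatility is the
following:
$$
R_t = \mu + u_t,
$$
where $\{u_t\}$ follows a GARCH(1,1) process, i.e.,
$$
u_t = h_t^{1/2} \varepsilon_t,
\qquad
h_t = \alpha_0 + \alpha_1 u_{t-1}^2 + \delta_1 h_{t-1}.
$$
The textbook provides empirical results of fitting this model to daily
percentage changes in the NYSE index using data on all trading days
from January 2, 1990 to November 11, 2005. The results (standard
errors in parentheses) are
$$
\widehat{R}_t = \widehat{\mu} = \underset{(0.012)}{0.049},
$$
$$
\widehat{h}_t = \underset{(0.0014)}{0.0079} + \underset{(0.005)}{0.072}\, u_{t-1}^2 + \underset{(0.006)}{0.919}\, h_{t-1}.
$$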
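To get a feel for what these estimates imply, the Python sketch below iterates the fitted variance equation. The coefficient values are the point estimates reported above; the function name, the sequence of shocks, and the starting value are hypothetical. The sketch also uses the standard result that a GARCH(1,1) process with $\alpha_1 + \delta_1 < 1$ has unconditional variance $\alpha_0 / (1 - \alpha_1 - \delta_1)$, which is not derived in these notes.

```python
import numpy as np

MU, A0, A1, D1 = 0.049, 0.0079, 0.072, 0.919   # point estimates reported above

def variance_path(u, h0):
    """Iterate h_{t+1} = A0 + A1 * u_t**2 + D1 * h_t starting from h0."""
    h = np.empty(len(u) + 1)
    h[0] = h0
    for t, u_t in enumerate(u):
        h[t + 1] = A0 + A1 * u_t ** 2 + D1 * h[t]
    return h

long_run_var = A0 / (1 - A1 - D1)        # unconditional variance, needs A1 + D1 < 1
u = np.array([3.0, 0.0, 0.0, 0.0])       # hypothetical shocks, in percent
h = variance_path(u, h0=long_run_var)
half_width = 1.96 * np.sqrt(h)           # half-width of a 95% interval around MU
```

Because the estimated $\delta_1$ is close to one, the jump in the conditional variance caused by the large first shock dies out only slowly, so the implied forecast intervals `MU - half_width` to `MU + half_width` stay wide for several days. These intervals ignore estimation error in the coefficients.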