Panel data random effect fixed effect.ppt

Mustansarsaeed2 307 views 30 slides Jun 14, 2024

About This Presentation

Panel data regression model


Slide Content

Introduction to Panel Data
Regression Models

Introduction
•Basic Concepts
–Types of Data
–Assumptions of Regression Model
•Why Panel Data?
•Fixed vs. Random effects
•Examine the Hausman test, which determines if
fixed or random effects should be used.
•Estimation of Panel Data Models using STATA

Basic Concepts
Data Types
•Time Series
•Cross-Section
•Panel Data
•Pooled Data

Basic Concepts
Data Types
•Time Series (e.g. daily, weekly, monthly,
quarterly, annual)
Year Y X1 X2
1990 45 5 11
1991 46 8 12
1992 47 10 13
1993 49 13 18
1994 52 16 19

Basic Concepts
Data Types
•Cross-Sectional (e.g. firms, provinces, districts,
countries)
District Y X1 X2
Lahore 80 10 22
Faisalabad 70 13 24
Sialkot 40 15 25
Multan 50 18 27
Gujrat 60 21 30

Basic Concepts
Panel Data (combining cross-sections and time series)
District Year Y X1 X2
Lahore 1990 80 10 22
Faisalabad 1990 70 13 24
Sialkot 1990 40 15 25
Multan 1990 50 18 27
Gujrat 1990 60 21 30
Lahore 1991 80 10 22
Faisalabad 1991 75 18 29
Sialkot 1991 45 20 30
Multan 1991 55 23 32
Gujrat 1991 65 26 35
Lahore 1992 85 15 27
Faisalabad 1992 75 18 29
Sialkot 1992 45 20 30
Multan 1992 55 23 32
Gujrat 1992 65 26 35
Lahore 1993 85 15 27
Faisalabad 1993 75 18 29
Sialkot 1993 45 20 30
Multan 1993 55 23 32
Gujrat 1993 65 26 35

Basic Concepts
Panel Data (combining cross-sections and time series)
District Year Y X1 X2
Narowal 1990 80 10 22
Gujranwala 1990 70 13 24
Shekhpura 1990 40 15 25
Multan 1990 50 18 27
Gujrat 1990 60 21 30
Lahore 1991 80 10 22
DGKhan 1991 75 18 29
BahawalNagar 1991 45 20 30
Rawalpindi 1991 55 23 32
Gujrat 1991 65 26 35
Rajanpur 1992 85 15 27
Faisalabad 1992 75 18 29
Sialkot 1992 45 20 30
Multan 1992 55 23 32
MBDin 1992 65 26 35
Lahore 1993 85 15 27
Faisalabad 1993 75 18 29
Bahawalpur 1993 45 20 30
Multan 1993 55 23 32
Gujrat 1993 65 26 35

Basic Concepts
Assumptions of CLRM

Assumptions of Classical Linear Regression Model
(CLRM)
Assumption 1
Linearity in the parameters:
–Linearity in the Variables
–Linearity in the Parameters: $Y_i = \beta_1 + \beta_2 X_i + u_i$

Assumptions of Classical Linear Regression Model
(CLRM)
Assumption 2
X values are fixed in repeated sampling
Assumption 3
Zero mean value of the disturbance term $u_i$:
$E(u_i \mid X_i) = 0$

Assumptions of Classical Linear Regression Model
(CLRM)
Assumption 4
Homoscedasticity or equal variance of $u_i$: $\mathrm{Var}(u_i \mid X_i) = \sigma^2$

Assumptions of Classical Linear Regression Model
(CLRM)
Assumption 5
No autocorrelation between the disturbances:
$\mathrm{Cov}(u_i, u_j \mid X_i, X_j) = 0$ for $i \neq j$.
That is, given any two X values, the errors are uncorrelated;
there should be no systematic pattern.

Assumptions of Classical Linear Regression Model
(CLRM)
Assumption 6
Zero covariance between $u_i$ and $X_i$, or $E(u_i X_i) = 0$.
Assumption 7
The number of observations n must be greater than the
number of parameters to be estimated.
Assumption 8
Variability in X values: The X values in a given sample must
not all be the same. Technically, Var (X) must be a finite
positive number.

Assumptions of Classical Linear Regression Model
(CLRM)
Assumption 9
The regression model is correctly specified
(no omitted variables)
Assumption 10
No perfect multicollinearity. That is, there is no
perfect linear relationship among the explanatory
variables.
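Assumption 10 can be checked numerically. The following is a minimal sketch, assuming NumPy is available: it compares the rank of the regressor matrix with its number of columns. The matrix `X_bad` is a made-up example in which X2 is exactly twice X1; `X_ok` uses the X1 and X2 values from the time-series table earlier in the deck. The function name is hypothetical.

```python
import numpy as np

# Hypothetical design matrix: constant, X1, and X2 = 2 * X1 (perfectly collinear).
x1 = np.array([5.0, 8.0, 10.0, 13.0, 16.0])
X_bad = np.column_stack([np.ones(5), x1, 2.0 * x1])

# X1 and X2 from the time-series example table (1990-1994).
x2 = np.array([11.0, 12.0, 13.0, 18.0, 19.0])
X_ok = np.column_stack([np.ones(5), x1, x2])


def has_perfect_multicollinearity(X):
    """Return True if the columns of X are linearly dependent."""
    return bool(np.linalg.matrix_rank(X) < X.shape[1])


bad = has_perfect_multicollinearity(X_bad)
ok = has_perfect_multicollinearity(X_ok)
print(bad, ok)
```

A rank check only detects exact linear dependence; strongly but not perfectly correlated regressors pass it and need other diagnostics.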

Panel Data
•These are Models that Combine Cross-
section and Time-Series Data
•In panel data the same cross-sectional
unit (industry, firm, country) is surveyed
over time, so we have data which is
pooled over space as well as time.

Reasons for using Panel Data
1. Panel data can take explicit account
of individual-specific heterogeneity
2. By combining data in two
dimensions, panel data gives more
data variation, less collinearity and
more degrees of freedom.
3. Panel data is better suited than
cross-sectional data for studying the
dynamics of change.

4. Panel data is better at detecting and
measuring effects that cannot be observed
in either cross-section or time-series data.
5. Panel data enables the study of more
complex behavioural models, for example
the effects of technological change or
economic cycles.
6. Panel data can minimise the effects of
aggregation bias, from aggregating firms
into broad groups.

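If all the cross-sectional units have the same number of time
series observations the panel is balanced; if not, it is
unbalanced.

$$
\begin{bmatrix}
y_{11} & y_{21} & \cdots & y_{i1} & \cdots & y_{N1} \\
y_{12} & y_{22} & \cdots & y_{i2} & \cdots & y_{N2} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
y_{1t} & y_{2t} & \cdots & y_{it} & \cdots & y_{Nt} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
y_{1T} & y_{2T} & \cdots & y_{iT} & \cdots & y_{NT}
\end{bmatrix}
$$

Rows run over time (time series); columns run over units (cross section).
This is a matrix of balanced panel data observations on variable y, with
N cross-sectional observations and T time series observations.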
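The balanced/unbalanced distinction above can be checked mechanically. The sketch below is illustrative only: it counts observations per unit, following the slide's definition (every unit has the same number of time series observations). The function name and the sample rows are hypothetical.

```python
from collections import Counter


def is_balanced(observations):
    """observations: iterable of (unit, period) pairs, one per row of the panel."""
    counts = Counter(unit for unit, _ in observations)
    # Balanced if every unit appears the same number of times.
    return len(set(counts.values())) == 1


# Hypothetical rows: two districts observed in 1990, 1991 and 1992.
balanced_rows = [(d, y) for d in ("Lahore", "Multan") for y in (1990, 1991, 1992)]
# Drop the last row so that Multan has fewer observations than Lahore.
unbalanced_rows = balanced_rows[:-1]

print(is_balanced(balanced_rows))
print(is_balanced(unbalanced_rows))
```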

Suppose y is investment and x is a measure of profit. We have
i = 1…N companies and t = 1…T time periods. Suppose we
specify a simple econometric model which says that
investment depends on profit:

$$y_{it} = a_0 + a_1 x_{it} + u_{it} \qquad (1)$$

$u_{it}$ is a random error term: $u_{it} \sim N(0, \sigma^2)$.
Estimation of (1) depends on the assumptions that we make
about the intercept ($a_0$), the slope coefficient ($a_1$) and the
error term ($u_{it}$).

Several possible assumptions can be made in order to
estimate (1):
1. Assume that the intercept and slope coefficients are
constant across time and firms and that the error term
captures differences over time and over firms.
2. The slope coefficient is constant but the intercept varies
over firms.
3. The slope coefficient is constant but the intercept varies
over firms and over time.
4. All coefficients (intercept and slope) vary over firms.
5. The intercept as well as the slope vary over firms and time.

Pooled regression by OLS
This is estimation option 1 on the list. But pooled regression
may result in heterogeneity bias:
Pooled regression: $y_{it} = a_0 + a_1 x_{it} + u_{it}$
[Figure: y plotted against x, showing the pooled regression line and the true model lines for Firm 1, Firm 2, Firm 3 and Firm 4.]
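Heterogeneity bias can be illustrated with a small made-up data set. In this sketch (assuming NumPy), four hypothetical firms share a true slope of 1 but have different intercepts, chosen so that firms with larger x have lower intercepts. The data contain no noise, so any gap between the pooled slope and 1 comes purely from ignoring the firm intercepts.

```python
import numpy as np

# Hypothetical panel: 4 firms, 5 periods each, no noise.
firm = np.repeat(np.arange(4), 5)            # firm index 0..3
within = np.tile(np.arange(5.0), 4)          # within-firm variation in x
x = 10.0 * firm + within
true_slope = 1.0
intercepts = 100.0 - 20.0 * firm             # firm-specific intercepts (made up)
y = intercepts + true_slope * x

# Pooled OLS: one common intercept and one common slope.
X = np.column_stack([np.ones_like(x), x])
a0_hat, a1_hat = np.linalg.lstsq(X, y, rcond=None)[0]
print(a1_hat)
```

With these numbers the pooled slope comes out negative (about -0.97) even though every firm's true slope is +1.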







Fixed Effects Estimation
The previous slide suggests that a better way to model the
data would be to allow each group (firm) to have its own
intercept:

$$y_{it} = a_{0i} + a_1 x_{it} + u_{it} \qquad (2)$$

This is known as the (One Way) Fixed Effects Model.
How do we estimate it?
The simplest way to allow each firm to have its own intercept
is to create a set of dummy (binary) variables, one for each
firm, and include them as regressors:

$$y_{it} = \sum_{i=1}^{N} a_{0i} D_i + a_1 x_{it} + u_{it} \qquad (3)$$

Consequently, this form of estimation is also known as Least
Squares Dummy Variables (LSDV). (Note that there is no
constant in this regression.)
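A minimal LSDV sketch, assuming NumPy and a made-up noise-free panel (four firms, true slope 1, firm intercepts 100, 80, 60, 40). It builds one dummy column per firm and regresses y on the dummies plus x with no separate constant. The variable names are illustrative.

```python
import numpy as np

n_firms, n_periods = 4, 5
firm = np.repeat(np.arange(n_firms), n_periods)
x = 10.0 * firm + np.tile(np.arange(float(n_periods)), n_firms)
y = (100.0 - 20.0 * firm) + 1.0 * x          # hypothetical data, no noise

# One dummy column per firm: D[r, j] = 1 if row r belongs to firm j.
D = (firm[:, None] == np.arange(n_firms)[None, :]).astype(float)

# Regressors are the N dummies plus x; there is no separate constant.
Z = np.column_stack([D, x])
coef = np.linalg.lstsq(Z, y, rcond=None)[0]
firm_intercepts, slope = coef[:n_firms], coef[n_firms]
print(firm_intercepts, slope)
```

Because the data are noise-free, the regression recovers each firm's intercept and the common slope.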

However if there are a lot of groups (firms) then it becomes
very tedious to create all the dummy variables needed. Some
econometric software (e.g. Limdep) is able to automate this.
The method used is called the covariance estimator and works
by “differencing” out the fixed effect by expressing variables as
deviations from their group means, $\bar{y}_i$ and $\bar{x}_i$:

$$y_{it} - \bar{y}_i = (a_{0i} - a_{0i}) + a_1 (x_{it} - \bar{x}_i) + (u_{it} - \bar{u}_i) \qquad (3')$$

So:

$$y_{it} - \bar{y}_i = a_1 (x_{it} - \bar{x}_i) + (u_{it} - \bar{u}_i) \qquad (4)$$

A further extension is to allow the intercept to vary across the
different time periods (Two Way Fixed Effects):

$$y_{it} = \sum_{i=1}^{N} a_{0i} D_i + \sum_{t=1}^{T} a_{2t} T_t + a_1 x_{it} + u_{it} \qquad (5)$$

(In estimation one time dummy is omitted to avoid perfect collinearity with the firm dummies.)


The time dummy coefficients can allow the regression function
to shift over time to capture changes in technology,
government regulation, tax policy, external influences (wars…)
etc.
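The covariance (within) estimator described above can be sketched in a few lines. This is an illustration for a single regressor, assuming NumPy: variables are expressed as deviations from their group means and the slope is obtained by OLS on the demeaned data. The function name and the data are hypothetical.

```python
import numpy as np


def within_estimator(y, x, group):
    """Slope from regressing group-demeaned y on group-demeaned x."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    group = np.asarray(group)
    y_dm = np.empty_like(y)
    x_dm = np.empty_like(x)
    for g in np.unique(group):
        mask = group == g
        y_dm[mask] = y[mask] - y[mask].mean()   # y_it - ybar_i
        x_dm[mask] = x[mask] - x[mask].mean()   # x_it - xbar_i
    return float(x_dm @ y_dm / (x_dm @ x_dm))


# Made-up data: two firms with intercepts 5 and -30 and a common slope of 2.
group = [0, 0, 0, 1, 1, 1]
x = [1.0, 2.0, 3.0, 11.0, 12.0, 14.0]
y = [7.0, 9.0, 11.0, -8.0, -6.0, -2.0]
fe_slope = within_estimator(y, x, group)
print(fe_slope)
```

Demeaning removes the firm intercepts without creating any dummy variables, which is why this route scales to large N.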
Allowing intercept and slope coefficients to vary across groups
If we have a sufficiently long time dimension to the panel, we
could of course just estimate a separate OLS regression for
each group (firm). If the number of firms (cross-sectional
dimension) is small, then we could estimate a single
regression with interactions between x and the group dummy
variables D.

Random Effects Estimation
The fixed effects model assumes that each group (firm) has a
non-stochastic group-specific component to y. Including
dummy variables is a way of controlling for unobservable
effects on y.
But these unobservable effects may be stochastic (i.e.
random). The Random Effects Model attempts to deal with
this:

$$y_{it} = a_0 + a_1 x_{it} + v_i + \varepsilon_{it} \qquad (6)$$

Here the unobservable component, $v_i$, is treated as a
component of the random error term. $v_i$ is the element of the
error which varies between groups but not within groups. $\varepsilon_{it}$ is
the element of the error which varies over group and time.

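We assume that:

$$
\begin{aligned}
& E(v_i) = E(\varepsilon_{it}) = 0 \\
& E(v_i^2) = \sigma_v^2 \\
& E(\varepsilon_{it}^2) = \sigma_\varepsilon^2 && \text{(both components homoscedastic)} \\
& E(v_i \varepsilon_{jt}) = 0 \quad \forall\, i, j, t && \text{(independence of the two components)} \\
& E(\varepsilon_{it} \varepsilon_{js}) = 0 \quad \text{if } t \neq s \text{ or } i \neq j && \text{(no autocorrelation)} \\
& E(v_i v_j) = 0 \quad \text{if } i \neq j && \text{(no across group correlation)} \\
& E(v_i x_{it}) = E(\varepsilon_{it} x_{it}) = 0 && \text{(both independent of regressor)}
\end{aligned}
$$

(We could also introduce an error component which varies
across time periods but not across groups: two way random
effects.)
Estimation of the random effects model should not be performed
by OLS, which ignores the within-group correlation that $v_i$
induces in the composite error; instead a technique known as
generalised least squares (GLS) is used.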
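For a balanced panel, GLS for the random effects model can be computed as OLS on quasi-demeaned data. The sketch below assumes NumPy and, as a simplification, takes the variance components $\sigma_v^2$ and $\sigma_\varepsilon^2$ as known inputs; in practice they are estimated first (feasible GLS). The function name, argument names and data are hypothetical.

```python
import numpy as np


def random_effects_gls(y, x, group, sigma_v2, sigma_e2):
    """GLS for y_it = a0 + a1*x_it + v_i + e_it on a balanced panel.

    The variance components are taken as given (an assumption of this sketch).
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    group = np.asarray(group)
    groups, counts = np.unique(group, return_counts=True)
    if len(set(counts.tolist())) != 1:
        raise ValueError("this sketch handles balanced panels only")
    T = int(counts[0])
    theta = 1.0 - np.sqrt(sigma_e2 / (sigma_e2 + T * sigma_v2))

    # Group means, broadcast back to each observation.
    y_bar = np.empty_like(y)
    x_bar = np.empty_like(x)
    for g in groups:
        mask = group == g
        y_bar[mask] = y[mask].mean()
        x_bar[mask] = x[mask].mean()

    # Quasi-demeaned data; the constant column becomes (1 - theta).
    y_star = y - theta * y_bar
    X_star = np.column_stack([np.full_like(x, 1.0 - theta), x - theta * x_bar])
    a0, a1 = np.linalg.lstsq(X_star, y_star, rcond=None)[0]
    return float(a0), float(a1), float(theta)


# Made-up data: two firms, three periods each.
group = [0, 0, 0, 1, 1, 1]
x = [1.0, 2.0, 3.0, 11.0, 12.0, 14.0]
y = [7.0, 9.0, 11.0, -8.0, -6.0, -2.0]
a0_re, a1_re, theta = random_effects_gls(y, x, group, sigma_v2=1.0, sigma_e2=1.0)
print(a0_re, a1_re, theta)
```

When `sigma_v2` is zero, `theta` is 0 and the result is pooled OLS; as `sigma_v2` grows, `theta` approaches 1 and the slope approaches the fixed effects (within) estimate.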

Choosing between Fixed Effects (FE) and Random Effects
(RE)
1. With large T and small N there is likely to be little
difference, so FE is preferable as it is easier to compute.
2. With large N and small T, estimates can differ
significantly. If the cross-sectional groups are a random
sample of the population, RE is preferable. If not, FE is
preferable.
3. If the error component, $v_i$, is correlated with x then
RE is biased, but FE is not.
4. For large N and small T, and if the assumptions behind
RE hold, then RE is more efficient than FE.

Hausman test:
Tests for the statistical significance of the difference between
the coefficient estimates obtained by FE and by RE, under
the null hypothesis that the RE estimates are efficient and
consistent, and the FE estimates are consistent but inefficient.
The test has a Wald test form, and is usually reported in $\chi^2$
form with k-1 degrees of freedom (k is the number of
regressors, counting the constant).
If W < critical value then random effects is the preferred
estimator; otherwise fixed effects is preferred.
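The Wald form of the test can be written as $W = d' [V_{FE} - V_{RE}]^{-1} d$ with $d = \hat{a}_{FE} - \hat{a}_{RE}$, where only the slope coefficients are compared. The sketch below, assuming NumPy, computes this statistic from estimates and covariance matrices supplied by the user. The function name and all input numbers are made up for illustration.

```python
import numpy as np


def hausman_statistic(b_fe, b_re, V_fe, V_re):
    """W = d' [V_fe - V_re]^{-1} d with d = b_fe - b_re (slope coefficients only)."""
    d = np.atleast_1d(np.asarray(b_fe, dtype=float) - np.asarray(b_re, dtype=float))
    V_diff = np.atleast_2d(np.asarray(V_fe, dtype=float) - np.asarray(V_re, dtype=float))
    return float(d @ np.linalg.solve(V_diff, d))


# Hypothetical single-slope estimates and their variances.
W = hausman_statistic(b_fe=0.50, b_re=0.40, V_fe=0.004, V_re=0.002)

CHI2_5PCT_1DF = 3.841  # 5% critical value of chi-squared with 1 degree of freedom
preferred = "random effects" if W < CHI2_5PCT_1DF else "fixed effects"
print(W, preferred)
```

In finite samples the difference of the estimated covariance matrices is not guaranteed to be positive definite, in which case the statistic can be negative or undefined; this sketch does not handle that case.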

Conclusion
•Panel data methods estimate models on data
that are both time series and cross
sectional
•They have both advantages and
disadvantages compared with pooled OLS estimation
•Panel data can be used with many different techniques,
such as tests for stationarity

Thank You