Title: Sašo Polanec, Faculty of Economics, University of Ljubljana
1Sašo Polanec, Faculty of Economics, University of Ljubljana, saso.polanec_at_ef.uni-lj.si
- Econometrics II Panel Data Analysis
- MSc
- Summer 2007
2Econometrics II
- Sašo Polanec
- Faculty of Economics,
- University of Ljubljana
- These lecture notes were created by William Greene
3Econometrics II
4Motivation for Panel Data Sets
- Cross-sectional data sets suffer from an important limitation
  - each unit is observed in only one time period
  - hence inference about model parameters is based only on cross-sectional heterogeneity of units
- The estimates of parameters based on cross-sectional data may be biased
  - when variables are omitted from the model and these omitted variables are correlated with the included explanatory variables
- The estimates of parameters based on cross-sectional data may be inefficient
  - when there is an unobserved variance component specific to individual units
- Repeated cross-sections or panel data may allow us to deal with both of these problems
5Types of Panel Data Sets
- Longitudinal data (number of different units N is large, time dimension T is short)
  - Business surveys (in Slovenia, AJPES data for the period 1989-2006; the Amadeus database for EU countries)
  - Household panel surveys (in Slovenia and other countries)
  - Panel Study of Income Dynamics (PSID)
- Cross section time series (T and N both large)
  - Grunfeld's investment data
  - Penn World Tables
- Financial data by firm, year (T large and N relatively small)
  - CAPM: $r_{it} - r_{ft} = \beta_i (r_{mt} - r_{ft}) + \varepsilon_{it}$, $i = 1,\dots,$ many; $t = 1,\dots,$ many
  - Exchange rate data: essentially infinite T, large N
  - Effects: $\beta_i = \beta + v_i$
6Terms of Art
- Cross sectional vs. time series variation
  - Is inference based on one or the other dimension? Or both in panels?
- Heterogeneity between units of observation
  - What are the different types of heterogeneity?
- Group effects vs. individual effects
- Fixed effects and/or random effects
  - Are there substantive differences between these effects?
  - Is it possible to tell them apart in observed data?
7Panel Data
- Rotating panels: typical household surveys
  - E.g. Spanish income study (http://www.cemfi.es/albarran/0008r.pdf)
  - Efficiency analysis: "Efficiency measurement in rotating panel data," Heshmati, A., Applied Economics, 30, 1998, pp. 919-930
- Hierarchical (nested) data sets: student outcomes by year, district, school, teacher
8Balanced and Unbalanced Panels
- Distinction between two types of panel data
  - Balanced panels: all units (e.g. firms, individuals) have the same number of time observations
  - Unbalanced panels: units have different numbers of time observations (the more frequent case, e.g. firms enter and exit; people enter the labor market and earn wages, or exit the labor market and receive pensions)
- A notation to help with mechanics
  - $z_{i,t}$, $i = 1,\dots,N$; $t = 1,\dots,T_i$
- Mathematical and notational convenience
  - Number of observations in a balanced panel: $NT$
  - Unbalanced: $\sum_{i=1}^{N} T_i$
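The counting rule above can be checked with a few lines of Python. This is only a sketch: the function `panel_summary` and the toy firm panels are made up for illustration and are not part of the course data.

```python
# Hypothetical toy panels: unit name -> list of periods in which it is observed.
def panel_summary(periods_by_unit):
    """Return (N, total number of observations, is_balanced) for a panel."""
    counts = [len(p) for p in periods_by_unit.values()]  # T_i for each unit
    n_units = len(counts)
    total_obs = sum(counts)                               # sum over i of T_i
    balanced = len(set(counts)) <= 1                      # every T_i is the same
    return n_units, total_obs, balanced

balanced_panel = {"firm_a": [1, 2, 3], "firm_b": [1, 2, 3]}
unbalanced_panel = {"firm_a": [1, 2, 3], "firm_b": [2, 3]}  # firm_b enters late
```

For the balanced toy panel the total is N times T; for the unbalanced one it is the sum of the individual T_i.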
9Benefits of Panel Data
- Time and individual variation in behavior unobservable in cross sections or aggregate time series
- Observable and unobservable individual heterogeneity
- Rich hierarchical structures (two-way panels vs. three-way or nested panels)
- Dynamics in economic behavior (inference on time variation of parameters)
10Fixed and Random Effects
- Unobserved individual-specific effects in regression: $E[y_{it} \mid x_{it}, c_i]$
- Notation: $y_{it} = x_{it}'\beta + c_i + \varepsilon_{it}$
- Linear specification
  - Fixed Effects: $E[c_i \mid X_i] = g(X_i)$; effects are correlated with included variables. Common: $Cov[x_{it}, c_i] \neq 0$
  - Random Effects: $E[c_i \mid X_i] = \mu$; effects are uncorrelated with included variables. If $X_i$ contains a constant term, $\mu = 0$ (without loss of generality). Common: $Cov[x_{it}, c_i] = 0$, but $E[c_i \mid X_i] = \mu$ is needed for the full model
11Convenient Notation
- Fixed Effects
  - Individual specific constant terms: $y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it}$
- Random Effects
  - Compound (composed) disturbance, or error components: $y_{it} = x_{it}'\beta + \varepsilon_{it} + u_i$
12Assumptions for Asymptotics
- Convergence of moments involving cross section $X_i$.
- $N$ increasing, $T$ or $T_i$ assumed fixed.
  - "Fixed T asymptotics" (see text, p. 175)
  - Time series characteristics are not relevant (may be nonstationary)
  - If $T$ is also growing, need to treat as multivariate time series.
- Ranks of matrices. $X$ must have full column rank. ($X_i$ may not, if $T_i < K$.)
- Strict exogeneity and dynamics. If $x_{it}$ contains $y_{i,t-1}$ then $x_{it}$ cannot be strictly exogenous. $x_{it}$ will be correlated with the unobservables in period $t-1$. (To be revisited later.)
- Empirical characteristics of microeconomic data
13The Pooled Regression using OLS
- Presence of omitted effects (general form, unbalanced sample)
- Potential bias/inconsistency of OLS depends on whether the effects are fixed or random
14OLS with fixed individual effects: illustration of bias
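A small numerical sketch may help to see the bias. It is not the illustration from the original slide: the data are invented, there is no noise, and the effect is assumed to be $c_i = 2\bar{x}_i$ so that it is correlated with the regressor by construction. The true slope `BETA` is 1.

```python
def ols_slope(x, y):
    """Slope from a simple regression of y on x with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

BETA = 1.0                                     # assumed true slope
x_by_unit = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]  # two units, three periods each
# Assumed effects c_i = 2 * mean(x_i), i.e. correlated with x by construction.
c_by_unit = [2.0 * sum(xs) / len(xs) for xs in x_by_unit]

# Stack the panel and run pooled OLS, ignoring the effects.
x = [v for xs in x_by_unit for v in xs]
y = [BETA * v + c for xs, c in zip(x_by_unit, c_by_unit) for v in xs]
pooled_slope = ols_slope(x, y)   # well above BETA: the omitted c_i loads onto x
```

Because the omitted effect moves together with x, the pooled slope picks up part of it and overstates the true coefficient.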
15Application 1 Cornwell and Rupert Data
Cornwell and Rupert Returns to Schooling Data, 595 Individuals, 7 Years. Variables in the file are:
- EXP = years of work experience
- WKS = number of weeks worked in a given year
- OCC = occupation, 1 if blue collar, 0 otherwise
- IND = 1 if manufacturing industry, 0 otherwise
- SOUTH = 1 if resides in south, 0 otherwise
- SMSA = 1 if resides in a city (SMSA), 0 otherwise
- MS = 1 if married, 0 otherwise
- FEM = 1 if female, 0 otherwise
- UNION = 1 if wage set by union contract, 0 otherwise
- ED = years of education (fixed over time in this data set)
- BLK = 1 if individual is black, 0 otherwise
- LWAGE = log of wage, the dependent variable in regressions
These data were analyzed in Cornwell, C. and Rupert, P., "Efficient Estimation with Panel Data: An Empirical Comparison of Instrumental Variable Estimators," Journal of Applied Econometrics, 3, 1988, pp. 149-155. See Baltagi, page 122 for further analysis. The data were downloaded from the website for Baltagi's text.
16Application 1 Cornwell and Rupert Data (cont.)
17Application 1 Cornwell and Rupert Data (cont.)
- Stata needs to know that we are dealing with panel data
- The original data do not have a panel structure: there is no variable for the time period and no variable for the individual unit (person)
- In order to create them, we use the following lines (exploiting the balanced panel structure)
  - gen id = int(_n/7-0.01)+1
  - bysort id: gen year = _n
- Next we tell Stata that these variables are the cross-sectional and time dimensions, and use the command that describes our balanced panel
  - iis id
  - tis year
  - xtdes
- xt is the prefix of all panel data commands in Stata
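The index construction can be mirrored outside Stata. The Python sketch below is an illustration only: the function name `panel_ids` is made up, and it assumes, as the Stata lines do, that the rows are sorted by person and then by time with exactly 7 rows per person.

```python
def panel_ids(n_rows, periods=7):
    """Return (id, year) pairs for rows sorted by person and then by time."""
    pairs = []
    for n in range(1, n_rows + 1):       # n plays the role of Stata's _n
        unit = (n - 1) // periods + 1    # same value as int(_n/7 - 0.01) + 1
        year = (n - 1) % periods + 1     # position of the row within its unit
        pairs.append((unit, year))
    return pairs

pairs = panel_ids(595 * 7)   # 595 individuals observed for 7 years
```

Rows 1 to 7 get id 1 and years 1 to 7, row 8 starts id 2, and so on up to id 595.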
18Application 1 Cornwell and Rupert Data (cont)
- Let's first estimate the pooled wage regression using standard OLS (ignoring the panel structure)
19Using First Differences
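- Fixed and random effects share the general specification: $y_{it} = x_{it}'\beta + c_i + \varepsilon_{it}$
- Eliminating the heterogeneity: $\Delta y_{it} = y_{it} - y_{i,t-1} = (\Delta x_{it})'\beta + \Delta\varepsilon_{it}$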
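The following Python sketch shows that differencing removes a time-invariant effect. The data, the slope `BETA` and the effects in `c` are invented for illustration, and there is no noise, so the differenced relation holds exactly.

```python
def first_differences(series_by_unit):
    """Difference each unit's time-ordered series; the first period is lost."""
    return {unit: [b - a for a, b in zip(vals, vals[1:])]
            for unit, vals in series_by_unit.items()}

BETA = 0.5                                     # hypothetical slope
x = {1: [1.0, 2.0, 4.0], 2: [3.0, 3.5, 5.0]}   # regressor by unit and period
c = {1: 10.0, 2: -4.0}                         # hypothetical time-invariant effects
y = {i: [BETA * v + c[i] for v in x[i]] for i in x}

dx = first_differences(x)
dy = first_differences(y)   # c_i has dropped out: dy equals BETA * dx
```

Each unit loses its first observation, and the effects no longer appear in the differenced data.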
20First Differences in Stata Wages
- First differences are easily created in Stata by writing the following lines (example for the logarithm of the real wage, lwage)
  - bysort id (year): gen lwage_1 = lwage[_n-1]
  - gen dlwage = lwage - lwage_1
- Estimation of the regression equation gives
21OLS with First Differences
- With strict exogeneity of $(X_i, c_i)$, OLS regression of $\Delta y_{it}$ on $\Delta x_{it}$ is unbiased and consistent but inefficient.
GLS is unpleasantly complicated. In order to compute a first step estimator of $\sigma_\varepsilon^2$ we would use fixed effects. We should just stop there. Or, use OLS in first differences and use Newey-West with one lag.
22Two Periods
- With two periods and strict exogeneity,
- This is a classical regression model. If there
are no regressors,
23Estimation with Fixed Effects
- The fixed effects model
  - $c_i$ is arbitrarily correlated with $x_{it}$, but $E[\varepsilon_{it} \mid X_i, c_i] = 0$
- Dummy variable representation
24Assumptions for the FE Model
- $y_i = X_i\beta + d_i\alpha_i + \varepsilon_i$, for each individual
- $E[c_i \mid X_i] = g(X_i)$; effects are correlated with included variables. Common: $Cov[x_{it}, c_i] \neq 0$
25Notation
26Estimating the Fixed Effects Model
- The FEM is a plain vanilla regression model but with many independent variables
- Least squares is unbiased, consistent, efficient, but inconvenient if N is large.
27Estimating FE model in STATA with OLS
- Stata allows us to estimate the model with dummies using a special prefix command, xi (interaction expansion), which creates the dummies just for the estimation of the model
  - xi: reg lwage ed exp occ smsa ms fem wks ind union i.id
- This estimation procedure is resource intensive for two reasons
  - its demand for memory is substantial: matsize (the maximum number of variables in the model) needs to be increased substantially
  - computation time increases with the number of observations and may be prohibitive
28Application 1 Cornwell and Rupert Data (cont.)
- Results for the wage data: after controlling for individual differences, the estimated coefficient on years of schooling (ed) increases
29Useful Analysis of Variance Notation
Total variation = within groups variation + between groups variation
30Application 1 Analysis of Variance for Wages
- The decomposition of the total variation of log wage into within and between variation can be obtained in Stata with the command xtsum (see also xttab)
  - xtsum lwage
- Stata reports standard deviations instead of sums of squares. In order to calculate sums of squares, we need to square the Std. Dev. and multiply by N. (72.8 percent is between and 27.1 percent is within variation.)
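The decomposition itself can be written out directly. The Python sketch below uses invented numbers rather than the wage data, and the function name `variation_decomposition` is only illustrative; it computes sums of squares, not the standard deviations that xtsum reports.

```python
def variation_decomposition(groups):
    """Return (total, within, between) sums of squares for grouped data."""
    all_vals = [v for g in groups for v in g]
    grand = sum(all_vals) / len(all_vals)               # overall mean
    total = sum((v - grand) ** 2 for v in all_vals)
    # Within: deviations of each observation from its own group mean.
    within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    # Between: deviations of group means from the overall mean, weighted by T_i.
    between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    return total, within, between

total, within, between = variation_decomposition([[1.0, 2.0, 3.0], [5.0, 7.0, 9.0]])
between_share = between / total
```

In this toy example the within and between parts add up to the total, and most of the variation is between groups.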
31The Within Transformation Removes the Fixed Effects
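As a sketch of the argument, assuming the linear model with a time-invariant effect $c_i$ and writing $\bar{y}_i$, $\bar{x}_i$, $\bar{\varepsilon}_i$ for the averages over the $T_i$ periods of unit $i$, subtracting the group-mean equation from the original equation removes $c_i$:

```latex
\begin{aligned}
y_{it} &= x_{it}'\beta + c_i + \varepsilon_{it} \\
\bar{y}_{i} &= \bar{x}_{i}'\beta + c_i + \bar{\varepsilon}_{i} \\
y_{it} - \bar{y}_{i} &= (x_{it} - \bar{x}_{i})'\beta + (\varepsilon_{it} - \bar{\varepsilon}_{i})
\end{aligned}
```

Any regressor that is constant over time for every unit is removed in the same way, so its coefficient cannot be estimated from the transformed equation.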
32Fixed Effects Estimator
33Fixed Effects Estimator (cont.)
34Fixed Effects Estimator (cont.)
35Least Squares Dummy Variable (LSDV) Estimator
- $b$ is obtained by "within" groups least squares (group mean deviations)
- Normal equations for $a$ are $D'Xb + D'Da = D'y$. Hence,
  - $a = (D'D)^{-1}D'(y - Xb)$
  - that is, $a_i = \bar{y}_i - \bar{x}_i'b$
- Notes: This is simple algebra; the estimator is just OLS.
  - Least squares is an estimator, not a model. (Repeat twice.)
  - Note what $a_i$ is when $T_i = 1$. Follow this with $y_{it} - a_i - x_{it}'b = 0$ if $T_i = 1$.
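The last note can be verified numerically. In this Python sketch the data are invented, there is a single regressor, and `b = 1.0` is taken as given (it is the within slope for these numbers, since only the first unit has more than one period). The function name `lsdv_intercepts` is illustrative.

```python
def lsdv_intercepts(x_by_unit, y_by_unit, b):
    """Return a_i = mean(y_i) - b * mean(x_i) for each unit (one regressor)."""
    return [sum(ys) / len(ys) - b * (sum(xs) / len(xs))
            for xs, ys in zip(x_by_unit, y_by_unit)]

x_by_unit = [[1.0, 2.0, 3.0], [4.0]]   # the second unit has T_i = 1
y_by_unit = [[5.0, 6.0, 7.0], [9.0]]
b = 1.0                                # within slope, taken as given here

a = lsdv_intercepts(x_by_unit, y_by_unit, b)
# With T_i = 1 the dummy fits the single observation perfectly.
residual_single = y_by_unit[1][0] - a[1] - b * x_by_unit[1][0]
```

A unit observed only once therefore has a zero residual and contributes nothing to the estimation of the slope.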
36Inference About OLS
- Assume strict exogeneity: $Cov[\varepsilon_{it}, (x_{js}, c_j)] = 0$. Every disturbance in every period for each person is uncorrelated with variables and effects for every person and across periods.
- Now, it's just least squares in a classical linear regression model.
- Asy.Var[b]
37Application Cornwell and Rupert LSDV results (cont.)
- In Stata we can estimate the parameters of the model using the command xtreg
  - xtreg lwage ed exp occ smsa ms fem wks ind union, fe
38Comments on results
- Variables that do not change over time are dropped (ed: education; fem: female dummy)
- The within R2 is larger than the between R2: differences over time for individuals are more important for explaining the variation of wages than differences between individuals (the fraction of variance due to the individual effects u_i is 97.9 percent)
- The Wald test for exclusion of the fixed effects, F(594, 3566) = 33.81, is large, with exact probability P = 0.0000. It is calculated according to the standard formula
- Interpretation of coefficients
  - E.g. one additional year of experience increases the wage by 9.6 percent ($b_{exp} = 0.096$).
39The Random Effects Model
- The random effects model
  - $c_i$ is uncorrelated with $x_{it}$ for all $t$
  - $E[c_i \mid X_i] = 0$
  - $E[\varepsilon_{it} \mid X_i, c_i] = 0$
- Note that this is different from fixed effects, where $E[c_i \mid X_i] = g(X_i)$
40Error Components Model
- Generalized Regression Model
41Notation
42Notation
43Convergence of Moments
44Random vs. Fixed Effects
- Random Effects
  - Small number of parameters
  - Efficient estimation
  - Objectionable orthogonality assumption ($c_i \perp X_i$)
- Fixed Effects
  - Robust: generally consistent
  - Large number of parameters
45Ordinary Least Squares
- Standard results for OLS in a RE model are
  - Consistent (large sample property), and
  - Unbiased (small sample property), but
  - Inefficient (OLS variance is too large)
- True variance
46Estimating the Variance for OLS
47Mechanics
48Cornwell and Rupert Data (cont.)
Cornwell and Rupert Returns to Schooling Data, 595 Individuals, 7 Years. Variables in the file are:
- EXP = work experience, EXPSQ = EXP^2
- WKS = weeks worked
- OCC = occupation, 1 if blue collar
- IND = 1 if manufacturing industry
- SOUTH = 1 if resides in south
- SMSA = 1 if resides in a city (SMSA)
- MS = 1 if married
- FEM = 1 if female
- UNION = 1 if wage set by union contract
- ED = years of education
- BLK = 1 if individual is black
- LWAGE = log of wage, the dependent variable in regressions
These data were analyzed in Cornwell, C. and Rupert, P., "Efficient Estimation with Panel Data: An Empirical Comparison of Instrumental Variable Estimators," Journal of Applied Econometrics, 3, 1988, pp. 149-155. See Baltagi, page 122 for further analysis. The data were downloaded from the website for Baltagi's text.
49OLS Results Extended model with all rhs variables
Pooled OLS (random effects ignored)
50OLS Results (Robust SE) Extended model with all
rhs variables
Pooled OLS (using the robust option, which gives Huber/White standard errors). These standard errors are typically larger (not always: compare smsa) and the t-statistics lower.
51Generalized Least Squares
52GLS (cont.)
53Estimators for the Variances
54Feasible GLS
x does not contain a constant term in the
preceding.
55Practical Problems with FGLS
x does not contain a constant term in the
preceding.
56Computing Variance Estimators
57Estimation of Random Effects model
58Testing for Effects Lagrange Multiplier Test
59Application of LM test to RE estimation
Following the xtreg command, we can perform the Breusch and Pagan (1980) LM test with the command xttest0. Below we see a strong rejection of the null hypothesis of no random effects.
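For a balanced panel, the Breusch and Pagan statistic is $LM = \frac{NT}{2(T-1)}\left[\frac{\sum_i(\sum_t e_{it})^2}{\sum_i\sum_t e_{it}^2} - 1\right]^2$, where $e_{it}$ are the pooled OLS residuals; under the null it is chi-squared with one degree of freedom. The Python sketch below only illustrates this formula on invented residuals; it is not what xttest0 runs internally, and the function name is made up.

```python
def breusch_pagan_lm(resid_by_unit):
    """LM statistic from pooled OLS residuals, grouped by unit (balanced panel)."""
    n = len(resid_by_unit)
    t = len(resid_by_unit[0])
    if any(len(r) != t for r in resid_by_unit):
        raise ValueError("this sketch assumes a balanced panel")
    # Squared within-unit sums pick up residuals that share a sign inside a unit.
    sum_sq_of_sums = sum(sum(r) ** 2 for r in resid_by_unit)
    sum_of_squares = sum(e ** 2 for r in resid_by_unit for e in r)
    return n * t / (2 * (t - 1)) * (sum_sq_of_sums / sum_of_squares - 1) ** 2

lm_toy = breusch_pagan_lm([[1.0, -1.0], [1.0, -1.0]])   # invented residuals
```

The statistic is compared with the chi-squared(1) critical value, 3.84 at the 5 percent level.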
60Hausman Test for FE vs. RE
The Hausman test helps us decide between the fixed and random effects specifications
61Hausman Test for Effects
$\beta$ does not contain the constant term in the preceding.
62Computing the Hausman Statistic
$\beta$ does not contain the constant term in the preceding.
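For reference, the usual form of the statistic is sketched below. The symbols are assumptions of this sketch: $b_{FE}$ and $b_{RE}$ are the fixed and random effects estimates of the $K$ slope coefficients (no constant), and Est.Var denotes their estimated covariance matrices.

```latex
H = (b_{FE} - b_{RE})'\,
    \bigl[\operatorname{Est.Var}(b_{FE}) - \operatorname{Est.Var}(b_{RE})\bigr]^{-1}
    (b_{FE} - b_{RE})
  \;\xrightarrow{d}\; \chi^2[K] \quad \text{under the null hypothesis}
```

A large value of $H$ indicates that the two sets of estimates differ systematically, which is evidence against the random effects assumption.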
63Hausman Test for Wages Application
THE PROCEDURE FOR OBTAINING HAUSMAN TEST RESULTS
xtreg lwage ed exp occ smsa ms fem wks ind union, fe
est store fixed
xtreg lwage ed exp occ smsa ms fem wks ind union, re
hausman fixed
RESULTS: REJECT THE NULL of no difference in coefficients; FE is the adequate specification
64Appendix Random Effects Algebra (1)
65Appendix Random Effects Algebra (2)
66Appendix Random Effects Algebra (2, cont.)
67William H. Greene, Stern Business School, New York University
- 8. Instrumental Variables Estimation in Panel Data
68Structure and Regression
69Exogeneity
70The Measurement Error Problem
How general is this result?
71Instrumental Variable
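- One "problem" variable: the last one
- $y_{it} = \beta_1 x_{1it} + \beta_2 x_{2it} + \dots + \beta_K x_{Kit} + \varepsilon_{it}$
- $E[\varepsilon_{it} \mid x_{Kit}] \neq 0$. (0 for all others)
- There exists a variable $z_{it}$ such that
  - $E[x_{Kit} \mid x_{1it}, x_{2it}, \dots, x_{K-1,it}, z_{it}] = g(x_{1it}, x_{2it}, \dots, x_{K-1,it}, z_{it})$
    In the presence of the other variables, $z_{it}$ "explains" $x_{Kit}$
  - $E[\varepsilon_{it} \mid x_{1it}, x_{2it}, \dots, x_{K-1,it}, z_{it}] = 0$
    In the presence of the other variables, $z_{it}$ and $\varepsilon_{it}$ are uncorrelated.
- A projection interpretation: in the projection
  - $x_{Kit} = \theta_1 x_{1it} + \theta_2 x_{2it} + \dots + \theta_{K-1} x_{K-1,it} + \theta_K z_{it}$, $\theta_K \neq 0$.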
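A toy example with one regressor and one instrument illustrates the two conditions. Everything in this Python sketch is invented: `z` is constructed to be exactly uncorrelated with the disturbance `e`, while `x = z + e` is correlated with it, and the true slope `BETA` is 2. With a single regressor the IV estimator reduces to a ratio of covariances.

```python
def centered(v):
    """Deviations from the sample mean."""
    m = sum(v) / len(v)
    return [a - m for a in v]

def cross(u, v):
    """Sum of cross products of deviations from means."""
    return sum(a * b for a, b in zip(centered(u), centered(v)))

def ols_slope(x, y):
    return cross(x, y) / cross(x, x)

def iv_slope(z, x, y):
    return cross(z, y) / cross(z, x)

BETA = 2.0                               # assumed true slope
z = [-1.0, -1.0, 1.0, 1.0]               # instrument
e = [1.0, -1.0, 1.0, -1.0]               # disturbance, orthogonal to z
x = [zi + ei for zi, ei in zip(z, e)]    # endogenous regressor: contains e
y = [BETA * xi + ei for xi, ei in zip(x, e)]

b_ols = ols_slope(x, y)      # 2.5: pushed away from BETA by the correlation of x and e
b_iv = iv_slope(z, x, y)     # 2.0: recovers BETA
```

The instrument works here because it moves x but is unrelated to the disturbance; if it barely moved x, the denominator of the IV ratio would be close to zero.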
72Least Squares
73The IV Estimator
74A Moment Based Estimator
75Consistency and Asymptotic Normality of the IV Estimator
76Least Squares Revisited
77Comparing OLS and IV
78Application Cornwell and Rupert Data
79Wage Equation with Endogenous Weeks
$\log Wage = \beta_1 + \beta_2 Exp + \beta_3 ExpSq + \beta_4 OCC + \beta_5 South + \beta_6 SMSA + \beta_7 WKS + \varepsilon$
Weeks worked is believed to be endogenous in this equation. We use the marital status dummy variable MS as an exogenous variable. Condition (5.3), $Cov[MS, \varepsilon] = 0$, is assumed.
Auxiliary regression: in the regression of WKS on 1, EXP, EXPSQ, OCC, South, SMSA, MS, MS significantly "explains" WKS.
A projection interpretation: in the projection $x_{itK} = \theta_1 x_{1it} + \theta_2 x_{2it} + \dots + \theta_{K-1} x_{K-1,it} + \theta_K z_{it}$, $\theta_K \neq 0$. (One normally doesn't "check" the variables in this fashion.)
80Auxiliary Projection (5.5)
81Application IV for WKS in Rupert
82Application IV for wks in Rupert
83IV for Panel Data Fixed Effects Example
84Comments on results
- The first stage results suggest that marital status is a good instrument for the number of weeks worked
- The first stage results of XTIVREG suggest that marital status is not a statistically significant variable for explaining the number of weeks worked, as it changes very infrequently.
- In the presence of weak instruments, this is no surprise.
85The Panel Data Case Hausman-Taylor model
86Hausman and Taylor
87Hausman and Taylor
88HT's FGLS Estimator
89HT's FGLS Estimator (cont.)
90HT's 4 STEP IV Estimator
91Stata code for Hausman-Taylor
The Stata command for estimating a model with the Hausman-Taylor structure is xthtaylor. The difficulty with this method is finding adequate instruments for both sets of endogenous variables. In practice this turns out to be a hard task.