Title: Panel Data Course Lecture 1
1 Panel Data Course: Lecture 1
- Trinity Term 2006
- Dr David Rueda
2 Today: Introduction to Panel Data
- Some practical issues about the class.
- Today's Lecture
- Introduction and terminology.
- Panel data variation.
- Single-Equation Linear Model.
- To pool or not to pool.
- Unit heterogeneity: fixed effects.
- Advantages and disadvantages of fixed effects.
- Stata Session
- Introductory TSCS commands.
- Introductory TSCS analysis.
3 Some Practical Issues about the Class (1)
- Topics and Schedule for the Term
- Week 1
- Introduction and Terminology. Panel Data
Variation. Single-Equation Linear Model. Unit
Heterogeneity: Fixed Effects.
- Week 2
- Random Effects. Fixed versus Random Effects.
- Week 3
- More on TSCS/Panel Models. The Parks Method.
Panel-Corrected Standard Errors. Temporal
Dynamics in TSCS/Panel Data.
- Week 4
- More on Temporal Dynamics in TSCS/Panel Data.
First Difference Estimators. Arellano and Bond.
Random Coefficient Models.
4 Some Practical Issues about the Class (2)
- Taking the Course for Credit?
- This course will be examined in the form of an
assignment to be set by the lecturer. Those
students taking the course for credit should talk
to the lecturer during the first session. The
assignment will involve data analysis and writing
a report covering all topics examined during the
course. The assignment will be due during the
last week of Trinity term (or by arrangement with
the instructor). Details to be agreed with the
instructor.
5 Some Practical Issues about the Class (3)
- Class is divided into two halves.
- In the first half, we do the lecture. In the
second, we do the related Stata session.
- All materials for this course are on my website.
- The rubric is there (it will change, so keep
checking).
- Class and Stata session presentations will be
there.
- Go to my website (http://users.ox.ac.uk/polf0050/)
and then to teaching, Oxford, etc.
- Or go to the class website directly:
http://users.ox.ac.uk/polf0050/page8.html
- You can/should take the time series class with
Mark Pickup.
- We emphasize intuitions in this class and do not
provide a lot of mathematical proofs. For proofs,
see the References.
- However, we will be using matrix notation.
- About the lectures
- If I say something that is not clear, stop me.
- If I am going too fast, stop me.
6 Introduction and Terminology (1)
- The data we are going to be looking at
- They vary both through time and across
cross-sections.
- Examples of cross-sections?
- Countries, regions, individuals in a panel survey,
households in a panel survey, executives,
parliamentary committees, etc.
- We will refer to time units as t = 1, 2, 3, ..., T.
- We will refer to cross-sectional units as i = 1,
2, 3, ..., N.
- Total number of observations: NT.
- More terminology
- Panel data usually refers to data that are mostly
cross-sectional, meaning N > T (often much
greater).
- Time-series cross-sectional (TSCS) data usually
refers to data that are mostly time series,
meaning T > N, or data in which T is close to N,
or even data in which N > T but T is relatively
high.
- In this course, however, we are going to use the
two terms interchangeably.
7 Introduction and Terminology (2)
- Example of panel data and how we usually organize
it
- Cross-sectional units: countries.
- Time units: years.

  Country  Year  Y  X1  X2
  ASL      1955  1  2   3
  ASL      1956  2  3   4
  ASL      1957  3  4   5
  ...      ...   .  .   .
  AUS      1955  1  2   3
  AUS      1956  2  3   4
  AUS      1957  3  4   5
  ...      ...   .  .   .
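
The layout above is often called the long (stacked) format: one row per unit-period. As a minimal sketch, assuming the illustrative values from the table (they are placeholders, not real data), the following Python snippet stores the rows, sorts them by unit and then time, and checks that the panel is balanced, i.e. that the number of rows equals N times T. The variable names are hypothetical.

```python
# Long ("stacked") panel layout: one row per unit-period.
# The values mirror the illustrative table; nothing here is real data.
rows = [
    # (country, year, Y, X1, X2)
    ("AUS", 1955, 1, 2, 3),
    ("ASL", 1956, 2, 3, 4),
    ("ASL", 1955, 1, 2, 3),
    ("ASL", 1957, 3, 4, 5),
    ("AUS", 1957, 3, 4, 5),
    ("AUS", 1956, 2, 3, 4),
]

# Sort by unit, then by time (the order panel routines usually expect).
rows.sort(key=lambda r: (r[0], r[1]))

units = sorted({r[0] for r in rows})    # cross-sectional units, i = 1..N
periods = sorted({r[1] for r in rows})  # time units, t = 1..T
N, T = len(units), len(periods)

# A panel is balanced when every unit is observed in every period,
# so the number of rows equals N * T.
observed = {(r[0], r[1]) for r in rows}
balanced = len(rows) == N * T and all(
    (u, p) in observed for u in units for p in periods
)

print(N, T, len(rows), balanced)
```

Keeping the data in this shape makes it straightforward to compute unit-level means and deviations from them later on.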
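8 Panel Data Variation
- Within the context of panel data, we can have two
kinds of variation:
- Between cross-sectional units.
- Within cross-sectional units.
- (Or, obviously, both or neither.)
- Consider the cross-sectional mean, the average of
Y over time for unit i:
  \bar{Y}_i = \frac{1}{T} \sum_{t=1}^{T} Y_{it}
- Within cross-sectional unit variation is the
deviation of each observation from its unit mean:
  Y_{it} - \bar{Y}_i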
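
The two kinds of variation can be illustrated numerically. The sketch below uses a made-up toy panel (the unit labels and values are assumptions for illustration only) and splits the total sum of squares of Y into a within part (deviations from each unit's own mean) and a between part (deviations of unit means from the grand mean).

```python
from statistics import mean

# Hypothetical toy panel: unit -> values of Y over t = 1..T
panel = {
    "A": [1.0, 2.0, 3.0],
    "B": [4.0, 5.0, 6.0],
    "C": [8.0, 8.0, 8.0],  # differs from other units, but never changes over time
}

all_values = [y for ys in panel.values() for y in ys]
grand_mean = mean(all_values)
unit_means = {i: mean(ys) for i, ys in panel.items()}

# Total sum of squares around the grand mean.
total_ss = sum((y - grand_mean) ** 2 for y in all_values)

# Within: deviations of each observation from its own unit mean.
within_ss = sum(
    (y - unit_means[i]) ** 2 for i, ys in panel.items() for y in ys
)

# Between: deviations of unit means from the grand mean,
# counted once per observation in the unit.
between_ss = sum(
    len(ys) * (unit_means[i] - grand_mean) ** 2 for i, ys in panel.items()
)

print(total_ss, within_ss, between_ss)
```

In this toy example most of the variation is between units; unit C contributes nothing to the within part because its Y never changes.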
9 Single-Equation Linear Model (1)
- The single-equation linear model is the workhorse
of empirical political science:
  Y_{it} = \alpha + \beta X_{it} + u_{it}
or, in matrix notation (with the constant included
as a column of ones in X),
  Y = X\beta + u
- What are we assuming in this model?
- The usual OLS assumptions (look at your notes
from previous terms).
- That the constant term is constant across
different observations, different cross-sectional
units and through time.
- That the effect of X on Y is constant across
different observations, different cross-sectional
units and through time.
- The last two assumptions are likely to be
violated in the panel and TSCS context (there are
often reasons why the constant term or the effect
of X on Y may be different across cross-sectional
units and through time).
10 Single-Equation Linear Model (2)
- We also assume the error term is homoscedastic
and uncorrelated both across cross-sectional
units and through time (if not, the estimates
will not be biased but our inferences will be
inaccurate).
- We also assume that the explanatory variables X_j
are exogenous.
- In this context, we will define endogeneity as
follows: an explanatory variable X_j is
endogenous if it is correlated with u_it.
- With the data that we use in social science, this
kind of endogeneity can arise for three
reasons (see Wooldridge, chapter 4 for details).
- Omitted variables: we fail to include a relevant
variable in the model. The observed
explanatory variable may be correlated with the
unobserved explanatory variable.
- Measurement error: we observe an imperfect
measure of X, and the difference between the
observed and the true X goes into the error. The
observed explanatory variable may then be
correlated with that error.
- Simultaneity: if X is determined partly as a
result of Y, then X and u_it are generally
correlated.
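
The omitted-variables case can be written out as a short worked equation. This is a sketch for the simplest situation, assuming a single included regressor X and a single omitted variable Z; the symbols Z, \gamma, \varepsilon and \hat{\beta} are notation introduced here for illustration.

```latex
% True model, with Z unobserved:
Y_{it} = \alpha + \beta X_{it} + \gamma Z_{it} + \varepsilon_{it}

% Estimated model, where Z is absorbed into the error term:
Y_{it} = \alpha + \beta X_{it} + u_{it},
\qquad u_{it} = \gamma Z_{it} + \varepsilon_{it}

% Probability limit of the OLS slope when Z is omitted:
\operatorname{plim} \hat{\beta}
  = \beta + \gamma \, \frac{\operatorname{Cov}(X_{it}, Z_{it})}{\operatorname{Var}(X_{it})}
```

The bias term vanishes only if Z has no effect on Y (\gamma = 0) or Z is uncorrelated with X, which is exactly the condition that X is uncorrelated with u_it.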
11 To Pool or not to Pool? (1)
- The advantages of pooling
- Pooling adds data! If the assumptions we make in
order to pool hold, this means more precise
estimates.
- Generalizability: we want our conclusions to
apply to many cases and many time periods.
- The theoretical claims require it: often
questions in political science involve both
comparative and time series issues.
- Pooling provides the variation needed to answer
questions we couldn't answer with TS or CS data
alone.
- Pooling can help us with the measurement error
and omitted variables problems we could have if
using TS or CS data alone.
12 To Pool or not to Pool? (2)
- Complications?
- Pooled data analysis relies on the assumption
that the relationship between X and Y does not
vary cross-sectionally or through time.
- This means the relationship between X and Y is
exactly the same for all i and for all t, and the
process affecting the u_it is also the same for
all i and for all t. OK, but...
13 To Pool or not to Pool? (3)
- This assumption can be more complicated than it
seems. Example from Bartels (1996):
- Imagine data that in reality contain two
different kinds of relationships between X and Y
for two groups of data (could be different
countries, or years, etc.).
- The pooled estimate of the coefficients will be a
weighted combination of the two separate βs, the
weights being inversely proportional to the
variance-covariance matrix of each
parameter vector. (See the math in Bartels 1996.)
- This means that in the pooled estimate, the more
precise of the two coefficients will dominate.
- Ceteris paribus, the kind of data that will
dominate the pooled estimate will be the one
with the larger N, the larger values of the
coefficients, and/or the smaller standard errors.
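
A scalar version of this weighting can be checked by hand. The sketch below is a simplification, not Bartels's own derivation: it assumes a single regressor, no intercept, and the same error variance in both groups, and the data are made up. Under those assumptions the pooled OLS slope equals the average of the two group slopes weighted by each group's sum of squared X, which is proportional to the inverse of each slope's variance.

```python
def slope_through_origin(xs, ys):
    """OLS slope for y = b*x + u (no intercept, to keep the algebra short)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


# Hypothetical data: group 1 has a slope near 2, group 2 a slope near 0.5.
x1 = [1.0, 2.0, 3.0, 4.0]
y1 = [2.1, 3.9, 6.2, 7.8]
x2 = [1.0, 2.0]
y2 = [0.4, 1.1]

b1 = slope_through_origin(x1, y1)
b2 = slope_through_origin(x2, y2)
b_pooled = slope_through_origin(x1 + x2, y1 + y2)

# With a common error variance s2, Var(b_g) = s2 / sum(x_g^2), so weights
# proportional to sum(x_g^2) are inverse-variance (precision) weights.
w1 = sum(x * x for x in x1)
w2 = sum(x * x for x in x2)
b_weighted = (w1 * b1 + w2 * b2) / (w1 + w2)

print(b1, b2, b_pooled, b_weighted)
```

Group 1 has more observations and more spread in X, so its slope is estimated more precisely and the pooled estimate sits much closer to it than to the group 2 slope.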
14 To Pool or not to Pool? (4)
- Deciding whether or not to pool should be the
first step in our analysis!
- The question: are the units comparable and are
the processes the same?
15 Unit Heterogeneity: Fixed Effects (1)
- We mentioned above that a key assumption in panel
data analysis is that observations have the same
coefficients. But we can relax this.
- There are two general ways: fixed effects and
random effects.
- Today: fixed effects (within estimator), aka
least squares dummy variable estimation (see
Hsiao, chapter 3).
- A simple way to think about this is in terms of a
unit-specific intercept (allowing some units to
have different average levels of Y).
- We may be interested in change over time within a
panel data context.
- We want to control for unit-specific omitted
variables.
- Differences across units can thus be captured in
differences in the constant.
- We introduce dummies to model these unique
sources of variation, to model the heterogeneity.
- Our equation is transformed:
  Y_{it} = \alpha_i + \beta X_{it} + u_{it}
or, written with one dummy D_j per unit (D_{ji} = 1
if i = j, and 0 otherwise),
  Y_{it} = \sum_{j=1}^{N} \alpha_j D_{ji} + \beta X_{it} + u_{it}
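
The name "within estimator" comes from a transformation that removes the unit-specific intercepts. The following is a sketch of that standard derivation for a single regressor, assuming the model has a separate intercept \alpha_i for each unit; bars denote averages over time for unit i.

```latex
% Model with unit-specific intercepts:
Y_{it} = \alpha_i + \beta X_{it} + u_{it}

% Average over t = 1, ..., T for each unit i:
\bar{Y}_i = \alpha_i + \beta \bar{X}_i + \bar{u}_i

% Subtract the second equation from the first; \alpha_i drops out:
Y_{it} - \bar{Y}_i = \beta \, (X_{it} - \bar{X}_i) + (u_{it} - \bar{u}_i)
```

OLS on the demeaned data therefore uses only within-unit variation to estimate \beta, which is why anything constant within a unit, observed or not, no longer affects the estimate.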
16 Unit Heterogeneity: Fixed Effects (2)
- Because the fixed effects are constant over time
or across units, their effects can be absorbed
into the intercept.
- The estimates will be unbiased and efficient.
- They will also be consistent as long as N or T or
both tend to infinity.
- Why is least squares dummy variable estimation
the equivalent of having a constant per unit?
- How about the general constant?
- Least squares dummy variable estimation is the
equivalent of ANOVA.
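
The equivalence between dummy-variable estimation and a constant per unit can be checked numerically. The sketch below uses made-up data in which units with higher X have lower intercepts, and a small hand-written OLS routine (the helper `ols` and all variable names are hypothetical, written only for this illustration). It compares pooled OLS, least squares with one dummy per unit, and OLS on unit-demeaned data. The general constant is left out of the dummy regression because a constant plus a full set of N dummies would be perfectly collinear.

```python
def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[a] * row[b] for row in X) for b in range(k)]
         + [sum(row[a] * yi for row, yi in zip(X, y))]
         for a in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for r in range(k):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    return [A[r][k] for r in range(k)]


# Hypothetical panel: Y_it is roughly alpha_i + 2 * X_it, with alpha_i
# smaller for units that have larger X.
units = ["A", "B", "C"]
data = [  # (unit, X, Y)
    ("A", 1.0, 12.1), ("A", 2.0, 13.9), ("A", 3.0, 16.0),
    ("B", 4.0, 13.0), ("B", 5.0, 15.1), ("B", 6.0, 16.9),
    ("C", 7.0, 14.0), ("C", 8.0, 15.9), ("C", 9.0, 18.1),
]
y = [r[2] for r in data]

# Pooled OLS: one general constant and one slope for everybody.
b_pooled = ols([[1.0, r[1]] for r in data], y)[1]

# LSDV: one dummy per unit plus X, and no general constant.
X_lsdv = [[1.0 if r[0] == u else 0.0 for u in units] + [r[1]] for r in data]
coef = ols(X_lsdv, y)
alphas, b_lsdv = coef[:-1], coef[-1]


def unit_mean(u, col):
    vals = [r[col] for r in data if r[0] == u]
    return sum(vals) / len(vals)


# Within estimator: demean X and Y by unit, then regress without a constant.
xd = [r[1] - unit_mean(r[0], 1) for r in data]
yd = [r[2] - unit_mean(r[0], 2) for r in data]
b_within = sum(a * b for a, b in zip(xd, yd)) / sum(a * a for a in xd)

print(round(b_pooled, 3), round(b_lsdv, 3), round(b_within, 3))
```

The dummy-variable slope and the within slope coincide, and each dummy coefficient in `alphas` is that unit's own intercept. The pooled slope is far from both because it mixes between-unit and within-unit variation.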
17 Advantages and Disadvantages of Fixed Effects
- Reasons to Include Fixed Effects
- To avoid the dangers of specification bias (which
is a big problem)!
- Easy to interpret: unit effects have an easy
explanation.
- Popularity: they are used very widely (in
economics and political science particularly).
Readers will understand.
- Reasons NOT to Include Fixed Effects
- It is a problem when you are interested in
unit-specific variables. Why?
- Fixed effects models can't include covariates
that are constant within units (because they are
perfectly collinear with the fixed effects).
- For variables that vary little within the unit,
the effects will be hard to estimate precisely
(they will be highly collinear with the fixed
effects).
- Inefficiency can be a problem. Fixed effects may
use a lot of degrees of freedom.
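
Both drawbacks can be seen in a few lines. The sketch below uses a made-up two-unit panel (the values, and the use of the names `fed` and `growth` for the columns, are assumptions for illustration). It applies the within transformation to a covariate that never changes inside a unit and to one that does, and then counts the residual degrees of freedom left after estimating N unit intercepts and K slopes.

```python
from statistics import mean

# Hypothetical panel: `fed` is constant within each unit, `growth` is not.
panel = {
    "A": {"fed": [0, 0, 0], "growth": [1.0, 2.5, 3.0]},
    "B": {"fed": [2, 2, 2], "growth": [0.5, 1.5, 1.0]},
}


def demean(values):
    """Within transformation: subtract the unit's own mean."""
    m = mean(values)
    return [v - m for v in values]


fed_within = [d for u in panel.values() for d in demean(u["fed"])]
growth_within = [d for u in panel.values() for d in demean(u["growth"])]

# A regressor with no within-unit variation is all zeros after demeaning,
# so nothing is left to identify its coefficient.
fed_identified = any(abs(d) > 1e-12 for d in fed_within)
growth_identified = any(abs(d) > 1e-12 for d in growth_within)

# Degrees of freedom: N*T observations minus N unit intercepts minus K slopes.
N, T, K = len(panel), 3, 1
df_resid = N * T - N - K

print(fed_identified, growth_identified, df_resid)
```

With many units and few time periods, the N intercepts take up a large share of the N times T observations, which is the inefficiency concern.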
18 References
- Bartels, Larry M. 1996. Pooling Disparate
Observations. American Journal of Political
Science 40: 905-42.
- Hsiao, Cheng. 2002. The Analysis of Panel Data,
2nd Ed. New York: Cambridge University Press.
- Wooldridge, Jeffrey. 2002. Econometric Analysis
of Cross Section and Panel Data. Cambridge, Mass.:
MIT Press.
19 Stata Session: Introducing the Dataset
- Comparative Political Data taken from Klaus
Armingeon, Philipp Leimgruber, Michelle Beyeler,
Sarah Menegale. Comparative Political Data Set
1960-2002, Institute of Political Science,
University of Berne 2004.
- cps.dta.
- Countries: Australia, Austria, Belgium, Canada,
Denmark, Finland, France, Germany, Ireland,
Italy, Japan, Luxembourg, Netherlands, New
Zealand, Norway, Sweden, Switzerland, United
Kingdom, USA.
- Years: 1960-2002.
- Variables
- year, country, countryn (country number).
- gov_right, gov_cent, gov_left: right, center, and
left parties in percentage of total cabinet
posts, weighted by days.
- effpar: effective number of parties in parliament
according to Laakso/Taagepera.
- fed: federalism, coded 0 = no, 1 = weak, 2 =
strong.
- gdpgr: growth of GDP, change from previous year.
20 Stata Session: Introductory TSCS Commands
- You can always sort your data when
analyzing panel and/or TSCS data in Stata, for
example: sort country year
- The series of commands in Stata for analyzing
panel and TSCS data all begin with the letters
xt.
- We need to tell Stata that our data are in panel
format to be able to use xt commands.
- We do this by specifying the i and t variables:
- . iis name
- . tis year
- Or we can do both at once:
- . tset ccode year
- We can examine the between versus the within
variation of each variable separately, using
Stata's xtsum.
21 Stata Session: Introductory TSCS Analysis
- See computer class notes.
- See Stata file in Student_Shared folder.
- It is annotated with explanations of commands and
procedures.