Panel Data Course Lecture 1


1
Panel Data Course: Lecture 1
  • Trinity Term 2006
  • Dr David Rueda

2
Today: Introduction to Panel Data
  • Some practical issues about the class.
  • Today's Lecture:
  • Introduction and terminology.
  • Panel data variation.
  • Single-Equation Linear Model.
  • To pool or not to pool.
  • Unit heterogeneity: fixed effects.
  • Advantages and disadvantages of fixed effects.
  • Stata Session:
  • Introductory TSCS commands.
  • Introductory TSCS analysis.

3
Some Practical Issues about the Class (1)
  • Topics and Schedule for the Term:
  • Week 1:
  • Introduction and Terminology. Panel Data
    Variation. Single-Equation Linear Model. Unit
    heterogeneity: fixed effects.
  • Week 2:
  • Random Effects. Fixed versus Random Effects.
  • Week 3:
  • More on TSCS/Panel Models. The Parks Method.
    Panel-Corrected Standard Errors. Temporal
    Dynamics in TSCS/Panel Data.
  • Week 4:
  • More on Temporal Dynamics in TSCS/Panel Data.
    First Difference Estimators. Arellano and Bond.
    Random Coefficient Models.

4
Some Practical Issues about the Class (2)
  • Taking the Course for Credit?
  • This course will be examined in the form of an
    assignment to be set by the lecturer. Those
    students taking the course for credit should talk
    to the lecturer during the first session. The
    assignment will involve data analysis and writing
    a report covering all topics examined during the
    course. The assignment will be due during the
    last week of Trinity term (or by arrangement with
    the instructor). Details to be agreed with the
    instructor.

5
Some Practical Issues about the Class (3)
  • Class is divided into two halves.
  • In the first half, we do the lecture. In the
    second we do the related Stata session.
  • All materials for this course are on my website:
  • The rubric is there (it will change, so keep
    checking).
  • Class and Stata session presentations will be
    there.
  • Go to my website (http://users.ox.ac.uk/polf0050/)
    and then teaching, Oxford, etc.
  • Or go to the class website directly:
    http://users.ox.ac.uk/polf0050/page8.html
  • You can/should take the time series class with
    Mark Pickup.
  • We emphasize intuitions in this class and do not
    provide a lot of mathematical proofs; see the
    references for those.
  • However, we will be using matrix notation.
  • About the lectures:
  • If I say something that is not clear, stop me.
  • If I am going too fast, stop me.

6
Introduction and Terminology (1)
  • The data we are going to be looking at:
  • They vary both through time and across
    cross-sections.
  • Examples of cross-sections?
  • Countries, regions, individuals in a panel survey,
    households in a panel survey, executives,
    parliamentary committees, etc.
  • We will refer to time units as t = 1, 2, 3, ..., T.
  • We will refer to cross-sectional units as i = 1,
    2, 3, ..., N.
  • Total number of observations: NT.
  • More terminology:
  • Panel data usually refers to data that are mostly
    cross-sectional, meaning N > T (often much
    greater).
  • Time-series cross-sectional (TSCS) data usually
    refers to data that are mostly time series,
    meaning T > N, or data in which T = N, or even data
    in which N > T but T is relatively high.
  • In this course, however, we are going to use the
    two terms interchangeably.

7
Introduction and Terminology (2)
  • Example of panel data and how we usually organize
    it:
  • Cross-sectional units: countries.
  • Time units: years.

    Country  Year  Y  X1  X2
    ASL      1955  1  2   3
    ASL      1956  2  3   4
    ASL      1957  3  4   5
    .        .     .  .   .
    AUS      1955  1  2   3
    AUS      1956  2  3   4
    AUS      1957  3  4   5
    .        .     .  .   .

  • Within the context of panel data, we can have two
    kinds of variation:
  • Between cross-sectional units.
  • Within cross-sectional units.
  • (Or, obviously, both or neither.)

8
Panel Data Variation
  • Within the context of panel data, we can have two
    kinds of variation:
  • Between cross-sectional units.
  • Within cross-sectional units.
  • (Or, obviously, both or neither.)
  • Consider the cross-sectional mean
  • Within cross-sectional unit variation
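
The two quantities named on this slide can be sketched numerically. In the usual notation (an assumption here), the mean for unit i is x̄_i = (1/T) Σ_t x_it, within-unit variation is x_it − x̄_i, and between-unit variation is x̄_i − x̄, where x̄ is the mean over all NT observations. The panel below is made up for illustration; the array `x` and its values are hypothetical.

```python
import numpy as np

# Hypothetical balanced panel: N = 3 units (rows) observed over T = 4 periods (columns).
x = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 2.0, 2.0, 2.0],   # this unit has no within-unit variation
    [5.0, 7.0, 6.0, 8.0],
])

unit_means = x.mean(axis=1, keepdims=True)   # x-bar_i: one mean per cross-sectional unit
grand_mean = x.mean()                        # x-bar: mean over all N*T observations

within = x - unit_means                      # x_it - x-bar_i
between = unit_means - grand_mean            # x-bar_i - x-bar

T = x.shape[1]
total_ss = ((x - grand_mean) ** 2).sum()     # total sum of squares
within_ss = (within ** 2).sum()              # within-unit sum of squares
between_ss = T * (between ** 2).sum()        # between-unit sum of squares (each mean counted T times)

print("total:", total_ss, "within:", within_ss, "between:", between_ss)
```

In a balanced panel the total sum of squares splits exactly into the within and between parts, which is why the two kinds of variation can be examined separately.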

9
Single-Equation Linear Model (1)
  • The single-equation linear model is the workhorse
    of empirical political science

  • or
  • What are we assuming in this model?
  • The usual OLS assumptions (look at your notes
    from previous terms).
  • That the constant term is constant across
    different observations, different cross-sectional
    units and through time.
  • That the effect of X on Y is constant across
    different observations, different cross-sectional
    units and through time.
  • The last two assumptions are likely to be
    violated in the panel and TSCS context (there are
    often reasons why the constant term or the effect
    of X on Y may be different across cross-sectional
    units and through time).
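
In standard notation (an assumption here; only the error term u_it is named in the surrounding text), the pooled single-equation linear model can be written observation by observation, or stacked over all NT observations in matrix form. Here ι is taken to be an NT × 1 vector of ones and X the NT × k matrix of explanatory variables:

```latex
\begin{align*}
y_{it} &= \alpha + x_{it}'\beta + u_{it}, \qquad i = 1,\dots,N,\quad t = 1,\dots,T \\
y &= \alpha\,\iota + X\beta + u
\end{align*}
```

Neither α nor β carries an i or t subscript, which is the pair of constancy assumptions listed on this slide.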

10
Single-Equation Linear Model (2)
  • We also assume the error term is homoscedastic
    and uncorrelated both across cross-sectional
    units and through time (if not, the estimates
    will not be biased but our inferences will be
    inaccurate).
  • We also assume that the explanatory variables X_j
    are exogenous.
  • In this context, we will define endogeneity as
    follows: an explanatory variable X_j is
    endogenous if it is correlated with u_it.
  • With the data that we use in social science, this
    kind of endogeneity can arise for three
    reasons (see Wooldridge, chapter 4 for details).
  • Omitted variables: we fail to include a relevant
    variable in the model. The observed
    explanatory variable may be correlated with the
    unobserved explanatory variable.
  • Measurement error: we observe an imperfect
    measure of X, and the difference between the
    observed and unobserved X goes into the error.
    The observed explanatory variable may then be
    correlated with this measurement error.
  • Simultaneity: if X is determined partly as a
    result of Y, then X and u_it are generally
    correlated.
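
The omitted-variables case can be illustrated with a small simulation. This is a minimal sketch under made-up assumptions: the variable names, the true coefficients (2 on x, 3 on z) and the sample size are all hypothetical, and z plays the role of the unobserved variable that is correlated with x.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

z = rng.normal(size=n)                  # relevant variable that will be omitted
x = z + rng.normal(size=n)              # observed regressor, correlated with z
y = 1.0 + 2.0 * x + 3.0 * z + rng.normal(size=n)

def ols(columns, y):
    """OLS coefficients for y on a constant plus the given list of columns."""
    X = np.column_stack([np.ones(len(y))] + columns)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

b_long = ols([x, z], y)    # correctly specified model
b_short = ols([x], y)      # z omitted, so it ends up in the error term

# In this design the short-regression slope converges to
# 2 + 3 * cov(x, z) / var(x) = 2 + 3 * (1 / 2) = 3.5 rather than 2.
print("slope with z included:", b_long[1])
print("slope with z omitted: ", b_short[1])
```

Because z moves into the error term when it is left out, x becomes correlated with that error, and the estimated effect of x absorbs part of the effect of z.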

11
To Pool or not to Pool? (1)
  • The advantages of pooling:
  • Pooling adds data! If the assumptions we are
    making to pool work, this means more accurate
    estimates.
  • Generalizability: we want our conclusions to
    apply to many cases and many time periods.
  • The theoretical claims require it: often
    questions in political science involve both
    comparative and time series issues.
  • Pooling provides the variation needed to answer
    questions we couldn't answer with TS or CS data
    alone.
  • Pooling can help us with the measurement error
    and omitted variables problems we could have if
    using TS or CS data alone.

12
To Pool or not to Pool? (2)
  • Complications?
  • Pooled data analysis relies on the assumption
    that the relationship between X and Y does not
    vary cross-sectionally or through time.
  • This means the relationship between X and Y is
    exactly the same for all i and for all t, and the
    process affecting the u_it is also the same for
    all i and for all t. OK, but...

13
To Pool or not to Pool? (3)
  • This assumption can be more complicated than it
    seems. Example from Bartels (1996):
  • Imagine data that in reality contain two
    different kinds of relationships between X and Y
    for two groups of data (could be different
    countries, or years, etc).
  • The pooled estimate of the coefficients will be a
    weighted combination of the two separate βs, the
    weights being inversely proportional to the
    variance-covariance matrix of that particular
    parameter vector. (See the math in Bartels 1996.)
  • This means that in the pooled estimate, the more
    precise of the two coefficients will dominate.
  • Ceteris paribus, the kind of data that will
    dominate the pooled estimate will be the one
    with the larger N, the larger values of the
    coefficients, and/or the smaller standard errors.
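
A simplified illustration of the weighting result, not Bartels's own derivation: for OLS with two groups, the pooled coefficient vector equals (X1'X1 + X2'X2)^(-1) (X1'X1 b1 + X2'X2 b2). If both groups share the same error variance, Xg'Xg is proportional to the inverse of the variance-covariance matrix of bg, so the more precisely estimated group gets more weight. The group sizes, true coefficients and helper names below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

def make_group(n, beta, x_scale):
    """Simulate one group: a constant plus one regressor, unit-variance errors."""
    X = np.column_stack([np.ones(n), rng.normal(scale=x_scale, size=n)])
    y = X @ beta + rng.normal(size=n)
    return X, y

def ols(X, y):
    """OLS coefficients via the normal equations."""
    return np.linalg.solve(X.T @ X, X.T @ y)

# Group 1: large and informative, slope +1. Group 2: small, little variation in x, slope -1.
X1, y1 = make_group(200, np.array([0.0, 1.0]), x_scale=2.0)
X2, y2 = make_group(50, np.array([0.0, -1.0]), x_scale=0.5)

b1, b2 = ols(X1, y1), ols(X2, y2)
b_pooled = ols(np.vstack([X1, X2]), np.concatenate([y1, y2]))

# Matrix-weighted combination of the two separate estimates.
W1, W2 = X1.T @ X1, X2.T @ X2
b_weighted = np.linalg.solve(W1 + W2, W1 @ b1 + W2 @ b2)

print("separate slopes:", b1[1], b2[1])
print("pooled slope:   ", b_pooled[1])
print("weighted slope: ", b_weighted[1])
```

The pooled slope lands close to the slope of the first group, even though the second group has a relationship of the opposite sign.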

14
To Pool or not to Pool? (4)
  • Deciding whether or not to pool should be the
    first step in our analysis!
  • The question: are the units comparable and are
    the processes the same?

15
Unit Heterogeneity: Fixed Effects (1)
  • We mentioned above that a key assumption in panel
    data analysis is that observations have the same
    coefficients. But we can relax this.
  • There are two general ways: fixed effects and
    random effects.
  • Today: fixed effects (within estimator), aka
    least squares dummy variable estimation (see
    Hsiao, chapter 3).
  • A simple way to think about this is in terms of a
    unit-specific intercept (allowing some units to
    have different average levels of Y).
  • We may be interested in change over time within a
    panel data context.
  • We want to control for unit-specific omitted
    variables.
  • Differences across units can thus be captured in
    differences in the constant.
  • We introduce dummies to model these unique
    sources of variation, to model the heterogeneity.
  • Our equation is transformed

  • or
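
In standard notation (an assumption here, not taken from the slide), the fixed effects model replaces the common constant with a unit-specific intercept. It can be written either with that intercept directly or with one dummy variable per cross-sectional unit, where D^{(j)}_{it} is a hypothetical label for the dummy identifying unit j:

```latex
\begin{align*}
y_{it} &= \alpha_i + x_{it}'\beta + u_{it} \\
y_{it} &= \sum_{j=1}^{N} \alpha_j D^{(j)}_{it} + x_{it}'\beta + u_{it},
\qquad D^{(j)}_{it} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{otherwise} \end{cases}
\end{align*}
```

The slope vector β is still common to all units; only the intercept is allowed to differ.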

16
Unit Heterogeneity: Fixed Effects (2)
  • Because the fixed effects are constant over time
    or across units, their effects can be absorbed
    into the intercept.
  • The estimates will be unbiased and efficient.
  • They will also be consistent as long as N or T or
    both tend to infinity.
  • Why is least squares dummy variable estimation
    the equivalent of having a constant per unit?
  • How about the general constant?
  • Least squares dummy variable estimation is the
    equivalent of ANOVA.
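
The equivalence between least squares dummy variable (LSDV) estimation and the within estimator can be checked numerically. This is a minimal sketch on simulated data: the panel dimensions, the true slope of 0.5 and all variable names are hypothetical. Note that the dummy regression has no general constant, since N unit dummies plus a constant would be perfectly collinear.

```python
import numpy as np

rng = np.random.default_rng(2)
N, T = 5, 20
unit = np.repeat(np.arange(N), T)            # unit index for each of the N*T rows

alpha = rng.normal(scale=3.0, size=N)        # unit-specific intercepts
x = alpha[unit] + rng.normal(size=N * T)     # regressor correlated with the unit effects
y = alpha[unit] + 0.5 * x + rng.normal(size=N * T)

# LSDV: regress y on one dummy per unit plus x (no general constant).
D = (unit[:, None] == np.arange(N)).astype(float)
coef_lsdv, *_ = np.linalg.lstsq(np.column_stack([D, x]), y, rcond=None)
beta_lsdv = coef_lsdv[-1]

# Within estimator: subtract unit means from y and x, then run OLS without a constant.
def demean(v):
    means = np.bincount(unit, weights=v) / np.bincount(unit)
    return v - means[unit]

x_w, y_w = demean(x), demean(y)
beta_within = (x_w @ y_w) / (x_w @ x_w)

# Unit intercepts implied by the within estimator: y-bar_i - beta * x-bar_i.
y_bar = np.bincount(unit, weights=y) / T
x_bar = np.bincount(unit, weights=x) / T
intercepts = y_bar - beta_within * x_bar

print("LSDV slope:  ", beta_lsdv)
print("within slope:", beta_within)
```

The dummy coefficients from LSDV are the unit-specific constants, which is why the two approaches give the same slope.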

17
Advantages and Disadvantages of Fixed Effects
  • Reasons to Include Fixed Effects:
  • To avoid the dangers of specification bias (which
    is a big problem)!
  • Easy to interpret: unit effects have an easy
    explanation.
  • Popularity: they are used very widely (in
    economics and political science particularly).
    Readers will understand.
  • Reasons NOT to Include Fixed Effects:
  • It is a problem when you are interested in
    unit-specific variables. Why?
  • Fixed effects models can't include covariates
    that are constant within units (because they are
    perfectly collinear with the fixed effects).
  • For variables that vary little within the unit,
    the effects will be hard to estimate precisely
    (they will be highly collinear with the fixed
    effects).
  • Inefficiency can be a problem. Fixed effects may
    use a lot of degrees of freedom.
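
The collinearity problem with covariates that are constant within units can be seen directly in the design matrix. The sketch below uses a hypothetical time-invariant covariate `fed` (values made up, loosely modelled on a 0/1/2 coding) and a hypothetical time-varying covariate `growth`.

```python
import numpy as np

N, T = 4, 3
unit = np.repeat(np.arange(N), T)
D = (unit[:, None] == np.arange(N)).astype(float)   # one dummy per unit

# Hypothetical covariates: fed never changes within a unit, growth does.
fed = np.array([0.0, 1.0, 2.0, 2.0])[unit]
rng = np.random.default_rng(3)
growth = rng.normal(size=N * T)

X_ok = np.column_stack([D, growth])
X_bad = np.column_stack([D, fed])

rank_ok = np.linalg.matrix_rank(X_ok)     # full column rank: N + 1
rank_bad = np.linalg.matrix_rank(X_bad)   # only N: fed is a linear combination of the dummies

# Equivalent view: the within transformation wipes out the time-invariant covariate.
fed_within = fed - D @ (D.T @ fed) / T

print("rank with growth:", rank_ok, "of", X_ok.shape[1], "columns")
print("rank with fed:   ", rank_bad, "of", X_bad.shape[1], "columns")
```

With a rank-deficient design matrix the coefficient on the time-invariant covariate cannot be separated from the unit effects.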

18
References
  • Bartels, Larry M. 1996. Pooling Disparate
    Observations. American Journal of Political
    Science 40: 905-42.
  • Hsiao, Cheng. 2002. The Analysis of Panel Data,
    2nd ed. New York: Cambridge University Press.
  • Wooldridge, Jeffrey. 2002. Econometric Analysis
    of Cross Section and Panel Data. Cambridge, Mass.:
    MIT Press.

19
Stata Session: Introducing the Dataset
  • Comparative political data taken from: Klaus
    Armingeon, Philipp Leimgruber, Michelle Beyeler,
    Sarah Menegale. Comparative Political Data Set
    1960-2002, Institute of Political Science,
    University of Berne, 2004.
  • cps.dta.
  • Countries: Australia, Austria, Belgium, Canada,
    Denmark, Finland, France, Germany, Ireland,
    Italy, Japan, Luxembourg, Netherlands, New
    Zealand, Norway, Sweden, Switzerland, United
    Kingdom, USA.
  • Years: 1960-2002.
  • Variables:
  • year, country, countryn (country number).
  • gov_right, gov_cent, gov_left: right, center, and
    left parties in percentage of total cabinet
    posts, weighted by days.
  • effpar: effective number of parties in parliament
    according to Laakso/Taagepera.
  • fed: federalism, coded 0 = no, 1 = weak, 2 =
    strong.
  • gdpgr: growth of GDP, change from previous year.

20
Stata Session: Introductory TSCS Commands
  • You can always sort your data when analyzing
    panel and/or TSCS data in Stata, for example:
  • . sort country year
  • The series of commands in Stata for analyzing
    panel and TSCS data all begin with the letters
    xt.
  • We need to tell Stata that our data are in panel
    format to be able to use xt commands.
  • We do this by specifying the i and t variables:
  • . iis name
  • . tis year
  • Or we can do both at once:
  • . tsset ccode year
  • We can examine the between versus the within
    variation separately, using Stata's xtsum.

21
Stata Session: Introductory TSCS Analysis
  • See computer class notes.
  • See Stata file in Student_Shared folder.
  • It is annotated with explanations of commands and
    procedures.