Estimating transitions from repeated cross sections

Provided by: katholieke

1
Estimating transitions from repeated cross sections
  • Rob Eisinga
  • Ben Pelzer
  • University of Nijmegen NL
  • Paul Lazarsfeld symposium, Brussels June 4-5 2004

2
Problem
  • Can we infer individual-level transitions from
    repeated cross-sectional (RCS) data?

  • This is similar to the well-known ecological inference (EI)
    problem: can we infer individual-level behaviour from
    aggregate data?
3
Problem in its simplest form
[Figure: the problem for EI and for RCS]
4
Ecological inference problem across disciplines
  • Sociology: various ASR papers, e.g. Robinson 1950,
    Goodman 1953, Duncan & Davis 1953
  • Political science: King 1997, A solution to the
    ecological inference problem
  • Econometrics: Golan, Judge & Miller 1996, Maximum
    entropy econometrics
  • Epidemiology: Richardson & Montfort 2000 (and
    others) in Stat Med
  • Marketing: Böckenholt & Dillon 2000, JMR
  • Statistics: information that margins provide about
    cells: Plackett 1977, Hamdan 1986, Haber 1989,
    Kocherlakota 1992; various JASA papers, e.g.
    Kalbfleisch & Lawless 1985; special issue on
    ecological analysis, JRSSA 2001; McCullagh &
    Nelder 1992, Generalized linear models
  • Others: Fienberg 1997, data disclosure limitation
    methods
5
This meeting
6
Introduction to the EI problem
  • 2 x 2 table
  • two situations: (i) cells observed, (ii) cells
    unobserved
  • for both situations, two sampling models: (i) product
    binomial, (ii) multinomial

7
Cells observed: product binomial sampling
B(3, µ), B(7, ?)
8
Cells unobserved (product binomial sampling)
g = 0, 1, 2, 3
convolution of B(3, µ) and B(7, ?)
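As a rough illustration of the convolution named on this slide, the sketch below computes the distribution of an observed column total when the two cells behind it are independent binomials with row totals 3 and 7. The function names and the probabilities 0.5 and 0.2 are made up for the example; they are not values from the presentation.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(K = k) for K ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def column_total_pmf(y, n0, n1, p0, p1):
    """P(Y = y) for Y = G0 + G1 with independent G0 ~ B(n0, p0), G1 ~ B(n1, p1).

    The unobserved cell g = G0 is summed out over every value that is
    compatible with the observed total y.
    """
    lowest = max(0, y - n1)
    highest = min(n0, y)
    return sum(
        binom_pmf(g, n0, p0) * binom_pmf(y - g, n1, p1)
        for g in range(lowest, highest + 1)
    )


# Row totals 3 and 7 as on the slide; 0.5 and 0.2 are hypothetical probabilities.
pmf = [column_total_pmf(y, 3, 7, 0.5, 0.2) for y in range(11)]
```

For any column total between 3 and 7 the sum runs over g = 0, 1, 2, 3, which matches the range shown on the slide.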
9
Cells observed: multinomial sampling
B(10, p), B(3, µ), B(7, ?)
10
Cells unobserved (multinomial sampling)
g = 0, 1, 2, 3
B(10, p) and the convolution of B(3, µ) and B(7, ?)
11
Maximum likelihood (ML) estimation
  • We can obtain ML estimates of µ and ? (and p) if we
  • (i) have multiple tables and
  • (ii) assume homogeneous probabilities across tables
  • (e.g., McCullagh & Nelder, GLM 1992)
  • Other procedures include
  • expectation maximisation (EM) (David Firth, 1982)
  • maximum entropy (ME) (Golan, Judge & Miller, 1996)
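The idea of pooling several tables under homogeneous probabilities can be sketched as follows. This is only an illustration, not the authors' estimation routine: the tables are simulated, the two cell probabilities 0.2 and 0.7 are invented, and a coarse grid search stands in for a proper optimiser. Each table contributes the convolution likelihood of its observed column total.

```python
import random
from math import comb, log


def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def table_lik(y, n0, n1, p0, p1):
    """Probability of column total y: convolution of B(n0, p0) and B(n1, p1)."""
    return sum(
        binom_pmf(g, n0, p0) * binom_pmf(y - g, n1, p1)
        for g in range(max(0, y - n1), min(n0, y) + 1)
    )


def log_lik(p0, p1, tables):
    """Log-likelihood of all tables under the same (p0, p1)."""
    return sum(log(table_lik(y, n0, n1, p0, p1)) for n0, n1, y in tables)


def simulate_tables(n_tables, p0, p1, rng):
    """Simulated tables (n0, n1, y); only the margins are kept."""
    tables = []
    for _ in range(n_tables):
        n0 = rng.randint(2, 18)
        n1 = 20 - n0
        y = sum(rng.random() < p0 for _ in range(n0))
        y += sum(rng.random() < p1 for _ in range(n1))
        tables.append((n0, n1, y))
    return tables


def grid_ml(tables, steps=19):
    """Grid search over (p0, p1) in {0.05, 0.10, ..., 0.95}."""
    grid = [(i + 1) / (steps + 1) for i in range(steps)]
    candidates = ((a, b) for a in grid for b in grid)
    return max(candidates, key=lambda ab: log_lik(ab[0], ab[1], tables))


rng = random.Random(2004)                       # fixed seed, arbitrary
tables = simulate_tables(40, 0.2, 0.7, rng)     # hypothetical true values
p0_hat, p1_hat = grid_ml(tables)
```

The row totals must vary across tables for the two probabilities to be separable; with identical row totals in every table only a weighted combination of them is identified.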

12
Example
13
Results (both margins random)
[Table: row, total and marginal probabilities]
14
Extension: heterogeneous probabilities
  • Options
  • 1. include covariates

15
Covariates
X0
X1
16
Extension: heterogeneous probabilities
  • Options
  • 2. assume distribution for µ and ?

17
Distribution for µ and ?
  • The unobserved cells follow binomial
    distributions with parameters µ and ?

  • The probabilities µ and ? themselves follow beta
    distributions
  • Together they mix into a beta-binomial model for
    the unobserved cells
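A minimal sketch of the resulting beta-binomial probability mass function is given below. The shape parameters a = 2 and b = 3 and the cell total 7 are chosen only for illustration, and the function names are hypothetical.

```python
from math import comb, exp, lgamma


def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_binom_pmf(k, n, a, b):
    """P(K = k) when K | p ~ Binomial(n, p) and p ~ Beta(a, b)."""
    return comb(n, k) * exp(log_beta(k + a, n - k + b) - log_beta(a, b))


# Hypothetical values: a cell with total 7 and a Beta(2, 3) mixing distribution.
pmf = [beta_binom_pmf(k, 7, 2.0, 3.0) for k in range(8)]
mean = sum(k * q for k, q in enumerate(pmf))   # equals n * a / (a + b)
```

Compared with a plain binomial with the same mean, this distribution has a larger variance, which is how the mixture absorbs heterogeneity in the cell probabilities across tables.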
18
Results
19
Repeated cross-sectional data
[Figure: cross sections at t1, t2, t3]
20
Repeated cross-sectional data
[Figure: cross sections at t1, t2, t3, each shown with the preceding period t0, t1, t2]
21
Repeated cross-sectional data
  • Options
  • 1. impute at t the aggregate values of y_{t-1}
  • This grouping method is problematic if sample sizes
    are small, if X has many categories (small n), or if
    X is time-varying
  • Therefore 2. individual-level model for y_{t-1}: p_{t-1}
22
Transition model for RCS data
  • for each unit i

identity: p_t = µ_t (1 - p_{t-1}) + ?_t p_{t-1}
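The identity can be applied period by period to trace the marginal probability forward from a starting value. The sketch below assumes the second transition probability is the probability of remaining in state 1 (the "1 - Exit" quantity in the estimates later on); the names `entry` and `stay` and all numbers are illustrative only.

```python
def marginal_path(p1, entry, stay):
    """Return [p_1, p_2, ...] using p_t = entry_t * (1 - p_{t-1}) + stay_t * p_{t-1}.

    entry[k] and stay[k] are the transition probabilities into period k + 2:
    entry = P(y_t = 1 | y_{t-1} = 0), stay = P(y_t = 1 | y_{t-1} = 1).
    """
    path = [p1]
    for entry_t, stay_t in zip(entry, stay):
        previous = path[-1]
        path.append(entry_t * (1 - previous) + stay_t * previous)
    return path


# Hypothetical values: 10% in state 1 at t1, constant entry 0.05 and stay 0.90.
path = marginal_path(0.10, entry=[0.05, 0.05, 0.05], stay=[0.90, 0.90, 0.90])
```

Only the marginal probabilities on the left-hand side are observable in cross-sectional data, which is why the transition probabilities have to be tied to covariates or to several periods before they can be estimated.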
23
Marginal and transition probabilities
logit(µ_t) = X_t ß
logit(?_t) = X_t ß
logit(p_1) = X_1 d
The two ß vectors (one per transition equation) and d are
parameters to be estimated using X_t
24
Covariates X_t
  • time-constant X: cohort, gender, race, education
  • time-varying X: age, number of children, time
  • both are used to generate backcasted values of X_t,
    and these backcasted X_t are (together with the
    parameters) used to obtain p_1, µ_t, ?_t
  • for example, for a cross section observed at t2, the
    model uses the
  • observed X value at t2 to obtain µ_2, ?_2
  • backcasted X value at t1 to obtain p_1
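The backcasting step might look roughly like this for a single unit with one time-varying covariate. Everything here is an assumption made for the sketch: the covariate vector is (1, age), one period equals one year of age, and the coefficient names `beta_entry`, `beta_stay` and `delta` are hypothetical labels for the three parameter vectors.

```python
from math import exp, log


def inv_logit(x):
    return 1.0 / (1.0 + exp(-x))


def dot(x, coef):
    return sum(a * b for a, b in zip(x, coef))


def prob_at_t(age_at_t, t, beta_entry, beta_stay, delta):
    """Marginal probability p_t for a unit observed in period t (t >= 1)."""
    # Backcast the covariate vector (1, age) to every period s = 1, ..., t.
    x = {s: (1.0, age_at_t - (t - s)) for s in range(1, t + 1)}
    p = inv_logit(dot(x[1], delta))                 # logit(p_1) = X_1 d
    for s in range(2, t + 1):
        entry = inv_logit(dot(x[s], beta_entry))    # entry probability
        stay = inv_logit(dot(x[s], beta_stay))      # 1 - exit probability
        p = entry * (1 - p) + stay * p              # transition identity
    return p


def log_lik_unit(y, age_at_t, t, beta_entry, beta_stay, delta):
    """Bernoulli log-likelihood contribution of one observed 0/1 response."""
    p = prob_at_t(age_at_t, t, beta_entry, beta_stay, delta)
    return log(p) if y == 1 else log(1.0 - p)
```

Summing `log_lik_unit` over all units in all cross sections gives a log-likelihood in which each unit is observed once, while its earlier, unobserved periods enter only through the backcasted covariates.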

25
Example
[Figure: X0 and X1 at t1, t2, t3, t4]
26
Results
27
Model extensions
  • model with non-backcastable covariates X
  • time-varying covariate effects (polynomials)
  • unobserved heterogeneity (beta-binomial model)
  • multi-state model

28
Application: household ownership of a PC in NL,
1986-98
  • Data: Socio-economic Panel of Statistics
    Netherlands; n of cases: 2,028; n of
    observations (13 years x 2,028): 26,364
  • Dependent var.: household ownership of a computer
    (0/1). Covariates: age of head of household,
    education of head of household, no. of household
    members, household income, time

Note: panel data treated as independent cross
sections
29
Marginal proportions PC ownership in NL
30
ML estimates

                                 parameter   stand. error
  Entry (µ)      age 35-54          .14          .05
                 age 55           -1.36          .07
                 education          .37          .02
                 no. hh members     .42          .02
                 income             .44          .02
                 time               .22          .01
                 constant         -6.34          .12
  1 - Exit (?)   constant          2.29          .13
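To read the "1 - Exit" constant on the probability scale, it can be passed through the inverse logit. This assumes the logit link of the transition model applies to this equation and that the constant is its only term, as the table suggests.

```python
from math import exp


def inv_logit(x):
    return 1.0 / (1.0 + exp(-x))


stay_prob = inv_logit(2.29)    # probability of keeping the PC, roughly 0.91
exit_prob = 1.0 - stay_prob    # probability of giving it up, roughly 0.09
```

Under these assumptions the exit probability is the same for every household and period, since no covariates enter that equation.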

31
Verification of ML estimates
  • parametric bootstrap simulation
  • dynamic panel model
  • cross-sectional samples from panel
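The first of these checks, a parametric bootstrap, can be sketched on a deliberately simple stand-in model: a single Bernoulli probability instead of the transition model. The data are simulated and all names are hypothetical; only the procedure (simulate from the fitted model, re-estimate, summarise the spread) is the point.

```python
import random
from statistics import stdev


def fit(sample):
    """ML estimate of a Bernoulli probability: the sample proportion."""
    return sum(sample) / len(sample)


def simulate(p, n, rng):
    """Draw n Bernoulli(p) responses."""
    return [1 if rng.random() < p else 0 for _ in range(n)]


def parametric_bootstrap_se(p_hat, n, n_boot, rng):
    """Standard deviation of re-estimates from data simulated at p_hat."""
    estimates = [fit(simulate(p_hat, n, rng)) for _ in range(n_boot)]
    return stdev(estimates)


rng = random.Random(1)                      # fixed seed, arbitrary
observed = simulate(0.3, 200, rng)          # stand-in for real data
p_hat = fit(observed)
se_boot = parametric_bootstrap_se(p_hat, len(observed), 500, rng)
se_analytic = (p_hat * (1 - p_hat) / len(observed)) ** 0.5
```

If the bootstrap means and standard deviations agree with the ML estimates and their standard errors, as in the comparison on the next slide, the asymptotic standard errors can be taken as reasonable for the sample at hand.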

32
RCS vs bootstrap

                              parameter            stand. error
                              RCS     bootstrap    RCS    bootstrap
  Entry (µ)     age 35-54      .14      .14        .05      .05
                age 55       -1.36    -1.37        .07      .07
                education      .37      .37        .02      .02
                no. hh memb    .42      .42        .02      .02
                income         .44      .44        .02      .02
                time           .22      .22        .01      .01
                constant     -6.34    -6.34        .12      .12
  1 - Exit (?)  constant      2.29     2.30        .13      .13

33
RCS vs dynamic panel

                              parameter            stand. error
                              RCS     panel        RCS    panel
  Entry (µ)     age 35-54      .14     -.10        .05      .07
                age 55       -1.36    -1.27        .07      .14
                education      .37      .25        .02      .03
                no. hh memb    .42      .38        .02      .09
                income         .44      .23        .02      .02
                time           .22      .17        .01      .01
                constant     -6.34    -5.12        .12      .14
  1 - Exit (?)  constant      2.29     2.18        .13      .04

34
Independent samples from panel
Panel: n = 13 x 2,028 = 26,364
RCS sample: n = 13 x 156 = 2,028
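One way to draw such samples is to split the panel households at random into 13 groups of 156 and to use each group in one year only. Whether the authors drew their samples exactly this way is not stated; the sketch assumes disjoint groups, and the household ids are placeholders.

```python
import random


def draw_rcs_samples(household_ids, n_waves, rng):
    """Split the households at random into n_waves disjoint, equal-sized groups."""
    ids = list(household_ids)
    rng.shuffle(ids)
    size = len(ids) // n_waves
    return [ids[w * size:(w + 1) * size] for w in range(n_waves)]


rng = random.Random(0)                              # fixed seed, arbitrary
samples = draw_rcs_samples(range(2028), 13, rng)    # 13 waves of 156 households
```

Because every household then contributes a single observation, the pooled sample has the structure of genuine repeated cross sections rather than a panel treated as one.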
35
RCS vs samples from panel

                              parameter            stand. error
                              RCS     samples      RCS    samples
  Entry (µ)     age 35-54      .14      .15        .05      .05
                age 55       -1.36    -1.42        .07      .06
                education      .37      .37        .02      .03
                no. hh memb    .42      .43        .02      .02
                income         .44      .45        .02      .02
                time           .22      .22        .01      .01
                constant     -6.34    -6.43        .12      .12
  1 - Exit (?)  constant      2.29     2.39        .13      .26

36
Ecological Inference: New Methodological Strategies
Gary King, Ori Rosen & Martin Tanner (Eds.)
Cambridge University Press, July 2004
37
Crossmark by Ben Pelzer

38
Features of Crossmark
  • data handling options (weighting, model
    restrictions)
  • ML estimation (Fisher scoring, steepest ascent,
    BFGS)
  • parametric bootstrap simulation
  • MCMC simulation (Metropolis sampling)
  • Additional modules for
  • unobserved heterogeneity (beta-binomial
    distribution)
  • 3-state transition models (including
    Dirichlet-multinomial distribution)

39
Thanks!