Title: Estimating transitions from repeated cross sections
1 Estimating transitions from repeated cross sections
- Rob Eisinga
- Ben Pelzer
- University of Nijmegen NL
- Paul Lazarsfeld symposium, Brussels June 4-5 2004
2 Problem
- Can we infer individual-level transitions from repeated cross-sectional (RCS) data?
- This is similar to the well-known ecological inference (EI) problem: can we infer individual-level behaviour from aggregate data?
3 Problem in its simplest form
[Figure: two panels, EI and RCS]
4 Ecological inference problem across disciplines
- Sociology: various ASR papers, e.g. Robinson 1950, Goodman 1953, Duncan & Davis 1953
- Political science: King 1997, A solution to the ecological inference problem
- Econometrics: Golan, Judge & Miller 1996, Maximum entropy econometrics
- Epidemiology: Richardson & Montfort 2000 (and others in) Stat Med
- Marketing: Böckenholt & Dillon 2000, JMR
- Statistics:
  - information that margins provide about cells: Plackett 1977, Hamdan 1986, Haber 1989, Kocherlakota 1992
  - various JASA papers, e.g. Kalbfleisch & Lawless 1985
  - special issue on ecological analysis, JRSSA 2001
  - McCullagh & Nelder 1992, Generalized linear models
- Others: Fienberg 1997, data disclosure limitation methods
5 This meeting
6 Introduction to the EI problem
- two situations: (i) cells observed, (ii) cells unobserved
- for both situations, two sampling models: (i) product binomial, (ii) multinomial
7 Cells observed: product binomial sampling
B(3, µ), B(7, κ)
8 Cells unobserved
g = 0, 1, 2, 3
convolution of B(3, µ) and B(7, κ)
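When the cells are unobserved, only the sum of the two binomial cells is seen, so its distribution is the convolution of the two binomials. The following is a minimal sketch of that convolution, assuming independent cells; the function names and the probability values 0.6 and 0.2 are illustrative assumptions, not values from the slides.

```python
from math import comb

def binom_pmf(k, n, prob):
    """P(K = k) for K ~ B(n, prob); zero outside 0..n."""
    if k < 0 or k > n:
        return 0.0
    return comb(n, k) * prob**k * (1 - prob) ** (n - k)

def column_total_pmf(y, n1, mu, n2, kappa):
    """P(Y = y) for Y = g + h with independent g ~ B(n1, mu), h ~ B(n2, kappa).

    The sum runs over every value g that the unobserved cell could take.
    """
    return sum(
        binom_pmf(g, n1, mu) * binom_pmf(y - g, n2, kappa)
        for g in range(n1 + 1)
    )

# Row totals 3 and 7 as on the slide; the probabilities are made-up values.
pmf = [column_total_pmf(y, 3, 0.6, 7, 0.2) for y in range(11)]
```

If both probabilities are equal, the convolution collapses to a single B(10, ·) distribution, which is a convenient sanity check.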
9 Cells observed: multinomial sampling
B(10, p), B(3, µ), B(7, κ)
10 Cells unobserved
g = 0, 1, 2, 3
B(10, p), convolution of B(3, µ) and B(7, κ)
11 Maximum likelihood (ML) estimation
- We can obtain ML estimates of µ and κ (and p) if we
  - (i) have multiple tables and
  - (ii) assume homogeneous probabilities across tables
  - (e.g., McCullagh & Nelder, GLM 1992)
- Other procedures include
  - expectation maximisation (EM) (David Firth, 1982)
  - maximum entropy (ME) (Golan, Judge & Miller, 1996)
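The idea of pooling several tables under homogeneous probabilities can be sketched as follows. This is an illustration only: it uses a coarse grid search rather than any of the procedures named above, the eight tables are hypothetical data, and the second probability is written `kappa` here.

```python
from math import comb, log

def binom_pmf(k, n, prob):
    """P(K = k) for K ~ B(n, prob); zero outside 0..n."""
    if k < 0 or k > n:
        return 0.0
    return comb(n, k) * prob**k * (1 - prob) ** (n - k)

def column_total_pmf(y, n1, mu, n2, kappa):
    """Convolution of B(n1, mu) and B(n2, kappa) evaluated at y."""
    return sum(
        binom_pmf(g, n1, mu) * binom_pmf(y - g, n2, kappa)
        for g in range(n1 + 1)
    )

def log_likelihood(tables, mu, kappa):
    """Sum of log-probabilities of the observed column totals."""
    return sum(
        log(column_total_pmf(y, n1, mu, n2, kappa)) for n1, n2, y in tables
    )

def grid_ml(tables, steps=50):
    """Return the (mu, kappa) grid point with the highest log-likelihood."""
    grid = [i / steps for i in range(1, steps)]
    return max(
        ((mu, kappa) for mu in grid for kappa in grid),
        key=lambda pair: log_likelihood(tables, *pair),
    )

# Hypothetical tables: (row total 1, row total 2, observed column total).
TABLES = [
    (3, 7, 4), (8, 2, 6), (5, 5, 4), (9, 1, 7),
    (1, 9, 2), (6, 4, 5), (2, 8, 3), (7, 3, 6),
]
mu_hat, kappa_hat = grid_ml(TABLES)
```

The row totals must differ across tables for the two probabilities to be separated; with identical row totals in every table the likelihood cannot tell them apart.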
12 Example
13 Results (both margins random)
[Table: row prob., total prob., marginal prob.]
14 Extension: heterogeneous probabilities
- Options
- 1. include covariates
15 Covariates
[Figure: tables for X = 0 and X = 1]
16 Extension: heterogeneous probabilities
- Options
- 2. assume distribution for µ and ?
17 Distribution for µ and κ
- The unobserved cells follow binomial distributions with parameters µ and κ
- The probabilities µ and κ themselves follow beta distributions
- Together they mix into a beta-binomial model for the unobserved cells
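The mixing step can be made concrete with the beta-binomial probability mass function. The sketch below is a generic textbook form, with shape parameters `a` and `b` chosen purely for illustration; it is not taken from the authors' software.

```python
from math import comb, exp, lgamma

def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def beta_binom_pmf(k, n, a, b):
    """P(K = k) when K | q ~ B(n, q) and q ~ Beta(a, b)."""
    return comb(n, k) * exp(log_beta(k + a, n - k + b) - log_beta(a, b))

# Illustrative values: a cell with row total 7 and a Beta(2, 3) mixing distribution.
n, a, b = 7, 2.0, 3.0
pmf = [beta_binom_pmf(k, n, a, b) for k in range(n + 1)]
mean = sum(k * pk for k, pk in enumerate(pmf))
variance = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))
```

The mean equals n·a/(a+b), as for a binomial with the same average probability, but the variance is larger; that extra dispersion is what absorbs the heterogeneity.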
18 Results
19 Repeated cross-sectional data
[Figure: cross sections observed at t = 1, t = 2, t = 3]
20 Repeated cross-sectional data
[Figure: cross sections at t = 1, 2, 3, each shown with the preceding period (t = 0, 1, 2)]
21 Repeated cross-sectional data
[Figure: cross sections at t = 1, t = 2, t = 3]
- Options
- 1. impute at t the aggregate values of $y_{t-1}$
  this grouping method is problematic
  - if sample sizes are small
  - if X has many categories (small n)
  - if X is time-varying
- therefore 2. individual-level model for $y_{t-1}$: $p_{t-1}$
22 Transition model for RCS data
identity: $p_t = \mu_t (1 - p_{t-1}) + \kappa_t \, p_{t-1}$
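The identity is a one-step recursion, so the whole path of marginal probabilities follows from a starting value and the two transition probabilities. A minimal sketch, with the second transition probability written `kappa` and all numerical values made up for illustration:

```python
def marginal_path(p1, mu, kappa):
    """Return [p_1, ..., p_T] from p_t = mu_t (1 - p_{t-1}) + kappa_t p_{t-1}.

    mu and kappa hold the transition probabilities for t = 2, ..., T.
    """
    path = [p1]
    for mu_t, kappa_t in zip(mu, kappa):
        previous = path[-1]
        path.append(mu_t * (1 - previous) + kappa_t * previous)
    return path

# Illustrative constant transition probabilities over 13 periods.
path = marginal_path(0.10, [0.05] * 12, [0.90] * 12)
```

With constant transition probabilities the path converges to µ / (1 - κ + µ), which is 1/3 for the values used here.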
23 Marginal and transition probabilities
$\mathrm{logit}(\mu_t) = X_t \beta$
$\mathrm{logit}(\kappa_t) = X_t \beta^*$
$\mathrm{logit}(p_1) = X_1 \delta$
$\beta$, $\beta^*$, $\delta$ are parameters to be estimated using $X_t$
24 Covariates $X_t$
- time-constant X: cohort, gender, race, education
- time-varying X: age, number of children, time
- these are used to generate backcasted values of $X_t$, and the backcasted $X_t$ are (together with the parameters) used to obtain $p_1$, $\mu_t$, $\kappa_t$
for example
- for a cross section observed at t = 2, the model uses the
  - observed X value at t = 2 to obtain $\mu_2$, $\kappa_2$
  - backcasted X value at t = 1 to obtain $p_1$
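The example for a cross section observed at t = 2 can be sketched in code. Everything specific here is an assumption made for illustration: the covariates (age and education), a period length of one year for the backcast, the coefficient values, and the function names. The second transition probability is written `kappa`.

```python
from math import exp

def inv_logit(z):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + exp(-z))

def linear(x, coef):
    """Linear predictor x'coef."""
    return sum(xi * ci for xi, ci in zip(x, coef))

def p_at_t2(age_at_t2, education, beta, beta_star, delta):
    """P(y_2 = 1) for one person observed only at t = 2.

    Each coefficient vector is (constant, age, education).
    """
    x2 = (1.0, age_at_t2, education)        # observed at t = 2
    x1 = (1.0, age_at_t2 - 1.0, education)  # backcast: one year younger
    p1 = inv_logit(linear(x1, delta))          # marginal probability at t = 1
    mu2 = inv_logit(linear(x2, beta))          # entry probability at t = 2
    kappa2 = inv_logit(linear(x2, beta_star))  # second transition probability
    return mu2 * (1.0 - p1) + kappa2 * p1

# Hypothetical coefficients, for illustration only.
example = p_at_t2(
    age_at_t2=40.0,
    education=2.0,
    beta=(-3.0, -0.02, 0.4),
    beta_star=(2.0, 0.0, 0.0),
    delta=(-4.0, -0.02, 0.5),
)
```

Education is treated as time-constant, so it needs no backcasting; age is shifted back deterministically. The resulting probability would enter a Bernoulli likelihood for the observed $y_2$.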
25 Example
[Figure: tables for X = 0 and X = 1 at t = 1, t = 2, t = 3, t = 4]
26 Results
27 Model extensions
- model with non-backcastable covariates X
- time-varying covariate effects (polynomials)
- unobserved heterogeneity (beta-binomial model)
28 Application: household ownership of a PC in NL, 1986-98
- Data: Socio-economic Panel of Statistics Netherlands
  - n of cases: 2,028
  - n of observations (13 years x 2,028): 26,364
- Dependent variable: household ownership of a computer (0/1)
- Covariates: age of head of household, education of head of household, no. of household members, household income, time
- Note: panel data treated as independent cross sections
29 Marginal proportions of PC ownership in NL
30 ML estimates
                                 parameter  stand. error
Entry (µ)       age 35-54              .14           .05
                age 55               -1.36           .07
                education              .37           .02
                no. hh members         .42           .02
                income                 .44           .02
                time                   .22           .01
                constant             -6.34           .12
1 - Exit (κ)    constant              2.29           .13
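The estimates in this table are on the logit scale, so a constant can be turned into a probability with the inverse logit. The sketch below does this for the two constants; reading the 2.29 row as the logit of the probability of remaining an owner is an interpretation of the row label, and how the covariates were coded is not stated in the slides.

```python
from math import exp

def inv_logit(z):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + exp(-z))

# The second row has only a constant, so this probability applies to every household.
stay_probability = inv_logit(2.29)   # roughly 0.91

# Entry probability when all covariates in the entry equation equal zero.
baseline_entry = inv_logit(-6.34)    # below 0.002
```

The positive time coefficient (.22 per year) raises the entry logit steadily over the observation period.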
31 Verification of ML estimates
- parametric bootstrap simulation
- cross-sectional samples from panel
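The logic of a parametric bootstrap check can be sketched on a deliberately simple stand-in model: simulate new data from the fitted model, re-estimate on each simulated data set, and compare the mean and spread of the re-estimates with the original estimate and its standard error. The binomial proportion used here is an assumption made to keep the example short; it is not the transition model itself.

```python
import random
from statistics import mean, stdev

def simulate(p, n, rng):
    """Draw n Bernoulli(p) outcomes and return how many equal 1."""
    return sum(rng.random() < p for _ in range(n))

def parametric_bootstrap(p_hat, n, replications=200, seed=1):
    """Re-estimate p on data simulated from the fitted value p_hat."""
    rng = random.Random(seed)
    estimates = [simulate(p_hat, n, rng) / n for _ in range(replications)]
    return mean(estimates), stdev(estimates)

# Illustrative fitted value and sample size.
boot_mean, boot_se = parametric_bootstrap(p_hat=0.30, n=500)
```

Agreement between the bootstrap mean and the estimate, and between the bootstrap standard deviation and the analytic standard error, is the kind of evidence the comparison tables report.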
32 RCS vs bootstrap
                                     parameter          stand. error
                                       RCS bootstrap       RCS bootstrap
Entry (µ)       age 35-54              .14       .14       .05       .05
                age 55               -1.36     -1.37       .07       .07
                education              .37       .37       .02       .02
                no. hh memb.           .42       .42       .02       .02
                income                 .44       .44       .02       .02
                time                   .22       .22       .01       .01
                constant             -6.34     -6.34       .12       .12
1 - Exit (κ)    constant              2.29      2.30       .13       .13
33 RCS vs dynamic panel
                                     parameter          stand. error
                                       RCS     panel       RCS     panel
Entry (µ)       age 35-54              .14      -.10       .05       .07
                age 55               -1.36     -1.27       .07       .14
                education              .37       .25       .02       .03
                no. hh memb.           .42       .38       .02       .09
                income                 .44       .23       .02       .02
                time                   .22       .17       .01       .01
                constant             -6.34     -5.12       .12       .14
1 - Exit (κ)    constant              2.29      2.18       .13       .04
34 Independent samples from panel
Panel: n = 13 x 2,028
RCS sample: n = 13 x 156 = 2,028
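One way to read this design is that the 2,028 panel households are split into 13 disjoint groups of 156, and each group contributes its observation from one year only, so no household appears twice. A minimal sketch of such a split follows; the function name, the seed, and the use of integer household ids are assumptions.

```python
import random

def split_panel(household_ids, n_waves, seed=0):
    """Shuffle the ids and cut them into n_waves disjoint, equal-sized groups."""
    ids = list(household_ids)
    random.Random(seed).shuffle(ids)
    size = len(ids) // n_waves
    return [ids[w * size:(w + 1) * size] for w in range(n_waves)]

# 2,028 households over 13 yearly waves gives 156 households per wave.
samples = split_panel(range(2028), 13)
```

Group w would then supply only the year-w records, which mimics independent cross sections drawn from the same population.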
35 RCS vs samples from panel
                                     parameter          stand. error
                                       RCS   samples       RCS   samples
Entry (µ)       age 35-54              .14       .15       .05       .05
                age 55               -1.36     -1.42       .07       .06
                education              .37       .37       .02       .03
                no. hh memb.           .42       .43       .02       .02
                income                 .44       .45       .02       .02
                time                   .22       .22       .01       .01
                constant             -6.34     -6.43       .12       .12
1 - Exit (κ)    constant              2.29      2.39       .13       .26
36 Ecological Inference: New Methodological Strategies
Gary King, Ori Rosen & Martin Tanner (Eds.)
Cambridge University Press, July 2004
37 Crossmark by Ben Pelzer
38 Features of Crossmark
- data handling options (weighting, model restrictions)
- ML estimation (Fisher scoring, steepest ascent, BFGS)
- parametric bootstrap simulation
- MCMC simulation (Metropolis sampling)
- Additional modules for
  - unobserved heterogeneity (beta-binomial distribution)
  - 3-state transition models (including Dirichlet-multinomial distribution)
39 Thanks!