Title: Analysis of Cross-National Longitudinal Data
1 Analysis of Cross-National Longitudinal Data
Marc Callens (CBGS, Brussels; KULeuven, Leuven)
European Science Foundation Programme: Quantitative Methods in the Social Sciences (QMSS)
Seminar: Theory and Practice in the Analysis of Cross-National Cross-Sectional Data
25-26 August 2005, University of Oxford, United Kingdom
2 Overview
- Contextual regression
- Multilevel logistic regression
- - Models
- - Estimation
- Retrospective data
- - The Impact of Education on Third Births (FFS)
- Panel data
- - Poverty Dynamics in Europe (ECHP)
3 I. Contextual regression
- Callens, M. (2004), Regression Modelling of Cross-National Data, Brussels: PSPC.
4 Contextual regression - 1
- a nested data structure
- - j = 1, ..., N countries
- - i = 1, ..., $n_j$ individuals
- responses $y_{ij}$ depend on
- - $x_{ij}$: individual-level explanatory variable
- - $z_j$: country-level explanatory variable
- $y_{ij}$ are correlated within each country
- ordinary multiple regression is not adequate here
- - independence assumption for $y_{ij}$ is violated
- - how to include the country-level explanatory variable $z_j$?
5 Contextual regression - 2
- Key methodological issues
- Small N
- - technical problems (few degrees of freedom, ...)
- - only simple models possible
- Galton problem
- - nations are not independent (cultural diffusion, ...)
- Black box
- - how to explain the impact of country-level variables?
- Equivalence (see Harkness et al., 2003)
- - how comparable are the data?
6 Contextual regression - 3
- 1. non-hierarchical models (e.g., separate regressions, pooled regression)
- - ignore hierarchical data structure
- 2. classical contextual models (e.g., ANCOVA)
- - acknowledge hierarchical data structure
- - fixed intercepts $\alpha_j$ and slopes $\beta_j$
- 3. modern multilevel models (e.g., multilevel analysis)
- - acknowledge hierarchical data structure
- - random intercepts $\alpha_j$ and slopes $\beta_j$
7 Separate regressions - 1
- N models (one for each country C): $y_{iC} = \alpha_C + \beta_C x_{iC} + \varepsilon_{iC}$
- parameters to estimate
- - for each country a specific intercept $\alpha_C$
- - for each country a specific slope $\beta_C$
- lack of parsimony: what if N is large?
- how to introduce country-level covariates $z_j$?
8 Pooled regression - 1
- one (global) model: $y_{ij} = \alpha + \beta x_{ij} + \varepsilon_{ij}$
- parameters to estimate
- - one (global) intercept $\alpha$
- - one (global) slope $\beta$
- country membership is ignored
- how to introduce country-level covariates $z_j$?
9 Analysis of Covariance - 1
- one (global) model: $y_{ij} = \alpha_j + \beta x_{ij} + \varepsilon_{ij}$
- countries enter the model as N-1 dummies
- parameters to estimate
- - for each country a specific intercept $\alpha_j$
- - one global slope $\beta$
- all countries have the same slope: unrealistic
- how to introduce country-level covariates $z_j$?
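The three non-multilevel approaches (separate regressions, pooled regression, analysis of covariance) can be compared on a small example. The sketch below is only an illustration: the toy data and the helper `ols` are hypothetical, and a single explanatory variable is assumed. It shows how the number of estimated parameters changes from 2N (separate) to 2 (pooled) to N + 1 (ANCOVA).

```python
from statistics import mean

def ols(xs, ys):
    """Least squares fit of y = a + b * x; returns (intercept, slope)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

# Hypothetical toy data: country -> (x values, y values)
data = {
    "A": ([1, 2, 3, 4], [2.0, 3.1, 3.9, 5.0]),
    "B": ([1, 2, 3, 4], [5.1, 5.9, 7.0, 8.1]),
    "C": ([1, 2, 3, 4], [0.9, 1.5, 2.1, 2.4]),
}

# 1. Separate regressions: one intercept and one slope per country (2N parameters)
separate = {c: ols(xs, ys) for c, (xs, ys) in data.items()}

# 2. Pooled regression: country membership ignored (2 parameters)
all_x = [x for xs, _ in data.values() for x in xs]
all_y = [y for _, ys in data.values() for y in ys]
pooled = ols(all_x, all_y)

# 3. ANCOVA: country-specific intercepts, one common slope (N + 1 parameters).
#    The common slope is the within-country slope from country-demeaned data.
sxx = sxy = 0.0
for xs, ys in data.values():
    mx, my = mean(xs), mean(ys)
    sxx += sum((x - mx) ** 2 for x in xs)
    sxy += sum((x - mx) * (y - my) for x, y in zip(xs, ys))
common_slope = sxy / sxx
ancova_intercepts = {
    c: mean(ys) - common_slope * mean(xs) for c, (xs, ys) in data.items()
}
```

None of the three fits has a natural place for a country-level covariate alongside the country-specific terms, which is the question each slide ends on.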
10 Multilevel Regression - 1
- one model: $y_{ij} = \alpha_j + \beta_j x_{ij} + \varepsilon_{ij}$
- $\alpha_j$ and $\beta_j$ are random variables, following a normal distribution
- - random intercept $\alpha_j \sim N(\alpha, \sigma_0^2)$
- - random slope $\beta_j \sim N(\beta, \sigma_1^2)$
- parameters to estimate
- - one average intercept $\alpha$
- - one average slope $\beta$
- - intercept variance $\sigma_0^2$
- - slope variance $\sigma_1^2$
- - intercept-slope covariance $c_{01}$
- inclusion of country-level covariates $z_j$ possible!
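To make the random intercept and random slope concrete, the following sketch simulates data from such a model. It is an illustration under stated assumptions, not the author's code: the function name, the parameter values, the uniform distribution of x, and the residual standard deviation `s_e` are all hypothetical. The names `a`, `b`, `s0`, `s1` and `c01` stand for the average intercept, average slope, intercept and slope standard deviations, and intercept-slope covariance.

```python
import math
import random

def simulate_multilevel(n_countries, n_per_country, a, b, s0, s1, c01, s_e, seed=0):
    """Simulate y_ij = a_j + b_j * x_ij + e_ij with a_j ~ N(a, s0^2),
    b_j ~ N(b, s1^2) and cov(a_j, b_j) = c01 (requires |c01| <= s0 * s1)."""
    rng = random.Random(seed)
    # Cholesky-style construction of the correlated pair (a_j, b_j)
    load = c01 / s0                             # loading of b_j on the intercept shock
    resid_sd = math.sqrt(s1 ** 2 - load ** 2)   # slope variation not shared with a_j
    rows = []
    for j in range(n_countries):
        z0, z1 = rng.gauss(0, 1), rng.gauss(0, 1)
        a_j = a + s0 * z0
        b_j = b + load * z0 + resid_sd * z1
        for i in range(n_per_country):
            x = rng.uniform(0, 10)              # hypothetical individual covariate
            y = a_j + b_j * x + rng.gauss(0, s_e)
            rows.append((j, i, x, y))
    return rows

rows = simulate_multilevel(n_countries=10, n_per_country=50,
                           a=1.0, b=0.5, s0=1.0, s1=0.2, c01=0.1, s_e=1.0)
```

Whatever the number of countries, the model itself is described by the same five parameters listed on the slide, plus the residual variance.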
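11 Multilevel Regression - 2
- Empty model: $y_{ij} = \alpha_j + \varepsilon_{ij}$ (random intercept)
- Random intercept model: $y_{ij} = \alpha_j + \beta x_{ij} + \varepsilon_{ij}$ (individual covariate)
- Random slope model: $y_{ij} = \alpha_j + \beta_j x_{ij} + \varepsilon_{ij}$ (random slope)
- Extended random models: $y_{ij} = \alpha_j + \beta_j x_{ij} + \gamma z_j + \delta x_{ij} z_j + \varepsilon_{ij}$ (country covariate)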
12 Contextual regression summary
                 Separate          Pooled   Ancova   Multilevel
models           one per country   1        1        1
parms            very large        large    large    small
variances        no                no       no       yes
country vars?    no                1        1        yes
complex?         no                no       no       yes
13 II. Multilevel logistic regression
- Callens, M. and C. Croux (to appear), Performance of Likelihood-based Estimation Methods for Multilevel Binary Regression Models, Journal of Statistical Computation and Simulation.
14 Multilevel Logistic Regression
- binary responses $y_{ij}$ depend on
- - $x_{ij}$: individual-level explanatory variable
- - $z_j$: group-level explanatory variable
- a nested data structure: two levels
- - level 2: N groups (j = 1, ..., N)
- - level 1: in each group $n_j$ individuals (i = 1, ..., $n_j$)
- - in each group responses $y_{ij}$ are correlated
- standard logistic regression model not adequate
- - independence assumption is violated here
- - inclusion of multiple group-level covariates?
15 Models for
- standard logistic regression model
- random logistic regression models
- - random intercepts
- - random slopes
- extended random logistic regression models
- - cross-level interaction
- - group-level explanatory variable
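The formulas for these models are not preserved in the transcript. As a hedged sketch, the usual textbook forms, written to parallel the linear models on slide 11, would be as follows. The symbols $\pi_{ij} = \Pr(y_{ij} = 1)$, $u_{0j}$, $u_{1j}$, $\gamma$ and $\delta$ are assumptions introduced here, with $(u_{0j}, u_{1j})$ taken to be bivariate normal with mean zero.

```latex
\begin{align*}
\text{standard:} \quad
  & \operatorname{logit}(\pi_{ij}) = \alpha + \beta x_{ij} \\
\text{random intercepts:} \quad
  & \operatorname{logit}(\pi_{ij}) = \alpha + \beta x_{ij} + u_{0j} \\
\text{random slopes:} \quad
  & \operatorname{logit}(\pi_{ij}) = \alpha + \beta x_{ij} + u_{0j} + u_{1j} x_{ij} \\
\text{extended:} \quad
  & \operatorname{logit}(\pi_{ij}) = \alpha + \beta x_{ij} + \gamma z_j
    + \delta x_{ij} z_j + u_{0j} + u_{1j} x_{ij}
\end{align*}
```

Here $\gamma z_j$ is the group-level explanatory variable and $\delta x_{ij} z_j$ the cross-level interaction.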
16 Multilevel discrete-time hazard models for event data
- binary responses $y_{ijt}$ are event data
- - $y_{ijt} = 1$: event occurrence for individual i in group j at time t
- - $y_{ijt} = 0$: event non-occurrence for individual i in group j at time t
- - (a third birth in 1995? in 1996? ...)
- event data analysis may be complicated by censoring
- - e.g., for some i, an event may occur after the survey
- - hazard models can take censoring into account
- recurrent events for i are possible
- - (becoming poor)
17 - maximum likelihood estimation
- - via equivalent model (Allison, 1982)
- - requires data in person-period format
- $p_{it} = \Pr(y_{it} = 1)$, where $y_{it}$ is the indicator for event occurrence in time-period t
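The person-period format has one row per individual per time period at risk, with a binary indicator that is 1 only in the period where the event occurs. A minimal sketch of the conversion follows; the function name and the input layout `(person_id, last_period, event)` are hypothetical choices for illustration.

```python
def to_person_period(spells):
    """Expand (person_id, last_period, event) spells into person-period rows.

    last_period: last discrete period in which the person is observed at risk
    event: 1 if the event occurs in last_period, 0 if the spell is censored
    Returns a list of (person_id, t, y), with y = 1 only in the event period.
    """
    rows = []
    for person_id, last_period, event in spells:
        for t in range(1, last_period + 1):
            y = 1 if (event == 1 and t == last_period) else 0
            rows.append((person_id, t, y))
    return rows

# Hypothetical spells: person 1 has the event in period 3,
# person 2 is censored after period 2.
spells = [(1, 3, 1), (2, 2, 0)]
person_period = to_person_period(spells)
```

A censored individual simply contributes rows with y = 0 for every period observed, which is how censoring is taken into account when an ordinary logistic regression is fitted to the expanded rows.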
18 Performance of estimation methods
- performance of generally available estimation methods?
- maximum likelihood estimation
- - step 1: compute likelihood L
- - step 2: maximise L with respect to model parameters
- difficulty: the likelihood involves an intractable integral
- - random effects u: Normal distribution
- - responses $y_{ij}$: Bernoulli distribution
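For a random-intercept model, the contribution of group j to the likelihood is the Bernoulli likelihood averaged over the normal random effect, $L_j = \int \prod_i p_{ij}(u)^{y_{ij}} (1 - p_{ij}(u))^{1 - y_{ij}} \, \phi(u; 0, \sigma_u^2) \, du$, which has no closed form. One standard numerical approach is non-adaptive Gauss-Hermite quadrature. The sketch below uses a 3-point rule purely for illustration; the function names and the linear predictors `etas` are hypothetical, and real software uses more nodes.

```python
import math

# 3-point Gauss-Hermite rule for the weight function exp(-x^2)
GH_NODES = (-math.sqrt(1.5), 0.0, math.sqrt(1.5))
GH_WEIGHTS = (math.sqrt(math.pi) / 6,
              2 * math.sqrt(math.pi) / 3,
              math.sqrt(math.pi) / 6)

def expit(v):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-v))

def group_likelihood(ys, etas, sigma_u):
    """Approximate the marginal likelihood of one group.

    ys: binary responses in the group
    etas: fixed-part linear predictors, one per response
    sigma_u: standard deviation of the random intercept
    """
    total = 0.0
    for node, weight in zip(GH_NODES, GH_WEIGHTS):
        u = math.sqrt(2.0) * sigma_u * node   # change of variable u = sqrt(2) * sigma * x
        cond = 1.0                            # likelihood conditional on u
        for y, eta in zip(ys, etas):
            p = expit(eta + u)
            cond *= p if y == 1 else (1.0 - p)
        total += weight * cond
    return total / math.sqrt(math.pi)

# Hypothetical group with three individuals
lik = group_likelihood([1, 0, 1], [0.2, -0.1, 0.5], sigma_u=1.0)
```

The full likelihood L is the product of such terms over groups. Adaptive quadrature differs in that it recentres and rescales the nodes for each group.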
19 - estimation: three methods
- - penalised quasi-likelihood
- - non-adaptive Gaussian quadrature
- - adaptive Gaussian quadrature
- how good? four performance indicators
- - numerical convergence
- - bias
- - mean-squared error (MSE)
- - computational efficiency
- simulation setting: fractional factorial experiment
- - full experiment would take 6 months
- - fractional experiment only 1 month
- key finding: penalised quasi-likelihood performs (surprisingly) well
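Two of the performance indicators, bias and mean-squared error, are computed from the estimates obtained over repeated simulated data sets. The sketch below shows the arithmetic only; the function names and the replicate values are hypothetical and are not results from the study.

```python
from statistics import mean

def bias(estimates, true_value):
    """Average estimate minus the true parameter value."""
    return mean(estimates) - true_value

def mse(estimates, true_value):
    """Average squared distance between estimate and true value."""
    return mean((e - true_value) ** 2 for e in estimates)

# Hypothetical estimates of a parameter whose true value is 1.0
replicates = [0.9, 1.1, 1.0, 1.2]
b = bias(replicates, 1.0)
m = mse(replicates, 1.0)
```

The MSE combines both sources of error: it equals the squared bias plus the variance of the estimates around their own mean.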
20 III. Application for retrospective data
- Callens, M. and C. Croux (to appear), The Impact of Education on Third Births. A Multilevel Discrete-Time Hazard Analysis, Journal of Applied Statistics.
21 III. Application for Retrospective Data
The Impact of Education on Third Births. A Multilevel Discrete-Time Hazard Analysis
Callens, M. and C. Croux, Journal of Applied Statistics (to appear)
31 - key findings for multilevel models
- - negative effect for education
- - positive effect for Nordic countries
32 IV. Application for panel data
Comparative Poverty Dynamics: A Multilevel Discrete-Time Recurrent Hazard Analysis
Callens, M. and C. Croux (preprint)
33 Poverty: theoretical perspectives
- general poverty theory? (McKernan, 2002)
- individual and structural perspectives (Iceland, 2003)
- individual theories: the poor create their poverty
- - life cycle hypothesis (e.g., Rowntree, 1901): life cycle events (e.g., a divorce)
- - human capital theory (Becker, 1975): education, age, gender, ...
34 Poverty: theoretical perspectives
- structural theories: economic and social policy systems
- skills mismatch hypothesis (Kasarda, 1990)
- - deindustrialisation
- - technological change
- welfare state regimes (Esping-Andersen, 1999)
- - level and design of welfare benefits
- - relative role of state and market
- - four types: social-democratic, conservative, liberal, southern
35 Individual hypotheses for poverty entry
- h1: demographic and labour market events
- h2: for women, the effect of demographic events is stronger

Expected effects (h1)
event          effect
marriage       -
divorce        +
employment     -
unemployment   +

Expected effects for women (h2)
event          effect
marriage       - -
divorce        + +
36 Structural hypotheses for poverty entry
- ranking of welfare regimes by dominant income changes
- - h3: non-earned income changes dominate
- - h4: earned income changes dominate
- 4 = most dynamic

                    dominant income changes
regime              non-earned   earned
southern            4            1
liberal             3            2
conservative        2            3
social-democratic   1            4
37 Structural hypotheses for poverty entry
mismatch effect
deindustrialisation
technological change
38 Two longitudinal EU databases (linked at NUTS1 level)
- individual-level panel data
- - European Community Household Panel (ECHP)
- - yearly individual and household panel (1994-98)
- - income, employment, housing, healthcare, ...
- regional-level time series
- - REGIO database, New Cronos
- - regional time series
- - demography, unemployment, education, ...
39 Poverty status in a year
- compare income with relative poverty line
- - country-specific poverty thresholds
- - equivalised individual income
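The comparison can be sketched as follows. The transcript does not preserve how the threshold was defined, so this is an illustration under an assumption: the poverty line is taken to be a fixed fraction of the national median equivalised income, and the default of 0.6 is a placeholder. The function name and the income figures are hypothetical.

```python
from statistics import median

def poverty_status(incomes_by_country, fraction=0.6):
    """Flag individuals whose equivalised income lies below a country-specific
    relative poverty line, here assumed to be fraction * national median."""
    status = {}
    for country, incomes in incomes_by_country.items():
        threshold = fraction * median(incomes)
        status[country] = [income < threshold for income in incomes]
    return status

# Hypothetical equivalised yearly incomes for two countries
incomes = {
    "be": [8000, 12000, 15000, 20000, 30000],
    "pt": [3000, 5000, 6000, 9000, 14000],
}
poor = poverty_status(incomes)
```

Because the threshold is computed within each country, the same income can count as poor in one country and not in another.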
40 Discrete-time hazard models
- binary responses $y_{1ijt}$ are discrete-time event data
- - $y_{1ijt} = 1$: poverty entry for individual i in region j in year t
- - $y_{1ijt} = 0$: no poverty entry for individual i in region j in year t
- - (poverty entry in 1995? in 1996? ...)
- event data analysis may be complicated by censoring
- - e.g., for some i, an event may occur after the survey
- - hazard models can take censoring into account
41 - discrete-time hazard
- - $T_{ij}$ is the time of occurrence of event $y_{1ijt}$ for individual i belonging to region j
- - conditional on being at risk
- proportional odds model
- - $\alpha_t$: a set of t intercepts, one for each discrete-time unit
- estimation via equivalent logistic regression model (Allison, 1982)
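The formulas on this slide are not preserved in the transcript. A hedged reconstruction from the surviving annotations, using the standard discrete-time definitions, would read as follows. The symbol $h_{ijt}$ and the coefficient vector $\beta$ are notation assumed here.

```latex
\begin{align*}
h_{ijt} &= \Pr(T_{ij} = t \mid T_{ij} \ge t) \\
\operatorname{logit}(h_{ijt}) &= \log\frac{h_{ijt}}{1 - h_{ijt}}
  = \alpha_t + \beta' x_{ijt}
\end{align*}
```

The model is called proportional odds because a one-unit change in a covariate multiplies the odds of the event by the same factor $e^{\beta}$ in every period, while $\alpha_t$ sets the baseline for each period.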
42 Multilevel discrete-time recurrent hazard analysis
- extension 1: poverty entry is a recurrent event
- - here, an individual may experience two poverty entries in a row
- - so, we simultaneously model k = 2 discrete-time hazards
- - we allow for event-specificity of the baseline hazard (k = 1, 2)
- extension 2: dependent individuals, hence a multilevel model
- - dependency of individuals in a region is modelled by random intercepts
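Putting the two extensions together, one plausible way to write the model is sketched below. This is an assumption based on the slide's annotations rather than the authors' exact specification: it lets only the baseline intercepts depend on the event number k, and it takes the regional random intercept $u_j$ to be normal with variance $\sigma_u^2$. The symbols $h_{kijt}$, $\alpha_{kt}$, $\beta$, $\gamma$ and $u_j$ are introduced here.

```latex
\begin{align*}
\operatorname{logit}(h_{kijt}) &= \alpha_{kt} + \beta' x_{ijt} + \gamma' z_{jt} + u_j,
  \qquad k = 1, 2 \\
u_j &\sim N(0, \sigma_u^2)
\end{align*}
```

Individuals in the same region share $u_j$, which induces the within-region dependence that a single-level hazard model ignores.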
43 Explanatory variables at individual level i
- $x_{ijt}$: demographic events (changes in a year)
- - marriage: from never married to married
- - divorce: from married to divorced/separated
- $x_{ijt}$: labour market events (changes in a year)
- - employment: from unemployed to employed
- - unemployment: from employed to unemployed
44 Control variables at individual level i
- $x_{ij}$: current status
- - education: <sec, sec, third, at school
- $x_{ijt}$
- - age: 16-40, 41-50, 51-60, 60+
- - civil status: married, sep./div., wid., unmarried
- - cohabiting status: no, yes
- - activity status: inactive, unemployed, working (15+)
- - health status: bad, good
- - household type: single, single+child, couple+child, other
- - duration: -4, -3, -2, -1, 0, 1
45 Explanatory variables at regional level j
- $z_j$: welfare regime
- - southern: es, it, gr, pt
- - liberal: uk, ie
- - conservative: be, fr, ge
- - social-democratic: dk (ref)
- $z_j$: deindustrialisation
- - employment in the service sector (relative to dk)
- $z_{jt}$: technological change
- - employment in the R&D business sector (relative to dk)
46 Control variables at regional level j
- $z_j$
- - employment rate: working in total population (rel. to dk)
- $z_{jt}$
- - unemployment rate: unemployed in active population (rel. to dk)
- - relative GDP: GDP relative to the EU-15 average (rel. to dk)
- - GDP growth: log differences
47 Poverty entry: results at individual level
- OR: odds ratio
- * p < 0.05, ** p < 0.01, *** p < 0.001

                 women                 men
event            effect   OR     p     effect   OR     p
marriage         (-)      0.75         (+)      1.30
divorce                   5.28         (+)      1.27
employment                1.57         (+)      1.20
unemployment              1.34                  1.77
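An odds ratio multiplies the odds of poverty entry, not the probability. To see what an odds ratio such as the 5.28 reported for divorce among women means on the probability scale, it has to be applied to a baseline hazard. The sketch below does that conversion; the baseline of 0.05 is a hypothetical value chosen for illustration, not a figure from the study.

```python
def apply_odds_ratio(baseline_prob, odds_ratio):
    """Return the probability obtained by multiplying the baseline odds by odds_ratio."""
    odds = baseline_prob / (1.0 - baseline_prob) * odds_ratio
    return odds / (1.0 + odds)

baseline = 0.05  # hypothetical yearly hazard of poverty entry without the event
hazard_after_divorce = apply_odds_ratio(baseline, 5.28)
```

For small baseline hazards the odds ratio is close to a ratio of probabilities, and the two diverge as the baseline grows.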
48 Key results for individual level
- individual level: strong impact
- for women: demographic events > labour market events
- for men: labour market events > demographic events
49 Poverty entry: results at regional level
- Welfare regimes
- OR: odds ratio
- * p < 0.05, ** p < 0.01, *** p < 0.001

                     women                 men
regime               effect   OR     p     effect   OR     p
southern             -        0.61         -        0.55
liberal              (+)      1.12         (-)      0.85
conservative         --       0.59         --       0.53
social-democratic    (ref)    1            (ref)    1
50 Poverty entry: results at regional level
- OR: odds ratio
- * p < 0.05, ** p < 0.01, *** p < 0.001

                       women                 men
skills mismatch        effect   OR     p     effect   OR     p
deindustrialisation    (+)      1.00         (+)      1.01
technological change   (+)      1.02         (+)      1.11
51 Key results for regional level
- regional-level variation: small, but relevant
- welfare regimes have an impact
- - but theoretically ambiguous
52 Conclusion
- individual > regional
- women: demographic events
- men: labour market events
- welfare regime important, but how?