Analysis of Cross-National Longitudinal Data, Marc Callens (CBGS, Brussels; KULeuven, Leuven)

1
Analysis of Cross-National Longitudinal Data
Marc Callens (CBGS, Brussels; KULeuven, Leuven)
European Science Foundation Programme "Quantitative Methods in the Social Sciences" (QMSS)
Seminar: Theory and Practice in the Analysis of Cross-National Cross-Sectional Data
25-26 August 2005, University of Oxford, United Kingdom
2
Overview

I. Contextual regression
II. Multilevel logistic regression
- Models
- Estimation
III. Retrospective data
- The Impact of Education on Third Births (FFS)
IV. Panel data
- Poverty Dynamics in Europe (ECHP)

3
I. Contextual regression

Callens, M. (2004), Regression Modelling of
Cross-National Data, Brussels PSPC.

4
Contextual regression - 1

a nested data structure
- $j = 1, \dots, N$ countries
- $i = 1, \dots, n_j$ individuals
responses $y_{ij}$ depend on
- $x_{ij}$: individual-level explanatory variable
- $z_j$: country-level explanatory variable
the $y_{ij}$ are correlated within each country
ordinary multiple regression is not adequate here
- independence assumption for $y_{ij}$ is violated
- how to include the country-level explanatory variable $z_j$?

5
Contextual regression - 2

Key methodological issues
Small N
- technical problems (few degrees of freedom, ...)
- only simple models possible
Galton problem
- nations are not independent (cultural diffusion, ...)
Black box
- how to explain the impact of country-level variables?
Equivalence (see Harkness et al., 2003)
- how comparable are the data?

6
Contextual regression - 3

1. non-hierarchical models (e.g., separate regressions, pooled regression)
- ignore hierarchical data structure
2. classical contextual models (e.g., ANCOVA)
- acknowledge hierarchical data structure
- fixed intercepts $\alpha_j$ and slopes $\beta_j$
3. modern multilevel models (e.g., multilevel analysis)
- acknowledge hierarchical data structure
- random intercepts $\alpha_j$ and slopes $\beta_j$

7
Separate regressions - 1

N models (one for each country C): $y_{iC} = \alpha_C + \beta_C x_{iC} + \varepsilon_{iC}$
parameters to estimate
- for each country a specific intercept $\alpha_C$
- for each country a specific slope $\beta_C$
lack of parsimony: what if N is large?
how to introduce country-level covariates $z_j$?
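A minimal sketch of the separate-regressions approach, assuming toy data for two hypothetical countries. The helper `ols_fit` and all data values are illustrative and not taken from the presentation. Each country gets its own intercept and slope, so N countries require 2N regression parameters.

```python
from statistics import mean

def ols_fit(x, y):
    """Simple least-squares fit of y = a + b*x; returns (intercept, slope)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

# Hypothetical toy data: country code -> (x values, y values).
data = {
    "A": ([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]),   # y = 1 + 2x
    "B": ([0.0, 1.0, 2.0, 3.0], [4.0, 4.5, 5.0, 5.5]),   # y = 4 + 0.5x
}

# One model per country: N intercepts and N slopes.
separate = {country: ols_fit(x, y) for country, (x, y) in data.items()}
n_parameters = 2 * len(separate)
```

Nothing in these fits links the countries to each other, which is why a country-level covariate has no place to enter.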

8
Pooled regression - 1

one (global) model: $y_{ij} = \alpha + \beta x_{ij} + \varepsilon_{ij}$
parameters to estimate
- one (global) intercept $\alpha$
- one (global) slope $\beta$
country membership is ignored
how to introduce country-level covariates $z_j$?

9
Analysis of Covariance - 1

one (global) model: $y_{ij} = \alpha_j + \beta x_{ij} + \varepsilon_{ij}$
countries enter the model as N-1 dummies
parameters to estimate
- for each country a specific intercept $\alpha_j$
- one global slope $\beta$
all countries have the same slope: unrealistic
how to introduce country-level covariates $z_j$?
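A small sketch of the analysis-of-covariance fit, assuming hypothetical toy data. Instead of building the N-1 dummies explicitly, the common slope is estimated from within-country deviations, which yields the same least-squares estimate. The function name `ancova_fit` is an illustrative choice.

```python
from statistics import mean

def ancova_fit(groups):
    """Fit y_ij = alpha_j + beta * x_ij by least squares.

    groups maps a country code to (x values, y values). The common slope
    is estimated from within-country deviations, which gives the same
    estimate as entering the countries as dummy variables.
    """
    sxx = sxy = 0.0
    means = {}
    for country, (x, y) in groups.items():
        mx, my = mean(x), mean(y)
        means[country] = (mx, my)
        sxx += sum((xi - mx) ** 2 for xi in x)
        sxy += sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    beta = sxy / sxx
    # Country-specific intercepts follow from the country means.
    alphas = {c: my - beta * mx for c, (mx, my) in means.items()}
    return alphas, beta

# Hypothetical toy data: same slope (2), different intercepts (1 and 5).
groups = {
    "A": ([0.0, 1.0, 2.0], [1.0, 3.0, 5.0]),
    "B": ([1.0, 2.0, 3.0], [7.0, 9.0, 11.0]),
}
alphas, beta = ancova_fit(groups)
```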

10
Multilevel Regression - 1

one model: $y_{ij} = \alpha_j + \beta_j x_{ij} + \varepsilon_{ij}$
$\alpha_j$ and $\beta_j$ are random variables, following a normal distribution
- random intercept: $\alpha_j \sim N(\alpha, \sigma_0^2)$
- random slope: $\beta_j \sim N(\beta, \sigma_1^2)$
parameters to estimate
- one average intercept $\alpha$
- one average slope $\beta$
- intercept variance $\sigma_0^2$
- slope variance $\sigma_1^2$
- intercept-slope covariance $c_{01}$
inclusion of country-level covariates $z_j$ possible!

11
Multilevel Regression - 2

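- Empty model: $y_{ij} = \alpha_j \; (+ \varepsilon_{ij})$
  random intercept
- Random intercept model: $y_{ij} = \alpha_j + \beta x_{ij} \; (+ \varepsilon_{ij})$
  indiv. covar.
- Random slope model: $y_{ij} = \alpha_j + \beta_j x_{ij} \; (+ \varepsilon_{ij})$
  random slope
- Extended random models: $y_{ij} = \alpha_j + \beta_j x_{ij} + \gamma z_j + \delta x_{ij} z_j \; (+ \varepsilon_{ij})$
  country covar.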
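One way to read the extended random model is as a two-level specification in which the country covariate explains part of the variation in intercepts and slopes. The sketch below assumes country-level residuals $u_{0j}$ and $u_{1j}$ (notation introduced here, not in the slides), with variances $\sigma_0^2$ and $\sigma_1^2$ and covariance $c_{01}$. Here $\gamma$ is the coefficient of the country covariate and $\delta$ that of the cross-level interaction.

```latex
\begin{align*}
y_{ij}   &= \alpha_j + \beta_j x_{ij} + \varepsilon_{ij} \\
\alpha_j &= \alpha + \gamma z_j + u_{0j}, \qquad
\beta_j   = \beta + \delta z_j + u_{1j} \\
y_{ij}   &= \alpha + \beta x_{ij} + \gamma z_j + \delta x_{ij} z_j
            + u_{0j} + u_{1j} x_{ij} + \varepsilon_{ij}
\end{align*}
```

Substituting the second line into the first gives the third: the cross-level interaction $\delta x_{ij} z_j$ appears because the slope itself depends on $z_j$.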

12
Contextual regression summary

                Separate     Pooled   Ancova   Multilevel
models          countries    1        1        1
parms           very large   large    large    small
variances       no           no       no       yes
country vars?   no           1        1        yes
complex?        no           no       no       yes
13
II. Multilevel logistic regression

Callens, M. and C. Croux (to appear), Performance
of Likelihood-based Estimation Methods for
Multilevel Binary Regression Models, Journal of
Statistical Computation and Simulation.

14
Multilevel Logistic Regression

binary responses $y_{ij}$ depend on
- $x_{ij}$: individual-level explanatory variable
- $z_j$: group-level explanatory variable
a nested data structure: two levels
- level 2: N groups ($j = 1, \dots, N$)
- level 1: in each group $n_j$ individuals ($i = 1, \dots, n_j$)
- in each group the responses $y_{ij}$ are correlated
standard logistic regression model is not adequate
- independence assumption is violated here
- inclusion of multiple group-level covariates?

15
Models for

standard logistic regression model
random logistic regression models
- random intercepts
- random slopes
extended random logistic regression models
- cross-level interaction
- group-level explanatory variable
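The three model families can be written out as follows. This is a sketch in standard notation, assuming a logit link and normally distributed group effects $u_{0j}$ and $u_{1j}$; these symbols, and $\gamma$ and $\delta$ for the group-level covariate and the cross-level interaction, are chosen here for illustration.

```latex
\begin{align*}
\text{standard:} \quad
  & \operatorname{logit} \Pr(y_{ij} = 1) = \alpha + \beta x_{ij} \\
\text{random intercepts:} \quad
  & \operatorname{logit} \Pr(y_{ij} = 1 \mid u_{0j}) = \alpha + u_{0j} + \beta x_{ij} \\
\text{random slopes:} \quad
  & \operatorname{logit} \Pr(y_{ij} = 1 \mid u_{0j}, u_{1j})
    = \alpha + u_{0j} + (\beta + u_{1j}) x_{ij} \\
\text{extended:} \quad
  & \operatorname{logit} \Pr(y_{ij} = 1 \mid u_{0j}, u_{1j})
    = \alpha + u_{0j} + (\beta + u_{1j}) x_{ij} + \gamma z_j + \delta x_{ij} z_j
\end{align*}
```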
16
multilevel discrete-time hazard models for event data

binary responses $y_{ijt}$ are event data
- $y_{ijt} = 1$: event occurrence for individual i in group j at time t
- $y_{ijt} = 0$: event non-occurrence for individual i in group j at time t
  (a third birth in 1995? in 1996? ...)
event data analysis may be complicated by censoring
- e.g., for some i, an event may occur after the survey
- hazard models can take censoring into account
recurrent events for i are possible (e.g., becoming poor)

17
maximum likelihood estimation
- via equivalent model (Allison, 1982)
- requires data in person-period format

$p_{it} = \Pr(y_{it} = 1)$, where $y_{it}$ is the indicator for event occurrence in time period t
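The person-period format can be sketched as follows, assuming each person is recorded as a hypothetical tuple of identifier, last observed period, and whether the event occurred in that period. The function name and record layout are illustrative choices, not the authors' data structure.

```python
def to_person_period(records):
    """Expand one row per person into one row per person and period.

    records: iterable of (person_id, last_period, event), where event is
    True if the event occurred in last_period and False if the person
    was censored after last_period.
    Returns a list of (person_id, period, y) rows with y = 1 only in the
    period in which the event occurs.
    """
    rows = []
    for person_id, last_period, event in records:
        for t in range(1, last_period + 1):
            y = 1 if (event and t == last_period) else 0
            rows.append((person_id, t, y))
    return rows

# Hypothetical example: person 1 has the event in period 3,
# person 2 is censored after period 2.
rows = to_person_period([(1, 3, True), (2, 2, False)])
```

An ordinary logistic regression of y on period indicators and covariates, fitted to rows of this kind, then estimates the discrete-time hazard model. Censored persons simply contribute rows with y = 0 only.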
18
performance of estimation methods

performance of generally available estimation methods?
maximum likelihood estimation
- step 1: compute the likelihood L
- step 2: maximise L with respect to the model parameters
difficulty: the likelihood contains an intractable integral

random effects u: Normal distribution
responses $y_{ij}$: Bernoulli distribution
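The intractable integral arises because the Bernoulli likelihood of each group has to be averaged over the normal distribution of its random effect. A minimal sketch of how quadrature replaces that integral by a weighted sum, assuming a random-intercept logistic model and a fixed three-point Gauss-Hermite rule; the function names and data are hypothetical, and practical software uses more nodes.

```python
import math

# Three-point Gauss-Hermite rule rescaled to a standard normal density:
# nodes 0 and +/- sqrt(3), weights 2/3 and 1/6.
NODES = (-math.sqrt(3.0), 0.0, math.sqrt(3.0))
WEIGHTS = (1.0 / 6.0, 2.0 / 3.0, 1.0 / 6.0)

def inv_logit(eta):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-eta))

def group_likelihood(y, x, alpha, beta, sigma):
    """Approximate the marginal likelihood of one group.

    Assumed model: logit Pr(y_ij = 1 | u_j) = alpha + beta * x_ij + u_j,
    with u_j ~ N(0, sigma^2). The integral over u_j is replaced by a
    weighted sum over fixed quadrature nodes (non-adaptive version).
    """
    total = 0.0
    for node, weight in zip(NODES, WEIGHTS):
        u = sigma * node
        lik = 1.0
        for yi, xi in zip(y, x):
            p = inv_logit(alpha + beta * xi + u)
            lik *= p if yi == 1 else 1.0 - p
        total += weight * lik
    return total

# Hypothetical group of three individuals.
value = group_likelihood(y=[1, 0, 1], x=[0.5, -1.0, 2.0],
                         alpha=-0.2, beta=0.8, sigma=1.0)
```

The adaptive variant recentres and rescales the nodes for each group around the mode of its integrand, which costs more computation per group.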
19

estimation → three methods
- penalised quasi-likelihood
- non-adaptive Gaussian quadrature
- adaptive Gaussian quadrature
how good? → four performance indicators
- numerical convergence
- bias
- mean-squared error (MSE)
- computational efficiency
simulation setting → fractional factorial experiment
- full experiment would take 6 months
- fractional experiment only 1 month
Key findings: penalised quasi-likelihood performs (surprisingly) well
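Two of the performance indicators, bias and mean-squared error, can be computed from repeated simulation runs as sketched below. The estimates and the true value are hypothetical numbers, not results from the study.

```python
from statistics import mean

def bias_and_mse(estimates, true_value):
    """Return (bias, mean-squared error) of estimates of true_value."""
    errors = [est - true_value for est in estimates]
    bias = mean(errors)
    mse = mean(e ** 2 for e in errors)
    return bias, mse

# Hypothetical estimates of a parameter whose true value is 1.0,
# one estimate per simulated data set.
estimates = [0.9, 1.1, 0.8, 1.0]
bias, mse = bias_and_mse(estimates, true_value=1.0)
```

The MSE combines both sources of error, since it equals the squared bias plus the variance of the estimates around their own mean.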

20
III. Application for retrospective data

Callens, M. and C. Croux (to appear), The Impact of Education on Third Births. A Multilevel Discrete-Time Hazard Analysis, Journal of Applied Statistics.

21
III. Application for Retrospective Data
The Impact of Education on Third Births. A Multilevel Discrete-Time Hazard Analysis
Callens, M. and C. Croux, Journal of Applied Statistics (to appear)
22-30
(No Transcript)
31

key findings for multilevel models
- negative effect for education
- positive effect for Nordic countries

32
IV. Application for panel data
Comparative Poverty Dynamics. A Multilevel Discrete-Time Recurrent Hazard Analysis
Callens, M. and C. Croux (preprint)
33
poverty: theoretical perspectives

general poverty theory? (McKernan, 2002)
individual and structural perspectives (Iceland, 2003)
individual theories: the poor create their poverty
- life cycle hypothesis (e.g., Rowntree, 1901)
  life cycle events (e.g., a divorce)
- human capital theory (Becker, 1975)
  education, age, gender, ...

34
poverty: theoretical perspectives

structural theories: economic and social policy systems
- skills mismatch hypothesis (Kasarda, 1990)
  - deindustrialisation
  - technological change
- welfare state regimes (Esping-Andersen, 1999)
  - level and design of welfare benefits
  - relative role of state and market
  - four types: social-democratic, conservative, liberal, southern

35
individual hypotheses for poverty entry

h1: demographic and labour market events
h2: for women, the effect of demographic events is stronger

event          effect
marriage       -
divorce        +
employment     -
unemployment   +

event          effect
marriage       - -
divorce        + +
36
structural hypothesis for poverty entry

ranking of welfare regimes by dominant income changes
h3: non-earned income changes dominate
h4: earned income changes dominate
(4 = most dynamic)

                     dominant income changes
regime               non-earned   earned
southern             4            1
liberal              3            2
conservative         2            3
social-democratic    1            4
37
structural hypothesis for poverty entry

h5: skills mismatch

mismatch effect
deindustrialisation
technological change
38
two longitudinal EU databases (linked at NUTS 1 level)

individual level: panel data
- European Community Household Panel (ECHP)
- yearly individual and household panel (1994-98)
- income, employment, housing, healthcare, ...
regional level: time series
- REGIO database, New Cronos
- regional time series
- demography, unemployment, education, ...

39
poverty status in a year

compare income with relative poverty line

country-specific poverty thresholds
equivalised individual income
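A rough sketch of how a yearly poverty status can be derived. The slides do not name the equivalence scale or the threshold, so both are assumptions here: the modified OECD scale, and a line at 60% of the country's median equivalised income. All function names and numbers are hypothetical.

```python
from statistics import median

def equivalised_income(household_income, n_adults, n_children):
    """Household income divided by an equivalence scale.

    Assumption: modified OECD scale (1.0 for the first adult, 0.5 for each
    further adult, 0.3 for each child); the slides do not name the scale.
    """
    scale = 1.0 + 0.5 * (n_adults - 1) + 0.3 * n_children
    return household_income / scale

def poverty_status(incomes_by_country, share=0.6):
    """Flag incomes below a country-specific relative poverty line.

    Assumption: the line is `share` times the country's median
    equivalised income (60% here, chosen for illustration).
    """
    status = {}
    for country, incomes in incomes_by_country.items():
        line = share * median(incomes)
        status[country] = [income < line for income in incomes]
    return status

# Hypothetical equivalised incomes for two countries.
status = poverty_status({"X": [5.0, 10.0, 20.0], "Y": [50.0, 100.0, 200.0]})
```

Because the line is computed per country, the same income can count as poor in one country and not in another.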
40
discrete-time hazard models

binary responses $y_{1ijt}$ are discrete-time event data
- $y_{1ijt} = 1$: poverty entry for individual i in region j in year t
- $y_{1ijt} = 0$: no poverty entry for individual i in region j in year t
  (poverty entry in 1995? in 1996? ...)
event data analysis may be complicated by censoring
- e.g., for some i, an event may occur after the survey
- hazard models can take censoring into account

41
discrete-time hazard
proportional odds model
estimation via equivalent logistic regression model (Allison, 1982)

$T_{ij}$ is the time of occurrence of event $y_{1ijt}$ for individual i belonging to region j
the hazard is conditional on being at risk
$\alpha_t$: a set of t intercepts, one for each discrete-time unit
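A sketch of the discrete-time hazard and its proportional odds form, assuming the usual logit link. The symbol $h_{ijt}$ for the hazard and the coefficient $\beta$ on the covariates $x_{ijt}$ are notation introduced here for illustration.

```latex
\begin{align*}
h_{ijt} &= \Pr(T_{ij} = t \mid T_{ij} \ge t) \\
\operatorname{logit}(h_{ijt}) &= \log \frac{h_{ijt}}{1 - h_{ijt}}
  = \alpha_t + \beta x_{ijt}
\end{align*}
```

The conditioning on $T_{ij} \ge t$ expresses being at risk, and the intercepts $\alpha_t$ form the baseline hazard on the logit scale.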
42
multilevel discrete-time recurrent hazard analysis

extension 1: poverty entry is a recurrent event
- here, an individual may experience two poverty entries in a row
- so, we simultaneously model k = 2 discrete-time hazards
extension 2: dependent individuals → multilevel model

Extension 2: dependency of individuals in a region is modelled by random intercepts
Extension 1: we allow for event-specificity for the baseline hazard, k = 1, 2
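Both extensions can be combined in one equation. The following is a sketch under stated assumptions: $h_{kijt}$ denotes the hazard of the k-th poverty entry, $\alpha_{kt}$ the event-specific baseline, and $u_j$ a regional random intercept. Sharing the covariate effects $\beta$ and $\gamma$ across the two events is an assumption made here for brevity, not something the slides state.

```latex
\begin{align*}
\operatorname{logit}(h_{kijt}) &= \alpha_{kt} + \beta x_{ijt} + \gamma z_{jt} + u_j,
  \qquad k = 1, 2 \\
u_j &\sim N(0, \sigma_u^2)
\end{align*}
```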
43
explanatory variables at individual level i

$x_{ijt}$: demographic events (changes in a year)
- marriage: from never married to married
- divorce: from married to divorced/separated
$x_{ijt}$: labour market events (changes in a year)
- employment: from unemployed to employed
- unemployment: from employed to unemployed
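Event indicators of this kind can be derived from yearly status histories by comparing each year with the previous one. A minimal sketch, assuming a hypothetical list of one person's civil status in consecutive panel years; the function name and status labels are illustrative.

```python
def yearly_events(statuses, origin, destination):
    """Return one 0/1 indicator per year-to-year transition.

    statuses: list of a person's status in consecutive years.
    The indicator is 1 when the status changes from `origin` in one year
    to `destination` in the next, and 0 otherwise.
    """
    return [
        1 if (prev == origin and curr == destination) else 0
        for prev, curr in zip(statuses, statuses[1:])
    ]

# Hypothetical civil-status history for one person, 1994-1998.
civil = ["never married", "never married", "married", "married",
         "divorced/separated"]
marriage = yearly_events(civil, "never married", "married")
divorce = yearly_events(civil, "married", "divorced/separated")
```

Five yearly observations give four transitions, so the first panel year serves only as the starting status.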

44
control variables at individual level i

$x_{ij}$: current status
- education: <sec, sec, third, at school
$x_{ijt}$
- age: 16-40, 41-50, 51-60, 60+
- civil status: married, sep./div., wid., unmarried
- cohabiting status: no, yes
- activity status: inactive, unemployed, working (15+)
- health status: bad, good
- household type: single, single+child, couple+child, oth.
- duration: -4, -3, -2, -1, 0, 1

45
explanatory variables at regional level j

$z_j$
- welfare regime
  southern: es, it, gr, pt
  liberal: uk, ie
  conservative: be, fr, ge
  social-democratic: dk (ref)
- deindustrialisation
  employment in the service sector (relative to dk)
$z_{jt}$
- technological change
  employment in the R&D business sector (relative to dk)

46
control variables at regional level j

$z_j$
- employment rate: working in total pop. (rel. to dk)
$z_{jt}$
- unemployment rate: unemployed in active pop. (rel. to dk)
- relative GDP: of EU-15 average (rel. to dk)
- GDP growth: log differences

47
poverty entry: results at individual level

OR: odds ratio; p: significance level (p < 0.05, p < 0.01, p < 0.001)

                women                men
event           effect   OR     p    effect   OR     p
marriage        (-)      0.75        (+)      1.30
divorce                  5.28        (+)      1.27
employment               1.57        (+)      1.20
unemployment             1.34                 1.77
48

key results for individual level

individual level: strong impact
for women
- demographic events > labour market events
for men
- labour market events > demographic events
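To get a feel for the size of these effects, an odds ratio can be translated into a change in the yearly hazard. In the sketch below, the 5% baseline hazard is a hypothetical value chosen for illustration; 5.28 is the odds ratio reported for divorce among women.

```python
def apply_odds_ratio(baseline_probability, odds_ratio):
    """Probability obtained by multiplying the baseline odds by odds_ratio."""
    odds = baseline_probability / (1.0 - baseline_probability)
    new_odds = odds * odds_ratio
    return new_odds / (1.0 + new_odds)

# Hypothetical baseline: a 5% yearly hazard of poverty entry.
# 5.28 is the odds ratio reported for divorce among women.
p_divorce = apply_odds_ratio(0.05, 5.28)
```

Under this assumed baseline the hazard rises from 0.05 to roughly 0.22. An odds ratio does not multiply the probability itself, so the implied change depends on the baseline chosen.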

49
poverty entry: results at regional level

Welfare regimes
OR: odds ratio; p: significance level (p < 0.05, p < 0.01, p < 0.001)

                     women                men
regime               effect   OR     p    effect   OR     p
southern             -        0.61        -        0.55
liberal              (+)      1.12        (-)      0.85
conservative         --       0.59        --       0.53
social-democratic    (ref)    1           (ref)    1
50
poverty entry: results at regional level

skills mismatch
OR: odds ratio; p: significance level (p < 0.05, p < 0.01, p < 0.001)

                        women                men
skills mismatch         effect   OR     p    effect   OR     p
deindustrialisation     (+)      1.00        (+)      1.01
technological change    (+)      1.02        (+)      1.11
51
key results for regional level