Title: Modelling Longitudinal Data
1. Modelling Longitudinal Data
- Survival Analysis.
- Event History.
- Recurrent Events.
- A Final Point and link to Multilevel Models (perhaps).
2. The model at time 1
$$Y_{i1} = b'X_{i1} + e_{i1}$$
- $Y_{i1}$: outcome 1 for individual $i$
- $b'X_{i1}$: vector of explanatory variables and estimates
- $e_{i1}$: independent, identically distributed error
3. THE SAME AGAIN AT TIME 2
$$Y_{i2} = b'X_{i2} + e_{i2}$$
- $Y_{i2}$: outcome 2 for individual $i$
- $b'X_{i2}$: vector of explanatory variables and estimates
- $e_{i2}$: independent, identically distributed error
4. Considered together, conventional regression analysis is NOT appropriate
$$Y_{i1} = b'X_{i1} + e_{i1}$$
$$Y_{i2} = b'X_{i2} + e_{i2}$$
5. Change in Score
$$Y_{i2} - Y_{i1} = b'(X_{i2} - X_{i1}) + (e_{i2} - e_{i1})$$
Here $b$ is simply the coefficient from a regression on the
difference or change in scores.
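As a rough illustration of the change-score idea, the sketch below fits the regression of the change in Y on the change in a single X, with no intercept, as in the equation above. The data and the function name `change_score_slope` are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def change_score_slope(x1, x2, y1, y2):
    """Least-squares b in (Y2 - Y1) = b * (X2 - X1) + error, with no intercept."""
    dx = [after - before for before, after in zip(x1, x2)]
    dy = [after - before for before, after in zip(y1, y2)]
    return sum(a * b for a, b in zip(dx, dy)) / sum(a * a for a in dx)

# Hypothetical data: one entry per individual, measured at two occasions.
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [2.0, 4.0, 3.5, 6.0]
y1 = [3.0, 5.0, 7.0, 9.0]
y2 = [5.0, 9.0, 8.0, 13.0]

b = change_score_slope(x1, x2, y1, y2)
```

Anything that is constant for an individual across the two occasions drops out of the differences, which is why only the change in X appears on the right-hand side.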
6. As social scientists we are often substantively
interested in whether a specific event has
occurred.
7. Survival Data: Time to an Event
- In the medical area
  - Duration from treatment to death.
  - Time to return of pain after taking a painkiller.
8. Survival Data: Time to an Event
- Social Sciences
  - Duration of unemployment.
  - Duration of time on a training scheme.
  - Duration of housing tenure.
  - Duration of marriage.
  - Time to conception.
9. Consider a binary outcome or two-state event
- 0 = Event has not occurred
- 1 = Event has occurred
10. [Figure: timeline running from Start of Study to End of Study, marked at t1, t2 and t3, with the outcome coded 0 or 1.]
11. These durations are a continuous Y, so why can't
we use standard regression techniques?
12. [Figure: timelines from Start of Study to End of Study with outcomes coded 0 or 1; some cases are CENSORED OBSERVATIONS.]
13. [Figure: timelines for person A and person B from Start of Study to End of Study, highlighting CENSORED OBSERVATIONS.]
14. These durations are a continuous Y, so why can't
we use standard regression techniques?
- What should be the value of Y for person A and
person B at the end of our study (when we fit the
model)?
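To make the difficulty concrete, the sketch below stores each person's observed duration together with a flag for whether the event was seen, and then tries the two obvious ways of forcing the data into a standard regression framework. The records and identifiers are hypothetical.

```python
# Hypothetical records: (person, months observed, event seen before the study ended?)
records = [
    ("p1", 12, False),  # still unemployed when the study ended: censored
    ("p2", 12, False),  # censored as well
    ("p3", 3, True),
    ("p4", 7, True),
]

def mean(values):
    return sum(values) / len(values)

# Option 1: treat the censoring time as if it were the completed duration.
naive_all = mean([months for _, months, _ in records])

# Option 2: drop the censored people altogether.
naive_events_only = mean([months for _, months, seen in records if seen])
```

Both summaries understate the typical duration, because the censored people were unemployed for at least 12 months and possibly much longer. Keeping the duration and the event flag as a pair is what lets a survival model use the partial information that censored cases carry.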
15. Cox Regression
- is a method for modelling time-to-event data in
the presence of censored cases.
- Explanatory variables in your model (continuous
and categorical).
- Estimated coefficients for each of the
covariates.
- Handles the censored cases correctly.
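One way to see how Cox regression handles censored cases is to look at the partial log-likelihood it maximises. The sketch below is a minimal version for a single covariate, assuming no tied event times; the function name and the toy data are hypothetical, and real software uses more careful numerical methods.

```python
import math

def cox_partial_loglik(beta, durations, observed, x):
    """Cox partial log-likelihood for one covariate (assumes no tied event times)."""
    total = 0.0
    for i, (t_i, seen) in enumerate(zip(durations, observed)):
        if not seen:
            continue  # a censored case adds no term of its own
        # Everyone still under observation at time t_i, censored or not.
        risk_set = [j for j, t_j in enumerate(durations) if t_j >= t_i]
        log_denominator = math.log(sum(math.exp(beta * x[j]) for j in risk_set))
        total += beta * x[i] - log_denominator
    return total

# Hypothetical data: months unemployed, whether a return to work was seen, one covariate.
durations = [2, 4, 5, 7]
observed = [1, 0, 1, 1]
x = [1.0, 0.0, 1.0, 0.0]

loglik_at_zero = cox_partial_loglik(0.0, durations, observed, x)
```

The censored person (4 months) never contributes an event term, but does appear in the risk set for the event at month 2. That is the sense in which censored cases are used rather than discarded.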
16. UNEMPLOYMENT AND RETURNING TO WORK STUDY
0 = Unemployed, 1 = Returned to work
[Figure: timelines from Start of Study to End of Study with outcomes coded 0 or 1; some cases are CENSORED OBSERVATIONS.]
17. A Statistical Model
Y variable: duration with censored observations
Explanatory variables: X1, X2, X3
18. A Statistical Model
Y variable: duration with censored observations
Explanatory variables:
- Previous Occupation
- Educational Qualifications
- Length of Work experience (a continuous covariate)
19. More complex event history analysis
20. UNEMPLOYMENT AND RETURNING TO WORK STUDY
0 = Unemployed, 1 = Returned to work
[Figure: timeline from Start of Study to End of Study, marked at t1, t2 and t3, with the outcome coded 0 or 1.]
21. UNEMPLOYMENT AND RETURNING TO WORK STUDY
0 = Unemployed, 1 = Returned to work
Spell or Episode
[Figure: timeline from Start of Study to End of Study showing a spell in state 0 up to t1.]
22. UNEMPLOYMENT AND RETURNING TO WORK STUDY
0 = Unemployed, 1 = Returned to work
Transition: movement from one state to another
[Figure: timeline from Start of Study to End of Study showing a change between states 0 and 1 at t1.]
23. Recurrent Events Analysis
24. The structure of many large-scale studies results
in survey data being collected at a number of
discrete occasions. In this situation, time is
better conceptualized as a sequence of discrete
occasions than as a continuum.
Furthermore, social scientists are often
substantively interested in whether a specific
event has occurred. Taken together, these two
considerations favour the adoption of a
discrete-time, or event history, approach.
25. Recurrent events are merely outcomes that can
take place on a number of occasions. A simple
example is unemployment measured month by month.
In any given month an individual can either be
employed or unemployed. If we had data for a
calendar year we would have twelve discrete
outcome measures (i.e. one for each month).
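The twelve monthly outcomes are usually rearranged into one row per person per month before modelling. The sketch below shows one way to do that; the function name, field names and sequences are hypothetical.

```python
def to_person_period(sequences):
    """Expand {person: [y_1, ..., y_T]} into one row per person per month."""
    rows = []
    for person, outcomes in sequences.items():
        for month, y in enumerate(outcomes, start=1):
            rows.append({"person": person, "month": month, "unemployed": y})
    return rows

# Hypothetical calendar-year records: 1 = unemployed in that month, 0 = employed.
sequences = {
    "p1": [0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    "p2": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
}
rows = to_person_period(sequences)
```

Each person contributes twelve rows, so month-specific explanatory variables can sit alongside the outcome they belong to.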
26. Social scientists now routinely employ
statistical models for the analysis of discrete
data, most notably logistic and log-linear
models, in a wide variety of substantive areas. I
believe that the adoption of a recurrent events
approach is appealing because it is a logical
extension of these models.
27. Willett and Singer (1995) conclude that
discrete-time methods are generally considered to
be simpler and more comprehensible, and that
mastery of discrete-time methods facilitates a
transition to continuous-time approaches should
that be required.
Willett, J. and Singer, J.
(1995) Investigating Onset, Cessation, Relapse,
and Recovery: Using Discrete-Time Survival
Analysis to Examine the Occurrence and Timing of
Critical Events. In J. Gottman (ed.) The Analysis
of Change (Hove: Lawrence Erlbaum Associates).
28. STATISTICAL ANALYSIS FOR BINARY RECURRENT EVENTS (SABRE)
- Fits appropriate models for recurrent events.
- It is like GLIM.
- It can be downloaded free.
29. www.cas.lancs.ac.uk/software
30. Consider a binary outcome or two-state event
- 0 = Event has not occurred
- 1 = Event has occurred
- In the cross-sectional situation we are used to
modelling this with logistic regression.
31. UNEMPLOYMENT AND RETURNING TO WORK STUDY: a
study for six months
0 = Unemployed, 1 = Returned to work
32. Constantly unemployed
Months  1 2 3 4 5 6
obs     0 0 0 0 0 0
33. Constantly employed
Months  1 2 3 4 5 6
obs     1 1 1 1 1 1
34. Employed in month 1 then unemployed
Months  1 2 3 4 5 6
obs     1 0 0 0 0 0
35. Unemployed but gets a job in month six
Months  1 2 3 4 5 6
obs     0 0 0 0 0 1
36. Mixed employment patterns
Months  1 2 3 4 5 6
obs     0 1 0 1 1 0
obs     0 0 1 0 1 1
obs     0 1 1 0 0 1
obs     1 0 0 0 1 0
37. Here we have a binary outcome, so could we
simply use logistic regression to model it?
Months  1 2 3 4 5 6
obs     0 0 0 0 0 0
38. Yes and No!
39. SABRE fits two models that are appropriate to
this analysis.
- Model 1: Pooled Cross-Sectional Logit Model
40. POOLED CROSS-SECTIONAL LOGIT MODEL
$x_{it}$ is a vector of explanatory variables and $b$
is a vector of parameter estimates.
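A standard way of writing the likelihood contribution for a pooled cross-sectional logit model is given below. It is offered as an assumed textbook form using the slide's symbols, with $y_{it}$ introduced here for the 0/1 outcome of case $i$ at occasion $t$; it is not necessarily the exact expression used in the original presentation.

```latex
L_{it}(b) = \frac{\left[\exp(b'x_{it})\right]^{y_{it}}}{1 + \exp(b'x_{it})}
```

When $y_{it} = 1$ this is the logistic probability of the event, and when $y_{it} = 0$ it is one minus that probability. Under the pooled model the full likelihood is simply the product of these terms over all cases and occasions.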
41. POOLED CROSS-SECTIONAL LOGIT MODEL
In conventional logistic regression models, where
each observation is assumed to be independent, a
logistic link function is used. The contribution
to the likelihood by the $i$th case and the $t$th
event is given by the equation above.
42. This approach can be regarded as a naïve solution
to our data analysis problem.
43. We need to consider a number of issues.
44. Months  Y1 Y2
obs     0  0
Pickles' tip: in repeated measures analysis we
would require something like a paired t-test
rather than an independent t-test, because we
can assume that Y1 and Y2 are related.
45. SABRE fits two models that are appropriate to
this analysis.
- Model 2: Random Effects Model
  (or logistic mixture model)
46.
- Repeated measures data violate an important
assumption of conventional regression models.
- The responses of an individual at different
points in time will not be independent of each
other.
- This problem has been overcome by the inclusion
of an additional, individual-specific error term.
48.
- The random effects model extends the pooled
cross-sectional model to include a case-specific
random error term to account for residual
heterogeneity.
- For a sequence of outcomes for the $i$th case, the
basic random effects model has the integrated (or
marginal) likelihood given by the equation.
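The idea of an integrated likelihood can be sketched numerically: compute the likelihood of a person's whole sequence for a given value of the case-specific error, then average over the distribution of that error. The code below assumes a normally distributed random effect and uses a simple trapezoidal rule. These choices, and all the names, are illustrative assumptions rather than a description of how SABRE performs the integration.

```python
import math

def sequence_likelihood(y, eta, eps):
    """Likelihood of one person's 0/1 sequence, given the random effect eps."""
    like = 1.0
    for y_t, eta_t in zip(y, eta):
        p = 1.0 / (1.0 + math.exp(-(eta_t + eps)))
        like *= p if y_t == 1 else 1.0 - p
    return like

def marginal_likelihood(y, eta, sigma, n_points=2001, width=8.0):
    """Integrate eps ~ Normal(0, sigma^2) out over +/- width * sigma (trapezoidal rule)."""
    lo = -width * sigma
    step = 2.0 * width * sigma / (n_points - 1)
    total = 0.0
    for k in range(n_points):
        eps = lo + k * step
        density = math.exp(-0.5 * (eps / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
        weight = 0.5 if k in (0, n_points - 1) else 1.0
        total += weight * sequence_likelihood(y, eta, eps) * density * step
    return total

# Hypothetical case: six months, linear predictor b'x_it equal to zero throughout.
always_unemployed = [0, 0, 0, 0, 0, 0]
eta = [0.0] * 6
pooled = sequence_likelihood(always_unemployed, eta, 0.0)
with_heterogeneity = marginal_likelihood(always_unemployed, eta, sigma=2.0)
```

With a sizeable random-effect variance, a sequence with no change at all becomes more likely than the pooled model would suggest, which is the kind of pattern residual heterogeneity produces.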
49. Davies and Pickles (1985) have demonstrated that
the failure to explicitly model the effects of
residual heterogeneity may cause severe bias in
parameter estimates. Using longitudinal data, the
effects of omitted explanatory variables can be
explicitly accounted for within the statistical
model. This greatly improves the accuracy of the
estimated effects of the explanatory variables.
51. Movers and Stayers
- When considering data on recurrent events there
will be individuals for whom there will be zero
(or very low) probabilities of change in outcome
from one event to the next. These individuals are
termed stayers.
52. This person is a stayer!
Months  1 2 3 4 5 6
obs     0 0 0 0 0 0
53. So is this person.
Months  1 2 3 4 5 6
obs     1 1 1 1 1 1
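A first descriptive check is to flag the sequences that never change state. The sketch below does this for some hypothetical six-month records. Note that an unchanged observed sequence only makes someone a candidate stayer, since a mover with a low probability of change can also produce one.

```python
def never_changes(sequence):
    """True if the observed outcome is identical at every occasion."""
    return len(set(sequence)) == 1

# Hypothetical six-month records: 0 = unemployed, 1 = employed.
sequences = {
    "p1": [0, 0, 0, 0, 0, 0],
    "p2": [1, 1, 1, 1, 1, 1],
    "p3": [0, 1, 0, 1, 1, 0],
}
candidate_stayers = [person for person, seq in sequences.items() if never_changes(seq)]
```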
54.
- An awareness of the issue of stayers is
important for technical reasons. A limitation of
a parametric modelling approach is that the tail
behaviour of the normal distribution is
inconsistent with stayers, so their number will tend
to be underestimated (see Spilerman 1972).
- Spilerman, S. (1972) Extensions of the
Mover-Stayer Model, American Journal of
Sociology, 78, pp. 599-626.
55.
- Recurrent events may be analysed using other
software, but SABRE is specifically designed to
handle stayers and this feature increases SABRE's
flexibility in representing residual
heterogeneity (Barry, Francis, Davies, and Stott
1998).
- Barry, J., Francis, B., Davies, R.B. and Stott, D.
(1998) SABRE Users Guide
- http://www.cas.lancs.ac.uk/software/sabre3.1/sabreuse.html
56. STATE DEPENDENCE
Past Behaviour → Current Behaviour
57. Different Probabilities of Employment: Young People Aged 19
[Figure: status in APRIL (Employed or Unemployed) leading to being Employed in MAY.]
58. This is called a MARKOV model
- A Markov model helps to control for a previous
outcome (or behaviour).
59. ACCOUNTS FOR PREVIOUS OUTCOME ($y_{t-1}$)
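Before fitting a model that conditions on the previous outcome, it can help to tabulate how the current outcome depends on it. The sketch below estimates the probability of being in state 1 given the previous month's state by simple counting. The function name is hypothetical, and the sequences reuse the mixed employment patterns shown earlier purely as example data.

```python
def transition_probabilities(sequences):
    """Estimate P(y_t = 1 | y_{t-1}) from observed 0/1 sequences by counting."""
    counts = {0: [0, 0], 1: [0, 0]}  # counts[previous][current]
    for seq in sequences:
        for previous, current in zip(seq, seq[1:]):
            counts[previous][current] += 1
    return {
        previous: (row[1] / sum(row) if sum(row) else None)
        for previous, row in counts.items()
    }

sequences = [
    [0, 1, 0, 1, 1, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 1, 1, 0, 0, 1],
    [1, 0, 0, 0, 1, 0],
]
probs = transition_probabilities(sequences)
```

If the two estimated probabilities differ markedly, the previous outcome carries information about the current one. Including $y_{t-1}$ as an explanatory variable is one way to account for that.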
60. The Model Provides TWO sets of estimates
[Figure: APRIL to MAY. One set of explanatory variable estimates for those Unemployed in April and another for those Employed in April, with the outcome Employed in May.]
61. This is a two-state MARKOV model
- But we can make it more complicated.
62. First Order Markov Model
Months  Y1 Y2
obs     0  0
63. Second Order Markov Model
Months  Y1 Y2 Y3
obs     0  0  0
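The difference between the two orders comes down to how many previous outcomes are attached to each current outcome. The sketch below builds those lagged rows for a single sequence; the function name and field names are hypothetical.

```python
def lagged_rows(sequence, order):
    """Pair each outcome with its `order` previous outcomes, most recent first."""
    rows = []
    for t in range(order, len(sequence)):
        lags = tuple(sequence[t - k] for k in range(1, order + 1))
        rows.append({"month": t + 1, "y": sequence[t], "lags": lags})
    return rows

sequence = [0, 1, 0, 1, 1, 0]
first_order = lagged_rows(sequence, order=1)   # conditions on y at t-1
second_order = lagged_rows(sequence, order=2)  # conditions on y at t-1 and t-2
```

Each extra order costs one usable observation per person, because the earliest months have no complete history to condition on.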
64. FINAL POINT: A THOUGHT!
65. Mixed employment patterns
Months  1 2 3 4 5 6
obs     0 1 0 1 1 0
obs     0 0 1 0 1 1
obs     0 1 1 0 0 1
obs     1 0 0 0 1 0
66. Hierarchical or Multilevel Data Structure
Individuals: a b c d e f g
Observations (Months) nested within individuals: 1 2 3 4 | 1 2 | 1 2 3 | 1 2 3 | 1 2 | 1 2 3 | 1 2
67. Is the recurrent events model simply a multilevel
model fitted at the single level?
- A controversial point!
- More later...