Modelling Longitudinal Data

About This Presentation

Title:

Modelling Longitudinal Data

Slides: 68

Provided by: vg1

Transcript and Presenter's Notes

Title: Modelling Longitudinal Data

1
Modelling Longitudinal Data

Survival Analysis.
Event History.
Recurrent Events.
A Final Point and link to Multilevel Models
(perhaps).

2
$Y_{i1} = b X_{i1} + e_{i1}$
$Y_{i1}$: Outcome 1 for individual i
$b X_{i1}$: Vector of explanatory variables and estimates
$e_{i1}$: Independent and identically distributed error
3
THE SAME AGAIN AT TIME 2
$Y_{i2} = b X_{i2} + e_{i2}$
$Y_{i2}$: Outcome 2 for individual i
$b X_{i2}$: Vector of explanatory variables and estimates
$e_{i2}$: Independent and identically distributed error
4
Considered together, conventional regression
analysis is NOT appropriate
$Y_{i1} = b X_{i1} + e_{i1}$
$Y_{i2} = b X_{i2} + e_{i2}$
5
Change in Score
$Y_{i2} - Y_{i1} = b (X_{i2} - X_{i1}) + (e_{i2} - e_{i1})$
Here the b is simply a regression on the
difference or change in scores.
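
A minimal sketch of the change-score regression, assuming a single scalar explanatory variable and made-up data. The helper `slope_through_origin`, the values, and the person-specific term `u` are all illustrative assumptions rather than anything shown on the slides; `u` is included only to show that a term which is constant over time drops out when the two equations are differenced.

```python
# Hypothetical data: one X and one Y per person at time 1 and time 2.
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [2.0, 4.0, 3.5, 6.0]
u = [5.0, -3.0, 0.5, 2.0]   # assumed time-constant individual effect
b_true = 2.0                # assumed "true" coefficient used to build Y

y1 = [b_true * x + ui for x, ui in zip(x1, u)]
y2 = [b_true * x + ui for x, ui in zip(x2, u)]

def slope_through_origin(dx, dy):
    """Least-squares slope for dy = b * dx (no intercept, as in the slide)."""
    return sum(a * c for a, c in zip(dx, dy)) / sum(a * a for a in dx)

# Change scores: the individual effect u cancels in Y2 - Y1.
dx = [later - earlier for earlier, later in zip(x1, x2)]
dy = [later - earlier for earlier, later in zip(y1, y2)]

b_hat = slope_through_origin(dx, dy)
print(b_hat)
```

Because the example contains no random error, the estimate recovers the assumed coefficient; with real data the differenced errors would add noise.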
6
As social scientists we are often substantively
interested in whether a specific event has
occurred.
7
Survival Data: Time to an event

In the medical area
Duration from treatment to death.
Time to return of pain after taking a pain
killer.

8
Survival Data: Time to an event

Social Sciences
Duration of unemployment.
Duration of time on a training scheme.
Duration of housing tenure.
Duration of marriage.
Time to conception.

9
Consider a binary outcome or two-state event

0 = Event has not occurred
1 = Event has occurred

10
[Figure: timeline from Start of Study to End of Study showing moves between states 0 and 1 at times t1, t2 and t3]
11
These durations are a continuous Y, so why can't
we use standard regression techniques?
12
[Figure: timelines from Start of Study to End of Study for several individuals in states 0 and 1, with some marked as CENSORED OBSERVATIONS]
13
[Figure: timelines from Start of Study to End of Study for persons A and B, marked as CENSORED OBSERVATIONS]
14
These durations are a continuous Y, so why can't
we use standard regression techniques?

What should be the value of Y for person A and
person B at the end of our study (when we fit the
model)?
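
The question above can be made concrete with a small sketch. It assumes hypothetical spells recorded as (months observed, event observed) pairs and, purely for illustration, a constant hazard (exponential durations), which is a simpler assumption than the Cox model introduced next. It compares two naive answers with an estimate that uses the censored time at risk.

```python
# Hypothetical spells: (months observed, event_observed).
# event_observed = False means the spell was still running at the end of the study.
records = [(3, True), (5, True), (12, False), (7, True), (12, False)]

durations = [months for months, _ in records]
n_events = sum(1 for _, observed in records if observed)

# Naive option 1: drop the censored cases and average the rest.
completed = [months for months, observed in records if observed]
mean_completed = sum(completed) / len(completed)

# Naive option 2: pretend the event happened at the end of the study.
mean_all = sum(durations) / len(durations)

# Censoring-aware estimate under an assumed constant hazard:
# rate = events / total time at risk, so mean duration = time at risk / events.
mean_exponential = sum(durations) / n_events

print(mean_completed, mean_all, mean_exponential)
```

Both naive answers are shorter than the censoring-aware estimate here, because censored cases contribute time at risk without contributing an event.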

15
Cox Regression

Cox regression is a method for modelling
time-to-event data in the presence of censored
cases.
Allows explanatory variables in your model
(continuous and categorical).
Gives estimated coefficients for each of the
covariates.
Handles the censored cases correctly.

16
UNEMPLOYMENT AND RETURNING TO WORK STUDY
0 = Unemployed, 1 = Returned to work
[Figure: timelines from Start of Study to End of Study for several individuals in states 0 and 1, with some marked as CENSORED OBSERVATIONS]
17
A Statistical Model
Y variable: duration with censored observations
Explanatory variables: X1, X2, X3
18
A Statistical Model
Y variable: duration with censored observations
Explanatory variables:
Previous Occupation
Educational Qualifications
Length of Work experience (a continuous covariate)
19
More complex event history analysis
20
UNEMPLOYMENT AND RETURNING TO WORK STUDY
0 = Unemployed, 1 = Returned to work
[Figure: timeline from Start of Study to End of Study showing moves between states 0 and 1 at times t1, t2 and t3]
21
UNEMPLOYMENT AND RETURNING TO WORK STUDY
0 = Unemployed, 1 = Returned to work
Spell or Episode
[Figure: a spell in state 0 ending at t1, between Start of Study and End of Study]
22
UNEMPLOYMENT AND RETURNING TO WORK STUDY
0 = Unemployed, 1 = Returned to work
Transition: movement from one state to another
[Figure: a transition between state 0 and state 1 at t1, between Start of Study and End of Study]
23
Recurrent Events Analysis
24
The structure of many large-scale studies results
in survey data being collected on a number of
discrete occasions. In this situation, rather
than being continuous, time lends itself to being
conceptualized as a sequence of discrete events.
Furthermore, social scientists are often
substantively interested in whether a specific
event has occurred. Taken together, these two
considerations favour the adoption of a
discrete-time or event history approach.
25
Recurrent events are merely outcomes that can
take place on a number of occasions. A simple
example is unemployment measured month by month.
In any given month an individual can either be
employed or unemployed. If we had data for a
calendar year we would have twelve discrete
outcome measures (i.e. one for each month).
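
A minimal sketch of how such monthly outcomes might be laid out for analysis, assuming the coding 1 = employed and 0 = unemployed. The function name `to_person_period`, the record keys, and the two example people are hypothetical.

```python
def to_person_period(sequences):
    """Expand {person_id: [y_1, ..., y_T]} into one record per person-month."""
    rows = []
    for person_id, outcomes in sequences.items():
        for month, y in enumerate(outcomes, start=1):
            rows.append({"id": person_id, "month": month, "y": y})
    return rows

# Two hypothetical people observed for a calendar year (12 monthly outcomes each).
sequences = {
    "a": [0] * 12,
    "b": [0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1],
}

rows = to_person_period(sequences)
print(len(rows))   # one row per person per month
print(rows[15])    # person "b" in month 4
```

Each person contributes twelve rows, so the repeated outcomes for one individual sit together in the data and can be linked by `id`.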
26
Social scientists now routinely employ
statistical models for the analysis of discrete
data, most notably logistic and log-linear
models, in a wide variety of substantive areas. I
believe that the adoption of a recurrent events
approach is appealing because it is a logical
extension of these models.
27
Willett and Singer (1995) conclude that
discrete-time methods are generally considered to
be simpler and more comprehensible; moreover,
mastery of discrete-time methods facilitates a
transition to continuous-time approaches should
that be required.
Willett, J. and Singer, J. (1995) Investigating
Onset, Cessation, Relapse, and Recovery: Using
Discrete-Time Survival Analysis to Examine the
Occurrence and Timing of Critical Events. In J.
Gottman (ed.) The Analysis of Change (Hove:
Lawrence Erlbaum Associates).
28
STATISTICAL ANALYSIS FOR BINARY RECURRENT EVENTS
(SABRE)

Fits appropriate models for recurrent events.
It is like GLIM.
It can be downloaded free.

29
www.cas.lancs.ac.uk/software
30
Consider a binary outcome or two-state event

0 = Event has not occurred
1 = Event has occurred
In the cross-sectional situation we are used to
modelling this with logistic regression.

31
UNEMPLOYMENT AND RETURNING TO WORK STUDY
A study for six months
0 = Unemployed, 1 = Returned to work
32
Months  1 2 3 4 5 6
obs     0 0 0 0 0 0
Constantly unemployed
33
Months  1 2 3 4 5 6
obs     1 1 1 1 1 1
Constantly employed
34
Months  1 2 3 4 5 6
obs     1 0 0 0 0 0
Employed in month 1 then unemployed
35
Months  1 2 3 4 5 6
obs     0 0 0 0 0 1
Unemployed but gets a job in month six
36
Months  1 2 3 4 5 6
obs     0 1 0 1 1 0
obs     0 0 1 0 1 1
obs     0 1 1 0 0 1
obs     1 0 0 0 1 0
Mixed employment patterns
37
Here we have a binary outcome, so could we
simply use logistic regression to model it?
Months  1 2 3 4 5 6
obs     0 0 0 0 0 0
38
Yes and No!
39
SABRE fits two models that are appropriate to
this analysis.

Model 1 Pooled Cross-Sectional Logit Model

40
POOLED CROSS-SECTIONAL LOGIT MODEL

$x_{it}$ is a vector of explanatory variables and $b$
is a vector of parameter estimates.
41
POOLED CROSS-SECTIONAL LOGIT MODEL

In conventional logistic regression models, each
observation is assumed to be independent and a
logistic link function is used. The contribution
to the likelihood by the ith case and the tth
event is given by the equation above.
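
The equation itself is not preserved in this transcript. A standard form for the likelihood contribution in a pooled logit model, written with the symbols defined on the slide ($x_{it}$, $b$) and with $y_{it}$ assumed to denote the binary outcome for case i at occasion t, is sketched below; it is a reconstruction under the usual logistic-link assumption and may differ in notation from the original slide.

```latex
L_{it}(b) = \frac{\left[\exp(b' x_{it})\right]^{y_{it}}}{1 + \exp(b' x_{it})},
\qquad y_{it} \in \{0, 1\}
```

Under the independence assumption, the pooled likelihood is the product of these contributions over all cases i and occasions t.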
42
This approach can be regarded as a naïve solution
to our data analysis problem.
43
We need to consider a number of issues.
44
Months  Y1 Y2
obs     0  0
Pickles tip: in repeated measures analysis we
would require something like a paired t test
rather than an independent t test, because we
can assume that Y1 and Y2 are related.
45
SABRE fits two models that are appropriate to
this analysis.

Model 2 Random Effects Model
(or logistic mixture model)

Repeated measures data violate an important
assumption of conventional regression models.
The responses of an individual at different
points in time will not be independent of each
other.
This problem has been overcome by the inclusion
of an additional, individual-specific error term.

47

48

The random effects model extends the pooled
cross-sectional model to include a case-specific
random error term to account for residual
heterogeneity.
For a sequence of outcomes for the ith case, the
basic random effects model has the integrated (or
marginal) likelihood given by the equation.
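
The integrated likelihood can be illustrated numerically. The sketch below assumes a normally distributed case-specific error with standard deviation `sigma` and integrates over it with a plain trapezoid rule. This is a simple stand-in chosen for readability, not a description of how SABRE computes the integral, and the function names and default settings are hypothetical.

```python
import math

def logit_prob(eta, y):
    """P(Y = y) under a logistic link with linear predictor eta."""
    p = 1.0 / (1.0 + math.exp(-eta))
    return p if y == 1 else 1.0 - p

def marginal_likelihood(etas, ys, sigma, n_points=201, width=8.0):
    """Integrate prod_t P(y_t | eta_t + sigma * z) over z ~ N(0, 1).

    etas  : linear predictors b'x_it for one case, one per occasion
    ys    : the 0/1 outcomes for the same occasions
    sigma : assumed standard deviation of the case-specific error
    """
    step = 2.0 * width / (n_points - 1)
    total = 0.0
    for k in range(n_points):
        z = -width + k * step
        weight = step * (0.5 if k in (0, n_points - 1) else 1.0)
        density = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        conditional = 1.0
        for eta, y in zip(etas, ys):
            conditional *= logit_prob(eta + sigma * z, y)
        total += weight * density * conditional
    return total

# A case with six occasions, all outcomes 0, and linear predictor 0 throughout.
etas = [0.0] * 6
ys = [0] * 6
pooled = marginal_likelihood(etas, ys, sigma=0.0)   # reduces to the pooled model
mixed = marginal_likelihood(etas, ys, sigma=2.0)    # allows residual heterogeneity
print(pooled, mixed)
```

With `sigma` set to zero the result matches the pooled cross-sectional likelihood for the sequence; with a positive `sigma` an unchanging sequence becomes more likely, which is the sense in which the extra error term absorbs residual heterogeneity.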

49
Davies and Pickles (1985) have demonstrated that
failure to model the effects of residual
heterogeneity explicitly may cause severe bias in
parameter estimates. Using longitudinal data, the
effects of omitted explanatory variables can be
overtly accounted for within the statistical
model. This greatly improves the accuracy of the
estimated effects of the explanatory variables.
51
Movers and Stayers

In data on recurrent events there will be
individuals who have a zero (or very low)
probability of changing outcome from one
occasion to the next. These individuals are
termed stayers.

52
Months  1 2 3 4 5 6
obs     0 0 0 0 0 0
This person is a stayer!
53
Months  1 2 3 4 5 6
obs     1 1 1 1 1 1
So is this person.
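
A minimal sketch for flagging sequences like the two above, assuming outcomes are stored as lists of 0s and 1s. The function name `is_stayer` and the person identifiers are hypothetical. Note that this is only a descriptive flag: an unchanging observed sequence is consistent with being a stayer but does not prove that the probability of change is zero.

```python
def is_stayer(outcomes):
    """True when the observed sequence never changes state."""
    return len(set(outcomes)) == 1

# Sequences taken from the six-month examples in the slides.
sequences = {
    "p1": [0, 0, 0, 0, 0, 0],   # constantly unemployed
    "p2": [1, 1, 1, 1, 1, 1],   # constantly employed
    "p3": [0, 1, 0, 1, 1, 0],   # mixed pattern
    "p4": [0, 0, 0, 0, 0, 1],   # gets a job in month six
}

stayers = {pid for pid, seq in sequences.items() if is_stayer(seq)}
print(sorted(stayers))
```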
54

An awareness of the issue of stayers is
important for technical reasons. A limitation of
a parametric modelling approach is that the tail
behaviour of the normal distribution is
inconsistent with stayers, so their number will
tend to be underestimated (see Spilerman 1972).
Spilerman, S. (1972) Extensions of the
Mover-Stayer Model, American Journal of
Sociology, 78, pp. 599-626.

Recurrent events may be analysed using other
software, but SABRE is specifically designed to
handle stayers, and this feature increases
SABRE's flexibility in representing residual
heterogeneity (Barry, Francis, Davies, and Stott
1998).
Barry, J., Francis, B., Davies, R.B. and Stott, D.
(1998) SABRE Users Guide
http://www.cas.lancs.ac.uk/software/sabre3.1/sabreuse.html

56
STATE DEPENDENCE
Past Behaviour
Current Behaviour
57
Different Probabilities of Employment
Young People Aged 19
[Diagram: status in APRIL (Employed or Unemployed) leading to being Employed in MAY]
58
This is called a MARKOV model

A Markov model helps to control for a previous
outcome (or behaviour).

59
ACCOUNTS FOR PREVIOUS OUTCOME ($y_{t-1}$)
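
One simple way to see what conditioning on the previous outcome means is to tabulate transitions. The sketch below pools consecutive pairs across people and estimates the probability of each current outcome given the previous one. It uses the four mixed sequences from the earlier slide; the function name `transition_probabilities` is hypothetical, and no explanatory variables are included.

```python
from collections import Counter

def transition_probabilities(sequences):
    """Estimate P(y_t = j | y_{t-1} = i) from pooled consecutive pairs."""
    counts = Counter()
    for seq in sequences:
        for prev, curr in zip(seq, seq[1:]):
            counts[(prev, curr)] += 1
    probs = {}
    for i in (0, 1):
        row_total = counts[(i, 0)] + counts[(i, 1)]
        for j in (0, 1):
            # NaN marks a previous state that never occurs in the data.
            probs[(i, j)] = counts[(i, j)] / row_total if row_total else float("nan")
    return probs

sequences = [
    [0, 1, 0, 1, 1, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 1, 1, 0, 0, 1],
    [1, 0, 0, 0, 1, 0],
]

probs = transition_probabilities(sequences)
print(probs[(0, 1)])   # P(employed this month | unemployed last month)
print(probs[(1, 1)])   # P(employed this month | employed last month)
```

The two printed probabilities differ, which is the pattern a Markov model is designed to capture.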
60
The Model Provides TWO sets of estimates
[Diagram: APRIL status leading to Employed in MAY, with one set of explanatory variables for those Unemployed in April and another set for those Employed in April]
61
This is a two-state MARKOV model

But we can make it more complicated.

62
First Order Markov Model
Months  Y1 Y2
obs     0  0
63
Second Order Markov Model
Months  Y1 Y2 Y3
obs     0  0  0
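
The difference between the two orders can be sketched as a data-preparation step: each response is paired with its one or two preceding outcomes. The function name `lagged_records` is hypothetical, and the sequence is one of the mixed patterns from the earlier slide.

```python
def lagged_records(outcomes, order):
    """Return (lags, y_t) pairs, where lags = (y_{t-1}, ..., y_{t-order})."""
    records = []
    for t in range(order, len(outcomes)):
        lags = tuple(outcomes[t - k] for k in range(1, order + 1))
        records.append((lags, outcomes[t]))
    return records

seq = [0, 1, 0, 1, 1, 0]

first_order = lagged_records(seq, 1)    # each y_t paired with y_{t-1}
second_order = lagged_records(seq, 2)   # each y_t paired with (y_{t-1}, y_{t-2})

print(first_order)
print(second_order)
```

A higher order uses up more of the early months as lags, so a six-month sequence yields five usable responses at first order and four at second order.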
64
FINAL POINT A THOUGHT!
65
Months  1 2 3 4 5 6
obs     0 1 0 1 1 0
obs     0 0 1 0 1 1
obs     0 1 1 0 0 1
obs     1 0 0 0 1 0
Mixed employment patterns
66
Hierarchical or Multilevel Data Structure
[Diagram: two-level hierarchy with Individuals (a to g) at the upper level and Observations (Months 1, 2, 3, ...) nested within each individual]
67
Is the recurrent events model simply a multilevel
model fitted at the single level?