Title: What is The Analysis of Longitudinal Survey Data
Paul Lambert, University of Stirling
Prepared for National Centre for Research Methods,
Research Methods Festival, St Catherine's
College, Oxford, 7 July 2010
- Also see www.longitudinal.stir.ac.uk /
www.dames.org.uk
So what's distinct about the analysis of
longitudinal survey data?
- You already know..
- Working with (survey) datasets with longitudinal
information (data about time) and the specialist
techniques of statistical analysis that are
appropriate
- You maybe don't realise..
- Groups of techniques and data types
- Complex data and data management components
1) Types of longitudinal survey data
- Survey resources
- Longitudinal: '..of or about time..'
- Analysis is concerned with time
- Data is concerned with more than one time point
- e.g. Taris 2000; Blossfeld and Rohwer 2002
- Repeated measures over time
- e.g. Menard 2002; Martin et al 2006
Data analysis is used to give a parsimonious
summary of patterns of relations between
variables in the survey dataset
Types of data and analysis traditions for
longitudinal surveys (cf. www.longitudinal.stir.ac.uk)
0. Temporal effects in cross-sectional data
1. Repeated cross-sections
2. Panel datasets
3. Cohort studies
4. Event history datasets
5. Time series analyses
Temporal effects in single cross-sectional surveys
Data type 1/6
- Temporal effects are (a) present and (b) of
interest in most social science studies
- We can measure differences between people in
terms of their age / year of birth
- These matter empirically and are interesting
substantively
- But we can't tell if differences are due to age
or period or cohort (or other things that are
collinear with these, e.g. life course stage or
major events)
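The identification problem in the last bullet can be sketched in a few lines of Python. The survey years and birth years are made up for illustration; the only substantive assumption is the accounting identity age = survey year - year of birth.

```python
# Illustrative only: survey years and birth years are hypothetical.
def ages_by_cohort(survey_year, birth_years):
    """Map each year of birth to the age observed in a given survey year."""
    return {yob: survey_year - yob for yob in birth_years}

birth_years = [1940, 1955, 1970, 1985]

# Single cross-section: one period, so age is an exact function of cohort.
single = ages_by_cohort(2010, birth_years)
periods_seen = {yob + age for yob, age in single.items()}
print(periods_seen)  # a single period: age and cohort cannot be separated

# Two cross-sections: the same age is now observed for different cohorts.
pooled = [(year, yob, year - yob) for year in (1995, 2010) for yob in birth_years]
cohorts_aged_40 = sorted(yob for year, yob, age in pooled if age == 40)
print(cohorts_aged_40)
```

With a second survey year, people aged 40 come from two different birth cohorts, which is what lets repeated cross-sections separate two of the three effects.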
- Longitudinal statements from cross-sectional data
are common...
- We typically fit linear/curvilinear trend lines
for time effects
- Treiman (2009: 162): nonlinear specifications of
time and age effects
- Year of birth effect on literacy in China:
discontinuity at 1955; curve 1955-1967; knot at
1967
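One way to code such a year-of-birth effect is to build separate regression terms for each segment. The sketch below is a hypothetical parameterisation (a step at the break, a quadratic between the break and the knot, a linear term after the knot); it is not Treiman's actual specification, and the function and term names are invented for illustration.

```python
def yob_terms(yob, break_year=1955, knot_year=1967):
    """Illustrative regression terms for a nonlinear year-of-birth effect.

    Assumed (hypothetical) parameterisation: linear trend before the break,
    a step at the break, a quadratic between break and knot, and a separate
    linear trend after the knot.
    """
    # Years inside the curved segment, capped at the segment length.
    in_segment = min(max(yob - break_year, 0), knot_year - break_year)
    return {
        "pre_trend": min(yob - break_year, 0),   # years before the break (<= 0)
        "step": 1 if yob >= break_year else 0,   # discontinuity at the break
        "curve_lin": in_segment,
        "curve_sq": in_segment ** 2,
        "post_knot": max(yob - knot_year, 0),    # years after the knot
    }

for year in (1950, 1960, 1970):
    print(year, yob_terms(year))
```

Each term would enter a regression as a separate explanatory variable, so the fitted line can jump at 1955, bend up to 1967 and change slope afterwards.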
Within yob (year of birth) cohort, Gamma on educ to health is:
Cohort  Gamma
20s     0.15
30s     0.28
40s     0.22
50s     0.23
60s     0.22
70s     0.15
80s     0.10
Repeated cross-sections
Surveys on same topics, on multiple occasions, to
different people
Data type 2/6
Data example: GHS pooled time-series dataset
(UKDA, SN 5664)
Adults aged 25-65 only
Repeated cross sections
- Easy to communicate and appealing: how things have
changed between certain time points
- Can distinguish any 2 of age / period / cohort
- Easier to analyse: less data management
- However..
- Don't get other QnLR attractions (nature of
changers; residual heterogeneity; causality;
durations)
- Hidden complications: are sampling methods and
variable operationalisations really comparable?
- More on this below...
Example: Labour Force Survey yearly stats
Percent of UK workers with a higher degree, by employment category and gender (m / f). Sample size 35,000 m / 30,000 f each year.
            1991   1996   2001
Profess.    14.4   19.9   24.9
Non-Prof.    1.3    2.5    3.5
Profess.    11.0   24.4   28.3
Non-Prof.    0.6    2.3    3.2
LFS and time (example in SPSS from
www.longitudinal.stir.ac.uk)
Panel Datasets
Data type 3/6
Information collected on the same cases at more
than one point in time
- classic longitudinal design
- incorporates follow-up, repeated measures,
and cohort
- large and small in scale
- Several major panel studies in UK, e.g.
www.esds.ac.uk/longitudinal
- Many cross-sectional surveys feature additional
panel elements
Illustration: Unbalanced panel
Wave  Person  Person-level vars (four columns)
1     1       1    38   1    36
1     2       2    34   2    0
1     3       2    6    9    -
2     1       1    39   1    38
2     2       2    35   1    16
3     1       1    40   1    36
3     2       2    36   1    18
3     3       2    8    9    -
N_w=3  N_p=3  (also sweep, contact,..)
Complex data example: BHPS panel dataset (SN 5151)
Panel data advantages
- Study changers: how many of them, what are
they like, what caused change
- Control for individuals' unknown characteristics
(residual heterogeneity)
- Develop a full and reliable life history
- e.g. family formation, employment patterns
Example: Panel transitions
Young people's household circumstance changes by subjective well-being between 1994 and 1995. BHPS youth panel, 11-14yrs in 1994, row percents.
             Stays happy   Cheers up   Becomes miserable   Stays miserable   N
HH Stable    54            19          10                  18                499
HH Changes   42            22          14                  22                81
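A transition table of this kind can be built by classifying each respondent's pair of wave-1 and wave-2 states and taking row percentages. The sketch below uses invented respondents, not BHPS data, and the function and variable names are hypothetical.

```python
from collections import Counter

# Label for each (happy at wave 1, happy at wave 2) combination.
LABELS = {
    (True, True): "Stays happy",
    (False, True): "Cheers up",
    (True, False): "Becomes miserable",
    (False, False): "Stays miserable",
}

def transition_row_percents(records):
    """records: iterable of (group, happy_wave1, happy_wave2) tuples.

    Returns {group: {label: percent}}, with percentages summing to 100
    within each group (i.e. row percents).
    """
    counts = {}
    for group, before, after in records:
        counts.setdefault(group, Counter())[LABELS[(before, after)]] += 1
    result = {}
    for group, tally in counts.items():
        total = sum(tally.values())
        result[group] = {label: 100 * tally[label] / total for label in LABELS.values()}
    return result

# Hypothetical respondents (not BHPS data).
records = [
    ("HH Stable", True, True),
    ("HH Stable", True, True),
    ("HH Stable", False, True),
    ("HH Stable", False, False),
    ("HH Changes", True, False),
    ("HH Changes", True, True),
]
table = transition_row_percents(records)
for group, row in table.items():
    print(group, row)
```

This only works because the same people are observed twice; repeated cross-sections could give the share who are happy in each year but not who changed.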
Panel data can be wide or long
[Figure: wide format, with years 1991-1995 as columns, versus long format, with years 1991-1996 as rows]
- Depends upon the analytical approach
- Wide format is simpler to envisage but analysis
will need balanced data, or missing value
imputations where the panel is unbalanced
- Long format is harder to manipulate (e.g. to
cross-check), but is more flexible in the types
of analysis it supports
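The two layouts can be converted into each other. The following is a minimal sketch using plain Python dictionaries; the records, the variable name "age" and the helper names are hypothetical, and a statistical package's own reshape commands would normally be used instead.

```python
# Long format: one row per person per wave (values are hypothetical).
long_rows = [
    {"person": 1, "wave": 1, "age": 38},
    {"person": 1, "wave": 2, "age": 39},
    {"person": 1, "wave": 3, "age": 40},
    {"person": 3, "wave": 1, "age": 6},
    {"person": 3, "wave": 3, "age": 8},   # person 3 missed wave 2
]

def long_to_wide(rows, var, waves):
    """Pivot to one row per person, with None for waves not observed."""
    wide = {}
    for row in rows:
        person = wide.setdefault(row["person"], {f"{var}_w{w}": None for w in waves})
        person[f"{var}_w{row['wave']}"] = row[var]
    return wide

def wide_to_long(wide, var, waves):
    """Reverse the pivot, dropping person-waves with no observation."""
    rows = []
    for person, values in sorted(wide.items()):
        for w in waves:
            value = values[f"{var}_w{w}"]
            if value is not None:
                rows.append({"person": person, "wave": w, var: value})
    return rows

wide = long_to_wide(long_rows, "age", waves=(1, 2, 3))
print(wide)
print(wide_to_long(wide, "age", waves=(1, 2, 3)))
```

In the wide layout the unbalanced case shows up as an explicit missing value, whereas in the long layout the row is simply absent.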
Panel models
Regression style models with various estimators to
recognise the repeated contacts, e.g. random
effects; fixed effects; population average
(linear model of influences on GHQ score in the
BHPS; Stata examples available via
www.dames.org.uk/workshops)
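To give a flavour of what a fixed effects estimator does, the sketch below compares a pooled regression slope with a 'within' slope computed after subtracting each person's own mean. The data are invented (not the GHQ/BHPS example) and the helper names are hypothetical; it is not a substitute for the Stata examples referenced above.

```python
from statistics import mean

def ols_slope(x, y):
    """Slope from a bivariate least-squares regression of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def within_slope(panel):
    """Fixed-effects (within) slope: demean x and y per person, then pool."""
    dx, dy = [], []
    for obs in panel.values():
        xs = [x for x, _ in obs]
        ys = [y for _, y in obs]
        dx += [x - mean(xs) for x in xs]
        dy += [y - mean(ys) for y in ys]
    return ols_slope(dx, dy)

# Hypothetical panel of (x, y) pairs where y = person effect + 2 * x.
panel = {
    "p1": [(1, 2), (2, 4), (3, 6)],     # person effect 0
    "p2": [(4, -2), (5, 0), (6, 2)],    # person effect -10
}
pooled_x = [x for obs in panel.values() for x, _ in obs]
pooled_y = [y for obs in panel.values() for _, y in obs]
pooled = ols_slope(pooled_x, pooled_y)
within = within_slope(panel)
print(pooled, within)
```

In this constructed example the pooled slope is negative because the unobserved person effect is correlated with x, while the within slope recovers the value of 2 used to generate the data.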
Cohort Datasets
Data type 4/6
Information on a group of cases which share a
common circumstance, collected repeatedly as they
progress through a life course
- Intuitive type of repeated contact data
- e.g. 7-up series
- Often contributes to cross-cohort comparisons
- e.g. UK Birth cohort studies in 1946, 1958, 1970
and 2000
Cohort data and analysis in the social sciences
- Many circumstances parallel other panel types
- Large scale studies: ambitious; expensive
- Small scale cohorts still quite common
- Attrition problems often more severe
- Considerable study duration limits
- Glenn (2005) argues that cohort analysis should
be specifically directed to understanding effects
of ageing/progression over time
- Other uses of cohort data are just panel data
- It remains hard, even with extensive cohort data,
to authoritatively understand ageing effects
(age / period / cohort)
Event history data analysis (esp. Blossfeld et al
2007)
Data type 5/6
Focus shifts to length of time in a state
- analyse determinants/patterns to time in
state(s)
- Data sources are panel / cohort studies, or
retrospective interviews (recall errors..)
- Analysis of event durations: Event history
analysis; Survival data analysis; Failure
time analysis; hazards; risks; ..
- Analysis of event patterns: Sequence analysis;
trajectory analysis; optimal matching
analysis; latent growth curves
Key to event histories is 'state space'
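Once a state space is defined, yearly panel records can be collapsed into spells with durations. The sketch below assumes one observed state per year and treats the final spell as right-censored because its end is not observed; the state codes ("E" employed, "U" unemployed), the years and the function name are hypothetical.

```python
from itertools import groupby

def to_spells(states, start_year):
    """Collapse a yearly sequence of states into spells.

    Each spell records its state, start year and duration in years.
    The last spell is flagged as censored: we do not see when it ends.
    """
    spells = []
    year = start_year
    for state, run in groupby(states):
        length = len(list(run))
        spells.append({"state": state, "start": year,
                       "duration": length, "censored": False})
        year += length
    if spells:
        spells[-1]["censored"] = True
    return spells

# Hypothetical yearly states for one person, 1991-1996.
history = ["E", "E", "U", "U", "U", "E"]
for spell in to_spells(history, 1991):
    print(spell)
```

The duration and censoring flag are the basic inputs to event history models, while the ordered list of states is what sequence analysis works with.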
Example: Cox regression (SPSS example at
www.longitudinal.stir.ac.uk)
Time series data
Data type 6/6
Statistical summary of one particular concept,
collected at repeated time points from one or
more subjects
- Examples
- Unemployment rates by year in UK
- University entrance rates by year by country
- Comments
- Panel: many variables, few time points
- 'cross-sectional time series' to economists
- Time series: few variables, many time points
- Descriptive analyses, e.g. charts of statistics
over time
- Advanced modelling analyses typically involve
including autoregressive terms (e.g. lag
effects) amongst explanatory factors
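An autoregressive term is simply the series' own earlier value used as an explanatory factor. As a minimal sketch, the code below builds lagged pairs and estimates a first-order autoregressive coefficient by least squares; the series is invented (not real unemployment rates) and the function names are hypothetical.

```python
def lagged_pairs(series, lag=1):
    """Pair each value with the value observed `lag` periods earlier."""
    return [(series[t - lag], series[t]) for t in range(lag, len(series))]

def ar1_coefficient(series):
    """Least-squares slope of y_t on y_(t-1), with an intercept."""
    pairs = lagged_pairs(series, 1)
    n = len(pairs)
    mean_prev = sum(prev for prev, _ in pairs) / n
    mean_curr = sum(curr for _, curr in pairs) / n
    sxy = sum((prev - mean_prev) * (curr - mean_curr) for prev, curr in pairs)
    sxx = sum((prev - mean_prev) ** 2 for prev, _ in pairs)
    return sxy / sxx

rates = [8.0, 4.0, 2.0, 1.0, 0.5]   # hypothetical yearly series
print(lagged_pairs(rates))
print(ar1_coefficient(rates))
```

Each extra lag costs one observation at the start of the series, which is one reason these models suit data with many time points.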
Six types of data/analysis!
0. Temporal effects in cross-sectional data
1. Repeated cross-sections
2. Panel datasets
3. Cohort studies
4. Event history datasets
5. Time series analyses
2. Data management issues
..and then there's another thing..
- Working with longitudinal survey data is made
more challenging by important issues of data
management
- Variable operationalisations for comparisons
- e.g. strategies for standardisation,
harmonisation
- Linking datasets internally to a study
- Linking with other datasets to enhance analysis
- Value of organising your data and files, e.g.
Long, 2009
- Recognising data structure in analysis
- e.g. missing data; survey effects; modelling
specifications
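As one simple illustration of a standardisation strategy, a measure can be converted to z-scores within each survey year so that values are read relative to that year's distribution. This is a generic sketch with invented data and hypothetical names, not the specific approach used in any of the projects mentioned here.

```python
from statistics import mean, pstdev

def standardise_within_year(values_by_year):
    """Convert raw scores to z-scores separately within each survey year.

    Assumes every year has some variation (non-zero standard deviation).
    """
    standardised = {}
    for year, values in values_by_year.items():
        centre, spread = mean(values), pstdev(values)
        standardised[year] = [(v - centre) / spread for v in values]
    return standardised

# Hypothetical scores recorded on different scales in two survey years.
scores = {1991: [10, 20, 30], 2001: [100, 200, 300]}
z = standardise_within_year(scores)
print(z)
```

After standardisation the two years are directly comparable in relative terms, although any real change in the overall level or spread between years is deliberately removed.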
Dealing with complex data
- In the UK we host many projects and centres which
contribute to enabling the analysis of complex
longitudinal data for social science research
- Specifying suitably complex statistical models
- Examples at the Centre for Multilevel Modelling
(E-Stat: a generic tool for specifying advanced
models; Realcom: for analysing longitudinal
missing data); Lancaster-Warwick-Stirling NCRM
Node; ULSC (Essex) on survey design effects
- Resources on accessing and handling complex data
- e.g. ESDS; ADMIN Node; Obesity e-lab; DAMES Node
- ..Session 17 in yesterday's programme..
My own pet project concerns comparability of
variables over time.. (see www.dames.org.uk)
Effect proportional scaling using parents'
occupational advantage
3. Some closing comments on the analysis of
longitudinal survey data
- Why bother with all this..?
- Focus on change / stability
- Focus on the life course
- Distinguish age, period and cohort effects
- Career trajectories / life course sequences
- Focus on time / durations
- Substantive role of durations (e.g. Unemployment)
- Getting the full picture
- Causality and residual heterogeneity
- Examining multivariate relationships
- Representative conclusions
- e.g. Abbott 2006; Mayer 2005; Menard 2002;
Baltagi 2001; Rose 2000; Dale and Davies 1994;
Hannan and Tuma 1979; Moser 1958
Research traditions
- 'geographers study space and economists study
time' - adage quoted in Fotheringham et al. 2000: 245
- Vast economics literature using techniques for
temporal analysis
- Other social science disciplines to some degree
catching up
- Though methodological research on longitudinal
models, and data quality, cross-cuts disciplines,
e.g. Dale and Davies, 1994
- Data expansions c1990 -> more encompassing
models; new substantive applications areas
- For example
- Platt 2005 - ethnic minorities' social mobility
1971-2001
- Pahl and Pevalin 2005 - Friendship patterns over
time
- Verbakel and de Graaf 2008 - spouses' effect on
careers 1941-2003
- One challenge is getting used to talking about
time in a more disciplined way, e.g. traditional
sociological characterisations of the past and
social change may not be empirically
satisfactory
What's exciting in the analysis of longitudinal
social survey data?
By and large, the core analytical / methodological issues have been recognised for some time
What is exciting is the rapid expansion of secondary quantitative longitudinal data: its quality, its volume and its accessibility
(a) - new data
(b) - new tools for accessing, handling and modelling large and complex data
References
- Abbott, A. (2006). 'Mobility: What? When? How?' in Morgan, S.L., Grusky, D.B. and Fields, G.S. (eds.) Mobility and Inequality. Stanford: Stanford University Press.
- Baltagi, B.H. (2001). Econometric Analysis of Panel Data. New York: Wiley.
- Blossfeld, H.P. and Rohwer, G. (2002). Techniques of Event History Modelling: New Approaches to Causal Analysis, 2nd Edition. Mahwah, NJ: Lawrence Erlbaum Associates.
- Blossfeld, H.P., Golsch, K. and Rohwer, G. (2007). Event History Analysis with Stata. New York: Lawrence Erlbaum.
- Davies, R.B. (1994). 'From Cross-Sectional to Longitudinal Analysis' in Dale, A. and Davies, R.B. (eds.) Analysing Social and Political Change: A casebook of methods. London: Sage.
- Fotheringham, A.S., Brunsdon, C. and Charlton, M. (2000). Quantitative Geography: Perspectives on Spatial Data Analysis. London: Sage.
- Glenn, N.D. (2005). Cohort Analysis, 2nd Edition. London: Sage.
- Hannan, M.T. and Tuma, N.B. (1979). Methods for Temporal Analysis. Annual Review of Sociology, 5, 303-328.
- Li, Y. and Heath, A.F. (2008). Socio-Economic Position and Political Support of Black and Ethnic Minority Groups in the United Kingdom, 1972-2005 [computer file]. 2nd Ed. Colchester, Essex: UK Data Archive [distributor], SN 5666.
- Long, J.S. (2009). The Workflow of Data Analysis using Stata. College Station, Texas: Stata Press.
- Martin, J., Bynner, J., Kalton, G., Boyle, P., Goldstein, H., Gayle, V., Parsons, S. and Piesse, A. (2006). Strategic Review of Panel and Cohort Studies. London: Longview, and www.longviewuk.com/
- Mayer, K.U. (2005). 'Life courses and life chances in a comparative perspective' in Svallfors, S. (ed.) Analyzing Inequality: Life Chances and Social Mobility in Comparative Perspective. Stanford: Stanford University Press.
- Menard, S. (2002). Longitudinal Research, 2nd Edition. London: Sage, Number 76 in Quantitative Applications in the Social Sciences Series.
- Moser, C.A. (1958). Survey Methods in Social Investigation. London: Heinemann.
- Pahl, R. and Pevalin, D. (2005). Between family and friends: a longitudinal study of friendship choice. British Journal of Sociology, 56(3), 433-450.
- Platt, L. (2005). Migration and Social Mobility: The Life Chances of Britain's Minority Ethnic Communities. Bristol: The Policy Press.
- Rose, D. (2000). Researching Social and Economic Change: The Uses of Household Panel Studies. London: Routledge.
- Taris, T.W. (2000). A Primer in Longitudinal Data Analysis. London: Sage.
- Treiman, D.J. (2009). Quantitative Data Analysis: Doing Social Research to Test Ideas. New York: Jossey-Bass.