Title: Multilevel modelling of multivariate ordered response data
1. Multilevel modelling of multivariate ordered response data
- Harvey Goldstein and Daphne Kounali
- University of Bristol
2. Built on ideas and comments contributed by
- Fiona Steele
- Bill Browne
- Jon Rasbash
- James Carpenter
- Developed from latent normal modelling ideas due to Chib, Aitchison and others
- Funded by ESRC
Builds on the REALCOM project. Freely available software and training materials at http://www.cmm.bristol.ac.uk/research/Realcom/index.shtml
3. The latent normal model
- Goldstein, Carpenter, Kenward, and Levin (GCKL) propose a multilevel multivariate model with the following properties:
- Responses can be at any level of the data hierarchy
- Responses can be a mixture of normal, ordered and unordered variates
- Observed responses are related to an underlying multivariate normal distribution
- An MCMC algorithm provides the appropriate transformation steps and also provides imputations for missing data
- Applications to missing data, partially observed data and flexible prediction systems
- Paper submitted for publication
4. An efficient prediction system for adult measurements based on growth in weight
- Consider the repeated measures model for weight in children where an adult measure is available
The superscripts denote the level at which each response is measured. This represents a cubic growth model with the intercept and slope random at the individual level (2) and correlated with the individual-level residual. Given an estimate of the level 2 covariance matrix, this provides an efficient linear prediction of the individual adult-level measure from a collection of growth measures.
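The prediction described on this slide is the conditional mean of a multivariate normal distribution. The sketch below is a minimal illustration, not the authors' software. It assumes growth measures of the form y_ij = (fixed part) + u_0j + u_1j * t_ij + e_ij and an adult measure y_j = beta_adult + u_2j, with (u_0j, u_1j, u_2j) jointly normal with level 2 covariance matrix omega. The function name, argument names and any numeric values passed to it are hypothetical.

```python
import numpy as np

def predict_adult(y, t, fixed_growth, beta_adult, omega, sigma2_e):
    """Conditional-mean prediction of the adult measure for one individual.

    y, t         : observed growth measurements and the ages they were taken at
    fixed_growth : fixed-part predictions for y at ages t (e.g. the cubic in age)
    beta_adult   : fixed-part mean of the adult measure
    omega        : 3x3 level 2 covariance of (intercept, slope, adult residual)
    sigma2_e     : level 1 (occasion) residual variance
    All of these are assumed inputs, e.g. estimates from a fitted model.
    """
    y = np.asarray(y, dtype=float)
    t = np.asarray(t, dtype=float)
    omega = np.asarray(omega, dtype=float)
    z = np.column_stack([np.ones_like(t), t])        # random intercept and slope design
    v = z @ omega[:2, :2] @ z.T + sigma2_e * np.eye(len(t))  # Var(y) for this individual
    c = z @ omega[:2, 2]                             # Cov(y, adult residual)
    resid = y - np.asarray(fixed_growth, dtype=float)
    return beta_adult + c @ np.linalg.solve(v, resid)
```

If the adult residual is uncorrelated with the growth random effects, the prediction collapses to the fixed-part mean beta_adult; the growth measures only help to the extent that the level 2 covariances are non-zero.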
5. Missing data
- In a repeated measures design, data may be missing in the sense that individuals do not attend for measurement
- The standard model (as above) ignores this since it directly models the relationship with time (age)
- Nevertheless, individuals with data missing due to attrition may have atypical values, and standard ways of dealing with this involve studying earlier characteristics of individuals with missing data.
- In this paper we look instead at whether the number of occasions measured during two childhood periods is related to growth parameters and adult measures.
6. Joint modelling of growth and number of visits
- The data are
- 1000 subjects with 4859 repeated measurements of weight
- nine occasions between birth and age 10
- adult body mass index (BMI, measured in kg/m^2) and plasma glucose (mmol/liter), both log transformed, at around age 30.
- The model is a cubic polynomial in age from 2-10 years (childhood) and a quadratic from 1-2 years (infancy) with intercept and linear term random at the individual level, together with 4 individual-level variables
- Log(glucose), log(BMI), number of infancy occasions measured, number of childhood occasions measured.
7. Full model
8. The latent normal model
- Consider the probit link proportional odds model
$$\gamma_g = \sum_{h \le g} \pi_h = \int_{-\infty}^{\alpha_g - X\beta} \phi(t)\,dt,$$
where $\pi_g$ is the probability that the observation occurs in category g (g = 1, ..., p), $\alpha_g$ is the threshold for category g, and $\phi$ is the pdf of the standard normal distribution.
Note that in general the linear predictor $X\beta$ will contain higher-level random effects and terms that arise from conditioning on all correlated random effects from the remaining responses. The model reduces to the ordinary probit for a binary response.
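To make the probit proportional odds model concrete, the following sketch computes category probabilities from a set of thresholds and a linear predictor. It assumes the common parameterisation in which the cumulative probability of category g or below is Phi(alpha_g - linear predictor); the function names are hypothetical and the code is an illustration rather than the REALCOM implementation.

```python
from math import erf, sqrt

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def category_probs(thresholds, linear_predictor):
    """Category probabilities for a probit proportional odds model.

    thresholds       : increasing cut points alpha_1 < ... < alpha_{p-1}
    linear_predictor : value of the linear predictor for this observation
    Assumes P(category <= g) = Phi(alpha_g - linear_predictor).
    """
    cumulative = [norm_cdf(a - linear_predictor) for a in thresholds] + [1.0]
    probs = []
    previous = 0.0
    for gamma in cumulative:
        probs.append(gamma - previous)   # difference of successive cumulative probabilities
        previous = gamma
    return probs
```

With a single threshold at zero there are two categories, and the probability of the upper category is Phi(linear predictor), which is the ordinary probit model for a binary response.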
9. Handling count data
- Any ordered variate, including count data, can be treated using the ordered probit model
- With large numbers of categories this involves estimating many threshold parameters $\alpha_g$, so instead we propose
A) Replace the threshold parameters with a smooth function of the count, e.g. a regression spline, fractional polynomial etc. We use a simple polynomial
B) Fit a latent normal (1 parameter) Poisson model to the cumulative count probabilities
For reference value . Thus, given
, is determined for unit i
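The two ways of avoiding a separate threshold per count can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: option A is taken to mean thresholds given by a polynomial in the count g, and option B is taken to mean thresholds chosen so that Phi(alpha_g) equals the Poisson cumulative probability P(count <= g) for a single mean parameter mu. The function names and the symbol mu are hypothetical.

```python
from math import exp
from statistics import NormalDist

def polynomial_thresholds(coefs, n_thresholds):
    """Option A: thresholds as a polynomial in the count g = 0, ..., n_thresholds - 1.

    coefs are (a0, a1, a2, ...); they must be chosen so the thresholds increase with g.
    """
    return [sum(c * g ** k for k, c in enumerate(coefs)) for g in range(n_thresholds)]

def poisson_thresholds(mu, max_count):
    """Option B: thresholds alpha_g with Phi(alpha_g) = P(count <= g) under Poisson(mu)."""
    std_normal = NormalDist()
    thresholds = []
    term = exp(-mu)          # P(count = 0)
    cumulative = 0.0
    for g in range(max_count + 1):
        cumulative += term
        # inv_cdf is undefined at 1, so cap the cumulative probability just below it.
        thresholds.append(std_normal.inv_cdf(min(cumulative, 1.0 - 1e-12)))
        term *= mu / (g + 1)  # P(count = g + 1)
    return thresholds
```

Option A needs only as many parameters as the polynomial has coefficients, while option B needs one parameter but commits to the Poisson form; in both cases the thresholds must remain increasing in g for the ordered probit model to be valid.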
10. MCMC steps
- Sample the underlying normal from the observed count data; conditioning on correlated responses yields MVN data
- Sample fixed effects
- Sample level 2 residuals
- Sample level 1 residuals
- Sample level 1 and level 2 covariance matrices
- Details in GCKL and REALCOM training materials
- Default (uniform) prior distributions assumed.
- Mixture of Gibbs (fixed effects, level 2 residuals and level 1 variance) and MH sampling (level 2 covariance matrix and threshold parameters)
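The first step, sampling the underlying normal variable given an observed ordered or count response, amounts to drawing from a normal distribution truncated to the interval between the two thresholds that bound the observed category. A minimal sketch using inverse-CDF sampling is shown below. The function name is hypothetical, the conditional mean and standard deviation given the other responses are assumed to be supplied by the caller, and inverse-CDF sampling is a simple choice that can be inaccurate far into the tails.

```python
import random
from statistics import NormalDist

def sample_latent_normal(lower, upper, mean=0.0, sd=1.0, rng=random):
    """Draw from N(mean, sd^2) truncated to (lower, upper) by inverse-CDF sampling.

    lower, upper : thresholds bounding the observed category; use float('-inf')
                   or float('inf') for the first or last category
    mean, sd     : conditional moments given the other (correlated) responses
    rng          : any object with a random() method returning a value in [0, 1)
    """
    dist = NormalDist(mean, sd)
    p_low = dist.cdf(lower)
    p_high = dist.cdf(upper)
    u = p_low + (p_high - p_low) * rng.random()
    # Guard against u reaching exactly 0 or 1, where inv_cdf is undefined.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return dist.inv_cdf(u)
```

Repeating this draw for every unit, with the conditional mean updated from the current values of the other responses and random effects, is what turns the mixed response vector into complete multivariate normal data for the remaining sampling steps.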
12. Model A treats the numbers of measurement occasions in infancy and childhood as ordered categories, where each threshold parameter is estimated. Model B smooths the threshold parameters using a second-order polynomial, and Model C fits a Poisson model.
13. Correlations at individual level
- Variances on the diagonal, and correlations below
the diagonal. Poisson model (MVN scale). Note
unit variances for counts.
Note that correlations of counts with growth and
adult height are very small, implying that
attrition can be treated as random.
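The layout described on this slide (variances on the diagonal, correlations below it) can be produced from any estimated covariance matrix. The sketch below shows one way to do so; the function name and the example matrix in the usage note are hypothetical and are not the estimates from the talk.

```python
import numpy as np

def variance_correlation_table(cov):
    """Return a matrix with variances on the diagonal, correlations below it, zeros above."""
    cov = np.asarray(cov, dtype=float)
    sd = np.sqrt(np.diag(cov))
    corr = cov / np.outer(sd, sd)                 # full correlation matrix
    table = np.tril(corr, k=-1)                   # keep correlations strictly below the diagonal
    table[np.diag_indices_from(table)] = np.diag(cov)   # put variances on the diagonal
    return table
```

For instance, a covariance matrix [[4, 1], [1, 1]] gives variances 4 and 1 on the diagonal and a correlation of 0.5 in the lower-left cell. On the latent normal scale the count responses have unit variance, so their diagonal entries are 1.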
14. Further developments
- An important special case is the zero-truncated Poisson, e.g. number of children in a family (> 0).
- Can be extended to cross classifications and further levels of nesting.
- Covariates such as gender, height etc. can be incorporated as predictors or as further responses that will be conditioned upon for linear prediction of adult measures.
- Care is needed when assuming a distributional form; the smoother model can be used very generally.
- Multiple imputation is easily incorporated when data, e.g. on covariates, are missing
- Other discrete distributions, e.g. the Zipf distribution, can be handled similarly.
- Experimental software available. GCKL model software freely available at http://www.cmm.bristol.ac.uk/research/Realcom/index.shtml
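For the zero-truncated Poisson case mentioned in the first bullet, the cumulative probabilities that would feed the latent normal thresholds are the ordinary Poisson probabilities renormalised to exclude zero. The sketch below illustrates this under that assumption; the function name and the symbol mu are hypothetical, and this is not the experimental software referred to above.

```python
from math import exp

def zero_truncated_poisson_cdf(mu, max_count):
    """P(count <= g | count > 0) for g = 1, ..., max_count under Poisson(mu)."""
    p_zero = exp(-mu)
    term = p_zero * mu           # P(count = 1) before truncation
    cumulative = 0.0
    out = []
    for g in range(1, max_count + 1):
        cumulative += term
        out.append(cumulative / (1.0 - p_zero))   # renormalise over counts > 0
        term *= mu / (g + 1)     # P(count = g + 1) before truncation
    return out
```

Applying the inverse standard normal CDF to these values would give thresholds on the latent normal scale in the same way as for the untruncated Poisson model.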