Title: Missing data
1 Missing data issues and extensions
- For multilevel data we need to impute missing data for variables defined at higher levels
- We need to have a valid procedure for discrete variables
- Useful to include sampling weights
- Can we deal with partially missing data?
2 Consider the imputation stage with a set of multivariate responses
- We illustrate first with a simple model where the joint distribution of the responses is MVN and there are responses at 2 levels
- To illustrate how such a model is specified, consider repeated measures of children's heights at level 1, where the level 2 response is the child's adult height.
3 Child heights and adult height
Child height is modelled as a cubic polynomial with intercept and slope random at level 2, both correlated with the adult height random effect, to give a 3-variate normal distribution.
This allows us to model level 1 and level 2 variables with missing data jointly (see Goldstein and Kounali, JRSSA, 2009).
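The joint structure described on this slide can be sketched by simulation. The following is a minimal illustration, not the authors' model fit: the centred ages, coefficient values, covariance matrix and missingness rates are all hypothetical, and NumPy is assumed to be available.

```python
import numpy as np

rng = np.random.default_rng(0)

n_children = 200
ages = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])  # centred ages (hypothetical)

# Fixed part: cubic polynomial coefficients and adult mean (hypothetical, cm).
beta = np.array([140.0, 6.0, -0.3, 0.05])
adult_mean = 170.0

# Level 2 covariance of (intercept effect, slope effect, adult height effect).
# Values are illustrative; the matrix is positive definite.
omega_u = np.array([
    [36.0, 3.0, 30.0],
    [3.0, 1.0, 2.0],
    [30.0, 2.0, 49.0],
])
sigma_e = 1.0  # level 1 residual standard deviation (hypothetical)

# One 3-variate normal draw per child links the growth curve to adult height.
u = rng.multivariate_normal(np.zeros(3), omega_u, size=n_children)

design = np.vander(ages, N=4, increasing=True)  # columns 1, t, t^2, t^3
fixed = design @ beta

# Level 1 responses: repeated child heights with random intercept and slope.
noise = rng.normal(0.0, sigma_e, size=(n_children, ages.size))
child = fixed + u[:, [0]] + u[:, [1]] * ages + noise

# Level 2 response: adult height.
adult = adult_mean + u[:, 2]

# Introduce missing values at both levels; an MCMC sampler would impute these.
child_obs = child.copy()
child_obs[rng.random(child.shape) < 0.1] = np.nan
adult_obs = adult.copy()
adult_obs[rng.random(n_children) < 0.2] = np.nan
```

Because the three level 2 effects share one covariance matrix, observed child heights carry information about a missing adult height and vice versa.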
4 Thus, if data are missing at either level 1 or level 2 they will get imputed via the MCMC algorithm.
5 Mixed response types
- For ordered or unordered categorical data we can specify corresponding latent normal distributions.
- For an ordered response we can consider a probit threshold model, in which the cumulative probability of being in one of the categories 1, ..., s is a standard normal probability determined by a threshold for category s, together with an associated latent normal model.
- For a p-category unordered response we can define a latent (p-1)-variate normal.
We can define MCMC steps to sample, from the observed categorical responses, an underlying normal or MVN. Note that these are further conditioned on the remaining set of (correlated) normal variables. For details see Goldstein, H., Carpenter, J., Kenward, M. and Levin, K. (2009), Multilevel models with multivariate mixed response types, Statistical Modelling (to appear).
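As an illustration of sampling a latent normal from an observed ordered category, here is a minimal sketch using only the Python standard library. It assumes a unit-variance latent variable and the convention that category s is observed when the latent value lies between thresholds s-1 and s; the function name and arguments are hypothetical. It does not include the further conditioning on other correlated normal variables mentioned above; in that setting `mean` would be the conditional mean.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def sample_latent_ordered(category, thresholds, mean, rng):
    """Draw a latent N(mean, 1) value consistent with an ordered category.

    category is 1-based (1 .. len(thresholds) + 1) and thresholds are
    increasing cut points. Uses inverse-CDF sampling from a truncated normal.
    """
    cuts = [float("-inf")] + list(thresholds) + [float("inf")]
    lower, upper = cuts[category - 1], cuts[category]

    # Probabilities of the interval end points on the standard normal scale.
    p_lo = 0.0 if lower == float("-inf") else STD_NORMAL.cdf(lower - mean)
    p_hi = 1.0 if upper == float("inf") else STD_NORMAL.cdf(upper - mean)

    # Uniform draw between the two probabilities, kept strictly inside (0, 1)
    # because inv_cdf is undefined at the end points.
    u = rng.uniform(p_lo, p_hi)
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return mean + STD_NORMAL.inv_cdf(u)


rng = random.Random(1)
thresholds = [-0.5, 0.8]  # hypothetical cut points for 3 categories
draws_cat2 = [sample_latent_ordered(2, thresholds, 0.3, rng) for _ in range(500)]
```

Every draw for category 2 falls between the two thresholds, so the latent values can be treated as normal responses in the multivariate model.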
6 Imputation
- So now, with any mixture of categorical and normal variables at any level, we sample at each MCMC iteration an MVN set of variables including imputed values.
- Thus imputation is standard, and the reverse transformation is used to obtain imputed variables on the categorical scales.
- For non-normal continuous data we can use, e.g., a Box-Cox normalising transformation to sample a latent normal. Further extensions for Poisson and other discrete distributions are also available.
- Release 2.10 of MLwiN has a link to REALCOM that allows these extensions.
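The reverse transformations can be sketched as follows. This is an illustrative sketch, not the REALCOM implementation: the function names are hypothetical, the ordered mapping assumes category s corresponds to a latent value between thresholds s-1 and s, and the unordered mapping assumes the usual multinomial probit convention in which the last category is the reference and is chosen when all latent components are non-positive.

```python
import math
from bisect import bisect_left


def latent_to_ordered(z, thresholds):
    """Map a latent normal value to a 1-based ordered category."""
    # Number of thresholds strictly below z, plus one.
    return bisect_left(thresholds, z) + 1


def latent_to_unordered(z):
    """Map a latent (p-1)-vector to one of p unordered categories (1-based).

    Category k (k < p) is chosen when component k is the largest and positive;
    otherwise the reference category p is chosen (assumed convention).
    """
    best = max(range(len(z)), key=lambda k: z[k])
    return best + 1 if z[best] > 0 else len(z) + 1


def box_cox(y, lam):
    """Box-Cox transform of a positive value y with parameter lam."""
    return math.log(y) if lam == 0 else (y ** lam - 1.0) / lam


def inverse_box_cox(z, lam):
    """Return an imputed latent value z to the original scale."""
    return math.exp(z) if lam == 0 else (lam * z + 1.0) ** (1.0 / lam)
```

For example, with thresholds `[-0.5, 0.8]` a latent value of `0.0` maps back to category 2, and `inverse_box_cox(box_cox(y, lam), lam)` recovers `y` for positive `y`.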
7 Partially observed (coarsened) data
- Where we have a prior (estimated) probability distribution (PD) for a missing discrete (or continuous) variable value, we simply insert an extra MCMC step that accepts the standard MI value with a probability that is just the probability given by the PD. A corresponding step is used for normal data.
- This uses all of the data efficiently. No data are discarded so long as it is possible to assign a PD.
- Applications in record matching, rating scales with uncertain responses, etc.
- Several completed data sets are produced and combined as in standard MI.
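The extra acceptance step for a discrete variable can be sketched as below. This is a minimal illustration under assumptions: the function and argument names are hypothetical, and repeating the proposal until one is accepted is an assumption about how the step is iterated, which the slide does not spell out.

```python
import random


def impute_with_prior(model_probs, prior_probs, rng, max_tries=10_000):
    """Propose a category from the standard imputation distribution and
    accept it with the probability the prior distribution (PD) gives it.

    model_probs and prior_probs map category labels to probabilities.
    """
    categories = list(model_probs)
    weights = [model_probs[c] for c in categories]
    for _ in range(max_tries):
        # Standard MI draw from the model-based conditional distribution.
        proposal = rng.choices(categories, weights=weights, k=1)[0]
        # Extra step: accept with probability given by the PD.
        if rng.random() < prior_probs.get(proposal, 0.0):
            return proposal
    raise RuntimeError("no proposal accepted; PD and model may not overlap")


rng = random.Random(0)
model_probs = {"a": 0.5, "b": 0.3, "c": 0.2}   # hypothetical model draw probabilities
prior_probs = {"a": 0.1, "b": 0.9, "c": 0.0}   # hypothetical PD for this record
draws = [impute_with_prior(model_probs, prior_probs, rng) for _ in range(20_000)]
share_b = draws.count("b") / len(draws)
```

Under this scheme the accepted values follow a distribution proportional to the product of the model probability and the PD, so categories the PD rules out are never imputed.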
8 Sampling weights - briefly
- Consider a 2-level model
- Each level 2 unit is given a level 2 weight
- Each level 1 unit in the j-th level 2 unit is given a level 1 weight
- These are combined into final level 1 weights
- We use a quantity derived from the final level 1 weights as the level 1 random part explanatory variable instead of the constant 1
- This will be used for imputation and for the model of interest (MOI)
Ongoing work to incorporate this into MLwiN-REALCOM
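The weight formulae on this slide were not recoverable, so the following sketch rests on assumptions rather than the slide's own notation. It assumes one commonly described scheme: level 2 weights standardised to sum to the number of level 2 units, level 1 weights standardised within each level 2 unit, composite weights formed from their product and rescaled to sum to the total sample size, and the inverse square root of the composite weight used in place of the constant 1 in the level 1 random part. All names are hypothetical.

```python
def composite_weights(level2_weights, level1_weights):
    """Compute standardised and composite weights for a 2-level design.

    level2_weights: one raw weight per level 2 unit.
    level1_weights: one list of raw level 1 weights per level 2 unit.
    Returns (w2, w1, final, z_level1).
    """
    n_units = len(level2_weights)

    # Standardise level 2 weights to sum to the number of level 2 units.
    total2 = sum(level2_weights)
    w2 = [n_units * w / total2 for w in level2_weights]

    # Standardise level 1 weights within each level 2 unit to sum to n_j.
    w1 = []
    for unit in level1_weights:
        unit_total = sum(unit)
        w1.append([len(unit) * w / unit_total for w in unit])

    # Composite weights, rescaled to sum to the total number of level 1 units.
    raw = [[w2[j] * w for w in w1[j]] for j in range(n_units)]
    n_total = sum(len(unit) for unit in level1_weights)
    raw_total = sum(sum(unit) for unit in raw)
    final = [[n_total * w / raw_total for w in unit] for unit in raw]

    # Assumed level 1 random part explanatory variable: weight ** -0.5.
    z_level1 = [[w ** -0.5 for w in unit] for unit in final]
    return w2, w1, final, z_level1


w2, w1, final, z_level1 = composite_weights(
    [1.0, 3.0],                           # hypothetical level 2 weights
    [[1.0, 1.0], [2.0, 1.0, 1.0]],        # hypothetical level 1 weights
)
```

With this choice, heavily weighted level 1 units receive a smaller random part variable and hence a smaller level 1 variance contribution, which is how the weights enter both the imputation model and the model of interest.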