Title: REALCOM
1 REALCOM
- Multilevel models for realistically complex data
An ESRC research project at Bristol University
Methodology and examples for:
- Measurement errors
- Multilevel structural equations
- Multivariate responses at several levels and of different types
3 General Format
- MATLAB software
- Free standing executable programs
- ASCII and worksheet input and output
- Graphical menu based input specification
- Model equation display
- Monitoring of MCMC chains
- A training manual containing:
- Outline of methodology
- Worked-through examples
4 Markov Chain Monte Carlo: a quick introduction
- A Bayesian simulation-based method that, given starting values, samples a new set of parameters at each cycle of a Markov chain.
- This yields a final chain (after discarding a burn-in set) of, say, 5000 sets of values from the (joint) posterior distribution of the parameters.
- The posterior is formed by combining the likelihood based on the data with a prior distribution, typically diffuse.
- These chains are used for inference, e.g. the mean for a parameter is analogous to the point estimate from a likelihood analysis, intervals etc.
5 Consider the simple 2-level model
The parameters in this model are the fixed
coefficients, the two variances and the level 2
residuals.
From suitable starting values the chain eventually settles down, so that sampling is from the true posterior distribution. We then need to draw enough samples to provide stable estimates, judged using suitable convergence criteria. All the MATLAB routines use MCMC sampling.
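As a rough illustration of the sampling cycle described above, here is a minimal Gibbs sampler sketch in Python for an intercept-only version of a 2-level model, y_ij = beta0 + u_j + e_ij. It is not the REALCOM MATLAB code: the function names, the flat prior on beta0, the diffuse inverse-gamma priors (a = b = 0.001) and the simulated data are all assumptions made for this example.

```python
import random
from statistics import mean


def gibbs_two_level(groups, n_iter=1500, burn_in=500, a=0.001, b=0.001, seed=1):
    """Gibbs sampler for y_ij = beta0 + u_j + e_ij (hypothetical helper).

    groups: list of lists, one inner list of responses per level 2 unit.
    Returns the retained chain as a list of (beta0, var_u, var_e) tuples.
    """
    rng = random.Random(seed)
    n_groups = len(groups)
    n_total = sum(len(g) for g in groups)

    # Starting values
    beta0 = mean(v for g in groups for v in g)
    u = [0.0] * n_groups
    var_u, var_e = 1.0, 1.0

    chain = []
    for it in range(n_iter):
        # Step 1: fixed intercept given residuals and variances (flat prior)
        total = sum(v - u[j] for j, g in enumerate(groups) for v in g)
        beta0 = rng.gauss(total / n_total, (var_e / n_total) ** 0.5)

        # Step 2: level 2 residuals, one Normal draw per level 2 unit
        for j, g in enumerate(groups):
            precision = len(g) / var_e + 1.0 / var_u
            centre = sum(v - beta0 for v in g) / var_e / precision
            u[j] = rng.gauss(centre, precision ** -0.5)

        # Step 3: level 2 variance (inverse gamma = 1 / gamma draw)
        ss_u = sum(x * x for x in u)
        var_u = 1.0 / rng.gammavariate(a + n_groups / 2, 1.0 / (b + ss_u / 2))

        # Step 4: level 1 variance (inverse gamma = 1 / gamma draw)
        ss_e = sum((v - beta0 - u[j]) ** 2 for j, g in enumerate(groups) for v in g)
        var_e = 1.0 / rng.gammavariate(a + n_total / 2, 1.0 / (b + ss_e / 2))

        # Keep only the cycles after the burn-in set
        if it >= burn_in:
            chain.append((beta0, var_u, var_e))
    return chain


def summarise(draws):
    """Posterior mean and a 95% interval taken from the chain quantiles."""
    s = sorted(draws)
    return mean(s), s[int(0.025 * len(s))], s[int(0.975 * len(s))]


# Simulated data: 30 level 2 units with 20 level 1 units each (illustrative only)
sim = random.Random(42)
data = []
for _ in range(30):
    u_j = sim.gauss(0.0, 1.0)
    data.append([2.0 + u_j + sim.gauss(0.0, 1.0) for _ in range(20)])

chain = gibbs_two_level(data)
for name, k in (("beta0", 0), ("var_u", 1), ("var_e", 2)):
    est, low, high = summarise([draw[k] for draw in chain])
    print(f"{name}: mean={est:.3f}, 95% interval=({low:.3f}, {high:.3f})")
```

The chain mean plays the role of the point estimate and the chain quantiles give the interval, as in the introduction to MCMC above. A real analysis would also check convergence, for example by running several chains from different starting values.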
6 Measurement errors
- Continuous variables: a simple example
- Basic model is
- With a model of interest, e.g.
7 Some assumptions we need to make
-
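- Measurement error variance assumed known, or alternatively
- Reliability assumed known
- We also need a distribution for the true value
- An important issue is the value assumed for the measurement error variance (or reliability): a sensitivity analysis is useful, and we can also give it a prior.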
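A sketch of the classical measurement error formulation that is consistent with the assumptions listed above, for an observed value $X_i$, a true value $x_i$ and an error $m_i$ that is independent of the true value. The notation is assumed for illustration and may differ from that used in REALCOM.

```latex
\begin{aligned}
X_i &= x_i + m_i, \qquad m_i \sim N(0, \sigma^2_m), \qquad x_i \sim N(\mu_x, \sigma^2_x) \\
R &= \frac{\sigma^2_x}{\sigma^2_x + \sigma^2_m}
\end{aligned}
```

Under this formulation, fixing the reliability $R$ is equivalent to fixing $\sigma^2_m$ once the variance of the observed values is known, which is why either quantity can be supplied.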
8 2. Misclassification errors
- Assume a binary (0,1) variable, for example whether or not a school pupil is eligible for free school meals (yes = 1).
- We need the probability of observing a zero (no eligibility) given that the true value is zero, and the probability of observing a one given that the true value is zero.
- Likewise we need the two corresponding probabilities given that the true value is one.
- We now assume we know these misclassification probabilities; the target model is similar to before, with a binary predictor.
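To show how known misclassification probabilities carry information about the true value, the sketch below applies Bayes' rule to a single binary observation. The function name, the prevalence of 0.2 and the probabilities 0.9 and 0.1 are hypothetical values chosen for the example; whether REALCOM uses exactly this step internally is an assumption.

```python
def prob_true_one(observed, prevalence, p_obs1_given_true1, p_obs1_given_true0):
    """P(true value = 1 | observed value), by Bayes' rule (hypothetical helper).

    prevalence          -- assumed P(true value = 1) before seeing the observation
    p_obs1_given_true1  -- P(observe 1 | true value is 1)
    p_obs1_given_true0  -- P(observe 1 | true value is 0)
    """
    if observed == 1:
        like_true1, like_true0 = p_obs1_given_true1, p_obs1_given_true0
    else:
        like_true1 = 1.0 - p_obs1_given_true1
        like_true0 = 1.0 - p_obs1_given_true0
    numerator = like_true1 * prevalence
    return numerator / (numerator + like_true0 * (1.0 - prevalence))


# Illustrative values only: 20% truly eligible, 10% misclassified each way
print(prob_true_one(1, 0.2, 0.9, 0.1))
print(prob_true_one(0, 0.2, 0.9, 0.1))
```

Even with fairly accurate recording, an observed one is far from certain to be a true one when the true prevalence is low, which is why the misclassification probabilities matter for the target model.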
9 Modelling considerations
- We can model multivariate continuous measurement errors, but only independent binary misclassifications.
- We can allow different measurement error variances and covariances for different groups, e.g. gender.
- In the multivariate case we typically need non-zero correlations between measurement errors.
- Thus, say, if R = 0.7 and the observed correlation is 0.8, then we require a measurement error correlation > 0.33.
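The bound in the last bullet can be checked with a short calculation. Assume two standardised observed variables with the same reliability R and measurement errors independent of the true values. The observed correlation is then R times the true correlation plus (1 - R) times the measurement error correlation, and the true correlation cannot exceed 1. The function names below are hypothetical.

```python
def true_correlation(reliability, observed_corr, error_corr):
    """True-score correlation implied by the observed and error correlations."""
    return (observed_corr - (1.0 - reliability) * error_corr) / reliability


def min_error_correlation(reliability, observed_corr):
    """Smallest error correlation that keeps the true correlation at or below 1."""
    return (observed_corr - reliability) / (1.0 - reliability)


bound = min_error_correlation(0.7, 0.8)
print(round(bound, 2))
# With zero error correlation the implied true correlation is impossible (> 1)
print(true_correlation(0.7, 0.8, 0.0))
```

With R = 0.7 and an observed correlation of 0.8, independent measurement errors would imply a true correlation above 1, so the errors must be positively correlated.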
10 An educational example
- Maths test score related to prior test scores and FSM eligibility.
- We will look at continuous, correlated and binary measurement errors.
Open measurement-error.exe and read file classsize
11 Summary table for analyses
12 Factor analysis and structural equation models
Consider a single-level factor model where we have several responses on each member of a sample,
where r indexes the response variable and i the person. This is a special kind of multivariate model where we assume the residuals are independent, and the covariance between two responses is thus given by
A constraint is needed for identifiability and the default is to choose
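A sketch of a single-level, single-factor model of the kind described above, written in a commonly used notation. The symbols ($\beta_r$ for the intercepts, $\lambda_r$ for the loadings, $\nu_i$ for the factor, $e_{ri}$ for the residuals) and the unit-variance constraint are assumptions made for illustration and may not match the slide's own notation.

```latex
\begin{aligned}
y_{ri} &= \beta_r + \lambda_r \nu_i + e_{ri}, \qquad
\nu_i \sim N(0, \sigma^2_\nu), \qquad e_{ri} \sim N(0, \sigma^2_{e_r}) \\
\operatorname{cov}(y_{ri}, y_{si}) &= \lambda_r \lambda_s \sigma^2_\nu, \qquad r \neq s \\
\sigma^2_\nu &= 1 \quad \text{(a typical identifiability constraint)}
\end{aligned}
```

Because the residuals are independent across responses, all of the covariance between two responses comes through the shared factor, and fixing the factor variance removes the scale ambiguity between the loadings and $\sigma^2_\nu$.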
13 Extensions: further factors
- We can add explanatory variables (see later), or we can add further factors.
As the number of factors increases we require further constraints, typically on loading values. A popular choice is 'simple structure', with each response loading on only one factor and non-zero correlations between factors.
14 Extensions: structural variables
- We can allow the factors themselves to depend on further variables, e.g.
Or alternatively, but less commonly
15 Two-level factor models
Standard formulation
Alternatively
But we shall not consider this case
16 Example: PISA data
- A survey of reading performance of 15-year-olds in 32 countries, by OECD in 2000.
- We use one subscale of 35 items, 'retrieving information', and look at France and England.
- First we shall fit one- and two-level models assuming the responses are Normal; in fact they are binary and ordered, but we come to that later.
- Open structural-equation.exe and load pisadata
17 Binary and ordered responses
- Assume a binary response z.
- We will use the idea of a latent Normal distribution. Consider the (factor) model for a single response
where we observe a positive (1) response for our binary variable z if y is positive, that is
so that we obtain the probit model
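The link between the latent Normal variable and the probit model can be checked by simulation. The sketch below assumes a latent response y equal to a linear predictor plus a standard Normal residual (unit residual variance is the usual probit identification, assumed here), and compares the proportion of positive responses with the standard Normal cumulative probability. The function name and the value 0.5 are hypothetical.

```python
import random
from statistics import NormalDist


def simulate_binary(linear_predictor, n, seed=0):
    """Draw z = 1 when the latent y = linear_predictor + e is positive, e ~ N(0, 1)."""
    rng = random.Random(seed)
    return [1 if linear_predictor + rng.gauss(0.0, 1.0) > 0 else 0 for _ in range(n)]


eta = 0.5  # stands in for the fixed part plus factor term for one response
z = simulate_binary(eta, 20000)
observed_proportion = sum(z) / len(z)
probit_probability = NormalDist().cdf(eta)
print(observed_proportion, probit_probability)
```

The two printed values should agree to within simulation error, which is the sense in which thresholding a latent Normal variable at zero gives a probit model for z.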
18 Ordered data
Consider the cumulative probability of being in one of the lowest s+1 categories of a p-category variable, with categories numbered from 0 upwards (s = 0, ..., p-2). We extend the binary response model as
where the threshold parameters define a set of thresholds for the categories. So suppose we have a 3-category variable, then for observed responses
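As an illustration of how thresholds turn a latent Normal variable into ordered categories, the sketch below computes category probabilities for a p-category variable from p-1 increasing thresholds. It assumes the cumulative probability for the lowest s+1 categories is the standard Normal cumulative probability evaluated at (threshold minus linear predictor); this sign convention, the function name and the numerical values are assumptions.

```python
from statistics import NormalDist


def category_probabilities(thresholds, linear_predictor=0.0):
    """Probabilities of categories 0..p-1 from p-1 increasing thresholds."""
    cdf = NormalDist().cdf
    # Cumulative probabilities for s = 0, ..., p-2, then 1.0 for the top category
    cumulative = [cdf(a - linear_predictor) for a in thresholds] + [1.0]
    probabilities, previous = [], 0.0
    for c in cumulative:
        probabilities.append(c - previous)
        previous = c
    return probabilities


# A 3-category variable needs two thresholds (illustrative values)
probs = category_probabilities([-0.5, 0.8], linear_predictor=0.2)
print(probs)
```

With a single threshold at zero this reduces to the binary case, where a response of 1 is observed when the latent variable is positive.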
19 PISA data with binary/ordered responses
- In fact all the responses are binary except for 4 with 3 ordered categories: C9, C14, C20 and C26.
- Change these responses and rerun the models.
- Finally, fit the explanatory variables Country and Gender in the structural part of the model.
20 Multivariate models with responses at 2 levels
- Consider first 2 Normal responses
- Superscript indicates level
- Models are linked via the level 2 covariance matrix
- The MCMC algorithm handles missing response data, and categorical (binary, ordered and unordered) as well as Normal data.
- First example is a repeated measures growth curve model
21 Child heights and adult height
Child height as a cubic polynomial with intercept and slope random at level 2
22
- Load growthdata.txt and fit the model
- Results
23 Adult height prediction
- Suppose we have 2 growth measures and we want a regression prediction of the form
- This leads to
24 Mixed response types and missing data
- Normal and ordered data already considered in structural equation models
- We now introduce unordered categorical responses
- We can also have general Normalising transformations
- Missing data via imputation is an important application for these models
25 Unordered categorical responses
Assume p categories, where an individual responds to just one.
- We have
where h indexes the response. For each response we assume that an underlying latent variable exists and that we have the following model
- For identifiability we model p-1 categories and assume
- The maximum indicant model: we observe category h for individual i iff
- so that
27 Multiple imputation, briefly and simply
Consider the model of interest (MOI). We turn this into a multivariate response model and obtain residual estimates (from an MCMC chain) of the values which are missing. Use these to fill in the missing values and produce a complete data set. Do this (independently) n (e.g. 20) times. Fit the MOI to each data set and combine the results according to the combining rules to get estimates and standard errors.
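The combining step can be sketched as follows, assuming the rules referred to are Rubin's rules for multiple imputation: the pooled estimate is the mean of the per-data-set estimates, and the total variance adds the average within-imputation variance to the between-imputation variance inflated by (1 + 1/n). The function name and the numbers in the usage lines are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pool_estimates(estimates, standard_errors):
    """Combine one parameter's estimates from n imputed data sets (Rubin's rules)."""
    n = len(estimates)
    pooled = mean(estimates)
    within = mean(se ** 2 for se in standard_errors)  # average sampling variance
    between = variance(estimates)                     # variance across imputations
    total = within + (1.0 + 1.0 / n) * between
    return pooled, sqrt(total)


# Made-up estimates and standard errors from three imputed data sets
estimate, se = pool_estimates([1.0, 1.2, 0.8], [0.1, 0.1, 0.1])
print(estimate, se)
```

The pooled standard error is larger than any single-data-set standard error whenever the estimates differ between imputations, which reflects the extra uncertainty due to the missing values.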
28 Class size example
- Load classsize_impute
- MOI is Normalised exam score as response regressed on pretest score, gender, FSM, class size. 50 level 1 units have missing data.
Multivariate model
29 MI estimates vs listwise deletion
- Fixed effects in multivariate model; 50 records MCAR (missing completely at random)
Estimate           Listwise (SE)     MI (SE)           Complete (SE)
Post maths          0.102 (0.088)     0.134 (0.071)     0.134 (0.070)
Pre Maths           0.011 (0.088)     0.032 (0.071)     0.019 (0.071)
Gender              0.096 (0.074)     0.073 (0.047)     0.069 (0.047)
FSM                -1.124 (0.159)    -1.090 (0.129)    -1.064 (0.129)
Class size (-30)   -4.030 (0.602)    -4.049 (0.597)    -4.267 (0.544)
30 Further extensions
- Box-Cox normalising transformations
- Application to survival data treated as an ordered response when divided into discrete time intervals
- Combination of measurement errors, structural models and responses at >1 level into a single program
- Incorporation into MLwiN
31 General remarks
- Report back welcome (h.goldstein_at_bristol.ac.uk)
- A REALCOM discussion group is under consideration
Use with care!