Title: Missing Data Models
1 Missing Data Models
- Introduction to missing data models
- Types of missing data
- Corrections for missing data in theory
- Missing data in practice
2 Introduction to Missing Data
Notes taken from King, et al. (2001)
- According to King, et al., on average about half of the respondents to surveys in political science research do not answer one or more of the questions used in the analysis.
- Almost all analysts contaminate their data by filling in educated guesses for some of these questions (e.g., for party identification questions, coding "don't know" as independent).
- Even if these guesses are correct on average, filling in missing cells of our data matrix in this way biases our regression coefficients' standard errors downwards.
- When educated guesses are not possible, the standard remedy is listwise deletion of missing data, eliminating entire observations in a wholesale manner.
- Valuable information is lost, and severe selection bias effects are possible.
3 Some notation
- Let D denote the data matrix, where D includes the independent and dependent variables: D = {X, Y}.
- We assume that some elements of the data matrix are missing.
- Let M denote the missingness indicator matrix with the same dimensions as D. Each element of M is a one or zero that indicates whether or not an element of D is missing.
- Mij = 0 indicates that the ith observation for the jth variable is missing, but that the data could be observed.
- Mij = 1 means that piece of data is present.
- Comment: it is possible that data cannot be observed. Sometimes a "don't know" really means that the respondent has no basis on which to provide an answer.
- Finally, let Dobs and Dmis denote the observed and missing parts of D: D = {Dobs, Dmis}.
4 Three Types of Missingness
- Missing Completely at Random: if the data are missing completely at random, then missing values cannot be predicted any better with the information in D, observed or not.
- Formally, M is independent of D. So, P(M | D) = P(M).
- A process is missing completely at random if, say, an individual decides whether to answer survey questions on the basis of coin flips.
- If independents are more likely to decline to answer a vote preference or party id question, then the data are not missing completely at random.
- In the unlikely event that the process is missing completely at random, inferences based on listwise deletion are unbiased, but inefficient because we have lost some cases.
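A small simulation makes the "unbiased but inefficient" claim concrete. This is an illustrative sketch (not from King et al.), using only the Python standard library; the variable names are mine:

```python
import random
import statistics

rng = random.Random(0)
n = 10_000
x = [rng.gauss(5.0, 2.0) for _ in range(n)]   # complete data, true mean 5.0

# MCAR: each value is dropped by an independent coin flip,
# so missingness carries no information about the data
x_obs = [v for v in x if rng.random() < 0.5]  # listwise-deleted sample

full_mean = statistics.mean(x)
lw_mean = statistics.mean(x_obs)              # listwise-deletion estimate

# Unbiased: both estimates sit near the true mean of 5.0 ...
# ... but inefficient: fewer cases, so a larger standard error
se_full = statistics.stdev(x) / len(x) ** 0.5
se_lw = statistics.stdev(x_obs) / len(x_obs) ** 0.5
```

Because the coin flips ignore the data, the listwise-deleted mean targets the same quantity as the full-data mean; only its standard error grows.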
5 Three Types of Missingness
- Missing at Random: if the data are missing at random, then the probability that a cell is missing may depend on Dobs, but after controlling for Dobs that probability must be independent of Dmis.
- In other words, the process that determines whether or not a cell is missing should not depend on the values in the cell.
- Formally, M is independent of Dmis: P(M | D) = P(M | Dobs).
- For example, if Democratic identifiers are more likely to refuse to answer the vote choice question, then the process is missing at random so long as party id is a question to which at least some people respond.
- If data are missing at random, then inferences based on listwise deletion will be biased and inefficient.
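The bias under listwise deletion with MAR data can be seen in a short simulation. This is a hypothetical sketch (not from the source), with missingness in the outcome driven entirely by an always-observed covariate:

```python
import random
import statistics

rng = random.Random(1)
n = 10_000
x = [rng.gauss(0.0, 1.0) for _ in range(n)]        # always observed (say, party id)
y = [xi + rng.gauss(0.0, 1.0) for xi in x]         # outcome (say, vote choice)

# MAR: whether y is missing depends only on the observed x,
# not on the unobserved value of y itself
y_obs = [yi for xi, yi in zip(x, y)
         if not (xi > 0 and rng.random() < 0.8)]   # high-x cases often missing

true_mean = statistics.mean(y)     # near 0
lw_mean = statistics.mean(y_obs)   # listwise deletion: biased well below 0
```

Because high-x (and therefore high-y) respondents are deleted disproportionately, the listwise-deletion estimate of the mean of y is pulled downward, even though x itself contains all the information needed to correct the problem via imputation.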
6 Three Types of Missingness
- Non-Ignorable: if the probability that a cell is missing depends on the unobserved value of the missing response, then the process is non-ignorable.
- Formally, P(M | D) cannot be simplified.
- A standard example is individuals' responses to income questions, where high-income people are more likely to refuse to answer survey questions about income and the other variables in the data set cannot predict which respondents have high income.
- If your missing data are non-ignorable, then inferences based on listwise deletion will be biased and inefficient (and the multiple imputation algorithms that we will talk about won't be of much aid).
7 Multiple Imputation of Missing Data
- Multiple imputation involves imputing m values for each missing item and creating m completed data sets.
- Across each of the m data sets:
  - If Mij = 1, then Dij = the observed value.
  - If Mij = 0, then Dij = an imputed value.
- The imputed values for the data set are based on our guesses about the value of Dij together with summaries of our uncertainty regarding the missing values.
- For each imputed data set, you perform whatever statistical analysis you normally would. Then, you average your results over the m completed analyses.
- According to King, et al., about 5 or 10 imputed data sets is often satisfactory.
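The analyze-then-average workflow above can be sketched as follows. This is a minimal illustration, not AMELIA's actual algorithm; the function names and the stand-in "analysis" (a sample mean) are mine:

```python
import statistics

def analyze(dataset):
    # stand-in for "whatever statistical analysis you normally would";
    # here it simply returns the sample mean
    return statistics.mean(dataset)

def mi_estimate(completed_datasets):
    # run the analysis once per completed data set, then average the m results
    estimates = [analyze(d) for d in completed_datasets]
    return statistics.mean(estimates)

# three completed data sets that differ only in the imputed third value
q_bar = mi_estimate([[1, 2, 3], [1, 2, 4], [1, 2, 5]])
```

The spread of the m estimates around their average is what feeds the between-imputation variance term in the combining formula on the next slide.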
8 Calculating Quantities of Interest: Application of King's Technique
- To estimate some quantity of interest Q, such as a variable mean or regression coefficient, the overall point estimate q̄ of Q is the average of the m separate estimates qj.
- That is, q̄ = (1/m) Σj=1 to m qj.
- Let SE(qj) denote the standard error of qj for data set j and let
- S²q = Σj=1 to m (qj − q̄)² / (m − 1) denote the sample variance across the m point estimates.
- Then the variance of the multiple imputation point estimate is the weighted average of the estimated variances from within each completed data set plus the sample variance in the point estimates across the data sets.
- The weight is a function of the number of data sets, so that if m → ∞, then it would be a straightforward average of the two sources of uncertainty.
- That is, SE(q̄)² = (1/m) Σj=1 to m SE(qj)² + S²q (1 + 1/m).
- If this is implemented in WinBugs, this standard error calculation won't be needed.
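The combining formulas above are straightforward to code. A minimal sketch (the function name `combine_mi` is mine):

```python
import math
import statistics

def combine_mi(estimates, ses):
    # estimates: the m point estimates q_j; ses: their standard errors SE(q_j)
    m = len(estimates)
    q_bar = statistics.mean(estimates)
    within = statistics.mean(se ** 2 for se in ses)   # (1/m) sum of SE(q_j)^2
    between = statistics.variance(estimates)          # S_q^2, uses m - 1 divisor
    total_var = within + (1 + 1 / m) * between        # SE(q_bar)^2
    return q_bar, math.sqrt(total_var)

# five imputed-data-set estimates of a coefficient, each with SE 0.3
q, se = combine_mi([1.0, 1.2, 0.8, 1.1, 0.9], [0.3] * 5)
```

Note that even when the within-data-set standard errors are identical, the combined standard error exceeds them because of the between-imputation variance term.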
9 King, et al.'s imputation model
- To implement multiple imputation, we need a statistical model that we can use to sample missing values.
- To use King's AMELIA program (and our most extensive WinBugs alternative), we assume that the data are missing at random conditional on the imputation model (where the model is defined to include the variables that provide information about the missing data process).
- King's missing data program is based on the assumption that all of the variables in our model jointly follow a multivariate normal density.
- King argues that in most cases, the multivariate normal density is a very good approximation, even if some of our variables are ordinal.
10 King, et al.'s imputation model
- Stated more formally, King et al assume that for each observation i (i = 1, …, n), Di denotes the vector of values for the p variables, including the dependent and independent variables.
- The likelihood function for the complete data set is given by
- L(μ, Σ | D) ∝ Πi N(Di | μ, Σ),
- where μ is the vector of p means and Σ is the p × p dimensional variance-covariance matrix that provides information about how values of the independent variables depend on one another.
- If we assume that the data are missing at random, then the likelihood function for the observed data is given by
- L(μ, Σ | Dobs) ∝ Πi N(Di,obs | μi,obs, Σi,obs).
- King et al note that this will be a tough likelihood to work with because each observation is likely to have a different combination of missing values.
11 King, et al.'s imputation model
- The multivariate normal model implies that each missing value is imputed linearly. That is, we can simulate a missing value in the way that we would usually simulate from a regression.
- For example, let D̃ij denote a simulated value of observation i and variable j, and let Di,−j denote the vector of values of all observed variables in row i except j.
- Then the posterior distribution of the coefficients β−j from a regression of Dj on D−j (which can be calculated from μ and Σ) can be used such that D̃ij = Di,−j β−j + ei.
- An alternative way to think about this is to imagine a multivariate normal distribution. An imputed draw from the multivariate normal distribution is a draw from the slice of the distribution for Dmis corresponding to the value of Dobs.
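In the bivariate case, the "slice" has a closed form that makes the regression interpretation explicit. A sketch (the function name is mine) computing the conditional distribution of the missing coordinate from the joint mean and covariance:

```python
def conditional_normal(mu, cov, x1):
    """Mean and variance of the missing coordinate X2 given the observed
    X1 = x1, for a bivariate normal with mean mu and covariance cov.
    The conditional mean is exactly a regression prediction."""
    mu1, mu2 = mu
    s11, s12 = cov[0]
    _, s22 = cov[1]
    beta = s12 / s11                     # slope of the regression of X2 on X1
    cond_mean = mu2 + beta * (x1 - mu1)  # the "slice" is centered here
    cond_var = s22 - s12 ** 2 / s11      # residual variance of that regression
    return cond_mean, cond_var

# with unit variances, correlation 0.8, and observed x1 = 1,
# the imputation draw comes from N(0.8, 0.36)
m, v = conditional_normal((0.0, 0.0), [[1.0, 0.8], [0.8, 1.0]], 1.0)
```

An imputed value is then a draw from N(cond_mean, cond_var), which adds back the residual uncertainty that a deterministic best guess would omit.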
12 Missing Data Algorithms
- Gary King at Harvard has a very easy to use program called AMELIA that creates data sets with imputed missing values.
- The program is designed to read in a raw data set with missing values and output m new data sets with imputed missing values.
- You then run your analyses on the m imputed data sets, take averages of coefficients and standard errors as discussed above, and that is it.
- Software packages (including, I think, SPSS) increasingly have built-in algorithms that perform this sort of imputation.
- Rather than using a small number of multiply imputed data sets, one might incorporate a multiple imputation algorithm into your Gibbs sampler, taking an independent draw from the multiple imputation data set with each iteration of your program.
- The advantage of this is that the posterior standard errors for regression coefficients already summarize all of the uncertainty about the process for the missing data and the uncertainty about the coefficients themselves.
- The disadvantage is that the imputations will be much slower and you will have to check for convergence. Furthermore, WinBugs won't let you do this properly except for dependent variables!
13 Handling Missing Data in WinBugs
- As you are now aware, WinBugs is an incredibly flexible, easy to use program with a great user-friendly interface. Models for missing data are as easy to implement as almost anything else.
- To enter a data point as missing in WinBugs, enter an NA in the appropriate matrix cell.
- The trick to working with missing data in WinBugs is that the program is going to treat all of the missing elements of the data matrix as if they were unknown model parameters.
- All that you have to do is assign a reasonable probability model to those parameters and the built-in sampling algorithms will handle the rest.
- If you don't assign a reasonable probability model to those parameters, the program won't run. It is as if you are trying to run a regression model without defining prior values for your regression coefficients (your probability model).
14 WinBugs implementation for a missing dependent variable
- If you have a missing dependent variable, then you don't need to do anything different to get WinBugs to model the missing data.
- That is, we assume that yi ~ N(μi, τ),
- where μi = b1 + b2 X2i + … + bp Xpi,
- bj ~ N(0, .001) and τ ~ Gamma(.001, .001).
- If the data are missing at random or missing completely at random, then the explanatory variables X should provide all of the information that we need to make imputations for the missing values yi.
15 WinBugs implementation for data with a missing dependent variable
model {
  for (i in 1:1000) {
    Y2[i] ~ dnorm(mu[i], tau)
    mu[i] <- b[1] + b[2]*X1[i] + b[3]*X2[i]
  }
  for (j in 1:3) { b[j] ~ dnorm(0, .001) }
  tau ~ dgamma(.001, .001)
  temp <- mean(Y2[])
}
list(tau = 1)
16 Posterior summaries of imputed values for Y
node     mean    sd      MC error  2.5%     median  97.5%
Y2[5]    1.009   1.067   0.03682   -0.9771  1.043   3.201
Y2[11]   0.8066  0.9817  0.0388    -1.167   0.7347  2.855
Y2[12]   0.7601  1.056   0.04715   -1.255   0.8032  2.733
Y2[15]   0.9014  1.066   0.04134   -1.162   0.8827  3.093
Y2[21]   1.513   1.083   0.05024   -0.6363  1.524   3.608
Y2[23]   0.9331  1.013   0.05287   -1.169   0.9949  2.83
Y2[25]   0.6762  1.016   0.0466    -1.232   0.6886  2.681
These are posterior simulations of the missing
data points. They are monitored by setting
WinBugs to monitor (in this case) Y2. Be sure to
check the standard convergence diagnostics, etc.
You will obviously also want information about
the regression coefficients, etc.
17 Bayesian implementation of missing independent variables
- Bayesian Best-Guess Methodology
- Social scientists, especially survey researchers, often use their best guess about the value of missing data points. For example, they may assume that an individual who responds "don't know" to a question about partisan identification is an independent.
- If our data are missing completely at random, then we can use a Bayesian analogue to best-guess methodology that doesn't lead to biased standard errors, except insofar as any Bayesian approach that incorporates prior information necessarily creates bias.
- The intuition behind this approach is that, as Bayesians, we must treat missing data as unknown parameters.
- So, as for any unknown parameter, we assign prior beliefs for the distribution of the missing data point, but must assume that the prior is independent of the data. We then take multiple imputations from the distribution of our prior beliefs.
- This can be built into our sampling algorithm, but unfortunately not into WinBugs, because WinBugs will want to update the prior based on the data.
18 Bayesian Best-Guess Imputations
- Suppose that yi ~ N(μi, τ) and there is no missingness in the vector Y,
- where μi = b1 + b2 X2i + … + bp Xpi,
- bj ~ N(0, .001) and τ ~ Gamma(.001, .001).
- However, there does exist some missing data for the vectors of covariates X2i and X3i. In this case, we can assign prior distributions to each of the missing covariates.
- For example, suppose X2 represented data from the 7-point party identification scale and we believed that on average the missing data points were independents, but there was still a small amount of uncertainty about the values of X2. In this case, we might use a prior distribution for X2 with mean 4 and precision 2.
- Thus, X2i,mis ~ N(4, 2).
- Suppose further that X3 represented respondent placements of a candidate on the 100-point feeling thermometer scale. In this case, we might use a diffuse prior representing real ignorance about the likely values.
- Thus, X3i,mis ~ N(50, .05).
19 WinBugs Implementation of Best-Guess Methodology with missing data for X
WinBugs will not allow you to implement Bayesian Best-Guess unless you are more clever than I am and can defeat it. The problem is that if you specify a prior for X like I do to the right, then WinBugs treats X as a parameter to be estimated in the model. In this case, the prior for X is treated like the prior distribution for something like a regression coefficient, so that if X[1,1] is missing, X[1,1] is chosen to minimize the error term just like the b parameters.
20 WinBugs implementation for missing variables
- Standard Multiple Imputation Approaches
- For this method, we assume that all of the data are jointly distributed multivariate normal with some unknown mean (denoted g) and covariance (denoted G).
- We then estimate the mean and covariance parameters and simulate missing values from the multivariate normal distribution per King's method, or we can just use the draws from the posterior means and covariances for our final analyses directly.
21 WinBugs implementation of Multiple Imputation
model {
  for (i in 1:1000) {
    X[i, 1:3] ~ dmnorm(g[], G[,])
  }
  # assign a prior distribution for the mean parameters for X
  g[1:3] ~ dmnorm(g0[], gv[,])
  g0[1] <- 0   g0[2] <- 0   g0[3] <- 0
  gv[1,1] <- .1   gv[1,2] <- 0    gv[1,3] <- 0
  gv[2,1] <- 0    gv[2,2] <- .1   gv[2,3] <- 0
  gv[3,1] <- 0    gv[3,2] <- 0    gv[3,3] <- .1
  # assign a prior distribution for the precision parameters for X
  G[1:3, 1:3] ~ dwish(Omega1[,], 3)
  VarCov[1:3, 1:3] <- inverse(G[,])   # monitor this node to get the inverse of the precision matrix
  Omega1[1,1] <- .01   Omega1[1,2] <- 0   Omega1[1,3] <- 0
}
X[,1]        X[,2]        X[,3]
NA           0.451768816  0.504519352
0.708124967  NA           2.689245921
NA           NA           0.523380701
0.539471223  0.992485265  3.316158563
0.294455879  0.649040019  3.729984792
etc.
22 WinBugs implementation of missing data models
- You can use independent draws from the Coda chains for the missing observations or from the posterior distributions for g and G for multiply imputed data sets.
- If you go the latter route, draws from the posterior distribution of g and G can be used within a WinBugs program to estimate probability models based on the concept of direct sampling from the posterior.
- We went over these sorts of routines when we covered things like normal regression models with conjugate (and improper) priors.