Missing Data Models
Provided by: jeffgry

Transcript and Presenter's Notes
1
Missing Data Models
  • Introduction to missing data models
  • Types of missing data
  • Corrections for missing data in theory
  • Missing data in practice

2
Introduction to Missing Data
Notes taken from King, et al (2001)
  • According to King, et al, on average about half
    of the respondents to surveys in political
    science research do not answer one or more of the
    questions used in the analysis.
  • Almost all analysts contaminate their data by
    filling in educated guesses for some of these
    questions (e.g., for party identification
    questions, coding "don't know" as independent).
  • Even if these guesses are correct on average,
    filling in missing cells of our data matrix in
    this way biases our regression coefficients'
    standard errors downwards.
  • When educated guesses are not possible, the
    standard remedy is listwise deletion of missing
    data, eliminating entire observations in a
    wholesale manner.
  • Valuable information is lost, and severe
    selection bias effects are possible.

3
Some notation
  • Let D denote the data matrix, where D includes
    independent and dependent variables: D = {X, Y}.
  • We assume that some elements of the data matrix
    are missing.
  • Let M denote the missingness indicator matrix,
    with the same dimensions as D. Each element of M
    is a one or zero that indicates whether or not an
    element of D is missing.
  • Mij = 0 indicates that the ith observation for
    the jth variable is missing, but that the data
    could be observed.
  • Mij = 1 means that piece of data is present.
  • Comment: it is possible that data cannot be
    observed. Sometimes a "don't know" really means
    that the respondent has no basis on which to
    provide an answer.
  • Finally, let Dobs and Dmis denote the observed
    and missing parts of D.
  • D = {Dobs, Dmis}.

4
Three Types of Missingness
  • Missing Completely at Random if the data are
    missing completely at random then missing values
    cannot be predicted any better with the
    information in D, observed or not.
  • Formally, M is independent of D, so P(M | D) =
    P(M).
  • A process is missing completely at random if,
    say, an individual decides whether to answer
    survey questions on the basis of coin flips.
  • If independents are more likely to decline to
    answer a vote preference or party id question,
    then the data are not missing completely at
    random.
  • In the unlikely event that the process is missing
    completely at random, then inferences based on
    listwise deletion are unbiased, but inefficient
    because we have lost some cases.
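The claim on this slide can be checked with a small simulation: when missingness is literally a coin flip, dropping incomplete cases leaves the estimate unbiased, only noisier. The sketch below is an illustration in Python, not anything from King, et al; the true slope of 2 and all variable names are invented for the example.

```python
import random

def ols_slope(xs, ys):
    """Slope of a simple least-squares regression of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

rng = random.Random(42)
# Hypothetical true model: y = 2x + noise
xs = [rng.gauss(0, 1) for _ in range(5000)]
ys = [2 * x + rng.gauss(0, 1) for x in xs]

# MCAR: each respondent flips a coin to decide whether to answer y
kept = [(x, y) for x, y in zip(xs, ys) if rng.random() < 0.5]

full_slope = ols_slope(xs, ys)
mcar_slope = ols_slope([x for x, _ in kept], [y for _, y in kept])
# Both estimates recover the true slope; the listwise-deletion
# estimate is merely noisier because roughly half the cases are lost.
```

Under MAR or non-ignorable missingness the second estimate would drift away from the truth, which is the distinction the next two slides draw.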

5
Three Types of Missingness
  • Missing at Random if the data are missing at
    random then the probability that a cell is
    missing may depend on Dobs, but after controlling
    for Dobs that probability must be independent of
    Dmis.
  • In other words, the process that determines
    whether or not a cell is missing should not
    depend on the values in the cell.
  • Formally, M is independent of Dmis: P(M | D) =
    P(M | Dobs).
  • For example, if Democratic identifiers are more
    likely to refuse to answer the vote choice
    question, then the process is missing at random
    so long as party id is a question to which at
    least some people respond.
  • If the data are missing at random, then
    inferences based on listwise deletion will be
    biased and inefficient.

6
Three Types of Missingness
  • Non-Ignorable if the probability that a cell is
    missing depends on the unobserved value of the
    missing response, then the process is
    non-ignorable.
  • Formally, P(M | D) cannot be simplified.
  • A standard example is individuals' responses to
    income questions, where high income people are
    more likely to refuse to answer survey questions
    about income and other variables in the data set
    cannot predict which respondents have high
    income.
  • If your missing data is non-ignorable, then
    inferences based on listwise deletion will be
    biased and inefficient (and the multiple imputation
    algorithms that we will talk about won't be of
    much aid).

7
Multiple Imputation of Missing Data
  • Multiple imputation involves imputing m values
    for each missing item and creating m completed
    data sets.
  • Across each of the m data sets
  • If Mij = 1, then Dij = the observed value.
  • If Mij = 0, then Dij = an imputed value.
  • The imputed values for the data set are based on
    our guesses about the value of Dij together with
    summaries of our uncertainty regarding the
    missing values.
  • For each imputed data set, you perform whatever
    statistical analysis you normally would. Then,
    you average your results over the m computed
    analyses.
  • According to King, et al, about 5 or 10 imputed
    data sets is often satisfactory.

8
Calculating Quantities of Interest
Application of King's Technique
  • To estimate some quantity of interest Q, such as a
    variable mean or regression coefficient, the
    overall point estimate q̄ of Q is the average of
    the m separate estimates qj.
  • That is, q̄ = Σ_{j=1}^{m} qj / m.
  • Let SE(qj) denote the standard error of qj for
    data set j and let
  • S²q = Σ_{j=1}^{m} (qj − q̄)² / (m − 1) denote the
    sample variance across the m point estimates.
  • Then the variance of the multiple imputation
    point estimate is the weighted average of the
    estimated variances from within each completed
    data set plus the sample variance in the point
    estimates across the data sets.
  • The weight is a function of the number of data
    sets, so that if m → ∞, then it would be a
    straightforward sum of the two sources of
    uncertainty.
  • That is, SE(q̄)² = (1/m) Σ_{j=1}^{m} SE(qj)² +
    S²q (1 + 1/m).
  • If the model is implemented in WinBugs, this
    standard error calculation won't be needed.
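The two combining formulas above fit in a few lines of Python. This is a hypothetical helper for illustration (the function name and the example numbers are invented); it directly implements the point-estimate average and the within-plus-between variance formula from this slide.

```python
import math

def combine_estimates(estimates, std_errors):
    """Combine m point estimates qj and standard errors SE(qj) from
    m imputed data sets using the formulas above."""
    m = len(estimates)
    q_bar = sum(estimates) / m                                    # overall point estimate
    within = sum(se ** 2 for se in std_errors) / m                # (1/m) Σ SE(qj)²
    between = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)  # S²q
    return q_bar, math.sqrt(within + (1 + 1 / m) * between)

# Five imputed-data-set estimates of one regression coefficient
q, se = combine_estimates([1.0, 1.2, 0.9, 1.1, 1.05],
                          [0.30, 0.28, 0.31, 0.29, 0.30])
```

Note that the combined standard error always exceeds the average within-data-set standard error, because the between-imputation term adds the uncertainty due to the missing data itself.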

9
King et al.'s imputation model
  • To implement multiple imputation, we need a
    statistical model that we can use to sample
    missing values.
  • To use King's AMELIA program (and our more
    extensive WinBugs alternative), we assume that
    the data are missing at random conditional on the
    imputation model (where the model is defined to
    include the variables that provide information
    about the missing data process).
  • King's missing data program is based on the
    assumption that all of the variables in our model
    are jointly distributed multivariate normal.
  • King argues that in most cases, the
    multivariate normal density is a very good
    approximation, even if some of our variables are
    ordinal.

10
King et al.'s imputation model
  • Stated more formally, King et al assume that for
    each observation i (i = 1, …, n), Di denotes the
    vector of values for the p variables, including
    the dependent and independent variables.
  • The likelihood function for the complete data set
    is given by
  • L(μ, Σ | D) ∝ Π_i N(Di | μ, Σ)
  • where μ is the vector of p means and Σ is the p x
    p dimensional variance-covariance matrix that
    provides information about how values of the
    independent variables depend on one another.
  • If we assume that the data are missing at random,
    then the likelihood function for the observed
    data is given by
  • L(μ, Σ | Dobs) ∝ Π_i N(Di,obs | μi,obs, Σi,obs)
  • King et al note that this will be a tough
    likelihood to work with because each observation
    is likely to have a different combination of
    missing values.

11
King et al.'s imputation model
  • The multivariate normal model implies that each
    missing value is imputed linearly. That is, we
    can simulate a missing value in the way that we
    would usually simulate from a regression.
  • For example, let D̃ij denote a simulated value of
    observation i and variable j, and let Di,−j denote
    the vector of values of all observed variables
    in row i except j.
  • Then the posterior distribution of the
    coefficient β−j from a regression of Dj on D−j
    (which can be calculated from μ and Σ) can be
    used such that D̃ij = Di,−j β−j + ei.
  • An alternative way to think about this is to
    imagine a multivariate normal distribution. An
    imputed draw from the multivariate normal
    distribution is a draw from the slice of the
    distribution for Dmis corresponding to the value
    of Dobs.
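The "slice of the distribution" idea can be made concrete for two jointly normal variables: the imputation for the missing one is a draw from the conditional normal, i.e. a regression prediction plus residual noise. Below is a minimal pure-Python sketch of that idea under assumed parameter values; the function name and numbers are hypothetical, not King et al.'s code.

```python
import math
import random

def impute_bivariate(mu, Sigma, x_obs, rng):
    """Draw the missing second variable given the observed first one,
    from the conditional distribution of a bivariate normal (mu, Sigma)."""
    mu1, mu2 = mu
    s11, s12, s22 = Sigma[0][0], Sigma[0][1], Sigma[1][1]
    beta = s12 / s11                        # slope of the regression of var 2 on var 1
    cond_mean = mu2 + beta * (x_obs - mu1)  # regression prediction, D_i,-j * B_-j
    cond_var = s22 - beta * s12             # residual variance of that regression
    return cond_mean + math.sqrt(cond_var) * rng.gauss(0.0, 1.0)

rng = random.Random(0)
# Standard bivariate normal with correlation 0.5; we observe x = 1.0
draws = [impute_bivariate((0.0, 0.0), [[1.0, 0.5], [0.5, 1.0]], 1.0, rng)
         for _ in range(20000)]
mean_draw = sum(draws) / len(draws)  # centers on the conditional mean, 0.5
```

With p variables the same logic applies, except that the slope becomes the vector β−j computed from μ and Σ for whichever combination of cells is observed in that row.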

12
Missing Data Algorithms
  • Gary King at Harvard has a very easy to use
    program called AMELIA that creates data sets with
    imputed missing values.
  • The program is designed to read in a raw data set
    with missing values and output m new data sets
    with imputed missing values.
  • You then run your analyses on the m imputed data
    sets, take averages of coefficients and standard
    errors as discussed above, and that is it.
  • Software packages (including I think SPSS)
    increasingly have built in algorithms that
    perform this sort of imputation.
  • Rather than using a small number of multiply
    imputed data sets, one might incorporate a
    multiple imputation algorithm into your Gibbs
    sampler, taking an independent draw from the
    multiple imputation data set with each iteration
    of your program.
  • The advantage of this is that the posterior
    standard errors for regression coefficients
    already summarize all of the uncertainty about
    the process for the missing data and the
    uncertainty about the coefficients themselves.
  • The disadvantage is that the imputations will be
    much slower and you will have to check for
    convergence. Furthermore, WinBugs won't let you
    do this properly except for dependent variables!

13
Handling Missing Data in WinBugs
  • As you are now aware, WinBugs is an incredibly
    flexible, easy to use program with a great
    user-friendly interface. Models for missing data
    are as easy to implement as almost anything else.
  • To enter a data point as missing in WinBugs,
    enter an NA in the appropriate matrix cell.
  • The trick to working with missing data in WinBugs
    is that the program is going to treat all of the
    missing elements of the data matrix as if they
    were unknown model parameters.
  • All that you have to do is assign a reasonable
    probability model to those parameters and the
    built in sampling algorithms will handle the
    rest.
  • If you don't assign a reasonable probability
    model to those parameters, the program won't run.
    It is as if you were trying to run a regression
    model without defining priors (your probability
    model) for your regression coefficients.

14
WinBugs implementation for a missing dependent
variable
  • If you have a missing dependent variable, then
    you don't need to do anything different to get
    WinBugs to model the missing data.
  • That is, we assume that yi ~ N(μi, τ),
  • where μi = b1 + b2X2i + … + bpXpi,
  • bj ~ N(0, .001) and τ ~ Gamma(.001, .001).

If the data are missing at random or missing
completely at random, then the explanatory
variables X should provide all of the information
that we need to make imputations for the missing
values yi.
15
WinBugs Implementation for data with a missing
dependent variable
  model {
    for (i in 1:1000) {
      Y2[i] ~ dnorm(mu[i], tau)
      mu[i] <- b[1] + b[2]*X1[i] + b[3]*X2[i]
    }
    for (j in 1:3) { b[j] ~ dnorm(0, .001) }
    tau ~ dgamma(.001, .001)
    temp <- mean(Y2[])
  }
  list(tau = 1)

16
Posterior summaries of imputed values for Y.
  node    mean    sd      MC error  2.5%     median  97.5%
  Y2[5]   1.009   1.067   0.03682   -0.9771  1.043   3.201
  Y2[11]  0.8066  0.9817  0.0388    -1.167   0.7347  2.855
  Y2[12]  0.7601  1.056   0.04715   -1.255   0.8032  2.733
  Y2[15]  0.9014  1.066   0.04134   -1.162   0.8827  3.093
  Y2[21]  1.513   1.083   0.05024   -0.6363  1.524   3.608
  Y2[23]  0.9331  1.013   0.05287   -1.169   0.9949  2.83
  Y2[25]  0.6762  1.016   0.0466    -1.232   0.6886  2.681

These are posterior simulations of the missing
data points. They are monitored by setting
WinBugs to monitor (in this case) Y2. Be sure to
check the standard convergence diagnostics, etc.
You will obviously also want information about
the regression coefficients, etc.
17
Bayesian implementation of
missing independent variables
  • Bayesian Best-Guess Methodology
  • Social scientists, especially survey
    researchers, often use their best guess about the
    value of missing data points. For example, they may
    assume that an individual who responds "don't know"
    to a question about partisan identification is an
    independent.
  • If our data is missing completely at random, then
    we can use a Bayesian analogue to best-guess
    methodology that doesn't lead to biased standard
    errors, except insofar as any Bayesian approach
    that incorporates prior information necessarily
    creates bias.
  • The intuition behind this approach is that as
    Bayesians, we must treat missing data as unknown
    parameters.
  • So, like for any unknown parameter, we assign
    prior beliefs for the distribution of the missing
    data point, but must assume that the prior is
    independent of the data. We then take multiple
    imputations from the distribution of our prior
    beliefs.
  • This can be built into our sampling algorithm,
    but unfortunately not into WinBugs, because
    WinBugs will want to update the prior based on
    the data.

18
Bayesian Best-Guess Imputations
  • Suppose that yi ~ N(μi, τ) and there is no
    missingness in the vector Y,
  • where μi = b1 + b2X2i + … + bpXpi,
  • bj ~ N(0, .001) and τ ~ Gamma(.001, .001).
  • However, there does exist some missing data for
    the vectors of covariates X2i and X3i. In this
    case, we can assign prior distributions to each
    of the missing covariates.
  • For example, suppose X2 represented data from the
    7-point party identification scale and we
    believed that on average the missing data points
    were independents, but there was still a small
    amount of uncertainty about the values of X2. In
    this case, we might use a prior distribution for
    X2 with mean 4 and precision 2.
  • Thus, Xmis,2i ~ N(4, 2).
  • Suppose further that X3 represented respondent
    placements of a candidate on the 100-point feeling
    thermometer scale. In this case, we might use a
    diffuse prior representing real ignorance about
    the likely values.
  • Thus, X3i,mis ~ N(50, .05).

19
WinBugs Implementation of Best-Guess Methodology
with missing data for X
WinBugs will not allow you to implement Bayesian
Best-Guess unless you are more clever than I am
and can defeat it. The problem is that if you
specify a prior for X as I do to the right,
then WinBugs treats X as a parameter to be
estimated in the model. In this case, the prior on X
acts like the prior distribution for something
like a regression coefficient, so that if X[1,1] is
missing, X[1,1] is chosen to minimize the error term
just like the b parameters.
20
WinBugs implementation for missing variables.
  • Standard Multiple Imputation Approaches
  • For this method, we assume that all of the data
    are jointly distributed multivariate normal with
    some unknown mean (denoted g) and covariance
    (denoted G).
  • We then estimate the mean and covariance
    parameters, and simulate missing values from the
    multivariate normal distribution per King's
    method, or we can just use the draws from the
    posterior means and covariances for our final
    analyses directly.

21
WinBugs implementation of Multiple Imputation
  model {
    for (i in 1:1000) {
      X[i, 1:3] ~ dmnorm(g[], G[,])
    }
    # assign a prior distribution for the mean parameters for X
    g[1:3] ~ dmnorm(g0[], gv[,])
    g0[1] <- 0     g0[2] <- 0     g0[3] <- 0
    gv[1,1] <- .1  gv[1,2] <- 0   gv[1,3] <- 0
    gv[2,1] <- 0   gv[2,2] <- .1  gv[2,3] <- 0
    gv[3,1] <- 0   gv[3,2] <- 0   gv[3,3] <- .1
    # assign a prior distribution for the precision parameters for X
    G[1:3, 1:3] ~ dwish(Omega[,], 3)
    VarCov[1:3, 1:3] <- inverse(G[,])  # monitor this node to get the inverse of the precision matrix
    Omega[1,1] <- .01  Omega[1,2] <- 0    Omega[1,3] <- 0
    Omega[2,1] <- 0    Omega[2,2] <- .01  Omega[2,3] <- 0
    Omega[3,1] <- 0    Omega[3,2] <- 0    Omega[3,3] <- .01
  }

  X[,1]        X[,2]        X[,3]
  NA           0.451768816  0.504519352
  0.708124967  NA           2.689245921
  NA           NA           0.523380701
  0.539471223  0.992485265  3.316158563
  0.294455879  0.649040019  3.729984792
  etc.
22
WinBugs implementation of missing data models
  • You can use independent draws from the Coda
    chains for the missing observations or from the
    posterior distributions for g and G for multiply
    imputed data sets.
  • If you go the latter route, draws from the
    posterior distribution of g and G can be used
    within a WinBugs program to estimate probability
    models based on the concept of direct sampling
    from the posterior.
  • We went over these sorts of routines when we
    covered things like normal regression models with
    conjugate (and improper) priors.