Imputation of numerical data under linear edit restrictions

1
Imputation of numerical data under linear edit
restrictions
  • Ton de Waal

2
Contents
  • Edit restrictions
  • Imputation and edit restrictions
  • Current approach: adjustment of imputed values
  • Alternative approaches
  • Our algorithm: outline, Fourier-Motzkin
    elimination, statistical distribution, example

3
Edit restrictions
  • Used to define consistent data
  • Examples
  • T = P + C
  • P ≤ 0.5T
  • In general, linear numerical edit restrictions
    can be written as Ax ≤ b
  • Defines a feasible region of allowed values
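
A minimal sketch of checking a record against edits in the form Ax ≤ b, using the two example edits T = P + C and P ≤ 0.5T. The helper name satisfies_edits, the variable order (T, P, C) and the tolerance are assumptions made for illustration; the equality is written as two inequalities.

```python
def satisfies_edits(A, b, x, tol=1e-9):
    """Return True if every row of A x <= b holds for record x (hypothetical helper)."""
    return all(
        sum(a_ij * x_j for a_ij, x_j in zip(row, x)) <= b_i + tol
        for row, b_i in zip(A, b)
    )

# Variable order is (T, P, C); this ordering is an assumption of the sketch.
A = [
    [1.0, -1.0, -1.0],   #  T - P - C <= 0   (first half of T = P + C)
    [-1.0, 1.0, 1.0],    # -T + P + C <= 0   (second half of T = P + C)
    [-0.5, 1.0, 0.0],    #  P - 0.5T  <= 0
]
b = [0.0, 0.0, 0.0]

consistent = satisfies_edits(A, b, [1200.0, 500.0, 700.0])    # inside the feasible region
inconsistent = satisfies_edits(A, b, [1200.0, 700.0, 500.0])  # violates P <= 0.5T
```

The first record lies in the feasible region; the second balances T = P + C but breaks P ≤ 0.5T.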

4
Why edit restrictions?
  • Statistical institutes have a responsibility to
    supply undisputed data for many different users
    in society.
  • For most users, inconsistent data are
    incomprehensible. They may reject data as an
    invalid source or make adjustments themselves.
  • For simplicity we ensure consistency during the
    edit and imputation phase rather than during the
    estimation phase

5
Imputation and edit restrictions
  • Imputation: replacement of missing values with
    values representing a statistical distribution
  • Imputation under edit restrictions: replacement
    of missing values with values representing a
    statistical distribution while simultaneously
    satisfying the edit restrictions

6
Current approach
  • Standard approach at Statistics Netherlands
  • Impute first without taking edit restrictions
    into account
  • Adjust imputed values so data satisfy the edit
    restrictions
  • Adjustment of imputed values to satisfy the edit
    restrictions is done in such a way that the
    adjustments are as small as possible

7
Adjustment of imputed data (1/2)
  • Minimise the distance between the imputed record
    (x1, ..., xn) and the adjusted record (y1, ..., yn)
    under the constraint that the adjusted record
    satisfies all edit restrictions
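
The slide does not name the distance function. One possible formulation, given here purely as an assumption, is a weighted sum of absolute deviations with hypothetical weights w_i ≥ 0, minimised over the adjusted record subject to edits written as Ay ≤ b:

```latex
\min_{y_1,\dots,y_n} \; \sum_{i=1}^{n} w_i \,\lvert y_i - x_i \rvert
\quad \text{subject to} \quad A y \le b
```

With this choice the problem can be solved as a linear program; a squared-distance objective would give a quadratic program instead.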

8
Adjustment of imputed data (2/2)

9
Problem with current approach
  • Adjustment of imputed values leads to a record on
    the boundary of the feasible region for the
    variables to be imputed
  • An approach that leads to records inside the
    feasible region for the variables to be imputed
    would be preferred

10
Alternative approaches (1/2)
  • Use truncated multivariate normal distribution
    with support on the feasible region of the
    variables to be imputed
  • Disadvantage: the truncated multivariate normal
    distribution is complicated; even determining
    its mean is complex

11
Alternative approaches (2/2)
  • Partially incompatible MCMC
  • Separate regression imputation model for each
    variable to be imputed
  • Iteratively impute all variables until
    convergence to joint distribution
  • For each variable to be imputed the edit
    restrictions reduce to a feasible interval
  • Disadvantages
  • For each variable to be imputed a separate
    regression model has to be specified and
    estimated
  • A joint distribution may not exist

12
Our approach
  • Estimate the model parameters, e.g. by means of
    the EM algorithm
  • Repeat the following steps for each variable i to
    be imputed
  • Fill in observed values in edit restrictions
  • Use Fourier-Motzkin elimination to determine edit
    restrictions for variable i
  • Draw value for variable i, using the conditional
    distribution given all known values (either
    observed or imputed) until it satisfies the edit
    restrictions
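
The last step can be sketched as a simple rejection sampler. This is an illustration under stated assumptions, not the implementation referred to in the talk: the function name draw_within is hypothetical, the interval is taken as already derived by Fourier-Motzkin elimination, and the mean and standard deviation are placeholders for the parameters of the conditional distribution of variable i.

```python
import random

def draw_within(lower, upper, mean, sd, rng, max_tries=10_000):
    """Draw from N(mean, sd^2) until the value lies in [lower, upper].

    mean and sd stand in for the conditional distribution of the variable
    given all known values; here they are placeholder numbers.
    """
    for _ in range(max_tries):
        value = rng.gauss(mean, sd)
        if lower <= value <= upper:
            return value
    raise RuntimeError("no draw satisfied the edit restrictions")

rng = random.Random(0)  # fixed seed so the sketch is reproducible
value = draw_within(lower=0.0, upper=2750.0, mean=1000.0, sd=500.0, rng=rng)
```

Rejection sampling is cheap when most of the probability mass lies inside the interval; the max_tries guard is an added safety net for narrow intervals.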

13
Handling edits: Fourier-Motzkin elimination
  • Given a set of linear constraints, Fourier-Motzkin
    elimination can be used to determine the implied
    constraints for a subset of the variables
  • If the constraints for the subset can be
    satisfied, values for the eliminated variables
    exist such that the constraints for the entire
    set of variables are also satisfied
  • In our case the edit restrictions are the
    constraints

14
Fourier-Motzkin elimination: example (1/2)
  • Suppose 3 edit restrictions are given
  • X ≤ Y
  • Y ≤ 5X
  • Y ≤ Z
  • Eliminating Y leads to
  • X ≤ 5X
  • X ≤ Z
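
A small sketch of one elimination step, applied to the three edits X ≤ Y, Y ≤ 5X and Y ≤ Z. The function name eliminate and the row representation (coefficients, bound), meaning sum(coefficients[j] * x[j]) ≤ bound, are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

def eliminate(constraints, k):
    """One Fourier-Motzkin step: remove variable k from rows (coeffs, bound)."""
    pos = [c for c in constraints if c[0][k] > 0]    # rows giving upper bounds on x[k]
    neg = [c for c in constraints if c[0][k] < 0]    # rows giving lower bounds on x[k]
    keep = [c for c in constraints if c[0][k] == 0]  # rows that do not involve x[k]
    for (p, pb), (n, nb) in product(pos, neg):
        # Scale both rows so x[k] has coefficient +1 and -1, then add them.
        sp, sn = Fraction(1) / p[k], Fraction(1) / -n[k]
        coeffs = tuple(sp * pj + sn * nj for pj, nj in zip(p, n))
        keep.append((coeffs, sp * pb + sn * nb))
    return keep

# Variable order (X, Y, Z); each row means coeffs . (X, Y, Z) <= bound.
edits = [
    ((1, -1, 0), 0),    # X - Y  <= 0, i.e. X <= Y
    ((-5, 1, 0), 0),    # Y - 5X <= 0, i.e. Y <= 5X
    ((0, 1, -1), 0),    # Y - Z  <= 0, i.e. Y <= Z
]
reduced = eliminate(edits, k=1)  # eliminate Y
```

The two resulting rows are -4X ≤ 0 (equivalent to X ≤ 5X) and X - Z ≤ 0 (X ≤ Z). Each upper bound on Y is paired with each lower bound, so the number of rows can grow quickly when many variables are eliminated.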

15
Fourier-Motzkin elimination: example (2/2)
  • Conversely, suppose X and Z satisfy the edit
    restrictions
  • X ≤ 5X
  • X ≤ Z
  • Then a value Y exists such that
  • X ≤ Y ≤ min(5X, Z)
  • That is, a value Y exists such that
  • X ≤ Y
  • Y ≤ 5X
  • Y ≤ Z

16
The statistical distribution
  • For simplicity we assume the data to be
    approximately multivariate normally distributed
  • All conditional distributions are hence also
    approximately (multivariate) normally distributed
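
For reference, the standard result behind the second bullet: if x is split into a part x_1 to be imputed and a known part x_2, with correspondingly partitioned mean and covariance matrix, the conditional distribution is again normal. The partition notation below is introduced here for illustration and does not appear on the slides.

```latex
x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}
\sim N\!\left(
\begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix},
\begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}
\right)
\;\Longrightarrow\;
x_1 \mid x_2 \sim N\!\left(
\mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2),\;
\Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}
\right)
```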

17
Example of our algorithm (1/6)
  • Edit restrictions given by
  • T = P + C
  • P ≤ 0.5T
  • -0.1T ≤ P
  • T ≥ 0
  • T ≤ 550N
  • N = 5 (observed value)
  • T, P and C are missing

18
Example of our algorithm (2/6)
  • Fill in the observed value for N into the edit
    restrictions
  • This leads to the following edit restrictions
    for T, C and P
  • T = P + C
  • P ≤ 0.5T
  • -0.1T ≤ P
  • T ≥ 0
  • T ≤ 2750

19
Example of our algorithm (3/6)
  • Eliminate P
  • Edit restrictions for T and C
  • T - C ≤ 0.5T
  • -0.1T ≤ T - C
  • T ≥ 0
  • T ≤ 2750

20
Example of our algorithm (3/6, continued)
  • Eliminate P
  • Edit restrictions for T and C, simplified
  • 0.5T ≤ C
  • C ≤ 1.1T
  • T ≥ 0
  • T ≤ 2750

21
Example of our algorithm (4/6)
  • Eliminate C
  • Edit restrictions for T
  • 0.5T ≤ 1.1T
  • T ≥ 0
  • T ≤ 2750
  • Now we draw values for T from the distribution
    for T given the observed value of N until a value
    satisfies the edit restrictions, say T = 1200

22
Example of our algorithm (5/6)
  • We consider edit restrictions for T and C
  • 0.5T ≤ C
  • C ≤ 1.1T
  • T ≥ 0
  • T ≤ 2750
  • Fill in imputed value for T (1200)
  • 600 ≤ C
  • C ≤ 1320
  • Draw values for C from the distribution for C
    given the observed or imputed values for N and T
    until the edit restrictions are satisfied, say
    C = 700

23
Example of our algorithm (6/6)
  • We consider edit restrictions for T, C and P
  • T = P + C
  • P ≤ 0.5T
  • -0.1T ≤ P
  • T ≥ 0
  • T ≤ 2750
  • Fill in imputed values for T (1200) and C (700)
  • 1200 = P + 700
  • P ≤ 600
  • -120 ≤ P
  • We impute the only allowed value for P: P = 500
  • Imputed record: T = 1200, C = 700, P = 500, N = 5
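
The arithmetic of the worked example can be re-traced in a few lines. This is only a check of the bounds, assuming the edits T = P + C, -0.1T ≤ P ≤ 0.5T and 0 ≤ T ≤ 550N with N = 5; the values T = 1200 and C = 700 are the draws quoted on the slides rather than samples, and the variable names are illustrative.

```python
from fractions import Fraction

N = 5                                   # observed value
half, tenth = Fraction(1, 2), Fraction(1, 10)

# After eliminating P and C: 0 <= T <= 550N.
t_bounds = (Fraction(0), Fraction(550 * N))
T = Fraction(1200)                      # value quoted on the slides, not sampled
assert t_bounds[0] <= T <= t_bounds[1]

# After eliminating P only: 0.5T <= C <= 1.1T.
c_bounds = (half * T, (1 + tenth) * T)
C = Fraction(700)                       # value quoted on the slides, not sampled
assert c_bounds[0] <= C <= c_bounds[1]

# The balance edit T = P + C leaves a single candidate for P,
# which must still respect -0.1T <= P <= 0.5T.
P = T - C
p_bounds = (-tenth * T, half * T)
assert p_bounds[0] <= P <= p_bounds[1]

record = {"T": T, "C": C, "P": P, "N": N}
```

Exact fractions are used so that bounds such as 1.1 × 1200 come out as exactly 1320 rather than a rounded floating-point value.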

24
Current status of research
  • Software has been developed and tested
  • Currently, we are carrying out evaluation
    experiments
  • Our evaluation results will be compared to the
    current approach at Statistics Netherlands
    (imputation followed by adjustment of imputed
    values)