Imputation of numerical data under linear edit restrictions

1
Imputation of numerical data under linear edit
restrictions

Ton de Waal

2
Contents

Edit restrictions
Imputation and edit restrictions
Current approach: adjustment of imputed values
Alternative approaches
Our algorithm
Outline
Fourier-Motzkin elimination
Statistical distribution
Example

3
Edit restrictions

Used to define consistent data
Examples
T = P + C
P ≤ 0.5T
In general, linear numerical edit restrictions
can be written as Ax ≤ b
Defines a feasible region of allowed values
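
As a minimal sketch of what "consistent data" means here, the snippet below checks a record against the two example edits, read as T = P + C and P ≤ 0.5T. The tuple representation, the name `violated_edits` and the tolerance are assumptions made for illustration; they do not come from the presentation.

```python
# Each edit is (coefficients, operator, constant), meaning
# sum(coef[v] * record[v]) <operator> constant.
# This representation is an illustrative assumption.
EDITS = [
    ({"T": 1.0, "P": -1.0, "C": -1.0}, "==", 0.0),  # T = P + C
    ({"P": 1.0, "T": -0.5}, "<=", 0.0),             # P <= 0.5T
]


def violated_edits(record, edits, tol=1e-9):
    """Return the indices of the edits that the record fails."""
    failed = []
    for i, (coefs, op, const) in enumerate(edits):
        lhs = sum(c * record[v] for v, c in coefs.items())
        if op == "==":
            ok = abs(lhs - const) <= tol
        else:  # "<="
            ok = lhs <= const + tol
        if not ok:
            failed.append(i)
    return failed
```

A record is inside the feasible region exactly when the returned list is empty.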

4
Why edit restrictions?

Statistical institutes have a responsibility to
supply undisputed data for many different users
in society.
For most users, inconsistent data are
incomprehensible. They may reject the data as an
invalid source or make adjustments themselves.
For simplicity we ensure consistency during the edit
and imputation phase rather than during the
estimation phase

5
Imputation and edit restrictions

Imputation:
replacement of missing values with values
representing a statistical distribution
Imputation under edit restrictions:
replacement of missing values with values
representing a statistical distribution while
simultaneously satisfying edit restrictions

6
Current approach

Standard approach at Statistics Netherlands
Impute first without taking edit restrictions
into account
Adjust imputed values so data satisfy the edit
restrictions
Adjustment of imputed values to satisfy the edit
restrictions is done in such a way that the
adjustments are as small as possible

7
Adjustment of imputed data (1/2)

Minimise the distance between the imputed record
(x1, ..., xn) and the adjusted record (y1, ..., yn) under the
constraint that the adjusted record satisfies all
edit restrictions
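
The adjustment step can be read as a constrained optimisation problem. The transcript does not say which distance function is used, so the weighted absolute-difference form below, with weights w_i, is only an assumed example. The edit restrictions are written as Ay ≤ b, and only imputed values are meant to change, so y_i = x_i for observed values.

```latex
\min_{y_1,\dots,y_n} \; \sum_{i=1}^{n} w_i \,\lvert y_i - x_i \rvert
\quad \text{subject to} \quad A y \le b
```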

8
Adjustment of imputed data (2/2)

9
Problem with current approach

Adjustment of imputed values leads to a record on
the boundary of the feasible region for the
variables to be imputed
An approach that leads to records inside the
feasible region for the variables to be imputed
would be preferred

10
Alternative approaches (1/2)

Use truncated multivariate normal distribution
with support on the feasible region of the
variables to be imputed
Disadvantage
Truncated multivariate normal distribution is
complicated
Even determining the mean is complex

11
Alternative approaches (2/2)

Partially incomplete MCMC
Separate regression imputation model for each
variable to be imputed
Iteratively impute all variables until
convergence to joint distribution
For each variable to be imputed the edit
restrictions reduce to a feasible interval
Disadvantages
For each variable to be imputed a separate
regression model has to be specified and
estimated
Joint distribution may not exist

12
Our approach

Estimate the model parameters, e.g. by means of
the EM algorithm
Repeat the following steps for each variable i to
be imputed
Fill in the observed and already imputed values in
the edit restrictions
Use Fourier-Motzkin elimination to eliminate the other
missing variables, which gives the edit restrictions
for variable i
Draw values for variable i from the conditional
distribution given all known values (either
observed or imputed) until a drawn value satisfies
the edit restrictions

13
Handling edits: Fourier-Motzkin elimination

Given a set of linear constraints, Fourier-Motzkin
elimination can be used to determine the constraints
implied for a subset of the variables
If the constraints for a subset can be satisfied,
the constraints for the entire set of variables
can also be satisfied
In our case the edit restrictions are the
constraints

14
Fourier-Motzkin elimination example (1/2)

Suppose 3 edit restrictions are given
X ≤ Y
Y ≤ 5X
Y ≤ Z
Eliminating Y leads to
X ≤ 5X
X ≤ Z

15
Fourier-Motzkin elimination example (2/2)

Conversely, given the edit restrictions
X ≤ 5X
X ≤ Z
we have X ≤ min(5X, Z)
Hence a value Y exists such that
X ≤ Y ≤ min(5X, Z)
That is, a value Y exists such that
X ≤ Y
Y ≤ 5X
Y ≤ Z
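
A single elimination step can be sketched in a few lines of Python. This is an illustrative implementation for inequalities of the form sum(coef * x) ≤ const only; the function name `eliminate` and the (coefficients, constant) representation are assumptions, and it is not the software mentioned in the presentation. It pairs every lower bound on the eliminated variable with every upper bound, and is applied here to the example read as X ≤ Y, Y ≤ 5X, Y ≤ Z.

```python
from fractions import Fraction


def eliminate(ineqs, var):
    """One Fourier-Motzkin step: project `var` out of a list of
    inequalities (coefs, const), each meaning sum(coefs[v] * v) <= const."""
    lower, upper, rest = [], [], []
    for coefs, const in ineqs:
        a = Fraction(coefs.get(var, 0))
        if a > 0:
            upper.append((coefs, const, a))   # bounds var from above
        elif a < 0:
            lower.append((coefs, const, a))   # bounds var from below
        else:
            rest.append((coefs, const))       # does not involve var
    for lc, lconst, la in lower:
        for uc, uconst, ua in upper:
            # Scale both rows so var has coefficient -1 and +1, then add them.
            combined = {}
            for v in set(lc) | set(uc):
                if v == var:
                    continue
                c = Fraction(lc.get(v, 0)) / -la + Fraction(uc.get(v, 0)) / ua
                if c != 0:
                    combined[v] = c
            rest.append((combined, Fraction(lconst) / -la + Fraction(uconst) / ua))
    return rest


EDITS_XYZ = [
    ({"X": 1, "Y": -1}, 0),   # X <= Y
    ({"Y": 1, "X": -5}, 0),   # Y <= 5X
    ({"Y": 1, "Z": -1}, 0),   # Y <= Z
]
projected = eliminate(EDITS_XYZ, "Y")
```

Here `projected` holds -4X ≤ 0, which is X ≤ 5X rewritten, and X - Z ≤ 0, which is X ≤ Z. The number of inequalities can grow quickly when many variables are eliminated, so a practical implementation would also remove redundant rows.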

16
The statistical distribution

For simplicity we assume the data to be
approximately multivariate normally distributed
All conditional distributions are hence also
approximately (multivariate) normally distributed
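
For reference, the standard conditional distribution of a multivariate normal is given below. The partition into a missing part x_m and a known part x_k (observed or already imputed), and the corresponding blocks of the mean vector μ and covariance matrix Σ, are notation introduced here for illustration; the slides do not fix a notation.

```latex
x_m \mid x_k \;\sim\; N\!\left(
  \mu_m + \Sigma_{mk}\,\Sigma_{kk}^{-1}\,(x_k - \mu_k),\;
  \Sigma_{mm} - \Sigma_{mk}\,\Sigma_{kk}^{-1}\,\Sigma_{km}
\right)
```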

17
Example of our algorithm (1/6)

Edit restrictions given by
T = P + C
P ≤ 0.5T
-0.1T ≤ P
T ≥ 0
T ≤ 550N
N = 5 (observed value)
T, P and C are missing

18
Example of our algorithm (2/6)

Fill in the observed value for N into the edit
restrictions
This leads to the following edit restrictions
for T, C and P
T = P + C
P ≤ 0.5T
-0.1T ≤ P
T ≥ 0
T ≤ 2750

19
Example of our algorithm (3/6)

Eliminate P (substitute P = T - C)
Edit restrictions for T and C
T - C ≤ 0.5T
-0.1T ≤ T - C
T ≥ 0
T ≤ 2750

20
Example of our algorithm (3/6)

Eliminate P
Edit restrictions for T and C
0.5T ≤ C
C ≤ 1.1T
T ≥ 0
T ≤ 2750

21
Example of our algorithm (4/6)

Eliminate C
Edit restrictions for T
0.5T ≤ 1.1T
T ≥ 0
T ≤ 2750
Now we draw values for T from the distribution for T
given the observed value of N until a value satisfies
the edit restrictions, say T = 1200

22
Example of our algorithm (5/6)

We consider the edit restrictions for T and C
0.5T ≤ C
C ≤ 1.1T
T ≥ 0
T ≤ 2750
Fill in the imputed value for T (1200)
600 ≤ C
C ≤ 1320
Draw values for C from the distribution for C given
the observed or imputed values for N and T until the
edit restrictions are satisfied, say C = 700
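
The bounds 600 ≤ C ≤ 1320 can be reproduced mechanically. The sketch below assumes the edits for T and C are stored as inequalities sum(coef * x) ≤ const; the helper name `feasible_interval` is hypothetical, and it expects every variable other than the target to have a known value.

```python
from fractions import Fraction
from math import inf


def feasible_interval(ineqs, var, known):
    """Interval for `var` implied by inequalities (coefs, const), each meaning
    sum(coefs[v] * v) <= const, once all other variables are known."""
    lo, hi = -inf, inf
    for coefs, const in ineqs:
        a = coefs.get(var, 0)
        # Move the known terms to the right-hand side.
        rhs = const - sum(c * known[v] for v, c in coefs.items() if v != var)
        if a > 0:
            hi = min(hi, rhs / a)   # a * var <= rhs gives an upper bound
        elif a < 0:
            lo = max(lo, rhs / a)   # dividing by a negative flips the sign
    return lo, hi


# Edits for T and C after eliminating P (exact fractions avoid rounding).
EDITS_TC = [
    ({"T": Fraction(1, 2), "C": -1}, 0),     # 0.5T <= C
    ({"C": 1, "T": Fraction(-11, 10)}, 0),   # C <= 1.1T
    ({"T": -1}, 0),                          # T >= 0
    ({"T": 1}, 2750),                        # T <= 2750
]
c_bounds = feasible_interval(EDITS_TC, "C", {"T": 1200})
```

With T = 1200 filled in, `c_bounds` is the interval from 600 to 1320, matching the slide.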

23
Example of our algorithm (6/6)

We consider the edit restrictions for T, C and P
T = P + C
P ≤ 0.5T
-0.1T ≤ P
T ≥ 0
T ≤ 2750
Fill in the imputed values for T (1200) and C (700)
1200 = P + 700
P ≤ 600
-120 ≤ P
We impute the only allowed value for P: P = 500
Imputed record: T = 1200, C = 700, P = 500, N = 5
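
The whole example can be sketched as sequential rejection sampling. The intervals below are the ones derived on the slides; the means and standard deviations passed to the normal draws are hypothetical placeholders, not estimates from any data and not the conditional distributions an EM fit would give. The function names are also illustrative.

```python
import random


def draw_until_feasible(mean, sd, lower, upper, rng, max_tries=10_000):
    """Redraw from N(mean, sd) until the value lies in [lower, upper]."""
    for _ in range(max_tries):
        value = rng.gauss(mean, sd)
        if lower <= value <= upper:
            return value
    raise RuntimeError("no feasible draw; check the interval and the model")


def impute_record(n_observed, rng):
    """Impute T, C and P for a record in which only N is observed."""
    # Step 1: T must satisfy 0 <= T <= 550N.
    # The mean and sd are HYPOTHETICAL, chosen only for illustration.
    t = draw_until_feasible(250 * n_observed, 400, 0, 550 * n_observed, rng)
    # Step 2: given T, C must satisfy 0.5T <= C <= 1.1T (hypothetical model).
    c = draw_until_feasible(0.8 * t, 0.2 * t, 0.5 * t, 1.1 * t, rng)
    # Step 3: the balance edit T = P + C leaves a single allowed value for P.
    p = t - c
    return {"N": n_observed, "T": t, "C": c, "P": p}


record = impute_record(5, random.Random(0))
```

Because each draw is accepted only inside the interval obtained by elimination, the final record satisfies all edits and the last variable never runs out of feasible values.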

24
Current status of research

Software has been developed and tested
Currently, we are carrying out evaluation
experiments
Our evaluation results will be compared to the
current approach at Statistics Netherlands
(imputation followed by adjustment of imputed
values)
