Title: Handling inconsistencies in integrated business data
1 Handling inconsistencies in integrated business data
- Bonn
- 25-27 September 2006
- Jeffrey Hoogland
- Ilona Verburg
2 ESD Integration
- Goals
- Improvement of transparency and quality of business data sources
- Integration of business data for enterprises for FATS, National Accounts, SBS, CEREM (external users such as CPB)
- Improvement of consistency of data sources
- Improvement of usability of business registers to determine reliable aggregates
3 ESD Integration phase 1
- 5 business registers and 3 annual business surveys for 2001-2004
- 6 key variables
- Enterprises with fewer than 100 employees
- Goal: integrated data, consistent at aggregated level (publication cell × size class group) for 2004
- Development of methodology
- methods for filtering outliers in registers
- methods for weighting of incomplete registers
- methods for detecting influential inconsistencies at micro level
- List of causes, consequences and solutions for inconsistencies
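The filtering and weighting methods listed above are not specified on the slide. As a minimal sketch only, the block below assumes a median/MAD rule for outlier filtering and a simple expansion weight per stratum (population count divided by the number of register records). The function names, the cut-off `k` and the record layout are hypothetical and are not taken from the presentation.

```python
from collections import defaultdict
from statistics import median


def filter_outliers(values, k=5.0):
    """Keep values within k scaled MADs of the median (assumed rule)."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # No spread around the median: nothing can be flagged reliably.
        return list(values)
    return [v for v in values if abs(v - med) <= k * 1.4826 * mad]


def stratum_weights(population_counts, register_records):
    """Expansion weight per stratum: population size / records available."""
    counts = defaultdict(int)
    for rec in register_records:
        counts[rec["stratum"]] += 1
    return {s: population_counts[s] / n for s, n in counts.items()}


# Hypothetical turnover values; the last one is far from the rest.
turnover = [100, 110, 95, 105, 9000]
print(filter_outliers(turnover))

# Hypothetical incomplete register: 5 of 10 units in stratum A, 2 of 4 in B.
register = [{"stratum": "A"}] * 5 + [{"stratum": "B"}] * 2
print(stratum_weights({"A": 10, "B": 4}, register))
```

A robust rule such as median/MAD is used here because a mean-based rule would itself be distorted by the outliers it is meant to find; other choices are equally possible.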
4 Annual business data sources
[Figure: annual business data sources GBR, VAT, CT, TS, JSSD, SBS, GFCF, SEE, ICT, PC and RD, grouped into surveys and registers]
5 Table 1. Available annual sources on enterprise level for six key variables.
GBR VAT CT TS SSD SBS SEE PC
Number of employed persons X X X X
Gross wages and salaries X X X X X
Total labour costs X X
Net turnover X X X X
Purchase value X X X
Profit X X
6 Table 2. Causes for differences between sources at publication and/or micro level.
Causes for differences at publication level (only):
- Difference in target population
- Difference in weights
- Classification error
Causes for differences at publication and micro level:
- Matching error
- Difference in variable definition
- Difference in measurement time (period)
- Measurement errors in variables
- Processing errors in variables, e.g. due to wrong unit transformations
- Difference in editing strategy
- Observed versus imputed value
- Difference in imputation method
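One of the causes in Table 2, processing errors due to wrong unit transformations, can be illustrated with a small check. The sketch below assumes the typical case where one source reports a value in euros and the other in thousands of euros, so the two values differ by a factor of about 1000. The function name, the factor and the tolerance are illustrative assumptions, not part of the presentation.

```python
def looks_like_thousand_error(value_a, value_b, tolerance=0.2):
    """Return True if value_a is roughly 1000 times value_b.

    tolerance is the allowed relative deviation from the factor 1000
    (an assumed setting; 0.2 accepts ratios between 800 and 1200).
    """
    if value_a <= 0 or value_b <= 0:
        # Ratios are not meaningful for zero or negative values.
        return False
    ratio = value_a / value_b
    return abs(ratio - 1000) <= tolerance * 1000


# Hypothetical net turnover for one enterprise in two sources.
print(looks_like_thousand_error(1_250_000, 1_240))  # suspicious pair
print(looks_like_thousand_error(1_250, 1_240))      # ordinary small difference
```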
7 Steps in integration process I
- Tune target populations
- Synchronize classifications (NACE, size class)
- Harmonization of variables and units
- Match data on enterprise level
- Correct obvious mistakes
- Filter and weight incomplete registers
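Two of the steps above, harmonization of units and matching on enterprise level, can be sketched as follows. This is a simplified illustration: the field names `enterprise_id` and `turnover`, the unit factors and the example values are hypothetical, and real matching would also have to deal with changes in enterprise composition.

```python
def harmonize(records, unit_factor):
    """Convert turnover to a common unit; return {enterprise_id: value}."""
    return {r["enterprise_id"]: r["turnover"] * unit_factor for r in records}


def match_sources(source_a, source_b):
    """Match two {enterprise_id: value} mappings on enterprise id.

    Returns the matched pairs plus the ids found in only one source,
    since unmatched units are candidates for matching errors.
    """
    common = source_a.keys() & source_b.keys()
    matched = {k: (source_a[k], source_b[k]) for k in common}
    only_a = sorted(source_a.keys() - source_b.keys())
    only_b = sorted(source_b.keys() - source_a.keys())
    return matched, only_a, only_b


# Hypothetical data: the register is in thousands of euros, the survey in euros.
register = [{"enterprise_id": 1, "turnover": 500},
            {"enterprise_id": 2, "turnover": 80}]
survey = [{"enterprise_id": 2, "turnover": 81000},
          {"enterprise_id": 3, "turnover": 12000}]

matched, only_register, only_survey = match_sources(
    harmonize(register, 1000), harmonize(survey, 1))
print(matched, only_register, only_survey)
```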
8 Steps in integration process II
- Filter and weight incomplete registers
- Compute temporary aggregates
- Indicate inconsistent aggregates
- Detect influential inconsistent records
- Solve matching errors, edit influential errors, and adapt weights
- Compute consistent aggregates
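The middle steps above (temporary aggregates, inconsistent aggregates, influential records) can be sketched as below. The slide does not state how inconsistency or influence is measured, so this example assumes a relative difference threshold between two weighted totals per publication cell and ranks records by their weighted absolute difference. All names, the threshold and the data are hypothetical.

```python
from collections import defaultdict


def aggregate(records, source):
    """Weighted total of one source variable per publication cell."""
    totals = defaultdict(float)
    for r in records:
        totals[r["cell"]] += r["weight"] * r[source]
    return dict(totals)


def inconsistent_cells(records, a, b, max_rel_diff=0.05):
    """Cells where the two weighted totals differ by more than max_rel_diff."""
    tot_a, tot_b = aggregate(records, a), aggregate(records, b)
    flagged = []
    for cell in tot_a:
        denom = max(abs(tot_a[cell]), abs(tot_b[cell]))
        if denom > 0 and abs(tot_a[cell] - tot_b[cell]) / denom > max_rel_diff:
            flagged.append(cell)
    return sorted(flagged)


def influential_records(records, a, b, cell, top=3):
    """Enterprise ids in a cell, largest weighted difference first."""
    in_cell = [r for r in records if r["cell"] == cell]
    in_cell.sort(key=lambda r: r["weight"] * abs(r[a] - r[b]), reverse=True)
    return [r["enterprise_id"] for r in in_cell[:top]]


# Hypothetical matched records with turnover from two sources.
records = [
    {"enterprise_id": 1, "cell": "C1", "weight": 1.0, "vat": 100.0, "sbs": 102.0},
    {"enterprise_id": 2, "cell": "C1", "weight": 2.0, "vat": 50.0, "sbs": 500.0},
    {"enterprise_id": 3, "cell": "C2", "weight": 1.0, "vat": 200.0, "sbs": 198.0},
]
for cell in inconsistent_cells(records, "vat", "sbs"):
    print(cell, influential_records(records, "vat", "sbs", cell))
```

Ranking by weighted difference keeps the manual editing effort focused on the few records that move the aggregate most, which is the idea behind editing only influential errors.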
9 Long-term challenges
- Use the Fellegi-Holt principle to obtain consistent integrated micro-data
- Use repeated weighting techniques to obtain consistent aggregates
- Develop a general editing system for business registers and surveys
- Minimize the burden for respondents using a maximum number of registers and a minimum number of surveys