Title: CONCEPTUAL MODELLING OF ADMINISTRATIVE REGISTER INFORMATION AND XML
- TAXATION METADATA AS AN EXAMPLE
Heikki Rouhuvirta, Statistical Methodology R&D
heikki.rouhuvirta_at_stat.fi
Ottawa, 16-18 May 2005
2 Contents
- Background
- The Challenge
- Primary Questions
- Test Case Finnish Taxation
- Data Semantics of Register Data
- Taxation Metadata Definition
- Some Results
- The Future
- Some Practical Steps on the Way
3 Background
- Present state of compilation of administrative data
  - as the challenge
- CoSSI
  - as the methodological framework for data semantics of registers
- Codacmos
  - as the organizational base for concept testing
4 Present state of compilation of administrative data
[Diagram: present flow of administrative data from the administrative data source to the destination NSI. Source side: operational systems or data warehouses (data source, e.g. RDB), documented in the Handbook of Taxation etc.; data gathering (e.g. SQL) with tailor-made programs or ETL products (e.g. Informatica, Oracle) into a transmission file (sequential/flat file). Data communication: physical media (CD-ROM, magnetic tape) or network (internet, WAN) using FTP (VPN). Destination NSI: transmission file (sequential/flat file); data extraction/transformation/loading with tailor-made programs or ETL products (e.g. Informatica, Oracle) into a relational DB; data gathering (e.g. SQL) and data combining of statistical register data and survey data for the statistical application, the statistician and statistical information.]
5 CoSSI
- Common Structure of Statistical Information (CoSSI)
  - covers different ways of statistical data organization (statistical data matrix and statistical table)
  - includes a model to define contentual information in statistics
  - includes a model to define the methodology used in statistics (e.g. measuring and classification)
  - manages the complexity of statistical information (e.g. nested variable structure)
  - includes definitions for all types of statistical information: data, metadata for files, statistical metadata, quality declarations, charts
  - the main objective was to organise statistical data so that they also contain statistical metadata (describing both the structure and logic of statistical metadata at the same time)
- Definition Descriptions available on the web at http://www.stat.fi/org/tut/dthemes/drafts/cossi_definition_descriptions_v_09_2003.pdf
- On statistical metadata, see also on the web http://www.stat.fi/org/tut/dthemes/papers/alternative_approach_to_metadata_codacmos_2004.pdf
6 Codacmos
- Cluster of Data Collection Integration Metadata Systems for Official Statistics
- EU Project 2003-2004 (IST-2001-38636)
- Consortium
  - Italian National Statistical Institute, Statistics Finland, University of Edinburgh, National Statistical Service of Greece, DESAN Research Solutions, Statistical Division of Municipality of Milan, The Finnish Tax Administration, University of Patras, Institute of Informatics and Statistics, University of Athens, National Social Security Institute, Tietokarhu Ltd, Statistics Norway
- http://www.codacmos.eu.org
- TAXATION METADATA partners: Statistics Finland, The Finnish Tax Administration and Tietokarhu Ltd
7 The Challenge
- how to transform the present process, in which the description of administrative data can mostly be read only from the authorities' administrative handbooks, so that the contentual description of the data is present and usable both for statistics producers in the production process and for users of statistics in the distribution of statistical information.
8 Primary Questions
- what are the metadata of administrative data?
- how to process the metadata specifying the interpretation and use of administrative data collection and register data?
- how to combine the original data description (e.g. concept definitions of register fields) with the variable description and measurement information of statistics?
- can accumulating interpretive metadata be transported through the processing of information and, if so, how?
9 Test Case: Finnish Taxation
(Finnish taxation on the web at http://www.vero.fi)
10 Taxation Types and Sources of Income
11 Income Tax Deductions
12 Data Semantics of Register Data
- Modelling methodology
  - starting point is to distinguish between
    - the substance concept model and
    - the information model whereby the concepts are described
- Information organizing method
  - any which doesn't lose information
- Technology
  - any without restrictions
- Result
  - Taxation metadata definition (taxmeta.dtd)
13 Basic Substance Concept
- Tax type, e.g. personal taxation
  - Type of income, e.g. earned income, capital income
    - Income, e.g. salary, pension
  - Type of tax deduction
    - Deduction
14 Description Information
Each income (e.g. salary, pension) and each deduction is described by:
- Law: reference to a section of law
- Law case: reference to a law case
- Formula: how the tax is calculated
- Internal instruction: instruction on a specific income and deduction area
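The substance concepts and their description information could be sketched as plain data structures. The following is only an illustration, assuming that incomes and deductions are grouped under their types within a tax type; all class and field names are hypothetical and are not taken from taxmeta.dtd, and the values are placeholders.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Description:
    """Description information attached to an income or a deduction."""
    laws: List[str] = field(default_factory=list)          # references to sections of law
    law_cases: List[str] = field(default_factory=list)     # references to law cases
    formula: str = ""                                       # how the tax is calculated
    internal_instructions: List[str] = field(default_factory=list)


@dataclass
class Concept:
    """A substance concept: an income (e.g. salary) or a deduction."""
    name: str
    description: Description = field(default_factory=Description)


@dataclass
class ConceptType:
    """A type of income (e.g. earned income) or a type of tax deduction."""
    name: str
    members: List[Concept] = field(default_factory=list)


@dataclass
class TaxType:
    """A tax type (e.g. personal taxation) with its income and deduction types."""
    name: str
    income_types: List[ConceptType] = field(default_factory=list)
    deduction_types: List[ConceptType] = field(default_factory=list)


def find_concept(tax_type: TaxType, name: str) -> Concept:
    """Look up an income or deduction by name anywhere under a tax type."""
    for concept_type in tax_type.income_types + tax_type.deduction_types:
        for concept in concept_type.members:
            if concept.name == name:
                return concept
    raise KeyError(name)


# Placeholder content only; no real law references or formulas are given here.
salary = Concept(
    "salary",
    Description(laws=["(placeholder law reference)"], formula="(placeholder formula)"),
)
personal = TaxType(
    "personal taxation",
    income_types=[ConceptType("earned income", [salary, Concept("pension")])],
    deduction_types=[ConceptType("(placeholder deduction type)",
                                 [Concept("(placeholder deduction)")])],
)

print(find_concept(personal, "salary").description.formula)
```

Keeping the description separate from the concept mirrors the distinction between the substance concept model and the information model that describes it.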
15 Taxation Metadata Definition (taxmeta.dtd)
Available on the web at http://www.stat.fi/org/tut/dthemes/drafts/taxmeta_dtd_v_01.txt
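To give a feel for what an XML instance of taxation metadata might look like and how it could be read, here is a small sketch. The element and attribute names below are invented for illustration and are not the ones defined in taxmeta.dtd; the texts are placeholders.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance document; element names are NOT taken from taxmeta.dtd.
SAMPLE = """\
<taxmeta>
  <taxtype name="personal taxation">
    <incometype name="earned income">
      <income name="salary">
        <law>(placeholder reference to a section of law)</law>
        <formula>(placeholder calculation rule)</formula>
      </income>
      <income name="pension"/>
    </incometype>
  </taxtype>
</taxmeta>
"""


def describe_incomes(xml_text):
    """Return {income name: {description element tag: text}}."""
    root = ET.fromstring(xml_text)
    result = {}
    for income in root.iter("income"):
        result[income.get("name")] = {
            child.tag: (child.text or "").strip() for child in income
        }
    return result


descriptions = describe_incomes(SAMPLE)
print(sorted(descriptions))
```

An income without description elements simply yields an empty description, so missing metadata is visible rather than silently dropped.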
16 Taxation Metadata - Logical Concept Model (I)
17 Taxation Metadata - Logical Concept Model (II)
18 Result from the register standpoint
Demonstration Report is available on the web at http://www.stat.fi/org/tut/dthemes/papers/demoreport_on_taxation_metadata_codacmos_2004.pdf
19 Taxation register view
[Screenshot: a taxpayer's tax register record shown in a structure view and a metadata view. Labelled parts: tax type code used in the register, plain-language code (derived or column name), value in euro, metadata.]
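The register view can be thought of as joining each field of a taxpayer's record with its metadata. A minimal sketch of that idea follows, assuming a simple lookup keyed by the register's tax type code; the codes, labels and function names are hypothetical and do not come from the actual register.

```python
from dataclasses import dataclass


@dataclass
class RegisterField:
    code: str         # tax type code used in the register (hypothetical values below)
    value_eur: float  # value in euro


# Hypothetical metadata keyed by register code; definitions are placeholders.
METADATA = {
    "X01": {"label": "salary", "definition": "(placeholder concept definition)"},
}


def metadata_view(record):
    """Attach a plain-language label and metadata to each register field."""
    rows = []
    for f in record:
        meta = METADATA.get(f.code)
        rows.append({
            "code": f.code,
            "label": meta["label"] if meta else f.code,  # fall back to the raw code
            "value_eur": f.value_eur,
            "metadata": meta,
        })
    return rows


view = metadata_view([RegisterField("X01", 1000.0), RegisterField("Z99", 5.0)])
for row in view:
    print(row["code"], row["label"], row["value_eur"])
```

Fields without metadata keep their raw code as the label, which makes gaps in the metadata easy to spot.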
20 Result from the statistics standpoint
21 Income distribution statistics: statistical metadata
22 Income distribution statistics: taxation register metadata (I)
[Screenshot labels: statistical metadata, register metadata]
23 Income distribution statistics: taxation register metadata (II)
[Screenshot labels: statistical metadata, register metadata]
24 The Future
- Could it be ...
  - integrated register metadata
  - a genuinely metadata-driven statistical production process
  - rich metadata is present and available in all production stages, including editing as well as transforming of register concepts to statistical concepts
  - metadata accumulates as the process advances without losing old metadata
  - rich metadata is also available for users during the dissemination process of statistical information
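One way to picture metadata that accumulates without losing old metadata is an append-only history carried along with each variable. The sketch below assumes this simple design; the class names, stage names and notes are hypothetical placeholders, not part of CoSSI or taxmeta.dtd.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class MetadataEntry:
    stage: str  # production stage that added the entry
    note: str   # the metadata itself (placeholder text here)


@dataclass(frozen=True)
class Variable:
    name: str
    history: Tuple[MetadataEntry, ...] = ()

    def with_metadata(self, stage: str, note: str,
                      new_name: Optional[str] = None) -> "Variable":
        """Return a new Variable whose history keeps every earlier entry."""
        return Variable(new_name or self.name,
                        self.history + (MetadataEntry(stage, note),))


# Register concept, then editing, then transformation to a statistical concept.
register_var = Variable("salary").with_metadata(
    "register", "(placeholder register concept definition)")
edited = register_var.with_metadata(
    "editing", "(placeholder note on checked values)")
statistical = edited.with_metadata(
    "conceptual formation", "(placeholder statistical concept definition)",
    new_name="earned income")

for entry in statistical.history:
    print(entry.stage, "-", entry.note)
```

Because each step returns a new object and never rewrites earlier entries, the register-level description is still available at the dissemination stage.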
25 XML based metadata-driven statistical production
[Diagram: XML-based production flow. Register side: collection routines, transaction based data storage (RDB), Handbook of Register, Register Metadata (xml), XMLDB, units based data report with meta. Questionnaires (xml), data gathering and data transmission feed the xml based production system: data combining (combined data; collected data matrix based on CoSSI; units and variable based data organisation; statistical metadata based on CoSSI), data editing (checked values, new metadata), conceptual formation (new variables) and aggregation.]
26 Some Practical Steps on the Way
- Plan to apply this scheme to the metadata of other registers (e.g. population register)
- Integration of the structured statistical metadata system with statistical software packages (e.g. SAS, SuperStar) for simultaneous use