Title: Metadata Driven Integrated Statistical Data
1 Metadata Driven Integrated Statistical Data Management System
- CSB of Latvia
- By Karlis Zeila, Vice President, CSB of Latvia
- MEXSAI, Cancun, November 2-4
2 INTRODUCTION
- The system was developed over 2.5 years (January 2000 to July 2002).
- Development was outsourced to the company Microlink Latvia, working in close cooperation with experts from the CSB.
- 600 000 euros were spent on developing the system.
- Use of the system at the CSB of Latvia started the transition from a stove-pipe to a process-oriented approach to statistical data production.
3 METADATA DRIVEN ... ?
- Any action within the system is governed by metadata.
- Metadata is the key element of the system.
- All software modules of the system are connected to the Core Metadata module (metadata base).
- Any change within the system starts with a change to the metadata.
- The full data processing cycle becomes possible only once the corresponding descriptions in the metadata base are complete.
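The idea that processing can start only when the metadata descriptions are complete can be sketched in a few lines of Python. This is a minimal illustration, not the ISDMS implementation: the survey names, step names and the `METADATA_BASE` dictionary are all assumptions made up for the example.

```python
# Hypothetical sketch: every name below is an illustrative assumption,
# not part of the actual ISDMS code base.

# A toy "metadata base": each survey is described by the steps it needs.
METADATA_BASE = {
    "survey_A": {
        "data_entry": {"form": "form_A_v1"},
        "validation": {"rules": ["total >= 0"]},
        "aggregation": {"function": "sum"},
    },
    "survey_B": {
        "data_entry": {"form": "form_B_v1"},
        # validation and aggregation are not described yet
    },
}

REQUIRED_STEPS = ("data_entry", "validation", "aggregation")


def missing_descriptions(survey):
    """Return the processing steps that are not yet described in metadata."""
    described = METADATA_BASE.get(survey, {})
    return [step for step in REQUIRED_STEPS if step not in described]


def run_full_cycle(survey):
    """Run every step in order, but only if the metadata is complete."""
    missing = missing_descriptions(survey)
    if missing:
        raise ValueError(f"{survey}: metadata missing for {missing}")
    # Each step is driven purely by its metadata description.
    return [f"{step} using {METADATA_BASE[survey][step]}" for step in REQUIRED_STEPS]
```

In this sketch `run_full_cycle("survey_A")` runs all three steps, while `run_full_cycle("survey_B")` raises an error until the missing descriptions are added to the metadata.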
4 INTEGRATED ... ?
- Most of the system's software modules are connected to the Registers module.
- The Registers module is an integral part of the system.
- All surveys are supported by the appropriate classifications stored in the metadata base.
- In all surveys, respondent data fields are linked to register data.
- All data is stored in the corporate data warehouse.
- Statistical data processing is split into unified steps shared by different surveys.
- Export/import procedures make it possible to work with the system's data files in different standard software packages.
5 Advantages and Restrictions
Advantages
1. Data entry, processing and storage procedures for the main business statistics are standardized as far as possible, which enables the move from stove-pipe to process-oriented data processing.
2. Statistical data, including metadata, is processed and stored centrally using data warehouse technologies and OLAP tools.
3. All data processing procedures are driven from a common metadata system. The procedures are described in the metadata base using a special pseudo-language and a defined set of notations, so standardized procedures can be executed for each survey without individual programming.
4. The system is linked to the Business Register, which allows respondent data to be retrieved and updated directly.
5. Special import and export procedures have been created for data exchange with other systems.
6. A link with PC-Axis has been created for electronic data dissemination.
6 Restrictions
1. The system is oriented towards processing surveys of different periodicity in which data is collected on questionnaires filled in by respondents (some adaptation would be necessary to use CAPI or CATI technologies).
2. The metadata base does not provide for describing confidentiality rules for data dissemination; they are hard-coded in the system.
3. The diagnostic tools for metadata descriptions are not powerful enough, so the experts who prepare metadata descriptions need to be highly experienced.
4. Hardware and standard software requirements: PCs ≥ Pentium II, RAM ≥ 128 MB, running Windows 95 to Windows 2000 and MS Office 2000.
5. The metadata base does not provide for describing an algorithm that automatically creates respondent lists for sample surveys from the Business Register frame.
7 ISDMS architecture
[Diagram: Integrated statistical data management system (ISDMS)]
- Labels: Corporate data warehouse, CSB Web Site, FIREWALL
- Databases shown: metadata base, registers base, raw data base, microdata base, macrodata base, OLAP data base, user administration data base, dissemination data base
- Platform: Windows 2000 Advanced Server, MS Internet Information Server, SQL Server 2000, PC-Axis
- ISDMS business application software modules, each linked in the diagram to a subset of the databases above:
  - Core metadata base module
  - Registers module
  - Data entry and validation module
  - Data mass entry module
  - Data WEB entry module
  - Missing data imputation module (also linked to data imputation software)
  - Data aggregation module
  - Data analysis module
  - Data dissemination module
  - User administration module
9 Structure of microdata (observation data): Bo Sundgren model
- Object characteristics: $C_o = O(t).V(t)$, where $O$ is an object type, $V$ is a variable and $t$ is a time parameter. Each result of an observation is a value of a variable (a data element) $C_o$.
- All values of each variable are attached to requisites of the object (respondent), which can be called vectors or dimensions. When analysing the population of respondents, we use these dimensions to form different groupings and to aggregate data.
- The dimensions listed below can be attached to each value of a variable in agricultural statistics:
  - Main kind of activity (ISIC classification)
  - Kind of ownership and entrepreneurship (code)
  - Regional location (code)
  - Employees group classification (code)
  - Turnover group classification (code)
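As a rough illustration of this microdata structure, the sketch below stores each observation as an object, a variable, a period and a value, and groups values by one of the respondent's dimensions. The class names, field names and sample values are assumptions invented for the example; they do not come from the ISDMS data model.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Respondent:
    """An object O with its dimensions (all codes are made up)."""
    code: str
    activity: str   # main kind of activity (ISIC)
    region: str     # regional location
    ownership: str  # kind of ownership and entrepreneurship


@dataclass(frozen=True)
class Observation:
    """One value of C_o = O(t).V(t): object, variable, time, value."""
    respondent: Respondent
    variable: str
    period: str
    value: float


def group_by_dimension(observations, dimension):
    """Group observation values by one respondent dimension."""
    groups = defaultdict(list)
    for obs in observations:
        groups[getattr(obs.respondent, dimension)].append(obs.value)
    return dict(groups)


r1 = Respondent("R001", "A", "Region 1", "private")
r2 = Respondent("R002", "A", "Region 2", "private")
r3 = Respondent("R003", "C", "Region 1", "state")

observations = [
    Observation(r1, "employees", "2004", 10),
    Observation(r2, "employees", "2004", 20),
    Observation(r3, "employees", "2004", 5),
]

by_region = group_by_dimension(observations, "region")
by_activity = group_by_dimension(observations, "activity")
```

Because the dimensions live on the respondent, the same observations can be regrouped by region, activity or ownership without changing the stored values.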
10 Structure of macrodata (statistics)
- Macrodata are the result of estimations (aggregations) of a set of microdata.
- Statistical characteristics: $C_s = O(t).V(t).f$, where $O$ and $V$ are the object type and the variable, $t$ is a time parameter and $f$ is an aggregation function (sum, count, average, etc.) summarizing the true values of $V(t)$ for the objects in $O(t)$.
- In the metadata base, the structure of macrodata is referred to as the box structure or alpha-beta-gamma-tau structure ($\alpha\beta\gamma\tau$).
- For data interchange, alpha ($\alpha$) refers to the selection property of the objects ($O$), beta ($\beta$) to the summarized values of the variables ($V$), gamma ($\gamma$) to the cross-classifying variables and tau ($\tau$) to the time parameters ($t$).
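A minimal sketch of how a macrodata value could be derived from microdata under the alpha-beta-gamma-tau description follows. The row layout, the `aggregate` function and its parameter names are assumptions chosen to mirror the four components; the real system stores these descriptions in its metadata base.

```python
from statistics import mean

# Hypothetical microdata rows: object dimensions, period, variable, value.
MICRODATA = [
    {"region": "R1", "activity": "A", "period": "2004", "variable": "employees", "value": 10},
    {"region": "R1", "activity": "C", "period": "2004", "variable": "employees", "value": 30},
    {"region": "R2", "activity": "A", "period": "2004", "variable": "employees", "value": 20},
    {"region": "R2", "activity": "A", "period": "2003", "variable": "employees", "value": 99},
]

AGGREGATIONS = {"sum": sum, "count": len, "average": mean}


def aggregate(rows, alpha, beta, gamma, tau, f):
    """Compute C_s = O(t).V(t).f for each cell of the gamma cross-classification.

    alpha: predicate selecting objects, beta: variable name,
    gamma: tuple of cross-classifying dimensions, tau: period,
    f: name of the aggregation function.
    """
    cells = {}
    for row in rows:
        if row["period"] == tau and row["variable"] == beta and alpha(row):
            key = tuple(row[dim] for dim in gamma)
            cells.setdefault(key, []).append(row["value"])
    return {key: AGGREGATIONS[f](values) for key, values in cells.items()}


# Total employees per region in 2004, over all objects.
by_region = aggregate(MICRODATA, alpha=lambda r: True, beta="employees",
                      gamma=("region",), tau="2004", f="sum")
```

An empty `gamma` tuple yields a single overall cell, and changing `alpha` restricts the population, for example to respondents with activity "A".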
11 Structure of Surveys (questionnaires)
- A new survey has to be described in the metadata base. A questionnaire version, valid for at least one year, is created for each survey. If neither the questionnaire content nor its layout changes, the current version and its description in the metadata base can be reused for the next year.
- Each survey contains one or more data entry tables, or chapters. A chapter can be a constant table with a fixed number of rows and columns, or a table with a variable number of rows or a variable number of columns.
- The rows and columns of each chapter are described in the metadata base with their codes and titles. This information is needed for automatic generation of the data entry application, for data validation, etc.
- The last step in describing the questionnaire content and layout is cell formation. Cells are the smallest data unit in survey data processing. A cell is created as the combination of a row and a column from the survey version side and a variable from the indicators and attributes side.
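The cell formation step can be pictured with a short sketch for a fixed-size chapter. The `Cell` class, the `build_cells` helper and the variable naming scheme are hypothetical; in particular, `variable_for` stands in for whatever lookup links a row and column to a variable in the metadata base.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Cell:
    """Smallest data unit: a row/column position linked to one variable."""
    chapter: str
    row_code: str
    column_code: str
    variable: str


def build_cells(chapter, row_codes, column_codes, variable_for):
    """Create one cell per (row, column) pair of a fixed-size table.

    variable_for is a hypothetical lookup that maps a (row, column) pair
    to the name of the variable stored in that cell.
    """
    return [
        Cell(chapter, row, col, variable_for(row, col))
        for row, col in product(row_codes, column_codes)
    ]


cells = build_cells(
    chapter="1",
    row_codes=["2010", "2020"],
    column_codes=["1", "2"],
    variable_for=lambda row, col: f"var_{row}_{col}",
)
```

A data entry form generator could then iterate over `cells` to lay out input fields, which is one way the row and column descriptions might feed automatic form generation.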
12 Example of agricultural questionnaire
13 Structure of agricultural statistics questionnaire (example: fixed table)
- Name of questionnaire, index, code
- Respondent's (object's) code, name and address
- Period (year, quarter, month)
- Name of chapter
Metadata repository: common table of statistical indicators, table of attributes (classifications) and table of created variables

Agricultural products                           | Row code | Realized total (tonnes) | Realized to processors (tonnes) | Income total (USD) | Average price (USD/tonne)
A                                               | B        | 1                       | 2                               | 3                  | 4
Crop products, total (Σ 2010, 2020, 2030, 2040) | 2000     | 150                     | 120                             | 38 000             | 253
Cereals                                         | 2010     | 100                     | 75                              | 10 000             | 100
Pulses                                          | 2020     | 5                       | 5                               | 4 000              | 800
Industrial crops                                | 2030     | 30                      | 30                              | 21 000             | 700
Vegetables                                      | 2040     | 15                      | 10                              | 3 000              | 200

Diagram labels: INDICATOR 1, ATTRIBUTE, Indicators, Attributes, CELL 2010,1, VARIABLE 1
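The example table suggests two simple validation rules: the total row (code 2000) should equal the sum of rows 2010 to 2040 in the additive columns, and the average price should equal income divided by the realized total. The sketch below checks both. It is an illustration only; the function names and the tolerance are assumptions, and the realized total for Cereals (row 2010, column 1) is taken as 100 tonnes, the value consistent with both the total row and the average price.

```python
ROWS = {
    # row code: (realized total t, realized to processors t, income USD, avg price USD/t)
    "2000": (150, 120, 38_000, 253),
    "2010": (100, 75, 10_000, 100),  # column 1 assumed to be 100 tonnes
    "2020": (5, 5, 4_000, 800),
    "2030": (30, 30, 21_000, 700),
    "2040": (15, 10, 3_000, 200),
}
TOTAL_ROW = "2000"
COMPONENT_ROWS = ("2010", "2020", "2030", "2040")


def check_totals(rows):
    """Return (column, reported, expected) for additive columns 1-3 that fail."""
    errors = []
    for col in range(3):
        expected = sum(rows[code][col] for code in COMPONENT_ROWS)
        if rows[TOTAL_ROW][col] != expected:
            errors.append((col + 1, rows[TOTAL_ROW][col], expected))
    return errors


def check_prices(rows, tolerance=0.5):
    """Return row codes where column 4 differs from column 3 / column 1."""
    return [
        code
        for code, (total, _, income, price) in rows.items()
        if abs(income / total - price) > tolerance
    ]
```

With the values above both checks return empty lists. The tolerance allows for the rounded price in the total row (38 000 / 150 is about 253.3).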
14 1. Data matrix: fixed number of rows (3) and variable number of columns (n)
(Example) Main economic indicators of economic activity
15 2. Data matrix: fixed number of columns (3) and variable number of rows (n)
(Example) Production of products
16 Creating variables
[Diagram: an INDICATOR combined with ATTRIBUTES (CLASSIFICATIONS), the dimensions (vectors) of the indicator, gives VARIABLES]
Example
- Indicator: Number of employees
- Variable "Number of employees, total": no attribute
- Variable "Number of employees in breakdown by kind of activity": attribute Local kind of activity (ISIC)
- Variable "Number of employees in breakdown by regions": attribute Regional code
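One way to picture variable creation is as pairing an indicator with zero or one attribute. The sketch below reproduces the three variables of the example; the `Variable` class and `create_variables` function are hypothetical names, and the title wording is only an assumption about how variable titles might be generated.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Variable:
    """A variable = an indicator plus an optional attribute (classification)."""
    indicator: str
    breakdown: Optional[str] = None       # wording used in the variable title
    classification: Optional[str] = None  # attribute that supplies the dimension

    @property
    def title(self):
        if self.breakdown is None:
            return f"{self.indicator}, total"
        return f"{self.indicator} in breakdown by {self.breakdown}"


def create_variables(indicator, attributes):
    """Return the total variable plus one variable per attribute."""
    variables = [Variable(indicator)]
    variables += [Variable(indicator, b, c) for b, c in attributes.items()]
    return variables


variables = create_variables(
    "Number of employees",
    {
        "kind of activity": "Local kind of activity (ISIC)",
        "regions": "Regional code",
    },
)
titles = [v.title for v in variables]
```

Adding a further classification to the dictionary would create a further variable for the same indicator without any other change.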
17 Dimensions of objects and indicators (example)
Main dimensions (vectors) of respondents (objects O(t)):
- MAIN KIND OF ACTIVITY (ISIC)
- REGIONS (Code)
- OWNERSHIP AND ENTREPRENEURSHIP (Code)
- EMPLOYEES GROUP (Code)
- TURNOVER GROUP (Code)
Dimensions (vectors) of the INDICATOR "Number of employees":
Number of employees, total: 100
Number of employees in breakdown by kind of activity:
ISIC 1 | ISIC 2 | ISIC 3 | ISIC 4
55     | 35     | 5      | 5
Number of employees in breakdown by regions:
Region 1 | 60
Region 2 | 25
Region 3 | 15
18 Integrated Metadata Driven Quasi Process Oriented Technology
19 Metadata base link with Microdata and Macrodata bases
[Diagram]
META DATA BASE (REPOSITORY)
- General description of survey
- Selecting indicators
- Selecting attributes
- Description of survey version
- Creating variables
- Description of chapters (data matrix)
- Description of rows and columns
- Linking variables to cells
- Generation of data entry form (automatic)
- Definition of data aggregation rules
- Data aggregation function (automatic)
MICRO DATABASE
MACRO DATABASE
IMPORT / EXPORT
20 Data entry and validation
[Diagram labels]
- Data stores: META DATA BASE, BUSINESS REGISTER, RAW DATA BASE, RAW Web DATA BASE, MICRO DATA BASE
- Firewall
- Descriptions: description of validation rules, description of data entry forms
- Creating list of respondents
- Data entry paths: standard data entry and validation, mass data entry, Web data entry and validation, data import from files
- Validation: data validation, Web data validation, full data validation
- Data transfer to Microdata Base
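The flow from a raw data base through validation into the microdata base might look roughly like the sketch below. It is an assumption-laden illustration: the rule names, the record layout and the `REGISTER` set (standing in for the respondent list taken from the Business Register) are invented for the example.

```python
# Hypothetical respondent list derived from the Business Register.
REGISTER = {"R1", "R2"}

# Hypothetical validation rules, as they might be described in metadata.
RULES = {
    "non_negative": lambda rec: rec["employees"] >= 0,
    "known_respondent": lambda rec: rec["respondent"] in REGISTER,
}


def validate(record, rules):
    """Return the names of the rules that the record violates."""
    return [name for name, rule in rules.items() if not rule(record)]


def transfer_valid_records(raw_db, micro_db, rules):
    """Move records that pass every rule from the raw to the micro data base.

    Returns a dict mapping the respondent code of each rejected record to the
    rules it failed; rejected records stay in the raw data base.
    """
    rejected = {}
    for record in list(raw_db):  # iterate over a copy while mutating raw_db
        failures = validate(record, rules)
        if failures:
            rejected[record["respondent"]] = failures
        else:
            micro_db.append(record)
            raw_db.remove(record)
    return rejected


raw_db = [
    {"respondent": "R1", "employees": 5},
    {"respondent": "R9", "employees": 3},
    {"respondent": "R2", "employees": -1},
]
micro_db = []
rejected = transfer_valid_records(raw_db, micro_db, RULES)
```

Keeping rejected records in the raw store means they can be corrected and re-validated later without touching the microdata base.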
21 RESULTS ACHIEVED
- To date, 67 different surveys have been implemented within the Metadata Driven Integrated Statistical Data Processing and Dissemination System.
- The response rate of Web data collection has reached 30% for some surveys.
- The system has been presented at the following conferences:
  - ISIS 2002, April 2002, Geneva
  - METANET Project Meeting, Samos, Greece, May 2003
  - AMRADS Final Conference, Rome, Italy, November 2003
  - MSIS 2004, May 2004, Geneva, Switzerland
  - Statistics - investment in the future, Prague, September 2004
  - Development of the State Statistical System, Yalta, Ukraine, September 2004
22 LESSONS LEARNED
- The design of a new information system should be based on the results of a deep analysis of the statistical processes and data flows.
- Clear objectives have to be set, discussed and approved by all parties involved:
  - Statisticians
  - IT personnel
  - Administration
23 LESSONS LEARNED
- Both parties, statisticians and IT specialists, should be involved from the very beginning in the design and implementation of a metadata driven integrated statistical information system.
- Both parties need a clear understanding of all the statistical processes that the system will cover, and of the meaning and role of metadata within the system from both the production side and the user side.
24 LESSONS LEARNED
- The initiative to move from the classical stove-pipe production approach to a process-oriented one has to come from the statisticians, not from IT personnel or administration.
- Motivating statisticians to move from the existing data processing environment to the new one is essential.
- Improving knowledge about metadata is one of the most important tasks throughout the design and implementation phases of the project.
25 LESSONS LEARNED
- A clear division of tasks and responsibilities between statisticians and IT personnel is the key to successful implementation.
- To get the best performance from the entire system, it is important to execute the statistical processes in the right sequence.
- New surveys, and questionnaires in particular, should be designed in accordance with the system requirements, as should changes to existing ones.
26 LESSONS LEARNED
- The feasibility study made it clear that some steps of statistical data processing defy standardization: a survey may require complementary functionality (non-standard procedures) that is needed only for processing that particular survey.
- To handle non-standard procedures, interfaces for data export from and import to the system have been developed, so that standard statistical data processing packages and other generalized software available on the market can be used.
27 LESSONS LEARNED
- A special group of statisticians needs to be established and trained to maintain the metadata base and take responsibility for the accuracy of the metadata.
- Administration and maintenance of the system require well-trained IT staff who are familiar with MS SQL Server 2000 administration, MS Analysis Services, other MS tools, the PC-Axis family of products, and the system's data model and applications.
28 Thank you for your attention!
- Karlis Zeila, Karlis.Zeila_at_csb.gov.lv
- http://www.csb.lv