Metadata Driven Integrated Statistical Data

Original title: Integrētā statistisko datu apstrādes un vadības sistēma (ISDAVS), in English: Integrated Statistical Data Processing and Management System. Author: csb. Created: 7/15/2003 1:31:32 PM.



1
  • Metadata Driven Integrated Statistical Data
    Management System
  • Central Statistical Bureau (CSB) of Latvia
  • By Karlis Zeila, Vice President, CSB of Latvia
  • MEXSAI, Cancun, November 2-4

2
INTRODUCTION
  • The system was developed over 2.5 years
    (January 2000 to July 2002),
  • Development was outsourced to Microlink Latvia,
    working in close cooperation with experts from
    the CSB,
  • 600,000 euros were spent on developing the
    system,
  • With the introduction of the system, the CSB of
    Latvia began the transition from the stove-pipe
    approach to a process-oriented approach to
    statistical data production

3
META DATA DRIVEN ... ?
  • Every action within the system is governed by
    metadata,
  • Metadata is the key element of the system,
  • All software modules of the system are connected
    to the Core Metadata module (metadata base),
  • Any change to the system starts with a change to
    the metadata,
  • The full data processing cycle becomes possible
    only once the corresponding descriptions in the
    metadata base are complete
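
A minimal Python sketch of the idea on this slide, not the ISDMS implementation: the processing steps are read from a metadata record, and the cycle refuses to start until the survey description is complete. The names SurveyMetadata, REQUIRED_PARTS and run_cycle, and the list of required parts, are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical list of description parts a survey needs before processing.
REQUIRED_PARTS = ("indicators", "attributes", "variables", "cells", "validation_rules")


@dataclass
class SurveyMetadata:
    """Illustrative stand-in for one survey's entry in the metadata base."""
    code: str
    parts: dict = field(default_factory=dict)   # part name -> description
    steps: list = field(default_factory=list)   # ordered processing step names

    def missing_parts(self):
        """Return the required parts that have no description yet."""
        return [p for p in REQUIRED_PARTS if not self.parts.get(p)]


def run_cycle(meta, handlers):
    """Run the steps named in the metadata; nothing is hard-wired per survey."""
    missing = meta.missing_parts()
    if missing:
        raise ValueError(f"survey {meta.code}: metadata incomplete, missing {missing}")
    return [handlers[step](meta) for step in meta.steps]


# Generic step handlers shared by all surveys (placeholders for real modules).
handlers = {
    "entry": lambda m: f"{m.code}: data entry",
    "validation": lambda m: f"{m.code}: validation",
    "aggregation": lambda m: f"{m.code}: aggregation",
}

survey = SurveyMetadata(
    code="AGR-1",
    parts={p: "described" for p in REQUIRED_PARTS},
    steps=["entry", "validation", "aggregation"],
)
log = run_cycle(survey, handlers)

incomplete = SurveyMetadata(code="NEW-1", parts={"indicators": "described"}, steps=["entry"])
```

Changing the order or the set of steps for a survey then means editing its metadata record, not the code. Calling run_cycle(incomplete, handlers) raises ValueError because four of the required parts are still undescribed.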

4
INTEGRATED ... ?
  • Most of the system's software modules are
    connected to the Registers module,
  • The Registers module is an integral part of the
    system,
  • All surveys are supported by appropriate
    classifications stored in the metadata base,
  • In all surveys the respondent data fields are
    linked to register data,
  • All data are stored in a corporate data
    warehouse,
  • Statistical data processing is split into
    unified steps that are common to different
    surveys,
  • Export / import procedures make it possible to
    work with the system's data files in different
    standard software packages

5
Advantages and Restrictions
Advantages
1. Maximally standardized procedures for the
entry, processing and storage of the main business
statistics data, which enable the transition from
stove-pipe data processing to process-oriented
data processing.
2. Centralized processing and storage of the
statistical data, including metadata, using data
warehouse technologies and OLAP tools.
3. All data processing procedures are driven from
a common metadata system. The procedures are
described in the metadata base using a special
pseudo-language and a defined set of notations, so
executing a standardized procedure requires no
survey-specific programming.
4. The system is linked to the Business Register,
which allows respondent data to be retrieved and
updated directly.
5. Special import and export procedures were
created for data exchange with other systems.
6. A link with PC-Axis was created for electronic
data dissemination.
6
  • Restrictions

1. The system is oriented towards processing data
from surveys of different periodicity in which the
data are collected on questionnaires filled in by
respondents (some adaptation would be needed to
use CAPI or CATI technologies).
2. The metadata base makes no provision for
describing confidentiality rules for data
dissemination; they are hard-coded in the system.
3. The diagnostic tools for metadata descriptions
are not powerful enough, so the experts who
prepare the descriptions need to be highly
experienced.
4. Hardware and standard software requirements:
PCs with a Pentium II or better and at least
128 MB of RAM, running Windows 95 to Windows 2000
and MS Office 2000.
5. The metadata base makes no provision for
describing an algorithm that automatically creates
respondent lists for sample surveys from the
Business Register frame.
7
ISDMS architecture
Integrated statistical data management system
  • Corporate data warehouse: Metadata base,
    Microdata base, Macrodata base, Registers base,
    Raw data base, OLAP data base, Dissemination
    data base, User administration data base
  • Other components: CSB Web Site, firewall
  • Platform: Windows 2000 Advanced Server, MS
    Internet Information Server, SQL Server 2000,
    PC-Axis
  • ISDMS business application software modules:
    Core metadata base module, Registers module,
    Data entry and validation module, Data mass
    entry module, Data WEB entry module, Missed data
    imputation module, Data aggregation module, Data
    analysis module, Data dissemination module, User
    administration module
  • Each module is related to its own subset of the
    databases (metadata, microdata, macrodata,
    registers, raw data, OLAP, user administration)
    and, for imputation, to data imputation software
8
(No Transcript)
9
Structure of microdata (observation data): Bo
Sundgren model
  • Object characteristics: Co = O(t).V(t),
  • where O is an object type, V is a variable and
    t is a time parameter. Each observation result
    is the value of a variable (data element) Co.
  • All values of each variable are attached to the
    requisites of the object (respondent), which can
    be called vectors or dimensions. When analysing
    the population of respondents, we use these
    dimensions to form different groupings and to
    aggregate data.
  • The dimensions listed below can be attached to
    each variable value in agricultural statistics:
  • - Main kind of activity (ISIC classification)
    - Kind of ownership and entrepreneurship (code)
    - Regional location (code)
    - Employees group classification (code)
    - Turnover group classification (code).
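
The microdata model can be sketched as a small Python data structure. This is only an illustration under assumptions: the class Observation, the helper group_by, and every respondent code and value below are invented, and the dimensions are stored once per respondent, not once per value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One microdata value Co = O(t).V(t)."""
    object_id: str   # O: the respondent
    variable: str    # V: the observed variable
    period: str      # t: the time parameter
    value: float


# Dimensions (vectors) are requisites of the respondent. All codes are invented.
respondents = {
    "R001": {"activity": "ISIC 1", "ownership": "01", "region": "Region 1",
             "employees_group": "E2", "turnover_group": "T3"},
    "R002": {"activity": "ISIC 2", "ownership": "02", "region": "Region 1",
             "employees_group": "E1", "turnover_group": "T1"},
    "R003": {"activity": "ISIC 1", "ownership": "01", "region": "Region 2",
             "employees_group": "E1", "turnover_group": "T2"},
}

observations = [
    Observation("R001", "employees", "2003", 40),
    Observation("R002", "employees", "2003", 20),
    Observation("R003", "employees", "2003", 25),
]


def group_by(observations, respondents, dimension):
    """Group observations by one dimension of the respondent they belong to."""
    groups = {}
    for obs in observations:
        key = respondents[obs.object_id][dimension]
        groups.setdefault(key, []).append(obs)
    return groups


by_region = group_by(observations, respondents, "region")
region_totals = {region: sum(o.value for o in group)
                 for region, group in by_region.items()}
```

Grouping by "activity" or "turnover_group" works the same way, which is the sense in which the dimensions support different groupings of the same values.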

10
Structure of macrodata (statistics)
  • Macrodata are the result of estimations
    (aggregations) of a set of microdata.
  • Statistical characteristics: Cs =
    O(t).V(t).f,
  • where O(t).V(t) is an object characteristic, t
    is a time parameter, and f is an aggregation
    function (sum, count, average, etc.) summarizing
    the true values of V(t) for the objects in O(t).
  • In the metadata base the macrodata structure is
    referred to as the box structure or
    alpha-beta-gamma-tau structure (αβγτ).
  • For data interchange, alpha refers to the
    selection property of objects (O), beta to the
    summarized values of variables (V), gamma to the
    cross-classifying variables, and tau to the time
    parameters (t).
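
As a hedged illustration of Cs = O(t).V(t).f, the sketch below aggregates a handful of invented microdata rows. The function name aggregate, the row layout and all values are assumptions made for this example; only the roles of alpha, beta, gamma, tau and f follow the slide.

```python
from collections import defaultdict

# Microdata rows: respondent dimensions plus one variable value per period.
# All values are invented for illustration.
microdata = [
    {"region": "Region 1", "activity": "ISIC 1", "period": "2003", "employees": 40},
    {"region": "Region 1", "activity": "ISIC 2", "period": "2003", "employees": 20},
    {"region": "Region 2", "activity": "ISIC 1", "period": "2003", "employees": 25},
    {"region": "Region 2", "activity": "ISIC 1", "period": "2002", "employees": 30},
]


def aggregate(rows, alpha, beta, gamma, tau, f=sum):
    """Compute Cs = O(t).V(t).f for every combination of gamma values.

    alpha: predicate selecting the objects (O)
    beta:  name of the variable to summarize (V)
    gamma: names of the cross-classifying variables
    tau:   period to select (t)
    f:     aggregation function applied to the list of selected values
    """
    boxes = defaultdict(list)
    for row in rows:
        if row["period"] == tau and alpha(row):
            key = tuple(row[g] for g in gamma)
            boxes[key].append(row[beta])
    return {key: f(values) for key, values in boxes.items()}


# Sum of employees by region for 2003, all objects selected.
totals = aggregate(microdata, alpha=lambda r: True, beta="employees",
                   gamma=["region"], tau="2003")

# Count of ISIC 1 respondents by region for 2003.
counts = aggregate(microdata, alpha=lambda r: r["activity"] == "ISIC 1",
                   beta="employees", gamma=["region"], tau="2003", f=len)
```

Swapping f for another function of a list (for example statistics.mean) changes the statistic without touching the selection or the cross-classification.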

11
Structure of Surveys (questionnaires)
  • Each new survey should be described in the
    metadata base. For each survey a questionnaire
    version is created, which is valid for at least
    one year. If the content and layout of the
    questionnaire do not change, the current version
    and its description in the metadata base can be
    used for the next year.
  • Each survey contains one or more data entry
    tables, or chapters. A chapter can be a constant
    table with a fixed number of rows and columns, or
    a table with a variable number of rows or a
    variable number of columns.
  • The rows and columns of each chapter are
    described in the metadata base with their codes
    and titles. This information is needed to
    generate the data entry application
    automatically, for data validation, etc.
  • The last step in describing the questionnaire
    content and layout is the formation of cells.
    Cells are the smallest data units in survey data
    processing. A cell is created as the combination
    of a row and a column from the survey version
    side and a variable from the indicators and
    attributes side.
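
The cell formation step can be sketched in Python as follows. This is an assumption-laden illustration: the Cell class, the build_cells helper, the survey code "AGR-2003" and the variable naming convention (stem plus row code) are all invented here and are not taken from the ISDMS metadata base.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Cell:
    """Smallest data unit: a row and column of a survey version plus a variable."""
    survey_version: str
    chapter: str
    row_code: str
    column_code: str
    variable: str


def build_cells(survey_version, chapter, row_codes, column_variables):
    """Create one cell per (row, column) pair of a fixed table.

    column_variables maps a column code to a variable name stem. The full
    variable name is stem[row code], a convention invented for this sketch.
    """
    return [
        Cell(survey_version, chapter, row, col, f"{column_variables[col]}[{row}]")
        for row, col in product(row_codes, column_variables)
    ]


cells = build_cells(
    survey_version="AGR-2003",
    chapter="Chapter 1",
    row_codes=["2010", "2020"],
    column_variables={"1": "realized_total", "2": "realized_to_processors"},
)

# Look-up by (row code, column code), e.g. for data entry or validation.
cell_index = {(c.row_code, c.column_code): c for c in cells}
```

With the rows and columns described once, data entry and validation code can address any value through its (row, column) key.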

12
Example of agricultural questionnaire
 
13
Structure of agricultural statistics
questionnaire (example - fixed table)
  • Name of Questionnaire, index, code
  • Respondent (object) code, name and address
  • Period (year, quarter, month)
  • Name of chapter

Metadata repository: common table of statistical
indicators, table of attributes (classifications)
and table of created variables
INDICATOR 1 ATTRIBUTE
Agricultural products                             Row code  Realized total  Realized to processors  Income total  Average price
                                                            (tonnes)        (tonnes)                (USD)         (USD/tonne)
A                                                 B         1               2                       3             4
Crop products, total (Σ 2010, 2020, 2030, 2040)   2000      150             120                     38,000        253
Cereals                                           2010      100             75                      10,000        100
Pulses                                            2020      5               5                       4,000         800
Industrial crops                                  2030      30              30                      21,000        700
Vegetables                                        2040      15              10                      3,000         200
Indicators
CELL 2010,1
VARIABLE 1
Attributes
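
The arithmetic behind the example table can be checked with a short Python sketch. It assumes 100 tonnes as the realized total for cereals (row 2010), which is the value implied by the total row and by the average price. The function names check_total and check_price are hypothetical.

```python
# Rows of the example table:
# code -> (realized total t, realized to processors t, income USD, average price USD/t)
# Assumption: cereals realized total is 100 tonnes, the value implied by the total row.
rows = {
    "2010": (100, 75, 10_000, 100),   # Cereals
    "2020": (5, 5, 4_000, 800),       # Pulses
    "2030": (30, 30, 21_000, 700),    # Industrial crops
    "2040": (15, 10, 3_000, 200),     # Vegetables
}
total_2000 = (150, 120, 38_000, 253)  # Crop products, total


def check_total(total, components):
    """Return (column, stated, expected) for each summable column 1-3 that is off."""
    errors = []
    for col in range(3):
        expected = sum(values[col] for values in components.values())
        if total[col] != expected:
            errors.append((col + 1, total[col], expected))
    return errors


def check_price(row):
    """Average price should equal income divided by realized total, rounded."""
    realized, _, income, price = row
    return round(income / realized) == price


errors = check_total(total_2000, rows)
prices_ok = all(check_price(r) for r in list(rows.values()) + [total_2000])
```

Row 2000 is the sum of rows 2010 to 2040 in columns 1 to 3, while column 4 is derived as column 3 divided by column 1 and so is not summed.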
14
1. Data matrix - Fixed number of Rows (3) and
variable number of Columns (n)
(Example) Main economic indicators of
economic activity
15
2. Data matrix - Fixed number of Columns (3) and
variable number of Rows (n)
(Example) Production of products

16
Creating variables
Dimensions (vectors) of indicators
Example
INDICATOR             ATTRIBUTE (CLASSIFICATION)      VARIABLE
Number of employees   no attribute                    Number of employees, total
Number of employees   Local kind of activity (ISIC)   Number of employees in breakdown by kind of activity
Number of employees   Regional code                   Number of employees in breakdown by regions
17
Dimensions of objects and indicators (example)
Main dimensions (vectors) of respondents (objects
O(t))
MAIN KIND OF ACTIVITY (ISIC)
REGIONS (Code)
OWNERSHIP AND ENTREPRENEURSHIP (Code)
EMPLOYEES GROUP (Code)
TURNOVER GROUP (Code)
INDICATOR and dimensions (vectors) of indicator
Number of employees, total
  100
Number of employees in breakdown by kind of activity
  ISIC 1   ISIC 2   ISIC 3   ISIC 4
  55       35       5        5
Number of employees in breakdown by regions
  Region 1   60
  Region 2   25
  Region 3   15
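
Each breakdown in this example distributes the same total of 100 employees over a different dimension, so every breakdown should add up to that total. A minimal Python check, with the hypothetical helper name inconsistent_breakdowns:

```python
# Values taken from the slide: one total and two breakdowns of it.
total_employees = 100
breakdowns = {
    "kind of activity": {"ISIC 1": 55, "ISIC 2": 35, "ISIC 3": 5, "ISIC 4": 5},
    "regions": {"Region 1": 60, "Region 2": 25, "Region 3": 15},
}


def inconsistent_breakdowns(total, breakdowns):
    """Return the names of the breakdowns whose parts do not sum to the total."""
    return [name for name, parts in breakdowns.items()
            if sum(parts.values()) != total]


bad = inconsistent_breakdowns(total_employees, breakdowns)
```

An empty result means both breakdowns agree with the total.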
18
Integrated Metadata Driven Quasi Process Oriented
Technology
19
Metadata base link with Microdata and Macrodata
bases
META DATA BASE (REPOSITORY)
General description of survey
Selecting Indicators
Selecting Attributes
Description of survey version
Creating variables
Description of chapters (data matrix)
Description of rows and columns
Linking variables to cells
Generation of the data entry form (automatic)
Defining data aggregation rules
Data aggregation function (automatic)
MICRO DATABASE
MACRO DATABASE
IMPORT EXPORT
20
Data entry and validation
META DATA BASE
BUSINESS REGISTER
Description of validation rules
Data import from files
Description of data entry forms
Creating list of respondents
Full data validation
MICRO DATA BASE
Standard data entry and validation
Data validation
RAW DATA BASE
Data transfer to Microdata Base
Mass data entry
Firewall
RAW Web DATA BASE
Web data entry and validation
Web Data validation
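
The slides do not show the pseudo-language in which ISDMS validation rules are written, so the Python sketch below uses an invented rule format only to illustrate how validation rules can be stored as metadata and applied by one generic routine. Cells are addressed as "row code,column code", following the CELL 2010,1 notation of the questionnaire example.

```python
import operator

# Comparison operators a rule may name.
OPS = {"==": operator.eq, "<=": operator.le, ">=": operator.ge}

# Hypothetical rule format: compare one cell with the sum of other cells.
rules = [
    # The total row equals the sum of its component rows (column 1).
    {"id": "V1", "left": "2000,1", "op": "==",
     "right": ["2010,1", "2020,1", "2030,1", "2040,1"]},
    # Cereals sold to processors cannot exceed cereals sold in total.
    {"id": "V2", "left": "2010,2", "op": "<=", "right": ["2010,1"]},
]


def validate(record, rules):
    """Return the ids of the rules that the record violates."""
    failed = []
    for rule in rules:
        left = record[rule["left"]]
        right = sum(record[cell] for cell in rule["right"])
        if not OPS[rule["op"]](left, right):
            failed.append(rule["id"])
    return failed


good = {"2000,1": 150, "2010,1": 100, "2020,1": 5, "2030,1": 30,
        "2040,1": 15, "2010,2": 75}
bad = {**good, "2010,1": 70}   # breaks both the total and the processors rule

good_result = validate(good, rules)
bad_result = validate(bad, rules)
```

Adding a rule for another survey then means adding a record to the rule list, with no change to validate.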
21
RESULTS ACHIEVED
  • To date, 67 different surveys have been
    implemented within the Metadata Driven Integrated
    Statistical Data Processing and Dissemination
    System
  • The response rate of web data collection has
    reached 30% for some surveys
  • The system has been presented at the following
    conferences:
  • - ISIS 2002, Geneva, April 2002,
  • - METANET Project Meeting, Samos, Greece, May
    2003,
  • - AMRADS Final Conference, Rome, Italy,
    November 2003,
  • - MSIS 2004, Geneva, Switzerland, May 2004,
  • - Statistics - investment in the future,
    Prague, September 2004,
  • - Development of the State Statistical
    System, Yalta, Ukraine, September 2004.

22
LESSONS LEARNED
  • The design of a new information system should be
    based on a thorough analysis of the statistical
    processes and data flows
  • Clear objectives have to be set, discussed and
    approved by all parties involved:
  • - Statisticians
  • - IT personnel
  • - Administration

23
LESSONS LEARNED
  • Both parties, statisticians and IT specialists,
    should be involved from the very beginning in
    the design and implementation of a metadata
    driven integrated statistical information system
  • Both parties need a clear understanding of all
    the statistical processes that the system will
    cover, and of the meaning and role of metadata
    within the system from both the production and
    the user side

24
LESSONS LEARNED
  • The initiative to move from the classical
    stove-pipe production approach to a
    process-oriented one has to come from the
    statisticians, not from IT personnel or
    administration
  • Motivating the statisticians to move from the
    existing data processing environment to the new
    one is essential
  • Improving knowledge about metadata is one of the
    most important tasks throughout the design and
    implementation phases of the project

25
LESSONS LEARNED
  • A clear division of tasks and responsibilities
    between statisticians and IT personnel is the key
    to successful implementation
  • To get the best performance from the entire
    system, it is important to execute the
    statistical processes in the right sequence
  • The design of new surveys, and of questionnaires
    in particular, as well as changes to existing
    ones, should comply with the system requirements

26
LESSONS LEARNED
  • The feasibility study made it clear to us that
    some steps of statistical data processing defy
    standardization across surveys: a survey may
    require complementary functionality
    (non-standard procedures) that is needed only for
    processing that particular survey's data
  • To handle non-standard procedures, interfaces
    for exporting data from and importing data into
    the system were developed, so that standard
    statistical data processing packages and other
    generalized software available on the market can
    be used

27
LESSONS LEARNED
  • It is necessary to establish and train a special
    group of statisticians who will maintain the
    metadata base and be responsible for the accuracy
    of the metadata
  • Administering and maintaining the system
    requires well-trained IT staff who are familiar
    with MS SQL Server 2000 administration, MS
    Analysis Services, other MS tools, the PC-Axis
    family of products, and the system's data model
    and applications

28
Thank you for your attention!
  • Karlis Zeila Karlis.Zeila_at_csb.gov.lv
  • http://www.csb.lv