Title: ESSnet DWH - SGA II
ESSnet on microdata linking and data warehousing in statistical production
The statistical data warehouse: a central data hub integrating new data sources and statistical output
Harry Goossens, ESSnet Coordinator and Head of the Data Service Centre at Statistics Netherlands
hct.goossens_at_cbs.nl
UNECE Seminar on New Frontiers for Data Collection, Geneva, 31 October - 2 November 2012
Contents
- Background of the ESSnet
- Challenges
- Explaining the statistical data warehouse (S-DWH)
- Elements of the S-DWH
- Business architecture
- GSBPM mapping
- Metadata
ESSnet on microdata linking and data warehousing in statistical production
ESSnet partnership
- ESSnet coordinator: Statistics Netherlands (CBS)
- Co-partners: Estonia, Italy, Lithuania, Portugal, Sweden, UK
- Starting date: 4 October 2010
  - SGA 1: first year, until 3 October 2011
  - SGA 2: final two years, until 3 October 2013
General objectives of the ESSnet DWH
- Provide assistance in the development and implementation of a maximally efficient statistical process for business and trade statistics, independent of any specific (technical) architecture
- Results in daily statistical practice:
  - increase the efficiency of data processing in statistical production systems
  - maximize the reuse of already collected data
  - a "data warehouse" approach to statistics
The challenges
- Reducing costs and administrative burden versus increasing efficiency and flexibility
- Rapidly changing demand for information:
  - a growing need for more information on more topics
  - the shorter lifecycle of policymakers, requiring quicker delivery
- Unlocking all the new data sources arising from the global use of modern technology
- Making optimal use of all available data sources (existing and new)
The statistical data warehouse
- A central data hub to connect and integrate all available data sources, supporting statistical production AND data collection processes by providing:
  - a detailed and correct overview of, and insight into, all available data sources
  - a framework for adequate data governance, including metadata management, confidentiality aspects and data authorisation
  - flexible data storage and data exchange between processes
  - access to registers and sampling frames (business register (BR), etc.)
A central statistical data store for managing all available data of interest, regardless of its source, enabling the NSI to produce the necessary information (statistics!) and to (re)use available data to create new data and new outputs.
[Figure: S-DWH data flow from input data to outputs. Labels: admin data sources, input reference frame, backbones (e.g. BR), BB snapshots, rules for updating BB, rules for generating samples, staging area, microdata, working data, storage and combination, selected samples, datasets, data extracts, aggregate statistics.]
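Part of the flow in this figure can be sketched in a few lines. An admin source updates a backbone such as the business register (BR), a snapshot of the backbone is frozen, and a rule generates a sample from that snapshot. This is a minimal illustration under our own assumptions. The record fields, the update rule (admin employment counts override register values) and the sampling rule (take every unit above an employment threshold) are hypothetical and are not taken from the ESSnet.

```python
from dataclasses import dataclass, replace
from typing import Dict, List


@dataclass(frozen=True)
class Unit:
    """A hypothetical business-register (backbone) record."""
    unit_id: str
    nace: str        # activity code (illustrative)
    employees: int


def update_backbone(backbone: Dict[str, Unit], admin: Dict[str, int]) -> Dict[str, Unit]:
    """Hypothetical 'rule for updating BB': admin employment counts override register values.

    Returns a new dict; the input backbone is left unchanged.
    """
    return {
        uid: replace(unit, employees=admin.get(uid, unit.employees))
        for uid, unit in backbone.items()
    }


def snapshot(backbone: Dict[str, Unit]) -> List[Unit]:
    """Freeze the backbone at a point in time (a 'BB snapshot'), sorted for reproducibility."""
    return sorted(backbone.values(), key=lambda u: u.unit_id)


def select_sample(frame: List[Unit], min_employees: int) -> List[str]:
    """Hypothetical 'rule for generating samples': a take-all stratum of large units."""
    return [u.unit_id for u in frame if u.employees >= min_employees]


if __name__ == "__main__":
    br = {
        "A1": Unit("A1", "C10", 40),
        "B2": Unit("B2", "G47", 120),
        "C3": Unit("C3", "C10", 10),
    }
    admin_source = {"A1": 55, "C3": 12}  # e.g. employment from a tax source (illustrative)
    frame = snapshot(update_backbone(br, admin_source))
    print(select_sample(frame, min_employees=50))  # ['A1', 'B2']
```

Because the backbone is updated into a new object and then snapshotted, the sample can always be traced back to a fixed version of the reference frame.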
Explaining the S-DWH
- A system, or set of integrated systems, designed to handle the processing of statistical data in the production of statistics, comprising:
  - technical facilities for storing and processing data, receiving incoming data and producing outputs in a flexible way
  - rules for updating the sources for the DWH
  - the definitions needed to derive those samples and sources
- The S-DWH is a concept that provides an architectural model of the statistical data flow, from data collection to statistical output
The S-DWH business architecture
- A conceptualisation of how to build an S-DWH
- A common model for the total statistical process and data flow
- Provides optimal organisation of all structured data, enabling reuse, the creation of new data, etc.
- Four layers, covering all statistical activities:
  - Sources
  - Integration
  - Interpretation and Analysis
  - Data Access / Output
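The four layers can be pictured as successive stages, each consuming only the output of the one below it. The following is a minimal sketch under assumptions of our own, not the ESSnet's specification. The source layer holds raw records from two sources. The integration layer links them on a unit identifier and applies crude editing. The interpretation and analysis layer derives aggregates. The access layer exposes only outputs cleared for release. All function names, record fields and data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set

Record = Dict[str, object]


def source_layer() -> Dict[str, List[Record]]:
    # Raw inputs as received: a survey and an admin source (illustrative data).
    return {
        "survey": [{"id": "A1", "turnover": 100.0}, {"id": "B2", "turnover": None}],
        "admin": [{"id": "A1", "nace": "C"}, {"id": "B2", "nace": "G"}, {"id": "C3", "nace": "C"}],
    }


def integration_layer(sources: Dict[str, List[Record]]) -> List[Record]:
    # Link the sources on the unit identifier and drop records with missing turnover
    # (a deliberately crude stand-in for real editing and imputation).
    nace = {r["id"]: r["nace"] for r in sources["admin"]}
    return [
        {"id": r["id"], "nace": nace[r["id"]], "turnover": r["turnover"]}
        for r in sources["survey"]
        if r["turnover"] is not None and r["id"] in nace
    ]


def interpretation_layer(microdata: List[Record]) -> Dict[str, float]:
    # Aggregate turnover by activity.
    totals: Dict[str, float] = defaultdict(float)
    for r in microdata:
        totals[r["nace"]] += r["turnover"]
    return dict(totals)


def access_layer(aggregates: Dict[str, float], released: Set[str]) -> Dict[str, float]:
    # Expose only the outputs cleared for release.
    return {k: v for k, v in aggregates.items() if k in released}


result = access_layer(interpretation_layer(integration_layer(source_layer())), released={"C"})
print(result)  # {'C': 100.0}
```

Each layer depends only on the data structure handed up by the previous one. In this sketch, that separation lets any one layer be reworked without touching the others.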
The layered architecture of the S-DWH, with focus on the data sources used in each layer
ESS-net DWH
Mapping the S-DWH onto the GSBPM
Use the GSBPM (Generic Statistical Business Process Model) as a common language to identify and locate the various phases on the four S-DWH layers
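As a rough illustration, such a mapping can be held as a lookup table. The phase names below follow GSBPM version 4.0. The assignment of phases to layers is an assumption made for this sketch, and the figure on the slide remains the authoritative mapping. Phases not assigned to a single layer (for example, design or evaluation) are deliberately left unmapped.

```python
from enum import Enum
from typing import Dict, List, Optional


class Layer(Enum):
    SOURCE = "Source layer"
    INTEGRATION = "Integration layer"
    INTERPRETATION = "Interpretation and analysis layer"
    ACCESS = "Data access layer"


# GSBPM v4.0 phases, numbered as in the model.
GSBPM_PHASES: Dict[int, str] = {
    1: "Specify needs", 2: "Design", 3: "Build", 4: "Collect", 5: "Process",
    6: "Analyse", 7: "Disseminate", 8: "Archive", 9: "Evaluate",
}

# ASSUMED phase-to-layer assignment, for illustration only.
PHASE_TO_LAYER: Dict[int, Layer] = {
    4: Layer.SOURCE,
    5: Layer.INTEGRATION,
    6: Layer.INTERPRETATION,
    7: Layer.ACCESS,
}


def layer_for(phase: int) -> Optional[Layer]:
    """Return the layer a GSBPM phase is assigned to, or None if it is not tied to one layer."""
    if phase not in GSBPM_PHASES:
        raise ValueError(f"unknown GSBPM phase: {phase}")
    return PHASE_TO_LAYER.get(phase)


def phases_in(layer: Layer) -> List[str]:
    """List the GSBPM phase names assigned to a given layer, in phase order."""
    return [GSBPM_PHASES[p] for p, l in sorted(PHASE_TO_LAYER.items()) if l is layer]


print(layer_for(5).value)  # Integration layer
print(layer_for(2))        # None
```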
Managing the S-DWH
- The S-DWH is a logically coherent central data store, not necessarily one single physical unit.
- Metadata is vital to governance, satisfying two essential needs:
  - to guide statisticians in processing and controlling the statistical data
  - to inform users by giving insight into the exact meaning of the statistical data
- The vertical metadata layer enables users to search all (meta)data in the four layers and, where permitted, gives access to the data.
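The vertical metadata layer can be pictured as a catalogue that indexes datasets in all four layers. Anyone may search the descriptions, but the data itself is returned only when an authorisation check passes. This is a minimal sketch. The catalogue structure, the role-based permission model and all names are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List


@dataclass
class DatasetEntry:
    name: str
    layer: str                     # e.g. "source", "integration", "interpretation", "access"
    description: str
    allowed_roles: FrozenSet[str]  # hypothetical authorisation rule
    data: List[dict] = field(default_factory=list)


class MetadataCatalogue:
    """Hypothetical catalogue spanning all layers: open search, permission-gated data access."""

    def __init__(self) -> None:
        self._entries: Dict[str, DatasetEntry] = {}

    def register(self, entry: DatasetEntry) -> None:
        self._entries[entry.name] = entry

    def search(self, term: str) -> List[str]:
        """Search names and descriptions across all layers; open to every user."""
        term = term.lower()
        return sorted(
            e.name for e in self._entries.values()
            if term in e.name.lower() or term in e.description.lower()
        )

    def get_data(self, name: str, role: str) -> List[dict]:
        """Return the data only if the caller's role is authorised for this dataset."""
        entry = self._entries[name]
        if role not in entry.allowed_roles:
            raise PermissionError(f"role {role!r} may not access {name!r}")
        return entry.data


cat = MetadataCatalogue()
cat.register(DatasetEntry("vat_2012", "source", "VAT turnover, admin source",
                          frozenset({"statistician"}), [{"id": "A1", "turnover": 100.0}]))
cat.register(DatasetEntry("turnover_by_nace", "access", "Published turnover by activity",
                          frozenset({"statistician", "public"})))
print(cat.search("turnover"))  # ['turnover_by_nace', 'vat_2012']
```

Keeping search and data retrieval as separate operations mirrors the slide's distinction between searching all (meta)data and giving access to the data only where permitted.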
Metadata layer
[Figure: the S-DWH layers (Source Layer, Integration Layer, Interpretation and Data Analysis Layer, Data Access Layer) with a vertical Metadata Layer spanning all four.]
Metadata: the DNA of the S-DWH
- Framework
- General metadata definitions
- Metadata for the S-DWH
- Use of metadata models
- Metadata standards and norms
- Metadata quality and governance
- Categories and subsets
- Minimum requirements
S-DWH metadata requirements
[Figure: S-DWH metadata requirements. Labels: subsets; standards and norms (ISO/IEC 11179); internal rules and guidelines; metadata model; S-DWH gatekeeper.]
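One way to read the "S-DWH gatekeeper" is as a check that every dataset entering the warehouse carries a minimum set of metadata before it is accepted. The sketch below assumes a hypothetical minimum set loosely inspired by the ISO/IEC 11179 practice of describing each data element by a name, a definition and a value domain. The field names and rules are illustrative and are not the ESSnet's actual minimum requirements.

```python
from typing import Any, Dict, List

# Hypothetical minimum metadata for each variable (data element).
REQUIRED_ELEMENT_FIELDS = ("name", "definition", "value_domain")
# Hypothetical minimum metadata for the dataset as a whole.
REQUIRED_DATASET_FIELDS = ("owner", "source", "confidentiality")


def gatekeeper(dataset_meta: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the dataset may enter the S-DWH."""
    problems = [f"missing dataset field: {f}"
                for f in REQUIRED_DATASET_FIELDS if not dataset_meta.get(f)]
    elements = dataset_meta.get("elements", [])
    for i, element in enumerate(elements):
        for f in REQUIRED_ELEMENT_FIELDS:
            if not element.get(f):  # absent or empty counts as missing
                problems.append(f"element {i}: missing {f}")
    if not elements:
        problems.append("no data elements described")
    return problems


meta = {
    "owner": "Business statistics",
    "source": "VAT register",
    "confidentiality": "restricted",
    "elements": [
        {"name": "turnover", "definition": "Net turnover excl. VAT", "value_domain": "EUR, >= 0"},
        {"name": "nace", "definition": "Main activity", "value_domain": ""},
    ],
}
print(gatekeeper(meta))  # ['element 1: missing value_domain']
```

Returning a list of problems rather than a simple yes/no gives the data owner a concrete to-do list before resubmitting.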
Organisational aspects
- Implementing an S-DWH has a huge organisational impact:
  - it means moving from single operations to integrated, generic processes
  - it requires a redesign of the statistical process
  - it requires new IT systems and tools, and high investments
  - it is a new way of working
- Changing systems alone will not do the trick; changing people is the key to success
ESSnet on data warehousing
Thank you!