Title: Input Data Warehousing Canada
1 Input Data Warehousing: Canada's Experience with
Establishment Level Information
- Presentation to the Third International
Conference on Establishment Statistics
- Montreal, QC
- June 20, 2007
2 Overview
- Introduction of data warehousing as a concept
- Approaches to holding data
- Introduction to Statistics Canada's Unified
Enterprise Survey (UES) Program
- Centralized warehousing of UES data
- Example of the data warehouse at work
3 Subject-matter areas need or generate different
types of information
- Data to support collection
- Questionnaires and supporting metadata
- Frame and sample information
- Status of each respondent during collection
- Survey data
- Administrative data
- Post-collection processing
- Edits (metadata)
- Imputation specifications
- Allocation specifications
- Generation of clean datasets
- Tabulation of estimates/analysis of results
- Value of estimate
- Data quality indicators
- Suppression patterns
- Analysis of coherence
Input Data
4 Input Data Warehouse
- A copy of statistical input data specifically
structured for querying and reporting
- Collection
- Post-collection processing
- Tabulation of estimates
5 Approaches to organizing information holdings
- Decentralized
- In a completely decentralized approach, each
subject matter area maintains its own input data
- Centralized
- Centralized data warehouse contains all input
data from all subject matter program areas
- All program areas need to use common concepts and
standards for classification, or else a
concordance would have to be found among these
systems.
- These are extremes along a continuum
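The concordance mentioned on this slide can be pictured as a lookup from each program area's own classification codes to a common classification. The following is a minimal sketch only: the codes, the `CONCORDANCE` table, and the function names are hypothetical and are not taken from any Statistics Canada system.

```python
# Hypothetical concordance: local program-area codes -> common classification codes.
# All codes below are invented for illustration.
CONCORDANCE = {
    "RET-01": "C1",
    "RET-02": "C1",
    "MFG-10": "C2",
}


def to_common_code(local_code, concordance=CONCORDANCE):
    """Return the common code for a local code, or None if no mapping exists."""
    return concordance.get(local_code)


def harmonize(records, concordance=CONCORDANCE):
    """Split records into those that map to the common classification
    and those that need a concordance entry before they can be integrated."""
    mapped, unmapped = [], []
    for record in records:
        common = to_common_code(record["local_code"], concordance)
        if common is None:
            unmapped.append(record)
        else:
            mapped.append({**record, "common_code": common})
    return mapped, unmapped


records = [
    {"id": 1, "local_code": "RET-01"},
    {"id": 2, "local_code": "MFG-10"},
    {"id": 3, "local_code": "XYZ-99"},  # no concordance entry
]
mapped, unmapped = harmonize(records)
```

Records left in `unmapped` are the ones for which a concordance would still have to be found before the data could sit in a common warehouse.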
6 Centralized approach
- Advantages
- Economies of scale should lead to reduced overall
development and maintenance costs
- Some human resource issues are eased (knowledge
and skills retention and transfer)
- Eases integration of data to support data
analysis, coherence analysis, etc.
- Allows subject-matter divisions to specialize in
data analysis rather than data management
7 Decentralized approach
- Advantages
- Specialized subject matter expertise readily
available
- Subject matter areas are not dependent on a
central authority to make changes, so
flexibility is increased
- Care and control of the data is clearly
established
8 Questions to address in moving to a more
centralized environment
- What purpose does it serve?
- What must be done to the statistical model to
ensure compatibility with other data sources?
- What mechanisms need to be in place to ensure
a productive client-service relationship?
- Who is custodian of the data?
- Do the benefits of moving to a more centralized
environment truly outweigh the costs?
9 Statistics Canada and the Unified Enterprise
Survey Program
- In the late 1990s, Statistics Canada undertook a
major program to improve the quality of the
provincial economic accounts released by the
Agency and the annual business surveys that feed
into those accounts
- These surveys were integrated in order to
increase the quality of the data they produce
in terms of
- Consistency
- Coherence
- Breadth
- Depth
10 Features of the UES
- Improved frame (business register)
- Sampling made consistent across surveys, with
improved coverage
- Harmonized content and common collection
applications
- Administrative data are to be used instead of
survey data if possible and if the data are of
good quality
- Common post-collection processing systems
- Common storage of data
- Central contact management system
- Improvements in outputs
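The rule on this slide for using administrative data in place of survey data can be read as a simple fallback. The sketch below assumes a per-value quality flag already exists; the function name, arguments, and the way quality is assessed are all hypothetical, since the slides do not describe how the UES makes this decision.

```python
def choose_value(survey_value, admin_value, admin_quality_ok):
    """Prefer the administrative value when one exists and it passed a
    quality check (hypothetical flag); otherwise fall back to the survey value.

    Returns a (value, source) pair so the origin of each value stays traceable.
    """
    if admin_value is not None and admin_quality_ok:
        return admin_value, "admin"
    return survey_value, "survey"


# Usage: one call per variable for a given business.
revenue, source = choose_value(survey_value=100, admin_value=95, admin_quality_ok=True)
```

Returning the source alongside the value is a design choice for the sketch: it keeps a record of where each stored value came from.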
11 Moving to a more centralized environment
- What is the purpose?
- The UES data warehouse forms a repository of all
the files created through the processing phases
of UES and accompanying metadata.
- This supports the work of analysts and survey
managers in subject matter divisions, collection
managers, statistical methodologists and users in
the System of National Accounts
12 Moving to a more centralized environment
- What must be done to the statistical model to
ensure compatibility with other data sources?
- The statistical model for UES surveys forced the
harmonization of concepts, definitions and
classifications across surveys
- Integration of survey and administrative data
required the mapping of tax data to survey data
(harmonized conceptually as well as
characteristically)
13 Moving to a more centralized environment
- What mechanisms need to be in place to ensure
a productive client-service relationship?
- Project management structure for the UES that
crosses functional boundaries
- Change management function to ensure seamless
integration of surveys into UES
14 Moving to a more centralized environment
- Who is custodian of the data?
- Enterprise Statistics Division (ESD) controls
access to all common systems.
- Subject matter divisions are exclusively
responsible for dissemination, including the
determination of aggregations and data
suppressions (due to quality and confidentiality)
15 Moving to a more centralized environment
- Do the benefits of moving to a more centralized
environment truly outweigh the costs?
- Reduction in development costs
- Development of best practices that can be shared
across the bureau
- Single point of access for input data improves
security of all UES-related data
- Rationalization of hardware to minimize the
number of servers
16 The UES Data Warehouse
- UES Warehouse is centrally managed within
Enterprise Statistics Division
- Major components of the data warehouse include
- Metadata repository
- Processing metadata
- Central data store (CDS)
- External data
- Data that originate outside UES but have been
integrated in the UES framework
17 The UES Data Warehouse
- Systems interfacing with the data warehouse
- Unified Tracking and Retrieval Tool (USTART)
- Integrated Questionnaire Metadata System (IQMS)
- UES Processing Interface
- Working Estimation Environment (WEE) interface
- Macro-data adjustment Facility
18 Operational applications
- Operational monitoring
- Coherence analysis
- Baseline information for operational research
- Quality measures (e.g. response rate analysis)
- Integrated data analysis
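As an illustration of the response rate analysis listed above, the sketch below computes an unweighted response rate per survey from collection status records held in one place. The record layout, the status labels, and the choice of an unweighted rate (responding units divided by in-scope units) are assumptions made for the example, not the UES definition.

```python
from collections import defaultdict


def response_rates(units):
    """Unweighted response rate per survey: responded / in-scope units.

    Each unit is a dict with hypothetical keys "survey" and "status";
    status is one of "responded", "nonresponse", "out_of_scope".
    """
    counts = defaultdict(lambda: {"in_scope": 0, "responded": 0})
    for unit in units:
        if unit["status"] == "out_of_scope":
            continue  # out-of-scope units do not enter the denominator
        tally = counts[unit["survey"]]
        tally["in_scope"] += 1
        if unit["status"] == "responded":
            tally["responded"] += 1
    # Every survey present in counts has at least one in-scope unit.
    return {s: t["responded"] / t["in_scope"] for s, t in counts.items()}


units = [
    {"survey": "A", "status": "responded"},
    {"survey": "A", "status": "nonresponse"},
    {"survey": "B", "status": "responded"},
    {"survey": "B", "status": "out_of_scope"},
]
rates = response_rates(units)
```

Because every survey is run through the same function against the same status records, the resulting rates are comparable across surveys.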
19 Response rates in collection
22 Final response rates
23 The centralized system in action
- Outcomes
- The centralized input data warehouse provides a
single tool that allows users to track
performance on a consistent basis
- Same method
- Same source data
24 Conclusion
- The centralized data warehouse offers benefits to
statistical programs
- There are a number of conditions that must be
fulfilled for success
- Purpose
- Data compatibility
- Client-service relationship
- Custodian of data
- Cost-benefit