Title: Metadata models to support the statistical cycle: IMDB
1 Metadata models to support the statistical cycle: IMDB
- Alice Born, Statistics Canada
- UNECE Workshop on Statistical Metadata
- July 4 to 6, 2007
2 Outline
- Survey life cycle and the IMDB
- IMDB model
- Data dimension model
- Business dimension model
- Questionnaire model
- Registration
- Classification of administered items
- Use of metadata in the statistical system
3 Role of the IMDB
- Information management: interpretability of Statistics Canada's 590 current surveys
- Assist in coherence of the data
- Promote knowledge sharing across STC and with external users
- Preserve corporate memory
- Promote reuse of our metadata assets
4 IMDB in the survey life cycle
[Diagram: IMDB metadata and the survey life cycle. Phases: Design, Collect, Edit, Estimate, Tabulate, Publish, Archive. Data: Operational Data, Registers, Survey Data, Administrative Data, Operational Data Stores, Data Warehouses. Functions: Operations Management, Quality Assurance, Analysis, Dissemination.]
5 IMDB metadata model
- Corporate Metadata Repository (CMR), which is an extension of ISO/IEC 11179 Metadata Registries
- Statistical surveys
- Sample
- Questionnaire
- Data sets
- Products
- Systems
- IMDB: data dimension, business dimension, questionnaire model, administration and documents model
6 Data dimension model: ISO/IEC 11179
[Diagram labels: Data Element, Data Element Concept, Object Class, Survey variable, Property, Conceptual Domain, Value Domain.]
7 Data dimension model
- Currently in the IMDB
- 85 object classes (statistical units)
- 290 properties
- 506 data element concepts (object class + property)
- 202 conceptual domains (representation class + property)
- 1509 value domains (classifications)
- 1034 data elements (representation class + property + object class = variables)
- Example: Type of revenues of establishments
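The composition rule on this slide can be sketched in a few lines of Python. This is only an illustration of the ISO/IEC 11179 pattern, not the IMDB schema: the class names, field names and the naming rule are assumptions, and the split of the example into representation class "Type", property "revenues" and object class "establishments" is one reading of the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElementConcept:
    """Object class + property (hypothetical field names)."""
    object_class: str
    prop: str


@dataclass(frozen=True)
class DataElement:
    """Data element concept + representation class, i.e. a survey variable."""
    concept: DataElementConcept
    representation_class: str

    def name(self) -> str:
        # Assumed naming rule: "<representation class> of <property> of <object class>"
        return (f"{self.representation_class} of {self.concept.prop} "
                f"of {self.concept.object_class}")


concept = DataElementConcept(object_class="establishments", prop="revenues")
element = DataElement(concept=concept, representation_class="Type")
print(element.name())
```

Because the concept is a separate object, the same object class and property pair could be reused by several data elements that differ only in representation.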
8 Business dimension model in the IMDB
[Diagram labels: Survey, Applications/Software, Survey instance, Frame and sample, Datasets, Questionnaire, Products (COR), Data elements, Survey design, Value domains.]
9 Administrative layer
[Diagram labels: Statistical Activity, Organization, Survey, Contact, Stewardship, Universe, Documentation, Frame, Identification, Survey instance, Time Frame, Instrument, Keyword, Question, Classification, Theme, Data file, Methodology, Data Element, Instrument design, Sampling, Data source, Error detection, Imputation, Estimation, Quality evaluation, Disclosure control, Revisions and seasonal adjustment, Data accuracy, Data Element Concept, Administered items, Object Class, Property, Formula, Conceptual Domain, Value Domain.]
10 Information management - Administered items
- Any item that is managed, tracked, organized and registered in a registry
- Administered items have:
- their own set of characteristics specific to the administered item
- shared administrative characteristics that are common to all administered items (the administrative layer)
11 Information management - Administrative Layer
- Shared administrative characteristics
- Terminological Designation (Names)
- Terminological Description
- Time Frame
- Organization/Contact
- Reference Document (1)
- Version Management
- Stewardship/Registration
- Classification
(1) A reference document is itself an administered item with all the administrative layer characteristics.
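One way to picture the split between shared and item-specific characteristics is the sketch below. All class names, field names and sample values are hypothetical and only mirror the bullet list above; they are not the IMDB's actual structures.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AdministrativeLayer:
    """Characteristics shared by every administered item (illustrative names)."""
    name: str                                   # terminological designation
    description: str = ""                       # terminological description
    time_frame: Optional[str] = None
    organization_contact: Optional[str] = None
    reference_documents: List["AdministeredItem"] = field(default_factory=list)
    version: int = 1
    steward: Optional[str] = None
    classifications: List[str] = field(default_factory=list)


@dataclass
class AdministeredItem:
    item_type: str                              # e.g. "survey", "reference document"
    admin: AdministrativeLayer                  # shared administrative layer
    specific: dict = field(default_factory=dict)  # item-specific characteristics


# A reference document is itself an administered item (made-up sample values).
guide = AdministeredItem("reference document",
                         AdministrativeLayer(name="Methodology guide"))
survey = AdministeredItem("survey",
                          AdministrativeLayer(name="Example survey",
                                              reference_documents=[guide]),
                          specific={"survey_type": "direct"})
print(survey.admin.reference_documents[0].item_type)
```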
12 IMDB Administrative Layer - Version Management
- A snapshot of the information recorded for the administered item.
- Rules for creation of a version are established for each type of administered item.
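The idea of a version as a snapshot can be illustrated as follows. This is a minimal sketch under assumptions: the class and method names are invented, and the per-type rules that decide when a version is created are not modelled here.

```python
import copy
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class VersionedItem:
    """Hypothetical administered item that keeps snapshots of its content."""
    content: Dict[str, str]
    versions: List[Dict[str, str]] = field(default_factory=list)

    def create_version(self) -> int:
        """Store a deep copy of the current content; return the version number."""
        self.versions.append(copy.deepcopy(self.content))
        return len(self.versions)


item = VersionedItem(content={"definition": "first wording"})
v1 = item.create_version()
item.content["definition"] = "revised wording"   # later edit
v2 = item.create_version()
print(v1, item.versions[0]["definition"])
print(v2, item.versions[1]["definition"])
```

The deep copy is what makes each version a snapshot: editing the item afterwards leaves earlier versions unchanged.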
13 Information Management - IMDB Administrative Layer
- The administrative layer is used to manage administrative information for all IMDB administered items.
- Administered items are managed in a consistent manner.
14 Surveys
- Metadata in the IMDB is organized around the survey administered item
- A survey refers to the collection, compilation and publication of data measuring characteristics of a population
- Three types of surveys are recognized:
- Direct
- Administrative
- Derived
15 Statistical Activities
- A group of surveys that share common features and common explanatory text
- E.g., System of National Accounts, Unified Enterprise Statistics, Health Statistics
16 Common metadata set
- Statistical activity
- Survey (direct, administrative, derived)
- Target population (population, statistical unit)
- Survey instance (each survey process)
- Collection instrument (questionnaire)
- Methodology
- Data accuracy
- Documentation
- Data file
- (Data elements, value domains)
17 Common metadata set for survey life cycle
- Methodology
- Instrument design
- Sampling
- Collection method
- Error detection
- Imputation
- Estimation
- Quality evaluation
- Disclosure control
- Revisions and seasonal adjustment
18 Questionnaire model
- Questionnaire: Item_ID, etc.
- Question block: Item_ID, Block_type, etc.
- Question: Item_ID, DE_item_ID, etc.
- Response choice: Question_item_ID, Response choice, etc.
- Data element: Item_ID, Representation_class, etc.
- Value domain: Item_ID, VD_type, etc.
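The key names on this slide suggest how the entities link together: a question points to a data element through DE_item_ID, and a response choice points to its question through Question_item_ID. The sketch below assumes those links and uses an in-memory SQLite database; the table names, the Question_text column and all sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE data_element (
    Item_ID INTEGER PRIMARY KEY,
    Representation_class TEXT
);
CREATE TABLE question (
    Item_ID INTEGER PRIMARY KEY,
    DE_item_ID INTEGER REFERENCES data_element(Item_ID),
    Question_text TEXT  -- assumed column, not shown on the slide
);
CREATE TABLE response_choice (
    Question_item_ID INTEGER REFERENCES question(Item_ID),
    Response_choice TEXT
);
""")

# Made-up sample rows.
conn.execute("INSERT INTO data_element VALUES (1, 'Category')")
conn.execute("INSERT INTO question VALUES (10, 1, 'What is your marital status?')")
conn.executemany("INSERT INTO response_choice VALUES (?, ?)",
                 [(10, "Married"), (10, "Single")])

# Follow the keys from question to data element and response choices.
rows = conn.execute("""
    SELECT q.Question_text, d.Representation_class, r.Response_choice
    FROM question q
    JOIN data_element d ON d.Item_ID = q.DE_item_ID
    JOIN response_choice r ON r.Question_item_ID = q.Item_ID
    ORDER BY r.Response_choice
""").fetchall()
for row in rows:
    print(row)
```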
19 Questionnaire model in the IMDB
- Metadata for survey planning and design phase
- Does the concept or question already exist?
- Metadata discovery - STCWiki
- Align with output variables - definitions
- Harmonized Content Modules Project
- Content development of key socio-demographic data elements (e.g., marital status, age, ethnic origin) in IMDB for registration as an STC standard
- Leading to development of standard question blocks and questions stored in the IMDB
- Specifications (i.e., skip patterns, modes) / BLAISE and other code stored in Survey Specification Manager
20 Registration/Stewardship
- Registration and stewardship information is managed for each administered item
- Who is the owner of the item?
- Who is responsible for the item's information?
- Who is responsible for registration?
- Verification for editorial, accuracy, bilingual conformance?
- State: new, candidate, recorded, qualified, standard, preferred/prescribed standard, retired?
- Degree of sharing/harmonization: divisional, branch, agency, provincial, national, international?
- Dissemination: internal, public?
- Versioning note
21 Registration Attributes in the IMDB
- Three registration attributes
- Registration status: identifies the quality or progression of quality
- Registration level: level of conformance or harmonization
- Administrative status: stage in the registration process
22 1. Registration status
[Diagram. Statuses: Incomplete, Candidate, Recorded, Qualified, Standard, Preferred standard, Superseded, Retired, Historical. Roles: Submitter, Steward, Responsible Owner (Content), Regular Registrar, Standards Division Registrar, Registration Authority. Other labels: Application; (Completeness, accuracy, adherence to quality and terminological description standards).]
23 2. Registration level
Level of conformance or harmonization
[Diagram labels: International, Departmental, U.S., Recommended, Program-specific, Canadian, Survey, Provincial.]
24 3. Administrative status
Stages in registration process
[Diagram labels: De-registered, Registered, Reserved for edit, Not registered, New.]
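The three registration attributes could be modelled as enumerated values attached to each administered item. The sketch below is an assumption-laden illustration: the value names are taken from the slides, but the class names, the ordering of the statuses from lowest to highest quality, and the at_least helper are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum, IntEnum


class RegistrationStatus(IntEnum):
    # Ordering is an assumption: lowest quality first.
    INCOMPLETE = 0
    CANDIDATE = 1
    RECORDED = 2
    QUALIFIED = 3
    STANDARD = 4
    PREFERRED_STANDARD = 5


class AdministrativeStatus(Enum):
    NEW = "New"
    NOT_REGISTERED = "Not registered"
    RESERVED_FOR_EDIT = "Reserved for edit"
    REGISTERED = "Registered"
    DE_REGISTERED = "De-registered"


@dataclass
class Registration:
    status: RegistrationStatus
    level: str  # e.g. "Survey", "Departmental", "International"
    administrative_status: AdministrativeStatus

    def at_least(self, minimum: RegistrationStatus) -> bool:
        """True if the item has reached the given registration status."""
        return self.status >= minimum


reg = Registration(RegistrationStatus.QUALIFIED, "Departmental",
                   AdministrativeStatus.REGISTERED)
print(reg.at_least(RegistrationStatus.RECORDED))
print(reg.at_least(RegistrationStatus.STANDARD))
```

Keeping status, level and administrative status as separate fields reflects the slides' point that quality, degree of harmonization and stage in the process are tracked independently.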
25 Classification of administered items
- Organization and classification of the administered item
- Keyword
- STC taxonomy (28 themes, 200 sub-themes)
- UNECE Classification of International Statistical Activities data elements
- Program Activity Architecture for reporting to Treasury Board Secretariat and to Parliament
- Organization of the item's administrative and item-specific information for different purposes
- HTML, Wiki, SDMX, CWM, DDI, XBRL
26 Survey design and dissemination phases
[Diagram labels: Design, Collect, Edit, Estimate, Tabulate, Publish; Survey, Universe, Frame, Instance, Collection Instrument, Methodology, Data Files, Enterprise Architecture; Concepts (Object Class, Property, Data Element Concept), Data Elements, Questions, Question Blocks, Classifications (Conceptual Domain, Value Domain).]
27 Reuse of Information Assets in Applications Development
[Diagram labels: Classification coding; Collection instrument development (Survey Specification Manager, Integrated Questionnaire and Metadata System); Publishing; Other applications (Software Register).]
28 Reuse of Information Assets: Integration with Data
[Diagram labels: Data Warehouses, CANSIM.]
29 Reuse of Information Assets in Dissemination and information discovery
- One metadata source
- many uses for the information
- many output formats
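The "one source, many output formats" idea can be shown with a toy example. The record and both renderer functions are hypothetical, and JSON and HTML merely stand in for whatever formats a real system would target.

```python
import json
from html import escape

# Made-up metadata record serving as the single source.
record = {"name": "Example survey", "survey_type": "direct"}


def to_json(item: dict) -> str:
    """Render the record as JSON with keys in a stable order."""
    return json.dumps(item, sort_keys=True)


def to_html(item: dict) -> str:
    """Render the same record as an HTML definition list."""
    rows = "".join(f"<dt>{escape(key)}</dt><dd>{escape(str(value))}</dd>"
                   for key, value in sorted(item.items()))
    return f"<dl>{rows}</dl>"


print(to_json(record))
print(to_html(record))
```

Each new output format is just another function over the same record, so the metadata is maintained in one place only.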
30 Corporate Memory Data Files: Dissemination and archive phases
[Diagram labels: Operational Data, Registers, Survey Data, Administrative Data, Operational Data Stores, Public Use Master File, Archival information, Clean Master File, Archived Data.]