Title: Combining Metadata Standards: Approaches and Benefits
1. Combining Metadata Standards: Approaches and Benefits
- Arofan Gregory
- Open Data Foundation
2. Overview
- Recent events of interest
- The Standards: Comparison and Explanation
- Emerging Implementation Approaches
  - DDI and SDMX
  - SDMX and the Semantic Web Technologies
  - Classifications and Multiple Standards
- Ideas about Future Work
3. Recent Events of Interest
- Note: Some of these events/implementations have been or will be described in detail in other papers; they are only mentioned here.
- Schloss Dagstuhl, Germany, November 2009 (DDI 3 Workshop)
  - SDMX 2.0 / DDI 3 field-level mapping work started
  - Topic: DDI and the Semantic Web (?)
4. Recent Events of Interest (2)
- Semantic Web and SDMX
  - ONS hosted a 2-day meeting in the UK, February 2009 (produced a draft SDMX-RDF)
  - Banca d'Italia has a prototype project
  - New project launched at the University of Tilburg in the Netherlands (RDF expression of OECD SDMX data)
- Australian Bureau of Statistics (ABS) has started looking at SDMX and DDI to support the data production lifecycle
  - Prototype implementations
- Some other NSIs are also very interested
5. Recent Events of Interest (3)
- Classifications and ISO/IEC 11179
  - Australian Government agencies are looking to exchange classifications with the ABS from an existing ISO/IEC 11179 system, using SDMX and DDI
  - Statistics Canada: evaluation of IMDB (an ISO/IEC 11179-based metadata repository) for use in coordination with the Canadian RDC Network (based on DDI 3)
6. What Does This Mean?
- This is not a complete list of events/implementations, but it indicates the interest we are seeing in the combined use of standards!
- These are not just experiments!
  - Organizations are now looking at implementation in a serious way
7. Characterizing the Standards
- SDMX
  - Data structures and formats
  - Reference metadata structures and formats
  - Web-services architecture based on registry services
  - Content-oriented guidelines
- ISO/IEC 11179
  - Model for managing concepts and data elements
  - Metadata registries and lifecycle
- ISO 19115
  - Standard metadata model for geographies
  - Used by DDI as its geographical model
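To make the ISO/IEC 11179 idea concrete: in its metamodel, a data element combines a data element concept (an object class plus a property) with a value domain. The Python sketch below is a minimal, hypothetical rendering of that core, limited to enumerated value domains; the class and method names are illustrative assumptions, not part of the standard or of any registry API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElementConcept:
    object_class: str   # e.g. "Person"
    property_term: str  # e.g. "Sex"


@dataclass(frozen=True)
class ValueDomain:
    name: str
    permissible_values: frozenset  # enumerated value domain only (simplification)


@dataclass(frozen=True)
class DataElement:
    concept: DataElementConcept
    value_domain: ValueDomain

    @property
    def name(self) -> str:
        # Illustrative name only: object class followed by property
        return f"{self.concept.object_class} {self.concept.property_term}"

    def validate(self, value: str) -> bool:
        # A value is valid if the value domain permits it
        return value in self.value_domain.permissible_values
```

Separating the concept from its value domain is what lets the same concept be reused with different representations, which is the kind of reuse a registry manages.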
8. Characterizing the Standards (2)
- Dublin Core
  - Citation metadata
  - Widely used in the Semantic Web
  - Used natively by DDI for citations
- Semantic Web / Linked Data / RDF
  - See Open Issues on the Semantic Web
- DDI 3
  - Described in more detail here, as it is less familiar to the METIS community
9. Characterizing the Standards (3)
- DDI 1.x/2.x was a standard used by archives and data libraries
  - Based on a codebook model
  - Used by some NSIs, especially in the developing world, because of the IHSN Metadata Management Toolkit
  - Used by CESSDA, the European network of data archives
  - Used by many data archives in North America
  - Documents a single study (survey)
  - Designed to help researchers find and use microdata
- DDI 3 is more ambitious: capture and use of metadata throughout the entire data lifecycle
10. DDI 3 Lifecycle Model
Note: This is very similar to a high-level view of the METIS model!
11. Characterizing the Standards (4)
- DDI 3 provides machine-actionable metadata to support metadata-driven systems throughout the lifecycle
  - Focus is on upstream metadata capture and reuse
  - Describes tabulation/aggregation of microdata
  - Provides support for comparison across surveys, detailed geography, data processing, and register data
  - Aggregate NCube model aligned with SDMX
  - No architecture/web-services support (yet)
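DDI 3 describes how microdata are tabulated into aggregates held in an NCube aligned with SDMX. The Python sketch below shows only the tabulation step, assuming microdata arrive as dictionaries; the function names are hypothetical and this is not DDI's own NCube schema. Each cell is keyed by one value per dimension, and the dot-joined key mirrors the SDMX convention for series keys.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Optional, Tuple

Cell = Tuple[str, ...]


def tabulate(records: Iterable[Mapping[str, object]],
             dimensions: Tuple[str, ...],
             measure: Optional[str] = None) -> Dict[Cell, float]:
    """Aggregate microdata into an n-cube.

    With no measure, each cell holds a count of records;
    otherwise it holds the sum of the measure.
    """
    cube: Dict[Cell, float] = defaultdict(float)
    for rec in records:
        key = tuple(str(rec[d]) for d in dimensions)
        cube[key] += 1 if measure is None else float(rec[measure])
    return dict(cube)


def series_key(cell: Cell) -> str:
    # SDMX series keys join dimension values with "."
    return ".".join(cell)
```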
12. An Observation
- It is easy to say that two standards are aligned
  - Many of these standards were intentionally aligned as they were developed
- It is much more difficult to understand how to use them in combination effectively
13. Approaches and Benefits
- SDMX and DDI
  - DDI microdata production / SDMX aggregate dissemination
  - Using SDMX data in DDI-based systems (combining aggregates and microdata)
  - Combined SDMX/DDI supporting the entire data lifecycle
  - DDI register data reported to an SDMX collection system
- SDMX and the Semantic Web
- Classifications and the Standards
14. [Diagram: DDI 3 metadata; surveys; registers; input data; cleaning, editing, estimation, aggregation, etc.; dissemination data; website/web service; SDMX-ML data, metadata, structure]
15. DDI-to-SDMX Benefits
- The benefits of this approach are those found by using the standards generally
  - Supports a metadata-driven system for data production throughout the lifecycle (DDI)
  - Metadata-rich dissemination format, preferred by data collectors (SDMX)
  - Shared tools: SDMX registry services, Web services for discovery and use of aggregates
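In the DDI-to-SDMX case, aggregates produced under DDI 3 metadata are published in SDMX. As a rough sketch of the output side only, the Python below writes aggregates in a layout loosely modeled on compact SDMX-ML 2.0: Series elements carry dimension values as attributes, and Obs elements carry TIME_PERIOD and OBS_VALUE. It omits namespaces, the message header and the data structure definition, so the result is not schema-valid SDMX-ML; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

# A series key is a tuple of (dimension id, value) pairs
SeriesKey = Tuple[Tuple[str, str], ...]


def to_compact_like_xml(series: Dict[SeriesKey, List[Tuple[str, float]]]) -> str:
    """Render aggregates as namespace-free, compact-SDMX-ML-like XML."""
    dataset = ET.Element("DataSet")
    for key, observations in series.items():
        # Dimension values become attributes of the Series element
        series_el = ET.SubElement(dataset, "Series", dict(key))
        for period, value in observations:
            # One Obs element per (time period, value) pair
            ET.SubElement(series_el, "Obs",
                          {"TIME_PERIOD": period, "OBS_VALUE": str(value)})
    return ET.tostring(dataset, encoding="unicode")
```

In a real system the dimension ids would come from the SDMX data structure definition rather than being passed in by hand.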
16. SDMX-to-DDI: Integrating Aggregates and Microdata
- Scenario is common in some research
  - Economic data is often only available as aggregates
- Challenge is to combine aggregates with microdata
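One common way researchers combine the two is to attach a published aggregate (for example, a regional rate obtained via SDMX) to each microdata record as a contextual variable. The Python sketch below assumes the aggregate has already been converted to a dictionary keyed by the matching dimension values; all names are hypothetical, and records with no matching aggregate get None rather than an invented value.

```python
from typing import Dict, List, Mapping, Tuple


def attach_context(microdata: List[Dict[str, object]],
                   aggregates: Mapping[Tuple[str, ...], float],
                   keys: Tuple[str, ...],
                   new_var: str) -> List[Dict[str, object]]:
    """Return copies of the records with the matching aggregate added."""
    out = []
    for rec in microdata:
        # Build the lookup key from the record's own fields
        k = tuple(str(rec[f]) for f in keys)
        # Copy the record so the input is left unchanged
        out.append({**rec, new_var: aggregates.get(k)})
    return out
```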
17. [Diagram: SDMX web service; SDMX-to-DDI 3 transform; data archive/repository; surveys (DDI 3); registers (DDI 3); processing to produce integrated data and metadata (DDI 3)]
18. SDMX-to-DDI Benefits
- Allows easy use of official statistics by researchers
- Solves problems of combining aggregates and microdata
- Note: This does not involve disaggregation of published data
  - Structural transformation only, to allow DDI 3 systems to process aggregates easily
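Because the transformation is structural, observations are reshaped so a DDI 3-based system can treat them like any other tabular input, but no value is changed or split. The Python sketch below illustrates that reshaping for compact-SDMX-ML-like input; it does not produce DDI 3 XML, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def _local(tag: str) -> str:
    # Strip an "{namespace}" prefix if present
    return tag.rsplit("}", 1)[-1]


def flatten_compact_like(xml_text: str) -> List[Dict[str, str]]:
    """One flat record per observation: series dimension values plus
    observation attributes, copied unchanged (no disaggregation)."""
    root = ET.fromstring(xml_text)
    records = []
    for series in root.iter():
        if _local(series.tag) != "Series":
            continue
        for obs in series:
            if _local(obs.tag) == "Obs":
                records.append({**series.attrib, **obs.attrib})
    return records
```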
19. DDI + SDMX: The Data Lifecycle
- Uses a metadata model that can be expressed as either SDMX or DDI, depending on the task
- Provides support for process management
  - Uses many features of SDMX (process model, structure sets, reporting taxonomies, etc.)
  - Uses the SDMX architecture/services model
- Designed to allow incorporation of other standards
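A metadata model "capable of expression as either SDMX or DDI" can be pictured as one format-neutral object with two serializers. The Python sketch below assumes a single coded concept and emits two simplified, namespace-free XML shapes loosely inspired by an SDMX codelist and a DDI 3 code scheme; the element names and identifier prefixes are illustrative assumptions, and neither output is valid against the real schemas.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Dict


@dataclass
class CodedItem:
    """Format-neutral description of a coded statistical concept."""
    id: str
    label: str
    codes: Dict[str, str]  # code value -> label


def as_sdmx_like(item: CodedItem) -> ET.Element:
    # Simplified shape inspired by an SDMX codelist (hypothetical names)
    cl = ET.Element("CodeList", {"id": f"CL_{item.id}"})
    ET.SubElement(cl, "Name").text = item.label
    for value, label in item.codes.items():
        code = ET.SubElement(cl, "Code", {"value": value})
        ET.SubElement(code, "Description").text = label
    return cl


def as_ddi_like(item: CodedItem) -> ET.Element:
    # Simplified shape inspired by a DDI 3 code scheme (hypothetical names)
    cs = ET.Element("CodeScheme", {"id": f"CS_{item.id}"})
    ET.SubElement(cs, "Label").text = item.label
    for value, label in item.codes.items():
        code = ET.SubElement(cs, "Code")
        ET.SubElement(code, "Value").text = value
        ET.SubElement(code, "CategoryLabel").text = label
    return cs


SERIALIZERS = {"sdmx": as_sdmx_like, "ddi": as_ddi_like}


def express(item: CodedItem, target: str) -> str:
    """Serialize the same neutral item in the requested format."""
    return ET.tostring(SERIALIZERS[target](item), encoding="unicode")
```

The design choice is that the neutral object, not either XML format, is the master copy, so adding another standard means adding one serializer.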
20. [Diagram: process-management system (BPML); SDMX registry; surveys (DDI 3); registers (DDI 3); input data store; dissemination data store; data and metadata repositories/application databases; outputs to web site, print and web services (SDMX, DDI, etc.)]
- All registry interactions use SDMX
- Interactions between systems are DDI or SDMX Web services, as appropriate
21. SDMX + DDI Benefits
- Leverages Web-services technologies (registry, event triggers, etc.) for efficient automation, migration, and flexibility
- Choice of tools is broad
- Use the best format for any given task
- All the benefits of the DDI-to-SDMX case
- Good support for process management as well as data management
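SDMX registries can notify subscribers when artefacts are registered, which is the kind of event trigger that drives automation here. The toy class below only illustrates the subscribe/notify pattern in memory; it is a hypothetical stand-in, not the SDMX registry interface, and the artefact type names and URNs are assumptions.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Callback = Callable[[str, dict], None]


class ToyRegistry:
    """In-memory illustration of registry event triggers."""

    def __init__(self) -> None:
        self._artefacts: Dict[str, dict] = {}
        self._subscribers: Dict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, artefact_type: str, callback: Callback) -> None:
        # A process step asks to be told about new artefacts of one type
        self._subscribers[artefact_type].append(callback)

    def register(self, urn: str, artefact_type: str, payload: dict) -> None:
        # Store the artefact, then notify every subscriber for its type
        self._artefacts[urn] = payload
        for callback in self._subscribers[artefact_type]:
            callback(urn, payload)

    def lookup(self, urn: str) -> dict:
        return self._artefacts[urn]
```

For example, an aggregation step could subscribe to new data set registrations and start automatically, instead of being scheduled by hand.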
22. SDMX and the Semantic Web Technologies
- Potentially applies to other standards as well (DDI, ISO/IEC 11179, etc.)
- Note that Semantic Web technologies apply only to dissemination
  - They are not designed to support data production
- Terms
  - "Raw data" in an SW context does not mean raw (unprocessed) data in the statistical sense
  - "Data" in an SW context means anything that can be described using RDF, not only numeric data
23. Assumptions
- Creation of a harmonized statistical model based on proven models/standards, but expressed as RDF (an ontology or vocabulary, in SW terms)
- Implementation of an SDMX-RDF in standard SDMX dissemination packages
24. [Diagram: internal (production environment) vs. external (dissemination to Web); SDMX-driven production system; dissemination data store (SDMX); SDMX web service (SDMX-ML); SDMX-RDF transform; triplestore (SDMX-RDF); SPARQL queries; RDF]
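The SDMX-RDF transform turns each observation into RDF. The Python sketch below emits N-Triples, assuming observations arrive as flat dictionaries of dimension and measure values; the base URI, the property namespace and the Observation class are hypothetical placeholders under example.org, not terms from the draft SDMX-RDF vocabulary.

```python
from typing import Dict, List

BASE = "http://example.org/data/"    # hypothetical base URI
VOCAB = "http://example.org/vocab#"  # hypothetical property namespace
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
XSD_DECIMAL = "<http://www.w3.org/2001/XMLSchema#decimal>"


def literal(text: str) -> str:
    # N-Triples string literal with backslash, quote and newline escaped
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def observation_triples(dataset_id: str, index: int, record: Dict[str, str],
                        measure: str = "OBS_VALUE") -> List[str]:
    """Express one observation as N-Triples lines."""
    subject = f"<{BASE}{dataset_id}/obs/{index}>"
    triples = [
        f"{subject} {RDF_TYPE} <{VOCAB}Observation> .",
        f"{subject} <{VOCAB}dataSet> <{BASE}{dataset_id}> .",
    ]
    for field, value in record.items():
        if field == measure:
            # Measure typed as xsd:decimal; assumes a decimal lexical form
            triples.append(f"{subject} <{VOCAB}{field}> {literal(value)}^^{XSD_DECIMAL} .")
        else:
            triples.append(f"{subject} <{VOCAB}{field}> {literal(value)} .")
    return triples
```

Each observation yields two structural triples plus one per field, each repeating the full subject URI, which is one reason RDF output grows quickly.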
25. SDMX and the Semantic Web Benefits
- Leverages the Linked Data phenomenon without requiring a deep understanding of RDF, etc.
- Uses existing standards/models and best practices to do the heavy lifting (data production)
- Puts a lot of reliable, quality data into the Linked Data Web
- Helps address issues of provenance
26. Warning
- RDF is verbose!
  - The same data: 4.5 MB as GESMES/TS, 45 MB as compact SDMX-ML XML, 420 MB as RDF triples
- This may encourage the on-demand production of RDF data from web services, rather than static files
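The figures on the slide imply roughly a tenfold increase at each step (45 / 4.5 = 10 and 420 / 45 ≈ 9.3), or about 93 times the GESMES/TS size overall. The Python sketch below reproduces that arithmetic and shows the on-demand alternative as a generator that produces RDF page by page as a client reads it; `fetch_page` is a hypothetical callback standing in for a web-service query.

```python
from typing import Callable, Dict, Iterator, List

# Sizes quoted on the slide for the same data, in megabytes
SIZES_MB = {"GESMES/TS": 4.5, "compact SDMX-ML": 45.0, "RDF triples": 420.0}


def growth(sizes: Dict[str, float]) -> Dict[str, float]:
    """Size of each format relative to GESMES/TS, rounded to one decimal."""
    base = sizes["GESMES/TS"]
    return {fmt: round(size / base, 1) for fmt, size in sizes.items()}


def rdf_on_demand(fetch_page: Callable[[int], List[str]]) -> Iterator[str]:
    """Yield N-Triples lines lazily instead of materializing a static file.

    fetch_page(n) is assumed to return the triples for page n,
    or an empty list once the data is exhausted.
    """
    page = 0
    while True:
        triples = fetch_page(page)
        if not triples:
            return
        yield from triples
        page += 1
```

Only the pages a client actually reads are ever generated, so the 420 MB never needs to exist as a file.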
27. Standards and Classifications
- Some maintainers of standard classifications are looking at expressing them in useful formats (SDMX, DDI)
  - This is an easy thing to do
  - It is very useful: it promotes reuse, comparability, etc.
- This could apply to Semantic Web RDF expressions as well as XML-based standards
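As one possible RDF expression of a hierarchical classification, the Python sketch below emits N-Triples using the W3C SKOS vocabulary (skos:ConceptScheme, skos:Concept, skos:notation, skos:prefLabel, skos:broader, skos:hasTopConcept). SKOS is not named in the slides; it is an assumption used here for illustration, and the scheme URI passed in is a placeholder.

```python
from typing import List, Optional, Tuple

SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"


def lit(text: str) -> str:
    # N-Triples string literal with backslash, quote and newline escaped
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def classification_to_skos(scheme_uri: str,
                           items: List[Tuple[str, str, Optional[str]]]) -> List[str]:
    """Each item is (code, label, parent code or None); returns N-Triples lines."""
    out = [f"<{scheme_uri}> {RDF_TYPE} <{SKOS}ConceptScheme> ."]
    for code, label, parent in items:
        concept = f"<{scheme_uri}/{code}>"
        out += [
            f"{concept} {RDF_TYPE} <{SKOS}Concept> .",
            f"{concept} <{SKOS}inScheme> <{scheme_uri}> .",
            f"{concept} <{SKOS}notation> {lit(code)} .",
            f"{concept} <{SKOS}prefLabel> {lit(label)} .",
        ]
        if parent is None:
            # Top-level codes hang directly off the scheme
            out.append(f"<{scheme_uri}> <{SKOS}hasTopConcept> {concept} .")
        else:
            # Lower-level codes point to their parent
            out.append(f"{concept} <{SKOS}broader> <{scheme_uri}/{parent}> .")
    return out
```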
28. Ideas for Future Work
- Endorse the SDMX-DDI mappings now being produced
- Develop an SDMX-RDF (?), or
- Develop a harmonized statistical model for expression in RDF (based on DDI, SDMX, ISO/IEC 11179) (?)
  - Encourage tool developers to implement it in standard dissemination packages
- Publish standard classifications in standard formats
29. Summary
- Combined use of standards is becoming a reality
- Proactive engagement with the Semantic Web world could benefit all concerned parties, as well as users