A Semantic Modelling Approach to Biological Parameter Interoperability

1
A Semantic Modelling Approach to Biological
Parameter Interoperability
Ocean Biodiversity Informatics
Roy Lowry and Laura Bird, British Oceanographic Data Centre
Pieter Haaring, RIKZ, Rijkswaterstaat, The Netherlands
2
Presentation Overview
  • The nature of the problem
  • Dictionaries and data models
  • The starting position
  • Manual mapping
  • Automation through semantic matching
  • From dictionary to semantic model
  • Mapping semantic models
  • Semantic model applications
  • Conclusions and lessons learned

3
The Nature of the Problem
  • BODC and Rijkswaterstaat both have marine
    databases holding a wide range of physical,
    chemical and biological parameters
  • Both were to be included in pan-European
    metadatabases (EDIOS and SEA-SEARCH CDI) using a
    common discovery vocabulary
  • BODC set up the vocabulary and obviously included
    a mapping to the BODC Parameter Dictionary
  • The problem arose of how to provide a similar
    mapping for Rijkswaterstaat
  • If the Rijkswaterstaat data markup vocabulary
    could be mapped to the BODC Parameter Dictionary
    then the BODC discovery vocabulary mapping could
    be used

4
Dictionaries and Data Models
  • BODC systems have roots in the GF3 model, which
    means:
      • Data values are linked to a parameter code
      • The parameter code is defined in a Parameter
        Dictionary
      • The parameter code specifies more than one
        metadata item for the data value
  • For chemical and biological data, "more than one"
    becomes a lot

5
Dictionaries and Data Models
  • Rijkswaterstaat uses data models (DONAR becoming
    WADI)
  • Measurements are accompanied by attributes
    containing specific atomic metadata items
  • Each attribute is populated from a controlled
    vocabulary
  • DONAR constrains attribute term combinations
    using a parameter dictionary concept
  • WADI reduces maintenance overheads by allowing
    any combination

6
The Starting Position
  • BODC
      • Parameter Codes defined by two plain-text fields
      • Related semantic information not necessarily in
        the same field
      • Fields would not concatenate sensibly
      • OK for humans, but not for machines
  • Rijkswaterstaat
      • Consistently located semantics
      • Metadata fields that concatenate sensibly in both
        Dutch and English

7
Manual Mapping
  • Manual mapping protocol
      • For each entry in the Rijkswaterstaat
        dictionary spreadsheet
          • Look up code with identical meaning using BODC
            Dictionary search tools (Access Filter by Form)
          • If found
              • Copy BODC code from Access and paste into
                spreadsheet
          • Else
              • Prepare dictionary update record and submit
                for QA and load
  • Error-prone, and 500 entries is pushing the limit
    of human endurance!
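
The protocol above can be sketched in a few lines of Python. This is only an illustration: the descriptions and codes are invented, and an exact string lookup stands in for the human judgement of "identical meaning" that the real procedure relied on.

```python
# Hypothetical data: neither the codes nor the descriptions come from the
# real BODC or Rijkswaterstaat dictionaries.
bodc_dictionary = {
    "Abundance of Calanus per unit volume of the water column": "BODC0001",
    "Temperature of the water column": "BODC0002",
}

rws_entries = [
    "Abundance of Calanus per unit volume of the water column",
    "Biomass of Calanus per unit volume of the water column",
]


def map_entries(entries, dictionary):
    """Apply the if/else protocol to every entry.

    Returns the entries that found a code and the entries that need a
    dictionary update record.
    """
    mapping = {}
    update_requests = []
    for entry in entries:
        code = dictionary.get(entry)  # stand-in for the manual look-up
        if code is not None:
            mapping[entry] = code  # "copy and paste" the code
        else:
            update_requests.append(entry)  # queue for QA and load
    return mapping, update_requests


mapping, update_requests = map_entries(rws_entries, bodc_dictionary)
print(mapping)
print(update_requests)
```

Every entry costs one look-up against the whole dictionary, which is why the manual version scales so poorly.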

8
Semantic Matching
  • When code lists run into thousands, automation is
    required
  • Rijkswaterstaat developed a semantic matching
    tool to pull matching terms (preferably one) from
    the BODC dictionary
  • Defeated by the lack of standardisation in the
    BODC plain-text fields, e.g.:
      • Calanus abundance
      • Abundance of Calanus
      • Calanus count
      • Number of Calanus
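
A small sketch shows why these four phrasings defeat simple text matching. The normalisation function, stop-word list and synonym table are assumptions made for illustration; they are not a description of the Rijkswaterstaat tool.

```python
variants = [
    "Calanus abundance",
    "Abundance of Calanus",
    "Calanus count",
    "Number of Calanus",
]

STOPWORDS = frozenset({"of"})
# Assumed mini-thesaurus: treats "count" and "number" as "abundance".
SYNONYMS = {"count": "abundance", "number": "abundance"}


def key(text, stopwords=frozenset(), synonyms=None):
    """Reduce a phrase to a sorted tuple of normalised words."""
    synonyms = synonyms or {}
    words = [w for w in text.lower().split() if w not in stopwords]
    return tuple(sorted(synonyms.get(w, w) for w in words))


plain = {key(v) for v in variants}
no_stopwords = {key(v, STOPWORDS) for v in variants}
with_synonyms = {key(v, STOPWORDS, SYNONYMS) for v in variants}

print(len(plain), len(no_stopwords), len(with_synonyms))
```

Word order and stop words are easy to neutralise, but the phrases only collapse to a single key once a synonym table is supplied, and that table is exactly the standardisation the plain-text fields lacked.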

9
Dictionary to Semantic Model
  • Became apparent that the BODC Dictionary required
    significant improvement if it was to support
    mapping automation
  • Development strategy was to model the parameter
    code in the same way DONAR models a measurement
  • Semantic model developed to cover all codes in
    BODC Dictionary

10
Dictionary to Semantic Model
  • Semantic model developed from DONAR with an
    increased semantic element count to overcome
    shoe-horning
  • The principle that semantic elements may be
    combined automatically to produce text
    descriptions was maintained
  • Currently implemented as three sub-models
  • Element superset will ultimately be created as a
    single model

11
Dictionary to Semantic Model
  • Biological sub-model semantic elements
      • Parameter (Abundance, Biomass)
      • Taxon_code (ITIS code)
      • Taxon_name
      • Taxon_subgroup (gender, size, stage)
      • Parameter_compartment_relationship (per unit
        volume of the, per unit area of the)
      • Compartment (water column, bed, sediment)
      • Sample_preparation
      • Analysis
      • Data_processing
  • Needs further refinement, e.g. subdivide
    Taxon_subgroup
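
As a rough illustration, the biological sub-model could be held as a record whose elements concatenate into a text description. The element names follow the slide; the concatenation rule, the choice of which elements are optional, and all example values (including the placeholder taxon code) are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BiologicalParameter:
    parameter: str
    taxon_code: str  # an ITIS code in the real model; placeholder below
    taxon_name: str
    parameter_compartment_relationship: str
    compartment: str
    taxon_subgroup: Optional[str] = None
    sample_preparation: Optional[str] = None
    analysis: Optional[str] = None
    data_processing: Optional[str] = None

    def describe(self) -> str:
        """Concatenate the elements into a readable description."""
        taxon = self.taxon_name
        if self.taxon_subgroup:
            taxon += f" ({self.taxon_subgroup})"
        text = (
            f"{self.parameter} of {taxon} "
            f"{self.parameter_compartment_relationship} {self.compartment}"
        )
        methods = [
            m
            for m in (self.sample_preparation, self.analysis, self.data_processing)
            if m
        ]
        if methods:
            text += " by " + " and ".join(methods)
        return text


example = BiologicalParameter(
    parameter="Abundance",
    taxon_code="ITIS-PLACEHOLDER",
    taxon_name="Calanus",
    parameter_compartment_relationship="per unit volume of the",
    compartment="water column",
    taxon_subgroup="adult female",
)
print(example.describe())
```

Because the description is generated from the elements, variants such as "Calanus count" and "Number of Calanus" cannot arise independently.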

12
Mapping Semantic Models
  • Two-stage process
      • First map the semantic elements
          • DONAR Parameter → BODC Parameter +
            Parameter_compartment_relationship
          • DONAR Compartment → BODC Compartment
      • Then map vocabularies for mapped elements
          • Surface water → water column
  • Relational database designers will recognise this
    as normalisation
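
The two stages might be sketched as two look-up tables applied in turn. The element names and the "Surface water" to "water column" pair come from the slide; the DONAR parameter term "Number per volume" is a hypothetical value invented for the example.

```python
# Stage 1: which BODC elements each DONAR element feeds.
ELEMENT_MAP = {
    "Parameter": ("Parameter", "Parameter_compartment_relationship"),
    "Compartment": ("Compartment",),
}

# Stage 2: vocabulary maps, one per DONAR element. Each DONAR term maps to
# one BODC term per target element listed in ELEMENT_MAP.
VOCAB_MAP = {
    "Parameter": {
        # Hypothetical DONAR term, for illustration only.
        "Number per volume": ("Abundance", "per unit volume of the"),
    },
    "Compartment": {
        "Surface water": ("water column",),
    },
}


def translate(donar_record):
    """Translate a DONAR attribute record into BODC semantic elements."""
    bodc_record = {}
    for element, term in donar_record.items():
        targets = ELEMENT_MAP[element]
        values = VOCAB_MAP[element][term]
        bodc_record.update(zip(targets, values))
    return bodc_record


result = translate({"Parameter": "Number per volume", "Compartment": "Surface water"})
print(result)
```

Each vocabulary term is mapped once, however many parameter codes use it, which is the normalisation the slide refers to.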

13
Mapping Semantic Models
  • Number of look-ups required is reduced by an
    order of magnitude
  • Vocabulary elements have simple semantics so
    automation is possible
  • Approximately 90% of the Rijkswaterstaat to BODC
    mapping accomplished by a single SQL statement
  • Straightforward extension of vocabulary maps
    (different names for same thing) sorted out most
    of the rest
  • Thesauri could help reduce the need for this

14
Mapping Semantic Models
  • Hard-core problems required manual resolution
      • Unclear or ambiguous semantics in Rijkswaterstaat
        element vocabularies (residual beta)
      • Problems with Dutch to English translation
  • Some mapping errors were detected
      • Caused by homonyms (Branchiura)
      • Emphasises the need for more than just a name for
        a taxon (reference or ITIS code)
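
The homonym problem can be shown with a toy example in which two distinct taxa share one name. The codes below are placeholders, not real ITIS codes, and the record layout is an assumption.

```python
# Source taxon to be mapped; the code is a placeholder.
rws_taxon = {"name": "Branchiura", "code": "CODE-A"}

# Target dictionary holding two different taxa with the same name.
bodc_taxa = [
    {"name": "Branchiura", "code": "CODE-B"},
    {"name": "Branchiura", "code": "CODE-A"},
]


def match_by_name(taxon, candidates):
    """Return every candidate sharing the taxon's name."""
    return [c for c in candidates if c["name"] == taxon["name"]]


def match_by_code(taxon, candidates):
    """Return every candidate sharing the taxon's code."""
    return [c for c in candidates if c["code"] == taxon["code"]]


name_matches = match_by_name(rws_taxon, bodc_taxa)
code_matches = match_by_code(rws_taxon, bodc_taxa)
print(len(name_matches), len(code_matches))
```

Matching on the name alone returns both candidates, so an automated map can silently pick the wrong one; matching on the code is unambiguous.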

15
Semantic Model Applications
  • Semantic modelling is a lowest common denominator
    approach to metadata
      • This is what makes it good for mapping
  • The approach also offers the basis for
    user-controlled data discovery and
    interoperability
      • User chooses the semantic element subset
      • User data selection interaction based on the
        subset vocabulary
  • Automated interoperability requires more
    sophistication (thesauri, ontologies)
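
User-controlled discovery could work along these lines: the user picks a subset of elements, is offered the vocabulary found in each, and filters on the chosen terms. The records and function names are hypothetical.

```python
records = [
    {"Parameter": "Abundance", "Taxon_name": "Calanus", "Compartment": "water column"},
    {"Parameter": "Biomass", "Taxon_name": "Calanus", "Compartment": "water column"},
    {"Parameter": "Abundance", "Taxon_name": "Calanus", "Compartment": "bed"},
]


def vocabulary(records, element):
    """List the distinct terms available for one semantic element."""
    return sorted({r[element] for r in records})


def discover(records, **choices):
    """Keep the records matching every element/term pair the user chose."""
    return [
        r for r in records if all(r.get(k) == v for k, v in choices.items())
    ]


compartments = vocabulary(records, "Compartment")
hits = discover(records, Parameter="Abundance", Compartment="water column")
print(compartments)
print(hits)
```

Elements the user leaves out of the subset place no constraint on the search, so the same records serve both coarse and fine-grained queries.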

16
Conclusions
  • Don't even think about manual mapping of large
    parameter dictionaries
  • 99% of a map is completed in the first 10% of the
    time
  • More standardisation means fewer errors and
    problems
  • Semantic model vocabularies need ontologies and
    thesauri to achieve their full interoperability
    potential

17
Conclusions
  • Semantic modelling works for mappings between
    dictionaries and data models
  • It also has great potential for parameter
    discovery and interoperability