SDD - PowerPoint PPT Presentation

Provided by: gregorh

1
SDD
Structured Descriptive Data
(Talk held Tuesday, 2004-10-12, at the TDWG 2004
meeting in Christchurch, New Zealand)
2
What are descriptive data?
  • Descriptive data inform about the state of
    repeatably observable, inherent properties of:
    • objects (individual organisms)
    • classes (taxon)
  • Specimens in natural history collections are
    special named cases of such instances.
  • SDD also considers descriptions of observed
    objects / field data.

3
What are descriptive data?
  • Not limited to morphological / ultrastructural
    data. However, these are important because they:
    • have a long history of use in biology, reflected
      in a huge knowledge base
    • in most cases are easily observable
    • are easily memorized by human beings
  • Specifically, the definition includes:
    • chemical/enzymatic features
    • molecular data like nucleic acid / protein
      sequences, RFLP/AFLP patterns, etc.
    • behavior patterns

4
Why are people doing it?
  • The driving force behind most of the interest in
    descriptive data is the identification of
    organisms

5
Why are people doing it?
Phylogenetic Relationships (cladograms)
Taxonomic concepts (species, genera etc)
The real world (The Field)
(From talk by K. Thiele, Library of Life, 2003)
6
Why are people doing it?
Phylogenetic Relationships (cladograms)
Taxonomic concepts (species, genera etc)
Maps: field guides, floras, monographs, etc.
The real world (The Field)
(From talk by K. Thiele, Library of Life, 2003)
7
Why are people doing it?
TOL Tree of Life
GBIF-Specimen Services
GBIF-Names Services
Library of Life? Key to Life?
The real world (The Field)
(From talk by K. Thiele, Library of Life, 2003)
8
Species diversity and estimated completeness by
taxonomic groups
Described and named
Unknown to science!
(Purvis & Hector 2000)
9
Description of new Species?
  • Mycologists inadvertently redescribe already
    known species at a rate of about 2.5:1
    (Hawksworth 1991)

10
Kind of descriptive data
  • Terminology
    • Principal definitions of terms in natural
      language (glossary/ontology)
    • Operational definitions of preferred / selected
      terms (character/state)
    • Can be defined by each scientist, but voluntary
      standardization is recommended
  • Coded description
    • Like a taxon × character spreadsheet (with
      multiple values per cell)
    • Alternative paradigm: a questionnaire form

11
Example form
12
Kind of descriptive data
  • Natural language description
    • Traditional free-form text: not the ideal form,
      but important for legacy publications and
      low-learning-curve collaborations (e.g., through
      wikis)
    • Can be dynamically generated if coded
      descriptions and terminology wordings are present
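Generating text from coded data can be sketched as follows. This is a minimal illustration under stated assumptions: the `wordings` templates, the `join_states` and `describe` helpers, and the sample data are hypothetical and do not reflect how SDD or DELTA tools actually produce descriptions.

```python
# Hypothetical terminology wordings: one sentence template per character.
wordings = {
    "leaf arrangement": "Leaves {states}.",
    "flower color": "Flowers {states}.",
}


def join_states(states):
    """Join state names in sorted order as 'a', 'a or b', or 'a, b or c'."""
    ordered = sorted(states)
    if len(ordered) == 1:
        return ordered[0]
    return ", ".join(ordered[:-1]) + " or " + ordered[-1]


def describe(coded_row, wordings):
    """Build one sentence per scored character that has a known wording."""
    sentences = [
        wordings[character].format(states=join_states(states))
        for character, states in coded_row.items()
        if character in wordings and states  # skip unscored characters
    ]
    return " ".join(sentences)


row = {"leaf arrangement": {"alternate"}, "flower color": {"red", "blue"}}
print(describe(row, wordings))
```

Swapping in a second `wordings` dictionary would produce the same description in another language, which is the point of keeping wordings separate from the coded data.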

13
Example form
14
Kind of descriptive data
  • Stored / static identification keys
    • Traditional printed di-/polychotomous keys:
      legacy data
    • Can be dynamically generated if coded
      descriptions and terminology wordings are present,
      but manual keys may be better!
    • Tool to capture taxonomists' intuition about
      preferred paths

15
Background
  • DELTA
    • Well-used standard (> 25 years old!)
    • Quite complex: > 170 directives
    • Legacy problems, outgrown
    • Some principal limitations
    • Natural language descriptions and printed
      dichotomous keys from coded data
    • Interactive identification from coded data
  • DELTA II proposal as extension of DELTA
    • Inclusion of taxon names, literature, etc.
  • Other relevant standards
    • Lucid Interchange Format (LIF, identification)
    • NEXUS (phylogenetics)

16
Background: SDD
  • Initiated in 1999 as a revision of DELTA in XML
  • Took much longer than expected
  • Recently two meetings per year:
    • 2002 Fall: Brazil
    • 2003 Spring: Paris
    • 2003 Fall: Lisbon
    • 2004 Spring: Berlin
    • 2004 Fall: Christchurch
  • Currently we call SDD 1.0 beta fairly complete
    • Criticism trap: need experience!
  • This week:
    • Refocus and scale down to a light version 1.0?
    • Preserving forward compatibility!
    • Release complete schema in parallel as 1.1?

17
SDD Schema design philosophy
  • Strongly typed
    • Close to object-oriented programming: types
      correspond directly to OO classes
    • Using schema inheritance mechanisms to promote
      extensibility and ease of evolution
  • Attempt at intuitiveness of type/element names
  • Less concerned with human readability and
    compact XML text
  • Using object relations!
    • Definitions with id, references with a ref
      attribute instead of labels. Validated by
      identity constraints, so use in the correct
      context is validated. To ease OOP, IDs are also
      typed (CharacterRelationID, etc.).
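The effect of typed IDs on reference checking can be mimicked in a few lines of Python. This is an analogy only, not the XML Schema identity-constraint mechanism itself; `CharacterID`, `StateID`, and `unresolved_refs` are hypothetical names chosen for illustration (the talk names only CharacterRelationID).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CharacterID:
    value: str


@dataclass(frozen=True)
class StateID:
    value: str


def unresolved_refs(definitions, references):
    """Return references that match no definition of the same ID type."""
    return [ref for ref in references if ref not in definitions]


definitions = {CharacterID("c1"), StateID("s1"), StateID("s2")}
references = [CharacterID("c1"), StateID("s2"), StateID("c1")]

# StateID("c1") uses a key that exists only as a character, so it is rejected:
# dataclass equality compares the class as well as the field values.
print(unresolved_refs(definitions, references))
```

With plain string labels the third reference would have resolved silently. Typing the ID is what catches use in the wrong context.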

18
Some SDD Requirements
  • Should be a complete format for scientific data,
    not optimized for a specific purpose
    • Lucid LIF is optimized for simple identification
      data already pre-processed during the building
      process
  • Should describe taxa, specimens, observations,
    and media such as images
  • Structured, analyzable state annotations or
    nuances (in addition to free-form notes)
  • Not bound to the biological knowledge domain
    • Medicine, pathology, archeology, musical
      instruments, restaurants
  • Multilingual design
    • DELTA was already able to handle multiple
      languages. Problems with comments were to be
      addressed in DELTA II.
    • SDD (not UBIF!) extends language with audience

19
UBIF
  • SDD has a strong need to use objects from other
    knowledge domains:
    • class names and hierarchy
    • collected or observed specimen objects
    • agent data (person/organization)
    • geographical data
    • publication (a description may be digitized from
      a publication or cite published information)
    • media resources, for example images
  • If all of these were used as fully developed
    data models (ABCD, TCS, MARC), SDD would become
    even more complex than it is.

20
UBIF
  • Instead, we need a simplified abstraction
    framework
  • This is not specific to the purpose of SDD and
    should be elaborated with other modeling groups
  • UBIF (Universal Biosciences Information
    Framework) is an attempt to do so
  • UBIF is under development: SDD, ABCD, TCS
  • For the purpose of SDD, it is best to learn UBIF
    by example!
  • See also the separate talk about UBIF held at
    TDWG 2004!

21
SDD is a UBIF application
UBIF proxy data
22
UBIF provides Metadata and Relations to external
objects
23
SDD inside UBIF
24
SDD Schema
25
Character types
  • Different kinds of characters are being used:
  • Categorical characters
    • Enumeration of values defining categories
    • Examples: red/green/blue, broad/narrow,
      alternate/opposite
  • Quantitative (numerical) characters
    • Measures with or without measurement unit
    • Actual values or statistical summary like mean,
      std. dev., sample size
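The difference between storing actual values and storing a statistical summary can be sketched with Python's standard `statistics` module. The `StatisticalSummary` class, the `summarize` helper, and the sample measurements are hypothetical and are not part of the SDD schema.

```python
import statistics
from dataclasses import dataclass
from typing import Optional


@dataclass
class StatisticalSummary:
    mean: float
    std_dev: float
    sample_size: int
    unit: Optional[str] = None  # None for measures without a unit


def summarize(values, unit=None):
    """Reduce raw measurements to mean, sample std. dev., and sample size.

    statistics.stdev needs at least two values.
    """
    return StatisticalSummary(
        mean=statistics.mean(values),
        std_dev=statistics.stdev(values),
        sample_size=len(values),
        unit=unit,
    )


leaf_lengths = [10.0, 12.0, 14.0]  # actual values, assumed to be in mm
summary = summarize(leaf_lengths, unit="mm")
print(summary)
```

A summary is compact but lossy: the raw values cannot be recovered from it, which is one reason a format may want to allow both forms.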

26
Example form
Categorical and quantitative (numerical) characters
in conventional DeltaAccess web forms
27
Character types
  • Color range measurements
    • Color range defined as an area in color space
    • (Aside: use triangle, circle, polygon?)
  • Parameterized/numeric shape functions
  • Various molecular data
    • Specially structured data for AFLP pattern
      data, etc.
    • Multiple characters may be needed
  • Nothing done yet; only conceptual, to make sure
    SDD is extensible!
  • etc.

28
Abstract characters
  • The different character types are all based on an
    abstract character:
    • Categorical characters
    • Quantitative (numerical) characters
    • Color range measurements
  • Currently implemented as a choice, rather than
    using the XML Schema xsi:type mechanism: a social
    question!
  • However, OOP software would benefit from
    implementing it through type polymorphism
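What type polymorphism would look like in client software can be sketched in Python. This is a minimal sketch under stated assumptions: the class names, the `accepts` method, and the example characters are hypothetical and are not taken from the SDD schema.

```python
from abc import ABC, abstractmethod


class Character(ABC):
    """Abstract character: shared label plus a type-specific value check."""

    def __init__(self, label):
        self.label = label

    @abstractmethod
    def accepts(self, value):
        """Return True if `value` is a valid score for this character."""


class CategoricalCharacter(Character):
    def __init__(self, label, states):
        super().__init__(label)
        self.states = frozenset(states)

    def accepts(self, value):
        return value in self.states


class QuantitativeCharacter(Character):
    def __init__(self, label, unit=None):
        super().__init__(label)
        self.unit = unit

    def accepts(self, value):
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)


characters = [
    CategoricalCharacter("leaf arrangement", ["alternate", "opposite"]),
    QuantitativeCharacter("leaf length", unit="mm"),
]
for character in characters:
    # The caller never branches on the concrete type.
    print(character.label, character.accepts("alternate"), character.accepts(12.5))
```

A new character type, such as a color range, would be added as another subclass without touching the loop.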

29
Mapping between character types
30
Modifiers
  • Modifiers act on statements:
    • character has state / measure
    • variable has value
  • Examples:
    • frequency: rarely, usually, often
    • probability: perhaps, certainly, and (special
      case of certainly not) by misinterpretation
    • spatial: at the top, at the base
    • temporal: in spring, in autumn, when young
    • degree: strongly
  • Abstract base type and derived concrete types
    defined, hence extensible
  • Complication: character modifiers vs. state
    modifiers
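A modifier attached to a statement can be sketched as follows. This is only an illustration: `Modifier`, `FrequencyModifier`, `TemporalModifier`, `Statement`, and its methods are hypothetical names, and the sketch ignores the character-modifier vs. state-modifier complication.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Modifier:
    """Base type; concrete modifier kinds derive from it."""
    wording: str


class FrequencyModifier(Modifier):
    pass


class TemporalModifier(Modifier):
    pass


@dataclass
class Statement:
    """A 'character has state' statement with optional modifiers."""
    character: str
    state: str
    modifiers: tuple = ()

    def render(self):
        """Place modifier wordings before the state name."""
        words = [m.wording for m in self.modifiers] + [self.state]
        return f"{self.character}: " + " ".join(words)

    def modifiers_of(self, kind):
        """Return only the modifiers of one concrete kind."""
        return [m for m in self.modifiers if isinstance(m, kind)]


statement = Statement("flower color", "red", (FrequencyModifier("usually"),))
print(statement.render())
```

Because each modifier kind is its own type, software can analyze them separately, for example by filtering on frequency, instead of parsing free-form notes.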

31
Frequently asked questions that have never been
asked
  • SDD is intimidating
  • When you find things unintuitive, do ask
  • Nobody is expected to have read all the
    documentation
  • This meeting is for review, not politeness