Content aggregation and information re-use - PowerPoint PPT Presentation

About This Presentation

Title:

Content aggregation and information re-use

Description:

Helmholtz Association big infrastructure labs. AWI - RV 'Polarstern' (100 M ) and stations ... WDC-Mare. Ana Macario, Bastian Onken and Hans Pfeiffenberger ... – PowerPoint PPT presentation

Slides: 30

Provided by: dranam

Transcript and Presenter's Notes

Title: Content aggregation and information re-use

1
Content aggregation and information re-use
Ana Macario, Bastian Onken and Hans Pfeiffenberger
Alfred Wegener Institute for Polar and Marine Research
2
About us

Helmholtz Association: big infrastructure labs
AWI: RV Polarstern (100 M) and stations
400 scientists
50 TB of ship- and station-generated datasets, among them time series up to 100 years old
Computer centre in charge of supplying
  the IT part of a productive working environment
  preservation of valuable or at least costly datasets, since finished Ph.D.s don't care (almost)
=> mostly in that order of precedence
We try to take the middle ground at the institute as well as here

Why plankton?
45 Gt/a primary production, 50% of living matter
3
Road map

EU-project PlanktonNet
Introduction to taxonomy
Rich content
Towards NOA for PlanktonNet

4
Background
Early 2004, AWI started a small project with MBL to archive images and taxonomic keys/descriptions for phytoplankton found in the North Sea

-> 2 year EU project (acronym Plankton-Net) with 6 partners: AWI; Marine Biology Lab, Woods Hole; Station Biologique, Roscoff; Universidade de Lisboa; IPIMAR, Lisbon; Natural History Museum, London
-> Original scope: to create a network of interoperable repositories on plankton taxonomy
-> Motivation: to give taxonomists support in the hard task of identifying species and to rescue historically relevant collections
-> Scope keeps growing: an information system which aggregates taxonomic content, descriptions, assets (images, documents), environmental and molecular data, annotations, etc. and supplies an interactive environment for contributing

5
Road map

EU-project PlanktonNet
Introduction to taxonomy
Rich content
Towards NOA for PlanktonNet

6
Taxonomy and its challenges

Information about organisms is often linked to a name. This can create problems in information retrieval:
one taxon can have many names
the same name can refer to many taxa
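
The two retrieval problems above can be illustrated with a minimal Python sketch. It is not the uBio implementation: the record layout, the placeholder names and the taxon identifiers are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical (name, taxon id) pairs; both the names and the ids are invented.
RECORDS = [
    ("Name A", "taxon-1"),
    ("Name B", "taxon-1"),  # synonym: a second name for the same taxon
    ("Name C", "taxon-2"),
    ("Name C", "taxon-3"),  # homonym: the same name used for another taxon
]

def build_indexes(records):
    """Index the pairs in both directions: taxon -> names and name -> taxa."""
    names_by_taxon = defaultdict(set)
    taxa_by_name = defaultdict(set)
    for name, taxon in records:
        names_by_taxon[taxon].add(name)
        taxa_by_name[name].add(taxon)
    return names_by_taxon, taxa_by_name

def expand_query(name, names_by_taxon, taxa_by_name):
    """Return every name sharing a taxon with `name`, sorted alphabetically."""
    expanded = set()
    for taxon in taxa_by_name.get(name, ()):
        expanded |= names_by_taxon[taxon]
    return sorted(expanded)

names_by_taxon, taxa_by_name = build_indexes(RECORDS)
print(expand_query("Name B", names_by_taxon, taxa_by_name))  # ['Name A', 'Name B']
print(sorted(taxa_by_name["Name C"]))  # ['taxon-2', 'taxon-3']
```

A search expanded this way also finds records filed under synonyms, while a name that maps to more than one taxon signals that the query is ambiguous and needs disambiguation.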

7
Taxonomic Name Server

The uBio Taxonomic Name Server (MBL-WHOI Library, Woods Hole, USA), implemented as a web service, acts as a name thesaurus. Two services are offered:
NameBank is a repository of millions of recorded biological names and facts that link those names together
ClassificationBank stores multiple classifications and taxonomic concepts that are the result of expert opinions. It extends the functionality of NameBank.

8
What's in a name?
Scientific names evolve over time as specimens' names are updated over the years. With vernacular (common) names the problem is even more difficult, since they may appear in several languages
NameBank
9
What's in a classification?

ClassificationBank is a taxon concept server

10
Road map

EU-project PlanktonNet
Introduction to taxonomy
Rich content
Towards NOA for PlanktonNet

11
What is the content of PlanktonNet?

Data and meta-data associated with organisms (taxa)
by value:
  descriptive metadata (Darwin Core schema)
  Images, SEM photos, schematic drawings, etc.
  Annotations
by reference (linkout, include via Web-Service):
  Taxonomic keys, synonyms and classification
  Bibliographical references
  Geo-referenced environmental data
  Molecular data

12
http://planktonnet.awi.de
from BioPedia, re-use via WS
to PANGAEA, WDC-Mare
quality linkouts
17

This is the working, local prototype (not a vision!!)
It has been fitted with an OAI-PMH module to enable it as a data provider

18
Road map

EU-project PlanktonNet
Introduction to taxonomy
Rich content
Towards Network Overlays for PlanktonNet

19
Rich Content for and from PlanktonNet
[Diagram: a service provider (SP) harvests the data providers (DP) Planktonnet@AWI, Planktonnet@Roscoff, .. via OAI-PMH]
20
Reality check

Highly heterogeneous information systems
Metadata harvesting is problematic: lacking OAI-PMH compliance
Providing web services is not standard
Schema use is not standard: crosswalks problematic
Why RDF-Ontology (and such things) when one can do tagging (and annotation) with Flickr?
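
Why crosswalks are problematic can be seen in a small Python sketch. The mapping below from Darwin Core terms to Dublin Core elements is an illustrative assumption, not a standard crosswalk and not the one used by PlanktonNet; the sample record is invented.

```python
# Assumed, illustrative mapping: Darwin Core term -> Dublin Core element.
CROSSWALK = {
    "scientificName": "title",
    "recordedBy": "creator",
    "locality": "coverage",
    "eventDate": "date",
}

def to_dublin_core(dwc_record):
    """Map a Darwin Core style dict to Dublin Core and report what got lost."""
    dc, unmapped = {}, []
    for term, value in dwc_record.items():
        element = CROSSWALK.get(term)
        if element is None:
            unmapped.append(term)  # no Dublin Core counterpart in this mapping
        else:
            dc.setdefault(element, []).append(value)
    return dc, sorted(unmapped)

# Invented sample record for illustration only.
record = {
    "scientificName": "Example species",
    "locality": "North Sea",
    "catalogNumber": "42",
}
dc, lost = to_dublin_core(record)
print(dc)    # {'title': ['Example species'], 'coverage': ['North Sea']}
print(lost)  # ['catalogNumber']
```

Every term that ends up in the second list is information that the simpler schema cannot carry, which is one reason schema differences make harvesting across heterogeneous systems hard.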

23
Short-term goals

Create a central catalog with Dublin Core metadata as minimum and Darwin Core as an extended metadata format for PlanktonNet
Harvest all PlanktonNet data providers (with respective set information) using OAI-PMH
Long-term archival of all harvested records in a repository
Create a portal for accessing the locally harvested items as well as remote ones
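
The harvesting goal above can be sketched in Python using only the standard library. The `ListRecords` verb, the `metadataPrefix` and `set` arguments and the XML namespaces come from the OAI-PMH 2.0 protocol; the base URL `http://example.org/oai`, the set name and the sample response are assumptions for illustration, and no real PlanktonNet endpoint is contacted.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL for one data provider."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)

# Hand-written sample response standing in for a real data provider.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:1</identifier>
        <setSpec>images</setSpec>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample plankton image</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

def parse_records(xml_text):
    """Extract identifier, set and Dublin Core title from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        records.append({
            "identifier": rec.findtext("oai:header/oai:identifier", namespaces=NS),
            "set": rec.findtext("oai:header/oai:setSpec", namespaces=NS),
            "title": rec.findtext(".//dc:title", namespaces=NS),
        })
    return records

print(list_records_url("http://example.org/oai", set_spec="images"))
print(parse_records(SAMPLE))
```

A real harvester would also follow the `resumptionToken` that OAI-PMH uses for paging through large result lists, and would repeat the request for each data provider before storing the records in the central catalog.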

24
Short-term goals (cont.)

Limitations
Only metadata is harvested
Relationships limited to collection<->item
Restricted only to publicly available items
No support for collaborative work (e.g., resource annotation/revision)

25
Long-term goals

Harvesting of metadata AND data (images, documents, etc.) associated with a given resource: relevant for preservation / mirroring purposes
Branding as a result of targeted quality control of metadata from field experts: workflow needed
Versioning and traceability
Access control policies at item level

26
Long-term goals (cont.)

Expression of rich relationships beyond simple collection<->items (e.g., structural, equivalence and annotation type of relationships)
Combine and disseminate harvested content with other, re-used content in flexible ways -> foundation for a rich service offering
=> Networked Overlay Architecture (NOA) with FEDORA
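
What typed relationships beyond collection<->item might look like can be sketched in Python. This is not Fedora's relationship model: the `Relation` type, the identifiers and the three relationship kinds (taken from the slide) are assumptions for illustration.

```python
from typing import NamedTuple, Optional

class Relation(NamedTuple):
    subject: str
    kind: str    # "structural", "equivalence" or "annotation"
    target: str

# Hypothetical identifiers, invented for illustration.
RELATIONS = [
    Relation("collection:1", "structural", "item:1"),  # collection contains item
    Relation("item:1", "equivalence", "item:9"),       # same object held elsewhere
    Relation("note:7", "annotation", "item:1"),        # expert note about the item
]

def related(item_id: str, kind: Optional[str] = None):
    """Return relations touching `item_id`, optionally filtered by kind."""
    return [
        r for r in RELATIONS
        if item_id in (r.subject, r.target) and (kind is None or r.kind == kind)
    ]

print(len(related("item:1")))         # 3
print(related("item:1", "annotation"))
```

Keeping the relationship kind explicit is what lets a service combine harvested and re-used content in different ways, for example showing all annotations on an item or all equivalent copies.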

27
Conclusions

Ontologists can learn from hundreds of years of taxonomy
Though an old field, information is a moving target (preservation vs. improvement?)
Where is the (inter-)action happening?
What (where and when) do we preserve?
We believe that the visions and concepts of Fedora and NOA are appropriate to the problem
The scope of the problem and user ambitions have to be contained and satisfied in stages

Thank You !
Questions ??

29
Branding and taxonomy

Traditional field, dates back to the 4th century BC
Specimen identification is not straightforward: world-wide experts on a class or genus level
Information quality is relevant in several cases (e.g., harmful algae blooms and associated health consequences)
Revision/annotation as unstructured metadata about a resource
Information on both metadata provenance and annotation provenance is relevant for branding
Type of desired queries:
Find resources contributed by ...
Find resources revised / annotated by ..., etc.

=> Versioning, traceability
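
The two desired query types can be sketched in Python, assuming each resource records who contributed it and who annotated it. The class names, field names and sample data are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Annotation:
    author: str  # annotation provenance
    text: str

@dataclass
class Resource:
    identifier: str
    contributed_by: str  # metadata provenance
    annotations: list = field(default_factory=list)

# Invented sample data for illustration only.
RESOURCES = [
    Resource("img-1", "contributor-a", [Annotation("expert-x", "identification confirmed")]),
    Resource("img-2", "contributor-b"),
    Resource("img-3", "contributor-a", [Annotation("expert-y", "name revised")]),
]

def contributed_by(resources, who):
    """Find resources contributed by `who`."""
    return [r.identifier for r in resources if r.contributed_by == who]

def annotated_by(resources, who):
    """Find resources revised / annotated by `who`."""
    return [r.identifier for r in resources
            if any(a.author == who for a in r.annotations)]

print(contributed_by(RESOURCES, "contributor-a"))  # ['img-1', 'img-3']
print(annotated_by(RESOURCES, "expert-y"))         # ['img-3']
```

Both queries only work if provenance is stored with every contribution and every annotation, which is why versioning and traceability follow as requirements.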
