Title: Creating a Taxonomic E-Science
1Creating a Taxonomic E-Science www.cate-project.org
- Taxonomic resources (basic biodiversity data) are fragmented
- Getting access to taxonomic/biodiversity data is difficult
2CATE team
3- Where we are now
- Fragmented resources
- literature
- collections
- Websites
- expertise
- Largely a paper medium
- Results inaccessible
- Where we want to be
- Better access through
- Authoritative hubs (like CATE)
- In the context of distributed systems
- Taxonomy easier to update
- Coordinated effort
- Web (or dual) medium
- Free access to data
- And the purpose of taxonomy better understood
4Fragmentation: a paper system with the information scattered in time and across many journals
- Names in formal (binomial) system: Linnaeus, 1753 (plants), 1758 (animals)
- ca. 250 years' worth of taxonomic literature
- Principle of Priority in nomenclature: first use of a name stands (typically)
- Names remain part of the taxonomic system
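The Principle of Priority can be illustrated with a minimal sketch, assuming each name carries only its year of first publication. The `PublishedName` class, the `senior_synonym` helper, and the authors and years are hypothetical; the real codes of nomenclature include exceptions, which is why the rule holds only "typically".

```python
from dataclasses import dataclass

# Starting dates for formal binomial names, as given on the slide.
STARTING_YEAR = {"plants": 1753, "animals": 1758}


@dataclass(frozen=True)
class PublishedName:
    name: str    # the binomial (placeholder names only)
    author: str  # hypothetical author
    year: int    # year the name was first published


def senior_synonym(names, kingdom):
    """Return the earliest name published on or after the starting date.

    Simplified: ignores conserved names, homonymy and other exceptions.
    """
    start = STARTING_YEAR[kingdom]
    eligible = [n for n in names if n.year >= start]
    if not eligible:
        raise ValueError("no name published on or after the starting date")
    return min(eligible, key=lambda n: n.year)


# Hypothetical synonyms applied to one animal species.
synonyms = [
    PublishedName("Xus beus", "Smith", 1905),
    PublishedName("Aus bea", "Jones", 1832),
    PublishedName("Aus cea", "Brown", 1870),
]
valid = senior_synonym(synonyms, "animals")
junior = [n.name for n in synonyms if n != valid]
```

The later names are not discarded: they stay in `junior` as synonyms, which mirrors the point that names remain part of the taxonomic system.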
Carl Linnaeus 1707 - 1778
5Fragmentation: up to 3 billion specimens scattered across many collections
- National collections (typically museums and herbaria)
- Provincial collections
- Municipal collections
- University collections
- Private collections (since cabinets of curiosities)
6Collections and associated data go back a long
way
Contemporary arrangements: same styles of storage, but better metadata
James Petiver 17th Century
Carl Clerck 18th Century
7Taxonomic expertise is also fragmented
- Expertise within a nation or institution does not cover all groups of organisms
- Uneven representation of taxonomists against size (no. of species) of taxa
- Many amateurs: independent and often working in relative isolation
8- Barcode of Life, Lepidoptera projects (DNA identification of species)
- Bill's Lepidoptera Photos (mostly eastern U.S.)
- Bioimages (Virtual Field Guide, United Kingdom)
- Butterflies and Moths (describes anatomy, physiology, ecology)
- Butterflies and Moths in the Netherlands
- Butterflies and Moths of Southern Vancouver Island (Canada)
- Butterflies and Moths of the World (Natural History Museum, London)
- Butterfly and Moth Resources (very extensive)
- The Butterfly Site
- Caterpillar Identification Guide
- Caterpillar Hostplants Database
- Caterpillars of Eastern Forests (North America)
- Caterpillars, Pupae, and Adults (Costa Rica)
- Caterpillars of La Selva (Costa Rica)
- Caterpillars of Northeastern and Mid-Atlantic United States
- Caterpillars of the Pacific Northwest (North America)
- Colour Atlas of the Siberian Lepidoptera
- Catalogue des Lepidopteres des Antilles Francaises
- Clemson University Arthropod Collection Database (South Carolina, USA)
- David's Butterflies and Moths (Ontario, Canada: Biology, Photos, etc.)
- Endangered/Threatened Species List, U.S. (for Invertebrates)
- Georgia Lepidoptera (Southeastern U.S.)
- Global Lepidoptera Names Index
- Gordon's Lepidoptera Page (much general information)
- Illinois Lepidoptera Database
- Insectes du Quebec sur Internet
- International Lepidoptera Survey
- Iowa State University's "Entomology Index" (very extensive set of links)
- Iridescence in Butterflies and Moths
- John Abbot Watercolors of Lepidoptera (Southeastern U.S.)
- Journal of Research on the Lepidoptera (back issues in PDF format)
- Kimmos Lep Site (Northern European larvae and microlepidoptera)
- Larval Food Plant Information
- Lepidoptera of Finland (and other holarctic lepidoptera)
- Lepidoptera of Hungary
- Lepidoptera of Siberia and Central Asia (or mirror site in North America)
- Lepidoptera of Southeastern Arizona
- LepIndex (The Global Lepidoptera Names Index, Natural History Museum)
- Moths and Butterflies of Europe
- Nomina Insecta Nearctica (Lepidoptera section: list of names for North American Leps)
- Online Library of Lepidoptera Resources (access to many published journal articles)
- Pherolist (Lepidopteran sex pheromones)
- SA Leps Online (Current news about Lepidoptera)
- Season Summary of the Lepidopterists' Society
- Schmetterlinge: Butterflies of Germany
Fragmentation alive and well on the web: over 350 websites for Lepidoptera!
9- Inaccessible data
- Print based (often old literature) but few comprehensive libraries
- Collections highly fragmented
- Consequences
- High chance of misidentification and creation of synonyms
- Users other than taxonomists are frustrated and ignore taxonomy or use the most accessible, but not necessarily the best, results
10Volume of data and data-diversity growing.
Taxonomy an information science?
Phylogenetic data
Descriptive data (from Linnaeus)
Molecular data
11Solutions
- Digitising and linking taxonomic infrastructure
- BioCASE: unit-level and collection-level data
- Species 2000
- LepIndex: online nomenclature archive
- Linking distributed sources of wider information
- iSpecies
- AntBase
- Creating hubs with high quality, comprehensive, consensus taxonomic treatments
- CATE
12Where can I find specimens of species X?
Where is the type specimen of species Y?
What geographical data are associated with specimens of species A in European collections?
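These three questions can be read as simple filters over unit-level specimen records. The sketch below is illustrative only: `SpecimenRecord`, the field names, and every record value are hypothetical placeholders, and it runs over an in-memory list rather than any real BioCASE service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpecimenRecord:
    species: str
    collection: str        # holding institution (placeholder)
    country: str           # country of the holding collection
    locality: str          # where the specimen was collected
    is_type: bool = False  # True for the type specimen


# Hypothetical unit-level records; all values are invented.
RECORDS = [
    SpecimenRecord("Aus bea", "Museum A", "UK", "Locality 1", is_type=True),
    SpecimenRecord("Aus bea", "Museum B", "Germany", "Locality 2"),
    SpecimenRecord("Aus bea", "Museum C", "USA", "Locality 3"),
    SpecimenRecord("Xus beus", "Museum B", "Germany", "Locality 4"),
]

EUROPEAN = {"UK", "Germany"}  # illustrative subset, not a complete list


def collections_holding(species):
    """Where can I find specimens of this species?"""
    return sorted({r.collection for r in RECORDS if r.species == species})


def type_location(species):
    """Where is the type specimen? Returns None if none is recorded."""
    for r in RECORDS:
        if r.species == species and r.is_type:
            return r.collection
    return None


def european_localities(species):
    """Localities of specimens of this species held in European collections."""
    return [r.locality for r in RECORDS
            if r.species == species and r.country in EUROPEAN]
```

With fragmented collections each question means contacting many institutions; with linked unit-level data each becomes a single query of this kind.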
13- Species 2000
- A federation of Global Species Databases (GSDs) of basic nomenclatural information
- Databases peer reviewed
- Real-time access to GSDs
- Annual fixed edition
- LepIndex
- Online version of the Lepidoptera nomenclatural card archive in the Natural History Museum, London
- Index cards: 264,207 species-group names of Lepidoptera
- About 10% of the described species of animals
14Weaving data from different sources
Nature 439, 6-7 (5 January 2006)
- antbase.org
- access to data on the ant species of the world
- collaborative effort aiming at providing the best possible access to the wealth of information on ants
- converting literature into machine-readable online descriptions
iSpecies: builds species pages from GenBank, Google Scholar and Yahoo images
15Nature 417, 17-19 (2002)
- comprehensive single site on Web
- self contained, consensus treatment
- Web-revision should replace printed version
- draft Web-revision for community comment/review
- Web-published as unitary revision
- nomenclature fixed from date of publication
- a hub, but with links to other data sources
- further study leads to next version
- lack of achievable goals
- too much legacy data
- too little data on the web
- fragmented
16specifics
- Not a mashup: takes a warehouse approach
- but accessible within distributed system
- Authoritative, archived revisions of two demonstrator taxa freely accessible on the web
- Sphingidae (Hawkmoths)
- Araceae (Aroids)
- Why these? Manageable; conspicuous; broad community with professional and amateur interest
19generalities
- High quality data
- Peer review: Editorial Board for each taxon
- Must satisfy current botanical and zoological codes of nomenclature
- Consensus taxonomy: controversial
- but alternatives included on unpublished part of website
- Community involvement
- Encourage critical comment on content
- Additional data
- Engender a sense of common ownership
20Diagram labels: Publication; Review; Current Consensus; Working Draft; Alternative Theories; Proposed Revisions; V 1.0; V 1.1; Aus bea L.; Aus bea; Xus beus; Contributions: Images, Specimens, Observations
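One possible reading of these labels is a cycle in which contributions and proposed revisions enter a working draft after review, and the draft is periodically published as the next archived consensus version. The sketch below assumes that reading; the `Revision` class, its methods, and the example changes are hypothetical and are not taken from the CATE software.

```python
from dataclasses import dataclass, field


@dataclass
class Revision:
    taxon: str
    major: int = 1
    minor: int = 0
    published: list = field(default_factory=list)  # archived (label, changes)
    draft: list = field(default_factory=list)      # accepted, not yet published

    def propose(self, change, accepted_by_review):
        """Add a proposal to the working draft only if review accepts it."""
        if accepted_by_review:
            self.draft.append(change)
        return accepted_by_review

    def publish(self):
        """Freeze the working draft as the next archived version."""
        label = f"V {self.major}.{self.minor}"
        self.published.append((label, tuple(self.draft)))
        self.draft = []
        self.minor += 1
        return label


rev = Revision("Sphingidae")
rev.propose("initial consensus treatment", True)
first = rev.publish()
rev.propose("alternative theory", False)  # stays outside the consensus
rev.propose("new specimen image", True)
second = rev.publish()
```

Each published entry is kept rather than overwritten, so earlier versions remain citable while further study feeds the next one.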
21But sustaining the databases will be a big challenge
Nature 435, 110 (2005)
22The complexity of taxonomic/biodiversity data
Date of description
Species name
Locality
Author of taxon
Observations
DNA barcodes
(subjective)
Date of specimen collection
Species concepts
Genus name (for binomial)
Synonyms
Type specimen
Images
Time of specimen collection
Name of collector
Homonyms
23A taxonomic concept is composed of lots of
information
Locality
Name
Has type
Was collected
Has a
Is circumscribed by
Is circumscribed by
Specimens
Description
Concept
Has related
Data
Literature
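The relationships named on this slide can be sketched as a small data model. The arrows of the original diagram are not recoverable from the transcript, so the wiring below (a name has a type specimen, a specimen was collected at a locality, a concept has a name and is circumscribed by specimens and a description) is an assumed reading; all class names, field names and values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Specimen:
    identifier: str
    locality: str  # "Was collected" at a Locality


@dataclass
class Name:
    text: str
    type_specimen: Optional[Specimen] = None  # "Has type"


@dataclass
class Concept:
    name: Name                                               # "Has a" Name
    specimens: List[Specimen] = field(default_factory=list)  # "Is circumscribed by"
    description: str = ""                                    # "Is circumscribed by"
    literature: List[str] = field(default_factory=list)      # "Has related"


# Placeholder data only.
holotype = Specimen("SPEC-0001", "Locality 1")
concept = Concept(
    name=Name("Aus bea", type_specimen=holotype),
    specimens=[holotype, Specimen("SPEC-0002", "Locality 2")],
    description="Placeholder description.",
)
localities = sorted({s.locality for s in concept.specimens})
```

Keeping the name separate from the concept lets the same name be attached to different circumscriptions, which is one reason the data are more complex than a flat list of species.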
24Standards are being developed to facilitate
interchange of this data
Open Geospatial Consortium Standards
Locality
Nomenclator
Structured Descriptive Data
Taxon Concept Schema
Collection
Description
Concept
Distributed Generic Information Retrieval
Access to Biological Collections Data
Literature
Data Provider
25- Some final thoughts
- Biodiversity Informatics is developing rapidly
- New technologies (e.g. LSIDs: Life Science Identifiers)
- New initiatives (e.g. Global Biodiversity Information Facility (GBIF), International Plant Names Index (IPNI), ZooBank (register of animal names))
- Standards under development, not finalized
- Lots of content being digitized
- BUT the technology is only part of the problem
- Taxonomy has often been a solitary activity
- Creating a model of collaborative, incremental activity on-line is a huge challenge
- And revisionary content is needed in a climate where taxonomists who really know their groups appear to be diminishing.
26Why CATE and other projects?
The means
The need
Collections Expertise
Biodiversity loss
collections
literature
Technology
Funds
Can we do the sociology?
27Website with static pages now live at www.cate-project.org
Demonstration by Ben Clark