Title: GLOBAL BIODIVERSITY
1 GLOBAL BIODIVERSITY INFORMATION FACILITY
Developing Uncertainty Measures Related to
Taxonomic Determinations
Larry Speers, Global Biodiversity Information Facility
Arthur Chapman, Australian Biodiversity Information Services
WWW.GBIF.ORG
2 Disclaimers
- I intend to draw attention to a problem for users
with some GBIF data
- I do not intend to present any finalized
recommendations as to how to deal with this issue
- I hope to initiate a broader discussion as to
possible solutions, and I will present an example
solution to initiate this discussion
3 http://www.gbif.org/prog/digit/data_quality/data_quality
4 http://www.gbif.org/prog/digit/data_quality/data_cleaning
5 Issues with QA/QC
- Legacy data
  - Need to deal with what we have
  - Data cleaning tools
- New data
  - Do everything in our power to avoid the problems
    we find with today's legacy data
6 Data Quality
Quality, as applied to data, has various
definitions, but in the geographic world one
definition is now largely accepted: that of
fitness for use (Chrisman 1983).
7 Fitness for Use
In a database, the data have no actual quality or
value; they only have potential value. That value
is realized only when someone uses the data to do
something useful (English 1999). The quality of
data cannot be assessed independently of the
users of that data (Strong et al. 1997).
8 What do we mean by fitness for use?
- Fitness for use
  - Does species x occur in Tasmania?
  - Does species x occur in National Park y?
Diagram compliments of Arthur Chapman
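The two questions above can be made concrete with a small sketch. It assumes each record carries a position plus an uncertainty radius, and that a region can be approximated by a rectangle in the same planar units. The names `Record`, `Box` and `occurs_in` are hypothetical, and this is only an illustration of the idea, not a method proposed in the talk.

```python
from dataclasses import dataclass


@dataclass
class Record:
    x: float            # assumed planar coordinate of the record
    y: float
    uncertainty: float  # assumed uncertainty radius, same units as x and y


@dataclass
class Box:
    min_x: float
    min_y: float
    max_x: float
    max_y: float


def occurs_in(record: Record, box: Box) -> str:
    """Return 'yes', 'no' or 'uncertain' for 'does the record fall in box?'."""
    r = record.uncertainty
    # The whole uncertainty circle lies inside the region.
    if (box.min_x <= record.x - r and record.x + r <= box.max_x
            and box.min_y <= record.y - r and record.y + r <= box.max_y):
        return "yes"
    # Squared distance from the record to the nearest point of the region.
    nearest_x = min(max(record.x, box.min_x), box.max_x)
    nearest_y = min(max(record.y, box.min_y), box.max_y)
    dist_sq = (record.x - nearest_x) ** 2 + (record.y - nearest_y) ** 2
    if dist_sq > r * r:
        return "no"
    # The circle straddles the boundary: the record cannot answer the question.
    return "uncertain"


island = Box(0, 0, 100, 100)   # stands in for a large region
park = Box(40, 40, 50, 50)     # stands in for a small park inside it
record = Record(45, 45, 20)

print(occurs_in(record, island))
print(occurs_in(record, park))
```

The same record is fit for the coarse question and unfit for the fine one, which is the sense in which quality depends on the intended use.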
9 Fitness for use
Data are of high quality if they are fit for
their intended use in operations, decision-making,
and planning (Juran 1964).
10 Exploring biodiversity data
- Organisation of biodiversity data
  - By taxonomy
  - By geography
  - By time
[Diagram: records organised along three axes. Time: 2006, 2000, 1950, 1900, 1800, 500. Geography: Europe (Italy, Belgium, Andorra), Africa (Congo, Benin, Angola), Asia (India, China, Bangladesh). Taxonomy: Animalia (Chordata, Annelida, Arthropoda), Fungi (Ascomycota, Basidiomycota), Plantae (Coniferophyta, Equisetophyta).]
12 J. Wieczorek et al., Int. J. Geographical
Information Science, Vol. 18, No. 8, December
2004, 745-767
13 Arthur D. Chapman et al. 2006
14 Exploring biodiversity data
- Organisation of biodiversity data
  - By taxonomy
  - By geography
  - By time
[Diagram repeated from slide 10.]
15 Documenting Fitness for Use
- In general, error must not be treated as a
potentially embarrassing inconvenience, because
error or uncertainty provides a critical component
in judging fitness for use.
17 Problem: Misidentification
During the revision of Euscelidia, a frightening
proportion of the borrowed determined material
was found to be misidentified (62-73%), and a
literature search in BIOSIS Previews revealed
that the problem is widespread.
Meier & Dikow, Conservation Biology, Volume 18,
No. 2, April 2004, Pages 478-488
18 Problem: Misidentification
For example, of the 1522 rove beetle specimens
(Staphylinidae: Coleoptera) in the Struve
collection, 262 (17%) were misidentified (Rose
2000), and Papp (1978) reports that for a
collection of Hungarian Lauxaniidae (Diptera) 28
of the 74 species determined and labeled by
Szilády were consistently misidentified.
Meier & Dikow, Conservation Biology, Volume 18,
No. 2, April 2004, Pages 478-488
19 Problem: Use of Invalid Names
In Euscelidia, 13% of all borrowed specimens were
classified under an incorrect name, and for a
recent inventory of palm collections in botanical
gardens, 260 (22%) of the submitted 1208 names
were synonyms and 46 (4%) were invalid (Maunder
et al. 2001).
Meier & Dikow, Conservation Biology, Volume 18,
No. 2, April 2004, Pages 478-488
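As a quick arithmetic check, the bracketed figures on slides 18 and 19 can be recomputed from the counts quoted there. The sketch below uses only those counts; the helper name `percent` is hypothetical.

```python
def percent(part: int, whole: int) -> int:
    """Share of `part` in `whole`, rounded to the nearest whole percent."""
    return round(100 * part / whole)


# Counts as quoted on the slides (from Meier & Dikow 2004).
counts = {
    "rove beetles misidentified (Rose 2000)": (262, 1522),
    "palm names that were synonyms (Maunder et al. 2001)": (260, 1208),
    "palm names that were invalid (Maunder et al. 2001)": (46, 1208),
}

for label, (part, whole) in counts.items():
    print(f"{label}: {part}/{whole} = {percent(part, whole)}%")
```

The results (17%, 22% and 4%) agree with the bracketed values on the slides.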
20 Exploring biodiversity data
- Organisation of biodiversity data
  - By taxonomy
  - By geography
  - By time
[Diagram repeated from slide 10.]
21 Documenting Taxonomic Determinations
- Several methods exist for documenting taxonomic
  determinations; none are completely satisfactory
  - Herbarium Information Standards and Protocols for
    the Interchange of Data (HISPID)
  - Australian National Fish Collection (1993)
  - Several others restricted to one or two
    institutions
- Proposal: four levels
  - Who determined the specimen and when
  - What the determination was based on (type
    specimen, local flora, monograph, etc.)
  - Level of expertise of the determiner
  - What confidence the determiner had in the
    determination
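One way to picture the four-level proposal is as a record attached to each determination. The sketch below is only an illustration: the class name `Determination`, its field names and the example values are hypothetical and are not taken from HISPID or any other standard.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Determination:
    determiner: str                 # who determined the specimen
    determined_on: Optional[date]   # when it was determined
    basis: str                      # e.g. type specimen, local flora, monograph
    expertise: str                  # level of expertise of the determiner
    confidence: str                 # the determiner's confidence in the name

    def missing_elements(self) -> List[str]:
        """List the proposed elements that are not documented for this record."""
        checks = {
            "determiner": self.determiner,
            "date": self.determined_on,
            "basis": self.basis,
            "expertise": self.expertise,
            "confidence": self.confidence,
        }
        return [name for name, value in checks.items() if not value]


# A legacy record with no date and no stated basis (invented example values).
legacy = Determination(
    determiner="A. Example",
    determined_on=None,
    basis="",
    expertise="regional expert",
    confidence="reasonable certainty",
)
print(legacy.missing_elements())
```

A check like `missing_elements` is one possible way to flag legacy records whose determinations are only partly documented.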
22 Taxon Verification Status - proposed
Name of determiner
From Chapman (2005) Principles of Data Quality. GBIF
23 Issues with QA/QC
- Legacy data
  - Need to deal with what we have
  - Data cleaning tools
- New data
  - Do everything in our power to avoid the problems
    we find with today's legacy data
24 Taxon Verification Status - proposed
Name of determiner
Date of determination
Basis of determination (e.g. compared with holotype,
used national flora)
- identified by World expert in the taxon with high certainty
- identified by World expert in the taxon with reasonable certainty
- identified by World expert in the taxon with some doubt
- identified by regional expert in the taxon with high certainty
- identified by regional expert in the taxon with reasonable certainty
- identified by regional expert in the taxon with some doubt
- identified by non-expert in the taxon with high certainty
- identified by non-expert in the taxon with reasonable certainty
- identified by non-expert in the taxon with some doubt
- identified by the collector with high certainty
- identified by the collector with reasonable certainty
- identified by the collector with some doubt
From Chapman (2005) Principles of Data Quality. GBIF
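The twelve statuses above are the combinations of four determiner categories and three certainty levels, so they can be generated rather than typed out. In the sketch below the numeric code is simply the position in the list. It is a hypothetical convenience for illustration, not part of Chapman (2005), and it is not meant as a quality ranking.

```python
from itertools import product

# The two dimensions as worded on the slide.
EXPERTISE = [
    "World expert in the taxon",
    "regional expert in the taxon",
    "non-expert in the taxon",
    "the collector",
]
CERTAINTY = ["high certainty", "reasonable certainty", "some doubt"]

# All combinations, in the same order as the slide.
STATUSES = [
    f"identified by {who} with {how}"
    for who, how in product(EXPERTISE, CERTAINTY)
]


def status_code(expertise: str, certainty: str) -> int:
    """Hypothetical 1-based code: the status's position in STATUSES."""
    return EXPERTISE.index(expertise) * len(CERTAINTY) + CERTAINTY.index(certainty) + 1


for code, text in enumerate(STATUSES, start=1):
    print(code, text)
```

Storing the two dimensions separately, rather than only the combined code, would let a user filter on expertise and certainty independently.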
25 Where does this discussion fit within the TDWG
process?