Title: Towards a Logic Formalization of Taxonomic Concepts
1Towards a Logic Formalization of Taxonomic
Concepts
- Dave Thau, Bertram Ludäscher, Shawn Bowers
- UC Davis
- thau_at_learningsite.com
2Names are Confusing
Adapted from R. Peet
Ranunculus plumosa
R.plumosa var intermedia
R.plumosa var plumosa
Ranunculus pinetcola
Ranunculus plumosa
Ranunculus plumosa
Ranunculus homunculus
3Impact on Data Analysis
- Can't find data
- If A ≡ B, a search on A should retrieve B
- Same if A ⊇ B
- Can't aggregate data
- If A ⊆ B, you should be able to combine data from
A into B
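A minimal sketch of how mappings like these could drive retrieval, reading the second search case as "A includes B". The mapping table, the relation names, and `expand_query` are hypothetical and only illustrate the idea. The expansion is a single step and does not chain mappings transitively.

```python
# Hypothetical mapping table: (left concept, relation, right concept).
MAPPINGS = [
    ("A", "equiv", "B"),        # A and B are congruent
    ("A", "includes", "C"),     # A includes C
    ("D", "included_in", "A"),  # D is included in A
]

def expand_query(concept, mappings):
    """Return the concept plus every concept a search on it should retrieve."""
    hits = {concept}
    for left, rel, right in mappings:
        # A search on `left` retrieves `right` if they are congruent
        # or if `left` includes `right`.
        if left == concept and rel in ("equiv", "includes"):
            hits.add(right)
        # Same mappings read from the other side.
        if right == concept and rel in ("equiv", "included_in"):
            hits.add(left)
    return hits

print(sorted(expand_query("A", MAPPINGS)))  # ['A', 'B', 'C', 'D']
print(sorted(expand_query("C", MAPPINGS)))  # ['C']
```

Searching on the narrower concept C does not pull in A, because A may hold data that falls outside C.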
4Where In Greece Can I Find Ranunculus aquatilis?
?
R. aquatilis
R. trichophyllus
5Mapping Taxonomies
FNA-03, 1997
Benson, 1948
?
Ranunculus aquatilis
Ranunculus aquatilis
≡
R.a. var aquatilis
R.a. var diffusus
R.a. var hispidulus
R.a. var capillaceus
R.a. var calvescens
≡
?
≡
?
This results in 5^12 (more than 240 million)
possible sets of relationships.
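The figure of more than 240 million matches 5^12 = 244,140,625. One reading, which is an assumption here and not something the slide spells out, is that each of 12 unresolved concept pairs can stand in any one of five basic set relations. The relation names and the function below are likewise assumed for illustration.

```python
# Assumed: five mutually exclusive relations between two concepts.
RELATIONS = ("congruent", "includes", "is included in", "overlaps", "disjoint")

def count_relationship_sets(n_pairs, n_relations=len(RELATIONS)):
    """Number of ways to pick one relation for each of n_pairs concept pairs."""
    return n_relations ** n_pairs

print(count_relationship_sets(12))  # 244140625
```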
6Overview
- The problems: Names change, experts disagree, data become incomparable
- The partial solution: Taxonomic Concepts
- Another part of the solution: Logic
- Representing taxonomy in logic
- Using the representation to detect inconsistencies and discover new relations
- Applications
7Logic, why?
- Precise modeling language
- Solid mathematical basis
- Good tools for reasoning are available
- Explicit, portable representation (not buried
in code)
8Basic Taxonomy
- Rooted tree
- Only isa relations
[Figure: tree rooted at A with two isa edges, B isa A and C isa A]
In the basic taxonomy ??T???isa?T?
9Some Additional Constraints
- No empty nodes
- All nodes have at least one element
- ∀ T(N,E), ∀ n ∈ N: ∃x n(x)
- Disjointness
- The children of a node are disjoint
- ∀ T(N,E), ∀ (n1, m) ∈ E, (n2, m) ∈ E with n1 ≠ n2: ¬∃x (n1(x) ∧ n2(x))
- Closed World
- A node with children is defined as the union of those children
- This one's formula is a bit long; trust me
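The three constraints can be checked directly on a small example. This is only a sketch, assuming each node is represented by the finite set of its members; `check_taxonomy` and the sample data are hypothetical and not part of the formalization itself.

```python
def check_taxonomy(children, members):
    """children: node -> list of child nodes; members: node -> set of elements.

    Returns a list of constraint violations (empty if all three hold).
    """
    problems = []
    # No empty nodes: every node has at least one element.
    for node, elems in members.items():
        if not elems:
            problems.append(f"empty node: {node}")
    for parent, kids in children.items():
        # Disjointness: no two children of a node share an element.
        for i, a in enumerate(kids):
            for b in kids[i + 1:]:
                if members[a] & members[b]:
                    problems.append(f"siblings overlap: {a}, {b}")
        # Closed world: a node with children equals the union of those children.
        if kids:
            union = set().union(*(members[k] for k in kids))
            if union != members[parent]:
                problems.append(f"not closed: {parent}")
    return problems

tree = {"A": ["B", "C"], "B": [], "C": []}
good = {"A": {1, 2, 3}, "B": {1}, "C": {2, 3}}
bad = {"A": {1, 2, 3}, "B": {1}, "C": {1, 2}}

print(check_taxonomy(tree, good))  # []
print(check_taxonomy(tree, bad))   # ['siblings overlap: B, C', 'not closed: A']
```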
10Mapping Formulae
- Mappings between nodes in two different taxonomies have their own ??s
- In the slides and proofs to come I will use these symbols
A ⊆ B: A is included in B
A ⊇ B: A includes B
A ≡ B: A and B are equivalent
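When the members of two concepts are known, the three mapping relations reduce to set comparisons. A small sketch, assuming concepts are given as finite sets; `relate` and its return labels are hypothetical names chosen for illustration.

```python
def relate(a, b):
    """Classify two extensions using the three mapping relations."""
    a, b = set(a), set(b)
    if a == b:
        return "equivalent"
    if a <= b:
        return "included in"
    if a >= b:
        return "includes"
    return None  # overlapping or disjoint: outside the three relations listed

print(relate({1}, {1, 2}))     # included in
print(relate({1, 2}, {1, 2}))  # equivalent
```

Equality is tested first so that equivalent concepts are not reported as mere inclusion.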
11Inferring Unstated Correspondences
Benson, 1948
Kartesz, 2004
Ranunculus arizonicus
Ranunculus arizonicus
Given ≡
Given ⊆
R.a. var chihuahua
R.a. var typicus
We can demonstrate ?
Peet, 2005: B.1948 R.a. typicus is included in
K.2004 R. arizonicus; B.1948 R. arizonicus is
congruent to K.2004 R. arizonicus
12Proving New Mappings
Benson, 1948
Kartesz, 2004
A Ranunculus arizonicus
D Ranunculus arizonicus
≡
⊆
B R.a. var chihuahua
C R.a. var typicus
? ?
Show B ⊆ D and ¬(D ⊆ B)
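One way to sanity-check this goal is to enumerate every small model and confirm that the conclusion holds in all of them. The sketch below is a brute-force search over a three-element universe and not a proof. It assumes the non-emptiness, disjointness, and closed-world constraints, that B and C are the only children of A, and that the given mappings are A ≡ D and C ⊆ D. The function names are hypothetical.

```python
from itertools import combinations, product

def subsets(size):
    """All subsets of {0, ..., size-1} as frozensets."""
    items = range(size)
    return [frozenset(c) for r in range(size + 1) for c in combinations(items, r)]

def check_entailment(size=3):
    """Return (number of models, whether B ⊆ D and not D ⊆ B hold in all)."""
    n_models = 0
    entailed = True
    for A, B, C, D in product(subsets(size), repeat=4):
        non_empty = all([A, B, C, D])   # no empty nodes
        disjoint = not (B & C)          # siblings B and C share nothing
        closed = A == B | C             # A is the union of its children
        given = A == D and C <= D       # assumed given mappings
        if not (non_empty and disjoint and closed and given):
            continue
        n_models += 1
        if not (B <= D and not D <= B):
            entailed = False
    return n_models, entailed

n_models, entailed = check_entailment()
print(n_models > 0, entailed)  # True True
```

Checking that at least one model exists matters: with no models, the conclusion would hold vacuously.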
13Formal Proof of Mapping
Part 1
Part 2
14Inconsistent Mapping
Benson, 1948
Kartesz, 2004
Ranunculus hydrocharoides
Ranunculus hydrocharoides
≡
R.h. var natans
R.h. var stolonifer
R.h. var typicus
R.h. var stolonifer
R.h. var typicus
≡
≡
Peet, 2005: B.1948 R.h. stolonifer is congruent
to K.2004 R.h. stolonifer; B.1948 R.h. typicus is
congruent to K.2004 R.h. typicus; B.1948 R.
hydrocharoides is congruent to K.2004 R.
hydrocharoides
15Proving Inconsistency
Benson, 1948
Kartesz, 2004
Ranunculus hydrocharoides
Ranunculus hydrocharoides
≡
R.h. var natans
R.h. var stolonifer
R.h. var typicus
R.h. var stolonifer
R.h. var typicus
≡
≡
16Formal Proof of Inconsistency
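The proof itself is not reproduced in this text. One possible derivation is sketched below, assuming that non-emptiness, disjointness, and the closed world hold in both taxonomies and that the three congruences for R. hydrocharoides are given. The symbols are introduced here for illustration: H, N, S, T stand for R. hydrocharoides and its varieties natans, stolonifer, and typicus, with subscripts B (Benson, 1948) and K (Kartesz, 2004).

```latex
\begin{align*}
H_B &= N_B \cup S_B \cup T_B && \text{closed world, Benson} \\
H_K &= S_K \cup T_K && \text{closed world, Kartesz} \\
H_B &= H_K,\quad S_B = S_K,\quad T_B = T_K && \text{given congruences} \\
N_B &\subseteq H_B = H_K = S_K \cup T_K = S_B \cup T_B && \text{substitution} \\
N_B \cap S_B &= \emptyset,\quad N_B \cap T_B = \emptyset && \text{disjointness of siblings} \\
N_B &= N_B \cap (S_B \cup T_B) = \emptyset && \text{contradicts non-emptiness}
\end{align*}
```

Each of the three constraints is used exactly once, so dropping any one of them blocks the contradiction.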
17Showing Inconsistency Using Popular Tools
Benson, 1948
Kartesz, 2004
Ranunculus
Ranunculus
Ranunculus petiolaris
Ranunculus petiolaris
Ranunculus macranthus
?
??
B.48 R. petiolaris ⊆ K.04 R. petiolaris ⊆ B.48 R. macranthus
contradicts
B.48 R. macranthus and B.48 R. petiolaris are disjoint.
Peet, 2005: B.1948 R. macranthus contains
K.2004 R. petiolaris; B.1948 R. petiolaris is
contained by K.2004 R. petiolaris
18Resolving Inconsistencies
- Trying to simultaneously satisfy non-emptiness, disjointness and the closed world
- Relaxing any of these makes the mapping consistent, giving us clues to hidden truths
- It turns out that Kartesz and Benson focus on different localities.
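The effect of relaxing a constraint can be illustrated with a brute-force model search on the R. hydrocharoides mapping. This is a sketch under assumptions, not the tooling used in the talk: it searches a three-element universe, encodes each congruence by letting the two concepts share one extension, and uses hypothetical names throughout.

```python
from itertools import combinations, product

def subsets(size):
    """All subsets of {0, ..., size-1} as frozensets."""
    items = range(size)
    return [frozenset(c) for r in range(size + 1) for c in combinations(items, r)]

def consistent(non_empty=True, disjoint=True, closed_world=True, size=3):
    """Search for a model of the mapping under the selected constraints."""
    for hydro, natans, stolonifer, typicus in product(subsets(size), repeat=4):
        # Congruences: Kartesz's concepts share Benson's extensions.
        k_hydro, k_stolonifer, k_typicus = hydro, stolonifer, typicus
        benson_kids = [natans, stolonifer, typicus]
        kartesz_kids = [k_stolonifer, k_typicus]
        if non_empty and not all([hydro, *benson_kids]):
            continue
        # Kartesz's children are two of Benson's, so one check covers both.
        if disjoint and any(a & b for a, b in combinations(benson_kids, 2)):
            continue
        if closed_world and (hydro != frozenset().union(*benson_kids)
                             or k_hydro != frozenset().union(*kartesz_kids)):
            continue
        return True
    return False

print(consistent())                    # False
print(consistent(non_empty=False))     # True
print(consistent(disjoint=False))      # True
print(consistent(closed_world=False))  # True
```

Which relaxation yields a plausible model hints at what the two authorities actually disagree about.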
19Inconsistent Mapping
Benson, 1948
Kartesz, 2004
Ranunculus hydrocharoides
Ranunculus hydrocharoides
≡
R.h. var natans
R.h. var stolonifer
R.h. var typicus
R.h. var stolonifer
R.h. var typicus
≡
≡
Peet, 2005: B.1948 R.h. stolonifer is congruent
to K.2004 R.h. stolonifer; B.1948 R.h. typicus is
congruent to K.2004 R.h. typicus; B.1948 R.
hydrocharoides is congruent to K.2004 R.
hydrocharoides
20Summary
- Taxonomic Concepts are important
- Logic is a useful tool when reasoning about mappings between taxonomies
- We have the beginnings of a representation for taxonomies
- That representation can find unstated mappings
- And detect inconsistent mappings
21Future Work
- Beefing up the representation
- Formalizing more constraints, such as rank
- Working in other factors, such as locality
- Adding intelligence to tools which build mappings
- Using the representation in a workflow system to aid data integration
22Thanks! Questions?
- We would like to acknowledge
- Bob Peet for the Ranunculus data set
- NSF, under SEEK awards 0225676, 0225665, 0225635,
and 0533368