Title: Data and ontology integration issues in the biosciences
1Data and ontology integration issues in the
biosciences
Presentation at the Micro-Array Department,
University of Amsterdam 23-8-2004
- Marijke Keet
- Napier University, 10 Colinton Road, Edinburgh
EH10 5DT m.keet_at_napier.ac.uk / marijke_at_meteck.org
2Overview presentation
- Data integration and ontology
- Ontologies
  - kinds, formalisation, bias bioscience (after the break)
- Ontology integration
  - categorisation, some challenges
4Data heterogeneity
- Schematic
  - data type, labelling, aggregation, generalisation
- Semantic
  - naming, scaling and units, confounding
- Intensional
  - domain, integrity constraints
Based on Goh (1996)
5Integrating data
E.g. DB1 has an attribute named colour with the value green, and DB2 has an attribute named color with the value 2DE60E.
The data are different, but the conceptualisation is the same. Capture this agreement in an ontology.
In shorthand, an ontology is a specification of a shared conceptualisation (Gruber), but better: "An ontology is a logical theory accounting for the intended meaning of a formal vocabulary, i.e. its ontological commitment to a particular conceptualisation of the world. The intended models of a logical language using such a vocabulary are constrained by its ontological commitment. An ontology indirectly reflects this commitment (and the underlying conceptualisation) by approximating these intended models." (Guarino, 1998).
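The colour/color example can be made concrete with a small sketch. It is only an illustration under stated assumptions: the mapping table `ATTRIBUTE_TO_CONCEPT`, the concept name `Colour`, the RGB triple chosen for "green", and the coarse `dominant_hue` classification are all hypothetical and not part of the presentation.

```python
# Records from two databases; attribute names and values follow the slide.
db1_record = {"colour": "green"}
db2_record = {"color": "2DE60E"}

# Assumed shared conceptualisation: both attribute names denote one concept.
ATTRIBUTE_TO_CONCEPT = {"colour": "Colour", "color": "Colour"}

# Assumed RGB value for the colour name; any green-dominant triple would do.
NAMED_COLOURS = {"green": (0, 128, 0)}


def parse_hex(value):
    """Turn a six-digit hex string such as '2DE60E' into an (R, G, B) triple."""
    return tuple(int(value[i:i + 2], 16) for i in (0, 2, 4))


def dominant_hue(rgb):
    """Coarse classification: name the strongest of the three channels."""
    names = ("red", "green", "blue")
    return names[max(range(3), key=lambda i: rgb[i])]


def to_shared(record):
    """Re-express a source record in terms of the shared concept."""
    shared = {}
    for attribute, value in record.items():
        concept = ATTRIBUTE_TO_CONCEPT[attribute]
        rgb = NAMED_COLOURS[value] if value in NAMED_COLOURS else parse_hex(value)
        shared[concept] = dominant_hue(rgb)
    return shared


print(to_shared(db1_record))
print(to_shared(db2_record))
```

Both records come out as `{'Colour': 'green'}`: the schematic difference (attribute name) and the semantic difference (value encoding) are resolved against the shared concept, while the source data stay untouched.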
6Overview presentation
- Data integration and ontology
- Ontologies
  - kinds, formalisation, bias bioscience (after the break)
- Ontology integration
  - categorisation, some challenges
7Kinds of ontologies
- Representation ontologies: conceptualisations that underlie knowledge representation formalisms.
- Top-level ontologies: generic and intermediate ontology concepts. This can be on top of a domain ontology or a stand-alone effort; the main aspect is domain independence.
- Generic ontologies: consist of the general, foundational aspects of a conceptualisation (a lower branch in a top-level ontology).
- Intermediate ontologies: slightly more tailored towards a conceptualisation of a specific domain. There may not be references to generic ontologies.
- Domain ontologies: specialise in a subset of generic ontologies in a domain or sub-domain.
- Application ontologies: the universe of discourse (UoD) is even narrower than that of a domain ontology.
8Levels of formalisation (1-2)
- Catalogue of normalised terms: a simple list without inclusion order, axioms or glosses.
- Glossed catalogue: a catalogue with natural language glossary entries, e.g. a dictionary of medicine.
- Prototype-based ontology: types and subtypes are distinguished by prototypes rather than by definitions and axioms in a formal language.
- Taxonomy: a collection of concepts having a partial order induced by inclusion.
- Axiomatised taxonomy: as a taxonomy, but with axioms and stated in a formal language.
- Context library / axiomatised ontology: a set of axiomatised taxonomies with relations among them, like the inclusion of one context into another one, or the use of a concept from one in the other one.
Lightweight ontologies
Informal ontology
Semi-formal ontology?
Heavyweight ontologies
Formal ontology
9Formalisations (2-2)
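Diagram: ontological precision, with an example at several of the levels shown.
Catalog: tennis, football, game, field game, court game, athletic game, outdoor game
Glossary
Taxonomy (terms in hierarchy order): game, athletic game, court game, tennis, outdoor game, field game, football
Thesaurus (NT = narrower term, RT = related term): game NT athletic game NT court game RT court NT tennis RT double fault
DB/OO scheme
Axiomatized theory:
$\text{game}(x) \rightarrow \text{activity}(x)$
$\text{athletic game}(x) \rightarrow \text{game}(x)$
$\text{court game}(x) \leftrightarrow \text{athletic game}(x) \land \exists y.\, \text{played\_in}(x,y) \land \text{court}(y)$
$\text{tennis}(x) \rightarrow \text{court game}(x)$
$\text{double fault}(x) \rightarrow \text{fault}(x) \land \exists y.\, \text{part\_of}(x,y) \land \text{tennis}(y)$
Precision: the ability to catch all and only the intended meaning (for a logical theory, to be satisfied by intended models).
Gangemi (2004)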
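A minimal sketch of what higher precision buys, assuming the slide's axioms are read with the usual connectives (implication, biconditional for the court game definition, conjunction, existential quantifier). The finite models, the individuals `match1` and `centre_court`, and the function names are hypothetical. A plain taxonomy accepts a model in which a court game is not played in any court; the axiomatised theory rejects it.

```python
# Subsumption links only: this is all a taxonomy can express.
SUBSUMPTIONS = [
    ("game", "activity"),
    ("athletic_game", "game"),
    ("court_game", "athletic_game"),
    ("tennis", "court_game"),
]


def satisfies_taxonomy(m):
    """Every subclass extension is contained in its superclass extension."""
    return all(m[sub] <= m[sup] for sub, sup in SUBSUMPTIONS)


def satisfies_axioms(m):
    """Taxonomy links plus the definition of court game (assumed biconditional)."""

    def played_on_court(x):
        return any((x, y) in m["played_in"] and y in m["court"] for y in m["domain"])

    def court_game_definition(x):
        defined = x in m["athletic_game"] and played_on_court(x)
        return (x in m["court_game"]) == defined

    return satisfies_taxonomy(m) and all(court_game_definition(x) for x in m["domain"])


def make_model(played_in):
    """Hypothetical finite model with one tennis match and one court."""
    return {
        "domain": {"match1", "centre_court"},
        "activity": {"match1"},
        "game": {"match1"},
        "athletic_game": {"match1"},
        "court_game": {"match1"},
        "tennis": {"match1"},
        "court": {"centre_court"},
        "played_in": played_in,
    }


intended = make_model({("match1", "centre_court")})
unintended = make_model(set())  # a court game played nowhere

for name, model in (("intended", intended), ("unintended", unintended)):
    print(name, satisfies_taxonomy(model), satisfies_axioms(model))
```

The taxonomy is satisfied by both models, whereas the axioms are satisfied only by the intended one. The double fault axiom is left out to keep the sketch short.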
10Overview presentation
- Data integration and ontology
- Ontologies
  - kinds, formalisation, bias bioscience (after the break)
- Ontology integration
  - categorisation, some challenges
11Ontology integration (1-4)
- Combining different conceptualisations (views on reality) somehow.
- System, language/syntax, structure, and semantic integration. The latter is the most difficult.
- Structure and/versus semantic integration (example on the next slide)
- Anarchy of terminology, definitions and methodologies (now at least 24 terms and 48 definitions/methodologies)
- Organise into levels of integration. Develop a taxonomy of ontology integration?
12Ontology integration (2-4)
Example structure/semantics
13Ontology integration (3-4)
Initial categorisation
Diagram axes: increase in (perceived) difficulty of operation; increase in level of integration.
- Unification, total compatibility, merging similar subject domains
- Merging different subject domains, partial compatibility
- Mapping, approximations, helper model, alignment, intersection ontology
- Extending, incremental loading
- Use in/for applications
- Queried ontologies, hybrid ontologies
14Ontology integration (4-4)
Some challenges
- (In)formal ontologies
- (In)consistencies in ontology design decisions during development (relationships); detailed on the next two slides
- Top-down versus/combined with bottom-up
- Using foundational aspects in ontology development decreases the chance of design inconsistencies and facilitates integration
- Subject domain heterogeneity (example)
- Conflicting goals
- More conflicts and mismatches: see the slide Conflicts and mismatches
15(In)consistencies in ontology design decisions
(1-2)
- Subsumption versus instantiation: if A isA B, then all instances of A are also instances of B. Instantiation says a instanceOf A, i.e. a is an individual (particular, instance) and not a subtype of A.
- Desiderata to create the hierarchy, like keeping function, structure and process separate.
- E.g. the OBO phenotype ontology does not:
  - attribute\excretory_function PATO00300204
  - attribute\urination PATO00305204
  - attribute\urine_composition PATO00301204
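The difference between isA and instanceOf can be sketched in a few lines. The types A and B and the individual a follow the slide; the extra type C, the dictionary layout, the single-parent restriction, and the function names are assumptions made for illustration.

```python
# Subsumption between types (single parent per type, as a simplification).
IS_A = {"A": "B", "B": "C"}

# Instantiation: an individual and the type it directly instantiates.
INSTANCE_OF = {"a": "A"}


def ancestors(type_name):
    """All types that subsume type_name, following isA upwards."""
    found = []
    while type_name in IS_A:
        type_name = IS_A[type_name]
        found.append(type_name)
    return found


def is_subtype_of(candidate, type_name):
    """True only for types; individuals never count as subtypes."""
    return candidate in IS_A and type_name in ancestors(candidate)


def is_instance_of(individual, type_name):
    """Instances of A are also instances of everything that subsumes A."""
    direct = INSTANCE_OF[individual]
    return type_name == direct or type_name in ancestors(direct)


print(is_instance_of("a", "B"))  # a is an instance of B because A isA B
print(is_subtype_of("a", "A"))   # a is an individual, not a subtype of A
print(is_subtype_of("A", "C"))   # isA is transitive
```

Keeping the two relations in separate structures makes it harder to use isA where instanceOf is meant, which is the design inconsistency the slide warns about.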
16(In)consistencies in ontology design decisions
(2-2)
- E.g. "aseptate hypha isa hypha" (an aseptate hypha is a hypha without cross walls) and "hypha in mycelium isa hypha". The former is about a special kind of hypha, the latter takes topology as the distinction for subtyping -> these are distinct factors, though they are treated as the same kind of isA in the FAO, where hypha subsumes both.
- Allow multiple inheritance, or not?
- partOf: such as parthood, proper parthood, connection, external connection, tangential parthood, interior parthood, partial coincidence and located-in (see e.g. Smith and Rosse, 2004; Donnelly, 2004)
- Properties and meta-properties (see Guarino and Welty (2000) for details)
17Conflicts and mismatches
- Factors affecting ontology combination tasks
  - Practical problems: finding matchings, diagnosis, repeatability, software usability, social factors of cooperation, goals
  - Mismatches between ontologies
    - language level: syntax, logical representation, semantics of primitives, language expressivity, precision
    - ontology level
      - conceptualisation: content/UoD, concept scope, relationship scope, context, aggregation, accuracy
      - explication
        - terminological: hyper-/hyponyms (generalization), homonyms, synonyms
        - modelling style: paradigm, entity/concept description
        - encoding
  - Versioning: identification, traceability, translation
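Two of the terminological mismatches listed above, synonyms and homonyms, can be detected mechanically once terms are linked to concepts. The sketch below is illustrative only: the two toy ontologies, the concept identifiers, and the function name are hypothetical, and real ontologies rarely come with shared concept identifiers, which is precisely what makes matching hard in practice.

```python
# Each toy ontology maps a term to an (assumed) concept identifier.
ontology_1 = {"colour": "C_colour", "expression": "C_gene_expression"}
ontology_2 = {"color": "C_colour", "expression": "C_mathematical_expression"}


def terminological_mismatches(o1, o2):
    """Find homonyms (same term, different concept) and synonyms
    (different terms, same concept) between two term-to-concept maps."""
    homonyms = sorted(t for t in o1.keys() & o2.keys() if o1[t] != o2[t])
    synonyms = sorted(
        (t1, t2)
        for t1, c1 in o1.items()
        for t2, c2 in o2.items()
        if c1 == c2 and t1 != t2
    )
    return {"homonyms": homonyms, "synonyms": synonyms}


print(terminological_mismatches(ontology_1, ontology_2))
```

For these inputs the result lists "expression" as a homonym and ("colour", "color") as a synonym pair. Hyper-/hyponym mismatches would additionally need the subsumption hierarchy of each ontology.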
18Overview presentation
- Data integration and ontology
- Ontologies
  - kinds, formalisation, bias bioscience
- Ontology integration
  - categorisation, some challenges
19Ontologies for bioscience (1-3)
Diagram of the standard view of science, with numbered steps:
(1) Facts with an empirical basis
(2) Empirical axioms/laws (universal)
(3) Theory
(4) New empirical axioms/laws (universal)
Arrow labels: induction, confirmation; formation of hypothesis; formation of a theory; explanation; confirmation; prediction.
20Ontologies for bioscience (2-3)
- Ontologies as engineering artefacts
  - Facilitate knowledge reuse, interoperability
  - Modelling practice
  - Another item in the problem-solver's toolbox
  - Part of a new/improved software system
  - SW tools for ontology development, maintenance, integration
- Ontologies embedded in science
  - Top-level ontologies
  - Attempt to understand, what/why
  - W.r.t. bioscience
    - Co-defining concepts?
    - Part of the falsification paradigm and steps 2, 3 of the standard view
- -> synergy, mutually beneficial process, but ...
21Ontologies for bioscience (3-3)
- The very essence of scientific progress is change, redefinition and creation of new concepts.
  -> ontology subject to (extensive) modification. Complicates integration.
- Concepts underspecified, hypotheses and theories exist simultaneously.
  -> accommodate this in an ontology? E.g. a library of ontologies for alternative views, with loose coupling instead of integration?
  -> capture what is, what can be, (and what might be?)
- Biological data is more complicated than technological and practice data.
  -> more on the next slide
- Systems Thinking, integrative concepts, holism and process-orientation conflict with objectifying knowledge in ontologies.
  -> interdisciplinary work of ontologists with scientists
- Empiricism and the theoretical methodology in life sciences.
  -> bottom-up and top-down procedures, respectively, for ontology development
22Formalising biological knowledge
- Challenging biological data characteristics (detail: Main biological data characteristics)
- Are these aspects real challenges, or are they due to the limited expressiveness of non-formal approaches and software modelling paradigms (ER, OO, ...), or maybe due to limited knowledge of both the domain expert and the ontologist?
- Applied sciences within bio (medicine, ecology, environmental sciences), contexts (detail: Applied bioscience)
23Main biological data characteristics
24Applied bioscience
Emphasis core sciences: all-inclusive, comprehensive models
Emphasis applied bioscience: conceptually representing the integration of various core disciplines; only what is relevant in a limited context (example on the next slide)
25Example applied bioscience
26References and more info (1-2)
- Donnelly, M. (2004). On parts and holes: the spatial structure of the human body. MEDINFO 2004, San Francisco, USA.
- Gangemi, A. (2004). Some design patterns for domain ontology building and analysis. Manchester, 15-16 January. www.loa-cnr.it/Tutorials.html
- Goh, C.H. (1996). Representing and reasoning about semantic conflicts in heterogeneous information sources. PhD thesis, MIT.
- Guarino, N. (1998). Formal Ontology and Information Systems. In: Formal Ontology in Information Systems, Proceedings of FOIS'98, Trento, Italy. Amsterdam: IOS Press.
- Guarino, N. and Welty, C. (2000). A formal ontology of properties. Proceedings of the 12th Int. Conf. on Knowledge Engineering and Knowledge Management, Lecture Notes in Computer Science, Springer Verlag.
- Keet, C.M. (2004). Ontology development and integration for the biosciences. Technical Report, Napier University, Edinburgh, UK. www.meteck.org/research.html
- Smith, B. and Rosse, C. (2004). The role of foundational relations in the alignment of biomedical ontologies. Proceedings of MEDINFO, San Francisco, USA.
27References and more info (2-2)
- Some websites with different perspectives/aims/information on ontologies
  - LOA: www.loa-cnr.it
  - IFOMIS: www.ifomis.de
  - Ontology: www.ontology.org
  - Formal Ontology: www.formalontology.it
  - RE Kent: www.ontologos.org
  - WonderWeb project: http://wonderweb.semanticweb.org
  - JF Sowa: www.jfsowa.com/ontology/index.htm
  - SUMO: http://ontology.teknowledge.com/
  - AAAI page: http://www.aaai.org/AITopics/html/ontol.html
- Links to a few of the groups developing tools
  - KAON: http://km.aifb.uni-karlsruhe.de/kaon2
  - Protégé: http://protege.stanford.edu
  - VU: http://www.cs.vu.nl/
  - STARLab: www.starlab.vub.ac.be/default.htm
28Thank you!