Title: Ontologies in Bioinformatics
1 Ontologies in Bioinformatics
- Robert Stevens
- Department of Computer Science
- University of Manchester
- Robert.stevens_at_cs.man.ac.uk
2 Introduction
- What is knowledge?
- What is an ontology?
- Relationships between the two communities
- The last decade of bio-ontologies
- The future
3 What is Knowledge?
[Figure: example words "man, academic, senior, ancient university, 5 rated, European, important figure in biology", with the label BIOLOGY]
- Knowledge -- all information, plus the understanding
needed to carry out tasks and to infer new information
- Information -- data equipped with meaning
- Data -- un-interpreted signals that reach our
senses
4 Things, Symbols and Concepts
- Humans require words (or at least symbols) to
communicate efficiently. The mapping of words to
things is only indirectly possible. We do it by
creating symbols that stand for things.
- The relation between symbols and things has been
described in the form of the meaning triangle
5 Representing Knowledge
- Language uses symbols and rules (natural
language) to communicate knowledge
- Need human intelligence to deal with pragmatics
- NLP is notoriously difficult
- Need to capture knowledge in a computationally
amenable manner
- Ontology: a conceptual model
- Ontology plus lexicon is a terminology
- Primary aim: creating a shared understanding of
a domain and the relationships within that domain
- Common symbols for the things within a domain
- Capturing domain knowledge with fidelity and
precision
6 Sharing info ? Sharing meaning
- Metadata
- Data describing the content and meaning of
resources and services.
- But everyone must speak the same language
- Terminologies
- Shared and common vocabularies
- For search engines, agents, curators, authors and
users
- But everyone must mean the same thing
- Ontologies
- Shared and common understanding of a domain
- Essential for search, exchange and discovery
7 What is an Ontology?
- Concepts: units of thought; classes and
individuals
- Protein, Gene, DNA, Hexokinase, glycolysis, ...
- Terms: labels for concepts ("Protein", "Gene", ...)
- Relationships: semantic links between concepts
- Is-a-kind, is-a, part-of, name-of, ...
- Taxonomy: the backbone of an ontology
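The three ingredients on this slide can be sketched in a few lines of Python. This is only an illustrative data structure, not how any particular ontology tool stores its content: the `Concept` and `Ontology` classes, their method names, and the synonym "HK" are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A unit of thought, plus the terms (labels) people use for it."""
    name: str
    terms: set = field(default_factory=set)


class Ontology:
    def __init__(self):
        self.concepts = {}      # concept name -> Concept
        self.relations = set()  # (subject, relation, object) triples

    def add_concept(self, name, *synonyms):
        # The concept's own name always counts as one of its terms.
        self.concepts[name] = Concept(name, {name, *synonyms})

    def relate(self, subject, relation, obj):
        # Only allow links between concepts that have been declared.
        for name in (subject, obj):
            if name not in self.concepts:
                raise KeyError(f"unknown concept: {name}")
        self.relations.add((subject, relation, obj))

    def taxonomy(self):
        """The is-a backbone: (child, parent) pairs only."""
        return {(s, o) for s, r, o in self.relations if r == "is-a"}

    def lookup(self, term):
        """Find concepts by any of their terms, ignoring case."""
        wanted = term.lower()
        return sorted(
            name for name, concept in self.concepts.items()
            if wanted in {t.lower() for t in concept.terms}
        )


ont = Ontology()
ont.add_concept("Protein")
ont.add_concept("Hexokinase", "HK")  # "HK" is a hypothetical synonym
ont.relate("Hexokinase", "is-a", "Protein")

print(ont.taxonomy())
print(ont.lookup("hk"))
```

Keeping terms separate from concepts is the point of the sketch: several labels can resolve to one concept, while the relationships are stated between concepts rather than between strings.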
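8 So What Counts as an Ontology? (Deborah McGuinness, Stanford)
[Figure: spectrum of ontology expressiveness, from least to most formal]
- Catalog/ID
- Terms/glossary
- Thesauri
- Informal is-a
- Formal is-a
- Formal instance
- Frames (properties)
- Value restrictions
- Disjointness, inverse, part-of
- General logical constraints
[Example resources placed along the spectrum: Arom, Gene Ontology, TAMBIS, EcoCyc, Mouse Anatomy, PharmGKB]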
9
- "The art of ranking things in genera and species
is of no small importance and very much assists
our judgment as well as our memory. You know how
much it matters in botany, not to mention animals
and other substances, or again moral and notional
entities as some call them. Order largely depends
on it, and many good authors write in such a way
that their whole account could be divided and
subdivided according to a procedure related to
genera and species. This helps one not merely to
retain things, but also to find them. And those
who have laid out all sorts of notions under
certain headings or categories have done
something very useful."
- Gottfried Wilhelm Leibniz, New Essays on Human
Understanding
10 The Gene Ontology
11 Bio-Ontologies in the Past Decade
- Explicit use of ontologies fairly recent
- EcoCyc and RiboWeb using frame-based systems to
create knowledge bases
- An area in which the CS community can test their
technology
- Large, complex and dynamic
- A knowledge-based discipline
- The post-genomic era increases the need for
shared understanding
- Cross-genome comparisons need structured,
controlled vocabularies
- Moved from a small niche to a much bigger niche
- Biologists are building ontologies
12 Uses of Bio-Ontologies
- Controlled vocabularies for annotation
- Describing schema and the content of schema
- Domain maps
- Query mechanisms
- Resolution of semantic heterogeneity
- Text analysis
13 The Gene Ontology
- Tutorial and the first Bio-Ontologies meeting at
ISMB 1998 in Montreal
- Fly, mouse and yeast get together to develop GO
- First release some 3,500 terms covering Molecular
Function, Biological Process and Cellular
Component
- Now some 15,000 terms and growing
- Gene Ontology Consortium covers some 15 organism
databases plus SWISS-PROT and others
- Synonyms, abbreviations and associations to gene
products
- Access to names, genes etc.
- A common understanding across a community
14 GO DAG for heparin biosynthesis
- GO:0003673 Gene_Ontology (46199)
  - GO:0008150 biological_process (30188)
    - GO:0008151 cell growth and/or maintenance (20547)
      - GO:0008152 metabolism (14693)
        - GO:0016051 carbohydrate metabolism (267)
          - GO:0006023 aminoglycan metabolism (18)
            - GO:0030203 glycosaminoglycan metabolism
              - GO:0030202 heparin metabolism (3)
                - GO:0030210 heparin biosynthesis (3)
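One way to use a hierarchy like this in a query is to collect every ancestor of a term, so that a search for a general term such as metabolism can also reach gene products annotated to a specific one such as heparin biosynthesis. The sketch below is a minimal illustration under stated assumptions: the `PARENTS` table is hand-copied from the single path on this slide (identifiers written in the usual GO:nnnnnnn form), the function name `ancestors` is made up, and a real GO term may have several parents, which is why each entry is a list.

```python
# Child -> list of parents, transcribed from the path shown on the slide.
# In the full GO graph a term can have more than one parent (it is a DAG).
PARENTS = {
    "GO:0030210": ["GO:0030202"],  # heparin biosynthesis
    "GO:0030202": ["GO:0030203"],  # heparin metabolism
    "GO:0030203": ["GO:0006023"],  # glycosaminoglycan metabolism
    "GO:0006023": ["GO:0016051"],  # aminoglycan metabolism
    "GO:0016051": ["GO:0008152"],  # carbohydrate metabolism
    "GO:0008152": ["GO:0008151"],  # metabolism
    "GO:0008151": ["GO:0008150"],  # cell growth and/or maintenance
    "GO:0008150": ["GO:0003673"],  # biological_process
    "GO:0003673": [],              # Gene_Ontology (root)
}


def ancestors(term, parents=PARENTS):
    """Return every term reachable by following parent links upwards."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:  # a DAG can reach a node by several routes
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


heparin_ancestors = ancestors("GO:0030210")
print(sorted(heparin_ancestors))
```

The `seen` set matters only when a term has more than one route to the root; on this single path it is not strictly needed, but it keeps the traversal correct for a general DAG.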
15 Open Bio-Ontologies (OBO)
- GO, though large, is narrow
- Sequence Ontology
- Chemical Ontology
- Promotes a common ontology format, tools and
house-style
- Micro-array community a further boost, avoiding
mistakes of previous bioinformatics resources
- Need ontologies for phenotype, tissues,
anatomies, etc.
16 Two Communities
- Computer scientists: building ontologies, KR, reasoning
- Biologists: ontology content, domain knowledge
- Both together: better ontologies
17 What are We Saying?
[Figure: Person at the top, with Woman and Man each linked to it by is-a]
- Are all instances of Man instances of Person?
- Can an instance of Person be both an instance of
Man and an instance of Woman?
- Can there be any more kinds of Person?
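These questions can be made concrete with a toy model. The sketch below is an assumption-laden illustration, not a description of any real reasoner: `IS_A`, `superclasses`, `is_instance_of` and `consistent` are names invented for the example. It shows that is-a links alone answer the first question, while the second depends on whether a disjointness statement has been added explicitly.

```python
# The two is-a links from the slide: child -> parent.
IS_A = {"Man": "Person", "Woman": "Person"}


def superclasses(cls):
    """Follow is-a links upwards from cls."""
    found = []
    while cls in IS_A:
        cls = IS_A[cls]
        found.append(cls)
    return found


def is_instance_of(asserted_types, cls):
    """An individual belongs to cls if any asserted type is cls or below it."""
    return any(t == cls or cls in superclasses(t) for t in asserted_types)


def consistent(asserted_types, disjoint):
    """False if two asserted types have been declared disjoint."""
    return not any(
        frozenset((a, b)) in disjoint
        for a in asserted_types
        for b in asserted_types
        if a != b
    )


# Question 1: is-a alone makes every Man a Person.
print(is_instance_of({"Man"}, "Person"))

# Question 2: with nothing else stated, being both Man and Woman is allowed...
print(consistent({"Man", "Woman"}, disjoint=set()))

# ...and is ruled out only once disjointness is asserted.
print(consistent({"Man", "Woman"}, disjoint={frozenset(("Man", "Woman"))}))
```

The third question works the same way: unless the model states that Man and Woman together cover Person, nothing in the two is-a links prevents further kinds of Person from being added.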
18 This Year's Meeting
- A theme of text analysis and ontology
- First time talks have matched theme
- Ontologies and indexing
- Integrating ontologies into NLP systems
- Ontologies in information retrieval
- Developing terminologies
- GO in NLP
- New Ontologies
- Semantic Similarity
19 Opportunities
- Ontologies to help text analysis
- Text analysis to help build ontologies
- Biology community steadily building a large
number of large domain ontologies
- CS community can help build computationally
amenable ontologies
- Vast quantities of domain knowledge in natural
language forms in literature and databanks
- Opportunities for language and ontology
communities