Title: Gene Ontology GO
1Gene Ontology (GO)
- An Introduction
- Dev H. W. Oliver
2References
- GO For Newbies
- Introduction to the Gene Ontology. Suparna Mundodi, PAG 2006. http://www.geneontology.org/GO.teaching.resources.shtml#present
- GO, the Gene Ontology: Introduction to GO, its purpose, structure, annotation and applications, plus other biomedical ontologies. Pascale Gaudet, presentation to bioinformatics graduate students. http://www.geneontology.org/GO.teaching.resources.shtml#present
- Building Biomedical Ontologies. Jennifer Clark, EBI internal seminar. http://www.geneontology.org/GO.teaching.resources.shtml#present
3Outline
- Introduction and Motivation
- Building an Ontology
- GO Annotations
- Editing The Gene Ontology
- Applications of Gene Ontology (GO)
4Introduction and Motivation
- Gene annotation system
- Controlled vocabulary that can be applied to all organisms
- Used to describe gene products
5What is in a name?
- Example: What is a cell?
- Mundodi, et al
6The Importance of a Name
7The Importance of a Name
8The Importance of a Name
9The Importance of a Name
10The Importance of a Name
11The Importance of a Name
- Different names can be used to describe the same concepts
- Different concepts can be described by the same name
- Example: Glucose synthesis, Glucose biosynthesis, Glucose formation, Glucose anabolism, and Gluconeogenesis all refer to the process of making glucose from simpler components
12What is the Gene Ontology?
- A controlled vocabulary that can be applied to all organisms
- Used to describe gene products (proteins and RNA) in any organism
- Gaudet, et al
13Ontology
- Study of being or existence
- Seeks to describe the basic categories and relationships of being or existence
- Defines entities and types of entities within its framework
- Studies the conceptions of reality
- Wikipedia
14Ontology
- Includes
- A vocabulary of terms (names for concepts)
- Definitions
- Defined logical relationships between the terms
15Ontology Structure
- Ontologies can be represented as graphs, where the nodes are connected by edges
- Nodes: concepts in the ontology
- Edges: relationships between the concepts
16Ontology Structure
- The Gene Ontology is structured as a hierarchical directed acyclic graph (DAG)
- Terms can have more than one parent and zero, one or more children
- Terms are linked by two relationships
- is-a
- part-of
17Simple hierarchies (Trees) vs. Directed Acyclic Graphs
- Simple hierarchies (trees): single parent
- Directed acyclic graphs: one or more parents
18Directed Acyclic Graphs (DAG)
[Figure: DAG fragment with the nodes protein complex, other protein complexes, organelle, other organelles, mitochondrion, and fatty acid beta-oxidation multienzyme complex, connected by is-a and part-of edges]
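The difference between a tree and a DAG can be sketched in a few lines of Python. This is only an illustration, not GO's actual data model: the node names come from the slide, but the edges between them are assumptions, and `PARENTS` and `ancestors` are hypothetical names.

```python
# Each term maps to a list of (relationship, parent) edges.
# The edges are assumed for illustration; only the node names come from the slide.
PARENTS = {
    "organelle": [],
    "protein complex": [],
    "mitochondrion": [("is_a", "organelle")],
    "fatty acid beta-oxidation multienzyme complex": [
        ("is_a", "protein complex"),
        ("part_of", "mitochondrion"),
    ],
}


def ancestors(term, parents=PARENTS):
    """Return every term reachable by following parent edges upward."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for _relationship, parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


if __name__ == "__main__":
    complex_name = "fatty acid beta-oxidation multienzyme complex"
    # Two direct parents: this is what a tree cannot express.
    print(len(PARENTS[complex_name]))
    print(sorted(ancestors(complex_name)))
```

Storing parents rather than children keeps the upward walk simple, which is the direction most annotation questions need.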
19Parent-Child Relationships
The cell component term Nucleus has 5 children
20True Path Rule
- The path from a child term all the way up to its top-level parent(s) must always be true
- cell
- cytoplasm
- chromosome
- nuclear chromosome
- nucleus
- nuclear chromosome
is-a? part-of?
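One way to see what the true path rule demands is to list everything an annotation implies under a given arrangement of terms. The sketch below compares two hypothetical arrangements of the terms on this slide. Neither is claimed to be the slide's exact figure or the real GO structure, and `implied_terms`, `BAD` and `GOOD` are illustrative names.

```python
def implied_terms(term, parents):
    """Return the term plus every ancestor reached via its parent edges."""
    result = {term}
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result


# Hypothetical arrangement 1: chromosome placed under cytoplasm.
BAD = {
    "cytoplasm": ["cell"],
    "nucleus": ["cell"],
    "chromosome": ["cytoplasm"],
    "nuclear chromosome": ["chromosome", "nucleus"],
}

# Hypothetical arrangement 2: chromosome placed directly under cell.
GOOD = {
    "cytoplasm": ["cell"],
    "nucleus": ["cell"],
    "chromosome": ["cell"],
    "nuclear chromosome": ["chromosome", "nucleus"],
}

if __name__ == "__main__":
    # Annotating a gene product to "nuclear chromosome" implies all of these.
    print(sorted(implied_terms("nuclear chromosome", BAD)))
    print(sorted(implied_terms("nuclear chromosome", GOOD)))
```

Under the first arrangement an annotation to nuclear chromosome would also imply cytoplasm, which is not true of a nuclear chromosome, so that path breaks the rule. The second arrangement implies only terms that hold.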
21How does GO work?
- What information might we want to capture about a gene product (i.e. the biochemical material, either RNA or protein, resulting from expression of a gene; the amount of gene product is used to measure how active a gene is)?
- What does the gene product do?
- Why does it perform these activities?
- Where does it act?
22The Three Gene Ontologies
- Molecular Function
- Biological Process
- Cellular Component
23Example Gene Product: hammer
- Function (what) | Process (why)
- Drive nail (into wood) | Carpentry
- Drive stake (into soil) | Gardening
- Smash roach | Pest Control
- Clown's juggling object | Entertainment
24Molecular Function
- A single reaction or activity, not a gene product
- A gene product may have several functions
- Sets of functions make up a biological process
25Biological Process
26Cellular Component
- Where a gene product acts
27Mitochondrial membrane
28What is in a GO Term?
- term: gluconeogenesis
- id: GO:0006094
- definition: The formation of glucose from noncarbohydrate precursors, such as pyruvate, amino acids and glycerol.
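The three parts of a term map naturally onto a small record type. The sketch below is one possible representation, not a GO Consortium API: `GOTerm` and `GO_ID_PATTERN` are hypothetical names, and the only format rule it checks is that an identifier is "GO:" followed by seven digits.

```python
import re
from dataclasses import dataclass

# GO identifiers are written as "GO:" followed by seven digits.
GO_ID_PATTERN = re.compile(r"GO:\d{7}")


@dataclass(frozen=True)
class GOTerm:
    term_id: str
    name: str
    definition: str

    def __post_init__(self):
        # Reject identifiers that lost their colon or have the wrong length.
        if not GO_ID_PATTERN.fullmatch(self.term_id):
            raise ValueError(f"not a GO identifier: {self.term_id!r}")


if __name__ == "__main__":
    gluconeogenesis = GOTerm(
        term_id="GO:0006094",
        name="gluconeogenesis",
        definition=(
            "The formation of glucose from noncarbohydrate precursors, "
            "such as pyruvate, amino acids and glycerol."
        ),
    )
    print(gluconeogenesis.term_id, gluconeogenesis.name)
```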
29Areas Not Covered By GO
- No pathological processes
- No experimental conditions
- No evolutionary relationships
- No gene products
- Not a system of nomenclature
31Content Of GO
- Current term counts as of September 17, 2006 at 2:00 Pacific time
- 20623 terms, 95.7% with definitions
- 11360 biological_process
- 1806 cellular_component
- 7457 molecular_function
- There are 1007 obsolete terms not included in the above statistics
- http://www.geneontology.org/GO.downloads.shtml#ont
32Outline
- Introduction and Motivation
- Building an Ontology
- GO Annotations
- Editing The Gene Ontology
- Applications of Gene Ontology (GO)
33Clark et al., 2005
34Clark et al., 2005
35Synonyms
- classes: GO terms, types, kinds, universals
- instances: annotated gene product attributes, tokens, individuals, particulars
36A Deeper Look at Relationships
37part_of
- Represents how objects combine together to form complex objects
- E.g. a steering wheel is a part of a Ford Explorer.
38How to define A is_a B
- A and B are names of universals (natural kinds, types) in reality
- All instances of A are, as a matter of biological science, also instances of B
39Easy term request
- Please add
- leucophore differentiation
- erythrophore differentiation
- cyanophore differentiation
- neuron differentiation
40part_of
41Examples Of Cell Differentiation
is_a
42Circular Definitions BAD!
- What is X-Cell Differentiation?
- Differentiation of an x cell.
43Non Circular Definitions Good
- What is X-Cell Differentiation?
- The process whereby a relatively unspecialized cell acquires specialized features of an x cell.
- Here we list the characteristics of the x cell
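The non-circular pattern is regular enough to be filled in mechanically for each requested term. The sketch below does only that first half: it produces the genus part of the definition, and the characteristics of the cell type still have to be written by a curator. `TEMPLATE` and `differentiation_definition` are hypothetical names, and the a/an choice is a crude vowel check.

```python
TEMPLATE = (
    "The process whereby a relatively unspecialized cell acquires "
    "specialized features of {article} {cell_type}."
)


def differentiation_definition(cell_type):
    """Fill the definition pattern for one cell type."""
    # Crude article choice: "an" before a vowel, "a" otherwise.
    article = "an" if cell_type[0].lower() in "aeiou" else "a"
    return TEMPLATE.format(article=article, cell_type=cell_type)


if __name__ == "__main__":
    for cell_type in ("leucophore", "erythrophore", "cyanophore", "neuron"):
        print(f"{cell_type} differentiation: {differentiation_definition(cell_type)}")
```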
44This is a term in the GO
[Term]
id: GO:0030182
name: neuron differentiation
namespace: biological_process
def: "The process whereby a relatively unspecialized cell acquires specialized features of a neuron." [GO:mah]
is_a: GO:0030154 ! cell differentiation
relationship: part_of GO:0048699 ! neurogenesis
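The term on this slide appears to be written in the OBO flat-file style, with "tag: value" lines under a [Term] header. Assuming that layout, a stanza can be read into a dictionary with a few lines of Python. This is a rough sketch rather than a full OBO parser: it ignores escapes, qualifiers and multiple stanzas, `parse_stanza` is a hypothetical helper, and the stanza text inside the code is a reconstruction of the slide.

```python
# Reconstructed from the slide, assuming OBO-style "tag: value" lines.
STANZA = """\
[Term]
id: GO:0030182
name: neuron differentiation
namespace: biological_process
def: "The process whereby a relatively unspecialized cell acquires specialized features of a neuron." [GO:mah]
is_a: GO:0030154 ! cell differentiation
relationship: part_of GO:0048699 ! neurogenesis
"""


def parse_stanza(text):
    """Parse one stanza into a dict mapping each tag to a list of values."""
    record = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("["):
            continue  # skip blank lines and the [Term] header
        tag, _, value = line.partition(": ")
        # Drop the trailing "! human-readable name" comment, if any.
        value = value.split(" ! ")[0].strip()
        record.setdefault(tag, []).append(value)
    return record


if __name__ == "__main__":
    term = parse_stanza(STANZA)
    print(term["id"], term["name"])
    print(term["is_a"], term["relationship"])
```

Values are kept as lists because tags such as is_a can appear more than once in a stanza.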
45This is the related term in the cell type ontology
[Term]
id: CL:0000540
name: neuron
def: "The basic cellular unit of nervous tissue. Each neuron consists of a body\, an axon\, and dendrites. Their purpose is to receive\, conduct\, and transmit impulses in the nervous system." [MESH:A.08.663]
xref_analog: FBbt:00005106
xref_analog: FBbt:00005146
is_a: CL:0000393 ! electrically responsive cell
is_a: CL:0000404 ! electrically signaling cell
relationship: develops_from CL:0000031 ! neuroblast
46Once we put the info together then the term in GO is much better and has far fewer circular definitions
[Term]
id: GO:0030182
name: neuron differentiation
namespace: biological_process
def: "The process whereby a relatively unspecialized cell acquires specialized features of a neuron. The basic cellular unit of nervous tissue. Each neuron consists of a body\, an axon\, and dendrites. Their purpose is to receive\, conduct\, and transmit impulses in the nervous system." [MESH:A.08.663, GO:mah]
is_a: GO:0030154 ! cell differentiation
relationship: part_of GO:0048699 ! neurogenesis
intersection_of: is_a GO:0030154 ! cell differentiation
intersection_of: has_participant CL:0000540 ! neuron
47Basis in Reality
- When building or maintaining an ontology, always
think carefully about how classes relate to
instances in reality
49Ontology
[Figure: cartoon character super power ontology, with is_a children super senses (x-ray vision, cat senses) and super physical powers (super leaping, super strength)]
- Every genus (parent class) has an instantiated species (differentia + genus)
- Every genus (parent class) has at least two children
50No instance should be annotated to two leaf terms
or two terms on the same level, but they are
here, so what is wrong?
51Cartoon Character Super Power Ontology
- Actually it is the superpowers that are annotated, not the superheroes. Once we fix that, the rule is obeyed and the ontology is used correctly, which makes it more powerful for reasoning
52Ontology
[Figure: cartoon character super power ontology, with is_a children super senses (x-ray vision, cat senses) and super physical powers (super leaping, super strength), annotated with the instances below]
- Superman's X-ray vision
- Catwoman's cat senses
- Catwoman's super strength
- Superman's super leaping
53Ontology
[Figure: cartoon character super power ontology, with is_a children super senses and super physical powers, annotated with the instances below]
- Superman's X-ray vision
- Catwoman's cat senses
- Catwoman's super strength
- Superman's super leaping
54Outline
- Introduction and Motivation
- Building an Ontology
- GO Annotations
- Editing The Gene Ontology
- Applications of Gene Ontology (GO)
55[Figure: substrate, O2, CO2, H2O, product]
56Types of GO Annotations
- Electronic Annotation
- Manual Annotation
- All annotations must
- be attributed to a source
- indicate what evidence was found to support the
GO term-gene/protein association
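The two requirements on every annotation, a source and supporting evidence, can be enforced at the point where an annotation record is created. The sketch below assumes a simple in-memory record; `Annotation` and its field names are hypothetical. The evidence codes shown (IEA for electronic annotation; IDA, IMP, ISS and TAS for curator-assigned ones) are a small, deliberately incomplete subset of the GO evidence codes, and the gene product and source values are placeholders.

```python
from dataclasses import dataclass

# A deliberately incomplete subset of GO evidence codes.
MANUAL_CODES = {"IDA", "IMP", "ISS", "TAS"}
ELECTRONIC_CODES = {"IEA"}


@dataclass(frozen=True)
class Annotation:
    gene_product: str
    go_id: str
    evidence_code: str
    source: str  # e.g. a literature reference or the annotating database

    def __post_init__(self):
        # Every annotation must be attributed to a source.
        if not self.source:
            raise ValueError("annotation must be attributed to a source")
        # Every annotation must state its supporting evidence.
        if self.evidence_code not in MANUAL_CODES | ELECTRONIC_CODES:
            raise ValueError(f"unknown evidence code: {self.evidence_code!r}")

    @property
    def is_electronic(self):
        return self.evidence_code in ELECTRONIC_CODES


if __name__ == "__main__":
    # "geneX" and both source strings are placeholders, not real records.
    manual = Annotation("geneX", "GO:0006094", "IDA", "placeholder-paper")
    electronic = Annotation("geneX", "GO:0006094", "IEA", "placeholder-db")
    print(manual.is_electronic, electronic.is_electronic)
```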
57Manual Annotations
- High-quality, specific gene/gene product associations made using
- Peer-reviewed papers
- Evidence codes to grade evidence
- Very time consuming and requires trained
biologists
58Manual Annotations Methods
- Extract information from published literature
- Curators perform manual sequence similarity analyses to transfer annotations between highly similar gene products (BLAST, protein domain analysis)
59Electronic Annotations
- Provides large coverage
- High-quality
- Annotations tend to use high-level GO terms and
provide little detail
60Electronic Annotations Methods
- Database entries
- Manual mapping of GO terms to concepts external to GO (translation tables)
- Proteins then electronically annotated with the relevant GO term(s)
- Automatic sequence similarity analyses to transfer annotations between highly similar gene products
61Additionally
- A gene product can have several functions, cellular locations and be involved in many processes
- Annotation of a gene product to one ontology is independent from its annotation to other ontologies
- Annotations are only to terms reflecting a normal activity or location
- Usage of unknown GO terms
62Unknown vs. Unannotated
- Unknown is used when the curator has determined that there is no existing literature to support an annotation
- Biological process unknown: GO:0000004
- Molecular function unknown: GO:0005554
- Cellular component unknown: GO:0008372
- NOT the same as having no annotation at all
- No annotation means that no one has looked yet
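The distinction can be made explicit in code by treating the three "unknown" terms as a separate state from having no annotations at all. This is a minimal sketch that uses the three identifiers listed on this slide; `UNKNOWN_TERMS` and `annotation_status` are hypothetical names.

```python
# The three "unknown" terms listed on the slide.
UNKNOWN_TERMS = {
    "GO:0000004": "biological process unknown",
    "GO:0005554": "molecular function unknown",
    "GO:0008372": "cellular component unknown",
}


def annotation_status(go_ids):
    """Classify a gene product by the GO IDs it is annotated to."""
    if not go_ids:
        return "unannotated"  # no one has looked yet
    if all(go_id in UNKNOWN_TERMS for go_id in go_ids):
        return "unknown"  # a curator looked and found no supporting literature
    return "annotated"


if __name__ == "__main__":
    print(annotation_status([]))
    print(annotation_status(["GO:0005554"]))
    print(annotation_status(["GO:0005554", "GO:0006094"]))
```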
63Outline
- Introduction and Motivation
- Building an Ontology
- GO Annotations
- Editing The Gene Ontology
- Applications of Gene Ontology (GO)
64How is GO maintained?
- Several full-time editors
- Requests from community
- database curators, researchers, software developers
- SourceForge tracker
- GO Consortium meetings for large changes
- Mailing lists
66Ensuring Stability in a Dynamic Ontology
- Terms become obsolete when they are removed or redefined
- GO IDs are never deleted
- For each term, a comment is added to explain why the term is now obsolete
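In data terms, this policy means a term is flagged rather than removed, so old annotations and links that use the ID still resolve. The sketch below shows the idea on a plain dictionary. `make_obsolete` is a hypothetical helper, and the ID "EX:0000001" and its name are made up for the example.

```python
def make_obsolete(terms, term_id, reason):
    """Flag a term as obsolete in place, keeping its ID and recording why."""
    if not reason:
        raise ValueError("an obsolete term needs a comment explaining why")
    term = terms[term_id]
    term["is_obsolete"] = True
    term["comment"] = reason
    return term


if __name__ == "__main__":
    # Hypothetical store with a made-up ID and name.
    terms = {"EX:0000001": {"name": "example term", "is_obsolete": False}}
    make_obsolete(terms, "EX:0000001", "Redefined; see replacement terms.")
    print("EX:0000001" in terms)
    print(terms["EX:0000001"])
```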
67Why modify the GO?
- GO reflects current knowledge of biology
- Adding new organisms can make existing term arrangements incorrect
- Not everything was perfect from the outset
68Example - parasites
69Example - parasites
- Annotation of P. falciparum
- protozoan cellular parasite
- intracellular infection (erythrocytes)
- Parasite proteins located in host nucleus
- What cellular component term to annotate to?
- nucleus refers to parasite nucleus when
annotating parasite
70Example - parasites
71Example - parasites
72Requesting changes to GO - curator requests tracker
- Common changes suggested
- new term requests
- reporting errors (typos, etc)
- obsoletion/merge requests
- add synonym
- queries
- term move (change parents)
73Outline
- Introduction and Motivation
- Building an Ontology
- GO Annotations
- Editing The Gene Ontology
- Applications of Gene Ontology (GO)
74What can scientists do with GO?
- Access gene product functional information
- Find how much of a proteome is involved in a process/function/component in the cell
- Map GO terms and incorporate manual annotations into own databases
- Provide a link between biological knowledge and
- gene expression profiles
- proteomics data
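As a rough illustration of the "how much of a proteome" question, the sketch below counts the share of gene products annotated to a term or to any of its descendants. Everything in it is toy data: the gene product names p1 to p4 and their annotations are invented, the parent links are a simplified assumption, and `with_ancestors` and `fraction_involved` are hypothetical helpers.

```python
# Simplified, assumed parent links between a few terms.
PARENTS = {
    "neuron differentiation": ["cell differentiation", "neurogenesis"],
    "cell differentiation": [],
    "neurogenesis": [],
    "gluconeogenesis": [],
}

# Invented annotations for a four-protein toy proteome.
ANNOTATIONS = {
    "p1": {"neuron differentiation"},
    "p2": {"cell differentiation"},
    "p3": {"gluconeogenesis"},
    "p4": set(),  # not annotated yet
}


def with_ancestors(term, parents):
    """Return the term together with all of its ancestors."""
    result = {term}
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result


def fraction_involved(term, annotations, parents):
    """Share of gene products annotated to the term or one of its descendants."""
    hits = 0
    for terms in annotations.values():
        implied = set()
        for annotated in terms:
            implied |= with_ancestors(annotated, parents)
        if term in implied:
            hits += 1
    return hits / len(annotations)


if __name__ == "__main__":
    print(fraction_involved("cell differentiation", ANNOTATIONS, PARENTS))
    print(fraction_involved("neurogenesis", ANNOTATIONS, PARENTS))
```

Counting through ancestors matters here: p1 is annotated only to neuron differentiation, yet it still counts toward cell differentiation.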
75What can scientists do with GO?
- Microarray analysis / gene expression
- See http://www.geneontology.org/GO.tools for current microarray analysis tools
76End of Talk