Title: Judith Blake
1Functional Genome Annotation Using Controlled
Vocabularies
- Judith Blake
- The Jackson Laboratory
- Bar Harbor, ME USA
2Premise of Talk
- Development of bio-ontologies has been largely
driven by functional annotation needs
- Structured vocabularies and ontologies support
semantic integration and promote knowledge
discovery within and between biological
annotation systems
3Mouse Genome Informatics (MGI): community
informatics resource for the laboratory mouse
Objective: Facilitate the use of the mouse as a
model for human biology by furthering our
understanding of the relationship between
genotype and phenotype.
4Model Organism Databases
6TCTCTCCCCCGCCCCCCAGGCTCCCCCGGTCGCTCTCCTCCGGCGGTCGC
CCGCGCTCGGTGGATGTGGC TGGCAGCTGCCGCCCCCTCCCTCGCTCGC
CGCCTGCTCTTCCTCGGCCCTCCGCCTCCTCCCCTCCTCCT TCTCGTCT
TCAGCCGCTCCTCTCGCCGCCGCCTCCACAGCCTGGGCCTCGCCGCGATG
CCGGAGAAGAGG CCCTTCGAGCGGCTGCCTGCCGATGTCTCCCCCATCA
ACTACAGCCTTTGCCTCAAGCCCGACTTGCTGG ACTTCACCTTCGAGGG
CAAGCTGGAGGCCGCCGCCCAGGTGAGGCAGGCGACTAATCAGATTGTGA
TGAA TTGTGCTGATATTGATATTATTACAGCTTCATATGCACCAGAAGG
AGATGAAGAAATACATGCTACAGGA TTTAACTATCAGAATGAAGATGAA
AAAGTCACCTTGTCTTTCCCTAGTACTCTGCAAACAGGTACGGGAA CCT
TAAAGATAGATTTTGTTGGAGAGCTGAATGACAAAATGAAAGGTTTCTAT
AGAAGTAAATATACTAC CCCTTCTGGAGAGGTGCGCTATGCTGCTGTAA
CACAGTTTGAGGCTACTGATGCCCGAAGGGCTTTTCCT TGCTGGGATGA
GCCTGCTATCAAAGCAACTTTTGATATCTCATTGGTTGTTCCTAAAGACA
GAGTAGCTT TATCAAACATGAATGTAATTGACCGGAAACCATACCCTGA
TGATGAAAATTTAGTGGAAGTGAAGTTTGC CCGCACACCTGTTATGTCT
ACATATCTGGTGGCATTTGTTGTGGGTGAATATGACTTTGTAGAAACAAG
G TCAAAAGATGGTGTGTGTGTCCGTGTTTACACTCCTGTTGGCAAAGCA
GAGCAAGGAAAATTTGCGTTAG AGGTTGCTGCTAAAACCTTGCCTTTTT
ATAAGGACTACTTCAATGTTCCTTATCCTCTACCTAAAATTGA TCTCAT
TGCTATTGCAGACTTTGCAGCTGGTGCCATGGAGAACTGGGGCCTTGTTA
CTTATAGGGAGACT GCATTGCTTATTGATCCAAAAAATTCCTGTTCTTC
ATCCCGCCAGTGGGTTGCTCTGGTTGTGGGACATG AACTCGCCCATCAA
TGGTTTGGAAATCTTGTTACTATGGAATGGTGGACTCATCTTTGGTTAAA
TGAAGG TTTTGCATCCTGGATTGAATATCTGTGTGTAGACCACTGCTTC
CCAGAGTATGATATTTGGACTCAGTTT GTTTCTGCTGATTACACCCGTG
CCCAGGAGCTTGACGCCTTAGATAACAGCCATCCTATTGAAGTCAGTG T
GGGCCATCCATCTGAGGTTGATGAGATATTTGATGCTATATCATATAGCA
AAGGTGCATCTGTCATCCG AATGCTGCATGACTACATTGGGGATAAGGA
CTTTAAGAAAGGAATGAACATGTATTTAACCAAGTTCCAA CAAAAGAAT
GCTGCCACAGAGGATCTCTGGGAAAGTTTAGAAAATGCTAGTGGTAAACC
TATAGCAGCTG GTTTCTGCTGATTACACCCGTGCCCAGGAGCTTGACGC
CTTAGATAACAGCCATCCTATTGAAGTCAGTG TGGGCCATCCATCTGAG
GTTGATGAGATATTTGATGCTATATCATATAGCAAAGGTGCATCTGTCAT
CCG AATGCTGCATGACTACATTGGGGATAAGGACTTTAAGAAAGGAATG
AACATGTATTTAACCAAGTTCCAA CAAAAGAATGCTGCCACAGAGGATC
TCTGGGAAAGTTTAGAAAATGCTAGTGGTAAACCTATAGCAGCTG
From the birth of the field of genetics until a
decade ago, it was generally assumed that the
parental origin of a gene could have no effect on
its function. In the vast majority of studies
carried out during the last 90 years, this
paradigm has appeared to hold true. However, with
increasingly sophisticated genetic and
embryological investigations in the mouse,
important exceptions to this rule have been
uncovered over the last decade. First, the
results of nuclear transplantation experiments
carried out with single-cell fertilized embryos
have demonstrated an absolute requirement for
both a maternally-derived and a
paternally-derived pronucleus to allow full-term
development (McGrath and Solter, 1983). Second,
in animals that receive both homologs of certain
chromosomes or subchromosomal regions from one
parent and not the other (through the mating of
translocation heterozygotes as described in
Section 5.2.3), dramatic effects on development
can be observed including enhanced or retarded
growth and outright lethality (Cattanach and
Kirk, 1985). Third, either of two deletions that
cover a small region of mouse chromosome 17 can
be transmitted normally from a father to his
offspring, but these same deletions cause
prenatal lethality when they are maternally
transmitted (Johnson, 1974; Winking and Silver,
1984). Fourth, similar parent-of-origin effects
have been observed on the phenotypes expressed by
animals that carry a targeted knock-out allele at
the Igf2 locus (DeChiara et al., 1991). Finally,
molecular techniques have been used to directly
demonstrate the expression of transcripts from
one parental allele and not the other at the
Igf2r locus (Barlow et al., 1991) and the H19
locus (Bartolomei et al., 1991). The accumulated
data indicate that a subset of mouse genes (on
the order of 0.2%) will function differently in
normal embryos depending on whether they have
been inherited through the male or the female
gamete, such that one allele will be expressed
and the other will be silent. Genomic imprinting
is the term that has been coined to describe this
situation in which the phenotype expressed by a
gene varies depending on its parental origin
(Sapienza, 1989). Further experiments have
demonstrated that, in general, the "imprint" is
erased and regenerated during gametogenesis so
that the function of an imprintable gene is fully
determined by the sex of its progenitor alone,
and not by earlier ancestors.
7Annotation Pipeline
Literature Loads
- Data Acquisition
- Object Identity
- Standardizations
- Data Associations
- Integration with other bioinformatics resources
New Gene, Strain or Sequence?
Controlled Vocabularies
Evidence Citation
Co-curation of shared objects and concepts
8Multiple Keyword Sets in MGI aid integration and
queries
- Strain Nomenclature
- Gene Nomenclature
- Allele Nomenclature
- Gene/Marker Type
- Allele Type
- Assay Type
- Expression
- Mapping
- Evidence Codes
- Tissue
- Cell Lines
- Units
- Cytogenetic
- Molecular
- ES Cell Line
- Molecular Mutation
- Inheritance Mode
9But, keyword lists are not enough
Anatomy keywords
- Sheer number of terms too much to remember and sort
- Need standardized, stable, carefully defined terms
- Need to describe different levels of detail
- So defined terms need to be related in a hierarchy
- With structured vocabularies/hierarchies
- Parent/child relationships exist between terms
- Increased depth -> Increased resolution
- Can annotate data at appropriate level
- May query at appropriate level
Organ system -> Cardiovascular system -> Heart
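The point about annotating and querying at different levels of a hierarchy can be sketched in a few lines of Python. This is a minimal illustration, not MGI's implementation: the three anatomy terms come from the slide, while the gene names (`geneA`, `geneB`), the `PARENT` and `ANNOTATIONS` tables, and both function names are assumptions made up for the example.

```python
# Toy anatomy hierarchy: each term maps to its parent (None for the root).
# Term names follow the slide's example; everything else is hypothetical.
PARENT = {
    "organ system": None,
    "cardiovascular system": "organ system",
    "heart": "cardiovascular system",
}

# Annotations are recorded at whatever level of detail the data support.
# geneA and geneB are made-up gene names.
ANNOTATIONS = {
    "geneA": "heart",
    "geneB": "cardiovascular system",
}


def ancestors_and_self(term):
    """Return the term followed by all of its ancestors, child first."""
    chain = []
    while term is not None:
        chain.append(term)
        term = PARENT[term]
    return chain


def genes_annotated_under(query_term):
    """Return genes annotated to query_term or to any of its descendants."""
    return sorted(
        gene
        for gene, term in ANNOTATIONS.items()
        if query_term in ancestors_and_self(term)
    )


# A query at the broader level also picks up the more specific annotation.
print(genes_annotated_under("cardiovascular system"))  # ['geneA', 'geneB']
print(genes_annotated_under("heart"))                  # ['geneA']
```

A flat keyword list cannot answer the first query without someone remembering that "heart" belongs to the cardiovascular system; the parent/child links carry that knowledge.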
10Example of biological detail
Cell type: neuron
Process: cell migration
Anatomical part: spinal cord
Organism age: 15.5 dpc mouse
12Vocabularies in MGI: GO Example
13Ednrb
15Orthology Record Ednrb
16Retrieve all genes that are associated with a
locus on chromosome 14 and that contribute to the
process of neural crest cell migration and whose
gene products contain endothelin B receptor
domains
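A query like this one combines three kinds of controlled annotation: map location, a GO biological process term, and a protein domain. The sketch below shows one way such a conjunction could be expressed, assuming each gene record carries its annotations as sets of vocabulary terms. The `GeneRecord` class, the `find_genes` function, and the records `geneX` and `geneY` are hypothetical; the Ednrb record simply mirrors the wording of the query and is not copied from MGI.

```python
from dataclasses import dataclass, field


@dataclass
class GeneRecord:
    """Hypothetical gene record holding controlled-vocabulary annotations."""
    symbol: str
    chromosome: str
    go_processes: set = field(default_factory=set)
    protein_domains: set = field(default_factory=set)


# Illustrative records only; geneX and geneY are made up.
RECORDS = [
    GeneRecord("Ednrb", "14",
               {"neural crest cell migration"},
               {"endothelin B receptor"}),
    GeneRecord("geneX", "14",
               {"neural crest cell migration"},
               set()),
    GeneRecord("geneY", "2",
               {"neural crest cell migration"},
               {"endothelin B receptor"}),
]


def find_genes(records, chromosome, process, domain):
    """Return symbols of genes that satisfy all three criteria at once."""
    return [
        r.symbol
        for r in records
        if r.chromosome == chromosome
        and process in r.go_processes
        and domain in r.protein_domains
    ]


print(find_genes(RECORDS, "14",
                 "neural crest cell migration",
                 "endothelin B receptor"))  # ['Ednrb']
```

The query only works because all three fields use shared, controlled terms; free-text descriptions of process or domain would not match reliably across records.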
17GO Curation Strategies in MODs
- Manual Curation
- Emphasis on Primary Literature
- Over 90,000 references
- Three curators
- Computational
- Collaborations between InterPro and SwissProt to integrate objects and assign GO terms
- E.C. mappings
- RIKEN pipeline
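The computational strategies above share one idea: an existing classification (an E.C. number, a protein domain) is translated into GO terms through a mapping table. A minimal sketch of that idea follows, assuming a simple dictionary-based mapping. The gene names, the `annotate_from_ec` function, and the tuple layout are assumptions for illustration; the single mapping entry (E.C. 1.1.1.1 to GO:0004022, alcohol dehydrogenase activity) is shown as an example and should be checked against the current mapping file before use. IEA is the GO evidence code for electronically inferred annotations.

```python
# Mapping table from E.C. numbers to GO molecular function IDs.
# One illustrative entry; verify against the current mapping file.
EC_TO_GO = {
    "EC:1.1.1.1": ["GO:0004022"],
}

# Hypothetical genes with E.C. numbers assigned by some upstream source.
GENE_TO_EC = {
    "geneA": ["EC:1.1.1.1"],
    "geneB": [],
}


def annotate_from_ec(gene_to_ec, ec_to_go):
    """Derive (gene, GO ID, evidence code, source) tuples from E.C. numbers.

    Every derived annotation is tagged IEA (Inferred from Electronic
    Annotation) and keeps the E.C. number it came from as its source.
    """
    annotations = []
    for gene, ec_numbers in sorted(gene_to_ec.items()):
        for ec in ec_numbers:
            for go_id in ec_to_go.get(ec, []):
                annotations.append((gene, go_id, "IEA", ec))
    return annotations


for row in annotate_from_ec(GENE_TO_EC, EC_TO_GO):
    print(row)
```

Keeping the evidence code and source with each annotation lets users separate these computed assignments from manually curated, literature-based ones.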
18GO Annotations tell us
19GO Annotations tell us
20GO Annotations tell us
21GO Annotations tell us
22Similarity assertions used to Annotate to GO
Riken example??
23GO Repository
SGD
FlyBase
MGI
TAIR
WormBase
RGD
Gramene
ZFIN
TIGR microbes
GO at EBI
Sanger Pathogens
24Structured Vocabularies in MGI
- Nomenclatures
- Anatomies
- GO
- Molecular function,
- Biological process,
- Cellular component
- Phenotypes (MP)
- Disease Models
- SO (Sequence Ontology)
25Anatomical Browser
26Mammalian Phenotype Ontology
28Controlled Vocabularies and Ontologies
- support uniform data encoding (level 1)
- provide semantic integrations
- enhance searchability and retrieval (level 2)
- enable data comparison and analysis (level 3)
- support multiple views (DAG) (level 4)
- support formal definitions and reasoning (level 5)
Chris Wroe's escalator
29Summary
- Structured vocabularies and ontologies support semantic integration for the MGI system and promote broader integration of mouse knowledge
- To meet community needs, practical implementations parallel formal ontological development
- The Gene Ontology project precipitated a generalized implementation for structured vocabularies and ontologies in MGI
- The Mouse Genome Informatics group continues its strong interest and participation in community bio-ontology efforts
30Acknowledgments - GO
EBI/GO, FlyBase: Michael Ashburner, Midori Harris, Jane Lomax, Amelia Ireland, Rebecca Foulger, Jen Clark
Reactome, Gramene: Lincoln Stein, Ewan Birney
Jackson Lab MGI: David Hill, Harold Drabkin, Martin Ringwald, Joel Richardson, Alex Diehl, Li Ni
Sanger Pathogen/S.pombe Genomes: Matt Berriman, Arnaud Kerhornou, Val Wood
TIGR Microbial Genomes, Arabidopsis: Michelle Gwinn-Giglio, Linda Hannick
Berkeley-BDGP: Suzanna Lewis, John Day-Richter, Chris Mungall, Karen Eilbeck
EBI-SWISS-PROT: Rolf Apweiler, Evelyn Camon, Daniel Barrell, Emily Dimmer
Zebrafish: Doug Howe
CalTech-WormBase: Paul Sternberg, Kimberley DeVore, Ranjana Kishore
Stanford-SGD: Mike Cherry, Karen Christie, Rama Balakrishnan
UChicago-DictyBase: Rex Chisholm, Pascal Gaudet, Petra Fey
Carnegie Inst.-TAIR: Sue Rhee, Tanya Berardini, Suparna Mundodi
UWisc-RGD: Simon Twigger, Victoria Petri
31Acknowledgments - MGI
Richard Baldarelli David Shaw Dirck Bradt
Paul Szauter Bob Sinclair John
Boddy Deborah Reed Diane Dahman Sophia
Zhu Lucette Glass Donnie Qi Janice Ormsby Ken
Frazer Susan McCutchan Bob Sinclair Debbie
Krupke Dale Begley Dieter Naf Ingeborg
McCright Matthew Vincent Terry
Hayamizu Theresa Alio Connie Smith Jackie Finger
Sharon Cousins Matt Baya Jeff Campbell
Josh Winslow Jill Lewis Iry Witham David
Miers Mike McCrossin Danial Modrusan Lesley
Trombley Michael Walker Jon Beal
Janan Eppig Joel Richardson Martin
Ringwald Carol Bult Jim Kadin David
Hill Harold Drabkin Lori Corbani Li Ni
Alex Diehl Mary Dolan Lois Maltais
Lindsay McClellan Pat Grant Nancy Butler
Cynthia Smith Donna Burkart Ira Lu
Howard Dene Megan Cassell Moyha
Lennon-Pierce Linda Washburn
32Gene Ontology Consortium www.geneontology.org
Mouse Genome Informatics www.informatics.jax.org
- MGI is supported by NHGRI, NICHD, and NCI.
- The GO Consortium is supported by the NHGRI and by the European Union RTD Programme.
33October 2004_MGI