Title: SO meets RNAO
1SO meets RNAO
- Karen Eilbeck
- University of Utah
- RNAO Consortium Meeting
- May 28-29 2007
2- What SO is.
- How SO is used
- How SO is managed
- Where do SO and RNAO meet
- How SO and RNAO can work together
- If we have time - a demo of OBO-Edit
3The Sequence Ontology describes the features of
biological sequence
- Genome sequence
- Annotation of regions
- Coordinates
- Need to agree on meaning of terms. E.g. Does the
CDS include the stop codon?
4An annotation captures what we know about a gene
[Figure: 3 alternate transcripts of the Glut1 gene. Labels: evidence, annotations, start codon, 5' UTR, coding exon, transposon within intron.]
5Structure of the ontology
- SO is structured into a directed acyclic graph.
[Figure: part of the SO graph. Terms shown: exon, transcript, intron, processed transcript, polyA site, primary transcript, clip, splice site, protein coding primary transcript, nc primary transcript, CDS, mRNA, ncRNA, UTR, tRNA, rRNA, five_prime_UTR, three_prime_UTR.]
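A minimal sketch of what "directed acyclic graph" means in practice, assuming a handful of illustrative edges. The term names are taken from the slide, but the specific is_a and part_of edges below are assumptions made for the example and are not copied from the SO release; the helper names `build_parents` and `ancestors` are hypothetical.

```python
from collections import defaultdict

# Illustrative edges only: (child, relationship, parent).
# Term names come from the slide; the edges themselves are assumptions.
EDGES = [
    ("processed_transcript", "is_a", "transcript"),
    ("primary_transcript", "is_a", "transcript"),
    ("mRNA", "is_a", "processed_transcript"),
    ("ncRNA", "is_a", "processed_transcript"),
    ("tRNA", "is_a", "ncRNA"),
    ("rRNA", "is_a", "ncRNA"),
    ("exon", "part_of", "transcript"),
    ("intron", "part_of", "primary_transcript"),
    ("UTR", "part_of", "mRNA"),
    ("five_prime_UTR", "is_a", "UTR"),
    ("three_prime_UTR", "is_a", "UTR"),
]


def build_parents(edges):
    """Index the edge list as child -> [(relationship, parent), ...]."""
    parents = defaultdict(list)
    for child, relationship, parent in edges:
        parents[child].append((relationship, parent))
    return parents


def ancestors(term, parents):
    """Return every term reachable from `term` by following parent edges."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for _relationship, parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


PARENTS = build_parents(EDGES)
print(sorted(ancestors("five_prime_UTR", PARENTS)))
```

Because a term may have more than one parent, the structure is a graph rather than a tree; the traversal keeps a `seen` set so shared ancestors are visited once.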
6GFF3
- SO is used to type the features and
relationships.
seqid   source  type             start  end   score  strand  phase  attributes
ctg123  .       gene             1000   9000  .      +       .      ID=gene00001;Name=EDEN
ctg123  .       TF_binding_site  1000   1012  .      +       .      ID=tfbs00001;Parent=gene00001
ctg123  .       mRNA             1050   9000  .      +       .      ID=mRNA00001;Parent=gene00001;Name=EDEN.1
ctg123  .       mRNA             1050   9000  .      +       .      ID=mRNA00002;Parent=gene00001;Name=EDEN.2
ctg123  .       mRNA             1300   9000  .      +       .      ID=mRNA00003;Parent=gene00001;Name=EDEN.3
ctg123  .       exon             1300   1500  .      +       .      ID=exon00001;Parent=mRNA00003
ctg123  .       exon             1050   1500  .      +       .      ID=exon00002;Parent=mRNA00001,mRNA00002
ctg123  .       exon             3000   3902  .      +       .      ID=exon00003;Parent=mRNA00001,mRNA00003
ctg123  .       exon             5000   5500  .      +       .      ID=exon00004;Parent=mRNA00001,mRNA00002,mRNA00003
ctg123  .       exon             7000   9000  .      +       .      ID=exon00005;Parent=mRNA00001,mRNA00002,mRNA00003
Callouts: terms (the type column); relationships (the Parent attribute).
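The rows on this slide can be read with a few lines of code. The following is a minimal sketch, assuming tab-separated columns in the order seqid, source, type, start, end, score, strand, phase, attributes, and assuming the strand is `+` (the strand character did not survive on the slide). The function names `parse_gff3_line` and `children_by_parent` are hypothetical helpers, not part of any GFF3 library.

```python
from collections import defaultdict

# Three rows adapted from the slide's example; the "+" strand is an assumption.
GFF3_TEXT = "\n".join([
    "ctg123\t.\tgene\t1000\t9000\t.\t+\t.\tID=gene00001;Name=EDEN",
    "ctg123\t.\tmRNA\t1050\t9000\t.\t+\t.\tID=mRNA00001;Parent=gene00001;Name=EDEN.1",
    "ctg123\t.\texon\t1050\t1500\t.\t+\t.\tID=exon00002;Parent=mRNA00001,mRNA00002",
])


def parse_gff3_line(line):
    """Split one feature line into its nine columns and decode the attributes."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 columns, got {len(cols)}")
    seqid, source, ftype, start, end, score, strand, phase, attrs = cols
    attributes = {}
    for pair in attrs.split(";"):
        if pair:
            key, _, value = pair.partition("=")
            attributes[key] = value.split(",")  # multi-valued, e.g. Parent
    return {
        "seqid": seqid,
        "type": ftype,  # the SO term that types the feature
        "start": int(start),
        "end": int(end),
        "strand": strand,
        "attributes": attributes,
    }


def children_by_parent(features):
    """Group feature IDs under each ID named in their Parent attribute."""
    children = defaultdict(list)
    for feature in features:
        feature_id = feature["attributes"].get("ID", [None])[0]
        for parent in feature["attributes"].get("Parent", []):
            children[parent].append(feature_id)
    return dict(children)


FEATURES = [parse_gff3_line(line) for line in GFF3_TEXT.splitlines()]
print(children_by_parent(FEATURES))
```

The `type` value is where the SO term appears, and the `Parent` attribute carries the relationship between features; an exon shared by two transcripts simply lists both parents.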
7Why we made SO
- Standardize vocabulary used in genomics.
- Clarify the relationships between the terms.
- Make genomics data more computable by adding
semantics to the sequence. It's not just about
sequence similarity.
8What is the scope of SO?
- Features that can be located on a sequence with
coordinates, e.g. exon, promoter, binding_site
- Properties of these features
- Sequence attributes
- Maternally_imprinted
- Consequences of mutation
- mutation_affecting_editing
- Chromosome variation
- aneuploid
9The SO community
- Model Organism DB
- SGD
- (MGI)
- FlyBase
- WormBase
- DictyBase
- Pombe
- GMOD
- Comparative genomics
- MGED Ontology
- NLP
10Genome annotation unification
- The model organism databases use SO to type their
features.
- The GFF3 file format for annotation, the Chado db
schema and the DAS2 annotation protocol rely on SO to
type features.
11Genomic analysis
- The Comparative Genomics Library, written in Perl,
uses SO-based annotations to perform complex
analysis over multiple genomes.
- Yandell M, Mungall CJ, Smith C, Prochnik S,
Kaminker J, Hartzell G, Lewis S, Rubin GM. 2006.
Large-Scale Trends in the Evolution of Gene
Structures within 11 Animal Genomes. PLoS Comput
Biol 2:e15.
12Genome data integration
- Multiple genomes are organized using SO
- FlyMine
- Gramene
- the BRCs
13NLP/text mining
- Recently SO has been used for some new projects:
- Semantic enrichment by the Royal Society of
Chemistry.
- Anaphora resolution by the NLIP group in
Cambridge.
14How SO is managed
- SO uses CVS to manage and version the ontology.
- There is a mailing list for developers to get
things off their chest.
- There is a tracker for term suggestions.
- There are workshops when we get a critical mass
for a given problem. We want to do more
workshops.
- SO is expressed in OBO format.
15Example of OBO format
- http://www.geneontology.org/GO.format.obo-1_2.shtml
[Term]
id: SO:0000587
name: group_I_intron
def: "Group I catalytic introns are large self-splicing ribozymes. They catalyse their own excision from mRNA, tRNA and rRNA precursors in a wide range of organisms. The core secondary structure consists of 9 paired regions (P1-P9). These fold to essentially two domains, the P4-P6 domain (formed from the stacking of P5, P4, P6 and P6a helices) and the P3-P9 domain (formed from the P8, P3, P7 and P9 helices). Group I catalytic introns often have long ORFs inserted in loop regions." [http://www.sanger.ac.uk/cgi-bin/Rfam/getacc?RF00028]
subset: SOFA
is_a: SO:0000188 ! intron
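To show how little machinery the tag-value layout needs, here is a minimal sketch of reading one stanza. It assumes the `tag: value` form with colons restored in the IDs, leaves out the `def` line, and treats everything after ` ! ` as a trailing comment, which is a simplification that would break on quoted text containing `!`. The function `parse_stanza` is a hypothetical helper, not an existing OBO library call.

```python
# A shortened copy of the group_I_intron stanza; the def line is omitted.
STANZA = """[Term]
id: SO:0000587
name: group_I_intron
subset: SOFA
is_a: SO:0000188 ! intron
"""


def parse_stanza(text):
    """Return (stanza_type, {tag: [values]}) for a single OBO stanza."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    stanza_type = lines[0].strip("[]")
    tags = {}
    for line in lines[1:]:
        # Split on the first colon only, so IDs such as SO:0000587 stay whole.
        tag, _, value = line.partition(":")
        value = value.split(" ! ")[0].strip()  # drop the trailing comment
        tags.setdefault(tag.strip(), []).append(value)
    return stanza_type, tags


stanza_type, tags = parse_stanza(STANZA)
print(stanza_type, tags["id"], tags["is_a"])
```

Values are kept in lists because tags such as `is_a` and `subset` may occur more than once in a stanza.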
16OBO and OWL
- http://purl.org/obo/owl/SO
- Mapping OBO and OWL: http://www.bioontology.org/wiki/index.php/OboInOwl:Main_Page
17Navigate SO using OBO-Edit
[Screenshot: OBO-Edit panels. Labels: search the ontology, structure of the ontology, all parents of the term, details for selected term.]
18Annotating with SO and RNAO
- AGAGGGCGAATCCAGCTCTGGAGCAGAGGCTCTGGCAGCTTTTGCAGCGT
TTATATAACATGAAATATATATACGCATTCCGATCAAAGCTGGGTTAACC
AGATAGATAGATAGTAACGTTTAAATAGCGCCTGGCGCGTTCGATTTTAA
AGAGATTTAGAGCGTTATCCCGTGCCTATAGATCTTATAGTATAGACAAC
GAACGATCACTCAAATCCAAGTCAATAATTCAAGAATTTATGTCTGTTTC
TGTGAAAGGGAAACTAATTTTGTTAAAGAAGACTTACAATATCGTAATAC
TTGTTCAATCGTCGTGGCCGATAGAAATATCTTACAATCCGAAAGTTGAT
GAATGGAATTGGTCTGCAACTGGTCGCCTTCATTTCGTAAAATGTTCGCT
TGCGGCCGAAAAATTTCGATATATCTACAATTGATCTACAATCTTTACTA
AATTTTGAAAAAGGAACACTTTGAATTTCGAACTGTCAATCGTATCATTA
GAATTTAATCTAAATTTAAATCTTGCTAAAGGAAATAGCAAGGAACACTT
TCGTCGTCGGCTACGCATTCATTGTAAAATTTTAAATTTTGACATTCCGC
ACTTTTTGATAGATAAGCGAAGAGTATTTTTATTACATGTATCGCAAGTA
TTCATTTCAACACACATATCTATATATATATATATATATATATATATATA
TATATATATATATGTTATATATTTATTCAATTTTGTTTACCATTGATCAA
TTTTTCACACATGAAACAACCGCCAGCATTATATAATTTTTTTATTTTTT
TAAAAAATGTGTACACATATTCTGAAAATGAAAAATTCAATGGCTCGAGT
GCCAAATAAAGAAATGGTTACAATTTAAGG
Translational control element
Duchow HK, Brechbiel JL, Chatterjee S, Gavis ER.
The nanos translational control element represses
translation in somatic cells by a Bearded
box-like motif. Developmental Biology,
Volume 282, Issue 1, 1 June 2005, Pages 207-217.
19Overlap with RNAO
- SO provides regions of sequence: start and stop
coordinates with regard to the whole sequence,
i.e. assembly / chromosome
- Transcripts and parts of transcripts
- Some secondary structure
- Some motifs
- Results of algorithms such as BLAST
20SO names features
21Secondary structure
- This part of SO needs work.
- Any volunteers?
22Divergent from RNAO
- Where do SO and RNAO differ dramatically?
- Multiple sequence alignments. SO does not provide
a solution to this. It does however provide the
terms to describe the results of sequence
similarity searches.
- Numerical results. SO has not needed to use
values so far.
23RNAO working groups
- Motif identification/annotation
- RNA interaction
- Biochemical-structure mapping
- Multiple sequence alignment
- Backbone conformation
- Base stacking
24Working together
- Remain 2 separate ontologies.
- Give SO annotators the option of importing RNAO
terms using the OBO programs.
- SO and RNAO work together to align key terms in
their ontologies.
25SO is still evolving
- RNAO could use the SO features to describe
regions of sequence.
- SO could reference RNAO for detailed annotation
of structure and biochemical features.
26Multiple ontologies in OBO
- 2 options.
- The ontologies reference each other
- Will always need to load both ontologies
- There is a mapping file that you can load to
import external terms.
- Maintain separate ontologies and keep the mapping up
to date.
http://obofoundry.org/wiki/index.php/Mappings
27Example Importing terms from SCOR.
- 1. Make an OBO file from a subset of SCOR terms
- 2. Work out where there is overlap
- 3. Make an OBO mapping file between the two
ontologies
- 4. Load all 3 files at once.
28scor.obo
format-version: 1.2
date: 16:05:2007 15:26
saved-by: kareneilbeck
auto-generated-by: OBO-Edit 1.100

[Term]
id: SC:0000000
name: hairpin_loop

[Term]
id: SC:0000001
name: diloop
is_a: SC:0000000 ! hairpin_loop

[Term]
id: SC:0000002
name: triloop
is_a: SC:0000000 ! hairpin_loop

mapping file
format-version: 1.2
date: 24:05:2007 10:37
saved-by: kareneilbeck
import: so-xp.obo
import: scor2.obo

id: SC:0000015 hairpin loop
is_a: SO:0000715 is_a RNA motif

id: SC:0000016 internal loop
is_a: SO:0000715 is_a RNA motif

id: SC:0000035 tertiary interaction
is_a: SO:0000122 is_a RNA sequence secondary structure
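The reason all three files are loaded together can be sketched in a few lines: the mapping only names IDs, so each ID must resolve in one of the loaded ontologies. This is a minimal sketch under stated assumptions. The IDs and names are taken from the mapping slide with colons restored (an assumption), the data is held in plain dictionaries rather than read from `.obo` files, and `merge_ontologies` is a hypothetical helper, not an OBO-Edit function.

```python
# Term IDs and names from the mapping slide; colons in the IDs are assumed.
SCOR_TERMS = {
    "SC:0000015": "hairpin loop",
    "SC:0000016": "internal loop",
    "SC:0000035": "tertiary interaction",
}
SO_TERMS = {
    "SO:0000715": "RNA motif",
    "SO:0000122": "RNA sequence secondary structure",
}
# (child, parent) pairs: each SCOR term is_a some SO term.
MAPPING = [
    ("SC:0000015", "SO:0000715"),
    ("SC:0000016", "SO:0000715"),
    ("SC:0000035", "SO:0000122"),
]


def merge_ontologies(ontologies, mapping):
    """Combine term tables and check that every mapped ID is defined."""
    names = {}
    for terms in ontologies:
        names.update(terms)
    is_a = {}
    for child, parent in mapping:
        for term_id in (child, parent):
            if term_id not in names:
                raise KeyError(f"{term_id} is not defined in any loaded ontology")
        is_a[child] = parent
    return names, is_a


names, is_a = merge_ontologies([SCOR_TERMS, SO_TERMS], MAPPING)
for child, parent in is_a.items():
    print(f"{names[child]} is_a {names[parent]}")
```

Calling `merge_ontologies([SCOR_TERMS], MAPPING)` without the SO table raises `KeyError`, which mirrors why the mapping file on its own is not enough.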
29OBO-Edit DEMO
30Possible action items
- A SO-RNAO mailing list for discussion of
collaboration
- Phone/Skype/webinars at intervals to keep track
of progress.
31Resources
- GFF3: http://www.sequenceontology.org/gff3.shtml
- Apollo: http://www.fruitfly.org/annot/apollo/
- SO: http://www.sequenceontology.org
- OBO-Edit: http://sourceforge.net/projects/geneontology
- OBO foundry: http://www.obofoundry.org
- GO-perl: http://www.godatabase.org/dev/go-perl/doc/go-perl-doc.html
32Acknowledgements
- SO is funded as part of the Gene Ontology
Consortium, via the NIH P41-HG002274
- People
- Suzi Lewis and Michael Ashburner - the vision
- Chris Mungall - programming infrastructure
- John Richter - made OBO-Edit
33- keilbeck_at_genetics.utah.edu