SO meets RNAO - PowerPoint PPT Presentation

1
SO meets RNAO
  • Karen Eilbeck
  • University of Utah
  • RNAO Consortium Meeting
  • May 28-29 2007

2
  • What SO is.
  • How SO is used
  • How SO is managed
  • Where do SO and RNAO meet
  • How SO and RNAO can work together
  • If we have time - a demo of OBO-Edit

3
The Sequence Ontology describes the features of
biological sequence
  • Genome sequence
  • Annotation of regions
  • Coordinates
  • Need to agree on meaning of terms. E.g. Does the
    CDS include the stop codon?
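
The stop-codon question changes coordinates, not just wording. A minimal sketch, assuming a hypothetical plus-strand CDS with 1-based inclusive coordinates (all numbers are made up for illustration):

```python
# Hypothetical plus-strand CDS; coordinates are 1-based and inclusive.
CDS_START = 101
STOP_CODON = (398, 400)  # assumed position of the stop codon


def cds_interval(include_stop):
    """Return (start, end) of the CDS under one of the two conventions."""
    end = STOP_CODON[1] if include_stop else STOP_CODON[0] - 1
    return CDS_START, end


def length(interval):
    """Length of a 1-based inclusive interval."""
    start, end = interval
    return end - start + 1


with_stop = cds_interval(include_stop=True)
without_stop = cds_interval(include_stop=False)
print(with_stop, length(with_stop))
print(without_stop, length(without_stop))
```

Two groups using different conventions would report end coordinates 3 bases apart for the same gene, which is why the term needs one agreed definition.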

4
An annotation captures what we know about a gene
[Figure: 3 alternate transcripts of the Glut1 gene. Labels: evidence, annotations, start codon, 5' UTR, coding exon, transposon within intron]

5
Structure of the ontology
  • SO is structured into a directed acyclic graph.

[Figure: fragment of the SO graph. Terms shown: exon, transcript, intron, processed transcript, polyA site, primary transcript, clip, splice site, protein coding primary transcript, nc primary transcript, CDS, mRNA, ncRNA, UTR, tRNA, rRNA, five_prime_UTR, three_prime_UTR]

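
A minimal sketch of what "directed acyclic graph" means in practice: a term can have more than one parent, and following parents upward never loops back. The edges below are illustrative assumptions built from the term names on the slide; they are not copied from an SO release.

```python
# Illustrative edges only: (child, relationship, parent).
# These are assumptions for the sketch, not the actual SO release.
EDGES = [
    ("five_prime_UTR", "is_a", "UTR"),
    ("three_prime_UTR", "is_a", "UTR"),
    ("UTR", "part_of", "mRNA"),
    ("CDS", "part_of", "mRNA"),
    ("mRNA", "is_a", "processed_transcript"),
    ("ncRNA", "is_a", "processed_transcript"),
    ("tRNA", "is_a", "ncRNA"),
    ("rRNA", "is_a", "ncRNA"),
    ("processed_transcript", "is_a", "transcript"),
    ("primary_transcript", "is_a", "transcript"),
    ("exon", "part_of", "transcript"),
    ("exon", "part_of", "primary_transcript"),  # a second parent: DAG, not tree
    ("intron", "part_of", "primary_transcript"),
]


def parents_of(term):
    """Direct parents of a term, over every relationship type."""
    return sorted({parent for child, _, parent in EDGES if child == term})


def ancestors(term):
    """All terms reachable by following parent edges upward."""
    seen = set()
    stack = [term]
    while stack:
        for parent in parents_of(stack.pop()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def is_acyclic():
    """True if no term is its own ancestor."""
    children = {child for child, _, _ in EDGES}
    return all(term not in ancestors(term) for term in children)


print(sorted(ancestors("tRNA")))
print(parents_of("exon"))
print(is_acyclic())
```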
6
GFF3
  • SO is used to type the features and
    relationships.

Columns labeled on the slide: Id, type, start, end, strand, attributes

ctg123  .  gene             1000  9000  .  +  .  ID=gene00001;Name=EDEN
ctg123  .  TF_binding_site  1000  1012  .  +  .  ID=tfbs00001;Parent=gene00001
ctg123  .  mRNA             1050  9000  .  +  .  ID=mRNA00001;Parent=gene00001;Name=EDEN.1
ctg123  .  mRNA             1050  9000  .  +  .  ID=mRNA00002;Parent=gene00001;Name=EDEN.2
ctg123  .  mRNA             1300  9000  .  +  .  ID=mRNA00003;Parent=gene00001;Name=EDEN.3
ctg123  .  exon             1300  1500  .  +  .  ID=exon00001;Parent=mRNA00003
ctg123  .  exon             1050  1500  .  +  .  ID=exon00002;Parent=mRNA00001,mRNA00002
ctg123  .  exon             3000  3902  .  +  .  ID=exon00003;Parent=mRNA00001,mRNA00003
ctg123  .  exon             5000  5500  .  +  .  ID=exon00004;Parent=mRNA00001,mRNA00002,mRNA00003
ctg123  .  exon             7000  9000  .  +  .  ID=exon00005;Parent=mRNA00001,mRNA00002,mRNA00003

Callouts on the slide: terms, relationships

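
A minimal sketch of how the two things SO supplies show up when reading GFF3: the type column holds an SO term, and the Parent attribute links a feature to its parent. It assumes standard GFF3 syntax (tab-separated columns, `key=value` pairs separated by `;`, multiple values separated by `,`), uses a few rows from the EDEN example, and skips URL-unescaping. The function names are hypothetical.

```python
from collections import defaultdict

# A few rows from the EDEN example, in tab-separated GFF3.
GFF3 = """\
ctg123\t.\tgene\t1000\t9000\t.\t+\t.\tID=gene00001;Name=EDEN
ctg123\t.\tmRNA\t1050\t9000\t.\t+\t.\tID=mRNA00001;Parent=gene00001;Name=EDEN.1
ctg123\t.\tmRNA\t1300\t9000\t.\t+\t.\tID=mRNA00003;Parent=gene00001;Name=EDEN.3
ctg123\t.\texon\t1300\t1500\t.\t+\t.\tID=exon00001;Parent=mRNA00003
ctg123\t.\texon\t3000\t3902\t.\t+\t.\tID=exon00003;Parent=mRNA00001,mRNA00003
"""


def parse_attributes(column9):
    """Split column 9 into {key: [values]}."""
    attrs = {}
    for pair in column9.split(";"):
        if pair:
            key, value = pair.split("=", 1)
            attrs[key] = value.split(",")
    return attrs


def parse_gff3(text):
    """Parse feature lines into dicts; directives and comments are skipped."""
    features = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        seqid, source, ftype, start, end, score, strand, phase, col9 = line.split("\t")
        features.append({
            "type": ftype,  # the SO term
            "start": int(start),
            "end": int(end),
            "strand": strand,
            "attributes": parse_attributes(col9),
        })
    return features


def children_by_parent(features):
    """Map each parent ID to the IDs of the features that name it in Parent."""
    children = defaultdict(list)
    for feature in features:
        for parent in feature["attributes"].get("Parent", []):
            children[parent].append(feature["attributes"]["ID"][0])
    return dict(children)


features = parse_gff3(GFF3)
print([f["type"] for f in features])
print(children_by_parent(features))
```

One exon listing two mRNAs in Parent is how alternate transcripts share an exon without duplicating the row.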
7
Why we made SO
  • Standardize vocabulary used in genomics.
  • Clarify the relationships between the terms.
  • Make genomics data more computable by adding
    semantics to the sequence. It's not just about
    sequence similarity.

8
What is the scope of SO?
  • Features that can be located on a sequence with
    coordinates, e.g. exon, promoter, binding_site
  • Properties of these features
  • Sequence attributes, e.g. maternally_imprinted
  • Consequences of mutation, e.g.
    mutation_affecting_editing
  • Chromosome variation, e.g. aneuploid

9
The SO community
  • Model Organism DB
  • SGD
  • (MGI)
  • FlyBase
  • WormBase
  • DictyBase
  • Pombe
  • GMOD
  • Comparative genomics
  • MGED Ontology
  • NLP

10
Genome annotation unification
  • The model organism databases use SO to type their
    features.
  • The GFF3 file format for annotation, the Chado db
    schema and DAS2 annotation protocol rely on SO to
    type features.

11
Genomic analysis
  • The Comparative Genomics Library written in Perl
    uses SO based annotations to perform complex
    analysis over multiple genomes.
  • Yandell M, Mungall CJ, Smith C, Prochnik S,
    Kaminker J, Hartzell G, Lewis S, Rubin GM. 2006.
    Large-Scale Trends in the Evolution of Gene
    Structures within 11 Animal Genomes. PLoS Comput
    Biol. 2:e15

12
Genome data integration
  • Multiple genomes are organized using SO
  • Flymine,
  • Gramene,
  • the BRCs

13
NLP/text mining
  • Recently SO has been used for some new projects:
  • Semantic enrichment by the Royal Society of
    Chemistry.
  • Anaphora resolution by the NLIP group in
    Cambridge.

14
How SO is managed
  • SO uses CVS to manage and version the ontology.
  • There is a mailing list for developers to get
    things off their chest.
  • There is a tracker for term suggestions
  • There are workshops when we get a critical mass
    for a given problem. We want to do more
    workshops.
  • SO is expressed in OBO format.

15
Example of OBO format
  • http://www.geneontology.org/GO.format.obo-1_2.shtml

[Term]
id: SO:0000587
name: group_I_intron
def: "Group I catalytic introns are large self-splicing ribozymes. They catalyse their own excision from mRNA, tRNA and rRNA precursors in a wide range of organisms. The core secondary structure consists of 9 paired regions (P1-P9). These fold to essentially two domains, the P4-P6 domain (formed from the stacking of P5, P4, P6 and P6a helices) and the P3-P9 domain (formed from the P8, P3, P7 and P9 helices). Group I catalytic introns often have long ORFs inserted in loop regions." [http://www.sanger.ac.uk/cgi-bin/Rfam/getacc?RF00028]
subset: SOFA
is_a: SO:0000188 ! intron

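
A minimal sketch of reading one OBO stanza as tag-value pairs, assuming the `tag: value` layout of OBO 1.2 with `!` introducing a trailing comment. The `def` line is left out of the sample because this naive comment stripping does not handle quoted strings. The function name is hypothetical.

```python
# One stanza in OBO 1.2 layout, trimmed to the short tags.
STANZA = """\
[Term]
id: SO:0000587
name: group_I_intron
subset: SOFA
is_a: SO:0000188 ! intron
"""


def parse_stanza(text):
    """Return (stanza_type, {tag: [values]}) for a single stanza."""
    stanza_type = None
    tags = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            stanza_type = line[1:-1]
            continue
        # Split on the first colon only, so IDs like SO:0000587 stay whole.
        tag, _, value = line.partition(":")
        # Drop the trailing "! comment" (naive: assumes no quoted strings).
        value = value.split(" ! ", 1)[0].strip()
        tags.setdefault(tag.strip(), []).append(value)
    return stanza_type, tags


stanza_type, tags = parse_stanza(STANZA)
print(stanza_type)
print(tags)
```

Values are kept as lists because tags such as is_a and subset may repeat within a stanza.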
16
OBO and OWL
  • http://purl.org/obo/owl/SO
  • Mapping OBO and OWL:
    http://www.bioontology.org/wiki/index.php/OboInOwl:Main_Page

17
Navigate SO using OBO-Edit
[Screenshot: OBO-Edit. Labeled panels: search the ontology; structure of the ontology; all parents of the term; details for selected term]

18
Annotating with SO and RNAO
  • AGAGGGCGAATCCAGCTCTGGAGCAGAGGCTCTGGCAGCTTTTGCAGCGT
    TTATATAACATGAAATATATATACGCATTCCGATCAAAGCTGGGTTAACC
    AGATAGATAGATAGTAACGTTTAAATAGCGCCTGGCGCGTTCGATTTTAA
    AGAGATTTAGAGCGTTATCCCGTGCCTATAGATCTTATAGTATAGACAAC
    GAACGATCACTCAAATCCAAGTCAATAATTCAAGAATTTATGTCTGTTTC
    TGTGAAAGGGAAACTAATTTTGTTAAAGAAGACTTACAATATCGTAATAC
    TTGTTCAATCGTCGTGGCCGATAGAAATATCTTACAATCCGAAAGTTGAT
    GAATGGAATTGGTCTGCAACTGGTCGCCTTCATTTCGTAAAATGTTCGCT
    TGCGGCCGAAAAATTTCGATATATCTACAATTGATCTACAATCTTTACTA
    AATTTTGAAAAAGGAACACTTTGAATTTCGAACTGTCAATCGTATCATTA
    GAATTTAATCTAAATTTAAATCTTGCTAAAGGAAATAGCAAGGAACACTT
    TCGTCGTCGGCTACGCATTCATTGTAAAATTTTAAATTTTGACATTCCGC
    ACTTTTTGATAGATAAGCGAAGAGTATTTTTATTACATGTATCGCAAGTA
    TTCATTTCAACACACATATCTATATATATATATATATATATATATATATA
    TATATATATATATGTTATATATTTATTCAATTTTGTTTACCATTGATCAA
    TTTTTCACACATGAAACAACCGCCAGCATTATATAATTTTTTTATTTTTT
    TAAAAAATGTGTACACATATTCTGAAAATGAAAAATTCAATGGCTCGAGT
    GCCAAATAAAGAAATGGTTACAATTTAAGG

Translational control element
Duchow HK, Brechbiel JL, Chatterjee S, Gavis ER. The nanos
translational control element represses translation in somatic
cells by a Bearded box-like motif. Developmental Biology,
Volume 282, Issue 1, 1 June 2005, Pages 207-217

19
Overlap with RNAO
  • SO provides regions of sequence: start and stop
    coordinates with respect to the whole sequence,
    i.e. the assembly or chromosome
  • Transcripts and parts of transcripts
  • Some secondary structure
  • Some motifs
  • Results of algorithms such as blast

20
SO names features
21
Secondary structure
  • This part of SO needs work.
  • Any volunteers?

22
Divergent from RNAO
  • Where do SO and RNAO differ dramatically?
  • Multiple sequence alignments. SO does not provide
    a solution to this. It does however provide the
    terms to describe the results of sequence
    similarity searches.
  • Numerical results. SO has not needed to use
    values so far.

23
RNAO working groups
  • Motif identification/annotation
  • RNA interaction
  • Biochemical-structure mapping
  • Multiple sequence alignment
  • Backbone conformation
  • Base stacking

24
Working together
  • Remain 2 separate ontologies.
  • Give SO annotators option of importing RNAO
    terms using the OBO programs
  • SO and RNAO work together to align key terms in
    their ontologies.

25
SO is still evolving
  • RNAO could use the SO features to describe
    regions of sequence
  • SO could reference RNAO for detailed annotation
    of structure and biochemical features.

26
Multiple ontologies in OBO
  • 2 options.
  • 1. The ontologies reference each other. Will
    always need to load both ontologies.
  • 2. There is a mapping file that you can load to
    import external terms. Maintain separate
    ontologies and keep the mapping up to date.

http://obofoundry.org/wiki/index.php/Mappings

27
Example: importing terms from SCOR
  • 1. Make an OBO file from a subset of SCOR terms
  • 2. Work out where there is overlap
  • 3. Make an OBO mapping file between the two
    ontologies
  • 4. Load all 3 files at once.

28
scor.obo

format-version: 1.2
date: 16:05:2007 15:26
saved-by: kareneilbeck
auto-generated-by: OBO-Edit 1.100

[Term]
id: SC:0000000
name: hairpin_loop

[Term]
id: SC:0000001
name: diloop
is_a: SC:0000000 ! hairpin_loop

[Term]
id: SC:0000002
name: triloop
is_a: SC:0000000 ! hairpin_loop

mapping file

format-version: 1.2
date: 24:05:2007 10:37
saved-by: kareneilbeck
import: so-xp.obo
import: scor2.obo

[Term]
id: SC:0000015 ! hairpin loop
is_a: SO:0000715 ! RNA motif

[Term]
id: SC:0000016 ! internal loop
is_a: SO:0000715 ! RNA motif

[Term]
id: SC:0000035 ! tertiary interaction
is_a: SO:0000122 ! RNA sequence secondary structure
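
A minimal sketch of why all three files are loaded at once: the mapping file only adds is_a links, so its SO parents are dangling until the SO file is loaded too. Each dict below stands in for the terms read from one file; the IDs follow the slide, written in OBO's prefix:number form, and the contents are trimmed for illustration. The function names are hypothetical.

```python
# Stand-ins for three loaded files: term ID -> list of is_a parent IDs.
# Contents are trimmed assumptions for illustration, not the real files.
so_terms = {"SO:0000715": [], "SO:0000122": []}
scor_terms = {"SC:0000015": [], "SC:0000016": [], "SC:0000035": []}
mapping = {
    "SC:0000015": ["SO:0000715"],  # hairpin loop is_a RNA motif
    "SC:0000016": ["SO:0000715"],  # internal loop is_a RNA motif
    "SC:0000035": ["SO:0000122"],  # tertiary interaction is_a RNA secondary structure
}


def merge(*ontologies):
    """Union the terms and their parent lists from several loaded files."""
    merged = {}
    for ontology in ontologies:
        for term_id, parents in ontology.items():
            known = merged.setdefault(term_id, [])
            for parent in parents:
                if parent not in known:
                    known.append(parent)
    return merged


def dangling(terms):
    """Parent IDs that are referenced but not defined in the loaded set."""
    return sorted({p for parents in terms.values() for p in parents if p not in terms})


print(dangling(merge(scor_terms, mapping)))            # SO file missing
print(dangling(merge(so_terms, scor_terms, mapping)))  # all three loaded
```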

29
OBO-Edit DEMO
  • Fingers crossed

30
Possible action items
  • A SO-RNAO mailing list for discussion of
    collaboration
  • Phone/skype/webinars at intervals to keep track
    of progress.

31
Resources
  • GFF3 http://www.sequenceontology.org/gff3.shtml
  • Apollo http://www.fruitfly.org/annot/apollo/
  • SO http://www.sequenceontology.org
  • OBO-Edit http://sourceforge.net/projects/geneontology
  • OBO foundry http://www.obofoundry.org
  • GO-perl http://www.godatabase.org/dev/go-perl/doc/go-perl-doc.html

32
Acknowledgements
  • SO is funded as part of the Gene Ontology
    Consortium, via NIH grant P41-HG002274
  • People
  • Suzi Lewis and Michael Ashburner - the vision
  • Chris Mungall - programming infrastructure
  • John Richter - made OBO-Edit

33
  • keilbeck_at_genetics.utah.edu