Initial Proposal for the RNA Alignment Ontology - PowerPoint PPT Presentation

1
Initial Proposal for the RNA Alignment Ontology
  • Rob Knight
  • Dept Chem & Biochem
  • CU Boulder

2
What do we want to do?
  • Represent detailed structural info and other
    metadata on alignment
  • Avoid horizontal and vertical expansion
  • Explicitly annotate correspondences at the level
    where they occur

3
What do alignments look like now?
4
Why is this a problem?
5
so real alignments look like this, to shoehorn
everything into columns that are assumed to be
homologous
6
Homology is problematic
  • Fundamental problem: systems that are homologous
    at one level are not necessarily homologous at
    other levels
  • E.g. bat wings and bird wings are homologous as
    pentadactyl limbs, but not homologous as wings
  • Homology is hierarchical and can partially
    overlap at any level (e.g. Griffiths 2006)

Figure: nested forelimb homologies (bat, bird, frog, rodent, mammal and tetrapod forelimbs), from Ridley, Evolution, 3rd ed.
7
and correspondence need not be homology at all!
  • Example from SELEX: hammerhead ribozymes
    independently evolved at least three times in
    nature, and in Jack Szostak's and Ron Breaker's
    labs
  • However, we still want to be able to align the
    functionally equivalent sequences although there
    is no evolutionary relationship

8
So what are we going to use the alignment ontology
for?
9
Use case 1: aligning rRNA
10
Problem: have millions of fragments, want to
align (incl. noncanonical pairs) and assign
named regions
11
Solution
  • Use existing alignment, try to fit new seqs in
  • Would be improved if we could explicitly annotate
    helices, noncanonical pairs, etc. on the sequence
    overall
  • For display, need to easily show/hide groups of
    sequences and/or regions of the sequence

12
Use case 2: SELEX
  • From large number of unaligned sequences, want to
    identify motifs like this (Majerfeld & Yarus 2005)

13
How is this currently done?
  • Find regions that are similar in more sequences
    than chance
  • Group these sequences centered on the motif
  • See if the parts of the motif can be related by
    helices
  • See if anything else is reliably found by the
    motif
  • Repeat for other families and see if there are
    relationships between them
  • Group these families together, then iterate
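
The first two steps above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: exact k-mer matching stands in for "similar", a plain count threshold (`min_seqs`) stands in for a proper better-than-chance test, and the function names and toy sequences are hypothetical rather than taken from the talk.

```python
from collections import defaultdict


def kmers_by_sequence(seqs, k):
    """Map each k-mer to the set of sequence indices that contain it."""
    hits = defaultdict(set)
    for idx, seq in enumerate(seqs):
        for start in range(len(seq) - k + 1):
            hits[seq[start:start + k]].add(idx)
    return hits


def shared_kmers(seqs, k, min_seqs):
    """Return (k-mer, sequence indices) for k-mers seen in >= min_seqs sequences.

    Assumption: a fixed count cutoff replaces a real significance test.
    """
    hits = kmers_by_sequence(seqs, k)
    shared = [(kmer, sorted(ids)) for kmer, ids in hits.items() if len(ids) >= min_seqs]
    # Most widespread k-mers first, ties broken alphabetically.
    return sorted(shared, key=lambda item: (-len(item[1]), item[0]))


def center_on_motif(seqs, motif, flank):
    """Group sequences that contain the motif, trimmed to `flank` bases each side."""
    windows = {}
    for idx, seq in enumerate(seqs):
        pos = seq.find(motif)  # first occurrence only
        if pos >= 0:
            windows[idx] = seq[max(0, pos - flank):pos + len(motif) + flank]
    return windows


# Toy input (invented for illustration, not real SELEX data).
toy_seqs = ["AAGGCUCGUACC", "UUCUCGUAGGAA", "GGGACUCGUAUU", "ACACACACACAC"]
candidates = shared_kmers(toy_seqs, k=6, min_seqs=3)
grouped = center_on_motif(toy_seqs, candidates[0][0], flank=2)
```

The later steps (relating motif parts by helices, then iterating over families) would need structural information that this sketch does not model.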

14
e.g. here we discovered that an unpaired G is important
15
So how do we handle all this? A proposal
  • Entities
  • sequence_region: a thing that defines a set of
    bases relative to some sequence (i.e. with
    indices for each base)
  • paired_sequence_region: two regions linked by
    pairs
  • helical_sequence_region: two regions completely
    paired
  • base: region that consists of a single nucleotide
  • base_pair: region that consists of two paired
    bases
  • canonical_base_pair: base pair that is cis-WW
  • loop: contiguous sequence_region stretching from
    i to j such that i-1 and j+1 are a base pair
  • etc. (bulge, internal_loop, junction, etc.)
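
To make the entity definitions concrete, here is a minimal Python sketch of sequence_region, base_pair, canonical_base_pair and loop. The class and function names, the 0-based indexing, and the restriction to pairs within a single sequence are assumptions made for illustration; the proposal itself does not prescribe a representation.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class SequenceRegion:
    """A set of base indices relative to one named sequence."""
    sequence_id: str
    indices: FrozenSet[int]

    def is_contiguous(self) -> bool:
        if not self.indices:
            return False
        return max(self.indices) - min(self.indices) + 1 == len(self.indices)


@dataclass(frozen=True)
class BasePair:
    """Two paired bases, labelled with a Leontis-Westhof class string."""
    sequence_id: str
    i: int
    j: int
    lw_class: str = "cis-WW"

    def is_canonical(self) -> bool:
        # canonical_base_pair: a base pair that is cis-WW
        return self.lw_class == "cis-WW"


def region(sequence_id: str, start: int, end: int) -> SequenceRegion:
    """Hypothetical helper: contiguous region covering start..end inclusive."""
    return SequenceRegion(sequence_id, frozenset(range(start, end + 1)))


def is_loop(candidate: SequenceRegion, pairs: Iterable[BasePair]) -> bool:
    """loop: contiguous region i..j such that i-1 and j+1 form a base pair."""
    if not candidate.is_contiguous():
        return False
    i, j = min(candidate.indices), max(candidate.indices)
    return any(
        p.sequence_id == candidate.sequence_id and {p.i, p.j} == {i - 1, j + 1}
        for p in pairs
    )


# Toy hairpin GGGAAACCC (0-based): bases 0-8, 1-7 and 2-6 are paired.
hairpin_pairs = [BasePair("toy", 0, 8), BasePair("toy", 1, 7), BasePair("toy", 2, 6)]
apical_loop = region("toy", 3, 5)
```

In this toy hairpin, `is_loop(apical_loop, hairpin_pairs)` holds because bases 2 and 6 flank the region and are paired, whereas the shorter region 4..5 is not a loop by the definition above.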

16
So how do we handle all this? A proposal
  • Relationships
  • correspondence: relation among a set of
    sequence_regions implying all share a feature
    (with metadata about how determined)
  • homology: correspondence implying continuous
    chain of descent preserving the relation
  • sequence_similarity: correspondence implying
    regions are similar in primary sequence
  • two_d_structure_similarity: correspondence
    implying regions are similar in 2D structure,
    i.e. nested canonical base pairs
  • secondary_structure_similarity: correspondence
    implying regions are similar in secondary
    structure, i.e. incl. pseudoknots/noncanonicals
  • tertiary_structure_similarity: correspondence
    implying regions are similar in 3D structure
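
One possible way to encode this hierarchy is as a small set of Python classes, with each specific relation subclassing a generic correspondence. The sketch below is an assumption about representation only: region identifiers are plain strings, the `method` field stands in for "metadata about how determined", and the helper `correspondences_for` is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Correspondence:
    """A set of sequence_regions asserted to share some feature."""
    regions: frozenset            # region identifiers (plain strings in this sketch)
    method: str = "unspecified"   # metadata about how the relation was determined


class Homology(Correspondence):
    """Correspondence implying a continuous chain of descent."""


class SequenceSimilarity(Correspondence):
    """Correspondence implying similarity in primary sequence."""


class TwoDStructureSimilarity(Correspondence):
    """Correspondence implying similar nested canonical base pairs."""


class SecondaryStructureSimilarity(Correspondence):
    """Correspondence implying similar pairing incl. pseudoknots/noncanonicals."""


class TertiaryStructureSimilarity(Correspondence):
    """Correspondence implying similarity in 3D structure."""


def correspondences_for(region_id, annotations, kind=Correspondence):
    """Return annotations of the given kind (or a subclass) that include region_id."""
    return [c for c in annotations if isinstance(c, kind) and region_id in c.regions]


# Toy annotations with made-up region identifiers.
annotations = [
    Homology(frozenset({"seq1:5-12", "seq2:7-14"}), method="manual"),
    SequenceSimilarity(frozenset({"seq1:5-12", "seq3:1-8"}), method="toy example"),
]
```

Because every relation is a subclass of `Correspondence`, a query for the generic kind returns all annotations on a region, while a query for `Homology` returns only those that claim descent.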

17
So how do we handle all this? A proposal
  • Relationships
  • pairing: relation that asserts that two
    sequence_regions each have parts of at least one
    base_pair that connects them
  • helical_pairing: pairing that includes several
    base_pairs (not necessarily contiguous) between
    two sequence_regions
  • unbroken_helical_pairing: helical_pairing that
    includes no bases in the sequence_regions that
    are not paired with the other sequence_region, in
    order
  • base_pairing: pairing that connects exactly two
    bases, annotated with the Leontis-Westhof
    classification
  • More exotic uses for alignment
  • microrna_target: pairing relation in which one
    member is a miRNA and the other is an mRNA
    according to SO
  • same_microrna_target: a relation among a set of
    sequences that have microrna_target relation to
    the same miRNA
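
The three nested pairing relations can be expressed as checks over two index sets and a set of base pairs. The sketch below makes several assumptions that are not in the proposal: both regions lie on one sequence, "several" means at least two pairs, "in order" means the usual antiparallel arrangement (lowest index of one region pairs with highest index of the other), and each base has at most one partner.

```python
def connecting_pairs(region_a, region_b, pairs):
    """Return the base pairs that link region_a to region_b, as (a_index, b_index)."""
    found = set()
    for i, j in pairs:
        if i in region_a and j in region_b:
            found.add((i, j))
        elif j in region_a and i in region_b:
            found.add((j, i))
    return found


def is_pairing(region_a, region_b, pairs):
    """pairing: at least one base_pair connects the two regions."""
    return len(connecting_pairs(region_a, region_b, pairs)) >= 1


def is_helical_pairing(region_a, region_b, pairs, min_pairs=2):
    """helical_pairing: several base_pairs, not necessarily contiguous.

    Assumption: "several" is read as min_pairs or more.
    """
    return len(connecting_pairs(region_a, region_b, pairs)) >= min_pairs


def is_unbroken_helical_pairing(region_a, region_b, pairs):
    """unbroken_helical_pairing: every base pairs with the other region, in order."""
    if len(region_a) != len(region_b) or len(region_a) < 2:
        return False
    # Antiparallel order: ascending indices in A against descending indices in B.
    expected = set(zip(sorted(region_a), sorted(region_b, reverse=True)))
    return connecting_pairs(region_a, region_b, pairs) == expected


# Toy hairpin GGGAAACCC (0-based) with pairs 0-8, 1-7 and 2-6.
toy_pairs = {(0, 8), (1, 7), (2, 6)}
five_prime = {0, 1, 2}
three_prime = {6, 7, 8}
```

Each relation implies the one before it, so a region pair that passes `is_unbroken_helical_pairing` also passes `is_helical_pairing` and `is_pairing`. Extending a region to include an unpaired base (for example adding index 3 to `five_prime`) keeps the helical_pairing but breaks the unbroken one.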

18
Implementation notes
  • Must be able to name regions (e.g. P3 in RNaseP)
    and subclass them (e.g. P3 in firmicutes)
  • Must be able to subclass homologies, e.g.
    homologous as wing vs. homologous as limb
  • Correspondences are all symmetric and transitive,
    so can implement as set of regions that share the
    correspondence
  • (probably) don't want to reify names of parts of
    well-known RNAs in the overall RNAO?
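
The note that correspondences are symmetric and transitive suggests storing each one as a set of regions rather than as pairwise links. A minimal sketch of that idea follows, keeping a separate family of sets per feature so that "homologous as limb" and "homologous as wing" stay distinct. The class name, method names and region labels are hypothetical, and the feature is a plain string rather than a true subclass.

```python
class CorrespondenceSets:
    """Store each kind of correspondence as disjoint sets of region names."""

    def __init__(self):
        self._sets = {}  # feature name -> list of disjoint sets of regions

    def assert_pair(self, feature, a, b):
        """Record that a and b correspond; merge any sets they already belong to."""
        groups = self._sets.get(feature, [])
        touching = [g for g in groups if a in g or b in g]
        merged = {a, b}.union(*touching)
        untouched = [g for g in groups if g not in touching]
        self._sets[feature] = untouched + [merged]

    def corresponds(self, feature, a, b):
        """True if a and b fall in the same set for this feature."""
        return any(a in g and b in g for g in self._sets.get(feature, []))


store = CorrespondenceSets()
# Forelimbs are asserted homologous as limbs; nothing is asserted about wings.
store.assert_pair("homologous_as_limb", "bat_forelimb", "bird_forelimb")
store.assert_pair("homologous_as_limb", "bird_forelimb", "frog_forelimb")
```

After the two assertions, bat and frog forelimbs correspond as limbs without ever being linked directly, which is the transitivity the note relies on, while a query under a different feature such as "homologous_as_wing" finds no correspondence.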

19
Acknowledgements
  • RNA Alignment Ontology working group
  • James W. Brown
  • Fabrice Jossinet
  • Rym Kachouri
  • B. Franz Lang
  • Neocles Leontis
  • Gerhard Steger
  • Jesse Stombaugh
  • Eric Westhof
  • Other coauthors
  • Amanda Birmingham
  • Paul Griffiths
  • Franz Lang
  • Knight Lab members
  • Cathy Lozupone
  • Micah Hamady
  • Chris Lauber
  • Jesse Zaneveld
  • Jeremy Widmann
  • Elizabeth Costello
  • Jens Reeder
  • Daniel McDonald
  • Anh Vu
  • Ryan Kennedy
  • Julia Goodrich
  • Meg Pirrung
  • Reece Gesumaria
  • Trp project
  • Irene Majerfeld
  • Jana Chocholousova
  • Vikas Malaiya
  • Matthew Iyer
  • Mike Yarus

NSF RCN grant 0443508