Title: RNA Secondary Structure
1Predicting RNA Secondary Structures: A Lattice
Walk Approach to Modeling Sequences Within the
HIV-1 RNA Structure
- Facing the Challenge of Infectious Diseases in
Africa: The Role of Mathematical Modeling
- University of the Witwatersrand
- Johannesburg, South Africa
- September 25-27, 2006
- Asamoah Nkwanta, Ph.D.
- Morgan State University
- Nkwanta_at_jewel.morgan.edu
2TOPICS
- RNA Prediction and Molecular Biology
- RNA Combinatorics
- Certain Class of Random Walks
- Matrix Theory
- Connection Between Walks and RNA
- Modeling HIV-1 RNA Sequences
3RNA Secondary Structure Prediction
The Human Genome Project and related efforts
have generated enormous amounts of raw biological
sequence data. However, understanding how
biological sequences encode structural
information remains a fundamental scientific
challenge. For instance, understanding the base
pairing, or secondary structure, of
single-stranded RNA sequences is crucial to
advancing knowledge of their novel biochemical
functions.
- C. E. Heitsch, Combinatorics on Plane Trees,
Motivated by RNA Secondary Structure
Configuration (preprint, 2005)
4What is RNA Secondary Sequence Prediction?
5RNA Secondary Structure Prediction
- Given a primary sequence, we want to find the
biological function of the related secondary
structure. To achieve this goal we predict
(model) its secondary structure.
- Most methods predict secondary structure rather
than tertiary structure. The three-dimensional
shape is important for biological function, and
it is harder to predict.
6Molecular Biology (Cont.)
3-D structure of Haloarcula marismortui 5S
ribosomal RNA in large ribosomal subunit
8Molecular Biology
- Central Dogma
- DNA → RNA → Protein
- Transcription / Translation
9Molecular Biology (Cont.)
10Molecular Biology (Cont.)
However, the "Central Dogma" has had to be
revised a bit. It turns out that you CAN go back
from RNA to DNA, and that RNA can also make
copies of itself. It is still NOT possible to go
from Proteins back to RNA or DNA, and no known
mechanism has yet been demonstrated for proteins
making copies of themselves.
11Molecular Biology (cont.)
- HIV is one of a group of atypical viruses called
retroviruses that maintain their genetic
information in the form of RNA. Retroviruses are
capable of producing DNA from RNA.
12Molecular Biology (Cont.)
13Molecular Biology (cont.)
- Ribonucleic acid (RNA) molecule: three main
categories
- mRNA (messenger): carries genetic information
from the genes to the ribosomes
- tRNA (transfer): carries amino acids to a
ribosome (the cell's site for making proteins)
- rRNA (ribosomal): part of the structure of a
ribosome
14Molecular Biology (cont.)
- Other types of RNA molecules
- snRNA (small nuclear RNA): takes part in
splicing pre-mRNA in the nucleus
- miRNA (micro RNA): regulates gene expression by
binding to target mRNAs
- iRNA (immune RNA): important for HIV studies
15RNA Secondary Structure
- RNA secondary structures are important in many
biological processes and efficient structure
prediction can give vital directions for
experimental investigation.
- B. Knudsen and J. Hein, Pfold: RNA secondary
structure prediction using stochastic
context-free grammars (Nucleic Acids Research,
2003)
- There are published examples involving tRNA,
rRNA, and other types of RNA
16RNA Secondary Structure (cont.)
- A ribonucleic acid (RNA) molecule consists of a
sequence of ribonucleotides (typically single
stranded)
- Each ribonucleotide contains one of four bases:
adenine (A), cytosine (C), guanine (G), and
uracil (U)
17Secondary Structure (cont.)
- Note: U is replaced by thymine (T) in DNA
- As the molecule folds, hydrogen bonds join A-U
and C-G pairs; these are called the Watson-Crick
pairs. The less stable G-U pair can also form.
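The pairing rules above can be written as a small lookup. This is a minimal sketch; the names `CANONICAL`, `WOBBLE`, `dna_to_rna`, and `can_pair` are illustrative and do not come from the talk.

```python
# Illustrative helper names; not taken from the original slides.
CANONICAL = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}  # Watson-Crick pairs
WOBBLE = {("G", "U"), ("U", "G")}  # the less stable G-U pair


def dna_to_rna(dna: str) -> str:
    """Rewrite a DNA string in the RNA alphabet by replacing T with U."""
    return dna.upper().replace("T", "U")


def can_pair(x: str, y: str, allow_wobble: bool = True) -> bool:
    """Return True if bases x and y can form a base pair."""
    pair = (x.upper(), y.upper())
    return pair in CANONICAL or (allow_wobble and pair in WOBBLE)


print(dna_to_rna("GATTACA"))                      # GAUUACA
print(can_pair("A", "U"))                         # True
print(can_pair("G", "U", allow_wobble=False))     # False
```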
18Secondary Structure (cont.)
- Primary Structure: the linear sequence of bases
in an RNA molecule
- Secondary Structure: the folding or coiling of
the sequence due to bonded nucleotide pairs (A-U,
G-C)
- Tertiary Structure: the three-dimensional
configuration of an RNA molecule
19Primary RNA Sequence
- CAGCAUCACAUCCGCGGGGUAAACGCU
- Nucleotide length: 27 bases
20Geometric Representation
- Secondary structure is a graph defined on a set
of n labeled points
- (M.S. Waterman, 1978)
- Biological
- Combinatorial/Graph Theoretic
- Random Walk
22RNA COMBINATORICS
- RNA Numbers: 1, 1, 1, 2, 4, 8, 17, 37, 82, 185, 423, 978, ...
- These numbers count various combinatorial objects
including RNA secondary structures of length n.
24RNA COMBINATORICS (cont.)
- The number of RNA secondary structures for the
sequence [1,n] is counted by the coefficients of
s(z)
- Coefficients of the formal power series:
- (1, 1, 1, 2, 4, 8, 17, 37, 82, 185, 423, 978, ...)
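The coefficients can be generated with a short recurrence. The sketch below assumes the usual convention that a base pair must enclose at least one unpaired base: the last base is either unpaired, or paired with some base j, which splits the rest into two independent pieces. The function name `rna_numbers` is illustrative.

```python
def rna_numbers(count: int) -> list[int]:
    """Return the first `count` RNA numbers S(0), S(1), ...

    Assumed recurrence:
        S(n) = S(n-1) + sum_{j=1}^{n-2} S(j-1) * S(n-1-j),  S(0) = 1.
    """
    s = [1]  # S(0) = 1: the empty structure
    for n in range(1, count):
        total = s[n - 1]                  # base n is unpaired
        for j in range(1, n - 1):         # base n pairs with base j, j = 1..n-2
            total += s[j - 1] * s[n - 1 - j]
        s.append(total)
    return s[:count]


print(rna_numbers(12))
# [1, 1, 1, 2, 4, 8, 17, 37, 82, 185, 423, 978]
```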
25 RNA COMBINATORICS (cont.)
- The number of n-step lattice paths with unit
steps R (right), U (up), and D (down) that start
at (0,0), remain in the first quadrant of the
coordinate plane, and return to the x-axis, under
the restriction that there are never consecutive
UD steps, is the nth RNA number
- (1, 1, 1, 2, 4, 8, 17, 37, 82, 185, 423, 978, ...)
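This path count can be checked by brute force for small n. The sketch below enumerates every word over R, U, D; it assumes that the length of a path is its number of steps and that the sequence is indexed from n = 0. The name `count_paths` is illustrative.

```python
from itertools import product

HEIGHT_CHANGE = {"R": 0, "U": 1, "D": -1}


def count_paths(n: int) -> int:
    """Count n-step R/U/D paths from (0,0) that never go below the x-axis,
    end on the x-axis, and never contain a U immediately followed by a D."""
    count = 0
    for steps in product("RUD", repeat=n):
        word = "".join(steps)
        if "UD" in word:
            continue
        height = 0
        for step in word:
            height += HEIGHT_CHANGE[step]
            if height < 0:
                break
        else:
            # Loop finished without dipping below the axis.
            if height == 0:
                count += 1
    return count


print([count_paths(n) for n in range(9)])
# [1, 1, 1, 2, 4, 8, 17, 37, 82]
```

Exhaustive enumeration grows as 3^n, so this is only a sanity check for short paths.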
26 RNA COMBINATORICS (cont.)
- The number of RNA sequences of length n that can
be formed over the alphabet {A, U, G, C} such that
the letters A and U are never adjacent is given by
a closed-form expression in n
- What a remarkable formula for an integer: when
n = 1 we get 4, and when n = 2 we get 14.
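The two quoted values can be reproduced in three ways. The sketch below assumes "not adjacent" forbids both AU and UA as neighbors. Under that assumption the counts satisfy a(n) = 3a(n-1) + 2a(n-2) with a(0) = 1 and a(1) = 4, which gives a closed form involving the square root of 17. That closed form is derived here and may be written differently from the formula shown in the talk; all function names are illustrative.

```python
from itertools import product
from math import sqrt


def count_brute(n: int) -> int:
    """Enumerate all words of length n and keep those with no A next to U."""
    return sum(
        1
        for w in product("AUGC", repeat=n)
        if all({a, b} != {"A", "U"} for a, b in zip(w, w[1:]))
    )


def count_recurrence(n: int) -> int:
    """a(n) = 3*a(n-1) + 2*a(n-2), with a(0) = 1 and a(1) = 4."""
    a, b = 1, 4
    for _ in range(n):
        a, b = b, 3 * b + 2 * a
    return a


def count_closed_form(n: int) -> int:
    """Closed form derived from the recurrence (floating point, small n only)."""
    r = sqrt(17)
    value = ((17 + 5 * r) / 34) * ((3 + r) / 2) ** n \
          + ((17 - 5 * r) / 34) * ((3 - r) / 2) ** n
    return round(value)


for n in range(1, 5):
    print(n, count_brute(n), count_recurrence(n), count_closed_form(n))
```

The irrational terms cancel for every n, which is why the expression always evaluates to an integer.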
27Counting Sequence Database
- The On-line Encyclopedia of Integer Sequences
http://www.research.att.com/~njas/sequences/index.html
- N.J.A. Sloane and S. Plouffe, The Encyclopedia of
Integer Sequences, Academic Press, 1995.
28RNA EQUATIONS
29RNA EQUATIONS (cont.)
- Generating Function
- 1, 1, 1, 2, 4, 8, 17, 37, 82, 185, 423, 978, ...
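For reference, a closed form consistent with the listed coefficients is given below. It assumes the convention S(0) = 1 and that every base pair encloses at least one unpaired base; the notation S(n) for the nth RNA number is introduced here for illustration.

```latex
% Functional equation satisfied by the generating function
z^2\, s(z)^2 - (1 - z + z^2)\, s(z) + 1 = 0

% Solving for the branch with s(0) = 1
s(z) = \sum_{n \ge 0} S(n)\, z^n
     = \frac{1 - z + z^2 - \sqrt{1 - 2z - z^2 - 2z^3 + z^4}}{2z^2}
     = 1 + z + z^2 + 2z^3 + 4z^4 + 8z^5 + 17z^6 + \cdots
```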
30RNA EQUATIONS (cont.)
31RNA EQUATIONS (cont.)
- s(n,k) is the number of structures of length n
with exactly k base pairs. For n, k > 0,
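One way to tabulate s(n,k) is to split on the last base: it is either unpaired, or paired with an earlier base j with at least one unpaired base in between. The sketch below follows that assumption; `pair_count_table` is an illustrative name, and the recurrence is written out here rather than taken from the slides.

```python
def pair_count_table(max_n: int) -> list[list[int]]:
    """Return s where s[n][k] is the number of secondary structures of
    length n with exactly k base pairs, for 0 <= n <= max_n."""
    max_k = max_n // 2
    s = [[0] * (max_k + 1) for _ in range(max_n + 1)]
    s[0][0] = 1  # the empty structure
    for n in range(1, max_n + 1):
        for k in range(max_k + 1):
            total = s[n - 1][k]                  # last base unpaired
            if k >= 1:
                for j in range(1, n - 1):        # last base pairs with base j
                    for i in range(k):           # i pairs to the left of j
                        total += s[j - 1][i] * s[n - 1 - j][k - 1 - i]
            s[n][k] = total
    return s


table = pair_count_table(6)
print(table[6])        # [1, 10, 6, 0]
print(sum(table[6]))   # 17, the RNA number for n = 6
```

Summing a row over k recovers the RNA numbers, which gives a quick consistency check.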
32RNA EQUATIONS (cont.)
- Asymptotic Estimate: as n grows without bound
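A standard form of the estimate for the RNA numbers is sketched below. It follows from the square-root singularity of the generating function at z = (3 - \sqrt{5})/2; the symbol S(n) for the nth RNA number is an assumption of this sketch.

```latex
S(n) \sim \sqrt{\frac{15 + 7\sqrt{5}}{8\pi}}\; n^{-3/2}
          \left( \frac{3 + \sqrt{5}}{2} \right)^{n}
\qquad (n \to \infty)
```

The growth rate (3 + \sqrt{5})/2 is about 2.618, so the number of possible structures increases exponentially with sequence length.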
33Random Walk
- A random walk is a lattice path from one point to
another such that steps are allowed in a discrete
number of directions and are of a certain length
34RNA Walk Type I
- NSE Walks: unit-step walks starting at the
origin (0,0) with steps up, down, and right
- No walks pass below the x-axis and there are no
consecutive NS steps
35RNA Walk Type I (cont.)
- N = (0,1): up
- S = (0,-1): down
- E = (1,0): right
36Type I Walk Array (n x k)
37RNA Walk Type II
- NSE Walks: unit-step walks starting at the
origin (0,0) with steps up, down, and right such
that no walks pass below the x-axis and there are
no consecutive SN steps
38Type II Walk Array (n x k)
39Examples
- Type I: ENNESNESSE
- Type II: NEEENSEEES
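Both example walks can be checked mechanically against the two definitions. This is a minimal sketch; the helper names `is_valid_walk`, `is_type_i`, `is_type_ii`, and `final_height` are illustrative.

```python
HEIGHT_CHANGE = {"N": 1, "S": -1, "E": 0}  # change in height for each step


def is_valid_walk(walk: str, forbidden: str) -> bool:
    """True if the walk never goes below the x-axis and never contains
    the forbidden pair of consecutive steps."""
    if forbidden in walk:
        return False
    height = 0
    for step in walk:
        height += HEIGHT_CHANGE[step]
        if height < 0:
            return False
    return True


def is_type_i(walk: str) -> bool:
    return is_valid_walk(walk, "NS")   # no consecutive NS steps


def is_type_ii(walk: str) -> bool:
    return is_valid_walk(walk, "SN")   # no consecutive SN steps


def final_height(walk: str) -> int:
    return sum(HEIGHT_CHANGE[step] for step in walk)


print(is_type_i("ENNESNESSE"), is_type_ii("ENNESNESSE"))  # True False
print(is_type_i("NEEENSEEES"), is_type_ii("NEEENSEEES"))  # False True
print(final_height("ENNESNESSE"), final_height("NEEENSEEES"))  # 0 0
```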
40RNA Walk Bijection
- Theorem: There is a bijection between the set of
Type I NSE walks of length n+1 ending at height
k = 0 and the set of Type II NSE walks of length n
ending at height k = 0.
- Source: Lattice paths, generating functions, and
the Riordan group, Ph.D. Thesis, Howard
University, Washington, DC, 1997
41Matrices Count Lattice Walks
- Type I Walks
   1   0   0   0   0   0   0  ...
   1   1   0   0   0   0   0  ...
   1   2   1   0   0   0   0  ...
   2   3   3   1   0   0   0  ...
   4   6   6   4   1   0   0  ...
   8  13  13  10   5   1   0  ...
  17  28  30  24  15   6   1  ...
 ...
- Type II Walks
   1   0   0   0   0   0   0  ...
   1   1   0   0   0   0   0  ...
   2   2   1   0   0   0   0  ...
   4   4   3   1   0   0   0  ...
   8   9   7   4   1   0   0  ...
  17  20  17  11   5   1   0  ...
  37  45  41  29  16   6   1  ...
 ...
The (i,j) entry is the number of walks of length i
that end at height j, with rows and columns
indexed from 0.
42Type I Formation Rule (Recurrence)
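One formation rule that generates these arrays comes from looking at the last step of a walk. A walk of length n ending at height k arrives by an N, E, or S step from a walk of length n-1; from that total, remove the walks whose last two steps form the forbidden pair, and there are as many of those as walks of length n-2 ending at height k. For Type II the correction applies only when k >= 1, because an SN pair cannot start from height 0. The sketch below is derived from the walk definitions and is not necessarily the rule as stated in the talk; `walk_array` is an illustrative name.

```python
def walk_array(size: int, walk_type: str) -> list[list[int]]:
    """Return a where a[n][k] is the number of NSE walks of length n
    ending at height k, for Type "I" (no NS) or Type "II" (no SN)."""
    if walk_type not in ("I", "II"):
        raise ValueError("walk_type must be 'I' or 'II'")
    a = [[0] * size for _ in range(size)]
    a[0][0] = 1  # the empty walk
    for n in range(1, size):
        for k in range(n + 1):
            total = a[n - 1][k]                  # last step E
            if k >= 1:
                total += a[n - 1][k - 1]         # last step N
            if k + 1 < size:
                total += a[n - 1][k + 1]         # last step S
            if n >= 2 and (walk_type == "I" or k >= 1):
                total -= a[n - 2][k]             # drop walks ending in the forbidden pair
            a[n][k] = total
    return a


print([row[0] for row in walk_array(7, "I")])    # [1, 1, 1, 2, 4, 8, 17]
print([row[0] for row in walk_array(7, "II")])   # [1, 1, 2, 4, 8, 17, 37]
```

The first columns are the RNA numbers, shifted by one position in the Type II array.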
43The Connection Between RNA and the Walks
- Theorem: There is a bijection between the set of
RNA secondary structures of length n and the set
of Type I NSE walks of length n ending at height
k = 0.
- Source: Lattice paths and RNA secondary
structures, DIMACS Series in Discrete Math. and
Theoretical Computer Science 34 (1997) 137-147.
(CAARMS2 Proceedings)
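One natural correspondence of this kind can be written down directly if a structure is given in dot-bracket notation: an opening bracket becomes N, a closing bracket becomes S, and an unpaired base becomes E. Forbidding an empty hairpin "()" then matches forbidding consecutive NS steps. The sketch below uses that mapping as an assumption; it is not necessarily the construction used in the cited paper, and all names are illustrative.

```python
from itertools import product

TO_STEP = {"(": "N", ")": "S", ".": "E"}
TO_SYMBOL = {step: symbol for symbol, step in TO_STEP.items()}


def structure_to_walk(structure: str) -> str:
    """Map a dot-bracket string to an NSE walk."""
    return "".join(TO_STEP[c] for c in structure)


def walk_to_structure(walk: str) -> str:
    """Inverse of structure_to_walk."""
    return "".join(TO_SYMBOL[s] for s in walk)


def is_secondary_structure(structure: str) -> bool:
    """Balanced brackets, and no pair enclosing zero unpaired bases."""
    if "()" in structure:
        return False
    depth = 0
    for c in structure:
        depth += {"(": 1, ")": -1, ".": 0}[c]
        if depth < 0:
            return False
    return depth == 0


def count_structures(n: int) -> int:
    """Brute-force count of valid dot-bracket strings of length n."""
    return sum(is_secondary_structure("".join(w)) for w in product("().", repeat=n))


print(structure_to_walk("((...)).."))               # NNEEESSEE
print([count_structures(n) for n in range(7)])      # [1, 1, 1, 2, 4, 8, 17]
```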
45HIV-1 RNA Sequence Prediction
- We want to construct a lattice walk method to
predict secondary RNA sequences that code for
regions of the SL2 and SL3 domains within the
HIV-1 5' UTR RNA molecule.
- These domains are important for HIV genomic
packaging
46HIV-1 RNA Structural Components
47Components of Secondary Structure
- Base pairs
- Bulges
- Interior Loops
- End loops
- Hairpin
- Multibranch loops: junctions where more than one
hairpin or more complex secondary structures are
appended.
48HIV-1 Sequence (SL2 SL3)
- The following sequence was obtained from the NCBI
website. The first 363 nucleotides were
extracted from the entire HIV-1 RNA genomic
sequence
- GGUCUCUCUGGUUAGACCAGAUCUGAGCCUGGGAGCUCUCUGGCUAACUA
GGGAACCCACUGCUUAAGCCUCAAUAAAGCUUGCCUUGAGUGCUUCAAGU
AGUGUGUGCCCGUCUGUUGUGUGACUCUGGUAACUAGAGAUCCCUCAGAC
CCUUUUAGUCAGUGUGGAAAAUCUCUAGCAGUGGCGCCCGAACAGGGACC
UGAAAGCGAAAGGGAAACCAGAGGAGCUCUCUCGACGCAGGACUCGGCUU
GCUGAAGCGCGCACGGCAAGAGGCGAGGGGCGGCGACUGGUGAGUACGCC
AAAAAUUUUGACUAGCGGAGGCUAGAAGGAGAGAGAUGGGUGCGAGAGCG
UCAGUAUUAAGCG
- Color key
- SL2: yellow
- SL3: red
49Known Sequence of the SL2 Domain
50Lattice Walk Model
- Start with an RNA primary sequence
- Perform RNA combinatorial analysis on the given
sequence
- Connect lattice walks to the given sequence using
Type I and II walks
- Evaluate the identified sequences to find the
minimum free energy
- Predict secondary sequence
- Conduct laboratory experiments for biological
functionality
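The free-energy step needs an energy model that the slides do not specify. As a stand-in, the sketch below uses Nussinov-style base-pair maximization, which is a simpler dynamic program than free-energy minimization and is not the method proposed in the talk. The names `max_base_pairs` and `min_loop`, and the choice to allow G-U pairs, are assumptions made for illustration.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_base_pairs(seq: str, min_loop: int = 1) -> int:
    """Maximum number of nested base pairs in seq, where each pair must
    enclose at least `min_loop` bases (Nussinov-style recursion)."""
    n = len(seq)
    if n == 0:
        return 0
    # best[i][j] = maximum number of pairs within seq[i..j]
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]                   # base j unpaired
            for k in range(i, j - min_loop):         # base j pairs with base k
                if (seq[k], seq[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    inner = best[k + 1][j - 1]
                    score = max(score, left + 1 + inner)
            best[i][j] = score
    return best[0][n - 1]


print(max_base_pairs("GGGAAACCC"))   # 3
print(max_base_pairs("CAGCAUCACAUCCGCGGGGUAAACGCU"))
```

Counting pairs ignores stacking and loop energies, so the result only bounds how much pairing a sequence can support.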
51Acknowledgments
- National Science Foundation, DIMACS, AIMS,
Burroughs Wellcome, SACEMA, WITS
- MATH. Modeling 561, Graduate Students
- Collaborators: Dwayne Hill, Biology Dept., MSU,
and Alvin Kennedy, Chemistry Dept., MSU