Title: RNA Folding Algorithms
1. RNA Folding Algorithms
2. Free energy of RNA folding
3. Watson-Crick base pairing
4. Nussinov Algorithm
γ(i,j) is the maximum number of basepairs in subsequence i..j
δ(i,j) is 1 iff (i,j) form a basepair (and 0 otherwise)
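5. Example implementation
- For j = 1 to L (in ascending order)
  - Set γ(j,j) = 0
  - If j > 1: Set γ(j-1,j) = δ(j-1,j)
  - If j > 2: For i = j-2 to 1 (in descending order)
    - Set γ(i,j) = max( γ(i+1,j), γ(i,j-1), γ(i+1,j-1) + δ(i,j) )
    - For k = i+1 to j-1 (in ascending order)
      - Set γ(i,j) = max( γ(i,j), γ(i,k) + γ(k+1,j) )
Rate-limiting step: the inner k-loop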
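The loop structure above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: it assumes the standard Nussinov recursion (unpaired ends, an i-j pair, or a bifurcation at k), scores only Watson-Crick pairs, and, like the loops on the slide, allows adjacent bases to pair. The names `can_pair` and `nussinov` are made up for this sketch.

```python
def can_pair(a, b):
    """Pairing indicator: 1 if bases a and b form a Watson-Crick pair, else 0."""
    return 1 if {a, b} in ({"A", "U"}, {"C", "G"}) else 0


def nussinov(seq):
    """Return the maximum number of base pairs in seq (fill step only, no traceback)."""
    L = len(seq)
    # 1-based table to mirror the slide's indices; row and column 0 are unused.
    gamma = [[0] * (L + 1) for _ in range(L + 1)]
    for j in range(1, L + 1):                      # ascending j
        gamma[j][j] = 0
        if j > 1:
            gamma[j - 1][j] = can_pair(seq[j - 2], seq[j - 1])
        for i in range(j - 2, 0, -1):              # descending i (empty unless j > 2)
            best = max(
                gamma[i + 1][j],                   # i unpaired
                gamma[i][j - 1],                   # j unpaired
                gamma[i + 1][j - 1] + can_pair(seq[i - 1], seq[j - 1]),  # i pairs with j
            )
            for k in range(i + 1, j):              # rate-limiting inner loop
                best = max(best, gamma[i][k] + gamma[k + 1][j])
            gamma[i][j] = best
    return gamma[1][L] if L > 0 else 0


print(nussinov("GGGAAACCC"))
```

Recovering the structure itself would need a traceback through the table, which is omitted here.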
6. Nussinov memory complexity
- Number of stored values γ(i,j) is 1 + 2 + ... + L = L(L+1)/2, using the result Σ_{n=1}^{L} n = L(L+1)/2
7. Asymptotic memory complexity
- Number of stored values γ(i,j) is L(L+1)/2 = (1/2)L^2 + (1/2)L
- For sufficiently large L,
  - the L^2 term will come to dominate
  - the coefficient (1/2) will be irrelevant if we compare to other powers of L (e.g. L, L^3, L^4)
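A quick numerical check can make the dominance of the quadratic term concrete. The sketch below assumes one stored value per pair i ≤ j, i.e. L(L+1)/2 cells in total; the function name `stored_cells` is hypothetical.

```python
def stored_cells(L):
    """Number of (i, j) pairs with 1 <= i <= j <= L."""
    return L * (L + 1) // 2


for L in (10, 100, 1000, 10000):
    exact = stored_cells(L)
    leading = L * L / 2            # the (1/2) L^2 term alone
    # The ratio approaches 1 as L grows: the linear term becomes negligible.
    print(L, exact, round(leading / exact, 4))
```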
8. Asymptotic memory complexity
- Number of stored values γ(i,j) is (1/2)L^2 + (1/2)L
- We write this as O(L^2), which simply means that for some k, L0 and all L > L0, the number of stored values is at most k L^2
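9. Nussinov time complexity
- Number of iterations of inner k-loop is Σ_{j=3}^{L} Σ_{i=1}^{j-2} (j-i-1) = L(L-1)(L-2)/6, using the results Σ_{n=1}^{N} n = N(N+1)/2 and Σ_{n=1}^{N} n^2 = N(N+1)(2N+1)/6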
10. Time complexity: Big O notation
- Number of iterations of inner k-loop is L(L-1)(L-2)/6 = (1/6)L^3 - (1/2)L^2 + (1/3)L, or O(L^3)
- Much easier to estimate the asymptotic (big-O) expression than the exact expression!
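The exact count can also be checked by brute force. The sketch below replays the three nested loops of the example implementation (j ascending, i from j-2 down to 1, k from i+1 to j-1) and compares the tally with L(L-1)(L-2)/6; that closed form is derived here from those loop bounds, and both function names are hypothetical.

```python
def count_inner_iterations(L):
    """Count how many times the innermost k-loop body runs for sequence length L."""
    count = 0
    for j in range(1, L + 1):
        for i in range(j - 2, 0, -1):
            for k in range(i + 1, j):
                count += 1
    return count


def closed_form(L):
    """L choose 3, which grows like L^3 / 6."""
    return L * (L - 1) * (L - 2) // 6


for L in (3, 10, 50):
    print(L, count_inner_iterations(L), closed_form(L))
```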
11. Example: what resources are needed to fold the HIV genome?
- Genome is 10kb in size
  - i.e. 10^4 bases
- Memory: 10^8 cells
  - Each cell is 4 bytes (8 bytes on a 64-bit machine)
- Time: 10^12 operations
  - Each operation is 10^-7 seconds (assuming 100 cycles on a 1GHz CPU)
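Carrying the arithmetic through gives rough totals. The sketch below uses only the order-of-magnitude figures above (L^2 cells, L^3 operations, 4 bytes per cell, 10^-7 seconds per operation) and ignores constant factors such as the 1/2 and 1/6 coefficients; the function name `folding_resources` is hypothetical.

```python
def folding_resources(L, bytes_per_cell=4, seconds_per_op=1e-7):
    """Order-of-magnitude memory (bytes) and time (seconds) for an O(L^2)/O(L^3) fold."""
    memory_bytes = L**2 * bytes_per_cell
    time_seconds = L**3 * seconds_per_op
    return memory_bytes, time_seconds


memory_bytes, time_seconds = folding_resources(10**4)
print(memory_bytes / 1e6, "MB")        # about 400 MB
print(time_seconds / 3600, "hours")    # a bit over a day
```

On these assumptions the fold needs roughly 400 MB of memory and on the order of 10^5 seconds of compute.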
12. Wobble & non-canonical base pairs
13. Base-pair stacking
p-orbital conjugation; induced polarity; Van der Waals forces
14. Loop closure
15. Tetraloops, etc.
16. Pseudoknots
17. RNA Free Energy Terms
- Sequence-dependent
  - Base pairing (16 possibilities)
  - Base stacking (256 possibilities)
  - Stem opening, closing terms
  - Tetraloops, triloops, triple-A platforms, etc.
- Length-dependent
  - Loop closure
- Topology
  - Stacked and nested basepairs
  - Pseudoknots
18. Zuker's algorithm
- Let E(S) be the free energy of folding of structure S
- Zuker's algorithm computes E0 = max_S E(S)
- From this, one can find S0 = argmax_S E(S)
- Sequence length L: time complexity O(L^3), memory complexity O(L^2)
19. Zuker's algorithm
Energy terms: energy of pairing x-y (a function of x and y); h(n): loop; b(n): bulge; i(n): interior; s: stacked
W(i,j): max energy (i..j) with i and/or j dangling
V(i,j): max energy (i..j) given that i and j are to be paired; the max is taken over these cases:
- loop
- stacked basepair
- right bulge
- left bulge
- interior loop
- multibranched loop
20. McCaskill's algorithm
- Let E(S) be the free energy of folding of structure S
- McCaskill's algorithm computes the partition function Z = Σ_S exp(-E(S)/kT)
- From this, one can find the Boltzmann probability of a particular structure S: P(S) = exp(-E(S)/kT) / Z
- Can also find probabilities of individual basepairs
- Same complexity as Zuker's algorithm
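The relationship between Z and P(S) can be illustrated on a toy ensemble. The sketch below simply enumerates a handful of structures with made-up energies. It is not McCaskill's dynamic program, which sums over all structures without listing them. The structure names, the energies, and the value kT ≈ 0.616 kcal/mol (roughly 37 °C) are assumptions chosen for illustration.

```python
import math

KT = 0.616  # kcal/mol near 37 C; assumed value for this illustration


def boltzmann_probabilities(energies, kT=KT):
    """Return (Z, {structure: P(S)}) for a dict mapping structure -> E(S)."""
    weights = {s: math.exp(-e / kT) for s, e in energies.items()}
    Z = sum(weights.values())
    return Z, {s: w / Z for s, w in weights.items()}


# Hypothetical energies in kcal/mol for three candidate structures.
toy = {"unfolded": 0.0, "hairpin": -2.0, "alternative": -1.0}
Z, probs = boltzmann_probabilities(toy)
for name, p in probs.items():
    print(name, round(p, 3))
```

With exp(-E(S)/kT) as the weight, the structure with the lowest E(S) receives the highest probability, and the probabilities sum to 1 by construction.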
21. Programs
- Without pseudoknots (Zuker's algorithm)
  - MFOLD
  - Vienna
- With pseudoknots
  - PKNOTS
  - NUPACK
- Many variants (two strands of RNA, etc.)