Title: SPIRE2005
1 Local Alignment of RNA Sequences with Arbitrary Scoring Schemes
Rolf Backofen, Danny Hermelin, Gad M. Landau, Oren Weimann
2 RNA sequences
3 RNA sequences
[Figure: RNA secondary structure drawing showing base pairs (C-G, G-C, A-U, U-A) and unpaired bases]
4 RNA sequences
[Figure: RNA secondary structure drawing showing base pairs (C-G, G-C, A-U, U-A) and unpaired bases]
5 Alignment of Strings
S1: U C A C C G _ A _ G
S2: U C G C G G U A U G
Global Alignment
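A global alignment like the one on this slide can be computed with a standard Needleman-Wunsch style dynamic program. The sketch below is only an illustration: the function name `global_align` and the scoring values (match +1, mismatch -1, gap -1) are assumptions, not taken from the talk, and the alignment it returns is one optimal alignment under those values, which need not be the one drawn on the slide.

```python
def global_align(s1, s2, match=1, mismatch=-1, gap=-1):
    """Return (score, aligned_s1, aligned_s2) for a global alignment.

    The scoring values are illustrative assumptions, not from the slides.
    """
    n, m = len(s1), len(s2)
    # score[i][j] = best score for aligning s1[:i] with s2[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s1[i - 1] == s2[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    a1, a2 = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if s1[i - 1] == s2[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                a1.append(s1[i - 1])
                a2.append(s2[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            a1.append(s1[i - 1])   # character of s1 against a gap
            a2.append("_")
            i -= 1
        else:
            a1.append("_")         # gap against a character of s2
            a2.append(s2[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(a1)), "".join(reversed(a2))


score, a1, a2 = global_align("UCACCGAG", "UCGCGGUAUG")
print(a1)
print(a2)
```

Removing the `_` symbols from either output row gives back the corresponding input string, which is the defining property of a global alignment.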
6 Alignment of RNA sequences
[Figure: alignment of the RNA sequences A A G G C C C U G A U and A G A C C G U U]
7 Alignment of RNA sequences
[Figure: alignment of the RNA sequences A A G G C C C U G A U and A G A C C G U U]
8 Alignment of RNA sequences
[Figure: alignment of the RNA sequences A A G G C C C U G A U and A G A C C G U U, with labels n and m]
RNA Global Alignment via tree edit distance: SZ 1989, K 1998, DMRW 2006
Theorem: All these algorithms compute the edit distance between any two arcs, provided we match these arcs.
9 The Alignment graph
[Figure: alignment graph over the sequences U C A C C G A G and U C G C G G U A U G]
Theorem: There is a one-to-one correspondence between the paths in the alignment graph and the alignments of substrings of R1 and R2.
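For the plain grid part of the graph, the correspondence can be made concrete by reading an alignment off a path edge by edge. The sketch below assumes a particular convention that the slides do not spell out: vertex (i, j) sits after the first i characters of R1 and the first j characters of R2, a diagonal edge aligns two characters, and the other two grid edges insert a gap. The helper name `path_to_alignment` is hypothetical, and edges that jump over arcs are not modeled.

```python
def path_to_alignment(r1, r2, path):
    """Turn a path of grid vertices (i, j) into an alignment of substrings.

    Assumed convention (not from the slides): vertex (i, j) sits after the
    first i characters of r1 and the first j characters of r2.
    """
    top, bottom = [], []
    for (i0, j0), (i1, j1) in zip(path, path[1:]):
        step = (i1 - i0, j1 - j0)
        if step == (1, 1):      # diagonal edge: align r1[i0] with r2[j0]
            top.append(r1[i0])
            bottom.append(r2[j0])
        elif step == (1, 0):    # edge consuming only r1: gap in r2
            top.append(r1[i0])
            bottom.append("_")
        elif step == (0, 1):    # edge consuming only r2: gap in r1
            top.append("_")
            bottom.append(r2[j0])
        else:
            raise ValueError(f"not a grid edge: {(i0, j0)} -> {(i1, j1)}")
    return "".join(top), "".join(bottom)


r1, r2 = "UCACCGAG", "UCGCGGUAUG"
# A path that starts at (2, 2) aligns substrings beginning at the third characters.
path = [(2, 2), (3, 3), (4, 4), (5, 4), (6, 5)]
print(path_to_alignment(r1, r2, path))  # ('ACCG', 'GC_G')
```

Because a path may start and end at any vertex, it describes an alignment of substrings rather than of the whole sequences, which is what makes the graph suitable for local alignment.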
11 The Alignment graph
[Figure: alignment graph over the sequences U C A C C G A G and U C G C G G U A U G]
12 The Alignment graph
[Figure: alignment graph over the sequences U C A C C G A G and U C G C G G U A U G]
13 The Alignment graph
[Figure: alignment graph over the sequences U C A C C G A G and U C G C G G U A U G]
Theorem: There is a one-to-one correspondence between the paths in the alignment graph and the alignments of substrings of R1 and R2 in which all arcs are deleted.
14 The Alignment graph
[Figure: alignment graph over the sequences U C A C C G A G and U C G C G G U A U G]
15 The Alignment graph
[Figure: alignment graph over the sequences U C A C C G A G and U C G C G G U A U G]
Theorem: There is a one-to-one correspondence between the HEAVIEST paths in the alignment graph and the OPTIMAL alignments of substrings of R1 and R2.
16 The Local Alignment algorithms
- We use the alignment graph to compute the local similarity between two RNA sequences according to two well-known metrics:
  - Smith-Waterman: the highest-scoring alignment between any pair of substrings of the input RNAs.
  - Its normalized version.
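17 Standard Local Similarity (Smith-Waterman)
[Figure: alignment graph over the sequences U C A C C G A G and U C G C G G U A U G, with labels n and m]
- The score is computed via dynamic programming, taking the maximum over all incoming edges and 0:
- Score(i,j) = max { Score(i',j') + weight of the incoming edge from (i',j'), 0 }
Time complexity: O(mn) plus one run of a global algorithm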
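The recurrence on this slide can be sketched as a heaviest-path computation over the vertices of the alignment graph, visited in row-major order. This is a minimal illustration under stated assumptions: the function name `local_score`, the numeric scores, and the `arc_edges` input format are all hypothetical. The slides indicate that one run of a global algorithm supplies the arc information; here the weights of edges that jump over matched arcs are simply passed in.

```python
def local_score(r1, r2, arc_edges=None, match=2, mismatch=-1, gap=-1):
    """Heaviest-path (Smith-Waterman style) score on an alignment graph.

    arc_edges maps a target vertex (i, j) to a list of ((i0, j0), weight)
    pairs for extra edges that jump over matched arcs; each source must
    come before its target (i0 <= i, j0 <= j, and not the same vertex).
    All numeric scores are illustrative assumptions.
    """
    arc_edges = arc_edges or {}
    n, m = len(r1), len(r2)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(n + 1):
        for j in range(m + 1):
            candidates = [0]  # a path may start at any vertex
            if i > 0 and j > 0:
                sub = match if r1[i - 1] == r2[j - 1] else mismatch
                candidates.append(score[i - 1][j - 1] + sub)
            if i > 0:
                candidates.append(score[i - 1][j] + gap)
            if j > 0:
                candidates.append(score[i][j - 1] + gap)
            for (i0, j0), w in arc_edges.get((i, j), []):
                candidates.append(score[i0][j0] + w)
            score[i][j] = max(candidates)
            best = max(best, score[i][j])
    return best


print(local_score("UCACCGAG", "UCGCGGUAUG"))
# Hypothetical arc edge of weight 5 from (0, 0) to (4, 4):
print(local_score("AAAA", "CCCC", {(4, 4): [((0, 0), 5)]}))
```

With only grid edges the loop touches each of the (n+1)(m+1) vertices once, so the sketch runs in O(mn) time plus the number of arc edges supplied.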
18 Normalized Local Similarity
- The weakness of the Smith-Waterman approach (AP 2001)
- Solution: look for the substrings (with their arcs) that maximize [formula not recovered], which depends on some given value.
19 Normalized Local Similarity
[Figure: alignment graph over the sequences U C A C C G A G and U C G C G G U A U G]
- Define Length(k,i,j) to be the length of the shortest path that ends at vertex (i,j) and has weight equal to k.
- The best k/Length(k,i,j) over all i,j,k is the normalized score.
20 Normalized Local Similarity
- Define Length(k,i,j) to be the length of the shortest path that ends at vertex (i,j) and has weight equal to k.
- For every k,i,j compute, taking the minimum over all incoming edges:
  Length(k,i,j) = min { Length(k-w,i',j') + (j-j' + i-i') }, where w is the weight of the incoming edge from (i',j').
Time complexity: [expression not recovered], one run of a global algorithm
[Figure: labels n and m]
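The Length(k,i,j) recurrence can be sketched with one small table per vertex that maps each reachable weight k to the shortest path length achieving it. This is an illustration only: it assumes integer edge weights, uses hypothetical names (`normalized_local_score`, `arc_edges`, `offset`), and treats the "given value" mentioned earlier as an `offset` added to the denominator, which is an assumption about how that value enters the formula. With `offset=0` the function returns the best k/Length(k,i,j) as stated on the slides.

```python
from fractions import Fraction


def normalized_local_score(r1, r2, arc_edges=None, offset=0,
                           match=2, mismatch=-1, gap=-1):
    """Best k / (Length(k, i, j) + offset) over all vertices and weights k.

    arc_edges has the same hypothetical format as a dict from a target
    vertex (i, j) to a list of ((i0, j0), weight) pairs. Weights must be
    integers. All numeric scores are illustrative assumptions.
    """
    arc_edges = arc_edges or {}
    n, m = len(r1), len(r2)
    # length[i][j][k] = shortest length of a path of weight k ending at (i, j);
    # the empty path has weight 0 and length 0.
    length = [[{0: 0} for _ in range(m + 1)] for _ in range(n + 1)]
    best = Fraction(0)
    for i in range(n + 1):
        for j in range(m + 1):
            incoming = list(arc_edges.get((i, j), []))
            if i > 0 and j > 0:
                sub = match if r1[i - 1] == r2[j - 1] else mismatch
                incoming.append(((i - 1, j - 1), sub))
            if i > 0:
                incoming.append(((i - 1, j), gap))
            if j > 0:
                incoming.append(((i, j - 1), gap))
            table = length[i][j]
            for (i0, j0), w in incoming:
                step = (i - i0) + (j - j0)  # characters spanned by this edge
                for k0, l0 in length[i0][j0].items():
                    k, l = k0 + w, l0 + step
                    if l < table.get(k, float("inf")):
                        table[k] = l
            for k, l in table.items():
                if l > 0:  # skip the empty path
                    best = max(best, Fraction(k, l + offset))
    return best


print(normalized_local_score("ACGU", "ACGU"))            # 1
print(normalized_local_score("ACGU", "ACGU", offset=4))  # 2/3
```

With `offset=0` a single matching pair already reaches the maximum ratio, whereas a positive offset favors the longer alignment of all four characters; this is one way to see why a normalized score needs such a value.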
21 Open Problems
- Arc deletion
- Improve global tree edit distance
[Figure: alignment graph over the sequences U C A C C G A G and U C G C G G U A U G]
22 Thank you very much for your attention