Sequence Alignment

Provided by: nathanjoh

1
Sequence Alignment

Lecture 15 October 20, 2005
Algorithms in Biosequence Analysis
Nathan Edwards - Fall, 2005

2
Sequence Alignment

Global alignment of S, length n, and T, length m
Each char of S (T) is aligned with a char of T (S) or a
space "-"
Computed in O(nm) time, using (n+1)(m+1) table cells of space
(min) edit-distance and (max) similarity
formulations
Dynamic Programming
Base conditions + recurrence relation
Dynamic programming table (bottom up)
Traceback from (n,m) to (0,0) to obtain the sequence
alignment

3
Recurrence Relation

Base conditions
$V(i,0) = \sum_{k \le i} s(S(k), -)$, for all $i = 0, \ldots, n$
$V(0,j) = \sum_{k \le j} s(-, T(k))$, for all $j = 0, \ldots, m$
Recurrence
$V(i,j) = \max\{\, V(i-1,j) + s(S(i), -),\ V(i,j-1) + s(-, T(j)),\ V(i-1,j-1) + s(S(i), T(j)) \,\}$
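
The base conditions, recurrence, and traceback on this slide can be sketched in Python as follows. This is a minimal illustration, not the lecture's own code: the function name `global_alignment` is hypothetical, and the scoring scheme (match 2, mismatch -2, space -1) is an assumption borrowed from the local alignment example later in the deck.

```python
def global_alignment(S, T, match=2, mismatch=-2, space=-1):
    """Return (score, aligned_S, aligned_T) for one optimal global alignment.

    Scores are assumed values: s(x,x)=match, s(x,y)=mismatch, s(x,-)=space.
    """
    n, m = len(S), len(T)
    # V[i][j] = similarity of the prefixes S[1..i] and T[1..j]
    V = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):          # base condition V(i,0)
        V[i][0] = V[i - 1][0] + space
    for j in range(1, m + 1):          # base condition V(0,j)
        V[0][j] = V[0][j - 1] + space

    def s(i, j):
        return match if S[i - 1] == T[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            V[i][j] = max(V[i - 1][j] + space,
                          V[i][j - 1] + space,
                          V[i - 1][j - 1] + s(i, j))

    # Traceback from (n,m) to (0,0), preferring the diagonal move on ties.
    a, b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and V[i][j] == V[i - 1][j - 1] + s(i, j):
            a.append(S[i - 1]); b.append(T[j - 1]); i -= 1; j -= 1
        elif i > 0 and V[i][j] == V[i - 1][j] + space:
            a.append(S[i - 1]); b.append("-"); i -= 1
        else:
            a.append("-"); b.append(T[j - 1]); j -= 1
    return V[n][m], "".join(reversed(a)), "".join(reversed(b))
```

Calling `global_alignment("axabcs", "axbacs")` returns the score together with two equal-length strings in which "-" marks a space.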

4
Dynamic Programming Table

5
Similar Protein Sequences (Human v Worm)

Middle row: residues identical in both sequences.

8   FAKDFLAGGVAAAISKTAVAPIERVKLLLQVQHASKQITADKQYKGIIDCVVRIPKEQGV 67
    F  D   GG AAA SKTAVAPIERVKLLLQVQ ASK I  DK YKGI D   R PKEQGV
12  FLIDLASGGTAAAVSKTAVAPIERVKLLLQVQDASKAIAVDKRYKGIMDVLIRVPKEQGV 71

68  LSFWRGNLANVIRYFPTQALNFAFKDKYKQIFLGGVDKRTQFWRYFAGNLASGGAAGATS 127
       WRGNLANVIRYFPTQA NFAFKD YK IFL G DK   FW  FAGNLASGGAAGATS
72  AALWRGNLANVIRYFPTQAMNFAFKDTYKAIFLEGLDKKKDFWKFFAGNLASGGAAGATS 131

128 LCFVYPLDFARTRLAADVGKAGAEREFRGLGDCLVKIYKSDGIKGLYQGFNVSVQGIIIY 187
    LCFVYPLDFARTRLAAD GKA   REF GL DCL KI KSDG  GLY GF VSVQGIIIY
132 LCFVYPLDFARTRLAADIGKAN-DREFKGLADCLIKIVKSDGPIGLYRGFFVSVQGIIIY 190

188 RAAYFGIYDTAKGML-PDPKNTHIVISWMIAQTVTAVAGLTSYPFDTVRRRMMMQSGRKG 246
    RAAYFG  DTAK     D         W IAQ VT   G  SYP DTVRRRMMMQSGRK
191 RAAYFGMFDTAKMVFASDGQKLNFFAAWGIAQVVTVGSGILSYPWDTVRRRMMMQSGRK- 249

247 TDIMYTGTLDCWRKIARDEGGKAFFKGAWSNVLRGMGGAFVLVLYDEIKKY 297
     DI Y  TLDC  KI   EG  A FKGA SNV RG GGA VL  YDEI K
250 -DILYKNTLDCAKKIIQNEGMSAMFKGALSNVFRGTGGALVLAIYDEIQKF 299

6
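Global Alignment Schematic
[Figure: DP table with axes S and T, corner cells (0,0) and (n,m)]

7
End-space free variant
[Figure: DP table with axes S and T, corner cells (0,0) and (n,m)]

8
End-space free variant
[Figure: DP table with axes S and T, corner cells (0,0) and (n,m)]

9
End-space free variant
[Figure: DP table with axes S and T, corner cells (0,0) and (n,m)]

10
End-space free variant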

Don't charge for an optimal alignment starting in
cells (i,0) or (0,j)
Base conditions: $V(i,0) = V(0,j) = 0$
Don't charge for adding spaces at the end of the
alignment
Find the cell (n,j) or (i,m) with maximum similarity
value and begin the traceback from there
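
As a hedged sketch of the end-space free variant: the only changes from the global recurrence are the zero base conditions and the search over the last row and last column for the best end cell. The function name `end_space_free` and the scores (match 2, mismatch -2, space -1) are assumptions made for illustration.

```python
def end_space_free(S, T, match=2, mismatch=-2, space=-1):
    """Return (score, i, j): the best end-space free similarity and its end cell.

    The end cell lies in the last row (n, j) or the last column (i, m).
    """
    n, m = len(S), len(T)
    # Row 0 and column 0 stay at 0: leading spaces are not charged.
    V = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if S[i - 1] == T[j - 1] else mismatch
            V[i][j] = max(V[i - 1][j] + space,
                          V[i][j - 1] + space,
                          V[i - 1][j - 1] + s)
    # Trailing spaces are not charged either: take the best cell on the
    # bottom or right border as the end of the alignment.
    candidates = [(V[n][j], n, j) for j in range(m + 1)]
    candidates += [(V[i][m], i, m) for i in range(n + 1)]
    return max(candidates)
```

For example, `end_space_free("xxxabc", "abcyyy")` finds the overlap between the suffix `abc` of the first string and the prefix `abc` of the second; a traceback would start from the returned cell.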

11
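Approximate Search
[Figure: DP table with axes P and T, corner cells (0,0) and (n,m); a substring T' of T is marked, with similarity of P and T' $\ge \delta$]

12
Approximate Search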

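Don't charge for an optimal alignment starting in
cells (0,j)
Base conditions: $V(0,j) = 0$, $V(i,0) = \sum_{k \le i} s(S(k), -)$, with the pattern P in the role of S
Don't charge for ending the alignment at the end of P
(but not necessarily at the end of T)
Find a cell (n,j) with similarity value $\ge \delta$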
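
A minimal sketch of this search, assuming the same illustrative scores as elsewhere (match 2, mismatch -2, space -1); the name `approximate_occurrences` and the choice to return end positions in T are assumptions. Only the last row of the table is needed to report where occurrences end, so the sketch keeps two rows at a time.

```python
def approximate_occurrences(P, T, delta, match=2, mismatch=-2, space=-1):
    """Return every j such that some substring of T ending at position j
    (1-based) aligns with all of P with similarity >= delta."""
    n, m = len(P), len(T)
    prev = [0] * (m + 1)                  # V(0,j) = 0: free start anywhere in T
    for i in range(1, n + 1):
        cur = [i * space] + [0] * m       # V(i,0): P[1..i] against spaces
        for j in range(1, m + 1):
            s = match if P[i - 1] == T[j - 1] else mismatch
            cur[j] = max(prev[j] + space, cur[j - 1] + space, prev[j - 1] + s)
        prev = cur
    # prev now holds row n; every cell (n,j) at or above the threshold counts.
    return [j for j in range(m + 1) if prev[j] >= delta]
```

With `P = "abc"` and `delta = 6`, only an exact occurrence of `abc` reaches the threshold under these scores; lowering `delta` admits occurrences with mismatches or spaces.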

13
Local alignment

In many biological contexts, two strings may only
have regions of similarity.
S = pqraxabcstvq, T = xyaxbacsll
The global alignment is poor, but for α = axabcs and β =
axbacs there is strong similarity.

14
Local alignment problem

Given two sequences S, length n, and T, length m,
find substrings α from S and β from T whose
similarity is maximum over all pairs of
substrings from S and T
For S = pqraxabcstvq and T = xyaxbacsll, the substrings
α = axabcs and β = axbacs, aligned as
a x a b - c s
a x - b a c s
have similarity 8 for match score 2, mismatch -2, and space -1.

15
Local alignment

Surprisingly, the optimal local alignment can be
computed in O(nm) time and O(nm) space.
Base condition: $v(i,0) = v(0,j) = 0$ for all i, j
Recurrence: $v(i,j) = \max\{\, 0,\ v(i-1,j) + s(S(i), -),\ v(i,j-1) + s(-, T(j)),\ v(i-1,j-1) + s(S(i), T(j)) \,\}$
Check each cell to find the maximum v(i,j) over all i, j.
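
The local alignment recurrence can be sketched as below, using the scores from the example on the earlier slide (match 2, mismatch -2, space -1). The function name `local_alignment` and the tie-breaking order in the traceback are assumptions of this sketch.

```python
def local_alignment(S, T, match=2, mismatch=-2, space=-1):
    """Return (score, aligned_alpha, aligned_beta) for one optimal local alignment."""
    n, m = len(S), len(T)
    v = [[0] * (m + 1) for _ in range(n + 1)]   # v(i,0) = v(0,j) = 0
    best, bi, bj = 0, 0, 0

    def s(i, j):
        return match if S[i - 1] == T[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            v[i][j] = max(0,
                          v[i - 1][j] + space,
                          v[i][j - 1] + space,
                          v[i - 1][j - 1] + s(i, j))
            if v[i][j] > best:                  # remember the maximum cell
                best, bi, bj = v[i][j], i, j

    # Traceback from the maximum cell until a cell with value 0 is reached.
    a, b = [], []
    i, j = bi, bj
    while i > 0 and j > 0 and v[i][j] > 0:
        if v[i][j] == v[i - 1][j - 1] + s(i, j):
            a.append(S[i - 1]); b.append(T[j - 1]); i -= 1; j -= 1
        elif v[i][j] == v[i - 1][j] + space:
            a.append(S[i - 1]); b.append("-"); i -= 1
        else:
            a.append("-"); b.append(T[j - 1]); j -= 1
    return best, "".join(reversed(a)), "".join(reversed(b))
```

On the deck's example, `local_alignment("pqraxabcstvq", "xyaxbacsll")` recovers the substrings axabcs and axbacs. More than one alignment of those substrings attains the optimum, so the placement of the spaces depends on the tie-breaking order.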

16
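Local Alignment Schematic
[Figure: DP table with axes S and T, corner cells (0,0) and (n,m)]

17
Local Alignment Schematic
[Figure: DP table with axes S and T, corner cells (0,0) and (n,m)]

18
Local Alignment Schematic
[Figure: DP table with axes S and T, corner cells (0,0) and (n,m)]

19
Local alignment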

Don't charge for an optimal alignment starting in
any cell (i,j)
Base conditions: $V(i,0) = V(0,j) = 0$
Can restart the alignment in any cell.
Don't charge for ending the alignment in any cell
Find the cell (i,j) with maximum similarity value
Traceback from the end of the alignment.

20
Terminology

Global alignment is often called Needleman-Wunsch
alignment
Local alignment is often called Smith-Waterman
alignment

21
Gap alignment models

Gap: a consecutive run of spaces in a sequence
alignment
Need to model block insertions and deletions
better than the linear gap model does.
The linear model gives no encouragement for long gaps to form
Arbitrary gap model
cost of a gap of length g is $w(g)$
Affine gap model (open + extension cost)
cost of a gap of length g is $o + e \cdot g$
22
Gap alignment models

Have to keep track of whether we are opening or
extending a gap
Current DP formulation doesn't cut it!
Consider any alignment of S[1..i] and T[1..j].
Either
1.) S(i) and T(j) are aligned with each other,
2.) S(i) is aligned to T(j'), with j' < j, or
3.) T(j) is aligned to S(i'), with i' < i.

23
Gap alignment models

Let G(i,j) be the maximum value of any alignment with
S(i) aligned with T(j) (type 1)
Let E(i,j) be the maximum value of any alignment with
T(j) aligned with a gap (type 2)
Let F(i,j) be the maximum value of any alignment with
S(i) aligned with a gap (type 3)
Let $V(i,j) = \max\{E(i,j), F(i,j), G(i,j)\}$

24
Arbitrary gap cost recurrence

Alignment type 1
$G(i,j) = V(i-1,j-1) + s(S(i), T(j))$
Alignment type 2
$E(i,j) = \max_{0 \le k \le j-1} [\, V(i,k) - w(j-k) \,]$
Alignment type 3
$F(i,j) = \max_{0 \le k \le i-1} [\, V(k,j) - w(i-k) \,]$
$V(i,j) = \max\{E(i,j), F(i,j), G(i,j)\}$

25
Arbitrary gap cost recurrence

Base conditions
$V(i,0) = -w(i)$, $E(i,0) = -w(i)$
$V(0,j) = -w(j)$, $F(0,j) = -w(j)$
$V(0,0) = G(0,0) = 0$
Optimal value of the alignment is found in cell (n,m)
Traceback may jump multiple cells horizontally or
vertically
Running time is O(nm(n+m)), space is O(nm) as
before.
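
To make the O(nm(n+m)) running time concrete, here is a hedged sketch of the arbitrary gap cost recurrence that computes only the optimal value. The gap cost is passed in as a function `w` (a hypothetical parameter of this sketch), and the match and mismatch scores are assumed values. Each cell scans its whole row and column, which is where the extra (n+m) factor comes from.

```python
def arbitrary_gap_score(S, T, w, match=2, mismatch=-2):
    """Optimal global similarity when a gap of length g costs w(g)."""
    n, m = len(S), len(T)
    V = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        V[i][0] = -w(i)                   # S[1..i] against one gap
    for j in range(1, m + 1):
        V[0][j] = -w(j)                   # T[1..j] against one gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if S[i - 1] == T[j - 1] else mismatch
            G = V[i - 1][j - 1] + s                            # type 1
            E = max(V[i][k] - w(j - k) for k in range(j))      # type 2
            F = max(V[k][j] - w(i - k) for k in range(i))      # type 3
            V[i][j] = max(E, F, G)
    return V[n][m]
```

With `w = lambda g: g` the model reduces to the linear one with space score -1; a cost such as `lambda g: 2 + g` makes one long gap cheaper than several short ones.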

26
Affine gap model recurrence

Base conditions
$V(i,0) = E(i,0) = -o - e \cdot i$
$V(0,j) = F(0,j) = -o - e \cdot j$
$V(0,0) = G(0,0) = 0$
Recurrences
$V(i,j) = \max\{E(i,j), F(i,j), G(i,j)\}$
$G(i,j) = V(i-1,j-1) + s(S(i), T(j))$
$E(i,j) = \max\{\, E(i,j-1) - e,\ V(i,j-1) - o - e \,\}$
$F(i,j) = \max\{\, F(i-1,j) - e,\ V(i-1,j) - o - e \,\}$
Running time O(nm), space O(nm)
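
The affine recurrences can be sketched as follows, computing the optimal value only. The function name and the default scores (o = 3, e = 1, match 2, mismatch -2) are assumptions. The sketch also departs from the base conditions on this slide in one respect, which is a choice of this sketch and not the lecture's: E(i,0) and F(0,j) are set to minus infinity, so that a gap in one string followed directly by a gap in the other pays the opening cost twice.

```python
def affine_gap_score(S, T, o=3, e=1, match=2, mismatch=-2):
    """Optimal global similarity when a gap of length g costs o + e*g."""
    NEG = float("-inf")
    n, m = len(S), len(T)
    V = [[NEG] * (m + 1) for _ in range(n + 1)]
    E = [[NEG] * (m + 1) for _ in range(n + 1)]   # T(j) aligned with a gap
    F = [[NEG] * (m + 1) for _ in range(n + 1)]   # S(i) aligned with a gap
    V[0][0] = 0
    for i in range(1, n + 1):
        V[i][0] = -(o + e * i)
    for j in range(1, m + 1):
        V[0][j] = -(o + e * j)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if S[i - 1] == T[j - 1] else mismatch
            G = V[i - 1][j - 1] + s
            # Either extend the gap that ends here or open a new one.
            E[i][j] = max(E[i][j - 1] - e, V[i][j - 1] - o - e)
            F[i][j] = max(F[i - 1][j] - e, V[i - 1][j] - o - e)
            V[i][j] = max(E[i][j], F[i][j], G)
    return V[n][m]
```

Setting `o=0, e=1` reproduces the linear model with space score -1, which gives a quick sanity check against the simpler recurrence.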

27
Linear space global alignment algorithm

Notice that if we only want the value of the
optimal alignment, then O(m) space is sufficient
Only the previous row of the table is used when computing
the current row
So V(n,m) can be computed in O(m) space and O(nm) time.
How can we recover the optimal alignment without
giving up O(m) space?
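
A minimal sketch of the value-only computation, keeping just the previous and current rows. The function name and the scores (match 2, mismatch -2, space -1) are assumptions.

```python
def global_score_linear_space(S, T, match=2, mismatch=-2, space=-1):
    """Return V(n,m) using O(m) space: only two rows are alive at any time."""
    m = len(T)
    prev = [j * space for j in range(m + 1)]      # row 0: V(0,j)
    for i, ch in enumerate(S, start=1):
        cur = [i * space] + [0] * m               # V(i,0)
        for j in range(1, m + 1):
            s = match if ch == T[j - 1] else mismatch
            cur[j] = max(prev[j] + space, cur[j - 1] + space, prev[j - 1] + s)
        prev = cur                                # discard the older row
    return prev[m]
```

Because earlier rows are discarded, no traceback is possible from this table alone, which is exactly the question the slide raises.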

28
Optimal global alignment in linear space

Define $V^R(i,j)$ to be the similarity of $S^R[1..i]$
and $T^R[1..j]$, where $S^R$ and $T^R$ are the reversed strings
Run DP from the bottom-right corner up and to the left
$V(n,m) = \max_{0 \le k \le m} [\, V(n/2, k) + V^R(n/2, m-k) \,]$
The optimal alignment can be broken into the
piece for S[1..n/2] and the piece for
S[n/2+1..n]
T[1..k] aligns with the first half of S,
while T[k+1..m] aligns with the second half of
S.
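
The split identity can be checked with a short sketch: compute the last row of the forward table for the first half of S, the last row of the reverse table for the second half, and pick the k that maximizes their sum. The helper names `last_row` and `best_split`, the scores, and the use of integer division for n/2 are assumptions of this sketch; it shows only the split step, not a full recursive algorithm.

```python
def last_row(S, T, match=2, mismatch=-2, space=-1):
    """Return [V(len(S), j) for j = 0..len(T)] using two rows of space."""
    m = len(T)
    prev = [j * space for j in range(m + 1)]
    for i, ch in enumerate(S, start=1):
        cur = [i * space] + [0] * m
        for j in range(1, m + 1):
            s = match if ch == T[j - 1] else mismatch
            cur[j] = max(prev[j] + space, cur[j - 1] + space, prev[j - 1] + s)
        prev = cur
    return prev


def best_split(S, T):
    """Return (k, value): T[1..k] goes with the first half of S, and
    value = V(n/2, k) + V^R(n/2, m-k), which equals V(n,m)."""
    h, m = len(S) // 2, len(T)
    fwd = last_row(S[:h], T)                     # V(n/2, k) for every k
    rev = last_row(S[h:][::-1], T[::-1])         # reverse table, second half
    k = max(range(m + 1), key=lambda k: fwd[k] + rev[m - k])
    return k, fwd[k] + rev[m - k]
```

For `best_split("axabcs", "axbacs")` the returned value matches the full global score, `last_row("axabcs", "axbacs")[-1]`, while only O(m) space is used at any time.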

29
Optimal global alignment in linear space