Pair-wise Sequence Alignment (II) - PowerPoint PPT Presentation

About This Presentation
Title:

Pair-wise Sequence Alignment (II)

Description:

Title: PowerPoint Presentation Author: heringa Last modified by: Jaap Heringa Created Date: 2/20/2003 6:00:44 PM Document presentation format: On-screen Show – PowerPoint PPT presentation

Number of Views:114
Avg rating:3.0/5.0
Slides: 53
Provided by: heri4
Category:

less

Transcript and Presenter's Notes

Title: Pair-wise Sequence Alignment (II)


1
Introduction to bioinformatics 2008 Lecture 6
Pair-wise Sequence Alignment (II)
2
What can sequence alignment tell us about
structure HSSP Sander Schneider, 1991
30 sequence identity
3
Sequence alignmentHistory of Dynamic Programming
algorithm
1970 Needleman-Wunsch global pair-wise
alignment Needleman SB, Wunsch CD (1970) A
general method applicable to the search for
similarities in the amino acid sequence of two
proteins, J Mol Biol. 48(3)443-53. 1981
Smith-Waterman local pair-wise alignment Smith,
TF, Waterman, MS (1981) Identification of common
molecular subsequences. J. Mol. Biol. 147,
195-197.
4
Global dynamic programming
j-1 j
i-1 i
Value from residue exchange matrix
H(i-1,j-1) S(i,j) H(i-1,j) - g H(i,j-1) - g
diagonal vertical horizontal
H(i,j) Max
This is a recursive formula
5
Dynamic programming
j
i
The cell i, j contains the alignment score of
the best scoring alignment score of subsequence
1..i and 1..j, that is, the subsequences up to
i, j Cell i, j does not know what that best
scoring alignment is (it is one out of many
possibilities)
6
Global dynamic programmingPAM250, Gap 6 (linear)
S P E A R E
0 -6 -12 -18 -24 -30 -36
S -6 2 -4 -10 -16 -22 -28
H -12 -4 2 -3 -9 -14 -20
A -18 -10 -3 0 -1 -7 -13
K -24 -16 -9 -3 -1 2 -4
E -30 -22 -15 -5 -3 -2 6
S P E A R E
S 2 1 0 1 0 0
H -1 0 1 -1 2 1
A 1 1 0 2 -2 0
K 0 -1 0 -1 3 0
E 0 -1 4 0 -1 4
These values are copied from the PAM250 matrix
(see earlier slide) and represent the S(i, j)
values in the DP formula (back two slides)
SPEARE S-HAKE
The easy algorithm is only for linear gap
penalties
Higgs Attwood, p. 124 Note There are errors
in the matrices!!
7
Global dynamic programmingPAM250, Gap 6 (linear)
S P E A R E
0 -6 -12 -18 -24 -30 -36
S -6 2 -4 -10 -16 -22 -28
H -12 -4 2 -3 -9 -14 -20
A -18 -10 -3 2 -1 -7 -13
K -24 -16 -9 -3 1 2 -4
E -30 -22 -15 -5 -3 -2 6
S P E A R E
S 2 1 0 1 0 0
H -1 0 1 -1 2 1
A 1 1 0 2 -2 0
K 0 -1 0 -1 3 0
E 0 -1 4 0 -1 4
These values are copied from the PAM250 matrix
(see earlier slide)
Start in left upper cell before either sequence
(circled in red). Path will end in lower right
cell (circled in blue)
SPEARE S-HAKE
The easy algorithm is only for linear gap
penalties
Higgs Attwood, p. 124 Note There are errors
in the matrices!!
8
DP is a two-step process
  • Forward step calculate scores
  • Trace back start at highest score and
    reconstruct the path leading to the highest score
  • These two steps lead to the highest scoring
    alignment (the optimal alignment)
  • This is guaranteed when you use DP!

9
Global dynamic programming
j-1 j
i-1 i
H(i-1,j-1) S(i,j) H(i-1,j) - g H(i,j-1) - g
diagonal vertical horizontal
H(i,j) Max
  • Problem with simple DP approach
  • Can only do linear gap penalties
  • Not suitable for affine and concave penalties,
    but algorithm can be extended to deal with affine
    penalties (preceding lecture)

10
Global dynamic programmingusing affine penalties
j-2 j-1 j
Looking back from cell (i, j) we can adapt the
algorithm for affine gap penalties by looking
back to four more cells (magenta)
i-2 i-1 i
If you came from here, gap was already open, so
apply gap-extension penalty
If you came from here, gap was opened so apply
gap-open penalty
11
Time and memory complexity of DP
  • The time complexity is O(n2) if you would align
    two sequences of n residues, you would need to
    perform n2 algorithmic steps (square search
    matrix has n2 cells that need to be filled)
  • The memory (space) complexity is also O(n2) if
    you would align two sequences of n residues, you
    would need a square search matrix of n by n
    containing n2 cells

12
Global dynamic programmingall types of gap
penalties
j-1
The complexity of this DP algorithm is increased
from O(n2) to O(n3)
The gap length is known exactly and so any gap
penalty regime can be used
i-1
MaxS0ltxlti-1, j-1 - Gap(i-x-1) Si-1,j-1 MaxSi-1,
0ltyltj-1 - Gap(j-y-1)
Si,j si,j Max
13
Global dynamic programmingif affine penalties
are used
j-1
i-1
MaxS0ltxlti-1, j-1 -Go -(i-x-1)Ge Si-1,j-1 MaxSi
-1, 0ltyltj-1 -Go -(j-y-1)Ge
Si,j si,j Max
14
DP recipe for using affine gap penalties
j-1
i-1
  • Mi,j is optimal alignment (highest scoring
    alignment until i,j)
  • Check
  • preceding row until j-2 apply appropriate score
    and gap penalties
  • preceding row until i-2 apply appropriate score
    and gap penalties
  • and celli-1, j-1 apply score for celli-1,
    j-1

15
Global dynamic programmingAffine penalties
Gapo10, Gape2
D W V T A L K
0 -12 -14 -16 -18 -20 -22 -24
T -12 8 -9 -6 -5 -9 -11 -14
D -14 0 9 2 2 3 -5 -3 -34
W -16 -13 25 11 5 4 9 0 -21
V -18 -10 -4 37 21 19 19 15 -16
L -20 -14 -2 23 46 31 37 26 1
K -22 -12 -9 17 33 53 39 50 14
-34 -29 -1 17 39 27 50
D W V T A L K
T 8 3 8 11 9 9 8
D 12 1 6 8 8 4 8
W 1 25 2 3 2 6 5
V 6 2 12 8 8 10 6
L 4 6 10 9 6 14 5
K 8 5 6 8 7 5 13
These values are copied from the PAM250 matrix
(see earlier slide), after being made
non-negative by adding 8 to each PAM250 matrix
cell (-8 is the lowest number in the PAM250
matrix)
The extra bottom row and rightmost column give
the final global alignment scores
16
DP is a two-step process
  • Forward step calculate scores
  • Trace back start at highest score and
    reconstruct the path leading to the highest score
  • These two steps lead to the highest scoring
    alignment (the optimal alignment)
  • This is guaranteed when you use DP!

17
Global dynamic programming
18
Semi-global pairwise alignment
  • Global alignment all gaps are penalised
  • Semi-global alignment N- and C-terminal gaps
    (end-gaps) are not penalised
  • MSTGAVLIY--TS-----
  • ---GGILLFHRTSGTSNS

End-gaps
End-gaps
19
Semi-global dynamic programmingPAM250, Gap 6
(linear)
S P E A R E
0 0 0 0 0 0 0
S 0 2 1 0 1 0 0
H 0 -1 2 2 -1 3 1
A 0 1 0 2 4 -2 3
K 0 0 0 0 1 7 1
E 0 0 -1 4 0 1 11
S P E A R E
S 2 1 0 1 0 0
H -1 0 1 -1 2 1
A 1 1 0 2 -2 0
K 0 -1 0 -1 3 0
E 0 -1 4 0 -1 4
These values are copied from the PAM250 matrix
(see earlier slide)
Start in left upper cell before either sequence
(circled in red). Path will end in cell anywhere
in the bottom row or rightmost columns with the
highest score
SPEARE -SHAKE
The easy algorithm is only for linear gap
penalties
Higgs Attwood, p. 124 Note There are errors
in the matrices!!
20
Semi-global dynamic programming- two examples
with different gap penalties -
These values are copied from the PAM250 matrix
(see earlier slide), after being made
non-negative by adding 8 to each PAM250 matrix
cell (-8 is the lowest number in the PAM250
matrix)
21
Semi-global pairwise alignment
  • Applications of semi-global
  • Finding a gene in genome
  • Placing marker onto a chromosome
  • Generally if one sequence is much longer than
    the other
  • Danger if gap penalties high -- really bad
    alignments for divergent sequences

22
There are three kinds of alignments
  • Global alignment
  • Semi-global alignment
  • Local alignment

23
Local dynamic programming (Smith Waterman,
1981)
LCFVMLAGSTVIVGTR
E D A S T I L C G S
Negative numbers
Amino Acid Exchange Matrix
Search matrix
Gap penalties (open, extension)
AGSTVIVG A-STILCG
24
Local dynamic programming (Smith and Waterman,
1981)basic algorithm
j-1 j
i-1 i
H(i-1,j-1) S(i,j) H(i-1,j) - g H(i,j-1) - g 0
diagonal vertical horizontal
H(i,j) Max
25
Example local alignment of two sequences
  • Align two DNA sequences
  • GAGTGA
  • GAGGCGA (note the length difference)?
  • Parameters of the algorithm
  • Match score(A,A) 1
  • Mismatch score(A,T) -1
  • Gap g -2

Mi-1, j-1 1
Mi, j-1 2
max
Mi, j
Mi-1, j 2
0
26
The algorithm. Step 1 init
Mi-1, j-1 1
  • Create the matrix
  • Initiation
  • No beginning row/column
  • Just apply the equation

Mi, j-1 2
Mi, j
max
Mi-1, j 2
0
27
The algorithm. Step 2 fill in
Mi-1, j-1 1
Mi, j-1 2
Mi, j
max
  • Perform the forward step

Mi-1, j 2
0
28
The algorithm. Step 2 fill in
Mi-1, j-1 1
Mi, j-1 2
Mi, j
max
  • Perform the forward step

Mi-1, j 2
0
29
The algorithm. Step 2 fill in
Mi-1, j-1 1
Mi, j-1 2
Mi, j
max
  • Were done
  • Find the highest cell anywhere in the matrix

Mi-1, j 2
0
30
The algorithm. Step 3 trace back
Mi-1, j-1 1
  • Reconstruct path leading to highest scoring cell
  • Trace back until zero alignment path can begin
    and terminate anywhere in matrix
  • Alignment GAG
  • GAG

Mi, j-1 2
Mi, j
max
Mi-1, j 2
0
31
Local dynamic programmingMatch/mismatch 1/-1,
Gap 2
A T G A C G T
0 0 0 0 0 0 0 0
T 0 0 1 0 0 0 0 1
A 0 1 0 0 1 0 0 0
G 0 0 0 1 0 0 1 0
A 0 1 0 0 2 0 0 0
C 0 0 0 0 0 3 1 0
T 0 0 1 0 0 0 2 2
Fill the matrix (forward pass), then do trace
back from highest cell anywhere in the matrix
till you reach 0 or the beginning of a sequence
32
Local dynamic programmingMatch/mismatch 1/-1,
Gap 2
A T G A C G T
0 0 0 0 0 0 0 0
T 0 0 1 0 0 0 0 1
A 0 1 0 0 1 0 0 0
G 0 0 0 1 0 0 1 0
A 0 1 0 0 2 0 0 0
C 0 0 0 0 0 3 1 0
T 0 0 1 0 0 0 2 2
GAC GAC
Fill the matrix (forward pass), then do trace
back from highest cell anywhere in the matrix
till you reach 0 or the beginning of a sequence
33
Local dynamic programming (Smith Waterman,
1981)
j-1
This is the general DP algorithm, which is
suitable for linear, affine and concave
penalties, although for the example here affine
penalties are used
i-1
Gap opening penalty
Si,j MaxS0ltxlti-1,j-1 - Pi - (i-x-1)Px Si,j
Si-1,j-1 Si,j Max Si-1,0ltyltj-1 - Pi -
(j-y-1)Px 0
Si,j Max
Gap extension penalty
34
Local dynamic programming
35
Global or Local Pairwise alignment
B
B
C
A
A
B
A
A
C
A
B
C
A
Local
B
Local
A
B
C
A
B
C
B
A
Global
Global
A
B
C
A
36
Globin fold ? protein myoglobin PDB 1MBN
Alpha-helices are labelled A (blue) to H
(red). The D helix can be missing in some
globins What happens with the alignment if
D-helix containing globin sequences are aligned
with D-less ones?
37
? sandwich ? protein immunoglobulin PDB 7FAB
Immunoglobulinstructures have variable regions
where numbers of amino acids can vary
substantially
38
TIM barrel ? / ? protein Triose phosphate
IsoMerase PDB 1TIM
The evolutionary history of this protein family
has been the subject of rigorous debate.
Arguments have been made in favor of both
convergent and divergent evolution. Because of
the general lack of sequence homology, the
ancestry of this molecule is still a mystery.
39
What does all this mean for alignments?
  • Alignments need to be able to skip secondary
    structural elements to complete domains (i.e.
    putting gaps opposite these motifs in the shorter
    sequence).
  • Depending on gap penalties chosen, the algorithm
    might have difficulty with making such long gaps
    (for example when using high affine gap
    penalties), resulting in incorrect alignment.
  • Alignments are only meaningful for homologous
    sequences (with a common ancestor)

40
There are three kinds of pairwise alignments
  • Global alignment align all residues in both
    sequences all gaps are penalised
  • Semi-global alignment align all residues in
    both sequences end gaps are not penalised (zero
    end gap penalties)
  • Local alignment align one part of each
    sequence end gaps are not applicable

41
Easy global DP recipe for using affine gap
penalties (after Gotoh)
j-1
Penalty Pi gap_lengthPe
MaxS0ltxlti-1, j-1 - Pi - (i-x-1)Px Si-1,j-1 MaxS
i-1, 0ltyltj-1 - Pi - (j-y-1)Px
Si,j si,j Max
i-1
  • Mi,j is optimal alignment (highest scoring
    alignment until i, j)
  • At each cell i, j in search matrix, check Max
    coming from
  • any cell in preceding row until j-2 add score
    for celli, j minus appropriate gap penalties
  • any cell in preceding column until i-2 add score
    for celli, j minus appropriate gap penalties
  • or celli-1, j-1 add score for celli, j
  • Select highest scoring cell in bottom row and
    rightmost column and do trace-back

42
Lets do an example global alignmentGotohs DP
algorithm with affine gap penalties (PAM250,
Pi10, Pe2)
D W V T A L K
0 -12 -14 -16 -18 -20 -22 -24
T -12 0 -17 -14 -13 -17 -19 -22 -22
D -14 -8 -7 -14 -14 -13 -42
W -16 -21 9 -13 -19 -18
V -18 -18 -20 13 -3 -16
L -20 -22 -18 -1 14 -1 -14
K -22 -20 -21 -12
-24 -42 -41 -18 -16 -14 -12 0
D W V T A L K
T 0 -5 0 3 1 1 0
D 4 -7 -2 0 0 -4 0
W -7 17 -6 -5 -6 -2 -3
V -2 -6 4 0 0 2 -2
L -4 -2 2 1 -2 6 -3
K 0 -3 -2 0 -1 -3 5
PAM250
Cell (D2, T4) can alternatively come from two
cells (same score) high-road or low-road
Row and column 0 are filled with 0, -12, -14,
-16, if global alignment is used (for
N-terminal end-gaps) also extra row and column
at the end to calculate the score including
C-terminal end-gap penalties. Note that only
non-diagonal arrows are indicated for clarity
(no arrow means that you go back to earlier
diagonal cell).
43
Lets do another example semi-global
alignmentGotohs DP algorithm with affine gap
penalties (PAM250, Pi10, Pe2)
D W V T A L K
T 0 -5 0 3 1 1 0
D 4 -7 -2 0 0 -4 0
W -7 17 -6 -5 -6 -2 -3
V -2 -6 4 0 0 2 -2
L -4 -2 2 1 -2 6 -3
K 0 -3 -2 0 -1 -3 5
D W V T A L K
T 0 -5 0 3
D 4 -7 -7
W -7 21 -13
V -2 -13 25 9
L
K
PAM250
Starting row and column 0, and extra column at
right or extra row at bottom is not necessary
when using semi global alignment (zero end-gaps).
Rest works as under global alignment.
44
Easy local DP recipe for using affine gap
penalties (after Gotoh)
j-1
Penalty Pi gap_lengthPe
Si,j MaxS0ltxlti-1,j-1 - Pi - (i-x-1)Px Si,j
Si-1,j-1 Si,j Max Si-1,0ltyltj-1 - Pi -
(j-y-1)Px 0
Si,j Max
i-1
  • Mi,j is optimal alignment (highest scoring
    alignment until i, j)
  • At each cell i, j in search matrix, check Max
    coming from
  • any cell in preceding row until j-2 add score
    for celli, j minus appropriate gap penalties
  • any cell in preceding column until i-2 add score
    for celli, j minus appropriate gap penalties
  • celli-1, j-1 add score for celli, j
  • or 0
  • Select highest scoring cell anywhere in matrix
    and do trace-back until zero-valued cell or start
    of sequence(s)

45
Lets do yet another example local
alignmentGotohs DP algorithm with affine gap
penalties (PAM250, Pi10, Pe2)
D W V T A L K
T 0 -5 0 3 1 1 0
D 4 -7 -2 0 0 -4 0
W -7 17 -6 -5 -6 -2 -3
V -2 -6 4 0 0 2 -2
L -4 -2 2 1 -2 6 -3
K 0 -3 -2 0 -1 -3 5
D W V T A L K
T 0 0 0 3
D 4 0 0 0
W 0 21 0 0
V 0 0 25 9
L 0 0 11
K 0 0
PAM250
Extra start/end columns/rows not necessary (no
end-gaps). Each negative scoring cell is set to
zero. Highest scoring cell may be found anywhere
in search matrix after calculating it. Trace
highest scoring cell back to first cell with zero
value (or the beginning of one or both sequences)
46
Dot plots
  • Way of representing (visualising) sequence
    similarity without doing dynamic programming (DP)
  • Make search matrix as for DP, but locally
    represent sequence similarity by averaging using
    a sliding window

47
Dot-plots
Dot plots are calculated using a diagonal window
of preset length that is slid through the search
matrix --typically the central cell holds the
window score (e.g. sum, average)
48
Dot-plotsa simple way to visualise sequence
similarity
Filter 6/10 residues have to match...
Can be a bit messy, though...
49
Dot-plots, what about...
  • Insertions/deletions -- DNA and proteins
  • Duplications (e.g. tandem repeats) DNA and
    proteins
  • Inversions -- DNA

Dot plots are calculated using a diagonal window
of preset length that is slid through the search
matrix --typically the central cell holds the
window score (e.g. sum, average)
50
Dot-plots, self-comparison
51
  • a heuristic
  • Heuristics A rule of thumb that often helps in
    solving a certain class of problems, but makes no
    guarantees.Perkins, DN (1981) The Mind's Best
    Work

52
For your first exam D1Make sure you
understand and can carry out 1. the simple DP
algorithm for global, semi-global and local
alignment (using linear gap penalties but make
sure you know the extension of the basic
algorithm for affine gap penalties) and 2. The
general DP algorithm for global, semi-global and
local alignment (using linear, affine and concave
gap penalties)!
Write a Comment
User Comments (0)
About PowerShow.com