Pair-wise Sequence Alignment (III) - PowerPoint PPT Presentation

About This Presentation
Title:

Pair-wise Sequence Alignment (III)

Description:

Title: PowerPoint Presentation Author: heringa Last modified by: heringa Created Date: 2/20/2003 6:00:44 PM Document presentation format: On-screen Show – PowerPoint PPT presentation

Number of Views:81
Avg rating:3.0/5.0
Slides: 41
Provided by: heringa
Category:

less

Transcript and Presenter's Notes

Title: Pair-wise Sequence Alignment (III)


1
Introduction to bioinformatics 2007 Lecture 7
Pair-wise Sequence Alignment (III)
2
What can sequence alignment tell us about
structure HSSP Sander Schneider, 1991
30 sequence identity
3
Global dynamic programmingPAM250, Gap 6 (linear)
S P E A R E
0 -6 -12 -18 -24 -30 -36
S -6 2 -4 -10 -16 -22 -28
H -12 -4 2 -3 -9 -14 -20
A -18 -10 -3 2 -1 -7 -13
K -24 -16 -9 -3 1 2 -4
E -30 -22 -15 -5 -3 -2 6
S P E A R E
S 2 1 0 1 0 0
H -1 0 1 -1 2 1
A 1 1 0 2 -2 0
K 0 -1 0 -1 3 0
E 0 -1 4 0 -1 4
These values are copied from the PAM250 matrix
(see earlier slide)
Start in left upper cell before either sequence
(circled in red). Path will end in lower right
cell (circled in blue)
SPEARE S-HAKE
The easy algorithm is only for linear gap
penalties
Higgs Attwood, p. 124 Note There are errors
in the matrices!!
4
DP is a two-step process
  • Forward step calculate scores
  • Trace back start at highest score and
    reconstruct the path leading to the highest score
  • These two steps lead to the highest scoring
    alignment (the optimal alignment)
  • This is guaranteed when you use DP!

5
Time and memory complexity of DP
  • The time complexity is O(n2) if you would align
    two sequences of n residues, you would need to
    perform n2 algorithmic steps (square search
    matrix has n2 cells that need to be filled)
  • The memory (space) complexity is also O(n2) if
    you would align two sequences of n residues, you
    would need a square search matrix of n by n
    containing n2 cells

6
Global dynamic programming
j-1 j
i-1 i
H(i-1,j-1) S(i,j) H(i-1,j) - g H(i,j-1) - g
diagonal vertical horizontal
H(i,j) Max
  • Problem with simple DP approach
  • Can only do linear gap penalties
  • Not suitable for affine and concave penalties

7
Global dynamic programmingusing affine penalties
j-2 j-1 j
Looking back from cell (i, j) we can adapt the
algorithm for affine gap penalties by looking
back to four more cells (magenta)
i-2 i-1 i
If you came from here, gap was already open, so
apply gap-extension penalty
If you came from here, gap was opened so apply
gap-open penalty
8
Global dynamic programmingall three types of gap
penalties
j-1
i-1
Gap opening penalty
MaxS0ltxlti-1, j-1 - Pi - (i-x-1)Px Si-1,j-1 MaxS
i-1, 0ltyltj-1 - Pi - (j-y-1)Px
Si,j si,j Max
Gap extension penalty
9
Global dynamic programmingGapo10, Gape2
D W V T A L K
0 -12 -14 -16 -18 -20 -22 -24
T -12 8 -9 -6 -5 -9 -11 -14
D -14 0 9 2 2 3 -5 -3 -34
W -16 -13 25 11 5 4 9 0 -21
V -18 -10 -4 37 21 19 19 15 -16
L -20 -14 -2 23 46 31 37 26 1
K -22 -12 -9 17 33 53 39 50 14
-34 -29 -1 17 39 27 50
D W V T A L K
T 8 3 8 11 9 9 8
D 12 1 6 8 8 4 8
W 1 25 2 3 2 6 5
V 6 2 12 8 8 10 6
L 4 6 10 9 6 14 5
K 8 5 6 8 7 5 13
These values are copied from the PAM250 matrix
(see earlier slide), after being made
non-negative by adding 8 to each PAM250 matrix
cell (-8 is the lowest number in the PAM250
matrix)
The extra bottom row and rightmost column give
the final global alignment scores
10
Easy DP recipe for using affine gap penalties
j-1
i-1
  • Mi,j is optimal alignment (highest scoring
    alignment until i,j)
  • Check
  • preceding row until j-2 apply appropriate gap
    penalties
  • preceding row until i-2 apply appropriate gap
    penalties
  • and celli-1, j-1 apply score for celli-1,
    j-1

11
DP is a two-step process
  • Forward step calculate scores
  • Trace back start at highest score and
    reconstruct the path leading to the highest score
  • These two steps lead to the highest scoring
    alignment (the optimal alignment)
  • This is guaranteed when you use DP!

12
Global dynamic programming
13
There are three kinds of alignments
  • Global alignment
  • Semi-global alignment
  • Local alignment

14
Semi-global pairwise alignment
  • Global alignment all gaps are penalised
  • Semi-global alignment N- and C-terminal gaps
    (end-gaps) are not penalised
  • MSTGAVLIY--TS-----
  • ---GGILLFHRTSGTSNS

End-gaps
End-gaps
15
Semi-global dynamic programmingPAM250, Gap 6
(linear)
S P E A R E
0 0 0 0 0 0 0
S 0 2 1 0 1 0 0
H 0 -1 2 2 -1 3 1
A 0 1 0 2 4 -2 3
K 0 0 0 0 1 7 1
E 0 0 -1 4 0 1 11
S P E A R E
S 2 1 0 1 0 0
H -1 0 1 -1 2 1
A 1 1 0 2 -2 0
K 0 -1 0 -1 3 0
E 0 -1 4 0 -1 4
These values are copied from the PAM250 matrix
(see earlier slide)
Start in left upper cell before either sequence
(circled in red). Path will end in cell anywhere
in the bottom row or rightmost columns with the
highest score
SPEARE -SHAKE
The easy algorithm is only for linear gap
penalties
Higgs Attwood, p. 124 Note There are errors
in the matrices!!
16
Semi-global dynamic programming- two examples
with different gap penalties -
These values are copied from the PAM250 matrix
(see earlier slide), after being made
non-negative by adding 8 to each PAM250 matrix
cell (-8 is the lowest number in the PAM250
matrix)
Global score would have been 65 10 12 10
22 (because of the two end gaps)
17
Semi-global pairwise alignment
  • Applications of semi-global
  • Finding a gene in genome
  • Placing marker onto a chromosome
  • Generally if one sequence is much longer than
    the other
  • Danger if gap penalties high -- really bad
    alignments for divergent sequences

18
Local dynamic programming (Smith Waterman,
1981)
LCFVMLAGSTVIVGTR
E D A S T I L C G S
Negative numbers
Amino Acid Exchange Matrix
Search matrix
Gap penalties (open, extension)
AGSTVIVG A-STILCG
19
Local dynamic programming (Smith and Waterman,
1981)basic algorithm
j-1 j
i-1 i
H(i-1,j-1) S(i,j) H(i-1,j) - g H(i,j-1) - g 0
diagonal vertical horizontal
H(i,j) Max
20
Local dynamic programmingMatch/mismatch 1/-1,
Gap 2
A T G A C G T
0 0 0 0 0 0 0 0
T 0 0 1 0 0 0 0 1
A 0 1 0 0 1 0 0 0
G 0 0 0 1 0 0 1 0
A 0 1 0 0 2 0 0 0
C 0 0 0 0 0 3 1 0
T 0 0 1 0 0 0 2 2
Fill the matrix (forward pass), then do trace
back from highest cell anywhere in the matrix
till you reach 0 or the beginning of a sequence
21
Local dynamic programmingMatch/mismatch 1/-1,
Gap 2
A T G A C G T
0 0 0 0 0 0 0 0
T 0 0 1 0 0 0 0 1
A 0 1 0 0 1 0 0 0
G 0 0 0 1 0 0 1 0
A 0 1 0 0 2 0 0 0
C 0 0 0 0 0 3 1 0
T 0 0 1 0 0 0 2 2
GAC GAC
Fill the matrix (forward pass), then do trace
back from highest cell anywhere in the matrix
till you reach 0 or the beginning of a sequence
22
Local dynamic programming (Smith Waterman,
1981)
j-1
This is the general DP algorithm, which is
suitable for linear, affine and concave
penalties, although for the examples here affine
penalties are used
i-1
Gap opening penalty
Si,j MaxS0ltxlti-1,j-1 - Pi - (i-x-1)Px Si,j
Si-1,j-1 Si,j Max Si-1,0ltyltj-1 - Pi -
(j-y-1)Px 0
Si,j Max
Gap extension penalty
23
Local dynamic programming
24
Global and local alignment
B
B
C
A
A
B
A
A
C
A
B
C
A
Local
B
Local
A
B
C
A
B
C
B
A
Global
Global
A
B
C
A
25
Global or Local Pairwise alignment
B
B
C
A
A
B
A
A
C
A
B
C
A
Local
B
Local
A
B
C
A
B
C
B
A
Global
Global
A
B
C
A
26
Globin fold ? protein myoglobin PDB 1MBN
Alpha-helices are labelled A (blue) to H
(red). The D helix can be missing in some
globins What happens with the alignment if
D-helix containing globin sequences are aligned
with D-less ones?
27
? sandwich ? protein immunoglobulin PDB 7FAB
Immunoglobulinstructures have variable regions
where numbers of amino acids can vary
substantially
28
TIM barrel ? / ? protein Triose phosphate
IsoMerase PDB 1TIM
The evolutionary history of this protein family
has been the subject of rigorous debate.
Arguments have been made in favor of both
convergent and divergent evolution. Because of
the general lack of sequence homology, the
ancestry of this molecule is still a mystery.
29
Pyruvate kinase Phosphotransferase
b barrel regulatory domain a/b barrel
catalytic substrate binding domain a/b
nucleotide binding domain
30
What does all this mean for alignments?
  • Alignments need to be able to skip secondary
    structural elements to complete domains (i.e.
    putting gaps opposite these motifs in the shorter
    sequence).
  • Depending on gap penalties chosen, the algorithm
    might have difficulty with making such long gaps
    (for example when using high affine gap
    penalties), resulting in incorrect alignment.
  • Alignments are only meaningful for homologous
    sequences (with a common ancestor)

31
There are three kinds of pairwise alignments
  • Global alignment align all residues in both
    sequences all gaps are penalised
  • Semi-global alignment align all residues in
    both sequences end gaps are not penalised (zero
    end gap penalties)
  • Local alignment align one part of each
    sequence end gaps are not applicable

32
Easy global DP recipe for using affine gap
penalties (after Gotoh)
j-1
Penalty Pi gap_lengthPe
MaxS0ltxlti-1, j-1 - Pi - (i-x-1)Px Si-1,j-1 MaxS
i-1, 0ltyltj-1 - Pi - (j-y-1)Px
Si,j si,j Max
i-1
  • Mi,j is optimal alignment (highest scoring
    alignment until i, j)
  • At each cell i, j in search matrix, check Max
    coming from
  • any cell in preceding row until j-2 add score
    for celli, j minus appropriate gap penalties
  • any cell in preceding column until i-2 add score
    for celli, j minus appropriate gap penalties
  • or celli-1, j-1 add score for celli, j
  • Select highest scoring cell in bottom row and
    rightmost column and do trace-back

33
Lets do an example global alignmentGotohs DP
algorithm with affine gap penalties (PAM250,
Pi10, Pe2)
D W V T A L K
0 -12 -14 -16 -18 -20 -22 -24
T -12 0 -17 -14 -13 -17 -19 -22 -22
D -14 -8 -7 -14 -14 -13 -42
W -16 -21 9 -13 -19 -18
V -18 -18 -20 13 -3 -16
L -20 -22 -18 -1 14 -1 -14
K -22 -20 -21 -12
-24 -42 -41 -18 -16 -14 -12 0
D W V T A L K
T 0 -5 0 3 1 1 0
D 4 -7 -2 0 0 -4 0
W -7 17 -6 -5 -6 -2 -3
V -2 -6 4 0 0 2 -2
L -4 -2 2 1 -2 6 -3
K 0 -3 -2 0 -1 -3 5
PAM250
Cell (D2, T4) can alternatively come from two
cells (same score) high-road or low-road
Row and column 0 are filled with 0, -12, -14,
-16, if global alignment is used (for
N-terminal end-gaps) also extra row and column
at the end to calculate the score including
C-terminal end-gap penalties. Note that only
non-diagonal arrows are indicated for clarity
(no arrow means that you go back to earlier
diagonal cell).
34
Lets do another example semi-global
alignmentGotohs DP algorithm with affine gap
penalties (PAM250, Pi10, Pe2)
D W V T A L K
T 0 -5 0 3 1 1 0
D 4 -7 -2 0 0 -4 0
W -7 17 -6 -5 -6 -2 -3
V -2 -6 4 0 0 2 -2
L -4 -2 2 1 -2 6 -3
K 0 -3 -2 0 -1 -3 5
D W V T A L K
T 0 -5 0 3
D 4 -7 -7
W -7 21 -13
V -2 -13 25 9
L
K
PAM250
Starting row and column 0, and extra column at
right or extra row at bottom is not necessary
when using semi global alignment (zero end-gaps).
Rest works as under global alignment.
35
Easy local DP recipe for using affine gap
penalties (after Gotoh)
j-1
Penalty Pi gap_lengthPe
Si,j MaxS0ltxlti-1,j-1 - Pi - (i-x-1)Px Si,j
Si-1,j-1 Si,j Max Si-1,0ltyltj-1 - Pi -
(j-y-1)Px 0
Si,j Max
i-1
  • Mi,j is optimal alignment (highest scoring
    alignment until i, j)
  • At each cell i, j in search matrix, check Max
    coming from
  • any cell in preceding row until j-2 add score
    for celli, j minus appropriate gap penalties
  • any cell in preceding column until i-2 add score
    for celli, j minus appropriate gap penalties
  • or celli-1, j-1 add score for celli, j
  • Select highest scoring cell anywhere in matrix
    and do trace-back until zero-valued cell or start
    of sequence(s)

36
Lets do yet another example local
alignmentGotohs DP algorithm with affine gap
penalties (PAM250, Pi10, Pe2)
D W V T A L K
T 0 -5 0 3 1 1 0
D 4 -7 -2 0 0 -4 0
W -7 17 -6 -5 -6 -2 -3
V -2 -6 4 0 0 2 -2
L -4 -2 2 1 -2 6 -3
K 0 -3 -2 0 -1 -3 5
D W V T A L K
T 0 0 0 3
D 4 0 0 0
W 0 21 0 0
V 0 0 25 9
L 0 0 11
K 0 0
PAM250
Extra start/end columns/rows not necessary (no
end-gaps). Each negative scoring cell is set to
zero. Highest scoring cell may be found anywhere
in search matrix after calculating it. Trace
highest scoring cell back to first cell with zero
value (or the beginning of one or both sequences)
37
Dot plots
  • Way of representing (visualising) sequence
    similarity without doing dynamic programming (DP)
  • Make same matrix, but locally represent sequence
    similarity by averaging using a window

38
Comparing two sequences We want to be able to
choose the best alignment between two
sequences. A simple method of visualising
similarities between two sequences is to use dot
plots. The first sequence to be compared is
assigned to the horizontal axis and the second is
assigned to the vertical axis.
39
Dot plots can be filtered by window approaches
(to calculate running averages) and applying a
threshold They can identify insertions,
deletions, inversions
40
For your first exam D1Make sure you
understand and can carry out 1. the simple DP
algorithm for global, semi-global and local
alignment (using linear gap penalties but make
sure you know the extension of the basic
algorithm for affine gap penalties) and 2. The
general DP algorithm for global, semi-global and
local alignment (using linear, affine and concave
gap penalties)!
Write a Comment
User Comments (0)
About PowerShow.com