Sequence Alignments - PowerPoint PPT Presentation

Provided by: christoph347

Title: Sequence Alignments

1
Sequence Alignments

Chris Bailey
Bacterial Pathogenesis Genomics Unit
cmb036_at_bham.ac.uk

2
Why align sequences

What does my new gene do?
Known gene x, with a function z
Unknown gene y
If alignment of x and y shows high degree of
similarity, gene y may also have function z

3
Why align sequences

E.g. Multiple sclerosis

5
Why align sequences

Myelin sheath proteins were sequenced
Protein database searched for similar bacterial
and viral sequences
Lab research to determine T-cell reaction to
bacterial / viral proteins

6
Proteins Evolve!

Substitution
Insertion
Deletion
Duplication
Inversion

(Figure: common ancestor Z, probably extinct; descendants X and Y, available and probably homologous)

7
How to Align

Take the following sequences
ACBCDB, and
CADBD
An example alignment
A C - - B C D B
- C A D B - D -
The '-' character represents a space or gap. This
could be due to
Insertion
Deletion

8
Evaluating Alignments

Use a scoring function
Exact match between two characters scores 2
Mismatch or space scores -1
 A  C  -  -  B  C  D  B
 -  C  A  D  B  -  D  -
-1  2 -1 -1  2 -1  2 -1
Total: 1

9
Scoring Functions

Let x and y be single characters (or spaces)
Then s(x,y) denotes the score of aligning x and y
s is called the scoring function
E.g.
s(A,A) = 2
s(B,D) = -1
s(-,A) = -1
s(B,-) = -1

10
More Definitions

If S is a string, |S| denotes the length of S
S_i is the ith character of S
Let S and T be strings. An alignment A maps S and
T into S' and T', that may contain spaces, where
l = |S'| = |T'|
In the example S = acbcdb, T = cadbd, S' =
ac--bcdb and T' = -cadb-d-

11
Yet More Definitions

For the alignment A, the value is
\sum_{i=1}^{l} s(S'_i, T'_i)
that is, the sum of the scores of the l aligned columns
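
The value of an alignment can be sketched in a few lines of Python. This is only an illustration: the function names `s` and `alignment_value` are chosen here, and the scores (match 2, mismatch or space -1) are the ones from the "Evaluating Alignments" slide.

```python
def s(x, y):
    """Scoring function: exact match scores 2, mismatch or space scores -1."""
    if x == "-" or y == "-":
        return -1
    return 2 if x == y else -1


def alignment_value(s_aligned, t_aligned):
    """Sum the column scores of an alignment given as two equal-length strings."""
    if len(s_aligned) != len(t_aligned):
        raise ValueError("aligned strings must have the same length")
    return sum(s(x, y) for x, y in zip(s_aligned, t_aligned))


# The example alignment of ACBCDB and CADBD from the earlier slides.
print(alignment_value("AC--BCDB", "-CADB-D-"))  # prints 1
```

Three matches (6) and five mismatch or space columns (-5) give the total of 1.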

13
Brute Force Alignment

Get all subsequences of S and T
Form an alignment of the 2 subsequences
Score the alignment
Where n > 3, the number of basic operations
is approximately 2^(2n)
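
One possible reading of the brute-force steps, as a Python sketch: pick k positions from S and k positions from T, pair them up in order, and align every left-over character with a space. The function name `brute_force_alignment` and the constants are illustrative choices, and the scores are the ones used on the earlier slides.

```python
from itertools import combinations

MATCH, MISMATCH, SPACE = 2, -1, -1  # scores used on the earlier slides


def brute_force_alignment(S, T):
    """Try every pairing of equal-length subsequences.

    Returns (best score, number of candidate alignments tried).
    """
    n, m = len(S), len(T)
    best, tried = None, 0
    for k in range(min(n, m) + 1):
        # Characters left out of the subsequences are each aligned with a space.
        spaces = (n - k) + (m - k)
        for I in combinations(range(n), k):      # positions kept from S
            for J in combinations(range(m), k):  # positions kept from T
                score = spaces * SPACE
                for i, j in zip(I, J):
                    score += MATCH if S[i] == T[j] else MISMATCH
                tried += 1
                if best is None or score > best:
                    best = score
    return best, tried


print(brute_force_alignment("ACBCDB", "CADBD"))  # prints (2, 462)
```

Even for strings of length 6 and 5 there are 462 candidates to score, and the count grows exponentially with the string length.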

14
Dynamic Alignment

Using strings S and T where
|S| = n and |T| = m
V(i,j) is the value of the optimal alignment of
S_1...S_i and T_1...T_j
Value of the optimal alignment of S and T is V(n,m)
Basic operations n^2, vs 2^(2n)

15
Needleman-Wunsch Algorithm

Starting at i = 0 and j = 0
V(0,0) = 0
V(i,0) = V(i-1,0) + s(S_i,-), for i > 0
V(0,j) = V(0,j-1) + s(-,T_j), for j > 0

16
Needleman-Wunsch Algorithm

And for V(i,j) where i > 0 and j > 0
V(i,j) = max of
V(i-1,j-1) + s(S_i,T_j)
V(i-1,j) + s(S_i,-)
V(i,j-1) + s(-,T_j)
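
The base cases and the recurrence can be sketched as a table fill in Python. The names `s` and `needleman_wunsch_matrix` are illustrative, the table is stored as a list of lists indexed `V[i][j]`, and the scores (match 2, mismatch or space -1) are the ones from the scoring slides.

```python
def s(x, y):
    """Scoring function: exact match scores 2, mismatch or space scores -1."""
    if x == "-" or y == "-":
        return -1
    return 2 if x == y else -1


def needleman_wunsch_matrix(S, T):
    """Return the table V, where V[i][j] is the optimal value for S[:i] and T[:j]."""
    n, m = len(S), len(T)
    V = [[0] * (m + 1) for _ in range(n + 1)]
    # Base cases: a prefix aligned against nothing but spaces.
    for i in range(1, n + 1):
        V[i][0] = V[i - 1][0] + s(S[i - 1], "-")
    for j in range(1, m + 1):
        V[0][j] = V[0][j - 1] + s("-", T[j - 1])
    # General case: best of the three possible last columns.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            V[i][j] = max(
                V[i - 1][j - 1] + s(S[i - 1], T[j - 1]),  # S_i opposite T_j
                V[i - 1][j] + s(S[i - 1], "-"),           # S_i opposite a space
                V[i][j - 1] + s("-", T[j - 1]),           # T_j opposite a space
            )
    return V


V = needleman_wunsch_matrix("ACBCDB", "CADBD")
print(V[6][5])  # prints 2, the value of the optimal global alignment
```

Python strings are 0-indexed, so `S[i - 1]` plays the role of S_i.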

18
What !??

That's kind of hard to work out in your head
So use a matrix

19
Needleman-Wunsch Algorithm
(Matrix: i = 0..6 across the columns, j = 0..5 down the rows)
V(0,0) = 0

20
Needleman-Wunsch Algorithm

Fill in the table using the scoring rules for s(x,y)
Match = 2
Mismatch or space = -1
Therefore
V(i,0) = V(i-1,0) + s(S_i,-), and
V(0,j) = V(0,j-1) + s(-,T_j)
Become
V(i,0) = V(i-1,0) - 1, and
V(0,j) = V(0,j-1) - 1

21
Needleman-Wunsch Algorithm
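(Matrix: i = 0..6 across the columns, j = 0..5 down the rows; the j = 0 row is filled in)
V(1,0) = V(0,0) - 1 = -1
V(2,0) = V(1,0) - 1 = -2
Then V(3,0) = -3, V(4,0) = -4, V(5,0) = -5, V(6,0) = -6
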
22
Needleman-Wunsch Algorithm
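(Matrix: i = 0..6 across the columns, j = 0..5 down the rows; the i = 0 column is filled in)
V(0,1) = V(0,0) - 1 = -1
V(0,2) = V(0,1) - 1 = -2
Then V(0,3) = -3, V(0,4) = -4, V(0,5) = -5
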
23
Needleman-Wunsch Algorithm

Okay, so now we need this formula
V(i,j) = max of
V(i-1,j-1) + s(S_i,T_j)
V(i-1,j) + s(S_i,-)
V(i,j-1) + s(-,T_j)

24
Needleman-Wunsch Algorithm

Or more simply
V(i,j) = max of
V(i-1,j-1) + 2 if S_i = T_j, otherwise V(i-1,j-1) - 1
V(i-1,j) - 1
V(i,j-1) - 1

25
Needleman-Wunsch Algorithm
(Matrix: i = 0..6 across the columns, j = 0..5 down the rows)
Since s(a,c) = -1
Candidates for V(1,1):
V(0,1) - 1 = -2
V(1,0) - 1 = -2
V(0,0) - 1 = -1
Max = -1 (from V(0,0) - 1)

26
Needleman-Wunsch Algorithm
(Matrix: i = 0..6 across the columns, j = 0..5 down the rows)
Since s(c,c) = 2
Candidates for V(2,1):
V(2,0) - 1 = -3
V(1,0) + 2 = 1
V(1,1) - 1 = -2
Max = 1 (from V(1,0) + 2)

27
Needleman-Wunsch Algorithm
(Figure: the completed matrix, i = 0..6 across the columns, j = 0..5 down the rows)

28
Needleman-Wunsch Algorithm

Creating the optimal alignments
Follow the arrows back from the bottom-right of
the table
Following an arrow left inserts a gap into T, and
uses a letter from S
Following an arrow up inserts a gap into S, and
uses a letter from T
Following an arrow diagonally uses a letter from
S and T

29
A bit of code

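# S' and T' are the aligned strings being built, back to front
if (going left)              # letter from S, gap in T
    unshift S', Si
    unshift T', -
    --i
else if (going up)           # gap in S, letter from T
    unshift S', -
    unshift T', Tj
    --j
else if (going diagonally)   # letter from S and from T
    unshift S', Si
    unshift T', Tj
    --i
    --j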
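
A runnable version of the backtrace, sketched in Python. Instead of stored arrows it re-checks which neighbouring cell could have produced each value; `list.insert(0, ...)` stands in for `unshift`. The function names and the tie-breaking order (diagonal, then left, then up) are choices made for this sketch, so it returns only one of the optimal alignments.

```python
def s(x, y):
    """Scoring function: exact match scores 2, mismatch or space scores -1."""
    if x == "-" or y == "-":
        return -1
    return 2 if x == y else -1


def fill(S, T):
    """Needleman-Wunsch table: V[i][j] is the optimal value for S[:i] and T[:j]."""
    n, m = len(S), len(T)
    V = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        V[i][0] = V[i - 1][0] + s(S[i - 1], "-")
    for j in range(1, m + 1):
        V[0][j] = V[0][j - 1] + s("-", T[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            V[i][j] = max(V[i - 1][j - 1] + s(S[i - 1], T[j - 1]),
                          V[i - 1][j] + s(S[i - 1], "-"),
                          V[i][j - 1] + s("-", T[j - 1]))
    return V


def global_alignment(S, T):
    """Return (value, S', T') for one optimal global alignment."""
    V = fill(S, T)
    i, j = len(S), len(T)
    s_out, t_out = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and V[i][j] == V[i - 1][j - 1] + s(S[i - 1], T[j - 1]):
            # Diagonal: use a letter from S and from T.
            s_out.insert(0, S[i - 1])
            t_out.insert(0, T[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and V[i][j] == V[i - 1][j] + s(S[i - 1], "-"):
            # Left in the slides' table: letter from S, gap in T.
            s_out.insert(0, S[i - 1])
            t_out.insert(0, "-")
            i -= 1
        else:
            # Up in the slides' table: gap in S, letter from T.
            s_out.insert(0, "-")
            t_out.insert(0, T[j - 1])
            j -= 1
    return V[len(S)][len(T)], "".join(s_out), "".join(t_out)


print(global_alignment("ACBCDB", "CADBD"))  # prints (2, '-ACBCDB', 'CADB-D-')
```

Collecting all optimal alignments would mean following every tied candidate rather than the first one that matches.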

30
Needleman-Wunsch Algorithm
S - A C B C D B
T C A D B - D -

S A C B C D B -
T - C A - D B D

S A C B C D B -
T - C - A D B D

31
Global vs Semi-Global vs Local

Global
Semi-Global
Local

32
Semi-Global Alignments

Same as global alignment algorithm, except
Initialise 1st row and column with 0
No gap penalty in last row/column
(or start from max value in last row/column)
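
A minimal Python sketch of the two changes, computing only the score: the first row and column stay at 0, and the result is the maximum over the last row and last column. The function name `semi_global_score` and the test strings are made up for illustration; the scores are match 2, mismatch or space -1 as before.

```python
def s(x, y):
    """Scoring function: exact match scores 2, mismatch or space scores -1."""
    if x == "-" or y == "-":
        return -1
    return 2 if x == y else -1


def semi_global_score(S, T):
    """Best alignment value when leading and trailing spaces are free."""
    n, m = len(S), len(T)
    # First row and column are left at 0: leading spaces cost nothing.
    V = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            V[i][j] = max(V[i - 1][j - 1] + s(S[i - 1], T[j - 1]),
                          V[i - 1][j] + s(S[i - 1], "-"),
                          V[i][j - 1] + s("-", T[j - 1]))
    # Trailing spaces cost nothing: take the best value in the last row or column.
    last_row = max(V[n])
    last_column = max(row[m] for row in V)
    return max(last_row, last_column)


# The end of the first string overlaps the start of the second.
print(semi_global_score("XXABC", "ABCYY"))  # prints 6
```

A global alignment of the same two strings would pay for the four unmatched end characters; here only the overlapping ABC is scored.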

33
Semi-global alignment
37
Smith-Waterman Alignments

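An adaptation of Needleman-Wunsch
V(i,j) = max of
0
V(i-1,j-1) + s(S_i,T_j)
V(i-1,j) + s(S_i,-)
V(i,j-1) + s(-,T_j)
Only difference in the recurrence: the 0 term
Initialize the matrix with 0 in 1st row and column
Start backtrace from the maximum value in the
matrix, end it at 0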
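
A Python sketch of the local alignment, under the usual formulation in which no cell is allowed to drop below 0. The function name `smith_waterman`, the tie-breaking order and the test strings are illustrative choices; the scores are match 2, mismatch or space -1 as on the earlier slides.

```python
def s(x, y):
    """Scoring function: exact match scores 2, mismatch or space scores -1."""
    if x == "-" or y == "-":
        return -1
    return 2 if x == y else -1


def smith_waterman(S, T):
    """Return (value, S', T') for one optimal local alignment."""
    n, m = len(S), len(T)
    V = [[0] * (m + 1) for _ in range(n + 1)]  # 1st row and column are 0
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            V[i][j] = max(0,
                          V[i - 1][j - 1] + s(S[i - 1], T[j - 1]),
                          V[i - 1][j] + s(S[i - 1], "-"),
                          V[i][j - 1] + s("-", T[j - 1]))
            if V[i][j] > best:
                best, best_pos = V[i][j], (i, j)
    # Backtrace from the maximum value and stop at the first 0.
    i, j = best_pos
    s_out, t_out = [], []
    while V[i][j] > 0:
        if V[i][j] == V[i - 1][j - 1] + s(S[i - 1], T[j - 1]):
            s_out.insert(0, S[i - 1])
            t_out.insert(0, T[j - 1])
            i -= 1
            j -= 1
        elif V[i][j] == V[i - 1][j] + s(S[i - 1], "-"):
            s_out.insert(0, S[i - 1])
            t_out.insert(0, "-")
            i -= 1
        else:
            s_out.insert(0, "-")
            t_out.insert(0, T[j - 1])
            j -= 1
    return best, "".join(s_out), "".join(t_out)


print(smith_waterman("XXABCXX", "YYABCYY"))  # prints (6, 'ABC', 'ABC')
```

Only the shared ABC region is reported; the flanking characters never enter the alignment because the cells around them are clamped to 0.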
38
Smith-Waterman Algorithm
39
Smith-Waterman Algorithm

Gives optimal local alignments of
c x d e x d e
c d e x c d e

and
40
Gap Penalties

Linear Gap Scores
g(k) = a·k
Affine Gap Scores
g(k) = b + a·k
Convex Gap Scores
g(k) = log(k)
Where
k is gap size
a is gap extension penalty
b is gap introduction penalty
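
The three gap score functions written out in Python, as a small sketch. The function names and the example values a = 1 and b = 5 are assumptions made for illustration only.

```python
import math


def linear_gap(k, a):
    """Linear: every space in the gap costs the same."""
    return a * k


def affine_gap(k, a, b):
    """Affine: a one-off cost b for introducing the gap, plus a per space."""
    return b + a * k


def log_gap(k):
    """Logarithmic: each additional space costs less than the one before."""
    return math.log(k)


a, b = 1, 5  # illustrative penalties
# One gap of length 4 versus four separate gaps of length 1.
print(linear_gap(4, a), 4 * linear_gap(1, a))        # prints 4 4
print(affine_gap(4, a, b), 4 * affine_gap(1, a, b))  # prints 9 24
```

Under the linear model the two cases cost the same, while the affine model makes one long gap much cheaper than several short ones.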

41
Gap Scoring

Concave: g(k) = log(k)
Best Model of real life
Computationally complex

Linear: g(k) = a·k
Not a good model of reality
Computationally simple

Affine: g(k) = b + a·k
Closer to reality
Computationally manageable

42
Gap Penalties Biological Motivation

Insertion/deletion events (Indels) involving a
whole substring often happen as 1 event
Therefore, linear model is not representative of
real life
However, affine model requires 3 matrices (E, F
and G) to calculate V (the alignment score)
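
A sketch of how three matrices can carry an affine gap penalty through a global alignment, in Python. The slide only names E, F and G; the roles given to them here follow one common formulation and are an assumption, as are the function name `affine_global_score` and the example penalties. Substitution scores are match 2, mismatch -1, and a gap of length k costs b + a·k.

```python
NEG_INF = float("-inf")


def affine_global_score(S, T, a, b):
    """Optimal global alignment value with gap penalty g(k) = b + a*k."""
    n, m = len(S), len(T)
    # Assumed roles: G ends with S_i opposite T_j,
    # E ends with a space opposite T_j, F ends with S_i opposite a space.
    V = [[0] * (m + 1) for _ in range(n + 1)]
    E = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    F = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    G = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        V[i][0] = -(b + a * i)  # S[:i] against one gap of length i
    for j in range(1, m + 1):
        V[0][j] = -(b + a * j)  # T[:j] against one gap of length j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 2 if S[i - 1] == T[j - 1] else -1
            G[i][j] = V[i - 1][j - 1] + sub
            # Either extend an existing gap (cost a) or open a new one (cost b + a).
            E[i][j] = max(E[i][j - 1], V[i][j - 1] - b) - a
            F[i][j] = max(F[i - 1][j], V[i - 1][j] - b) - a
            V[i][j] = max(E[i][j], F[i][j], G[i][j])
    return V[n][m]


# Six matches (12) minus one gap of length 4 (2 + 4*1 = 6).
print(affine_global_score("ACGTTTTACG", "ACGACG", a=1, b=2))  # prints 6
```

With b = 0 the penalty becomes linear and the result agrees with the plain Needleman-Wunsch value, for example 2 for ACBCDB and CADBD with a = 1.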

43
Thoughts for next time

How can you align multiple sequences by adapting
pairwise alignment algorithms?
When aligning proteins, should 2 similar (e.g.
hydrophobic) amino acids still receive a
positive score?
How does BLAST actually work?
