Sequence Alignments - PowerPoint PPT Presentation

About This Presentation
Title:

Sequence Alignments

Slides: 44
Provided by: christoph347

Transcript and Presenter's Notes



1
Sequence Alignments
  • Chris Bailey
  • Bacterial Pathogenesis Genomics Unit
  • cmb036_at_bham.ac.uk

2
Why align sequences
  • What does my new gene do?
  • Known gene x, with a function z
  • Unknown gene y
  • If an alignment of x and y shows a high degree of
    similarity, gene y may also have function z

3
Why align sequences
  • E.g. Multiple sclerosis

5
Why align sequences
  • Myelin sheath proteins were sequenced
  • Protein database searched for similar bacterial
    and viral sequences
  • Lab research to determine T-cell reaction to
    bacterial / viral proteins

6
Proteins Evolve!
  • Substitution
  • Insertion
  • Deletion
  • Duplication
  • Inversion

[Figure: sequences X and Y (available, and probably homologous) both descend from a common ancestor Z (probably extinct)]

7
How to Align
  • Take the following sequences
  • ACBCDB, and
  • CADBD
  • An example alignment
  • A C - - B C D B
  • - C A D B - D -
  • The - character represents a space or gap. This
    could be due to
  • Insertion
  • Deletion

8
Evaluating Alignments
  • Use a scoring function
  • Exact match between two characters scores 2
  • Mismatch or space scores -1
  • A C - - B C D B
  • - C A D B - D -
  • Column scores: -1, 2, -1, -1, 2, -1, 2, -1
  • Total score = 1

9
Scoring Functions
  • Let x and y be single characters (or spaces)
  • Then s(x,y) denotes the score of aligning x and y
  • s is called the scoring function
  • E.g.
  • s(A,A) = 2
  • s(B,D) = -1
  • s(-,A) = -1
  • s(B,-) = -1

10
More Definitions
  • If S is a string, $|S|$ denotes the length of S
  • $S_i$ is the ith character of S
  • Let S and T be strings. An alignment A maps S and
    T into strings S' and T' that may contain spaces, where
  • $l = |S'| = |T'|$
  • In the example S = acbcdb, T = cadbd, S' =
    ac--bcdb and T' = -cadb-d-
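The definition above can be checked mechanically. This is a minimal Python sketch under one extra assumption that the slide does not state: no column may pair a space with a space. The name `is_alignment` is hypothetical.

```python
GAP = "-"


def is_alignment(s: str, t: str, s_al: str, t_al: str) -> bool:
    """Check whether (s_al, t_al) is an alignment of s and t."""
    # S' and T' must have the same length l.
    if len(s_al) != len(t_al):
        return False
    # Deleting the spaces must give back the original strings.
    if s_al.replace(GAP, "") != s or t_al.replace(GAP, "") != t:
        return False
    # Assumption: no column may pair a space with a space.
    return all(not (x == GAP and y == GAP) for x, y in zip(s_al, t_al))


print(is_alignment("acbcdb", "cadbd", "ac--bcdb", "-cadb-d-"))  # True
```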

11
Yet More Definitions
  • For the alignment A, the value is $\sum_{i=1}^{l} s(S'_i, T'_i)$
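The value of an alignment is the sum of the column scores. This minimal Python sketch uses the scores from the earlier slides (match 2, mismatch or space -1) and reproduces the worked example scoring 1. The names `s` and `alignment_value` are illustrative.

```python
MATCH, MISMATCH, SPACE = 2, -1, -1  # scores used on these slides
GAP = "-"


def s(x: str, y: str) -> int:
    """Score of aligning x with y, where '-' is a space."""
    if x == GAP or y == GAP:
        return SPACE
    return MATCH if x == y else MISMATCH


def alignment_value(s_al: str, t_al: str) -> int:
    """Value of an alignment: the sum of s(S'_i, T'_i) over all l columns."""
    if len(s_al) != len(t_al):
        raise ValueError("S' and T' must have the same length")
    return sum(s(x, y) for x, y in zip(s_al, t_al))


print([s(x, y) for x, y in zip("AC--BCDB", "-CADB-D-")])  # [-1, 2, -1, -1, 2, -1, 2, -1]
print(alignment_value("AC--BCDB", "-CADB-D-"))  # 1
```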

13
Brute Force Alignment
  • Get all subsequences of S and T
  • Form an alignment of the 2 subsequences
  • Score each alignment and keep the best
  • For sequences of length $n > 3$, the number of basic
    operations is approximately $2^{2n}$
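To show how quickly brute force blows up, this Python sketch enumerates every alignment and scores each one. It recurses on the type of the last column, which is a slightly different enumeration from the slide's subsequence pairing. The scores (2 / -1 / -1) come from the earlier slides, and the function names are hypothetical.

```python
from typing import Iterator, Optional, Tuple


def s(x: str, y: str) -> int:
    """Slide scoring: match 2, mismatch -1, space -1."""
    if x == "-" or y == "-":
        return -1
    return 2 if x == y else -1


def all_alignments(S: str, T: str) -> Iterator[Tuple[str, str]]:
    """Yield every alignment (S', T') of S and T; no column holds two spaces."""
    if not S and not T:
        yield "", ""
        return
    if S and T:  # last column pairs the last letter of S with the last of T
        for a, b in all_alignments(S[:-1], T[:-1]):
            yield a + S[-1], b + T[-1]
    if S:  # last column pairs the last letter of S with a space
        for a, b in all_alignments(S[:-1], T):
            yield a + S[-1], b + "-"
    if T:  # last column pairs a space with the last letter of T
        for a, b in all_alignments(S, T[:-1]):
            yield a + "-", b + T[-1]


def brute_force(S: str, T: str) -> Tuple[int, str, str, int]:
    """Score every alignment; return (best value, S', T', number of alignments)."""
    best: Optional[Tuple[int, str, str]] = None
    count = 0
    for a, b in all_alignments(S, T):
        count += 1
        value = sum(s(x, y) for x, y in zip(a, b))
        if best is None or value > best[0]:
            best = (value, a, b)
    assert best is not None
    return best[0], best[1], best[2], count


value, s_al, t_al, count = brute_force("ACBCDB", "CADBD")
print(value, count)  # 2 3653
```

Even for these short strings there are 3653 alignments to score, compared with the 7 × 6 cells of the dynamic programming table.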

14
Dynamic Alignment
  • Using strings S and T where
  • $|S| = n$ and $|T| = m$
  • V(i,j) is the value of the optimal alignment of
    $S_1 \ldots S_i$ and $T_1 \ldots T_j$
  • The value of the optimal alignment of S and T is V(n,m)
  • Basic operations: about $n^2$ (strictly $nm$), vs $2^{2n}$

15
Needleman-Wunsch Algorithm
  • Starting at i = 0 and j = 0
  • $V(0,0) = 0$
  • $V(i,0) = V(i-1,0) + s(S_i,-)$, for $i > 0$
  • $V(0,j) = V(0,j-1) + s(-,T_j)$, for $j > 0$

16
Needleman-Wunsch Algorithm
  • And for V(i,j) where $i > 0$ and $j > 0$
    $V(i,j) = \max\{V(i-1,j-1) + s(S_i,T_j),\ V(i-1,j) + s(S_i,-),\ V(i,j-1) + s(-,T_j)\}$
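The initialisation and the recurrence translate directly into a table fill. This is a minimal Python sketch assuming the slide scores (match 2, mismatch -1, space -1). Python strings are 0-indexed, so $S_i$ is `S[i - 1]`. The name `needleman_wunsch` is illustrative.

```python
def s(x: str, y: str) -> int:
    """Slide scoring: match 2, mismatch -1, space -1."""
    if x == "-" or y == "-":
        return -1
    return 2 if x == y else -1


def needleman_wunsch(S: str, T: str) -> list:
    """Return V, where V[i][j] is the optimal value for S_1..S_i vs T_1..T_j."""
    n, m = len(S), len(T)
    V = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):  # V(i,0) = V(i-1,0) + s(S_i,-)
        V[i][0] = V[i - 1][0] + s(S[i - 1], "-")
    for j in range(1, m + 1):  # V(0,j) = V(0,j-1) + s(-,T_j)
        V[0][j] = V[0][j - 1] + s("-", T[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            V[i][j] = max(
                V[i - 1][j - 1] + s(S[i - 1], T[j - 1]),  # S_i with T_j
                V[i - 1][j] + s(S[i - 1], "-"),           # S_i with a space
                V[i][j - 1] + s("-", T[j - 1]),           # a space with T_j
            )
    return V


V = needleman_wunsch("ACBCDB", "CADBD")
print(V[1][1], V[2][1], V[6][5])  # -1 1 2
```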

18
What !??
  • That's kind of hard to work out in your head
  • So use a matrix

19
Needleman-Wunsch Algorithm
[Matrix: columns i = 0 to 6 (S = ACBCDB), rows j = 0 to 5 (T = CADBD)]
V(0,0) = 0

20
Needleman-Wunsch Algorithm
  • Fill in the table using the scoring rules s(x,y)
  • Match = 2
  • Mismatch or space = -1
  • Therefore
  • V(i,0) = V(i-1,0) + s(S_i,-), and
  • V(0,j) = V(0,j-1) + s(-,T_j)
  • Become
  • V(i,0) = V(i-1,0) - 1, and
  • V(0,j) = V(0,j-1) - 1

21
Needleman-Wunsch Algorithm
[Matrix: columns i = 0 to 6, rows j = 0 to 5; filling the first row]
V(1,0) = V(0,0) - 1 = -1
V(2,0) = V(1,0) - 1 = -2
... and so on, giving -1, -2, -3, -4, -5, -6 for i = 1 to 6

22
Needleman-Wunsch Algorithm
[Matrix: columns i = 0 to 6, rows j = 0 to 5; filling the first column]
V(0,1) = V(0,0) - 1 = -1
V(0,2) = V(0,1) - 1 = -2
... and so on, giving -1, -2, -3, -4, -5 for j = 1 to 5

23
Needleman-Wunsch Algorithm
  • Okay, so now we need this formula
    $V(i,j) = \max\{V(i-1,j-1) + s(S_i,T_j),\ V(i-1,j) + s(S_i,-),\ V(i,j-1) + s(-,T_j)\}$

24
Needleman-Wunsch Algorithm
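  • Or more simply, since a space always scores -1
    $V(i,j) = \max\{V(i-1,j-1) + s(S_i,T_j),\ V(i-1,j) - 1,\ V(i,j-1) - 1\}$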

25
Needleman-Wunsch Algorithm
[Matrix: first row and column filled; computing V(1,1)]
Since s(a,c) = -1, the candidates for V(1,1) are:
V(0,1) - 1 = -2
V(1,0) - 1 = -2
V(0,0) - 1 = -1
Max = -1 (from V(0,0) - 1)

26
Needleman-Wunsch Algorithm
[Matrix: computing V(2,1)]
Since s(c,c) = 2, the candidates for V(2,1) are:
V(2,0) - 1 = -3
V(1,0) + 2 = 1
V(1,1) - 1 = -2
Max = 1 (from V(1,0) + 2)

27
Needleman-Wunsch Algorithm
[Figure: the completed matrix, columns i = 0 to 6, rows j = 0 to 5]

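The completed matrix can be regenerated rather than worked out by hand. This Python sketch fills the table with the slide scores and prints it in the same layout: i (positions in S) across and j (positions in T) down. The helper names are hypothetical.

```python
def s(x: str, y: str) -> int:
    """Slide scoring: match 2, mismatch -1, space -1."""
    if x == "-" or y == "-":
        return -1
    return 2 if x == y else -1


def nw_table(S: str, T: str) -> list:
    """Needleman-Wunsch table; V[i][j] covers S_1..S_i and T_1..T_j."""
    n, m = len(S), len(T)
    V = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        V[i][0] = V[i - 1][0] + s(S[i - 1], "-")
    for j in range(1, m + 1):
        V[0][j] = V[0][j - 1] + s("-", T[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            V[i][j] = max(V[i - 1][j - 1] + s(S[i - 1], T[j - 1]),
                          V[i - 1][j] + s(S[i - 1], "-"),
                          V[i][j - 1] + s("-", T[j - 1]))
    return V


def show_table(S: str, T: str, V: list) -> None:
    """Print V with i across (letters of S) and j down (letters of T)."""
    print("    " + "".join(f"{c:>4}" for c in "-" + S))
    for j in range(len(T) + 1):
        label = "-" if j == 0 else T[j - 1]
        print(f"{label:>4}" + "".join(f"{V[i][j]:>4}" for i in range(len(S) + 1)))


S, T = "ACBCDB", "CADBD"
V = nw_table(S, T)
show_table(S, T, V)
```

The bottom-right cell, V(6,5) = 2, is the value of an optimal global alignment.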
28
Needleman-Wunsch Algorithm
  • Creating the optimal alignments
  • Follow the arrows back from the bottom-right of
    the table
  • Following an arrow left inserts a gap into T, and
    uses a letter from S
  • Following an arrow up inserts a gap into S, and
    uses a letter from T
  • Following an arrow diagonally uses a letter from
    S and T

29
A bit of code
  • if (going left)
  • unshift S', S_i
  • unshift T', -
  • --i
  • elsif (going up)
  • unshift S', -
  • unshift T', T_j
  • --j
  • elsif (going diagonally)
  • unshift S', S_i
  • unshift T', T_j
  • --i; --j
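Here is a runnable Python sketch of the backtrace above. Instead of storing arrows, it re-checks which neighbours could have produced each cell's value and follows every one of them, so it yields all optimal alignments. Each branch prepends one column, just as `unshift` does in the pseudocode. It assumes the slide scores, and the function names are hypothetical.

```python
from typing import Iterator, Tuple


def s(x: str, y: str) -> int:
    """Slide scoring: match 2, mismatch -1, space -1."""
    if x == "-" or y == "-":
        return -1
    return 2 if x == y else -1


def nw_table(S: str, T: str) -> list:
    """Needleman-Wunsch table; V[i][j] covers S_1..S_i and T_1..T_j."""
    n, m = len(S), len(T)
    V = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        V[i][0] = V[i - 1][0] + s(S[i - 1], "-")
    for j in range(1, m + 1):
        V[0][j] = V[0][j - 1] + s("-", T[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            V[i][j] = max(V[i - 1][j - 1] + s(S[i - 1], T[j - 1]),
                          V[i - 1][j] + s(S[i - 1], "-"),
                          V[i][j - 1] + s("-", T[j - 1]))
    return V


def _walk(S: str, T: str, V: list, i: int, j: int) -> Iterator[Tuple[str, str]]:
    if i == 0 and j == 0:
        yield "", ""
        return
    # Going left: use S_i, put a space in T.
    if i > 0 and V[i][j] == V[i - 1][j] + s(S[i - 1], "-"):
        for a, b in _walk(S, T, V, i - 1, j):
            yield a + S[i - 1], b + "-"
    # Going up: put a space in S, use T_j.
    if j > 0 and V[i][j] == V[i][j - 1] + s("-", T[j - 1]):
        for a, b in _walk(S, T, V, i, j - 1):
            yield a + "-", b + T[j - 1]
    # Going diagonally: use S_i and T_j.
    if i > 0 and j > 0 and V[i][j] == V[i - 1][j - 1] + s(S[i - 1], T[j - 1]):
        for a, b in _walk(S, T, V, i - 1, j - 1):
            yield a + S[i - 1], b + T[j - 1]


def tracebacks(S: str, T: str, V: list) -> Iterator[Tuple[str, str]]:
    """Yield every optimal alignment, starting from the bottom-right cell."""
    return _walk(S, T, V, len(S), len(T))


S, T = "ACBCDB", "CADBD"
for s_al, t_al in tracebacks(S, T, nw_table(S, T)):
    print(s_al)
    print(t_al)
    print()
```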

30
Needleman-Wunsch Algorithm
S' = - A C B C D B
T' = C A D B - D -

S' = A C B C D B -
T' = - C A - D B D

S' = A C B C D B -
T' = - C - A D B D

31
Global vs Semi-Global vs Local
  • Global
  • Semi-Global
  • Local

32
Semi-Global Alignments
  • Same as global alignment algorithm, except
  • Initialise 1st row and column with 0
  • No gap penalty in last row/column
  • (or start from max value in last row/column)
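This minimal Python sketch follows the "start from max value in last row/column" option listed above. The 2 / -1 / -1 scores are carried over from the earlier slides, and `semi_global` is a hypothetical name.

```python
def s(x: str, y: str) -> int:
    """Slide scoring: match 2, mismatch -1, space -1."""
    if x == "-" or y == "-":
        return -1
    return 2 if x == y else -1


def semi_global(S: str, T: str) -> int:
    """Semi-global value: spaces at either end of S or T cost nothing."""
    n, m = len(S), len(T)
    # First row and column initialised with 0: leading spaces are free.
    V = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            V[i][j] = max(
                V[i - 1][j - 1] + s(S[i - 1], T[j - 1]),
                V[i - 1][j] + s(S[i - 1], "-"),
                V[i][j - 1] + s("-", T[j - 1]),
            )
    # Trailing spaces are free: take the best value in the last row or column.
    last_row = max(V[i][m] for i in range(n + 1))  # j = m, every i
    last_col = max(V[n][j] for j in range(m + 1))  # i = n, every j
    return max(last_row, last_col)


print(semi_global("XXCADXX", "CAD"))  # 6
```

For this pair, a global alignment would score only 2, because each of the four end spaces costs -1.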

33
Semi-global alignment
34
Smith-Waterman Alignments
  • An adaptation of Needleman-Wunsch

35
Smith-Waterman Alignments
  • An adaptation of Needleman-Wunsch

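Only difference: the extra 0 term in
$V(i,j) = \max\{0,\ V(i-1,j-1) + s(S_i,T_j),\ V(i-1,j) + s(S_i,-),\ V(i,j-1) + s(-,T_j)\}$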
36
Smith-Waterman Alignments
  • An adaptation of Needleman-Wunsch
  • Initialize the matrix with 0 in the 1st row and column

37
Smith-Waterman Alignments
  • An adaptation of Needleman-Wunsch
  • Initialize the matrix with 0 in the 1st row and column
  • Start backtrace from the maximum value in the
    matrix, end it at 0
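This minimal Python sketch applies the Smith-Waterman changes listed above to the Needleman-Wunsch fill: a 0 floor in every cell, a zero first row and column, and a backtrace from the maximum cell down to a 0. The scores (2 / -1 / -1) are assumed from the earlier slides, and `smith_waterman` is a hypothetical name. It returns one optimal local alignment. When several exist, it keeps the first maximum it finds and prefers diagonal moves on ties.

```python
from typing import Tuple


def s(x: str, y: str) -> int:
    """Slide scoring: match 2, mismatch -1, space -1."""
    if x == "-" or y == "-":
        return -1
    return 2 if x == y else -1


def smith_waterman(S: str, T: str) -> Tuple[int, str, str]:
    """Return (best local value, S', T') for one optimal local alignment."""
    n, m = len(S), len(T)
    V = [[0] * (m + 1) for _ in range(n + 1)]  # 1st row and column are 0
    best, bi, bj = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            V[i][j] = max(
                0,  # the extra 0 term: start a fresh local alignment here
                V[i - 1][j - 1] + s(S[i - 1], T[j - 1]),
                V[i - 1][j] + s(S[i - 1], "-"),
                V[i][j - 1] + s("-", T[j - 1]),
            )
            if V[i][j] > best:
                best, bi, bj = V[i][j], i, j
    # Backtrace from the maximum cell until a 0 is reached.
    i, j = bi, bj
    s_al, t_al = [], []
    while V[i][j] > 0:
        if V[i][j] == V[i - 1][j - 1] + s(S[i - 1], T[j - 1]):
            s_al.append(S[i - 1])
            t_al.append(T[j - 1])
            i, j = i - 1, j - 1
        elif V[i][j] == V[i - 1][j] + s(S[i - 1], "-"):
            s_al.append(S[i - 1])
            t_al.append("-")
            i -= 1
        else:
            s_al.append("-")
            t_al.append(T[j - 1])
            j -= 1
    return best, "".join(reversed(s_al)), "".join(reversed(t_al))


score, s_al, t_al = smith_waterman("cxdexde", "cdexcde")
print(score)  # 10
print(s_al)
print(t_al)
```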

38
Smith-Waterman Algorithm
39
Smith-Waterman Algorithm
  • Gives optimal local alignments of
  • c x d e x d e
  • c d e x c d e

40
Gap Penalties
  • Linear Gap Scores
  • g(k) = ak
  • Affine Gap Scores
  • g(k) = b + ak
  • Concave Gap Scores
  • g(k) = log(k)
  • Where
  • k is gap size
  • a is gap extension penalty
  • b is gap introduction penalty

41
Gap Scoring
  • Concave: g(k) = log(k)
  • Best model of real life
  • Computationally complex
  • Linear: g(k) = ak
  • Not a good model of reality
  • Computationally simple
  • Affine: g(k) = b + ak
  • Closer to reality
  • Computationally manageable
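This small Python sketch compares the linear and affine penalties listed above. It uses illustrative values a = 1 and b = 3, which are assumptions and not from the slides, and treats penalties as costs subtracted from the alignment score. Under linear scoring, one gap of length 4 costs the same as four gaps of length 1. Under affine scoring, the single long gap is much cheaper. The concave log(k) model is left out.

```python
def linear_gap(k: int, a: float) -> float:
    """Linear gap penalty g(k) = a*k."""
    return a * k


def affine_gap(k: int, a: float, b: float) -> float:
    """Affine gap penalty g(k) = b + a*k (b = introduction, a = extension)."""
    return b + a * k


a, b = 1, 3  # illustrative values only, not from the slides
# One gap of length 4 versus four separate gaps of length 1.
print(linear_gap(4, a), 4 * linear_gap(1, a))        # 4 4
print(affine_gap(4, a, b), 4 * affine_gap(1, a, b))  # 7 16
```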

42
Gap Penalties Biological Motivation
  • Insertion/deletion events (indels) involving a
    whole substring often happen as a single event
  • Therefore, the linear model is not representative of
    real life
  • However, the affine model requires 3 matrices (E, F
    and G) to calculate V (the alignment score)
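This Python sketch shows the three-matrix idea for affine gaps, with penalty g(k) = b + ak. The recurrences follow the common textbook (Gotoh-style) formulation. The roles given to E, F and G here are an assumption and may not match exactly what the slide intended: E ends with a space against T_j, F ends with S_i against a space, and G ends with S_i against T_j. Match and mismatch scores are 2 and -1 as before. The name `affine_global` is hypothetical.

```python
NEG = float("-inf")


def s(x: str, y: str) -> int:
    """Letter-letter score: match 2, mismatch -1 (spaces are handled separately)."""
    return 2 if x == y else -1


def affine_global(S: str, T: str, a: float, b: float) -> float:
    """Global alignment value with gap penalty g(k) = b + a*k."""
    n, m = len(S), len(T)
    E = [[NEG] * (m + 1) for _ in range(n + 1)]  # last column: space vs T_j
    F = [[NEG] * (m + 1) for _ in range(n + 1)]  # last column: S_i vs space
    G = [[NEG] * (m + 1) for _ in range(n + 1)]  # last column: S_i vs T_j
    V = [[NEG] * (m + 1) for _ in range(n + 1)]  # V = max(E, F, G)
    V[0][0] = 0
    for i in range(1, n + 1):
        F[i][0] = V[i][0] = -(b + a * i)
    for j in range(1, m + 1):
        E[0][j] = V[0][j] = -(b + a * j)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            G[i][j] = V[i - 1][j - 1] + s(S[i - 1], T[j - 1])
            # Either extend an existing gap (-a) or open a new one (-b - a).
            E[i][j] = max(E[i][j - 1] - a, V[i][j - 1] - b - a)
            F[i][j] = max(F[i - 1][j] - a, V[i - 1][j] - b - a)
            V[i][j] = max(E[i][j], F[i][j], G[i][j])
    return V[n][m]


print(affine_global("AAAA", "AA", a=1, b=2))       # 0: one gap of length 2 costs 4
print(affine_global("ACBCDB", "CADBD", a=1, b=0))  # 2: b = 0 gives the linear model
```

With b = 0, the E and F options collapse to "previous V minus 1", so the fill reduces to the Needleman-Wunsch table from the earlier slides.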

43
Thoughts for next time
  • How can you align multiple sequences by adapting
    pairwise alignment algorithms?
  • When aligning proteins, should 2 similar (e.g.
    hydrophobic) amino acids still receive a
    positive score?
  • How does BLAST actually work?