Dynamic Programming for Sequence Alignment

1
Dynamic Programming for Sequence alignment
  • Neha Jain
  • Lecturer
  • School of Biotechnology
  • Devi Ahilya University, Indore

2
Sequence alignment
  • Sequence alignment is the procedure of comparing
    two (pairwise alignment) or more (multiple
    alignment) sequences by searching for a series of
    individual characters or patterns that are in the
    same order in the sequences.
  • There are two types of alignment: local and
    global.
  • In global alignment, an attempt is made to align
    the entire sequence. If two sequences have
    approximately the same length and are quite
    similar, they are suitable for global
    alignment.
  • Local alignment concentrates on finding
    stretches of sequence with a high level of matches.

3
Interpretation of sequence alignment
  • Sequence alignment is useful for discovering
    structural, functional and evolutionary
    information.
  • Sequences that are very much alike may have
    similar secondary and 3D structure, similar
    function and likely a common ancestral sequence.
    It is extremely unlikely that such sequences
    obtained their similarity by chance.
  • Large-scale genome studies revealed the existence
    of horizontal transfer of genes and other sequences
    between species, which may cause similarity
    between some sequences in very distant species.

4
Methods of sequence alignment
  • Dot matrix analysis: one sequence (A) is written
    across the top of the page and the other (B) down
    the left side. Starting from the first character
    in B, one moves across the first row, placing a
    dot in every column where the character in A is
    the same. The process is continued until all
    possible comparisons between both sequences
    are made. Any region of similarity is revealed by
    a diagonal row of dots.
  • The dynamic programming (DP) algorithm: the
    method compares every pair of characters in the
    two sequences and generates an alignment that
    is the best, or optimal, one.
  • Word or k-tuple methods: BLAST is the best-known
    example of a k-tuple method.
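
The dot matrix procedure above can be sketched in a few lines. This is a minimal illustration, assuming exact character matches and no window or threshold filtering; the function name `dot_matrix` and the `*` / `.` symbols are illustrative choices, not part of the slides.

```python
def dot_matrix(seq_a, seq_b):
    """Build a dot plot with seq_a across the top and seq_b down the side.

    Each returned string is one row: '*' where the characters match,
    '.' where they do not.
    """
    return [
        "".join("*" if a == b else "." for a in seq_a)
        for b in seq_b
    ]


if __name__ == "__main__":
    # The two sequences used in the worked example later in the slides.
    for row in dot_matrix("TACT", "AATC"):
        print(row)
    # prints:
    # .*..
    # .*..
    # *..*
    # ..*.
```

Runs of `*` along a diagonal correspond to stretches where the two sequences match without gaps.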

5
Pairwise Sequence Alignment
  • The aim: given two sequences and a scoring system,
    find the best alignment
  • Points to remember
  • 1) All possible pairs should be considered
  • 2) Take the best score found
  • 3) There may be more than one best alignment

6
Finding the best alignment is hard!!
  • How to get the optimal alignment?
  • The number of possible alignments is large.
  • If both sequences have the same length, there is
    only one possible complete alignment with no gaps.
  • It gets more complicated when gaps are allowed.
  • It is not a good idea to go over all alignments
    one by one.
  • Solution: the dynamic programming algorithm
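
To give a feel for how quickly the number of gapped alignments grows, here is a small sketch. It assumes that every alignment column is either a pair of characters or one character opposite a gap, and that alignments differing only in the order of adjacent gaps are counted separately; under other counting conventions the numbers differ. The function name `count_alignments` is hypothetical.

```python
def count_alignments(m, n):
    """Count global alignments of sequences of lengths m and n.

    table[i][j] is the number of alignments of the first i characters
    of one sequence with the first j characters of the other. The last
    column of any alignment is a pair (diagonal), a gap in one sequence
    (up) or a gap in the other (left), so the three counts are added.
    """
    table = [[1] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            table[i][j] = (
                table[i - 1][j - 1] + table[i - 1][j] + table[i][j - 1]
            )
    return table[m][n]


if __name__ == "__main__":
    print(count_alignments(4, 4))  # prints 321
```

Even two sequences of length 4 already have hundreds of candidate alignments under this convention, which is why enumerating them is avoided.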

7
Dynamic Programming
  • General optimization method
  • Proposed by Richard Bellman in the 1950s. The
    word "dynamic" was chosen by Bellman to capture
    the time-varying aspect of the problems, and
    because it sounded impressive. The word
    "programming" referred to the use of the
    method to find an optimal program.
  • Extensively used in sequence alignment and other
    computational problems
  • Applied to biological sequences by Needleman and
    Wunsch

8
Dynamic Programming
  • The original problem is broken into smaller
    subproblems, which are then solved
  • Pieces of larger problem have a sequential
    dependency
  • 4th piece can be solved using solution of the 3rd
    piece, the 3rd piece can be solved by using
    solution of the 2nd piece and so on

9
Dynamic Programming
  • First solve all the subproblems
  • Store each intermediate solution in a table along
    with a score
  • Uses an (m+1) x (n+1) matrix of scores, where m
    and n are the lengths of the sequences being
    aligned (the extra row and column hold the
    gap-only alignments).
  • Can be used for:
  • Local alignment (Smith-Waterman algorithm)
  • Global alignment (Needleman-Wunsch algorithm)

10
Formal description of dynamic programming
algorithm
  • This diagram indicates the moves that are
    possible to reach a certain position (i,j)
    starting from the previous row and column at
    position (i -1, j-1) or from any position in the
    same row or column
  • The move is either a diagonal move with no gap
    penalty, or a move from any other position in
    column j or row i, with a gap penalty that
    depends on the size of the gap

11
Dynamic Programming
  • Sequence alignment has an optimal-substructure
    property
  • As a result DP makes it easier to consider all
    possible alignments
  • DP algorithms solve optimization problems by
    dividing the problem into smaller, overlapping
    subproblems.
  • Each subproblem is then only solved once, and the
    answer is stored in a table, thus avoiding the
    work of recomputing the solution.

12
Dynamic Programming
  • With sequence alignment, the subproblems can be
    thought of as the alignment of the prefixes of
    the two sequences to a certain point.
  • DP matrix is computed.
  • The optimal alignment score for any particular
    point in the matrix is built upon the optimal
    alignment that has been computed to that point.

13
Dynamic Programming
  • Advantage: the method is guaranteed to give a
    global optimum for the chosen parameters
    (the scoring matrix and gap penalty), with no
    approximation.
  • Disadvantage: many alignments may give the same
    optimal score, and none of these may correspond
    to the biologically correct alignment.

14
Dynamic Programming
  • In a comparison of the α- and β-chains of chicken
    hemoglobin, Fitch and Smith found 17 optimal
    alignments, only one of which was correct
    biologically (1317 alignments were within 5% of
    the optimal score).
  • More bad news: the time required to align two
    sequences of lengths n and m is proportional to
    n x m.
  • This makes DP unsuitable for searching a
    sequence database (DB) for a match to a probe
    sequence.

15
Dynamic Programming
  • Steps Involved
  • Initialization
  • Matrix Fill (scoring)
  • Traceback (alignment)

16
Gap Penalties
  • Gaps are due to Insertion or deletion mutations
    in the genes.
  • Penalties are given for the gaps.
  • Through empirical studies of globular proteins,
    a set of penalty values has been developed that
    appears to suit most alignment purposes.
  • They are normally implemented as default values
    in most alignment programs.

17
Gap Penalties
  • Caution:
  • If penalties are too low, gaps become numerous
    and even unrelated pairs will be aligned.
  • If penalties are too high, it becomes difficult
    to pair even the related ones.
  • Another factor to consider is the cost difference
    between opening a gap and extending an existing
    gap. It is known that it is easier to extend a
    gap that has already been started. Thus, gap
    opening should have a much higher penalty than
    gap extension.
  • This is based on the rationale that if insertions
    and deletions ever occur, several adjacent
    residues are likely to have been inserted or
    deleted together.
  • Affine gap penalties: the gap opening penalty is
    higher than the gap extension penalty.
  • Constant penalty: the gap opening and gap
    extension penalties are the same.
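
The difference between the two schemes can be sketched as follows. This assumes one common formulation, in which a gap of length k costs the opening penalty once plus the extension penalty for each further position; other programs define the terms slightly differently. The function name and the numeric values are illustrative, not defaults of any particular tool.

```python
def gap_penalty(length, gap_open, gap_extend):
    """Total penalty for one gap of the given length.

    Assumed formulation: gap_open is charged for the first gap
    position and gap_extend for each additional position.
    """
    if length <= 0:
        return 0
    return gap_open + gap_extend * (length - 1)


if __name__ == "__main__":
    # Affine: opening costs much more than extending (illustrative values).
    print(gap_penalty(3, gap_open=-10, gap_extend=-1))  # prints -12
    # Constant: opening and extension cost the same, as in the
    # worked example with a gap score of -2 per position.
    print(gap_penalty(3, gap_open=-2, gap_extend=-2))   # prints -6
```

With affine values, one gap of length 3 (-12) is cheaper than three separate gaps of length 1 (-30), which reflects the rationale that adjacent residues tend to be inserted or deleted together.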

18
Global Alignment: Needleman-Wunsch Algorithm
  • In global sequence alignment, an attempt to align
    the entirety of two different sequences is made,
    up to and including the ends of the sequence.
  • Needleman and Wunsch (1970) were among the first
    to describe a dynamic programming algorithm for
    global sequence alignment.

19
Example
  • Two sequences: TACT, AATC
  • Scoring system:
  • Match: 3
  • Mismatch: -1
  • Gap: -2

20
  • Initialize entry (0,0) = 0
  • Fill the matrix from top left to bottom right
  • The score in each entry (i,j) is calculated using
    the values of the three neighboring entries
    (left, above and diagonal)
  • The global alignment score is the bottom-right
    cell value
  • There may be more than one optimal alignment

21
         0     1     2     3     4
         -     T     A     C     T
0  -
1  A
2  A
3  T
4  C
Construct a matrix with one sequence (TACT) at the
top and the other sequence (AATC) at the left.
  • Entry (i,j)
  • i for column, j for row
  • alignment of the first i letters of one sequence
    (top) with the first j letters of the other (left)

22
         0     1     2     3     4
         -     T     A     C     T
0  -     0
1  A
2  A
3  T
4  C
Initialization: entry (0,0) = 0
Fill the matrix from top left to bottom right

23
         0     1     2     3     4
         -     T     A     C     T
0  -     0    -2
1  A
2  A
3  T
4  C
entry (1,0) = entry (0,0) + gap score = 0 + (-2) = -2
  T
  -
Horizontal line: gap in the left sequence

24
         0     1     2     3     4
         -     T     A     C     T
0  -     0    -2    -4
1  A
2  A
3  T
4  C
entry (2,0) = entry (1,0) + gap score = -2 + (-2) = -4
  T A
  - -

25
         0     1     2     3     4
         -     T     A     C     T
0  -     0    -2    -4    -6
1  A
2  A
3  T
4  C
entry (3,0) = entry (2,0) + gap score = -4 + (-2) = -6
  T A C
  - - -

26
         0     1     2     3     4
         -     T     A     C     T
0  -     0    -2    -4    -6    -8
1  A
2  A
3  T
4  C
entry (4,0) = entry (3,0) + gap score = -6 + (-2) = -8
  T A C T
  - - - -

27
         0     1     2     3     4
         -     T     A     C     T
0  -     0    -2    -4    -6    -8
1  A    -2
2  A    -4
3  T    -6
4  C    -8
  - - - -
  A A T C
Vertical line: gap in the top sequence
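
The initialization step can be sketched as below, assuming the constant gap score of -2 from the example. Note that the slides address an entry as (column, row), while this sketch stores the table as `score[row][column]`; the function name `init_matrix` is hypothetical.

```python
def init_matrix(top, left, gap=-2):
    """Create the score table and fill the first row and first column.

    score[j][i] corresponds to entry (i, j) on the slides: column i
    follows the top sequence, row j follows the left sequence.
    """
    rows, cols = len(left) + 1, len(top) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, cols):
        # Horizontal move: gap in the left sequence.
        score[0][i] = score[0][i - 1] + gap
    for j in range(1, rows):
        # Vertical move: gap in the top sequence.
        score[j][0] = score[j - 1][0] + gap
    return score


if __name__ == "__main__":
    table = init_matrix("TACT", "AATC")
    print(table[0])                   # prints [0, -2, -4, -6, -8]
    print([row[0] for row in table])  # prints [0, -2, -4, -6, -8]
```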
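
28
Global Alignment: Needleman-Wunsch Algorithm
For each position, S_{i,j} is defined to be the
maximum score at position (i,j), i.e.

$$
S_{i,j} = \max
\begin{cases}
S_{i-1,j-1} + s(a_i, b_j) & \text{(match/mismatch in the diagonal)} \\
S_{i,j-1} + w & \text{(gap in sequence 1)} \\
S_{i-1,j} + w & \text{(gap in sequence 2)}
\end{cases}
$$

where s(a_i, b_j) is the match or mismatch score for
the two characters and w is the gap penalty.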
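
The recurrence for a single cell can be written as a small function. This is a sketch using the example's scores (match 3, mismatch -1, gap -2) as default arguments; the function and parameter names are illustrative.

```python
def cell_score(diag, up, left, a, b, match=3, mismatch=-1, gap=-2):
    """Score of one cell from its three neighbors.

    diag, up and left are the already computed neighbor scores;
    a and b are the two characters compared in this cell.
    """
    s = match if a == b else mismatch
    return max(diag + s, up + gap, left + gap)


if __name__ == "__main__":
    # Entry (1,1) of the example: T against A, neighbors 0, -2, -2.
    print(cell_score(0, -2, -2, "T", "A"))   # prints -1
    # Entry (2,1) of the example: A against A, neighbors -2, -4, -1.
    print(cell_score(-2, -4, -1, "A", "A"))  # prints 1
```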

29
         0     1     2     3     4
         -     T     A     C     T
0  -     0    -2    -4    -6    -8
1  A    -2     ?
2  A    -4
3  T    -6
4  C    -8
Three options

30
         0     1     2     3     4
         -     T     A     C     T
0  -     0    -2    -4    -6    -8
1  A    -2    -1
2  A    -4
3  T    -6
4  C    -8
First option: entry (0,0) + mismatch score
= 0 + (-1) = -1
  T
  A

31
         0     1     2     3     4
         -     T     A     C     T
0  -     0    -2    -4    -6    -8
1  A    -2    -4
2  A    -4
3  T    -6
4  C    -8
Second option: entry (1,0) + gap score = -2 + (-2)
= -4
  T -
  - A

32
         0     1     2     3     4
         -     T     A     C     T
0  -     0    -2    -4    -6    -8
1  A    -2    -4
2  A    -4
3  T    -6
4  C    -8
Third option: entry (0,1) + gap score = -2 + (-2)
= -4
  - T
  A -

33
         0     1     2     3     4
         -     T     A     C     T
0  -     0    -2    -4    -6    -8
1  A    -2    -1
2  A    -4
3  T    -6
4  C    -8
Choosing the option with the maximal score
  T
  A

34
         0     1     2     3     4
         -     T     A     C     T
0  -     0    -2    -4    -6    -8
1  A    -2    -1     ?
2  A    -4
3  T    -6
4  C    -8

35
         0     1     2     3     4
         -     T     A     C     T
0  -     0    -2    -4    -6    -8
1  A    -2    -1     ?
2  A    -4
3  T    -6
4  C    -8
First option: entry (1,0) + match score = -2 + 3
= 1
  T A
  - A

36
         0     1     2     3     4
         -     T     A     C     T
0  -     0    -2    -4    -6    -8
1  A    -2    -1     ?
2  A    -4
3  T    -6
4  C    -8
Second option: entry (2,0) + gap score = -4 + (-2)
= -6
  T A -
  - - A

37
         0     1     2     3     4
         -     T     A     C     T
0  -     0    -2    -4    -6    -8
1  A    -2    -1     ?
2  A    -4
3  T    -6
4  C    -8
Third option: entry (1,1) + gap score = -1 + (-2)
= -3
  T A
  A -

38
         0     1     2     3     4
         -     T     A     C     T
0  -     0    -2    -4    -6    -8
1  A    -2    -1     1
2  A    -4
3  T    -6
4  C    -8
Choosing the option with the maximal score
  T A
  - A

39
         0     1     2     3     4
         -     T     A     C     T
0  -     0    -2    -4    -6    -8
1  A    -2    -1     1    -1    -3
2  A    -4
3  T    -6
4  C    -8
  T A C T
  - A - -

40
         0     1     2     3     4
         -     T     A     C     T
0  -     0    -2    -4    -6    -8
1  A    -2    -1     1    -1    -3
2  A    -4    -3
3  T    -6
4  C    -8
  T -        - T
  A A        A A

41
         0     1     2     3     4
         -     T     A     C     T
0  -     0    -2    -4    -6    -8
1  A    -2    -1     1    -1    -3
2  A    -4    -3     2     0
3  T    -6
4  C    -8
  T A C        T A C
  - A A        A A -

42
         0     1     2     3     4
         -     T     A     C     T
0  -     0    -2    -4    -6    -8
1  A    -2    -1     1    -1    -3
2  A    -4    -3     2     0    -2
3  T    -6    -1     0     1     3
4  C    -8    -3    -2     3     1

43
         0     1     2     3     4
         -     T     A     C     T
0  -     0    -2    -4    -6    -8
1  A    -2    -1     1    -1    -3
2  A    -4    -3     2     0    -2
3  T    -6    -1     0     1     3
4  C    -8    -3    -2     3     1
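
The matrix fill can be reproduced with a short script. This is a minimal sketch of the fill step under the example's scoring system (match 3, mismatch -1, constant gap -2), not a full alignment program; the function name `nw_fill` is hypothetical and the table is stored as `score[row][column]`.

```python
def nw_fill(top, left, match=3, mismatch=-1, gap=-2):
    """Fill the Needleman-Wunsch score table for two sequences."""
    rows, cols = len(left) + 1, len(top) + 1
    score = [[0] * cols for _ in range(rows)]
    # Initialization: first row and first column contain only gaps.
    for i in range(1, cols):
        score[0][i] = i * gap
    for j in range(1, rows):
        score[j][0] = j * gap
    # Fill from top left to bottom right.
    for j in range(1, rows):
        for i in range(1, cols):
            s = match if top[i - 1] == left[j - 1] else mismatch
            score[j][i] = max(
                score[j - 1][i - 1] + s,  # diagonal: match/mismatch
                score[j - 1][i] + gap,    # from above: gap in top sequence
                score[j][i - 1] + gap,    # from the left: gap in left sequence
            )
    return score


if __name__ == "__main__":
    for row in nw_fill("TACT", "AATC"):
        print(row)
    # prints:
    # [0, -2, -4, -6, -8]
    # [-2, -1, 1, -1, -3]
    # [-4, -3, 2, 0, -2]
    # [-6, -1, 0, 1, 3]
    # [-8, -3, -2, 3, 1]
```

The bottom-right value, 1, is the global alignment score for the example.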

44
         0     1     2     3     4
         -     T     A     C     T
0  -     0    -2
1  A          -1     1
2  A                 2     0
3  T                 0           3
4  C                       3     1
Three possible optimal alignments

45
         0     1     2     3     4
         -     T     A     C     T
0  -     0    -2
1  A                 1
2  A                       0
3  T                             3
4  C                             1
  T A C T -
  - A A T C

46
         0     1     2     3     4
         -     T     A     C     T
0  -     0
1  A          -1
2  A                 2
3  T                 0
4  C                       3     1
  T A - C T
  A A T C -

47
         0     1     2     3     4
         -     T     A     C     T
0  -     0
1  A          -1
2  A                 2     0
3  T                             3
4  C                             1
  T A C T -
  A A - T C
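
The traceback step can be sketched by walking back from the bottom-right cell and following every neighbor that could have produced the current score. This is a minimal illustration that enumerates all optimal alignments for short sequences, assuming the example's scoring system; the function name `needleman_wunsch_all` is hypothetical, and the exhaustive enumeration is only practical for small inputs.

```python
def needleman_wunsch_all(top, left, match=3, mismatch=-1, gap=-2):
    """Return the optimal global score and all optimal alignments."""
    rows, cols = len(left) + 1, len(top) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, cols):
        score[0][i] = i * gap
    for j in range(1, rows):
        score[j][0] = j * gap
    for j in range(1, rows):
        for i in range(1, cols):
            s = match if top[i - 1] == left[j - 1] else mismatch
            score[j][i] = max(score[j - 1][i - 1] + s,
                              score[j - 1][i] + gap,
                              score[j][i - 1] + gap)

    alignments = []

    def walk(j, i, top_aln, left_aln):
        # Reached entry (0,0): one complete alignment has been traced.
        if i == 0 and j == 0:
            alignments.append((top_aln, left_aln))
            return
        if i > 0 and j > 0:
            s = match if top[i - 1] == left[j - 1] else mismatch
            if score[j][i] == score[j - 1][i - 1] + s:
                walk(j - 1, i - 1, top[i - 1] + top_aln, left[j - 1] + left_aln)
        if i > 0 and score[j][i] == score[j][i - 1] + gap:
            # Horizontal move: gap in the left sequence.
            walk(j, i - 1, top[i - 1] + top_aln, "-" + left_aln)
        if j > 0 and score[j][i] == score[j - 1][i] + gap:
            # Vertical move: gap in the top sequence.
            walk(j - 1, i, "-" + top_aln, left[j - 1] + left_aln)

    walk(rows - 1, cols - 1, "", "")
    return score[-1][-1], alignments


if __name__ == "__main__":
    best, found = needleman_wunsch_all("TACT", "AATC")
    print(best)  # prints 1
    for top_aln, left_aln in found:
        print(top_aln, left_aln)
```

For the example sequences this yields the three alignments shown on the slides, each with score 1.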

48
Local Alignment Algorithm
  • Algorithm of Smith and Waterman (1981)
  • Makes an optimal alignment of the best segment of
    similarity between two sequences
  • Sequences that are not highly similar as a whole,
    but contain regions that are highly similar
  • Use when one sequence is short and the other is
    very long (e.g. database)
  • Can return a number of highly aligned segments
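
A minimal sketch of the standard Smith-Waterman recurrence follows. It assumes the same scoring as the global example (match 3, mismatch -1, constant gap -2) and returns only one best-scoring segment rather than several; the function name `smith_waterman` is hypothetical. The differences from the global algorithm are that cell scores are never allowed to drop below 0 and that traceback starts at the highest-scoring cell and stops at a 0.

```python
def smith_waterman(a, b, match=3, mismatch=-1, gap=-2):
    """Return (best score, aligned segment of a, aligned segment of b)."""
    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # The extra 0 lets a local alignment start anywhere.
            h[i][j] = max(0,
                          h[i - 1][j - 1] + s,
                          h[i - 1][j] + gap,
                          h[i][j - 1] + gap)
            if h[i][j] > best:
                best, best_pos = h[i][j], (i, j)

    # Trace back from the best cell until a score of 0 is reached.
    i, j = best_pos
    aln_a, aln_b = "", ""
    while i > 0 and j > 0 and h[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if h[i][j] == h[i - 1][j - 1] + s:
            aln_a, aln_b = a[i - 1] + aln_a, b[j - 1] + aln_b
            i, j = i - 1, j - 1
        elif h[i][j] == h[i - 1][j] + gap:
            aln_a, aln_b = a[i - 1] + aln_a, "-" + aln_b
            i -= 1
        else:
            aln_a, aln_b = "-" + aln_a, b[j - 1] + aln_b
            j -= 1
    return best, aln_a, aln_b


if __name__ == "__main__":
    print(smith_waterman("TACT", "AATC"))  # prints (5, 'ACT', 'AAT')
```

For the example sequences the best local segment pairs ACT with AAT and scores 5, higher than the global score of 1, because the poorly matching ends are left out.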

54
Does a local alignment program always produce a
local alignment, and does a global alignment program
always produce a global alignment?
  • Although a computer program based on the
    Smith-Waterman local alignment algorithm is used
    for producing an optimal alignment, this does not
    assure that a local alignment will be produced.
  • The scoring matrix or match/mismatch scores and
    the gap penalties chosen also influence whether
    or not a local alignment is obtained.
  • The same is true of the Needleman-Wunsch
    algorithm.

55
  • If the matched regions are long, cover most of
    the sequences and depend on the presence of many
    gaps, the alignment is global.
  • A local alignment tends to be shorter and does
    not include many gaps.

56
Tools based on Dynamic programming
  • Global alignment:
  • GAP: no penalties for terminal gaps, so it suits
    sequences of unequal length.
  • Local alignment:
  • SIM, SSEARCH and LALIGN

57
Multiple Sequence Alignment
  • It is theoretically possible to use dynamic
    programming to align any number of sequences, as
    for pairwise alignment.
  • The amount of computing time increases
    exponentially as the number of sequences
    increases.
  • Therefore full dynamic programming cannot be
    applied to datasets having more than ten
    sequences.
  • So heuristic methods are used for MSA.
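
A rough illustration of the exponential growth, assuming the straightforward generalization in which N sequences are aligned in an N-dimensional score table with one axis per sequence. The function name `dp_cells` is hypothetical, and the count covers only the number of table cells, not the work done per cell.

```python
def dp_cells(lengths):
    """Number of cells in a full DP table for sequences of these lengths.

    Each sequence contributes one axis of size (length + 1), so the
    table size is the product of these sizes.
    """
    cells = 1
    for n in lengths:
        cells *= n + 1
    return cells


if __name__ == "__main__":
    print(dp_cells([4, 4]))     # prints 25 (the 5 x 5 example matrix)
    print(dp_cells([100] * 2))  # prints 10201
    print(dp_cells([100] * 3))  # prints 1030301
```

Each added sequence of length 100 multiplies the table size by about 100, which is why full dynamic programming quickly becomes impractical for multiple alignment.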

58
Thank you