Title: Sequence comparison: More dynamic programming
1. Sequence comparison: More dynamic programming
- Genome 559: Introduction to Statistical and Computational Genomics
- Prof. William Stafford Noble
2. One-minute responses
- WAY TOO FAST. Please walk around more during sample problems. I was
  completely lost.
- Today I felt a bit lost. Most times I was still trying to figure out
  one slide or problem, while the class was on the next one.
- It was fast today, but after the reading I was prepared to take
  things more quickly and I understood things much better today.
- I enjoyed class today. I thought it moved at a great pace.
- I thought the pace was good today.
- I liked the pace of the lecture even though you said we spent too
  much time on the dynamic programming, it gave me time to understand.
- The pace is great and gives me time to explore.
- I thought this lecture built nicely on the last lecture. I struggled
  last class but it clicked today.
3. One-minute responses
- The matrix exercise was very helpful, even though I'm not fully clear
  on how it works yet.
- I found today's class time much more understandable.
- I struggled a little bit to grasp the matrix, but by the end I had
  it. The pace and numerous examples helped.
- The DP matrix was simple to grasp after computing one or two matrix
  values, so the portion of the lecture could go faster.
- I like the sample problems.
- Dynamic programming reminded me of sudoku, which was fun.
- Going through the alignment table helped a lot.
- It was nice to do examples with DNA sequences.
- I'm feeling a lot better about it all. I really like going through
  examples.
- Again, the small steps with programming problems helped, although the
  first problem was overly challenging (when explained in a different
  way it was fine).
- I was a little confused when writing the program. I think more
  practice is required. The practice problems will help.
- Today's class was much better since we had appropriate reading first.
  The sample problems were interesting since they actually relate to
  biology.
4. One-minute responses
- Is there a place to get more samples of simple code to use to help
  see patterns of how this works? Or is there plenty in the book?
  - There are lots of examples in the book. And of course, you can
    easily find lots of examples on the web. For a reference book with
    examples, try Python Cookbook, by Martelli, Ravenscroft and Ascher.
- I'm a little fuzzy about how dynamic programming differs from other
  sorts of programming, but everything else was really clear.
  - The term "dynamic programming" dates from the 1950s, when
    "programming" meant planning or scheduling. There is no
    relationship between this use of the word "programming" and what
    we are learning to do in Python.
5. DP matrix
          G    A    A    T    C
     0   -4   -8  -12  -16  -20
C   -4   -5   -9  -13  -12   -6
A   -8   -4    5    1   -3   -7
T  -12   -8    1    0   11    7
A  -16  -12    2   11    7    6
C  -20  -16   -2    7   11   17
6. Three legal moves
- A diagonal move aligns a character from the left sequence with a
  character from the top sequence.
- A vertical move introduces a gap in the sequence along the top edge.
- A horizontal move introduces a gap in the sequence along the left
  edge.
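As a minimal sketch of how the three moves build an alignment, the Python function below turns a string of moves into two aligned rows. The function name `apply_moves` and the one-letter move codes (D, V, H) are illustrative assumptions, not part of the lecture.

```python
def apply_moves(top, left, moves):
    """Build an alignment from moves: D = diagonal, V = vertical, H = horizontal.

    The move codes are a hypothetical encoding chosen for this sketch.
    """
    i = j = 0  # i counts characters used from `left`, j from `top`
    top_row, left_row = [], []
    for move in moves:
        if move == "D":    # align one character from each sequence
            top_row.append(top[j])
            left_row.append(left[i])
            i += 1
            j += 1
        elif move == "V":  # gap in the sequence along the top edge
            top_row.append("-")
            left_row.append(left[i])
            i += 1
        elif move == "H":  # gap in the sequence along the left edge
            top_row.append(top[j])
            left_row.append("-")
            j += 1
        else:
            raise ValueError(f"unknown move: {move!r}")
    return "".join(top_row), "".join(left_row)


print(apply_moves("GAATC", "CATAC", "DDVDHD"))  # ('GA-ATC', 'CATA-C')
```

Reading the moves from the upper left to the lower right gives the alignment columns in order; a traceback visits the same moves in reverse.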
7. DP matrix
GA-ATC
CATA-C
          G    A    A    T    C
     0   -4   -8  -12  -16  -20
C   -4   -5   -9  -13  -12   -6
A   -8   -4    5    1   -3   -7
T  -12   -8    1    0   11    7
A  -16  -12    2   11    7    6
C  -20  -16   -2    7   11   17
8. DP matrix
GAAT-C
CA-TAC
          G    A    A    T    C
     0   -4   -8  -12  -16  -20
C   -4   -5   -9  -13  -12   -6
A   -8   -4    5    1   -3   -7
T  -12   -8    1    0   11    7
A  -16  -12    2   11    7    6
C  -20  -16   -2    7   11   17
9. DP matrix
GAAT-C
C-ATAC
          G    A    A    T    C
     0   -4   -8  -12  -16  -20
C   -4   -5   -9  -13  -12   -6
A   -8   -4    5    1   -3   -7
T  -12   -8    1    0   11    7
A  -16  -12    2   11    7    6
C  -20  -16   -2    7   11   17
10. DP matrix
GAAT-C
-CATAC
          G    A    A    T    C
     0   -4   -8  -12  -16  -20
C   -4   -5   -9  -13  -12   -6
A   -8   -4    5    1   -3   -7
T  -12   -8    1    0   11    7
A  -16  -12    2   11    7    6
C  -20  -16   -2    7   11   17
11. Multiple solutions
- When a program returns a sequence alignment, it may not be the only
  best alignment.
GA-ATC   GAAT-C   GAAT-C   GAAT-C
CATA-C   CA-TAC   C-ATAC   -CATAC
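One way to check that these four alignments tie is to score each one column by column. The sketch below assumes a scoring scheme inferred from the entries of the DP matrix shown earlier (10 for a match, -5 for the G/C and A/C mismatches, -4 per gap); this section does not state the scheme explicitly, and `score_alignment` is a hypothetical helper.

```python
GAP = -4  # assumed: inferred from the first row of the DP matrix (0, -4, -8, ...)

# Assumed substitution scores, inferred from matrix entries.
# Only the pairs that occur in the four alignments are listed.
SUB = {
    ("A", "A"): 10,
    ("C", "C"): 10,
    ("T", "T"): 10,
    ("G", "C"): -5,
    ("A", "C"): -5,
}


def sub_score(a, b):
    """Look up a substitution score, treating the table as symmetric."""
    if (a, b) in SUB:
        return SUB[(a, b)]
    return SUB[(b, a)]  # raises KeyError for pairs not listed above


def score_alignment(top_row, left_row):
    """Sum the column scores of an alignment given as two equal-length rows."""
    total = 0
    for a, b in zip(top_row, left_row):
        if a == "-" or b == "-":
            total += GAP
        else:
            total += sub_score(a, b)
    return total


alignments = [
    ("GA-ATC", "CATA-C"),
    ("GAAT-C", "CA-TAC"),
    ("GAAT-C", "C-ATAC"),
    ("GAAT-C", "-CATAC"),
]
for top_row, left_row in alignments:
    print(top_row, left_row, score_alignment(top_row, left_row))
```

Under these assumed scores every alignment totals 17, the value in the lower right corner of the matrix.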
12. DP in equation form
- Align sequences x and y.
- F is the DP matrix.
- s is the substitution matrix.
- d is the linear gap penalty.
13. DP in equation form
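A standard form of the global alignment recurrence, consistent with the definitions of F, s, and d, is sketched below. It assumes that d is a negative number added once per gap position (as with d = -5 in the example that follows); the notation on the original slide may differ.

```latex
\begin{aligned}
F(0,0) &= 0, \qquad F(i,0) = i\,d, \qquad F(0,j) = j\,d \\
F(i,j) &= \max
\begin{cases}
F(i-1,\,j-1) + s(x_i, y_j) \\
F(i-1,\,j) + d \\
F(i,\,j-1) + d
\end{cases}
\end{aligned}
```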
14. A simple example
Find the optimal alignment of AAG and AGC. Use a gap penalty of d = -5.
Substitution matrix:
     A    C    G    T
A    2   -7   -5   -7
C   -7    2   -7   -5
G   -5   -7    2   -7
T   -7   -5   -7    2
DP matrix:
          A    A    G
     .    .    .    .
A    .    .    .    .
G    .    .    .    .
C    .    .    .    .
15. A simple example
Find the optimal alignment of AAG and AGC. Use a gap penalty of d = -5.
Substitution matrix:
     A    C    G    T
A    2   -7   -5   -7
C   -7    2   -7   -5
G   -5   -7    2   -7
T   -7   -5   -7    2
DP matrix:
          A    A    G
     0    .    .    .
A    .    .    .    .
G    .    .    .    .
C    .    .    .    .
16. A simple example
Find the optimal alignment of AAG and AGC. Use a gap penalty of d = -5.
Substitution matrix:
     A    C    G    T
A    2   -7   -5   -7
C   -7    2   -7   -5
G   -5   -7    2   -7
T   -7   -5   -7    2
DP matrix:
          A    A    G
     0   -5  -10  -15
A   -5    .    .    .
G  -10    .    .    .
C  -15    .    .    .
17. A simple example
Find the optimal alignment of AAG and AGC. Use a gap penalty of d = -5.
Substitution matrix:
     A    C    G    T
A    2   -7   -5   -7
C   -7    2   -7   -5
G   -5   -7    2   -7
T   -7   -5   -7    2
DP matrix:
          A    A    G
     0   -5  -10  -15
A   -5    2   -3   -8
G  -10   -3   -3   -1
C  -15   -8   -8   -6
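The filled matrix can be reproduced with a short Python sketch. It assumes that d is added (as a negative number) for every gap and that the left sequence indexes rows while the top sequence indexes columns; `fill_matrix` is an illustrative name, not code from the course.

```python
BASES = "ACGT"
# Substitution matrix from the slide, one row per base in A, C, G, T order.
ROWS = [
    [2, -7, -5, -7],
    [-7, 2, -7, -5],
    [-5, -7, 2, -7],
    [-7, -5, -7, 2],
]
S = {base: dict(zip(BASES, row)) for base, row in zip(BASES, ROWS)}


def fill_matrix(top, left, s, d):
    """Return the DP matrix F; rows follow `left`, columns follow `top`."""
    n_rows, n_cols = len(left) + 1, len(top) + 1
    F = [[0] * n_cols for _ in range(n_rows)]
    # First row and first column: only gaps are possible.
    for j in range(1, n_cols):
        F[0][j] = F[0][j - 1] + d
    for i in range(1, n_rows):
        F[i][0] = F[i - 1][0] + d
    # Every other cell takes the best of the three legal moves.
    for i in range(1, n_rows):
        for j in range(1, n_cols):
            diagonal = F[i - 1][j - 1] + s[left[i - 1]][top[j - 1]]
            vertical = F[i - 1][j] + d
            horizontal = F[i][j - 1] + d
            F[i][j] = max(diagonal, vertical, horizontal)
    return F


for row in fill_matrix("AAG", "AGC", S, -5):
    print(row)
```

The printed rows match the matrix above, with the optimal score of -6 in the lower right corner.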
18. Traceback
- Start from the lower right corner and trace back to the upper left.
- Each arrow introduces one character at the end of each aligned
  sequence.
- A horizontal move puts a gap in the left sequence.
- A vertical move puts a gap in the top sequence.
- A diagonal move uses one character from each sequence.
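The traceback rules above can be sketched in Python as follows. The sketch assumes the filled matrix from the AAG/AGC example, d = -5 added per gap, and a fixed tie-breaking order (diagonal, then vertical, then horizontal) that the slides do not specify; `traceback` is an illustrative name.

```python
BASES = "ACGT"
ROWS = [[2, -7, -5, -7], [-7, 2, -7, -5], [-5, -7, 2, -7], [-7, -5, -7, 2]]
S = {base: dict(zip(BASES, row)) for base, row in zip(BASES, ROWS)}
D = -5

# Filled DP matrix for top = "AAG", left = "AGC" (copied from the example).
F = [
    [0, -5, -10, -15],
    [-5, 2, -3, -8],
    [-10, -3, -3, -1],
    [-15, -8, -8, -6],
]


def traceback(F, top, left, s, d):
    """Return one optimal alignment as (top_row, left_row)."""
    i, j = len(left), len(top)  # start in the lower right corner
    top_row, left_row = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s[left[i - 1]][top[j - 1]]:
            # Diagonal move: one character from each sequence.
            top_row.append(top[j - 1])
            left_row.append(left[i - 1])
            i -= 1
            j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + d:
            # Vertical move: gap in the top sequence.
            top_row.append("-")
            left_row.append(left[i - 1])
            i -= 1
        else:
            # Horizontal move: gap in the left sequence.
            # In a correctly filled matrix this is the only option left.
            top_row.append(top[j - 1])
            left_row.append("-")
            j -= 1
    # Columns were collected from the end of the alignment, so reverse them.
    return "".join(reversed(top_row)), "".join(reversed(left_row))


print(traceback(F, "AAG", "AGC", S, D))
```

With this tie-breaking order the sketch returns a single alignment; a different order can return a different, equally good one.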
19. A simple example
Find the optimal alignment of AAG and AGC. Use a gap penalty of d = -5.
- Start from the lower right corner and trace back to the upper left.
- Each arrow introduces one character at the end of each aligned
  sequence.
- A horizontal move puts a gap in the left sequence.
- A vertical move puts a gap in the top sequence.
- A diagonal move uses one character from each sequence.
          A    A    G
     0   -5    .    .
A    .    2   -3    .
G    .    .    .   -1
C    .    .    .   -6
20. A simple example
Find the optimal alignment of AAG and AGC. Use a gap penalty of d = -5.
- Start from the lower right corner and trace back to the upper left.
- Each arrow introduces one character at the end of each aligned
  sequence.
- A horizontal move puts a gap in the left sequence.
- A vertical move puts a gap in the top sequence.
- A diagonal move uses one character from each sequence.
          A    A    G
     0   -5    .    .
A    .    2   -3    .
G    .    .    .   -1
C    .    .    .   -6
AAG-   AAG-
-AGC   A-GC
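Both alignments can be recovered by following every move that is consistent with the matrix, rather than only the first one found. The recursive generator below is a sketch under the same assumptions as the example (d = -5 added per gap, rows for the left sequence, columns for the top sequence); `all_tracebacks` and its optional `i`, `j` arguments are hypothetical names.

```python
BASES = "ACGT"
ROWS = [[2, -7, -5, -7], [-7, 2, -7, -5], [-5, -7, 2, -7], [-7, -5, -7, 2]]
S = {base: dict(zip(BASES, row)) for base, row in zip(BASES, ROWS)}
D = -5

# Filled DP matrix for top = "AAG", left = "AGC" (copied from the example).
F = [
    [0, -5, -10, -15],
    [-5, 2, -3, -8],
    [-10, -3, -3, -1],
    [-15, -8, -8, -6],
]


def all_tracebacks(F, top, left, s, d, i=None, j=None):
    """Yield every (top_row, left_row) alignment that explains F[i][j].

    By default the traceback starts in the lower right corner.
    """
    if i is None or j is None:
        i, j = len(left), len(top)
    if i == 0 and j == 0:
        yield "", ""  # reached the upper left corner
        return
    if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s[left[i - 1]][top[j - 1]]:
        for a, b in all_tracebacks(F, top, left, s, d, i - 1, j - 1):
            yield a + top[j - 1], b + left[i - 1]  # diagonal move
    if i > 0 and F[i][j] == F[i - 1][j] + d:
        for a, b in all_tracebacks(F, top, left, s, d, i - 1, j):
            yield a + "-", b + left[i - 1]  # vertical move: gap in top
    if j > 0 and F[i][j] == F[i][j - 1] + d:
        for a, b in all_tracebacks(F, top, left, s, d, i, j - 1):
            yield a + top[j - 1], b + "-"  # horizontal move: gap in left
    

for top_row, left_row in all_tracebacks(F, "AAG", "AGC", S, D):
    print(top_row)
    print(left_row)
    print()
```

The branch point is the cell holding -3 in row A: it can be reached by a diagonal move from -5 or by a horizontal move from 2, which is why two alignments share the optimal score.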
21. Traceback problem 1
          G    A    A    T    C
     0   -4   -8  -12  -16  -20
C   -4   -5   -9  -13  -12   -6
A   -8   -4  [5]    1   -3   -7
T  -12   -8    1    0   11    7
A  -16  -12    2   11    7    6
C  -20  -16   -2    7   11   17
Write down the alignment corresponding to the circled score (shown in
brackets).
22. Solution 1
GA
CA
          G    A    A    T    C
     0   -4   -8  -12  -16  -20
C   -4   -5   -9  -13  -12   -6
A   -8   -4  [5]    1   -3   -7
T  -12   -8    1    0   11    7
A  -16  -12    2   11    7    6
C  -20  -16   -2    7   11   17
Write down the alignment corresponding to the circled score (shown in
brackets).
23. Traceback problem 2
          G    A    A    T    C
     0   -4   -8  -12  -16  -20
C   -4   -5   -9  -13  -12   -6
A   -8   -4    5    1   -3 [-7]
T  -12   -8    1    0   11    7
A  -16  -12    2   11    7    6
C  -20  -16   -2    7   11   17
Write down three alignments corresponding to the circled score (shown
in brackets).
24. Solution 2
GAATC
CA---
          G    A    A    T    C
     0   -4   -8  -12  -16  -20
C   -4   -5   -9  -13  -12   -6
A   -8   -4    5    1   -3 [-7]
T  -12   -8    1    0   11    7
A  -16  -12    2   11    7    6
C  -20  -16   -2    7   11   17
Write down three alignments corresponding to the circled score (shown
in brackets).
25. Solution 2
GAATC   GAATC
CA---   C-A--
          G    A    A    T    C
     0   -4   -8  -12  -16  -20
C   -4   -5   -9  -13  -12   -6
A   -8   -4    5    1   -3 [-7]
T  -12   -8    1    0   11    7
A  -16  -12    2   11    7    6
C  -20  -16   -2    7   11   17
Write down three alignments corresponding to the circled score (shown
in brackets).
26. Solution 2
GAATC   GAATC   GAATC
CA---   C-A--   -CA--
          G    A    A    T    C
     0   -4   -8  -12  -16  -20
C   -4   -5   -9  -13  -12   -6
A   -8   -4    5    1   -3 [-7]
T  -12   -8    1    0   11    7
A  -16  -12    2   11    7    6
C  -20  -16   -2    7   11   17
Write down three alignments corresponding to the circled score (shown
in brackets).