Title: Sequence comparison: Local alignment
1. Sequence comparison: Local alignment
- Genome 559: Introduction to Statistical and Computational Genomics
- Prof. William Stafford Noble
2. One-minute responses
- It would be helpful to somehow get the solutions for the sample problems in our lecture printouts.
  - You can see the solutions by visiting the class web page and opening the slides there.
- I am still a little confused about the difference between strings and lists.
  - A string is like a tuple of characters. Unlike a string, a list (1) is mutable and (2) may contain other objects besides characters.
- The Biotechniques paper went into a lot of detail -- how much of this should we understand?
  - I intend the paper to provide background for those who are interested. You should be sure you understand just what I go over in lecture.
- I am slightly worried because I never seem to do things in the most straightforward way.
  - This just takes practice. Often, there is no single best way.
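The answer about strings and lists can be checked directly in Python. This is a minimal sketch; the variable names (`dna_string`, `dna_list`, `mixed`) are made up for illustration and do not come from the lecture.

```python
# Illustrative names only; not taken from the lecture.
dna_string = "AAG"
dna_list = list(dna_string)   # a list holding the same three characters

# (1) A list is mutable: item assignment changes it in place.
dna_list[0] = "C"

# A string is immutable: item assignment raises TypeError.
try:
    dna_string[0] = "C"
    string_is_mutable = True
except TypeError:
    string_is_mutable = False

# (2) A list may contain objects other than characters.
mixed = ["AAG", 3, 2.5, ["nested", "list"]]

print(dna_list)
print(string_is_mutable)
print(len(mixed))
```

To change a string, build a new one instead, for example with slicing and concatenation.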
3. One-minute responses
- There was perhaps a bit too much programming in this class.
- There was more class time for Python, which was nice.
- I really liked the sample problem times.
- Problem set is very reasonable.
- The examples and practice are most useful teaching methods for me at least. I am getting comfortable with the code through practice.
- I like the sample problems. In the last few classes I felt rushed to finish them, but this time I was able to do all 3. It's very satisfying when they work.
- I had somewhat more difficulty with today's exercises. I think it was due to the inherent complexity of adding new types to the repertoire.
- Class moved at a good speed today.
- I enjoyed the pace today.
- Today's pace was good.
- The pace was good -- it was helpful for me to have more time for problems.
- Good pace.
- Programming problems were a good speed today.
- The biostats portion was a little fast but manageable.
4. One-minute responses
- The cheat sheet really helped.
- I really liked the list of operations and methods on the back of the lecture notes.
- Lists of commands in slides were helpful.
- Reviewing the DP matrix was very helpful.
- I'm glad we reviewed the Needleman-Wunsch algorithm.
- The traceback review helped me realize I'd forgotten how to do it.
5. Local alignment
- A single-domain protein may be homologous to a region within a multi-domain protein.
- Usually, an alignment that spans the complete length of both sequences is not required.
6. BLAST allows local alignments
[Figure: global alignment compared with local alignment]
7. Global alignment DP
- Align sequences x and y.
- F is the DP matrix, s is the substitution matrix, and d is the linear gap penalty.
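The definitions of F, s and d above can be turned into a short sketch of the global DP fill. It assumes the standard Needleman-Wunsch recurrence, a gap penalty d that is a negative number added once per gap position (as in the d = -5 examples in this lecture), and the substitution scores used in those examples. The names `S` and `global_dp_matrix` are hypothetical. Rows follow y and columns follow x, matching the layout of the matrices on the slides.

```python
# Substitution scores from the lecture's examples (symmetric).
S = {
    "A": {"A": 2, "C": -7, "G": -5, "T": -7},
    "C": {"A": -7, "C": 2, "G": -7, "T": -5},
    "G": {"A": -5, "C": -7, "G": 2, "T": -7},
    "T": {"A": -7, "C": -5, "G": -7, "T": 2},
}


def global_dp_matrix(x, y, s, d):
    """Fill the global alignment DP matrix F.

    F[r][c] is the best score for aligning the first r letters of y
    with the first c letters of x. d is negative and is added per gap.
    """
    rows, cols = len(y) + 1, len(x) + 1
    F = [[0] * cols for _ in range(rows)]
    # The first row and column represent alignments that start with gaps.
    for c in range(1, cols):
        F[0][c] = F[0][c - 1] + d
    for r in range(1, rows):
        F[r][0] = F[r - 1][0] + d
    for r in range(1, rows):
        for c in range(1, cols):
            F[r][c] = max(
                F[r - 1][c - 1] + s[x[c - 1]][y[r - 1]],  # align two letters
                F[r - 1][c] + d,                          # gap in x
                F[r][c - 1] + d,                          # gap in y
            )
    return F


F = global_dp_matrix("AAG", "AGC", S, -5)
for row in F:
    print(row)
```

The bottom-right entry is the score of the best global alignment. Scores in this matrix can be negative, which is the main contrast with the local version.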
8. Local alignment DP
- Align sequences x and y.
- F is the DP matrix, s is the substitution matrix, and d is the linear gap penalty.
9. Local DP in equation form
$$F(i,j) = \max \{\, 0,\; F(i-1,j-1) + s(x_i, y_j),\; F(i-1,j) + d,\; F(i,j-1) + d \,\}$$
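The local DP fill can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the course's reference code: it takes the local recurrence to be the global one with an extra candidate of 0, treats d as a negative number that is added, and uses the substitution scores from this lecture's examples. The names `S` and `local_dp_matrix` are hypothetical. Rows follow y and columns follow x, as in the matrices on the slides.

```python
# Substitution scores from the lecture's examples (symmetric).
S = {
    "A": {"A": 2, "C": -7, "G": -5, "T": -7},
    "C": {"A": -7, "C": 2, "G": -7, "T": -5},
    "G": {"A": -5, "C": -7, "G": 2, "T": -7},
    "T": {"A": -7, "C": -5, "G": -7, "T": 2},
}


def local_dp_matrix(x, y, s, d):
    """Fill the local alignment DP matrix F (rows follow y, columns follow x)."""
    rows, cols = len(y) + 1, len(x) + 1
    # The first row and column stay at 0: an alignment may start anywhere.
    F = [[0] * cols for _ in range(rows)]
    for r in range(1, rows):
        for c in range(1, cols):
            F[r][c] = max(
                0,                                        # start a new alignment
                F[r - 1][c - 1] + s[x[c - 1]][y[r - 1]],  # align two letters
                F[r - 1][c] + d,                          # gap in x
                F[r][c - 1] + d,                          # gap in y
            )
    return F


F = local_dp_matrix("AAG", "AGC", S, -5)
for row in F:
    print(row)
```

For AAG and AGC with d = -5, the largest entry is 4, in the row for G and the last column.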
10. A simple example
Find the optimal local alignment of AAG and AGC. Use a gap penalty of d = -5.
Substitution matrix:
     A   C   G   T
A    2  -7  -5  -7
C   -7   2  -7  -5
G   -5  -7   2  -7
T   -7  -5  -7   2
DP matrix:
         A   A   G
A
G
C
11. A simple example
Find the optimal local alignment of AAG and AGC. Use a gap penalty of d = -5.
Substitution matrix:
     A   C   G   T
A    2  -7  -5  -7
C   -7   2  -7  -5
G   -5  -7   2  -7
T   -7  -5  -7   2
DP matrix:
         A   A   G
     0   0   0   0
A    0
G    0
C    0
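12. A simple example
Find the optimal local alignment of AAG and AGC. Use a gap penalty of d = -5.
Substitution matrix:
     A   C   G   T
A    2  -7  -5  -7
C   -7   2  -7  -5
G   -5  -7   2  -7
T   -7  -5  -7   2
DP matrix:
         A   A   G
     0   0   0   0
A    0
G    0
C    0
Candidate scores for the first cell (A aligned with A): 0, 2 (diagonal: 0 + 2), -5 (from above: 0 - 5), and -5 (from the left: 0 - 5).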
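The choice for the first cell can be written out as a small calculation. The sketch below assumes the local recurrence takes the maximum over 0, the diagonal move, and the two gap moves, with d = -5 added for a gap. The variable names are hypothetical.

```python
d = -5        # linear gap penalty from the example
s_match = 2   # substitution score for A aligned with A

# The three neighbours of the first cell are all initialised to 0.
diagonal, above, left = 0, 0, 0

candidates = {
    "zero": 0,
    "diagonal": diagonal + s_match,
    "above": above + d,
    "left": left + d,
}

# Pick the move with the highest candidate score.
best_move = max(candidates, key=candidates.get)
best_score = candidates[best_move]

print(candidates)
print(best_move, best_score)
```

The diagonal move wins with a score of 2, so 2 is the value entered in the first cell.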
13. A simple example
Find the optimal local alignment of AAG and AGC. Use a gap penalty of d = -5.
Substitution matrix:
     A   C   G   T
A    2  -7  -5  -7
C   -7   2  -7  -5
G   -5  -7   2  -7
T   -7  -5  -7   2
DP matrix:
         A   A   G
     0   0   0   0
A    0   2
G    0   ?
C    0   ?
14. A simple example
Find the optimal local alignment of AAG and AGC. Use a gap penalty of d = -5.
Substitution matrix:
     A   C   G   T
A    2  -7  -5  -7
C   -7   2  -7  -5
G   -5  -7   2  -7
T   -7  -5  -7   2
DP matrix:
         A   A   G
     0   0   0   0
A    0   2   ?   ?
G    0   0   ?   ?
C    0   0   ?   ?
15. A simple example
Find the optimal local alignment of AAG and AGC. Use a gap penalty of d = -5.
Substitution matrix:
     A   C   G   T
A    2  -7  -5  -7
C   -7   2  -7  -5
G   -5  -7   2  -7
T   -7  -5  -7   2
DP matrix:
         A   A   G
     0   0   0   0
A    0   2   2   0
G    0   0   0   4
C    0   0   0   0
16. Local alignment
- Two differences with respect to global alignment:
  - No score is negative.
  - Traceback begins at the highest score in the matrix and continues until you reach 0.
- Global alignment algorithm: Needleman-Wunsch.
- Local alignment algorithm: Smith-Waterman.
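The two differences listed above (scores floored at 0, traceback from the highest score until a 0 is reached) can be sketched together in Python. This is one possible implementation under stated assumptions: d is negative and added per gap, the substitution scores are those from this lecture's examples, and ties are broken in favour of the diagonal move. The names `S` and `local_align` are hypothetical.

```python
# Substitution scores from the lecture's examples (symmetric).
S = {
    "A": {"A": 2, "C": -7, "G": -5, "T": -7},
    "C": {"A": -7, "C": 2, "G": -7, "T": -5},
    "G": {"A": -5, "C": -7, "G": 2, "T": -7},
    "T": {"A": -7, "C": -5, "G": -7, "T": 2},
}


def local_align(x, y, s, d):
    """Return (score, aligned_x, aligned_y) for one optimal local alignment."""
    rows, cols = len(y) + 1, len(x) + 1
    F = [[0] * cols for _ in range(rows)]
    best_score, best_cell = 0, (0, 0)

    # Fill: same as the global recurrence, but no score drops below 0.
    for r in range(1, rows):
        for c in range(1, cols):
            F[r][c] = max(
                0,
                F[r - 1][c - 1] + s[x[c - 1]][y[r - 1]],
                F[r - 1][c] + d,
                F[r][c - 1] + d,
            )
            if F[r][c] > best_score:
                best_score, best_cell = F[r][c], (r, c)

    # Traceback: start at the highest score and stop at the first 0.
    r, c = best_cell
    top, bottom = [], []
    while F[r][c] > 0:
        if F[r][c] == F[r - 1][c - 1] + s[x[c - 1]][y[r - 1]]:
            top.append(x[c - 1])
            bottom.append(y[r - 1])
            r, c = r - 1, c - 1
        elif F[r][c] == F[r - 1][c] + d:
            top.append("-")            # gap in x
            bottom.append(y[r - 1])
            r -= 1
        else:
            top.append(x[c - 1])
            bottom.append("-")         # gap in y
            c -= 1
    return best_score, "".join(reversed(top)), "".join(reversed(bottom))


print(local_align("AAG", "AGC", S, -5))
print(local_align("AAG", "GAAGGC", S, -5))
```

When several cells share the highest score, this sketch keeps the first one found in row order; other tie-breaking rules would give other, equally optimal alignments.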
17. A simple example
Find the optimal local alignment of AAG and AGC. Use a gap penalty of d = -5.
Substitution matrix:
     A   C   G   T
A    2  -7  -5  -7
C   -7   2  -7  -5
G   -5  -7   2  -7
T   -7  -5  -7   2
DP matrix:
         A   A   G
     0   0   0   0
A    0   2   2   0
G    0   0   0   4
C    0   0   0   0
Optimal local alignment:
AG
AG
18. Local alignment
Find the optimal local alignment of AAG and GAAGGC. Use a gap penalty of d = -5.
Substitution matrix:
     A   C   G   T
A    2  -7  -5  -7
C   -7   2  -7  -5
G   -5  -7   2  -7
T   -7  -5  -7   2
DP matrix:
         A   A   G
     0   0   0   0
G    0
A    0
A    0
G    0
G    0
C    0
19. Local alignment
Find the optimal local alignment of AAG and GAAGGC. Use a gap penalty of d = -5.
Substitution matrix:
     A   C   G   T
A    2  -7  -5  -7
C   -7   2  -7  -5
G   -5  -7   2  -7
T   -7  -5  -7   2
DP matrix:
         A   A   G
     0   0   0   0
G    0   0   0   2
A    0   2   2   0
A    0   2   4   0
G    0   0   0   6
G    0   0   0   2
C    0   0   0   0
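Once the matrix is filled, the traceback starts at its largest entry. The sketch below copies the completed matrix for AAG and GAAGGC into a list of lists and locates that entry. The variable names are hypothetical, and the approach assumes the maximum is unique, as it is here.

```python
x, y = "AAG", "GAAGGC"

# Completed DP matrix for this example; row 0 and column 0 are the
# initial zeros, rows follow y and columns follow x.
F = [
    [0, 0, 0, 0],
    [0, 0, 0, 2],
    [0, 2, 2, 0],
    [0, 2, 4, 0],
    [0, 0, 0, 6],
    [0, 0, 0, 2],
    [0, 0, 0, 0],
]

# Scan every cell and keep the one with the highest score.
best_score, best_row, best_col = max(
    (score, r, c) for r, row in enumerate(F) for c, score in enumerate(row)
)

print(best_score, best_row, best_col)
print(y[best_row - 1], x[best_col - 1])
```

The highest score is 6, where the second G of GAAGGC meets the G of AAG, so the optimal local alignment ends at that pair of letters.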
20. Local alignment
Find the optimal local alignment of AAG and GAAGGC. Use a gap penalty of d = -5.
Substitution matrix:
     A   C   G   T
A    2  -7  -5  -7
C   -7   2  -7  -5
G   -5  -7   2  -7
T   -7  -5  -7   2
DP matrix:
         A   A   G
     0   0   0   0
G    0   0   0   2
A    0   2   2   0
A    0   2   4   0
G    0   0   0   6
G    0   0   0   2
C    0   0   0   0
Optimal local alignment:
AAG
AAG
21. Summary
- Local alignment finds the best match between subsequences.
- Smith-Waterman local alignment algorithm:
  - No score is negative.
  - Trace back from the largest score in the matrix.