Biological Sequence Comparison Database Homology Searching

Provided by: aoifemc

Transcript and Presenter's Notes

1
Biological Sequence Comparison / Database
Homology Searching

2
Database Homology Searching

Use algorithms to increase efficiency and to
provide a mathematical basis for searches, which
can be translated into statistical significance
Assumes that sequence, structure and function are
inter-related
BLAST (Basic Local Alignment Search Tool) and
FASTA (FAST-All) are heuristic approximations of
the Needleman-Wunsch and Smith-Waterman
algorithms that reduce computation

3
Needleman-Wunsch Algorithm

General algorithm for sequence comparison
Maximise a similarity score, to give maximum
match
Maximum match = largest number of residues of one
sequence that can be matched with another,
allowing for all possible deletions
Finds the best GLOBAL alignment of any two
sequences
N-W involves an iterative matrix method of
calculation
All possible pairs of residues (bases or amino
acids) - one from each sequence - are represented
in a 2-dimensional array
All possible alignments (comparisons) are
represented by pathways through this array

4
Needleman-Wunsch Algorithm (cont.)

Three main steps
1. Assign similarity values
2. For each cell, look at all possible pathways
back to the beginning of the sequence (allowing
insertions and deletions) and give that cell the
value of the maximum scoring pathway
3. Construct an alignment (pathway) back from the
highest scoring cell to give the highest scoring
alignment

5
Needleman-Wunsch Algorithm (cont.)

Similarity values
A numerical value is assigned to every cell in
the array depending on the similarity/dissimilarity
of the two residues
These may be simple scores or more complicated,
e.g. related to chemical similarities or
frequency of observed substitutions
The example shown has
match = 1
mismatch = 0
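
As a minimal sketch of this step, the array of similarity values can be built as below. The function name `similarity_matrix` and its parameters are illustrative, not from the slides; the defaults are the match = 1, mismatch = 0 scores of the example.

```python
def similarity_matrix(a, b, match=1, mismatch=0):
    """One cell per pair of residues: rows follow sequence a, columns sequence b."""
    return [[match if x == y else mismatch for y in b] for x in a]


# Every pair of residues, one from each sequence, gets a score.
for row in similarity_matrix("ABC", "AJC"):
    print(row)
```

A substitution table (for example one based on observed substitution frequencies) could replace the simple equality test without changing the shape of the array.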

6
Needleman-Wunsch Algorithm (cont.)

Score pathways through array
For each cell want to know the maximum possible
score for an alignment ending at that point
Searches subrow and subcolumn, as shown, for the
highest score
Adds this to the score for the current cell
Proceeds row by row through the array
A gap penalty is applied for the introduction of
gaps in the alignment (presumed insertions or
deletions into one sequence); here it is 0

\[
H_{i,j} = \max\Big\{ H_{i-1,j-1} + s(a_i,b_j),\ \max_{k}\big(H_{i-k,j-1} - W_k\big) + s(a_i,b_j),\ \max_{l}\big(H_{i-1,j-l} - W_l\big) + s(a_i,b_j) \Big\}
\]
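
The scoring pass can be sketched as follows. This is an illustration under stated assumptions, not the presenter's code: `nw_fill` and `maximum_match` are hypothetical names, the gap penalty is simplified to one constant `gap` per gap (0, as on the slide) rather than a length-dependent W_k, the subcolumn and subrow are taken to start two cells back (k, l >= 2), and cells in the first row or column have no predecessor, so leading overhangs are not penalised.

```python
def nw_fill(a, b, match=1, mismatch=0, gap=0):
    """Fill the array row by row; H[i][j] is the best score of an alignment
    ending with a[i] paired to b[j]."""
    n, m = len(a), len(b)
    H = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            s = match if a[i] == b[j] else mismatch
            best = 0  # first row/column: nothing precedes this pair
            if i > 0 and j > 0:
                best = H[i - 1][j - 1]            # direct diagonal, no gap
                for k in range(i - 1):            # subcolumn: rows 0..i-2 of column j-1
                    best = max(best, H[k][j - 1] - gap)
                for l in range(j - 1):            # subrow: columns 0..j-2 of row i-1
                    best = max(best, H[i - 1][l] - gap)
            H[i][j] = s + best                    # add the score for the current cell
    return H


def maximum_match(H):
    """Highest value in the outer (last) row or column."""
    return max(H[-1] + [row[-1] for row in H])


H = nw_fill("AXB", "AB")
print(H, maximum_match(H))
```

Scanning the whole subrow and subcolumn for every cell makes this sketch cubic in sequence length; it is written for clarity rather than speed.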
7
Needleman-Wunsch Algorithm (cont.)

Construct alignment
The alignment score is cumulative by adding along
a path through the array
The best alignment has the highest score i.e. the
maximum match
Maximum match = largest number resulting from
summing the cell values of every pathway
The maximum match will ALWAYS be somewhere in the
outer row or column shown
The alignment is constructed by working backwards
from the maximum match

MP-RCLCQR-JNCBA
-PBRCKC-RNJ-CJA
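
Working backwards from the maximum match can be sketched as below. The function `nw_traceback` is hypothetical; it assumes a filled array `H` in which `H[i][j]` is the best score of an alignment ending with `a[i]` paired to `b[j]`, the same match, mismatch and constant `gap` values that were used to fill it, and gaps that jump at least two cells back along the previous column or row.

```python
def nw_traceback(a, b, H, match=1, mismatch=0, gap=0):
    """Rebuild one highest-scoring alignment from a filled score array H."""
    n, m = len(a), len(b)
    # The maximum match lies in the outer (last) row or column.
    cells = [(H[n - 1][j], n - 1, j) for j in range(m)]
    cells += [(H[i][m - 1], i, m - 1) for i in range(n)]
    _, i, j = max(cells)
    path = [(i, j)]
    while i > 0 and j > 0:
        s = match if a[i] == b[j] else mismatch
        target = H[i][j] - s                      # score the predecessor must supply
        if H[i - 1][j - 1] == target:
            step = (i - 1, j - 1)                 # came along the diagonal
        else:
            col = [(k, j - 1) for k in range(i - 1) if H[k][j - 1] - gap == target]
            row = [(i - 1, l) for l in range(j - 1) if H[i - 1][l] - gap == target]
            if not (col or row):
                raise ValueError("H does not match the scoring parameters")
            step = (col + row)[0]                 # came via a gap
        i, j = step
        path.append(step)
    path.reverse()
    # Turn the list of paired cells into two gapped strings.
    top, bottom, pi, pj = [], [], -1, -1
    for i, j in path + [(n, m)]:                  # sentinel flushes trailing overhangs
        top += list(a[pi + 1:i]) + ["-"] * len(b[pj + 1:j])
        bottom += ["-"] * len(a[pi + 1:i]) + list(b[pj + 1:j])
        if i < n:
            top.append(a[i])
            bottom.append(b[j])
        pi, pj = i, j
    return "".join(top), "".join(bottom)


print(nw_traceback("AXB", "AB", [[1, 0], [0, 1], [0, 2]]))
```

When several predecessors tie, this sketch takes the first one it finds, so it returns one of possibly many alignments with the maximum match.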
8
Needleman-Wunsch Algorithm (cont.)

Statistical Significance
Maximum match is a function of sequence
relationship and composition
Would like to know probability of obtaining
result (maximum match) from a pair of random
sequences
Estimate this experimentally:
form pairs of random sequences by randomly
drawing one member from each set (i.e. the random
sequences have the same composition as the real
proteins)
if the value found for the real proteins is
significantly different from that for the random
proteins then the difference is a function of the
sequences alone and not of their composition
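
One way to run this experiment is sketched below. It assumes the random sets are made by shuffling each real sequence (which preserves composition) and that significance is summarised as a z-score; both are illustrative choices, as are the names `composition_control` and `score_fn`. Any pairwise scoring function, such as one returning the maximum match, can be passed in.

```python
import random
import statistics


def shuffled(seq, rng):
    """Random sequence with exactly the same composition as seq."""
    chars = list(seq)
    rng.shuffle(chars)
    return "".join(chars)


def composition_control(a, b, score_fn, trials=200, seed=0):
    """Compare the real score with scores of composition-matched random pairs.

    Returns (real score, mean random score, standard deviation, z-score);
    the z-score is None if the random scores do not vary.
    """
    rng = random.Random(seed)
    real = score_fn(a, b)
    null = [score_fn(shuffled(a, rng), shuffled(b, rng)) for _ in range(trials)]
    mean = statistics.mean(null)
    sd = statistics.pstdev(null)
    z = (real - mean) / sd if sd > 0 else None
    return real, mean, sd, z


def identities(a, b):
    """Toy stand-in score: identical residues at the same position."""
    return sum(x == y for x, y in zip(a, b))


print(composition_control("ACDEFGHIKLMNPQRSTVWY", "ACDEFGHIKLMNPQRSTVWY", identities))
```

A real score far above the random mean suggests the similarity comes from residue order rather than from composition alone.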

9
Smith-Waterman Algorithm

Instead of looking at each sequence in its
entirety this compares segments of all possible
lengths (LOCAL alignments) and chooses whichever
maximises the similarity measure
For every cell the algorithm calculates ALL
possible paths leading to it. These paths can be
of any length and can contain insertions and
deletions

10
Smith-Waterman Algorithm (cont.)

Only works effectively when gap penalties are
used
In the example shown
match = 1
mismatch = -1/3
gap = -(1 + k/3) (k = extent of gap)
Start with all cell values 0
Looks in subcolumn and subrow shown and in direct
diagonal for a score that is the highest when you
take alignment score or gap penalty into account

\[
H_{i,j} = \max\Big\{ H_{i-1,j-1} + s(a_i,b_j),\ \max_{k}\big(H_{i-k,j} - W_k\big),\ \max_{l}\big(H_{i,j-l} - W_l\big),\ 0 \Big\}
\]
11
Smith-Waterman Algorithm (cont.)

Four possible ways of forming a path
For every residue in the query sequence
1. Align with next residue of db sequence: score
is previous score plus similarity score for the
two residues
2. Deletion (i.e. match residue of query with a
gap): score is previous score minus gap penalty
dependent on size of gap
3. Insertion (i.e. match residue of db sequence
with a gap): score is previous score minus gap
penalty dependent on size of gap
4. Stop: score is zero
Choose whichever of these is the highest
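
The four choices for a single cell can be written out as below. This is a sketch with assumed conventions: `cell_options` is a hypothetical helper, rows of `H` follow the query and columns the db sequence, `H` carries an extra row and column of zeros at index 0, and `gap(k)` returns the penalty for a gap of extent k.

```python
from fractions import Fraction


def cell_options(H, i, j, s, gap):
    """Score of each way of forming a path into cell (i, j), with i, j >= 1."""
    return {
        "align": H[i - 1][j - 1] + s,
        "deletion": max(H[i - k][j] - gap(k) for k in range(1, i + 1)),
        "insertion": max(H[i][j - l] - gap(l) for l in range(1, j + 1)),
        "stop": 0,
    }


def example_gap(k):
    """Gap penalty of the example: 1 + k/3."""
    return 1 + Fraction(k, 3)


# A partly filled array in which one matching pair has already scored 1.
H = [[0, 0, 0],
     [0, 1, 0],
     [0, 0, 0]]
options = cell_options(H, 2, 2, s=1, gap=example_gap)
H[2][2] = max(options.values())
print(options, H[2][2])
```

Because "stop" is always available, no cell ever drops below zero, which is what lets a local alignment begin anywhere in the array.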

12
Smith-Waterman Algorithm (cont.)

Construct Alignment
The score in each cell is the maximum possible
score for an alignment of ANY LENGTH ending at
those coordinates
Trace pathway back from highest scoring cell
This cell can be anywhere in the array
Align highest scoring segment

GCC-UCG
GCCAUUG
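
A compact sketch of the whole local alignment, fill plus trace-back, is given below. The name `local_align` and the stored back-pointers are implementation choices of this sketch, not taken from the slides; the default scores are those of the example (match 1, mismatch -1/3, gap penalty 1 + k/3), held as exact fractions to avoid rounding.

```python
from fractions import Fraction


def gap_penalty(k):
    """Penalty for a gap of extent k, as in the example: 1 + k/3."""
    return 1 + Fraction(k, 3)


def local_align(a, b, match=Fraction(1), mismatch=Fraction(-1, 3), gap=gap_penalty):
    """Return (score, aligned segment of a, aligned segment of b)."""
    n, m = len(a), len(b)
    H = [[Fraction(0)] * (m + 1) for _ in range(n + 1)]   # all cells start at 0
    back = [[None] * (m + 1) for _ in range(n + 1)]       # None means "stop"
    best, best_cell = Fraction(0), None
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            options = [(H[i - 1][j - 1] + s, (i - 1, j - 1))]
            options += [(H[i - k][j] - gap(k), (i - k, j)) for k in range(1, i + 1)]
            options += [(H[i][j - l] - gap(l), (i, j - l)) for l in range(1, j + 1)]
            score, prev = max(options, key=lambda o: o[0])
            if score > 0:                                  # otherwise the cell stays 0
                H[i][j], back[i][j] = score, prev
            if H[i][j] > best:
                best, best_cell = H[i][j], (i, j)
    # Trace back from the highest-scoring cell until a zero cell is reached.
    top, bottom = [], []
    cell = best_cell
    while cell is not None and back[cell[0]][cell[1]] is not None:
        i, j = cell
        pi, pj = back[i][j]
        if pj == j:                                        # residues of a against a gap
            top.append(a[pi:i])
            bottom.append("-" * (i - pi))
        elif pi == i:                                      # residues of b against a gap
            top.append("-" * (j - pj))
            bottom.append(b[pj:j])
        else:                                              # a[i-1] paired with b[j-1]
            top.append(a[i - 1])
            bottom.append(b[j - 1])
        cell = (pi, pj)
    return best, "".join(reversed(top)), "".join(reversed(bottom))


print(local_align("GCCAUUG", "GCCUCG"))
```

Run on the two segments shown on the slide, this reproduces the same pairing, with a single gap opposite the A.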
13
Differences
