Pairwise and multiple sequence alignment - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Pairwise and multiple sequence alignment


1
PhD training, April 4 2006
Pairwise and multiple sequence alignment
Hugo CEULEMANS Section of Biochemistry, Campus
GHB ON1-901 Hugo.Ceulemans_at_med.kuleuven.be
(016/3)30.248
2
Pairwise & multiple sequence alignment
Format:
Theoretical introduction (today): 1 ½ h
Practical application (tomorrow): 1 ½ h (BLAST exercises)
3
Sequence alignment definitions and remarks
Sequence: of nucleotides (4 standard types + gap in DNA) or of amino acids (20 standard types + gap in proteins)
Alignment: matching of corresponding residues, e.g.
ALI---EN-
ALIGNMENT
Correspondence of at least some of the residues is assumed before alignment!
4
Sequence alignment definitions and remarks
Alignment at the nucleotide level, e.g.
BLASTN: nt query, nt subject
Alignment at the amino acid level, e.g.
BLASTP: aa query, aa subject
TBLASTN: aa query, translated nt subject
BLASTX: translated nt query, aa subject
TBLASTX: translated nt query, translated nt subject
5
Sequence alignment definitions and remarks
Numerous alignments can be produced for any set of sequences!
Sequence alignment = optimal sequence alignment(s)
Biologically optimal alignment: aligned residues derive from the same residue in a common ancestral sequence
Numerically optimal alignment: the alignment with the highest summed scores of the aligned sets of residues, given a scoring function
6
Pairwise alignment scoring
Match/mismatch scheme (e.g. BLASTN)
match score, given for identical aligned residues
mismatch score, given for different aligned residues
e.g. default BLASTN match (r): 1; default BLASTN mismatch (q): -3
gap existence penalty, given for gap creation
gap extension penalty, given for gap extension
e.g. default BLASTN gap existence (G): 5; default BLASTN gap extension (E): 2
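The match/mismatch scheme with gap penalties can be illustrated by scoring one given alignment. This is a minimal sketch, assuming the convention that a gap of length k costs G + k·E; the function name `score_alignment` is hypothetical and not part of BLAST.

```python
def score_alignment(row_a, row_b, match=1, mismatch=-3, gap_open=5, gap_extend=2):
    """Raw score of two equal-length aligned rows; '-' marks a gap position."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    score = 0
    gap_row = None  # row holding the current gap run ('a' or 'b'), or None
    for a, b in zip(row_a, row_b):
        if a == "-" and b == "-":
            raise ValueError("a column cannot hold two gaps")
        if a == "-" or b == "-":
            current = "a" if a == "-" else "b"
            if gap_row != current:
                score -= gap_open      # existence penalty, once per gap
                gap_row = current
            score -= gap_extend        # extension penalty, per gapped position
        else:
            gap_row = None
            score += match if a == b else mismatch
    return score


# Four matches, one mismatch, one gap of length 1: 4*1 - 3 - (5 + 2) = -6
print(score_alignment("ACGT-A", "ACCTTA"))
```

With these defaults a single gapped position costs as much as seven matches earn, which is why default BLASTN alignments contain few gaps.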
7
Pairwise alignment scoring
(Position-independent) matrix scheme (e.g. BLASTaa)
defines a score for every possible residue pairing
e.g. default BLASTaa matrix (M): BLOSUM62
gap existence penalty, given for gap creation
gap extension penalty, given for gap extension
e.g. default BLASTaa gap existence (G): 11; default BLASTaa gap extension (E): 1
8
Pairwise alignment scoring
$S_{ij} = \frac{1}{\lambda} \ln \frac{q_{ij}}{p_i p_j}$
BLOSUM62
9
Pairwise alignment scoring
$S_{ij} = \frac{1}{\lambda} \ln \frac{q_{ij}}{p_i p_j}$
blocks of aligned sequences, with sequences at least X% identical clustered together, provided the observed pair frequencies $q_{ij}$ for BLOck SUbstitution Matrix X (BLOSUMX)
BLOSUM80 for aligning vertebrate sequences (or short sequences)
BLOSUM62 for aligning animal sequences
BLOSUM45 for alignment of distant sequences
BLOSUM matrices outperform the older PAM matrices
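The log-odds formula can be tried on a toy alphabet. This is a rough sketch, assuming ordered-pair frequencies that sum to 1 and half-bit units (a scale of 2 with log base 2, i.e. λ = ln 2 / 2); the frequencies in `toy_q` are invented for illustration and the function name `log_odds_scores` is hypothetical.

```python
import math


def log_odds_scores(q, scale=2.0):
    """Rounded log-odds scores from symmetric pair frequencies q[(i, j)].

    q holds ordered-pair frequencies that sum to 1; the background
    frequency p_i is taken as the marginal of q.
    """
    residues = sorted({i for i, _ in q})
    p = {i: sum(q[(i, j)] for j in residues) for i in residues}
    return {
        (i, j): round(scale * math.log2(q[(i, j)] / (p[i] * p[j])))
        for i in residues
        for j in residues
    }


# Invented two-letter example: identical pairs are over-represented.
toy_q = {("A", "A"): 0.4, ("A", "B"): 0.1, ("B", "A"): 0.1, ("B", "B"): 0.4}
scores = log_odds_scores(toy_q)
print(scores[("A", "A")], scores[("A", "B")])
```

Pairs seen more often than expected by chance get positive scores, pairs seen less often get negative scores.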
10
Pairwise alignment scoring
Raw score S (unit-less): sum of the scores for the aligned residues minus the sum of the gap penalties
Normalized score S' (bits): $S' = (\lambda S - \ln K) / \ln 2$, normalized with respect to the chosen scoring scheme
e.g. in BLAST output, the bit score is followed by the raw score between parentheses
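The normalization can be written out in a few lines. This is a minimal sketch; the values of λ and K below are assumed for illustration only, since BLAST reports the actual values for the chosen scoring scheme in its output.

```python
import math


def bit_score(raw_score, lam, k):
    """Normalized score S' = (lambda * S - ln K) / ln 2, in bits."""
    return (lam * raw_score - math.log(k)) / math.log(2)


# Illustrative parameters only (assumed, not taken from the slides).
print(round(bit_score(100, lam=0.267, k=0.041), 1))
```

Because λ and K absorb the scoring scheme, bit scores from searches with different matrices or gap penalties can be compared directly, whereas raw scores cannot.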
11
Pairwise alignment algorithms
Given a sequence pair and an additive scoring function, the best-scoring alignment can be computed with dynamic programming
e.g. Needleman-Wunsch for global alignment, Smith-Waterman for local alignment
Computational cost encumbers database searches!
By reducing the search space, heuristic algorithms are much faster and most often still find the best-scoring alignment
e.g. FASTA, BLAST
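The dynamic programming idea can be sketched as follows. This is a simplified illustration that returns only the best score, assuming a linear gap cost instead of separate existence and extension penalties; the function name and default scores are hypothetical.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Best alignment score by dynamic programming with a linear gap cost.

    local=False: global alignment (Needleman-Wunsch style).
    local=True:  local alignment (Smith-Waterman style).
    """
    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment pays for leading gaps.
        for i in range(1, rows):
            h[i][0] = i * gap
        for j in range(1, cols):
            h[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diag, h[i - 1][j] + gap, h[i][j - 1] + gap)
            if local:
                cell = max(cell, 0)      # a local alignment may restart anywhere
                best = max(best, cell)   # and may end anywhere
            h[i][j] = cell
    return best if local else h[-1][-1]


print(alignment_score("ACGT", "AGT"))
print(alignment_score("TTACGTTT", "GGACGGG", mismatch=-3, gap=-5, local=True))
```

The table has (len(a) + 1) × (len(b) + 1) cells, so time and memory grow with the product of the two lengths. That cost, repeated for every database sequence, is what the heuristic programs avoid.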
12
Pairwise alignment BLAST algorithm
Seeding
word: subsequence of a certain size
e.g. default BLASTN word size (W): 11 residues; default BLASTaa word size (W): 3 residues
seed: word self-alignment, or word alignment meeting a minimal score
e.g. default BLASTaa minimal seed score (f): 11 (BLASTN requires word self-alignment)
in PHI-BLAST, seed: alignment of words matching a user-defined motif in PROSITE syntax
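Seeding with word self-alignments (the BLASTN case) can be sketched with a lookup table of query words. This is an illustration under simplifying assumptions: only identical words count as seeds, so the scored neighbourhood words of BLASTaa are not covered, and the function name `find_word_seeds` is hypothetical.

```python
from collections import defaultdict


def find_word_seeds(query, subject, word_size=11):
    """Return sorted (query_pos, subject_pos) pairs where identical words start."""
    lookup = defaultdict(list)
    for q in range(len(query) - word_size + 1):
        lookup[query[q:q + word_size]].append(q)
    seeds = []
    for s in range(len(subject) - word_size + 1):
        # .get avoids adding empty entries for words absent from the query
        for q in lookup.get(subject[s:s + word_size], ()):
            seeds.append((q, s))
    return sorted(seeds)


print(find_word_seeds("ACGTACGT", "TTACGTAA", word_size=4))
```

The query is indexed once and each subject is scanned in a single pass, which is where most of the speed-up over full dynamic programming comes from.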
13
Pairwise alignment BLAST algorithm
Extension
of all seeds (single-hit scheme), or
of pairs of seeds within a window of given size (double-hit scheme)
e.g. default BLASTN single-hit flag (P): on (1); default BLASTaa single-hit flag (P): off (0); default BLASTaa hit window (A): 40 residues
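The double-hit scheme can be pictured as a filter on the seed list. This is a rough sketch, assuming that two seeds qualify when they lie on the same diagonal, do not overlap, and start at most one window apart; the exact criteria used inside BLAST may differ, and the function name `two_hit_pairs` is hypothetical.

```python
from collections import defaultdict


def two_hit_pairs(seeds, word_size=3, window=40):
    """Pairs of neighbouring seeds on one diagonal that trigger an extension.

    seeds: iterable of (query_pos, subject_pos) word start positions.
    """
    by_diagonal = defaultdict(list)
    for q, s in seeds:
        by_diagonal[s - q].append(q)
    pairs = []
    for diagonal, q_positions in by_diagonal.items():
        q_positions.sort()
        for first, second in zip(q_positions, q_positions[1:]):
            distance = second - first
            # assumed criteria: non-overlapping and within the window
            if word_size <= distance <= window:
                pairs.append(((first, first + diagonal), (second, second + diagonal)))
    return sorted(pairs)


print(two_hit_pairs([(0, 10), (5, 15), (50, 60), (3, 20)]))
```

Requiring two hits discards most isolated chance seeds before any extension is attempted, which trades some sensitivity for speed.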
14
Pairwise alignment BLAST algorithm
Extension
in each direction, extend until S < Smax - X, then revert to the alignment with score Smax, which must meet a minimal alignment score Smin
X is set by the y, X, Z parameters; Smin follows from the e parameter (E-value threshold)
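The drop-off rule can be sketched for an ungapped extension in one direction. This is a simplified illustration, assuming match/mismatch scoring and raw-score units for X; the function name `extend_right` is hypothetical.

```python
def extend_right(query, subject, q_start, s_start, x_drop, match=1, mismatch=-3):
    """Ungapped rightward extension; returns (best_score, best_length)."""
    score = 0
    best_score = 0
    best_length = 0
    length = 0
    while q_start + length < len(query) and s_start + length < len(subject):
        same = query[q_start + length] == subject[s_start + length]
        score += match if same else mismatch
        length += 1
        if score > best_score:
            best_score, best_length = score, length
        elif score < best_score - x_drop:
            break  # S < Smax - X: stop and keep the best-scoring prefix
    return best_score, best_length


print(extend_right("ACGTACGTTTTT", "ACGTACGTAAAA", 0, 0, x_drop=5))
```

A larger X lets the extension cross longer poorly matching stretches before giving up, at the cost of more computation per seed.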
15
Pairwise alignment BLAST algorithm
Query limitation: query with a subsequence by defining query limits (L)
Masking
What?
BLASTN low-complexity filter DUST (F): T, D
BLASTaa low-complexity filter SEG (F): T, S
user-defined filter (U): T, stretches in lower-case
When?
hard masking, during seeding and extension (default)
soft masking, during seeding only (lookup table) (F): m D, m S
16
Pairwise alignment BLAST algorithm
Sensitivity/speed trade-off
in order of effect (sensitivity up, speed down):
word size (W) and minimal seed score (f) down (T)
single-hit (P) on (1)
if single-hit (P) is off, double-hit window (A) up
dropoff parameters (y, X, Z) up (X1, X2, X3)
17
Pairwise alignment BLAST algorithm
PSI-BLAST
an initial (BLASTP/PHI-BLAST) or earlier (PSI-BLAST) run yields a query-anchored multiple alignment of hits with an E-value at or below the inclusion threshold
e.g. default PSI-BLAST inclusion E-value (h): 0.005
from the columns of the alignment, a position-specific scoring matrix (PSSM) is calculated
the PSSM is used in the score calculation of new PSI-BLAST runs, until convergence
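The idea of a position-specific scoring matrix can be shown with a deliberately simple sketch. It assumes a uniform background and a fixed pseudocount per residue; PSI-BLAST itself uses sequence weighting and more elaborate pseudocounts, which are not reproduced here. The function name `simple_pssm` is hypothetical.

```python
import math
from collections import Counter


def simple_pssm(aligned_rows, alphabet, pseudocount=1.0):
    """Per-column log-odds scores (bits) against a uniform background."""
    background = 1.0 / len(alphabet)
    pssm = []
    for column in zip(*aligned_rows):
        # gaps and unknown symbols are simply ignored in this sketch
        counts = Counter(res for res in column if res in alphabet)
        total = sum(counts.values()) + pseudocount * len(alphabet)
        pssm.append({
            res: math.log2(((counts[res] + pseudocount) / total) / background)
            for res in alphabet
        })
    return pssm


pssm = simple_pssm(["ACA", "ACG", "ACT"], alphabet="ACGT")
print(round(pssm[0]["A"], 2), round(pssm[0]["C"], 2))
```

Unlike a position-independent matrix such as BLOSUM62, the same residue gets a different score in each column: a high score where the column is conserved for it, and a low score elsewhere.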
18
Pairwise alignment BLAST algorithm
output formatting
plain text (command line default)
html (command line -T T, server default)
pairwise alignment (default)
flat query-anchored without identities, for multiple alignment (command line -m 4)
...
19
Pairwise alignment BLAST algorithm
statistical evaluation
S: raw score
S': normalized score or bit score
E: number of alignments with at least score S (S') expected to occur by chance with a query and a database of the given size
P: probability of occurrence by chance of an alignment with at least score S (S') with a query and a database of the given size
$E = K m n e^{-\lambda S} = m n 2^{-S'}$ (m: query length, n: database length)
$P = 1 - e^{-E}$
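The two forms of the E-value and the conversion to P can be checked numerically. This is a minimal sketch; the function names are hypothetical, and the λ, K, m and n values in the example are assumed for illustration only.

```python
import math


def evalue_from_raw(raw_score, m, n, lam, k):
    """E = K * m * n * exp(-lambda * S)."""
    return k * m * n * math.exp(-lam * raw_score)


def evalue_from_bits(bits, m, n):
    """E = m * n * 2^(-S'), with S' the bit score."""
    return m * n * 2.0 ** (-bits)


def pvalue(evalue):
    """P = 1 - exp(-E); expm1 keeps precision when E is very small."""
    return -math.expm1(-evalue)


# Illustrative numbers only: query of 250 residues, database of 1e9 residues.
e = evalue_from_raw(100, m=250, n=1e9, lam=0.267, k=0.041)
print(e, pvalue(e))
```

For small E the two measures are nearly equal (P ≈ E), which is why BLAST output reports E alone. E also scales linearly with database size n, so the same alignment is less significant in a larger database.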
20
Pairwise alignment BLAST algorithm
recommendations
for occasional use, conduct searches on web servers (NCBI, EBI, ...)
for frequent use, speed and better database control, install a local copy of the program (command line operated)
choose the best-suited program flavour
e.g. BL2SEQ to align two sequences, TBLASTX to align divergent nt sequences
21
Pairwise alignment BLAST algorithm
recommendations
define a query subsequence when possible (command line: -L start,stop) (server: Set subsequence From start To stop)
choose the smallest appropriate database (command line: -d database) (on server: Limit by entrez query, organism[ORGN])
choose the best-suited scoring matrix and gap penalties for BLASTaa (lower BLOSUM and gap penalties with increasing sequence divergence)
22
Multiple alignment scoring
(weighted) sum of pairs: (weighted) scores for all pairwise alignments in the multiple alignment are summed
Sum of pairs has no probabilistic justification, but it is widely used for multiple alignment!
other schemes have been formulated on theoretical grounds, but are not generally accepted
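The sum-of-pairs score can be computed directly from the rows of a multiple alignment. This is a minimal sketch, assuming a toy column score and, for the weighted variant, a pair weight equal to the product of two per-sequence weights; both `sum_of_pairs` and `toy_score` are hypothetical names.

```python
from itertools import combinations


def toy_score(a, b):
    """Illustrative column score: match 1, mismatch -1, residue/gap -2."""
    if a == "-" and b == "-":
        return 0
    if a == "-" or b == "-":
        return -2
    return 1 if a == b else -1


def sum_of_pairs(rows, pair_score=toy_score, weights=None):
    """(Weighted) sum over all row pairs of their column-wise scores."""
    total = 0.0
    for i, j in combinations(range(len(rows)), 2):
        # assumed weighting: product of the two sequence weights
        weight = 1.0 if weights is None else weights[i] * weights[j]
        total += weight * sum(pair_score(a, b) for a, b in zip(rows[i], rows[j]))
    return total


print(sum_of_pairs(["AC-T", "ACGT", "A-GT"]))
```

With N sequences there are N(N-1)/2 pairs, so each additional sequence adds N-1 pairwise terms to the score.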
23
Multiple alignment algorithms
multidimensional dynamic programming
guarantees the best-scoring alignment, which requires simultaneous alignment
way too slow for larger alignments: time complexity $O(L^N)$ (L: length, N: number of sequences)
e.g. MSA, DCA
24
Multiple alignment algorithms
  • heuristic algorithms - functioning
  • almost all use progressive global alignment
  • evolutionary distances are estimated from pairwise global alignments
  • a guide tree is constructed from the distances using quick and dirty algorithms (UPGMA, NJ, ...)
  • these assign a branch to every sequence, and repeat fusing the closest pair of branches to a higher-order branch, until one branch is left
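The guide-tree construction described above can be sketched with UPGMA. This is a bare-bones illustration that returns only the branching order as nested tuples, without branch lengths; the function name `upgma` and the distances in the example are invented.

```python
def upgma(names, dist):
    """Guide tree (nested tuples) from a symmetric distance matrix."""
    clusters = {i: (name, 1) for i, name in enumerate(names)}  # id -> (tree, size)
    d = {(i, j): dist[i][j] for i in clusters for j in clusters if i < j}
    next_id = len(names)
    while len(clusters) > 1:
        i, j = min(d, key=d.get)              # closest pair of branches
        tree_i, n_i = clusters.pop(i)
        tree_j, n_j = clusters.pop(j)
        new_d = {}
        for k in clusters:
            d_ik = d[(min(i, k), max(i, k))]
            d_jk = d[(min(j, k), max(j, k))]
            # size-weighted average distance to the fused branch
            new_d[(k, next_id)] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
        d = {pair: v for pair, v in d.items() if i not in pair and j not in pair}
        d.update(new_d)
        clusters[next_id] = ((tree_i, tree_j), n_i + n_j)
        next_id += 1
    tree, _ = next(iter(clusters.values()))
    return tree


names = ["A", "B", "C", "D"]
dist = [
    [0, 2, 10, 10],
    [2, 0, 10, 10],
    [10, 10, 0, 4],
    [10, 10, 4, 0],
]
print(upgma(names, dist))
```

The nested tuples give the order in which a progressive aligner would fuse sequences and lower-level alignments, starting with the innermost pairs.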

25
Multiple alignment algorithms
  • heuristic algorithms - functioning
  • the alignment process follows the branching order of the guide tree
  • iteratively, branches (sequences or lower-level alignments) are fused (aligned) using dynamic programming to maximize the (weighted) sum of pairs
  • time complexity $O(L^2)$
  • Mistakes in an alignment will influence all further alignment!

26
Multiple alignment algorithms
heuristic algorithms - brands
http://www.people.virginia.edu/wrp/cshl02/smith_res.html
large computational costs for small gains in accuracy
CLUSTAL W (popular golden oldie): several servers, stand-alone versions for various platforms, medium speed, medium accuracy, global alignment
T-COFFEE (end-of-lineage power CLUSTAL): a few servers, stand-alone version for UNIX/Linux, lower speed, high accuracy, global alignment
27
Multiple alignment algorithms
heuristic algorithms - brands
MUSCLE, MAFFT (born for speed, more heuristic): one server, stand-alone version for UNIX/Linux, high speed, lower accuracy, global alignment
DIALIGN (good for domain alignment): a few servers, stand-alone version for UNIX/Linux, lower speed, high accuracy, local alignment
POA(2) (promising fast newcomer): one server, stand-alone version for UNIX/Linux, high speed, lower accuracy, local/global alignment
28
Multiple alignment algorithms
  • heuristic algorithms - recommendations
  • quick and dirty guide trees increase alignment speed; use more sophisticated programmes for trustworthy phylogenetic trees
  • parametrization is largely set by the algorithms; the effect of parameter tweaking is poorly predictable
  • input influences performance
  • consider aligning conserved portions
  • include many sequences, related to various degrees
  • test stability with different sequence subsets

29
Multiple alignment algorithms
  • automated alignment of divergent sequences remains troublesome
  • alternatives to standard multiple alignment algorithms: pre-defined or home-made profiles (hidden Markov models) can help
  • pre-defined: PFAM, SMART, ...
  • home-made: smaller doable alignments, or query-anchored multiple alignments from PSI-BLAST, can be transformed into a profile
  • e.g. HMMer2, available for UNIX/Linux

30
Pairwise and multiple sequence alignment: Q & A