Multiple sequence alignment

Multiple sequence alignment: why?

It is the most important means of assessing the relatedness of a set of sequences
Gain information about the structure/function of a query sequence (conservation patterns)
Construct a phylogenetic tree
Put together a set of sequenced fragments (fragment assembly)
Recognise alternative splice sites
Many bioinformatics methods depend on it (e.g. secondary/tertiary structure prediction)

[Figure: multiple sequence alignment (MSA) of 12 flavodoxin and CheY sequences]

Pairwise alignment

Now we know how to do it.
How do we get a multiple alignment (three or more sequences)?
Multiple alignment suffers a much greater combinatorial explosion than pairwise alignment.
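
A back-of-the-envelope sketch of that explosion, assuming plain N-dimensional dynamic programming over N sequences of equal length L with no pruning: the search matrix has roughly L^N cells, and each cell considers 2^N - 1 predecessor moves (every non-empty subset of the sequences may contribute a residue to the next column).

```python
def msa_dp_cost(n_seqs: int, length: int) -> tuple[int, int]:
    """Return (cells, moves) for naive N-dimensional DP on n_seqs sequences of equal length."""
    cells = length ** n_seqs            # one cell per combination of prefix lengths (approximately)
    moves_per_cell = 2 ** n_seqs - 1    # each non-empty subset of sequences may advance
    return cells, cells * moves_per_cell


for n in range(2, 9):
    cells, moves = msa_dp_cost(n, 250)
    print(f"{n} sequences of 250 residues: {cells:.2e} cells, {moves:.2e} moves")
```

Going from two to eight sequences of 250 residues multiplies the number of moves by more than 10^16.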

Multi-dimensional dynamic programming (Murata et al., 1985)

Simultaneous multiple alignment: multi-dimensional dynamic programming

MSA (Lipman et al., 1989, PNAS 86, 4412): extremely slow and memory-intensive; practical for up to 8-9 sequences of 250 residues
DCA (Stoye et al., 1997, CABIOS 13, 625): still very slow
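
For three sequences the search matrix becomes a cube. This minimal sketch fills the full cube, without the search-space bounding that programs such as MSA rely on, and assumes a toy sum-of-pairs score (match +1, mismatch -1, residue against gap -2, gap against gap 0); these values are illustrative only.

```python
from itertools import product

MATCH, MISMATCH, GAP = 1, -1, -2   # toy scores, chosen for illustration only


def pair_score(a: str, b: str) -> int:
    """Score one pair of characters in a column; gap-gap pairs score 0."""
    if a == "-" and b == "-":
        return 0
    if a == "-" or b == "-":
        return GAP
    return MATCH if a == b else MISMATCH


def column_score(column: tuple[str, ...]) -> int:
    """Sum-of-pairs score of a single alignment column."""
    return sum(pair_score(column[p], column[q])
               for p in range(len(column)) for q in range(p + 1, len(column)))


def align3(s1: str, s2: str, s3: str) -> tuple[int, list[str]]:
    """Optimal sum-of-pairs alignment of three sequences by filling the full 3-D DP cube."""
    seqs = (s1, s2, s3)
    n1, n2, n3 = map(len, seqs)
    NEG = float("-inf")
    score = [[[NEG] * (n3 + 1) for _ in range(n2 + 1)] for _ in range(n1 + 1)]
    back = {}
    score[0][0][0] = 0
    # 7 moves: which of the three sequences contribute a residue to the next column
    moves = [m for m in product((0, 1), repeat=3) if any(m)]
    for i in range(n1 + 1):
        for j in range(n2 + 1):
            for k in range(n3 + 1):
                if i == j == k == 0:
                    continue
                best, best_move = NEG, None
                for m in moves:
                    prev = (i - m[0], j - m[1], k - m[2])
                    if min(prev) < 0:
                        continue
                    # residue from each advancing sequence, gap for the others
                    col = tuple(seqs[t][(i, j, k)[t] - 1] if m[t] else "-" for t in range(3))
                    cand = score[prev[0]][prev[1]][prev[2]] + column_score(col)
                    if cand > best:
                        best, best_move = cand, m
                score[i][j][k], back[(i, j, k)] = best, best_move
    # Trace back from the far corner of the cube to the origin
    rows = ["", "", ""]
    pos = (n1, n2, n3)
    while pos != (0, 0, 0):
        m = back[pos]
        for t in range(3):
            rows[t] = (seqs[t][pos[t] - 1] if m[t] else "-") + rows[t]
        pos = tuple(p - d for p, d in zip(pos, m))
    return score[n1][n2][n3], rows


print(align3("AC", "A", "AC"))   # → (0, ['AC', 'A-', 'AC'])
```

Each cell looks back along 7 directions; with N sequences this becomes 2^N - 1, which is why the approach stops scaling after a handful of sequences.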

Alternative multiple alignment methods

Biopat (Hogeweg and Hesper, 1984; the first method ever)
MULTAL (Taylor, 1987)
DIALIGN (Morgenstern, 1996)
PRRP (Gotoh, 1996)
Clustal (Thompson, Higgins and Gibson, 1994)
Praline (Heringa, 1999)
T-Coffee (Notredame, Higgins and Heringa, 2000)
HMMER (Eddy, 1998): hidden Markov model
SAGA (Notredame and Higgins, 1996): genetic algorithm

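Progressive multiple alignment: general principles
[Figure: sequences 1-5 are aligned pairwise (Score 1-2, Score 1-3, ..., Score 4-5); the scores fill a 5×5 similarity matrix, the scores are converted to distances, a guide tree is built from the distances, and the multiple alignment follows the guide tree; iteration is possible]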
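
The "scores to distances" and "guide tree" steps can be sketched as follows. This assumes the similarities are fractional identities in [0, 1], converted with d = 1 - s, and builds the tree with UPGMA for brevity; ClustalW uses Neighbour Joining instead.

```python
def similarity_to_distance(sim: list[list[float]]) -> list[list[float]]:
    """Convert similarities in [0, 1] (e.g. fractional identity) to distances d = 1 - s."""
    return [[1.0 - s for s in row] for row in sim]


def upgma(dist: list[list[float]], labels: list[str]):
    """Build a rooted guide tree (nested tuples) by UPGMA; used here for brevity instead of NJ."""
    clusters = {i: (labels[i], 1) for i in range(len(labels))}   # id -> (subtree, size)
    d = {(i, j): dist[i][j] for i in range(len(labels)) for j in range(i + 1, len(labels))}
    next_id = len(labels)
    while len(clusters) > 1:
        a, b = min(d, key=d.get)                 # closest pair of clusters
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        # Size-weighted average distance from the merged cluster to each remaining one
        new = {c: (size_a * d[(min(a, c), max(a, c))] + size_b * d[(min(b, c), max(b, c))])
                  / (size_a + size_b) for c in clusters}
        d = {k: v for k, v in d.items() if a not in k and b not in k}
        for c, v in new.items():
            d[(c, next_id)] = v                  # c < next_id, so keys stay ordered
        clusters[next_id] = ((tree_a, tree_b), size_a + size_b)
        next_id += 1
    return next(iter(clusters.values()))[0]


sim = [[1.0, 0.9, 0.4, 0.3],
       [0.9, 1.0, 0.5, 0.35],
       [0.4, 0.5, 1.0, 0.8],
       [0.3, 0.35, 0.8, 1.0]]
print(upgma(similarity_to_distance(sim), ["A", "B", "C", "D"]))   # → (('A', 'B'), ('C', 'D'))
```

UPGMA implicitly assumes a constant rate of evolution across lineages, which Neighbour Joining does not; that is one reason to prefer NJ for guide trees.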
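
General progressive multiple alignment technique (follow the generated tree)
[Figure: following the guide tree from the leaves to the root, sequences 1-5 are progressively merged into a single multiple alignment]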

Progressive multiple alignment

Problem: accuracy is very important
Errors made early are propagated into the later progressive steps
"Once a gap, always a gap" (Feng and Doolittle, 1987)

[Figure: pairwise alignment quality versus sequence identity (Vogt et al., J. Mol. Biol. 249, 816-831, 1995)]
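
Multiple alignment profiles (Gribskov et al., 1987)
[Figure: for each alignment position i, the profile stores a value for every amino acid (A, C, D, ..., W, Y; e.g. 0.3, 0.1, 0, ..., 0.3, 0.3) together with gap penalties (e.g. 0.5, 1.0), giving position-dependent gap penalties]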
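
A minimal sketch of building a frequency profile from an alignment and scoring a residue against one profile position: the score is the frequency-weighted average of substitution scores. The toy substitution function (+2 for identities, -1 otherwise) is a hypothetical stand-in for a real matrix such as BLOSUM62, gaps are simply left out of the frequencies, and position-dependent gap penalties are not modelled.

```python
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(alignment: list[str]) -> list[dict[str, float]]:
    """One dict per column: amino acid -> frequency among that column's non-gap residues."""
    profile = []
    for column in zip(*alignment):
        residues = [r for r in column if r != "-"]
        counts = Counter(residues)
        total = len(residues)
        profile.append({a: (counts[a] / total if total else 0.0) for a in ALPHABET})
    return profile


def toy_subst(a: str, b: str) -> float:
    """Hypothetical stand-in for a real substitution matrix."""
    return 2.0 if a == b else -1.0


def residue_vs_column(column: dict[str, float], residue: str, subst=toy_subst) -> float:
    """Expected substitution score: sum over amino acids a of f(a) * M(a, residue)."""
    return sum(f * subst(a, residue) for a, f in column.items() if f > 0)


profile = build_profile(["ACD", "ACE", "GC-"])
print(residue_vs_column(profile[0], "A"))   # 2/3 * 2 + 1/3 * (-1), i.e. about 1.0
```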

Profile-sequence alignment
[Figure: a sequence aligned against a profile (columns A, C, D, ..., V, W, Y)]

Profile-profile alignment
[Figure: two profiles (columns A, C, D, ..., Y) aligned against each other]
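
Profile-profile alignment can be sketched as ordinary global dynamic programming in which each cell scores the expected substitution value between two columns, the sum over a, b of p(a) q(b) M(a, b). This sketch assumes profiles are lists of {amino acid: frequency} dictionaries, a hypothetical toy substitution function and a single linear gap penalty per inserted column; real tools use proper matrices and position-dependent gap penalties.

```python
def toy_subst(a: str, b: str) -> float:
    """Hypothetical stand-in for a real substitution matrix."""
    return 2.0 if a == b else -1.0


def column_pair_score(p: dict[str, float], q: dict[str, float], subst=toy_subst) -> float:
    """Expected score between two profile columns: sum_a sum_b p(a) q(b) M(a, b)."""
    return sum(pa * qb * subst(a, b) for a, pa in p.items() for b, qb in q.items())


def align_profiles(P: list[dict[str, float]], Q: list[dict[str, float]],
                   gap: float = -2.0) -> float:
    """Needleman-Wunsch score for aligning profile P against profile Q (linear gaps)."""
    n, m = len(P), len(Q)
    F = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + column_pair_score(P[i - 1], Q[j - 1]),
                          F[i - 1][j] + gap,      # column of P against a gap
                          F[i][j - 1] + gap)      # column of Q against a gap
    return F[n][m]


P = [{"A": 1.0}, {"C": 0.5, "D": 0.5}]
Q = [{"A": 0.5, "G": 0.5}, {"C": 1.0}]
print(align_profiles(P, Q))
```

A single sequence is just a profile whose columns each hold one residue at frequency 1.0, so the same routine also covers profile-sequence alignment.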

Clustal, ClustalW, ClustalX

CLUSTAL W/X (Thompson et al., 1994) uses the Neighbour Joining (NJ) algorithm (Saitou and Nei, 1987), widely used in phylogenetic analysis, to construct the guide tree.
Sequence blocks are represented by profiles, in which the individual sequences are additionally weighted according to the branch lengths in the NJ tree.
Further carefully crafted heuristics include: (i) local gap penalties, (ii) automatic selection of the amino acid substitution matrix, (iii) automatic gap penalty adjustment, and (iv) a mechanism to delay the alignment of sequences that appear to be distant at the time they are considered.
CLUSTAL (W/X) does not allow iteration, unlike the approaches of Hogeweg and Hesper (1984), Corpet (1988), Gotoh (1996) and Heringa (1999, 2002).
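
The branch-length weighting can be sketched in the spirit of ClustalW: each branch length is divided equally among the sequences below it, and a sequence's weight is the sum of its shares along the path from the root, so sequences in crowded subtrees are down-weighted. This assumes an already rooted tree given as nested (label or children, branch length) tuples; ClustalW's own rooting and any normalisation of the weights are not reproduced.

```python
Tree = tuple   # (leaf name or list of child nodes, branch length above this node)


def count_leaves(node: Tree) -> int:
    label, _ = node
    if isinstance(label, str):
        return 1
    return sum(count_leaves(child) for child in label)


def branch_share_weights(node: Tree, inherited: float = 0.0, weights=None) -> dict[str, float]:
    """Leaf weight = sum over branches on its root path of (branch length / leaves below it)."""
    if weights is None:
        weights = {}
    label, length = node
    share = inherited + length / count_leaves(node)   # branch shared by all leaves below it
    if isinstance(label, str):
        weights[label] = share
    else:
        for child in label:
            branch_share_weights(child, share, weights)
    return weights


# A and B are close relatives sharing a 0.2 branch; C sits alone on a long branch
tree = ([([("A", 0.1), ("B", 0.1)], 0.2), ("C", 0.5)], 0.0)
print(branch_share_weights(tree))   # A and B each get 0.1 + 0.1; C keeps 0.5
```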

Strategies for multiple sequence alignment

Profile pre-processing
Secondary structure-induced alignment
Globalised local alignment
Matrix extension
Objective: try to avoid (early) errors
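
Pre-profile generation
[Figure: pairwise alignment scores (Score 1-2, Score 1-3, ..., Score 4-5) are compared with a cut-off; for each of the sequences 1-5, a pre-alignment of that sequence with its partners is built and converted into a pre-profile (columns A, C, D, ..., Y)]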
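
The cut-off step can be sketched like this, assuming the pairwise scores are already computed: each sequence's pre-alignment collects the sequences whose score with it passes the cut-off, best-scoring first. The ordering and the >= comparison are illustrative choices, not necessarily Praline's exact rules.

```python
def pre_profile_members(n: int, scores: dict[tuple[int, int], float],
                        cutoff: float) -> dict[int, list[int]]:
    """For each sequence, list itself followed by the partners whose pairwise score passes the cut-off."""
    def s(i: int, j: int) -> float:
        return scores[(min(i, j), max(i, j))]   # pairs are stored once, smaller index first

    members = {}
    for i in range(n):
        partners = [j for j in range(n) if j != i and s(i, j) >= cutoff]
        partners.sort(key=lambda j: s(i, j), reverse=True)   # best-scoring partners first
        members[i] = [i] + partners
    return members


scores = {(0, 1): 50.0, (0, 2): 20.0, (1, 2): 35.0}
print(pre_profile_members(3, scores, cutoff=30.0))   # → {0: [0, 1], 1: [1, 0, 2], 2: [2, 1]}
```

Each member list would then be aligned into a pre-alignment and summarised as a pre-profile.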

Pre-profile alignment
[Figure: pre-profiles 1-5 (columns A, C, D, ..., Y) are aligned to give the final alignment of sequences 1-5]

Pre-profile alignment
[Figure: sequences 1-5 as they appear in the different pre-profiles, and the final alignment assembled from them]

Protein structure hierarchical levels
[Figure: hierarchical levels of protein structure, up to the tertiary structure (fold)]

One of the molecular biology dogmas

Structure is more conserved than sequence
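
Secondary structure-induced alignment

Using secondary structure for alignment
[Figure: dynamic programming search matrix for MDAGSTVILCFV (secondary structure HHHCCCEEEEEE) against MDAASTILCGS (HHHHCCEEECC); each cell's amino acid exchange weight is taken from a matrix selected by the secondary-structure states of the two residues (H/H, E/E and C/C matrices, otherwise a default matrix)]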
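
The idea can be sketched by choosing the exchange weight for each cell of the search matrix according to the secondary-structure states of the two residues. The helix, strand and coil scorers below are hypothetical identity-based stand-ins for real state-specific exchange matrices, and falling back to the default matrix whenever the two states differ is an assumption about how the default is used.

```python
from typing import Callable

Scorer = Callable[[str, str], int]


def identity_scorer(match: int, mismatch: int) -> Scorer:
    """Hypothetical toy exchange matrix: one score for identities, another for mismatches."""
    return lambda a, b: match if a == b else mismatch


# Hypothetical state-specific matrices (H = helix, E = strand, C = coil) and a default
MATRICES: dict[str, Scorer] = {"H": identity_scorer(3, -1),
                               "E": identity_scorer(4, -1),
                               "C": identity_scorer(2, -1)}
DEFAULT: Scorer = identity_scorer(1, -1)


def ss_exchange(a: str, b: str, ss_a: str, ss_b: str) -> int:
    """Use the matrix of the shared state if both residues are in the same state, else the default."""
    if ss_a == ss_b and ss_a in MATRICES:
        return MATRICES[ss_a](a, b)
    return DEFAULT(a, b)


def search_matrix(seq1: str, ss1: str, seq2: str, ss2: str) -> list[list[int]]:
    """Cell [i][j] holds the exchange weight for aligning seq1[i] with seq2[j]."""
    return [[ss_exchange(a, b, sa, sb) for b, sb in zip(seq2, ss2)]
            for a, sa in zip(seq1, ss1)]


M = search_matrix("MDAGSTVILCFV", "HHHCCCEEEEEE", "MDAASTILCGS", "HHHHCCEEECC")
print(M[7][6], M[3][3])   # I/I in two strands, then G (coil) vs A (helix) → 4 -1
```

A dynamic programming pass over this matrix, with gap penalties added, would then yield a secondary structure-induced alignment.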

Flavodoxin-CheY: using predicted secondary structure
[Figure: multiple alignment of flavodoxin and CheY sequences (1fx1, FLAV_DESVH, FLAV_DESGI, FLAV_DESSA, FLAV_DESDE, 2fcr, FLAV_ANASP, FLAV_ECOLI, FLAV_AZOVI, FLAV_ENTAG, 4fxn, FLAV_MEGEL, FLAV_CLOAB, 3chy), each sequence annotated with its secondary structure (h = helix, e = strand)]
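
Globalised local alignment
1. Local (Smith-Waterman, SW) alignment, using substitution matrix M and gap-opening and gap-extension penalties Po and Pe

2. Global (Needleman-Wunsch, NW) alignment, without M, Po or Pe
Double dynamic programming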

[Figure: M = BLOSUM62, Po = 0, Pe = 0]
[Figure: M = BLOSUM62, Po = 12, Pe = 1]
[Figure: M = BLOSUM62, Po = 60, Pe = 5]
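
One possible reading of the two-step scheme, sketched under simplifying assumptions: a single linear gap penalty instead of separate Po and Pe, a hypothetical match/mismatch score instead of BLOSUM62, and the first-pass matrix taken to hold, for each residue pair, the score of the best local alignment passing through that pair (forward plus backward Smith-Waterman). The second pass is a Needleman-Wunsch alignment on that matrix with no substitution matrix and no gap penalties. This is an illustration, not Praline's code.

```python
def toy_subst(a: str, b: str) -> float:
    """Hypothetical stand-in for BLOSUM62."""
    return 2.0 if a == b else -1.0


def sw_matrix(x: str, y: str, gap: float) -> list[list[float]]:
    """Smith-Waterman matrix: H[i][j] = best local score ending at prefixes x[:i], y[:j]."""
    H = [[0.0] * (len(y) + 1) for _ in range(len(x) + 1)]
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            H[i][j] = max(0.0, H[i - 1][j - 1] + toy_subst(x[i - 1], y[j - 1]),
                          H[i - 1][j] - gap, H[i][j - 1] - gap)
    return H


def local_through_matrix(x: str, y: str, gap: float) -> list[list[float]]:
    """T[i][j] = best local alignment score in which x[i] is aligned with y[j]."""
    n, m = len(x), len(y)
    fwd = sw_matrix(x, y, gap)                # local alignments ending just before (i, j)
    bwd = sw_matrix(x[::-1], y[::-1], gap)    # local alignments starting just after (i, j)
    return [[fwd[i][j] + toy_subst(x[i], y[j]) + bwd[n - i - 1][m - j - 1]
             for j in range(m)] for i in range(n)]


def global_on_matrix(T: list[list[float]]) -> tuple[float, list[tuple[int, int]]]:
    """Needleman-Wunsch on T with no substitution matrix and no gap penalties."""
    n, m = len(T), len(T[0])
    G = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            G[i][j] = max(G[i - 1][j - 1] + T[i - 1][j - 1], G[i - 1][j], G[i][j - 1])
    pairs, i, j = [], n, m
    while i > 0 and j > 0:                    # trace back the aligned residue pairs
        if G[i][j] == G[i - 1][j - 1] + T[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif G[i][j] == G[i - 1][j]:
            i -= 1
        else:
            j -= 1
    return G[n][m], pairs[::-1]


T = local_through_matrix("ACDE", "ACDE", gap=2.0)
print(global_on_matrix(T))   # → (32.0, [(0, 0), (1, 1), (2, 2), (3, 3)])
```

Changing the gap penalty of the first pass changes which local alignments feed into T, which corresponds to the parameter varied in the three BLOSUM62 settings.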

Matrix extension

T-Coffee: Tree-based Consistency Objective Function For alignmEnt Evaluation
Cedric Notredame, Des Higgins, Jaap Heringa. J. Mol. Biol. 302, 205-217 (2000)

Matrix extension: T-Coffee
[Figure: all pairwise alignments of sequences 1-4 (1-2, 1-3, 1-4, 2-3, 2-4, 3-4)]

Integrating alignment methods and alignment information with T-Coffee

Integrating different pairwise alignment techniques (NW, SW, ...)
Combining different multiple alignment methods (consensus multiple alignment)
Combining sequence alignment methods with structural alignment techniques
Plugging in user knowledge

Using different sources of alignment information
[Figure: structure alignments, Clustal, Clustal, Dialign, Lalign and manual alignments combined by T-Coffee]

Search matrix extension

T-Coffee

Combine different alignment techniques by adding scores:
$$W(A(x), B(y)) = \sum S(A(x), B(y))$$
where A(x) is residue x in sequence A, the summation is over the scores S of the global and local alignments containing the residue pair (A(x), B(y)), and S is the percentage sequence identity of the associated alignment.
Combine the direct alignment seqA-seqB with each seqA-seqI-seqB:
$$W'(A(x), B(y)) = W(A(x), B(y)) + \sum_{I \neq A, B} \min\big(W(A(x), I(z)), W(I(z), B(y))\big)$$
where the summation is over all third sequences I other than A or B, and I(z) is a residue of I aligned with both A(x) and B(y).
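
A small sketch of the extension step, assuming the primary library is already built (each aligned residue pair carrying its summed weight W) and stored as a dictionary keyed by pairs of (sequence name, position) tuples; this layout is an illustrative choice, not T-Coffee's internal format. Every path A(x) to I(z) to B(y) adds the weaker of its two links, so pairs supported by many third sequences are reinforced, and pairs never aligned directly can still receive weight.

```python
from collections import defaultdict

Residue = tuple[str, int]   # (sequence name, residue position)


def extend_library(primary: dict[tuple[Residue, Residue], float]) -> dict[tuple[Residue, Residue], float]:
    """W'(A(x), B(y)) = W(A(x), B(y)) + sum over I != A, B of min(W(A(x), I(z)), W(I(z), B(y)))."""
    partners = defaultdict(dict)                     # residue -> {paired residue: weight}
    for (r1, r2), w in primary.items():
        partners[r1][r2] = w
        partners[r2][r1] = w
    extended = defaultdict(float)
    for ra, nbrs in partners.items():
        for rz, w_az in nbrs.items():
            extended[(ra, rz)] += w_az               # direct term W(A(x), I(z))
            for rb, w_zb in partners[rz].items():
                if rb[0] != ra[0]:                   # B differs from A; I differs from both by construction
                    extended[(ra, rb)] += min(w_az, w_zb)
    return dict(extended)


primary = {(("A", 0), ("B", 0)): 80.0,
           (("A", 0), ("C", 0)): 60.0,
           (("B", 0), ("C", 0)): 70.0}
ext = extend_library(primary)
print(ext[(("A", 0), ("B", 0))])   # direct 80 + min(60, 70) via C → 140.0
```

The sketch assumes the primary library only pairs residues from different sequences, and it stores both orientations of each pair in the result.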

T-Coffee
[Figure: the direct alignment of two sequences is combined with their alignments through other sequences]

Search matrix extension

Evaluating multiple alignments

Conflicting standards of truth: evolution, structure, function
With orphan sequences, no additional information is available
Benchmarks depend on reference alignments
Quality issues of the available reference alignment databases
Different ways to quantify agreement with a reference alignment (sum-of-pairs, column score)
Charlie Chaplin problem

Evaluating multiple alignments

As a standard of truth, a reference alignment based on structural superposition is often taken

Evaluation measures
[Figure: a query alignment compared with a reference alignment using the column score and the sum-of-pairs score]
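
The two measures can be sketched using their common definitions: the sum-of-pairs (SP) score is the fraction of residue pairs aligned in the reference that the query alignment also aligns, and the column score (CS) is the fraction of reference columns that the query reproduces exactly. This assumes both alignments list the same sequences in the same order; benchmark tools may additionally restrict scoring to reliable core regions.

```python
from itertools import combinations


def residue_columns(alignment: list[str]) -> list[tuple]:
    """Replace each residue by (row, residue index in its sequence); gaps become None."""
    counters = [0] * len(alignment)
    columns = []
    for column in zip(*alignment):
        ids = []
        for row, ch in enumerate(column):
            if ch == "-":
                ids.append(None)
            else:
                ids.append((row, counters[row]))
                counters[row] += 1
        columns.append(tuple(ids))
    return columns


def aligned_pairs(alignment: list[str]) -> set:
    """All residue pairs that share a column."""
    pairs = set()
    for ids in residue_columns(alignment):
        pairs.update(combinations([r for r in ids if r is not None], 2))
    return pairs


def sp_score(query: list[str], reference: list[str]) -> float:
    """Fraction of reference residue pairs that the query also aligns."""
    ref = aligned_pairs(reference)
    return len(ref & aligned_pairs(query)) / len(ref)


def cs_score(query: list[str], reference: list[str]) -> float:
    """Fraction of reference columns reproduced exactly in the query."""
    query_cols = set(residue_columns(query))
    ref_cols = residue_columns(reference)
    return sum(col in query_cols for col in ref_cols) / len(ref_cols)


reference = ["ACD", "A-D"]
query = ["ACD", "AD-"]
print(sp_score(query, reference), cs_score(query, reference))   # 1 of 2 pairs, 1 of 3 columns
```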
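
Evaluating multiple alignments
[Figure: table of SP-score results for BAliBASE alignments, with columns for the alignment, the number of sequences (nseq) and the alignment length (len)]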

Summary

Weighting schemes simulating simultaneous multiple alignment:
Profile pre-processing (global/local)
Matrix extension (well-balanced scheme)
Smoothing alignment signals: globalised local alignment
Using additional information: secondary structure-driven alignment
These schemes strike a balance between speed and sensitivity

References

Heringa, J. (1999) Two strategies for sequence comparison: profile-preprocessed and secondary structure-induced multiple alignment. Comput. Chem. 23, 341-364.
Notredame, C., Higgins, D.G. and Heringa, J. (2000) T-Coffee: a novel method for fast and accurate multiple sequence alignment. J. Mol. Biol. 302, 205-217.
Heringa, J. (2002) Local weighting schemes for protein multiple sequence alignment. Comput. Chem. 26(5), 459-477.

Where to find this: http://www.ibivu.cs.vu.nl/teaching
