Title: Medical Natural Sciences Year 2: Introduction to Bioinformatics
1 Medical Natural Sciences Year 2: Introduction to Bioinformatics
Lecture 8: Multiple sequence alignment (II)
Centre for Integrative Bioinformatics VU
2 Progressive multiple sequence alignment
- Accuracy is very important
- Problem:
  - Errors are propagated through the progressive steps
  - "Once a gap, always a gap" (Feng and Doolittle, 1987)
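3 Progressive multiple alignment: general principle
[Figure: all pairwise alignments (Score 1-2, Score 1-3, ..., Score 4-5) give the scores of a 5×5 similarity matrix; the scores are converted to distances, from which a guide tree is built; the guide tree directs the multiple alignment. Iteration possibilities are marked in the scheme.]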
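The route from pairwise scores to a guide tree can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the conversion d = 1 - s/s_max, the average-linkage (UPGMA-style) clustering and the example scores are all made up for the sketch and are not taken from any particular alignment program.

```python
from itertools import combinations


def scores_to_distances(scores):
    """Convert pairwise similarity scores to distances (assumed rule: d = 1 - s / s_max)."""
    s_max = max(scores.values())
    return {pair: 1.0 - s / s_max for pair, s in scores.items()}


def guide_tree(names, dist):
    """Average-linkage clustering; returns the guide tree as nested tuples."""
    clusters = [(name, [name]) for name in names]  # (subtree, member names)

    def avg_dist(c1, c2):
        total = sum(dist[frozenset((x, y))] for x in c1[1] for y in c2[1])
        return total / (len(c1[1]) * len(c2[1]))

    while len(clusters) > 1:
        # Join the two closest clusters; the join order is the alignment order.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: avg_dist(clusters[ij[0]], clusters[ij[1]]))
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]


# Made-up pairwise alignment scores for five sequences.
raw = {("1", "2"): 90, ("1", "3"): 40, ("2", "3"): 45, ("4", "5"): 80,
       ("1", "4"): 20, ("1", "5"): 22, ("2", "4"): 21, ("2", "5"): 19,
       ("3", "4"): 25, ("3", "5"): 24}
scores = {frozenset(pair): s for pair, s in raw.items()}
tree = guide_tree(["1", "2", "3", "4", "5"], scores_to_distances(scores))
print(tree)
```

Reading the nested tuples from the inside out gives the order in which sequences and blocks would be aligned.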
4 Multiple alignment profiles (Gribskov et al., 1987)
Profile column i:
A    C    D    ...  W    Y
0.3  0.1  0    ...  0.3  0.3
Gap penalties: 0.5, 1.0
Position-dependent gap penalties
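As a rough illustration of what a profile holds, the sketch below turns an alignment block into per-column residue frequencies plus a gap penalty for each position. The penalty rule (a linear drop from 1.0 towards 0.5 as the fraction of gaps in the column grows) is an assumption chosen to echo the two values on the slide; it is not the formula of Gribskov et al.

```python
from collections import Counter


def build_profile(alignment, gap_max=1.0, gap_min=0.5):
    """Return one (frequencies, gap_penalty) pair per alignment column."""
    n = len(alignment)
    profile = []
    for column in zip(*alignment):
        counts = Counter(res for res in column if res != "-")
        # Frequencies are taken over all sequences, so gaps lower the total.
        freqs = {res: c / n for res, c in counts.items()}
        gap_fraction = column.count("-") / n
        # Assumed rule: columns that already contain gaps are cheaper to gap.
        penalty = gap_max - (gap_max - gap_min) * gap_fraction
        profile.append((freqs, penalty))
    return profile


block = ["ACD-",
         "ACDW",
         "A-DW",
         "ACEY"]
for position, (freqs, penalty) in enumerate(build_profile(block)):
    print(position, freqs, penalty)
```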
5 Clustal, ClustalW, ClustalX
- CLUSTAL W/X (Thompson et al., 1994) uses the Neighbour Joining (NJ) algorithm (Saitou and Nei, 1987), widely used in phylogenetic analysis, to construct a guide tree.
- Sequence blocks are represented by profiles, in which the individual sequences are additionally weighted according to the branch lengths in the NJ tree.
- Further carefully crafted heuristics include:
  - (i) local gap penalties
  - (ii) automatic selection of the amino acid substitution matrix
  - (iii) automatic gap penalty adjustment
  - (iv) a mechanism to delay alignment of sequences that appear to be distant at the time they are considered
- CLUSTAL (W/X) does not allow iteration (Hogeweg and Hesper, 1984; Corpet, 1988; Gotoh, 1996; Heringa, 1999, 2002).
6 Sequence weighting
Pair-wise alignment quality versus sequence identity (Vogt et al., JMB 249, 816-831, 1995)
7 Pair-wise sequence alignment (more than just string matching)
Global dynamic programming
[Figure: two sequences related by evolution, MDAGSTVILCFVG and MDAASTILCGS, are compared in a search matrix using an amino acid exchange matrix and gap penalties (open, extension), giving the alignment below.]
MDAGSTVILCFVG-
MDAAST-ILC--GS
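Global dynamic programming can be sketched as follows. This is a simplified Needleman-Wunsch variant, not the exact scheme on the slide: it assumes a flat match/mismatch score in place of an amino acid exchange matrix and a single linear gap penalty in place of separate open and extension penalties, so it will not necessarily reproduce the alignment shown above.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment with a linear gap penalty; returns (score, row_a, row_b)."""
    n, m = len(a), len(b)
    # Fill the search matrix; row and column 0 hold leading-gap scores.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                              score[i - 1][j] + gap,    # gap in b
                              score[i][j - 1] + gap)    # gap in a
    # Trace back from the bottom-right corner to recover one optimal path.
    row_a, row_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        s = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + s:
            row_a.append(a[i - 1]); row_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            row_a.append(a[i - 1]); row_b.append("-"); i -= 1
        else:
            row_a.append("-"); row_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(row_a)), "".join(reversed(row_b))


total, top, bottom = needleman_wunsch("MDAGSTVILCFVG", "MDAASTILCGS")
print(total)
print(top)
print(bottom)
```

Swapping the match/mismatch rule for a lookup in an exchange matrix, and the linear penalty for affine (open, extension) penalties, brings the sketch closer to what the slide describes.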
8 Integrating Primary and Predicted Secondary Structure data for Multiple Alignment
Victor Simossis, Jaap Heringa
Centre for Integrative Bioinformatics VU (IBIVU), Vrije Universiteit Amsterdam, The Netherlands
9 Using secondary structure in multiple alignment
Structure is more conserved than sequence
- 10 years of SS prediction method development: Q3 5%
- 10 years of MA method development: difference in Q3 can be 40%
10 Using secondary structure in multiple alignment
Secondary structure prediction Q3 76%: SS prediction now good enough(?)
11 Secondary structure-induced alignment iteration
12 Flavodoxin-cheY multiple alignment: Praline with pre-processing
1fx1        -PKALIVYGSTTGNT-EYTAETIARQLANAG-YEVDSRDAASVEAGGLFEGFDLVLLGCSTWGDDSI------ELQDDFIPLF-DSLEETGAQGRKVACF
FLAV_DESDE  MSKVLIVFGSSTGNT-ESIaQKLEELIAAGG-HEVTLLNAADASAENLADGYDAVLFgCSAWGMEDL------EMQDDFLSLF-EEFNRFGLAGRKVAAf
FLAV_DESVH  MPKALIVYGSTTGNT-EYTaETIARELADAG-YEVDSRDAASVEAGGLFEGFDLVLLgCSTWGDDSI------ELQDDFIPLF-DSLEETGAQGRKVACf
FLAV_DESSA  MSKSLIVYGSTTGNT-ETAaEYVAEAFENKE-IDVELKNVTDVSVADLGNGYDIVLFgCSTWGEEEI------ELQDDFIPLY-DSLENADLKGKKVSVf
FLAV_DESGI  MPKALIVYGSTTGNT-EGVaEAIAKTLNSEG-METTVVNVADVTAPGLAEGYDVVLLgCSTWGDDEI------ELQEDFVPLY-EDLDRAGLKDKKVGVf
2fcr        --KIGIFFSTSTGNT-TEVADFIGKTLGA---KADAPIDVDDVTDPQALKDYDLLFLGAPTWNTG----ADTERSGTSWDEFLYDKLPEVDMKDLPVAIF
FLAV_AZOVI  -AKIGLFFGSNTGKT-RKVaKSIKKRFDDET-MSDA-LNVNRVS-AEDFAQYQFLILgTPTLGEGELPGLSSDCENESWEEFL-PKIEGLDFSGKTVALf
FLAV_ENTAG  MATIGIFFGSDTGQT-RKVaKLIHQKLDG---IADAPLDVRRAT-REQFLSYPVLLLgTPTLGDGELPGVEAGSQYDSWQEFT-NTLSEADLTGKTVALf
FLAV_ANASP  SKKIGLFYGTQTGKT-ESVaEIIRDEFGN---DVVTLHDVSQAE-VTDLNDYQYLIIgCPTWNIGEL--------QSDWEGLY-SELDDVDFNGKLVAYf
FLAV_ECOLI  -AITGIFFGSDTGNT-ENIaKMIQKQLGK---DVADVHDIAKSS-KEDLEAYDILLLgIPTWYYGE--------AQCDWDDFF-PTLEEIDFNGKLVALf
4fxn        -MK--IVYWSGTGNT-EKMAELIAKGIIESG-KDVNTINVSDVNIDELL-NEDILILGCSAMGDEVL-------EESEFEPFI-EEIS-TKISGKKVALF
FLAV_MEGEL  MVE--IVYWSGTGNT-EAMaNEIEAAVKAAG-ADVESVRFEDTNVDDVA-SKDVILLgCPAMGSEEL-------EDSVVEPFF-TDLA-PKLKGKKVGLf
FLAV_CLOAB  -MKISILYSSKTGKT-ERVaKLIEEGVKRSGNIEVKTMNLDAVD-KKFLQESEGIIFgTPTYYAN---------ISWEMKKWI-DESSEFNLEGKLGAAf
3chy        ADKELKFLVVDDFSTMRRIVRNLLKELGFN--NVEEAEDGVDALNKLQAGGYGFVI---SDWNMPNM----------DGLELL-KTIRADGAMSALPVLM
T

1fx1        GCGDS-SY-EYFCGA-VDAIEEKLKNLGAEIVQD---------------------GLRIDGD--PRAARDDIVGWAHDVRGAI--------
FLAV_DESDE  ASGDQ-EY-EHFCGA-VPAIEERAKELgATIIAE---------------------GLKMEGD--ASNDPEAVASfAEDVLKQL--------
FLAV_DESVH  GCGDS-SY-EYFCGA-VDAIEEKLKNLgAEIVQD---------------------GLRIDGD--PRAARDDIVGwAHDVRGAI--------
13 PRALINE: using secondary structure for alignment
Dynamic programming search matrix
Sequence 1: MDAGSTVILCFV
SS 1:       HHHCCCEEEEEE
Sequence 2: MDAASTILCGS
SS 2:       HHHHCCEEECC
Amino acid exchange weights matrices: H-H, C-C, E-E, Default
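One way to read this slide is that each cell of the search matrix is scored with an exchange matrix chosen by the secondary structure states of the two residues: a helix, strand or coil matrix when both states agree, and the default matrix otherwise. The sketch below follows that reading. The selection rule is an interpretation of the slide, and the toy match/mismatch "matrices" and their values are hypothetical stand-ins for real exchange matrices.

```python
def toy_matrix(match, mismatch):
    """Hypothetical stand-in for an amino acid exchange matrix."""
    return lambda x, y: match if x == y else mismatch


# Illustrative values only; real SS-specific matrices are derived from data.
SS_MATRICES = {"H": toy_matrix(4, -1), "E": toy_matrix(5, -2), "C": toy_matrix(3, -1)}
DEFAULT_MATRIX = toy_matrix(2, -1)


def exchange_weight(res_a, ss_a, res_b, ss_b):
    """Use the SS-specific matrix when both states agree, else the default."""
    if ss_a == ss_b:
        return SS_MATRICES.get(ss_a, DEFAULT_MATRIX)(res_a, res_b)
    return DEFAULT_MATRIX(res_a, res_b)


def search_matrix(seq_a, ss_a, seq_b, ss_b):
    """Exchange weights for every residue pair, before dynamic programming."""
    return [[exchange_weight(ra, sa, rb, sb) for rb, sb in zip(seq_b, ss_b)]
            for ra, sa in zip(seq_a, ss_a)]


weights = search_matrix("MDAGSTVILCFV", "HHHCCCEEEEEE",
                        "MDAASTILCGS", "HHHHCCEEECC")
for row in weights:
    print(row)
```

The resulting weights would then be fed into the usual dynamic programming recursion in place of a single exchange matrix.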
14 Flavodoxin-cheY using predicted secondary structure
1fx1        -PK-ALIVYGSTTGNTEYTAETIARQLANAG-YEVDSRDAASVEAGGLFEGFDLVLLGCSTWGDDSI------ELQDDFIPLFDS-LEETGAQGRKVACF
            e eeee b ssshhhhhhhhhhhhhhttt eeeee stt tttttt seeee b ee sss ee ttthhhhtt ttss tt eeeee
FLAV_DESVH  MPK-ALIVYGSTTGNTEYTaETIARELADAG-YEVDSRDAASVEAGGLFEGFDLVLLgCSTWGDDSI------ELQDDFIPLFDS-LEETGAQGRKVACf
            e eeeeee hhhhhhhhhhhhhhh eeeeee eeeeee hhhhhh eeeee
FLAV_DESGI  MPK-ALIVYGSTTGNTEGVaEAIAKTLNSEG-METTVVNVADVTAPGLAEGYDVVLLgCSTWGDDEI------ELQEDFVPLYED-LDRAGLKDKKVGVf
            e eeeeee hhhhhhhhhhhhhh eeeeee hhhhhh eeeeeee hhhhhh eeeeee
FLAV_DESSA  MSK-SLIVYGSTTGNTETAaEYVAEAFENKE-IDVELKNVTDVSVADLGNGYDIVLFgCSTWGEEEI------ELQDDFIPLYDS-LENADLKGKKVSVf
            eeeeee hhhhhhhhhhhhhh eeeee eeeee hhhhhhh h eeeee
FLAV_DESDE  MSK-VLIVFGSSTGNTESIaQKLEELIAAGG-HEVTLLNAADASAENLADGYDAVLFgCSAWGMEDL------EMQDDFLSLFEE-FNRFGLAGRKVAAf
            eeee hhhhhhhhhhhhhh eeeee hhhhhhhhhhheeeee hhhhhhh hh eeeee
2fcr        --K-IGIFFSTSTGNTTEVADFIGKTLGAK---ADAPIDVDDVTDPQALKDYDLLFLGAPTWNTGAD----TERSGTSWDEFLYDKLPEVDMKDLPVAIF
            eeeee ssshhhhhhhhhhhhhggg b eeggg s gggggg seeeeeee stt s s s sthhhhhhhtggg tt eeeee
FLAV_ANASP  SKK-IGLFYGTQTGKTESVaEIIRDEFGND--VVTL-HDVSQAE-VTDLNDYQYLIIgCPTWNIGEL--------QSDWEGLYSE-LDDVDFNGKLVAYf
            eeeee hhhhhhhhhhhh eee hhh hhhhhhheeeeee hhhhhhhhh eeeeee
FLAV_ECOLI  -AI-TGIFFGSDTGNTENIaKMIQKQLGKD--VADV-HDIAKSS-KEDLEAYDILLLgIPTWYYGEA--------QCDWDDFFPT-LEEIDFNGKLVALf
            eee hhhhhhhhhhhh eee hhh hhhhhhheeeee hhhhh eeeeee
FLAV_AZOVI  -AK-IGLFFGSNTGKTRKVaKSIKKRFDDET-MSDA-LNVNRVS-AEDFAQYQFLILgTPTLGEGELPGLSSDCENESWEEFLPK-IEGLDFSGKTVALf
            eee hhhhhhhhhhhhh hhh hhhhhhheeeee hhhhhhhhh eeeeee
FLAV_ENTAG  MAT-IGIFFGSDTGQTRKVaKLIHQKLDG---IADAPLDVRRAT-REQFLSYPVLLLgTPTLGDGELPGVEAGSQYDSWQEFTNT-LSEADLTGKTVALf
            eeee hhhhhhhhhhhh hhh hhhhhhheeeee hhhhh eeeee
4fxn        ----MKIVYWSGTGNTEKMAELIAKGIIESG-KDVNTINVSDVNIDELLNE-DILILGCSAMGDEVL------E-ESEFEPFIEE-IST-KISGKKVALF
            eeeee ssshhhhhhhhhhhhhhhtt eeeettt sttttt seeeeee btttb ttthhhhhhh hst t tt eeeee
FLAV_MEGEL  M---VEIVYWSGTGNTEAMaNEIEAAVKAAG-ADVESVRFEDTNVDDVASK-DVILLgCPAMGSEEL------E-DSVVEPFFTD-LAP-KLKGKKVGLf
            hhhhhhhhhhhhhh eeeee hhhhhhhh eeeee eeeee
FLAV_CLOAB  M-K-ISILYSSKTGKTERVaKLIEEGVKRSGNIEVKTMNL-DAVDKKFLQESEGIIFgTPTY-YANI--------SWEMKKWIDE-SSEFNLEGKLGAAf
            eee hhhhhhhhhhhhhh eeeeee hhhhhhhhhh eeee hhhhhhhhh eeeee
3chy        ADKELKFLVVDDFSTMRRIVRNLLKELGFNN-VEEAEDGV-DALNKLQAGGYGFVISD---WNMPNM----------DGLELLKTIRADGAMSALPVLMV
            tt eeee s hhhhhhhhhhhhhht eeeesshh hhhhhhhh eeeee s sss hhhhhhhhhh ttttt eeee

1fx1        GCGDS-SY-EYFCGAVDAIEEKLKNLGAEIVQD---------------------GLRIDGD--PRAARDDIVGWAHDVRGAI--------
            eee s ss sstthhhhhhhhhhhttt ee s eeees gggghhhhhhhhhhhhhh
FLAV_DESVH  GCGDS-SY-EYFCGAVDAIEEKLKNLgAEIVQD---------------------GLRIDGD--PRAARDDIVGwAHDVRGAI--------
            eee hhhhhhhhhhhh eeeee eeeee hhhhhhhhhhhhhh
FLAV_DESGI  GCGDS-SY-TYFCGAVDVIEKKAEELgATLVAS---------------------SLKIDGE--P--DSAEVLDwAREVLARV--------
            eee hhhhhhhhhhhh eeeee hhhhhhhhhhh
FLAV_DESSA  GCGDS-DY-TYFCGAVDAIEEKLEKMgAVVIGD---------------------SLKIDGD--P--ERDEIVSwGSGIADKI--------
            hhhhhhhhhhhh eeeee e eee
FLAV_DESDE  ASGDQ-EY-EHFCGAVPAIEERAKELgATIIAE---------------------GLKMEGD--ASNDPEAVASfAEDVLKQL--------
            e hhhhhhhhhhhhhh eeeee ee hhhhhhhhhhh
2fcr        GLGDAEGYPDNFCDAIEEIHDCFAKQGAKPVGFSNPDDYDYEESKSVRD-GKFLGLPLDMVNDQIPMEKRVAGWVEAVVSETGV------
            eee ttt ttsttthhhhhhhhhhhtt eee b gggs s tteet teesseeeettt ss hhhhhhhhhhhhhhhht
FLAV_ANASP  GTGDQIGYADNFQDAIGILEEKISQRgGKTVGYWSTDGYDFNDSKALR-NGKFVGLALDEDNQSDLTDDRIKSwVAQLKSEFGL------
            hhhhhhhhhhhhhh eeee hhhhhhhhhhhhhhhh
FLAV_ECOLI  GCGDQEDYAEYFCDALGTIRDIIEPRgATIVGHWPTAGYHFEASKGLADDDHFVGLAIDEDRQPELTAERVEKwVKQISEELHLDEILNA
            hhhhhhhhhhhhhh eeee hhhhhhhhhhhhhhhhhh
FLAV_AZOVI  GLGDQVGYPENYLDALGELYSFFKDRgAKIVGSWSTDGYEFESSEAVVD-GKFVGLALDLDNQSGKTDERVAAwLAQIAPEFGLS--L--
            e hhhhhhhhhhhhhh eeeee hhhhhhhhhhh
FLAV_ENTAG  GLGDQLNYSKNFVSAMRILYDLVIARgACVVGNWPREGYKFSFSAALLENNEFVGLPLDQENQYDLTEERIDSwLEKLKPAV-L------
            hhhhhhhhhhhhhhh eeee hhhhhhh hhhhhhhhhhhh
4fxn        G-----SYGWGDGKWMRDFEERMNGYGCVVVET---------------------PLIVQNE--PDEAEQDCIEFGKKIANI---------
            e eesss shhhhhhhhhhhhtt ee s eeees ggghhhhhhhhhhhht
FLAV_MEGEL  G-----SYGWGSGEWMDAWKQRTEDTgATVIGT----------------------AIVNEM--PDNAPE-CKElGEAAAKA---------
            hhhhhhhhhhh eeeee eeee h hhhhhhhh
FLAV_CLOAB  STANSIA-GGSDIALLTILNHLMVK-gMLVYSG----GVAFGKPKTHLG-----YVHINEI--QENEDENARIfGERiANkV--KQIF--
            hhhhhhhhhhhhhh eeeee hhhh hhh hhhhhhhhhhhh h
3chy        -----------TAEAKKENIIAAAQAGASGY-------------------------VVK----P-FTAATLEEKLNKIFEKLGM------
            ess hhhhhhhhhtt see ees s hhhhhhhhhhhhhhht
G
15 Iteration
Convergence
Limit cycle
Divergence
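The three outcomes can be told apart in practice by remembering every state an iteration has visited. The sketch below is a generic illustration, not part of any alignment program: `step` is a hypothetical function that produces the next state (for example a re-aligned set of sequences, stored as a hashable value such as a tuple of strings). Divergence cannot be proven in finite time, so running out of iterations is reported as "no convergence".

```python
def classify_iteration(step, state, max_iter=50):
    """Iterate `step` and report (outcome, number) for the run."""
    seen = {state: 0}  # state -> iteration at which it first appeared
    for k in range(1, max_iter + 1):
        new_state = step(state)
        if new_state == state:
            return "convergence", k            # fixed point reached at iteration k
        if new_state in seen:
            return "limit cycle", k - seen[new_state]  # cycle length
        seen[new_state] = k
        state = new_state
    return "no convergence", max_iter


# Toy step functions on integers stand in for a real re-alignment step.
print(classify_iteration(lambda x: min(x + 1, 5), 0))
print(classify_iteration(lambda x: (x + 1) % 3, 0))
print(classify_iteration(lambda x: x + 1, 0, max_iter=10))
```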
16 Flavodoxin-cheY multiple alignment/secondary structure iteration: cheY SSEs
3chy-AA SEQUENCE  AA   ADKELKFLVVDDFSTMRRIVRNLLKELGFNNVEEAEDGVDALNKLQAGGYGFVISDWNMP
3chy-ITERATION-0  PHD  EEEEEEE HHHHHHHHHHHHHHHHH E HHHHHHHHHH HHHEEE
3chy-ITERATION-1  PHD  EEEEEEEE HHHHHHHHHHHHHHH HHHHHHHH EEEEEE
3chy-ITERATION-2  PHD  EEEEEEEE HHHHHHHHHHHHHH HHHHHHHHH EEEEEE
3chy-ITERATION-3  PHD  EEEEEEEE HHHHHHHHHHHHHH EEE HHHHHH EEEEE
3chy-ITERATION-4  PHD  EEEEEEEE HHHHHHHHHHHHHH HHHHHHH EEEEE
3chy-ITERATION-5  PHD  EEEEEEEE HHHHHHHHHHHHHH EEE HHHHHH EEEEE
3chy-ITERATION-6  PHD  EEEEEEEE HHHHHHHHHHHHHH HHHHHHHH EEEEEE
3chy-ITERATION-7  PHD  EEEEEEEE HHHHHHHHHHHHHH EEE HHHHHH EEEEE
3chy-ITERATION-8  PHD  EEEEEEEE HHHHHHHHHHHHHH HHHHHHH EEEEEE
3chy-ITERATION-9  PHD  EEEEEEEE HHHHHHHHHHHHHH HHHHHHHHHH EEEEE

3chy-AA SEQUENCE  AA   NMDGLELLKTIRADGAMSALPVLMVTAEAKKENIIAAAQAGASGYVVKPFTAATLEEKLNKIFEKLGM
3chy-ITERATION-0  PHD  HHHHHHEEEEEE HHHHHHHHHHHHHHHHH HHHHHHHHHHHHHH
3chy-ITERATION-1  PHD  HHHHHHEEEEEE HHH HHHHHHHHHHHHHHHHHH EEE HHHHHHHHHHHHHH
3chy-ITERATION-2  PHD  HHHHHHEEEEEE HHHHHHHHHHHHHHHHHH EEE HHHHHHHHHHHHHH
3chy-ITERATION-3  PHD  HHHHHHHHHHHH HHHHHHHHHHHHHHHHHH EEE HHHHHHHHHHHHHH
3chy-ITERATION-4  PHD  HHHHH EEEEE HHHHHHHHHHHHHHHHH EEE HHHHHHHHHHHHHH
3chy-ITERATION-5  PHD  HHHHHHHH EEEEE HHHHHHHHHHHHHHHH EEE HHHHHHHHHHHHHH
3chy-ITERATION-6  PHD  HHHHHHHH EEEEE HHHHHHHHHHHHHHHH EEEE HHHHHHHHHHHHHH
3chy-ITERATION-7  PHD  HHHHHHHH EEEEEE HHHHHHHHHHHHHHHH EEE HHHHHHHHHHHHHH
3chy-ITERATION-8  PHD  HHHHHHHH EEEEE HHHHHHHHHHHHHHHH EEE HHHHHHHHHHHHHH
3chy-ITERATION-9  PHD  HHHHHHHH EEEEE HHHHHHHHHHHHHHH EEEE HHHHHHHHHHHHHH
26 Secondary structure prediction-based alignment
- Evaluation using the HOMSTRAD database of structural alignments
- Compared to PHD, secondary structure prediction/MSA iteration improves both alignment and secondary structure prediction by 3-4%
- Iteration can be controlled by an MSA sum-of-pairs score and a secondary structure prediction consistency score
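A sum-of-pairs score adds up the scores of all residue pairs in every column of the alignment. The sketch below uses a deliberately simple scheme (match +1, mismatch 0, residue against gap -1, gap against gap ignored); these values are assumptions for illustration, whereas a real implementation would score pairs with an exchange matrix and gap penalties.

```python
from itertools import combinations


def sum_of_pairs(alignment, match=1, mismatch=0, gap=-1):
    """Sum pairwise scores over all columns of a multiple alignment."""
    total = 0
    for column in zip(*alignment):
        for x, y in combinations(column, 2):
            if x == "-" and y == "-":
                continue          # gap against gap carries no score
            if x == "-" or y == "-":
                total += gap
            elif x == y:
                total += match
            else:
                total += mismatch
    return total


msa = ["AC-",
       "ACD",
       "A-D"]
print(sum_of_pairs(msa))
```

An iteration controlled by this score could, for example, stop as soon as a new round no longer increases it.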
27 Symmetry-derived secondary structure prediction using multiple sequence alignments (SymSSP)
Victor Simossis, Jaap Heringa
Centre for Integrative Bioinformatics VU (IBIVU), Vrije Universiteit Amsterdam, The Netherlands
28 Praline: profile pre-processing
"Once a gap, always a gap"
Use information from all sequences right from the start
Heringa (1999, 2002), Kleinjung et al. (2002)
29 Progressive multiple alignment
[Figure: pairwise alignment scores (Score 1-2, Score 1-3, ..., Score 4-5) fill a 5×5 similarity matrix, from which a guide tree is built that directs the multiple alignment.]
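30 Progressive multiple alignment
[Figure: guide tree with distance axis d; the alignment grows stepwise along the tree, from sequences 1 and 3, to 1, 3, 2 and 5, to all five sequences (1, 3, 2, 5, 4).]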
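31 Profile pre-processing
[Figure: using the pairwise scores (Score 1-2, Score 1-3, ..., Score 4-5), sequences 2-5 are aligned to key sequence 1 to form a pre-alignment, which is converted to a pre-profile for sequence 1 (columns A C D ... Y). Labels: Pi, Px.]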
32 Profile pre-processing
[Figure: from the pairwise scores (Score 1-2, Score 1-3, ..., Score 4-5), a pre-alignment and a pre-profile (columns A C D ... Y) are built for every key sequence; shown are key sequences 1, 2 and 5, each with the other four sequences.]
33 Pre-profile alignment
[Figure: the pre-profiles of sequences 1-5 (each with columns A C D ... Y) are aligned to give the final alignment of sequences 1-5.]
34 Pre-profile alignment
[Figure: the pre-alignments of key sequences 1-5, each containing all five sequences, are combined into the final alignment of sequences 1-5.]
35 Pre-profile alignment: alignment consistency
[Figure: the pre-alignments of key sequences 1-5; residue Ala131 of sequence 1 is matched across the pre-alignments with A131, A131, L133, C126 and A131.]
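The Ala131 example (partners A131, A131, L133, C126, A131) suggests one simple way to put a number on alignment consistency: the fraction of pre-alignments that agree on the most common partner residue. The sketch below computes that fraction. It is only an illustrative measure under that assumption and is not claimed to be the score PRALINE itself calculates.

```python
from collections import Counter


def consistency(partners):
    """Return the most common partner residue and the fraction that agree with it."""
    counts = Counter(partners)
    best, n_best = counts.most_common(1)[0]
    return best, n_best / len(partners)


# Partner residues observed for Ala131 on the slide.
observed = ["A131", "A131", "L133", "C126", "A131"]
print(consistency(observed))
```

A residue that is matched to the same partner in every pre-alignment would score 1.0, and one that is matched differently each time would score close to 0.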
36 Flavodoxin-cheY consistency scores (prepro0)
1fx1          --7899999999999TEYTAETIARQL8776-6657777777777777553799VL999ST97775599989-435566677798998878AQGRKVACF
FLAV_DESVH    -46788999999999TEYTAETIAREL7777-7757777777777777553799VL999ST97775599989-435566677798998878AQGRKVACF
FLAV_DESDE    -47899999999999999999999988776695658888777777778763YDAVL999SAW9877789877753556666669777776789GRKVAAF
FLAV_DESGI    -46788999999999TEGVAEAIAKTL9997-76678888777777887539DVVL999ST987776--9889546667776697776557777888888
FLAV_DESSA    93677799999999999999999999988759765777888888888876399999999STW77765--9999536666677797998779999999999
4fxn          -878779999999999999999999776666967567788888888888777999999988777776--9889577788888897773237888888888
FLAV_MEGEL    9776779999999999999999997777766-665666677788899976799999999987777669--887362334466695555455778888888
2fcr          --87899999999999TEVADFIGK996541900300000112233355679DLLF99999855312888111224555555407777777888888888
FLAV_ANASP    -47899LFYGTQTGKTESVAEIIR9777653922356677777777897779999999999988843--9998555778777899998879999999999
FLAV_ECOLI    997789999GSDTGNTENIAKMIQ8774222922456678889999995569999999999755553----99262225555495777767778999999
FLAV_AZOVI    --79IGLFFGSNTGKTRKVAKSIK99887759657577888888999777899999999999877761112222222244555-5555555778999999
FLAV_ENTAG    94789999999999999999999998755229223234555555555555688899999998875521111111133477777-7777777999999999
FLAV_CLOAB    -86999ILYSSKTGKTERVAK9997555555057678887888887777765778899998522223--9888342234455597777777777777777
3chy          0122222223333335666665555555222922222222222221112163335555755553222888877674533344493332222222222222
Avrg Consist  8667778888888889999999998776554844455566666666665557888888888766544887666334445566586666556778888888
Conservation  0125538675848969746963946463343045244355446543473516658868567554455000000314365446505575435547747759

1fx1          G888799955555559888888888899777----7777797787787978---555555566776555677777778888799------
FLAV_DESVH    G888799955555559888888888899777----7777797787787978---555555566776555677777778888799------
FLAV_DESDE    A88878685555555999988888889998879--8777788-98777777--8555555554433245667777777777599------
FLAV_DESGI    87775977755555677777777777777778---88888887667778777775555555555542424667888887777--------
FLAV_DESSA    977768777555556777777777777777767887777777778888-978985555555556536556888888888877--------
4fxn          867777555555552666666666555555577887767999877777977777665555555555444466666666555798------
FLAV_MEGEL    8577775666666525556777778888888689977888988776558677885544333222222212233223355557--------
2fcr          877773573333333777766667777765533333333333333322833333333332244444567777777888777633------
FLAV_ANASP    977773775333344777888888777777733334444444444433833333344444444444455577777788777734------
FLAV_ECOLI    977743786444444777788888888888833334444444444444244444555554555775667788888888877734110000
FLAV_AZOVI    977763553333334666666677777777733334444444444444823333555555555555455588888888777772311----
FLAV_ENTAG    977773886555555866666666677666633333333333333322123333344444444455555665566666555582------
FLAV_CLOAB    766627222222212444444444455555587882222222222222111111122222222222344443333333233399------
3chy          222227222222224111355431113324578-87778997666556877776322222222222322222323344444422------
Avrg Consist  866656564444444666666666666666656665555565555555655565444443444443344455666666666666889999
Conservation  73663057433334163464534444746710000011010011000000010434744645443225474454448434301000000

Iteration 0   SP 135136.00   AvSP 10.473   SId 3838   AvSId 0.297
37 Flavodoxin-cheY consistency scores (prepro1500)
1fx1          -42444IVYGSTTGNTEYTAETIARQL886666666577777775667888DLVLLGCSTW77766----995476666769-77888788AQGRKVACF
FLAV_DESVH    -34444IVYGSTTGNTEYTAETIAREL776666666577777775667888DLVLLGCSTW77766----995476666769-77888788AQGRKVACF
FLAV_DESSA    -33444IVYGSTTGNTET99999888777655777668888899666686YDIVLFGCSTW77777----996466666779-88SL98ADLKGKKVSVF
FLAV_DESGI    -34444IVYGSTTGNTEGVA9999999999765555677777886666678DVVLLGCSTW77777----995466666779-88887688888KKVGVF
FLAV_DESDE    -44777IVFGSSTGNTE988777666655566777778899999777777YDAVLFGCSAW88877----997587777779-8887766777GRKVAAF
4fxn          -32222IVYWSGTGNTE8888888876666778888888888NI8888586DILILGCSA888888------8-8888886--66665378ISGKKVALF
FLAV_MEGEL    -12222IVYWSGTGNTEAMA8888888888888888555555555555485DVILLGCPAMGSE77------572222288--8888755588GKKVGLF
2fcr          -41456IFFSTSTGNTTEVA999998865432222765554443244779YDLLFLGAPT944411999-111112454441-8DKLPEVDMKDLPVAIF
FLAV_ANASP    -00456LFYGTQTGKTESVAEII987755323322427776666623589YQYLIIGCPTW55532--999843678W988899998888888GKLVAYF
FLAV_AZOVI    -42445LFFGSNTGKTRKVAKSIK87777434333536666665467777YQFLILGTPTLGEG862222222222355558-45666666888KTVALF
FLAV_ENTAG    -266IGIFFGSDTGQTRKVAKLIHQKL6664664424DVRRATR88888SYPVLLLGTPT88888644444444446WQEF8-8NTLSEADLTGKTVALF
FLAV_ECOLI    -51114IFFGSDTGNTENIAKMI987743311111555555588355599YDILLLGIPT954431----88355225544--44666666779KLVALF
FLAV_CLOAB    -63666ILYSSKTGKTERVAKLIE63333333333333333333366LQESEGIIFGTPTY63--6--------66SWE33333333333333GKLGAAF
3chy          ADKELKFLVVDDFSTMRRIVRNLLKELGFNNVEEAEDGVDALNKLQ-AGGYGFVI---SDWNMPNM----------DGLEL--LKTIRADGAMSALPVLM
Avrg Consist  9334459999999999999999988776655555555666667756667889999999999767658888775555566668967777677889999999
Conservation  0236428675848969746963946463344354312564565414344366588685675544550000003144654460055575345547747759

1fx1          G98879-89-999877977--7788899999999955--88888-9988887798999777778766553344588776666222266899899
FLAV_DESVH    G98879-89-999877977--7788899999999955--88888-9988887798999777778766553344588776666222266899899
FLAV_DESSA    G98878-688688888-88--88999999999999979988888887788889-89-9787777666756645577776666654466899899
FLAV_DESGI    G98879-898688888987--788888999GATLV7698899-9998789888-8899787878776663122477788888333276899899
FLAV_DESDE    AS8888-68-888888899--9999999999988888-99988888988778897888776668854222212255555555333277999999
4fxn          GS2228-228222222222--2388888888888888888888888888888888888887778866765535577555533221288888888
FLAV_MEGEL    G4888--28-8888882MD--AWKQRTEDTGATVI77---------------------77222--224444222222244222112--------
2fcr          GLGDA5-8Y5DNFC88-88--8877777777777765444555555555544385555777774465333357799999987555333899899
FLAV_ANASP    GTGDQ5-GY5899999-99--99EEKISQRGG99975555544444444433284444466665555555556666676666433333899899
FLAV_AZOVI    GLGDQ5-885777555-55--55555788888888555555555555555554855555555555666555555888855555544442--288
FLAV_ENTAG    GLGDQL-NYSKNFVSA-MR--ILYDLVIARGACVVG8888EGYKFSFSAA6664NEFVGLPLDQEN88888EERIDSWLE88842242688688
FLAV_ECOLI    GC99549784688888987997777777778888855444444444444444114444777774455775567788888887433322100100
FLAV_CLOAB    STANS6366663333333333336666666666666666663333363366336663333336EDENARIFGERIANKVKQI333333666666
3chy          VTAEA---KKENIIAA-----------AQAGAS-------------------------GYVVK-----PFTAATLEEKLNKIFEKLGM------
Avrg Consist  9988779787777777777997788888888888866777777777767766677777676667766655455577776666433355788788
Conservation  746640037154545706300354534444745753000001010010000000010683760144442335574454448434301000000

Iteration 0   SP 136702.00   AvSP 10.654   SId 3955   AvSId 0.308
38Consistency iteration
Pre-profiles
Multiple alignment positional consistency scores
39Pre-profile update iteration
Pre-profiles
Multiple alignment
40Strategies for multiple sequence alignment
- Profile pre-processing
- Secondary structure-induced alignment
- Globalised local alignment
- Matrix extension
- Objective: try to avoid (early) errors
41Globalised local alignment
1. Local (SW) alignment (with exchange matrix M and gap penalties Po, Pe)
2. Global (NW) alignment (no M or Po, Pe)
Double dynamic programming
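The two steps above can be sketched in code. This is a minimal illustration under stated assumptions, not the lecture's implementation: the function names are hypothetical, a simple match/mismatch score stands in for the exchange matrix M, and a single linear gap penalty stands in for Po and Pe. Step 1 fills a Smith-Waterman (local) matrix; step 2 runs a Needleman-Wunsch-style (global) pass over that matrix, using the local scores directly as cell scores with no exchange matrix and no gap penalties.

```python
def local_matrix(a, b, match=2, mismatch=-1, gap=2):
    """Step 1: Smith-Waterman matrix H (simplified scoring, linear gaps)."""
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,   # align a[i-1] with b[j-1]
                          H[i - 1][j] - gap,     # gap in b
                          H[i][j - 1] - gap)     # gap in a
    return H


def globalised_local_alignment(a, b):
    """Step 2: global pass over H; no exchange matrix, no gap penalties."""
    H = local_matrix(a, b)
    G = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            G[i][j] = max(G[i - 1][j - 1] + H[i][j],  # match cell (i, j)
                          G[i - 1][j],                # free gap
                          G[i][j - 1])                # free gap
    # Trace back from the bottom-right corner to collect matched positions.
    pairs = []
    i, j = len(a), len(b)
    while i > 0 and j > 0:
        if G[i][j] == G[i - 1][j - 1] + H[i][j]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif G[i][j] == G[i - 1][j]:
            i -= 1
        else:
            j -= 1
    return pairs[::-1]


if __name__ == "__main__":
    # Sequences taken from the pair-wise alignment slide earlier in the lecture.
    for i, j in globalised_local_alignment("MDAGSTVILCFVG", "MDAASTILCGS"):
        print(i, j)
```

Because the global pass only sums local scores, regions with strong local similarity dominate the final path, which is one way to read "globalised local alignment".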
42M = BLOSUM62, Po = 0, Pe = 0
43M = BLOSUM62, Po = 12, Pe = 1
44M = BLOSUM62, Po = 60, Pe = 5
45Strategies for multiple sequence alignment
- Profile pre-processing
- Secondary structure-induced alignment
- Globalised local alignment
- Matrix extension
- Objective: try to avoid (early) errors
46Integrating alignment methods and alignment
information with T-Coffee
- Integrating different pair-wise alignment techniques (NW, SW, ..)
- Combining different multiple alignment methods (consensus multiple alignment)
- Combining sequence alignment methods with structural alignment techniques
- Plug in user knowledge
47Matrix extension
- T-Coffee
- Tree-based Consistency Objective Function For alignmEnt Evaluation
- Cedric Notredame
- Des Higgins
- Jaap Heringa
- J. Mol. Biol., 302, 205-217 (2000)
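Matrix (library) extension can be illustrated with a small sketch. It assumes the triplet rule described in the T-Coffee paper: the weight of a residue pair is increased, for every third-sequence residue aligned to both, by the minimum of the two connecting weights. The data layout and the function name `extend_library` are hypothetical, chosen only for this example; the weights are toy numbers.

```python
from collections import defaultdict
from itertools import combinations


def extend_library(primary):
    """Extend a primary library of residue-pair weights by triplet consistency.

    primary maps (res_a, res_b) -> weight, where each residue is a
    (sequence_name, position) tuple. Returns a dict keyed by frozenset
    of the two residues, holding the extended weights.
    """
    neighbours = defaultdict(dict)
    extended = defaultdict(int)
    for (r1, r2), w in primary.items():
        neighbours[r1][r2] = w
        neighbours[r2][r1] = w
        extended[frozenset((r1, r2))] += w  # start from the primary weight
    # Every residue acts as a possible intermediate between two others.
    for linked in neighbours.values():
        for r1, r2 in combinations(linked, 2):
            if r1[0] == r2[0]:
                continue  # skip pairs from the same sequence
            extended[frozenset((r1, r2))] += min(linked[r1], linked[r2])
    return dict(extended)


if __name__ == "__main__":
    a, b, c = ("A", 0), ("B", 0), ("C", 0)
    library = {(a, b): 88, (a, c): 77, (c, b): 100}
    for pair, weight in extend_library(library).items():
        print(sorted(pair), weight)
```

In this toy case the A-B pair starts at 88 and gains min(77, 100) = 77 through sequence C, so pairs supported by a third sequence end up with higher weights before the progressive alignment starts.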
48Using different sources of alignment information
Structure alignments
Clustal
Clustal
Dialign
Lalign
Manual
T-Coffee