Title: Introduction to bioinformatics
1Introduction to bioinformatics 2008, Lecture 8
Multiple Sequence Alignment (II)
2Progressive multiple alignment
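[Figure: progressive multiple alignment workflow. Pairwise alignments of sequences 1-5 give scores (Score 1-2, Score 1-3, ..., Score 4-5); the scores form a 5x5 similarity matrix; scores are converted to distances; the distances give a guide tree; the guide tree directs the multiple alignment. Iteration possibilities are indicated.]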
3Progressive alignment strategy
- Perform pair-wise alignments of all of the sequences (all against all, i.e. make N(N-1)/2 alignments)
- Use the alignment scores to make a similarity (or distance) matrix
- Use that matrix to produce a guide tree
- Align the sequences successively, guided by the order and relationships indicated by the tree (N-1 alignment steps).
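The first two steps above can be sketched as follows. This is a minimal illustration only: the toy `identity_score` function and the four example sequences are assumptions that stand in for real pairwise alignment scores, and the tree-building and alignment steps are not shown.

```python
from itertools import combinations


def identity_score(a, b):
    """Toy similarity (assumption): fraction of identical positions
    over the shorter sequence. A real method would use the score of a
    pairwise alignment instead."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / n


def distance_matrix(seqs):
    """All-against-all comparison: N(N-1)/2 scores, stored as distances."""
    n = len(seqs)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d = 1.0 - identity_score(seqs[i], seqs[j])
        dist[i][j] = dist[j][i] = d
    return dist


def closest_pair(dist):
    """The pair a guide tree would join first: the smallest distance."""
    n = len(dist)
    return min(combinations(range(n), 2), key=lambda p: dist[p[0]][p[1]])


# Hypothetical, ungapped example sequences of equal length
seqs = ["MKIVYWSG", "MKIVYGST", "AKIGLFFG", "ADKELKFL"]
dist = distance_matrix(seqs)
n_pairs = len(list(combinations(range(len(seqs)), 2)))
print(n_pairs)             # 6, i.e. N(N-1)/2 for N = 4
print(closest_pair(dist))  # (0, 1)
```

The closest pair is only the starting point of the guide tree; a method such as neighbour joining would continue from the full distance matrix.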
4Progressive alignment strategy
- Methods
- Biopat (Hogeweg and Hesper 1984, the first integrated method ever)
- MULTAL (Taylor 1987)
- DIALIGN (1 & 2, Morgenstern 1996)
- PRRP (Gotoh 1996)
- ClustalW (Thompson et al 1994)
- PRALINE (Heringa 1999)
- T-Coffee (Notredame 2000)
- POA (Lee 2002)
- MUSCLE (Edgar 2004)
- PROBCONS (Do, 2005)
5Flavodoxin fold: aligning 13 Flavodoxins + cheY
5(βα) fold
6Flavodoxin-cheY NJ tree
7Flavodoxin fold helix-beta-helix
8Flavodoxin family - TOPS diagrams
The basic topology of the flavodoxin fold is
given below, the other four TOPS diagrams show
flavodoxin folds with local insertions of
secondary structure elements.
[Figure: TOPS diagrams with the strands numbered 1-5. Legend: α-helix, β-strand.]
9Flavodoxin-cheY NJ tree
10Flavodoxin-cheY Pre-processing (prepro?1500)
11Clustal, ClustalW, ClustalX
- CLUSTAL W/X (Thompson et al., 1994) uses the Neighbour Joining (NJ) algorithm (Saitou and Nei, 1987), widely used in phylogenetic analysis, to construct a guide tree (see lecture on phylogenetic methods).
- Sequence blocks are represented by a profile, in which the individual sequences are additionally weighted according to the branch lengths in the NJ tree.
- Further carefully crafted heuristics include:
- (i) local gap penalties
- (ii) automatic selection of the amino acid substitution matrix
- (iii) automatic gap penalty adjustment
- (iv) a mechanism to delay alignment of sequences that appear to be distant at the time they are considered.
- CLUSTAL (W/X) does not allow iteration (cf. Hogeweg and Hesper, 1984; Corpet, 1988; Gotoh, 1996; Heringa, 1999, 2002)
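To make the idea of a weighted profile concrete, here is a minimal sketch. It is not the CLUSTAL implementation: the weights below are hypothetical numbers standing in for weights derived from guide-tree branch lengths, and the block is a toy example.

```python
from collections import defaultdict


def weighted_profile(block, weights):
    """Per-column residue frequencies for an aligned block, where each
    sequence contributes in proportion to its weight."""
    if len(block) != len(weights):
        raise ValueError("one weight per sequence is required")
    total = sum(weights)
    profile = []
    for col in range(len(block[0])):
        freqs = defaultdict(float)
        for seq, w in zip(block, weights):
            freqs[seq[col]] += w / total
        profile.append(dict(freqs))
    return profile


# Hypothetical aligned block and weights (assumption: the third sequence
# sits on a long branch and is therefore weighted more heavily)
block = ["MKIV", "MKLV", "AKIV"]
weights = [1, 1, 3]
profile = weighted_profile(block, weights)
print(profile[0])
```

With equal weights the two near-identical sequences would dominate the first column; the weighting lets the more divergent sequence contribute more.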
12ClustalW web-interface
13CLUSTAL X (1.64b) multiple sequence alignment
Flavodoxin-cheY
1fx1        -PKALIVYGSTTGNTEYTAETIARQLANAG-Y-EVDSRDAASVEAGGLFEGFDLVLLGCSTWGDDSIE------LQDDFIPLFD-SLEETGAQGRK
FLAV_DESVH  MPKALIVYGSTTGNTEYTAETIARELADAG-Y-EVDSRDAASVEAGGLFEGFDLVLLGCSTWGDDSIE------LQDDFIPLFD-SLEETGAQGRK
FLAV_DESGI  MPKALIVYGSTTGNTEGVAEAIAKTLNSEG-M-ETTVVNVADVTAPGLAEGYDVVLLGCSTWGDDEIE------LQEDFVPLYE-DLDRAGLKDKK
FLAV_DESSA  MSKSLIVYGSTTGNTETAAEYVAEAFENKE-I-DVELKNVTDVSVADLGNGYDIVLFGCSTWGEEEIE------LQDDFIPLYD-SLENADLKGKK
FLAV_DESDE  MSKVLIVFGSSTGNTESIAQKLEELIAAGG-H-EVTLLNAADASAENLADGYDAVLFGCSAWGMEDLE------MQDDFLSLFE-EFNRFGLAGRK
FLAV_CLOAB  -MKISILYSSKTGKTERVAKLIEEGVKRSGNI-EVKTMNLDAVDKKFLQE-SEGIIFGTPTYYAN---------ISWEMKKWID-ESSEFNLEGKL
FLAV_MEGEL  --MVEIVYWSGTGNTEAMANEIEAAVKAAG-A-DVESVRFEDTNVDDVAS-KDVILLGCPAMGSE--E------LEDSVVEPFF-TDLAPKLKGKK
4fxn        ---MKIVYWSGTGNTEKMAELIAKGIIESG-K-DVNTINVSDVNIDELLN-EDILILGCSAMGDE--V------LEESEFEPFI-EEISTKISGKK
FLAV_ANASP  SKKIGLFYGTQTGKTESVAEIIRDEFGNDVVT----LHDVSQAEVTDLND-YQYLIIGCPTWNIGELQ---SD-----WEGLYS-ELDDVDFNGKL
FLAV_AZOVI  -AKIGLFFGSNTGKTRKVAKSIKKRFDDETMSD---ALNVNRVSAEDFAQ-YQFLILGTPTLGEGELPGLSSDCENESWEEFLP-KIEGLDFSGKT
2fcr        --KIGIFFSTSTGNTTEVADFIGKTLGAKADAP---IDVDDVTDPQALKD-YDLLFLGAPTWNTGADTERSGT----SWDEFLYDKLPEVDMKDLP
FLAV_ENTAG  MATIGIFFGSDTGQTRKVAKLIHQKLDGIADAP---LDVRRATREQFLS--YPVLLLGTPTLGDGELPGVEAGSQYDSWQEFTN-TLSEADLTGKT
FLAV_ECOLI  -AITGIFFGSDTGNTENIAKMIQKQLGKDVAD----VHDIAKSSKEDLEA-YDILLLGIPTWYYGEAQ-CD-------WDDFFP-TLEEIDFNGKL
3chy        --ADKELKFLVVDDFSTMRRIVRNLLKELG----FNNVEEAEDGVDALN------KLQAGGYGFV--I------SDWNMPNMDG-LELLKTIR---
The secondary structures of 4 sequences are known and can be used to assess the alignment (red is β-strand, blue is α-helix)
14There are problems
- Accuracy is very important!
- Progressive multiple alignment is a greedy strategy: alignment errors made during the construction of the MSA cannot be repaired later, and these errors are propagated into later progressive steps.
- Comparisons of sequences at early steps during progressive alignment cannot make use of information from other sequences.
- It is only later during the alignment progression that more information from other sequences (e.g. through profile representation) is employed in the alignment steps.
15Progressive multiple alignment
"Once a gap, always a gap" (Feng & Doolittle, 1987)
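The rule can be illustrated with a small sketch of how two already-aligned blocks are merged. The function name `merge_blocks` and the alignment path are hypothetical; in a real progressive method the path would come from a profile-profile dynamic programming step, which is not shown here.

```python
def merge_blocks(block_a, block_b, path):
    """Merge two aligned blocks along an alignment path.

    path is a list of (i, j) column pairs; None means that a new gap
    column is opened in that block. Columns are copied whole, so gaps
    that already exist inside a block are never reconsidered.
    """
    rows_a = ["" for _ in block_a]
    rows_b = ["" for _ in block_b]
    for i, j in path:
        for k, row in enumerate(block_a):
            rows_a[k] += row[i] if i is not None else "-"
        for k, row in enumerate(block_b):
            rows_b[k] += row[j] if j is not None else "-"
    return rows_a + rows_b


# Hypothetical example: block_a already contains a gap from an earlier step
block_a = ["MK-IV", "MKLIV"]
block_b = ["MKV"]
path = [(0, 0), (1, 1), (2, None), (3, None), (4, 2)]  # assumed path
merged = merge_blocks(block_a, block_b, path)
print(merged)  # ['MK-IV', 'MKLIV', 'MK--V']
```

The gap in the first row of `block_a` survives unchanged, whether or not it was placed correctly in the earlier step.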
16Additional strategies for multiple sequence
alignment
- Profile pre-processing (Praline)
- Secondary structure-induced alignment
- Matrix extension
- Objective: try to avoid (early) errors
17PRALINE web-interface
18Profile pre-processing
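[Figure: profile pre-processing. Pairwise alignment scores (Score 1-2, Score 1-3, ..., Score 4-5). For key sequence 1, the other sequences (2-5) are aligned to it in a master-slave (N-to-1) pre-alignment, which is converted into a pre-profile for sequence 1 (columns A, C, D, ..., Y, plus Pi and Px).]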
19Pre-profile generation
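[Figure: pre-profile generation. Pairwise scores (Score 1-2, Score 1-3, ..., Score 4-5) are compared against a cut-off to build a pre-alignment for each of the sequences 1-5, each with its own key sequence; each pre-alignment is converted into a pre-profile (columns A, C, D, ..., Y).]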
20Pre-profile alignment
[Figure: pre-profile alignment. The pre-profiles of sequences 1-5 (columns A, C, D, ..., Y) are aligned to give the final alignment of sequences 1-5.]
21Pre-profile alignment
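[Figure: pre-profile alignment. Each pre-profile (1-5) is shown as the pre-alignment of its key sequence with the other sequences; aligning the pre-profiles gives the final alignment of sequences 1-5.]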
22Pre-profile alignment: Alignment consistency
[Figure: the five pre-alignments (key sequences 1-5) with one residue highlighted. Labels: Ala131; A131, A131, L133, C126, A131.]
23PRALINE pre-profile generation
- Idea: use the information from all query sequences to make a pre-profile for each query sequence that contains information from the other sequences.
- You can use all sequences in each pre-profile, or use only those sequences that will probably align correctly. Incorrectly aligned sequences in the pre-profiles will increase the noise level.
- Select using the alignment score: only allow sequences into a pre-profile if their alignment with the key sequence scores higher than a given threshold value. In PRALINE, this threshold is given as prepro1500 (alignment score threshold value is 1500; see next two slides).
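The selection rule can be sketched as below. This is an illustration under stated assumptions, not PRALINE code: the function name, the sequence names and all scores are hypothetical.

```python
def select_for_preprofiles(names, scores, threshold):
    """For each key sequence, list the other sequences whose pairwise
    alignment score with the key sequence exceeds the threshold.

    scores maps frozenset({a, b}) to the pairwise alignment score.
    """
    selected = {}
    for key in names:
        selected[key] = [
            other
            for other in names
            if other != key and scores[frozenset((key, other))] > threshold
        ]
    return selected


# Hypothetical pairwise alignment scores
names = ["seq1", "seq2", "seq3", "seq4"]
scores = {
    frozenset(("seq1", "seq2")): 2100,
    frozenset(("seq1", "seq3")): 1650,
    frozenset(("seq1", "seq4")): 900,
    frozenset(("seq2", "seq3")): 1400,
    frozenset(("seq2", "seq4")): 1550,
    frozenset(("seq3", "seq4")): 700,
}

strict = select_for_preprofiles(names, scores, threshold=1500)
everything = select_for_preprofiles(names, scores, threshold=0)
print(strict["seq1"])      # ['seq2', 'seq3']
print(everything["seq1"])  # ['seq2', 'seq3', 'seq4']
```

A threshold of 0 admits every sequence into every pre-profile, while a higher threshold trades information for less noise.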
24Reliable sequences for pre-profiles
In each case, the curve gives the number of pairwise alignments (y) scoring less than x. The range 1500 < x < 1800 shows a flat section of the curve that can serve as a natural cut-off point for admitting sequences into the pre-alignment blocks.
25Global pre-processing (prepro?0)
- Preprocessed profile for sequence 2
2fcr        KIGIFFSTSTGNTTEVADFIGKTLGAKADAPIDVDDVTDPQALKDYDLLFLGAPTWNTGADTERSGTSWDEFLYDKLPEVDMKDLPVAIFGLGDAEGYPD
1fx1        KALIVYGSTTGNTEYTAETIARQL-ANAGYEVDSRDAASVEAFEGFDLVLLGCSTW--GDD---SIELQDDFLFDSLEETGAQGRKVACFGCGDS-SY-E
4fxn        -MKIVYWSGTGNTEKMAELIAKGISGKDVNTINVSDVNIDELLNE-DILILGC---SAMGDEVLEESEFEPFIEEISTKISGKKVALGSYGWGDGKWMRD
FLAV_ANASP  KIGLFYGTQTGKTESVaEIIRDEFGNDVVTLHDVSEVTD---LNDYQYLIIgCPTWNIG---ELQ-SDW-EGLYSELDDVDFNGKLVAYfGTGDQIGYAD
FLAV_AZOVI  KIGLFFGSNTGKTRKVaKSIKKRFDTMSDA-LNVNRVS-AEDFAQYQFLILgTPTLGPGLSSDCENESWEEFL-PKIEGLDFSGKTVALfGLGDQVGYPE
FLAV_CLOAB  KISILYSSKTGKTERVaKLIEE--GVKRSGNIEVKDAVDKKFLQESEGIIFgTPTYYANISWEMK--KW----IDESSEFNLEGKLGAAfSTANAGGSDI
FLAV_DESDE  KVLIVFGSSTGNTESIaQKLEELIAA-GGHEVTLLNAADASALADYDAVLFgCSAWGM-EDLEMQ----DDFLFEEFNRFGLAGRKVAAfASGDQE-Y-E
FLAV_DESGI  KALIVYGSTTGNTEGVaEAIAKTLNSEGTTVVNVADVTAPGLAEGYDVVLLgCSTW--GDDEIELQEDFVP-LYEDLDRAGLKDKKVGVfGCGDS-SY-T
FLAV_DESSA  KSLIVYGSTTGNTETAaEYVAEAFENK-EIDVELKNVTDVSVANGYDIVLFgCSTW--G---EEEIELQDDFLYDSLENADLKGKKVSVfGCGDSD-Y-T
FLAV_DESVH  KALIVYGSTTGNTEYTaETIAREL-ADAGYEVDSRDAASVEAFEGFDLVLLgCSTW--GDD---SIELQDDFLFDSLEETGAQGRKVACfGCGDS-SY-E
FLAV_ECOLI  AIGIFFGSDTGNTENIaKMIQKQLG--KDV-ADVHDISSKEDLEAYDILLLgIPTWYYG----EAQCDWDDF-FPTLEEIDFNGKLVALfGCGDQEDYAE
FLAV_ENTAG  TIGIFFGSDTGQTRKVaKLIHQKLDGIADAPLDVRRATREQFL-SYPVLLLgTPTLGDGLPGVEAGSSWQEFT-NTLSEADLTGKTVALfGLGDQLNYSK
FLAV_MEGEL  MVEIVYWSGTGNTEAMaNEIEAAVAAGADVSVRFED-TNVDDVASKDVILLgCPA--MGSE-ELEDSVVEPFFTDLAPK--LKGKKVGLfGYGWGSG---
3chy        KELKFLVVDDFSTRRIVRNLLKELGFNEEAEDGVDALNKLQA-GGYGFVI---SDWNM---PNMDGL---ELLKTIRADGAMSALPVLMV---TAEAKKE

2fcr        NFCDAIEEIHDCFAKQGAKPVGFSNPDDYDYEESKSVRDGKFLGLPLDMVNDQIPMEKRVAGWVEAVVSETGV
1fx1        YFCGAVDAIEEKLKNLGA----------------EIVQD----GLRID--GDPRAARDDIVGWAHDVRGAI--
26Global pre-processing (prepro?0)
- Preprocessed profile for sequence 3
4fxn        MKIVYWSGTGNTEKMAELIAKGIIESGKDVNTINVSDVNIDELLNEDILILGCSAMGDEVLEESEFEPFIEEISTKISGKKVALFGSYGWGDGKWMRDFE
1fx1        ALIVYGSTTGNTEYTAETIARQLANAGYEVDSRDAASVEAGGLFEGDLVLLGCSTWGDDSIEQDDFIPLFDSLETGAQGRKVACFGSYEYFCGA-VDAIE
2fcr        IGIFFSTSTGNTTEVADFIGKTL--GAKADAPIDVDDVTDPQALKDDLLFLGANTGADTERSGTSWDEFLYDKLPEVDMKDLPV-AIFGLGDAEGYPDFC
FLAV_ANASP  IGLFYGTQTGKTESVaEIIRD---EFGNDVVTLDVSQAEVTDLNDYQYLIIgCPTWNIGEL-QSDWEGLYSELDVDFNGKLVAYfGTIGYADNDAIGILE
FLAV_AZOVI  IGLFFGSNTGKTRKVaKSIKKRFDDETMS-DALNVNRVSAEDFAQYQFLILgTPTLGEGELENESWEEFLPKIGLDFSGKTVALfGQVGYPEGELYSFFK
FLAV_CLOAB  MKILYSSKTGKTERVaKLIEEGVKRSGNEVKTMNLDAVDKKFLQESEGIIFgTPTYYANI--SWEMKKWIDESSENLEGKLGAAfSTAGGSDIALLTILN
FLAV_DESDE  VLIVFGSSTGNTESIaQKLEELIAAGGHEVTLLNAADASAENLADYDAVLFgCSAWGMEDLEQDDFLSLFEEFNRGLAGRKVAAfAS---GDQEYVPAIE
FLAV_DESGI  ALIVYGSTTGNTEGVaEAIAKTLNSEGMETTVVNVADVTAPGLAGYDVVLLgCSTWGDDEIEQEDFVPLYEDLDAGLKDKKVGVfGSYTYFCGA-VDVIE
FLAV_DESSA  MSIVYGSTTGNTETAaEYVAEAFENKEIDVELKNVTDVSVADLGNYDIVLFgCSTWGEEEIEQDDFIPLYDSLNADLKGKKVSVfGDYTYFCGA-VDAIE
FLAV_DESVH  ALIVYGSTTGNTEYTaETIARELADAGYEVDSRDAASVEAGGLFEGDLVLLgCSTWGDDSIEQDDFIPLFDSLETGAQGRKVACfGSYEYFCGA-VDAIE
FLAV_ECOLI  TGIFFGSDTGNTENIaKMIQK---QLGKDVADVDIAKSSKEDLEAYDILLLgIPTYGEAQCDWDDFFPTLEEID--FNGKLVALfGDYAFCDAGTIRDIE
FLAV_ENTAG  IGIFFGSDTGQTRKVaKLIHQK-LDGIADA-PLDVRRATREQFLSYPVLLLgTPTLGDELVEASQYDSWQEFTNTDLTGKTVALfGNYSKNFVSAMRILY
FLAV_MEGEL  VEIVYWSGTGNTEAMaNEIEAAVKAAGADVESVRFEDTNVDDVASKDVILLgCPAMGSEELEDSVVEPFFTDLAPKLKGKKVGLfGSYGWGSGEWMDAWK
3chy        DKELKFLVVDDFSTMRRIVRNLLKELG--FNNVEEAEDGVD-ALNK-LQAGGYGVISDWNMPNMDGLELLKTI--RADGAMSALPVLMVTAEAKKENIIA

4fxn        ERMNGYGCVVVETPLIVQNEPDEAEQDCIEFGKKIANI
1fx1        EKLKNLGAEIVQDGLRIDGDPRAARDDIVGWAHDVRGA
27Pre-profiles (prepro?1500)
[Figure: pre-profiles for sequences 1 and 2]
28Pre-profiles (prepro?1500)
[Figure: pre-profiles for sequences 13 and 14]
29 Local pre-processing
Local alignments are calculated from high to low score. Each time, the sequence parts corresponding to a selected local alignment are blocked, so that the next local alignment has to emerge before or after the earlier selected one. This preserves the co-linearity of the local alignments and the associated sequence fragments in the pre-alignments.
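The co-linearity constraint can be sketched as a greedy filter. Note the simplification: the procedure described above recomputes local alignments after blocking, whereas this sketch assumes a precomputed list of candidate local alignments and only decides which of them to keep. The `LocalAlignment` type and all coordinates are hypothetical.

```python
from typing import NamedTuple


class LocalAlignment(NamedTuple):
    score: float
    a_start: int  # start in sequence A (inclusive)
    a_end: int    # end in sequence A (exclusive)
    b_start: int  # start in sequence B (inclusive)
    b_end: int    # end in sequence B (exclusive)


def is_colinear(x, y):
    """True if x lies entirely before y, or entirely after y,
    in both sequences at once."""
    before = x.a_end <= y.a_start and x.b_end <= y.b_start
    after = x.a_start >= y.a_end and x.b_start >= y.b_end
    return before or after


def select_colinear(candidates):
    """Accept local alignments from high to low score, keeping only
    those that are co-linear with everything accepted so far."""
    chosen = []
    for cand in sorted(candidates, key=lambda c: c.score, reverse=True):
        if all(is_colinear(cand, prev) for prev in chosen):
            chosen.append(cand)
    return sorted(chosen, key=lambda c: c.a_start)


# Hypothetical candidate local alignments
candidates = [
    LocalAlignment(90, 10, 30, 12, 32),
    LocalAlignment(70, 40, 60, 45, 65),
    LocalAlignment(60, 35, 50, 0, 10),   # crosses the best alignment
    LocalAlignment(50, 0, 8, 0, 9),
    LocalAlignment(40, 25, 38, 33, 44),  # overlaps the best one in A
]
chosen = select_colinear(candidates)
print([c.score for c in chosen])  # [50, 90, 70]
```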
30Local pre-processing (locprepro?0)
- Preprocessed profile for sequence 2 (2fcr)
2fcr        KIGIFFSTSTGNTTEVADFIGKTLGAKADAPIDVDDVTDPQALKDYDLLFLGAPTWNTGADTERSGTSWDEFLYDKLPEVDMKDLPVAIFGLGDAEGYPD
1fx1        ...IVYGSTTGNTEYTAETIARQL---ANAGYEVDDAASVEAFEGFDLVLLGCSTW--GDDSELQ----DDFLFDSLEETGAQGRKVACFGCGDS-SY-E
4fxn        KI-VYWS-GTGNTEKMAELIAKGIGKDVNT-INVSDVNIDELLNE-DILILGCSA--MGDEVEES--EFEPF----IEEISTKGKKVALFGWGDGKGYG-
FLAV_ANASP  KIGLFYGTQTGKTESVaEIIRDEFGNDVVTLHDVSEVTD---LNDYQYLIIgCPTWNIG---ELQ-SDW-EGLYSELDDVDFNGKLVAYfGTGDQIGYAD
FLAV_AZOVI  KIGLFFGSNTGKTRKVaKSIKKTM---SDA-LNVNRVS-AEDFAQYQFLILgTPTLGEGSDCENE--SWEEFL-PKIEGLDFSGKTVALfGLGDQVGYPE
FLAV_CLOAB  KISILYSSKTGKTERVaKLIEE--GVKRSGNIEVKDAVDKKFLQESEGIIFgTPTY-------YANISWEKWI-DESSEFNLEGKLGAAfSTANSAGGSD
FLAV_DESDE  KVLIVFGSSTGNTESIaQKLEELIAAAADA--SAENLAD-----GYDAVLFgCSAWGM-EDLEMQ----DDFLFEEFNRFGLAGRKVAAfASGDQE-Y-E
FLAV_DESGI  ...IVYGSTTGNTEGVaEAIAKTLNSEGTTVVNVADVTAPGLAEGYDVVLLgCSTW--GDDIELQ----EDFLYEDLDRAGLKDKKVGVfGCGDS-SY-T
FLAV_DESSA  ...IVYGSTTGNTETAaEYVAEAFENK---EIDVENVTD-VSVADYDIVLFgCSTW--G---EEEIELQDDFLYDSLENADLKGKKVSVfGCGDSD-Y-T
FLAV_DESVH  ...IVYGSTTGNTEYTaETIAREL---ADAGYEVDDAASVEAFEGFDLVLLgCSTW--GDDSELQ----DDFLFDSLEETGAQGRKVACfGCGDS-SY-E
FLAV_ECOLI  ..GIFFGSDTGNTENIaKMIQKQLG-K-----DVADVHDKEDLEAYDILLLgIPTWYYG----EAQCDWDDF-FPTLEEIDFNGKLVALfGCGDQEDYAE
FLAV_ENTAG  .IGIFFGSDTGQTRKVaKLIHQKLDGIADAPLDVRRATREQFL-SYPVLLLgTPT--LG-DGELPGVSWQEFT-NTLSEADLTGKTVALfGLGDQLNYSK
FLAV_MEGEL  .VEIVYWSGTGNTEAMaNEIEKAAGADVESDTNVDDV----ASK--DVILLgCPA--MGSE-ELEDSVVEPFFTDLAPK--LKGKKVGLfGYGWGSG---
3chy        ...........................................................ADKELKFLVVDDFIVRNL----LKEL-----GFNNVEEAED

2fcr        NFCDAIEEIHDCFAKQGAKPVGFSNPDDYDYEESKSVRDGKFLGLPLDMVNDQIPMEKRVAGWVEAVVSETGV
31Local pre-processing (locprepro?0)
- Preprocessed profile for sequence 3 (4fxn)
4fxn        MKIVYWSGTGNTEKMAELIAKGIIESGKDVNTINVSDVNIDELLNEDILILGCSAMGDEVLEESEFEPFIEEISTKISGKKVALFGSYGWGDGKWMRDFE
1fx1        ..IVYGSTTGNTEYTAETIARQLANAGYEVDSRDAASVEAGGLFEGDLVLLGCSTWGDDSIEQDDFIPLFDSLETGAQGRKVACFGC---GDSSYVDAIE
2fcr        .KIIFFSSTGNTTEVADFIGKTL---GAKADAIDVDDVTDPQALKDDLLFLGAPTTGADT-ERSSWDEFLPEVDMK--DLPVAIF---GLGDAE------
FLAV_ANASP  ..LFYGTQTGKTESVaEIIRD---EFGNDVVTLDVSQAEVTDLNDYQYLIIgCPTIGE--L-QSDWEGLYSELDVDFNGKLVAYfGTIGYADGKWSTDFN
FLAV_AZOVI  ..LFFGSNTGKTRKVaKSIKKRFDETMSD--ALNVNRVSAEDFAQYQFLILgTPTLGEGELNESEFLPKIEGLD--FSGKTVALfGQVGYGEGSWSTD--
FLAV_CLOAB  MKILYSSKTGKTERVaKLIEEGVKRSGNEVKTMNLDAVD-KKFLQEEGIIFgTPTMKKWIDESSEFN--LEAfSTANSGSDIALLGGVAFGKPK------
FLAV_DESDE  ..IVFGSSTGNTEKLEELIAAG----GHEVTLLNAADASAENLADYDAVLFgCSAWGMEDLEQDDFLSLFEEFNRGLAGRKVAAfAS---GDQEY-EHFE
FLAV_DESGI  ..IVYGSTTGNTEGVaEAIAKTLNSEGMETTVVNVADVTAPGLAGYDVVLLgCSTWGDDEIEQEDFVPLYEDLDAGLKDKKVGVfGC---GDSSYTYDIE
FLAV_DESSA  ..IVYGSTTGNTETAaEYVAEAFENKEIDVELKNVTDVSVADLGNYDIVLFgCSTWGEEEIEQDDFIPLYDSLNADLKGKKVSVfGC---GDS----DYE
FLAV_DESVH  ..IVYGSTTGNTEYTaETIARELADAGYEVDSRDAASVEAGGLFEGDLVLLgCSTWGDDSIEQDDFIPLFDSLETGAQGRKVACfGC---GDSSYVDAIE
FLAV_ECOLI  ..IFFGSDTGNTENIaKMIQK---QLGKDV--ADVHDISKEDLEAYDILLLgIPTYGEAQCDWDDFFPTLEEID--FNGKLVALfGC---GD---QEDYA
FLAV_ENTAG  ..IFFGSDTGQTRKVaKLIHQGIADAPLDVRR-----ATREQFLSYPVLLLgTPTLGDELVEASQYDSWQEFTNTDLTGKTVALf---GLGDQNYSKNFV
FLAV_MEGEL  VEIVYWSGTGNTEAMaNEIEAAVKAAGADVESVRFEDTNVDDVASKDVILLgCPAMGSEELEDSVVEPFFTDLAPKLKGKKVGLfGSYGWGSGEWMDAWK
3chy        .RIV......N...LKEL---GFVEEAEDVDALNISDPNMDELLRADVLMVTAEAKKENIIAAAQVKPFLEEKLNKIFEK....................

4fxn        ERMNGYGCVVVETPLIVQNEPDEAEQDCIEFGKKIANI
32CLUSTAL X (1.64b) multiple sequence alignment Flavodoxin-cheY (same alignment as on slide 13)
33Flavodoxin-cheY Pre-processing (prepro?1500)
1fx1        -PKALIVYGSTTGNT-EYTAETIARQLANAG-YEVDSRDAASVEAGGLFEGFDLVLLGCSTWGDDSI------ELQDDFIPLF-DSLEETGAQGRKVACF
FLAV_DESDE  MSKVLIVFGSSTGNT-ESIaQKLEELIAAGG-HEVTLLNAADASAENLADGYDAVLFgCSAWGMEDL------EMQDDFLSLF-EEFNRFGLAGRKVAAf
FLAV_DESVH  MPKALIVYGSTTGNT-EYTaETIARELADAG-YEVDSRDAASVEAGGLFEGFDLVLLgCSTWGDDSI------ELQDDFIPLF-DSLEETGAQGRKVACf
FLAV_DESSA  MSKSLIVYGSTTGNT-ETAaEYVAEAFENKE-IDVELKNVTDVSVADLGNGYDIVLFgCSTWGEEEI------ELQDDFIPLY-DSLENADLKGKKVSVf
FLAV_DESGI  MPKALIVYGSTTGNT-EGVaEAIAKTLNSEG-METTVVNVADVTAPGLAEGYDVVLLgCSTWGDDEI------ELQEDFVPLY-EDLDRAGLKDKKVGVf
2fcr        --KIGIFFSTSTGNT-TEVADFIGKTLGA---KADAPIDVDDVTDPQALKDYDLLFLGAPTWNTG----ADTERSGTSWDEFLYDKLPEVDMKDLPVAIF
FLAV_AZOVI  -AKIGLFFGSNTGKT-RKVaKSIKKRFDDET-MSDA-LNVNRVS-AEDFAQYQFLILgTPTLGEGELPGLSSDCENESWEEFL-PKIEGLDFSGKTVALf
FLAV_ENTAG  MATIGIFFGSDTGQT-RKVaKLIHQKLDG---IADAPLDVRRAT-REQFLSYPVLLLgTPTLGDGELPGVEAGSQYDSWQEFT-NTLSEADLTGKTVALf
FLAV_ANASP  SKKIGLFYGTQTGKT-ESVaEIIRDEFGN---DVVTLHDVSQAE-VTDLNDYQYLIIgCPTWNIGEL--------QSDWEGLY-SELDDVDFNGKLVAYf
FLAV_ECOLI  -AITGIFFGSDTGNT-ENIaKMIQKQLGK---DVADVHDIAKSS-KEDLEAYDILLLgIPTWYYGE--------AQCDWDDFF-PTLEEIDFNGKLVALf
4fxn        -MK--IVYWSGTGNT-EKMAELIAKGIIESG-KDVNTINVSDVNIDELL-NEDILILGCSAMGDEVL-------EESEFEPFI-EEIS-TKISGKKVALF
FLAV_MEGEL  MVE--IVYWSGTGNT-EAMaNEIEAAVKAAG-ADVESVRFEDTNVDDVA-SKDVILLgCPAMGSEEL-------EDSVVEPFF-TDLA-PKLKGKKVGLf
FLAV_CLOAB  -MKISILYSSKTGKT-ERVaKLIEEGVKRSGNIEVKTMNLDAVD-KKFLQESEGIIFgTPTYYAN---------ISWEMKKWI-DESSEFNLEGKLGAAf
3chy        ADKELKFLVVDDFSTMRRIVRNLLKELGFN--NVEEAEDGVDALNKLQAGGYGFVI---SDWNMPNM----------DGLELL-KTIRADGAMSALPVLM

1fx1        GCGDS-SY-EYFCGA-VDAIEEKLKNLGAEIVQD---------------------GLRIDGD--PRAARDDIVGWAHDVRGAI--------
FLAV_DESDE  ASGDQ-EY-EHFCGA-VPAIEERAKELgATIIAE---------------------GLKMEGD--ASNDPEAVASfAEDVLKQL--------
FLAV_DESVH  GCGDS-SY-EYFCGA-VDAIEEKLKNLgAEIVQD---------------------GLRIDGD--PRAARDDIVGwAHDVRGAI--------
34Flavodoxin-cheY Local Pre-processing (locprepro?300)
1fx1        --PKALIVYGSTTGNTEYTAETIARQLANAGYEVDSRDAASVEAGGLFEGFDLVLLGCSTWGDDSI------ELQDDFIPL--FDSLEETGAQGRKVACF
FLAV_DESVH  -MPKALIVYGSTTGNTEYTaETIARELADAGYEVDSRDAASVEAGGLFEGFDLVLLgCSTWGDDSI------ELQDDFIPL--FDSLEETGAQGRKVACf
FLAV_DESSA  -MSKSLIVYGSTTGNTETAaEYVAEAFENKEIDVELKNVTDVSVADLGNGYDIVLFgCSTWGEEEI------ELQDDFIPL--YDSLENADLKGKKVSVf
FLAV_DESGI  -MPKALIVYGSTTGNTEGVaEAIAKTLNSEGMETTVVNVADVTAPGLAEGYDVVLLgCSTWGDDEI------ELQEDFVPL--YEDLDRAGLKDKKVGVf
FLAV_DESDE  -MSKVLIVFGSSTGNTESIaQKLEELIAAGGHEVTLLNAADASAENLADGYDAVLFgCSAWGMEDL------EMQDDFLSL--FEEFNRFGLAGRKVAAf
4fxn        --MK--IVYWSGTGNTEKMAELIAKGIIESGKDVNTINVSDVNIDELLN-EDILILGCSAMGDEVL------E-ESEFEPF--IEEIS-TKISGKKVALF
FLAV_MEGEL  -MVE--IVYWSGTGNTEAMaNEIEAAVKAAGADVESVRFEDTNVDDVAS-KDVILLgCPAMGSEEL------E-DSVVEPF--FTDLA-PKLKGKKVGLf
2fcr        ---KIGIFFSTSTGNTTEVADFIGKTLGAKADAPI--DVDDVTDPQALKDYDLLFLGAPTWNTGAD----TERSGTSWDEFL-YDKLPEVDMKDLPVAIF
FLAV_ANASP  -SKKIGLFYGTQTGKTESVaEIIRDEFGNDVVTLH--DVSQAEV-TDLNDYQYLIIgCPTWNIGEL--------QSDWEGL--YSELDDVDFNGKLVAYf
FLAV_AZOVI  --AKIGLFFGSNTGKTRKVaKSIKKRFDDETMSDA-LNVNRVSA-EDFAQYQFLILgTPTLGEGELPGLSSDCENESWEEF--LPKIEGLDFSGKTVALf
FLAV_ENTAG  -MATIGIFFGSDTGQTRKVaKLIHQKLDG--IADAPLDVRRATR-EQFLSYPVLLLgTPTLGDGELPGVEAGSQYDSWQEF--TNTLSEADLTGKTVALf
FLAV_ECOLI  --AITGIFFGSDTGNTENIaKMIQKQLGKDVADVH--DIAKSSK-EDLEAYDILLLgIPTWYYGEA--------QCDWDDF--FPTLEEIDFNGKLVALf
FLAV_CLOAB  --MKISILYSSKTGKTERVaKLIEEGVKRSGNIEVKTMNLDAVDKKFLQESEGIIFgTPTYYA-----------NISWEMKKWIDESSEFNLEGKLGAAf
3chy        ADKELKFLVVDDFSTMRRIVRNLLKELGFNNVEEAEDGVDALNKLQ-AGGYGFVI---SDWNMPNM----------DGLEL--LKTIRADGAMSALPVLM

1fx1        GCGDS--SY-EYFCGA-VD--AIEEKLKNLGAEIVQD---------------------GLRID--GDPRAARDDIVGWAHDVRGAI--------
FLAV_DESVH  GCGDS--SY-EYFCGA-VD--AIEEKLKNLgAEIVQD---------------------GLRID--GDPRAARDDIVGwAHDVRGAI--------
FLAV_DESSA  GCGDS--DY-TYFCGA-VD--AIEEKLEKMgAVVIGD---------------------SLKID--GDPE--RDEIVSwGSGIADKI--------
FLAV_DESGI  GCGDS--SY-TYFCGA-VD--VIEKKAEELgATLVAS---------------------SLKID--GEPD--SAEVLDwAREVLARV--------
35Strategies for multiple sequence alignment
- Profile pre-processing
- Secondary structure-induced alignment (Praline-SS)
- Matrix extension
- Objective: integrate secondary structure information to anchor alignments and avoid errors
36Protein structure hierarchical levels
TERTIARY STRUCTURE (fold)
37Why use (predicted) structural information
- Structure is more conserved than sequence.
- Many structural protein families (e.g. globins) have family members with very low sequence similarities. For example, globin sequence identities can be as low as 10% while still having an identical fold.
- This means that you can still observe equivalent secondary structures in homologous proteins even if sequence similarities are extremely low.
- But you are dependent on the quality of prediction methods. For example, secondary structure prediction is currently at 76% correctness. So, about 1 out of 4 predicted amino acids is still incorrect.
38How to combine secondary structure and amino acid
information
[Figure: dynamic programming search matrix for two sequences with their secondary structure strings, MDAGSTVILCFV (HHHCCCEEEEEE) and MDAASTILCGS (HHHHCCEEECC). Amino acid substitution matrices are shown for H, C and E, together with a default matrix.]
39In terms of scoring
- So how would you score a profile using this extra information?
- Same way of scoring as before, but you can use secondary-structure-specific substitution scores in various combinations.
- Where does it fit in?
- Very important: structure is always more conserved than sequence, so secondary structure elements can help anchor the alignments.
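One possible combination is sketched below: use a secondary-structure-specific matrix when both positions carry the same state, and a default matrix otherwise. This rule and every number in the matrices are assumptions made for illustration; they are not taken from PRALINE or from any published matrix.

```python
# Hypothetical substitution scores (illustrative values only)
DEFAULT = {("A", "A"): 4, ("A", "L"): -1, ("L", "L"): 4}
SS_MATRICES = {
    "H": {("A", "A"): 5, ("A", "L"): 1, ("L", "L"): 5},   # helix
    "E": {("A", "A"): 6, ("A", "L"): -2, ("L", "L"): 6},  # strand
    "C": {("A", "A"): 4, ("A", "L"): -1, ("L", "L"): 3},  # coil
}


def lookup(matrix, a, b):
    """Symmetric lookup in a matrix stored as a dict of residue pairs."""
    if (a, b) in matrix:
        return matrix[(a, b)]
    return matrix[(b, a)]


def ss_score(aa1, ss1, aa2, ss2, default=DEFAULT, ss_matrices=SS_MATRICES):
    """Score a residue pair, using the matrix for the shared secondary
    structure state when the two states agree, else the default."""
    if ss1 == ss2 and ss1 in ss_matrices:
        return lookup(ss_matrices[ss1], aa1, aa2)
    return lookup(default, aa1, aa2)


print(ss_score("A", "H", "L", "H"))  # 1: both helix, helix matrix
print(ss_score("A", "H", "L", "E"))  # -1: states differ, default matrix
```

The dynamic programming itself is unchanged; only the cell score depends on the secondary structure of the two positions.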
40Sequences to be aligned
[Figure: workflow. Sequences to be aligned -> predict secondary structure (strings such as HHHHCCEEECCCEEECCHH) -> align sequences using secondary structure -> multiple alignment.]
41Using predicted secondary structure
1fx1        -PK-ALIVYGSTTGNTEYTAETIARQLANAG-YEVDSRDAASVEAGGLFEGFDLVLLGCSTWGDDSI------ELQDDFIPLFDS-LEETGAQGRKVACF
  ss:       e eeee b ssshhhhhhhhhhhhhhttt eeeee stt tttttt seeee b ee sss ee ttthhhhtt ttss tt eeeee
FLAV_DESVH  MPK-ALIVYGSTTGNTEYTaETIARELADAG-YEVDSRDAASVEAGGLFEGFDLVLLgCSTWGDDSI------ELQDDFIPLFDS-LEETGAQGRKVACf
  ss:       e eeeeee hhhhhhhhhhhhhhh eeeeee eeeeee hhhhhh eeeee
FLAV_DESGI  MPK-ALIVYGSTTGNTEGVaEAIAKTLNSEG-METTVVNVADVTAPGLAEGYDVVLLgCSTWGDDEI------ELQEDFVPLYED-LDRAGLKDKKVGVf
  ss:       e eeeeee hhhhhhhhhhhhhh eeeeee hhhhhh eeeeeee hhhhhh eeeeee
FLAV_DESSA  MSK-SLIVYGSTTGNTETAaEYVAEAFENKE-IDVELKNVTDVSVADLGNGYDIVLFgCSTWGEEEI------ELQDDFIPLYDS-LENADLKGKKVSVf
  ss:       eeeeee hhhhhhhhhhhhhh eeeee eeeee hhhhhhh h eeeee
FLAV_DESDE  MSK-VLIVFGSSTGNTESIaQKLEELIAAGG-HEVTLLNAADASAENLADGYDAVLFgCSAWGMEDL------EMQDDFLSLFEE-FNRFGLAGRKVAAf
  ss:       eeee hhhhhhhhhhhhhh eeeee hhhhhhhhhhheeeee hhhhhhh hh eeeee
2fcr        --K-IGIFFSTSTGNTTEVADFIGKTLGAK---ADAPIDVDDVTDPQALKDYDLLFLGAPTWNTGAD----TERSGTSWDEFLYDKLPEVDMKDLPVAIF
  ss:       eeeee ssshhhhhhhhhhhhhggg b eeggg s gggggg seeeeeee stt s s s sthhhhhhhtggg tt eeeee
FLAV_ANASP  SKK-IGLFYGTQTGKTESVaEIIRDEFGND--VVTL-HDVSQAE-VTDLNDYQYLIIgCPTWNIGEL--------QSDWEGLYSE-LDDVDFNGKLVAYf
  ss:       eeeee hhhhhhhhhhhh eee hhh hhhhhhheeeeee hhhhhhhhh eeeeee
FLAV_ECOLI  -AI-TGIFFGSDTGNTENIaKMIQKQLGKD--VADV-HDIAKSS-KEDLEAYDILLLgIPTWYYGEA--------QCDWDDFFPT-LEEIDFNGKLVALf
  ss:       eee hhhhhhhhhhhh eee hhh hhhhhhheeeee hhhhh eeeeee
FLAV_AZOVI  -AK-IGLFFGSNTGKTRKVaKSIKKRFDDET-MSDA-LNVNRVS-AEDFAQYQFLILgTPTLGEGELPGLSSDCENESWEEFLPK-IEGLDFSGKTVALf
  ss:       eee hhhhhhhhhhhhh hhh hhhhhhheeeee hhhhhhhhh eeeeee
FLAV_ENTAG  MAT-IGIFFGSDTGQTRKVaKLIHQKLDG---IADAPLDVRRAT-REQFLSYPVLLLgTPTLGDGELPGVEAGSQYDSWQEFTNT-LSEADLTGKTVALf
  ss:       eeee hhhhhhhhhhhh hhh hhhhhhheeeee hhhhh eeeee
4fxn        ----MKIVYWSGTGNTEKMAELIAKGIIESG-KDVNTINVSDVNIDELLNE-DILILGCSAMGDEVL------E-ESEFEPFIEE-IST-KISGKKVALF
  ss:       eeeee ssshhhhhhhhhhhhhhhtt eeeettt sttttt seeeeee btttb ttthhhhhhh hst t tt eeeee
FLAV_MEGEL  M---VEIVYWSGTGNTEAMaNEIEAAVKAAG-ADVESVRFEDTNVDDVASK-DVILLgCPAMGSEEL------E-DSVVEPFFTD-LAP-KLKGKKVGLf
  ss:       hhhhhhhhhhhhhh eeeee hhhhhhhh eeeee eeeee
FLAV_CLOAB  M-K-ISILYSSKTGKTERVaKLIEEGVKRSGNIEVKTMNL-DAVDKKFLQESEGIIFgTPTY-YANI--------SWEMKKWIDE-SSEFNLEGKLGAAf
  ss:       eee hhhhhhhhhhhhhh eeeeee hhhhhhhhhh eeee hhhhhhhhh eeeee
3chy        ADKELKFLVVDDFSTMRRIVRNLLKELGFNN-VEEAEDGV-DALNKLQAGGYGFVISD---WNMPNM----------DGLELLKTIRADGAMSALPVLMV
  ss:       tt eeee s hhhhhhhhhhhhhht eeeesshh hhhhhhhh eeeee s sss hhhhhhhhhh ttttt eeee

1fx1        GCGDS-SY-EYFCGAVDAIEEKLKNLGAEIVQD---------------------GLRIDGD--PRAARDDIVGWAHDVRGAI--------
  ss:       eee s ss sstthhhhhhhhhhhttt ee s eeees gggghhhhhhhhhhhhhh
FLAV_DESVH  GCGDS-SY-EYFCGAVDAIEEKLKNLgAEIVQD---------------------GLRIDGD--PRAARDDIVGwAHDVRGAI--------
  ss:       eee hhhhhhhhhhhh eeeee eeeee hhhhhhhhhhhhhh
FLAV_DESGI  GCGDS-SY-TYFCGAVDVIEKKAEELgATLVAS---------------------SLKIDGE--P--DSAEVLDwAREVLARV--------
  ss:       eee hhhhhhhhhhhh eeeee hhhhhhhhhhh
FLAV_DESSA  GCGDS-DY-TYFCGAVDAIEEKLEKMgAVVIGD---------------------SLKIDGD--P--ERDEIVSwGSGIADKI--------
  ss:       hhhhhhhhhhhh eeeee e eee
FLAV_DESDE  ASGDQ-EY-EHFCGAVPAIEERAKELgATIIAE---------------------GLKMEGD--ASNDPEAVASfAEDVLKQL--------
  ss:       e hhhhhhhhhhhhhh eeeee ee hhhhhhhhhhh
2fcr        GLGDAEGYPDNFCDAIEEIHDCFAKQGAKPVGFSNPDDYDYEESKSVRD-GKFLGLPLDMVNDQIPMEKRVAGWVEAVVSETGV------
  ss:       eee ttt ttsttthhhhhhhhhhhtt eee b gggs s tteet teesseeeettt ss hhhhhhhhhhhhhhhht
FLAV_ANASP  GTGDQIGYADNFQDAIGILEEKISQRgGKTVGYWSTDGYDFNDSKALR-NGKFVGLALDEDNQSDLTDDRIKSwVAQLKSEFGL------
  ss:       hhhhhhhhhhhhhh eeee hhhhhhhhhhhhhhhh
FLAV_ECOLI  GCGDQEDYAEYFCDALGTIRDIIEPRgATIVGHWPTAGYHFEASKGLADDDHFVGLAIDEDRQPELTAERVEKwVKQISEELHLDEILNA
  ss:       hhhhhhhhhhhhhh eeee hhhhhhhhhhhhhhhhhh
FLAV_AZOVI  GLGDQVGYPENYLDALGELYSFFKDRgAKIVGSWSTDGYEFESSEAVVD-GKFVGLALDLDNQSGKTDERVAAwLAQIAPEFGLS--L--
  ss:       e hhhhhhhhhhhhhh eeeee hhhhhhhhhhh
FLAV_ENTAG  GLGDQLNYSKNFVSAMRILYDLVIARgACVVGNWPREGYKFSFSAALLENNEFVGLPLDQENQYDLTEERIDSwLEKLKPAV-L------
  ss:       hhhhhhhhhhhhhhh eeee hhhhhhh hhhhhhhhhhhh
4fxn        G-----SYGWGDGKWMRDFEERMNGYGCVVVET---------------------PLIVQNE--PDEAEQDCIEFGKKIANI---------
  ss:       e eesss shhhhhhhhhhhhtt ee s eeees ggghhhhhhhhhhhht
FLAV_MEGEL  G-----SYGWGSGEWMDAWKQRTEDTgATVIGT----------------------AIVNEM--PDNAPE-CKElGEAAAKA---------
  ss:       hhhhhhhhhhh eeeee eeee h hhhhhhhh
FLAV_CLOAB  STANSIA-GGSDIALLTILNHLMVK-gMLVYSG----GVAFGKPKTHLG-----YVHINEI--QENEDENARIfGERiANkV--KQIF--
  ss:       hhhhhhhhhhhhhh eeeee hhhh hhh hhhhhhhhhhhh h
3chy        -----------TAEAKKENIIAAAQAGASGY-------------------------VVK----P-FTAATLEEKLNKIFEKLGM------
  ss:       ess hhhhhhhhhtt see ees s hhhhhhhhhhhhhhht
G
42Strategies for multiple sequence alignment
- Profile pre-processing
- Secondary structure-induced alignment
- Matrix extension
- Objective: try to avoid (early) errors
43Integrating alignment methods and alignment
information with T-Coffee
- Integrating different pair-wise alignment techniques (NW, SW, ...)
- Combining different multiple alignment methods (consensus multiple alignment)
- Combining sequence alignment methods with structural alignment techniques
- Plug in user knowledge
44Matrix extension
- T-Coffee
- Tree-based Consistency Objective Function For alignmEnt Evaluation
- Cedric Notredame (Bioinformatics for dummies)
- Des Higgins
- Jaap Heringa
- J. Mol. Biol., 302, 205-217 (2000)
45Using different sources of alignment information
[Figure: sources of alignment information combined by T-Coffee: structure alignments, Clustal, Dialign, Lalign and manual alignments.]
46T-Coffee library system
Seq1  AA1  Seq2  AA2  Weight
3     V31  5     L33  10
3     V31  6     L34  14
5     L33  6     R35  21
5     L33  6     I36  35
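The way such a library can be extended is sketched below: a residue pair gains weight for every residue of a third sequence that is aligned with both members of the pair, and the gain is the smaller of the two supporting weights. This is a simplified illustration, not the T-Coffee code. It only re-weights pairs that are already in the library, and the residues and weights in the example are hypothetical rather than taken from the table above.

```python
from collections import defaultdict


def extend_library(library):
    """Re-weight each residue pair using support from third sequences.

    library maps frozenset({(seq, pos), (seq, pos)}) to a weight.
    extended(x, y) = W(x, y) + sum over z of min(W(x, z), W(z, y)),
    where z is a residue of a third sequence paired with both x and y.
    """
    neighbours = defaultdict(dict)
    for pair, w in library.items():
        x, y = tuple(pair)
        neighbours[x][y] = w
        neighbours[y][x] = w

    extended = {}
    for pair, w in library.items():
        x, y = tuple(pair)
        total = w
        for z in neighbours[x].keys() & neighbours[y].keys():
            if z[0] not in (x[0], y[0]):
                total += min(neighbours[x][z], neighbours[z][y])
        extended[pair] = total
    return extended


# Hypothetical residues (sequence name, position) and primary weights
A1, B1, B2, C1 = ("A", 1), ("B", 1), ("B", 2), ("C", 1)
library = {
    frozenset((A1, B1)): 88,
    frozenset((A1, C1)): 77,
    frozenset((C1, B1)): 100,
    frozenset((A1, B2)): 10,  # not supported by any third sequence
}
extended = extend_library(library)
print(extended[frozenset((A1, B1))])  # 165 = 88 + min(77, 100)
print(extended[frozenset((A1, B2))])  # 10, unchanged
```

Pairs that are consistent with the other pairwise alignments are reinforced, while isolated pairs keep their original weight.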
47Matrix extension
[Figure: the six pairwise alignments of four sequences: 1-2, 1-3, 1-4, 2-3, 2-4, 3-4.]
48Search matrix extension: alignment transitivity
49T-Coffee
Other sequences
Direct alignment
50Search matrix extension
51T-COFFEE web-interface
523D-COFFEE
- Computes structure-based alignments
- Structures associated with the sequences are retrieved and the information is used to optimise the MSA
- More accurate... but for many (many) proteins we do not have the structure!
53but.....
- T-COFFEE (V1.23) multiple sequence alignment
- Flavodoxin-cheY
1fx1        ----PKALIVYGSTTGNTEYTAETIARQLANAG-YEVDSRDAASVE-AGGLFEGFDLVLLGCSTWGDDSIE------LQDDFIPL-FDSLEETGAQGRK-----
FLAV_DESVH  ---MPKALIVYGSTTGNTEYTAETIARELADAG-YEVDSRDAASVE-AGGLFEGFDLVLLGCSTWGDDSIE------LQDDFIPL-FDSLEETGAQGRK-----
FLAV_DESGI  ---MPKALIVYGSTTGNTEGVAEAIAKTLNSEG-METTVVNVADVT-APGLAEGYDVVLLGCSTWGDDEIE------LQEDFVPL-YEDLDRAGLKDKK-----
FLAV_DESSA  ---MSKSLIVYGSTTGNTETAAEYVAEAFENKE-IDVELKNVTDVS-VADLGNGYDIVLFGCSTWGEEEIE------LQDDFIPL-YDSLENADLKGKK-----
FLAV_DESDE  ---MSKVLIVFGSSTGNTESIAQKLEELIAAGG-HEVTLLNAADAS-AENLADGYDAVLFGCSAWGMEDLE------MQDDFLSL-FEEFNRFGLAGRK-----
4fxn        ------MKIVYWSGTGNTEKMAELIAKGIIESG-KDVNTINVSDVN-IDELL-NEDILILGCSAMGDEVLE-------ESEFEPF-IEEIS-TKISGKK-----
FLAV_MEGEL  -----MVEIVYWSGTGNTEAMANEIEAAVKAAG-ADVESVRFEDTN-VDDVA-SKDVILLGCPAMGSEELE-------DSVVEPF-FTDLA-PKLKGKK-----
FLAV_CLOAB  ----MKISILYSSKTGKTERVAKLIEEGVKRSGNIEVKTMNLDAVD-KKFLQ-ESEGIIFGTPTYYAN---------ISWEMKKW-IDESSEFNLEGKL-----
2fcr        -----KIGIFFSTSTGNTTEVADFIGKTLGAKA---DAPIDVDDVTDPQAL-KDYDLLFLGAPTWNTGA----DTERSGTSWDEFLYDKLPEVDMKDLP-----
FLAV_ENTAG  ---MATIGIFFGSDTGQTRKVAKLIHQKLDGIA---DAPLDVRRAT-REQF-LSYPVLLLGTPTLGDGELPGVEAGSQYDSWQEF-TNTLSEADLTGKT-----
FLAV_ANASP  ---SKKIGLFYGTQTGKTESVAEIIRDEFGNDV---VTLHDVSQAE-VTDL-NDYQYLIIGCPTWNIGEL--------QSDWEGL-YSELDDVDFNGKL-----
FLAV_AZOVI  ----AKIGLFFGSNTGKTRKVAKSIKKRFDDET-M-SDALNVNRVS-AEDF-AQYQFLILGTPTLGEGELPGLSSDCENESWEEF-LPKIEGLDFSGKT-----
FLAV_ECOLI  ----AITGIFFGSDTGNTENIAKMIQKQLGKDV---ADVHDIAKSS-KEDL-EAYDILLLGIPTWYYGEA--------QCDWDDF-FPTLEEIDFNGKL-----
3chy        ADKELKFLVVD--DFSTMRRIVRNLLKELGFN-NVE-EAEDGVDALNKLQ-AGGYGFVISDWNMPNMDGLE--------------LLKTIRADGAMSALPVLMV
54Multiple alignment methods
- Multi-dimensional dynamic programming > extension of pairwise sequence alignment.
- Progressive alignment > incorporates phylogenetic information to guide the alignment process.
- Iterative alignment > corrects for problems with progressive alignment by repeatedly realigning subgroups of sequences.
55Iteration
Iteration can help in cases where one can learn
from the data produced in a preceding step, so
that the next step can be taken in a more
informed way.
Convergence
Limit cycle
Divergence
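The three outcomes can be told apart with a simple loop that remembers the states it has already visited. This is a generic sketch: the `step` function is a stand-in (assumption) for one realignment round, and the integer states are toy values in place of real alignments, which would need a hashable representation such as a tuple of aligned strings.

```python
def iterate(step, start, max_iter=100):
    """Apply step repeatedly and classify the outcome.

    Returns ("convergence", n) when the state stops changing after n
    steps, ("limit cycle", k) when a state recurs with period k, and
    ("no convergence within max_iter", max_iter) otherwise.
    """
    seen = {start: 0}
    state = start
    for i in range(1, max_iter + 1):
        new_state = step(state)
        if new_state == state:
            return "convergence", i
        if new_state in seen:
            return "limit cycle", i - seen[new_state]
        seen[new_state] = i
        state = new_state
    return "no convergence within max_iter", max_iter


# Toy step functions standing in for a realignment round
converging = iterate(lambda x: max(x - 1, 0), 3)
cycling = iterate(lambda x: (x + 1) % 3, 0)
diverging = iterate(lambda x: x + 1, 0, max_iter=10)
print(converging)  # ('convergence', 4)
print(cycling)     # ('limit cycle', 3)
print(diverging)   # ('no convergence within max_iter', 10)
```

In practice an iterative alignment method needs a stopping rule for the last two cases, for example a maximum number of iterations.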
56Pre-profile alignment: Alignment consistency
[Figure: the five pre-alignments (key sequences 1-5) with one residue highlighted. Labels: Ala131; A131, A131, L133, C126, A131.]
57Flavodoxin-cheY consistency scores(PRALINE
prepro0)
Completely consistently aligned amino acids
1fx1 --7899999999999TEYTAETIARQL8776-66
57777777777777553799VL999ST97775599989-43556667779
8998878AQGRKVACF FLAV_DESVH
-46788999999999TEYTAETIAREL7777-775777777777777755
3799VL999ST97775599989-435566677798998878AQGRKVACF
FLAV_DESDE    -47899999999999999999999988776695658888777777778763YDAVL999SAW9877789877753556666669777776789GRKVAAF
FLAV_DESGI    -46788999999999TEGVAEAIAKTL9997-76678888777777887539DVVL999ST987776--9889546667776697776557777888888
FLAV_DESSA    93677799999999999999999999988759765777888888888876399999999STW77765--9999536666677797998779999999999
4fxn          -878779999999999999999999776666967567788888888888777999999988777776--9889577788888897773237888888888
FLAV_MEGEL    9776779999999999999999997777766-665666677788899976799999999987777669--887362334466695555455778888888
2fcr          --87899999999999TEVADFIGK996541900300000112233355679DLLF99999855312888111224555555407777777888888888
FLAV_ANASP    -47899LFYGTQTGKTESVAEIIR9777653922356677777777897779999999999988843--9998555778777899998879999999999
FLAV_ECOLI    997789999GSDTGNTENIAKMIQ8774222922456678889999995569999999999755553----99262225555495777767778999999
FLAV_AZOVI    --79IGLFFGSNTGKTRKVAKSIK99887759657577888888999777899999999999877761112222222244555-5555555778999999
FLAV_ENTAG    94789999999999999999999998755229223234555555555555688899999998875521111111133477777-7777777999999999
FLAV_CLOAB    -86999ILYSSKTGKTERVAK9997555555057678887888887777765778899998522223--9888342234455597777777777777777
3chy          0122222223333335666665555555222922222222222221112163335555755553222888877674533344493332222222222222
Avrg Consist  8667778888888889999999998776554844455566666666665557888888888766544887666334445566586666556778888888
Conservation  0125538675848969746963946463343045244355446543473516658868567554455000000314365446505575435547747759

1fx1          G888799955555559888888888899777----7777797787787978---555555566776555677777778888799------
FLAV_DESVH    G888799955555559888888888899777----7777797787787978---555555566776555677777778888799------
FLAV_DESDE    A88878685555555999988888889998879--8777788-98777777--8555555554433245667777777777599------
FLAV_DESGI    87775977755555677777777777777778---88888887667778777775555555555542424667888887777--------
FLAV_DESSA    977768777555556777777777777777767887777777778888-978985555555556536556888888888877--------
4fxn          867777555555552666666666555555577887767999877777977777665555555555444466666666555798------
FLAV_MEGEL    8577775666666525556777778888888689977888988776558677885544333222222212233223355557--------
2fcr          877773573333333777766667777765533333333333333322833333333332244444567777777888777633------
FLAV_ANASP    977773775333344777888888777777733334444444444433833333344444444444455577777788777734------
FLAV_ECOLI    977743786444444777788888888888833334444444444444244444555554555775667788888888877734110000
FLAV_AZOVI    977763553333334666666677777777733334444444444444823333555555555555455588888888777772311----
FLAV_ENTAG    977773886555555866666666677666633333333333333322123333344444444455555665566666555582------
FLAV_CLOAB    766627222222212444444444455555587882222222222222111111122222222222344443333333233399------
3chy          222227222222224111355431113324578-87778997666556877776322222222222322222323344444422------
Avrg Consist  866656564444444666666666666666656665555565555555655565444443444443344455666666666666889999
Conservation  736630574333341634645344447467100000110100110000000104347446454432254744544484343010000000

Iteration 0: SP 135136.00, AvSP 10.473, SId 3838, AvSId 0.297
Consistency values are scored from 0 to 10; the value 10 is represented by the corresponding amino acid (red).
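The display convention in this caption can be sketched in a few lines of Python. This is only an illustration: the function name `render_consistency` and the scaling of a 0.0-1.0 score to the 0-10 range by truncation are assumptions, not taken from PRALINE itself.

```python
def render_consistency(aligned_seq, scores):
    """Encode one aligned row: digits 0-9 for partial consistency,
    the amino acid letter itself for a full score of 10, '-' for gaps.

    aligned_seq: gapped sequence row.
    scores: one value in [0.0, 1.0] per column (ignored at gap columns).
    """
    out = []
    for residue, score in zip(aligned_seq, scores):
        if residue == "-":
            out.append("-")
            continue
        # Assumed scaling: truncate the fraction to an integer level 0..10.
        level = int(score * 10 + 1e-9)
        out.append(residue if level >= 10 else str(level))
    return "".join(out)


print(render_consistency("-MKIVYG", [0, 0.42, 0.2, 1.0, 1.0, 0.95, 1.0]))
# prints "-42IV9G"
```

Read this way, stretches of letters in the alignment mark fully consistent regions, while low digits mark positions where the alignment is unreliable.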
58Flavodoxin-cheY consistency scores (PRALINE
prepro1500)
1fx1          -42444IVYGSTTGNTEYTAETIARQL886666666577777775667888DLVLLGCSTW77766----995476666769-77888788AQGRKVACF
FLAV_DESVH    -34444IVYGSTTGNTEYTAETIAREL776666666577777775667888DLVLLGCSTW77766----995476666769-77888788AQGRKVACF
FLAV_DESSA    -33444IVYGSTTGNTET99999888777655777668888899666686YDIVLFGCSTW77777----996466666779-88SL98ADLKGKKVSVF
FLAV_DESGI    -34444IVYGSTTGNTEGVA9999999999765555677777886666678DVVLLGCSTW77777----995466666779-88887688888KKVGVF
FLAV_DESDE    -44777IVFGSSTGNTE988777666655566777778899999777777YDAVLFGCSAW88877----997587777779-8887766777GRKVAAF
4fxn          -32222IVYWSGTGNTE8888888876666778888888888NI8888586DILILGCSA888888------8-8888886--66665378ISGKKVALF
FLAV_MEGEL    -12222IVYWSGTGNTEAMA8888888888888888555555555555485DVILLGCPAMGSE77------572222288--8888755588GKKVGLF
2fcr          -41456IFFSTSTGNTTEVA999998865432222765554443244779YDLLFLGAPT944411999-111112454441-8DKLPEVDMKDLPVAIF
FLAV_ANASP    -00456LFYGTQTGKTESVAEII987755323322427776666623589YQYLIIGCPTW55532--999843678W988899998888888GKLVAYF
FLAV_AZOVI    -42445LFFGSNTGKTRKVAKSIK87777434333536666665467777YQFLILGTPTLGEG862222222222355558-45666666888KTVALF
FLAV_ENTAG    -266IGIFFGSDTGQTRKVAKLIHQKL6664664424DVRRATR88888SYPVLLLGTPT88888644444444446WQEF8-8NTLSEADLTGKTVALF
FLAV_ECOLI    -51114IFFGSDTGNTENIAKMI987743311111555555588355599YDILLLGIPT954431----88355225544--44666666779KLVALF
FLAV_CLOAB    -63666ILYSSKTGKTERVAKLIE63333333333333333333366LQESEGIIFGTPTY63--6--------66SWE33333333333333GKLGAAF
3chy          ADKELKFLVVDDFSTMRRIVRNLLKELGFNNVEEAEDGVDALNKLQ-AGGYGFVI---SDWNMPNM----------DGLEL--LKTIRADGAMSALPVLM
Avrg Consist  9334459999999999999999988776655555555666667756667889999999999767658888775555566668967777677889999999
Conservation  0236428675848969746963946463344354312564565414344366588685675544550000003144654460055575345547747759

1fx1          G98879-89-999877977--7788899999999955--88888-9988887798999777778766553344588776666222266899899
FLAV_DESVH    G98879-89-999877977--7788899999999955--88888-9988887798999777778766553344588776666222266899899
FLAV_DESSA    G98878-688688888-88--88999999999999979988888887788889-89-9787777666756645577776666654466899899
FLAV_DESGI    G98879-898688888987--788888999GATLV7698899-9998789888-8899787878776663122477788888333276899899
FLAV_DESDE    AS8888-68-888888899--9999999999988888-99988888988778897888776668854222212255555555333277999999
4fxn          GS2228-228222222222--2388888888888888888888888888888888888887778866765535577555533221288888888
FLAV_MEGEL    G4888--28-8888882MD--AWKQRTEDTGATVI77---------------------77222--224444222222244222112--------
2fcr          GLGDA5-8Y5DNFC88-88--8877777777777765444555555555544385555777774465333357799999987555333899899
FLAV_ANASP    GTGDQ5-GY5899999-99--99EEKISQRGG99975555544444444433284444466665555555556666676666433333899899
FLAV_AZOVI    GLGDQ5-885777555-55--55555788888888555555555555555554855555555555666555555888855555544442--288
FLAV_ENTAG    GLGDQL-NYSKNFVSA-MR--ILYDLVIARGACVVG8888EGYKFSFSAA6664NEFVGLPLDQEN88888EERIDSWLE88842242688688
FLAV_ECOLI    GC99549784688888987997777777778888855444444444444444114444777774455775567788888887433322100100
FLAV_CLOAB    STANS6366663333333333336666666666666666663333363366336663333336EDENARIFGERIANKVKQI333333666666
3chy          VTAEA---KKENIIAA-----------AQAGAS-------------------------GYVVK-----PFTAATLEEKLNKIFEKLGM------
Avrg Consist  9988779787777777777997788888888888866777777777767766677777676667766655455577776666433355788788
Conservation  7466400371545457063003545344447457530000010100100000000106837601444423355744544484343010 00000

Iteration 0: SP 136702.00, AvSP 10.654, SId 3955, AvSId 0.308
Consistency values are scored from 0 to 10; the value 10 is represented by the corresponding amino acid (red).
59Consistency iteration
Pre-profiles
Multiple alignment positional consistency scores
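A minimal Python sketch of one way to compute positional consistency scores. The definition used here is an assumption rather than something stated on the slide: a residue's score is the fraction of other sequences for which the multiple alignment pairs it with the same partner (or gap) as the independent pairwise alignment does. The names `residue_map` and `positional_consistency` are hypothetical.

```python
def residue_map(row_a, row_b):
    """Map each residue index of row_a to the residue index of row_b in the
    same column, or to None when row_a's residue faces a gap."""
    mapping, ia, ib = {}, 0, 0
    for ca, cb in zip(row_a, row_b):
        if ca != "-":
            mapping[ia] = ib if cb != "-" else None
        ia += ca != "-"
        ib += cb != "-"
    return mapping


def positional_consistency(msa, pairwise):
    """msa: dict name -> gapped row of the multiple alignment.
    pairwise: dict (a, b) -> (row_a, row_b), one entry per sequence pair.
    Returns dict name -> per-residue scores in [0, 1]."""
    names = list(msa)
    scores = {}
    for a in names:
        n_res = len(msa[a].replace("-", ""))
        agree = [0] * n_res
        for b in names:
            if b == a:
                continue
            if (a, b) in pairwise:
                row_a, row_b = pairwise[(a, b)]
            else:  # pair stored in the other order
                row_b, row_a = pairwise[(b, a)]
            in_msa = residue_map(msa[a], msa[b])
            in_pair = residue_map(row_a, row_b)
            for i in range(n_res):
                agree[i] += in_msa[i] == in_pair[i]
        scores[a] = [x / (len(names) - 1) for x in agree]
    return scores


# Toy data (made up for illustration).
msa = {"s1": "ACD-", "s2": "A-DE", "s3": "ACDE"}
pairwise = {
    ("s1", "s2"): ("ACD-", "A-DE"),
    ("s1", "s3"): ("ACD-", "ACDE"),
    ("s2", "s3"): ("-ADE", "ACDE"),  # disagrees with the MSA for residue A of s2
}
scores = positional_consistency(msa, pairwise)
print(scores["s2"])  # [0.5, 1.0, 1.0]
```

Multiplying such fractions by 10 gives values on the 0 to 10 scale used in the consistency plots.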
60Pre-profile update iteration
Pre-profiles
Multiple alignment
61Iterate similarity matrix, guide tree and MSA
[Figure: pairwise alignment scores (Score 1-2, Score 1-3, Score 4-5) fill a 5 x 5 similarity matrix, from which a guide tree and then a multiple alignment are built.]
This way of iterating was already implemented in 1984 by Hogeweg and Hesper.
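The loop of similarity matrix, guide tree and multiple alignment can be sketched as follows. This is a hedged illustration, not the Hogeweg and Hesper implementation: the similarity measure (sequence identity of the pairwise alignments induced by the current multiple alignment) is an assumption, and `build_tree` and `align_along_tree` are hypothetical callables that the caller would have to supply.

```python
from itertools import combinations


def induced_identity(row_a, row_b):
    """Fraction of identical residues over the columns where both rows
    carry a residue (columns with a gap in either row are skipped)."""
    pairs = [(a, b) for a, b in zip(row_a, row_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return sum(a == b for a, b in pairs) / len(pairs)


def similarity_from_msa(msa):
    """Build a symmetric similarity matrix (nested dicts) from an MSA."""
    names = list(msa)
    sim = {n: {n: 1.0} for n in names}
    for a, b in combinations(names, 2):
        sim[a][b] = sim[b][a] = induced_identity(msa[a], msa[b])
    return sim


def iterate(msa, build_tree, align_along_tree, max_rounds=10):
    """Repeat MSA -> similarity matrix -> guide tree -> new MSA until the
    alignment no longer changes or max_rounds is reached."""
    for _ in range(max_rounds):
        sim = similarity_from_msa(msa)
        new_msa = align_along_tree(build_tree(sim))
        if new_msa == msa:
            break
        msa = new_msa
    return msa


msa = {"s1": "ACD-", "s2": "AGDE"}
sim = similarity_from_msa(msa)
print(round(sim["s1"]["s2"], 3))  # 0.667
```

The stopping rule (no change in the alignment) is one simple choice; a cap on the number of rounds guards against alignments that oscillate between alternatives.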
62Secondary structure-induced alignment
63PRALINE: Using secondary structure for alignment
[Figure: dynamic programming search matrix for two sequences annotated with secondary structure]
Sequence 1: MDAGSTVILCFV
            HHHCCCEEEEEE
Sequence 2: MDAASTILCGS
            HHHHCCEEECC
Amino acid exchange weights matrices: H-H, C-C, E-E, Default
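A small Python sketch of how the search matrix could be filled with structure-dependent exchange weights, using the two sequences from the slide. It assumes that the matrix for a state (H, E or C) applies only when both residues share that state and that the default matrix is used otherwise. The `toy_matrix` scores are placeholders, not real exchange matrices, and all function names are hypothetical.

```python
def toy_matrix(match, mismatch):
    """Stand-in for a real exchange matrix: one score for identical
    residues, another for all other pairs (placeholder values only)."""
    return lambda a, b: match if a == b else mismatch


MATRICES = {
    "H": toy_matrix(5, -1),        # both residues in a helix
    "E": toy_matrix(6, -2),        # both residues in a strand
    "C": toy_matrix(3, 0),         # both residues in coil
    "default": toy_matrix(4, -1),  # secondary structure states differ
}


def exchange_weight(aa_a, ss_a, aa_b, ss_b, matrices=MATRICES):
    """Score one residue pair with the matrix selected by the states."""
    key = ss_a if ss_a == ss_b and ss_a in matrices else "default"
    return matrices[key](aa_a, aa_b)


def search_matrix(seq_a, ss_a, seq_b, ss_b, matrices=MATRICES):
    """Fill the residue-by-residue score matrix that dynamic programming
    would then search for the best alignment path."""
    return [
        [exchange_weight(a, sa, b, sb, matrices) for b, sb in zip(seq_b, ss_b)]
        for a, sa in zip(seq_a, ss_a)
    ]


m = search_matrix("MDAGSTVILCFV", "HHHCCCEEEEEE", "MDAASTILCGS", "HHHHCCEEECC")
print(m[0][0], m[3][3], m[7][6])  # 5 -1 6
```

In the printed cells, M/M in two helices uses the helix matrix, G (coil) against A (helix) falls back to the default matrix, and I/I in two strands uses the strand matrix.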
64Flavodoxin-cheY multiple alignment / secondary structure iteration: cheY SSEs
3chy-AA SEQUENCE   AA   ADKELKFLVVDDFSTMRRIVRNLLKELGFNNVEEAEDGVDALNKLQAGGYGFVISDWNMP
3chy-ITERATION-0   PHD  EEEEEEE HHHHHHHHHHHHHHHHH E HHHHHHHHHH HHHEEE
3chy-ITERATION-1   PHD  EEEEEEEE HHHHHHHHHHHHHHH HHHHHHHH EEEEEE
3chy-ITERATION-2   PHD  EEEEEEEE HHHHHHHHHHHHHH HHHHHHHHH EEEEEE
3chy-ITERATION-3   PHD  EEEEEEEE HHHHHHHHHHHHHH EEE HHHHHH EEEEE
3chy-ITERATION-4   PHD  EEEEEEEE HHHHHHHHHHHHHH HHHHHHH EEEEE
3chy-ITERATION-5   PHD  EEEEEEEE HHHHHHHHHHHHHH EEE HHHHHH EEEEE
3chy-ITERATION-6   PHD  EEEEEEEE HHHHHHHHHHHHHH HHHHHHHH EEEEEE
3chy-ITERATION-7   PHD  EEEEEEEE HHHHHHHHHHHHHH EEE HHHHHH EEEEE
3chy-ITERATION-8   PHD  EEEEEEEE HHHHHHHHHHHHHH HHHHHHH EEEEEE
3chy-ITERATION-9   PHD  EEEEEEEE HHHHHHHHHHHHHH HHHHHHHHHH EEEEE

3chy-AA SEQUENCE   AA   NMDGLELLKTIRADGAMSALPVLMVTAEAKKENIIAAAQAGASGYVVKPFTAATLEEKLNKIFEKLGM
3chy-ITERATION-0   PHD  HHHHHHEEEEEE HHHHHHHHHHHHHHHHH HHHHHHHHHHHHHH
3chy-ITERATION-1   PHD  HHHHHHEEEEEE HHH HHHHHHHHHHHHHHHHHH EEE HHHHHHHHHHHHHH
3chy-ITERATION-2   PHD  HHHHHHEEEEEE HHHHHHHHHHHHHHHHHH EEE HHHHHHHHHHHHHH
3chy-ITERATION-3   PHD  HHHHHHHHHHHH HHHHHHHHHHHHHHHHHH EEE HHHHHHHHHHHHHH
3chy-ITERATION-4   PHD  HHHHH EEEEE HHHHHHHHHHHHHHHHH EEE HHHHHHHHHHHHHH
3chy-ITERATION-5   PHD  HHHHHHHH EEEEE HHHHHHHHHHHHHHHH EEE HHHHHHHHHHHHHH
3chy-ITERATION-6   PHD  HHHHHHHH EEEEE HHHHHHHHHHHHHHHH EEEE HHHHHHHHHHHHHH
3chy-ITERATION-7   PHD  HHHHHHHH EEEEEE HHHHHHHHHHHHHHHH EEE HHHHHHHHHHHHHH
3chy-ITERATION-8   PHD  HHHHHHHH EEEEE HHHHHHHHHHHHHHHH EEE HHHHHHHHHHHHHH
3chy-ITERATION-9   PHD  HHHHHHHH EEEEE HHHHHHHHHHHHHHH EEEE HHHHHHHHHHHHHH
70Flavodoxin-cheY multiple alignment/ secondary
structure iteration cheY SSEs
3chy-AA SEQUENCE AA ADKELKFLVVDDFSTMRR
IVRNLLKELGFNNVEEAEDGVDALNKLQAGGYGFVISDWNMP 3chy-I
TERATION-0 PHD EEEEEEE
HHHHHHHHHHHHHHHHH E HHHHHHHHHH HHHEEE
3chy-ITERATION-1 PHD EEEEEEEE
HHHHHHHHHHHHHHH HHHHHHHH EEEEEE
3chy-ITERATION-2 PHD EEEEEEEE
HHHHHHHHHHHHHH HHHHHHHHH EEEEEE
3chy-ITERATION-3 PHD EEEEEEEE
HHHHHHHHHHHHHH EEE HHHHHH EEEEE
3chy-ITERATION-4 PHD EEEEEEEE
HHHHHHHHHHHHHH HHHHHHH EEEEE
3chy-ITERATION-5 PHD EEEEEEEE
HHHHHHHHHHHHHH EEE HHHHHH EEEEE
3chy-ITERATION-6 PHD EEEEEEEE
HHHHHHHHHHHHHH HHHHHHHH EEEEEE
3chy-ITERATION-7 PHD EEEEEEEE
HHHHHHHHHHHHHH EEE HHHHHH EEEEE
3chy-ITERATION-8 PHD EEEEEEEE
HHHHHHHHHHHHHH HHHHHHH EEEEEE
3chy-ITERATION-9 PHD EEEEEEEE
HHHHHHHHHHHHHH HHHHHHHHHH EEEEE
3chy-AA SEQUENCE AA
NMDGLELLKTIRADGAMSALPVLMVTAEAKKENIIAAAQAGASGYVVKP
FTAATLEEKLNKIFEKLGM 3chy-ITERATION-0
PHD HHHHHHEEEEEE HHHHHHHHHHHHHHHHH
HHHHHHHHHHHHHH 3chy-ITERATION-1
PHD HHHHHHEEEEEE HHH HHHHHHHHHHHHHHHHHH
EEE HHHHHHHHHHHHHH 3chy-ITERATION-2
PHD HHHHHHEEEEEE HHHHHHHHHHHHHHHHHH
EEE HHHHHHHHHHHHHH 3chy-ITERATION-3
PHD HHHHHHHHHHHH
HHHHHHHHHHHHHHHHHH EEE HHHHHHHHHHHHHH
3chy-ITERATION-4 PHD HHHHH
EEEEE HHHHHHHHHHHHHHHHH EEE HHHHHHHHHHHHHH
3chy-ITERATION-5 PHD HHHHHHHH
EEEEE HHHHHHHHHHHHHHHH EEE
HHHHHHHHHHHHHH 3chy-ITERATION-6 PHD
HHHHHHHH EEEEE HHHHHHHHHHHHHHHH EEEE
HHHHHHHHHHHHHH 3chy-ITERATION-7
PHD HHHHHHHH EEEEEE HHHHHHHHHHHHHHHH