Multiple alignment - PowerPoint PPT Presentation

Slides: 22

Provided by: peter217

Transcript and Presenter's Notes

1
Multiple alignment
Peter Højrup, Department of Biochemistry and Molecular Biology, SDU, Odense University, Denmark.
2
Multiple sequence alignment
One amino acid sequence plays coy.
A pair of homologous sequences whisper.
Many aligned sequences shout out loud.
3
Main applications for multiple alignment (MA)

Extrapolation: determine family relationships.
Phylogenetic analysis: reconstruct the history of the protein.
Pattern identification: identify structurally or functionally important residues.
Domain identification: construct a pattern to find other family members.
Structure prediction: a multiple alignment greatly improves prediction accuracy.
PCR analysis: find the least degenerate parts of a gene (for primer design).
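For the PCR application, the least degenerate region is the one whose amino acids can be encoded by the fewest codon combinations, which keeps a degenerate primer pool small. A minimal sketch, assuming the standard genetic code and ignoring stop codons; the function names are hypothetical:

```python
# Number of codons per amino acid in the standard genetic code (61 sense codons).
CODONS_PER_AA = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}

def degeneracy(peptide: str) -> int:
    """Number of distinct DNA sequences that could encode the peptide."""
    total = 1
    for aa in peptide:
        total *= CODONS_PER_AA[aa]
    return total

def least_degenerate_window(protein: str, length: int):
    """Return (start, window, degeneracy) for the window with the fewest encodings."""
    windows = (
        (i, protein[i:i + length], degeneracy(protein[i:i + length]))
        for i in range(len(protein) - length + 1)
    )
    return min(windows, key=lambda w: w[2])  # first window wins on ties
```

For example, the conserved stretch KPEDWDE in the calreticulin/calnexin alignment has a degeneracy of 2 x 4 x 2 x 2 x 1 x 2 x 2 = 128. Only real residues can be scored, so gap characters and the digit codes of a Prim.cons. line must be excluded first.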

4
Definition
Consensus sequence
5
Multiple alignment parameters

AA substitution matrix: usually one designed for global alignment, such as PAM250 or Gonnet.
Gap parameters: both the gap-opening and the gap-extension penalties are very important for the optimal alignment.
Alignment order: ClustalX first compares all sequences pairwise; the closest-matching sequences form the initial alignment.
Sequence length: try to align sequences of about the same length; an initial dot plot can help locate the regions to align.
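To see why both gap parameters matter, here is a minimal sketch of a sum-of-pairs alignment score with affine gap costs (a gap run of length L costs gap_open + (L - 1) x gap_extend). The toy match/mismatch scorer is a hypothetical stand-in for a real matrix such as PAM250 or Gonnet, and the penalty values are illustrative assumptions, not ClustalX's own scoring or defaults:

```python
from itertools import combinations

def toy_score(x: str, y: str) -> float:
    """Hypothetical stand-in for a real substitution matrix."""
    return 2.0 if x == y else -1.0

def pair_score(a, b, sub, gap_open=-10.0, gap_extend=-1.0):
    """Score two aligned rows; a gap run of length L costs gap_open + (L - 1) * gap_extend."""
    score = 0.0
    prev_gap = None  # which row ("a" or "b") was gapped in the previous scored column
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue  # gap in both rows: ignored for this pair
        if x == "-" or y == "-":
            gap_row = "a" if x == "-" else "b"
            score += gap_extend if gap_row == prev_gap else gap_open
            prev_gap = gap_row
        else:
            score += sub(x, y)
            prev_gap = None
    return score

def sum_of_pairs(rows, sub, gap_open=-10.0, gap_extend=-1.0):
    """Sum pair_score over every pair of rows in the alignment."""
    return sum(pair_score(a, b, sub, gap_open, gap_extend)
               for a, b in combinations(rows, 2))
```

With these values, pair_score("A--CD", "AGGCD", toy_score) gives -5.0, while placing the same two gap columns as two separate gaps, pair_score("A-C-D", "AGCGD", toy_score), gives -14.0: a high opening penalty favours fewer, longer gaps.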

6
Multiple sequence alignment
Rat_CALRETICULIN    ----MLLSVPLLLGLLGLAAAD----------------------------
Human_CALRETICULIN  ----MLLSVPLLLGLLGLAVAE----------------------------
RAT_CALNEXIN        MEGKWLLCLLLVLGTAAIQAHDGHDDDMIDIEDDLDDVIEEVEDSKSKSD
Human_CALNEXIN      MEGKWLLCMLLVLGTAIVEAHDGHDDDVIDIEDDLDDVIEEVEDSKPDT-
Prim.cons.          MEGK2LL2V2L2LG22GLAA2DGHDDD2IDIEDDLDDVIEEVEDSK222D

Rat_CALRETICULIN    ------PAIYFKEQFLDGDAWTNR---------WVESKHKSD--FGKFVL
Human_CALRETICULIN  ------PAVYFKEQFLDGDGWTSR---------WIESKHKSD--FGKFVL
RAT_CALNEXIN        TSTPPSPKVTYKAPVPTGEVYFADSFDRGSLSGWILSKAKKDDTDDEIAK
Human_CALNEXIN      TAPPSSPKVTYKAPVPTGEVYFADSFDRGTLSGWILSKAKKDDTDDEIAK
Prim.cons.          T22P2SP2V22K22222G2V22A2SFDRG2LSGWI2SK2K2DDT222222

Rat_CALRETICULIN    SSGKFYGDQEK------DKGLQTSQDARFYALSARF-EPFSNKGQTLVVQ
Human_CALRETICULIN  SSGKFYGDEEK------DKGLQTSQDARFYALSASF-EPFSNKGQTLVVQ
RAT_CALNEXIN        YDGKWEVDEMKETKLPGDKGLVLMSRAKHHAISAKLNKPFLFDTKPLIVQ
Human_CALNEXIN      YDGKWEVEEMKESKLPGDKGLVLMSRAKHHAISAKLNKPFLFDTKPLIVQ
Prim.cons.          22GK222DE2KE2KLPGDKGL22222A222A2SAK2N2PF222222L2VQ

Rat_CALRETICULIN    FTVKHEQNIDCGGGYVKLFPGG--LDQKDMHGDSEYNIMFGPDICGPGTK
Human_CALRETICULIN  FTVKHEQNIDCGGGYVKLFPNS--LDQTDMHGDSEYNIMFGPDICGPGTK
RAT_CALNEXIN        YEVNFQNGIECGGAYVKLLSKTSELNLDQFHDKTPYTIMFGPDKCG-EDY
Human_CALNEXIN      YEVNFQNGIECGGAYVKLLSKTPELNLDQFHDKTPYTIMFGPDKCG-EDY
Prim.cons.          22V22222I2CGG2YVKL22KT2EL22D22H2222Y2IMFGPD2CGP222

Rat_CALRETICULIN    KVHVIFNYKGKNVLINKDIRCK----------DDEFTHLYTLIVRPDNTY
Human_CALRETICULIN  KVHVIFNYKGKNVLINKDIRCK----------DDEFTHLYTLIVRPDNTY
RAT_CALNEXIN        KLHFIFRHKNPKTGVYEEKHAKRPDADLKTYFTDKKTHLYTLILNPDNSF
Human_CALNEXIN      KLHFIFRHKNPKTGIYEEKHAKRPDADLKTYFTDKKTHLYTLILNPDNSF
Prim.cons.          K2H2IF22K22222I222222KRPDADLKTYF2D22THLYTLI22PDN22

Rat_CALRETICULIN    EVKIDNSQVESGSLEDDWD--FLPPKKIKDPDAAKPEDWDERAKIDDPTD
Human_CALRETICULIN  EVKIDNSQVESGSLEDDWD--FLPPKKIKDPDASKPEDWDERAKIDDPTD
RAT_CALNEXIN        EILVDQSVVNSGNLLNDMTPPVNPSREIEDPEDRKPEDWDERPKIADPDA
Human_CALNEXIN      EILVDQSVVNSGNLLNDMTPPVNPSREIEDPEDRKPEDWDERPKIPDPEA
Prim.cons.          E222D2S2V2SG2L22D22PP22P222I2DP22RKPEDWDER2KIDDPT2

Rat_CALRETICULIN    SKPEDWDK---------------------PEHIPDPDAKKPEDWDEEMDG
Human_CALRETICULIN  SKPEDWDK---------------------PEHIPDPDAKKPEDWDEEMDG
RAT_CALNEXIN        VKPDDWDEDAPSKIPDEEATKPEGWLDDEPEYIPDPDAEKPEDWDEDMDG
Human_CALNEXIN      VKPDDWDEDAPAKIPDEEATKPEGWLDDEPEYVPDPDAEKPEDWDEDMDG
Prim.cons.          2KP2DWD2DAP2KIPDEEATKPEGWLDDEPE2IPDPDA2KPEDWDE2MDG

Rat_CALRETICULIN    EWEP-------------------PVIQNPEYKGEWKPRQIDNPDYKGTWI
Human_CALRETICULIN  EWEP-------------------PVIQNPEYKGEWKPRQIDNPDYKGTWI
RAT_CALNEXIN        EWEAPQIANPKCESAPGCGVWQRPMIDNPNYKGKWKPPMIDNPNYQGIWK
Human_CALNEXIN      EWEAPQIANPRCESAPGCGVWQRPVIDNPNYKGKWKPPMIDNPSYQGIWK
Prim.cons.          EWE2PQIANP2CESAPGCGVWQRPVI2NP2YKG2WKP22IDNPDY2G2W2

7
Nomenclature
Rat_CALRETICULIN    SSGKFYGDQEK------DKGLQTSQDARFYALSARF-EPFSNKGQTL
Human_CALRETICULIN  SSGKFYGDEEK------DKGLQTSQDARFYALSASF-EPFSNKGQTL
RAT_CALNEXIN        YDGKWEVDEMKETKLPGDKGLVLMSRAKHHAISAKLNKPFLFDTKPL
Human_CALNEXIN      YDGKWEVEEMKESKLPGDKGLVLMSRAKHHAISAKLNKPFLFDTKPL
Prim.cons.          22GK222DE2KE2KLPGDKGL22222A222A2SAK2N2PF222222L
Consensus sequence: the Prim.cons. line
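Comparing columns suggests how the Prim.cons. lines in these slides were produced: gaps are ignored, a column shows its most frequent residue when that residue is the single most common one, and otherwise shows the number of residues tied for most common (so "2" where two residues each occur twice). The sketch below implements that inferred rule; it is a reading of the examples, not the documented behaviour of the program that drew the slides, and printing "-" for an all-gap column is an added assumption:

```python
from collections import Counter

def prim_consensus(aligned):
    """Consensus per column: unique most frequent residue, else the number of tied residues."""
    out = []
    for column in zip(*aligned):
        counts = Counter(c for c in column if c != "-")  # gaps are ignored
        if not counts:
            out.append("-")  # assumption: all-gap column
            continue
        top = max(counts.values())
        leaders = [aa for aa, n in counts.items() if n == top]
        out.append(leaders[0] if len(leaders) == 1 else str(len(leaders)))
    return "".join(out)
```

Applied to the four sequences above, it reproduces the Prim.cons. line character for character.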
8
Mind the gap!

Papain      DGVRQVQPYNEGALLYSIANQPVSVVLEAAGKDFQLYRGGIFVGPCGNKVDHAVAAVGYG
Staphopain  --------I---AILGSRV-E-----S----------RNGMHAGHAMAVVGN--AKLNNG
Prim.cons.  DGVRQVQP2NEGA2L2S22N2PVSVV2EAAGKDFQLYR2G222G22222V22AVA2222G
Never leave islands (widows): isolated residues stranded between long gaps.
Gaps should be in-frame
Papain      YTTTELSYEEVLNDGDVNIPEYVDWRQKGAVTPVKNQGSCGSCWAFSAVVTIEGIIKIRT
CathLx2     PRKGKVFQEPLFYEA----PRSVDWREKGYVTPVKNQGQCGSCWAFSATGALEGQMFRKT
CathBx3     PPQRVMFTEDLKLPAS--FDAREQWPQCPTIKEIRDQGSCGSCWAFGAVEAISDRICIHT
Staphopain  ---------------------ETQGNN-------------GWCAGYTMSALLN-------
Prim.cons.  P33333F3E3L333A2VN2P34V2WRQKG3VTPVKNQGSCGSCWAFSAV4A2EG3I3I3T
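Islands can be flagged automatically before an alignment is inspected by eye. A minimal sketch; the function name and the thresholds max_len and min_flank are arbitrary assumptions, not values from the slides:

```python
import re

def find_islands(aligned: str, max_len: int = 2, min_flank: int = 3):
    """Return (start, residues) for short residue runs stranded between long gaps."""
    islands = []
    for run in re.finditer(r"[^-]+", aligned):
        start, end = run.start(), run.end()
        if end - start > max_len:
            continue  # long runs are not islands
        left_gap = start - len(aligned[:start].rstrip("-"))
        right_gap = len(aligned[end:]) - len(aligned[end:].lstrip("-"))
        # A sequence end counts as a flank: terminal fragments are suspect too.
        left_ok = start == 0 or left_gap >= min_flank
        right_ok = end == len(aligned) or right_gap >= min_flank
        if left_ok and right_ok:
            islands.append((start, run.group()))
    return islands
```

On the Staphopain row of the papain alignment, it flags the lone I and S (0-based columns 8 and 26) but not the E, which sits next to a single-column gap.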
9
Structural inferences

The most highly conserved regions are likely to correspond to the active site.
Regions rich in insertions and deletions probably correspond to surface loops.
A position containing a conserved Gly or Pro probably corresponds to a turn.
A conserved pattern of hydrophobicity with spacing 2 (that is, every second residue), with the intervening residues more variable and including hydrophilic residues, suggests a β-strand on the surface.
A conserved pattern of hydrophobicity with spacing of about 4 (an α-helix has 3.6 residues per turn) suggests a (surface) α-helix.
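These periodicities can be quantified with a hydrophobic moment in the style of Eisenberg: place successive residues at a fixed angle around the chain axis and measure how strongly the hydrophobic ones point to one side. A minimal sketch, assuming a simplified binary hydrophobicity scale (published analyses use graded scales) and the idealised angles of 100° for an α-helix and 180° for a β-strand:

```python
import math

# Simplified binary hydrophobicity scale (assumption): 1 for these residues, 0 otherwise.
HYDROPHOBIC = set("AVILMFWC")

def hydrophobic_moment(seq: str, angle_deg: float) -> float:
    """Mean hydrophobic moment for residues placed angle_deg apart around the axis."""
    delta = math.radians(angle_deg)
    h = [1.0 if aa in HYDROPHOBIC else 0.0 for aa in seq]
    x = sum(hi * math.cos(i * delta) for i, hi in enumerate(h))
    y = sum(hi * math.sin(i * delta) for i, hi in enumerate(h))
    return math.hypot(x, y) / len(seq)
```

An alternating pattern such as VKVKVKVK scores highest at 180° (spacing 2, strand-like), whereas LKKLLKKLLKKL, with hydrophobic residues every 3 to 4 positions, scores higher at 100° (helix-like).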

10
ClustalW/ClustalX

Multiple alignment takes place in three steps:
1. Pairwise alignment of all sequences
2. Calculation of a guide tree from the pairwise distances
3. Progressive alignment, following the branching order of the guide tree
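The first two steps can be sketched in a few lines. This is a simplified illustration, not ClustalW's code: pairwise distances here come from shared 2-mers (loosely in the spirit of ClustalW's fast k-tuple option, whose actual scoring differs), and the tree is built with UPGMA, a simpler clustering method than the neighbour-joining ClustalW uses. The returned merge order is the order in which a progressive aligner would join sequences and profiles; step 3 itself is not shown, and the function names are hypothetical:

```python
from itertools import combinations

def kmer_distance(a: str, b: str, k: int = 2) -> float:
    """1 - fraction of shared k-mers (a crude stand-in for alignment-based distance)."""
    ka = {a[i:i + k] for i in range(len(a) - k + 1)}
    kb = {b[i:i + k] for i in range(len(b) - k + 1)}
    if not ka or not kb:
        return 1.0
    return 1.0 - len(ka & kb) / min(len(ka), len(kb))

def upgma_merge_order(names, seqs, k=2):
    """Return the clusters in the order UPGMA joins them (closest pair first)."""
    sizes = {name: 1 for name in names}
    dist = {
        frozenset((n1, n2)): kmer_distance(s1, s2, k)
        for (n1, s1), (n2, s2) in combinations(zip(names, seqs), 2)
    }
    merges = []
    while len(sizes) > 1:
        pair = min(dist, key=dist.get)  # closest remaining pair of clusters
        c1, c2 = sorted(pair)
        del dist[pair]
        new = f"({c1},{c2})"
        size1, size2 = sizes.pop(c1), sizes.pop(c2)
        for other in sizes:  # size-weighted average distance to the new cluster
            d1 = dist.pop(frozenset((c1, other)))
            d2 = dist.pop(frozenset((c2, other)))
            dist[frozenset((new, other))] = (d1 * size1 + d2 * size2) / (size1 + size2)
        sizes[new] = size1 + size2
        merges.append(new)
    return merges
```

With ungapped fragments of the rat and human calreticulins and calnexins, the two calreticulins are joined first, then the two calnexins, and finally the two pairs.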

11
Guide tree of hexokinases
12
Hexokinase alignment
13
3D aspects - thioredoxins
Figure: thioredoxin 3D structure with loops, a (α-helix), b (β-strand) and the active site labelled.
14
Thioredoxin
Figure: thioredoxin structure with loops, a (α-helix), b (β-strand) and the active site labelled.
15
Sequence logo from alignment
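The letter heights in a sequence logo are usually derived from the information content of each alignment column. A minimal sketch, assuming a 20-letter protein alphabet, gaps simply ignored, and no small-sample correction (dedicated logo tools typically apply one); the function names are hypothetical:

```python
import math
from collections import Counter

def column_information(column, alphabet_size=20):
    """Information content in bits: log2(alphabet size) minus Shannon entropy (gaps ignored)."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    n = len(residues)
    entropy = -sum((k / n) * math.log2(k / n) for k in Counter(residues).values())
    return math.log2(alphabet_size) - entropy

def logo_heights(aligned, alphabet_size=20):
    """Per column, the letter heights of a logo: information content x residue frequency."""
    heights = []
    for column in zip(*aligned):
        residues = [c for c in column if c != "-"]
        info = column_information(column, alphabet_size)
        counts = Counter(residues)
        heights.append({aa: info * k / len(residues) for aa, k in counts.items()})
    return heights
```

A fully conserved column reaches the maximum of log2(20), about 4.32 bits; a column split evenly between two residues loses one bit and each letter gets half the remaining height.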
16
Naming conventions in multiple alignments

ClustalW uses only the first word of a name (i.e. never use white space in a name).
Do not use special symbols; use underscore (_) to connect words.
The protein should be recognisable from the first 15 characters of its name (longer names are truncated).
All proteins to be aligned need a unique name.
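These rules are easy to enforce before running ClustalW. A minimal sketch; the function name is hypothetical, and the 15-character limit is taken from the slide rather than from ClustalW's documentation:

```python
import re

def clustal_safe_names(names, max_len=15):
    """Make names single-word, symbol-free, truncated and unique."""
    safe, seen = [], set()
    for name in names:
        # Replace white space and special symbols with underscores, then truncate.
        s = re.sub(r"[^A-Za-z0-9_]+", "_", name.strip()).strip("_")[:max_len]
        base, n = s, 1
        while s in seen:  # keep truncated names unique
            suffix = f"_{n}"
            s = base[: max_len - len(suffix)] + suffix
            n += 1
        seen.add(s)
        safe.append(s)
    return safe
```

Note that truncation alone can make names collide or become hard to read ("Human calreticulin" becomes Human_calreticu), which is why the slide asks for names that are understandable within the limit.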

17
PSI - BLAST

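Position-Specific Iterated BLAST
For each search round, the aligned hits are used to calculate a new position-specific scoring matrix (profile), which takes the place of the substitution matrix in the next round.
New iterations can be carried out as long as new hits are found.
If no results are found in a normal BLAST, PSI-BLAST will not help.
Check results carefully! A false hit accepted in one round becomes part of the profile and can draw further unrelated sequences into later rounds.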
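A minimal sketch of the profile step: a log-odds position-specific scoring matrix built from aligned hits, with a flat pseudocount and a uniform background. This is a simplification; PSI-BLAST itself uses sequence weighting, substitution-matrix-based pseudocounts and non-uniform background frequencies, none of which are reproduced here, and the function names are hypothetical:

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_pssm(aligned, pseudocount=1.0):
    """Log-odds scores (bits) per column against a uniform background."""
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for column in zip(*aligned):
        counts = Counter(c for c in column if c in AMINO_ACIDS)  # gaps are skipped
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm

def score_segment(pssm, segment):
    """Score an ungapped segment of the same length as the profile."""
    return sum(column[aa] for column, aa in zip(pssm, segment))
```

In an iterated search, the new hits would be aligned and passed to build_pssm again; a wrongly included hit shifts the column frequencies, which is exactly why the results need checking at every round.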