Title: Protein Structure Alignment
1 Protein Structure Alignment
- William R. Taylor and Christine A. Orengo
- J. Mol. Biol. (1989) 208, 1-22
Presented by Jie Bao, Dept. of Computer Science, Iowa State University
baojie_at_cs.iastate.edu, http://www.cs.iastate.edu/baojie
2 Abstract
- Focus of this research: find a good similarity score for two residues in the structures being compared, and then do the matching
- The score is based on distance plot analysis
- Matching is done by dynamic programming
- Advantages
- Insensitive to insertions and deletions
- No initial seeding is needed
- Tested on:
- Globins, Calcium-binding proteins, Rhodanese, Immunoglobulin domains, Plastocyanin/azurin, Lysozyme
3 Outline
- Problem
- Methods
- Result
- Discussion
4 What's the problem?
- Align two sequences based on the structural positions of their residues
5 Existing Methods
- Least-squares superposition (Matthews & Rossmann 1985)
- Limitation 1: equivalence of positions must be established before the superposition is performed
- Limitation 2: the displacement of a subdomain within one structure can result in a poor overall fit between topologically equivalent structures
- Search in rotational space (Rossmann et al. 1973-1977)
- Comparing all pairs of structural fragments
- Both of these are computationally demanding, and the latter is sensitive to insertions/deletions
7 Dynamic Programming - Basic sequence alignment
- Sequence alignment example: AADADEFGH vs. AADCDEAGH
- Identical pair score: 2; insertion gap penalty: g = 1
- Recurrence:
$$S_{ij} = D_{ij} + \max\left\{ S_{i+1,j+1},\; \max_{k \ge i+2} S_{k,j+1} - g,\; \max_{l \ge j+2} S_{i+1,l} - g \right\}$$
- Start from the lower right, end with the upper left
- The highest score is found and its inheritance path is traced back
[Figure: score matrix for the example alignment (window 4)]
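The recurrence above can be sketched in Python as a minimal illustration, not the paper's code. The mismatch score of 0 is an assumption (the slide gives only 2 for an identical pair and g = 1), and the function name `align` is hypothetical. The matrix is filled from the lower-right corner, each cell remembers which successor gave its maximum, and the path is traced from the highest-scoring cell.

```python
def align(a, b, match=2, mismatch=0, g=1):
    """Fill S from the lower right with
    S[i][j] = D[i][j] + max(S[i+1][j+1],
                            max_{k >= i+2} S[k][j+1] - g,
                            max_{l >= j+2} S[i+1][l] - g)
    and return (best score, list of matched (i, j) pairs).
    mismatch=0 is an assumed value; a gap of any length costs g.
    """
    n, m = len(a), len(b)
    S = [[0] * m for _ in range(n)]
    nxt = [[None] * m for _ in range(n)]  # successor cell on the best path

    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            d = match if a[i] == b[j] else mismatch
            options = []
            if i + 1 < n and j + 1 < m:
                # no gap: continue along the diagonal
                options.append((S[i + 1][j + 1], (i + 1, j + 1)))
                # gap in b: residues i+1 .. k-1 of a stay unmatched
                for k in range(i + 2, n):
                    options.append((S[k][j + 1] - g, (k, j + 1)))
                # gap in a: residues j+1 .. l-1 of b stay unmatched
                for l in range(j + 2, m):
                    options.append((S[i + 1][l] - g, (i + 1, l)))
            if options:
                best, cell = max(options, key=lambda t: t[0])  # first max wins ties
                S[i][j] = d + best
                nxt[i][j] = cell
            else:
                S[i][j] = d

    # the highest score is found and its inheritance path is traced back
    start = max(((i, j) for i in range(n) for j in range(m)),
                key=lambda c: S[c[0]][c[1]])
    path, cell = [], start
    while cell is not None:
        path.append(cell)
        cell = nxt[cell[0]][cell[1]]
    return S[start[0]][start[1]], path


score, pairs = align("AADADEFGH", "AADCDEAGH")
print(score)  # 14
for i, j in pairs:
    print("AADADEFGH"[i], "AADCDEAGH"[j])
```

Storing the successor while filling avoids recomputing the maxima during trace-back; the cost is one extra matrix of cell references.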
8 Dynamic Programming - Basic interatomic distance matching (1)
- Basic interatomic distance matching method
- Consider only the alpha-carbon atoms of the two structures and compare the distances between them
- Similarity score: $s = a / (|{}^A d_{ij} - {}^B d_{kl}| + b)$, where ${}^A d_{ij}$ and ${}^B d_{kl}$ are interatomic distances, $a$ limits the maximum possible score ($a/b$), and $b$ prevents division by zero
- Score between residue $i$ in A and residue $k$ in B: $S_{ik} = \sum_{m=-n}^{n} a / (|{}^A d_{i,i+m} - {}^B d_{k,k+m}| + b)$
- Based on this score, a standard dynamic programming algorithm can be used to align the positions
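A minimal sketch of the two formulas above, assuming C-alpha coordinates are given as (x, y, z) tuples. The helper names are hypothetical; skipping m = 0 (a zero self-distance) and offsets that run off either chain are assumptions, and the values a = 50, b = 5 in the usage are borrowed from the structural environment slide purely for illustration.

```python
import math


def pair_score(dA, dB, a, b):
    """s = a / (|dA - dB| + b); the maximum, a/b, is reached when dA == dB."""
    return a / (abs(dA - dB) + b)


def position_score(A, B, i, k, n, a, b):
    """S_ik: sum of distance comparisons over the window m = -n .. n.

    A, B: lists of (x, y, z) C-alpha coordinates. m = 0 and offsets that
    fall off either chain are skipped (an assumption).
    """
    total = 0.0
    for m in range(-n, n + 1):
        if m == 0:
            continue
        if 0 <= i + m < len(A) and 0 <= k + m < len(B):
            dA = math.dist(A[i], A[i + m])
            dB = math.dist(B[k], B[k + m])
            total += pair_score(dA, dB, a, b)
    return total


straight = [(0, 0, 0), (3.8, 0, 0), (7.6, 0, 0), (11.4, 0, 0)]
folded = [(0, 0, 0), (3.8, 0, 0), (3.8, 3.8, 0), (0, 3.8, 0)]
print(position_score(straight, straight, 1, 1, n=1, a=50, b=5))  # 20.0
print(position_score(straight, folded, 1, 1, n=2, a=50, b=5))    # lower than 30.0
```

Identical local structures reach the ceiling of a/b per comparison; the folded chain loses score only on the i to i+2 distance, which differs between the two structures.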
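9 Dynamic Programming - Basic interatomic distance matching (2)
- Explanation of the position similarity score
[Figure: residue i (window i-n to i+n) in structure A and residue k (window k-n to k+n) in structure B, with the distance d_ij from i to residue j]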
The overall score is given by the sum of
individual distance comparisons
10 Dynamic Programming - Structural environment matching
- The basic method is adequate for matching local structures
- However, it is disrupted if the range of comparison (-n to n) spans an insertion/deletion discontinuity
- Solution: apply DP at a lower level, on the distance comparisons, to produce the best equivalence of positions between the two environments:
$$S_{ik} = \max \sum a / (|{}^A d_{ij} - {}^B d_{kl}| + b)$$
where the maximum is taken over alignments of the residues j of A with the residues l of B
- a = 50, b = 5
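The lower-level comparison can be sketched as follows. This is a sketch under stated assumptions, not the paper's implementation: residues i and k are excluded from their own environments, a gap of any length costs a constant g (defaulting to the gap penalty of 5 quoted for the method), and the DP reuses the lower-right-to-upper-left recurrence of the basic sequence alignment. `environment_score` is a hypothetical name.

```python
import math


def environment_score(A, B, i, k, a=50.0, b=5.0, g=5.0):
    """Best alignment of the distances seen from residue i (in A) with
    those seen from residue k (in B).

    Cell (j, l) scores a / (|A_d_ij - B_d_kl| + b); a gap of any length
    costs g (an assumption). Returns (S_ik, [(j, l), ...]).
    """
    js = [j for j in range(len(A)) if j != i]   # environment of i
    ls = [l for l in range(len(B)) if l != k]   # environment of k
    dA = [math.dist(A[i], A[j]) for j in js]
    dB = [math.dist(B[k], B[l]) for l in ls]
    n, m = len(js), len(ls)
    cell = [[a / (abs(dA[p] - dB[q]) + b) for q in range(m)] for p in range(n)]

    S = [[0.0] * m for _ in range(n)]
    nxt = [[None] * m for _ in range(n)]
    for p in range(n - 1, -1, -1):              # fill from the lower right
        for q in range(m - 1, -1, -1):
            opts = []
            if p + 1 < n and q + 1 < m:
                opts.append((S[p + 1][q + 1], (p + 1, q + 1)))
                opts += [(S[r][q + 1] - g, (r, q + 1)) for r in range(p + 2, n)]
                opts += [(S[p + 1][r] - g, (p + 1, r)) for r in range(q + 2, m)]
            best, succ = max(opts, key=lambda t: t[0]) if opts else (0.0, None)
            S[p][q] = cell[p][q] + best
            nxt[p][q] = succ

    # trace back from the best cell, mapping indices back to residue numbers
    start = max(((p, q) for p in range(n) for q in range(m)),
                key=lambda c: S[c[0]][c[1]])
    path, c = [], start
    while c is not None:
        path.append((js[c[0]], ls[c[1]]))
        c = nxt[c[0]][c[1]]
    return S[start[0]][start[1]], path


chain = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
print(environment_score(chain, chain, 1, 1))  # (30.0, [(0, 0), (2, 2), (3, 3)])
```

Because the environments are aligned rather than compared position by position, an insertion in one structure costs a single gap penalty instead of misaligning every distance beyond it.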
11 Dynamic Programming - Alignment dependency between levels
- Every comparison in the previous step produces an alignment of matched distances, which is also an alignment of the two sequences
- The values along the trace-back path in the lower-level matrix are accumulated in the corresponding elements of the higher-level matrix
- A cutoff on S prevents excessive accumulation of background noise: $\sqrt{200N}$, where N is the length of the shorter sequence; values below the cutoff are ignored
- Advantage: large contributions are made to the upper matrix only for regions that match well in the lower comparison
- Weakness: the score is based solely on interatomic distances, so similar distances achieve a high score even when they are between pairs of atoms that lie in completely different relative directions
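A sketch of the accumulation step, assuming the cutoff is applied to the total score of each lower-level comparison and that its trace-back path is supplied as ((j, l), value) pairs. `accumulate` is a hypothetical helper; how the paper applies the cutoff in detail is not stated on the slide.

```python
import math


def accumulate(upper, path, path_score, N):
    """Add the values along one lower-level trace-back path into the
    upper-level matrix, unless the comparison scores below sqrt(200 * N).

    path: list of ((j, l), value) pairs; N: length of the shorter sequence.
    Returns True if the path was accumulated (cutoff rule is an assumption).
    """
    if path_score < math.sqrt(200 * N):
        return False          # treated as background noise
    for (j, l), value in path:
        upper[j][l] += value
    return True


N = 2                                   # cutoff = sqrt(400) = 20
upper = [[0.0] * N for _ in range(N)]
accumulate(upper, [((0, 0), 12.0), ((1, 1), 13.0)], 25.0, N)  # kept
accumulate(upper, [((0, 1), 15.0)], 15.0, N)                   # below cutoff
print(upper)  # [[12.0, 0.0], [0.0, 13.0]]
```

Since the cutoff grows with the square root of N, longer proteins need stronger lower-level matches before they contribute to the upper matrix.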
12 Dynamic Programming - Alignment dependency between levels (2)
- The method is used at 2 levels
- First, to find the best equivalence of distances for the 2 residues being compared
- Then, at a higher level, to find the best equivalence of residues within the 2 sequences
- A gap penalty of 5 is applied
13 Dynamic Programming - Vector comparison method
- Solution: compare interatomic vectors rather than simple distances
- The vector is defined in the local frame of reference of every residue
- The similarity score is changed to $s = a / (|{}^A V_{ij} - {}^B V_{kl}|^2 + b)$
- Preparing the X-Y-Z frame
- X-axis was defined by the N-C vector
- A tentative Y-axis by the Cbeta-H vector
- Z-axis was their mutual perpendicular vector
- Y-axis was redefined as perpendicular to X and Z
[Figure: vector V_ij from residue i to residue j (window i-n to i+n), with distance d_ij]
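A plain-Python sketch of the local frame and the vector score. The frame follows the four steps above; the inputs `x_dir` (the N-C vector) and `y_hint` (the tentative Y direction) are hypothetical parameter names, and since this slide gives no values for a and b, the usage picks a = 50, b = 5 purely for illustration. The usage checks that a local vector is unchanged when the whole residue is rotated, and shows the case a pure distance score cannot distinguish.

```python
import math


def sub(u, v):
    return tuple(p - q for p, q in zip(u, v))


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def unit(u):
    norm = math.sqrt(dot(u, u))
    return tuple(p / norm for p in u)


def local_frame(x_dir, y_hint):
    """X along x_dir; Z perpendicular to X and the tentative Y;
    Y redefined as perpendicular to X and Z."""
    x = unit(x_dir)
    z = unit(cross(x, y_hint))
    y = cross(z, x)          # unit length already: z and x are orthonormal
    return x, y, z


def local_vector(origin, frame, point):
    """Components of (point - origin) along the local X, Y, Z axes."""
    d = sub(point, origin)
    return tuple(dot(axis, d) for axis in frame)


def vector_score(vA, vB, a, b):
    """s = a / (|vA - vB|^2 + b)."""
    diff = sub(vA, vB)
    return a / (dot(diff, diff) + b)


# One residue seen in two orientations (rotated 90 degrees about Z)
v1 = local_vector((0, 0, 0), local_frame((1, 0, 0), (0, 1, 0)), (1, 2, 3))
v2 = local_vector((0, 0, 0), local_frame((0, 1, 0), (-1, 0, 0)), (-2, 1, 3))
print(v1, v2)                                     # both (1.0, 2.0, 3.0)
print(vector_score(v1, v2, a=50, b=5))            # 10.0, the maximum a/b
print(vector_score((3, 0, 0), (0, 3, 0), 50, 5))  # 50/23: same length, other direction
```

The last line is the weakness noted for the distance-only method: both vectors have length 3, so a distance comparison would give the full a/b, while the vector comparison penalises the change of direction.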
14 Dynamic Programming - the final method
- The vector-based method uses 3-D distances
- Higher dimensions could also incorporate any data that can be defined at the residue level
- The nature of the amino acid can be used:
$$S_{ik} = \max \sum (w D_{R_i R_k} + a) / (|{}^A d_{ij} - {}^B d_{kl}| + b)$$
where $D_{XY}$ is the value in the Dayhoff matrix for the exchange of amino acids of types X and Y, and w is the weight (default 1.0)
- a = 40, b = 2
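One term of the residue-aware score can be written as below, with the defaults from this slide (w = 1.0, a = 40, b = 2). The Dayhoff entry is passed in by the caller rather than tabulated here; the value 4 in the usage is hypothetical, not taken from the Dayhoff matrix, and `residue_score` is a hypothetical name.

```python
def residue_score(dA, dB, dayhoff, w=1.0, a=40.0, b=2.0):
    """One comparison in the final method: (w * D_XY + a) / (|dA - dB| + b).

    dA, dB: the two distances being compared;
    dayhoff: the Dayhoff-matrix entry D_XY for the residue types,
    looked up by the caller in a real Dayhoff table.
    """
    return (w * dayhoff + a) / (abs(dA - dB) + b)


print(residue_score(5.0, 5.0, dayhoff=0))         # 20.0 = a/b
print(residue_score(5.0, 5.0, dayhoff=4))         # 22.0 (hypothetical D_XY = 4)
print(residue_score(5.0, 5.0, dayhoff=4, w=0.0))  # 20.0: w = 0 ignores residue type
```

Setting w to 0 recovers the purely geometric score, so the weight controls how much sequence information is mixed into the structural comparison.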
15 Implementation
- Implemented in SSAP (Structure and Sequence Alignment Program)
- Written in C, run on a VAX-11 under VMS
- A separate FORTRAN program was used to prepare the data
- Data from the PDB
CPU time per run (minutes):
Sequence length | Window | Time
50 | 20 | 5.2
50 | 30 | 10.3
100 | 30 | 36.0
100 | 60 | 121.0
150 | 20 | 36.2
150 | 40 | 130.4
150 | 60 | 269.6
17 Data Set
PDB IDs used in the paper / current IDs
18 Results - Globins (MBN, HHB)
[Figure: alignment of 4HHB haemoglobin (alpha chain, beta chain) and 5MBN myoglobin; this work vs. Lesk & Chothia (LC)]
- Compared with Lesk & Chothia 1980 (conventional superposition)
- Result
- The alpha and beta chains of 4HHB are the most similar
- s(MBN, beta) > s(MBN, alpha)
Structural comparison of the globins
19 Results - Calcium-binding proteins (Parvalbumin, CIB)
- Calcium-binding proteins contain two motifs: helices C-D and the eponymous E-F
- Parvalbumin also contains helices A-B
- The algorithm aligned the correct ion-binding motifs in both structures, ignoring the redundant motif in parvalbumin (A-B); compared with Cariepy & Hodges 1983
20 Results - Rhodanese
- Align the two halves of rhodanese
- Compared with the least-squares comparison of Ploegman 1978, the result is identical in all but minor aspects
- Visual inspection of the graphics suggests that the alignment from this work is more plausible
1RHD, with two similar alternating beta/alpha-type domains
21 Results - Immunoglobulin domains
3FAB: Antigen-binding fragment
1FC1: Constant fragment
22 Results - Immunoglobulin domains
Structure of Immunoglobulin
- Each domain is compared against every other
- Produced the correct alignment in 9 of 15 comparisons
- Compared with Lesk & Chothia 1982
Immunoglobulin heavy (H) and light (L) chains include 6 all-beta domains of two types: constant (C) and variable (V)
23 Results - Lysozyme
- Lysozyme (hen egg-white): 6LYZ
- Lysozyme (T4): 1LZM/2LZM
- Compared with Rossmann & Argos 1976, 1977 and Matthews 1981
- Found the first common helix to be displaced by one residue relative to both published comparisons
- Alignment of the initial beta-strand agrees with that of R&A 1976
- The remainder of the alignment is the same for both methods, except for trivial displacements at the fringes of equivalent blocks
[Figure: 6LYZ and 2LZM]
24 Results - Plastocyanin/azurin
- Plastocyanin: 3PCY
- Azurin: 1AZU
- Compared with Chothia & Lesk 1982 and Adman 1985
- Except for some minor insertions and deletions at the fringes, the alignment agrees with the others
[Figure: 3PCY and 1AZU]
26 Discussion
- The results demonstrate that this method produces alignments equal in quality, and in some cases superior, to those reported
- This method is insensitive to the displacement of equivalent substructures
- Future work: develop statistical criteria for evaluating the significance of structural comparisons
- The method can also be extended beyond the residue level, e.g. to secondary structure