Title: An efficient Docking Method to Study Protein Interactions
Yuhua Duan (1,2,3), Boojala Reddy (2), David Breslauer (4) and Yiannis Kaznessis (1,2)
(1) Department of Chemical Engineering and Materials Science, (2) Digital Technology Center, (3) Army High-Performance Computing and Research Center, University of Minnesota, Minneapolis, MN 55455
(4) Department of Bioengineering, University of California San Diego
Results and Discussion
Residue Conservation Filter
Introduction
Homologous sequences: Using the FASTA3 sequence similarity search tool, we obtained homologous sequences from an annotated non-redundant protein sequence database (SWALL). Homologous sequences with less than 30% gaps and greater than 35% sequence identity to the parent sequence were used for analysis.
Evolutionary distance: The evolutionary distance among the sequences is calculated using the structure-based amino acid substitution matrix [7]. A similarity score $S_{ii}$ for sequence i is calculated by summing the identical-substitution values of its residues from the substitution matrix $M(a,b)$. An evolutionary distance ($ED_{ij}$) is then calculated for each pair of sequences i and j.
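The homologue selection criteria can be sketched as a small filter over pairwise-aligned sequences. This is a minimal illustration, not the procedure actually used with FASTA3: the function names are hypothetical, gaps are assumed to be written as `-`, the thresholds are read as 30% gaps and 35% identity, and identity is assumed to be measured over columns where neither sequence has a gap.

```python
def gap_fraction(aligned: str) -> float:
    """Fraction of alignment columns that are gaps ('-') in this sequence."""
    return aligned.count("-") / len(aligned)


def percent_identity(parent: str, homologue: str) -> float:
    """Percent identity over columns where neither sequence has a gap
    (an assumed convention; other definitions of identity exist)."""
    pairs = [(a, b) for a, b in zip(parent, homologue) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


def keep_homologue(parent: str, homologue: str,
                   max_gap_fraction: float = 0.30,
                   min_identity: float = 35.0) -> bool:
    """True if the homologue has few enough gaps and is similar enough."""
    return (gap_fraction(homologue) < max_gap_fraction
            and percent_identity(parent, homologue) > min_identity)
```

For example, `keep_homologue("ACDEFGHIKL", "AC--FGHIKL")` accepts the homologue, while a sequence with 40% gaps or only 20% identity is rejected.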
- As an example, Fig. 1 gives scatter plots of the 1000 best-ranked model structures versus the RMSD of each model from the experimental structure for complex 1TAB. The ranking is based on shape complementarity (Fig. 1(a)), the pair-potential score function (Fig. 1(b)) and the minimized CHARMM energy of the model structures (Fig. 1(c)).
- Our previous studies showed that a significantly high number of conserved positions are present in the naturally occurring, functionally important interacting regions of protein complexes [5]. In antigen-antibody complexes, however, we observe that the region with non-conserved positions is involved in the interaction. Antibodies are made with the variability needed to interact with the antigen, and this is not a naturally occurring protein-protein interaction, which explains our finding.
- In each polypeptide chain of the interacting complex we identified two groups of highly conserved and well-exposed surface residues: the top 8% (group 1) and the top 17% (group 2). These residues are given in Table 2 for the E and I chains of 1TAB as an example. We then counted the total number of group 1 and group 2 positions in the interface region of each modeled complex.
- In Table 3 we summarize the results of the docking analysis for all six systems.
We have employed docking calculations and atomistic simulations to determine the structure and the binding affinity of protein-protein complexes. By exploring the interaction interface, we find that conservation information can improve the docking rank. Here we present our docking studies for five complex structures. With this procedure, we are participating in rounds 4 and 5 of the CAPRI competition.
Docking Procedure and Energy Minimization
- We chose five protein complex structures (1TAB, 1EFU, 1FIN, 1JHL, 1KXQ) from the benchmark structures suggested by Chen et al. [1].
- For each protein complex, we ran docking calculations with the FTDOCK package [2,3] to generate 10,000 possible complexes, and obtained the shape-complementarity rank and the pair-potential rank of each.
- For each possible complex, we used CHARMM molecular mechanics simulations [4] to minimize the side-chain structure and obtained an estimate of the free energy.
- We computed an overall rank for each docked complex as a weighted combination of these ranks.
- We applied the residue conservation filter to improve the rank [5,6].
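The weighted overall rank in the procedure above can be sketched as follows. This is only an illustration under stated assumptions: the function names, the dictionary layout and the equal weights in the example are hypothetical, and the weights actually used are not given here.

```python
def weighted_scores(ranks: dict, weights: dict) -> list:
    """Weighted sum of per-property ranks for every docked model.

    ranks maps a property name (e.g. "shape") to a list holding the rank
    of each model under that property; weights maps the same names to
    their weights. A lower score means a better model.
    """
    n_models = len(next(iter(ranks.values())))
    return [sum(weights[name] * ranks[name][i] for name in weights)
            for i in range(n_models)]


def overall_order(ranks: dict, weights: dict) -> list:
    """Model indices sorted from best (lowest score) to worst."""
    scores = weighted_scores(ranks, weights)
    return sorted(range(len(scores)), key=lambda i: scores[i])


# Hypothetical ranks for three models under three score functions.
example_ranks = {
    "shape":  [1, 2, 3],
    "pair":   [3, 1, 2],
    "energy": [2, 1, 3],
}
example_weights = {"shape": 1.0, "pair": 1.0, "energy": 1.0}
```

With these toy values, `overall_order(example_ranks, example_weights)` places the second model first because it has the lowest summed rank.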
Conclusion
Free Energy Filter
- We described the considerable improvement in the ranking of the FTDOCK-generated model complexes obtained with the residue conservation filter. Using conservation information, we significantly reduce the number of docking solutions.
- We also improve the ranking of low-RMSD structures simply by incorporating linear combinations of the ranks from shape complementarity, pair potential, CHARMM energy and conserved positions.
- Because residues are conserved in functionally interacting natural proteins, such as enzyme-inhibitor complexes, models with a higher number of conserved positions in the interface region should receive higher ranks. In unnatural interactions such as antigen-antibody complexes, the interacting regions are highly variable, and models with a low number of conserved positions should receive higher ranks.
With some approximation, the free energy change can be divided into several terms:
$$\Delta G = \Delta G_{es} + \Delta G_{cav} + \Delta G_{bonding} + \dots = \Delta G_{coulomb} + \Delta G_{pol} + \sum_k \sigma_k A_k + \Delta G_{bonding}$$
The individual terms can be calculated separately. $\Delta G_{coulomb}$ and $\Delta G_{pol}$ are calculated by the Generalized Born model with the Debye-Hückel approximation.
Conservation index of residue position: As described above, the evolutionary distances between the reference sequence and its homologues were used to calculate a residue conservation index ($CI_l$) for each position l using the amino acid substitution matrix, similar to the amino acid conservation score used by Valdar and Thornton [8]. The conservation index $CI_l$ is a weighted sum of all pairwise similarities between the residues present at the position. The $CI_l$ value is calculated for a given alignment and takes a value in the range 0.0 to 1.0.
Here N is the number of homologous sequences in the alignment; $s_i(l)$ and $s_j(l)$ are the amino acids at alignment position l of sequences $s_i$ and $s_j$, respectively; and $ED(s_i)$ and $ED(s_j)$ are the average evolutionary distances of $s_i$ and $s_j$ from the remaining homologues. $Mut(a,b)$ measures the similarity between the amino acids a and b as derived from the amino acid substitution matrix $M(a,b)$ and is defined as
$$Mut(a,b) = \frac{M(a,b) - M(a,b)_{low}}{M(a,b)_{max} - M(a,b)_{low}}$$
Current and Future Work
- We used the group 1 and group 2 conserved positions as a filter to reduce the total number of docked models. For the enzyme-inhibitor model complexes, we selected only the models that have at least 4 group 1 positions and 6 group 2 positions in the interface region. For the antigen-antibody complexes (1JHL, 1KXQ), we reversed the selection, keeping only models with 2 or fewer group 1 positions and 4 or fewer group 2 positions. With the conserved-positions filter we reduced the number of complexes by about 55% to 88% (see the Reduced column in Table 3).
- In Fig. 2 we plot the RMSD versus model rank for the models remaining after applying the conserved-positions filter to 1TAB. Table 3 summarizes these results for all six systems.
- Comparing Fig. 1 with Fig. 2 and the corresponding rows of Table 3 shows that, with the filter on, we still select all of the low-RMSD models and also obtain many additional low-RMSD models. This can be clearly seen by comparing Fig. 1(a)-(d) with Fig. 2(a)-(d) for 1TAB. The conservation filter therefore not only decreases the number of possible docked structures but also improves the ranking of the low-RMSD models.
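The selection rule described above can be written as a short filter. This is a minimal sketch: the `Model` class, its field names and the `antibody_antigen` flag are hypothetical, and the code assumes the group 1 and group 2 positions in each model's interface have already been counted.

```python
from dataclasses import dataclass


@dataclass
class Model:
    """A docked model with its counts of conserved interface positions."""
    name: str
    group1_in_interface: int
    group2_in_interface: int


def passes_filter(model: Model, antibody_antigen: bool = False) -> bool:
    """Apply the conserved-positions filter to one docked model.

    Enzyme-inhibitor complexes: keep models with at least 4 group 1 and
    at least 6 group 2 positions in the interface.
    Antigen-antibody complexes: the selection is reversed, keeping models
    with at most 2 group 1 and at most 4 group 2 positions.
    """
    if antibody_antigen:
        return (model.group1_in_interface <= 2
                and model.group2_in_interface <= 4)
    return (model.group1_in_interface >= 4
            and model.group2_in_interface >= 6)


def reduction_percent(models: list, antibody_antigen: bool = False) -> float:
    """Percentage of models removed by the filter."""
    kept = sum(passes_filter(m, antibody_antigen) for m in models)
    return 100.0 * (len(models) - kept) / len(models)
```

`reduction_percent` corresponds to the kind of figure reported in a "Reduced" column: the share of docked models that the filter discards.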
- Optimizing the weights for each rank property to arrive at a global rank, by working on a larger data sample.
- Dissecting the structures of known repressor-operator complexes: we use computationally efficient simulations to calculate the binding affinity of repressor-operator complexes and to identify the protein residues that play a central role in binding and are amenable to mutation.
where $f_{GB} = (r_{ij}^2 + \alpha_{ij}^2 e^{-D})^{1/2}$, $\alpha_{ij} = (\alpha_i \alpha_j)^{1/2}$, $D = r_{ij}^2/(2\alpha_{ij})^2$, and $\alpha_i$ is the effective Born radius of atom i.
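The Generalized Born interpolation function defined above can be evaluated directly. The sketch below covers only $f_{GB}$ itself, since the full polarization energy expression is not reproduced here. The function name and argument names are hypothetical, and distances and Born radii are assumed to be in the same length unit.

```python
import math


def f_gb(r_ij: float, alpha_i: float, alpha_j: float) -> float:
    """Generalized Born function f_GB for one atom pair.

    alpha_ij**2 = alpha_i * alpha_j and D = r_ij**2 / (2 * alpha_ij)**2,
    so f_GB = sqrt(r_ij**2 + alpha_ij**2 * exp(-D)).
    """
    alpha_ij_sq = alpha_i * alpha_j
    d = r_ij ** 2 / (4.0 * alpha_ij_sq)
    return math.sqrt(r_ij ** 2 + alpha_ij_sq * math.exp(-d))
```

Two limits are a useful sanity check: at zero separation $f_{GB}$ reduces to $\alpha_{ij}$, and at large separation it approaches the interatomic distance $r_{ij}$.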
- The desolvation term $\sum_k \sigma_k A_k$ can be obtained by calculating the solvent-accessible surface area $A_k$ of each residue k and an optimized weight $\sigma_k$.
- The bonding term $\Delta G_{bonding}$ can be expressed using self-consistent Lennard-Jones 12-6 parameters ($\epsilon$, $\sigma$), as used in the AMBER and CHARMM software.
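As an illustration of a Lennard-Jones 12-6 term, the sketch below uses the common textbook form $4\epsilon[(\sigma/r)^{12} - (\sigma/r)^6]$. That choice is an assumption: the exact expression and parameter convention used in this work are not reproduced here, and force fields often write the same potential in terms of the minimum-energy distance instead of $\sigma$. The function name is hypothetical.

```python
def lennard_jones_12_6(r: float, epsilon: float, sigma: float) -> float:
    """Pair energy 4*epsilon*((sigma/r)**12 - (sigma/r)**6).

    epsilon is the well depth and sigma the distance at which the
    energy crosses zero (assumed textbook convention).
    """
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)
```

In this convention the energy is zero at `r = sigma` and reaches its minimum, `-epsilon`, at `r = 2**(1/6) * sigma`.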
Here a and b are the pair of amino acids at a given alignment position l, $M(a,b)_{low}$ is the lowest value in the substitution matrix and $M(a,b)_{max}$ is the maximum value among all the possible substitution pairs at that position. Thus $Mut(a,b)$ takes a value in the range 0 to 1.
Solvent accessible contact area (SACA) values were used to identify surface residues and buried residues. We identified the top 8% and 17% of highly conserved residues that have solvent accessibility greater than 25% of their total surface area. As an example, Table 2 lists the highly conserved surface residues of the E and I chains of complex 1TAB.
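A rescaling of substitution scores to the range 0 to 1, as described for $Mut(a,b)$, can be sketched as a min-max normalization. Two points are assumptions: that the normalization has this min-max form, and that both extremes are taken over the whole matrix. The function name and the toy matrix values are hypothetical and are not taken from any published substitution matrix.

```python
def mut(a: str, b: str, matrix: dict) -> float:
    """Rescale the substitution score M(a, b) to the range 0..1.

    matrix maps residue pairs such as ("A", "G") to scores; a pair may
    be stored in either order.
    """
    low = min(matrix.values())
    high = max(matrix.values())
    score = matrix[(a, b)] if (a, b) in matrix else matrix[(b, a)]
    return (score - low) / (high - low)


# Toy symmetric matrix with made-up values, for illustration only.
toy_matrix = {
    ("A", "A"): 2.0,
    ("A", "G"): 0.5,
    ("G", "G"): 6.0,
    ("A", "W"): -4.0,
    ("G", "W"): -4.0,
    ("W", "W"): 14.0,
}
```

With this toy matrix the highest-scoring pair maps to 1 and the lowest-scoring pairs map to 0, so every value of `mut` lies in the stated range.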
References
1. Chen, R., et al., Proteins 52, 88-91 (2003).
2. Gabb, H.A., et al., J. Mol. Biol. 272, 109-120 (1997).
3. Moont, G., et al., Proteins 35, 364-373 (1999).
4. Brooks, B.R., et al., J. Comp. Chem. 4, 187-217 (1983).
5. Reddy, B.V.B., et al., submitted to ISMB 2004.
6. Duan, Y., et al., to be published.
7. Gonnet, G.H., et al., Science 256, 1443-1445 (1992).
8. Valdar, W.S., et al., Proteins 42, 108-124 (2001).
- By using the residue conservation in each sequence of heterocomplex structures of interacting proteins as a filter, we improved our docking results.
- From the binding affinity calculation, $\Delta G = -RT \ln k_b$, we can obtain the binding constant $k_b$ for the protein-protein system. By substituting residues in certain proteins, combined with molecular simulation (CHARMM), we plan to obtain the free energy change ($\Delta\Delta G$), which could be strongly related to mutation experiments (this work is in progress).
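The relation $\Delta G = -RT \ln k_b$ can be inverted to obtain the binding constant from a computed binding free energy. The sketch below is a worked illustration under stated assumptions: energies are in kcal/mol, the default temperature of 298.15 K is a choice made here, the function names are hypothetical, and $\Delta\Delta G$ is taken as mutant minus wild type, a sign convention not fixed by the text.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def binding_constant(delta_g: float, temperature: float = 298.15) -> float:
    """Binding constant k_b from delta_G = -R * T * ln(k_b)."""
    return math.exp(-delta_g / (R_KCAL * temperature))


def delta_delta_g(dg_mutant: float, dg_wildtype: float) -> float:
    """Change in binding free energy on mutation (mutant minus wild type)."""
    return dg_mutant - dg_wildtype
```

A more negative $\Delta G$ gives a larger $k_b$, and under this convention a positive $\Delta\Delta G$ means the mutation weakens binding.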
Acknowledgements
This work is partially supported by the Army High Performance Computing Research Center (AHPCRC) under the auspices of the Department of the Army, Army Research Laboratory (ARL), under contract number DAAD19-01-2-0014. We also thank the University of Minnesota Digital Technology Center for support.
Figure: Docked cyclin-dependent kinase 2 complex (1FIN), obtained by docking 1HCL and 1VIN. The smallest RMSD we obtain is 0.41 Å, at rank 2.