Title: A New Approach to Identifying Protein Binding Sites
1. A New Approach to Identifying Protein Binding Sites
- R. Zauhar and M. Bruist
- Department of Chemistry & Biochemistry
- University of the Sciences in Philadelphia
- 600 S. 43rd Street
- Philadelphia, PA 19104
- r.zauhar_at_usip.edu, m.bruist_at_usip.edu
2. Identifying Binding Sites: a Challenging Problem
- How to gauge surface geometry?
- Curvature?
- More flexible heuristic measures?
- Focus on sequence or on surface representation?
- Surface geometry is more fundamental: different sequences may produce similar geometries
- However, sequence provides an easier route to comparing/searching for binding sites
- How to delineate interesting regions?
- Distance cutoff?
- Clustering approach?
3. Outline of our approach
- Use a triangulated surface representation
- Compute interactions between surface elements, using a line-of-sight intersection test (occluded elements have no interaction); flexible form for the interaction term
- Derive atom-atom interactions from associated (neighboring) elements; atoms become nodes in an edge-weighted graph
- Cluster atoms to identify surface features
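The line-of-sight test in the second step can be illustrated with a standard segment-triangle intersection check. The sketch below uses the Moller-Trumbore algorithm, which is a swapped-in, well-known technique: the slides only say that grid-accelerated tests were borrowed from Shape Signatures, so the function names, the `blockers` argument, and the absence of grid acceleration are all assumptions made for illustration.

```python
def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def segment_hits_triangle(p, q, tri, eps=1e-9):
    """True if the open segment p->q crosses triangle tri (Moller-Trumbore)."""
    v0, v1, v2 = tri
    d = _sub(q, p)                 # unnormalized, so t in (0, 1) spans the segment
    e1, e2 = _sub(v1, v0), _sub(v2, v0)
    h = _cross(d, e2)
    det = _dot(e1, h)
    if abs(det) < eps:             # segment is parallel to the triangle plane
        return False
    inv = 1.0 / det
    s = _sub(p, v0)
    u = _dot(s, h) * inv           # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return False
    qv = _cross(s, e1)
    v = _dot(d, qv) * inv          # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return False
    t = _dot(e2, qv) * inv         # position of the hit along the segment
    return eps < t < 1.0 - eps

def mutually_visible(ci, cj, blockers):
    """Hypothetical helper: element centroids ci and cj see each other
    if no triangle in `blockers` crosses the segment between them."""
    return not any(segment_hits_triangle(ci, cj, tri) for tri in blockers)
```

In use, `blockers` would need to exclude the two elements being tested, and a spatial grid would limit the triangles examined per pair; neither detail is shown here.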
4. Triangulation using SMART
SMART: A Solvent-Accessible Triangulated Surface Generator for Molecular Graphics and Boundary Element Applications, R.J. Zauhar, J. Comp.-Aided Mol. Design, 9, 149-159 (1995).
Grid-accelerated intersection tests borrowed from the Shape Signatures ray-tracing algorithm.
Shape Signatures: A New Approach to Ligand- and Receptor-Based Molecular Design, R.J. Zauhar, L.-F. Tian, Z.-J. Li and W.J. Welsh, J. Med. Chem., 46, 5674-5690 (2003).
5. We compared two forms for computing surface interactions
Where:
- Cij: normal-weighted interaction (symmetric)
- Dij: unweighted interaction form
- Ai: area of surface element i
- ni: unit normal of element i
- r: length of the vector r connecting elements i and j
- u: unit vector along r
- mu: adjustable exponent
6. Features of surface interaction terms
- Normal-weighted term (Cij) maximizes contributions of element pairs with high mutual visibility.
- Adjustable exponent in the denominator can be used to inversely weight interactions by distance; setting it to zero removes the distance criterion.
- Elements with no line-of-sight visibility are eliminated from consideration, no matter the form of the interaction term.
- 8 Å cutoff applied to all interactions to reduce computational burden.
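The features listed above can be made concrete with a small sketch. The equations for Cij and Dij are not reproduced in this text, so the forms below are assumptions chosen only to match the stated properties (symmetric, largest for facing elements, distance exponent in the denominator, 8 Å cutoff); they should not be read as the authors' actual expressions. The function name and argument layout are hypothetical, and the line-of-sight filter is assumed to be applied separately.

```python
import math

CUTOFF = 8.0  # angstroms, the cutoff quoted on the slide

def interaction_terms(ci, ni, ai, cj, nj, aj, mu=1.0, cutoff=CUTOFF):
    """Return (C_ij, D_ij) for two surface elements under ASSUMED forms:
         D_ij = A_i * A_j / r**mu
         C_ij = A_i * A_j * max(n_i.u, 0) * max(-n_j.u, 0) / r**mu
    ci, cj are element centroids; ni, nj unit normals; ai, aj areas."""
    rvec = [cj[k] - ci[k] for k in range(3)]
    r = math.sqrt(sum(c * c for c in rvec))
    if r == 0.0 or r > cutoff:
        return 0.0, 0.0                          # outside the cutoff: no interaction
    u = [c / r for c in rvec]                    # unit vector from i to j
    face_i = sum(ni[k] * u[k] for k in range(3))     # how directly i faces j
    face_j = -sum(nj[k] * u[k] for k in range(3))    # how directly j faces i
    dist = r ** mu                               # mu = 0 removes distance weighting
    d_ij = ai * aj / dist
    c_ij = ai * aj * max(face_i, 0.0) * max(face_j, 0.0) / dist
    return c_ij, d_ij
```

Under these assumed forms, two elements facing each other across a cavity give the largest Cij, while elements facing away from each other contribute nothing to Cij but still contribute to Dij.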
7. Interactions between elements are combined to
define interactions between atoms
Here s and t are indices over atoms, and E(s) is
the set of all surface elements associated with
atom s. The matrix Mst is symmetric, and
expresses the mutual visibility of the atoms s
and t, and perhaps also their distance (depending
on the specific form of the interaction
expression used to compute the Cij terms). We note
that Mss = 0 (by definition), even though it is
possible for elements associated with the same
atom to interact.
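As a minimal sketch of this step, the code below assumes that Mst is the plain sum of the element interactions over all pairs (i, j) with i in E(s) and j in E(t); the slide's equation is not preserved here, so the summation, the function name, and the data layout (a dict of element pairs and a list mapping each element to its atom) are assumptions.

```python
def atom_interaction_matrix(element_pairs, element_atom, n_atoms):
    """Sum element-element terms into a symmetric atom-atom matrix M.

    element_pairs: dict mapping an unordered element pair (i, j), listed
                   once, to its interaction value (C_ij or D_ij).
    element_atom:  element_atom[i] is the atom index owning element i,
                   which defines the sets E(s) implicitly.
    """
    m = [[0.0] * n_atoms for _ in range(n_atoms)]
    for (i, j), value in element_pairs.items():
        s, t = element_atom[i], element_atom[j]
        if s == t:
            continue              # M_ss = 0 by definition
        m[s][t] += value
        m[t][s] += value          # keep M symmetric
    return m

# Four elements: two on atom 0, one each on atoms 1 and 2.
example = atom_interaction_matrix(
    {(0, 1): 5.0, (0, 2): 1.0, (1, 2): 2.0, (2, 3): 4.0},
    element_atom=[0, 0, 1, 2],
    n_atoms=3,
)
```

The pair (0, 1) is skipped because both elements belong to atom 0, which is how the diagonal stays zero even though same-atom elements can interact.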
8. Clustering
Once solvent-accessible atoms are
linked via the matrix Mst, they form the nodes of
a graph with weighted edges (the weights being
given by the coefficients of the matrix). We
adopt a clustering method developed by Pavan and
Pelillo (A New Graph-Theoretic Approach to
Clustering and Segmentation, Proceedings of the
2003 IEEE Computer Society Conference on Computer
Vision and Pattern Recognition), originally
applied to video analysis problems, for the task
of grouping atoms on the basis of mutual
interaction. Pavan and Pelillo derive a recursive
algorithm for finding dominant subsets of an
edge-weighted graph, these being collections of
nodes with significantly greater in-group
similarity compared to nodes outside the set.
Dominant sets provide a natural definition of
clusters.
9. Clustering (cont)
Pavan and Pelillo furthermore demonstrate the
equivalence of determining the dominant sets
(clusters) in a graph with an apparently unrelated
problem: finding a fixed point of a simple
dynamical system. We adapt their approach as
follows. Let m be the number of active
(solvent-accessible) atoms in the molecule. Let x
be a vector of m real numbers, with initial value
chosen in the interior of the standard simplex (we
use x(0) = (1/m, 1/m, ..., 1/m)). Then the
following system is iterated until a fixed-point
solution is located.
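For reference, the discrete-time replicator dynamics used by Pavan and Pelillo can be written as below, with their similarity matrix replaced by the atom interaction matrix Mst. This is the standard published form, given here as an assumption about what the slide displayed; the index names are chosen for illustration.

```latex
x_s(t+1) \;=\; x_s(t)\,\frac{\bigl(M\,x(t)\bigr)_s}{x(t)^{\mathsf T} M\,x(t)},
\qquad s = 1, \dots, m
```

When M has non-negative entries and the denominator is positive, each step keeps x on the standard simplex, since the updated components stay non-negative and sum to one.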
10. Clustering (cont)
The preceding dynamical system is called a
replicator equation, and its behavior is
well understood. Fixed points will be found on
the standard simplex, and for a given solution x
only a subset of the components of the solution
vector will be non-zero; these form the support of
the solution. The nodes (atoms) in the graph that
correspond to the support are in fact a dominant
set, and comprise the set with greatest weight
(informally, the strongest cluster). The first
cluster thus found can be removed from the system,
reducing the dimension of both the vector x and
the coefficient matrix M. The reduced system is
iterated again to find the next cluster. The
entire process can be repeated until all the
atoms are assigned to clusters.
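The iterate-and-remove procedure can be sketched as follows. This is a minimal illustration assuming the standard discrete replicator update from Pavan and Pelillo; the tolerances, the relative threshold used to read off the support, and the handling of atoms with no remaining interactions (each becomes its own cluster) are assumptions, not details taken from the slides.

```python
def replicator_fixed_point(m, tol=1e-12, max_iter=10000):
    """Iterate x_s <- x_s * (M x)_s / (x^T M x) from the barycenter."""
    n = len(m)
    x = [1.0 / n] * n
    for _ in range(max_iter):
        mx = [sum(m[i][j] * x[j] for j in range(n)) for i in range(n)]
        total = sum(x[i] * mx[i] for i in range(n))
        if total <= 0.0:
            break                          # no interactions left to drive the update
        new = [x[i] * mx[i] / total for i in range(n)]
        change = sum(abs(new[i] - x[i]) for i in range(n))
        x = new
        if change < tol:
            break
    return x

def peel_clusters(m, support_tol=1e-6):
    """Repeatedly extract the support of the fixed point and remove it."""
    active = list(range(len(m)))
    clusters = []
    while active:
        sub = [[m[a][b] for b in active] for a in active]
        if not any(v > 0.0 for row in sub for v in row):
            clusters.extend([a] for a in active)   # assumed: isolated atoms are singletons
            break
        x = replicator_fixed_point(sub)
        top = max(x)
        support = [active[k] for k, w in enumerate(x) if w > support_tol * top]
        clusters.append(support)
        active = [a for a in active if a not in support]
    return clusters

# Toy graph: atoms 0-2 strongly linked, atoms 3-4 weakly linked, one weak bridge.
M = [[0.0, 5.0, 5.0, 0.0, 0.0],
     [5.0, 0.0, 5.0, 0.0, 0.0],
     [5.0, 5.0, 0.0, 0.1, 0.0],
     [0.0, 0.0, 0.1, 0.0, 1.0],
     [0.0, 0.0, 0.0, 1.0, 0.0]]
clusters = peel_clusters(M)
```

On this toy matrix the strongly linked triple is extracted first and the remaining pair second, mirroring the "strongest cluster first" behavior described above.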
11. Superclustering
Our initial calculations indicated that clusters
defined by the process just described tend to be
small, typically consisting of fewer than ten
atoms. In order to identify larger clusters, we
have extended the original algorithm to define
superclusters. The method is straightforward: we
compute the interaction between two clusters by
using the original matrix of interactions between
atoms. Here A(a) is the set of atoms in cluster a,
and K is a new coefficient matrix that expresses
interactions between clusters.
12. Superclustering (cont)
A replicator equation is constructed using the K
matrix, and superclusters are found which relate
the atom-based clusters found by the first
clustering procedure. Superclusters can be
re-expressed as atom sets simply by enumerating
the atoms contained in their component clusters.
The process can easily be repeated to generate
clusters of even higher order; however, in this
work we have attempted only one round of the
superclustering procedure.
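A short sketch of the two bookkeeping steps in superclustering follows. It assumes that the cluster interaction K between clusters a and b is the sum of Mst over s in A(a) and t in A(b), with a zero diagonal by analogy with Mss = 0; the slide's equation for K is not preserved here, so that form and both function names are assumptions.

```python
def cluster_interaction_matrix(m, clusters):
    """Build K from the atom matrix M: K[a][b] = sum of M[s][t]
    over atoms s in cluster a and t in cluster b (assumed form)."""
    k = len(clusters)
    kmat = [[0.0] * k for _ in range(k)]
    for a in range(k):
        for b in range(a + 1, k):
            w = sum(m[s][t] for s in clusters[a] for t in clusters[b])
            kmat[a][b] = kmat[b][a] = w    # symmetric, zero diagonal
    return kmat

def expand_superclusters(superclusters, clusters):
    """Re-express each supercluster (a list of cluster indices) as atoms."""
    return [sorted(atom for c in sc for atom in clusters[c])
            for sc in superclusters]

# Two atom clusters linked by two cross-cluster interactions.
M = [[0.0, 3.0, 0.0, 1.0],
     [3.0, 0.0, 2.0, 0.0],
     [0.0, 2.0, 0.0, 3.0],
     [1.0, 0.0, 3.0, 0.0]]
atom_clusters = [[0, 1], [2, 3]]
K = cluster_interaction_matrix(M, atom_clusters)
atoms = expand_superclusters([[0, 1]], atom_clusters)
```

K would then be passed to the same kind of replicator iteration used for M, and the resulting supports, which are lists of cluster indices, expanded back to atoms as in the last line.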
13. Implementation
- Computation of surface interactions using xGrid/OS X (combination of C command-line tool and Perl script)
- Clustering carried out using C command-line tool
- Visualization using MolMon (OS X modelling tool) and SYBYL (Tripos, Inc.)
- Performance: computing surface interactions for H-Ras (2,524 atoms) requires 4,922 sec CPU time (< 10 min when distributed over 10 G5 processors), plus 180 sec to merge results using the Perl script. Generation of clusters/superclusters requires 713 sec CPU (single G5).
14. Initial Application
- Calix[4]arene, a small crown ether with ion binding site. Our method identifies the apolar cavity via both the main cluster and supercluster.
- Human Ras (small GTPase), PDB entry 1CLU. Our method identifies the nucleotide binding site via the main supercluster.
Figures follow
15. Fig. 1: (a) Calix[4]arene, with the apolar binding site labeled; (b) triangulated surface, color-coded by surface interaction
16. Fig. 2: (a) Calix[4]arene main cluster (parameters: C interaction form, mu = 1); (b) main supercluster
17. Fig. 3: (a) h-Ras (GTP analogue highlighted), with the binding site labeled; (b) triangulated color-coded surface
18. Fig. 3(c): Detail of binding site (with color coding)
19. Fig. 4: (a) h-Ras nucleotide binding site, second cluster; (b) h-Ras main supercluster. Labels in the figure: second cluster, main cluster, main supercluster, ligand
20. Discussion
- Although we have only begun to apply this technique, initial results are clearly encouraging.
- The approach is easy to apply, and involves only the surface geometry of the molecule considered.
- This is currently an ab initio method: the results shown involve no training or pre-existing knowledge of binding site location.
- While computationally intensive, the method parallelizes well, and the demonstration calculations were easily carried out on a modest cluster.
21. Future work
- Apply to many more proteins and classes of binding site.
- Develop compact descriptors of clusters/binding sites that can be easily compared across proteins.
- Optimize selection of surface interaction term and parameters to produce reliable and well-delineated site identification.
- Explore ways of including electrostatic potential in site interaction term.