Title: Introduction
Shape Signatures [1] is a method for
compactly encoding the shapes of molecules (or
receptor sites), and also their electrostatic
properties. The technique uses ray-tracing to
explore the volume enclosed by a molecular
surface. We begin with the triangulated
solvent-accessible molecular surface [2], pick a
point on the surface at random to initiate the
ray, and then allow it to propagate by the laws
of optical reflection. Fig. 1 illustrates ray
propagation, and Fig. 2 shows a ray-trace for a
small molecule, in this case a protease
inhibitor. The raw data of a ray-trace is a
collection of line segments connecting reflection
points. From this information, we derive
probability distributions, expressed as
histograms, which encode information about the
shape of the molecule. The simplest of these
shape signatures is just the distribution of
observed segment lengths. Fig. 3 shows this 1D
signature for the protease inhibitor. ("1D"
indicates a one-dimensional domain for the
histogram, here the segment length.)
Signatures converge rapidly as the number of
segments increases, and are independent of the
starting point for the ray-trace.
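As a rough illustration of how a 1D signature could be built from the raw ray-trace data, the sketch below converts a list of reflection points into segment lengths and then into a normalized histogram. The function names, the bin width, and the number of bins are illustrative assumptions, not values taken from the Shape Signatures implementation.

```python
import math


def segment_lengths(points):
    """Lengths of the segments joining consecutive reflection points."""
    return [math.dist(a, b) for a, b in zip(points, points[1:])]


def signature_1d(lengths, bin_width=0.5, n_bins=40):
    """Normalized histogram of segment lengths (a 1D signature sketch).

    bin_width and n_bins are hypothetical choices; lengths that fall
    beyond the last bin are ignored.
    """
    counts = [0] * n_bins
    for length in lengths:
        index = int(length / bin_width)
        if index < n_bins:
            counts[index] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * n_bins
    return [c / total for c in counts]


# Three reflection points give two segments, of length 5 and 1.
points = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 1.0)]
signature = signature_1d(segment_lengths(points), bin_width=1.0, n_bins=6)
print(signature)
```

Normalizing by the number of counted segments makes signatures from ray-traces of different lengths directly comparable.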
Fig. 1. Reflection from a single surface element.
The normal vector to the surface bisects the
reflection angle.
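The reflection step in Fig. 1 can be sketched with the standard specular-reflection formula r = d - 2 (d . n) n, where d is the incoming direction and n the unit surface normal. This is a minimal sketch under that assumption; the function name is hypothetical, and finding which surface triangle the ray hits next is not shown.

```python
import math


def reflect(direction, normal):
    """Reflect a ray direction about a surface normal.

    Uses r = d - 2 (d . n) n. The normal is normalized here, and the
    result is the same whether it points into or out of the surface.
    """
    norm = math.sqrt(sum(c * c for c in normal))
    unit = [c / norm for c in normal]
    d_dot_n = sum(d * n for d, n in zip(direction, unit))
    return [d - 2.0 * d_dot_n * n for d, n in zip(direction, unit)]


# A ray heading down and to the right bounces off a surface whose
# normal points along +y, and leaves heading up and to the right.
incoming = [1.0, -1.0, 0.0]
outgoing = reflect(incoming, [0.0, 1.0, 0.0])
print(outgoing)
```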
Fig. 2. Ray-traces for the HIV protease inhibitor
indinavir, with 100 and 10,000 reflections. As
the number of reflections increases, the volume
of the molecule is densely filled with rays.
Fig. 3. 1D shape signature for indinavir.
Fig. 4. Comparing signatures for two compounds.
Here two small compounds with similar shape but
very different connectivity are compared.
Panel labels: sulfurous acid, aziridine; D = 0.082.
By computing the molecular electrostatic
potential (MEP) at each reflection point, we can extend
the signatures to include information about both
shape and polarity. Fig. 5 shows a 2D-MEP
signature for the protease inhibitor. Here the
vertical axis represents the MEP measured at a
reflection point, the horizontal axis the sum of
the segment lengths on either side of a
reflection, and the color-coded distribution is
the probability for simultaneously observing
given values of these two parameters at a
reflection point.
Fig. 5. (a) MEP is computed at the reflection
point between segments 1 and 2. (b) 2D-MEP
signature for the HIV protease inhibitor
indinavir.
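The 2D-MEP signature described above can be sketched as a two-dimensional histogram: for each interior reflection point, one axis is the sum of the two adjoining segment lengths and the other is the MEP at that point. The sketch below assumes the MEP values have already been computed elsewhere; the function name, bin widths, MEP range, and bin counts are hypothetical and unitless here.

```python
import math


def signature_2d_mep(points, mep, len_width=1.0, n_len=20,
                     mep_min=-1.0, mep_max=1.0, n_mep=4):
    """Normalized 2D histogram over (summed segment length, MEP).

    points: reflection points in trace order.
    mep:    MEP value at each point (same length as points).
    Rows index MEP bins, columns index summed-length bins. Samples
    outside the binned ranges are ignored.
    """
    lengths = [math.dist(a, b) for a, b in zip(points, points[1:])]
    mep_width = (mep_max - mep_min) / n_mep
    hist = [[0.0] * n_len for _ in range(n_mep)]
    counted = 0
    # Only interior points have a segment on either side.
    for i in range(1, len(points) - 1):
        col = math.floor((lengths[i - 1] + lengths[i]) / len_width)
        row = math.floor((mep[i] - mep_min) / mep_width)
        if 0 <= col < n_len and 0 <= row < n_mep:
            hist[row][col] += 1.0
            counted += 1
    if counted:
        hist = [[cell / counted for cell in cells] for cells in hist]
    return hist


points = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 1.0), (3.0, 0.0, 1.0)]
mep = [0.0, 0.25, -0.5, 0.0]
hist = signature_2d_mep(points, mep)
print(hist[2][6], hist[1][5])
```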
Database Searching
Shape Signatures are compared using simple
metrics. One approach, shown in
Fig. 4, is simply to take the sum of the
differences between histogram heights, measured
at corresponding bins in the domain. The
smaller this distance between two signatures, the
greater the similarity we expect to observe in
the molecules the signatures represent.
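A minimal sketch of such a metric follows. It assumes that "differences" means absolute differences (an L1 distance), since signed differences between two normalized histograms would cancel to zero; the function name is hypothetical.

```python
def signature_distance(sig_a, sig_b):
    """Sum of absolute differences between corresponding histogram bins.

    Assumes both signatures were built with the same binning.
    """
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same number of bins")
    return sum(abs(a - b) for a, b in zip(sig_a, sig_b))


d_same = signature_distance([0.5, 0.5, 0.0], [0.5, 0.5, 0.0])
d_diff = signature_distance([0.5, 0.5, 0.0], [0.25, 0.5, 0.25])
print(d_same, d_diff)
```

With normalized histograms this distance ranges from 0 (identical signatures) to 2 (no overlapping bins).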
Our strategy is to augment a chemical database
with shape signatures, and to then screen the
database for compounds of interest by comparing
the signature of a query compound against all the
molecules in the library. While generating the
signatures involves a significant computational
expense, comparing them is computationally
trivial, and the generation step need be done
only once for a given database!
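The screening strategy can be sketched as a ranking over precomputed signatures. The dictionary layout, the compound names, and the `screen` function are hypothetical, and the distance is assumed to be a sum of absolute bin differences.

```python
def l1_distance(sig_a, sig_b):
    """Sum of absolute differences between corresponding bins."""
    return sum(abs(a - b) for a, b in zip(sig_a, sig_b))


def screen(query_sig, library, top_n=50):
    """Rank library compounds by signature distance to the query.

    library maps a compound name to its precomputed signature, so the
    expensive ray-tracing step is not repeated for each query.
    Returns the top_n (distance, name) pairs, closest first.
    """
    scored = [(l1_distance(query_sig, sig), name)
              for name, sig in library.items()]
    scored.sort()
    return scored[:top_n]


# Hypothetical three-compound library with three-bin signatures.
library = {
    "cmpd_A": [0.5, 0.5, 0.0],
    "cmpd_B": [0.0, 0.5, 0.5],
    "cmpd_C": [0.25, 0.5, 0.25],
}
hits = screen([0.5, 0.5, 0.0], library, top_n=2)
print(hits)
```

Each query costs one pass over the stored histograms, which is why comparison stays cheap even for a large library.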
As an initial test, we compared each compound in
the Tripos [3] small-molecule database against all
the other compounds in the database using the
simple 1D signatures. This database is
chemically diverse, containing amino acids,
carbohydrates, heterocycles, fatty acids, etc.
Fig. 6 illustrates the power of the method in
discriminating among molecules on the basis of
shape.
Fig. 6. Comparison of the Tripos small-molecule
database against itself (selected results).
Application to Estrogenic Compounds
It is now widely recognized that chemical
compounds that can mimic the biological effects of sex
hormones can pose significant hazards to the
health of both humans and wildlife [4]. These
endocrine disruptors include as a subset
estrogenic compounds, which can interact with
estrogen receptors. There is increasing interest
in the problem of quickly screening large
chemical libraries for potential estrogen
mimics [5]. Ideally, it would be possible to rapidly
and accurately scan a given database for
molecules with a high probability of acting as
estrogen mimics; once identified, candidate
compounds could be subjected to further scrutiny,
including assays of biological activity.
A special problem is posed by the character of
the estrogen receptor: it is promiscuous,
interacting with compounds that feature no
obvious structural similarity. It is our
hypothesis that shape (and electrostatics) are
better descriptors of estrogenicity than chemical
structure. The shape signatures method is
well-suited to searching chemical databases
directly on the basis of shape and polarity.
Here we use shape signatures in ligand-based
mode, scanning a large database for compounds
similar in shape to known endocrine disruptors.
Our target is a large subset (115,000 compounds)
of the NCI Database [6]. Coordinates for the
compounds in the NCI library are supplied by
Tripos, Inc. as part of the UNITY chemical
database package. Shape signatures information
has been generated for all compounds in NCI with
molecular weight less than 800. Computations were
carried out using the Beowulf cluster at the West
Center for Computer-Aided Drug Discovery at the
University of the Sciences in Philadelphia.
Queries
Our queries are four compounds known
to be endocrine disruptors: 17-β-estradiol,
coumestrol, DES, and tamoxifen. A selection of the
top hits for 1D signature searches is presented
in Figs. 7-10. (Hits were ranked on the basis of
distance between target and query signatures.)
Fig. 7. Selected hits (from top 50),
17-β-estradiol as query, using 1D signatures.
Hits shown: 1, 4 (the query itself), 7, 11, 15,
24, 35, 36.
Identifiers shown: 50-28-2, 16205-32-6,
59452-14-1, 21513-89-3, 547-81-9, 1740-19-8,
82571-86-6, 2686-05-7.
Fig. 8. Selected hits (from top 50), coumestrol
as query, using 1D signatures.
Hits shown: 1, 2 (the query itself), 5, 7, 12,
15, 24, 31.
Identifiers shown: 479-13-0, 520-28-5,
6316-25-2, 29980-70-9, 73460-18-3, 23774-13-2,
67199-66-0, 14191-22-1.
Fig. 9. Selected hits (from top 50), DES as
query, using 1D signatures.
Hits shown: 1, 3, 6, 10, 12, 15, 17, 24.
Identifiers shown: 3092-20-4, 83456-29-5,
5465-75-8, 21323-24-0, 6321-89-7, 5455-89-0,
6960-48-1, 2878-63-9.
Fig. 10. Selected hits (from top 50), tamoxifen
as query, using 1D signatures.
Hits shown: 1, 2, 4, 5, 7, 42.
Identifiers shown: 19142-68-8, 65321-78-0,
341-69-5, 85727-12-4, 66421-87-2, 3733-63-9.
Comments on 1D search
- If the query compound is present in the database,
it is ranked close to the top, although it may
not be the #1 hit. (Incomplete convergence of
histograms? Sensitivity to conformation?)
- Compounds larger than the query, but which share
a common motif, can be selected (Fig. 7/Hit 24),
as can rearrangements of the query (Fig. 8/Hit 1).
- DES is present in the NCI database, but is NOT
selected by the query. The structure in the
Tripos-supplied version of NCI is incorrect: the
phenol groups are in a cis arrangement about the
central double bond, where they should be trans!
- Compounds are selected which are shape-similar to
the query, but have distinctly different
connectivity (Fig. 7/Hit 36).
- Shape signatures can effectively identify
compounds on the basis of shape.
2D-MEP Searching
Searches were carried out against
the NCI database using 2D-MEP signatures, with
the same set of query compounds.
From our initial 1D search, it was clear that a
hydroxyl on 17-β-estradiol was positioned with
opposite orientation to the database hits,
likewise for a hydroxyl on coumestrol. This makes
little difference in the shape-only search, but
is critical when electrostatics is included.
These hydroxyl positions were modified prior to
running the 2D search. Results for the 2D-MEP
search clearly show the influence of
electrostatics. This is most dramatically seen
for 17-β-estradiol, and to a lesser extent
for coumestrol; the top seven hits for each of
these queries are shown in Fig. 11. Compounds
identified by the 2D-MEP search are similar to
the queries both in polarity and size. In each
case, the query finds itself as best hit.
Fig. 11. Best hits for two 2D-MEP searches.
17-β-estradiol: hits 1 (the query itself) to 7.
Identifiers shown: 1090-04-6, 1630-83-7,
50-28-2, 6301-88-8, 19882-03-2, 3597-38-4,
6301-87-7.
Coumestrol: hits 1 (the query itself) to 7.
Identifiers shown: 479-13-0, 55977-10-1,
80784-88-9, 6780-38-7, 6468-49-1, 1690-63-7,
54108-08-6.
Conclusions
Shape Signatures promises to be a
powerful tool for identifying molecules on the basis
of shape and polarity. In our initial tests using
estrogenic compounds as queries, searching with
1D signatures casts a wider net, selecting from
the database compounds that are shape-similar
to the query in all or in part. Searching with
2D-MEP signatures would appear to yield tighter
selectivity, the result of screening on the basis
of both shape and electrostatic potential.
References
1. R. Zauhar, J. Fretz, W. Welsh, Shape
signatures, a novel technique for ligand- and
receptor-based molecular design, in preparation.
2. R.J. Zauhar, SMART: A Solvent-Accessible
Triangulated Surface Generator for Molecular
Graphics and Boundary Element Applications,
J. Comp.-Aided Mol. Design, 9, 149-159 (1995).
3. Tripos, Inc., 1699 South Hanley Road, Saint
Louis, MO.
4. Kavlock, R.J.; Daston, G.P.; DeRosa, C.;
Fenner-Crisp, P.; Gray, L.E.; Kaattari, S.;
Lucier, G.; Luster, M.; Mac, M.J.; Maczka, C.;
Miller, R.; Moore, J.; Rolland, R.; Scott, G.;
Sheehan, D.M.; Sinks, T.; Tilson, H.A. Research
needs for the risk assessment of health and
environmental effects of endocrine disruptors: A
report of the U.S. EPA-sponsored workshop.
Environ. Health Perspect. 1996, 104, 715-740.
5. Patlak, M. A testing deadline for endocrine
disrupters. Environ. Sci. Technol. 1996, 30,
540A-544A.
6. NCI Database, Developmental Therapeutics
Program, National Cancer Institute,
National Institutes of Health, Bethesda, MD.