Title: Modelling molecules using local surface properties
1 Modelling molecules using local surface properties and motion
- Martyn Ford
- University of Portsmouth, UK
- martyn.ford_at_port.ac.uk
2 Atom-based modelling: QSAR and QSPR
- Almost all modelling techniques are based on atomistic descriptions of molecules
- Although these techniques have been successful over several decades, they have disadvantages
  - poor scaling characteristics
  - lack of a solid physical justification, e.g. scoring functions
  - interpretation is difficult due to the abstract nature of many descriptors
  - tendency to produce high-dimensional models
3 ParaSurf - a non-atom-based approach
- The approach is based on calculation of a set of local properties at or near the molecular surface
  - the local molecular electrostatic potential (MEP)
  - the local ionisation energy (LIE, $IE_L$)
  - the local electron affinity (LEA, $EA_L$)
  - the local polarisability (LP, $\alpha_L$)
4 Calculation of the surface properties
- Molecules defined as isodensity surfaces
  - using semi-empirical AM1 electron density
  - can also be defined using a shrink-wrap or a marching-cube algorithm
- Fitted to a spherical harmonic expansion
  - the shape of the shrink-wrapped surface, or
  - the four local properties (MEP, LIE, LEA and LP)
5 Describing surface shape: spherical harmonic expansion
- The accuracy of the surface description is a function of the order L of the expansion
- The greater L, the larger the computational penalty
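For reference, a truncated spherical harmonic expansion of a single-valued function on the sphere has the standard form sketched below. The symbols $r(\theta,\phi)$ (the expanded shape or property), $c_l^m$ (the coefficients) and $Y_l^m$ (the basis functions) are notation assumed here, not taken from the slides.

```latex
r(\theta,\phi) \;\approx\; \sum_{l=0}^{L} \sum_{m=-l}^{l} c_l^{m}\, Y_l^{m}(\theta,\phi)
```

Raising L adds finer angular detail, at the cost of $(L+1)^2$ coefficients per expanded function.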
6 Adjusting the surface resolution
- Spherical harmonics can be truncated at low orders for fast QSAR scans (HTS), fast superposition of molecules and rapid calculation of similarity indices
  - for ligands (MW < 750), L = 6-8
  - for peptides and proteins (MW > 5,000), L = 25-30
7 Putative resolutions for in silico screening
- For ligands, L = 6
- For receptors, L = 25
8 Advantage of this approach
- The procedure gives a completely analytical description of the molecule's shape and the 4 local properties
- These 4 properties can predict the chemical and biological properties of molecules of importance in the medical, materials and environmental sciences, e.g.
  - intermolecular binding properties
  - chemical reactivity
9 SHCs as QSAR descriptors
- The spherical harmonic coefficients (SHCs) are the parameters that define the orthogonal functions that comprise the SH expansion to any order L
- For each order $l$, there are $2l + 1$ coefficients
- These sum to order L to give a description of the shape of a property to a required resolution
10 SHCs as QSAR descriptors
11 SHCs as QSAR descriptors
- There are five fields (shape, MEP, LIE, LEA and LP) to be represented by five spherical harmonic expansions to order L
- For high resolution (say L = 15), 5 x 256 = 1280 SHCs are calculated as descriptors
- This may lead to redundancy, multicollinearity and selection bias when specifying QSAR models for prediction
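The descriptor counts quoted above follow from summing 2l + 1 coefficients over every order up to L. A minimal sketch of that arithmetic is given below; the function names are illustrative and not part of ParaSurf.

```python
def n_coefficients(order: int) -> int:
    """Number of spherical harmonic coefficients up to and including `order`.

    Each order l contributes 2l + 1 coefficients, so the total is (order + 1)**2.
    """
    return sum(2 * l + 1 for l in range(order + 1))


def n_descriptors(order: int, n_fields: int = 5) -> int:
    """Descriptor count when `n_fields` fields are each expanded to `order`."""
    return n_fields * n_coefficients(order)


if __name__ == "__main__":
    # Orders mentioned in the slides for ligands, high resolution and receptors.
    for order in (6, 8, 15, 25, 30):
        print(order, n_coefficients(order), n_descriptors(order))
```

For L = 15 this gives 256 coefficients per field and 1280 descriptors for the five fields.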
12 Selection bias
- Occurs when p variables are specified from a pool of k (> p) descriptors in order to maximise the coefficient of determination ($R^2$) or the power of prediction ($Q^2$)
- When the objective function is fit, selection bias results in upwardly biased F ratios and associated statistics ($\beta_i$s)
  - as a result, the F tables used to determine significance are inappropriate (Livingstone DJ and Salt DW (2005) J. Med. Chem., 48, 661-663; Kubinyi H, Proc. EuroQSAR 2004, in press)
  - www.cmd.port.ac.uk/cmd/fmaxmain.shtml
13 How can we deal with this problem?
- One approach is to use stepwise regression
- We can protect against selection bias by adjusting the tail probability $\alpha$ until random variables are prevented from entering the specified equation by chance
- This can be achieved by generating 1280 uniform random variables and regressing this sample against the response variable, y
14 How can we deal with this problem?
- The $\alpha$ values for entering and leaving are then reduced until no random variable enters the equation
- This $\alpha$ is chosen for the model specification
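The calibration idea can be sketched for the first step of a forward selection. This is an illustration under stated assumptions, not the procedure used in the study: it works with the F-to-enter ratio rather than the tail probability itself, it considers only single-variable entry, and the response values are made up. All function names are hypothetical.

```python
import random


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def f_to_enter(x, y):
    """F ratio (1 and n - 2 degrees of freedom) for regressing y on the single variable x."""
    r2 = pearson_r(x, y) ** 2
    return r2 * (len(y) - 2) / (1.0 - r2)


def max_random_f(y, n_random=1280, seed=0):
    """Largest first-step F ratio reached by any of `n_random` uniform random variables."""
    rng = random.Random(seed)
    best = 0.0
    for _ in range(n_random):
        x = [rng.random() for _ in y]
        best = max(best, f_to_enter(x, y))
    return best


if __name__ == "__main__":
    rng = random.Random(1)
    # Made-up responses for 25 compounds, for illustration only.
    y = [rng.uniform(4.61, 9.21) for _ in range(25)]
    print(max_random_f(y))
```

An F-to-enter threshold set above the printed value (equivalently, a tail probability below the one corresponding to it) would keep all 1280 random variables out of the equation at the first step.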
15 Case study
- Consider the following example
  - an aligned set of 25 D4 antagonists previously investigated using CoMFA and PLS
    - Lanig H et al (2001) J. Med. Chem., 44, 1151-1157
  - the study reported a 7-term QSAR equation with $Q^2_{pred}$ = 0.74
  - the range of pKi values is 4.61 to 9.21 (a factor of $10^{4.6}$, roughly 40,000-fold)
16 The QSAR model
- $pK_i = 3.13\,a^{r}_{4,4} - 8.98\,a^{r}_{9,-1} - 14.75\,a^{r}_{13,-9} - 0.79\,a^{v}_{11,-11} + 6.14$
  - (±0.31) (±1.76) (±3.27) (±0.21) (±0.31)
  - where n = 25, $R^2$ = 0.90, $R^2_{adj}$ = 0.88, $Q^2$ = 0.82, s = 0.44, $F_{4,20}$ = 43.13, $\alpha$ = 1.3 x $10^{-10}$
- This 4-term equation appears to have greater power of prediction than the 7-term CoMFA model reported earlier by Lanig et al (2001), for which $Q^2$ = 0.78
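The quoted fit statistics can be checked for internal consistency with the standard relations between $R^2$, adjusted $R^2$ and the overall F ratio. The sketch below uses only the figures on the slide (n = 25, 4 terms, F = 43.13); the function names are illustrative.

```python
def r2_from_f(f_ratio: float, p: int, n: int) -> float:
    """Recover R^2 from the overall F ratio of a p-term model fitted to n points."""
    df_resid = n - p - 1
    return f_ratio * p / (f_ratio * p + df_resid)


def adjusted_r2(r2: float, p: int, n: int) -> float:
    """Adjusted R^2, which penalises the number of fitted terms p."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


if __name__ == "__main__":
    r2 = r2_from_f(43.13, p=4, n=25)
    print(round(r2, 2), round(adjusted_r2(r2, p=4, n=25), 2))
```

With these inputs the recovered values round to the 0.90 and 0.88 reported for the model.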
17 Experimental vs calculated pKi values
18 N-fold cross validation
19 Visualisation of several local properties on a single surface
- Can be achieved by using RGB coding to colour code the different local properties, e.g.
  - LIE encoded on Red channel
  - LEA encoded on Green channel
  - LP or MEP encoded on Blue channel
- This will aid interpretation and enable image analysis to be used to match compounds with similar surfaces
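One way such an encoding might work is to rescale each property to the 0-255 range and assign it to one colour channel per surface point. The sketch below assumes simple min-max scaling, which the slides do not specify; the function names and property values are made up for illustration.

```python
def rescale(values):
    """Min-max rescale a list of floats to integers in 0..255."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant property carries no contrast, so map it to zero intensity.
        return [0 for _ in values]
    return [round(255 * (v - lo) / (hi - lo)) for v in values]


def rgb_encode(lie, lea, lp):
    """Combine three per-point property lists into one (R, G, B) tuple per surface point."""
    return list(zip(rescale(lie), rescale(lea), rescale(lp)))


if __name__ == "__main__":
    # Made-up property values at four surface points, for illustration only.
    lie = [300.0, 350.0, 400.0, 500.0]
    lea = [-120.0, -100.0, -90.0, -80.0]
    lp = [0.20, 0.25, 0.30, 0.40]
    print(rgb_encode(lie, lea, lp))
```

Because each point then carries a single colour, standard image comparison could in principle be applied to the rendered surfaces.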
20 Allopurinol RGB Surfaces
- LIE encoded on Red channel
- LEA encoded on Green channel
- LP or MEP encoded on Blue channel
21 Critical points of allopurinol
- 8 maxima, 7 minima, 13 saddles
- No. of maxima - no. of saddles + no. of minima = Euler characteristic $\chi(S)$ = 2 (here 8 - 13 + 7 = 2)
22 Gradient flows and molecular surface property graphs
- Characterize the behaviour of a property $f: S \to \mathbb{R}$ on a molecular surface S, in terms of a directed graph G on S derived from the gradient vector field $x \mapsto \operatorname{grad} f(x)$
- The molecular surface property graph G is defined by
  - Vertices(G) = fixed points of grad f = critical points of f
  - Edges(G) = stable and unstable manifolds of the saddle points
23 Representation of the critical points - allopurinol
- The critical points of the spherical harmonic surface descriptions can be calculated numerically
- These can be visualised using RGB coding (top left)
- A molecular surface graph of the van der Waals surface (top right) or some other property can be used to search databases for identity or complementarity
24 Amino acid surface properties
- The surface properties have been calculated for the 20 naturally occurring amino acids
  - using a single conformation from the Richardson Rotamer Library of experimentally observed structures
25 Hydrophobics
26 Aromatics
27 Hydrogen Bonding
28 Charged
29 Others
30 Phylogenetic analysis using PHYLIP
31 Analysing the dynamic behaviour of molecules
- The approach has so far been restricted to descriptions of static molecules
- How might we deal with molecular motion?
32 Clustering Conformations
- The traditional method involves clustering conformations sampled from MD or MC simulations
- However,
  - linear arithmetic is not appropriate for angular data
  - the number of clusters needs to be specified a priori
  - it scales as $O(N^2)$ and is therefore time consuming and restricted to small data sets
33 The DASH algorithm
- A time series analysis procedure
- Based on circular statistics appropriate for angular data
- Uses a damping function to eliminate transient, unstable states
- Identifies conformers (states) using a coding system comprising strings of integers
- Performs data compression for efficient storage
34 Circular Statistics
[Figure: linear mean compared with circular mean for angles on the -180 to 180 degree scale]
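The difference between the two means can be illustrated with the standard mean-direction formula from circular statistics. This is a generic sketch, not DASH code, and the torsion values are made up.

```python
import math


def linear_mean(angles_deg):
    """Ordinary arithmetic mean, which ignores the wrap-around at +/-180 degrees."""
    return sum(angles_deg) / len(angles_deg)


def circular_mean(angles_deg):
    """Mean direction: average the unit vectors, then convert back to an angle in degrees."""
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(s, c))


if __name__ == "__main__":
    # Made-up torsion angles, all close to the trans position at +/-180 degrees.
    torsions = [170.0, -170.0, 175.0, -175.0]
    print(linear_mean(torsions))    # lands at 0, on the opposite side of the circle
    print(circular_mean(torsions))  # lands next to +/-180, where the data actually lie
```

The arithmetic mean places the "average" torsion at the cis position even though every sample is near trans, which is why linear clustering of raw angles can mislead.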
35 Elimination of unstable states
- DASH has a smoothing algorithm that can remove singletons or states with a very low frequency of occurrence
36 Classifying the states
37 Combining torsion angle state codes
[Figure: coding algorithm that combines the individual torsion angle state codes into a combined code for the conformer]
38 Combining torsion angle state codes
39 Further Data Compression
- The output from DASH can be further compressed to a sequence of state codes and the time spent continuously in each state
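Compressing a per-frame state sequence into (state, dwell time) pairs amounts to run-length encoding. The sketch below shows the idea on a made-up sequence; it is not the DASH output format, and the function names are illustrative.

```python
from itertools import groupby


def run_lengths(states):
    """Collapse a per-frame state sequence into (state code, consecutive frame count) pairs."""
    return [(state, sum(1 for _ in run)) for state, run in groupby(states)]


def expand(runs):
    """Inverse of run_lengths: rebuild the per-frame sequence."""
    return [state for state, count in runs for _ in range(count)]


if __name__ == "__main__":
    trajectory = [1, 1, 1, 2, 2, 1, 1, 1, 1, 3]  # made-up state codes, one per frame
    runs = run_lengths(trajectory)
    print(runs)
    assert expand(runs) == trajectory  # the compression is lossless
```

The order of the runs is retained, so the sequence of visited states remains available for later analysis.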
40 Data Compression
[Figure: torsion angles from molecular dynamics simulations (25000 x 8 reals) are reduced by DASH to 200 x 2 integers]
41 Advantages of DASH
- It can analyse torsions, distances and any calculated property
- It scales linearly
  - with respect to the length of the simulation and the number of torsion angles or distances analysed
- Identifies the number of conformers and gives a unique identifier to each
- Data compression
  - converts many reals into a few integers
- It is therefore suitable for very long simulations
42 Analysis of state sequences
- Unlike Ward's clustering, the sequence of states is preserved and can be used to investigate the complexity of molecular motion
  - illustrated for a 17-state deltamethrin MD simulation of 5 ns using 5 torsion angles
43 Deltamethrin
44 Modelling MD simulations
- Can we model the molecular dynamics?
  - Yes, using the states now identified by DASH
- Why model?
  - To gain better understanding of the processes involved in the observed dynamic behaviour
45 Markov Chains
- A Markov chain is a model of a stochastic process in which some variable (here the conformation code) is followed through time
46 Markov Chains
- The probabilities of various changes between states depend only on the preceding state: a 1st-order Markov process
- $X_t$ = conformation code at time t
- Markov property: $P(X_{t+1} \mid X_t, X_{t-1}, \ldots, X_0) = P(X_{t+1} \mid X_t)$
47 Markov Modelling
- Transition probability matrix
[Figure: transition probability matrix, indexed by present state and next state]
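A transition probability matrix can be estimated from an observed state sequence by counting consecutive pairs and normalising each row. The sketch below assumes rows index the present state and columns the next state; the function name and example sequence are illustrative only.

```python
from collections import Counter


def transition_matrix(states):
    """Estimate first-order transition probabilities from an observed state sequence.

    Returns (labels, matrix), where matrix[i][j] is the fraction of visits to
    labels[i] that were immediately followed by labels[j].
    """
    labels = sorted(set(states))
    counts = Counter(zip(states, states[1:]))  # counts of (present, next) pairs
    matrix = []
    for a in labels:
        row_total = sum(counts[(a, b)] for b in labels)
        # A state seen only as the final frame has no outgoing transitions.
        matrix.append([counts[(a, b)] / row_total if row_total else 0.0 for b in labels])
    return labels, matrix


if __name__ == "__main__":
    sequence = [1, 1, 2, 1, 2, 2, 1]  # made-up conformation codes
    labels, P = transition_matrix(sequence)
    for label, row in zip(labels, P):
        print(label, row)
```

Each row with at least one outgoing transition sums to 1, as required of a stochastic matrix.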
48 Transition Probability Matrix
[Figure: transition probability matrix with a closed set of states marked]
49 Equilibrium Distribution
- At long times, MD simulations are expected to attain a unique equilibrium distribution, where $p_i$ = proportion of time spent in state i
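For a chain in which every state can eventually reach every other and which is not periodic, the equilibrium distribution can be approximated by repeatedly applying the transition matrix to a starting distribution. The sketch below assumes a row-stochastic matrix (rows: present state) and uses a made-up example; it is an illustration, not the analysis code behind the slides.

```python
def equilibrium(P, n_iter=1000, tol=1e-12):
    """Iterate p <- p P from a uniform start until the distribution stops changing."""
    k = len(P)
    p = [1.0 / k] * k
    for _ in range(n_iter):
        new = [sum(p[i] * P[i][j] for i in range(k)) for j in range(k)]
        if max(abs(a - b) for a, b in zip(new, p)) < tol:
            return new
        p = new
    return p


if __name__ == "__main__":
    # Made-up 3-state transition matrix (each row sums to 1), for illustration only.
    P = [[0.90, 0.10, 0.00],
         [0.05, 0.90, 0.05],
         [0.00, 0.20, 0.80]]
    print(equilibrium(P))
```

Comparing the result with the observed fraction of frames spent in each state gives a simple check on whether a simulation has run long enough for the Markov model to describe it.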
50 Conformations of asparagine
- Application of DASH to MD trajectories of asparagine identified six conformations with similar side chain torsion angles to the experimental structures contained in the Richardson Rotamer Library
- These structures have been investigated to determine the influence of shape on surface properties
51 Conformations of asparagine
- Each conformation produces a molecular surface property graph
52 Surface property graphs for asparagine conformations
53 Conformations of asparagine
- The graphs are locally stable, unaltered by small conformational changes.
- Thus, different conformations can produce structures with similar surface properties
- Large conformational changes do alter the graphs, corresponding to bifurcations in the gradient field that change the number of critical points and rearrange edges.
54 Conclusions
- Properties can be calculated at the surface of molecules using the ParaSurf technology
- The properties are local and can be RGB encoded and used for visualisation, pattern matching and alignment
- Descriptor sets derived from these properties can be used to specify robust QSPR and QSAR models
  - environment, materials, medicinal chemistry, crop protection
55 Conclusions
- DASH provides a rapid means of analysing molecular dynamics trajectories
  - scales linearly with respect to time and number of torsions
  - minutes instead of hours
- DASH provides a fast method of identifying metastable conformations
- DASH provides a numerical characterisation of molecular dynamics and a means of data compression
56 Conclusions
- The states identified by DASH can be used to undertake Markov modelling
57 Conclusions
- Molecular surface property graphs (MSPGs) provide a concise characterisation of surface properties
  - providing a finite classification of property-based conformational shapes.
- Different shaped molecules (conformers) may have similar surface topologies
58 ParaSurf in silico Screening Technology
- The software became available from Cepos In Silico Ltd on July 1st, 2005
- Academic partners
  - University of Portsmouth
  - University of Erlangen
  - University of Southampton
  - University of Aberdeen
  - University of Oxford
  - University of Edinburgh
59 Acknowledgements
- David Salt
- Brian Hudson
- Matt Ellis
- David Whitley
- Lee Banting
- Tim Clark
- Research Councils UK, EPSRC and HEFCE