Modelling molecules using local surface properties

1
Modelling molecules using local surface
properties and motion
  • Martyn Ford
  • University of Portsmouth, UK
  • martyn.ford_at_port.ac.uk

2
Atom-based modelling: QSAR and QSPR
  • Almost all modelling techniques are based on
    atomistic descriptions of molecules
  • Although these techniques have been successful
    over several decades, they have disadvantages
  • poor scaling characteristics
  • lack of a solid physical justification, e.g.
    scoring functions
  • interpretation difficult due to abstract nature
    of many descriptors
  • tendency to produce high dimensional models

3
ParaSurf - a non-atom based approach
  • The approach is based on calculation of a set of
    local properties at or near the molecular surface
  • the local molecular electrostatic potential (MEP)
  • the local ionisation energy (LIE, IE_L)
  • the local electron affinity (LEA, EA_L)
  • the local polarisability (LP, α_L)

4
Calculation of the surface properties
  • Molecules defined as isodensity surfaces
  • using semi-empirical AM1 electron density
  • can also be defined using a shrink-wrap or a
    marching cube algorithm
  • Fitted to a spherical harmonic expansion
  • the shape of the shrink-wrapped surface, or
  • the four local properties
  • MEP, LIE, LEA and LP

5
Describing surface shape: spherical harmonic
expansion
  • The accuracy of the surface description is a
    function of the order L of the expansion
  • The greater L, the larger the computational
    penalty
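
In the usual notation, and assuming the surface is described by its radial distance r from a chosen centre (the coefficient symbol c_l^m is an assumption; the slide does not name it), an expansion truncated at order L can be written as:

```latex
r(\theta, \phi) \approx \sum_{l=0}^{L} \sum_{m=-l}^{l} c_l^m \, Y_l^m(\theta, \phi)
```

Each increase in L adds a further set of terms with finer angular detail, which is where the extra computational cost comes from.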

6
Adjusting the surface resolution
  • Spherical harmonics can be truncated at low
    orders for fast QSAR scans (HTS), fast
    superposition of molecules and rapid calculation
    of similarity indices
  • for ligands (MW < 750), L = 6-8
  • for peptides and proteins (MW > 5,000), L = 25-30

7
Putative resolutions for in silico screening
  • For ligands, L = 6
  • For receptors, L = 25

8
Advantage of this approach
  • The procedure gives a completely analytical
    description of the molecule's shape and the 4
    local properties
  • These 4 properties can predict the chemical and
    biological properties of molecules of importance
    in the medical, materials and environmental
    sciences, e.g.
  • intermolecular binding properties
  • chemical reactivity

9
SHCs as QSAR descriptors
  • The spherical harmonics coefficients (SHCs) are
    the parameters that define the orthogonal
    functions that comprise the SH expansion to any
    order L
  • For each order l, there are 2l + 1 coefficients
  • Summed over l = 0 to L, these give a description
    of the shape of a property to a required resolution

10
SHCs as QSAR descriptors
11
SHCs as QSAR descriptors
  • There are five fields (shape, MEP, LIE, LEA and LP)
    to be represented by five spherical harmonic
    expansions to order L
  • For high resolution (say L = 15), 5 × 256 = 1280
    SHCs are calculated as descriptors
  • This may lead to redundancy, multicollinearity
    and selection bias when specifying QSAR models
    for prediction
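
The descriptor counts quoted above follow from simple arithmetic. A minimal sketch (the function names are illustrative, not part of any ParaSurf interface):

```python
def coefficients_at_order(l: int) -> int:
    """Number of spherical harmonic coefficients at a single order l (m = -l..l)."""
    return 2 * l + 1


def total_coefficients(max_order: int) -> int:
    """Coefficients for all orders 0..max_order; this equals (max_order + 1) ** 2."""
    return sum(coefficients_at_order(l) for l in range(max_order + 1))


def descriptor_pool_size(max_order: int, n_fields: int = 5) -> int:
    """Size of the SHC descriptor pool when n_fields surfaces are each expanded."""
    return n_fields * total_coefficients(max_order)


if __name__ == "__main__":
    for max_order in (6, 15, 25):
        print(max_order, total_coefficients(max_order), descriptor_pool_size(max_order))
```

For L = 15 this gives 256 coefficients per field and 1280 descriptors over the five fields.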

12
Selection bias
  • Occurs when p variables are specified from a pool
    of k (> p) descriptors in order to maximise the
    coefficient of determination (R²) or the power of
    prediction (Q²)
  • When the objective function is fit, selection
    bias results in upwardly biased F ratios and
    associated statistics (?is)
  • as a result, the F tables used to determine
    significance are inappropriate (Livingstone DJ,
    Salt DW (2005) J. Med. Chem., 48, 661-663; Kubinyi H,
    Proc. EuroQSAR 2004, in press)
  • www.cmd.port.ac.uk/cmd/fmaxmain.shtml

13
How can we deal with this problem?
  • One approach is to use stepwise regression
  • We can protect against selection bias by
    adjusting the tail probability α until random
    variables are prevented from entering the
    specified equation by chance
  • This can be achieved by generating 1280 uniform
    random variables and regressing this sample
    against the response variable, y

14
How can we deal with this problem?
  • The α values for entering and leaving are then
    reduced until no random variable enters the
    equation
  • This α is chosen for the model specification
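
A simplified sketch of the calibration idea, under stated assumptions: it covers only the first entry step of a stepwise procedure, works with the F-to-enter statistic rather than the tail probability itself (lowering the tail probability is equivalent to raising the F threshold), and uses a synthetic response vector, not the data from the talk. All function names are illustrative.

```python
import random


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def f_to_enter(x, y):
    """F statistic (1 and n - 2 d.f.) for entering x as the first regression term."""
    r2 = pearson_r(x, y) ** 2
    return r2 * (len(y) - 2) / (1.0 - r2)


def largest_chance_f(y, n_random=1280, seed=0):
    """Largest F-to-enter achieved by any of n_random uniform random variables."""
    rng = random.Random(seed)
    best = 0.0
    for _ in range(n_random):
        x = [rng.random() for _ in y]
        best = max(best, f_to_enter(x, y))
    return best


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic response: 25 values spread over the pKi range quoted in the talk.
    y = [rng.uniform(4.61, 9.21) for _ in range(25)]
    print("entry threshold must exceed F =", largest_chance_f(y))
```

Any entry threshold below the printed value would have let a pure-noise variable into the equation, which is the selection bias the procedure is meant to prevent.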

15
Case study
  • Consider the following example
  • an aligned set of 25 D4 antagonists previously
    investigated using CoMFA and PLS
  • Lanig H et al. (2001) J. Med. Chem., 44, 1151-1157
  • the study reported a 7-term QSAR equation with
    Q²pred = 0.74
  • the range of pKi values is 4.61 to 9.21 (10^4.6,
    i.e. about 40,000-fold)

16
The QSAR model
  • pKi = 3.13 a^r_{4,4} - 8.98 a^r_{9,-1} - 14.75 a^r_{13,-9}
    - 0.79 a^v_{11,-11} + 6.14
  • (± 0.31) (± 1.76) (± 3.27) (± 0.21) (± 0.31)
  • where n = 25, R² = 0.90, R²adj = 0.88, Q² = 0.82,
    s = 0.44, F(4,20) = 43.13,
  • α = 1.3 × 10^-10
  • This 4-term equation (1) appears to have greater
    power of prediction than the 7-term CoMFA model
    reported earlier by Lanig et al. (2001), for
    which Q² = 0.78

17
Experimental vs calculated pKi values
18
N-fold cross validation
19
Visualisation of several local properties on a
single surface
  • Can be achieved by using RGB coding to colour
    code the different local properties, e.g.
  • LIE encoded on Red channel
  • LEA encoded on Green Channel
  • LP or MEP encoded on Blue Channel
  • This will aid interpretation and enable image
    analysis to be used to match compounds with
    similar surfaces
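
A minimal sketch of such an encoding, assuming each property is sampled at the same surface points and rescaled linearly to the 0-255 range (the min-max scaling and the function names are assumptions; the slides do not say how the channels are scaled):

```python
def scale_to_byte(values):
    """Rescale a list of property values linearly onto the integers 0..255."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant property carries no contrast; map it to 0 everywhere.
        return [0 for _ in values]
    return [round(255 * (v - lo) / (hi - lo)) for v in values]


def rgb_encode(lie, lea, blue_property):
    """One (R, G, B) triple per surface point: LIE on red, LEA on green, LP or MEP on blue."""
    red = scale_to_byte(lie)
    green = scale_to_byte(lea)
    blue = scale_to_byte(blue_property)
    return list(zip(red, green, blue))


if __name__ == "__main__":
    # Hypothetical property values at four surface points.
    colours = rgb_encode([9.1, 10.4, 12.0, 11.2], [-3.0, -1.5, 0.5, -2.2], [0.2, 0.4, 0.3, 0.1])
    print(colours)
```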

20
Allopurinol RGB Surfaces
  • LIE encoded on Red channel
  • LEA encoded on Green Channel
  • LP or MEP encoded on Blue Channel

21
Critical points of allopurinol
8 maxima, 7 minima, 13 saddles
No. of maxima - no. of saddles + no. of minima = Euler characteristic χ(S) = 2
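
The count of critical points can be checked against the Euler characteristic of a closed, sphere-like surface. A small sketch (the function name is illustrative):

```python
def euler_characteristic(n_maxima: int, n_saddles: int, n_minima: int) -> int:
    """Alternating sum of critical points of a smooth function on a closed surface."""
    return n_maxima - n_saddles + n_minima


if __name__ == "__main__":
    # Allopurinol counts quoted on the slide: 8 maxima, 13 saddles, 7 minima.
    print(euler_characteristic(8, 13, 7))
```

A result other than 2 for a sphere-like surface would indicate that the numerical search had missed or double-counted a critical point.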
22
Gradient flows and molecular surface property graphs
  • Characterize the behaviour of a property f : S → ℝ
    on a molecular surface S, in terms of a
    directed graph G on S derived from the gradient
    vector field x ↦ grad f(x)
  • The molecular surface property graph G is defined
    by
  • Vertices(G) = fixed points of grad f
    = critical points of f
  • Edges(G) = stable and unstable manifolds of the
    saddle points

23
Representation of the critical points -
allopurinol
  • The critical points of the spherical harmonic
    surface descriptions can be calculated
    numerically
  • These can be visualised using RGB coding (top
    left)
  • A molecular surface graph of the van der Waals
    surface (top right) or some other property can be
    used to search databases for identity or
    complementarity

24
Amino acid surface properties
  • The surface properties have been calculated for
    the 20 naturally occurring amino acids
  • using a single conformation from the Richardson
    Rotamer Library of experimentally observed
    structures

25
Hydrophobics
26
Aromatics
27
Hydrogen Bonding
28
Charged
29
Others
30
Phylogenetic analysis using PHYLIP
31
Analysing the dynamic behaviour of molecules
  • The approach has so far been restricted to
    descriptions of static molecules
  • How might we deal with molecular motion?

32
Clustering Conformations
  • The traditional method involves clustering
    conformations sampled from MD or MC simulations
  • However,
  • Linear arithmetic not appropriate for angular
    data
  • The number of clusters needs to be specified a
    priori
  • Scales as O(N²) and is therefore time consuming
    and restricted to small data sets!

33
The DASH algorithm
  • A time series analysis procedure
  • Based on circular statistics appropriate for
    angular data
  • Uses a damping function to eliminate transient,
    unstable states
  • Identifies conformers (states) using a coding
    system comprising strings of integers
  • Performs data compression for efficient storage

34
Circular Statistics
[Figure: linear mean versus circular mean of angles on a scale from -180° to 180°]
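
The difference between the two means is easy to reproduce. A minimal sketch using the standard definition of the circular mean (direction of the summed unit vectors); the example angles are hypothetical:

```python
import math


def linear_mean(angles_deg):
    """Ordinary arithmetic mean, which ignores the wrap-around at ±180°."""
    return sum(angles_deg) / len(angles_deg)


def circular_mean(angles_deg):
    """Direction (in degrees) of the resultant of the unit vectors for each angle.

    Undefined when the resultant has zero length, e.g. for 0° and 180°.
    """
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(s, c))


if __name__ == "__main__":
    torsions = [170.0, -170.0, 175.0, -175.0]  # all close to the trans position
    print("linear:", linear_mean(torsions))
    print("circular:", circular_mean(torsions))
```

For torsions clustered around ±180° the linear mean lands near 0°, on the opposite side of the circle from every observation, whereas the circular mean stays at ±180°.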
35
Elimination of unstable states
  • DASH has a smoothing algorithm that can remove
    singletons or states with a very low frequency of
    occurrence

36
Classifying the states
37
Combining torsion angle state codes
[Diagram labels: coding algorithm; individual torsion angle state codes; combined code for conformer]
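
The slides do not describe the coding algorithm itself, so the following is only a hypothetical illustration of the general idea: each torsion angle is assigned an integer state, and the per-torsion states are combined into one code for the conformer. Nearest-centre assignment and all names here are assumptions, not the DASH method.

```python
def circular_distance(a_deg, b_deg):
    """Smallest angular separation between two angles, in degrees (0..180)."""
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)


def torsion_state(angle_deg, state_centres):
    """1-based index of the state centre closest to the angle (assumed rule)."""
    distances = [circular_distance(angle_deg, c) for c in state_centres]
    return distances.index(min(distances)) + 1


def conformer_code(angles_deg, centres_per_torsion):
    """Combine the individual torsion state codes into one tuple of integers."""
    return tuple(torsion_state(a, c) for a, c in zip(angles_deg, centres_per_torsion))


if __name__ == "__main__":
    # Hypothetical state centres for two torsions: gauche-, gauche+ and trans.
    centres = [[-60.0, 60.0, 180.0], [-60.0, 60.0, 180.0]]
    print(conformer_code([-65.0, 175.0], centres))
    print(conformer_code([55.0, -178.0], centres))
```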
38
Combining torsion angle state codes
39
Further Data Compression
  • The output from DASH can be further compressed to
    a sequence of state codes and time spent
    continuously in a state
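
This second compression step amounts to run-length encoding of the state sequence. A minimal sketch (function names are illustrative; the state sequence is made up):

```python
from itertools import groupby


def compress_states(state_sequence):
    """Collapse a per-frame state sequence into (state, frames spent) pairs."""
    return [(state, sum(1 for _ in run)) for state, run in groupby(state_sequence)]


def expand_states(pairs):
    """Inverse of compress_states: rebuild the per-frame sequence."""
    return [state for state, duration in pairs for _ in range(duration)]


if __name__ == "__main__":
    frames = [1, 1, 1, 2, 2, 1, 3, 3, 3, 3]
    pairs = compress_states(frames)
    print(pairs)
    print(expand_states(pairs) == frames)
```

The order of visits is kept, so the compressed form still supports analysis of transitions between states.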

40
Data Compression
[Diagram: torsion angles from molecular dynamics simulations (25000 x 8 reals) → DASH → 200 x 2 integers]
41
Advantages of DASH
  • It can analyse torsions, distances and any
    calculated property
  • It scales linearly
  • with respect to the length of the simulation and
    number of torsion angles or distances analysed
  • Identifies the number of conformers and gives a
    unique identifier to each
  • Data compression
  • converts many reals into a few integers
  • It is therefore suitable for very long simulations

42
Analysis of state sequences
  • Unlike Ward's clustering, the sequence of states
    is preserved and can be used to investigate the
    complexity of molecular motion
  • illustrated for a 17-state deltamethrin MD
    simulation of 5 ns using 5 torsion angles

43
Deltamethrin
44
Modelling MD simulations
  • Can we model the molecular dynamics?
  • Yes, using the states now identified by DASH
  • Why model?
  • To gain better understanding of the processes
    involved in the observed dynamic behaviour

45
Markov Chains
  • A Markov chain is a model of a stochastic process
    in which some variable (here the conformation
    code) is followed through time

46
Markov Chains
  • The probabilities of various changes between
    states depend only on the preceding state - 1st
    order Markov process
  • X_t = conformation code at time t

Markov property
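
In standard notation (the state labels i and j are introduced here for illustration), the first-order Markov property for the conformation code reads:

```latex
P(X_{t+1} = j \mid X_t = i,\ X_{t-1} = i_{t-1},\ \dots,\ X_0 = i_0) = P(X_{t+1} = j \mid X_t = i)
```

That is, once the present state is known, the earlier history adds no further information about the next state.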
47
Markov Modelling
  • Transition probability matrix

[Matrix figure with axes labelled "next state" and "present state"]
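
A transition probability matrix can be estimated from a sequence of state codes by counting consecutive pairs. A minimal sketch, assuming rows index the present state and columns the next state (that layout and the function name are assumptions):

```python
from collections import Counter


def transition_matrix(sequence):
    """Estimate P[i][j] = P(next state = j | present state = i) by counting.

    Returns the sorted list of states and the row-normalised matrix.
    """
    states = sorted(set(sequence))
    counts = Counter(zip(sequence, sequence[1:]))
    matrix = []
    for i in states:
        row_total = sum(counts[(i, j)] for j in states)
        matrix.append(
            [counts[(i, j)] / row_total if row_total else 0.0 for j in states]
        )
    return states, matrix


if __name__ == "__main__":
    codes = [1, 1, 2, 1, 2, 2, 1]  # made-up conformation codes
    states, matrix = transition_matrix(codes)
    for state, row in zip(states, matrix):
        print(state, row)
```

A state that is only ever seen as the final frame has no outgoing transitions, so its row is left as zeros here rather than guessed.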
48
Transition Probability Matrix
Closed set
49
Equilibrium Distribution
  • At long times, MD simulations are expected to
    attain a unique equilibrium distribution

where p_i = proportion of time spent in state i.
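
For a transition matrix P, an equilibrium distribution satisfies p_j = Σ_i p_i P_ij. A minimal sketch that finds it by repeated multiplication, assuming a row-stochastic matrix for a chain in which every state can be reached from every other and which is not periodic (the function name and the example matrix are illustrative):

```python
def stationary_distribution(matrix, tol=1e-12, max_iter=10000):
    """Iterate p <- pP from a uniform start until the change falls below tol."""
    n = len(matrix)
    p = [1.0 / n] * n
    for _ in range(max_iter):
        nxt = [sum(p[i] * matrix[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


if __name__ == "__main__":
    # Hypothetical two-state chain: state 0 is long-lived, state 1 is not.
    P = [[0.9, 0.1],
         [0.5, 0.5]]
    print(stationary_distribution(P))
```

Comparing this distribution with the observed proportion of time spent in each state is one way to judge whether a simulation has run long enough.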
50
Conformations of asparagine
  • Application of DASH to MD trajectories of
    asparagine identified six conformations with
    similar side chain torsion angles to the
    experimental structures contained in the
    Richardson Rotamer Library
  • These structures have been investigated to
    determine the influence of shape on surface
    properties

51
Conformations of asparagine
  • Each conformation produces a molecular surface
    property graph

52
Surface property graphs for asparagine
conformations
53
Conformations of asparagine
  • The graphs are locally stable, unaltered by small
    conformational changes.
  • Thus, different conformations can produce
    structures with similar surface properties
  • Large conformational changes do alter the graphs,
    corresponding to bifurcations in the gradient
    field that change the number of critical points
    and rearrange edges.

54
Conclusions
  • Properties can be calculated at the surface of
    molecules using the ParaSurf technology
  • The properties are local and can be RGB encoded
    and used for visualisation, pattern matching and
    alignment
  • Descriptor sets derived from these properties can
    be used to specify robust QSPR and QSAR models
  • environment, materials, medicinal chemistry, crop
    protection

55
Conclusions
  • DASH provides a rapid means of analysing
    molecular dynamics trajectories
  • scales linearly with respect to time and number
    of torsions
  • minutes instead of hours
  • DASH provides a fast method of identifying
    metastable conformations
  • DASH provides a numerical characterisation of
    molecular dynamics and a means of data compression

56
Conclusions
  • The states identified by DASH can be used to
    undertake Markov modelling

57
Conclusions
  • Molecular surface property graphs (MSPGs) provide
    a concise characterisation of surface properties
  • providing a finite classification of
    property-based conformational shapes.
  • Different shaped molecules (conformers) may have
    similar surface topologies

58
ParaSurf in silico Screening Technology
  • The software became available from Cepos In
    Silico Ltd on July 1st, 2005
  • Academic partners
  • University of Portsmouth
  • University of Erlangen
  • University of Southampton
  • University of Aberdeen
  • University of Oxford
  • University of Edinburgh

59
Acknowledgements
  • David Salt
  • Brian Hudson
  • Matt Ellis
  • David Whitley
  • Lee Banting
  • Tim Clark
  • Research Councils UK, EPSRC and HEFCE