Title: Surflex: Fully Automatic Flexible Molecular Docking Using a Molecular Similarity-Based Search Engine
1. Surflex: Fully Automatic Flexible Molecular Docking Using a Molecular Similarity-Based Search Engine
- Ajay N. Jain
- UCSF Cancer Research Institute and Comprehensive Cancer Center, University of California
- Presentation by Susan Tang
- CS 379a
- January 23, 2006
2. Protein-Ligand Docking Overview
- Goal
  - To predict how well a given set of ligands will bind to a protein structure
  - To predict the structure of bound protein-ligand complexes
- Components
  - Search method: explores different ways that a ligand can interact/fit with the protein
  - Scoring function: assigns a quantitative value to each ligand/protein fit
3. Protein-Ligand Docking Overview
- Criteria
  - 1) Docking accuracy
    - Measures the ability to find a conformation and alignment (pose) of a ligand in the protein that is close to the experimentally observed one
  - 2) Scoring accuracy
    - Ability to rank a correct pose of a molecule higher than an incorrect one
  - 3) Screening utility
    - Ability to pick out the true ligands in a set that also contains non-binding molecules
  - 4) Speed
    - How fast the algorithm can screen a library of ligands
4. Surflex: A new docking methodology
- Combines Hammerhead's empirical scoring function with a molecular similarity method to generate putative poses of ligand fragments
- Like Hammerhead, Surflex has one mode that uses an incremental construction search approach, but it also has a second mode, a whole molecule approach, that is faster and more accurate
- Surflex is designed primarily as a screening tool for small molecule libraries
5. Surflex: Computational Design
- Protomol Generation
  - First create an ideal active site ligand from the protein structure of interest
- Input
  - (a) protein structure, (b) list of residues to identify the protein active site
- Output
  - A protomol: a target to which potential ligands or ligand fragments are aligned based on molecular similarity
- Procedure
  - Molecular fragments are placed into the protein binding site in multiple positions → optimized for interaction with the protein → high-scoring, nonredundant fragments are selected → protomol formation
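The selection step in the procedure above (keep high-scoring, nonredundant fragments) can be sketched as a greedy filter. This is only an illustration, not Surflex's actual implementation: the function name `select_protomol_probes`, the distance-based redundancy test, the `min_separation` and `max_probes` values, and the sample scores are all assumptions made for the example.

```python
import math

def select_protomol_probes(scored_probes, min_separation=1.5, max_probes=50):
    """Greedily keep high-scoring probes that are not redundant with kept ones.

    scored_probes: list of (score, (x, y, z)) tuples; a higher score is better.
    Redundancy is approximated here (an assumption) as lying closer than
    min_separation to a probe that has already been kept.
    """
    chosen = []
    # Visit probes from best to worst score.
    for score, pos in sorted(scored_probes, key=lambda p: p[0], reverse=True):
        if all(math.dist(pos, kept_pos) >= min_separation for _, kept_pos in chosen):
            chosen.append((score, pos))
        if len(chosen) == max_probes:
            break
    return chosen

# Made-up probe placements with made-up interaction scores.
probes = [
    (2.1, (0.0, 0.0, 0.0)),
    (1.9, (0.5, 0.0, 0.0)),  # too close to the first probe
    (1.2, (3.0, 0.0, 0.0)),
    (0.4, (3.2, 0.1, 0.0)),  # too close to the third probe
]
protomol = select_protomol_probes(probes)
```

With these sample values the first and third probes survive; the other two are dropped as redundant with a better-scoring neighbor.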
6. Surflex: Computational Design
- Figure: protomol for streptavidin compared with the native pose of biotin (green)
- The bond pointed to in the figure is broken by Surflex to make fragments of biotin for docking
7. Surflex: Computational Design
- Docking
  - Ligands are docked into the protein to optimize the scoring function
- Input
  - (a) protein structure, (b) protomol, (c) ligand(s)
- Output
  - The optimized poses of the docked ligands, along with their corresponding scores
- Procedure
  - Divide the input ligand into 1-10 molecular fragments
  - → Search the conformations of each fragment
  - → Align each conformation of each fragment to the protomol to get poses with maximum molecular similarity to the protomol
  - → Score the aligned fragments and keep those with the highest scores and minimal protein interpenetration
  - → Construct the full ligand molecule from the aligned fragments, using either the incremental construction approach or the whole molecule approach
  - → The highest-scoring poses undergo further refinement of conformation and alignment
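As a rough illustration of the "score aligned fragments and keep the best" step in the procedure above, the sketch below discards poses that penetrate the protein too much and then keeps the top-scoring poses per fragment. The `FragmentPose` type, the `max_penetration` cutoff, and the `keep` count are hypothetical; the slides do not give Surflex's actual thresholds or data structures.

```python
from dataclasses import dataclass

@dataclass
class FragmentPose:
    fragment_id: int
    score: float        # higher is better (illustrative units)
    penetration: float  # overlap with protein atoms; lower is better

def keep_best_poses(poses, max_penetration=1.0, keep=2):
    """Per fragment, drop heavily penetrating poses, then keep the top scorers."""
    by_fragment = {}
    for pose in poses:
        if pose.penetration <= max_penetration:
            by_fragment.setdefault(pose.fragment_id, []).append(pose)
    return {
        frag: sorted(cands, key=lambda p: p.score, reverse=True)[:keep]
        for frag, cands in by_fragment.items()
    }

# Made-up aligned poses for two fragments.
poses = [
    FragmentPose(0, 5.0, 0.2),
    FragmentPose(0, 7.0, 3.0),  # best score, but too much interpenetration
    FragmentPose(0, 4.0, 0.1),
    FragmentPose(0, 1.0, 0.0),
    FragmentPose(1, 2.5, 0.5),
]
kept = keep_best_poses(poses)
```

Note that the pose with the highest raw score is rejected here because of its penetration value, which is the point of filtering on both quantities.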
8. Surflex: Computational Design
- Incremental Construction vs. Whole Molecule Algorithm
- Incremental Construction
  - Makes the strong assumption that maximizing the similarity of tiny fragments to the protomol will generate good poses
- Whole Molecule Algorithm
  - Bypasses the strong independence assumption made in incremental construction
  - "Dead" pieces are carried along with the "live" piece during the conformation search
  - When putative poses are aligned to the protomol, the dead pieces, in their arbitrary initial conformation, are carried into the molecular similarity computation → poses with the worst protein interpenetration are eliminated
  - The remaining poses are scored on the basis of their individual fragments
  - A recursive search yields whole molecules that consist of fragments selected from different docked poses
  - These whole molecules score well in total, over all fragments
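The recursive search described above, which assembles a whole molecule from fragments taken from different docked poses so that the total score is high, might look something like the following toy version. It is a sketch under strong assumptions, not the published algorithm: each fragment pose is reduced to a score plus the coordinates of its two junction atoms, fragments form a linear chain, and two poses are considered mergeable when their junction atoms lie within a hypothetical `tolerance`.

```python
import math

def best_assembly(fragment_poses, tolerance=0.5):
    """Pick one pose per fragment, in chain order, maximizing the total score.

    fragment_poses[i] is a list of (score, start_xyz, end_xyz) for fragment i.
    Consecutive poses are compatible when the end junction of one lies within
    `tolerance` of the start junction of the next (an assumed merge test).
    Returns (total_score, chosen_indices), or None if no assembly exists.
    """
    def search(i, prev_end):
        if i == len(fragment_poses):
            return 0.0, []
        best = None
        for idx, (score, start, end) in enumerate(fragment_poses[i]):
            if prev_end is not None and math.dist(prev_end, start) > tolerance:
                continue  # junction atoms too far apart to merge
            tail = search(i + 1, end)
            if tail is None:
                continue  # no way to complete the molecule from this pose
            total = score + tail[0]
            if best is None or total > best[0]:
                best = (total, [idx] + tail[1])
        return best

    return search(0, None)

# Made-up poses: the best first-fragment pose does not lead to the best total.
fragment_poses = [
    [(3.0, (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)),
     (2.0, (0.0, 0.0, 0.0), (5.0, 0.0, 0.0))],
    [(1.0, (1.1, 0.0, 0.0), (2.0, 0.0, 0.0)),
     (4.0, (5.2, 0.0, 0.0), (6.0, 0.0, 0.0))],
]
result = best_assembly(fragment_poses)
```

In this example, greedily taking the best-scoring pose for the first fragment would give a total of 4.0, while the exhaustive recursion finds a combination scoring 6.0. That mirrors the slide's point about scoring well in total rather than fragment by fragment.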
9. Surflex: Computational Design
- Figure: illustrates the process of docking biotin to streptavidin (blue)
- Gray indicates the live fragment
- Magenta indicates the dead fragment
- Green lines show the result of merging the two well-docked fragments at the atoms indicated by yellow circles
- The merged pose closely follows the parent fragments' original configurations
10. Surflex: Evaluation
- Evaluation of the reliability and accuracy of dockings
  - Comparison with experimental results on 81 protein/ligand pairs
  - The pairs were selected to represent structural diversity
- Evaluation of Surflex's utility as a screening tool
  - Performed on 2 protein targets (thymidine kinase and estrogen receptor)
  - Competing docking methods (GOLD, DOCK, FlexX) were tested side by side on the same data set for comparison purposes
- Evaluation of Surflex's docking speed
  - Investigate the relationship between docking time and the number of rotatable bonds
11. Surflex: Evaluation Data Set Construction
- 134 protein-ligand complexes → filter → 81 protein-ligand complexes
- Filtering Criteria
  - 15 or fewer rotatable bonds
    - → Most small molecules have < 15 rotatable bonds
  - No covalent attachments between ligand and protein
    - → Since Surflex's scoring function was developed strictly on noncovalent complexes
  - Ligands with no obvious errors in structure
    - → Undesirable to modify an existing protein-ligand complex prior to testing
- Data set used for the GOLD docking program
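The three filtering criteria above amount to a simple predicate over the candidate complexes. The sketch below is illustrative only: the `Complex` record, its field names, and the sample entries are invented for the example and do not correspond to real entries in the data set.

```python
from dataclasses import dataclass

@dataclass
class Complex:
    name: str               # placeholder label, not a real structure ID
    rotatable_bonds: int
    covalently_bound: bool  # ligand covalently attached to the protein
    structure_errors: bool  # ligand has obvious structural errors

def passes_filter(c, max_rotatable=15):
    """Apply the three criteria listed on the slide."""
    return (c.rotatable_bonds <= max_rotatable
            and not c.covalently_bound
            and not c.structure_errors)

# Made-up candidates exercising each criterion.
candidates = [
    Complex("cplx_a", 5, False, False),
    Complex("cplx_b", 22, False, False),  # too flexible
    Complex("cplx_c", 8, True, False),    # covalent attachment
    Complex("cplx_d", 15, False, False),  # exactly at the limit, kept
    Complex("cplx_e", 3, False, True),    # structural errors
]
kept_names = [c.name for c in candidates if passes_filter(c)]
```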
12. Surflex: Evaluation Results
- 1) Evaluation of the reliability and accuracy of dockings
  - Describes how thorough the search procedure is and to what extent the scoring function can recognize good dockings
  - Surflex returned a pose within 2.5 angstroms RMSD in 94% of cases
  - Surflex returned a best-scoring pose within 2.5 angstroms RMSD in 86% of cases
  - With a single docking from a random initial pose, the chance of finding a correct or nearly correct pose averaged 70%
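For reference, the RMSD criterion used above can be computed as in the minimal sketch below. It assumes the docked and crystallographic ligand coordinates list the same heavy atoms in the same order and are already in the protein's coordinate frame (no superposition), and it ignores ligand symmetry. The function names and the sample RMSD values are illustrative, not data from the study.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered coordinate lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))

def success_rate(rmsds, threshold=2.5):
    """Fraction of dockings whose RMSD is within the threshold (angstroms)."""
    return sum(r <= threshold for r in rmsds) / len(rmsds)

# Toy pose: every atom shifted by 1 angstrom along z.
docked = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
crystal = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
pose_rmsd = rmsd(docked, crystal)

# Made-up per-complex RMSD values.
rate = success_rate([0.8, 2.4, 3.1, 6.0])
```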
13. Surflex: Evaluation Results
14. Surflex: Evaluation Results
- 2) Evaluation of Surflex's utility as a screening tool
  - Tests the ability of the program to detect true positives against a background of random molecules (sensitivity vs. specificity)
  - Surflex had a true positive rate of > 80% at a false positive rate of < 1%
  - Surflex had the best performance (lowest FP rate for a given TP rate) of the individual and combined methods assayed
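A figure such as "TP rate at a given FP rate" can be read off ranked docking scores roughly as follows. This is a generic sketch of the metric, not the evaluation code used in the study; the function name, the tie handling, and the sample scores are assumptions.

```python
import math

def tp_rate_at_fp_rate(active_scores, decoy_scores, max_fp_rate):
    """True positive rate at the loosest cutoff whose FP rate stays <= max_fp_rate.

    Higher scores are assumed to mean better predicted binding.
    """
    # Number of decoys we may admit without exceeding the FP budget
    # (small epsilon guards against floating-point round-off).
    allowed = math.floor(max_fp_rate * len(decoy_scores) + 1e-9)
    ranked_decoys = sorted(decoy_scores, reverse=True)
    if allowed >= len(ranked_decoys):
        return 1.0  # every molecule may be selected
    cutoff = ranked_decoys[allowed]  # first decoy that must stay excluded
    hits = sum(s > cutoff for s in active_scores)
    return hits / len(active_scores)

# Made-up scores for 5 known ligands and 10 random background molecules.
actives = [9.0, 8.0, 7.0, 3.0, 2.0]
decoys = [6.0, 5.0, 4.0, 1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
tpr_at_10pct = tp_rate_at_fp_rate(actives, decoys, 0.10)
tpr_at_30pct = tp_rate_at_fp_rate(actives, decoys, 0.30)
```

Sweeping `max_fp_rate` from 0 to 1 and plotting the returned values would trace out a ROC-style curve for comparing methods.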
15. Surflex: Evaluation Results
- 3) Evaluation of Surflex's docking speed
  - Docking speed becomes very important when screening large compound libraries
  - Surflex demonstrated a docking time that was approximately linear in the number of rotatable bonds
  - Rigid molecules took a few seconds, and each additional rotatable bond took an additional 10 seconds
  - Surflex yielded a mean running time of 44 seconds for the 81 protein-ligand complexes in the test set used earlier
  - Docking times for FlexX, DOCK, and GOLD range from 50 to 100 seconds per molecule (Surflex's speed is comparable to these times)
  - Quantitative comparison across methods is difficult due to differences in hardware and methodology
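The approximately linear relationship above suggests a back-of-the-envelope time model. In the sketch below the 10 seconds per rotatable bond comes from the slide, while the 3-second base time for a rigid molecule is an assumed stand-in for "a few seconds"; the function name and the sample library are made up.

```python
def estimated_docking_seconds(rotatable_bonds, base_seconds=3.0, seconds_per_bond=10.0):
    """Linear estimate: a fixed base time plus a constant cost per rotatable bond."""
    return base_seconds + seconds_per_bond * rotatable_bonds

# Rotatable-bond counts for a tiny, made-up screening library.
library = [0, 2, 5]
total_seconds = sum(estimated_docking_seconds(n) for n in library)
```

Such an estimate is only a planning aid; actual timings depend on hardware and docking parameters.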
16. Surflex: Evaluation Results
17. Conclusions
- Surflex marks a step forward in flexible molecular docking programs
- Compared to the best docking methods available, Surflex is
  - as fast
  - as accurate in terms of docked ligand RMSD
  - much more accurate in terms of scoring
- Assaying the top-scoring 1% of compounds in the screening library should yield a large proportion of true positives
- Potential areas of improvement
  - Scoring and penetration terms should be combined into a single score
  - The scoring function should include training on non-binding ligands (negative examples)
  - The effect of nonbonded self-interactions within ligands should be accounted for explicitly
  - Allow a degree of protein flexibility (side chain movement)