Title: AUTODOCK An Automated Docking Software for Predicting Optimal ProteinLigand Interaction
1AUTODOCKAn Automated Docking Software for
Predicting Optimal Protein-Ligand Interaction
- By
- Susan McClatchy, Milind Misra,
- Chandreyee Mukherjee, Indu Shrivastava
2Introduction
3Automated Docking Importance
- Interaction between biomolecules lie at the core
of all metabolic processes and life activities - The number of solved protein structures available
in the databases is expanding exponentially - To understand their functions it is essential to
elucidate the interaction mechanisms between the
different molecules - Primary importance lies in rational drug design
- Depending upon the success of the docked
molecules the docking ligand may be redesigned or
its structure further refined. - Also important in the area of immunology to study
antigen-antibody interaction.
4Inhibitor bound to active site of HIVPR
Surface structure of HIVPR with bound inhibitor
5What is docking?
- Prediction of the optimal physical configuration
and energy between two molecules - The docking problem optimizes
- Binding between two molecules such that their
orientation maximizes the interaction - Evaluates the total energy of interaction such
that for the best binding configuration the
binding energy is the minimum - The resultant structural changes brought about by
the interaction
6Categories of docking
- Protein-Protein Docking
- Both molecules are rigid
- Interaction produces no change in conformation
- Similar to lock-and key model
- Protein-Ligand Docking
- Ligand is flexible but the receptor protein is
rigid - Interaction produces conformational changes in
ligand
7 8Docking uses a search and score method
- It involves
- Finding useful ways of representing the molecules
and molecular properties. - Exploration of the configuration spaces available
for interaction between ligand and receptor. - Evaluate and rank configurations using a scoring
system, in this case the binding energy - However, since it is difficult to evaluate the
binding energy because the binding sites may not
be easily accessible, the binding energy is
modeled as follows - ?G bind ?Gvdw ?Ghbond ?Gelect ?G conform
?G tor ?G sol
9The AutoDock Software
- Developed by AJ Olsons group in 1990.
- AutoDock uses free energy of the docking
molecules using 3D potential-grids - Uses heuristic search to minimize the energy.
- Search Algorithms used
- Simulated Annealing
- Genetic Algorithm
- Lamarckian GA (GALS hybrid)
10Algorithms Overview
- Simulated Annealing
- Based on temperature effects
- Start with high temperature and global search
- Lower temperature local search
- Genetic Algorithm
- Charles Darwins Theory of Evolution
- Genotype Phenotype
- Lamarckian Algorithm ( Jean Baptiste de Lamarck)
- Phenotype Genotype
11Project Goal
- Study algorithms used to perform the searches and
to calculate minimum energy - Discuss why GALS hybrid better than SA
- Look at an example, i.e., dock a ligand to a
protein molecule using latest AutoDock version
12The Algorithms
13Simulated Annealing
- Algorithm modeled after the cooling of a solution
to form glass, though its better explained by
crystal formation - Given a long enough cooling time, molecules will
relax into their lowest energy state to form the
largest crystals - Quick cooling - highly disordered system
- Slow cooling - highly ordered crystal, with each
molecule in its lowest energy state - Algorithm simulates either linear or proportional
slow cooling
14The SA Algorithm
- Uses neighborhood operator N(s) to generate a set
of solutions according to a fixed distribution - New solution compared to preceding solution, and
is accepted if its energy is lower than that of
previous solution - If new solution has higher energy, it is accepted
probabilistically according to Boltzmann
distribution (see figure above) - At high temperatures, many higher energy
solutions will be accepted at low temps.,
majority of probabilistic moves rejected - Boltzmann probability distribution e exp(delta
E/T) where delta E energy difference between
two solutions, T
temperature - Boltzmann finds p(of finding a system with energy
E at temp T)
15Pseudocode for SA
- Compute a random initial state s
- n0, xn s // initialize best solution to s
and first state to 0 - Repeat i 1, 2, // specify number of
temperatures to try - Repeat j 1, 2, , mi // no. of steps to
perform for each temp. Ti - Compute a neighbor s N(s) // s new
solution from N(s) - if (f(s) lt f(s)) then // if energy of s
lt energy of s - s s // accept new solution s
- if (f(s) lt f(xn)) then // if energy of
new solution lt - xn s // energy of best solution of
- n n 1 // state n, replace best with
new - endif
- else // otherwise replace s with s
using - s s with probability e (f(s) - f(s))/Ti
// Boltzmann dist. - endif
- EndRepeat
- EndRepeat
16How Genetic Algorithms Work - A Simple Example
- Initial population of binary creatures having 6
genes - Each gene has two different alleles, either a 0
or a 1 - Three operators crossover, mutation and selection
1 1 1 1 0 0
0 0 0 0 0 1
1 0 0 0 0 1
0 0 0 0 0 0
17Selection
Score
- Selection based on a fitness function f(x)
- This operator chooses those individuals with the
lowest values - Those with higher values chosen with a very low
probability
1 1 1 1 0 0
20
0 0 0 0 0 1
13
1 0 0 0 0 1
48
52
0 0 0 0 0 0
18Crossover
0 0 0 1 0 0
1 1 1 1 0 0
1 1 1 0 0 1
0 0 0 0 0 1
1 1 1 1 0 1
1 1 1 1 0 0
0 0 0 0 0 0
0 0 0 0 0 1
19Mutation
0 0 1 1 0 0
0 0 0 1 0 0
1 1 1 0 1 1
1 1 1 0 0 1
1 1 1 1 0 1
1 1 1 1 0 1
0 0 1 0 1 0
0 0 0 0 0 0
20Replacement
offsp
Score
- Lower scoring individuals create more offspring,
higher scoring ones create fewer or none at all - Offspring replace parental generation
- Elitism function allows best individual from
parent generation to persist, if it is a better
solution than new individuals created - Cycle of selection, mutation, crossover and
replacement repeated
0 0 1 1 0 0
15 1
1 1 1 0 1 1
9 1
1 1 1 1 0 1
22 0
0 0 1 0 1 0
1 2
21Pseudocode for GA
- Select an initial population set xi0 x10 ,
x20,, xM0 - Determine fitness values f(xi0) for each
individual - Repeat for g 1, 2, of generations
- Perform selection
- Perform crossover with probability ?
- Perform mutation with probability ?
- Determine fitness f(xig) for new individuals
- xg argmini1,M f(xig) and yg f(xg)
- Perform replacement
- Until stopping criterion ( of generations) is
reached
22How GA works in AutoDock
- Ligands genes are its x, y and z coordinates
- These form a unit vector, which is given a random
rotation angle between 0o and 360o to form a
quaternion - Additional genes may represent torsion angles
between bonds of the ligand
23Mapping
- In standard GA, the genotype (x,y,z coordinates
plus rotation and any torsion angles) are mapped
to the fitness function f(x) - The fitness function value corresponds to each
individuals phenotype - According to the right hand side of the figure,
genotypes of parents with high f(x) values are
mutated to form genotypes of children with lower
f(x) values
24Selection, Crossover Mutation
- Selection chooses ligands with the lowest fitness
(energy) values - Crossover exchanges x, y, z coordinates, or
rotations or torsions between these ligands - Example Two ligands with xyz coordinates Abc and
aBc Crossover results in new individuals with
coordinates abc and ABc
- Mutation operator mutates coordinate or other
angle values by adding a random real number
according to a Cauchy distribution, which is
similar to a Gaussian but has thicker tails
25Replacement
- Individuals with better-than-average fitness
receive proportionally more offspring
- no (fw fi)/(fw - ltfgt),
- fw ! ltfgt
- where
- no number of offspring
- fi fitness of individual (energy of ligand)
- fw fitness of worst individual in last g
generations (typically 10) - ltfgt mean fitness of population
26Lamarckian Genetic Algorithm
- According to left hand side of figure, LGA finds
lowest fitness function (energy) values first,
then maps these values to their respective
genotypes - Genetic algorithm plus Solis and Wets local
search - Better performance than either simulated
annealing or genetic algorithm alone
27The Application
28HIV-1 Protease and AHA006
- HIV-1 Protease in complex with the cyclic
sulfamide inhibitor, AHA006 - Source Protein Data Bank
- Authors K. Backbro, T. Unge
- Exp. Method X-ray Diffraction (2 Å res.)
- Primary Citation Backbro et al, J Med Chem 40
pp. 898 (1997) - Polymer Chains A, B Residues 198 Atoms
1632
29Protein (HIV-1 Protease)
Ligand (AHA006)
(Source PDB)
30HIV-1 Protease dimer
(Rasmol)
31Initial X-Ray crystallographic positions of
protein and ligand
(SYBYL)
32Docking Preparation Ligand
- Assign charges
- Define rotatable bonds
- Rename aromatic carbons
- Merge non-polar hydrogens
- Write .pdbq ligand file
33Docking Preparation Protein
- Add essential hydrogens
- Load charges
- Merge lone-pairs
- Add solvation parameters
- Write .pdbqs protein file
34Docking Preparation Grid
- AutoDock uses grid-based docking
- Ligand-protein interaction energies are
pre-calculated and then used as a look-up table
during simulation
- Grid maps are constructed based on atoms of
interest in ligand (here CANOSH)
35(AutoDockTools)
36Docking Simulated Annealing
- Runs 100
- Cycles 50
- Initial Temp (RT) 1,000
- Temp reduction factor .95
- Linear temperature reduction
- Translation reduction factor 1
- Quaternion reduction factor 1
- Torsional reduction factor 1
- rotatable bonds 12
- Initial coordinates Random
- Initial quaternion Random
- Initial dihedrals Random
- Translation step 2.0 Å
- Quaternion step 50 deg
- Torsion step 50 deg
- Results
- 100 different clusters
- Energy range -0.63 to 64,000
- Conformation 81 -0.63
- Conformation 67 20.02
- Conformation 68 10.74
- Lowest energy conf not close to position but
similar to original - Conf 67 closest to position and conformation of
original ligand higher energy - Conf 68 close to position but not conformation
of original ligand not as high energy
37Original ligand conf SA conformation 67
(SYBYL)
38Original ligand conf SA conformation 67
Close-up of previous
(SYBYL)
39Original ligand conf SA conformation 67
(SYBYL)
40100 Clustered SA Conformations
(gOpenMol)
41Docking Genetic Algorithm
- Runs 50
- Evaluations 250,000
- Population size 50
- Elitism count 1
- Mutation rate 0.02
- Crossover rate 0.8
- Window size 10
- Cauchy alpha 0
- Cauchy beta 1
- rotatable bonds 12
- Initial coordinates Random
- Initial quaternion Random
- Initial dihedrals Random
- Translation step 2.0 Å
- Quaternion step 50 deg
- Torsion step 50 deg
- Results
- 50 different clusters
- Energy range -18.66 to 86.28
- Conformation 39 -18.66
- Conformation 9 -10.60
- Lowest energy conformation overall closest to
original ligand conformation - If only 10 runs had been used instead of 50, then
conf 9 would have been the lowest energy
conformation.
42Docking Local Search
- Results
- 18 different clusters
- Energy range 35.92 to 215,200
- Confs 20, 21, 22, 23 35.92
- Lowest energy conformation was most dissimilar to
original ligand conformation - Better results could have been obtained by
reducing the step sizes
- Runs 50
- Solis-Wets iterations 300
- Consecutive successes 4
- Consecutive failures 4
- Rho 1
- Lower bound on rho 0.01
- LS frequency 0.06
- rotatable bonds 12
- Initial coordinates Random
- Initial quaternion Random
- Initial dihedrals Random
- Translation step 2.0 Å
- Quaternion step 50 deg
- Torsion step 50 deg
43Docking Lamarckian GA
- Results
- 10 different clusters
- Energy range -18.10 to 8.38
- Conformation 7 -18.10
- Lowest energy conformation fairly similar to
original ligand conformation - If the number of runs was restricted to 10 for
both GA and LGA, LGA would have generated the
best structure
- Runs 10
- Max Evaluations 250,000
- Max Generations 27,000
- Population size 50
- Elitism count 1
- Mutation rate 0.02
- Crossover rate 0.8
- Window size 10
- Cauchy alpha 0
- Cauchy beta 1
- Solis-Wets iterations 300
- Consecutive successes 4
- Consecutive failures 4
- Rho 1
- Lower bound on rho 0.01
- LS frequency 0.06
- Gray options
44Original ligand conf Best GA conf Best LGA
conf Best SA conf Best LS conf
(SYBYL)
45Original ligand conf Best GA conf Best LGA
conf Best SA conf
(SYBYL)
46References
http//cmgm.stanford.edu/biochem218/Projects20199
8/Apaydin.pdf http//www.biz.uiowa.edu/class/6K299
_menczer/PPT/Hart/sld018.html http//cs.felk.cvut.
cz/xobitko/ga/ http//www.bch.msu.edu/labs/kuhn/w
eb/projects/screening/solvation.html http//wwwcmc
.pharm.uu.nl/gillies/thesis/ http//www.chem.uidah
o.edu/honors/boltz.html S.Kumar et.al. Protein
Flexibility and Electrostatic Interactions. IBM
Journal of Research and Development Vol45. No ¾
2001. G. Morris et.al. Automated Docking Using a
Lamarckian Genetic Algorithm and an Empirical
Binding Free Energy Function. Journal of
Computational Chemistry, Vol. 19, No. 14,
1639-1662 (1998) C. Rosin et.al. A Comparison of
Global and Local Search Methods in Drug Docking.
UCSD CSE Technical Report CS97-522 (1997) C. A.
Sotriffer et.al. Automated Docking of Ligands to
Antibodies Methods and Applications. Methods
20, 280-291 (2000) M. Vieth et.al. Assessing
Search Strategies for Flexible Docking.
Practical Handbook of Genetic Algorithms.
Edited by Lance Chambers An Introduction to
Genetic Algorithms. Melanie Mitchell. Goodsell
and Olson Prot. Struct. Func. Genet, 8,
195(1990). Principals of Biochemistry
Lehninger R. Durbin, S Eddy, A. Krogh, G.
Mitchison Biological sequence analysis Wm. E.
Hart. A Theoretical Comparison of Genetic
Algorithms and Simulated Annealing Sandia
National Laboratories, www.cs.sandia.gov/wehart.