Title: Progress Report: Grid-enabled Protein Docking Simulation using DOCK
1Progress Report Grid-enabled Protein Docking
Simulation using DOCK
- Cathy Chang, Daniel Goodman, Marshall Levesque,
Noah Ollikainen
- General Goals
- To use DOCK to accurately simulate
protein-protein and protein-ligand docking
- To find novel binding sites and inhibitors in a
high-throughput manner
- To develop scripts and software to automate this
as much as possible in the grid environment
2Summary of Progress So Far
- DOCK 5 has been installed on numerous local
clusters at Osaka University and set up to run in
parallel with GLOBUS and MPI using Perl scripts.
This task is largely complete, and we should have
few problems running DOCK on the available
machines.
- Perl scripts were written to convert databases
from various formats and prepare them for docking
by adding AMBER charges, correcting format errors,
and removing incompatible ligands. The NCI
diversity set was converted and tested in this
fashion. This task remains troublesome despite
the automation.
- 1WBN has been successfully docked as a test
kinase to a small slice (2000 ligands) of the
Drug-Like subset of the ZINC database at
http://zinc.docking.org. However, there were some
problems with our results, which will be
discussed.
- Using this test kinase as well as 1ND4, various
docking parameters were tested to find the
parameter settings (such as the number of
minimization steps and the energy grid
granularity) that minimize the time per ligand
and maximize the supposed accuracy of the
simulation. The ideal parameters largely depend
on the protein and binding site in question. This
is a difficult problem, but we have automated it
to some degree using Perl.
3Computing Time/Resources
- Currently Accessible Machines
- 3 local clusters at Osaka (Cafe, Tea, and a
third), which we must share with other
researchers here via the Globus queue
- Another cluster (TDWT), which we have to
ourselves but which is not a part of the Grid
- The ROCKS cluster at SDSC, accessible via the
Grid
- We can gain access to more machines on the Grid,
including clusters in Taiwan, China, and
Australia, but have not done so yet. Applying for
an account could take as much as a week.
4Computing Time
- Docking a flexible ligand to a rigid protein
with our current parameters (for 1ND4) takes
about 20-30 minutes on a single processor,
depending on the size of the ligand and the
number of rotatable bonds.
- Depending on various parameters, this can
fluctuate from as few as 2 minutes to as many
as 90.
- When we performed a test run with our Drug-Like
ZINC subset on TDWT, it took about 6 hours to do
2000 ligands, an average of about 5 ligands per
minute for the whole cluster.
5Database Considerations
- At this speed, flexible docking of ligands takes
a prohibitively long time.
- The Drug-like subset of ZINC has well over 2
million ligands, and flexibly docking them all
would take months.
- From the literature, we've found two possible
ways of increasing throughput:
- Filter this database
- Use a smaller and more targeted database
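The "months" estimate follows from the TDWT test run. The sketch below is a rough calculation, assuming the measured throughput (2000 ligands in about 6 hours) stays constant and taking the database size as exactly 2 million ligands, which is a lower bound since the subset has well over 2 million.

```python
def ligands_per_minute(n_ligands, hours):
    """Average cluster throughput from a timed test run."""
    return n_ligands / (hours * 60.0)

def days_to_screen(db_size, rate_per_minute):
    """Wall-clock days needed to dock db_size ligands at a constant rate."""
    return db_size / rate_per_minute / (60.0 * 24.0)

rate = ligands_per_minute(2000, 6)       # TDWT test run
days = days_to_screen(2_000_000, rate)   # assumed database size (lower bound)
print(f"{rate:.1f} ligands/min, about {days:.0f} days for 2 million ligands")
```

Under these assumptions the full screen would take roughly 250 days on TDWT alone, which is why filtering or a smaller database is needed.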
6Database Filtering
- Filtering would involve checking each ligand for
specific side chains, solvent properties, number
of rotatable bonds, molecular weight, and other
factors. We could filter for toxicity, similarity
to other molecules that bind to kinases, etc.
- Some filtering criteria are easier to check for
than others: some we could do with a quick
script, others might require complex software.
- We are currently using a subset of the ZINC
database that is already filtered for drug-like
criteria, using the method described in Lipinski
et al.
- Problems
- It is currently not clear from a chemistry
standpoint which ligands to remove or keep
- All the software we were able to find that does
complex filtering is commercial only
- Limiting the database to ligands similar to
molecules that bind to other kinases
(kinase-like) has the potential to miss many
molecules that bind well
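The simpler criteria above lend themselves to a quick script. The sketch below is a Lipinski-style rule-of-five check plus a rotatable-bond cutoff. It assumes the descriptors have already been computed for each ligand; the dictionary keys and the cutoff of 10 rotatable bonds are illustrative assumptions, not values chosen in this project.

```python
def passes_rule_of_five(lig):
    """Lipinski-style check: allow at most one of the four violations."""
    violations = sum([
        lig["mol_weight"] > 500,   # molecular weight in Da
        lig["logp"] > 5,           # octanol-water partition coefficient
        lig["h_donors"] > 5,       # hydrogen-bond donors
        lig["h_acceptors"] > 10,   # hydrogen-bond acceptors
    ])
    return violations <= 1

def filter_ligands(ligands, max_rotatable_bonds=10):
    """Keep drug-like ligands that are not too flexible to dock quickly."""
    return [
        lig for lig in ligands
        if passes_rule_of_five(lig)
        and lig["rotatable_bonds"] <= max_rotatable_bonds
    ]
```

Criteria such as toxicity or kinase-likeness do not reduce to threshold checks like these and would need the more complex software mentioned above.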
7Scoring
- GRID energy and contact scoring
- DOCK 5's two main scoring methods both use a
pre-computed energy grid. This is faster, but
does not give an absolute measure of binding
affinity.
- "Automated docking with grid-based energy
evaluation", EC Meng, BK Shoichet, ID Kuntz,
Journal of Computational Chemistry, 1992
- There are also several other scoring methods
available, including two flavors of GBSA pairwise
free-energy scoring and an all-atom AMBER
force field.
- http://dock.compbio.ucsf.edu/DOCK_6/dock6_manual.htm#Scoring
- However, these other scoring methods, while
sometimes more accurate, take much longer, and
are usually performed after the best orientation
and conformation have been found by a grid-based
score.
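The speed advantage of a pre-computed grid can be illustrated with a toy example: the receptor's contribution is summed once per lattice point, and each pose is then scored by table lookup rather than by looping over all receptor atoms. This is a simplified sketch, not DOCK's implementation; the potential, the nearest-point lookup, and all names are assumptions made for illustration.

```python
import itertools
import math

def toy_potential(r):
    """Illustrative attractive potential, clamped to avoid division by zero."""
    return -1.0 / max(r, 0.5)

def build_grid(receptor_atoms, origin, spacing, shape, potential=toy_potential):
    """Precompute the receptor's summed potential at every lattice point."""
    grid = {}
    for idx in itertools.product(*(range(n) for n in shape)):
        point = tuple(o + spacing * i for o, i in zip(origin, idx))
        grid[idx] = sum(potential(math.dist(point, atom)) for atom in receptor_atoms)
    return grid

def score_pose(grid, origin, spacing, shape, ligand_atoms):
    """Score a pose by summing the grid value nearest to each ligand atom."""
    total = 0.0
    for atom in ligand_atoms:
        idx = tuple(round((c - o) / spacing) for c, o in zip(atom, origin))
        if any(i < 0 or i >= n for i, n in zip(idx, shape)):
            return math.inf  # atom lies outside the precomputed box
        total += grid[idx]
    return total
```

A finer grid spacing reduces the lookup error but increases the one-time cost of building the grid, which is the granularity trade-off mentioned earlier.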
8Consensus Scoring
- Bissantz et al. and Charifson et al. both
suggest that combining scoring methods increases
accuracy and removes many false positives.
- However, many of the scoring methods they used
(ChemScore, PMF, PLP, etc.) are commercial,
and thus not easily available to us
- Also, this will increase our computation time
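One simple form of consensus scoring keeps only the ligands that rank near the top under every scoring function. The sketch below illustrates that idea; it is not necessarily the exact protocol of either paper. It assumes lower scores are better, and the function names and the 10% cutoff are hypothetical.

```python
def top_fraction(scores, fraction):
    """Names of the best-scoring ligands for one scoring function."""
    n_keep = max(1, int(len(scores) * fraction))
    ranked = sorted(scores, key=scores.get)  # lower score = better
    return set(ranked[:n_keep])

def consensus_hits(score_tables, fraction=0.1):
    """Ligands that fall in the top fraction of every score table."""
    if not score_tables:
        return set()
    top_sets = [top_fraction(table, fraction) for table in score_tables]
    return set.intersection(*top_sets)
```

For example, with DOCK's energy and contact scores as the two tables, a ligand that scores well on only one of them would be dropped as a likely false positive.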
9Putative Human Kinase
- Cathy Chang's Work
- Target protein: novel human protein kinase
ACAD10, discovered by Kristine Breidis et al.
- We need a known 3D structure for binding site
prediction and docking simulations; therefore:
- Model the novel target protein structure with
Modeller (protein homology modeling)
- Identify new possible binding sites on the target
protein with an SVM (Support Vector Machine)
written by Jo-Lan Chung, graduate student under
Dr. Bourne
- Visualize results and transfer data to Daniel for
docking
10Modeller 8v2
- Based on sequence alignment, Modeller uses
protein homology (comparative) modeling to
predict a possible 3D structure for a protein
with unknown structure
11Sample 1WBN vs. Prediction
Figure panels: original 1WBN; Modeller's predicted 1WBN
- Modeller is able to predict the major
characteristics of 1WBN, with room for improvement
- We will attempt to model the target protein with
the same procedure
- After modeling, we can predict possible binding
sites with the SVM for docking
12Target Protein ACAD10 vs. 1ND4
- ACAD10 contains a total of 4 domains, including a
protein kinase region
- The kinase region's alignment identity is closest
to 1ND4
- Modeller fails to model a majority of the target
due to lack of information
- The output structure has a protein core with a
long tail region
- The kinase region makes up a quarter of the
target sequence
13Problems
- To eliminate the tail region, we tried to confirm
the sequence alignment with 123D
- 123D outputs 1JQI, which aligns best with the
tail region
- However, the Modeller result then has 2 cores
connected by a central chain
- We tried BLAST for alternative sequence
alignments
- Instead of 4 domains, only 3 are recognized, and
none of the top alignment results have PDB IDs
- A PDB file is one of the required inputs for
Modeller
14Solutions
- 1. Predict the structure of the kinase region only
- Since ACAD10 is identified as a protein kinase,
we modelled this region specifically against 1ND4
- The resulting 3D structure has a major core and a
smaller tail
- 2. Alternative modeling program: Swiss-Model
- Swiss-Model automatically constructs 3D models
after automatic sequence alignment and homolog
search
- The resulting structure is more complete
Left: kinase domain
Right: SWISS-MODEL
15Currently using 1ND4 as a DOCKing template
- How this affects the Virtual Screening
- Kristine has told us that the closest kinase
homolog is 1ND4, despite the problems that we've
had modeling it so far.
- We have given Kristine our models, and asked her
to give the go-ahead for final docking and/or
give us tips on how to improve the model.
- We are currently attempting to fine-tune the
docking parameters using 1ND4 as the receptor, so
that the ligand that is crystallized with the PDB
file docks in a similar fashion.
- To the left are the superimposed backbones of
1ND4 and the modeled structure for our putative
human kinase.
- While their backbones are the same, the side
chains are of course very different.
16Protein Tyrosine Phosphatases and DOCK: specific
to Marshall Levesque's work
- Goals
- Examine known and potential binding sites of
SHP/Gab proteins
- Use DOCK to screen a ligand database against
tyrosine phosphatases SHP-1, SHP-2, and adapter
protein Gab2 in hopes of finding potential
inhibitors of activity and Gab binding.
- Attempt to simulate protein-protein binding
between SHP-2 and Gab2
17What we have
- SHP-1 is the only protein with crystal structures
for both apo and bound forms
- SHP-2's bound catalytic domain could be modeled
or substituted by that of SHP-1
- Gab has no solved structures, so we would have to
rely entirely on modeled structures.
18Problems with what we have
- The substrate in the SHP-1 bound structure is a
long peptide with many rotatable bonds, allowing
for a large number of orientations and
conformations to be scored by DOCK.
- The binding site determined by DOCK is also large
due to the substrate's size, increasing the
surface area to test.
- Differing input parameters have all given
unsatisfactory RMSD values and DOCK runtimes,
averaging >4 Å and 3-4 hrs respectively.
- The substrate's bound orientation is dependent on
multiple binding pockets
Crystallized orientation of the substrate, SIRPα,
is colored according to elements. Energy scoring
(yellow) and contact scoring (blue) both gave
incorrectly bound forms, with their Tyr(P)
residues not in the base of the binding pocket
(cyan), which consists of the SHP-1 signature
motif.
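The RMSD values quoted above compare a docked pose with the crystallized substrate. A minimal sketch of that calculation follows, assuming both poses list the same atoms in the same order and are already in the receptor's coordinate frame, so no superposition is performed; the function name is hypothetical.

```python
import math

def pose_rmsd(docked, crystal):
    """Root-mean-square deviation in the units of the input coordinates."""
    if len(docked) != len(crystal) or not docked:
        raise ValueError("poses must be non-empty and have matching atoms")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(docked, crystal))
    return math.sqrt(squared / len(docked))
```

With coordinates in Å, a pose whose atoms are each displaced by 4 Å from the crystal positions gives an RMSD of 4 Å, which is the level of error reported for the current runs.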
19Dealing with what we have
- Options
- Alter the SIRPα peptide in order to reduce
rotatable bonds, decrease the potential binding
site box, and concentrate on the main binding
residue Tyr(P).
- Which parts could be removed or changed needs to
be investigated
- Find other known binding substrates for SHP-1 and
use DOCK to find and compare their orientations.
- Other ideas?
SHP-1 catalytic domain and SIRPα Tyr(P)469
complex, with the generated spheres used to
determine binding pockets. The PTP signature motif
is labeled with cyan and the WPD loop with red.
Notice the box contains a large portion of the
protein.
20Some remaining problems
- Many scoring methods, filtering programs, and
general tools to do this type of study are only
available commercially
- It's not clear how accurate our final model will
be
- So far, on test molecules, AutoDock and DOCK
results have been somewhat different
- It will be difficult to gauge the precision of
our scoring methods until we test these molecules
in vitro, although using consensus scoring and
comparing AutoDock and DOCK results will help
narrow down our leads
- It is not clear what type of database is best
suited to this task
- It is not clear that we have sufficient time and
resources to test a massive database (>1 million
ligands)