Title: In silico docking on grid infrastructures
1. In silico docking on grid infrastructures
- Jean Salzemann
- LPC of Clermont-Ferrand, France (CNRS/IN2P3)
- Embrace Workshop, Helsinki, 2006/06/17
- Credit: Nicolas Jacq, Vincent Breton
2. Content
- WISDOM initiative
- Challenges of high-throughput virtual docking
- Development of a grid environment for large-scale deployment
- Achieved deployments on the EGEE infrastructure
  - Wide In Silico Docking On Malaria
  - Accelerate drug design against H5N1 neuraminidase
- Perspectives
3. WISDOM initiative
- The WISDOM initiative aims to demonstrate the relevance and impact of the grid approach to drug discovery for neglected and emerging diseases.
- First experiences
  - Summer 2005: Wide In Silico Docking On Malaria (WISDOM)
  - Spring 2006: Accelerate drug design against H5N1 neuraminidase
- Partners
  - Grid infrastructures: EGEE, Auvergrid, TWGrid
  - European projects: Embrace, BioinfoGrid, Share, Simdat
  - Institutes and associations: Fraunhofer SCAI, Academia Sinica of Taiwan, ITB, Unimo University, LPC, CMBA, CERN-ARDA, HealthGrid
4. There is a need
- to develop new drugs for the diseases of the developing world
  - HIV/AIDS, malaria and tuberculosis account for 5.6 million deaths
  - Permanent need to develop new drugs to counter emerging drug resistance (malaria)
  - Pharmacopeia unchanged for decades against trypanosomiasis, leishmaniasis, Chagas disease, ...
- to be able to quickly develop new drugs against emerging diseases
  - H5N1, SARS and dengue are recent examples of emerging diseases
  - Many factors, such as worldwide exchanges, can help such diseases spread on a large scale
  - Need to adapt quickly to emerging resistance
5. Phases of pharmaceutical development
Molecular docking: predict how small molecules, such as substrates or drug candidates, bind to a receptor of known 3D structure
- Target discovery: Target Identification, Target Validation
- Lead discovery: Lead Identification, Lead Optimization
- Clinical Phases (I-III)
Duration: 12-15 years; costs: 500-800 million US dollars
6. Grid-enabled virtual screening workflow
- Grid service customers: biology teams, chemist/biologist teams
- Data access for expert teams around the world
- Data flow: Target, Hits, Selected hits (with three check points)
- Grid infrastructure: docking services, MD services, annotation services
- Grid service providers: cheminformatics teams, bioinformatics teams
7. Challenges for high-throughput virtual docking
Example: data challenge against H5N1 NA
- Millions of chemical compounds available in laboratories
- In vitro high-throughput screening: 1/compound, nearly impossible
- 300,000 chemical compounds
  - ZINC
  - Chemical combinatorial library
- Target (PDB): neuraminidase (8 structures)
- Molecular docking (Autodock): 100 CPU years, 600 GB of data
- Data challenge on EGEE, Auvergrid, TWGrid: 6 weeks on 2000 computers
- Hits sorting and refining
- In vitro screening of 100 hits
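The figures on this slide can be cross-checked with a little arithmetic. The sketch below assumes that every compound is docked against each of the 8 target structures and that one CPU year is 365 days; neither assumption is stated on the slide, and the function names are purely illustrative.

```python
# Back-of-the-envelope check of the H5N1 data challenge figures.
# Assumptions (not stated on the slide): each compound is docked
# against each target structure, and one CPU year is 365 days.

COMPOUNDS = 300_000
STRUCTURES = 8
CPU_YEARS = 100
COMPUTERS = 2000

MINUTES_PER_YEAR = 365 * 24 * 60


def docking_count(compounds: int, structures: int) -> int:
    """Number of compound/structure pairs to dock."""
    return compounds * structures


def cpu_minutes_per_docking(cpu_years: float, dockings: int) -> float:
    """Average CPU time spent on one docking, in minutes."""
    return cpu_years * MINUTES_PER_YEAR / dockings


def ideal_wall_clock_days(cpu_years: float, computers: int) -> float:
    """Elapsed days if every computer were busy the whole time."""
    return cpu_years * 365 / computers


dockings = docking_count(COMPOUNDS, STRUCTURES)
print(f"dockings: {dockings:,}")
print(f"CPU minutes per docking: {cpu_minutes_per_docking(CPU_YEARS, dockings):.1f}")
print(f"ideal duration: {ideal_wall_clock_days(CPU_YEARS, COMPUTERS):.2f} days")
```

Under these assumptions a single docking costs about 22 CPU minutes, and the ideal duration on 2000 fully used computers would be about 18 days. The gap with the 6 weeks quoted suggests that the computers were not all busy for the whole period, though the slide does not give the breakdown.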
8. Issues for grid-enabled high-throughput virtual docking
- Computer-based in silico screening can help to identify the most promising leads for biological tests
  - systematic and productive
  - reduces the cost of the trial-and-error approach
- In silico docking is well suited to grid deployment
  - CPU-intensive application
  - Huge amount of output
  - No communication between tasks
- Issues of a large-scale grid deployment
  - The rate of submitted jobs must be carefully monitored
  - The amount of transferred data impacts grid performance
  - The grid process introduces significant delays
  - Licensed software requires a license distribution strategy on the grid
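Because docking tasks do not communicate with each other, a compound library can be cut into independent subsets, each handled by one grid job. A minimal sketch of that splitting follows; the function name `split_into_subsets`, the compound identifiers and the subset size are illustrative assumptions, not part of the WISDOM tooling.

```python
from typing import Iterator, List, Sequence


def split_into_subsets(compounds: Sequence[str], subset_size: int) -> Iterator[List[str]]:
    """Yield consecutive, non-overlapping subsets of the compound list.

    Each subset can be docked by a separate job, since no task needs
    the result of another one.
    """
    if subset_size <= 0:
        raise ValueError("subset_size must be positive")
    for start in range(0, len(compounds), subset_size):
        yield list(compounds[start:start + subset_size])


# Hypothetical compound identifiers, for illustration only.
library = [f"compound_{i:04d}" for i in range(10)]
subsets = list(split_into_subsets(library, subset_size=4))

for job_id, subset in enumerate(subsets):
    print(job_id, subset)
```

A larger subset means fewer jobs to submit and monitor, at the cost of longer individual jobs and larger outputs per job; the right balance depends on the infrastructure.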
9. Grid tools of the data challenges
- WISDOM
  - a workflow for grid job handling: automated job submission, status check and report, error recovery
  - push-model job scheduling
  - batch-mode job handling
  - http://wisdom.eu-egee.fr
- DIANE
  - a framework for applications with a master-worker model
  - pull-mode job scheduling
  - interactive-mode job handling with flexible failure recovery
  - http://cern.ch/diane
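The difference between push and pull scheduling can be illustrated with a toy simulation. This is not the WISDOM or DIANE API: the names `push_schedule` and `pull_schedule`, the round-robin assignment and the task durations are assumptions made for illustration, and queueing delays and failures are ignored.

```python
import heapq
from typing import List


def push_schedule(durations: List[float], workers: int) -> float:
    """Push model: tasks are assigned to workers up front (round robin).

    Returns the makespan, i.e. the time at which the last worker finishes.
    """
    loads = [0.0] * workers
    for i, d in enumerate(durations):
        loads[i % workers] += d
    return max(loads)


def pull_schedule(durations: List[float], workers: int) -> float:
    """Pull model: each worker fetches the next task as soon as it is free."""
    free_at = [0.0] * workers  # min-heap of times at which workers become free
    heapq.heapify(free_at)
    for d in durations:
        start = heapq.heappop(free_at)  # earliest available worker takes the task
        heapq.heappush(free_at, start + d)
    return max(free_at)


# Hypothetical task durations (arbitrary units): one slow task among fast ones.
tasks = [10.0, 1.0, 1.0, 1.0, 1.0, 1.0]
print("push makespan:", push_schedule(tasks, workers=2))
print("pull makespan:", pull_schedule(tasks, workers=2))
```

In this toy case the pull model finishes earlier because the idle worker keeps taking tasks while the other one is busy with the slow task. Real grid behaviour also depends on scheduling latency and error rates, which this sketch leaves out.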
10. WISDOM components
- Roles: Installer, Tester, User, Supervisor
- Scripts: wisdom_install, wisdom_test, wisdom_execution, wisdom_collect, wisdom_site, wisdom_db
- Set of jobs
- wisdom_execution: workload definition, job submission, job monitoring, job bookkeeping, fault tracking, fault fixing, job resubmission
- Grid: grid services (RB, RLS), grid resources (CE, SE), application components (software, database)
- License server
- Accounting data
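The wisdom_execution responsibilities listed on this slide (job submission, monitoring, bookkeeping, fault tracking, resubmission) can be pictured as a simple loop. The sketch below is a hypothetical illustration: `run_with_resubmission`, the `submit` callback and the retry limit are assumptions, not the actual WISDOM scripts, and the grid is replaced by a stand-in function.

```python
from typing import Callable, Dict, List


def run_with_resubmission(
    job_ids: List[str],
    submit: Callable[[str], bool],
    max_attempts: int = 3,
) -> Dict[str, int]:
    """Submit every job, resubmitting failed ones up to max_attempts times.

    `submit` stands in for a real grid submission and returns True on success.
    Returns the number of attempts used per job; jobs that never succeeded
    are recorded with a negative attempt count.
    """
    attempts: Dict[str, int] = {}
    for job_id in job_ids:
        for attempt in range(1, max_attempts + 1):
            if submit(job_id):
                attempts[job_id] = attempt  # bookkeeping: job succeeded
                break
        else:
            attempts[job_id] = -max_attempts  # fault tracking: gave up
    return attempts


# Stand-in for the grid: "job_b" fails once, "job_c" always fails.
_failures_left = {"job_a": 0, "job_b": 1, "job_c": 99}


def fake_submit(job_id: str) -> bool:
    if _failures_left[job_id] > 0:
        _failures_left[job_id] -= 1
        return False
    return True


report = run_with_resubmission(["job_a", "job_b", "job_c"], fake_submit)
print(report)
```

Capping the number of attempts keeps a permanently failing job from blocking the rest of the workload; such jobs are left in the report for manual fault fixing.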
11. Simplified grid workflow for WISDOM
Diagram elements: User interface, WISDOM production system, Resource Broker, Site1, Site2, Storage Element
Data items: Parameter settings, Target structures, Compounds database, Software, Subsets, Jobs, Results, Statistics
- FlexX license server
  - 3000 floating licenses given by BioSolveIT to SCAI
  - Maximum number of licenses in use was 1008
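Floating licenses behave like a counted pool: a job checks one out before running the licensed software and returns it afterwards. The sketch below is a generic illustration with a hypothetical `LicensePool` class; it does not describe the actual FlexX license server protocol.

```python
class LicensePool:
    """Counted pool of floating licenses that records its peak usage."""

    def __init__(self, total: int) -> None:
        self.total = total
        self.in_use = 0
        self.peak = 0

    def acquire(self) -> bool:
        """Check out one license; return False if none is available."""
        if self.in_use >= self.total:
            return False
        self.in_use += 1
        self.peak = max(self.peak, self.in_use)
        return True

    def release(self) -> None:
        """Return one license to the pool."""
        if self.in_use == 0:
            raise RuntimeError("no license is checked out")
        self.in_use -= 1


# Figures from the slide: 3000 licenses available, at most 1008 in use.
pool = LicensePool(total=3000)
granted = [pool.acquire() for _ in range(1008)]  # 1008 concurrent jobs
for _ in range(1008):
    pool.release()

print(all(granted), pool.peak, pool.in_use)
```

With 3000 licenses and a peak of 1008, no request in this toy run is refused, which is consistent with the pool never being exhausted during the data challenge.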
12. Grid resources of the data challenges
- EGEE-II
- AuverGrid
- TWGrid
- a worldwide infrastructure freely providing more than 5,000 CPUs and 21 TB for biomedical applications
13. First biomedical data challenge: World-wide In Silico Docking On Malaria (WISDOM)
- Significant biological parameters
  - 2 different docking applications (Autodock and FlexX)
  - About 1 million virtual compounds selected
  - Target proteins from the parasite responsible for malaria
- Significant numbers
  - Total of about 46 million ligands docked in 6 weeks
  - 1 TB of data produced
  - Up to 1700 computers in 15 countries used simultaneously
  - About 80 CPU years
  - Average crunching factor: 600
Figure: Number of docked compounds vs time
Figure: Number of running and waiting jobs vs time
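The crunching factor is not defined on the slide. Assuming it means the total CPU time divided by the elapsed wall-clock time of the data challenge, it can be estimated from the figures above; the 365-day year and the function name are also assumptions.

```python
def crunching_factor(cpu_years: float, elapsed_weeks: float) -> float:
    """Total CPU time divided by elapsed time (assumed definition)."""
    cpu_days = cpu_years * 365
    elapsed_days = elapsed_weeks * 7
    return cpu_days / elapsed_days


# Figures from the malaria data challenge: about 80 CPU years in 6 weeks.
estimate = crunching_factor(cpu_years=80, elapsed_weeks=6)
print(f"estimated crunching factor: {estimate:.0f}")
```

The estimate comes out near 700, the same order of magnitude as the average of 600 quoted. The difference may come from rounding in "about 80 CPU years" and "6 weeks", or from a different averaging method.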
14. Second biomedical data challenge: Accelerate drug design against H5N1 neuraminidase
- Significant biological parameters
  - 1 docking application (Autodock)
  - About 300,000 virtual compounds selected
  - Target proteins with predicted mutations involved in virus multiplication
- Significant numbers
  - Total of about 2.5 million ligands docked in 6 weeks
  - 600 GB of data produced
  - Up to 2000 computers in 17 countries used simultaneously, corresponding to about 105 CPU years
  - Average crunching factor: 900
Figure: Rate of jobs by EGEE federation
15. Selecting the promising compounds
- In silico screening provides not only the docking poses of a compound against the target but also the docking energy
- By ranking this information, chemists can select the promising compounds to carry forward into structure-based design of potential drugs
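Ranking by docking energy can be sketched as a simple sort. The compound names and energy values below are invented for illustration, and the sketch assumes the usual convention that a lower (more negative) docking energy indicates a more favourable predicted binding.

```python
from typing import List, NamedTuple


class DockingResult(NamedTuple):
    compound: str
    energy: float  # docking energy; lower is assumed to be better


def top_hits(results: List[DockingResult], n: int) -> List[DockingResult]:
    """Return the n results with the lowest docking energy."""
    return sorted(results, key=lambda r: r.energy)[:n]


# Invented values, for illustration only.
results = [
    DockingResult("cmpd_A", -7.2),
    DockingResult("cmpd_B", -9.8),
    DockingResult("cmpd_C", -5.1),
    DockingResult("cmpd_D", -8.4),
]

for hit in top_hits(results, n=2):
    print(hit.compound, hit.energy)
```

In practice the energy ranking is only a first filter; the docking poses are also inspected before compounds are selected.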
16. Perspectives
- Second large-scale docking on EGEE in fall 2006
  - Several new targets foreseen for malaria, dengue and other neglected diseases
  - Resources needed: 80 CPU years per target
  - Supported by the EGEE-II and EELA European projects and the Swiss BioGrid initiative
  - Collaboration is open for new targets, software, infrastructures
- Reranking of WISDOM hits by molecular dynamics simulations
  - Supported by the BioinfoGrid and EGEE-II European projects
  - Interest in resources on supercomputers (contact with DEISA)
- Best hits further processed through in vitro testing and structure-activity relationships