Title: Molecules by the Million
1. Molecules by the Million: Pattern recognition
and grid computing in drug discovery. Graham
Richards, University of Oxford
2. The Genome: The Human Genome Project is virtually
complete. There are tens of thousands of
genes. What next?
3. The Proteome: The genome codes for hundreds
of thousands of proteins: the proteome. Structures
are likely to emerge on a factory scale over the
next decade. What next?
5. Next steps: Small molecules (drugs) to interact
with the proteins and modify their action. But we
have:
- Tens of thousands of genes
- Hundreds of thousands of proteins
- Billions of small, drug-like potential molecules
6. Data base: We need a data base of millions, or
preferably billions, of drug-like molecules of
known synthetic origin.
7. Data base (E.K. Davies, Dan Butler)
- 1. Catalogue molecules: 1.5 million
- 2. Combinatorial libraries: 1 billion
- 3. Filter to satisfy Lipinski: 35 million.
  All these molecules have either been made
  or we have a synthetic route.
- 4. If necessary, de novo derivatives can be
  created from the 35 million.
  100 derivatives of each gives 3.5 billion.
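The Lipinski filter in step 3 can be sketched as follows. The thresholds are the usual rule-of-five values (molecular weight at most 500, logP at most 5, at most 5 hydrogen-bond donors and at most 10 acceptors). The slides do not say whether the strict rule or the common one-violation-allowed variant was applied, so the strict form is assumed here. The `Molecule` record and all descriptor values are hypothetical, invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Molecule:
    """Hypothetical record holding precomputed descriptors."""
    name: str
    mol_weight: float       # daltons
    log_p: float            # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def passes_lipinski(mol: Molecule) -> bool:
    """Strict rule of five: every criterion must hold (an assumption)."""
    return (
        mol.mol_weight <= 500
        and mol.log_p <= 5
        and mol.h_bond_donors <= 5
        and mol.h_bond_acceptors <= 10
    )


# Illustrative descriptor values only, not taken from the talk.
library = [
    Molecule("small_polar", 180.2, 1.2, 1, 4),
    Molecule("too_heavy", 734.0, 3.1, 5, 14),
    Molecule("too_greasy", 410.5, 6.8, 0, 3),
]

drug_like = [m.name for m in library if passes_lipinski(m)]
print(drug_like)

# Step 4 arithmetic from the slide: 100 derivatives of each of 35 million.
derivatives = 35_000_000 * 100
print(f"{derivatives:,}")
```

A filter of this kind needs only four numbers per molecule, which is why it can be applied to a billion-compound library before any expensive calculation is attempted.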
8. Handling this information
Pattern recognition:
- Finding new leads
- Aligning molecules
- Finding binding sites
9. Finding new leads: Take a known drug and find
those molecules in the data base which are similar.
(35 million compounds: 15 minutes on a PC.)
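The slides do not say which similarity measure was used. A common choice for this kind of search is the Tanimoto coefficient on bit-string fingerprints, and the sketch below assumes it. The 8-bit fingerprints and the candidate names are hypothetical; real fingerprints are much longer.

```python
def tanimoto(fp_a: int, fp_b: int) -> float:
    """Tanimoto coefficient of two bit-string fingerprints stored as ints."""
    common = bin(fp_a & fp_b).count("1")
    total = bin(fp_a | fp_b).count("1")
    return common / total if total else 0.0


def most_similar(query: int, database: dict, top_n: int = 2) -> list:
    """Rank database entries by similarity to the query, best first."""
    scored = [(tanimoto(query, fp), name) for name, fp in database.items()]
    # Sort by descending score, breaking ties alphabetically by name.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(name, round(score, 3)) for score, name in scored[:top_n]]


# Hypothetical fingerprints: each bit marks the presence of some feature.
query_fp = 0b10110110
database = {
    "cand_A": 0b10110100,   # lacks one bit of the query
    "cand_B": 0b01001001,   # shares no bits with the query
    "cand_C": 0b10100110,   # lacks one bit of the query
    "cand_D": 0b11111111,   # has every bit set
}

print(most_similar(query_fp, database))
```

Each comparison is a handful of bitwise operations, which is consistent with scanning tens of millions of compounds in minutes on a single PC.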
10. Molecules which should mimic methotrexate
11. Target unknown: The big problem is molecular
alignment. Solution: use methods from computer
vision.
12. Illustration of the translations generated when
a) the two structures are at the wrong relative
rotation and b) the two structures are at the
correct relative rotation.
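One way to read this figure, offered here as an assumption since the slides do not give the algorithm, is as a voting scheme of the kind used in computer vision: for a trial rotation, every pair of atoms (one from each structure) proposes the translation that would superimpose that pair. At the wrong rotation the proposals scatter; at the correct rotation the matching pairs all propose the same translation. The toy sketch below works in 2D with made-up coordinates; the function names and the bin size are hypothetical.

```python
import math
from collections import Counter


def rotate(points, angle_deg):
    """Rotate 2D points about the origin by angle_deg degrees."""
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)
    return [(c * x - s * y, s * x + c * y) for x, y in points]


def translation_peak(fixed, moving, angle_deg, bin_size=0.5):
    """Size of the largest cluster of pairwise translation vectors.

    Each atom pair votes for the translation that would superimpose it;
    votes are collected on a coarse grid of width bin_size.
    """
    rotated = rotate(moving, angle_deg)
    votes = Counter()
    for fx, fy in fixed:
        for mx, my in rotated:
            key = (round((fx - mx) / bin_size), round((fy - my) / bin_size))
            votes[key] += 1
    return max(votes.values())


# Made-up 2D "atoms"; the second structure is the first one rotated by
# -90 degrees and then shifted, so the correct trial rotation is +90.
fixed = [(0.0, 0.0), (1.5, 0.2), (2.1, 1.7), (0.4, 2.9), (-1.3, 1.1)]
moving = [(x + 4.0, y - 3.0) for x, y in rotate(fixed, -90)]

wrong = translation_peak(fixed, moving, 0)
correct = translation_peak(fixed, moving, 90)
print("peak at wrong rotation:  ", wrong)
print("peak at correct rotation:", correct)
```

Scanning trial rotations and keeping the one with the tallest peak gives both the rotation and, from the position of the peak, the translation.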
13. The HIV-1 Reverse Transcriptase inhibitors used in
the second test set: a) Nevirapine and b)
Alpha-APA.
14. Results of the alignment of the inhibitors of
HIV-1 Reverse Transcriptase: a) Experimental
alignment and b) Result of test.
15. Protein known, binding site unknown: We need
software to find binding sites. Multiscale
approach using the k-means algorithm.
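The clustering step can be sketched with a plain k-means (Lloyd's algorithm) over grid points. The slides do not describe how the multiscale scheme is organised, so the sketch shows a single scale only; presumably the search is repeated on finer grids around the clusters found, but that is an assumption. The grid points and starting centers are hypothetical.

```python
import math


def kmeans(points, centers, iterations=10):
    """Plain Lloyd's k-means starting from the given centers."""
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(
                range(len(centers)), key=lambda i: math.dist(p, centers[i])
            )
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its members;
        # a center with no members stays where it is.
        centers = [
            tuple(sum(axis) / len(members) for axis in zip(*members))
            if members else center
            for members, center in zip(clusters, centers)
        ]
    return centers, clusters


# Hypothetical low-energy grid points forming two well-separated pockets.
grid_points = [
    (1, 1, 1), (1, 2, 1), (2, 1, 1), (2, 2, 1),
    (10, 10, 10), (10, 11, 10), (11, 10, 10), (11, 11, 10),
]
start = [grid_points[0], grid_points[-1]]

centers, clusters = kmeans(grid_points, start)
print(centers)
```

Each resulting center is a candidate binding-site location, and the size of its cluster gives a rough idea of how many favourable grid points support it.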
16. Examples of models generated for the HIV-Reverse
Transcriptase inhibitor nevirapine. Only heavy
atoms are shown.
17. Search results on HIV-Reverse Transcriptase.
Lowest energy grid points shown in green (up to
20,000 points). Protein shown as a red
ribbon. a) First iteration for nevirapine
18. b) Third iteration for nevirapine
19. c) Last iteration for nevirapine
24. Scoring binding
- a) Pharmacophore matching
- b) Binding energy calculation
25. Computer power
Grid using screensaver - after SETI but with
massive data transmission. Grid provided by United
Devices Inc.
26. Screensaver Lifesaver Project
Started April 2001. Number of PCs > 3.5
million. CPU time > 450,000 years.
27. Supercomputer power
Of 3.5 million devices, assume 400,000
active. Average CPU: 1.2 GHz P4 at 30% utilization.
This gives 500 Tflops. Even derating this to one
third for bandwidth/latency limitations leaves
160 Tflops: 3 x the world's fastest supercomputer.
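The arithmetic behind these figures can be checked with a few lines. All inputs are the numbers quoted on the slide. The slide does not state how many floating-point operations per clock cycle were assumed, so the sketch derives the value implied by the 500 Tflops figure rather than supplying one.

```python
active_devices = 400_000        # devices assumed active (from the slide)
clock_hz = 1.2e9                # 1.2 GHz average CPU
utilization = 0.30              # 30% utilization
quoted_tflops = 500             # aggregate figure quoted on the slide
quoted_derated_tflops = 160     # derated figure quoted on the slide

# Clock cycles per second actually available to the project.
cycles_per_second = active_devices * clock_hz * utilization

# Flops per cycle that the quoted aggregate figure implies (derived,
# not stated in the talk).
implied_flops_per_cycle = quoted_tflops * 1e12 / cycles_per_second

# Fraction of the aggregate that survives the bandwidth/latency derating.
derating_factor = quoted_derated_tflops / quoted_tflops

print(f"usable cycles per second: {cycles_per_second:.3g}")
print(f"implied flops per cycle:  {implied_flops_per_cycle:.2f}")
print(f"implied derating factor:  {derating_factor:.2f}")
```

The quoted figures therefore correspond to roughly three and a half floating-point operations per cycle, with about a third of the aggregate surviving the derating.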
28. Protein-Tyrosine-Phosphatase 1B (1BZH)
Number of hits: 127,878 in total with predicted
free energies of binding below zero.
29. Filtering the results
3.5 billion molecules
Pharmacophore approach
Hundreds of thousands
Binding energy
100s
Molecular dynamics
10s of molecules as leads to synthesize and test
30. Smallpox results (S. Kahn)
Target: topoisomerase of variola. Hits: 900
good, 50 very good.
31. Joining in: www.grid.org. More details:
www.chem.ox.ac.uk. Currently about 3.5 million PCs
have provided nearly 450,000 years of CPU time.