Title: Screen: Ligand-based virtual screening
1 Screen: Ligand-based virtual screening
presented by / maintained by Miklós Vargyas
Last update 13 April 2010
2 Screen
Virtual screening by topological descriptors
3 Screen
Description of the product
Screen performs high-throughput virtual screening of compound libraries using similarity comparisons based on various molecular descriptors.
Availability
- JChem Base
- JChem Oracle cartridge
- Instant JChem
- Server version
- Standalone command-line application programs
- KNIME
- Pipeline Pilot
4 Key features
- Various 2D descriptors
- ChemAxon chemical fingerprint (CCFP)
- Pipeline Pilot ECFP/FCFP
- ChemAxon pharmacophore fingerprint (CPFP)
- BCUT
- Scalars (logP, logD, Szeged index)
- custom descriptors, in-house fingerprints
- Optimized similarity measures
- Improves similarity prediction
- depends on set of known actives
- high enrichment ratios in virtual screening
- Multiple queries
- 3 types of hypotheses
- combined hit lists
5 Benefits
- Versatile
- Use various descriptors in your well-established model
- Access your trusted in-house fingerprint in IJC, JCB, JCART
- Easy integration in corporate discovery pipelines
- Search chemical files directly; no need to import structures into a database
- New descriptors are pluggable in deployed systems
- Optimal
- Consistent similarity scores
- Smaller hit set
- More focused library
6 Benefits
More consistent similarity scores
[Figure: example structures scored with the regular and the optimized Tanimoto metric; values shown: 0.20, 0.06, 0.28]
7 Benefits
- High enrichment ratio
- Fewer false hits
- Known actives are true positive hits (ACE inhibitors)
8 Results
NPY-5 (pharmacophore similarity)
9 Results
β2-adrenoceptor (pharmacophore similarity)
10 Case study at Axovan
- GPCR activity prediction
- distinguishing between GPCR subclasses
Modest von Korff and Matthias Steger, "GPCR-Tailored Pharmacophore Pattern Recognition of Small Molecular Ligands", JCICS 2004, 44
11 Screen roadmap
- New molecular descriptors
- ECFP/FCFP (in 5.4)
- Shape descriptors (in 5.4)
- Hidden use of the optimiser
- No-pain black-box approach
- Simultaneous multi-descriptor search
- Enhanced IJC integration
- Easy descriptor configuration and generation
- Similarity search type instead of descriptors, metrics and other unfriendly concepts
12 Screen roadmap
- GUI
- New web interface (HTML/AJAX)
- Desktop application for descriptor generation
- 3D shape similarity
- fast pre-filtering by 3D fingerprint
- Alignment based volumetric Tanimoto calculation
- scaffold hopping by maximizing topological dissimilarity and spatial similarity
13 Supplementary slides
14 A typical approach
[Figure: the query is converted to a query fingerprint, which is compared by a metric with the target fingerprints of the targets; the closest targets are returned as hits.]
15 ChemAxon's approach
[Figure: several queries are combined into a hypothesis fingerprint, and an optimization step produces an optimized metric; the hypothesis fingerprint is compared by the optimized metric with the target fingerprints of the targets to give the hits.]
16 Performance
- Chemical fingerprint generation 500/s
- Pharmacophore fingerprint generation
- calculated 80/s
- rule-based 200/s
- Screening 12000/s
- Optimization 10s/metric
- Hardware/software environment
- P4 3GHz, 1GB RAM
- Red Hat Linux 9
- Java 1.4.2
17 Implementations
Use of various fingerprints and metrics in JSP: http://www.chemaxon.com/jchem/examples/jsp1_x/index.jsp
UGM presentation by Aureus Pharma, "Improved Virtual Screening Strategies and Enrichment of Focused Libraries in Active Compounds Using Target-Oriented Databases": http://www.chemaxon.com/forum/viewpost2307.html
18 Molecular similarity
Two compounds are similar when their chemical, pharmacological or biological properties match. The more features two molecules have in common, the higher the similarity between them.
Chemical
Pharmacophore
19 Similarity measures
- Quantitative assessment of the similarity of structures
- needs a numerically tractable form
- molecular descriptors, fingerprints, structural keys
Sequences or vectors of bits or numeric values that can be compared by distance functions or similarity metrics.
20 Standard metrics
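The formulas shown on this slide are not part of the transcript. As an illustration only, the sketch below implements two metrics that are commonly used with such descriptors: the Tanimoto coefficient for bit fingerprints and the Euclidean distance for numeric vectors. Whether these are the metrics the slide listed is an assumption, and the function names are hypothetical.

```python
import math


def tanimoto(a: str, b: str) -> float:
    """Tanimoto coefficient of two equal-length bit strings of '0'/'1'."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    both = sum(x == "1" and y == "1" for x, y in zip(a, b))
    either = sum(x == "1" or y == "1" for x, y in zip(a, b))
    # Two empty fingerprints are treated as identical (a convention, not a rule).
    return both / either if either else 1.0


def euclidean(u, v) -> float:
    """Euclidean distance between two equal-length numeric vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


print(tanimoto("1100", "1010"))   # one common bit out of three set bits
print(euclidean([0, 0], [3, 4]))  # classic 3-4-5 triangle
```

A similarity such as Tanimoto grows as structures get closer, while a distance such as Euclidean shrinks; a dissimilarity can be obtained from the former as 1 minus the similarity.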
21 Topological chemical fingerprint
- hashed binary fingerprint
- encodes topological properties of the chemical graph: connectivity, edge label (bond type), node label (atom type)
- allows the comparison of two molecules with respect to their chemical structure
- Construction
- find all 0, 1, ..., n step walks in the chemical graph
- generate a bit array for each walk with a given number of bits set
- merge the bit arrays with the logical OR operation
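The three construction steps can be sketched in a few lines. This is a minimal illustration, not ChemAxon's implementation: the graph representation, the SHA-256 based hashing, the restriction to simple paths (no atom visited twice) and all function and parameter names are assumptions made for the example.

```python
import hashlib


def paths_up_to(atoms, bonds, max_steps):
    """Yield a label string for every simple path of 0..max_steps bonds."""
    neighbours = {i: [] for i in range(len(atoms))}
    for (i, j), order in bonds.items():
        neighbours[i].append((j, order))
        neighbours[j].append((i, order))

    def extend(path, label):
        yield label
        if len(path) - 1 == max_steps:
            return
        for nxt, order in neighbours[path[-1]]:
            if nxt not in path:  # assumption: do not revisit atoms
                yield from extend(path + [nxt], label + order + atoms[nxt])

    for start in range(len(atoms)):
        yield from extend([start], atoms[start])


def pattern_bits(label, length, bits_per_pattern):
    """Hash one path label to a bit array with at most bits_per_pattern bits set."""
    mask = 0
    for k in range(bits_per_pattern):
        digest = hashlib.sha256(f"{k}:{label}".encode()).digest()
        mask |= 1 << (int.from_bytes(digest[:4], "big") % length)
    return mask


def chemical_fingerprint(atoms, bonds, max_steps=3, length=64, bits_per_pattern=2):
    """Merge the bit arrays of all paths with a logical OR."""
    fingerprint = 0
    for label in paths_up_to(atoms, bonds, max_steps):
        fingerprint |= pattern_bits(label, length, bits_per_pattern)
    return fingerprint


# Heavy atoms of ethanol, C-C-O; "-" is the edge label for a single bond.
ethanol = chemical_fingerprint(["C", "C", "O"], {(0, 1): "-", (1, 2): "-"})
ethane = chemical_fingerprint(["C", "C"], {(0, 1): "-"})
print(format(ethanol, "064b"))
print(format(ethane, "064b"))
```

Because every path of a substructure also occurs in the full structure, all bits set for ethane are also set for ethanol in this sketch. Each path is hashed in both directions here; a production implementation would typically canonicalize the path label first.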
22 Construction of chemical fingerprint
23 Chemical similarity
0100010100010100010000000001101010011010100000010100000000100000
0100010100010100010000000001101010011010100000000100000000100000
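As a worked example, the two 64-bit fingerprints on this slide can be compared directly. The sketch below uses the Tanimoto coefficient; the slide does not say which metric it illustrates, so that choice is an assumption, and the helper names are hypothetical.

```python
# The two fingerprints from the slide, each written as its two original rows.
fp_a = ("01000101000101000100000000011010100110101000000101"
        "00000000100000")
fp_b = ("01000101000101000100000000011010100110101000000001"
        "00000000100000")


def popcount(x: int) -> int:
    """Number of bits set in a non-negative integer."""
    return bin(x).count("1")


def tanimoto_bits(a: int, b: int) -> float:
    """Tanimoto coefficient of two fingerprints stored as integers."""
    union = popcount(a | b)
    return popcount(a & b) / union if union else 1.0


a, b = int(fp_a, 2), int(fp_b, 2)
similarity = tanimoto_bits(a, b)
print(f"bits set: {popcount(a)} and {popcount(b)}")
print(f"Tanimoto similarity:    {similarity:.3f}")
print(f"Tanimoto dissimilarity: {1 - similarity:.3f}")
```

The fingerprints differ in a single bit, so the similarity is close to, but below, 1.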
24 Topological pharmacophore fingerprint
- encodes pharmacophore properties of molecules as frequency counts of pharmacophore point pairs at given topological distance
- allows the comparison of two molecules with respect to their pharmacophore
- Construction
- perceive pharmacophoric features
- map pharmacophore point type to atoms
- calculate length of shortest path between each pair of atoms
- assign a histogram to every pharmacophore point pair and count the frequency of the pair with respect to its distance
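The last two construction steps can be sketched as follows, assuming the pharmacophore types have already been perceived and mapped to atoms. The data layout, the distance cut-off and the function names are assumptions made for the example, not Screen's implementation.

```python
from collections import Counter, deque
from itertools import combinations


def shortest_path_lengths(n_atoms, bonds):
    """All-pairs shortest path lengths (in bonds) by BFS from every atom."""
    adjacent = {i: [] for i in range(n_atoms)}
    for i, j in bonds:
        adjacent[i].append(j)
        adjacent[j].append(i)
    lengths = {}
    for start in range(n_atoms):
        seen = {start: 0}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adjacent[u]:
                if v not in seen:
                    seen[v] = seen[u] + 1
                    queue.append(v)
        lengths[start] = seen
    return lengths


def pharmacophore_fingerprint(types, bonds, max_distance=6):
    """Count pharmacophore point pairs per (type pair, topological distance)."""
    lengths = shortest_path_lengths(len(types), bonds)
    histogram = Counter()
    for i, j in combinations(range(len(types)), 2):
        if types[i] is None or types[j] is None:
            continue  # atoms without a pharmacophore type are ignored
        d = lengths[i].get(j)
        if d is None or d > max_distance:
            continue  # disconnected or beyond the cut-off
        pair = tuple(sorted((types[i], types[j])))
        histogram[(pair, d)] += 1
    return histogram


# Hypothetical 4-atom chain: donor - (none) - hydrophobic - acceptor
types = ["donor", None, "hydrophobic", "acceptor"]
bonds = [(0, 1), (1, 2), (2, 3)]
fingerprint = pharmacophore_fingerprint(types, bonds)
for key, count in sorted(fingerprint.items()):
    print(key, count)
```

Unlike the hashed chemical fingerprint, the result is a vector of counts rather than bits, so it is compared with metrics defined on numeric vectors.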
25 Pharmacophore perception
Rule based approach
- Rule 1: The pharmacophore type of an atom is an acceptor, if
- it is a nitrogen, oxygen or sulfur, and
- it is not an amide nitrogen or sulfur, and
- it is not an aniline nitrogen, and
- it is not a sulfonyl sulfur, and
- it is not a nitro group nitrogen.
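Rule 1 translates almost literally into code. In the sketch below the chemical environment of each atom (amide, aniline, sulfonyl, nitro) is supplied as precomputed flags on a hypothetical `Atom` record; a real implementation would derive them from the structure. The clause "amide nitrogen or sulfur" is read here as "amide nitrogen or amide sulfur", which is an interpretation.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    """Hypothetical atom record with precomputed environment flags."""
    element: str
    is_amide: bool = False
    is_aniline: bool = False
    is_sulfonyl: bool = False
    is_nitro: bool = False


def is_acceptor(atom: Atom) -> bool:
    """Rule 1: decide whether an atom is a hydrogen bond acceptor."""
    if atom.element not in {"N", "O", "S"}:
        return False
    if atom.element in {"N", "S"} and atom.is_amide:
        return False
    if atom.element == "N" and atom.is_aniline:
        return False
    if atom.element == "S" and atom.is_sulfonyl:
        return False
    if atom.element == "N" and atom.is_nitro:
        return False
    return True


print(is_acceptor(Atom("O")))                 # plain oxygen
print(is_acceptor(Atom("N", is_amide=True)))  # amide nitrogen is excluded
print(is_acceptor(Atom("C")))                 # carbon never qualifies
```

Every exception to the basic rule adds another condition of this kind, which is how a rule set grows over time.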
26 Exceptions to simple rules
N-cyanomethyl piperidine
sp2 atom
exception → extra rules → large number of rules → maintenance, performance
27 Effect of pH
pH 7
pH 1
pH → pH specific rules → large number of rules → maintenance, performance
28 Pharmacophore perception
Calculation based approach
Step 1: estimation of pKa
allows the determination of the protonation state of ionizable groups at the given pH
Step 2: partial charge calculation
29 Pharmacophore perception
Calculation based approach
Step 3: hydrogen bond donor/acceptor recognition
Step 4: aromatic perception
Step 5: pharmacophore property assignment
[Figure labels: acceptor; negatively charged acceptor; acceptor and donor; hydrophobic; none]
30 Pharmacophore fingerprint
Pharmacophore type coloring: acceptor, donor, hydrophobic, none.
31 Fuzzy smoothing
32 Virtual screening using fingerprints
[Figure: the query is converted to a query fingerprint, which is compared by a metric with the target fingerprints of the targets; the closest targets are returned as hits.]
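The screening loop in the figure can be sketched as below: the query fingerprint is compared with every target fingerprint, and targets within a dissimilarity threshold are returned as hits, best first. The Tanimoto dissimilarity, the threshold value, the 8-bit toy fingerprints and the target names are all assumptions for illustration.

```python
def tanimoto_dissimilarity(a: int, b: int) -> float:
    """1 - Tanimoto coefficient for fingerprints stored as integers."""
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0
    return 1.0 - bin(a & b).count("1") / union


def screen(query, targets, threshold):
    """Return (name, dissimilarity) for all targets within the threshold."""
    hits = []
    for name, fingerprint in targets.items():
        d = tanimoto_dissimilarity(query, fingerprint)
        if d <= threshold:
            hits.append((name, d))
    return sorted(hits, key=lambda hit: hit[1])


query = 0b10110100
targets = {
    "t1": 0b10110100,  # identical to the query
    "t2": 0b10110000,  # one query bit missing
    "t3": 0b01001011,  # no bits in common
    "t4": 0b10100101,  # three bits in common, one extra
}
hits = screen(query, targets, threshold=0.5)
for name, d in hits:
    print(name, round(d, 2))
```

Each comparison is a handful of bitwise operations, which is why this step can process far more structures per second than fingerprint generation.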
33 Multiple query structures
[Figure: several queries are combined into one hypothesis fingerprint, which is compared by a metric with the target fingerprints of the targets to give the hits.]
34 Hypothesis fingerprints
Advantages
- allows faster operation
- compiles features common to each individual active
- reduces noise
Hypothesis types
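The list of hypothesis types from this slide is not part of the transcript. The sketch below shows three plausible ways of condensing the fingerprints of several actives into one hypothesis fingerprint (column-wise minimum, average and median); these three are illustrative choices and not necessarily the types Screen implements.

```python
from statistics import mean, median


def hypothesis(fingerprints, kind="minimum"):
    """Combine count fingerprints column by column into one hypothesis."""
    combine = {"minimum": min, "average": mean, "median": median}[kind]
    return [combine(column) for column in zip(*fingerprints)]


# Hypothetical pharmacophore count fingerprints of three known actives.
actives = [
    [2, 0, 1, 3],
    [1, 0, 2, 3],
    [3, 1, 1, 0],
]
for kind in ("minimum", "average", "median"):
    print(kind, hypothesis(actives, kind))
```

The minimum keeps only what every active shares, which matches the "features common to each individual active" advantage; the average and the median are more tolerant of a single outlier. Screening with one hypothesis instead of every active separately needs one comparison per target rather than one per active.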
35 Hypothesis fingerprints
36 The need for optimization
Too many hits
37 The need for optimization
Inconsistent dissimilarity values
38 Parametrized metrics
asymmetry factor
scaling factor
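The formulas for the parametrized metrics are not part of the transcript. To show what an asymmetry factor and a scaling factor can do, the sketch below defines a hypothetical parametrized variant of the Tanimoto dissimilarity: the asymmetry factor shifts the penalty between features missing from the target and extra features in the target, and the scaling factor stretches the mismatch term. The formula is an assumption chosen so that the defaults reproduce the regular Tanimoto dissimilarity; it is not ChemAxon's definition.

```python
def popcount(x: int) -> int:
    """Number of bits set in a non-negative integer."""
    return bin(x).count("1")


def parametrized_dissimilarity(query, target, asymmetry=0.5, scale=1.0):
    """Hypothetical asymmetric, scaled Tanimoto dissimilarity.

    asymmetry = 0.5 and scale = 1.0 give the regular 1 - Tanimoto value.
    asymmetry -> 1 penalizes query features missing from the target;
    asymmetry -> 0 penalizes extra target features instead.
    """
    common = popcount(query & target)
    query_only = popcount(query & ~target)
    target_only = popcount(target & ~query)
    mismatch = scale * 2.0 * (asymmetry * query_only
                              + (1.0 - asymmetry) * target_only)
    total = common + mismatch
    return mismatch / total if total else 0.0


query, target = 0b1110, 0b0110  # the target lacks one query feature
print(parametrized_dissimilarity(query, target))                 # regular
print(parametrized_dissimilarity(query, target, asymmetry=1.0))  # stricter
print(parametrized_dissimilarity(query, target, asymmetry=0.0))  # lenient
```

With an asymmetry factor other than 0.5 the value depends on which structure is the query, so swapping the arguments generally changes the result.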
39 Optimization of metrics
Step 1: optimize parameters for maximum enrichment
Step 2: validate metrics over an independent test set
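The two steps can be sketched as a small grid search. Everything specific here is an assumption made for illustration: the parametrized dissimilarity (a hypothetical asymmetric, scaled Tanimoto variant), the enrichment definition (fraction of actives among the top-ranked structures divided by the fraction of actives in the whole set), the grid search itself and the toy fingerprints. The slides do not state which optimizer Screen uses.

```python
from itertools import product


def popcount(x: int) -> int:
    return bin(x).count("1")


def dissimilarity(query, target, asymmetry, scale):
    """Hypothetical parametrized Tanimoto dissimilarity (see lead-in)."""
    common = popcount(query & target)
    mismatch = scale * 2.0 * (asymmetry * popcount(query & ~target)
                              + (1.0 - asymmetry) * popcount(target & ~query))
    total = common + mismatch
    return mismatch / total if total else 0.0


def enrichment(query, actives, decoys, params, top_n):
    """Active rate among the top_n ranked structures over the base rate."""
    ranked = sorted(
        [(dissimilarity(query, fp, *params), True) for fp in actives]
        + [(dissimilarity(query, fp, *params), False) for fp in decoys],
        key=lambda item: (item[0], item[1]),  # ties rank decoys first
    )
    found = sum(is_active for _, is_active in ranked[:top_n])
    base_rate = len(actives) / (len(actives) + len(decoys))
    return (found / top_n) / base_rate


def optimize(query, actives, decoys, top_n, grid):
    """Step 1: pick the parameters with maximum enrichment on training data."""
    return max(grid, key=lambda p: enrichment(query, actives, decoys, p, top_n))


grid = list(product([0.0, 0.25, 0.5, 0.75, 1.0], [0.5, 1.0, 2.0]))
query = 0b11110000
train_actives = [0b11110011, 0b11111100]
train_decoys = [0b11000000, 0b00110000, 0b00001111, 0b10100000]
test_actives = [0b11110101]
test_decoys = [0b10010000, 0b00001100, 0b01100000]

best = optimize(query, train_actives, train_decoys, top_n=2, grid=grid)
train_score = enrichment(query, train_actives, train_decoys, best, top_n=2)
# Step 2: validate the optimized metric on structures not used in step 1.
test_score = enrichment(query, test_actives, test_decoys, best, top_n=1)
print("best (asymmetry, scale):", best)
print("training enrichment:", train_score)
print("test enrichment:", test_score)
```

Keeping the test set separate guards against parameters that only fit the particular actives used for training.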
40 Optimization of metrics
Step 1: optimize parameters for maximum enrichment
query set
1111100010000100001000101000
query fingerprint
parametrized metric
41 Optimization of metrics
v1
v2
v3
vi
vn
42 Optimization of metrics
Step 2: validate metrics over an independent test set
query set
1111100010000100001000101000
optimized metric
query fingerprint
43 Results of Optimization
1. Similar structures get closer
0.20
0.06
0.28
44 Results of Optimization
2. Hit set size reduced
Active set: 18 mGlu-R1 antagonists
Target set: 10000 randomly selected drug-like structures
45 Results of Optimization
3. Higher enrichment
46 Results of Optimization
4. Top ranked structures are spikes
- offers a more intuitive way to evaluate the efficiency of screening
- based on sorting random set hits and known actives by dissimilarity value and counting the number of random set hits preceding each active in the sorted list
[Figure: number of spikes retrieved versus number of virtual hits; values shown: 0.014, 0.015, 0.017, 0.020, 0.022, 0.023, 0.027, 0.041, 0.043]
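The counting procedure described above can be sketched as follows. The dissimilarity values are made up for the example, the function name is hypothetical, and ranking a random-set structure before an active on a tie is an assumption (the pessimistic choice).

```python
def random_hits_before_each_active(active_values, random_values):
    """For each active, in rank order, count the random-set structures
    that precede it when everything is sorted by dissimilarity."""
    merged = sorted(
        [(d, 1) for d in active_values] + [(d, 0) for d in random_values]
    )  # on equal dissimilarity the random-set entry (0) sorts first
    counts, seen_random = [], 0
    for _, is_active in merged:
        if is_active:
            counts.append(seen_random)
        else:
            seen_random += 1
    return counts


# Illustrative dissimilarity values to the query or hypothesis.
active_values = [0.10, 0.25, 0.40]          # known actives (spikes)
random_values = [0.20, 0.30, 0.35, 0.50]    # random set structures
counts = random_hits_before_each_active(active_values, random_values)
print(counts)  # virtual hits needed before each spike is retrieved
```

Plotting the index of each active against these counts gives the curve of spikes retrieved versus virtual hits; the steeper it rises, the better the screen separates actives from the random set.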
47 Results
ACE (pharmacophore similarity)
48 Results
NPY-5 (pharmacophore similarity)
49 Results
β2-adrenoceptor (pharmacophore similarity)
50 3D flexible search
- Expected top performance 200 structures/s