Chemoinformatics tools for lead discovery

1
Chemoinformatics tools for lead discovery

Peter Willett, University of Sheffield, UK

2
Overview of talk

Approaches to virtual screening
Fingerprint-based similarity searching
Turbo similarity searching
Conclusions

3
Virtual screening

The huge number of molecules available in public
and in-house databases means that there is a
requirement for tools to rank compounds in order
of decreasing probability of activity
Range of methods available, varying in their
sophistication and in the amount of information
that must be available
Use of structure-based methods when an X-ray
structure for the biological target is available
If this is not the case then must make use of
information about (potential) ligands

4
Ligand-Based Methods

Similarity searching
Use when just a single bioactive reference
structure is available
3D pharmacophore searching
Use when it has been possible to carry out a
pharmacophore mapping exercise
Machine learning
Use when a fair number of both actives and
inactives have been identified

5
Similarity Searching I

Use of a similarity measure to quantify the
resemblance between an active target, or
reference, structure and each database structure
The similar property principle means that
high-ranked structures are likely to have similar
activities to that of the target structure
Similarity searching hence provides an obvious
way of following-up on an initial active

6
Similarity searching II

Many ways in which the similarity between two
molecules can be computed
A similarity measure has two components
A structure representation
A similarity coefficient to compare two
representations
Most operational systems use similarity measures
based on 2D fingerprints and the Tanimoto
coefficient

7
Fragment bit-strings (fingerprints)

Originally developed for 2D substructure search
Similarity is based on the fragments common to
two molecules
Widely used in both in-house and commercial
chemoinformatics systems

8
Similarity coefficients

Tanimoto coefficient for binary bit strings:
Tanimoto = C / (T + D - C)
C = number of bits set in common between Target and
Database structure
T = number of bits set in Target
D = number of bits set in Database structure
Values between zero (no bits in common) and unity
(identical fingerprints)
Many other, related similarity coefficients
exist
Tversky, cosine, Euclidean distance ...
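
As a minimal sketch of the Tanimoto coefficient defined on this slide, the Python below represents each fingerprint as a set of "on" bit positions. The function name, the example bit positions and the convention of returning zero for two empty fingerprints are assumptions made for illustration only.

```python
def tanimoto(target_bits, database_bits):
    """Tanimoto coefficient C / (T + D - C) for two sets of 'on' bit positions."""
    c = len(target_bits & database_bits)  # C: bits set in common
    t = len(target_bits)                  # T: bits set in the target
    d = len(database_bits)                # D: bits set in the database structure
    if t + d - c == 0:
        return 0.0  # assumed convention when both fingerprints are empty
    return c / (t + d - c)


# Hypothetical fingerprints, listed as the positions of their set bits.
target = {1, 4, 7, 9}
candidate = {1, 4, 8}
print(tanimoto(target, candidate))  # 2 / (4 + 3 - 2) = 0.4
```

Identical fingerprints give 1.0 and fingerprints with no bits in common give 0.0, matching the range stated above.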

9
Combination of search techniques using data
fusion I

Tanimoto/fingerprint measures most common but
many other types, e.g.,
Computed physicochemical properties
3D grid describing the molecular electrostatic
potential
These reflect different molecular
characteristics, so may enhance search
performance by using more than one similarity
measure
Data fusion or consensus scoring

10
Combination of search techniques using data
fusion II

Combination of different rankings of the same
sets of molecules
Two basic approaches
Generate rankings from the same reference molecule
using different similarity measures (similarity
fusion)
Generate rankings from different reference
molecules using the same similarity measure
(group fusion)
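
Similarity fusion could be sketched as follows. The slides do not say how the rankings are combined, so the rank-sum rule here is an assumption, as are the choice of the Tanimoto and cosine coefficients as the two measures, the function names and the toy fingerprints (sets of "on" bit positions).

```python
import math


def tanimoto(a, b):
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def cosine(a, b):
    # Cosine coefficient for binary fingerprints: C / sqrt(T * D).
    return len(a & b) / math.sqrt(len(a) * len(b)) if a and b else 0.0


def rank_positions(reference, database, measure):
    """Map each molecule name to its rank (1 = most similar) under one measure."""
    ordered = sorted(database, key=lambda n: measure(reference, database[n]),
                     reverse=True)
    return {name: pos for pos, name in enumerate(ordered, start=1)}


def similarity_fusion(reference, database, measures):
    """Fuse the rankings from several measures by summing rank positions
    (an assumed fusion rule; lower sum = better)."""
    rankings = [rank_positions(reference, database, m) for m in measures]
    summed = {name: sum(r[name] for r in rankings) for name in database}
    return sorted(summed, key=summed.get)


reference = {1, 2, 3, 4}
database = {"x": {1, 2, 3}, "y": {1, 2, 3, 4, 5, 6, 7, 8}, "z": {9}}
fused = similarity_fusion(reference, database, [tanimoto, cosine])
# Both measures rank x first, y second and z last, so the fused order is x, y, z.
```

Ranks are fused here because coefficients from different measures are not necessarily on comparable scales.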

11
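Group fusion

[Figure: ranked list for Reference 1]

12
After truncation to required rank

[Figure: truncated ranked lists for Reference 1, Reference 2 and Reference 3]

13
Group Fusion

[Figure: fused list and final truncated list, labelled "r 1000" and "r 2000"; legend: New Active, Active found in earlier list]
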
14
Group fusion rules

Useful performance increases, even with just 10
actives, as better coverage of structural space
with multiple starting points
Improvement most obvious when searching for
heterogeneous sets of active molecules
Best results obtained by
Fusing similarity coefficient values, rather than
ranks
Re-ranking using the maximum of the similarity
values associated with each molecule
Using the Tanimoto coefficient
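
The rules above (fuse similarity values, re-rank on the maximum, use the Tanimoto coefficient) can be sketched as below. Fingerprints are modelled as sets of "on" bit positions; the function names and the toy molecules are hypothetical.

```python
def tanimoto(a, b):
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def max_group_fusion(references, database):
    """Score each database molecule by its maximum Tanimoto similarity to any
    reference structure, then rank in order of decreasing fused score."""
    fused = {
        name: max(tanimoto(ref, fp) for ref in references)
        for name, fp in database.items()
    }
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Two hypothetical reference actives from different regions of structural space.
references = [{1, 2, 3}, {7, 8, 9}]
database = {"mol_a": {1, 2, 3, 4}, "mol_b": {7, 8}, "mol_c": {5, 6}}
ranking = max_group_fusion(references, database)
# mol_a scores 0.75 (from the first reference), mol_b scores 2/3 (from the
# second), and mol_c scores 0, so the order is mol_a, mol_b, mol_c.
```

Taking the maximum means a molecule only needs to resemble one of the references to rank highly, which is consistent with the better coverage of heterogeneous actives noted above.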

15
Turbo similarity searching I

Similar property principle: nearest neighbours
are likely to exhibit the same activity as the
reference structure
Group fusion improves the identification of
active compounds
Potential for further enhancements by group
fusion of rankings from the reference structure
and from its assumed active nearest neighbours
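
A minimal sketch of this idea, assuming MAX fusion of Tanimoto scores as the fusion rule: run a conventional search, treat the top k hits as assumed actives, then fuse the searches from the reference and those neighbours. Function names, the value of k and the toy fingerprints (sets of "on" bit positions) are illustrative assumptions.

```python
def tanimoto(a, b):
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def rank(reference, database):
    """Conventional similarity search: names in decreasing similarity order."""
    return sorted(database, key=lambda n: tanimoto(reference, database[n]),
                  reverse=True)


def turbo_search(reference, database, k):
    # Step 1: conventional search from the single reference structure.
    initial = rank(reference, database)
    # Step 2: assume the k nearest neighbours are also active.
    neighbours = [database[name] for name in initial[:k]]
    # Step 3: MAX group fusion over the reference and its assumed actives.
    refs = [reference] + neighbours
    fused = {name: max(tanimoto(r, fp) for r in refs)
             for name, fp in database.items()}
    return sorted(fused, key=fused.get, reverse=True)


reference = {1, 2, 3, 4}
database = {
    "n1": {1, 2, 3, 5, 6},   # nearest neighbour of the reference
    "other": {4, 8, 9, 10},  # weakly similar to the reference only
    "far": {5, 6, 7},        # similar to n1 but not to the reference
}
conventional = rank(reference, database)       # n1, other, far
turbo = turbo_search(reference, database, 1)   # n1, far, other
```

In this toy case "far" shares no bits with the reference but is promoted because it resembles the assumed-active neighbour.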

16
Turbo similarity searching II

[Figure: REFERENCE STRUCTURE, RANKED LIST, NEAREST NEIGHBOURS]

17
Experimental details

MDL Drug Data Report (MDDR) dataset of 11
activity classes and 102K structures
In all, 8294 actives in the 11 classes, with
(turbo) similarity searches being carried out
using each of these as the reference structure
ECFP_4 fingerprints/Tanimoto coefficient
MAX group fusion on similarity scores
Increasing numbers of nearest neighbours

18
Numbers of nearest neighbours
19
Upper and lower bound experiments
20
Rationale for upper bound results

The true actives in the set of assumed actives
yield significant enhancements in performance
The true inactives in the set of assumed actives
have little effect on performance
Taken together, the two groups of compounds yield
the observed net enhancement

21
Use of machine-learning methods for similarity
searching I

Turbo similarity searching uses group fusion to
enhance conventional similarity searching
Machine learning is a more powerful virtual
screening tool than similarity searching
But requires a training-set containing known
actives and inactives
Given an active reference structure, a
training-set can be generated by
Using the k nearest neighbours of the reference
structure as the actives
Using k randomly chosen, low-similarity compounds
as the inactives
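
The training-set construction just described might look like the sketch below. The slides do not define "low-similarity", so the 0.2 Tanimoto cut-off is an assumption, as are the function name, the fixed random seed and the toy fingerprints (sets of "on" bit positions).

```python
import random


def tanimoto(a, b):
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def build_training_set(reference, database, k, low_similarity=0.2, seed=0):
    """Return (assumed actives, assumed inactives) for one reference structure.

    Actives: the k nearest neighbours of the reference.
    Inactives: up to k compounds drawn at random from those whose similarity
    to the reference is at most `low_similarity` (an assumed threshold).
    """
    scores = {name: tanimoto(reference, fp) for name, fp in database.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    actives = ranked[:k]
    pool = [name for name in ranked[k:] if scores[name] <= low_similarity]
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    inactives = rng.sample(pool, min(k, len(pool)))
    return actives, inactives


reference = {1, 2, 3, 4}
database = {
    "a": {1, 2, 3, 4, 5},            # similarity 0.8
    "b": {1, 2, 3},                  # similarity 0.75
    "f": {1, 2, 8},                  # similarity 0.4, too similar to be inactive
    "c": {4, 9, 10, 11, 12, 13},     # similarity 1/9
    "d": {20, 21},                   # similarity 0
    "e": {30},                       # similarity 0
}
actives, inactives = build_training_set(reference, database, k=2)
```

The resulting labelled sets could then be passed to any machine-learning method that accepts active and inactive training examples.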

22
Use of machine-learning methods for similarity
searching II
23
Results I

Experiments with the MDDR dataset show that group
fusion is better than machine-learning methods
when averaged over all of the classes
However, group fusion is inferior for the most
diverse datasets (as measured by the mean
pair-wise similarities)
Additional searches using 10 MDDR activity
classes that are as structurally diverse as
possible

24
Results II
25
Conclusions I

Fingerprint-based similarity searching using a
known reference structure is long-established in
chemoinformatics
When small numbers of actives are available,
group fusion will enhance performance when the
sought actives are structurally heterogeneous

26
Conclusions II

Can also enhance conventional similarity search,
even if there is just a single active, by
assuming that the nearest neighbours are also
active
Can be effected in two ways
Use of group fusion to combine similarity
rankings (overall best approach)
Use of substructural analysis to compute fragment
weights (best with highly heterogeneous sets of
actives)

27
Acknowledgements

Collaborators
Jerome Hert, Martin Whittle and David Wilton
Pierre Acklin, Kamal Azzaoui, Edgar Jacoby and
Ansgar Schuffenhauer
Alexander Alex, Jens Loesel and Jonathan Mason
Funding, software and data support
Barnard Chemical Information, Daylight Chemical
Information Systems, MDL Information Systems,
Novartis Institutes for BioMedical Research,
Pfizer Global Research and Development, Royal
Society, Scitegic, Tripos, and the Wolfson
Foundation