Chemoinformatics tools for lead discovery - PowerPoint PPT Presentation

1
Chemoinformatics tools for lead discovery
  • Peter Willett, University of Sheffield, UK

2
Overview of talk
  • Approaches to virtual screening
  • Fingerprint-based similarity searching
  • Turbo similarity searching
  • Conclusions

3
Virtual screening
  • The huge numbers of molecules available in public
    and in-house databases mean that there is a
    requirement for tools to rank compounds in order
    of decreasing probability of activity
  • Range of methods available, varying in their
    sophistication and in the amount of information
    that they require
  • Use of structure-based methods when an X-ray
    structure for the biological target is available
  • If this is not the case, then must make use of
    information about (potential) ligands

4
Ligand-Based Methods
  • Similarity searching
      • Use when just a single bioactive reference
        structure is available
  • 3D pharmacophore searching
      • Use when it has been possible to carry out a
        pharmacophore mapping exercise
  • Machine learning
      • Use when a fair number of both actives and
        inactives have been identified

5
Similarity Searching I
  • Use of a similarity measure to quantify the
    resemblance between an active target, or
    reference, structure and each database structure
  • The similar property principle means that
    high-ranked structures are likely to have similar
    activities to that of the target structure
  • Similarity searching hence provides an obvious
    way of following up on an initial active

6
Similarity searching II
  • Many ways in which the similarity between two
    molecules can be computed
  • A similarity measure has two components
      • A structure representation
      • A similarity coefficient to compare two
        representations
  • Most operational systems use similarity measures
    based on 2D fingerprints and the Tanimoto
    coefficient

7
Fragment bit-strings (fingerprints)
  • Originally developed for 2D substructure search
  • Similarity is based on the fragments common to
    two molecules
  • Widely used in both in-house and commercial
    chemoinformatics systems

8
Similarity coefficients
  • Tanimoto coefficient for binary bit strings:
    S = C / (T + D - C), where
      • C = number of bits set in common between the
        Target and the Database structure
      • T = number of bits set in the Target
      • D = number of bits set in the Database structure
  • Values between zero (no bits in common) and unity
    (identical fingerprints)
  • Many other, related similarity coefficients
    exist
      • Tversky, cosine, Euclidean distance, ...
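
The Tanimoto calculation described on this slide can be sketched in a few lines of Python. This is only an illustration: fingerprints are assumed to be represented as sets of "on" bit positions, the function name `tanimoto` is hypothetical, and returning 0.0 for two empty fingerprints is an arbitrary convention chosen here.

```python
def tanimoto(target_bits, database_bits):
    """Tanimoto coefficient for two fingerprints given as sets of on-bit positions."""
    target, database = set(target_bits), set(database_bits)
    c = len(target & database)  # C: bits set in common
    t = len(target)             # T: bits set in the target
    d = len(database)           # D: bits set in the database structure
    denominator = t + d - c
    # Assumption: two empty fingerprints are given a similarity of 0.0.
    return c / denominator if denominator else 0.0


target = {1, 4, 7, 9}
candidate = {1, 4, 8}
score = tanimoto(target, candidate)  # C = 2, T = 4, D = 3, so 2 / (4 + 3 - 2)
print(score)
```

In practice a chemoinformatics toolkit would supply both the fingerprints and the coefficient; the sets above are toy data.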

9
Combination of search techniques using data fusion I
  • Tanimoto/fingerprint measures are the most common,
    but many other types exist, e.g.,
      • Computed physicochemical properties
      • 3D grid describing the molecular electrostatic
        potential
  • These reflect different molecular
    characteristics, so using more than one similarity
    measure may enhance search performance
      • Data fusion or consensus scoring

10
Combination of search techniques using data fusion II
  • Combination of different rankings of the same
    set of molecules
  • Two basic approaches
      • Generate rankings from the same reference molecule
        using different similarity measures (similarity
        fusion)
      • Generate rankings from different reference
        molecules using the same similarity measure
        (group fusion)

11
Group fusion
  [Figure; label: Reference 1]

12
After truncation to required rank
  [Figure; labels: Reference 1, Reference 2, Reference 3]

13
Group Fusion
  [Figure; labels: Fused, Final truncated, r 1000, r 2000;
   legend: New Active, Active found in earlier list]

14
Group fusion rules
  • Useful performance increases, even with just 10
    actives, because multiple starting points give
    better coverage of structural space
  • Improvement most obvious when searching for
    heterogeneous sets of active molecules
  • Best results obtained by
      • Fusing similarity coefficient values, rather than
        ranks
      • Re-ranking using the maximum of the similarity
        values associated with each molecule
      • Using the Tanimoto coefficient
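
A minimal sketch of the MAX fusion rule listed above, assuming fingerprints are sets of on-bit positions and the database is a plain dictionary. The names `tanimoto` and `max_group_fusion` and the toy fingerprints are illustrative assumptions, not taken from the talk.

```python
def tanimoto(a, b):
    """Tanimoto coefficient for two sets of on-bit positions."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def max_group_fusion(references, database):
    """Score each database molecule by its maximum Tanimoto similarity
    to any reference, then rank in decreasing order of that fused score."""
    if not references:
        raise ValueError("at least one reference fingerprint is required")
    fused = {
        mol_id: max(tanimoto(ref, fp) for ref in references)
        for mol_id, fp in database.items()
    }
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Toy data (hypothetical): two references and three database molecules.
references = [{1, 2, 3}, {7, 8, 9}]
database = {
    "mol_a": {1, 2, 3, 4},  # resembles the first reference
    "mol_b": {7, 8},        # resembles the second reference
    "mol_c": {20, 21},      # resembles neither
}
ranking = max_group_fusion(references, database)
for mol_id, score in ranking:
    print(mol_id, round(score, 3))
```

Fusing the similarity values directly, rather than the rank positions, keeps a molecule near the top as soon as it is close to any one of the references.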

15
Turbo similarity searching I
  • Similar property principle: nearest neighbours
    are likely to exhibit the same activity as the
    reference structure
  • Group fusion improves the identification of
    active compounds
  • Potential for further enhancements by group
    fusion of rankings from the reference structure
    and from its assumed active nearest neighbours
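
The idea can be sketched as a two-pass search: a conventional search, followed by MAX group fusion over the reference and its top-ranked neighbours. This is a sketch under stated assumptions (fingerprints as sets of on-bit positions, hypothetical function names, toy data), not the implementation used in the experiments.

```python
def tanimoto(a, b):
    """Tanimoto coefficient for two sets of on-bit positions."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def conventional_search(reference, database):
    """Rank database molecule ids by decreasing similarity to the reference."""
    return sorted(database, key=lambda m: tanimoto(reference, database[m]),
                  reverse=True)


def turbo_search(reference, database, n_neighbours):
    """Treat the top n_neighbours of a conventional search as assumed actives,
    then re-rank the database by MAX fusion over reference + neighbours."""
    initial = conventional_search(reference, database)
    queries = [reference] + [database[m] for m in initial[:n_neighbours]]
    fused = {m: max(tanimoto(q, fp) for q in queries)
             for m, fp in database.items()}
    return sorted(fused, key=fused.get, reverse=True)


# Toy data (hypothetical fingerprints).
reference = {1, 2, 3, 4}
database = {
    "near": {1, 2, 3, 5},          # nearest neighbour of the reference
    "other": {2, 3, 10, 11, 12},
    "second_hop": {1, 5, 6, 7},    # closer to "near" than to the reference
}
conventional = conventional_search(reference, database)
turbo = turbo_search(reference, database, n_neighbours=1)
print(conventional)
print(turbo)
```

In this toy case the molecule that resembles the assumed-active neighbour more than the reference itself moves up the ranking in the second pass.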

16
Turbo similarity searching II
  [Figure; labels: REFERENCE STRUCTURE, RANKED LIST,
   NEAREST NEIGHBOURS]

17
Experimental details
  • MDL Drug Data Report (MDDR) dataset of 11
    activity classes and 102K structures
  • In all, 8294 actives in the 11 classes, with
    (turbo) similarity searches being carried out
    using each of these as the reference structure
  • ECFP_4 fingerprints/Tanimoto coefficient
  • MAX group fusion on similarity scores
  • Increasing numbers of nearest neighbours

18
Numbers of nearest neighbours

19
Upper and lower bound experiments

20
Rationale for upper bound results
  • The true actives in the set of assumed actives
    yield significant enhancements in performance
  • The true inactives in the set of assumed actives
    have little effect on performance
  • Taken together, the two groups of compounds yield
    the observed net enhancement

21
Use of machine-learning methods for similarity searching I
  • Turbo similarity searching uses group fusion to
    enhance conventional similarity searching
  • Machine learning is a more powerful virtual
    screening tool than similarity searching
      • But requires a training set containing known
        actives and inactives
  • Given an active reference structure, a
    training set can be generated by
      • Using the k nearest neighbours of the reference
        structure as the actives
      • Using k randomly chosen, low-similarity compounds
        as the inactives
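
The training-set construction above might look as follows. The slide does not say how "low-similarity" is defined, so the cut-off `max_inactive_similarity`, the fixed random seed, and all function names here are assumptions made for illustration.

```python
import random


def tanimoto(a, b):
    """Tanimoto coefficient for two sets of on-bit positions."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def build_training_set(reference, database, k,
                       max_inactive_similarity=0.2, seed=0):
    """Return (assumed_actives, assumed_inactives) as lists of molecule ids."""
    scores = {m: tanimoto(reference, fp) for m, fp in database.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    # Assumed actives: the k nearest neighbours of the reference.
    assumed_actives = ranked[:k]
    # Assumed inactives: k random picks among the remaining molecules whose
    # similarity is at or below the (assumed) low-similarity cut-off.
    pool = [m for m in ranked[k:] if scores[m] <= max_inactive_similarity]
    rng = random.Random(seed)
    assumed_inactives = rng.sample(pool, min(k, len(pool)))
    return assumed_actives, assumed_inactives


# Toy data (hypothetical fingerprints).
reference = {1, 2, 3, 4}
database = {
    "m1": {1, 2, 3, 4, 5},
    "m2": {1, 2, 3},
    "m3": {1, 2, 9},
    "m4": {1, 20, 21, 22, 23, 24},
    "m5": {30, 31},
    "m6": {40, 41},
}
actives, inactives = build_training_set(reference, database, k=2)
print(actives, inactives)
```

The two lists can then be passed as labelled examples to whichever machine-learning method is being used.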

22
Use of machine-learning methods for similarity searching II

23
Results I
  • Experiments with the MDDR dataset show that group
    fusion is better than machine-learning methods when
    averaged over all of the classes
  • However, group fusion is inferior for the most
    diverse datasets (as measured by the mean
    pair-wise similarities)
  • Additional searches using 10 MDDR activity
    classes that are as structurally diverse as
    possible

24
Results II

25
Conclusions I
  • Fingerprint-based similarity searching using a
    known reference structure is long-established in
    chemoinformatics
  • When small numbers of actives are available,
    group fusion will enhance performance when the
    sought actives are structurally heterogeneous

26
Conclusions II
  • Can also enhance conventional similarity search,
    even if there is just a single active, by
    assuming that the nearest neighbours are also
    active
  • Can be effected in two ways
      • Use of group fusion to combine similarity
        rankings (overall best approach)
      • Use of substructural analysis to compute fragment
        weights (best with highly heterogeneous sets of
        actives)

27
Acknowledgements
  • Collaborators
      • Jerome Hert, Martin Whittle and David Wilton
      • Pierre Acklin, Kamal Azzaoui, Edgar Jacoby and
        Ansgar Schuffenhauer
      • Alexander Alex, Jens Loesel and Jonathan Mason
  • Funding, software and data support
      • Barnard Chemical Information, Daylight Chemical
        Information Systems, MDL Information Systems,
        Novartis Institutes for BioMedical Research,
        Pfizer Global Research and Development, Royal
        Society, Scitegic, Tripos, and the Wolfson
        Foundation