A Multiobjective Approach to Combinatorial Library Design - PowerPoint PPT Presentation

1
A Multiobjective Approach to Combinatorial
Library Design
  • Val Gillet
  • University of Sheffield, UK

2
Outline
  • SELECT
  • GA-based program for combinatorial library design
  • Combinatorial subset selection in product-space
  • Multiobjective optimisation via weighted-sum
    fitness function
  • Limitations of a weighted-sum approach
  • MoSELECT
  • Multiobjective optimisation via MOGA

3
Library Design is a Multiobjective Optimisation
Problem
  • Early HTS results disappointing
  • Low hit rates
  • Hits too lipophilic, too flexible, with high molecular
    weights
  • Diverse libraries
  • Distance-based/cell-based diversity
  • Bioavailability, cost, ease of synthesis
  • Focused/targeted libraries
  • Similarity to known active, predicted active by
    QSAR model, fit to receptor site
  • Bioavailability, cost, ...

4
Product-Based Library Design
  • A two-component combinatorial library can be
    represented by a 2D array
  • A combinatorial subset can be defined by
    intersecting rows and columns of the array
  • Exploring all combinatorial subsets is equivalent
    to testing all permutations of the rows and
    columns of the array
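
The row/column picture above can be sketched in a few lines of Python. This is only an illustration: the function names `combinatorial_subset` and `count_subsets` and the 4 x 5 toy library are assumptions, not part of SELECT.

```python
from itertools import product
from math import comb

def combinatorial_subset(rows, cols):
    """Return the (row, col) cells where the chosen rows and columns intersect."""
    return list(product(rows, cols))

def count_subsets(n_rows, n_cols, k_rows, k_cols):
    """Number of distinct k_rows x k_cols combinatorial subsets of the full array."""
    return comb(n_rows, k_rows) * comb(n_cols, k_cols)

# Hypothetical 4 x 5 virtual library: pick 2 rows and 3 columns.
subset = combinatorial_subset(rows=[0, 2], cols=[1, 3, 4])
n_choices = count_subsets(4, 5, 2, 3)
```

Calling `count_subsets(100, 100, 30, 30)` for a 100 x 100 library with 30 x 30 subsets gives a number far too large to enumerate, which is the motivation for a stochastic search.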

5
Selecting Combinatorial Subsets Using a GA
  • Chromosome encoding
  • each chromosome represents a combinatorial subset
    as an integer string
  • one partition for each reactant pool
  • the size of a partition equals the no. of
    reactants required from the corresponding pool
  • Crossover, mutation and roulette wheel parent
    selection are used to evolve new potential
    solutions
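
A minimal sketch of the encoding and operators described above, assuming reactants are identified by their index in each pool. The operator details (per-partition one-point crossover with repair, single-gene mutation) are illustrative choices and are not taken from the SELECT implementation; all function names are hypothetical.

```python
import random

def random_chromosome(pool_sizes, picks, rng):
    """One partition per reactant pool; each partition holds distinct reactant indices."""
    return [rng.sample(range(n), k) for n, k in zip(pool_sizes, picks)]

def mutate(chrom, pool_sizes, rng):
    """Swap one reactant in one partition for an unused reactant from the same pool."""
    child = [list(p) for p in chrom]
    p = rng.randrange(len(child))
    unused = [i for i in range(pool_sizes[p]) if i not in child[p]]
    if unused:
        child[p][rng.randrange(len(child[p]))] = rng.choice(unused)
    return child

def crossover(a, b, rng):
    """Per-partition one-point crossover, repaired so reactants stay distinct."""
    child = []
    for pa, pb in zip(a, b):
        cut = rng.randrange(1, len(pa)) if len(pa) > 1 else 0
        head = pa[:cut]
        merged = head + [x for x in pb if x not in head]
        child.append(merged[:len(pa)])
    return child

def roulette_select(population, fitnesses, rng):
    """Fitness-proportionate selection; assumes non-negative fitnesses, higher is better."""
    return rng.choices(population, weights=fitnesses, k=1)[0]

rng = random.Random(0)
parent_a = random_chromosome([100, 100], [30, 30], rng)
parent_b = random_chromosome([100, 100], [30, 30], rng)
offspring = mutate(crossover(parent_a, parent_b, rng), [100, 100], rng)
```

The repair step in `crossover` matters because a partition with a repeated reactant would no longer describe a valid combinatorial subset.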

6
Multiobjective Optimisation in SELECT
  • Weighted-sum fitness function
  • enumerate the combinatorial library represented
    by a chromosome
  • calculate descriptors for molecules in the
    library
  • Objectives are scaled and user defined weights
    are applied
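
As a rough sketch of a weighted-sum fitness of this kind: each objective is min-max scaled and multiplied by a user-defined weight. The objective names, bounds, and the use of signed weights (positive to reward, negative to penalise) are assumptions made for the example, not details from SELECT.

```python
def weighted_sum_fitness(objectives, weights, bounds):
    """Scale each objective to [0, 1] using its (lo, hi) bounds, then sum with weights."""
    total = 0.0
    for name, value in objectives.items():
        lo, hi = bounds[name]
        scaled = (value - lo) / (hi - lo) if hi > lo else 0.0
        total += weights[name] * scaled
    return total

# Hypothetical objective values for one candidate library.
objectives = {"diversity": 5.0, "cost": 2.0}
bounds = {"diversity": (0.0, 10.0), "cost": (0.0, 4.0)}

balanced = weighted_sum_fitness(objectives, {"diversity": 1.0, "cost": -1.0}, bounds)
diversity_only = weighted_sum_fitness(objectives, {"diversity": 2.0, "cost": 0.0}, bounds)
```

Changing the weights changes which library scores best, so each choice of weights steers the search toward a single compromise.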

7
Multiobjective Optimisation in SELECT cont.
  • Diversity indices
  • distance-based (e.g. sum of pairwise
    dissimilarities and Daylight fingerprints)
  • cell-based
  • Physical property terms
  • minimise the difference between the distribution
    in the library and some reference distribution,
    e.g.
  • drug-like profile derived from WDI
  • Cost
  • minimise the cost of the library
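
The two kinds of term above can be illustrated as follows. This is a sketch under stated assumptions: fingerprints are represented as Python sets of on-bit indices (standing in for Daylight fingerprints), Tanimoto is used as the similarity coefficient, the pairwise sum is normalised by the number of pairs, and the profile term is a sum of absolute differences between binned percentages. None of these choices is confirmed by the slides.

```python
from itertools import combinations

def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0

def mean_pairwise_dissimilarity(fps):
    """Sum of pairwise (1 - Tanimoto) values, normalised by the number of pairs."""
    pairs = list(combinations(fps, 2))
    if not pairs:
        return 0.0
    return sum(1.0 - tanimoto(a, b) for a, b in pairs) / len(pairs)

def profile_difference(values, bin_edges, reference_pct):
    """Sum of absolute differences between the library's binned percentages
    and a reference profile (e.g. one derived from a drug database)."""
    counts = [0] * (len(bin_edges) - 1)
    for v in values:
        for i in range(len(counts)):
            if bin_edges[i] <= v < bin_edges[i + 1]:
                counts[i] += 1
                break
    n = len(values)
    library_pct = [100.0 * c / n if n else 0.0 for c in counts]
    return sum(abs(lib - ref) for lib, ref in zip(library_pct, reference_pct))

# Hypothetical fingerprints and molecular weights for a tiny library.
diversity = mean_pairwise_dissimilarity([{1, 2}, {2, 3}, {1, 2}])
mw_term = profile_difference([100, 250, 350, 450], [0, 200, 400, 600], [50, 50, 0])
```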

8
Library Enumeration in SELECT
  • Virtual library is enumerated upfront
  • ADEPT (A Daylight Enumeration and Profiling Tool)
  • Identify potential reactants
  • Filter out unwanted ones
  • Enumerate virtual library
  • Reaction Toolkit (Reaction transforms MTZ
    language)
  • Descriptors are calculated upfront
  • Combinatorial subset accessed via fast lookup
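
The benefit of enumerating and profiling upfront is that evaluating a candidate subset reduces to table lookups. A minimal sketch, assuming a two-component library and a placeholder descriptor function (`fake_mw` is invented for the example and does not model any real chemistry):

```python
def build_descriptor_table(n_rows, n_cols, descriptor):
    """Compute a descriptor once for every product of the virtual library."""
    return [[descriptor(i, j) for j in range(n_cols)] for i in range(n_rows)]

def subset_descriptors(table, rows, cols):
    """Fetch precomputed descriptor values for the products in a combinatorial subset."""
    return [table[i][j] for i in rows for j in cols]

def fake_mw(i, j):
    """Hypothetical stand-in for a real descriptor calculation."""
    return 100.0 + 10 * i + j

table = build_descriptor_table(3, 4, fake_mw)      # done once, before the search
values = subset_descriptors(table, [0, 2], [1, 3])  # done for every chromosome
```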

9
Example Amide Library
  • 10K virtual library
  • 100 amines x 100 carboxylic acids
  • 30 x 30 amide subsets
  • WDI: World Drug Index
  • Reactant-based selection: diversity (Diversity =
    0.564)

[Figure: molecular weight profile of the WDI; x-axis: molecular weight (0 to 800), y-axis: percentage of compounds (0 to 25)]

10
Limitations of a Weighted-Sum Fitness Function
  • Definition of fitness function difficult
    especially for different types of objectives
  • e.g. molecular weight profile and cost
  • Setting of weights is non-intuitive
  • Can result in regions of search space being
    obscured especially when objectives are in
    competition
  • Difficult to monitor progress since >1 objective
    must be followed simultaneously
  • A single solution is found

11
Varying Weights in SELECT
  • Objectives are in competition resulting in
    trade-offs
  • A family of alternative solutions exists that are
    all equivalent

12
Multiobjective Optimisation
  • Evolutionary algorithms, e.g., GAs
  • operate with a population of individuals
  • well suited to search for multiple solutions in
    parallel
  • readily adapted to deal with multiobjective
    optimisation
  • MOGA: MultiObjective Genetic Algorithm
  • Fonseca & Fleming. IEEE Transactions on Systems,
    Man, and Cybernetics, Part A: Systems and Humans,
    28(1), 1998, 26-37.

13
MOGA
  • Multiple objectives are handled independently
    without summation and without weights
  • A hyper-surface is mapped out in the search space
  • represents a continuum of solutions where all
    solutions are seen as equivalent
  • represents compromises or trade-offs between the
    various objectives
  • solutions are called non-dominated, or Pareto
    solutions.
  • A family of non-dominated solutions is sought
    rather than a single solution

14
Dominance Pareto Ranking
  • A non-dominated individual is one where an
    improvement in one objective results in a
    deterioration in one or more of the other
    objectives when compared with the other
    individuals in the population

[Figure: individuals A and B plotted against objectives f1 and f2]
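
Dominance and ranking can be sketched as below, assuming every objective is to be minimised. The rank used here is the number of population members that dominate an individual, so non-dominated individuals get rank 0; this follows the general idea of Pareto ranking and is not claimed to match MoSELECT's code. The five sample points are invented.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better
    in at least one (all objectives minimised)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_ranks(points):
    """Rank of each individual = number of individuals that dominate it."""
    return [sum(dominates(q, p) for q in points) for p in points]

# Hypothetical population scored on two objectives (f1, f2).
population = [(1, 5), (2, 3), (4, 1), (3, 4), (5, 5)]
ranks = pareto_ranks(population)
front = [p for p, r in zip(population, ranks) if r == 0]
```

Here `(3, 4)` is dominated only by `(2, 3)`, while `(5, 5)` is dominated by every other point, so the two receive different ranks even though neither is on the front.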
15
SELECT
16
MoSELECT Search Progress
17
Family of Solutions
  • Each run of MoSELECT results in a family of
    solutions
  • Finding the same coverage of solutions using
    SELECT would require multiple runs using various
    combinations of weights
  • One run of MoSELECT takes the same CPU time as
    one run of SELECT

[Figure: family of solutions after 5000 iterations]
18
Focused Library: Aminothiazoles
  • α-bromoketones and thioureas extracted from ACD
  • ADEPT used to
  • filter reactants (MW < 300, RB < 8)
  • enumerate virtual library → 12850 products (74
    α-bromoketones x 170 thioureas)
  • MoSELECT used to design 15 x 30 subsets optimised
    on
  • Similarity to a target compound (Daylight
    fingerprints)
  • Cost (/g)

19
MoSELECT Solutions 1
[Figure: MoSELECT solutions at 0 iterations]
20
MoSELECT Solutions 2
[Figure: MoSELECT solutions at 5000 iterations]
21
Moving to > 2 Objectives: Parallel Graph
Representation
[Figure: parallel graph representation at 5000 iterations; recoverable labels: Diversity, D, MW]
Each objective is scaled using the Max and Min
values achieved when the objective is optimised
independently
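
The scaling used for the parallel graph can be illustrated with simple min-max normalisation. The bounds below are invented placeholders for the minimum and maximum reached when each objective is optimised on its own; the function names are hypothetical.

```python
def scale_objective(value, lo, hi):
    """Min-max scale a raw objective value onto [0, 1]."""
    return (value - lo) / (hi - lo) if hi != lo else 0.0

def scale_solutions(solutions, bounds):
    """Scale every objective of every solution; bounds[k] is (min, max) for objective k."""
    return [[scale_objective(v, *bounds[k]) for k, v in enumerate(sol)]
            for sol in solutions]

# Hypothetical (min, max) per objective from independent single-objective runs.
bounds = [(0.0, 10.0), (100.0, 500.0)]
scaled = scale_solutions([[2.5, 300.0], [10.0, 100.0]], bounds)
```

With all objectives on a common 0 to 1 axis, each solution can be drawn as one line across the objectives, and crossing lines indicate trade-offs.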
22
Focused Library: Amides
  • 100 x 100 virtual library
  • MoSELECT used to design 10 x 10 subsets
  • Objectives
  • Similarity to a target
  • Sum of similarities using Daylight fingerprints
  • Predicted bioavailability
  • Each compound rated from 1 to 4
  • Sum of ratings
  • Hydrogen bond profile
  • Rotatable bond profile

23
MoSELECT Solutions
  • Population size: 50
  • Iterations: 5000
  • Niching: 30
  • Number of solutions: 11
  • CPU time: 53 s (R12K, 360 MHz)

24
Conclusions
  • Advantages of MoSELECT
  • a family of equivalent solutions is obtained in a
    single run with each solution representing one
    combinatorial library
  • this is achieved at vastly reduced computational
    cost compared to performing multiple runs of
    SELECT
  • no need to determine weights for objectives
  • optimisation of different types of objectives is
    readily achieved
  • visualisation of the search progress allows
    trade-offs between objectives to be observed
  • the user can make an informed choice on which
    solution(s) to explore

25
Acknowledgements
  • Illy Khatib, Peter Willett: Information Studies,
    University of Sheffield
  • Peter Fleming: Automatic Control and Systems
    Engineering, University of Sheffield
  • Darren Green, Andrew Leach: GlaxoSmithKline, UK
  • Funding by GlaxoSmithKline, UK
  • John Bradshaw: Daylight
  • Daylight for software support