Title: A Multiobjective Approach to Combinatorial Library Design
1A Multiobjective Approach to Combinatorial Library Design
- Val Gillet
- University of Sheffield, UK
2Outline
- SELECT
- GA based program for combinatorial library design
- Combinatorial subset selection in product-space
- Multiobjective optimisation via weighted-sum fitness function
- Limitations of a weighted-sum approach
- MoSELECT
- Multiobjective optimisation via MOGA
3Library Design is a Multiobjective Optimisation Problem
- Early HTS results disappointing
- Low hit rates
- Hits too lipophilic, too flexible, high molecular weights
- Diverse libraries
- Distance-based/cell-based diversity
- Bioavailability, cost, ease of synthesis
- Focused/targeted libraries
- Similarity to known active, predicted active by QSAR model, fit to receptor site
- Bioavailability, cost, ...
4Product-Based Library Design
- A two-component combinatorial library can be represented by a 2D array
- A combinatorial subset can be defined by intersecting rows and columns of the array
- Exploring all combinatorial subsets is equivalent to testing all permutations of the rows and columns of the array
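As a rough illustration of the array view above, the sketch below treats a two-component virtual library as a grid of (row, column) reactant index pairs and builds one combinatorial subset from chosen rows and columns. The pool sizes, the function name `combinatorial_subset`, and the chosen indices are illustrative assumptions, not taken from SELECT.

```python
from itertools import product
from math import comb

def combinatorial_subset(rows, cols):
    """Return the products (row, col) at the intersection of the chosen rows and columns."""
    return list(product(sorted(rows), sorted(cols)))

# Hypothetical 5 x 4 virtual library: pick 2 reactants from the first pool, 3 from the second.
n_a, n_b = 5, 4
subset = combinatorial_subset(rows=[0, 3], cols=[1, 2, 3])
print(len(subset))    # 2 * 3 = 6 products

# Number of distinct combinatorial subsets of this shape.
n_subsets = comb(n_a, 2) * comb(n_b, 3)
print(n_subsets)      # 10 * 4 = 40
```

The same count for realistic pool sizes grows far too quickly to enumerate exhaustively, which is why a search method such as a GA is used.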
5Selecting Combinatorial Subsets Using a GA
- Chromosome encoding
- each chromosome represents a combinatorial subset as an integer string
- one partition for each reactant pool
- the size of a partition equals the number of reactants required from the corresponding pool
- Crossover, mutation and roulette wheel parent selection are used to evolve new potential solutions
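The encoding and operators above might be sketched as follows. This is a minimal illustration under stated assumptions: the helper names (`random_chromosome`, `roulette_select`, `mutate`), the mutation rule, and the placeholder fitness values are hypothetical and are not SELECT's actual implementation; crossover is omitted for brevity.

```python
import random

def random_chromosome(pool_sizes, picks, rng):
    """One partition per reactant pool; each partition holds distinct reactant indices."""
    return [sorted(rng.sample(range(n), k)) for n, k in zip(pool_sizes, picks)]

def roulette_select(population, fitnesses, rng):
    """Pick one parent with probability proportional to its (non-negative) fitness."""
    return rng.choices(population, weights=fitnesses, k=1)[0]

def mutate(chromosome, pool_sizes, rng):
    """Assumed mutation: swap one reactant for an unused reactant from the same pool."""
    child = [list(part) for part in chromosome]
    p = rng.randrange(len(child))
    unused = [r for r in range(pool_sizes[p]) if r not in child[p]]
    if unused:
        child[p][rng.randrange(len(child[p]))] = rng.choice(unused)
        child[p].sort()
    return child

rng = random.Random(0)
pool_sizes, picks = (100, 100), (30, 30)   # e.g. a 30 x 30 subset of a 100 x 100 library
population = [random_chromosome(pool_sizes, picks, rng) for _ in range(4)]
fitnesses = [0.2, 0.5, 0.1, 0.9]           # placeholder scores, for illustration only
parent = roulette_select(population, fitnesses, rng)
child = mutate(parent, pool_sizes, rng)
```

Keeping each partition as a set of distinct indices means every chromosome always decodes to a valid combinatorial subset.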
6Multiobjective Optimisation in SELECT
- Weighted-sum fitness function
- enumerate the combinatorial library represented by a chromosome
- calculate descriptors for molecules in the library
- Objectives are scaled and user defined weights are applied
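A weighted-sum fitness of the kind described above could look like the sketch below. The objective names, the already-scaled values, and the sign convention (negative weights for objectives to be minimised) are assumptions made for illustration; the slides do not give SELECT's exact formula.

```python
def weighted_sum_fitness(scaled, weights):
    """Combine scaled objective values into a single score using user-defined weights."""
    return sum(weights[name] * scaled[name] for name in weights)

# Hypothetical scaled objective values for one candidate library (all in [0, 1]).
scaled = {"diversity": 0.8, "mw_profile_diff": 0.3, "cost": 0.5}

# Assumed convention: positive weight = maximise, negative weight = minimise.
weights = {"diversity": 1.0, "mw_profile_diff": -1.0, "cost": -0.5}

fitness = weighted_sum_fitness(scaled, weights)
print(round(fitness, 3))
```

Because everything collapses into one number, the result depends entirely on the chosen weights.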
7Multiobjective Optimisation in SELECT cont.
- Diversity indices
- distance-based (e.g. sum of pairwise dissimilarities and Daylight fingerprints)
- cell-based
- Physical property terms
- minimise the difference between the distribution in the library and some reference distribution, e.g. a drug-like profile derived from WDI
- Cost
- minimise the cost of the library
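The two kinds of objective terms listed above can be sketched roughly as follows. The fingerprints are represented as sets of on-bit positions, Tanimoto is assumed as the similarity coefficient, and the profile term is assumed to be a sum of absolute differences between binned fractions; none of these specifics is stated in the slides, and all data values are made up.

```python
from itertools import combinations

def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of on-bit positions."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0

def sum_pairwise_dissimilarity(fps):
    """Distance-based diversity: sum of (1 - similarity) over all pairs."""
    return sum(1.0 - tanimoto(a, b) for a, b in combinations(fps, 2))

def profile_difference(values, reference, edges):
    """Sum of absolute differences between binned library fractions and a reference profile."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    fractions = [c / len(values) for c in counts]
    return sum(abs(f - r) for f, r in zip(fractions, reference))

fps = [{1, 2, 3}, {2, 3, 4}, {7, 8}]            # toy fingerprints
diversity = sum_pairwise_dissimilarity(fps)      # 0.5 + 1.0 + 1.0

mw_values = [250, 320, 410, 480]                 # toy molecular weights
edges = [200, 300, 400, 500]                     # bin boundaries
reference = [0.25, 0.25, 0.5]                    # hypothetical reference profile
mw_diff = profile_difference(mw_values, reference, edges)
```

Diversity is to be maximised while the profile difference is to be minimised, so the two terms pull in different directions inside one fitness function.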
8Library Enumeration in SELECT
- Virtual library is enumerated upfront
- ADEPT (A Daylight Enumeration and Profiling Tool)
- Identify potential reactants
- Filter out unwanted ones
- Enumerate virtual library
- Reaction Toolkit (Reaction transforms, MTZ language)
- Descriptors are calculated upfront
- Combinatorial subset accessed via fast lookup
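The idea of calculating descriptors once and then reading subsets by lookup might be sketched as below. The table layout, the function names, and the stand-in descriptor function are assumptions for illustration only; in practice the values would come from the enumerated products.

```python
def build_descriptor_table(n_rows, n_cols, descriptor):
    """Calculate a descriptor once for every product in the virtual library."""
    return [[descriptor(i, j) for j in range(n_cols)] for i in range(n_rows)]

def subset_values(table, rows, cols):
    """Read descriptor values for a combinatorial subset by index lookup only."""
    return [table[i][j] for i in rows for j in cols]

# Stand-in descriptor: a made-up number per product, not a real property calculation.
table = build_descriptor_table(4, 3, lambda i, j: 100.0 + 10 * i + j)

values = subset_values(table, rows=[1, 3], cols=[0, 2])
print(values)   # [110.0, 112.0, 130.0, 132.0]
```

With the table built upfront, evaluating a chromosome during the search costs only index lookups rather than a fresh enumeration.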
9Example Amide Library
- 10K virtual library
- 100 amines x 100 carboxylic acids
- 30 x 30 amide subsets
- WDI: World Drug Index
- Reactant-based selection for diversity (Diversity = 0.564)
[Figure: molecular weight profile; x-axis: Molecular weight (0 to 800), y-axis: Percentage of Compounds (0 to 25); WDI shown as reference]
10Limitations of a Weighted-Sum Fitness Function
- Definition of fitness function is difficult, especially for different types of objectives
- e.g. molecular weight profile and cost
- Setting of weights is non-intuitive
- Can result in regions of search space being obscured, especially when objectives are in competition
- Difficult to monitor progress, since more than one objective must be followed simultaneously
- A single solution is found
11Varying Weights in SELECT
- Objectives are in competition, resulting in trade-offs
- A family of alternative solutions exists that are all equivalent
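The effect of varying weights can be seen in a toy example: three candidate libraries that trade diversity against cost, where each weight setting picks out a different one. The candidate values and weight settings are invented for illustration and do not come from SELECT runs.

```python
# (scaled diversity to maximise, scaled cost to minimise); values are made up.
candidates = {
    "A": (0.90, 0.80),   # diverse but expensive
    "B": (0.75, 0.40),   # compromise
    "C": (0.40, 0.10),   # cheap but less diverse
}

def best_for_weights(w_div, w_cost):
    """Return the candidate with the highest weighted-sum score for these weights."""
    return max(candidates,
               key=lambda name: w_div * candidates[name][0] - w_cost * candidates[name][1])

# Each weight setting corresponds to one separate weighted-sum run.
winners = [best_for_weights(w, 1 - w) for w in (0.9, 0.5, 0.1)]
print(winners)   # ['A', 'B', 'C']
```

None of the three candidates is better than another on both objectives, so which one is "best" is purely a consequence of the weights chosen.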
12Multiobjective Optimisation
- Evolutionary algorithms, e.g., GAs
- operate with a population of individuals
- well suited to search for multiple solutions in parallel
- readily adapted to deal with multiobjective optimisation
- MOGA: MultiObjective Genetic Algorithm
- Fonseca & Fleming. IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans, 28(1), 1998, 26-37.
13MOGA
- Multiple objectives are handled independently without summation and without weights
- A hyper-surface is mapped out in the search space
- represents a continuum of solutions where all solutions are seen as equivalent
- represents compromises or trade-offs between the various objectives
- solutions are called non-dominated, or Pareto solutions
- A family of non-dominated solutions is sought rather than a single solution
14Dominance Pareto Ranking
- A non-dominated individual is one where an improvement in one objective results in a deterioration in one or more of the other objectives when compared with the other individuals in the population
[Figure: individuals A and B plotted against objectives f1 and f2]
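Dominance and a count-based Pareto ranking can be sketched as below. This follows the general idea of ranking by how many individuals dominate a given one; the minimisation convention, the rank offset (0 for non-dominated), and the example points are assumptions and may differ from the exact scheme used in MoSELECT.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and strictly better on at least one.

    All objectives are assumed to be minimised.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_ranks(population):
    """Rank = number of individuals in the population that dominate this one."""
    return [sum(dominates(other, ind) for other in population) for ind in population]

# Toy population of (f1, f2) objective vectors.
population = [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0), (3.0, 3.0), (4.0, 4.0)]
ranks = pareto_ranks(population)
print(ranks)   # [0, 0, 0, 1, 4]

non_dominated = [ind for ind, r in zip(population, ranks) if r == 0]
```

The first three points are mutually non-dominated trade-offs; the ranking needs no weights and no summation of the objectives.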
15SELECT
16MoSELECT Search Progress
17Family of Solutions
- Each run of MoSELECT results in a family of solutions
- Finding the same coverage of solutions using SELECT would require multiple runs using various combinations of weights
- One run of MoSELECT takes the same CPU time as one run of SELECT
5000 iterations
18Focused Library Aminothiazoles
- α-bromoketones and thioureas extracted from ACD
- ADEPT used to
- filter reactants (MW < 300; RB < 8)
- enumerate virtual library: 12850 products (74 α-bromoketones x 170 thioureas)
- MoSELECT used to design 15 x 30 subsets optimised on
- Similarity to a target compound (Daylight fingerprints)
- Cost (/g)
19MoSELECT Solutions 1
0 iterations
20MoSELECT Solutions 2
5000 iterations
21Moving to > 2 Objectives: Parallel Graph Representation
[Figure: parallel graph representation at 5000 iterations; axes: Diversity (D) and MW]
Each objective is scaled using the Max and Min values achieved when the objective is optimised independently.
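The scaling used for the parallel graph might be sketched as follows: each objective value is mapped onto [0, 1] using the minimum and maximum obtained when that objective is optimised on its own, so that one solution becomes one line across the objective axes. The bounds and the solution values below are hypothetical, as are the function names.

```python
def scale_objective(value, lo, hi):
    """Min-max scale a value using the bounds found by optimising that objective alone."""
    if hi == lo:
        raise ValueError("bounds must differ")
    return (value - lo) / (hi - lo)

def parallel_graph_points(solution, bounds, order):
    """Return one scaled value per objective axis, in plotting order."""
    return [scale_objective(solution[name], *bounds[name]) for name in order]

# Hypothetical (min, max) bounds per objective and two hypothetical solutions.
bounds = {"D": (0.5, 0.75), "MW": (0.0, 2.0)}
order = ["D", "MW"]
solutions = [{"D": 0.625, "MW": 0.5}, {"D": 0.75, "MW": 2.0}]

lines = [parallel_graph_points(s, bounds, order) for s in solutions]
print(lines)   # [[0.5, 0.25], [1.0, 1.0]]
```

Putting every objective on a common scale is what allows more than two objectives to be drawn side by side.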
22Focused Library Amides
- 100 x 100 virtual library
- MoSELECT used to design 10 x 10 subsets
- Objectives
- Similarity to a target
- Sum of similarities using Daylight fingerprints
- Predicted bioavailability
- Each compound rated from 1 to 4
- Sum of ratings
- Hydrogen bond profile
- Rotatable bond profile
23MoSELECT Solutions
- Population size: 50
- Iterations: 5000
- Niching: 30
- Number of solutions: 11
- CPU: 53s (R12K 360 MHz)
24Conclusions
- Advantages of MoSELECT
- a family of equivalent solutions is obtained in a single run, with each solution representing one combinatorial library
- this is achieved at vastly reduced computational cost compared to performing multiple runs of SELECT
- no need to determine weights for objectives
- optimisation of different types of objectives is readily achieved
- visualisation of the search progress allows trade-offs between objectives to be observed
- the user can make an informed choice on which solution(s) to explore
25Acknowledgements
- Illy Khatib, Peter Willett (Information Studies, University of Sheffield)
- Peter Fleming (Automatic Control and Systems Engineering, University of Sheffield)
- Darren Green, Andrew Leach (GlaxoSmithKline, UK)
- Funding by GlaxoSmithKline, UK
- John Bradshaw (Daylight)
- Daylight for software support