Title: SAR vs QSAR, or is QSAR different from SAR?
1. SAR vs QSAR, or is QSAR different from SAR?
Joanna Jaworska, Procter & Gamble, Brussels,
Belgium, and Nina Jeliazkova, IPP, Bulgarian
Academy of Sciences, Sofia, Bulgaria
2. SAR vs. QSAR: how could we say there is no
difference?
- SAR is supposed to be a non-quantitative concept
- SAR is based on the notion of similarity
- Similar compounds have similar activity
- Dissimilar compounds have dissimilar activity
- QSAR aims to derive a quantitative model of the
activity
3. SAR vs. QSAR: Roadmap
- What does similarity mean? A philosopher's view
and implications for toxicology
- Are the basic tenets of SAR true?
- What do similarity measures measure?
- How does the similarity measure relate to QSAR
modeling?
4. Similarity: a philosopher's view
- Exploiting the similarity concept is a sign of
immature science (Quine)
- It is ill-defined to say "A is similar to B";
it is only meaningful to say "A is similar to B
with respect to C"
Implications for toxicology: a chemical A
cannot be similar to a chemical B in absolute
terms, but only with respect to some measurable
key feature
5. Chemical Grouping by Similarity
[Figure: similarity between structures compared
with similarity between points; selected similar
compounds]
6. Structural similarity
- Does not always imply similarity in activity
- Martin et al., J. Med. Chem. 2002, 45, 4350-4358
- Does not always imply similarity in descriptors
- Kubinyi, H., Chemical Similarity and Biological
Activity (with permission of the author)
7. Structurally similar compounds can have very
different properties
8. Example: Y. Martin et al. (2002), Do structurally
similar molecules have similar biological
activity?
- Set of 1645 chemicals with IC50s for monoamine
oxidase inhibition
- Daylight fingerprints, 1024 bits long (0-7 bonds)
- Using the Tanimoto coefficient with a cutoff
value of 0.85, only 30% of actives were detected
[Table: cutoff values, % of actives detected,
false positives]
J. Med. Chem. 2002, 45, 4350-4358
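The cutoff-based selection described on this slide can be sketched in a few lines. This is only an illustration, not the procedure used by Martin et al.: the fingerprints are hypothetical toy sets of on-bit indices (not Daylight fingerprints), and the names `tanimoto` and `neighbours_above_cutoff` are introduced here for the example.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    if union == 0:
        return 0.0  # assumed convention for two empty fingerprints
    return len(a & b) / union


def neighbours_above_cutoff(query, library, cutoff=0.85):
    """Names of library compounds whose similarity to the query reaches the cutoff."""
    return [name for name, fp in library.items() if tanimoto(query, fp) >= cutoff]


# Hypothetical toy fingerprints, for illustration only.
query = set(range(20))                      # bits 0..19 are on
library = {
    "cmpd_A": set(range(19)),               # 19 shared bits, union of 20
    "cmpd_B": set(range(18)) | {30, 31},    # 18 shared bits, union of 22
    "cmpd_C": set(range(10)),               # 10 shared bits, union of 20
}

hits = neighbours_above_cutoff(query, library)
print(hits)
```

Only the compound sharing 19 of 20 bits passes the 0.85 cutoff here; whether such a hit is actually active is exactly what the similarity value alone cannot tell.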
9. How else to measure chemical similarity?
- Describe chemical compounds with a set of
numerical values (fingerprints, diverse
descriptors, field values, etc.)
- Set up some measure between the values (Euclidean
distance, Tanimoto distance, Carbo similarity
index, etc.)
What do we actually measure? And how is it
related to the activity?
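To make the question concrete, the sketch below applies two of the measures named above to the same binary representation and shows that they can disagree about which compound is the nearest neighbour. The bit vectors are hypothetical, and the helper names `euclidean` and `tanimoto_distance` are assumptions made for this example.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def tanimoto_distance(u, v):
    """1 minus the Tanimoto coefficient for two equal-length bit vectors."""
    both = sum(1 for a, b in zip(u, v) if a and b)
    either = sum(1 for a, b in zip(u, v) if a or b)
    return 1.0 - both / either if either else 0.0


# Hypothetical bit vectors, for illustration only.
query = [1, 1, 1, 0, 0, 0]
candidates = {
    "small": [1, 0, 0, 0, 0, 0],   # shares 1 bit, differs in 2 positions
    "large": [1, 1, 1, 1, 1, 1],   # shares 3 bits, differs in 3 positions
}

nearest_euclidean = min(candidates, key=lambda k: euclidean(query, candidates[k]))
nearest_tanimoto = min(candidates, key=lambda k: tanimoto_distance(query, candidates[k]))
print(nearest_euclidean, nearest_tanimoto)
```

With these vectors the Euclidean distance prefers "small" while the Tanimoto distance prefers "large", so the choice of measure is itself a modeling decision.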
10. What do we measure?
The distance between numerical representations of
chemical compounds
- A few warnings
- The numerical representation is not unique
- The numerical representation includes only part
of all the information about the compound
- A distance measure reflects closeness only if
the data satisfy specific assumptions
- (next slide: example)
11. Distances: example
By Euclidean distance we would decide that the red
point is closer to data set 2, while a human
would note that it belongs to data set 1.
- Distances give results which are not always
intuitively expected
- Be aware of the assumptions behind distances
(e.g. Euclidean distance gives good results with
normally distributed data in orthogonal space)
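The original figure is not available, so the following is one possible reconstruction of the situation, under stated assumptions: data set 1 is an elongated cloud, data set 2 is a tight cluster, and "closeness" is taken as distance to the centroid. All coordinates are hypothetical. The Mahalanobis distance is used here as an example of a measure that accounts for the spread and correlation of each data set; the slide itself does not name it.

```python
import math


def mean_and_cov(points):
    """Centroid and sample covariance terms (sxx, sxy, syy) of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def euclidean_to_centroid(point, points):
    (mx, my), _ = mean_and_cov(points)
    return math.hypot(point[0] - mx, point[1] - my)


def mahalanobis_to_centroid(point, points):
    """Distance to the centroid scaled by the inverse 2x2 covariance matrix."""
    (mx, my), (sxx, sxy, syy) = mean_and_cov(points)
    dx, dy = point[0] - mx, point[1] - my
    det = sxx * syy - sxy ** 2  # assumed non-zero (non-degenerate data)
    return math.sqrt((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)


# Hypothetical coordinates, for illustration only.
set_1 = [(-4, 0.2), (-2, -0.2), (0, 0.1), (2, -0.1), (4, 0.0)]      # elongated cloud
set_2 = [(3, 3), (3.2, 3.1), (2.8, 2.9), (3.1, 2.8), (2.9, 3.2)]    # tight cluster
red = (5, 0.0)  # lies along the long axis of set_1

print("Euclidean:  ", euclidean_to_centroid(red, set_1), euclidean_to_centroid(red, set_2))
print("Mahalanobis:", mahalanobis_to_centroid(red, set_1), mahalanobis_to_centroid(red, set_2))
```

In this toy setting the Euclidean distance places the red point nearer to data set 2, while the covariance-aware distance places it nearer to data set 1, matching the intuition described on the slide.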
12. How do we represent a chemical compound?
Fingerprints, descriptors (more than 3000
available), electron density, various fields, etc.
All representations lose information. We should
ensure that the lost information is not important.
How?
13. Finding important information
- A problem not unique to (Q)SAR
- Lots of methods available
- The most popular (e.g. PCA) are not the best
Possible solution: look for the most
discriminative information (for example, descriptors
which provide the best discrimination between active
and inactive compounds)
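One simple way to look for discriminative descriptors is a Fisher-style score: the squared gap between class means divided by the summed class variances. This is a sketch of that idea only, not the method the authors used; the descriptor names `d1` and `d2` and all values are hypothetical.

```python
from statistics import mean, variance


def discrimination_score(active_values, inactive_values):
    """Squared gap between class means divided by the summed class variances.

    Assumes each class has at least two values and a non-zero total variance.
    """
    gap = mean(active_values) - mean(inactive_values)
    spread = variance(active_values) + variance(inactive_values)
    return gap ** 2 / spread


# Hypothetical descriptor values, for illustration only.
descriptors = {
    # small variance, clear gap between actives and inactives
    "d1": {"active": [2.0, 2.1, 1.9, 2.2], "inactive": [1.0, 1.1, 0.9, 1.2]},
    # large variance, no gap between the class means
    "d2": {"active": [-50.0, 40.0, 10.0, -20.0], "inactive": [30.0, -40.0, -10.0, 0.0]},
}

scores = {
    name: discrimination_score(values["active"], values["inactive"])
    for name, values in descriptors.items()
}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

Here `d2` carries far more variance than `d1`, so a variance-driven method applied to the unscaled data would emphasise it, yet only `d1` separates actives from inactives.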
14. SAR vs. QSAR: how could we say there is no
difference?
- Two things in common to this point
- Both methods use a numerical representation of
chemical compounds
- Both methods need to decide which representation
to use
One more difference: SAR is a qualitative, not a
quantitative, relationship. Is this indeed true?
15. Similarity and Activity
- Proximity with respect to descriptors does not
necessarily mean proximity with respect to the
activity (example)
- Proximity in descriptors implies proximity in
activity only if a linear relationship holds
between descriptors and activity (examples)
- The linear relationship is only a special case,
given the complexity of biochemical interactions.
Its use should be justified in every specific
case
- Structural similarity should be used with care
(examples)
16. Neighbourhood principle
- Molecules in the same local region
(neighbourhood) of a descriptor space tend to
have similar values of a desired property
- Contradictory evidence exists, both supporting
and rejecting the principle
17. Neighbourhood principle: Analysis
Depends on the relationship between the
descriptors and activity!
18. Neighbourhood principle: Lessons
- In order to apply the neighbourhood principle,
the TYPE of the relationship between descriptor
and activity should be known
- The neighbourhood principle is valid only if
the relationship is LINEAR
- The linear relationship is only a simple special
case, given the complexity of biochemical
interactions. Its use SHOULD BE JUSTIFIED in
every specific case.
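A toy illustration of these lessons, assuming a single descriptor x and three hypothetical descriptor-activity relationships invented for this example (a linear one, an abrupt "cliff", and a parabola). It is meant only to show how the type of relationship decides whether neighbours in descriptor space have similar activity.

```python
def linear_activity(x):
    return 2.0 * x                      # activity changes in proportion to x


def cliff_activity(x):
    return 0.0 if x < 0.5 else 10.0     # abrupt jump at x = 0.5


def parabola_activity(x):
    return (x - 0.5) ** 2               # symmetric around x = 0.5


# Two close neighbours in descriptor space (hypothetical values).
a, b = 0.49, 0.51
linear_gap = abs(linear_activity(a) - linear_activity(b))
cliff_gap = abs(cliff_activity(a) - cliff_activity(b))

# Two distant points in descriptor space (hypothetical values).
far_1, far_2 = 0.0, 1.0
parabola_gap = abs(parabola_activity(far_1) - parabola_activity(far_2))

print("neighbours, linear relationship:", linear_gap)
print("neighbours, cliff relationship: ", cliff_gap)
print("distant points, parabola:       ", parabola_gap)
```

Under the linear relationship the close neighbours have nearly equal activity; under the cliff they differ by the full activity range; and under the parabola two distant points have identical activity, so neither tenet of SAR can be taken for granted without knowing the relationship.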
19. SAR vs QSAR
- SAR is based on the similarity principle
- The principle is assumed, but in reality it
is not always true
- Similarity of structures
- Similarity of descriptors
- Its validity depends on the type of the
relationship between descriptors (numerical
representation of chemicals) and activity
- The type of the relationship should be known (or
derived)
20. SAR vs. QSAR: how could we say there is a
difference?
- Three things in common to this point
- Both methods use a numerical representation of
chemical compounds
- Both methods need to decide which representation
to use
- Both methods need to derive the relationship
between the numerical representation (descriptors,
etc.) and activity.
21. Thank you!
"When you can measure what you are speaking about,
and express it in numbers, you know something
about it; but when you cannot measure it, when
you cannot express it in numbers, your knowledge
is of a meager and unsatisfactory kind; it may
be the beginning of knowledge, but you have
scarcely advanced to the stage of
science." William Thomson, Lord Kelvin