Title: No slide title
1 Introduction to QSAR (Quantitative Structure-Activity Relationships)
2 Introduction to QSAR
- Example of a qualitative structure-activity relationship (SAR)
Affinity to the serotonin receptor:
1. decreases on N-alkylation (R1/R2)
2. decreases on bis-methylation (R3/R4)
3. increases on methoxylation at R5 and/or R7
4. increases on adding lipophilic substituents at R6
3 Introduction to QSAR
- In contrast to qualitative SAR, quantitative SAR (QSAR) seeks a mathematical relationship between biological activity and molecular properties
- General form of a QSAR equation:
  biol. activity = f(P), with P = molecular property/properties
- or, more specifically:
  biol. activity = const. + (c1 · P1) + (c2 · P2) + (c3 · P3) + ...
- Molecular properties (descriptors) P are calculated for each molecule in the data set
- Coefficients c and the constant term are calculated by statistical methods (e.g. multiple linear regression)
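As a rough illustration of how the constant term and the coefficients c can be obtained by multiple linear regression, the sketch below solves the least-squares normal equations in plain Python. The function names (`solve_linear_system`, `fit_mlr`) and the data are made up for this example; the data are generated from a known equation so the fit can be checked, and they are not taken from any real compound series.

```python
def solve_linear_system(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations act on both sides at once.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("descriptors are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


def fit_mlr(descriptors, activities):
    """Return [const, c1, c2, ...] that minimise the squared error."""
    # Prepend 1.0 to every row so the constant term is fitted as well.
    rows = [[1.0] + list(p) for p in descriptors]
    n_coef = len(rows[0])
    # Normal equations: (X^T X) c = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(n_coef)]
           for i in range(n_coef)]
    xty = [sum(r[i] * y for r, y in zip(rows, activities))
           for i in range(n_coef)]
    return solve_linear_system(xtx, xty)


# Hypothetical data generated from activity = 1.0 + 2.0*P1 - 0.5*P2 (no noise).
descriptors = [(0.0, 1.0), (1.0, 0.0), (2.0, 3.0), (3.0, 1.0), (4.0, 5.0)]
activities = [1.0 + 2.0 * p1 - 0.5 * p2 for p1, p2 in descriptors]

coefficients = fit_mlr(descriptors, activities)
print([round(c, 3) for c in coefficients])
```

With real, noisy activity data the recovered coefficients are only estimates, which is why the quality of the fit has to be assessed separately.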
4 Molecular QSAR Descriptors
- 1D: Whole-molecule properties (e.g. molecular weight, melting point, logP)
- 2D: Substituent constants (e.g. π, σ, molar refractivity), fragment fingerprints, topological indices
- 3D: Surface or field properties (e.g. electrostatic potential, steric fields, hydrophobicity, solvent-accessible surface area)
5 Introduction to QSAR
- Why QSAR?
- QSAR models are derived from a series of (similar) molecules with known activity (training set)
- If a statistically relevant QSAR model has been found, it can be applied to new molecules in this series (test set) in order to predict their activity before biological testing (or even before synthesis!)
6 Introduction to QSAR
Example: Analgesic activity of capsaicin analogs (taken from Walpole et al., Sandoz)
7 Introduction to QSAR
8 Introduction to QSAR
- The Gibbs free energy relation (ΔG = -RT ln K) tells us that there is a logarithmic relationship between equilibrium constants (e.g. EC50) and the free energy of binding
- Thus, we have to transform the EC50 values to a logarithmic scale
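The effect of the logarithmic transformation can be sketched in a few lines of Python. The EC50 values, the temperature of 298.15 K and the function names are assumptions chosen for illustration only; the point is that every tenfold change in K shifts ΔG = -RT ln K by the same amount, so log-transformed activities are proportional to free energy.

```python
import math

R = 8.314462618  # gas constant in J/(mol*K)


def log_activity(ec50):
    """Transform an EC50 value to the log10 scale used in QSAR equations."""
    return math.log10(ec50)


def free_energy(k, temperature=298.15):
    """Free energy in J/mol from delta G = -RT ln K (temperature is an assumption)."""
    return -R * temperature * math.log(k)


# Hypothetical EC50 values in arbitrary concentration units.
ec50_values = [0.1, 1.0, 10.0, 100.0]
log_values = [log_activity(v) for v in ec50_values]
print(log_values)

# Each tenfold step in K changes the free energy by the same increment.
step_1_to_10 = free_energy(10.0) - free_energy(1.0)
step_10_to_100 = free_energy(100.0) - free_energy(10.0)
print(round(step_1_to_10, 1), round(step_10_to_100, 1))
```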
9 Introduction to QSAR
- Now, we require some molecular properties (descriptors)...
- The Sandoz group decided to use two substituent constants: the hydrophobic constant π and the molar refractivity (MR), which is correlated with the size and polarizability of the substituents
10 Introduction to QSAR
- We can plot the descriptor values vs. log EC50 ...
- ... and we can calculate linear equations for both parameters:
  - log EC50 = 0.76 - 0.82 π
  - log EC50 = 1.14 - 0.07 MR
Our first QSAR equations!
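A straight line of this kind is obtained by simple least-squares regression. The sketch below shows the calculation in plain Python; `fit_line` is a hypothetical helper, and the π values are invented, with log EC50 generated from the equation log EC50 = 0.76 - 0.82 π so that the fit can be checked against known coefficients. It does not reproduce the Sandoz data.

```python
def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical pi values; activities generated from 0.76 - 0.82 * pi.
pi_values = [0.0, 0.5, 1.0, 1.5, 2.0]
log_ec50 = [0.76 - 0.82 * p for p in pi_values]

intercept, slope = fit_line(pi_values, log_ec50)
print(round(intercept, 2), round(slope, 2))
```

With measured activities the points scatter around the line, and the size of that scatter determines how much the equation can be trusted.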
11 Introduction to QSAR
- How large are the errors that we make?
π equation: log EC50 = 0.76 - 0.82 π
12 Introduction to QSAR
- How large are the errors that we make?
MR equation: log EC50 = 1.14 - 0.07 MR
13 Introduction to QSAR
- How large are the errors that we make?
[Plots: actual (measured) log EC50 vs. predicted log EC50]
Correlation coefficients (R²): 0.88 (π equation), 0.53 (MR equation)
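One common way to compute R² from actual and predicted values is as one minus the ratio of the residual sum of squares to the total sum of squares; for a least-squares line evaluated on its own training data this equals the squared correlation coefficient. The sketch below assumes that definition. The function name `r_squared` and the numbers are illustrative only and are not the values behind the plots.

```python
def r_squared(actual, predicted):
    """R^2 = 1 - SS_res / SS_tot for paired actual and predicted values."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical measured and predicted log EC50 values.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]

r2 = r_squared(actual, predicted)
print(round(r2, 2))
```

An R² close to 1 means the predictions track the measurements closely; a value near 0.5, as for the MR equation, leaves about half of the variance unexplained.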
14 Introduction to QSAR
- Can we do any better by using both parameters in the equation (multiple linear regression (MLR) instead of simple linear regression)?
Best MLR equation: log EC50 = 0.76 - 0.82 π + 0.0003 MR (correlation coefficient R² = 0.89)
15 Introduction to QSAR
- How can we validate these QSAR models?
- Prediction within the training set (e.g. by leave-one-out cross-validation):
  - leave out each compound once
  - calculate the QSAR model with the remaining compounds only
  - predict the activity of the left-out compound
  - compare the prediction with the "true" affinity
  - calculate the "cross-validated" R² (often reported as Q²)
- Prediction of the test set
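The leave-one-out steps listed above can be sketched for a single-descriptor model as follows. This is a minimal illustration, assuming the common definition Q² = 1 - PRESS / SS_tot, where PRESS is the sum of squared errors of the left-out predictions; the helper names and the data are hypothetical.

```python
def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def loo_q_squared(x, y):
    """Leave-one-out cross-validated R^2 (Q^2) for a one-descriptor model."""
    press = 0.0
    for i in range(len(x)):
        # Leave compound i out and refit with the remaining compounds only.
        x_train = x[:i] + x[i + 1:]
        y_train = y[:i] + y[i + 1:]
        intercept, slope = fit_line(x_train, y_train)
        # Predict the left-out compound and accumulate its squared error.
        predicted = intercept + slope * x[i]
        press += (y[i] - predicted) ** 2
    mean_y = sum(y) / len(y)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - press / ss_tot


# Hypothetical descriptor values and noisy log EC50 values.
descriptor = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
log_ec50 = [0.80, 0.30, -0.10, -0.40, -0.95, -1.25]

q2 = loo_q_squared(descriptor, log_ec50)
print(round(q2, 3))
```

Because each compound is predicted by a model that never saw it, Q² for a least-squares model cannot exceed the R² of the full fit, and a large gap between the two suggests the model depends heavily on individual compounds.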
16 Introduction to QSAR
- How can we validate these QSAR models?
- Prediction of the training set (cross-validation):
  - log EC50 = 0.76 - 0.82 π  (R² = 0.89, Q² = 0.72)
  - log EC50 = 1.14 - 0.07 MR  (R² = 0.53, Q² = 0.28)
  - log EC50 = 0.76 - 0.82 π + 0.0003 MR  (R² = 0.89, Q² = 0.58)
17 Introduction to QSAR
- How can we validate these QSAR models?
- Prediction of the test set (compound 6i in this example):
  - log EC50 = 0.76 - 0.82 π  (predicted EC50 for 6i: 1.56)
  - log EC50 = 1.14 - 0.07 MR  (predicted EC50 for 6i: 0.42)
  - log EC50 = 0.76 - 0.82 π + 0.0003 MR  (predicted EC50 for 6i: 1.57)
- Now we have a problem...
18 Introduction to QSAR
- Some problems associated with "classical" QSAR:
  - Only applicable within a chemical series
  - A good training set must be available
  - Activity data should be evenly spread
  - Activity data should span 3-4 orders of magnitude (log units)
  - Choice of meaningful descriptors
  - Problem of extrapolation (e.g. descriptors of test compounds lie outside the descriptor range of the training set)
  - Non-linear relationships are hard to detect
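The extrapolation problem can at least be flagged before a prediction is made. The sketch below is one simple way to do this, assuming a per-descriptor minimum/maximum check; the function names and the (π, MR) values are hypothetical and chosen only to illustrate the idea.

```python
def descriptor_ranges(training_rows):
    """Return (min, max) for each descriptor column of the training set."""
    columns = list(zip(*training_rows))
    return [(min(col), max(col)) for col in columns]


def out_of_range(compound, ranges):
    """Return the indices of descriptors that fall outside the training range."""
    return [i for i, (value, (low, high)) in enumerate(zip(compound, ranges))
            if not low <= value <= high]


# Hypothetical training set: one (pi, MR) pair per compound.
training_set = [(0.0, 1.0), (0.5, 5.6), (1.0, 10.3), (2.0, 19.6)]
ranges = descriptor_ranges(training_set)

inside = out_of_range((1.2, 8.0), ranges)    # both descriptors within range
outside = out_of_range((2.6, 12.0), ranges)  # pi exceeds the training maximum
print(inside, outside)
```

A compound flagged this way can still be predicted, but the result is an extrapolation and deserves less confidence than predictions inside the training range.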