Title: Carcinogenicity prediction
1 Carcinogenicity prediction for Regulatory Use
- Natalja Fjodorova
- Marjana Novic
- Marjan Vracko
- Marjan Tušar
- National Institute of Chemistry, Ljubljana,
Slovenia
2 Kemijske Dnevi 25-27 September 2008
UNIVERZA MARIBOR
3 Overview
- 1. The EU project CAESAR, aimed at developing
QSAR models for the prediction of toxicological
properties of substances, to be used for regulatory
purposes.
- 2. The principles of validation of QSARs that
will be used for chemical regulation.
- 3. Carcinogenicity models using a Counter
Propagation Artificial Neural Network
4 - It is estimated that over 30,000 industrial
chemicals used in Europe require additional
safety testing to meet the requirements of the new
chemical regulation REACH.
- If conducted on animals, this testing would
require an extra 10-20 million animal
experiments.
- Quantitative Structure-Activity Relationships
(QSARs) are one of the major prospects among
alternative testing methods to be used in a
regulatory context.
5 CAESAR aimed to develop (Q)SARs as non-animal
alternative tools for the assessment of chemical
toxicity under REACH.
FP6 CAESAR European Project: Computer Assisted
Evaluation of industrial chemical Substances
According to Regulations
Coordinator: Emilio Benfenati, Istituto di
Ricerche Farmacologiche Mario Negri
6 The general aim of CAESAR is
- 1. To produce QSAR models for toxicity
prediction of chemical substances, to be used for
regulatory purposes under REACH in a transparent
manner by applying new and unique modelling and
validation methods.
7 - 2. To reduce animal testing and its associated
costs, in accordance with Council Directive
86/609/EEC and the Cosmetics Directive (Council
Directive 2003/15/EC)
8 CAESAR is solving several problems
- Ethical: saving animal lives
- Economic: cost reduction in testing
- Political: REACH implementation (new chemical
legislation)
9 - CAESAR aimed to develop new (Q)SAR models for 5
end-points:
- Bioaccumulation (BCF)
- Skin sensitisation
- Mutagenicity
- Carcinogenicity
- Teratogenicity
10 The characterization of the QSAR models follows
the general scheme of the 5 OECD principles
- A defined endpoint
- An unambiguous algorithm
- A defined domain of applicability
- Appropriate measures of goodness-of-fit,
robustness and predictivity
- A mechanistic interpretation, if possible.
11 Principle 1: A defined endpoint
- The endpoint is the property or biological activity
determined in an experimental protocol (OECD Test
Guideline).
- Carcinogenicity is a defined endpoint
addressed by an officially recognized
test method (Method B.32, Carcinogenicity test,
Annex V to Directive 67/548/EEC).
12 Principle 2: An unambiguous algorithm
- The algorithm is the form of the relationship between
chemical structure and the property or biological
activity being modelled.
- Examples
- 1. Statistically (regression) based QSARs
- 2. Neural network models, which include both a
learning process and a prediction process.
13 - Transparency in the (Q)SAR algorithm can be
provided by means of the following information:
- a) Definition of the mathematical form of a QSAR
model, or of the decision rule (e.g. in the case
of a SAR)
- b) Definitions of all descriptors in the
algorithm, and a description of their derivation
- c) Details of the training set used to develop
the algorithm.
14 Principle 3: A defined domain of applicability
- The definition of the Applicability Domain (AD)
is based on the assumption that a model is
capable of making reliable predictions only
within the structural, physicochemical and
response space that is known from its training
set.
- List of basic structures (for example, aniline,
fluorene)
- The range of chemical descriptor values.
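The descriptor-range criterion for the applicability domain can be illustrated with a minimal sketch. It assumes the simplest possible rule: a query compound is inside the domain only if every descriptor value lies between the minimum and maximum seen in the training set. The function names and the toy descriptor matrix are hypothetical and are not taken from the CAESAR models.

```python
import numpy as np

def fit_descriptor_ranges(X_train):
    """Return per-descriptor minimum and maximum over the training set."""
    X_train = np.asarray(X_train, dtype=float)
    return X_train.min(axis=0), X_train.max(axis=0)

def inside_domain(x, lower, upper):
    """True if every descriptor of x lies within the training range."""
    x = np.asarray(x, dtype=float)
    return bool(np.all((x >= lower) & (x <= upper)))

# Hypothetical toy descriptor matrix (3 compounds, 2 descriptors)
X_train = [[1.0, 10.0], [2.0, 30.0], [3.0, 20.0]]
lower, upper = fit_descriptor_ranges(X_train)

print(inside_domain([2.5, 15.0], lower, upper))  # True: both values in range
print(inside_domain([2.5, 45.0], lower, upper))  # False: second descriptor too high
```

A range check of this kind is easy to explain to a regulator, but it ignores correlations between descriptors, which is one reason it is usually combined with a list of covered basic structures.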
15 Principle 4: Appropriate measures of
- goodness-of-fit,
- robustness (internal performance) and
- predictivity (external performance)
- The assessment of model performance is sometimes
called statistical validation.
16 Principle 5: A mechanistic interpretation, if
possible
- Mechanistic interpretation of a (Q)SAR provides
grounds for interaction and dialogue between model
developers, toxicologists and regulators, and
permits the integration of the (Q)SAR results
into a wider regulatory framework, where different
types of evidence and data concur or complement
each other as a basis for making decisions and
taking actions.
- Example: enhancement or inhibition of the metabolic
activation of substances may be discussed.
17 - The National Institute of Chemistry in Ljubljana
(NIC-LJU) is responsible for the development of models
for the prediction of carcinogenicity
18 DATA ON CARCINOGENICITY
- 1. Studies of carcinogenicity in humans
- 2. Carcinogenicity studies in animals
- 3. Other relevant data
- additional evidence related to possible
carcinogenicity:
- Genetic Toxicology
- Structure-Activity Comparisons
- Pharmacokinetics and Metabolism
- Pathology
19 Cancer Risk Assessment: IARC (International
Agency for Research on Cancer)
Group | IARC classification | Explanation | Classification for animals
Group A | Human Carcinogen | sufficient human evidence for causal association between exposure and cancer | -
Group B1 | Probable Human | limited evidence in humans | -
Group B2 | Probable Human | inadequate evidence in humans and sufficient evidence in animals | clear evidence
Group C | Possible Human Carcinogen | limited evidence in animals | some evidence
Group D | Not Classifiable as to Human Carcinogenicity | inadequate evidence in animals | equivocal
Group E | No Evidence of Carcinogenicity in Humans | at least two adequate animal tests or both negative epidemiology and animal studies | no evidence
20 Predictive Toxicology Approaches
- 1. Quantitative models (QSARs): continuous data
prediction on the basis of experimental evidence
of rodent carcinogenic potential (TD50, tumorigenic
dose)
- 2. Categorical models based on YES/NO data
(P: positive, NP: not positive)
21 Dataset
805 chemicals were filtered from 1481 compounds
taken from the Distributed Structure-Searchable
Toxicity (DSSTox) Public Database Network
http://www.epa.gov/ncct/dsstox/sdf_cpdbas.html
which was derived from the Lois Gold
Carcinogenic Potency Database (CPDBAS). The chemicals
involved in the study belong to different
chemical classes (noncongeneric substances).
22 Descriptors
- 1. 252 MDL descriptors were calculated in the program
MDL QSAR.
- 2. The descriptor dataset was reduced to
27 MDL descriptors using a Kohonen map and
Principal Component Analysis.
23 Counter Propagation Artificial Neural Network
Step 1: mapping of molecule Xs (vector
representing structure) into the Kohonen layer
Step 2: correction of weights in both the Kohonen
and the Output layer
Step 3: prediction of the four-dimensional target
(toxicity) Ts
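The three steps can be sketched as a small counter-propagation network in NumPy. This is a minimal illustration under stated assumptions, not the implementation used in the study: the class name, the linearly decaying learning rate, the triangular neighbourhood function, the grid size and the toy two-cluster data are all hypothetical choices.

```python
import numpy as np

class CounterPropagationANN:
    """Toy CP ANN: a Kohonen layer plus an output layer on the same grid."""

    def __init__(self, grid=(3, 3), n_inputs=2, n_outputs=1, seed=0):
        rng = np.random.default_rng(seed)
        self.grid = grid
        n_neurons = grid[0] * grid[1]
        self.kohonen = rng.random((n_neurons, n_inputs))   # structure weights
        self.output = rng.random((n_neurons, n_outputs))   # target weights
        rows, cols = np.divmod(np.arange(n_neurons), grid[1])
        self.coords = np.column_stack([rows, cols])        # grid positions

    def winner(self, x):
        # Step 1: map the input vector onto the closest Kohonen neuron
        return int(np.argmin(np.sum((self.kohonen - x) ** 2, axis=1)))

    def fit(self, X, T, epochs=50, lr_start=0.5, lr_end=0.01):
        max_radius = max(self.grid) / 2
        total, step = epochs * len(X), 0
        for _ in range(epochs):
            for x, t in zip(X, T):
                frac = step / total
                lr = lr_start + (lr_end - lr_start) * frac
                radius = max(max_radius * (1 - frac), 0.5)
                w = self.winner(x)
                # Grid distance of every neuron to the winner
                dist = np.max(np.abs(self.coords - self.coords[w]), axis=1)
                h = np.where(dist <= radius, 1 - dist / (radius + 1), 0.0)
                # Step 2: correct weights in both layers around the winner
                self.kohonen += lr * h[:, None] * (x - self.kohonen)
                self.output += lr * h[:, None] * (t - self.output)
                step += 1
        return self

    def predict(self, X):
        # Step 3: read the target stored at the winning position
        return np.array([self.output[self.winner(x)] for x in X])

# Hypothetical toy data: two well-separated clusters with targets 0 and 1
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.1, 0.03, (20, 2)), rng.normal(0.9, 0.03, (20, 2))])
T = np.vstack([np.zeros((20, 1)), np.ones((20, 1))])
net = CounterPropagationANN().fit(X, T)
pred = net.predict(X)
print(pred[:20].mean(), pred[20:].mean())
```

In the study the output layer holds a four-dimensional target; the sketch keeps `n_outputs` as a parameter so the same code covers that case.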
24 Investigation of quantitative models shows
poor results. RESPONSE: TD50 (mmol)
The correlation coefficient in the external
validation is lower than 0.5
25 Continuous data models (quantitative models)
Model                       Descriptors          Reduction method  R_train  RMSE_train  R_test  RMSE_test
CP ANN_model                250 MDL descriptors  -                 0.74     1.51        0.47    1.78
CP ANN_model                86 MDL descriptors   Kohonen map       0.72     1.54        0.42    1.90
CP ANN_model                27 MDL descriptors   PCA               0.74     1.52        0.45    1.80
SVM_model (Thomas Ferrary)  86 MDL descriptors   -                 0.82     1.23        0.47    1.81
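The two statistics reported for the quantitative models, the correlation coefficient R and the root mean square error RMSE, can be computed as in the following sketch. The function name and the toy values are hypothetical and are not the TD50 data of the study.

```python
import numpy as np

def correlation_and_rmse(y_true, y_pred):
    """Pearson correlation coefficient and root mean square error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    r = np.corrcoef(y_true, y_pred)[0, 1]
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return float(r), float(rmse)

# Hypothetical toy values standing in for experimental and predicted responses
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 1.5, 3.5, 3.5]
r, rmse = correlation_and_rmse(y_true, y_pred)
print(round(r, 3), round(rmse, 3))  # 0.894 0.5
```

Applying the same function separately to the training and the test set gives the pairs of columns shown in the table; a large drop in R from training to test indicates limited external predictivity.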
26 Investigation of categorical models shows
satisfactory results
- YES/NO principle
- RESPONSE
- P: positive (active)
- NP: not positive (inactive)
27 Characteristics used for validation of the
categorical model
- true positive (TP), false negative (FN)
- true negative (TN), false positive (FP)
- Accuracy (AC): AC = (TN + TP) / (TN + TP + FN + FP)
- TP rate = Sensitivity (SE): SE = TP / (TP + FN)
- TN rate = Specificity (SP): SP = TN / (TN + FP)
28 Categorical model for the dataset of 805 chemicals
(Training 644 and Test 161), using 27 MDL
descriptors
         Training                 Test
         ACC, %  SE, %  SP, %     ACC, %  SE, %  SP, %
Model_1  88      90     86        68      69     67
Model_2  92      99     85        68      73     63
29 Confusion matrix for the training set TR (644) and
the test set TE (161), classes Positive and Negative;
counts are given as TR(TE)
Class                Positive (predict.)  Negative (predict.)  Number TR(TE)
Positive (experim.)  TP 329(65)           FN 3(24)             332(89)
Negative (experim.)  FP 47(27)            TN 265(45)           312(72)
Total                                                          644(161)
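As a worked check, accuracy, sensitivity and specificity can be recomputed from the confusion matrix counts. The sketch below uses the counts given above for the training and test sets; the function name is hypothetical.

```python
def classification_metrics(tp, fn, fp, tn):
    """Accuracy, sensitivity and specificity from confusion matrix counts."""
    total = tp + fn + fp + tn
    return {
        "ACC": (tp + tn) / total,
        "SE": tp / (tp + fn),   # share of experimental positives found
        "SP": tn / (tn + fp),   # share of experimental negatives found
    }

# Counts taken from the confusion matrix: training set, then test set
train = classification_metrics(tp=329, fn=3, fp=47, tn=265)
test = classification_metrics(tp=65, fn=24, fp=27, tn=45)

for name, metrics in (("training", train), ("test", test)):
    print(name, {k: round(100 * v, 1) for k, v in metrics.items()})
```

Rounded to whole percentages, these values agree with the figures reported for Model_2 (92, 99, 85 for training and 68, 73, 63 for test).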
30 How we find the optimal model using a threshold
Threshold = 0.45: Accuracy = 0.68, SE = 0.73, SP = 0.63
31 Changing the threshold allows us to obtain models
with different statistical performance.
Tr SE SP ACC
0.05 0.91 0.15 0.57
0.1 0.83 0.36 0.62
0.15 0.8 0.47 0.65
0.2 0.79 0.47 0.65
0.25 0.79 0.47 0.65
0.3 0.79 0.53 0.67
0.35 0.78 0.57 0.68
0.4 0.73 0.6 0.67
0.45 0.73 0.63 0.68
0.5 0.65 0.63 0.64
0.55 0.62 0.72 0.66
0.6 0.62 0.74 0.67
0.65 0.6 0.76 0.67
0.7 0.58 0.76 0.66
0.75 0.54 0.78 0.65
0.8 0.52 0.79 0.64
0.85 0.45 0.83 0.62
0.9 0.31 0.89 0.57
0.95 0.24 0.93 0.55
1 0 1 0.45
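A table of this kind can be produced by sweeping the threshold over the continuous model output. The sketch below assumes that a compound is predicted positive when its output exceeds the threshold, which is consistent with SE = 0 and SP = 1 at a threshold of 1. The function name and the toy scores are hypothetical and are not the CAESAR data.

```python
import numpy as np

def sweep_thresholds(scores, labels, thresholds):
    """Return (threshold, SE, SP, ACC) for each threshold."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    rows = []
    for thr in thresholds:
        pred = scores > thr                    # predicted positive
        tp = np.sum(pred & labels)
        tn = np.sum(~pred & ~labels)
        fp = np.sum(pred & ~labels)
        fn = np.sum(~pred & labels)
        se = tp / (tp + fn)
        sp = tn / (tn + fp)
        acc = (tp + tn) / len(labels)
        rows.append((float(thr), float(se), float(sp), float(acc)))
    return rows

# Hypothetical toy model outputs: four positives followed by four negatives
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
table = sweep_thresholds(scores, labels, [0.25, 0.5, 0.75])
for row in table:
    print(row)
```

On the toy data, lowering the threshold raises sensitivity at the cost of specificity, the same trade-off visible in the table above.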
32 ROC (Receiver Operating Characteristic) curve
[Figure: ROC curves for the training set and the test set]
The area under the curve is 0.988 and 0.699 in
the training and test sets, respectively.
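The area under the ROC curve can be approximated from the threshold table by the trapezoidal rule, taking 1 - SP as the false positive rate and SE as the true positive rate. The sketch below uses the (threshold, SE, SP) values from the threshold table; the function name and the added end point (1, 1) are assumptions of this illustration.

```python
# (threshold, SE, SP) rows copied from the threshold table
rows = [
    (0.05, 0.91, 0.15), (0.10, 0.83, 0.36), (0.15, 0.80, 0.47),
    (0.20, 0.79, 0.47), (0.25, 0.79, 0.47), (0.30, 0.79, 0.53),
    (0.35, 0.78, 0.57), (0.40, 0.73, 0.60), (0.45, 0.73, 0.63),
    (0.50, 0.65, 0.63), (0.55, 0.62, 0.72), (0.60, 0.62, 0.74),
    (0.65, 0.60, 0.76), (0.70, 0.58, 0.76), (0.75, 0.54, 0.78),
    (0.80, 0.52, 0.79), (0.85, 0.45, 0.83), (0.90, 0.31, 0.89),
    (0.95, 0.24, 0.93), (1.00, 0.00, 1.00),
]

def auc_from_rows(rows):
    """Trapezoidal area under the ROC curve built from (thr, SE, SP) rows."""
    # Strictest threshold first, so the false positive rate increases
    points = [(1 - sp, se) for _, se, sp in sorted(rows, reverse=True)]
    points = [(0.0, 0.0)] + points + [(1.0, 1.0)]
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

auc = auc_from_rows(rows)
print(round(auc, 2))
```

The result is about 0.70, in line with the test-set value of 0.699 reported above, which suggests that the threshold table describes the test set.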
33 How the requirements of REACH influence the
development of models
- The model should be focused on high sensitivity in
the prediction of carcinogenicity
- From a regulatory perspective, higher
sensitivity in predicting carcinogens is more
desirable than high specificity
- Sensitivity: percentage of correct predictions of
carcinogens
- Specificity: percentage of correct predictions of
non-carcinogens
34 Conclusion
- 1. We have built the carcinogenicity models in
accordance with the 5 OECD principles of
validation.
- 2. We have obtained satisfactory results for
categorical models, with an accuracy of 68%, which is
good for carcinogenicity as it meets the level of
uncertainty of the test data.
- 3. Our future investigation will be
dedicated to the relationship between the
results of carcinogenicity tests and the presence of
genotoxic and non-genotoxic alerts, using the Toxtree
program.
35 Acknowledgements
- The financial support of the European Union
through CAESAR project (SSPI-022674) as well as
of the Slovenian Ministry of Higher Education,
Science and Technology (grant P1-017) is
gratefully acknowledged.