Title: RANKING OF EEC PRIORITY LIST 1 CHEMICALS FOR
STRUCTURAL SIMILARITY AND MODELLING OF ALGAL
TOXICITY
D 12
P. Gramatica (1), H. Walter (2) and R. Altenburger (2)
(1) QSAR Research Unit - DBSF - University of
Insubria - VARESE - ITALY
(2) UFZ Centre for Environmental Research -
LEIPZIG - GERMANY
e-mail: paola.gramatica_at_uninsubria.it
Web: http://fisio.dipbsf.uninsubria.it/qsar
INTRODUCTION
Environmental exposure situations are often
characterized by a multitude of heterogeneous
chemicals with different mechanisms of action and
types of effect. The EEC priority List 1 (Council
Directive 76/464/EEC) consists of heterogeneous
environmental chemicals with mostly unknown or
unspecific modes of action, so it was used to
select components for mixture experiments in the
EEC PREDICT (Prediction and Assessment of the
Aquatic Toxicity of Mixtures of Chemicals)
project. A list of 202 compounds was studied for
structural similarity, both to identify the most
representative and dissimilar chemicals and to
find an objective method for grouping them on the
basis of their structural features.
STRUCTURAL DESCRIPTION OF COMPOUNDS
Molecular descriptors represent the way chemical
information contained in the molecular structure
is transformed and coded. Among the theoretical
descriptors, the best known, obtained simply from
the knowledge of the formula, are molecular
weight and count descriptors (1D-descriptors,
i.e. counts of bonds, atoms of different kinds,
presence or counts of functional groups and
fragments, etc.). Graph-invariant descriptors
(2D-descriptors, including both topological and
information indices) are obtained from the
knowledge of the molecular topology. WHIM
molecular descriptors [1] contain information
about the whole 3D molecular structure in terms
of size, symmetry and atom distribution. All
these indices are calculated from the
(x,y,z)-coordinates of a three-dimensional
structure of a molecule, usually from a spatial
conformation of minimum energy: 37
non-directional (or global) and 66 directional
WHIM descriptors are obtained. A complete set of
about two hundred molecular descriptors has been
obtained [2].

[1] Todeschini R. and Gramatica P.,
Quant. Struct.-Act. Relat. 1997, 16, 113-119.
[2] Todeschini R. and Consonni V., DRAGON -
Software for the calculation of the molecular
descriptors, Talete srl, Milan (Italy) 2000.
Download: http://www.disat.unimib.it/chm.
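
As a minimal illustration of the 1D-descriptors
mentioned above (molecular weight and atom counts
obtained from the formula alone), the sketch below
parses a simple molecular formula. It is not the
DRAGON implementation: the function names, the
rounded atomic weights and the restriction to
formulas without brackets are assumptions made
for this example.

```python
import re

# Standard atomic weights, rounded; only the elements needed here.
ATOMIC_WEIGHT = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999,
                 "P": 30.974, "S": 32.06, "Cl": 35.45}


def atom_counts(formula):
    """Count atoms in a simple formula such as 'C8H14ClN5' (no brackets)."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def molecular_weight(formula):
    """Sum the atomic weights over all atoms in the formula."""
    return sum(ATOMIC_WEIGHT[s] * n for s, n in atom_counts(formula).items())


atrazine = "C8H14ClN5"      # molecular formula of atrazine
parathion = "C10H14NO5PS"   # molecular formula of parathion

n_O = atom_counts(parathion).get("O", 0)   # count descriptor nO
mw = molecular_weight(atrazine)            # molecular weight in g/mol
print(n_O, round(mw, 2))
```

Count descriptors of this kind need no geometry
or connectivity, which is why they are the
cheapest descriptors to compute.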
CHEMOMETRIC METHODS
Several chemometric analyses were applied to the
compounds (represented by molecular descriptors)
to group the most similar ones, in accordance
with a multivariate structural approach, with the
final aim of highlighting the structurally most
dissimilar compounds. The analyses performed are:

Hierarchical Cluster Analysis: hierarchical
clustering was performed to find clusters of the
studied compounds in high-dimensional space,
using molecular descriptors as variables.
Different distance metrics (Euclidean, Manhattan,
Pearson) and different linkages (complete,
average, single, etc.) were used and compared to
find the best way to cluster these compounds.

Principal Component Analysis (PCA): this analysis
was used to calculate just a few components from
a large number of variables. These components
highlight the distribution of the compounds
according to structure and reveal the similarity
between compounds assigned to the same cluster.

Kohonen Maps: this is an additional way of
mapping similar compounds by using the so-called
self-organized topological feature maps, which
preserve the topology of a multidimensional
representation within a toroidal two-dimensional
representation. The position of the compounds in
this map shows the level of structural similarity
among the EEC List 1 compounds.
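
The hierarchical clustering step described above
can be sketched as follows. This is a naive
agglomerative procedure written for illustration
only, not the software used in the study. The
descriptor values are invented toy data, the
function names are hypothetical, and only the
Euclidean and Manhattan metrics are shown (the
Pearson distance is omitted).

```python
from itertools import combinations
from math import sqrt


def euclidean(a, b):
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


# Each linkage reduces all pairwise distances between two clusters to one value.
LINKAGES = {
    "single": min,
    "complete": max,
    "average": lambda ds: sum(ds) / len(ds),
}


def agglomerate(points, n_clusters, metric=euclidean, linkage="complete"):
    """Merge the two closest clusters until n_clusters remain.

    Returns a sorted list of clusters, each a sorted list of row indices.
    """
    link = LINKAGES[linkage]
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for (i, ci), (j, cj) in combinations(enumerate(clusters), 2):
            d = link([metric(points[a], points[b]) for a in ci for b in cj])
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best            # j > i, so deleting j keeps i valid
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return sorted(sorted(c) for c in clusters)


# Toy data: 5 hypothetical compounds described by 2 scaled descriptors.
descriptors = [(0.0, 0.1), (0.1, 0.0), (5.0, 5.1), (5.2, 5.0), (10.0, 0.0)]
print(agglomerate(descriptors, 3))
print(agglomerate(descriptors, 3, metric=manhattan, linkage="single"))
```

Running the same data through several metric and
linkage combinations, as in the two calls above,
mirrors the comparison described in the text: a
grouping that is stable across settings is more
trustworthy than one that depends on a single
choice.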
The chemicals selected as the structurally most
dissimilar compounds are:

N.   Substance                Chemical Class
1    atrazine                 Triazine
2    biphenyl                 Aromatic
3    chloral hydrate          Chlorinated aliphatics
4    2,4,5-trichlorophenol    Benzene derivative
5    fluoranthene             PAH
6    lindane                  HCH
7    naphthalene              PAH
8    parathion                Organophosphate
9    phoxime                  Organophosphate
10   tributyltin chloride     Organotin
11   triphenyltin chloride    Organotin
REGRESSION MODELS
QSAR models were developed by Ordinary Least
Squares (OLS) regression. The best subset of
variables for modelling the algal toxicity of the
studied compounds was selected by a Genetic
Algorithm (GA-VSS) approach. The models were
validated by the leave-one-out (LOO) and
leave-more-out (LMO) procedures and by scrambling
of the responses.
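
To make the validation statistics concrete, the
sketch below fits a one-descriptor OLS model and
computes R2, leave-one-out Q2, SDEC and SDEP. It
is an illustration under stated assumptions, not
the authors' procedure: the data are invented,
the model has a single descriptor rather than a
GA-selected subset, and the statistics use the
commonly quoted definitions R2 = 1 - RSS/TSS,
Q2 = 1 - PRESS/TSS, SDEC = sqrt(RSS/n) and
SDEP = sqrt(PRESS/n), which the source does not
spell out.

```python
from math import sqrt


def fit_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1


def validate(x, y):
    """Fitting and leave-one-out statistics for a one-descriptor OLS model."""
    n = len(y)
    my = sum(y) / n
    tss = sum((yi - my) ** 2 for yi in y)

    # Fitted residuals: model built on all compounds.
    b0, b1 = fit_ols(x, y)
    rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

    # LOO residuals: each compound is predicted by a model built without it.
    press = 0.0
    for i in range(n):
        b0_i, b1_i = fit_ols(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (b0_i + b1_i * x[i])) ** 2

    return {
        "R2": 1 - rss / tss,
        "Q2_LOO": 1 - press / tss,
        "SDEC": sqrt(rss / n),
        "SDEP": sqrt(press / n),
    }


# Invented toy data: one descriptor and one toxicity-like response.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 1.9, 3.2, 3.8, 5.1, 6.0]
stats = validate(x, y)
print({name: round(value, 3) for name, value in stats.items()})
```

Because each left-out compound is predicted by a
model that never saw it, PRESS cannot be smaller
than RSS for an OLS model, so Q2 never exceeds R2
and SDEP never falls below SDEC. A large gap
between the two pairs signals an overfitted model.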
HETEROGENEOUS CONGENERIC COMPOUNDS
HETEROGENEOUS COMPOUNDS
R2 (%)   Q2LOO (%)   Q2LMO (%)   SDEP    SDEC
93.9     91.8        87.5        0.342   0.296
78       62.1        61.7        0.751   0.573
77       69.7        69.7        0.709   0.619
nO is the number of O atoms, IDDM is the mean
information content on the distance degree
magnitude, while E1e is a directional 3D-WHIM
descriptor of atomic distribution weighted by
electronegativity.

nO is the number of O atoms and IDE is the mean
information content on the distance equality.
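
For readers unfamiliar with information indices,
the sketch below shows one way IDE could be
computed. It assumes the Bonchev-Trinajstic style
definition: the Shannon entropy (base 2) of the
distribution of topological distances between all
atom pairs of the hydrogen-depleted molecular
graph. The exact DRAGON implementation may differ
in detail, and the function names and adjacency
list input are hypothetical choices made for this
example.

```python
from collections import Counter, deque
from math import log2


def topological_distances(adjacency):
    """Shortest path lengths (in bonds) for every atom pair i < j, by BFS.

    The graph is assumed to be connected.
    """
    n = len(adjacency)
    dists = []
    for start in range(n):
        seen = {start: 0}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if v not in seen:
                    seen[v] = seen[u] + 1
                    queue.append(v)
        dists.extend(seen[j] for j in range(start + 1, n))
    return dists


def mean_info_distance_equality(adjacency):
    """Entropy of the distance distribution (assumed definition of IDE)."""
    dists = topological_distances(adjacency)
    total = len(dists)
    counts = Counter(dists)   # how many atom pairs share each distance
    return -sum((f / total) * log2(f / total) for f in counts.values())


# Hydrogen-depleted graph of a three-atom chain (propane skeleton):
# atom 0 - atom 1 - atom 2, given as adjacency lists.
chain = [[1], [0, 2], [1]]
print(round(mean_info_distance_equality(chain), 4))
```

In the three-atom chain two atom pairs are one
bond apart and one pair is two bonds apart, so
the distances are unevenly distributed and the
index is greater than zero. A three-membered
ring, where all pairs are one bond apart, would
give zero.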
CONCLUSIONS
The chemometric analyses applied here proved very
useful for ranking the studied chemicals
according to their structural similarity or
dissimilarity. In the modelling of structurally
heterogeneous compounds with unknown mode of
action, the QSAR models obtained were not very
satisfactory. Specific parameters such as
directional WHIMs, capable of describing
particular molecular features relevant for
explaining the specific mode of action, always
play a relevant role in QSAR models for
congeneric chemicals. Increasing heterogeneity
increases the role of structural and topological
descriptors, which account for general molecular
features not related to a specific mode of
action.

This work was supported by the Environment and
Climate programme of the European Commission,
Contract EV4-CT96-0319 (PREDICT) and Contract
EVK1-CT99-00012 (BEAM).