QSAR AND CHEMOMETRIC APPROACHES TO THE SCREENING
OF POPs FOR ENVIRONMENTAL PERSISTENCE AND LONG
RANGE TRANSPORT
Paola Gramatica (a), Ester Papa (a) and Stefano Pozzi (b)
(a) Department of Structural and Functional Biology,
University of Insubria, Varese (Italy)
(b) Laboratory of Environmental Studies (SPAA),
Lugano (Switzerland)
e-mail: paola.gramatica_at_uninsubria.it
http://fisio.dipbsf.uninsubria.it/qsar/
QSAR Research Unit
D 13
Introduction
The need for a scientific foundation for the
criteria used to evaluate the persistence and
long-range transport (LRT) potential of POPs
(Persistent Organic Pollutants) in the
environment has recently been highlighted [1].
Persistence is a necessary condition for
long-range transport; however, persistent
chemicals are not necessarily subject to
long-range transport: the inherent tendency of
compounds towards global mobility must also be
taken into account. The half-life of organic
pollutants in various compartments is among the
most commonly used criteria for studying
persistence, but these studies are severely
hindered by the limited availability of
experimental degradation half-life data. There
is thus an incentive to develop reliable
procedures, such as QSAR/QSPR, to estimate the
missing data. The same is true for the
physico-chemical properties particularly relevant
for determining mobility potential [2]. As the
LRT potential of POPs is due to the simultaneous
influence of their persistence in the environment
and their inherent tendency to mobility, finding
the best combination of chemical properties that
minimizes LRT is a multicriteria problem. It can
be approached effectively through MultiCriteria
Decision-Making (MCDM) techniques [3]: procedures
for combining the magnitude of several properties
into a single quantitative measure of overall
quality.
For modelling and predicting half-life we used a
data set of 141 organic compounds, for which
experimental half-life values in different
compartments are available from Howard [4],
Mackay [5] and Rodan [6]. The molecular structure
was represented by a wide set of molecular
descriptors [7] calculated with software developed
by R. Todeschini [7,8]: Constitutional
descriptors (56), Topological descriptors (69),
Walk counts (20), BCUT descriptors (64), Galvez
indices (21), 2D Autocorrelations (96), Charge
descriptors (7), Aromaticity descriptors (4),
Molecular profiles (40), Geometrical
descriptors (18), 3D-MoRSE descriptors (160), WHIM
descriptors [9] (99), GETAWAY descriptors (196)
and Empirical descriptors (3). The best subsets
of variables for modelling half-life were
selected by a Genetic Algorithm (GA-VSS) approach,
where the response is obtained by ordinary least
squares (OLS) regression. All calculations were
performed with the MOBY-DIGS package [10], and
the models were validated by the leave-one-out
(LOO) and leave-many-out (LMO) procedures and by
scrambling of the responses.
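The leave-one-out validation of an OLS model mentioned above can be illustrated with a minimal sketch. It is not the MOBY-DIGS implementation: the function names, the small descriptor matrix and the responses are hypothetical, and Q2 is assumed to be defined as 1 - PRESS/TSS, with each compound predicted by a model fitted without it.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def ols_fit(X, y):
    """Return OLS coefficients [intercept, b1, ...] from the normal equations."""
    Z = [[1.0] + list(row) for row in X]
    p = len(Z[0])
    XtX = [[sum(z[i] * z[j] for z in Z) for j in range(p)] for i in range(p)]
    Xty = [sum(z[i] * yi for z, yi in zip(Z, y)) for i in range(p)]
    return solve(XtX, Xty)


def predict(coef, row):
    """Predict the response of one compound from its descriptor values."""
    return coef[0] + sum(b * v for b, v in zip(coef[1:], row))


def q2_loo(X, y):
    """Q2_LOO = 1 - PRESS / TSS, refitting the model with each compound left out."""
    n = len(y)
    y_mean = sum(y) / n
    press = 0.0
    for i in range(n):
        coef = ols_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - predict(coef, X[i])) ** 2
    tss = sum((yi - y_mean) ** 2 for yi in y)
    return 1.0 - press / tss


# Hypothetical data: two descriptors for six compounds.
X_demo = [[0, 1], [1, 0], [2, 3], [3, 1], [4, 4], [5, 2]]
y_exact = [1 + 2 * a - b for a, b in X_demo]   # perfectly linear response
y_noisy = [0.2, 2.9, 2.1, 5.8, 5.3, 8.9]       # same trend with noise
print(q2_loo(X_demo, y_exact), q2_loo(X_demo, y_noisy))
```

A response-scrambling check could reuse q2_loo on randomly permuted responses: a model that still scores well after permutation is likely to be a chance correlation.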
General Persistence Index
The Principal Component Analysis (PCA) of the
experimental and predicted half-lives of 141
pollutants in various media allows the chemicals
to be ranked according to their overall half-life
and relative persistence in different media. A
general Persistence Index is obtained from the
linear combination of half-life data in four
environmental media (PC1 in Fig. 1). The chemicals
on the right are the most globally persistent in
the various compartments.
Global Mobility Index
The inherent tendency of compounds towards global
mobility is regulated mainly by volatility, water
solubility, Kow and Koa. A Global Mobility Index
is obtained from the linear combination, by PCA,
of these physico-chemical properties: the PC1
score (explained variance, EV = 74.6%) in Fig. 2.
The chemicals on the right side of the plot are
those with the greatest tendency to mobility.
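Both indices are PC1 scores, so the calculation can be sketched in the same way for half-life data and for physico-chemical properties. The sketch below assumes autoscaled variables and uses power iteration on the correlation matrix; the function names and the four-compound data table are hypothetical, and the sign of PC1 is oriented so that the loadings sum to a positive value, which is a convention chosen here and not taken from the poster.

```python
import math


def autoscale(data):
    """Centre each column and scale it to unit sample standard deviation."""
    n, p = len(data), len(data[0])
    cols = list(zip(*data))
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / (n - 1))
           for c, m in zip(cols, means)]
    return [[(row[j] - means[j]) / sds[j] for j in range(p)] for row in data]


def pc1(data, iters=500):
    """Return (PC1 scores, PC1 loadings, fraction of variance explained)."""
    Z = autoscale(data)
    n, p = len(Z), len(Z[0])
    # Correlation matrix of the autoscaled variables.
    C = [[sum(z[i] * z[j] for z in Z) / (n - 1) for j in range(p)]
         for i in range(p)]
    # Power iteration converges to the leading eigenvector of C
    # (provided the start vector is not orthogonal to it).
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        w = [sum(C[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigval = sum(v[i] * sum(C[i][j] * v[j] for j in range(p)) for i in range(p))
    if sum(v) < 0:          # the sign of a principal component is arbitrary
        v = [-x for x in v]
    scores = [sum(z[j] * v[j] for j in range(p)) for z in Z]
    return scores, v, eigval / p   # trace of a correlation matrix equals p


# Hypothetical log half-lives of four compounds in four media.
half_lives = [
    [1.0, 1.2, 1.5, 2.0],
    [2.0, 2.1, 2.6, 3.1],
    [3.0, 3.3, 3.4, 4.2],
    [4.0, 3.9, 4.6, 5.0],
]
index, loadings, explained = pc1(half_lives)
print(index, explained)
```

With properties that are negatively correlated with each other (for example water solubility and Kow), the direction that corresponds to "more mobile" has to be read from the loadings before the scores are interpreted.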
Figure 1: PERSISTENCE
Figure 2: MOBILITY
Screening of Long Range Transport Potential
Finding the best combination of chemical
properties that minimizes LRT can be approached by
MultiCriteria Decision-Making (MCDM) techniques:
procedures for combining the magnitude of several
properties into a single quantitative measure of
overall quality. The utility function is chosen
here as the best combined criteria function and
is applied to the most relevant properties
determining LRT, according to the following
criteria, f(x), all to be minimized: the general
Persistence Index (Fig. 1), derived from the PCA
combination of half-life in four environmental
compartments; the Mobility Index (Fig. 2), derived
from the cited physico-chemical properties; and
the Air Half-life, which is considered
particularly relevant in determining LRT. The
k = 3 properties, equally weighted (by the weight
λ) and added in the utility function according to
the reported formula, allow a ranking of the
studied chemicals according to their LRT
potential, giving an LRT index, F(x). The
chemicals, highlighted in Fig. 3, with the lowest
utility (F(x) near 0) will exhibit the highest LRT
potential, while those with F(x) near 1 will have
the lowest possibility for LRT.
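The equally weighted utility function can be illustrated with a short sketch. The formula itself is not reproduced in this text, so the form used here is an assumption: each criterion is rescaled linearly to the range 0 to 1 so that the lowest (most desirable) value scores 1, and F(x) is the weighted sum of the three rescaled criteria with weights of 1/3. The function names and the three example compounds are hypothetical.

```python
def desirability_min(values):
    """Rescale a criterion to [0, 1]: 1 for the lowest value, 0 for the highest.

    Linear rescaling is an assumption of this sketch.
    """
    lo, hi = min(values), max(values)
    return [(hi - v) / (hi - lo) for v in values]


def lrt_index(persistence, mobility, air_half_life,
              weights=(1 / 3, 1 / 3, 1 / 3)):
    """Utility F(x): weighted sum of the three criteria, all to be minimized."""
    criteria = [desirability_min(c)
                for c in (persistence, mobility, air_half_life)]
    n = len(persistence)
    return [sum(w * f[i] for w, f in zip(weights, criteria)) for i in range(n)]


# Hypothetical values for three compounds.
persistence_index = [-2.0, 0.0, 2.0]   # PC1 scores of half-life data
mobility_index = [-1.0, 0.5, 1.5]      # PC1 scores of physico-chemical data
air_half_life = [0.5, 1.0, 3.0]        # e.g. log half-life in air

F = lrt_index(persistence_index, mobility_index, air_half_life)
for name, value in zip(("A", "B", "C"), F):
    print(name, round(value, 3))
```

In this example compound C has the highest value of every criterion, so its utility is 0 and it would be ranked as having the highest LRT potential; compound A, with the lowest values, has a utility of 1.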
Figure 3
The QSPR (Quantitative Structure-Property
Relationships) approach is applied here in two
steps: first, to fill the gaps in the experimental
data of the studied properties, and then to
model the scores of the MCDM function, the LRT
index (Fig. 3). Different kinds of theoretical
molecular descriptors have been used to obtain
OLS regression models (Fig. 4) and CART
classification models (Fig. 5) with good
predictive power: Q2LOO = 86.8% and Q2LMO = 86.2%
for the regression, and a cross-validated
misclassification risk of 6.2% for the
classification.
References
1- Klecka, G.M., Ed. (1999). SETAC Pellston Workshop. Environ. Toxicol. Chem. (Suppl.), 18, 8.
2- Gramatica, P., Pozzi, S., Consonni, V. and Di Guardo, A. (2001). SAR and QSAR in Environ. Res., in press.
3- Hendriks, M.M.W.B., De Boer, J.H., Smilde, A.K. and Doornbos, D.A. (1992). Chemom. Intell. Lab. Syst. 16, 175.
4- Howard, P.H. et al. (1991). Handbook of environmental degradation rates. http://esc.syrres.com/interkow/PhysProp.htm
5- Mackay, Shiu, Ma (2000). Illustrated handbook of physical-chemical properties and environmental fate for organic chemicals.
6- Rodan, B.D. et al. (1999). Envir. Sci. Technol., 33(2 3482-3488.
7- Todeschini, R. and Consonni, V. (2000). Handbook of molecular descriptors. Wiley.
8- Todeschini, R. (2000). DRAGON ver. 1.0, Milano. Free download from http://www.disat.unimib.it/chm
9- Todeschini, R. and Gramatica, P. (1997). Quant. Struct.-Act. Rel. 16, 113-119.
10- Todeschini, R. (1999). MOBY DIGS - Software for multilinear regression analysis and variable subset selection by Genetic Algorithm, rel. 2.1, Milan (Italy).
11- Beyer, A., Mackay, D., Matthies, M., Wania, F. and Webster, E. (2000). Environ. Sci. Technol. 34, 699-703.
Figure 4
Figure 5: Classification Tree (split descriptors
and thresholds: nC 14.50, nC 7.00, E1u 0.40;
assigned classes at the terminal nodes: 2, 3, 2, 1)
Conclusions
The ranking of the studied chemicals according to
their LRT potential, obtained by the utility
function of MCDM, can be proposed as an
alternative to other approaches based on
characteristic travel distance (CTD) [11].
An additional advantage of this approach is that
applying the QSPR models (both regression and
classification) to the scores of the MCDM utility
function (defined as the LRT index) allows a fast
pre-screening of existing and new chemicals for
their inherent tendency to LRT, based simply on
knowledge of their molecular structure.