Title: QSAR Application Toolbox: Step 12: Building a QSAR model
1 QSAR Application Toolbox Step 12: Building a QSAR model
2 Objectives
- This presentation demonstrates building a QSAR model for predicting the acute toxicity of aldehydes to Tetrahymena pyriformis. Specifically, it addresses:
  - predicting acute toxicity for a target chemical
  - building a QSAR model based on the prediction
  - applying the model to other aldehydes
  - exporting the predictions to a file.
3 The Exercise
- This exercise includes the following steps:
  - select a target chemical: Furfural, CAS 98-01-1
  - extract available experimental results
  - search for analogues
  - estimate the 48h-IGC50 for Tetrahymena pyriformis by using trend analysis
  - improve the data set by either
    - subcategorising by Protein binding mechanisms, or
    - assessing the difference between outliers and the target chemical
  - evaluate and save the model
- Use the model to display its training set, visualize its applicability domain and perform predictions.
4 Chemical Input
- After launching the Toolbox, select the Flexible Track.
- This takes you to the first module, which is Chemical input.
- Enter the target chemical by its CAS number (98-01-1).
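Before typing a CAS number into the Toolbox, its check digit can be verified outside the tool. The sketch below applies the standard CAS check-digit rule (the last digit equals the weighted sum of the preceding digits, modulo 10). The function name cas_is_valid is an illustrative choice and is not part of the Toolbox.

```python
def cas_is_valid(cas: str) -> bool:
    """Return True if a CAS Registry Number has a correct check digit."""
    parts = cas.strip().split("-")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        return False
    first, second, check = parts
    # A CAS number has 2-7 digits, then 2 digits, then 1 check digit.
    if not (2 <= len(first) <= 7 and len(second) == 2 and len(check) == 1):
        return False
    body = first + second
    # Weight the body digits 1, 2, 3, ... starting from the rightmost one.
    total = sum(w * int(d) for w, d in enumerate(reversed(body), start=1))
    return total % 10 == int(check)


# Furfural: 1*1 + 2*0 + 3*8 + 4*9 = 61, and 61 mod 10 = 1
print(cas_is_valid("98-01-1"))
```

The unhyphenated form "98011" is rejected by this sketch, so the hyphens must be entered.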
5 Select target chemical: Furfural, CAS 98-01-1
6 Substance Information
7 Profiling the Target Chemical
- Select the Profiling methods you wish to use by clicking on the box before the name of the profiler.
- For this example check all mechanistic methods.
- Click on Apply.
8 Profiling
9 Target interaction with proteins
Double-clicking shows the profiling scheme.
The chemical could interact with proteins by Schiff-base formation.
10 Target interaction with proteins
11 Endpoints
- "Endpoints" refers to the electronic process of retrieving the environmental fate, ecotoxicity and toxicity data that are stored in the Toolbox database.
- Data gathering can be executed in a global fashion (i.e., collecting all data for all endpoints) or on a more narrowly defined basis (e.g., collecting data for a single or limited number of endpoints).
12 Extracting endpoint values
13 Redundancy table: reports for the same endpoint values across databases
14 Reproducing endpoint value
In this exercise we will build a QSAR model to estimate the following endpoint:
Ecotoxicological Information > Aquatic Toxicity > Protozoa > Tetrahymena pyriformis > IGC50 > 48h
15 Defining a Category
The initial search for analogues is based on structural similarity; in this example, the US EPA categorization is used.
16 Category Definition
17 Set Category Name
18 Analogues
- The data is automatically collated.
- Based on the defined category (Aldehydes, US EPA categorisation), 274 analogues have been identified.
- These 274 compounds, along with the target chemical, form a category (Aldehydes), which can be used for data gap filling (see next slide).
19 Analogues
20 Extracting experimental results for analogues
- Highlight the 274 Aldehydes (US EPA categorisation).
- The inserted window entitled Read Data? appears (see next slide).
- Click OK.
21 Extracting experimental results for analogues
22 Extracting experimental results for analogues
23 Applying Trend-analysis
- Move to the module Filling data gap.
- Open the data tree to:
  - Ecotoxicological information
  - Protozoa
  - Tetrahymena pyriformis
  - IGC50
  - 48 h
- Highlight the data endpoint box under the target chemical.
- It already contains an experimental result, which we are going to reproduce by trend analysis.
- Next, with the trend analysis box highlighted, click Apply (see next slide).
24 Apply Trend-analysis
25 Results of Trend-analysis
26 Interpreting the Trend-analysis
- The resulting plot shows the available experimental results of all analogues (Y axis) against a default descriptor, Log Kow (X axis).
- The RED dot represents the target chemical.
- The BLUE dots represent the experimental results available for the analogues.
- The GREEN dots represent the analogues belonging to a different subcategory (see following slides).
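The idea behind the trend plot can be sketched as a straight-line fit of the analogues' toxicity values against Log Kow, followed by reading off the value at the target's Log Kow. This is a minimal illustration that assumes a single-descriptor linear regression; the exact fitting procedure used by the Toolbox is not described here, and all numbers below are hypothetical, not Toolbox data.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical analogue data (BLUE dots): Log Kow and a toxicity value
log_kow = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
toxicity = [-0.60, -0.35, -0.05, 0.20, 0.45, 0.75]

slope, intercept = fit_line(log_kow, toxicity)

# Hypothetical Log Kow for the target chemical (RED dot)
target_log_kow = 0.8
prediction = slope * target_log_kow + intercept
print(f"slope={slope:.3f} intercept={intercept:.3f} prediction={prediction:.3f}")
```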
27 An Accurate Trend Analysis of the Data set (1)
- In this example, the mechanistic properties of the analogues are not consistent.
- Subcategorization can be performed based on protein binding mechanisms. This is the second stage of the analogue search, which requires the same interaction mechanism.
- Acute effects are indeed associated with the interaction of chemicals with the lipid cell membrane, i.e. with protein binding.
- Chemicals with a protein binding mechanism different from that of the target chemical will be removed.
28 Subcategorization
- To improve the data by subcategorizing, follow these steps:
  - Click on Subcategor.
  - Select Protein binding from the Grouping methods list.
  - All chemicals which have a potential protein binding mechanism different from the target chemical are highlighted (GREEN dots).
  - Click on Remove.
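The subcategorization step amounts to comparing each analogue's protein binding profile with that of the target and removing those that differ. A minimal sketch follows, assuming that "different" means the set of mechanisms is not identical to the target's; the Analogue class, the subcategorize function and the analogue names are hypothetical and not part of the Toolbox.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Analogue:
    name: str
    mechanisms: frozenset  # protein binding profiling result


def subcategorize(target_mechanisms, analogues):
    """Split analogues into (kept, removed) by comparing profiles with the target."""
    kept, removed = [], []
    for a in analogues:
        # Assumption: an analogue is kept only if its profile equals the target's.
        (kept if a.mechanisms == target_mechanisms else removed).append(a)
    return kept, removed


target = frozenset({"Schiff-base formation"})
analogues = [
    Analogue("analogue A", frozenset({"Schiff-base formation"})),
    Analogue("analogue B", frozenset({"Michael type nucleophilic addition"})),
    Analogue("analogue C", frozenset({"No binding"})),
]

kept, removed = subcategorize(target, analogues)
print([a.name for a in kept])     # analogues that stay in the category
print([a.name for a in removed])  # analogues shown as GREEN dots and removed
```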
29 Subcategorization
30 Result after Subcategorization
31 An Accurate Trend Analysis of the Data set (2)
- The chemicals which differ from the target have the following profiles:
  - Michael type nucleophilic addition (23)
  - No binding (48)
  - Nucleophilic addition to azomethynes (1)
  - Nucleophilic substitution of haloaromatics (1)
- Another way of refining the data set is to ask what makes the obvious outliers different from the target.
32 Subcategorization
- Right-click on any of the outlying results from the analogues (BLUE dots).
- Select Differences to target from the menu.
- Select Protein binding from the Grouping methods list.
- Click on Remove (see next slide).
33 Subcategorization
34 Result after Subcategorization
35 QSAR Model evaluation
- To assess the model accuracy use:
  - Adequacy (predictions after leave-one-out)
  - Statistics
  - Cumulative frequency
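Leave-one-out adequacy can be illustrated as follows: each chemical is removed in turn, the line is refitted on the remaining chemicals, and the removed chemical is predicted from that refit. The sketch assumes a single-descriptor linear model and uses one common definition of the cross-validated q2 (1 minus PRESS over the total sum of squares); the statistics reported by the Toolbox may be defined differently, and the data are hypothetical.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def loo_predictions(x, y):
    """Predict each point from a model fitted without that point."""
    preds = []
    for i in range(len(x)):
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        preds.append(slope * x[i] + intercept)
    return preds


def q_squared(y, preds):
    """Cross-validated q2 = 1 - PRESS / total sum of squares."""
    my = mean(y)
    press = sum((yi - pi) ** 2 for yi, pi in zip(y, preds))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - press / ss_tot


# Hypothetical training data: Log Kow and a toxicity value
log_kow = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
toxicity = [-0.60, -0.35, -0.05, 0.20, 0.45, 0.75]

preds = loo_predictions(log_kow, toxicity)
print(f"q2 = {q_squared(toxicity, preds):.3f}")
```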
36 QSAR Model evaluation
37 QSAR Model evaluation
38 QSAR Model evaluation
The residuals, abs(obs - predicted), for 95% of the analogues are comparable with the variation of the experimental data.
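The cumulative frequency view can be reproduced by sorting the absolute residuals and asking what residual size covers a given share of the analogues (for example 95%). The sketch below uses hypothetical residuals and illustrative function names; it does not reproduce the Toolbox's own calculation.

```python
import math


def cumulative_frequency(residuals, threshold):
    """Fraction of absolute residuals that are <= threshold."""
    abs_res = [abs(r) for r in residuals]
    return sum(1 for r in abs_res if r <= threshold) / len(abs_res)


def residual_covering(residuals, fraction):
    """Smallest absolute residual that covers at least `fraction` of the points."""
    ordered = sorted(abs(r) for r in residuals)
    k = max(1, math.ceil(fraction * len(ordered)))
    return ordered[k - 1]


# Hypothetical residuals (observed minus predicted)
residuals = [0.05, -0.12, 0.20, -0.08, 0.31, -0.02, 0.15, -0.40, 0.10, 0.07]

print(cumulative_frequency(residuals, 0.2))
print(residual_covering(residuals, 0.95))
```

The value returned by residual_covering can then be compared with the known variation of the experimental data.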
39 Saving the Derived QSAR Model
- To save the new regression model follow these steps:
  - Click on the Save model button
  - Enter the model name Acute tox
  - Click on OK and
  - Accept the value
40 QSAR Model evaluation
41 Apply QSAR model
- The derived model can be used to:
  - List training set chemicals
    - Right-click on the QSAR model Acute tox
    - Select training set from the context menu
  - Visualize whether a chemical is in the applicability domain of the model
    - In the data matrix highlight the empty cell of one of the analogues (e.g. chemical no 2 in the matrix) for the endpoint 48-h IGC50 Tetrahymena pyriformis
    - Right-click on the QSAR model Acute tox
    - Select Display domain
  - Perform predictions for the chemicals in the matrix
    - Right-click on the QSAR model Acute tox
    - Select Predict endpoint and All Chemicals in domain
42 Apply QSAR model: Training set
43 Apply QSAR model: Visualize whether a chemical is in the applicability domain of the model
- The chemical is an aldehyde, as required by the model. It can react with protein by Schiff-base formation and does not react with protein by any of the eliminated mechanisms:
  - Michael-type nucleophilic addition
  - No binding
  - Nucleophilic addition to azomethynes
  - Nucleophilic substitution of haloaromatics
- Another requirement is for Log Kow to be > 0.3210 and < 4.75. The last requirement is slightly violated (Log Kow = 4.87), and therefore the chemical is outside the applicability domain of the model.
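The domain check described above can be written as a small rule set: the Log Kow must lie within the stated range, the required mechanism must be present and none of the eliminated mechanisms may be present. The sketch uses the limits and mechanism names from this slide. The function name and the way the rules are combined are assumptions, and the aldehyde requirement is left out for brevity.

```python
LOG_KOW_MIN = 0.3210
LOG_KOW_MAX = 4.75
REQUIRED_MECHANISM = "Schiff-base formation"
ELIMINATED_MECHANISMS = {
    "Michael-type nucleophilic addition",
    "No binding",
    "Nucleophilic addition to azomethynes",
    "Nucleophilic substitution of haloaromatics",
}


def check_domain(log_kow, mechanisms):
    """Return (in_domain, reasons) for one chemical."""
    reasons = []
    if not (LOG_KOW_MIN < log_kow < LOG_KOW_MAX):
        reasons.append(f"Log Kow {log_kow} is outside ({LOG_KOW_MIN}, {LOG_KOW_MAX})")
    if REQUIRED_MECHANISM not in mechanisms:
        reasons.append(f"missing mechanism: {REQUIRED_MECHANISM}")
    hits = ELIMINATED_MECHANISMS & set(mechanisms)
    if hits:
        reasons.append("eliminated mechanism(s): " + ", ".join(sorted(hits)))
    return not reasons, reasons


# The chemical on this slide: correct mechanism, but Log Kow = 4.87
in_domain, reasons = check_domain(4.87, {"Schiff-base formation"})
print(in_domain, reasons)
```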
44 Apply QSAR model: Visualize whether a chemical is in the applicability domain of the model
45 Apply QSAR model: Perform predictions
46 Apply QSAR model: Perform predictions
47 Export QSAR results
- The predictions for the chemicals in the matrix can be exported into a text file.
- In the data tree, right-click on 48 h (for the endpoint IGC50 for Tetrahymena pyriformis) and select Export endpoint data from the menu.
48 Export QSAR results
Click the right mouse button.
49 Export QSAR results
50 Export QSAR results
51 Export QSAR results
- The resulting text file can be loaded into a
spreadsheet and further analysed.
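As an alternative to a spreadsheet, the exported file could also be read with a short script. The layout of the export is not specified in this presentation, so the sketch below assumes a tab-delimited file with a header row; the column names and the sample rows are hypothetical and should be adapted to the actual file.

```python
import csv
import io

# Hypothetical stand-in for the exported text file (tab-delimited, header row)
SAMPLE = (
    "Name\tLog Kow\tPredicted\n"
    "chemical A\t1.20\t-0.25\n"
    "chemical B\t2.75\t0.60\n"
)


def load_predictions(handle, delimiter="\t"):
    """Read exported rows into a list of dicts keyed by the header row."""
    return list(csv.DictReader(handle, delimiter=delimiter))


rows = load_predictions(io.StringIO(SAMPLE))
for row in rows:
    print(row["Name"], float(row["Predicted"]))
```

For a real export, replace io.StringIO(SAMPLE) with an open file handle, e.g. open(path, newline="").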