Title: CORALSEA
1. CORALSEA
Workflow
2. The CORALSEA software is a tool for building quantitative structure-property / structure-activity relationships (QSPRs/QSARs).
- CORALSEA represents molecular structure with SMILES (simplified molecular input-line entry system).
- For details, please see http://www.daylight.com/dayhtml/doc/theory/theory.smiles.html
3. For this demo of CORALSEA we use our model from the article "The definition of the molecular structure for potential anti-malaria agents by the Monte Carlo method", Struct. Chem. 2013, 24: 1369-1381. You can develop a better model, but for now please follow our suggestions.
4. The first action is the preparation of the SMILES file, which is the input for CORALSEA.
Each compound should be represented by (1) the type, a marker for the set the compound belongs to (for example "-"); (2) the ID, which can be a CAS (Chemical Abstracts Service) number or any other number; (3) the SMILES; and (4) the endpoint value. The type marker indicates whether the compound belongs to the sub-training set, the calibration set ("-"), or the test set. The sub-training set plays the role of developer of the model, the calibration set the role of critic of the model, and the test set the role of estimator of the model.
- 1 COc1ccc2c(c1)NC(C)C(CCCCCCC)C2O 7.332
- 2 COc1ccc2c(c1)NC(C)CC2O 4.903
- 3 OC1c2ccccc2NC(C)C1CCCCCCC 6.979
- 4 OC1c2ccccc2NC(C)C1CCCCCCCCC 7.400
- 5 OC1c3ccccc3NC(C)C1C2CCCCC2 5.652
- -6 OC1c3ccccc3NC(C)C1c2ccccc2 6.270
- 7 OC2c3ccccc3NC(C)C2Cc1ccccc1 5.207
- 8 OC1c2ccccc2NC(C)C1Br 7.110
- -9 OC1c2ccccc2NC(C)C1\CC\CCCCCCC 7.824
- 10 CC(CCCCCCC)C1C(O)c2ccccc2NC1C 7.472
- 12 OC2c3ccccc3NC(C)C2/CC/c1ccccc1 5.827
- 13 COc1ccc2NC(C)C(Br)C(O)c2c1 5.934
- -14 Cc1ccc2NC(C)C(Br)C(O)c2c1 6.583
- 15 Brc1ccc2NC(C)C(Br)C(O)c2c1 6.470
- 17 Fc1ccc2NC(C)C(Br)C(O)c2c1 6.903
- 18 Clc1ccc2NC(C)C(CCCCCC)C(O)c2c1 4.336
- 19 COc2cccc3NC(C)C(Cc1ccccc1)C(O)c23 5.675
- -21 COc1ccc3c(c1)NC(C)C(Cc2ccccc2)C3O 5.859
- -22 COc1cccc2NC(C)C(C(O)c12)c3ccccc3 5.295
MyFile.txt
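A minimal sketch of how rows in this format could be read outside CORALSEA, for example to check a file before loading it. It assumes whitespace-separated fields and an optional one-character set marker fused to the ID, as in "-6" above. The names `Compound` and `parse_row` are hypothetical and are not part of CORALSEA.

```python
import re
from typing import NamedTuple


class Compound(NamedTuple):
    set_marker: str    # leading set symbol such as "-", or "" if none is present
    compound_id: str
    smiles: str
    endpoint: float


# Assumed layout: [marker]ID, SMILES, endpoint value, separated by whitespace.
_ROW = re.compile(
    r"^(?P<marker>[^\w\s]?)(?P<id>\S+)\s+(?P<smiles>\S+)\s+(?P<value>\S+)$"
)


def parse_row(line: str) -> Compound:
    """Split one input row into marker, ID, SMILES and endpoint value."""
    match = _ROW.match(line.strip())
    if match is None:
        raise ValueError(f"cannot parse row: {line!r}")
    return Compound(
        match["marker"], match["id"], match["smiles"], float(match["value"])
    )


rows = [
    parse_row(text)
    for text in (
        "-6 OC1c3ccccc3NC(C)C1c2ccccc2 6.270",
        "8 OC1c2ccccc2NC(C)C1Br 7.110",
    )
]
for row in rows:
    print(row)
```

Keeping the marker as its own field makes it easy to count how many compounds fall into each set before a run.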
5. It is a good idea to reserve some substances as an "invisible" validation set for the final estimation of the model.
- The format of the file for this validation set is the following: (1) the number of compounds; (2) the list of compounds in the above-mentioned format (type, ID, SMILES, endpoint value).
- 10
- 11 OC1c2ccccc2NC(C)C1C\CC\CCCCCC 6.728
- 16 Clc1ccc2NC(C)C(Br)C(O)c2c1 6.900
- 20 COc2ccc3NC(C)C(Cc1ccccc1)C(O)c3c2 4.624
- 27 Clc1ccc3c(c1)NC(C)C(Cc2ccccc2)C3O 4.805
- 32 Clc1cc2c(cc1Cl)NC(C)C(C2O)c3ccccc3 6.456
- 40 Clc1cc2c(cc1OC)NC(C)C(CC)C2O 7.559
- 42 Clc1cc2c(cc1OC)NC(C)C(CCCCCCC)C2O 8.530
- 43 Clc1cc2c(cc1OC)NC(C)C(CCCCCCCCC)C2O 8.779
- 51 CC(CCCCC)C1C(O)c2cc(Cl)c(cc2NC1C)OC 7.830
- 52 Clc1cc2c(cc1OC)NC(C)C(\CC\CCCCC)C2O 7.975
MyInput.txt
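The count on the first line is easy to get out of step with the rows that follow. The sketch below checks that the two agree. It assumes the layout described above (a count line followed by whitespace-separated rows); `read_validation_file` is a hypothetical helper, not a CORALSEA function.

```python
from typing import List, Tuple

Row = Tuple[str, str, float]  # (ID as written, SMILES, endpoint value)


def read_validation_file(text: str) -> List[Row]:
    """Parse a validation file and verify the declared number of compounds."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("file is empty")
    declared = int(lines[0])
    rows: List[Row] = []
    for line in lines[1:]:
        # The ID field is kept as written, including any set marker fused to it.
        compound_id, smiles, value = line.split()
        rows.append((compound_id, smiles, float(value)))
    if len(rows) != declared:
        raise ValueError(
            f"header declares {declared} compounds, found {len(rows)}"
        )
    return rows


sample = """2
16 Clc1ccc2NC(C)C(Br)C(O)c2c1 6.900
20 COc2ccc3NC(C)C(Cc1ccccc1)C(O)c3c2 4.624
"""
rows = read_validation_file(sample)
print(len(rows), "compounds read")
```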
6. To start your work, download CORALSEA.zip from www.insilico.eu/coral. When the download is done, place the folder "CORALSEA" on your computer
7. and put your data (i.e. MyTRNCLBTST.txt) in the folder MyCORALSEA.
8. The contents of MyCORALSEA are the following:
9. To carry out a QSPR/QSAR analysis of data for a CLASSIFICATION MODEL, do the following:
- Insert TRNCLBTST-1.txt in the folder.
- Insert Input-1.txt in the folder.
- Click CORALSEA.exe.
TRNCLBTST.txt is the file that contains the training (TRN), calibration (CLB), and test (TST) sets. Input.txt contains data that are not visible while the model is being built.
10. The following appears on your screen:
Click the button Load method
11. The following appears on your screen:
Insert the name TRNCLBTST-1.txt in the text box
12. The following appears on your screen:
Click SAVE SYSTEM
13. The following appears on your screen:
Restart the program and click Load system
14. The following appears on your screen:
Click OK
15. The following appears on your screen:
This plot relates to the external invisible validation set.
16. The following appears on your screen:
File Output-1.txt contains statistical characteristics for the validation set (Output-1.txt is placed in the folder Model).
17. To carry out a QSPR/QSAR analysis of data for a REGRESSION MODEL, do the following:
- Insert TRNCLBTST.txt in the folder.
- Insert Input-1.txt in the folder.
- Click CORALSEA.exe.
TRNCLBTST.txt is the file that contains the training (TRN), calibration (CLB), and test (TST) sets. Input.txt contains data that are not visible while the model is being built.
18. The following appears on your screen:
Insert the name TRNCLBTST-1.txt in the text box. After this, please select Classic Scheme or Balance of Correlation for your QSPR/QSAR investigation.
19. The following appears on your screen:
Two actions: (1) define the method and (2) save the method.
20. The following appears on your screen:
You can involve graph invariants in addition to SMILES attributes.
21. The following appears on your screen:
You can use classic scheme, balance of correlations, and Ideal slopes C1,C1
22. The following appears on your screen:
You can choose your mode, e.g. (1) define Dstart=0.25 and (2) Nepoch=20; after this you must (3) click Save method, otherwise the method remains the same.
23. The following appears on your screen:
Click Search for preferable model (T,N)
24. The following appears on your screen:
The program will carry out the Monte Carlo optimization with various thresholds and numbers of epochs. The preferable values of the threshold and the number of epochs can be found in the file Search/BestMDL.txt when the calculation is completed.
25. The contents of the file Search/BestMDL.txt will be approximately the following:
One can see that the preferable threshold (T) is 2 and the preferable number of epochs (N) is 15. One can use this information to build up a robust model.
26. An attempt to build up a robust model
- Create the folder MyCORALSEA-T2-N15 (a copy of MyCORALSEA)
- Run CORALSEA.exe in this folder MyCORALSEA-T2-N15
- Click Load method
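Copying the working folder can also be scripted, which helps when several (T,N) pairs are tried. This is only a sketch that follows the folder naming used on this slide; `clone_work_folder` is a hypothetical helper, and CORALSEA.exe still has to be started by hand in the new folder.

```python
import shutil
from pathlib import Path


def clone_work_folder(source: Path, threshold: int, n_epochs: int) -> Path:
    """Copy a CORALSEA working folder to '<name>-T<threshold>-N<n_epochs>'."""
    target = source.with_name(f"{source.name}-T{threshold}-N{n_epochs}")
    # copytree raises FileExistsError if the target already exists,
    # so an earlier run is never overwritten silently.
    shutil.copytree(source, target)
    return target


# Example call (the path is an assumption about where the folder lives):
# clone_work_folder(Path("MyCORALSEA"), threshold=2, n_epochs=15)
```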
27. The following appears on your screen:
(1) Insert Nepoch=15, (2) click Building up preferable model (T,N), (3) insert Threshold=2, and (4) click Continue
28. The following appears on your screen:
Click Yes
29. The program will gradually calculate the model.
30. When the model is ready, the screen will look as follows:
Click Save system
31. The folder Model contains the parameters of the QSPR/QSAR model.
File Output-1.txt contains statistics for the invisible validation set.
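The slides do not say which statistics Output-1.txt reports. As an illustration only, the sketch below computes two measures commonly used for regression models, the squared correlation coefficient and the root-mean-square error, from observed and calculated endpoint values. The function names are hypothetical, and the choice of these two measures is an assumption.

```python
import math
from typing import Sequence


def rmse(observed: Sequence[float], calculated: Sequence[float]) -> float:
    """Root-mean-square error between observed and calculated values."""
    squared = [(o - c) ** 2 for o, c in zip(observed, calculated)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(observed: Sequence[float], calculated: Sequence[float]) -> float:
    """Squared Pearson correlation between observed and calculated values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_c = sum(calculated) / n
    cov = sum((o - mean_o) * (c - mean_c) for o, c in zip(observed, calculated))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_c = sum((c - mean_c) ** 2 for c in calculated)
    return cov * cov / (var_o * var_c)
```

Both functions take the observed endpoint values of the validation set and the values calculated by the model, in the same order.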
32. When the model is ready, the screen will look as follows:
Click Load system
33. The following will appear on the screen:
(1) Insert the name MyInput.txt instead of Input-1.txt, (2) click Start of DCW and Endpoint calculation for SMILES input file
34. The following will appear on the screen:
After these actions, the file Model/Output.txt will contain the results of the calculation for the compounds from MyInput.txt. Click OK
35. The following will appear on the screen:
You will see a graphical representation of the sub-training, calibration, test, and validation sets.
36. The contents of Model/Output.txt will be the following:
Last, but not least
37. One can calculate the model for an individual SMILES:
(1) Insert the SMILES in the indicated box
(2) Click Start of DCW and Endpoint Calculation for Inserted SMILES
38. The following appears on your screen:
See the file Model/DemoDesc.txt
39. The contents of Model/DemoDesc.txt are the following:
DCW is DCW(2,15) for NC(CCCNC(N)N)C(O)O; Endpoint=2.9412. This example is only a demo; NC(CCCNC(N)N)C(O)O is apparently outside the domain of applicability.
40. These slides have shown the "technology"; to understand the "philosophy", please read the file "ReadMe.pdf".
41. Some definitions
Thank you for your attention! CORALSEA TEAM