Title: Software tools for chemical and scientific informatics
1. Software tools for chemical and scientific informatics
- David Wild
- Visiting Assistant Professor
- School of Informatics
- djwild_at_indiana.edu
2. Software at IUB Informatics
- Spotfire DecisionSite
- Chemistry / molecular modeling applications
  - ChemTK, ArgusLab, MOE
  - BCI software: cluster analysis
  - OpenEye software: docking
  - ChemAxon
- Chemoinformatics programming toolkits
  - Daylight, BCI, OpenEye
3. Spotfire DecisionSite
- Scientific visualization software for analyzing large amounts of data
- Excellent Swedish design!
- Extensive functionality, tailored to the life sciences industries
- We currently have a 20-seat license at Indiana; some seats are still available
4. Charles Minard, 1861
5. ChemTK
- Windows program for organizing and doing simple analysis on chemical datasets
- Free lite version can handle 250 structures, available for download from www.sageinformatics.com
- We have a license for the full version, which can handle 1 million structures
6. ArgusLab
- Free molecular modeling program, downloadable from http://www.planaria-software.com/
- Has been extensively evaluated at Indiana
- Good 3D molecular visualization capability with a surprising amount of built-in functionality (e.g. docking)
- Some robustness problems and rough edges
7. MOE
- Molecular Operating Environment
- Molecular modeling tool with a flexible scripting language
- See www.chemcomp.com
8. BCI Software
- Cluster analysis (Ward's, divisive k-means, k-means)
  - Designed for chemistry fingerprints, but can be used on any bitstring or numeric data
- Fingerprint generation
- Custom chemical fragment dictionaries
- Markush structure handling
- Enumeration, non-enumerative profiling and selection
- See www.bci.gb.com
9. Comparative execution times: NCI subsets, 2.2 GHz Intel Celeron processor
[Chart; value labels: 7h 27m, 3h 06m, 2h 25m, 44m]
10. Clustering a 1 million compound dataset on a 2.2 GHz Celeron desktop machine
The time for a single run may vary because the seeds are selected differently each time. Runtimes can be shortened, e.g. by setting a maximum number of iterations or a relocation cutoff.
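
The two stopping rules mentioned above can be sketched in a plain k-means relocation loop. This is an illustrative sketch only, not the BCI implementation: the function name `kmeans`, its parameters, and the definition of the cutoff as "fraction of points relocated in one pass" are all assumptions made for this example. The seeds are supplied by the caller in `centroids`, which is why different seed choices can lead to different numbers of passes.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>
#include <stdlib.h>

/* Squared Euclidean distance between two dim-dimensional points. */
static double sqdist(const double *p, const double *q, size_t dim)
{
    double s = 0.0;
    for (size_t j = 0; j < dim; j++) {
        double d = p[j] - q[j];
        s += d * d;
    }
    return s;
}

/* Hypothetical k-means loop with two early-exit controls.
 * data:      n * dim values, row-major
 * centroids: k * dim values; seeds on entry, final centroids on exit
 * assign:    n cluster labels (output)
 * max_iter:  hard cap on the number of passes
 * cutoff:    stop once the fraction of points relocated in a pass <= cutoff
 * Returns the number of passes performed, or -1 if allocation fails. */
int kmeans(const double *data, size_t n, size_t dim,
           double *centroids, size_t k, int *assign,
           int max_iter, double cutoff)
{
    assert(data && centroids && assign && n > 0 && k > 0 && dim > 0);
    double *sums = malloc(k * dim * sizeof *sums);
    size_t *counts = malloc(k * sizeof *counts);
    if (!sums || !counts) { free(sums); free(counts); return -1; }

    for (size_t i = 0; i < n; i++)
        assign[i] = -1;                 /* nothing assigned yet */

    int iter = 0;
    while (iter < max_iter) {
        size_t moved = 0;
        iter++;
        /* Assignment step: move each point to its nearest centroid. */
        for (size_t i = 0; i < n; i++) {
            int best = 0;
            double best_d = DBL_MAX;
            for (size_t c = 0; c < k; c++) {
                double d = sqdist(data + i * dim, centroids + c * dim, dim);
                if (d < best_d) { best_d = d; best = (int)c; }
            }
            if (best != assign[i]) { assign[i] = best; moved++; }
        }
        /* Update step: recompute each centroid as the mean of its members. */
        for (size_t c = 0; c < k * dim; c++) sums[c] = 0.0;
        for (size_t c = 0; c < k; c++) counts[c] = 0;
        for (size_t i = 0; i < n; i++) {
            size_t c = (size_t)assign[i];
            counts[c]++;
            for (size_t j = 0; j < dim; j++)
                sums[c * dim + j] += data[i * dim + j];
        }
        for (size_t c = 0; c < k; c++)
            if (counts[c] > 0)          /* empty clusters keep their old centroid */
                for (size_t j = 0; j < dim; j++)
                    centroids[c * dim + j] = sums[c * dim + j] / (double)counts[c];

        /* Relocation cutoff: stop when few enough points changed cluster. */
        if ((double)moved / (double)n <= cutoff)
            break;
    }
    free(sums);
    free(counts);
    return iter;
}
```

Calling `kmeans(data, n, dim, centroids, k, assign, 100, 0.0)` runs until no point changes cluster (or 100 passes); raising `cutoff` to, say, 0.01 or lowering `max_iter` trades some cluster quality for a shorter run.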
11. OpenEye Software
- OMEGA
  - Generation of 3D conformers from 2D structures
- FRED
  - Very fast docking of structures to proteins
- Other programs available, see www.eyesopen.com
12. ChemAxon / JChem
- Set of modules
  - Marvin: a collection of Java tools for drawing, displaying and characterizing chemical structures, substructures and reactions.
  - JChem Base: adds a chemical interface to corporate databases, which can be applied for combined SQL and structural queries; imports/exports molecules, substructures, or reactions in standard formats (Molfile, SD file, RD file, SMILES, SMARTS, etc.).
  - JChem Cartridge: adds chemical knowledge to the Oracle platform, giving automatic access to Oracle's security, scalability, and replication features.
  - Standardizer: structure canonicalization tool converting molecules from different formats into a standard representation.
  - Screen: screening based on pharmacophore or chemical fingerprints or other descriptors.
  - Reactor: generating reaction products from reaction equations and reactants.
  - Fragmenter: generating building blocks based on RECAP rules from molecule libraries.
  - Serial Molecule Generator: transforming molecules by a sequence of user-defined transformations.
  - Chemical Term Evaluator: evaluating chemical expressions.
  - JKlustor: clustering and diversity calculations based on molecular fingerprints or other properties.
- See www.chemaxon.com and www.jchem.com
13. Chemistry Toolkits
- A toolkit is an interface to pre-written routines that perform chemistry-related tasks
- Provides chemistry objects, e.g.
  - Molecule, atom, dataset, cluster, fingerprint
- And functions to work on these objects, e.g.
  - Convert SMILES to molecule object
  - Delete atom from a molecule
  - Cluster a dataset
  - Calculate Tanimoto similarity between fingerprints
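
As an illustration of the last function in the list, here is a minimal sketch of Tanimoto similarity between two bitstring fingerprints. It is written from the standard definition, c / (a + b - c), where a and b are the numbers of bits set in each fingerprint and c is the number set in both. It is not taken from any of the toolkits named in these slides; the function name `tanimoto`, the packing of a fingerprint into 64-bit words, and the choice to return 0.0 for two empty fingerprints are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count the set bits in one 64-bit word (portable, no compiler builtins). */
static int popcount64(uint64_t w)
{
    int n = 0;
    while (w) {
        w &= w - 1;     /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Tanimoto coefficient c / (a + b - c) for two fingerprints of nwords
 * 64-bit words each. Returns 0.0 when neither fingerprint has a bit set
 * (a convention chosen for this sketch). */
double tanimoto(const uint64_t *fp1, const uint64_t *fp2, size_t nwords)
{
    assert(fp1 && fp2);
    long a = 0, b = 0, c = 0;
    for (size_t i = 0; i < nwords; i++) {
        a += popcount64(fp1[i]);            /* bits set in fp1 */
        b += popcount64(fp2[i]);            /* bits set in fp2 */
        c += popcount64(fp1[i] & fp2[i]);   /* bits set in both */
    }
    if (a + b - c == 0)
        return 0.0;
    return (double)c / (double)(a + b - c);
}
```

For two one-word fingerprints `0xF` and `0x3C`, each has four bits set and they share two, so the coefficient is 2 / (4 + 4 - 2) = 1/3; identical non-empty fingerprints give 1.0.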
14. Commercial Toolkits Available
- Daylight Toolkit (2D)
  - http://www.daylight.com/products/
  - http://www.daylight.com/meetings/summerschool00/course/toolkit/ for examples
  - Fingerprints, SMILES, SMARTS, reaction, database
  - Available on Unix platforms
- BCI Toolkit (2D)
  - http://www.bci.gb.com
  - Fingerprints, clustering, Markush
  - Available on Unix and Windows platforms
15. Commercial Toolkits Available
- OpenEye OEChem (2D/3D)
  - http://www.eyesopen.com/oechem/
  - Wide range of 2D and 3D functionality
  - Available for Windows / Unix / Mac
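16. Example 1

    #include <stdio.h>
    #include <string.h>
    #include "dt_smiles.h"

    int main(void)
    {
        char smiles[200];
        dt_Handle mol;

        printf("Enter SMILES String: ");
        if (fscanf(stdin, "%199s", smiles) != 1)   /* bounded read */
            return 1;
        /* Parse the SMILES string into a molecule object */
        mol = dt_smilin(strlen(smiles), smiles);
        /* Count the atoms in the molecule */
        printf("\nMolecule has %d atoms.\n",
               dt_count(mol, TYP_ATOM));
        return 0;
    }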
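17. Example 2

    #include <stdio.h>
    #include <string.h>
    #include "dt_smiles.h"
    #include "dt_smarts.h"

    int main(void)
    {
        char smiles[200], smarts[200];
        dt_Handle mol, sub;

        printf("Enter SMILES String: ");
        if (fscanf(stdin, "%199s", smiles) != 1)   /* bounded read */
            return 1;
        printf("Enter SMARTS String: ");
        if (fscanf(stdin, "%199s", smarts) != 1)
            return 1;
        /* Parse the molecule and the substructure pattern */
        mol = dt_smilin(strlen(smiles), smiles);
        sub = dt_smartin(strlen(smarts), smarts);
        /* A non-null match result means the pattern occurs in the molecule */
        if (dt_match(sub, mol, 1) != NULL_OB)
            printf("SMARTS substructure is contained within the SMILES structure\n");
        return 0;
    }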
18. More information on using the Daylight Toolkit
- Daylight is installed on xavier.informatics.indiana.edu
- Look in $DY_ROOT/contrib/src/c for example source code and makefiles
- The easiest way to build your own program is to copy an existing makefile (e.g. in the $DY_ROOT/contrib/src/c/smiles directory) and modify it for any libraries / source code you need
- See the Daylight website at www.daylight.com, especially the support documentation