Title: Chemical Informatics and Cyberinfrastructure Building Blocks
1. Chemical Informatics and Cyber-infrastructure Building Blocks
- Chemical Informatics Resources
- Deluge of experimental data
- More than 100,000 compounds screened by 10 publicly funded high-throughput screening centers using various assay techniques (molecular to cellular)
- Molecular Libraries Screening Center Network
- Chemical databases maintained by various groups
- NIH PubChem, NIH DTP
- Chemical informatics and computational chemistry
- Data clustering, data mining, descriptor calculations, toxicity prediction, docking, molecular modeling, and quantum chemistry
- Visualization tools
- Web resources: journal articles, etc.
- A Chemical Informatics Grid will need to
integrate these into a common, loosely coupled,
open, distributed computing environment.
2. Our Solution Stack
[Diagram of the stack layers: Portals and Other User Interfaces; Workflow and Service Management; Web and Grid Services]
- Domain-specific Web Services
- VOTables, CDK services
- Grid services, Cyber-infrastructure for computationally intensive applications
- Clustering, quantum chemistry
- Workflow and service management
- We work with Taverna
- Many solutions: Kepler, BPEL engines, etc.
- Portlets and other user interfaces
- Rich desktop apps
- Ubiquitous clients
Each level is a subject for research and development, as is their integration.
3. Wrapping Science Applications as Services
- Science Grid services typically must wrap legacy applications written in C or Fortran.
- You must handle such problems as
- Specifying several input and output files
- These may need to be staged in
- Launching executables and monitoring their progress
- Specifying environment variables
- Often these applications also come with shell scripts that do miscellaneous tasks.
- How do you convert this to WSDL?
- Or (equivalently) how do you automatically generate the XML job description for WS-GRAM?
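The wrapping problems listed above (staging input files, setting environment variables, launching an executable, monitoring it, and collecting outputs) can be sketched in a few lines. This is a minimal illustration only, not the WSDL or WS-GRAM machinery the slide asks about; the function name `run_wrapped`, its parameters, and the `stdout.txt`/`stderr.txt` file names are assumptions made for the example.

```python
import os
import shutil
import subprocess
import sys
import tempfile
import time
from pathlib import Path


def run_wrapped(command, input_files, output_names, env_vars, poll_seconds=0.1):
    """Stage inputs, launch a legacy executable, monitor it, collect outputs.

    Hypothetical helper for illustration; not part of any Grid toolkit.
    """
    workdir = Path(tempfile.mkdtemp(prefix="job_"))

    # Stage in: copy every input file into the job's working directory.
    for src in input_files:
        shutil.copy(src, workdir / Path(src).name)

    # Environment: start from the caller's environment and add job settings.
    env = dict(os.environ)
    env.update(env_vars)

    # Launch without blocking so that progress can be monitored by polling.
    with open(workdir / "stdout.txt", "w") as out, open(workdir / "stderr.txt", "w") as err:
        proc = subprocess.Popen(command, cwd=workdir, env=env, stdout=out, stderr=err)
        while proc.poll() is None:
            time.sleep(poll_seconds)

    # Report only those declared outputs that were actually produced.
    outputs = {name: workdir / name for name in output_names if (workdir / name).exists()}
    return {"exit_code": proc.returncode, "workdir": workdir, "outputs": outputs}
```

A service front end would expose the same four pieces of information (command, input files, expected outputs, environment) as its request message, which is roughly what a generated job description has to carry.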
4. Flow Chart: SMILES to Cluster Partitions in the BCI Web Service
[Flow chart. Labels recovered from the figure:
- SMILES to DKM: SMILES string -> Makebits (generating fingerprints, default dictionary) -> Fingerprint (.scn) -> DivKmeans (clustering fingerprints) -> Cluster Hierarchy (.dkm)
- Optclus: generating the best levels (best level)
- RNNclus: extracting individual cluster partitions -> Extracted Cluster Hierarchy (.clu)
- Other labels: New SMILES String, One Column Process, Merge Process]
5. BCI Clustering Service Methods
6. Submitting Applications with Condor
- We are working to use Condor-G as a simple bridge to the NSF's TeraGrid for job submission.
- Condor has a Web Service interface (called BirdBath) that we are using to construct Java portlets.
- We are investigating how to construct Condor ClassAds using GPIR.
- Required for Condor matchmaking
- But no facility for this is built into the TeraGrid.
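To make the Condor-G bridge concrete, the sketch below generates the text of a Condor submit description for a grid-universe job. The keywords (`universe`, `grid_resource`, `executable`, `arguments`, `output`, `error`, `log`, `transfer_input_files`, `queue`) are standard submit-file commands, but the Condor version used in this project may have expected different ones. The helper name `condor_g_submit`, the gatekeeper host, and the file names are hypothetical.

```python
def condor_g_submit(executable, arguments, grid_resource, transfer_input_files=()):
    """Return the text of a Condor submit description for a grid-universe job.

    Illustrative helper; the caller supplies the grid_resource string,
    for example "gt2 <gatekeeper host>/jobmanager-pbs".
    """
    lines = [
        "universe = grid",
        f"grid_resource = {grid_resource}",
        f"executable = {executable}",
        f"arguments = {arguments}",
        "output = job.out",
        "error = job.err",
        "log = job.log",
    ]
    # Input files are staged to the remote resource only if any are listed.
    if transfer_input_files:
        lines.append("transfer_input_files = " + ", ".join(transfer_input_files))
    lines.append("queue")
    return "\n".join(lines) + "\n"


# Example with a made-up gatekeeper host and application name.
example = condor_g_submit(
    "cluster_app", "input.smi",
    "gt2 gatekeeper.example.org/jobmanager-pbs",
    ["input.smi"],
)
```

A portlet could write this text to a file and hand it to Condor; the sketch does not address the ClassAd construction from GPIR data mentioned above.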
7. Condor-G and Globus
[Diagram contrasting two submission paths. Labels recovered from the figure: Condor Only: (Portal) Client, Condor Master, Condor. Condor-G: (Portal) Client, Condor-G, TeraGrid Globus, and the local schedulers Condor, LSF, PBS.]
8. VOTables: Handling Tabular Data
- Developed by the Virtual Observatory community for encoding astronomy data.
- The VOTable format is an XML representation of tabular data (here, data coming from BCI, NIH DTP databases, and so on).
- VOTable-compatible tools have already been built
- We just inherit them.
- The SAVOT and JAVOT Java parser APIs for VOTable allow us to easily build VOTable-based applications
- Web Services
- Spreadsheets
- Plotting applications
- VOPlot and TopCat are two examples
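As a rough illustration of what "an XML representation of tabular data" means here, the sketch below builds a minimal VOTable-shaped document (VOTABLE > RESOURCE > TABLE, with FIELD column definitions and TABLEDATA rows) using only the Python standard library. It is not the SAVOT or JAVOT API, it omits the XML namespace and metadata that a full VOTable carries, and the sample row (ethanol with approximate mass and polar surface area) is illustrative data, not taken from the source files.

```python
import xml.etree.ElementTree as ET


def to_votable(fields, rows):
    """Build a minimal VOTable-style document.

    fields: list of (name, datatype) pairs; rows: list of value tuples.
    Hypothetical helper for illustration only.
    """
    votable = ET.Element("VOTABLE", version="1.1")
    resource = ET.SubElement(votable, "RESOURCE")
    table = ET.SubElement(resource, "TABLE")

    # One FIELD element per column describes its name and data type.
    for name, datatype in fields:
        attrs = {"name": name, "datatype": datatype}
        if datatype == "char":
            attrs["arraysize"] = "*"  # variable-length string
        ET.SubElement(table, "FIELD", attrs)

    # Rows go into DATA/TABLEDATA as TR elements with one TD per cell.
    tabledata = ET.SubElement(ET.SubElement(table, "DATA"), "TABLEDATA")
    for row in rows:
        tr = ET.SubElement(tabledata, "TR")
        for value in row:
            ET.SubElement(tr, "TD").text = str(value)
    return ET.tostring(votable, encoding="unicode")


# Illustrative columns matching the Mass/PSA plot described in this deck.
sample = to_votable(
    [("SMILES", "char"), ("Mass", "double"), ("PSA", "double")],
    [("CCO", 46.07, 20.23)],
)
```

Because the column metadata travels with the data, a generic tool can plot any two numeric FIELDs against each other without chemistry-specific code.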
9. mrtd1.txt: SMILES representation of chemical compounds along with their properties
10. Votable.xml: XML representation of the mrtd1.txt file
11. VOPlot Application from the generated votable.xml file: graph plotted with Mass on the X-axis and PSA on the Y-axis
12. More Services: WWMM Services
13. CDK-Based Services
14. ToxTree Service
- The Threshold of Toxicological Concern (TTC) establishes a level of exposure for all chemicals below which there would be no appreciable risk to human health.
- ToxTree implements the Cramer Decision Tree approach to estimate TTC.
- We have converted this into a service.
- Uses SMILES as input.
- Note: the GUI must be separated from the library for it to be a service.
http://ecb.jrc.it/QSAR/home.php?CONTENU/QSAR/qsar_tools/qsar_tools_toxtree.php
15. OSCAR3 Service
- Oscar3 is a tool for shallow, chemistry-specific natural language parsing of chemical documents (e.g., journal articles).
- It identifies (or attempts to identify)
- Chemical names: singular nouns, plurals, verbs, etc., also formulae and acronyms.
- Chemical data: spectra, melting/boiling point, yield, etc. in experimental sections.
- Other entities: things like N(5)-C(3) and so on.
- Results are exported as an XML file.
- There is a larger effort, SciBorg, in this area
- http://www.cl.cam.ac.uk/aac10/escience/sciborg.html
- It also has potentially very interesting Workflows
http://wwmm.ch.cam.ac.uk/wikis/wwmm/index.php/Oscar3
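The idea of tagging chemistry-specific entities in text and exporting the result as XML can be illustrated with a toy example. The sketch below is not OSCAR3's algorithm or its output schema: it uses two hand-written regular expressions, and the element and attribute names (`annotations`, `entity`, `type`, `start`, `end`) are invented for the illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical patterns for illustration; OSCAR3's own rules are far richer.
PATTERNS = {
    # Atom-label pairs such as N(5)-C(3)
    "atom_pair": re.compile(r"\b[A-Z][a-z]?\(\d+\)-[A-Z][a-z]?\(\d+\)"),
    # Melting points such as "mp 123-125 °C" (\u00b0 is the degree sign)
    "melting_point": re.compile(r"\bmp\s+\d+(?:-\d+)?\s*\u00b0C"),
}


def annotate(text):
    """Return XML listing each pattern match with its type and character offsets."""
    root = ET.Element("annotations")
    for kind, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            entity = ET.SubElement(
                root, "entity",
                type=kind, start=str(match.start()), end=str(match.end()),
            )
            entity.text = match.group(0)
    return ET.tostring(root, encoding="unicode")
```

Keeping character offsets alongside each match lets a downstream workflow step map annotations back onto the original document.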