Chemical Informatics and Cyberinfrastructure Building Blocks - PowerPoint PPT Presentation

Slides: 16
Provided by: Marlon91

Transcript and Presenter's Notes

1
Chemical Informatics and Cyber-infrastructure
Building Blocks
  • Chemical Informatics Resources
  • Deluge of experimental data
  • > 100,000 compounds screened by 10 publicly
    funded high throughput screening centers using
    various assay techniques (molecular to cellular)
  • Molecular Libraries Screening Center Network
  • Chemical databases maintained by various groups
  • NIH PubChem, NIH DTP
  • Chemical informatics and computational chemistry
  • Data clustering, data mining, descriptor
    calculations, toxicity prediction, docking,
    molecular modeling, and quantum chemistry
  • Visualization tools
  • Web resources: journal articles, etc.
  • A Chemical Informatics Grid will need to
    integrate these into a common, loosely coupled,
    open, distributed computing environment.

2
Our Solution Stack
Portals and Other User Interfaces
  • Domain specific Web Services
  • VOTables, CDK services
  • Grid services, Cyber-infrastructure for
    computationally intensive applications.
  • Clustering, quantum chemistry
  • Workflow and service management
  • We work with Taverna
  • Many solutions: Kepler, BPEL engines, etc.
  • Portlets and other user interfaces
  • Rich desktop apps
  • Ubiquitous clients

Workflow and Service Management
Web and Grid Services
Each level is a subject of research and
development, as is their integration.
3
Wrapping Science Applications as Services
  • Science Grid services typically must wrap legacy
    applications written in C or Fortran.
  • You must handle such problems as
  • Specifying several input and output files
  • These may need to be staged in
  • Launching executables and monitoring their
    progress.
  • Specifying environment variables
  • Often these also have shell scripts that
    perform miscellaneous tasks.
  • How do you convert this to WSDL?
  • Or (equivalently) how do you automatically
    generate the XML job description for WS-GRAM?
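The wrapping problems listed on this slide can be sketched in Python as follows. This is a minimal illustration, not the presenters' implementation: `JobSpec`, `run_locally`, and `to_job_xml` are hypothetical names, and the XML element names (`job`, `executable`, `argument`, `environment`) follow the GT4 WS-GRAM job description format as recalled here, so they should be checked against the Globus schema. File staging is shown only for the local run and is left out of the XML.

```python
import os
import shutil
import subprocess
import tempfile
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class JobSpec:
    """Hypothetical description of one run of a legacy application."""
    executable: str
    arguments: list = field(default_factory=list)
    environment: dict = field(default_factory=dict)
    input_files: list = field(default_factory=list)  # local paths to stage in


def run_locally(spec, timeout=60):
    """Stage inputs into a scratch directory, launch the executable, wait for it."""
    with tempfile.TemporaryDirectory() as workdir:
        for src in spec.input_files:
            shutil.copy(src, workdir)  # "stage in" each input file
        env = dict(os.environ, **spec.environment)  # add job-specific variables
        proc = subprocess.run(
            [spec.executable, *spec.arguments],
            cwd=workdir,
            env=env,
            capture_output=True,
            text=True,
            timeout=timeout,  # crude progress monitoring: give up after timeout
        )
        return proc.returncode, proc.stdout, proc.stderr


def to_job_xml(spec):
    """Render the same description as a job-description XML document (assumed schema)."""
    job = ET.Element("job")
    ET.SubElement(job, "executable").text = spec.executable
    for arg in spec.arguments:
        ET.SubElement(job, "argument").text = arg
    for name, value in spec.environment.items():
        env = ET.SubElement(job, "environment")
        ET.SubElement(env, "name").text = name
        ET.SubElement(env, "value").text = value
    return ET.tostring(job, encoding="unicode")
```

Keeping one description object and deriving both the local launch and the XML from it means the service interface and the job submission cannot drift apart.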

4
Flow Chart of the BCI Web Service: SMILES to
Cluster Partitions
[Flow chart; recoverable labels only]
  • Service: SMILES to DKM
  • Programs: Makebits, DivKmeans, Optclus, RNNclus
  • Data: SMILES string, new SMILES string,
    dictionary (default), fingerprint (.scn),
    cluster hierarchy (.dkm), extracted cluster
    hierarchy (.clu), best level
  • Steps: generating fingerprints, clustering
    fingerprints, generating the best levels,
    extracting individual cluster partitions,
    one-column process, merge process
5
BCI Clustering Service Methods
6
Submitting Applications with Condor
  • We are working to use Condor-G as a simple bridge
    to the NSF's TeraGrid for job submission.
  • Condor has a Web Service interface (called
    BirdBath) that we are using to construct Java
    portlets.
  • We are investigating how to construct Condor
    ClassAds using GPIR.
  • These are required for Condor matchmaking.
  • But the TeraGrid has no built-in facility for
    this.
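As an illustration of what a portlet would hand to Condor-G, the sketch below renders a submit description file for a grid-universe job. The function name and the gatekeeper host in the usage note are hypothetical; the keywords (`universe`, `grid_resource`, `executable`, `arguments`, `output`, `error`, `log`, `queue`) are standard Condor submit commands, but the exact `grid_resource` string depends on the target site.

```python
def condor_g_submit(executable, arguments, grid_resource, basename="job"):
    """Return the text of a Condor-G submit description file.

    grid_resource is passed through unchanged, for example
    "gt2 gatekeeper.example.org/jobmanager-pbs" (hypothetical host).
    """
    lines = [
        "universe = grid",
        f"grid_resource = {grid_resource}",
        f"executable = {executable}",
        f"arguments = {arguments}",
        f"output = {basename}.out",
        f"error = {basename}.err",
        f"log = {basename}.log",
        "queue",
    ]
    return "\n".join(lines) + "\n"
```

Writing the returned text to a file and passing that file to `condor_submit` would queue the job; the BirdBath Web Service interface mentioned above is not used in this sketch.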

7
Condor-G and Globus / Condor Only
[Diagram; recoverable labels only]
  • Condor-G and Globus: (Portal) Client, Condor-G,
    TeraGrid Globus, Condor, LSF, PBS
  • Condor Only: (Portal) Client, Condor Master,
    Condor
8
VOTables: Handling Tabular Data
  • Developed by the Virtual Observatory community
    for encoding astronomy data.
  • The VOTable format is an XML representation of
    the tabular data (data coming from BCI, NIH DTP
    databases, and so on).
  • VOTables-compatible tools have been built
  • We just inherit them.
  • SAVOT and JAVOT, Java parser APIs for VOTable,
    allow us to easily build VOTable-based
    applications:
  • Web Services
  • Spreadsheets
  • Plotting applications.
  • VOPlot and TopCat are two
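To make the VOTable structure concrete, here is a minimal Python sketch that writes tabular data as VOTable XML using only the standard library. It is not SAVOT or JAVOT (those are Java APIs); `rows_to_votable` is a hypothetical helper, and the output omits the namespace declaration and metadata that real VOTable files usually carry.

```python
import xml.etree.ElementTree as ET


def rows_to_votable(fields, rows):
    """Serialize rows as a bare-bones VOTable document.

    fields: list of (name, datatype) pairs, e.g. ("Mass", "double")
    rows:   list of sequences, one value per field
    """
    votable = ET.Element("VOTABLE", version="1.1")
    resource = ET.SubElement(votable, "RESOURCE")
    table = ET.SubElement(resource, "TABLE")
    for name, datatype in fields:
        attrs = {"name": name, "datatype": datatype}
        if datatype == "char":
            attrs["arraysize"] = "*"  # variable-length string
        ET.SubElement(table, "FIELD", attrs)
    data = ET.SubElement(table, "DATA")
    tabledata = ET.SubElement(data, "TABLEDATA")
    for row in rows:
        tr = ET.SubElement(tabledata, "TR")
        for value in row:
            ET.SubElement(tr, "TD").text = str(value)
    return ET.tostring(votable, encoding="unicode")
```

Each column is declared once as a `FIELD`, and each record becomes a `TR` of `TD` cells in the same order, which is what lets generic tools read the table without knowing anything about chemistry.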

9
mrtd1.txt: SMILES representation of chemical
compounds along with their properties
10
Votable.xml: XML representation of the mrtd1.txt file
11
VOPlot application from the generated votable.xml
file: graph plotted on Mass (X-axis) and PSA
(Y-axis)
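The step a plotting client performs on such a file, picking two named columns out of the table, can be sketched as below. `votable_columns` is a hypothetical helper and `SAMPLE` is a hand-written illustrative table, not the contents of mrtd1.txt. The sketch assumes a VOTable without an XML namespace; files that declare one would need namespace-qualified tag names.

```python
import xml.etree.ElementTree as ET

# Illustrative data only; not taken from mrtd1.txt.
SAMPLE = """<VOTABLE version="1.1"><RESOURCE><TABLE>
<FIELD name="SMILES" datatype="char" arraysize="*"/>
<FIELD name="Mass" datatype="double"/>
<FIELD name="PSA" datatype="double"/>
<DATA><TABLEDATA>
<TR><TD>CCO</TD><TD>46.07</TD><TD>20.23</TD></TR>
<TR><TD>c1ccccc1</TD><TD>78.11</TD><TD>0.0</TD></TR>
</TABLEDATA></DATA>
</TABLE></RESOURCE></VOTABLE>"""


def votable_columns(xml_text, x_name, y_name):
    """Return two lists of floats for the named FIELD columns."""
    root = ET.fromstring(xml_text)
    names = [f.get("name") for f in root.iter("FIELD")]
    xi, yi = names.index(x_name), names.index(y_name)
    xs, ys = [], []
    for tr in root.iter("TR"):
        cells = [td.text for td in tr.findall("TD")]
        xs.append(float(cells[xi]))
        ys.append(float(cells[yi]))
    return xs, ys
```

Calling `votable_columns(SAMPLE, "Mass", "PSA")` yields the x and y series that could be handed to any plotting library.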
12
More Services: WWMM Services
13
CDK-Based Services
14
ToxTree Service
  • The Threshold of Toxicological Concern (TTC)
    establishes a level of exposure for all chemicals
    below which there would be no appreciable risk to
    human health.
  • ToxTree implements the Cramer Decision Tree
    approach to estimate TTC.
  • We have converted this into a service.
  • Uses SMILES as input.
  • Note: the GUI must be separated from the library
    before the library can be exposed as a service.

http://ecb.jrc.it/QSAR/home.php?CONTENU/QSAR/qsar_tools/qsar_tools_toxtree.php
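The separation of library from GUI can be illustrated with a small Python sketch in which the classification routine is passed in as a plain function and exposed over HTTP. This is not the ToxTree service itself: `make_service` is a hypothetical name, the classifier is supplied by the caller, and the sketch uses a plain HTTP GET with a JSON reply rather than a WSDL-described service.

```python
import json
from urllib.parse import parse_qs


def make_service(classify):
    """Wrap classify(smiles) -> str as a WSGI application.

    The service layer knows nothing about how classification works,
    which is only possible once that logic lives outside any GUI code.
    """
    def app(environ, start_response):
        params = parse_qs(environ.get("QUERY_STRING", ""))
        smiles = params.get("smiles", [""])[0]
        headers = [("Content-Type", "application/json")]
        if not smiles:
            start_response("400 Bad Request", headers)
            return [json.dumps({"error": "missing 'smiles' parameter"}).encode()]
        start_response("200 OK", headers)
        return [json.dumps({"smiles": smiles, "result": classify(smiles)}).encode()]
    return app
```

The returned application can be served with `wsgiref.simple_server.make_server` from the standard library. SMILES strings containing characters such as `+` or `=` need to be percent-encoded in the query string.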
15
OSCAR3 Service
  • Oscar3 is a tool for shallow, chemistry-specific
    natural language parsing of chemical documents
    (e.g. journal articles).
  • It identifies (or attempts to identify)
  • Chemical names: singular nouns, plurals, verbs
    etc., also formulae and acronyms.
  • Chemical data: spectra, melting/boiling point,
    yield etc. in experimental sections.
  • Other entities: things like N(5)-C(3) and so on.
  • Results are exported as an XML file.
  • There is a larger effort, SciBorg, in this area
  • http://www.cl.cam.ac.uk/aac10/escience/sciborg.html
  • It also has potentially very interesting workflows

http://wwmm.ch.cam.ac.uk/wikis/wwmm/index.php/Oscar3