Title: OEChem Enabled: Data Mining Tools and Chemoinformatics Virtual Classroom
1 OEChem Enabled: Data Mining Tools and
Chemoinformatics Virtual Classroom
- Norah MacCuish
- John MacCuish
- TJ O'Donnell
- Tudor Oprea
- Mitch Chapman
Mesa Analytics & Computing, LLC; TJ O'Donnell
Consulting; Sunset Molecular, LLC; Desert Moon
Consulting, LLC
2 Data Mining Tools
- Fingerprint Module programs read SMILES strings
and generate the 320 MDLI MACCS keys (or 164 of
the earlier 166 public keys); see "Reoptimization
of MDL Keys for Use in Drug Discovery," JCICS,
42(6), 2002, 1273-1280. The programs are OEChem
1.2 enabled.
- Grouping Module contains clustering algorithms:
symmetric, asymmetric, non-overlapping and
overlapping versions of Taylor's algorithm. It
also contains RNN (reciprocal nearest neighbor)
hierarchical algorithms such as Ward's, Group
Average, and Complete Link. The clustering
algorithms include their respective ambiguity
indices. The Measures program performs similarity
searching or generates symmetric or asymmetric
similarity matrices (sparse or full). It comes
with various measures such as Tanimoto,
Euclidean, Cosine, etc.
- Ambiguity Viewer is a Python Tk application for
viewing hierarchical results from the Grouping
Module. OEDepict enabled.
- ChemTattoo program produces a series of modal
statistics and visualizations useful in finding
structural commonalities. The data sets can be
large or small (e.g., a specific group from a
clustering). The modal fingerprint of a data set
is, loosely, the intersection of bits turned on
for a given set of fingerprints. This
intersection can be parameterized by a threshold:
a bit is turned on in the modal fingerprint if it
is turned on in at least some percentage of the
fingerprints (e.g., 90%). Via the MDL
fingerprints above and atom counting, this
program is OEChem 1.2 enabled.
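To make the Measures and Grouping descriptions above concrete, here is a minimal sketch in plain Python. It is not the Mesa implementation and does not use OEChem. It assumes fingerprints are given as strings of '0' and '1'. The function names (`bits`, `tanimoto`, `cosine`, `euclidean`, `tversky`, `taylor_cluster`) are hypothetical. Tversky is included only as one common example of an asymmetric measure; the source does not say which asymmetric measures the module provides. The clustering function is a simplified non-overlapping, symmetric variant of Taylor's exclusion-region idea that recounts neighbors among unassigned compounds each round and breaks ties by lowest index, which is one of several possible conventions.

```python
from math import sqrt


def bits(fp):
    """Return the set of on-bit positions of a '0'/'1' fingerprint string."""
    return {i for i, c in enumerate(fp) if c == "1"}


def tanimoto(a, b):
    """Tanimoto similarity: |A & B| / |A | B| (1.0 if both are empty, by assumption)."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def cosine(a, b):
    """Cosine similarity for binary vectors: |A & B| / sqrt(|A| * |B|)."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def euclidean(a, b):
    """Euclidean distance for binary vectors: sqrt(number of differing bits)."""
    return sqrt(len(a ^ b))


def tversky(a, b, alpha=1.0, beta=0.0):
    """Tversky similarity; asymmetric whenever alpha != beta."""
    common = len(a & b)
    denom = common + alpha * len(a - b) + beta * len(b - a)
    return common / denom if denom else 1.0


def taylor_cluster(fps, threshold):
    """Simplified non-overlapping, symmetric Taylor-style clustering.

    Returns a list of (centroid_index, sorted_member_indices) tuples.
    """
    sets = [bits(fp) for fp in fps]
    n = len(sets)
    # Neighbor list of each compound at the similarity threshold (includes itself).
    neighbors = [
        {j for j in range(n) if tanimoto(sets[i], sets[j]) >= threshold}
        for i in range(n)
    ]
    unassigned = set(range(n))
    clusters = []
    while unassigned:
        # Pick the unassigned compound with the most unassigned neighbors.
        # max() keeps the first maximum, so ties go to the lowest index.
        centroid = max(sorted(unassigned),
                       key=lambda i: len(neighbors[i] & unassigned))
        members = neighbors[centroid] & unassigned
        clusters.append((centroid, sorted(members)))
        unassigned -= members  # exclude the whole region from later rounds
    return clusters


if __name__ == "__main__":
    fps = ["110000", "111000", "110100", "000011", "000111"]
    for centroid, members in taylor_cluster(fps, threshold=0.6):
        print("centroid", centroid, "members", members)
```

Note that the tie between the last two compounds is resolved arbitrarily by index in this sketch. Arbitrary decisions of this kind are the sort of situation an ambiguity index is meant to flag.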
3 Chemoinformatics Virtual Classroom
- At present there are no specific software tools
for chemical information training in the U.S.,
though there are several nascent chemometrics and
chemical information university departments and
curricula. A number of commercial software
products used in the pharmaceutical and
biotechnology industries are either too expensive
or of limited utility for training in either
academic or business settings. This poster
discusses converting Mesa Analytics & Computing,
LLC's data mining tools, with OEChem inside, into
training tools for concept learning. These tools
will serve the dual purpose of providing
cheminformatics training for both academia and
industry. By employing distance learning through
a web delivery system, the training software will
provide an effective, low-cost solution for
academic institutions, whether they are offering
a single course to rural students in a remote
setting or an entire program in cheminformatics
at a major urban university. In addition, such
training tools will be very useful in industry
settings with local area networks, where
individuals in a multidisciplinary setting need
to receive training on the concepts employed by
industrial chemoinformatics software, an
integral part of the drug discovery process for
the pharmaceutical and biotech industries.
- Concerning the Chemoinformatics Virtual
Classroom, our research results are based upon
work supported by the National Science Foundation
Small Business Innovation Research (SBIR) Program
under Grant No. 0339360. Any opinions, findings,
and conclusions or recommendations expressed in
this material are those of the author(s) (Mesa
Analytics & Computing, LLC) and do not
necessarily reflect the views of the National
Science Foundation (http://nsf.gov).
- Abstract at
https://www.fastlane.nsf.gov/servlet/showaward?award=0339360
4 Ambiguity Viewer
5 Ambiguity Viewer
6 ChemTattoo™
The modal fingerprint of a dataset is, loosely,
the intersection of bits turned on for a given
set of fingerprints. E.g.,

  Fingerprints:
    010001
    010100
    010100
    ------
    010000  Modal fingerprint at 100%
    010100  Modal fingerprint at 66%

A simple example might be as follows:
1. Select a group of compounds, say a cluster from
   a clustering of a collection of active
   compounds.
2. Generate the fingerprints for the group, if
   they are not already generated.
3. Input the fingerprints to ChemTattoo,
   specifying the type of fingerprint (MACCS 166
   or MACCS 320).
4. The output will contain the various modal
   statistics and fingerprints mentioned above, or
   the SMARTS patterns.
5. Input the SMARTS patterns and original compound
   fingerprints into a depictor with SMARTS
   matching and atom/bond coloring.

For an introduction to modal fingerprints see
"Stigmata: An Algorithm to Determine Structural
Commonalities in Diverse Data Sets," N. E.
Shemetulskis, et al., JCICS, 1996, 36(4), 862-871.
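The thresholded modal fingerprint described on this slide can be sketched in a few lines of Python. This is an illustration only, not ChemTattoo's code. It assumes fingerprints are equal-length strings of '0' and '1' and that the threshold is a fraction between 0 and 1 applied as "on in at least this fraction of the fingerprints". The function name `modal_fingerprint` is hypothetical.

```python
def modal_fingerprint(fingerprints, threshold=1.0):
    """Return the modal fingerprint of equal-length '0'/'1' strings.

    A bit is on in the result when it is on in at least `threshold`
    (a fraction in (0, 1]) of the input fingerprints.
    """
    if not fingerprints:
        raise ValueError("need at least one fingerprint")
    n_bits = len(fingerprints[0])
    if any(len(fp) != n_bits for fp in fingerprints):
        raise ValueError("all fingerprints must have the same length")

    n = len(fingerprints)
    # Count how many fingerprints have each bit position turned on.
    counts = [sum(fp[i] == "1" for fp in fingerprints) for i in range(n_bits)]
    return "".join("1" if count / n >= threshold else "0" for count in counts)


if __name__ == "__main__":
    fps = ["010001", "010100", "010100"]
    print(modal_fingerprint(fps, threshold=1.0))   # strict intersection
    print(modal_fingerprint(fps, threshold=0.66))  # on in at least 66%
```

With the three fingerprints from the slide, a threshold of 1.0 yields 010000 and a threshold of 0.66 yields 010100, matching the example above.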
7 HIVRT Non-nucleoside Inhibitors
RED: Common modal features
Black: Not in modal
Threshold set to 1.0
8 Top Selling 200 Drugs
RED, GREEN, BLUE: Common modal features
Black: Not in modal
Threshold set to 0.8
9 Acknowledgements
- OpenEye Scientific Software
- OEChem
- Sunset Molecular Discovery, LLC
- Wombat Database