OEChem Enabled: Data Mining Tools and Chemoinformatics Virtual Classroom

Provided by: johndma1

Transcript and Presenter's Notes


1
OEChem Enabled: Data Mining Tools and
Chemoinformatics Virtual Classroom
  • Norah MacCuish
  • John MacCuish
  • TJ O'Donnell
  • Tudor Oprea
  • Mitch Chapman

Mesa Analytics & Computing, LLC; TJ O'Donnell
Consulting; Sunset Molecular, LLC; Desert Moon
Consulting, LLC
2
Data Mining Tools
  • Fingerprint Module: programs that read SMILES
    strings and generate the 320 MDLI MACCS keys (or
    164 of the earlier 166 public keys); see
    "Reoptimization of MDL Keys for Use in Drug
    Discovery," JCICS, 42(6), 2002, 1273-1280. The
    programs are OEChem 1.2 enabled.
  • Grouping Module: contains clustering algorithms,
    including symmetric, asymmetric, non-overlapping,
    and overlapping versions of Taylor's algorithm.
    It also contains RNN (reciprocal nearest
    neighbor) hierarchical algorithms such as Ward's,
    Group Average, and Complete Link. The clustering
    algorithms come with their respective ambiguity
    indices. The Measures program performs similarity
    searching and generates symmetric or asymmetric
    similarity matrices (sparse or full). It comes
    with various measures such as Tanimoto,
    Euclidean, Cosine, etc.
  • Ambiguity Viewer: a Python Tk application for
    viewing hierarchical results from the Grouping
    Module. OEDepict enabled.
  • ChemTattoo: produces a series of modal statistics
    and visualizations useful in finding structural
    commonalities. The data sets can be large or
    small (e.g., a specific group from a
    clustering). The modal fingerprint of a data set
    is, loosely, the intersection of bits turned on
    for a given set of fingerprints. This
    intersection can be parameterized with a
    threshold: a bit is turned on in the modal
    fingerprint if it is turned on in at least a
    given percentage of the fingerprints (e.g.,
    90%). Via the MDL fingerprints above and atom
    counting, this program is OEChem 1.2 enabled.
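The measures and the non-overlapping variant of Taylor's algorithm named above can be illustrated with a minimal Python sketch. This is not the Grouping Module's code: the function names (`to_bits`, `tanimoto`, `cosine`, `euclidean`, `taylor_exclusion`) are hypothetical, fingerprints are assumed to be plain '0'/'1' strings, and the clustering follows the commonly described exclusion procedure (pick the compound with the most neighbors within a similarity threshold, remove it and its neighbors, repeat), with ties broken by lowest index as an arbitrary assumption.

```python
import math


def to_bits(fp):
    """Return the set of positions that are on in a '0'/'1' string."""
    return {i for i, ch in enumerate(fp) if ch == "1"}


def tanimoto(a, b):
    """Tanimoto similarity: shared on-bits over the union of on-bits."""
    sa, sb = to_bits(a), to_bits(b)
    union = len(sa | sb)
    return len(sa & sb) / union if union else 1.0


def cosine(a, b):
    """Cosine similarity for binary vectors."""
    sa, sb = to_bits(a), to_bits(b)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / math.sqrt(len(sa) * len(sb))


def euclidean(a, b):
    """Euclidean distance: square root of the number of differing bits."""
    return math.sqrt(len(to_bits(a) ^ to_bits(b)))


def taylor_exclusion(fps, threshold):
    """Non-overlapping, Taylor-style clustering (illustrative sketch).

    Returns a list of (centroid_index, member_indices) tuples.
    """
    n = len(fps)
    # Neighbor lists at the given Tanimoto threshold (each item
    # is its own neighbor, so every pass removes at least one item).
    neighbors = {
        i: {j for j in range(n) if tanimoto(fps[i], fps[j]) >= threshold}
        for i in range(n)
    }
    unassigned = set(range(n))
    clusters = []
    while unassigned:
        # Centroid = item with the most still-unassigned neighbors;
        # iterating in sorted order makes ties go to the lowest index.
        centroid = max(sorted(unassigned),
                       key=lambda i: len(neighbors[i] & unassigned))
        members = neighbors[centroid] & unassigned
        clusters.append((centroid, sorted(members)))
        unassigned -= members
    return clusters


fps = ["110000", "111000", "110100", "000011", "000111"]
print(taylor_exclusion(fps, 0.6))  # [(0, [0, 1, 2]), (3, [3, 4])]
```

An overlapping variant would keep already-assigned compounds eligible as members of later clusters; that change is omitted here to keep the sketch short.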


3
Chemoinformatics Virtual Classroom
  • At present there are no specific software tools
    for chemical information training in the U.S.,
    though there are several nascent chemometrics and
    chemical information university departments and
    curricula. A number of commercial software
    products used in the pharmaceutical and
    biotechnology industries are either too expensive
    or of limited utility for training in either
    academic or business settings. This poster will
    discuss converting Mesa Analytics & Computing,
    LLC software, with OEChem inside, into training
    tools for concept learning. These tools will
    serve the dual purpose of providing
    cheminformatics training for both academia and
    industry. By employing distance learning through
    a web delivery system, the training software will
    provide an effective, low-cost solution for
    academic institutions, whether they are offering
    a single course to rural students in a remote
    setting or an entire program in cheminformatics
    at a major urban university. In addition, such
    training tools will be very useful in industry
    settings with local area networks, where, in a
    multidisciplinary setting, individuals need
    training on the concepts employed by industrial
    chemoinformatics software, an integral part of
    the drug discovery process for the
    pharmaceutical and biotech industries.
  • Concerning the Chemoinformatics Virtual
    Classroom, our research results are based upon
    work supported by the National Science Foundation
    Small Business Innovation Research (SBIR) Program
    under Grant No. 0339360. Any opinions, findings,
    and conclusions or recommendations expressed in
    this material are those of the author(s) (Mesa
    Analytics & Computing, LLC) and do not
    necessarily reflect the views of the National
    Science Foundation (http://nsf.gov).
  • Abstract at
    https://www.fastlane.nsf.gov/servlet/showaward?award=0339360

4
Ambiguity Viewer
5
Ambiguity Viewer
6
ChemTattoo™
The modal fingerprint of a dataset is, loosely,
the intersection of bits turned on for a given
set of fingerprints. E.g.,

Fingerprints:
010001
010100
010100
---------
010000  Modal fingerprint at 100%
010100  Modal fingerprint at 66%

A simple example might be as follows:
1. Select a group of compounds, say a cluster from
   a clustering of a collection of active compounds.
2. Generate the fingerprints for the group, if they
   are not already generated.
3. Input the fingerprints to ChemTattoo, specifying
   the type of fingerprint (MACCS 166 or MACCS 320).
4. The output will contain the various modal
   statistics and fingerprints mentioned above, or
   the SMARTS patterns.
5. Input the SMARTS patterns and original compound
   fingerprints into a depictor with SMARTS matching
   and atom/bond coloring.

For an introduction to modal fingerprints see
"Stigmata: An Algorithm to Determine Structural
Commonalities in Diverse Datasets," N. E.
Shemetulskis, et al., JCICS, 1996, 36(4), 862-871.
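
The thresholded intersection described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not ChemTattoo's implementation: the function name `modal_fingerprint` and its `threshold` parameter are hypothetical, fingerprints are assumed to be equal-length '0'/'1' strings, and a bit is taken to be on in the modal when the fraction of fingerprints with that bit on is at least the threshold.

```python
def modal_fingerprint(fingerprints, threshold=1.0):
    """Return the modal fingerprint of a group as a '0'/'1' string.

    A bit is on in the modal when the fraction of input fingerprints
    with that bit on is at least `threshold` (1.0 = strict intersection).
    """
    if not fingerprints:
        raise ValueError("need at least one fingerprint")
    length = len(fingerprints[0])
    if any(len(fp) != length for fp in fingerprints):
        raise ValueError("fingerprints must all have the same length")

    n = len(fingerprints)
    modal = []
    for position in range(length):
        # Count how many fingerprints have this bit turned on.
        on_count = sum(fp[position] == "1" for fp in fingerprints)
        modal.append("1" if on_count / n >= threshold else "0")
    return "".join(modal)


# The three fingerprints from the slide's example.
group = ["010001", "010100", "010100"]
print(modal_fingerprint(group, 1.0))   # 010000
print(modal_fingerprint(group, 0.66))  # 010100
```

The fourth bit is on in two of the three fingerprints (about 67%), so it drops out of the strict intersection but survives at the 66% threshold.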
7
HIVRT Non-nucleoside Inhibitors
RED: Common modal features
Black: Not in modal
Threshold set to 1.0
8
Top Selling 200 Drugs
RED, GREEN, BLUE: Common modal features
Black: Not in modal
Threshold set to 0.8
9
Acknowledgements
  • OpenEye Scientific Software: OEChem
  • Sunset Molecular Discovery, LLC: Wombat Database
  • ChemAxon: Marvin Tools