Title: Building Hypotheses and Searching Databases
1Building Hypotheses and Searching Databases
2Two ways of creating an hypothesis
- Automatically create a chemical feature-based
hypothesis from a set of compounds with respect
to a type of activity.
- Build an hypothesis by assembling substructures
and chemical functions and specifying the
geometric constraints between them.
3Components of an Hypothesis
- Assemble the components from known data such as
the atomic coordinates available from X-ray
crystallographic data.
- Express the characteristics of the hypothesis as
a collection of particular chemical substructures
or a collection of chemical functions such as
hydrogen bond donors and hydrophilic groups, or a
combination of substructures and chemical
functions.
4Chemical Substructures Available
- The feature dictionary contains a large library
of chemical functional groups such as primary,
secondary, and tertiary amines, hydroxyl, carbonyl,
acridyl, acetamido, 1-beta-glucopyranosyl, amino
acids, etc.
5Chemical Functions Available
- The chemical functions available include HB
ACCEPTOR, HB ACCEPTOR lipid, HB DONOR,
HYDROPHOBIC, HYDROPHOBIC aliphatic, HYDROPHOBIC
aromatic, NEG CHARGE, NEG IONIZABLE, POS CHARGE,
POS IONIZABLE, and RING AROMATIC.
6Geometric Constraints Available
- The distances, angles, and/or torsions between
items in an hypothesis, the preferred location of
a chemical feature, and a range of elements per
atom position may be specified within the
hypothesis or a substructure of the hypothesis.
- Excluded volumes may also be specified.
7Using the hypothesis?
- Once the hypothesis is built, databases may be
searched with it to find compounds that match
the hypothesis.
8Building a Substructure Hypothesis and Searching
Databases
9Hydrogen count set to anything
10Specifying atom range per atom position
11Specifying atom range per atom position
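The two query settings shown on these slides, a hydrogen count of "anything" and a range of elements at one atom position, can be sketched as a small data structure. The class name `QueryAtom` and its fields are hypothetical and do not come from the software being demonstrated; the sketch covers only the per-atom test, not the full substructure match.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

@dataclass(frozen=True)
class QueryAtom:
    """One atom position in a substructure query (illustrative only)."""
    allowed_elements: FrozenSet[str]   # the element range for this position
    h_count: Optional[int] = None      # None means "hydrogen count: anything"

    def matches(self, element: str, h_count: int) -> bool:
        if element not in self.allowed_elements:
            return False
        return self.h_count is None or self.h_count == h_count

if __name__ == "__main__":
    # A position that accepts N, O, or S with any number of hydrogens.
    hetero = QueryAtom(frozenset({"N", "O", "S"}))
    # A position that accepts only nitrogen carrying exactly two hydrogens.
    primary_n = QueryAtom(frozenset({"N"}), h_count=2)
    print(hetero.matches("O", 1), hetero.matches("C", 3))
    print(primary_n.matches("N", 2), primary_n.matches("N", 1))
```

Widening the element set or leaving the hydrogen count unset makes the query more general, so it retrieves more hits from the database.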
12Searching a Database with an Hypothesis
- Once the hypothesis has been designed and built
it may be used to search a database for compounds
that contain the defined features.
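As a rough sketch of what such a search involves, the example below checks each compound's stored conformers against a feature hypothesis with distance constraints and stops at a hit limit. The data layout, the function names, and the brute-force matching are assumptions chosen for brevity, not the search engine's actual algorithm; the default limit of 300 mirrors the default quoted for the results display.

```python
import math
from itertools import permutations

def conformer_matches(hyp_types, constraints, conformer):
    """True if the hypothesis features map onto distinct conformer features.

    hyp_types   : list of feature-type names, one per hypothesis feature
    constraints : list of (i, j, d_min, d_max) between hypothesis features
    conformer   : list of (feature_type, (x, y, z)) found in one conformer
    """
    n = len(hyp_types)
    # Brute force: try every ordered choice of n distinct features.
    for combo in permutations(conformer, n):
        if any(feat[0] != t for feat, t in zip(combo, hyp_types)):
            continue
        if all(lo <= math.dist(combo[i][1], combo[j][1]) <= hi
               for i, j, lo, hi in constraints):
            return True
    return False

def search(database, hyp_types, constraints, max_hits=300):
    """Return names of compounds with at least one matching conformer."""
    hits = []
    for name, conformers in database.items():
        if any(conformer_matches(hyp_types, constraints, c) for c in conformers):
            hits.append(name)
            if len(hits) >= max_hits:
                break
    return hits

if __name__ == "__main__":
    # Hypothetical toy database: compound name -> list of conformers.
    db = {
        "cmpd_A": [[("HB DONOR", (0, 0, 0)), ("RING AROMATIC", (5, 0, 0))]],
        "cmpd_B": [[("HB DONOR", (0, 0, 0)), ("RING AROMATIC", (9, 0, 0))],
                   [("HB DONOR", (0, 0, 0)), ("RING AROMATIC", (5.5, 0, 0))]],
        "cmpd_C": [[("HB ACCEPTOR", (0, 0, 0)), ("RING AROMATIC", (5, 0, 0))]],
    }
    # Donor and aromatic ring separated by 4.5 to 6.0 angstroms.
    print(search(db, ["HB DONOR", "RING AROMATIC"], [(0, 1, 4.5, 6.0)]))
```

Storing several conformers per compound is what lets a flexible molecule such as the hypothetical cmpd_B match even though its first conformer does not. The exhaustive permutation loop is only practical for a handful of features.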
13Results of Database search. Default 300 hits.
14Hit example 1
15Hit example 1
16Compound data
17Compound sort. Handling large datasets.
18Compound property report.
19Managing Databases: Coping with extremely large
numbers.
- Are scientists going to manage in the new world
without an in-depth knowledge of mathematics and
statistics?
- Are we training scientists to cope with tools
such as cluster analysis, discriminant analysis,
cross-validation techniques, neural networks,
Fourier transforms, etc.?
- Do we need courses in the design, building, and
interrogation of databases?
20Building a Feature-Based Hypothesis and Searching
Databases
21Setting distance constraints.
22Setting distance constraints.
23Hybrid hypothesis of chemical functions and
fragments.
24Using the generic β-adrenergic agonist to search
a database.
25Databases?
- Global structural databases, e.g. CCSD.
- Structure-specific databases.
- Therapeutic-area-based databases.
- Multi-conformer databases.
- Composite databases that encapsulate the
information of multi-gigabyte files.
- QSAR-based databases.
- Commercially available versus problem-specific
databases.