Title: Hugo O. Villar, Ph.D., MBA
Hugo O. Villar, Ph.D., MBA
Mark R. Hansen, Ph.D.
Altoris, Inc., San Diego, CA
www.altoris.com www.chemapps.com
www.patentinformatics.com
Parameter-Driven Data Management
[Figure: fingerprint bits (1,0,1,0,0,0,1,0,0,...), descriptor values (3.25, 7.24, ...), 3D structural data; linked to toxicology, affinity, activity, pharmacokinetics]
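Binary fingerprints such as the bit vector on this slide are usually compared with the Tanimoto (Jaccard) coefficient: bits set in both vectors divided by bits set in either. A minimal sketch, assuming equal-length 0/1 lists; the second fingerprint and the function name `tanimoto` are illustrative, and real fingerprints would come from a cheminformatics toolkit.

```python
from typing import Sequence


def tanimoto(fp_a: Sequence[int], fp_b: Sequence[int]) -> float:
    """Tanimoto coefficient of two equal-length 0/1 fingerprints: |A and B| / |A or B|."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    both = sum(1 for a, b in zip(fp_a, fp_b) if a and b)    # bits set in both
    either = sum(1 for a, b in zip(fp_a, fp_b) if a or b)   # bits set in at least one
    return both / either if either else 1.0  # two all-zero fingerprints treated as identical


fp1 = [1, 0, 1, 0, 0, 0, 1, 0, 0]
fp2 = [1, 0, 1, 1, 0, 0, 0, 0, 0]  # hypothetical second molecule
print(tanimoto(fp1, fp2))  # 2 shared bits / 4 set bits = 0.5
```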
Organizing Chemical Data
- Different techniques
- Classification of traditional data types
  - Molecular properties (continuous, binary)
  - Clustering, discriminant analysis, etc.
- Symbolic data
  - Categorical
  - Ranges of properties
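In symbolic data analysis a group of molecules can be described by a range of a property rather than by one value per molecule. The sketch below, assuming a single numeric property per molecule (hypothetical logP values), summarizes each group as an interval and tests whether two groups overlap; the `Interval` and `summarize` names are illustrative.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Interval:
    low: float
    high: float

    def overlaps(self, other: "Interval") -> bool:
        # Closed intervals overlap unless one lies entirely to one side of the other
        return self.low <= other.high and other.low <= self.high


def summarize(values: Iterable[float]) -> Interval:
    """Describe a group by the range of a property (an interval-valued symbolic variable)."""
    vals = list(values)
    if not vals:
        raise ValueError("cannot summarize an empty group")
    return Interval(min(vals), max(vals))


# Hypothetical logP values for two groups of molecules
group_a = summarize([1.2, 2.8, 2.1])
group_b = summarize([3.5, 4.0])
print(group_a, group_a.overlaps(group_b))
```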
Grouping by Molecular Properties
- Well-defined scaffolds lead to well-defined clusters
- Neighboring structures are related
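One simple way to group molecules by fingerprint similarity is leader (sequential) clustering: each molecule joins the first cluster whose leader is at least as similar as a threshold, otherwise it starts a new cluster. This is an illustrative method, not necessarily the clustering behind these slides; the 0.6 threshold and the toy fingerprints (sets of on-bit indices) are assumptions.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def leader_cluster(fps: dict[str, frozenset], threshold: float = 0.6) -> list[list[str]]:
    """Assign each molecule to the first cluster whose leader meets the threshold."""
    clusters: list[list[str]] = []
    leaders: list[frozenset] = []
    for name, fp in fps.items():  # results depend on input order
        for idx, leader in enumerate(leaders):
            if tanimoto(fp, leader) >= threshold:
                clusters[idx].append(name)
                break
        else:  # no leader was similar enough: start a new cluster
            leaders.append(fp)
            clusters.append([name])
    return clusters


fps = {
    "mol1": frozenset({0, 2, 6}),
    "mol2": frozenset({0, 2, 6, 7}),
    "mol3": frozenset({1, 3, 4}),
    "mol4": frozenset({1, 3, 4, 5}),
}
print(leader_cluster(fps))
```

The order dependence is a known weakness of leader clustering; it is used here only because it is short.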
Same scaffold in multiple clusters
[Figure: Cluster]
Better-defined scaffolds are found in fewer clusters.
Same scaffold in multiple clusters
Even less common scaffolds are found in multiple clusters.
Not Systematic
[Figure panels: Single cluster; Multiple clusters]
Single cluster, multiple scaffolds
Contributing factors: fingerprint degeneracy, high density (ties), etc.
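Fingerprint degeneracy means that structurally different molecules can map to exactly the same fingerprint, so no similarity-based method can separate them. A minimal check, assuming fingerprints stored as sets of on-bit indices, groups molecules by identical fingerprint; the molecule names and bits are hypothetical.

```python
from collections import defaultdict


def degenerate_groups(fps: dict[str, frozenset]) -> list[list[str]]:
    """Return groups of molecules that share exactly the same fingerprint."""
    by_fp: dict[frozenset, list[str]] = defaultdict(list)
    for name, fp in fps.items():
        by_fp[fp].append(name)
    # Only groups with two or more members are ties
    return [sorted(names) for names in by_fp.values() if len(names) > 1]


fps = {
    "molA": frozenset({1, 4, 9}),
    "molB": frozenset({1, 4, 9}),  # different structure, identical bits
    "molC": frozenset({2, 4}),
}
print(degenerate_groups(fps))
```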
Substructure Enumeration
- Alternative to molecular-property-based grouping
- Possible even for large chemical databases
- Mining the information is challenging
  - Large number of substructures
  - Multivariate statistics not useful
[Figure: Tryptophan, 675 substructures]
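The growth in the number of substructures can be illustrated by enumerating every connected set of atoms in a small molecular graph. This sketch counts connected atom subsets (connected induced subgraphs) of a hydrogen-suppressed graph given as an adjacency list. It is a simplified stand-in: it ignores bond subsets and atom or bond types, so its counts are not comparable to the 675 substructures quoted for tryptophan.

```python
def connected_subsets(adj: dict[int, set[int]]) -> set[frozenset[int]]:
    """Enumerate every connected set of atoms by growing subsets one neighbor at a time."""
    seen: set[frozenset[int]] = set()
    stack = [frozenset({atom}) for atom in adj]  # start from each single atom
    while stack:
        subset = stack.pop()
        if subset in seen:
            continue
        seen.add(subset)
        # Atoms bonded to the subset but not yet part of it
        border = set().union(*(adj[a] for a in subset)) - subset
        for atom in border:
            grown = subset | {atom}
            if grown not in seen:
                stack.append(grown)
    return seen


def chain(n: int) -> dict[int, set[int]]:
    """Adjacency list of an n-atom linear chain."""
    return {i: {j for j in (i - 1, i + 1) if 0 <= j < n} for i in range(n)}


ring6 = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}  # six-membered ring skeleton
print(len(connected_subsets(chain(3))), len(connected_subsets(ring6)))  # 6 31
```

A chain of n atoms gives n(n+1)/2 subsets and a single n-ring gives n(n-1)+1; branched and fused systems grow much faster, which is why unrestricted atom-by-atom enumeration becomes hard to browse.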
Knowledge-Based Substructure Enumeration
[Figure: Tryptophan (Bone and Villar, JCC, 1997)]
- Not all substructures are informative
- Relationships add complexity but no information
- Atom-by-atom enumeration
Knowledge-Based Chemical Browsing
- Large number of substructures
  - Makes computations challenging
  - Complicates browsing through data
- Eliminate non-informative scaffolds and relations
  - Omit trivial extensions
  - Optimize number of occurrences
  - Organize in parent-child relationships
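The three pruning ideas above can be sketched together. To stay short, this example models each scaffold as a set of hypothetical feature labels and treats "substructure of" as set inclusion, a strong simplification of real graph substructure matching. It keeps scaffolds meeting an assumed minimum occurrence, drops a scaffold as a trivial extension when it matches exactly the same molecules as a smaller kept scaffold (one possible rule; the slides note that "trivial" depends on use), and links each survivor to its largest sub-scaffolds as parents.

```python
def build_tree(
    molecules: dict[str, frozenset[str]],
    candidates: list[frozenset[str]],
    min_occ: int = 2,
) -> dict[frozenset[str], list[frozenset[str]]]:
    """Prune candidate scaffolds and link each survivor to its direct parents."""
    # Occurrence: molecules containing every feature of the scaffold (set-inclusion model)
    hits = {s: frozenset(m for m, f in molecules.items() if s <= f) for s in candidates}
    frequent = [s for s in candidates if len(hits[s]) >= min_occ]
    # Trivial extension (assumed rule): same molecules as a smaller frequent scaffold
    informative = [
        s for s in frequent
        if not any(p < s and hits[p] == hits[s] for p in frequent)
    ]
    tree = {}
    for s in informative:
        smaller = [p for p in informative if p < s]
        # Direct parents: sub-scaffolds not contained in another sub-scaffold of s
        tree[s] = [p for p in smaller if not any(p < q for q in smaller)]
    return tree


fs = frozenset
molecules = {
    "m1": fs({"indole", "amine", "acid"}),
    "m2": fs({"indole", "amine"}),
    "m3": fs({"indole", "phenol"}),
    "m4": fs({"benzene", "phenol"}),
}
candidates = [fs({"indole"}), fs({"phenol"}), fs({"benzene"}),
              fs({"indole", "amine"}), fs({"indole", "amine", "acid"})]
for child, parents in build_tree(molecules, candidates).items():
    print(sorted(child), "<-", [sorted(p) for p in parents])
```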
Knowledge-Based Scaffold Tree
- Organized as parent-child relationships
- Nodes are distinct extensions
- What counts as trivial depends on use; flexibility is key
- Correlate to molecules
- Correlate to bioactivity and properties
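Once each node knows which molecules contain its scaffold, correlating the tree with bioactivity or properties can start with a simple aggregate over those molecules. A sketch, assuming a node-to-molecules mapping and one activity value per molecule; the scaffold labels, molecule IDs, and pIC50-style values are invented for illustration.

```python
from statistics import mean


def node_summary(
    hits: dict[str, set[str]], activity: dict[str, float]
) -> dict[str, tuple[int, float]]:
    """For each scaffold node, return (occurrence, mean activity of molecules containing it)."""
    summary = {}
    for scaffold, mols in hits.items():
        values = [activity[m] for m in mols if m in activity]
        if values:  # skip nodes without any measured molecule
            summary[scaffold] = (len(mols), mean(values))
    return summary


hits = {"indole": {"m1", "m2", "m3"}, "indole+amine": {"m1", "m2"}}
activity = {"m1": 7.0, "m2": 6.0, "m3": 5.0}
for scaffold, (n, avg) in node_summary(hits, activity).items():
    print(f"{scaffold}: N={n}, mean activity={avg:.2f}")
```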
Trees facilitate browsing
Scaffold expansion parameters
[Figure: Library Size]
- Atom-by-atom expansion grows explosively
- Scaffold differentiation affects growth
- Knowledge-based growth is controlled
Occurrence
- Occurrence (N): number of molecules containing a given substructure
- Low-occurrence substructures can add up significantly
- They may contribute little information
[Figure: NCI library; labels: Scaffolds, N, Nn, N Mols, Diversity measure]
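To see how much low-occurrence substructures contribute, one can count the occurrence N of every scaffold and tabulate how many scaffolds survive different cutoffs. A sketch, assuming each molecule's scaffolds have already been enumerated; the molecule-to-scaffold data are made up, and the cutoffs are arbitrary examples.

```python
from collections import Counter


def occurrence(molecule_scaffolds: dict[str, set[str]]) -> Counter:
    """N for each scaffold: number of molecules that contain it (each molecule counted once)."""
    counts: Counter = Counter()
    for scaffolds in molecule_scaffolds.values():
        counts.update(set(scaffolds))
    return counts


def survivors(counts: Counter, cutoffs: list[int]) -> dict[int, int]:
    """Number of scaffolds with occurrence N >= cutoff, for each cutoff."""
    return {c: sum(1 for n in counts.values() if n >= c) for c in cutoffs}


mols = {
    "m1": {"A", "B", "C"},
    "m2": {"A", "B"},
    "m3": {"A", "D"},
    "m4": {"A", "E"},
}
counts = occurrence(mols)
print(counts)
print(survivors(counts, [1, 2, 3]))
```

In this toy set, three of the five scaffolds occur only once, so a cutoff of N >= 2 removes most of the nodes while keeping every molecule covered by scaffold A.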
Scaffold Complexity
[Figure: N3, Maybridge Chemicals]
- Limiting the tree to substructures with 2 and 3 rings can reduce its complexity
- Drug-like molecules are already of low complexity
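Restricting the tree to scaffolds with two or three rings needs a ring count. For a hydrogen-suppressed molecular graph the number of independent rings (the size of the smallest set of smallest rings) equals bonds - atoms + connected components. The sketch applies that formula to toy edge lists for benzene and naphthalene; the scaffold dictionary and the 2 to 3 ring window are illustrative.

```python
def ring_count(n_atoms: int, bonds: list[tuple[int, int]]) -> int:
    """Number of independent rings: bonds - atoms + connected components."""
    parent = list(range(n_atoms))  # union-find over atoms to count components

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in bonds:
        parent[find(a)] = find(b)
    components = len({find(i) for i in range(n_atoms)})
    return len(bonds) - n_atoms + components


benzene = (6, [(i, (i + 1) % 6) for i in range(6)])
naphthalene = (10, benzene[1] + [(4, 6), (6, 7), (7, 8), (8, 9), (9, 5)])
scaffolds = {"benzene": benzene, "naphthalene": naphthalene}
keep = [name for name, (n, b) in scaffolds.items() if 2 <= ring_count(n, b) <= 3]
print(keep)
```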
Tree nodes and leaves
[Figure: N10, NCI library]
Unique scaffolds can be repeated at different nodes throughout the tree.
Scaffold Symbolic Analysis
- Objects to numbers
  - Counts of objects are the simplest form
  - Odds ratios, proportions (CI), etc.
- Objects to objects
  - Unions, intersections, OR
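For the "proportions (CI)" case, one standard choice is the Wilson score interval, which behaves better than the simple normal approximation for small counts. A sketch, assuming a scaffold found in x of n molecules with some property and a 95% level (z = 1.96); Wilson is one of several reasonable intervals, not necessarily the one used in this work.

```python
from math import sqrt


def wilson_interval(x: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion x/n."""
    if n <= 0 or not 0 <= x <= n:
        raise ValueError("need n > 0 and 0 <= x <= n")
    p = x / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


low, high = wilson_interval(5, 10)  # e.g. 5 of 10 molecules with the scaffold are active
print(f"{low:.3f} to {high:.3f}")
```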
Counts, Objects to numbers
Compute the odds that a scaffold is found in a mutagenic compound (Kho et al., JMC, 2005).
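The odds comparison can be set up as a 2x2 table: molecules with or without the scaffold, mutagenic or not. A sketch of the odds ratio with a Woolf (log-based) 95% confidence interval; the counts are invented, and this is not claimed to be the exact statistic used by Kho et al.

```python
from math import exp, log, sqrt


def odds_ratio(a: int, b: int, c: int, d: int, z: float = 1.96) -> tuple[float, float, float]:
    """
    a: scaffold present, mutagenic      b: scaffold present, non-mutagenic
    c: scaffold absent, mutagenic       d: scaffold absent, non-mutagenic
    Returns (odds ratio, lower CI, upper CI) using the Woolf log method.
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell; a continuity correction such as +0.5 is often used")
    ratio = (a * d) / (b * c)
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    return ratio, exp(log(ratio) - z * se), exp(log(ratio) + z * se)


print(odds_ratio(20, 80, 10, 90))  # hypothetical counts
```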
Objects to Objects: Venn Diagrams
Given two libraries, identify the scaffolds unique to one:
- Create a tree for each library
- Identify the difference in scaffolds
[Figure: Bioblocks, 350 mols; Maybridge, >50,000 mols]
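Once each library's tree is reduced to a set of scaffold identifiers (for example canonical SMILES), the Venn-diagram regions are plain set operations. A sketch, assuming the identifiers are already canonical so that equal scaffolds compare equal; the SMILES strings are placeholders, not data from the Bioblocks or Maybridge libraries.

```python
def venn_regions(lib_a: set[str], lib_b: set[str]) -> dict[str, set[str]]:
    """Split scaffolds into those unique to each library and those shared."""
    return {
        "only_a": lib_a - lib_b,
        "only_b": lib_b - lib_a,
        "shared": lib_a & lib_b,
    }


library_a = {"c1ccccc1", "c1ccc2[nH]ccc2c1", "C1CCNCC1"}  # benzene, indole, piperidine
library_b = {"c1ccccc1", "c1ccncc1"}                       # benzene, pyridine
regions = venn_regions(library_a, library_b)
print(sorted(regions["only_a"]))
```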
Objects to Objects: Pharmacophores to Scaffolds
Identify Bioisosteric Replacements
Summary
- Substructure enumeration is an alternative for chemoinformatics work
- Large datasets can be handled efficiently
- Large datasets can be organized and viewed
- Correlation with properties
  - Object to object
  - Object to numerical or categorical
- More research is needed.
Additional Information
www.altoris.com www.patentinformatics.com www.chemapps.com
Contact information: hugo_at_altoris.com
PatentInformatics
PatBLAST