Title: Pharmacophore Identification by Data Mining
1. Pharmacophore Identification by Data Mining
- Takashi Okada, Masumi Yamakawa, Satoshi Fujishima and Norito Ohmori
- Department of Informatics, Kwansei Gakuin University
- 2-1 Gakuen, Sanda, Hyogo, 669-1337 Japan
- Email: okada-office_at_ksc.kwansei.ac.jp
2. Contents
- Problem Definition Using an Example of Dopamine D1 Agonist
- Linear Fragments
- The Scheme of Analysis
- The Cascade Model
- Rules and Selection
- Refinement from a Rule Condition to the Supporting Structure Chart
- Basic Active Structures
- Concluding Remarks
3. Dopamine Agonists and Antagonists
- Dopamine is a neurotransmitter.
- There are 6 kinds of dopamine receptors: D1 to D5 and Dauto.
- Too much dopamine leads to schizophrenic symptoms.
- Too little dopamine leads to Parkinson's symptoms.
- Agonists and antagonists of the receptors are used as drugs.
- The problem is to extract the characteristic substructures of agonists and antagonists for each receptor.
[Figure label: Ligand]
4. Source Data and Basic Active Structures
Dopamine Agonists Data
- MDDR database 2003.1 by MDL Prous
- Data for 3 dopamine agonist activities: D1, D2, Dauto
- Omission of stereo- and optical isomers
List of Basic Active Structures
- DG-H
- and 4 miscellaneous compounds
6. Linear Fragments as Descriptors
Example fragment: O2H-c3--CO2-N3H
- Lowercase for aromatic atoms
- CO is treated as a united atom
- 2 neighbors; hydrogen > 0
- 2 terminal atoms
- Atom symbols are omitted here
- (colon) for aromatic bond
- Friendly notation for chemists; avoids overly detailed expression
7. Selection of Descriptors
- Initial number of fragments generated: 4626
- Selection of fragments by appearance probability, 3% < P(fragment) < 97%: 660 fragments selected.
- Omission of one fragment from each correlated pair (Rij > 0.9): 306 fragments selected.
- Meaningful fragments added manually: 29 fragments added.
- Finally, 335 fragments are used as descriptors.
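The two automatic filters above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the thresholds come from the slide, but the fragment names, the toy data, the use of the Pearson coefficient for Rij, and the choice to drop the later member of a correlated pair are all assumptions.

```python
from math import sqrt

def appearance_probability(column):
    """Fraction of compounds that contain the fragment (column of 0/1 flags)."""
    return sum(column) / len(column)

def pearson(x, y):
    """Pearson correlation of two 0/1 columns; 0.0 if either column is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def select_fragments(columns, p_min=0.03, p_max=0.97, r_max=0.9):
    """columns: dict of fragment name -> list of 0/1 flags, one per compound."""
    # Step 1: keep fragments that are neither too rare nor too common.
    kept = [f for f, col in columns.items()
            if p_min < appearance_probability(col) < p_max]
    # Step 2 (assumed tie-breaking): drop a fragment when it correlates
    # above r_max with a fragment that was already selected.
    selected = []
    for f in kept:
        if all(pearson(columns[f], columns[g]) <= r_max for g in selected):
            selected.append(f)
    return selected

# Hypothetical toy data: 4 fragments over 6 compounds.
toy = {
    "frag_a": [1, 1, 0, 0, 1, 0],
    "frag_b": [1, 1, 0, 0, 1, 0],  # identical to frag_a, so R = 1.0
    "frag_c": [1, 1, 1, 1, 1, 1],  # present in every compound, P = 1.0
    "frag_d": [0, 1, 0, 1, 0, 1],
}
print(select_fragments(toy))
```

On the toy data, frag_c fails the probability filter and frag_b is dropped as the correlated partner of frag_a. The manual addition of meaningful fragments is a chemist's decision and is not modeled here.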
9. Scheme of Analysis (Dopamine Agonists)
Dopamine D1, D2, Dauto Agonists: 407 compounds
[Figure labels: Supporting Structures Chart; Chemist Driven Refinement]
- Final Result
- List of Basic Active Structures
- Supporting Structures Chart
11. Sum of Squares Decomposition for Categorical Data Using SS Definition by Gini
Node       positives / negatives   Sample variance   Sum of squares
Whole      800/200                 0.16              TSS = 160
Group (1)  760/40                  0.0475            WSS(1) = 38, BSS(1) = 18
Group (2)  40/160                  0.16              WSS(2) = 32, BSS(2) = 72
TSS = WSS(1) + BSS(1) + WSS(2) + BSS(2): 160 = 38 + 18 + 32 + 72
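The decomposition can be checked numerically. The sketch below assumes the Gini sample variance is (1 - sum of p_a^2) / 2 and that BSS for a group is (n_g / 2) times the summed squared shift of the class probabilities from the parent; these formulas are chosen because they reproduce the numbers on this slide, and the function names are illustrative only.

```python
def gini_variance(counts):
    """Gini sample variance of a class distribution: (1 - sum p_a^2) / 2."""
    n = sum(counts)
    return 0.5 * (1.0 - sum((c / n) ** 2 for c in counts))

def wss(counts):
    """Within-group sum of squares: group size times Gini variance."""
    return sum(counts) * gini_variance(counts)

def bss(group, parent):
    """Between-group sum of squares: (n_g / 2) * sum (p_a^group - p_a^parent)^2."""
    n_g, n_p = sum(group), sum(parent)
    return 0.5 * n_g * sum((g / n_g - p / n_p) ** 2
                           for g, p in zip(group, parent))

parent = (800, 200)               # positives, negatives in the whole data
groups = [(760, 40), (40, 160)]   # the two groups from the slide

tss = wss(parent)
parts = [(wss(g), bss(g, parent)) for g in groups]

print("TSS", round(tss, 4))
for i, (w, b) in enumerate(parts, start=1):
    print(f"WSS({i})", round(w, 4), f"BSS({i})", round(b, 4))
print("sum of parts", round(sum(w + b for w, b in parts), 4))
```

Under these assumptions the parts add up to the total, matching 38 + 18 + 32 + 72 = 160.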
12. Cascade Model: Local Correlation and Rule
E: the activity
IF [B: y] added on [A: y]    Cases: 100 → 60
THEN [E: y] 60% → 90%, BSS = 5.40
     [D: y] 60% → 93%, BSS = 6.67
- Main condition: [B: y]
- Precondition: [A: y]
- Collateral correlations: [D: y]
[Figure: lattice nodes (A: y) and (A: y, B: y)]
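The BSS values attached to the rule can be reproduced with a short calculation. This sketch assumes a two-class attribute and BSS = (n / 2) times the summed squared change in class probabilities, which for two classes reduces to n * d^2; reading 93% as 56 of 60 cases is also an assumption, made because it matches the quoted 6.67.

```python
def rule_bss(n_after, p_before, p_after):
    """BSS of a two-class attribute across a link.

    Both class probabilities shift by the same amount d, so
    (n / 2) * (d^2 + d^2) simplifies to n * d^2.
    """
    d = p_after - p_before
    return n_after * d * d

# Activity E: 60% -> 90% among the 60 cases that satisfy the rule.
bss_e = rule_bss(60, 0.60, 0.90)
# Collateral attribute D: 60% -> about 93% (assumed to be 56 of 60 cases).
bss_d = rule_bss(60, 0.60, 56 / 60)

print(round(bss_e, 2), round(bss_d, 2))
```

The collateral attribute D changes more sharply than the activity itself, which is why it is reported alongside the rule.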
14. Computation of Rules for D1 Agonists
- Lattice generation
- 5928 nodes using thres 0.125
- 1242 candidate links (BSS > 4070.007)
- Optimization of links resulted in 323 rules with BSS > 3690.015
- Organization of rules resulted in 12 principal rules and 41 relative rules.
- Select rules with D1 ratio > 0.8 and compounds > 10.
- Finally, 2 principal and 19 relative rules resulted.
15. Self-Organizing Map of 21 D1 Agonist Rules
- The SOM procedure placed the 21 rules in the figure on the right, using supporting compounds (y/n) as variables.
- The coverage-based algorithm selected 3 rules (shown in red), which cover 69 compounds out of 74 actives.
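The slide does not spell out the coverage-based algorithm. One common way to pick a few rules that cover most active compounds is a greedy set cover, sketched below as an assumption rather than as the method actually used; the rule names and compound ids are hypothetical.

```python
def greedy_cover(rules, targets, max_rules=None):
    """rules: dict of rule name -> set of supporting compound ids.

    Repeatedly pick the rule that covers the most still-uncovered targets.
    Returns the chosen rule names and the targets left uncovered.
    """
    uncovered = set(targets)
    chosen = []
    while uncovered and (max_rules is None or len(chosen) < max_rules):
        best = max(rules, key=lambda r: len(rules[r] & uncovered))
        gain = rules[best] & uncovered
        if not gain:
            break  # no rule covers any of the remaining targets
        chosen.append(best)
        uncovered -= gain
    return chosen, uncovered

# Hypothetical toy data: 4 rules, 7 active compounds.
toy_rules = {
    "rule_1": {1, 2, 3, 4},
    "rule_2": {3, 4, 5},
    "rule_3": {5, 6},
    "rule_4": {1, 2},
}
actives = {1, 2, 3, 4, 5, 6, 7}

chosen, missed = greedy_cover(toy_rules, actives)
print(chosen, missed)
```

In the toy case two rules cover six of the seven actives and one compound stays uncovered, analogous to the 69 of 74 actives covered by 3 rules.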
16. The Strongest Rule for Dopamine D1 Agonists
Rule 1    Cases: 407 → 60    BSS: 35.41
IF O2H-c3c3-O2H y    (main condition: catechol)
added on
THEN D1Ag 0.82 0.18 → 0.05 0.95 (off on)
THEN DAuAg 0.52 0.48 → 1.00 0.00 (off on)
THEN C4H-C4H-c3-O2H 0.79 0.21 → 0.07 0.93 (n y)
THEN C4H-C4H-c3-O2H 0.81 0.19 → 0.02 0.98 (n y)
THEN N3H-C4H--c3-O2H 0.86 0.14 → 0.33 0.67 (n y)
THEN N3H-C4H--c3-O2H 0.90 0.10 → 0.32 0.68 (n y)
THEN C4H-N3H--C4H-c3 0.85 0.15 → 0.38 0.62 (n y)
- Catechol is the key substructure.
- The real pharmacophore needs an amino group, which appears in the collateral correlations.
- Interpretation is a hard process.
18. Refinement Process Starting from OccO
[Figure: refined structures at steps 0, 1, 2, 3, 4 and 8]
- A bond-atom pair is inserted at every position.
- Greedy search for the structure with the highest BSS.
- If the seed captures the essential substructure, the refinement reaches a reasonable result.
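The greedy refinement loop can be illustrated with a deliberately simplified toy. The sketch below is not the refinement system itself: structures are plain strings, substring matching stands in for substructure search, inserting one letter stands in for inserting a bond-atom pair, and the two-class BSS formula n * d^2 is an assumption. All data are hypothetical.

```python
def bss(n_act, n_inact, tot_act, tot_inact):
    """Two-class BSS of the matched subset against the whole data set."""
    n = n_act + n_inact
    if n == 0:
        return 0.0
    d = n_act / n - tot_act / (tot_act + tot_inact)
    return n * d * d

def score(pattern, actives, inactives):
    """BSS of the compounds whose string contains the pattern."""
    hits_a = sum(pattern in s for s in actives)
    hits_i = sum(pattern in s for s in inactives)
    return bss(hits_a, hits_i, len(actives), len(inactives))

def refine(seed, actives, inactives, alphabet):
    """Greedy search: extend the pattern while the BSS keeps improving."""
    current, best = seed, score(seed, actives, inactives)
    while True:
        # Insert one symbol at every position of the current pattern.
        candidates = {current[:i] + a + current[i:]
                      for i in range(len(current) + 1) for a in alphabet}
        # sorted() makes tie-breaking deterministic.
        top = max(sorted(candidates),
                  key=lambda c: score(c, actives, inactives))
        top_score = score(top, actives, inactives)
        if top_score <= best:
            return current, best
        current, best = top, top_score

# Hypothetical toy "compounds"; these strings are not SMILES or SMARTS.
toy_actives = ["NCCOO", "CNCCOO", "NCCOOC"]
toy_inactives = ["COO", "CCOO", "OOC", "CCC", "NCC", "CNC"]

pattern, value = refine("OO", toy_actives, toy_inactives, "CNO")
print(pattern, round(value, 4))
```

Starting from the seed "OO", the toy search grows the pattern step by step until it also contains the "N" that separates actives from inactives, which mirrors how a seed that captures the essential substructure can lead to a reasonable refined structure.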
19. Structure Refinement System
- All data: Active 74, Inactive 334
- Input: a seed SMARTS string or a seed fragment
- Target: D1 agonist
- Reference: D1, D2, Dauto agonists
- Select by seed: Active 57, Inactive 3
- At step 7: Active 57, Inactive 2
20. Part of the Supporting Structures Chart at Step 7
22. Necessity of Listing Basic Active Structures
- Given a common substructure and its supporting structure chart, there are still dissimilar structure types.
- A computer scientist thinks this is OK if supplemented by a comment on frequently appearing skeletons.
- Synthetic organic chemists think that listing basic active structures is essential, because the function as a drug necessitates the presence of surrounding substructures.
- They said they can extract these structures easily by viewing the supporting structures chart.
23. Extraction of Basic Active Structures Driven by a Chemist
- A chemist recognizes a candidate basic structure and gives it to the refinement system.
- The process repeats until it gives satisfactory yes/no hits.
- The extraction of an active structure repeats until the list of basic active structures covers most compounds.
- The final knowledge base consists of:
  - A list of basic active structures with SMARTS
  - A reference to the supporting compounds chart for each active structure
  - Miscellaneous compounds not covered by the active structures
24. Basic Active Structures from D1 Agonists
[Figure: basic active structures, each labeled with actives / inactives]
25. Miscellaneous Compound Structures Not Covered by the 8 Basic Active Structures
26. Basic Active Structure
- The basic active structure is a subjective concept.
- The structure possesses discriminating power for the target activity, as tested by the refinement system.
- In this sense, the structure is different from a frequently appearing skeleton in active compounds.
- This listing stimulates chemists' recognition of the pharmacophore, and it is very useful for the design of new drugs.
[Figure: regions labeled DG-A, DG-B, DG-C, DG-D, DG-unknown1, DG-unknown2, Non DG and Miscellaneous Active compounds. Nobody knows the exact boundary.]
28. Discrimination vs. Understanding
- Discrimination can be done by just HO-cc-OH.
- An amino group is definitely necessary for a D1 agonist.
- Success of a rule as a discrimination model depends on the biased character of the dataset.
- Incorporation of Surrounding Information
  - Inclusion of collateral correlations into a rule.
  - Aggregation of the surrounding components by the refinement system.
  - Essential components in the surroundings are added and catalogued as the basic active structure by medicinal chemists using the refinement system.
29. Conclusion and Future Plan
- We were able to extract basic active structures of dopamine D1 agonists, which are effective in the design of new drugs.
- Project of Pharmacophore Knowledge Base
  - Current stage (some entries are ongoing)
    - Dopamine D1, D2, Dauto Agonist
    - Dopamine D1-D4 Antagonist
    - 5HT 1A, 1B, 1C, 1D, 1F, 3, 4 Agonist
    - 5HT 1A, 1D, 2A, 2B, 2C, 3, 4 Antagonist
    - Dopamine, 5HT Reuptake Inhibitor
    - Adrenergic α1, α2, β1 Blocker
    - Adrenergic α2 Agonist
    - Adenosine A1, A2 Agonist
    - Adenosine A1 - A3 Antagonist
  - Website will open by the end of this year.
  - Analysis results of 100 activities by March 2009
30. Acknowledgements
- Thanks go to
  - Chemists: Dr. Mori, Dr. Horikawa and Ms. Kamiguchi, for their efforts in the analysis and useful comments on the system.
  - Students: Mr. Nakano and Mr. Kitajima, for the development of supporting software.
- Granted by the Ministry of Education, Culture, Sports, Science and Technology, Japan.
- Thank you for your attention.
31. Structure Refinement Using Unconnected Structure
- Seed fragment: C-NH0CCCcc-OH1.N!H0 (active/inactive: 16 / 1)
- Refined fragment: C(-C(-C(-C(-C(-C)))))-NH0(-C(-C(-C)))C(-C(-c(c(c(c)))))CCcc-OH1.N!H0(-C(-C)) (active/inactive: 16 / 0; contains 2 structures)
32. Cascade Model 1
- Itemset Lattice
  - itemset as node
  - item inclusion as link
  - class distribution as node property
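The three ingredients of the itemset lattice can be made concrete with a small sketch. It is an illustration only: the record format, the `max_size` cap and the item names are assumptions, and the pruning threshold used in the real lattice generation is left out.

```python
from collections import Counter
from itertools import combinations

def build_lattice(records, max_size=2):
    """records: list of (set of items, class label) pairs.

    Returns nodes (itemset -> class distribution of the supporting records)
    and links (pairs of itemsets that differ by exactly one added item).
    """
    items = sorted(set().union(*(rec for rec, _ in records)))
    nodes = {}
    for k in range(max_size + 1):
        for combo in combinations(items, k):
            itemset = frozenset(combo)
            # Class distribution over the records that contain the itemset.
            dist = Counter(label for rec, label in records if itemset <= rec)
            if dist:
                nodes[itemset] = dist
    links = [(a, b) for a in nodes for b in nodes
             if len(b) == len(a) + 1 and a < b]
    return nodes, links

# Hypothetical toy data: two items, two classes.
toy_records = [
    ({"A", "B"}, "on"),
    ({"A", "B"}, "on"),
    ({"A"}, "off"),
    ({"B"}, "off"),
    (set(), "off"),
]

nodes, links = build_lattice(toy_records)
for itemset, dist in nodes.items():
    print(sorted(itemset), dict(dist))
print(len(links), "links")
```

Each link connects a node to a more specific node, so comparing the two class distributions along a link is what the cascade model scores when it looks for rules.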
33. Cascade Model 2
- Nodes as Lakes with Potential
- Links as Waterfalls with Power
- High Power Waterfalls as Rules
- Questions
  - Potential definition
  - Power definition
  - Cascade construction
  - Selection of waterfalls
[Figure labels: Mixed; Pure; Potential: class purity]