Pharmacophore Identification by Data Mining - PowerPoint PPT Presentation

1
Pharmacophore Identification by Data Mining
  • Takashi Okada, Masumi Yamakawa, Satoshi Fujishima
    and Norito Ohmori
  • Department of Informatics, Kwansei Gakuin
    University,
  • 2-1 Gakuen, Sanda, Hyogo, 669-1337 Japan
  • Email: okada-office_at_ksc.kwansei.ac.jp

2
Contents
  • Problem Definition Using an Example of Dopamine
    D1 Agonist
  • Linear Fragments
  • The Scheme of Analysis
  • The Cascade Model
  • Rules and Selection
  • Refinement from a Rule Condition to the
    Supporting Structure Chart
  • Basic Active Structures
  • Concluding Remarks

3
Dopamine Agonists and Antagonists
  • Dopamine is a neurotransmitter.
  • There are 6 kinds of dopamine receptors: D1 - D5
    and Dauto.
  • If we have too much dopamine, then schizophrenic
    symptoms develop.
  • If we have too little dopamine, then Parkinson's
    symptoms develop.
  • Agonists and antagonists to the receptors are
    used as drugs.
  • The problem is to extract characteristic
    substructures of agonists and antagonists for
    each receptor.

4
Source Data and Basic Active Structures
Dopamine Agonists Data
  • MDDR database 2003.1 by MDL and Prous
  • 3 dopamine agonist data sets: D1, D2, Dauto
  • Omission of stereo- and optical isomers

List of Basic Active Structures
  • [Figure: basic active structures, with the label DG-H visible]
  • and 4 miscellaneous compounds

6
Linear Fragments as Descriptors
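Example fragment: O2H-c3--CO2-N3H
  • Lowercase letters denote aromatic atoms.
  • CO is treated as a united atom.
  • In an atom label such as O2H, the digit is the number
    of neighbors and H means the hydrogen count is > 0.
  • Atom symbols are written for the 2 terminal atoms at
    each end and omitted for the atoms in between.
  • A colon denotes an aromatic bond.
  • The notation is friendly for chemists and avoids
    overly detailed expressions.
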
7
Selection of Descriptors
  • Initial number of fragments generated: 4626.
  • Selection of fragments by appearance
    probability, 3% < P(fragment) < 97%:
    660 fragments selected.
  • Omission of one fragment from each correlated
    pair (Rij > 0.9): 306 fragments selected.
  • Meaningful fragments added manually: 29
    fragments.
  • Finally, 335 fragments are used as descriptors.
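
The two automatic filtering steps on this slide can be sketched as follows. This is a minimal illustration, not the authors' program: the function names, the toy data, the use of Pearson correlation for Rij, and the choice of which member of a correlated pair to drop (the later one) are all assumptions.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length 0/1 occurrence vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

def select_fragments(occurrence, p_min=0.03, p_max=0.97, r_max=0.9):
    """occurrence maps a fragment name to 0/1 flags, one per compound."""
    # Step 1: keep fragments whose appearance probability lies in (p_min, p_max).
    kept = {f: v for f, v in occurrence.items()
            if p_min < sum(v) / len(v) < p_max}
    # Step 2: drop a fragment if it correlates above r_max with one already
    # selected (assumption: the first fragment of a pair is the one kept).
    selected = []
    for f, v in kept.items():
        if all(pearson(v, kept[g]) <= r_max for g in selected):
            selected.append(f)
    return selected

# Hypothetical toy data: 4 fragments over 10 compounds.
toy = {
    "frag_a": [1, 1, 0, 0, 1, 0, 1, 0, 0, 1],
    "frag_b": [1, 1, 0, 0, 1, 0, 1, 0, 0, 1],  # identical to frag_a
    "frag_c": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],  # appears in every compound
    "frag_d": [0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
}
selected = select_fragments(toy)
print(selected)
```

The manual addition of chemically meaningful fragments is a human step and is not modelled here.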

9
Scheme of Analysis (Dopamine Agonists)
Dopamine D1, D2, Dauto agonists: 407 compounds
  • Final Result
  • List of Basic Active Structures
  • Supporting Structures Chart

[Flowchart labels: Chemist-Driven Refinement; Supporting Structures Chart]

11
Sum of Squares Decomposition for Categorical
Data Using SS Definition by Gini

Node         positives / negatives   Sample variance   Sum of squares
All data     800 / 200               0.16              TSS = 160
Subgroup 1   760 / 40                0.0475            WSS(1) = 38
Subgroup 2   40 / 160                0.16              WSS(2) = 32

BSS(1) = 18, BSS(2) = 72
TSS = WSS(1) + BSS(1) + WSS(2) + BSS(2)
160 = 38 + 18 + 32 + 72

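
The numbers on this slide can be checked with a short calculation. The sketch below assumes the Gini-based definitions SS = (n/2)(1 - Σ p_i²) for a node and BSS = (n_g/2) Σ (p_g,i - p_i)² for a subgroup; the slide does not spell these formulas out, but they reproduce every figure shown. Function names are illustrative.

```python
def gini_ss(counts):
    """Sum of squares of a categorical sample: (n / 2) * (1 - sum(p_i^2))."""
    n = sum(counts)
    return n / 2 * (1 - sum((c / n) ** 2 for c in counts))

def between_ss(group_counts, total_counts):
    """Between-group SS of one subgroup: (n_g / 2) * sum((p_g_i - p_i)^2)."""
    n_g, n = sum(group_counts), sum(total_counts)
    return n_g / 2 * sum((g / n_g - t / n) ** 2
                         for g, t in zip(group_counts, total_counts))

total = (800, 200)                 # positives / negatives in all data
groups = [(760, 40), (40, 160)]    # the two subgroups on the slide

tss = gini_ss(total)
wss = [gini_ss(g) for g in groups]
bss = [between_ss(g, total) for g in groups]

print(round(tss, 6), [round(w, 6) for w in wss], [round(b, 6) for b in bss])
# The decomposition TSS = sum of WSS(g) + BSS(g) holds up to rounding error.
print(abs(tss - (sum(wss) + sum(bss))) < 1e-9)
```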
12
Cascade Model: Local Correlation and Rule
  • E: the activity
  • Precondition: A: y
  • Main condition: B: y
  • IF B: y added on A: y   Cases: 100 → 60
  • THEN E: y  60% → 90%, BSS = 5.40
  • Collateral correlation: D: y  60% → 93%, BSS = 6.67
  • Itemset after adding the main condition: A: y, B: y

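
One way to hold such a rule in code is a small record with a precondition, a main condition, the change in the activity, and a list of collateral correlations. The class and field names below are hypothetical. The BSS method assumes a two-valued attribute, for which (n/2) times the sum of squared shifts reduces to n × shift². The slide's 93 is taken to mean 56 of 60 cases (93.3%), an assumption that matches the printed BSS of 6.67.

```python
from dataclasses import dataclass, field

@dataclass
class Shift:
    """Change in the share of one attribute value along a rule link."""
    attribute: str
    before: float  # share of the value before the main condition is added
    after: float   # share of the value after it is added

    def bss(self, n_after: int) -> float:
        # Two-valued attribute: both values shift by the same amount,
        # so (n / 2) * sum of squared shifts equals n * shift ** 2.
        return n_after * (self.after - self.before) ** 2

@dataclass
class Rule:
    precondition: str
    main_condition: str
    cases_before: int
    cases_after: int
    activity: Shift
    collateral: list = field(default_factory=list)

rule = Rule(
    precondition="A: y",
    main_condition="B: y",
    cases_before=100,
    cases_after=60,
    activity=Shift("E: y", 0.60, 0.90),
    collateral=[Shift("D: y", 0.60, 56 / 60)],  # assumed 56 of 60 cases
)

bss_e = rule.activity.bss(rule.cases_after)
bss_d = rule.collateral[0].bss(rule.cases_after)
print(round(bss_e, 2), round(bss_d, 2))
```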
14
Computation of rules for D1 agonists
  • Lattice generation
  • 5928 nodes using thres = 0.125
  • 1242 candidate links (BSS > 407 × 0.007)
  • Optimization of links resulted in 323 rules with
    BSS > 369 × 0.015
  • Organization of rules resulted in 12 principal
    rules and 41 relative rules.
  • Selection of rules with D1 ratio > 0.8 and more
    than 10 compounds
  • Finally, 2 principal and 19 relative rules
    resulted.

15
Self Organizing Map of 21 D1 Agonist Rules
  • The SOM procedure placed the 21 rules in the figure
    on the right, using supporting compounds (y/n) as
    variables.
  • The coverage-based algorithm selected 3 rules
    (shown in red), which cover 69 compounds
    out of 74 actives.
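
The slide does not describe the coverage-based algorithm itself. A common way to pick a few rules that together cover most active compounds is greedy set cover, sketched below as a stand-in; the function name, rule names and compound IDs are invented for illustration.

```python
def greedy_cover(rule_support, targets):
    """Pick rules one at a time, each time taking the rule that covers
    the most still-uncovered target compounds."""
    uncovered = set(targets)
    chosen = []
    while uncovered:
        best = max(rule_support,
                   key=lambda r: len(rule_support[r] & uncovered))
        gain = rule_support[best] & uncovered
        if not gain:  # no remaining rule covers anything new
            break
        chosen.append(best)
        uncovered -= gain
    return chosen, set(targets) - uncovered

# Hypothetical supporting compounds per rule; compound 7 is covered by no rule.
rule_support = {
    "r1": {1, 2, 3, 4},
    "r2": {3, 4, 5},
    "r3": {5, 6},
    "r4": {1, 2},
}
actives = {1, 2, 3, 4, 5, 6, 7}

chosen, covered = greedy_cover(rule_support, actives)
print(chosen, sorted(covered))
```

In this toy run two rules cover 6 of the 7 actives, which mirrors the slide's situation of a small rule set covering most but not all active compounds.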

16
The Strongest Rule for Dopamine D1 agonists
  • Rule 1   Cases: 407 → 60   BSS = 35.41
  • IF O2H-c3c3-O2H: y (main condition: catechol)
  • added on (empty precondition)
  • THEN D1Ag: 0.82 0.18 → 0.05 0.95 (off on)
  • THEN DAuAg: 0.52 0.48 → 1.00 0.00 (off on)
  • THEN C4H-C4H-c3-O2H: 0.79 0.21 → 0.07 0.93 (n y)
  • THEN C4H-C4H-c3-O2H: 0.81 0.19 → 0.02 0.98 (n y)
  • THEN N3H-C4H--c3-O2H: 0.86 0.14 → 0.33 0.67 (n y)
  • THEN N3H-C4H--c3-O2H: 0.90 0.10 → 0.32 0.68 (n y)
  • THEN C4H-N3H--C4H-c3: 0.85 0.15 → 0.38 0.62 (n y)

  • Catechol is the key substructure.
  • The real pharmacophore needs an amino group, which
    appears in the collateral correlations.
  • Interpretation is a hard process.

18
Refinement Process Starting from OccO
[Figure: refined structures at steps 0, 1, 2, 3, 4 and 8]
  • Insertion of a bond-atom pair at every position.
  • Greedy search for the structure with the highest BSS.
  • If a seed captures the essential substructure,
    the refinement reaches a reasonable result.

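
The greedy search can be illustrated with a deliberately simplified analogue. The real system grows a molecular substructure by inserting bond-atom pairs. In the sketch below a "structure" is just a set of abstract feature tokens, an extension adds one token, and a compound matches when it contains every token. All names and data are hypothetical, and the two-class BSS formula n × (p_sub - p_all)² is an assumption consistent with the Gini-based sum of squares.

```python
def score(pattern, compounds):
    """BSS of the compounds matched by pattern, relative to all compounds."""
    hits = [active for feats, active in compounds if pattern <= feats]
    if not hits:
        return 0.0
    p_all = sum(active for _, active in compounds) / len(compounds)
    p_sub = sum(hits) / len(hits)
    return len(hits) * (p_sub - p_all) ** 2

def refine(seed, compounds, vocabulary, max_steps=10):
    """Greedily add the single token that raises BSS most; stop when none does."""
    pattern = frozenset(seed)
    best = score(pattern, compounds)
    for _ in range(max_steps):
        candidates = [pattern | {tok} for tok in sorted(vocabulary - pattern)]
        if not candidates:
            break
        top = max(candidates, key=lambda c: score(c, compounds))
        top_score = score(top, compounds)
        if top_score <= best:
            break
        pattern, best = top, top_score
    return pattern, best

# Toy data: (feature tokens, 1 = active / 0 = inactive).
compounds = (
    [(frozenset({"catechol", "amine"}), 1)] * 4
    + [(frozenset({"catechol"}), 0)] * 2
    + [(frozenset({"amine"}), 0)] * 3
    + [(frozenset({"ester"}), 0)] * 3
)
vocabulary = {"catechol", "amine", "ester"}

refined, refined_bss = refine({"catechol"}, compounds, vocabulary)
print(sorted(refined), round(refined_bss, 3))
```

Starting from the seed token alone, the search adds the token that sharpens the separation of actives from inactives, then stops because no further extension improves BSS.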
19
Structure Refinement System
  • Target: D1 agonist
  • Reference: D1, D2, Dauto agonists
  • All data: Active 74, Inactive 334
  • Seed input: a seed SMARTS string or a seed fragment
  • Select by seed: Active 57, Inactive 3
  • At step 7: Active 57, Inactive 2

20
Part of Supporting Structures chart at Step 7
22
Necessity of Listing Basic Active Structures
  • Given a common substructure and its
    supporting structures chart,
  • there are still dissimilar structure types.
  • A computer scientist thinks this is OK if supplemented
    by a comment on frequently appearing skeletons.
  • Synthetic organic chemists think listing basic
    active structures is essential,
  • because the function as a drug necessitates the
    presence of surrounding substructures.
  • They said they can extract these structures
    easily by viewing the supporting structures chart.

23
Extraction of Basic Active Structures Driven by a
Chemist
  • A chemist recognizes a candidate basic structure
    and gives it to the refinement system.
  • The process repeats until it gives satisfactory
    yes/no hits.
  • The extraction of an active structure repeats
    until the list of basic active structures covers
    most compounds.
  • The final knowledge base consists of:
  • A list of basic active structures with SMARTS
  • Reference to the supporting compounds chart for
    each active structure
  • Miscellaneous compounds not covered by the active
    structures

24
Basic Active Structures from D1 Agonists
actives / inactives
25
Miscellaneous Compound Structures Not Covered by
the 8 Basic Active Structures
26
Basic Active Structure
  • The basic active structure is a subjective
    concept.
  • The structure possesses the discriminating power
    for the target activity tested by the refinement
    system.
  • In this sense, the structure is different from
    a skeleton that merely appears frequently in
    active compounds.
  • This listing stimulates chemists' recognition of
    the pharmacophore, and it is very useful for the
    design of new drugs.

[Figure: regions labelled DG-A, DG-B, DG-C, DG-D, DG-unknown1,
DG-unknown2, Non DG, and Miscellaneous Active compounds.
Nobody knows the exact boundary.]

28
Discrimination vs. Understanding
  • Discrimination can be done by just HO-cc-OH.
  • An amino group is definitely necessary for a D1
    agonist.
  • Success of a rule as a discrimination
    model depends on the biased character of the
    dataset.
  • Incorporation of Surrounding Information
  • Inclusion of collateral correlations into a rule.
  • Aggregation of the surrounding components by the
    refinement system.
  • Essential components in the surroundings are
    added and catalogued as the basic active
    structure by medicinal chemists using the
    refinement system.

29
Conclusion and Future Plan
  • We were able to extract basic active structures of
    dopamine D1 agonists, which are effective in the
    design of new drugs.
  • Project of Pharmacophore Knowledge Base
  • Current stage (some entries are ongoing)
  • Dopamine D1, D2, Dauto Agonist
  • Dopamine D1-D4 Antagonist
  • 5HT 1A, 1B, 1C, 1D, 1F, 3, 4 Agonist
  • 5HT 1A, 1D, 2A, 2B, 2C, 3, 4 Antagonist
  • Dopamine, 5HT Reuptake Inhibitor
  • Adrenergic α1, α2, β1 Blocker
  • Adrenergic α2 Agonist
  • Adenosine A1, A2 Agonist
  • Adenosine A1 - A3 Antagonist
  • The website will open by the end of this year.
  • Analysis results of 100 activities by March 2009

30
Acknowledgements
  • Thanks go to
  • Chemists: Dr. Mori, Dr. Horikawa and Ms.
    Kamiguchi, for their efforts in the analysis and
    useful comments on the system.
  • Students: Mr. Nakano and Mr. Kitajima, for the
    development of supporting software.
  • Supported by a grant from the Ministry of Education,
    Culture, Sports, Science and Technology, Japan.
  • Thank you for your attention.

31
Structure Refinement Using Unconnected Structure
  • Seed fragment: C-NH0CCCcc-OH1.N!H0
    active/inactive: 16 / 1
  • Refined fragment:
    C(-C(-C(-C(-C(-C)))))-NH0(-C(-C(-C)))C(-C(-c(c(c(c)))))CCcc-OH1.N!H0(-C(-C))
    active/inactive: 16 / 0; contains 2 structures

32
Cascade model 1
  • Itemset Lattice
  • itemset as node
  • items inclusion as link
  • class distribution as node property
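
The three bullets above map directly onto a small data structure. The sketch below builds an itemset lattice from toy records: nodes are itemsets, links connect an itemset to each superset that has exactly one more item, and each node stores the class distribution of the records it covers. The function name, the "A=y" item spelling and the data are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations

def build_lattice(records, max_size=2):
    """records: list of (set of items, class label) pairs."""
    items = sorted(set().union(*(r for r, _ in records)))
    nodes = {}
    for k in range(max_size + 1):
        for combo in combinations(items, k):
            itemset = frozenset(combo)
            # Node property: class distribution of the records containing it.
            dist = Counter(label for r, label in records if itemset <= r)
            if dist:
                nodes[itemset] = dist
    # Link: inclusion between itemsets that differ by exactly one item.
    links = [(a, b) for a in nodes for b in nodes
             if len(b) == len(a) + 1 and a < b]
    return nodes, links

records = [
    ({"A=y", "B=y"}, "on"),
    ({"A=y"}, "off"),
    ({"B=y"}, "on"),
    (set(), "off"),
]
nodes, links = build_lattice(records)
for itemset, dist in nodes.items():
    print(sorted(itemset), dict(dist))
print(len(links), "links")
```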

33
Cascade model 2
  • Nodes as Lakes with Potential
  • Links as Waterfalls with Power
  • High Power Waterfalls as Rules

  • Questions
  • Potential definition
  • Power definition
  • Cascade construction
  • Selection of waterfalls

[Figure labels: Mixed; Pure; Potential: class purity]