In silico screening in modern drug discovery research

1
In silico screening in modern drug discovery research
  • Presented by Olga Komina
  • Department of Computer Science and Engineering
  • University of Nebraska-Lincoln
  • July 2004

2
Modern Drug Discovery
  • Multidisciplinary area of research:
      • Combinatorial chemistry
      • Chemoinformatics
      • Molecular biology
      • Biochemistry
      • Medicine
      • Macromolecular modeling
      • Pharmacology
  • Drug discovery is the goal of the research; methods
    and approaches from these different scientific areas
    are applied to achieve it.

3
Drug Discovery Pipeline
  • Target identification and validation
  • Assay development
  • Virtual screening (VS)
  • High throughput screening (HTS)
  • Quantitative structure-activity relationship
    (QSAR) and refinement of compounds
  • Characterization of prospective drugs
  • Testing on animals for activity and side effects
  • Clinical trials
  • FDA approval

4
Computer-aided Drug Design Strategy
5
Mechanism of Drug Action
6
Virtual Screening (VS)
  • In silico screening of large compound databases
    in order to reduce the scale of high-throughput
    screening.
  • Conceptual diversity
      • Small-molecule screening
      • Protein-structure-based screening
  • Algorithmic diversity
      • Similarity searching
      • Clustering and partitioning
      • Simple filters
      • Artificial intelligence
  • Integration of different computational approaches
  • Similarity paradox

7
Similarity Paradox
8
Descriptors of Molecular Structure Properties
  • 1D descriptors encode chemical composition and
    physicochemical properties
      • MW, molecular formula C_mO_nH_k, hydrophobicity
  • 2D descriptors encode chemical topology
      • Connectivity indices, degree of branching, degree
        of flexibility, number of aromatic bonds
  • 3D descriptors encode 3D shape, volume,
    functionality, surface area
  • Pharmacophore: the spatial arrangement of
    chemical groups that determines a molecule's activity
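The molecular weight (MW) is the simplest 1D descriptor. It follows directly from the elemental composition. The minimal sketch below assumes a small hand-written table of standard average atomic masses and a hypothetical `molecular_weight` helper. It is an illustration, not a cheminformatics toolkit.

```python
# Standard average atomic masses (g/mol) for a few common elements (assumed table).
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "S": 32.06}

def molecular_weight(composition):
    """Return the MW for an elemental composition such as {"C": 2, "H": 6, "O": 1}."""
    return sum(ATOMIC_MASS[element] * count for element, count in composition.items())

# Ethanol, C2H6O
print(round(molecular_weight({"C": 2, "H": 6, "O": 1}), 2))  # 46.07
```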

9
Connectivity Indices
  • Connectivity of an atom: the number of atoms
    connected to it
  • Connectivity of a bond: the reciprocal of the square
    root of the product of the connectivities of its
    two atoms
  • Connectivity index of a molecule: the sum of all
    bond connectivities

Isobutyl alcohol
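The index defined above is the Randić connectivity index, χ = Σ over bonds of (δ_i δ_j)^(-1/2), where δ is an atom's connectivity. The following sketch computes it for isobutyl alcohol. It assumes a hydrogen-suppressed graph, so only non-hydrogen atoms count as neighbours. The molecule is given as a hypothetical list of bonds between labelled heavy atoms.

```python
import math

def connectivity_index(bonds):
    """Randić-type connectivity index of a hydrogen-suppressed molecular graph.

    bonds: list of (atom_i, atom_j) pairs between non-hydrogen atoms.
    """
    # Connectivity of each atom = number of (heavy) atoms bonded to it.
    degree = {}
    for a, b in bonds:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    # Bond connectivity = 1 / sqrt(product of its atoms' connectivities);
    # the molecular index is the sum over all bonds.
    return sum(1.0 / math.sqrt(degree[a] * degree[b]) for a, b in bonds)

# Isobutyl alcohol, (CH3)2CH-CH2-OH: methyls C1 and C2 on C3, then C3-C4, C4-O5.
isobutyl_alcohol = [("C1", "C3"), ("C2", "C3"), ("C3", "C4"), ("C4", "O5")]
print(round(connectivity_index(isobutyl_alcohol), 3))  # 2.27
```

The two methyl-CH bonds contribute 1/√3 each, the CH-CH2 bond 1/√6, and the CH2-OH bond 1/√2.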
10
Classification of Atoms to Atom Types
  • Developed for prediction of log P values
  • A molecule is characterized by its counts of 120
    atom types
  • Atom type: a commonly occurring atomic state of
    C, H, O, N, S, P, Se, and the halogens
    (F, Cl, Br, I)

11
Atom Types (Carbon)
12
Example Description of a Molecule by the Count
of Atom Types
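A molecule can be described by a fixed-length vector of atom-type counts. The sketch below assumes a deliberately simplified, hypothetical typing rule: element, attached hydrogens, and heavy-atom neighbours. This rule stands in for the 120-type scheme, which it does not reproduce. The vocabulary is likewise limited to the few types the example needs.

```python
from collections import Counter

def atom_type(element, n_hydrogens, n_heavy_neighbours):
    """Hypothetical simplified atom type label, e.g. 'C:H3:X1' for a methyl carbon."""
    return f"{element}:H{n_hydrogens}:X{n_heavy_neighbours}"

def type_count_vector(atoms, vocabulary):
    """Count atom types and return the counts in the fixed order of `vocabulary`."""
    counts = Counter(atom_type(*atom) for atom in atoms)
    return [counts.get(t, 0) for t in vocabulary]

# Isobutyl alcohol, heavy atoms only: two CH3, one CH, one CH2, one OH.
atoms = [("C", 3, 1), ("C", 3, 1), ("C", 1, 3), ("C", 2, 2), ("O", 1, 1)]
vocabulary = ["C:H3:X1", "C:H2:X2", "C:H1:X3", "C:H0:X4", "O:H1:X1", "O:H0:X2"]
print(type_count_vector(atoms, vocabulary))  # [2, 1, 1, 0, 1, 0]
```

A vector of this shape, with one entry per atom type, is the kind of input the classifiers on later slides consume.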
13
Molecular Fingerprints
  • Molecule A 00011101010
  • Molecule B 00101111000
  • Tanimoto coefficient:

$$T_c = \frac{N_c}{N_A + N_B - N_c} = \frac{3}{5 + 5 - 3} = \frac{3}{7} \approx 0.43$$

where N_c is the number of common bits set, N_A the
number of bits set in A, and N_B the number of bits
set in B.
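The Tanimoto calculation on the two example fingerprints can be sketched directly on bit strings. The sketch returns 0.0 when both fingerprints are empty. That convention is an assumption chosen here to avoid division by zero.

```python
def tanimoto(fp_a: str, fp_b: str) -> float:
    """Tanimoto coefficient Tc = Nc / (NA + NB - Nc) for equal-length bit strings."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    n_a = fp_a.count("1")  # bits set in A
    n_b = fp_b.count("1")  # bits set in B
    n_c = sum(1 for a, b in zip(fp_a, fp_b) if a == b == "1")  # bits set in both
    union = n_a + n_b - n_c
    # Assumed convention: two all-zero fingerprints get similarity 0.0.
    return n_c / union if union else 0.0

print(round(tanimoto("00011101010", "00101111000"), 3))  # 0.429
```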
14
Drugs vs. Non-drugs
  • Enriching screening libraries with drug-like
    compounds
  • "Fail fast, fail cheap" strategy
  • Manual classification is time-consuming and biased
  • Computational approaches speed up screening, reduce
    the size, and improve the quality of combinatorial
    libraries
  • Assumption: typical drugs have something in common
    that other compounds lack

15
Lipinski Rule of Five (1997)
  • Poor absorption and permeation are more likely to
    occur when there are more than 5 hydrogen-bond
    donors, more than 10 hydrogen-bond acceptors, the
    molecular mass is greater than 500, or the log P
    value is greater than 5.
  • Further research studied a broader range of
    physicochemical and structural properties
  • Related problems
      • Compound toxicity
      • Compound mutagenicity
      • Blood-brain barrier penetration
      • Central nervous system activity
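The rule of five can be used as a simple screening filter. The sketch below assumes the four descriptors (H-bond donors, H-bond acceptors, molecular mass, log P) are already computed. It counts how many thresholds a compound exceeds. How many violations to tolerate is left as a parameter. A common convention allows at most one, and that is the default here. The descriptor values in the example are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Descriptors:
    h_bond_donors: int
    h_bond_acceptors: int
    molecular_mass: float
    log_p: float

def rule_of_five_violations(d: Descriptors) -> int:
    """Count how many of the four rule-of-five thresholds are exceeded."""
    return sum([
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
        d.molecular_mass > 500,
        d.log_p > 5,
    ])

def passes_filter(d: Descriptors, max_violations: int = 1) -> bool:
    """Keep a compound with at most `max_violations` violations (threshold is a choice)."""
    return rule_of_five_violations(d) <= max_violations

# Hypothetical descriptor values, for illustration only.
small = Descriptors(h_bond_donors=1, h_bond_acceptors=4, molecular_mass=180.0, log_p=1.2)
large = Descriptors(h_bond_donors=6, h_bond_acceptors=12, molecular_mass=650.0, log_p=6.5)
print(rule_of_five_violations(small), passes_filter(small))  # 0 True
print(rule_of_five_violations(large), passes_filter(large))  # 4 False
```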

16
Data Sets
  • Drug databases
      • World Drug Index (WDI)
      • Comprehensive Medicinal Chemistry (CMC)
      • MACCS-II Drug Data Report (MDDR)
  • Non-drug databases
      • Available Chemicals Directory (ACD)
  • Quality of training sets

17
Artificial Neural Networks
  • ANNs are self-learning systems that learn from
    experience
  • Biologically inspired
  • The neuron is the processing element
  • An artificial neuron simulates four basic functions
    of a natural neuron:
      • Receives input from other sources
      • Combines those inputs in some way
      • Performs a nonlinear operation on the result
      • Outputs the final result
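The four functions above map onto a few lines of code. In this minimal sketch, the weighted sum plus bias and the logistic sigmoid are assumed choices. Other combination and activation functions are possible.

```python
import math

def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum of inputs followed by a sigmoid."""
    # Receive inputs and combine them: weighted sum plus a bias term.
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    # Nonlinear operation (logistic sigmoid, an assumed choice); return the output.
    return 1.0 / (1.0 + math.exp(-z))

print(neuron([0.0, 0.0], [1.0, -1.0], 0.0))             # 0.5
print(round(neuron([2.0, 1.0], [0.5, 0.5], 0.0), 3))    # 0.818
```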

18
Artificial Neuron
19
Network Topology
20
ANN Training
  • Supervised: both inputs and desired outputs are
    provided
  • Initial weights are chosen randomly
  • Errors are propagated back through the system to
    adjust the weights
  • Most common algorithm: backward-error propagation
    (back-propagation)
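The training procedure can be sketched with a tiny feedforward network. Weights start random, and errors are propagated back from the output to adjust every weight. The setup is assumed throughout: sigmoid units, squared error, per-example updates, a learning rate of 0.5, and toy OR data standing in for real descriptor data. It is not the configuration used in the studies cited later.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class TinyNet:
    """Feedforward n_in x n_hidden x 1 network with sigmoid units."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        # Initial weights chosen randomly; the last weight in each row is a bias.
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                         for _ in range(n_hidden)]
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(self, x):
        xb = list(x) + [1.0]
        h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in self.w_hidden]
        y = sigmoid(sum(w * v for w, v in zip(self.w_out, h + [1.0])))
        return h, y

    def train_step(self, x, target, lr):
        """One back-propagation update for a single example (squared error)."""
        h, y = self.forward(x)
        # Error term of the output unit.
        delta_out = (y - target) * y * (1.0 - y)
        # Propagate the error back to the hidden units (using the old output weights).
        delta_hidden = [delta_out * self.w_out[j] * h[j] * (1.0 - h[j])
                        for j in range(len(h))]
        # Adjust weights against the gradient.
        for j, v in enumerate(h + [1.0]):
            self.w_out[j] -= lr * delta_out * v
        xb = list(x) + [1.0]
        for j, row in enumerate(self.w_hidden):
            for i, v in enumerate(xb):
                row[i] -= lr * delta_hidden[j] * v
        return 0.5 * (y - target) ** 2

    def predict(self, x):
        return self.forward(x)[1]

# Toy supervised data (inputs and desired outputs): logical OR of two bits.
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
net = TinyNet(n_in=2, n_hidden=3, seed=1)
for epoch in range(5000):
    for x, target in data:
        net.train_step(x, target, lr=0.5)
print([round(net.predict(x)) for x, _ in data])  # expected [0, 1, 1, 1]
```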

21
ANNs for Drug Classification (1998)
  • Input: counts of atom types
  • Topology: 92 x 5 x 1
  • Feedforward network trained with back-propagation
  • Training: 5000 ACD and 5000 WDI compounds
  • Accuracy: 83% (ACD), 77% (WDI)

22
ANNs for Drug Classification (1998)
  • Input: seven 1D descriptors (e.g., MW, log P,
    aromatic density) and ISIS fingerprints
  • Topology: 173 x 0/5/10 x 1
  • Bayesian learning procedure
  • Training: 3500 ACD and 3500 CMC compounds
  • Accuracy: 90% (CMC), 80% (MDDR), 90% (ACD)

23
Misclassification Examples
Misclassified non-drug
Misclassified drug
24
ANNs to Predict Biological Activity
  • Applications
      • CNS-active compounds
      • Protein kinase inhibitors
      • G protein-coupled receptor ligands
  • Best prediction accuracy: 80%
  • Advantage: capable of predicting structurally
    diverse compounds
  • Disadvantage: no explicit, interpretable rules

25
Recursive Partitioning
  • Statistical method for analyzing and mining large
    data sets that consist of active and inactive
    molecules
  • HTS data are analyzed to discover
    structure-activity relationships (SAR)
  • Easy to visualize and interpret
  • Applicable to a variety of classification problems,
    i.e., problems of assigning chemical compounds to
    property classes based on their structural and
    physicochemical features

26
Partitioning Problem Definition
  • Given a training set with D descriptor values and P
    property values for each molecule, the task is to
    build a set of yes/no questions, organized into a
    hierarchical tree with one question per node and
    class predictions at the leaf nodes, that minimizes
    classification error.

27
Single Property RP
  • Single-property classification, e.g., molecules
    classified as active or inactive
  • Drugs vs. non-drugs
  • Decision-tree algorithms: C4.5, C5.0

28
Single Property RP (cont.)
  • All possible questions based on single descriptor
    values are asked, and the score of each
    corresponding partition is computed
  • The descriptor question with the best score is used
    to grow the tree
  • Question asking is repeated on each new node until
    a termination condition is met
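The growth loop above can be sketched as an exhaustive search over single-descriptor questions of the form "x[d] <= t?", applied recursively. Several choices here are assumptions: questions are scored by the Gini impurity decrease, and growth stops when no question improves a node or the node has fewer than `min_size` molecules. Helper names such as `grow` and `classify` are hypothetical, and the data are toy values.

```python
from collections import Counter

def gini(labels):
    """Gini impurity 1 - sum_i p_i^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y):
    """Ask every question 'x[d] <= t?'; return (delta_I, d, t) of the best one."""
    n, parent = len(y), gini(y)
    best = None
    for d in range(len(X[0])):
        for t in sorted({row[d] for row in X}):
            left = [label for row, label in zip(X, y) if row[d] <= t]
            right = [label for row, label in zip(X, y) if row[d] > t]
            if not left or not right:
                continue  # question does not partition the node
            delta = parent - len(left) / n * gini(left) - len(right) / n * gini(right)
            if best is None or delta > best[0]:
                best = (delta, d, t)
    return best

def grow(X, y, min_size=2):
    """Grow the tree recursively until no question improves a node or it is too small."""
    majority = Counter(y).most_common(1)[0][0]
    split = best_split(X, y) if len(y) >= min_size else None
    if split is None or split[0] <= 0.0:
        return {"leaf": majority}
    _, d, t = split
    yes = [i for i, row in enumerate(X) if row[d] <= t]
    no = [i for i, row in enumerate(X) if row[d] > t]
    return {
        "descriptor": d,
        "threshold": t,
        "yes": grow([X[i] for i in yes], [y[i] for i in yes], min_size),
        "no": grow([X[i] for i in no], [y[i] for i in no], min_size),
    }

def classify(tree, x):
    """Follow yes/no answers from the root to a leaf and return its class."""
    while "leaf" not in tree:
        tree = tree["yes"] if x[tree["descriptor"]] <= tree["threshold"] else tree["no"]
    return tree["leaf"]

# Toy data: two hypothetical descriptors per molecule, class 1 = active.
X = [[0.1, 5], [0.2, 3], [0.8, 4], [0.9, 6]]
y = [0, 0, 1, 1]
tree = grow(X, y)
print(tree["descriptor"], tree["threshold"])                  # 0 0.2
print(classify(tree, [0.15, 9]), classify(tree, [0.7, 1]))    # 0 1
```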

29
Gini Impurity Metric
  • Impurity, I, of a node:

$$I = \sum_{i \neq j} p_i \, p_j$$

    where p_i and p_j are the fractions of the members
    of a node that belong to class values i and j,
    respectively
  • The Gini metric maximizes the decrease in impurity,
    ΔI, from a potential node question:

$$\Delta I = I - p_L I_L - p_R I_R$$

    where p_L and p_R are the fractions of the node
    members that partition to nodes L and R,
    respectively, for a given question, and I_L and I_R
    are the impurities of the new nodes
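A worked example helps read the two formulas. The sketch assumes the sum over i ≠ j runs over ordered pairs. Under that reading, I equals 1 − Σ p_i². The node in the example is hypothetical: 6 active and 4 inactive compounds, and a question that sends (5, 1) to L and (1, 3) to R.

```python
from itertools import permutations

def gini_from_counts(counts):
    """I = sum over ordered pairs i != j of p_i * p_j, from class counts."""
    n = sum(counts)
    p = [c / n for c in counts]
    return sum(p[i] * p[j] for i, j in permutations(range(len(p)), 2))

def impurity_decrease(parent, left, right):
    """Delta I = I - p_L * I_L - p_R * I_R for one candidate node question."""
    n = sum(parent)
    p_left, p_right = sum(left) / n, sum(right) / n
    return (gini_from_counts(parent)
            - p_left * gini_from_counts(left)
            - p_right * gini_from_counts(right))

print(round(gini_from_counts([6, 4]), 4))                      # 0.48
print(round(impurity_decrease([6, 4], [5, 1], [1, 3]), 4))     # 0.1633
```

Here I = 0.48, I_L = 10/36, and I_R = 0.375. The question therefore reduces impurity by 0.48 − 0.6·(10/36) − 0.4·0.375 ≈ 0.163.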

30
Tree Growth
[Figure: tree growth. The entire training set forms the root, which is split by a yes/no question on Descriptor 3 into Node L and Node R.]
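Pruning phase metric:

$$R_\alpha = R_0 + \alpha N_{leaf}$$

where R_0 is the number of misclassifications in the
training set, N_leaf is the number of leaf nodes, and
α weights the penalty on tree size.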
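The pruning metric R_α = R_0 + α·N_leaf can be used to choose among candidate subtrees. This sketch assumes R_0 is computed as the number of training compounds outside the majority class of their leaf. The three nested candidate trees and their leaf counts are hypothetical. As α grows, smaller trees win.

```python
def misclassifications(leaves):
    """R_0: training compounds not in the majority class of their leaf."""
    return sum(sum(counts) - max(counts) for counts in leaves)

def pruning_cost(leaves, alpha):
    """R_alpha = R_0 + alpha * N_leaf."""
    return misclassifications(leaves) + alpha * len(leaves)

def best_subtree(candidates, alpha):
    """Pick the candidate tree (name -> list of leaf class counts) with the lowest R_alpha."""
    return min(candidates, key=lambda name: pruning_cost(candidates[name], alpha))

# Hypothetical nested subtrees of one tree; each leaf is (active, inactive) counts.
candidates = {
    "full": [(4, 0), (0, 3), (1, 0), (0, 2)],   # R_0 = 0, N_leaf = 4
    "pruned": [(4, 0), (1, 5)],                 # R_0 = 1, N_leaf = 2
    "root": [(5, 5)],                           # R_0 = 5, N_leaf = 1
}
for alpha in (0.0, 1.0, 5.0):
    print(alpha, best_subtree(candidates, alpha))
# 0.0 full
# 1.0 pruned
# 5.0 root
```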
31
Application of Single Property RP for
Drug/Non-drug Classification
  • Input: counts of 120 atom types
  • Algorithm: C5.0
  • Training: 5000 WDI and 5000 ACD compounds
  • Prediction error: 21%
  • The presence of alcohols, tertiary and secondary
    amines, phenols, enols, and carboxylic groups
    accounts for 75% of the correct classifications of
    drugs.

32
Decision Tree for Drug/Non-drug Classification
33
Multiple Property RP
  • Single-property (SP) classification is not
    sufficient for many biological systems
  • ADMET properties
      • Absorption
      • Distribution
      • Metabolism
      • Excretion
      • Toxicity
  • Nonspecific binding to multiple targets causes
    side effects
  • Dependent properties

34
Partially Unified Multiple Property RP
  • Developed for prediction of multiple dependent
    properties
  • Discovers features that distinguish the classes of
    different properties as well as features that make
    the properties similar
  • Some nodes apply to all properties, while others
    apply to a single property only
  • Classes are NOT mutually exclusive
  • Nodes are labeled with one class of a single
    property type

35
Mapping to SP Representation
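  • D descriptor values x1, x2, x3, ..., xD
  • P property values y1, y2, y3, ..., yP
  • A new descriptor, K, is a property descriptor; in
    the example with three descriptors and three
    properties, each molecule is mapped to three
    single-property records:

$$(x_1, x_2, x_3;\ y_1, y_2, y_3) \;\mapsto\; (x_1, x_2, x_3, K{=}1;\ y_1),\ (x_1, x_2, x_3, K{=}2;\ y_2),\ (x_1, x_2, x_3, K{=}3;\ y_3)$$

  • Every path from the root to a leaf has a split on
    the descriptor K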
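The mapping to the single-property representation can be sketched as a small function. The property index K is appended to the descriptor vector, and each new record keeps only the value of property K. The function name and example values are hypothetical.

```python
def to_single_property(x, y):
    """Map one record (x_1..x_D; y_1..y_P) to P single-property records.

    The property descriptor K (1..P) is appended to the descriptors, and
    each new record carries only the value of property K.
    """
    return [(list(x) + [k], y_k) for k, y_k in enumerate(y, start=1)]

for record in to_single_property([0.4, 1.7, 3.0], ["active", "inactive", "active"]):
    print(record)
# ([0.4, 1.7, 3.0, 1], 'active')
# ([0.4, 1.7, 3.0, 2], 'inactive')
# ([0.4, 1.7, 3.0, 3], 'active')
```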

36
1. Pure specific tree
2. Generic node growth: $\max\big(\min_k \Delta I_k\big) > 0$,
   with the maximum taken over candidate node questions
37
PUMP-RP (cont.)
  • A split with an improvement for each property is
    chosen
  • The metric maximizes the minimum decrease in
    impurity from each potential node question
  • A compound may appear in more than one leaf node
  • Each K node is regrown recursively
  • The resulting tree is overgeneralized
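The generic-node criterion can be sketched as a maximin search. For each candidate question, the Gini decrease ΔI_k is computed separately for every property k, and the question scores min_k ΔI_k. The best-scoring question becomes a generic node only if that minimum is positive. Several details are assumptions: the "x[d] <= t?" question form, Gini scoring, and the toy data layout, where Y holds one row of property labels per molecule. Helper names are hypothetical.

```python
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0

def delta_i(labels, goes_left):
    """Gini decrease for one property when molecules split by `goes_left`."""
    left = [lab for lab, g in zip(labels, goes_left) if g]
    right = [lab for lab, g in zip(labels, goes_left) if not g]
    n = len(labels)
    return gini(labels) - len(left) / n * gini(left) - len(right) / n * gini(right)

def best_generic_question(X, Y):
    """Return (score, d, t) maximizing min_k Delta I_k over questions 'x[d] <= t?',
    or None if no question improves every property (score must be > 0)."""
    n_props = len(Y[0])
    best = None
    for d in range(len(X[0])):
        for t in sorted({row[d] for row in X}):
            goes_left = [row[d] <= t for row in X]
            if all(goes_left) or not any(goes_left):
                continue  # question does not partition the node
            score = min(delta_i([y[k] for y in Y], goes_left) for k in range(n_props))
            if best is None or score > best[0]:
                best = (score, d, t)
    return best if best is not None and best[0] > 0 else None

# Toy data: two descriptors, two properties (e.g., activity on two targets).
X = [[1, 0], [2, 1], [3, 0], [4, 1]]
Y = [[0, 0], [0, 0], [1, 1], [1, 0]]
print(best_generic_question(X, Y))  # (0.125, 0, 2)
```

A `None` result means no single question improves all properties at once. In that case, growth continues with property-specific nodes instead.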

38
Finding the Best Tree
  • Pruning metric:

$$R_{\alpha\beta} = R_0 + \alpha\,(N_{leaf} - \beta\, N_{generic})$$

  • where β is a generality parameter and N_generic is
    the number of generic nodes
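The effect of the generality parameter can be sketched by comparing two candidate trees under R_αβ. The candidates are hypothetical. One is a purely specific tree and the other has three generic nodes. α is fixed at 1.0, an assumed value for the size penalty. A larger β rewards generic nodes, so the preferred tree changes even though the generic tree makes one more training error.

```python
def pump_cost(r0, n_leaf, n_generic, alpha, beta):
    """R_alpha_beta = R_0 + alpha * (N_leaf - beta * N_generic)."""
    return r0 + alpha * (n_leaf - beta * n_generic)

# Hypothetical candidate trees: (R_0, N_leaf, N_generic).
candidates = {"specific": (2, 6, 0), "generic": (3, 6, 3)}

def preferred(beta, alpha=1.0):
    """Name of the candidate with the lowest R_alpha_beta."""
    return min(candidates, key=lambda name: pump_cost(*candidates[name], alpha=alpha, beta=beta))

for beta in (0.0, 0.5):
    print(beta, preferred(beta))
# 0.0 specific
# 0.5 generic
```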

39
Application of PUMP-RP for Drug Specificity
  • Cyclooxygenase (COX) inhibitors
      • COX-2 inhibitors are anti-inflammatory agents
      • COX-1 inhibitors damage the gastrointestinal tract
      • A good drug should be highly specific to COX-2
      • Celebrex and Vioxx are widely prescribed
  • Goal: to obtain a model of the activity and
    selectivity of COX-2 inhibitors as a function of
    their physicochemical properties

40
Data and Results
  • 100 2D and 3D descriptors
  • Each property has two classes: active and
    inactive
  • Gini impurity score
  • Accuracy
      • On the training set: 60-80% (COX-2), 78-91% (COX-1)
      • On the test set: 50-89% (COX-2), 60-100% (COX-1)
  • Disadvantage: not capable of predicting compounds
    with molecular scaffolds not yet discovered

41
Extension to PUMP-RP
  • To model systems with more than two properties
  • Semi-generic node: applies to more than one
    property, but not to all
  • To model multiple properties with the opportunity
    to observe which properties are more closely
    related than others
  • Potential applications
      • ADMET properties
      • Activity/ADMET properties
      • COX-2/COX-1/drugs
      • Drug-drug interactions based on target
        specificity
  • Modified Gini impurity score

42
Gini Impurity for the Extended PUMP-RP
  • Modified scoring function:

$$\max\Big(\max\big(\min_{k} \Delta I_k\big)\Big), \quad k \in P$$

43
Tree Built by the Extended PUMP-RP
44
Targeting RNA
  • Emerging field in drug discovery
  • RNA plays an essential role in many biological
    processes
  • Some natural antibiotics are RNA-targeting drugs
    (e.g., streptomycin, tetracycline)
  • Potential drug targets: viral RNAs
  • Antisense strategy

45
Targeting RNA
  • HTS against RNA targets has been less successful
    than against protein targets
  • Identification of new classes of RNA ligands is
    extremely rare
  • Limited knowledge of the chemistry and structure
    of RNA recognition
  • RNA consists of only 4 nucleotides, making it less
    diverse than proteins; RNA is also flexible

46
What Can Be Done?
  • Assumption: compounds that bind RNA have something
    in common that other compounds lack
  • Dataset: a comprehensive database containing
    examples of binding between small molecules and
    RNAs
  • Computational approaches to extract common
    features of such compounds and to train predictive
    models (AI methods)

47
Concluding Remarks
  • Drug discovery is the goal of multidisciplinary
    research
  • There is no algorithm for discovering a drug
  • Old problem: given a compound's structure, what
    are its properties?
  • Computational approaches can assist the drug
    discovery process
  • Limitation: lack of systematic biological data
  • Market pressure and prospective profit bring more
    and more resources into drug discovery

48
Multilevel Neighborhoods of Atoms
phenol