Deepak Bandyopadhyay - PowerPoint PPT Presentation


Transcript and Presenter's Notes

1
Deepak Bandyopadhyay
Bridging Multiple Disciplines: Advances in
Target-Specific Drug Discovery
CS/algorithms
Cheminfo
Bioinfo
Data Mining
Graphics/ Info Viz
2
Pictorial research overview
Structural Bioinformatics
  • Geometry

Graphics / Visualization
Graph Mining
  • Protein Function Inference

Cheminformatics
3
Graphics / Visualization
Apple iPhone, 2007
Painting on Real objects with projectors, 2001
4
Radial clustergram
  • Hierarchical cluster visualization
  • Dendrogram
  • Clustergram
  • Radial clustergram

D. K. Agrafiotis, D. Bandyopadhyay and M. Farnum.
J. Chem. Inf. Model. 2007, 47, 69-75
M. Schonlau. Computational Statistics 2004,
19(1), 95-111
5
Geometry overview
  • Robust nearest neighbors
  • Delaunay tessellation

3.8Å
  • Effect of imprecision
  • Probabilistic neighbors

[Figure: triangle on points t1, t2, t3]
DelProb(Δt1t2t3) = 1 − Σ p_i, for points i in circumcircle(t1, t2, t3)
6
Structural Bioinformatics
  • Packing scores distinguish native protein from
    decoys/predictions

Left: fewer AD tetrahedra in native vs. CASP5
prediction. Right: discriminating power of a statistical
packing score is invariant to perturbation
  • Method for secondary structure assignment finds
    irregular α/β not found by standard (DSSP) method

1bg5
AD
7
Structural Bioinformatics
  • Conformational change and protein flexibility
  • ? flexible region
  • ? isolated flexible residue

Ovotransferrin, ε color scale: 0, 0.01-0.1, 0.1-0.5,
0.5-1, 1-2
[Figure panels: closed form, open form; Tryptophan tRNA synthetase (TrpRS); labeled residues LYS109, VAL118, PHE108, ILE183, ILE176, ARG182, PRO177, PHE5, SER6, PRO10, ILE14]
MD simulation (8 structures) courtesy M. Kapustina
Experimental (5 structures) solved in C. W. Carter
Jr. lab
8
In-Depth Cheminformatics
  • Pharmacophore Development
  • New self-organizing method
  • Conformational Analysis
  • Using Stochastic Proximity Embedding
  • Effect of permutations
  • Visualization
  • Radial clustergram

9
Stochastic Proximity Embedding (Agrafiotis & Xu,
2002)
after isometric SPE
  • Learn intrinsic dimension and structure of a high
    dimensional dataset
  • Express as low-dimensional embedding
  • Points self-organize in an iterative step that
    applies high-D constraints to low-D coords
  • Scales linearly with data set size

D. K. Agrafiotis and H. Xu "A self-organizing
principle for learning nonlinear manifolds",
Proc. Natl. Acad. Sci. USA, 2002, 99, 15869-15872
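
The self-organizing step on this slide can be sketched in a few lines of Python. This is a minimal illustration of a pairwise SPE-style update, not the authors' implementation: the function name `spe_embed`, the linear learning-rate schedule, and every default value are assumptions made for the example.

```python
import math
import random

def spe_embed(target, n_points, dim=2, cutoff=math.inf, n_cycles=100,
              steps_per_cycle=1000, lam_start=1.0, lam_end=0.01, seed=0):
    """Embed n_points in `dim` dimensions so that pairwise distances
    approximate target(i, j). Hypothetical helper, for illustration only."""
    rng = random.Random(seed)
    x = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    eps = 1e-9  # guards against division by zero for coincident points
    for cycle in range(n_cycles):
        # Learning rate decreases linearly from lam_start to lam_end (assumed schedule).
        lam = lam_start - (lam_start - lam_end) * cycle / max(n_cycles - 1, 1)
        for _ in range(steps_per_cycle):
            i, j = rng.sample(range(n_points), 2)
            r = target(i, j)            # distance required in the high-D space
            d = math.dist(x[i], x[j])   # current distance in the embedding
            # Always adjust local pairs; adjust distant pairs only if too close.
            if r <= cutoff or d < r:
                scale = lam * 0.5 * (r - d) / (d + eps)
                for k in range(dim):
                    step = scale * (x[i][k] - x[j][k])
                    x[i][k] += step
                    x[j][k] -= step
    return x

# Usage: recover a 3-4-5 triangle from its side lengths alone.
sides = {(0, 1): 3.0, (1, 2): 4.0, (0, 2): 5.0}
coords = spe_embed(lambda i, j: sides[(min(i, j), max(i, j))], n_points=3)
```

Each step touches only one pair of points, so the cost per step does not depend on the data set size, which is consistent with the linear scaling claimed above.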
10
Conformational Sampling with SPE (Xu, Izrailev
and Agrafiotis, 2003)
  • High-D space: chemical/steric constraints

  • Low-D (3D) space: coords of low-energy
    conformation(s), initially randomized
  • Iterative step
  • Choose random pair of atoms
  • Compare current distance between them to minimum
    and maximum allowed distance
  • If they are too far apart, move closer
  • If they are too close, move apart
  • Samples conformation space well (Xu et al., 2003)

H. Xu, S. Izrailev, and D. K. Agrafiotis,
"Conformational sampling by self-organization",
J. Chem. Inf. Comput. Sci., 2003, 43, 1186-1191
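
The iterative step listed on this slide (pick a pair, compare its distance to the allowed range, move the atoms) might look like the sketch below. It is an illustration under stated assumptions, not the published code: `sample_conformation`, the bounds dictionary, the learning-rate schedule, and the toy three-atom example are all hypothetical, and only pairs that carry an explicit bound are sampled.

```python
import math
import random

def sample_conformation(bounds, n_atoms, n_steps=50000, lam_start=1.0,
                        lam_end=0.1, seed=0):
    """bounds maps an atom pair (i, j) to (min_dist, max_dist) in angstroms.
    Returns 3D coordinates that try to satisfy every bound."""
    rng = random.Random(seed)
    # Low-D (3D) coordinates start out random.
    x = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(n_atoms)]
    pairs = list(bounds)
    for step in range(n_steps):
        lam = lam_start - (lam_start - lam_end) * step / n_steps
        i, j = rng.choice(pairs)        # choose a random constrained pair
        lo, hi = bounds[(i, j)]
        d = math.dist(x[i], x[j])
        if d > hi:
            goal = hi                   # too far apart: move closer
        elif d < lo:
            goal = lo                   # too close: move apart
        else:
            continue                    # within bounds: leave untouched
        scale = lam * 0.5 * (goal - d) / (d + 1e-9)
        for k in range(3):
            shift = scale * (x[i][k] - x[j][k])
            x[i][k] += shift
            x[j][k] -= shift
    return x

# Toy example: a bent three-atom chain (two bonds plus a 1-3 distance range).
toy_bounds = {(0, 1): (1.50, 1.54), (1, 2): (1.50, 1.54), (0, 2): (2.4, 2.6)}
conformer = sample_conformation(toy_bounds, n_atoms=3)
```

Running the function with different seeds gives different conformations that satisfy the same bounds, which is how repeated runs sample conformational space.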
11
Pharmacophore
  • Components
  • Chemical feature points
  • Geometry (distance/angle)
  • Applications
  • Ligand-based VS
  • Receptor-based VS
  • compactly represents active site
  • accommodates flexibility
  • finds off-target hits
  • Docking
  • QSAR
  • Scaffold hopping
  • De novo design

Pharmacophore representation filter search
query
12
SPE for molecular alignment and pharmacophore
development
  • Aim: flexible molecular alignment and elucidation
    of pharmacophore hypothesis in 3D, starting
    from
  • 1D/2D structures of active molecules
  • correspondence between matching groups
  • Method highlights
  • Simultaneous iterative conformational analysis
    and alignment
  • Stochastic self-organizing algorithm

C1C2(Cl)CC3(Br)CC1(F)CC(I)(C2)C3
C1C(Cl)CC(F)CC1(Br)
C(F)CC(I)(CCBr)CCCl
Bandyopadhyay D, Agrafiotis DK. A New
self-organizing algorithm for molecular
alignment and pharmacophore development. 2007,
submitted
13
Constraint Types
14
SPE for multiple molecule alignment
  • Input all molecules (SMILES/2D SDF)
  • Initial 3D coordinates if provided, else random
  • Input constraints for each molecule
  • Pphore constraints from literature or hypothesis
    generator
  • Aggregate all molecules/constraints into one
  • Renumber atom numbers in constraints
  • Run modified SPE on aggregate molecule
  • Pick a random constraint to enforce (dist, vol,
    ext, pphore)
  • Iterate and check for convergence
  • Postprocess output (optional)
  • Distance geometry refinement
  • Energy minimization
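
The aggregation and renumbering steps above can be illustrated with a short sketch. The data layout (tuples of constraint kind, atom indices, and parameters) and the helper names `aggregate` and `pphore_constraint` are assumptions for this example; the slides do not specify how constraints are stored.

```python
import random

def aggregate(molecules):
    """molecules: list of (n_atoms, constraints). Each constraint is
    (kind, atoms, params) with atom indices local to its molecule.
    Returns (total_atoms, offsets, constraints) using global indices."""
    total, offsets, merged = 0, [], []
    for n_atoms, constraints in molecules:
        offsets.append(total)
        for kind, atoms, params in constraints:
            if any(a < 0 or a >= n_atoms for a in atoms):
                raise ValueError("constraint refers to an atom outside its molecule")
            # Renumber: shift local atom indices by this molecule's offset.
            merged.append((kind, tuple(a + total for a in atoms), params))
        total += n_atoms
    return total, offsets, merged

def pphore_constraint(offsets, matches):
    """matches: list of (molecule_index, local_atom) that should coincide."""
    return ("pphore", tuple(offsets[m] + a for m, a in matches), None)

# Two toy molecules with 3 and 4 atoms; distance bounds in angstroms.
mol_a = (3, [("dist", (0, 1), (1.4, 1.6)), ("dist", (1, 2), (1.4, 1.6))])
mol_b = (4, [("dist", (0, 3), (2.3, 2.7))])
total, offsets, constraints = aggregate([mol_a, mol_b])
constraints.append(pphore_constraint(offsets, [(0, 2), (1, 0)]))

rng = random.Random(0)
kind, atoms, params = rng.choice(constraints)  # one constraint to enforce per step
```

Once everything lives in one index space, the SPE loop does not need to know which molecule an atom came from; it simply picks a constraint and applies the matching update.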

15
Aligning pharmacophore groups
  • Groups, rings, hydrophobic regions
  • Motion/error/gradient from centroids computed on
    the fly (no pseudoatoms)
  • Aromatic rings (centroids superimposed)
  • Compute normal vectors on the fly
  • Motion: rotate all points about centroid to align
    normal vectors
  • Asymmetric rotation works better
  • Smaller of parallel or antiparallel alignment
  • Evaluation: arc length rθ as distance
  • Gradient: d/dx (1 − cos²θ), x on rotating ring

[Figure: case d > dmax; case d < dmax with θ > θmax; arc length rθ]
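
As a rough sketch of the ring-alignment quantities on this slide, the code below computes a ring centroid and normal on the fly, takes the smaller of the parallel and antiparallel angles, and evaluates the 1 − cos²θ error term. Newell's method for the normal and the `alignment_move` decision rule (translate when d > dmax, otherwise rotate when θ > θmax) are assumptions made for illustration; the asymmetric rotation mentioned above is not modeled.

```python
import math

def centroid(points):
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))

def ring_normal(points):
    """Unit normal of a roughly planar ring (Newell's method)."""
    nx = ny = nz = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(points, points[1:] + points[:1]):
        nx += (y1 - y2) * (z1 + z2)
        ny += (z1 - z2) * (x1 + x2)
        nz += (x1 - x2) * (y1 + y2)
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    return (nx / length, ny / length, nz / length)

def normal_misalignment(ring_a, ring_b):
    """Return (theta, error): theta is the smaller of the parallel and
    antiparallel angles between the ring normals; error is 1 - cos^2(theta)."""
    na, nb = ring_normal(ring_a), ring_normal(ring_b)
    c = sum(a * b for a, b in zip(na, nb))
    c = max(-1.0, min(1.0, c))          # clamp rounding noise before acos
    theta = math.acos(abs(c))           # abs() selects the closer orientation
    return theta, 1.0 - c * c

def alignment_move(ring_a, ring_b, d_max, theta_max):
    """Assumed rule: fix the centroid distance first, then the normals."""
    d = math.dist(centroid(ring_a), centroid(ring_b))
    if d > d_max:
        return "translate"
    theta, _ = normal_misalignment(ring_a, ring_b)
    return "rotate" if theta > theta_max else "none"

# Square "rings" in the xy-plane and the xz-plane, and the first one reversed.
flat = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
upright = [(0, 0, 0), (1, 0, 0), (1, 0, 1), (0, 0, 1)]
flipped = flat[::-1]
```

The error term 1 − cos²θ is zero for both parallel and antiparallel normals, so the flipped ring counts as already aligned.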
16
Pharmacophore from alignment
  • From all trials, extract unique alignments
  • pairwise distance comparison
  • For each unique alignment found
  • Radius of pphore point sphere
  • RMSD of aligned coords across trials
  • Radius of centroid
  • RMSD of centroid across trials, avg. radius
    centroid-to-point
  • Normal vector angle deviation
  • Cone width
  • Also, flattening of ellipsoid
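
A minimal sketch of the first radius definition above: the sphere radius of a pharmacophore point taken as the RMSD of its position across trials. It assumes the trial alignments have already been superimposed into a common frame; `point_radius` is a hypothetical helper name.

```python
import math

def point_radius(positions):
    """positions: one (x, y, z) per trial for the same pharmacophore point,
    all in a common reference frame. Returns (mean_position, rmsd)."""
    n = len(positions)
    mean = tuple(sum(p[k] for p in positions) / n for k in range(3))
    rmsd = math.sqrt(sum(math.dist(p, mean) ** 2 for p in positions) / n)
    return mean, rmsd

# Two trials placing the same feature 2 angstroms apart along x.
center, radius = point_radius([(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)])
```

A tight cluster of positions yields a small sphere, which signals a well-defined feature; a large radius suggests the feature is loosely constrained by the alignment.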

17
Opioid pharmacophore for pain-killing (not
addictive) property
  • 14 opioids and their human/synthetic analogs

MET-enkephalin (endorphin)
Norepinephrine (hormone)
Bremazocine
Cyclazocine
Codeine
Levorphanol
Fentanyl
Etorphine
Buprenorphine
Demerol
Methadone
Sources:
http://www.pharmacy.umaryland.edu/courses/PHAR531/lectures_old/drug_discovery_2.html
http://www.neurosci.pharm.utoledo.edu/MBC3320/opioids.htm
18
Opioid alignment and pharmacophore
1 unique alignment (2 views)
Extracted pharmacophore after refinement and
minimization
Radii (from alignment RMSD): ? 0.74 Å, ? 1.85 Å, ? 0.53 Å
Ring normal deviation: 19.4°
Pairwise distances: ?-? 4.7 Å ± 0.58 Å, ?-? 7.0 Å ± 0.48 Å, ?-? 2.8 Å ± 0.02 Å
19
P-glycoprotein binders (Pearce et al., PNAS
1989; Varma and Hou, Accelrys case study)
Dimethyl amino benzoyl methyl reserpate
Rhodamine 123
Reserpine
Rescinnamine
Trimethyl benzoyl yohimbine
Verapamil
Benzoyl yohimbine
H L Pearce, A R Safa, N J Bach, M A Winter, M C
Cirtain, and W T Beck. Essential features of the
P-glycoprotein pharmacophore as defined by a
series of reserpine analogs that modulate
multidrug resistance. Proc Natl Acad Sci U S A.
1989 July; 86(13): 5128-5132.
20
P-Glycoprotein binder alignment
  • All 7 P-glycoprotein binders
  • Only reserpine/yohimbine scaffold

21
P-Glycoprotein binder pharmacophore
  • 5 unique alignments (2 filtered out because of
    significant sphere overlap)

Radii (from alignment RMSD): ? 3.7 Å, ? 1.1 Å, ? 3.9 Å, ? 1.9 Å
Ring normal deviation: 23.0°, 59.3°
Pairwise distances: ?-? 5.4 Å ± 0.18 Å, ?-? 5.7 Å ± 0.16 Å, ?-? 5.5 Å ± 0.45 Å, ?-? 5.8 Å ± 0.15 Å, ?-? 5.3 Å ± 0.07 Å, ?-? 4.0 Å ± 0.25 Å

Radii (from alignment RMSD): ? 3.3 Å, ? 0.4 Å, ? 3.2 Å, ? 0.5 Å
Ring normal deviation: 27.1°, 37.4°
Pairwise distances: ?-? 5.3 Å ± 0.18 Å, ?-? 5.7 Å ± 0.15 Å, ?-? 5.8 Å ± 0.21 Å, ?-? 7.2 Å ± 0.08 Å, ?-? 6.3 Å ± 0.36 Å, ?-? 4.1 Å ± 0.45 Å

Radii (from alignment RMSD): ? 3.4 Å, ? 0.5 Å, ? 3.4 Å, ? 0.5 Å
Ring normal deviation: 18.2°, 24.9°
Pairwise distances: ?-? 5.4 Å ± 0.22 Å, ?-? 4.6 Å ± 0.30 Å, ?-? 6.2 Å ± 0.47 Å, ?-? 5.9 Å ± 0.12 Å, ?-? 5.8 Å ± 0.39 Å, ?-? 4.0 Å ± 0.45 Å
22
HIV-1 protease inhibitors (Wang et al., 1996)
Wang S, Milne GW, Yan X, Posey IJ, Nicklaus MC,
Graham L, Rice WG. J Med Chem. 1996 May 10;
39(10): 2047-54.
23
Ligand vs. structure-based pphore
24
Performance
  • Without refinement/minimization, similar to SPE
  • 0.01-1 s per alignment of 3-15 molecules, 10-100
    atoms each
  • With refinement/minimization, slower
  • E.g. 2x slower for refinement, 4x slower for
    minimization
  • Quality: since SPE samples conformation space
    well, this method should sample alignment space
    well
  • Typically, only 1-10 iterations of refinement needed
  • Alignments almost at distance geometry minima
  • New fragment-based conformational analysis (SOS,
    Zhu et al., 2006)
  • Produces better raw geometries starting from
    minimized fragments
  • Removes need for energy minimization of
    conformations

25
Effect of Permuted Input on Conformer Generation
Canonical SMILES
XRAY
Permuted SMILES
4DFR, methotrexate bound to dihydrofolate
reductase
Carta G, et al. J. Comput. Aided Mol. Des.,
2006, 20, 179
26
Permuting does not change conformational space
sampled by SPE
1CBX
1ETT
1HVR
Unique Conformations vs. Sampling
4DFR
1GLQ
27
Permuted Input vs Chance
10 random runs of one permuted input
10 permuted SDFs
10 permuted SMILES
28
Conclusions
  • Iterative ensemble conformational analysis is a
    fast method for molecular alignment and
    pharmacophore development
  • SPE provides a fast and robust kernel that is easy
    to adapt for this task
  • SPE is invariant to permuted input, unlike other
    distance geometry-based conformational sampling
    algorithms

Papers
  • D. Bandyopadhyay and D. K. Agrafiotis. A New
    Self-Organizing Algorithm for Molecular Alignment
    and Pharmacophore Development. Submitted.
  • D. K. Agrafiotis, D. Bandyopadhyay, G. Carta,
    A. J. S. Knox and D. G. Lloyd. On the Effects of
    Permuted Input on Conformational Sampling: An
    Evaluation of Stochastic Proximity Embedding
    (SPE). Chemical Biology & Drug Design, 2007, to
    appear.
  • D. K. Agrafiotis, D. Bandyopadhyay, J. K. Wegner
    and H. van Vlijmen. Recent Advances in
    Chemoinformatics. Journal of Chemical Information
    and Modeling, 2007, to appear.
  • D. K. Agrafiotis, D. Bandyopadhyay and M. Farnum.
    Radial Clustergrams: Visualizing the Aggregate
    Properties of Hierarchical Clusters. Journal of
    Chemical Information and Modeling, 2007, 47,
    69-75.
29
In-Depth
  • Graph Data Mining
  • Protein Function Inference

30
Motivation
  • Pharmaceutical product pipeline is shrinking
  • Need for new approaches
  • Drugs often fail due to off-target effects
  • lack of specificity
  • Target space still small
  • restricted to well-characterized proteins
  • genomics offers new target possibilities

Data from Overington et al. Nature Reviews Drug
Discovery 5, 993-996 (December 2006),
doi:10.1038/nrd2199
31
Need for function inference
SEQUENCE
From Jan 1, 1999 through Apr 19, 2005:
  • 1605 SG structures deposited
  • 669 annotated "unknown function"
  • 382 have no significant global structure
    similarity with proteins of known function
    (DALI z-score < 10) (Holm and Sander, 1993)
Function determination status for protein
sequences in fully-sequenced genomes of four
organisms. Data from Koonin and Galperin (2002).
Rightmost bar is cumulative
32
Approaches to protein function determination
  • Experimental characterization
  • Relatively slow, but sure
  • Computational inference
  • Annotation transfer by seq./str. homology
  • Requires / depends on alignment
  • Often unreliable (Wilson et al., 2000; Aloy et
    al., 2001)
  • Annotation by active site templates / motifs
  • Extracted out of proteins with known function
  • Annotation by other computational methods
  • ML / data mining / knowledge based
  • De novo prediction of aspects of function

global or local similarity
33
Potential benefits of function inference to drug
discovery (Ofran et al., 2005)
  • Focus experimental function determination effort
  • Expand the target space
  • Find new targets for unmet medical needs
  • Find alternate binding sites on proteins
  • Predict adverse drug reactions or toxicity
  • Repurpose existing drugs to new targets

34
Graph Representation of Protein Structure
Nodes: amino acid identity, Cα chain coordinates
Edges: sequence adjacency, spatial proximity
Image courtesy Luke Huan, UNC-CS
Sequence edge
Proximity edge
Subgraph mining (FFSM) (Huan et al., 2003, 2004)
  • Input: database of labeled undirected graphs,
    support threshold 0 < σ ≤ 1

σ = 2/3
  • Output: all (connected) frequent subgraphs from
    the graph database
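
To make the notion of support concrete, here is a small brute-force sketch. It is not FFSM: it checks one given pattern against each graph by trying every node assignment, which is only practical for tiny patterns. The graph encoding (a label dictionary plus a set of `frozenset` edges), the function names, and the omission of edge labels are all assumptions for this example.

```python
from itertools import permutations

def contains(graph, pattern):
    """True if `pattern` occurs in `graph` as a label-preserving subgraph.
    Each is (labels, edges): labels maps node -> label, and edges is a set
    of frozenset({u, v}). Edge labels are ignored in this sketch."""
    g_labels, g_edges = graph
    p_labels, p_edges = pattern
    p_nodes = list(p_labels)
    for candidate in permutations(g_labels, len(p_nodes)):
        mapping = dict(zip(p_nodes, candidate))
        if all(p_labels[p] == g_labels[mapping[p]] for p in p_nodes) and \
           all(frozenset(mapping[u] for u in e) in g_edges for e in p_edges):
            return True
    return False

def support(database, pattern):
    """Fraction of graphs in the database that contain the pattern."""
    return sum(contains(g, pattern) for g in database) / len(database)

# Three toy graphs; the A-B edge occurs in the first two only.
g1 = ({0: "A", 1: "B", 2: "C"}, {frozenset({0, 1}), frozenset({1, 2})})
g2 = ({0: "A", 1: "B"}, {frozenset({0, 1})})
g3 = ({0: "A", 1: "B", 2: "C"}, {frozenset({0, 2}), frozenset({1, 2})})
ab_edge = ({"a": "A", "b": "B"}, {frozenset({"a", "b"})})

sigma = 2 / 3
is_frequent = support([g1, g2, g3], ab_edge) >= sigma
```

A miner such as FFSM avoids this exhaustive search by growing frequent patterns incrementally, but the definition of support it uses is the same fraction.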

35
Graph Edge Representation (Huan, Bandyopadhyay,
Wang, Snoeyink, Prins and Tropsha, J Comp Biol
2005)
  • Contact distance graph (CD)
  • All Cα < 8.5 Å: dense
  • Delaunay tessellation (DT)
  • Empty sphere geometric property
  • Sparse, but unstable for imprecise points
  • Almost-Delaunay edge graph (AD)
  • Denser than DT, sparser than CD.
  • Designed for imprecise points
  • Fast and robust to find frequent patterns
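
The simplest of the three representations, the contact distance graph, can be built directly from Cα coordinates, as in the sketch below. The input layout and the function name are assumptions; the Delaunay and almost-Delaunay graphs need a computational geometry library and are not shown.

```python
import math

def contact_distance_graph(residues, cutoff=8.5):
    """residues: list of (amino_acid, (x, y, z)) in chain order, one Cα each.
    Returns (labels, sequence_edges, proximity_edges) with index pairs."""
    labels = [aa for aa, _ in residues]
    seq_edges = {(i, i + 1) for i in range(len(residues) - 1)}
    prox_edges = set()
    for i in range(len(residues)):
        for j in range(i + 1, len(residues)):
            close = math.dist(residues[i][1], residues[j][1]) < cutoff
            if close and (i, j) not in seq_edges:
                prox_edges.add((i, j))
    return labels, seq_edges, prox_edges

# Four residues on a line; coordinates in angstroms, chosen for illustration.
chain = [("GLY", (0.0, 0.0, 0.0)), ("ALA", (3.8, 0.0, 0.0)),
         ("SER", (7.6, 0.0, 0.0)), ("HIS", (20.0, 0.0, 0.0))]
labels, seq_edges, prox_edges = contact_distance_graph(chain)
```

The all-pairs loop is quadratic in chain length, and every residue within the cutoff gains an edge, which is why this representation is the densest of the three.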

36
Families and background
  • Structural classification of proteins (SCOP)
    (Murzin et al., 1995)
  • hierarchical
  • uses global structure similarity
  • Protein family
  • Group of proteins with related
    structure/function
  • Background dataset
  • 6500 proteins with no two sequences > 90%
    identical (Wang and Dunbrack, 2003)
  • represents all proteins

37
Family-specific Fingerprints
  • Goal: to identify family-specific fingerprints
  • subgraphs that are frequent in a family (> 80%)
  • are rare in all proteins (infrequent in
    background, < 5%)
  • Typical families have 10 to 1000 fingerprints
    with 4-8 residues
  • Example: largest fingerprint in serine protease
    family, below
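
The two-threshold definition above amounts to a simple filter once supports are known. In this sketch the supports are given as dictionaries from a subgraph identifier to a fraction; the identifiers, numbers, and function name are made up for illustration.

```python
def select_fingerprints(family_support, background_support,
                        min_family=0.80, max_background=0.05):
    """Both arguments map a subgraph id to the fraction of proteins that
    contain it. Returns ids frequent in the family and rare in background."""
    return sorted(
        s for s, f in family_support.items()
        if f > min_family and background_support.get(s, 0.0) < max_background
    )

# Hypothetical supports for three mined subgraphs.
family = {"sg1": 0.95, "sg2": 0.85, "sg3": 0.60}
background = {"sg1": 0.01, "sg2": 0.20, "sg3": 0.00}
fingerprints = select_fingerprints(family, background)
```

Here `sg2` is dropped for being common in the background and `sg3` for being infrequent in the family, leaving `sg1` as the only fingerprint.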

1LO6
Blue: catalytic triad, known functionally
important. Grey: others
38
Function inference using fingerprints
  • Select families
  • Model protein structures by graphs
  • Mine frequent subgraphs from family
  • Filter those rare in background as fingerprints
  • Search for fingerprints in new structure
  • Subgraph isomorphism with graph index
  • Assign significance
  • Distribution of fingerprints in background

TRAINING
INFERENCE
39
Significance from fingerprints
  • Plot specificity vs. sensitivity at different
    numbers of fingerprints (ROC curve)
  • Define cutoff points for sensitivity, 99%
    specificity of family membership

sens cutoff
spec cutoff
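
One way to read the cutoff procedure: call a protein a family member when it contains at least k of the family's fingerprints, then sweep k. The sketch below computes sensitivity and specificity for each k and picks the smallest k that reaches 99% specificity. The function names and the toy counts are assumptions, not data from the talk.

```python
def roc_points(family_counts, background_counts, n_fingerprints):
    """Counts are the number of family fingerprints found in each protein.
    Returns (k, sensitivity, specificity) for each threshold k, where a
    protein is called a member if it contains at least k fingerprints."""
    points = []
    for k in range(1, n_fingerprints + 1):
        sens = sum(c >= k for c in family_counts) / len(family_counts)
        spec = sum(c < k for c in background_counts) / len(background_counts)
        points.append((k, sens, spec))
    return points

def specificity_cutoff(points, min_spec=0.99):
    """Smallest k whose specificity reaches min_spec, or None."""
    for k, _, spec in points:
        if spec >= min_spec:
            return k
    return None

# Toy data: 4 family members, 100 background proteins, 10 fingerprints.
family_counts = [8, 9, 10, 7]
background_counts = [0] * 97 + [1] * 3
points = roc_points(family_counts, background_counts, n_fingerprints=10)
cutoff = specificity_cutoff(points)
```

In this toy data, requiring two fingerprints already excludes every background protein while keeping all family members.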
40
Fingerprints discriminate structurally similar
families
  • Fingerprints distinguish 20 structurally similar
    families of TIM fold
  • Diagonal has high number of fingerprints,
    off-diagonal low
  • Exceptions: super/sub-family pairs, families with
    weak fingerprints

[Matrix axes: TIM barrel (super)families w/ fingerprints vs. (super)families in which fingerprints occur]
41
Function inference of structural orphans
  • Structural orphan proteins from Structural
    Genomics
  • structure known, function unknown
  • no structure similarity to other proteins
  • Function inferences for 80 orphans with 99%
    specificity
  • Out of 382 orphans deposited between 1999 and
    April 2005

From Jan 1, 1999 through Apr 19, 2005:
  • 1605 SG structures deposited
  • 669 annotated "unknown function"
  • 382 have no global structure similarity
    with proteins of known function (DALI z-score < 10)
42
Structural Genomics Function inference I
Kinemages auto-generated from PDB files and
fingerprints
Metallo-dependent hydrolase: 8-stranded βα (TIM)
barrel fold; 17 members, 49 fingerprints
Unknown function: 7-stranded barrel fold; 30/49
fingerprints found
43
Residues hit by fingerprints
Figures made in VMD
1nfg
1m65
SCOP 51556
CASP5 T0147
Ycdx
Metallo-dependent hydrolase: 8-stranded βα (TIM)
barrel fold; 17 members, 49 fingerprints
Unknown function: 7-stranded barrel fold; 30/49
fingerprints found
Acidic
Basic
Polar
Hphobic
44
Structural genomics function inference II
Figures made in VMD
1twu
Yyce
Antibiotic resistance protein: Glyoxalase /
bleomycin resistance / dioxygenase superfamily; 4
members, 62 fingerprints
Unknown function: not in SCOP 1.67, DALI z < 10
in Nov 2004; 46/62 fingerprints found; DALI z now
> 10 with newly discovered family members
45
False +ve? Or potential new insight?
Therapeutically important family: nuclear
receptor ligand-binding domain; unique α-helical
fold; 23 members, 67 fingerprints
46
False +ve? Or potential new insight?
Unknown function: cyanobacterial orange
carotenoid-binding protein; C-term αβ nuclear
transport factor (NTF2); N-term unique fold, all
α-helical; 48/67 fingerprints found (mostly in
N-term)
Family: nuclear receptor ligand-binding
domain; unique α-helical fold; 23 members, 67
fingerprints
47
Directions for further work
  • Structural genomics/unknown function inference
  • Sequence annotation: models/predictions (Arakaki
    et al., 2004)
  • Structure annotation: complement seq./struc.
    homology and functional site templates (Aloy et
    al., 2001; Ofran et al., 2005)
  • Classifying protein families based on similar
    fingerprints
  • Automatically inferring function-specific
    residues and geometric patterns (Polacco &
    Babbitt, 2006)
  • Detecting alternate binding sites (Ofran et al.,
    2005)
  • Drug discovery based on family-specific
    fingerprints

48
Acknowledgements
  • Subgraph mining / function inference
    collaborators
  • Cheminformatics collaborators
  • Dimitris K. Agrafiotis
  • Fangqiang Zhu
  • Mike Farnum
  • Al Gibbs
  • Renee Desjarlais
  • Max Cummings

49
Publications
  • Bandyopadhyay, D., Huan, J., Liu, J., Prins, J.,
    Snoeyink, J., Wang, W. and Tropsha, A.
    Structure-based function inference using protein
    family-specific fingerprints. Protein Science
    15(6), pp. 1537-1543, June 2006. Supplementary
    material: http://www.cs.unc.edu/debug/papers/FuncInf
  • J. Huan, D. Bandyopadhyay, J. Prins, J. Snoeyink,
    A. Tropsha, and W. Wang. Distance-based
    identification of spatial motifs in proteins
    using constrained frequent subgraph mining. In
    proceedings of the LSS Computational Systems
    Bioinformatics conference (CSB), 2006.
  • J. Huan, D. Bandyopadhyay, W. Wang, J. Snoeyink,
    J. Prins, A. Tropsha. Comparing three graph
    representations for Journal of Computational
    Biology, 12(6), pp. 657-671, 2005.
  • J. Huan, W. Wang, D. Bandyopadhyay, J. Snoeyink,
    J. Prins, A. Tropsha (2004). Finding
    Protein Family-specific residue packing patterns
    in Protein Structure Graphs. RECOMB 2004.
  • Bandyopadhyay, D. and J. Snoeyink (2004).
    Almost-Delaunay simplices: Nearest neighbor
    relations for imprecise points. ACM-SIAM
    Symposium On Discrete Algorithms (SODA04).
    http://www.cs.unc.edu/debug/papers/AlmDel
  • Bandyopadhyay, D. and J. Snoeyink (2004).
    Almost-Delaunay simplices: Robust nearest
    neighbor relations for imprecise points in CGAL.
    Second CGAL User Workshop, 2004. Software:
    http://www.cs.unc.edu/debug/software
  • Bandyopadhyay, D., A. Tropsha and J. Snoeyink.
    Analyzing Protein Structure using Almost-Delaunay
    Tetrahedra. UNC-CS Technical Report TR03-043,
    2003. Poster presented at RECOMB 2004, March
    2004, San Diego, CA.

D. Bandyopadhyay et al., Prot. Sci. June 2006
50
Additional References
  • Overington JP, Al-Lazikani B, Hopkins AL. How
    many drug targets are there? Nat Rev Drug Discov.
    2006 Dec; 5(12): 993-6. Review.
  • Polacco BJ, Babbitt PC. Automated discovery of 3D
    motifs for protein function annotation.
    Bioinformatics. 2006 Mar 15; 22(6): 723-30.
  • Hambly K, Danzer J, Muskal S, Debe DA.
    Interrogating the druggable genome with
    structural informatics. Mol Divers. 2006
    Aug; 10(3): 273-81.
  • Ofran Y, Punta M, Schneider R, Rost B. Beyond
    annotation transfer by homology: novel
    protein-function prediction methods to assist
    drug discovery. Drug Discovery Today. 2005; 10:
    1475-1482.
  • Arakaki AK, Zhang Y, Skolnick J. Large-scale
    assessment of the utility of low-resolution
    protein structures for biochemical function
    assignment. Bioinformatics. 2004 May 1;
    20(7): 1087-96.
  • Huan J, Wang W, and Prins J. Efficient Mining of
    Frequent Subgraphs in the Presence of Isomorphism.
    2003, Proc. 3rd IEEE International Conference on
    Data Mining (ICDM), pp. 549-552.
  • Wang G, Dunbrack RL Jr. PISCES: a protein
    sequence culling server. Bioinformatics. 2003 Aug
    12; 19(12): 1589-91.
  • Koonin EV, Galperin MY. Sequence-Evolution-Function:
    Computational Approaches in Comparative
    Genomics. 2002, Kluwer Academic Publishers
    (published online on NCBI bookshelf, 2003).
  • Aloy P, Querol E, Aviles FX, Sternberg MJ.
    Automated structure-based prediction of
    functional sites in proteins: applications to
    assessing the validity of inheriting protein
    function from homology in genome annotation and
    to protein docking. J Mol Biol. 2001 Aug
    10; 311(2): 395-408.
  • Wilson CA, Kreychman J, Gerstein M. Assessing
    annotation transfer for genomics: quantifying the
    relations between protein sequence, structure and
    function through traditional and probabilistic
    scores. J Mol Biol. 2000 Mar 17; 297(1): 233-49.
  • Murzin AG, Brenner SE, Hubbard T, Chothia C.
    SCOP: a structural classification of proteins
    database for the investigation of sequences and
    structures. J Mol Biol. 1995 Apr 7; 247(4): 536-40.
  • Holm L, Sander C. Protein structure comparison by
    alignment of distance matrices. J Mol Biol. 1993
    Sep 5; 233(1): 123-38.