Title: Deepak Bandyopadhyay
1 Deepak Bandyopadhyay
Bridging Multiple Disciplines: Advances in Target-Specific Drug Discovery
CS/algorithms
Cheminfo
Bioinfo
Data Mining
Graphics/ Info Viz
2 Pictorial research overview
Structural Bioinformatics
Graphics / Visualization
Graph Mining
- Protein Function Inference
Cheminformatics
3 Graphics / Visualization
Apple iPhone, 2007
Painting on real objects with projectors, 2001
4 Radial clustergram
- Hierarchical cluster visualization
- Dendrogram
- Clustergram
- Radial clustergram
D. K. Agrafiotis, D. Bandyopadhyay and M. Farnum. J. Chem. Inf. Model. 2007, 47(1), 69-75
M. Schonlau. Computational Statistics 2004, 19(1), 95-111
5 Geometry overview
[Figure: points t1, t2, t3 (3.8 Å scale label); DelProb(Δt1t2t3) computed from a sum of p_i over points i in a region determined by (t1, t2, t3)]
6 Structural Bioinformatics
- Packing scores distinguish native proteins from decoys/predictions
Left: fewer AD tetrahedra in native structures vs. CASP5 predictions. Right: the discriminating power of a statistical packing score is invariant to perturbation.
- Method for secondary structure assignment: finds irregular α/β structure not found by the standard (DSSP) method
1bg5
AD
7 Structural Bioinformatics
- Conformational change and protein flexibility
- Flexible region
- Isolated flexible residue
Ovotransferrin, ε color scale: 0, 0.01-0.1, 0.1-0.5, 0.5-1, 1-2
closed form
open form
open form
closed form
LYS109
Tryptophan tRNA synthetase (TrpRS)
VAL118
VAL118
PHE108
ILE183
ILE176
ARG182
PRO177
PHE5
SER6
PRO10
ILE14
MD simulation (8 structures), courtesy M. Kapustina
Experimental (5 structures), solved in the C. W. Carter Jr. lab
8 In-Depth Cheminformatics
- Pharmacophore Development
- New self-organizing method
- Conformational Analysis
- Using Stochastic Proximity Embedding
- Effect of permutations
- Visualization
- Radial clustergram
9 Stochastic Proximity Embedding (Agrafiotis and Xu, 2002)
after isometric SPE
- Learn intrinsic dimension and structure of a high-dimensional dataset
- Express as low-dimensional embedding
- Points self-organize in an iterative step that applies high-D constraints to low-D coordinates
- Scales linearly with data set size
D. K. Agrafiotis and H. Xu "A self-organizing
principle for learning nonlinear manifolds",
Proc. Natl. Acad. Sci. USA, 2002, 99, 15869-15872
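A minimal sketch of the SPE update, assuming NumPy and a full matrix of target distances. Because every pair is corrected toward its full high-D distance (no neighborhood radius), this is the isometric variant rather than the nonlinear-manifold one. The names `spe_embed` and `stress`, the cycle/step counts and the linear learning-rate schedule are illustrative choices, not parameters taken from the paper.

```python
import numpy as np


def spe_embed(target, dim=2, cycles=50, steps=1000, lam0=1.0, lam_min=0.01,
              eps=1e-10, seed=0):
    """Isometric stochastic proximity embedding (illustrative sketch).

    target: (n, n) symmetric matrix of high-dimensional distances r_ij.
    Returns an (n, dim) array of low-dimensional coordinates.
    """
    rng = np.random.default_rng(seed)
    n = target.shape[0]
    x = rng.random((n, dim))              # random initial low-D coordinates
    lam = lam0
    dlam = (lam0 - lam_min) / cycles      # learning rate decreases linearly per cycle
    for _ in range(cycles):
        # draw `steps` random pairs with i != j for this cycle
        i_idx = rng.integers(n, size=steps)
        j_idx = rng.integers(n - 1, size=steps)
        j_idx += j_idx >= i_idx
        for i, j in zip(i_idx, j_idx):
            diff = x[i] - x[j]
            d = np.sqrt(diff @ diff)      # current low-D distance
            r = target[i, j]              # desired (high-D) distance
            # move both points symmetrically so their distance approaches r
            delta = 0.5 * lam * (r - d) / (d + eps) * diff
            x[i] += delta
            x[j] -= delta
        lam -= dlam
    return x


def stress(x, target):
    """Normalized (Kruskal-style) stress between embedded and target distances."""
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    iu = np.triu_indices(len(x), k=1)
    return float(np.sqrt(((d[iu] - target[iu]) ** 2).sum() / (target[iu] ** 2).sum()))
```

Each step touches only one pair, so the work per cycle depends on `steps`, not on all n² pairs; choosing `steps` roughly proportional to the number of points is one way to keep the cost linear in data set size.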
10 Conformational Sampling with SPE (Xu, Izrailev and Agrafiotis, 2003)
- High-D space: chemical/steric constraints
- Low-D (3D) space: coordinates of low-energy conformation(s), initially randomized
- Iterative step
- Choose a random pair of atoms
- Compare the current distance between them to the minimum and maximum allowed distance
- If they are too far apart, move them closer
- If they are too close, move them apart
- Samples conformation space well (Xu et al., 2003)
H. Xu, S. Izrailev, and D. K. Agrafiotis,
"Conformational sampling by self-organization",
J. Chem. Inf. Comput. Sci., 2003, 43, 1186-1191
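The bounded iterative step listed above can be sketched with the same kind of pairwise update: a pair is adjusted only when its current distance falls outside [lower, upper], and it is pushed toward the violated bound. This assumes NumPy, dense and finite bound matrices, and the hypothetical names `spe_conformer` and `max_violation`; deriving the bounds from a real molecule (bond lengths, angles, van der Waals radii) is outside this sketch.

```python
import numpy as np


def spe_conformer(lower, upper, cycles=50, steps=2000, lam0=1.0, lam_min=0.01,
                  eps=1e-10, seed=0):
    """SPE-style conformer generation from interatomic distance bounds (sketch).

    lower, upper: (n, n) symmetric matrices of minimum/maximum allowed distances
    (assumed finite). Returns (n, 3) Cartesian coordinates.
    """
    rng = np.random.default_rng(seed)
    n = lower.shape[0]
    box = float(upper.max())
    x = rng.uniform(0.0, box, size=(n, 3))    # randomized initial coordinates
    lam, dlam = lam0, (lam0 - lam_min) / cycles
    for _ in range(cycles):
        i_idx = rng.integers(n, size=steps)
        j_idx = rng.integers(n - 1, size=steps)
        j_idx += j_idx >= i_idx                # random pairs with i != j
        for i, j in zip(i_idx, j_idx):
            diff = x[i] - x[j]
            d = np.sqrt(diff @ diff)
            if d < lower[i, j]:
                r = lower[i, j]                # too close: move apart
            elif d > upper[i, j]:
                r = upper[i, j]                # too far apart: move closer
            else:
                continue                       # within bounds: leave the pair alone
            delta = 0.5 * lam * (r - d) / (d + eps) * diff
            x[i] += delta
            x[j] -= delta
        lam -= dlam
    return x


def max_violation(x, lower, upper):
    """Largest amount by which any pair distance lies outside its bounds."""
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    iu = np.triu_indices(len(x), k=1)
    viol = np.maximum(lower[iu] - d[iu], d[iu] - upper[iu])
    return float(np.maximum(viol, 0.0).max())
```

Different seeds give different conformers that all satisfy the bounds, which is how repeated runs sample conformation space.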
11 Pharmacophore
- Components
- Chemical feature points
- Geometry (distance/angle)
- Applications
- Ligand-based VS
- Receptor-based VS
- compactly represents active site
- accommodates flexibility
- finds off-target hits
- Docking
- QSAR
- Scaffold hopping
- De novo design
Pharmacophore representation filter search
query
12 SPE for molecular alignment and pharmacophore development
- Aim: flexible molecular alignment and elucidation of a pharmacophore hypothesis in 3D, starting from
- 1D/2D structures of active molecules
- correspondence between matching groups
- Method highlights
- Simultaneous iterative conformational analysis and alignment
- Stochastic self-organizing algorithm
C1C2(Cl)CC3(Br)CC1(F)CC(I)(C2)C3
C1C(Cl)CC(F)CC1(Br)
C(F)CC(I)(CCBr)CCCl
Bandyopadhyay D, Agrafiotis DK. A New
self-organizing algorithm for molecular
alignment and pharmacophore development. 2007,
submitted
13 Constraint Types
14 SPE for multiple molecule alignment
- Input all molecules (SMILES/2D SDF)
- Initial 3D coordinates if provided, else random
- Input constraints for each molecule
- Pphore constraints from literature or hypothesis generator
- Aggregate all molecules/constraints into one
- Renumber atom numbers in constraints
- Run modified SPE on aggregate molecule
- Pick a random constraint to enforce (dist, vol, ext, pphore)
- Iterate and check for convergence
- Postprocess output (optional)
- Distance geometry refinement
- Energy minimization
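The aggregation and renumbering steps amount to index bookkeeping, sketched here: each molecule's atoms are shifted by the number of atoms that precede it, intra-molecular constraints are renumbered with that offset, and cross-molecule pharmacophore constraints given as (molecule, atom) references are mapped into the same index space. The `Constraint` record, the `aggregate` and `pick_constraint` names, and the (molecule, atom) input format are hypothetical; the constraint kinds are simply the labels listed on the slide.

```python
from dataclasses import dataclass
import random


@dataclass(frozen=True)
class Constraint:
    kind: str        # e.g. "dist", "vol", "ext", "pphore" (slide labels)
    atoms: tuple     # atom indices the constraint refers to
    lower: float
    upper: float


def aggregate(mol_sizes, intra, inter):
    """Merge per-molecule constraints into one aggregate 'molecule'.

    mol_sizes: number of atoms in each molecule.
    intra: per-molecule lists of Constraint using molecule-local atom indices.
    inter: list of (kind, ((mol, atom), ...), lower, upper) tuples linking atoms
           in different molecules, e.g. matched pharmacophore groups.
    Returns (total_atom_count, renumbered_constraints).
    """
    offsets = [0]
    for n in mol_sizes:
        offsets.append(offsets[-1] + n)     # first aggregate index of each molecule
    merged = [
        Constraint(c.kind, tuple(a + offsets[m] for a in c.atoms), c.lower, c.upper)
        for m, constraints in enumerate(intra)
        for c in constraints
    ]
    for kind, refs, lo, hi in inter:
        merged.append(Constraint(kind, tuple(offsets[m] + a for m, a in refs), lo, hi))
    return offsets[-1], merged


def pick_constraint(constraints, rng=random):
    """One SPE step enforces a single randomly chosen constraint."""
    return rng.choice(constraints)
```

Once everything lives in one index space, the SPE kernel does not need to know which atom belongs to which molecule; intra- and inter-molecular constraints are sampled the same way.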
15 Aligning pharmacophore groups
- Groups, rings, hydrophobic regions
- Motion/error/gradient from centroids computed on the fly (no pseudoatoms)
- Aromatic rings (centroids superimposed)
- Compute normal vectors on the fly
- Motion: rotate all points about centroid to align normal vectors
- Asymmetric rotation works better
- Smaller of parallel or antiparallel alignment
- Evaluation: arc length rθ as distance
- Gradient: d/dx(1 − cos²θ), x on rotating ring
[Figure: cases d > dmax and d < dmax with θ > θmax; angle θ and arc length rθ]
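A sketch of the ring-normal part, assuming NumPy: the normal is recomputed on the fly from the best-fit plane (SVD), its sign is chosen so that parallel and antiparallel orientations count as equivalent, and only the moving ring is rotated about its own centroid. Reading "asymmetric rotation" as one ring moving while the other stays fixed is an assumption. The returned error is the 1 − cos²θ term from the slide; the names `ring_normal`, `rotation_matrix`, `align_ring` and the `fraction` step parameter are illustrative.

```python
import numpy as np


def ring_normal(coords):
    """Unit normal of the best-fit plane through ring atoms (computed on the fly)."""
    centered = coords - coords.mean(axis=0)
    _, _, vt = np.linalg.svd(centered)
    return vt[-1]                          # direction of least variance = plane normal


def rotation_matrix(axis, angle):
    """Rodrigues rotation matrix for a rotation of `angle` radians about `axis`."""
    k = axis / np.linalg.norm(axis)
    kx = np.array([[0.0, -k[2], k[1]],
                   [k[2], 0.0, -k[0]],
                   [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(angle) * kx + (1.0 - np.cos(angle)) * (kx @ kx)


def align_ring(moving, reference, fraction=1.0):
    """Rotate `moving` ring atoms about their centroid toward the reference normal.

    Returns (new_coords, error) where error = 1 - cos^2(theta) before the move.
    `fraction` < 1 gives a partial, SPE-style step instead of a full alignment.
    """
    n_ref = ring_normal(reference)
    n_mov = ring_normal(moving)
    if n_mov @ n_ref < 0:
        n_ref = -n_ref                     # antiparallel is acceptable: use the smaller angle
    cos_t = float(np.clip(n_mov @ n_ref, -1.0, 1.0))
    error = 1.0 - cos_t ** 2               # zero for parallel or antiparallel normals
    axis = np.cross(n_mov, n_ref)
    if np.linalg.norm(axis) < 1e-12:
        return moving.copy(), error        # normals already aligned
    theta = np.arccos(cos_t)
    centroid = moving.mean(axis=0)
    rot = rotation_matrix(axis, fraction * theta)
    return (moving - centroid) @ rot.T + centroid, error
```

Rotating about the centroid leaves the centroid constraint untouched, so the centroid superposition and the normal alignment can be enforced as independent steps.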
16 Pharmacophore from alignment
- From all trials, extract unique alignments
- pairwise distance comparison
- For each unique alignment found
- Radius of pphore point sphere
- RMSD of aligned coordinates across trials
- Radius of centroid
- RMSD of centroid across trials, avg. centroid-to-point radius
- Normal vector angle deviation
- Cone width
- Also, flattening of ellipsoid
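Assuming the trials have already been superimposed into a common frame, the sphere radius and the normal deviation for one feature can be computed as below. Using an RMS angle as the cone width, and the names `point_radius` and `normal_deviation_deg`, are assumptions for illustration rather than the paper's exact definitions.

```python
import numpy as np


def point_radius(positions):
    """Sphere radius for one pharmacophore point: RMSD of its position across trials.

    positions: (n_trials, 3) coordinates of the same point from superimposed alignments.
    """
    positions = np.asarray(positions, dtype=float)
    mean = positions.mean(axis=0)
    return float(np.sqrt(((positions - mean) ** 2).sum(axis=1).mean()))


def normal_deviation_deg(normals):
    """RMS angle (degrees) between each trial's ring normal and the mean normal.

    Normals are sign-insensitive, so each one is first flipped into the
    hemisphere of the first normal before averaging.
    """
    normals = np.asarray(normals, dtype=float)
    normals = normals / np.linalg.norm(normals, axis=1, keepdims=True)
    ref = normals[0]
    flipped = np.where((normals @ ref)[:, None] < 0, -normals, normals)
    mean = flipped.mean(axis=0)
    mean /= np.linalg.norm(mean)
    angles = np.degrees(np.arccos(np.clip(flipped @ mean, -1.0, 1.0)))
    return float(np.sqrt((angles ** 2).mean()))
```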
17 Opioid pharmacophore for pain-killing (not addictive) property
- 14 opioids and their human/synthetic analogs
MET-enkephalin (endorphin)
Norepinephrine (hormone)
Bremazocine
Cyclazocine
Codeine
Levorphanol
Fentanyl
Etorphine
Buprenorphine
Demerol
Methadone
Sources: http://www.pharmacy.umaryland.edu/courses/PHAR531/lectures_old/drug_discovery_2.html
http://www.neurosci.pharm.utoledo.edu/MBC3320/opioids.htm
18 Opioid alignment and pharmacophore
1 unique alignment (2 views)
Extracted pharmacophore after refinement and
minimization
Radii (from alignment RMSD): 0.74 Å, 1.85 Å, 0.53 Å
Ring normal deviation: 19.4°
Pairwise distances: 4.7 Å (0.58 Å), 7.0 Å (0.48 Å), 2.8 Å (0.02 Å)
19 P-glycoprotein binders (Pearce et al., PNAS 1989; Varma and Hou, Accelrys case study)
Dimethyl amino benzoyl methyl reserpate
Rhodamine 123
Reserpine
Rescinnamine
Trimethyl benzoyl yohimbine
Verapamil
Benzoyl yohimbine
H L Pearce, A R Safa, N J Bach, M A Winter, M C
Cirtain, and W T Beck. Essential features of the
P-glycoprotein pharmacophore as defined by a
series of reserpine analogs that modulate
multidrug resistance. Proc Natl Acad Sci U S A. 1989 Jul; 86(13): 5128-5132.
20 P-Glycoprotein binder alignment
- All 7 P-glycoprotein binders
- Only reserpine/yohimbine scaffold
21 P-Glycoprotein binder pharmacophore
- 5 unique alignments (2 filtered out because of
significant sphere overlap)
Radii (from alignment RMSD): 3.7 Å, 1.1 Å, 3.9 Å, 1.9 Å
Ring normal deviation: 23.0°, 59.3°
Pairwise distances: 5.4 Å (0.18 Å), 5.7 Å (0.16 Å), 5.5 Å (0.45 Å), 5.8 Å (0.15 Å), 5.3 Å (0.07 Å), 4.0 Å (0.25 Å)
Radii (from alignment RMSD): 3.3 Å, 0.4 Å, 3.2 Å, 0.5 Å
Ring normal deviation: 27.1°, 37.4°
Pairwise distances: 5.3 Å (0.18 Å), 5.7 Å (0.15 Å), 5.8 Å (0.21 Å), 7.2 Å (0.08 Å), 6.3 Å (0.36 Å), 4.1 Å (0.45 Å)
Radii (from alignment RMSD): 3.4 Å, 0.5 Å, 3.4 Å, 0.5 Å
Ring normal deviation: 18.2°, 24.9°
Pairwise distances: 5.4 Å (0.22 Å), 4.6 Å (0.30 Å), 6.2 Å (0.47 Å), 5.9 Å (0.12 Å), 5.8 Å (0.39 Å), 4.0 Å (0.45 Å)
22 HIV-1 protease inhibitors (Wang et al., 1996)
Wang S, Milne GW, Yan X, Posey IJ, Nicklaus MC,
Graham L, Rice WG. J Med Chem. 1996 May 10; 39(10): 2047-54.
23 Ligand- vs. structure-based pphore
24 Performance
- Without refinement/minimization, speed similar to SPE
- 0.01-1 sec per alignment of 3-15 molecules, 10-100 atoms each
- With refinement/minimization, slower
- E.g. 2x slower for refinement, 4x slower for minimization
- Quality: since SPE samples conformation space well, this method should sample alignment space well
- Typically, only 1-10 iterations of refinement needed
- Alignments almost at distance geometry minima
- New fragment-based conformational analysis (SOS, Zhu et al., 2006)
- Produces better raw geometries starting from minimized fragments
- Removes need for energy minimization of conformations
25 Effect of Permuted Input on Conformer Generation
Canonical SMILES
XRAY
Permuted SMILES
4DFR, methotrexate bound to dihydrofolate
reductase
Carta G, et al. J. Comput. Aided Mol. Des., 2006, 20, 179
26 Permuting does not change the conformational space sampled by SPE
1CBX
1ETT
1HVR
Unique Conformations vs. Sampling
4DFR
1GLQ
27 Permuted Input vs. Chance
10 random runs of one permuted input
10 permuted SDFs
10 permuted SMILES
28 Conclusions
- Iterative ensemble conformational analysis is a fast method for molecular alignment and pharmacophore development
- SPE provides a fast and robust kernel that is easy to adapt for this task
- SPE is invariant to permuted input, unlike other distance geometry-based conformational sampling algorithms
Papers
- D. Bandyopadhyay and D. K. Agrafiotis. A New Self-Organizing Algorithm for Molecular Alignment and Pharmacophore Development. Submitted.
- D. K. Agrafiotis, D. Bandyopadhyay, G. Carta, A. J. S. Knox and D. G. Lloyd. On the Effects of Permuted Input on Conformational Sampling: An Evaluation of Stochastic Proximity Embedding (SPE). Chemical Biology & Drug Design, 2007, to appear.
- D. K. Agrafiotis, D. Bandyopadhyay, J. K. Wegner and H. van Vlijmen. Recent Advances in Chemoinformatics. Journal of Chemical Information and Modeling, 2007, to appear.
- D. K. Agrafiotis, D. Bandyopadhyay and M. Farnum. Radial Clustergrams: Visualizing the Aggregate Properties of Hierarchical Clusters. Journal of Chemical Information and Modeling, 2007, 47(1), 69-75.
29 In-Depth
- Graph Data Mining
- Protein Function Inference
30 Motivation
- Pharmaceutical product pipeline is shrinking
- Need for new approaches
- Drugs often fail due to off-target effects
- lack of specificity
- Target space still small
- restricted to well-characterized proteins
- genomics offers new target possibilities
Data from Overington et al. Nature Reviews Drug Discovery 5, 993-996 (December 2006), doi:10.1038/nrd2199
31 Need for function inference
SEQUENCE
From Jan 1, 1999 through Apr 19, 2005: 1605 SG structures deposited; 669 annotated as unknown function; 382 have no significant global structure similarity with proteins of known function (DALI z-score < 10; Holm and Sander, 1993)
Function determination status for protein
sequences in fully-sequenced genomes of four
organisms. Data from Koonin and Galperin (2002).
Rightmost bar is cumulative
32 Approaches to protein function determination
- Experimental characterization
- Relatively slow, but sure
- Computational inference
- Annotation transfer by seq./str. homology
- Requires / depends on alignment
- Often unreliable (Wilson et al., 2000; Aloy et al., 2001)
- Annotation by active site templates / motifs
- Extracted out of proteins with known function
- Annotation by other computational methods
- ML / data mining / knowledge based
- De novo prediction of aspects of function
- global or local similarity
33 Potential benefits of function inference to drug discovery (Ofran et al., 2005)
- Focus experimental function determination effort
- Expand the target space
- Find new targets for unmet medical needs
- Find alternate binding sites on proteins
- Predict adverse drug reactions or toxicity
- Repurpose existing drugs to new targets
34 Graph Representation of Protein Structure
Nodes: amino acid identity, Cα chain coordinates
Edges: sequence adjacency, spatial proximity
Image courtesy Luke Huan, UNC-CS
Sequence edge
Proximity edge
Subgraph mining (FFSM; Huan et al., 2003, 2004)
- Input: database of labeled undirected graphs, support 0 < σ ≤ 1 (example shown: σ = 2/3)
- Output: all (connected) frequent subgraphs from the graph database
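A minimal sketch of this graph model, assuming NumPy: residues become labeled nodes, consecutive residues get sequence edges, and other pairs get proximity edges when their Cα atoms are closer than a cutoff (8.5 Å here, borrowed from the contact distance graph definition). Support is illustrated only for one-edge patterns, which sidesteps subgraph isomorphism; FFSM itself mines arbitrary connected subgraphs. The names `protein_graph` and `edge_support` are hypothetical.

```python
import numpy as np


def protein_graph(residues, ca_coords, cutoff=8.5):
    """Labeled graph of a protein chain.

    residues: one-letter amino acid labels, in sequence order.
    ca_coords: (n, 3) C-alpha coordinates in angstroms.
    Returns (labels, edges) where edges maps (i, j), i < j, to an edge type.
    """
    ca = np.asarray(ca_coords, dtype=float)
    n = len(residues)
    dist = np.linalg.norm(ca[:, None, :] - ca[None, :, :], axis=-1)
    edges = {}
    for i in range(n):
        for j in range(i + 1, n):
            if j == i + 1:
                edges[(i, j)] = "sequence"     # consecutive residues along the chain
            elif dist[i, j] < cutoff:
                edges[(i, j)] = "proximity"    # spatially close C-alpha atoms
    return list(residues), edges


def edge_support(graphs, label_a, label_b):
    """Fraction of graphs containing at least one edge joining labels a and b.

    This is support for the simplest (one-edge) pattern only.
    """
    target = sorted((label_a, label_b))
    hits = 0
    for labels, edges in graphs:
        if any(sorted((labels[i], labels[j])) == target for i, j in edges):
            hits += 1
    return hits / len(graphs)
```

A pattern is frequent when its support is at least σ; with three graphs of which two contain the pattern, support is 2/3.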
35 Graph Edge Representation (Huan, Bandyopadhyay, Wang, Snoeyink, Prins and Tropsha, J Comp Biol 2005)
- Contact distance graph (CD)
- All Cα pairs < 8.5 Å; dense
- Delaunay tessellation (DT)
- Empty sphere geometric property
- Sparse, but unstable for imprecise points
- Almost-Delaunay edge graph (AD)
- Denser than DT, sparser than CD.
- Designed for imprecise points
- Fast and robust to find frequent patterns
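The empty-sphere property behind the DT graph can be checked directly for one tetrahedron: solve for the circumsphere center and test that no other point lies strictly inside it. This is a plain Delaunay test with a small tolerance, assuming NumPy; it does not implement the almost-Delaunay relaxation for imprecise points, and the names `circumsphere` and `is_delaunay` are hypothetical.

```python
import numpy as np


def circumsphere(tet):
    """Center and radius of the sphere through the 4 vertices of a tetrahedron.

    From |c - p_k|^2 = |c - p_0|^2 for k = 1..3:
    2 (p_k - p_0) . c = |p_k|^2 - |p_0|^2, a 3x3 linear system in c.
    """
    tet = np.asarray(tet, dtype=float)
    p0 = tet[0]
    a = 2.0 * (tet[1:] - p0)
    b = (tet[1:] ** 2).sum(axis=1) - (p0 ** 2).sum()
    center = np.linalg.solve(a, b)
    return center, float(np.linalg.norm(center - p0))


def is_delaunay(tet, points, tol=1e-9):
    """Empty-sphere test: no point lies strictly inside the circumsphere."""
    center, radius = circumsphere(tet)
    d = np.linalg.norm(np.asarray(points, dtype=float) - center, axis=1)
    return bool(np.all(d >= radius - tol))
```

Because an arbitrarily small move of a point near the sphere boundary can flip this test, DT edges are unstable for imprecise coordinates, which is what the almost-Delaunay graph is designed to address.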
36 Families and background
- Structural Classification of Proteins (SCOP; Murzin et al., 1995)
- hierarchical
- uses global structure similarity
- Protein family
- Group of proteins with related structure/function
- Background dataset
- 6500 proteins with no two sequences > 90% identical (Wang and Dunbrack, 2003)
- represents all proteins
37 Family-specific Fingerprints
- Goal: to identify family-specific fingerprints
- subgraphs that are frequent in a family (> 80%)
- and rare in all proteins (infrequent in background, < 5%)
- Typical families have 10 to 1000 fingerprints with 4-8 residues
- Example: largest fingerprint in serine protease family, below
1LO6
Blue: catalytic triad, known to be functionally important; grey: others
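The two thresholds on this slide translate into a simple filter over mined patterns. This sketch assumes the mining step has already produced, for each pattern, the number of family members and background proteins that contain it; the dictionary inputs and the name `select_fingerprints` are hypothetical.

```python
def select_fingerprints(family_counts, background_counts, n_family, n_background,
                        min_family=0.8, max_background=0.05):
    """Keep patterns that are frequent in the family and rare in the background.

    family_counts / background_counts map a pattern id to the number of
    proteins (graphs) in which that pattern occurs.
    """
    selected = []
    for pattern, count in family_counts.items():
        fam_support = count / n_family
        bg_support = background_counts.get(pattern, 0) / n_background
        if fam_support > min_family and bg_support < max_background:
            selected.append(pattern)
    return sorted(selected)
```

For example, with a 10-member family and the 6500-protein background, a pattern seen in 9 members and 10 background proteins passes, while one seen in all 10 members but in 400 background proteins does not.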
38 Function inference using fingerprints
- Select families
- Model protein structures by graphs
- Mine frequent subgraphs from family
- Filter those rare in background as fingerprints
- Search for fingerprints in new structure
- Subgraph isomorphism with graph index
- Assign significance
- Distribution of fingerprints in background
TRAINING
INFERENCE
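The search step can be sketched as a backtracking, label-preserving subgraph match that reports what fraction of a family's fingerprints occur in a query structure. The slide's method uses a graph index to speed this up; this sketch deliberately omits any index and tries candidate nodes directly, which is only practical for small fingerprints. Graphs are plain dictionaries here, and the names `find_embedding` and `fraction_found` are hypothetical.

```python
def find_embedding(p_labels, p_edges, g_labels, g_adj):
    """Backtracking search for one label-preserving embedding of a pattern graph.

    p_labels: {pattern_node: label}; p_edges: list of (u, v) pattern edges.
    g_labels: {graph_node: label}; g_adj: {graph_node: set of neighbours}.
    Every pattern edge must map to a graph edge (extra graph edges are allowed).
    Returns {pattern_node: graph_node} or None.
    """
    p_adj = {u: set() for u in p_labels}
    for u, v in p_edges:
        p_adj[u].add(v)
        p_adj[v].add(u)
    order = list(p_labels)
    mapping, used = {}, set()

    def extend(k):
        if k == len(order):
            return True
        u = order[k]
        for g, label in g_labels.items():
            if g in used or label != p_labels[u]:
                continue
            # every already-mapped neighbour of u must map to a neighbour of g
            if all(mapping[w] in g_adj[g] for w in p_adj[u] if w in mapping):
                mapping[u] = g
                used.add(g)
                if extend(k + 1):
                    return True
                del mapping[u]           # undo and try the next candidate
                used.discard(g)
        return False

    return dict(mapping) if extend(0) else None


def fraction_found(fingerprints, g_labels, g_adj):
    """Fraction of (p_labels, p_edges) fingerprints that occur in a query graph."""
    found = sum(find_embedding(pl, pe, g_labels, g_adj) is not None
                for pl, pe in fingerprints)
    return found / len(fingerprints)
```

A result such as "30/49 fingerprints found" corresponds to `fraction_found` returning 30/49 for that family's fingerprint set.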
39 Significance from fingerprints
- Plot specificity vs. sensitivity at different fingerprint-hit thresholds (ROC curve)
- Define cutoff points for sensitivity and for 99% specificity of family membership
sens cutoff
spec cutoff
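One way to turn the background distribution into a specificity cutoff is sketched below: choose the smallest number of fingerprint hits that at least 99% of background (non-member) proteins stay below, then read sensitivity off the known family members at that threshold. The integer-threshold scan and the names `specificity_cutoff`, `sensitivity` and `roc_points` are assumptions for illustration, assuming NumPy.

```python
import numpy as np


def specificity_cutoff(background_hits, target_specificity=0.99):
    """Smallest hit threshold t such that at least `target_specificity`
    of background proteins have fewer than t fingerprint hits."""
    hits = np.asarray(background_hits)
    for t in range(int(hits.max()) + 2):
        if np.mean(hits < t) >= target_specificity:
            return t


def sensitivity(member_hits, t):
    """Fraction of known family members with at least t fingerprint hits."""
    return float(np.mean(np.asarray(member_hits) >= t))


def roc_points(member_hits, background_hits):
    """(1 - specificity, sensitivity) at every integer threshold, for an ROC plot."""
    bg = np.asarray(background_hits)
    mem = np.asarray(member_hits)
    top = int(max(bg.max(), mem.max())) + 1
    return [(float(np.mean(bg >= t)), float(np.mean(mem >= t))) for t in range(top + 1)]
```

A query structure with at least `specificity_cutoff(...)` hits for a family would then be assigned to that family at 99% specificity.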
40 Fingerprints discriminate structurally similar families
- Fingerprints distinguish 20 structurally similar families of the TIM fold
- Diagonal has high fingerprint counts, off-diagonal low
- Exceptions: super/sub-family pairs, families with weak fingerprints
[Figure axes: TIM barrel (super)families with fingerprints; (super)families in which fingerprints occur]
41 Function inference of structural orphans
- Structural orphan proteins from Structural Genomics
- structure known, function unknown
- no structure similarity to other proteins
- Function inferences for 80 orphans with 99% specificity
- Out of 382 orphans deposited between 1999 and April 2005
42 Structural Genomics Function inference I
Kinemages auto-generated from PDB files and fingerprints
Metallo-dependent hydrolase: 8-stranded β/α (TIM) barrel fold, 17 members, 49 fingerprints
Unknown function: 7-stranded barrel fold, 30/49 fingerprints found
43 Residues hit by fingerprints
Figures made in VMD
1nfg
1m65
SCOP 51556
CASP5 T0147
Ycdx
Acidic
Basic
Polar
Hphobic
44 Structural genomics function inference II
Figures made in VMD
1twu
Yyce
Antibiotic resistance protein: glyoxalase/bleomycin resistance/dioxygenase superfamily, 4 members, 62 fingerprints
Unknown function, not in SCOP 1.67, DALI z < 10 in Nov 2004: 46/62 fingerprints found; DALI z now > 10 with newly discovered family members
45 False +ve? Or potential new insight?
Therapeutically important family: nuclear receptor ligand-binding domain, unique α-helical fold, 23 members, 67 fingerprints
46 False +ve? Or potential new insight?
Unknown function: cyanobacterial orange carotenoid-binding protein; C-term: αβ nuclear transport factor (NTF2); N-term: unique fold, all α-helical; 48/67 fingerprints found (mostly in N-term)
Family: nuclear receptor ligand-binding domain, unique α-helical fold, 23 members, 67 fingerprints
47 Directions for further work
- Structural genomics/unknown function inference
- Sequence annotation models/predictions (Arakaki et al., 2004)
- Structure annotation: complement seq./struct. homology and functional site templates (Aloy et al., 2001; Ofran et al., 2005)
- Classifying protein families based on similar fingerprints
- Automatically inferring function-specific residues and geometric patterns (Polacco and Babbitt, 2006)
- Detecting alternate binding sites (Ofran et al., 2005)
- Drug discovery based on family-specific fingerprints
48 Acknowledgements
- Subgraph mining / function inference collaborators
- Cheminformatics collaborators
- Dimitris K. Agrafiotis
- Fangqiang Zhu
- Mike Farnum
- Al Gibbs
- Renee Desjarlais
- Max Cummings
49 Publications
- Bandyopadhyay, D., Huan, J., Liu, J., Prins, J., Snoeyink, J., Wang, W. and Tropsha, A. Structure-based function inference using protein family-specific fingerprints. Protein Science 15(6), pp. 1537-1543, June 2006. Supplementary material: http://www.cs.unc.edu/debug/papers/FuncInf
- J. Huan, D. Bandyopadhyay, J. Prins, J. Snoeyink, A. Tropsha, and W. Wang. Distance-based identification of spatial motifs in proteins using constrained frequent subgraph mining. In Proceedings of the LSS Computational Systems Bioinformatics Conference (CSB), 2006.
- J. Huan, D. Bandyopadhyay, W. Wang, J. Snoeyink, J. Prins, A. Tropsha. Comparing three graph representations for … Journal of Computational Biology, 12(6), pp. 657-671, 2005.
- J. Huan, W. Wang, D. Bandyopadhyay, J. Snoeyink, J. Prins, A. Tropsha (2004). Finding Protein Family-specific residue packing patterns in Protein Structure Graphs. RECOMB 2004.
- Bandyopadhyay, D. and J. Snoeyink (2004). Almost-Delaunay simplices: Nearest neighbor relations for imprecise points. ACM-SIAM Symposium on Discrete Algorithms (SODA '04). http://www.cs.unc.edu/debug/papers/AlmDel
- Bandyopadhyay, D. and J. Snoeyink (2004). Almost-Delaunay simplices: Robust nearest neighbor relations for imprecise points in CGAL. Second CGAL User Workshop, 2004. Software: http://www.cs.unc.edu/debug/software
- Bandyopadhyay, D., A. Tropsha and J. Snoeyink. Analyzing Protein Structure using Almost-Delaunay Tetrahedra. UNC-CS Technical Report TR03-043, 2003. Poster presented at RECOMB 2004, March 2004, San Diego, CA.
D. Bandyopadhyay et al., Prot. Sci. June 2006
50 Additional References
- Overington JP, Al-Lazikani B, Hopkins AL. How many drug targets are there? Nat Rev Drug Discov. 2006 Dec; 5(12): 993-6. Review.
- Polacco BJ, Babbitt PC. Automated discovery of 3D motifs for protein function annotation. Bioinformatics. 2006 Mar 15; 22(6): 723-30.
- Hambly K, Danzer J, Muskal S, Debe DA. Interrogating the druggable genome with structural informatics. Mol Divers. 2006 Aug; 10(3): 273-81.
- Ofran Y, Punta M, Schneider R, Rost B. Beyond annotation transfer by homology: novel protein-function prediction methods to assist drug discovery. Drug Discovery Today. 2005; 10: 1475-1482.
- Arakaki AK, Zhang Y, Skolnick J. Large-scale assessment of the utility of low-resolution protein structures for biochemical function assignment. Bioinformatics. 2004 May 1; 20(7): 1087-96.
- Huan J, Wang W, and Prins J. Efficient Mining of Frequent Subgraph in the Presence of Isomorphism. Proc. 3rd IEEE International Conference on Data Mining (ICDM), 2003, pp. 549-552.
- Wang G, Dunbrack RL Jr. PISCES: a protein sequence culling server. Bioinformatics. 2003 Aug 12; 19(12): 1589-91.
- Koonin EV, Galperin MY. Sequence - Evolution - Function: Computational Approaches in Comparative Genomics. Kluwer Academic Publishers, 2002 (published online on NCBI Bookshelf, 2003).
- Aloy P, Querol E, Aviles FX, Sternberg MJ. Automated structure-based prediction of functional sites in proteins: applications to assessing the validity of inheriting protein function from homology in genome annotation and to protein docking. J Mol Biol. 2001 Aug 10; 311(2): 395-408.
- Wilson CA, Kreychman J, Gerstein M. Assessing annotation transfer for genomics: quantifying the relations between protein sequence, structure and function through traditional and probabilistic scores. J Mol Biol. 2000 Mar 17; 297(1): 233-49.
- Murzin AG, Brenner SE, Hubbard T, Chothia C. SCOP: a structural classification of proteins database for the investigation of sequences and structures. J Mol Biol. 1995 Apr 7; 247(4): 536-40.
- Holm L, Sander C. Protein structure comparison by alignment of distance matrices. J Mol Biol. 1993 Sep 5; 233(1): 123-38.