Deepak Bandyopadhyay

About This Presentation

Slides: 51

Provided by: McMi4

Transcript and Presenter's Notes

1
Deepak Bandyopadhyay
Bridging Multiple Disciplines: Advances in Target-Specific Drug Discovery
CS/algorithms
Cheminfo
Bioinfo
Data Mining
Graphics/ Info Viz
2
Pictorial research overview
Structural Bioinformatics

Geometry

Graphics / Visualization
Graph Mining

Protein Function Inference

Cheminformatics
3
Graphics / Visualization
Apple iPhone, 2007
Painting on Real objects with projectors, 2001
4
Radial clustergram

Hierarchical cluster visualization
Dendrogram
Clustergram
Radial clustergram

D. K. Agrafiotis, D. Bandyopadhyay and M. Farnum. J. Chem. Inf. Model. 2007, 47, 69-75
M. Schonlau. Computational Statistics 2004, 19(1), 95-111
5
Geometry overview

Robust nearest neighbors

Delaunay tessellation

3.8Å

Effect of imprecision

Probabilistic neighbors

Triangle vertices: t1, t2, t3
DelProb(Δt1t2t3) = 1 − Σ p_i, for points i in ○(t1, t2, t3)
6
Structural Bioinformatics

Packing scores distinguish native protein from
decoys/predictions

Left: fewer AD tetrahedra in native vs. CASP5 predictions. Right: the discriminating power of a statistical packing score is invariant to perturbation

Method for secondary structure assignment finds irregular α/β not found by the standard (DSSP) method

1bg5
AD
7
Structural Bioinformatics

Conformational change and protein flexibility

Legend: flexible region; isolated flexible residue

Ovotransferrin, ε color scale: 0, 0.01-0.1, 0.1-0.5, 0.5-1, 1-2
closed form
open form
open form
closed form
LYS109
Tryptophan tRNA synthetase (TrpRS)
VAL118
VAL118
PHE108
ILE183
ILE176
ARG182
PRO177
PHE5
SER6
PRO10
ILE14
MD simulation (8 structures), courtesy M. Kapustina
Experimental (5 structures), solved in C. W. Carter Jr. lab
8
In-Depth Cheminformatics

Pharmacophore Development
New self-organizing method
Conformational Analysis
Using Stochastic Proximity Embedding
Effect of permutations
Visualization
Radial clustergram

9
Stochastic Proximity Embedding (Agrafiotis & Xu, 2002)
after isometric SPE

Learn intrinsic dimension and structure of a high
dimensional dataset
Express as low-dimensional embedding
Points self-organize in an iterative step that
applies high-D constraints to low-D coords
Scales linearly with data set size

D. K. Agrafiotis and H. Xu "A self-organizing
principle for learning nonlinear manifolds",
Proc. Natl. Acad. Sci. USA, 2002, 99, 15869-15872
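
The self-organizing step described on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `spe_embed`, the linear learning-rate schedule, and all default parameter values are assumptions chosen for readability. Each step picks one random pair, compares its low-D distance to its high-D distance, and nudges both points along the line joining them, so the cost per step does not depend on the data set size.

```python
import math
import random


def spe_embed(points, dim=2, cutoff=float("inf"), cycles=50, steps=2000,
              lam=1.0, lam_min=0.01, eps=1e-8, seed=0):
    """Embed high-D `points` into `dim` dimensions by pairwise refinement.

    `cutoff` is the neighborhood radius: pairs farther apart than this in
    high-D space are only pushed apart, never pulled together.
    """
    rng = random.Random(seed)
    n = len(points)
    # Start from random low-D coordinates.
    coords = [[rng.random() for _ in range(dim)] for _ in range(n)]
    # Assumed schedule: decrease the learning rate linearly per cycle.
    delta = (lam - lam_min) / max(cycles - 1, 1)
    for _ in range(cycles):
        for _ in range(steps):
            i, j = rng.sample(range(n), 2)
            r = math.dist(points[i], points[j])   # high-D (target) distance
            d = math.dist(coords[i], coords[j])   # current low-D distance
            if r <= cutoff or d < r:
                scale = lam * 0.5 * (r - d) / (d + eps)
                for k in range(dim):
                    diff = coords[i][k] - coords[j][k]
                    coords[i][k] += scale * diff
                    coords[j][k] -= scale * diff
        lam -= delta
    return coords
```

With the default infinite `cutoff` every pair is a constraint, which behaves like stochastic multidimensional scaling; a finite `cutoff` restricts attraction to local neighborhoods, which is what allows a nonlinear manifold to unfold.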
10
Conformational Sampling with SPE (Xu, Izrailev and Agrafiotis, 2003)

High-D space: chemical/steric constraints
Low-D (3D) space: coords of low-energy conformation(s), initially randomized
Iterative step
Choose random pair of atoms
Compare current distance between them to minimum and maximum allowed distance
If they are too far apart, move closer
If they are too close, move apart
Samples conformation space well (Xu et al., 2003)

H. Xu, S. Izrailev, and D. K. Agrafiotis, "Conformational sampling by self-organization", J. Chem. Inf. Comput. Sci., 2003, 43, 1186-1191
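
The iterative step listed on this slide can be sketched as follows. This is a simplified illustration under stated assumptions: the function name `spe_sample_conformation`, the `bounds` dictionary layout, and the learning-rate schedule are hypothetical, and only distance bounds are handled (the published method also uses other constraint types, such as volume constraints, which are omitted here).

```python
import math
import random


def spe_sample_conformation(n_atoms, bounds, cycles=50, steps=1000,
                            lam=1.0, lam_min=0.01, eps=1e-8, seed=None):
    """Return 3D coordinates that try to satisfy pairwise distance bounds.

    `bounds` maps an atom pair (i, j) to (min_distance, max_distance).
    """
    rng = random.Random(seed)
    # Randomized starting coordinates, as on the slide.
    xyz = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(n_atoms)]
    pairs = list(bounds)
    delta = (lam - lam_min) / max(cycles - 1, 1)
    for _ in range(cycles):
        for _ in range(steps):
            i, j = rng.choice(pairs)            # choose a random pair of atoms
            lo, hi = bounds[(i, j)]
            d = math.dist(xyz[i], xyz[j])
            if lo <= d <= hi:
                continue                        # bound already satisfied
            target = hi if d > hi else lo       # too far: closer; too close: apart
            scale = lam * 0.5 * (target - d) / (d + eps)
            for k in range(3):
                diff = xyz[i][k] - xyz[j][k]
                xyz[i][k] += scale * diff
                xyz[j][k] -= scale * diff
        lam -= delta
    return xyz
```

Running the function with different seeds gives different starting points and therefore different conformations that satisfy the same bounds, which is how repeated runs sample conformation space.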
11
Pharmacophore

Components
Chemical feature points
Geometry (distance/angle)
Applications
Ligand-based VS
Receptor-based VS
compactly represents active site
accommodates flexibility
finds off-target hits
Docking
QSAR
Scaffold hopping
De novo design

Pharmacophore representation filter search
query
12
SPE for molecular alignment and pharmacophore
development

Aim: flexible molecular alignment and elucidation of a pharmacophore hypothesis in 3D, starting from
1D/2D structures of active molecules
correspondence between matching groups
Method highlights
Simultaneous iterative conformational analysis and alignment
Stochastic self-organizing algorithm

C1C2(Cl)CC3(Br)CC1(F)CC(I)(C2)C3
C1C(Cl)CC(F)CC1(Br)
C(F)CC(I)(CCBr)CCCl
Bandyopadhyay D, Agrafiotis DK. A new self-organizing algorithm for molecular alignment and pharmacophore development. 2007, submitted
13
Constraint Types
14
SPE for multiple molecule alignment

Input all molecules (SMILES/2D SDF)
Initial 3D coordinates if provided, else random
Input constraints for each molecule
Pharmacophore constraints from literature or hypothesis generator
Aggregate all molecules/constraints into one
Renumber atoms in constraints
Run modified SPE on aggregate molecule
Pick a random constraint to enforce (dist, vol, ext, pphore)
Iterate and check for convergence
Postprocess output (optional)
Distance geometry refinement
Energy minimization
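
The aggregation and renumbering steps above amount to shifting each molecule's atom indices by a running offset. The sketch below illustrates only that bookkeeping; the dictionary layout (`n_atoms`, `constraints`) and the constraint tuple `(kind, atom_indices, value)` are hypothetical choices made for this example, not the format used by the authors.

```python
def aggregate_molecules(molecules):
    """Merge per-molecule constraints into one list over a single atom numbering.

    Each molecule is a dict with 'n_atoms' and 'constraints', where a
    constraint is (kind, atom_indices, value) with molecule-local indices.
    Returns (total_atoms, offsets, merged_constraints).
    """
    merged_constraints = []
    offsets = []
    offset = 0
    for mol in molecules:
        offsets.append(offset)
        for kind, atoms, value in mol["constraints"]:
            # Renumber: local atom index -> index in the aggregate molecule.
            shifted = tuple(a + offset for a in atoms)
            merged_constraints.append((kind, shifted, value))
        offset += mol["n_atoms"]
    return offset, offsets, merged_constraints
```

The returned `offsets` list lets a pharmacophore constraint that links atoms in different molecules be expressed in the aggregate numbering, after which a single constraint can be drawn at random from `merged_constraints` at every iteration.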

15
Aligning pharmacophore groups

Groups, rings, hydrophobic regions
Motion/error/gradient from centroids computed on
the fly (no pseudoatoms)
Aromatic rings (centroids superimposed)
Compute normal vectors on the fly
Motion: rotate all points about centroid to align normal vectors
Asymmetric rotation works better
Smaller of parallel or antiparallel alignment
Evaluation: arc length rθ as distance
Gradient: d/dx (1 − cos²θ), x on rotating ring

Figure labels: d > dmax; d < dmax, θ > θmax; arc length rθ
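
The quantities named on this slide (centroid, ring normal, angle θ between normals, arc length rθ, and the penalty 1 − cos²θ) can be computed on the fly from ring atom coordinates. The sketch below is an illustration only: the function names are hypothetical, the normal is computed with Newell's method (a swapped-in standard technique, not necessarily what the authors used), and the ring radius r is assumed to be the mean centroid-to-atom distance.

```python
import math


def centroid(points):
    """Mean position of a set of 3D points."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))


def ring_normal(points):
    """Unit normal of a (roughly planar) ring, via Newell's method."""
    pts = list(points)
    nx = ny = nz = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(pts, pts[1:] + pts[:1]):
        nx += (y1 - y2) * (z1 + z2)
        ny += (z1 - z2) * (x1 + x2)
        nz += (x1 - x2) * (y1 + y2)
    norm = math.sqrt(nx * nx + ny * ny + nz * nz)
    return (nx / norm, ny / norm, nz / norm)


def ring_misalignment(ring_a, ring_b):
    """Return (theta, arc_length, penalty) between two ring planes.

    Taking |cos| picks the smaller of the parallel and antiparallel
    alignments, so flipped rings count as aligned.
    """
    na, nb = ring_normal(ring_a), ring_normal(ring_b)
    cos_t = min(1.0, abs(sum(a * b for a, b in zip(na, nb))))
    theta = math.acos(cos_t)
    ca = centroid(ring_a)
    radius = sum(math.dist(ca, p) for p in ring_a) / len(ring_a)
    return theta, radius * theta, 1.0 - cos_t * cos_t
```

The penalty 1 − cos²θ is zero for both parallel and antiparallel normals, which is why it suits rings whose two faces are equivalent.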
16
Pharmacophore from alignment

From all trials, extract unique alignments (pairwise distance comparison)
For each unique alignment found:
Radius of pphore point sphere: RMSD of aligned coords across trials
Radius of centroid: RMSD of centroid across trials, avg. radius centroid→point
Normal vector angle deviation: cone width
Also, flattening of ellipsoid
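
As a rough illustration of how sphere radii and distance tolerances could be derived from repeated trials, the sketch below takes the per-trial positions of each pharmacophore point and reports an RMSD-based radius plus the mean and spread of each pairwise distance. The function names and the input layout are hypothetical, and "RMSD across trials" is interpreted here as the root-mean-square deviation from the mean position, which is an assumption.

```python
import math
from itertools import combinations
from statistics import mean, pstdev


def point_radius(positions):
    """RMS deviation of one feature point's positions from their mean."""
    center = tuple(mean(p[k] for p in positions) for k in range(3))
    return math.sqrt(mean(math.dist(p, center) ** 2 for p in positions))


def pairwise_distance_stats(features):
    """Mean and standard deviation of each inter-feature distance.

    `features` maps a feature name to its list of positions, one per trial,
    with trials in the same order for every feature.
    """
    stats = {}
    for a, b in combinations(sorted(features), 2):
        dists = [math.dist(p, q) for p, q in zip(features[a], features[b])]
        stats[(a, b)] = (mean(dists), pstdev(dists))
    return stats
```

A small radius means the trials agree on where that feature sits, while a large one marks a loosely defined point.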

17
Opioid Pharmacophore for pain-killing (not addictive) property

14 opioids and their human/synthetic analogs

MET-enkephalin (endorphin)
Norepinephrine (hormone)
Bremazocine
Cyclazocine
Codeine
Levorphanol
Fentanyl
Etorphine
Buprenorphine
Demerol
Methadone
Sources: http://www.pharmacy.umaryland.edu/courses/PHAR531/lectures_old/drug_discovery_2.html
http://www.neurosci.pharm.utoledo.edu/MBC3320/opioids.htm
18
Opioid alignment and pharmacophore
1 unique alignment (2 views)
Extracted pharmacophore after refinement and
minimization
Radii (from alignment RMSD): 0.74 Å, 1.85 Å, 0.53 Å
Ring normal deviation: 19.4°
Pairwise distances: 4.7 ± 0.58 Å, 7.0 ± 0.48 Å, 2.8 ± 0.02 Å
19
P-glycoprotein binders (Pearce et al., PNAS 1989; Varma and Hou, Accelrys case study)
Dimethyl amino benzoyl methyl reserpate
Rhodamine 123
Reserpine
Rescinnamine
Trimethyl benzoyl yohimbine
Verapamil
Benzoyl yohimbine
H L Pearce, A R Safa, N J Bach, M A Winter, M C Cirtain, and W T Beck. Essential features of the P-glycoprotein pharmacophore as defined by a series of reserpine analogs that modulate multidrug resistance. Proc Natl Acad Sci U S A. 1989 July; 86(13): 5128-5132.
20
P-Glycoprotein binder alignment

Panels: all 7 P-glycoprotein binders; only reserpine/yohimbine scaffold

21
P-Glycoprotein binder pharmacophore

5 unique alignments (2 filtered out because of
significant sphere overlap)

Radii (from alignment RMSD): 3.7 Å, 1.1 Å, 3.9 Å, 1.9 Å
Ring normal deviations: 23.0°, 59.3°
Pairwise distances: 5.4 ± 0.18 Å, 5.7 ± 0.16 Å, 5.5 ± 0.45 Å, 5.8 ± 0.15 Å, 5.3 ± 0.07 Å, 4.0 ± 0.25 Å

Radii (from alignment RMSD): 3.3 Å, 0.4 Å, 3.2 Å, 0.5 Å
Ring normal deviations: 27.1°, 37.4°
Pairwise distances: 5.3 ± 0.18 Å, 5.7 ± 0.15 Å, 5.8 ± 0.21 Å, 7.2 ± 0.08 Å, 6.3 ± 0.36 Å, 4.1 ± 0.45 Å

Radii (from alignment RMSD): 3.4 Å, 0.5 Å, 3.4 Å, 0.5 Å
Ring normal deviations: 18.2°, 24.9°
Pairwise distances: 5.4 ± 0.22 Å, 4.6 ± 0.30 Å, 6.2 ± 0.47 Å, 5.9 ± 0.12 Å, 5.8 ± 0.39 Å, 4.0 ± 0.45 Å
22
HIV-1 protease inhibitors (Wang et al., 1996)
Wang S, Milne GW, Yan X, Posey IJ, Nicklaus MC, Graham L, Rice WG. J Med Chem. 1996 May 10; 39(10): 2047-54.
23
Ligand vs. structure-based pphore
24
Performance

Without refinement/minimization, similar to SPE: 0.01-1 s per alignment of 3-15 molecules with 10-100 atoms each
With refinement/minimization, slower
E.g. 2x slower for refinement, 4x slower for minimization
Quality: since SPE samples conformation space well, this method should sample alignment space well
Typically, only 1-10 iterations of refinement needed
Alignments almost at distance geometry minima
New fragment-based conformational analysis (SOS, Zhu et al., 2006)
Produces better raw geometries starting from minimized fragments
Removes need for energy minimization of conformations

25
Effect of Permuted Input on Conformer Generation
Canonical SMILES
XRAY
Permuted SMILES
4DFR, methotrexate bound to dihydrofolate
reductase
Carta G, et al. J. Comput. Aided Mol. Des., 2006, 20, 179
26
Permuting does not change conformational space
sampled by SPE
1CBX
1ETT
1HVR
Unique Conformations vs. Sampling
4DFR
1GLQ
27
Permuted Input vs Chance
10 random runs of one permuted input
10 permuted SDFs
10 permuted SMILES
28
Conclusions
- Iterative ensemble conformational analysis is a fast method for molecular alignment and pharmacophore development
- SPE provides a fast and robust kernel that is easy to adapt for this task
- SPE is invariant to permuted input, unlike other distance geometry-based conformational sampling algorithms
Papers
- D. Bandyopadhyay and D. K. Agrafiotis. A New Self-Organizing Algorithm for Molecular Alignment and Pharmacophore Development. Submitted.
- D. K. Agrafiotis, D. Bandyopadhyay, G. Carta, A. J. S. Knox and D. G. Lloyd. On the Effects of Permuted Input on Conformational Sampling: An Evaluation of Stochastic Proximity Embedding (SPE). Chemical Biology & Drug Design, 2007, to appear.
- D. K. Agrafiotis, D. Bandyopadhyay, J. K. Wegner and H. van Vlijmen. Recent Advances in Chemoinformatics. Journal of Chemical Information and Modeling, 2007, to appear.
- D. K. Agrafiotis, D. Bandyopadhyay and M. Farnum. Radial Clustergrams: Visualizing the Aggregate Properties of Hierarchical Clusters. Journal of Chemical Information and Modeling, 2007, 47, 69-75.
29
In-Depth

Graph Data Mining
Protein Function Inference

30
Motivation

Pharmaceutical product pipeline is shrinking
Need for new approaches
Drugs often fail due to off-target effects
lack of specificity
Target space still small
restricted to well-characterized proteins
genomics offers new target possibilities

Data from Overington et al., Nature Reviews Drug Discovery 5, 993-996 (December 2006), doi:10.1038/nrd2199
31
Need for function inference
SEQUENCE
From Jan 1, 1999 through Apr 19, 2005: 1605 SG structures deposited; 669 annotated as unknown function; 382 have no significant global structure similarity with proteins of known function (DALI z-score < 10) (Holm and Sander, 1993)
Function determination status for protein sequences in fully-sequenced genomes of four organisms. Data from Koonin and Galperin (2002). Rightmost bar is cumulative
32
Approaches to protein function determination

Experimental characterization
Relatively slow, but sure
Computational inference
Annotation transfer by seq./str. homology
Requires / depends on alignment
Often unreliable (Wilson et al., 2000; Aloy et al., 2001)
Annotation by active site templates / motifs
Extracted out of proteins with known function
Annotation by other computational methods
ML / data mining / knowledge based
De novo prediction of aspects of function

global or local similarity
33
Potential benefits of function inference to drug
discovery
Ofran et al., 2005

Focus experimental function determination effort
Expand the target space
Find new targets for unmet medical needs
Find alternate binding sites on proteins
Predict adverse drug reactions or toxicity
Repurpose existing drugs to new targets

34
Graph Representation of Protein Structure
Nodes: amino acid identity, Cα chain coordinates
Edges: sequence adjacency, spatial proximity
Image courtesy Luke Huan, UNC-CS
Sequence edge
Proximity edge
Subgraph mining (FFSM) (Huan et al., 2003, 2004)

Input: database of labeled undirected graphs; support threshold 0 < σ ≤ 1

σ = 2/3

Output: all (connected) frequent subgraphs from the graph database
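
To make the notion of support concrete, the sketch below counts in what fraction of a graph database each labeled edge occurs and keeps those at or above the threshold. It is deliberately restricted to single-edge patterns, the smallest connected subgraphs; it is not FFSM, which grows such patterns into larger frequent subgraphs. The function name and the `(labels, edges)` graph layout are assumptions made for this example.

```python
from collections import Counter


def frequent_labeled_edges(graphs, min_support):
    """Return {edge_pattern: support} for patterns with support >= min_support.

    Each graph is (labels, edges): `labels` maps node id to a label and
    `edges` is an iterable of undirected (u, v) pairs. A pattern is the
    sorted pair of endpoint labels, and it is counted once per graph.
    """
    counts = Counter()
    for labels, edges in graphs:
        patterns = {tuple(sorted((labels[u], labels[v]))) for u, v in edges}
        counts.update(patterns)
    n = len(graphs)
    return {p: c / n for p, c in counts.items() if c / n >= min_support}
```

With a threshold of 2/3 and three graphs, a pattern must occur in at least two of them to be reported.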

35
Graph Edge Representation (Huan, Bandyopadhyay, Wang, Snoeyink, Prins and Tropsha, J Comp Biol 2005)

Contact distance graph (CD)
All Cα pairs < 8.5 Å apart; dense
Delaunay tessellation (DT)
Empty sphere geometric property
Sparse, but unstable for imprecise points
Almost-Delaunay edge graph (AD)
Denser than DT, sparser than CD.
Designed for imprecise points
Fast and robust to find frequent patterns
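
Of the three representations, the contact distance graph is the simplest to build. The sketch below is illustrative only: the function name is hypothetical, the input is assumed to be a list of Cα coordinates in chain order, and the Delaunay and almost-Delaunay variants are not shown because they require a computational geometry library.

```python
import math


def contact_distance_graph(ca_coords, cutoff=8.5):
    """Build sequence and proximity edges from C-alpha coordinates.

    Sequence edges join consecutive residues; proximity edges join
    non-consecutive residues whose C-alpha atoms are closer than `cutoff` Å.
    """
    n = len(ca_coords)
    sequence_edges = {(i, i + 1) for i in range(n - 1)}
    proximity_edges = {
        (i, j)
        for i in range(n)
        for j in range(i + 2, n)
        if math.dist(ca_coords[i], ca_coords[j]) < cutoff
    }
    return sequence_edges, proximity_edges
```

Because every pair within the cutoff becomes an edge, the graph is dense, which is the drawback noted above for the CD representation.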

36
Families and background

Structural classification of proteins (SCOP) (Murzin et al., 1995)
hierarchical
uses global structure similarity
Protein family
Group of proteins with related
structure/function
Background dataset
6500 proteins with no two sequences > 90% identical (Wang and Dunbrack, 2003)
represents all proteins

37
Family-specific Fingerprints

Goal: to identify family-specific fingerprints, i.e. subgraphs that are frequent in a family (> 80%) and rare in all proteins (infrequent in background, < 5%)
Typical families have 10 to 1000 fingerprints with 4-8 residues
Example: largest fingerprint in serine protease family, below

1LO6
Blue: catalytic triad, known to be functionally important. Grey: others
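
The two-sided frequency criterion can be written as a simple filter once the occurrences of each mined subgraph are known. In this sketch the function name, the `occurrences` mapping (pattern to the set of proteins containing it), and the protein identifiers are all hypothetical; the thresholds follow the slide (more than 80% of the family, less than 5% of the background).

```python
def select_fingerprints(occurrences, family, background,
                        min_family=0.80, max_background=0.05):
    """Keep patterns frequent in `family` and rare in `background`.

    `occurrences` maps a pattern id to the set of protein ids containing it;
    `family` and `background` are sets of protein ids.
    """
    fingerprints = []
    for pattern, proteins in occurrences.items():
        family_support = len(proteins & family) / len(family)
        background_support = len(proteins & background) / len(background)
        if family_support > min_family and background_support < max_background:
            fingerprints.append(pattern)
    return fingerprints
```

A pattern that is common in the family but also common in the background is dropped, since it reflects general protein packing rather than anything family-specific.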
38
Function inference using fingerprints

Select families
Model protein structures by graphs
Mine frequent subgraphs from family
Filter those rare in background as fingerprints
Search for fingerprints in new structure
Subgraph isomorphism with graph index
Assign significance
Distribution of fingerprints in background

TRAINING
INFERENCE
39
Significance from fingerprints

Plot specificity vs. sensitivity at different numbers of fingerprints (ROC curve)
Define cutoff points for sensitivity and for 99% specificity of family membership

sens cutoff
spec cutoff
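
One way to read the cutoff idea: call a protein a family member if at least k of the family's fingerprints are found in it, then sweep k. The sketch below computes sensitivity and specificity at a given k and finds the smallest k that reaches a target specificity. The function names and the per-protein count inputs are assumptions made for illustration.

```python
def sensitivity_specificity(family_counts, background_counts, k):
    """Sensitivity and specificity of the rule 'member if count >= k'."""
    sens = sum(c >= k for c in family_counts) / len(family_counts)
    spec = sum(c < k for c in background_counts) / len(background_counts)
    return sens, spec


def smallest_cutoff(family_counts, background_counts, min_specificity=0.99):
    """Smallest k whose specificity reaches `min_specificity`, or None."""
    for k in range(1, max(family_counts) + 1):
        sens, spec = sensitivity_specificity(family_counts, background_counts, k)
        if spec >= min_specificity:
            return k, sens, spec
    return None
```

Raising k trades sensitivity for specificity, so evaluating the first function over all k traces out the ROC curve mentioned above.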
40
Fingerprints discriminate structurally similar
families

Fingerprints distinguish 20 structurally similar families of the TIM fold
Diagonal has high numbers of fingerprints, off-diagonal low
Exceptions: super/sub-family pairs, families with weak fingerprints

Axes: TIM barrel (super)families with fingerprints vs. (super)families in which fingerprints occur
41
Function inference of structural orphans

Structural orphan proteins from Structural
Genomics
structure known, function unknown
no structure similarity to other proteins
Function inferences for 80 orphans with 99% specificity
Out of 382 orphans deposited between 1999 and April 2005

From Jan 1, 1999 through Apr 19, 2005: 1605 SG structures deposited; 669 annotated as unknown function; 382 have no global structure similarity with proteins of known function (DALI z-score < 10)
42
Structural Genomics Function Inference I
Kinemages auto-generated from PDB files and fingerprints
Metallo-dependent hydrolase: 8-stranded βα (TIM) barrel fold; 17 members, 49 fingerprints
Unknown function: 7-stranded barrel fold; 30/49 fingerprints found
43
Residues hit by fingerprints
Figures made in VMD
1nfg
1m65
SCOP 51556
CASP5 T0147
Ycdx
Metallo-dependent hydrolase: 8-stranded βα (TIM) barrel fold; 17 members, 49 fingerprints
Unknown function: 7-stranded barrel fold; 30/49 fingerprints found
Acidic
Basic
Polar
Hphobic
44
Structural genomics function inference II
Figures made in VMD
1twu
Yyce
Antibiotic resistance protein: glyoxalase / bleomycin resistance / dioxygenase superfamily; 4 members, 62 fingerprints
Unknown function: not in SCOP 1.67, DALI z < 10 in Nov 2004; 46/62 fingerprints found; DALI z now > 10 with newly discovered family members
45
False +ve? Or potential new insight?
Therapeutically important family: nuclear receptor ligand-binding domain; unique α-helical fold; 23 members, 67 fingerprints
46
False +ve? Or potential new insight?
Unknown function: cyanobacterial orange carotenoid-binding protein; C-term: α+β nuclear transport factor (NTF2); N-term: unique fold, all α-helical; 48/67 fingerprints found (mostly in N-term)
Family: nuclear receptor ligand-binding domain; unique α-helical fold; 23 members, 67 fingerprints
47
Directions for further work

Structural genomics/unknown function inference
Sequence annotation: models/predictions (Arakaki et al., 2004)
Structure annotation: complement seq./struc. homology and functional site templates (Aloy et al., 2001; Ofran et al., 2005)
Classifying protein families based on similar fingerprints
Automatically inferring function-specific residues and geometric patterns (Polacco & Babbitt, 2006)
Detecting alternate binding sites (Ofran et al., 2005)
Drug discovery based on family-specific fingerprints

48
Acknowledgements

Subgraph mining / function inference
collaborators
Cheminformatics collaborators
Dimitris K. Agrafiotis
Fangqiang Zhu
Mike Farnum
Al Gibbs
Renee Desjarlais
Max Cummings

49
Publications

Bandyopadhyay, D., Huan, J., Liu, J., Prins, J., Snoeyink, J., Wang, W. and Tropsha, A. Structure-based function inference using protein family-specific fingerprints. Protein Science 15(6), pp. 1537-1543, June 2006. Supplementary material: http://www.cs.unc.edu/debug/papers/FuncInf
J. Huan, D. Bandyopadhyay, J. Prins, J. Snoeyink, A. Tropsha, and W. Wang. Distance-based identification of spatial motifs in proteins using constrained frequent subgraph mining. In proceedings of the LSS Computational Systems Bioinformatics conference (CSB), 2006.
J. Huan, D. Bandyopadhyay, W. Wang, J. Snoeyink, J. Prins, A. Tropsha. Comparing three graph representations for Journal of Computational Biology, 12(6), pp. 657-671, 2005.
J. Huan, W. Wang, D. Bandyopadhyay, J. Snoeyink, J. Prins, A. Tropsha (2004). Finding protein family-specific residue packing patterns in protein structure graphs. RECOMB 2004.
Bandyopadhyay, D. and J. Snoeyink (2004). Almost-Delaunay simplices: nearest neighbor relations for imprecise points. ACM-SIAM Symposium on Discrete Algorithms (SODA'04). http://www.cs.unc.edu/debug/papers/AlmDel
Bandyopadhyay, D. and J. Snoeyink (2004). Almost-Delaunay simplices: robust nearest neighbor relations for imprecise points in CGAL. Second CGAL User Workshop, 2004. Software: http://www.cs.unc.edu/debug/software
Bandyopadhyay, D., A. Tropsha and J. Snoeyink. Analyzing protein structure using almost-Delaunay tetrahedra. UNC-CS Technical Report TR03-043, 2003. Poster presented at RECOMB 2004, March 2004, San Diego, CA.

D. Bandyopadhyay et al., Prot. Sci. June 2006
50
Additional References

Overington JP, Al-Lazikani B, Hopkins AL. How many drug targets are there? Nat Rev Drug Discov. 2006 Dec; 5(12): 993-6. Review.
Polacco BJ, Babbitt PC. Automated discovery of 3D motifs for protein function annotation. Bioinformatics. 2006 Mar 15; 22(6): 723-30.
Hambly K, Danzer J, Muskal S, Debe DA. Interrogating the druggable genome with structural informatics. Mol Divers. 2006 Aug; 10(3): 273-81.
Ofran Y, Punta M, Schneider R, Rost B. Beyond annotation transfer by homology: novel protein-function prediction methods to assist drug discovery. Drug Discovery Today. 2005; 10: 1475-1482.
Arakaki AK, Zhang Y, Skolnick J. Large-scale assessment of the utility of low-resolution protein structures for biochemical function assignment. Bioinformatics. 2004 May 1; 20(7): 1087-96.
Huan J, Wang W, and Prins J. Efficient Mining of Frequent Subgraph in the Presence of Isomorphism. 2003, Proc. 3rd IEEE International Conference on Data Mining (ICDM), pp. 549-552.
Wang G, Dunbrack RL Jr. PISCES: a protein sequence culling server. Bioinformatics. 2003 Aug 12; 19(12): 1589-91.
Koonin EV, Galperin MY. Sequence-Evolution-Function: Computational Approaches in Comparative Genomics. 2002, Kluwer Academic Publishers (published online on NCBI bookshelf, 2003).
Aloy P, Querol E, Aviles FX, Sternberg MJ. Automated structure-based prediction of functional sites in proteins: applications to assessing the validity of inheriting protein function from homology in genome annotation and to protein docking. J Mol Biol. 2001 Aug 10; 311(2): 395-408.
Wilson CA, Kreychman J, Gerstein M. Assessing annotation transfer for genomics: quantifying the relations between protein sequence, structure and function through traditional and probabilistic scores. J Mol Biol. 2000 Mar 17; 297(1): 233-49.
Murzin AG, Brenner SE, Hubbard T, Chothia C. SCOP: a structural classification of proteins database for the investigation of sequences and structures. J Mol Biol. 1995 Apr 7; 247(4): 536-40.
Holm L, Sander C. Protein structure comparison by alignment of distance matrices. J Mol Biol. 1993 Sep 5; 233(1): 123-38.
