Title: Structure and Motion
1 Structure and Motion
- Jean-Claude Latombe, Computer Science Department, Stanford University
- NSF-ITR Meeting, November 14, 2002
2 Stanford's Participants
- PIs: L. Guibas, J.C. Latombe, M. Levitt
- Research Associate: P. Koehl
- Postdocs: F. Schwarzer, A. Zomorodian
- Graduate students: S. Apaydin (EE), S. Ieong (CS), R. Kolodny (CS), I. Lotan (CS), A. Nguyen (Sc. Comp.), D. Russel (CS), R. Singh (CS), C. Varma (CS)
- Undergraduate students: J. Greenberg (CS), E. Berger (CS)
- Collaborating faculty:
- A. Brunger (Molecular and Cellular Physiology)
- D. Brutlag (Biochemistry)
- D. Donoho (Statistics)
- J. Milgram (Math)
- V. Pande (Chemistry)
3 Problems Addressed
- Biological functions derive from the structures (shapes) achieved by molecules through motions
- → Determination, classification, and prediction of 3D protein structures
- → Modeling of molecular energy and simulation of folding and binding motion
4 What's New for Computer Science?
- Massive amount of experimental data
- Importance of similarities
- Multiple representations of structure
- Continuous energy functions
- Many objects forming deformable chains
- Many degrees of freedom
- Ensemble properties of pathways
5 Massive amount of experimental data
- → Abstract/simplify data sets into compact data structures
E.g., electron density map → medial axis
6 Importance of similarities
- → Segmentation/matching/scoring techniques
E.g., libraries of protein fragments [Kolodny, Koehl, Guibas, Levitt, JMB (2002)]
7 1tim Approximations
[Figure label: real protein]
8 Alignment of Structural Motifs [Singh and Saha; Kolodny and Linial]
- Problem:
- Determine if two structures share common motifs
- 2 (labelled) structures in $\mathbb{R}^3$: $A = \{a_1, a_2, \ldots, a_n\}$, $B = \{b_1, b_2, \ldots, b_m\}$
- Find subsequences $s_a$ and $s_b$ s.t. the substructures $\{a_{s_a(1)}, a_{s_a(2)}, \ldots, a_{s_a(l)}\}$ and $\{b_{s_b(1)}, b_{s_b(2)}, \ldots, b_{s_b(l)}\}$ are similar
- Twofold problem: alignment and correspondence
- Score ↔ Approximation ↔ Complexity
9 R. Singh and M. Saha. Identifying Structural Motifs in Proteins. Pacific Symp. on Biocomputing, Jan. 2003.
Iterative Closest Point (Besl-McKay) for alignment
→ Score: RMSD distance
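The alternation behind Iterative Closest Point can be sketched as follows: match each point to its closest counterpart, solve for the best rigid motion given those matches, and repeat, then report the RMSD as the score. This is a minimal illustration and not the code of Singh and Saha. It assumes 2D points so that the optimal rotation has a closed form (a 3D version would need an SVD- or quaternion-based superposition), and the names `icp_2d`, `best_rigid_2d`, `transform`, and `closest` are hypothetical.

```python
import math

def rmsd(P, Q):
    """Root-mean-square deviation between two paired 2D point lists."""
    total = sum((px - qx) ** 2 + (py - qy) ** 2
                for (px, py), (qx, qy) in zip(P, Q))
    return math.sqrt(total / len(P))

def best_rigid_2d(P, Q):
    """Least-squares rotation angle and translation mapping paired P onto Q."""
    n = len(P)
    cpx, cpy = sum(p[0] for p in P) / n, sum(p[1] for p in P) / n
    cqx, cqy = sum(q[0] for q in Q) / n, sum(q[1] for q in Q) / n
    dot = cross = 0.0
    for (px, py), (qx, qy) in zip(P, Q):
        ax, ay, bx, by = px - cpx, py - cpy, qx - cqx, qy - cqy
        dot += ax * bx + ay * by      # sum of dot products of centered pairs
        cross += ax * by - ay * bx    # sum of 2D cross products
    theta = math.atan2(cross, dot)    # angle maximizing the overlap
    c, s = math.cos(theta), math.sin(theta)
    # Translation sends the rotated centroid of P onto the centroid of Q.
    return theta, cqx - (c * cpx - s * cpy), cqy - (s * cpx + c * cpy)

def transform(P, theta, tx, ty):
    """Rotate every point by theta about the origin, then translate."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in P]

def closest(p, Q):
    """Point of Q nearest to p (brute force)."""
    return min(Q, key=lambda q: (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)

def icp_2d(P, Q, iterations=20):
    """Align P to Q; returns the moved copy of P and its RMSD to Q."""
    cur = list(P)
    for _ in range(iterations):
        matched = [closest(p, Q) for p in cur]              # correspondence step
        cur = transform(cur, *best_rigid_2d(cur, matched))  # alignment step
    return cur, rmsd(cur, [closest(p, Q) for p in cur])
```

Like any ICP variant, this converges only to a local optimum, so the starting pose matters; the fixed iteration count is an arbitrary choice for the sketch.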
10 R. Singh and M. Saha. Identifying Structural Motifs in Proteins. Pacific Symp. on Biocomputing, Jan. 2003.
[Figure labels: Trypsin; Trypsin active site]
11 R. Singh and M. Saha. Identifying Structural Motifs in Proteins. Pacific Symp. on Biocomputing, Jan. 2003.
Trypsin active site against 42 trypsin-like proteins
12 Multiple representations of structure
ProShape software [Koehl, Levitt (Stanford), Edelsbrunner (Duke)]
13 Statistical potentials for proteins based on alpha complex [Guibas, Koehl, Zomorodian]
- Decoys generated using physical potentials
- Select best decoys using distance information
14
- Continuous energy functions
- Many objects in deformable chains
- → Many pairs of objects, but relatively few are close enough to interact
- → Data structures that capture proximity, but undergo small or rare changes
During motion simulation:
- detect steric clashes (self-collisions)
- find pairs of atoms closer than cutoff
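As a baseline for the second task, here is a minimal sketch of a uniform grid (cell list) that reports all atom pairs closer than a cutoff. It assumes atoms are given as (x, y, z) tuples and that the cell side equals the cutoff, so only the 27 surrounding cells need to be scanned; the function name `pairs_within_cutoff` is hypothetical.

```python
import math
from collections import defaultdict
from itertools import product

def pairs_within_cutoff(atoms, cutoff):
    """Return sorted index pairs (i, j), i < j, whose distance is below cutoff."""
    # Hash every atom into a cubic cell whose side equals the cutoff.
    grid = defaultdict(list)
    for idx, (x, y, z) in enumerate(atoms):
        cell = (math.floor(x / cutoff), math.floor(y / cutoff), math.floor(z / cutoff))
        grid[cell].append(idx)

    pairs = set()
    for (cx, cy, cz), members in grid.items():
        # Atoms closer than the cutoff must lie in the same or an adjacent cell.
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            for j in grid.get((cx + dx, cy + dy, cz + dz), ()):
                for i in members:
                    if i < j and math.dist(atoms[i], atoms[j]) < cutoff:
                        pairs.add((i, j))
    return sorted(pairs)
```

The grid is rebuilt from scratch whenever atoms move, which is the cost that structures exploiting small or rare changes aim to avoid.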
15
- Other application domains:
- Modular reconfigurable robots
- Reconstructive surgery
16
- Fixed bounding-volume hierarchies don't work
17
- Instead, exploit what doesn't change: chain topology → Adaptive BV hierarchies [Guibas, Nguyen, Russel, Zhang; Lotan, Schwarzer, Halperin, Latombe (SoCG'02)]
18
- Wrapped bounding sphere hierarchies [Guibas, Nguyen, Russel, Zhang (SoCG 2002)]
- WBSH undergoes small number of changes
- Self-collision:
- $O(n \log n)$ in $\mathbb{R}^2$; $O(n^{2-2/d})$ in $\mathbb{R}^d$, $d \geq 3$
19
- ChainTrees [Lotan, Schwarzer, Halperin, Latombe (SoCG'02)]
Assumption: Few degrees of freedom change at each motion step (e.g., Monte Carlo simulation)
- Find all pairs of atoms closer than a given cutoff
- Find which energy terms can be reused
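To illustrate the general idea of a bounding-volume hierarchy whose topology follows the chain and never changes, here is a simplified sketch. It is not the ChainTree or the wrapped bounding sphere hierarchy from the cited papers. It assumes atoms are points listed in chain order, uses spheres that enclose their two children, and repairs the tree after a single atom moves (a torsion change, which moves a whole sub-chain, is not modeled). All names are hypothetical.

```python
import math

class Node:
    def __init__(self, lo, hi, center, radius, left=None, right=None):
        self.lo, self.hi = lo, hi                  # node covers atoms[lo:hi]
        self.center, self.radius = center, radius
        self.left, self.right = left, right

def enclose(a, b):
    """Smallest sphere containing spheres a and b, each given as (center, radius)."""
    (ca, ra), (cb, rb) = a, b
    d = math.dist(ca, cb)
    if d + rb <= ra:
        return a                                   # b lies inside a
    if d + ra <= rb:
        return b                                   # a lies inside b
    r = (d + ra + rb) / 2
    t = (r - ra) / d                               # shift from ca toward cb
    return tuple(x + t * (y - x) for x, y in zip(ca, cb)), r

def build(atoms, lo, hi):
    """Balanced binary tree over consecutive atoms; leaves have radius 0."""
    if hi - lo == 1:
        return Node(lo, hi, atoms[lo], 0.0)
    mid = (lo + hi) // 2
    left, right = build(atoms, lo, mid), build(atoms, mid, hi)
    center, radius = enclose((left.center, left.radius), (right.center, right.radius))
    return Node(lo, hi, center, radius, left, right)

def refit(node, atoms, i):
    """After atoms[i] has moved, repair spheres along one root-to-leaf path."""
    if node.left is None:
        node.center = atoms[i]
        return
    refit(node.left if i < node.left.hi else node.right, atoms, i)
    node.center, node.radius = enclose((node.left.center, node.left.radius),
                                       (node.right.center, node.right.radius))

def close_pairs(a, b, cutoff, out):
    """Append to out every atom pair (i, j), i < j, closer than cutoff."""
    if math.dist(a.center, b.center) - a.radius - b.radius >= cutoff:
        return                                     # spheres too far apart: prune
    if a.left is None and b.left is None:
        if a.lo < b.lo:
            out.append((a.lo, b.lo))
        return
    if a is b:                                     # self-test: descend into children
        close_pairs(a.left, a.left, cutoff, out)
        close_pairs(a.left, a.right, cutoff, out)
        close_pairs(a.right, a.right, cutoff, out)
    elif b.left is None or (a.left is not None and a.radius >= b.radius):
        close_pairs(a.left, b, cutoff, out)        # split the larger node
        close_pairs(a.right, b, cutoff, out)
    else:
        close_pairs(a, b.left, cutoff, out)
        close_pairs(a, b.right, cutoff, out)
```

A typical use is `root = build(atoms, 0, len(atoms))`, then `close_pairs(root, root, cutoff, out)` after each `refit`. Only the sphere positions and sizes are repaired; the tree shape stays fixed, which is the property the slide attributes to chain topology.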
20
- ChainTrees [Lotan, Schwarzer, Halperin, Latombe (SoCG'02)]
Updating
Finding interacting pairs (in practice, sublinear)
21
- ChainTrees: application to MC simulation (comparison to grid method)
[Plot labels: m = 1; m = 5]
22
- Run new series of experiments with more complex energy field: EEF1 [Lazaridis and Karplus] (with Pande)
- Use library of fragments (with Koehl)
Open problem: How to find good moves to make when the conformation is compact and random moves are rejected with high probability?
23
- Future work: Spanner for deformable chain [Agarwal, Gao (Duke); Nguyen, Zhang (Stanford)]
[Figure label: 3HVT]
Capture proximity information with a sparse spanner
24 Many degrees of freedom
- → Tools to explore large dimensional conformation space:
- Sampling strategies
- Nearest neighbors
25 Sampling structures by combining fragments [Kolodny, Levitt]
Library of protein fragments
→ Discrete set of candidate structures
26 Nearest neighbors in high-dimensional space [Lotan and Schwarzer]
Find k nearest neighbors of a given protein conformation in a set of n conformations (cRMS, dRMS)
Idea: Cut backbone into m equal subsequences
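A minimal sketch of how the cut could feed a nearest-neighbor search: each of the m subsequences is replaced by its centroid, and conformations are compared with dRMS on the reduced points. Two things here are assumptions rather than statements from the slide: that the subsequences are averaged, and that dRMS is the usual root-mean-square difference of intra-molecular distances. The names `averaged`, `drms`, and `k_nearest` are hypothetical.

```python
import math
from itertools import combinations

def averaged(conf, m):
    """Replace each of m consecutive, equal-length pieces by its centroid.

    Assumes len(conf) is a multiple of m.
    """
    size = len(conf) // m
    return [tuple(sum(axis) / size for axis in zip(*conf[k * size:(k + 1) * size]))
            for k in range(m)]

def drms(a, b):
    """Root-mean-square difference between the internal distance sets of a and b."""
    pairs = list(combinations(range(len(a)), 2))
    total = sum((math.dist(a[i], a[j]) - math.dist(b[i], b[j])) ** 2
                for i, j in pairs)
    return math.sqrt(total / len(pairs))

def k_nearest(query, confs, k, m):
    """Indices of the k conformations closest to query in reduced-dRMS (brute force)."""
    q = averaged(query, m)
    reduced = [averaged(c, m) for c in confs]
    return sorted(range(len(confs)), key=lambda i: drms(q, reduced[i]))[:k]
```

Because dRMS compares internal distances only, it needs no superposition step, and the reduction from n atoms to m points cuts the number of distance pairs from n(n-1)/2 to m(m-1)/2.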
27 Nearest neighbors in high-dimensional space [Lotan and Schwarzer]
100,000 decoys of 1CTF (Park-Levitt set)
Computation of 100 NN of each conformation:
- Full rep., dRMS (brute force): 84 h
- Ave. rep., dRMS (brute force): 4.8 h
- SVD red. rep., dRMS (brute force): 41 min
- SVD red. rep., dRMS (kd-tree): 19 min
80% of computed NNs are true NNs
kd-tree software from ANN library (U. Maryland)
28 Ensemble properties of pathways
- → Stochastic nature of molecular motion requires characterizing average properties of many pathways
29 Example 1: Probability of Folding pfold
"We stress that we do not suggest using pfold as a transition coordinate for practical purposes as it is very computationally intensive." Du, Pande, Grosberg, Tanaka, and Shakhnovich, On the Transition Coordinate for Protein Folding, Journal of Chemical Physics (1998).
[Figure labels: Folded set; Unfolded set]
30 Example 2: Ligand-Protein Interaction [Sept, Elcock and McCammon '99]
10K to 30K independent simulations
31 Probabilistic Roadmap [Apaydin, Brutlag, Hsu, Guestrin, Latombe (RECOMB'02, ECCB'02)]
Idea: Capture the stochastic nature of molecular motion by a network of randomly selected conformations and by assigning probabilities to edges
32 Probabilistic Roadmap [Apaydin, Brutlag, Hsu, Guestrin, Latombe (RECOMB'02, ECCB'02)]
- One linear equation per node
- Solution gives pfold for all nodes
- No explicit simulation run
- All pathways are taken into account
- Sparse linear system
[Figure: node i with self-transition probability $P_{ii}$ and edges to neighbors j, k, l, m with probabilities $P_{ij}$, $P_{ik}$, $P_{il}$, $P_{im}$]
Let $f_i = p_{fold}(i)$. After one step: $f_i = P_{ii} f_i + P_{ij} f_j + P_{ik} f_k + P_{il} f_l + P_{im} f_m$
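The one-step relation gives one linear equation per node, and the resulting sparse system can be solved without running any simulation. The sketch below uses Gauss-Seidel sweeps over a toy roadmap. It assumes the usual boundary conditions (pfold is 1 on the folded set and 0 on the unfolded set) and that every other node has a self-transition probability below 1. The solver choice and the name `pfold` are illustrative and are not taken from the cited papers.

```python
def pfold(P, folded, unfolded, tol=1e-12, max_sweeps=100000):
    """Solve f_i = sum_j P[i][j] * f_j with f = 1 on folded and f = 0 on unfolded.

    P maps each node to a dict {neighbor: transition probability}; each row
    sums to 1 and may include a self-loop P[i][i].
    """
    f = {i: (1.0 if i in folded else 0.0) for i in P}
    free = [i for i in P if i not in folded and i not in unfolded]
    for _ in range(max_sweeps):
        change = 0.0
        for i in free:
            p_ii = P[i].get(i, 0.0)
            # Move the self-loop term to the left-hand side:
            # (1 - P_ii) * f_i = sum over j != i of P_ij * f_j
            new = sum(p * f[j] for j, p in P[i].items() if j != i) / (1.0 - p_ii)
            change = max(change, abs(new - f[i]))
            f[i] = new
        if change < tol:
            break
    return f
```

For example, on a four-node chain with node 0 unfolded, node 3 folded, `P[1] = {1: 0.2, 0: 0.4, 2: 0.4}` and `P[2] = {1: 0.5, 3: 0.5}`, the equations reduce to f_1 = f_2 / 2 and f_2 = f_1 / 2 + 1/2, so f_1 = 1/3 and f_2 = 2/3.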
33 Probabilistic Roadmap: Correlation with MC Approach
- 1ROP (repressor of primer)
- 2 α helices
- 6 DOF
34 Probabilistic Roadmap: Computation Times (1ROP)
Monte Carlo:
- Over $10^6$ energy computations
- Over 11 days of computer time
- 49 conformations
Roadmap:
- 15,000 energy computations
- 1 - 1.5 hours of computer time
- 5000 conformations
4 orders of magnitude speedup!
35 Future work: Probabilistic Roadmap
- Non-uniform sampling strategies
- Encoding molecular dynamics into probabilistic roadmaps (with V. Pande)
- Quantitative experiments with ligand-protein binding (with V. Pande)
36 Bio-X Clark Center
37 The following slides relate to non-research issues. I do not plan to present them. Jack and Leo may want to use the contents of some of them for their own presentations.
38 Education
- Tutorial on Delaunay, Alpha-Shape and Pockets (Koehl)
- A biocomputing Notebook (Koehl)
- Biocomputation lectures in pre-existing classes:
- CS326 (motion planning): molecular motion, probabilistic roadmaps, self-collision detection (Latombe)
- CS468 (intro to computational topology): finding pockets and tunnels in molecules, computing surface areas and volumes and their derivatives (Zomorodian)
- New class on Algorithmic Biology (Batzoglou, Guibas, Latombe)
- Graduate Curriculum Committee, Bio-Engineering Dept., Stanford (Latombe)
39 Trained Students (1/2)
- PhD students:
- Serkan Apaydin, EE
- An Nguyen, Scientific Computing
- Carlos Guestrin, CS (Daphne Koller's group)
- Itay Lotan, CS
- Rachel Kolodny, CS
- Daniel Russel, CS
- Samuel Ieong, CS
Most graduate students have a principal advisor in CS and a secondary one in a bio-related department (Levitt, Brutlag, Pande)
40 Trained Students (2/2)
- Graduated Master's students:
- Rohit Singh: finding motifs in proteins; best Stanford CS master's thesis, June '02; current position: bioinformatics company in San Diego
- Chris Varma: study of ligand-protein interaction with probabilistic roadmaps, June '02; current position: PhD student, Harvard/MIT Biomedical program
- Current Master's student:
- Ben Wong: modeling T cell activity
- Undergraduates:
- Eric Berger, CS, Stanford, summer internship
- Julie Greenberg, CS, Harvard, summer internship
41 Visitors
- Prof. Alberto Munoz, Math Dept., University of Yucatan, Mexico; 3 months, Summer '02; haptic interaction and probabilistic roadmaps
- Prof. Ileana Streinu, Smith College; 6 months, from Sept. '02; protein folding
42 Interactions Within Stanford
- Guibas and Levitt, with J. Milgram (Math): topology of configuration spaces of chains
- Guibas, with V. Pande (Chemistry) and D. Donoho (Statistics): non-linear multi-resolution analysis of molecular motions
- Latombe and Apaydin, with D. Brutlag (Biochemistry) and V. Pande: probabilistic roadmaps
- Latombe and Lotan, with V. Pande: efficient MC simulation
43 Interactions Outside Stanford
- Collision Detection for Deforming Necklaces, P. Agarwal, L. Guibas, A. Nguyen, D. Russel, and L. Zhang. Invited to special issue of Comp. Geom., Theory and Applications, following presentation at SoCG'02.
- Kinetic Medians and kd-Trees, P. Agarwal, J. Gao, and L. Guibas. Proc. 10th European Symp. Algorithms, LNCS 2461, Springer-Verlag, pp. 5-16, 2002.
- Stochastic Roadmap Simulation: An Efficient Representation and Algorithm for Analyzing Molecular Motion, M.S. Apaydin, D.L. Brutlag, C. Guestrin, D. Hsu, and J.C. Latombe. Proc. RECOMB'02, Washington D.C., pp. 12-21, 2002.
- Efficient Maintenance and Self-Collision Testing for Kinematic Chains, I. Lotan, F. Schwarzer, D. Halperin, and J.C. Latombe. SoCG'02, pp. 43-52, June 2002.
- Stochastic Conformational Roadmaps for Computing Ensemble Properties of Molecular Motion, M.S. Apaydin, D.L. Brutlag, C. Guestrin, D. Hsu, and J.C. Latombe. Workshop on Algorithmic Foundations of Robotics (WAFR), Nice, Dec. 2002.
44 Attendance at Conferences
- BCATS '01 and '02: Bio-Computation At Stanford
- RECOMB '02: Int. Conf. on Research in Computational Molecular Biology
- ISMB '02: Int. Conf. on Intelligent Syst. for Molecular Biology
- ECCB 2002: European Conf. on Computational Biology
- Biophysical Society Symp. on Molecular Simulations in Structural Biology, 2002
- SoCG 2002: ACM Symp. on Computational Geometry
45 Outreach
- Latombe and Levitt serve as members of the Scientific Leadership Council of Stanford's Bio-X program
- Presentations: Stanford's Bio-X Symposium (3/02), Stanford's Computer Forum (3/02), Berkeley's Broad Area Seminar (4/02)
- Conference committees: Guibas, program committee, WAFR'02 and SoCG'03; Latombe, program committee, 1st IEEE Bioinformatics Conf. '03; Apaydin, organization committee of BCATS'02
46 The following slides are extra slides that I removed from my presentation for lack of time
47 General Goals
- Larger proteins considered → computational efficiency
- Diversity of molecules and interactions → computational abstractions
- Extension of in-silico experiments → computational correctness
- → Enable biological studies that were not possible before, more systematically
48 Approach
- Select hard problems
- Close interaction between computer scientists (Guibas, Koehl, Latombe) and biologists (Koehl, Levitt, Brutlag, Pande, Brunger)
- Most graduate students are CS students with secondary advisor in biology
- Perform extensive tests
49
- Electron density map → Medial axis [Guibas, Brunger, Russel]
- Medial axis of iso-surfaces to estimate backbone
- Cleaning and simplification of axis to filter noise out
- Persistence of features across multiple iso-surfaces
50 Continuous energy function
- → Essential for protein structure prediction and molecular motion simulation
- Statistical potentials based on alpha complex
- Maintenance of energy values during simulation
51
- Instead, exploit what doesn't change: chain topology → Adaptive BV hierarchies
- Balanced binary trees of constant topology
- Efficient repair of position/size of BVs [Guibas, Nguyen, Russel, Zhang; Lotan, Schwarzer, Halperin, Latombe (SoCG'02)]
52
- Future work: Spanner for deformable chain [Agarwal, Gao (Duke); Nguyen, Zhang (Stanford)]
53 Probabilistic Roadmap
- 1ROP (repressor of primer)
- 2 α helices
- 6 DOF
- 1HDD (Engrailed homeodomain)
- 3 α helices
- 12 DOF
H-P energy model with steric clash exclusion [Sun et al., '95]