Title: Structure and Motion
1 Structure and Motion
- Jean-Claude Latombe, Computer Science Department, Stanford University
- NSF-ITR Meeting on November 14, 2002
2 Stanford's Participants
- PIs: L. Guibas, J.C. Latombe, M. Levitt
- Research Associate: P. Koehl
- Postdocs: F. Schwarzer, A. Zomorodian
- Graduate students: S. Apaydin (EE), S. Ieong (CS), R. Kolodny (CS), I. Lotan (CS), A. Nguyen (Sc. Comp.), D. Russel (CS), R. Singh (CS), C. Varma (CS)
- Undergraduate students: J. Greenberg (CS), E. Berger (CS)
- Collaborating faculty:
  - A. Brunger (Molecular and Cellular Physiology)
  - D. Brutlag (Biochemistry)
  - D. Donoho (Statistics)
  - J. Milgram (Math)
  - V. Pande (Chemistry)
3 Problem Domains
- Biological functions derive from the structures (shapes) achieved by molecules through motions
- → Determination, classification, and prediction of 3D protein structures
- → Modeling of molecular energy and simulation of folding and binding motion
4 What's New/Interesting for Computer Science?
- Massive amount of experimental data
- Importance of similarities
- Multiple representations of structure
- Continuous energy functions
- Many objects forming deformable chains
- Many degrees of freedom
- Ensemble properties of pathways
5 Importance of similarities
- → Segmentation/matching/scoring techniques
- E.g., libraries of protein fragments [Kolodny, Koehl, Guibas, Levitt, JMB (2002)]
6 1tim Approximations
[Figure: approximations of 1tim alongside the real protein]
7 Alignment of Structural Motifs [Singh and Saha; Kolodny and Linial]
- Problem
  - Determine if two structures share common motifs
  - 2 (labelled) structures in $\mathbb{R}^3$: $A = \{a_1, a_2, \ldots, a_n\}$, $B = \{b_1, b_2, \ldots, b_m\}$
  - Find subsequences $s_a$ and $s_b$ s.t. the substructures $\{a_{s_a(1)}, a_{s_a(2)}, \ldots, a_{s_a(l)}\}$ and $\{b_{s_b(1)}, b_{s_b(2)}, \ldots, b_{s_b(l)}\}$ are similar
- Twofold problem: alignment and correspondence
- Score ↔ Approximation ↔ Complexity
8 R. Singh and M. Saha. Identifying Structural Motifs in Proteins. Pacific Symp. on Biocomputing, Jan. 2003.
Iterative Closest Point (Besl-McKay) for alignment
→ Score: RMSD distance
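A minimal sketch of the idea on this slide, not the implementation from the cited paper: Iterative Closest Point alternates between guessing a correspondence (each point of A is matched to its closest point of B) and computing the rigid motion that best superimposes the matched points, and the final RMSD serves as the score. The brute-force closest-point search, the SVD-based (Kabsch) superposition step, and all function names are assumptions made for illustration.

```python
import numpy as np

def kabsch(P, Q):
    """Rotation R and translation t minimizing the RMSD of (P @ R.T + t) to Q."""
    Pc, Qc = P.mean(axis=0), Q.mean(axis=0)
    H = (P - Pc).T @ (Q - Qc)                  # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T    # proper rotation (no reflection)
    return R, Qc - R @ Pc

def rmsd(P, Q):
    """Root-mean-square distance between paired points."""
    return float(np.sqrt(np.mean(np.sum((P - Q) ** 2, axis=1))))

def icp(A, B, iters=50, tol=1e-10):
    """Align A onto B. Returns (moved A, matched indices into B, RMSD score)."""
    X = A.copy()
    prev = np.inf
    for _ in range(iters):
        # Correspondence step: closest point of B for every point of X.
        d2 = ((X[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
        idx = d2.argmin(axis=1)
        # Alignment step: best rigid motion for the current correspondence.
        R, t = kabsch(X, B[idx])
        X = X @ R.T + t
        err = rmsd(X, B[idx])
        if abs(prev - err) < tol:
            break
        prev = err
    return X, idx, err

# Toy data: B is a 3x3x3 lattice, A is B after a small rigid motion.
g = np.arange(3) * 4.0
B = np.array([[x, y, z] for x in g for y in g for z in g])
c, s = np.cos(0.05), np.sin(0.05)
Rz = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
A = B @ Rz.T + np.array([0.3, -0.2, 0.1])
aligned, idx, err = icp(A, B)
```

ICP only converges to a local optimum, so on real structures the result depends on the starting pose; the toy data above starts close to the answer on purpose.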
9 R. Singh and M. Saha. Identifying Structural Motifs in Proteins. Pacific Symp. on Biocomputing, Jan. 2003.
[Figure labels: Trypsin; Trypsin active site]
10 R. Singh and M. Saha. Identifying Structural Motifs in Proteins. Pacific Symp. on Biocomputing, Jan. 2003.
Trypsin active site against 42 trypsin-like proteins
11 Multiple representations of structure
ProShape software [Koehl, Levitt (Stanford), Edelsbrunner (Duke)]
12 Statistical potentials for proteins based on alpha complex [Guibas, Koehl, Zomorodian]
- Decoys generated using physical potentials
- Select best decoys using distance information
13 Continuous energy function
- Many objects in deformable chains
- → Many pairs of objects, but relatively few are close enough to interact
- → Data structures that capture proximity, but undergo small or rare changes
- During motion simulation:
  - detect steric clashes (self-collisions)
  - find pairs of atoms closer than cutoff
  - find which energy terms can be reused
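For reference, finding all pairs of atoms closer than a cutoff is commonly done with a uniform grid (cell list), which is the kind of baseline the later ChainTree comparison mentions. The following is a minimal sketch under that assumption, not code from the project; the function name `pairs_within_cutoff` and the toy data are hypothetical.

```python
from collections import defaultdict
from itertools import product
import numpy as np

def pairs_within_cutoff(coords, cutoff):
    """Sorted list of index pairs (i, j), i < j, with distance < cutoff."""
    # Bin atoms into cubic cells of side `cutoff`; any close pair then lies
    # in the same cell or in two adjacent cells.
    cells = defaultdict(list)
    keys = np.floor(coords / cutoff).astype(int).tolist()
    for i, key in enumerate(keys):
        cells[tuple(key)].append(i)
    offsets = list(product((-1, 0, 1), repeat=3))
    found = set()
    for key, members in cells.items():
        for off in offsets:
            other = cells.get(tuple(k + o for k, o in zip(key, off)))
            if not other:
                continue
            for i in members:
                for j in other:
                    if i < j and np.linalg.norm(coords[i] - coords[j]) < cutoff:
                        found.add((i, j))
    return sorted(found)

# Toy usage: 60 random "atoms" in a 10 x 10 x 10 box, cutoff 2.0.
rng = np.random.default_rng(0)
pts = rng.uniform(0.0, 10.0, size=(60, 3))
close_pairs = pairs_within_cutoff(pts, 2.0)
```

The grid has to be refilled after every move, even when only a few atoms change position, which is the cost the proximity data structures on the following slides aim to avoid.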
14 Other application domains
- Modular reconfigurable robots
- Reconstructive surgery
15 Fixed bounding-volume hierarchies don't work
- Instead, exploit what doesn't change: chain topology
- → Adaptive BV hierarchies [Guibas, Nguyen, Russel, Zhang; Lotan, Schwarzer, Halperin, Latombe (SoCG 2002)]
16 Wrapped bounding sphere hierarchies [Guibas, Nguyen, Russel, Zhang (SoCG 2002)]
- WBSH undergoes small number of changes
- Self-collision: $O(n \log n)$ in $\mathbb{R}^2$; $O(n^{2-2/d})$ in $\mathbb{R}^d$, $d \ge 3$
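To make the self-collision query concrete, here is a simplified, static sketch of a sphere hierarchy built along the chain. It is not the wrapped hierarchy of the cited paper: each node's sphere is centered at the centroid of the atoms below it (enclosing them, but not minimal), nothing is updated under motion, and the class and function names are hypothetical. What it does share with the slide is that the tree follows the chain topology, so subtrees correspond to contiguous chain segments.

```python
import numpy as np

class SphereNode:
    """Bounding sphere over chain atoms lo..hi-1 (a leaf holds one atom)."""
    def __init__(self, centers, radius, lo, hi):
        self.lo, self.hi = lo, hi
        pts = centers[lo:hi]
        self.center = pts.mean(axis=0)
        # Encloses every atom sphere in the segment; not the minimal sphere.
        self.radius = float(np.linalg.norm(pts - self.center, axis=1).max()) + radius
        self.children = []
        if hi - lo > 1:
            mid = (lo + hi) // 2
            self.children = [SphereNode(centers, radius, lo, mid),
                             SphereNode(centers, radius, mid, hi)]

def _cross(a, b, centers, radius, out):
    """Collect clashing pairs with one atom under a and one under b."""
    if np.linalg.norm(a.center - b.center) > a.radius + b.radius:
        return  # bounding spheres are disjoint: prune both subtrees
    if not a.children and not b.children:
        i, j = a.lo, b.lo
        # Bonded neighbours (j == i + 1) are not counted as clashes.
        if j - i > 1 and np.linalg.norm(centers[i] - centers[j]) < 2 * radius:
            out.append((i, j))
        return
    if a.children and (not b.children or a.hi - a.lo >= b.hi - b.lo):
        for child in a.children:
            _cross(child, b, centers, radius, out)
    else:
        for child in b.children:
            _cross(a, child, centers, radius, out)

def self_collisions(node, centers, radius, out=None):
    """All overlapping non-adjacent atom pairs within one chain."""
    out = [] if out is None else out
    if node.children:
        left, right = node.children
        self_collisions(left, centers, radius, out)
        self_collisions(right, centers, radius, out)
        _cross(left, right, centers, radius, out)
    return out

# Toy chain: 200 atoms of radius 0.6 on a unit-step random walk.
rng = np.random.default_rng(0)
steps = rng.normal(size=(200, 3))
steps /= np.linalg.norm(steps, axis=1, keepdims=True)
centers = np.cumsum(steps, axis=0)
root = SphereNode(centers, 0.6, 0, len(centers))
clashes = sorted(self_collisions(root, centers, 0.6))
```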
17 ChainTrees [Lotan, Schwarzer, Halperin, Latombe (SoCG 2002)]
18 ChainTrees [Lotan, Schwarzer, Halperin, Latombe (SoCG 2002)]
Assumption: Few degrees of freedom change at each motion step (e.g., Monte Carlo simulation)
Updating
Finding interacting pairs (in practice, sublinear)
19 ChainTrees: Application to MC simulation (comparison to grid method)
20 Many degrees of freedom
- → Tools to explore high-dimensional conformational (structure) spaces
  - Structure sampling [Kolodny, Levitt]
  - Finding nearest neighbors [Lotan, Schwarzer]
21 Sampling structures by combining fragments [Kolodny, Levitt]
Library of protein fragments
→ Discrete set of candidate structures
22 Nearest neighbors in high-dimensional space [Lotan, Schwarzer]
Find k nearest neighbors of a given protein conformation in a set of n conformations (cRMS, dRMS)
Idea: Cut backbone into m equal subsequences
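A rough sketch of how cutting the backbone can shrink the search space. The slide only states the cut; replacing each subsequence by its centroid, using dRMS on the reduced chain, and searching by brute force are assumptions made here for illustration, and all names are hypothetical.

```python
import numpy as np

def reduce_backbone(coords, m):
    """Cut an (n, 3) backbone into m consecutive pieces; return (m, 3) centroids."""
    return np.array([piece.mean(axis=0) for piece in np.array_split(coords, m)])

def distance_vector(coords):
    """All intra-molecular distances (upper triangle) as a flat vector."""
    diff = coords[:, None, :] - coords[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=2))
    return d[np.triu_indices(len(coords), k=1)]

def drms(P, Q):
    """dRMS: RMS difference between the two intra-molecular distance sets."""
    return float(np.sqrt(np.mean((distance_vector(P) - distance_vector(Q)) ** 2)))

def k_nearest(query, conformations, k, m):
    """Indices of the k conformations closest to `query` in reduced dRMS."""
    feats = np.array([distance_vector(reduce_backbone(c, m)) for c in conformations])
    q = distance_vector(reduce_backbone(query, m))
    dist = np.sqrt(((feats - q) ** 2).mean(axis=1))
    return np.argsort(dist)[:k]

# Toy data: 20 random-walk "decoys" of 64 residues; the query is decoy 3
# with a small perturbation.
rng = np.random.default_rng(0)
decoys = [np.cumsum(rng.normal(size=(64, 3)), axis=0) for _ in range(20)]
query = decoys[3] + 1e-3 * rng.normal(size=(64, 3))
nearest = k_nearest(query, decoys, k=5, m=8)
```

With m = 8 the distance vector has 28 entries instead of 2016 for the full 64-residue chain, which is what makes spatial indexes such as kd-trees usable on the reduced representation.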
23 Nearest neighbors in high-dimensional space [Lotan and Schwarzer]
- 100,000 decoys of 1CTF (Park-Levitt set)
- Computation of 100 NN of each conformation
- 80% of computed NNs are true NNs
- kd-tree software from ANN library (U. Maryland)
24 Ensemble properties of pathways
- → Stochastic nature of molecular motion requires characterizing average properties of many pathways
- Probabilistic conformational roadmaps
- Applications to protein folding and ligand-protein binding [Apaydin, Brutlag, Guestrin, Hsu, Latombe]
25 Example: Probability of Folding (pfold)
"We stress that we do not suggest using pfold as a transition coordinate for practical purposes as it is very computationally intensive." Du, Pande, Grosberg, Tanaka, and Shakhnovich, "On the Transition Coordinate for Protein Folding," Journal of Chemical Physics (1998).
[Figure labels: Folded set; Unfolded set]
26 Probabilistic Roadmap [Apaydin, Brutlag, Hsu, Guestrin, Latombe (RECOMB 2002, ECCB 2002)]
Idea: Capture the stochastic nature of molecular motion by a network of randomly selected conformations and by assigning probabilities to edges
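A toy sketch of the idea, under stated assumptions: conformations are random points in a 2D space, each node is connected to its k nearest sampled neighbors, and edge probabilities use a Metropolis-style weight on the energy difference with the leftover probability placed on a self-loop. The slide does not give the weighting rule, so that rule, the function name `build_roadmap`, and the quadratic energy are illustrative choices only.

```python
import numpy as np

def build_roadmap(nodes, energy, k=4, kT=1.0):
    """Row-stochastic transition matrix P over randomly sampled conformations."""
    n = len(nodes)
    E = np.array([energy(x) for x in nodes])
    d2 = ((nodes[:, None, :] - nodes[None, :, :]) ** 2).sum(axis=2)
    P = np.zeros((n, n))
    for i in range(n):
        # Edges: the k closest sampled conformations, excluding i itself.
        nbrs = [j for j in np.argsort(d2[i]) if j != i][:k]
        for j in nbrs:
            dE = E[j] - E[i]
            # Downhill moves get full weight, uphill moves a Boltzmann factor.
            P[i, j] = np.exp(-max(dE, 0.0) / kT) / len(nbrs)
        P[i, i] = 1.0 - P[i].sum()   # remaining probability: stay at node i
    return P

# Toy usage: 50 random 2D "conformations" on a quadratic energy landscape.
rng = np.random.default_rng(1)
nodes = rng.uniform(-1.0, 1.0, size=(50, 2))
P = build_roadmap(nodes, energy=lambda x: float(x @ x))
```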
27 Probabilistic Roadmap
- One linear equation per node
- Solution gives pfold for all nodes
- No explicit simulation run
- All pathways are taken into account
- Sparse linear system
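[Figure: roadmap node i with self-transition probability $P_{ii}$ and edges to neighbors j, k, l, m with transition probabilities $P_{ij}$, $P_{ik}$, $P_{il}$, $P_{im}$]
Let $f_i$ = pfold(i). After one step: $f_i = P_{ii} f_i + P_{ij} f_j + P_{ik} f_k + P_{il} f_l + P_{im} f_m$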
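The one-step relation gives one linear equation per node, with pfold fixed to 1 on the folded set and 0 on the unfolded set. A minimal sketch of solving that system, using a dense solver and a 5-node chain invented for illustration (the function name and the toy transition matrix are assumptions, not data from the talk):

```python
import numpy as np

def pfold(P, folded, unfolded):
    """Solve f_i = sum_j P[i, j] * f_j with f = 1 on folded, f = 0 on unfolded."""
    n = P.shape[0]
    f = np.zeros(n)
    f[list(folded)] = 1.0
    boundary = set(folded) | set(unfolded)
    free = [i for i in range(n) if i not in boundary]
    # Move unknowns to the left-hand side: (I - P_free,free) f_free = b, where
    # b_i is the probability of stepping from i directly into the folded set.
    A = np.eye(len(free)) - P[np.ix_(free, free)]
    b = P[np.ix_(free, list(folded))].sum(axis=1)
    f[free] = np.linalg.solve(A, b)
    return f

# Toy roadmap: nodes 0..4 on a line; node 0 is unfolded, node 4 is folded.
# Interior nodes stay put with probability 0.2 and step left or right with 0.4.
P = np.zeros((5, 5))
P[0, 0] = P[4, 4] = 1.0
for i in (1, 2, 3):
    P[i, i - 1], P[i, i], P[i, i + 1] = 0.4, 0.2, 0.4
f = pfold(P, folded=[4], unfolded=[0])
```

Because each node has only a few neighbors, the matrix is sparse, so a large roadmap would call for a sparse or iterative solver rather than the dense `numpy.linalg.solve` used in this sketch.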
28 Probabilistic Roadmap
Correlation with MC Approach
- 1ROP (repressor of primer)
- 2 α helices
- 6 DOF
29 Probabilistic Roadmap
Computation Times (1ROP)
Monte Carlo:
- Over $10^6$ energy computations
- Over 11 days of computer time
- 49 conformations
Roadmap:
- 15,000 energy computations
- 1 - 1.5 hours of computer time
- 5000 conformations
4 orders of magnitude speedup!
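One plausible reading of the speedup figure (an interpretation, not something the slide spells out) is computer time per conformation for which pfold is obtained, taking 11 days and 1.5 hours as the two totals:

```latex
\frac{t_{\mathrm{MC}}}{t_{\mathrm{roadmap}}}
  \approx \frac{(11 \times 24\ \mathrm{h}) / 49}{1.5\ \mathrm{h} / 5000}
  \approx \frac{5.4\ \mathrm{h}}{3 \times 10^{-4}\ \mathrm{h}}
  \approx 1.8 \times 10^{4}
```

Since the Monte Carlo time is quoted as "over 11 days", this ratio is a lower bound under that reading, which is consistent with roughly four orders of magnitude.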
30 Summary
- Interpretation of electron density maps
- Statistical potential
- Library of protein fragments
- Self-collision and energy maintenance
- Structure alignment
- ProShape software
- Tools for high-dimensional spaces
- Probabilistic roadmaps
- Biology
- Structure determination
- Modeling
- Shape representation
- Hierarchies
- Algorithms
- Deformation
- Motion planning
- Shape organization
- Software
- Alpha shapes
31 Future Work
- Perform more substantial experiments. E.g., more realistic potentials in ChainTree and probabilistic roadmaps
- Extend tools to solve more relevant problems. E.g., encode Molecular Dynamics into probabilistic roadmaps
- Combine results. E.g., use library of fragments to sample probabilistic roadmaps
- Develop new algorithms/data structures. E.g., sparse spanners to capture proximity information
32 Our Future: The Bio-X Clark Center
June 2003