Title: Modeling molecular dynamics from simulations
1. Modeling molecular dynamics from simulations
- Nina Singhal Hinrichs
- Departments of Computer Science and Statistics
- University of Chicago
- January 28, 2009
2. Motivation
- Proteins are essential parts of living organisms
- Enzymes, cell signaling, membrane transport, ...
- Composed of a chain of amino acids
- Fold to a unique three-dimensional structure
- Misfolding can cause diseases
- Alzheimer's, mad cow disease, Huntington's, ...
- How do proteins fold?
3. Molecular dynamics
- Represent the atoms of the molecule and the solvent
- Model the forces on the atoms
- Integrate Newton's laws of motion
- Integration time step is small compared to the timescales of molecular motion
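As an illustration of integrating the laws of motion with a small time step, here is a minimal sketch of velocity Verlet, one common MD integrator (the talk does not say which integrator was used). It assumes a single particle in a hypothetical one-dimensional harmonic potential in arbitrary units, not a real force field; `harmonic_force`, `velocity_verlet`, and `total_energy` are illustrative names.

```python
def harmonic_force(x, k=1.0):
    """Force for a hypothetical 1D harmonic potential U(x) = k * x**2 / 2."""
    return -k * x

def velocity_verlet(x, v, force, mass=1.0, dt=0.01, n_steps=1000):
    """Integrate Newton's equations of motion with velocity Verlet; returns positions and final velocity."""
    f = force(x)
    positions = [x]
    for _ in range(n_steps):
        v += 0.5 * dt * f / mass   # half-step velocity update
        x += dt * v                # full-step position update
        f = force(x)               # force at the new position
        v += 0.5 * dt * f / mass   # second half-step velocity update
        positions.append(x)
    return positions, v

def total_energy(x, v, k=1.0, mass=1.0):
    """Kinetic plus potential energy for the harmonic example."""
    return 0.5 * mass * v * v + 0.5 * k * x * x

positions, v_end = velocity_verlet(1.0, 0.0, harmonic_force, dt=0.01, n_steps=1000)
print(total_energy(positions[-1], v_end))  # stays close to the initial energy 0.5
```

With dt = 0.01 the energy stays near 0.5; with dt = 2.5 (larger than 2/ω for this oscillator, ω = 1) the trajectory blows up, which is why the step must be small compared to the fastest motion in the system.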
4. Folding@Home: distributed computing for biomolecular simulation
- Perform multiple simulations in parallel
- Total simulation time: hundreds of microseconds (hundreds of CPU-years)
- Very powerful computational resource
- 200 teraflops sustained performance
- > 1,000,000 total CPUs, 200,000 active
5. Challenge: how to analyze?
- Enormous datasets
- Data describe dynamics in microscopic detail
- Questions we want to answer:
- Rate of folding, mechanism of folding, ...
- How can we extract these properties from our data?
6. Outline
- Markovian state model for molecular motion
- Model description, uses, examples
- New algorithms for building these models
- Defining states and transition probabilities
- New methods for dealing with finite sampling
- Model complexity, uncertainty analysis, targeted sampling
7. Chemical intuition
Chemical reactions often exhibit stochastic behavior.
Example: n-butane
Chandler, Journal of Chemical Physics (1977)
8. Markovian state model
Define states in the conformation space.
Define transition probabilities, or edges, between states.
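A Markovian state model can be stored as a row-stochastic matrix T, where T[i][j] is the probability of moving from state i to state j in one lag time τ. The sketch below propagates state populations with p(t + τ) = p(t) T in plain Python; the 3-state matrix (unfolded, intermediate, folded) and the name `propagate` are hypothetical, chosen only for illustration.

```python
def propagate(p0, T, n_steps):
    """Return populations [p(0), p(tau), ..., p(n_steps * tau)] using p(t + tau) = p(t) T."""
    n = len(T)
    # A valid transition matrix has nonnegative rows that each sum to one.
    for row in T:
        if min(row) < 0.0 or abs(sum(row) - 1.0) > 1e-9:
            raise ValueError("each row of T must be a probability distribution")
    p = list(p0)
    history = [p]
    for _ in range(n_steps):
        p = [sum(p[i] * T[i][j] for i in range(n)) for j in range(n)]
        history.append(p)
    return history

# Hypothetical 3-state model: 0 = unfolded, 1 = intermediate, 2 = folded.
T = [[0.90, 0.08, 0.02],
     [0.05, 0.90, 0.05],
     [0.01, 0.04, 0.95]]
history = propagate([1.0, 0.0, 0.0], T, 500)
print(history[-1])  # approaches the stationary (equilibrium) populations
```

Starting entirely in the unfolded state, the populations relax toward the stationary distribution, which is left unchanged by a further step.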
9. Uses of the model
Chodera et al., Multiscale Modeling and Simulation (2006)
- Populations of states over time, p(t)
- Eigenvalues and eigenvectors: conformational changes
- Kinetic properties: virtually any kinetic property
- Mechanistic properties: most likely path, probability of transitions, as graph algorithms
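One way to phrase "most likely path" as a graph algorithm: give each edge the weight -log T_ij, so the shortest path is the one that maximizes the product of one-step transition probabilities. The sketch below uses Dijkstra's algorithm under that assumption (self-transitions are ignored); this path definition, the 4-state matrix, and `most_likely_path` are illustrative choices, not necessarily the definition used in the cited work.

```python
import heapq
import math

def most_likely_path(T, source, target):
    """Path maximizing the product of transition probabilities, via Dijkstra on -log T."""
    n = len(T)
    dist = [math.inf] * n
    prev = [None] * n
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, i = heapq.heappop(heap)
        if d > dist[i]:
            continue  # stale heap entry
        if i == target:
            break
        for j in range(n):
            if j != i and T[i][j] > 0.0:
                nd = d - math.log(T[i][j])  # edge weight -log T_ij is nonnegative
                if nd < dist[j]:
                    dist[j] = nd
                    prev[j] = i
                    heapq.heappush(heap, (nd, j))
    if math.isinf(dist[target]):
        return None, 0.0
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return path, math.exp(-dist[target])

# Hypothetical 4-state model: 0 = unfolded, 3 = folded.
T = [[0.80, 0.15, 0.05, 0.00],
     [0.10, 0.70, 0.05, 0.15],
     [0.05, 0.05, 0.60, 0.30],
     [0.00, 0.02, 0.03, 0.95]]
path, prob = most_likely_path(T, source=0, target=3)
print(path, prob)  # [0, 1, 3] with probability 0.15 * 0.15
```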
10. Example models
Systems: alanine peptide, lipid vesicle fusion, alpha helix, villin headpiece
References: Kasson et al., PNAS (2006); Chodera et al., Multiscale Modeling and Simulation (2006); Sorin and Pande, Biophysical Journal (2005); Jayachandran et al., Journal of Structural Biology (2006)
11. Computational and statistical challenges
- Building Markovian state model
- Defining states that are Markovian
- Calculating the transition probabilities
- Refining Markovian state model
- Finding the best model
- Determining model uncertainty
- Designing new simulations
12. Automatic state decomposition
- Building Markovian state model
- Defining states that are Markovian
- Calculating the transition probabilities
- Challenge: find appropriate states
- Using individual conformations as states does not scale
- Group conformations into discrete states
- Structural clustering alone is insufficient
- Basic algorithm: combine structural and kinetic similarity
J. D. Chodera, N. Singhal, V. S. Pande, K. A.
Dill, and W. C. Swope. Automatic discovery of
metastable states for the construction of Markov
models of macromolecular conformational dynamics.
Journal of Chemical Physics, 126, 155101 (2007).
(These authors contributed equally to this work)
13. Comparison of structural and kinetic clustering
Example: trpzip2 (Cochran et al., PNAS 98, 5578 (2001))
(Figure panels: structural clustering vs. kinetic clustering)
14. State decomposition: splitting
Cluster conformations by root-mean-square distance (RMSD).
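The splitting step can be sketched as clustering conformations by RMSD. The talk does not name the clustering algorithm, so this sketch assumes k-centers (greedy farthest-point) clustering, and its `rmsd` helper assumes the structures are already superimposed (no optimal-rotation step such as Kabsch alignment). The toy two-atom conformations and the function names are hypothetical.

```python
import math

def rmsd(a, b):
    """RMSD between two conformations given as equal-length lists of (x, y, z).
    Assumes the structures are already superimposed (no alignment step)."""
    if len(a) != len(b):
        raise ValueError("conformations must have the same number of atoms")
    sq = sum((ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
             for (ax, ay, az), (bx, by, bz) in zip(a, b))
    return math.sqrt(sq / len(a))

def k_centers(conformations, k):
    """Greedy farthest-point (k-centers) clustering; returns center indices and labels."""
    centers = [0]  # start from the first conformation (an arbitrary choice)
    # Distance from each conformation to its nearest center so far.
    nearest = [rmsd(c, conformations[0]) for c in conformations]
    labels = [0] * len(conformations)
    while len(centers) < k:
        # The conformation farthest from all current centers becomes a new center.
        new = max(range(len(conformations)), key=lambda i: nearest[i])
        centers.append(new)
        for i, c in enumerate(conformations):
            d = rmsd(c, conformations[new])
            if d < nearest[i]:
                nearest[i] = d
                labels[i] = len(centers) - 1
    return centers, labels

# Two hypothetical groups of two-atom conformations, far apart in space.
confs = [[(0.0, 0, 0), (1.0, 0, 0)], [(0.1, 0, 0), (1.1, 0, 0)],
         [(10.0, 0, 0), (11.0, 0, 0)], [(10.2, 0, 0), (11.2, 0, 0)]]
centers, labels = k_centers(confs, k=2)
print(centers, labels)
```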
15. State decomposition: lumping
Group states that interconvert quickly.
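To make "group states that interconvert quickly" concrete, here is a deliberately simple stand-in, not the published lumping procedure: merge microstates i and j whenever T_ij + T_ji exceeds a threshold, using union-find. The `threshold` parameter, the function name, and the 4-state matrix are hypothetical.

```python
def lump_states(T, threshold):
    """Merge states i, j whenever T[i][j] + T[j][i] exceeds threshold (union-find)."""
    n = len(T)
    parent = list(range(n))

    def find(i):
        # Path halving keeps the union-find trees shallow.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if T[i][j] + T[j][i] > threshold:
                parent[find(i)] = find(j)
    # Relabel the roots as consecutive macrostate indices 0, 1, 2, ...
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]

# Hypothetical microstates: 0 and 1 interconvert quickly, as do 2 and 3.
T = [[0.50, 0.45, 0.05, 0.00],
     [0.45, 0.50, 0.00, 0.05],
     [0.05, 0.00, 0.50, 0.45],
     [0.00, 0.05, 0.45, 0.50]]
print(lump_states(T, threshold=0.5))  # [0, 0, 1, 1]
```

A fixed threshold is crude; it only illustrates that lumping uses kinetic rather than structural similarity.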
16. State decomposition: resplitting
Cluster conformations again, restricted to each state.
17. Blocked alanine peptide
(Figure: conformations on the φ/ψ dihedral-angle plane)
Chodera et al., Multiscale Modeling and Simulation (2006)
18. Automatic state decomposition of alanine peptide
(Figure: decomposed states on the φ/ψ plane)
Black state sits on top of multiple other states!
Benefit of the automatic algorithm
19. Stability of decomposition
20. TrpZip peptide
21. Transition probabilities
- Building Markovian state model
- Defining states that are Markovian
- Calculating the transition probabilities
Discretize trajectories into series of states:
1 → 2 → 2 → 3 → 4 → 3 → 5
N. Singhal, C. D. Snow, and V. S. Pande. Using
path sampling to build better Markovian state
models: predicting the folding rate and mechanism
of a trp zipper beta hairpin. Journal of Chemical
Physics, 121(1), 415-425 (2004).
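Once trajectories are discretized, transition probabilities can be estimated by counting transitions observed one lag time apart and normalizing each row (the maximum-likelihood estimate). A minimal sketch, applied to the discretized trajectory shown on the slide; `count_transitions` and `transition_matrix` are illustrative names, and the path-sampling details of the cited paper are not reproduced.

```python
from collections import defaultdict

def count_transitions(trajectories, lag=1):
    """Count transitions i -> j between states observed `lag` steps apart."""
    counts = defaultdict(lambda: defaultdict(int))
    for traj in trajectories:
        for t in range(len(traj) - lag):
            counts[traj[t]][traj[t + lag]] += 1
    return counts

def transition_matrix(counts, states):
    """Row-normalize counts: T[i][j] = C_ij / sum_k C_ik (all-zero row if state i has no outgoing data)."""
    T = []
    for i in states:
        row = counts.get(i, {})
        total = sum(row.values())
        T.append([row.get(j, 0) / total if total else 0.0 for j in states])
    return T

# The discretized trajectory 1 -> 2 -> 2 -> 3 -> 4 -> 3 -> 5.
counts = count_transitions([[1, 2, 2, 3, 4, 3, 5]], lag=1)
T = transition_matrix(counts, states=[1, 2, 3, 4, 5])
for row in T:
    print(row)
```

State 5, where the trajectory ends, has no outgoing transitions and gets an all-zero row; with so few counts every estimate here is highly uncertain.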
22. Model selection
- Refining Markovian State Model
- Finding the best model
- Determining model uncertainty
- Designing new simulations
- Challenge: how many states should we have?
- More states are more Markovian
- More states have more parameters
- How do we evaluate this tradeoff?
N. S. Hinrichs and V. S. Pande. Bayesian metrics
for validating and improving Markovian state
models for molecular dynamics simulations. (In
preparation)
23. Hidden Markov model formulation
- Formulate the problem as a hidden Markov model structure-scoring question
- Different discretizations of the continuous space
- Benefits of Bayesian scores
- Naturally handle the tradeoff between model complexity and amount of data
- Avoid over-fitting of parameters
(Figure: hidden states and observations)
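The automatic complexity penalty of a Bayesian score can be seen in a much simpler setting than the hidden Markov model formulation above. This sketch is not that HMM score: it computes the Dirichlet-multinomial marginal likelihood (evidence) of a fully observed symbol sequence under a memoryless model and under a first-order Markov model, assuming symmetric Dirichlet(α = 1) priors. Both condition on the first symbol, so the scores are comparable. The function names and test sequences are hypothetical.

```python
from math import lgamma

def log_evidence_rows(rows, alpha=1.0):
    """Log marginal likelihood of count rows, each with an independent symmetric Dirichlet(alpha) prior."""
    total = 0.0
    for row in rows:
        k, n = len(row), sum(row)
        total += lgamma(k * alpha) - lgamma(k * alpha + n)
        total += sum(lgamma(alpha + c) - lgamma(alpha) for c in row)
    return total

def score_models(seq, n_symbols, alpha=1.0):
    """Return (memoryless score, first-order Markov score) for the same sequence."""
    iid = [0] * n_symbols
    markov = [[0] * n_symbols for _ in range(n_symbols)]
    for prev, cur in zip(seq, seq[1:]):
        iid[cur] += 1           # memoryless model: one distribution, 1 free parameter here
        markov[prev][cur] += 1  # Markov model: one distribution per previous symbol
    return log_evidence_rows([iid], alpha), log_evidence_rows(markov, alpha)

sticky = [int(c) for c in "0" * 20 + "1" * 20]
periodic = [int(c) for c in "0011" * 10]
print(score_models(sticky, 2))    # the Markov model scores higher
print(score_models(periodic, 2))  # the simpler memoryless model scores higher
```

For the periodic sequence each Markov row is split evenly, so the extra parameters buy no better fit and the evidence favors the simpler model; no explicit penalty term is needed.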
24. Alanine peptide results
Scores of hidden Markov models for different lag times.
The last model is worse at shorter lag times but preferred at longer lag times.
No previous evaluation method could distinguish these models.
25. Uncertainty analysis
- Refining Markovian State Model
- Finding the best model
- Determining model uncertainty
- Designing new simulations
Goal: once we have the states, what is the uncertainty in the model?
Uncertainty is caused by finite sampling.
(Figure: two estimates of a 5-state model, states 1 to 5)
Both are reasonable but give different transition probabilities → different mean first passage times (MFPT), Pfold values, eigenvalues, eigenvectors, ...
N. Singhal and V. S. Pande. Error analysis and
efficient sampling in Markovian state models for
protein folding. Journal of Chemical Physics,
123, 204909-204921 (2005). N. S. Hinrichs and V.
S. Pande. Calculation of the distribution of
eigenvalues and eigenvectors in Markovian state
models for molecular dynamics. Journal of
Chemical Physics, 126, 244101 (2007).
26. Transition probabilities
Recall that we calculate transition probabilities by counting.
- Instead of getting a single value, we can describe the distribution of the transition probabilities p_ij using Bayes' rule
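Assuming independent Dirichlet priors with parameters α_ij on each row (a standard conjugate choice; the slide does not show the prior) and writing C_ij for the observed number of i → j transitions (notation assumed here), Bayes' rule gives a Dirichlet posterior for each row, with closed-form mean and covariance:

```latex
P(\mathbf{p}_i \mid \mathbf{C}_i) \propto P(\mathbf{C}_i \mid \mathbf{p}_i)\, P(\mathbf{p}_i)
  \propto \prod_j p_{ij}^{C_{ij}} \prod_j p_{ij}^{\alpha_{ij} - 1}
  = \prod_j p_{ij}^{\beta_{ij} - 1},
\qquad \beta_{ij} = \alpha_{ij} + C_{ij},
\qquad \mathbf{p}_i \mid \mathbf{C}_i \sim \mathrm{Dir}(\boldsymbol{\beta}_i)

\mathbb{E}[p_{ij}] = \frac{\beta_{ij}}{\beta_i},
\qquad
\mathrm{Cov}(p_{ij}, p_{ik}) = \frac{\beta_{ij}\,(\delta_{jk}\,\beta_i - \beta_{ik})}{\beta_i^{2}\,(\beta_i + 1)},
\qquad \beta_i = \sum_k \beta_{ik}
```

The covariance shrinks roughly as 1/β_i, so rows for states with more observed transitions are known more precisely.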
27. Sampling approach
Possible solution: sample transition matrices to get the distribution of eigenvalues.
Problem: drawing many samples can be expensive, and solving the eigenvalue problem for each sample can be expensive.
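The sampling approach can be sketched as: draw each row of T from its Dirichlet posterior, compute the eigenvalue of interest for each draw, and summarize the resulting samples. To stay dependency-free this sketch assumes a 2-state model, where the eigenvalues are 1 and T00 + T11 - 1; for larger models each draw would need a numerical eigensolver, which is exactly the per-sample cost the slide mentions. The counts, the uniform prior (α = 1), and the function names are hypothetical.

```python
import random
import statistics

def sample_dirichlet(params, rng):
    """One Dirichlet(params) draw, built by normalizing independent Gamma(a, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in params]
    total = sum(draws)
    return [d / total for d in draws]

def sample_lambda2(counts, alpha=1.0, n_samples=10000, seed=0):
    """Posterior samples of the nontrivial eigenvalue of a 2-state transition matrix.

    Each row of T gets an independent Dirichlet(alpha + counts) posterior. For 2 states
    the eigenvalues are 1 and T00 + T11 - 1, so no eigensolver is needed here."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        T = [sample_dirichlet([alpha + c for c in row], rng) for row in counts]
        samples.append(T[0][0] + T[1][1] - 1.0)
    return samples

counts = [[90, 10], [20, 80]]  # hypothetical transition counts
lam = sample_lambda2(counts)
print(statistics.mean(lam), statistics.stdev(lam))
```

For these counts the exact posterior mean of the eigenvalue is 70/102 ≈ 0.686, and the sample mean lands close to it.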
28. Closed-form solution
Idea: trade the exact distribution for an efficient approximation.
- Eigenvalue equation: derivatives are efficient to calculate using adjoint systems
- Taylor series expansion
- Multivariate normal approximation of Δp_i
- → Closed-form normal distribution for λ
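A first-order version of this idea can be sketched directly. For a simple eigenvalue λ of T with left eigenvector u and right eigenvector v, standard perturbation theory gives ∂λ/∂T_ij = u_i v_j / (u · v). Linearizing around the posterior mean and using independent Dirichlet rows, Var(λ) ≈ Σ_i s_iᵀ Cov(p_i) s_i, where s_i are the sensitivities for row i. This is a sketch under those assumptions, not the adjoint-based implementation from the cited paper; `eigenvalue_variance` is a hypothetical name and u, v must be computed elsewhere (for example with a numerical eigensolver).

```python
def eigenvalue_variance(counts, u, v, alpha=1.0):
    """First-order (Taylor) estimate of Var(lambda) under Dirichlet row posteriors.

    u, v: left and right eigenvectors of the posterior-mean transition matrix for the
    eigenvalue of interest. Returns the total variance and the per-row contributions."""
    uv = sum(ui * vi for ui, vi in zip(u, v))
    contributions = []
    for i, row in enumerate(counts):
        beta = [alpha + c for c in row]       # Dirichlet posterior parameters for row i
        b0 = sum(beta)
        s = [u[i] * vj / uv for vj in v]      # sensitivities d(lambda)/d(T_ij)
        var_i = 0.0
        for j, bj in enumerate(beta):
            for k, bk in enumerate(beta):
                # Dirichlet covariance: beta_j (delta_jk b0 - beta_k) / (b0^2 (b0 + 1))
                cov = bj * ((b0 if j == k else 0.0) - bk) / (b0 * b0 * (b0 + 1.0))
                var_i += s[j] * cov * s[k]
        contributions.append(var_i)
    return sum(contributions), contributions

# 2-state check: the posterior-mean matrix [[91, 11], [21, 81]] / 102 has, for its
# nontrivial eigenvalue, left eigenvector (1, -1) and right eigenvector (11/102, -21/102).
total, per_row = eigenvalue_variance([[90, 10], [20, 80]], u=[1.0, -1.0], v=[11 / 102, -21 / 102])
print(total, per_row)
```

Because the variance is a sum over rows, the per-row contributions show which state's transitions dominate the uncertainty, and the cost is a few small quadratic forms rather than thousands of eigenvalue solves.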
29. Uncertainty results
Alanine system: 5000 trajectories from each state
(Figure: 6-state model and its transition counts)
Running times (6 states): sampling-based, 40 seconds; closed-form, < 0.01 seconds
30. Sampling strategies
- Refining Markovian State Model
- Finding the best model
- Determining model uncertainty
- Designing new simulations
- Problem: simulations are expensive. Even with Folding@Home, we run simulations for months
- How can we intelligently allocate our resources?
- Common approaches
- Equilibrium sampling: sample each conformation from its equilibrium distribution
- Even sampling: sample equally from each state
- New sequential approaches
N. Singhal and V. S. Pande. Error analysis and
efficient sampling in Markovian state models for
protein folding. Journal of Chemical Physics,
123, 204909-204921 (2005). N. S. Hinrichs and V.
S. Pande. Calculation of the distribution of
eigenvalues and eigenvectors in Markovian state
models for molecular dynamics. Journal of
Chemical Physics, 126, 244101 (2007).
31. Adaptive sampling
Goal: reduce the uncertainty of an eigenvalue.
Uncertainty analysis decomposes by transitions from each state.
Variance depends on both the uncertainty of, and the sensitivity to, the transition probabilities.
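Because the variance decomposes into one term per state, a simple allocation rule is to send new simulations to the state whose term would shrink most. This sketch is a hypothetical heuristic, not necessarily the published strategy: it assumes new transitions follow the current posterior-mean proportions, so only the Dirichlet concentration β_i grows and the row's variance scales by (β_i + 1)/(β_i + m + 1). The sensitivities (∂λ/∂T_ij for each row) are taken as given, and the function names are illustrative.

```python
def row_variance(s, beta):
    """s^T Cov s for a Dirichlet(beta) row: (sum_j p_j s_j^2 - (sum_j p_j s_j)^2) / (b0 + 1)."""
    b0 = sum(beta)
    p = [b / b0 for b in beta]
    mean_s = sum(pj * sj for pj, sj in zip(p, s))
    mean_s2 = sum(pj * sj * sj for pj, sj in zip(p, s))
    return (mean_s2 - mean_s ** 2) / (b0 + 1.0)

def pick_state_to_sample(counts, sensitivities, n_new, alpha=1.0):
    """Choose the state whose n_new extra transitions are expected to cut Var(lambda) most.
    Assumes new transitions follow the current posterior mean, so only b0 grows."""
    best, best_gain = None, -1.0
    for i, (row, s) in enumerate(zip(counts, sensitivities)):
        beta = [alpha + c for c in row]
        b0 = sum(beta)
        var_now = row_variance(s, beta)
        var_after = var_now * (b0 + 1.0) / (b0 + n_new + 1.0)
        gain = var_now - var_after
        if gain > best_gain:
            best, best_gain = i, gain
    return best, best_gain

# Hypothetical 2-state example; sensitivities are u_i v_j / (u . v) for the nontrivial eigenvalue.
counts = [[90, 10], [20, 80]]
sens = [[11 / 32, -21 / 32], [-11 / 32, 21 / 32]]
state, gain = pick_state_to_sample(counts, sens, n_new=100)
print(state, gain)  # state 1 contributes more variance, so it is sampled next
```

Both the uncertainty (through β) and the sensitivity (through s) enter each term, matching the point that the variance depends on both.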
32. Adaptive sampling: alanine
On the 6-state alanine system, select trajectories randomly according to 3 sampling strategies.
(Figure: transition counts)
33. Adaptive sampling: villin
2454 states
- Benefits
- Very quickly reduce the variance
- Reduce the total number of simulations
- Need less computational power
- Can study more complex systems
Villin headpiece: Jayachandran et al., Journal of Chemical Physics (2006)
34. Summary
- Markovian state models are convenient methods to describe molecular motion
- Automatic state decomposition
- Scalable to large systems
- Model selection
- Evaluate tradeoff between model complexity and amount of data
- Uncertainty analysis
- Efficient and decomposable
- Adaptive sampling
- Reduce number of simulations
35. Acknowledgements
- Vijay Pande (Stanford University): adviser
- Bill Swope, Jed Pitera (IBM): collaborators
- John Chodera: state decomposition work