Title: HHybrid Monte Carlo method for conformational sampling of proteins
1HHybrid Monte Carlo method for conformational
sampling of proteins
Jesús A. Izaguirre with Scott Hampton Department
of Computer Science and EngineeringUniversity of
Notre Dame April 9, 2003 This work is
partially supported by two NSF grants (CAREER and
BIOCOMPLEXITY) and two grants from University of
Notre Dame
2Overview
2. Methods for sampling (MD, HMC)
1. Motivation sampling conformational space of
proteins
3. Evaluation of new Shadow HMC
4. Future applications
3 Protein The Machinery of Life
NH2-Val-His-Leu-Thr-Pro-Glu-Glu- Lys-Ser-Ala-Val-T
hr-Ala-Leu-Trp- Gly-Lys-Val-Asn-Val-Asp-Glu-Val- G
ly-Gly-Glu-..
4 Protein Structure
5Why protein folding?
- Huge gap sequence data and 3D structure data
- EMBL/GENBANK, DNA (nucleotide) sequences 15
million sequence, 15,000 million base pairs - SWISSPROT, protein sequences120,000 entries
- PDB, 3D protein structures20,000 entries
- Bridging the gap through prediction
- Aim of structural genomics
- Structurally characterize most of the protein
sequences by an efficient combination of
experiment and prediction, Baker and Sali (2001) - Thermodynamics hypothesis Native state is at
the global free energy minimum Anfinsen
(1973)
6Questions related to folding I
- Long time kinetics dynamics of folding
- only statistical correctness possible
- ensemble dynamics
- e.g., folding_at_home
- Short time kinetics
- strong correctness possible
- e.g., transport properties, diffusion coefficients
7Questions related to folding II
- Sampling
- Compute equilibrium averages by visiting all
(most) of important conformations - Examples
- Equilibrium distribution of solvent molecules in
vacancies - Free energies
- Characteristic conformations (misfolded and
folded states)
8Overview
2. Methods for sampling (MD, HMC)
1. Motivation sampling conformational space of
proteins
3. Evaluation of new Shadow HMC
4. Future applications
9Classical molecular dynamics
- Newtons equations of motion
- Atoms
- Molecules
- CHARMM force field(Chemistry at Harvard
Molecular Mechanics)
Bonds, angles and torsions
10What is a Forcefield?
In molecular dynamics a molecule is described as
a series of charged points (atoms) linked by
springs (bonds).
The forcefield is a collection of equations and
associated constants designed to reproduce
molecular geometry and selected properties of
tested structures.
11Energy Terms Described in the CHARMm forcefield
Bond
Angle
Dihedral
Improper
12Energy Functions
Ubond oscillations about the equilibrium bond
length Uangle oscillations of 3 atoms about an
equilibrium angle Udihedral torsional rotation
of 4 atoms about a central bond Unonbond
non-bonded energy terms (electrostatics and
Lennard-Jones)
13Molecular Dynamics what does it mean?
MD change in conformation over time using a
forcefield
Energy
Energy supplied to the minimized system at the
start of the simulation
Conformation impossible to access through MD
Conformational change
14MD, MC, and HMC in sampling
- Molecular Dynamics takes long steps in phase
space, but it may get trapped - Monte Carlo makes a random walk (short steps), it
may escape minima due to randomness - Can we combine these two methods?
MC
MD
HMC
15Hybrid Monte Carlo
- We can sample from a distribution with density
p(x) by simulating a Markov chain with the
following transitions - From the current state, x, a candidate state x
is drawn from a proposal distribution S(x,x).
The proposed state is accepted with
prob.min1,(p(x) S(x,x)) / (p(x) S(x,x)) - If the proposal distribution is symmetric,
S(x,x)) S(x,x)), then the acceptance prob.
only depends on p(x) / p(x)
16Hybrid Monte Carlo II
- Proposal functions must be reversible
- if x s(x), then x s(x)
- Proposal functions must preserve volume
- Jacobian must have absolute value one
- Valid proposal x -x
- Invalid proposals
- x 1 / x (Jacobian not 1)
- x x 5 (not reversible)
17Hybrid Monte Carlo III
- Hamiltonian dynamics preserve volume in phase
space - Hamiltonian dynamics conserve the Hamiltonian
H(q,p) - Reversible symplectic integrators for Hamiltonian
systems preserve volume in phase space - Conservation of the Hamiltonian depends on the
accuracy of the integrator - Hybrid Monte Carlo Use reversible symplectic
integrator for MD to generate the next proposal
in MC
18HMC Algorithm
- Perform the following steps
- 1. Draw random values for the momenta p from
normal distribution use given positions q - 2. Perform cyclelength steps of MD, using a
symplectic reversible integrator with timestep
?t, generating (q,p) - 3. Compute change in total energy
- ?H H(q,p) - H(q,p)
- 4. Accept new state based on exp(-? ?H )
19Hybrid Monte Carlo IV
- Advantages of HMC
- HMC can propose and accept distant points in
phase space, provided the accuracy of the MD
integrator is high enough - HMC can move in a biased way, rather than in a
random walk (distance k vs sqrt(k)) - HMC can quickly change the probability density
20Hybrid Monte Carlo V
- As the number of atoms increases, the total error
in the H(q,p) increases. The error is related to
the time step used in MD - Analysis of N replicas of multivariate Gaussian
distributions shows that HMC takes O(N5/4 ) with
time step ?t O(N-1/4) Kennedy Pendleton, 91
System size N Max ?t
66 0.5
423 0.25
868 0.1
5143 0.05
21Hybrid Monte Carlo VI
- The key problem in scaling is the accuracy of the
MD integrator - More accurate methods could help scaling
- Creutz and Gocksch 89 proposed higher order
symplectic methods for HMC - In MD, however, these methods are more expensive
than the scaling gain. They need more force
evaluations per step
22Overview
2. Methods for sampling (MD, HMC)
1. Motivation sampling conformational space of
proteins
3. Evaluation of new Shadow HMC
4. Future applications
23Evaluating MC methods I
- Is method sampling from desired distribution?
- Does it preserve detailed balance?
- Use simple model systems that can be solved
analytically. Compare to analytical results or
well known solution methods. Examples,
Lennard-Jones liquid, butane - Is it ergodic?
- Impossible to prove for realistic problems.
Instead, show self-averaging of properties
24Evaluating MC methods II
- Is system equilibrated?
- Average values of set of properties fluctuate
around mean value - Convergence to steady state from
- Different initial conditions
- Different pseudo random number generators
- Are statistical errors small?
- Run should be about 10 times longer than slowest
relaxation in system - Estimate statistical errors by independent block
averaging - Compute properties
- Vary system sizes
- What are the sampling rates?
25Improved HMC
- Symplectic integrators conserve exactly (within
roundoff error) a modified Hamiltonian that for
short MD simulations (such as in HMC) stays close
to the true Hamiltonian Sanz-Serna Calvo 94 - Our idea is to use highly accurate approximations
to the modified Hamiltonian in order to improve
the scaling of HMC
26Shadow Hamiltonian
- Work by Skeel and Hardy, 2001, shows how to
compute an arbitrarily accurate approximation to
the modified Hamiltonian, called the Shadow
Hamiltonian - Hamiltonian H1/2pTM-1p U(q)
- Modified Hamiltonian HM H O(?t p)
- Shadow Hamiltonian SH2p HM O(?t 2p)
- Arbitrary accuracy
- Easy to compute
- Stable energy graph
- Example, SH4 H f( qn-1, qn-2, pn-1, pn-2
,ßn-1 ,ßn-2)
27- See comparison of SHADOW and ENERGY
28Shadow HMC
- Replace total energy H with shadow energy
- ?SH2m SH2m (q,p) SH2m (q,p)
- Nearly linear scalability of sampling rate
- Computational cost SHMC, N(11/2m), where m is
accuracy order of integrator - Extra storage (m copies of q and p)
- Moderate overhead (25 for small proteins)
29Example Shadow Hamiltonian
30ProtoMol a framework for MD
Matthey, et al, ACM Tran. Math. Software (TOMS),
submitted
Modular design of ProtoMol (Prototyping Molecular
dynamics). Available at http//www.cse.nd.edu/lcl
s/protomol
31SHMC implementation
- Shadow Hamiltonian requires propagation of ß
- Can work for any integrator
32Systems tested
33Sampling Metric 1
- Generate a plot of dihedral angle vs. energy for
each angle - Find local maxima
- Label bins between maxima
- For each dihedral angle, print the label of the
energy bin that it is currently in
34Sampling Metric 2
- Round each dihedral angle to the nearest degree
- Print label according to degree
35Acceptance Rates
36More Acceptance Rates
37Sampling rate for decalanine (dt 2 fs)
38Sampling rate for 2mlt
39Sampling rate comparison
- Cost per conformation is total simulation time
divided by number of new conformations discovered
(2mlt, dt 0.5 fs) - HMC 122 s/conformation
- SHMC 16 s/conformation
- HMC discovered 270 conformations in 33000 seconds
- SHMC discovered 2340 conformations in 38000
seconds
40Conclusions
- SHMC has a much higher acceptance rate,
particularly as system size and timestep increase - SHMC discovers new conformations more quickly
- SHMC requires extra storage and moderate
overhead. - SHMC works best at relatively large timesteps
41Future work
- Multiscale problems for rugged energy surface
- Multiple time stepping algorithms plus
constraining - Temperature tempering and multicanonical ensemble
- Potential smoothing
- System size
- Parallel Multigrid O(N) electrostatics
- Applications
- Free energy estimation for drug design
- Folding and metastable conformations
- Average estimation
42Acknowledgments
- Dr. Thierry Matthey, co-developer of ProtoMol,
University of Bergen, Norway - Graduate students Qun Ma, Alice Ko, Yao Wang,
Trevor Cickovski - Students in CSE 598K, Computational Biology,
Spring 2002 - Dr. Robert Skeel, Dr. Ruhong Zhou, and Dr.
Christoph Schutte for valuable discussions - Dr. Radford Neals presentation Markov Chain
Sampling Using Hamiltonian Dynamics
(http//www.cs.utoronto.ca ) - Dr. Klaus Schultens presentation An
introduction to molecular dynamics simulations
(http//www.ks.uiuc.edu ) - Dr. Edward Maginns Monte Carlo Primer
-