Title: Computational Biology: Introduction to Biomolecular Modeling
1 Computational Biology: Introduction to Biomolecular Modeling
- Instructor: Prof. Jesús A. Izaguirre
- Textbook: Tamar Schlick, Molecular Modeling and Simulation: An Interdisciplinary Guide, Springer-Verlag, Berlin-New York, 2002
- Reference: C. Brooks, M. Karplus, B. Pettitt, Proteins: A Theoretical Perspective of Dynamics, Structure, and Thermodynamics, Wiley, 1988
2 Outline
- What is biomolecular modeling?
- Historical perspective
- Theory and experiments
- Protein characterization
- Computational successes
- Remaining challenges
3 What is biomolecular modeling?
- Application of computational models to understand the structure, dynamics, and thermodynamics of biological molecules
- The models must be tailored to the question at hand: the Schrödinger equation is not the answer to everything! A reductionist view is bound to fail!
- This implies that biomolecular modeling must be both multidisciplinary and multiscale
4 Historical Perspective
- 1946: MD calculation
- 1960: force fields
- 1969: Levinthal's paradox on protein folding
- 1970: MD of biological molecules
- 1971: Protein Data Bank
- 1998: ion channel protein crystal structure
- 1999: IBM announces Blue Gene project
5 Theoretical Foundations
- Born-Oppenheimer approximation (fixed nuclei)
- Force field parameters for families of chemical compounds
- System modeled using Newton's equations of motion
- Examples: hard-sphere simulations (Alder and Wainwright, 1959), liquid water (Rahman and Stillinger, 1970), BPTI (McCammon and Karplus), villin headpiece (Duan and Kollman, 1998)
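To make "system modeled using Newton's equations of motion" concrete, here is a minimal sketch of the velocity Verlet scheme, a widely used MD integrator. The slides do not say which integrator the cited simulations used; the 1D harmonic oscillator, the function names, and the parameter values below are illustrative assumptions, with a single spring standing in for a real force field.

```python
import math

def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate mass * d2x/dt2 = force(x) with the velocity Verlet scheme."""
    f = force(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half kick with the old force
        x = x + dt * v_half                # drift
        f = force(x)                       # force at the new position
        v = v_half + 0.5 * dt * f / mass   # half kick with the new force
    return x, v

# Toy system (assumption): one particle on a harmonic spring.
K = 1.0
MASS = 1.0

def harmonic_force(x):
    return -K * x

def total_energy(x, v):
    return 0.5 * MASS * v * v + 0.5 * K * x * x

x0, v0 = 1.0, 0.0
period = 2.0 * math.pi * math.sqrt(MASS / K)
n_steps = 1000
# Integrate for exactly one period; the particle should return near its start.
x1, v1 = velocity_verlet(x0, v0, harmonic_force, MASS, period / n_steps, n_steps)
```

After one full period the particle should be back near `x0` with nearly the same total energy, which is a quick sanity check on the time step.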
6 Experimental Foundations I
- X-ray crystallography
  - Analysis of the X-ray diffraction pattern produced when a beam of X-rays is directed onto a well-ordered crystal. The phase has to be reconstructed.
  - Phase problem solved by direct methods for small molecules
  - For larger molecules, the sophisticated Multiple Isomorphous Replacement (MIR) technique is used
  - Current resolution below 2 Å
- Protein crystallography
  - Difficult to grow well-ordered crystals
  - Early success in predicting alpha helices and beta sheets (Pauling, 1950s)
7 Experimental Foundations II
- NMR spectroscopy
  - Nuclear Magnetic Resonance provides structural and dynamic information about molecules. It is not as detailed as X-ray crystallography and is limited to molecular masses of about 35 kDa
  - Distances between neighboring hydrogens are used to reconstruct the 3D structure using global optimization
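The idea of rebuilding 3D coordinates from interatomic distances can be illustrated with a small sketch. Real NMR data give only sparse, noisy distance bounds, which is why global optimization is needed. The code below swaps in a much simpler technique: it assumes a complete, exact distance matrix and places the points one at a time by trilateration. The function name and the test coordinates are hypothetical.

```python
import math

def build_coordinates(d):
    """Place points one by one so that they reproduce the distance matrix d.

    Assumes exact distances, at least three points, the first three points
    not collinear, and the fourth not coplanar with the first three.
    """
    n = len(d)
    coords = [(0.0, 0.0, 0.0), (d[0][1], 0.0, 0.0)]
    # Third point lies in the xy-plane (law of cosines gives x, then y >= 0).
    x2 = (d[0][2] ** 2 + d[0][1] ** 2 - d[1][2] ** 2) / (2.0 * d[0][1])
    y2 = math.sqrt(max(d[0][2] ** 2 - x2 ** 2, 0.0))
    coords.append((x2, y2, 0.0))
    for k in range(3, n):
        x = (d[0][k] ** 2 + d[0][1] ** 2 - d[1][k] ** 2) / (2.0 * d[0][1])
        y = (d[0][k] ** 2 - d[2][k] ** 2 + x2 ** 2 + y2 ** 2
             - 2.0 * x * x2) / (2.0 * y2)
        z = math.sqrt(max(d[0][k] ** 2 - x ** 2 - y ** 2, 0.0))
        if k > 3:
            # Use the fourth point to resolve the mirror-image ambiguity in z.
            up = abs(math.dist((x, y, z), coords[3]) - d[3][k])
            down = abs(math.dist((x, y, -z), coords[3]) - d[3][k])
            if down < up:
                z = -z
        coords.append((x, y, z))
    return coords

# Hypothetical "atoms": build a distance matrix, then recover a structure.
true_coords = [(0.0, 0.0, 0.0), (1.0, 0.2, -0.3), (0.4, 1.1, 0.5),
               (-0.6, 0.3, 0.9), (0.8, -0.7, -0.9), (1.3, 0.9, 1.2)]
d = [[math.dist(p, q) for q in true_coords] for p in true_coords]
rebuilt = build_coordinates(d)
max_error = max(abs(math.dist(rebuilt[i], rebuilt[j]) - d[i][j])
                for i in range(len(d)) for j in range(len(d)))
```

The rebuilt structure matches the original only up to rotation, translation, and reflection, so the comparison is made on distances (`max_error`) rather than on coordinates.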
8 Proteins I
- Polypeptide chains made up of amino acids, or residues, linked by peptide bonds
- 20 amino acids
- 50-500 residues, 1000-10000 atoms
- Native structure believed to correspond to an energy minimum, since proteins unfold when the temperature is increased
9 Protein Function in Cell
- Enzymes
  - Catalyze biological reactions
- Structural role
  - Cell wall
  - Cell membrane
  - Cytoplasm
10 Protein: The Machinery of Life
NH2-Val-His-Leu-Thr-Pro-Glu-Glu-Lys-Ser-Ala-Val-Thr-Ala-Leu-Trp-Gly-Lys-Val-Asn-Val-Asp-Glu-Val-Gly-Gly-Glu-..
11 Proteins II
- Secondary structure: alpha helices, beta sheets, turns
- Tertiary structure: proteins are tightly packed, with hydrophobic groups in the core and charged sidechains on the surface
- Quaternary structure: protein domains may assemble into so-called quaternary structures
12 Protein Structure
13 Protein Structure
14 Model Molecule: Hemoglobin
15 Hemoglobin Background
- Protein in red blood cells
16 Red Blood Cell (Erythrocyte)
17 Hemoglobin Background
- Composed of four subunits, each containing a heme group: a ring-like structure with a central iron atom that binds oxygen
18 Heme Groups in Hemoglobin
19 Hemoglobin Background
- Picks up oxygen in the lungs and releases it in peripheral tissues (e.g., muscles)
20 Hemoglobin Quaternary Structure
Two alpha subunits and two beta subunits (141 amino acids per alpha, 146 per beta)
21 Hemoglobin Tertiary Structure
One beta subunit (8 alpha helices)
22 Hemoglobin Secondary Structure
Alpha helix
23 Proteins III
- Protein motions of importance are torsional oscillations about the bonds that link groups together
- Substantial displacements of groups occur over long time intervals
- Collective motions are either local (cage structure) or rigid-body (displacement of different regions)
- What is the importance of these fluctuations for biological function?
24 Proteins IV
- Effect of fluctuations
  - Thermodynamics: equilibrium behavior is important; example: energy of ligand binding
  - Dynamics: displacements from the average structure are important; example: local sidechain motions that act as conformational gates in oxygen transport (myoglobin), enzymes, and ion channels
25 Proteins V: Local Motions
- 0.01-5 Å, 1 fs to 0.1 s
- Atomic fluctuations
  - Small displacements for substrate binding in enzymes
  - Energy source for barrier crossing and other activated processes (e.g., ring flips)
- Sidechain motions
  - Opening pathways for ligand (myoglobin)
  - Closing active site
- Loop motions
  - Disorder-to-order transition as part of virus formation
26 Proteins VI: Rigid-Body Motions
- 1-10 Å, 1 ns to 1 s
- Helix motions
  - Transitions between substates (myoglobin)
- Hinge-bending motions
  - Gating of active-site region (liver alcohol dehydrogenase)
  - Increasing binding range of antigens (antibodies)
27 Proteins VII: Large-Scale Motion
- > 5 Å, 1 microsecond to 10000 s
- Helix-coil transition
  - Activation of hormones
  - Protein folding transition
- Dissociation
  - Formation of viruses
- Folding and unfolding transition
  - Synthesis and degradation of proteins
- Role of motions sometimes only inferred from two or more conformations in structural studies
28 Study of Dynamics I
- The computational study of atomic fluctuations in BPTI and other proteins has shown that
  - The directional character of active-site fluctuations in enzymes contributes to catalysis
  - Small-amplitude fluctuations act as a lubricant
  - It may be possible to extrapolate from short-time fluctuations to larger-scale protein motions
29 Study of Dynamics II
- Collective motions are particularly important for biological function, e.g., displacements for the transition from inactive to active
- The extended nature of these motions makes them sensitive to the environment: there is a great difference between vacuum and solution simulations
- Collective motions transmit external solvent effects to the protein interior
30 Study of Dynamics III
- For the related storage protein, myoglobin
  - Fluctuations in the globin are essential to binding: the protein matrix in the X-ray structure is so tightly packed that there is no low-energy path for the ligand to enter or leave the heme pocket
  - Only through structural fluctuations can the barriers be lowered sufficiently
  - Demonstrated through energy minimization and molecular dynamics
31 Study of Dynamics IV
- For the transport protein hemoglobin there are several important motions
  - Oxygen binding produces a tertiary structural change
  - A quaternary structural change from the deoxy (low oxygen affinity) to the oxy configuration takes place. This transmits information over a long distance
  - From the X-ray deoxy and oxy structures, a stochastic reaction path has been found. Detailed ligand-binding simulations have been performed using MD. A statistical mechanical model has provided the coupling between these two processes
32 Study of Dynamics VI
- Three open problems are the following
  - Ion channel gating: highly correlated fluctuations are likely to be of great importance. Long-time dynamics problem
  - Flexible docking: for MMP, enzymes, etc., fluctuations enter into the thermodynamics and kinetics of reactions. Sampling problem
  - Protein folding: too complicated for full treatment except for the smallest proteins; beyond current methodology. Coarsening problem
33 Possible topics for final projects
- Applications
  - Virtual screening
  - Extend recommender for MD protocols
- Algorithms
  - Multiscale integrators or sampling methods
  - Cellular automata solvers for diffusion, reaction, advection, etc.
- Software
  - 3D visualization
  - Extend simulation engines
34 How to create hierarchical, multiscale, multilevel algorithms?
- Examples
  - Algorithms for the N-body problem (linear complexity, multiple grids), e.g., Matthey and Izaguirre (2004), J. Par. Dist. Comp.
  - Multiscale integration (15 orders of magnitude gap in timescales), e.g., Ma and Izaguirre (2003), Multisc. Model. Simul.
  - Coarse approximations (using averaging, stochastic, or ensemble solutions), e.g., Izaguirre and Hampton (2004), J. Comp. Phys.
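One common way to exploit a gap in timescales is multiple time stepping: slow forces are applied as impulses on a long outer step, while fast forces are integrated with a short inner step. The sketch below shows this impulse (r-RESPA-style) idea on a toy problem. It is not the method of the papers cited above; the two-spring system, the names, and the step sizes are assumptions chosen for illustration.

```python
def impulse_mts(x, v, fast_force, slow_force, mass, outer_dt, n_inner, n_outer):
    """Impulse multiple-time-stepping: slow kicks wrap an inner Verlet loop."""
    inner_dt = outer_dt / n_inner
    for _ in range(n_outer):
        v += 0.5 * outer_dt * slow_force(x) / mass       # slow half kick
        for _ in range(n_inner):                         # fast motion, small step
            v += 0.5 * inner_dt * fast_force(x) / mass
            x += inner_dt * v
            v += 0.5 * inner_dt * fast_force(x) / mass
        v += 0.5 * outer_dt * slow_force(x) / mass       # slow half kick
    return x, v

# Toy system (assumption): a stiff spring (fast) plus a soft spring (slow).
K_FAST, K_SLOW, MASS = 100.0, 1.0, 1.0

def fast_force(x):
    return -K_FAST * x

def slow_force(x):
    return -K_SLOW * x

def energy(x, v):
    return 0.5 * MASS * v * v + 0.5 * (K_FAST + K_SLOW) * x * x

e0 = energy(1.0, 0.0)
x_end, v_end = impulse_mts(1.0, 0.0, fast_force, slow_force, MASS,
                           outer_dt=0.05, n_inner=10, n_outer=2000)
relative_drift = abs(energy(x_end, v_end) - e0) / e0
```

In this toy the slow force is evaluated ten times less often than the fast one. The outer step must stay well below half the period of the fastest motion, since impulse schemes are known to become unstable near that resonance.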
35 Lengthening Scales: DPD
- Dissipative Particle Dynamics (DPD) combines coarsening of atoms into fluid packages with dissipative pair interactions and a stochastic pair interaction
- Total momentum is conserved
- Example: self-organization of a lipid bilayer, a self-assembled aggregate formed by amphiphilic lipid molecules in water
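The pairwise structure of DPD is what conserves total momentum: each conservative, dissipative, and random contribution acts along the line between two particles and is applied with opposite signs to the pair. Below is a minimal sketch using the commonly used linear weight function w(r) = 1 - r/r_cut, with the noise amplitude tied to the friction by sigma^2 = 2 * gamma * kT. The parameter values, the function name, and the absence of periodic boundaries are assumptions for illustration.

```python
import math
import random

def dpd_forces(positions, velocities, a=25.0, gamma=4.5, kT=1.0,
               r_cut=1.0, dt=0.04, rng=None):
    """Sum conservative, dissipative, and random DPD pair forces."""
    rng = rng or random.Random(0)
    sigma = math.sqrt(2.0 * gamma * kT)   # fluctuation-dissipation relation
    n = len(positions)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            rij = [positions[i][k] - positions[j][k] for k in range(3)]
            r = math.sqrt(sum(c * c for c in rij))
            if r >= r_cut or r == 0.0:
                continue
            e = [c / r for c in rij]                      # unit vector j -> i
            vij = [velocities[i][k] - velocities[j][k] for k in range(3)]
            w = 1.0 - r / r_cut
            f_cons = a * w                                # soft repulsion
            f_diss = -gamma * w * w * sum(e[k] * vij[k] for k in range(3))
            # One random number per pair, shared by both particles.
            f_rand = sigma * w * rng.gauss(0.0, 1.0) / math.sqrt(dt)
            f = f_cons + f_diss + f_rand
            for k in range(3):
                forces[i][k] += f * e[k]
                forces[j][k] -= f * e[k]                  # equal and opposite
    return forces

# Four hypothetical fluid "packages", all within the cutoff of each other.
positions = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (0.0, 0.6, 0.0), (0.3, 0.3, 0.4)]
velocities = [(0.1, 0.0, 0.0), (-0.2, 0.1, 0.0), (0.0, -0.1, 0.3), (0.0, 0.0, -0.1)]
forces = dpd_forces(positions, velocities)
net_force = [sum(f[k] for f in forces) for k in range(3)]
```

Because every pair force is applied with opposite signs to its two particles, `net_force` is zero up to rounding, so total momentum does not change regardless of the random numbers drawn.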
36 Lengthening Scales: SRP
- Enzyme simulation of a millisecond using a stochastic reaction path (SRP); disadvantage: needs initial and final configurations
- Finds a trajectory where global energy is minimized
37 How to predict protein interaction networks?
- Goal
  - Predict proteins in a genome that are likely to interact, thus giving clues as to their function.
- Our current solution starts from experimental interaction data and uses clustering and a set cover approach to predict novel interactions.
- This is documented in Huang et al. (2004), IEEE/ACM TCBB, submitted
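The slide does not spell out how the set cover step works, so the following is only a generic sketch of the classic greedy set cover heuristic, not the method of the cited paper. The toy data assume that each candidate feature pair (the hypothetical labels "domA-domB" and so on) could explain a set of observed protein-protein interactions, and the aim is to pick a small set of candidates that explains all of them.

```python
def greedy_set_cover(universe, candidates):
    """Repeatedly pick the candidate that covers the most uncovered items."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best = max(candidates, key=lambda name: len(candidates[name] & uncovered))
        gained = candidates[best] & uncovered
        if not gained:
            raise ValueError("candidates cannot cover the universe")
        chosen.append(best)
        uncovered -= gained
    return chosen

# Hypothetical observed interactions between proteins P1..P4.
interactions = {("P1", "P2"), ("P1", "P3"), ("P2", "P4"), ("P3", "P4")}
# Hypothetical candidates and the interactions each one could explain.
candidates = {
    "domA-domB": {("P1", "P2"), ("P1", "P3")},
    "domA-domC": {("P1", "P3"), ("P3", "P4")},
    "domB-domC": {("P2", "P4"), ("P3", "P4")},
    "domC-domD": {("P3", "P4")},
}
chosen = greedy_set_cover(interactions, candidates)
```

In a scheme like this, the selected candidates could then be used to score protein pairs with no experimental data. The greedy choice is a heuristic and does not guarantee the smallest possible cover.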
38 How to create high-performance software that is easy to use?
ProtoMol, CompuCell3D, Biologo
- Goals
  - Encapsulate optimizations like parallelism and cluster/grid computing so that these can be used easily. MATLAB and Mathematica are examples of easy-to-use scientific software
  - Allow easy prototyping of algorithms and extension of the software by computational scientists (not expert computer scientists)
- Our current solutions use
  - Generic and object-oriented programming
  - Design patterns
  - XML-based domain-specific languages
- Related publications
  - Matthey et al. (2004) ACM Trans. Math. Software, 20(3)
  - Cickovski et al. (2004) IEEE/ACM Trans. Comput. Biol. and Bioinformatics
  - Cickovski and Izaguirre (2004) ACM Trans. Prog. Lang. and Systems, in preparation
ProtoMol is open source and available at http://protomol.sourceforge.net
39 How to help users select software, algorithms, and parameters to solve their problems?
[Diagram labels: Simulation Requirements; ProtoMol/MDSimAid Server; Optimal parameters via XML]
- Goal
  - Recommend optimal software and architectural parameters to solve particular problems
  - Make this easily available as web portals
- Our solution uses performance models and machine learning to generate rules, and run-time optimization to fine-tune suggestions. We want to use agents and machine learning to update the rules. This is documented in Ko (2002) and Crocker et al. (2004), J. Comp. Chem.