Title: Structural bioinformatics for glycobiology
1Structural bioinformatics for glycobiology
2Structural glycoinformatics approaches
- Structural modeling
- Comparative modeling of glycoproteins
- Complex modeling: glycoprotein replacement
- Modeling of complexes of glycans with GBPs and GTs: docking
- Analysis of interaction specificities
- Key residues vs. specific glycan conformations
- Molecular dynamics
- Modeling the dynamics of glycan recognition by GBPs
- Modeling the enzymology of GTs: quantum mechanical calculations
3Approaches to predicting protein structures
- Obtain the sequence (target)
- Sequence-sequence alignment or sequence-structure alignment
- Fold assignment
- High identity, long alignment: comparative modeling
- Low identity, fragment alignment: ab initio modeling
- Build and assess the model
4Comparative modeling of proteins
- Definition
- Prediction of the three-dimensional structure of a target protein from its amino acid sequence (primary structure), using a homologous (template) protein for which an X-ray or NMR structure is available.
- Why a model
- A model is desirable when neither X-ray crystallography nor NMR spectroscopy can determine the structure of a protein in time, or at all. The built model provides a wealth of information on how the protein functions at the level of individual residue properties, e.g. the interaction with ligands, such as GBPs/GTs with glycans.
5Comparative Modeling (or homology modeling)
6Homology models can be very smart!
Homology models have RMSDs of less than 2 Å more than 70% of the time.
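The RMSD quoted above can be computed as follows. This is a minimal sketch that assumes the model and the reference structure are already superimposed and list equivalent atoms in the same order; the function name `rmsd` and the toy coordinates are illustrative and not taken from the slides.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally sized lists of (x, y, z) points.

    Assumes the structures are already superimposed; no fitting is done here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Toy example: every atom of the model is displaced by 1 Å along z.
model = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (3.0, 0.0, 1.0)]
print(rmsd(model, reference))
```

In practice the two structures are first optimally superimposed (for example with the Kabsch algorithm) before the deviation is measured.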
7Sequence similarity implies structural similarity?
8Step 1 Fold Identification
Aim: to find one or more template structures in the Protein Data Bank (PDB)
Pairwise sequence alignment finds high-homology sequences: BLAST
Fold recognition programs find low-homology sequences (threading, profile-profile alignment)
Improved multiple sequence alignment methods improve sensitivity for remote homologs: PSI-BLAST, CLUSTAL
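To make the idea of pairwise sequence alignment concrete, here is a small sketch of global alignment by dynamic programming (Needleman-Wunsch) followed by a percent-identity calculation. It is a teaching example only: BLAST uses a heuristic local alignment with substitution matrices, and the scoring values (`match`, `mismatch`, `gap`) and function names below are assumptions chosen for illustration.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of strings a and b; returns (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]

def percent_identity(aligned_a, aligned_b):
    """Identical, non-gap columns as a percentage of the alignment length."""
    same = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * same / len(aligned_a)

aln_a, aln_b, aln_score = needleman_wunsch("ACGT", "ACT")
print(aln_a, aln_b, aln_score, percent_identity(aln_a, aln_b))
```

A high percent identity over a long alignment is the usual signal that a template is suitable for comparative modeling.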
9Step 2 Model Construction
Aim: to build the three-dimensional (3D) structure of the target protein (the coordinates of every atom) from its homologous template proteins
Approach 1: build up the protein structure from cores, loops and side chains
Approach 2: whole-protein modeling by constraint-based optimization
Commonly used programs: Modeller (http://salilab.org/modeller/), Swiss-Model (http://swissmodel.expasy.org/), Geno3D (http://geno3d-pbil.ibcp.fr/)
10Step 3 Model Construction
11Modeling of glycan-protein complexes
- Template glycan-protein complex
- Case 1: same glycan, different protein
- Glycoprotein replacement: comparative modeling of protein structure
- Energy minimization, allowing structural flexibility of glycans
- Case 2: same protein, different glycan
- Flexible docking of glycans
- Case 3: different protein and different glycan
- Comparative modeling of proteins
- Flexible docking of glycans
- Can also be applied without a template of the complex
12Flexible docking
- Semi-flexible (rigid protein, flexible ligand)
- Useful for drug screening
- >150 programs: Dock, AutoDock, FlexX/FlexE, ...
- Flexible protein: mainly side chains (hard)
- Two elements of semi-flexible docking algorithms
- Ligand sampling methods
- Pattern matching, genetic algorithm, molecular dynamics, Monte Carlo
- Treatment of intermolecular forces
- Simplified scoring functions: empirical, knowledge-based and molecular mechanics, e.g. AMBER, CHARMM, GROMOS, ...
- Very simple treatment of solvation and entropy, or completely ignored!
13Flexible docking of glycans to proteins
- Glycan structure sampling
- Automatic generation / sampling of 3D glycan structures: Sweet II (http://www.dkfz-heidelberg.de/spec/sweet2)
- Docking of each glycan conformation to the GBP
- Scoring schemes
- Empirical scores
- Force field
- GLYCAM: modified AMBER force field / MD tools for glycans (R. Woods group)
- Challenge: water molecules
14Flexibility of molecules
- Atoms connected by covalent bonds
- Bond lengths and bond angles are rigid
- Torsion (dihedral) angles are flexible
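Since torsion angles carry the flexibility, it helps to see how one is measured from four atom positions. The sketch below uses the standard atan2 formulation with the IUPAC sign convention; the function name `dihedral` and the test coordinates are illustrative assumptions. For a glycosidic φ angle in the crystallographic style, the four points would be the O5, C1, O and C'x atoms.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def dihedral(p1, p2, p3, p4):
    """Torsion angle in degrees, in the range (-180, 180], for atoms p1-p2-p3-p4."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1 = _cross(b1, b2)  # normal of the plane through p1, p2, p3
    n2 = _cross(b2, b3)  # normal of the plane through p2, p3, p4
    x = _dot(n1, n2)
    y = _dot(b2, _cross(n1, n2)) / math.sqrt(_dot(b2, b2))
    return math.degrees(math.atan2(y, x))

# Toy geometry with a right-angle twist about the central bond.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Rotating only such angles while keeping bond lengths and bond angles fixed is what conformational sampling of a glycan amounts to.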
15Frequently used definitions of glycosidic torsion angles
Angle              NMR style        C-1 crystallographic style   C+1 crystallographic style
φ                  H1-C1-O-C'x      O5-C1-O-C'x                  O5-C1-O-C'x
ψ                  C1-O-C'x-H'x     C1-O-C'x-C'(x-1)             C1-O-C'x-C'(x+1)
ψ, (1-6)-linkage   C1-O-C'6-C'5     C1-O-C'6-C'5                 C1-O-C'6-C'5
ω, (1-6)-linkage   O-C'6-C'5-H'5    O-C'6-C'5-C'4                O-C'6-C'5-O'5
Source: Sweet2, http://www.dkfz-heidelberg.de/spec/sweet2/
16Induced fit? rigid receptor hypothesis
17Preferred torsion angles of glycans
18Cone-like (left) and umbrella-like (right) topologies of α2-3 and α2-6 sialylated glycans binding to influenza viral HAs
Chandrasekaran et al., Nature Biotechnology 26, 107-113 (2008)
19Combining structural analysis with glycan array analysis provides structural insights.
M. E. Taylor and K. Drickamer, Glycobiology 2009, 19(11):1155-1162
20Ligand binding by the scavenger receptor C-type
lectin (SRCL) and LSECtin
M. E. Taylor and K. Drickamer, Glycobiology 2009, 19(11):1155-1162
21Binding of multiple classes of ligands to DC-SIGN
and the macrophage galactose receptor. Model of
the binding site in the macrophage galactose
receptor with a bound GalNAc residue, based on
the structure of the galactose-binding mutant of
mannose-binding protein that was created by
insertion of key binding site residues from the
galactose-binding receptor.
M. E. Taylor and K. Drickamer, Glycobiology 2009, 19(11):1155-1162
22Mechanisms of mannose-binding protein interaction
with ligands.
M. E. Taylor and K. Drickamer, Glycobiology 2009, 19(11):1155-1162
23Molecular Dynamics simulation of molecular
motions
- Energy model of conformation
- Two main approaches
- Monte Carlo - stochastic
- Molecular dynamics - deterministic
- Understand molecular function and interactions
- Catalysis of enzymes
- Complementary to experiments
- Obtain a movie of the interacting molecules
24Basic Concepts of simulation of molecular motion
- Compute the energy of the interactions between all pairs of atoms.
- Move atoms to the next state.
- Repeat.
25Energy Function
- Target function that MD uses to govern the motion of molecules (atoms)
- Describes the interaction energies of all atoms and molecules in the system
- Always an approximation
- Closer to real physics → more realistic, more computation time (i.e. smaller time steps and more interactions increase accuracy)
26Scale in Simulations
[Figure: simulation methods plotted by time scale (10^-12 s to 10^-6 s) against length scale (10^-10 m to 10^-4 m). Labels on the plot: quantum chemistry, molecular dynamics, Monte Carlo, mesoscale, continuum, domain. Annotations: F = ma, exp(-ΔE/kT).]
Taken from Grant D. Smith, Department of Materials Science and Engineering, Department of Chemical and Fuels Engineering, University of Utah, http://www.che.utah.edu/gdsmith/tutorials/tutorial1.ppt
27The energy model
- Proposed by Linus Pauling in the 1930s
- Bond angles and lengths are almost always the same
- Energy model broken up into two parts
- Covalent terms
- Bond distances (1-2 interactions)
- Bond angles (1-3)
- Dihedral angles (1-4)
- Non-covalent terms
- Forces at a distance between all non-bonded atoms
The NIH Guide to Molecular Modeling: http://cmm.cit.nih.gov/modeling/guide_documents/molecular_mechanics_document.html
28The energy equation
- Energy = Stretching Energy + Bending Energy + Torsion Energy + Non-Bonded Interaction Energy
- These equations, together with the data (parameters) required to describe the behavior of different kinds of atoms and bonds, are called a force field.
29Bond Stretching Energy
kb is the spring constant of the bond. r0 is the bond length at equilibrium.
Unique kb and r0 are assigned for each bonded pair of atom types, e.g. C-C, O-H
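The equation shown on this slide did not survive extraction. Assuming the harmonic (Hooke's law) form used in the NIH guide cited on the energy-model slide, the stretching term with the parameters named above would read:

```latex
E_{\text{stretch}} = \sum_{\text{bonds}} k_b \,(r - r_0)^2
```

Here $r$ is the current bond length. Some force fields write the same term with an explicit factor of $\tfrac{1}{2}$, which only changes the numerical value of $k_b$.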
30Bending Energy
kθ is the spring constant of the bend. θ0 is the bond angle at equilibrium.
Unique parameters for angle bending are assigned
to each bonded triplet of atoms based on their
types (e.g. C-C-C, C-O-C, C-C-H, etc.)
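The slide's equation is missing from the extracted text. Assuming the same harmonic form as for bond stretching, with $\theta$ the bond angle, $\theta_0$ its equilibrium value and $k_\theta$ the bending spring constant, the bending term would be:

```latex
E_{\text{bend}} = \sum_{\text{angles}} k_\theta \,(\theta - \theta_0)^2
```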
31Torsion Energy
The parameters are determined from curve
fitting. Unique parameters for torsional rotation
are assigned to each bonded quartet of atoms
based on their types (e.g. C-C-C-C, C-O-C-N,
H-C-C-H, etc.)
- A controls the amplitude of the curve
- n controls its periodicity
- φ shifts the entire curve along the rotation angle axis (τ).
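The torsion equation itself was lost in extraction. Assuming the periodic form used in the NIH guide cited on the energy-model slide, with amplitude $A$, periodicity $n$, rotation angle $\tau$ and a phase shift written here as $\phi$, it would read:

```latex
E_{\text{torsion}} = \sum_{\text{torsions}} A \,\bigl[\, 1 + \cos(n\tau - \phi) \,\bigr]
```

For example, $n = 3$ and $\phi = 0$ gives three equivalent minima per full rotation, as for rotation about a C-C single bond between two sp3 carbons.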
32Non-bonded Energy
A determines the degree of attractiveness, B determines the degree of repulsion, and q is the charge.
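The non-bonded equation is missing from the extracted slide. Assuming the usual sum of a Lennard-Jones (van der Waals) term and a Coulomb term, with the parameter roles described above and unit conversion constants omitted, it would be:

```latex
E_{\text{non-bonded}} = \sum_{i} \sum_{j > i} \left( -\frac{A_{ij}}{r_{ij}^{6}} + \frac{B_{ij}}{r_{ij}^{12}} \right) + \sum_{i} \sum_{j > i} \frac{q_i \, q_j}{r_{ij}}
```

Here $r_{ij}$ is the distance between non-bonded atoms $i$ and $j$. Real force fields add a Coulomb constant and often a dielectric term to the electrostatic sum.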
33Simulating In A Solvent
- The smaller the system, the larger the fraction of particles on the surface
- 1000-atom cubic crystal: 49% on surface
- 10^6-atom cubic crystal: 6% on surface
- Would like to simulate infinite bulk surrounding the N-particle system
- Two approaches
- Implicitly
- Explicitly
- Periodic boundary conditions
Schematic representation of periodic boundary conditions: http://www.ccl.net/cca/documents/molecular-modeling/node9.html
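The surface fractions quoted on this slide can be checked with a few lines, and the same snippet sketches how periodic boundary conditions are typically applied to one coordinate. It assumes a simple cubic arrangement of atoms and a cubic box; the function names are illustrative.

```python
def surface_fraction(n_per_edge):
    """Fraction of atoms on the surface of a simple cubic crystal with n_per_edge atoms per edge."""
    total = n_per_edge ** 3
    interior = max(n_per_edge - 2, 0) ** 3
    return (total - interior) / total

def wrap(x, box):
    """Map a coordinate back into the primary box [0, box)."""
    return x % box

def minimum_image(dx, box):
    """Shortest separation along one axis between a particle and the nearest periodic image."""
    return dx - box * round(dx / box)

print(surface_fraction(10))        # 1000-atom crystal
print(surface_fraction(100))       # 10^6-atom crystal
print(wrap(10.5, 10.0), minimum_image(9.0, 10.0))
```

With periodic boundaries, a particle leaving one face of the box re-enters through the opposite face, and distances are measured to the nearest periodic image, so no atom sits at an artificial surface.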
34Parameters for MD Forcefield
- Derived from direct experimental measurements on small molecules (10 atoms)
- Commonly used: AMBER, CHARMM, GROMOS, etc.
- GLYCAM for MD of glycoconjugates (derived from the AMBER force field)
35Monte Carlo
- Explore the energy surface by randomly probing the configuration space with a Markov chain approach
- Metropolis method (avoids getting trapped in local minima)
- Step 1: Specify the initial atom coordinates.
- Step 2: Select atom i randomly and move it by a random displacement.
- Step 3: Calculate the change in potential energy, ΔE, corresponding to this displacement.
- Step 4: If ΔE < 0, accept the new coordinates and go to step 2.
- Step 5: Otherwise, if ΔE ≥ 0, select a random R in the range [0,1] and
- If R < e^(-ΔE/kT), accept and go to step 2
- If R ≥ e^(-ΔE/kT), reject and go to step 2
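The Metropolis procedure can be sketched for a single particle in one dimension. This is a toy illustration, not a molecular simulation: the harmonic energy function, the step size, kT = 1 and the function name `metropolis` are all assumptions. It uses the standard acceptance rule, in which an uphill move is accepted when a uniform random number R falls below exp(-ΔE/kT).

```python
import math
import random

def metropolis(energy, x0, n_steps, kT=1.0, max_step=1.5, seed=0):
    """Sample a 1D coordinate with the Metropolis criterion.

    Returns the list of visited positions and the fraction of accepted moves.
    """
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    samples, accepted = [], 0
    for _ in range(n_steps):
        x_new = x + rng.uniform(-max_step, max_step)  # random trial displacement
        delta_e = energy(x_new) - e
        # Downhill moves are always accepted; uphill moves with probability exp(-dE/kT).
        if delta_e < 0 or rng.random() < math.exp(-delta_e / kT):
            x, e = x_new, e + delta_e
            accepted += 1
        samples.append(x)  # a rejected move counts the old state again
    return samples, accepted / n_steps

def harmonic(x):
    return 0.5 * x * x

samples, acceptance = metropolis(harmonic, x0=0.0, n_steps=50000)
mean_square = sum(s * s for s in samples) / len(samples)
print(acceptance, mean_square)
```

For this harmonic potential at kT = 1 the mean of x² should come out close to 1, which is a quick check that the sampler reproduces the Boltzmann distribution.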
36Deterministic Approach
- Provides us with a trajectory of the system.
- From atom positions, velocities, and accelerations, calculate atom positions and velocities at the next time step.
- Integrating these infinitesimal steps yields the trajectory of the system for any desired time range.
- Typical simulations of small proteins, including surrounding solvent, run on the picosecond time scale.
37Deterministic / MD methodology
- From atom positions, velocities, and accelerations, calculate atom positions and velocities at the next time step.
- Integrating these infinitesimal steps yields the trajectory of the system for any desired time range.
- There are efficient methods for integrating these elementary steps, with the Verlet and leapfrog algorithms being the most commonly used.
38MD algorithm
- Initialize system
- Ensure particles do not overlap in initial positions (can use lattice)
- Randomly assign velocities.
- Move and integrate: r(t), v(t) → r(t+Δt), v(t+Δt)
- Leapfrog algorithm
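The integration step can be sketched for one particle in one dimension. The code below uses the kick-drift-kick form of leapfrog, which is algebraically equivalent to velocity Verlet; the harmonic force, unit mass, time step and function name are assumptions made for the illustration.

```python
import math

def leapfrog(force, x, v, mass, dt, n_steps):
    """Integrate Newton's equation F = ma; returns the list of (x, v) states."""
    trajectory = [(x, v)]
    a = force(x) / mass
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * a   # half kick: velocity at t + dt/2
        x = x + dt * v_half         # drift: position at t + dt
        a = force(x) / mass         # new acceleration from the new position
        v = v_half + 0.5 * dt * a   # second half kick: velocity at t + dt
        trajectory.append((x, v))
    return trajectory

def spring_force(x):
    return -1.0 * x  # harmonic spring with k = 1

traj = leapfrog(spring_force, x=1.0, v=0.0, mass=1.0, dt=0.01, n_steps=1000)
x_end, v_end = traj[-1]
energy_end = 0.5 * v_end ** 2 + 0.5 * x_end ** 2
print(x_end, math.cos(10.0), energy_end)
```

The final position should track the analytic solution cos(t), and the total energy should stay close to its initial value of 0.5, which is the property that makes this family of integrators popular for MD.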
39MD studies of Prion proteins
- Prion protein (PrP) is associated with an unusual class of neurodegenerative diseases
- Scrapie in sheep; bovine spongiform encephalopathy (BSE) in cattle; kuru, Creutzfeldt-Jakob disease (CJD), Gerstmann-Sträussler-Scheinker syndrome (GSS), and fatal familial insomnia (FFI) in humans
- Protein-only hypothesis (Prusiner, 1982): the disease is caused by an abnormal form of the 250 amino acid PrP, which accumulates in plaques in the brain.
- The abnormal form (PrPSc) differs from the normal cellular form (PrPC) only in its 3-D structure, and FTIR and CD spectra indicate it has a significantly increased content of β-sheet conformation compared with PrPC
- Glycosylation appears to protect prion protein (PrPC) from the conformational transition to the disease-associated scrapie form (PrPSc)
40PrP is a glyco-protein
- Available NMR structures are for non-glycosylated PrPC only
- Glycosylation appears to protect prion protein (PrPC) from the conformational transition to the disease-associated scrapie form (PrPSc)
- Objective: study the influence of the two N-linked glycans (at Asn181 and Asn197) and of the GPI anchor attached to Ser230
Zuegg et al., Glycobiology, 2000, 10(10):959-974.
41MD simulations
- Molecular dynamics simulations of the C-terminal region of human prion protein, HuPrP(90-230), with and without the three glycans
- AMBER94 force field in a periodic box model with explicit water molecules, considering all long-range electrostatic interactions
- HuPrP(127-227) is stabilized overall by addition of the glycans, specifically by extension of two helices and reduced flexibility of the linking turn containing Asn197
- The stabilization appears to be indirect, through reduced mobility of the surrounding water molecules, and not from specific interactions such as H bonds or ion pairs.
- Asn197 has a stabilizing role, while Asn181 lies within a region with already stable secondary structure
Zuegg et al., Glycobiology, 2000, 10(10):959-974.
42Cone-like (left) and umbrella-like (right) topologies of α2-3 and α2-6 sialylated glycans binding to influenza viral HAs
A retrospective analysis
Chandrasekaran et al., Nature Biotechnology 26, 107-113 (2008)
43MD simulation of glycan binding of influenza HAs
- A combined approach (MD + sequences) to predict ligand-binding mutants of H5N1 influenza HA
- Modeling the ligand-bound state of H5N1 HA using the isolate VN1194 bound to α2,3-sialyllactose, as previously crystallized
- Excess mutual information was computed between each residue of each monomer and the corresponding bound ligand, using the average mutual information between the residue and all residues as an estimate of the background mutual information.
- These results are combined with sequence analysis of H5N1 mutational data to predict clusters of residues that undergo coordinated mutation, which have some capacity to vary but are subject to selective pressure related to mutation. These residues may be richer targets for changing ligand specificity than residues that are absolutely conserved or residues that display uncorrelated mutations (involved in immune escape).
Kasson et al., JACS, 2009, 131(32), pp 11338-11340
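The excess mutual information idea can be illustrated on discretized data. The sketch below assumes that each residue and the ligand have been reduced to a sequence of discrete state labels, one per simulation frame; this discretization, the plug-in estimator, the toy data and the function names are assumptions for illustration and are not taken from the cited paper.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Plug-in estimate of mutual information (in bits) between two label sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x,y) * log2( p(x,y) / (p(x) p(y)) ), written in terms of counts
        mi += (count / n) * math.log2(count * n / (px[x] * py[y]))
    return mi

def excess_mutual_information(residue, ligand, other_residues):
    """MI with the ligand minus the average MI with the other residues (the background)."""
    background = sum(mutual_information(residue, r) for r in other_residues) / len(other_residues)
    return mutual_information(residue, ligand) - background

# Toy per-frame state labels: the residue follows the ligand and ignores its neighbor.
residue_states = [0, 1, 0, 1]
ligand_states = [0, 1, 0, 1]
neighbor_states = [0, 0, 1, 1]
print(excess_mutual_information(residue_states, ligand_states, [neighbor_states]))
```

A residue whose motions are more strongly coupled to the ligand than to the rest of the protein scores high, which is the property used to rank candidate sites.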
44Experimentally identified ligand-binding mutations in red, the top 5% of residues by dynamics scoring in cyan (overlap of these two in magenta), and the six mutation sites identified by both dynamics and sequence analysis in yellow. The top three mutations from the ligand dissociation analyses in yellow. A modeled α2,3-sialyllactose is shown in orange.
45Prediction of dissociation rate for HA mutants
(in silico mutagenesis)
- Bayesian analysis methods are used to predict dissociation rates based on extensive simulation of each mutant and to evaluate whether a mutant has a faster dissociation rate than the influenza clinical isolate used as a wild-type reference.
- These simulations were used to estimate the dissociation rate for each mutation.
- The mutation sites predicted by analysis of the molecular dynamics data include both residues immediately contacting the bound glycan and residues located farther away on the globular head of the hemagglutinin molecule.