Title: Bioinformatics I
1Swiss Institute of Bioinformatics
Bioinformatics I: Ab initio Protein Structure Modeling, Fold Recognition
14.1.2003
Torsten.Schwede_at_unibas.ch
2Growth of the Protein Data Bank (PDB)
08 January 2003: 19691 structures
PDB: http://www.pdb.org
3Public Database Holdings
- No experimental structure for most sequences
4In the near future for most of the known protein
sequences no experimental structure will be
available.
Can we predict protein structures from genome
sequences?
5gattccagag atggacgctt ttgctcttat tcctcgtact
cagtggcaat atgtgatggg tccttcactt taccgaataa
tgaacaacct cttttaattt tataaatacc
ttctataaat acttaggagg tattatgaat atatttgaaa
tgttacgtat agatgaacgt cttagactta aaatctataa
agacacagaa ggctattaca ctattggcat cggtcatttg
cttacaaaaa gtccatcact taatgctgct aaatctgaat
tagataaagc tattgggcgt aattgcaatg gtgtaattac
aaaagatgag gctgaaaaac tctttaatca ggatgttgat
gctgctgttc gcggaattct gagaaatgct aaattaaaac
cggtttatga ttctcttgat gcggttcgtc gctgtgcatt
gattaatatg gttttccaaa tgggagaaac cggtgtggca
ggatttacta actctttacg tatgcttcaa caaaaacgct
gggatgaagc agcagttaac ttagctaaaa gtatatggta
taatcaaaca cctaatcgcg caaaacgagt cattacaacg
tttagaactg
↓ Gene prediction
MNIFEMLRID EGLRLKIYKD TEGYYTIGIG HLLTKSPSLN
AAKSELDKAI GRNCNGVITK DEAEKLFNQD VDAAVRGILR
NAKLKPVYDS LDAVRRCALI NMVFQMGETG
VAGFTNSLRM LQQKRWDEAA VNLAKSRWYN QTPNRAKRVI
TTFRTGTWDA YKNL
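Once a coding region has been located, the last step of gene prediction is a codon-by-codon translation with the standard genetic code. The sketch below is only an illustration of that step: the function name `translate` and the compact codon-table construction are assumptions made for this example, and the input is the first 30 bases of the coding region shown above, starting at the `atg`.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a coding DNA string (spaces allowed) until the first stop codon."""
    dna = dna.replace(" ", "").upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":  # stop codon
            break
        protein.append(aa)
    return "".join(protein)


# First ten codons of the open reading frame from the slide.
orf_start = "atgaat atatttgaaa tgttacgtat agat"
print(translate(orf_start))
```

The output reproduces the first ten residues of the protein sequence on the slide. Locating the reading frame in the first place (the actual gene prediction problem) is not addressed by this sketch.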
↓
Can we predict protein structures from protein
sequences?
6- Many proteins fold spontaneously to their native structure
- Protein folding is relatively fast
- Chaperones speed up folding, but do not alter the structure
The protein sequence contains all information needed to create a correctly folded protein.
7Empirical Force Fields and Molecular Mechanics
- describe interaction of atoms or groups
- the parameters are empirical, i.e. they depend on one another and have no direct intrinsic meaning
- Examples: GROMOS96 (van Gunsteren), CHARMM (M. Karplus), AMBER (Kollman)
8- Bond stretching
- Approximation of the Morse potential by an elastic spring model
- Hooke's law as a reasonable approximation close to the reference bond length l0
k: force constant, l: distance
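To make the approximation concrete, the following sketch compares a Morse potential, D_e (1 - exp(-a (l - l0)))^2, with the harmonic (Hooke's law) form, (k/2) (l - l0)^2. The parameter values are illustrative assumptions, not taken from any force field; the force constant is chosen as k = 2 D_e a^2 so that both curves have the same curvature at l0.

```python
import math


def morse(l, d_e, a, l0):
    """Morse potential: well depth d_e, width parameter a, reference length l0."""
    return d_e * (1.0 - math.exp(-a * (l - l0))) ** 2


def harmonic(l, k, l0):
    """Hooke's law (elastic spring) approximation around l0."""
    return 0.5 * k * (l - l0) ** 2


# Illustrative, assumed parameters (not from a real force field).
D_E = 100.0             # well depth, kcal/mol
A = 2.0                 # width parameter, 1/Angstrom
L0 = 1.5                # reference bond length, Angstrom
K = 2.0 * D_E * A ** 2  # matches the Morse curvature at l0

for l in (1.45, 1.50, 1.55, 2.00, 3.00):
    print(f"l = {l:.2f}  Morse = {morse(l, D_E, A, L0):8.3f}  "
          f"harmonic = {harmonic(l, K, L0):8.3f}")
```

Close to l0 the two curves nearly coincide. Far from l0 the harmonic term keeps rising while the Morse potential levels off at the dissociation energy, which is why the spring model is only reasonable near the reference bond length.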
9- Angle Bending
- Deviation of angles from their reference angle θ0 is often described by Hooke's law
k: force constant, θ: bond angle
- Force constants are much smaller than those for bond stretching
10Torsional Terms
- Hypothetical potential function for rotation
around a chemical bond
Vn: barrier height, n: multiplicity (e.g. n=3), ω: torsion angle, γ: phase factor
- Need to include higher terms for non-symmetric
bonds (i.e. to distinguish trans, gauche/droit
conformations)
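A common functional form for such a torsional term is a cosine series, a sum over n of (Vn/2) [1 + cos(n ω - γ)], where ω stands for the torsion angle and γ for the phase factor. The sketch below assumes this form and uses made-up barrier heights; it shows that a pure threefold term gives trans and gauche the same energy, and that an additional n=1 term separates them.

```python
import math


def torsion_energy(omega_deg, terms):
    """Cosine-series torsional energy.

    terms is a list of (V_n, n, gamma_deg) tuples: barrier height,
    multiplicity and phase factor in degrees.
    """
    omega = math.radians(omega_deg)
    return sum(
        0.5 * v_n * (1.0 + math.cos(n * omega - math.radians(gamma_deg)))
        for v_n, n, gamma_deg in terms
    )


# Illustrative, assumed barrier heights (kcal/mol).
threefold = [(3.0, 3, 0.0)]
with_n1 = [(3.0, 3, 0.0), (1.0, 1, 0.0)]

for omega in (0, 60, 120, 180, 240, 300):
    print(f"omega = {omega:3d}  n=3 only: {torsion_energy(omega, threefold):5.2f}  "
          f"n=3 plus n=1: {torsion_energy(omega, with_n1):5.2f}")
```

With only the n=3 term the minima at 60, 180 and 300 degrees are degenerate. Adding the n=1 term leaves trans (180 degrees) at the minimum and raises both gauche conformations.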
11Non-bonded (Van der Waals) interactions
- act only at very short distances
- Attractive interaction by induced dipoles between uncharged atoms (proportional to r^-6)
- When atoms come too close, their valence shells start to overlap and repel each other (proportional to r^-12)
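The two distance dependences are usually combined in the Lennard-Jones 12-6 potential, 4 ε [(σ/r)^12 - (σ/r)^6], where ε is the well depth and σ the distance at which the energy crosses zero. The following is a minimal sketch; the numerical parameters are roughly argon-like values chosen for illustration only.

```python
def lennard_jones(r, epsilon, sigma):
    """Lennard-Jones 12-6 energy: r^-12 repulsion plus r^-6 attraction."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 ** 2 - sr6)


# Illustrative, assumed parameters.
EPSILON = 0.24  # well depth, kcal/mol
SIGMA = 3.4     # zero-crossing distance, Angstrom
R_MIN = 2.0 ** (1.0 / 6.0) * SIGMA  # position of the energy minimum

for r in (3.0, SIGMA, R_MIN, 5.0, 10.0):
    print(f"r = {r:6.3f} A  E = {lennard_jones(r, EPSILON, SIGMA):9.5f} kcal/mol")
```

The energy is strongly repulsive below σ, reaches -ε at r = 2^(1/6) σ, and decays quickly towards zero beyond that, which is why the interaction only matters at short range.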
12Electrostatic interactions
- Electronegative elements attract electrons more than less electronegative elements
- Unequal charge distribution is expressed by fractional charges
- Electrostatic interaction is often calculated by Coulomb's law (q: charge, r: distance)
13Electrostatic interactions: Solvent dielectric model?
- use relative dielectric constant εr (ε = ε0 εr)
- Problem: Inhomogeneous permittivity
- → For proteins, we need to solve the Poisson-Boltzmann equation numerically
ε ≈ 80 (water)
ε ≈ 2-4 (protein interior)
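To see how much the choice of dielectric constant matters, the sketch below evaluates Coulomb's law for a pair of opposite unit charges in a uniform medium. The function name and the test case are assumptions made for illustration; 332.06 is the usual conversion factor for energies in kcal/mol with charges in units of e and distances in Angstrom.

```python
COULOMB_CONSTANT = 332.06  # kcal * Angstrom / (mol * e^2)


def coulomb_energy(q_i, q_j, r, eps_r=1.0):
    """Coulomb energy of two point charges in a uniform dielectric eps_r."""
    return COULOMB_CONSTANT * q_i * q_j / (eps_r * r)


# Opposite unit charges 4 Angstrom apart in three uniform media.
for label, eps_r in (("vacuum", 1.0), ("protein interior", 4.0), ("water", 80.0)):
    energy = coulomb_energy(+1.0, -1.0, 4.0, eps_r)
    print(f"{label:17s} eps_r = {eps_r:4.0f}  E = {energy:8.2f} kcal/mol")
```

The same ion pair differs by a factor of 20 between a protein-like and a water-like medium. A single uniform εr therefore cannot describe a solvated protein, which is the motivation for treating the inhomogeneous case numerically.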
14Example for a (very) simple Force Field
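The equation shown on this slide did not survive in the transcript. A common textbook form of such a simple force field, combining harmonic bond and angle terms, a cosine torsional term, and Lennard-Jones plus Coulomb non-bonded terms, is sketched below. It is an assumption about what was shown, not a recovered formula; here l is a bond length, θ a bond angle, ω a torsion angle, γ a phase factor, r_ij an interatomic distance, and q a fractional charge.

```latex
V(\mathbf{r}^N) =
    \sum_{\text{bonds}} \frac{k_i}{2}\,(l_i - l_{i,0})^2
  + \sum_{\text{angles}} \frac{k_i}{2}\,(\theta_i - \theta_{i,0})^2
  + \sum_{\text{torsions}} \frac{V_n}{2}\,\bigl[1 + \cos(n\omega - \gamma)\bigr]
  + \sum_{i=1}^{N} \sum_{j=i+1}^{N}
    \left(
      4\varepsilon_{ij}\left[
        \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{12}
      - \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{6}
      \right]
      + \frac{q_i q_j}{4\pi\varepsilon_0\, r_{ij}}
    \right)
```

Note that ε_ij in the Lennard-Jones term is a well depth and is unrelated to the permittivity ε0 in the Coulomb term.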
15Molecular Mechanics - Energy Minimization
- The energy of the system is minimized. The system tries to relax
- Typically, the system relaxes to a local minimum (LM).
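The dependence on the starting point can be illustrated with steepest descent on a one-dimensional toy energy function. Everything in this sketch is an assumption made for illustration: the tilted double-well function stands in for a molecular energy surface, and the step size and tolerance are arbitrary.

```python
def energy(x):
    """Toy energy surface: a tilted double well with two minima."""
    return (x * x - 1.0) ** 2 + 0.3 * x


def gradient(x):
    """Analytical derivative of the toy energy."""
    return 4.0 * x * (x * x - 1.0) + 0.3


def steepest_descent(x, step=0.01, tol=1e-8, max_iter=100_000):
    """Follow the negative gradient until it (almost) vanishes."""
    for _ in range(max_iter):
        g = gradient(x)
        if abs(g) < tol:
            break
        x -= step * g
    return x


for start in (0.5, -0.5):
    x_min = steepest_descent(start)
    print(f"start = {start:5.2f}  minimum at x = {x_min:7.4f}  "
          f"E = {energy(x_min):7.4f}")
```

The run started at 0.5 ends in the higher-lying minimum on the right, while the run started at -0.5 finds the lower one on the left. Minimization alone never crosses the barrier between them.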
16Molecular Dynamics (MD)
In molecular dynamics, energy is supplied to the
system, typically using a constant temperature
(i.e. constant average kinetic energy).
17Molecular Dynamics (MD)
- Use Newtonian mechanics to calculate the net force and acceleration experienced by each atom.
- Each atom i is treated as a point with mass m_i and fixed charge q_i
- Determine the force F_i on each atom
- Use positions and accelerations at time t (and positions from t - Δt) to calculate new positions at time t + Δt
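The update rule in the last bullet is commonly known as the Verlet algorithm: the new position is 2 x(t) - x(t - Δt) + a(t) Δt^2. The sketch below applies it to a one-dimensional harmonic oscillator with unit mass and unit force constant, a toy system assumed here because its exact solution, cos(t), is known.

```python
import math


def verlet_trajectory(x0, v0, acceleration, dt, n_steps):
    """Integrate one coordinate with the position Verlet scheme.

    Returns positions at t = 0, dt, 2*dt, ..., n_steps*dt.
    """
    # The scheme needs x(t - dt); estimate it from a Taylor expansion.
    x_prev = x0 - v0 * dt + 0.5 * acceleration(x0) * dt * dt
    x = x0
    positions = [x0]
    for _ in range(n_steps):
        x_next = 2.0 * x - x_prev + acceleration(x) * dt * dt
        x_prev, x = x, x_next
        positions.append(x)
    return positions


# Harmonic oscillator with m = k = 1, so a(x) = -x and x(t) = cos(t).
trajectory = verlet_trajectory(1.0, 0.0, lambda x: -x, dt=0.01, n_steps=1000)
print(f"Verlet x(10) = {trajectory[-1]:.6f}, exact = {math.cos(10.0):.6f}")
```

In a real simulation the acceleration of atom i is F_i / m_i, with F_i obtained from the gradient of the force field, and the same update is applied to every coordinate of every atom.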
18Implicit Solvent Models
- Water molecules are not included as molecules, but represented by an extra potential on the solvent accessible surface.
- Advantages:
- only 50% slower than vacuum calculations
- 10 times faster than explicit water MD
- Disadvantages:
- Does it really represent water? -> heavy discussions
- Example: SASA model (CHARMM)
19Explicit Solvent Models
- Water molecules are explicitly included as individual molecules.
- Force fields for water molecules are not trivial ...
- Computationally expensive ...
20Periodic Boundary Conditions (PBC)
- Periodic boundary conditions are used to simulate solvated systems or crystals.
- In solvated systems, PBC prevent the solvent from "evaporating in silico"
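In practice, periodic boundary conditions come down to two small operations: wrapping coordinates back into the box, and measuring distances to the nearest periodic image of a neighbour (the minimum image convention). The sketch below assumes a cubic box with its origin at 0; the function names are illustrative.

```python
def wrap(x, box):
    """Map a coordinate back into the primary box [0, box)."""
    return x % box


def minimum_image(dx, box):
    """Shortest displacement between two coordinates across periodic images."""
    return dx - box * round(dx / box)


def distance(p, q, box):
    """Minimum-image distance between two points in a cubic box."""
    return sum(minimum_image(a - b, box) ** 2 for a, b in zip(p, q)) ** 0.5


BOX = 10.0  # box edge length (assumed, arbitrary units)

# A particle leaving through one face re-enters through the opposite one.
print(wrap(-0.5, BOX))

# Two particles near opposite faces are close neighbours across the boundary.
print(distance((0.5, 0.0, 0.0), (9.5, 0.0, 0.0), BOX))
```

Because every molecule that leaves the box re-enters on the other side, the number of solvent molecules stays constant and the solute never sees a vacuum boundary.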
21Typical Time Scales ....
- Bond stretching: 10^-14 to 10^-13 sec.
- Elastic vibrations: 10^-12 to 10^-11 sec.
- Rotations of surface side chains: 10^-11 to 10^-10 sec.
- Hinge bending: 10^-11 to 10^-7 sec.
- Rotation of buried side chains: 10^-4 to 1 sec.
- Protein folding: 10^-6 to 10^2 sec.
- Timescale in MD:
- A typical timestep in MD is 1 fs (10^-15 sec), ideally 1/10 of the period of the highest-frequency vibration
22Ab initio protein folding simulation
→ Blue Gene will need 3 years to simulate 100 µsec.
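The size of the problem follows from simple arithmetic with a 1 fs timestep. The sketch below counts the integration steps needed for 100 µs and derives the step rate implied by the "3 years" figure; the helper name is an assumption, and the system size behind the estimate is not given in the slides.

```python
TIMESTEP_FS = 1              # MD timestep in femtoseconds
FS_PER_MICROSECOND = 10 ** 9


def md_steps(simulated_fs, timestep_fs=TIMESTEP_FS):
    """Number of integration steps needed to cover simulated_fs femtoseconds."""
    return simulated_fs // timestep_fs


steps = md_steps(100 * FS_PER_MICROSECOND)  # 100 microseconds
seconds_in_3_years = 3 * 365 * 24 * 3600
steps_per_second = steps / seconds_in_3_years

print(f"steps for 100 microseconds: {steps:.1e}")
print(f"implied rate over 3 years:  {steps_per_second:.0f} steps per second")
```

Every one of these steps requires a full evaluation of the forces on all atoms, which is what makes brute-force folding simulations so expensive.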
23Want to fold some proteins at home?
24Want to fold some proteins at home?
- Simulations of the villin headpiece
- Folding time is on the order of 10 microseconds
- Hundreds of microseconds of MD time simulated
For the villin movie, please see
http://folding.stanford.edu/villin/
25Can we predict protein structures?
- ab initio folding simulation: not yet ...
- ???
26Rosetta Stone Approach
27Rosetta Stone Approach (David Baker)
1. Find sequence patterns that strongly correlate
with protein structure at the local level to
create a library of fragments (I-sites).
E.g. amphipathic helix
Amino acid statistics
Helix position
28Rosetta Stone Approach (David Baker)
2. Model building for a new sequence
- Search for compatible fragments (reduced alphabet)
- Use Monte Carlo simulated annealing to assemble overlapping fragments
- Scoring functions are used to select best models (1000)
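The Monte Carlo simulated annealing step can be sketched generically: propose a random change, always accept it if the score improves, accept it with probability exp(-Δ/T) otherwise, and lower the temperature T over time. The code below is a toy illustration of that scheme only. It is not the Rosetta implementation; the score function, the move set, and the cooling schedule are all hypothetical stand-ins, with one "fragment id" chosen per sequence position.

```python
import math
import random


def metropolis_accept(delta, temperature, rng):
    """Accept downhill moves always, uphill moves with probability exp(-delta/T)."""
    if delta <= 0:
        return True
    return rng.random() < math.exp(-delta / temperature)


def simulated_annealing(score, propose, state, t_start=2.0, t_end=0.01,
                        n_steps=5000, seed=0):
    """Minimize score(state) with a geometric cooling schedule."""
    rng = random.Random(seed)
    current, current_score = state, score(state)
    best, best_score = current, current_score
    for step in range(n_steps):
        temperature = t_start * (t_end / t_start) ** (step / (n_steps - 1))
        candidate = propose(current, rng)
        candidate_score = score(candidate)
        if metropolis_accept(candidate_score - current_score, temperature, rng):
            current, current_score = candidate, candidate_score
            if current_score < best_score:
                best, best_score = current, current_score
    return best, best_score


# Hypothetical toy problem: pick one of N_FRAGMENTS ids for each position.
N_POSITIONS, N_FRAGMENTS = 12, 5


def toy_score(state):
    """Made-up score: neighbouring choices should agree, low ids are preferred."""
    return sum(abs(a - b) for a, b in zip(state, state[1:])) + sum(state)


def toy_propose(state, rng):
    """Replace the choice at one random position (a 'fragment insertion')."""
    new = list(state)
    new[rng.randrange(len(new))] = rng.randrange(N_FRAGMENTS)
    return tuple(new)


initial = (4,) * N_POSITIONS
best, best_score = simulated_annealing(toy_score, toy_propose, initial)
print(best, best_score)
```

In a fragment-assembly setting, a move would replace the backbone conformation of a short window with that of a library fragment, and many independent runs with different seeds would produce the pool of models that the scoring functions then rank.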
29Rosetta Stone Approach
- → Generates thousands of models
- Best models in CASP4: 6-10 Å Cα RMSD
- Difficult to distinguish good and bad models
- http://isites.bio.rpi.edu/index.html
30Can we predict protein structures?
- ab initio folding simulation: not yet ...
- Rosetta approach: neither ...
- ???
31Growth of the Protein Data Bank (PDB)
08 January 2003: 19691 structures
PDB: http://www.pdb.org
32Protein Structure Databases
- Worldwide repository for the processing and distribution of 3-D biological macromolecular structure data
- http://www.pdb.org
- Protein structures solved experimentally (X-ray or NMR)
- Provides:
- Coordinates (sometimes structure factors, NOEs)
- Images
- Links to derived data, e.g. similar structures, fold families, etc.
33The number of different protein folds is limited
Seen this before ...
New Folds
34The number of different protein folds is limited
last update Oct 2001
35Protein Structure Databases
CATH - Protein Structure Classification
- hierarchical classification of protein domain structures
- UCL, Janet Thornton & Christine Orengo
- clusters proteins at four major levels:
- Class (C)
- Architecture (A)
- Topology (T)
- Homologous superfamily (H)
http://www.biochem.ucl.ac.uk/bsm/cath_new/
36- Class (C): derived from secondary structure content; assigned automatically
- Architecture (A): describes the gross orientation of secondary structures, independent of connectivity.
- Topology (T): clusters structures according to their topological connections and numbers of secondary structures
http://www.biochem.ucl.ac.uk/bsm/cath_new/
39Protein Structure Databases
SCOP - Structural Classification of Proteins
- MRC Cambridge (UK), Alexey Murzin, Brenner S. E., Hubbard T., Chothia C.
- hierarchical classification of protein domain structures
- created by manual inspection
- comprehensive description of the structural and evolutionary relationships
- organized as a tree structure:
- Class
- Fold
- Superfamily
- Family
- Species
http://scop.mrc-lmb.cam.ac.uk/scop/