Bioinformatics I

Provided by: Torsten87
1
Swiss Institute of Bioinformatics
Bioinformatics I: Ab initio Protein Structure
Modeling, Fold Recognition
14.1.2003

Torsten.Schwede_at_unibas.ch
2
Growth of the Protein Data Bank (PDB)
08 January 2003: 19691 entries
PDB: http://www.pdb.org
3
Public Database Holdings
  • No experimental structure for most sequences

4
In the near future for most of the known protein
sequences no experimental structure will be
available.
Can we predict protein structures from genome
sequences?
5
gattccagag atggacgctt ttgctcttat tcctcgtact
cagtggcaat atgtgatggg tccttcactt taccgaataa
tgaacaacct cttttaattt tataaatacc
ttctataaat acttaggagg tattatgaat atatttgaaa
tgttacgtat agatgaacgt cttagactta aaatctataa
agacacagaa ggctattaca ctattggcat cggtcatttg
cttacaaaaa gtccatcact taatgctgct aaatctgaat
tagataaagc tattgggcgt aattgcaatg gtgtaattac
aaaagatgag gctgaaaaac tctttaatca ggatgttgat
gctgctgttc gcggaattct gagaaatgct aaattaaaac
cggtttatga ttctcttgat gcggttcgtc gctgtgcatt
gattaatatg gttttccaaa tgggagaaac cggtgtggca
ggatttacta actctttacg tatgcttcaa caaaaacgct
gggatgaagc agcagttaac ttagctaaaa gtatatggta
taatcaaaca cctaatcgcg caaaacgagt cattacaacg
tttagaactg
-> Gene prediction
MNIFEMLRID EGLRLKIYKD TEGYYTIGIG HLLTKSPSLN
AAKSELDKAI GRNCNGVITK DEAEKLFNQD VDAAVRGILR
NAKLKPVYDS LDAVRRCALI NMVFQMGETG
VAGFTNSLRM LQQKRWDEAA VNLAKSRWYN QTPNRAKRVI
TTFRTGTWDA YKNL
?
Can we predict protein structures from protein
sequences?
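The last step of the gene prediction shown on this slide, going from an open reading frame to a protein sequence, can be sketched as a plain codon-table lookup. This is only an illustration: the function name is hypothetical, the standard genetic code is assumed, and the input is a 30-nucleotide fragment of the slide's DNA starting at the relevant start codon. Real gene prediction has to do much more than find the first ATG (the full sequence on the slide contains an earlier ATG triplet).

```python
# Standard genetic code, laid out in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate_from_first_atg(dna: str) -> str:
    """Translate from the first ATG until a stop codon or the end of the sequence.

    Hypothetical helper for illustration; expects only A, C, G, T and whitespace.
    """
    seq = "".join(dna.split()).upper()
    start = seq.find("ATG")
    if start < 0:
        return ""
    protein = []
    for pos in range(start, len(seq) - 2, 3):
        amino_acid = CODON_TABLE[seq[pos:pos + 3]]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


# Fragment copied from the slide, beginning at the start codon of the gene.
fragment = "atgaat atatttgaaa tgttacgtat agat"
print(translate_from_first_atg(fragment))  # MNIFEMLRID
```

The printed peptide matches the first ten residues of the protein sequence on the slide.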
6
  • Many proteins fold spontaneously to their native
    structure
  • Protein folding is relatively fast
  • Chaperones speed up folding, but do not alter
    the structure

The protein sequence contains all information
needed to create a correctly folded protein.
7
Empirical Force Fields and Molecular Mechanics
  • describe interaction of atoms or groups
  • the parameters are empirical, i.e. they depend
    on one another and have no direct intrinsic
    meaning
  • Examples: GROMOS96 (van Gunsteren), CHARMM (M.
    Karplus), AMBER (Kollman)

8
  • Bond stretching
  • Approximation of the Morse potential by an
    elastic spring model
  • Hooke's law is a reasonable approximation close to
    the reference bond length l0

$E(l) = \frac{k}{2}\,(l - l_0)^2$
k: force constant, l: bond length (distance)
9
  • Angle Bending
  • Deviation of angles from their reference angle
    θ0 is often described by Hooke's law

$E(\theta) = \frac{k}{2}\,(\theta - \theta_0)^2$
k: force constant, θ: bond angle
  • Force constants are much smaller than those for
    bond stretching

10
Torsional Terms
  • Hypothetical potential function for rotation
    around a chemical bond

$E(\omega) = \sum_n \frac{V_n}{2}\,[1 + \cos(n\omega - \gamma)]$
V_n: barrier height, n: multiplicity (e.g. n = 3), ω: torsion angle, γ: phase factor
  • Need to include higher terms for non-symmetric
    bonds (i.e. to distinguish trans, gauche/droit
    conformations)

11
Non-bonded (Van der Waals) interactions
  • act only at very short distances
  • Attractive interaction by induced dipoles between
    uncharged atoms: ~ r^-6
  • When atoms come too close, their valence shells
    start to overlap and repel each other: ~ r^-12
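The two distance dependences above (attraction falling off as r^-6, repulsion rising as r^-12) are commonly combined in the Lennard-Jones 12-6 potential. The sketch below assumes that form; the slide itself does not name it. The parameter names `epsilon` (well depth) and `sigma` (distance where the energy crosses zero), and the reduced units in the example, are illustrative assumptions.

```python
def lennard_jones(r: float, epsilon: float, sigma: float) -> float:
    """Lennard-Jones 12-6 energy for two atoms at distance r.

    epsilon: depth of the energy well; sigma: distance at which the energy is zero.
    """
    sr6 = (sigma / r) ** 6
    # sr6 * sr6 is the repulsive r^-12 term, sr6 the attractive r^-6 term.
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


# Reduced units (epsilon = sigma = 1), chosen only for illustration.
r_min = 2.0 ** (1.0 / 6.0)  # position of the energy minimum
for r in (0.95, 1.0, r_min, 2.0, 3.0):
    print(f"r = {r:.3f}  E = {lennard_jones(r, 1.0, 1.0):+.4f}")
```

The energy is strongly positive just below sigma, reaches its minimum of -epsilon at r = 2^(1/6) sigma, and is already close to zero at a few sigma, which is the short-range behaviour the first bullet refers to.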

12
Electrostatic interactions
  • Electronegative elements attract electrons more
    than less electronegative elements
  • Unequal charge distribution is expressed by
    fractional charges
  • Electrostatic interaction is often calculated by
    Coulomb's law

$E(r) = \frac{q_i q_j}{4 \pi \varepsilon_0 r}$
q_i, q_j: fractional charges, r: distance
13
Electrostatic interactions: solvent dielectric
model?
  • use relative dielectric constant: ε = ε0 εr
  • Problem: inhomogeneous permittivity
  • -> For proteins, we need to solve the
    Poisson-Boltzmann equation numerically

ε = 80 (water)
ε = 2-4 (protein interior)
14
Example for a (very) simple Force Field
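The formula on this slide did not survive in the transcript. One common textbook form of such a simple force field, which sums the bonded and non-bonded terms introduced on the preceding slides, is given below as an assumption rather than as the exact expression shown in the lecture. Here l, θ and ω are bond lengths, bond angles and torsion angles, a subscript 0 marks reference values, V_n, n and γ are the torsional barrier height, multiplicity and phase factor, ε_ij and σ_ij are Lennard-Jones parameters, and q_i are fractional charges.

```latex
V(\mathbf{r}^N) =
    \sum_{\text{bonds}} \frac{k_i}{2}\,(l_i - l_{i,0})^2
  + \sum_{\text{angles}} \frac{k_i}{2}\,(\theta_i - \theta_{i,0})^2
  + \sum_{\text{torsions}} \frac{V_n}{2}\,\bigl[1 + \cos(n\omega - \gamma)\bigr]
  + \sum_{i=1}^{N} \sum_{j=i+1}^{N}
      \left(
        4\varepsilon_{ij}
        \left[
          \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{12}
        - \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{6}
        \right]
        + \frac{q_i q_j}{4\pi\varepsilon_0 r_{ij}}
      \right)
```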
15
Molecular Mechanics - Energy Minimization
  • The energy of the system is minimized. The system
    tries to relax
  • Typically, the system relaxes to a local minimum
    (LM).
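Why minimization usually ends in a local minimum can be seen on a toy example. The sketch below uses steepest descent on a hypothetical one-dimensional double-well function; both the function and the step size are made up for illustration and have nothing to do with a real force field.

```python
def energy(x: float) -> float:
    """Hypothetical 1D 'energy landscape' with two minima of different depth."""
    return x ** 4 - 2.0 * x ** 2 + 0.5 * x


def gradient(x: float) -> float:
    """Analytical derivative of energy(x)."""
    return 4.0 * x ** 3 - 4.0 * x + 0.5


def steepest_descent(x: float, step: float = 0.01, tol: float = 1e-8,
                     max_iter: int = 100_000) -> float:
    """Follow the negative gradient until it (almost) vanishes."""
    for _ in range(max_iter):
        g = gradient(x)
        if abs(g) < tol:
            break
        x -= step * g
    return x


x_right = steepest_descent(1.5)   # relaxes into the shallower minimum
x_left = steepest_descent(-1.5)   # relaxes into the deeper minimum
print(f"start  1.5 -> x = {x_right:.3f}, E = {energy(x_right):.3f}")
print(f"start -1.5 -> x = {x_left:.3f}, E = {energy(x_left):.3f}")
```

Both runs converge, but only the second one finds the lowest energy: the minimizer always moves downhill and cannot cross the barrier between the two wells.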

16
Molecular Dynamics (MD)
In molecular dynamics, energy is supplied to the
system, typically using a constant temperature
(i.e. constant average kinetic energy).
17
Molecular Dynamics (MD)
  • Use Newtonian mechanics to calculate the net
    force and acceleration experienced by each atom.
  • Each atom i is treated as a point with mass mi
    and fixed charge qi
  • Determine the force Fi on each atom
  • Use positions and accelerations at time t (and
    positions from t - Δt) to calculate new
    positions at time t + Δt
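The update rule in the last bullet is commonly known as the Verlet scheme: x(t + Δt) = 2 x(t) - x(t - Δt) + a(t) Δt². A minimal sketch for a single particle in one dimension follows. The harmonic "bond" used as the test system, the unit mass and force constant, and the function name are assumptions made for illustration.

```python
import math


def verlet(x0, v0, accel, dt, n_steps):
    """Integrate one particle in 1D with the position Verlet scheme.

    accel(x) returns the acceleration F(x)/m at position x.
    Returns the list of positions at times 0, dt, 2*dt, ...
    """
    # Verlet needs the position at t - dt; estimate it by a Taylor step backwards.
    x_prev = x0 - v0 * dt + 0.5 * accel(x0) * dt * dt
    x = x0
    trajectory = [x0]
    for _ in range(n_steps):
        x_next = 2.0 * x - x_prev + accel(x) * dt * dt
        x_prev, x = x, x_next
        trajectory.append(x)
    return trajectory


# Harmonic oscillator with k/m = 1 (hypothetical units): exact solution is cos(t).
dt = 0.01
trajectory = verlet(x0=1.0, v0=0.0, accel=lambda x: -x, dt=dt, n_steps=628)
t_end = 628 * dt
print(f"Verlet: {trajectory[-1]:.5f}   exact: {math.cos(t_end):.5f}")
```

After roughly one full period the numerical position stays close to the analytical one. In a real MD code the same update is applied to every atom, with the forces taken from the force field.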

18
Implicit Solvent Models
  • Water molecules are not included as molecules,
    but represented by an extra potential on the
    solvent accessible surface.
  • Advantages:
      • only 50% slower than vacuum calculations
      • 10 times faster than explicit water MD
  • Disadvantages:
      • Does it really represent water? -> heavy discussions
  • Example: SASA model (CHARMM)

19
Explicit Solvent Models
  • Water molecules are explicitly included as
    individual molecules.
  • Force Fields for water molecules are not trivial
    ...
  • Computationally expensive ...

20
Periodic Boundary Conditions (PBC)
  • Periodic boundary conditions are used to simulate
    solvated systems or crystals.
  • In solvated systems, PBC prevent the
    solvent from "evaporating in silico"
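In practice, periodic boundary conditions come down to two operations: wrapping coordinates back into the box, and measuring distances to the nearest periodic image (the minimum image convention). The sketch below assumes a cubic box of edge length `box`; the function names are hypothetical.

```python
def wrap(position, box):
    """Map a position back into the primary cubic box [0, box)."""
    return tuple(c % box for c in position)


def minimum_image_distance(a, b, box):
    """Distance between a and b using the nearest periodic image of b."""
    total = 0.0
    for ca, cb in zip(a, b):
        d = ca - cb
        d -= box * round(d / box)  # shift by whole box lengths to the closest image
        total += d * d
    return total ** 0.5


box = 10.0
# A molecule that drifted out of the box re-enters on the opposite side.
print(wrap((10.5, -0.5, 3.0), box))
# Two atoms near opposite faces are neighbours through the boundary.
print(minimum_image_distance((0.5, 0.0, 0.0), (9.5, 0.0, 0.0), box))
```

A solvent molecule leaving through one face re-enters through the opposite one, so the number of molecules in the box stays constant.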

21
Typical Time Scales ....
  • Bond stretching: 10^-14 to 10^-13 sec.
  • Elastic vibrations: 10^-12 to 10^-11 sec.
  • Rotations of surface side chains: 10^-11 to 10^-10
    sec.
  • Hinge bending: 10^-11 to 10^-7 sec.
  • Rotation of buried side chains: 10^-4 to 1 sec.
  • Protein folding: 10^-6 to 10^2 sec.
  • Timescale in MD: a typical timestep is 1 fs
    (10^-15 sec), ideally 1/10 of the period of the
    highest-frequency vibration
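The gap between a 1 fs timestep and folding times of microseconds or longer becomes concrete when it is converted into a number of integration steps. A small arithmetic sketch (the function name is hypothetical):

```python
FS_PER_US = 10 ** 9  # femtoseconds per microsecond


def md_steps(simulated_us: float, timestep_fs: float = 1.0) -> int:
    """Number of MD integration steps needed to cover the simulated time."""
    return round(simulated_us * FS_PER_US / timestep_fs)


print(md_steps(10))    # 10 microseconds at 1 fs: 10^10 steps
print(md_steps(100))   # 100 microseconds at 1 fs: 10^11 steps
```

Every one of these steps requires a full force evaluation over all atoms, which is why folding simulations are so expensive.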

22
Ab initio protein folding simulation
-> Blue Gene will need 3 years to simulate 100
µsec.
23
Want to fold some proteins at home?
24
Want to fold some proteins at home?
  • Simulations of the villin headpiece
  • Folding time is on the order of 10 microseconds
  • Hundreds of microseconds of MD time simulated

For the villin movie, please see
http://folding.stanford.edu/villin/
25
Can we predict protein structures ?
  • ab initio folding simulation not yet ...
  • ???

26
Rosetta Stone Approach
27
Rosetta Stone Approach (David Baker)
1. Find sequence patterns that strongly correlate
with protein structure at the local level to
create a library of fragments (I-sites).
E.g. amphipathic helix (figure: amino acid statistics
by helix position)
28
Rosetta Stone Approach (David Baker)
2. Model building for a new sequence
  • Search for compatible fragments (reduced alphabet)
  • Use Monte Carlo simulated annealing to assemble
    overlapping fragments
  • Scoring functions are used to select best
    models (1000)
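The general idea of Monte Carlo simulated annealing with fragment insertion can be shown on a toy problem. The sketch below is not the Rosetta protocol: the integer "conformation", the fragment library and the scoring function are all hypothetical stand-ins, and only the Metropolis acceptance rule and the cooling schedule reflect the technique named on the slide.

```python
import math
import random


def toy_score(state, target):
    """Hypothetical scoring function: squared deviation from a hidden target."""
    return sum((s - t) ** 2 for s, t in zip(state, target))


def anneal(target, library, n_steps=5000, t_start=10.0, t_end=0.01, seed=0):
    """Assemble a state from overlapping fragments by simulated annealing."""
    rng = random.Random(seed)
    frag_len = len(library[0])
    state = [0] * len(target)
    current = toy_score(state, target)
    best_state, best = state[:], current
    for step in range(n_steps):
        # Geometric cooling from t_start down to t_end.
        temp = t_start * (t_end / t_start) ** (step / max(n_steps - 1, 1))
        # Trial move: insert a random fragment at a random position.
        pos = rng.randrange(len(state) - frag_len + 1)
        trial = state[:pos] + list(rng.choice(library)) + state[pos + frag_len:]
        delta = toy_score(trial, target) - current
        # Metropolis criterion: always accept improvements, sometimes accept worse.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            state, current = trial, current + delta
            if current < best:
                best_state, best = state[:], current
    return best_state, best


target = [1, 2, 3, 3, 2, 1, 0, 1, 2]
library = [(0, 1, 2), (1, 2, 3), (3, 2, 1), (2, 1, 0), (3, 3, 2), (1, 0, 1)]
best_state, best = anneal(target, library)
print(best_state, best)
```

Running the procedure many times with different seeds yields many candidate models, which is where a scoring function is needed again to pick the best ones.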

29
Rosetta Stone Approach
  • -> Generates thousands of models
  • Best models in CASP4: 6-10 Å rmsd (Cα)
  • Difficult to distinguish good and bad models

http://isites.bio.rpi.edu/index.html
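The rmsd quoted for the CASP4 models is the root-mean-square deviation between corresponding Cα atoms of model and experimental structure. A minimal sketch of the calculation is given below. It assumes the two coordinate sets are already optimally superimposed and listed in the same residue order; the superposition step itself is not shown, and the function name is hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z).

    Assumes the structures are already superimposed; no fitting is done here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


# Made-up Calpha coordinates in Angstrom, for illustration only.
model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
native = [(0.0, 1.0, 0.0), (3.8, 0.0, 1.0), (7.6, 0.0, 0.0)]
print(f"{rmsd(model, native):.2f} A")
```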
30
Can we predict protein structures ?
  • ab initio folding simulation not yet ...
  • Rosetta approach neither ...
  • ???

31
Growth of the Protein Data Bank (PDB)
08 January 2003: 19691 entries
PDB: http://www.pdb.org
32
Protein Structure Databases
  • Worldwide repository for the processing and
    distribution of 3-D biological macromolecular
    structure data
  • http://www.pdb.org
  • Protein structures solved experimentally (X-ray
    or NMR)
  • Provides:
      • Coordinates (sometimes structure factors, NOEs)
      • Images
      • Links to derived data, e.g. similar structures,
        fold families, etc.

33
The number of different protein folds is limited
(Figure legend: "Seen this before ..." vs. "New folds")
34
The number of different protein folds is limited
last update Oct 2001
35
Protein Structure Databases
CATH - Protein Structure Classification
  • hierarchical classification of protein domain
    structures
  • UCL, Janet Thornton & Christine Orengo
  • clusters proteins at four major levels:
      • Class (C)
      • Architecture (A)
      • Topology (T)
      • Homologous superfamily (H)

http://www.biochem.ucl.ac.uk/bsm/cath_new/
36
  • Class (C): derived from secondary structure content;
    assigned automatically
  • Architecture (A): describes the gross orientation
    of secondary structures, independent of
    connectivity
  • Topology (T): clusters structures according to
    their topological connections and numbers of
    secondary structures

http://www.biochem.ucl.ac.uk/bsm/cath_new/
37
(No Transcript)
38
(No Transcript)
39
Protein Structure Databases
SCOP - Structural Classification of Proteins
  • MRC Cambridge (UK), Alexey Murzin, Brenner S. E.,
    Hubbard T., Chothia C.
  • hierarchical classification of protein domain
    structures
  • created by manual inspection
  • comprehensive description of the structural and
    evolutionary relationships
  • organized as a tree structure:
      • Class
      • Fold
      • Superfamily
      • Family
      • Species

http://scop.mrc-lmb.cam.ac.uk/scop/