Structural Bioinformatics - PowerPoint PPT Presentation

About This Presentation
Title:

Structural Bioinformatics

Description:

Rosetta Steps used in CASP4. If possible, use PSI-BLAST to find similar sequences ... Methods like Rosetta represent a breakthrough in the ab initio prediction of ...

Slides: 19
Provided by: deendayald
Learn more at: https://sse.umkc.edu

Transcript and Presenter's Notes

Title: Structural Bioinformatics


1
Structural Bioinformatics
  • Motivation
  • Concepts
  • Structure Prediction
  • Summary

2
Motivation
  • Holy Grail: the mapping between sequence and
    structure. Structure = F(Sequence). What is F?
  • Why?
      • Structure dictates chemistry, thermodynamics and
        therefore function
      • Not all structures can be (need be?) determined
        experimentally
          • Cost
          • Experimental limitations

3
Concepts - Prediction spectrum
  • Methods in order of decreasing reliance on known
    structures: Homology Modeling, Threading,
    ab initio, Quantum Mechanics

4
Concepts - Common Principles
  • Constraints to reduce search space
  • Consideration of many alternate conformations
  • Protein backbone dihedral angles (Twists along
    axis of protein)
  • Amino-acid geometry (Amino-acids can have more
    than one shape)
  • Method for local optimization
  • Scoring function to compare conformations
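
The backbone dihedral angles mentioned above can be computed from four consecutive atom positions. The following is a minimal sketch using only the standard library; the function name `dihedral` and the test coordinates are illustrative assumptions, not part of the original slides.

```python
import math

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) defined by four 3D points."""
    def sub(a, b):
        return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

    def dot(a, b):
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    # Three consecutive bond vectors along the chain.
    b1 = sub(p1, p0)
    b2 = sub(p2, p1)
    b3 = sub(p3, p2)
    n1 = cross(b1, b2)          # normal of the plane p0-p1-p2
    n2 = cross(b2, b3)          # normal of the plane p1-p2-p3
    x = dot(n1, n2)
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    return math.degrees(math.atan2(y, x))

# Illustrative points: a planar "trans" arrangement and a planar "cis" one.
trans_angle = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
cis_angle = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
```

For a protein backbone, phi would use the atoms C(i-1), N(i), Cα(i), C(i) and psi the atoms N(i), Cα(i), C(i), N(i+1).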

5
Evaluation of quality of prediction
  • RMSD comparison with experimentally known
    structure
  • Comparison with crystal structure quality
    criteria
  • Ramachandran Plot
  • Residue specific dihedral angle distribution
  • CASP (Critical Assessment of Structure
    Prediction) and CAFASP (Critical Assessment of
    Fully Automated Structure Prediction)
    competitions
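
As a rough illustration of the RMSD comparison listed above, the sketch below computes the root-mean-square deviation between two coordinate sets. It assumes the two structures have already been superimposed and list the same atoms in the same order; the optimal superposition step itself is not shown, and the function name `rmsd` is an illustrative choice.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) coordinates.

    Assumes the structures are already superimposed and atom-matched.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Two atoms, each displaced by exactly one unit along z.
example_rmsd = rmsd([(0, 0, 0), (1, 0, 0)], [(0, 0, 1), (1, 0, 1)])
```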

6
Methods
  • Knowledge-based constraints of search space
      • Homology Modeling
      • Threading
      • ab initio (based on knowledge primitives, not
        true ab initio)
  • Approaches to refinement
      • Quantum mechanics (ab initio)
          • Based on quantum mechanical model of elementary
            particles
          • Unscalable
      • Molecular mechanics
          • Uses parametric force fields (Newton's laws,
            Hooke's law, ...)
          • Typically used for local or constrained global
            optimization
          • Molecular Dynamics or Monte Carlo-based
7
Homology modeling
  • Homology
      • Based on sequence-sequence similarity (> 25%;
        the higher, the better)
  • Steps
      • Pairwise local sequence similarity to identify
        related structures (possible templates)
      • Refine alignment by global pairwise sequence
        similarity and multiple sequence alignment (MSA)
      • Overlay sequence backbone (N-Cα-C) on template
      • Model loops based on
          • Statistical knowledge from databases of known
            structures
          • Molecular mechanics
      • Model side chains (approach similar to that of
        loops)
      • Molecular mechanical unconstrained local
        optimization
      • Pray for a good solution!
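
The similarity threshold of 25 quoted above can be checked on an existing pairwise alignment. The sketch below reads the threshold as percent identity (an assumption) and counts identical residues over the aligned, non-gap columns; other denominators are also in common use, and the sequences shown are made up for illustration.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue                     # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0

def is_candidate_template(aligned_query, aligned_template, threshold=25.0):
    """Apply the (assumed) percent-identity cutoff for template selection."""
    return percent_identity(aligned_query, aligned_template) > threshold

identity = percent_identity("ACDE-G", "ACDQFG")   # made-up alignment
```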

8
Threading
  • Based on sequence-structure similarity
  • Concept
      • Residues in the core adopt fewer conformations than
        those on the surface
  • Approach
      • Thread sequence through all known structures
      • Score match with core of each structure based on
          • Environmental scoring matrices and/or
          • Amino acid neighborhood matrices (à la dot
            matrix)
      • Refine structure using molecular mechanics based
        on best template(s)
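
A toy version of the threading score described above might look as follows. It is a deliberately simplified sketch: each template is reduced to a string of environment labels ("B" for buried, "E" for exposed), the score table is a hypothetical stand-in for an environmental scoring matrix, and the sequence is slid along the template without gaps, whereas real threading programs use gapped alignment.

```python
HYDROPHOBIC = set("AVLIMFWC")

def env_score(residue, environment):
    """Hypothetical score: hydrophobic residues fit buried sites, polar fit exposed."""
    hydrophobic = residue in HYDROPHOBIC
    if environment == "B":
        return 1.0 if hydrophobic else -1.0
    return -0.5 if hydrophobic else 0.5

def thread_score(sequence, environments):
    """Best (score, offset) over all ungapped placements, or None if too long."""
    best = None
    for offset in range(len(environments) - len(sequence) + 1):
        score = sum(env_score(res, environments[offset + i])
                    for i, res in enumerate(sequence))
        if best is None or score > best[0]:
            best = (score, offset)
    return best

def best_template(sequence, templates):
    """Return (name, score, offset) of the highest-scoring template, or None."""
    ranked = []
    for name, environments in templates.items():
        result = thread_score(sequence, environments)
        if result is not None:
            ranked.append((result[0], name, result[1]))
    if not ranked:
        return None
    score, name, offset = max(ranked)
    return name, score, offset

# Made-up templates described only by their burial pattern.
TEMPLATES = {"fold_1": "EBEBE", "fold_2": "EEEEE"}
hit = best_template("LKL", TEMPLATES)
```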

9
Rosetta (ab initio) Approach
  • Pioneered by David Baker's group in the late
    1990s
  • Remarkable success in CASP and CAFASP experiments
  • Recently made publicly available on an automated
    server by Christopher Bystroff's group
  • Potpourri of many different approaches
  • Key components
      • Divide-and-conquer strategy with respect to
        length of sequence to be modeled
      • Use of knowledge-based energy function

10
Divide and conquer
  • Mimics natural process of protein folding
  • Compromise between extremes of
      • Looking for homologous sequences with known
        structure
      • Modeling a priori (one amino acid at a time)
  • Use library of 3D structures of fragments of
    length 3 and 9 derived from the crystal structure
    database (a priori estimates: 8K and 10^12
    possible sequences, i.e. 20^3 and roughly 20^9).
  • Break up query sequence into a set of 3-mers and
    9-mers to find matches with the above library using
    a sequence profile approach
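
The a priori library sizes quoted above follow from counting all possible sequences of length 3 and 9 over the 20 standard amino acids. The sketch below reproduces that arithmetic and shows how a query can be broken into overlapping 3-residue and 9-residue windows; the query string and function names are made up for illustration.

```python
def count_possible(k, alphabet_size=20):
    """Number of distinct sequences of length k over the amino-acid alphabet."""
    return alphabet_size ** k

def windows(sequence, k):
    """All overlapping subsequences of length k, in order of start position."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]

n_3mers = count_possible(3)     # 20^3
n_9mers = count_possible(9)     # 20^9, on the order of 10^12

query = "MKTAYIAKQR"            # hypothetical 10-residue query
query_3mers = windows(query, 3)
query_9mers = windows(query, 9)
```

A query of length L yields L - 2 windows of length 3 and L - 8 windows of length 9, each of which would then be matched against the fragment library.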

11
Divide and conquer
  • Once matches are found, the task reduces to the
    combinatorial problem of selecting the set of
    fragments that gives the most energetically
    favorable structure
  • In practice, a Monte Carlo-based search of possible
    combinations is carried out.
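
The Monte Carlo search over fragment combinations can be sketched on a toy problem. Everything below is an illustrative assumption rather than the Rosetta implementation: a conformation is a list with one torsion-like value per residue, the fragment library offers three candidate 3-residue fragments per window, the energy is a made-up squared deviation from a target, and moves are accepted with the Metropolis criterion.

```python
import math
import random

def monte_carlo_assembly(n_res, fragments, energy, n_steps=2000, kT=1.0, seed=0):
    """Toy fragment-substitution search with Metropolis acceptance.

    fragments maps a window start index to a list of candidate fragments,
    each a sequence of per-residue values.
    """
    rng = random.Random(seed)
    starts = sorted(fragments)
    conf = [0.0] * n_res                     # arbitrary starting conformation
    e = energy(conf)
    best_conf, best_e = list(conf), e
    for _ in range(n_steps):
        start = rng.choice(starts)           # pick a window at random
        frag = rng.choice(fragments[start])  # pick a fragment for that window
        trial = conf[:start] + list(frag) + conf[start + len(frag):]
        e_trial = energy(trial)
        # Accept downhill moves always, uphill moves with prob. exp(-dE/kT).
        if e_trial <= e or rng.random() < math.exp(-(e_trial - e) / kT):
            conf, e = trial, e_trial
            if e < best_e:
                best_conf, best_e = list(conf), e
    return best_conf, best_e

# Hypothetical toy problem: 6 residues, target value -60 everywhere.
TARGET = [-60] * 6
FRAGMENTS = {s: [[-60] * 3, [-120] * 3, [60] * 3] for s in range(4)}

def toy_energy(conf):
    """Made-up energy: squared deviation from the target values."""
    return sum((a - t) ** 2 for a, t in zip(conf, TARGET))

best_conf, best_e = monte_carlo_assembly(6, FRAGMENTS, toy_energy)
```

In a real application the energy would be the knowledge-based scoring function and each run would produce one decoy.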

12
Knowledge based energy function
  • Fundamentally,
      • ΔG = ΔH - TΔS
      • Free energy is the enthalpy less an entropic term
        that is proportional to temperature
  • Entropy is proportional to the natural log of the
    number of conformations/possible states
      • S = k ln W

13
Knowledge based energy function
  • Hence it makes sense to use the existing
    distribution of structures to derive the energy
    function
  • Energy function is based on taking statistical
    distribution of 3D shapes in database of known
    structures as the underlying probability
    distribution
  • For a given structure, deviations from
    probability distribution are subject to
    proportional energetic penalties
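
One common way to turn an observed distribution into energies is the inverse Boltzmann relation, E(s) = -kT ln(P_obs(s) / P_ref(s)). The sketch below applies it to made-up counts of conformational states; the uniform reference distribution, the pseudocount and the state labels are assumptions for illustration, and Rosetta's actual scoring function is more elaborate than this.

```python
import math
from collections import Counter

def knowledge_based_energies(observations, kT=1.0, pseudocount=1.0):
    """Energy per state from observed frequencies (inverse Boltzmann).

    Uses a uniform reference distribution over the observed states, so
    states seen more often than average get negative (favorable) energies.
    """
    counts = Counter(observations)
    states = sorted(counts)
    total = sum(counts.values()) + pseudocount * len(states)
    p_ref = 1.0 / len(states)
    energies = {}
    for state in states:
        p_obs = (counts[state] + pseudocount) / total
        energies[state] = -kT * math.log(p_obs / p_ref)
    return energies

# Made-up "database": how often a residue type is seen in each state.
observed = ["helix"] * 6 + ["strand"] * 3 + ["coil"] * 1
energies = knowledge_based_energies(observed)
```

A candidate structure that places the residue in a rarely observed state is penalized with a higher energy, which is the behavior described in the last bullet above.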

14
Rosetta Steps used in CASP4
  • If possible, use PSI-BLAST to find similar
    sequences
  • If found, use the multiple sequence alignment to
    break down sequence into domains to be modeled
    independently
  • For domains with similarity to known structures,
    use a homology-based approach
  • For the remaining domains, carry out Rosetta

15
Rosetta - Steps
  • For domains with similarity to other sequences,
    apply the following steps to the homologs as well
    (consensus modeling)
  • Generate fragment library for each query
      • Collect 3-mer and 9-mer substructures from the PDB
        with similarity to 3-mer and 9-mer subsequences
  • Use Monte Carlo approach for backbone fragment
    substitution into query
      • Pick a fragment at random from library (40,000
        fragment substitutions for each structure)
      • Repeat the previous step several times
  • Between 10K and 100K conformations (decoys)
    generated for each target

16
Rosetta - Steps
  • Filter set of conformations to remove unlikely
    structures
      • Remove structures with minimal long-range
        interactions (low contact order)
      • Remove structures with unrealistic strands
  • Add side chains as statistically predicted by the
    backbone conformation
  • Cluster set of conformations (including, when
    available, the generated structures of
    homologues)
  • Representative structures from the top 5
    most populous clusters are candidate structures
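
The contact order filter can be illustrated with a small sketch. It computes relative contact order, the average sequence separation of residues in contact divided by chain length, from Cα coordinates. The 8 Å cutoff, the minimum separation of 3, the filter threshold and the two test chains are all illustrative assumptions.

```python
import math

def relative_contact_order(ca_coords, cutoff=8.0, min_separation=3):
    """Mean sequence separation of contacting residue pairs, divided by length."""
    n_res = len(ca_coords)
    total_separation = 0
    n_contacts = 0
    for i in range(n_res):
        for j in range(i + min_separation, n_res):
            if math.dist(ca_coords[i], ca_coords[j]) < cutoff:
                total_separation += j - i
                n_contacts += 1
    if n_contacts == 0:
        return 0.0
    return total_separation / (n_contacts * n_res)

def filter_decoys(decoys, min_contact_order=0.1):
    """Keep only decoys whose relative contact order reaches the threshold."""
    return [d for d in decoys
            if relative_contact_order(d) >= min_contact_order]

# Made-up 10-residue chains with 3.8 A spacing between consecutive residues.
extended = [(3.8 * i, 0.0, 0.0) for i in range(10)]
hairpin = ([(3.8 * i, 0.0, 0.0) for i in range(5)]
           + [(3.8 * (9 - i), 5.0, 0.0) for i in range(5, 10)])

kept = filter_decoys([extended, hairpin])
```

The fully extended chain has no long-range contacts and is discarded, while the hairpin, whose two halves pack against each other, is kept.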

17
Summary
  • Methods like Rosetta represent a breakthrough in
    the ab initio prediction of protein 3D structure
    and are very useful in cases where homology
    cannot be observed
  • For CASP4, at least one subsequence longer than
    50 residues could be predicted correctly (< 6.5 Å
    RMSD) in 17 of 21 cases
  • Combination of various approaches works best

18
Summary
  • However, both completeness and accuracy of
    prediction leave ample room for improvement
      • RMS error is frequently too high to be useful
      • Even in homology modeling, the template per se is
        often a better match!
      • Often, only subsequences are accurately modeled,
        and not the whole structure
  • The Nobel Prize is still up for grabs!