Structural Bioinformatics - PowerPoint PPT Presentation

About This Presentation

Title:

Structural Bioinformatics

Slides: 19

Provided by: deendayald

Learn more at: https://sse.umkc.edu

Transcript and Presenter's Notes

1
Structural Bioinformatics

Motivation
Concepts
Structure Prediction
Summary

2
Motivation

Holy Grail: mapping between sequence and
structure. Structure = F(Sequence). What is F?
Why?
Structure dictates chemistry, thermodynamics and
therefore function
Not all structures can be (need be?) determined
experimentally
Cost
Experimental limitations

3
Concepts - Prediction spectrum

Decreasing reliance on known structures (in order):
Homology Modeling
Threading
ab initio
Quantum Mechanics

4
Concepts - Common Principles

Constraints to reduce search space
Consideration of many alternate conformations
Protein backbone dihedral angles (Twists along
axis of protein)
Amino-acid geometry (Amino-acids can have more
than one shape)
Method for local optimization
Scoring function to compare conformations
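
The backbone dihedral angles mentioned above can be computed directly from atomic coordinates. The sketch below is a minimal illustration, not part of the original slides: the function name `dihedral` and the test points are assumptions. It uses the standard definition in which phi is the dihedral over C(i-1), N(i), CA(i), C(i) and psi is the dihedral over N(i), CA(i), C(i), N(i+1).

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) about the p1-p2 axis.

    Hypothetical helper for illustration; points are (x, y, z) tuples.
    """
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    # Normalize the central bond so projections onto it are simple dot products.
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Components of the outer bonds perpendicular to the central bond.
    v = tuple(x - _dot(b0, b1) * y for x, y in zip(b0, b1))
    w = tuple(x - _dot(b2, b1) * y for x, y in zip(b2, b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Four points in a cis arrangement (all in one plane, same side of the axis).
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)))
```

Computing phi and psi for every residue gives the points of a Ramachandran plot.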

5
Evaluation of quality of prediction

RMSD comparison with experimentally known
structure
Comparison with crystal structure quality
criteria
Ramachandran Plot
Residue specific dihedral angle distribution
CASP (Critical assessment of structure
prediction) and CAFASP (..Fully Automated..)
competitions
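
As a rough illustration of the RMSD comparison, the sketch below computes the root-mean-square deviation between two coordinate sets. It assumes the two structures are already superimposed and list the same atoms in the same order; real comparisons first find the optimal superposition (for example with the Kabsch algorithm), which is omitted here. The function name `rmsd` and the sample coordinates are illustrative only.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) tuples.

    Assumes the structures are already superimposed (no fitting is done).
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for a, b in zip(coords_a, coords_b):
        total += sum((x - y) ** 2 for x, y in zip(a, b))
    return math.sqrt(total / len(coords_a))


predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
experimental = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
print(rmsd(predicted, experimental))
```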

6
Methods

Knowledge-based constraints of search space
Homology Modeling
Threading
ab initio (based on knowledge primitives, not
true ab initio)
Approaches to refinement
Quantum mechanics (ab initio)
Based on quantum mechanical model of elementary
particles
Unscalable
Molecular mechanics
Uses parametric force fields (Newton's laws,
Hooke's law, ...)
Typically used for local or constrained global
optimization
Molecular Dynamics or Monte Carlo-based

7
Homology modeling

Homology
Based on sequence-sequence similarity (> 25%,
the higher the better)
Steps
Pair-wise local sequence similarity to identify
related structures (possible templates)
Refine alignment by global pair-wise sequence
similarity and multiple sequence alignment (MSA)
Overlay sequence backbone (N-C-C) on template
Model loops based on
Statistical knowledge from databases of known
structures
Molecular mechanics
Model side-chains (approach similar to that of
loops)
Molecular mechanical unconstrained local
optimization
Pray for a good solution!
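
The similarity threshold used to pick templates can be sketched as a percent-identity check on an existing pairwise alignment. This is an illustration under stated assumptions: the alignment is given as two equal-length gapped strings, identity is counted over columns where neither sequence has a gap (other conventions exist), and the function names and the 25% cutoff parameter are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over columns with no gap in either row."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b)
             if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


def is_template_candidate(aligned_query, aligned_template, threshold=25.0):
    """True if the aligned pair exceeds the identity threshold."""
    return percent_identity(aligned_query, aligned_template) > threshold


print(percent_identity("ACDE-FG", "ACDQKFG"))
print(is_template_candidate("ACDE-FG", "ACDQKFG"))
```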

8
Threading

Based on sequence-structure similarity
Concept
Residues in core adopt fewer conformations than
surface
Approach
Thread sequence through all known structures
Score match with core of each structure based on
Environmental scoring matrices and/or
Amino acid neighborhood matrices (a la Dot
matrix)
Refine structure using molecular mechanics based
on best template(s)
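
A toy version of the threading score is sketched below. It assumes each template position is labeled only as buried ("B") or exposed ("E"), uses a made-up +1/-1 scoring rule (hydrophobic residues prefer buried positions), and slides the query along the template without gaps. Real environmental scoring matrices are far richer; every name and value here is illustrative.

```python
# Hypothetical residue classification, for illustration only.
HYDROPHOBIC = set("AVLIMFWC")


def residue_env_score(residue, environment):
    """+1 if the residue type suits the environment, otherwise -1."""
    buried = environment == "B"
    hydrophobic = residue in HYDROPHOBIC
    return 1.0 if buried == hydrophobic else -1.0


def thread_score(sequence, environments, offset):
    """Score the sequence placed on the template starting at offset."""
    return sum(residue_env_score(res, environments[offset + i])
               for i, res in enumerate(sequence))


def best_threading(sequence, environments):
    """Try every ungapped placement; return (offset, score) of the best."""
    best = None
    for offset in range(len(environments) - len(sequence) + 1):
        score = thread_score(sequence, environments, offset)
        if best is None or score > best[1]:
            best = (offset, score)
    return best


print(best_threading("LKVD", "EEBEBEEE"))
```

Repeating this against every template in a structure library and ranking the best scores gives the candidate folds to refine.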

9
Rosetta (ab initio) Approach

Pioneered by David Baker's group in the late
1990s
Remarkable success in CASP and CAFASP experiments
Recently made publicly available on an automated
server by Christopher Bystroff's group
Potpourri of many different approaches
Key components
Divide and conquer strategy with respect to
length of sequence to be modeled
Use of knowledge based energy function

10
Divide and conquer

Mimics natural process of protein folding
Compromise between extremes of
Looking for homologous sequences with known
structure
Modeling a priori (one amino acid at a time)
Use library of 3D structures of fragments of
length 3 and 9 derived from the crystal structure
database (a priori estimates: 8K and 10^12).
Break up query sequence into a set of 3mers and
9mers, to find matches with above library using
a sequence profile approach
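
The a priori estimates quoted above presumably correspond to the number of possible 3-residue and 9-residue sequences over 20 amino acid types (an assumption; the slides do not show the calculation). The sketch below checks that arithmetic and shows one simple way to break a query into overlapping 3mers and 9mers. The function name `kmers` and the example sequence are illustrative.

```python
def kmers(sequence, k):
    """All overlapping subsequences of length k, in order."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


N_AMINO_ACIDS = 20
print(N_AMINO_ACIDS ** 3)  # 8000, i.e. 8K
print(N_AMINO_ACIDS ** 9)  # 512000000000, i.e. on the order of 10^12

query = "ACDEFGHIKL"  # hypothetical 10-residue query
print(kmers(query, 3))
print(kmers(query, 9))
```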

11
Divide and conquer

Once matches found, reduces to combinatorial
problem of selecting best set of fragments with
most energetically favorable structure
In practice, Monte Carlo based search of possible
combinations is carried out.
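
A Monte Carlo search over fragment combinations can be sketched as follows. This is a toy model under loud assumptions: the conformation is reduced to one angle per residue, the fragment library and energy function are invented for the demonstration, and the Metropolis acceptance rule is a common choice that the slides do not specify. It is not Rosetta's actual implementation.

```python
import math
import random


def monte_carlo_assembly(n_residues, fragments, energy,
                         steps=2000, kT=1.0, seed=0):
    """Fragment-substitution Monte Carlo with Metropolis acceptance.

    fragments maps a start position to a list of candidate angle tuples;
    energy maps a full angle list to a number (lower is better).
    """
    rng = random.Random(seed)
    conformation = [0.0] * n_residues  # arbitrary starting conformation
    current_e = energy(conformation)
    best, best_e = list(conformation), current_e
    positions = sorted(fragments)
    for _ in range(steps):
        pos = rng.choice(positions)
        fragment = rng.choice(fragments[pos])
        trial = list(conformation)
        trial[pos:pos + len(fragment)] = fragment
        trial_e = energy(trial)
        # Always accept downhill moves; accept uphill moves with
        # probability exp(-dE / kT).
        if trial_e <= current_e or rng.random() < math.exp(-(trial_e - current_e) / kT):
            conformation, current_e = trial, trial_e
            if current_e < best_e:
                best, best_e = list(conformation), current_e
    return best, best_e


# Hypothetical 6-residue target whose ideal angle is -60 everywhere.
def toy_energy(angles):
    return sum((a + 60.0) ** 2 for a in angles)


candidates = [(-60.0, -60.0, -60.0), (120.0, 120.0, 120.0), (0.0, 0.0, 0.0)]
library = {pos: candidates for pos in range(4)}  # 3mers starting at 0..3
print(monte_carlo_assembly(6, library, toy_energy))
```

Running the search many times from different random seeds yields the set of candidate conformations to filter and cluster.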

12
Knowledge based energy function

Fundamentally,
$\Delta G = \Delta H - T \Delta S$
Free energy is the enthalpy less an entropic term
that is proportional to temperature
Entropy is proportional to the natural log of the
number of conformations/possible states
$S = k \ln W$

13
Knowledge based energy function

Hence makes sense to use existing distribution of
structures to derive energy function
Energy function is based on taking statistical
distribution of 3D shapes in database of known
structures as the underlying probability
distribution
For a given structure, deviations from
probability distribution are subject to
proportional energetic penalties
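
One common way to turn an observed distribution into an energy is the inverse Boltzmann relation, E(s) = -kT ln(P_obs(s) / P_ref(s)). The sketch below applies it to made-up counts of local conformations, with a uniform reference distribution and a pseudocount; these choices, the function name, and the data are assumptions for illustration, and the slides do not state that Rosetta's energy function takes exactly this form.

```python
import math
from collections import Counter


def statistical_energies(observations, states, kT=1.0, pseudocount=1.0):
    """Energy per state from observed frequencies (inverse Boltzmann).

    Frequent states get low energy; rare states get an energetic penalty.
    """
    counts = Counter(observations)
    total = len(observations) + pseudocount * len(states)
    p_ref = 1.0 / len(states)  # uniform reference distribution (assumption)
    energies = {}
    for state in states:
        p_obs = (counts[state] + pseudocount) / total
        energies[state] = -kT * math.log(p_obs / p_ref)
    return energies


# Invented counts standing in for a database of known structures.
observed = ["helix"] * 6 + ["strand"] * 3
print(statistical_energies(observed, ["helix", "strand", "coil"]))
```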

14
Rosetta Steps used in CASP4

If possible, use PSI-BLAST to find similar
sequences
If found, use the multiple sequence alignment to
break down sequence into domains to be modeled
independently
For domains with similarity to known structures,
use Homology based approach
For remaining domains, carry out Rosetta

15
Rosetta - Steps

For domains with similarity to other sequences,
apply following steps to the homologs as well
(consensus modeling)
Generate fragment library for each query
Collect 3mer and 9mer sub-structures from the PDB
with similarity to 3mer and 9mer subsequences
Use Monte Carlo approach for backbone fragment
substitution into query
Pick a fragment at random from library (40,000
fragment substitutions for each structure)
Repeat A several times
Between 10K and 100K conformations (decoys)
generated for each target

16
Rosetta - Steps

Filter set of conformations to remove unlikely
structures
Remove structures with minimal long range
interactions (low contact order)
Remove structures with unrealistic strands
Add side chains as statistically predicted by the
backbone conformation
Cluster set of conformations (including, when
available, the generated structures of
homologues)
Representative structures from the top 5
most-populous clusters are candidate structures
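
The clustering step can be illustrated with a simple greedy scheme: pick the conformation with the most neighbors within a cutoff, remove that cluster, and repeat. This is a sketch under assumptions, since the slides do not specify the clustering algorithm; the decoys here are plain numbers with absolute difference as the distance, whereas in practice the distance would be a structural measure such as RMSD.

```python
def cluster_decoys(decoys, distance, cutoff):
    """Greedy clustering; returns (center_index, member_indices) pairs,
    most populous cluster first."""
    remaining = list(range(len(decoys)))
    clusters = []
    while remaining:
        # Neighbors of each remaining decoy, counting the decoy itself.
        neighbors = {
            i: [j for j in remaining
                if distance(decoys[i], decoys[j]) <= cutoff]
            for i in remaining
        }
        center = max(remaining, key=lambda i: len(neighbors[i]))
        members = neighbors[center]
        clusters.append((center, members))
        remaining = [i for i in remaining if i not in members]
    return clusters


# Toy one-dimensional "decoys" for demonstration.
decoys = [0.0, 0.1, 0.2, 5.0, 5.1, 9.0]
clusters = cluster_decoys(decoys, lambda a, b: abs(a - b), cutoff=0.5)
top_candidates = [decoys[center] for center, _ in clusters[:5]]
print(clusters)
print(top_candidates)
```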

17
Summary

Methods like Rosetta represent a breakthrough in
the ab initio prediction of protein 3D structure
and are very useful in cases where homology
cannot be observed
For CASP4, at least one subsequence longer than
50 residues could be predicted correctly (< 6.5 Å
RMSD) in 17 of 21 cases
Combination of various approaches works best

18
Summary

However, both completeness and accuracy of
prediction leave ample room for improvement
RMS error frequently too high to be useful
Even in homology modeling, the template itself is
often a better match!
Often, only subsequences are accurately modeled,
and not the whole structure
The Nobel Prize is still up for grabs!
