Title: BCB 444/544
1. BCB 444/544
- Lecture 22
- Protein Structure Prediction
- (ctd.)
- 23 Oct 17
2. Chp 15 - Tertiary Structure Prediction
Xiong, SECTION V: STRUCTURAL BIOINFORMATICS
Chp 15: Protein Tertiary Structure Prediction
- Methods
- Homology Modeling
- Threading and Fold Recognition
- Ab Initio Protein Structural Prediction
- CASP
Some slides based on those by Mark Gerstein
3. Structural Genomics - Status and Goal
- 20,000 "traditional" genes in human genome
- (recall, this is fewer than earlier
estimate of 30,000)
- 2,000 proteins in a typical cell
- > 4.9 million sequences in UniProt (Oct 2007)
- > 46,000 protein structures in the PDB (Oct
2007)
- Experimental determination of protein structure
lags far behind sequence determination!
- Goal: Determine structures of "all" protein folds
in nature, using a combination of experimental
structure determination methods (X-ray
crystallography, NMR, mass spectrometry) and
structure prediction
4. Problem statement
- Given the primary structure (sequence) of a
protein chain, what is its 3D shape (i.e., what
are the 3D coordinates of its constituent atoms)?
- Also:
- Given a desired 3D shape, what amino acid
sequence(s) will assume that conformation?
- Recently accomplished by David Baker's lab for
one designed structure!
5. Steps in Protein Folding
- 1- "Collapse" - driving force is burial of
hydrophobic aa's (fast - msecs)
- 2- Molten globule - helices and sheets form, but
"loose" (slow - secs)
- 3- "Final" native folded state - compaction and
rearrangement of 2° (secondary) structures
- Native state? - assumed to be lowest free
energy
- may be an ensemble of structures
6. Energy
- Energy Function
- Force field
- bond energy
- bond angle energy
- dihedral angle energy
- van der Waals energy
- electrostatic energy
- Knowledge-based statistical potential
- Conformational Search Function
- Molecular dynamics
- Monte Carlo
- How can we reduce computational complexity of
each step?
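As a rough illustration of the five force-field terms listed above, the Python sketch below writes each one in a common textbook functional form (harmonic bonds and angles, a periodic dihedral term, a 12-6 Lennard-Jones term, and Coulomb's law). The function names and all parameter values a caller would pass in are assumptions for illustration; they are not taken from any particular force field.

```python
import math

# Approximate conversion factor so that q1*q2/r (charges in e, r in Angstrom)
# comes out in kcal/mol. Real force fields use slightly different values.
COULOMB_CONST = 332.06

def bond_energy(r, r0, k_bond):
    """Harmonic bond stretch: k * (r - r0)^2."""
    return k_bond * (r - r0) ** 2

def angle_energy(theta, theta0, k_angle):
    """Harmonic bond-angle bend: k * (theta - theta0)^2, angles in radians."""
    return k_angle * (theta - theta0) ** 2

def dihedral_energy(phi, k_phi, n, delta):
    """Periodic torsion term: k * (1 + cos(n*phi - delta))."""
    return k_phi * (1.0 + math.cos(n * phi - delta))

def lennard_jones(r, epsilon, sigma):
    """12-6 van der Waals term; its minimum is -epsilon at r = 2^(1/6) * sigma."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def coulomb(r, q1, q2, dielectric=1.0):
    """Electrostatic term between two point charges."""
    return COULOMB_CONST * q1 * q2 / (dielectric * r)
```

The total energy of a conformation is the sum of these terms over all bonds, angles, dihedrals, and non-bonded atom pairs. The non-bonded sum grows with the square of the number of atoms, which is one reason the cost of each evaluation matters.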
7. Rotamers
Alkane stereochemistry. (2007, April 25). In
Wikipedia, The Free Encyclopedia. Retrieved April
28, 2007, from
http://en.wikipedia.org/w/index.php?title=Alkane_stereochemistry&oldid=126676235
Image modified slightly from
http://nook.cs.ucdavis.edu:8080/koehl/ProModel/sidechain.html
8. Structure modeling methods
- Comparative modeling
- (e.g. MODELLER, SWISS-MODEL)
- Threading
- (e.g. FUGUE)
- Ab initio - from first principles
- Molecular dynamics (physical simulation)
- (e.g. NAMD, GROMACS)
- Hybrid methods
- (e.g. ROSETTA, CABS, I-TASSER)
9. Tertiary Structure Prediction Methods
- 2 (or 3) Major Methods
- Comparative Modeling
- Homology Modeling (easiest!)
- Threading and Fold Recognition (harder)
- Ab Initio Protein Structural Prediction (really
hard)
10. Protein Dynamics
- Protein in native state is NOT static
- Function of many proteins requires conformational
changes, sometimes large, sometimes small
- Globular proteins are inherently "unstable"
- (NOT evolved for maximum stability)
- Energy difference between native and denatured
state is very small (5-15 kcal/mol)
- (this is equivalent to 2 H-bonds!)
- Folding involves changes in both entropy and
enthalpy
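To get a feel for what a 5-15 kcal/mol stability margin means, the sketch below converts a free-energy difference into the fraction of molecules that are unfolded at equilibrium. It assumes a simple two-state (native/denatured) model and room temperature, neither of which is stated on the slide; the function name is hypothetical.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)

def fraction_unfolded(delta_g_unfold, temperature=298.0):
    """Two-state model (an assumption): K = [unfolded]/[native] = exp(-dG/RT).

    delta_g_unfold is the free energy of unfolding in kcal/mol
    (positive when the native state is more stable).
    """
    k_unfold = math.exp(-delta_g_unfold / (R_KCAL * temperature))
    return k_unfold / (1.0 + k_unfold)

for dg in (5.0, 10.0, 15.0):
    print(f"dG = {dg:4.1f} kcal/mol -> fraction unfolded = {fraction_unfolded(dg):.2e}")
```

Under these assumptions, even the low end of the range leaves only a small fraction of molecules unfolded, yet the margin is small enough that modest changes in conditions can shift the balance.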
11. Difficulty of Tertiary Structure Prediction
- Folding or tertiary structure prediction problem
can be formulated as a search for minimum energy
conformation
- Search space is defined by psi/phi angles of
backbone and side-chain rotamers
- Search space is enormous even for small proteins!
- Number of local minima increases exponentially
with number of residues
Computationally it is an exceedingly difficult
problem!
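A back-of-envelope count makes the size of the search space concrete. The sketch below assumes each backbone angle (phi, psi) can take only 3 discrete values and ignores side-chain rotamers entirely; both numbers, and the sampling rate of 10^13 conformations per second, are illustrative assumptions rather than figures from the lecture.

```python
def conformation_count(n_residues, states_per_angle=3, angles_per_residue=2):
    """Number of backbone conformations on a coarse grid (assumed discretization)."""
    return states_per_angle ** (angles_per_residue * n_residues)

def exhaustive_search_years(n_residues, rate_per_second=1e13):
    """Years needed to enumerate every conformation at the assumed rate."""
    seconds = conformation_count(n_residues) / rate_per_second
    return seconds / (3600 * 24 * 365)

for n in (10, 50, 100):
    print(n, "residues:", f"{conformation_count(n):.3e}", "conformations,",
          f"{exhaustive_search_years(n):.3e}", "years")
```

Each added residue multiplies the count by a constant factor, so exhaustive enumeration is hopeless long before a protein reaches typical length. This is why practical methods rely on templates, reduced representations, or stochastic search.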
12. How do these methods stack up?
Baker and Sali (2001). Science 294 (5540), 93-96
13. Comparative modeling
- Requires that the structure has been solved for a
protein with similar sequence
- Backbone of query structure is initially
positioned identically to template structure
- Positions of sidechains and loops are then built
and modified to remove steric clashes
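The last step above depends on being able to detect steric clashes. A minimal sketch of such a check is shown below: it flags any pair of atoms closer than a single distance threshold. The 3.0 Angstrom cutoff, the function name, and the input layout are assumptions; real modeling programs typically use per-atom radii and skip covalently bonded pairs.

```python
import math
from itertools import combinations

def find_clashes(atoms, min_distance=3.0):
    """Return (name_a, name_b, distance) for atom pairs closer than min_distance.

    atoms: list of (name, (x, y, z)) tuples, coordinates in Angstrom.
    Simplification: one global cutoff, no exclusion of bonded neighbors.
    """
    clashes = []
    for (name_a, pos_a), (name_b, pos_b) in combinations(atoms, 2):
        distance = math.dist(pos_a, pos_b)
        if distance < min_distance:
            clashes.append((name_a, name_b, distance))
    return clashes

# Hypothetical coordinates for three side-chain atoms.
model = [("LEU12:CD1", (0.0, 0.0, 0.0)),
         ("PHE45:CZ", (1.0, 0.0, 0.0)),
         ("VAL80:CG1", (10.0, 0.0, 0.0))]
print(find_clashes(model))
```

A modeling pipeline would then try alternative side-chain rotamers or loop conformations for the flagged residues until no clashes remain.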
14. Threading
- In some ways similar to comparative modeling,
but
- Used when no structures with high sequence
similarity are available
- Thread target sequence onto a library of structures
- Evaluate energy function
- Incompatible structures will have high energy
- Only accept structures below a cutoff
15. Steps in Threading
- Align target sequence with template structures
in fold library (usually from the PDB)
- Calculate energy score to evaluate "goodness of
fit" between target sequence and template structure
- Rank models based on energy scores
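The three steps above can be sketched with a deliberately tiny toy model. Here each template is reduced to a string of per-position environments ("B" for buried, "E" for exposed), the alignment is gapless, and the energy simply rewards hydrophobic residues in buried positions and polar residues in exposed ones. The residue grouping, the scoring values, the cutoff, and all names are assumptions made for illustration; real threading programs use gapped alignments and far richer scoring.

```python
HYDROPHOBIC = set("AVLIMFWC")  # simplified hydrophobic residue set (assumption)

def position_energy(residue, environment):
    """Toy score: -1 if residue type matches the environment, +1 otherwise."""
    is_hydrophobic = residue in HYDROPHOBIC
    is_buried = environment == "B"
    return -1.0 if is_hydrophobic == is_buried else 1.0

def thread(sequence, template_env):
    """Step 1 and 2: slide the sequence along the template (no gaps), keep best energy."""
    best = None
    for offset in range(len(template_env) - len(sequence) + 1):
        window = template_env[offset:offset + len(sequence)]
        energy = sum(position_energy(r, e) for r, e in zip(sequence, window))
        if best is None or energy < best[0]:
            best = (energy, offset)
    return best  # None if the template is shorter than the sequence

def rank_templates(sequence, library, cutoff=0.0):
    """Step 3: rank templates by energy, keeping only those below the cutoff."""
    scored = []
    for name, env in library.items():
        result = thread(sequence, env)
        if result is not None and result[0] < cutoff:
            scored.append((result[0], name, result[1]))
    return sorted(scored)

# Hypothetical two-fold library and target sequence.
library = {"foldA": "BBEEBB", "foldB": "EEEEEE"}
print(rank_templates("LVKDIL", library))
```

In this example only foldA passes the cutoff, because its buried/exposed pattern matches the hydrophobic/polar pattern of the target.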
16. Threading: Goal and Issues
Find correct sequence-structure alignment of a
target sequence with its native-like fold in
template library (usually derived from PDB)
- Structure database - must be "complete"
- Can't build a good model if there is no good
template in library!
- Sequence-structure alignment algorithm
- Bad alignment => Bad score!
- Energy function or Scoring Scheme
- Must distinguish correct sequence-fold alignment
from incorrect sequence-fold alignments
- Must distinguish correct fold from close
decoys
- Prediction reliability assessment - How to determine
whether predicted structure is correct? (or
even close?)
17. Threading: Template database
- Build a database of structural templates
- e.g., ASTRAL domain library derived from the
PDB
Sometimes, supplement with additional decoys,
e.g., generated using an ab initio approach such as
Rosetta (Baker)
18. Threading: Energy function
- Two main methods (and combinations of these)
- Structural profile (environmental)
- physicochemical properties of amino acids
- Contact potential (statistical)
- based on contact statistics from PDB
- famous one: Miyazawa and Jernigan (ISU)
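To show how contact statistics can be turned into an energy, the sketch below applies a plain log-odds (inverse Boltzmann) conversion: pairs seen in contact more often than expected from composition get a negative (favorable) energy. This is a simplified stand-in, not the Miyazawa-Jernigan derivation, and the contact counts are invented for the example.

```python
import math
from collections import Counter

def contact_potential(observed_contacts):
    """Convert contact counts into e(a, b) = -ln(observed / expected), in kT units.

    observed_contacts: mapping from an alphabetically sorted residue pair (a, b)
    to the number of times that pair was seen in contact. Pairs with zero
    counts must be left out (the logarithm is undefined for them).
    """
    total = sum(observed_contacts.values())
    # How often each residue type takes part in any contact.
    residue_counts = Counter()
    for (a, b), n in observed_contacts.items():
        residue_counts[a] += n
        residue_counts[b] += n
    total_ends = sum(residue_counts.values())
    potential = {}
    for (a, b), n in observed_contacts.items():
        fa = residue_counts[a] / total_ends
        fb = residue_counts[b] / total_ends
        # Unordered pairs of different types can form two ways.
        expected = fa * fb if a == b else 2.0 * fa * fb
        potential[(a, b)] = -math.log((n / total) / expected)
    return potential

# Invented counts: leucine (L) and lysine (K) contacts in a toy data set.
counts = {("L", "L"): 50, ("K", "L"): 20, ("K", "K"): 30}
for pair, energy in sorted(contact_potential(counts).items()):
    print(pair, round(energy, 3))
```

With these counts the L-L pair comes out favorable and the K-L pair unfavorable, which is the kind of signal a threading score sums over all contacts in a candidate fold.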
19. Ab Initio Prediction
- Develop energy function
- bond energy
- bond angle energy
- dihedral angle energy
- van der Waals energy
- electrostatic energy
- Calculate structure by minimizing energy function
- usually Molecular Dynamics (MD) or Monte Carlo
(MC)
- Ab initio prediction - impractical for most real
(long) proteins
- Computationally? Very expensive
- Accuracy? Usually poor for all except short
peptides
- (but much improvement recently!)
Provides both folding pathway and folded structure
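The Monte Carlo option mentioned above can be sketched with the Metropolis acceptance rule. The example minimizes a toy energy that depends only on a handful of dihedral angles; the energy function, step size, temperature, and step count are all assumptions chosen to keep the example small, and a real ab initio method would use a full energy function over all atoms.

```python
import math
import random

def toy_energy(angles):
    """Toy torsional energy: each angle has minima at 60, 180, and 300 degrees."""
    return sum(1.0 + math.cos(3.0 * phi) for phi in angles)

def metropolis_minimize(angles, energy_fn, n_steps=10000, step=0.3, kT=0.2, seed=0):
    """Metropolis Monte Carlo search; returns the lowest-energy state visited."""
    rng = random.Random(seed)
    current = list(angles)
    e_current = energy_fn(current)
    best, e_best = list(current), e_current
    for _ in range(n_steps):
        # Propose a small random change to one randomly chosen angle.
        trial = list(current)
        i = rng.randrange(len(trial))
        trial[i] += rng.uniform(-step, step)
        e_trial = energy_fn(trial)
        delta = e_trial - e_current
        # Always accept downhill moves; accept uphill moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / kT):
            current, e_current = trial, e_trial
            if e_current < e_best:
                best, e_best = list(current), e_current
    return best, e_best

start = [0.0] * 5  # all angles start at an energy maximum
best_angles, best_energy = metropolis_minimize(start, toy_energy)
print("start energy:", toy_energy(start), "best energy found:", best_energy)
```

Accepting some uphill moves is what lets the search escape local minima, at the price of many more energy evaluations than a simple downhill minimizer would need.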
20. Molecular dynamics
- Physical, all-atom simulation
- Calculates force (classical, not quantum) between
all atoms
- Iterate over very small time-steps (usually 1 or
2 femtoseconds)
- Due to computational cost, typically limited to
simulation runs on the order of ns
- Decent at folding very short sequences, but
impractical for full folding of most sequences
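The time-stepping loop described above can be sketched with the velocity Verlet scheme, one widely used integrator for classical equations of motion. The example below runs it on a single particle on a spring rather than a protein, and also counts how many 2 fs steps a given simulated time requires. The function names and the test system are assumptions for illustration.

```python
def velocity_verlet(x, v, force_fn, mass, dt, n_steps):
    """Integrate Newton's equations for one coordinate with velocity Verlet."""
    f = force_fn(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force_fn(x)                    # recompute force at the new position
        v = v_half + 0.5 * dt * f / mass   # finish the velocity update
    return x, v

def steps_needed(simulated_seconds, timestep_seconds=2e-15):
    """Number of integration steps for a given simulated time."""
    return round(simulated_seconds / timestep_seconds)

# Toy system: unit mass on a unit-stiffness spring (force = -x), arbitrary units.
x_end, v_end = velocity_verlet(1.0, 0.0, lambda x: -x, mass=1.0, dt=0.01, n_steps=1000)
print("total energy after 1000 steps:", 0.5 * v_end ** 2 + 0.5 * x_end ** 2)

print("steps for 1 ns:", steps_needed(1e-9))
print("steps for 1 ms:", steps_needed(1e-3))
```

Because every step requires a fresh force calculation over all atoms, the step counts printed above translate directly into cost, which is why nanosecond runs are feasible while folding on the msec-to-sec timescale is not.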
21. State of the art
- Critical Assessment of Structure Prediction (CASP)
- Top groups
- David Baker, University of Washington
- ROSETTA
- open source
- also available through ROBETTA server (4 month
queue)
- Andrzej Kolinski, University of Warsaw, Poland
- CABS
- Yang Zhang (originally from Jeffrey Skolnick's
lab), University of Kansas
- I-TASSER (#1 server in CASP7 competition, shorter
queue than ROBETTA, for now)
22. CABS
Source: http://biocomp.chem.uw.edu.pl/multiscale_modeling.php
23. Disorder?
- Some proteins (or segments of proteins) appear to
be intrinsically disordered in the cell
24. Dynamics?
- No such thing as a static structure
- Fluctuation about minimum energy structure
- Elastic network model - Jernigan
25. Interactions?
- Protein-Protein
- Protein-DNA
- Protein-RNA
- Protein-Ligand
26. Essential Reading
- Baker, D. and Sali, A. (2001) Protein Structure
Prediction and Structural Genomics. Science 294
(5540), 93. DOI: 10.1126/science.1065659
http://tinyurl.com/3njfez
27. Protein Structure Classification
- SCOP: Structural Classification of Proteins
- Levels reflect both evolutionary and structural
relationships
- http://scop.mrc-lmb.cam.ac.uk/scop
- CATH: Classification by Class,
Architecture, Topology and Homology
- http://cathwww.biochem.ucl.ac.uk/latest/
- DALI
- (recently moved to EBI and reorganized)
- DALI Database (fold classification)
http://ekhidna.biocenter.helsinki.fi/dali/start
Each method has strengths and weaknesses.
28. SCOP - Structure Classification
http://scop.mrc-lmb.cam.ac.uk/scop/
29. CATH - Structure Classification
http://www.cathdb.info/latest/index.html