Title: Protein folding
1 Protein folding
2 Levinthal paradox
If a protein has 100 amino acids, and each amino
acid has 3 conformational states, then the
protein has 3^100 (about 5 x 10^47) conformational
states, of which 1 is the native state. Therefore
proteins cannot sample all possible conformations
while folding. Therefore a folding pathway must exist.
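The arithmetic behind the paradox can be checked with a short sketch. The sampling rate of 10^13 conformations per second is an assumed, illustrative figure that does not come from the slides, and the function names are hypothetical.

```python
# Levinthal estimate: size of conformational space and exhaustive search time.
# SAMPLES_PER_SECOND is an assumed illustrative rate, not a measured value.

SECONDS_PER_YEAR = 365.25 * 24 * 3600
SAMPLES_PER_SECOND = 1e13  # assumption: one conformation every 0.1 picosecond


def conformation_count(n_residues: int, states_per_residue: int = 3) -> int:
    """Number of chain conformations if every residue varies independently."""
    return states_per_residue ** n_residues


def exhaustive_search_years(n_residues: int, states_per_residue: int = 3) -> float:
    """Years needed to visit every conformation once at the assumed rate."""
    total = conformation_count(n_residues, states_per_residue)
    return total / SAMPLES_PER_SECOND / SECONDS_PER_YEAR


if __name__ == "__main__":
    print(f"3^100 = {conformation_count(100):.3e} conformations")
    print(f"exhaustive search: {exhaustive_search_years(100):.1e} years")
```

Even at this generous assumed rate the search would take on the order of 10^27 years, which is why random sampling cannot explain folding.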
3 Anfinsen's thermodynamic hypothesis
The 3D structure of a protein in its native
environment is the one in which the Gibbs free
energy of the whole system is the lowest.
Sequence determines structure.
How did Anfinsen come to this conclusion?
Anfinsen, C.B. Principles that govern the folding
of protein chains. Science 181, 223-30 (1973).
4 Shuffling disulfides
Ribonuclease was reduced and denatured (8 M urea),
then allowed to reoxidize under denaturing
conditions. Urea was dialyzed away, and the
activity was found to be 1% of native.
8 cysteines, 4 disulfides: 7 x 5 x 3 x 1 = 105
possible disulfide isomers (1 native, 104 scrambled).
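The count of 105 can be reproduced by enumerating every way of pairing 8 cysteines. This is a minimal sketch; the labels C1 to C8 are placeholders rather than real residue numbers, and the function name is hypothetical.

```python
from typing import Iterator, List, Tuple

Pairing = List[Tuple[str, str]]


def disulfide_pairings(cysteines: List[str]) -> Iterator[Pairing]:
    """Yield every way to pair all cysteines into disulfide bonds."""
    if not cysteines:
        yield []
        return
    first, rest = cysteines[0], cysteines[1:]
    # Pair the first cysteine with each possible partner, then recurse
    # on the cysteines that are still free.
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in disulfide_pairings(remaining):
            yield [(first, partner)] + tail


if __name__ == "__main__":
    labels = [f"C{i}" for i in range(1, 9)]  # placeholder labels
    isomers = list(disulfide_pairings(labels))
    print(len(isomers), "disulfide isomers")
    print(f"chance of the native pairing at random: {1 / len(isomers):.2%}")
```

If the pairings formed at random, roughly 1 molecule in 105 would carry the native set, which is consistent with the observed activity of about 1%.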
5 Activity restored by disulfide exchange enzyme
scrambled enzyme (1% activity)
native (100% activity)
thioredoxin
Under native conditions (urea removed), an enzyme
that breaks disulfides causes the scrambled molecule
to recover full activity. Conclusion: all other 104
isomers are higher in free energy than the
native isomer.
6 Antibodies bind to fragments of staphylococcal nuclease
Ab raised against selected using inhibit
E? E 0 E 99-149 99-149 0
Antibodies to E (whole enzyme) bind to the
fragment 99-149. So 99-149, although mostly unfolded
in isolation, is sometimes folded: it flickers in
and out of the native conformation.
7 Anfinsen's folding pathway
8 This much is known
- Proteins are translated from mRNA on the
ribosome, but folding does not require the
ribosome. Proteins fold in the dense milieu of the
cytoplasm, but will also fold in pure water.
- Nothing other than water is required for folding.
- The amino acid sequence is sufficient information
to determine the folded structure.
- Disulfide formation is not required for folding.
9 Methods for observing protein folding
Proteins may be denatured using urea or
guanidine, as well as many alcohols,
detergents and non-polar solvents. Cold and
pressure also unfold proteins.
- Very fast dilution
- Temperature jump
- Pressure jump
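10 Stopped flow spectrophotometry
...gives a time sequence of measurements, starting shortly
after dilution. Time scale: milliseconds.
[Figure: protein in denaturant and water enter a mixer that feeds the spectrophotometer. The signal versus time t follows 1st-order kinetics; the dead time separates t = 0 from the first measurement.]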
11 What happens early in folding?
Burst phase intermediates. Extrapolating the observed
kinetics back to t = 0 does not recover the signal of
the unfolded state, which reveals that a significant
change happens before the first measurement, during the
dead time. What changes?
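The extrapolation can be sketched as a single-exponential fit. This assumes one observable 1st-order phase and a known final plateau; the function names and the synthetic numbers in the usage example are hypothetical, not data from the slides.

```python
import math
from typing import Sequence, Tuple


def fit_first_order(times: Sequence[float], signals: Sequence[float],
                    s_final: float) -> Tuple[float, float]:
    """Fit S(t) = s_final + A * exp(-k * t) and return (A, k).

    Uses linear least squares on ln|S - s_final| versus t, so it assumes
    a single exponential phase and a known final plateau s_final.
    """
    ys = [math.log(abs(s - s_final)) for s in signals]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(ys) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    sign = 1.0 if signals[0] > s_final else -1.0
    return sign * math.exp(intercept), -slope


def burst_fraction(times: Sequence[float], signals: Sequence[float],
                   s_unfolded: float, s_final: float) -> float:
    """Fraction of the total signal change that is lost in the dead time."""
    amplitude, _ = fit_first_order(times, signals, s_final)
    s_at_zero = s_final + amplitude  # fitted curve extrapolated to t = 0
    return (s_unfolded - s_at_zero) / (s_unfolded - s_final)


if __name__ == "__main__":
    # Synthetic trace: first point 5 ms after mixing (the assumed dead time).
    ts = [0.005 * (i + 1) for i in range(20)]
    ss = [0.6 * math.exp(-50.0 * t) for t in ts]
    print(burst_fraction(ts, ss, s_unfolded=1.0, s_final=0.0))
```

A burst fraction well above zero means part of the signal change finished before the instrument could see it.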
12 Two hypotheses for early folding
The Molten Globule hypothesis: early folding is
dominated by the hydrophobic effect. Total
hydrophobic surface area decreases rapidly, within
the dead time. The early structure looks like a
globule, with hydrophobic sidechains in the
center, like a micelle.
The Framework Model, or Nucleation/Condensation
model: early folding is dominated by local
interactions. Backbone angles are never truly
randomized, even in the unfolded state. Helices
and turns form quickly, then structure propagates
or condenses around them.
13 Folding kinetics
Transition state theory. A reaction coordinate
for folding? Only 2 states are observed: unfolded (U)
and folded (F). No stable intermediates are found.
2 states imply a barrier. What is the reaction
coordinate?
[Figure: energy diagram with U and F separated by a barrier.]
14 Is there a part of the chain that folds early?
(before the transition state?)
Alan Fersht's bright idea: mutant kinetics.
What happens to the folding rate when one residue
is mutated to an alanine? (Assume the structure
stays the same.)
[Figure: two U-to-F energy diagrams, labeled "If the folding rate changes" and "If the folding rate doesn't change".]
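15 Φ-values
Φ = 1: the mutation affects only the early stages of
folding (before the transition state).
Φ = 0: the mutation affects only the late stages
of folding (after the transition state).
[Figure: U-to-F energy diagrams for Φ = 1 and Φ = 0.]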
16 Φ-value analysis
Sidechains are mutated to Ala. Folding rate and
stability of mutant are determined.
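The two measurements combine into a phi-value (Φ). The sketch below uses the standard definition, which the slides do not spell out: Φ is the change in activation free energy of folding divided by the change in stability. The function name, units (kcal/mol) and example numbers are assumptions.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def phi_value(k_fold_wt: float, k_fold_mut: float,
              dg_unfold_wt: float, dg_unfold_mut: float,
              temperature: float = 298.0) -> float:
    """Phi = ddG(transition state) / ddG(equilibrium) for one mutant.

    k_fold_*   : folding rate constants of wild type and mutant
    dg_unfold_*: stabilities (free energy of unfolding, kcal/mol)
    """
    # Change in the folding barrier, from the ratio of folding rates.
    ddg_ts = R_KCAL * temperature * math.log(k_fold_wt / k_fold_mut)
    # Change in stability of the folded state relative to the unfolded state.
    ddg_eq = dg_unfold_wt - dg_unfold_mut
    if abs(ddg_eq) < 1e-9:
        raise ValueError("mutation does not change stability; phi is undefined")
    return ddg_ts / ddg_eq


if __name__ == "__main__":
    # Hypothetical mutant that loses 1 kcal/mol of stability.
    print(phi_value(100.0, 100.0, 5.0, 4.0))  # folding rate unchanged
    slowed = 100.0 * math.exp(-1.0 / (R_KCAL * 298.0))
    print(phi_value(100.0, slowed, 5.0, 4.0))  # barrier raised by the full 1 kcal/mol
```

An unchanged folding rate gives a value of 0, and a barrier raised by the full loss of stability gives a value near 1.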
The regions of the molecule that have high Φ fold
early. Download and display 1CSK.pdb (chain A
only), an SH3 domain.
17 SH3 domain
restrict A
display -> backbone
select 30-36A or 50-58A
display -> sticks
color cpk
18 Early folding region of the SH3 domain
19 Do folding initiation sites exist?
Phi-value analysis suggests that early folding
regions exist. Are there sequence motifs for the
early folding regions? If so, can we predict the
fold from the sequence?
20 Using the database to look for short sequences
that fold autonomously...
Clustering in sequence/structure space:
each dot represents a short segment of a protein.
The distance between dots combines sequence
difference and structure difference.
21 Clusters in sequence/structure space represent
context-independent sequence/structure
correlations
HDFPIEGGDSPMQTIFFWSNANAKLSHGY
CPYDNIWMQTIFFNQSAAVYSVLHLIFLT
IDMNPQGSIEMQTIFFGYA
ESAELSPVVNFLEEMQTIFFISGFTQTANSD
INWGSMQTIFFEEWQLMNVMDKIPSIFNESKKKGIAMQTIFFILSGR
PPPMQTIFFVIVNYNESKHALWCSVD
PWMWNLMQTIFFISQQVIEIPS
MQTIFFVFSHDEQMKLKGLKGA
Each cluster represents a set of similar
sequences from globally dissimilar structures.
22 Refining the scoring matrix
Each cluster is refined in a cycle:
1. Take the profile of the cluster.
2. Search the database (PDB, GenBank) for the
400 nearest neighbors.
3. Superimpose the structures and remove all cluster
members that do not conform with the predominant
structure (the paradigm).
4. Average the profile of the remaining cluster members.
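The filtering and averaging steps might look like the following minimal sketch. The deviation values, the cutoff, the example segments and both function names are hypothetical; the slides do not say how conformity with the paradigm is measured.

```python
from collections import Counter
from typing import Dict, List, Tuple

Member = Tuple[str, float]  # (segment sequence, deviation from the paradigm)


def refine_cluster(members: List[Member], cutoff: float) -> List[str]:
    """Keep only segments whose structure is within cutoff of the paradigm."""
    return [seq for seq, deviation in members if deviation <= cutoff]


def average_profile(segments: List[str]) -> List[Dict[str, float]]:
    """Per-position amino acid frequencies over equal-length segments."""
    if not segments:
        raise ValueError("no segments left in the cluster")
    length = len(segments[0])
    if any(len(seg) != length for seg in segments):
        raise ValueError("all segments must have the same length")
    profile = []
    for pos in range(length):
        counts = Counter(seg[pos] for seg in segments)
        profile.append({aa: n / len(segments) for aa, n in counts.items()})
    return profile


if __name__ == "__main__":
    # Made-up cluster: the third member deviates too far and is removed.
    cluster = [("MQTIFF", 0.5), ("MQTLFF", 0.8), ("MKTIFY", 3.0)]
    kept = refine_cluster(cluster, cutoff=1.5)
    for pos, column in enumerate(average_profile(kept)):
        print(pos, column)
```

The resulting profile would then seed the next database search, so the cluster and its profile sharpen together over several rounds.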
23 I-sites Library: sequence/structure motifs
after doing this exhaustively...
Proline helix C-cap
(Bystroff & Baker, JMB, 1998)
24 Previously known local structure motifs...
amphipathic β-strand
amphipathic α-helix
β-bulge
α-helix N-cap
nDnn
pnppn
nSEnp
nn
p = polar, n = non-polar
25 New local structure motifs
I-sites contains 262 total sequence patterns for
31 structural motifs.
diverging type-2 turn
Frayed helix
Type-I hairpin
Serine hairpin
glycine helix N-cap
alpha-alpha corner
Proline helix C-cap
26 I-sites sequence patterns are distinct
conserved polar or not conserved
conserved non-polar
conserved L
L
27 Structural features are conserved.
1. glycine at strained angles
2. conserved sidechain contacts
3. negative design against alternative structures
(helix)
[Figure axes: φ, ψ]
28 NMR structure of a 7-residue I-sites motif in
isolation
diverging turn
(Yi et al, J. Mol. Biol, 1998)
29 Predicting local structure using motifs
1. Start with a multiple sequence alignment.
2. Condense it to a profile (a).
3. Compare it to the motif profile (b).
4. The confidence is P(prediction is correct | x),
pre-calculated from an independent test set.
[Plot: P versus x]
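One way to pre-calculate such a confidence is to bin the test-set predictions by score and record the fraction that were correct in each bin. This is a sketch under that assumption; the slides do not describe the actual procedure, and the bin edges, scores and function names below are hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple


def calibrate(test_set: Sequence[Tuple[float, bool]],
              bin_edges: Sequence[float]) -> List[float]:
    """Fraction of correct predictions in each score bin.

    bin_edges must be sorted; they split the score axis into
    len(bin_edges) + 1 bins. Empty bins are reported as NaN.
    """
    n_bins = len(bin_edges) + 1
    hits = [0] * n_bins
    totals = [0] * n_bins
    for score, correct in test_set:
        b = bisect_right(bin_edges, score)
        totals[b] += 1
        hits[b] += int(correct)
    return [h / t if t else float("nan") for h, t in zip(hits, totals)]


def confidence(score: float, bin_edges: Sequence[float],
               table: Sequence[float]) -> float:
    """Look up P(prediction is correct | score) in the calibrated table."""
    return table[bisect_right(bin_edges, score)]


if __name__ == "__main__":
    # Made-up independent test set of (score, was the prediction correct?).
    tests = [(0.1, False), (0.2, False), (0.3, True), (0.6, True),
             (0.7, False), (0.8, True), (1.2, True), (1.5, True)]
    edges = [0.5, 1.0]
    table = calibrate(tests, edges)
    print(table)
    print(confidence(0.65, edges, table))
```

Because the table comes from a test set that was not used to build the motifs, the looked-up value can be read as an honest estimate for a new prediction with the same score.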
30 Prediction of tertiary structure by Monte Carlo
simulation
A full simulation of folding is too
difficult, but by restricting the
conformational search for each segment to one of
a few possible choices (moves), we can sample
conformational space.
Try my web site! isites.bio.rpi.edu (click on
Servers, then HMMSTR). Upload a PDB file (specify
chain and select PDB format), or paste a sequence
from NCBI or RCSB. Unselect "Use PSI-BLAST" (for
quicker results).
31 Rosetta ab initio
Protocol used for CAFASP2 experiment (2000):
sequence
(1) Find homologues in the database (PSI-BLAST)
(2) Predict fragments (I-SITES)
(3) Assemble fragments (ROSETTA, D. Baker)
structure
32 Rosetta ab initio
Energy function: pairwise secondary structure
contact types. Search function: Monte Carlo
fragment insertion. One move consists of
selecting a fragment at random from a set of
local structure predictions. Coordinates are
re-generated after swapping in the new fragment.
The move is accepted or rejected, based on the
energy function.
(Simons et al, PNAS, 1997)
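The move cycle can be illustrated with a toy sketch. It is not the Rosetta implementation: the conformation is just a string of H/E/L states, the energy function is an invented stand-in, coordinate regeneration is omitted, and the Metropolis acceptance rule is an assumption, since the slide only says the move is accepted or rejected based on the energy function.

```python
import math
import random
from typing import Callable, Dict, List, Tuple


def toy_energy(conf: List[str]) -> float:
    """Invented energy: each adjacent pair of equal H or E states scores -1."""
    return -float(sum(1 for a, b in zip(conf, conf[1:]) if a == b and a != "L"))


def fragment_insertion_mc(start: List[str],
                          fragments: Dict[int, List[str]],
                          energy: Callable[[List[str]], float],
                          n_moves: int,
                          temperature: float,
                          rng: random.Random) -> Tuple[List[str], float]:
    """Monte Carlo search where each move swaps in one predicted fragment.

    fragments maps a start position to candidate fragments for that window;
    every fragment must fit inside the chain.
    """
    current = list(start)
    e_current = energy(current)
    best, e_best = list(current), e_current
    positions = sorted(fragments)
    for _ in range(n_moves):
        pos = rng.choice(positions)
        frag = rng.choice(fragments[pos])
        trial = list(current)
        trial[pos:pos + len(frag)] = list(frag)  # swap in the new fragment
        e_trial = energy(trial)
        delta = e_trial - e_current
        # Metropolis rule (assumed): always accept downhill moves,
        # accept uphill moves with probability exp(-delta / T).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, e_current = trial, e_trial
            if e_current < e_best:
                best, e_best = list(current), e_current
    return best, e_best


if __name__ == "__main__":
    library = {0: ["HHH", "EEE", "LLL"], 3: ["HHH", "LLE", "EEE"],
               6: ["HHH", "EEL", "LLL"]}
    best, e_best = fragment_insertion_mc(["L"] * 9, library, toy_energy,
                                         n_moves=200, temperature=0.5,
                                         rng=random.Random(0))
    print("".join(best), e_best)
```

Restricting each window to a handful of candidate fragments is what keeps the search tractable: the chain can only visit combinations of locally plausible structures.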
33 CASP3 prediction results for Target 56, DNA
helicase
Predicted structure of 66-residue fragment
(23-88)
True structure of same fragment
34 CAFASP prediction results for Target 122, 1GEQ
Tryptophan Synthase
Predicted 97-residue fragment
True structure of same fragment