Title: The Anatomy and Taxonomy of Protein Structure
- First few lectures
- how do we look at protein structures?
- how do we classify and compare them?
- Today, a little about the protein backbone or
main chain.
Backbone geometry in proteins
[Figure: Ramachandran plot, φ (horizontal axis) vs ψ (vertical axis)]
The angle ω is almost always close to 180°: the peptide bond is planar and trans. φ and ψ may vary but are limited to certain combinations, as shown at right.
Yellow and blue delineate sterically allowed conformations. Red marks residues in helical secondary structure, cyan residues in β-sheet, and black all others. Squares indicate glycines.
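φ, ψ and ω are all torsion (dihedral) angles, each defined by four consecutive backbone atoms: φ by C(i-1), N(i), Cα(i), C(i); ψ by N(i), Cα(i), C(i), N(i+1); and ω by Cα(i), C(i), N(i+1), Cα(i+1). The sketch below computes a torsion angle from Cartesian coordinates using the common atan2 formulation; `dihedral` and its helpers are hypothetical names, not part of any particular package.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees (range -180 to 180) for atoms p0-p1-p2-p3.

    Positive when, looking down the p1->p2 bond, the front bond (p1-p0)
    must rotate clockwise to eclipse the back bond (p2-p3).
    """
    b0 = _sub(p0, p1)   # vector from p1 back to p0
    b1 = _sub(p2, p1)   # central bond p1 -> p2
    b2 = _sub(p3, p2)   # vector from p2 to p3
    n1 = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / n1, b1[1] / n1, b1[2] / n1)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    k0, k2 = _dot(b0, b1), _dot(b2, b1)
    v = (b0[0] - k0 * b1[0], b0[1] - k0 * b1[1], b0[2] - k0 * b1[2])
    w = (b2[0] - k2 * b1[0], b2[1] - k2 * b1[1], b2[2] - k2 * b1[2])
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))
```

For example, φ of residue i is `dihedral(C_prev, N, CA, C)` with coordinates taken from a structure file; a planar trans peptide gives an ω near ±180.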
Hydrogen bond geometry
- A hydrogen bond is not really a covalent bond: there is not much orbital overlap.
- Model it as an electrostatic interaction between two dipoles, one formed by the H-N bond and the other by the O sp2 lone pair. In electrostatic theory, the optimal orientation of two such dipoles is head-to-tail. The energy of such an arrangement should decrease as the head and tail are brought together, as long as atomic van der Waals radii are not violated (then repulsive forces quickly take over).
- In this model, the ideal hydrogen bond would have r = 3.0 Å, p = 180°, b = 0° and g = 60°. Convince yourself of this.
- In small-molecule crystals, this is approximately what is observed, though there is a lot of variation in the angles b and g. Thus the precise C=O...H angle parameters are not critical.
- Main chain-main chain hydrogen bonds found in proteins show various deviations from this geometry, partly due to the topological constraints imposed by forming secondary structures.
Criteria for identifying hydrogen bonds in protein structures
- What is a reasonable hydrogen bond? Criteria for identifying hydrogen bonds are somewhat arbitrary, and many have been used. Here are a couple of examples.
- Geometric criteria: often H-bonds are identified by just two parameters, the O...N (acceptor-donor) distance r and the O...H-N angle p. The angles describing the C=O...H geometry are ignored. Typical cutoffs are p > 120° and r below a distance cutoff in Å (Baker & Hubbard, 1984).
- Electrostatic criteria: one of the most commonly used criteria is a potential function based on a purely electrostatic model (Kabsch & Sander, 1983). Place partial charges $+q_1$ and $-q_1$ on the C and O atoms and $-q_2$ and $+q_2$ on the N and H atoms, and compute a binding energy as the sum of the repulsive and attractive interactions between these four atoms:
  $E = q_1 q_2 \left( \frac{1}{r(ON)} + \frac{1}{r(CH)} - \frac{1}{r(OH)} - \frac{1}{r(CN)} \right) f$
  where $q_1 = 0.42e$ and $q_2 = 0.20e$, $f$ is a dimensional factor (332) that converts $E$ to kcal/mol, and $r(AB)$ is the interatomic distance between atoms A and B (in Å).
- A hydrogen bond is then identified by a binding energy below some arbitrary cutoff, e.g. $E < -0.5$ kcal/mol.
- Note that the criteria defined above apply only when hydrogen atom positions are available. Crystal structures usually do not include hydrogens; however, their positions can be computed in many cases.
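A minimal sketch of both criteria, assuming coordinates in Å as (x, y, z) tuples and hydrogen positions already known. The Kabsch & Sander constants are the ones quoted above; the distance cutoff for the geometric test is left as a required argument because no value is assumed here. All function names are hypothetical.

```python
import math

# Electrostatic model of Kabsch & Sander (1983), constants as quoted above.
Q1 = 0.42   # +Q1 on C, -Q1 on O (units of e)
Q2 = 0.20   # -Q2 on N, +Q2 on H (units of e)
F = 332.0   # converts e^2/Angstrom to kcal/mol

def kabsch_sander_energy(c, o, n, h):
    """Electrostatic energy (kcal/mol) between a C=O group and an N-H group."""
    return Q1 * Q2 * F * (1 / math.dist(o, n) + 1 / math.dist(c, h)
                          - 1 / math.dist(o, h) - 1 / math.dist(c, n))

def is_hbond_energy(c, o, n, h, cutoff=-0.5):
    """H-bond if the energy falls below the cutoff (kcal/mol)."""
    return kabsch_sander_energy(c, o, n, h) < cutoff

def is_hbond_geometric(o, n, h, r_max, min_angle=120.0):
    """Two-parameter test: O...N distance below r_max and O...H-N angle above min_angle."""
    if math.dist(o, n) >= r_max:
        return False
    hn = [a - b for a, b in zip(n, h)]   # vector H -> N
    ho = [a - b for a, b in zip(o, h)]   # vector H -> O
    cos_p = sum(x * y for x, y in zip(hn, ho)) / (math.hypot(*hn) * math.hypot(*ho))
    angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_p))))
    return angle > min_angle
```

For a collinear N-H...O=C arrangement with N...O = 2.9 Å, the energy function returns roughly -2.9 kcal/mol, well below the -0.5 kcal/mol cutoff. Requires Python 3.8+ for `math.dist` and multi-argument `math.hypot`.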
Secondary Structure Identification
- Next week we'll learn about predicting the locations of secondary structures along the amino acid sequence of a protein from the sequence information alone. To evaluate whether such a prediction is correct, one has to be able to identify secondary structures from an experimentally determined set of protein coordinates, i.e. how do you define where a secondary structure element begins and ends?
- A trivial but difficult problem (Richardson, 1981).
- There is no single, correct algorithm for assigning secondary structure type.
- The most commonly used criteria are backbone conformation (φ, ψ) and hydrogen-bonding pattern.
- DSSP (Kabsch & Sander, 1983) and STRIDE (Frishman & Argos, 1995) are two of the more common programs, though there are many ways of defining secondary structure boundaries.
DSSP turn and helix definitions
- n-turn at residue i: the C=O of residue i is hydrogen bonded to the N-H of residue i+n.
- 3-turn: C=O(i) ... H-N(i+3)
- 4-turn: C=O(i) ... H-N(i+4)
- 5-turn: C=O(i) ... H-N(i+5), just an elaboration of the 3- and 4-turns.
- A minimal helix is two consecutive n-turns: for a minimal 4-helix from residue i to i+3, there must be 4-turns starting at both residue i-1 and residue i.
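A minimal sketch of the two-consecutive-turn rule, assuming H-bonds have already been identified (for example with the energy cutoff described earlier) and stored as (i, j) pairs meaning C=O(i) to N-H(j), with 0-based residue indices. It labels every helix type with "H"; real DSSP distinguishes 3-helices (G), 4-helices (H) and 5-helices (I) and applies further rules that are omitted here. `dssp_helix` is a hypothetical name.

```python
def dssp_helix(n_res, hbonds, n=4):
    """Mark minimal n-helices using the two-consecutive-turn rule.

    hbonds: set of (i, j) pairs, C=O of residue i bonded to N-H of residue j.
    An n-turn at i means (i, i + n) is in hbonds; turns at i-1 and i mark
    residues i .. i+n-1 as helical.
    """
    turn = [(i, i + n) in hbonds for i in range(n_res)]
    assignment = ["-"] * n_res
    for i in range(1, n_res):
        if turn[i - 1] and turn[i]:
            for k in range(i, min(i + n, n_res)):
                assignment[k] = "H"
    return "".join(assignment)
```

Note that the residue whose C=O starts the first turn is not itself labelled helical; a single isolated turn produces no helix at all.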
DSSP bridge, ladder and sheet definitions
Below, Hbond(i, j) means that the C=O of residue i is hydrogen bonded to the N-H of residue j.
- parallel bridge between residues i and j (labelled with lowercase letters, e.g. x): [Hbond(i-1, j) and Hbond(j, i+1)] or [Hbond(j-1, i) and Hbond(i, j+1)]
- antiparallel bridge (labelled with uppercase letters, e.g. X): [Hbond(i, j) and Hbond(j, i)] or [Hbond(i-1, j+1) and Hbond(j-1, i+1)]
- ladder: set of one or more consecutive bridges of identical type
- sheet: set of one or more ladders connected by shared residues
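The bridge definitions translate directly into code. This is a minimal sketch assuming H-bonds are stored as (i, j) pairs meaning C=O(i) to N-H(j); it does not check sequence separation between i and j or chain breaks, which a full implementation must handle. Function names are hypothetical.

```python
def hbond(hbonds, i, j):
    """True if the C=O of residue i is H-bonded to the N-H of residue j."""
    return (i, j) in hbonds

def bridge_type(hbonds, i, j):
    """Return 'parallel', 'antiparallel' or None for the residue pair (i, j)."""
    if ((hbond(hbonds, i - 1, j) and hbond(hbonds, j, i + 1)) or
            (hbond(hbonds, j - 1, i) and hbond(hbonds, i, j + 1))):
        return "parallel"
    if ((hbond(hbonds, i, j) and hbond(hbonds, j, i)) or
            (hbond(hbonds, i - 1, j + 1) and hbond(hbonds, j - 1, i + 1))):
        return "antiparallel"
    return None
```

Ladders would then be built by grouping bridges (i, j), (i+1, j+1) for the parallel case, or (i, j), (i+1, j-1) for the antiparallel case.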
STRIDE (Secondary STRucture IDEntification)
- Uses what is known as a knowledge-based potential: we as a community of scientists know intuitively how to define secondary structures, we just can't put our finger on it!
- So how do we quantify what we already know?
- Set of qualitative criteria: the most common criteria used by crystallographers are backbone conformation and hydrogen bonding.
- standard of truth: the collective wisdom of crystallographers, i.e. the secondary structure assignments made by crystallographers when they submitted structures to the Protein Data Bank.
- STRIDE builds potential energy functions for H-bonding and backbone conformation but leaves floating parameters, which are adjusted to best reproduce the crystallographers' assignments.
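The idea of adjusting a floating parameter to reproduce reference assignments can be illustrated with a toy: pick the single cutoff on a per-residue score that best agrees with reference labels. STRIDE's actual energy terms and optimisation are far more elaborate; the scores, labels, candidate cutoffs and the name `fit_cutoff` are all hypothetical.

```python
def fit_cutoff(scores, reference, candidates):
    """Choose the cutoff whose assignments best match the reference labels.

    scores: one number per residue (lower = more structure-like; hypothetical)
    reference: one bool per residue (True = assigned by the reference)
    candidates: cutoff values to try
    Returns (best_cutoff, fraction of residues in agreement).
    """
    best_cutoff, best_agreement = None, -1.0
    for cutoff in candidates:
        predicted = [s < cutoff for s in scores]
        agreement = sum(p == r for p, r in zip(predicted, reference)) / len(reference)
        if agreement > best_agreement:
            best_cutoff, best_agreement = cutoff, agreement
    return best_cutoff, best_agreement
```

With several parameters the same principle applies, but an exhaustive grid quickly becomes expensive and a proper optimiser is needed.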
Boundaries of a helix
[Figure: φ/ψ plot of residues at the ends of a helix; labelled points include residues 2, 10, 11 and 12]
Is residue 10 in the helix? How about 11? How about 2?
Side chain conformation
Side chains differ in their number of degrees of conformational freedom (some, like Ala, don't have any), but side chains of very different size can have the same number of χ angles (e.g. Asp and Trp both have two).
Names of canonical side chain conformations
[Figure: canonical side chain conformations, each labelled with the name of the conformation]
t = trans, g = gauche
IUPAC nomenclature: http://www.chem.qmw.ac.uk/iupac/misc/biop.html
Rotamers
- a particular combination of angles χ1, χ2, etc. for a particular residue is known as a rotamer.
- for example, for aspartate, if one considers only the canonical staggered forms, there are nine (3²) possible rotamers: g+g-, g+g+, g-g-, g-g+, tg+, g+t, tg-, g-t, tt.
- not all rotamers are equally likely.
- for example, valine prefers its t rotamer.
[Figure: distribution of valine χ1 values in protein structures, axis from 0° to 360° (from Ponder & Richards, 1987)]
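Counting canonical rotamers is a Cartesian product: each χ angle takes one of three staggered states, so a side chain with n χ angles has 3ⁿ combinations under the simplification used above. A minimal sketch; `canonical_rotamers` is a hypothetical name.

```python
from itertools import product

# The three canonical staggered states for each chi angle.
STAGGERED = ("g+", "g-", "t")

def canonical_rotamers(n_chi):
    """All combinations of staggered states for a side chain with n_chi angles."""
    return ["".join(states) for states in product(STAGGERED, repeat=n_chi)]
```

`canonical_rotamers(2)` yields the nine aspartate combinations listed above; Lys, with four χ angles, already gives 81, which hints at why rotamer libraries must be chosen carefully.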
Rotamer libraries
- one of the problems in designing and modelling/predicting protein structures is how to construct a group of rotamers that represents the side chain conformations observed in proteins without using so many that the problem becomes computationally intractable.
- such groups of rotamers are known as rotamer libraries (Ponder & Richards, 1987).
- the probability of finding a particular rotamer is affected by the backbone angles (φ, ψ) of that residue. For instance, the g+ conformation is very rarely found in a helix. Thus, backbone-dependent rotamer libraries are also sometimes used.
- We'll delve into this in more depth in about a week, when we do homology modelling.
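A backbone-dependent library can be pictured as a table keyed by the (φ, ψ) bin of a residue. The sketch below shows only the lookup step, assuming 10° bins (an assumption, not a property of any specific library) and a caller-supplied table; no real library data is included, and the function names are hypothetical.

```python
import math

def phi_psi_bin(phi, psi, width=10.0):
    """Lower edges of the (phi, psi) bin containing the angles (degrees)."""
    def edge(angle):
        wrapped = (angle + 180.0) % 360.0 - 180.0   # map into [-180, 180)
        return math.floor(wrapped / width) * width
    return edge(phi), edge(psi)

def rotamer_probabilities(library, phi, psi, width=10.0):
    """Look up rotamer probabilities for the residue's backbone bin.

    library: dict mapping (phi_bin, psi_bin) -> {rotamer_name: probability};
    its contents must come from a real rotamer library.
    """
    return library.get(phi_psi_bin(phi, psi, width), {})
```

A backbone-independent library is the special case where every bin maps to the same distribution.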
Side chain rotamers are not limited to the canonical staggered forms: there are many subtly different rotamers.
[Figure from Xiang & Honig, 2001]
An x-degree rotamer in this figure means that at least one side chain angle differs by x degrees.
Surface and interior of proteins
- do proteins have a lot of holes/empty space inside?
- how much of a protein's molecular surface is in contact with the surrounding solvent (water, in the case of globular, soluble proteins)?
- are certain residues more likely to be in contact with solvent than others?
Calculating Solvent Accessible Surface Area
- Lee & Richards, 1971; Shrake & Rupley, 1973
- First, represent atoms as spheres with appropriate van der Waals radii.
- eliminate overlapping parts of spheres.
- This gives a space-filling model similar to the picture at right.
Now roll a sphere of a given radius all around the van der Waals surface. The sphere will not make contact with the entire van der Waals surface; its center will trace out a continuous surface as it rolls.
Now look at a cross-section. The inner surfaces here are van der Waals surfaces. The outer surface is the one traced out by the center of the sphere as it rolls around the van der Waals surface. If any part of the arc around a given atom is traced out, that atom is accessible to solvent. The solvent accessible surface of the atom is defined as the sum of the arcs traced around it; summed over a stack of parallel sections, these arc lengths give the accessible area.
[Figure (from Lee & Richards, 1971): cross-section showing the van der Waals surface, the solvent accessible surface and the arc traced around an atom; there's not much solvent accessible surface in the middle]
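A minimal sketch of the numerical "dot" method of Shrake & Rupley (1973), which approximates the same accessible surface by testing points on each atom's expanded sphere (van der Waals radius plus probe radius) instead of summing Lee & Richards arcs. Assumptions: a 1.4 Å probe (a common choice for water), radii supplied by the caller, and 960 roughly uniform test points per atom from a golden-section spiral; the function names are hypothetical.

```python
import math

def sphere_points(n):
    """Roughly uniform points on a unit sphere (golden-section spiral)."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    pts = []
    for k in range(n):
        z = 1.0 - 2.0 * (k + 0.5) / n
        rad = math.sqrt(1.0 - z * z)
        theta = golden * k
        pts.append((rad * math.cos(theta), rad * math.sin(theta), z))
    return pts

def shrake_rupley(coords, radii, probe=1.4, n_points=960):
    """Per-atom solvent accessible area (Angstrom^2)."""
    unit = sphere_points(n_points)
    expanded = [r + probe for r in radii]   # radius traced by the probe center
    areas = []
    for i, (ci, Ri) in enumerate(zip(coords, expanded)):
        # Only atoms whose expanded spheres overlap atom i can occlude it.
        neighbours = [(cj, Rj) for j, (cj, Rj) in enumerate(zip(coords, expanded))
                      if j != i and math.dist(ci, cj) < Ri + Rj]
        exposed = 0
        for u in unit:
            p = (ci[0] + Ri * u[0], ci[1] + Ri * u[1], ci[2] + Ri * u[2])
            if all(math.dist(p, cj) >= Rj for cj, Rj in neighbours):
                exposed += 1
        areas.append(4.0 * math.pi * Ri * Ri * exposed / n_points)
    return areas
```

An isolated atom gets the full area of its expanded sphere; atoms deep in the interior get zero. Accuracy improves with more test points at the cost of run time.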
Fractional accessibility
- calculate the total solvent accessible surface of the protein structure (one can also calculate the solvent accessible surface for individual residues/side chains within the protein)
- one can also model the accessible surface area in an unfolded protein using accessible surface area calculations on model tripeptides such as Ala-X-Ala or Gly-X-Gly.
- from these we can calculate what fraction of the surface is buried (inaccessible to solvent) by virtue of being within the folded, native structure of the protein.
- this is done by dividing the accessible surface area in the native protein structure by the accessible surface area in the modelled unfolded protein. That ratio is the fractional accessibility, and the fraction buried is one minus it. The residue fractional accessibility and side chain fractional accessibility refer to the same quantity calculated for individual residues/side chains within the structure.
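The ratio described above is simple arithmetic once both areas are known. A minimal sketch; the reference (unfolded-model) areas must come from real Ala-X-Ala or Gly-X-Gly calculations, none are assumed here, and the function names and the default 5% burial cutoff are illustrative choices.

```python
def fractional_accessibility(native_area, reference_area):
    """Accessible area in the folded structure / area in the unfolded model."""
    if reference_area <= 0:
        raise ValueError("reference area must be positive")
    return native_area / reference_area

def fraction_buried(native_area, reference_area):
    """Fraction of the unfolded-state surface buried by folding."""
    return 1.0 - fractional_accessibility(native_area, reference_area)

def is_buried(frac_accessibility, cutoff=0.05):
    """Classify a residue as buried under an accessibility cutoff (e.g. 0 or 0.05)."""
    return frac_accessibility <= cutoff
```

For example, a side chain with 30 Å² exposed in the native structure and 120 Å² in its tripeptide model has a fractional accessibility of 0.25.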
Accessible surface area in protein structures
- the accessible surface area $A_s$ in the native states of proteins is a non-linear function of molecular weight (Miller, Janin, Lesk & Chothia, 1987):
  $A_s = 6.3\,M_r^{0.73}$
  where $M_r$ is the molecular weight.
This is an empirical correlation, but it comes close to the expected two-thirds power law relating surface area to volume or mass. Why is the exponent a little larger?
How much surface area is buried when a protein folds?
- estimate the accessible surface area in unfolded proteins using the accessible surface areas in Gly-X-Gly or Ala-X-Ala models. This is a linear function of molecular weight:
  $A_t = 1.48\,M_r + 21$
- the total fractional accessibility is $A_s/A_t$, and the fraction of surface area buried is $1 - A_s/A_t$.
- what fraction of surface area is typically buried for a protein of molecular weight 5,000 daltons? 30,000 daltons?
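The question above can be answered numerically by combining the two empirical fits, $A_s = 6.3\,M_r^{0.73}$ for the native state and $A_t = 1.48\,M_r + 21$ for the unfolded model. A minimal sketch; the function names are hypothetical, and the fits are only as good as the data set they came from.

```python
def native_area(mr):
    """Empirical native-state accessible area A_s = 6.3 * Mr**0.73."""
    return 6.3 * mr ** 0.73

def unfolded_area(mr):
    """Linear unfolded-state estimate A_t = 1.48 * Mr + 21."""
    return 1.48 * mr + 21.0

def buried_fraction(mr):
    """Fraction of surface buried on folding: 1 - A_s / A_t."""
    return 1.0 - native_area(mr) / unfolded_area(mr)

for mr in (5000, 30000):
    print(mr, round(buried_fraction(mr), 2))
```

Because $A_s$ grows more slowly than $A_t$, the buried fraction increases with molecular weight.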
Distribution of residue fractional accessibilities
- note that a sizable group of residues are completely buried (hatched) or nearly completely buried.
- note the broad distribution among non-buried residues, and a mean accessibility for non-buried residues of around 0.5.
- note that few residues are completely exposed to solvent, but that a fractional accessibility of 1 is possible.
(from Miller et al., 1987)
Buried residues in proteins
- the fraction of buried residues (defined by 0% or 5% ASA cutoffs) increases as a function of molecular weight.
- for your average protein, around 25% of the residues will be buried. These form the core.

size class   mean Mr   fraction buried (0% ASA)   fraction buried (5% ASA)
small          8000    0.070                      0.154
medium        16000    0.107                      0.240
large         25000    0.139                      0.309
XL            34000    0.155                      0.324
all               -    0.118                      0.257
Core of 434 Cro
[Figure: core residues of 434 Cro, defined with an 8% accessibility cutoff]
Residue fractional accessibility correlates with free energies of transfer for amino acids between water and organic solvents
- (Miller, Janin, Lesk & Chothia, 1987)
- (Fauchere & Pliska, 1983)
- the interior of a protein is akin to a nonpolar solvent in which the nonpolar side chains are buried. Polar side chains, on the other hand, are usually on the surface.
Hydropathy scales
- the correlation between a residue being polar or nonpolar and its tendency to be buried is a sequence-structure relationship; a number of such relationships can be seen from examining protein structures. As we will see next week, such relationships are useful in trying to predict protein structure from amino acid sequence.
- many scientists have tried to develop hydrophobicity or hydropathy scales to quantify the tendency of residues to be buried. Most such scales are based on partitioning of the amino acid between water and some nonpolar solvent, or between the surface and interior of proteins.
Kyte-Doolittle Hydropathy
[Figure: Kyte-Doolittle hydropathy values, with residues grouped as nonpolar, "on the bubble" (borderline) and polar/charged]
(Kyte & Doolittle, 1981)
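Hydropathy scales are typically applied to a sequence by averaging the values over a sliding window, so that stretches of mostly nonpolar residues stand out as peaks. A minimal sketch using the Kyte & Doolittle values; the default window of 9 residues is an arbitrary assumption (the appropriate width depends on what one is looking for), and `hydropathy_profile` is a hypothetical name.

```python
KYTE_DOOLITTLE = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5,
    "M": 1.9, "A": 1.8, "G": -0.4, "T": -0.7, "S": -0.8,
    "W": -0.9, "Y": -1.3, "P": -1.6, "H": -3.2, "E": -3.5,
    "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}

def hydropathy_profile(sequence, window=9):
    """Mean Kyte-Doolittle value over each complete window along the sequence."""
    values = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]
```

Each window average is usually plotted at the window's central residue; positive values indicate hydrophobic stretches.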
Buried polar residues in proteins
- while most of the protein interior is made up of nonpolar side chains, the average protein will have a few buried polar residues, even ones capable of carrying a formal charge, e.g. Lys, Arg, Glu, Asp.
- charged residues are almost always paired with other charged residues to make salt bridges, or hydrogen bonded to other polar groups.
- in general, a key rule of protein structure anatomy is that you rarely see buried hydrogen bond donors/acceptors that are not paired with acceptors/donors.
[Figure: a buried salt bridge and a hydrogen bond to the main chain; labelled residues Arg10, Glu35 and Arg5]
Cavities in proteins
- protein interiors generally have high packing densities, so not much void space is present.
- nonetheless, proteins do sometimes have interior cavities big enough to fit water molecules.