Title: Investigating Lattice Structure for Inverse Protein Folding
C.R. Mead (1), J. Manuch (2), X. Huang (2),
B. Bhattacharyya (2), L. Stacho (3), A. Gupta (2)
Email: cmead_at_bcgsc.ca
(1) Canada's Michael Smith Genome Sciences Centre,
BC Cancer Research Centre, Vancouver, BC, Canada
(2) School of Computing Science, Simon Fraser
University, Burnaby, BC, Canada
(3) Department of Mathematics, Simon Fraser
University, Burnaby, BC, Canada
1. Abstract
Inverse Protein Folding (IPF) has the potential
to significantly impact future drug design by
providing computational tools that aid in the
development of novel proteins with specific
structural properties. In its most basic form,
IPF is a method of determining an amino acid
sequence which takes on a prescribed structure
within a specified (natural) environment. Because
IPF is known to be computationally complex, the
hydrophobic-polar (HP) model proposed by Dill [1]
is often used to simplify the problem. This model
represents each residue as either hydrophobic or
polar, and the prescribed structure is
approximated by attempting to maximize
hydrophobic residue interactions. Each amino acid
is treated as an individual unit that is placed
at a single lattice point of a regular lattice
structure. The choice of lattice plays a major
role within this framework. Our previous research
investigated the development of stable proteins
in a 2D environment using the HP model [2], and
we are now focusing on the study of plausible 3D
lattice structures. We investigate attributes of
lattices which make them more amenable to
representation of known protein structures,
identify lattices containing these attributes,
and compare lattices using various metrics. Our
investigations incorporate statistical and
computational analyses of a large fraction of
proteins from the Protein Data Bank to show that
lattices which are regular, periodic,
equilateral, and distance preserving, and which
contain angles of 90° and 120°, are most amenable
to representation of known proteins. This
research represents a first step in the
development of a successful IPF methodology.
4. Results
We first determine computationally whether the
distances between consecutive amino acids are
consistent, which tells us whether the lattice
edges should have a uniform length.
Finally, we determine how regular the IPF lattice
should be by investigating whether randomness
within a lattice affects how well it represents
the PDB protein subset. We compare periodic
lattices, randomized periodic lattices, and de
novo random lattices. Two of the most commonly
used lattices in protein modeling are the simple
cubic and face-centered cubic (FCC) lattices;
these are our periodic lattices. From these we
construct randomized lattices by shifting each
vertex by a random value drawn from the normal
(0, 0.0025) distribution. De novo random lattices
are generated such that their average degree and
edge length match those of the simple cubic and
FCC lattices.
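The construction of the periodic and randomized lattices can be sketched as follows. This is a minimal illustration, not the code used for the poster: the function names, the unit lattice spacing, the applying of the shift independently to each coordinate, and the reading of 0.0025 as a variance (standard deviation 0.05) are all assumptions.

```python
import itertools
import random


def simple_cubic(n):
    """Vertices of an n x n x n simple cubic lattice with unit spacing."""
    return [tuple(float(c) for c in p)
            for p in itertools.product(range(n), repeat=3)]


def fcc(n):
    """FCC vertices: integer points whose coordinate sum is even."""
    return [tuple(float(c) for c in p)
            for p in itertools.product(range(n), repeat=3)
            if sum(p) % 2 == 0]


def randomize(vertices, variance=0.0025, seed=0):
    """Shift every coordinate by a draw from a zero-mean normal distribution.

    Assumption: 0.0025 is interpreted as the variance of the distribution.
    """
    rng = random.Random(seed)
    sigma = variance ** 0.5
    return [tuple(c + rng.gauss(0.0, sigma) for c in v) for v in vertices]


periodic = simple_cubic(3)
randomized = randomize(periodic)
```

A fixed seed keeps the randomized lattice reproducible, which makes it easier to compare the fit of the same protein subset across lattice variants.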
Figure 3: Distances between consecutive amino
acids (those connected by a peptide bond).
Note: distances are shown in Ångströms.
n = 1,285,943. Average distance = 3.80217 Å.
The 2.8-3.2 Å region is amplified 100x.
Distances between consecutive amino acids show a
bimodal distribution with a sharp, strong peak at
3.8 Å and a wide, weak peak at 2.95 Å. The 2.95 Å
peak is considered negligible. The ideal IPF
lattice should therefore have consistent edge
lengths of 3.8 Å.
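The consecutive-distance measurement can be sketched as below. This is only an illustration under stated assumptions: the function names are hypothetical, the coordinates are invented for the example rather than read from a PDB file, and the 0.1 Å tolerance is an arbitrary choice.

```python
import math


def consecutive_distances(ca_coords):
    """Distances between successive C-alpha positions along one chain."""
    return [math.dist(a, b) for a, b in zip(ca_coords, ca_coords[1:])]


def fraction_within(distances, target=3.8, tolerance=0.1):
    """Share of distances within `tolerance` of `target` (in Angstroms)."""
    if not distances:
        return 0.0
    hits = sum(1 for d in distances if abs(d - target) <= tolerance)
    return hits / len(distances)


# Hypothetical coordinates, for illustration only.
chain = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (3.8, 3.8, 2.95)]
distances = consecutive_distances(chain)
share_near_peak = fraction_within(distances)
```

Pooling the distances from every chain in the protein subset and histogramming them would give a distribution of the kind shown in Figure 3.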
Next we consider distances between
non-consecutive amino acids to determine the
required distance between non-adjacent vertices.
2. Hydrophobic-Polar Model
Hydrophobic interactions are the dominant forces
involved in protein folding [1]. The HP model
represents amino acids as hydrophobic (H) or
polar (P) depending on their affinity to water.
Contact between two non-consecutive hydrophobic
monomers reduces the free energy of the resulting
molecule. In the HP model, a protein reaches its
ground state when the number of contacts between
non-consecutive hydrophobic monomers is
maximized. The HP model can be applied to any
lattice.
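The contact count that the HP model maximizes can be sketched for a 2D square lattice as follows. This is an illustrative sketch, not the authors' implementation: the function name and the example sequence and path are assumptions chosen to keep the example small.

```python
def hp_contacts(sequence, path):
    """Count contacts between non-consecutive H monomers on a 2D square lattice.

    sequence: string of 'H' and 'P', one character per monomer.
    path: list of (x, y) integer lattice points, one per monomer, in chain order.
    """
    if len(sequence) != len(path):
        raise ValueError("sequence and path must have the same length")
    if len(set(path)) != len(path):
        raise ValueError("path must not visit a lattice point twice")

    index_at = {point: i for i, point in enumerate(path)}
    contacts = 0
    for (x, y), i in index_at.items():
        if sequence[i] != "H":
            continue
        # Look only right and up so each neighbouring pair is counted once.
        for neighbour in ((x + 1, y), (x, y + 1)):
            j = index_at.get(neighbour)
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                contacts += 1
    return contacts


# Hypothetical four-monomer example: the chain folds into a unit square.
folded = hp_contacts("HPPH", [(0, 0), (1, 0), (1, 1), (0, 1)])
extended = hp_contacts("HPPH", [(0, 0), (1, 0), (2, 0), (3, 0)])
```

Here the folded conformation brings the two end monomers into contact while the extended one does not, so the folded conformation has the lower energy under the model. The same counting applies to any lattice once the neighbour offsets are replaced by that lattice's edges.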
Figure 7: a) and d) represent the cubic and FCC
periodic lattices, respectively; b) and e)
represent the randomized lattices; and c) and f)
represent the de novo random lattices.
Figure 4: Distances between non-consecutive
amino acid pairs (pairs within the same protein
that are not connected by a peptide bond).
Note: distances are shown in Ångströms.
n = 164,657,341. Minimum length = 3.06805 Å.
The 3.0-4.0 Å region is amplified 100x.
Number of pairs with distances less than 3.8 Å:
1,999.
Non-consecutive distances below 3.8 Å are
negligible (1,999 of 164,657,341 pairs, roughly
0.001%). The ideal IPF lattice must therefore
have a minimum distance of 3.8 Å between
non-adjacent vertices.
We identified Cα angles, each formed by three
consecutive Cα atoms, using a B-factor threshold
of 25 Å². A total of 1,025,285 such angles were
used in this analysis.
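The angle at the central Cα of a trimer can be computed as sketched below. This is a minimal illustration with hypothetical function names and invented coordinates; the B-factor filtering step is not shown.

```python
import math


def ca_angle(a, b, c):
    """Angle in degrees at point b formed by the points a, b, c."""
    u = [a[i] - b[i] for i in range(3)]
    v = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(u, v))
    cosine = dot / (math.hypot(*u) * math.hypot(*v))
    # Clamp to guard against rounding slightly outside [-1, 1].
    cosine = max(-1.0, min(1.0, cosine))
    return math.degrees(math.acos(cosine))


def trimer_angles(ca_coords):
    """One angle per run of three consecutive C-alpha atoms."""
    return [ca_angle(a, b, c)
            for a, b, c in zip(ca_coords, ca_coords[1:], ca_coords[2:])]


# Hypothetical coordinates, for illustration only.
chain = [(3.8, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 3.8, 0.0), (0.0, 3.8, 3.8)]
angles = trimer_angles(chain)
```

Grouping the resulting angles by the identity of the central residue would give per-residue distributions of the kind described for Figure 5.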
Table 1: Degree and average coordinate root mean
squared deviation (c-RMS) for the different
lattice analyses. (FCC = face-centered cubic)
Figure 1: An example of using the HP model to
determine a stable configuration for a
22-amino-acid peptide within a 2D square lattice.
Polar monomers are yellow, hydrophobic monomers
are black, and squares with Xs indicate the ends
of the peptide.
Our PDB protein subset is aligned with each
lattice such that consecutive Cα atoms are placed
at lattice vertices connected by an edge, and no
vertex of the lattice is used more than once. The
fitness of the lattice is determined by measuring
the coordinate root mean square deviation (c-RMS)
from the original structure. The ideal IPF
lattice should be periodic in nature.
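The fitness measure and the placement constraints can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, the two coordinate sets are assumed to be already superimposed, and two vertices are treated as connected by an edge when they are exactly one edge length apart. The search for the best placement is not described in the text and is not shown.

```python
import math


def c_rms(original, fitted):
    """Coordinate root mean square deviation between matched point lists."""
    if len(original) != len(fitted) or not original:
        raise ValueError("need two non-empty lists of equal length")
    total = sum(math.dist(p, q) ** 2 for p, q in zip(original, fitted))
    return math.sqrt(total / len(original))


def is_valid_walk(fitted, edge_length, tolerance=1e-6):
    """True if no vertex is reused and consecutive vertices are one edge apart."""
    if len(set(fitted)) != len(fitted):
        return False
    return all(abs(math.dist(a, b) - edge_length) <= tolerance
               for a, b in zip(fitted, fitted[1:]))


# Hypothetical coordinates, for illustration only.
original = [(0.1, 0.0, 0.0), (3.7, 0.2, 0.0), (3.9, 3.8, 0.1)]
fitted = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)]
deviation = c_rms(original, fitted) if is_valid_walk(fitted, 3.8) else None
```

Averaging this deviation over every chain in the subset would give one figure per lattice, which is the kind of value Table 1 reports.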
We use the HP model to investigate inverse
protein folding. A given target shape is placed
within a lattice and computationally analyzed to
determine the most stable conformation. The
result is a first approximation of the target
protein.
5. Conclusions and Discussion
From our analysis we conclude that the ideal IPF
lattice should have:
- Uniform edge lengths of 3.8 Å.
- A minimum distance between any two vertices of
3.8 Å.
- Mainly 90° and 120° angles.
- A periodic structure.
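As a rough check of the angle criterion above, the angles available at a vertex of the two periodic lattices discussed in the results can be enumerated from their nearest-neighbour directions. This is a sketch, not part of the original analysis: the function name and the integer direction vectors are assumptions.

```python
import itertools
import math


def neighbour_angles(directions):
    """Sorted set of angles (whole degrees) between pairs of edge directions."""
    angles = set()
    for u, v in itertools.combinations(directions, 2):
        dot = sum(a * b for a, b in zip(u, v))
        cosine = dot / (math.hypot(*u) * math.hypot(*v))
        cosine = max(-1.0, min(1.0, cosine))
        angles.add(round(math.degrees(math.acos(cosine))))
    return sorted(angles)


# Six axis-aligned neighbours of a simple cubic lattice vertex.
CUBIC = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

# Twelve FCC neighbours: exactly two non-zero coordinates, each +1 or -1.
FCC = [d for d in itertools.product((-1, 0, 1), repeat=3)
       if sum(abs(c) for c in d) == 2]

cubic_angles = neighbour_angles(CUBIC)  # [90, 180]
fcc_angles = neighbour_angles(FCC)      # [60, 90, 120, 180]
```

The simple cubic lattice offers 90° but not 120°, while the FCC lattice offers both. Scaling the direction vectors so that each edge measures 3.8 Å does not change these angles.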
Figure 2: An example of IPF using a 2D square
lattice. The target shape is shown in green.
Polar monomers are yellow, hydrophobic monomers
are black, and squares with Xs indicate the ends
of the peptide.
Figure 5: Cα angle distributions for trimers of
amino acids. Red vertical lines mark 90° and
120°. a) All angles. b) Angles with a central
proline residue. c) Angles with a central
phenylalanine residue. d) Angles with a central
aspartic acid residue.
Overall, there is a bimodal distribution with one
sharp, strong peak at 90° and a wider peak at
120°. More specifically:
- Trimers with a central proline, valine, or
isoleucine residue have a stronger, sharper peak
at 120°.
- Trimers with a central phenylalanine,
tryptophan, or tyrosine residue have a broader,
more even distribution of angles greater than
90°.
- Trimers with a central aspartic acid,
histidine, methionine, or serine residue show the
possibility of a third peak at 140°.
The ideal IPF lattice should contain 90° and 120°
angles.
6. References
[1] Dill, K. A. Theory for the folding and
stability of globular proteins. Biochemistry
1985, 24(6), 1501-1509.
[2] Gupta, A., Manuch, J., Stacho, L. Inverse
protein folding in 2D HP model. Proc. of IEEE
Computational Systems Bioinformatics Conference
(CSB04), 2004, 311-318.
[3] Karplus, P. A. Experimentally observed
conformation-dependent geometry and hidden strain
in proteins. Protein Science 1996, 5(7),
1406-1420.
[4] Berman, H. M., Westbrook, J., Feng, Z.,
Gilliland, G., Bhat, T. N., Weissig, H.,
Shindyalov, I. N., Bourne, P. E. The Protein Data
Bank. Nucleic Acids Res. 2000, 28(1), 235-242.
3. Finding the Ideal 3D lattice
Our goal is to computationally design novel
stable 3D proteins which satisfy specific
topological requirements and can be constructed
and verified. The HP model gives an initial
approximation of the target protein. To use the
HP model to build proteins that could occur in a
natural aqueous environment, we require a 3D
lattice that represents known protein structures
well.
We addressed the following questions:
- Should all edges of the lattice be identical in
length?
- How should non-adjacent lattice points behave?
- What angles should the lattice have?
- How regular should the lattice be?
To investigate these, we chose a subset of
protein structure files from the April 13, 2004
version of the PDB that met the following
criteria [3, 4]:
- X-ray diffraction
- Resolution < 1.75 Å
- R factor < 20
- Peptide chains > 2 amino acids in length
This resulted in a subset of 3704 PDB files. We
place amino acids on lattice vertices with the Cα
atom representing the center of the amino acid.
Edges connecting vertices represent potential
peptide bond placement.
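The selection criteria above can be expressed as a simple filter. This is a sketch only: the record type, field names, method string, and example entries are hypothetical, the R factor is assumed to be given on the same scale as the threshold of 20, and an entry is assumed to qualify if at least one of its chains is longer than two residues.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entry:
    """Hypothetical summary of one structure file's metadata."""
    entry_id: str
    method: str
    resolution: float        # Angstroms
    r_factor: float          # assumed to be on the same scale as the threshold
    chain_lengths: List[int]


def passes_criteria(entry):
    """Apply the four selection criteria listed in the text."""
    return (entry.method == "X-RAY DIFFRACTION"
            and entry.resolution < 1.75
            and entry.r_factor < 20
            and any(length > 2 for length in entry.chain_lengths))


# Invented records, for illustration only.
entries = [
    Entry("example-a", "X-RAY DIFFRACTION", 1.50, 18.0, [120]),
    Entry("example-b", "SOLUTION NMR", 1.50, 18.0, [80]),
    Entry("example-c", "X-RAY DIFFRACTION", 2.10, 18.0, [200]),
    Entry("example-d", "X-RAY DIFFRACTION", 1.20, 23.5, [95]),
    Entry("example-e", "X-RAY DIFFRACTION", 1.20, 15.0, [2]),
]
subset = [e.entry_id for e in entries if passes_criteria(e)]
```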
7. Acknowledgments / Funding
Many thanks to Dr. Frederic Pio for his kind
advice.
Figure 6: An example of using a cubic lattice to
represent a protein (PDB protein model shown in
brown / red, cubic lattice approximation shown in
blue / green).