Title: Protein Design
1Protein Design
Crystal structure of top7 A novel protein
structure created with RosettaDesign.
CS273 Final Project Charles Kou charlesk_at_stanford
.edu
http//rosettadesign.med.unc.edu/
2What is Protein Design
- Opposite of structure prediction determine low
energy sequence that yield given structure. - Computationally difficult
- Search space of 20n where n sequence length
(20 amino acids) - Major algorithms Dead-end elimination, genetic
algorithms, Monte Carlo, Branch Bound.
http//www.stanford.edu/class/cs273/project/projec
t.html
3Major Algorithms
- Trade off between thoroughness and computational
speed. - Monte Carlo / Genetic Algorithm
- Can sample space with infinite number of
solutions - Sidechain identity, side chain orientation and
backbone structure can be varied continuously. - No guarantee of reaching global energy minimum.
- Dead-End Elimination
- Allows only discrete conformations.
- Rejection criteria is used to prune the search
space.
Desjarlais JR, Clarke ND. Computer search
algorithms in protein modification and design.
Curr Opin Struct Biol. 1998 Aug8(4)471-5.
PubMed
4Review Energy Landscape
JC Lantombe, Energy2.ppt
5Review Example Energy Function
- E S bonded terms S non-bonded terms
S solvation terms - E (ES EQ ES-B ETor) (EvdW Edipole)
- Bonded terms - Relatively few
- Non-bonded terms - Depend on distances between
pairs of atoms - O(n2) ? Expensive to compute - Solvation terms- May require computing molecular
surface
JC Lantombe, Energy2.ppt
6Review Monte Carlo Simulation (MCS)
- Random walk through conformation space
- At each cycle
- Perturb current conformation at random
- Accept step with probability
-
- (Metropolis acceptance criterion)
- The conformations generated by an arbitrarily
long MCS are Boltzman distributed, i.e.,
conformations in V
JC Lantombe, Energy2.ppt
7Monte Carlo Simulation
- Tend to waste time in local min.
- May consist of millions of steps.
- Energy must be evaluated frequently
(computationally expensive). - Use ChainTree to improve performance.
-
Lotan, I., Schwarzer, F., Halperin, D., Latombe,
J.C. Efficient maintenance and self-collision
testing for kinematic chains. In Symposium on
Computational Geometry (2002) 4352
Koehl, P and Levitt, M. De novo protein design.
I. In search of stability and specificity.
Journal of Molecular Biology, 293, 1161-1181
(1999).
8Genetic Algorithm
- Starts with First generation pool.
- Iteratively apply genetic operators (selection,
recombination, mutation). - Evloves toward better solution (low energy
function).
S. M. Larson, J. England, J. DesJarlais, and V.
S. Pande. Thoroughly sampling sequence space
large-scale protein design of structural
ensembles. Protein Science 11 2804-281 (2002).
Protein Science
9Selection
- Selection function takes into account the value
of fitness function. This gives priority to the
fit organism but also gives chance for less
fit organisms.
http//en.wikipedia.org/wiki/Genetic_algorithm
10Selection Method
- Roulette Method probability of selection is
proportional to the value of fitness function - Tournament picks k individuals (tournament
size), and choose the individual with probability
p. Iterate with probability p(1-p), then
p(p(1-p)) - Higher k less chance for weaker individual.
http//en.wikipedia.org/wiki/Roulette_wheel_select
ion
http//en.wikipedia.org/wiki/Tournament_selection
11Recombination, Mutation
- Recombination different segment of the structure
which is optimized in parallel can be recombined
into the same model. Recombination occurs with a
set probability. Otherwise, the population is
propogated to the next generation. - Mutation avoids local minima by mutating the
child with a set probability. - Similar to MC there is no guarantee to converge
into global minimum.
http//en.wikipedia.org/wiki/Genetic_operator
12Genome_at_home
- Genome_at_Home uses distributed computing and
genetic algorithm. - It also incorporates backbone flexibility using
Monte Carlo (random perturbation with RMSDlt1.0a)
which improves the result.
http//www.stanford.edu/group/pandegroup/genome/
13Dead-end Elimination
- Discrete conformational search.
- Functionally equivalent to exhustive search.
- It uses rejection criteria to prune the search
space. - The robustness depends on the discreteness and
the rejection criteria used. - Guaranteed convergence to global min.
- Initially used for sidechain placement. More
difficult for protein design because of high
degrees of freedom.
Looger LL, Hellinga HW. Generalized dead-end
elimination algorithms make large-scale protein
side-chain structure prediction tractable
implications for protein design and structural
genomics. J Mol Biol. 2001 Mar 16307(1)429-45.
PubMed
14Energy of conformation
- Reformulation of sidechain placement problem
Amino acid identity is used instead of rotamer. - The general DEE allows residue up to 300.
- Energy of conformation is defined as sum of
interaction among side chains and sum of
interaction of sidechain and the backbone. - Rejection criteria is used and iterated until no
more rotamers can be eliminated. Convergence
occurs, or reduces the problem sufficiently for
exhaustive serach.
15DEE filter Rejection Criteria
- Simple Criterion If lowest energy struct that
can be found using a given sidechain rotamer (low
energy side chain conformation) is higer than the
highest energy struct w/ different rotamer, the
first rotamer is eliminated.
16DEE filter Rejection Criteria
- Goldstein Criteria if energy struct containing
one rotamer is always lowered by changing to a
second one, the first one is eliminated.
17DEE filter Rejection Criteria
- Generalized Criterion residues are added in
group, eliminated clusters of rotamers in the
groups maybe excluded from the minimum operator,
in addition to those which form dead-end clusters
with c.
18Mean Field Theory
- Reduce search space.
- Self-consistency is sought by placing amino acids
at pre-selected positions in a given structure. - Energy function is minimized by mean field.
Voigt CA, Mayo SL, Arnold FH, Wang ZG.
Computational method to reduce the search space
for directed protein evolution. Proc Natl Acad
Sci U S A. 2001 Mar 2798(7)3778-83. PNAS
19Review Branch Bound
- Set of solutions can be partitioned into subsets
(branch) - Upper limit on a subsets solution can be
computed fast (bound) - Branch Bound
- Select subset with best possible bound
- Subdivide it, and compute a bound for each subset
S.Batzoglou, Threading2.ppt
20Rosetta Design
- Initial backbone designed without regard to
side-chain packing. - Iterates between sequence design and backbone
optimization using Monte Carlo. - Perturbation in random change in the torsional
angles of 1-5 random residue, or substitution of
backbone torsonal angles of 1-3 consecutive
residues with torsional angles from a structure
in the PDB. Sidechain optimization. Accept/reject
using Metropolis criterion. - 1.17-a backbone atom RMSD between model and
structure.
Crystal structure of top7 A novel protein
structure created with RosettaDesign.
http//rosettadesign.med.unc.edu/
Kuhlman B, Dantas G, Ireton GC, Varani G,
Stoddard BL, Baker D. Design of a novel globular
protein fold with atomic-level accuracy.
Science. 2003 Nov 21302(5649)1364-8. PubMed
21Using Rosetta Design
- Red PDB 1A1M Mhc Class I Molecule B5301
Complexed With Peptide Typdinqml From Gag Protein
Of Hiv2 - Blue Rosetta Stone Designed
- Visualized with Deep View / Swiss-PdbViewer.
http//us.expasy.org/spdbv/
http//www.rcsb.org/pdb/cgi/explore.cgi?pid195321
117535569pdbId1A1M
22b.e.a.n.s.
- A simple openGL based program was developed to
test monte carol and genetic algorithms on
designing chain of jelly beans. - User is able to vary the initial structure of the
beans and compare the efficiency of the
algorithms via built-in timer. -