Title: A too simple model for protein folding
1. A too simple model for protein folding
- Ethan Bolker
- Mathematics and Computer Science
- UMass Boston
- Clark University
- April 14, 2004
2. Preliminaries
- Problem source: a biology teaching need
- Analysis mixes biology, CS and mathematics (applied mathematics)
- Ongoing help from Bogdan Calota
- See www.cs.umb.edu/eb/folding
3. How life works
- DNA (gene) makes RNA
- RNA makes polypeptide
- Polypeptide folds into protein
- Proteins interact (biochemistry)
- Cells, organisms, communities
- Natural selection makes gene mix evolve
4. Virtual teaching laboratories
- For Brian White (Biology, UMass Boston)
- Virtual Genetics Laboratory (VGL)
  - Mendelian genetics
  - http://intro.bio.umb.edu/VGL/index.htm
  - Science, April 16, 2004
- GenExplorer
  - the central dogma
  - www.cs.umb.edu/genex/
- Watch this space
5. Polypeptide → protein
- Polypeptide: a sequence of amino acids
- Chemical (biological) activity depends on three-dimensional configuration (folding)
- Protein: polypeptide folded into its active shape
- Given the sequence, what's the shape?
- Wet lab
  - lots of chemistry
  - x-ray crystallography
  - (newer tools)
- Virtual lab
  - compute shape from chemical principles
  - need supercomputer or grid
6. folding@home
- www.stanford.edu/group/pandegroup/folding/
7. For beginning biologists
- Problem: give students hands-on experience showing how sequence determines shape
- Solution: a very simple model
  - amino acid = disk in the plane; its hydrophobic index $h_i$ expresses its wish to avoid a wet environment
  - fold the polypeptide on a hex grid to minimize energy
  - energy $E = \sum_{\text{acids } i} (\text{number of exposed edges of acid } i) \cdot h_i$
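To make the energy concrete, here is a minimal Python sketch. It assumes axial coordinates (q, r) for the hex grid and treats an edge as exposed when the neighboring hexagon holds no acid of the chain (so an edge shared with a bonded neighbor is not exposed); the slide does not spell out either convention, and the names `exposed_edges` and `energy` are hypothetical.

```python
from typing import List, Sequence, Set, Tuple

Cell = Tuple[int, int]  # axial coordinates (q, r) of a hexagonal cell

# The six cells sharing an edge with a hexagon, in axial coordinates.
HEX_DIRS: List[Cell] = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]


def exposed_edges(cell: Cell, occupied: Set[Cell]) -> int:
    """Edges of `cell` whose neighboring hexagon holds no acid (assumed convention)."""
    q, r = cell
    return sum((q + dq, r + dr) not in occupied for dq, dr in HEX_DIRS)


def energy(path: Sequence[Cell], h: Sequence[float]) -> float:
    """E = sum over acids i of (exposed edges of acid i) * h_i."""
    occupied = set(path)
    return sum(h_i * exposed_edges(c, occupied) for c, h_i in zip(path, h))


# A straight three-acid chain versus a compact triangle, all acids hydrophobic.
line = [(0, 0), (1, 0), (2, 0)]
triangle = [(0, 0), (1, 0), (1, -1)]
print(energy(line, [1, 1, 1]), energy(triangle, [1, 1, 1]))  # 14 12
```

With positive (hydrophobic) indices the compact triangle has the lower energy, which is the pressure that produces a hot core.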
8. folding@umb
51882 possible configurations (5279 modulo dihedral group symmetry); minimum energy -131.17; minimum occurs once; topology 0 2, 7 1 2 0, 7 3 4 5 6 7 0, 2
9. folding@umb
51882 possible configurations (5279 modulo dihedral group symmetry); minimum energy -13.161; minimum occurs twice (the second, obvious, answer has the same topology)
10. Brute force search
- Try all nonintersecting walks of length n on the plane grid of hexagons
  - 1, 6, 30, 138, 618, 2730, 11946, 51882, 224130, 964134, 4133166, …
  - Sequence A001334 in the Online Encyclopedia of Integer Sequences, www.research.att.com/njas/sequences/
  - No closed form expression
- Growth rate obviously $O(5^n)$ (each step has at most 5 unvisited neighbors); actual ≈ $4.25^n$
- To count foldings, divide by 12 (symmetry)
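A minimal brute force sketch in Python, assuming the same axial-coordinate hex grid and the assumed "exposed edge" convention (neighboring hexagon not occupied by any acid). It enumerates nonintersecting walks from the origin, reproduces the start of the sequence above, and returns the minimum energy for a chain with hydrophobic indices `h` together with how many walks attain it. It counts walks, not foldings modulo symmetry; `walks` and `brute_force_min` are hypothetical names.

```python
from typing import Iterator, List, Sequence, Set, Tuple

Cell = Tuple[int, int]  # axial coordinates of a hexagonal cell
HEX_DIRS: List[Cell] = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]


def neighbors(c: Cell) -> List[Cell]:
    return [(c[0] + dq, c[1] + dr) for dq, dr in HEX_DIRS]


def walks(n_steps: int) -> Iterator[List[Cell]]:
    """Yield every nonintersecting walk of n_steps steps starting at the origin."""
    path: List[Cell] = [(0, 0)]
    visited: Set[Cell] = {(0, 0)}

    def extend(remaining: int) -> Iterator[List[Cell]]:
        if remaining == 0:
            yield list(path)
            return
        for nb in neighbors(path[-1]):
            if nb not in visited:
                path.append(nb)
                visited.add(nb)
                yield from extend(remaining - 1)
                visited.remove(nb)
                path.pop()

    yield from extend(n_steps)


def energy(path: Sequence[Cell], h: Sequence[float]) -> float:
    # Sum over acids of h_i times the number of unoccupied neighboring cells.
    occupied = set(path)
    return sum(h[i] * sum(nb not in occupied for nb in neighbors(c))
               for i, c in enumerate(path))


def brute_force_min(h: Sequence[float]) -> Tuple[float, int]:
    """Minimum energy over all walks for chain h, and how many walks attain it."""
    best, count = float("inf"), 0
    for p in walks(len(h) - 1):  # a chain of n acids is a walk of n - 1 steps
        e = energy(p, h)
        if e < best - 1e-9:
            best, count = e, 1
        elif abs(e - best) <= 1e-9:
            count += 1
    return best, count


print([sum(1 for _ in walks(n)) for n in range(6)])  # [1, 6, 30, 138, 618, 2730]
```

For three hydrophobic acids, `brute_force_min([1, 1, 1])` finds the compact triangle, attained by 12 walks, one per symmetry of the hexagon.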
11. A (random) chain of length 17
- Five of the 11 minimum energy foldings
- All 11 show the same 8-acid cool ring and hot core
- Essentially the same topology
- 12-hour computation
12. Open questions (statistical)
- How many minima?
- What is the energy distribution
  - for one polypeptide, over all foldings?
  - of minima, over all polypeptides of fixed length?
- Do all minima for a polypeptide have the same topology? (several possible definitions for topology)
- Do approximate minima have the same topology? (several possible definitions for approximate)
13. Which amino acid universe?
- Random polypeptides: acids chosen with
  - $h_i$ uniformly distributed in $[-1, 1]$
  - $h_i \in \{1, -1\}$ with probabilities $(p, 1-p)$
  - from {Ala, Arg, …, Tyr, Val}, with measured hydrophobic indices and measured probabilities of occurrence (the natural universe)
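The three universes can be sketched as random generators. This is a minimal Python sketch under stated assumptions: `uniform_universe`, `binary_universe` and `natural_universe` are hypothetical names, the natural universe takes a caller-supplied table because the measured indices and frequencies are not listed here, and Python's `random.Random(2255)` will not reproduce the applet's chains for seed 2255.

```python
import random
from typing import Dict, List, Tuple


def uniform_universe(n: int, rng: random.Random) -> List[float]:
    """h_i drawn uniformly from [-1, 1]."""
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


def binary_universe(n: int, p: float, rng: random.Random) -> List[float]:
    """h_i = 1 with probability p, otherwise -1."""
    return [1.0 if rng.random() < p else -1.0 for _ in range(n)]


def natural_universe(n: int, table: Dict[str, Tuple[float, float]],
                     rng: random.Random) -> List[Tuple[str, float]]:
    """Acids drawn with their occurrence frequencies.

    `table` maps an amino acid name to (hydrophobic index, frequency).
    Real use needs measured values, which are not supplied in this sketch.
    """
    names = list(table)
    weights = [table[a][1] for a in names]
    chosen = rng.choices(names, weights=weights, k=n)
    return [(a, table[a][0]) for a in chosen]


rng = random.Random(2255)
print(binary_universe(10, 0.5, rng))
```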
14. Digression
- How do you interpolate visually between red and green?
  - in RGB space, white is halfway
  - in HSB space, yellow is halfway
- Application uses cubic interpolation to adjust contrast near the midpoint
15. Cubic interpolation
// Map a range of hydrophobic indices h to a continuum of
// colors between RED and GREEN in HSB space.
//
// First map h linearly to x between 0.0 and 1.0 so that we
// can form convex combinations. To get a better visual effect,
// replace x by
//     f(x) = a x^3 + b x^2 + c x
//     color(x) = f(x) RED + (1 - f(x)) GREEN
// f(0) = 0 means color(0) = GREEN. Then find a, b and c so that
// f(1) = 1, f(1/2) = 1/2 and f'(1/2) = k (to be determined). Then
// color(1) = RED and color(1/2) = 1/2 (RED + GREEN) = YELLOW.
//
// When k = 1, f(x) = x is linear, not cubic (check the algebra).
// That works well for the natural table. But for the virtual table it
// provides too little contrast near the center. k = 1/2 flattens out the
// cubic at its inflection point there and seems to be just about right.
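Solving the three linear conditions f(1) = 1, f(1/2) = 1/2, f'(1/2) = k gives a = 4 - 4k, b = 6k - 6, c = 3 - 2k, so k = 1 yields f(x) = x and k = 1/2 yields f(x) = 2x^3 - 3x^2 + 2x (whose inflection point is at x = 1/2 for every k). A minimal Python sketch of the mapping follows. It assumes x = (h - h_min)/(h_max - h_min) and interpolates only the hue at full saturation and brightness; Python's `colorsys` works in HSV, the same model Java calls HSB. The names `cubic_coefficients` and `color` are hypothetical.

```python
import colorsys
from typing import Tuple


def cubic_coefficients(k: float) -> Tuple[float, float, float]:
    """a, b, c of f(x) = a x^3 + b x^2 + c x with f(1) = 1, f(1/2) = 1/2, f'(1/2) = k."""
    return 4 - 4 * k, 6 * k - 6, 3 - 2 * k


def color(h: float, h_min: float, h_max: float, k: float = 0.5) -> Tuple[float, float, float]:
    """RGB color (components in [0, 1]) for hydrophobic index h."""
    x = (h - h_min) / (h_max - h_min)  # assumed linear map of h onto [0, 1]
    a, b, c = cubic_coefficients(k)
    fx = a * x**3 + b * x**2 + c * x
    # Convex combination f(x) RED + (1 - f(x)) GREEN on the hue:
    # hue 0 is red, hue 1/3 is green, so f(x) = 1/2 gives hue 1/6, yellow.
    hue = fx * 0.0 + (1 - fx) * (1 / 3)
    return colorsys.hsv_to_rgb(hue, 1.0, 1.0)


print(cubic_coefficients(0.5))  # (2.0, -3.0, 2.0)
```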
16. Open questions (biological)
- Nature isn't random: naturally occurring polypeptides are not a random selection from the natural universe
- Which shapes can occur as the minimum energy configurations of polypeptides?
  - which are beautiful? (polypeptide tangrams)
  - which are interesting? (designer drugs)
  - (I like cool rings, Brian White likes hot cores)
17. Folding algorithms
- Conjecture: finding the minimum energy folding (what brute force does) is NP-complete
- Look for an approximate algorithm that is
  - polynomial time
  - close to the true minimum with high probability
  - not stochastic
- Conjecture: no local algorithm will do
18. Incremental Folding
- int lookahead
- int step ≤ lookahead
- while there are acids to place:
  - explore all positions for the next lookahead acids, and find those that minimize the energy of the configuration so far
  - place the first step of those lookahead acids
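The loop above can be sketched in Python as follows. This is a minimal sketch, not the applet's code: it assumes axial hex coordinates, the first acid fixed at the origin, and "exposed edge" meaning a neighboring hexagon not occupied by any acid. The names `best_extension` and `incremental_fold` are hypothetical, and the dead-end check is our own safeguard, not something the slides describe.

```python
from typing import List, Optional, Sequence, Tuple

Cell = Tuple[int, int]  # axial coordinates of a hexagonal cell
HEX_DIRS: List[Cell] = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]


def neighbors(c: Cell) -> List[Cell]:
    return [(c[0] + dq, c[1] + dr) for dq, dr in HEX_DIRS]


def energy(path: Sequence[Cell], h: Sequence[float]) -> float:
    # Energy of the acids placed so far: h_i times unoccupied neighboring cells.
    occupied = set(path)
    return sum(h[i] * sum(nb not in occupied for nb in neighbors(c))
               for i, c in enumerate(path))


def best_extension(path: List[Cell], h: Sequence[float], look: int) -> Optional[List[Cell]]:
    """Try every placement of the next `look` acids; return the lowest-energy path."""
    best: Optional[List[Cell]] = None
    best_e = float("inf")
    occupied = set(path)

    def dfs(cur: List[Cell], remaining: int) -> None:
        nonlocal best, best_e
        if remaining == 0:
            # Safeguard (assumption): unless the chain is complete,
            # skip partial foldings whose last acid has no free neighbor.
            if len(cur) < len(h) and all(nb in occupied for nb in neighbors(cur[-1])):
                return
            e = energy(cur, h)
            if e < best_e:
                best, best_e = list(cur), e
            return
        for nb in neighbors(cur[-1]):
            if nb not in occupied:
                occupied.add(nb)
                cur.append(nb)
                dfs(cur, remaining - 1)
                cur.pop()
                occupied.remove(nb)

    dfs(list(path), look)
    return best


def incremental_fold(h: Sequence[float], lookahead: int, step: int) -> List[Cell]:
    if not 1 <= step <= lookahead:
        raise ValueError("need 1 <= step <= lookahead")
    path: List[Cell] = [(0, 0)]  # first acid fixed at the origin
    while len(path) < len(h):
        look = min(lookahead, len(h) - len(path))
        ext = best_extension(path, h, look)
        if ext is None:
            raise RuntimeError("no extension found: the chain trapped itself")
        # Commit only the first `step` of the `look` acids just explored.
        path = ext[: len(path) + min(step, look)]
    return path
```

Because the first acid is fixed, lookahead = step = n - 1 makes a single search cover the whole chain (brute force), while lookahead = step = 1 is the greedy extreme.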
19. Incremental Folding
- lookahead = step = 1 is greedy
- lookahead = step = n is brute force
- time $O\left(\frac{n}{\mathrm{step}} \cdot 4.x^{\mathrm{lookahead}}\right)$
- linear in n, but exponential in lookahead
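A rough back-of-the-envelope reading of this estimate, assuming each acid placed during lookahead has about 4.25 choices (the growth rate from the brute force slide) and that a run makes about n/step lookahead searches:

```latex
T(n) \approx \frac{n}{\mathrm{step}} \cdot 4.25^{\mathrm{lookahead}},
\qquad 4.25^{8} \approx 1.06 \times 10^{5}

n = 50,\ \mathrm{step} = 1:\quad T \approx 50 \times 1.06 \times 10^{5} \approx 5.3 \times 10^{6}

\frac{T(\mathrm{step} = 1)}{T(\mathrm{step} = s)} \approx s
```

For the 50-acid runs with lookahead 8 on the following slides, this predicts time ratios of 4, 5 and 7 between step 1 and steps 4, 5 and 7; the measured ratios are 139/29 ≈ 4.8, 139/27 ≈ 5.1 and 139/15 ≈ 9.3, the same trend though not a close fit.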
20. 50 acids, randomly chosen from the natural universe; seed 2255; minimum energy -352.38; lookahead 8, step 1; time 139 seconds
21. 50 acids, randomly chosen from the natural universe; seed 2255; minimum energy -338.42; lookahead 8, step 4; time 29 seconds
22. 50 acids, randomly chosen from the natural universe; seed 2255; minimum energy -351.54; lookahead 8, step 5; time 27 seconds
23. 50 acids, randomly chosen from the natural universe; seed 2255; minimum energy -343.98; lookahead 8, step 7; time 15 seconds
24. Brute force folding for one random chain of length 17
25. Incremental step sensitivity
[Figure: incremental foldings compared with brute force]
26. Incremental lookahead sensitivity
[Figure: incremental foldings for lookahead 5 through 14, compared with brute force]
27. Incremental Folding
- Topology highly sensitive to step
- Energy not monotone with step or lookahead
- Can always be fooled
- May be realistic biologically
- Suffices for teaching goal
28. More geometry
- Square grid folding is faster: $O(2.x^{\mathrm{lookahead}})$ instead of $O(4.x^{\mathrm{lookahead}})$
- But not nearly as pretty
29. Folding in space
- Cubic grid has the same folding complexity as the hex grid in the plane, since each cell has six neighbors
- The 3D analogue of the hex grid is close packing of spheres
  - oranges at the market
  - layers of hexagonally close packed planes
  - cell is a rhombic dodecahedron
  - each sphere has 12 neighbors
  - folding complexity $O(10.x^n)$
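The branching factors behind the square-grid and 3D claims can be checked by counting short nonintersecting walks. This minimal Python sketch assumes the close packing is face-centered cubic (ABC stacking of hexagonal layers), whose cell is the rhombic dodecahedron mentioned above; `count_walks` and the direction lists are hypothetical names.

```python
from typing import List, Set, Tuple

Vec = Tuple[int, ...]

SQUARE: List[Vec] = [(1, 0), (-1, 0), (0, 1), (0, -1)]
CUBIC: List[Vec] = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
# Face-centered cubic close packing: the 12 vectors with two coordinates equal to +-1.
FCC: List[Vec] = [(x, y, z) for x in (-1, 0, 1) for y in (-1, 0, 1) for z in (-1, 0, 1)
                  if abs(x) + abs(y) + abs(z) == 2]


def count_walks(dirs: List[Vec], n_steps: int) -> int:
    """Number of nonintersecting walks of n_steps steps from the origin."""
    origin: Vec = tuple(0 for _ in dirs[0])
    visited: Set[Vec] = {origin}

    def extend(cell: Vec, remaining: int) -> int:
        if remaining == 0:
            return 1
        total = 0
        for d in dirs:
            nxt = tuple(a + b for a, b in zip(cell, d))
            if nxt not in visited:
                visited.add(nxt)
                total += extend(nxt, remaining - 1)
                visited.remove(nxt)
        return total

    return extend(origin, n_steps)


for name, dirs in [("square", SQUARE), ("cubic", CUBIC), ("fcc", FCC)]:
    print(name, len(dirs), [count_walks(dirs, n) for n in range(4)])
# square 4 [1, 4, 12, 36]
# cubic 6 [1, 6, 30, 150]
# fcc 12 [1, 12, 132, 1404]
```

Each walk step has at most one fewer choice than the number of neighbors (3, 5 and 11 here), which is why the counts grow far faster on the close-packed lattice.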
30. Packing spheres
31. H. Steinhaus, Mathematical Snapshots
32. Foldings in space
energy 37.8, time 18 seconds, explored 752057 chains
energy 15.6, time 0 seconds, explored 8185 chains
33. Summary
- The customer is satisfied
- You can play with the applet
- The software needs work
- All the interesting questions are still open