Title: Forces and Prediction of Protein Structure
1 Forces and Prediction of Protein Structure
- Ming-Jing Hwang
- Institute of Biomedical Sciences
- Academia Sinica
http://gln.ibms.sinica.edu.tw/
2 Science 2005
3 Sequence - Structure - Function
MADWVTGKVTKVQNWTDALFSLTVHAPVLPFTAGQFTKLGLEIDGERVQR
AYSYVNSPDNPDLEFYLVTVPDGKLSPRLAALKPGDEVQVVSEAAGFFVL
DEVPHCETLWMLATGTAIGPYLSILR
4 Sequence/Structure Gap
- Current (May 15, 2007) entries in protein
sequence and structure databases
- SWISS-PROT/TrEMBL 267,354/4,361,897
- PDB 43,459
Sequence
Structure
5 Structural Bioinformatics: Sequence/Structure
Relationship
Percent identity: 100 90 80 70 60 50 40 30 20 10 0
All possible sequences of amino acids
Protein structures observed in nature
Twilight zone
Midnight zone
Protein sequences observed in nature
6 Structure Prediction Methods
Homology modeling
Fold recognition
ab initio
Sequence identity (%): 0 10 20 30 40 50 60 70 80 90 100
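The sequence-identity axis above can be made concrete with a small sketch. The function below computes percent identity over the gap-free columns of a pairwise alignment; the method cutoffs (20% and 30%) are illustrative assumptions for where the twilight zone sits, not values taken from the slide, and the function names are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


def suggest_method(identity: float,
                   twilight_low: float = 20.0,    # assumed lower edge of the twilight zone
                   twilight_high: float = 30.0) -> str:  # assumed upper edge
    """Map percent identity to a method class, using assumed cutoffs."""
    if identity >= twilight_high:
        return "homology modeling"
    if identity >= twilight_low:
        return "fold recognition"
    return "ab initio"


# Example: 6 identical residues out of 8 gap-free columns
identity = percent_identity("MADWV-TGK", "MADYVQTGR")
method = suggest_method(identity)
```

In practice the boundaries are fuzzy and depend on alignment length, so the cutoffs are best treated as adjustable parameters.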
7 Levinthal's paradox (1969)
- If we assume three possible states for every
flexible dihedral angle in the backbone of a
100-residue protein, the number of possible
backbone configurations is 3^200. Even an
incredibly fast computational or physical
sampling in 10^-15 s would mean that a complete
sampling would take 10^80 s, which exceeds the age
of the universe by more than 60 orders of
magnitude.
- Yet proteins fold in seconds or less!
Berendsen
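The arithmetic behind the paradox can be checked in a few lines. This sketch reads the slide's figures as 3^200 configurations sampled at 10^-15 s each; the count of 200 flexible angles (two backbone dihedrals per residue) and the age of the universe (about 13.8 billion years, roughly 4.3 x 10^17 s) are assumptions added here.

```python
import math

STATES_PER_ANGLE = 3          # from the slide
FLEXIBLE_ANGLES = 200         # assumption: phi and psi for each of 100 residues
TIME_PER_SAMPLE_S = 1e-15     # from the slide
AGE_OF_UNIVERSE_S = 4.3e17    # assumption: about 13.8 billion years

# Exact integer count of backbone configurations
conformations = STATES_PER_ANGLE ** FLEXIBLE_ANGLES

# Work in log10 to keep the magnitudes readable
log10_conformations = FLEXIBLE_ANGLES * math.log10(STATES_PER_ANGLE)
log10_total_time = log10_conformations + math.log10(TIME_PER_SAMPLE_S)
orders_beyond_universe = log10_total_time - math.log10(AGE_OF_UNIVERSE_S)

print(f"configurations ~ 10^{log10_conformations:.1f}")
print(f"exhaustive sampling ~ 10^{log10_total_time:.1f} s")
print(f"exceeds the age of the universe by ~ {orders_beyond_universe:.0f} orders of magnitude")
```

Under these assumptions the exhaustive search time comes out near 10^80 s, in line with the slide's statement.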
8 Energy landscapes of protein folding
Borman, C&E News, 1998
9 Levitt's lecture for S
10 Levitt
11 Levitt
12 Other factors
- Formation of secondary structure elements
- Packing of secondary structure elements
- Topologies of fold
- Metal/co-factor binding
- Disulfide bond
13 Ab initio/new fold prediction
- Physics-based (laws of physics)
- Knowledge-based (rules of evolution)
14 Levitt
15 Levitt
16 Levitt
17 Levitt
18 Levitt
19 Levitt
20 Levitt
21 Levitt
22 Levitt
23 Levitt
24 Levitt
25 Levitt
26 Levitt
27 Molecular Mechanics (Force Field)
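The slide's figure is not preserved, so as a stand-in here is a minimal sketch of three terms that commonly appear in molecular mechanics force fields: a harmonic bond stretch, a Lennard-Jones 12-6 term, and a Coulomb term. The function names, parameter values, and the k(r - r0)^2 convention (some force fields use k/2) are assumptions for illustration, not the specific force field shown in the lecture.

```python
# Approximate conversion factor giving kcal/mol for charges in units of e
# and distances in Angstrom (assumed value, varies slightly between packages)
COULOMB_CONSTANT = 332.06


def bond_energy(r: float, r0: float, k: float) -> float:
    """Harmonic bond stretch: k * (r - r0)^2."""
    return k * (r - r0) ** 2


def lennard_jones(r: float, epsilon: float, sigma: float) -> float:
    """12-6 van der Waals term with well depth epsilon and zero crossing at sigma."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def coulomb(r: float, qi: float, qj: float, dielectric: float = 1.0) -> float:
    """Electrostatic interaction between two point charges."""
    return COULOMB_CONSTANT * qi * qj / (dielectric * r)


# Hypothetical atom pair: minimum of the Lennard-Jones well lies at 2^(1/6) * sigma
r_min = 2 ** (1 / 6) * 3.4
pair_energy = lennard_jones(r_min, epsilon=0.1, sigma=3.4) + coulomb(r_min, 0.3, -0.3)
```

A full force field also sums angle-bending and torsional terms over all bonded and non-bonded atom groups; those are omitted here to keep the sketch short.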
28 Levitt
29 (No Transcript)
30 1-microsecond MD simulation
980 ns
- villin headpiece
- 36 a.a.
- 3000 H2O
- 12,000 atoms
- 256 CPUs (CRAY)
- 4 months
- single trajectory
Duan & Kollman, 1998
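The figures on this slide translate into a simulation throughput that shows why the time scale is a bottleneck. In this back-of-the-envelope sketch, "4 months" is approximated as 120 days and the 2 fs integration time step is an assumption (a typical MD value, not stated on the slide).

```python
SIMULATED_TIME_NS = 1000.0    # 1 microsecond, from the slide
WALL_CLOCK_DAYS = 4 * 30      # "4 months", approximated as 120 days
CPUS = 256                    # from the slide
TIME_STEP_FS = 2.0            # assumption: typical MD time step

ns_per_day = SIMULATED_TIME_NS / WALL_CLOCK_DAYS       # simulated ns per wall-clock day
cpu_days = CPUS * WALL_CLOCK_DAYS                      # total CPU-days consumed
steps = SIMULATED_TIME_NS * 1e6 / TIME_STEP_FS         # 1 ns = 1e6 fs

print(f"throughput: {ns_per_day:.1f} ns/day on {CPUS} CPUs")
print(f"cost: {cpu_days} CPU-days for {steps:.1e} integration steps")
```

At roughly 8 ns per day, reaching the millisecond-to-second folding times of typical proteins with a single trajectory would have been out of reach on this hardware.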
31 Protein folding by MD
PROTEIN FOLDING: A Glimpse of the Holy Grail?
Herman J. C. Berendsen
"The Grail had many different manifestations
throughout its long history, and many have
claimed to possess it or its like". We might have
seen a glimpse of it, but the brave knights must
prepare for a long pursuit.
32 Massively distributed computing
- SETI@home
- Folding@home
- Distributed folding
- Sengent's drug design
- FightAIDS@Home
33 Massively distributed computing
Letters to Nature (2002)
- engineered protein (BBA5)
- zinc finger fold (w/o metal)
- 23 a.a.
- solvation model
- thousands of trajectories each of 5-20 ns,
totaling 700 μs
- Folding@home
- 30,000 internet volunteers
- several months, or a million CPU days of
simulation
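The sampling figures above can be cross-checked with simple arithmetic. This sketch assumes the aggregate simulated time is 700 microseconds (many 5-20 ns trajectories cannot plausibly add up to 700 milliseconds) and bounds the number of trajectories by supposing they were all the longest or all the shortest length; the function name is hypothetical.

```python
TOTAL_SAMPLING_NS = 700_000.0   # assumption: 700 microseconds in total
SHORTEST_NS = 5.0               # from the slide
LONGEST_NS = 20.0               # from the slide
CPU_DAYS = 1_000_000            # "a million CPU days", from the slide
VOLUNTEERS = 30_000             # from the slide


def trajectory_count_range(total_ns: float, shortest_ns: float, longest_ns: float):
    """Return (fewest, most) trajectories needed to reach total_ns."""
    return total_ns / longest_ns, total_ns / shortest_ns


fewest, most = trajectory_count_range(TOTAL_SAMPLING_NS, SHORTEST_NS, LONGEST_NS)
cpu_days_per_volunteer = CPU_DAYS / VOLUNTEERS

print(f"between {fewest:,.0f} and {most:,.0f} trajectories")
print(f"about {cpu_days_per_volunteer:.0f} CPU-days per volunteer")
```

The contrast with a single long trajectory is the point of the distributed approach: many short, independent runs parallelize trivially across volunteer machines.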
34 Energy landscapes of protein folding
Borman, C&E News, 1998
35 Protein-folding prediction technique
CGU: Convex Global Underestimation - K. Dill's
group
36 Challenges of physics-based methods
- Simulation time scale
- Computing power
- Sampling
- Accuracy of energy functions
37 Structure Prediction Methods
Homology modeling
Fold recognition
ab initio
Sequence identity (%): 0 10 20 30 40 50 60 70 80 90 100
38 Flowchart of homology (comparative) modeling
From Marti-Renom et al.
39 Fold recognition
Find, from a library of folds, the 3D
template that accommodates the target sequence
best. Also known as threading or inverse
folding. Useful for twilight-zone sequences.
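The idea of finding the template that best accommodates a sequence can be illustrated with a toy gapless threading sketch. Everything here is a simplifying assumption: each fold is reduced to a string of per-position environments ('B' buried, 'E' exposed), the hydrophobic residue set is arbitrary, and the +1/-1 score stands in for the statistical potentials and gapped alignments that real threading programs use.

```python
HYDROPHOBIC = set("AVLIMFWC")  # assumption: a simple hydrophobic residue set


def position_score(residue: str, environment: str) -> int:
    """+1 if the residue suits the environment (hydrophobic in 'B', polar in 'E'), else -1."""
    is_hydrophobic = residue in HYDROPHOBIC
    fits = (environment == "B") == is_hydrophobic
    return 1 if fits else -1


def thread_score(sequence: str, environments: str) -> float:
    """Best gapless score of the sequence slid along a template's environment string."""
    n = len(sequence)
    if n > len(environments):
        return float("-inf")  # template too short to hold the sequence
    return max(
        sum(position_score(r, e) for r, e in zip(sequence, environments[offset:offset + n]))
        for offset in range(len(environments) - n + 1)
    )


def recognize_fold(sequence: str, fold_library: dict) -> str:
    """Return the name of the library fold with the highest threading score."""
    return max(fold_library, key=lambda name: thread_score(sequence, fold_library[name]))


# Hypothetical two-fold library and target sequence
library = {"fold_a": "BBEEBBEE", "fold_b": "EEEEBBBB"}
best = recognize_fold("LVKDILSE", library)
```

Here the alternating hydrophobic/polar pattern of the target matches the first template's buried/exposed pattern at every position, so that fold is selected.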
40 Fold recognition (aligning sequence to structure)
(David Shortle, 2000)
41 3D->1D score
42 On X-ray, NMR, and computed models
43 (Rost, 1996)
44 Reliability and uses of comparative models
Marti-Renom et al. (2000)
45 Pitfalls of comparative modeling
- Cannot correct alignment errors
- More similar to template than to true structure
- Cannot predict novel folds
46 Ab initio/new fold prediction
- Physics-based (laws of physics)
- Knowledge-based (rules of evolution)
47 From 1D -> 2D -> 3D
Primary
LGINCRGSSQCGLSGGNLMVRIRDQACGNQGQTWCPGERRAKVCGTGNSI
SAYVQSTNNCISGTEACRHLTNLVNHGCRVCGSDPLYAGNDVSRGQLTVN
YVNSC
seq. to str. mapping
Secondary(fragment)
Tertiary
fragment assembly
48 CASP Experiments
49 One lab dominated in CASP4
One group dominates the ab initio
(knowledge-based) prediction
50 Some CASP4 successes
Baker's group
51 Ab initio structure prediction server
52 Toward High-Resolution de Novo Structure
Prediction for Small Proteins -- Philip
Bradley, Kira M. S. Misura, David Baker (Science
2005)
The prediction of protein structure from amino
acid sequence is a grand challenge of
computational molecular biology. By using a
combination of improved low- and high-resolution
conformational sampling methods, improved
atomically detailed potential functions that
capture the jigsaw puzzle-like packing of protein
cores, and high-performance computing,
high-resolution structure prediction (<1.5
angstroms) can be achieved for small protein
domains (<85 residues). The primary bottleneck to
consistent high-resolution prediction appears to
be conformational sampling.
53 3D to 1D?
Science 2003
54 A computer-designed protein (93 aa) with 1.2 Å
resolution
55 Structure prediction servers
http://bioinfo.pl/cafasp/list.html
56 Hybrid approach for solving macromolecular
complex structures
57 Thank You!