Title: Yang Zhang
1 The protein structure prediction problem could be
solved using the current PDB library
- Yang Zhang, Jeffrey Skolnick
- PNAS, vol. 102 (25 Jan. 2005)
THE 8TH PROTEIN FOLDING SCHOOL
2 Objective
- To examine:
- 1. whether all single-domain proteins are foldable
based on the set of solved structures currently
deposited in the PDB
- 2. whether the templates can be further improved
by rearranging the fragments (TASSER)
3 PDB information
Database: >23,000 solved protein structures
(December 30, 2003) and 300 new entries added
each month. New fold entries keep decreasing
(e.g., the percentage of new folds fell from 27% in 1995
to 5% in 2001).
4 Our experimental method
- 1. Template identification
- - SAL (Needleman-Wunsch dynamic programming, global
alignment)
- - score(i, j) = 20 / (1 + d_ij^3 / 5)
- 2. Force field construction
- - Cα and side-chain group (SG) regularities/correlations
from the statistics of the PDB
- - propensities for predicted secondary structure
from PSIPRED
- - tertiary consensus contact/distance restraints and
a protein-specific SG pair potential, both
extracted from the identified multiple templates
- 3. Structure assembly
- - full-length models constructed by assembling
the continuous fragments from the templates,
guided by the optimized force field
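The template-identification step can be illustrated with a minimal sketch of Needleman-Wunsch global alignment driven by a distance-based score. This is not the SAL program itself. It assumes the two Cα traces have already been superposed, it reads the score as 20 / (1 + d^3 / 5) (the exponent is recovered from garbled slide text and may differ in the original), and the linear gap penalty `gap` is a hypothetical choice.

```python
import math


def sal_score(d, top=20.0, scale=5.0):
    # Distance-based similarity as read from the slide: 20 / (1 + d^3 / 5).
    # The exponent 3 is an assumption recovered from garbled text.
    return top / (1.0 + d ** 3 / scale)


def needleman_wunsch(coords_a, coords_b, gap=-5.0):
    """Globally align two pre-superposed C-alpha traces.

    `gap` is a hypothetical linear gap penalty, not a value from the slides.
    Returns (total score, list of aligned index pairs).
    """
    n, m = len(coords_a), len(coords_b)
    # dp[i][j] = best score for the first i residues of A vs the first j of B
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = sal_score(math.dist(coords_a[i - 1], coords_b[j - 1]))
            dp[i][j] = max(dp[i - 1][j - 1] + s,
                           dp[i - 1][j] + gap,
                           dp[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover the aligned pairs.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        s = sal_score(math.dist(coords_a[i - 1], coords_b[j - 1]))
        if dp[i][j] == dp[i - 1][j - 1] + s:
            pairs.append((i - 1, j - 1))
            i -= 1
            j -= 1
        elif dp[i][j] == dp[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return dp[n][m], pairs


trace_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
trace_b = [(3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
score, pairs = needleman_wunsch(trace_a, trace_b)
```

In this toy case the second trace coincides with the last two residues of the first, so the alignment skips the first residue of `trace_a` and pairs the rest.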
5 Overview of the TASSER method
By PROSPECTOR_3 (threading)
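The idea of assembling models from continuous template fragments can be sketched as follows. The function below splits a target-template alignment into runs where both the target and the template indices advance by one residue at a time. The function name, the input format, and the minimum fragment length `min_len=5` are assumptions made for illustration and are not taken from the slides.

```python
def continuous_fragments(aligned_pairs, min_len=5):
    """Split an alignment into continuous fragments.

    aligned_pairs: list of (target_index, template_index), sorted by target.
    min_len: hypothetical minimum fragment length.
    Returns a list of (first_target_index, last_target_index) per fragment.
    """
    fragments = []
    run = []
    for pair in aligned_pairs:
        if run:
            prev_t, prev_s = run[-1]
            # A break in either sequence ends the current fragment.
            if pair[0] != prev_t + 1 or pair[1] != prev_s + 1:
                if len(run) >= min_len:
                    fragments.append((run[0][0], run[-1][0]))
                run = []
        run.append(pair)
    if len(run) >= min_len:
        fragments.append((run[0][0], run[-1][0]))
    return fragments


alignment = [(0, 10), (1, 11), (2, 12), (3, 13), (4, 14),
             (7, 15), (8, 16),
             (9, 20), (10, 21), (11, 22), (12, 23), (13, 24)]
frags = continuous_fragments(alignment)
```

Residues that fall outside the returned fragments (here target residues 5 to 8) correspond to the unaligned regions that have to be built during assembly.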
6 Benchmark set of targets and templates for test
proteins
- Developed a representative benchmark set of all
single-domain structures in the PDB with 41-200 amino acids
- Target set: 1,489 nonhomologous proteins
- - 448 α-proteins
- - 434 β-proteins
- - 550 αβ-proteins
- - 57 others (Cα-only targets or targets with irregular
secondary structures)
- Template set
- - 3,575 representative proteins from the PDB
- - pairwise sequence identity to each other: 35%
maximum
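A pairwise identity cutoff like the 35% maximum above is commonly enforced with a greedy filter. The sketch below is one such filter, not the procedure used to build this benchmark. It assumes the sequences are already aligned to equal length with '-' marking gaps, and the names `identity` and `select_representatives` are hypothetical.

```python
def identity(a, b):
    """Fraction of identical residues over positions aligned in both sequences."""
    aligned = 0
    matches = 0
    for x, y in zip(a, b):
        if x != '-' and y != '-':
            aligned += 1
            if x == y:
                matches += 1
    return matches / aligned if aligned else 0.0


def select_representatives(entries, cutoff=0.35):
    """Greedily keep entries whose identity to every kept entry is <= cutoff.

    entries: list of (name, aligned_sequence) in priority order.
    """
    kept = []
    for name, seq in entries:
        if all(identity(seq, kept_seq) <= cutoff for _, kept_seq in kept):
            kept.append((name, seq))
    return [name for name, _ in kept]


entries = [
    ("A", "ACDEFGHIKL"),
    ("B", "ACDEFGHIKM"),  # 90% identical to A, so it is dropped
    ("C", "WYWYWYWYWL"),  # 10% identical to A, so it is kept
]
reps = select_representatives(entries)
```

The result depends on the input order, since earlier entries take precedence over later ones.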
7 Summary of folding results
8 Improvements of initial alignment
9 Modeling unaligned/loop regions
- 1) RMSDlocal
- - measures the modeling accuracy of the local
conformation
- 2) RMSDglobal
- - measures the modeling accuracy of the local
conformation and global orientation
- - as loop size increases, the accuracy of loop modeling decreases
- - T < M always (T: TASSER, M: MODELLER)
- - cutoff RMSDglobal < 7 Å for a reasonable model:
- M: loops up to 10 residues
- T: loops up to 28 residues
- RMSDglobal cutoff ?
- ? acceptable loop size in T and M ?
- ? difference M?T ?
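The difference between the two loop measures can be made concrete with a small sketch. The slides do not say how the two are computed, so the following is an assumption: RMSDlocal is taken as the RMSD after optimally superposing the loop on its native counterpart, and RMSDglobal as the plain coordinate RMSD of the loop when model and native are already in a common frame (for example after superposing the rest of the structure). The optimal superposition uses Horn's quaternion method with a shifted power iteration, chosen only to keep the example dependency-free.

```python
import math


def centered(points):
    """Translate points so that their centroid is at the origin."""
    n = len(points)
    center = [sum(p[k] for p in points) / n for k in range(3)]
    return [[p[k] - center[k] for k in range(3)] for p in points]


def rmsd_global(model, native):
    """Coordinate RMSD; assumes both loops are already in a common frame."""
    total = sum(math.dist(p, q) ** 2 for p, q in zip(model, native))
    return math.sqrt(total / len(model))


def rmsd_local(model, native, iterations=500):
    """RMSD after optimal rigid superposition of the loop alone."""
    a, b = centered(model), centered(native)
    # s[i][j] = sum over atoms of a_i * b_j (cross-covariance matrix)
    s = [[sum(p[i] * q[j] for p, q in zip(a, b)) for j in range(3)]
         for i in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = s
    # Horn's symmetric 4x4 matrix; its largest eigenvalue gives the best fit.
    n_mat = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    # Shifting by the largest absolute row sum makes every eigenvalue
    # non-negative, so power iteration converges to the largest one.
    shift = max(sum(abs(x) for x in row) for row in n_mat)
    v = [1.0, 0.5, 0.25, 0.125]
    for _ in range(iterations):
        w = [sum(n_mat[r][c] * v[c] for c in range(4)) + shift * v[r]
             for r in range(4)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # degenerate input, e.g. all atoms at the same point
        v = [x / norm for x in w]
    # Rayleigh quotient of the unshifted matrix = largest eigenvalue estimate.
    lam = sum(v[r] * sum(n_mat[r][c] * v[c] for c in range(4))
              for r in range(4))
    total = sum(x * x for p in a for x in p) + sum(x * x for p in b for x in p)
    return math.sqrt(max(0.0, (total - 2.0 * lam) / len(a)))


native_loop = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (5.0, 3.6, 0.0),
               (7.0, 5.0, 3.0), (10.0, 4.0, 4.0)]
# Same loop shape, but displaced by 3 Å relative to the rest of the protein.
model_loop = [(x + 3.0, y, z) for x, y, z in native_loop]
local = rmsd_local(model_loop, native_loop)
overall = rmsd_global(model_loop, native_loop)
```

In this toy case the loop conformation is exact but misplaced, so the local value is close to zero while the global value equals the 3 Å displacement.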
10 Representative examples
- Template topologies in the core identified by
SAL are quite similar to native (< 5 Å)
- The local packing of the fragments, and sometimes
the termini, is misoriented
- Rearrangement using the T force
field gives > 2 Å improvement in the aligned region
Blue: N terminus. Red: C terminus.
11 Representative examples (fail)
- Fails to model the configuration of the tail,
giving a full-length RMSD of 7.8 Å to native
- Interactions with partner chains are not
included
- Proof: cutting the first 22 residues at the
N terminus, which are associated with intermolecular
interactions, gives a core region of the first model
with 1.4 Å RMSD
12 New fold targets in CASP5
Acceptable models can be built from the initial
template alignments using TASSER
13 Strong points
- 1. The force field includes multiple sources of
knowledge-based potentials and consensus tertiary
restraints from multiple templates (consensus
spatial information)
- 2. Combination of the different types of energy
terms
- - improvement because of better correlation
between model quality and energy
- 3. Templates usually contain unphysical
alignments because chain connectivity is not
considered in the initial alignments
- - the T reassembly procedure converts these
unphysical alignments into physical models
- 4. The relative orientation of template fragments
can change only in TASSER
14 Weak points
- 1. Average sequence identity between target proteins
and the best template identified: 13%
- - challenge: correctly aligning the target sequence to
these templates
- 2. The structure alignment program SAL is not perfect
- - not guaranteed to find the best structural
alignment, because the final alignment in this
algorithm is sensitive to the initial guess of the
superposition
- 3. The representative examples have shown that the
models can be bad when interactions of the tails have to be
taken into account
15 Conclusions
- All single-domain proteins are foldable based on
the set of solved structures currently deposited
in the PDB
- The best template can be further improved by
rearranging the fragments (TASSER)
16 RMSD to native of the templates identified by the
structure alignment program SAL versus the
alignment coverage.