Title: The Status of Structural Genomics
1The Status of Structural Genomics
- Philip E. Bourne
- University of California San Diego
- pbourne_at_ucsd.edu
- http://targetdb.rcsb.org
- http://spam.sdsc.edu/sgtdb
2TargetDB - A Unique Situation
- In no other study is progress measured on a weekly basis
- Here we analyze progress and address the following questions
3Questions Addressed
- How is structural genomics doing as a science and what should we change in the future?
- What could it reveal regarding the sequence-structure-function triad given the targets under study?
- What has it revealed based on the structures solved thus far?
4Background
- There are 17 centers worldwide contributing information
- At present that is mainly target status
- The focus of groups is different
- Complete genomes
- New folds
- Specific diseases
- Specific classes of molecules
- Specific pathways
5How is structural genomics doing as a science and
what should we change in the future?
6Change in TargetDB since Paper was Submitted
- Review for new funding has taken place ?
- Target strategies have diversified
- The proportion of targets at different stages has not changed
7Mean Number of Days at Each Experimental Step
(X-ray)
There are no obvious bottlenecks at present
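A mean-days-per-step figure of this kind can be derived from dated status records. The following is a minimal sketch, assuming each target carries a list of (status, date) events. The status names, dates, and the `mean_days_per_step` helper are hypothetical illustrations, not TargetDB data or the authors' actual procedure.

```python
from collections import defaultdict
from datetime import date

# Hypothetical status history: (target_id, status, date reached).
# Status names and dates are illustrative only, not TargetDB data.
HISTORY = [
    ("T1", "selected", date(2002, 1, 1)),
    ("T1", "cloned", date(2002, 1, 31)),
    ("T1", "expressed", date(2002, 3, 2)),
    ("T2", "selected", date(2002, 2, 1)),
    ("T2", "cloned", date(2002, 2, 11)),
    ("T2", "expressed", date(2002, 4, 2)),
]


def mean_days_per_step(history):
    """Return {(from_status, to_status): mean days} across all targets."""
    # Group the dated events by target.
    by_target = defaultdict(list)
    for target, status, reached in history:
        by_target[target].append((reached, status))

    # Collect the elapsed days between consecutive statuses of each target.
    durations = defaultdict(list)
    for events in by_target.values():
        events.sort()  # chronological order
        for (d0, s0), (d1, s1) in zip(events, events[1:]):
            durations[(s0, s1)].append((d1 - d0).days)

    return {step: sum(days) / len(days) for step, days in durations.items()}


RESULT = mean_days_per_step(HISTORY)
for (start, end), mean_days in RESULT.items():
    print(f"{start} -> {end}: {mean_days:.1f} days")
```

A step whose mean stands well above the others would be a candidate bottleneck. With only a few targets per step, the mean is sensitive to outliers, so a median may be worth reporting alongside it.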
8Time Taken for a Target to Reach PDB
- Range of 3 to >19 months
- Small sample size (eg 10 months)
- Seems comparable to conventional work
9Sequence Identity of Targets Against NR for
Specific Groups
- Number of targets is variable
- Strategy and target match is questionable
10Sequence Identity to Other Targets for Each Group
11Prediction of SCOP fold by iGAP (1) and FFAS (2)
- (1) Genome Biology 2003 4(8), R51
- (2) Protein Science 2000 9, 232
- (3) Bioinformatics 2002 18(6) 788
Predictions map approximately to estimated genome
fold distributions
12What has it revealed based on the structures
solved thus far?
13Structures Released by the PDB with less than
30% Sequence Identity to Other PDB Entries
In 2002 this Accounts for About 1/3 of Structures
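The "about 1/3" figure is a simple proportion: the share of released structures whose best sequence identity to any other PDB entry falls below the cutoff in the slide title (read here as 30 percent). A minimal sketch, assuming the best identity per structure has already been computed; the `fraction_below` helper and the sample values are hypothetical and are not taken from the PDB.

```python
def fraction_below(identities, threshold=30.0):
    """Fraction of entries whose best sequence identity (%) is below threshold."""
    if not identities:
        raise ValueError("no identities given")
    return sum(1 for value in identities if value < threshold) / len(identities)


# Hypothetical best-hit identities (%) for six structures, illustrative only.
SAMPLE = [12.0, 25.0, 45.0, 60.0, 88.0, 95.0]
FRACTION = fraction_below(SAMPLE)
print(f"{FRACTION:.2f} of sample structures fall below 30% identity")
```

The same function applies to TargetDB structures by passing identities measured at the time of deposition.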
14Targets in the PDB and their sequence identity to
other structures at the time of deposition
Also approximately 1/3
15SCOP Fold Distributions in the non-redundant PDB,
TargetDB and Predicted with iGAP in several model
organisms
- a.4: RNA/DNA-binding 3-helical bundle
- b.1: immunoglobulin-like beta-sandwich
- c.1: TIM barrel
- c.37: P-loop containing nucleoside triphosphate hydrolases
16SCOP Folds vs Time
SG Begins in Earnest
As of April 2003, 17% of new folds came from
structural genomics
17Does Structural Genomics to Date Represent the Low
Hanging Fruit?
- That is, can we expect a large number of small, single-domain
globular proteins, which are likely to be the
most amenable to structure analysis?
18Number of chains in PDB files
[Figure: all files deposited into PDB in 2003 vs. all files from targetdb]
19Distribution of domains in targetdb vs. PDB
- Number of domains is determined using the PDP automatic domain assignment method
[Figure: distribution of domains, structures deposited into PDB during 2003 vs. structures from targetdb]
20Summary
- TargetDB represents a unique opportunity
- Progress appears linear
- No bottlenecks at present
- Target strategies not clear
- Targets show a somewhat different fold distribution to current PDB
- There is a modest contribution to expanding fold space
- The complexity of structures is similar to that already found in the PDB
21Acknowledgements
- SDSC/UCSD
- Charlie Allerston
- Werner Krebs
- Wilfred Li
- Ilya Shindyalov
- University College
- Zoubin Ghahramani
- Grant - NIGMS 63208
- PDB - NSF, DOE, NIH, NLM
- Rutgers University
- Lee Chen
- John Westbrook
- The Burnham Inst.
- Adam Godzik
- Iddo Friedberg
- Tong Liu
- Keck Graduate Inst
- David Wild
- S. Hwang
23E-value Distribution of Solved Structures Against
NR
24Sequence Identity of Targets Against NR
- Most targets are characterized against a known protein
- The known protein may not have a functional assignment