Title: Alternative talk for structural genomics
1Alternative talk for structural genomics
Note: these are conversions of the slides that I actually presented.
2Computational biology & bioinformatics at NESG
3We can infer 3D similarity from sequence. We
cannot systematically predict novel folds.
4Inferring 3D similarity from sequence
- we can infer 3D similarity from sequence!
- below line ??
- 30% no good for long proteins!
<2.5 Å rmsd over 80% of domain
>6 Å rmsd
C Sander & R Schneider 1991 Proteins, 9, 56-68
B Rost 1999 Prot Engng, 12, 85-94
5Build sequence-structure families
6Defining sequence-structure families
7EVA comparative modelling
Marc Marti-Renom & Andrej Sali (UCSF)
http://eva.compbio.ucsf.edu/eva/cm/
http://cubic.bioc.columbia.edu/eva
Accuracy
Coverage
V Eyrich, MA Marti-Renom, D Przybylski, A Fiser, F Pazos, A Valencia, A Sali & B Rost (2001) Bioinformatics 17, 1242-3
MA Marti-Renom, MS Madhusudhan, A Fiser, B Rost & A Sali (2002) Structure 10, 435-440
8How much do we cover today
9Structural residue coverage in reality
J Liu & B Rost 2002 Bioinformatics, 18, 922-933
10Pan-genomic target selection: need to chop into domains!
11Scooping families from proteomes, in practice
- Goal for sequence-structure clusters
- All in cluster share fold
- All with similar sequence and similar fold in same cluster
- Problems
- Domains
- Overlap
12Choose targets: single-linkage clustering
100,000 eukaryotic proteins (yeast, fly, worm, weed, human): 22 112 clusters, 46 318 in largest cluster. NONSENSE!
- no clustering on full-length proteins: we MUST chop into structural domain-like fragments
Liu, Hegyi, Acton, Montelione & Rost 2003 Proteins, in press
Liu & Rost 2004 Proteins, 55, 678-686
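Why single-linkage clustering on full-length proteins collapses into one giant cluster can be sketched on toy data. The sketch below is only an illustration under stated assumptions: the protein names (P1 to P4) and domain labels (kinase, SH3, globin) are hypothetical, and "similar" is simplified to "shares a domain label" rather than a real sequence comparison.

```python
from collections import defaultdict
from itertools import combinations


def single_linkage(items, similar_pairs):
    """Group items so that any chain of pairwise similarities joins one cluster."""
    parent = {item: item for item in items}

    def find(x):
        # Follow parent pointers to the cluster representative (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in similar_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = defaultdict(set)
    for item in items:
        clusters[find(item)].add(item)
    # Largest cluster first; ties broken alphabetically for a stable result.
    return sorted((sorted(c) for c in clusters.values()),
                  key=lambda c: (-len(c), c))


# Hypothetical toy proteome: protein -> set of domain labels (assumed, not real data).
domains = {
    "P1": {"kinase"},
    "P2": {"kinase", "SH3"},  # multi-domain protein acts as a bridge
    "P3": {"SH3"},
    "P4": {"globin"},
}

# Full-length clustering: two proteins are "similar" if they share any domain.
full_pairs = [(a, b) for a, b in combinations(sorted(domains), 2)
              if domains[a] & domains[b]]
full_length_clusters = single_linkage(sorted(domains), full_pairs)

# Fragment clustering: chop each protein into one fragment per domain first.
fragments = {f"{p}:{d}": d for p, ds in domains.items() for d in sorted(ds)}
fragment_pairs = [(a, b) for a, b in combinations(sorted(fragments), 2)
                  if fragments[a] == fragments[b]]
fragment_clusters = single_linkage(sorted(fragments), fragment_pairs)

print(full_length_clusters)
print(fragment_clusters)
```

In the full-length run, P1 and P3 land in the same cluster although they share no domain, because the two-domain protein P2 links them. After chopping, each cluster holds fragments of a single domain type, which is the property the slide asks for.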
13CHOP proteins into structural domains
Liu & Rost 2004 Proteins, 55, 678-686
14Structural domain-like fragments for entire
proteomes
Single-domain proteins: 61% in PDB, 28% in 62 proteomes
Liu, Hegyi, Acton, Montelione & Rost 2003 Proteins, 56, 188-200
Liu & Rost 2004 Proteins, 55, 678-686
15How many clusters? (3,000-16,000) Prokaryotes
enough? (yes)
16To take or not to take
Take if >50 globular residues and no known 3D structure
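The take-or-leave rule can be written as a simple filter. This is a minimal sketch, assuming each candidate fragment already carries a count of globular residues and a flag for known 3D structure; the `Fragment` class, its field names, and the example values are hypothetical and not taken from the actual target-selection pipeline.

```python
from dataclasses import dataclass

# Threshold from the slide: more than 50 globular residues.
MIN_GLOBULAR_RESIDUES = 50


@dataclass
class Fragment:
    name: str                # hypothetical identifier
    globular_residues: int   # residues predicted to be globular (assumed precomputed)
    has_known_3d: bool       # True if a 3D structure is already known or inferable


def take(fragment: Fragment) -> bool:
    """Return True if the fragment qualifies as a structural genomics target."""
    return (fragment.globular_residues > MIN_GLOBULAR_RESIDUES
            and not fragment.has_known_3d)


candidates = [
    Fragment("frag_a", 120, False),  # long enough, no structure: take
    Fragment("frag_b", 50, False),   # exactly 50 is not more than 50: skip
    Fragment("frag_c", 200, True),   # structure already known: skip
]
targets = [f.name for f in candidates if take(f)]
print(targets)
```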
17Renormalize: today = 0, end = 100
Today = 0 line
PSI3 = 100 line
J Liu & B Rost 2002 Bioinformatics, 18, 922-933
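The slide does not spell out the renormalization, so the following is only one plausible reading: a linear rescaling that maps today's coverage to 0 and the end-point coverage to 100. The function name and the example coverage values are assumptions for illustration.

```python
def renormalize(coverage: float, today: float, end: float) -> float:
    """Linearly rescale coverage so that `today` maps to 0 and `end` maps to 100."""
    if end == today:
        raise ValueError("today and end coverage must differ")
    return 100.0 * (coverage - today) / (end - today)


# Hypothetical coverage fractions, chosen only to illustrate the scale.
today_coverage = 0.25
end_coverage = 0.75

print(renormalize(today_coverage, today_coverage, end_coverage))
print(renormalize(0.5, today_coverage, end_coverage))
print(renormalize(end_coverage, today_coverage, end_coverage))
```

On this scale, a value of 50 would mean that half of what remains between today and the end point has been covered.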
18Structural genomics will make a difference
0 = today, 100 = what remains to be done
residues
fragments
proteins
19Pfam 5000 vs. use all
residues
fragments
proteins
20Eukaryotes needed to complete coverage?
How many proka only? How many euka only? How many
mixed?
21We cannot only do prokaryotes!
22Adolescent already successful!
23Consortia in adolescence successful!
24Target selection really successful!
StrX
PDB all
25Significant contribution to unique!
Every third unique protein from Structural
Genomics
26Does multiplexing help?
Date 2003-07-28
Multiplex DOUBLES success rate!
27Structural Genomics changes ... e.g. our map of
sequence space
28Dynamic reorganization through 3D!
29Surprising conclusions
- Most proteins multi-domain
- Most domains about 100 residues long
- 10,000 sequence-unique structures double the structural coverage
- BUT: Structural genomics can do that before 2010
- Structural Genomics has already changed the way
we organize structure space, more to come
31X = http://www.rostlab.org
- PredictProtein (PP): X/predictprotein/
- META-PP: X/meta/submit_meta.html
- EVA: X/eva/
- services: X/services/
- LOCnet: X/services/locnet/
- PredictNLS: X/predictNLS/
- ISIS: X/services/isis/
- databases: X/db/
- PEP: X/db/PEP/
- CellCycleDB: X/db/cellcycledb/
- NMPdb: X/db/nmpdb/