Structural Genomics: - PowerPoint PPT Presentation

Transcript and Presenter's Notes

1
Structural Genomics: Case studies in assigning
function from structure
James D Watson (watson_at_ebi.ac.uk)
2
Structural Genomics: Collaborators
MCSG - Midwest Center for Structural Genomics
SPINE - Structural Proteomics in Europe
SGC - Structural Genomics Consortium
3
Structural Genomics: Aims
Pathogens and disease
Automation / High Throughput
?
Coverage of Fold Space
Human Proteins
4
Proteins: known sequences and 3D structures
5,500 non-redundant structures
1.3m non-redundant protein sequences
260,000 homology models
[Figure: example protein sequences]
5
Proteins: known sequences and 3D structures
5,500 non-redundant structures
10 unknown
3D structures of 16,000 carefully selected
proteins
Homology models
6
Protein Function
  • Protein function has many definitions
  • Biochemical function - the biochemical role of
    the protein, e.g. serine protease
  • Biological function - the role of the protein in
    the cell/organism, e.g. digestion, blood clotting,
    fertilisation

7
Function through homology
8
Template Methodology
  • Use 3D templates to describe the active site of
    the enzyme - analogous to 1-D sequence motifs
    such as PROSITE, but in 3-D
  • (Wallace et al., 1997)
  • defines a functional site
  • search a new structure for a functional site
  • search a database of structures for similar
    clusters
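
The search step can be illustrated with a small sketch. It assumes a template is just an ordered list of 3D points (one representative atom per residue) and scores a candidate cluster by the RMS difference between corresponding inter-point distances, which avoids an explicit superposition. This is a simplification for illustration, not the published template-matching method; the function names and coordinates are hypothetical.

```python
import math
from itertools import combinations

def pairwise_distances(points):
    """All inter-point distances, in a fixed pair order."""
    return [math.dist(a, b) for a, b in combinations(points, 2)]

def distance_rmsd(template, candidate):
    """RMS difference between corresponding inter-point distances.

    Points must be listed in the same residue order in both sets.
    """
    if len(template) != len(candidate) or len(template) < 2:
        raise ValueError("need two point sets of equal size (>= 2 points)")
    pairs = zip(pairwise_distances(template), pairwise_distances(candidate))
    diffs = [(t - c) ** 2 for t, c in pairs]
    return math.sqrt(sum(diffs) / len(diffs))

# Hypothetical three-residue template (coordinates in Angstrom)
template = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
# Same geometry, rigidly translated: should score as identical
shifted = [(x + 10.0, y - 2.0, z + 5.0) for x, y, z in template]
# One residue displaced by 2 Angstrom: a local distortion
distorted = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 6.0, 0.0)]

print(distance_rmsd(template, shifted))
print(distance_rmsd(template, distorted))
```

A distance-based score is invariant to rotation and translation, but it cannot tell a cluster from its mirror image, which is one reason real template searches superpose coordinates instead.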

9
SiteSeer: reverse templates
10
Problems with template methods
  • Too many hits (hundreds, thousands or even tens
    of thousands)
  • RMSD alone rarely discriminates true from false
    positives
  • Local distortion in structure may give a large
    RMSD
  • Top hit is rarely the correct hit, even in
    obvious cases

11
An example
12
Enzyme active site templates
Hits for 1hsk
102. E.C.1.1.1.158 2.19 Å UDP-N-acetylmuramate dehydrogenase
13
Comparison of template environments
14
Comparison of template environments
15
Comparison of template environments
Identical residues in neighbourhood
Template structure 1mbb
Query structure 1hsk
16
Comparison of template environments
Similar residues in neighbourhood
Template structure 1mbb
Query structure 1hsk
17
Results for 1hsk
Hit  E.C. number    RMSD  Score  Enzyme
1.   E.C.1.1.1.158  2.08  209.1  UDP-N-acetylmuramate dehydrogenase
2.   E.C.3.2.1.14   2.13  146.0  Chitinase A (chitodextrinase; 1,4-beta-poly-N-acetylglucosaminidase; poly-beta-glucosaminidase)
3.   E.C.3.2.1.17   1.92  142.4  Turkey lysozyme
4.   E.C.3.2.1.17   1.89  138.7  Hen lysozyme
5.   E.C.3.5.1.26   1.47  132.3  Aspartylglucosylaminidase
6.   E.C.3.2.1.3    1.54  131.1  Glucan 1,4-alpha-glucosidase
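
The effect of ranking by score rather than by RMSD can be checked directly on the six hits listed for 1hsk. The sketch below only re-sorts the values from the results; the tuple layout and variable names are illustrative choices.

```python
# (E.C. number, RMSD, score, enzyme) as listed in the results for 1hsk
hits = [
    ("1.1.1.158", 2.08, 209.1, "UDP-N-acetylmuramate dehydrogenase"),
    ("3.2.1.14", 2.13, 146.0, "Chitinase A"),
    ("3.2.1.17", 1.92, 142.4, "Turkey lysozyme"),
    ("3.2.1.17", 1.89, 138.7, "Hen lysozyme"),
    ("3.5.1.26", 1.47, 132.3, "Aspartylglucosylaminidase"),
    ("3.2.1.3", 1.54, 131.1, "Glucan 1,4-alpha-glucosidase"),
]

by_rmsd = sorted(hits, key=lambda h: h[1])                  # lowest RMSD first
by_score = sorted(hits, key=lambda h: h[2], reverse=True)   # highest score first

# 1-based rank of the correct enzyme under each ordering
correct = "UDP-N-acetylmuramate dehydrogenase"
rank_by_rmsd = [h[3] for h in by_rmsd].index(correct) + 1
rank_by_score = [h[3] for h in by_score].index(correct) + 1
print(rank_by_rmsd, rank_by_score)  # 5 1
```

Sorted by RMSD alone, the correct enzyme is fifth of the six hits; sorted by score, it is first.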

18
ProFunc: function from 3D structure
19
Large scale analysis
  • Created an edited version of the target database
    from the PDB, keeping only entries with status
    "In PDB"
  • Extract all PDB codes for each Structural
    Genomics group
  • Extract prior knowledge (Header, Title, Jrnl,
    etc.)
  • Find any associated GOA annotation
  • Classify each structure by whether function is
    known, unknown or limited info
  • Run ProFunc in a batch process on all codes
    (560)
  • Extract summary results from each analysis
  • Compare to prior knowledge and estimate success
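
The classification step might look something like the sketch below. The keyword lists, the function name and the rule for combining title text with GO annotation are all assumptions made for illustration; the slides do not say how the known / unknown / limited info classes were actually assigned.

```python
# Hypothetical keyword lists; the real criteria are not given in the slides.
UNKNOWN_MARKERS = ("hypothetical", "unknown function", "uncharacterized")
LIMITED_MARKERS = ("putative", "probable", "-like")

def classify_prior_knowledge(title, go_terms):
    """Return 'known', 'limited info' or 'unknown' for one PDB entry.

    title    : text taken from the PDB Header/Title records
    go_terms : list of GO identifiers found via GOA (may be empty)
    """
    text = title.lower()
    if any(marker in text for marker in UNKNOWN_MARKERS):
        # An annotation on a "hypothetical" protein still counts as a hint
        return "limited info" if go_terms else "unknown"
    if any(marker in text for marker in LIMITED_MARKERS) or not go_terms:
        return "limited info"
    return "known"

# Placeholder GO identifiers, not real annotations
print(classify_prior_knowledge("Hypothetical protein MT0777", []))
print(classify_prior_knowledge("Probable glutaminase", ["GO:placeholder"]))
print(classify_prior_knowledge("UDP-N-acetylmuramate dehydrogenase", ["GO:placeholder"]))
```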

20
Number of deposits to the TargetDB by Structural
Genomics group (Total of 577 unique entries)
March 2004
21
PDB Blast
  • Run query sequences against the PDB using BLAST
  • Filter out matches released AFTER the query
    sequence
  • Any such hits are ignored in subsequent analyses
  • Still get significant matches
  • Why?
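
The date filter can be sketched as follows, assuming each BLAST hit carries the release date of the matched PDB entry. The hit identifiers, dates and dictionary keys are hypothetical.

```python
from datetime import date

def backdated_hits(query_released, hits):
    """Keep only hits whose PDB entry was released on or before the query.

    Hits released after the query could not have informed a prediction
    made at deposition time, so they are dropped.
    """
    return [h for h in hits if h["released"] <= query_released]

# Hypothetical query release date and BLAST hits
query_released = date(2003, 6, 1)
hits = [
    {"id": "hitA", "released": date(2001, 3, 15), "evalue": 1e-40},
    {"id": "hitB", "released": date(2004, 1, 20), "evalue": 1e-80},
]

kept = backdated_hits(query_released, hits)
print([h["id"] for h in kept])  # ['hitA']
```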

22
InterPro Scan
  • InterPro scan on proteins of known function
  • Cannot backdate the InterPro database
  • Each protein is essentially picking up itself

23
Function of query structure known
24
Limited Functional Info
25
Unknown Function
26
The Good, the Not So Good and the Ugly
Three examples show the varying levels of
information that can be retrieved from
structures
27
The Good: BioH structure (MCSG)
One very strong hit
Function Discovered
28
The Not So Good: APC1040 (MCSG)
  • Assigned as a probable glutaminase
  • Most methods suggest β-lactamase activity
  • No match to PROSITE patterns

Function being assayed
APC1040, residues 70-85:
70 F-T-M-Q-S-I-S-K-V-I-S-F-I-A-A-C 85
Class A pattern:
[FY]-x-[LIVMFY]-x-S-[TV]-x-K-x(4)-[AGLM]-x(2)-[LC]
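
Why the APC1040 segment fails the pattern can be checked position by position. The sketch below assumes the pattern on the slide is in PROSITE syntax with its square brackets lost in extraction, i.e. [FY]-x-[LIVMFY]-x-S-[TV]-x-K-x(4)-[AGLM]-x(2)-[LC]; the helper names are illustrative, and only the simple elements used here (literal, x, x(n), [..]) are supported.

```python
import re

# One pattern element: x, a bracketed set or a single residue, optional (n)
ELEMENT = re.compile(r"(x|\[[A-Z]+\]|[A-Z])(?:\((\d+)\))?")

def expand(pattern):
    """Expand a PROSITE-style pattern into one entry per sequence position."""
    positions = []
    for element in pattern.split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, count = m.groups()
        positions.extend([core] * int(count or 1))
    return positions

def to_regex(pattern):
    """Translate the pattern into an equivalent regular expression."""
    return "".join("." if p == "x" else p for p in expand(pattern))

def mismatches(pattern, segment, first_residue=1):
    """List (residue number, residue, allowed set) for each violated position."""
    positions = expand(pattern)
    if len(positions) != len(segment):
        raise ValueError("segment length does not match pattern length")
    return [
        (first_residue + i, residue, allowed)
        for i, (allowed, residue) in enumerate(zip(positions, segment))
        if allowed != "x" and residue not in allowed.strip("[]")
    ]

pattern = "[FY]-x-[LIVMFY]-x-S-[TV]-x-K-x(4)-[AGLM]-x(2)-[LC]"
segment = "FTMQSISKVISFIAAC"  # APC1040 residues 70-85

print(re.fullmatch(to_regex(pattern), segment))  # None: no exact match
print(mismatches(pattern, segment, first_residue=70))
```

Under this reading of the pattern, the segment differs at two positions only (Ile 75 where T or V is required, and Ile 82 where A, G, L or M is required), which is consistent with a near miss rather than a PROSITE match.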
29
The Ugly: MT0777 (MCSG)
Hypothetical protein from Methanobacterium
thermoautotrophicum
  • No sequence motifs
  • Residue conservation is poor.
  • Fold associated with many functions (Rossmann
    fold)
  • Template methods fail

Function Unknown
30
Future Work
  • Improvements to scoring system and additional
    templates
  • Further utilisation of SOAP services as they
    become available (e.g. KEGG API service)
  • Possible adaptation to use as part of a larger
    workflow or in LIMS systems (Taverna and MyGrid)
  • More truly predictive analyses being developed
    (e.g. Electrostatics, ligand prediction,
    catalytic residue prediction)

31
Detection of DNA-binding proteins (with HTH
motif) using structural motifs and electrostatics
(Hugh Shanahan)
  • Combine electrostatics with HTH structural
    templates.
  • Can detect HTH DNA-binding proteins only.
  • 1/3 of DNA-binding protein families have an HTH
    motif.
  • Use linear predictor as discriminant.
  • Find comparable true positive rate (80%) with
    more complicated methods.
  • Very low (< 0.01) false positive rate.
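
A linear predictor used as a discriminant can be sketched in a few lines. The two features (template RMSD and an electrostatic score), the weights, the bias and the threshold below are all hypothetical; in practice they would be fitted to a training set of known DNA-binding and non-binding proteins.

```python
def linear_score(features, weights, bias=0.0):
    """Weighted sum of the features plus a bias term."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return bias + sum(w * x for w, x in zip(weights, features))

def predicts_dna_binding(template_rmsd, electrostatic_score,
                         weights=(-1.0, 1.0), bias=0.0, threshold=0.0):
    """Classify as DNA-binding when the linear score exceeds the threshold.

    A low RMSD to the HTH template and a high electrostatic score both
    push the score up (hypothetical weights: -1.0 and +1.0).
    """
    score = linear_score((template_rmsd, electrostatic_score), weights, bias)
    return score > threshold

print(predicts_dna_binding(0.5, 2.0))   # good template fit, strong electrostatics
print(predicts_dna_binding(2.5, 0.5))   # poor template fit, weak electrostatics
```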

32
Ligand Prediction
Can active site geometry, shape,
physical-chemical properties etc. be used to
predict the preferred ligand class?
Active Site Ligand description/fingerprinting
methods
  • Spherical Harmonics
  • Hybrid Ellipsoids

33
Spherical Harmonics (Richard Morris)
Spherical t-designs
The computation of Legendre polynomials of high
order requires a robust integration scheme
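
As a small illustration of what a spherical t-design does: a t-design is a finite point set on the sphere whose plain average reproduces the exact spherical average of every polynomial up to degree t. The six vertices of the octahedron form a 3-design, so they integrate x^2 exactly but fail for x^4. This is a textbook example chosen for the sketch, not the design used in the work described here.

```python
# The six octahedron vertices: a spherical 3-design on the unit sphere
octahedron = [
    (1, 0, 0), (-1, 0, 0),
    (0, 1, 0), (0, -1, 0),
    (0, 0, 1), (0, 0, -1),
]

def design_average(points, f):
    """Equal-weight average of f over the design points."""
    return sum(f(x, y, z) for x, y, z in points) / len(points)

# Exact averages over the unit sphere: <x^2> = 1/3, <x^4> = 1/5
avg_x2 = design_average(octahedron, lambda x, y, z: x ** 2)
avg_x4 = design_average(octahedron, lambda x, y, z: x ** 4)

print(avg_x2)  # equals 1/3: degree 2 is integrated exactly
print(avg_x4)  # 1/3 instead of 1/5: degree 4 exceeds the design strength
```

Higher-order expansions therefore need designs of correspondingly higher strength, which means more points.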
34
Hybrid Ellipsoids (Rafael Najmanovich)
  • Every shape can be modelled by a set of hybrid
    ellipsoids
  • The parameters describe location and a,b,c of the
    ellipsoid and a smear factor
  • Similar parameters mean similar active sites and
    ligands

35
Predicting Catalytic Residues (Alex Gutteridge)
  • Aims
  • To predict the location of the active site in an
    enzyme structure.
  • To predict the catalytic residues of an enzyme.
  • How?
  • Train a neural network to identify catalytic
    residues.
  • Cluster high scoring residues to find the active
    site.
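
The clustering step can be sketched as below, assuming the network has already produced a score per residue. The score cutoff, the distance cutoff, the single-linkage rule and the example residues with their coordinates are all hypothetical choices for illustration.

```python
import math

def cluster_high_scoring(residues, score_cutoff=0.5, distance_cutoff=8.0):
    """Single-linkage clustering of residues scoring at or above the cutoff.

    residues : list of (name, score, (x, y, z))
    Returns clusters (lists of residues), largest first.
    """
    selected = [r for r in residues if r[1] >= score_cutoff]
    clusters = []
    for res in selected:
        near, far = [], []
        for cluster in clusters:
            close = any(math.dist(res[2], other[2]) <= distance_cutoff
                        for other in cluster)
            (near if close else far).append(cluster)
        # Merge the new residue with every cluster it touches
        merged = [res] + [r for cluster in near for r in cluster]
        clusters = far + [merged]
    return sorted(clusters, key=len, reverse=True)

# Hypothetical scores and coordinates (Angstrom)
residues = [
    ("H57", 0.90, (0.0, 0.0, 0.0)),
    ("D102", 0.80, (5.0, 0.0, 0.0)),
    ("S195", 0.95, (0.0, 6.0, 0.0)),
    ("K10", 0.70, (40.0, 40.0, 40.0)),   # high score but isolated
    ("A20", 0.10, (1.0, 1.0, 1.0)),      # nearby but low score
]

clusters = cluster_high_scoring(residues)
print(sorted(r[0] for r in clusters[0]))  # ['D102', 'H57', 'S195']
```

The largest cluster is taken as the predicted active site; isolated high-scoring residues end up in clusters of their own.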

36
Workflows and Taverna (Tom Oinn)
  • Most procedures used now follow a workflow type
    scheme
  • Taverna allows users to pick elements from
    services to create their own workflows for
    automation of complex sets of procedures.
  • Removes the need to write complex scripts

Beta 9 release available at http://taverna.sourceforge.net/
37
Acknowledgements
  • Janet Thornton
  • Christine Orengo
  • Roman Laskowski - ProFunc
  • Richard Morris - InterPro search, Spherical
    Harmonics
  • Gail Bartlett, Craig Porter - Enzyme Templates
  • Alex Gutteridge - Catalytic Residue Prediction
  • Sue Jones - HTH motifs
  • Hugh Shanahan - DNA binding, Electrostatics
  • Jonathan Barker - JESS
  • Hannes Ponstingl - PITA
  • Rafael Najmanovich - Hybrid Ellipsoids
  • Martin Senger, Siamak Sobhany - SOAP; Tom Oinn -
    Taverna
  • Annabel Todd and Russell Marsden - UCL
  • MCSG consortium for lots of structures, plus many
    more at EBI and UCL
  • Work was supported by NIH grant (GM 62414) and by
    the US DoE under contract (W-31-109-Eng-38)