Extracting and Exploiting Structural Patterns in Proteins, especially Relating to Function


1
Extracting and Exploiting Structural Patterns in
Proteins, especially Relating to Function
  • Janet Thornton
  • James Watson, Roman Laskowski - EBI
  • Adel Golovin, Kim Henrick - EBI MSD
  • David Leader, James Milner-White Glasgow
  • Andrzej Joachimiak, Aled Edwards MCSG
  • (Mid-West Centre for Structural Genomics)

2
Outline
  • Structural Motifs
  • PDBsum
  • MSDmotif
  • Functional Motifs
  • Catalytic Site Atlas
  • DNA Binding Motifs
  • Automated templates
  • Reverse Templates
  • From Structure to Function? - ProFunc

3
Structural Motifs
  • Structural motifs are commonly occurring small
    sections of proteins that are distinguished by:
  • Sequence: Gly-X-Gly
  • Conformation: φ,ψ angles
  • Secondary structure: helix, βαβ unit
  • Function: catalytic triad, calcium binding site
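
The conformation criterion in the list above is expressed as backbone dihedral angles (phi and psi). As a minimal sketch of how such an angle can be computed from four atom positions, the function below uses the standard atan2 formulation. The function name and the toy coordinates are illustrative assumptions; this is not the code used by DSSP, Promotif or MSDmotif.

```python
import math

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) about the p1-p2 bond.

    For a residue i, phi uses the atoms C(i-1), N(i), CA(i), C(i)
    and psi uses N(i), CA(i), C(i), N(i+1).
    """
    b0 = [a - b for a, b in zip(p0, p1)]      # vector p1 -> p0
    b1 = [b - a for a, b in zip(p1, p2)]      # vector p1 -> p2 (rotation axis)
    b2 = [b - a for a, b in zip(p2, p3)]      # vector p2 -> p3

    length = math.sqrt(sum(x * x for x in b1))
    b1 = [x / length for x in b1]

    def dot(u, v):
        return sum(x * y for x, y in zip(u, v))

    def cross(u, v):
        return [u[1] * v[2] - u[2] * v[1],
                u[2] * v[0] - u[0] * v[2],
                u[0] * v[1] - u[1] * v[0]]

    # Components of b0 and b2 perpendicular to the rotation axis.
    v = [x - dot(b0, b1) * y for x, y in zip(b0, b1)]
    w = [x - dot(b2, b1) * y for x, y in zip(b2, b1)]

    return math.degrees(math.atan2(dot(cross(b1, v), w), dot(v, w)))


# Toy coordinates (not taken from any PDB entry): a right-angle twist.
angle = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
```

A motif search on conformation would then compare the computed phi/psi values of consecutive residues against the ranges that define each motif.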

4
Examples of Structural Motifs
Alpha-Beta Motif
Beta Turn
Schellman Loop
Beta Bulge (classic)
Nest
Beta Bulge Loop
5
Structural Motifs
  • They may be continuous along the chain (e.g.
    GXG) or discontinuous (e.g. catalytic triad)
  • Historically motifs were identified and analysed
    in an effort to understand the relationship
    between protein sequence and structure, to
    improve prediction methods. They are also used to
    assign function (Prosite).
  • Many motifs can now be recognised automatically
    from coordinates, using programmes such as DSSP
    and Promotif
  • PDB files can be annotated with these structural
    motifs e.g. in PDBsum

6
http://www.ebi.ac.uk/thornton-srv/databases/pdbsum/
Roman Laskowski
7
Example page
8
Protein detail
9
MSD motif: http://www.ebi.ac.uk/msd-srv/msdmotif
  • Adel Golovin
  • Currently alpha test
  • Full Release probably Oct 2005

PDB 1gci
10
MSD motif
  • Small 3D motifs from J.Milner-White search/view
  • Secondary structure patterns (HTH) search/view
  • ?,?,? based search/view
  • Ligands and their environment search/view
  • Catalytic sites search/view
  • Blast sequence search/view
  • Prosite compliant patterns search/view
  • 3D multiple alignment

11
MSDmotif options
12
Small motifs
Alpha-Beta Motif
Nest
ST staple
11 motifs in total (Prof James Milner-White):
http://doolittle.ibls.gla.ac.uk:9006/david/ProteinMotifDB.html
13
Motifs In MSDmotif (1)
Alpha-Beta Motif
Beta Turn
Schellman Loop
Beta Bulge (classic)
Nest
Beta Bulge Loop
14
Motifs In MSDmotif (2)
Asx Motif
ST Motif
Asx Turn
ST Turn
ST Staple
15
Statistics provided by MSDmotif: ST motif
  a) Amino acid occurrence at each position
  b) Correlation between side chain charge and residue
    position
  c) Motif parameter variation

16
Hit List after clicking
17
Small motifs: 3D alignment from different
families
ST-staple
18
MSDmotif options
19
Secondary structure patterns
Where N binds sugar Man or Nag
20
?,?,? search
PDB 1gci
Ideal for short loops search
21
Example of a search using MSDmotif
PDB 1gci: Subtilases family
PDB 1f5p: Globins family
Phi/Psi Search using MSDmotif
Other Subtilases
Calcium binding site
22
Sequence search
Zn binding pattern: CXXCXXXFXXXXXLXXHXXXH
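
As a rough illustration of what a pattern-based sequence search does, the sketch below converts the pattern from this slide (with X read as "any residue") into a regular expression and reports every match, including overlapping ones. The function names and the test sequence are hypothetical, and this is not how MSDmotif itself implements the search.

```python
import re

ZN_PATTERN = "CXXCXXXFXXXXXLXXHXXXH"  # pattern quoted on the slide

def pattern_to_regex(pattern):
    """Compile a pattern in which X stands for any single residue."""
    body = "".join("." if ch == "X" else re.escape(ch) for ch in pattern)
    # The lookahead lets overlapping occurrences be reported as well.
    return re.compile("(?=(" + body + "))")

def find_motif(sequence, pattern):
    """Return (0-based start, matched segment) for every occurrence."""
    regex = pattern_to_regex(pattern)
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence)]


# Hypothetical sequence built only to contain one match.
segment = "C" + "AA" + "C" + "AAA" + "F" + "AAAAA" + "L" + "AA" + "H" + "AAA" + "H"
hits = find_motif("MK" + segment + "GG", ZN_PATTERN)
```

Fixed-length patterns like this one map directly onto a regular expression; Prosite-style patterns with variable gaps would need a slightly richer translation.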
23
3D alignment
24
MSD motif
  • Available in alpha version
  • http://www.ebi.ac.uk/msd-srv/msdmotif
  • Will be published later this year
  • Incremental weekly update
  • 20 G disk space on Oracle DB, linear dependency
  • 0.8 M per PDB
  • Web application server with J2EE servlet engine
  • NCBI Blast
  • Adel Golovin
  • Kim Henrick

25
Outline
  • Structural Motifs
  • PDBsum
  • MSDmotif
  • Functional Motifs
  • Catalytic Site Atlas
  • DNA Binding Motifs
  • Automated templates
  • Reverse Templates
  • From Structure to Function? - ProFunc

26
Catalytic Site Atlas
  • Taken from primary literature
  • β-lactamase Class A
  • EC 3.5.2.6
  • PDB 1btl
  • Reaction: β-lactam + H2O → β-amino acid
  • Active site residues: S70, K73, S130, E166
  • Plausible mechanism

27
  • The Catalytic Site Atlas: a resource of catalytic
    sites and residues identified in enzymes using
    structural data.
  • Craig T. Porter, Gail J. Bartlett, and Janet M.
    Thornton
  • Nucl. Acids Res. 2004; 32: D129-D133.
  • http://www.ebi.ac.uk/thornton-srv/databases/CSA

28
  • Annotates catalytic residues in the PDB
  • Based on a dataset of 514 enzyme families
  • Representative catalytic site for each family
  • Homologues assigned by Psi-BLAST
  • Limited substitution allowed.
  • Homologues updated monthly.
  • Literature references
  • Data also available via MSDsite
  • http://www.ebi.ac.uk/thornton-srv/databases/CSA
  • http://www.ebi.ac.uk/msd-srv/msdsite

29
3-D templates
  • Use 3D templates to describe the active site of
    the enzyme
  • analogous to 1-D sequence motifs such as PROSITE,
    but in 3-D
  • Sequence position independent
  • Captures essence of functional site in protein

30
Pepsin
31
Aspartic Proteinase - Active Site residues -
DTGx2
Eukaryotic Fungal Aspartic Proteinases
all-atom DTG-DTG Template

32
Aspartic Proteases Active Site Template
Asp CO2
Gly Cα
A template of 8 atoms is sufficient to
identify all Aspartic Proteinases
Asp Oδ
Gly Cα
Thr/Ser Oγ
Thr Oγ
33
Aspartic Protease Template Search against all PDB
green = true, red = false
34
TEmplate Search and Superposition (TESS)
Wallace et al., 1997
  • defines a functional site as a sequence-independent
    set of atoms in 3-D space
  • search a new structure for a functional site
  • search a database of structures for similar
    clusters

e.g. serine proteinase, catalytic triad
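
To make the idea of a sequence-independent template concrete, here is a small brute-force sketch: a template is a set of typed atoms, and a structure matches if it contains atoms of the same types whose pairwise distances agree with the template within a tolerance. This is a deliberately simplified stand-in and not the TESS algorithm itself; the function name, the tolerance and all coordinates are illustrative assumptions. Distance matching alone also cannot tell a site from its mirror image.

```python
import math
from itertools import combinations, product

def match_template(template, atoms, tol=0.5):
    """Find atom sets in a structure that reproduce a template's geometry.

    template: list of (res_name, atom_name, xyz)
    atoms:    list of (res_id, res_name, atom_name, xyz)
    Returns a list of tuples of residue ids, in template order.
    """
    # Candidate structure atoms for each template atom, by type only.
    candidates = [[a for a in atoms if (a[1], a[2]) == (t[0], t[1])]
                  for t in template]
    hits = []
    for combo in product(*candidates):
        # Each structure atom may be used only once.
        if len({(a[0], a[2]) for a in combo}) != len(combo):
            continue
        pairs = combinations(range(len(template)), 2)
        if all(abs(math.dist(template[i][2], template[j][2])
                   - math.dist(combo[i][3], combo[j][3])) <= tol
               for i, j in pairs):
            hits.append(tuple(a[0] for a in combo))
    return hits


# Toy catalytic-triad-like template and structure (coordinates invented).
triad = [("SER", "OG", (0.0, 0.0, 0.0)),
         ("HIS", "NE2", (3.0, 0.0, 0.0)),
         ("ASP", "OD1", (3.0, 4.0, 0.0))]
structure = [(57, "HIS", "NE2", (10.0, 13.0, 10.0)),
             (102, "ASP", "OD1", (14.0, 13.0, 10.0)),
             (195, "SER", "OG", (10.0, 10.0, 10.0)),
             (214, "SER", "OG", (30.0, 30.0, 30.0))]
found = match_template(triad, structure)
```

Because residues are matched by type and geometry rather than by sequence position, the same template can pick out sites in proteins with unrelated folds. The brute-force enumeration here scales poorly, which is why practical tools rely on faster search strategies.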
35
Serine Proteinase templates
  • A trypsin-based template of 7 atoms was able to
    identify almost all serine proteinases in the PDB,
    including subtilisin
  • It also identified active sites of several other
    functionally distinct enzyme families: serine
    carboxypeptidase, acetylcholine esterase, lipase,
    dehalogenase
  • The catalytic triad has evolved independently
    many times

36
Active site convergence
Trypsin
Subtilisin
38
3D Templates to Characterise Functional Sites
Template searches
39
Database of enzyme active site templates
189 templates

Carbamoylsarcosine amidohydrolase
Ser-His-Asp catalytic triad
Dihydrofolate reductase
40
DNA
Protein

41
DNA-binding Motifs
  • Helix-Turn-Helix (HTH)
  • Standard HTH
  • Winged helix
  • Beta Sheet
  • Zinc-finger

42
Prediction of DNA Binding Function using
Structural Motifs
  • Predicting function from structure
  • Structural motifs
  • Helix-Turn-Helix (HTH)
  • Bind in major groove
  • Carboxyl terminal helix - DNA recognition
  • 1/3 of DNA-binding protein families (16/54)
  • Brennan and Matthews, 1989; Brennan, 1991

43
HTH Motif Proteins
Catabolic activator protein (1ber)
Lambda repressor/operator complex (1lmb)
44
HTH Motif Templates
3D template library (E.g. 1berA16-36)
45
Predicting DNA binding function
  • Scanning template library against 3D structures
  • One template T (length n) is scanned against protein
    P (length m); the RMSD after optimal superposition
    is calculated at each of the m-n+1 possible
    positions in P
  • Calculate lowest RMSD for optimal superposition
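
The scanning procedure described above can be sketched as follows, assuming each structure is given as an ordered list of C-alpha coordinates. The superposition step uses Horn's quaternion method with a simple power iteration so that the example needs only the standard library; the choice of method, the function names and the iteration count are assumptions made for illustration, not details taken from the presentation.

```python
import math

def centred(points):
    """Translate points so that their centroid sits at the origin."""
    n = len(points)
    c = [sum(p[k] for p in points) / n for k in range(3)]
    return [[p[k] - c[k] for k in range(3)] for p in points]

def superposed_rmsd(a, b, iterations=200):
    """RMSD of two equal-length point lists after optimal rigid superposition."""
    a, b = centred(a), centred(b)
    n = len(a)
    # Correlation matrix S[i][j] = sum of a_i * b_j over all point pairs.
    S = [[sum(p[i] * q[j] for p, q in zip(a, b)) for j in range(3)]
         for i in range(3)]
    (xx, xy, xz), (yx, yy, yz), (zx, zy, zz) = S
    # Horn's symmetric 4x4 matrix; its largest eigenvalue gives the best fit.
    N = [[xx + yy + zz, yz - zy, zx - xz, xy - yx],
         [yz - zy, xx - yy - zz, xy + yx, zx + xz],
         [zx - xz, xy + yx, -xx + yy - zz, yz + zy],
         [xy - yx, zx + xz, yz + zy, -xx - yy + zz]]
    # Shifting by the Frobenius norm makes the wanted eigenvalue dominant.
    shift = math.sqrt(sum(v * v for row in N for v in row))
    vec = [1.0, 0.5, 0.25, 0.125]
    lam = 0.0
    for _ in range(iterations):
        w = [sum(N[i][j] * vec[j] for j in range(4)) + shift * vec[i]
             for i in range(4)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        vec = [x / norm for x in w]
        # Rayleigh quotient: current estimate of the largest eigenvalue.
        lam = sum(vec[i] * sum(N[i][j] * vec[j] for j in range(4))
                  for i in range(4))
    total = sum(x * x for p in a + b for x in p)
    return math.sqrt(max(0.0, (total - 2.0 * lam) / n))

def scan_template(template, protein):
    """Slide the template along the protein; return (best position, RMSD)."""
    n, m = len(template), len(protein)
    if m < n:
        raise ValueError("protein is shorter than the template")
    scores = [superposed_rmsd(template, protein[i:i + n])
              for i in range(m - n + 1)]
    best = min(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]
```

For a template of length n and a protein of length m this evaluates exactly m-n+1 windows, and the lowest value is the score that would be compared against an RMSD threshold.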

46
Ideal RMSD distribution
47
RMSD Distributions with HTH templates
RMSD threshold 1.2Å
False positives: 831/23,506 (3.5%)
False negatives: 2/142 (1.4%)
48
HTH Motif Extended Templates
  • Extend templates by adding 2 residues to start
    and end
  • 1berA16-36
  • 1berA14-38

49
RMSD Distributions with extended HTH templates
RMSD threshold 1.2Å
False positives: 110/23,506 (0.5%)
False negatives: 2/144 (1.4%)
50
Comparison of RMSD Distributions
51
HTH Accessible Surface Area
ASA threshold of 990Å² reduced false positives from
110 to 80. False positive rate of 0.3%
(80/23,506)
52
Summary
  • Structural template library of 144 HTH motifs
  • Minimum RMSD for optimal superpositions on whole
    protein structures based on Cα coordinates
  • Thresholds of 1.2Å RMSD and 990Å² ASA
  • Hit rate of 98.6%; false positive rate of 0.3%
  • Recognition across sequence families and fold
    families
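
The rates quoted in this summary follow from the counts given on the preceding slides. A short check, using only those counts (the helper name is an assumption):

```python
def percent(count, total):
    """Percentage rounded to one decimal place."""
    return round(100.0 * count / total, 1)

NON_HTH_STRUCTURES = 23506   # structures scanned that lack an HTH motif
HTH_TEMPLATES = 144          # size of the template library

false_positive_rates = {
    "original templates": percent(831, NON_HTH_STRUCTURES),
    "extended templates": percent(110, NON_HTH_STRUCTURES),
    "extended templates + ASA filter": percent(80, NON_HTH_STRUCTURES),
}
false_negative_rate = percent(2, HTH_TEMPLATES)
hit_rate = percent(HTH_TEMPLATES - 2, HTH_TEMPLATES)
```

Extending the templates accounts for most of the improvement (831 to 110 false positives); the accessible surface area filter removes a further 30.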

53
Template databases
  • HAND CURATED
  • Enzyme active sites (PROCAT) 189 templates
  • Currently being extended
  • Metal-binding sites 600 templates
  • AUTOMATED
  • Ligand-binding sites 10,000 templates
  • DNA-binding sites 800 templates

54
Automatically generated templates
a. For each Het Group in the PDB extract a
non-homologous data set of proteins binding that
Het Group
b. Identify residues interacting with ligand (via
H-bonds or non-bonded contacts)
c. Templates generated from overlapping local
groups of 3-residue clusters
d. Gives over 10,000 ligand-binding templates
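
Steps b and c above can be sketched in a few lines. The version below treats any protein atom within a distance cutoff of a ligand atom as a contact, then keeps every group of three contacting residues that lie close to one another. The cutoff values, function names and coordinates are assumptions for illustration; the presentation does not give the exact contact criteria.

```python
import math
from itertools import combinations

def contact_residues(protein_atoms, ligand_atoms, cutoff=4.0):
    """Ids of residues with at least one atom within cutoff of the ligand.

    protein_atoms: list of (res_id, xyz); ligand_atoms: list of xyz.
    """
    return sorted({res for res, xyz in protein_atoms
                   if any(math.dist(xyz, lig) <= cutoff
                          for lig in ligand_atoms)})

def three_residue_groups(protein_atoms, ligand_atoms, cutoff=4.0, local=5.0):
    """Triples of ligand-contacting residues that are mutually close."""
    coords = {}
    for res, xyz in protein_atoms:
        coords.setdefault(res, []).append(xyz)

    def close(r1, r2):
        return any(math.dist(p, q) <= local
                   for p in coords[r1] for q in coords[r2])

    near = contact_residues(protein_atoms, ligand_atoms, cutoff)
    return [trio for trio in combinations(near, 3)
            if all(close(a, b) for a, b in combinations(trio, 2))]


# Invented coordinates: one ligand atom and five single-atom residues.
ligand = [(0.0, 0.0, 0.0)]
protein = [(10, (3.0, 0.0, 0.0)), (20, (0.0, 3.0, 0.0)),
           (30, (0.0, 0.0, 3.0)), (40, (-3.0, 0.0, 0.0)),
           (50, (20.0, 0.0, 0.0))]
groups = three_residue_groups(protein, ligand)
```

In this toy case two groups come out, and they share residues 20 and 30, which is the sense in which the local groups overlap.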
55
Automatically generated templates
a. Extract a non-homologous data set of
DNA/RNA-binding proteins from the PDB
b. Identify residues interacting with DNA/RNA
(via H-bonds or non-bonded contacts)
c. Templates generated from overlapping local
groups of 3-residue clusters
d. Gives over 800 DNA/RNA-binding templates
56
Problems with automated template methods
  • WITH A LARGE NUMBER OF TEMPLATES
  • Too many hits (usually tens, and often hundreds)
  • Use of rmsd rarely discriminates true from false
    positives
  • Local distortion in structure may give a large
    rmsd
  • Top hit rarely the correct hit even in
    obvious cases

57
An example
58
Enzyme active site templates
Hits for 1hsk
102. E.C.1.1.1.158 2.19Å UDP-N-acetylmuramate
dehydrogenase
59
Comparison of template environments
Similar residues in neighbourhood
Template structure 1mbb
Target structure 1hsk
60
Comparison of template environments
61
Comparison of template environments
62
Environment similarity score
Score equivalent grid-points using Dayhoff matrix
and taking voids into account
Total similarity score obtained from sum of all
grid-point scores
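
A minimal sketch of the scoring idea, assuming each environment is stored as a mapping from grid point to the residue type occupying it (or None for a void). The substitution values below are placeholders and not the real Dayhoff matrix, and the void penalty is a guess at what "taking voids into account" might mean; neither is taken from the presentation.

```python
# Placeholder substitution scores, NOT actual Dayhoff matrix values.
TOY_MATRIX = {("D", "D"): 4, ("D", "E"): 3, ("S", "S"): 2, ("S", "T"): 1}

def pair_score(a, b, matrix, void_penalty=-1):
    """Score one pair of equivalent grid points."""
    if a is None and b is None:
        return 0                      # both empty: nothing to compare
    if a is None or b is None:
        return void_penalty           # residue facing a void
    return matrix.get((a, b), matrix.get((b, a), 0))

def environment_score(template_grid, target_grid, matrix, void_penalty=-1):
    """Total similarity: the sum of all grid-point scores."""
    points = set(template_grid) | set(target_grid)
    return sum(pair_score(template_grid.get(p), target_grid.get(p),
                          matrix, void_penalty)
               for p in points)


template_env = {(0, 0, 0): "D", (1, 0, 0): "S", (2, 0, 0): None}
target_env = {(0, 0, 0): "E", (1, 0, 0): "S", (2, 0, 0): "T", (3, 0, 0): None}
score = environment_score(template_env, target_env, TOY_MATRIX)
```

Unlike RMSD, a score of this kind rewards chemically similar surroundings, so a hit with a slightly worse geometric fit can still rank first.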
63
Results for 1hsk
Hit  E.C. number    RMSD (Å)  Score  Enzyme
1.   E.C.3.2.1.158 is not listed; see rows below

64
Residue conservation
65
Residue conservation and cleft proximity
66
Reverse templates
67
Comparison of template environments
Identical residues in neighbourhood
Template structure 1mbb
Target structure 1hsk
68
Reverse templates
  • Typically get 20-40 templates from a single
    structure
  • Search each template vs PDB (or representative
    subset)
  • Non-homologous dataset of 2,500 protein chains
  • Focused search (eg top DALI hits)
  • Locate known PDB entries with closest local
    similarity
  • Program called the Protein SiteSeer
  • Times for search vs 2,500 set
  • JESS: 30 minutes
  • SiteSeer: 3 hours

69
[Diagram: Structure to Function. Labels around the 3D
structure: fold (evolutionary relationships), multimers
(biological multimeric state), interactions, surface,
electrostatics, ligands (ligand functional sites),
clusters (catalytic clusters, mechanisms, motifs),
enzyme active sites, mutants and SNPs]
70
Protein Function
  • Protein function has many definitions
  • Biochemical function: the biochemical role of
    the protein, e.g. serine protease
  • Biological function: the role of the protein in
    the cell/organism, e.g. digestion, blood clotting,
    fertilisation
  • The 3D structure usually only provides
    information about biochemical function

71
250 structures solved to date by MCSG
Some examples
40 are hypothetical proteins
72
From Gene To Biochemical Function
  • Gene → Protein → 3D Structure → Function
  • Identifying sequence or structural similarity
    (i.e. identifying an evolutionary relationship)
    is the most powerful route to function
    identification

73
From Gene To Biochemical Function
  • Gene → Protein → 3D Structure → Function
  • Given a protein structure
  • Where is the functional site?
  • Which ligands bind to the protein?

74
Predicting function from 3D structure: conservation
Residue conservation
  • Conservation
  • Valdar & Thornton
  • Lichtarge et al.
  • Aloy et al.
  • Glaser et al.
  • Etc...

75
Predicting function from 3D structure: binding sites
Binding sites
  • Binding site comparison
  • Geometrical hashing
  • eF-site (Nakamura et al.)
  • PINTS (Russell)
  • Pseudospheres (Klebe)
  • pvSOAR (Binkowski et al.)
  • etc

76
Predicting function from 3D structure: templates
3D templates
77
Predicting Binding Site
Binding-site analysis cutA
78
Identifying Binding Site Function Using Motifs
  • 3D enzyme active site structural motifs (Craig
    Porter)
  • Catalytic Site Atlas
  • Identification of catalytic residues (Gail Bartlett,
    Alex Gutteridge)
  • Metal binding sites (Malcolm MacArthur)
  • Binding site features (Gareth Stockwell)
  • Automatically generated templates of ligand-binding
    and DNA binding motifs (Sue Jones, Hugh Shanahan)
  • Reverse templates (Roman Laskowski)
  • JESS fast template search algorithm (Jonathan
    Barker)
79
An example
Structure: Rossmann fold, hence many
structural homologues
80
PROCAT template search
One very strong hit
81
ProFunc: function from 3D structure
Roman Laskowski
82
http://www.ebi.ac.uk/thornton-srv/databases/ProFunc/
83
Goal: Function Prediction from Structure
Roman Laskowski, James Watson
84
Goal: Function Prediction from Structure
85
MCSG Dataflow
86
Functional Annotation
All MCSG structures are automatically run through
ProFunc. The results are examined manually to
try to estimate the most likely function. The
most recent (Nov 2004) dataset contains 193
unique structures
Some assignment possible: 102 (53%)
Function remains unknown: 23 (12%)
Prior function known: 68 (35%)
87
Acknowledgements
  • James Watson, Roman Laskowski - EBI
  • Adel Golovin, Kim Henrick - EBI MSD
  • David Leader, James Milner-White Glasgow
  • Andrzej Joachimiak, Aled Edwards MCSG
  • (Mid-West Centre for Structural Genomics)

http://www.ebi.ac.uk/thornton-srv/databases/pdbsum/
http://www.ebi.ac.uk/msd-srv/msdmotif
http://www.ebi.ac.uk/thornton-srv/databases/ProFunc/