Binding site prediction on protein surfaces using Support Vector Machines

1
Binding site prediction on protein surfaces using
Support Vector Machines
  • James Bradford
  • Leeds Bioinformatics Group

2
Talk Structure
  • Motivation.
  • Introduction to machine learning and Support
    Vector Machines.
  • Methods.
  • Results.
  • Initial findings.

3
Binding site prediction
  • Complements our work on the docking problem.
  • Reduces the search space for our docking
    algorithm.
  • Decrease in number of docking solutions.
  • Scoring functions are computer intensive.

4
Machine learning
  • Making predictions by automated learning from
    existing knowledge
  • Learning:
  • requires training data where the answer is known.
  • generates rules or other functions that fit the
    training data.
  • The trained method is then used to predict on new
    data.

5
Support Vector Machines (SVMs)
  • A family of learning algorithms that aim to:
  • generate a hyperplane that divides a training set
    of examples labeled positive and negative such
    that all points with the same label appear on the
    same side of the hyperplane,
  • maximise the distance between the two classes and
    the hyperplane (optimal separating hyperplane -
    OSH).
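
As a sketch of what "optimal" means here, the standard hard-margin formulation can be written out. It is not shown on the slide, and the symbols w, b, x_i and y_i are introduced only for illustration: x_i are the training points and y_i in {-1, +1} their labels.

```latex
\min_{\mathbf{w},\,b}\ \tfrac{1}{2}\lVert \mathbf{w} \rVert^{2}
\quad \text{subject to} \quad
y_i\,(\mathbf{w}\cdot\mathbf{x}_i + b) \ge 1, \qquad i = 1,\dots,n
```

The hyperplane is w·x + b = 0. The closest points of each class lie at distance 1/||w|| from it, so minimising ||w|| maximises the margin 2/||w|| between the two classes.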

6
Support Vector Machines (SVMs)
  • In practice, training sets are usually
    non-separable...
  • Position the OSH to minimise the number of
    misclassified points (Fig 1).
  • ...and non-linear.
  • Use a kernel function to map the data from input
    space into a high-dimensional feature space (Fig 2).
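
The kernel idea can be illustrated with a small sketch. It assumes a degree-2 polynomial kernel on 2-D inputs, chosen only because its feature map is easy to write out; the slides do not say which kernel was used, and the names `poly2_kernel` and `phi` are hypothetical. The kernel value computed in input space equals an ordinary dot product in the mapped feature space, which is why the mapping never has to be computed explicitly.

```python
import math

def poly2_kernel(x, y):
    """Degree-2 polynomial kernel k(x, y) = (x . y)^2, computed in input space."""
    s = sum(a * b for a, b in zip(x, y))
    return s ** 2

def phi(x):
    """Explicit feature map for 2-D input whose dot product reproduces poly2_kernel."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

# XOR-like toy data: not linearly separable in the 2-D input space.
points = [(1, 1), (-1, -1), (1, -1), (-1, 1)]
labels = [+1, +1, -1, -1]

# In feature space the middle coordinate sqrt(2)*x1*x2 alone separates the classes.
separated = all((phi(p)[1] > 0) == (lab > 0) for p, lab in zip(points, labels))
```

Here `poly2_kernel(x, y)` and `dot(phi(x), phi(y))` agree up to floating-point rounding for any pair of 2-D points, and `separated` is true for the toy data.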

Fig 1
Fig 2
7
SVMs and Binding Site Prediction
Surface patches of contiguous residues are
classified as either part of or outside the
interface between two proteins i.e. the binding
site.
Molecular surface of trypsin showing the
Bowman-Birk inhibitor binding site (PDB code
1tab). Labels: interface patch, non-interface
patch, actual binding site.
8
Training Set
  • SVM trained on either whole training set...
  • or training sets subdivided into
  • homodimers
  • enzymes
  • inhibitors
  • heterodimers (transient, small)
  • heterodimers (transient, large)
  • hetero-obligomers.
  • Some interface properties are specific to protein
    type, for example see
  • Bradford & Westhead (2003) Asymmetric mutation
    rates at enzyme-inhibitor interfaces:
    Implications for the docking problem. Protein
    Science 12: 2099-2103.

9
PDB -> SVM
  • Calculate solvent excluded surface.
  • Label each surface vertex with eight chemical,
    geometrical or physical properties.
  • Training: define true binding site, generate
    interface and non-interface patches, calculate
    patch attributes, train SVM.
  • Prediction: generate patches, calculate patch
    attributes, predict.
10
Patch Characteristics
  • Interface patch
  • Circular.
  • Centre = centre of actual binding site.
  • Number of surface vertices = 0.08 x number of
    surface vertices of smallest protein in dimer.
  • Non-interface patch
  • As for interface patch except...
  • Centre randomly selected from non-interface
    vertex set.
  • Why not just use the actual interface?
  • No prior knowledge of size and shape of interface
    in blind prediction.
  • No. of non-interface patches = no. of interface
    patches.
  • SVM training is balanced.
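
A minimal sketch of the patch construction described above. It assumes a "circular" patch can be approximated by taking the vertices nearest to the centre in straight-line distance; the slides do not say how distance over the surface is measured. The function names and the toy coordinates are hypothetical.

```python
import math
import random

PATCH_FRACTION = 0.08  # from the slide: 0.08 x vertices of the smallest protein

def patch_size(n_vertices_smallest_protein):
    """Number of vertices per patch (at least one)."""
    return max(1, round(PATCH_FRACTION * n_vertices_smallest_protein))

def make_patch(vertices, centre, size):
    """Return indices of the `size` vertices closest to `centre` (Euclidean)."""
    order = sorted(range(len(vertices)),
                   key=lambda i: math.dist(vertices[i], centre))
    return order[:size]

def non_interface_patches(vertices, non_interface_idx, size, n_patches, seed=0):
    """Patches centred on randomly chosen non-interface vertices."""
    rng = random.Random(seed)
    centres = [vertices[i] for i in rng.sample(non_interface_idx, n_patches)]
    return [make_patch(vertices, c, size) for c in centres]

# Toy surface: 100 vertices on a line, the first 10 standing in for the interface.
vertices = [(float(i), 0.0, 0.0) for i in range(100)]
size = patch_size(len(vertices))
interface_patch = make_patch(vertices, (4.5, 0.0, 0.0), size)

# One non-interface patch per interface patch keeps the training set balanced.
negatives = non_interface_patches(vertices, list(range(10, 100)), size, n_patches=1)
```

In this sketch a non-interface patch is only guaranteed to have a non-interface centre; it may still overlap the true interface at its edge.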

11
Surface Properties
  • Eight properties seen as useful in distinguishing
    binding sites from the rest of the surface.
  • Conservation
  • Shape index
  • Curvedness
  • Hydrophobicity
  • Electrostatic potential
  • Residue propensity
  • Solvent accessibility
  • Secondary structure
  • Jones & Thornton (1996) Principles of
    protein-protein interactions. Proc. Natl. Acad.
    Sci. USA 93: 13-20.

12
Conservation
  • Calculated using Scorecons (William Valdar)
  • Clusters of conserved residues can sometimes
    characterise a functional site.
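
Scorecons itself is not reproduced here. As a rough stand-in, the sketch below scores one alignment column with a normalised Shannon entropy, which is a different and much simpler measure. The function name and the handling of gaps are assumptions.

```python
import math
from collections import Counter

def column_conservation(column, alphabet_size=20):
    """Return 1 - H/log(alphabet_size) for one alignment column.

    1.0 means fully conserved; lower values mean more variable.
    Gap characters ('-') are ignored, which is an assumption of this sketch.
    """
    residues = [aa for aa in column.upper() if aa != "-"]
    if not residues:
        return 0.0
    counts = Counter(residues)
    total = len(residues)
    entropy = -sum((n / total) * math.log(n / total) for n in counts.values())
    return 1.0 - entropy / math.log(alphabet_size)

conserved = column_conservation("GGGGGG")  # identical residues in every sequence
variable = column_conservation("GASTCV")   # six different residues
```

A per-residue score of this kind could then be mapped onto the surface vertices belonging to each residue.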

Conservation at the BPTI binding site on trypsin
(PDB code 2ptc). Panels: interface, conservation.
13
Shape Index and Curvedness
  • Calculated from the principal curvatures at each
    surface vertex.
  • Shape index
  • Scale: -1 (concave) through 0 (flat) to +1
    (convex).
  • Makes concave clefts and convex protrusions easy
    to identify.
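
For reference, a sketch of how both quantities follow from the two principal curvatures, using Koenderink and van Doorn's definitions. It assumes the sign convention in which convex regions have positive curvature, so that the scale matches the slide (-1 concave, +1 convex); a given surface package may use the opposite sign.

```python
import math

def shape_index(k1, k2):
    """Shape index in [-1, 1] from principal curvatures (convex = positive)."""
    k_max, k_min = max(k1, k2), min(k1, k2)
    # atan2 handles k_max == k_min (spherical points) without dividing by zero.
    return (2.0 / math.pi) * math.atan2(k_max + k_min, k_max - k_min)

def curvedness(k1, k2):
    """Curvedness: overall magnitude of curvature, independent of shape."""
    return math.sqrt((k1 * k1 + k2 * k2) / 2.0)

cap = shape_index(0.5, 0.5)      # convex protrusion, approximately +1
cup = shape_index(-0.5, -0.5)    # concave cleft, approximately -1
saddle = shape_index(0.5, -0.5)  # symmetric saddle, 0
```

Strictly, a perfectly flat point (both curvatures zero) has no defined shape index; `math.atan2(0, 0)` returns 0, which agrees with the slide's labelling of 0 as flat, and its curvedness is 0.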

Shape characteristics of Bowman-Birk inhibitor
(PDB code 1tab). Panels: shape index, curvedness,
interface.
14
Electrostatic Potential
  • Calculated by Delphi.
  • Interface may be marked by an area of particularly
    positive or negative potential.

Electrostatic potential: thermitase binding site
on eglin c, and eglin c binding site on thermitase.
Positive potential at the eglin c interface
complements negative potential on the thermitase
binding surface (PDB code 2tec).
15
Other Properties (1)
  • Hydrophobicity
  • Simple hydrophobicity scale (Fauchère and Pliska
    1983).
  • Homodimer interfaces tend to be hydrophobic.
  • Solvent accessibility
  • MSMS (Michael Sanner).
  • Outputs accessible surface area of each atom.
  • Clefts are less accessible than protrusions.

16
Other Properties (2)
  • Residue Propensity
  • Knowledge based.
  • Calculated for each amino acid as the fraction of
    ASA that amino acid contributes to interface
    compared to its contribution to the whole surface
    (Jones and Thornton 1996).
  • Residue propensity > 1 means that residue occurs
    more frequently at interface.
  • Secondary Structure
  • Extracted from PDB atom coordinates using STRIDE
    (Frishman & Argos 1995).
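
The residue propensity calculation can be sketched as follows. The ASA totals in the example are made-up numbers for illustration, not values from Jones and Thornton, and the function name is hypothetical.

```python
def residue_propensities(interface_asa, surface_asa):
    """Propensity per amino acid: share of interface ASA / share of surface ASA.

    Both arguments map amino-acid codes to summed accessible surface area.
    """
    total_interface = sum(interface_asa.values())
    total_surface = sum(surface_asa.values())
    propensities = {}
    for aa, asa_surface in surface_asa.items():
        if asa_surface == 0:
            continue  # amino acid absent from the surface: propensity undefined
        interface_share = interface_asa.get(aa, 0.0) / total_interface
        surface_share = asa_surface / total_surface
        propensities[aa] = interface_share / surface_share
    return propensities

# Made-up ASA totals (square angstroms) for three amino acids.
surface = {"LEU": 400.0, "LYS": 400.0, "TRP": 200.0}
interface = {"LEU": 50.0, "LYS": 10.0, "TRP": 40.0}
prop = residue_propensities(interface, surface)
```

With these toy numbers LEU and TRP come out above 1 (over-represented at the interface) and LYS below 1.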

17
Patch Attributes
  • Mean and standard deviation
  • Conservation
  • Shape index
  • Curvedness
  • Hydrophobicity
  • Electrostatic potential
  • Solvent accessibility
  • Residue propensity
  • Proportion
  • Conserved / Variable
  • Concave / Convex
  • Helix / Sheet / Other
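
One way to turn per-vertex values into the patch attributes listed above, as a sketch. The dictionary keys, the thresholds (shape index below 0 counted as concave, conservation above 0.5 counted as conserved) and the use of the population standard deviation are all assumptions; the slides do not give them.

```python
from statistics import mean, pstdev

CONTINUOUS = ["conservation", "shape_index", "curvedness", "hydrophobicity",
              "electrostatic_potential", "solvent_accessibility",
              "residue_propensity"]

def proportion(flags):
    """Fraction of True values in an iterable of booleans."""
    flags = list(flags)
    return sum(flags) / len(flags)

def patch_attributes(patch):
    """Summarise a patch given as a list of per-vertex property dicts."""
    attrs = {}
    for name in CONTINUOUS:
        values = [v[name] for v in patch]
        attrs[name + "_mean"] = mean(values)
        attrs[name + "_sd"] = pstdev(values)
    # Variable and convex proportions are the complements of these two.
    attrs["prop_conserved"] = proportion(v["conservation"] > 0.5 for v in patch)
    attrs["prop_concave"] = proportion(v["shape_index"] < 0 for v in patch)
    for ss in ("helix", "sheet", "other"):
        attrs["prop_" + ss] = proportion(
            v["secondary_structure"] == ss for v in patch)
    return attrs

# Two-vertex toy patch with made-up values.
toy_patch = [
    {"conservation": 0.8, "shape_index": -0.4, "curvedness": 0.2,
     "hydrophobicity": 1.0, "electrostatic_potential": -2.0,
     "solvent_accessibility": 10.0, "residue_propensity": 1.5,
     "secondary_structure": "helix"},
    {"conservation": 0.2, "shape_index": 0.4, "curvedness": 0.4,
     "hydrophobicity": 0.0, "electrostatic_potential": 2.0,
     "solvent_accessibility": 30.0, "residue_propensity": 0.5,
     "secondary_structure": "sheet"},
]
attrs = patch_attributes(toy_patch)
```

The resulting dictionary is the kind of fixed-length feature vector that could be passed to an SVM, one vector per patch.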

18
Initial Results
19
Summary
  • Methods have been implemented to train an SVM to
    distinguish between an interface patch and a
    non-interface patch.
  • Training on a separated data set is more accurate
    than training on all proteins.
  • Results need to be validated.
  • Successful predictions on blind data are the
    ultimate aim.

20
Acknowledgements
  • Supervisor: David Westhead
  • Funding: BBSRC
  • Support: Leeds Bioinformatics Group

Contact
  • Email: bmbjrb_at_bmb.leeds.ac.uk
  • Website: http://www.bioinformatics.leeds.ac.uk
