Binding site prediction on protein surfaces using Support Vector Machines

Provided by: Bioc79

Transcript and Presenter's Notes

1
Binding site prediction on protein surfaces using
Support Vector Machines

James Bradford
Leeds Bioinformatics Group

2
Talk Structure

Motivation.
Introduction to machine learning and Support
Vector Machines.
Methods.
Results.
Initial findings.

3
Binding site prediction

Complements our work on the docking problem.
Reduces the search space for our docking
algorithm.
Decrease in number of docking solutions.
Scoring functions are computer intensive.

4
Machine learning

Making predictions by automated learning from
existing knowledge
Learning
requires training data where the answer is known.
generates rules or other functions that fit the
training data.
The trained method is then used to predict on new
data.

5
Support Vector Machines (SVMs)

A family of learning algorithms that aim to
generate a hyperplane that divides a training set
of examples labeled positive and negative such
that all points with the same label appear on the
same side of the hyperplane,
maximise the distance between the two classes and
the hyperplane (optimal separating hyperplane -
OSH).
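
As a rough illustration of the two conditions above (this is not code from the talk), the sketch below checks whether a candidate hyperplane w.x + b = 0 puts every labelled point on the correct side and reports its margin, i.e. the distance to the closest point. The function name `margin`, the parameters `w` and `b`, and the toy points are all assumptions made for the example.

```python
import math

def margin(points, labels, w, b):
    """Return the smallest signed distance from any point to the hyperplane
    w.x + b = 0. A positive result means every point lies on the side
    matching its label; a larger value means a wider margin."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    distances = []
    for x, y in zip(points, labels):
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        distances.append(y * score / norm)
    return min(distances)

# Toy 2-D training set (hypothetical): labels are +1 or -1.
points = [(2.0, 2.0), (3.0, 3.0), (-2.0, -2.0), (-3.0, -1.0)]
labels = [1, 1, -1, -1]

# Both hyperplanes separate the two classes, but with different margins.
margin_diagonal = margin(points, labels, w=(1.0, 1.0), b=0.0)
margin_vertical = margin(points, labels, w=(1.0, 0.0), b=0.0)
print(margin_diagonal, margin_vertical)
```

Of the two separating hyperplanes tried here, the first leaves the wider margin, which is the property the OSH maximises over all possible hyperplanes.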

6
Support Vector Machines (SVMs)

In practice, training sets are usually
non-separable...
Position the OSH to minimise the number of
misclassified points (Fig 1).
...and non-linear.
Use a kernel function to map the data from the
original input space into a high-dimensional
feature space (Fig 2).
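
To make the feature-space idea concrete, here is a minimal sketch, assuming a hypothetical one-dimensional data set and an explicit map x -> (x, x^2). A kernel function computes inner products in the feature space without building the map explicitly; the explicit map is used here only to keep the example readable. The names `feature_map` and `separates` are illustrative.

```python
def feature_map(x):
    """Map a 1-D input into a 2-D feature space (x, x^2)."""
    return (x, x * x)

def separates(features, labels, w, b):
    """True if the hyperplane w.z + b = 0 puts every point on its labelled side."""
    return all(
        y * (sum(wi * zi for wi, zi in zip(w, z)) + b) > 0
        for z, y in zip(features, labels)
    )

# Hypothetical 1-D data: the negatives sit between the positives, so no
# single threshold on x can separate the two classes.
xs = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [1, 1, -1, -1, -1, 1, 1]

original = [(x,) for x in xs]
mapped = [feature_map(x) for x in xs]

# A threshold at x = 1.5 fails in the original space...
print(separates(original, ys, w=(1.0,), b=-1.5))
# ...but the line x^2 = 2.5 separates the classes in feature space.
print(separates(mapped, ys, w=(0.0, 1.0), b=-2.5))
```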

Fig 1
Fig 2
7
SVMs and Binding Site Prediction
Surface patches of contiguous residues are
classified as either part of or outside the
interface between two proteins, i.e. the binding
site.
Figure: molecular surface of trypsin showing the
Bowman-Birk inhibitor binding site (PDB code
1tab). Legend: interface patch, non-interface
patch, actual binding site.
8
Training Set

SVM trained on either whole training set...
or training sets subdivided into
homodimers
enzymes
inhibitors
heterodimers (transient, small)
heterodimers (transient, large)
hetero-obligomers.
Some interface properties are specific to protein
type; see, for example,
Bradford & Westhead (2003) Asymmetric mutation
rates at enzyme-inhibitor interfaces:
implications for the docking problem. Protein
Science 12: 2099-2103.

9
PDB -> SVM
Calculate solvent excluded surface.
Label each surface vertex with eight chemical,
geometrical or physical properties.
Training:
Define true binding site.
Generate interface and non-interface patches.
Calculate patch attributes.
Train SVM.
Prediction:
Generate patches.
Calculate patch attributes.
Predict.
10
Patch Characteristics

Interface patch
Circular.
Centre = centre of actual binding site.
Number of surface vertices = 0.08 x number of
surface vertices of smaller protein in dimer.
Non-interface patch
As for interface patch except...
Centre randomly selected from non-interface
vertex set.
Why not just use the actual interface?
No prior knowledge of size and shape of interface
in blind prediction.
No. of non-interface patches = no. of interface
patches.
SVM training is balanced.
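
The patch rules above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that a "circular" patch is the set of vertices closest to the centre in straight-line (Euclidean) distance, and that the "centre" of the binding site is the interface vertex nearest the interface centroid. The function names and the flat toy surface are hypothetical.

```python
import math
import random

def patch_size_for(n_vertices_smaller_protein, fraction=0.08):
    """Patch size as a fixed fraction of the smaller protein's vertex count."""
    return max(1, round(fraction * n_vertices_smaller_protein))

def make_patch(vertices, centre_index, size):
    """Indices of the `size` vertices closest to the centre vertex
    (Euclidean distance is an assumption; the slide only says 'circular')."""
    centre = vertices[centre_index]
    ranked = sorted(range(len(vertices)),
                    key=lambda i: math.dist(vertices[i], centre))
    return ranked[:size]

def make_training_patches(vertices, interface, n_smaller, rng):
    """Build one interface and one non-interface patch (a balanced pair)."""
    size = patch_size_for(n_smaller)
    # Interface patch: centred on the interface vertex nearest the centroid
    # of the actual binding site (an assumed definition of its centre).
    dims = range(len(vertices[0]))
    centroid = tuple(sum(vertices[i][d] for i in interface) / len(interface)
                     for d in dims)
    centre = min(interface, key=lambda i: math.dist(vertices[i], centroid))
    interface_patch = make_patch(vertices, centre, size)
    # Non-interface patch: same size, centred on a random non-interface vertex.
    outside = [i for i in range(len(vertices)) if i not in interface]
    non_interface_patch = make_patch(vertices, rng.choice(outside), size)
    return interface_patch, non_interface_patch

# Toy surface (hypothetical): a flat 10 x 10 grid with a 3 x 3 interface corner.
vertices = [(float(x), float(y), 0.0) for x in range(10) for y in range(10)]
interface = [i for i, (x, y, _) in enumerate(vertices) if x < 3 and y < 3]

interface_patch, non_interface_patch = make_training_patches(
    vertices, interface, n_smaller=len(vertices), rng=random.Random(0))
print(len(interface_patch), len(non_interface_patch))
```

Generating one non-interface patch for every interface patch keeps the two classes the same size, which is what makes the training set balanced.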

11
Surface Properties

Eight properties seen as useful in distinguishing
binding sites from the rest of the surface.
Conservation
Shape index
Curvedness
Hydrophobicity
Electrostatic potential
Residue propensity
Solvent accessibility
Secondary structure
Jones & Thornton (1996) Principles of
protein-protein interactions. Proc. Natl. Acad.
Sci. USA 93: 13-20.

12
Conservation

Calculated using Scorecons (William Valdar)
Clusters of conserved residues can sometimes
characterise a functional site.

Conservation at the BPTI binding site on trypsin
(PDB code 2ptc)
Interface
Conservation
13
Shape Index and Curvedness

Calculated from the principal curvatures at each
surface vertex.
Shape index
Scale -1 (concave) through 0 (flat) to 1
(convex).
Makes concave clefts and convex protrusions easy
to identify.
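
The slide does not give the formulas, so the sketch below uses the commonly cited Koenderink and van Doorn definitions as an assumption: shape index = (2/pi) * arctan((k1 + k2) / (k1 - k2)) and curvedness = sqrt((k1^2 + k2^2) / 2), where k1 >= k2 are the principal curvatures. The sign convention (positive curvature means convex, to match the -1 concave to 1 convex scale above) is also assumed. Under the strict definition the shape index is undefined on a perfectly flat surface and 0 corresponds to a saddle; `atan2` returns 0 for the flat case, which matches the simplified scale on the slide.

```python
import math

def shape_index(k1, k2):
    """Shape index in [-1, 1] from two principal curvatures.
    Assumes positive curvature means convex."""
    k_max, k_min = max(k1, k2), min(k1, k2)
    # atan2 avoids a division by zero when the two curvatures are equal.
    return (2.0 / math.pi) * math.atan2(k_max + k_min, k_max - k_min)

def curvedness(k1, k2):
    """Magnitude of the curvature, independent of the shape type."""
    return math.sqrt((k1 * k1 + k2 * k2) / 2.0)

# Hypothetical curvature pairs: convex cap, concave cup, saddle, flat.
for k1, k2 in [(1.0, 1.0), (-1.0, -1.0), (1.0, -1.0), (0.0, 0.0)]:
    print(k1, k2, shape_index(k1, k2), curvedness(k1, k2))
```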

Shape characteristics of Bowman-Birk inhibitor
(PDB code 1tab)
Shape index
Curvedness
Interface
14
Electrostatic Potential

Calculated by Delphi.
Interface may be marked by an area of particularly
positive or negative potential.

Figure panels: thermitase binding site on eglin c;
eglin c binding site on thermitase (coloured by
electrostatic potential).
Positive potential at eglin c interface
complements negative potential on thermitase
binding surface (PDB code 2tec).
15
Other Properties (1)

Hydrophobicity
Simple hydrophobicity scale (Fauchère and Pliska
1983).
Homodimer interfaces tend to be hydrophobic.
Solvent accessibility
MSMS (Michael Sanner).
Outputs accessible surface area of each atom.
Clefts are less accessible than protrusions.

16
Other Properties (2)

Residue Propensity
Knowledge based.
Calculated for each amino acid as the fraction of
ASA that amino acid contributes to interface
compared to its contribution to the whole surface
(Jones and Thornton 1996).
Residue propensity > 1 means that residue occurs
more frequently at interface.
Secondary Structure
Extracted from PDB atom coordinates using STRIDE
(Frishman & Argos 1995).
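
The residue propensity definition above can be sketched as a ratio of two ASA fractions. This is a minimal reading of the slide, not the authors' code: the function name, the dictionary layout and the ASA numbers are hypothetical, and the "whole surface" totals are assumed to include the interface residues.

```python
def residue_propensities(interface_asa, surface_asa):
    """Propensity per amino acid: its share of interface ASA divided by its
    share of whole-surface ASA. Inputs map amino acid -> summed ASA."""
    total_interface = sum(interface_asa.values())
    total_surface = sum(surface_asa.values())
    propensities = {}
    for aa, asa in surface_asa.items():
        if asa == 0:
            continue  # amino acid never exposed, so propensity is undefined
        interface_fraction = interface_asa.get(aa, 0.0) / total_interface
        surface_fraction = asa / total_surface
        propensities[aa] = interface_fraction / surface_fraction
    return propensities

# Made-up ASA totals, for illustration only.
surface_asa = {"LEU": 100.0, "LYS": 300.0, "SER": 100.0}
interface_asa = {"LEU": 50.0, "LYS": 30.0, "SER": 20.0}

propensities = residue_propensities(interface_asa, surface_asa)
for aa, value in sorted(propensities.items()):
    print(aa, round(value, 2))
```

In this toy data LEU contributes half of the interface ASA but only a fifth of the surface ASA, so its propensity is above 1, while LYS falls below 1.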

17
Patch Attributes

Mean and standard deviation
Conservation
Shape index
Curvedness
Hydrophobicity
Electrostatic potential
Solvent accessibility
Residue propensity
Proportion
Conserved / Variable
Concave / Convex
Helix / Sheet / Other
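
One way to turn the per-vertex properties of a patch into the attributes listed above is sketched here. The attribute names, the input layout and the use of the population standard deviation are assumptions; the slide does not say which form of standard deviation was used or how vertices were assigned to each category.

```python
import statistics

# Category labels taken from the slide; the dictionary keys are hypothetical.
CATEGORIES = {
    "conservation_class": ("conserved", "variable"),
    "shape_class": ("concave", "convex"),
    "secondary_structure": ("helix", "sheet", "other"),
}

def patch_attributes(numeric, categorical):
    """Summarise one patch as a flat dict of attributes.
    numeric: property name -> list of per-vertex values.
    categorical: category name -> list of per-vertex labels."""
    attributes = {}
    for name, values in numeric.items():
        attributes[name + "_mean"] = statistics.fmean(values)
        attributes[name + "_sd"] = statistics.pstdev(values)
    for name, labels in categorical.items():
        # Emit every allowed label so all patches share the same attributes.
        for label in CATEGORIES[name]:
            attributes[name + "_" + label] = labels.count(label) / len(labels)
    return attributes

# Toy patch of four vertices (values are made up).
attrs = patch_attributes(
    numeric={"hydrophobicity": [1.0, 2.0, 3.0, 4.0]},
    categorical={"secondary_structure": ["helix", "helix", "sheet", "other"]},
)
print(attrs)
```

Each patch then becomes a fixed-length vector of attributes that can be passed to the SVM for training or prediction.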

18
Initial Results
19
Summary

Methods have been implemented to train an SVM to
distinguish between an interface patch and a
non-interface patch.
Training on a separated data set is more accurate
than training on all proteins.
Results need to be validated.
Successful predictions on blind data are the
ultimate aim.

20
Acknowledgements