Title: High Throughput Computing and Protein Structure
High Throughput Computing and Protein Structure
Stephen E. Hamby
Overview
- Introduction To Protein Structure
- Dihedral Angles
- Previous Work
- Support Vector Regression
- Optimisation
- Prediction
- Results
- Conclusions
Introduction To Protein Structure
- Proteins are molecules of great biological importance
- Structure determination gives insight into their function, dynamics and potential as drug targets
- Experimental structure determination is expensive, slow and difficult
Introduction To Protein Structure
- Primary structure: the order of amino acids
- Secondary structure: the building blocks
- Tertiary structure: the complete 3D structure
Introduction To Protein Structure
Secondary structure types: α-helix, β-sheet, random coil
Dihedral Angles
Determining the secondary structure of a protein is a step towards determining its complete structure. Predicting dihedral angles can help us to obtain the secondary structure.
How can we predict dihedral angles?
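Each residue's backbone conformation is described by two dihedral angles, φ and ψ, and common secondary structure elements occupy characteristic regions of (φ, ψ) space. As a minimal sketch (not the method used in this work), the Python below computes a signed dihedral angle from four atom positions using the standard atan2 formula. It then assigns a coarse label from rectangular regions. The boundaries in `coarse_ss` are hypothetical and illustrative, not a published assignment scheme such as DSSP.

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Signed dihedral angle in degrees (-180..180) about the p1-p2 bond.

    For phi pass C(i-1), N(i), CA(i), C(i); for psi pass N(i), CA(i), C(i), N(i+1).
    """
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)        # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)     # |b2| * b1 . (b2 x b3)
    x = _dot(n1, n2)                               # (b1 x b2) . (b2 x b3)
    return math.degrees(math.atan2(y, x))


def coarse_ss(phi: float, psi: float) -> str:
    """Map (phi, psi) to H (helix), E (strand) or C (coil).

    HYPOTHETICAL rectangular regions, for illustration only.
    """
    if -160 <= phi <= -20 and -90 <= psi <= 30:
        return "H"
    if -180 <= phi <= -45 and (psi >= 90 or psi <= -150):
        return "E"
    return "C"
```

For example, `coarse_ss(-57, -47)` falls in the helical region and `coarse_ss(-120, 130)` in the strand region. Real assignment methods work from hydrogen bonding or more carefully fitted regions.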
Previous Work
Destruct: multiple neural networks applied in an iterative method. Predicts both secondary structure and dihedral angles.
Previous Work
Real Spine: twin neural networks give a consensus prediction. Predicts dihedral angles from various amino acid properties, amino acid composition and predicted structure.
Support Vector Regression
Kernel machine learning maps the data into a higher-dimensional feature space, where a linear relationship can be found.
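The idea can be made concrete with a kernel whose feature space is small enough to write out. For the degree-2 polynomial kernel K(x, y) = (x·y)², the feature map made of all pairwise products x_i x_j satisfies φ(x)·φ(y) = (x·y)². The kernel therefore computes a dot product in an n²-dimensional space without ever building it. This is a standard textbook identity, sketched here in Python. The Gaussian kernel used later corresponds to an infinite-dimensional feature space, so it cannot be expanded explicitly like this.

```python
from itertools import product
from typing import List, Sequence


def dot(x: Sequence[float], y: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(x, y))


def feature_map(x: Sequence[float]) -> List[float]:
    """Explicit degree-2 feature map: all ordered products x_i * x_j (n*n features)."""
    return [xi * xj for xi, xj in product(x, repeat=2)]


def poly2_kernel(x: Sequence[float], y: Sequence[float]) -> float:
    """Homogeneous degree-2 polynomial kernel (x . y)^2, computed in input space."""
    return dot(x, y) ** 2


x = [1.0, 2.0, 3.0]
y = [0.5, -1.0, 2.0]
explicit = dot(feature_map(x), feature_map(y))  # dot product in 9 dimensions
implicit = poly2_kernel(x, y)                   # same value from 3 dimensions
print(explicit, implicit)  # 20.25 20.25
```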
Support Vector Regression
Fits a linear function to the data in a high-dimensional feature space. Accurate, but slow, needs parameter optimisation and behaves as a black box.
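For reference, the standard ε-insensitive support vector regression problem is shown below. This is the textbook formulation; the slides do not state which variant PyML solves. It fits f(x) = ⟨w, φ(x)⟩ + b by solving the primal problem. The resulting predictor is written with dual coefficients α_i and α_i^*:

```latex
\min_{w,\,b,\,\xi,\,\xi^*}\; \frac{1}{2}\lVert w\rVert^2 + C\sum_{i=1}^{n}\left(\xi_i + \xi_i^*\right)
\quad\text{subject to}\quad
\begin{cases}
y_i - \langle w, \phi(x_i)\rangle - b \le \varepsilon + \xi_i \\
\langle w, \phi(x_i)\rangle + b - y_i \le \varepsilon + \xi_i^* \\
\xi_i,\ \xi_i^* \ge 0
\end{cases}

f(x) = \sum_{i=1}^{n} \left(\alpha_i - \alpha_i^*\right) K(x_i, x) + b
```

Errors smaller than ε cost nothing, and C trades flatness of the function against training error. The prediction touches the data only through K, which is what allows the feature space to stay implicit. With a Gaussian kernel, C, ε and the kernel width are natural candidates for the parameters that need tuning, although the slides do not name them.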
Support Vector Regression
Kernel choice: we tested the kernels available through the PyML package (linear, polynomial and Gaussian) on the CASP4 dataset. The Gaussian kernel produced the best results.
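The three kernel families compared can each be written in a few lines. This is a hedged sketch in plain Python. The parameter names (`degree`, `coef0`, `gamma`) and their defaults are illustrative assumptions, not PyML's API. Libraries also differ in parameterisation; for instance, the Gaussian width is sometimes given as σ, with γ = 1/(2σ²).

```python
import math
from typing import Sequence


def linear_kernel(x: Sequence[float], y: Sequence[float]) -> float:
    """K(x, y) = x . y"""
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x: Sequence[float], y: Sequence[float],
                      degree: int = 2, coef0: float = 1.0) -> float:
    """K(x, y) = (x . y + coef0) ** degree"""
    return (linear_kernel(x, y) + coef0) ** degree


def gaussian_kernel(x: Sequence[float], y: Sequence[float], gamma: float = 0.5) -> float:
    """K(x, y) = exp(-gamma * ||x - y||^2): 1 when x == y, decaying with distance."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


x, y = [1.0, 2.0], [3.0, 4.0]
for kernel in (linear_kernel, polynomial_kernel, gaussian_kernel):
    print(kernel.__name__, kernel(x, y))
```

Unlike the other two, the Gaussian kernel depends only on the distance between inputs, so its similarity measure is local.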
Optimisation
Three interdependent parameters. Grid-based optimisation on the CASP4 dataset: around 10,000 three-hour jobs, run in blocks of 10 on Jupiter. Accuracy assessed using the Pearson correlation coefficient.
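The optimisation workflow can be sketched in three steps. First, enumerate every combination of the three parameters, because interdependent parameters cannot be tuned one at a time. Second, pack the combinations into blocks of 10 for submission. Third, score each finished job by the Pearson correlation between predicted and observed angles. In this Python sketch, the parameter names (C, gamma, epsilon) and the 10 × 10 × 10 grid values are hypothetical. Cluster submission is left out because the slides give no details of Jupiter's scheduler.

```python
import itertools
import math
from typing import Dict, Iterator, List, Sequence, Tuple

Params = Tuple[float, float, float]  # (C, gamma, epsilon): ASSUMED parameter set


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def blocks(items: List[Params], size: int = 10) -> Iterator[List[Params]]:
    """Yield consecutive chunks of `size` jobs (the last chunk may be shorter)."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def best_params(scores: Dict[Params, float]) -> Params:
    """Return the parameter triple with the highest Pearson correlation."""
    return max(scores, key=lambda p: scores[p])


# HYPOTHETICAL log-spaced grid: 10 values per parameter -> 1000 combinations
C_values = [2.0 ** k for k in range(-2, 8)]
gamma_values = [2.0 ** k for k in range(-8, 2)]
epsilon_values = [2.0 ** k for k in range(-10, 0)]

jobs = list(itertools.product(C_values, gamma_values, epsilon_values))
job_blocks = list(blocks(jobs, 10))
print(len(jobs), "jobs in", len(job_blocks), "blocks")  # 1000 jobs in 100 blocks
```

Adding values along any axis multiplies the job count. This is why a search of around 10,000 multi-hour jobs called for high-throughput computing.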
Prediction
Support vector machine with a Gaussian kernel and the optimal parameters. Trained on the CB513 dataset and tested by 10-fold cross-validation; CASP4 used as a test set.
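A 10-fold cross-validation split can be sketched as follows: each protein is assigned to exactly one test fold, and each model is trained on the other nine folds. Splitting by protein rather than by residue is an assumption made here, since it keeps residues from the same chain out of both training and test data. The slides do not say how the CB513 folds were formed, and the seed is arbitrary.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_splits(n_items: int, k: int = 10,
                  seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)        # reproducible shuffle
    folds = [indices[i::k] for i in range(k)]   # fold sizes differ by at most one
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]


# e.g. the 513 protein chains of CB513 (protein-level split is an assumption)
splits = list(k_fold_splits(513, k=10))
print([len(test) for _, test in splits])  # [52, 52, 52, 51, 51, 51, 51, 51, 51, 51]
```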
Results
Results measured by cross-validation:
Method            Pearson correlation coefficient
Destruct          0.42
Real Spine        0.62
SVM prediction    0.57
The CASP4 test set gives a Pearson correlation coefficient of 0.56.
Results
Using secondary structure predictions made by cascade-correlation neural networks to assist dihedral prediction gives a Pearson correlation coefficient of 0.582. Subsequent iterations should lead to better predictions of both structure and dihedral angles.
What Next?
Use further iterations to improve accuracy. The current method is a black box: can we use a program such as Trepan to extract definite rules about secondary structure?
Conclusions
- Dihedral angles define protein secondary structure
- Using support vector machines, it is possible to predict dihedral angles
- We can (hopefully!) use predicted dihedral angles to improve the accuracy of secondary structure prediction
Acknowledgements
Jonathan Hirst, Hirst group members, BBSRC, The University of Nottingham