Predicting Structural Features

1
Predicting Structural Features
  • Chapter 12

2
Structural Features
  • Phosphorylation sites
  • Transmembrane helices
  • Protein flexibility

3
Accuracy Measures Revisited
  • Level
  • Individual residues
  • Complete helix or strand

4
Residue-Level Measures
  • Q3
  • Percentage of residues predicted correctly
  • If one state (e.g., Coil) is very common (e.g., 50% of residues),
    blind guessing can give a large Q3!
  • Matthews correlation coefficient
  • C = (TP×TN − FP×FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN))
  • Defined for each state
  • More balanced than Q3; ranges from −1 to +1
  • Random prediction gives C ≈ 0 (both measures are sketched in code below)
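
A minimal sketch of both residue-level measures in Python (not from the
slides; assumes predictions and observed states are same-length strings
over the states H/E/C):

    import math

    def q3(pred, obs):
        # Percentage of residues whose predicted state matches the observed one.
        return 100.0 * sum(p == o for p, o in zip(pred, obs)) / len(obs)

    def mcc(pred, obs, state):
        # Matthews correlation coefficient for one state (e.g. 'H').
        tp = sum(p == state and o == state for p, o in zip(pred, obs))
        tn = sum(p != state and o != state for p, o in zip(pred, obs))
        fp = sum(p == state and o != state for p, o in zip(pred, obs))
        fn = sum(p != state and o == state for p, o in zip(pred, obs))
        denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
        return (tp * tn - fp * fn) / denom if denom else 0.0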

5
Structural Element-Level Measures
  • SOV
  • Based on the overlap of predicted segments of
    helix, strand, etc. with the observed segments of
    the same type
  • The N-score
  • specialized for transmembrane protein predictors
  • Should TMHMM2 be changed? Should your model?

6
Predicting Helices
  • Residue propensities
  • A score for a given structure class for each
    residue type, a
  • P(H|a) is proportional to P(a|H) / P(a)
  • Why? Bayes' rule is your friend!
  • P(H|a) = P(a|H) P(H) / P(a)
  • P(H) doesn't depend on a, so
  • P(H|a) ∝ P(a|H) / P(a)

Can this be used to see how to group helix states?
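
A sketch of how the propensity P(a|H)/P(a) could be estimated from counts,
assuming the training data arrives as paired sequence and
secondary-structure strings (the function name is illustrative):

    from collections import Counter

    def helix_propensities(sequences, structures):
        # Estimate P(a|H)/P(a) for each amino acid a from paired
        # sequence / secondary-structure strings ('H' marks helix).
        # Assumes at least one helix residue in the training data.
        aa_counts, helix_counts = Counter(), Counter()
        n_total = n_helix = 0
        for seq, ss in zip(sequences, structures):
            for a, s in zip(seq, ss):
                aa_counts[a] += 1
                n_total += 1
                if s == 'H':
                    helix_counts[a] += 1
                    n_helix += 1
        # P(a|H) = helix_counts[a]/n_helix;  P(a) = aa_counts[a]/n_total
        return {a: (helix_counts[a] / n_helix) / (aa_counts[a] / n_total)
                for a in aa_counts}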
7
Identical short segments rarely fold differently
  • Local sequence is highly important to secondary
    structure.
  • But this sequence occurs in two proteins and
    takes very different forms:
  • KGVVPQLVK
  • There is significant information about structure
    in local sequence.

8
I-sites Sequence Database
  • About 250 short segments (3-19 residues) that
    show strong correlation between sequence and
    structure
  • Example shows
  • phi and psi angles, log-odds matrix
  • superimposed backbones
  • representative structure

9
Nearest Neighbor Prediction Methods
  • Predict secondary structure based on
  • Local alignments of the query sequence to a
    database of sequences of known structure
  • Alignment score functions are often
    special-purpose, and may include helix/sheet/coil
    propensity information
  • Homologous sequences are often included in the
    database
  • Prediction based on weighted votes of nearest
    neighbors (usually only central residue of
    alignment is predicted)
  • 73.5% accuracy (Q3)
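
The voting step described above can be sketched in a few lines; the local
alignment search that produces the neighbors is elided, and the names are
illustrative:

    def vote_center_state(neighbors):
        # Weighted vote over the nearest neighbors found by local alignment
        # of the query window against the structure database.
        # neighbors: list of (alignment_score, central_residue_state) pairs.
        votes = {}
        for score, state in neighbors:
            votes[state] = votes.get(state, 0.0) + score
        return max(votes, key=votes.get)

    # e.g. vote_center_state([(42.0, 'H'), (37.5, 'H'), (31.0, 'C')]) -> 'H'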

10
A different application: prediction of misfolding
  • Diseases such as Alzheimer's involve protein
    misfolding.
  • Usually, the misfolded region ends up as
    beta strands.
  • How could we use secondary structure information
    to predict which proteins will potentially
    misfold?

11
HβP: Hidden Beta Propensity
  • Key idea: tertiary contacts (TC)
  • TC is the number of contacts a residue has with
    others at least 4 residues away
  • Alpha helices tend to be in regions of HIGH TC
  • Beta strands tend to be in regions of LOW TC
  • Look for query residues whose nearest neighbors
    are strange with respect to TC and alpha/beta
    state
  • Low TC regions with lots of Alphas
  • High TC regions with lots of Betas
  • Performance results?

12
Neural Nets
  • Each node computes a simple function of its
    inputs.
  • The weighted sum of the inputs is added to a
    bias term and squashed:
  • I_j = Σ_i w_ji x_i + b_j
  • o_j = σ(I_j)
  • The output, o_j, is then propagated to nodes in the
    next layer.
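
A minimal sketch of that forward computation for one layer, using the
logistic sigmoid as the squashing function σ (function names are
illustrative):

    import math

    def sigmoid(I):
        return 1.0 / (1.0 + math.exp(-I))

    def layer_forward(inputs, weights, biases):
        # For each node j: I_j = sum_i w_ji * x_i + b_j, then o_j = sigmoid(I_j).
        return [sigmoid(sum(w * x for w, x in zip(w_row, inputs)) + b)
                for w_row, b in zip(weights, biases)]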

13
Training Neural Nets
  • Back-propagation
  • Optimizes the weights and bias terms
  • Minimize the error function (difference between
    predicted and observed)
  • RMS
  • Relative Entropy
  • Iterative process
  • Final weights shown for a secondary structure NN
    alpha helix output layer.
  • Over-fitting can be reduced by training for fewer
    iterations (early stopping); a one-node update step is sketched below
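
A hedged sketch of one back-propagation update for a single sigmoid output
node, assuming a squared (RMS-style) error function; the learning rate and
names are illustrative:

    import math

    def backprop_step(x, target, w, b, lr=0.1):
        # Forward pass: I = weighted sum plus bias, o = sigmoid(I).
        I = sum(wi * xi for wi, xi in zip(w, x)) + b
        o = 1.0 / (1.0 + math.exp(-I))
        # dE/dI for E = (o - target)^2 / 2, using sigmoid'(I) = o(1 - o).
        delta = (o - target) * o * (1.0 - o)
        # Gradient-descent update of weights and bias.
        w = [wi - lr * delta * xi for wi, xi in zip(w, x)]
        return w, b - lr * delta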

14
Adaptive Encoding and Weight Sharing
  • Orthogonal encoding
  • Each residue feeds three hidden nodes
  • The weights for all red nodes are tied together
  • Each group of three nodes learns the same
    encoding of the 20 amino acids
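
The two encodings can be contrasted in a short sketch (names illustrative;
the 20x3 matrix stands in for the tied weights of the red nodes):

    AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

    def orthogonal_encode(residue):
        # One-hot: a 20-vector with a single 1 at the residue's position.
        vec = [0.0] * 20
        vec[AMINO_ACIDS.index(residue)] = 1.0
        return vec

    def adaptive_encode(residue, shared_w):
        # Weight sharing: every window position uses the SAME 20x3 matrix,
        # so the net learns one 3-number code for the 20 amino acids.
        one_hot = orthogonal_encode(residue)
        return [sum(x * shared_w[i][k] for i, x in enumerate(one_hot))
                for k in range(3)]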

15
Engineering Intuition Into NNs
  • Alpha helices have a period of 3.6 residues per
    turn
  • An NN can be specially designed to reflect that
  • Using this, plus adaptive encoding:
  • Q3 = 66%
  • Adding homology: Q3 = 73%

16
HMMs and Transmembrane Proteins (again)
17
HMMTOP Architecture
  • TMHs: 17-25 residues
  • Tails: 1-15 residues
  • Blue letters show structural state labels

18
TMHMM Architecture
  • Helices are 5-25 residues
  • Caps follow helices
  • Cytoplasmic side:
  • Loop: 0-20 residues
  • Globular: 1 state
  • Extra-cellular side:
  • Long loop: 0-100 residues
  • Globular: 3 states
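
The layout above can be captured as a small configuration table (a
paraphrase of the slide, not TMHMM's actual parameter files):

    # Length ranges and state counts of the TMHMM layout described above.
    TMHMM_LAYOUT = {
        "helix":         {"length": (5, 25)},
        "cap":           {"follows": "helix"},
        "cytoplasmic":   {"loop": (0, 20),  "globular_states": 1},
        "extracellular": {"loop": (0, 100), "globular_states": 3},
    }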

19
Predicting Globular Proteins with Hidden Neural
Networks
  • YASPIN
  • Neural net predicts seven classes (He, H, Hb, C,
    Ee, E, Eb) using a 15-residue window of PSSM
    input
  • HMM filters this output
  • Can you imagine how this is done?
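
One plausible answer, sketched below: treat the per-residue class
probabilities from the net as emission scores and find the most probable
legal state path with the Viterbi algorithm, using a transition matrix
whose zeros forbid illegal class orders. This is a sketch of the general
idea, not YASPIN's exact procedure:

    import numpy as np

    def viterbi_filter(nn_probs, trans, states):
        # nn_probs: L x K per-residue class probabilities from the net.
        # trans:    K x K transition matrix (zeros forbid transitions).
        L, K = nn_probs.shape
        logp = np.log(nn_probs + 1e-10)
        logt = np.log(trans + 1e-10)
        score = np.empty((L, K))
        back = np.zeros((L, K), dtype=int)
        score[0] = logp[0]
        for i in range(1, L):
            for k in range(K):
                prev = score[i - 1] + logt[:, k]
                back[i, k] = int(prev.argmax())
                score[i, k] = prev[back[i, k]] + logp[i, k]
        # Trace the best path back from the last residue.
        path = [int(score[-1].argmax())]
        for i in range(L - 1, 0, -1):
            path.append(int(back[i, path[-1]]))
        return [states[k] for k in reversed(path)]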

20
Coiled-coil HMM: MARCOIL
Design lets you start and end in any phase of the
heptad repeat
21
Support Vector Machines (SVMs)
  • Classifiers
  • Basic machine is a 2-class classifier
  • Training Data
  • set of labeled vectors
  • ⟨x1, x2, …, xn, C⟩
  • Class C = +1 or C = −1
  • Supervised learning (like neural nets)
  • Learn from positive and negative examples
  • Output
  • Function predicting class of unlabeled vectors

22
SVM Example
  • Alpha helix predictor
  • 15-residue window
  • 21 numbers per residue
  • PSI-BLAST PSSM: 20 numbers
  • A spacer flag indicating positions off the end of the protein
  • 315 numbers total per window
  • Training samples
  • Non-helix samples: ⟨x1, x2, …, x315, −1⟩
  • Helix samples: ⟨x1, x2, …, x315, +1⟩
  • Training finds the function of x that best separates
    the non-helix from the helix samples (sketched below)
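
A sketch of how the 315-number window could be assembled (names and the
window half-width are illustrative; the PSSM is a list of 20-score rows,
one per residue):

    import numpy as np

    def window_features(pssm, center, half=7):
        # 15 positions x (20 PSSM scores + 1 spacer flag) = 315 numbers.
        feats = []
        for i in range(center - half, center + half + 1):
            if 0 <= i < len(pssm):
                feats.extend(pssm[i])      # 20 PSI-BLAST PSSM scores
                feats.append(0.0)          # spacer flag: inside the protein
            else:
                feats.extend([0.0] * 20)   # off either end of the protein
                feats.append(1.0)          # spacer flag set
        return np.array(feats)

    # With a feature matrix X (rows of 315 numbers) and labels y (+1 helix,
    # -1 non-helix), any standard SVM library can learn the separator, e.g.:
    #   from sklearn.svm import SVC
    #   clf = SVC(kernel="rbf").fit(X, y)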

23
SVMs vs. NNs as Classifiers
  • Similarities
  • Compute a function on their inputs
  • Trained to minimize error
  • Differences
  • NNs find any hyperplane that separates the two
    classes
  • SVMs find the maximum-margin hyperplane
  • NNs can be engineered by designing their topology
  • SVMs can be tailored by designing the kernel
    function

24
SVM Details
Separating hyperplanes: choose w and b to minimize ½||w||²,
subject to y_i(w·x_i + b) ≥ 1 for every training sample i.
Dual form (support vectors): maximize Σ_i α_i − ½ Σ_i Σ_j α_i α_j y_i y_j (x_i·x_j),
s.t. α_i ≥ 0 and Σ_i α_i y_i = 0, where w = Σ_i α_i y_i x_i.
Kernel trick: replace the dot products x_i·x_j by a non-linear
kernel function K(x_i, x_j).
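
The dual-form classifier and the kernel trick translate directly into code;
a sketch with illustrative names, using the RBF kernel as the example
non-linear kernel:

    import math

    def rbf_kernel(x, z, gamma=0.1):
        # Non-linear kernel replacing the dot product x . z.
        return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

    def svm_decision(x, support_vectors, alphas, labels, b, kernel=rbf_kernel):
        # Dual form: f(x) = sum_i alpha_i * y_i * K(x_i, x) + b;
        # the predicted class is the sign of f(x).
        return sum(a * y * kernel(sv, x)
                   for a, y, sv in zip(alphas, labels, support_vectors)) + b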
25
Dubious Statement
  • In marked contrast to NNs, SVMs have few explicit
    parameters to fit
  • The vector of weights, w, is as long as the
    number of training samples
  • But the maximum-margin hyperplane will have most
    of the weights equal to zero: only the support
    vectors will have non-zero weights.