Predicting Structural Features

1
Predicting Structural Features
  • Chapter 12

2
Structural Features
  • Phosphorylation sites
  • Transmembrane helices
  • Protein flexibility

3
Accuracy Measures Revisited
  • Level
  • Individual residues
  • Complete helix or strand

4
Residue-Level Measures
  • Q3
  • Percentage of residues predicted correctly
  • If one state (e.g., Coil) is very common (e.g., 50% of residues),
    blind guessing can give a large Q3!
  • Matthews correlation coefficient
  • C = (TP×TN − FP×FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN))
  • Defined for each state
  • More balanced than Q3; ranges from −1 to +1
  • Random prediction gives C ≈ 0 (both measures are sketched in code below)
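
A minimal sketch of both residue-level measures in Python (not from the
slides; assumes predictions and observed states are same-length strings
over the states H/E/C):

    import math

    def q3(pred, obs):
        # Percentage of residues whose predicted state matches the observed one.
        return 100.0 * sum(p == o for p, o in zip(pred, obs)) / len(obs)

    def mcc(pred, obs, state):
        # Matthews correlation coefficient for one state (e.g. 'H').
        tp = sum(p == state and o == state for p, o in zip(pred, obs))
        tn = sum(p != state and o != state for p, o in zip(pred, obs))
        fp = sum(p == state and o != state for p, o in zip(pred, obs))
        fn = sum(p != state and o == state for p, o in zip(pred, obs))
        denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
        return (tp * tn - fp * fn) / denom if denom else 0.0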

5
Structural Element-Level Measures
  • SOV
  • Based on the overlap of predicted segments of
    helix, strand, etc. with the observed segments of
    the same type
  • The N-score
  • specialized for transmembrane protein predictors
  • Should TMHMM2 be changed? Should your model?

6
Predicting Helices
  • Residue propensities
  • A score for a given structure class for each
    residue type, a
  • P(H|a) is proportional to P(a|H) / P(a)
  • Why? Bayes' rule is your friend!
  • P(H|a) = P(a|H) P(H) / P(a)
  • P(H) doesn't depend on a, so
  • P(H|a) ∝ P(a|H) / P(a)

Can this be used to see how to group helix states?
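
A sketch of how the propensity P(a|H)/P(a) could be estimated from counts,
assuming the training data arrives as paired sequence and
secondary-structure strings (the function name is illustrative):

    from collections import Counter

    def helix_propensities(sequences, structures):
        # Estimate P(a|H)/P(a) for each amino acid a from paired
        # sequence / secondary-structure strings ('H' marks helix).
        # Assumes at least one helix residue in the training data.
        aa_counts, helix_counts = Counter(), Counter()
        n_total = n_helix = 0
        for seq, ss in zip(sequences, structures):
            for a, s in zip(seq, ss):
                aa_counts[a] += 1
                n_total += 1
                if s == 'H':
                    helix_counts[a] += 1
                    n_helix += 1
        # P(a|H) = helix_counts[a]/n_helix;  P(a) = aa_counts[a]/n_total
        return {a: (helix_counts[a] / n_helix) / (aa_counts[a] / n_total)
                for a in aa_counts}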
7
Identical short segments rarely fold differently
  • Local sequence is highly important to secondary
    structure.
  • But this sequence occurs in two proteins and
    takes very different forms:
  • KGVVPQLVK
  • There is significant information about structure
    in local sequence.

8
I-sites Sequence Database
  • About 250 short segments (3-19 residues) that
    show strong correlation between sequence and
    structure
  • Example shows
  • phi and psi angles, log-odds matrix
  • superimposed backbones
  • representative structure

9
Nearest Neighbor Prediction Methods
  • Predict secondary structure based on
  • Local alignments of the query sequence to a
    database of sequences of known structure
  • Alignment score functions are often
    special-purpose, and may include helix/sheet/coil
    propensity information
  • Homologous sequences are often included in the
    database
  • Prediction based on weighted votes of nearest
    neighbors (usually only central residue of
    alignment is predicted)
  • 73.5% accuracy (Q3)
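
The voting step described above can be sketched in a few lines; the local
alignment search that produces the neighbors is elided, and the names are
illustrative:

    def vote_center_state(neighbors):
        # Weighted vote over the nearest neighbors found by local alignment
        # of the query window against the structure database.
        # neighbors: list of (alignment_score, central_residue_state) pairs.
        votes = {}
        for score, state in neighbors:
            votes[state] = votes.get(state, 0.0) + score
        return max(votes, key=votes.get)

    # e.g. vote_center_state([(42.0, 'H'), (37.5, 'H'), (31.0, 'C')]) -> 'H'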

10
A different application: prediction of misfolding
  • Diseases such as Alzheimer's involve protein
    misfolding.
  • Usually, the misfolded region ends up as
    beta strands.
  • How could we use secondary structure information
    to predict which proteins will potentially
    misfold?

11
HβP: Hidden Beta Propensity
  • Key idea: tertiary contacts (TC)
  • TC is the number of contacts a residue has with
    others at least 4 residues away
  • Alpha helices tend to be in regions of HIGH TC
  • Beta strands tend to be in regions of LOW TC
  • Look for query residues whose nearest neighbors
    are strange with respect to TC and alpha/beta
    state
  • Low TC regions with lots of Alphas
  • High TC regions with lots of Betas
  • Performance results?

12
Neural Nets
  • Each node computes a simple function of its
    inputs.
  • The weighted sum of the inputs is added to a
    bias term and squashed:
  • I_j = Σ_i w_ji x_i + b_j
  • o_j = σ(I_j)
  • The output, o_j, is then propagated to nodes in the
    next layer.
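
A minimal sketch of that forward computation for one layer, using the
logistic sigmoid as the squashing function σ (function names are
illustrative):

    import math

    def sigmoid(I):
        return 1.0 / (1.0 + math.exp(-I))

    def layer_forward(inputs, weights, biases):
        # For each node j: I_j = sum_i w_ji * x_i + b_j, then o_j = sigmoid(I_j).
        return [sigmoid(sum(w * x for w, x in zip(w_row, inputs)) + b)
                for w_row, b in zip(weights, biases)]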

13
Training Neural Nets
  • Back-propagation
  • Optimizes the weights and bias terms
  • Minimize the error function (difference between
    predicted and observed)
  • RMS
  • Relative Entropy
  • Iterative process
  • Final weights shown for a secondary structure NN
    alpha helix output layer.
  • Over-fitting can be reduced by training for fewer
    iterations (early stopping); a one-node update step is sketched below
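
A hedged sketch of one back-propagation update for a single sigmoid output
node, assuming a squared (RMS-style) error function; the learning rate and
names are illustrative:

    import math

    def backprop_step(x, target, w, b, lr=0.1):
        # Forward pass: I = weighted sum plus bias, o = sigmoid(I).
        I = sum(wi * xi for wi, xi in zip(w, x)) + b
        o = 1.0 / (1.0 + math.exp(-I))
        # dE/dI for E = (o - target)^2 / 2, using sigmoid'(I) = o(1 - o).
        delta = (o - target) * o * (1.0 - o)
        # Gradient-descent update of weights and bias.
        w = [wi - lr * delta * xi for wi, xi in zip(w, x)]
        return w, b - lr * delta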

14
Adaptive Encoding and Weight Sharing
  • Orthogonal encoding
  • Each residue feeds three hidden nodes
  • The weights for all red nodes are tied together
  • Each group of three nodes learns the same
    encoding of the 20 amino acids
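
The two encodings can be contrasted in a short sketch (names illustrative;
the 20x3 matrix stands in for the tied weights of the red nodes):

    AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

    def orthogonal_encode(residue):
        # One-hot: a 20-vector with a single 1 at the residue's position.
        vec = [0.0] * 20
        vec[AMINO_ACIDS.index(residue)] = 1.0
        return vec

    def adaptive_encode(residue, shared_w):
        # Weight sharing: every window position uses the SAME 20x3 matrix,
        # so the net learns one 3-number code for the 20 amino acids.
        one_hot = orthogonal_encode(residue)
        return [sum(x * shared_w[i][k] for i, x in enumerate(one_hot))
                for k in range(3)]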

15
Engineering Intuition Into NNs
  • Alpha helices have a period of 3.6 residues per
    turn
  • An NN can be specially designed to reflect that
  • Using this, plus adaptive encoding:
  • Q3 = 66%
  • Adding homology: Q3 = 73%

16
HMMs and Transmembrane Proteins (again)
17
HMMTOP Architecture
  • TMHs: 17-25 residues
  • Tails: 1-15 residues
  • Blue letters show structural state labels

18
TMHMM Architecture
  • Helices are 5-25 residues
  • Caps follow helices
  • Cytoplasmic side:
  • Loop: 0-20 residues
  • Globular: 1 state
  • Extra-cellular side:
  • Long loop: 0-100 residues
  • Globular: 3 states
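
The layout above can be captured as a small configuration table (a
paraphrase of the slide, not TMHMM's actual parameter files):

    # Length ranges and state counts of the TMHMM layout described above.
    TMHMM_LAYOUT = {
        "helix":         {"length": (5, 25)},
        "cap":           {"follows": "helix"},
        "cytoplasmic":   {"loop": (0, 20),  "globular_states": 1},
        "extracellular": {"loop": (0, 100), "globular_states": 3},
    }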

19
Predicting Globular Proteins with Hidden Neural
Networks
  • YASPIN
  • Neural net predicts seven classes (He, H, Hb, C,
    Ee, E, Eb) using a 15-residue window of PSSM
    input
  • HMM filters this output
  • Can you imagine how this is done?
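
One plausible answer, sketched below: treat the per-residue class
probabilities from the net as emission scores and find the most probable
legal state path with the Viterbi algorithm, using a transition matrix
whose zeros forbid illegal class orders. This is a sketch of the general
idea, not YASPIN's exact procedure:

    import numpy as np

    def viterbi_filter(nn_probs, trans, states):
        # nn_probs: L x K per-residue class probabilities from the net.
        # trans:    K x K transition matrix (zeros forbid transitions).
        L, K = nn_probs.shape
        logp = np.log(nn_probs + 1e-10)
        logt = np.log(trans + 1e-10)
        score = np.empty((L, K))
        back = np.zeros((L, K), dtype=int)
        score[0] = logp[0]
        for i in range(1, L):
            for k in range(K):
                prev = score[i - 1] + logt[:, k]
                back[i, k] = int(prev.argmax())
                score[i, k] = prev[back[i, k]] + logp[i, k]
        # Trace the best path back from the last residue.
        path = [int(score[-1].argmax())]
        for i in range(L - 1, 0, -1):
            path.append(int(back[i, path[-1]]))
        return [states[k] for k in reversed(path)]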

20
Coiled-coil HMM: MARCOIL
Design lets you start and end in any phase of the
heptad repeat
21
Support Vector Machines (SVMs)
  • Classifiers
  • Basic machine is a 2-class classifier
  • Training Data
  • set of labeled vectors
  • ⟨x1, x2, …, xn, C⟩
  • Class C = +1 or C = −1
  • Supervised learning (like neural nets)
  • Learn from positive and negative examples
  • Output
  • Function predicting class of unlabeled vectors

22
SVM Example
  • Alpha helix predictor
  • 15-residue window
  • 21 numbers per residue
  • PSI-BLAST PSSM: 20 numbers
  • A spacer flag indicating positions off the end of the protein
  • 315 numbers total per window
  • Training samples
  • Non-helix samples: ⟨x1, x2, …, x315, −1⟩
  • Helix samples: ⟨x1, x2, …, x315, +1⟩
  • Training finds the function of x that best separates
    the non-helix from the helix samples (sketched below)
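
A sketch of how the 315-number window could be assembled (names and the
window half-width are illustrative; the PSSM is a list of 20-score rows,
one per residue):

    import numpy as np

    def window_features(pssm, center, half=7):
        # 15 positions x (20 PSSM scores + 1 spacer flag) = 315 numbers.
        feats = []
        for i in range(center - half, center + half + 1):
            if 0 <= i < len(pssm):
                feats.extend(pssm[i])      # 20 PSI-BLAST PSSM scores
                feats.append(0.0)          # spacer flag: inside the protein
            else:
                feats.extend([0.0] * 20)   # off either end of the protein
                feats.append(1.0)          # spacer flag set
        return np.array(feats)

    # With a feature matrix X (rows of 315 numbers) and labels y (+1 helix,
    # -1 non-helix), any standard SVM library can learn the separator, e.g.:
    #   from sklearn.svm import SVC
    #   clf = SVC(kernel="rbf").fit(X, y)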

23
SVMs vs. NNs as Classifiers
  • Similarities
  • Compute a function on their inputs
  • Trained to minimize error
  • Differences
  • NNs find any hyperplane that separates the two
    classes
  • SVMs find the maximum-margin hyperplane
  • NNs can be engineered by designing their topology
  • SVMs can be tailored by designing the kernel
    function

24
SVM Details
Separating hyperplanes: choose w and b to minimize ½||w||²,
subject to y_i(w·x_i + b) ≥ 1 for every training sample i.
Dual form (support vectors): maximize Σ_i α_i − ½ Σ_i Σ_j α_i α_j y_i y_j (x_i·x_j),
s.t. α_i ≥ 0 and Σ_i α_i y_i = 0, where w = Σ_i α_i y_i x_i.
Kernel trick: replace the dot products x_i·x_j by a non-linear
kernel function K(x_i, x_j).
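
The dual-form classifier and the kernel trick translate directly into code;
a sketch with illustrative names, using the RBF kernel as the example
non-linear kernel:

    import math

    def rbf_kernel(x, z, gamma=0.1):
        # Non-linear kernel replacing the dot product x . z.
        return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

    def svm_decision(x, support_vectors, alphas, labels, b, kernel=rbf_kernel):
        # Dual form: f(x) = sum_i alpha_i * y_i * K(x_i, x) + b;
        # the predicted class is the sign of f(x).
        return sum(a * y * kernel(sv, x)
                   for a, y, sv in zip(alphas, labels, support_vectors)) + b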
25
Dubious Statement
  • In marked contrast to NNs, SVMs have few explicit
    parameters to fit
  • The vector of weights, w, is as long as the
    number of training samples
  • But the maximum-margin hyperplane will have most
    of the weights equal to zero: only the support
    vectors will have non-zero weights.