Structure Prediction - PowerPoint PPT Presentation

About This Presentation
Title:

Structure Prediction

Description:

Title: PowerPoint Presentation Author: Mark Clement Last modified by: Mark Clement Created Date: 2/16/2005 8:12:16 PM Document presentation format – PowerPoint PPT presentation

Slides: 18
Provided by: MarkC243
Learn more at: http://dna.cs.byu.edu

Transcript and Presenter's Notes

Title: Structure Prediction


1
Chapter 9
  • Structure Prediction

2
Motivation
  • Given a protein sequence, can you predict its
    molecular structure?
  • Want to avoid repeated x-ray crystallography, but
    want accuracy
  • You could use nucleotide alignment, but what do
    you do with the gapped regions?
  • More complex methods are only justified if they
    can be shown to perform better than simpler
    methods
  • Simpler methods are only justified if they can
    perform better than basic sequence alignment

3
First Step
  • Some structure comparison methods use secondary
    structures of the new sequence
  • Predict the location of secondary structure
    elements along the protein's backbone and the
    degree of residue burial
  • Supervised learning has been shown to perform
    well in this task

4
Artificial Neural Network
  • Predicts structure at this point

5
Danger
  • You may train the network on your training set,
    but it may not generalize to other data
  • Perhaps we should train several ANNs and then let
    them vote on the structure
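The voting idea can be sketched in Python. This is a minimal illustration rather than the slides' method: it assumes each trained network has already produced a per-residue state string over H (helix), E (strand) and C (other), and the function name `majority_vote` is hypothetical.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-residue state strings from several predictors.

    predictions: list of equal-length strings over H, E, C.
    Ties go to the state seen first in predictor order.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must have the same length")
    consensus = []
    for column in zip(*predictions):
        # most_common(1) returns the state with the highest vote count
        state, _count = Counter(column).most_common(1)[0]
        consensus.append(state)
    return "".join(consensus)

# Three hypothetical networks disagree at positions 3 and 6.
votes = ["HHHHCCEEE", "HHHCCCEEE", "HHHHCCCEE"]
print(majority_vote(votes))
```

A network that overfits in one region is outvoted as long as the other networks do not make the same mistake there.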

6
Profile network from HeiDelberg (PHD)
  • A sequence family (a multiple alignment) is used
    as input instead of just the new sequence
  • On the first level, a window of length 13 around
    the residue is used
  • The window slides down the sequence, making a
    prediction for each residue
  • The input includes the frequency of amino acids
    occurring in each position in the multiple
    alignment (In the example, there are 5 sequences
    in the multiple alignment)
  • The second level takes these predictions from
    neural networks that are centered on neighboring
    residues
  • The third level does a jury selection

7
PHD
  • [Figure: PHD network diagram; labels: Predicts 4,
    Predicts 5, Predicts 6]
8
Threading
  • Threading matches structure to sequence
  • True threading considers 3D spatial interactions

9
3D-1D Matching (Bowie et al.)
  • Convert 3D structure into a string
  • Include α-helix, β-sheet or neither
  • Include buried or solvent accessible (6 levels)
  • Total of 3 × 6 = 18 distinct states
  • With P_aj the probability of finding amino acid
    (a) in environment (j) and P_a the probability of
    finding (a) anywhere, the score is
    s_aj = ln(P_aj / P_a)
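The 18 environment states and the log-odds score can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `ENVIRONMENTS` and `score_table`, the pseudocount, and the toy observations are hypothetical and are not taken from Bowie et al.; the score computed is ln(P_aj / P_a).

```python
import math
from collections import Counter

SECONDARY = ("helix", "sheet", "other")
BURIAL_LEVELS = range(6)  # assumption: level 0 = most buried
ENVIRONMENTS = [(s, b) for s in SECONDARY for b in BURIAL_LEVELS]  # 3 x 6 = 18

def score_table(observations, pseudocount=1.0):
    """Return {(amino_acid, environment): ln(P_aj / P_a)}.

    observations: iterable of (amino_acid, environment) pairs taken from
    known structures. The pseudocount (an assumption, not from the slides)
    keeps unseen combinations from producing log(0).
    """
    observations = list(observations)
    pair_counts = Counter(observations)
    aa_counts = Counter(a for a, _ in observations)
    env_counts = Counter(e for _, e in observations)
    amino_acids = sorted(aa_counts)
    total = len(observations)
    scores = {}
    for env, n_env in env_counts.items():
        for a in amino_acids:
            # P_aj: probability of amino acid a given environment j
            p_aj = (pair_counts[(a, env)] + pseudocount) / (
                n_env + pseudocount * len(amino_acids))
            # P_a: probability of amino acid a anywhere
            p_a = aa_counts[a] / total
            scores[(a, env)] = math.log(p_aj / p_a)
    return scores

# Toy data: valine mostly buried, aspartate mostly exposed.
buried, exposed = ("helix", 0), ("helix", 5)
toy = ([("V", buried)] * 8 + [("D", buried)] * 2 +
       [("V", exposed)] * 2 + [("D", exposed)] * 8)
table = score_table(toy)
print(table[("V", buried)], table[("D", buried)])
```

A positive score means the amino acid is found in that environment more often than its overall frequency would suggest; a negative score means less often.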

10
3D-1D
  • The information value score was calculated on a
    training set of multiple alignments and used as
    a profile for each column
  • When applied to the globin family, it clearly
    distinguished myoglobins from nonglobins but not
    from other globins

11
Methods using 3D interactions
  • Residues that have large separation in the
    sequence may end up next to each other when the
    protein is folded.
  • Define a measure of contact between residues (two
    atoms within 5Å) and count frequency of contact
    between all pairs in PDB
  • Use measure in alignment to evaluate cost, or to
    select the best alignment
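The contact measure above can be sketched as a small counting routine. The data layout (one-letter code plus a list of atom coordinates per residue), the function names, and the `min_separation` filter for chain neighbours are assumptions made for illustration; a real count would read coordinates from PDB files.

```python
import math
from collections import Counter
from itertools import combinations

CONTACT_CUTOFF = 5.0  # Å, as on the slide

def in_contact(atoms_a, atoms_b, cutoff=CONTACT_CUTOFF):
    """True if any atom of one residue lies within cutoff of any atom of the other."""
    return any(math.dist(p, q) <= cutoff for p in atoms_a for q in atoms_b)

def count_contacts(residues, min_separation=2):
    """Count contacts per residue-type pair in one chain.

    residues: list of (one_letter_code, [(x, y, z), ...]) in chain order.
    Pairs closer than min_separation in sequence are skipped, since chain
    neighbours are always near each other (an assumption).
    """
    counts = Counter()
    for (i, (aa_i, atoms_i)), (j, (aa_j, atoms_j)) in combinations(
            enumerate(residues), 2):
        if j - i < min_separation:
            continue
        if in_contact(atoms_i, atoms_j):
            counts[tuple(sorted((aa_i, aa_j)))] += 1
    return counts

# Toy chain with one atom per residue; A and V end up 4 Å apart.
chain = [
    ("A", [(0.0, 0.0, 0.0)]),
    ("G", [(3.8, 0.0, 0.0)]),
    ("V", [(0.0, 4.0, 0.0)]),
    ("D", [(20.0, 0.0, 0.0)]),
]
print(count_contacts(chain))
```

Summing these counters over many structures gives the pair frequencies that the alignment cost would draw on.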

12
3D interactions
13
Potentials of mean force (POMF)
  • Since the notion of contact is somewhat
    arbitrary, a more general formulation can be
    tried
  • Derive an empirical function for the propensity
    of each of the 400 pairs of residues to be any
    given distance apart.
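One common way to derive such an empirical function is to bin observed distances for each residue pair and compare them with the distribution over all pairs, taking a negative logarithm. The sketch below assumes that form; the slides do not give a formula. The bin width, distance cap, pseudocount and function names are hypothetical, and pairs are treated as unordered here, which gives 210 pair types rather than 400.

```python
import math
from collections import Counter, defaultdict

BIN_WIDTH = 1.0      # Å per histogram bin (assumption)
MAX_DISTANCE = 15.0  # Å, distances beyond this are ignored (assumption)

def distance_bin(distance, bin_width=BIN_WIDTH):
    return int(distance // bin_width)

def pomf(observations, pseudocount=1.0):
    """Return {pair: {bin: energy}} with energy = -ln(f_pair / f_all).

    observations: iterable of (aa1, aa2, distance_in_angstrom).
    Negative energy marks distances the pair adopts more often than
    residue pairs in general do.
    """
    pair_hist = defaultdict(Counter)
    all_hist = Counter()
    for a, b, d in observations:
        if d >= MAX_DISTANCE:
            continue
        k = distance_bin(d)
        pair_hist[tuple(sorted((a, b)))][k] += 1
        all_hist[k] += 1
    n_bins = int(MAX_DISTANCE / BIN_WIDTH)
    total_all = sum(all_hist.values())
    energies = {}
    for pair, hist in pair_hist.items():
        total_pair = sum(hist.values())
        energies[pair] = {}
        for k in range(n_bins):
            f_pair = (hist[k] + pseudocount) / (total_pair + pseudocount * n_bins)
            f_all = (all_hist[k] + pseudocount) / (total_all + pseudocount * n_bins)
            energies[pair][k] = -math.log(f_pair / f_all)
    return energies

# Toy observations: A-V pairs cluster near 5 Å, D-V pairs near 10 Å.
toy_pairs = ([("A", "V", 5.2)] * 10 + [("A", "V", 10.5)] * 2 +
             [("D", "V", 5.5)] * 1 + [("D", "V", 10.2)] * 10)
energies = pomf(toy_pairs)
print(energies[("A", "V")][5], energies[("D", "V")][5])
```

Because the energy is a negative log of a frequency ratio, a peak in a pair's probability curve shows up as a well in its energy curve, and a dip shows up as a barrier.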

14
Multiple Sequence Threading
  • Multiple Sequence Alignment
  • Align the most similar sequences to create a
    consensus sequence
  • Align consensus sequences to create overall
    alignment
  • Use the same strategy with structures
  • Assume that conserved hydrophobic positions
    should pack in the core
  • This appears to be work in progress (1997)

15
Example
  • Alanine (A) and valine (V) are two small
    hydrophobic residues, both of which favor
    packing in the core of the protein.
  • The POMF(A,V) probability curve would have a
    peak around 5 Å
  • Aspartate (D) and valine (V) do not often pack
    together
  • The POMF(D,V) probability curve will have a dip
    around 5 Å

[Figure: two plots of probability versus distance, one for POMF(A,V) and one for POMF(D,V), each with 5 Å marked on the distance axis]
16
Sequence-Structure Alignment
  • For all known structures
  • Align the unknown sequence to that structure
  • Find the best alignment
  • Return the structure with the best global
    alignment
  • Unfortunately, we can't use dynamic programming
    once pairwise 3D interactions are scored (the
    problem is NP-complete)
  • Heuristics must be used to explore the space.
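The outer loop above (try every known structure, keep the best) can be sketched with a deliberately simple heuristic: slide the sequence along each template without gaps and score each position independently. This is an assumption-laden simplification, not the method on the slide; it ignores gaps and pairwise interactions, and the score table, one-letter environment codes ('b' buried, 'e' exposed) and function names are all hypothetical.

```python
def thread_score(sequence, environments, scores, default=0.0):
    """Sum the per-position score of each residue in its template environment."""
    return sum(scores.get((a, e), default)
               for a, e in zip(sequence, environments))

def best_ungapped_placement(sequence, environments, scores):
    """Try every gap-free offset of the sequence on one template.

    Returns (best_score, offset), or (-inf, None) if the sequence is
    longer than the template.
    """
    n, m = len(sequence), len(environments)
    if n > m:
        return float("-inf"), None
    return max((thread_score(sequence, environments[o:o + n], scores), o)
               for o in range(m - n + 1))

def recognize_fold(sequence, templates, scores):
    """Return (template_name, score, offset) for the best-scoring template.

    templates: dict mapping a name to its environment string.
    """
    best = None
    for name, environments in templates.items():
        score, offset = best_ungapped_placement(sequence, environments, scores)
        if best is None or score > best[1]:
            best = (name, score, offset)
    return best

# Toy score table: valine prefers buried sites, aspartate exposed ones.
toy_scores = {("V", "b"): 1.0, ("V", "e"): -1.0,
              ("D", "b"): -1.0, ("D", "e"): 1.0}
toy_templates = {"fold1": "eebbee", "fold2": "bebebe"}
print(recognize_fold("VVDD", toy_templates, toy_scores))
```

Scoring positions independently is what keeps this search cheap; once the score of one residue depends on which residues land near it in 3D, the placements can no longer be evaluated independently and heuristics are needed.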

17
Evaluating Methods
  • Is the complexity worth it?
  • This is difficult without a benchmark
  • Few comparative studies have been performed
  • When they have been performed, authors of
    competing methods have complained that wrong
    parameters were used
  • Critical Assessment of Structure Prediction
    (CASP, first run in 1994) releases the sequences
    of proteins whose structures are not yet
    published.
  • All methods submit their predictions
  • Predictions are analyzed based on fold
    recognition, modeling accuracy and alignment
    accuracy.
  • No one method or approach is obviously superior