Structure prediction - PowerPoint PPT Presentation

About This Presentation
Title:

Structure prediction

Description:

4 Basic Levels of Protein Structure and all information is in the sequence ... AB-INITIO methods. Simulate the process of folding ...

Slides: 65
Provided by: dbb9

Transcript and Presenter's Notes


1
Structure prediction
  • Why do we need structure prediction ?
  • Intellectual
  • Practical
  • Levinthal's paradox
  • Anfinsen's experiments
  • How it will be solved
  • Physics
  • Computer science
  • The protein design problem

2
4 Basic Levels of Protein Structure and all
information is in the sequence
3
What is homology
4
Structures change linearly with ED
5
Homology
  • Similar sequence - Similar structure ?

6
Examples of similar structure
7
Zones
8
Homology can be detected by Sequence Alignment
  • Key aspect of sequence comparison is sequence
    alignment
  • A sequence alignment maximizes the number of
    positions that are in agreement in two sequences

9
Alignments
  • Local alignment
  • Global alignments

Global alignment:
LGPSTKDFGKISESREFDN
LNQLERSFGKINM-RLEDA
Local alignment:
----------FGKI----------
----------FGKI----------
10
Dotplots
11
Methods to align
  • Optimal alignment
  • Maximise similarities
  • Minimize gaps
  • Score of an alignment
  • Score for substitution
  • Gap-opening and gap-extension costs
  • Dynamic programming
  • Finds optimal solution
  • BLAST
  • Heuristic, fast algorithm using indexes (hashes)
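
A minimal sketch of the dynamic programming idea, here in its local (Smith-Waterman) form. It is an illustration only: it uses identity scoring and a single linear gap penalty instead of a substitution matrix with separate gap-opening and gap-extension costs, and the function name and default scores are assumptions, not taken from the slides.

```python
def smith_waterman(a, b, match=1, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for one best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + sub,   # align the two residues
                          H[i - 1][j] + gap,       # gap in b
                          H[i][j - 1] + gap)       # gap in a
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until the score drops to zero.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


# The two sequences from the alignment example (gap character removed).
score, top, bottom = smith_waterman("LGPSTKDFGKISESREFDN", "LNQLERSFGKINMRLEDA")
print(score, top, bottom)
```

The table is filled in time proportional to the product of the two sequence lengths, which is why heuristics such as BLAST are used for database searches.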

12
When are two sequences homologous
13
Identities do not provide the best similarity
14
Statistics of Sequence scores
  • Local alignments
  • Follows extreme value distribution
  • Scores depend on log(length)
  • E (or P-value)
  • Global alignments
  • Heuristics
  • Randomize sequences
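
For local alignments the extreme value statistics are usually written as E = K·m·n·exp(-λS) (the Karlin-Altschul form), with P = 1 - exp(-E). The sketch below only evaluates these formulas; K and λ depend on the scoring system, and the numbers used in the example call are illustrative assumptions, not values from the slides.

```python
import math

def e_value(score, m, n, K, lam):
    """Expected number of chance local alignments scoring at least `score`
    for a query of length m against a database of total length n."""
    return K * m * n * math.exp(-lam * score)

def p_value(e):
    """Probability of at least one chance hit, given the E-value."""
    return -math.expm1(-e)  # numerically stable form of 1 - exp(-e)

def score_for_e_value(e, m, n, K, lam):
    """Score needed to reach a given E-value; it grows with log(m * n)."""
    return math.log(K * m * n / e) / lam

# Illustrative parameters (assumed, scoring-system dependent).
K, LAM = 0.041, 0.267
small_db = score_for_e_value(1e-3, 300, 1e6, K, LAM)
large_db = score_for_e_value(1e-3, 300, 1e9, K, LAM)
print(round(small_db, 1), round(large_db, 1))
```

The second function makes the "scores depend on log(length)" point explicit: a larger search space requires a higher score for the same significance.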

15
Statistical comparison of alignment scores
16
How to improve alignments
  • Use more evolutionary information
  • Multiple alignments
  • Profiles
  • HMMs
  • Profile-profile alignments
  • Using additional information
  • Structure
  • Structural alignments

17
Multiple sequence alignments
  • Computationally intensive
  • Heuristic methods

18
Profiles can be used to detect distant homologs
  • Extra information
  • How to best use
  • Different methods
  • Patterns
  • Evolutionary method
  • Profile methods
  • HMMs
  • ANNs

19
PSI-BLAST in a nutshell
  • With a protein sequence as query, use BLAST to
    search a protein sequence database.
  • Collapse significant local alignments (those with
    E-value less than or equal to a set threshold h)
    into a multiple alignment, using the residues of
    the query sequence as alignment-column
    placeholders.
  • Abstract a position-specific score matrix from
    the multiple alignment.
  • Search the database with the score matrix as
    query.
  • Iterate a fixed number of times, or until
    convergence.
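
The step "abstract a position-specific score matrix from the multiple alignment" can be sketched as below. This is a simplified illustration, assuming a uniform background frequency and a flat pseudocount; the real PSI-BLAST uses sequence weighting and more elaborate pseudocounts. All names are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_pssm(alignment, pseudocount=1.0):
    """Return one {amino acid: log-odds score} dict per alignment column.

    `alignment` is a list of equal-length strings; gap characters are ignored.
    """
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background
    pssm = []
    for column in zip(*alignment):
        counts = Counter(c for c in column if c in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm

def score_with_pssm(pssm, segment):
    """Score an ungapped segment of the same length as the PSSM."""
    return sum(column[aa] for column, aa in zip(pssm, segment))

# Toy alignment built around the FGKI motif (made-up rows).
pssm = build_pssm(["FGKI", "FGKI", "FGRI", "YGKI"])
print(round(score_with_pssm(pssm, "FGKI"), 2))
```

In the next iteration the matrix, rather than the query sequence, is used to score database sequences.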

20
Protein structure prediction (and other uses for
molecules in life in a computer)
  • Secondary structure predictions
  • Homology detection
  • What is homology
  • Why it is related to protein structure
  • How does it work
  • Simulations of folding
  • What is physics ?
  • Realistic simulations (folding_at_home)
  • Smart simulations (rosetta_at_home)

21
It's not that simple...
  • Amino acid sequence contains all the information
    for 3D structure (experiments of Anfinsen,
    1970's)
  • But, there are thousands of atoms, rotatable
    bonds, solvent and other molecules to deal
    with...

22
All the 3D information is in the sequence
23
Levinthal Paradox
  • Cyrus Levinthal, Columbia University, 1968
  • Levinthal's paradox
  • If we have 3 rotamers per residue, a 100-residue
    protein has 3^100 possible conformations. To
    search all of these takes longer than the age of
    the universe. But proteins fold in less than a
    second.
  • Resolution: proteins have to fold through some
    directed process
  • Goal is to understand the dynamics of this process
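
The arithmetic behind the paradox can be checked directly. The sampling rate and the age of the universe used below are rough assumptions for illustration, not figures from the slides.

```python
AGE_OF_UNIVERSE_SECONDS = 4.3e17  # roughly 13.8 billion years (assumed)

def levinthal_search_seconds(n_residues, states_per_residue=3,
                             conformations_per_second=1e13):
    """Time to enumerate every conformation at a fixed sampling rate."""
    conformations = states_per_residue ** n_residues
    return conformations / conformations_per_second

seconds = levinthal_search_seconds(100)
print(f"3^100 = {3 ** 100:.2e} conformations")
print(f"exhaustive search: {seconds:.1e} s, "
      f"{seconds / AGE_OF_UNIVERSE_SECONDS:.1e} times the age of the universe")
```

Even with a very optimistic sampling rate the search time exceeds the age of the universe by many orders of magnitude, so folding cannot be a random search.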

24
Old vs. New Views of folding
  • Old
  • Hierarchical view of protein folding
  • Secondary structures form, then interact to form
    tertiary structures
  • General order of events
  • New
  • Statistical ensembles of states
  • Potential energy landscape
  • Folding Funnel

25
Two alternatives for structure prediction
  • Simulation of protein folding
  • Folding_at_home (Erik next week)
  • Identification of lowest energy structure
  • More successful (today)
  • Several layers
  • Secondary structure
  • 3D-structure

26
Secondary structure prediction
  • AA preferences for different SS
  • Pro
  • Does not have a backbone NH
  • No H-bonds
  • Prefers Coils
  • Also in N-terminal part of helices and Beta-turns
  • Gly (compared with Ala)
  • No sidechain on Gly (more flexible)
  • Polar groups in loops
  • Additional H-bonds to backbone
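
Such preferences are commonly quantified as a propensity: the frequency of an amino acid within one secondary structure state divided by its overall frequency. The sketch below assumes a sequence and an equally long string of per-residue states (H, E, C); the toy data is made up for illustration.

```python
from collections import Counter

def propensities(sequence, states):
    """Return {(amino acid, state): propensity}; values > 1 mean enriched."""
    if len(sequence) != len(states):
        raise ValueError("sequence and states must have the same length")
    n = len(sequence)
    pair_counts = Counter(zip(sequence, states))
    aa_counts = Counter(sequence)
    state_counts = Counter(states)
    return {
        (aa, st): (pair_counts[(aa, st)] / state_counts[st]) / (aa_counts[aa] / n)
        for aa in aa_counts
        for st in state_counts
    }

# Toy example: Pro and Gly only in coil, Ala only in helix.
table = propensities("PGPGAAAA", "CCCCHHHH")
print(table[("P", "C")], table[("P", "H")], table[("A", "H")])
```

On real data the counts would come from a set of known structures with assigned secondary structure.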

27
Amino acid preferences in coil
28
Amino acid preferences in β-Strand
29
Secondary structure preferences
  • Cβ-branched AAs prefer sheets
  • Entropic cost in helices of sidechain rotations
  • Hydrophobic groups prefer SS-elements
  • Negatively charged residues at C-terminal end of
    helices due to dipole effect.

30
Amino acid preferences in β-Strand
31
Amino acid preferences in α-Helix
32
Secondary structure preferences
  • Cβ-branched AAs prefer sheets
  • Entropic cost in helices of sidechain rotations
  • Hydrophobic groups in SS-elements
  • Polar in loops
  • Negatively charged residues at C-terminal end of
    helices due to dipole effect.

33
Templates for helix, loops and sheet
34
More elaborate templates
  • Key residues
  • Gly in turns

35
Incorporating globular effects
  • Hydrophobic lake

36
Example of SS predictions
37
PhD (Rost & Sander, 1994)
38
PhD-Input
39
PhD-architecture
40
PhD-predictions
41
PhD summary
  • First method with >70% Q3
  • Correct length distributions
  • Much better beta strand predictions
  • Good correlation between score and accuracy
  • Better predictions for larger multiple sequence
    alignments
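
Q3 is the percentage of residues whose three-state assignment (helix, strand, coil) is predicted correctly. A minimal sketch, assuming both predicted and observed states are given as equally long strings over H, E and C:

```python
def q3(predicted, observed):
    """Three-state per-residue accuracy in percent."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("empty input")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)

# Toy example: 8 of 10 residues agree.
print(q3("HHHHCCEEEE", "HHHCCCEEEC"))  # 80.0
```

Q3 says nothing about segment lengths, which is why the length distribution of predicted elements is listed as a separate point.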

42
Threading
  • A priori prediction of the Interferon fold in
    1985
  • Good prediction of helices

43
Prediction of interferon fold
44
FR methodologies
45
3D profiles
46
Threading or Fold recognition
VIFVLWGNAARQKCN LLFQTKHQHAVLACPH
47
PROSA/THREADER
48
How good is FR?
  • LiveBench and CASP measure performance.
  • E-values work reasonably well.
  • In the real world, you might get a few percent
    more "hits" with FR compared to PSI-BLAST.
  • Individual researcher vs. genome-wide analysis
  • Structure information not necessary?

49
Success of FR
50
Alignments are not always perfect
51
Does threading really work ?
  • Evolutionary methods work better
  • Secondary Structure Predictions might help

52
AB-INITIO methods
  • Simulate the process of folding
  • Folding_at_home - MD simulations of small peptides
  • Find the lowest energy structure
  • Does not simulate the process.
  • Consequences of small energy gap
  • Unrealistic to model exactly
  • Easier to distinguish between Correct/Incorrect
    than between Folded/non-folded.

53
How Rosetta Works
  • Minimize energy in the folded state
  • Uses a combination of energy formulas based on
    the likelihood of particular structures, and the
    fitness of the sequence
  • Side-chains simplified to a centroid located at
    center of mass of the side-chain
  • Average of observed side-chain centroids in known
    structures
  • Local sequence does not decide the local
    structure, it only biases the decision
  • Non-local favorable conditions
  • Buried hydrophobic fragments
  • Paired β strands
  • Specific side-chain interactions

54
Rosetta clustering the models
  • Compare models to each other with RMSD
  • Models can come from different family members
  • Cutoff varied to give 80-100 members in largest
    cluster
  • The largest clusters are assumed to contain the
    best structures (attractors in folding space...?)
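
The clustering step can be sketched as follows: compute the RMSD between all pairs of models and pick the model with the most neighbours within the cutoff. This is a simplified illustration that assumes the models are already superimposed and given as lists of (x, y, z) coordinates; the function names are hypothetical and this is not the Rosetta implementation.

```python
import math

def rmsd(a, b):
    """RMSD between two equally long lists of (x, y, z) coordinates."""
    squared = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(squared / len(a))

def largest_cluster(models, cutoff):
    """Return (index of cluster centre, indices of all members)."""
    best_centre, best_members = None, []
    for i, model in enumerate(models):
        members = [j for j, other in enumerate(models)
                   if rmsd(model, other) <= cutoff]
        if len(members) > len(best_members):
            best_centre, best_members = i, members
    return best_centre, best_members

def shifted(dx):
    """Toy three-atom model displaced by dx along x."""
    return [(0.0 + dx, 0.0, 0.0), (1.0 + dx, 0.0, 0.0), (2.0 + dx, 0.0, 0.0)]

# Three similar toy models and one outlier.
models = [shifted(0.0), shifted(0.1), shifted(0.2), shifted(10.0)]
print(largest_cluster(models, cutoff=0.15))
```

To follow the slide, the cutoff would be adjusted in an outer loop until the largest cluster holds the desired number of members.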

55
Recent improvements to Rosetta
  • Refinement in HR Rosetta
  1. Make small dihedral changes
  2. Rebuild sidechains
  3. Minimize (in dihedral space)
  4. Evaluate energy
  5. Go to 1
  • 5 out of 16 small proteins < 1.5 Å

56
Physics of Rosetta
  • Is Rosetta physical ?
  • What are the most important terms in globular and
    local free energies ?
  • How do proteins really fold ?
  • What do you think ?

57
Designs
  • Molten globule designs
  • Regan 50

58
Design of four-helical bundle (De Grado, 1991)
Molten Globule
59
What characterizes a molten globule
  • Compact
  • Good secondary structure
  • Not solid
  • Sidechains not packed
  • No cooperative folding

60
Mayo method
  • Automatic design
  1. Take fold (backbone)
  2. Take sequence (random)
  3. Mutate sequence
  4. Build sidechains
  5. Calculate energy
  6. Accept/reject
  7. Go to 3
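
The mutate, score, accept/reject loop can be sketched with a toy model. Everything here is an assumption made for illustration: the backbone is reduced to a string of buried (B) and exposed (E) sites, the energy only rewards hydrophobic residues at buried sites and polar residues at exposed ones, sidechains are not built, and a Metropolis criterion is used for accept/reject. It is not the energy function or search method of the published design work.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HYDROPHOBIC = set("AVILMFW")  # assumption: crude hydrophobic set

def toy_energy(sequence, burial):
    """-1 for each site whose residue type fits the site, +1 otherwise."""
    return sum(-1 if (aa in HYDROPHOBIC) == (site == "B") else 1
               for aa, site in zip(sequence, burial))

def design(burial, steps=2000, temperature=0.5, seed=0):
    """Return (best sequence found, its toy energy)."""
    rng = random.Random(seed)
    seq = [rng.choice(AMINO_ACIDS) for _ in burial]    # random start sequence
    energy = toy_energy(seq, burial)
    best_seq, best_energy = list(seq), energy
    for _ in range(steps):
        pos = rng.randrange(len(seq))                  # mutate one position
        old = seq[pos]
        seq[pos] = rng.choice(AMINO_ACIDS)
        delta = toy_energy(seq, burial) - energy       # calculate energy
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            energy += delta                            # accept
            if energy < best_energy:
                best_seq, best_energy = list(seq), energy
        else:
            seq[pos] = old                             # reject
    return "".join(best_seq), best_energy

print(design("BBEEBBEE"))
```

A real design protocol replaces the toy energy with a physical or knowledge-based one and rebuilds sidechain rotamers before each evaluation.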

61
Designing a non-zinc finger
62
Design of a non-zinc finger (Dahiyat and Mayo)
63
Alfabetin
64
Non-MG alfabetin shows cooperative folding
65
TOP7 (Rosetta Design)
  • Novel fold
  • Iterate between design and refinement
  • Non molten globular behavior.

66
Iteration between Seq and Str
Sequence Structure
67
The project
  • Three goals
  • Learn how to develop a (binary) predictor
  • Read the background literature
  • Write a scientific report about your work
  • Write a program that can do all of this.
  • Additional goal (for top grades)
  • Make a web-server
  • Combine your predictor into a full system

68
Tools
  • Python (or other language)
  • Write scripts to do the work
  • svmlight
  • Used in the bioinformatics course
  • Preparsed datafiles
  • Annotations
  • PSIBLAST needs to be parsed
  • Evaluation programs should be developed.
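
One possible way to turn a sequence into svmlight input is a sliding window with one-hot encoded residues. svmlight expects one example per line in the form `<target> <feature>:<value> ...` with feature numbers in increasing order. The window size, the encoding and the function names below are assumptions for illustration; the course datafiles may be organised differently.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def window_features(sequence, centre, window=7):
    """Indices (1-based) of the active one-hot features around `centre`."""
    half = window // 2
    features = []
    for offset in range(-half, half + 1):
        pos = centre + offset
        if 0 <= pos < len(sequence) and sequence[pos] in AMINO_ACIDS:
            slot = offset + half  # position inside the window
            features.append(slot * len(AMINO_ACIDS)
                            + AMINO_ACIDS.index(sequence[pos]) + 1)
    return features  # ascending, because slot increases with offset

def svmlight_lines(sequence, labels, window=7):
    """One svmlight line per residue; labels are True (+1) or False (-1)."""
    lines = []
    for i, label in enumerate(labels):
        target = "+1" if label else "-1"
        features = " ".join(f"{f}:1" for f in window_features(sequence, i, window))
        lines.append(f"{target} {features}")
    return lines

for line in svmlight_lines("ACD", [True, False, True], window=3):
    print(line)
```

Window positions that fall outside the sequence are simply left empty, which svmlight treats as zero-valued features.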

69
The projects
  • Binary classifier
  • Alpha-helix/other, etc.
  • Surface area
  • Membrane/non-membrane
  • Globular or membrane datasets
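
A binary classifier of this kind is typically evaluated with accuracy and the Matthews correlation coefficient (MCC), which is less sensitive to unbalanced classes. A minimal sketch, assuming predictions and observations are given as equally long lists of booleans (the function names are hypothetical):

```python
import math

def confusion(predicted, observed):
    """Return (tp, fp, tn, fn) for two equally long boolean sequences."""
    tp = sum(1 for p, o in zip(predicted, observed) if p and o)
    fp = sum(1 for p, o in zip(predicted, observed) if p and not o)
    tn = sum(1 for p, o in zip(predicted, observed) if not p and not o)
    fn = sum(1 for p, o in zip(predicted, observed) if not p and o)
    return tp, fp, tn, fn

def accuracy(predicted, observed):
    tp, fp, tn, fn = confusion(predicted, observed)
    return (tp + tn) / (tp + fp + tn + fn)

def mcc(predicted, observed):
    """Matthews correlation coefficient; 0.0 when it is undefined."""
    tp, fp, tn, fn = confusion(predicted, observed)
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denominator if denominator else 0.0

predicted = [True, True, False, False, True, False]
observed = [True, False, False, False, True, True]
print(accuracy(predicted, observed), mcc(predicted, observed))
```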

70
The program
  • Input.
  • Sequence in fasta file
  • Output
  • A prediction for each residue in the sequence
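
The input and output handling could look like the sketch below: read FASTA records, then emit one prediction character per residue. The classifier passed in is a placeholder rule invented for this example; in the project it would be replaced by the trained predictor.

```python
def read_fasta(lines):
    """Parse an iterable of lines into a list of (header, sequence) tuples."""
    records, name, chunks = [], None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records

def predict_per_residue(sequence, classify):
    """Apply `classify(sequence, index)` to every residue; join the results."""
    return "".join(classify(sequence, i) for i in range(len(sequence)))

def placeholder_classifier(sequence, index):
    """Hypothetical stand-in for a trained model: marks A, E, L, M as helix."""
    return "H" if sequence[index] in "AELM" else "-"

for header, sequence in read_fasta([">toy", "FGKI", "SESRE"]):
    print(header)
    print(sequence)
    print(predict_per_residue(sequence, placeholder_classifier))
```

With a file on disk, `read_fasta(open(path))` works the same way because a file object is an iterable of lines.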

71
Web-server
  • For top grades (A,B) a web-server should be
    developed, using the following steps.
  • Learn how to use PHP
  • Ask for an account on a web-server.
  • Use the template index.php available from the
    web-page.

72
The report
  • The following sections
  • Abstract
  • Introduction
  • Methods
  • Results and Discussion
  • Conclusions
  • References
  • More info May 8