Lecture 10: Protein Structure Prediction

Provided by: csd50

Transcript and Presenter's Notes

1
Lecture 10 protein structure prediction
2
A protein sequence
3
A protein sequence
  • >gi|22330039|ref|NP_683383.1| unknown protein
    protein id: At1g45196.1, Arabidopsis thaliana
  • MPSESSYKVHRPAKSGGSRRDSSPDSIIFTPESNLSLFSSASVSVDRCSS
    TSDAHDRDDSLISAWKEEFEVKKDDESQNL
  • DSARSSFSVALRECQERRSRSEALAKKLDYQRTVSLDLSNVTSTSPRVVN
    VKRASVSTNKSSVFPSPGTPTYLHSMQKGW
  • SSERVPLRSNGGRSPPNAGFLPLYSGRTVPSKWEDAERWIVSPLAKEGAA
    RTSFGASHERRPKAKSGPLGPPGFAYYSLY
  • SPAVPMVHGGNMGGLTASSPFSAGVLPETVSSRGSTTAAFPQRIDPSMAR
    SVSIHGCSETLASSSQDDIHESMKDAATDA
  • QAVSRRDMATQMSPEGSIRFSPERQCSFSPSSPSPLPISELLNAHSNRAE
    VKDLQVDEKVTVTRWSKKHRGLYHGNGSKM
  • RDHVHGKATNHEDLTCATEEARIISWENLQKAKAEAAIRKLEKYFPQMKL
    EKKRSSSMEKIMRKVKSAEKRAEEMRRSVL
  • DNRVSTASHGKASSFKRSGKKKIPSLSGCFTCHVF

4
Protein Structure
(Figure: heparin docking. Red: heparin; blue: central
domain; yellow: C-terminal domain.)
5
A Protein Structure
(Figure labels: beta-sheet, alpha-helix, loop, core.)
6
Domains and Folds
  • A domain is a discrete portion of a protein assumed
    to fold independently of the rest of the protein and
    to possess its own function.
  • Most proteins have multiple domains.
  • The core 3D structure of a domain is called a
    fold. There are only a few thousand possible
    folds.

7
Protein Similarity Levels
  • Family: proteins in the same family are homologous
    at the sequence level.
  • Superfamily: all members of the superfamily should
    have the same overall domain architecture, i.e.,
    the same domains in the same order.
  • Fold: the domains have similar folds, i.e., the
    same major secondary structures in the same
    arrangement and topology.

8
Protein Folding Problem
  • A protein folds into a unique 3D structure
    under physiological conditions.
  • Lysozyme sequence
  • KVFGRCELAA AMKRHGLDNY
  • RGYSLGNWVC AAKFESNFNT
  • QATNRNTDGS TDYGILQINS
  • RWWCNDGRTP GSRNLCNIPC
  • SALLSSDITA SVNCAKKIVS
  • DGNGMNAWVA WRNRCKGTDV
  • QAWIRGCRL

9
Relevance of Protein Structure in the Post-Genome
Era
(Figure: diagram linking sequence, structure,
function, and medicine.)
10
Structure-Function Relationship
  • Some functional information can be inferred
    without a structure, but a structure is key to
    understanding the detailed mechanism.
  • A predicted structure is a powerful tool for
    function inference.

(Figure: Trp repressor as a functional switch.)
11
Structure-Based Drug Design
  • Structure-based rational drug design is still
    a major method for drug discovery.

(Figure: HIV protease inhibitor.)
12
Protein Structure Prediction
  • Traditional experimental methods (X-ray
    crystallography or NMR) generate only a few
    structures per day worldwide and cannot keep
    pace with new protein sequences.
  • Strong demand for structure prediction: more
    than 30,000 human genes, and 10,000 genomes will
    be sequenced in the next 10 years.
  • Still an unsolved problem after two decades of
    effort.

13
Ab initio Structure Prediction
  • An energy function describes the protein, with
    terms for bond energy, bond angle energy, dihedral
    angle energy, van der Waals energy, and
    electrostatic energy.
  • Minimize the function to obtain the structure.
  • Not practical in general: computationally too
    expensive, and accuracy is poor.
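The energy terms listed above can be illustrated with the textbook functional forms used by classical molecular-mechanics force fields. This Python sketch assumes harmonic bond and angle terms, a cosine torsion term, a Lennard-Jones 12-6 van der Waals term and a Coulomb electrostatic term; all function names are hypothetical and every parameter value is an illustrative placeholder, not taken from any particular force field. The "minimization" is a brute-force scan over a single distance, which only hints at why searching thousands of coupled coordinates is so costly.

```python
import math

# Illustrative functional forms only (assumption): real force fields such as
# AMBER or CHARMM use their own parameter sets and variants of these terms.

def bond_energy(r, r0, k):
    """Harmonic bond stretching: k * (r - r0)^2."""
    return k * (r - r0) ** 2

def angle_energy(theta, theta0, k):
    """Harmonic bond-angle bending: k * (theta - theta0)^2, angles in radians."""
    return k * (theta - theta0) ** 2

def dihedral_energy(phi, k, n, delta):
    """Periodic torsion term: k * (1 + cos(n*phi - delta))."""
    return k * (1 + math.cos(n * phi - delta))

def vdw_energy(r, epsilon, sigma):
    """Lennard-Jones 12-6: 4*eps*((sigma/r)^12 - (sigma/r)^6)."""
    sr6 = (sigma / r) ** 6
    return 4 * epsilon * (sr6 * sr6 - sr6)

def electrostatic_energy(r, qi, qj, dielectric=1.0, coulomb_k=332.06):
    """Coulomb term; coulomb_k ~ 332 gives kcal/mol for charges in e and r in Angstrom."""
    return coulomb_k * qi * qj / (dielectric * r)

def total_energy(bonds, angles, dihedrals, nonbonded):
    """Sum all five terms over lists of parameter tuples."""
    e = sum(bond_energy(*b) for b in bonds)            # (r, r0, k)
    e += sum(angle_energy(*a) for a in angles)         # (theta, theta0, k)
    e += sum(dihedral_energy(*d) for d in dihedrals)   # (phi, k, n, delta)
    for r, eps, sigma, qi, qj in nonbonded:
        e += vdw_energy(r, eps, sigma) + electrostatic_energy(r, qi, qj)
    return e

def minimize_1d(f, lo, hi, steps=10000):
    """Brute-force grid search for the minimum of f on [lo, hi]."""
    best_x = min((lo + (hi - lo) * i / steps for i in range(steps + 1)), key=f)
    return best_x, f(best_x)

if __name__ == "__main__":
    # Toy "structure": one nonbonded pair with placeholder eps=0.2, sigma=3.4.
    r_star, e_star = minimize_1d(lambda r: vdw_energy(r, 0.2, 3.4), 3.0, 6.0)
    print(f"LJ minimum near r = {r_star:.3f} A, E = {e_star:.3f}")
```

For one nonbonded pair the scan recovers the analytic Lennard-Jones minimum at r = 2^(1/6) * sigma ≈ 3.82 Å with energy -epsilon; for a whole protein no such exhaustive scan is feasible.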

14
Template-Based Prediction
  • Structure is better conserved than sequence:
    a structure can tolerate a wide range of
    mutations, and physical forces favor certain
    structures.
  • The number of folds is limited: currently 700
    known, and an estimated 1,000 to 10,000 in total.
(Figure: TIM barrel.)

15
Scope of the Problem
  • 90% of new globular proteins share similar folds
    with known structures, implying the general
    applicability of comparative modeling methods for
    structure prediction.
  • Template-based modeling is currently applicable
    to 60-70% of new proteins, and this number is
    growing as more structures are solved.
  • The NIH Structural Genomics Initiative plans to
    experimentally solve 10,000 unique structures
    and predict the rest using computational methods.

16
Homology Modeling
  • The query sequence is aligned with the sequence of
    a known structure, usually sharing 30% or more
    sequence identity.
  • Place the sequence onto the template structure,
    replacing side-chain atoms where residues differ.
  • Refine the model by minimizing an energy
    function.
  • Applicable to 20% of all proteins.
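The 30% identity rule of thumb is normally checked on a pairwise alignment. A minimal Python sketch, assuming identity is counted only over columns where neither sequence has a gap; other conventions divide by alignment length or by the shorter sequence and give different numbers. Both function names are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns where neither sequence has a gap.

    Other conventions (dividing by alignment length or by the shorter
    sequence) give different numbers for the same alignment.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if gap not in (x, y)]
    if not columns:
        return 0.0
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)

def homology_modeling_applicable(aligned_query, aligned_template, threshold=30.0):
    """Hypothetical helper: apply the ~30% identity threshold from the slide."""
    return percent_identity(aligned_query, aligned_template) >= threshold

print(percent_identity("ACDE-G", "ACFEHG"))  # 80.0 (4 identical of 5 gap-free columns)
```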

17
Concept of Threading
  • Thread (align or place) a query protein sequence
    onto a template structure in an optimal way.
  • A good alignment gives an approximate backbone
    structure.

(Figure: the query sequence
MTYKLILNGKTKGETTTEAVDAATAEKVFQYANDNGVDGEWTYTE is
threaded onto a template set. Prediction accuracy:
fold recognition / alignment.)
18
4 Components of Threading
  • Template library
  • Scoring function
  • Alignment
  • Confidence assessment

19
Core of a Template
Core secondary structures: α-helices and
β-strands
20
Definition of Template
  • Residue type / profile
  • Secondary structure type
  • Solvent accessibility
  • Coordinates for Cα / Cβ
  • RES 1 G 156 S 23 10.528 -13.223 9.932
    11.977 -12.741 10.115
  • RES 5 P 157 H 110 12.622 -17.353 10.577
    12.981 -16.146 11.485
  • RES 5 G 158 H 61 17.186 -15.086 9.205
    16.601 -15.457 10.578
  • RES 5 Y 159 H 91 16.174 -10.939 12.208
    16.612 -12.343 12.727
  • RES 5 C 160 H 8 12.670 -12.752 15.349
    14.163 -13.137 15.545
  • RES 1 G 161 S 14 15.263 -17.741 14.529
    15.022 -16.815 15.733
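A hedged Python sketch for reading the RES records above. The slide does not label the columns, so this layout is our assumption: record tag, an unlabeled integer (1 or 5 in these lines), residue type, residue number, secondary-structure code, solvent accessibility, then two x, y, z triples. It is also our inference that the second triple is Cα and the first Cβ: in these six records consecutive second triples lie about 3.8 Å apart (the usual Cα-Cα spacing along a chain) and the two triples within each residue about 1.5 Å apart (a Cα-Cβ bond length); glycine entries presumably carry a virtual Cβ. `TemplateResidue` and `parse_res` are hypothetical names.

```python
import math
from dataclasses import dataclass

@dataclass
class TemplateResidue:
    tag_field: int        # unlabeled on the slide (1 or 5 here); meaning unknown
    residue_type: str     # one-letter amino-acid code
    residue_number: int
    sec_struct: str       # secondary-structure code (e.g. H for helix)
    accessibility: int    # solvent accessibility value
    cb: tuple             # first x, y, z triple (assumed C-beta)
    ca: tuple             # second x, y, z triple (assumed C-alpha)

def parse_res(line):
    """Parse one RES record under the assumed 12-token layout."""
    tok = line.split()
    if len(tok) != 12 or tok[0] != "RES":
        raise ValueError(f"unexpected record: {line!r}")
    xyz = [float(v) for v in tok[6:]]
    return TemplateResidue(int(tok[1]), tok[2], int(tok[3]), tok[4], int(tok[5]),
                           tuple(xyz[:3]), tuple(xyz[3:]))

# The six records from the slide, each joined onto one line.
RECORDS = """\
RES 1 G 156 S 23 10.528 -13.223 9.932 11.977 -12.741 10.115
RES 5 P 157 H 110 12.622 -17.353 10.577 12.981 -16.146 11.485
RES 5 G 158 H 61 17.186 -15.086 9.205 16.601 -15.457 10.578
RES 5 Y 159 H 91 16.174 -10.939 12.208 16.612 -12.343 12.727
RES 5 C 160 H 8 12.670 -12.752 15.349 14.163 -13.137 15.545
RES 1 G 161 S 14 15.263 -17.741 14.529 15.022 -16.815 15.733
"""

residues = [parse_res(line) for line in RECORDS.splitlines() if line.strip()]

for prev, cur in zip(residues, residues[1:]):
    # Consecutive C-alpha atoms in a chain are roughly 3.8 A apart.
    print(f"{prev.residue_type}{prev.residue_number}-{cur.residue_type}{cur.residue_number}: "
          f"CA-CA {math.dist(prev.ca, cur.ca):.2f} A")
for r in residues:
    # A C-alpha/C-beta bond is roughly 1.5 A.
    print(f"{r.residue_type}{r.residue_number}: CA-CB {math.dist(r.ca, r.cb):.2f} A")
```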

21
Energy (Score) Function
(Figure: the sequence
YKLILNGKTKGETTTEAVDAATAEKVFQYANDNGVDGEW threaded
onto a template.)
Pairwise energy E_p: how favorable it is to place two
particular residues near each other.
Singleton energy E_s: how well a residue fits a
template position (sequence and structural
environment).
Alignment gap penalty E_g.
Total energy: $E = E_p + E_s + E_g$
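The total energy can be evaluated directly for one fixed threading. A minimal Python sketch, assuming a threading is a list with one entry per template position (an index into the query, or None for a gap), a linear gap cost for every empty template position and every unaligned query residue, and singleton and pairwise functions supplied by the caller. The toy hydrophobic-versus-other residue model is our assumption, not the lecture's scoring function.

```python
HYDROPHOBIC = set("AVLIMFWC")  # toy two-class residue model (assumption)

def threading_energy(seq, assignment, singleton, pairwise, contacts, gap_cost):
    """Total energy E = E_p + E_s + E_g of one threading.

    seq:        query sequence
    assignment: one entry per template position: index into seq, or None (gap)
    singleton:  f(template_pos, residue) -> contribution to E_s
    pairwise:   f(residue_a, residue_b) -> contribution to E_p for a contact
    contacts:   (pos_a, pos_b) template position pairs that are spatially close
    gap_cost:   linear penalty per gap, giving E_g
    """
    aligned = [i for i in assignment if i is not None]
    if any(b <= a for a, b in zip(aligned, aligned[1:])):
        raise ValueError("aligned sequence indices must be strictly increasing")
    e_s = sum(singleton(p, seq[i]) for p, i in enumerate(assignment) if i is not None)
    e_p = sum(pairwise(seq[assignment[a]], seq[assignment[b]])
              for a, b in contacts
              if assignment[a] is not None and assignment[b] is not None)
    # E_g: empty template positions plus query residues left unaligned.
    template_gaps = len(assignment) - len(aligned)
    query_gaps = len(seq) - len(aligned)
    e_g = gap_cost * (template_gaps + query_gaps)
    return e_p + e_s + e_g

# Hypothetical 4-position template: positions 0 and 3 are buried and in contact.
buried = {0, 3}
single = lambda pos, aa: -1.0 if (pos in buried and aa in HYDROPHOBIC) else 0.0
pair = lambda a, b: -0.5 if (a in HYDROPHOBIC and b in HYDROPHOBIC) else 0.0
print(threading_energy("LKAV", [0, 1, None, 3], single, pair, [(0, 3)], gap_cost=1.0))  # -0.5
```

Here the buried positions hold L and V (E_s = -2, E_p = -0.5), while one empty template position and the skipped A cost E_g = 2, giving -0.5.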
22
Threading problem
  • Threading: given a sequence and a fold (template),
    compute the optimal alignment score between the
    sequence and the fold.
  • If we can solve this problem, then, given a
    sequence, we can try each known fold and find the
    one that best fits the sequence.
  • Because there are only a few thousand folds, we
    can expect to find the correct fold for the given
    sequence.
  • Threading is NP-hard when the pairwise energy E_p
    is included.
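If the pairwise term E_p is dropped, threading reduces to ordinary dynamic-programming alignment, and "try each known fold" becomes a loop over templates. This Python sketch makes exactly that simplification, so it is not a solver for the NP-hard problem with pairwise terms. The templates built from a buried/exposed pattern, the hydrophobic residue set and all function names are hypothetical toy choices.

```python
HYDROPHOBIC = set("AVLIMFWC")  # toy two-class residue model (assumption)

def env_template(pattern):
    """Hypothetical template from a burial pattern: 'B' buried, 'E' exposed.
    Singleton energy is -1 when the residue class matches the environment, else +1."""
    def singleton(pos, aa):
        return -1.0 if (pattern[pos] == "B") == (aa in HYDROPHOBIC) else 1.0
    return len(pattern), singleton

def dp_threading(seq, template_len, singleton, gap_cost):
    """Minimum of E_s + E_g over all gapped alignments (E_p dropped).
    D[i][j] = best energy aligning seq[:i] with template positions [0, j)."""
    n, m = len(seq), template_len
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:  # residue i-1 placed at template position j-1
                D[i][j] = min(D[i][j], D[i - 1][j - 1] + singleton(j - 1, seq[i - 1]))
            if i > 0:            # query residue left unaligned
                D[i][j] = min(D[i][j], D[i - 1][j] + gap_cost)
            if j > 0:            # template position left empty
                D[i][j] = min(D[i][j], D[i][j - 1] + gap_cost)
    return D[n][m]

def best_fold(seq, templates, gap_cost=1.0):
    """Thread seq onto every template; return (name, energy) with the lowest energy."""
    scores = {name: dp_threading(seq, *templates[name], gap_cost) for name in templates}
    name = min(scores, key=scores.get)
    return name, scores[name]

templates = {"BEBE": env_template("BEBE"), "EBEB": env_template("EBEB")}
print(best_fold("LKLK", templates))  # ('BEBE', -4.0)
```

The alternating hydrophobic/polar query fits the BEBE pattern perfectly (-4.0), whereas the shifted EBEB template can at best reach -1.0 after paying two gap penalties.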

23
Computational Methods
  • Branch and bound.
  • Integer programming: use linear programming plus
    branch and bound.
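Because threading with pairwise terms is NP-hard, exact solvers prune the search instead of enumerating every alignment. Below is a minimal Python branch-and-bound sketch under a simplified model that is our assumption: each template core segment is placed as an ungapped block, segments keep their order and do not overlap, and loops between them are free. Energies come from hypothetical caller-supplied functions. The lower bound is deliberately crude (every term not yet fixed takes its global minimum); the integer-programming route on the slide, with its linear-programming relaxation, gives tighter bounds but is not shown. A brute-force enumerator is included only to check the result.

```python
import itertools

def start_ranges(n, lengths):
    """Feasible start positions per segment so all segments fit, in order."""
    return [range(sum(lengths[:k]), n - sum(lengths[k:]) + 1)
            for k in range(len(lengths))]

def branch_and_bound(n, lengths, single, pair):
    """Minimize sum_k single(k, x_k) + sum_(a,b) pair[(a, b)](x_a, x_b).

    n:       query sequence length
    lengths: core segment lengths, in template order
    single:  f(k, x) -> energy of segment k starting at sequence position x
    pair:    dict {(a, b): f(x_a, x_b)} with a < b, for interacting segments
    """
    m = len(lengths)
    ranges = start_ranges(n, lengths)
    # Global minimum of every term: an optimistic (admissible) estimate.
    min_single = [min(single(k, x) for x in ranges[k]) for k in range(m)]
    min_pair = {ab: min(f(x, y) for x in ranges[ab[0]] for y in ranges[ab[1]])
                for ab, f in pair.items()}
    best = {"energy": float("inf"), "starts": None}

    def lower_bound(k, partial):
        # Segments 0..k-1 are placed; every term involving a segment >= k
        # is replaced by its global minimum.
        return (partial + sum(min_single[k:])
                + sum(v for (a, b), v in min_pair.items() if b >= k))

    def search(starts, partial):
        k = len(starts)
        if k == m:
            if partial < best["energy"]:
                best["energy"], best["starts"] = partial, list(starts)
            return
        lo = starts[-1] + lengths[k - 1] if starts else 0
        for x in range(max(lo, ranges[k].start), ranges[k].stop):
            e = single(k, x) + sum(f(starts[a], x)
                                   for (a, b), f in pair.items() if b == k)
            starts.append(x)
            if lower_bound(k + 1, partial + e) < best["energy"]:  # else prune
                search(starts, partial + e)
            starts.pop()

    search([], 0.0)
    return best["energy"], best["starts"]

def brute_force(n, lengths, single, pair):
    """Exhaustive enumeration, for checking the branch-and-bound result."""
    best = (float("inf"), None)
    for xs in itertools.product(*start_ranges(n, lengths)):
        if any(xs[k] + lengths[k] > xs[k + 1] for k in range(len(xs) - 1)):
            continue
        e = (sum(single(k, x) for k, x in enumerate(xs))
             + sum(f(xs[a], xs[b]) for (a, b), f in pair.items()))
        if e < best[0]:
            best = (e, list(xs))
    return best
```

On small random instances both functions should report the same optimal energy; the bound only reduces how many placements are visited.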

24
(No Transcript)
25
Blue Gene
  • On December 6, 1999, IBM announced a $100 million
    research initiative to build the world's fastest
    supercomputer, "Blue Gene", to tackle fundamental
    problems in computational biology.
  • More than one petaflop/s (1,000,000,000,000,000
    floating point operations per second)

26
(No Transcript)