Computational Approaches to Receptor Structure Prediction

1
Computational Approaches to Receptor Structure
Prediction

Ugur Sezerman
Biological Sciences and Bioengineering Program
Sabanci University, Istanbul

2
Determining Protein Structure

There are O(100,000) distinct proteins in the
human proteome.
3D structures have been determined for over
60,000 proteins, from all organisms
Includes duplicates with different ligands bound,
etc.
Coordinates are determined by X-ray
crystallography or NMR

3
X-Ray Crystallography

The crystal is a mosaic of millions of copies of
the protein.
As much as 70% is solvent (water)!
May take months (and a green thumb) to grow.

4
X-Ray diffraction

Image is averaged over:
Space (many copies)
Time (of the diffraction experiment)

5
Electron Density Maps

Resolution is dependent on the quality/regularity
of the crystal
R-factor is a measure of leftover electron
density
Solvent fitting
Refinement

6
The Protein Data Bank

http://www.rcsb.org/pdb/

ATOM      1  N   ALA E   1      22.382  47.782 112.975  1.00 24.09      3APR 213
ATOM      2  CA  ALA E   1      22.957  47.648 111.613  1.00 22.40      3APR 214
ATOM      3  C   ALA E   1      23.572  46.251 111.545  1.00 21.32      3APR 215
ATOM      4  O   ALA E   1      23.948  45.688 112.603  1.00 21.54      3APR 216
ATOM      5  CB  ALA E   1      23.932  48.787 111.380  1.00 22.79      3APR 217
ATOM      6  N   GLY E   2      23.656  45.723 110.336  1.00 19.17      3APR 218
ATOM      7  CA  GLY E   2      24.216  44.393 110.087  1.00 17.35      3APR 219
ATOM      8  C   GLY E   2      25.653  44.308 110.579  1.00 16.49      3APR 220
ATOM      9  O   GLY E   2      26.258  45.296 110.994  1.00 15.35      3APR 221
ATOM     10  N   VAL E   3      26.213  43.110 110.521  1.00 16.21      3APR 222
ATOM     11  CA  VAL E   3      27.594  42.879 110.975  1.00 16.02      3APR 223
ATOM     12  C   VAL E   3      28.569  43.613 110.055  1.00 15.69      3APR 224
ATOM     13  O   VAL E   3      28.429  43.444 108.822  1.00 16.43      3APR 225
ATOM     14  CB  VAL E   3      27.834  41.363 110.979  1.00 16.66      3APR 226
ATOM     15  CG1 VAL E   3      29.259  41.013 111.404  1.00 17.35      3APR 227
ATOM     16  CG2 VAL E   3      26.811  40.649 111.850  1.00 17.03      3APR 228
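
The ATOM records above can be read programmatically. The following is a minimal Python sketch, not an official parser: it assumes the fields are separated by whitespace, as in the records shown here, whereas real PDB files use a strict fixed-column layout. The `Atom` class and the function names are illustrative, not part of any library.

```python
import math
from dataclasses import dataclass


@dataclass
class Atom:
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float
    occupancy: float
    b_factor: float


def parse_atom_line(line: str) -> Atom:
    """Parse one whitespace-separated ATOM record (assumption: no fused columns)."""
    f = line.split()
    if not f or f[0] != "ATOM":
        raise ValueError(f"not an ATOM record: {line!r}")
    return Atom(int(f[1]), f[2], f[3], f[4], int(f[5]),
                float(f[6]), float(f[7]), float(f[8]),
                float(f[9]), float(f[10]))


def distance(a: Atom, b: Atom) -> float:
    """Euclidean distance between two atoms, in the units of the file (angstroms)."""
    return math.dist((a.x, a.y, a.z), (b.x, b.y, b.z))


# Three of the records from the slide (entry 3APR, chain E).
RECORDS = """\
ATOM 1 N ALA E 1 22.382 47.782 112.975 1.00 24.09 3APR 213
ATOM 2 CA ALA E 1 22.957 47.648 111.613 1.00 22.40 3APR 214
ATOM 7 CA GLY E 2 24.216 44.393 110.087 1.00 17.35 3APR 219
"""

atoms = [parse_atom_line(line) for line in RECORDS.splitlines()]
alpha_carbons = [a for a in atoms if a.name == "CA"]
ca_ca = distance(alpha_carbons[0], alpha_carbons[1])
print(f"CA(Ala1) to CA(Gly2): {ca_ca:.2f} angstroms")
```

The distance between consecutive alpha carbons comes out close to 3.8 angstroms, which is the usual spacing along a peptide backbone and a quick sanity check on the parsed coordinates.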
7
A Peek at Protein Function

Serine proteases cleave other proteins
Catalytic triad: Asp, His, Ser

8
Cleaving the peptide bond
9
Three Serine Proteases

Chymotrypsin: cleaves the peptide bond on the
carboxyl side of aromatic (ring) residues (Trp,
Phe, Tyr) and large hydrophobic residues (Met).
Trypsin: cleaves after Lys (K) or Arg (R)
Positive charge
Elastase: cleaves after small residues (Gly,
Ala, Ser, Cys)
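
The specificity rules above can be turned into a small in-silico digest. This is a sketch under simplifying assumptions: it uses only the residue lists given on this slide and ignores any effect of neighbouring residues on cleavage. The table and function names are illustrative.

```python
# Residues after which each protease cuts, taken from the slide above.
CLEAVE_AFTER = {
    "chymotrypsin": set("WFYM"),
    "trypsin": set("KR"),
    "elastase": set("GASC"),
}


def cleavage_sites(sequence: str, protease: str) -> list[int]:
    """Return cut positions; position i means a cut between sequence[i-1] and sequence[i]."""
    residues = CLEAVE_AFTER[protease]
    # The last residue has no peptide bond on its carboxyl side, so skip it.
    return [i + 1 for i, aa in enumerate(sequence[:-1]) if aa in residues]


def digest(sequence: str, protease: str) -> list[str]:
    """Split the sequence into the fragments produced by a complete digest."""
    cuts = [0] + cleavage_sites(sequence, protease) + [len(sequence)]
    return [sequence[a:b] for a, b in zip(cuts, cuts[1:])]


peptide = "IVACIVSTEYDVMKAAR"
for name in CLEAVE_AFTER:
    print(name, digest(peptide, name))
```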

10
Specificity Binding Pocket
11
Protein Folding: Biological perspective

Central dogma: sequence specifies structure
Denature: to unfold a protein back to a random
coil configuration
β-mercaptoethanol breaks disulfide bonds
Urea or guanidine hydrochloride act as denaturants
Also heat or pH
Anfinsen's experiments
Denatured ribonuclease
Spontaneously regained enzymatic activity
Evidence that it re-folded to native conformation

12
PROTEIN FOLDING PROBLEM

STARTING FROM THE AMINO ACID SEQUENCE, FINDING THE
STRUCTURE OF A PROTEIN IS CALLED THE PROTEIN
FOLDING PROBLEM

13
The Protein Folding Problem

Central question of molecular biology: Given a
particular sequence of amino acid residues
(primary structure), what will the
tertiary/quaternary structure of the resulting
protein be?
Input: AAVIKYGCAL
Output: φ1ψ1, φ2ψ2, ... =
backbone conformation (no side chains yet)

14
Folding intermediates

Levinthal's paradox: Consider a 100-residue
protein. If each residue can take only 3 x 3 = 9
positions, there are 9^100 possible conformations.
Folding must proceed by progressive stabilization
of intermediates
Molten globules: most secondary structure
formed, but much less compact than native
conformation.
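
The size of the number in Levinthal's paradox is easier to appreciate when worked out. The sketch below uses the 9 positions per residue and 100 residues from the slide; the sampling rate of one conformation every 0.1 picosecond is an assumption chosen only for illustration.

```python
STATES_PER_RESIDUE = 9         # from the slide: 3 x 3 positions per residue
N_RESIDUES = 100               # from the slide
SAMPLES_PER_SECOND = 1e13      # assumption: one conformation tried every 0.1 ps
SECONDS_PER_YEAR = 3600 * 24 * 365.25

# Python integers are arbitrary precision, so this is exact.
n_conformations = STATES_PER_RESIDUE ** N_RESIDUES

# Time to visit every conformation once at the assumed rate.
years = n_conformations / SAMPLES_PER_SECOND / SECONDS_PER_YEAR

print(f"conformations: about 10^{len(str(n_conformations)) - 1}")
print(f"exhaustive search time: about {years:.1e} years")
```

Even at this optimistic rate the search time exceeds the age of the universe by many orders of magnitude, which is why folding cannot be a random search.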

15
Protein Packing

occurs in the cytosol (60% bulk water, 40%
water of hydration)
involves interaction between secondary structure
elements and solvent
may be promoted by chaperones, membrane proteins
tumbles into molten globule states
overall entropy loss is small enough so enthalpy
determines sign of ΔE, which decreases (loss in
entropy from packing counteracted by gain from
desolvation and reorganization of water, i.e.
hydrophobic effect)
yields tertiary structure

16
Folding help

Proteins are, in fact, only marginally stable
Native state is typically only 5 to 10 kcal/mole
more stable than the unfolded form
Many proteins help in folding
Protein disulfide isomerase catalyzes shuffling
of disulfide bonds
Chaperones break up aggregates and (in theory)
unfold misfolded proteins

17
Forces driving protein folding

It is believed that hydrophobic collapse is a key
driving force for protein folding
Hydrophobic core
Polar surface interacting with solvent
Minimum volume (no cavities)
Disulfide bond formation stabilizes
Hydrogen bonds
Polar and electrostatic interactions

18
Secondary Structure

non-linear
3 dimensional
localized to regions of an amino acid chain
formed and stabilized by hydrogen bonding,
electrostatic and van der Waals interactions

19
Common motifs
20
The Hydrophobic Core

Hemoglobin A is the protein in red blood cells
(erythrocytes) responsible for binding oxygen.
The mutation E6→V in the β chain places a
hydrophobic Val on the surface of hemoglobin
The resulting sticky patch causes hemoglobin S
to agglutinate (stick together) and form fibers
which deform the red blood cell and do not carry
oxygen efficiently
Sickle cell anemia was the first identified
molecular disease

21
Sickle Cell Anemia
Sequestering hydrophobic residues in the protein
core protects proteins from hydrophobic
agglutination.
22
Computational Approaches

Ab initio methods
Threading
Comparative Modelling
Fragment Assembly

23
Why is ab-initio prediction hard?
24
Ab-initio protein structure prediction as an
optimization problem

Define a function that maps protein structures to
some quality measure.

Solve the computational problem of finding an
optimal structure.

A dream function
- Has a clear minimum in the native structure.
- Has a clear path towards the minimum.
- Global optimization algorithm should find the
native structure.

Chen Keasar BGU
27

An approximate function
- Easier to design and compute.
- Native structure not always the global
minimum.
- Global optimization methods do not converge.
Many alternative models (decoys) should be
generated.
- No clear way of choosing among them.

Chen Keasar BGU
28
Fold Optimization

Simple lattice models (HP-models)
Two types of residues: hydrophobic and polar
2-D or 3-D lattice
The only force is hydrophobic collapse
Score: number of H-H contacts

29
Scoring Lattice Models

H/P model scoring: count noncovalent hydrophobic
interactions.
Sometimes: penalize for buried polar or surface
hydrophobic residues
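
The basic H/P score can be sketched in a few lines of Python. This version assumes a 2-D square lattice and counts only H-H contacts between residues that are lattice neighbours but not chain neighbours; the optional penalty terms are left out. The function name and the coordinate representation are illustrative choices.

```python
def hp_score(sequence: str, coords: list[tuple[int, int]]) -> int:
    """Count noncovalent H-H contacts of an HP chain placed on a 2-D square lattice."""
    if len(sequence) != len(coords):
        raise ValueError("one lattice point is needed per residue")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    for (x1, y1), (x2, y2) in zip(coords, coords[1:]):
        if abs(x1 - x2) + abs(y1 - y2) != 1:
            raise ValueError("consecutive residues must be lattice neighbours")

    index_at = {xy: i for i, xy in enumerate(coords)}
    contacts = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only right and up so that every pair is counted exactly once.
        for neighbour in ((x + 1, y), (x, y + 1)):
            j = index_at.get(neighbour)
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                contacts += 1
    return contacts


# A four-residue chain folded into a square brings its two ends into contact.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
straight = [(0, 0), (1, 0), (2, 0), (3, 0)]
print(hp_score("HPPH", square), hp_score("HPPH", straight))
```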

30
What can we do with lattice models?

For smaller polypeptides, exhaustive search can
be used
Looking at the best fold, even in such a simple
model, can teach us interesting things about the
protein folding process
For larger chains, other optimization and search
methods must be used
Greedy, branch and bound
Evolutionary computing, simulated annealing
Graph theoretical methods
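
For short chains the exhaustive search mentioned above is easy to write down. The sketch below enumerates every self-avoiding walk on a 2-D square lattice and keeps the one with the most H-H contacts. It is meant only for chains of roughly a dozen residues, since the number of walks grows exponentially; names are illustrative.

```python
MOVES = ((1, 0), (-1, 0), (0, 1), (0, -1))


def count_hh_contacts(sequence, coords):
    """Number of H-H pairs that are lattice neighbours but not chain neighbours."""
    contacts = 0
    for i in range(len(coords)):
        for j in range(i + 2, len(coords)):  # i + 1 is the covalent neighbour
            if sequence[i] == "H" and sequence[j] == "H":
                (xi, yi), (xj, yj) = coords[i], coords[j]
                if abs(xi - xj) + abs(yi - yj) == 1:
                    contacts += 1
    return contacts


def best_fold(sequence):
    """Exhaustively search all 2-D lattice conformations; return (score, coords)."""
    if len(sequence) < 2:
        return 0, [(0, 0)] * len(sequence)
    best = {"score": -1, "coords": None}

    def extend(coords, occupied):
        if len(coords) == len(sequence):
            score = count_hh_contacts(sequence, coords)
            if score > best["score"]:
                best["score"], best["coords"] = score, list(coords)
            return
        x, y = coords[-1]
        for dx, dy in MOVES:
            nxt = (x + dx, y + dy)
            if nxt not in occupied:       # keep the walk self-avoiding
                coords.append(nxt)
                occupied.add(nxt)
                extend(coords, occupied)
                coords.pop()
                occupied.remove(nxt)

    # Fixing the first bond removes conformations that differ only by rotation.
    start = [(0, 0), (1, 0)]
    extend(start, set(start))
    return best["score"], best["coords"]


score, coords = best_fold("HPPHPPH")
print(score, coords)
```

For longer chains the same scoring function can be reused inside the heuristic searches listed above, for example simulated annealing.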

31
Learning from Lattice Models

The hydrophobic zipper effect

Ken Dill 1997
32
Threading: Fold recognition

Given:
Sequence: IVACIVSTEYDVMKAAR
A database of molecular coordinates
Map the sequence onto each fold
Evaluate
Objective 1: improve scoring function
Objective 2: folding
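
The map-and-evaluate loop can be illustrated with a deliberately crude toy. Everything in this sketch is an assumption made for illustration: each fold is reduced to a string of buried (b) and exposed (e) positions, the score rewards hydrophobic residues in buried positions and polar residues in exposed ones, no gaps are allowed, and the fold library entries are hypothetical. Real threading programs use far richer scoring functions derived from known structures.

```python
# Assumption: a simple hydrophobic/polar split of the amino acids.
HYDROPHOBIC = set("AVILMFWC")


def thread_score(sequence: str, environment: str) -> int:
    """+1 when residue type fits the position (hydrophobic/buried, polar/exposed), else -1."""
    return sum(1 if (aa in HYDROPHOBIC) == (env == "b") else -1
               for aa, env in zip(sequence, environment))


def best_threading(sequence: str, environment: str) -> tuple[int, int]:
    """Slide the sequence along the fold without gaps; return (best score, offset)."""
    n = len(sequence)
    if len(environment) < n:
        raise ValueError("fold is shorter than the sequence")
    return max((thread_score(sequence, environment[o:o + n]), o)
               for o in range(len(environment) - n + 1))


def recognize_fold(sequence: str, library: dict[str, str]) -> list[tuple[int, int, str]]:
    """Rank all folds in the library by their best threading score."""
    ranked = [(*best_threading(sequence, env), name) for name, env in library.items()]
    return sorted(ranked, reverse=True)


# Hypothetical fold library: burial patterns, not real structures.
LIBRARY = {
    "fold_A": "eebbbbbbeeeeebbebbee",
    "fold_B": "eeeeeeeeeeeeeeeeeeee",
}

for score, offset, name in recognize_fold("IVACIVSTEYDVMKAAR", LIBRARY):
    print(name, score, offset)
```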

33
Protein Fold Families

CATH website www.cathdb.info

34
Secondary Structure Prediction
AGVGTVPMTAYGNDIQYYGQVT
A-VGIVPM-AYGQDIQY-GQVT
AG-GIIP--AYGNELQ--GQVT
AGVCTVPMTA---ELQYYG--T
AGVGTVPMTAYGNDIQYYGQVT
----hhhHHHHHHhhh--eeEE
35
Secondary Structure Prediction

Easier than folding
Current algorithms can predict secondary
structure with 70-80% accuracy
Chou, P.Y. & Fasman, G.D. (1974). Biochemistry,
13, 211-222.
Based on frequencies of occurrence of residues in
helices and sheets
PHD: neural network based
Uses a multiple sequence alignment
Rost & Sander, Proteins, 1994, 19, 55-72
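
The frequency-based idea can be sketched as a propensity calculation: how often a residue type occurs in helices (or sheets) relative to how often any residue does. The sketch below uses the single sequence and structure string from the earlier slide purely as toy data, so the resulting numbers are not the published Chou-Fasman parameters, which were derived from many known structures. The function name is illustrative.

```python
from collections import Counter


def propensities(sequences, labels, state):
    """Propensity of each residue type for `state` ('H' helix or 'E' strand).

    propensity = (fraction of this residue type found in the state)
                 / (fraction of all residues found in the state)
    """
    total, in_state = Counter(), Counter()
    for seq, lab in zip(sequences, labels):
        if len(seq) != len(lab):
            raise ValueError("sequence and label string must have equal length")
        for aa, s in zip(seq, lab):
            total[aa] += 1
            if s.upper() == state:   # treat 'h' and 'H' alike
                in_state[aa] += 1
    overall = sum(in_state.values()) / sum(total.values())
    if overall == 0:
        raise ValueError(f"no residues labelled {state!r}")
    return {aa: (in_state[aa] / total[aa]) / overall for aa in total}


# Toy data: the sequence and structure string shown on the earlier slide.
SEQ = "AGVGTVPMTAYGNDIQYYGQVT"
SS = "----hhhHHHHHHhhh--eeEE"

helix = propensities([SEQ], [SS], "H")
for aa, p in sorted(helix.items(), key=lambda kv: -kv[1]):
    print(aa, round(p, 2))
```

Values above 1 mark residues that occur in the state more often than average; with a realistic training set these become the per-residue parameters that the prediction rules are built on.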

36
Chou-Fasman Parameters
37
HOMOLOGY MODELLING

Using database search algorithms find the
sequence with known structure that best matches
the query sequence
Assign the structure of the core regions obtained
from the structure database to the query
sequence
Find the structure of the intervening loops using
loop closure algorithms
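
The first step, finding the best-matching sequence of known structure, can be sketched with a global alignment score. This is a minimal Needleman-Wunsch scorer with a flat match/mismatch/gap scheme chosen as an assumption; real database searches use substitution matrices such as BLOSUM and faster heuristics. The template names and sequences below are hypothetical.

```python
def nw_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Needleman-Wunsch global alignment score, computed row by row."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diagonal,            # align a[i-1] with b[j-1]
                            prev[j] + gap,       # gap in b
                            curr[j - 1] + gap))  # gap in a
        prev = curr
    return prev[-1]


def best_template(query: str, templates: dict[str, str]) -> str:
    """Name of the template whose sequence aligns best with the query."""
    return max(templates, key=lambda name: nw_score(query, templates[name]))


# Hypothetical sequences of proteins with known structure.
TEMPLATES = {
    "template_1": "AVGIVPMAYGQDIQYGQVT",
    "template_2": "IVACIVSTEYDVMKAAR",
}

query = "AGVGTVPMTAYGNDIQYYGQVT"
for name, seq in TEMPLATES.items():
    print(name, nw_score(query, seq))
print("chosen:", best_template(query, TEMPLATES))
```

The alignment behind the best score then tells which query residues inherit core coordinates from the template and which stretches are left as loops to be built.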

38
Homology Modeling: How it works

Find template
Align target sequence with template
Generate model
- add loops
- add sidechains
Refine model

39
Prediction of Protein Structures

Examples: a few good examples

(Figure: four pairs of actual and predicted structures)
40
Prediction of Protein Structures

Not so good example

41
1esr
42
(No Transcript)
43
(No Transcript)
44
How can we predict protein structures?
52
G-protein coupled receptors (GPCRs)

Vital protein bundles with versatile functions.
Play a key role in cellular signaling and the
regulation of basic physiological processes, and
interact with more than 50% of prescription drugs.
Therefore an excellent potential therapeutic target
for drug design and the focus of current
pharmaceutical research.

53
GPCR Functional Classification Problem

Although thousands of GPCR sequences are known,
the crystal structure has been solved for only one
GPCR sequence, at medium resolution, to date.

For many of them, the activating ligand is
unknown.
Functional classification methods for automated
characterization of such GPCRs are imperative.
Not suitable for homology modelling, but hybrid
methods may work: A. Rayan, J. Mol. Modelling
(2010), pp. 183-191

54
Schematic overview of the MHC-I antigen
processing and presentation pathway
55
Pathway and MHC Molecule