Title: Computational Approaches to Receptor Structure Prediction
1. Computational Approaches to Receptor Structure Prediction
- Ugur Sezerman
- Biological Sciences and Bioengineering Program
- Sabanci University, Istanbul
2. Determining Protein Structure
- There are O(100,000) distinct proteins in the human proteome.
- 3D structures have been determined for over 60,000 proteins, from all organisms
- Includes duplicates with different ligands bound, etc.
- Coordinates are determined by X-ray crystallography or NMR
3. X-Ray Crystallography
- The crystal is a mosaic of millions of copies of the protein.
- As much as 70% of the crystal is solvent (water)!
- May take months (and a green thumb) to grow.
4. X-Ray diffraction
- Image is averaged over
- Space (many copies)
- Time (of the diffraction experiment)
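5. Electron Density Maps
- Resolution depends on the quality/regularity of the crystal
- R-factor measures how much of the observed diffraction data the model fails to explain (disagreement between observed and model-calculated amplitudes; lower is better)
- Solvent fitting
- Refinement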
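The conventional crystallographic R-factor is R = Σ||F_obs| - |F_calc|| / Σ|F_obs|, summed over all reflections. The sketch below shows that arithmetic. It assumes the calculated amplitudes are already on the same scale as the observed ones; refinement programs fit this scale factor themselves. The amplitudes in the usage line are made up for illustration.

```python
def r_factor(f_obs: list[float], f_calc: list[float]) -> float:
    """Crystallographic R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|) over all reflections."""
    if len(f_obs) != len(f_calc):
        raise ValueError("need one calculated amplitude per observed reflection")
    # Assumption: f_calc has already been put on the same scale as f_obs.
    disagreement = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    return disagreement / sum(abs(o) for o in f_obs)


# Hypothetical amplitudes for three reflections, for illustration only.
print(r_factor([100.0, 200.0, 50.0], [90.0, 210.0, 55.0]))
```

A perfect model gives R = 0. Lower values mean the model reproduces the measured data more closely.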
6. The Protein Data Bank
ATOM      1  N   ALA E   1      22.382  47.782 112.975  1.00 24.09      3APR 213
ATOM      2  CA  ALA E   1      22.957  47.648 111.613  1.00 22.40      3APR 214
ATOM      3  C   ALA E   1      23.572  46.251 111.545  1.00 21.32      3APR 215
ATOM      4  O   ALA E   1      23.948  45.688 112.603  1.00 21.54      3APR 216
ATOM      5  CB  ALA E   1      23.932  48.787 111.380  1.00 22.79      3APR 217
ATOM      6  N   GLY E   2      23.656  45.723 110.336  1.00 19.17      3APR 218
ATOM      7  CA  GLY E   2      24.216  44.393 110.087  1.00 17.35      3APR 219
ATOM      8  C   GLY E   2      25.653  44.308 110.579  1.00 16.49      3APR 220
ATOM      9  O   GLY E   2      26.258  45.296 110.994  1.00 15.35      3APR 221
ATOM     10  N   VAL E   3      26.213  43.110 110.521  1.00 16.21      3APR 222
ATOM     11  CA  VAL E   3      27.594  42.879 110.975  1.00 16.02      3APR 223
ATOM     12  C   VAL E   3      28.569  43.613 110.055  1.00 15.69      3APR 224
ATOM     13  O   VAL E   3      28.429  43.444 108.822  1.00 16.43      3APR 225
ATOM     14  CB  VAL E   3      27.834  41.363 110.979  1.00 16.66      3APR 226
ATOM     15  CG1 VAL E   3      29.259  41.013 111.404  1.00 17.35      3APR 227
ATOM     16  CG2 VAL E   3      26.811  40.649 111.850  1.00 17.03      3APR 228
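PDB ATOM records are fixed-column text, so a reader slices columns rather than splitting on whitespace. The minimal sketch below follows the column layout of the PDB coordinate format:
- serial in columns 7-11
- atom name in columns 13-16
- residue name in columns 18-20
- chain in column 22
- residue number in columns 23-26
- x, y, z in columns 31-54
- occupancy in columns 55-60
- B-factor in columns 61-66

It deliberately ignores everything after column 66. In older files like this one, that region holds the entry ID and a line number. The `Atom` class and function names are illustrative, not a library API.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float
    occupancy: float
    b_factor: float


def parse_atom_line(line: str) -> Atom:
    """Slice one fixed-column ATOM record (Python slices are 0-based, PDB columns 1-based)."""
    return Atom(
        serial=int(line[6:11]),
        name=line[12:16].strip(),
        res_name=line[17:20],
        chain=line[21],
        res_seq=int(line[22:26]),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
        occupancy=float(line[54:60]),
        b_factor=float(line[60:66]),
    )


def parse_atoms(text: str) -> list[Atom]:
    """Parse every ATOM record in a block of PDB text, skipping other record types."""
    return [parse_atom_line(line) for line in text.splitlines() if line.startswith("ATOM")]


record = "ATOM      2  CA  ALA E   1      22.957  47.648 111.613  1.00 22.40      3APR 214"
atom = parse_atom_line(record)
print(atom.name, atom.res_name, atom.res_seq, (atom.x, atom.y, atom.z))
```

Real parsers also handle HETATM records, alternate locations and insertion codes. Those are omitted here.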
7. A Peek at Protein Function
- Serine proteases cleave other proteins
- Catalytic triad: Asp, His, Ser
8. Cleaving the peptide bond
9. Three Serine Proteases
- Chymotrypsin: cleaves the peptide bond on the carboxyl side of aromatic (ring) residues (Trp, Phe, Tyr) and large hydrophobic residues (Met)
- Trypsin: cleaves after the positively charged residues Lys (K) or Arg (R)
- Elastase: cleaves after small residues (Gly, Ala, Ser, Cys)
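The specificity rules above can be turned into a toy in-silico digest. The sketch cuts after every residue listed for a protease. It applies only the simplified rules from this slide and ignores known refinements, such as trypsin's reluctance to cut before proline. `CLEAVAGE_RULES` and `cleave` are hypothetical names, not part of any tool.

```python
# Simplified rules from the slide: each protease cuts on the C-terminal
# side of the residues listed (one-letter codes).
CLEAVAGE_RULES = {
    "chymotrypsin": set("WFYM"),  # aromatic Trp/Phe/Tyr and large hydrophobic Met
    "trypsin": set("KR"),         # positively charged Lys/Arg
    "elastase": set("GASC"),      # small residues Gly/Ala/Ser/Cys
}


def cleave(sequence: str, protease: str) -> list[str]:
    """Split a one-letter sequence into fragments after each recognised residue."""
    targets = CLEAVAGE_RULES[protease]
    fragments, start = [], 0
    for i, residue in enumerate(sequence):
        # A target residue at the very end leaves nothing to cut off.
        if residue in targets and i < len(sequence) - 1:
            fragments.append(sequence[start:i + 1])
            start = i + 1
    fragments.append(sequence[start:])
    return fragments


for enzyme in CLEAVAGE_RULES:
    print(enzyme, cleave("AKWRG", enzyme))
```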
10. Specificity Binding Pocket
11. Protein Folding: Biological perspective
- Central dogma of protein folding: sequence specifies structure
- Denaturation unfolds a protein back to a random-coil configuration
- β-mercaptoethanol breaks (reduces) disulfide bonds
- Urea or guanidine hydrochloride act as chemical denaturants
- Heat or extreme pH also denature proteins
- Anfinsen's experiments
- Denatured ribonuclease
- Spontaneously regained enzymatic activity once the denaturant was removed
- Evidence that it refolded to its native conformation
12. Protein Folding Problem
- Finding the structure of a protein starting from its amino acid sequence is called the protein folding problem.
13. The Protein Folding Problem
- Central question of molecular biology: given a particular sequence of amino acid residues (primary structure), what will the tertiary/quaternary structure of the resulting protein be?
- Input: AAVIKYGCAL
- Output: φ1ψ1, φ2ψ2, ... (backbone conformation; no side chains yet)
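The φ/ψ output is a list of backbone torsion angles, and each one can be computed from four consecutive atom positions. The helper below uses a standard projection-plus-atan2 construction:
- φ(i) uses C(i-1), N(i), CA(i), C(i).
- ψ(i) uses N(i), CA(i), C(i), N(i+1).

The function name `dihedral` is our own. The usage line takes the Ala 1 / Gly 2 coordinates from the PDB records on slide 6.

```python
import math

Vec = tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Torsion angle p0-p1-p2-p3 in degrees, between -180 and 180."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    length = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / length, b1[1] / length, b1[2] / length)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


# phi of Gly 2 uses C(Ala 1), N, CA, C(Gly 2) from the ATOM records.
c_ala1 = (23.572, 46.251, 111.545)
n_gly2 = (23.656, 45.723, 110.336)
ca_gly2 = (24.216, 44.393, 110.087)
c_gly2 = (25.653, 44.308, 110.579)
print(f"phi(Gly2) = {dihedral(c_ala1, n_gly2, ca_gly2, c_gly2):.1f} degrees")
```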
14. Folding intermediates
- Levinthal's paradox: consider a 100-residue protein. If each residue can take only 3 × 3 = 9 positions, there are 9^100 (about 10^95) possible conformations. That is far too many to search at random in the microseconds to seconds in which real proteins fold.
- Folding must therefore proceed by progressive stabilization of intermediates
- Molten globules: most secondary structure is formed, but the chain is less compact than the native conformation.
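The size of Levinthal's search space is easy to check directly. The sketch counts 9^100 conformations and converts an exhaustive search into "ages of the universe". It assumes a sampling rate of 10^13 conformations per second, which is an illustrative order of magnitude and not a value from the slide.

```python
import math

RESIDUES = 100
STATES_PER_RESIDUE = 3 * 3            # the slide's 3 x 3 = 9 choices per residue
SAMPLES_PER_SECOND = 1e13             # assumption: one conformation per ~0.1 ps
AGE_OF_UNIVERSE_S = 13.8e9 * 3.156e7  # ~13.8 billion years, in seconds

conformations = STATES_PER_RESIDUE ** RESIDUES  # exact integer 9**100
search_time_s = conformations / SAMPLES_PER_SECOND
print(f"conformations ~ 10^{math.log10(conformations):.1f}")
print(f"exhaustive search ~ {search_time_s / AGE_OF_UNIVERSE_S:.1e} ages of the universe")
```

Even with a far more generous sampling rate, the time needed remains astronomically large. This is why folding is thought to proceed through partially stabilized intermediates rather than by random search.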
15. Protein Packing
- Occurs in the cytosol (60% bulk water, 40% water of hydration)
- Involves interactions between secondary structure elements and solvent
- May be promoted by chaperones and membrane proteins
- Collapses into molten globule states
- The overall entropy loss is small enough that enthalpy determines the sign of ΔE, which decreases (the loss in entropy from packing is counteracted by the gain from desolvation and reorganization of water, i.e. the hydrophobic effect)
- Yields tertiary structure
16. Folding help
- Proteins are, in fact, only marginally stable
- Native state is typically only 5 to 10 kcal/mol more stable than the unfolded form
- Many proteins help in folding
- Protein disulfide isomerase catalyzes shuffling of disulfide bonds
- Chaperones break up aggregates and (in theory) unfold misfolded proteins
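What "marginally stable" means can be made concrete with K = exp(ΔG / RT). Here ΔG is the stability (native below unfolded), and K = [native]/[unfolded]. The sketch assumes simple two-state folding at 25 °C; the helper name is hypothetical. It evaluates the 5 and 10 kcal/mol values quoted above.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def folding_equilibrium(stability_kcal: float, temperature_k: float = 298.15) -> tuple[float, float]:
    """Two-state model: return (K = [native]/[unfolded], fraction unfolded).

    stability_kcal is how far the native free energy lies below the unfolded
    one; a positive value means the native state is favoured."""
    k = math.exp(stability_kcal / (R_KCAL * temperature_k))
    return k, 1.0 / (1.0 + k)


for dg in (5.0, 10.0):
    k, unfolded = folding_equilibrium(dg)
    print(f"{dg:4.1f} kcal/mol -> K = {k:.2e}, fraction unfolded = {unfolded:.1e}")
```

At the low end, roughly one molecule in a few thousand is unfolded at any moment. At the high end, it is roughly one in tens of millions.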
17. Forces driving protein folding
- It is believed that hydrophobic collapse is a key driving force for protein folding
- Hydrophobic core
- Polar surface interacting with solvent
- Minimum volume (no cavities)
- Disulfide bond formation stabilizes
- Hydrogen bonds
- Polar and electrostatic interactions
18. Secondary Structure
- non-linear
- 3 dimensional
- localized to regions of an amino acid chain
- formed and stabilized by hydrogen bonding,
electrostatic and van der Waals interactions
19. Common motifs
20. The Hydrophobic Core
- Hemoglobin A is the protein in red blood cells (erythrocytes) responsible for binding oxygen.
- The mutation E6V (Glu6→Val) in the β chain places a hydrophobic Val on the surface of hemoglobin.
- The resulting sticky patch causes hemoglobin S to agglutinate (stick together) and form fibers, which deform the red blood cell; such cells do not carry oxygen efficiently.
- Sickle cell anemia was the first identified molecular disease.
21. Sickle Cell Anemia
Sequestering hydrophobic residues in the protein
core protects proteins from hydrophobic
agglutination.
22. Computational Approaches
- Ab initio methods
- Threading
- Comparative Modelling
- Fragment Assembly
23. Why is ab initio prediction hard?
24. Ab initio protein structure prediction as an optimization problem
- Define a function that maps protein structures to some quality measure.
- Solve the computational problem of finding an optimal structure.
25. A dream function
- Has a clear minimum in the native structure.
- Has a clear path towards the minimum.
- A global optimization algorithm should find the native structure.
Chen Keasar BGU
27. An approximate function
- Easier to design and compute.
- Native structure is not always the global minimum.
- Global optimization methods do not converge; many alternative models (decoys) should be generated.
- No clear way of choosing among them.
28. Fold Optimization
- Simple lattice models (HP models)
- Two types of residues: hydrophobic (H) and polar (P)
- 2-D or 3-D lattice
- The only force is hydrophobic collapse
- Score: number of H-H contacts
29. Scoring Lattice Models
- H/P model scoring: count noncovalent hydrophobic (H-H) interactions.
- Sometimes also penalize buried polar or surface hydrophobic residues
30. What can we do with lattice models?
- For smaller polypeptides, exhaustive search can be used
- Looking at the best fold, even in such a simple model, can teach us interesting things about the protein folding process
- For larger chains, other optimization and search methods must be used
- Greedy, branch and bound
- Evolutionary computing, simulated annealing
- Graph theoretical methods
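For short chains, the HP model's scoring function and an exhaustive search fit in a few dozen lines. This sketch works on the 2-D square lattice. The energy is -1 per non-bonded H-H contact between neighbouring lattice sites, and the optional penalties mentioned above are left out. Every self-avoiding walk is enumerated with the first residue pinned at the origin. `hp_energy` and `best_fold` are illustrative names.

```python
Coord = tuple[int, int]
MOVES: list[Coord] = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def hp_energy(sequence: str, coords: list[Coord]) -> int:
    """HP-model energy: -1 for every non-bonded H-H pair on neighbouring lattice sites."""
    index_at = {c: i for i, c in enumerate(coords)}
    contacts = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        for dx, dy in MOVES:
            j = index_at.get((x + dx, y + dy))
            # j > i + 1 skips chain neighbours and counts each pair once
            if j is not None and j > i + 1 and sequence[j] == "H":
                contacts += 1
    return -contacts


def best_fold(sequence: str) -> tuple[int, list[Coord]]:
    """Exhaustive search over all self-avoiding walks on the 2-D square lattice."""
    best_energy, best_path = 1, []  # 1 is worse than any HP energy (always <= 0)
    path: list[Coord] = [(0, 0)]    # first residue pinned at the origin
    occupied = {(0, 0)}

    def extend() -> None:
        nonlocal best_energy, best_path
        if len(path) == len(sequence):
            energy = hp_energy(sequence, path)
            if energy < best_energy:
                best_energy, best_path = energy, list(path)
            return
        x, y = path[-1]
        for dx, dy in MOVES:
            step = (x + dx, y + dy)
            if step not in occupied:  # self-avoidance
                path.append(step)
                occupied.add(step)
                extend()
                path.pop()
                occupied.remove(step)

    extend()
    return best_energy, best_path


energy, fold = best_fold("HPHPPHHPH")
print(energy, fold)
```

The number of walks grows roughly as 3^n, which is why the slide turns to greedy, branch-and-bound or stochastic methods for longer chains.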
31. Learning from Lattice Models
- The hydrophobic zipper effect
Ken Dill 1997
32. Threading: Fold recognition
- Given:
- a sequence, e.g. IVACIVSTEYDVMKAAR
- a database of molecular coordinates (known folds)
- Map the sequence onto each fold
- Evaluate each sequence-to-fold fit with a scoring function
- Objective 1: improve the scoring function
- Objective 2: folding
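The map-and-evaluate loop can be illustrated with a toy ungapped threader. Each fold is reduced to a string of per-position environments, and the sequence is slid along it. All of the following is a hypothetical simplification for illustration:
- Each environment is "B" (buried) or "E" (exposed).
- A residue scores +1 when a hydrophobic residue sits in a buried site or a polar one in an exposed site, and -1 otherwise.
- The hydrophobic residue set is a crude choice.

Real threading uses gapped alignments and statistically derived potentials.

```python
HYDROPHOBIC = set("AVILMFWC")  # assumption: a crude hydrophobic/polar split


def environment_score(residue: str, env: str) -> int:
    """Toy 1D-3D profile: +1 if the residue suits a buried ("B") or exposed ("E") site, else -1."""
    return 1 if (residue in HYDROPHOBIC) == (env == "B") else -1


def thread(sequence: str, folds: dict[str, str]) -> list[tuple[str, int, int]]:
    """Ungapped threading: slide the sequence along each fold and keep the best offset.

    Returns (fold name, best score, offset) tuples, best-scoring fold first."""
    results = []
    for name, envs in folds.items():
        best = None
        for offset in range(len(envs) - len(sequence) + 1):
            score = sum(environment_score(r, e) for r, e in zip(sequence, envs[offset:]))
            if best is None or score > best[0]:
                best = (score, offset)
        if best is not None:  # folds shorter than the sequence are skipped
            results.append((name, best[0], best[1]))
    return sorted(results, key=lambda item: item[1], reverse=True)


# Hypothetical environment strings standing in for two database folds.
print(thread("IVACIVSTEYDVMKAAR", {
    "fold_a": "BBBBBEEEEEEBBBEEEBB",
    "fold_b": "EEEEEEBBBBBBEEEEEEE",
}))
```

The two objectives on the slide map onto this sketch. A better `environment_score` improves fold recognition, while ranking folds by score is the prediction step.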
33. Protein Fold Families
- CATH website: www.cathdb.info
34. Secondary Structure Prediction
AGVGTVPMTAYGNDIQYYGQVT
A-VGIVPM-AYGQDIQY-GQVT
AG-GIIP--AYGNELQ--GQVT
AGVCTVPMTA---ELQYYG--T
AGVGTVPMTAYGNDIQYYGQVT
----hhhHHHHHHhhh--eeEE
35. Secondary Structure Prediction
- Easier than folding
- Current algorithms can predict secondary structure with 70-80% accuracy
- Chou, P.Y. & Fasman, G.D. (1974). Biochemistry, 13, 211-222.
- Based on frequencies of occurrence of residues in helices and sheets
- PHD: neural-network based
- Uses a multiple sequence alignment
- Rost & Sander (1994). Proteins, 19, 55-72.
36. Chou-Fasman Parameters
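Propensity-based prediction averages per-residue parameters over a sliding window. The sketch marks a residue as helical ("h") if it falls in any window whose mean helix propensity reaches a threshold. It is a simplified stand-in for the full Chou-Fasman nucleation-and-extension rules, not a reimplementation of them. The default window of 6, the threshold of 1.03, the neutral 1.0 default for unlisted residues and the propensity values in the usage line are all illustrative assumptions, not the published table.

```python
def predict_helix(sequence: str, p_alpha: dict[str, float],
                  window: int = 6, threshold: float = 1.03) -> str:
    """Mark residues covered by any window whose mean helix propensity >= threshold."""
    marks = ["-"] * len(sequence)
    for start in range(len(sequence) - window + 1):
        segment = sequence[start:start + window]
        # Unlisted residues get a neutral propensity of 1.0 (assumption).
        mean = sum(p_alpha.get(r, 1.0) for r in segment) / window
        if mean >= threshold:
            for i in range(start, start + window):
                marks[i] = "h"
    return "".join(marks)


# Illustrative values only: helix formers above 1, helix breakers below 1.
toy_p_alpha = {"A": 1.4, "E": 1.5, "L": 1.2, "M": 1.4, "G": 0.6, "P": 0.6}
print(predict_helix("GPAEELLAMKGPG", toy_p_alpha))
```

A sheet prediction would use a second table in the same way. Conflicts between helix and sheet calls are then resolved by comparing the two averages.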
37. Homology Modelling
- Using database search algorithms, find the sequence of known structure that best matches the query sequence
- Assign the structure of the core regions obtained from the structure database to the query sequence
- Find the structure of the intervening loops using loop-closure algorithms
38. Homology Modeling: How it works
- Find template
- Align target sequence with template
- Generate model (add loops, add side chains)
- Refine model
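The first two steps of the workflow can be sketched together. The example picks the template with the best alignment score, then returns the target-template alignment from which the core coordinates would be transferred. It uses a Needleman-Wunsch global alignment with deliberately simple scoring: match +1, mismatch -1, linear gap -2. Practical modelling pipelines instead use substitution matrices, affine gap penalties and profile searches. The function names and template sequences are hypothetical.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> tuple[int, str, str]:
    """Global alignment by dynamic programming; returns (score, aligned a, aligned b)."""
    def pair(x: str, y: str) -> int:
        return match if x == y else mismatch

    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a
    # Trace back from the bottom-right cell to recover one optimal alignment.
    top, bottom, i, j = [], [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


def best_template(target: str, templates: dict[str, str]) -> str:
    """Pick the template whose sequence aligns best to the target."""
    return max(templates, key=lambda name: needleman_wunsch(target, templates[name])[0])


templates = {"tmpl_1": "AGVGTVPMTAYGNDIQYYGQVT", "tmpl_2": "AGCTVPMTAELQYYGT"}
name = best_template("AVGIVPMAYGQDIQYGQVT", templates)
print(name, needleman_wunsch("AVGIVPMAYGQDIQYGQVT", templates[name]))
```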
39. Prediction of Protein Structures
- Examples: a few good examples
(Figure: actual vs. predicted structures)
40. Prediction of Protein Structures
41. 1esr
44. How can we predict protein structures?
52. G-protein coupled receptors (GPCRs)
- Vital protein bundles with versatile functions.
- Play a key role in cellular signaling and in the regulation of basic physiological processes; they interact with more than 50% of prescription drugs.
- They are therefore excellent potential therapeutic targets for drug design and a focus of current pharmaceutical research.
53. GPCR Functional Classification Problem
- Although thousands of GPCR sequences are known, a crystal structure has been solved for only one GPCR, at medium resolution, to date.
- For many of them, the activating ligand is unknown.
- Functional classification methods for automated characterization of such GPCRs are imperative.
- GPCRs are not well suited to homology modelling, but hybrid methods may work (A. Rayan, J. Mol. Modelling, 2010, pp. 183-191).
54. Schematic overview of the MHC-I antigen processing and presentation pathway
55. Pathway and MHC Molecule
- Cytotoxic T cells recognize antigenic peptides (8-10 residues) bound to an MHC class I molecule on the cell surface.
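The 8-10 residue length window defines the candidate set that epitope prediction starts from. A protein of length L contributes L - k + 1 peptides of each length k. The sketch simply enumerates those candidates. A real pipeline would then score each peptide for MHC class I binding, and that step is not shown. The function name is hypothetical.

```python
def candidate_epitopes(protein: str, lengths: range = range(8, 11)) -> list[str]:
    """All contiguous peptides of the given lengths (8-10 residues by default)."""
    return [protein[i:i + k] for k in lengths for i in range(len(protein) - k + 1)]


peptides = candidate_epitopes("IVACIVSTEYDVMKAAR")
print(len(peptides), peptides[:3])
```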
56. MHC-I-bound epitope is scanned by the T-cell receptor