Title: PREDICTING PROTEIN STRUCTURE AND BEYOND
P. V. Balaji, Biotechnology Center, I.I.T. Bombay
2. Organization of the Talk
1. Why predict the structure?
2. Methods for structure prediction
3. What next?
3. Genome Size Is Not Proportional to the Complexity of the Organism
4. The Molecular Logic of Life Is the Same
Genomes: extremely diverse organisms
Biochemically, all living things (animals, plants, bacteria, viruses, etc.) are remarkably similar
5. Genome Sequencing and Analysis: One of the Key Steps in Deciphering the Logic of Life
Even minute details have to be analyzed:
"Hang him, not let him go" vs. "Hang him not, let him go"
Humans: NeuNAc (N-acetyl group, CH3)
Chimpanzees: NeuNGc (N-glycolyl group, CH2OH)
6. Innovations in Technology Have Made Genome Sequencing a Routine Affair
"It is unlikely that the base sequence of more than a few percent of such a complex DNA will ever be determined." (C. W. Schmid and W. R. Jelinek, Science, June 1982)
7. One Aspect of Genome Sequence Analysis Is to Assign Functions to Proteins (Reverse Genetics)
8. The Function of a Protein Can Be Defined at Different Levels
Example: lysozyme
Biochemical level: hydrolyzes a C-O (glycosidic) bond
Physiological level: breaks down the bacterial cell wall
Cellular level: defense against infection
Different analysis tools assign function at different levels
9. Hallmark of Proteins: Specificity
Proteins know exactly which small molecule (ligand) they should bind to or interact with
They also know which part of a macromolecule they should bind to
10. Origin of Specificity
Function is critically dependent on structure
1ruv.pdb
12. Sequence Determines Structure
1   KETAAAKFER QHMDSSTSAA SSSNYCNQMM KSRNLTKDRC KPVNTFVHES
51  LADVQAVCSQ KNVACKNGQT NCYQSYSTMS ITDCRETGSS KYPNCAYKTT
101 QANKHIIVAC EGNPYVPVHF DASV 124
1ruv.pdb
Christian B. Anfinsen, Nobel Prize in Chemistry (1972)
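As a small illustration of treating a sequence as data, the sketch below counts the residues in the 124-residue sequence above (bovine pancreatic ribonuclease A, the protein used in Anfinsen's refolding experiments). The function name `composition` is hypothetical; the sequence is copied from the slide.

```python
from collections import Counter

# Sequence as listed on the slide (one-letter amino acid codes, residues 1-124).
SEQUENCE = (
    "KETAAAKFERQHMDSSTSAASSSNYCNQMMKSRNLTKDRCKPVNTFVHES"
    "LADVQAVCSQKNVACKNGQTNCYQSYSTMSITDCRETGSSKYPNCAYKTT"
    "QANKHIIVACEGNPYVPVHFDASV"
)


def composition(seq: str) -> dict[str, float]:
    """Return the percentage of each residue type in `seq`, sorted by residue code."""
    counts = Counter(seq)
    return {aa: 100.0 * n / len(seq) for aa, n in sorted(counts.items())}


if __name__ == "__main__":
    print(len(SEQUENCE))        # number of residues
    print(SEQUENCE.count("C"))  # cysteines, the candidates for disulfide bonds
    for aa, pct in composition(SEQUENCE).items():
        print(f"{aa}: {pct:.1f}%")
```

Run as a script, it reports 124 residues and 8 cysteines before listing the composition.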
13. How Does Sequence Specify Structure?
This is the protein folding problem (the "second half of the genetic code")
Until it is solved, structure has to be determined experimentally
14. Experimental Methods of Structure Determination
Solubilization of the over-expressed protein
15. Limitations of Experimental Methods: Consequences
Annotated proteins in the databank: 100,000
Total number, including ORFs: 700,000
Proteins with known structure: 5,000 (!)
The proteins of known structure form the dataset for analysis
An ORF (open reading frame) is a region of a genome that may code for a protein. ORFs have been identified by whole-genome sequencing efforts; ORFs with no known function are termed orphans.
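To make the ORF definition concrete, here is a minimal sketch that scans the three forward reading frames of a DNA string for stretches that begin with ATG and end at the first in-frame stop codon (TAA, TAG or TGA). It assumes the standard genetic code and ignores the reverse strand, alternative start codons and introns, so it is a simplification of what genome annotation does; `find_orfs` and `min_codons` are hypothetical names.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 2) -> list[tuple[int, int, str]]:
    """Return (start, end, orf) tuples for forward-strand ORFs, sorted by start.

    An ORF here runs from ATG up to and including the first in-frame stop codon.
    `min_codons` counts codons from ATG up to, but not including, the stop.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(dna):
            if dna[i:i + 3] == "ATG":
                for j in range(i + 3, len(dna) - 2, 3):
                    if dna[j:j + 3] in STOP_CODONS:
                        if (j - i) // 3 >= min_codons:
                            orfs.append((i, j + 3, dna[i:j + 3]))
                        i = j  # resume after this ORF; nested ATGs are not reported
                        break
                else:
                    break  # no in-frame stop codon: the ORF runs off the end
            i += 3
    return sorted(orfs)


if __name__ == "__main__":
    print(find_orfs("CCATGAAATTTTAGGG"))
```

The example prints a single ORF, `ATGAAATTTTAG`, found in the third reading frame.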
16. Structural Biology Consortia: A Brute-Force Approach Towards Structure Elucidation
Aim to solve about 400 structures a year
Employ battalions of Ph.D.s and post-docs
Large-scale expression and crystallization attempts
Basic strategies remain the same
No (known) new tricks
Recalcitrant targets will be ignored
Enhances the statistical base for inferring sequence-structure relationships
17. Predicting Protein Structure: 1. Comparative Modeling (formerly homology modeling)
18. Comparative Modeling
Structure is much more conserved than sequence during evolution
The higher the similarity, the higher the confidence in the modeled structure
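Sequence identity over an alignment is one common way to quantify the similarity mentioned above. The sketch assumes two sequences already aligned to equal length, with "-" marking gaps, and counts identity over positions where neither sequence has a gap; other definitions (for example, dividing by the length of the shorter sequence) are also in use, and the function name is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences, ignoring gapped columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only alignment columns where both sequences have a residue.
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    identical = sum(1 for a, b in pairs if a == b)
    return 100.0 * identical / len(pairs)


if __name__ == "__main__":
    # 7 ungapped columns, 6 of them identical.
    print(round(percent_identity("KETAAAKF", "KESAA-KF"), 1))
```

In a comparative-modeling workflow such a score would be computed against each candidate template to pick the closest one.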
19. Predicting Protein Structure: Alternative Methods
Threading or fold recognition
Ab initio
In addition, establishing sequence → structure relationships is important
Input from people trained in statistics, pattern recognition and related areas of computer science is critical
20. Statistical Analysis of Protein Structures: Microenvironment Characterization
Describe structures at multiple levels of detail using a comprehensive set of properties:
Atom-based properties: type, hydrophobicity, charge
Residue-based properties: type, hydrophobicity
Chemical group: hydroxyl, amide, carbonyl, etc.
Secondary structure: α-helix, β-strand, turn, loop
Other properties: van der Waals volume, B-factor, mobility, solvent accessibility
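One way to make microenvironment characterization concrete is to summarize the properties of atoms lying in concentric shells around a site of interest. This is a sketch under assumptions: the `Atom` record, its hydrophobic flag and the 2 Å shell width are hypothetical choices for illustration, and only atom-based properties are tallied; residue, chemical-group, secondary-structure and other properties from the list above would be added the same way.

```python
import math
from dataclasses import dataclass


@dataclass
class Atom:
    x: float
    y: float
    z: float
    hydrophobic: bool  # atom-based property (hypothetical annotation)
    charge: float      # partial or formal charge


def shell_profile(atoms, center, shell_width=2.0, n_shells=3):
    """Summarize atom properties in concentric shells around `center`.

    Returns one dict per shell with the atom count, the number of hydrophobic
    atoms and the summed charge. Atoms beyond n_shells * shell_width are ignored.
    """
    profile = [{"count": 0, "hydrophobic": 0, "charge": 0.0} for _ in range(n_shells)]
    for atom in atoms:
        d = math.dist((atom.x, atom.y, atom.z), center)
        shell = int(d // shell_width)  # shell 0 covers [0, width), shell 1 [width, 2*width), ...
        if shell < n_shells:
            profile[shell]["count"] += 1
            profile[shell]["hydrophobic"] += int(atom.hydrophobic)
            profile[shell]["charge"] += atom.charge
    return profile


if __name__ == "__main__":
    atoms = [
        Atom(0.5, 0.0, 0.0, True, 0.0),
        Atom(0.0, 3.0, 0.0, False, -1.0),
        Atom(0.0, 0.0, 3.5, False, 1.0),
        Atom(10.0, 0.0, 0.0, True, 0.0),  # outside all shells
    ]
    for i, shell in enumerate(shell_profile(atoms, (0.0, 0.0, 0.0))):
        print(i, shell)
```

Comparing such profiles across many sites is what gives the statistical view of sequence-structure relationships.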
21. Predicting Protein Structure: 2. Threading or Fold Recognition
Fold recognition is essentially finding the best fit of a sequence to a set of candidate folds
22. Fold of a Protein
Refers to the spatial arrangement of its secondary structural elements (α-helices and β-strands)
Examples: 1l45.pdb, 4bcl.pdb, 1mbl.pdb
Folds shown: α/β-barrel, β-barrel, α/β-sandwich
23. Threading: Basic Strategy
(Figure: a query sequence fitted to candidate folds)
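A toy version of the basic strategy: score a query against each candidate fold and keep the best fit. Here each fold is reduced to a hypothetical string of environment labels ("B" buried, "E" exposed), the query is placed on it without gaps, and a position scores +1 when a hydrophobic residue sits in a buried position or a polar residue in an exposed one, otherwise -1. Real threading methods allow gapped alignments and use statistically derived scoring functions; the fold names, labels and scoring rule here are all assumptions.

```python
HYDROPHOBIC = set("AVILMFWC")  # a simple, assumed hydrophobic set


def env_score(residue: str, env: str) -> int:
    """+1 if hydrophobicity matches the environment (buried/exposed), else -1."""
    buried = env == "B"
    return 1 if (residue in HYDROPHOBIC) == buried else -1


def best_fold(query: str, folds: dict[str, str]) -> tuple[str, int]:
    """Thread `query` (ungapped) onto every candidate fold of the same length.

    Returns (fold_name, score) for the highest-scoring fold; raises ValueError
    if no candidate has the same length as the query.
    """
    scores = {
        name: sum(env_score(r, e) for r, e in zip(query, envs))
        for name, envs in folds.items()
        if len(envs) == len(query)
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


if __name__ == "__main__":
    candidates = {"fold_A": "BBBEEE", "fold_B": "EEEBBB"}
    print(best_fold("LIVKDE", candidates))
```

For the query `LIVKDE`, the hydrophobic N-terminal half fits the buried positions of `fold_A`, so that fold wins with the maximum score of 6.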
24. Predicting Protein Structure: 3. Ab Initio Methods
Sequence → secondary structure prediction → low-energy structures → predicted structure → validation
Techniques shown: energy minimization, mean-field potentials
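The pipeline above searches for low-energy structures. As a stand-in, the sketch below uses the well-known 2D HP lattice model (a swapped-in toy model, not the method on the slide): residues are H (hydrophobic) or P (polar), the chain is a self-avoiding walk on a square lattice, and each non-bonded H-H contact contributes -1. This simple contact energy stands in for the mean-field potentials. Exhaustive enumeration finds the lowest-energy conformation, but the search grows as 4^(n-1) and is only practical for very short chains; all function names are hypothetical.

```python
from itertools import product
from typing import Optional

# Unit steps on a 2D square lattice.
STEPS = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def place(moves) -> Optional[list[tuple[int, int]]]:
    """Turn a sequence of moves into lattice coordinates; None if the chain overlaps itself."""
    coords = [(0, 0)]
    occupied = {(0, 0)}
    for m in moves:
        dx, dy = STEPS[m]
        x, y = coords[-1]
        nxt = (x + dx, y + dy)
        if nxt in occupied:
            return None
        coords.append(nxt)
        occupied.add(nxt)
    return coords


def hp_energy(seq: str, coords) -> int:
    """HP-model energy: -1 per H-H pair that touches on the lattice but is not bonded."""
    index = {c: i for i, c in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if seq[i] != "H":
            continue
        for dx, dy in STEPS.values():
            j = index.get((x + dx, y + dy))
            # j > i + 1 skips the bonded neighbour and counts each contact once.
            if j is not None and j > i + 1 and seq[j] == "H":
                energy -= 1
    return energy


def lowest_energy_fold(seq: str):
    """Enumerate all self-avoiding walks and return (energy, coords) of the lowest-energy one."""
    best_energy, best_coords = None, None
    for moves in product("UDLR", repeat=len(seq) - 1):
        coords = place(moves)
        if coords is None:
            continue
        energy = hp_energy(seq, coords)
        if best_energy is None or energy < best_energy:
            best_energy, best_coords = energy, coords
    return best_energy, best_coords


if __name__ == "__main__":
    print(lowest_energy_fold("HPPH"))
```

For `HPPH` the two H residues can be brought into contact by folding the chain into a square, giving an energy of -1.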
25. Small Molecules and/or Metal Ions Are an Integral Part of Certain Proteins
1a6g.pdb
Predicting the structure of such proteins is an entirely different challenge
26. Proof of the Pudding: CASP Meetings
CASP4: Community-Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction
Predictions, not post-dictions
Easy and medium targets: 100% success; hard targets: 50% success
A significant increase from CASP3
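Judging a prediction against the experimental structure requires a numerical measure. A common one is the root-mean-square deviation (RMSD) between corresponding atoms (for example, Cα atoms), RMSD = sqrt((1/N) Σ |a_i - b_i|²). The sketch assumes the two coordinate sets are already matched atom for atom and optimally superposed; a full comparison would first superpose them (for example with the Kabsch algorithm), and CASP assessments also rely on other measures. The function name is hypothetical.

```python
import math


def rmsd(coords_a, coords_b) -> float:
    """RMSD between two equal-length lists of (x, y, z) points, assumed already superposed."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


if __name__ == "__main__":
    predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    experimental = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
    print(rmsd(predicted, experimental))  # sqrt((0 + 4) / 2)
```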
27. "OK, I can predict the structure correctly!" Is that it?
28. Inferring Function from Structure: Caveats
Glyceraldehyde 3-phosphate dehydrogenase (GAPDH), a single protein, has many functions:
Glycolysis
Binding protein for plasmin, fibronectin and lysozyme
Transcriptional control of gene expression
DNA replication and repair
Flocculation
29. Same Fold, Different Oligomerization
Dimerization; tetramerization
Examples shown (legume lectins): ConA, PNA, GSIV
30. Ligand-Induced Conformational Changes Are Quite Common
Binding of the first substrate redefines the active site and creates the binding pocket for the second substrate and the metal ion
(Figure: flexible loop, before and after binding)
31. Take-Home Message
32. Acknowledgement
A Few Useful Links
http://guitar.rockefeller.edu/modeller/modeller.html
http://www.biochem.ucl.ac.uk/bsm/cath-new/index.html
http://predictioncenter.llnl.gov/
http://insulin.brunel.ac.uk