Title: BCB 444/544 Introduction to Bioinformatics
1 BCB 444/544 - Introduction to Bioinformatics
Lecture 31: Predicting Protein Structure by Threading (cont.)
31_Nov6
2 Seminars in Bioinformatics/Genomics
- Mon Nov 6
  - Sue Lamont (An Sci, ISU): Integrated genomic approaches to enhance host resistance to food-safety pathogens. IG Faculty Seminar, 12:10 PM in 101 Ind Ed II
- Thurs Nov 9
  - Hassane Mchaourab (Center for Structural Biology, Vanderbilt): Structural dynamics of multidrug transporters. Baker Center Seminar, 2:10 PM in Howe Hall Auditorium
  - Sean Rice (Biol Sci, Texas Tech): Constructing an exact and universal evolutionary theory. Applied Math/EEOB Seminar, 3:45 PM in 210 Bessey
3 Assignments: Reading This Week
- Mon Nov 6: Review: Protein Structure Prediction. Ginalski et al. (2005) Nucleic Acids Res. 33:1874, doi:10.1093/nar/gki327
- Wed Nov 8:
  1) Review: SVMs in Bioinformatics. Yang (2004) Briefings in Bioinformatics 5:328, doi:10.1093/bib/5.4.328
  2) SVMs: http://en.wikipedia.org/wiki/Support_Vector_Machine
  3) ANNs: http://en.wikipedia.org/wiki/Artificial_neural_network
- Thurs Nov 9: Lab 10: Protein Structure Prediction
- Fri Nov 10: Chp 8.1-8.4 Proteomics (previously assigned)
4 Assignments Due This Week
- BCB 444/544: HW5, due at noon, Mon Nov 6 (today)
- BCB 544 only: 544Extra2, due at noon, Mon Nov 12
- Teams: must meet with us this week
5 Deciphering the Protein Folding Code
- Protein Structure Prediction (or "Protein Folding") Problem
  - Given the amino acid sequence of a protein, predict its 3-dimensional structure (fold)
- "Inverse Folding" Problem
  - Given a protein fold, identify every amino acid sequence that can adopt that 3-dimensional structure
6 Tertiary Structure Prediction
- 3 major approaches to protein 3-D structure prediction:
  1. Ab initio
  - Comparative modeling:
    2. Homology modeling
    3. Threading
- "Comparative modeling": this term is sometimes used to mean just "homology modeling," but it is also sometimes used to mean both "homology modeling" and "threading/fold recognition"
- Most approaches exploit secondary structure prediction as an input or filtering step
  - Recall that secondary (2°) structure prediction can be highly accurate (>90% on a per-residue basis)
  - You will perform secondary structure prediction in lab this week
7 Steps in Threading
- Align the target sequence with template structures (fold library) from the Protein Data Bank (PDB)
- Calculate an energy score to evaluate the goodness of fit between the target sequence and each template structure
- Rank models based on energy scores
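The three steps above can be sketched as a toy, gapless threading loop. This is a minimal sketch under stated assumptions, not the Ho group's implementation: each template in the hypothetical fold library is just a length plus a list of contacting residue pairs, the target is already reduced to an H/P string, every H-H contact scores -1 (lower energy is better), and "alignment" means sliding the target along the template without gaps. The helper names `hh_energy`, `best_threading` and `rank_templates` are illustrative.

```python
from typing import Dict, List, Tuple

Contact = Tuple[int, int]  # (i, j) template residue indices in contact, i < j
Template = Tuple[int, List[Contact]]  # (template length, contact list)


def hh_energy(hp: str, contacts: List[Contact], offset: int) -> int:
    """Energy of the target (H/P string) placed on a template starting at `offset`.

    Template position k holds target residue k - offset. Each contact between
    two H residues contributes -1, so lower energy means a better fit.
    """
    energy = 0
    for i, j in contacts:
        a, b = i - offset, j - offset
        if 0 <= a < len(hp) and 0 <= b < len(hp) and hp[a] == "H" and hp[b] == "H":
            energy -= 1
    return energy


def best_threading(hp: str, length: int, contacts: List[Contact]) -> Tuple[int, int]:
    """Steps 1-2: try every gapless placement; return (lowest energy, offset)."""
    return min((hh_energy(hp, contacts, off), off) for off in range(length - len(hp) + 1))


def rank_templates(hp: str, library: Dict[str, Template]) -> List[Tuple[str, int]]:
    """Step 3: rank templates by their best threading energy (lowest first)."""
    scored = []
    for name, (length, contacts) in library.items():
        if length < len(hp):
            continue  # target does not fit this template without gaps
        energy, _ = best_threading(hp, length, contacts)
        scored.append((name, energy))
    return sorted(scored, key=lambda item: item[1])


# Hypothetical two-template "fold library" and an H/P-encoded target.
library = {
    "templateA": (6, [(0, 3), (1, 4), (2, 5)]),
    "templateB": (6, [(0, 5)]),
}
print(rank_templates("HPHHPH", library))  # [('templateA', -2), ('templateB', -1)]
```

A real threading method aligns with gaps and uses a richer energy; the gapless slide-and-score loop only makes the align, score, rank structure concrete.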
8 A Rapid Threading Approach for Protein Structure Prediction
Kai-Ming Ho (Physics), Haibo Cao, Yungok Ihm, Zhong Gao, James Morris, Cai-zhuang Wang
Drena Dobbs (GDCB), Jae-Hyung Lee, Michael Terribilini, Jeff Sander
Cao H, Ihm Y, Wang CZ, Morris JR, Su M, Dobbs D, Ho KM (2004) Three-dimensional threading approach to protein structure recognition. Polymer 45:687-697
9 Motivations for Assumptions of Ho Threading Algorithm
- Goal: develop a threading algorithm that
  - Is simple and rapid enough to be used in high-throughput applications
  - Is relatively "insensitive" to sequence similarity between the target protein sequence and the sequence of the template structure
    - (to enhance detection of remote homologs and of structures that are similar due to convergent evolution)
  - Can be used to answer questions such as:
    - What are the predicted folds of all "unassigned" ORFs in Arabidopsis?
    - Does Arabidopsis have a protein with a structure similar to mammalian Tumor Necrosis Factor (TNF)?
- Assumptions
  - The native state of a protein is its lowest free energy state
  - Hydrophobic interactions drive protein folding
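10 Simplify Template Structure Representation
- Contact matrix: $C_{ij} = 1$ if residues $i$ and $j$ are in contact (inter-residue distance below a cutoff, in Å); otherwise $C_{ij} = 0$
Yungok Ihm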
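A minimal sketch of building the contact matrix $C_{ij}$ from residue coordinates. The slide's cutoff value is not given, so the 7.0 Å default below is a hypothetical choice, as are the use of one point per residue (for example the Cα atom) and the `min_sep` rule that ignores immediate sequence neighbors, which are always close. The function name `contact_matrix` is illustrative.

```python
import numpy as np


def contact_matrix(coords: np.ndarray, cutoff: float = 7.0, min_sep: int = 2) -> np.ndarray:
    """Binary contact matrix: C[i, j] = 1 if residues i and j lie within `cutoff`
    angstroms and are at least `min_sep` positions apart in sequence.

    coords: (n, 3) array with one point per residue (an assumption, e.g. C-alpha).
    cutoff and min_sep are illustrative defaults, not values from the slide.
    """
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    idx = np.arange(len(coords))
    far_in_sequence = np.abs(idx[:, None] - idx[None, :]) >= min_sep
    return ((dist < cutoff) & far_in_sequence).astype(int)


# Hypothetical 4-residue chain that folds back on itself (3.8 A steps).
coords = np.array([[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [3.8, 3.8, 0.0], [0.0, 3.8, 0.0]])
print(contact_matrix(coords).tolist())  # [[0, 0, 1, 1], [0, 0, 0, 1], [1, 0, 0, 0], [1, 1, 0, 0]]
```

The matrix is symmetric with a zero diagonal; only the pairs 0-2, 0-3 and 1-3 count as contacts here.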
11 Simplify Target Sequence Representation
- Miyazawa-Jernigan (MJ) model: the inter-residue contact energy M(i, j) is a quasi-chemical approximation based on pair-wise contact statistics extracted from known protein structures in the PDB (20 × 20 matrix)
- Li-Tang-Wingreen (LTW): factorize the MJ interaction matrix to reduce the number of parameters from 210 to 20 q values, one associated with each of the 20 amino acids
- Hydrophobic-Polar (HP): represent amino acids as either hydrophobic (H) or polar (P); the utility of this simple binary alphabet representation was promoted by Dill et al.
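The parameter counts above follow from symmetry: a symmetric 20 × 20 matrix such as MJ has 20 × 21 / 2 = 210 independent entries, while a factorized form needs only one q value per amino acid. The sketch below checks that count, reduces a sequence to the HP alphabet, and builds the rank-1 matrix $U_{ij} = q_i q_j$ (the simplified LTW form listed on the energy-function slide). The `HYDROPHOBIC` set is an illustrative assumption, since HP assignments differ between studies; the helper names are hypothetical.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Illustrative HP assignment only; published HP alphabets differ in details.
HYDROPHOBIC = set("AVILMFWC")


def n_symmetric_params(k: int) -> int:
    """Independent entries in a symmetric k x k matrix: k(k + 1) / 2."""
    return k * (k + 1) // 2


def to_hp(seq: str) -> str:
    """Reduce a protein sequence to the binary H/P alphabet."""
    return "".join("H" if aa in HYDROPHOBIC else "P" for aa in seq)


def ltw_matrix(q: np.ndarray) -> np.ndarray:
    """Rank-1 interaction matrix U_ij = q_i * q_j from one value per amino acid."""
    return np.outer(q, q)


print(n_symmetric_params(len(AMINO_ACIDS)))  # 210 (full MJ) versus 20 q values (LTW)
print(to_hp("ALKKGFGHFDTSE"))  # HHPPPHPPHPPPP
```

Because $q q^{T}$ has rank 1, all 400 entries of U are determined by the 20 q values.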
12 Simplify Energy Function
- Interaction counts only if two hydrophobic amino acid residues are in contact
- At the residue level, pair-wise hydrophobic interaction is dominant
- $E = \sum_{i,j} C_{ij} U_{ij}$
  - $C_{ij}$: contact matrix
  - $U_{ij} = U(\text{residue } i, \text{residue } j)$
  - MJ: $U_{ij} = M(i, j)$, taken from the full 20 × 20 MJ matrix
  - LTW: $U_{ij} = q_i q_j$
  - HP: $U_{ij} = 1$ or $0$
Yungok Ihm
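The energy $E = \sum_{i,j} C_{ij} U_{ij}$ can be sketched for the HP and LTW choices of $U$. Assumptions: residues are already placed on template positions, each contact is counted once (sum over $i < j$, which differs from a sum over all ordered pairs only by a factor of 2), and the q values in the example are hypothetical. The HP version follows the slide's 1/0 convention, so a larger E here means more H-H contacts; flip the sign if a lower-is-better energy is wanted. Function names are illustrative.

```python
import numpy as np


def contact_energy(contacts: np.ndarray, u: np.ndarray) -> float:
    """E = sum of C_ij * U_ij over residue pairs i < j (each contact counted once)."""
    upper = np.triu(np.ones(contacts.shape, dtype=bool), k=1)
    return float((contacts * u)[upper].sum())


def hp_u(hp: str) -> np.ndarray:
    """HP model: U_ij = 1 if residues i and j are both H, else 0."""
    h = np.array([c == "H" for c in hp], dtype=float)
    return np.outer(h, h)


def ltw_u(q: np.ndarray) -> np.ndarray:
    """LTW form from the slide: U_ij = q_i * q_j."""
    return np.outer(q, q)


# Contact matrix of a hypothetical 4-residue template (contacts 0-2, 0-3, 1-3).
C = np.array([[0, 0, 1, 1],
              [0, 0, 0, 1],
              [1, 0, 0, 0],
              [1, 1, 0, 0]])
print(contact_energy(C, hp_u("HPHH")))  # 2.0: two H-H contacts
print(contact_energy(C, ltw_u(np.array([0.5, -1.0, 2.0, 1.0]))))  # 0.5
```

Swapping in a full 20 × 20 MJ lookup for `u` would give the MJ variant with the same `contact_energy` function.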
13 Energy Calculation: Contact Energy
- Li-Tang-Wingreen (LTW): 20 parameters
- Contact energy: $E = \sum_{i,j} C_{ij} q_i q_j$, with $q_i$ the solubility (hydrophobicity) parameter of residue $i$ and $C_{ij}$ the contact matrix
Yungok Ihm
14 Summary of Ho Threading Procedure
Yungok Ihm
15 Can complexity be further reduced?
Haibo Cao
16 Examine eigenvectors of contact matrix
Haibo Cao
17 Represent contact matrix by its dominant eigenvector (1D profile)
- The first eigenvector (with the highest eigenvalue) dominates the overlap between sequence and structure
- Higher-ranking (rank > 4) eigenvectors are sequence blind
Haibo Cao
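A minimal sketch of reducing a contact matrix to its dominant eigenvector, the 1D profile described above. It uses NumPy's symmetric eigensolver; the sign flip is needed because an eigenvector is only defined up to sign. The function name `dominant_profile` and the 4-residue example matrix are hypothetical.

```python
from typing import Tuple

import numpy as np


def dominant_profile(contacts: np.ndarray) -> Tuple[float, np.ndarray]:
    """Largest eigenvalue of a symmetric contact matrix and its eigenvector.

    The eigenvector has one entry per template residue and serves as a 1D
    profile; residues with more contacts tend to get larger entries.
    """
    eigenvalues, eigenvectors = np.linalg.eigh(contacts.astype(float))
    top = int(np.argmax(eigenvalues))
    vec = eigenvectors[:, top]
    if vec.sum() < 0:  # eigenvector sign is arbitrary; make it mostly positive
        vec = -vec
    return float(eigenvalues[top]), vec


# Hypothetical 4-residue template with contacts 0-2, 0-3 and 1-3.
C = np.array([[0, 0, 1, 1],
              [0, 0, 0, 1],
              [1, 0, 0, 0],
              [1, 1, 0, 0]])
lam, profile = dominant_profile(C)
print(round(lam, 3), np.round(profile, 3))
```

In this example residues 0 and 3 (two contacts each) receive larger profile values than residues 1 and 2 (one contact each), and the largest eigenvalue is the golden ratio, about 1.618.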
18 Threading Alignment: align the target sequence vector with the 1D profile of the template structure
Cao et al., Polymer 45 (2004)
19 Parameters for alignment?
- Gap penalty
  - Insertions/deletions in helices or strands are strongly penalized; in/dels in loops receive small penalties
  - Gap penalties do not count in the energy calculation
- Size penalty
  - If a target residue and the aligned template residue differ in radius by > 0.5 Å and the residue is involved in > 2 contacts, the alignment is penalized
  - Size penalties do not count in the energy calculation
(figure: example sequence ALKKGFGHFDTSE with loop and helix regions marked)
Yungok Ihm
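The alignment step can be sketched as a standard global dynamic-programming alignment (Needleman-Wunsch style) between a target vector and a template 1D profile, with gap costs that depend on template secondary structure. This is an assumption-laden sketch, not the published method: the match score `target[i] * profile[j]`, the penalties 5.0 and 0.5, and the rule that a gap is charged according to the nearest template residue's secondary structure are hypothetical choices, and the size penalty is omitted. As on the slide, gap penalties steer the alignment but would not be added to the threading energy.

```python
from typing import List, Sequence, Tuple

import numpy as np


def align_to_profile(target: Sequence[float], profile: Sequence[float], ss: str,
                     ss_gap: float = 5.0, loop_gap: float = 0.5
                     ) -> Tuple[float, List[Tuple[int, int]]]:
    """Global alignment of a target vector to a template 1D profile.

    Match score: target[i] * profile[j]. Gaps cost `ss_gap` next to template
    helix/strand residues ('H' or 'E' in `ss`) and `loop_gap` elsewhere.
    Returns (alignment score, aligned (target, template) index pairs).
    """
    n, m = len(target), len(profile)

    def gap_cost(j: int) -> float:
        k = min(max(j, 0), m - 1)  # clamp to a valid template position
        return ss_gap if ss[k] in "HE" else loop_gap

    score = np.full((n + 1, m + 1), -np.inf)
    move = np.zeros((n + 1, m + 1), dtype=int)  # 0 match, 1 skip target, 2 skip template
    score[0, 0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            options = []
            if i > 0 and j > 0:
                options.append((score[i - 1, j - 1] + target[i - 1] * profile[j - 1], 0))
            if i > 0:
                options.append((score[i - 1, j] - gap_cost(j - 1), 1))
            if j > 0:
                options.append((score[i, j - 1] - gap_cost(j - 1), 2))
            for value, step in options:
                if value > score[i, j]:
                    score[i, j], move[i, j] = value, step

    pairs = []  # trace back from the bottom-right cell
    i, j = n, m
    while i > 0 or j > 0:
        step = move[i, j]
        if step == 0:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return float(score[n, m]), pairs[::-1]


score, pairs = align_to_profile([1.0, 1.0], [1.0, 0.0, 1.0], "HLH")
print(score, pairs)  # 1.5 [(0, 0), (1, 2)]
print(align_to_profile([1.0, 1.0], [1.0, 0.0, 1.0], "HHH")[0])  # -3.0
```

Skipping the middle template residue is cheap when it is a loop residue (score 1.5) but costly when it sits in a helix (score -3.0), which is the intended effect of secondary-structure-dependent gap penalties.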
20 How to include secondary structure?
- Predict the secondary structure of the target sequence (PSIPRED, PROF, JPRED, SAM, GOR V)
- $N_+$: total number of matches between the predicted secondary structure and the template structure
- $N_-$: total number of mismatches
- $N_s$: total number of residues selected in the alignment
- Global fitness: $f = 1 + (N_+ - N_-)/N_s$
- $E_{mod} = f \cdot E_{threading}$
Yungok Ihm
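The global fitness and modified energy can be computed directly from an alignment. A minimal sketch, assuming $N_s$ is the number of aligned residue pairs and that every aligned pair is either a secondary-structure match ($N_+$) or a mismatch ($N_-$); under that assumption f ranges from 0 (all mismatches) to 2 (all matches). If threading energies are negative (lower is better), f > 1 makes $E_{mod}$ more favorable. The function names are hypothetical.

```python
from typing import List, Tuple


def global_fitness(pred_ss: str, template_ss: str, pairs: List[Tuple[int, int]]) -> float:
    """f = 1 + (N+ - N-) / Ns over the aligned (target, template) residue pairs.

    Assumes Ns = number of aligned pairs and that each aligned pair is either a
    secondary-structure match (N+) or a mismatch (N-).
    """
    n_s = len(pairs)
    n_plus = sum(1 for i, j in pairs if pred_ss[i] == template_ss[j])
    n_minus = n_s - n_plus
    return 1 + (n_plus - n_minus) / n_s


def modified_energy(e_threading: float, f: float) -> float:
    """E_mod = f * E_threading."""
    return f * e_threading


pairs = [(0, 0), (1, 1), (2, 2), (3, 3)]
f = global_fitness("HHEL", "HHLL", pairs)
print(f, modified_energy(-10.0, f))  # 1.5 -15.0
```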
21 How much better is this fit than random?
- $E_{mod}$: sequence vs. structure
- $E_{shuffle}$: shuffled sequence vs. structure
- $E_{relative}$: $E_{mod}$ relative to $E_{shuffle}$
Yungok Ihm
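A minimal sketch of the shuffled-sequence control: thread many random permutations of the target (same amino acid composition) onto the same template and compare their energies with the real sequence's energy. The slide does not spell out how $E_{mod}$ and $E_{shuffle}$ are combined into $E_{relative}$, so this sketch reports the mean shuffled energy and, as a swapped-in statistic, the fraction of shuffles scoring at least as well as the real sequence (an empirical p-value). The toy energy, its contact list, and the function names are hypothetical.

```python
import random
from typing import Callable, Tuple


def shuffle_control(seq: str, energy: Callable[[str], float],
                    n_shuffles: int = 1000, seed: int = 0) -> Tuple[float, float, float]:
    """Compare the energy of the real sequence with energies of shuffled copies.

    Returns (E for the real sequence, mean E over shuffles, fraction of shuffles
    with E <= real E). Shuffling keeps amino acid composition fixed.
    """
    rng = random.Random(seed)
    e_real = energy(seq)
    shuffled = []
    for _ in range(n_shuffles):
        letters = list(seq)
        rng.shuffle(letters)
        shuffled.append(energy("".join(letters)))
    mean_shuffle = sum(shuffled) / n_shuffles
    frac_as_good = sum(1 for e in shuffled if e <= e_real) / n_shuffles
    return e_real, mean_shuffle, frac_as_good


# Hypothetical fixed template: each H-H contact at these positions scores -1.
CONTACTS = [(0, 3), (1, 4), (2, 5)]


def toy_energy(hp: str) -> float:
    return float(-sum(1 for i, j in CONTACTS if hp[i] == hp[j] == "H"))


print(shuffle_control("HPPHPP", toy_energy))
```

A real sequence whose energy is well below the shuffled mean, with few shuffles doing as well, fits the template better than composition alone would explain.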
22 Performance Evaluation? "Blind Test"
- CASP5 Competition (Critical Assessment of Protein Structure Prediction)
  - Given: amino acid sequence
  - Goal: predict 3-D structure (before experimental results are published)
23 Typical Results (well, actually, our BEST Results): Ho group's #1-ranked CASP5 prediction for this target
Ho, Cao, Ihm, Wang
24 Overall Performance in CASP5 Contest: 8th out of 180 (M. Levitt, Stanford)
- FR: Fold Recognition (targets manually assessed by Nick Grishin)

Rank  Z-Score  Ngood  Npred  NgNW  NpNW  Group name
1     24.26    9.00   12.00  9     12    Ginalski
2     21.64    7.00   12.00  7     12    Skolnick-Kolinski
3     19.55    8.00   12.50  9     14    Baker
4     16.88    6.00   10.00  6     10    BIOINFO.PL
5     15.25    7.00    7.00  7      7    Shortle
6     14.56    6.50   11.50  7     13    BAKER-ROBETTA
7     13.49    4.00   11.00  4     11    Brooks
8     11.34    3.00    6.00  3      6    Ho-Kai-Ming
9     10.45    3.00    5.50  3      6    Jones-NewFold

- FR NgNW: number of good predictions without weighting for multiple models
- FR NpNW: number of total predictions without weighting for multiple models
25 Regulation of Lentivirus Replication, or "Designing New HIV Therapies"
Susan Carpenter (Washington State Univ), Wendy Sparks, Yvonne Wannemuehler
Drena Dobbs (GDCB), Jae-Hyung Lee, Michael Terribilini
Kai-Ming Ho (Physics), Yungok Ihm, Haibo Cao, Cai-zhuang Wang
Gloria Culver (BBMB), Laura Dutca
BCB Fall 06 Dobbs
26 Macromolecular interactions mediated by the Rev protein in lentiviruses (HIV, EIAV)
(figure: one protein-RNA interaction and three protein-protein interactions)
Susan Carpenter
27 Rev is essential for lentiviral replication
- Rev is a small nucleocytoplasmic shuttling protein
  - (HIV Rev: 115 aa; EIAV Rev: 165 aa)
- Recognizes a specific binding site on viral RNA: the Rev Responsive Element (RRE)
- Interacts with CRM1 to export incompletely spliced viral RNAs from the nucleus to the cytoplasm
- Specific domains of Rev mediate nuclear localization, RNA binding, and nuclear export
- The critical role of Rev in lentiviral replication makes it an attractive target for antiviral (AIDS) therapy
28 Problem: no high-resolution Rev structure! Not even for HIV Rev, despite intense effort
- Why? Rev aggregates at the concentrations needed for NMR or X-ray crystallography
- What about insights from sequence comparisons?
  - "Undetectable" sequence similarity among Revs from different lentiviruses (e.g., EIAV vs. HIV: <10%)
- But:
  - Lentiviral Rev proteins are functionally "homologous"
29 Hypothesis: Rev proteins share structural features critical for function
Approach:
- Computationally model structures of lentiviral Rev proteins
  - using a threading algorithm (with Ho et al.)
- Predict critical residues for RNA binding and protein interaction
  - using machine learning algorithms (with Honavar et al.)
- Test the model and predictions
  - using genetic/biochemical approaches (with Carpenter and Culver)
  - using biophysical approaches (with the Andreotti and Yu groups)
- Initially focus on EIAV Rev and the RRE
30 Functional domains: EIAV vs. HIV Rev
(figure: domain maps showing exon 1 and exon 2; residue positions 1, 31, 165)
NES: Nuclear Export Signal; NLS: Nuclear Localization Signal; RBM: putative RNA Binding Motif
31 Predicted EIAV Rev Structure
Yungok Ihm
32 Comparison of Predicted Rev Structures
Yungok Ihm
33 Structure of N-terminal region of HIV Rev
Yungok Ihm
34 Location of functional residues: EIAV Rev
(figure labels: Critical hydrophobic contact?; NES; Putative RBM)
Yungok Ihm
35 Mutations of hydrophobic residues predicted to be critical for helical packing in the core
L65 vs. L95 and L109
Single mutants: Leu to Ala, Leu to Asp; double mutants: Leu to Ala
- Single Ala mutation (L → A): negligible effect on Rev activity
- Insert a charged aa into the hydrophobic core: single Asp mutation (L → D): dramatic change in Rev activity?
- Double Ala mutation (LL → AA): reduction in Rev activity?
Yungok Ihm
36 Functional Analysis of Rev Structural Mutants in vivo (CAT assay)
Wendy Sparks
37 Functional domains: EIAV vs. HIV Rev
- Highlighted regions (red and green): RNA interaction; protein interaction
NES: Nuclear Export Signal; NLS: Nuclear Localization Signal; RBM: putative RNA Binding Motif
38 Predicting the RNA-binding domain of EIAV Rev
Yungok Ihm
EIAV Rev sequence, residues 31-165:
 31 DPQGPLESDQ WCRVLRQSLP EEKISSQTCI ARRHLGPGPT
 71 QHTPSRRDRW IREQILQAEV LQERLEWRIR GVQQVAKELG
111 EVNRGIWREL HFREDQRGDF SAWGDYQQAQ ERRWGEQSSP
151 RVLRPGDSKR RRKHL
Michael Terribilini
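The sequence above is numbered from residue 31 of EIAV Rev. The sketch below transcribes it and maps the motifs and residues named elsewhere in this lecture (RRDRW, KRRRK, ERLE, and the leucines L65, L95, L109) to residue numbers by plain string search. The positions it prints are computed from this transcription, not taken from the original slides; the helper names `residue` and `find_motif` are hypothetical.

```python
from typing import List, Tuple

EIAV_REV_31_165 = (
    "DPQGPLESDQWCRVLRQSLPEEKISSQTCIARRHLGPGPT"  # residues 31-70
    "QHTPSRRDRWIREQILQAEVLQERLEWRIRGVQQVAKELG"  # residues 71-110
    "EVNRGIWRELHFREDQRGDFSAWGDYQQAQERRWGEQSSP"  # residues 111-150
    "RVLRPGDSKRRRKHL"                           # residues 151-165
)
FIRST_RESIDUE = 31  # the transcribed sequence starts at residue 31


def residue(n: int) -> str:
    """One-letter code of residue number n (31-165)."""
    if not FIRST_RESIDUE <= n < FIRST_RESIDUE + len(EIAV_REV_31_165):
        raise ValueError(f"residue {n} is outside 31-165")
    return EIAV_REV_31_165[n - FIRST_RESIDUE]


def find_motif(motif: str) -> List[Tuple[int, int]]:
    """Return (first, last) residue numbers of every occurrence of `motif`."""
    hits = []
    start = EIAV_REV_31_165.find(motif)
    while start != -1:
        first = start + FIRST_RESIDUE
        hits.append((first, first + len(motif) - 1))
        start = EIAV_REV_31_165.find(motif, start + 1)
    return hits


print(find_motif("RRDRW"), find_motif("KRRRK"), find_motif("ERLE"))  # [(76, 80)] [(159, 163)] [(93, 96)]
print(residue(65), residue(95), residue(109))  # L L L
```

This places the central Arg-rich motif RRDRW at residues 76-80 and KRRRK at 159-163, near the C-terminus.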
39 Expression of MBP-ERev deletion mutants
Domain map (residues 1, 31, 57, 125, 146, 165): RBM, Folding?, NES, NLS
Constructs: MBP-ERev 1-165; MBP 31-165; MBP 31-145; MBP 57-165; MBP 57-145; MBP 57-124; MBP 125-165; MBP 146-165; MBP
Jae-Hyung Lee
40 EIAV Rev binds specifically to RRE in vitro
Jae-Hyung Lee
41 EIAV Rev: Predictions vs. Experiments
(figure labels: PREDICTED structure; protein-binding residues; RNA-binding residues; RBM; NLS/RBM)
Lee et al. (2006) J Virol 80:3844
Terribilini et al. (2006) PSB 11:415
Jae-Hyung Lee
42 Mutagenesis of putative RNA binding motifs
Domain map (residues 1, 31, 57, 124, 146, 165): RBD, RBD, NES, NLS
Mutated motifs: RRDRW → AADAA; KRRRK → KAAAK; ERLE → AALA; ERLE → ERDE
Jae-Hyung Lee
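The mutants in this lecture can be generated in silico from the wild-type sequence. A minimal sketch, assuming the residue 31-165 EIAV Rev sequence from the RNA-binding-domain slide and reading the motif substitutions position by position (RRDRW → AADAA, KRRRK → KAAAK, ERLE → AALA, ERLE → ERDE; the last is the same as the point mutation L95D). The helpers `point_mutant` and `motif_mutant` are hypothetical, and the double mutant shown (L95A plus L109A) is an illustrative combination; the slides do not say which pair was mutated.

```python
import re

EIAV_REV_31_165 = (
    "DPQGPLESDQWCRVLRQSLPEEKISSQTCIARRHLGPGPT"  # residues 31-70
    "QHTPSRRDRWIREQILQAEVLQERLEWRIRGVQQVAKELG"  # residues 71-110
    "EVNRGIWRELHFREDQRGDFSAWGDYQQAQERRWGEQSSP"  # residues 111-150
    "RVLRPGDSKRRRKHL"                           # residues 151-165
)
FIRST_RESIDUE = 31


def point_mutant(seq: str, mutation: str, first: int = FIRST_RESIDUE) -> str:
    """Apply a point mutation written as e.g. 'L95D' (wild type, position, new residue)."""
    m = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if m is None:
        raise ValueError(f"cannot parse mutation {mutation!r}")
    wild_type, pos, new = m.group(1), int(m.group(2)), m.group(3)
    idx = pos - first
    if not 0 <= idx < len(seq) or seq[idx] != wild_type:
        raise ValueError(f"{mutation}: residue {pos} is not {wild_type}")
    return seq[:idx] + new + seq[idx + 1:]


def motif_mutant(seq: str, wild_type: str, mutant: str) -> str:
    """Replace a motif that occurs exactly once with an equal-length mutant motif."""
    if len(wild_type) != len(mutant) or seq.count(wild_type) != 1:
        raise ValueError(f"{wild_type} must occur once and match {mutant} in length")
    return seq.replace(wild_type, mutant)


l95d = point_mutant(EIAV_REV_31_165, "L95D")
double_ala = point_mutant(point_mutant(EIAV_REV_31_165, "L95A"), "L109A")
print("ERDE" in l95d)  # True: L95D turns ERLE into ERDE
print(motif_mutant(EIAV_REV_31_165, "KRRRK", "KAAAK")[-7:])  # KAAAKHL
```

Checking the wild-type residue before substituting catches numbering mistakes, which is easy to get wrong when a sequence listing starts at residue 31.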
43 (figure labels: PREDICTED structure; protein-binding residues; RNA-binding residues; RBM; FOLD?; NLS; NLS/RBM; ERDE)
Jae-Hyung Lee
44 Summary: Predictions vs. Experiments
Lee et al. (2006) J Virol 80:3844
Terribilini et al. (2006) PSB 11:415
45 Summary
- Computational and wet lab approaches revealed that:
  - EIAV Rev has a bipartite RNA binding domain
  - Two Arg-rich RBMs are critical:
    - RRDRW in the central region
    - KRRRK at the C-terminus, overlapping the NLS
  - Based on computational modeling, the RBMs are in close proximity within the 3-D structure of the protein
- The RRE binding sites of lentiviral Revs may be more similar in structure than has been appreciated
- Future:
  - Identify "predictive rules" for protein-RNA recognition
Lee et al. (2006) J Virol 80:3844
Terribilini et al. (2006) PSB 11:415
46 Experimentally determine the structure!
47 Building Designer Zinc Finger DNA-binding Proteins
J Sander, P Zaback, F Fu, J Townsend, R Winfrey, D Wright, K Joung, L Miller, D Dobbs, D Voytas
Wright et al. (2006) Nature Protocols, in press