Title: Lab 9.3a: Homology Modeling
1Lab 9.3a: Homology Modeling
- Boris Steipe
- boris.steipe_at_utoronto.ca
- http://biochemistry.utoronto.ca/steipe
- Departments of Biochemistry and Molecular and Medical Genetics
- Program in Proteomics and Bioinformatics
- University of Toronto
2Concepts
- Sequence alignment is the single most important step in homology modeling.
- Reasons to model need to be defined.
- Fully automated homology modeling services perform well.
- SwissModel in practice.
3Concept 1
- Sequence alignment is the single most important
step in homology modeling.
4What is conserved in structure?
E-E.coli   ... IKTRFAPSPTGYLHVGGARTA ... EQMAKGE----KPRYDGRC ... AHVSMINGDDGKKLSKRH
E-P.putida ... VRTRIAPSPTGDPHVGTAYIA ... EQQARGE----TPRYDGRA ... CYMPLLRNPDKSKLSKRK
Q-E.coli   ... VHTRFPPEPNGYLHIGHAKSI ... TLTQPGKNSPYRDRSVEEN ... YEFSRL-NLEYTVMSKRK
Q-Fly      ... VHTRFPPEPNGILHIGHAKAI ... FNPKPS---PWRERPIEES ... WEYGRL-NMNYALVSKRK
Q-Human    ... VRTRFPPEPNGILHIGHAKAI ... HNTLPS---PWRDRPMEES ... WEYGRL-NLHYAVVSKRK
E-Fly      ... VVVRFPPEASGYLHIGHAKAA ... QRVE----SANRSNSVEKN ... WSYSRL-NMTNTVLSKRK
E-Human    ... VTVRFPPEASGYLHIGHAKAA ... QRIE----SKHRKNPIEKN ... WEYSRL-NLNNTVLSKRK
E-Yeast    ... VVTRFPPEPSGYLHIGHAKAA ... DGVA----SARRDRSVEEN ... WDFARI-NFVRTLLSKRK
ATP-Binding
QRS E. coli vs. ERS P. putida: 19% ID
Many regions are expected to be highly conserved
in structure.
Some changes should be straightforward to model.
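A percent identity such as the one quoted on this slide can be computed directly from two aligned rows. The sketch below is only an illustration: the function name and the choice of denominator (columns where neither sequence has a gap) are assumptions, and other conventions exist. It is applied to one short fragment shown on the slide, so its result is not comparable to the whole-domain figure.

```python
def percent_identity(row_a: str, row_b: str, gap: str = "-") -> float:
    """Percent identical residues over columns where neither row has a gap.

    Assumption: both rows come from the same alignment and have equal length.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    aligned = 0
    identical = 0
    for a, b in zip(row_a, row_b):
        if a == gap or b == gap:
            continue  # gapped columns are not counted
        aligned += 1
        if a.upper() == b.upper():
            identical += 1
    return 100.0 * identical / aligned if aligned else 0.0


# Middle fragment of the E-E.coli and E-P.putida rows from the slide.
fragment_identity = percent_identity("EQMAKGE----KPRYDGRC", "EQQARGE----TPRYDGRA")
print(f"{fragment_identity:.1f}% identical over the aligned columns")
```

A conserved fragment like this one scores far higher than the full-length comparison, which is one reason why a single identity figure says little about where a model can be trusted.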
5What is conserved in structure?
E-E.coli   ... IKTRFAPSPTGYLHVGGARTA ... EQMAKGE----KPRYDGRC ... AHVSMINGDDGKKLSKRH
E-P.putida ... VRTRIAPSPTGDPHVGTAYIA ... EQQARGE----TPRYDGRA ... CYMPLLRNPDKSKLSKRK
Q-E.coli   ... VHTRFPPEPNGYLHIGHAKSI ... TLTQPGKNSPYRDRSVEEN ... YEFSRL-NLEYTVMSKRK
Q-Fly      ... VHTRFPPEPNGILHIGHAKAI ... FNPKPS---PWRERPIEES ... WEYGRL-NMNYALVSKRK
Q-Human    ... VRTRFPPEPNGILHIGHAKAI ... HNTLPS---PWRDRPMEES ... WEYGRL-NLHYAVVSKRK
E-Fly      ... VVVRFPPEASGYLHIGHAKAA ... QRVE----SANRSNSVEKN ... WSYSRL-NMTNTVLSKRK
E-Human    ... VTVRFPPEASGYLHIGHAKAA ... QRIE----SKHRKNPIEKN ... WEYSRL-NLNNTVLSKRK
E-Yeast    ... VVTRFPPEPSGYLHIGHAKAA ... DGVA----SARRDRSVEEN ... WDFARI-NFVRTLLSKRK
ATP-Binding
How would sidechain rotamers be modeled?
- conserved dihedral angles
- preferred rotamers
- DEE (Dead End Elimination theorem) for global consistency.
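To make the DEE idea concrete, here is a minimal sketch of the original dead-end elimination criterion: rotamer r at position i can be discarded if some alternative t at the same position has a worst-case energy that is still lower than the best case for r. The energy tables, their layout and all function names are hypothetical; real side-chain packing programs use tuned energy functions and stronger elimination criteria.

```python
from typing import Dict, List, Tuple

# (position i, rotamer r, position j, rotamer s) -> pair energy (hypothetical layout)
PairTable = Dict[Tuple[int, int, int, int], float]


def pair_energy(pair: PairTable, i: int, r: int, j: int, s: int) -> float:
    """Look up a pair energy in either key order; missing pairs count as 0."""
    if (i, r, j, s) in pair:
        return pair[(i, r, j, s)]
    return pair.get((j, s, i, r), 0.0)


def dead_end_elimination(self_e: List[List[float]], pair: PairTable) -> List[List[int]]:
    """Return, per position, the rotamer indices that survive elimination.

    Assumes every position starts with at least one rotamer.
    """
    alive = [list(range(len(e))) for e in self_e]
    changed = True
    while changed:  # repeat until a full pass removes nothing
        changed = False
        for i in range(len(alive)):
            others = [j for j in range(len(alive)) if j != i]
            for r in list(alive[i]):
                # Best case for r: most favourable partner at every other position.
                best_r = self_e[i][r] + sum(
                    min(pair_energy(pair, i, r, j, s) for s in alive[j]) for j in others)
                for t in alive[i]:
                    if t == r:
                        continue
                    # Worst case for t: least favourable partner everywhere.
                    worst_t = self_e[i][t] + sum(
                        max(pair_energy(pair, i, t, j, s) for s in alive[j]) for j in others)
                    if best_r > worst_t:
                        alive[i].remove(r)  # r cannot be part of the global minimum
                        changed = True
                        break
    return alive


# Toy example: two positions with two rotamers each, made-up energies.
toy_self = [[0.0, 5.0], [0.0, 0.0]]
toy_pair = {(0, 0, 1, 0): 0.0, (0, 0, 1, 1): 1.0, (0, 1, 1, 0): 0.5, (0, 1, 1, 1): 0.0}
print(dead_end_elimination(toy_self, toy_pair))
```

Eliminating a rotamer at one position can make further rotamers at other positions eliminable, which is why the passes are repeated until nothing changes.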
6Homology Modeling Issues
E-E.coli   ... IKTRFAPSPTGYLHVGGARTA ... EQMAKGE----KPRYDGRC ... AHVSMINGDDGKKLSKRH
E-P.putida ... VRTRIAPSPTGDPHVGTAYIA ... EQQARGE----TPRYDGRA ... CYMPLLRNPDKSKLSKRK
Q-E.coli   ... VHTRFPPEPNGYLHIGHAKSI ... TLTQPGKNSPYRDRSVEEN ... YEFSRL-NLEYTVMSKRK
Q-Fly      ... VHTRFPPEPNGILHIGHAKAI ... FNPKPS---PWRERPIEES ... WEYGRL-NMNYALVSKRK
Q-Human    ... VRTRFPPEPNGILHIGHAKAI ... HNTLPS---PWRDRPMEES ... WEYGRL-NLHYAVVSKRK
E-Fly      ... VVVRFPPEASGYLHIGHAKAA ... QRVE----SANRSNSVEKN ... WSYSRL-NMTNTVLSKRK
E-Human    ... VTVRFPPEASGYLHIGHAKAA ... QRIE----SKHRKNPIEKN ... WEYSRL-NLNNTVLSKRK
E-Yeast    ... VVTRFPPEPSGYLHIGHAKAA ... DGVA----SARRDRSVEEN ... WDFARI-NFVRTLLSKRK
ATP-Binding
How would you (or should you even) model indels?
- Where should the insertion be placed?
- What is the conformation of the new residues?
- Which residues should be deleted?
- How many additional residues need to change conformation?
7Alignment is the limiting step for homology model accuracy
No amount of forcefield minimization will put a misaligned residue in the right place!
HOMSTRAD at CASP4: Williams MG et al. (2001) Proteins Suppl. 5: 92-97
8Superposition vs. Alignment
- The coordinates of two proteins can be superimposed in space.
- An alignment may be derived from a superposition by correlating residues that are close in space.
- An optimal sequence alignment may lead to a different alignment ...
1GTR vs 2TS1
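Deriving an alignment from a superposition can be sketched as pairing residues whose alpha carbons are mutual nearest neighbours within a distance cutoff. This is a simplified illustration: the coordinates are assumed to be superimposed already, the 3.0 Å cutoff and the function name are assumptions, and the sketch does not enforce that pairs stay in sequence order, as a real structure alignment program would.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def structure_based_pairs(ca_a: Sequence[Coord], ca_b: Sequence[Coord],
                          cutoff: float = 3.0) -> List[Tuple[int, int]]:
    """Pair residue indices (i in A, j in B) that are mutual nearest neighbours
    and closer than `cutoff` (in Å) after superposition."""
    if not ca_a or not ca_b:
        return []
    pairs = []
    for i, a in enumerate(ca_a):
        # Nearest residue of B to residue i of A.
        j = min(range(len(ca_b)), key=lambda k: math.dist(a, ca_b[k]))
        # Nearest residue of A to that residue of B (must point back to i).
        i_back = min(range(len(ca_a)), key=lambda k: math.dist(ca_a[k], ca_b[j]))
        if i_back == i and math.dist(a, ca_b[j]) <= cutoff:
            pairs.append((i, j))
    return pairs


# Made-up coordinates: chain B is displaced by about one residue along x.
chain_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
chain_b = [(3.9, 0.2, 0.0), (7.5, 0.1, 0.0), (11.4, 0.0, 0.0)]
print(structure_based_pairs(chain_a, chain_b))
```

In the toy data, residue 1 of chain A pairs with residue 0 of chain B, so the spatial correspondence differs from a naive index-by-index pairing, which is the situation the slide describes.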
9Superposition vs. Alignment
TyrRS    ERVTLYCGFDPTAdS--LHIGHLATILTMRRFQQAGHRPIALVGGAtgligdpsgkkser
1GTR 26  TTVHTRFPPEPNG-YLHIGHAKSICL--NF---------------GIAqDYKGQCN--
2TS1 29  ERVTLYCGFDPTAdSLHIGHLATILT--MR---------------RFQ-QAGHRPI--

TyrRS    tlnaketVEAWSARIKEQLgrfldfeadgnpa----------------k--------IKN
1GTR 26  ----------------------LRFD-DTnpv----------------keDIEYVESIKN
2TS1 29  ----------------------ALVG-GAtgligdpsgkksertlnaketVEAWSARIKE

TyrRS    NYDWIgpldvitflrdvgk----hfsvnymmakesvqsrietgisftefsYMMLQAYDFL
1GTR 26  DVewl------------gf----hwsgnVRYSSD---------------------YFdql
2TS1 29  QLgrf------------ldfeadgnpakIKNNYD---------------------WIgpl

TyrRS    RLYetegCRLQIGGSDQwgnitaGL--------ELIRKTKgearAFGLTIPLV
1GTR 26  hayaie-------------linkglayvdeltpeqireyrgtltqpgknspyrdrsveen
2TS1 29  dvitfl-------------rdvgkhfsvnym-----------------------------

TyrRS
1GTR 26  lalfekmraggfeegkaclrakidmaspfivmrdpvlyrikfaehhqtgnkwciypmYDF
2TS1 29  -------------------------------------makesvqsrietgisftefsYMM

TyrRS
1GTR 26  THCISDALEG----ITHSLCTLEFqdnrrlYDWVLDNITipvhPRQYEFSRL 262
2TS1 29  LQAYDFLRLYetegCRLQIGGSDQwgnitaGLELIRKTKgearAFGLTIPLV 223
- Example: structural vs. sequence alignment between E. coli GlnRS and G. stearothermophilus TyrRS.
- Although the optimal sequence alignment is not unreasonable (19% ID, 40/212 residues), comparison with the structure shows it is actually wrong for all but 11 residues! The structure-based alignment is quite dissimilar in sequence (4.5% ID, 12/265 residues), but the superposition actually matches 39% of residues (104/265) over the length of the domain.
10Inserts may be accommodated in a distant part of the structure
Example: a five-residue insert
- Sequence alignment (shows what happened)
- gktlit-----nfsqehip
- gktlisflyeqnfsqehip
- Structure alignment (shows how it's accommodated)
- gktlitnfsq-----ehip
- gktlisflyeqnfsqehip
α-helix
11Off by 1, Off by 4
3.8 Å
- A shift in alignment of 1 residue corresponds to a skew in the modeled structure of about 4 Å (3.8 Å is the distance between consecutive alpha carbons).
- Nothing you can do AFTER an alignment will fix this error (not even molecular dynamics).
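The size of the error can be checked on an idealized α-helix. The sketch below places alpha carbons on a helix using textbook parameters that are not taken from the slide (radius 2.3 Å, rise 1.5 Å per residue, 100° rotation per residue) and measures how far each residue ends up from its correct position when the alignment is shifted by a given number of residues. Function names are made up for the illustration.

```python
import math
from typing import List, Tuple

Coord = Tuple[float, float, float]


def ideal_helix_ca(n: int, radius: float = 2.3, rise: float = 1.5,
                   turn_deg: float = 100.0) -> List[Coord]:
    """Alpha-carbon coordinates (Å) of an idealized alpha-helix along the z axis."""
    coords = []
    for i in range(n):
        angle = math.radians(turn_deg * i)
        coords.append((radius * math.cos(angle), radius * math.sin(angle), rise * i))
    return coords


def mean_shift_error(coords: List[Coord], shift: int) -> float:
    """Mean distance between residue i and residue i + shift, i.e. the
    displacement caused by an alignment that is off by `shift` residues."""
    distances = [math.dist(coords[i], coords[i + shift])
                 for i in range(len(coords) - shift)]
    return sum(distances) / len(distances)


helix = ideal_helix_ca(20)
for shift in (1, 4):
    print(f"off by {shift}: {mean_shift_error(helix, shift):.1f} Å")
```

With these parameters an offset of one residue gives about 3.8 Å and an offset of four residues (roughly one helical turn) about 6.2 Å. In an extended strand the error would grow faster with the offset, since consecutive residues continue in roughly the same direction.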
12Indels (inserts or deletions)
- Observations of known similarities in structures demonstrate that uniform gap penalty assumptions are NOT BIOLOGICAL.
- Indels are most often observed in loops, less often in secondary structure elements.
- When they do not occur in loops, there is usually a maintenance of helical or strand properties.
13Can we do better with the gap assumption?
- Required: position-specific gap penalties.
- One approach is implemented in Clustal as secondary structure masks.
- Get secondary structure information, convert it to Clustal mask format. (Easy: read the documentation!)
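The idea behind position-specific gap penalties can be sketched as follows: take a per-residue secondary structure string (DSSP-style one-letter codes) and raise the gap-opening penalty inside helices and strands. This is not the Clustal mask format, which should be taken from the Clustal documentation; the base penalty, the multipliers and the function name are assumptions chosen only for illustration.

```python
from typing import List

HELIX_CODES = {"H", "G", "I"}   # DSSP helix classes
STRAND_CODES = {"E", "B"}       # DSSP extended strand and isolated bridge


def gap_open_penalties(secondary_structure: str, base: float = 10.0,
                       helix_factor: float = 4.0,
                       strand_factor: float = 4.0) -> List[float]:
    """Return one gap-opening penalty per residue.

    Residues in helices or strands get a multiplied penalty; everything
    else (turns, bends, coil) keeps the base penalty.
    """
    penalties = []
    for code in secondary_structure:
        if code in HELIX_CODES:
            factor = helix_factor
        elif code in STRAND_CODES:
            factor = strand_factor
        else:
            factor = 1.0
        penalties.append(base * factor)
    return penalties


# Loop, helix, loop, strand: gaps are cheapest in the loops.
print(gap_open_penalties("CCHHHHTTEEE"))
```

An alignment program that accepts such a per-position penalty will prefer to place indels in loops, in line with the observation on the previous slide.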
14Secondary structure from PDB .... (Algorithm ?)
15Secondary structure from RasMol .... (DSSP !)
16Concept 2
- Reasons to model need to be defined.
17Use of homology models
Biochemical inference from 3D similarity
- Bonds
- Angles, plain and dihedral
- Surfaces, solvent accessibility
- Amino acid functions, presence in structure patterns
- Spatial relationship of residues to active site
- Spatial relationship to other residues
- Participation in function / mechanism
- Static and dynamic disorder
- Electrostatics
- Conservation patterns (structural and functional)
- Posttranslational modification sites (but not structural consequences!)
- Suitability as drug target
18Abuse of homology models
Don't!
- Modelling properties that cannot / will not be verified
- Analysing geometry of model
- Interpreting loop structures near indels
- Inferring relative domain arrangement
- Inferring structures of complexes
19Databases of Models
- Don't make models unless you check first...
- Swiss-Model repository
- 64,000 models based on 4000 structures and Swiss-Prot proteins
- ModBase
- Made with "Modeller"
- 15,000 reliable models for substantial segments of approximately 4,000 proteins in the genomes of Saccharomyces cerevisiae, Mycoplasma genitalium, Methanococcus jannaschii, Caenorhabditis elegans, and Escherichia coli.
20Concept 3
- Fully automated services perform well.
21Homology Modeling Process
Flowchart labels:
- Data: TAR, HOM, TEM, MSA, 3D, LIG, 3DC, PUB
- Steps: Search, Align, Model, Complete, Analyse
- Tools and databases: PSI-BLAST, nr (PDB), T-Coffee, Cinema, SwissModel, ExPDB, TextEditor, RasMol, Consurf
These are really two queries rolled into one procedure.
Legend:
- TAR: Target sequence
- Search: Sequence database similarity search
- nr: non-redundant GenBank subset (with annotated structures)
- HOM: Homologous sequences
- TEM: Sequences of homologues with known structure
- Align: Careful Multiple Sequence Alignment
- MSA: Multiple Sequence Alignment
- Model: Generate 3D Model
- ExPDB: Modeling template structure database
- Complete: Add ligands, substrates etc. to model
- Analyse: Interpret and conclude
- PUB: Publish results
22Homology Modeling Software?
- Freely available packages perform as well as commercial ones at CASP (Critical Assessment of Structure Prediction).
- Swiss Model (see your Integrated Assignment)
- Modeller (http://guitar.rockefeller.edu)
23Swiss-Model steps
- Search for sequence similarities
BLASTP against EX-NRL 3D
Peitsch M, Guex N (1997) Electrophoresis 18: 2714
24Swiss-Model steps
- Search for sequence similarities
- Evaluate suitable templates
Identity > 25%, expected model > 20 residues.
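The template evaluation step amounts to a filter on the search hits. The sketch below applies the two thresholds given on the slide (identity above 25%, expected model longer than 20 residues). The record type, its field names and the template identifiers are hypothetical, and using the aligned length as the expected model size is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TemplateHit:
    template_id: str          # hypothetical identifier of a template structure
    percent_identity: float   # sequence identity to the target, in percent
    aligned_length: int       # residues expected to be covered by the model


def suitable_templates(hits: List[TemplateHit], min_identity: float = 25.0,
                       min_length: int = 20) -> List[TemplateHit]:
    """Keep hits that exceed both thresholds from the slide."""
    return [hit for hit in hits
            if hit.percent_identity > min_identity and hit.aligned_length > min_length]


# Made-up search results.
hits = [
    TemplateHit("tmplA", 42.0, 180),
    TemplateHit("tmplB", 24.0, 200),   # identity too low
    TemplateHit("tmplC", 60.0, 15),    # too short
]
print([hit.template_id for hit in suitable_templates(hits)])
```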
25Swiss-Model steps
- Search for sequence similarities
- Evaluate suitable templates
- Generate structural alignments
Select regions of similarity and match in
coordinate-space (EXPDB).
26Swiss-Model steps
- Search for sequence similarities
- Evaluate suitable templates
- Generate structural alignments
- Average backbones
Compute weighted average coordinates for backbone
atoms expected to be in model.
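Backbone averaging can be illustrated with a plain weighted mean over superimposed template coordinates. The slide does not say how the weights are chosen, so the weights here are placeholders (for example, they could reflect similarity to the target); the function name and the data are also made up.

```python
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def weighted_average_backbone(templates: Sequence[Sequence[Coord]],
                              weights: Sequence[float]) -> List[Coord]:
    """Weighted mean position of each backbone atom.

    Assumes all templates are already superimposed and list the same atoms
    in the same order.
    """
    if not templates or len(templates) != len(weights):
        raise ValueError("need one weight per template")
    n_atoms = len(templates[0])
    if any(len(t) != n_atoms for t in templates):
        raise ValueError("templates must contain the same number of atoms")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    averaged = []
    for atom in range(n_atoms):
        # Average x, y and z separately across templates.
        averaged.append(tuple(
            sum(w * t[atom][axis] for t, w in zip(templates, weights)) / total
            for axis in range(3)))
    return averaged


# Two made-up templates with two atoms each; the first counts twice as much.
template_1 = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
template_2 = [(3.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
print(weighted_average_backbone([template_1, template_2], [2.0, 1.0]))
```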
27Swiss-Model steps
- Search for sequence similarities
- Evaluate suitable templates
- Generate structural alignments
- Average backbones
- Build loops
- Pick plausible loops from library, ligate to stems; if not possible, try combinatorial search.
28Swiss-Model steps
- Search for sequence similarities
- Evaluate suitable templates
- Generate structural alignments
- Average backbones
- Build loops
- Bridge incomplete backbones
Bridge with overlapping pieces from pentapeptide
fragment library, anchor with the terminal
residues and add the three central residues.
29Swiss-Model steps
- Search for sequence similarities
- Evaluate suitable templates
- Generate structural alignments
- Average backbones
- Build loops
- Bridge incomplete backbones
- Rebuild sidechains
Rebuild sidechains from rotamer library -
complete sidechains first, then regenerate
partial sidechains from probabilistic approach.
30Swiss-Model steps
- Search for sequence similarities
- Evaluate suitable templates
- Generate structural alignments
- Average backbones
- Build loops
- Bridge incomplete backbones
- Rebuild sidechains
- Energy minimize
Gromos 96 - Energy minimization
31Swiss-Model steps
- Search for sequence similarities
- Evaluate suitable templates
- Generate structural alignments
- Average backbones
- Build loops
- Bridge incomplete backbones
- Rebuild sidechains
- Energy minimize
- Write Alignment and PDB file
e-mail results
32CASP5 (2002) - Homology
[Plot: RMSD(target, template) vs. RMSD(target, model), in Å; regions labeled "worse than template", "better" and "shocking!"]
Remote sequence similarity detection methods have improved.
Coordinate manipulations do not improve accuracy.
Tramontano A, Morea V (2003) Assessment of homology based predictions in CASP5. Proteins S6: 352-368
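The comparison on this slide rests on two RMSD values per target. A minimal sketch of that comparison, assuming equivalent alpha carbons are listed in the same order and all structures are already superimposed on the target (no fitting is done here); the function names and coordinates are made up:

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """Root mean square deviation (Å) between two equivalent atom lists."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(a, b))
    return math.sqrt(squared / len(a))


def model_beats_template(target: Sequence[Coord], template: Sequence[Coord],
                         model: Sequence[Coord]) -> bool:
    """True if the model is closer to the target than the template was."""
    return rmsd(target, model) < rmsd(target, template)


# Made-up coordinates: the model drifted further from the target than the template.
target = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
template = [(0.0, 0.0, 1.0), (3.8, 0.0, 1.0)]
model = [(0.0, 0.0, 2.0), (3.8, 0.0, 2.0)]
print(rmsd(target, template), rmsd(target, model), model_beats_template(target, template, model))
```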
33Swissmodel in comparison
3D-Crunch: 211,000 sequences -> 64,000 models
Controls: > 50% ID: 1 Å RMSD; 40-49% ID: 63% < 3 Å; 25-29% ID: 49% < 4 Å
Manual alternatives: Modeller ...
Automatic alternatives: SwissModel, sdsc1, 3djigsaw, pcomb_pcons, cphmodels, easypred
#1 for RMSD and correctly aligned residues, #2 for coverage
Guex et al. (1999) TIBS 24: 365-367
EVA: Eyrich et al. (2001) Bioinformatics 17: 1242-1243 (http://cubic.bioc.columbia.edu/eva)
34Concept 4
- SwissModel in practice.
35SwissModel ... first approach mode
http://www.expasy.org/swissmod
36... enter the ExPDB template ID...
37... run in Normal Mode (except if defining a DeepView project)...
38... successful submission.
Results come by e-mail.
39Homology Modeling in Practice
How to assess model reliability?
- All indels are wrong.
- Structure analysis ("threading", "solvent accessibility", compatibility with ligands) can point out possible alignment errors.
- But there is no point in "repairing" stereochemistry; only review the alignment.
40Homology Modeling in Practice
Can you predict function from your model?
No (and yes): the model may be incompatible with a specific function.
41Uses of structure revisited - I
- Prototype 1: Analytical
- Explain mechanistic aspects of protein.
- (e.g. in terms of)
- residues involved in catalysis
- global properties (like electrostatics)
- shape, relative orientation and distances of domains or subdomains
- flexibility and dynamics
- e.g. hypothesizing about the rate limiting step
42Uses of structure revisited - II
- Prototype 2: Comparative
- Bring conservation patterns into a spatial context in order to infer causality from (database) correlations.
- (e.g. in terms of)
- describing context-specific conservation patterns and analyzing these according to conserved properties
- analyzing the predicted effect of sequence variation (e.g. for engineering changes, fusing domains or predicting SNP effects)
- distinguishing physiological vs. nonphysiological interactions
43Questions ? Feedback ?
boris.steipe_at_utoronto.ca
http://biochemistry.utoronto.ca/steipe/