The Structure Lectures - PowerPoint PPT Presentation

1
The Structure Lectures
  • Boris Steipe
  • boris.steipe_at_utoronto.ca
    http://biochemistry.utoronto.ca/steipe
  • Departments of Biochemistry and Molecular and
    Medical Genetics
  • Program in Proteomics and Bioinformatics
  • University of Toronto

2
Lecture 9.0: Use of Protein Structure
  • Boris Steipe
  • boris.steipe_at_utoronto.ca
    http://biochemistry.utoronto.ca/steipe
  • Departments of Biochemistry and Molecular and
    Medical Genetics
  • Program in Proteomics and Bioinformatics
  • University of Toronto
  • (Some slides have been adapted from material by
    Chris Hogue, Toronto, prepared for CBW in 2002.)

3
Concepts
  • "Sequence" and "structure" are abstractions of
    biopolymers.
  • Structure can be determined experimentally.
  • Structure abstractions can be stored, retrieved
    and visualized.
  • Knowledge of structure allows mechanistic
    explanations.
  • Structure is not arbitrary, but comes in units -
    motifs, helices, strands, domains and complexes.
  • Domains are folding units, functional units and
    units of inheritance.

4
Concept 1
  • "Sequence" and "structure" are abstractions of
    biopolymers.

5
Physical Amino Acids and Amino Acid Abstractions
Formula: C9H9NO2 (as a residue in a chain)
SMILES string: CH(NHR)(C(O)R)CH2-c1(cHcHc(cHcH1)OH)
Name: Tyrosine
3-letter: Tyr
1-letter: Y
ATOM   1091  N   TYR   145     -35.676 -13.136  50.622  1.00 10.36
ATOM   1092  CA  TYR   145     -36.931 -13.763  51.019  1.00 10.63
ATOM   1093  C   TYR   145     -37.676 -12.879  52.016  1.00 11.16
ATOM   1094  O   TYR   145     -37.061 -12.316  52.926  1.00 13.91
ATOM   1095  CB  TYR   145     -36.660 -15.140  51.638  1.00  9.52
ATOM   1096  CG  TYR   145     -37.845 -15.737  52.361  1.00  6.36
ATOM   1097  CD1 TYR   145     -38.144 -15.357  53.663  1.00  3.30
ATOM   1098  CD2 TYR   145     -38.691 -16.652  51.727  1.00  6.14
ATOM   1099  CE1 TYR   145     -39.248 -15.856  54.311  1.00  5.57
ATOM   1100  CE2 TYR   145     -39.804 -17.165  52.376  1.00  4.89
ATOM   1101  CZ  TYR   145     -40.076 -16.757  53.670  1.00  4.35
ATOM   1102  OH  TYR   145     -41.170 -17.231  54.345  1.00  4.44
http://www.daylight.com/dayhtml/doc/theory/theory.smiles.html
6
The Concept of Abstract Amino Acids Allows Highly
Compressed Information
Nucleophile
H-bond Donor
Bulky
Phospho-Acceptor
Hydrophobic
H-Bond Acceptor
Y
Aromatic
2 side chain rotational freedom
7
The Concept of Abstract Amino Acid Similarity is
Lossy
Nucleophile (CDESTY)
H-bond Donor (CHKNQRSTWY)
Bulky (FILQRYW)
Phospho-Acceptor (STY)
Hydrophobic (FAMILYVW)
H-Bond Acceptor (DEHNQSTY)
Y
Aromatic (FWH)
2 side chain rotational freedom (CDFHSW)
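The lossiness can be made concrete with a small sketch. It assumes each residue is reduced to the set of abstract properties it belongs to, using the sets transcribed from this slide (Y is added to "Aromatic" as on the previous slide, and the garbled "2 side chain rotational freedom" set is left out). The Jaccard index used as the similarity score is an illustrative choice, not part of the lecture.

```python
# Property sets transcribed from the slide. Y is added to "aromatic" because the
# previous slide lists tyrosine as aromatic; the rotational-freedom set is omitted.
PROPERTIES = {
    "nucleophile": set("CDESTY"),
    "hbond_donor": set("CHKNQRSTWY"),
    "bulky": set("FILQRYW"),
    "phospho_acceptor": set("STY"),
    "hydrophobic": set("FAMILYVW"),
    "hbond_acceptor": set("DEHNQSTY"),
    "aromatic": set("FWHY"),
}


def profile(aa: str) -> frozenset:
    """Names of all abstract properties that a one-letter amino acid code has."""
    return frozenset(name for name, members in PROPERTIES.items() if aa in members)


def jaccard(a: str, b: str) -> float:
    """Shared properties / all properties of two residues (1.0 = indistinguishable)."""
    pa, pb = profile(a), profile(b)
    union = pa | pb
    return len(pa & pb) / len(union) if union else 1.0


if __name__ == "__main__":
    print(jaccard("S", "T"))                      # Ser and Thr collapse onto one profile
    print(jaccard("Y", "W"), jaccard("Y", "S"))   # very different residues, same score
```

Ser and Thr come out identical, and Trp and Ser are scored as equally similar to Tyr: the abstraction discards exactly the information that distinguishes them.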
8
Structure Contextualizes Sequence
V V I Y T T G
(Tyr262 in 1ERQ.pdb)
9
Structural Abstraction
  • To store structures we need
    - coordinate,
    - topology, and
    - chemical type
    information.

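[Figure: methionine (Met) in an x/y/z coordinate frame, side-chain atoms labeled α, β, γ, δ, ε and colored by element: carbon, nitrogen, oxygen, sulphur]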
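A minimal sketch of the three kinds of information, assuming a plain Python data structure (the `Atom` and `Residue` classes are hypothetical, not a library API). It reuses the Tyr 145 coordinates from the amino acid abstraction slide, adds the standard covalent topology of tyrosine, and takes the chemical type from the first letter of the atom name, which only works for standard amino acid atoms.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Atom:
    name: str     # atom name (role in the topology), e.g. "CA", "OH"
    element: str  # chemical type
    xyz: tuple    # Cartesian coordinates in Å


@dataclass
class Residue:
    name: str
    atoms: dict = field(default_factory=dict)  # atom name -> Atom
    bonds: list = field(default_factory=list)  # topology: pairs of bonded atom names

    def bond_length(self, a: str, b: str) -> float:
        return math.dist(self.atoms[a].xyz, self.atoms[b].xyz)


# Coordinates (Å) of Tyr 145 from the amino acid abstraction slide.
TYR145_XYZ = {
    "N": (-35.676, -13.136, 50.622), "CA": (-36.931, -13.763, 51.019),
    "C": (-37.676, -12.879, 52.016), "O": (-37.061, -12.316, 52.926),
    "CB": (-36.660, -15.140, 51.638), "CG": (-37.845, -15.737, 52.361),
    "CD1": (-38.144, -15.357, 53.663), "CD2": (-38.691, -16.652, 51.727),
    "CE1": (-39.248, -15.856, 54.311), "CE2": (-39.804, -17.165, 52.376),
    "CZ": (-40.076, -16.757, 53.670), "OH": (-41.170, -17.231, 54.345),
}

# Covalent topology within one tyrosine (peptide bonds to neighbours omitted).
TYR_BONDS = [
    ("N", "CA"), ("CA", "C"), ("C", "O"), ("CA", "CB"), ("CB", "CG"),
    ("CG", "CD1"), ("CG", "CD2"), ("CD1", "CE1"), ("CD2", "CE2"),
    ("CE1", "CZ"), ("CE2", "CZ"), ("CZ", "OH"),
]

tyr = Residue(
    "TYR",
    # element = first letter of the atom name: valid for standard residues only
    {name: Atom(name, name[0], xyz) for name, xyz in TYR145_XYZ.items()},
    list(TYR_BONDS),
)

if __name__ == "__main__":
    for a, b in tyr.bonds:
        print(f"{a:>3}-{b:<3} {tyr.bond_length(a, b):.2f} Å")
```

Coordinates alone say nothing about which atoms are bonded; the topology list supplies that, and the bond lengths computed from it are a quick sanity check on the coordinates.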
10
Concept 2
  • Structure can be determined experimentally.

11
Experimental sources of structure
  • Crystallization required
  • Diffraction → data collection
  • The phase problem: MAD, heavy-metal isomorphous
    derivatives ...
  • ... or "Molecular replacement" give phase
    approximations
  • Model building in electron density maps
  • Refinement
  • X-ray
  • NMR

12
Experimental sources of structure
Crystallization is limiting. Diffraction is not
imaging! Refinement is required.
X-ray NMR
[Figure panels: Data, Model]
http://www-structure.llnl.gov/Xray/101index.html
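Why diffraction is not imaging can be illustrated with a one-dimensional toy example (a sketch; the density values are made up). The experiment records only the amplitudes |F| of the Fourier transform of the density; the phases are lost. Moving the density leaves the amplitudes unchanged, and a map computed from amplitudes with all phases set to zero is not the original density, whereas amplitudes combined with the true phases invert exactly.

```python
import cmath
import math


def dft(rho):
    """Discrete Fourier transform F(h) = sum_x rho(x) * exp(-2*pi*i*h*x/N)."""
    n = len(rho)
    return [sum(rho[x] * cmath.exp(-2j * math.pi * h * x / n) for x in range(n))
            for h in range(n)]


def inverse_dft(coeffs):
    """Inverse transform; returns the real part of the reconstructed density."""
    n = len(coeffs)
    return [(sum(coeffs[h] * cmath.exp(2j * math.pi * h * x / n) for h in range(n)) / n).real
            for x in range(n)]


# Toy one-dimensional "electron density" (made-up values).
rho = [0.0, 1.0, 4.0, 1.0, 0.0, 0.0, 2.0, 0.0]
rho_shifted = rho[3:] + rho[:3]  # the same density, moved along the axis

F = dft(rho)
amplitudes = [abs(f) for f in F]  # what the experiment measures
amplitudes_shifted = [abs(f) for f in dft(rho_shifted)]

# 1) Moving the density leaves every amplitude unchanged: position lives in the phases.
same_amplitudes = all(math.isclose(a, b, abs_tol=1e-9)
                      for a, b in zip(amplitudes, amplitudes_shifted))

# 2) A map computed from amplitudes with all phases set to zero is not rho.
zero_phase_map = inverse_dft([complex(a, 0.0) for a in amplitudes])

# 3) Amplitudes combined with the true phases reproduce rho.
full_map = inverse_dft([a * cmath.exp(1j * cmath.phase(f)) for a, f in zip(amplitudes, F)])

if __name__ == "__main__":
    print(same_amplitudes)
    print([round(v, 2) for v in zero_phase_map])
    print([round(v, 2) for v in full_map])
```

MAD, isomorphous derivatives and molecular replacement are all ways of supplying the missing phases, or approximations to them.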
13
Experimental sources of structure
  • X-ray
  • NMR
  • High concentration required (on the order of 1 mM)
  • Assignment of peaks ...
  • ... determination of crosspeaks → distance
    constraints
  • Calculation of models from distance constraints
  • Refinement
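The step from cross-peaks to distance constraints rests on the steep distance dependence of the NOE. A minimal sketch, assuming the isolated spin-pair approximation (intensity proportional to r⁻⁶) and calibration against a proton pair of known distance; the function name and the reference values in the usage are hypothetical.

```python
def noe_distance(intensity: float, ref_intensity: float, ref_distance: float) -> float:
    """Estimate an interproton distance (Å) from an NOE cross-peak intensity.

    Assumes the isolated spin-pair approximation, intensity ~ r**-6, calibrated
    against a reference proton pair at the known distance ref_distance.
    """
    if intensity <= 0 or ref_intensity <= 0:
        raise ValueError("intensities must be positive")
    return ref_distance * (ref_intensity / intensity) ** (1.0 / 6.0)


if __name__ == "__main__":
    # Hypothetical calibration: a reference pair at 2.5 Å gives intensity 1.0.
    for rel in (1.0, 1 / 8, 1 / 64):
        print(f"relative intensity {rel:.4f} -> {noe_distance(rel, 1.0, 2.5):.2f} Å")
```

A 64-fold weaker cross-peak corresponds to only a twofold longer distance, which is one reason NOE-derived constraints are used as loose bounds rather than exact distances.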

14
Experimental sources of structure
X-ray NMR
1DRO.PDB
Ensemble of structures that are compatible with the
experimental distance constraints, and the consensus model.
Concentration/solubility; assignment and NOEs; refinement.
15
Assessing structure quality
  • Metrics
  • Resolution, R-factor and R-free
  • Bond length and angle deviations
  • Coordinate error can be estimated from diffraction data

http://www.sci.sdsu.edu/TFrey/Bio750/Bio750X-Ray.html
The programs Whatcheck and Procheck calculate quality metrics
(Procheck also for NMR structures):
http://swift.cmbi.kun.nl/WIWWWI//fullcheck.html
http://www.biochem.ucl.ac.uk/roman/procheck/procheck.html
Rules of thumb for "good structures": resolution ≤ 2 Å, R-factor ≤ 20%,
mean coordinate error ≤ 0.2 Å, RMSD of bond lengths ≤ 0.02 Å.
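The R-factor compares observed structure-factor amplitudes with those calculated from the model, R = Σ||F_obs| − |F_calc|| / Σ|F_obs|; R-free is the same quantity computed only over a test set of reflections that was excluded from refinement. A sketch, assuming F_calc has already been scaled to F_obs; the function names and the amplitudes in the usage are made up for illustration.

```python
def r_factor(f_obs, f_calc):
    """R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|); assumes f_calc is already scaled to f_obs."""
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("need two non-empty amplitude lists of equal length")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    return numerator / sum(abs(o) for o in f_obs)


def r_work_and_free(f_obs, f_calc, in_test_set):
    """Split reflections by a test-set flag; R-free uses only the flagged reflections."""
    work = [(o, c) for o, c, t in zip(f_obs, f_calc, in_test_set) if not t]
    test = [(o, c) for o, c, t in zip(f_obs, f_calc, in_test_set) if t]
    r_work = r_factor([o for o, _ in work], [c for _, c in work])
    r_free = r_factor([o for o, _ in test], [c for _, c in test])
    return r_work, r_free


if __name__ == "__main__":
    # Made-up amplitudes for five reflections; the last one is in the test set.
    f_obs = [100.0, 50.0, 80.0, 120.0, 60.0]
    f_calc = [90.0, 55.0, 80.0, 110.0, 48.0]
    flags = [False, False, False, False, True]
    print(r_work_and_free(f_obs, f_calc, flags))
```

An R-factor of 0.20 corresponds to the "20%" in the rule of thumb; because the test reflections never influence refinement, an R-free far above R-work is a warning sign of overfitting.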
16
Concept 3
  • Structure abstractions can be stored, retrieved
    and visualized.

17
The PDB
The PDB is the primary repository of protein
structure data.
http://www.rcsb.org/pdb
18
What's in a Structure File?
  • Population experiments
  • X-ray, 1 structure
  • NMR - sometimes many structures
  • Incomplete - not all atoms are there
  • Hydrogens, parts of the protein in motion
  • Crystallographic space
  • correct, but not always relevant

19
The PDB format
  • Flat file, column oriented
  • Human readable
  • Human editable
  • Huge legacy problems

Flat file: a data file without indexing structure
or hierarchy, in contrast to a relational
database or a data grammar.
20
Header
HEADER    IMMUNOGLOBULIN                          01-MAR-93   2IMM      2IMM   2
COMPND    IMMUNOGLOBULIN VL DOMAIN (VARIABLE DOMAIN OF KAPPA LIGHT      2IMM   3
COMPND   2 CHAIN) OF MCPC603                                            2IMM   4
SOURCE    HUMAN (HOMO SAPIENS) RECOMBINANT SYNTHETIC M603 GENE          2IMM   5
AUTHOR    B.STEIPE,R.HUBER                                              2IMM   6
REVDAT   1   15-JUL-93 2IMM    0                                        2IMM   7
REMARK   1                                                              2IMM   8
REMARK   1 REFERENCE 1                                                  2IMM   9
REMARK   1  AUTH   B.STEIPE,A.PLUCKTHUN,R.HUBER                         2IMM  10
REMARK   1  TITL   REFINED CRYSTAL STRUCTURE OF A RECOMBINANT           2IMM  11
REMARK   1  TITL 2 IMMUNOGLOBULIN DOMAIN AND A                          2IMM  12
REMARK   1  TITL 3 COMPLEMENTARITY-DETERMINING REGION 1-GRAFTED MUTANT  2IMM  13
REMARK   1  REF    J.MOL.BIOL.                   V. 225   739 1992      2IMM  14
REMARK   1  REFN   ASTM JMOBAK  UK ISSN 0022-2836                 070   2IMM  15
...
REMARK   2                                                              2IMM  23
REMARK   2 RESOLUTION. 2.00 ANGSTROMS.                                  2IMM  24
REMARK   3                                                              2IMM  25
...
21
Seqres
...
SEQRES   1    114  ASP ILE VAL MET THR GLN SER PRO SER SER LEU SER VAL  2IMM  35
SEQRES   2    114  SER ALA GLY GLU ARG VAL THR MET SER CYS LYS SER SER  2IMM  36
SEQRES   3    114  GLN SER LEU LEU ASN SER GLY ASN GLN LYS ASN PHE LEU  2IMM  37
SEQRES   4    114  ALA TRP TYR GLN GLN LYS PRO GLY GLN PRO PRO LYS LEU  2IMM  38
SEQRES   5    114  LEU ILE TYR GLY ALA SER THR ARG GLU SER GLY VAL PRO  2IMM  39
SEQRES   6    114  ASP ARG PHE THR GLY SER GLY SER GLY THR ASP PHE THR  2IMM  40
SEQRES   7    114  LEU THR ILE SER SER VAL GLN ALA GLU ASP LEU ALA VAL  2IMM  41
SEQRES   8    114  TYR TYR CYS GLN ASN ASP HIS SER TYR PRO LEU THR PHE  2IMM  42
SEQRES   9    114  GLY ALA GLY THR LYS LEU GLU LEU LYS ARG              2IMM  43
...
The explicit sequence (SEQRES, above) and the implicit sequence
(from the ATOM records) may differ!
22
Atom
Pitfalls: the atom name is a mix of chemical element and bond
topology: "CA.." ≠ ".CA." (dots stand for blanks; "CA  " is calcium,
" CA " is an alpha carbon). The sequence number is actually a
string: chain and insertion code are required to make it unique
(e.g. B 123A).
Fields, left to right: record type, atom number, atom name, amino
acid type, sequence number, X, Y, Z, occupancy (Occ) and B
(temperature factor):
ATOM    119  CA  ARG    18       8.386  51.105  35.847  1.00  7.30      2IMM 179
PDB format is strictly column oriented!
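Because the format is strictly column oriented, records should be parsed by column position rather than split on whitespace. A minimal sketch, assuming the ATOM/HETATM column layout of the wwPDB format description (version 3); the function names are hypothetical. The atom name keeps its padding so that " CA " (alpha carbon) and "CA  " (calcium) stay distinct, and the sequence number stays a string that only becomes unique together with chain and insertion code.

```python
def parse_atom_line(line: str) -> dict:
    """Parse an ATOM/HETATM record by fixed columns (1-based column ranges in comments)."""
    line = line.rstrip("\n").ljust(80)
    record = line[0:6].strip()
    if record not in ("ATOM", "HETATM"):
        raise ValueError(f"not an ATOM/HETATM record: {record!r}")
    return {
        "record": record,                 # cols 1-6
        "serial": int(line[6:11]),        # cols 7-11
        "name": line[12:16],              # cols 13-16, padding kept: " CA " != "CA  "
        "altloc": line[16],               # col 17
        "resname": line[17:20].strip(),   # cols 18-20
        "chain": line[21],                # col 22
        "resseq": line[22:26].strip(),    # cols 23-26, kept as a string
        "icode": line[26],                # col 27, insertion code
        "x": float(line[30:38]),          # cols 31-38
        "y": float(line[38:46]),          # cols 39-46
        "z": float(line[46:54]),          # cols 47-54
        "occupancy": float(line[54:60]),  # cols 55-60
        "b_factor": float(line[60:66]),   # cols 61-66
    }


def residue_key(atom: dict) -> tuple:
    """Unique residue identifier: chain, sequence number and insertion code together."""
    return (atom["chain"], atom["resseq"], atom["icode"])


# The 2IMM record from this slide (columns 1-66 only).
example = "ATOM    119  CA  ARG    18       8.386  51.105  35.847  1.00  7.30"
atom = parse_atom_line(example)

if __name__ == "__main__":
    print(atom)
    print(residue_key(atom))
```

Splitting on whitespace would break as soon as two fields touch, for example when a negative coordinate fills its whole field.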
23
Hetero Atoms
...
HETATM  877  O   HOH     1      -4.169  60.050  40.145  1.00  3.00      2IMM 937
...
http://xray.bmc.uu.se/hicup/
24
The crystallographic asymmetric unit does not
necessarily contain a functional molecule
The contents of a crystal lattice unit cell can
be generated from the asymmetric unit by applying
the required symmetry operations for the
crystallographic space-group. But neither is this
trivial for the non-crystallographer, nor is it
obvious which of the symmetry replicates might
make physiological contacts.
1qpi.pdb Tet-repressor/operator complex
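Generating the unit-cell contents amounts to applying each space-group operator, a rotation matrix plus a translation, to fractional coordinates. A sketch, assuming the two operators of space group P2₁ (unique axis b) purely as an example (this is not a statement about the space group of 1qpi), and assuming the coordinates have already been converted from Cartesian to fractional, which requires the cell parameters. It does not touch the harder question on the slide: which symmetry mates form physiological contacts.

```python
# Space-group operators as (rotation matrix, translation) acting on fractional coordinates.
# Example only: P2_1 with unique axis b has the operators (x, y, z) and (-x, y+1/2, -z).
P21_OPERATORS = [
    (((1, 0, 0), (0, 1, 0), (0, 0, 1)), (0.0, 0.0, 0.0)),
    (((-1, 0, 0), (0, 1, 0), (0, 0, -1)), (0.0, 0.5, 0.0)),
]


def apply_operator(operator, frac):
    """Return R·frac + t for one symmetry operator."""
    rotation, translation = operator
    return tuple(
        sum(rotation[i][j] * frac[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


def unit_cell_contents(asu_frac, operators):
    """One copy of the asymmetric unit per operator, wrapped into the cell [0, 1)."""
    return [
        [tuple(c % 1.0 for c in apply_operator(op, xyz)) for xyz in asu_frac]
        for op in operators
    ]


if __name__ == "__main__":
    asu = [(0.10, 0.20, 0.30), (0.15, 0.25, 0.35)]  # hypothetical fractional coordinates
    for copy in unit_cell_contents(asu, P21_OPERATORS):
        print([tuple(round(c, 3) for c in xyz) for xyz in copy])
```

Contacts across cell boundaries additionally need copies shifted by whole lattice translations, which this sketch leaves out.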
25
... Biological Unit
PQS reasons automatically about how a monomer
might be correctly completed to a functional
biomolecular complex (and is often correct).
http://pqs.ebi.ac.uk/
26
NCBI structure group
MMDB - very well integrated but somewhat
impenetrable.
27
NDB
http://ndbserver.rutgers.edu/NDB/
urx035.pdb (Hammerhead Ribozyme)
28
PDBsum - and "secondary" structure databases
http://www.biochem.ucl.ac.uk/bsm/pdbsum/
29
PDBsum - Information
30
Others
  • Macromolecular Structure Database at EBI
    (Relibase, PQS ...)
  • http://www.ebi.ac.uk/msd/
  • Macromolecular structure related resources at the
    PDB
  • http://www.rcsb.org/pdb/links.html
  • Structure links at the Southwestern Biotechnology
    and Informatics Center
  • http://www.swbic.org/links/1.19.2.5.php
  • Molecular Models from Chemistry
  • http://people.ouc.bc.ca/woodcock/molecule/molecule.html
  • Molecular Library
  • http://www.nyu.edu/pages/mathmol/library/
  • .... many, many more.

31
Concept 4
  • Knowledge of structure allows mechanistic
    explanations.

32
Structure as an integrated map - Example questions
  • Which part of my structure appears to be
    conserved?
  • Are two functionally important residues possibly
    in contact?
  • Where is Asn220 relative to the active site?
  • Might the mutation E123A have something to
    do with protein stability?
  • Is Leu234 on the surface, or in the core?
  • I want to clone my protein into a yeast
    two-hybrid system: should I fuse the DNA-binding
    domain to the N- or the C-terminus?

33
Geometric relationships
  • Bonds
  • Angles, plane and dihedral
  • Surfaces
  • Chemical potential, amino acid functions
  • Static and dynamic disorder
  • Structural similarity
  • Electrostatics
  • Conservation patterns (structural and functional)
  • Quaternary structure
  • Posttranslational modification sites
  • Unexpected homology
  • ...

34
Distances from coordinates
XYZ coordinates are vectors in an orthogonal (Cartesian)
coordinate system, in Å.
All the rules of analytical geometry apply.
...
ATOM    687  OH  TYR    86       7.415  62.584  32.900  1.00  3.37
...
ATOM    651  O   ASP    82       9.996  62.571  32.488  1.00  5.18
...
$$
\begin{aligned}
d &= \left[(9.996-7.415)^2 + (62.571-62.584)^2 + (32.488-32.900)^2\right]^{0.5} \\
  &= \left[(2.581)^2 + (-0.013)^2 + (-0.412)^2\right]^{0.5} \\
  &= \left[6.661561 + 0.000169 + 0.169744\right]^{0.5} = 6.831474^{0.5} \\
  &= 2.614\ \text{Å} = 0.2614\ \text{nm} = 2.614 \times 10^{-10}\ \text{m}
\end{aligned}
$$
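The same calculation as a short sketch (the function and variable names are illustrative):

```python
import math


def distance(a, b):
    """Euclidean distance between two points given as (x, y, z) in Å."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


tyr86_oh = (7.415, 62.584, 32.900)  # ATOM 687 OH TYR 86
asp82_o = (9.996, 62.571, 32.488)   # ATOM 651 O  ASP 82
d = distance(tyr86_oh, asp82_o)

if __name__ == "__main__":
    print(f"{d:.3f} Å = {d / 10:.4f} nm")
```

Python 3.8 and later also provide `math.dist(a, b)`, which computes the same quantity.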
35
Dihedral angles
Single bonds: freely rotatable, but constrained by
steric overlap. Small energetic barrier,
preference for staggered conformations.
Double bonds: constrained to planar geometry.
Large energetic barrier to isomerization.
[Figure: dihedral angle φ about the bond between atoms i+1 and i+2, defined by atoms i, i+1, i+2, i+3]
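A dihedral angle can be computed directly from the four atom positions. A sketch, assuming plain (x, y, z) tuples (the helper names are hypothetical): both outer bonds are projected onto the plane perpendicular to the central bond, and the signed angle between the projections is taken with atan2. With this formula, viewed along the central bond, a clockwise rotation of the front bond onto the back bond comes out positive, the usual IUPAC sign convention.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees (-180..180) for atoms i, i+1, i+2, i+3,
    i.e. the rotation about the bond between p1 and p2."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    length = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / length for x in b1)
    # Project both outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0)))   # eclipsed (cis)
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0)))  # anti (trans)
```

Applied to consecutive backbone atoms this yields φ (C-N-CA-C) and ψ (N-CA-C-N); applied to side-chain atoms it yields the χ angles.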
36
Backbone dihedral angles: Ramachandran plots
[Figure: backbone dihedral angles φ, ψ and ω]
The backbone dihedral angles are named φ, ψ and ω;
φ and ψ are the rotatable ones, while ω (the peptide
bond) is held close to planar.
Due to steric overlap, not all combinations of
(φ, ψ) are allowed.
Allowed and forbidden regions of (φ, ψ) space are
shown on the Ramachandran plot.
Observed (φ, ψ) values reflect the theoretical
boundaries well.
37
Sidechain rotamers
100 randomly chosen Phe-residues superimposed.
Ponder & Richards (1987) J. Mol. Biol. 193,
775-791
http://dunbrack.fccc.edu/bbdep/
38
H-bond patterns
Tyr-Thr side-chain H-bond: despite canonical
geometry, the correct topology may be ambiguous!
Example: the TYR side-chain donor OH can donate a
single hydrogen. (The OH-H bond is 1.00 Å long and
lies in the plane of CE1, CE2, CZ and OH, forming
an angle of 110 degrees with the CZ-OH bond.)
Distribution of H-bond counts in all and buried
residues, D-A distances, H-A distances and D-H-A
angles in Tyr side chains.
McDonald & Thornton (1994) J. Mol. Biol. 238,
777-793
http://www.biochem.ucl.ac.uk/bsm/atlas/
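The Tyr hydroxyl geometry quoted above can be turned into coordinates, which also shows where the ambiguity comes from: the constraints (1.00 Å from OH, in the ring plane, 110° to the CZ-OH bond) are satisfied by two positions, one on the CE1 side and one on the CE2 side of the ring. A sketch using the Tyr 145 coordinates from the amino acid abstraction slide; the function names are hypothetical, and the plane is taken through CZ, OH and CE1.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def add(a, b):
    return tuple(x + y for x, y in zip(a, b))


def scale(a, s):
    return tuple(x * s for x in a)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def unit(a):
    return scale(a, 1.0 / math.sqrt(dot(a, a)))


def bond_angle(a, b, c):
    """Angle a-b-c in degrees."""
    cos_t = dot(unit(sub(a, b)), unit(sub(c, b)))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


def tyr_hydroxyl_h_candidates(cz, oh, ce1, bond=1.00, angle=110.0):
    """Both in-plane positions of the Tyr OH hydrogen (CE1 side first, then CE2 side)."""
    e = unit(sub(oh, cz))                  # direction CZ -> OH
    r = sub(ce1, cz)
    p = unit(sub(r, scale(e, dot(r, e))))  # in the ring plane, perpendicular to e, towards CE1
    t = math.radians(180.0 - angle)        # angle between e and the O-H bond
    return [
        add(oh, scale(add(scale(e, math.cos(t)), scale(p, side * math.sin(t))), bond))
        for side in (1.0, -1.0)
    ]


# Tyr 145 atoms (Å) from the amino acid abstraction slide.
CZ = (-40.076, -16.757, 53.670)
OH = (-41.170, -17.231, 54.345)
CE1 = (-39.248, -15.856, 54.311)

h_ce1_side, h_ce2_side = tyr_hydroxyl_h_candidates(CZ, OH, CE1)

if __name__ == "__main__":
    print(h_ce1_side, h_ce2_side)
```

Which candidate is right depends on the acceptor the hydrogen points to, which is why the H-bond topology has to be inferred from context rather than from the donor geometry alone.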
39
Molecular surface
Chain "A" of 1AON.PDB - GroEL/ES complex
Surface rendering of GroEL/ES complex (D.
Goodsell)
40
Molecular surface
Surface provides a visual metaphor, and a useful
tool to map properties. But how can a molecular
surface be defined ? Obviously, the hard-sphere
surface is chemically not very relevant.
Van der Waals surface
41
Molecular surface
Probe !
Van der Waals surface
42
Molecular surface
[Figure: van der Waals surface, contact surface, reentrant surface and accessible surface traced by a rolling probe; regions marked "Accessible" and "Buried"]
43
Calculating solvent accessible surfaces
  • Draw a sphere around each atom, with a radius of
    (VdW radius + solvent probe radius).
  • Erase all parts of these sphere surfaces that lie
    inside any other atom's sphere.
  • The remaining area is the accessible surface.

Van der Waals radii: C 1.75 Å, N 1.55 Å, O 1.40 Å, H 1.17 Å
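The three steps can be sketched numerically in the style of Shrake and Rupley: test points are placed on each expanded sphere and counted as accessible unless they fall inside a neighbour's expanded sphere. Assumptions: a 1.4 Å water probe, the radii above, and a deterministic golden-section spiral for placing the points (a swapped-in placement scheme, not the stochastic placement discussed in this lecture); the function names are hypothetical.

```python
import math


def sphere_points(n):
    """n nearly evenly spaced unit vectors on a golden-section spiral."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for k in range(n):
        z = 1.0 - 2.0 * (k + 0.5) / n  # evenly spaced heights in (-1, 1)
        r = math.sqrt(1.0 - z * z)
        phi = k * golden_angle
        points.append((r * math.cos(phi), r * math.sin(phi), z))
    return points


def sasa(atoms, probe=1.4, n=960):
    """Per-atom solvent accessible surface area (Å^2), Shrake-Rupley style.

    atoms: list of ((x, y, z), vdw_radius). Each atom is expanded to radius
    vdw + probe; a test point on that sphere counts as accessible unless it
    lies inside the expanded sphere of another atom.
    """
    unit = sphere_points(n)
    areas = []
    for i, (centre, radius) in enumerate(atoms):
        big_r = radius + probe
        accessible = 0
        for u in unit:
            point = tuple(c + big_r * d for c, d in zip(centre, u))
            if not any(math.dist(point, other) < r_other + probe
                       for j, (other, r_other) in enumerate(atoms) if j != i):
                accessible += 1
        areas.append(4.0 * math.pi * big_r ** 2 * accessible / n)
    return areas


if __name__ == "__main__":
    RADII = {"C": 1.75, "N": 1.55, "O": 1.40, "H": 1.17}  # from the slide
    # Two carbon atoms 3.0 Å apart (hypothetical geometry).
    pair = [((0.0, 0.0, 0.0), RADII["C"]), ((0.0, 0.0, 3.0), RADII["C"])]
    print([round(a, 1) for a in sasa(pair)])
```

Dividing a residue's summed atomic areas by a reference area for the same residue type gives a relative accessibility, which is one way to approach a question such as whether Leu234 is on the surface or in the core.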
44
Parameters and assumptions
Problem: analytical solution inefficient. Solution: numerical solution with probe points.
Problem: regular placement of n probe points. Solution: stochastic placement.
Problem: stochastic placement quite irregular. Solution: enforce minimum separation.
Problem: efficiency. Solution: place points only once, translate as needed.
Problem: what is a good value for n? Solution: try different n, evaluate standard deviation.
Problem: should n be constant per atom, or per area? Solution: dots/area; need to scale dots with r_VdW.
Problem: hydrogens; where to get united-atom radii? Solution: literature search.
Problem: reference areas for relative SAA needed. Solution: model explicitly, as tripeptides ...
$$u, v \in (0, 1), \qquad \theta = 2\pi u, \qquad \phi = \cos^{-1}(2v - 1)$$
http://mathworld.wolfram.com/SpherePointPicking.html
Even a straightforward algorithm has its hidden
parameters and assumptions. Results are
meaningful only in this context. Any comparison
is problematic.
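The MathWorld recipe and the "try different n, evaluate standard deviation" step can be sketched together. Points are drawn uniformly on the unit sphere, and the spread of the accessible fraction of a partly buried sphere is measured over repeated draws. Burial is modelled simply as everything above a cut plane (points with z < z_cut count as accessible); that model, the function names and all parameter values are illustrative assumptions. For uniform points z is uniform on [−1, 1], so the exact fraction is (1 + z_cut)/2.

```python
import math
import random
import statistics


def random_sphere_points(n, rng):
    """n points uniform on the unit sphere: theta = 2*pi*u, phi = arccos(2v - 1)."""
    points = []
    for _ in range(n):
        theta = 2.0 * math.pi * rng.random()
        phi = math.acos(2.0 * rng.random() - 1.0)
        points.append((math.sin(phi) * math.cos(theta),
                       math.sin(phi) * math.sin(theta),
                       math.cos(phi)))
    return points


def accessible_fraction_stats(n, trials, z_cut, seed=0):
    """Mean and standard deviation, over repeated draws of n points, of the
    fraction of points with z < z_cut (a stand-in for the unburied part)."""
    rng = random.Random(seed)
    fractions = []
    for _ in range(trials):
        pts = random_sphere_points(n, rng)
        fractions.append(sum(1 for _, _, z in pts if z < z_cut) / n)
    return statistics.mean(fractions), statistics.stdev(fractions)


if __name__ == "__main__":
    z_cut = 0.476  # exact accessible fraction is (1 + z_cut) / 2
    for n in (50, 200, 1000):
        mean, sd = accessible_fraction_stats(n, trials=30, z_cut=z_cut)
        print(f"n={n:5d}  mean={mean:.3f}  sd={sd:.4f}")
```

Picking φ = πv instead of cos⁻¹(2v − 1) would crowd points at the poles; the standard deviation shrinks roughly as 1/√n, which is the trade-off the "good value for n" question is about.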
45
Mapping properties on surfaces
  • Properties of atoms (B-factors)
  • Ensemble properties of residues
  • (hydrophobicity, conservation)
  • Geometry (local curvature)
  • Fields and potentials
  • (isosurfaces, binding potential)

AChE (1ACL.PDB) color coded by electrostatic
potential with GRASP. (http://trantor.bioc.columbia.edu/grasp/)
46
Concept 5
  • Structure is not arbitrary, but contains
    recurring units.

47
Basic building blocks of structure
E.g. PROMOTIF, as used in PDBsum
But classical descriptions of structural
building blocks are as much based on idealized
concepts of geometry as on observations of
nature. An unbiased analysis may arrive at
significantly different classifications !
48
Unbiased structure motifs: alignment with added
value
Motif alignments ... Why are particular amino
acids conserved? What is essential in a sequence ?
A structure motif consensus sequence, compiled
from unrelated segments, averages out features of
conservation that are only due to incomplete
divergence (homology). A consensus sequence,
taken from different structural contexts,
averages out features of sequence that are due to
specific functional (binding, catalysis) or
non-local structural requirements (packing,
interaction). What remains is information about
sequence propensities of local structural
elements.
49
A schematikon motif example: complex loop
Motif 1icf 215, Length 7, Support 7, Unique 7, Rank 399
50
A schematikon motif example: strand N-cap
Motif 1whi 35, Length 4, Support 7, Unique 7, Rank 444
51
Concept 6
  • Domains are
  • folding units, functional units, and units of
    inheritance.

52
Domains are ubiquitous in proteins
Large proteins are composed of compact,
semi-independent units - domains.
Reasons: modularity, folding efficiency
2MCP.PDB
53
Domains in proteins
Number of domains in 787 representative proteins
used as the basis for the CATH database
Jones S et al. (1998) Protein Science 7:233
54
Domains in proteins
Non-random relationship between domain number and
chain length in the 787 representative proteins
used as the basis for the CATH database
Jones S et al. (1998) Protein Science 7:233
55
Domains in proteins
Domain size in the 787 representative proteins
used as the basis for the CATH database
Jones S et al. (1998) Protein Science 7:233
56
There is no universal definition of "domains"
Possible definitions are based on independently
inherited (sub)sequences (sequence domain),
modular protein functions (functional domain),
folding unit or atomic contacts (structural
domain).
Domain: a part of a structure that can fold
irrespective of the presence of other parts of
the structure.
But what is measured is commonly sequence,
function, or structure - NOT FOLDING!
57
Further complications
Analogous structure, Domain insertions, Circular
permutations, Domain swapping.
Domain insertion 1A2J.PDB Protein disulfide
isomerase
2TRX.PDB Thioredoxin
58
Further complications
Analogous structure, Domain insertions, Circular
permutations, Domain swapping.
59
Further complications
Analogous structure, Domain insertions, Circular
permutations, Domain swapping.
Domain swapping 11BG.PDB Bull seminal ribonuclease
60
Domains can be elusive
The separation of a structure into domains
requires the arbitrary definition of thresholds
in a continuum of possibilities.
61
Why care ?
Function: evolution works on sequence, but
selects function. Definition of domains in
structure can uncover functional units that may
evolve independently. Sequence searches,
alignments etc. with domains are much more
specific. Once structural domains have been
defined, sequence profiles, HMMs or other
computational procedures can be used to pick out
more members of the domain family from the
database. Domains can be defined from sequence
patterns, or from the analysis of structure.
62
Automated (objective) domain definition:
sequence (CDD)
http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml
CDD: from SMART and Pfam. CDART: from CDD and
GenBank.
63
Semi-automated consensus domain definition:
structure (CATH)
Dihydrolipoamide dehydrogenase 1LPFA
Jones S et al. (1998) Domain assignment for
protein structures using a consensus approach:
characterization and analysis. Protein Science
7:233-242
64
SCOP and CATH structural classification
The eight most frequent SCOP Superfolds
http://scop.mrc-lmb.cam.ac.uk/scop/
http://www.biochem.ucl.ac.uk/bsm/cath/
65
CATH - Class
  • Class 1: Mainly Alpha
  • Class 2: Mainly Beta
  • Class 3: Mixed Alpha/Beta
  • Class 4: Few Secondary Structures
66
CATH - Architecture
  • Roll
  • Super Roll
  • Barrel
  • 2-Layer Sandwich
67
CATH - Topology
  • L-fucose Isomerase
  • Serine Protease
  • Aconitase, domain 4
  • TIM Barrel
68
CATH - Homology
  • Alanine racemase
  • Dihydropteroate (DHP) synthetase
  • FMN-dependent fluorescent proteins
  • 7-stranded glycosidases
69
CATH - Entry
(Example)
70
IV Open Issues
  • I Integration into processes, scriptable APIs
  • II Sequence based identification of domains
  • III Analysing domains in context
  • IV Defining modular domain functions

71
Bioinformaticians apparently do not like
structure !
  • Sequence
  • Discrete alphabet
  • Easy to manipulate
  • Well developed datastructures
  • Well developed libraries
  • Structure
  • Continuous space
  • Linear algebra, complicated energy functions
  • Databases and datastructures are difficult
  • Paucity of libraries

Meet the challenge !