Title: Scaling, Phasing, Anomalous, Density modification, Model building, and Refinement
1. Scaling, Phasing, Anomalous, Density modification, Model building, and Refinement
2. What we've learned so far
- Electrons scatter X-rays.
- Scattering is a Fourier transformation.
- Inverting the Fourier transform gives the image of the electron density.
- Waves have amplitude and phase, and we can't measure phase.
- Inverting the Fourier transform without the phases gives the Patterson map, which is the map of all interatomic vectors.
- Space groups are groups of symmetry operations in 3D.
- The Patterson plus symmetry gives us heavy-atom positions.
- Heavy-atom positions plus amplitudes give us phases.
3. What we can do
- Sum waves.
- Calculate the phase given atom position and scattering vector.
- Index spots on an X-ray photograph.
- Draw Bragg planes.
- Invert the Fourier transform using Bragg planes.
- Calculate cell dimensions from an X-ray photograph.
- Describe the symmetry of a crystal or periodic pattern.
- Convert a simple Patterson to heavy-atom positions.
- Calculate heavy-atom vectors given heavy-atom positions.
- Solve for phases given amplitudes and heavy-atom vectors.
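Summing waves and calculating a phase from an atom position can be sketched in a few lines of Python. This is a minimal illustration that assumes point atoms with constant scattering factors (no resolution fall-off, no B-factors); the atom list and the function names are hypothetical. Each atom contributes a wave of phase 2π(hx + ky + lz), and F(hkl) is the sum of those waves.

```python
import cmath
import math

def atom_phase_deg(hkl, xyz):
    """Phase (degrees) of the wave from one atom at fractional coordinates xyz, reflection hkl."""
    h, k, l = hkl
    x, y, z = xyz
    return math.degrees(2 * math.pi * (h * x + k * y + l * z)) % 360.0

def structure_factor(hkl, atoms):
    """Sum the atomic waves: F(hkl) = sum_j f_j * exp(2*pi*i*(h*x_j + k*y_j + l*z_j))."""
    h, k, l = hkl
    return sum(f * cmath.exp(2j * math.pi * (h * x + k * y + l * z))
               for (x, y, z), f in atoms)

# Hypothetical example: two equal point atoms half a cell apart along x.
atoms = [((0.0, 0.0, 0.0), 6.0), ((0.5, 0.0, 0.0), 6.0)]

# An atom a quarter cell along x is shifted by 90 degrees for reflection (1 0 0).
print(round(atom_phase_deg((1, 0, 0), (0.25, 0.0, 0.0)), 6))
for hkl in [(1, 0, 0), (2, 0, 0)]:
    F = structure_factor(hkl, atoms)
    print(hkl, round(abs(F), 3))   # (1 0 0): the two waves cancel; (2 0 0): they add
```

Two equal atoms half a cell apart cancel exactly for (1 0 0) and reinforce for (2 0 0), the same conclusion one reaches by hand with Bragg planes.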
4. From data to model
Collect heavy-atom data (Fph)
→ Collect native data (Fp)
→ Estimate phases
→ Calculate ρ
→ Is the map traceable?
   no → density modification? (then recalculate ρ)
   yes → Trace the map → Refine
5. From data to phases
Heavy-atom data (Fph) + native data (Fp)
→ Calculate difference Patterson
→ Find heavy-atom peaks on Harker sections
→ Solve for heavy-atom positions using symmetry
→ Calculate heavy-atom vectors
→ Estimate phases
6. From data to Patterson map
Heavy-atom data (Fph) + native data (Fp)
→ Find the best scale factor, k
→ Calculate $F_{\mathrm{diff}} = k|F_{ph}| - |F_p|$
→ Calculate difference Patterson
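A toy one-dimensional version of this pipeline, under loudly simplified assumptions: a hypothetical 1D cell with point atoms, heavy atoms that dominate Fph (so the map is cleaner than real data would give), and k = 1 because the toy Fp and Fph are already on one scale. Only amplitudes are used; the coefficients are Fdiff², and the largest peak away from the origin should fall at the heavy-atom vector.

```python
import cmath
import math

def structure_factor(h, atoms):
    """1D toy structure factor: F(h) = sum_j f_j exp(2*pi*i*h*x_j), x_j fractional."""
    return sum(f * cmath.exp(2j * math.pi * h * x) for x, f in atoms)

# Hypothetical 1D cell: eight light "protein" atoms plus two heavy atoms in the derivative.
protein = [(x, 0.25) for x in (0.03, 0.17, 0.22, 0.41, 0.52, 0.66, 0.78, 0.91)]
heavy = [(0.10, 10.0), (0.35, 10.0)]      # heavy-atom vector: 0.35 - 0.10 = 0.25
derivative = protein + heavy

H = 30    # use reflections h = 1..H; F(0) is left out
k = 1.0   # scale factor; the toy Fp and Fph are already on the same scale

# Only amplitudes are "measured": Fdiff = k|Fph| - |Fp|
fdiff = {h: k * abs(structure_factor(h, derivative)) - abs(structure_factor(h, protein))
         for h in range(1, H + 1)}

def diff_patterson(u):
    """Difference Patterson with Fdiff^2 coefficients; the factor 2 accounts for Friedel mates -h."""
    return sum(2.0 * fd ** 2 * math.cos(2 * math.pi * h * u) for h, fd in fdiff.items())

grid = [i / 100 for i in range(100)]
pmap = [diff_patterson(u) for u in grid]

# Every Patterson has a large origin peak; search away from it for the heavy-atom vector.
best = max(range(5, 96), key=lambda i: pmap[i])
print(f"highest non-origin peak at u = {grid[best]:.2f}")   # u and 1 - u are equivalent
```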
7. From crystal to data
Indexed film
→ Internal scaling
→ Intensity, $I_p(hkl) \propto |F_p(hkl)|^2$
→ Native data Fp
I is relative:
- Bigger crystal, higher I.
- Better crystal, higher I.
- Longer exposure, higher I.
- More intense X-rays, higher I.
Because there is no absolute scale, Fp and Fph are on different scales.
8. What happens to the phase calculation if the scaling is off?
[Figure: phase circles with radii Fp and kFph.]
9. Scaling two datasets
h  k  l   Fp      σ
1  0  0   3233.   100.
2  0  0   2028.    98.
3  0  0   2179     88.
4  0  0   ....    ...

h  k  l   Fph     σ
1  0  0   1122.    50.
2  0  0   1014.    49.
3  0  0   1081.    44.
4  0  0   ....    ...

1st approximation: the intrinsic average amplitude of scattering is constant for different crystals, so a simple scale factor k corrects for crystal differences:
$\langle F_p \rangle = k\langle F_{ph} \rangle$, therefore $k = \langle F_p \rangle / \langle F_{ph} \rangle$.
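The first approximation applied to the three reflections listed above, as a minimal Python sketch (a real scaling run would average over all reflections, not three).

```python
# Fp and Fph for reflections 1 0 0, 2 0 0 and 3 0 0 from the tables above
fp = [3233.0, 2028.0, 2179.0]
fph = [1122.0, 1014.0, 1081.0]

def mean(values):
    return sum(values) / len(values)

k = mean(fp) / mean(fph)            # <Fp> = k <Fph>
fph_scaled = [k * f for f in fph]   # derivative data put on the native scale
print(f"k = {k:.3f}")
```

After scaling, the average of the derivative amplitudes equals the average of the native amplitudes by construction; individual reflections still differ, and those differences carry the heavy-atom signal.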
10. Basic assumption when scaling two crystals
The total number of electrons in the unit cell is the same for each (isomorphous) crystal.
Note: isomorphous means same space group, same cell dimensions.
11. Better scaling: Wilson B-factor
[Figure: Wilson plot of ⟨I⟩ = ⟨F²⟩, averaged in resolution bins and scaled separately for each dataset, against 2 sin θ/λ = 1/d (d = 20 Å, 5 Å, 3 Å and 2 Å marked). Low-resolution features and the water peak are ignored in the slope calculation; in the region where the amplitude depends linearly on resolution, the slope W reflects the Wilson B-factor.]
Two sets of Fs might have different overall B-factors, because the crystals may have different degrees of mosaicity. So Wilson scaling is better than simple scaling:
$\langle F_p \rangle = k\langle F_{ph} \rangle$, with $k = W(1/d) + C$.
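A sketch of the resolution-dependent scale k = W(1/d) + C: average Fp and Fph in resolution shells, form k for each shell, and fit a straight line by least squares. The data are hypothetical and generated from a known W and C so the fit can be checked. Many scaling programs instead fit an exponential B-factor term; this sketch keeps the slide's linear form only for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = W*x + C; returns (W, C)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    W = sxy / sxx
    return W, my - W * mx

def shell_scales(data, width=0.1):
    """Group (1/d, Fp, Fph) rows into resolution shells; return mean 1/d and <Fp>/<Fph> per shell."""
    shells = {}
    for inv_d, fp, fph in data:
        shells.setdefault(int(inv_d // width), []).append((inv_d, fp, fph))
    xs, ks = [], []
    for key in sorted(shells):
        rows = shells[key]
        xs.append(sum(r[0] for r in rows) / len(rows))
        ks.append(sum(r[1] for r in rows) / sum(r[2] for r in rows))
    return xs, ks

# Hypothetical data generated with a known scale k = 1.5 * (1/d) + 2.0
W_true, C_true = 1.5, 2.0
data = []
for inv_d in (0.15, 0.25, 0.35, 0.45):      # 1/d in 1/Angstrom
    for fp in (900.0, 1000.0, 1100.0):
        data.append((inv_d, fp, fp / (W_true * inv_d + C_true)))

xs, ks = shell_scales(data)
W, C = fit_line(xs, ks)
print(f"W = {W:.3f}, C = {C:.3f}")
```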
12. How good is scaling?
After solving the structure, we can go back and see how good the scaling was. Typically, the error in scaling is < 10%; in the best cases it is < 2%. Scaling error is worse if (1) the crystals are non-isomorphous or (2) too many heavy atoms are present (the basic assumption is wrong).
13. Heavy-atom difference Fourier
$\mathbf{F}_{ph} = \mathbf{F}_p + \mathbf{F}_h$ (vector addition)
- The amplitude of Fh is only approximately |Fph| - |Fp|.
- The true difference |Fph| - |Fp| depends on the phase of Fh relative to Fp.
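Treating structure factors as complex numbers makes the point concrete. In this sketch (hypothetical values |Fp| = 10 and |Fh| = 3), |Fph| - |Fp| swings from +3 to -3 as the phase of Fh relative to Fp goes from 0° to 180°, and it equals |Fh| only when the two vectors are parallel.

```python
import cmath
import math

def amplitude_difference(fp_amp, fh_amp, delta_deg):
    """|Fph| - |Fp| when Fh makes an angle delta_deg with Fp (Fph = Fp + Fh as complex numbers)."""
    fp = complex(fp_amp, 0.0)                         # put Fp along the real axis
    fh = cmath.rect(fh_amp, math.radians(delta_deg))  # Fh at the chosen relative phase
    return abs(fp + fh) - abs(fp)

for delta in (0, 60, 90, 120, 180):
    print(delta, round(amplitude_difference(10.0, 3.0, delta), 3))
```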
14. Centrosymmetric reflections
- If the crystal is centrosymmetric, all reflections are centrosymmetric: the phase is 0° or 180°.
- If the crystal has 2-fold, 4-fold or 6-fold rotational symmetry, then the reflections in the 0-plane (the zero layer perpendicular to the rotation axis) are centrosymmetric, because the projection of the density along the axis is centrosymmetric.
- For centrosymmetric reflections, Fph, Fp and Fh are collinear, so $|F_{ph}| = |F_p| \pm |F_h|$ (unless Fph and Fp have opposite signs).
This means the amplitude |Fh| is exact for centrosymmetric reflections, assuming perfect scaling.
15. 0-plane
Draw any set of Bragg planes parallel to the 2-fold axis. The projected density is centrosymmetric; therefore, the phase is 0° or 180°.
16. Initial phases
The most probable phase is not necessarily the best for computing the first electron-density map. The best phase is a probability-weighted average.
[Figure: shaded regions are possible Fp and Fph solutions.]
17. Figure of merit
The figure of merit, m, is a measure of how good the phases are.
C is the center of mass of a ring of phase probabilities (the probabilities are the mass). The radius of the ring is 1, and m is the distance from the center of the ring to C. So m = 1 only if the probabilities are sharply distributed; if they are distributed widely, m is small.
$F_{best}(hkl) = m|F(hkl)|e^{i\alpha_{best}}$
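The centroid construction translates directly into code: weight a unit vector at each trial phase by its probability and average. A minimal sketch, assuming the phase probabilities are already sampled on a 10° grid; both distributions and the amplitude used for F_best are hypothetical.

```python
import cmath
import math

def figure_of_merit(phases_deg, probs):
    """Centroid C of P(alpha) on a unit circle; returns (m = |C|, alpha_best in degrees)."""
    total = sum(probs)
    c = sum(p * cmath.exp(1j * math.radians(a)) for a, p in zip(phases_deg, probs)) / total
    return abs(c), math.degrees(cmath.phase(c))

def circ_diff(a, b):
    """Signed angular difference a - b wrapped into [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0

phases = list(range(0, 360, 10))
# Hypothetical phase-probability curves: one sharply peaked at 40 degrees, one flat.
sharp = [math.exp(-(circ_diff(a, 40) / 15.0) ** 2) for a in phases]
flat = [1.0] * len(phases)

m, alpha_best = figure_of_merit(phases, sharp)
f_obs = 100.0                                                  # hypothetical |F(hkl)|
f_best = m * f_obs * cmath.exp(1j * math.radians(alpha_best))  # F_best = m|F| exp(i alpha_best)
print(f"sharp: m = {m:.3f}, alpha_best = {alpha_best:.1f} deg")
print(f"flat:  m = {figure_of_merit(phases, flat)[0]:.3f}")
```

A flat distribution gives m close to 0, so that reflection contributes almost nothing to the F_best map.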
18. In-class exercise: phase error
FP = 5.00 (σ = 0.5)
FPH1 = 5.50 (σ = 0.8), FH1 = 2.23, αH1 = -63.4°
FPH2 = 4.50 (σ = 0.9), FH2 = 0.50, αH2 = -164°
(1) Draw three circles separated by the vectors FH1 and FH2.
(2) Draw circular error bars of width 2σ.
(3) Draw a circle plot of the Fp phase probabilities.
(4) Estimate the centroid, c, of the probability.
(5) What is the figure of merit, m?
19. Anomalous dispersion
Inner electrons scatter with a time delay. This is a phase shift that is always counterclockwise relative to the phase of the free electrons.
[Figure: heavy atom with bound (inner) and free electrons.]
20. Anomalous dispersion
21. Anomalous dispersion
SIR: single isomorphous replacement. Advantages:
- Only one derivative crystal is needed (fewer scaling problems).
- Anomalous dispersion has a greater effect at higher resolution (because the inner electrons are more like a point source).
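One way to see the effect numerically is to give the heavy atom a complex scattering factor, written here in the common notation f = f0 + f' + i f'', and compare |F(h)| with |F(-h)|. This is a 1D toy with hypothetical atoms: with f'' = 0 the two amplitudes agree (Friedel's law), and with f'' ≠ 0 they differ. These Bijvoet differences are the anomalous signal.

```python
import cmath
import math

def structure_factor(h, atoms):
    """1D toy F(h) = sum_j f_j exp(2*pi*i*h*x_j); f_j may be complex for an anomalous scatterer."""
    return sum(f * cmath.exp(2j * math.pi * h * x) for x, f in atoms)

# Hypothetical 1D cell: three light atoms (real f) and one heavy atom at x = 0.30.
light = [(0.12, 6.0), (0.47, 7.0), (0.81, 8.0)]

def cell(f_double_prime):
    # Heavy-atom f = (f0 + f') + i*f''; the imaginary part is the delayed (phase-shifted) component.
    return light + [(0.30, complex(70.0, f_double_prime))]

def bijvoet_differences(atoms, h_max=10):
    """|F(h)| - |F(-h)| for h = 1..h_max."""
    return [abs(structure_factor(h, atoms)) - abs(structure_factor(-h, atoms))
            for h in range(1, h_max + 1)]

no_anom = bijvoet_differences(cell(0.0))   # Friedel's law holds: all differences are ~0
anom = bijvoet_differences(cell(8.0))      # f'' != 0 breaks Friedel's law
print([round(d, 3) for d in anom])
```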
22. Is the initial map good enough?
(1) The map is calculated using αbest.
(2) The map is contoured and displayed using InsightII, MIDAS, XtalView, FRODO, O, ...
(3) A trace is attempted.
23. Model building
[Figure: electron-density cages (1σ contours) displayed using InsightII.]
24. Information used to build the first model
Sequence and stereochemistry, plus assorted disulfide and ligand information.
Models are built initially by identifying characteristic side chains (by their shape), then tracing forward and backward along the backbone density until all amino acids are in place. Alpha-carbons can be placed by hand and numbered; an automated program (MaxSprout) then adds the other atoms.
25. Tracing an electron density map
Class exercise
Sequence: AGDLLEHEIFGMPPAGGA
Can you locate the density above in the sequence?
26. R-factor: how good is the model?
Calculate Fcalcs based on the model, then compute the R-factor:
$R = \dfrac{\sum_{hkl} \big| |F_{obs}| - k|F_{calc}| \big|}{\sum_{hkl} |F_{obs}|}$
Depending on the space group, an R-factor of 55% would be attainable with scaled random data, so the R-factor must be < 50%. Note: it is possible to get a high R-factor for a correct model. What kind of mistake would do this?
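A small Python version of the R-factor. The default overall scale k (the ratio of summed amplitudes) is an assumption of this sketch; refinement programs may scale differently, for example in resolution shells or with a B-factor. The amplitudes are hypothetical.

```python
def r_factor(fobs, fcalc, k=None):
    """R = sum | |Fobs| - k|Fcalc| | / sum |Fobs|.

    If k is not given, a simple overall scale k = sum|Fobs| / sum|Fcalc| is used (an assumption).
    """
    fo = [abs(f) for f in fobs]
    fc = [abs(f) for f in fcalc]
    if k is None:
        k = sum(fo) / sum(fc)
    return sum(abs(o - k * c) for o, c in zip(fo, fc)) / sum(fo)

# Hypothetical amplitudes
print(r_factor([10.0, 20.0, 30.0], [5.0, 10.0, 15.0]))   # proportional data: R = 0 after scaling
print(r_factor([10.0, 20.0], [12.0, 18.0], k=1.0))       # (2 + 2) / 30
```

Because k is fitted, a model whose Fcalcs are merely proportional to the data still gets R = 0; this is why the scale must be determined before R means anything.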
27. What can you do if the phases are not good enough?
1. Collect more heavy-atom derivative data.
2. Try density modification techniques.
[Cycle diagram: initial phases → Fos and (new) phases → map → density modification → modified map → Fcs and new phases → back to Fos and (new) phases.]
28. Density modification techniques
Solvent flattening: make the water part of the map flat.
(1) Draw an envelope around the protein part.
(2) Set the solvent ρ to ⟨ρ⟩ and back transform.
29. Solvent flattening
Requires that the protein part can be distinguished from the solvent part. B. C. Wang's method: smooth the map using a 10 Å Gaussian, then take the top X% of the map as protein, where X is calculated from the crystal density.
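A one-dimensional sketch of Wang-style solvent flattening. Assumptions, all for illustration: the map is a hypothetical periodic 1D array, the Gaussian width is given in grid points rather than 10 Å, and the protein fraction is supplied directly instead of being computed from the crystal density. The envelope is the highest-scoring part of the smoothed map; every solvent point is set to the mean solvent density.

```python
import math

def gaussian_smooth(rho, sigma_pts):
    """Smooth a periodic 1D map with a Gaussian of width sigma_pts grid points."""
    n = len(rho)
    half = int(3 * sigma_pts)
    offsets = range(-half, half + 1)
    weights = [math.exp(-0.5 * (d / sigma_pts) ** 2) for d in offsets]
    norm = sum(weights)
    return [sum(w * rho[(i + d) % n] for w, d in zip(weights, offsets)) / norm
            for i in range(n)]

def flatten_solvent(rho, protein_fraction, sigma_pts):
    """Envelope = highest protein_fraction of the smoothed map; solvent set to its mean density."""
    smooth = gaussian_smooth(rho, sigma_pts)
    n_protein = round(protein_fraction * len(rho))
    ranked = sorted(range(len(rho)), key=lambda i: smooth[i], reverse=True)
    protein = set(ranked[:n_protein])
    solvent = [i for i in range(len(rho)) if i not in protein]
    mean_solvent = sum(rho[i] for i in solvent) / len(solvent)
    flattened = [rho[i] if i in protein else mean_solvent for i in range(len(rho))]
    return flattened, protein

# Hypothetical 40-point map: "protein" density at grid points 5..20, noisy solvent elsewhere.
rho = [2.0 + math.sin(i) if 5 <= i <= 20 else 0.3 * math.cos(3 * i) for i in range(40)]
flat_map, envelope = flatten_solvent(rho, protein_fraction=0.4, sigma_pts=2.0)
print(sorted(envelope))
```

The flattened map would then be back transformed to give new phases, as in step (2) of slide 28.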
30. Skeletonization
(1) Calculate the map.
(2) Skeletonize it (draw ridge lines).
(3) Prune the skeleton so that it is protein-like.
(4) Back transform the skeleton to get new phases.
Protein-like means (a) no cycles and (b) no islands.
31. Non-crystallographic symmetry
If there are two molecules in the ASU, there is a matrix M and a vector v that rotate one onto the other: $M\mathbf{r}_1 + \mathbf{v} = \mathbf{r}_2$.
(1) Using the Patterson correlation function, find M and v.
(2) Calculate the initial map.
(3) Set $\rho(\mathbf{r}_1)$ and $\rho(\mathbf{r}_2)$ to $[\rho(\mathbf{r}_1) + \rho(\mathbf{r}_2)]/2$.
(4) Back transform to get new phases.
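Step (3) can be sketched in a hypothetical one-dimensional map where M = 1 and v is a shift of 10 grid points. The noise in the two copies is made deliberately opposite here, so averaging recovers the shared density exactly; with real, independent errors averaging only reduces the noise.

```python
def ncs_average(rho, mask1, shift):
    """Average two NCS copies in a periodic 1D map.

    Hypothetical 1D case: M r1 + v = r2 reduces to M = 1 and v = `shift` grid points;
    mask1 lists the grid points belonging to molecule 1.
    """
    n = len(rho)
    out = list(rho)
    for i in mask1:
        j = (i + shift) % n                 # NCS-related point in molecule 2
        avg = 0.5 * (rho[i] + rho[j])       # step (3): (rho(r1) + rho(r2)) / 2
        out[i] = out[j] = avg
    return out

# Hypothetical map: the same 6-point motif at grid 2..7 and 12..17, with different noise on each copy.
motif = [0.2, 1.0, 2.5, 1.8, 0.9, 0.3]
noise1 = [0.3, -0.2, 0.1, -0.4, 0.2, 0.0]
noise2 = [-0.3, 0.2, -0.1, 0.4, -0.2, 0.0]
rho = [0.0] * 20
for t in range(6):
    rho[2 + t] = motif[t] + noise1[t]
    rho[12 + t] = motif[t] + noise2[t]

averaged = ncs_average(rho, mask1=range(2, 8), shift=10)
print([round(v, 2) for v in averaged])
```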
32. What does a good map look like?
Before computers, maps were contoured on stacked pieces of plexiglass. A Richards box was used to build the model.
[Figure: plexiglass stack, half-silvered mirror, brass-parts model.]
33. Low resolution
At 4-6 Å resolution, alpha helices look like sausages.
34. Medium resolution
3 Å data are good enough to see the backbone with space in between.
35. The program BONES traces the density automatically if the phases are good.
36. BONES models need to be connected manually and have side chains attached. MaxSprout converts a fully connected trace to an all-atom model.
37. Errors in the phases make some connections ambiguous.
38. Contouring at two density cutoffs sometimes helps
39. Holes in rings are a good thing
Seeing a hole in a tyrosine or phenylalanine ring is universally accepted as proof of good phases. You need data to at least 2 Å resolution.
40. Can you see in stereo?
Try this at home. In 3D, the density is much easier to trace.
41. New rendering programs
CONSCRIPT: a program for generating electron density isosurfaces for presentation in protein crystallography. M. C. Lawrence, P. D. Bourke.
42. Great map: holes in rings
43. Superior map: atomicity
Rarely are the data this good. Two holes in Trp; all atoms separated.
44. Only small-molecule structures are this good
Atoms are separated down to several contours. Proteins are never this well ordered, but this is what the density really looks like.
45. Refinement
- The gradient of the R-factor with respect to each atomic position may be calculated.
- Each atom is moved downhill along the gradient.
- Restraints may be imposed.
46. What is a restraint?
A restraint is a function of the coordinates that is lowest when the coordinates are ideal and increases as the coordinates become less ideal.
Stereochemical restraints: bond lengths, bond angles, torsion angles; also planar groups and Bs.
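A single harmonic bond-length restraint, with its gradient, shows the ideas of slides 45 and 46 together: the restraint is lowest at the ideal value, and refinement moves atoms downhill along the gradient. The ideal length (1.53 Å, a typical C-C single bond), the weight and the step size are assumptions of this sketch; real refinement sums many restraint terms with the X-ray term.

```python
import math

def bond_restraint(r1, r2, d_ideal, weight=1.0):
    """Harmonic restraint E = w (d - d_ideal)^2; returns E, dE/dr1, dE/dr2 and the distance d."""
    diff = [b - a for a, b in zip(r1, r2)]
    d = math.sqrt(sum(c * c for c in diff))
    e = weight * (d - d_ideal) ** 2
    g = 2.0 * weight * (d - d_ideal) / d        # dE/dd times (dd/dr2 = diff / d)
    grad2 = [g * c for c in diff]
    grad1 = [-c for c in grad2]
    return e, grad1, grad2, d

# Hypothetical C-C bond stretched to 1.80 Angstrom.
r1, r2 = [0.0, 0.0, 0.0], [1.80, 0.0, 0.0]
step = 0.05
for _ in range(100):
    e, g1, g2, d = bond_restraint(r1, r2, 1.53)
    # Move each atom downhill along its own gradient
    r1 = [x - step * gx for x, gx in zip(r1, g1)]
    r2 = [x - step * gx for x, gx in zip(r2, g2)]

e_final, _, _, d_final = bond_restraint(r1, r2, 1.53)
print(round(d_final, 4))
```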
47. Calculated phases, observed amplitudes: hybrid F's
- Fcalcs are calculated from the atomic coordinates.
- A new electron density map calculated from the Fcalcs would only reproduce the model (of course!).
- Instead, we use the observed amplitudes, Fobs, and the model phases, αcalc.
Hybrid back transform: |Fobs| combined with αcalc.
Hybrid maps show places where the current model is wrong and needs to be changed.
48. The free R-factor: cross-validation
The free R-factor is the test-set residual, calculated in the same way as the R-factor but on the test set. The free R-factor asks: how well does your model predict the data it hasn't seen?
49. Why cross-validate?
If you have three points, you can fit them to a quadratic equation (3 parameters) with zero residual, but is it right?
[Figure: observed and calculated points; R-factor = 0.000!!]
50. Fitting and overfitting
The fit is correct if additional data, not used in fitting the curve, fall on the curve. A low residual in the test set justifies the fit.
[Figure: residual ≈ 0.]
51. Cross-validation
Measuring the residual on data (the test set) that were not used to create the model.
The residual on test data is likely to be small if the ratio of data to parameters is large.
[Figure: a line has 2 parameters.]
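The three-point example from slide 49 can be tested numerically: fit a 2-parameter line and a 3-parameter quadratic to three hypothetical "work" points, then score both on two held-out test points with an R-factor-style residual, the analogue of Rwork and Rfree. The data are invented for illustration (roughly y = 2x + 1 with noise).

```python
def residual(obs, calc):
    """R-factor-style residual: sum |obs - calc| / sum |obs|."""
    return sum(abs(o - c) for o, c in zip(obs, calc)) / sum(abs(o) for o in obs)

def fit_line(xs, ys):
    """Least-squares straight line (2 parameters); returns a predictor function."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return lambda x: slope * x + intercept

def fit_quadratic_through(xs, ys):
    """Quadratic (3 parameters) through exactly three points, via Lagrange interpolation."""
    (x0, x1, x2), (y0, y1, y2) = xs, ys
    def q(x):
        return (y0 * (x - x1) * (x - x2) / ((x0 - x1) * (x0 - x2))
                + y1 * (x - x0) * (x - x2) / ((x1 - x0) * (x1 - x2))
                + y2 * (x - x0) * (x - x1) / ((x2 - x0) * (x2 - x1)))
    return q

# Hypothetical noisy measurements; the last two points are held out as the test set.
work_x, work_y = [0.0, 1.0, 2.0], [1.1, 2.9, 5.2]
test_x, test_y = [3.0, 4.0], [7.0, 8.9]

for name, fit in (("line", fit_line), ("quadratic", fit_quadratic_through)):
    model = fit(work_x, work_y)
    r_work = residual(work_y, [model(x) for x in work_x])
    r_free = residual(test_y, [model(x) for x in test_x])
    print(f"{name:9s} R_work = {r_work:.3f}  R_free = {r_free:.3f}")
```

The quadratic wins on the work set (zero residual) and loses on the test set, which is exactly what the free R-factor is designed to catch.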
52. Parameters versus data
Example from Drenth, Ch. 13: the papain crystal structure has 25,000 reflections. Papain has 2000 non-H atoms × 4 parameters each (x, y, z, B) = 8000 parameters.
data/parameters = 25,000/8000 ≈ 3 ← this is too small!
53. Restraints are data
Bond lengths, angles, etc. are measurements that must be fit by the model. The true residual should include deviations from ideal bond lengths, angles, etc. In practice, the residual in the restraints (e.g. deviations from ideal bond lengths and angles) is very low. This means that restraints are essentially constraints.
[Restraint types: bond lengths, bond angles, van der Waals, torsion angles, planar groups.]
54. Constraints reduce the number of parameters
Bond lengths, angles, and planar groups may be fixed to their ideal values during refinement (torsion-angle refinement). Using constraints, Ser has 3 parameters, Phe 4, and Arg 6.
There are on average 3.5 torsion angles per residue, so papain has about 700 torsion-angle parameters: data/parameters = 25,000/700 ≈ 35.
[Constrained: bond lengths, bond angles, planar groups.]
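The two papain ratios from slides 52 and 54 as plain arithmetic; every number comes from the slides.

```python
reflections = 25_000
atoms = 2_000
params_free = atoms * 4          # x, y, z, B for every non-H atom
params_torsion = 700             # torsion-angle parameters quoted for papain

print(reflections / params_free)     # 3.125: too few data per parameter
print(reflections / params_torsion)  # about 35.7 data per parameter
```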
55. Radius of convergence
[Figure: total residual plotted over parameter space.]
...How far away from the truth can the model be and still find the truth? The radius of convergence depends on the data and the method. More data: fewer false (local) minima. Better method: one that can overcome local minima.
56. Molecular dynamics with X-ray refinement
MD samples conformational space while maintaining good geometry (low residual in the restraints).
E = (residual of restraints) + (R-factor)
dE/dxi is calculated for each atom i, then atom i is moved downhill. Random vectors are added, proportional to the temperature T.
The simulated annealing MD method: (1) start the simulation hot; (2) cool slowly, trapping the structure in the lowest minimum.
X-plor: Axel Brünger et al.
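A one-dimensional caricature of simulated annealing, to show why the hot start helps. This is not X-plor's integrator: it is a Langevin-style sketch (a downhill step along -dE/dx plus a Gaussian kick whose variance is proportional to T) on a hypothetical double-well "total residual". Pure minimization started in the shallow well stays there; slow cooling from a high temperature usually ends in the deeper minimum.

```python
import math
import random

def energy(x):
    """Hypothetical 1D total residual: a shallow minimum near x = +0.93, a deeper one near x = -1.06."""
    return (x * x - 1.0) ** 2 + 0.5 * x

def gradient(x):
    """dE/dx of the toy energy above."""
    return 4.0 * x * (x * x - 1.0) + 0.5

def anneal(x, rng, t_start=1.0, t_end=0.01, steps=20_000, dt=0.01):
    """Move downhill along -dE/dx plus random kicks with variance proportional to T.
    The temperature falls geometrically from t_start to t_end (slow cooling)."""
    for n in range(steps):
        t = t_start * (t_end / t_start) ** (n / steps)
        x += -dt * gradient(x) + math.sqrt(2.0 * dt * t) * rng.gauss(0.0, 1.0)
    return x

start = 0.93   # start in the shallow (wrong) minimum
cold = anneal(start, random.Random(0), t_start=1e-9, t_end=1e-9)   # essentially pure minimization
hot = [anneal(start, random.Random(seed)) for seed in range(20)]
print(f"pure minimization ends at x = {cold:.2f}")
print(f"annealing runs ending in the deeper minimum: {sum(x < 0 for x in hot)} of 20")
```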
57. Phase bias, and how to fix it
The model biases the phases. Phase bias is localized. To remove bias, we must remove the wrong parts of the model. An OMIT MAP is calculated: the phases for an omit map are derived from a partial model, in which some small part has been omitted.
58. Omit maps
This residue has been removed before calculating Fc.
2Fo - Fc density: $2F_o - F_c = F_o + (F_o - F_c)$, i.e. the native map plus the difference map.
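The map coefficients behind an omit map can be sketched in 1D. Assumptions: hypothetical point atoms and a model that omits the atom at x = 0.70. Each coefficient uses the observed amplitude with the model phase; the check confirms that the 2Fo - Fc map equals the Fo map plus the Fo - Fc map. Whether the omitted atom reappears depends on how good the remaining phases are, so print the values and inspect them.

```python
import cmath
import math

def structure_factor(h, atoms):
    """1D toy F(h) = sum_j f_j exp(2*pi*i*h*x_j)."""
    return sum(f * cmath.exp(2j * math.pi * h * x) for x, f in atoms)

def coefficient(fo_amp, fc, n, m):
    """Map coefficient (n|Fo| - m|Fc|) * exp(i*alpha_calc): observed amplitude, model phase."""
    return (n * fo_amp - m * abs(fc)) * cmath.exp(1j * cmath.phase(fc))

def density(coeffs, x):
    """rho(x) = sum over h >= 1 of 2*Re[C_h exp(-2*pi*i*h*x)] (Friedel mates included, F000 left out)."""
    return sum(2.0 * (c * cmath.exp(-2j * math.pi * h * x)).real for h, c in coeffs.items())

# Hypothetical 1D "crystal": the model omits the atom at x = 0.70.
true_atoms = [(0.20, 6.0), (0.45, 8.0), (0.70, 6.0)]
model_atoms = true_atoms[:2]
hs = range(1, 16)
fo = {h: abs(structure_factor(h, true_atoms)) for h in hs}   # "observed" amplitudes
fc = {h: structure_factor(h, model_atoms) for h in hs}       # model Fc (amplitude and phase)

two_fo_fc = {h: coefficient(fo[h], fc[h], 2, 1) for h in hs}
fo_map = {h: coefficient(fo[h], fc[h], 1, 0) for h in hs}
fo_fc = {h: coefficient(fo[h], fc[h], 1, 1) for h in hs}

for x in (0.20, 0.45, 0.70):
    print(x, round(density(two_fo_fc, x), 2), round(density(fo_fc, x), 2))
```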
59. Two inhibitor peptides bound to thrombin. The inhibitors were omitted from the Fc calculation.
(stereo images)
FÉTHIÈRE et al., Protein Science (1996), 5, 1174-1183.
60. The final model
Other data commonly reported: total unique reflections, completeness, free R-factor.