Title: X-ray Crystallography Workshop
1. X-ray Crystallography Workshop
- DAY 4
- Recap PHASER job with search_model_2.pdb
- Lecture electron density maps, model building
- Run one cycle of Refmac rigid body refinement
- Open maps and models in COOT
- Learn basic COOT tools
- Manual adjustment of model
- Lecture on refinement
- Run refinement to 1.5 Å with Refmac
- Look at maps again, find waters etc.
2. The goal of model building and refinement is to fit the model (the set of atomic coordinates, from whatever source) to the real, experimentally measured diffraction data.
- BUT, with the caveat that this should not produce an unreasonable protein structure.
- Sources of error in the model
- The model used for molecular replacement is not identical to your real protein, because of sequence differences, bound ligands, or point mutations.
- The model generated by heavy atom phasing is incorrect or incomplete, for various reasons.
- Sources of error in the data
- Measurement error due to
- Detector or other set-up problems
- Crystal decay or absorption
- Unknown causes
3. Here are some links for you with (I hope) helpful information
- Protein structure basics
- A series of articles in Acta Cryst. on basic practice and recent developments
- Link to a Refmac tutorial
- I will also post two tutorial documents on Blackboard, one on COOT and one on Refmac, written by the people who wrote these programs, e.g.
- coot_tutorial.pdf
4. What does an electron density map look like?
- This is some near-perfect electron density from a refined model that I recently did for a UNC collaborator
5. What does an electron density map look like?
- This is some less well-fit electron density from an earlier step in the refinement of the previous model.
6. What does an electron density map look like?
- This is what I would call uninterpretable. There are no recognizable features, and there is lots of disconnected density
7. What happens if you calculate an electron density map from a bad molecular replacement solution?
- See the many breaks in this helical region: many atoms are not in density, and there is some unaccounted-for density
8. What happens if you calculate an electron density map from a good molecular replacement solution?
- This is from our data set and our PHASER molecular replacement solution
9. Before we go on, let's talk about several types of electron density map that are important
- a. Using phases from a model
- i. 2Fo-Fc or 3Fo-2Fc (think of 2Fo-Fc as Fo + (Fo-Fc)): uses phases calculated from the model, and amplitudes from the measured data plus the difference between the measured and the calculated data. Gives you the model electron density PLUS the differences between the OBSERVED data and the CALCULATED data.
- ii. Fo-Fc (difference map): uses phases calculated from the model and amplitudes from the OBSERVED minus the CALCULATED data. Tells you where you either need atoms (positive difference density) or where you need to get rid of atoms (negative density).
- b. Omit maps: like a, except you leave out of the phase calculation some atoms that may be biasing your phases in a region of the map where the density is poor. The good parts of the model should then help bring back the density in the bad parts of the structure, if there really is density there.
- c. Using experimental phases (from heavy atoms, solvent flattening, and NCS averaging; we will talk more about this next week): Fobs maps.
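To make the two model-phased map types concrete, here is a minimal sketch of how the map coefficients could be put together from amplitudes and model phases. The function name and the three example reflections are made up for illustration. Note also that refinement programs normally apply likelihood weights to these coefficients, which this unweighted sketch leaves out.

```python
import numpy as np

def map_coefficients(f_obs, f_calc, phi_calc_deg):
    """Return complex (2Fo-Fc, Fo-Fc) coefficients using model phases.

    Unweighted sketch: both maps borrow their phases from the model,
    only the amplitudes differ.
    """
    phase = np.exp(1j * np.deg2rad(phi_calc_deg))
    two_fo_fc = (2.0 * f_obs - f_calc) * phase  # Fo + (Fo - Fc)
    fo_fc = (f_obs - f_calc) * phase            # difference map
    return two_fo_fc, fo_fc

# Three made-up reflections (amplitudes in arbitrary units, phases in degrees)
f_obs = np.array([100.0, 50.0, 20.0])
f_calc = np.array([90.0, 55.0, 20.0])
phi_calc = np.array([0.0, 90.0, 180.0])

two_fo_fc, fo_fc = map_coefficients(f_obs, f_calc, phi_calc)
print(np.abs(two_fo_fc))  # amplitudes of the 2Fo-Fc coefficients
print(np.abs(fo_fc))      # amplitudes of the Fo-Fc coefficients
```

A reflection where Fo and Fc agree (the third one here) contributes nothing to the difference map, which is why a well-refined model gives a flat Fo-Fc map.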
10. Look at electron density maps from the PHASER output
- If I open ccp4i and click on "View Files from Job", I can look at the output .mtz file from the PHASER molecular replacement (using search_model_2.pdb)
11. We can look at the contents of a binary mtz file using the GUI like that; it runs a program called mtzdump, which dumps the contents of the mtz to the standard output
- The columns labeled FWT and PHWT are the amplitudes and phases for a 2Fo-Fc map; DELFWT and PHDELWT are for Fo-Fc. Fc is the calculated structure factors, and F_karen is our original measured data. See that the phases are just what they should be: angles that describe the offset of the reflection's wave from the reference wave. So they run from -180 to 180, or from 0 to 360.
12. The electron density map is calculated from the equation using the absolute value of the F (that's our FWT or other) times the two exponentials
So, we have everything we need in the file: the phase, h, k, l, x, y, z, and Fhkl. All we need is a fast computer to do all the sums
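For reference, the standard Fourier synthesis is usually written as rho(x,y,z) = (1/V) * sum over hkl of |F(hkl)| * exp(i*phi(hkl)) * exp(-2*pi*i*(hx + ky + lz)), which matches the "absolute value of F times two exponentials" description. Below is a minimal sketch of doing that sum directly at a single point. The function name, the two reflections, and the cell volume are all made up for illustration; real programs use an FFT over a whole grid instead of this slow point-by-point sum.

```python
import numpy as np

def density_at(xyz, hkl, amplitude, phase_deg, volume):
    """Sum the Fourier series at one fractional coordinate (x, y, z).

    Assumes the reflection list contains both members of each Friedel
    pair, so the imaginary parts cancel and the real part is the density.
    """
    hkl = np.asarray(hkl, dtype=float)
    f_complex = np.asarray(amplitude, dtype=float) * np.exp(
        1j * np.deg2rad(phase_deg)
    )
    angle = -2.0 * np.pi * (hkl @ np.asarray(xyz, dtype=float))
    return float(np.real(np.sum(f_complex * np.exp(1j * angle))) / volume)

# One made-up Friedel pair, (1,0,0) and (-1,0,0), in a made-up cell
hkl = [(1, 0, 0), (-1, 0, 0)]
amplitude = [10.0, 10.0]
phase_deg = [0.0, 0.0]
volume = 1000.0

print(density_at((0.0, 0.0, 0.0), hkl, amplitude, phase_deg, volume))
print(density_at((0.5, 0.0, 0.0), hkl, amplitude, phase_deg, volume))
```

With a single pair of reflections the "map" is just one cosine wave across the cell: a peak at x = 0 and a trough at x = 0.5. Adding more reflections adds more waves, and the detail builds up.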
13. Before we look at our maps, we are going to run a rigid body refinement on our PHASER output coordinates
- Open the ccp4i GUI, and go to your project for this lysozyme molecular replacement
- Start the Refmac program by clicking on the Refinement menu, and then Refmac
- Fill in the blanks that I tell you; we will talk more about refinement later
14. CCP4 can calculate electron density maps (in a program called FFT, for Fast Fourier Transform), but COOT does it for us
15. Refinement Links
- A slide show on Refmac from the CCP4 people
- An online lecture from one of the CCP4 authors (no theory)
- Another online lecture from Randy Read, a really, really smart crystallographer
16. Refinement is the automated part of adjusting the model to fit the data
- What do we mean by "the model"?
- Here is a line from a coordinate file in pdb format
- ATOM 1 N LYS A 1 -20.428 45.403 9.798 1.00 15.98 A N
- There are four parameters per atom (x, y, z, and the B-factor, here 15.98) that describe its position and average disorder. The B-factor includes thermal motion and crystalline disorder of unspecified nature
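As a small illustration of what "the model" looks like to a program, here is a sketch that pulls the four refined parameters out of the ATOM line above. The `Atom` class and `parse_atom_line` function are hypothetical names. The sketch splits on whitespace only because the sample line is shown that way here; real pdb files are fixed-column, and a robust parser should slice by column position instead.

```python
from dataclasses import dataclass

@dataclass
class Atom:
    name: str
    residue: str
    chain: str
    residue_number: int
    x: float
    y: float
    z: float
    occupancy: float
    b_factor: float

def parse_atom_line(line: str) -> Atom:
    """Parse a whitespace-separated ATOM record (sketch, not column-based)."""
    fields = line.split()
    if fields[0] != "ATOM":
        raise ValueError("not an ATOM record")
    return Atom(
        name=fields[2],
        residue=fields[3],
        chain=fields[4],
        residue_number=int(fields[5]),
        x=float(fields[6]),
        y=float(fields[7]),
        z=float(fields[8]),
        occupancy=float(fields[9]),
        b_factor=float(fields[10]),
    )

atom = parse_atom_line("ATOM 1 N LYS A 1 -20.428 45.403 9.798 1.00 15.98 A N")

# The four parameters that refinement adjusts for this atom
refined_parameters = (atom.x, atom.y, atom.z, atom.b_factor)
print(refined_parameters)
```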
17. At lower resolution (2.5 - 3 Å)
- The way that the ratio of observations to parameters is increased is to add observations such as bond lengths, bond angles, van der Waals contacts, deviations from planarity or chirality, and non-crystallographic symmetry. These are called restraints.
- These are weighted with respect to the positional parameters such that they are not allowed to deviate as much (from their standard values in known, well-refined small molecule structures) when the resolution is lower. This prevents us from overfitting density that does not have the detail we need to REALLY say where an atom is, for example.
- You will see this graphically when you look at your higher resolution refinement maps.
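The bookkeeping behind the observation-to-parameter ratio can be sketched in a few lines. Every number below is invented purely to show the arithmetic (none of it comes from our lysozyme data), and the function name is hypothetical. The only thing taken from the lecture is the four parameters per atom.

```python
def observations_per_parameter(n_atoms, n_reflections, n_restraints=0,
                               params_per_atom=4):
    """Ratio of observations (reflections plus restraints) to parameters."""
    return (n_reflections + n_restraints) / (n_atoms * params_per_atom)

# Invented numbers for a small protein at low resolution
n_atoms = 1000
n_reflections = 3000
n_restraints = 5000

print(observations_per_parameter(n_atoms, n_reflections))
print(observations_per_parameter(n_atoms, n_reflections, n_restraints))
```

In this made-up case the reflections alone give fewer observations than parameters, and counting the restraints as extra observations brings the ratio above one.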
18. So, the parameters are shifted and optimized in some way, while some target function is evaluated that measures the fit of model to data
- Examples of optimization methods
- Gradient descent methods calculate shifts for the parameters and then, using derivatives of the target function, re-calculate the shifts to try to make the gradient zero (to be at a minimum of the target function)
- Search methods generate a random set of models (e.g. simulated annealing uses molecular dynamics to generate these models, applying kinetic energy to atoms to move them). They require more computer power but have a larger radius of convergence
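Here is a toy sketch of the gradient descent idea, using a single parameter instead of thousands of coordinates and B-factors. The stand-in "model" is one scale factor k, and the target is the unweighted sum of squares of (Fo - k*Fc). The data, step size, and function name are all made up for illustration; this is not how Refmac is implemented.

```python
import numpy as np

def refine_scale(f_obs, f_calc, k_start=1.0, step=0.02, n_cycles=100):
    """Minimize T(k) = sum((Fo - k*Fc)**2) by plain gradient descent."""
    k = k_start
    for _ in range(n_cycles):
        # dT/dk for the current k
        gradient = -2.0 * np.sum(f_calc * (f_obs - k * f_calc))
        # Shift the parameter downhill; the shift shrinks as gradient -> 0
        k -= step * gradient
    return k

# Made-up data in which the observed amplitudes are exactly twice Fc
f_calc = np.array([1.0, 2.0, 3.0])
f_obs = 2.0 * f_calc

k = refine_scale(f_obs, f_calc)
print(k)  # converges toward 2.0
```

If the step is too large the shifts overshoot and the refinement diverges, and if the starting value is on the wrong side of a barrier in a more complicated target, gradient descent never gets over it. That is the "radius of convergence" problem that search methods try to get around.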
19. Examples of target functions
- Empirical energy: something like using the model with the lowest conformational energy. Not very realistic, because it doesn't lend itself to error analysis
- Least squares residual: the sum of squares of the differences between the observed and predicted values, each divided by its standard deviation. Suffers from the bad assumption that errors in the observations have a normal distribution
- Maximum likelihood: allows for more arbitrary distributions of errors in observations. Uses Bayesian statistical methods (beyond my current ability to understand well enough to try and explain) to create models of this error that supposedly make it a better target than least squares, which was used routinely before the 1990s
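Written out, the least squares residual described above would look something like the following. The symbols are my own notation, not from any particular program: o_i is an observed value, p_i is the value predicted from the model, and sigma_i is the standard deviation of that observation.

```latex
T_{\mathrm{LS}} = \sum_i \left( \frac{o_i - p_i}{\sigma_i} \right)^2
```

Dividing by sigma_i means that poorly measured observations count for less in the sum.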
20. Optimization methods and target functions of commonly used refinement programs
- Refmac: uses a gradient descent method to optimize a maximum likelihood OR least squares target
- CNS: uses a simulated annealing search method to optimize a maximum likelihood target
- Others: see the table at the end of the Tronrud paper
21. Checking the success of refinement: cross-validation by R-free
- Equation for the conventional R-factor (residual agreement of model with data)
- $R = \sum \left| |F_o| - |F_c| \right| / \sum |F_o|$
- If refinement is done incorrectly, this number can be deceptively low (a low number is what we want) even though the model is extremely inaccurate (especially at low resolution).
- What if we set aside a small percentage (e.g. 5%) of the data to calculate an R-factor for, but do NOT use it in the refinement itself? This gives a more unbiased R-factor (Brunger, 1993).
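The R-factor is the sum of the absolute differences between observed and calculated amplitudes, divided by the sum of the observed amplitudes. A minimal sketch of computing it separately for the working set and the free set follows. The function names and the four reflections are made up for illustration, and the sketch assumes Fc has already been scaled to Fo.

```python
import numpy as np

def r_factor(f_obs, f_calc):
    """R = sum(| |Fo| - |Fc| |) / sum(|Fo|), amplitudes assumed scaled."""
    f_obs = np.asarray(f_obs, dtype=float)
    f_calc = np.asarray(f_calc, dtype=float)
    return float(np.sum(np.abs(f_obs - f_calc)) / np.sum(f_obs))

def r_work_and_free(f_obs, f_calc, free_flags):
    """Compute R for the working reflections and for the set-aside ones."""
    f_obs = np.asarray(f_obs, dtype=float)
    f_calc = np.asarray(f_calc, dtype=float)
    free = np.asarray(free_flags, dtype=bool)
    return r_factor(f_obs[~free], f_calc[~free]), r_factor(f_obs[free], f_calc[free])

# Four made-up reflections; the last one is flagged as "free"
f_obs = [100.0, 80.0, 60.0, 40.0]
f_calc = [90.0, 84.0, 66.0, 50.0]
free_flags = [False, False, False, True]

r_work, r_free = r_work_and_free(f_obs, f_calc, free_flags)
print(r_work, r_free)
```

The free reflections are never shown to the refinement, so the model cannot be tuned to fit them, and R-free only improves if the model is genuinely getting better.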
22. Rfree should drop along with Rworking during the refinement, and the two should stay within something like 4 percentage points of each other
23. Refinement is finished when you decide you can't flatten the difference map any more, or you get the lowest possible Rfree
- Usually, you can't ever flatten the difference map, and you stop when you can't stand it any more or, really, when you can't drop the Rfree by doing anything reasonable
- You will see papers where Rfree is as high as, say, 29%, but these are usually "hot" structures at fairly low resolution, like membrane proteins or enormous complexes (e.g. the ribosome), where publication often happens before refinement is really finished
- But, how do you know your structure is correct???
24. COOT has a nice set of tools for looking at areas of the structure that are unusual or bad
- If we go to the "Validate" menu in COOT and select "Ramachandran plot", we will see residues that are in preferred areas of PHI/PSI backbone angles (pink), in areas that are allowed but less preferred by energetics and known structures (yellow), and in disallowed areas (for all but Gly)
- You can click on a residue and COOT takes you to it so you can take a look!
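The PHI and PSI values on the plot are just dihedral (torsion) angles computed from four backbone atoms each: PHI from C(i-1), N, CA, C and PSI from N, CA, C, N(i+1). Below is a minimal sketch of a dihedral calculation. The function name and the test coordinates are made up for illustration, and this is not taken from COOT's source.

```python
import numpy as np

def dihedral_deg(p0, p1, p2, p3):
    """Dihedral angle in degrees defined by four points."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)
    # Components of b0 and b2 perpendicular to the central bond b1
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))

# Made-up points: first and last atoms on the same side of the central bond
cis = dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
# Same, but with the last atom on the opposite side
trans = dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
print(cis, trans)
```

Feeding in the coordinates of the four backbone atoms for each residue gives the pair of angles that places that residue on the Ramachandran plot.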
25. Also, Refmac prints out (in the pdb file) a set of error measures that are useful and required for deposition
- The RMS deviations from ideal values are a bit misleading, since you use them to restrain the refinement
- Coordinate error is always lower at higher resolution, of course