Title: Refinement of Macromolecular structures using REFMAC5
1Refinement of Macromolecular structures using
REFMAC5
- Garib N Murshudov
- York Structural Laboratory
- Chemistry Department
- University of York
2Contents
- Introduction
- Considerations for refinement
- TWIN
- TLS
- Dictionary and alternative conformations
- Bulk solvent
- New features: KL B-value, local NCS, external structure, map sharpening
- Conclusions
3Available refinement programs
- SHELXL
- CNS
- REFMAC5
- TNT
- BUSTER/TNT
- Phenix.refine
- RESTRAINT
- MOPRO
4What can REFMAC do?
- Simple maximum likelihood restrained refinement
- Twin refinement
- Phased refinement (with Hendrickson-Lattman coefficients)
- SAD/SIRAS refinement
- Structure idealisation
- Library for more than 9000 ligands (from the next version)
- Covalent links between ligands and ligand-protein
- Rigid body refinement
- NCS local, restraints to external structures
- TLS refinement
- Map sharpening
- etc
5Considerations in refinement
- Function to optimise (link between data and model)
  - Should use experimental data
  - Should be able to handle chemical (e.g. bonds) and other (e.g. NCS, structural) information
- Parameters
  - Depends on the stage of analysis
  - Depends on the amount and quality of the experimental data
- Methods to optimise
  - Depends on the stage of analysis: simulated annealing, conjugate gradient, second order (normal matrix, information matrix, second derivatives)
  - Some methods can give an error estimate as a by-product, e.g. second order.
6Two components of target function
- Crystallographic target functions have two components: one describes the fit of the model parameters to the experimental data (likelihood) and the other describes chemical integrity (restraints).
- Currently used restraints are: bond lengths, angles, chirals, planes, NCS if available, some torsion angles, jelly body, external structure, etc.
7Various forms of functions
- SAD function: uses observed F+ and F- directly, without any preprocessing by a phasing program (it is not available in the current version but will be available soon)
- MLHL: explicit use of phases with Hendrickson-Lattman coefficients
- Rice: maximum likelihood refinement without phase information
8Twin refinement
- Twin refinement in the new version of refmac is automatic.
- Twin operators are identified.
- Rmerge for each operator is calculated and operators for which Rmerge < 0.50 are kept. Twin plus crystal symmetry operators should form a group.
- Twin fractions are refined and only domains with a fraction above a certain threshold are kept (default threshold is 0.05). Twin plus symmetry operators should form a group.
- Intensities can be used.
- Twin refinement is not possible together with SAD yet.
- Maximum likelihood refinement is used.
- Twin refinement can be used even if there is no indication of twinning.
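The operator and domain filtering described above can be sketched as follows. This is only an illustration, not REFMAC's code: the function names are hypothetical, the Rmerge definition (sum of absolute intensity differences over sum of intensities for twin-related pairs) is an assumption, and the renormalisation of the surviving twin fractions is also an assumption.

```python
from typing import Callable, Dict, Tuple

HKL = Tuple[int, int, int]


def twin_rmerge(intensities: Dict[HKL, float], op: Callable[[HKL], HKL]) -> float:
    """Rmerge between reflections related by a candidate twin operator.

    Assumed definition: sum|I1 - I2| / sum(I1 + I2) over all pairs for which
    both mates are present in the data.
    """
    num = 0.0
    den = 0.0
    for hkl, i1 in intensities.items():
        mate = op(hkl)
        if mate == hkl or mate not in intensities:
            continue  # reflection maps onto itself or its mate was not measured
        i2 = intensities[mate]
        num += abs(i1 - i2)
        den += i1 + i2
    return num / den if den > 0 else float("inf")


def keep_operators(intensities, candidates, rmerge_max=0.50):
    """Keep candidate operators whose Rmerge is below the cut-off (0.50 on the slide)."""
    return [name for name, op in candidates.items()
            if twin_rmerge(intensities, op) < rmerge_max]


def keep_domains(fractions, threshold=0.05):
    """Drop twin domains with a refined fraction at or below the threshold.

    Renormalising the remaining fractions to sum to 1 is an assumption.
    """
    kept = {name: f for name, f in fractions.items() if f > threshold}
    total = sum(kept.values())
    return {name: f / total for name, f in kept.items()}


# Toy data: the operator (h, k, l) -> (k, h, -l) is a hypothetical twin law.
data = {(1, 2, 3): 10.0, (2, 1, -3): 10.0, (1, 0, 0): 5.0, (0, 1, 0): 15.0}
swap_hk = lambda hkl: (hkl[1], hkl[0], -hkl[2])
rmerge = twin_rmerge(data, swap_hk)
domains = keep_domains({"d1": 0.60, "d2": 0.37, "d3": 0.03})
```

Each twin-related pair is visited twice in this loop, which does not change the ratio.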
9Likelihood
The dimension of integration is in general twice the number of twin-related domains. Since the phases do not contribute to the first part of the integrand, the second part becomes a Rice distribution. The integration is carried out using the Laplace approximation. In principle these equations are general enough to account for non-merohedral twinning (including allotwinning) and unmerged data. A small modification should allow simultaneous twin and SAD/MAD phasing.
10Electron density likelihood based
Equation for map calculation. It seems to be working reasonably well. For an unbiased map it is necessary to integrate over errors in all parameters. I hope it will be available in the next version of refmac.
11Twin Few warnings about R factors
- For acentric case only
- For random structure
- Crystallographic R factors:
  - No twinning: 58%
  - For perfect twinning, twin modelled: 40%
  - For perfect twinning without twin modelled: 50%
- R merges without experimental error:
  - No twinning: 50%
  - Along non-twinned axes with another axis than twin: 37.5%
[Figure panels: non twin, twin]
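The two "no twinning" numbers can be checked with a small Monte Carlo experiment. The sketch below assumes acentric Wilson statistics (intensities drawn from an exponential distribution), two completely unrelated structures on the same scale, and no experimental error; the function name is hypothetical.

```python
import math
import random


def simulate_random_r_factors(n=200_000, seed=0):
    """Return (R factor, Rmerge) for two unrelated, untwinned acentric data sets."""
    rng = random.Random(seed)
    num_f = den_f = 0.0   # amplitude-based crystallographic R factor
    num_i = den_i = 0.0   # intensity-based Rmerge
    for _ in range(n):
        i1 = rng.expovariate(1.0)   # "observed" intensity
        i2 = rng.expovariate(1.0)   # intensity from an unrelated (random) model
        f1, f2 = math.sqrt(i1), math.sqrt(i2)
        num_f += abs(f1 - f2)
        den_f += f1
        num_i += abs(i1 - i2)
        den_i += i1 + i2
    return num_f / den_f, num_i / den_i


r_factor, r_merge = simulate_random_r_factors()
```

Under these assumptions the expected values are 2 - sqrt(2) (about 0.586) for the R factor and 0.5 for Rmerge, in line with the 58% and 50% quoted on the slide.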
12Effect of twinning on electron density
Using twinning in refinement programs is straightforward. It improves statistics substantially (sometimes R-factors can go down by 10%). However, the improvement of electron density is not very dramatic (just like when you use TLS). It may improve electron density in weak parts, but in general do not expect miracles. In particular, when twinning and NCS are close, improvements are marginal.
13Parameters
- Usual parameters (if programs allow it)
  - Positions x, y, z
  - B values: isotropic or anisotropic
  - Occupancy
- Derived parameters
  - Rigid body positional
    - After molecular replacement
    - Isomorphous crystal (liganded, unliganded, different data)
  - Rigid body of B values: TLS
    - Useful at the medium and final stages
    - At low resolution when full anisotropy is impossible
  - Torsion angles
14Bulk solvent Method 1: Babinet's bulk solvent correction
At low resolution electron density is flat. The only difference between solvent and protein regions is that solvent has lower density than protein. If we increased the solvent density just enough to make it equal to that of protein, we would have flat (constant) density. The Fourier transform of a constant is zero (apart from F000). So the contribution from solvent can be calculated using that of protein, which means that the total structure factor can be calculated using the contribution from protein only.
[Figure: solvent (S) and protein (P) regions]
$$\rho_s + \rho_p = \rho_T \iff F_s + F_p = F_T$$
$$\rho_s + k\rho_p = c \iff F_s + kF_p = 0$$
$$F_s = -kF_p \implies F_T = F_p - kF_p = (1-k)F_p$$
k is usually taken as $k_b \exp(-B_b s^2)$. $k_b$ must be less than 1. $k_b$ and $B_b$ are adjustable parameters.
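A minimal numerical sketch of the Babinet scale factor (1 - k) follows. The parameter values are purely illustrative, the function name is hypothetical, and the convention s = sin(theta)/lambda = 1/(2d) is an assumption about what s means in the expression for k.

```python
import math


def babinet_scale(d_spacing, kb=0.8, bb=200.0):
    """Scale applied to Fp under Babinet's principle: F_T = (1 - k) * Fp.

    k = kb * exp(-Bb * s^2), with s = 1 / (2 d) assumed.
    kb and bb are illustrative values, not REFMAC defaults.
    """
    s2 = 1.0 / (4.0 * d_spacing ** 2)
    k = kb * math.exp(-bb * s2)
    return 1.0 - k


low_res = babinet_scale(50.0)   # strong damping of Fp at very low resolution
high_res = babinet_scale(2.0)   # essentially no correction at high resolution
```

The correction only matters for low-resolution reflections: as d decreases the exponential vanishes and the scale tends to 1.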
15Bulk solvent Method 2: Mask based bulk solvent correction
The total structure factor is the sum of the protein contribution and the solvent contribution. The solvent region is flat. The protein contribution is calculated as usual. The region occupied by protein atoms is masked out. The remaining part of the cell is filled with constant values and the corresponding structure factors are calculated. Finally the total structure factor is calculated using
$$F_T = F_p + k_s F_s$$
where $k_s$ is an adjustable parameter.
Mask based bulk solvent is standard in all refinement programs. In refmac it is the default.
16Overall parameters Scaling
- There are several options for scaling.
- Babinet's bulk solvent correction assumes that at low resolution the solvent and protein contributions are very similar and the only difference is overall density and B value. It has the form $k_b = 1 - k_b \exp(-B_b s^2/4)$ (the $k_b$ on the left is the overall Babinet scale that enters the final expression).
- Mask bulk solvent: the part of the asymmetric unit not occupied by atoms is assigned a constant value and the Fourier transform of this part is calculated. This contribution is then added, with a scale value, to the protein structure factors. The total structure factor has the form $F_{tot} = F_p + s_s \exp(-B_s s^2/4) F_s$.
- The final total structure factor that is scaled has the form $s_{aniso}\, s_{protein}\, k_b\, F_{tot}$.
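How the scale factors combine can be sketched as below. This is an illustration only: the function name and all default values are hypothetical, s = 1/d is assumed so that B s^2/4 equals B sin^2(theta)/lambda^2, and the anisotropic scale is reduced to a single number although in practice it depends on the reflection indices.

```python
import math


def total_structure_factor(fp, fs, d_spacing, ss=0.35, bs=50.0,
                           kb=0.0, bb=150.0, s_protein=1.0, s_aniso=1.0):
    """Combine protein and mask-solvent structure factors for one reflection.

    fp, fs : complex structure factors of protein and solvent mask
    F_tot  = Fp + ss * exp(-Bs s^2 / 4) * Fs
    result = s_aniso * s_protein * (1 - kb * exp(-Bb s^2 / 4)) * F_tot
    """
    s2 = 1.0 / d_spacing ** 2                       # assumed convention s = 1/d
    f_tot = fp + ss * math.exp(-bs * s2 / 4.0) * fs
    k_babinet = 1.0 - kb * math.exp(-bb * s2 / 4.0)
    return s_aniso * s_protein * k_babinet * f_tot


# At very low resolution the solvent term enters with almost its full scale ss.
f_low = total_structure_factor(complex(10, 0), complex(-4, 0), 1.0e6, ss=0.5)
# With both solvent models switched off the protein term is returned unchanged.
f_plain = total_structure_factor(complex(3, 4), complex(1, 1), 2.0, ss=0.0, kb=0.0)
```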
17TLS groups
- Rigid groups should be defined as TLS groups. As a starting point they could be subunits or domains.
- If you use a script then the default rigid groups are subunits, or segments if defined.
- In ccp4i you should define rigid groups (in the next version the default will be subunits).
- Rigid groups could be defined using the TLSMD webserver: http://skuld.bmsc.washington.edu/tlsmd/
18Alternative conformation Example in pdb file
ATOM    977  N   GLU A  67     -11.870   9.060   4.949  1.00 12.89           N
ATOM    978  CA  GLU A  67     -12.166  10.353   4.354  1.00 14.00           C
ATOM    980  CB AGLU A  67     -13.562  10.341   3.738  0.50 14.81           C
ATOM    981  CB BGLU A  67     -13.526  10.285   3.654  0.50 14.35           C
ATOM    986  CG AGLU A  67     -13.701   9.400   2.573  0.50 16.32           C
ATOM    987  CG BGLU A  67     -13.876  11.476   2.777  0.50 14.00           C
ATOM    992  CD AGLU A  67     -15.128   9.179   2.134  0.50 17.17           C
ATOM    993  CD BGLU A  67     -15.237  11.332   2.110  0.50 15.68           C
ATOM    994  OE1AGLU A  67     -15.742  10.153   1.644  0.50 20.31           O
ATOM    995  OE1BGLU A  67     -15.598  12.213   1.307  0.50 16.68           O
ATOM    996  OE2BGLU A  67     -15.944  10.342   2.389  0.50 18.94           O
ATOM    997  OE2AGLU A  67     -15.610   8.027   2.235  0.50 21.30           O
ATOM    998  C   GLU A  67     -12.110  11.473   5.386  1.00 13.40           C
ATOM    999  O   GLU A  67     -11.543  12.528   5.110  1.00 12.98           O
- Note that the PDB format is strictly column-based: every element has its fixed position.
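Because the format is column-based, an ATOM record is read by slicing fixed character ranges rather than by splitting on whitespace. The sketch below uses the standard PDB ATOM column layout; the function name and dictionary keys are hypothetical.

```python
def parse_atom_record(line: str) -> dict:
    """Parse one PDB ATOM record using fixed column positions (0-based slices)."""
    return {
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "altloc": line[16].strip(),      # alternative conformation label (A, B, ...)
        "resname": line[17:20].strip(),
        "chain": line[21],
        "resseq": int(line[22:26]),
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
        "occupancy": float(line[54:60]),
        "b_iso": float(line[60:66]),
        "element": line[76:78].strip(),
    }


record = parse_atom_record(
    "ATOM    980  CB AGLU A  67     -13.562  10.341   3.738  0.50 14.81           C"
)
```

Splitting on whitespace would fail here, because the alternative conformation label is fused to the residue name ("AGLU") and the atom name can run into it ("OE1AGLU").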
19Problems of low resolution refinement
- Function to describe the fit of the model to the experiment: likelihood or similar
  - Data may come from very peculiar crystals: twin, OD-disorder, multiple cell
  - Radiation damage
  - Converting intensities (I) to amplitudes (F) may not be valid
- Limited and noisy data: use of available knowledge
  - Known structures
  - Internal patterns: NCS, secondary structure
- Smeared electron density with vanishing side chains, secondary structures, domains: high B values and series termination
  - Filtering methods: solve the inverse problem with a regulariser
  - Missing data problem: data augmentation, bootstrap
20Use of available knowledge
1) NCS local
2) Restraints to known structure(s)
3) Restraints to current inter-atomic distances (implicit normal modes or jelly body)
4) Better restraints on B values
These are available from version 5.6.
Note:
- Buster/TNT has local NCS and restraints to known structures
- CNS has restraints to known structures (they call it deformable elastic network)
- Phenix has B-value restraints on non-bonded atom pairs and automatic global NCS
- Local NCS (only for torsion angle related atom pairs) was available in SHELXL since the beginning of time
21Auto NCS local and global
- Align all chains with all chains using the Needleman-Wunsch method
- If the alignment score is higher than a predefined value (e.g. 80%) then consider them as similar
- Find the local RMS, and if the average local RMS is less than a predefined value then consider them aligned
- Find correspondence between atoms
- If global restraints are used (i.e. restraints based on RMS between atoms of aligned chains) then identify domains
- For local NCS make the list of corresponding interatomic distances (remove bond and angle related atom pairs)
- Design weights
- The list of interatomic distance pairs is calculated at every cycle
22Auto NCS
- Global RMS is calculated using all aligned atoms.
- Local RMS is calculated using k (default is 5)
residue sliding windows and then averaging of the
results
23Auto NCS Neighbours
- After alignment, neighbours are analysed.
- Each water or ligand is assigned to the chain it is close to.
- Neighbours are included in restraints if possible.
[Figure: chain B with neighbouring waters or ligands in shell 1 and shell 2]
24Auto NCS Iterative alignment
Example of alignment: 2vtu. There are two chains similar to each other. There appears to be gene duplication. RMS: all aligned atoms; Ave(RmsLoc): local RMS.
Alignment results
-------------------------------------------------------------------------------
 N  Chain 1           Chain 2           No of aligned  Score   RMS     Ave(RmsLoc)
-------------------------------------------------------------------------------
 1  J( 131 - 256 )    J(   3 - 128 )    126            1.0000  5.2409  1.6608
 2  J(   1 - 257 )    L(   1 - 257 )    257            1.0000  4.8200  1.6694
 3  J( 131 - 256 )    L(   3 - 128 )    126            1.0000  5.2092  1.6820
 4  J(   3 - 128 )    L( 131 - 256 )    126            1.0000  3.0316  1.5414
 5  L( 131 - 256 )    L(   3 - 128 )    126            1.0000  0.4515  0.0464
-------------------------------------------------------------------------------
25Auto NCS Conformational changes
In many cases it could be expected that two or more copies of the same molecule will have (slightly) different conformations. For example, if there is a domain movement then the internal structures of the domains will be the same, but distances between domains will be different in the two copies of the molecule.
[Figure: two copies of a molecule, each with domain 1 and domain 2]
26Robust estimators
One class of robust (to outliers) estimators is called M-estimators: maximum-likelihood-like estimators. One of the popular functions is Geman-McClure. Essentially, when distances are similar they should be kept similar, and when they are too different they should be allowed to be different. This function is used for local NCS restraints as well as for restraints to external structures.
[Figure: red line $x^2$; black line $x^2/(1 + w x^2)$, where $x = (d_1 - d_2)/\sigma$, $w = 0.1$]
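The behaviour of the two curves can be reproduced with a few lines of code. This is a sketch of the functional forms quoted for the plot (the difference of two distances divided by the restraint sigma, with w = 0.1); the function names are hypothetical.

```python
def least_squares_term(d1, d2, sigma):
    """Ordinary quadratic restraint: grows without bound for outliers."""
    x = (d1 - d2) / sigma
    return x * x


def geman_mcclure_term(d1, d2, sigma, w=0.1):
    """Geman-McClure robust restraint: x^2 / (1 + w x^2), bounded above by 1/w."""
    x = (d1 - d2) / sigma
    return x * x / (1.0 + w * x * x)


# Similar distances: both functions agree closely, so the pair is restrained.
small_ls = least_squares_term(3.01, 3.00, 0.1)
small_gm = geman_mcclure_term(3.01, 3.00, 0.1)

# Very different distances: the robust term saturates, the pair is "let go".
large_ls = least_squares_term(10.0, 3.0, 0.1)
large_gm = geman_mcclure_term(10.0, 3.0, 0.1)
```

For small differences the two functions are nearly identical, while for large differences the Geman-McClure term flattens out near 1/w, so its gradient, and hence its pull on the atoms, goes to zero.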
27Restraints to external structures (work done by Rob Nicholls)
- Compares Two Protein Chains
- Conformation-invariant structural comparison
- Residue-residue alignment
- Superimposition
- Residue-based and global similarity scores
- Produces local atomic distance restraints
- Based on one or more aligned chains
- Possibility of multi-crystal refinement
28ProSmart Restrain
[Figure: structure to be refined and a known similar structure (prior); atom pairs within x Å]
29ProSmart Restrain
[Figure: structure to be refined and a known similar structure (prior)]
Remove bond and angle related pairs
30To allow conformational changes, Geman-McClure
type robust estimator functions are used
31Restraints to current distances
A term [equation not shown] is added to the target function. The summation is over all pairs of atoms in the same chain and within a given distance (default 4.2 Å). $d_{current}$ is recalculated at every cycle. This function does not contribute to gradients. It only contributes to the second derivative matrix. It is equivalent to adding springs between atom pairs. During refinement inter-atomic distances do not change very much. If all pairs were used and the weights were very large, it would be equivalent to rigid body refinement. It could be called implicit normal modes, soft body or jelly body refinement.
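The idea can be sketched as follows. The quadratic form of the term, sum of ((d - d_current)/sigma)^2, is an assumption based on the "springs between atom pairs" description, since the slide's equation is not available; sigma = 0.05 is taken from the `ridg dist sigm 0.05` keyword shown later in the talk, and the data layout and function names are hypothetical.

```python
import math
from itertools import combinations


def jelly_body_pairs(atoms, max_dist=4.2):
    """List (i, j, d_current) for atom pairs in the same chain within max_dist (Å)."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(atoms), 2):
        if a["chain"] != b["chain"]:
            continue
        d = math.dist(a["xyz"], b["xyz"])
        if d <= max_dist:
            pairs.append((i, j, d))
    return pairs


def jelly_body_term(atoms, pairs, sigma=0.05):
    """Assumed spring-like term: sum of ((d - d_current) / sigma)^2 over the pairs."""
    total = 0.0
    for i, j, d_current in pairs:
        d = math.dist(atoms[i]["xyz"], atoms[j]["xyz"])
        total += ((d - d_current) / sigma) ** 2
    return total


atoms = [
    {"chain": "A", "xyz": (0.0, 0.0, 0.0)},
    {"chain": "A", "xyz": (3.0, 0.0, 0.0)},
    {"chain": "A", "xyz": (9.0, 0.0, 0.0)},   # too far from the others
    {"chain": "B", "xyz": (1.0, 0.0, 0.0)},   # different chain: never paired
]
pairs = jelly_body_pairs(atoms)
at_current = jelly_body_term(atoms, pairs)      # evaluated at the current model

atoms[1]["xyz"] = (3.1, 0.0, 0.0)               # shift one atom by 0.1 Å
after_shift = jelly_body_term(atoms, pairs)
```

Because the target distances are the current distances, the term and its gradient are zero at the start of each cycle; only its curvature enters, which is why it damps shifts without pulling the model anywhere.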
32B value restraints and TLS
- Designing restraints on B values is much more difficult.
- Currently available options to deal with B values at low resolution:
  - Group B as implemented in CNS
  - TLS group refinement as implemented in refmac and phenix.refine
- Both of them have some applications. TLS seems to work for a wide range of cases but unfortunately it is very often misused. One of the problems is discontinuity of B values: neighbouring atoms may end up having wildly different B values.
- In an ideal world anisotropic U with good restraints should be used, but this world is still far away. Only in some cases does full aniso refinement at 3 Å give better R/Rfree than TLS refinement. These are cases with extremely anisotropic data.
[Figure: two TLS groups (TLS1, TLS2) connected by a loop]
33Parameters B value restraints and TLS
- Restraints on B values
  - Projections of the aniso U of bonded atoms onto the bond should be similar (rigid bond)
  - Kullback-Leibler (conditional entropy) divergence should be small
    - For isotropic atoms (for bonded and non-bonded atoms): $B_1/B_2 + B_2/B_1 - 2$
  - Local TLS: neighbouring atoms should be related as TLS groups (not available yet)
34Kullback-Leibler divergence
If there are two probability densities, p(x) and q(x), then the symmetrised Kullback-Leibler divergence between them is defined as a distance between the distributions [equation not shown]. If both distributions are Gaussian with the same mean values and with variances $U_1$ and $U_2$, this distance has a closed form, and for the isotropic case it reduces to an expression in $B_1/B_2 + B_2/B_1 - 2$. Restraints for bonded pairs have higher weights than those for non-bonded pairs. For non-bonded atoms the weights depend on the distance between atoms. This type of restraint is also applied for rigid bond restraints in anisotropic refinement.
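For reference, the closed forms referred to above can be written out from the textbook definition of the Kullback-Leibler divergence. This is a sketch for three-dimensional Gaussians with equal means; the overall normalisation is an assumption and may differ from the one used in REFMAC, where only the combination $B_1/B_2 + B_2/B_1 - 2$ is quoted.

```latex
\begin{align}
D(p, q) &= \int p(x)\,\log\frac{p(x)}{q(x)}\,dx
         + \int q(x)\,\log\frac{q(x)}{p(x)}\,dx \\
D(U_1, U_2) &= \tfrac{1}{2}\,\mathrm{tr}\!\left(U_1 U_2^{-1} + U_2 U_1^{-1} - 2I\right) \\
D(B_1, B_2) &= \tfrac{3}{2}\left(\frac{B_1}{B_2} + \frac{B_2}{B_1} - 2\right)
\end{align}
```

The logarithmic determinant terms of the two one-sided divergences cancel on symmetrisation, which is why only traces remain. The isotropic expression is zero when $B_1 = B_2$ and grows as the ratio of the two B values departs from 1, in either direction.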
35Example, after molecular replacement: 3 Å resolution, data completeness 71%
[Figure: R factors vs cycle. Black: simple refinement; red: global NCS; blue: local NCS; green: jelly body. Solid lines: Rfactor; dashed lines: Rfree]
36Example: 4 Å resolution, data from pdb 2r6c
[Figure: R factors vs cycle. Black: simple refinement; red: external restraints; blue: jelly body. Solid lines: Rfactor; dashed lines: Rfree]
37MAP SHARPENING INVERSE PROBLEM
Very simple case: blurring is due to the overall B value. The sharpening function is [equation not shown].
38MAP SHARPENING 2R6C, 4Å RESOLUTION
[Figure panels: original (no sharpening); sharpening, median B, α optimised; sharpening, median B, α = 0. Top left and bottom: after local NCS refinement]
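The simple case can be illustrated numerically. If blurring multiplies each amplitude by k = exp(-B s^2/4), plain sharpening divides by k, which amplifies high-resolution noise. The regularised variant below uses a Tikhonov-type filter k/(k^2 + alpha); that particular form is an assumption made for illustration and is not taken from the slides. The function name is hypothetical and s = 1/d is assumed.

```python
import math


def sharpen_amplitude(f, d_spacing, b_sharp, alpha=0.0):
    """Sharpen one amplitude blurred by an overall B value.

    k = exp(-B s^2 / 4) is the blurring factor (s = 1/d assumed).
    alpha = 0 gives plain inverse filtering, f / k.
    alpha > 0 gives an assumed Tikhonov-type regularised inverse, f * k / (k^2 + alpha).
    """
    s2 = 1.0 / d_spacing ** 2
    k = math.exp(-b_sharp * s2 / 4.0)
    return f * k / (k * k + alpha)


plain_4a = sharpen_amplitude(1.0, 4.0, 100.0)               # alpha = 0
plain_2a = sharpen_amplitude(1.0, 2.0, 100.0)               # large boost at 2 Å
damped_2a = sharpen_amplitude(1.0, 2.0, 100.0, alpha=0.01)  # boost kept under control
```

With alpha = 0 the boost grows exponentially with resolution; a positive alpha caps it, which is the usual motivation for regularising an inverse problem.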
39Some of the other new features in REFMAC
- SAD refinement: available from version 5.5
- SIRAS refinement: available from version 5.6
- New and complete dictionary: available from version 5.6
- Improved mask solvent: available from version 5.6
- Jligand for ligand dictionary and link description
40How to use new features
- Download refmac from the website:
  www.ysbl.york.ac.uk/refmac/data/refmac_experimental/refmac5.6_linux.tar.gz
  www.ysbl.york.ac.uk/refmac/data/refmac_experimental/refmac5.6_macintel.tar.gz
- Download the dictionary:
  www.ysbl.york.ac.uk/refmac/data/refmac_experimental/refmac5.6_dictionary_v5.18.gz
- Change atom names using molprobity (optional; important if you have DNA/RNA): http://molprobity.biochem.duke.edu/
- Replace refmac5 with the new one and you are ready for the new version.
41Twin refinement (it works with older version
also)
42Adding external keywords
- Add the following commands to a file:
  - ncsr local (automatic and local NCS)
  - ridg dist sigm 0.05 (jelly body restraints)
  - mapcalculate shar (regularised map sharpening)
- Save in a file (say keyw.dat)
43Add external keywords file in refmac interface
Browse files
44Add external keywords file in refmac interface
Select keywords file
45Add external keywords file in refmac interface
Keywords file
46Things to look at
- R factor/Rfree: they should go down during refinement
- Geometric parameters: rms bond and others. They should be reasonable. For example, rms bond should be around 0.02 Å
- Map and coordinates using coot
- Loggraph outputs, available in the ccp4i interface
47Behaviour of R/Rfree, average Fobs vs resolution
should be reasonable. If there is a bump or it
has an irregular behaviour then either something
is wrong with your data or refinement.
48What and when
- Rigid body: at early stages, after molecular replacement or when refining against data from isomorphous crystals
- TLS: at medium and end stages of refinement, at resolutions up to 1.7-1.6 Å (roughly)
- Anisotropic: at higher resolution, towards the end of refinement
- Adding hydrogens: at resolutions higher than 2 Å, but they could always be added
- Phased refinement: at early and medium stages of refinement
- SAD: at all stages(?)
- Twin: always try(?)
- Ligands: as soon as you see them
- Jelly body: at low resolution and early stages
- External structure: at low resolutions
- Map sharpening: try with and without
49Conclusion
- Twin refinement improves statistics and occasionally electron density
- Use of similar structures should improve reliability of the derived model, especially at low resolution
- NCS restraints must be done automatically, but conformational flexibility must be accounted for
- Jelly body works better than I thought it should
- Regularised map sharpening looks promising. More work should be done on series termination and general sharpening operators
50Acknowledgment
- York: Alexei Vagin, Andrey Lebedev, Rob Nicholls, Fei Long
- Leiden: Pavol Skubak, Raj Pannu
- CCP4, YSBL people
- REFMAC is available from CCP4 or from York's ftp site: www.ysbl.york.ac.uk/refmac/latest_refmac.html
- This and other presentations can be found on www.ysbl.york.ac.uk/refmac/Presentations/