Title: Validation of NMR-derived structures
1Validation of NMR-derived structures
- Chris Spronk
- Centre for Molecular and Biomolecular Informatics
- University of Nijmegen
- The Netherlands
2Overview
- Introduction
- Selection of structures
- Validation of structures using experimentally derived restraints
- Validation of structures using local and overall structure quality indicators
3Why validate structures?
- Bio-macromolecular structures are a valuable source for understanding biology
- Structure-based drug design
- Homology modeling
- Structural genomics
- Structures should be reliable
- Satisfy experimental data
- Good local and overall quality
4NMR Structure determination
[Flowchart: NMR experimental data → assignment and conversion → experimental restraints → structure calculation and selection → structure ensemble → restraint violation and error analysis, structure quality checks and statistics (often not done!) → validated structure data]
5Selection of NMR structures (1)
- Common selection criteria
- Violation cutoff criteria, e.g.
- No distance restraint violations > 0.5 Å
- No dihedral angle violations > 5°
- Energy criteria, e.g.
- Select a sub-ensemble consisting of the lowest-energy structures
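The two selection criteria above can be combined in a few lines. The following is only a minimal sketch: the `Structure` record, its field names and the example numbers are hypothetical, and real structure calculation programs report violations and energies in their own formats.

```python
from dataclasses import dataclass


@dataclass
class Structure:
    name: str
    energy: float          # total energy (arbitrary units, hypothetical values)
    max_dist_viol: float   # largest distance restraint violation (Å)
    max_dihe_viol: float   # largest dihedral angle violation (degrees)


def select_structures(structures, n_lowest, dist_cutoff=0.5, dihe_cutoff=5.0):
    """Drop structures with violations above the cutoffs, then keep the
    n_lowest structures by energy."""
    accepted = [
        s for s in structures
        if s.max_dist_viol <= dist_cutoff and s.max_dihe_viol <= dihe_cutoff
    ]
    return sorted(accepted, key=lambda s: s.energy)[:n_lowest]


# Hypothetical ensemble, for illustration only.
ensemble = [
    Structure("model_1", -1200.0, 0.31, 3.2),
    Structure("model_2", -1350.0, 0.62, 2.1),  # distance violation > 0.5 Å
    Structure("model_3", -1100.0, 0.12, 4.9),
    Structure("model_4", -1280.0, 0.45, 6.5),  # dihedral violation > 5°
    Structure("model_5", -1250.0, 0.20, 1.0),
]
selected = select_structures(ensemble, n_lowest=2)
print([s.name for s in selected])
```

Note that the lowest-energy model (model_2) is rejected by the violation cutoff, which illustrates how the outcome depends on the chosen criteria.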
6Selection of NMR structures (2)
[Figure: three structure ensembles with RMSD = 3.04, 0.82 and 0.77]
7Satisfaction of experimental data
- Number and size of violations
- Fit of the structure to the experimental data
- Root mean square deviations
- NOE, dihedral angles, etc.
- Cornilescu Q-factors
- Residual dipolar couplings
- Experimental restraint energy
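Two of the fit measures listed above are easy to write down. The sketch below assumes that distance restraints are given as upper bounds and that the Q-factor is written in its common form, Q = rms(D_obs - D_calc) / rms(D_obs); the function names and numbers are illustrative only.

```python
import math


def rms(values):
    """Root mean square of a sequence of numbers."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def rms_violation(calculated, upper_bounds):
    """RMS of distance restraint violations. A distance within its upper
    bound counts as a violation of zero."""
    return rms([max(0.0, d - u) for d, u in zip(calculated, upper_bounds)])


def q_factor(observed, calculated):
    """Q-factor for residual dipolar couplings:
    rms(D_obs - D_calc) / rms(D_obs)."""
    differences = [o - c for o, c in zip(observed, calculated)]
    return rms(differences) / rms(observed)


# Hypothetical distances (Å) and dipolar couplings (Hz).
print(rms_violation([3.0, 5.5, 4.0], [3.5, 5.0, 4.0]))
print(q_factor([10.0, -5.0, 2.0], [9.0, -4.0, 2.0]))
```

A lower value means a better fit in both cases; a Q-factor of 0 would mean the calculated couplings reproduce the observed ones exactly.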
8Evaluation of experimental data (1)
- Number of restraints
- Completeness
9Evaluation of experimental data (2)
- Restraints per residue
- Completeness
- per residue
10Evaluation of experimental data (3)
11Evaluation of experimental data (4)
- Intra-residual
- Side chain conformation
- Sequential: residue i to residue i+1
- Secondary structure
- Medium range: residue i to residue ≤ i+5
- Secondary structure
- Long range: residue i to residue > i+5
- Secondary and tertiary structure
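The classification above depends only on the sequence separation of the two residues. A small sketch, assuming restraints are given as pairs of residue numbers within one chain (the example pairs are made up):

```python
from collections import Counter


def classify_restraint(i, j):
    """Classify a distance restraint by the sequence separation |i - j|."""
    separation = abs(i - j)
    if separation == 0:
        return "intra-residual"
    if separation == 1:
        return "sequential"
    if separation <= 5:
        return "medium range"
    return "long range"


# Hypothetical restraint list: (residue i, residue j).
restraints = [(12, 12), (12, 13), (12, 15), (12, 17), (12, 18), (3, 45)]
counts = Counter(classify_restraint(i, j) for i, j in restraints)
print(counts)
```

Counting restraints per class in this way shows at a glance whether a data set contains enough long-range restraints to define the tertiary structure.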
12Evaluation of experimental data (5)
- New method for evaluation of distance restraints
- QUantitative Evaluation of Experimental NMR restraints (QUEEN)
13QUEEN (1)
- Quantification of the information contained in distance restraints
- Relative contribution of each restraint to the determination of a structure
- The method makes it possible to identify
- Important restraints
- Unique restraints (→ error analysis)
- Redundant restraints
14QUEEN (2)
YBOX 1h95
10 most important restraints
15QUEEN (3)
1GB1 Rather unique
3GB1 Less unique
IgG binding domain (1gb1)
X-ray structure
Restraint Ala26 Tyr45
16QUEEN (4)
17R-factors and cross-validation
- R-factor
- Free R-factor (X-ray)
- Calculate structures with 90% of data (working set)
- Determine free R on 10% of data (test set)
- Complete cross validation (NMR)
- Use a number of different randomly chosen test sets
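The working set / test set partitioning can be sketched as follows. This is one possible scheme, not necessarily the one used in any particular program: it assumes the restraints are shuffled once and split into ten non-overlapping test sets of about 10% each, so that every restraint is left out exactly once. The function name is hypothetical.

```python
import random


def cross_validation_sets(restraints, n_sets=10, seed=0):
    """Yield (working_set, test_set) pairs. The restraints are shuffled once
    and divided into n_sets non-overlapping test sets; the working set is
    everything that is not in the current test set."""
    shuffled = list(restraints)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[k::n_sets] for k in range(n_sets)]
    for k, test_set in enumerate(folds):
        working_set = [r for m, fold in enumerate(folds) if m != k for r in fold]
        yield working_set, test_set


# 100 dummy restraint identifiers, for illustration only.
for working, test in cross_validation_sets(range(100)):
    # Here one would calculate structures from `working`
    # and evaluate the fit to `test`.
    print(len(working), len(test))
```

Averaging the test-set statistic over all partitions reduces the dependence of the result on one particular random choice of test set.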
18Uncertainty in structure coordinates
- X-ray crystallography: B-factor
- Quality of the crystal
- Dynamic behavior of the molecule
- Disorder
- NMR: atomic root mean square deviation (RMSD)
- Should reflect measured dynamics and the uncertainty in the experimental data
- Used as a measure for precision and accuracy
19Coordinate RMSDs
- Calculation requires superposition of structures
- Region dependent
- Use Circular Variance, CV?
- Structure selection criteria are subjective
- Allowed variation in structures depends on the force field
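Unlike a coordinate RMSD, the circular variance of a dihedral angle needs no superposition. The sketch below assumes the usual definition CV = 1 - R/N, where R is the length of the sum of the unit vectors of the N angles; the function name and example angles are illustrative.

```python
import math


def circular_variance(angles_deg):
    """Circular variance of a set of angles in degrees.
    0 means all angles are identical, 1 means they are spread evenly
    around the circle."""
    n = len(angles_deg)
    cos_sum = sum(math.cos(math.radians(a)) for a in angles_deg)
    sin_sum = sum(math.sin(math.radians(a)) for a in angles_deg)
    return 1.0 - math.hypot(cos_sum, sin_sum) / n


# The same dihedral angle in four models of an ensemble (hypothetical values).
print(circular_variance([-60.0, -62.0, -58.0, -61.0]))   # well defined
print(circular_variance([-60.0, 60.0, 180.0, -65.0]))    # poorly defined
```

Because the angles are treated as points on a circle, values such as 179° and -179° are correctly recognized as close together.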
20RMSD and Circular Variance
Arg55
Ser74
21Precision versus Accuracy (1)
- Precision is the variation of X around <X>
- expressed as standard deviation or variance
- Accuracy is the closeness of <X> to the true value of X
- Precision and accuracy are often mixed up in the literature
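The two definitions can be made concrete with a few numbers. In this sketch precision is taken as the population standard deviation around the mean, and accuracy is reported as the absolute difference between the mean and a true value that is assumed to be known; all values are invented.

```python
import statistics


def precision_and_accuracy(values, true_value):
    """Return (precision, accuracy_error): the standard deviation of the
    values around their mean, and the distance of the mean from the
    true value."""
    mean = statistics.fmean(values)
    precision = statistics.pstdev(values)
    accuracy_error = abs(mean - true_value)
    return precision, accuracy_error


true_value = 10.0
print(precision_and_accuracy([12.0, 12.1, 11.9], true_value))  # precise, not accurate
print(precision_and_accuracy([8.0, 12.0, 10.0], true_value))   # accurate, not precise
```

The first data set has a small spread but a mean far from the true value; the second has a mean on the true value but a large spread.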
22Precision versus Accuracy (2)
Precise, not accurate
Accurate, not precise
Precise and accurate
Not accurate and not precise
23Accuracy of NMR structures
- Accuracy can only be assessed when the true structure is known (Gold Standard)
- Only the case for simulated data-sets
- Sometimes X-ray structures are used
- Different experimental conditions
- Crystal contacts
- In some cases X-ray structures fit NMR data better than NMR structures
24Precision and true variance
Precision underestimates true variance
Precision equals true variance
Precision overestimates true variance
25Re-sampling of ensembles
Original ensemble (60 structures)
Re-sampled ensemble (60 structures)
Narrow bundle: low RMSD (0.38), high precision. Unrealistic error estimate
Wide bundle: high RMSD (0.94), low precision. More realistic error estimate
26Part II
27Validation of protein structure quality
- What type of properties are important?
- How can we check these properties?
- PROCHECK
- WHAT IF (FULLCHECK)
- How is the quality of the properties expressed?
- Z-scores, RMS Z-scores (WHAT IF)
28Validation criteria for protein structures
- Local geometry
- Bond lengths, bond angles, chirality, omega angles, side chain planarity
- Overall quality
- Ramachandran plot, rotameric states, packing quality, backbone conformation
- Others
- Inter-atomic bumps, buried hydrogen-bonds, electrostatics
29Bonded geometry
Distorted Cα-chirality
L-amino acid
D-amino acid
30Rotameric states
Eclipsed
Staggered
31Inter-atomic bumps
Overlap of two backbone atoms
32Omega angles
Trans-conformation (omega = 180°)
Cis-conformation (omega = 0°)
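A check on omega angles can be as simple as testing how far each angle is from 0° or 180°. The sketch below is illustrative only: the 30° tolerance is an arbitrary assumption, not a threshold taken from PROCHECK or WHAT IF.

```python
def classify_omega(omega_deg, tolerance=30.0):
    """Classify a peptide bond as cis, trans or distorted from its omega
    angle in degrees. The tolerance is a hypothetical cutoff."""
    omega = (omega_deg + 180.0) % 360.0 - 180.0  # wrap into [-180, 180)
    if abs(omega) <= tolerance:
        return "cis"
    if 180.0 - abs(omega) <= tolerance:
        return "trans"
    return "distorted"


for omega in (180.0, -175.0, 3.0, 95.0):
    print(omega, classify_omega(omega))
```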
33Side chain planarity
Planar ARG side-chain (Good)
34Internal hydrogen bonding
Internal hydrogen bonding in Crambin
35Electrostatics
After energy minimization including electrostatics
Bad electrostatics
36Packing quality
Good packing
Bad packing
37Backbone Conformation
Very normal
Very unique
38Ramachandran Plot
Phi and Psi angles
Ramachandran plot
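Phi and psi are ordinary dihedral angles: phi is defined by the atoms C(i-1), N(i), CA(i), C(i) and psi by N(i), CA(i), C(i), N(i+1). The following is a minimal sketch of a dihedral angle calculation from four Cartesian coordinates, using plain tuples; the helper names and test coordinates are made up for illustration.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees defined by four points, in (-180, 180]."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Components of b0 and b2 perpendicular to the central bond b1.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Artificial coordinates: the outer points are eclipsed (0°) or opposite (180°).
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)))
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1)))
```

Plotting the (phi, psi) pair of every residue gives the Ramachandran plot.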
39PROCHECK and WHAT IF
- PROCHECK and PROCHECK_NMR
- Very useful graphical and text output
- WHAT IF
- More checks and more critical checks
- The reference database of X-ray structures is continuously updated
40The WHAT IF reference set
- Overall quality
- Well-refined high-resolution X-ray structures (resolution < 2.0 Ångstrom, R-factor < 19%)
- Continuously updated
- Local geometric quality
- Cambridge small molecule database (CSD)
- Well refined high resolution X-ray structures
41A WHAT IF summary report
42Normal distributions and Z-scores
43Normal distributions and RMS Z-scores
RMS Z-score = 0.5
RMS Z-score = 1.0 (reference)
RMS Z-score = 2
44Z-scores and RMS Z-scores
- Structure Z-scores
- Z-scores > 0 are better than average
- Z-scores < 0 are worse than average
- However: a Z-score of -1 is just as likely as a Z-score of +1!
- Local geometry RMS Z-scores
- Too tight restraining of geometry → RMS Z-score < 1
- Too loose restraining of geometry → RMS Z-score > 1
- Proper Gaussian distribution → RMS Z-score = 1
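Both quantities follow directly from the mean and standard deviation of a reference distribution. The sketch below uses the generic definition Z = (value - mean) / standard deviation; the reference values are invented, and for quality scores the sign convention of a given program decides whether positive means better.

```python
import math


def z_score(value, ref_mean, ref_sd):
    """Number of standard deviations that value lies from the reference mean."""
    return (value - ref_mean) / ref_sd


def rms_z_score(values, ref_mean, ref_sd):
    """Root mean square of the Z-scores of a set of values."""
    zs = [z_score(v, ref_mean, ref_sd) for v in values]
    return math.sqrt(sum(z * z for z in zs) / len(zs))


# Hypothetical bond lengths (Å) against a reference of 1.50 ± 0.02 Å.
tight = [1.500, 1.502, 1.498]   # restrained too tightly
loose = [1.50, 1.56, 1.44]      # restrained too loosely
print(rms_z_score(tight, 1.50, 0.02))
print(rms_z_score(loose, 1.50, 0.02))
```

The first set gives an RMS Z-score well below 1 and the second well above 1, matching the interpretation given above.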
45Z-scores and dihedral angle distributions
Ramachandran
Chi-1/Chi-2
Z-score 1.8
Z-score 1.9
Z-score -8.5
Z-score -5.3
46RMS Z-scores and bond and angle distributions
RMS Z-score 0.96
RMS Z-score 0.97
RMS Z-score 0.22
RMS Z-score 0.01
47X-ray versus NMR structures
48NMR structures at the PDB
49Improving protein NMR structures
- Structures can be significantly improved by final refinement in explicit water
- A database of refined NMR structures is currently being built at the CMBI
50Data base potentials
- Improves the appearance of the quality of structures
- Ramachandran plot refinement
- Chi-1/Chi-2 rotamer refinement
- Use with caution!
51Practicals
- The practicals focus on the use of WHAT IF and PROCHECK only. Use these two programs to have a look at PDB entries 1i1s and 1ka3. What went wrong in these structures?
52Acknowledgements
- Nijmegen University: Sander Nabuurs, Elmar Krieger, Gert Vriend, Geerten Vuister
- BioMagResBank: Jurgen Doreleijers
- Utrecht University: Aart Nederveen, Alexandre Bonvin
- EBI, Cambridge: Wim Vranken