Title: NMR Structure Refinement
32nd Steenbock Symposium on Biochemistry, May 18-21, 2006, U of Wisconsin at Madison
Protein Structure Refinement via Database-Derived
Distance Constraints and Mean-Force Potentials
Zhijun Wu
Department of Mathematics, Program on Bioinformatics and Computational Biology
Iowa State University
Joint with Feng Cui, Di Wu, Robert Jernigan
Protein Structure Refinement
Refinement of protein structures determined by
X-ray crystallography, NMR, or comparative
modeling.
[Figure: comparative models from CASP6, with RMSD 4.57 Å, 3.76 Å, 2.81 Å, and 2.19 Å]
Distance-Based Approach
Determine the coordinates of the atoms (and hence the structure of the protein) from inter-atomic distances, together with chemical knowledge of bond lengths, bond angles, force fields, etc.
[Figure: inter-atomic distances → 3D structures]
- NMR can be considered a major distance-based approach.
- Distances are also used in X-ray structure determination (Patterson map).
- The approach can be used for structure determination as well as refinement.
Distance-Based Approach
Distance Geometry Problem
[Figure: inter-atomic distances → 3D structures]
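As an illustration of the distance geometry problem in its simplest form, where every pairwise distance is known exactly, the sketch below recovers 3D coordinates (up to rotation, reflection, and translation) by classical multidimensional scaling. This is one textbook solution chosen for brevity, not the method used in this work; NMR or database-derived data give only sparse, inexact distance bounds, which call for other algorithms. The sketch assumes NumPy is available and uses random, hypothetical coordinates.

```python
import numpy as np


def pairwise_distances(xyz: np.ndarray) -> np.ndarray:
    """Euclidean distance matrix for an (n, 3) coordinate array."""
    diff = xyz[:, None, :] - xyz[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def classical_mds(dist: np.ndarray, dim: int = 3) -> np.ndarray:
    """Recover coordinates from a complete, exact distance matrix.

    The result matches the original structure only up to rotation,
    reflection, and translation.
    """
    n = dist.shape[0]
    # Centering matrix J = I - (1/n) * ones.
    j = np.eye(n) - np.ones((n, n)) / n
    # Gram matrix of the centred coordinates: B = -1/2 J D^2 J.
    b = -0.5 * j @ (dist ** 2) @ j
    # eigh returns eigenvalues in ascending order; keep the largest `dim`.
    vals, vecs = np.linalg.eigh(b)
    order = np.argsort(vals)[::-1][:dim]
    # Clip tiny negative eigenvalues caused by round-off before taking sqrt.
    top_vals = np.clip(vals[order], 0.0, None)
    return vecs[:, order] * np.sqrt(top_vals)


# Hypothetical 5-atom example: keep only the distances, then rebuild.
rng = np.random.default_rng(0)
true_xyz = rng.normal(size=(5, 3))
d = pairwise_distances(true_xyz)
rebuilt = classical_mds(d)
print(np.allclose(pairwise_distances(rebuilt), d))  # True
```

The rebuilt coordinates differ from `true_xyz` by a rigid motion or reflection, but all inter-atomic distances agree, which is exactly what a distance-based method can determine.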
Deriving Statistical Distances from Databases of Known Protein Structures
Miyazawa, Jernigan 1985
cross-residue, inter-atomic distances
Sippl 1990
Wall, Subramaniam, Phillips 1999
Cui, Jernigan, WuZ 2005
WuD, Jernigan, WuZ 2006
Distributions of the Distances in Databases of Known Protein Structures
$P_{A_1,A_2,R_1,R_2,S}(D)$
[Figure: two residues with backbone atoms N, Cα, C, H, O and side chains R1, R2; example A1 = N, A2 = C, R1 = ALA, R2 = ALA, S = 0]
Distributions of the Distances in Databases of Known Protein Structures
$P_{A_1,A_2,R_1,R_2,S_1,S_2,S_3}(D)$: probability distribution of cross-residue inter-atomic distances in databases of known protein structures, where A1 = 1st atom, A2 = 2nd atom, R1 = 1st residue, R2 = 2nd residue, and S1, S2, S3 = separating residues.
For any $D \in [D_i, D_{i+1})$,
$$P_{A_1,A_2,R_1,R_2,S_1,S_2,S_3}(D) = \frac{\#\{\text{distances in } [D_i, D_{i+1})\}}{\#\{\text{distances in } [D_0, D_n)\}}$$
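The bin-counting estimate on this slide can be sketched in a few lines of Python. The record layout (a key such as (A1, A2, R1, R2, S) paired with one observed distance), the range [D_0, D_n), and the number of bins are hypothetical choices for illustration, not PIDD's actual parameters.

```python
import math
from collections import defaultdict


def distance_distributions(records, d0, dn, nbins):
    """Estimate binned distance distributions.

    records: iterable of (key, distance) pairs, where key identifies the
    atom/residue context, e.g. (A1, A2, R1, R2, S).
    Returns {key: [P_0, ..., P_{nbins-1}]}, where P_i is the fraction of that
    key's distances in [d0, dn) that fall in bin [d0 + i*w, d0 + (i+1)*w).
    """
    width = (dn - d0) / nbins
    counts = defaultdict(lambda: [0] * nbins)
    for key, d in records:
        if d0 <= d < dn:  # distances outside [d0, dn) are not profiled
            # min() guards against round-off pushing d just below dn into bin nbins.
            i = min(math.floor((d - d0) / width), nbins - 1)
            counts[key][i] += 1
    distributions = {}
    for key, cs in counts.items():
        total = sum(cs)
        distributions[key] = [c / total for c in cs]
    return distributions


# Hypothetical N-C distances (Å) between two alanines with S = 0.
key = ("N", "C", "ALA", "ALA", 0)
records = [(key, d) for d in (3.1, 3.4, 3.6, 4.2, 6.0)]
print(distance_distributions(records, 3.0, 5.0, 4)[key])  # [0.5, 0.25, 0.25, 0.0]
```

The distance 6.0 Å lies outside [3.0, 5.0) and is ignored, so the four remaining distances form the denominator, matching the ratio in the formula above.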
PIDD: A Database for Protein Inter-atomic Distance Distributions
- 2090 X-ray structures from the Protein Data Bank (PDB), with resolution of 2 Å or better and sequence similarity of 70% or less, are used.
- The probability distributions of 320,000,000 short-range, cross-residue, inter-atomic distances are profiled.
[Figure: structural database → distance database]
Web Interface
PIDD: http://www.math.iastate.edu/pidd
WuD, Cui, Jernigan, WuZ 2006
[Figure: PIDD web interface, with selections and graph display]
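Database-Derived Distance Constraints
Lower bound: $l = \mu - 2\sigma$
Upper bound: $u = \mu + 2\sigma$
Constraint: $l \le d \le u$
where $\mu$ and $\sigma$ are the mean and standard deviation of the database distance distribution.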
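A minimal sketch of turning database distances into bounds, assuming μ and σ are the sample mean and sample standard deviation of the distances observed for one atom/residue context. The violation function (zero inside [l, u], otherwise the distance to the nearer bound) is an illustrative choice, not necessarily the penalty form used in the CNS runs.

```python
import statistics


def database_bounds(distances, k=2.0):
    """Return (l, u) = (mean - k*std, mean + k*std) for one distance type."""
    mu = statistics.mean(distances)
    sigma = statistics.stdev(distances)  # sample standard deviation (assumption)
    return mu - k * sigma, mu + k * sigma


def violation(d, lower, upper):
    """How far a model distance d lies outside [lower, upper]; 0 if satisfied."""
    if d < lower:
        return lower - d
    if d > upper:
        return d - upper
    return 0.0


# Hypothetical database distances (Å) for one atom/residue context.
samples = [3.0, 3.2, 3.4]
lo, hi = database_bounds(samples)        # approximately (2.8, 3.6)
print(violation(3.1, lo, hi))            # 0.0
print(round(violation(4.0, lo, hi), 3))  # 0.4
```

With k = 2, roughly 95% of the database distances fall inside the bounds if the distribution is close to normal, so the constraint is loose enough to tolerate natural variation.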
Database-Derived Mean-Force Potentials
$P_{A_1,R_1,A_2,R_2,S_1,S_2,S_3}(D)$
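The slide gives only the distribution P(D). A common way to turn such a distribution into a potential of mean force, in the spirit of Sippl (1990) cited earlier, is the inverse Boltzmann relation E(D) = -kT ln P(D). The sketch below applies it bin by bin; the kT value in kcal/mol, the probability floor for empty bins, and the absence of a reference-state correction are all assumptions, and the authors' exact functional form is not stated here.

```python
import math

KT = 0.593  # kT in kcal/mol at about 298 K (assumed units)


def mean_force_potential(probs, kt=KT, floor=1e-6):
    """Inverse Boltzmann: E_i = -kT * ln(P_i), with empty bins floored."""
    return [-kt * math.log(max(p, floor)) for p in probs]


def energy_at(d, d0, width, energies):
    """Energy of distance d, looked up in bin [d0 + i*width, d0 + (i+1)*width)."""
    i = math.floor((d - d0) / width)
    if i < 0 or i >= len(energies):
        raise ValueError("distance outside the profiled range")
    return energies[i]


# Hypothetical bin probabilities over 3.0-5.0 Å with 0.5 Å bins.
energies = mean_force_potential([0.5, 0.25, 0.25, 0.0])
print(energy_at(3.2, 3.0, 0.5, energies) < energy_at(3.7, 3.0, 0.5, energies))  # True
```

More frequently observed distances receive lower energies, so minimizing this term pulls a model toward distances that are common in known structures.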
Refining NMR Structures with Database-Derived Distance Constraints
Cui, Jernigan, WuZ 2005
Ensemble RMSD Values against the Average Structures
Refined with the CNS simulated annealing protocol; NMR data from BioMagResBank.
Refining NMR Structures with Database-Derived Distance Constraints
Cui, Jernigan, WuZ 2005
Ensemble RMSD Values against the X-ray Reference Structures
Refined with the CNS simulated annealing protocol; NMR data from BioMagResBank.
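Both comparisons on these slides reduce to a coordinate RMSD. The sketch below assumes the models are already superimposed and that atoms correspond one-to-one (no fitting step such as the Kabsch algorithm is shown); it computes each model's RMSD to the ensemble average, and the same `rmsd` function can be applied against an X-ray reference.

```python
import math


def rmsd(a, b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z)."""
    if len(a) != len(b):
        raise ValueError("structures must have the same number of atoms")
    sq = sum((p - q) ** 2 for pa, pb in zip(a, b) for p, q in zip(pa, pb))
    return math.sqrt(sq / len(a))


def average_structure(models):
    """Atom-by-atom mean coordinates of a (pre-superimposed) ensemble."""
    n_models = len(models)
    return [
        tuple(sum(m[i][k] for m in models) / n_models for k in range(3))
        for i in range(len(models[0]))
    ]


def ensemble_rmsd_to_average(models):
    """RMSD of every ensemble member to the ensemble's average structure."""
    avg = average_structure(models)
    return [rmsd(m, avg) for m in models]


# Hypothetical two-model, two-atom ensemble.
model_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
model_b = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]
print(ensemble_rmsd_to_average([model_a, model_b]))  # [1.0, 1.0]
```

A small RMSD to the average indicates a tightly converged ensemble, while RMSD to an X-ray reference measures accuracy; the two can differ, which is why both are reported.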
Prion E200K
Critical Loop
Cui, Mukhopadhyay, Young, Jernigan, WuZ 2005
Refining NMR Structures with Database-Derived Mean-Force Potentials
WuD, Jernigan, WuZ 2006
Average Energy Values in the Structural Ensembles
Refined with the CNS distance geometry simulated annealing protocol; NMR data from BioMagResBank.
Refining NMR Structures with Database-Derived Mean-Force Potentials
WuD, Jernigan, WuZ 2006
Ensemble RMSD Values
[Figure: 1I6F ensembles, panels labeled CNS and PMF]
Refined with the CNS distance geometry simulated annealing protocol; NMR data from BioMagResBank.
A total of 70 NMR-determined structures were refined using the database-derived mean-force potentials.
- The energies of about 80% of the structures were significantly reduced (on average, by 7.5%) after refinement.
- On average, the percentage of residues in the most favorable region of the Ramachandran plot increased from 69.1% to 73.4%. The increase was observed in about 80% of the refined structures.
WuD, Jernigan, WuZ 2006
Average Results of Ramachandran Plots of 70 Refined Structures
Refinement of Comparative Models
CASPR 2006: Over the course of the CASP experiments, it has become clear that refinement of comparative models of protein structure is a major challenge. In spite of considerable effort, there has been little or no progress in a decade.
CASPR 2006 Target 4: hypothetical protein, 70 residues, RMSD 2.19 Å
Crystal structure: 1WHZ (resolution 1.52 Å)
Target structure: TMR04 (by the Baker group)
Refinement of Comparative Models
- A set of distance bounds is generated from the target structure by allowing some distances to be flexible by 20%. (S_EXP)
- A set of cross-residue distances is selected to be optimized by the database-derived mean-force potentials. (E_DB)
- A large set of energy minima is determined by running CHARMM in parallel on multiple processors. (E_TH)
- A small set of energy-minimized structures is refined using CNS with the generated distance constraints and the derived mean-force potentials. (S_TH, combining S_EXP, E_TH, and E_DB)
- The final structure is selected from the structures generated by CNS based on energy, Ramachandran plots, and RMSD.
Refinement of Comparative Models
Ravindrudu, WuD, Gunaratne, Jernigan, WuZ 2006
Refined model: RMSD 1.80 Å
CASPR 2006 Target 4: hypothetical protein, 70 residues, RMSD 2.19 Å
Crystal structure: 1WHZ (resolution 1.52 Å)
Target structure: TMR04 (by the Baker group)
Remarks
- Protein structures can be refined using database-derived distance constraints and mean-force potentials.
- The resulting structures are still theoretical models and should be evaluated carefully and used with caution.
- To increase accuracy, distributions of distances under more specific conditions should be considered.
- How to evaluate the refined structures or structural ensembles remains to be investigated (energy? φ-ψ plots? RMSD?).