Title: DISTANCE MATRIX-BASED APPROACH TO PROTEIN STRUCTURE PREDICTION
- Andrzej Kloczkowski, Robert L. Jernigan, Zhijun Wu, Guang Song, Lei Yang (Iowa State University, USA)
- Andrzej Kolinski, Piotr Pokarowski (Warsaw University, Poland)
Matrices containing structural information
- Distance matrix $(d_{ij})$
- Matrix of square distances $D = (d_{ij}^2)$
- Contact matrix $C = (c_{ij})$
- $c_{ij} = 1$ if $d_{ij} \le d_{cutoff}$
- otherwise $c_{ij} = 0$
- Laplacian of $C$ (Kirchhoff matrix)
- $L_C = \mathrm{diag}\left(\sum_j c_{ij}\right) - C$
- $L_C^{-1}$, the generalized inverse of $L_C$, defines the covariance between residue fluctuations in elastic network models
- Similarly, we can define the Laplacian of $D$, $L_D$, and its generalized inverse $L_D^{-1}$
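The matrix definitions above can be made concrete in a few lines. This is a minimal sketch, assuming NumPy, a hypothetical toy chain of points spaced 3.8 Angstrom apart, and an assumed contact cutoff of 7 Angstrom; the Moore-Penrose pseudo-inverse is used as the generalized inverse, which is one common choice rather than necessarily the authors' one.

```python
import numpy as np

def distance_matrix(coords):
    """Pairwise Euclidean distances d_ij for an (N, 3) coordinate array."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def contact_matrix(dist, d_cutoff=7.0):
    """c_ij = 1 if d_ij <= d_cutoff and i != j, else 0 (the cutoff value is an assumption)."""
    c = (dist <= d_cutoff).astype(float)
    np.fill_diagonal(c, 0.0)
    return c

def laplacian(a):
    """Laplacian diag(sum_j a_ij) - A of a symmetric matrix A."""
    return np.diag(a.sum(axis=1)) - a

# Hypothetical toy chain: 30 points, consecutive ones 3.8 Angstrom apart
rng = np.random.default_rng(0)
steps = rng.normal(size=(30, 3))
steps *= 3.8 / np.linalg.norm(steps, axis=1, keepdims=True)
coords = np.cumsum(steps, axis=0)

dist = distance_matrix(coords)
L_C = laplacian(contact_matrix(dist))          # Kirchhoff matrix
L_D = laplacian(dist ** 2)                     # Laplacian of the square distance matrix D
L_C_inv = np.linalg.pinv(L_C, rcond=1e-10)     # generalized (pseudo-)inverse of L_C
L_D_inv = np.linalg.pinv(L_D, rcond=1e-10)     # generalized (pseudo-)inverse of L_D
```

Both Laplacians annihilate the vector of ones (rigid translation), which is why an ordinary inverse does not exist and a generalized inverse is needed.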
Spectral decomposition of structural matrices
- A symmetric matrix $A$ is expressed by its eigenvalues $\lambda_k$ and the corresponding eigenvectors $v_k$: $A = \sum_k \lambda_k v_k v_k^T$
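A minimal sketch of the decomposition, assuming NumPy: `spectral_terms` and `reconstruct` are hypothetical helper names. Terms are ordered by decreasing $|\lambda_k|$ so that truncating the sum keeps the largest contributions.

```python
import numpy as np

def spectral_terms(a):
    """Eigenvalues lambda_k and eigenvectors v_k (columns) of a symmetric matrix,
    ordered by decreasing |lambda_k|."""
    lam, vecs = np.linalg.eigh(a)
    order = np.argsort(-np.abs(lam))
    return lam[order], vecs[:, order]

def reconstruct(lam, vecs, k=None):
    """Sum of the first k terms lambda_k v_k v_k^T (all terms if k is None)."""
    k = len(lam) if k is None else k
    return (vecs[:, :k] * lam[:k]) @ vecs[:, :k].T

rng = np.random.default_rng(1)
b = rng.normal(size=(6, 6))
a = b + b.T                          # any symmetric matrix
lam, vecs = spectral_terms(a)
a_full = reconstruct(lam, vecs)      # equals a up to rounding
```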
Spectral decomposition of a square distance matrix
- The spectral decomposition of a square distance matrix is a complete and simple description of a system of points. It has at most 5 nonzero, interpretable terms.
- The dominant eigenvector is proportional to $r^2$, the square distance of the points to the center of mass, and the next three are the principal components of the system of points.
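The "at most 5 nonzero terms" statement can be checked numerically: for points in 3-D, $d_{ij}^2 = r_i^2 + r_j^2 - 2\,r_i \cdot r_j$, so $D$ has rank at most 5. A minimal sketch, assuming NumPy and a hypothetical random point cloud; it also reports the correlation of the dominant eigenvector with $r^2$ rather than asserting exact proportionality.

```python
import numpy as np

rng = np.random.default_rng(2)
coords = rng.normal(scale=10.0, size=(50, 3))      # hypothetical 3-D point cloud
centered = coords - coords.mean(axis=0)

diff = coords[:, None, :] - coords[None, :, :]
D = (diff ** 2).sum(axis=-1)                       # square distance matrix d_ij^2

lam, vecs = np.linalg.eigh(D)
order = np.argsort(-np.abs(lam))
lam, vecs = lam[order], vecs[:, order]

tol = 1e-9 * np.abs(lam).max()
n_nonzero = int((np.abs(lam) > tol).sum())         # at most 5 for points in 3-D
n_positive = int((lam > tol).sum())                # a square distance matrix has exactly one

r2 = (centered ** 2).sum(axis=1)                   # square distance to the center of mass
# Largest |lambda| term: the single positive eigenvalue, since trace(D) = 0
v_dominant = vecs[:, 0]
corr = abs(np.corrcoef(v_dominant, r2)[0, 1])      # how closely it tracks r^2
```

The remaining four nonzero eigenvalues are negative; three of them come from the $-2\,r_i \cdot r_j$ term, which carries the principal components of the point cloud.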
- CN: contact number
- PECM: principal eigenvector of the contact matrix
- GNM: fluctuations of residues computed from the Gaussian Network Model (Bahar et al. 1997)
- SVR: Support Vector Regression, a variant of SVM for continuous variables
- B-factor: temperature factor from X-ray crystallography
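The structure-based profiles listed above (CN, PECM, GNM) can all be derived from one contact matrix. A minimal sketch, assuming NumPy, C-alpha coordinates, an assumed 7 Angstrom cutoff, and a hypothetical toy chain; `residue_profiles` is a hypothetical helper, and the GNM values are given only up to the factor $3kT/\gamma$.

```python
import numpy as np

def residue_profiles(coords, d_cutoff=7.0):
    """Per-residue CN, PECM and GNM profiles from (N, 3) C-alpha coordinates."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    contacts = (dist <= d_cutoff).astype(float)
    np.fill_diagonal(contacts, 0.0)

    cn = contacts.sum(axis=1)                        # CN: number of contacts per residue

    lam, vecs = np.linalg.eigh(contacts)
    pecm = vecs[:, np.argmax(lam)]                   # PECM: eigenvector of the largest eigenvalue
    pecm = pecm * np.sign(pecm.sum())                # eigenvector sign is arbitrary; make its sum positive

    kirchhoff = np.diag(cn) - contacts
    gnm = np.diag(np.linalg.pinv(kirchhoff, rcond=1e-10))  # GNM <dR_i^2>, up to 3kT/gamma
    return cn, pecm, gnm

# Hypothetical toy chain with 3.8 Angstrom steps
rng = np.random.default_rng(0)
steps = rng.normal(size=(40, 3))
steps *= 3.8 / np.linalg.norm(steps, axis=1, keepdims=True)
coords = np.cumsum(steps, axis=0)
cn, pecm, gnm = residue_profiles(coords)
```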
- The B-factor correlates with $r^2$, the square distance from the center of mass (Petsko 1980)
- Fluctuations of residues correlate with the inverse of their contact number (Halle 2002)
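Both correlations are plain Pearson coefficients between per-residue profiles. A minimal sketch, assuming NumPy, an assumed 7 Angstrom contact cutoff, and that every residue has at least one contact; `profile_correlations` is a hypothetical helper, and any measured B-factor array of matching length can be passed in.

```python
import numpy as np

def profile_correlations(coords, b_factors, d_cutoff=7.0):
    """Pearson correlations of a per-residue B-factor profile with r^2 and with 1/CN."""
    centered = coords - coords.mean(axis=0)
    r2 = (centered ** 2).sum(axis=1)                 # square distance to the center of mass
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    cn = (dist <= d_cutoff).sum(axis=1) - 1          # contact number, excluding the residue itself
    # Assumes cn > 0 for every residue, otherwise 1/cn is infinite
    corr_r2 = np.corrcoef(b_factors, r2)[0, 1]
    corr_inv_cn = np.corrcoef(b_factors, 1.0 / cn)[0, 1]
    return float(corr_r2), float(corr_inv_cn)
```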
Approximation of distance matrices
- $A = \sum_k \lambda_k v_k v_k^T$
- We used a nonredundant set of 680 structures from the ASTRAL database
- $r^2$ alone approximates the structures with a DRMS of 7.3 Å
- $r^2$ combined with the first principal component approximates the structures with a DRMS of 4.0 Å
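The slides do not spell out how a structure is rebuilt from the leading terms. A minimal sketch, assuming the approximation simply truncates the spectral sum of the square distance matrix to the $k$ terms with the largest $|\lambda_k|$ and measures DRMS between true and approximated distances; the toy structure is hypothetical, not ASTRAL data, and its numbers are not comparable to 7.3 Å or 4.0 Å.

```python
import numpy as np

def square_distances(coords):
    diff = coords[:, None, :] - coords[None, :, :]
    return (diff ** 2).sum(axis=-1)

def truncated(a, k):
    """Sum of the k terms lambda_k v_k v_k^T with the largest |lambda_k|."""
    lam, vecs = np.linalg.eigh(a)
    idx = np.argsort(-np.abs(lam))[:k]
    return (vecs[:, idx] * lam[idx]) @ vecs[:, idx].T

def drms(d2_true, d2_approx):
    """DRMS over pairs i < j; negative approximate square distances are clipped to 0."""
    iu = np.triu_indices_from(d2_true, k=1)
    d_true = np.sqrt(d2_true[iu])
    d_approx = np.sqrt(np.clip(d2_approx[iu], 0.0, None))
    return float(np.sqrt(np.mean((d_true - d_approx) ** 2)))

# Hypothetical elongated toy structure
rng = np.random.default_rng(4)
coords = rng.normal(size=(60, 3)) * np.array([15.0, 6.0, 3.0])
D = square_distances(coords)
errors = {k: drms(D, truncated(D, k)) for k in range(1, 6)}
```

With all five nonzero terms the DRMS drops to rounding error, since the truncated sum is then the exact matrix.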
Current work
- Prediction of $r^2$ from the sequence with SVR
- Prediction of the first structural component from the sequence
Principal Component Analysis of Multiple HIV-1 Protease Structures
- We analysed 164 X-ray PDB structures, 28 NMR PDB structures, and 10,000 structures (snapshots) from molecular dynamics simulations.
- Principal component analysis was performed on each of these three datasets.
- The results were compared with normal modes computed from the Anisotropic Network Model (ANM), an elastic network model that accounts for the anisotropy of residue fluctuations in proteins.
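PCA of a structural ensemble diagonalizes the covariance of coordinate deviations from the average structure. A minimal sketch, assuming NumPy and an ensemble that has already been superimposed; `ensemble_pca` is a hypothetical helper and the toy ensemble is synthetic, built so that one known direction dominates.

```python
import numpy as np

def ensemble_pca(ensemble):
    """PCA of superimposed structures, shape (M, N, 3).
    Returns variances (descending) and principal components as columns of a (3N, 3N) array."""
    m = ensemble.shape[0]
    x = ensemble.reshape(m, -1)               # each structure flattened to a 3N vector
    x = x - x.mean(axis=0)                    # deviations from the average structure
    cov = x.T @ x / (m - 1)                   # 3N x 3N covariance matrix
    var, pcs = np.linalg.eigh(cov)
    order = np.argsort(var)[::-1]
    return var[order], pcs[:, order]

# Hypothetical ensemble: 200 copies of a 10-residue structure displaced along direction u
rng = np.random.default_rng(5)
base = rng.normal(scale=5.0, size=(10, 3))
u = rng.normal(size=30)
u /= np.linalg.norm(u)
amplitudes = rng.normal(scale=2.0, size=(200, 1))
ensemble = base + (amplitudes * u).reshape(200, 10, 3) + rng.normal(scale=0.05, size=(200, 10, 3))
var, pcs = ensemble_pca(ensemble)
```

The first principal component recovers the built-in direction `u`; with real X-ray, NMR or MD ensembles the same routine is applied after superposition.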
The α-carbon trace of the HIV-1 structure
Elastic network models
- Rubber elasticity (polymers: Flory)
- Intrinsic motions of structures (Tirion 1996)
- Simple elastic networks of uniform material
- Appropriate for the largest, most important domain motions of proteins
- Independent of many structural details
- High-resolution structures are not needed to learn about the important motions
Rubbery Bodies with Well-Defined, Highly Controlled Motions
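Elastic Network Models: Calculating Protein Position Fluctuations
- $V_{tot}(t) = \frac{\gamma}{2}\,\mathrm{tr}\left[\Delta R(t)^T\,\Gamma\,\Delta R(t)\right]$
- $\langle \Delta R_i \cdot \Delta R_j \rangle = \frac{1}{Z_N}\int (\Delta R_i \cdot \Delta R_j)\,\exp\left(-V_{tot}/kT\right)\,d\{\Delta R\} = \frac{3kT}{\gamma}\left[\Gamma^{-1}\right]_{ij}$
- $\Gamma$: Kirchhoff matrix of contacts ($\Gamma^{-1}$ is its generalized inverse)
Compute Normal Modes for Fluctuations and Correlations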
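Normal modes turn the covariance formula into a sum over the nonzero eigenpairs of $\Gamma$: $[\Gamma^{-1}]_{ij} = \sum_k v_{k,i} v_{k,j} / \lambda_k$, and keeping only the slowest modes isolates the dominant motions. A minimal sketch, assuming NumPy; `gnm_modes` and `gnm_covariance` are hypothetical helpers and the 6-residue ring is a toy network.

```python
import numpy as np

def gnm_modes(kirchhoff, tol=1e-8):
    """Nonzero eigenvalues (ascending: slowest modes first) and eigenvectors of Gamma."""
    lam, vecs = np.linalg.eigh(kirchhoff)
    keep = lam > tol * lam.max()              # drop the zero mode(s)
    return lam[keep], vecs[:, keep]

def gnm_covariance(kirchhoff, kT_over_gamma=1.0, n_modes=None):
    """<dR_i . dR_j> = (3kT/gamma) sum_k v_k v_k^T / lambda_k over the n_modes slowest modes."""
    lam, vecs = gnm_modes(kirchhoff)
    if n_modes is not None:
        lam, vecs = lam[:n_modes], vecs[:, :n_modes]
    return 3.0 * kT_over_gamma * (vecs / lam) @ vecs.T

# Hypothetical Kirchhoff matrix of a 6-residue ring: each residue contacts its two neighbors
n = 6
contacts = np.zeros((n, n))
for i in range(n):
    contacts[i, (i + 1) % n] = contacts[(i + 1) % n, i] = 1.0
gamma_matrix = np.diag(contacts.sum(axis=1)) - contacts
cov = gnm_covariance(gamma_matrix)
msf = np.diag(cov)                            # mean-square fluctuation of each residue
```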
HIV Reverse Transcriptase: Slowest Motion
Push-pull hinge
Modes of Motion: HIV Protease
Mode 1, Mode 2, Mode 3: three ways to open the flaps
NMR Structures Fit Elastic Networks Better than X-Ray Structures
- HIV protease: overlaps between directions of motion (dot products of vectors)
- Includes many drug-bound structures: distortions for drug binding are intrinsic to the protein structure
- Results for 164 X-ray and 28 NMR HIV protease structures
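The ANM directions of motion come from a $3N \times 3N$ Hessian, and the overlap with a PCA direction is a normalized dot product. A minimal sketch, assuming NumPy, the standard ANM form with uniform spring constant $\gamma$, and an assumed 15 Angstrom cutoff; `anm_hessian` and `overlap` are hypothetical helpers.

```python
import numpy as np

def anm_hessian(coords, cutoff=15.0, gamma=1.0):
    """3N x 3N ANM Hessian: off-diagonal super-elements -gamma * r_ij r_ij^T / |r_ij|^2."""
    n = len(coords)
    hess = np.zeros((3 * n, 3 * n))
    for i in range(n):
        for j in range(i + 1, n):
            rij = coords[j] - coords[i]
            d2 = rij @ rij
            if d2 > cutoff ** 2:
                continue
            block = -gamma * np.outer(rij, rij) / d2
            hess[3*i:3*i+3, 3*j:3*j+3] = block
            hess[3*j:3*j+3, 3*i:3*i+3] = block
            hess[3*i:3*i+3, 3*i:3*i+3] -= block    # diagonal super-elements: minus the row sum
            hess[3*j:3*j+3, 3*j:3*j+3] -= block
    return hess

def overlap(u, v):
    """|cos| of the angle between two directions of motion (normalized dot product)."""
    return float(abs(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v)))
```

The eigenvectors of the Hessian, after discarding the six zero modes (rigid translations and rotations), are the ANM modes compared with the PCA directions.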
Cumulative Overlaps with NMR Motions
Agreement with NMR structures is better than with X-ray structures
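A cumulative overlap measures how much of one direction of motion is captured by the first $k$ modes, using the usual definition $CO(k) = \sqrt{\sum_{j \le k} (p \cdot a_j)^2}$ with $p$ normalized and the modes $a_j$ orthonormal. A minimal sketch, assuming NumPy; `cumulative_overlap` is a hypothetical helper.

```python
import numpy as np

def cumulative_overlap(p, modes):
    """CO(k) = sqrt(sum_{j<=k} (p . a_j)^2) for k = 1..K.
    p: a direction of motion (e.g. an NMR principal component);
    modes: (3N, K) array whose columns are orthonormal modes a_j."""
    p = p / np.linalg.norm(p)
    return np.sqrt(np.cumsum((modes.T @ p) ** 2))
```

With a complete orthonormal basis the last value is 1, so the curve shows how quickly the slowest modes account for the observed motion.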
Structural Refinement Using Distribution of Distances
- We have developed a method of refining NMR structures using derived distance constraints and mean-force potentials.
- The original NMR experimental constraints for the structures were downloaded from BioMagResBank.
- The structures were refined using the default dynamic simulated annealing protocol implemented in the CNS software (Brunger et al., Yale University).
- We also added mean-force potentials $E = -kT \ln P(r)$ to the energy function of the NMR modeling software CNS. After refinement with these database-derived mean-force potentials, the structures improved significantly (in terms of RMSD, energy, NOEs, etc.).
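The potential $E = -kT \ln P(r)$ can be tabulated from a histogram of observed distances. A minimal sketch, assuming NumPy, kT in kcal/mol at roughly 298 K, and a pseudocount of 1 per bin so that empty bins get a finite energy; the helper name and data are hypothetical, and many statistical potentials additionally divide by a reference distribution, which the formula on the slide does not show.

```python
import numpy as np

def mean_force_potential(distances, bins, kT=0.593, pseudocount=1.0):
    """E(r) = -kT ln P(r) from a histogram of observed distances.
    kT = 0.593 kcal/mol corresponds to roughly 298 K (an assumed choice of units)."""
    counts, edges = np.histogram(distances, bins=bins)
    p = (counts + pseudocount) / (counts.sum() + pseudocount * len(counts))  # smoothed P(r)
    energy = -kT * np.log(p)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, energy

# Hypothetical observed distances (Angstrom) for one atom-pair type
distances = np.array([5.1] * 50 + [7.3] * 5)
centers, energy = mean_force_potential(distances, bins=np.arange(4.0, 9.0, 1.0))
```

Frequently observed distances get low energies, so adding such terms to the energy function pulls the refined structure toward database-typical geometry.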
CASPR 2006
- We have successfully used this method in the CASPR 2006 structure refinement experiment.
- The figure below shows an application of our method to a model of 1WHZ (70 residues): a refinement from 2.19 Å to 1.80 Å was obtained.
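Refinement results such as the 1WHZ numbers are quoted in Å; assuming they are coordinate RMSDs after optimal superposition, they can be computed with the Kabsch algorithm. A minimal sketch, assuming NumPy; `kabsch_rmsd` is a hypothetical helper, not the CASPR assessment code.

```python
import numpy as np

def kabsch_rmsd(x, y):
    """RMSD between (N, 3) coordinate sets after optimal superposition (Kabsch algorithm)."""
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    h = x.T @ y                                    # 3x3 cross-covariance matrix
    u, s, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))         # -1 would mean a reflection; correct it
    corr = np.diag([1.0, 1.0, d])
    rot = vt.T @ corr @ u.T                        # proper rotation mapping x onto y
    diff = x @ rot.T - y
    return float(np.sqrt((diff ** 2).sum() / len(x)))
```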
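Distance Intervals
- The distances between atoms $i$ and $j$ are given with their possible ranges, $l_{i,j} \le d_{i,j} \le u_{i,j}$.
- Finding coordinates that satisfy such interval constraints is NP-hard!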
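Because the problem is NP-hard, practical solvers rely on heuristics. A minimal sketch, assuming NumPy and SciPy, of one such heuristic (a swapped-in technique, not the authors' method): minimize the summed squared violation of the interval bounds from a random start with `scipy.optimize.minimize`. It finds a local minimum only; `interval_violation` and `solve_intervals` are hypothetical helpers.

```python
import numpy as np
from scipy.optimize import minimize

def interval_violation(flat_x, pairs, lower, upper):
    """Sum of squared violations of l_ij <= ||x_i - x_j|| <= u_ij over the given pairs."""
    x = flat_x.reshape(-1, 3)
    d = np.linalg.norm(x[pairs[:, 0]] - x[pairs[:, 1]], axis=1)
    return float(np.sum(np.clip(lower - d, 0.0, None) ** 2 + np.clip(d - upper, 0.0, None) ** 2))

def solve_intervals(pairs, lower, upper, n_atoms, seed=0):
    """Local minimization of the violation from a random start (a heuristic, not guaranteed)."""
    x0 = np.random.default_rng(seed).normal(scale=5.0, size=3 * n_atoms)
    res = minimize(interval_violation, x0, args=(pairs, lower, upper), method="L-BFGS-B")
    return res.x.reshape(-1, 3), float(res.fun)
```

A zero violation certifies a valid embedding; a positive one says nothing about whether a solution exists, which is the practical face of NP-hardness.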
A Generalized Distance Geometry Problem
Figure: atoms $i$ and $j$ separated by $d_{i,j}$, with fluctuation radii $\Delta r_i$ and $\Delta r_j$ given by root mean square fluctuations (B-factors)
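The slide links fluctuation radii to B-factors without giving the conversion. A minimal sketch, assuming NumPy and the standard isotropic relation $B = \frac{8\pi^2}{3}\langle \Delta R^2 \rangle$, so the RMS fluctuation is $\sqrt{3B/(8\pi^2)}$; `rms_fluctuation_from_b` is a hypothetical helper.

```python
import numpy as np

def rms_fluctuation_from_b(b_factors):
    """RMS fluctuation radius from isotropic B-factors: B = (8 pi^2 / 3) <dR^2>.
    B in Angstrom^2 gives the radius in Angstrom."""
    b = np.asarray(b_factors, dtype=float)
    return np.sqrt(3.0 * b / (8.0 * np.pi ** 2))

# Hypothetical B-factors (Angstrom^2) for three atoms
radii = rms_fluctuation_from_b([10.0, 20.0, 40.0])
```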
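Protein 1AX8
- Data generation (original): $f_i$ is the rms fluctuation of atom $i$; $S = \{(i,j) : d_{i,j} = \|y_i - y_j\| < 5\ \text{Å}\}$, $l_{i,j} = d_{i,j} - f_i - f_j$, $u_{i,j} = d_{i,j} + f_i + f_j$
- Problem solved (computed): $\Delta r_i$ is the fluctuation radius of atom $i$; $\max_{x,\,\Delta r} \sum_i \Delta r_i^3$ subject to $d_{i,j} = \|x_i - x_j\|$, $l_{i,j} \le d_{i,j} - \Delta r_i - \Delta r_j$, $d_{i,j} + \Delta r_i + \Delta r_j \le u_{i,j}$, $(i,j) \in S$
- Result: $\mathrm{RMSD}(x, y) = 3.6 \times 10^{-7}$ for 1017 atoms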
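The data generation and the constraints of the generalized problem can be written down directly. A minimal sketch, assuming NumPy; `generate_bounds`, `is_feasible` and `objective` are hypothetical helpers that only build and check the constraints, not the authors' optimizer. By construction the original coordinates $y$ with radii $f$ satisfy every constraint.

```python
import numpy as np

def generate_bounds(y, f, cutoff=5.0):
    """S = {(i, j): d_ij < cutoff}; l_ij = d_ij - f_i - f_j, u_ij = d_ij + f_i + f_j."""
    i, j = np.triu_indices(len(y), k=1)
    d = np.linalg.norm(y[i] - y[j], axis=1)
    keep = d < cutoff
    i, j, d = i[keep], j[keep], d[keep]
    return np.stack([i, j], axis=1), d - f[i] - f[j], d + f[i] + f[j]

def is_feasible(x, dr, pairs, lower, upper, tol=1e-9):
    """Check l_ij <= d_ij - dr_i - dr_j and d_ij + dr_i + dr_j <= u_ij for all (i, j) in S."""
    i, j = pairs[:, 0], pairs[:, 1]
    d = np.linalg.norm(x[i] - x[j], axis=1)
    return bool(np.all(lower <= d - dr[i] - dr[j] + tol) and np.all(d + dr[i] + dr[j] <= upper + tol))

def objective(dr):
    """Quantity maximized in the generalized problem: sum of dr_i^3."""
    return float(np.sum(dr ** 3))
```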
Atomic Fluctuations
Figure: original fluctuations $f_i$ compared with computed fluctuation radii $\Delta r_i$
Acknowledgments
- NIH support
- 1R01GM081680-01 (AKlo)
- 1R01GM073095-01A2 (RLJ)
- 1R01GM072014-01 (RLJ)