Title: Protein Structure Prediction and Determination
1 Protein Structure Prediction and Determination
- Zhijun Wu
- Department of Mathematics
- Iowa State University
2 Biological Building Blocks
DNA
GAA GTT GAA AAT CAG GCG AAC CCA CGA CTG
RNA
GAA GUU GAA AAU CAG GCG AAC CCA CGA CUG
PROTEIN
GLU VAL GLU ASN GLN ALA ASN PRO ARG LEU
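The DNA to RNA to protein mapping on this slide can be sketched in a few lines. This is a minimal illustration, assuming the DNA shown is the coding strand (so transcription amounts to replacing T with U). The names `CODON_TO_RESIDUE`, `transcribe`, and `translate` are hypothetical, and the codon table is deliberately partial: it covers only the codons that appear on the slide.

```python
# Partial codon table (assumption: only the RNA codons used on this slide).
CODON_TO_RESIDUE = {
    "GAA": "GLU", "GUU": "VAL", "AAU": "ASN", "CAG": "GLN",
    "GCG": "ALA", "AAC": "ASN", "CCA": "PRO", "CGA": "ARG", "CUG": "LEU",
}

def transcribe(dna_codons):
    """Map coding-strand DNA codons to RNA codons by replacing T with U."""
    return [codon.replace("T", "U") for codon in dna_codons]

def translate(rna_codons):
    """Look up the three-letter residue name for each RNA codon."""
    return [CODON_TO_RESIDUE[codon] for codon in rna_codons]

dna = "GAA GTT GAA AAT CAG GCG AAC CCA CGA CTG".split()
rna = transcribe(dna)
protein = translate(rna)
print(" ".join(rna))
print(" ".join(protein))
```

A full translation would need all 64 codons plus start and stop handling, which this sketch leaves out.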
3 Protein Folding
[Figure: residue labels (GLU, VAL, GLU, ASN, GLN, ALA, ASN, PRO, ARG, LEU, . . .) placed along the protein chain]
4 Examples
Myoglobin, John Kendrew, 1962, Nobel Prize in Chemistry
Prion, Stanley B. Prusiner, 1997, Nobel Prize in Physiology or Medicine
5 Methods for Structure Prediction and Determination
Potential Energy Minimization
Molecular Dynamics Simulation
Homology Modeling, Fold Recognition, Inverse
Protein Folding
Nuclear Magnetic Resonance
Protein X-ray Crystallography
6 X-ray Crystallography Computing
- 80% of the structures in the Protein Data Bank (PDB) were determined using X-ray crystallography.
- The process is time consuming, and some proteins cannot be crystallized at all.
In X-ray crystallography, the protein must first be purified and crystallized, which may take months or years and may fail altogether.
- A mathematical problem, called the phase problem, needs to be solved before any crystal structure can be fully determined from the diffraction data.
Once crystallized, the protein crystal is placed in an X-ray apparatus to record an X-ray diffraction image. The diffraction image can then be used to determine the three-dimensional structure of the protein.
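The phase problem mentioned on this slide can be illustrated with a toy calculation. The sketch below assumes a one-dimensional periodic "density" and treats its discrete Fourier transform as the structure factors; a diffraction experiment records only their magnitudes. The densities and the function names `structure_factors` and `amplitudes` are hypothetical, chosen for illustration, and this is not a crystallographic computation.

```python
import cmath

def structure_factors(density):
    """Discrete Fourier transform of a 1D periodic density (toy structure factors)."""
    n = len(density)
    return [
        sum(rho * cmath.exp(-2j * cmath.pi * h * x / n) for x, rho in enumerate(density))
        for h in range(n)
    ]

def amplitudes(density):
    """Magnitudes |F(h)| only: the phase of each structure factor is discarded."""
    return [abs(f) for f in structure_factors(density)]

density_a = [0.0, 3.0, 1.0, 0.0, 0.0, 2.0, 0.0, 0.0]
density_b = density_a[::-1]  # mirror image: a different density

# Compare the amplitudes of the two densities term by term.
same_amplitudes = all(
    abs(a - b) < 1e-9 for a, b in zip(amplitudes(density_a), amplitudes(density_b))
)
print(density_a != density_b, same_amplitudes)
```

Two different densities give identical amplitudes here, so the amplitudes alone do not fix the structure; the missing phases have to be recovered by other means.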
7 NMR Structure Determination
- 15% of the structures in the Protein Data Bank (PDB) were determined using NMR spectroscopy.
The NMR approach is based on the fact that nuclei spin and generate magnetic fields. When two nuclei are close, their spins interact, and the intensity of the interaction depends on the distance between the nuclei. The distances between certain pairs of atoms can therefore be estimated by measuring the intensities of the nuclear spin-spin couplings.
- Not all distances between pairs of atoms can be detected. In practice, only lower and upper bounds on the distances can be obtained.
- The structure can be determined by solving a distance geometry problem with the distance data from the NMR experiments.
The distance data obtained from the NMR experiment can be used to deduce structural information about the molecule. One way of doing so is based on molecular distance geometry.
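Because NMR yields only lower and upper bounds for some atom pairs, a common preprocessing idea in distance geometry is to tighten the bounds with the triangle inequality before looking for coordinates. The sketch below is one simple version of this bound smoothing, not necessarily the procedure used in any particular software. The function name `smooth_bounds` and the three-atom bounds are hypothetical.

```python
import math

def smooth_bounds(lower, upper):
    """Tighten symmetric distance-bound matrices in place via the triangle inequality."""
    n = len(upper)
    # Upper bounds: d(i, j) <= d(i, k) + d(k, j).
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if upper[i][j] > upper[i][k] + upper[k][j]:
                    upper[i][j] = upper[i][k] + upper[k][j]
    # Lower bounds: d(i, j) >= d(i, k) - d(k, j), and the symmetric case.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                gap = max(lower[i][k] - upper[k][j], lower[k][j] - upper[i][k])
                if lower[i][j] < gap:
                    lower[i][j] = gap
    return lower, upper

# Hypothetical bounds: pairs (0, 1) and (1, 2) measured, pair (0, 2) undetected.
lower = [[0.0, 4.0, 0.0], [4.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
upper = [[0.0, 5.0, math.inf], [5.0, 0.0, 1.5], [math.inf, 1.5, 0.0]]
smooth_bounds(lower, upper)
print(lower[0][2], upper[0][2])
```

Here the undetected pair (0, 2) inherits bounds from the two measured pairs. Finding coordinates that satisfy all bounds is the harder step and is not attempted in this sketch.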
8 Potential Energy Minimization
- A reasonably accurate potential energy function needs to be constructed.
- Given such a function, a local minimizer is easy to find, but a global one is hard to find, especially if the function has many local minimizers. No completely satisfactory algorithm has yet been developed for minimizing protein potential energy.
Hypothesis: The native structure of a protein has the lowest, or nearly the lowest, potential energy. It can therefore be located at the global minimum of the protein's potential energy.
- Potential energy minimization has, however, been used successfully for structure refinement.
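The gap between local and global minimization can be seen even in one dimension. The sketch below assumes a hypothetical two-well function `energy` (not a protein force field) and uses plain gradient descent as the local minimizer. Each start converges to the well it begins in, and only comparing the results reveals which one is lower.

```python
def energy(x):
    """Hypothetical 1D 'energy' with two wells; not a protein force field."""
    return x**4 - 3.0 * x**2 + x

def gradient(x):
    """Derivative of the toy energy."""
    return 4.0 * x**3 - 6.0 * x + 1.0

def local_minimize(x, step=0.01, iterations=2000):
    """Plain gradient descent: converges to the well that contains the start."""
    for _ in range(iterations):
        x -= step * gradient(x)
    return x

starts = [-2.0, 2.0]
minimizers = [local_minimize(x0) for x0 in starts]
best = min(minimizers, key=energy)  # lowest-energy result among the starts
for x0, xm in zip(starts, minimizers):
    print(f"start {x0:+.1f} -> x = {xm:+.4f}, energy = {energy(xm):+.4f}")
```

With two wells, two starts are enough. A protein energy function has a very large number of local minimizers, so trying starts one by one does not scale, which is the difficulty the slide points to.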
9 Molecular Dynamics Simulation
- The time step has to be small, on the order of femtoseconds, to achieve accuracy.
- Current computing technology can simulate only picoseconds to microseconds, while protein folding may take seconds or even longer.
- Molecular dynamics simulation has been used successfully to study other types of dynamical behavior of proteins.
Folding can be simulated by following the movement of the atoms in the protein according to Newton's second law of motion.
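Following atoms under Newton's second law means integrating m a = F(x) in small time steps. The sketch below uses the velocity Verlet scheme, one common choice, on a single particle attached to a spring. The spring force, unit mass, and step size are assumptions made so that the result can be checked against the exact solution cos(t); this is not a protein simulation. The last line works out how many femtosecond steps one second of simulated time would require.

```python
import math

def velocity_verlet(x, v, force, mass, dt, steps):
    """Integrate m * a = F(x) with the velocity Verlet scheme."""
    a = force(x) / mass
    for _ in range(steps):
        x += v * dt + 0.5 * a * dt * dt   # advance position
        a_new = force(x) / mass           # force at the new position
        v += 0.5 * (a + a_new) * dt       # advance velocity with averaged acceleration
        a = a_new
    return x, v

def spring_force(x, k=1.0):
    """Hypothetical harmonic force standing in for interatomic forces."""
    return -k * x

# 1000 steps of size 0.01 cover t = 10; the exact position is cos(10).
x, v = velocity_verlet(x=1.0, v=0.0, force=spring_force, mass=1.0, dt=0.01, steps=1000)
print(x, math.cos(10.0))

# Steps needed to cover 1 second with a 1 femtosecond step.
steps_for_one_second = round(1.0 / 1e-15)
print(steps_for_one_second)
```

The step count of 10^15, each step requiring a force evaluation over all atoms, shows why simulated time falls so far short of folding time.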
10 Sequence Structure Alignment
- Scoring functions may not be able to distinguish between good and bad matches.
Known Sequences / Structures
Sequence Structure Alignment
- Computing the best alignment is NP-hard in general when gaps are allowed.
Ranking Sequences / Structures
- The results are not accurate and carry only a certain level of confidence.
Homology Modeling: Sequence to Sequence
Fold Recognition: Sequence to Structure
Inverse Protein Folding: Structure to Sequence
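For the sequence-to-sequence case, a standard way to score an alignment with gaps is Needleman-Wunsch dynamic programming. The sketch below assumes a simple additive score (match +1, mismatch -1, gap -1), which is a simplification of the scoring functions used in practice. The template sequence is hypothetical. It does not address the general sequence-to-structure problem, where scores depend on interactions between residue positions.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch score for two sequences with a linear gap penalty."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against nothing costs one gap per element.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diagonal, score[i - 1][j] + gap, score[i][j - 1] + gap)
    return score[-1][-1]

target = "GLU VAL GLU ASN GLN ALA ASN PRO ARG LEU".split()
template = "GLU VAL GLU GLN ALA ASN PRO LYS LEU".split()  # hypothetical known sequence
print(global_alignment_score(target, template))
```

Ranking a set of known sequences then amounts to computing this score against each one and sorting. How well the ranking reflects structural similarity depends entirely on the scoring function.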