Title: Software Can be Grouped into Two General Classes:
1Software for Protein Structures by NMR
- Software Can be Grouped into Two General Classes
- Protein Based Programs
- Calculate Protein Structures
- XPLOR (NIH, CNS,CXS), DYANA, CHARMM, Sybyl,
Amber, etc. - Visualize Protein Structures
- Quanta, Insight II, VMD-XPLOR, RasMol, Chimera,
MOLMOL, MolScript etc - Evaluate Protein Structures
- PROCHECK, MOLProbity, PROSA, WHATIF, Verify3D,
etc - NMR Based Programs
- NMR data processing
- NMRPipe, ACD/NMR, Felix
- NMR data analysis/visualization
- NMRDraw, NMRViewJ, PIPP, SPARKY, XEASY
- Iterative Relaxation Matrix Calculations
- IRMA, CORMA, MARDIGRAS, XPLOR, MORASS, etc
- Automated NMR Analysis
- AutoAssign, AutoStructure, ARIA, PINE, CANDID,
GARRANT, CS-ROSETTA, etc - Not A complete List of Software
- New software is constantly being developed
2Software for Protein Structures by NMR
- Protein NMR Based Software Programs
- There are multiple programs that have similar
functions. - Not practical or necessary to discuss all the
variety of programs that are available. - Applications will be discussed in general with
specific references to a limited number - of programs.
- Protein Based Programs Visualize Protein
Structures - How is the protein structure stored?
- No uniform format.
- Protein Data Bank (PDB) is the closest thing to
a uniformed format - Most programs can read and/or write PDB file
formats - Just about every program has its own proprietary
format - Babel program can interconvert 47 different
structure formats - Common Information in a protein structure
- atoms, residues, chains
- X, Y, Z coordinates
3Software for Protein Structures by NMR
- Protein Based Programs Visualize Protein
Structures - Protein Data Bank (PDB) format
- Header
Unique PDB Identifier
Protein Name
Submission Date
HEADER DNA BINDING PROTEIN 08-SEP-01 1JXS
TITLE SOLUTION STRUCTURE OF THE DNA-BINDING
DOMAIN OF INTERLEUKIN TITLE 2 ENHANCER BINDING
FACTOR COMPND MOL_ID 1 COMPND 2 MOLECULE
INTERLEUKIN ENHANCER BINDING FACTOR COMPND 3
CHAIN A COMPND 4 FRAGMENT DNA-BINDING DOMAIN
COMPND 5 SYNONYM ILF-1 COMPND 6 ENGINEERED
YES SOURCE MOL_ID 1 SOURCE 2
ORGANISM_SCIENTIFIC HOMO SAPIENS SOURCE 3
ORGANISM_COMMON HUMAN SOURCE 4 GENE ILF-1
SOURCE 5 EXPRESSION_SYSTEM ESCHERICHIA COLI
SOURCE 6 EXPRESSION_SYSTEM_COMMON BACTERIA
SOURCE 7 EXPRESSION_SYSTEM_STRAIN BL21 SOURCE
8 EXPRESSION_SYSTEM_VECTOR_TYPE PLASMID SOURCE
9 EXPRESSION_SYSTEM_PLASMID PET21A KEYWDS
DNA-BINDING DOMAIN, WINGED HELIX EXPDTA NMR, 20
STRUCTURES AUTHOR W.J.CHUANG,P.P.LIU,C.LI,Y.H.HSI
EH,S.W.CHEN,S.H.CHEN,W.Y.JENG REVDAT 1 11-MAR-03
1JXS 0 JRNL AUTH P.P.LIU,Y.C.CHEN,C.LI,Y.H.HSIEH,
S.W.CHEN,S.H.CHEN, JRNL AUTH 2
W.Y.JENG,W.J.CHUANG JRNL TITL SOLUTION STRUCTURE
OF THE DNA-BINDING DOMAIN OF JRNL TITL 2
INTERLEUKIN ENHANCER BINDING FACTOR 1 (FOXK1A)
JRNL REF PROTEINS V. 49 543 2002 JRNL REF 2
STRUCT.,FUNCT.,GENET.
Descriptive Title of Structure
All Compounds Present
Source of Sample
Authors
Publication Information
4Software for Protein Structures by NMR
- Protein Based Programs Visualize Protein
Structures - Protein Data Bank (PDB) format
- Header
REMARK 210 EXPERIMENTAL DETAILS REMARK 210
EXPERIMENT TYPE NMR REMARK 210 TEMPERATURE
(KELVIN) 300 300 300 300 REMARK 210 PH 6
6 6 6 REMARK 210 IONIC STRENGTH 125 125
125 125 REMARK 210 PRESSURE AMBIENT AMBIENT
AMBIENT REMARK 210 AMBIENT REMARK 210 SAMPLE
CONTENTS 3MM ILF, 25MM PHOSPHATE REMARK 210
BUFFER, 100MM NACL 3MM ILF, REMARK 210 25MM
PHOSPHATE BUFFER, 100MM REMARK 210 NACL 3MM ILF
U-15N, 25MM REMARK 210 PHOSPHATE BUFFER, 100MM
NACL REMARK 210 2MM ILF U-15N, 13C, 25MM
REMARK 210 PHOSPHATE BUFFER, 100MM NACL REMARK
210 REMARK 210 NMR EXPERIMENTS CONDUCTED
NOESY, DQF-COSY, TOCSY, 3D_ REMARK 210
15N-SEPARATED_NOESY, 3D_13C- REMARK 210
SEPARATED_NOESY REMARK 210 SPECTROMETER FIELD
STRENGTH 600 MHZ, 500 MHZ REMARK 210
SPECTROMETER MODEL AVANCE, DMX REMARK 210
SPECTROMETER MANUFACTURER BRUKER REMARK 210
REMARK 210 STRUCTURE DETERMINATION. REMARK 210
SOFTWARE USED AURELIA 2.7.10, XWINNMR 2.6
REMARK 210 METHOD USED HYBRID DISTANCE
GEOMETRY- REMARK 210 HBHA(CBCACO)NH
Description of Experimental Data
. . .
5Software for Protein Structures by NMR
- Protein Based Programs Visualize Protein
Structures - Protein Data Bank (PDB) format
- Header
REMARK 900 RELATED ENTRIES REMARK 900 RELATED
ID 4829 RELATED DB BMRB REMARK 900 1H, 15N AND
13C RESONANCE ASSIGNMENTS FOR THE DNA-BINDING
REMARK 900 DOMAIN OF INTERLEUKIN ENHANCER BINDING
FACTOR DBREF 1JXS A 1 98 SWS Q01167 ILF1_HUMAN
251 348 SEQRES 1 A 98 ASP SER LYS PRO PRO TYR
SER TYR ALA GLN LEU ILE VAL SEQRES 2 A 98 GLN
ALA ILE THR MET ALA PRO ASP LYS GLN LEU THR LEU
SEQRES 3 A 98 ASN GLY ILE TYR THR HIS ILE THR
LYS ASN TYR PRO TYR SEQRES 4 A 98 TYR ARG THR
ALA ASP LYS GLY TRP GLN ASN SER ILE ARG SEQRES 5
A 98 HIS ASN LEU SER LEU ASN ARG TYR PHE ILE LYS
VAL PRO SEQRES 6 A 98 ARG SER GLN GLU GLU PRO
GLY LYS GLY SER PHE TRP ARG SEQRES 7 A 98 ILE
ASP PRO ALA SER GLU SER LYS LEU ILE GLU GLN ALA
SEQRES 8 A 98 PHE ARG LYS ARG ARG PRO ARG HELIX
1 1 ALA A 9 MET A 18 1 10 HELIX 2 2 THR A 25 TYR
A 37 1 13 HELIX 3 3 TRP A 47 ASN A 58 1 12
HELIX 4 4 SER A 83 ARG A 93 1 11 SHEET 1 A 3
GLN A 23 LEU A 24 0 SHEET 2 A 3 PHE A 76 ILE A
79 -1 O TRP A 77 N LEU A 24 SHEET 3 A 3 PHE A 61
VAL A 64 -1 N VAL A 64 O PHE A 76 CRYST1 1.000
1.000 1.000 90.00 90.00 90.00 P 1 1 ORIGX1
1.000000 0.000000 0.000000 0.00000 ORIGX2
0.000000 1.000000 0.000000 0.00000 ORIGX3
0.000000 0.000000 1.000000 0.00000 SCALE1
1.000000 0.000000 0.000000 0.00000
Reference to Data in other Databases
Protein Sequence
Observed Secondary Structure Elements
Meaningless symmetry data (consistency with X-ray
structures)
. . .
6Software for Protein Structures by NMR
- Protein Based Programs Visualize Protein
Structures - Protein Data Bank (PDB) format
- Coordinates
Atom Type
Residue Type
Temperature Factor
Atom No.
Occupancy
Residue No.
Model Number (NMR structures typically Will have
multiple models in a single PDB file
Atom Identifier
. . .
X, Y, Z coordinates
Chain (structures composed of multiple proteins
will have a different chain for each protein)
Identifier (4 characters)
7Software for Protein Structures by NMR
- Protein Based Programs Visualize Protein
Structures - Protein Data Bank (PDB) format
- Coordinates
- Other Features
End of Model
. . .
End of File
HETATM Identifier (non-protein atoms Small
molecules, ions, solvent, water etc)
Define Specific Atom Connectivity
N-Terminal NH (NH3 instead of NH)
C-Terminal O (sometimes OXT1 OXT2)
8Software for Protein Structures by NMR
- Protein Based Programs Visualize Protein
Structures - Protein Data Bank (PDB) format
- Coordinates
- Are internally consistent i.e. the X,Y,Z
coordinates of atom A is the appropriate bond
distance away from the X,Y,Z coordinates of atom
B. - The coordinates on an absolute scale are
arbitrary i.e. there is no defined relationship
between the coordinates of protein A and protein
B, even if protein A and protein B are multiple
copies of the same protein. - Alignment Issue
- Proteins need to be aligned for any structural
comparison - After alignment, can visually compare relative
orientation/position of secondary structures,
active-sites, bound ligands, position of
side-chains, etc - After alignment, relative distance comparisons
have meaning i.e. if 2 helix do not overlap
perfectly a measured displacement of the helices
is relevant - Alignment requires both rotational and
translational transformation of one coordinate
axis relative to the other. - one protein is remained fixed and the other
protein(s) are aligned to it
Y
Protein A
Relative position of the 2 proteins in the X,Y,Z
coordinate system is arbitrary.
The 2 proteins are now centered in the same
coordinate frame.
Align
Protein B
X
Z
9Software for Protein Structures by NMR
- Protein Based Programs Visualize Protein
Structures - Different Ways to Visualize the Same Protein
Structure - Lines/Sticks
- Connect each atom coordinate position by a
straight line - Bond colored by atom type where ½ of bond
corresponds to atom 1 and the other ½ to atom 2 - Accurate representation of atom position
- Poor representation of protein packing
- Crowded
- Reduce complexity by only displaying backbone or
specific regions - Reduce complexity by zooming in on particular
region
10Software for Protein Structures by NMR
- Protein Based Programs Visualize Protein
Structures - Different Ways to Visualize the Same Protein
Structure - BallStick
- Connect each atom coordinate position by a
straight line - Display each atom as a sphere
- Accurate representation of atom position
- poor representation of protein packing
- Crowded
- Reduce complexity by only displaying backbone or
specific regions - Reduce complexity by zooming in on particular
region
11Software for Protein Structures by NMR
- Protein Based Programs Visualize Protein
Structures - Different Ways to Visualize the Same Protein
Structure - Ribbons/Cartoon
- Connect each Ca atom coordinate position by a
graphical representation - Smooth-Fit of Ca positions
- Not accurate representation of atom coordinates
- Reduces Complexity of View ?No Side-chains,
usually only backbone - Highlights secondary structure
- b-strands typically shown as arrow pointing in
direction of C-terminus - a-helix shown as a thick helical coil
- random coil regions shown as tube
- Highlights Overall fold and topology
- Easy Comparison of Fold Families
12Software for Protein Structures by NMR
- Protein Based Programs Visualize Protein
Structures - Different Ways to Visualize the Same Protein
Structure - Space Filling/van der Waals
- Each atom position represented by a sphere
- diameter of sphere is equal to van der Waals
radius - very accurate representation of protein
- Highlights surface structure
- identify binding pockets
- can not visualize interior of protein without
slicing through structure - Highlights packing
- verify absence of holes in structure
- verify tight packing of different domains, small
molecule in binding pocket, etc
Colored coded by domain
Space Filling emphasizes hole or channel in
protein
van der Waals radii (in Ã…)
13Software for Protein Structures by NMR
- Protein Based Programs Visualize Protein
Structures - Different Ways to Visualize the Same Protein
Structure - GRASP
- Generates a smooth topology or shape of the
proteins surface - Highlights detailed surface structure
- identify binding pockets
- can not visualize interior of protein without
slicing through structure - Can Map properties of the protein onto the
surface - electrostatic
- NMR chemical shift changes
- NMR Dynamics X-ray B-factors
- Conserved Residues from Sequence Alignment
GRASP surface of acetyl choline esterase
complexed with acetyl choline colored by
potential (red negative, blue positive)
GRASP surface of MMP-1 displaying NMR chemical
shift changes upon binding an inhibitor
14Software for Protein Structures by NMR
- Protein Based Programs Visualize Protein
Structures - Demos using
- Rasmol
- VMD
- Chimera
15Software for Protein Structures by NMR
- Protein Based Programs Evaluate Protein
Structures - Compare to known Structures
- All Structures have Problems or Errors as
determined by software analysis - The challenge is to determine which, if any,
errors are serious misinterpretation of the data
and require correcting. - Three general rules of thumb
- If the error is sever, far outside the norm, it
is probably a mistake. - If errors cluster together, there is almost
certainly a mistake. - If the structure has an odd conformation
- knot, large holes, p-helix, f for non-Gly, etc.
Remember The comparison is made against typical
structures, your error may simply represent a
novel fold or conformation that has not been
seen. Let the Data Determine the Structure
16Software for Protein Structures by NMR
- Protein Based Programs Evaluate Protein
Structures - Compare a new protein structure against
standard parameters or values - standard values or trends are ascertained from
analysis of high quality, high resolution
structures in the PDB - typical features as we discussed in the
introduction to protein structures - PROCHECK
- A common program used by PDB to validate
deposited structures - Assesses the "stereochemical quality" of a given
protein structure - reads a PDB formatted file
- generates 10 output postscript files
- analyzes f, y, c1,c2 torsion angles, bond
lengths bond angles - analyzes bad contacts atoms too close by van
der waals radius - analyzes hydrogen bond energy
- analyzes G-factor
- Provides overall and per residue analyses
- Identifies distorted geometry
- To run the program
- procheck filename chain resolution
- where filename the coordinates file in
Brookhaven format chain an optional
one-letter chain-ID resolution a real number
giving the resolution of the structure
- Compares bond lengths and bond angles to
database of standard small molecule values
17Software for Protein Structures by NMR
- Protein Based Programs Evaluate Protein
Structures - PROCHECK
- correct f, y distribution
- most residues should fall in the most favored
region of Ramachandran plot
Red contours indicate preferred region of the
Ramachandran plot
Colored contours indicate allowed regions of the
Ramachandran plot
18Software for Protein Structures by NMR
- Protein Based Programs Evaluate Protein
Structures - PROCHECK
- correct f, y, c1,c2 distribution as a function
of residue type - most residues should fall in the preferred
region of the - Ramachandran plots
Dark contours are preferred regions
19Software for Protein Structures by NMR
- Protein Based Programs Evaluate Protein
Structures - PROCHECK
- comparison of main chain parameters to standard
values of comparable X-ray structures - consistent or better results with a comparable
resolution structure implies a reliable structure
Value observed for structure at specified
resolution. Inside band indicates it is
consistent with other similar resolution
structures
Boxed Plot is Overall G-factor or Structure
Quality Score
Band indicates range of values observed as a
function X-ray resolution
20Software for Protein Structures by NMR
- Protein Based Programs Evaluate Protein
Structures - PROCHECK
- comparison of side chain parameters to standard
values of comparable X-ray structures - consistent or better results with a comparable
resolution structure implies a reliable structure
Value observed for structure at specified
resolution. Inside band indicates it is
consistent with other similar resolution
structures
Band indicates range of values observed as a
function X-ray resolution
21Software for Protein Structures by NMR
- Protein Based Programs Evaluate Protein
Structures - PROCHECK
- Complete list of structure violations
- Per residue plot of main chain and side-chain
parameters - Number of plots of statically summaries of
parameters
22Software for Protein Structures by NMR
- Protein Based Programs Evaluate Protein
Structures - MOLPROBITY
- Provides a variety of protein structure checks
by comparison to standard values in PDB - Some overlap with Procheck
- Some unique checks including clashes and
structure visualization
100th percentile is the best among structures
of comparable resolution 0th percentile is the
worst.
23Software for Protein Structures by NMR
- Protein Based Programs Evaluate Protein
Structures - MOLPROBITY
- Multi-criterion chart
- per residue analysis of all problems
24Software for Protein Structures by NMR
- Protein Based Programs Evaluate Protein
Structures - MOLPROBITY
- Multi-criterion kinemage
- view all problems
Bad rotamer
Bad backbone conformation
Choose what to display
Bad clash
25Software for Protein Structures by NMR
- Protein Based Programs Evaluate Protein
Structures - MOLPROBITY
- Single-criterion files
- view all problems
- Clash list
- Ramachandran plot kinemage
- Ramachandran plot PDF
- Cß deviation scatter plot
Clash List Atom Pair
Distance
26Software for Protein Structures by NMR
- Protein Based Programs Evaluate Protein
Structures - MOLPROBITY
- Single-criterion files
27Software for Protein Structures by NMR
- Protein Based Programs Evaluate Protein
Structures - MOLPROBITY
- Single-criterion files
- view all problems
- Cß deviation scatter plot
28Software for Protein Structures by NMR
- Protein Based Programs Evaluate Protein
Structures - Verify3D
- Compares the primary sequence against the
proteins 3D structure - Compares each residues position to statistical
distribution of the 20 amino acids against
defined structural environments. - based on the total area buried and fraction of
side-chain area covered by polar atoms
Structure Environments
29Software for Protein Structures by NMR
Buried Hydrophobic Environment
Exposed Hydrophilic Environment
3D-1D Scoring Table
30Software for Protein Structures by NMR
- Protein Based Programs Evaluate Protein
Structures - Verify3D
- Example scoring function on a per residue basis
Actual X-ray structure
Incorrect modeled structure
31Software for Protein Structures by NMR
- Protein Based Programs Evaluate Protein
Structures - ProSA-Web
- Overall model quality (Z-score)
- compare to typical range for known NMR and X-ray
structures - calculate energy for all Ca-Ca or Cb-Cb
interactions - generate collection of decoy folds (50,0000) by
using database of sequence/structure fragments - thus, correct fold will have low energy low
Z-score relative to decoy structures - length dependent
Protein analyzed
32Software for Protein Structures by NMR
- Protein Based Programs Evaluate Protein
Structures - ProSA-Web
- Model energy as a function of amino acid
seqeunce - positive values correspond to problematic
regions - single-value has large fluctuation and is of
little value - averaged over a window of 40 (dark) and 10
(light) residues
Visualize the per residue energy on the structure
(identify problematic regions)
Reliable Structure (no strain energy)
33Software for Protein Structures by NMR
- Protein Based Programs Evaluate Protein
Structures - WHATIF/WHATCHECK
- Provides a variety of protein structure checks
by comparison to standard values in PDB - Some overlap with Procheck
- Some unique checks including packing parameters
- Unique to WHATIF/WHATCHECK
- Check for buried unsatisfied h-bond donors and
acceptors - Peptide bond flip check
- Check for amino-acid handedness
- HIS GLN ASN side chain conformation check
- Check for atom nomenclature
- Side chain planarity check
- Verification of Proline puckering
- New Directional atomic contact analysis
- Directional atomic contact analysis
- Particular to X-ray Structures
- Check for isolated water clusters
- Atomic occupancy check
- Symmetry check
- Chain Name Validation
- Similar to Procheck
- Verification of bond lengths
- Check for bumps (bad contacts)
- Amino-acid side chain rotamer analysis
- Torsion angle evaluation
34Software for Protein Structures by NMR
- Protein Based Programs Evaluate Protein
Structures - WHATIF/WHATCHECH
Protein Packing Report
Warning Low packing Z-score for some
residues The residues listed in the table below
have an unusual packing environment according to
the 2nd generation quality check. The score
listed in the table is a packing normality
Z-score positive means better than average,
negative means worse than average. Only residues
scoring less than -2.50 are listed here. These
are the "unusual" residues in the structure, so
it will be interesting to take a special look at
them. 137 LYS ( 10 ) B -3.43 136
LYS ( 9 ) B -3.11 30 GLN ( 40 )
A -3.08 218 GLU ( 91 ) B -2.84
158 VAL ( 31 ) B -2.83 240 LYS (
113 ) B -2.59 231 GLU ( 104 ) B
-2.52 Warning Abnormal packing Z-score for
sequential residues A stretch of at least four
sequential residues with a 2nd generation packing
Z-score below -1.75 was found. This could
indicate that these residues are part of a
strange loop or that the residues in this range
are incomplete, but it might also be an
indication of mis-threading. The table below
lists the first and last residue in each stretch
found, as well as the average residue Z-score of
the series. 134 ASN ( 7 ) B ---
137 LYS ( 10 ) B -2.65 Warning
Structural average packing Z-score a bit
worrisome The structural 2nd generation average
quality control value is a bit low. The protein
is probably threaded correctly, but either poorly
refined, or it is just a protein with an unusual
(but correct) structure. The average quality of
properly refined X-ray structures is 0.0/-1.0.
All contacts Average -0.589 Z-score
-3.74 BB-BB contacts Average -0.178 Z-score
-1.27 BB-SC contacts Average -0.574
Z-score -3.07 SC-BB contacts Average
-0.240 Z-score -1.29 SC-SC contacts Average
-0.563 Z-score -2.79
35Software for Protein Structures by NMR
- Protein Based Programs Evaluate Protein
Structures - WHATIF/WHATCHECH Packing Score
- For each "fixed fragment" in a protein structure
(any "largest group" of atoms that does not
contain a torsion angle) - the occurrence of all possible atom types in all
possible positions around the fixed fragment is
counted. - If a certain configuration occurs very
frequently, it is assumed to be a preferred
configuration. - All preference counts for all atoms around a
residue are used to calculate a summary score for
each residue. -
- Quality control score for each residue is a
Z-score - Describes how well this residue feels compared
to other similar residues in well refined
structures. - If the residue Z-score is negative, it feels
less at home than the "average" residue. - If the Z-score is positive, it feels more at
home than average. - The individual scores are not very powerful.
- A lot of structures have a few low-scoring
residues. - More useful are
- list of sequential residues that all have low
scores (possibly indicating a mis-threaded
segment), - overall quality control Z-score
-
- Impact on modeling by homology
- Severe.
- If a structure has a bad quality control
Z-score, it can not be trusted. -
36Software for Protein Structures by NMR
- Protein Based Programs Evaluate Protein
Structures - WHATIF/WHATCHECH
Buried hydrogen bond donors and acceptors are
not involved in a hydrogen bond
The pairs of atoms listed have an unusually short
distance.
9 GLY ( 19 ) A N 11 TYR ( 21 ) A N 15
ILE ( 25 ) A O 29 ASP ( 39 ) A O 30 GLN (
40 ) A O 31 HIS ( 41 ) A ND1 32 ILE ( 42
) A N 33 GLN ( 43 ) A N 39 GLU ( 49 ) A
O 48 SER ( 58 ) A O 60 ASP ( 70 ) A N
62 LEU ( 72 ) A N 74 LEU ( 84 ) A N 81
GLU ( 91 ) A O 84 TYR ( 94 ) A N 92 HIS
( 102 ) A NE2 101 LEU ( 111 ) A O
45 TYR ( 55) A CZ -- 74 LEU ( 84) A CD1
0.479 2.721 INTRA 78 ARG ( 88) A CD -- 86
THR ( 96) A CG2 0.391 2.809 INTRA 109 LEU (
119) A O -- 110 GLY ( 120) A C 0.375 2.425
INTRA 110 GLY ( 120) A N -- 111 PRO ( 121) A
CD 0.365 2.635 INTRA 131 PRO ( 4) B O --
133 GLY ( 6) B N 0.358 2.192 INTRA BF 39 GLU
( 49) A O -- 40 SER ( 50) A CB 0.349 2.451
INTRA 109 LEU ( 119) A C -- 111 PRO ( 121) A
CD 0.340 2.860 INTRA 163 ASP ( 36) B O --
165 SER ( 38) B N 0.328 2.372 INTRA 114 HIS (
124) A O -- 115 PHE ( 125) A C 0.328 2.472
INTRA 165 SER ( 38) B O -- 166 ASP ( 39) B C
0.303 2.497 INTRA 98 PHE ( 108) A CB -- 120
ILE ( 130) A CG1 0.297 2.903 INTRA 132 LEU (
5) B O -- 133 GLY ( 6) B C 0.296 2.504
INTRA BF 246 LEU ( 119) B O -- 247 GLY ( 120) B
C 0.295 2.505 INTRA 113 THR ( 123) A CB --
120 ILE ( 130) A CD1 0.286 2.914 INTRA 131 PRO (
4) B O -- 132 LEU ( 5) B C 0.282 2.518
INTRA BF 151 ARG ( 24) B NH1 -- 153 LEU ( 26) B
CD2 0.278 2.822 INTRA 81 GLU ( 91) A C --
83 GLY ( 93) A N 0.277 2.623 INTRA 96 HIS (
106) A CD2 -- 216 LEU ( 89) B CD2 0.255 2.945
INTRA
. . .
. . .
37Software for Protein Structures by NMR
- Protein Based Programs Calculate Protein
Structures - Comparison of XPLOR and DYANA
- XPLOR
- Also known as XPLOR-NIH, CNS and CNX
- Calculates structures using Cartesian
coordinates - Uses a modified PDB file format
- Optimizes
- Number of specific Target Functions to refine
protein structure - 1H -1H distance (NOEs)
- Chemical shifts (both 13C 1H)
- Coupling constants (3JNHCa)
- Ramachandran database
- Empirical Backbone-Backbone Hydrogen-Bonding
Potential - Radius of Gyration
- Residual Dipolar Coupling Constants
- DYANA/CYANA
- Dynamics geometry Algorithm for NMR Applications
- Calculates structures using Torsional Space
- Bond lengths and bond angles are kept fixed only
torsion angles
38Software for Protein Structures by NMR
- Protein Based Programs Calculate Protein
Structures - General overview of XPLOR Protein Structure
Calculations - First Step is Determining a Molecular Structure
File for Your Specific Protein Sequence - Molecular Structure File (PSF)
- Contains all the information to describe the
connectivity of the protein - Contains atom/residue information (names, types,
charges masses, etc.) - Contains structure terms (bond, angle, dihedral,
improper, etc.) - Does not contain atomic coordinates!
- Information is obtained from two standard
databases - Topallhdg_new.pro
- - connectivity information for each amino acid
- - need to define topology for ALL non-amino
acids - Parallhdg_new.pro
- - defines expected values for bond lengths, bond
angles, etc - PSF patches
- define disulphide bonds
- define cis peptide bonds
- PSF file is required for ALL XPLOR calculations
- PSF file must match exactly all the information
in the structure or
39Software for Protein Structures by NMR
- An Example
- You want to compare your NMR structure with an
X-ray structure you obtained from the PDB - X-ray structure
- - does not contain hydrogens.
- - There is a loop that doesnt have coordinates
(no electron density) - - The structure contains a number of water
molecules and detergent molecules - - Identifiers are 1PDB, WAT, DET
- NMR structure
- - has a His-tag at the C-terminus (aid in
purification) - - has three additional residues at the
N-terminus (artifact of the cloning process) - - the residue numbering start at 1 instead of
185 in the X-ray structure - - Identifier is the atom type (C,H,N,O)
- Your PSF file is consistent with your NMR
structure, so XPLOR will give numerous errors
when you try to read both the NMR and X-ray
coordinate files. What are your options? - 1) Make the X-ray coordinate file exactly match
the NMR coordinate file - - add hydrogens
- - add dummy coordinates for the missing loop
region - - remove all the water molecules and detergent
molecules - - change identifier
- 2) Make the NMR coordinate file exactly match
the X-ray coordinate file and create a
40Software for Protein Structures by NMR
mass H 1.008mass C 12.011mass N
14.007mass O 15.999 residue ALA group
atom N typeNH1 charge-0.36 end atom HN
typeH charge 0.26 end group atom CA
typeCT charge 0.00 end atom HA typeHA
charge 0.10 end group atom CB typeCT
charge-0.30 end atom HB1 typeHA charge
0.10 end atom HB2 typeHA charge 0.10 end
atom HB3 typeHA charge 0.10 end group
atom C typeC charge 0.48 end atom O
typeO charge-0.48 end bond N HN
bond N CA bond CA HA bond CA CB
bond CB HB1 bond CB HB2 bond CB
HB3 bond CA C bond C O improper
HA N C CB !stereo CA improper HB1 HB2 CA
HB3 !stereo CBend
- Protein Based Programs Calculate Protein
Structures - General overview of XPLOR Protein Structure
calculations - Topallhdg_new.pro
Partial list of atomic masses
Defines and groups all atoms, assigns a type and
charge
Defines pairs of bonded atoms
Defines a group of four atoms comprising an
improper torsion angle
41Software for Protein Structures by NMR
- Protein Based Programs Calculate Protein
Structures - General overview of XPLOR Protein Structure
Calculations - Topallhdg_new.pro
Atoms defined by an improper angle to maintain
proper sterochemistry are boxed. Usually set to
either 0o or 180o
Atom Types all atoms that have the same
structural properties i.e. same bond lengths,
bond angles, dihedrals are classified to the same
atom type. Simplifies the assignment of
structural parameters while keeping unique atom
identifiers.
Improper Artificial dihedral definition used
primarily to maintain planer arrangement of atoms
or proper stereochemistry in the structure
(peptide bond, aromatic rings, etc). Does not
follow the linear connectivity of a proper
dihedral angles.
The bond lengths and bond angles for CA-HA,
CB-HB1, CB-HB2, and CB-HB3 are identical. So, all
defined as CT-HA
42Software for Protein Structures by NMR
- Protein Based Programs Calculate Protein
Structures - General overview of XPLOR Protein Structure
Calculations - Parallhdg_new.pro
Force Constant
Ideal Value
bonds H NA kbon
0.98 bond CT CT kbon
1.53 angle HA CT C
kang 109.5 angle CA CA CT
kang 120.0 improper H X
X C kpla 0 0.0 improper
C X X C kpla 0
0.0 dihedral CA CA CT CT kdih
3 0.0 dihedral NA CC CT CT
kdih 3 0.0 NONbonded C
0.0903 3.2072 0.0903 3.2072
NONBonded CA 0.120 3.2072 0.120
3.2072 nbfix H O 44.2 1.0 44.2
1.0 nbfix H OC 44.2 1.0 44.2 1.0
. .
List all possible combinations of bonds, angles,
impropers and dihedral with ideal values, force
constants and multiplicity.
. .
. .
. .
Parameterization of van der Waals equation for
atom-atom contact.
. .
Parameterization of hydrogen-bond interactions.
multiplicity
43Software for Protein Structures by NMR
- Protein Based Programs Calculate Protein
Structures - General overview of XPLOR Protein Structure
Calculations - Parallhdg_new.pro
- Defining atomic parameters is a very active area
of molecular modeling research - The values in the parameter database come from
multiple sources - X-ray database of high-resolution small
molecules - ab initio calculations
- experimental observations, IR, Raman, water-ion
neutron and X-ray diffraction data, free energy
of solvation data, etc
44Protein Structures from an NMR Perspective
Distribution of Bond Distances in Protein
Hydrogen Bonds
45Software for Protein Structures by NMR
- Protein Based Programs Calculate Protein
Structures - General overview of XPLOR Protein Structure
Calculations - XPLOR PSF Script
remarks build psf file rtf _at_/PROGRAMS/xplor-nih-2.
9.1/toppar/topallhdg_new.pro END parameter
_at_/PROGRAMS/xplor-nih-2.9.1/toppar/parallhdg_new.pr
o END segment name" " SETUPTRUE
chain LINK PEPP HEAD - TAIL
PRO END LINK to PRO LINK PEPT
HEAD - TAIL END FIRSt
PROP TAIL PRO
END FIRSt NTER
TAIL END LAST CTER HEAD
- END
sequence MET THR LEU LYS HIS HIS HIS
end end end write psf outputPROTEIN.psf
end stop
Read parameter and topology files
Initiate a segment. Repeat for each individual
chain or component of the structure
Definitions in the topology file on how to make a
peptide bond and cap the N-terminus and
C-terminus
Complete protein sequence
. .
Write out the PSF file with name PROTEIN.psf
46Software for Protein Structures by NMR
- Protein Based Programs Calculate Protein
Structures - General overview of XPLOR Protein Structure
Calculations - XPLOR PSF Script
- PATCHES
HIS HIS HIS end end end patch CISP
reference"-"(residue 109) reference""(res
idue 110) end patch DISU reference1(residue
29) reference2(residue 57) end patch
ltod referencenil(resid 8) end write psf
outputPROTEIN.psf end stop
Create a cis peptide bond between residues 109
(P) and 110
Create a disulphide bond between residues 29 and
57
Convert residue 8 to a D-amino acid
47Software for Protein Structures by NMR
- Protein Based Programs Calculate Protein
Structures - General overview of XPLOR Protein Structure
Calculations - XPLOR PSF Script
- Using Structures and Multiple Segments
rtf _at_/PROGRAMS/xplor-nih-2.9.1/toppar/topallhdg_ne
w.pro _at_molecule.top END parameter
_at_/PROGRAMS/xplor-nih-2.9.1/toppar/parallhdg_new.pr
o _at_molecule.par END segment namePROT"
SETUPTRUE chain LINK PEPP HEAD -
TAIL PRO END LINK to PRO LINK
PEPT HEAD - TAIL
END coordinates _at_PROTEIN.pdb end
end end segment nameMOLE " SETUPTRUE
CHAIN sequence CPD end
end end write psf outputPROTEIN.psf end stop
Read in your parameter and topology files
defining molecule
Instead of listing sequence, read in PDB file
Define segment MOLE that contains a single copy
of molecule (note no LINK used)
48Software for Protein Structures by NMR
- Protein Based Programs Calculate Protein
Structures - General overview of XPLOR Protein Structure
Calculations - Second Step is to create a linear extended
structure of the protein sequence using idealized
- geometry
- Extended structure coordinate File (EXT)
- Standard XPLOR PDB coordinate file
- Starting point to generate a proper fold for the
protein from experimental data
Typical extended structure created by XPLOR based
on a PSF file
49Software for Protein Structures by NMR
- Protein Based Programs Calculate Protein
Structures - General overview of XPLOR Protein Structure
Calculations - Third Step is to convert NMR experimental data
into XPLOR format - Distance Constraints
- a file (noe.tbl) containing a list of all
observed/assigned NOE distant constraints
a b c
assign ( resid 3 and name HB ) ( resid 49 and
name HD ) 4.0 2.2 3.0
XPLOR assign statement
Residue number and atom name for each atom
involved in the distance constraint
Distance information
Understanding the distance information (a b
c) - a distance constraint is typically defined
with a range as opposed to an absolute
number. an upper and lower bound - in XPLOR
format upper bound a c in our
example upper bound 4.0Ã… 3.0Ã… 7.0Ã…
lower bound a - b in our example lower
bound 4.0Ã… 2.2Ã… 1.8Ã…
50Software for Protein Structures by NMR
- Protein Based Programs Calculate Protein
Structures - General overview of XPLOR Protein Structure
Calculations - Distance Constraints
- Pseudo-Atoms/Wildcards
assign ( resid 3 and name HB ) ( resid 49 and
name HD ) 4.0 2.2 3.0
What atom is HB or HD? - Recall the PDB atom
nomenclature each atom gets a unique atom
identifier but each atom does not
have a unique NMR resonance a distance
constraint to Ala methyl needs to go to HB1,
HB2 and HB3. - XPLOR represents these equivalent
atoms with a single pseudo atom that is
positioned equidistant between them in the
assign statement the equivalent atoms are
represented with a wildcard ( or ) -
represents 1 character i.e. HB ? HB1
HB2 - represents 2 characters i.e. HD ?
distance constraint is to the pseudo-atom
Pseudo-atom (HB)
HD11,HD12,HD13 HD21,HD22,HD23 2 Leu d methyls
51Software for Protein Structures by NMR
- Protein Based Programs Calculate Protein
Structures - General overview of XPLOR Protein Structure
Calculations - Distance Constraints
- Pseudo-Atoms/Wildcards
assign ( resid 14 and name HD ) ( resid 97 and
name HD ) 4.0 2.2 5.8
Why Not Just Use Multiple Assign Statements? -
For a distance constraint between two sets of Leu
d methyls there would be 36 possible
combinations! - Multiple constraints between the
same sets of atoms would bias or
overemphasize that distance constraints relative
to others Each constraint would contribute
independently to a violation energy that
XPLOR attempts to minimize. Each duplication
of a constraint that is violated would
increase the likelihood that that constraint
would be satisfied at the expense of other
constraints Tipping the balance of energy to
favor one constraint - All the hydrogens may not
be simultaneously satisfied for any given
conformation. XPLOR will try to satisfy all
the constraints leading to a distorted
structure.
Pseudo-atom (HB)
52Software for Protein Structures by NMR
- Protein Based Programs Calculate Protein
Structures - General overview of XPLOR Protein Structure
Calculations - Distance Constraints
- Pseudo-Atoms/Wildcards
assign ( resid 14 and name HD ) ( resid 97 and
name HD ) 4.0 2.2 5.8
What Not Just Choose One Hydrogen to Represent
the Set? - Which one do you choose? - How do
you make the proper choice when there are
multiple distance constraints going to the
same set of hydrogens and when the
constraints are coming from very different
directions? Using Pseudo-Atoms is Not a Perfect
Solution. - distance constraint is going to
location that is spatially distinct from any
of the real atoms. - going to a center average
location - need to adjust the distance
constraints to account for the location of
the pseudo atom.
Pseudo-atom (HB)
53Software for Protein Structures by NMR
- Protein Based Programs Calculate Protein
Structures - General overview of XPLOR Protein Structure
Calculations - Distance Constraints
- Pseudo-Atoms/Wildcards
assign ( resid 14 and name HD ) ( resid 97 and
name HD ) 4.0 2.2 3.0
Distance information
How are the Distance Assignments Made? - One
common approach uses a qualitative analysis of
the NMR data to cluster the assignments as
strong, medium, weak and very weak based on the
intensity of the NOE crosspeak. - The
following rules apply Strong 2.5 0.7 0.2 ?
for NH-NH constraints use 2.5 0.7
0.6 Medium 3.0 1.2 0.3 ? for NOEs with NH
use 3.0 1.2 0.5 Weak 4.0 2.2 1.0 Very
Weak 5.0 2.0 1.0 the lower limit is always set
to slightly less than twice the hydrogen van
der Waals radius (1.8Ã…) For hydrogen bond
constraints constraint between O N 2.8 0.4
0.5 constraint between O HN 1.8 0.3 0.5
54Software for Protein Structures by NMR
- Protein Based Programs Calculate Protein
Structures - General overview of XPLOR Protein Structure
Calculations - Distance Constraints
- Rules for pseudo-atom distance corrections
- 1) For non stereoassigned CbHs add 1.0 to upper
bound - if HB is used instead of HB1 or HB2
- 2) 1.0 is added to upper bound for other
methylenes - if HG, HD