Software Can be Grouped into Two General Classes:

Slides: 71

Provided by: Robert80

1
Software for Protein Structures by NMR

Software Can be Grouped into Two General Classes
Protein Based Programs
Calculate Protein Structures
XPLOR (NIH, CNS, CNX), DYANA, CHARMM, Sybyl,
Amber, etc.
Visualize Protein Structures
Quanta, Insight II, VMD-XPLOR, RasMol, Chimera,
MOLMOL, MolScript etc
Evaluate Protein Structures
PROCHECK, MOLProbity, PROSA, WHATIF, Verify3D,
etc
NMR Based Programs
NMR data processing
NMRPipe, ACD/NMR, Felix
NMR data analysis/visualization
NMRDraw, NMRViewJ, PIPP, SPARKY, XEASY
Iterative Relaxation Matrix Calculations
IRMA, CORMA, MARDIGRAS, XPLOR, MORASS, etc
Automated NMR Analysis
AutoAssign, AutoStructure, ARIA, PINE, CANDID,
GARANT, CS-ROSETTA, etc
Not a complete list of software
New software is constantly being developed

2
Software for Protein Structures by NMR

Protein NMR Based Software Programs
There are multiple programs that have similar
functions.
Not practical or necessary to discuss all the
variety of programs that are available.
Applications will be discussed in general with
specific references to a limited number
of programs.
Protein Based Programs Visualize Protein
Structures
How is the protein structure stored?
No uniform format.
Protein Data Bank (PDB) is the closest thing to
a uniform format
Most programs can read and/or write PDB file
formats
Just about every program has its own proprietary
format
Babel program can interconvert 47 different
structure formats
Common Information in a protein structure
atoms, residues, chains
X, Y, Z coordinates

3
Software for Protein Structures by NMR

Protein Based Programs Visualize Protein
Structures
Protein Data Bank (PDB) format
Header

Unique PDB Identifier
Protein Name
Submission Date
HEADER    DNA BINDING PROTEIN    08-SEP-01   1JXS
TITLE     SOLUTION STRUCTURE OF THE DNA-BINDING DOMAIN OF INTERLEUKIN
TITLE    2 ENHANCER BINDING FACTOR
COMPND    MOL_ID: 1;
COMPND   2 MOLECULE: INTERLEUKIN ENHANCER BINDING FACTOR;
COMPND   3 CHAIN: A;
COMPND   4 FRAGMENT: DNA-BINDING DOMAIN;
COMPND   5 SYNONYM: ILF-1;
COMPND   6 ENGINEERED: YES
SOURCE    MOL_ID: 1;
SOURCE   2 ORGANISM_SCIENTIFIC: HOMO SAPIENS;
SOURCE   3 ORGANISM_COMMON: HUMAN;
SOURCE   4 GENE: ILF-1;
SOURCE   5 EXPRESSION_SYSTEM: ESCHERICHIA COLI;
SOURCE   6 EXPRESSION_SYSTEM_COMMON: BACTERIA;
SOURCE   7 EXPRESSION_SYSTEM_STRAIN: BL21;
SOURCE   8 EXPRESSION_SYSTEM_VECTOR_TYPE: PLASMID;
SOURCE   9 EXPRESSION_SYSTEM_PLASMID: PET21A
KEYWDS    DNA-BINDING DOMAIN, WINGED HELIX
EXPDTA    NMR, 20 STRUCTURES
AUTHOR    W.J.CHUANG,P.P.LIU,C.LI,Y.H.HSIEH,S.W.CHEN,S.H.CHEN,W.Y.JENG
REVDAT   1 11-MAR-03 1JXS 0
JRNL      AUTH   P.P.LIU,Y.C.CHEN,C.LI,Y.H.HSIEH,S.W.CHEN,S.H.CHEN,
JRNL      AUTH 2 W.Y.JENG,W.J.CHUANG
JRNL      TITL   SOLUTION STRUCTURE OF THE DNA-BINDING DOMAIN OF
JRNL      TITL 2 INTERLEUKIN ENHANCER BINDING FACTOR 1 (FOXK1A)
JRNL      REF    PROTEINS V. 49 543 2002
JRNL      REF  2 STRUCT.,FUNCT.,GENET.
Descriptive Title of Structure
All Compounds Present
Source of Sample
Authors
Publication Information
4
Software for Protein Structures by NMR

Protein Based Programs Visualize Protein
Structures
Protein Data Bank (PDB) format
Header

REMARK 210 EXPERIMENTAL DETAILS
REMARK 210 EXPERIMENT TYPE : NMR
REMARK 210 TEMPERATURE (KELVIN) : 300 300 300 300
REMARK 210 PH : 6 6 6 6
REMARK 210 IONIC STRENGTH : 125 125 125 125
REMARK 210 PRESSURE : AMBIENT AMBIENT AMBIENT
REMARK 210 AMBIENT
REMARK 210 SAMPLE CONTENTS : 3MM ILF, 25MM PHOSPHATE
REMARK 210 BUFFER, 100MM NACL 3MM ILF,
REMARK 210 25MM PHOSPHATE BUFFER, 100MM
REMARK 210 NACL 3MM ILF U-15N, 25MM
REMARK 210 PHOSPHATE BUFFER, 100MM NACL
REMARK 210 2MM ILF U-15N, 13C, 25MM
REMARK 210 PHOSPHATE BUFFER, 100MM NACL
REMARK 210
REMARK 210 NMR EXPERIMENTS CONDUCTED : NOESY, DQF-COSY, TOCSY, 3D_
REMARK 210 15N-SEPARATED_NOESY, 3D_13C-
REMARK 210 SEPARATED_NOESY
REMARK 210 SPECTROMETER FIELD STRENGTH : 600 MHZ, 500 MHZ
REMARK 210 SPECTROMETER MODEL : AVANCE, DMX
REMARK 210 SPECTROMETER MANUFACTURER : BRUKER
REMARK 210
REMARK 210 STRUCTURE DETERMINATION.
REMARK 210 SOFTWARE USED : AURELIA 2.7.10, XWINNMR 2.6
REMARK 210 METHOD USED : HYBRID DISTANCE GEOMETRY-
REMARK 210 HBHA(CBCACO)NH
Description of Experimental Data
. . .
5
Software for Protein Structures by NMR

Protein Based Programs Visualize Protein
Structures
Protein Data Bank (PDB) format
Header

REMARK 900 RELATED ENTRIES
REMARK 900 RELATED ID 4829 RELATED DB BMRB
REMARK 900 1H, 15N AND 13C RESONANCE ASSIGNMENTS FOR THE DNA-BINDING
REMARK 900 DOMAIN OF INTERLEUKIN ENHANCER BINDING FACTOR
DBREF 1JXS A 1 98 SWS Q01167 ILF1_HUMAN 251 348
SEQRES 1 A 98 ASP SER LYS PRO PRO TYR SER TYR ALA GLN LEU ILE VAL
SEQRES 2 A 98 GLN ALA ILE THR MET ALA PRO ASP LYS GLN LEU THR LEU
SEQRES 3 A 98 ASN GLY ILE TYR THR HIS ILE THR LYS ASN TYR PRO TYR
SEQRES 4 A 98 TYR ARG THR ALA ASP LYS GLY TRP GLN ASN SER ILE ARG
SEQRES 5 A 98 HIS ASN LEU SER LEU ASN ARG TYR PHE ILE LYS VAL PRO
SEQRES 6 A 98 ARG SER GLN GLU GLU PRO GLY LYS GLY SER PHE TRP ARG
SEQRES 7 A 98 ILE ASP PRO ALA SER GLU SER LYS LEU ILE GLU GLN ALA
SEQRES 8 A 98 PHE ARG LYS ARG ARG PRO ARG
HELIX 1 1 ALA A 9 MET A 18 1 10
HELIX 2 2 THR A 25 TYR A 37 1 13
HELIX 3 3 TRP A 47 ASN A 58 1 12
HELIX 4 4 SER A 83 ARG A 93 1 11
SHEET 1 A 3 GLN A 23 LEU A 24 0
SHEET 2 A 3 PHE A 76 ILE A 79 -1 O TRP A 77 N LEU A 24
SHEET 3 A 3 PHE A 61 VAL A 64 -1 N VAL A 64 O PHE A 76
CRYST1 1.000 1.000 1.000 90.00 90.00 90.00 P 1 1
ORIGX1 1.000000 0.000000 0.000000 0.00000
ORIGX2 0.000000 1.000000 0.000000 0.00000
ORIGX3 0.000000 0.000000 1.000000 0.00000
SCALE1 1.000000 0.000000 0.000000 0.00000
Reference to Data in other Databases
Protein Sequence
Observed Secondary Structure Elements
Meaningless symmetry data (consistency with X-ray
structures)
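As an illustration of how the protein sequence can be recovered from the SEQRES records, here is a minimal Python sketch. It assumes whitespace-separated fields with a chain ID present (real PDB files are fixed-column, so this is a simplification), and the function name seqres_to_sequence is hypothetical. The two example records are the first and last SEQRES lines of 1JXS shown above.

```python
# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def seqres_to_sequence(lines, chain="A"):
    """Collect residues from SEQRES records of one chain as a one-letter string."""
    residues = []
    for line in lines:
        fields = line.split()
        # Fields: SEQRES, record number, chain ID, chain length, residues...
        if len(fields) > 4 and fields[0] == "SEQRES" and fields[2] == chain:
            residues.extend(fields[4:])
    # Unknown residue names are reported as "X" rather than dropped.
    return "".join(THREE_TO_ONE.get(name, "X") for name in residues)


seqres_example = [
    "SEQRES 1 A 98 ASP SER LYS PRO PRO TYR SER TYR ALA GLN LEU ILE VAL",
    "SEQRES 8 A 98 PHE ARG LYS ARG ARG PRO ARG",
]
print(seqres_to_sequence(seqres_example))
```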
. . .
6
Software for Protein Structures by NMR

Protein Based Programs Visualize Protein
Structures
Protein Data Bank (PDB) format
Coordinates

Atom Type
Residue Type
Temperature Factor
Atom No.
Occupancy
Residue No.
Model Number (NMR structures typically will have
multiple models in a single PDB file)
Atom Identifier
. . .
X, Y, Z coordinates
Chain (structures composed of multiple proteins
will have a different chain for each protein)
Identifier (4 characters)
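The fields listed above sit at fixed column positions in each ATOM or HETATM record. The following Python sketch reads them by slicing, using the column ranges of the standard PDB format; the Atom tuple and parse_atom_line are hypothetical names, and the example record uses made-up coordinates rather than values from 1JXS.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    serial: int        # Atom No.
    name: str          # Atom Type (e.g. N, CA, HB1)
    res_name: str      # Residue Type
    chain: str         # Chain
    res_seq: int       # Residue No.
    x: float
    y: float
    z: float
    occupancy: float
    b_factor: float    # Temperature Factor


def parse_atom_line(line: str) -> Atom:
    """Parse one fixed-column ATOM/HETATM record (Python slices are 0-based)."""
    if line[0:6].strip() not in ("ATOM", "HETATM"):
        raise ValueError("not a coordinate record")
    return Atom(
        serial=int(line[6:11]),
        name=line[12:16].strip(),
        res_name=line[17:20].strip(),
        chain=line[21],
        res_seq=int(line[22:26]),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
        occupancy=float(line[54:60]),
        b_factor=float(line[60:66]),
    )


# Build an illustrative record with a format string so the columns line up.
example_line = "{:<6}{:>5} {:<4}{}{:>3} {}{:>4}{}   {:8.3f}{:8.3f}{:8.3f}{:6.2f}{:6.2f}".format(
    "ATOM", 1, " N", " ", "ASP", "A", 1, " ", 10.0, -2.5, 3.25, 1.0, 0.0
)
print(parse_atom_line(example_line))
```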
7
Software for Protein Structures by NMR

Protein Based Programs Visualize Protein
Structures
Protein Data Bank (PDB) format
Coordinates
Other Features

End of Model
. . .
End of File
HETATM Identifier (non-protein atoms: small
molecules, ions, solvent, water, etc.)
Define Specific Atom Connectivity
N-Terminal NH (NH3 instead of NH)
C-Terminal O (sometimes OXT1 OXT2)
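Because an NMR entry usually holds several models, a common first step is to split the coordinate records by the MODEL and ENDMDL markers. This is a minimal Python sketch under that assumption; split_models is a hypothetical helper and the sample text contains made-up coordinates.

```python
from typing import Dict, List


def split_models(pdb_text: str) -> Dict[int, List[str]]:
    """Group ATOM/HETATM lines by model number (MODEL ... ENDMDL blocks)."""
    models: Dict[int, List[str]] = {}
    current = None
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record == "MODEL":
            current = int(line[6:].split()[0])
            models[current] = []
        elif record == "ENDMDL":
            current = None
        elif record in ("ATOM", "HETATM"):
            # Files without MODEL records are treated as a single model 1.
            key = current if current is not None else 1
            models.setdefault(key, []).append(line)
    return models


sample_pdb = "\n".join([
    "MODEL        1",
    "ATOM      1  N   ASP A   1       0.000   0.000   0.000  1.00  0.00",
    "ATOM      2  CA  ASP A   1       1.458   0.000   0.000  1.00  0.00",
    "ENDMDL",
    "MODEL        2",
    "ATOM      1  N   ASP A   1       0.100   0.200   0.000  1.00  0.00",
    "ATOM      2  CA  ASP A   1       1.500   0.100   0.100  1.00  0.00",
    "ENDMDL",
    "END",
])
for number, atoms in split_models(sample_pdb).items():
    print(number, len(atoms))
```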
8
Software for Protein Structures by NMR

Protein Based Programs Visualize Protein
Structures
Protein Data Bank (PDB) format
Coordinates
Are internally consistent, i.e. the X,Y,Z
coordinates of atom A are the appropriate bond
distance away from the X,Y,Z coordinates of atom
B.
The coordinates on an absolute scale are
arbitrary i.e. there is no defined relationship
between the coordinates of protein A and protein
B, even if protein A and protein B are multiple
copies of the same protein.
Alignment Issue
Proteins need to be aligned for any structural
comparison
After alignment, can visually compare relative
orientation/position of secondary structures,
active-sites, bound ligands, position of
side-chains, etc
After alignment, relative distance comparisons
have meaning, i.e. if two helices do not overlap
perfectly, a measured displacement of the helices
is relevant
Alignment requires both rotational and
translational transformation of one coordinate
frame relative to the other.
One protein remains fixed and the other
protein(s) are aligned to it
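The rotation plus translation described above can be sketched as a least-squares superposition. The Python example below is one possible approach, not the method of any particular program: it centers both coordinate sets, finds the best rotation with the quaternion formulation (dominant eigenvector obtained by simple power iteration), and reports the RMSD. It assumes the two atom lists are already paired one-to-one; all names and the test coordinates are hypothetical.

```python
import math


def centroid(points):
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def rmsd(a, b):
    """Root-mean-square distance between paired coordinate lists."""
    total = sum(sum((x - y) ** 2 for x, y in zip(p, q)) for p, q in zip(a, b))
    return math.sqrt(total / len(a))


def superpose(mobile, fixed, iterations=500):
    """Return mobile rotated and translated onto fixed (least-squares fit)."""
    cm, cf = centroid(mobile), centroid(fixed)
    P = [[p[i] - cm[i] for i in range(3)] for p in mobile]
    Q = [[q[i] - cf[i] for i in range(3)] for q in fixed]
    # Cross-covariance: S[a][b] = sum over atoms of P_a * Q_b.
    S = [[sum(p[a] * q[b] for p, q in zip(P, Q)) for b in range(3)] for a in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = S
    # Symmetric 4x4 matrix whose dominant eigenvector is the optimal quaternion.
    N = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    # Shifting by the Frobenius norm makes every eigenvalue non-negative, so
    # power iteration converges to the eigenvector of the largest one.
    shift = math.sqrt(sum(v * v for row in N for v in row))
    quat = [1.0, 0.0, 0.0, 0.0]  # may stall for an exact 180 degree rotation
    for _ in range(iterations):
        nxt = [sum(N[i][j] * quat[j] for j in range(4)) + shift * quat[i] for i in range(4)]
        norm = math.sqrt(sum(v * v for v in nxt))
        if norm == 0.0:  # degenerate input (e.g. all atoms coincide)
            break
        quat = [v / norm for v in nxt]
    w, x, y, z = quat
    R = [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]
    return [tuple(sum(R[i][j] * p[j] for j in range(3)) + cf[i] for i in range(3)) for p in P]


# "Protein B" is "Protein A" rotated 90 degrees about Z and shifted.
protein_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0), (1.0, 1.0, 1.0)]
protein_b = [(-y + 5.0, x - 2.0, z + 1.0) for x, y, z in protein_a]
print("RMSD before alignment:", rmsd(protein_b, protein_a))
print("RMSD after alignment: ", rmsd(superpose(protein_b, protein_a), protein_a))
```

Production code would normally use a linear algebra library for the eigenvector step; power iteration is used here only to keep the sketch dependency-free.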

Figure: Protein A and Protein B on X, Y, Z axes, before and after alignment
Before: relative position of the 2 proteins in the X,Y,Z
coordinate system is arbitrary.
After: the 2 proteins are now centered in the same
coordinate frame.
9
Software for Protein Structures by NMR

Protein Based Programs Visualize Protein
Structures
Different Ways to Visualize the Same Protein
Structure
Lines/Sticks
Connect each atom coordinate position by a
straight line
Bond colored by atom type where ½ of bond
corresponds to atom 1 and the other ½ to atom 2
Accurate representation of atom position
Poor representation of protein packing
Crowded
Reduce complexity by only displaying backbone or
specific regions
Reduce complexity by zooming in on particular
region

10
Software for Protein Structures by NMR

Protein Based Programs Visualize Protein
Structures
Different Ways to Visualize the Same Protein
Structure
Ball and Stick
Connect each atom coordinate position by a
straight line
Display each atom as a sphere
Accurate representation of atom position
poor representation of protein packing
Crowded
Reduce complexity by only displaying backbone or
specific regions
Reduce complexity by zooming in on particular
region

11
Software for Protein Structures by NMR

Protein Based Programs Visualize Protein
Structures
Different Ways to Visualize the Same Protein
Structure
Ribbons/Cartoon
Connect each Cα atom coordinate position by a
graphical representation
Smooth-Fit of Cα positions
Not accurate representation of atom coordinates
Reduces Complexity of View: no side-chains,
usually only backbone
Highlights secondary structure
β-strands typically shown as arrow pointing in
direction of C-terminus
α-helix shown as a thick helical coil
random coil regions shown as tube
Highlights Overall fold and topology
Easy Comparison of Fold Families

12
Software for Protein Structures by NMR

Protein Based Programs Visualize Protein
Structures
Different Ways to Visualize the Same Protein
Structure
Space Filling/van der Waals
Each atom position represented by a sphere
radius of sphere is equal to van der Waals
radius
very accurate representation of protein
Highlights surface structure
identify binding pockets
can not visualize interior of protein without
slicing through structure
Highlights packing
verify absence of holes in structure
verify tight packing of different domains, small
molecule in binding pocket, etc

Color coded by domain
Space Filling emphasizes hole or channel in
protein
van der Waals radii (in Å)
13
Software for Protein Structures by NMR

Protein Based Programs Visualize Protein
Structures
Different Ways to Visualize the Same Protein
Structure
GRASP
Generates a smooth topology or shape of the
protein's surface
Highlights detailed surface structure
identify binding pockets
can not visualize interior of protein without
slicing through structure
Can Map properties of the protein onto the
surface
electrostatic
NMR chemical shift changes
NMR Dynamics, X-ray B-factors
Conserved Residues from Sequence Alignment

GRASP surface of acetyl choline esterase
complexed with acetyl choline colored by
potential (red negative, blue positive)
GRASP surface of MMP-1 displaying NMR chemical
shift changes upon binding an inhibitor
14
Software for Protein Structures by NMR

Protein Based Programs Visualize Protein
Structures
Demos using
Rasmol
VMD
Chimera

15
Software for Protein Structures by NMR

Protein Based Programs Evaluate Protein
Structures
Compare to known Structures
All Structures have Problems or Errors as
determined by software analysis
The challenge is to determine which, if any,
errors are serious misinterpretations of the data
and require correcting.
Three general rules of thumb
If the error is severe, far outside the norm, it
is probably a mistake.
If errors cluster together, there is almost
certainly a mistake.
If the structure has an odd conformation:
knot, large holes, π-helix, positive φ for non-Gly, etc.

Remember: the comparison is made against typical
structures; your error may simply represent a
novel fold or conformation that has not been
seen. Let the data determine the structure.
16
Software for Protein Structures by NMR

Protein Based Programs Evaluate Protein
Structures
Compare a new protein structure against
standard parameters or values
standard values or trends are ascertained from
analysis of high quality, high resolution
structures in the PDB
typical features as we discussed in the
introduction to protein structures
PROCHECK
A common program used by PDB to validate
deposited structures
Assesses the "stereochemical quality" of a given
protein structure
reads a PDB formatted file
generates 10 output postscript files
analyzes φ, ψ, χ1, χ2 torsion angles, bond
lengths, bond angles
analyzes bad contacts: atoms too close by van
der Waals radius
analyzes hydrogen bond energy
analyzes G-factor
Provides overall and per residue analyses
Identifies distorted geometry
To run the program
procheck filename chain resolution
where
filename = the coordinates file in Brookhaven format
chain = an optional one-letter chain-ID
resolution = a real number giving the resolution of the structure

Compares bond lengths and bond angles to
database of standard small molecule values

17
Software for Protein Structures by NMR

Protein Based Programs Evaluate Protein
Structures
PROCHECK
correct φ, ψ distribution
most residues should fall in the most favored
region of Ramachandran plot

Red contours indicate preferred region of the
Ramachandran plot
Colored contours indicate allowed regions of the
Ramachandran plot
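Each point on a Ramachandran plot is a pair of backbone torsion angles: φ is defined by the atoms C(i-1), N, Cα, C and ψ by N, Cα, C, N(i+1). The Python sketch below shows how a torsion angle can be computed from four atom positions. It is not PROCHECK's code; the function names are hypothetical and the coordinates are made up to give easily checked angles.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def dihedral(p1, p2, p3, p4):
    """Torsion angle in degrees (range -180..180) about the p2-p3 bond."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1 = _cross(b1, b2)  # normal of the plane through p1, p2, p3
    n2 = _cross(b2, b3)  # normal of the plane through p2, p3, p4
    b2_len = math.sqrt(_dot(b2, b2))
    y = b2_len * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Made-up coordinates: cis (0 degrees), trans (180 degrees) and a -90 degree case.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0)))
print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0)))
print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (0, 1, 1)))
```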
18
Software for Protein Structures by NMR

Protein Based Programs Evaluate Protein
Structures
PROCHECK
correct φ, ψ, χ1, χ2 distribution as a function
of residue type
most residues should fall in the preferred
region of the
Ramachandran plots

Dark contours are preferred regions
19
Software for Protein Structures by NMR

Protein Based Programs Evaluate Protein
Structures
PROCHECK
comparison of main chain parameters to standard
values of comparable X-ray structures
consistent or better results with a comparable
resolution structure implies a reliable structure

Value observed for structure at specified
resolution. Inside band indicates it is
consistent with other similar resolution
structures
Boxed Plot is Overall G-factor or Structure
Quality Score
Band indicates range of values observed as a
function of X-ray resolution
20
Software for Protein Structures by NMR

Protein Based Programs Evaluate Protein
Structures
PROCHECK
comparison of side chain parameters to standard
values of comparable X-ray structures
consistent or better results with a comparable
resolution structure implies a reliable structure

Value observed for structure at specified
resolution. Inside band indicates it is
consistent with other similar resolution
structures
Band indicates range of values observed as a
function of X-ray resolution
21
Software for Protein Structures by NMR

Protein Based Programs Evaluate Protein
Structures
PROCHECK
Complete list of structure violations
Per residue plot of main chain and side-chain
parameters
A number of plots of statistical summaries of
parameters

22
Software for Protein Structures by NMR

Protein Based Programs Evaluate Protein
Structures
MOLPROBITY
Provides a variety of protein structure checks
by comparison to standard values in PDB
Some overlap with Procheck
Some unique checks including clashes and
structure visualization

100th percentile is the best among structures
of comparable resolution 0th percentile is the
worst.
23
Software for Protein Structures by NMR

Protein Based Programs Evaluate Protein
Structures
MOLPROBITY
Multi-criterion chart
per residue analysis of all problems

24
Software for Protein Structures by NMR

Protein Based Programs Evaluate Protein
Structures
MOLPROBITY
Multi-criterion kinemage
view all problems

Bad rotamer
Bad backbone conformation
Choose what to display
Bad clash
25
Software for Protein Structures by NMR

Protein Based Programs Evaluate Protein
Structures
MOLPROBITY
Single-criterion files
view all problems
Clash list
Ramachandran plot kinemage
Ramachandran plot PDF
Cβ deviation scatter plot

Clash List Atom Pair
Distance
26
Software for Protein Structures by NMR

Protein Based Programs Evaluate Protein
Structures
MOLPROBITY
Single-criterion files

27
Software for Protein Structures by NMR

Protein Based Programs Evaluate Protein
Structures
MOLPROBITY
Single-criterion files
view all problems
Cβ deviation scatter plot

28
Software for Protein Structures by NMR

Protein Based Programs Evaluate Protein
Structures
Verify3D
Compares the primary sequence against the
protein's 3D structure
Compares each residue's position to statistical
distribution of the 20 amino acids against
defined structural environments.
based on the total area buried and fraction of
side-chain area covered by polar atoms

Structure Environments
29
Software for Protein Structures by NMR
Buried Hydrophobic Environment
Exposed Hydrophilic Environment
3D-1D Scoring Table
30
Software for Protein Structures by NMR

Protein Based Programs Evaluate Protein
Structures
Verify3D
Example scoring function on a per residue basis

Actual X-ray structure
Incorrect modeled structure
31
Software for Protein Structures by NMR

Protein Based Programs Evaluate Protein
Structures
ProSA-Web
Overall model quality (Z-score)
compare to typical range for known NMR and X-ray
structures
calculate energy for all Cα-Cα or Cβ-Cβ
interactions
generate collection of decoy folds (50,0000) by
using database of sequence/structure fragments
thus, the correct fold will have a low energy and a low
Z-score relative to decoy structures
length dependent
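The Z-score idea above can be written in a few lines: the model's energy is expressed in standard deviations from the mean of the decoy energies. This Python sketch only illustrates that arithmetic; it does not reproduce ProSA's energy function or decoy generation, and the energies are made-up numbers.

```python
import statistics


def z_score(model_energy, decoy_energies):
    """Number of standard deviations the model lies from the decoy mean."""
    mean = statistics.mean(decoy_energies)
    spread = statistics.pstdev(decoy_energies)  # population standard deviation
    return (model_energy - mean) / spread


# Made-up decoy energies with mean 0 and standard deviation 1.
decoys = [-1.0, 1.0, -1.0, 1.0]
print(z_score(-5.0, decoys))  # a well-folded model scores far below the decoys
```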

Protein analyzed
32
Software for Protein Structures by NMR

Protein Based Programs Evaluate Protein
Structures
ProSA-Web
Model energy as a function of amino acid
sequence
positive values correspond to problematic
regions
single-residue values fluctuate widely and are of
little value
averaged over a window of 40 (dark) and 10
(light) residues
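The smoothing over a window of residues can be sketched as a sliding average. In this Python example the window is centered on each residue and simply truncated at the chain ends; how ProSA treats the ends is not stated in the slides, so that choice is an assumption, and the energies are made up.

```python
def window_average(values, window):
    """Average each position over `window` neighbours, truncating at the ends."""
    n = len(values)
    smoothed = []
    for i in range(n):
        start = i - window // 2
        lo = max(0, start)
        hi = min(n, start + window)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


# Made-up per-residue energies; raw values alternate, the average is steadier.
energies = [0.0, 3.0, 0.0, 3.0, 0.0]
print(window_average(energies, 3))
```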

Visualize the per residue energy on the structure
(identify problematic regions)
Reliable Structure (no strain energy)
33
Software for Protein Structures by NMR

Protein Based Programs Evaluate Protein
Structures
WHATIF/WHATCHECK
Provides a variety of protein structure checks
by comparison to standard values in PDB
Some overlap with Procheck
Some unique checks including packing parameters

Unique to WHATIF/WHATCHECK
Check for buried unsatisfied h-bond donors and
acceptors
Peptide bond flip check
Check for amino-acid handedness
HIS GLN ASN side chain conformation check
Check for atom nomenclature
Side chain planarity check
Verification of Proline puckering
New Directional atomic contact analysis
Directional atomic contact analysis
Particular to X-ray Structures
Check for isolated water clusters
Atomic occupancy check
Symmetry check
Chain Name Validation

Similar to Procheck
Verification of bond lengths
Check for bumps (bad contacts)
Amino-acid side chain rotamer analysis
Torsion angle evaluation

34
Software for Protein Structures by NMR

Protein Based Programs Evaluate Protein
Structures
WHATIF/WHATCHECK

Protein Packing Report
Warning: Low packing Z-score for some residues
The residues listed in the table below have an unusual packing
environment according to the 2nd generation quality check. The score
listed in the table is a packing normality Z-score: positive means
better than average, negative means worse than average. Only residues
scoring less than -2.50 are listed here. These are the "unusual"
residues in the structure, so it will be interesting to take a special
look at them.
137 LYS ( 10 ) B -3.43
136 LYS ( 9 ) B -3.11
30 GLN ( 40 ) A -3.08
218 GLU ( 91 ) B -2.84
158 VAL ( 31 ) B -2.83
240 LYS ( 113 ) B -2.59
231 GLU ( 104 ) B -2.52
Warning: Abnormal packing Z-score for sequential residues
A stretch of at least four sequential residues with a 2nd generation
packing Z-score below -1.75 was found. This could indicate that these
residues are part of a strange loop or that the residues in this range
are incomplete, but it might also be an indication of mis-threading.
The table below lists the first and last residue in each stretch
found, as well as the average residue Z-score of the series.
134 ASN ( 7 ) B --- 137 LYS ( 10 ) B -2.65
Warning: Structural average packing Z-score a bit worrisome
The structural 2nd generation average quality control value is a bit
low. The protein is probably threaded correctly, but either poorly
refined, or it is just a protein with an unusual (but correct)
structure. The average quality of properly refined X-ray structures is
0.0+/-1.0.
All contacts : Average -0.589 Z-score -3.74
BB-BB contacts : Average -0.178 Z-score -1.27
BB-SC contacts : Average -0.574 Z-score -3.07
SC-BB contacts : Average -0.240 Z-score -1.29
SC-SC contacts : Average -0.563 Z-score -2.79
35
Software for Protein Structures by NMR

Protein Based Programs Evaluate Protein
Structures
WHATIF/WHATCHECK Packing Score

For each "fixed fragment" in a protein structure
(any "largest group" of atoms that does not
contain a torsion angle)
the occurrence of all possible atom types in all
possible positions around the fixed fragment is
counted.
If a certain configuration occurs very
frequently, it is assumed to be a preferred
configuration.
All preference counts for all atoms around a
residue are used to calculate a summary score for
each residue.
Quality control score for each residue is a
Z-score
Describes how much at home this residue feels compared
to other similar residues in well refined
structures.
If the residue Z-score is negative, it feels
less at home than the "average" residue.
If the Z-score is positive, it feels more at
home than average.
The individual scores are not very powerful.
A lot of structures have a few low-scoring
residues.
More useful are
list of sequential residues that all have low
scores (possibly indicating a mis-threaded
segment),
overall quality control Z-score
Impact on modeling by homology
Severe.
If a structure has a bad quality control
Z-score, it can not be trusted.
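Finding runs of sequential low-scoring residues, as the packing report does with its cutoff of -1.75 over at least four residues, can be sketched as follows in Python. This is not WHATCHECK's implementation; the function name is hypothetical and the per-residue scores are made up.

```python
def low_scoring_stretches(scores, cutoff=-1.75, min_length=4):
    """Return (first_index, last_index) for each run of scores below cutoff."""
    stretches = []
    start = None
    for i, z in enumerate(scores):
        if z < cutoff:
            if start is None:
                start = i  # a new run begins here
        else:
            if start is not None and i - start >= min_length:
                stretches.append((start, i - 1))
            start = None
    # A run that reaches the end of the chain is closed here.
    if start is not None and len(scores) - start >= min_length:
        stretches.append((start, len(scores) - 1))
    return stretches


# Made-up per-residue packing Z-scores (indices are 0-based positions).
packing = [0.1, -2.0, -1.8, -3.0, -2.5, 0.0, -2.0, -2.0]
print(low_scoring_stretches(packing))
```

The two isolated low scores at the end are ignored because the run is shorter than four residues, which mirrors the point that individual scores are not very powerful.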

36
Software for Protein Structures by NMR

Protein Based Programs Evaluate Protein
Structures
WHATIF/WHATCHECK

Buried hydrogen bond donors and acceptors are
not involved in a hydrogen bond
9 GLY ( 19 ) A N
11 TYR ( 21 ) A N
15 ILE ( 25 ) A O
29 ASP ( 39 ) A O
30 GLN ( 40 ) A O
31 HIS ( 41 ) A ND1
32 ILE ( 42 ) A N
33 GLN ( 43 ) A N
39 GLU ( 49 ) A O
48 SER ( 58 ) A O
60 ASP ( 70 ) A N
62 LEU ( 72 ) A N
74 LEU ( 84 ) A N
81 GLU ( 91 ) A O
84 TYR ( 94 ) A N
92 HIS ( 102 ) A NE2
101 LEU ( 111 ) A O
. . .
The pairs of atoms listed have an unusually short
distance.
45 TYR ( 55) A CZ -- 74 LEU ( 84) A CD1 0.479 2.721 INTRA
78 ARG ( 88) A CD -- 86 THR ( 96) A CG2 0.391 2.809 INTRA
109 LEU ( 119) A O -- 110 GLY ( 120) A C 0.375 2.425 INTRA
110 GLY ( 120) A N -- 111 PRO ( 121) A CD 0.365 2.635 INTRA
131 PRO ( 4) B O -- 133 GLY ( 6) B N 0.358 2.192 INTRA BF
39 GLU ( 49) A O -- 40 SER ( 50) A CB 0.349 2.451 INTRA
109 LEU ( 119) A C -- 111 PRO ( 121) A CD 0.340 2.860 INTRA
163 ASP ( 36) B O -- 165 SER ( 38) B N 0.328 2.372 INTRA
114 HIS ( 124) A O -- 115 PHE ( 125) A C 0.328 2.472 INTRA
165 SER ( 38) B O -- 166 ASP ( 39) B C 0.303 2.497 INTRA
98 PHE ( 108) A CB -- 120 ILE ( 130) A CG1 0.297 2.903 INTRA
132 LEU ( 5) B O -- 133 GLY ( 6) B C 0.296 2.504 INTRA BF
246 LEU ( 119) B O -- 247 GLY ( 120) B C 0.295 2.505 INTRA
113 THR ( 123) A CB -- 120 ILE ( 130) A CD1 0.286 2.914 INTRA
131 PRO ( 4) B O -- 132 LEU ( 5) B C 0.282 2.518 INTRA BF
151 ARG ( 24) B NH1 -- 153 LEU ( 26) B CD2 0.278 2.822 INTRA
81 GLU ( 91) A C -- 83 GLY ( 93) A N 0.277 2.623 INTRA
96 HIS ( 106) A CD2 -- 216 LEU ( 89) B CD2 0.255 2.945 INTRA
. . .
37
Software for Protein Structures by NMR

Protein Based Programs Calculate Protein
Structures
Comparison of XPLOR and DYANA
XPLOR
Also known as XPLOR-NIH, CNS and CNX
Calculates structures using Cartesian
coordinates
Uses a modified PDB file format
Optimizes a number of specific target functions to refine
the protein structure
1H-1H distance (NOEs)
Chemical shifts (both 13C and 1H)
Coupling constants (3JNHα)
Ramachandran database
Empirical Backbone-Backbone Hydrogen-Bonding
Potential
Radius of Gyration
Residual Dipolar Coupling Constants
DYANA/CYANA
DYnamics Algorithm for Nmr Applications
Calculates structures using Torsional Space
Bond lengths and bond angles are kept fixed; only
torsion angles are varied

38
Software for Protein Structures by NMR

Protein Based Programs Calculate Protein
Structures
General overview of XPLOR Protein Structure
Calculations
First Step is Determining a Molecular Structure
File for Your Specific Protein Sequence
Molecular Structure File (PSF)
Contains all the information to describe the
connectivity of the protein
Contains atom/residue information (names, types,
charges, masses, etc.)
Contains structure terms (bond, angle, dihedral,
improper, etc.)
Does not contain atomic coordinates!
Information is obtained from two standard
databases
Topallhdg_new.pro
- connectivity information for each amino acid
- need to define topology for ALL non-amino
acids
Parallhdg_new.pro
- defines expected values for bond lengths, bond
angles, etc
PSF patches
define disulphide bonds
define cis peptide bonds
PSF file is required for ALL XPLOR calculations
PSF file must match exactly all the information
in the structure or

39
Software for Protein Structures by NMR

An Example
You want to compare your NMR structure with an
X-ray structure you obtained from the PDB
X-ray structure
- does not contain hydrogens.
- There is a loop that doesn't have coordinates
(no electron density)
- The structure contains a number of water
molecules and detergent molecules
- Identifiers are 1PDB, WAT, DET
NMR structure
- has a His-tag at the C-terminus (aid in
purification)
- has three additional residues at the
N-terminus (artifact of the cloning process)
- the residue numbering starts at 1 instead of
185 in the X-ray structure
- Identifier is the atom type (C,H,N,O)
Your PSF file is consistent with your NMR
structure, so XPLOR will give numerous errors
when you try to read both the NMR and X-ray
coordinate files. What are your options?
1) Make the X-ray coordinate file exactly match
the NMR coordinate file
- add hydrogens
- add dummy coordinates for the missing loop
region
- remove all the water molecules and detergent
molecules
- change identifier
2) Make the NMR coordinate file exactly match
the X-ray coordinate file and create a

40
Software for Protein Structures by NMR
mass H 1.008
mass C 12.011
mass N 14.007
mass O 15.999

residue ALA
 group
  atom N type=NH1 charge=-0.36 end
  atom HN type=H charge= 0.26 end
 group
  atom CA type=CT charge= 0.00 end
  atom HA type=HA charge= 0.10 end
 group
  atom CB type=CT charge=-0.30 end
  atom HB1 type=HA charge= 0.10 end
  atom HB2 type=HA charge= 0.10 end
  atom HB3 type=HA charge= 0.10 end
 group
  atom C type=C charge= 0.48 end
  atom O type=O charge=-0.48 end
 bond N HN
 bond N CA
 bond CA HA
 bond CA CB
 bond CB HB1
 bond CB HB2
 bond CB HB3
 bond CA C
 bond C O
 improper HA N C CB !stereo CA
 improper HB1 HB2 CA HB3 !stereo CB
end

Protein Based Programs Calculate Protein
Structures
General overview of XPLOR Protein Structure
calculations
Topallhdg_new.pro

Partial list of atomic masses
Defines and groups all atoms, assigns a type and
charge
Defines pairs of bonded atoms
Defines a group of four atoms comprising an
improper torsion angle
41
Software for Protein Structures by NMR

Protein Based Programs Calculate Protein
Structures
General overview of XPLOR Protein Structure
Calculations
Topallhdg_new.pro

Atoms defined by an improper angle to maintain
proper stereochemistry are boxed. Usually set to
either 0° or 180°
Atom Types: all atoms that have the same
structural properties, i.e. same bond lengths,
bond angles, dihedrals, are classified to the same
atom type. Simplifies the assignment of
structural parameters while keeping unique atom
identifiers.
Improper: artificial dihedral definition used
primarily to maintain planar arrangement of atoms
or proper stereochemistry in the structure
(peptide bond, aromatic rings, etc). Does not
follow the linear connectivity of a proper
dihedral angle.
The bond lengths and bond angles for CA-HA,
CB-HB1, CB-HB2, and CB-HB3 are identical. So, all
are defined as CT-HA
42
Software for Protein Structures by NMR

Protein Based Programs Calculate Protein
Structures
General overview of XPLOR Protein Structure
Calculations
Parallhdg_new.pro

Force Constant
Ideal Value
bonds H NA kbon 0.98
bond CT CT kbon 1.53
angle HA CT C kang 109.5
angle CA CA CT kang 120.0
improper H X X C kpla 0 0.0
improper C X X C kpla 0 0.0
dihedral CA CA CT CT kdih 3 0.0
dihedral NA CC CT CT kdih 3 0.0
NONbonded C 0.0903 3.2072 0.0903 3.2072
NONBonded CA 0.120 3.2072 0.120 3.2072
nbfix H O 44.2 1.0 44.2 1.0
nbfix H OC 44.2 1.0 44.2 1.0
. .
List all possible combinations of bonds, angles,
impropers and dihedrals with ideal values, force
constants and multiplicity.
Parameterization of van der Waals equation for
atom-atom contact.
Parameterization of hydrogen-bond interactions.
multiplicity
43
Software for Protein Structures by NMR

Protein Based Programs Calculate Protein
Structures
General overview of XPLOR Protein Structure
Calculations
Parallhdg_new.pro
Defining atomic parameters is a very active area
of molecular modeling research
The values in the parameter database come from
multiple sources
X-ray database of high-resolution small
molecules
ab initio calculations
experimental observations, IR, Raman, water-ion
neutron and X-ray diffraction data, free energy
of solvation data, etc

44
Protein Structures from an NMR Perspective
Distribution of Bond Distances in Protein
Hydrogen Bonds
45
Software for Protein Structures by NMR

Protein Based Programs Calculate Protein
Structures
General overview of XPLOR Protein Structure
Calculations
XPLOR PSF Script

remarks build psf file
rtf @/PROGRAMS/xplor-nih-2.9.1/toppar/topallhdg_new.pro END
parameter @/PROGRAMS/xplor-nih-2.9.1/toppar/parallhdg_new.pro END
segment
 name=" "
 SETUP=TRUE
 chain
  LINK PEPP HEAD - * TAIL + PRO END { LINK to PRO }
  LINK PEPT HEAD - * TAIL + * END
  FIRSt PROP TAIL + PRO END
  FIRSt NTER TAIL + * END
  LAST CTER HEAD - * END
  sequence
   MET THR LEU LYS HIS HIS HIS
  end
 end
end
write psf output=PROTEIN.psf end
stop
Read parameter and topology files
Initiate a segment. Repeat for each individual
chain or component of the structure
Definitions in the topology file on how to make a
peptide bond and cap the N-terminus and
C-terminus
Complete protein sequence
. .
Write out the PSF file with name PROTEIN.psf
46
Software for Protein Structures by NMR

Protein Based Programs Calculate Protein
Structures
General overview of XPLOR Protein Structure
Calculations
XPLOR PSF Script
PATCHES

   HIS HIS HIS
  end
 end
end
patch CISP reference="-"=(residue 109) reference="+"=(residue 110) end
patch DISU reference=1=(residue 29) reference=2=(residue 57) end
patch ltod reference=nil=(resid 8) end
write psf output=PROTEIN.psf end
stop
Create a cis peptide bond between residues 109
(P) and 110
Create a disulphide bond between residues 29 and
57
Convert residue 8 to a D-amino acid
47
Software for Protein Structures by NMR

Protein Based Programs Calculate Protein
Structures
General overview of XPLOR Protein Structure
Calculations
XPLOR PSF Script
Using Structures and Multiple Segments

rtf
 @/PROGRAMS/xplor-nih-2.9.1/toppar/topallhdg_new.pro
 @molecule.top
END
parameter
 @/PROGRAMS/xplor-nih-2.9.1/toppar/parallhdg_new.pro
 @molecule.par
END
segment
 name="PROT"
 SETUP=TRUE
 chain
  LINK PEPP HEAD - * TAIL + PRO END { LINK to PRO }
  LINK PEPT HEAD - * TAIL + * END
  coordinates @PROTEIN.pdb end
 end
end
segment
 name="MOLE"
 SETUP=TRUE
 CHAIN
  sequence CPD end
 end
end
write psf output=PROTEIN.psf end
stop
Read in your parameter and topology files
defining molecule
Instead of listing sequence, read in PDB file
Define segment MOLE that contains a single copy
of molecule (note no LINK used)
48
Software for Protein Structures by NMR

Protein Based Programs Calculate Protein
Structures
General overview of XPLOR Protein Structure
Calculations
Second Step is to create a linear extended
structure of the protein sequence using idealized
geometry
Extended structure coordinate File (EXT)
Standard XPLOR PDB coordinate file
Starting point to generate a proper fold for the
protein from experimental data

Typical extended structure created by XPLOR based
on a PSF file
49
Software for Protein Structures by NMR

Protein Based Programs Calculate Protein
Structures
General overview of XPLOR Protein Structure
Calculations
Third Step is to convert NMR experimental data
into XPLOR format
Distance Constraints
a file (noe.tbl) containing a list of all
observed/assigned NOE distance constraints

assign ( resid 3 and name HB# ) ( resid 49 and name HD* ) 4.0 2.2 3.0
where a = 4.0, b = 2.2, c = 3.0
XPLOR assign statement
Residue number and atom name for each atom involved in the distance constraint
Distance information (a b c)
Understanding the distance information (a b c)
- A distance constraint is typically defined as a range, with an upper and a lower bound, as opposed to an absolute number.
- In XPLOR format: upper bound = a + c; in our example, upper bound = 4.0 Å + 3.0 Å = 7.0 Å
- lower bound = a - b; in our example, lower bound = 4.0 Å - 2.2 Å = 1.8 Å
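The bound arithmetic on this slide can be sketched in a few lines of Python. This is only an illustration and not part of XPLOR; the function name `noe_bounds` is hypothetical, and a, b, c follow the labels used on the slide.

```python
def noe_bounds(a, b, c):
    """Return (lower, upper) in angstroms for an XPLOR-style triple a b c."""
    lower = a - b  # lower bound = a - b
    upper = a + c  # upper bound = a + c
    return lower, upper


if __name__ == "__main__":
    # The slide's example: 4.0 2.2 3.0
    lower, upper = noe_bounds(4.0, 2.2, 3.0)
    print(f"allowed range: {lower:.1f} to {upper:.1f} A")
```

For the slide's example this gives a range of 1.8 to 7.0 Å.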
50
Software for Protein Structures by NMR

Protein Based Programs Calculate Protein
Structures
General overview of XPLOR Protein Structure
Calculations
Distance Constraints
Pseudo-Atoms/Wildcards

assign ( resid 3 and name HB# ) ( resid 49 and name HD* ) 4.0 2.2 3.0
What atom is HB# or HD*?
- Recall the PDB atom nomenclature: each atom gets a unique atom identifier, but each atom does not have a unique NMR resonance. A distance constraint to an Ala methyl needs to go to HB1, HB2 and HB3.
- XPLOR represents these equivalent atoms with a single pseudo-atom that is positioned equidistant between them. In the assign statement the equivalent atoms are represented with a wildcard (# or *):
  # represents 1 character, i.e. HB# -> HB1, HB2
  * represents 2 characters, i.e. HD* -> HD11, HD12, HD13, HD21, HD22, HD23 (the 2 Leu δ methyls)
- The distance constraint is to the pseudo-atom.
Pseudo-atom (HB)
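As a rough illustration of how a wildcard name stands in for a set of equivalent hydrogens, the Python sketch below expands a pattern against a list of atom names. It assumes the two wildcards are written `#` and `*`, and it assumes that `#` matches a run of digits and `*` matches any run of characters; the atom lists and the `expand` helper are hypothetical and are not XPLOR code.

```python
import re

# Hypothetical hydrogen names for two residue types (PDB-style nomenclature).
ALA_H = ["HN", "HA", "HB1", "HB2", "HB3"]
LEU_H = ["HN", "HA", "HB1", "HB2", "HG",
         "HD11", "HD12", "HD13", "HD21", "HD22", "HD23"]


def expand(pattern, atom_names):
    """List the atom names matched by a wildcard pattern.

    Assumed semantics: '#' matches a run of digits,
    '*' matches any run of characters.
    """
    parts = []
    for ch in pattern:
        if ch == "#":
            parts.append(r"\d+")
        elif ch == "*":
            parts.append(r".*")
        else:
            parts.append(re.escape(ch))
    regex = re.compile("".join(parts))
    return [name for name in atom_names if regex.fullmatch(name)]


if __name__ == "__main__":
    print(expand("HB#", ALA_H))  # the Ala methyl hydrogens
    print(expand("HD*", LEU_H))  # both Leu delta methyls
```

Under these assumptions `HB#` on Ala selects HB1, HB2 and HB3, and `HD*` on Leu selects all six δ-methyl hydrogens.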
51
Software for Protein Structures by NMR

Protein Based Programs Calculate Protein
Structures
General overview of XPLOR Protein Structure
Calculations
Distance Constraints
Pseudo-Atoms/Wildcards

assign ( resid 14 and name HD* ) ( resid 97 and name HD* ) 4.0 2.2 5.8
Why Not Just Use Multiple Assign Statements?
- For a distance constraint between two sets of Leu δ methyls there would be 36 possible combinations!
- Multiple constraints between the same sets of atoms would bias, or overemphasize, that distance constraint relative to others.
  Each constraint would contribute independently to a violation energy that XPLOR attempts to minimize.
  Each duplication of a violated constraint would increase the likelihood that it is satisfied at the expense of other constraints, tipping the balance of energy to favor one constraint.
- Not all of the hydrogens can necessarily satisfy the constraint simultaneously in any given conformation. XPLOR will still try to satisfy all the constraints, leading to a distorted structure.
Pseudo-atom (HB)
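The count of 36 and the weighting problem can be checked with a short Python sketch. The penalty function is a toy square-well term chosen only for illustration (it is not claimed to be XPLOR's actual NOE energy), and all names here are hypothetical.

```python
from itertools import product

LEU_DELTA_H = ["HD11", "HD12", "HD13", "HD21", "HD22", "HD23"]


def explicit_pairs(res_i, res_j, names=LEU_DELTA_H):
    """Every hydrogen-hydrogen pair between two sets of Leu delta methyls."""
    return [((res_i, a), (res_j, b)) for a, b in product(names, names)]


def violation_energy(distance, upper, k=1.0):
    """Toy penalty: zero inside the upper bound, quadratic beyond it."""
    excess = max(0.0, distance - upper)
    return k * excess ** 2


if __name__ == "__main__":
    pairs = explicit_pairs(14, 97)
    print(len(pairs))  # 6 x 6 explicit assign statements

    # If every explicit pair were violated by the same 1 A, the same NOE
    # would be penalized once per pair instead of once in total.
    single = violation_energy(8.0, 7.0)
    print(single, len(pairs) * single)
```

In this toy setting one observed NOE written as 36 explicit constraints carries 36 times the weight of a constraint written once, which is the bias the slide describes.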
52
Software for Protein Structures by NMR

Protein Based Programs Calculate Protein
Structures
General overview of XPLOR Protein Structure
Calculations
Distance Constraints
Pseudo-Atoms/Wildcards

assign ( resid 14 and name HD* ) ( resid 97 and name HD* ) 4.0 2.2 5.8
Why Not Just Choose One Hydrogen to Represent the Set?
- Which one do you choose?
- How do you make the proper choice when there are multiple distance constraints going to the same set of hydrogens and when the constraints are coming from very different directions?
Using Pseudo-Atoms is Not a Perfect Solution.
- The distance constraint goes to a location that is spatially distinct from any of the real atoms: a center (average) location.
- The distance constraints need to be adjusted to account for the location of the pseudo-atom.
Pseudo-atom (HB)
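A small geometric sketch in Python shows why the adjustment is needed. The coordinates are hypothetical: three methyl hydrogens are placed on a circle of radius 1.0 Å, the pseudo-atom is taken as their average position, and a probe hydrogen is placed 4.0 Å from that center. None of this is XPLOR code.

```python
import math


def centroid(points):
    """Average position of a set of (x, y, z) points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


# Hypothetical methyl hydrogens on a circle of radius R (angstroms).
R = 1.0
METHYL_H = [
    (R, 0.0, 0.0),
    (-R / 2, R * math.sqrt(3) / 2, 0.0),
    (-R / 2, -R * math.sqrt(3) / 2, 0.0),
]

pseudo = centroid(METHYL_H)   # the pseudo-atom position
probe = (4.0, 0.0, 0.0)       # hypothetical partner hydrogen

to_pseudo = math.dist(probe, pseudo)
to_real = [math.dist(probe, h) for h in METHYL_H]

if __name__ == "__main__":
    print(f"probe to pseudo-atom: {to_pseudo:.2f} A")
    print("probe to real hydrogens:", [f"{d:.2f}" for d in to_real])
```

Here the nearest real hydrogen is a full 1.0 Å closer to the probe than the pseudo-atom is, so a bound derived from the real hydrogen distances has to be widened before it is applied to the pseudo-atom.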
53
Software for Protein Structures by NMR

Protein Based Programs Calculate Protein
Structures
General overview of XPLOR Protein Structure
Calculations
Distance Constraints
Pseudo-Atoms/Wildcards

assign ( resid 14 and name HD* ) ( resid 97 and name HD* ) 4.0 2.2 3.0
Distance information
How are the Distance Assignments Made?
- One common approach uses a qualitative analysis of the NMR data to cluster the assignments as strong, medium, weak and very weak based on the intensity of the NOE crosspeak.
- The following rules apply:
  Strong     2.5 0.7 0.2   (for NH-NH constraints use 2.5 0.7 0.6)
  Medium     3.0 1.2 0.3   (for NOEs with NH use 3.0 1.2 0.5)
  Weak       4.0 2.2 1.0
  Very Weak  5.0 2.0 1.0
- The lower limit is always set to slightly less than twice the hydrogen van der Waals radius (1.8 Å).
- For hydrogen bond constraints:
  constraint between O and N    2.8 0.4 0.5
  constraint between O and HN   1.8 0.3 0.5
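The intensity-to-distance rules can be written as a lookup table. The Python sketch below uses the values exactly as listed on the slide and leaves out the NH-specific variants; the dictionary and the `assign_line` helper are hypothetical names and not part of XPLOR.

```python
# (a, b, c) triples per NOE intensity class, as listed on the slide.
NOE_CLASSES = {
    "strong": (2.5, 0.7, 0.2),
    "medium": (3.0, 1.2, 0.3),
    "weak": (4.0, 2.2, 1.0),
    "very weak": (5.0, 2.0, 1.0),
}


def assign_line(res_i, name_i, res_j, name_j, noe_class):
    """Format one assign statement for a classified NOE crosspeak."""
    a, b, c = NOE_CLASSES[noe_class]
    return (f"assign ( resid {res_i} and name {name_i} ) "
            f"( resid {res_j} and name {name_j} ) {a:.1f} {b:.1f} {c:.1f}")


if __name__ == "__main__":
    print(assign_line(3, "HA", 49, "HB1", "weak"))
    for label, (a, b, c) in NOE_CLASSES.items():
        print(f"{label:>9}: {a - b:.1f} to {a + c:.1f} A")
```

With the values as transcribed, the strong, medium and weak classes all give a lower bound of 1.8 Å, while the very weak entry gives 3.0 Å.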
54
Software for Protein Structures by NMR