Title: The Protein Databank
1The Protein Databank
- Working with protein data-files
2Determining Biomolecule Structures
- X-ray crystallography
- Nuclear magnetic resonance
3The Protein Databank
4The PDB Growth Chart
5Maxim 10.1
- Beware of anything in the PDB Header Section
6The PDB Data-File Formats
7Example PDB structure 1LQT
8Example PDB structure 1M7T
9Downloading PDB data-files
- http://www.rcsb.org/pdb/
- http://www.ebi.ac.uk/services/
10Accessing Data In PDB Entries
- Accessing PDB Annotation Data
- Free R and resolution
11Example PDB data-file
- REMARK 2
- REMARK 2 RESOLUTION. 1.05 ANGSTROMS.
- REMARK 215 NMR STUDY
- REMARK 215 THE COORDINATES IN THIS ENTRY WERE GENERATED FROM SOLUTION
- REMARK 215 NMR DATA. PROTEIN DATA BANK CONVENTIONS REQUIRE THAT
- REMARK 215 CRYST1 AND SCALE RECORDS BE INCLUDED, BUT THE VALUES ON
- REMARK 215 THESE RECORDS ARE MEANINGLESS.
12Example PDB data-file, cont.
- .
- .
- .
- REMARK 3 FIT TO DATA USED IN REFINEMENT.
- REMARK 3 CROSS-VALIDATION METHOD : THROUGHOUT
- REMARK 3 FREE R VALUE TEST SET SELECTION : RANDOM
- REMARK 3 R VALUE (WORKING + TEST SET) : 0.134
- REMARK 3 R VALUE (WORKING SET) : 0.134
- REMARK 3 FREE R VALUE : 0.153
- REMARK 3 FREE R VALUE TEST SET SIZE (%) : NULL
- REMARK 3 FREE R VALUE TEST SET COUNT : 2200
- .
- .
- .
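The two annotation values named earlier (resolution and free R) can be pulled out of the header with a pair of regular expressions. The following is a minimal sketch, not the book's own code: the subroutine name `extract_quality` is hypothetical, and the patterns assume the REMARK 2 and REMARK 3 layouts shown in the example data-file, with or without the colon separator. For an NMR entry neither pattern matches, so both values come back undefined.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Pull the resolution (REMARK 2) and the free R value (REMARK 3) out of
# PDB header lines. Either value is undef when its line is not present.
sub extract_quality {
    my @lines = @_;
    my ( $resolution, $free_r );
    for my $line (@lines) {
        if ( $line =~ /^REMARK\s+2\s+RESOLUTION\.\s+([\d.]+)\s+ANGSTROMS/ ) {
            $resolution = $1;
        }
        elsif ( $line =~ /^REMARK\s+3\s+FREE R VALUE\s*:?\s*([\d.]+)/ ) {
            # "FREE R VALUE TEST SET ..." lines do not match, because a
            # number must follow the (optional) colon directly.
            $free_r = $1;
        }
    }
    return ( $resolution, $free_r );
}

my @header = (
    "REMARK   2 RESOLUTION. 1.05 ANGSTROMS.",
    "REMARK   3   R VALUE            (WORKING SET) : 0.134",
    "REMARK   3   FREE R VALUE                     : 0.153",
    "REMARK   3   FREE R VALUE TEST SET SIZE   (%) : NULL",
    "REMARK   3   FREE R VALUE TEST SET COUNT      : 2200",
);
my ( $res, $rfree ) = extract_quality(@header);
print "resolution=$res free_R=$rfree\n";
```

Run over many entries, the printed pairs give the raw data for a plot of free R against resolution.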
13Plotting Free R Values against Resolution
14Database cross references
- DBREF 1LQT A 1 456 GB 13882996 AAK47528 1 456
- DBREF 1LQT B 1 456 GB 13882996 AAK47528 1 456
- DBREF 1AFI 1 72 SWS P04129 MERP_SHIFL 20 91
- DBREF 1M7T A 1 66 SWS P10599 THIO_HUMAN 0 65
- DBREF 1M7T A 67 106 SWS P00274 THIO_ECOLI 68 107
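DBREF records are fixed-column, which matters for entries such as 1AFI above, where the chain identifier is blank and splitting on whitespace would shift every later field. The sketch below assumes the standard PDB column layout for DBREF; the subroutine name `parse_dbref` and the hash keys are hypothetical. The sample record is assembled with `sprintf` only to guarantee the column alignment that the slide text has lost.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split one DBREF record into named fields by column position.
# Offsets are zero-based, as substr expects.
sub parse_dbref {
    my ($line) = @_;
    $line = sprintf( "%-68s", $line );    # pad short lines so substr is safe
    my %f = (
        id_code    => substr( $line, 7,  4 ),
        chain      => substr( $line, 12, 1 ),
        seq_begin  => substr( $line, 14, 4 ),
        seq_end    => substr( $line, 20, 4 ),
        database   => substr( $line, 26, 6 ),
        accession  => substr( $line, 33, 8 ),
        db_id_code => substr( $line, 42, 12 ),
        db_begin   => substr( $line, 55, 5 ),
        db_end     => substr( $line, 62, 5 ),
    );
    s/^\s+|\s+$//g for values %f;    # trim the padding from every field
    return \%f;
}

# Column-aligned sample built from the first 1M7T record shown above.
my $sample = sprintf(
    "%-6s %4s %1s %4d%1s %4d%1s %-6s %-8s %-12s %5d%1s %5d%1s",
    "DBREF", "1M7T", "A", 1, " ", 66, " ",
    "SWS", "P10599", "THIO_HUMAN", 0, " ", 65, " "
);
my $ref = parse_dbref($sample);
print "$ref->{id_code} chain $ref->{chain}: residues ",
    "$ref->{seq_begin}-$ref->{seq_end} map to $ref->{database} ",
    "$ref->{accession} ($ref->{db_id_code}) ",
    "$ref->{db_begin}-$ref->{db_end}\n";
```

The two DBREF lines for 1M7T chain A show why the mapping is worth parsing: residues 1-66 and 67-106 map to two different database sequences.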
15Coordinates section
- REMARK 210
- REMARK 210 BEST REPRESENTATIVE CONFORMER IN THIS ENSEMBLE : 21
- REMARK 210
16Data section
- ATOM 1 N ARG A 2 26.318 -8.010 39.090 1.00 20.71 N
- ANISOU 1 N ARG A 2 2040 3071 2755 114 -339 -393 N
- ATOM 2 CA ARG A 2 25.150 -8.702 38.505 1.00 18.85 C
- ANISOU 2 CA ARG A 2 2029 2677 2455 67 -321 -209 C
- ATOM 3 C ARG A 2 24.846 -8.176 37.123 1.00 17.23 C
- ANISOU 3 C ARG A 2 1689 2429 2429 143 -282 -258 C
- ATOM 4 O ARG A 2 25.151 -7.048 36.775 1.00 18.14 O
- .
- .
- TER 7215 GLY A 456
- ATOM 7216 N ARG B 2 -19.423 25.709 6.980 1.00 21.57 N
- ANISOU 7216 N ARG B 2 2476 3012 2707 -165 -370 95 N
- ATOM 7217 CA ARG B 2 -18.718 26.510 8.024 1.00 19.01 C
- ANISOU 7217 CA ARG B 2 2127 2672 2424 -63 -285 91 C
- ATOM 7218 C ARG B 2 -17.250 26.207 8.002 1.00 17.22 C
- ANISOU 7218 C ARG B 2 1955 2392 2196 -91 -299 121 C
- ATOM 7219 O ARG B 2 -16.851 25.158 7.535 1.00 18.15 O
17Data section, cont.
- TER 14289 GLY B 456
- HETATM14290 C ACT 1866 -13.075 1.733 10.218 1.00 27.25 C
- ANISOU14290 C ACT 1866 3493 3560 3299 -39 -36 -44 C
- .
- .
- CONECT14290142911429214293
- CONECT1429114290
- CONECT1429214290
- TER
- .
- .
- CONECT1469014663
- MASTER 389 0 15 46 38 0 0 620280 2 401 72
- END
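The CONECT lines above look run together because each atom serial number occupies a five-character field with no separator, so five-digit serials touch. A minimal sketch of reading them follows, assuming the standard layout (record name in columns 1-6, then consecutive five-character fields); the subroutine name `parse_conect` is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split a CONECT record into atom serial numbers: the first is the atom
# itself, the rest are the atoms it is bonded to.
sub parse_conect {
    my ($line) = @_;
    return unless $line =~ /^CONECT/;
    $line =~ s/\s+$//;                               # drop trailing padding
    # Skip the six-character record name, then cut five-character fields.
    my @fields = grep { /\d/ } unpack( "x6 (A5)*", $line );
    my ( $atom, @bonded ) = map { $_ + 0 } @fields;  # numify, strip spaces
    return ( $atom, @bonded );
}

for my $record ( "CONECT14290142911429214293", "CONECT1429114290" ) {
    my ( $atom, @bonded ) = parse_conect($record);
    print "atom $atom is bonded to @bonded\n";
}
```

Splitting on whitespace would return the whole run of digits as one field here, which is why the fixed-width `unpack` template is used instead.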
18Data section, cont.
- MODEL 1
- ATOM 1 N MET A 1 3.110 -4.682 -3.025 1.00 0.00 N
- ATOM 2 CA MET A 1 2.546 -3.712 -2.053 1.00 0.00 C
- ATOM 3 C MET A 1 1.134 -3.295 -2.450 1.00 0.00 C
- ATOM 4 O MET A 1 0.882 -2.130 -2.758 1.00 0.00 O
- ATOM 5 CB MET A 1 3.466 -2.491 -2.002 1.00 0.00 C
- ATOM 6 CG MET A 1 3.781 -1.903 -3.370 1.00 0.00 C
- ATOM 7 SD MET A 1 4.256 -0.166 -3.285 1.00 0.00 S
- ATOM 8 CE MET A 1 6.004 -0.307 -2.920 1.00 0.00 C
- ATOM 9 1H MET A 1 2.906 -4.327 -3.980 1.00 0.00 H
- ATOM 10 2H MET A 1 2.650 -5.601 -2.859 1.00 0.00 H
- ATOM 11 3H MET A 1 4.134 -4.738 -2.858 1.00 0.00 H
- ATOM 12 HA MET A 1 2.517 -4.178 -1.079 1.00 0.00 H
- ATOM 13 1HB MET A 1 2.996 -1.724 -1.405 1.00 0.00 H
- ATOM 14 2HB MET A 1 4.397 -2.778 -1.536 1.00 0.00 H
- ATOM 15 1HG MET A 1 4.596 -2.461 -3.807 1.00 0.00 H
- ATOM 16 2HG MET A 1 2.907 -1.993 -3.998 1.00 0.00 H
- ATOM 17 1HE MET A 1 6.344 -1.302 -3.167 1.00 0.00 H
- ATOM 18 2HE MET A 1 6.169 -0.120 -1.869 1.00 0.00 H
19Data section, cont.
- TER 1659 VAL A 107
- ENDMDL
- MODEL 2
- ATOM 1 N MET A 1 2.750 -6.779 -1.627 1.00 0.00 N
- ATOM 2 CA MET A 1 2.487 -5.475 -2.290 1.00 0.00 C
- .
- .
- .
- TER 1660 VAL A 107
- ENDMDL
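In an NMR entry the same atoms repeat once per MODEL ... ENDMDL block, so code that reads every ATOM line will silently mix conformers. One way to keep them apart is sketched below; the subroutine name `split_models` is hypothetical, and the sample lines are the whitespace-collapsed ones from the slides, which is sufficient here because only the record names are inspected.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Group the ATOM records of a multi-model entry by model number.
# Returns a hash reference: model number => array ref of ATOM lines.
sub split_models {
    my @lines = @_;
    my ( %models, $current );
    for my $line (@lines) {
        if ( $line =~ /^MODEL\s+(\d+)/ ) {
            $current = $1;
            $models{$current} = [];
        }
        elsif ( $line =~ /^ENDMDL/ ) {
            undef $current;    # lines outside a model are ignored
        }
        elsif ( defined $current && $line =~ /^ATOM/ ) {
            push @{ $models{$current} }, $line;
        }
    }
    return \%models;
}

my @entry = (
    "MODEL 1",
    "ATOM 1 N MET A 1 3.110 -4.682 -3.025 1.00 0.00 N",
    "ATOM 2 CA MET A 1 2.546 -3.712 -2.053 1.00 0.00 C",
    "TER 1659 VAL A 107",
    "ENDMDL",
    "MODEL 2",
    "ATOM 1 N MET A 1 2.750 -6.779 -1.627 1.00 0.00 N",
    "ATOM 2 CA MET A 1 2.487 -5.475 -2.290 1.00 0.00 C",
    "TER 1660 VAL A 107",
    "ENDMDL",
);
my $models = split_models(@entry);
for my $n ( sort { $a <=> $b } keys %$models ) {
    printf "model %d: %d ATOM records\n", $n, scalar @{ $models->{$n} };
}
```

With the models separated, a single conformer (for example the best representative one named in REMARK 210) can be selected by its number.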
20Extracting 3D co-ordinate data
- my ( $X, $Y, $Z ) = ( substr( $_, 30, 8 ),
-                       substr( $_, 38, 8 ),
-                       substr( $_, 46, 8 ) );
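The same `substr` technique reaches the other fixed-width fields of an ATOM record. The sketch below is illustrative only: the subroutine name `parse_atom` and the hash keys are hypothetical, the x, y and z offsets are the ones used above, and the remaining offsets assume the standard PDB column layout. The sample record is built with `sprintf` so that its columns are exact.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Extract selected fields from one full-length ATOM or HETATM record.
# Offsets are zero-based; a line shorter than 54 characters is not handled.
sub parse_atom {
    my ($line) = @_;
    my %atom = (
        serial  => substr( $line, 6,  5 ),
        name    => substr( $line, 12, 4 ),
        residue => substr( $line, 17, 3 ),
        chain   => substr( $line, 21, 1 ),
        res_seq => substr( $line, 22, 4 ),
        x       => substr( $line, 30, 8 ),
        y       => substr( $line, 38, 8 ),
        z       => substr( $line, 46, 8 ),
    );
    s/^\s+|\s+$//g for values %atom;    # trim the column padding
    return \%atom;
}

# The C-alpha atom of ARG A 2 from the 1LQT listing, re-aligned to columns.
my $line = sprintf(
    "%-6s%5d %-4s%1s%3s %1s%4d%1s   %8.3f%8.3f%8.3f%6.2f%6.2f          %2s",
    "ATOM", 2, " CA", " ", "ARG", "A", 2, " ",
    25.150, -8.702, 38.505, 1.00, 18.85, "C"
);
my $atom = parse_atom($line);
print "$atom->{name} of $atom->{residue} $atom->{chain} $atom->{res_seq}: ",
    "$atom->{x}, $atom->{y}, $atom->{z}\n";
```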
21The simple_coord_extract program
- #! /usr/bin/perl -w
- # simple_coord_extract <PDB File> - Demonstrates the extraction of
- # C-Alpha co-ordinates from a PDB data-file.
- use strict;
- while ( <> )
- {
-     if ( /^ATOM/ && substr( $_, 13, 4 ) eq "CA  " )
-     {
-         my ( $X, $Y, $Z ) = ( substr( $_, 30, 8 ),
-                               substr( $_, 38, 8 ),
-                               substr( $_, 46, 8 ) );
-         $X =~ s/ //g;
-         $Y =~ s/ //g;
-         $Z =~ s/ //g;
-         print "X, Y & Z: $X, $Y, $Z\n";
-     }
- }
22Results from simple_coord_extract ...
- X, Y & Z: 25.150, -8.702, 38.505
- X, Y & Z: 23.675, -8.497, 35.069
- X, Y & Z: 20.747, -6.252, 34.332
- X, Y & Z: 17.545, -8.297, 34.292
- X, Y & Z: 15.182, -7.484, 31.454
- X, Y & Z: 11.736, -8.952, 30.942
- X, Y & Z: 10.261, -9.014, 27.451
- X, Y & Z: 6.507, -9.548, 27.173
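C-alpha coordinates like those listed above are the usual input to a contact map: a matrix marking which residue pairs lie close together in space. The following is a rough sketch of that calculation, not the program used to draw the image in this chapter. The subroutine names are hypothetical and the 6.0 angstrom cutoff is an arbitrary value chosen for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Euclidean distance between two points given as [x, y, z] array refs.
sub distance {
    my ( $p, $q ) = @_;
    my $sum = 0;
    $sum += ( $p->[$_] - $q->[$_] )**2 for 0 .. 2;
    return sqrt($sum);
}

# Build a symmetric 0/1 matrix: 1 where two C-alpha atoms lie within
# $cutoff angstroms of each other (the diagonal is therefore always 1).
sub contact_map {
    my ( $cutoff, @coords ) = @_;
    my @map;
    for my $i ( 0 .. $#coords ) {
        for my $j ( 0 .. $#coords ) {
            $map[$i][$j] =
                distance( $coords[$i], $coords[$j] ) <= $cutoff ? 1 : 0;
        }
    }
    return \@map;
}

# The first four C-alpha positions from the output above.
my @ca = (
    [ 25.150, -8.702, 38.505 ],
    [ 23.675, -8.497, 35.069 ],
    [ 20.747, -6.252, 34.332 ],
    [ 17.545, -8.297, 34.292 ],
);
my $map = contact_map( 6.0, @ca );
print join( " ", @$_ ), "\n" for @$map;
```

Neighbouring residues along the chain always appear as contacts because consecutive C-alpha atoms sit roughly 3.8 angstroms apart; the interesting features of a contact map are the off-diagonal entries.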
23The graphic image contact map
24STRIDE Secondary Structure Assignment
25Maxim 10.2
- It is often easier and desirable to regenerate
database annotation than trawl through entries
reconstituting the annotation using custom code.
26Installation of STRIDE
- tar -zxvf stride.tar.gz
- cd stride
- make
- ./stride
27Assigning Secondary Structures
28Simplified definition of a Hydrogen Bond
29Example of Secondary Structure Elements in
Proteins
30Definition of Dihedral angles in the backbone of
protein structures
31Using STRIDE and parsing the output
- ./stride
- You must specify input file
- Action: secondary structure assignment
- Usage: stride [Options] InputFile [ > file ]
- Options:
- -f File Output file
- -mFile MolScript file
- -o Report secondary structure summary Only
- -h Report Hydrogen bonds
- -rId1Id2.. Read only chains Id1, Id2 ...
- -cId1Id2.. Process only Chains Id1, Id2 ...
- -q[File] Generate SeQuence file in FASTA format and die
- Options are position and case insensitive
- stride -cA 1lqt.pdb
32Using gawk ...
- gawk '/ASG/ { print $8 " " $9 }' 1lqt.A.stride
- 360.00 156.52
- -75.72 161.36
- -71.26 145.24
- -111.08 119.10
- -118.65 131.78
- .
- .
- gawk '(/ASG/ && /Strand/) { print $8 " " $9 }' 1lqt.A.stride
- gawk '(/ASG/ && /AlphaHelix/) { print $8 " " $9 }' 1lqt.A.stride
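The gawk one-liners translate directly into Perl, which is convenient when the angles feed further processing. This is a sketch under the same assumption the gawk commands make: on an ASG line the eighth and ninth whitespace-separated fields are phi and psi, and the seventh is the structure name. The subroutine name `phi_psi` is hypothetical. In the sample lines only the phi and psi values come from the output above; the structure assignments, the zero areas and the trailing code are placeholders made up for the demonstration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return [phi, psi] pairs from STRIDE ASG lines, optionally restricted to
# one secondary structure name such as "Strand" or "AlphaHelix".
sub phi_psi {
    my ( $wanted, @lines ) = @_;
    my @angles;
    for my $line (@lines) {
        next unless $line =~ /^ASG/;
        my @f = split ' ', $line;
        next if defined $wanted && $f[6] ne $wanted;
        push @angles, [ $f[7], $f[8] ];    # awk's $8 and $9
    }
    return @angles;
}

# Placeholder lines: phi/psi from the slide, everything else invented.
my @stride_output = (
    "ASG  ARG A    2    1    C          Coil    360.00    156.52       0.0      XXXX",
    "ASG  PRO A    3    2    C          Coil    -75.72    161.36       0.0      XXXX",
    "ASG  TYR A    4    3    E        Strand    -71.26    145.24       0.0      XXXX",
    "ASG  TYR A    5    4    E        Strand   -111.08    119.10       0.0      XXXX",
);
print "all:    @$_\n" for phi_psi( undef,    @stride_output );
print "strand: @$_\n" for phi_psi( "Strand", @stride_output );
```

Comparing the name field exactly, rather than pattern-matching the whole line as the gawk version does, avoids accidental matches elsewhere on the line.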
33Ramachandran Plot of dihedral angles of chain A
from 1LQT
34Extracting amino acid sequences using STRIDE
- stride -q 1lqt.pdb
- >1lqt.pdb A 452 1.050
- RPYYIAIVGSGPSAFFAAASLLKAADTTEDLDMAVDMLEMLPTPWGLVRSGVAPDHPKIK
- .
- .
- >1lqt.pdb B 454 1.050
- RPYYIAIVGSGPSAFFAAASLLKAADTTEDLDMAVDMLEMLPTPWGLVRSGVAPDHPKIK
- .
- .
- stride -cA -q 1lqt.pdb
- >1lqt.pdb A 452 1.050
- RPYYIAIVGSGPSAFFAAASLLKAADTTEDLDMAVDMLEMLPTPWGLVRSGVAPDHPKIK
- .
- .
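The FASTA text that STRIDE prints is easy to load into a hash keyed by chain. The sketch below makes two assumptions that the slides suggest but do not state: the second field of each header line is the chain identifier, and sequence lines contain letters only. The subroutine name `read_stride_fasta` is hypothetical, and the sample sequences are just the first 60 residues shown above, not the full chains.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Collect the sequences from "stride -q" style FASTA output into a hash
# reference keyed by chain identifier.
sub read_stride_fasta {
    my @lines = @_;
    my ( %seq, $chain );
    for my $line (@lines) {
        chomp $line;
        if ( $line =~ /^>\S+\s+(\S+)/ ) {
            $chain = $1;              # e.g. "A" from ">1lqt.pdb A 452 1.050"
            $seq{$chain} = "";
        }
        elsif ( defined $chain && $line =~ /^([A-Za-z]+)\s*$/ ) {
            $seq{$chain} .= $1;       # join wrapped sequence lines
        }
    }
    return \%seq;
}

my @fasta = (
    ">1lqt.pdb A 452 1.050",
    "RPYYIAIVGSGPSAFFAAASLLKAADTTEDLDMAVDMLEMLPTPWGLVRS",
    "GVAPDHPKIK",
    ">1lqt.pdb B 454 1.050",
    "RPYYIAIVGSGPSAFFAAASLLKAADTTEDLDMAVDMLEMLPTPWGLVRS",
    "GVAPDHPKIK",
);
my $seq = read_stride_fasta(@fasta);
for my $chain ( sort keys %$seq ) {
    printf "chain %s: %d residues read, starting %s\n",
        $chain, length $seq->{$chain}, substr( $seq->{$chain}, 0, 10 );
}
```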
35Introducing The mmCIF Protein Format
36Converting mmCIF
- Converting mmCIF to PDB
- Converting mmCIFs to PDB with CIFTr
37The CIFTr program
- cd
- tar -zxvf ciftr-v2.0-linux.tar.gz
- cd ciftr-v2.0-linux/
- For csh/tcsh: setenv RCSBROOT ~/ciftr-v2.0-linux
- For sh/bash: export RCSBROOT=~/ciftr-v2.0-linux
- ./CIFTr -i 1lqt.cif
38More on mmCIF
- Problems with the CIFTr conversion
- Some advice on using mmCIF
- Automated conversion of mmCIF to PDB
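Automating the conversion amounts to running CIFTr once per mmCIF file. The sketch below shows one possible shape for such a wrapper. It uses only the `-i` option, because that is the only one that appears in this chapter; the subroutine names are hypothetical, and where CIFTr writes its output is left to the program itself. The call that actually runs the conversions is commented out so the script is safe to try before CIFTr and RCSBROOT are set up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build one CIFTr command (as an argument list) per mmCIF file.
# Files that do not end in .cif are skipped.
sub ciftr_commands {
    my ( $ciftr, @files ) = @_;
    return map { [ $ciftr, "-i", $_ ] } grep { /\.cif$/i } @files;
}

# Run each command in turn and return the number of failures, so that
# problems with individual conversions are reported rather than hidden.
sub convert_all {
    my @commands = @_;
    my $failures = 0;
    for my $cmd (@commands) {
        print "Running: @$cmd\n";
        if ( system(@$cmd) != 0 ) {
            warn "Conversion failed for $cmd->[2]\n";
            $failures++;
        }
    }
    return $failures;
}

my @commands = ciftr_commands( "./CIFTr", glob("*.cif") );
print "Would run: @$_\n" for @commands;
# convert_all(@commands);    # uncomment once CIFTr and RCSBROOT are set up
```

Passing the command to `system` as a list avoids shell quoting problems with unusual file names.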
39Where To From Here