www.wwpdb.org

About This Presentation

Title:

www.wwpdb.org

Description:

Title: PowerPoint Presentation Last modified by: Christine Zardecki Document presentation format: On-screen Show Company: Helen Berman Other titles – PowerPoint PPT presentation

Slides: 48

Provided by: wwp6

Learn more at: https://cdn.rcsb.org

Transcript and Presenter's Notes

Title: www.wwpdb.org

1
www.wwpdb.org
September 7, 2007
2
Agenda

Welcome and introductions
Accomplishments
Remediation rollout summary
Toward the future
Break
Matters arising
Incorrect structures
Executive session
Feedback to wwPDB
Set next meeting date

3
wwPDB Achievements: October 2006 - September 2007

Continued growth of archive
Website updates
Publications and presentations
Time-stamped archive
Remediation rollout
Annotation document
One stop shop: NMR, cryoEM

4
Depositions since wwPDB establishment
5
PDB entry processing

1-1-2000: 10,997 entries in PDB
Today (10-Jul-2007): 44,578 entries in PDB
Size now is 4 times larger than when the 3 sites started
In 1999, 2,361 entries were deposited
In 2006, 7,282 entries were deposited
We handle more than 3 times as many entries per year with fewer staff, and all wwPDB sites produce high quality annotated PDB entries
No current backlog of unprocessed entries
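
The "4 times" and "more than 3 times" claims can be checked directly from the figures on this slide. A minimal sketch; the variable names are illustrative only:

```python
# Figures quoted on the slide.
entries_2000 = 10_997    # entries in the PDB on 1-1-2000
entries_2007 = 44_578    # entries in the PDB on 10-Jul-2007
deposited_1999 = 2_361   # entries deposited during 1999
deposited_2006 = 7_282   # entries deposited during 2006

# Ratio of archive size now to archive size at the start of 2000.
archive_growth = entries_2007 / entries_2000
# Ratio of yearly depositions in 2006 to yearly depositions in 1999.
deposition_growth = deposited_2006 / deposited_1999

print(f"archive size ratio: {archive_growth:.2f}")
print(f"annual deposition ratio: {deposition_growth:.2f}")
```

Both ratios come out slightly above the rounded values quoted on the slide.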

6
Time-stamped copies of the archive

57 Gbytes of data for 2006, released January 2, 2007
68 Gbytes of data for July 2007 snapshot
Both include
PDB format entries
mmCIF format entries
PDBML format entries
Experimental data
Dictionary, schema, and format documentation

7
Outreach

wwPDB website
Discussion forums
NMR Task Force
Publications
Professional society meetings

8
(No Transcript)
9
Joint publications

Nucleic Acids Research 35: D301 (2007)
The worldwide Protein Data Bank (wwPDB): ensuring a single, uniform archive of PDB data
Nature Structural & Molecular Biology 14: 354 (2007)
Reply to: Building meaningful models of glycoproteins
Nature Biotechnology 25: 854 (2007)
Response to: Overhauling the PDB
Methods in Molecular Biology, in press
Data deposition and annotation at the wwPDB
Structural Bioinformatics, 2nd Edition, in press
The wwPDB

10
Interactions since October 2006

Exchange visits
MSD/RCSB PDB (4)
PDBj/RCSB PDB (1)
PDBj/BMRB (2)
BMRB/RCSB PDB (1)
Phone conference with site directors: twice a year
VTCs among staff
BMRB/RCSB PDB: twice a month (ADIT-NMR)
MSD/RCSB PDB: twice a week (annotation procedures, remediation)
RCSB PDB/PDBj and BMRB/PDBj: on necessary occasions
Email among staff
MSD/RCSB PDB: 2 per day
PDBj/RCSB PDB: 2 per day

11
New initiatives

One stop shop for NMR data and models
One stop shop for electron microscopy maps and models (NIH-funded)

12
Recommendations from 2006 wwPDBAC report

Implement the recommendations from the November 19-20, 2005 modeling workshop (Berman et al., Structure 14, 1211-1217)
Models phased out October 16, 2006
Roll out remediated data to superusers by December 31, 2006 and to all users by July 1, 2007. Provide access to PDB formatted files following the most current format.
Superusers had access to data in November 2006, all users in April 2007

13
Recommendations from 2006 wwPDBAC report

Work with SAXS community to create appropriate representation of these data, and circulate progress reports to the Committee as appropriate
Not done
Expand the four character PDB ID codes before the number of depositions reaches 400,000
Number of available PDB ID codes has been increased by allowing IDs to start with a character
Develop and present a formal recommendation to the wwPDBAC regarding the purview of the PDB at our September 2007 meeting in Princeton, NJ
In process

14
Recommendations from 2006 wwPDBAC report

Coordinate with the wwPDBAC to obtain formal letters of support when seeking funding
Establish a coordinated plan to both educate and lobby funding agency representatives
Establish a charitable organization to serve as a conduit for receipt of both grant funding and gifts from pharmaceutical and biotechnology companies, involving individual Committee members as needed
Funding Representatives Round Table Discussion

15
Remediation
16
Key drivers

Chemistry and nomenclature
Sequence and taxonomy
Citations
Viruses

17
IUPAC, NMR, and the PDB: Atom nomenclature and NMR restraints

John L. Markley

18
History of the NMR-led requested remediation of hydrogen atom nomenclature

When BMRB was established in the late 1980s, it adopted the IUPAC atom nomenclature recommendations from Biochemistry 9, 3471-3479, 1970
At that time, we noted that NMR structures being deposited in the PDB did not adhere to these recommendations (particularly for H-atoms, e.g. HB1/HB2 instead of HB2/HB3), and I brought this to the attention of the director of the PDB at Brookhaven with the request that it be remedied
A group of NMR spectroscopists led by Kurt Wüthrich worked with the NMR community to develop recommendations for the deposition of NMR structures; all agreed that the prior IUPAC recommendations be maintained (Pure Appl. Chem., 70, 117-142, 1998)
Over the years, the wwPDB Task Force on NMR has pushed strongly for remediation of atom nomenclature

19
Accomplished atom nomenclature remediation

Nomenclature in PDB now matches that in BMRB
The single format will avoid confusion and errors
All discrepancies have been resolved in the remediated files, with the minor exception of atoms at the C-terminus

IUPAC-IUBMB-IUPAB    wwPDB
H''                  HXT
O'                   O
O''                  OXT

Since these atoms are not observed by NMR spectroscopists, we do not consider this to be a problem
We plan to write an addendum to the IUPAC-IUBMB-IUPAB Recommendations for submission to Pure Appl. Chem. to formalize these as accepted atom designators
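
The three remaining C-terminal differences are small enough to handle with a lookup table. A minimal sketch, assuming the slide pairs each IUPAC-IUBMB-IUPAB name with the wwPDB name that follows it; the names `IUPAC_TO_WWPDB` and `to_wwpdb` are hypothetical and not part of any wwPDB or BMRB software:

```python
# C-terminal atom names that still differ, as listed on the slide:
# IUPAC-IUBMB-IUPAB name -> wwPDB name.
IUPAC_TO_WWPDB = {
    "H''": "HXT",
    "O'": "O",
    "O''": "OXT",
}


def to_wwpdb(atom_name: str) -> str:
    """Return the wwPDB name for an IUPAC atom name.

    Names outside the three C-terminal exceptions are returned unchanged,
    since the slide states that all other discrepancies were resolved.
    """
    return IUPAC_TO_WWPDB.get(atom_name, atom_name)


print(to_wwpdb("O''"))  # OXT
print(to_wwpdb("CA"))   # CA
```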

20
Remediation of NMR structure files

Required the linking of structure files and restraint files
Atom names, residue numbers and chain identifiers needed to be updated
Remediation of restraint files required the unpacking, parsing, and regularization of legacy information contained in PDB MR files into the NMR Restraints Grid

21
NMR Restraints Grid development

BMRB, University of Wisconsin-Madison, USA
MSD, European Bioinformatics Institute, Hinxton, UK
Department of Computer Sciences/Condor Project, University of Wisconsin, USA
Department of NMR Spectroscopy, Utrecht University, The Netherlands
Centre for Molecular and Biomolecular Informatics, Radboud University, The Netherlands

22
NMR Restraints Grid development

PDB MR files are converted into NMR-STAR
The NMR-STAR file and the corresponding PDB coordinate file are parsed; the information is connected inside the CCPN framework and the results are written out as NMR-STAR files
Converted restraint files are filtered to remove redundant restraints
Files are made available in the NMR Restraints Grid, with access from links in each corresponding PDB entry
NMR restraint data files with atom nomenclature corresponding to remediated PDB data files will be available by the end of 2007

23
Current state of the NMR Restraints Grid

Grid contains 3,583 entries with a total of 3,882,595 parsed restraints
3,583 entries out of 6,508 in PDB have restraints
Database is updated continuously as new PDB entries are released that have associated NMR restraints
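
The counts above give the restraint coverage and the average number of restraints per entry. A rough sketch, assuming that "6508 in PDB" refers to the NMR entries in the archive at the time; the variable names are illustrative:

```python
# Counts quoted on the slide.
entries_with_restraints = 3_583
nmr_entries_in_pdb = 6_508      # assumption: NMR entries, not the whole archive
parsed_restraints = 3_882_595

# Fraction of entries that have restraints in the Grid.
coverage = entries_with_restraints / nmr_entries_in_pdb
# Mean number of parsed restraints per Grid entry.
mean_restraints = parsed_restraints / entries_with_restraints

print(f"coverage: {coverage:.1%}")
print(f"mean restraints per entry: {mean_restraints:.0f}")
```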

24
Recent agenda items considered by the wwPDB NMR Task Force

Strongly recommend that restraints be mandatory for all NMR depositions to the PDB
Commissioned the development of procedures for representing uncertainty in NMR structures and for specifying the single model meant to be most representative of the structure
Task Force should write an article for J. Biomol. NMR on its recommendations for data representation and submission of experimental data
It was suggested that the Task Force begin to discuss validation issues

25
Most X-ray structures are supported by structure factors
26
Less than half of NMR structures are supported by restraint data
27
Most structural genomics centers regularly provide restraints, but the overall average is low
[Chart: number of NMR structures deposited and percent of deposited structures with restraints, by structural genomics center; data labels 247, 1127, 880]
28
Remediation rollout

Helen M. Berman

29
Remediation scope and statistics

All primary citations verified (45K)
Sequence and taxonomy updated for 61K sequences
Ligand stereochemistry and nomenclature for 13M monomers and 170K non-polymer molecules
Symmetry and coordinate transformations for 280 virus entries
10,814 diffraction source beamline updates
1,000 miscellaneous uniformity issues

30
Remediation process

Corrections contributed and reviewed by all wwPDB members
Corrections on the archival mmCIF data files tracked in a version tracking system (CVS)
New PDBx/mmCIF, PDBML-XML, and PDB format data files produced
Validated by each wwPDB group
Staged public testing began January 2007
Iterative corrections based on external comments made through July 2007
Remediated archive released August 1, 2007

31
Remediation-supporting infrastructure

Internal (wwPDB) CVS archive of remediation data files
Internal (wwPDB) rsync distribution site for remediated data files
Early tests of web, rsync, and ftp distribution sites for dictionaries, PDB, mmCIF, and XML data files
Complete wwPDB ftp site for remediated data and dictionaries, updated with remediation corrections and weekly PDB updates
200K CVS remediated data file updates
1M remediated file updates to support testing and distribution from January 2007 to present

32
Checking the remediated files

Haruki Nakamura

33
Different checks

References to external databases
Data processing consistency checks
PDBML/XML validation
Database loads
User-contributed diagnostics

34
References to external databases

Sequence and taxonomy (UniProt)
Primary Citations (PubMed)

35
Data processing consistency checks

Covalent geometry and stereochemistry
Compliance with wwPDB Chemical Component Dictionary
Molecular and stereochemical assignment
Atom and residue nomenclature
Compliance with PDB Exchange Dictionary
Data types, controlled vocabularies, parent-child relations
External tools such as WhatIF
36
PDBML/XML schema validation

Version control
Data type consistency
Data ranges
Controlled vocabularies
Referential integrity
XPath traversal of PDBML data hierarchy
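
The referential-integrity and XPath items above can be sketched with the Python standard library. This is an illustration only: the XML fragment is a hypothetical, heavily simplified PDBML-like document rather than a real entry, the namespace URI is a placeholder, and `dangling_components` is a made-up helper, not the wwPDB validation code.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace for the hypothetical fragment below.
NS = {"PDBx": "http://pdbml.pdb.org/schema/pdbx-v40.xsd"}

# Hypothetical, simplified PDBML-like fragment (not a real entry).
DOC = """\
<PDBx:datablock xmlns:PDBx="http://pdbml.pdb.org/schema/pdbx-v40.xsd" datablockName="DEMO">
  <PDBx:chem_compCategory>
    <PDBx:chem_comp id="ALA"/>
    <PDBx:chem_comp id="GLY"/>
  </PDBx:chem_compCategory>
  <PDBx:atom_siteCategory>
    <PDBx:atom_site id="1"><PDBx:label_comp_id>ALA</PDBx:label_comp_id></PDBx:atom_site>
    <PDBx:atom_site id="2"><PDBx:label_comp_id>HOH</PDBx:label_comp_id></PDBx:atom_site>
  </PDBx:atom_siteCategory>
</PDBx:datablock>
"""


def dangling_components(xml_text):
    """Return (atom_site id, component id) pairs whose component is undeclared.

    This is a parent-child check: every atom_site must refer to a
    component that appears in the chem_comp category.
    """
    root = ET.fromstring(xml_text)
    # XPath traversal: collect every declared chemical component id.
    known = {c.get("id") for c in root.findall(".//PDBx:chem_comp", NS)}
    missing = []
    for site in root.findall(".//PDBx:atom_site", NS):
        comp = site.findtext("PDBx:label_comp_id", namespaces=NS)
        if comp not in known:
            missing.append((site.get("id"), comp))
    return missing


print(dangling_components(DOC))  # [('2', 'HOH')]
```

Data type, range, and controlled-vocabulary checks would normally be delegated to a schema validator; only the cross-category reference check is shown here.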

37
Database loads

Diagnostics obtained from loading remediated data into existing database systems
Relational databases used by MSD-EBI and RCSB PDB
XML database used by PDBj

38
User-contributed diagnostics

Batch checking of remediated files by Phenix revealed consistency issues with alternate conformations - Ralf Grosse-Kunstleve
Batch checking for inconsistent linkages and missing residues by docking software - Tommy Carstensen
Nomenclature - Tom Goddard (Chimera Group)
Sequence and assembly diagnostics - Roland Dunbrack
Relational data integrity diagnostics - Dan Bosler
Nomenclature and experimental details - Clemens Vonrhein
Many specific issues related to chemical assignments, disorder, and nomenclature

39
Looking toward the future

Kim Henrick

40
Annotation project

Standardize annotation rules and policies among wwPDB sites
Document annotation rules and policies
Create venue to update annotation rules and policies as necessary

41
Annotation project

How did we get there?
Review and discussion of each PDB field by email and VTC
Document written and reviewed by all staff
Final review by site directors
Software compliant with new annotation procedures implemented
Tested software and trained annotators
Published document on web (January 2007)

42
Annotation document

Specification of ALL fields in PDB file
Clarification of policies
Assignment of PDB IDs
Release of files and information
Changes to entries
Clarification of data representation
Chain ID for all atoms in the file
Multi-model representation for alternate conformation or disorder
Chimeras
Microheterogeneity

43
PDB IDs and DOIs

Credit for a PDB entry in CVs
Used as a reference in publications
http://dx.doi.org/10.2210/pdb4hhb/pdb

See also: DOIs for Biological Databases, Philip E. Bourne, CrossRef 7th Annual Meeting, 1 November 2006, Cambridge, MA
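
The example URL above suggests a simple pattern for building a DOI from a PDB ID. A minimal sketch, assuming every entry follows the same `10.2210/pdb<id>/pdb` form as the 4hhb example with the ID in lower case; the function names are hypothetical:

```python
import re

# Resolver prefix taken from the URL shown on the slide.
DOI_RESOLVER = "http://dx.doi.org/"


def pdb_doi(pdb_id: str) -> str:
    """Build a DOI from a four-character PDB ID (pattern assumed from 4hhb)."""
    if not re.fullmatch(r"[0-9A-Za-z]{4}", pdb_id):
        raise ValueError(f"not a four-character PDB ID: {pdb_id!r}")
    return f"10.2210/pdb{pdb_id.lower()}/pdb"


def pdb_doi_url(pdb_id: str) -> str:
    """Return the resolver URL for the entry's DOI."""
    return DOI_RESOLVER + pdb_doi(pdb_id)


print(pdb_doi_url("4HHB"))  # http://dx.doi.org/10.2210/pdb4hhb/pdb
```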
44
Outstanding issues

Microheterogeneity
Disorder
Large structures

45
wwPDB and software developers

ACA meeting, 24 July 2007, in Salt Lake City
Future Challenges for the PDB: What should the PDB be doing in 2015?
Attended by software developers and wwPDB staff

46
July 24 meeting

Technical discussions
TLS
Multiple models
Large structures: demand for one file per structure
Microheterogeneity
Twinning
George Sheldrick, Paul Adams and Garib Murshudov to produce a draft of the PDB format to describe twinning and to represent the data in HKLF
Procedural outcomes
Yearly developer meeting
Editorial board to assist in difficult annotation problems
Ongoing electronic forum

47
Toward a single processing tool

This weekend: wwPDB retreat with contributors from RCSB PDB (Rutgers and UCSD), BMRB, PDBj, and EBI-EMBL
Task: come to agreement to pool resources to produce a single deposition tool and design a new processing pipeline
