Title: Enabling Distributed Computing for Structural Biology
1 Enabling Distributed Computing for Structural Biology
http://www.pdb.org/ | info@rcsb.org
5 PDB Deposition and Distribution Sites
In place
Planned
- SDSC, Rutgers, NIST, BMRB
- Cambridge Crystallographic Data Centre, UK
- National University of Singapore, Singapore
- Osaka University, Japan
- Universidade Federal de Minas Gerais, Brazil
- Max-Delbrück-Center, Germany
6 Essential PDB Statistics
- >17,500 structure entries in PDB archive
- All files in archive available in a common data representation
- Two search engines implemented atop a loose federation of text, object and relational databases
- 100,000 file downloads per day
- 7 full mirror sites world-wide (45 GB/site)
7
1cag - Collagen
7hvp - HIV-1 Protease
2lyz - Lysozyme
1mbn - Myoglobin
1tau - DNA Polymerase
1rvc - Restriction Enzyme
1oco - Cytochrome c Oxidase
1aoi - Nucleosome
1ffk - Ribosome
1cd3 - Bacteriophage phiX174
Molecule of the Month images by David Goodsell (http://www.rcsb.org/pdb/molecules/molecule_list.html)
9 Demographics of Depositions
- Macromolecule Type
  - 90% - Protein
  - 6% - Nucleic acid
  - 4% - Protein/Nucleic acid complexes
- Experimental Method
  - 83.0% - X-Ray diffraction
  - 15.0% - NMR
  - 1.6% - Model
  - 0.3% - Electron diffraction
  - 0.1% - Neutron diffraction
- Geographic Region
  - 59.7% - North America
  - 24.8% - Europe
  - 12.9% - Asia
  - 2.4% - Australia/New Zealand
  - 0.1% - South America
- Release Status
  - 63% - HPUB
  - 21% - REL
  - 16% - HOLD
10 New Folds
11 Current PDB Data Infrastructure
- PDB is unique among biological data resources in being built on a community standard data representation.
- Representation is encoded in a data dictionary that contains a comprehensive ontology for macromolecular structure and experiment.
- The metadata model supporting this representation is used by all PDB data processing and database software tools.
- Access is predominately file oriented.
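To make the dictionary-based representation concrete, here is a minimal sketch of reading mmCIF-style data, in which every data name has the form `_category.item`. The parser, the block name `data_EXAMPLE`, and the values are illustrative assumptions, not PDB software or real entry data; the sketch ignores multi-line values and other features of the full format.

```python
import shlex
from collections import defaultdict


def parse_mmcif_items(text):
    """Parse simple `_category.item value` pairs and `loop_` tables.

    Simplified sketch: semicolon-delimited multi-line values and
    save frames are not handled.
    """
    data = defaultdict(list)  # category name -> list of row dicts
    lines = [ln.strip() for ln in text.splitlines()]
    lines = [ln for ln in lines if ln and not ln.startswith("#")]
    i = 0
    while i < len(lines):
        line = lines[i]
        if line == "loop_":
            i += 1
            # Collect the column names that follow loop_
            names = []
            while i < len(lines) and lines[i].startswith("_"):
                names.append(lines[i])
                i += 1
            category = names[0].split(".")[0].lstrip("_")
            # Each following line is one row of whitespace-separated values
            while i < len(lines) and not lines[i].startswith(("_", "loop_", "data_")):
                values = shlex.split(lines[i])
                row = {n.split(".", 1)[1]: v for n, v in zip(names, values)}
                data[category].append(row)
                i += 1
        elif line.startswith("_"):
            # Single key-value item; all items of a category share one row
            name, value = line.split(None, 1)
            category, item = name.lstrip("_").split(".", 1)
            if not data[category]:
                data[category].append({})
            data[category][0][item] = shlex.split(value)[0]
            i += 1
        else:
            i += 1  # data_ block header or unsupported construct
    return dict(data)


# Illustrative input only; the numbers are made up for the example.
SAMPLE = """
data_EXAMPLE
_cell.length_a   61.9
_cell.length_b   61.9
loop_
_atom_site.id
_atom_site.type_symbol
_atom_site.label_atom_id
_atom_site.Cartn_x
1 N N  10.5
2 C CA 11.2
"""

parsed = parse_mmcif_items(SAMPLE)
print(parsed["cell"][0]["length_a"])
print(parsed["atom_site"][1]["label_atom_id"])
```

Because every item is addressed by category and item name, the same parsed structure can feed file-based tools and database loaders alike.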
12 Structural Genomics
- The determination of macromolecular structures on a genomic scale
- "...the discovery, analysis and dissemination of three-dimensional structures of protein, RNA and other biological macromolecules representing the entire range of structural diversity found in nature."
- www.nigms.nih.gov/news/meetings/hinxton.html
- First International Structural Genomics Meeting
- April 4-6 2000, Hinxton, UK
13 Structural Genomics Impact on PDB
- Increase in number of structures
- Types of information archived will increase
  - Experimental conditions for cloning, expression, crystallization, data collection, structure determination, refinement
- Query requirements and need to support distributed computation will increase
14 Application Level Distribution: CORBA - API
- Specification adopted by OMG in February 2001
- Based on the mmCIF data ontology
- Provides high performance access
- Direct access to binary data structures
- Broad granularity of access (individual atoms to
biological assemblies)
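The range of granularity named above, from individual atoms up to biological assemblies, can be pictured as a nested object model. The sketch below is a hypothetical illustration in Python, not the OMG-adopted interface: the class names, fields, and coordinates are all assumptions chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Atom:
    name: str      # e.g. "CA"
    element: str   # e.g. "C"
    x: float
    y: float
    z: float


@dataclass
class Residue:
    name: str
    seq_id: int
    atoms: List[Atom] = field(default_factory=list)


@dataclass
class Chain:
    chain_id: str
    residues: List[Residue] = field(default_factory=list)


@dataclass
class Assembly:
    assembly_id: str
    chains: List[Chain] = field(default_factory=list)

    def iter_atoms(self) -> Iterator[Atom]:
        """Walk from the coarsest level down to individual atoms."""
        for chain in self.chains:
            for residue in chain.residues:
                yield from residue.atoms


# Made-up coordinates, for illustration only.
assembly = Assembly(
    "1",
    [Chain("A", [Residue("GLY", 1, [
        Atom("N", "N", 0.0, 0.0, 0.0),
        Atom("CA", "C", 1.5, 0.0, 0.0),
    ])])],
)

atom_names = [atom.name for atom in assembly.iter_atoms()]
print(atom_names)
```

A client can stop at whichever level it needs: a viewer might request a whole assembly, while a geometry check might iterate over atoms only.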
15 Open Interfaces - OpenMMS
- Provides a Java-only toolkit that creates XML, CORBA and relational DB representations of the mmCIF data ontology.
- Allows programmers to more easily create efficient, high performance and robust applications that use PDB data
- Provides database-to-database interoperability
- OpenMMS server under development
- Code and examples available at http://openmms.sdsc.edu/
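As a rough illustration of deriving XML and relational representations from the same mmCIF category, the sketch below maps one category to an XML fragment and to a database table. It is not OpenMMS code (OpenMMS is a Java toolkit); the function names, element naming scheme, and sample rows are assumptions, and only the Python standard library is used.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Illustrative rows for the mmCIF atom_site category (values are made up).
rows = [
    {"id": "1", "type_symbol": "N", "label_atom_id": "N"},
    {"id": "2", "type_symbol": "C", "label_atom_id": "CA"},
]


def category_to_xml(category, rows):
    """Emit one element per row and one child element per item."""
    root = ET.Element(category + "Category")
    for row in rows:
        elem = ET.SubElement(root, category)
        for item, value in row.items():
            ET.SubElement(elem, item).text = value
    return ET.tostring(root, encoding="unicode")


def category_to_table(conn, category, rows):
    """Create one table per category with one TEXT column per item."""
    columns = list(rows[0])
    col_defs = ", ".join(f'"{c}" TEXT' for c in columns)
    conn.execute(f'CREATE TABLE "{category}" ({col_defs})')
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f'INSERT INTO "{category}" VALUES ({placeholders})',
        [tuple(r[c] for c in columns) for r in rows],
    )


xml_text = category_to_xml("atom_site", rows)

conn = sqlite3.connect(":memory:")
category_to_table(conn, "atom_site", rows)
found = conn.execute(
    "SELECT label_atom_id FROM atom_site WHERE id = ?", ("2",)
).fetchone()

print(xml_text)
print(found)
```

Since both outputs are generated from the same category and item names, a change to the underlying dictionary propagates to every representation, which is the interoperability point the slide makes.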
16 Access
- PDB SDSC Access Site
  - http://www.pdb.org/
- PDB Deposition Sites
  - http://autodep.ebi.ac.uk/
  - http://pdbdep.protein.osaka-u.ac.jp/adit/
  - http://pdb.rutgers.edu/adit/
- PDB Software Download Site
  - http://pdb.rutgers.edu/software/
- PDB mmCIF Resource Site
  - http://pdb.rutgers.edu/mmcif/
- mmCIF Beta Data Site
  - ftp://beta.rcsb.org/pub/pdb/uniformity/data/mmCIF/
17 The PDB is supported by funds from the NSF, DOE, and two units of the NIH: the NIGMS and NLM.