Title: Abstract
Overview and Development Status of the Biodefense
Proteomics Data Center
X. Chen¹, O. Crasta¹, C. Zhang¹, R. Jha¹, S.
Baker¹, W. Sun¹, C. Kommidi¹, M. Moore², K.
Stemple³, B. Sobral¹
¹ Virginia Bioinformatics Institute, Virginia
Tech, Blacksburg, VA 24061
² Social & Scientific Systems, Inc., 8757 Georgia
Ave, Silver Spring, MD 20910
³ Genomics Program, DHHS/NIH/NIAID/DMID, 6610
Rockledge Dr., MSC 6603, Room 6W06, Bethesda, MD
20892
ADMINISTRATIVE RESOURCE FOR BIODEFENSE PROTEOMIC
RESEARCH CENTERS
Abstract
VBI is part of the Administrative Resource Center
for Biodefense Proteomics Research Centers, funded
through an NIH/NIAID proteomics initiative whose
goal is to improve defense against bioterrorism
and emerging/reemerging infectious diseases. A
Proteomics Data Center has been developed at VBI
(VBIPDC), with a publicly accessible web site at
http://proteinbank.vbi.vt.edu/bprc. The VBIPDC
has developed an Oracle-based relational database
system with multiple schemas modeled on the
following data types: 2D gel, mass spectrometry
(MS), yeast two-hybrid (Y2H), and administration.
The administration and proteomics database
schemas were developed using the Erwin data
modeling tool with a process-oriented
entity-relationship model. The system currently
supports proteomics data management, data
downloads, and data uploads. The software for
data loading and data processing is implemented
in the Java programming language. The data
visualization and web interface are implemented
with J2EE technology within the Struts framework.
The dynamically generated data query interface is
implemented on top of an Oracle-based relational
database management system that currently
contains multiple schemas based on various
proteomics data types such as 2D gel, MS, and Y2H
protein-protein interaction. The data are stored
in a process-oriented fashion based on a modified
Pedro class diagram (Taylor et al., 2003) [1].
Three phases of database design and development
have been initiated: (1) the current database,
which uses a process-oriented design for datasets
generated by each PRC site; (2) a single schema
that uses a normalized and consolidated data
model to remove information redundancy, to be
developed within the next year; and (3) a
multi-layered database architecture with
physical, logical, and application layers, to be
developed during the course of this project as
the final production instance (Fig. 6).
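The move from phase (1) to phase (2) can be
illustrated with a minimal sketch. It assumes
per-site rows that repeat protein details on every
observation and consolidates them so that each
protein is stored once. All class, field, and
value names below are hypothetical placeholders,
not the VBIPDC schema, and the in-memory lists
only stand in for relational tables.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: every name and value here is a hypothetical placeholder. */
final class NormalizeSketch {

    /** Phase-1 style row: protein details are repeated on every observation. */
    record SiteRow(String site, String accession, String proteinName, String experimentType) {}

    /** Phase-2 style tables: each protein is stored once and referenced by id. */
    record Protein(int id, String accession, String name) {}
    record Observation(int proteinId, String site, String experimentType) {}
    record Normalized(List<Protein> proteins, List<Observation> observations) {}

    /** Splits redundant rows into a protein table and an observation table. */
    static Normalized normalize(List<SiteRow> rows) {
        Map<String, Protein> byAccession = new LinkedHashMap<>();
        List<Observation> observations = new ArrayList<>();
        for (SiteRow row : rows) {
            Protein protein = byAccession.get(row.accession());
            if (protein == null) {
                // First time this accession is seen: assign the next surrogate id.
                protein = new Protein(byAccession.size() + 1, row.accession(), row.proteinName());
                byAccession.put(row.accession(), protein);
            }
            observations.add(new Observation(protein.id(), row.site(), row.experimentType()));
        }
        return new Normalized(new ArrayList<>(byAccession.values()), observations);
    }

    public static void main(String[] args) {
        List<SiteRow> rows = List.of(
            new SiteRow("PRC-A", "ACC001", "example protein 1", "2D gel"),
            new SiteRow("PRC-B", "ACC001", "example protein 1", "MS"),
            new SiteRow("PRC-B", "ACC002", "example protein 2", "Y2H"));
        Normalized n = normalize(rows);
        System.out.println(n.proteins().size() + " proteins, "
            + n.observations().size() + " observations");
    }
}
```

Run as written, the three input rows collapse to
two protein records and three observations. In a
relational setting the same idea would typically
be expressed with a foreign key from the
observation table to the protein table.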
Introduction
The National Institute of Allergy and Infectious
Diseases (NIAID), part of the National Institutes
of Health (NIH), awarded a five-year contract
(2004-2009) to Social & Scientific Systems, Inc.
(SSS), which subcontracted to the Virginia
Bioinformatics Institute (VBI) and Georgetown
University (GU), to serve as the Administrative
Resource Center (ARC) for the seven awarded
Proteomics Research Centers (PRCs) (Fig. 1). The
contract supports NIAID's goals to improve
defense against bioterrorism and
emerging/reemerging infectious diseases. Under
this contract, the ARC supports the PRCs by
developing and maintaining a publicly accessible
website (http://www.proteomicsresource.org) that
contains data, reagents, standard operating
procedures (SOPs), and technology protocols
generated by the PRCs. Within the ARC, SSS
coordinates the project, the reagent inventory,
and protocol and SOP development and maintenance;
GU builds data analysis tools for the PRCs; and
VBI designs, develops, and maintains a
centralized relational database to manage
proteomics datasets generated by the PRCs and
develops web interfaces to query the data
(Fig. 2).
Fig. 6. Three design logics and three phases
of VBI proteomics database system
Fig. 5. VBIPDC system architecture
VBIPDC will also support genomics and microarray
data management, through partial funding from the
PATRIC and PathPort projects [2, 3]. The genomic
data management uses the GUS schema
(http://www.gusdb.org), and the microarray data
management system is modeled after ArrayExpress
(http://www.ebi.ac.uk/arrayexpress).
The administration and proteomics database
schemas were developed using the Erwin data
modeling tool with a process-oriented
entity-relationship model (Fig. 7). The database
system will adhere to community standards such as
MIAME, MIAPE, and PSI-MI. Proteomics datasets
from various sources are converted into a
standard format for normalization, then
decomposed into relational form and uploaded into
the database (Fig. 8). During the course of this
project, VBIPDC will advance the system with a
generic key-value pair database schema as the
physical layer, process-oriented views or
materialized views as the object/logical layer,
and stored procedures or packages as the data
processing layer. Until the data are made
available to the public, secure data access is
enforced by a dual security mechanism: role-based
Java Authentication and Authorization at the
application layer, with users mapped to matching
roles at the database level.
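The planned three-layer design described above
(key-value physical layer, process-oriented
logical views, stored-procedure processing layer)
can be sketched in memory as follows. This is an
illustration under stated assumptions, not the
VBIPDC implementation: in the database the
logical layer would be a view or materialized
view and the processing layer a stored procedure,
whereas here plain Java methods stand in for
both. All names and values are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative only: an in-memory stand-in for the three database layers. */
final class KeyValueLayerSketch {

    /** Physical layer: one generic row per attribute of an entity. */
    record KeyValueRow(long entityId, String entityType, String key, String value) {}

    /** Logical layer: pivots the generic rows of one entity type into one record per entity. */
    static Map<Long, Map<String, String>> logicalView(List<KeyValueRow> rows, String entityType) {
        Map<Long, Map<String, String>> view = new TreeMap<>();
        for (KeyValueRow row : rows) {
            if (row.entityType().equals(entityType)) {
                view.computeIfAbsent(row.entityId(), id -> new LinkedHashMap<>())
                    .put(row.key(), row.value());
            }
        }
        return view;
    }

    /** Processing layer: stand-in for a stored procedure that queries the view. */
    static long countWhere(Map<Long, Map<String, String>> view, String key, String value) {
        return view.values().stream()
            .filter(entity -> value.equals(entity.get(key)))
            .count();
    }

    public static void main(String[] args) {
        List<KeyValueRow> rows = List.of(
            new KeyValueRow(1L, "y2h_interaction", "bait", "ACC001"),
            new KeyValueRow(1L, "y2h_interaction", "prey", "ACC002"),
            new KeyValueRow(2L, "y2h_interaction", "bait", "ACC001"),
            new KeyValueRow(2L, "y2h_interaction", "prey", "ACC003"),
            new KeyValueRow(3L, "gel_spot", "gel", "G1"));
        Map<Long, Map<String, String>> interactions = logicalView(rows, "y2h_interaction");
        System.out.println("Interactions with bait ACC001: "
            + countWhere(interactions, "bait", "ACC001"));
    }
}
```

The trade-off is the usual one for generic
schemas: new attributes need no table changes in
the physical layer, while the logical layer
carries the cost of reassembling typed records.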
Fig. 2. Responsibilities of Administrative
Resource Center
Fig. 1. NIAID Proteomics Research Program
Progress of NIAID-funded VBI Proteomics Database
Project
VBIPDC has developed a proteomics database system
with a web interface to facilitate storage,
visualization, and analysis of proteomics
datasets generated by the PRCs (Fig. 3). The
system employs two data storage systems: an
Oracle-based relational database and a networked
file server that is used to facilitate data
queries and quick data download/upload (Fig. 4,
5). VBIPDC currently provides account and
document management, data query, data download,
data upload/submission, and data management for
datasets generated from various proteomics
experiments. Three database instances have been
created as our database development environment:
development, test/stage, and production.
Fig. 8. Data flow of proteomics data
management, data reformatting, and data
standardization
Fig. 7. An example database schema of VBI Y2H
protein-protein interaction modeled using Erwin
based on Pedro class diagram
References
[1] C. F. Taylor et al. (2003). A systematic
approach to modeling, capturing, and
disseminating proteomics experimental data.
Nature Biotechnology 21: 247-254.
[2] PathoSystems Resource Integration Center at
VBI (https://patric.vbi.vt.edu/)
[3] The Pathogen Portal Project at VBI
(http://pathport.vbi.vt.edu/)
Acknowledgement
This work is funded through NIAID contract
number HHSN266200400061C. We thank June Mullins
at VBI for help in designing this poster.
Fig. 3. VBI Proteomics Data Center web interface
Fig. 4. VBI computing system architecture