Title: The Cybercell Database CCDB
1. The Cybercell Database (CCDB)
Presenter: Shan Sundararaj
Provisional Ph.D. Candidate
Department of Pharmacy
University of Alberta
Supervisor: Dr. D. Wishart
Date: December 12, 2002
2. Outline
- What is the Cybercell project?
- What is the CCDB?
- How was it made and how does it work?
- What can it do?
- Conclusion and future plans
3. Project Cybercell
- Objective
  - Create a virtual biological cell
- Who's involved
  - Researchers across Canada and internationally
  - Both universities and private companies
  - Dozens of structural biologists, computer scientists, bioinformaticists, etc.
4. Project Cybercell
- Objective: Create a virtual cell
  - Simulate all elements and processes of a biological cell on a computer
  - Use Escherichia coli as a model
  - Use it to predict cellular phenomena
    - e.g. response to drugs or environment
  - Test simulations with the best quantitative data possible
5. Cybercell example
A model of a cell with a metabolite diffusing into it through transport proteins and being converted to a different molecule by an enzyme
http://129.128.166.250/research/simulation_gallery/4d/cybercell4.mov
To be useful, this simulation requires very specific information about enzyme availability and rates
6. Cybercell Database (CCDB)
8. CCDB
- The CCDB is an evolving repository of quantitative data on proteins and genes compiled from a variety of data sources
- Includes browsable, searchable and sortable lists and links of:
- protein names
- sequences
- functions
- structures
- physicochemical
- constants
- cofactors
- copy numbers
- products
- reactants
- binding partners
- More
9. CCDB Statistics
10. The Colicard
- Fundamental unit of the database (i.e. the record)
- One exists for each of the 4374 proteins identified in E. coli K12
11. Interesting Features
- Can view 3-D structures using WebMol
- Homology models created for proteins with an appropriate template
- Contains lists of protein interacting partners and protein complexes
  - Useful when looking for homologous interacting proteins
- Enzyme information from BRENDA
12. Related Databases
- CCRNA
  - Equivalent database for tRNA and rRNA molecules
- CC3D
  - Database that focuses on structural information
- CCMD
  - Database of data on small metabolic compounds
14. Data gathering
- There is a VAST amount of varied information available about proteins, nucleic acids and other molecules!
- All stored nicely in the CCDB now, but where does it originate?
15. Information Sources
- Manual literature searches
- Other databases
16. Database structure
- CCDB is a flat-file database
- Individual text file for each Colicard
- Create an index file of all the pertinent information and use Perl CGI scripts to make a searchable, sortable web interface
- Contains many links to references and other databases for more detailed information
- Use other tools to perform search and data extraction functions for extra functionality
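The flat-file-plus-index idea above can be sketched in a few lines. This is only an illustration, not the CCDB's actual Perl code: the "FIELD: value" card layout, the field names, and the helpers `parse_colicard`, `build_index` and `sort_index` are all assumptions made for the example.

```python
from pathlib import Path
import tempfile


def parse_colicard(text):
    """Parse 'FIELD: value' lines into a dict (hypothetical card layout)."""
    record = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            record[key.strip()] = value.strip()
    return record


def build_index(card_dir, fields):
    """Read every *.txt card in card_dir and keep only the indexed fields."""
    index = []
    for path in sorted(Path(card_dir).glob("*.txt")):
        record = parse_colicard(path.read_text())
        row = {field: record.get(field, "") for field in fields}
        row["file"] = path.name  # remember which card the row came from
        index.append(row)
    return index


def sort_index(index, field, numeric=False):
    """Return rows ordered by one field, as a sortable browse page would."""
    def key(row):
        return float(row[field] or 0) if numeric else row[field].lower()
    return sorted(index, key=key)


# Demo with two made-up cards written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "card_a.txt").write_text(
        "Protein Name: Example protein B\nGene Name: exB\nLength: 310\n")
    Path(tmp, "card_b.txt").write_text(
        "Protein Name: Example protein A\nGene Name: exA\nLength: 95\n")
    index = build_index(tmp, ["Protein Name", "Gene Name", "Length"])

by_length = sort_index(index, "Length", numeric=True)
print([row["Gene Name"] for row in by_length])
```

Keeping one small index separate from the card files means a browse page only has to read and sort the index, and opens an individual card only when a user follows its link.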
17. Data Gathering
- Would take years to do it all manually!
- Use manual forms to fill in some information
- Make use of a web robot to query databases and build as much of the Colicards as possible
18. Robots Can Be Annoying
- Many sites do not like robots (e.g. EcoCyc)
- Some sites disallow robots altogether (e.g. BRENDA)
- Have to be polite: look at the robots.txt file, and don't query too fast or too often
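The politeness rules above can be sketched with Python's standard `urllib.robotparser`. This is a minimal sketch, not the CCDB robot itself: the robots.txt content, the agent name `ccdb-demo-bot`, the example URLs, and the `polite_fetch` helper are all hypothetical. The `fetch` and `sleep` arguments are injected so the demo runs without touching the network.

```python
import time
from urllib.robotparser import RobotFileParser


def make_parser(robots_lines):
    """Build a parser from the lines of an already-downloaded robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser


def polite_fetch(parser, agent, urls, fetch, default_delay=1.0, sleep=time.sleep):
    """Fetch only the URLs robots.txt allows, pausing between requests."""
    # Use the site's Crawl-delay if it declares one, else our own default.
    delay = parser.crawl_delay(agent) or default_delay
    results = {}
    for url in urls:
        if not parser.can_fetch(agent, url):
            continue  # the site asked robots to stay out of this path
        results[url] = fetch(url)
        sleep(delay)
    return results


# Demo: a made-up robots.txt and a fake fetcher, so nothing is downloaded.
robots_txt = [
    "User-agent: *",
    "Crawl-delay: 2",
    "Disallow: /cgi-bin/",
]
parser = make_parser(robots_txt)
pauses = []
pages = polite_fetch(
    parser,
    "ccdb-demo-bot",
    ["http://example.org/data/card1.txt",
     "http://example.org/cgi-bin/search.cgi"],
    fetch=lambda url: "contents of " + url,
    sleep=pauses.append,  # record the pauses instead of really sleeping
)
print(sorted(pages), pauses)
```

In real use, `RobotFileParser.set_url` followed by `read` would download the site's robots.txt, and `fetch` would be an actual HTTP request.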
19. Update robots (cont.)
- Weekly check of web-based databases to look for updates
- If an update is available, ALL updated files are downloaded
- Information is parsed and compared to the last available local copy (e.g. line-by-line comparison of Swiss-Prot files)
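A line-by-line comparison of a freshly downloaded file against the local copy could look like the following sketch, which uses Python's standard `difflib`. The `changed_lines` helper and the two short entries are made up for illustration (they only imitate the style of Swiss-Prot line codes), and this is not claimed to be the method the CCDB update robot uses.

```python
import difflib


def changed_lines(old_text, new_text):
    """Return (added, removed) lines between the local copy and the update."""
    old = old_text.splitlines()
    new = new_text.splitlines()
    added, removed = [], []
    for line in difflib.ndiff(old, new):
        if line.startswith("+ "):
            added.append(line[2:])
        elif line.startswith("- "):
            removed.append(line[2:])
        # Lines starting with "  " are unchanged and "? " lines are hints.
    return added, removed


# Demo with a made-up entry; only the description line differs.
local_copy = "ID   EXAMPLE_ECOLI\nDE   Example protein.\nGN   exA\n"
downloaded = "ID   EXAMPLE_ECOLI\nDE   Example protein, revised.\nGN   exA\n"

added, removed = changed_lines(local_copy, downloaded)
if added or removed:
    print("card needs updating:", added, removed)
```

Only cards whose source entries actually changed then need to be rebuilt, which keeps the weekly update cheap.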
21. Search/Sort functions
- Main browsing page allows you to sort by any of 11 characteristics, for example:
  - Protein Name
  - Swiss-Prot ID
  - Number of amino acids
  - Function
  - Gene Name
- Good for a broad overview, but what if you want to look at a specific protein?
http://redpoll.pharmacy.ualberta.ca/CCDB/cgi-bin/ECARD_BROWS_NEWn.cgi?hits20browsn8pag1acco11
22. Data Extractor
- Simple-to-use JavaScript-based interface
- Can choose proteins based on ANY type of information in the database
- Uses pre-parsed pairwise data sets and CGI scripts to extract and display data
http://redpoll.pharmacy.ualberta.ca/CCDB/CCDB_Ext.html
23. WebGlimpse Search
- An even simpler, though less robust, search method
- Uses a program called WebGlimpse that indexes all the Colicard files and returns those that match the search criteria somewhere in the file
- Useful because it allows for misspellings, which the data extractor does not
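To illustrate why tolerance for misspellings matters, here is a small sketch of approximate matching over card text. It uses `difflib.get_close_matches` from Python's standard library as a stand-in; it is not WebGlimpse and does not reproduce its indexing or matching algorithm. The card contents and the `fuzzy_search` helper are hypothetical.

```python
import difflib
import re


def tokenize(text):
    """Split card text into a sorted list of unique lowercase words."""
    return sorted(set(re.findall(r"[a-z0-9]+", text.lower())))


def fuzzy_search(cards, query, cutoff=0.8):
    """Return names of cards containing a word similar to the query."""
    query = query.lower()
    hits = []
    for name, text in cards.items():
        # One sufficiently close word anywhere in the file is a match.
        if difflib.get_close_matches(query, tokenize(text), n=1, cutoff=cutoff):
            hits.append(name)
    return sorted(hits)


# Demo with two made-up cards and a misspelled query.
cards = {
    "card_a.txt": "Protein Name: Example kinase",
    "card_b.txt": "Protein Name: Example transporter",
}
misspelled = fuzzy_search(cards, "kinnase")
exact = fuzzy_search(cards, "transporter")
print(misspelled, exact)
```

Lowering `cutoff` tolerates more errors at the cost of more false matches. An exact-match lookup, as the data extractor is described, would return nothing for the misspelled query.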
24. The Whole Thing
- The entire CCDB database in text files is available for download at http://redpoll.pharmacy.ualberta.ca/bahram/cgi-bin/Pdownload.cgi
- This is for people who would like to have a local copy of all the information
26. Conclusion
- The Cybercell project aims to simulate a biological cell on a computer in the next 5-10 years
- The CCDB will act as a repository of all known and discovered information regarding E. coli molecules during the project
- CCDB is updated automatically as well as by manual forms to remain up-to-date
- CCDB can be queried in several ways (browsing, data extractor, search)
27. Thanks
- Dr. Wishart
- Bahram Habibi-Nazhad
- An Chi Guo
- Haiyan Zhang
- Melania
- Everyone in the lab