Title: ChEBI: The story so far
1. ChEBI: The story so far
2. Private Data
Public Data
3. The state of affairs of bioinformatics in 2002
- Bioinformatics is booming
- Human Genome sequence rough draft published June 2000
- Free resources and free data
4. A different story for chemoinformatics
- Private data and private software
5. Too hard to solve? Let's put our heads in the sand
6. Bioinformatics data too large to keep track of chemical compounds
- 100,000 protein entries in Swiss-Prot (2002)
- 20 million entries in EMBL Database (2002)
- Small databases unable to keep track
- ENZYME resources: 3,500 enzymatic reactions
7. New initiatives start up
- PubChem
- Chemical repository, millions of entries, focus on screening assays
- ChEBI
- Manually annotated database, nomenclature reference and compound database, tens of thousands of entries
8. Principles of foundation
- December 2002: email exchanges within the EBI to address the issue of chemistry
- Three principles outlined
9. Nothing held in the database must be proprietary or derived from a proprietary source that would limit its free distribution/availability to anyone.
10. Every data item in the database should be fully traceable and explicitly referenced to the original source/version.
11. Although the EBI will provide a web interface, the entirety of the data should be available to all without constraint as, for example, SQL table dumps, ASCII tables, and XML (e.g. DAML+OIL).
12. We make a start using existing resources
- Integrate three resources
- KEGG Compound
- IntEnz
- Chemical Ontology
- Annotation starts summer 2003
- Focus on nomenclature
13. Our first release was modest, but it was a start
- 21 July 2004
- 2783 annotated entities
- Data
- ChEBI Name, ChEBI Id
- IUPAC Names, Synonyms
- Formula
- Cross-references
14. We introduce structures - Sep 2005
- Molfiles
- InChI (IUPAC International Chemical Identifier)
- SMILES (Simplified Molecular Input Line Entry System)
- Image (PNG)
15. Marvin in ChEBI
16. We start editing the chemical ontology - Dec 2005
17. Web Services - Oct 2006
- Programmatic access to a ChEBI entry
- SOAP-based Java implementation
- Clients currently available in Java and Perl
- Four methods with which to access data
- getLiteEntity
- getCompleteEntity
- getOntologyParents
- getOntologyChildren
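As a rough illustration of what a call to one of these four methods might look like on the wire, here is a minimal sketch that only builds a SOAP 1.1 request body and never sends it. The service namespace in `CHEBI_NS` and the parameter name `chebiId` are assumptions that are not stated in these slides; both should be checked against the service's WSDL before use.

```python
import xml.etree.ElementTree as ET

# Standard SOAP 1.1 envelope namespace.
SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
# Assumption: service namespace; confirm against the published WSDL.
CHEBI_NS = "https://www.ebi.ac.uk/webservices/chebi"


def build_request(method, params):
    """Return a SOAP envelope (as a string) calling `method` with `params`."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{CHEBI_NS}}}{method}")
    for name, value in params.items():
        # Each parameter becomes a child element of the method element.
        ET.SubElement(call, f"{{{CHEBI_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


# "chebiId" is an assumed parameter name; CHEBI:15377 is the entry for water.
request = build_request("getCompleteEntity", {"chebiId": "CHEBI:15377"})
print(request)
```

The same helper could be reused for `getLiteEntity`, `getOntologyParents` and `getOntologyChildren` by changing the method name and parameters; a generated client from the WSDL would normally replace hand-built envelopes like this one.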
18. Automated Cross References - Aug 2007
- Current databases: UniProtKB, Reactome, BioModels, IntAct, SABIO-RK, PubChem and ArrayExpress
19. Chemical Structure Searching - May 2008
20. After all this, where are we?
23. Annotation is linear
24. Diversity of users
- Constant challenge of balancing our users' varied
interests.
25. Our positives
- Nomenclature database
- Manually annotated data
- Attention to detail
- Free and accessible
- Loyal users
26. Our not-so-positives
- Size for some people
- Not well integrated into other bioinformatics resources
- Community interaction
- No software publicly available to manipulate the database
27. Involve the community
- Create a web-based submission tool
- Users can easily submit their entities on a one-to-one basis
- Also allowing bulk submission from other resources
28. Improvements to data depth
- Addition of more Xrefs: PDB, MACiE ???
- Addition of more chemical attributes? What chemical attributes?
- Text mining projects to extract relevant chemical information from patents, journals
- European Patent Office
29. Going Open Source
- Commercial software packages will be replaced with Open Source
- Long-term goal: allow people to create a free local installation of ChEBI
- Distribution of data in useful formats: CML, SDF
30. Acknowledgements
- IntEnz Team
- Rafael Alcántara, Volker Ast, Kristian Axelsen, Anne Morgat
- EPO Collaborators
- Hélène Courrier, Stephane Nauche, Jeremy Parsons
- Database supporters
- ArrayExpress, IntAct, Reactome, SABIO-RK, RSC, GO, RESID etc.
- ChEBI Team
- Paula de Matos, Kirill Degtyarenko, Marcus Ennis, Janna Hastings, Christoph Steinbeck
- Alumni
- Michael Darsow, Mickael Guedj, Alan McNaught, Martin Zbinden
- ChEBI supporters
- Rolf Apweiler, Michael Ashburner, Henning Hermjakob, Janet Thornton
31. Requirements for submitting data to ChEBI
- Disclaimer: this is only a summary of a chat I had with the ChEBI coordinator last night, so no promises!
- Information needed to submit a compound
- Structure
- Name, synonyms
- Registry
- Database accession(s)
- Mapping to ChEBI Ontology
- ChEBI currently quite busy with ongoing projects,
but would consider taking submissions.
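The checklist above could be captured as a simple record that reports which items are still missing before a compound is sent. This is only a sketch: the class name, field names, and the choice to treat synonyms as optional are assumptions made for illustration, not a ChEBI submission format.

```python
from dataclasses import dataclass, field


@dataclass
class CompoundSubmission:
    """One compound prepared for submission (hypothetical record layout)."""
    structure: str                                         # e.g. a molfile or an InChI string
    name: str
    synonyms: list = field(default_factory=list)           # assumed optional
    registry_numbers: list = field(default_factory=list)
    database_accessions: list = field(default_factory=list)
    ontology_parents: list = field(default_factory=list)   # ChEBI IDs of parent classes

    def missing_items(self):
        """Return the checklist items that are still empty."""
        checklist = {
            "Structure": self.structure.strip(),
            "Name": self.name.strip(),
            "Registry": self.registry_numbers,
            "Database accession(s)": self.database_accessions,
            "Mapping to ChEBI Ontology": self.ontology_parents,
        }
        return [item for item, value in checklist.items() if not value]


# Methane is used purely as an illustration of a partly filled record.
draft = CompoundSubmission(
    structure="InChI=1S/CH4/h1H4",
    name="methane",
    synonyms=["CH4"],
    registry_numbers=["74-82-8"],
)
print(draft.missing_items())
# → ['Database accession(s)', 'Mapping to ChEBI Ontology']
```

Running such a check over a whole library before contacting the curators would show how much preparatory work remains on the submitter's side.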
32. What could be done within APO-SYS
- From Pekka's talk, I gathered that there are about 5,000 to 10,000 compounds in these siRNA libraries.
- Question: who else is dealing with compounds in APO-SYS?
- One could use ChEBI's web service with InChI to identify what is already in the database.
- ChEBI can do targeted curation, provided there is funding for the curation team.
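The InChI-based check suggested above amounts to splitting a compound library into entries that already have a ChEBI ID and entries that would need curation. The sketch below uses a small in-memory dictionary as a stand-in for the database lookup; how that index is obtained (web service calls or a data dump) is left open, and the function name is hypothetical.

```python
def partition_by_inchi(library, known_index):
    """Split InChI strings into those already known and those that are new.

    library     -- iterable of InChI strings from the compound library
    known_index -- dict mapping InChI string -> ChEBI ID (stand-in for a real lookup)
    """
    found, unknown = {}, []
    for inchi in library:
        key = inchi.strip()  # tolerate stray whitespace from file exports
        if key in known_index:
            found[key] = known_index[key]
        else:
            unknown.append(key)
    return found, unknown


# Tiny illustrative index; a real one would hold tens of thousands of entries.
known_index = {
    "InChI=1S/H2O/h1H2": "CHEBI:15377",                  # water
    "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3": "CHEBI:16236",  # ethanol
}
library = [
    "InChI=1S/H2O/h1H2",
    "InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H ",  # benzene, absent from the toy index
]

found, unknown = partition_by_inchi(library, known_index)
print(found)    # entries that need no new curation
print(unknown)  # candidates for submission
```

Exact string matching is the simplest possible criterion; it assumes both sides generate InChI with the same software version and options.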