Title: ChEBI
1ChEBI
Kirill Degtyarenko, EMBL-EBI / EPO
2The team
- Rafael Alcántara
- Michael Ashburner
- Volker Ast
- Michael Darsow
- Paula de Matos
- Marcus Ennis
- Janna Hastings
- Alan McNaught
- Inma Spiteri
- Christoph Steinbeck
- Martin Zbinden
3ChEBI What is it?
Chemical Entities of Biological Interest: an
EBI database/dictionary of biochemical compounds
4What are the biochemical compounds?
"Can be defined as consisting of molecules not
directly encoded by the genome ... that are
either the products of nature or are synthetic
products used ... to intervene in the processes
of living organisms" (Michael Ashburner)
5Molecular entity
"Any constitutionally or isotopically distinct
atom, molecule, ion, ion pair, radical, radical
ion, complex, conformer etc., identifiable as a
separately distinguishable entity" (IUPAC Gold Book)
6In fact, ChEBI contains
- Molecular entities
  - trans-vaccenic acid
- Groups
  - trans-vaccenoyl group
- Classes
  - fatty acids
7Small molecules?
- Yes, but big molecules as well!
  - alumina
  - amylose
  - metaborate
  - poly(vinyl alcohol)
8Current status (17.12.08)
91-D ChEBI
- Numeric ID
- Carefully checked terminology
- Unambiguous ChEBI name
- IUPAC names
- Cross-references to free resources
10Unambiguous ChEBI name
- CHEBI:28918
- L-adrenaline
- not just adrenaline
11Systematic Name (IUPAC)
2-{[3-(trifluoromethyl)phenyl]amino}benzoic acid
12Common Name
- flufenamic acid (INN English)
- acide flufénamique (INN French)
- ácido flufenámico (INN Spanish)
- acidum flufenamicum (INN Latin)
- Flufenaminsäure (German)
13The Unpronounceables
CHEBI:48935 (E)-roxithromycin
IUPAC name: (3R,4S,5S,6R,7R,9R,10E,11S,12R,13S,14R)-
4-(2,6-dideoxy-3-C-methyl-3-O-methyl-α-L-ribo-hexopyranosyloxy)-
14-ethyl-7,12,13-trihydroxy-
10-{[(2-methoxyethoxy)methoxy]imino}-
6-[3,4,6-trideoxy-3-(dimethylamino)-β-D-xylo-hexopyranosyloxy]-
3,5,7,9,11,13-hexamethyloxacyclotetradecan-2-one
14What is the common name of roxithromycin?
15Roxithromycin (2)
CHEBI:48844 roxithromycin
(E)-roxithromycin
(Z)-roxithromycin
16What is thiamine?
17Need for 2-D
- "Better to see the face than to hear the name" (Zen proverb)
- Structures and identifiers based on structures offer new ways of crosslinking to other databases
- Structure search
18Connection table
ChEBI
  9 10  0  0  0  0            999 V2000
   11.8219   -7.2713    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   11.8219   -8.0922    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   12.6074   -7.0165    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
   11.1072   -6.8574    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   12.6039   -8.3505    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
   11.1072   -8.5027    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
   13.0886   -7.6818    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   10.3923   -7.2713    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
   10.3888   -8.0922    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  2  0  0  0  0
  1  3  1  0  0  0  0
  1  4  1  0  0  0  0
  2  5  1  0  0  0  0
  2  6  1  0  0  0  0
  3  7  1  0  0  0  0
  4  8  2  0  0  0  0
  6  9  2  0  0  0  0
  5  7  2  0  0  0  0
  8  9  1  0  0  0  0
M  END
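To make the layout of the connection table concrete, here is a minimal Python sketch that reads the counts line, the atom block and the bond block, and derives a molecular formula. The names `parse_ctab`, `hill_formula` and `DEFAULT_VALENCE` are hypothetical helpers, not part of ChEBI or of any toolkit. The sketch assumes whitespace-separated fields (a strict V2000 reader slices fixed-width columns) and assumes implicit hydrogens fill carbon to valence 4 and nitrogen to valence 3.

```python
from collections import Counter

# The connection table from the slide, one record per line.
MOLBLOCK = """\
ChEBI
  9 10  0  0  0  0            999 V2000
   11.8219   -7.2713    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   11.8219   -8.0922    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   12.6074   -7.0165    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
   11.1072   -6.8574    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   12.6039   -8.3505    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
   11.1072   -8.5027    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
   13.0886   -7.6818    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   10.3923   -7.2713    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
   10.3888   -8.0922    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  2  0  0  0  0
  1  3  1  0  0  0  0
  1  4  1  0  0  0  0
  2  5  1  0  0  0  0
  2  6  1  0  0  0  0
  3  7  1  0  0  0  0
  4  8  2  0  0  0  0
  6  9  2  0  0  0  0
  5  7  2  0  0  0  0
  8  9  1  0  0  0  0
M  END
"""

# Assumption: neutral atoms with these default valences only.
DEFAULT_VALENCE = {"C": 4, "N": 3}


def parse_ctab(text):
    """Return (element symbols, [(atom1, atom2, bond order), ...])."""
    lines = text.splitlines()
    # The counts line is the one that ends with the version tag.
    start = next(i for i, line in enumerate(lines)
                 if line.rstrip().endswith("V2000"))
    n_atoms, n_bonds = (int(v) for v in lines[start].split()[:2])
    atom_lines = lines[start + 1:start + 1 + n_atoms]
    bond_lines = lines[start + 1 + n_atoms:start + 1 + n_atoms + n_bonds]
    atoms = [line.split()[3] for line in atom_lines]  # x, y, z, symbol
    bonds = [tuple(int(v) for v in line.split()[:3]) for line in bond_lines]
    return atoms, bonds


def hill_formula(atoms, bonds):
    """Add implicit hydrogens and format the formula in Hill order."""
    used = Counter()
    for a, b, order in bonds:  # atom indices are 1-based
        used[a] += order
        used[b] += order
    counts = Counter(atoms)
    counts["H"] = sum(DEFAULT_VALENCE[symbol] - used[i]
                      for i, symbol in enumerate(atoms, start=1))
    ordered = [e for e in ("C", "H") if counts[e]]
    ordered += sorted(e for e in counts if e not in ("C", "H"))
    return "".join(e + (str(counts[e]) if counts[e] > 1 else "")
                   for e in ordered)


atoms, bonds = parse_ctab(MOLBLOCK)
print(hill_formula(atoms, bonds))  # C5H4N4
```

The result agrees with the formula layer of the InChI shown on the InChI (2) slide.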
192-D ChEBI
- One or more 2-D (or 3-D) connection tables
- One is default
- Autogenerated images (PNG)
- Default diagrams should be unambiguous
20The Fine Art of chemical drawing
21Linear forms of monosaccharides
22Pyranose forms of monosaccharides
23Fused systems
(R)-camphor
ambiguous
unambiguous
24Square planar geometry
cisplatin
transplatin
25From 2-D back to 1-D
26SMILES (1)
- Simplified Molecular Input Line Entry System
- Developed by David Weininger in 1988
- Extended by others (e.g. Daylight)
- String of standard ASCII characters
- A number of valid SMILES can be produced for the same molecule
27SMILES (2)
- N1C=NC2=C1C=NC=N2
- c1ncc2ncnc2n1
- C1N\CN/C\2N/CN\C1/2
- c1ncnc2/N=C\Nc12
- n1cc2c(nc1)ncn2
- [H]c1nc([H])c2n([H])c([H])nc2n1
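As a rough illustration of "many strings, one molecule", the Python sketch below counts the heavy atoms in each SMILES from this slide. It is only a sanity check on composition, not canonicalisation: deciding that two SMILES really denote the same structure needs a cheminformatics toolkit. The helper `heavy_atoms` is hypothetical, handles only the organic-subset symbols and the bracketed `[H]` seen here, and the strings assume that `=` and `[ ]` characters lost from the transcript have been restored (the third string is copied as it appears).

```python
import re
from collections import Counter

# SMILES strings from the slide; raw strings keep the "\" bond symbols intact.
SMILES = [
    r"N1C=NC2=C1C=NC=N2",
    r"c1ncc2ncnc2n1",
    r"C1N\CN/C\2N/CN\C1/2",  # as it appears in the transcript
    r"c1ncnc2/N=C\Nc12",
    r"n1cc2c(nc1)ncn2",
    r"[H]c1nc([H])c2n([H])c([H])nc2n1",
]

# Organic-subset atoms: two-letter symbols first, then aliphatic, then aromatic.
ATOM = re.compile(r"Cl|Br|[BCNOPSFI]|[bcnops]")


def heavy_atoms(smiles):
    """Count non-hydrogen atoms per element, ignoring bonds and ring digits."""
    without_h = smiles.replace("[H]", "")  # explicit hydrogens are not heavy
    return Counter(token.capitalize() for token in ATOM.findall(without_h))


for s in SMILES:
    print(s, dict(heavy_atoms(s)))
```

Every string yields five carbons and four nitrogens, the purine skeleton.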
28InChI (1)
- IUPAC International Chemical Identifier or InChI
- Open source
- Developed by Stein, Heller, Tchekhovskoi and McNaught
- Used by NIST, PubChem, CML and ChEBI
29InChI (2)
InChI=1/C5H4N4/c1-4-5(8-2-6-1)9-3-7-4/h1-3H,(H,6,7,8,9)/f/h7H
InChIKey=KDCGOANMDULRCW-QDQILVOLCG
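An InChI is a sequence of layers separated by `/`, each introduced by a one-letter prefix. The Python sketch below splits the identifier from this slide into its layers. The function `split_inchi` is a hypothetical helper, not the IUPAC software; it assumes the `=` after `InChI` that the transcript dropped, and it only distinguishes the main layers from those after the fixed-hydrogen marker `/f`.

```python
PURINE_INCHI = "InChI=1/C5H4N4/c1-4-5(8-2-6-1)9-3-7-4/h1-3H,(H,6,7,8,9)/f/h7H"


def split_inchi(inchi):
    """Split an InChI string into version, formula and prefixed layers."""
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("not an InChI string")
    version, formula, *rest = inchi[len(prefix):].split("/")
    main, fixed_h = {}, {}
    target = main
    for layer in rest:
        if layer.startswith("f"):
            # Everything after /f belongs to the fixed-hydrogen section.
            target = fixed_h
            if len(layer) > 1:
                fixed_h["formula"] = layer[1:]
            continue
        target[layer[0]] = layer[1:]
    return {"version": version, "formula": formula,
            "main": main, "fixed_h": fixed_h}


layers = split_inchi(PURINE_INCHI)
print(layers["formula"])       # C5H4N4
print(layers["main"]["c"])     # connectivity: 1-4-5(8-2-6-1)9-3-7-4
print(layers["main"]["h"])     # hydrogens, including the mobile (H,6,7,8,9)
print(layers["fixed_h"]["h"])  # 7H: the tautomer pinned by the fixed-H layer
```

The mobile-hydrogen group `(H,6,7,8,9)` says the hydrogen may sit on any of the four nitrogens; the fixed-H layer then records the one actually drawn.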
30Limitations (1)
- Stereochemistry other than sp3 tetrahedral and sp2 trigonal planar
- Polymers
- Conformers
- Radicals/different spin state
- Topological isomers
- Mixtures
- Markush structures
31Limitations (2)
cisplatin
transplatin
InChI=1/2ClH.2H3N.Pt/h2*1H;2*1H3;/q;;;;+2/p-2
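The limitation on this slide can be checked mechanically: the single identifier shared by cisplatin and transplatin carries no stereo layer at all. The Python sketch below lists the stereo layers of an InChI, taking `b`, `t`, `m` and `s` as the stereo prefixes. The function name is hypothetical, and the string assumes the punctuation (`=`, `*`, `;`, `+`) lost from the transcript has been restored.

```python
PLATIN_INCHI = "InChI=1/2ClH.2H3N.Pt/h2*1H;2*1H3;/q;;;;+2/p-2"

# Layer prefixes that carry stereochemistry: double bond, tetrahedral,
# and the two flags that qualify the tetrahedral layer.
STEREO_PREFIXES = {"b", "t", "m", "s"}


def stereo_layers(inchi):
    """Return the stereo layers of an InChI, or an empty list if none."""
    layers = inchi.split("/")[2:]  # skip "InChI=1" and the formula
    return [layer for layer in layers if layer[:1] in STEREO_PREFIXES]


print(stereo_layers(PLATIN_INCHI))  # []
```

With only hydrogen, charge and proton layers present, nothing in the string can tell the cis arrangement from the trans one.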
323-D ChEBI
cisplatin
33Uncertainty and ambiguity in chemistry
- Compositional uncertainty
- Positional uncertainty
- Configurational uncertainty
- Conformational uncertainty
34Compositional uncertainty
- Examples
  - an alkali metal cation
  - vanadate(V) anion
  - [2H]ethanol
35Positional uncertainty
- Examples
  - L-bromohistidine residue
  - pteroic acid (several tautomers)
36Configurational uncertainty
- Examples
  - androstane
  - rel-(2R,3R)-2-amino-3-methylpentanoic acid
  - tetradec-11-enoic acid
37Conformational uncertainty
- Examples
  - cyclohexane: chair, boat, twist
  - protein secondary structure ?, ?, ?
38ChEBI ontology
- Molecular structure ontology
- Subatomic particle ontology
- Role ontology
  - Biological role
  - Application
39L-adrenaline
- Molecular structure ontology
  - catecholamines
- Biological role
  - hormone
- Application
  - antiglaucoma
  - bronchodilator
  - cardiostimulant
40The family relations
L-cystein-S-yl
L-cysteine()
L-cysteine zwitterion
cysteine
D-cysteine
L-cysteino
L-cysteine
L-cysteinium
L-cysteinyl
L-cysteinate(1−)
L-cysteine residue
L-cysteinate(2−)
L-cysteinate residue
41Relationships in ChEBI
- Is A (generic)
- Has Part (generic)
- Is Conjugate Acid Of (specific)
- Is Conjugate Base Of (specific)
- Is Enantiomer Of (specific)
- Is Tautomer Of (specific)
- Is Substituent Group From (specific), symbol R
- Has Parent Hydride (specific), symbol H
- Has Functional Parent (specific), symbol F
- Has Role (generic?)
42Is A relationship
L-cysteine is a cysteine
43Is Enantiomer Of
L-cysteine is enantiomer of D-cysteine
44Has Part
L-cysteine hydrochloride has part L-cysteinium
L-cysteinium is part of L-cysteine hydrochloride
45Is Conjugate Acid Of
L-cysteinium is conjugate acid of L-cysteine
L-cysteine is conjugate acid of L-cysteinate(1−)
L-cysteinate(1−) is conjugate acid of L-cysteinate(2−)
46Is Conjugate Base Of
L-cysteinate(2−) is conjugate base of L-cysteinate(1−)
L-cysteinate(1−) is conjugate base of L-cysteine
L-cysteine is conjugate base of L-cysteinium
47Acid/base relationships
L-cysteinium
L-cysteine
L-cysteinate(1−)
L-cysteinate(2−)
(ordered from most to least protonated)
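The two acid/base relationships mirror each other, so only the protonation order needs to be stored. The Python sketch below generates both sets of statements from one ordered list. The list name and function are hypothetical, and the charge signs in the names assume the minus signs lost from the transcript have been restored.

```python
# Species ordered from most to least protonated.
LADDER = [
    "L-cysteinium",
    "L-cysteine",
    "L-cysteinate(1−)",
    "L-cysteinate(2−)",
]


def acid_base_triples(ladder):
    """Pair each species with the next one down the protonation ladder."""
    triples = []
    for acid, base in zip(ladder, ladder[1:]):
        triples.append((acid, "is conjugate acid of", base))
        triples.append((base, "is conjugate base of", acid))
    return triples


for subject, relation, obj in acid_base_triples(LADDER):
    print(subject, relation, obj)
```

Four species give three acid/base pairs, hence six statements in total.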
48Is Tautomer Of
L-cysteine
L-cysteine zwitterion
49Is Tautomer Of
1H-pyrrole
3H-pyrrole
2H-pyrrole
50Has Parent Hydride
salutaridinol has parent hydride morphinan
morphinan is parent hydride of salutaridinol
51Has Functional Parent
7-O-acetylsalutaridinol has functional parent salutaridinol
salutaridinol is functional parent of 7-O-acetylsalutaridinol
52Is Substituent Group From
L-cysteine
L-cysteinyl
L-cysteino
L-cysteine residue
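The relationship slides above can be read as subject/relation/object statements, some symmetric and some with a named inverse. The Python sketch below stores the examples from those slides and derives the reverse statements. It only illustrates the idea and is not how ChEBI stores its ontology; the names `ASSERTED`, `INVERSE`, `SYMMETRIC` and `with_inverses` are hypothetical.

```python
# Statements taken from the example slides.
ASSERTED = [
    ("L-cysteine", "is a", "cysteine"),
    ("L-cysteine", "is enantiomer of", "D-cysteine"),
    ("L-cysteine", "is tautomer of", "L-cysteine zwitterion"),
    ("L-cysteine hydrochloride", "has part", "L-cysteinium"),
    ("salutaridinol", "has parent hydride", "morphinan"),
    ("7-O-acetylsalutaridinol", "has functional parent", "salutaridinol"),
    ("L-cysteinyl", "is substituent group from", "L-cysteine"),
]

# Relations that read the same in both directions.
SYMMETRIC = {"is enantiomer of", "is tautomer of"}

# Relations whose reverse direction has its own name on the slides.
INVERSE = {
    "has part": "is part of",
    "has parent hydride": "is parent hydride of",
    "has functional parent": "is functional parent of",
}


def with_inverses(triples):
    """Return the asserted statements plus every derivable reverse statement."""
    closed = set(triples)
    for subject, relation, obj in triples:
        if relation in SYMMETRIC:
            closed.add((obj, relation, subject))
        elif relation in INVERSE:
            closed.add((obj, INVERSE[relation], subject))
    return closed


for triple in sorted(with_inverses(ASSERTED)):
    print(*triple)
```

"Is a" is deliberately left one-directional: L-cysteine is a cysteine, but cysteine is not an L-cysteine.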
53The family relations
L-cysteine()
L-cysteinium
L-cystein-S-yl
cysteine
L-cysteine zwitterion
L-cysteine
D-cysteine
L-cysteino
L-cysteinyl
L-cysteinate(1−)
L-cysteine residue
L-cysteinate(2−)
L-cysteinate residue
54Ontology of L-cysteine
55Ontology of L-cysteine (1)
56Ontology of L-cysteine (2)
57Thank you