Title: Protein Pathways and Pathway Databases
1Protein Pathways and Pathway Databases
- Shan Sundararaj
- University of Alberta
- Edmonton, AB
- ss23_at_ualberta.ca
2Interactions → Networks → Pathways
- A collection of interactions defines a network
- Pathways are a subset of networks
- All pathways are networks of interactions, but not all networks are pathways!
- The difference is in the level of annotation/understanding
- We can define a pathway as a biological network that relates to a known physiological process or phenotype
3Pathways
- However, there is no precise biological definition of a pathway
- Our partitioning of networks into pathways is somewhat arbitrary
- We choose the start/finish points based on important or easily understood compounds
- Gives us the ability to conceptualize the mapping of genotype → phenotype
4Biological pathways
- There are 3 types of interactions that can be mapped to pathways
- 1) enzyme-ligand
- metabolic pathways
- 2) protein-protein
- cell signaling pathways
- complexes for cell processes
- 3) gene regulatory elements - gene products
- genetic networks
5Pathways are inter-linked
[Figure labels: STIMULUS, Signalling pathway, Genetic network, Metabolic pathway]
6Metabolic Pathways
1993 Boehringer Mannheim GmbH - Biochemica
7What the pathway represents
- Metabolites involved
- Enzymes/transport proteins
- Order of reactions
- General biological function
- Reaction rates
- Expression data
- Inhibitors, activators, alternate pathways
- Genetic regulatory information
8Describing metabolic networks
- Classical biochemical pathways
- glycolysis, TCA cycle, etc.
- Stoichiometric modeling
- flux balance analysis, extreme pathways
- Kinetic modeling (CyberCell, E-cell, ...)
- Need to accumulate comprehensive kinetic
information
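A minimal sketch of the stoichiometric view: flux balance analysis rests on the steady-state constraint S·v = 0, where S is the stoichiometric matrix (metabolites × reactions) and v is a vector of reaction fluxes. The toy network, reaction names, and flux values below are hypothetical; a real analysis would add flux bounds and an objective solved with a linear-programming library.

```python
# Toy linear pathway (hypothetical): uptake -> A -> B -> secretion.
# Rows are metabolites, columns are reactions.
REACTIONS = ["uptake_A", "A_to_B", "secrete_B"]
METABOLITES = ["A", "B"]
S = [
    [1, -1, 0],   # A: produced by uptake_A, consumed by A_to_B
    [0, 1, -1],   # B: produced by A_to_B, consumed by secrete_B
]

def net_production(S, v):
    """Return S.v, the net production rate of each metabolite for fluxes v."""
    return [sum(coef * flux for coef, flux in zip(row, v)) for row in S]

def is_steady_state(S, v, tol=1e-9):
    """True if no metabolite accumulates or is depleted under fluxes v."""
    return all(abs(x) <= tol for x in net_production(S, v))

balanced = [2.0, 2.0, 2.0]     # every step carries the same flux
unbalanced = [2.0, 1.0, 1.0]   # uptake exceeds conversion, so A accumulates

print(is_steady_state(S, balanced))    # True
print(net_production(S, unbalanced))   # [1.0, 0.0]
```

Only flux vectors that pass this check are candidates for a flux balance solution.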
9Complexity
- Pathways involve multiple enzymes, which may have multiple subunits, alternate forms, alternate specificities
- Enzymes may be involved in multiple pathways
- Malate dehydrogenase appears in 6 different metabolic pathways in some databases
10Metabolic Pathway Reconstruction
- Given a genomic sequence, we can infer what metabolic pathways are available to an organism
- Used to design culture medium for Tropheryma whipplei by seeing what nutrients were essential for growth (Renesto et al., Lancet, 362, 447-449, 2003)
11Co-expression within pathways
- Tempting thought: genes that occur within the same pathway will show similar expression profiles
- Reality: depends greatly on how you identify your pathways; KEGG pathways show at best 50% co-expression in a survey of available yeast expression data (Ihmels et al., Nat Biotechnol. 22, 86-92, 2004).
- Expression levels do not correlate very well with protein interactions (unless they are stable complexes, maintained in many different conditions)
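One way to put a number on "co-expression within a pathway" is the fraction of gene pairs whose expression profiles correlate above a cutoff. The sketch below uses Pearson correlation; the gene names, profiles, and the 0.8 threshold are hypothetical, and this is not the method used by Ihmels et al.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def coexpressed_fraction(profiles, threshold=0.8):
    """Fraction of gene pairs in one pathway with correlation >= threshold."""
    pairs = list(combinations(sorted(profiles), 2))
    hits = sum(1 for g1, g2 in pairs
               if pearson(profiles[g1], profiles[g2]) >= threshold)
    return hits / len(pairs)

# Hypothetical expression profiles across four conditions.
pathway = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],   # rises together with geneA
    "geneC": [4.0, 3.0, 2.0, 1.0],   # falls while the others rise
}

# One of the three gene pairs (geneA, geneB) passes the threshold.
print(coexpressed_fraction(pathway))
```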
12Pathway Databases
- KEGG
- BioCyc
- Reactome
- GenMAPP
- BioCarta
- TransPATH
- 175 more at Pathway Resource List
http://www.cbio.mskcc.org/prl/index.php
13BioPAX(www.biopax.org)
- Collaborative effort to create a data exchange
format for biological pathway data
14KEGG
- 5904 chemical reactions
- 15,037 pathways
- 229 reference pathways
- 85 ortholog tables
- 181 organisms
http://www.genome.ad.jp/kegg/
15KEGG
- GENES Database
- The universe of genes and proteins in complete genomes
- LIGAND Database
- The universe of chemical reactions involving metabolites and other biochemical compounds
- Pathway Database
- Molecular interaction networks, metabolic and regulatory pathways, and molecular complexes
16Connection between KEGG and other Databases
17Pathways
- Represented as diagrams, manually created, stored as GIFs
- Easy to link to, highlight genes of interest
- Generate orthologous pathways in other organisms
[Figure: pathway diagram with enzymes labeled by EC number: 2.7.2.4, 1.2.1.11, 1.1.1.3, 2.3.1.46, 2.5.1.48, 4.4.1.8, 2.1.1.13, 2.5.1.6]
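The enzymes on this slide are identified by EC number. The first field of an EC number gives the top-level enzyme class, so a list of EC numbers can be grouped with a few lines of code. A minimal sketch follows; the function name is illustrative and no KEGG API is involved.

```python
from collections import defaultdict

# Top-level EC classes (first field of the EC number).
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
    7: "Translocases",
}

def group_by_class(ec_numbers):
    """Group EC numbers such as '2.7.2.4' by their top-level class name."""
    groups = defaultdict(list)
    for ec in ec_numbers:
        top_level = int(ec.split(".")[0])
        groups[EC_CLASSES[top_level]].append(ec)
    return dict(groups)

# EC numbers as they appear on the slide.
ecs = "2.7.2.4 1.2.1.11 1.1.1.3 2.3.1.46 2.5.1.48 4.4.1.8 2.1.1.13 2.5.1.6".split()

for class_name, members in group_by_class(ecs).items():
    print(class_name, members)
```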
18http://www.biocyc.org/
19BioCyc
- The primary database was EcoCyc (E. coli)
- 21 more curated pathway/genome databases (PGDB), each focusing on one organism (e.g. HumanCyc)
- Also 142 more non-curated (computationally generated) pathways
- MetaCyc database contains non-redundant reference pathways from more than 240 organisms
- Supports Pathway Tools software suite to analyze PGDBs, and PathoLogic pathway prediction program for new genomes
20BioCyc
- Each PGDB includes info about
- Pathways, reactions, substrates
- Enzymes, transporters
- Genes, replicons
- Transcription factors, promoters, operons, DNA binding sites
- MetaCyc and EcoCyc are literature-based, the others are computationally derived
[Figure: PGDB object classes: Pathways; Reactions; Compounds; Proteins; Operons, Promoters, DNA Binding Sites; Genes; Chromosomes, Plasmids]
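The object classes listed above link to one another (pathway to reactions, reactions to enzymes, enzymes to genes, genes to replicons). A rough sketch of that layering as plain data classes is given here; all class, field, and instance names are hypothetical and do not reproduce the actual Pathway Tools schema.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Gene:
    name: str
    replicon: str                 # chromosome or plasmid carrying the gene

@dataclass
class Protein:
    name: str
    gene: Gene                    # gene that encodes this protein

@dataclass
class Reaction:
    name: str
    substrates: List[str]
    products: List[str]
    enzymes: List[Protein] = field(default_factory=list)

@dataclass
class Pathway:
    name: str
    reactions: List[Reaction] = field(default_factory=list)

    def genes(self):
        """Walk pathway -> reactions -> enzymes -> genes."""
        return sorted({e.gene.name for r in self.reactions for e in r.enzymes})

# Hypothetical two-step pathway.
g1, g2 = Gene("gene1", "chromosome"), Gene("gene2", "plasmid")
step1 = Reaction("step1", ["A"], ["B"], [Protein("enzyme1", g1)])
step2 = Reaction("step2", ["B"], ["C"], [Protein("enzyme2", g2)])
toy = Pathway("toy pathway", [step1, step2])

print(toy.genes())   # ['gene1', 'gene2']
```

Queries such as "which genes take part in this pathway" then become simple traversals of the links.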
21164 datasets
Query by protein, gene, compound, reaction,
pathway
BLAST sequence if protein name unknown
22MetaCyc Statistics
23EcoCyc Statistics
24BioCyc Pathway Tools
(Adapted from Pathway Tools tutorial,
http://bioinformatics.ai.sri.com/ptools/)
- Full Metabolic Map
- Paint gene expression data on metabolic network, compare metabolic networks
- Pathways
- Pathway prediction (PathoLogic)
- Reactions
- Balance checker
- Compounds
- Chemical substructure comparison
- Enzymes, Transcription Factors
- Genes
- BLAST search
- Operons
- Operon prediction
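To illustrate what a reaction balance checker does, the sketch below counts atoms on each side of a reaction and compares the totals. It is a simplified stand-in, not the Pathway Tools implementation: it assumes plain formulas without brackets and ignores charge.

```python
import re
from collections import Counter

def parse_formula(formula):
    """Count atoms in a simple formula such as 'C6H12O6' (no brackets, no charge)."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts

def side_total(side):
    """Sum atom counts over a list of (coefficient, formula) pairs."""
    total = Counter()
    for coefficient, formula in side:
        for element, n in parse_formula(formula).items():
            total[element] += coefficient * n
    return total

def is_balanced(substrates, products):
    """True if every element has the same count on both sides."""
    return side_total(substrates) == side_total(products)

# Glucose oxidation: C6H12O6 + 6 O2 -> 6 CO2 + 6 H2O
lhs = [(1, "C6H12O6"), (6, "O2")]
rhs = [(6, "CO2"), (6, "H2O")]

print(is_balanced(lhs, rhs))                          # True
print(is_balanced(lhs, [(6, "CO2"), (5, "H2O")]))     # False
```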
25PathoLogic Making PGDBs
26Completeness of Pathways
27Completeness of Pathways
28Issues with predicting pathways
- Predicting metabolic pathways from genome
- Predict genes
- Assign enzymatic function to genes
- Look for enzymes unique to pathway
- Check if pathway is balanced (no holes)
- Try to fill holes by re-searching genome
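The hole-checking step above can be sketched as a set comparison between the enzymes a pathway requires and the enzymatic functions assigned to the genome. This is only an illustration of the idea, not PathoLogic's algorithm; the function names and the example annotation are hypothetical.

```python
def find_holes(pathway_ecs, genome_ecs):
    """Return pathway steps (EC numbers) with no annotated enzyme in the genome."""
    annotated = set(genome_ecs)
    return [ec for ec in pathway_ecs if ec not in annotated]

def completeness(pathway_ecs, genome_ecs):
    """Fraction of pathway steps covered by the genome annotation."""
    holes = find_holes(pathway_ecs, genome_ecs)
    return 1 - len(holes) / len(pathway_ecs)

# Illustrative pathway of four enzymatic steps and a hypothetical annotation.
pathway = ["2.7.2.4", "1.2.1.11", "1.1.1.3", "2.3.1.46"]
genome_annotation = {"2.7.2.4", "1.1.1.3", "2.3.1.46", "3.1.3.1"}

print(find_holes(pathway, genome_annotation))     # ['1.2.1.11']
print(completeness(pathway, genome_annotation))   # 0.75
```

Each reported hole is a candidate for re-searching the genome for a missed or mis-annotated gene.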
29Reactome (http://www.reactome.org/)
30Reactome
- Joint venture of CSHL and EBI (supersedes the Genome Knowledgebase project)
- Curated database of biological processes in humans
- Also rat, mouse, fugu, zebrafish, chicken
- Everything referenced by curators to literature citation or inference based on sequence similarity
31Reactome model
- Models reactions as (input_entities) → (output_entities)
- Distinguishes between modified/unmodified proteins (modification is an explicit reaction)
- Highly annotated at every step, very micromanaged, hope to find interesting links between reactions
32Reactome PathFinder
- Pathfinding between distant processes
- Enter two molecules or events and see if they can
be joined together by reactions
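The idea of joining two molecules through a chain of reactions can be illustrated with a breadth-first search over (inputs → outputs) reactions. This is a sketch under simplifying assumptions, not Reactome's implementation: the reactions are hypothetical, and a reaction is followed as soon as the current molecule is one of its inputs, without checking that its other inputs are available.

```python
from collections import deque

# Hypothetical reactions: (name, input entities, output entities).
REACTIONS = [
    ("r1", {"A"}, {"B"}),
    ("r2", {"B"}, {"C", "D"}),
    ("r3", {"D"}, {"E"}),
    ("r4", {"X"}, {"Y"}),
]

def find_route(start, goal, reactions):
    """Breadth-first search for the shortest chain of reactions from start to goal.

    Returns a list of reaction names, or None if the two are not connected.
    """
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        molecule, route = queue.popleft()
        if molecule == goal:
            return route
        for name, inputs, outputs in reactions:
            if molecule in inputs:
                for product in sorted(outputs):
                    if product not in seen:
                        seen.add(product)
                        queue.append((product, route + [name]))
    return None

print(find_route("A", "E", REACTIONS))   # ['r1', 'r2', 'r3']
print(find_route("A", "Y", REACTIONS))   # None
```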
33Reactome SkyPainter
- Find all reactions that contain a molecule or event
- Very flexible input, any one or more of
- protein/gene ID (UniProt, GenBank or others)
- protein/gene sequence
- GO or OMIM identifier
- time series from a gene expression study
34Reactome SkyPainter
- Starry sky output
- If expression data used, you get different colours for different levels of expression
- If time series available, you can make an animation
35GenMAPP(www.genmapp.org)
- Designed to rapidly analyze gene profiling data in the context of known biochemical pathways
- Pathways (MAPPs) are authored by experts; several are adapted from KEGG
- Pathways easily web-queryable
- Free for all users
- But Windows platform only
36GenMAPP
- Easy to draw/edit pathways
- Color genes from user imported expression data
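Colouring genes from expression data amounts to mapping each gene's measured change to a display colour. A minimal sketch follows; the colour scheme, the threshold of 1.0 on the log2 scale, and the gene values are all hypothetical and are not GenMAPP's own criteria.

```python
def expression_colour(log2_fold_change, threshold=1.0):
    """Map a log2 fold change to a display colour (threshold is arbitrary)."""
    if log2_fold_change >= threshold:
        return "red"      # up-regulated
    if log2_fold_change <= -threshold:
        return "green"    # down-regulated
    return "grey"         # no substantial change

# Hypothetical imported expression values (log2 fold change per gene).
expression = {"gene1": 2.3, "gene2": -1.7, "gene3": 0.2}

colours = {gene: expression_colour(value) for gene, value in expression.items()}
print(colours)   # {'gene1': 'red', 'gene2': 'green', 'gene3': 'grey'}
```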
37MAPPFinder maps to GO ontology
38BioCarta(www.biocarta.com)
39BioCarta
- Not a public database, but offers free, clickable, graphics-rich pathway database and gene information
- Community annotation
- Easy to use glyph system for genes
- 355 pathways
- mostly human/mouse metabolic and signaling pathways
40TransPATH
41TransPATH
- Part of larger BioBase package (commercial)
- PathwayBuilder package for network visualization
- Highly integrated with signaling networks and transcription factor networks (TransFAC)
- Linked to extensive enzyme information in BRENDA (www.brenda.uni-koeln.de/)
- 28,456 molecules; 52,007 reactions; 54 hand-drawn pathways
42Pathway Database Comparison
43Conclusion
- Pathway databases are continually evolving, and are an important abstract mid-level for expressing data between genes/proteins and observable phenotypes
- Metabolic pathways are the best studied/modeled
- Many different formats of storage and display, but moving towards standards (PSI-MI, BioPAX)