ComPath Comparative Metabolic Pathway Analysis Tool - PowerPoint PPT Presentation

Slides: 33
Provided by: Kwangm1
Transcript and Presenter's Notes

1
ComPath: Comparative Metabolic Pathway Analysis Tool
  • Kwangmin Choi and Sun Kim
  • School of Informatics
  • Indiana University

2
Contents
  • Introduction
  • System Components
  • Current Implementation
  • Experiment Result
  • Future Plan

3
INTRODUCTION / SYSTEM COMPONENTS

4
Introduction
  • ComPath is a web-based sequence analysis system
    built upon
      • KEGG (Kyoto Encyclopedia of Genes and Genomes)
      • PLATCOM (A Platform for Computational Comparative
        Genomics)

5
KEGG: Kyoto Encyclopedia of Genes and Genomes
  • Four databases
      • PATHWAY: 32,657 pathways generated from 262
        reference pathways
      • GENES: 1,213,035 genes in 32 eukaryotes, 260
        bacteria, and 24 archaea
      • LIGAND: 13,387 compounds, 2,543 drugs, 11,161
        glycans, and 6,446 reactions
      • BRITE: 7,817 KO (KEGG Orthology) groups
  • KEGG adopts the EC enzyme classification system

6
EC system 0/2
  • An old system, but still universally accepted by
    biochemists
  • The EC system was developed long before protein
    sequence or structure information was available,
    so it focuses on the reaction, not on sequence
    homology or structure
  • Many biochemists and structural biologists try to
    harmonize newly available chemical, sequence,
    and structural data with the traditional
    understanding of enzyme function.

7
Problems in EC system 1/2
  • Inconsistency in the EC hierarchy
      • For each of the six top-level EC classes,
        subclasses and sub-subclasses may have different
        meanings.
      • e.g. the subclasses of EC 1 are divided by
        substrate type, but those of EC 5 by general
        isomerase type
  • Problems with multi-functional enzymes and
    multiple subunits involved in one function
      • EC presumes a strict 1:1:1 relationship between
        gene, protein, and reaction.
  • Different sequence/structure, but similar EC
      • Two enzymes with low sequence identity
        sometimes belong to the same or a very similar
        EC class.
      • e.g. o-succinylbenzoate synthases from several
        bacteria share less than 40% sequence identity
8
Problems in EC system 2/2
  • Similar sequence/structure, but different EC
      • Variation even in the fourth digit of the EC
        number is rare above a sequence identity
        threshold of 40%.
      • However, exceptions to this rule are prevalent.
      • e.g. melamine deaminase and atrazine
        chlorohydrolase are 98% identical, but belong to
        different EC classes.
  • No information on the sequence/structure-mechanism
    relationship
      • The EC system considers only the overall
        transformation
      • Similarity among sequences is strongly correlated
        with similarity at the level of a common
        (structural domain-related) partial reaction,
        rather than the overall transformation
      • How can enzyme structure data be combined with
        partial reaction data?
  • Research Goal
      • We provide a computational environment for enzyme
        analysis via genome comparison
      • It will be built on the PLATCOM system

9
Our Research Goal
  • We provide a computational environment for enzyme
    analysis via multiple genome comparison
  • It will be built on the PLATCOM system

10
PLATCOM: A Platform for Comparative Genomics
  • Providing a platform for comparative genomics ON
    THE WEB
  • A comparative analysis system in which users can
    freely select any set of genomes
  • Scalable system interactively combining
    high-performance sequence analysis tools

11
CURRENT IMPLEMENTATION
12
ComPath
  • ComPath = KEGG + PLATCOM
      • Not just for retrieving information from a
        database,
      • but focused on analyzing enzymes using the
        enzyme-genome table
  • Easy to use
      • Optional: upload a user sequence and/or saved
        enzyme-genome table data
      • Select a metabolic pathway
      • Select any combination of genomes in KEGG
      • Create an enzyme-genome table
      • Then use the table for various enzyme sequence
        analysis tasks
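
The "select a pathway, select genomes, create a table" steps can be illustrated with a small sketch. It assumes a hypothetical in-memory annotation mapping (genome code to EC number to gene identifiers); ComPath's actual data model and KEGG access layer are not described in the slides, and all genome and gene names below are made up.

```python
from typing import Dict, List

# Hypothetical input shape: genome code -> EC number -> gene identifiers.
Annotations = Dict[str, Dict[str, List[str]]]
# Table shape: EC number (row) -> genome code (column) -> gene identifiers.
Table = Dict[str, Dict[str, List[str]]]


def build_enzyme_genome_table(
    pathway_ecs: List[str], genomes: List[str], annotations: Annotations
) -> Table:
    """Build one row per pathway enzyme and one column per selected genome.

    An empty cell means no gene is annotated with that EC number in
    that genome, i.e. the enzyme is a candidate "missing enzyme".
    """
    table: Table = {}
    for ec in pathway_ecs:
        table[ec] = {
            genome: list(annotations.get(genome, {}).get(ec, []))
            for genome in genomes
        }
    return table


# Made-up annotations for two hypothetical genomes.
annotations: Annotations = {
    "genomeA": {"2.7.1.1": ["geneA1"], "5.3.1.9": ["geneA2", "geneA3"]},
    "genomeB": {"5.3.1.9": ["geneB1"]},
}
table = build_enzyme_genome_table(
    ["2.7.1.1", "5.3.1.9"], ["genomeA", "genomeB"], annotations
)

if __name__ == "__main__":
    for ec, row in table.items():
        print(ec, row)
```

Keeping gene identifiers in each cell, rather than a plain present/absent flag, leaves the sequences reachable for the downstream analysis tasks.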

13
Screenshot: Pathway Selection
  • 11 categories
  • 123 pathways
  • Users can upload a previously saved enzyme-genome
    table to continue an analysis

14
Screenshot: Genome Selection
  • 250 genomes from the KEGG database
  • Users can select genomes in taxonomic or
    alphabetical order

15
Enzyme-Genome Table
  • An enzyme-genome table allows for tests on
    whether a certain enzyme in a given pathway is
    present or missing using sequence analysis
    techniques.
  • Information in this table can be easily saved,
    uploaded, and transferred.
  • Users can also upload their own sequence set, e.g.,
    the entire set of predicted proteins in a newly
    sequenced genome, and annotate the sequences in
    terms of KEGG pathways.
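
As an illustration of the "present or missing" test and of saving and re-uploading a table, here is a minimal sketch. The tab-separated layout and the helper names (`missing_enzymes`, `dump_table`, `load_table`) are assumptions made for this example, not ComPath's actual file format or API.

```python
import csv
import io
from typing import Dict, List

# EC number (row) -> genome code (column) -> gene identifiers.
Table = Dict[str, Dict[str, List[str]]]


def missing_enzymes(table: Table, genome: str) -> List[str]:
    """List the EC numbers that have no gene in the given genome."""
    return [ec for ec, row in table.items() if not row.get(genome)]


def dump_table(table: Table, genomes: List[str]) -> str:
    """Serialize the table as tab-separated text (assumed layout)."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["EC"] + genomes)
    for ec, row in table.items():
        writer.writerow([ec] + [",".join(row.get(g, [])) for g in genomes])
    return buf.getvalue()


def load_table(text: str) -> Table:
    """Parse text produced by dump_table back into a table."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    genomes = next(reader)[1:]
    loaded: Table = {}
    for fields in reader:
        ec, cells = fields[0], fields[1:]
        loaded[ec] = {
            g: (cell.split(",") if cell else [])
            for g, cell in zip(genomes, cells)
        }
    return loaded


# Made-up table for two hypothetical genomes.
sample: Table = {
    "2.7.1.1": {"genomeA": ["geneA1"], "genomeB": []},
    "5.3.1.9": {"genomeA": ["geneA2", "geneA3"], "genomeB": ["geneB1"]},
}

if __name__ == "__main__":
    print(missing_enzymes(sample, "genomeB"))
    print(dump_table(sample, ["genomeA", "genomeB"]))
```

The round trip `load_table(dump_table(...))` returns an equal table, which is the property a save-and-continue workflow relies on.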

16
Screenshot: KEGG's Ortholog Table (STATIC!)
17
Screenshot: ComPath Enzyme-Genome Table (INTERACTIVE!)
18
Screenshot: Upload Query Genome and Table Editing Functions
19
Sequence Analyses
  • Missing enzyme search
      • Pairwise (FASTA) and multiple sequence alignment
        (CLUSTALW)
      • Domain search using SCOPEC/SUPERFAMILY and PDB
        domains
      • Domain-based analysis using hidden Markov models
        (HMM)
      • Contextual sequence analysis (currently not
        available)
  • Sequence analysis for further investigation
      • Phylogenetic analysis of enzymes in selected
        genomes
      • Gibbs motif sampler
      • BAG clustering
      • Contextual sequence analysis (currently not
        available)

20
Screenshot Sequence Analysis Functions
21
TEST
22
Experiments: Genomes, Queries, Pathways
  • Selected genomes
      • B. subtilis, B. halodurans, E. coli
      • H. influenzae, H. pylori, M. genitalium, Y. pestis KIM
  • Query genomes
      • M. tuberculosis
      • A. aeolicus
      • B. anthracis
  • Metabolic pathways
      • 00010 (glycolysis/gluconeogenesis)
      • 00020 (TCA cycle)

23
Experiments: Comparison of Sequence Analysis Methods
  • Four methods (abbr.)
      • HMMer: HMM search using the whole sequence
      • CSR: HMM search using common shared regions
        generated by the BAG program
      • SCOPEC: domain search using the SCOP/SUPERFAMILY
        and PDB databases
      • FASTA: simple FASTA search
  • E-value cutoffs
      • 1e-10, 1e-20, 1e-30
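
To show how an E-value cutoff changes which enzymes a search method calls as present, here is a small sketch. The hit list is made up, and the rule "keep a hit when its E-value is less than or equal to the cutoff" is an assumption; the slides do not state how ComPath applies the threshold.

```python
from typing import Dict, List, Set, Tuple

# The three cutoffs listed on the slide.
CUTOFFS: List[float] = [1e-10, 1e-20, 1e-30]

# One hit per enzyme: (EC number, best E-value). Values are made up.
Hit = Tuple[str, float]


def predicted_enzymes(hits: List[Hit], cutoff: float) -> Set[str]:
    """Return the EC numbers whose best hit passes the cutoff.

    Assumption: a hit passes when its E-value is <= cutoff.
    """
    return {ec for ec, evalue in hits if evalue <= cutoff}


hits: List[Hit] = [
    ("2.7.1.1", 1e-45),
    ("5.3.1.9", 1e-25),
    ("4.1.2.13", 1e-12),
    ("1.2.1.12", 1e-3),
]
by_cutoff: Dict[float, Set[str]] = {
    cutoff: predicted_enzymes(hits, cutoff) for cutoff in CUTOFFS
}

if __name__ == "__main__":
    for cutoff in CUTOFFS:
        print(cutoff, sorted(by_cutoff[cutoff]))
```

A stricter (smaller) cutoff can only shrink the predicted set, which is why sensitivity tends to fall and specificity to rise as the cutoff moves from 1e-10 to 1e-30.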

24
Experiments: Overall Design
25
Screenshot: ComPath Enzyme-Genome Table (INTERACTIVE!)
26
Experiment Results (e.g.)
Query Genome     Pathway     Method  Sensitivity  Specificity  E-value
M. tuberculosis  Path 00010  HMMer   0.596491228  0.454545455  1.00E-30
M. tuberculosis  Path 00010  CSR     0.666666667  0.454545455  1.00E-30
M. tuberculosis  Path 00010  SCOPEC  0.614035088  0.348484848  1.00E-30
M. tuberculosis  Path 00010  FASTA   0.649122807  0.378787879  1.00E-30
M. tuberculosis  Path 00010  HMMer   0.623188406  0.524590164  1.00E-10
M. tuberculosis  Path 00010  CSR     0.739130435  0.360655738  1.00E-10
M. tuberculosis  Path 00010  SCOPEC  0.652173913  0.418032787  1.00E-10
M. tuberculosis  Path 00010  FASTA   0.811594203  0.204918033  1.00E-10

Query Genome     Pathway     Method  Sensitivity  Specificity  E-value
M. tuberculosis  Path 00020  HMMer   0.535714286  0.769230769  1.00E-30
M. tuberculosis  Path 00020  CSR     0.642857143  0.846153846  1.00E-30
M. tuberculosis  Path 00020  SCOPEC  0.535714286  0.769230769  1.00E-30
M. tuberculosis  Path 00020  FASTA   0.678571429  0.615384615  1.00E-30
M. tuberculosis  Path 00020  HMMer   0.516129032  0.777777778  1.00E-10
M. tuberculosis  Path 00020  CSR     0.709677419  0.666666667  1.00E-10
M. tuberculosis  Path 00020  SCOPEC  0.548387097  0.777777778  1.00E-10
M. tuberculosis  Path 00020  FASTA   0.741935484  0.333333333  1.00E-10
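
The slides do not say how sensitivity and specificity were computed. The sketch below uses the standard definitions, sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP), as an assumption; the sets are made-up labels and do not reproduce the reported numbers.

```python
from typing import Set, Tuple


def sensitivity_specificity(
    predicted: Set[str], truth: Set[str], universe: Set[str]
) -> Tuple[float, float]:
    """Compute (sensitivity, specificity) with the standard definitions.

    predicted: items a method calls positive
    truth:     items that are truly positive (e.g. KEGG annotation)
    universe:  every item that was tested
    Assumes truth and its complement within universe are both non-empty.
    """
    tp = len(predicted & truth)
    fn = len(truth - predicted)
    fp = len(predicted - truth)
    tn = len(universe - predicted - truth)
    return tp / (tp + fn), tn / (tn + fp)


# Made-up example: five tested items, three truly positive.
universe = {"a", "b", "c", "d", "e"}
truth = {"a", "b", "c"}
predicted = {"a", "b", "d"}
sens, spec = sensitivity_specificity(predicted, truth, universe)

if __name__ == "__main__":
    print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
```

Here two of three true positives are recovered and one of two true negatives is wrongly called, giving a sensitivity of 2/3 and a specificity of 1/2.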
27
A. aeolicus
28
B. anthracis
29
M. tuberculosis
30
FUTURE PLAN
31
Future Plan: More Resources
  • ComPath is being extended to incorporate more
    resources, including
      • KEGG LIGAND: a composite database consisting of
        compounds, glycans, reactions, etc.
      • ProRule: a new database containing functional
        and structural information on PROSITE profiles
      • SFLD: Structure-Function Linkage Database
  • We are also developing databases and algorithms
    for enzyme analysis, e.g. classifiers using a
    database of enzyme-specific HMMs.
  • ComPath is in an early stage of system
    development and we solicit feedback and
    suggestions from biology and bioinformatics
    communities.

32
Future Plan: More Algorithms and Tools
  • A more integrative understanding of biochemical
    network evolution
  • Algorithms to handle the isozyme problem
  • Algorithms to computationally reconstruct
    alternative pathways
  • Algorithms to combine sequence, structure,
    chemical reaction, and contextual information for
    better enzyme annotation
  • Etc.