ComPath: Comparative Metabolic Pathway Analysis Tool

1
ComPath: Comparative Metabolic Pathway Analysis Tool

Kwangmin Choi and Sun Kim
School of Informatics
Indiana University

2
Contents

Introduction
System Components
Current Implementation
Experiment Result
Future Plan

3
INTRODUCTION / SYSTEM COMPONENTS
4
Introduction

ComPath is a web-based sequence analysis system built upon:
KEGG (Kyoto Encyclopedia of Genes and Genomes)
PLATCOM (A Platform for Computational Comparative Genomics)

5
KEGG: Kyoto Encyclopedia of Genes and Genomes

Four Databases
PATHWAY: 32,657 pathways generated from 262 reference pathways
GENES: 1,213,035 genes in 32 eukaryotes, 260 bacteria, and 24 archaea
LIGAND: 13,387 compounds, 2,543 drugs, 11,161 glycans, and 6,446 reactions
BRITE: 7,817 KO (KEGG Orthology) groups
KEGG adopts the EC enzyme classification system

6
EC system 0/2

An old but still universally accepted system among biochemists
The EC system was developed long before protein sequence or structure information was available, so it focuses on reactions, not on sequence homology or structure
Many biochemists and structural biologists are trying to harmonize newly available chemical, sequence, and structural data with the traditional understanding of enzyme function

7
Problems in EC system 1/2

Inconsistency in the EC hierarchy
For each of the six top-level EC classes, subclasses and sub-subclasses may have different meanings.
e.g. EC 1 subclasses are divided by substrate type, but EC 5 subclasses by general isomerase type
Problems with multi-functional enzymes and with multiple subunits involved in one function
EC presumes only a 1:1:1 relationship between gene, protein, and reaction.
Different sequence/structure, but similar EC
Two enzymes with low sequence identity sometimes belong to the same or a very similar EC class.
e.g. o-succinylbenzoate synthases from several bacteria share less than 40% sequence identity
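
To make "the same or a very similar EC" concrete, here is a minimal sketch that counts how many leading fields two EC numbers share. The helper names `parse_ec` and `shared_ec_depth` are made up for this illustration and are not part of ComPath or KEGG. The EC numbers are arbitrary examples.

```python
def parse_ec(ec: str) -> tuple[str, ...]:
    """Split an EC number such as '1.1.1.1' (or 'EC 1.1.1.1') into four fields."""
    fields = ec.removeprefix("EC").strip().split(".")
    if len(fields) != 4:
        raise ValueError(f"expected four EC fields, got {ec!r}")
    return tuple(fields)


def shared_ec_depth(a: str, b: str) -> int:
    """Count the leading EC fields that two EC numbers have in common (0 to 4)."""
    depth = 0
    for x, y in zip(parse_ec(a), parse_ec(b)):
        # '-' marks an unassigned field, so it never counts as agreement.
        if x != y or x == "-":
            break
        depth += 1
    return depth


# Arbitrary example numbers, chosen only to show the two extremes.
print(shared_ec_depth("1.1.1.1", "1.1.1.2"))  # differ only in the fourth field
print(shared_ec_depth("1.1.1.1", "5.1.1.1"))  # differ at the top-level class
```

A depth of 3 means the two enzymes differ only in the fourth field. As the slide notes, the meaning of each field is not uniform across the top-level classes, so equal depths in different classes are not directly comparable.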

8
Problems in EC system 2/2

Similar sequence/structure, but different EC
Above a sequence identity threshold of 40%, even variation in the fourth digit of the EC number is rare.
However, exceptions to this rule are prevalent.
e.g. melamine deaminase and atrazine chlorohydrolase are 98% identical, but have different EC numbers.
No information on the sequence/structure-mechanism relationship
The EC system considers only the overall transformation
Sequence similarity correlates strongly with similarity at the level of a common (structural domain-related) partial reaction, rather than of the overall transformation
How to combine enzyme structure data with partial reaction data?
Research Goal
We provide a computational environment for enzyme analysis via genome comparison
It will be built on the PLATCOM system

9
Our Research Goal

We provide a computational environment for enzyme analysis via multiple genome comparison
It will be built on the PLATCOM system

10
PLATCOM: A Platform for Comparative Genomics

Provides a platform for comparative genomics ON THE WEB
Comparative analysis system in which users can freely select any set of genomes
Scalable system that interactively combines high-performance sequence analysis tools

11
CURRENT IMPLEMENTATION
12
ComPath

ComPath = KEGG + PLATCOM
Not just for retrieving information from a database, but focused on analyzing enzymes using the enzyme-genome table
Easy to use
Optional: upload a user sequence and/or saved enzyme-genome table data
Select a metabolic pathway
Select any combination of genomes in KEGG
Create an enzyme-genome table
Then use the table for various enzyme sequence analysis tasks

13
Screenshot: Pathway Selection

11 categories
123 pathways
Users can upload previously saved enzyme-genome table data to continue an analysis

14
Screenshot: Genome Selection

250 genomes from the KEGG database
Users can select genomes in taxonomic or alphabetical order

15
Enzyme-Genome Table

An enzyme-genome table lets users test, with sequence analysis techniques, whether a certain enzyme in a given pathway is present or missing in each genome.
Information in this table can easily be saved, uploaded, and transferred.
Users can also upload their own sequence set, e.g., the entire set of predicted proteins in a newly sequenced genome, and annotate the sequences in terms of KEGG pathways.
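
The slides do not show how the enzyme-genome table is stored, so the following is only a minimal sketch of one possible representation: a nested dictionary keyed by EC number and then by genome. The function names, genome labels, and gene identifiers are all invented for illustration. JSON stands in for whatever save format ComPath actually uses.

```python
import json


def build_enzyme_genome_table(annotations, enzymes, genomes):
    """Return {enzyme: {genome: [gene ids]}}; an empty list marks a missing enzyme."""
    table = {ec: {g: [] for g in genomes} for ec in enzymes}
    for genome, ec, gene in annotations:
        if ec in table and genome in table[ec]:
            table[ec][genome].append(gene)
    return table


def render(table, genomes):
    """Format the table as tab-separated text, one enzyme per row."""
    lines = ["EC\t" + "\t".join(genomes)]
    for ec, row in table.items():
        cells = [",".join(row[g]) or "-" for g in genomes]
        lines.append(ec + "\t" + "\t".join(cells))
    return "\n".join(lines)


# Hypothetical inputs: two genomes, two enzymes of one pathway, three annotations.
genomes = ["genomeA", "genomeB"]
enzymes = ["2.7.1.1", "5.3.1.9"]
annotations = [
    ("genomeA", "2.7.1.1", "geneA1"),
    ("genomeA", "5.3.1.9", "geneA2"),
    ("genomeB", "5.3.1.9", "geneB7"),
]

table = build_enzyme_genome_table(annotations, enzymes, genomes)
print(render(table, genomes))

# Saving and re-uploading the table, sketched here as a JSON round trip.
saved = json.dumps(table)
restored = json.loads(saved)
```

In this layout an empty cell (printed as `-`) is exactly the "missing enzyme" case that the sequence analysis tools are meant to investigate.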

16
Screenshot: KEGG's Ortholog Table (STATIC!)
17
Screenshot: ComPath Enzyme-Genome Table (INTERACTIVE!)
18
Screenshot: Upload Query Genome and Table Editing Functions
19
Sequence Analyses

Missing enzyme search
Pairwise (FASTA) and multiple (CLUSTALW) sequence alignment
Domain search using SCOPEC/SUPERFAMILY and PDB domains
Domain-based analysis using hidden Markov models (HMMs)
Contextual sequence analysis (currently not available)
Sequence analysis for further investigation
Phylogenetic analysis of enzymes in selected genomes
Gibbs motif sampler
BAG clustering
Contextual sequence analysis (currently not available)
20
Screenshot: Sequence Analysis Functions
21
TEST
22
Experiments: Genomes, Queries, Pathways

Selected genomes
B. subtilis, B. halodurans, E. coli, H. influenzae, H. pylori, M. genitalium, Y. pestis KIM
Query genomes
M. tuberculosis
A. aeolicus
B. anthracis
Metabolic pathways
00010 (glycolysis/gluconeogenesis)
00020 (TCA cycle)

23
Experiments: Comparison of Sequence Analysis Methods

Four methods (abbr.)
HMMer: HMM search using the whole sequence
CSR: HMM search using common shared regions generated by the BAG program
SCOPEC: Domain search using SCOP/SUPERFAMILY and the PDB database
FASTA: Simple FASTA search
E-value cutoffs
1e-10, 1e-20, 1e-30
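
As a rough illustration of what sweeping the three cutoffs means, the sketch below counts how many hits each method retains at each cutoff. The E-values are invented, and the function name `hits_per_cutoff` is an assumption for this example. Only the method names and cutoff values come from the slide.

```python
CUTOFFS = (1e-10, 1e-20, 1e-30)


def hits_per_cutoff(evalues_by_method, cutoffs=CUTOFFS):
    """Return {method: {cutoff: number of hits with E-value <= cutoff}}."""
    return {
        method: {c: sum(e <= c for e in evalues) for c in cutoffs}
        for method, evalues in evalues_by_method.items()
    }


# Invented E-values for two of the four methods, purely for illustration.
evalues_by_method = {
    "HMMer": [1e-35, 1e-22, 1e-12, 1e-5],
    "FASTA": [1e-40, 1e-15, 1e-11, 1e-8],
}

counts = hits_per_cutoff(evalues_by_method)
for method, by_cutoff in counts.items():
    print(method, by_cutoff)
```

A smaller cutoff is stricter, so the number of retained hits can only stay the same or fall as the cutoff moves from 1e-10 to 1e-30.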

24
Experiments: Overall Design
25
Screenshot: ComPath Enzyme-Genome Table (INTERACTIVE!)
26
Experiment Results (e.g.)

Query Genome     Pathway     Method  Sensitivity  Specificity  E-value
M. tuberculosis  Path 00010  HMMer   0.596491228  0.454545455  1.00E-30
                             CSR     0.666666667  0.454545455  1.00E-30
                             SCOPEC  0.614035088  0.348484848  1.00E-30
                             FASTA   0.649122807  0.378787879  1.00E-30
                             HMMer   0.623188406  0.524590164  1.00E-10
                             CSR     0.739130435  0.360655738  1.00E-10
                             SCOPEC  0.652173913  0.418032787  1.00E-10
                             FASTA   0.811594203  0.204918033  1.00E-10

Query Genome     Pathway     Method  Sensitivity  Specificity  E-value
M. tuberculosis  Path 00020  HMMer   0.535714286  0.769230769  1.00E-30
                             CSR     0.642857143  0.846153846  1.00E-30
                             SCOPEC  0.535714286  0.769230769  1.00E-30
                             FASTA   0.678571429  0.615384615  1.00E-30
                             HMMer   0.516129032  0.777777778  1.00E-10
                             CSR     0.709677419  0.666666667  1.00E-10
                             SCOPEC  0.548387097  0.777777778  1.00E-10
                             FASTA   0.741935484  0.333333333  1.00E-10
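
The slides do not say how sensitivity and specificity were computed. Assuming the standard definitions, sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP), a minimal sketch looks like this. The counts are made up and are not taken from the experiments.

```python
def sensitivity(tp: int, fn: int) -> float:
    """True positive rate: fraction of real positives that were found."""
    if tp + fn == 0:
        raise ValueError("no positive cases")
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """True negative rate: fraction of real negatives that were rejected."""
    if tn + fp == 0:
        raise ValueError("no negative cases")
    return tn / (tn + fp)


# Made-up counts for illustration only.
tp, fn, tn, fp = 8, 2, 6, 4
print(f"sensitivity = {sensitivity(tp, fn):.3f}")
print(f"specificity = {specificity(tn, fp):.3f}")
```

Under these definitions, a looser E-value cutoff tends to admit more true hits and more false ones, which raises sensitivity and lowers specificity. The FASTA rows in the tables show that pattern.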
27
A. aeolicus
28
B. anthracis
29
M. tuberculosis
30
FUTURE PLAN
31
Future Plan: More Resources

ComPath is being extended to incorporate more resources, including
KEGG LIGAND: a composite database consisting of compound, glycan, reaction, etc.
ProRule: a new database containing functional and structural information on PROSITE profiles
SFLD: Structure-Function Linkage Database
We are also developing databases and algorithms for enzyme analysis, e.g. classifiers using a database of enzyme-specific HMMs.
ComPath is in an early stage of system development, and we solicit feedback and suggestions from the biology and bioinformatics communities.

32
Future Plan: More Algorithms and Tools

More integrative understanding of biochemical network evolution
Algorithms to handle the isozyme problem
Algorithms to computationally reconstruct alternative pathways
Algorithms to combine sequence, structure, chemical reaction, and contextual information for better enzyme annotation
Etc.