Title: Bioinformatics Group
1. From Genomes to Phenotypes: PUMA2, an
Integrated System for High-throughput
Evolutionary Analysis of Metabolism
- Bioinformatics Group
- Mathematics and Computer Science division
- Argonne National Laboratory
Natalia Maltsev, Mark D'Souza, Elizabeth Glass,
Alex Rodriguez, Dina Sulakhe, Mustafa Syed,
Tanuja Bompada, Yi Zhang
2. What are we going to talk about?
- Our approach: analysis of co-evolution of genomes, enzymes and metabolic networks in a taxonomic and phenotypic framework
- PUMA2 system (http://compbio.mcs.anl.gov/puma2)
  - Overview
  - Grid technology-based infrastructure
  - Genetic sequence analysis in PUMA2
  - Metabolic reconstructions
  - Genome comparison frameworks: taxonomy, phenotypes and physiological profiles
3. The goal of PUMA2: to provide an environment for
co-evolutionary and comparative analysis
of genomes, metabolic networks and enzymes in
the framework of taxonomic, phenotypic and
physiological information
[Diagram labels: Physiological Profiles, Taxonomy, Environmental Conditions, Genomes, Co-Evolution, Enzymes, Metabolic Networks, Protein Families]
4. Steps of Analysis of Genomes in PUMA2
(http://compbio.mcs.anl.gov/puma2)
Step 1. Assignment of functions to genes: identification and characterization of proteins
Step 2. Reconstruction of metabolic and functional networks from the results of sequence analysis (E. Selkov 1993)
Step 3. Prediction of metabolic phenotypes: identification of evolutionary and co-evolutionary patterns and signatures
[Diagram labels: Comparative Framework; Genetic Sequence Analysis; Predicted Gene Functions; Metabolic Reconstructions From Sequence Data; Predicted Metabolic Pathways; Prediction of Metabolic Phenotypes]
5. PUMA2 Overview
Comparative Framework
- PUMA2 is an interactive integrated environment for high-throughput genetic sequence analysis and metabolic reconstructions with a Grid-based computational backend
- PUMA2 contains:
  - Pre-computed analysis of publicly available completely and almost completely sequenced genomes (193 bacterial, 230 plasmid, 20 archaeal, 22 eukaryotic, 638 mitochondrial and 1427 viral genomes) in the interactive PUMA2 framework
  - Automated metabolic reconstructions for over 200 completely sequenced organisms
  - User Models: a framework for analysis of genomes provided by users (Shewanella federation, Apicomplexa genomes, strains of B. anthracis, Staphylococcus, etc.)
  - A suite of unique tools for evolutionary analysis of enzymes and metabolic networks (Chisel, PhyloBlocks, etc.)
  - PUMA2 satellite databases: Pathos (GLRCE biodefence), TarGet (MCSG structural biology), Sentra (prokaryotic signal transduction), SubUnit, Physiological Profiles, MetaGenomes (PNNL Hanford Site), etc.
6. PUMA2 Infrastructure
Secure Collaborative Computational Framework
PUMA2 Integrated Database: over 20 databases
- sequence: NCBI (RefSeq, GenBank), PIR, UniProt, TIGR
- structural: PDB, SCOP, CATH
- metabolic: EMP, KEGG, Brenda, Enzyme
- phenotypic: NCBI, literature
- results of pre-computed analyses of sequence data by Blast, Blocks, InterPro, TMHMM, etc.
- user annotations
Automated multistep data analysis by a variety of bioinformatics tools: Blast, Blocks, Pfam, InterPro, TMHMM, etc.
Chimera-controlled workflow pipelines, automated update cycle
Scalable Grid technology-based computational backend (TeraGrid, OSG, DOECG)
7. PUMA2 Grid-based computational backend: GADU/Gnare
- Blast NR vs NR (2.3 M sequences) takes 7.5 days using the Grid vs 389 days on 1 CPU
- Less than 2 hours to analyze an average bacterial genome (4000 protein-coding genes) by Blast, Blocks, Pfam, gene function prediction algorithm, Chisel, metabolic reconstruction tools, etc.
[Diagram labels: Gnare portal, GADU, Grid Resources, Chimera]
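The timing figures on this slide can be turned into a rough speedup and a per-gene time budget. The sketch below is only arithmetic on the quoted numbers; it takes the Grid run time as 7.5 days (our reading of the figure on the slide), and the function and constant names are hypothetical, not part of GADU or Gnare.

```python
# Figures quoted on the slide. GRID_DAYS = 7.5 is an assumption:
# it is our reading of the Grid run time given on the slide.
GRID_DAYS = 7.5
SINGLE_CPU_DAYS = 389.0
GENES_PER_GENOME = 4000      # "average bacterial genome"
GENOME_HOURS = 2.0           # upper bound ("less than 2 hours")


def speedup(serial_time: float, parallel_time: float) -> float:
    """Return how many times faster the parallel run is than the serial run."""
    if parallel_time <= 0:
        raise ValueError("parallel_time must be positive")
    return serial_time / parallel_time


def seconds_per_gene(hours: float, genes: int) -> float:
    """Return the average wall-clock seconds available per gene."""
    if genes <= 0:
        raise ValueError("genes must be positive")
    return hours * 3600.0 / genes


grid_speedup = speedup(SINGLE_CPU_DAYS, GRID_DAYS)
per_gene = seconds_per_gene(GENOME_HOURS, GENES_PER_GENOME)

print(f"Grid speedup over 1 CPU: about {grid_speedup:.1f}x")
print(f"Per-gene budget for a full genome run: at most {per_gene:.1f} s")
```

Under these assumptions the Grid run is roughly 52 times faster than a single CPU, and a full genome analysis leaves under 2 seconds of wall-clock time per gene on average.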
8. Sequence analysis in PUMA2
- Pre-computed analysis of publicly available completely and almost completely sequenced genomes by conventional bioinformatics tools: Blast, Blocks, InterPro, TMHMM, PepStat, etc.
- Automated assignment of functions to genes by PUMA2 tools: voting algorithm and rules-based Chisel algorithm
- Interactive analysis by users in the PUMA2 framework
[Screenshot labels: User comments and annotations; Information from public databases; Interactive analysis of homologs; Interactive analysis; Precomputed data]
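The slide mentions a voting algorithm for automated function assignment but does not describe it. As an illustration only, a simple majority vote over per-tool predictions might look like the sketch below. The function name, the `min_votes` threshold, the tie rule and the example labels are all assumptions, not the PUMA2 implementation.

```python
from collections import Counter
from typing import Dict, Optional


def vote_on_function(predictions: Dict[str, Optional[str]],
                     min_votes: int = 2) -> Optional[str]:
    """Return the label most tools agree on, or None if there is no clear winner.

    predictions maps a tool name to its predicted function label
    (None means the tool made no prediction). Hypothetical helper.
    """
    votes = Counter(label for label in predictions.values() if label)
    if not votes:
        return None
    ranked = votes.most_common(2)
    label, count = ranked[0]
    if count < min_votes:
        return None                      # not enough supporting tools
    if len(ranked) > 1 and ranked[1][1] == count:
        return None                      # tie between the top two labels
    return label


# Toy input: labels are placeholders, not real tool output.
example = {
    "Blast": "function_X",
    "Blocks": "function_X",
    "Pfam": "function_Y",
    "InterPro": None,
}
assigned = vote_on_function(example)
print(assigned)
```

Returning `None` on ties or weak support leaves such genes for the interactive analysis step rather than forcing an assignment; that is a design choice of this sketch.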
9. PUMA2 Metabolic Reconstructions
Assignments of functions resulting from sequence analysis are superimposed onto the collection of metabolic pathways from the EMP database (over 4000 pathways).
Currently PUMA2 contains automated metabolic reconstructions for over 200 completely sequenced organisms.
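One way to picture "superimposing" function assignments onto a pathway collection is as set intersection: for each pathway, check which of its required enzymes were assigned to some gene in the genome. The sketch below is a minimal illustration under that assumption. The function name, the coverage threshold, and the toy pathway and enzyme identifiers are hypothetical and are not taken from EMP or PUMA2.

```python
from typing import Dict, Set


def reconstruct_pathways(assigned: Set[str],
                         pathways: Dict[str, Set[str]],
                         min_coverage: float = 0.5) -> Dict[str, float]:
    """Return pathway name -> fraction of its enzymes found in the genome.

    Only pathways whose coverage reaches min_coverage are reported.
    """
    result: Dict[str, float] = {}
    for name, required in pathways.items():
        if not required:
            continue                     # skip pathways with no enzymes listed
        coverage = len(required & assigned) / len(required)
        if coverage >= min_coverage:
            result[name] = coverage
    return result


# Toy data: placeholder identifiers, not real EC numbers or EMP pathways.
toy_pathways = {
    "pathway_A": {"E1", "E2", "E3", "E4"},
    "pathway_B": {"E5", "E6"},
}
genome_functions = {"E1", "E2", "E3", "E9"}

reconstruction = reconstruct_pathways(genome_functions, toy_pathways)
print(reconstruction)
```

Here `pathway_A` is reported with three of its four enzymes present, while `pathway_B` is dropped because none of its enzymes were assigned.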
10. Chisel: the PUMA2 workbench for identification
of taxonomic and phenotypic versions of enzymes
Step 1. Rules-based clustering of enzymatic sequences
Step 2. Interactive (or automated) development of HMMs, consensus sequences, Blocks, etc. for identification of taxonomic or phenotypic variations of enzymes
11. Evolutionary versions of enzymes: why is it
important?
- Identification of proteins
- Diagnostics
- Biotechnology
- Drug design
- Interpretation of MetaGenomes
- Accurate Metabolic reconstructions
12. Another Comparative Perspective: Phenotypes
Phenotypic data in PUMA2 is obtained from NCBI and directly from the literature.
The phenotypic framework will be available in the next release of PUMA2 in September 2005.
13. ...and yet another comparative perspective: PUMA2
Metabolic Profiles (a prototype)
- To provide a comparative framework for evolutionary analysis of metabolic pathways, we are classifying organisms based on their major metabolic features predicted from metabolic reconstructions:
  - Respiration
  - Sources of carbon
  - Nitrogen metabolism
  - Etc.
- The next release of PUMA2 will contain respiratory metabolic profiles for organisms with completely sequenced genomes
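Classifying organisms by a predicted metabolic feature can be sketched as grouping organisms on one key of a per-organism profile. The example below is illustrative only: the profile layout, the feature values and the organism names are placeholders and do not reflect how PUMA2 stores its Metabolic Profiles.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_feature(profiles: Dict[str, Dict[str, str]],
                     feature: str) -> Dict[str, List[str]]:
    """Group organism names by the value of one metabolic feature.

    Organisms lacking the feature are placed under "unknown".
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for organism in sorted(profiles):
        value = profiles[organism].get(feature, "unknown")
        groups[value].append(organism)
    return dict(groups)


# Toy profiles: placeholder organisms and feature values.
toy_profiles = {
    "org_A": {"respiration": "aerobic", "carbon_source": "heterotroph"},
    "org_B": {"respiration": "anaerobic"},
    "org_C": {"respiration": "aerobic"},
    "org_D": {},
}

by_respiration = group_by_feature(toy_profiles, "respiration")
print(by_respiration)
```

Each resulting group can then serve as a comparison set, for example when asking which pathways are shared by all organisms with the same respiration type.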
14. Examples: Nitrogen Metabolism
How to find out in an hour who does what?
15. PUMA2 User Models (Gnare)
- Analysis of genomes provided by users
  - Shewanella federation
  - Apicomplexa genomes
  - strains of B. anthracis
  - Staphylococcus, etc.
- Prediction of gene function
- Automated metabolic reconstructions from sequence data
- Interactive PUMA2-style framework
- Requires 2 hours for analysis of an average bacterial genome
16. Here we are
17. Acknowledgements
- Globus: Mike Wilde, Nika Nefedova, Jens Voeckler, Ian Foster
- Condor: Zach Miller, Miron Livny
- Grid3 people
- EBI: Robert Petryszak -- ClustR
- MCS: Rick Stevens, systems, Susan Coghlan, Von Welch and a lot of others.
18. Problems
- DATA INTEGRATION!!!!!
- INSUFFICIENT DATA REGARDING PHYSIOLOGY OF ORGANISMS!!!!
19. Some History: Argonne's WIT/PUMA family of
Integrated Systems for Genetic Sequence Analysis
and Metabolic Reconstructions
- PUMA (1995) -- R. Overbeek, E. Selkov, N. Maltsev, T. Gaasterland
- WIT (1996) -- R. Overbeek, E. Selkov, N. Maltsev, N. Larsen
- WIT2 (1998-2004) -- R. Overbeek, E. Selkov, N. Maltsev, E. Selkov Jr., M. D'Souza, G. Pusch (http://wit.mcs.anl.gov/WIT2)
- The SEED -- the FIG/U. Chicago, ANL (Ross Overbeek, R. Stevens, V. Fonstein, et al.): analysis of the metabolic subsystems
- PUMA2 -- MCS, ANL: N. Maltsev, M. D'Souza, D. Sulakhe, M. Syed, E. Glass, A. Rodriguez, T. Bompada, Yi Zhang (http://compbio.mcs.anl.gov/puma2): whole-organism models, co-evolutionary analysis
20. Metabolic Signatures: how does metabolic network
architecture reflect phenotypes?
TCA cycle in Cyanobacteria