Bioinformatics Group


1
From Genomes to Phenotypes: PUMA2, an
Integrated System for High-throughput
Evolutionary Analysis of Metabolism

Bioinformatics Group
Mathematics and Computer Science division
Argonne National Laboratory

Natalia Maltsev, Mark D'Souza, Elizabeth Glass,
Alex Rodriguez, Dina Sulakhe, Mustafa Syed,
Tanuja Bompada, Yi Zhang
2
What are we going to talk about?

Our approach: analysis of the co-evolution of genomes, enzymes and metabolic networks in a taxonomic and phenotypic framework
PUMA2 system (http://compbio.mcs.anl.gov/puma2)
Overview
Grid technology-based infrastructure
Genetic sequence analysis in PUMA2
Metabolic reconstructions
Genome comparison frameworks: taxonomy, phenotypes and physiological profiles

3
The goal of PUMA2: to provide an environment for co-evolutionary and comparative analysis of genomes, metabolic networks and enzymes in the framework of taxonomic, phenotypic and physiological information
Diagram labels: Physiological Profiles, Taxonomy, Environmental Conditions, Genomes, Co-Evolution, Enzymes, Metabolic Networks, Protein Families
4

Steps of Analysis of Genomes in PUMA2 (http://compbio.mcs.anl.gov/puma2)
Comparative Framework
Step 1. Assignment of functions to genes. Identification and characterization of proteins
Genetic Sequence Analysis
Predicted Gene Functions
Step 2. Reconstruction of metabolic and functional networks from the results of sequence analysis (E. Selkov 1993)
Metabolic Reconstructions From Sequence Data
Predicted Metabolic Pathways
Step 3. Prediction of metabolic phenotypes. Identification of evolutionary and co-evolutionary patterns and signatures
Prediction of Metabolic Phenotypes
5
PUMA2 Overview
Comparative Framework

PUMA2 is an interactive integrated environment for high-throughput genetic sequence analysis and metabolic reconstructions with a Grid-based computational backend.
PUMA2 contains:
Pre-computed analysis of publicly available completely and almost completely sequenced genomes (193 bacterial, 230 plasmid, 20 archaeal, 22 eukaryotic, 638 mitochondrial and 1427 viral genomes) in the interactive PUMA2 framework
Automated metabolic reconstructions for over 200 completely sequenced organisms
User Models: a framework for analysis of genomes provided by users (Shewanella federation, Apicomplexa genomes, strains of B. anthracis, Staphylococcus, etc.)
A suite of unique tools for evolutionary analysis of enzymes and metabolic networks (Chisel, PhyloBlocks, etc.)
PUMA2 satellite databases: Pathos (GLRCE biodefence), TarGet (MCSG structural biology), Sentra (prokaryotic signal transduction), SubUnit, Physiological Profiles, MetaGenomes (PNNL Hanford Site), etc.

6
PUMA2 Infrastructure
Secure Collaborative Computational Framework
PUMA2 Integrated Database: over 20 databases
  sequence: NCBI (RefSeq, GenBank), PIR, UniProt, TIGR
  structural: PDB, SCOP, CATH
  metabolic: EMP, KEGG, Brenda, Enzyme
  phenotypic: NCBI, literature
  results of pre-computed analyses of sequence data by Blast, Blocks, InterPro, TMHMM, etc.
  user annotations
Automated multistep data analysis by a variety of bioinformatics tools: Blast, Blocks, Pfam, InterPro, TMHMM, etc.
Chimera-controlled workflow pipelines, automated update cycle
Scalable Grid technology-based computational backend (TeraGrid, OSG, DOECG)
7
PUMA2 Grid-based computational backend: GADU/Gnare

Blast NR vs NR (2.3 M sequences) takes 7.5 days using the Grid vs 389 days on 1 CPU
Less than 2 hours to analyze an average bacterial genome (4000 protein-coding genes) by Blast, Blocks, Pfam, gene function prediction algorithm, Chisel, metabolic reconstruction tools, etc.
Diagram labels: Gnare portal, GADU, Chimera, Grid Resources
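
The timing figures on this slide can be turned into a rough speedup estimate. The following is a back-of-the-envelope sketch, not a benchmark: it takes the Grid figure as 7.5 days (an assumed reading of the slide's number), treats "less than 2 hours" as an upper bound of exactly 2 hours, and the function name is illustrative only.

```python
def speedup(serial_days: float, grid_days: float) -> float:
    """Return how many times faster the Grid run is than the single-CPU run."""
    if grid_days <= 0:
        raise ValueError("grid_days must be positive")
    return serial_days / grid_days


SERIAL_DAYS = 389.0   # Blast NR vs NR on 1 CPU, from the slide
GRID_DAYS = 7.5       # assumed reading of the slide's Grid figure

factor = speedup(SERIAL_DAYS, GRID_DAYS)
print(f"Grid speedup: about {factor:.0f}x")

# Upper bound on average processing time per gene for a bacterial genome:
# 2 hours for 4000 protein-coding genes.
GENOME_HOURS = 2.0
GENES = 4000
seconds_per_gene = GENOME_HOURS * 3600 / GENES
print(f"At most {seconds_per_gene:.1f} s per gene")
```

Under these assumptions the Grid run is roughly 50 times faster than a single CPU, and a full genome analysis averages under two seconds per gene.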
8
Sequence analysis in PUMA2

Pre-computed analysis of publicly available completely and almost completely sequenced genomes by conventional bioinformatics tools: Blast, Blocks, InterPro, TMHMM, PepStat, etc.
Automated assignment of functions to genes by PUMA2 tools: a voting algorithm and the rules-based Chisel algorithm
Interactive analysis by users in the PUMA2 framework

Diagram labels: User comments and annotations, Information from public databases, Interactive analysis of homologs, Interactive analysis, Precomputed data
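
The slide names a voting algorithm for function assignment but does not describe it. One common form of such a scheme is a majority vote over the functions suggested by the individual tools, and the sketch below shows only that general idea. The function name, the `min_votes` threshold, the tie handling and the example tool outputs are all hypothetical; this is not PUMA2's actual method.

```python
from collections import Counter
from typing import Dict, Optional


def vote_function(predictions: Dict[str, Optional[str]],
                  min_votes: int = 2) -> Optional[str]:
    """Pick the function proposed by the most tools.

    predictions maps a tool name to the function it suggests (None = no hit).
    Returns None when no function reaches min_votes or the top vote is tied.
    """
    counts = Counter(p for p in predictions.values() if p is not None)
    if not counts:
        return None
    ranked = counts.most_common(2)
    top_label, top_votes = ranked[0]
    if top_votes < min_votes:
        return None
    if len(ranked) > 1 and ranked[1][1] == top_votes:
        return None  # tie between the two best candidates: leave unassigned
    return top_label


# Hypothetical tool outputs for a single gene (placeholder function names).
gene_hits = {
    "Blast": "kinase_X",
    "Blocks": "kinase_X",
    "InterPro": "kinase_Z",
    "TMHMM": None,
}
assigned = vote_function(gene_hits)
print(assigned)
```

Leaving ties and weakly supported calls unassigned is a deliberate choice in this sketch: it favors fewer, better-supported annotations over complete coverage.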
9
PUMA2 Metabolic Reconstructions
Assignments of functions resulting from sequence analysis are superimposed onto a collection of metabolic pathways from the EMP database (over 4000 pathways).
Currently PUMA2 contains automated metabolic reconstructions for over 200 completely sequenced organisms.
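
Superimposing function assignments onto pathways can be pictured as a set comparison: for each pathway, check which of its enzymes were predicted in the genome. The sketch below is illustrative only. The pathway names, enzyme labels, data layout and the function name `superimpose` are assumptions, not the EMP schema or PUMA2 code.

```python
from typing import Dict, Set


def superimpose(predicted: Set[str],
                pathways: Dict[str, Set[str]]) -> Dict[str, float]:
    """Return, per pathway, the fraction of its enzymes found among the
    functions predicted for a genome."""
    coverage = {}
    for name, enzymes in pathways.items():
        if not enzymes:
            continue  # skip pathways with no enzymes listed
        coverage[name] = len(enzymes & predicted) / len(enzymes)
    return coverage


# Hypothetical toy data with placeholder enzyme labels.
pathways = {
    "pathway_A": {"E1", "E2", "E3", "E4"},
    "pathway_B": {"E5", "E6"},
}
predicted = {"E2", "E3", "E4"}

coverage = superimpose(predicted, pathways)
for name, frac in sorted(coverage.items()):
    print(f"{name}: {frac:.2f}")
```

A pathway with high coverage is a candidate for inclusion in the organism's reconstruction, while the enzymes missing from it point to genes worth a closer look.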
10
Chisel: the PUMA2 workbench for identification of taxonomic and phenotypic versions of enzymes
Step 1. Rules-based clustering of enzymatic sequences
Step 2. Interactive (or automated) development of HMMs, consensus sequences, Blocks, etc. for identification of taxonomic or phenotypic variations of enzymes
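
The slide does not list Chisel's clustering rules. As a minimal sketch of what rules-based clustering could look like, the example below applies a single hypothetical rule: sequences land in the same cluster when they share both an assigned function and a taxonomic group. The record type, field names and sample data are assumptions. Building HMMs or consensus sequences per cluster is not shown.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class EnzymeSeq(NamedTuple):
    seq_id: str
    function: str  # assigned enzymatic function (placeholder label)
    taxon: str     # taxonomic group of the source organism


def cluster_by_rule(seqs: List[EnzymeSeq]) -> Dict[Tuple[str, str], List[str]]:
    """Group sequence ids that share both function and taxon."""
    clusters = defaultdict(list)
    for s in seqs:
        clusters[(s.function, s.taxon)].append(s.seq_id)
    return dict(clusters)


# Hypothetical sample records.
seqs = [
    EnzymeSeq("s1", "kinase_X", "Bacteria"),
    EnzymeSeq("s2", "kinase_X", "Archaea"),
    EnzymeSeq("s3", "kinase_X", "Bacteria"),
    EnzymeSeq("s4", "isomerase_Y", "Bacteria"),
]
clusters = cluster_by_rule(seqs)
for key, ids in clusters.items():
    print(key, ids)
```

Each resulting cluster would then be the input for building a taxon-specific model of that enzyme.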
11
Evolutionary versions of enzymes: why are they important?

Identification of proteins
Diagnostics
Biotechnology
Drug design
Interpretation of MetaGenomes
Accurate Metabolic reconstructions

12
Another Comparative Perspective: Phenotypes
Phenotypic data in PUMA2 is obtained from NCBI and directly from the literature.
The phenotypic framework will be available in the next release of PUMA2 in September 2005.
13
...and yet another comparative perspective: PUMA2 Metabolic Profiles (a prototype)

To provide a comparative framework for evolutionary analysis of metabolic pathways, we are classifying organisms based on their major metabolic features predicted from metabolic reconstructions:
Respiration
Sources of Carbon
Nitrogen Metabolism
Etc.
The next release of PUMA2 will contain
Respiratory Metabolic profiles for organisms with
completely sequenced genomes

14
Examples: Nitrogen Metabolism
How to find out in an hour who does what?
15
PUMA2 User Models (Gnare)

Analysis of genomes provided by users: Shewanella federation, Apicomplexa genomes, strains of B. anthracis, Staphylococcus, etc.
Prediction of gene function
Automated Metabolic reconstructions from sequence
data
Interactive PUMA2 style framework
Requires 2 hours for analysis of an average
bacterial genome

16
Here we are
17
Acknowledgements

Globus: Mike Wilde, Nika Nefedova, Jens Voeckler, Ian Foster
Condor: Zach Miller, Miron Livny
Grid3 people
EBI: Robert Petryszak -- ClustR
MCS: Rick Stevens, systems, Susan Coghlan, Von Welch and a lot of others.

18
Problems

DATA INTEGRATION!!!!!
INSUFFICIENT DATA REGARDING PHYSIOLOGY OF ORGANISMS!!!!

19
Some History: Argonne's WIT/PUMA family of Integrated Systems for Genetic Sequence Analysis and Metabolic Reconstructions

PUMA (1995) -- R. Overbeek, E. Selkov, N. Maltsev, T. Gaasterland
WIT (1996) -- R. Overbeek, E. Selkov, N. Maltsev, N. Larsen
WIT2 (1998-2004) -- R. Overbeek, E. Selkov, N. Maltsev, E. Selkov Jr., M. D'Souza, G. Pusch (http://wit.mcs.anl.gov/WIT2)
The SEED -- FIG/U. Chicago, ANL (Ross Overbeek, R. Stevens, V. Fonstein, et al.): analysis of the metabolic subsystems
PUMA2 -- MCS, ANL: N. Maltsev, M. D'Souza, D. Sulakhe, A. M. Syed, E. Glass, A. Rodriguez, T. Bompada, Yi Zhang (http://compbio.mcs.anl.gov/puma2): whole-organism models, co-evolutionary analysis