Bioinformatics Group - PowerPoint PPT Presentation

Title:

Bioinformatics Group

Description:

Mark D'Souza, Elizabeth Glass, Alex Rodriguez, Dina Sulakhe, Mustafa Syed, ... N. Maltsev, M. D'Souza, D. Sulakhe, A. M. Syed, E. Glass, Rodriguez, T. Bompada, ... – PowerPoint PPT presentation

Transcript and Presenter's Notes

Title: Bioinformatics Group


1
From Genomes to Phenotypes: PUMA2, an
Integrated System for High-throughput
Evolutionary Analysis of Metabolism
  • Bioinformatics Group
  • Mathematics and Computer Science division
  • Argonne National Laboratory

Natalia Maltsev, Mark D'Souza, Elizabeth Glass,
Alex Rodriguez, Dina Sulakhe, Mustafa Syed,
Tanuja Bompada, Yi Zhang
2
What we are going to talk about
  • Our approach: analysis of co-evolution of
    genomes, enzymes and metabolic networks in a
    taxonomic and phenotypic framework
  • PUMA2 system (http://compbio.mcs.anl.gov/puma2)
  • Overview
  • Grid technology-based infrastructure
  • Genetic sequence analysis in PUMA2
  • Metabolic reconstructions
  • Genome comparison frameworks: taxonomy,
    phenotypes and physiological profiles

3
The goal of PUMA2: to provide an environment for
co-evolutionary and comparative analysis of
genomes, metabolic networks and enzymes in
the framework of taxonomic, phenotypic and
physiological information
(Diagram labels: Physiological Profiles, Taxonomy,
Environmental Conditions, Genomes, Co-Evolution,
Enzymes, Metabolic Networks, Protein Families)
4

Steps of Analysis of Genomes in PUMA2
(http://compbio.mcs.anl.gov/puma2)
Comparative Framework
Step 1. Assignment of functions to genes:
identification and characterization of proteins
(Genetic Sequence Analysis: Predicted Gene Functions)
Step 2. Reconstruction of metabolic and
functional networks from the results of sequence
analysis (E. Selkov 1993)
(Metabolic Reconstructions From Sequence Data:
Predicted Metabolic Pathways)
Step 3. Prediction of metabolic phenotypes:
identification of evolutionary and
co-evolutionary patterns and signatures
(Prediction of Metabolic Phenotypes)
5
PUMA2 Overview
Comparative Framework
  • PUMA2 is an interactive integrated
    environment for high-throughput genetic sequence
    analysis and metabolic reconstructions with a
    Grid-based computational backend
  • PUMA2 contains:
  • Pre-computed analysis of publicly available
    completely and almost completely sequenced
    genomes (193 bacterial, 230 plasmid, 20 archaeal,
    22 eukaryotic, 638 mitochondrial and 1427 viral
    genomes) in the interactive PUMA2 framework
  • Automated metabolic reconstructions for over 200
    completely sequenced organisms
  • User Models: a framework for analysis of genomes
    provided by users (Shewanella federation,
    Apicomplexa genomes, strains of B. anthracis,
    Staphylococcus, etc)
  • A suite of unique tools for evolutionary analysis
    of enzymes and metabolic networks (Chisel,
    PhyloBlocks, etc)
  • PUMA2 satellite databases: Pathos (GLRCE
    biodefence), TarGet (MCSG structural biology),
    Sentra (prokaryotic signal transduction),
    SubUnit, Physiological Profiles, MetaGenomes
    (PNNL Hanford Site), etc

6
PUMA2 Infrastructure
Secure Collaborative Computational Framework
PUMA2 Integrated Database (over 20 databases):
  • sequence: NCBI (RefSeq, GenBank), PIR,
    UniProt, TIGR
  • structural: PDB, SCOP, CATH
  • metabolic: EMP, KEGG, Brenda, Enzyme
  • phenotypic: NCBI, literature
  • results of pre-computed analyses of sequence data
    by Blast, Blocks, InterPro, TMHMM, etc
  • user annotations
Automated multistep data analysis by a variety of
bioinformatics tools: Blast, Blocks, Pfam,
InterPro, TMHMM, etc
Chimera: controlled workflow pipelines,
automated update cycle
Scalable Grid technology-based computational
backend (TeraGrid, OSG, DOECG)
7
PUMA2 Grid-based computational backend: GADU/Gnare

Gnare portal
Blast NR vs NR (2.3 M sequences) takes 7.5
days using the Grid vs 389 days on 1 CPU.
Less than 2 hours to analyze an average bacterial
genome (4000 protein-coding genes) by Blast, Blocks,
Pfam, the gene function prediction algorithm, Chisel,
metabolic reconstruction tools, etc.
GADU
Grid Resources
Chimera
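
The timing figures on this slide can be turned into a rough speedup estimate. The following is only back-of-the-envelope arithmetic, assuming the Grid run time reads as 7.5 days and treating "less than 2 hours" as an upper bound; the function names are illustrative and not part of GADU or Gnare.

```python
# Figures quoted on the slide. GRID_DAYS = 7.5 is an assumed reading
# of the Grid run time given in the transcript.
SERIAL_DAYS = 389.0        # Blast NR vs NR on 1 CPU
GRID_DAYS = 7.5            # same job on the Grid (assumption, see above)
GENES_PER_GENOME = 4000    # average bacterial genome
GENOME_HOURS_MAX = 2.0     # "less than 2 hours", used as an upper bound


def speedup(serial_time, parallel_time):
    """Ratio of serial to parallel wall-clock time."""
    return serial_time / parallel_time


def seconds_per_gene(hours, genes):
    """Average wall-clock seconds spent per gene."""
    return hours * 3600.0 / genes


if __name__ == "__main__":
    print(f"Grid speedup: about {speedup(SERIAL_DAYS, GRID_DAYS):.1f}x")
    print(f"At most {seconds_per_gene(GENOME_HOURS_MAX, GENES_PER_GENOME):.1f} s per gene")
```

Under these assumptions the Grid run is roughly 50 times faster than a single CPU, and the full analysis pipeline averages under two seconds of wall-clock time per gene.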
8
Sequence analysis in PUMA2
  • Pre-computed analysis of publicly available
    completely and almost completely sequenced
    genomes by conventional bioinformatics tools:
    Blast, Blocks, InterPro, TMHMM, PepStat, etc
  • Automated assignment of functions to genes by
    PUMA2 tools: a voting algorithm and the rules-based
    Chisel algorithm
  • Interactive analysis by users in the PUMA2 framework
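
The slides do not describe how the voting algorithm works. As an illustration of the general idea only, here is a minimal majority-vote sketch in which each tool proposes a function label for a gene and the label is accepted when enough tools agree. The function name, the `min_votes` threshold and the tie handling are assumptions, not PUMA2's actual rules.

```python
from collections import Counter


def vote_on_function(predictions, min_votes=2):
    """Pick a function label by simple majority vote.

    predictions: dict mapping a tool name to the label it predicted,
                 or None when the tool made no call.
    Returns the winning label, or None when no label reaches
    min_votes or the top two labels are tied.
    (Hypothetical scheme; not the PUMA2 algorithm.)
    """
    votes = Counter(label for label in predictions.values() if label)
    if not votes:
        return None
    ranked = votes.most_common()
    top_label, top_count = ranked[0]
    tied = len(ranked) > 1 and ranked[1][1] == top_count
    if top_count < min_votes or tied:
        return None
    return top_label


if __name__ == "__main__":
    calls = {"blast": "EC 2.7.1.1", "blocks": "EC 2.7.1.1", "pfam": None}
    print(vote_on_function(calls))
```

Returning None on ties or weak support leaves such genes for the interactive analysis step rather than forcing an automated call.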

User comments and annotations
Information from public Databases
Interactive analysis of homologs
Interactive analysis
Precomputed data
9
PUMA2 Metabolic Reconstructions
Assignments of functions resulting from sequence
analysis are superimposed onto a collection
of metabolic pathways from the EMP database (over
4000 pathways). Currently PUMA2 contains
automated metabolic reconstructions for over 200
completely sequenced organisms.
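
Superimposing function assignments onto pathways can be pictured as a set comparison: for each pathway, which of its required enzymes were found in the genome and which are missing. The sketch below assumes a pathway is simply a set of EC numbers; the data layout and the `superimpose` function are hypothetical and do not reflect the EMP database schema.

```python
def superimpose(pathways, assigned):
    """Overlay predicted functions on reference pathways.

    pathways: dict mapping a pathway name to the set of EC numbers
              it requires (toy stand-in for a pathway collection).
    assigned: set of EC numbers predicted for one genome.
    Returns, per pathway, the enzymes found, the enzymes missing
    and the fraction of required enzymes that were found.
    """
    report = {}
    for name, required in pathways.items():
        found = required & assigned
        missing = required - assigned
        coverage = len(found) / len(required) if required else 0.0
        report[name] = {
            "found": sorted(found),
            "missing": sorted(missing),
            "coverage": coverage,
        }
    return report


if __name__ == "__main__":
    toy_pathways = {"toy_pathway": {"2.7.1.1", "5.3.1.9", "2.7.1.11"}}
    genome_functions = {"2.7.1.1", "2.7.1.11", "1.1.1.1"}
    for name, row in superimpose(toy_pathways, genome_functions).items():
        print(name, row)
```

A missing-enzyme list of this kind is one way to flag pathways that are only partly supported by the sequence evidence.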
10
Chisel: the PUMA2 workbench for identification
of taxonomic and phenotypic versions of enzymes
Step 1. Rules-based clustering of enzymatic
sequences
Step 2. Interactive (or automated) development
of HMMs, consensus sequences, Blocks, etc for
identification of taxonomic or
phenotypic variations of enzymes
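
Chisel's actual rules are not given in the slides. As a toy illustration of the two steps, the sketch below groups enzyme sequences by a simple rule (same EC number and same taxonomic group) and then derives a majority-vote consensus sequence for each group. It assumes the sequences are already aligned to equal length; the rule, the function names and the example data are all hypothetical.

```python
from collections import Counter, defaultdict


def cluster_by_rule(records):
    """Step 1 (toy rule): group sequences by (EC number, taxon).

    records: iterable of (ec_number, taxon, aligned_sequence).
    """
    clusters = defaultdict(list)
    for ec_number, taxon, sequence in records:
        clusters[(ec_number, taxon)].append(sequence)
    return dict(clusters)


def consensus(aligned):
    """Step 2 (toy model): most common residue in each column.

    Requires a non-empty list of equal-length, pre-aligned sequences.
    """
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("need a non-empty set of equal-length sequences")
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*aligned)
    )


if __name__ == "__main__":
    toy_records = [
        ("2.7.1.1", "taxon_A", "MKVA"),
        ("2.7.1.1", "taxon_A", "MKIA"),
        ("2.7.1.1", "taxon_A", "MRVA"),
        ("2.7.1.1", "taxon_B", "MQLS"),
    ]
    for key, members in cluster_by_rule(toy_records).items():
        print(key, consensus(members))
```

A real workflow would build HMMs or Blocks from proper multiple alignments; the column-wise consensus here only stands in for that step.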
11
Evolutionary versions of enzymes: why is it
important?
  • Identification of proteins
  • Diagnostics
  • Biotechnology
  • Drug design
  • Interpretation of MetaGenomes
  • Accurate Metabolic reconstructions

12
Another Comparative Perspective: Phenotypes
Phenotypic data in PUMA2 is obtained from NCBI
and directly from the literature. The phenotypic
framework will be available in the next release
of PUMA2 in September 2005.
13
And yet another comparative perspective: PUMA2
Metabolic Profiles (a prototype)
  • To provide a comparative framework for evolutionary
    analysis of metabolic pathways, we are classifying
    organisms based on their major metabolic features
    predicted from metabolic reconstructions:
  • Respiration
  • Sources of Carbon
  • Nitrogen Metabolism
  • Etc.
  • The next release of PUMA2 will contain
    Respiratory Metabolic profiles for organisms with
    completely sequenced genomes
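
Classifying organisms by metabolic features can be sketched as a lookup from reconstructed pathways to feature labels. The example below assumes each organism's reconstruction is a set of pathway names and that each feature is signalled by one or more marker pathways. The organism names, pathway names and both functions are hypothetical placeholders, not PUMA2 data.

```python
def metabolic_profiles(reconstructions, feature_markers):
    """Assign feature labels to organisms.

    reconstructions: dict mapping an organism to the set of pathway
                     names predicted by its metabolic reconstruction.
    feature_markers: dict mapping a feature label to the set of
                     pathway names that indicate it.
    An organism gets a feature when it has at least one marker pathway.
    """
    profiles = {}
    for organism, pathways in reconstructions.items():
        profiles[organism] = sorted(
            feature
            for feature, markers in feature_markers.items()
            if markers & pathways
        )
    return profiles


def who_does(profiles, feature):
    """List the organisms whose profile includes the given feature."""
    return sorted(org for org, feats in profiles.items() if feature in feats)


if __name__ == "__main__":
    toy_reconstructions = {
        "organism_A": {"nitrogen_fixation", "aerobic_respiration"},
        "organism_B": {"denitrification"},
        "organism_C": {"aerobic_respiration"},
    }
    toy_markers = {
        "nitrogen metabolism": {"nitrogen_fixation", "denitrification"},
        "respiration": {"aerobic_respiration"},
    }
    profiles = metabolic_profiles(toy_reconstructions, toy_markers)
    print(who_does(profiles, "nitrogen metabolism"))
```

With profiles precomputed this way, a question such as "which organisms carry out nitrogen metabolism" becomes a simple filter over the table.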

14
Example: Nitrogen Metabolism
How to find out in an hour who does what?
15
PUMA2 User Models (Gnare)
  • Analysis of genomes provided by users:
  • Shewanella federation
  • Apicomplexa genomes
  • strains of B. anthracis
  • Staphylococcus, etc
  • Prediction of gene function
  • Automated metabolic reconstructions from sequence
    data
  • Interactive PUMA2-style framework
  • Requires 2 hours for analysis of an average
    bacterial genome

16
Here we are
17
Acknowledgements
  • Globus: Mike Wilde, Nika Nefedova, Jens Voeckler,
    Ian Foster
  • Condor: Zach Miller, Miron Livny
  • Grid3 people
  • EBI: Robert Petryszak -- ClustR
  • MCS: Rick Stevens, systems, Susan Coghlan, Von
    Welch and a lot of others.

18
Problems
  • DATA INTEGRATION!!!!!
  • INSUFFICIENT DATA REGARDING PHYSIOLOGY OF
    ORGANISMS !!!!

19
Some History: Argonne's WIT/PUMA family of
Integrated Systems for Genetic Sequence Analysis
and Metabolic Reconstructions
  • PUMA (1995) -- R. Overbeek, E. Selkov, N. Maltsev,
    T. Gaasterland
  • WIT (1996) -- R. Overbeek, E. Selkov, N.
    Maltsev, N. Larsen
  • WIT2 (1998-2004) -- R. Overbeek, E. Selkov, N.
    Maltsev, E. Selkov Jr., M. D'Souza, G. Pusch
    (http://wit.mcs.anl.gov/WIT2)
  • The SEED -- FIG/U. Chicago, ANL (Ross
    Overbeek, R. Stevens, V. Fonstein, et al.):
    analysis of the metabolic subsystems
  • PUMA2 -- MCS, ANL: N. Maltsev, M. D'Souza, D.
    Sulakhe, A. M. Syed, E. Glass, A. Rodriguez, T.
    Bompada, Yi Zhang
    (http://compbio.mcs.anl.gov/puma2):
    whole-organism models, co-evolutionary
    analysis

20
Metabolic Signatures: how does metabolic network
architecture reflect phenotypes?
TCA cycle in Cyanobacteria