Creating the Genomic Encyclopedia for Bacteria and Archaea

1
Creating the Genomic Encyclopedia for Bacteria
and Archaea
  • Rick Stevens: Argonne National Laboratory,
    The University of Chicago
  • Eddy Rubin: Joint Genome Institute, Berkeley Lab

Rob Edwards, Jonathan A. Eisen, Ross Overbeek,
George Garrity, Veronika Vonstein, Sveta Gerdes,
Folker Meyer, Kevin White, Tim Lilburn, Barney
Whitman, et al.
2
The Basic Idea of the Project
  • To build an enterprise that can take advantage of
    the expected exponential improvements of
    sequencing capabilities to sequence all known
    cultured and described prokaryotes
  • Ride the expected Moore's law of sequencing
    capability
  • To develop a distributed high-throughput
    industrial approach to the cultivation,
    characterization, sequencing, annotation and
    analysis of prokaryotic genomes
  • Build a team from groups that have expertise and
    track records
  • To build and curate a database of genome
    sequences, metabolic reconstructions, and
    standardized phenotype assays associated with
    each target organism
  • Streamline the release of data, provide a
    foundation for derivative projects

3
Concept of the Bergey's/GEBA Sequencing Project
  • A fixed-cost annual investment
  • Each year more can be sequenced as sequencing
    costs decrease and as cultivation efficiencies
    improve based on experience
  • Leverage the expected improvement of sequencing
    costs
  • Address the overall scope within 5 to 6 years
  • Increase the number of near-complete sequences
    per year
  • Optimize the choice of organisms to maximize
    diversity at each stage
  • Exploit the Bergey's Trust and International
    Committee on Systematics of Prokaryotes for
    taxonomic coverage (e.g. Garrity and Whitman)
  • Involve the microbiology community for
    prioritization
  • Industrialize the pipeline
  • Biological Resource Centers to produce and
    characterize type material
  • DOE JGI, NIAID/DMID Centers, NSF/USDA Centers for
    Sequencing
  • Laboratories for bioinformatics (Argonne, JGI,
    TIGR, ORNL, etc.)
  • Universities and Laboratories for modeling and
    analysis
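
The fixed-cost idea above can be sketched numerically. This is a minimal illustration, not the project's actual budget model: the annual budget, the starting cost per genome, and the two-year cost-halving time are all hypothetical placeholders, since the slides give no dollar figures.

```python
def genomes_per_year(budget, initial_cost, halving_years, n_years):
    """Genomes affordable each year under a fixed annual budget,
    assuming the cost per genome halves every `halving_years` years."""
    counts = []
    for year in range(n_years):
        cost = initial_cost * 0.5 ** (year / halving_years)
        counts.append(int(budget // cost))
    return counts


# Hypothetical inputs: none of these numbers come from the slides.
yearly = genomes_per_year(
    budget=3_000_000,     # assumed fixed annual investment
    initial_cost=10_000,  # assumed cost per draft genome in year 0
    halving_years=2.0,    # assumed cost-halving time
    n_years=6,
)

for year, count in enumerate(yearly):
    print(f"year {year}: {count} genomes")
print("total:", sum(yearly))
```

With a constant budget, yearly output grows at the same exponential rate at which cost falls, which is why a fixed investment could plausibly cover the full scope in 5 to 6 years.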

4
The Question Is Not If, but When and How?
  • Why should we want to accelerate this transition?
  • Why not just let it happen as a matter of course?
  • What is in the current sequencing pipeline?
  • Domain: completed genomes / ongoing or in the
    pipeline
  • Archaeal: 29 / 56
  • Bacterial: 397 / 991
  • Eukaryal: 44 / 631
  • The existing process of bottom-up selection of
    organisms for sequencing is leaving many
    important groups underrepresented; closure will
    take a long time
  • There are groups that are well represented in the
    literature, but not in the sequencing databases
  • Underrepresentation is also an issue in
    environmental sequencing data

5
Tapping into prokaryotic biodiversity -
Industrial Biotechnology
  • Rapidly growing field
  • By 2010 biocatalysis will be used in
    production of 60% of fine chemicals (McKinsey
    analysis)
  • In the US, coordinated by the USDA Biobased
    Products and Bioenergy Coordination Council (BBCC)
  • Applications
  • pharmaceuticals
  • food ingredients (sweeteners, vitamins)
  • feed additives and other agrochemicals
  • organic solvents
  • polymer raw materials
  • biofuels
  • Advantages over chemical methods
  • exquisite substrate specificity
  • excellent chemo-, regio- and
    stereoselectivity
  • environmentally friendly green chemistry
    based on biorenewables
  • Needed
  • novel enzymes and pathways

Straathof et al. 2002. Curr Opin Biotechnol
13:548-56
150 compounds are currently produced on
industrial scale using biocatalysts.
Examples: Hans E. Schoemaker et al. 2003. Science
299:1694-97
6
Analysis of 1000s of new bacterial genomes will
likely yield completely novel pathways and
enzymes for industrial applications
Examples of recently discovered biocatalytic
transformations of novel organic functional
groups
  • Current approaches to discovery of new enzymes
  • Screening environmental samples by enrichment
    cultures (BUT only <<1% of prokaryotes are
    currently culturable)
  • Metagenome approach: cloning and expression of
    DNA samples in a surrogate host, then screening
    for desired function (BUT only known functions
    can be screened for, new biochemistry cannot be
    discovered)
  • Sequence-based discovery (growing
    explosively, generating knowledge base for basic
    sciences and biotechnological applications)

L.P. Wackett. 2004. Current Opinion in
Biotechnology, 15:280-284
Still to be discovered: enzymes involved in the
biosynthesis or catabolism of approximately 40
naturally occurring chemical functional groups
are not yet known
7
Building the Case
  • There is a disparity between the literature and
    the existing genomes
  • We can't fully exploit the community's historical
    knowledge and investments without closing this
    gap
  • There is a disparity between the rank/abundance
    curves from 16S studies and from environmental
    sequencing projects and the existing genomes
  • We can't fully understand the new datasets
    without closing this gap (i.e., lack of complete
    sequence coverage of known culturables is holding
    back future work)
  • There are likely to be new biochemical pathways
    and novel enzymes in the set of culturable but
    unsequenced organisms; sequencing non-cultured
    organisms would expand diversity further
  • These represent the low-hanging fruit for
    discovery since the investment has already been
    made in determining culture conditions
  • A comprehensive database produced under
    controlled conditions that includes phenotype
    data and genotype data will accelerate research
    in understanding the genotype-phenotype
    relationship
  • Genome-scale reconstruction and modeling will be
    dramatically accelerated by comprehensive
    databases that include phenotype data

8
Estimated Sequencing Rates
Selection of Targets
Produce DNA
Sequencing and Assembly
Rapid Annotation (24 Hours)
Metabolic Reconstruction
Model Generation
Phenotype Prediction
Database Repository
9
Technical Feasibility FAQ
  • How many genomes would the project propose to
    sequence?
  • About 5000 over 5-7 years
  • Who would produce the biomass needed for DNA
    extraction?
  • Type culture centers until enrichment and
    environmental methods mature
  • Will the biomass/DNA be available for
    distribution?
  • Yes, both the DNA and the libraries could be
    stored for distribution
  • What throughput is needed for DNA production?
  • 300 taxa per year at the beginning of the
    project, rising to 2000 per year at the end
  • What combinations of sequencing technologies need
    to be employed?
  • Sanger and Pyrosequencing initially, others as
    they come online
  • What throughput is needed for annotation?
  • 24-hour turnaround from assembled sequence to
    initial availability; this has already been
    achieved at Argonne, TIGR and elsewhere
  • Is it possible to have a standard set of
    phenotype assays given the broad spectrum of
    organisms and conditions?
  • We are considering Biolog as a model, but it is
    too limited
  • How would the genomes be selected and
    prioritized?
  • At each cycle we choose genomes (e.g. via 16S) to
    minimize the diversity gaps
  • Community input would be solicited to ensure the
    project is tracking the community's interests
  • Is it necessary to close the genomes?
  • We think no. Libraries would be archived for
    groups that might be interested in closing.
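
The throughput answer above gives only the two endpoints (300 taxa per year at the start, 2000 per year at the end). As a rough consistency check against the "about 5000" target, the sketch below assumes the ramp between those endpoints is geometric. That growth shape is an assumption for illustration only; the slides do not specify it.

```python
def geometric_ramp(start, end, n_years):
    """Yearly throughput growing by a constant factor from `start`
    in the first year to `end` in the last year (assumed shape)."""
    ratio = (end / start) ** (1 / (n_years - 1))
    return [round(start * ratio ** i) for i in range(n_years)]


# Endpoints are from the FAQ; the project duration is varied over
# the 5-7 year range the FAQ mentions.
for n_years in (5, 6, 7):
    plan = geometric_ramp(300, 2000, n_years)
    print(f"{n_years} years: {plan} -> total {sum(plan)}")
```

Under this assumption the cumulative totals come out at several thousand genomes, the same order of magnitude as the stated target. A linear ramp or a different end date would shift the totals.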

10
The Project Would Provide a Comprehensive Set of
Genome Sequences for
  • Biofuels, and bioproduction of alternative
    feedstocks
  • Understanding and managing the microbial carbon
    cycle
  • Soil and subsurface microbial ecology
  • Bioremediation and bioconversion of waste streams
  • Evolution and microbial ecological dynamics
  • Context for environmental sequencing and
    metagenomics
  • Basis for developing predictive models of
    phenotypes
  • Source of components for synthetic biology
  • Improving our understanding of cultivability
  • Dramatically improving the reliability and
    quality of genome annotations

11
How Many Known Cultured Organisms?
  • Latest version of the Prokaryotic Taxonomic
    Outline will contain 7951 named species of
    Bacteria and Archaea.
  • Of these, 178 are non-cultivable or not
    represented by viable type material.
  • An additional 1222 are synonyms.
  • Of the 6543 type strains for which viable
    material is reportedly deposited, we have
    assembled a minimal set of 6389 strains that are
    available from 16 major public culture
    collections or biological resource centers in the
    US, Europe, and Asia.
  • The remaining 154 are in minor or non-public
    collections.
  • This information is derived from Release 6.1 of
    the Taxonomic Outline of the Prokaryotes which
    will be published in 2007 and is current through
    May 2006.
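
The counts on this slide can be cross-checked with simple arithmetic. The sketch below uses only the numbers quoted above; the variable names are illustrative. Subtracting the non-cultivable species and the synonyms from the named species gives 6551, which is 8 more than the 6543 type strains quoted, and the slide does not explain the difference. The split into major and minor collections does add up exactly.

```python
# All figures are taken from the slide; the names are illustrative.
named_species = 7951
non_cultivable = 178
synonyms = 1222
type_strains_reported = 6543
in_major_collections = 6389
in_minor_collections = 154

remaining_after_exclusions = named_species - non_cultivable - synonyms
unexplained = remaining_after_exclusions - type_strains_reported

print("named minus exclusions:", remaining_after_exclusions)
print("difference from reported type strains:", unexplained)
print(
    "major + minor matches reported:",
    in_major_collections + in_minor_collections == type_strains_reported,
)
```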

12
What Has Been Sequenced or is In Play
  • Of the 6400 strains available from public sources
  • About 380 are human, animal or plant pathogens
  • On the order of 1/3 to 1/2 of the known pathogens
    have been sequenced
  • 360 complete prokaryotic genomes published
  • 56 archaeal and 940 bacterial genomes in progress
  • From 897 prokaryotic genomes in progress in GOLD
  • 400 are pathogens (many duplicate taxa)
  • 221 are supported by DOE (156 biotech, 51
    environment)
  • Approximately 5000 prokaryotes not yet in play
  • We estimate about 4800 non-pathogen taxa

13
Strain Distribution in Collections
  • US Collections / BRCs (number of strains)
  • American Type Culture Collection (ATCC): 4027
  • USDA ARS Collection (NRRL): 223
  • European Collections
  • Deutsche Sammlung von Mikroorganismen und
    Zellkulturen (DSMZ): 1302
  • Culture Collection, University of Gothenburg
    (CCUG): 183
  • Pasteur Institute (CIP): 170
  • Laboratory for Microbiology, Gent (LMG): 101
  • National Collection of Industrial and Marine
    Bacteria: 25
  • French Collection of Phytopathogens (CFBP): 15
  • National Collection of Type Cultures (NCTC): 12
  • National Collection of Phytopathogenic
    Bacteria: 11
  • Asia
  • Japan Collection of Microorganisms (JCM): 185
  • Institute of Fermentation, Osaka (IFO): 34
  • Korean Collection of Type Cultures (KCTC): 28
  • Institute of Applied Microbiology, Tokyo (IAM):
    26

14
Distribution of Genome Sizes in the Pipeline
Average sequence: 4 Mbp
15
Getting Value from the Genomes
  • Genomes would be assembled by the groups doing
    the sequencing
  • Assembled contigs would be sent to the initial
    high-throughput annotation server for draft
    annotations and immediately published on-line
  • The accumulated (additional) genomes will be used
    to improve annotations (gene calls, functional
    coupling)
  • Genomes will be integrated into databases to
    support comparative analysis and evolutionary
    analysis
  • Annotated genomes can be used to
    semi-automatically construct genome-scale models
    which could be used to make metabolic phenotype
    predictions

16
Background
  • online at
  • http://www.sequencingbergeys.org
  • login required (just ask us)
  • guest read-only access after the meeting?
  • make maximum information available
  • Bergey hierarchy, NCBI taxonomy, 16S rRNA, strain
    collections, GOLD, SEED,

17
List of organisms for sequencing
- based on 16S clusters
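
The slides do not say how targets are picked from the clusters beyond choosing genomes "to minimize the diversity gaps". One common heuristic for that goal is greedy farthest-first selection: repeatedly pick the unsequenced taxon whose nearest already-covered relative is most distant. The sketch below illustrates that heuristic only and is not the project's documented method. The function name, taxon labels, and distance values are all made up for the example.

```python
def pick_diverse(distances, already_sequenced, n_picks):
    """Greedy farthest-first selection over pairwise distances
    (for example, 16S rRNA distances between type strains)."""
    covered = set(already_sequenced)
    candidates = set(distances) - covered
    picks = []
    for _ in range(min(n_picks, len(candidates))):
        # Score each candidate by its distance to the closest covered
        # taxon; the largest score marks the biggest diversity gap.
        best = max(
            sorted(candidates),
            key=lambda c: min(
                (distances[c][s] for s in covered), default=float("inf")
            ),
        )
        picks.append(best)
        covered.add(best)
        candidates.remove(best)
    return picks


# Toy symmetric distance table; the values are invented.
pairs = {
    ("A", "B"): 0.02, ("A", "C"): 0.10, ("A", "D"): 0.25, ("A", "E"): 0.27,
    ("B", "C"): 0.09, ("B", "D"): 0.24, ("B", "E"): 0.26,
    ("C", "D"): 0.20, ("C", "E"): 0.22, ("D", "E"): 0.03,
}
distances = {taxon: {} for taxon in "ABCDE"}
for (a, b), d in pairs.items():
    distances[a][b] = d
    distances[b][a] = d

picks = pick_diverse(distances, already_sequenced=["A"], n_picks=2)
print(picks)
```

In the toy data, B is nearly identical to the sequenced taxon A, so it is never chosen ahead of the more distant candidates. That is the behavior wanted when the aim is broad taxonomic coverage rather than depth within a well-studied group.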
18
Cluster Page
select strain for cluster
19
Bergey Browser
20
Species Page