Title: GEBA: A genomic encyclopedia of bacteria and archaea
1GEBA: A genomic encyclopedia of bacteria and archaea
- Jonathan A. Eisen
- JGI User Meeting 2009
2"Nothing in biology makes sense except in the light of evolution." T. Dobzhansky (1973)
4rRNA Tree of Life
5The Tree is not Happy
6From http://genomesonline.org
7As of 2002
- At least 40 phyla of bacteria
Acidobacteria
Bacteroides
Fibrobacteres
Gemmimonas
Verrucomicrobia
Planctomycetes
Chloroflexi
Based on Hugenholtz, 2002
8As of 2002
- At least 40 phyla of bacteria
- Genome sequences are mostly from three phyla
Acidobacteria
Bacteroides
Fibrobacteres
Gemmimonas
Verrucomicrobia
Planctomycetes
Chloroflexi
Based on Hugenholtz, 2002
9As of 2002
- At least 40 phyla of bacteria
- Genome sequences are mostly from three phyla
- Some other phyla are only sparsely sampled
Acidobacteria
Bacteroides
Fibrobacteres
Gemmimonas
Verrucomicrobia
Planctomycetes
Chloroflexi
Based on Hugenholtz, 2002
10As of 2002
- At least 40 phyla of bacteria
- Genome sequences are mostly from three phyla
- Some other phyla are only sparsely sampled
- Same trend in Archaea
Acidobacteria
Bacteroides
Fibrobacteres
Gemmimonas
Verrucomicrobia
Planctomycetes
Chloroflexi
Based on Hugenholtz, 2002
11Need for Tree Guidance Well Established
- Common approach within some eukaryotic groups
- Many small projects funded to fill in some bacterial or archaeal gaps
- Phylogenetic gaps in bacterial and archaeal projects commonly lamented in literature
12- NSF-funded Tree of Life Project
- A genome from each of eight phyla
- At least 40 phyla of bacteria
- Genome sequences are mostly from three phyla
- Some other phyla are only sparsely sampled
- Solution I: sequence more phyla
Acidobacteria
Bacteroides
Fibrobacteres
Gemmimonas
Verrucomicrobia
Planctomycetes
Eisen, Ward, Badger, Wu, Wu, et al.
Chloroflexi
13Bacterial aTOL Project AIMS
- Improve resolution of deep branches in the bacterial tree
- Launch biological studies of these phyla and discover functional novelty
- Leverage data for interpreting environmental surveys
14T. roseum genome
15The Tree of Life is Still Angry
16Within Phyla Diversity Immense
- Each phylum represents billions of years of evolution
- Some have hundreds of major lineages
- New lineages are being discovered all the time
- Most branches within most phyla have few or no genomes
17Major Lineages of Actinobacteria
18Additional Impetus for Tree Guided Projects
- Suggestion to sequence all bacteria and archaea in Bergey's Manual (Stevens et al.)
- Success in sequencing genomes from across the tree in animals
- Multiple government reports suggest a more systematic approach to sequencing is needed
19- At least 100 phyla of bacteria
- Genome sequences are mostly from three phyla
- Most phyla with cultured species are sparsely sampled
- Lineages with no cultured taxa even more poorly sampled
- Solution: use tree to really fill gaps
Acidobacteria
Bacteroides
Fibrobacteres
Gemmimonas
Verrucomicrobia
Planctomycetes
Chloroflexi
Well sampled phyla
20http://www.jgi.doe.gov/programs/GEBA/pilot.html
21GEBA Pilot Project Overview
- Select 200 organisms using tree
- Develop high-throughput pipeline for strain growth and DNA preparation
- Sequence and finish 100
- Annotate, analyze, release data
- Assess benefits of tree guided sequencing
22GEBA Pilot I: Selecting Targets
28GEBA Pilot II: The Importance of Project Management
29Project workflow (flowchart)
Stages: Project Initiation, Sequencing, Annotation
Boxes (responsible site in parentheses):
- IMG (PGF)
- Draft Sequencing and Assembly (PGF)
- GEBA Proposal
- Complete Genome GenBank Submission (PGF)
- Shotgun Genome GenBank Submission (PGF)
- Scientific and Technical Review (PGF)
- IMG ER (PGF)
- Finish Sequencing and Assembly (LANL)
- Negotiate Scope of Work
- Gene-QA (PGF)
- Draft Annotation (ORNL)
- Receive Starting Material (PGF)
- Finish Annotation (ORNL)
- Three "OK?" decision points
Sites: 1 PGF, 2 LANL, 3 ORNL
David Bruce, Lynne Goodwin et al.
30GEBA Pilot III: Partnership with DSMZ
31GEBA Biggest Challenge: Getting DNA
- Getting quality DNA is the biggest bottleneck
- Solution: beg, borrow, and steal
- DSMZ offered to do it for free
- ATCC is doing a small number for a fee
- In discussions with PCC and other collections
33Quantification gel of the genomic DNA isolated
from Conexibacter woesei (DSM 14684T)
Lane 1: c(λ-Marker) 15 ng
Lane 2: c(λ-Marker) 30 ng
Lane 3: c(λ-Marker) 50 ng
Lane 4: DNA Molecular Weight Marker II (Roche 236250)
Lane 5: DSM 13279, Collinsella stercoris
Lane 6: DSM 43043, Intrasporangium calvum
Lane 7: DSM 18053, Dyadobacter fermentans
Lane 8: DSM 20476, Slackia heliotrinireducens
Lane 9: DSM 18081, Patulibacter minatonensis
Lane 10: DSM 14684, Conexibacter woesei
Lane 11: DSM 11002, Dethiosulfovibrio peptidovorans
Lane 12: DSM 11551, Halogeometricum borinquense
Lane 13: DNA Molecular Weight Marker II (Roche 236250)
Lane 14: c(λ-Marker) 125 ng
Lane 15: c(λ-Marker) 250 ng
Lane 16: c(λ-Marker) 500 ng
Conexibacter woesei (DSM 14684T) was taken from the German Collection of Microorganisms and Cell Cultures (DSMZ). The genomic DNA was isolated using the Qiagen Genomic 500 DNA Kit (Qiagen 10262). The genomic DNA was 10-250 kb in size as determined by Pulsed Field Gel Electrophoresis (PFGE). The bulk of the DNA had a size of 50-250 kb (see attached PFGE image). The DNA concentration is 500 ng/µl as estimated from the gel. Spectrophotometric measurements yielded a DNA concentration of 450 µg/ml. 300 µl of genomic DNA are shipped (150 µg).
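The shipped mass follows from concentration times volume. The short sketch below redoes that arithmetic for both concentration estimates quoted on the slide; the function name `dna_mass_ug` is made up for illustration. It suggests that the stated 150 µg corresponds to the gel estimate, while the spectrophotometric value would give a somewhat lower figure.

```python
def dna_mass_ug(concentration_ng_per_ul, volume_ul):
    """Total DNA mass in micrograms from a concentration (ng/µl) and a volume (µl)."""
    return concentration_ng_per_ul * volume_ul / 1000.0

# Gel estimate from the slide: 500 ng/µl in 300 µl
gel_estimate_ug = dna_mass_ug(500, 300)

# Spectrophotometric value from the slide: 450 µg/ml, which equals 450 ng/µl
spectro_estimate_ug = dna_mass_ug(450, 300)

print(gel_estimate_ug, spectro_estimate_ug)
```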
34GEBA Pilot IV: Sequencing, Annotation, Data Release
35Current Status
- >100 in progress
- GEBA 56 (focus of first paper)
- 34 finished genomes
- 55 submitted to GenBank
- Released to IMG-GEBA page and JGI-FTP site
- All data are completely open for anyone to use
36IMG/GEBA
http://img.jgi.doe.gov/cgi-bin/geba/main.cgi
37Adopt a Microbe
38GEBA Pilot IV: Assess Benefits of GEBA56
- All genomes have some value
- But what, if any, is the benefit of tree-guided sequencing over other selection methods?
39Why Increase Taxonomic Coverage II?
- Gene discovery
- Annotation, functional prediction
- Metagenomic analysis
- Mechanisms of diversification
- Species phylogeny and classification
41Value of diverse genomes I: Gene discovery
- Premise
- New genomes frequently contain genetic novelty
- Phylogenetic diversity of a genome should be correlated with novelty
- Caveat
- Does lateral gene transfer wipe out the contribution of phylogenetic diversity to novelty?
42Protein Family Rarefaction Curves
- Take data set of multiple complete genomes
- Identify all protein families using MCL
- Plot # of genomes vs. # of protein families
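The three steps above can be sketched as follows. This is a minimal illustration, not the pipeline used for GEBA: it assumes protein families have already been assigned (for example by MCL) and are available as one set of family identifiers per genome. The genome names, family names, and the function `rarefaction_curve` are hypothetical.

```python
import random
from statistics import mean

def rarefaction_curve(genome_families, n_permutations=100, seed=0):
    """Mean number of distinct protein families seen after adding 1..N genomes.

    genome_families maps a genome name to the set of family IDs it contains.
    Genomes are added in random order; the counts are averaged over permutations.
    """
    rng = random.Random(seed)
    genomes = list(genome_families)
    counts_per_step = [[] for _ in genomes]
    for _ in range(n_permutations):
        rng.shuffle(genomes)
        seen = set()
        for step, genome in enumerate(genomes):
            seen |= genome_families[genome]
            counts_per_step[step].append(len(seen))
    return [mean(counts) for counts in counts_per_step]

# Hypothetical toy input: family assignments for three genomes
toy = {
    "genome_A": {"fam1", "fam2", "fam3"},
    "genome_B": {"fam2", "fam3", "fam4"},
    "genome_C": {"fam5"},
}
curve = rarefaction_curve(toy)
print(curve)  # y-values to plot against genome counts 1, 2, 3
```

A curve that keeps rising steeply as genomes are added indicates that each new genome still contributes families not seen before.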
44Number of proteins
[Plot; axis labels: Total Gene Number, Genome Number]
45Novelty 2 - Structural Novelty
- Of the 17000 protein families in the GEBA56, 1800 are novel in sequence (Wu)
- Structural modeling suggests many are structurally novel too (D'haeseleer)
- 372 being crystallized by the PSI (Kerfeld)
46Novelty 3
- Diversity within known families
47Transporter Profiles
Sebaldella termitidis ATCC 33386 has twice as many sugar PTS transporters as any other genome
48Novelty 4
- Unusual distribution patterns
49Shotgun Sequencing Detects More Diversity than
PCR-methods
50First Bacterial Actin Related Protein
First found by V. Kunin, Structure Analysis by
Patrik D. et al
51Most Closely Related to ARP8
52Value of 100 diverse genomes II: Annotation
- Premise
- Increased phylogenetic coverage should improve our ability to annotate genes in other genomes (e.g., reference/model genomes)
53Annotation Improves
- Conversion of hypotheticals into conserved hypotheticals
- Linking distantly related members of protein families
- Non-homology functional prediction methods
54Linking Protein Families Improved
55Fusion Based Predictions Improved
56Improving Rosetta Stone Predictions
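The slides do not spell out the method, so the following is only a sketch of the general Rosetta Stone idea: two proteins that are separate in one genome but fused into a single protein in another genome are predicted to be functionally linked. Adding genomes adds chances to observe such a fusion. The input format, names, and the function `rosetta_stone_links` are assumptions made for illustration.

```python
from itertools import combinations

def rosetta_stone_links(proteins_by_genome):
    """Predict functional links between families from fusion events.

    proteins_by_genome maps a genome name to a list of proteins, each protein
    given as the set of family IDs it contains. A link (genome, (fam_x, fam_y))
    is reported when fam_x and fam_y are stand-alone proteins in that genome
    and occur fused in a single protein in at least one other genome.
    """
    fused_in = {}  # family pair -> genomes where one protein carries both
    for genome, proteins in proteins_by_genome.items():
        for families in proteins:
            for pair in combinations(sorted(families), 2):
                fused_in.setdefault(pair, set()).add(genome)

    links = set()
    for genome, proteins in proteins_by_genome.items():
        standalone = {next(iter(f)) for f in proteins if len(f) == 1}
        for (fam_x, fam_y), fusion_genomes in fused_in.items():
            if fam_x in standalone and fam_y in standalone and fusion_genomes - {genome}:
                links.add((genome, (fam_x, fam_y)))
    return links

# Hypothetical toy input: famA and famB are separate in genome_X, fused in genome_Y
toy = {
    "genome_X": [{"famA"}, {"famB"}, {"famC"}],
    "genome_Y": [{"famA", "famB"}],
}
links = rosetta_stone_links(toy)
print(links)
```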
57Value of 100 diverse genomes III: Metagenomics
- Premise
- Increased sampling of diverse genomes should improve many aspects of metagenomic analysis
- To test
- Annotation
- Binning
58Metagenomic Annotation Improves (Slightly)
59Compositional Binning Improves (Slightly)
60Phylogenetic Binning Improves Slightly
61Value of 100 diverse genomes V: Phylogeny
6216S Says Hyphomonas is in Rhodobacterales
Badger et al. 2005
63WGT Says It's Related to Caulobacterales
Badger et al. 2005
66GEBA - After the Pilot
67PD of sequenced organisms
68PD with GEBA
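Assuming PD here stands for phylogenetic diversity measured as total branch length, a minimal sketch of the calculation is given below. It uses the rooted form (the sum of branch lengths on the paths from the chosen taxa to the root, each branch counted once). The tree, branch lengths, and the function `phylogenetic_diversity` are hypothetical and only illustrate how adding a taxon from an unsampled lineage raises PD.

```python
def phylogenetic_diversity(parent, taxa):
    """Sum of branch lengths connecting the given taxa to the root.

    parent maps each non-root node to (parent_node, branch_length).
    Each branch is counted once even if several taxa share it.
    """
    counted = set()
    total = 0.0
    for taxon in taxa:
        node = taxon
        # Walk toward the root, stopping at branches that were already counted
        while node in parent and node not in counted:
            counted.add(node)
            node, length = parent[node][0], parent[node][1]
            total += length
    return total

# Hypothetical toy tree: A and B are sisters, C is a deep, separate lineage
toy_tree = {
    "A": ("n1", 1.0),
    "B": ("n1", 2.0),
    "n1": ("root", 0.5),
    "C": ("root", 3.0),
}
pd_before = phylogenetic_diversity(toy_tree, ["A", "B"])
pd_after = phylogenetic_diversity(toy_tree, ["A", "B", "C"])
print(pd_before, pd_after)
```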
70- At least 40 phyla of bacteria
- Genome sequences are mostly from three phyla
- Most phyla with cultured species are sparsely sampled
- Lineages with no cultured taxa even more poorly sampled
Acidobacteria
Bacteroides
Fibrobacteres
Gemmimonas
Verrucomicrobia
Planctomycetes
Chloroflexi
Well sampled phyla
Poorly sampled
No cultured taxa
71As of 2002
- At least 40 phyla of bacteria
- Genome sequences are mostly from three phyla
- Some other phyla are only sparsely sampled
- Same trend in Viruses
Acidobacteria
Bacteroides
Fibrobacteres
Gemmimonas
Verrucomicrobia
Planctomycetes
Chloroflexi
Based on Hugenholtz, 2002
72As of 2002
- At least 40 phyla of bacteria
- Genome sequences are mostly from three phyla
- Some other phyla are only sparsely sampled
- Same trend in Microbial Eukaryotes
Acidobacteria
Bacteroides
Fibrobacteres
Gemmimonas
Verrucomicrobia
Planctomycetes
Chloroflexi
Based on Hugenholtz, 2002
73Need experimental studies from across the tree too
Acidobacteria
Bacteroides
Fibrobacteres
Gemmimonas
Verrucomicrobia
Planctomycetes
Chloroflexi
Tree based on Hugenholtz (2002), with some modifications (scale bar: 0.1).
75MICROBES
76A Happy Tree of Life