Title: Genome dynamics in Bacillus megaterium
1. Genome dynamics in Bacillus megaterium
What genomic sequencing tells us about the
genetic forces that shape Bacillus genomes
- October 29, 2009
- Dept. of Biological Sciences
- NIU
2. The Genus Bacillus
- Gram-positive, aerobic, endospore-forming, rod-shaped bacteria
- Normal habitat: the soil (plus lots of other places)
- Mostly mesophilic, but some grow as low as 0 °C and as high as 65 °C
- Pathogens: B. anthracis and B. cereus
- Industrial uses: enzyme production, Bt insecticidal corn
- Endospores: very resistant to heat and chemicals
3. Relatives among the Firmicutes
4. A Bit of History
- Bacillus subtilis was originally named Vibrio subtilis by Christian Gottfried Ehrenberg in 1835. He was the first to use the name bacteria.
- Ferdinand Cohn (1872) renamed the species Bacillus subtilis, as part of his description of bacteria by their shape (bacillus = little stick).
-- He is also responsible for bacteria being considered plants and not animals
- Robert Koch first showed that a specific bacterium caused a specific disease: B. anthracis and anthrax (1876).
- B. megaterium was first described by Heinrich Anton de Bary in 1884.
5. Bacillus's Position in the Tree of Life
- Anything called Bacillus in the 1800s would now be a member of the Firmicutes (strong skin), a phylum that contains the Gram-positive, low-GC bacteria.
- An alternative model, based on indels in universal genes, puts the Firmicutes near the root of the tree.
6. Bacillus Taxonomy
- Bacillus is a very old genus name, and it has been split several times.
- Bergey's Manual of Systematic Bacteriology, first edition (1986), lists 32 valid species, with about an equal number of synonyms.
- Based on morphology, biochemistry, some DNA-DNA hybridization, and numerical taxonomy
- Carl Woese introduced the use of 16S rRNA sequences for phylogeny in 1977.
- Bergey's Manual, second edition (2004), splits the Bacillus genus into 4 families, with 37 genera in the Bacillaceae. Over 200 species.
- Bacillus is still a genus, and still contains both B. subtilis and B. megaterium.
- As in other taxa, a common phenotype is well correlated with a common genotype
7. Ash et al. (1991) Lett. Appl. Microbiol. 13:202-206.
8. Genome Sequencing
- Strain QM B1551, containing 7 plasmids
- NSF grant to Pat Vary and Jacques Ravel
- Most lab work done at TIGR/U. Maryland
- NIU's role: annotating the 6000 genes
- Joined forces with Dieter Jahn's group at the Technische Universität in Braunschweig, Germany, who were sequencing the plasmidless DSM 319 strain
- In addition, there are about 20 other fully sequenced genomes from Bacillus and related genera
- DSM 319 has no plasmids, but at least 70 genes on the QM plasmids have good homologues on the DSM chromosome (purple ring near the middle)
9. Common Features, Genetic Forces
Assuming that all Bacillus species descended from
a common ancestor, what is similar and different
between them, and why?
- Genetic Forces
- Vertical descent
- Background substitution and indel mutations
- Horizontal gene transfer (about 10 different genes between QM and DSM)
- Intragenomic recombination
- Homogenization of rRNA operons, presumably by gene conversion
- Common Features
- Morphological and biochemical characteristics
- 16S rRNA genes
- A group of common protein-coding genes
- Chromosomal synteny
- rRNA operons
10. 16S Variation, Phylogeny, and Species Identification
- B. megaterium has 11 rRNA (rrn) operons on the chromosome in both sequenced strains, in the same genomic positions.
- QM also has an rrn operon on plasmid pBM400, which is not found in DSM.
- The 16S genes in B. megaterium are 1540 bp long and very similar, but not identical.
- Gene conversion is thought to homogenize rRNA operons
- Recombination between rrn operons leads to deletions
- The question addressed here: what effect does 16S variation within the genome have on phylogeny and species identification?
11. Differences between 16S genes within B. megaterium
- Seven identical 16S genes: the rrnE, rrnF, and rrnI genes in QM and the rrnA, rrnB, rrnF, and rrnK alleles in DSM.
- Also, the rrnA and rrnB alleles in QM were identical to each other
- Note the lack of clear vertical descent in this pattern
- Total of 20 sites with polymorphisms.
- All but 4 are unique to a single operon
- All but one shared polymorphism are found in both QM and DSM
- Positions 461 and 474 are probably paired in a stem-loop
- All genes with an A at 461 have a T at 474, and all genes with a G at 461 have a C at 474.
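The 461/474 observation is a covariation test: if two columns always change together so that base pairing is preserved, they are probably paired in the rRNA secondary structure. A minimal sketch of that check follows. It assumes the 16S genes are already aligned as equal-length DNA strings, and the toy sequences are invented stand-ins, not the real rrn alleles.

```python
from collections import Counter

# Watson-Crick pairs in the DNA alphabet of the gene sequence.
WATSON_CRICK = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def column_pairs(aligned_seqs, pos_a, pos_b):
    """Count the (base at pos_a, base at pos_b) combinations; positions are 1-based."""
    return Counter((s[pos_a - 1], s[pos_b - 1]) for s in aligned_seqs)


def is_compensatory(pairs):
    """True if more than one combination occurs and every one is a Watson-Crick pair."""
    return len(pairs) > 1 and all(p in WATSON_CRICK for p in pairs)


# Toy alignment: columns 2 and 5 play the role of positions 461 and 474.
toy_alignment = ["CAGGT", "CGGGC", "CAGGT", "CGGGC"]
toy_pairs = column_pairs(toy_alignment, 2, 5)
print(dict(toy_pairs), is_compensatory(toy_pairs))
```

With the real alignment the call would be `column_pairs(genes, 461, 474)`, provided the alignment columns match the gene coordinates used on the slide.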
12. Mismatch Differences in Completely Sequenced Genomes
13. Differences in Completely Sequenced Genomes
- Maximum differences within any genome: 16 (B. clausii)
- My basic argument: there is no point in having two different species which are less different than 16S genes within the same genome.
- Among the cereus group genomes, there are fewer differences between genes in B. cereus, B. anthracis, and B. thuringiensis than there are between genes in the same genome.
- Also, B. weihenstephanensis has very few differences from these
- B. subtilis and B. amyloliquefaciens are also very similar.
- Effects on phylogeny: pick a random 16S gene from each genome, align, count differences, build a neighbor-joining tree. 1000 reps.
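The resampling idea can be sketched in a few lines. This is an illustration under stated assumptions, not the analysis behind the slides: the sequences are assumed to be pre-aligned, the genome names and toy sequences are invented, and the neighbor-joining step is replaced by a much cruder stand-in (recording which pair of genomes is closest in each replicate). A real run would hand each difference matrix to a tree-building package.

```python
import random
from itertools import combinations


def count_differences(a, b):
    """Number of mismatched columns between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def max_within_genome(copies):
    """Largest difference count between any two 16S copies of one genome."""
    return max((count_differences(a, b) for a, b in combinations(copies, 2)), default=0)


def sample_distances(genomes, rng):
    """Pick one random 16S copy per genome and return pairwise difference counts."""
    picked = {name: rng.choice(copies) for name, copies in genomes.items()}
    return {(g1, g2): count_differences(picked[g1], picked[g2])
            for g1, g2 in combinations(sorted(picked), 2)}


def closest_pair_tally(genomes, reps=1000, seed=0):
    """Count how often each pair of genomes is the closest pair (stand-in for a tree)."""
    rng = random.Random(seed)
    tally = {}
    for _ in range(reps):
        dist = sample_distances(genomes, rng)
        best = min(dist, key=lambda pair: (dist[pair], pair))
        tally[best] = tally.get(best, 0) + 1
    return tally


# Toy data: genome X carries two divergent copies, Y and Z one copy each.
toy_genomes = {"X": ["AAAA", "AATT"], "Y": ["AAAA"], "Z": ["AATT"]}
toy_tally = closest_pair_tally(toy_genomes)
print(max_within_genome(toy_genomes["X"]), toy_tally)
```

In the toy data the answer to "which genome is X closest to?" depends entirely on which copy of X is drawn, which is the effect the slide describes.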
14. Neighbor-Joining Trees with Completely Sequenced Genomes
- Different choices of which 16S genes to use lead to different phylogenies, both at the species/subspecies level and at higher levels.
- The variable nodes in the cereus group and the halodurans/clausii group are independent. Thus, these three trees represent 9 variants.
15. Defining B. megaterium and distinguishing it from other species, using 16S
- Comparison of B. megaterium isolates from GenBank to QM_rrnA
- A total of 185 isolates that were >1390 bp (i.e. >90% of full length) and had fewer than 10 ambiguity characters were aligned with QM_rrnA, and the number of variant positions was counted.
- 70% have 9 or fewer differences
- 86% have 20 or fewer differences
- 95% have 46 or fewer differences.
- Most isolates seem to fall into a single group, but there may be some significantly different subtypes in B. megaterium.
- Or, new species may be defined
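The isolate screen and the cumulative summary can be written down compactly. The sketch below assumes that "ambiguity character" means anything other than A, C, G, or T, and that the per-isolate difference counts against QM_rrnA have already been obtained from an alignment. The example counts are invented to show the calculation, not taken from the GenBank data.

```python
UNAMBIGUOUS = set("ACGT")


def passes_filter(seq, min_len=1390, max_ambiguous=10):
    """Keep sequences longer than min_len with fewer than max_ambiguous ambiguity codes."""
    seq = seq.upper()
    ambiguous = sum(1 for base in seq if base not in UNAMBIGUOUS)
    return len(seq) > min_len and ambiguous < max_ambiguous


def percent_within(diff_counts, thresholds):
    """Percent of isolates with at most t differences, for each threshold t."""
    n = len(diff_counts)
    return {t: 100.0 * sum(1 for d in diff_counts if d <= t) / n for t in thresholds}


# Invented difference counts for ten hypothetical isolates.
toy_diffs = [0, 3, 9, 12, 20, 30, 46, 50, 5, 1]
toy_summary = percent_within(toy_diffs, thresholds=(9, 20, 46))
print(toy_summary)
```

The thresholds 9, 20, and 46 are the ones quoted on the slide; any other cut-offs can be passed in the same way.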
16. Positions of Nucleotide Variants in GenBank Isolates
- 43 of the 1540 nucleotide positions in the 16S gene have at least one variant in the B. megaterium strains from GenBank.
- Most of the variation occurs at the ends of the 16S gene. This is also the region where missing data is most common.
- PCR primers for 16S need to be internal to the gene
- The variant positions in the middle were seen in QM and DSM: the paired 461/474 positions, and 1140. There are no major polymorphisms outside the end regions that are not seen in QM and DSM.
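Locating the variant positions amounts to a column scan over the alignment. The sketch below assumes the isolates are aligned to the reference at equal length, and that gaps (`-`) and `N` mark missing data that should not count as variants. The sequences are toy examples.

```python
def variant_positions(reference, isolates, missing="-N"):
    """1-based alignment positions where at least one isolate differs from the reference."""
    positions = []
    for idx, ref_base in enumerate(reference):
        for seq in isolates:
            base = seq[idx]
            if base not in missing and base != ref_base:
                positions.append(idx + 1)
                break  # one variant is enough to flag this column
    return positions


toy_reference = "ACGTACGT"
toy_isolates = ["TCGTACGA", "-CGTACGT", "ACGTNCGT"]
toy_variants = variant_positions(toy_reference, toy_isolates)
print(toy_variants)
```

Here only the first and last columns are flagged; the `N` in the middle of the third isolate is treated as missing data rather than as a polymorphism.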
17. Closely Related Species
- How easy is it to distinguish between B. megaterium and closely related species?
- What species are closely related to B. megaterium? Different phylogenetic trees give different answers.
- All of the species on the next slides appear to be more similar to B. megaterium than members of the cereus group on at least one phylogenetic tree.
- All are in genus Bacillus except Lysinibacillus sphaericus.
- Total of 344 strains used
18. Number of Differences from QM_rrnA for Different Species
- Except for B. flexus and one B. simplex isolate, all strains are well differentiated from B. megaterium, with a minimum of 58 differences.
- B. flexus overlaps the B. megaterium distribution heavily. The average B. flexus isolate had 29.4 differences from QM_rrnA, with some isolates indistinguishable. The B. flexus type strain differs at 16 positions; the B. megaterium type strain differs at 4 positions.
- The average B. simplex isolate had 79.4 differences from B. megaterium; the one exceptional strain had 17 differences (maybe it's a mislabeling).
19. Conclusions about 16S genes
- Choosing different 16S genes from within genomes can affect the resulting phylogenetic trees.
- The 16S genes within B. megaterium and other completely sequenced Bacillus genomes differ from each other by up to 16 positions.
- Some species differ from other species at fewer positions than 16S genes differ within individual genomes.
- Although most B. megaterium strains are very similar to QM and DSM, there are a few strains with very different 16S genes that may represent subtypes within B. megaterium, or which may ultimately be assigned to new species.
- Most of the polymorphisms in the 16S genes are almost unique; all of the widespread B. megaterium polymorphisms are found in QM and DSM.
- Most of the closely related species fall outside the range of variation seen within B. megaterium, but B. flexus is a major exception.
- Some isolates of B. flexus are indistinguishable from B. megaterium, and most fall within the same range of variation seen in B. megaterium
20. Common Genes and Synteny
- Bacillus is a relatively well-sequenced genus, with 11 complete genomes publicly available (not including B. megaterium).
- What genes are found in all Bacillus species, the core genome?
- Where on the chromosome are the conserved genes?
21. Bacillus Core Genome
22. QM vs. DSM Genes
23. Between Species
24. Synteny Results
- The syntenic region around the origin of replication is shared throughout the Bacillaceae, including the genera Bacillus, Geobacillus, Oceanobacillus, and Anoxybacillus.
- 99 of the 2000 core genes are in the syntenic region.
- Next: rRNA operons and adjacent genes, concrete examples of conserved synteny.
25. rRNA operons (rrn)
- There are 11 rRNA operons on the B. megaterium chromosomes, plus one on plasmid pBM400 in the QM strain.
- Other Bacillus species have 8-15 rrn operons
- The rrn operons are in the conserved synteny region.
- Only in Bacillus and relatives
- rrn operons are all on the leading DNA strand: transcribed in the same direction as the replication fork moves.
- Most Bacillus rrn operons are on the right replichore, near the origin of replication
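Whether an operon sits on the leading strand follows from its position and orientation relative to the origin and terminus. The sketch below makes several assumptions that are not stated on the slides: coordinates increase clockwise on a circular chromosome, a `+` gene is transcribed in the direction of increasing coordinate, the right replichore runs clockwise from ori to ter, and the ori and ter coordinates are known. The chromosome size and gene positions are invented.

```python
def replichore(pos, ori, ter, length):
    """Return 'right' if pos lies clockwise between ori and ter, else 'left'."""
    offset = (pos - ori) % length
    ter_offset = (ter - ori) % length
    return "right" if offset < ter_offset else "left"


def on_leading_strand(pos, strand, ori, ter, length):
    """True if the gene is transcribed in the direction the replication fork moves."""
    side = replichore(pos, ori, ter, length)
    return (side == "right" and strand == "+") or (side == "left" and strand == "-")


# Hypothetical 5 Mb chromosome with ori at coordinate 0 and ter halfway round.
LENGTH, ORI, TER = 5_000_000, 0, 2_500_000
toy_genes = [
    ("rrn_right", 100_000, "+"),
    ("rrn_left", 4_900_000, "-"),
    ("head_on", 4_900_000, "+"),
]
for name, pos, strand in toy_genes:
    print(name, replichore(pos, ORI, TER, LENGTH),
          on_leading_strand(pos, strand, ORI, TER, LENGTH))
```

Applied to an annotated genome, this gives a quick tally of how many rrn operons fall on each replichore and whether any are transcribed against the fork.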
26. From Stewart and Cavanaugh, 2007, J. Mol. Evol. 65:44-67
27. Common Sites
- Nearly all the rrn operons in the Bacillaceae can be found between sets of common flanking genes.
- Sometimes with DNA insertions separating the rrn locus from one side
- A few unique rrn operons, including 2 in B. megaterium
- Not in Paenibacillus
A: DNA repair protein recF
B: DNA gyrase, subunit B
C: DNA gyrase, subunit A
D: inosine-5'-monophosphate dehydrogenase
E: D-alanyl-D-alanine carboxypeptidase
F: glutamine amidotransferase, synthase subunit
28. rrn operons in Bacillaceae are in specific sites
29. Variations
- Seven sites on the right replichore, plus one on the left replichore.
- Also, a site shared within the cereus group, and two sites shared in Geobacillus and Anoxybacillus.
- Individual rrn sites can contain 0-5 rrn operons.
- Some sites are empty: the flanking genes are adjacent, with no rrn operon between
- A few sites are missing: the flanking genes are not present in the genome or are dispersed to very different locations.
- Tandem duplications of rrn operons are common
- Several variations caused by apparent intragenomic recombination
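The occupied / empty / missing distinction can be expressed as a small classifier over gene order. This is a sketch under assumptions: a genome is represented as an ordered list of gene names, every rrn operon is labelled `"rrn"`, the flanking-gene names `flankL` and `flankR` are hypothetical placeholders, and "dispersed" is approximated by a `max_span` cut-off that is not from the slides. Circular wrap-around is ignored, and a site with inserted genes but no rrn operon is still reported as empty.

```python
def classify_rrn_site(gene_order, left_flank, right_flank, max_span=20):
    """Return (status, n_rrn) for one rrn site in an ordered list of gene names."""
    if left_flank not in gene_order or right_flank not in gene_order:
        return ("missing", 0)  # a flanking gene is absent from the genome
    i, j = sorted((gene_order.index(left_flank), gene_order.index(right_flank)))
    if j - i > max_span:
        return ("missing", 0)  # flanking genes dispersed to distant locations
    n_rrn = sum(1 for gene in gene_order[i + 1:j] if gene == "rrn")
    return ("occupied", n_rrn) if n_rrn else ("empty", 0)


# Toy genomes; gene names other than "rrn" are placeholders.
tandem = ["a", "flankL", "rrn", "rrn", "flankR", "b"]
empty = ["a", "flankL", "flankR", "b"]
lost = ["a", "flankL", "b"]
for genome in (tandem, empty, lost):
    print(classify_rrn_site(genome, "flankL", "flankR"))
```

The operon count returned for an occupied site is what would be compared across genomes to spot tandem duplications.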
30. Tandem Duplication Copy Number
31. Intragenomic Recombination at rrn Sites
- rrn operons are almost identical, among the very few repeated sequences in bacterial genomes
- A second example: insertion sequences (IS), which are mobile genetic elements found in many genomes (very few in B. megaterium).
- The presence of highly conserved genes and the consequences of intragenomic recombination in a circular genome constrain genome rearrangements.
- The arrangement of rrn operons and their sites can be understood as the result of three forces:
- intragenomic recombination between rrn operons,
- insertions/deletions of blocks of protein-coding genes,
- recombination events within tandem arrays of rrn operons.
32. Symmetrical Inversion Between Replichores
- Anoxybacillus flavithermus
33. Double Crossover Re-orders Flanking Genes
34. Double Crossover in Flanking tRNA Regions
- B. amyloliquefaciens rrnE.
- Resulted in loss of 2/3 of the 16S gene.
- 23S and 5S OK
- Very little obvious homology on the right side.
35. Tandem Duplication Events: Duplication by Unequal Crossing Over
- rrnD in Oceanobacillus iheyensis
36. Tandem Duplications in the cereus group rrnG site
- Every deletion between adjacent rrn operons can be seen.
- Deletion of genes between rrn 2 and rrn 3 (preserving one gene in the middle).
- Region between rrn 3 and rrn 4 completely replaced.
37. Intragenomic Recombination Conclusions
- Most rrn operons are found in the same sites in all Bacillus genomes
- Differences in rrn operon number are mostly due to tandem duplications within these sites
- Intragenomic recombination is well documented in Bacillus genomes
- Anoxybacillus: symmetric inversion across ori
- B. pumilus: double crossover involving 3 regions
- Oceanobacillus rrnD: crossover (CO) between tandem copies
- rrnD in other species: at least 2 events
- cereus group rrnG: deletions between tandem copies (at least 4 different events)
- cereus group rrnG: replacement of inter-rrn region by presumed double crossover (2CO)
- cereus group rrnG: deletion of inter-rrn region, leaving a central portion intact (2 deletions?).
- B. amyloliquefaciens rrnE: 2CO involving 3 regions, with the central section having the COs 150 bp apart
- B. megaterium rrnBC: 2CO involving 3 regions, with little homology at one end
- Several other duplication/deletion events within tandem duplications
38. Some Events NOT Observed
- The lack of certain events supports several current ideas.
- To the extent that lack of evidence constitutes evidence.
- Crossovers between rrn sites: none observed, despite numerous CO events within rrnG in the cereus group, plus many other CO events
- Supports the idea that the flanking genes are necessary
- Asymmetric CO across ori: only one symmetric one observed, so evidence is not strong.
- Supports the idea that symmetric replichores are selectively advantageous
- Inversions within a replichore: all rrn in all species are on the leading strand, in both replichores.
- Supports the idea that replication and rrn transcription must proceed in the same direction
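The link between inversion geometry and strand orientation can be checked with a toy coordinate model. The sketch below assumes a circular chromosome with clockwise coordinates, `+` genes transcribed toward increasing coordinate, and the right replichore running clockwise from ori to ter. An inversion is modelled as a reflection about the centre of the inverted segment, and the gene is assumed to lie inside that segment. All coordinates are invented.

```python
def on_leading_strand(pos, strand, ori, ter, length):
    """True if the gene is transcribed in the direction the replication fork moves."""
    in_right = (pos - ori) % length < (ter - ori) % length
    return (strand == "+") == in_right


def invert(pos, strand, center, length):
    """Map a gene through an inversion of a segment centred on `center`."""
    new_pos = (2 * center - pos) % length
    new_strand = "-" if strand == "+" else "+"
    return new_pos, new_strand


# Hypothetical 5 Mb chromosome; the gene starts co-directional on the right replichore.
LENGTH, ORI, TER = 5_000_000, 0, 2_500_000
gene = (200_000, "+")

within = invert(*gene, center=300_000, length=LENGTH)   # inversion inside one replichore
symmetric = invert(*gene, center=ORI, length=LENGTH)    # inversion symmetric about ori

print(within, on_leading_strand(*within, ORI, TER, LENGTH))
print(symmetric, on_leading_strand(*symmetric, ORI, TER, LENGTH))
```

In this model an inversion inside one replichore turns the gene against the fork, while an inversion symmetric about ori moves it to the other replichore and keeps it co-directional, which is consistent with only the symmetric type being seen.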
39. Some Unsolved Questions
- Replichore asymmetry
- most of the rrn are in the right replichore
- compositional bias between replichores
- Mechanism of insertion/deletion/horizontal gene transfer
- a big question. We are examining insertion sites for clues.
- Is there a common phylogeny for the conserved synteny region in B. megaterium?
- Finding and analyzing allegedly unique events (indels and recombinations)
40. DNA Composition shows replichore asymmetry
41. Simple vs. Compound Insertions
42. Thanks!
- NIU Biology Dept.
- Pat Vary
- Janaka Edirisinghe
- Kirthi Kumar Kutumbaka
- Sandhya Balasubramanian
- Jenn Hintzsche
- Chris Braun
- Denise Tombolato
- Judy Luke
- Scott Grayburn
- NIU Computer Science Dept.
- Stephen Snow
- Reva Freedman
- Minmei Hou
- Argonne National Labs
- Ross Overbeek
- Gordon Pusch
- Terry Disz
- TIGR/U. Maryland
- Jacques Ravel
- Mark Eppinger
- MJ Rosovitz
- Technische U. Braunschweig
- Dieter Jahn
- Boyke Bunk