Title: Bacterial Genomics
1http://pastime.cgu.edu.tw/petang/index.htm
Bacterial Genomics
2Late 19th century: nucleic acids were recognized as long-chain polymers of nucleotides, made up of sugar, phosphoric acid, and several nitrogen-containing bases. The sugar in a nucleic acid can be ribose or deoxyribose, giving two forms: RNA and DNA.
1943, Oswald Avery: proved that DNA carries genetic information.
1948, Linus Pauling: discovered that many proteins take the shape of an alpha helix, spiraled like a spring coil.
1950, Erwin Chargaff: found that the arrangement of nitrogen bases in DNA varied widely, but the amounts of certain bases always occurred in a 1:1 ratio.
1951, Maurice Wilkins and Rosalind Franklin: the "wet" form of DNA (at higher humidity) had all the characteristics of a helix.
1952, Erwin Chargaff: matching base pairs interlock in the middle of the double helix, keeping the distance between the chains constant.
1953, Francis Crick and James Watson: proposed the double-helix model of DNA.
3Franklin wrapped up her DNA work. She turned her
attention to viruses, publishing 17 papers in
five years. She worked up until a few weeks
before her death from ovarian cancer in 1958 at
age 37.
4DNA Content of Haploid Genomes of a Range of
Phyla
5THE WORLD OF GENOMICS
6Haemophilus influenzae: the first free-living
organism to be sequenced
7Timeline of Selected Bacterial Genome
Sequencing Projects
8Map-based and Whole-genome Shotgun Sequencing
9How to Sequence a Bacterial Genome by Map-based
Sequencing
10Dideoxy DNA Sequencing
11Automated Sequencing
12Restriction fragment fingerprinting
- BAC clones are grown in 96-well format
- HindIII digest
- 1% agarose gel
- Molecular weight marker every 5th lane
13Regional mapping
14Regional mapping
15Regional mapping
Minimal tiling path selected for sequencing.
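Selecting a minimal tiling path can be sketched as a greedy interval cover over the mapped clones. The clone names and map coordinates below are hypothetical, and the sketch assumes every clone already has a start and end position on the physical map; it is an illustration, not the procedure used by any particular mapping pipeline.

```python
from typing import List, Tuple

Clone = Tuple[str, int, int]  # (name, start_kb, end_kb) on the physical map (hypothetical)


def minimal_tiling_path(clones: List[Clone], region_start: int, region_end: int) -> List[str]:
    """Greedy interval cover: at each step, among the clones that start at or
    before the position covered so far, take the one reaching furthest right."""
    path = []
    covered = region_start
    ordered = sorted(clones, key=lambda c: c[1])
    i = 0
    while covered < region_end:
        best = None
        # Scan every clone that begins inside the stretch already covered.
        while i < len(ordered) and ordered[i][1] <= covered:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered:
            raise ValueError(f"gap in clone coverage at {covered} kb")
        path.append(best[0])
        covered = best[2]
    return path


# Hypothetical BAC clones covering a 450 kb region.
example_clones = [
    ("A", 0, 150),
    ("B", 50, 180),
    ("C", 120, 300),
    ("D", 170, 320),
    ("E", 290, 450),
]
tiling = minimal_tiling_path(example_clones, 0, 450)
print(tiling)
```

Clones B and D are skipped because A, C and E already cover the region with overlaps between neighbours. A real selection would also require a minimum overlap between adjacent clones rather than accepting clones that merely touch.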
16Contig assembly
- FPC (Sanger Centre; C. Soderlund, I. Longden and R. Mott)
- Overlap identification by restriction pattern similarities
- Facilitated contig assembly
- Clone: insert fragments and vector fragments
- All restriction fragments within a clone selected for the tiling path must be verified by their presence in overlapping clones.
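Overlap identification by restriction pattern similarity can be illustrated by counting the bands two clones share. This is a minimal sketch, not FPC's actual scoring: the fragment sizes, the 2% size tolerance and the shared-band fraction are all assumptions chosen for illustration.

```python
def shared_bands(frags_a, frags_b, tolerance=0.02):
    """Count fragments of clone A that pair with a distinct fragment of clone B
    whose size agrees within a relative tolerance (two-pointer scan)."""
    a = sorted(frags_a)
    b = sorted(frags_b)
    i = j = shared = 0
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tolerance * max(a[i], b[j]):
            shared += 1          # sizes agree: count one shared band
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1               # a[i] is too small to match anything left in b
        else:
            j += 1               # b[j] is too small to match anything left in a
    return shared


def overlap_fraction(frags_a, frags_b, tolerance=0.02):
    """Shared bands divided by the band count of the smaller fingerprint."""
    return shared_bands(frags_a, frags_b, tolerance) / min(len(frags_a), len(frags_b))


# Hypothetical HindIII fragment sizes (bp) for two BAC clones.
clone_1 = [1200, 2500, 3100, 4800, 7600]
clone_2 = [2480, 3110, 4790, 9000, 650]

n_shared = shared_bands(clone_1, clone_2)
fraction = overlap_fraction(clone_1, clone_2)
print(n_shared, fraction)
```

Clones whose fingerprints share a high fraction of bands are candidates for neighbours in a contig; the threshold for calling an overlap would have to be set against the chance of coincidental matches.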
17BCM-HGSC
18How to Sequence a Bacterial Genome by Shotgun
Sequencing
19Shotgun Sequencing I: RANDOM PHASE
- BAC clone: 100-200 kb
- Sheared DNA: 1.0-2.0 kb
- Sequencing templates
- Random reads
20Shotgun Sequencing II: ASSEMBLY
Consensus
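The consensus step of assembly can be sketched as a per-column majority vote over reads that have already been placed. The reads and offsets below are hypothetical, and the sketch assumes the overlap and layout steps have been done; real assemblers also weigh base quality, which is ignored here.

```python
from collections import Counter


def consensus(placed_reads, length):
    """placed_reads: list of (offset, sequence) pairs.
    Returns the majority base at each position, or 'N' where no read covers it."""
    columns = [Counter() for _ in range(length)]
    for offset, seq in placed_reads:
        for k, base in enumerate(seq):
            pos = offset + k
            if 0 <= pos < length:
                columns[pos][base] += 1
    return "".join(col.most_common(1)[0][0] if col else "N" for col in columns)


# Hypothetical reads from a 12 bp target; the last read carries one miscalled base.
reads = [
    (0, "ACGTAC"),
    (3, "TACGGA"),
    (6, "GGATTC"),
    (2, "GTTCGG"),  # position 4 reads T where the other two reads say A
]
result = consensus(reads, 12)
print(result)
```

Because position 4 is covered by three reads and two of them agree, the miscalled base is outvoted. This is why shotgun projects sequence each region several times over.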
21Shotgun Sequencing III: FINISHING
Consensus
22Shotgun Sequencing III: FINISHING
Consensus
23Shotgun Sequencing III: FINISHING
Consensus
24Shotgun Sequencing III: FINISHING
Consensus
25Shotgun Sequencing III: FINISHING
26Whole Genome Shotgun Sequencing: Assembly
Consensus
27Whole Genome Shotgun Sequencing: Assembly
Low Base Quality
Consensus
28Bacterial Genomes: 598 total (231 completed, 367 incomplete)
31Bacterial Genome Sizes
- Smallest: Buchnera spp., 460 kb (0.46 Mb)
- Mycoplasma genitalium: 580 kb (0.58 Mb)
- Largest: Myxococcus xanthus, 9200 kb (9.2 Mb)
- Median: 2000 kb (2.0 Mb)
- Average gene size: 0.9-1.0 kb
- About 90% of the genome encodes protein and stable RNA
- Therefore, the larger the bacterial genome, the more genes it contains
- Bacterial gene number reflects bacterial lifestyle: small genomes in obligate parasites; large genomes in metabolically flexible and/or developmentally complex bacteria
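The figures on this slide support a back-of-the-envelope gene count: genome size times coding fraction, divided by mean gene size. The sketch below assumes 90% coding and a mean gene of 0.95 kb (the midpoint of the 0.9-1.0 kb range, an assumption); the function name is illustrative.

```python
def estimate_gene_count(genome_kb, coding_fraction=0.90, mean_gene_kb=0.95):
    """Rough gene count: coding DNA (kb) divided by mean gene length (kb)."""
    return round(genome_kb * coding_fraction / mean_gene_kb)


# Genome sizes (kb) taken from the slide.
genomes_kb = {
    "Buchnera spp.": 460,
    "Mycoplasma genitalium": 580,
    "median bacterium": 2000,
    "Myxococcus xanthus": 9200,
}

estimates = {name: estimate_gene_count(kb) for name, kb in genomes_kb.items()}
for name, genes in estimates.items():
    print(f"{name}: ~{genes} genes")
```

The rule of thumb that falls out is roughly one gene per kilobase, so a 2.0 Mb genome carries on the order of 2000 genes.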
32Minimal genome size
- Mycoplasma genitalium genome: smallest prokaryotic genome sequenced
- 108-121 genes not required for growth in laboratory
- 265-350 genes required for growth in laboratory
33Genome Sizes in Major Bacterial Groups
34Genome Geometries in Major Bacterial Groups
35Bacterial Chromosome Copy Numbers
- Bacteria may contain >4 copies of sequences near their replication origin under fast growth conditions due to multiple initiations
- A few bacteria contain >1 complete copy of their chromosome, e.g. Deinococcus radiodurans
36Bacterial Chromosome Numbers
- Most bacteria contain a single chromosome (with or without extrachromosomal elements)
- Some bacteria have also been found to contain 2-3 replicons, which can be considered either megaplasmids or minichromosomes
- e.g. 3.0 Mb and 0.9 Mb replicons in Rhodobacter sphaeroides
- A few bacterial genera contain >1 chromosome
- e.g. 2.1 Mb and 1.2 Mb chromosomes in Brucella
- Some bacteria harbour large replicons essential for survival in a specific ecological niche but not under laboratory conditions
- e.g. 1.4 Mb and 1.7 Mb replicons in Rhizobium meliloti are required for plant symbiosis
- It is likely that multiple chromosomes have arisen independently a number of times from single chromosomes
37E. coli Genome
Escherichia coli O157:H7, complete genome
http://www.ncbi.nlm.nih.gov/genomes/framik.cgi?db=genome&gi=176
http://www.ncbi.nlm.nih.gov/cgi-bin/Entrez/paltik?gi=115&db=G
38Gene Order and Orientation
- Gene order in bacteria is fluid over evolutionary time, even among bacteria within the same phylum
- No obvious rationale for gene order, although genes near the replication origin may be present at increased copy numbers
- Gene orientation is often more regular: replication and transcription often proceed in the same direction
- The order of genes within operons is commonly conserved
39Is Genome Structure Conserved Between Closely-Related Bacteria?
- E. coli, S. typhimurium and S. paratyphi are closely-related enterobacteria with similar genome structures
40Is Genome Structure Conserved in Different Isolates?
- Genomes of S. typhi natural isolates show structural rearrangements by homologous recombination between rRNA operons
41Is Genome Structure Conserved in Different Species?
- Mycobacterium tuberculosis (TB) and Mycobacterium leprae (leprosy), though closely related, have very different sized genomes
- M. tuberculosis: large genome (4.4 Mb, about 4000 genes)
- >250 genes devoted to lipid synthesis
- large number of regulatory genes
- M. leprae: much smaller genome (2000 genes?)
- half of genome devoid of functional genes
42Chromosome Rearrangements in Mycoplasma
- M. genitalium and M. pneumoniae are closely-related bacteria whose genomes have been sequenced
- Genome structural alterations
- Insertions, deletions, and other rearrangements required to reorder six segments (possibly mediated by repeat sequences)
43Summary
- Many bacterial genomes have been sequenced; even more are in progress
- Both sequencing and physical analyses give valuable information about genome structure and organization
- Bacterial genomes vary in size: more DNA, more genes
- Chromosomes are mainly circular, but may be linear
- Some bacteria contain >1 chromosome, or >1 copy of an individual chromosome
- Most of the genome is composed of coding sequences
- Gene order is fluid
- Operons are conserved
- Genome structure may be conserved over long evolutionary periods or may undergo rearrangement
44Gene density is much higher in bacterial than in
eukaryotic genomes, but there are fewer genes
(most bacterial genomes have < 5000 genes). The
smallest bacterial genome that has been sequenced
(M. genitalium at about 0.6 Mb) contains only
400-500 genes and has been studied to determine
the minimum number of genes needed for life
(estimated to be between 250 and 350).
45Microbial Genes in the Human Genome
- Nature 409:860 (2001), human genome
- Did bacterial infections lead to permanent transfer of genes into their host? (Lateral transfer)
- 223 BVTs (bacteria-to-vertebrate transfer genes)
- Science 292:1903-1906 (2001)
- About 40 genes were found to be exclusively shared by humans and bacteria
- Gene loss due to evolutionary rate variation
46GC skew with respect to replication
47Bacterial tRNA genes (61 anticodons)
- Redundancy? Why so many?
- E. coli MG1655: 88
- E. coli O157:H7: 100
- Mycoplasma: 36
- Archaea: 35-45
- Human: 648
- C. elegans: 794
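The figure of 61 anticodons follows from the standard genetic code: 64 possible codons minus 3 stop codons. The sketch below enumerates them and derives the anticodon that pairs with each sense codon by strict Watson-Crick complementarity; wobble pairing, which lets one tRNA read several codons, is deliberately not modelled.

```python
from itertools import product

STOP_CODONS = {"UAA", "UAG", "UGA"}           # standard genetic code
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon(codon):
    """Reverse complement of an mRNA codon, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(codon))


all_codons = ["".join(triplet) for triplet in product("UCAG", repeat=3)]
sense_codons = [c for c in all_codons if c not in STOP_CODONS]
anticodons = {anticodon(c) for c in sense_codons}

print(len(all_codons), len(sense_codons), len(anticodons))
print(anticodon("AUG"))  # anticodon pairing with the methionine codon
```

Against this ceiling of 61, the 88 and 100 tRNA genes in the two E. coli strains imply multiple gene copies for some anticodons, while the 36 in Mycoplasma show that far fewer than 61 distinct tRNAs suffice once wobble is taken into account.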
48Circular representation of the S. typhi genome. The outer scale is marked in megabases. Circles range from 1 (outer circle) to 9 (inner circle). Circles 1 and 2, genes on forward and reverse strand; circles 3 and 4, genes conserved with E. coli; circles 5 and 6, genes unique to S. typhi with respect to E. coli; circle 7, pseudogenes; circle 8, G+C content; circle 9, GC bias ((G - C)/(G + C)). All genes are colour-coded by function: dark blue, pathogenicity/adaptation; black, energy metabolism; red, information transfer; dark green, membranes/surface structures; cyan, degradation of macromolecules; purple, degradation of small molecules; yellow, central/intermediary metabolism; light blue, regulators; pink, phage/IS elements; orange, conserved hypothetical; pale green, unknown function; brown, pseudogenes.
Nature 413, 848-852 (2001): Complete genome sequence of a multiple drug resistant Salmonella enterica serovar Typhi CT18
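The GC bias plotted in the innermost circle, (G - C)/(G + C), is usually computed in windows along the chromosome. A minimal sketch follows; the toy sequence and the window size are assumptions chosen so the result is easy to check by hand, and real analyses use windows of thousands of bases.

```python
def gc_skew(sequence, window):
    """Return (G - C)/(G + C) for consecutive non-overlapping windows.
    Windows containing no G or C get a skew of 0.0."""
    sequence = sequence.upper()
    values = []
    for start in range(0, len(sequence) - window + 1, window):
        chunk = sequence[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        values.append((g - c) / (g + c) if (g + c) else 0.0)
    return values


# Toy sequence: a G-rich stretch followed by a C-rich stretch.
toy = "GGGCATGGCA" + "CCCGATCCGA"
skews = gc_skew(toy, window=10)
print(skews)
```

In many bacterial chromosomes the sign of the skew switches near the replication origin and terminus, which is why plots like this one are used to locate them.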
49Figure 1 The Salmonella enterica serovar
Typhimurium LT2 genome. a, The chromosome. Base
pairs are indicated outside the outer circle. The
outer two circles represent the coding
orientation, with the forward strand on the
outside and the reverse strand on the inside. Red
indicates close homologues in all eight genomes.
Green indicates genes with a close homologue in
at least one other Salmonella (S. typhi, S.
paratyphi A, S. paratyphi B, S. arizonae or S.
bongori) but not in E. coli K12, E. coli O157:H7
and K. pneumoniae. Blue indicates genes present
only in S. typhimurium LT2. Grey indicates other
combinations. The black inner circle is the G+C
content; the purple/yellow innermost circle is
the GC bias. The positions of the origin of
replication (ORI) and terminus (TER) are shown.
b, The plasmid pSLT. Base pairs are indicated
outside the outer circle. The plasmid is not to
scale. The colouring scheme is the same as for a.
50Molecular Characterization of Bacterial Genomes
Consider four pre-eminent techniques:
1. Nucleotide sequencing of entire genome
2. DFI (differential fluorescence induction)
3. IVET (in vivo expression technology)
4. STM (signature-tagged transposon method)
51Differential fluorescence induction
52"Green fluorescent protein (Gfp) can be expressed
in a variety of microorganisms without adversely
affecting their pathogenicity. The method has
been employed to identify genes of Salmonella
that respond to an acidic environment as well as
those genes that are exclusively expressed within
macrophages..."
53IVET (in vivo expression technology)
- "...make a library in which random genomic fragments are ligated to a gene for a selectable marker that is required for survival in the host animal.
- Only those bacteria harboring a fusion that contains an active promoter will survive passage through the host.
- Fusions bearing promoters with constitutive activation can be identified and discarded by examining reporter activity on laboratory medium.
- By harvesting bacteria from different sites in the body, a list of genes required for different stages of infection can be compiled..."
54IVET (in vivo expression technology)
- These statements are a bit opaque. What is being asserted is:
- random genomic fragments (from the bacterial genome being characterized) constitute the "upstream components" of the fusion
- what is being tested is whether a fragment contains a promoter, which is followed in the fusion by a gene for an essential product
- "downstream" (of the promoter and the product in this fusion) is "a reporter"
- this segment "reports" to the observer that the bacterium is present.
55IVET (in vivo expression technology)
- Thus, the investigator can look in various host tissues.
- If no reporter signal is detected, no bacterium is present.
- If a reporter signal is detected, the bacterium is present, and it is present because a gene necessary for its survival was induced by the environment of the tissue.
- Induction is confirmed by the absence of reporter signal on laboratory medium; a signal there would indicate nonspecific constitutive expression.
- So an inventory of genes necessary for infection in a tissue can be developed.
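The IVET decision logic described on these slides can be written as a small classifier. This is only a sketch of the reasoning, with hypothetical fragment labels and made-up observations; the field names are illustrative and do not come from any IVET software.

```python
from dataclasses import dataclass


@dataclass
class Fusion:
    fragment_id: str            # hypothetical label for the cloned genomic fragment
    recovered_from_host: bool   # bacteria carrying this fusion survived host passage
    active_on_lab_medium: bool  # reporter signal seen on laboratory medium


def classify(fusion):
    """Apply the IVET reasoning to one fusion."""
    if not fusion.recovered_from_host:
        return "no promoter active in host"
    if fusion.active_on_lab_medium:
        return "constitutive (discard)"
    return "in vivo induced"


# Made-up observations for three fusions.
library = [
    Fusion("frag-001", recovered_from_host=True, active_on_lab_medium=False),
    Fusion("frag-002", recovered_from_host=True, active_on_lab_medium=True),
    Fusion("frag-003", recovered_from_host=False, active_on_lab_medium=False),
]

calls = {f.fragment_id: classify(f) for f in library}
induced = [fid for fid, call in calls.items() if call == "in vivo induced"]
print(calls)
print(induced)
```

Repeating the same classification on bacteria harvested from different tissues would give one list of induced fragments per site, which is the inventory the slides describe.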
56STM (signature-tagged transposon method)
- "In STM, each member of a complex library of mutants is marked with a unique oligonucleotide sequence.
- If a mutant is absent after passage of the library through an infected animal or another selective environment, the mutation it harbored may be a gene essential for survival."
57STM (signature-tagged transposon method)
- Create a set of mutants by disrupting genes with a transposon (thus producing mutant 1, mutant 2, mutant 3, and so on). This collection is the complex library.
- Infect host.
- Look for which mutant microorganisms are present (readily determined by looking for the "unique nucleotide sequence of the transposon" that was used to create each mutant).
- If the unique nucleotide sequence is absent, then the organism is absent.
- And the organism is absent because it lacks something that was necessary for it to be present. That something is the unmutated gene.
- Or, stated another way, the mutation eliminated the wild-type gene that is necessary to achieve infection.
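The STM readout reduces to a set difference between the tags in the input library and the tags recovered from the host. The sketch below uses made-up tag sequences and gene names; in practice tag presence is scored experimentally, and the tag-to-gene mapping comes from locating each transposon insertion.

```python
def missing_tags(input_pool, output_pool):
    """Tags present in the inoculum but not recovered after host passage."""
    return sorted(set(input_pool) - set(output_pool))


# Hypothetical library: signature tag -> gene disrupted by that transposon insertion.
tag_to_gene = {
    "TAG01": "geneA",
    "TAG02": "geneB",
    "TAG03": "geneC",
    "TAG04": "geneD",
}

# Made-up list of tags detected in bacteria recovered from the infected host.
recovered = ["TAG01", "TAG03"]

lost = missing_tags(tag_to_gene.keys(), recovered)
candidates = [tag_to_gene[tag] for tag in lost]
print(lost)
print(candidates)
```

Each gene in `candidates` is a hypothesis rather than a conclusion: a mutant can also drop out by chance, so candidates are normally confirmed by testing the individual mutant.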