Title: An Introduction to Community Genomics
- Laboratory of Computational Molecular Biology
- College of Life Sciences, Beijing Normal University
Nico M. van Straalen and Dick Roelofs, Oxford University Press, 2006
Ecological genomics: a scientific discipline that studies the structure and functioning of a genome with the aim of understanding the relationship between the organism and its biotic and abiotic environments.
Science 315:1781, 30 March 2007
References
- Whitaker, R.J. and Banfield, J.F. (2006) Population genomics in natural microbial communities. Trends Ecol Evol. doi:10.1016/j.tree.2006.07.001
- Allen, E.E. and Banfield, J.F. (2005) Community genomics in microbial ecology and evolution. Nat Rev Microbiol, 3, 489-498.
- Chen, K. and Pachter, L. (2005) Bioinformatics for whole-genome shotgun sequencing of microbial communities. PLoS Comput Biol, 1, e24.
- Three online papers in PLoS Biology, Vol. 5, March 2007.
The genomics revolution invading ecology
- 1900: rediscovery of the laws of inheritance
- 1902: genetics (William Bateson)
- 1909: gene (Wilhelm Ludvig Johannsen)
- 1920: genome (Hans Winkler)
- Mid-1980s: genomics
- The interaction between genomics and ecology → evolutionary and ecological functional genomics, which focuses on the genes that affect evolutionary fitness in natural environments and populations (Feder & Mitchell-Olds 2003).
Not just a new tool for analyzing fundamental ecological questions
- With the new technology:
  - new scientific questions emerge;
  - existing questions can be answered in ways that were not considered before.
- Forging a mechanistic basis for ecology, which is often felt to be missing.
- Strengthening the relationship between ecology and the other life sciences.
An important challenge confronting evolutionary ecologists
- Genotypes → phenotypes
- Understanding how individual traits of organisms affect both their interactions with other species and the dynamics of the community and ecosystem.
Now ecologists can perform delicate genetic manipulations to obtain well-characterized genotypes, which, together with mechanistic knowledge of phenotypic plasticity, provide information on the effects of individual traits on species interactions in communities.
This approach aims to understand the mechanistic basis of adaptation and phenotypic variation by using genomic techniques to investigate the mechanistic and evolutionary basis of species interactions, and it focuses on identifying the genes affected by evolution.
Microbial ecology: here the use of genomics approaches has proceeded further than in any other subdiscipline of ecology.
Genetics studies genes one by one; genomics analyses the genome in its entirety.
Ecological genomics investigates phenotypic biodiversity as well as diversity in the genome.
Internal tangled bank
- The interactions between the genes within the genome and the dynamic character of the genome on an evolutionary scale (Dover 1999).
- It emphasizes the role of genetic turbulence (gene duplication, genetic sweeps, exon shuffling, transposition, etc.) in the genome and illustrates that there is ample scope for innovation from within.
- Tangled bank (Darwin 1859)
Evolution through tinkering (François Jacob 1977)
- These innovations are then checked against the external tangled bank, and this constitutes the process of evolution.
- Genetic turbulence leaves many traces in the genome that have no direct negative phenotypic consequences; these traces from the past provide a valuable historical record for genome investigators to discover.
Model organism selection?
From an ecologist's point of view, the absence of reptiles, amphibians, molluscs and annelids is striking.
- 1138 prokaryotic genome sequencing projects: complete 461, draft assembly 307, in progress 370
- 257 eukaryotic genome sequencing projects: complete 24, draft assembly 95, in progress 138
(NCBI, 2007-02-27, http://www.ncbi.nlm.nih.gov/genomes/static/gpstat.html)
Some model species genomes and their genetic variation
Saccharomyces cerevisiae
Caenorhabditis elegans
Drosophila melanogaster
Takifugu rubripes
Xenopus tropicalis
Gallus gallus
Mus musculus
Pan troglodytes
Homo sapiens
They do not match the model organisms favored by ecologists.
Figure: variation among populations (Population A, Population B, Population C)
Classic molecular phylogenetics is based on variation of orthologous DNA sequences across species.
Why can a functional analysis of the genome provide a different picture from an inventory of genes?
Almost all cells of an organism have the same genome, but not the same transcriptome.
Most microorganisms are uncultivated
- Genome sequence information for isolates from phylogenetically diverse lineages has had a marked impact on our understanding of microbial physiology, biochemistry, genetics, ecology and evolution.
- Various cultivation-independent methodologies:
  - 16S rRNA gene clone library collections
  - group-specific FISH
- How can these organisms be linked to their ecological roles?
Definitions
- Community genomics: direct genome sequencing of communities of microorganisms from environmental samples. It emphasizes the sequencing of an assemblage of genomes of evolutionarily and ecologically distinct organisms that co-occur.
- Population genomics: population genetic analysis of individual-level variation across the entire genome.
Environmental metagenomics
Sequencing an isolated genome
Two methods for collecting community genome sequence data
Two methods in metagenomics
Two studies at the extremes of community complexity
- The acid mine drainage biofilm community (Tyson et al. 2004, Nature) consists of only four dominant species. Shotgun sequencing of 75 Mbp → two near-complete genome sequences and detailed information about metabolic pathways and strain-level polymorphism.
- The Sargasso Sea community, containing more than 1,800 species (Venter et al. 2004, Science; Acinas et al. 2004, Nature). An enormous amount of sequencing (1.6 Gbp) → vast amounts of previously unknown diversity, including over 1.2 million new genes, 148 new species, and numerous new rhodopsin genes.
Nature Vol. 428, 4 March 2004, p. 37
Metabolic pathways
J. Craig Venter, a Charles Darwin of the 21st century, head of the J. Craig Venter Institute in Rockville, Maryland
So many new microbial species have been found that the researchers want to redraw the tree of microbial life.
Global Ocean Sampling: 7.7 million snippets of sequence, reported in a trio of online papers in PLoS Biology (13 March 2007).
Samples were taken at 41 locations; bacterium-sized cells were isolated and subsequently frozen. The temperature, salinity, pH, oxygen concentration, and depth were also recorded.
(Science, 16 March 2007)
- Estimating the number of species in the samples based on slowly evolving marker genes:
  - more than 400 microbial species new to science;
  - more than 100 of those are sufficiently different to define new taxonomic families.
- "This is a great milestone event for environmental microbiology," says Dawn Field, a molecular evolutionary biologist at the Centre for Ecology and Hydrology in Oxford, U.K.
Paradox of the plankton
The staggering variety of genes may endow each
species with sufficiently different metabolic
tool kits to take advantage of slightly different
combinations of resources, including the waste
products of others, such that they can all
coexist.
Venter expects that some of these genes can be exploited to develop new synthetic materials, clean up pollution, or bioengineer fuel production.
CAMERA (Community Cyberinfrastructure for Advanced Marine Microbial Ecology Research and Analysis): hunting for correlations between DNA sequence and environment for clues about co-occurring microbes.
March 2007, Vol. 5 (3)
Nature 3504. 2005
Breitbart et al. (2002) Genomic analysis of
uncultured marine viral communities. PNAS 99,
14250-14255.
Marine viromes: the oceans are awash with viruses
Locations: the Sargasso Sea (SAR), Gulf of Mexico (GOM), British Columbia (BBC), and the Arctic Ocean.
- Most viral species are widely dispersed, but local environmental conditions dictate which species are most common in a particular oceanic region (Angly et al., PLoS Biology 2006).
Angly et al. 2006
Metagenomic analyses of 184 viral assemblages collected over a decade and representing 68 sites in four major oceanic regions showed that most of the viral sequences were not similar to those in the current databases. There was a distinct "marine-ness" quality to the viral assemblages. Global diversity was very high, presumably several hundred thousand species, and regional richness varied along a north-south latitudinal gradient. The marine regions had different assemblages of viruses. Cyanophages and a newly discovered clade of single-stranded DNA phages dominated the Sargasso Sea sample, whereas prophage-like sequences were most common in the Arctic.
Human gut microbiome: understanding complex microbial communities in human ecology
- Trillions (10^13 to 10^14) of bacteria (the microbiota) reside in our gastrointestinal tracts; together they carry at least 100 times as many genes as our own genome.
- Each person has a unique collection of bacteria, and an individual's dominant flora is relatively stable over time (assays of fecal samples).
Human gut microbes associated with obesity
The composition of the microbial community in the gut affects the amount of energy extracted from the diet.
Figure: T0, T1, T2, T3 = baseline, 12 weeks, 26 weeks, 52 weeks; FAT-R = fat-restricted diet; CARB-R = carbohydrate-restricted diet; 18,348 16S rRNA sequences from 12 obese people; groups shown: Firmicutes, Bacteroidetes.
- Correlation between body-weight loss and gut microbial ecology (Ley et al., Nature 2006).
Variation in bacterial diversity within the human colonic microbiotas of three healthy humans: a fan-like phylogenetic architecture (why?)
- 16S rRNA bacterial sequence data set (n = 11,831) and alignment of Eckburg et al. (Science 2005).
- (A), (B), and (C): the portion of the whole tree that is contributed by individuals A, B, and C from the study. Each tree represents the whole data set.
  - Red, blue, and yellow: diversity unique to the individual.
  - White: portions of the tree that are shared with another individual.
  - Black: diversity that was not encountered in a given individual.
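The figure partitions diversity as portions of a phylogenetic tree. A simpler, tree-free sketch of the same three categories uses Python sets of operational taxonomic units (OTUs) per individual; the OTU labels and the function name `partition_diversity` are hypothetical, and set counting ignores branch lengths, so it is only a rough stand-in for the tree-based partition shown.

```python
def partition_diversity(otus_by_individual):
    """Split each individual's OTUs into unique and shared sets, and list the
    OTUs seen elsewhere in the data set but not in that individual."""
    result = {}
    for name, otus in otus_by_individual.items():
        # union of the OTUs observed in every other individual
        others = set().union(*[o for n, o in otus_by_individual.items() if n != name])
        result[name] = {
            "unique": otus - others,           # red/blue/yellow in the figure
            "shared": otus & others,           # white
            "not_encountered": others - otus,  # black
        }
    return result

# Hypothetical OTU sets for three individuals.
samples = {
    "A": {"otu1", "otu2", "otu3"},
    "B": {"otu3", "otu4"},
    "C": {"otu3", "otu5", "otu6"},
}
parts = partition_diversity(samples)
print(sorted(parts["A"]["unique"]))  # ['otu1', 'otu2']
```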
Cell 124, 837-848, February 24, 2006
Ecogenomic views of how pathogens arise and function within our indigenous microbial communities.
The microbial diversity of the human gut is the result of coevolution between microbial communities and their hosts.
Two different objectives
- Metagenomics: establishing gene inventories and discovering natural products → the functional and sequence-based analysis of the collective microbial genomes contained in an environmental sample.
- Community genomics: emphasizing the analysis of species populations and their interactions, recognizing that both species composition and interactions change over time and in response to environmental stimuli.
The system under investigation should therefore be sampled repeatedly.
Paired-end sequences: sequences from both ends of the same cloned genome fragment. The distance between the paired sequences is estimated as the average insert size of the clone library.
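Because the insert size is known only as a library average, a pair is usually treated as consistent when its mates land roughly that far apart on an assembled contig. A minimal sketch, assuming both mates map to the same contig and using a hypothetical tolerance of three standard deviations around the library mean; `MatePair`, `implied_insert`, and `is_consistent` are illustrative names, and read-orientation checks are omitted.

```python
from dataclasses import dataclass

@dataclass
class MatePair:
    """0-based contig coordinates of the outer ends of two mates."""
    left_start: int  # leftmost base of the forward-strand mate
    right_end: int   # rightmost base of the reverse-strand mate

def implied_insert(pair: MatePair) -> int:
    """Fragment length implied by where the two mates landed."""
    return pair.right_end - pair.left_start + 1

def is_consistent(pair: MatePair, mean_insert: float, sd_insert: float, k: float = 3.0) -> bool:
    """True if the implied insert lies within k standard deviations of the library mean."""
    return abs(implied_insert(pair) - mean_insert) <= k * sd_insert

# Hypothetical 3 kb library with a 300 bp standard deviation.
print(is_consistent(MatePair(1000, 4100), 3000, 300))  # True
print(is_consistent(MatePair(1000, 9000), 3000, 300))  # False
```

Pairs that fail such a check can flag misassemblies, or chimeric contigs joining different strains or species.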
Broad-range PCR amplification and sequencing of microbial 16S rRNA genes.
Microbial diversity in environmental samples
Clusters of very closely related sequences at the tips of phylogenetic trees, separated by relatively long branches. Why?
We need to understand the evolutionary and ecological mechanisms that produce this pattern.
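Sequence clusters like these are commonly delineated by grouping sequences above a fixed identity threshold. A minimal greedy, seed-based sketch, assuming the 16S rRNA sequences are already aligned to equal length and using a hypothetical 99% cutoff; the function names are illustrative, and real pipelines rely on dedicated alignment and clustering tools.

```python
def identity(a: str, b: str) -> float:
    """Fraction of aligned columns (excluding gap-gap columns) with identical characters."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    return matches / compared if compared else 0.0

def greedy_cluster(seqs, threshold=0.99):
    """Assign each sequence to the first cluster whose seed it matches at >= threshold;
    otherwise it seeds a new cluster. Returns lists of sequence indices.
    Note: the result depends on input order."""
    seeds, clusters = [], []
    for i, s in enumerate(seqs):
        for c, seed in enumerate(seeds):
            if identity(s, seqs[seed]) >= threshold:
                clusters[c].append(i)
                break
        else:
            seeds.append(i)
            clusters.append([i])
    return clusters

base = "A" * 100
near = "A" * 99 + "C"   # one mismatch: 99% identical to base
far = "C" * 100
print(greedy_cluster([base, near, far]))  # [[0, 1], [2]]
```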
Nature 2004, 430:551
In the microbial world, the ecological and evolutionary significance of this pattern of genetic diversity is not well understood.
- What evolutionary mechanisms drive divergence among clusters of sequences?
- Does every sequence cluster represent an ecologically unique species?
- What types of differential adaptation enable closely related clusters of organisms to coexist within a single environment?
- We need to understand the evolutionary and ecological mechanisms that structure diversity in microorganisms.
Environmental gene tags for functional analysis of complex microbial communities.
Science 2005, 308:554
The predicted metaproteome, based on fragmented
sequence data, is sufficient to identify
functional fingerprints that can provide insight
into the environments from which microbial
communities originate.
The environment-specific distribution of unknown
orthologous groups and operons offers exciting
avenues for further investigation.
Two-way clustering of samples and KEGG maps
How can genomics help?
- Resolving the genetic and metabolic potential of communities
- Establishing how functions are partitioned within and among populations
- Revealing how genetic diversity is created and maintained
- Identifying the primary drivers of genome evolution and speciation
Current topics (i)
- Identification of novel enzymes, antibiotics, and signaling molecules through functional screening
  - Riesenfeld, C.S., Schloss, P.D. and Handelsman, J. (2004) Metagenomics: genomic analysis of microbial communities. Annu Rev Genet, 38, 525-552.
Current topics (ii)
- Identification of the community structure of viral populations
  - Breitbart, M., Salamon, P., Andresen, B., Mahaffy, J.M., Segall, A.M., Mead, D., Azam, F. and Rohwer, F. (2002) Genomic analysis of uncultured marine viral communities. Proc Natl Acad Sci U S A, 99, 14250-14255.
Current topics (iii)
- Identification of the diversity of metabolisms that can be recognized from functional genes
  - Tringe, S.G., von Mering, C., Kobayashi, A., Salamov, A.A., Chen, K., Chang, H.W., Podar, M., Short, J.M., Mathur, E.J., Detter, J.C. et al. (2005) Comparative metagenomics of microbial communities. Science, 308, 554-557.
Current topics (iv)
- Identification of broad estimates of community-level diversity based on sampling statistics of genomic libraries
  - Venter, J.C., Remington, K., Heidelberg, J.F., Halpern, A.L., Rusch, D., Eisen, J.A., Wu, D., Paulsen, I., Nelson, K.E., Nelson, W. et al. (2004) Environmental genome shotgun sequencing of the Sargasso Sea. Science, 304, 66-74.
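One widely used sampling-statistics estimator of richness is Chao1, which extrapolates from the number of taxa seen exactly once (singletons, F1) or exactly twice (doubletons, F2): S_obs + F1^2 / (2 F2), with the bias-corrected form S_obs + F1(F1 - 1) / 2 when no doubletons occur. It is shown here as a named, swapped-in example, not necessarily the estimator used in the cited study; the abundance counts are hypothetical.

```python
from collections import Counter

def chao1(abundances):
    """Chao1 richness estimate from per-taxon read or clone counts (zeros ignored)."""
    counts = [a for a in abundances if a > 0]
    s_obs = len(counts)
    freq = Counter(counts)
    f1, f2 = freq[1], freq[2]  # singletons and doubletons
    if f2 > 0:
        return s_obs + f1 * f1 / (2 * f2)
    # bias-corrected form, used when no doubletons were observed
    return s_obs + f1 * (f1 - 1) / 2

# Hypothetical counts for 7 observed taxa: 3 singletons, 2 doubletons.
print(chao1([5, 3, 1, 1, 1, 2, 2]))  # 9.25
```

Many singletons relative to doubletons signal that much of the community has not yet been sampled.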
Current topics (v)
- Coarse-scale assessments of the types of organisms present within a sample using phylogenetic anchors that are well represented in the public sequence databases
  - DeLong, E.F., Preston, C.M., Mincer, T., Rich, V., Hallam, S.J., Frigaard, N.U., Martinez, A., Sullivan, M.B., Edwards, R., Brito, B.R. et al. (2006) Community genomics among stratified microbial assemblages in the ocean's interior. Science, 311, 496-503.
  - Tringe, S.G., von Mering, C., Kobayashi, A., Salamov, A.A., Chen, K., Chang, H.W., Podar, M., Short, J.M., Mathur, E.J., Detter, J.C. et al. (2005) Comparative metagenomics of microbial communities. Science, 308, 554-557.
  - Venter, J.C., Remington, K., Heidelberg, J.F., Halpern, A.L., Rusch, D., Eisen, J.A., Wu, D., Paulsen, I., Nelson, K.E., Nelson, W. et al. (2004) Environmental genome shotgun sequencing of the Sargasso Sea. Science, 304, 66-74.
Estimating the community sequencing endeavor
- Community diversity
  - Analysis of 16S rRNA gene libraries
  - Relative species richness (number of species) using FISH
  - Relative species evenness (relative abundance of each species)
- Genome size
  - From known sizes of related species, if available
  - Using the average prokaryotic genome size (3.16 ± 1.79 Mb, from 215 genomes)
- Amount of sequencing
  - If the abundance of a given organism is 1%, with a genome size of 3 Mb, then 2.4 Gb of sequence would be required to obtain 8X (near-complete) genome coverage of that organism (8 × 3 Mb / 0.01 = 2.4 Gb).
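The sequencing estimate above is coverage × genome size / relative abundance, treating a species' relative abundance as its share of the library DNA. A minimal sketch; the function names are hypothetical, and the `fraction_covered` helper adds the Lander-Waterman approximation (expected fraction of a genome sequenced at least once at average depth C is 1 - e^(-C)), which is not stated on the slide.

```python
import math

def required_sequence_bp(coverage: float, genome_size_bp: float, rel_abundance: float) -> float:
    """Total shotgun sequence needed for a species at rel_abundance (0-1] to reach the
    requested average coverage, assuming unbiased cloning and sequencing."""
    if not 0 < rel_abundance <= 1:
        raise ValueError("relative abundance must be in (0, 1]")
    return coverage * genome_size_bp / rel_abundance

def fraction_covered(coverage: float) -> float:
    """Lander-Waterman expectation of the fraction of bases sequenced at least once."""
    return 1 - math.exp(-coverage)

total = required_sequence_bp(8, 3e6, 0.01)
print(f"{total / 1e9:.1f} Gb")  # 2.4 Gb, matching the example above
print(f"{fraction_covered(8):.4f} of the genome expected to be covered at 8X")
```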
Calculating how much shotgun sequencing is needed for population genomics
Fig. a shows a rank-abundance curve with the relative distribution of different species in a model community (highly diverse), where a few species are abundant but most are rare. Assuming that there is no bias in cloning, the proportion of the clone library with sequence from each species (P_i) can be calculated as

$$P_i = \frac{G_i A_i}{\sum_j G_j A_j}$$

where G_i is the average genome size of species i and A_i is the relative abundance of that species in the community.
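The library proportion P_i = G_i A_i / sum_j (G_j A_j) is a genome-size-weighted share: at equal cell abundance, a species with a larger genome contributes more DNA to the library. A minimal sketch, assuming no cloning bias; the function name and the example values are hypothetical.

```python
def library_proportions(genome_sizes, abundances):
    """P_i = G_i * A_i / sum_j(G_j * A_j): share of the clone library from species i."""
    if len(genome_sizes) != len(abundances):
        raise ValueError("need one genome size and one abundance per species")
    weights = [g * a for g, a in zip(genome_sizes, abundances)]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical community: a 2 Mb genome at 50% abundance and two 4 Mb genomes at 25%
# each contribute equal amounts of DNA, so each gets one third of the library.
print(library_proportions([2e6, 4e6, 4e6], [0.5, 0.25, 0.25]))
```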
The average depth of coverage for each species from the model community, assuming that every species has the same genome size (2 Mb) and that a total of 100 Mbp is sequenced from the library. The average number of times each base pair will be sequenced (depth of coverage, C_i) is $C_i = P_i T / G_i$, where T is the total number of base pairs sequenced for the library. Numbers above thick bars represent the estimated average coverage of each base pair (i.e. 10X means that, on average, each base pair in the genome was sequenced ten times).
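The depth-of-coverage formula C_i = P_i T / G_i can be sketched with the settings above (every genome 2 Mb, T = 100 Mbp). The snippet recomputes P_i = G_i A_i / sum_j (G_j A_j) so it stands alone; the abundance values are hypothetical, not the data behind the figure. With equal genome sizes and abundances summing to 1, C_i reduces to A_i × 100 Mbp / 2 Mb = 50 A_i.

```python
def coverage_per_species(genome_sizes, abundances, total_bp):
    """C_i = P_i * T / G_i, with P_i = G_i * A_i / sum_j(G_j * A_j)."""
    weights = [g * a for g, a in zip(genome_sizes, abundances)]
    total_w = sum(weights)
    return [(w / total_w) * total_bp / g for w, g in zip(weights, genome_sizes)]

# Hypothetical rank-abundance profile: a few common species, several rare ones.
abund = [0.40, 0.20, 0.10, 0.10, 0.05, 0.05, 0.05, 0.05]
cov = coverage_per_species([2e6] * len(abund), abund, 100e6)
for a, c in zip(abund, cov):
    print(f"abundance {a:.2f} -> {c:.1f}X")
```

Even at 100 Mbp, the rare species in this toy profile stay far below near-complete (8X) coverage.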
How much to sequence
WGS sequencing of microbial communities: a more global view of the communities
With regard to community diversity, the WGS approach is less biased than PCR-based surveys.
Genomic sequence composition (genome signature): each genome has a characteristic composition, e.g. in its oligonucleotide frequencies. The phenomenon is sufficiently pronounced to allow simultaneous supervised or unsupervised discrimination among several different species.
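A common form of genome signature is the frequency profile of tetranucleotides. A minimal sketch, assuming reads or contigs are plain A/C/G/T strings; reverse complements are not merged, and Manhattan distance between profiles is one simple choice among many. The function names and example sequences are hypothetical.

```python
from itertools import product

KMERS = ["".join(p) for p in product("ACGT", repeat=4)]  # all 256 tetranucleotides

def tetra_profile(seq: str) -> list:
    """Relative frequency of each tetranucleotide; windows containing non-ACGT are skipped."""
    counts = dict.fromkeys(KMERS, 0)
    seq = seq.upper()
    total = 0
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if kmer in counts:
            counts[kmer] += 1
            total += 1
    return [counts[k] / total if total else 0.0 for k in KMERS]

def profile_distance(p: list, q: list) -> float:
    """Manhattan distance between two profiles (0 = identical, 2 = no shared tetranucleotides)."""
    return sum(abs(a - b) for a, b in zip(p, q))

at1 = tetra_profile("ATTA" * 50)
at2 = tetra_profile("TAAT" * 50)  # same repeat, shifted phase
gc = tetra_profile("GCGC" * 50)
print(profile_distance(at1, at2), profile_distance(at1, gc))
```

Unsupervised binning would cluster such profiles; supervised binning would compare them with profiles from reference genomes.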
The most interesting computational problems
- Assembling communities
  - How can genomes of low-abundance organisms be assembled?
  - Issues arising from the increased amount of polymorphism and from highly conserved sequences shared between different species.
- Comparative metagenomics
  - Gene finding is a fundamental goal, regardless of whether complete genome sequences can be assembled or not. Each read is likely to contain a significant portion of a gene → different communities can be compared in a gene-centric fashion.
- Phylogeny and community diversity
  - Partial sequences are the crux of the phylogeny problem in the context of metagenomics.
  - Community modeling based on the analysis of assembly data → species abundance curves are not lognormal (Angly et al. 2005; Curtis et al. 2002).
Resolving strain-level heterogeneity
Figure: gene content differences, sequence divergence, multiple strain sequence types, gene insertion, gene rearrangement.
Integrating community genomics and functional assays in situ
Genetic potential of 4 populations. Pyrite dissolution reaction.
Two dominant clonal variants.
doi:10.1038/nature05624
Why do comparative analyses?
- Comparative genomics of coexisting, closely related organisms will enable identification of genes that were acquired after speciation or strain divergence, and might enable identification of the sources of these genes.
- Comparative community genomic analysis in a spatio-temporal context can paint a dynamic portrait of the forces that shape community diversity and stability.
- Galapagos Islands: Darwin's finches
- Lakes of East Africa: cichlid fishes
- Hawaiian Islands: tetragnathid spiders
- Soil bacteria: Pseudomonas fluorescens
Evolution is generally slow in natural ecosystems.
Diversity controlled by predation and immigration history
Evolution occurs more rapidly in microbial systems, allowing controlled experiments that provide insight into how communities develop through evolution or immigration, and into the potential roles of competition and predation in driving the process. The soil bacterium Pseudomonas fluorescens exists as several different forms, or ecomorphs (identifiable genotypes adapted to a particular niche), including SM (smooth), WS (wrinkly spreader) and FS (fuzzy spreader). Meyer and Kassen, and Fukami et al., have used such microbial systems to investigate two factors that help to explain diversity through adaptive radiation: predation and immigration history.
What can I do in my lab?
- Genomic tools for predicting novel functions of complex microbial communities have not existed (Weng et al. 2006. Application of sequence-based methods in human microbial ecology. Genome Research, 16, 316-322).
- Environmental gene tags (EGTs) (Tringe et al. 2005): EGTs that are overrepresented in specific environments indicate functions that are important for survival in that environment (e.g., sodium transporters in seawater).
- However, such fingerprinting is probably limited by our current inability to assign functional roles to a large fraction of the predicted proteins, many of which are lineage- and environment-specific.
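Over-representation of a gene family in one environment can be judged with a 2×2 contingency test. A minimal sketch of a one-sided Fisher's exact test using only the standard library; it is a generic statistical test, not necessarily the procedure of Tringe et al., and the function name and counts are hypothetical.

```python
from math import comb

def fisher_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact p-value that a gene family is enriched in sample 1.

    a: reads from the gene family in sample 1   b: all other reads in sample 1
    c: reads from the gene family in sample 2   d: all other reads in sample 2
    """
    n1 = a + b         # reads in sample 1 (row total)
    k = a + c          # gene-family reads in both samples (column total)
    n = a + b + c + d  # all reads
    # sum hypergeometric probabilities of tables at least as extreme as observed
    hits = sum(comb(k, x) * comb(n - k, n1 - x) for x in range(a, min(k, n1) + 1))
    return hits / comb(n, n1)

# Hypothetical counts: 40 of 10,000 seawater reads vs 5 of 10,000 soil reads
# assigned to a sodium-transporter family.
p = fisher_greater(40, 9960, 5, 9995)
print(f"p = {p:.2e}")
```

When thousands of gene families are tested, the p-values need a multiple-testing correction.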
Painting a dynamic portrait of the forces that shape community diversity and stability
Metagenomics? Community genomics? Ecogenomics?
Diagram: various communities; sampling; WGS sequencing; assembling; gene finding; functional annotation; 1000 completely sequenced microbial genomes; KEGG; COG; bioinformatics tools (e.g. BLAST).
About us: Laboratory of Computational Molecular Biology, http://cmb.bnu.edu.cn
Thanks for your comments and suggestions