An Introduction to Community Genomics - PowerPoint PPT Presentation

About This Presentation
Title:

An Introduction to Community Genomics

Description:

A scientific discipline that studies the structure and functioning of a genome ... Xenopus tropicalis. Gallus gallus. Mus musculus. Pan troglodytes. Homo sapiens ... – PowerPoint PPT presentation

Slides: 80
Provided by: Kui1

Transcript and Presenter's Notes

Title: An Introduction to Community Genomics


1
An Introduction to Community Genomics
  • Laboratory of Computational Molecular Biology
  • College of Life Sciences, Beijing Normal
    University

2
Nico M. van Straalen and Dick Roelofs, Oxford
University Press, 2006
A scientific discipline that studies the
structure and functioning of a genome with the
aim of understanding the relationship between the
organism and its biotic and abiotic environments
3
SCIENCE 315:1781, 30 MARCH 2007
4
References
  • Whitaker, R.J. and Banfield, J.F. (2006) Population
    genomics in natural microbial communities. Trends
    Ecol Evol (TREE). doi:10.1016/j.tree.2006.07.001
  • Allen, E.E. and Banfield, J.F. (2005) Community
    genomics in microbial ecology and evolution. Nat
    Rev Microbiol, 3, 489-498.
  • Chen, K. and Pachter, L. (2005) Bioinformatics for
    whole-genome shotgun sequencing of microbial
    communities. PLoS Comput Biol, 1, e24.
  • Three online papers in PLoS Biology (Vol. 5, March
    2007)

5
The genomics revolution invading ecology
  • 1900: rediscovery of the laws of inheritance
  • 1902: genetics (William Bateson)
  • 1909: gene (Wilhelm Ludvig Johannsen)
  • 1920: genome (Hans Winkler)
  • Mid-1980s: genomics
  • The interaction between genomics and ecology →
    evolutionary and ecological functional genomics,
    focusing on the genes that affect evolutionary
    fitness in natural environments and populations
    (Feder and Mitchell-Olds 2003).

6
Not just a new tool for analyzing fundamental
ecological questions
  • With the new technology
  • New scientific questions emerge
  • Existing questions can be answered in a way that
    was not considered before.
  • Forging a mechanistic basis for ecology that is
    often felt to be missing.
  • Strengthening the relationship between ecology
    and the other life sciences.

7
An important challenge confronting evolutionary
ecologists
  • Genotypes → Phenotypes
  • Understanding how individual traits of organisms
    affect both their interactions with other species
    and the dynamics of the ecosystem community.

8
Now ecologists can perform delicate genetic
manipulations to obtain well-characterized
genotypes, which, together with mechanistic
knowledge of phenotypic plasticity, provide
information on the effects of individual traits
on species interactions in communities.
This field aims to understand the mechanistic basis for
adaptation and phenotypic variation by using
genomic techniques to investigate the mechanistic
and evolutionary basis of species interactions,
and focuses on identifying the genes affected by
evolution.
9
Microbial ecology: the use of genomics approaches
has proceeded further here than in any other
subdiscipline of ecology.
10
Genetics studies genes one by one. Genomics
analyses the genome in its entirety.
Ecological genomics investigates phenotypic
biodiversity as well as diversity in the genome.
11
Internal tangled bank
  • The interactions between the genes within the
    genome and the dynamic character of the genome on
    an evolutionary scale (Dover 1999).
  • It emphasizes the role of genetic turbulence
    (gene duplication, genetic sweeps, exon
    shuffling, transposition, etc.) in the genome and
    illustrates that there is ample scope for
    innovation from within.
  • Tangled bank (Darwin 1859)

12
Evolution through tinkering (François Jacob 1977)
  • These innovations are then checked against the
    external tangled bank, and this constitutes the
    process of evolution.
  • Genetic turbulence leaves many traces in the
    genome that do not have direct negative
    phenotypic consequences; these traces from the
    past provide a valuable historical record for
    genome investigators to discover.

13
Model organism selection?
From an ecologist's point of view, the absence of
reptiles, amphibians, molluscs and annelids is
striking.
1138 prokaryotic genome sequencing projects
(complete: 461; draft assembly: 307; in progress: 370)
257 eukaryotic genome sequencing projects
(complete: 24; draft assembly: 95; in progress: 138)
(NCBI, 2007-02-27, http://www.ncbi.nlm.nih.gov/genomes/static/gpstat.html)
14
Some model species genomes and their genetic
variation
Saccharomyces cerevisiae
Caenorhabditis elegans
Drosophila melanogaster
Takifugu rubripes
Xenopus tropicalis
Gallus gallus
Mus musculus
Pan troglodytes
Homo sapiens
[Figure: genetic variation among populations A, B and C]
They do not match the model organisms favored by
ecologists.
15
Classic molecular phylogenetics is based on
variation of orthologous DNA sequences across
species.
Why can a functional analysis of the genome
provide a different picture from an inventory of
genes?
Almost all cells of an organism have the same
genome, but not the same transcriptome.
16
(No Transcript)
17
Most microorganisms are uncultivated
  • Genome sequence information for isolates from
    phylogenetically diverse lineages has had a
    marked impact on our understanding of microbial
    physiology, biochemistry, genetics, ecology and
    evolution.
  • Various cultivation-independent methodologies
  • 16S rRNA gene clone library collections
  • Group-specific FISH
  • How to link these organisms to their ecological
    roles?

18
Definitions
  • Community genomics: direct genome sequencing of
    communities of microorganisms from environmental
    samples. It emphasizes the sequencing of an
    assemblage of genomes of evolutionarily
    and ecologically distinct organisms that
    co-occur.
  • Population genomics: population genetic analysis
    of individual-level variation across the entire
    genome.

19
Environmental metagenomics
20
(No Transcript)
21
Sequencing an isolated genome
22
  • Two methods for collecting community genome
    sequence data

23
Two methods in metagenomics
24
[Notes garbled in extraction; legible fragments: 250 bp, RNA, PCR, BAC, Sanger, FLX]
25
Two studies at the extremes
  • The acid mine biofilm community (Tyson et al.
    2004, Nature) consists of only four dominant
    species. Shotgun sequencing of 75 Mbp → two
    near-complete genome sequences and detailed
    information about metabolic pathways and
    strain-level polymorphism.
  • The Sargasso Sea community contains more than
    1,800 species (Venter et al. 2004, Science;
    Acinas et al. 2004, Nature). An enormous
    amount of sequencing (1.6 Gbp) → vast amounts
    of previously unknown diversity, including over
    1.2 million new genes, 148 new species, and
    numerous new rhodopsin genes.

26
NATURE VOL 428, 4 MARCH 2004, p. 37
27
(No Transcript)
28
Metabolic pathways
29
(No Transcript)
30
(No Transcript)
31
J. Craig Venter, a Charles Darwin of the 21st
century, head of the J. Craig Venter Institute in
Rockville, Maryland
The survey found so many new microbial species that the
researchers want to redraw the tree of microbial
life.
Global Ocean Sampling: 7.7 million snippets of
sequence in a trio of online papers in PLoS
Biology (13 March).
Samples were taken at 41 locations, isolating and
subsequently freezing bacterium-sized cells. The
researchers also recorded the temperature, salinity, pH,
oxygen concentration, and depth.
(Science, 16 March 2007)
32
(No Transcript)
33
  • Estimating the number of species in the samples
    based on slowly evolving marker genes.
  • more than 400 microbial species new to science
  • more than 100 of those are sufficiently different
    to define new taxonomic families
  • "This is a great milestone event for
    environmental microbiology," says Dawn Field, a
    molecular evolutionary biologist at the Centre
    for Ecology and Hydrology in Oxford, U.K.

34
Paradox of the plankton
The staggering variety of genes may endow each
species with sufficiently different metabolic
tool kits to take advantage of slightly different
combinations of resources, including the waste
products of others, such that they can all
coexist.
35
Venter expects that some of these can be
exploited to develop new synthetic materials,
clean up pollution, or bioengineer fuel
production.
CAMERA (Community Cyberinfrastructure for
Advanced Marine Microbial Ecology Research and
Analysis): hunting for correlations between DNA
sequence and environment for clues about
co-occurring microbes.
36
March 2007 Vol. 5 (3)
37
Nature 3504. 2005
Breitbart et al. (2002) Genomic analysis of
uncultured marine viral communities. PNAS 99,
14250-14255.
38
Marine viromes: the oceans are awash with viruses
Locations: the Sargasso Sea (SAR), Gulf of
Mexico (GOM), British Columbia (BBC), and the
Arctic Ocean.
  • Most viral species are widely dispersed, but
    local environmental conditions dictate which
    species are most common in a particular oceanic
    region (Angly et al. 2006, PLoS Biology).

39
Angly et al. 2006
Metagenomic analyses of 184 viral assemblages
collected over a decade and representing 68 sites
in four major oceanic regions showed that most of
the viral sequences were not similar to those in
the current databases. There was a distinct
marine-ness quality to the viral assemblages.
Global diversity was very high, presumably
several hundred thousand species, and regional
richness varied on a North-South latitudinal
gradient. The marine regions had different
assemblages of viruses. Cyanophages and a
newly discovered clade of single-stranded DNA
phages dominated the Sargasso Sea sample, whereas
prophage-like sequences were most common in the
Arctic.
40
(No Transcript)
41
Human gut microbiome: understanding complex
microbial communities in human ecology
  • Trillions (10^13 to 10^14) of bacteria (microbiota)
    reside in our gastrointestinal tracts, carrying at
    least 100 times as many genes as our own genome.
  • Each person has a unique collection of bacteria,
    and an individual's dominant flora is relatively
    stable over time (assays of fecal samples).

42
Human gut microbes associated with obesity
The composition of the microbial community in the
gut affects the amount of energy extracted from
the diet.
T0, T1, T2, T3: baseline, 12 weeks, 26 weeks,
52 weeks. FAT-R: fat restricted; CARB-R:
carbohydrate restricted. 18,348 16S rRNA
sequences from 12 obese people; Firmicutes and
Bacteroidetes.
  • Correlation between body-weight loss and gut
    microbial ecology (Ley et al. 2006, Nature).

43
Variation in Bacterial Diversity within the Human
Colonic Microbiotas (3 Healthy Humans): fan-like
phylogenetic architecture (why?)
  • 16S rRNA bacterial sequence data set (n = 11,831)
    and alignment of Eckburg et al. (Science 2005).
  • (A), (B), and (C) the portion of the whole tree
    that is contributed by individuals A, B, and C
    from the study. Each tree represents the whole
    data set.
  • Red, blue, and yellow: diversity unique to the
    individual
  • White: portions of the tree that are shared with
    another individual
  • Black: diversity that was not encountered in a
    given individual.
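The colour coding splits one pooled data set into diversity unique to each person, diversity shared with someone else, and diversity missing from that person. A minimal sketch of the same bookkeeping, assuming sequences have already been binned into OTUs; the published figure partitions branch length on a tree, which this set-based version simplifies. The OTU labels and the function name are hypothetical.

```python
def partition_diversity(otus_by_person):
    """Split each person's OTUs into unique, shared and absent sets.

    unique: seen only in this person (red/blue/yellow in the figure)
    shared: also seen in at least one other person (white)
    absent: seen in someone else but not in this person (black)
    """
    all_otus = set().union(*otus_by_person.values())
    result = {}
    for person, otus in otus_by_person.items():
        others = set().union(
            *(o for p, o in otus_by_person.items() if p != person)
        )
        result[person] = {
            "unique": otus - others,
            "shared": otus & others,
            "absent": all_otus - otus,
        }
    return result


# Hypothetical OTU sets for individuals A, B and C.
otus = {
    "A": {"otu1", "otu2", "otu3", "otu4"},
    "B": {"otu3", "otu4", "otu5"},
    "C": {"otu4", "otu6"},
}
for person, parts in partition_diversity(otus).items():
    print(person, {k: sorted(v) for k, v in parts.items()})
```

Counting OTUs rather than branch length is cruder, but it answers the same question: how much of the pooled diversity each individual alone contributes.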

44
Cell 124, 837–848, February 24, 2006
Ecogenomic views of how pathogens arise and
function within our indigenous microbial
communities.
The microbial diversity of the human gut is the
result of coevolution between microbial
communities and their hosts.
45
Two different objectives
  • Metagenomics: establishing gene inventories and
    natural product discovery → the functional and
    sequence-based analysis of the collective
    microbial genomes that are contained in an
    environmental sample.
  • Community genomics: emphasizing the analysis of
    species populations and their interactions,
    recognizing that both species composition and
    interactions change over time, and in response to
    environmental stimuli.

The system under investigation should be sampled
repeatedly.
46
Paired-end sequences: sequences from both ends of
the same cloned genome fragment. The distance
between paired sequences is estimated as
the average insert size for the clone library.
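Paired ends are useful for scaffolding only once the library's insert size is known. A minimal sketch of estimating it, assuming read pairs have already been mapped to assembled contigs in forward/reverse orientation; the tuple layout, the helper name `insert_sizes` and the roughly 40 kb example values are all hypothetical.

```python
from statistics import mean, stdev


def insert_sizes(pairs):
    """Insert sizes for read pairs that map to the same contig.

    Each pair is (contig_a, start_a, contig_b, end_b): the leftmost start of
    the forward read and the rightmost end of the reverse read (0-based,
    end-exclusive). Pairs spanning two contigs are skipped, because their
    separation cannot be measured until the contigs are ordered.
    """
    return [end_b - start_a
            for contig_a, start_a, contig_b, end_b in pairs
            if contig_a == contig_b and end_b > start_a]


# Hypothetical mapped pairs from a large-insert library.
pairs = [
    ("c1", 100, "c1", 40100),
    ("c1", 5000, "c1", 44800),
    ("c2", 300, "c2", 40500),
    ("c2", 900, "c3", 1200),   # spans two contigs: not usable here
]
sizes = insert_sizes(pairs)
print(mean(sizes), round(stdev(sizes)))  # 40000 200
```

The mean (with its spread) is then used as the expected distance for pairs whose reads land on different contigs.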
47
Broad-range PCR amplification and sequencing of
microbial 16S rRNA genes.
48
Microbial diversity in environmental samples
Why do we see clusters of very closely related
sequences at the tips of phylogenetic trees,
separated by relatively long branches?
We need to understand the evolutionary and
ecological mechanisms behind this pattern.
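In practice, "clusters of very closely related sequences" are often delimited by a percent-identity cutoff. The sketch below uses greedy, seed-based clustering at 97% identity. That threshold is a common convention for 16S rRNA OTUs, not something the slide specifies. It assumes the sequences are already aligned to equal length, and the sequences and function names are hypothetical.

```python
def identity(a, b):
    """Fraction of identical positions in two aligned, equal-length sequences.

    Columns where either sequence has a gap ('-') are ignored.
    """
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    return sum(x == y for x, y in columns) / len(columns)


def greedy_otus(seqs, threshold=0.97):
    """Greedy clustering: join the first cluster whose seed is similar enough,
    otherwise the sequence becomes the seed of a new cluster."""
    seeds, clusters = [], []
    for name, seq in seqs:
        for i, seed in enumerate(seeds):
            if identity(seq, seed) >= threshold:
                clusters[i].append(name)
                break
        else:
            seeds.append(seq)
            clusters.append([name])
    return clusters


# Hypothetical 100-bp aligned 16S fragments.
base = "ACGT" * 25
seqs = [
    ("s1", base),
    ("s2", "TT" + base[2:]),        # 2 mismatches vs s1 (98% identity)
    ("s3", "T" * 10 + base[10:]),   # 8 mismatches vs s1 (92% identity)
    ("s4", "T" * 11 + base[11:]),   # 1 mismatch vs s3 (99% identity)
]
print(greedy_otus(seqs))  # [['s1', 's2'], ['s3', 's4']]
```

Greedy clustering is order-dependent, which is one reason the cluster boundaries alone cannot answer whether each cluster is an ecologically distinct species.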
49
NATURE 2004, 430:551
50
In the microbial world, the ecological and
evolutionary significance of this pattern of
genetic diversity is not well understood.
  • What evolutionary mechanisms drive divergence
    among clusters of sequences?
  • Does every sequence cluster represent an
    ecologically unique species?
  • What types of differential adaptations enable
    closely related clusters of organisms to coexist
    within a single environment?
  • We need to understand the evolutionary and
    ecological mechanisms that structure diversity in
    microorganisms.

51
Environmental genomic tags for functional
analysis of complex microbial communities.
52
Science 2005, 308:554
The predicted metaproteome, based on fragmented
sequence data, is sufficient to identify
functional fingerprints that can provide insight
into the environments from which microbial
communities originate.
The environment-specific distribution of unknown
orthologous groups and operons offers exciting
avenues for further investigation.
53
Two-way clustering of samples and KEGG maps
54
(No Transcript)
55
How can genomics help?
  • Resolving the genetic and metabolic potential of
    communities
  • Establishing how functions are partitioned in and
    among populations
  • Revealing how genetic diversity is created and
    maintained
  • Identifying the primary drivers of genome
    evolution and speciation.

56
Current topics (i)
  • Identification of novel enzymes, antibiotics, and
    signaling molecules through functional screening
  • Riesenfeld, C.S., Schloss, P.D. and Handelsman,
    J. (2004) Metagenomics: genomic analysis of
    microbial communities. Annu Rev Genet, 38,
    525-552.

57
Current topics (ii)
  • Identification of the community structure of
    viral populations
  • Breitbart, M., Salamon, P., Andresen, B.,
    Mahaffy, J.M., Segall, A.M., Mead, D., Azam, F.
    and Rohwer, F. (2002) Genomic analysis of
    uncultured marine viral communities. Proc Natl
    Acad Sci U S A, 99, 14250-14255.

58
Current topics (iii)
  • Identification of the diversity of metabolisms
    that can be recognized from functional genes
  • Tringe, S.G., von Mering, C., Kobayashi, A.,
    Salamov, A.A., Chen, K., Chang, H.W., Podar, M.,
    Short, J.M., Mathur, E.J., Detter, J.C. et al.
    (2005) Comparative metagenomics of microbial
    communities. Science, 308, 554-557.

59
Current topics (iv)
  • Identification of broad estimates of
    community-level diversity based on sampling
    statistics of genomic libraries
  • Venter, J.C., Remington, K., Heidelberg, J.F.,
    Halpern, A.L., Rusch, D., Eisen, J.A., Wu, D.,
    Paulsen, I., Nelson, K.E., Nelson, W. et al.
    (2004) Environmental genome shotgun sequencing of
    the Sargasso Sea. Science, 304, 66-74.

60
Current topics (v)
  • Coarse-scale assessments of the types of
    organisms present within a sample using
    phylogenetic anchors that are well represented
    in the public sequence databases.
  • DeLong, E.F., Preston, C.M., Mincer, T., Rich,
    V., Hallam, S.J., Frigaard, N.U., Martinez, A.,
    Sullivan, M.B., Edwards, R., Brito, B.R. et al.
    (2006) Community genomics among stratified
    microbial assemblages in the ocean's interior.
    Science, 311, 496-503.
  • Tringe, S.G., von Mering, C., Kobayashi, A.,
    Salamov, A.A., Chen, K., Chang, H.W., Podar, M.,
    Short, J.M., Mathur, E.J., Detter, J.C. et al.
    (2005) Comparative metagenomics of microbial
    communities. Science, 308, 554-557.
  • Venter, J.C., Remington, K., Heidelberg, J.F.,
    Halpern, A.L., Rusch, D., Eisen, J.A., Wu, D.,
    Paulsen, I., Nelson, K.E., Nelson, W. et al.
    (2004) Environmental genome shotgun sequencing of
    the Sargasso Sea. Science, 304, 66-74.

61
Estimating the community sequencing endeavor
  • Community diversity
  • Analysis of 16S rRNA gene libraries
  • Relative species richness (number of species)
    using FISH
  • Relative species evenness (relative abundance of
    each species)
  • Genome size
  • From known sizes of related species if available
  • Using the average prokaryotic genome size
  • (3.16 ± 1.79 Mb from 215 genomes)
  • Amount of sequencing
  • If the relative abundance of a given organism is
    1%, with a genome size of 3 Mb, then 2.4 Gb of
    sequence would be required to obtain 8X
    (near-complete) genome coverage of that organism.
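The 2.4 Gb figure comes from dividing the bases needed for one genome at the target depth (3 Mb × 8) by that organism's share of the library (1%). A minimal sketch, assuming unbiased cloning and that the organism's share of sequence equals its relative abundance, so all genomes are taken to be of similar size. The function name is hypothetical.

```python
def sequencing_required(genome_size_bp, relative_abundance, coverage):
    """Total bases to sequence so that one community member reaches `coverage`.

    Assumes the member contributes `relative_abundance` of all sequenced
    bases (unbiased cloning, genomes of similar size).
    """
    return genome_size_bp * coverage / relative_abundance


total = sequencing_required(3_000_000, 0.01, 8)
print(f"{total / 1e9:.1f} Gb")  # 2.4 Gb
```

Halving the abundance doubles the required sequencing, which is why rare members dominate the cost of community projects.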

62
Calculating how much shotgun sequencing
is needed for population
genomics
Fig. a shows a rank-abundance curve with the
relative distribution of different species in a
model community (highly diverse), where a few
species are abundant but most are rare. Assuming
that there is no bias in cloning, the proportion
of the clone library with sequence from each
species (Pi) can be calculated as
$P_i = \frac{G_i A_i}{\sum_j G_j A_j}$
where Gi is the average genome size of each
species i and Ai is the relative abundance of
that species in the community.
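Applying the expression for Pi to a toy community shows the genome-size effect: at equal cell abundance, a species with a larger genome contributes proportionally more clones. A minimal sketch; the two-species community and the function name are hypothetical.

```python
def library_proportions(genome_sizes, abundances):
    """Fraction of the clone library expected from each species:
    P_i = G_i * A_i / sum_j(G_j * A_j), assuming unbiased cloning."""
    weights = [g * a for g, a in zip(genome_sizes, abundances)]
    total = sum(weights)
    return [w / total for w in weights]


# Two species at equal cell abundance but with 2 Mb and 6 Mb genomes.
print(library_proportions([2_000_000, 6_000_000], [0.5, 0.5]))  # [0.25, 0.75]
```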
63
The average depth of coverage for each species
from the model community, assuming that every
species has the same genome size (2 Mb) and a total
of 100 Mbp is sequenced from the library. The
average number of times each base pair will be
sequenced (depth of coverage, C) equals
$C_i = P_i T / G_i$, where T is the total number of base pairs
sequenced for the library. Numbers above thick
bars represent estimated average coverage of each
base pair (i.e. 10X means that, on average, each
base pair in the genome was sequenced ten times).
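With every genome set to 2 Mb, Pi reduces to the relative abundance Ai, so each species' depth is simply Ai · T / G. A minimal sketch of that setting (T = 100 Mbp, G = 2 Mb). The abundance values are hypothetical stand-ins for the top of a rank-abundance curve (they need not sum to 1, since the rare tail is omitted), and the function name is also hypothetical.

```python
def coverage_depths(proportions, genome_sizes, total_bp):
    """Average depth C_i = P_i * T / G_i for each species."""
    return [p * total_bp / g for p, g in zip(proportions, genome_sizes)]


# Hypothetical relative abundances; equal genome sizes make P_i equal A_i.
abundances = [0.40, 0.20, 0.10, 0.05, 0.01]
depths = coverage_depths(abundances, [2_000_000] * 5, 100_000_000)
for a, c in zip(abundances, depths):
    print(f"abundance {a:.2f}: {c:.1f}X")
```

A member at 1% abundance gets only about 0.5X from 100 Mbp, far too shallow to assemble, while the dominant species is covered many times over.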
64
How much to sequence
65
WGS sequencing of microbial communities: a more
global view of the communities
66
With regard to community diversity, the WGS
approach is less biased than PCR
67
Genomic sequence composition (genome
signature): short oligonucleotide frequencies are
characteristic of each species. The phenomenon is
sufficiently pronounced to allow the simultaneous
supervised or unsupervised discrimination among
several different species.
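One classic formalization of the genome signature is Karlin's dinucleotide relative abundance, rho*_XY = f*_XY / (f*_X f*_Y), computed on a sequence together with its reverse complement. The distance between two profiles is then delta*, the average absolute difference. The sketch below implements that definition for illustration only: many metagenomic binning methods use tetranucleotide frequencies instead. The example sequences are toy inputs, and the code assumes non-empty DNA strings.

```python
from itertools import product

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")
DINUCS = ["".join(p) for p in product(BASES, repeat=2)]


def _counts(seq):
    """Mono- and dinucleotide counts over seq and its reverse complement."""
    seq = seq.upper()
    strands = [seq, seq.translate(COMPLEMENT)[::-1]]
    mono = dict.fromkeys(BASES, 0)
    di = dict.fromkeys(DINUCS, 0)
    for s in strands:
        for base in s:
            if base in mono:
                mono[base] += 1
        for i in range(len(s) - 1):
            pair = s[i:i + 2]
            if pair in di:  # skips pairs containing N or other symbols
                di[pair] += 1
    return mono, di


def signature(seq):
    """Dinucleotide relative abundance rho*_XY = f*_XY / (f*_X * f*_Y)."""
    mono, di = _counts(seq)
    n_mono, n_di = sum(mono.values()), sum(di.values())
    rho = {}
    for pair in DINUCS:
        fx = mono[pair[0]] / n_mono
        fy = mono[pair[1]] / n_mono
        rho[pair] = (di[pair] / n_di) / (fx * fy) if fx and fy else 0.0
    return rho


def delta(seq_f, seq_g):
    """Average absolute difference between two signatures (delta*)."""
    rf, rg = signature(seq_f), signature(seq_g)
    return sum(abs(rf[p] - rg[p]) for p in DINUCS) / len(DINUCS)


# Toy sequences: one containing CpG steps, one without any.
a = "ACGTTCGAACGT" * 20
b = "ACTTGAAGTCAT" * 20
print(round(delta(a, a), 3), delta(a, b) > 0)
```

Because the counts include the reverse complement, rho*_XY equals rho* of its reverse complement, so reads from either strand of the same genome give the same profile.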
68
The most interesting computational problems
  • Assembling communities
  • How can low-abundance genomes be assembled?
  • Problems caused by increased polymorphism and by
    highly conserved sequences shared between
    different species.
  • Comparative metagenomics
  • Gene finding is a fundamental goal, regardless of
    whether complete genome sequences can be
    assembled or not. Each read is likely to contain
    a significant portion of a gene → different
    communities can be compared in a gene-centric fashion.
  • Phylogeny and community diversity
  • Partial sequences are the crux of the phylogeny
    problem in the context of metagenomics
  • Community modeling based on the analysis of
    assembly data → species abundance curves are not
    lognormal (Angly et al. 2005; Curtis et al. 2002)

69
Resolving strain-level heterogeneity.
[Figure panels: gene content difference; sequence divergence; multiple strain sequence types; gene insertion; gene rearrangement]
70
(No Transcript)
71
Integrating community genomics and functional
assays in situ.
Genetic potential of 4 populations. Pyrite
dissolution reaction.
Two dominant clonal variants
72
doi:10.1038/nature05624
73
Why do comparative analyses?
  • Comparative genomics of coexisting, closely
    related organisms will enable identification of
    genes that were acquired after speciation or
    strain divergence, and might enable
    identification of the sources of these genes.
  • Comparative community genomic analysis in a
    spatio-temporal context can paint a dynamic
    portrait of the forces that shape community
    diversity and stability.

74
Galápagos Islands: Darwin's finches; lakes of East
Africa: cichlid fishes; Hawaiian Islands:
tetragnathid spiders; soil bacteria: Pseudomonas
fluorescens. Evolution is generally slow in
natural ecosystems.
75
Diversity controlled by predation and immigration
history
Evolution occurs more rapidly in microbial
systems, allowing controlled experiments that
provide insight into how communities develop
through evolution or immigration, and the
potential roles of competition and predation in
driving the process. The soil bacterium Pseudomonas
fluorescens exists as several different forms, or
ecomorphs: identifiable genotypes adapted to a
particular niche, including SM (smooth), WS
(wrinkly spreader) and FS (fuzzy spreader). Meyer
and Kassen, and Fukami et al., have used such
microbial systems to investigate two factors that
help to explain diversity through adaptive
radiation: predation and immigration history.
76
What can I do in my lab?
  • Genomic tools for predicting novel functions of
    complex microbial communities have not existed
    (Weng et al. 2006, Genome Research 16:316.
    Application of sequence-based methods in human
    microbial ecology).
  • Environmental gene tags (EGTs) (Tringe et al.
    2005): analysis of EGTs overrepresented in
    specific environments indicates they perform
    functions important for survival in that
    environment (e.g., sodium transporters in
    seawater).
  • However, such fingerprinting is probably limited
    by our current inability to assign functional
    roles to a large fraction of the predicted
    proteins, many of which are lineage- and
    environment-specific.
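The EGT idea rests on comparing how often a gene family turns up in one environment versus another. A minimal sketch of that comparison, computed as a log2 ratio of relative frequencies with a pseudocount. This is a simplified illustration, not the statistical procedure used by Tringe et al., and the gene-family names, counts and function name are hypothetical.

```python
import math


def log2_enrichment(counts_env, counts_ref, pseudocount=1.0):
    """log2 ratio of each gene family's relative frequency in a sample
    versus a reference sample; the pseudocount avoids log(0)."""
    families = sorted(set(counts_env) | set(counts_ref))
    total_env = sum(counts_env.values()) + pseudocount * len(families)
    total_ref = sum(counts_ref.values()) + pseudocount * len(families)
    scores = {}
    for fam in families:
        f_env = (counts_env.get(fam, 0) + pseudocount) / total_env
        f_ref = (counts_ref.get(fam, 0) + pseudocount) / total_ref
        scores[fam] = math.log2(f_env / f_ref)
    return scores


# Hypothetical gene-family hit counts from two metagenomes.
seawater = {"sodium_transporter": 40, "ribosomal_protein": 100, "cellulase": 2}
soil = {"sodium_transporter": 5, "ribosomal_protein": 100, "cellulase": 30}
for fam, score in sorted(log2_enrichment(seawater, soil).items(),
                         key=lambda kv: kv[1], reverse=True):
    print(f"{fam}: {score:+.2f}")
```

The caveat on the slide applies directly: a family can only be scored if reads were assigned to it, so the many unannotated, environment-specific proteins drop out of such a fingerprint.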

77
Painting a dynamic portrait of the forces that
shape community diversity and stability
Metagenomics? Community genomics? Ecogenomics?
[Diagram: various communities; sampling; WGS sequencing; assembling; gene finding; functional annotation; 1000 completely sequenced microbial genomes; KEGG; COG; bioinformatics tools, e.g., BLAST]
78
About us: http://cmb.bnu.edu.cn, Laboratory
of Computational Molecular Biology
79
Thanks for your comments and suggestions