Title: Computational Characterization of Short Environmental DNA Fragments
1Computational Characterization of Short
Environmental DNA Fragments
- Jens Stoye1, Lutz Krause1, Robert A. Edwards2, Forest Rohwer2,
Naryttza N. Diaz1, Alexander Goesmann1, Scott Kelley2, Alfred Pühler1
1Bielefeld University 2San Diego State University
3Metagenomics
4CARMA - A Pipeline for Characterizing Short-Read
Metagenomes
- Quantitative analysis of metagenomes
- Which microbes live in an environment?
- What are they doing?
- What are the differences between communities from
different environments?
5(A) Functional Analysis
- Reads directly analyzed without prior assembly
- Protein family fragments used as Environmental
Gene Tags (EGTs) for quantitative analysis of
gene content
- GO-term profiles characterize genetic diversity
and potential metabolism of underlying communities
6Heat Map Comparing GO-Term Frequencies
- Comparative analysis reveals genetic and
metabolic trends
- Significantly overrepresented GO-terms identified
with G-test
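A G-test for a single GO-term in two samples could look like the sketch below. It is a minimal illustration on a 2x2 table (EGTs with and without the term, sample A versus sample B), not the CARMA implementation; the function name `g_test_2x2` and its arguments are hypothetical.

```python
import math

def g_test_2x2(with_a, total_a, with_b, total_b):
    """G-test of independence for one GO-term in two samples.

    with_x:  number of EGTs in sample x annotated with the GO-term
    total_x: total number of EGTs in sample x
    Returns (G, p); p is taken from the chi-square distribution with
    1 degree of freedom. Hypothetical helper, for illustration only.
    """
    observed = [
        [with_a, total_a - with_a],
        [with_b, total_b - with_b],
    ]
    row_sums = [sum(row) for row in observed]
    col_sums = [sum(col) for col in zip(*observed)]
    grand = sum(row_sums)

    g = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            e = row_sums[i] * col_sums[j] / grand  # expected count
            if o > 0:  # a zero cell contributes nothing to the sum
                g += o * math.log(o / e)
    g = max(2.0 * g, 0.0)  # guard against tiny negative rounding errors

    # Chi-square survival function for 1 df: P(X > g) = erfc(sqrt(g / 2))
    p = math.erfc(math.sqrt(g / 2.0))
    return g, p
```

A GO-term would then be reported when `p` falls below the chosen threshold. When many GO-terms are tested at once, a multiple-testing correction is usually advisable; the slides do not say whether one was applied.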
7(B) Analyzing the Community Structure
- EGTs assigned to taxonomic groups based on a
phylogenetic analysis
- Taxonomic profiles characterize the composition
of the underlying communities
8Taxonomic Classification of Short Environmental
Gene Tags
Phylogenetic tree reconstructed for each matching
Pfam family
Multiple alignment of known family
members (downloaded from Pfam web site)
Krause et al., submitted
9Taxonomic Classification of Short Environmental
Gene Tags
Phylogenetic tree reconstructed for each matching
Pfam family
PF1-PF7: known family members
EGT1-EGT3: Environmental Gene Tags matching family
Identified EGTs matching family added to full
multiple alignment
Krause et al., submitted
10Taxonomic Classification of Short Environmental
Gene Tags
Phylogenetic tree reconstructed for each matching
Pfam family
- Multiple alignment used to calculate distance
matrix
- Pairwise distance: based on sequence identity in
aligned region
- Missing values determined with additive
estimation (Landry et al., 1996)
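The distance step can be sketched as follows. This is a minimal illustration, assuming that the distance is taken as 1 minus the fraction of identical residues over the columns where both sequences are aligned (the slide only says the distance is derived from sequence identity). Pairs without any shared column, for example two EGTs covering different parts of the family, are returned as `None`; the additive estimation of Landry et al. that fills these gaps is not implemented here. All function names are hypothetical.

```python
def identity_distance(seq_a, seq_b, gap="-"):
    """Distance between two rows of a multiple alignment.

    Assumption: distance = 1 - sequence identity, computed only over
    columns where both sequences carry a residue. Returns None when the
    two sequences share no aligned column (a missing value).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    shared = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not shared:
        return None
    matches = sum(1 for a, b in shared if a == b)
    return 1.0 - matches / len(shared)


def distance_matrix(alignment):
    """Build a full pairwise matrix from {name: aligned_sequence}."""
    names = list(alignment)
    return {
        (x, y): identity_distance(alignment[x], alignment[y])
        for x in names
        for y in names
    }
```

For instance, with `{"PF1": "ACDEFG", "EGT1": "ACD---", "EGT2": "---EYG"}` the two EGTs do not overlap, so their entry is a missing value that would have to be estimated before tree reconstruction.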
11Taxonomic Classification of Short Environmental
Gene Tags
- Distance matrix used to reconstruct phylogenetic
tree (with Neighbor Joining)
- EGTs classified based on their location in tree
Krause et al., submitted
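For readers unfamiliar with Neighbor Joining, the tree reconstruction step can be sketched as below. This is a textbook version of the algorithm on a complete distance matrix, not the code used in CARMA; the function name, the `frozenset`-keyed input format, and the internal node labels `node0`, `node1`, ... are choices made for this illustration (leaf labels are assumed not to collide with them).

```python
def neighbor_joining(names, dist):
    """Neighbor Joining on a complete symmetric distance matrix.

    names: list of leaf labels
    dist:  dict mapping frozenset({a, b}) -> distance
    Returns (joins, last_edge): joins is a list of
    (child_1, child_2, length_1, length_2, new_node) and last_edge is
    (node_1, node_2, length) for the two nodes left at the end.
    """
    if len(names) < 2:
        raise ValueError("need at least two sequences")
    dist = dict(dist)  # work on a copy
    nodes = list(names)
    joins = []
    next_id = 0

    def d(a, b):
        return 0.0 if a == b else dist[frozenset((a, b))]

    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(d(a, b) for b in nodes) for a in nodes}
        # Choose the pair minimising Q(a, b) = (n - 2) d(a, b) - r_a - r_b
        best = None
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                q = (n - 2) * d(a, b) - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        u = f"node{next_id}"
        next_id += 1
        # Branch lengths from the joined pair to the new internal node
        len_a = 0.5 * d(a, b) + (r[a] - r[b]) / (2 * (n - 2))
        len_b = d(a, b) - len_a
        joins.append((a, b, len_a, len_b, u))
        # Distances from the new node to every remaining node
        for c in nodes:
            if c not in (a, b):
                dist[frozenset((u, c))] = 0.5 * (d(a, c) + d(b, c) - d(a, b))
        nodes = [c for c in nodes if c not in (a, b)] + [u]

    a, b = nodes
    return joins, (a, b, d(a, b))
```

The sketch stops at the tree itself. How an EGT is then assigned to a taxon from its position among the known family members is not detailed on the slides and is therefore not shown.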
12Performance Evaluation: Creating Standard of Truth
- Test set: 77 complete genomes
- 2 Superkingdoms (Archaea and Bacteria)
- 10 Phyla
- 29 Classes
- 62 Genera
- 77 Species
- Test set excluded from reference set
(Pfam members from any of the 77 species omitted
from full multiple alignments)
13Performance Evaluation: Creating Standard of Truth
- 77 genomes fragmentized with ReadSim (Schmid et
al., submitted)
- Simulates sequencing using 454 pyrosequencing
- Fragments randomly sampled (2x)
- Fragment length 80-120 bp, mean 100 bp
- Simulates sequencing errors at homopolymers
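The sampling part of this setup can be illustrated with a short sketch. It is not ReadSim and does not reproduce its interface. It assumes that "2x" means two-fold coverage and that fragment lengths are drawn uniformly from 80 to 120 bp (which gives a mean of 100 bp); homopolymer errors are left out. The function name and parameters are hypothetical.

```python
import random

def sample_fragments(genome, coverage=2.0, min_len=80, max_len=120, seed=0):
    """Randomly sample error-free fragments from a genome string.

    Assumptions (not taken from ReadSim): uniform start positions,
    uniform fragment lengths in [min_len, max_len], and sampling until
    the summed fragment length reaches coverage * genome length.
    """
    if len(genome) < max_len:
        raise ValueError("genome shorter than the maximum fragment length")
    rng = random.Random(seed)  # seeded for reproducible test sets
    target = coverage * len(genome)
    fragments = []
    sampled_bases = 0
    while sampled_bases < target:
        length = rng.randint(min_len, max_len)  # both bounds inclusive
        start = rng.randint(0, len(genome) - length)
        fragments.append(genome[start:start + length])
        sampled_bases += length
    return fragments
```

Because each fragment's genome of origin is known, the taxonomic label of every simulated read is known as well, which is what makes the set usable as a standard of truth.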
14Classification Accuracy for Short Environmental
Gene Tags
Sens: sensitivity, fraction of correctly classified EGTs
Spec: specificity, reliability of predictions
FNrate: false negative rate, proportion of wrongly classified EGTs
Urate: unknown rate, proportion of EGTs not assigned to
any taxonomic group
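Read literally, these four measures can be computed from a list of true and predicted taxa as sketched below. The sketch assumes that Sens, FNrate and Urate are all taken relative to the total number of EGTs, and that Spec ("reliability of predictions") is the fraction of assigned EGTs that were assigned correctly; the exact formulas used in the evaluation are not given on the slide. The function name is hypothetical.

```python
def classification_rates(true_taxa, predicted_taxa):
    """Accuracy measures for one taxonomic rank.

    predicted_taxa uses None for EGTs that were not assigned to any
    taxonomic group. Spec is computed as correct / assigned, which is
    an assumption about what "reliability of predictions" means.
    """
    if len(true_taxa) != len(predicted_taxa):
        raise ValueError("both lists must have the same length")
    if not true_taxa:
        raise ValueError("no EGTs given")

    correct = wrong = unknown = 0
    for truth, predicted in zip(true_taxa, predicted_taxa):
        if predicted is None:
            unknown += 1
        elif predicted == truth:
            correct += 1
        else:
            wrong += 1

    total = len(true_taxa)
    assigned = correct + wrong
    return {
        "Sens": correct / total,
        "Spec": correct / assigned if assigned else None,
        "FNrate": wrong / total,
        "Urate": unknown / total,
    }
```

Under these assumptions Sens, FNrate and Urate sum to 1, so a classifier can trade wrong assignments for unassigned EGTs without changing its sensitivity.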
15Application Example: Comparative Analysis of Four
Microbial Coral Reef Communities
- In cooperation with Rob Edwards and Forest Rohwer
(San Diego State University, California)
Dinsdale et al., submitted
16Influence of Human Activities on Coral Reef
Microbial Communities
Kingman
Palmyra
Tabuaeran
Kiritimati
17GO-Term Profiles Indicate Transition in Metabolic
Activities
Significantly different (p < 0.01)
Color indicates abundance of GO-terms in each
sample
18Community Structure
19Taxonomic Profiles Indicate Transition from
Prochlorococcus to Synechococcus
(most abundant marine Cyanobacteria)
20Application Example: Comparative Analysis of
Three Aquatic Microbial Communities
L. Krause, N. N. Diaz, A. Goesmann, F. Rohwer, S.
Kelley, R. A. Edwards and J. Stoye. Taxonomic
classification of short environmental DNA
fragments. submitted
21Sampling Locations
Rios Mesquites stromatolites, Mexico
Kingman coral reef, Northern Line Islands
San Diego solar salterns, USA
Sample data provided by Forest Rohwer and Robert
Edwards
22Community Structure
pEGTs: prokaryotic fraction of EGTs
23Community Structure
Genus
pEGTs: prokaryotic fraction of EGTs
24Taxonomic Diversity
H': diversity, including richness and evenness
(Shannon index)
J: evenness, relative commonness and rarity of
organisms
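Both measures can be computed directly from the taxonomic assignments of the EGTs. The sketch below uses the standard Shannon index and the standard evenness J = H' / ln(number of taxa), with natural logarithms; the logarithm base and the function name are assumptions, since the slides do not state them.

```python
import math
from collections import Counter

def shannon_diversity(assignments):
    """Return (H', J) for a list of taxon labels, one per classified EGT.

    H' is the Shannon index with natural logarithms (assumed base).
    J is H' divided by ln(number of taxa found); it is undefined for a
    single taxon, in which case None is returned for J.
    """
    counts = Counter(assignments)
    if not counts:
        raise ValueError("no classified EGTs given")
    total = sum(counts.values())
    proportions = [c / total for c in counts.values()]
    h = -sum(p * math.log(p) for p in proportions)
    h_max = math.log(len(counts))
    j = h / h_max if h_max > 0 else None
    return h, j
```

Two equally abundant taxa give H' = ln 2 and J = 1; the more one taxon dominates, the further J drops toward 0.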
25Further Applications of CARMA
- Diversity of coral reef viruses
(in cooperation with Stuart Sandin, Scripps
Institution of Oceanography, San Diego, USA)
- Waste water treatment plant plasmid sample
(in cooperation with Andreas Schlüter,
Bielefeld University)
26Conclusions
- Gene fragments identified using Pfam profile
hidden Markov models
- Fragments can be assigned to functional role and
taxonomic origin
- Profiling allows detection of trends in species
composition, metabolism, and genetic potential
- Pyrosequencing combined with profiling techniques
enables rapid and cost-effective assay of
microbial communities
27Acknowledgements
- Co-authors: Lutz Krause1, Robert A. Edwards2, Forest
Rohwer2, Naryttza N. Diaz1, Alexander
Goesmann1, Scott Kelley2, and Alfred Pühler1
- Also many thanks to Andreas Schlüter1, Elisabeth Dinsdale2, Scott
Kelley2, Beltran Rodriguez-Brito2, and Christelle
Desnues2
1Bielefeld University 2San Diego State University
28- Thank you for your attention!!!!
29Taxonomic Diversity
p_i: proportion of EGTs classified into i-th
taxonomic group
Hmax: maximum possible diversity, given the total number of
taxa found
Diversity (H')
Evenness (J)
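The formulas shown on this slide did not survive as text. Assuming the slide used the standard Shannon index and the standard evenness measure, they would read as follows, where S (a symbol introduced here) is the total number of taxa found and the natural logarithm is an assumed choice of base:

```latex
H' = -\sum_{i=1}^{S} p_i \ln p_i,
\qquad
H_{\max} = \ln S,
\qquad
J = \frac{H'}{H_{\max}}
```

J ranges from 0 to 1 and equals 1 when all S taxa are equally abundant.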