Title: Methods
1The Sorcerer II Global Ocean Sampling Expedition:
Metagenomic Characterization of Viruses within
Aquatic Microbial Samples
Shannon J. Williamson, Douglas B. Rusch, Shibu Yooseph,
Aaron L. Halpern, Karla B. Heidelberg, John I. Glass,
Cynthia Andrews-Pfannkoch, Douglas Fadrosh,
Christopher S. Miller, Granger Sutton, Marvin Frazier,
J. Craig Venter
2Why so hard for Classical Marine Biologists and
Microbiologists?
- Enormous number and diversity of microorganisms,
- Difficult to culture and study in the lab, etc.
- Venter's answers:
- Whole Genome Shotgun Sequencing,
- Computationally derived metabolisms
- http://biocyc.org/
3http://camera.calit2.net/metagenomics/what-is-metagenomics.php
4(No Transcript)
5Discovery: 1,500 liters of water
- 1.045 x 10^9 new bp of non-redundant sequence,
- >1,800 new species,
- 148 new bacterial phylotypes,
- 1.2 x 10^6 new protein sequences,
- 70,000 novel (no match in the database),
- etc.
"We chose the Sargasso Sea because it was
supposed to be a marine desert," says Venter wryly. "The
assumption was low diversity there because of the
extremely low nutrients." - Venter
(Bio-IT World, 4/16/04)
6Figure 1
S1
7(No Transcript)
8Points to Ponder
- What about unitigs, and the assembly of an
environmental sample?
- All data went to GenBank.
9Figure 2
10Figure 3
11Figure 5
12Close-up
13Figure 6
14(No Transcript)
15(No Transcript)
16The Sorcerer II Global Ocean Sampling Expedition:
Metagenomic Characterization of Viruses within
Aquatic Microbial Samples
Shannon J. Williamson, Douglas B. Rusch, Shibu Yooseph,
Aaron L. Halpern, Karla B. Heidelberg, John I. Glass,
Cynthia Andrews-Pfannkoch, Douglas Fadrosh,
Christopher S. Miller, Granger Sutton, Marvin Frazier,
J. Craig Venter
17Detection of Viruses in the Ocean: Problems
- Large viruses (0.1-0.22 µm) get caught in the
filters because of their size and geometric
shape,
- Small free-living phages flow through the filter,
- When filtering large volumes, biomass
accumulates on the filter and viruses get caught,
- Most viruses found within the aquatic microbial
communities studied seemed to be in the lytic
infection cycle.
18(No Transcript)
19Methods
First
- Cruise the world
- Collect 90-200 L of seawater from each of 37 different stations
- Record pH, salinity, temperature, etc. of the water
20Methods
- Pass water through 2.0, 0.8, and 0.1 µm filters,
- Store at -20 °C until shipment from the next port.
21Sequencing Preparation
- Extract DNA
- Nebulize DNA (fragments averaging 1.0-2.2 kb)
- Gel electrophoresis extraction: purify and determine lengths
- Subclone into E. coli
- Colonies selected for inserts
- Shotgun sequence inserts
22Sequencing
- End sequence each insert
- Average of 822 bp sequenced per end
www.pasteur.fr/recherche/genopole/PF8/equipement_en.html
23Metagenomic Assembly
- Same procedure as in humans, Drosophila, dogs,
etc.
Unitigs using 98% or 94% homology for overlap
Scaffolding
Consensus sequence
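As a rough illustration of the overlap criterion on this slide, the sketch below computes percent identity across a gap-free suffix-prefix overlap of two reads and tests it against a 98% or 94% cutoff. The function names, the toy reads, and the fixed overlap length are assumptions made for illustration; a real assembler uses gapped alignment and error models that are not described in these slides.

```python
def overlap_identity(read_a: str, read_b: str, overlap_len: int) -> float:
    """Fraction of matching bases between the last overlap_len bases of
    read_a and the first overlap_len bases of read_b (no gaps allowed)."""
    if overlap_len <= 0 or overlap_len > min(len(read_a), len(read_b)):
        raise ValueError("overlap_len must fit inside both reads")
    suffix = read_a[-overlap_len:]
    prefix = read_b[:overlap_len]
    matches = sum(1 for x, y in zip(suffix, prefix) if x == y)
    return matches / overlap_len


def accepts_overlap(read_a: str, read_b: str, overlap_len: int,
                    min_identity: float) -> bool:
    """True if the overlap meets the identity cutoff (e.g. 0.98 or 0.94)."""
    return overlap_identity(read_a, read_b, overlap_len) >= min_identity


# Hypothetical toy reads: a 50-base overlap carrying 2 mismatches.
core = "ACGTACGTAC" * 5
mutated = "C" + core[1:10] + "C" + core[11:]   # substitutions at positions 0 and 10
read_a = "TTTT" + core
read_b = mutated + "GGGG"

identity = overlap_identity(read_a, read_b, 50)
print(f"identity = {identity:.2f}")
print("accepted at 94%:", accepts_overlap(read_a, read_b, 50, 0.94))
print("accepted at 98%:", accepts_overlap(read_a, read_b, 50, 0.98))
```

The same pair of reads joins into a unitig under the looser cutoff but not under the stricter one, which is why assemblies built at the two stringencies can differ.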
24Metagenomic Assembly
- New uses for shotgun sequencing and assembly
- Multiple organisms at once,
- Likely novel organisms.
Problems?
- Mate-pair data relied on more heavily, since
overlap coverage is low or unknown,
- Need verification of assembly somehow?
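One simple way to picture how mate-pair data can help verify an assembly is a consistency check: the two end reads of one insert should face each other and lie roughly one insert length apart on the scaffold. The sketch below is only that idea; the `PlacedRead` class, the coordinates, and the 1,000-2,200 bp window (borrowed from the fragment sizes on the sequencing-preparation slide) are assumptions, not the procedure used in the study.

```python
from dataclasses import dataclass


@dataclass
class PlacedRead:
    start: int      # leftmost coordinate on the scaffold
    end: int        # rightmost coordinate (exclusive)
    forward: bool   # True if the read aligns to the forward strand


def mates_consistent(a: PlacedRead, b: PlacedRead,
                     min_insert: int = 1000, max_insert: int = 2200) -> bool:
    """True if the two mates point toward each other and the distance they
    span falls inside the assumed insert-size window."""
    left, right = (a, b) if a.start <= b.start else (b, a)
    if not (left.forward and not right.forward):
        return False
    span = right.end - left.start
    return min_insert <= span <= max_insert


# Hypothetical placements of 822 bp end reads.
good_pair = (PlacedRead(100, 922, True), PlacedRead(1200, 2022, False))
same_strand = (PlacedRead(100, 922, True), PlacedRead(1200, 2022, True))
too_far = (PlacedRead(100, 922, True), PlacedRead(5000, 5822, False))

print(mates_consistent(*good_pair))
print(mates_consistent(*same_strand))
print(mates_consistent(*too_far))
```

A scaffold on which many mate pairs fail such a check would be a candidate for misassembly.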
25Metagenomic Assembly
- Created multiple distinct assemblies
- 98% and 94% homology unitigs
- non-preassembled end-pairs at various
stringencies for multiple sequence alignments
- Multiple assemblies allowed cross-referencing and
quality assurance.
26Taxonomic Assignment
- Protein-ORF based strategy
- 5.6 million sequences from GOS
- All ORFs in the same sequence scaffold compared to the
NCBI protein database using BLAST
- Votes tallied from each ORF into pools for the
scaffold:
- Archaea, Bacteria, Eukaryota, Viral
- 5.0 million sequences assigned using this
method
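The voting step can be sketched as a simple tally: each ORF on a scaffold contributes one vote for the kingdom of its best database hit, and the scaffold takes the winning label. The `assign_scaffold` function and the strict-majority rule (`min_fraction`) are assumptions for illustration; the slides do not state the exact decision rule.

```python
from collections import Counter

KINGDOMS = ("Archaea", "Bacteria", "Eukaryota", "Viral")


def assign_scaffold(orf_votes, min_fraction=0.5):
    """Return the kingdom holding more than min_fraction of the ORF votes
    on one scaffold, or None if there is no clear winner.

    orf_votes: one kingdom label per ORF that had a database hit.
    """
    tally = Counter(v for v in orf_votes if v in KINGDOMS)
    total = sum(tally.values())
    if total == 0:
        return None
    kingdom, count = tally.most_common(1)[0]
    return kingdom if count / total > min_fraction else None


# Hypothetical scaffolds and their per-ORF votes.
print(assign_scaffold(["Viral", "Viral", "Bacteria", "Viral"]))
print(assign_scaffold(["Bacteria", "Viral"]))
print(assign_scaffold([]))
```

Requiring a strict majority leaves evenly split scaffolds unassigned, which is one plausible reason some sequences receive no assignment.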
27Quantitative PCR
- Quantifying genes in environmental samples
- from station to station?
- versus one another?
http://www.invitrogen.com/content.cfm?pageid=10037
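One common way to quantify a gene by qPCR is a standard curve: threshold cycles (Ct) measured for known template amounts are fitted to Ct = slope * log10(copies) + intercept, and the fit is inverted for unknown samples. The sketch below shows that calculation only; the standards, Ct values, and function names are hypothetical, and the slides do not say which quantification method the study used.

```python
import math


def fit_standard_curve(copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_copies(ct, slope, intercept):
    """Invert the standard curve to get a starting copy number."""
    return 10 ** ((ct - intercept) / slope)


def amplification_efficiency(slope):
    """Efficiency implied by the slope; 1.0 means perfect doubling per cycle."""
    return 10 ** (-1 / slope) - 1


# Hypothetical dilution series (copies per reaction) and measured Ct values.
standards = [1e2, 1e3, 1e4, 1e5, 1e6]
measured_ct = [33.36, 30.04, 26.72, 23.40, 20.08]

slope, intercept = fit_standard_curve(standards, measured_ct)
unknown = estimate_copies(28.38, slope, intercept)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
print(f"efficiency={amplification_efficiency(slope):.3f}")
print(f"estimated copies={unknown:.0f}")
```

With copy numbers in hand for each station, the same gene can be compared from station to station, or different genes compared with one another.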
28(No Transcript)
29Clustering and Phylogeny
- Genes clustered and compared to NCBI
- Sequence alignments, not just domains
- Phylogenetic trees generated
- Multiple sequence alignments: CLUSTALW
- Used only long, fairly homologous samples
- PHYLIP used to build trees
- Based on distance matrix
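To make the matrix behind distance-based tree building concrete, the sketch below computes the simplest pairwise measure, the proportion of differing sites (p-distance), from sequences that are already aligned. The sequence names and toy alignment are hypothetical, and this is not the model the tree-building software applied in the study; it only shows what such a matrix contains.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences,
    skipping columns where either sequence has a gap ('-')."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differences / compared


def distance_matrix(alignment: dict) -> dict:
    """All-against-all p-distances, keyed by (name_i, name_j)."""
    names = list(alignment)
    return {(i, j): p_distance(alignment[i], alignment[j])
            for i in names for j in names}


# Hypothetical 10-column alignment.
alignment = {
    "gos_viral":    "ATGGCTAAAG",
    "public_viral": "ATGGCTAAGG",
    "public_host":  "ATGA-TAACG",
}

matrix = distance_matrix(alignment)
for (i, j), d in matrix.items():
    if i < j:
        print(f"{i} vs {j}: {d:.3f}")
```

A tree-building method then groups the sequences so that path lengths on the tree approximate these pairwise values.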
30Phylogenetic Analyses
Figure 2. Phylogenetic trees of all GOS and
publicly available psbA (A) and psbD (B) sequences.
BS indicates bootstrap values. GOS and public
viral sequences are colored aqua and pink,
respectively. GOS and public prokaryotic
sequences are navy blue and lime green,
respectively. doi:10.1371/journal.pone.0001456.g002
31Figure 3. Phylogenetic trees of all GOS and
publicly available pstS (A) and talC (B) sequences.
BS indicates bootstrap values. GOS and public
viral sequences are colored aqua and pink,
respectively. GOS and public prokaryotic
sequences are navy blue and lime green,
respectively. GOS eukaryotic sequences are
colored yellow. doi:10.1371/journal.pone.0001456.g003
32Identification of Viral Sequences
- Data from the microbial fraction of water samples were
examined
- Looked for viral sequences by comparison to the
NCBI non-redundant protein database
- 154,662 viral peptide sequences were identified
- Approximately 3% of predicted proteins were
identified as viral sequences
- Number of viral sequences thought to be largely
underestimated
33Classification through Protein Clustering
- Of 154,662 viral peptide sequences, 117,123 (76%)
fell within 380 protein clusters containing
at least 20 proteins
- Remaining sequences fell within clusters
containing fewer than 20 proteins
- Average cluster contained 258 peptide
sequences
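The share of sequences in large clusters is a one-line calculation, shown here first with the two counts from the slide and then as a small helper that works from a list of cluster sizes. The helper name and the toy cluster sizes are assumptions for illustration.

```python
def fraction_in_large_clusters(cluster_sizes, min_size=20):
    """Fraction of all clustered sequences that sit in clusters with at
    least min_size members."""
    total = sum(cluster_sizes)
    if total == 0:
        raise ValueError("no sequences to summarize")
    in_large = sum(size for size in cluster_sizes if size >= min_size)
    return in_large / total


# Counts taken from the slide: 117,123 of 154,662 viral peptide sequences.
reported_fraction = 117_123 / 154_662
print(f"{reported_fraction:.1%}")

# Hypothetical cluster sizes, only to exercise the helper.
toy_sizes = [40, 25, 19, 10, 6]
print(fraction_in_large_clusters(toy_sizes))
```

The first figure rounds to the 76% quoted on the slide.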
34(No Transcript)
35(No Transcript)
36(No Transcript)
37All viral gene families were positively
correlated with water temperature.
Some viral gene families were correlated with salinity,
water depth, and calculated trophic status
indices.
Different environmental pressures may
influence acquisition of these genes by
viruses.
Table S7 shows the correlations between
viral gene families and environmental parameters.
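A correlation of this kind can be illustrated with a Pearson coefficient between per-station gene abundance and water temperature. The sketch below uses invented numbers purely to show the calculation; the station values and the choice of Pearson's r are assumptions, and the actual statistics are those reported in Table S7.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length (at least 2 points)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-station values: temperature (°C) and normalized
# abundance of one viral gene family.
temperature = [12.0, 18.5, 22.0, 26.5, 29.0]
abundance = [0.8, 1.5, 2.1, 2.9, 3.4]

r = pearson_r(temperature, abundance)
print(f"r = {r:.3f}")
```

A value near +1 corresponds to the positive relationship with temperature described above; a value near 0 would indicate no linear relationship.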
38Neighbor Functional Linkage Analysis
- Used to verify that the genes were on viral rather than
pro-viral regions of bacterial genomes
- Proportion of viral same-scaffold ORFs ranged from
32% to 92% for the metabolic gene families
studied
- Occurrence of viral neighbors on the same scaffolds
as host-derived viral genes supports the hypothesis
that the sources of the sequences are viruses rather
than bacteria
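The proportion quoted above can be pictured as a count over neighboring ORFs: for each gene family, collect the taxonomic labels of the other ORFs on the same scaffolds and take the share labeled viral. The family names and label lists below are hypothetical placeholders chosen to span the quoted range; they are not data from the study.

```python
def viral_neighbor_fraction(neighbor_labels):
    """Share of same-scaffold neighbor ORFs that carry a 'Viral' label."""
    if not neighbor_labels:
        raise ValueError("no neighboring ORFs to evaluate")
    viral = sum(1 for label in neighbor_labels if label == "Viral")
    return viral / len(neighbor_labels)


# Hypothetical neighbor labels for two placeholder gene families.
neighbors_by_family = {
    "family_X": ["Viral"] * 23 + ["Bacteria"] * 2,
    "family_Y": ["Viral"] * 8 + ["Bacteria"] * 17,
}

fractions = {family: viral_neighbor_fraction(labels)
             for family, labels in neighbors_by_family.items()}
for family, fraction in fractions.items():
    print(f"{family}: {fraction:.0%} viral neighbors")
```

A family whose neighbors are mostly viral is more plausibly carried on viral genomes than on bacterial ones.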
39Viruses with Metabolic Genes
- Through lateral gene transfer, metabolic genes
can be acquired from the host
- Acquisition, retention, and expression of
metabolic genes may increase fitness
- Keeping key metabolic processes and pathways running
during infection allows maximum replication
- Previous studies of host-derived metabolic viral
genes have focused on the photosynthesis genes psbA
and psbD of a cyanophage
- Previous studies did not focus on the abundance or
distribution of these genes in the oceans
40Host-Derived Metabolic Gene Families
- In the aquatic viral communities sampled,
host-derived genes were found widely distributed
in significant proportions
- Quantitative PCR of these genes confirmed
high abundance
- Not known if these genes were expressed at the
time of sampling
- Unlikely to see these genes in high abundance if
they:
- Were not expressed
- Did not have a fitness advantage
41Suggests that viruses may play a more
substantial role than previously recognized in
environmentally relevant metabolic processes,
such as the conversion of light to energy,
photoadaptation, phosphate acquisition, and
carbon metabolism
42Discussion
- Most studies have focused on the filtered viral
fraction of the data
- This is the first study to focus on the viral
components in the microbial fraction of the data
- Strong evidence for abundance and distribution of
environmentally important host-derived viral gene
families
- Distribution patterns of host-derived viral
families over environmental gradients
- Evidence of interactions between bacteriophage
and host organisms
43Potential Evolutionary Viral-Host Relationships
- The study of the cyanophage found that the
host-derived genes undergo higher mutation rates
than their cyanobacterial counterparts
- After phage acquisition, the genes could
diversify
- Mutated viral genes could form gene reservoirs
for the host
- Through horizontal gene transfer, viruses could
promote diversity and distribution
44Prochlorococcus P-SSM4-like Phage
- Prochlorococcus is one of the most widespread
picophytoplankton in the ocean
- P-SSM4-like phage may influence the abundance,
diversity, and distribution of Prochlorococcus
- Statistically significant relationship between
Prochlorococcus and the P-SSM4-like phage
45Metagenomic Viral-Microbial Interactions
- This study of viral-microbial association between
communities was coincidental
- Horizontal transfer of metabolic genes
- More studies necessary on the viral-microbial
diversity and genetic complement
- Community relationships
- Evolutionary relationships
46Any Questions?