Title: Biological Oceanography Scientific Domain
1Biological Oceanography Scientific Domain
DataSpace
- Ed DeLong, MIT
- Department of Biological Engineering and Department of Civil and Environmental Engineering
2BIOLOGICAL OCEANOGRAPHY
- Coupling of physical and biological oceanographic processes
- Comparative ecosystem analysis
- Biodiversity, biomass and productivity
- C-N-P cycling and energy flow
- Production, consumption of greenhouse gases and climate
- Measurement, modeling and experiments with microbial communities in the sea
- Education, training and knowledge exchange
3Scope and Scale Challenges in Biological Oceanography
Oceanographic sampling approaches in the context of scales
(Genomes to Biomes)
Microbial and sampling scales, based on Dickey (1991) and Allen (2000)
Ricardo Letelier
4ADVANCED INSTRUMENTATION
Continuous, autonomous collection of 4D physical, chemical and bio-optical datasets
5Physics
- 2 Eddies
- 1 frontal system
- Sub-mesoscale features?
Biology
- Higher Chl a below cyclone
- DCM constant
- Patchy distribution of small particles
- Advection/local production of small particles in the Ze
6Further specialization: Marine Metagenomics
- Traditional microbiology and microbial genome sequencing studies rely on cultivated cultures
- Marine metagenomics: DNA sequences of microbial assemblages from the environment
- Metagenomic data is used by scientists across multiple disciplines, e.g.,
- Biological engineering and biotechnology
- Genomics and computational biology
- Ecology and environmental science
- Climate: relationship between marine microbes and the ocean's carbon cycle, productivity, greenhouse gases
8 2ND Gen Sequencing Platforms
                        AB3730                  454 FLX / Titanium      ILLUMINA
Cost per run            $50                     <$12K                   <$5K
Bases read/run          72 Kbp                  100 Mbp / 500 Mbp       >2 Gbp / >200 Gbp !!!
Bases per read          750                     250 / 450               >36 (>100, paired-end reads)
Reads per run           96                      400K                    20M
$ per Mbp               694                     120                     7
AB3730 work equivalent  -                       100x AB3730/day         300x AB3730/day
Errors                  Diverse (cloning bias)  Homopolymeric runs      Diverse (base substitution)
Run time                1 hour                  6.5 hours               2-14 days
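The per-Mbp row in the platform comparison appears to be the cost per run divided by the bases read per run. The sketch below checks that arithmetic for the two platforms where the slide gives a single throughput figure. It assumes the cost figures are in dollars and treats the "less than" bound for the 454 FLX as the value itself; the dictionary layout and the function name are illustrative only.

```python
# Cost per run and bases per run as listed on the slide. The "<12K" upper
# bound is used as if it were the exact cost, which is an assumption.
platforms = {
    "AB3730": {"cost_per_run": 50.0, "mbp_per_run": 0.072},      # 72 Kbp
    "454 FLX": {"cost_per_run": 12_000.0, "mbp_per_run": 100.0},  # 100 Mbp
}


def cost_per_mbp(cost_per_run: float, mbp_per_run: float) -> float:
    """Sequencing cost per megabase pair for a single run."""
    return cost_per_run / mbp_per_run


# Round to whole units, as the slide does.
cost_table = {name: round(cost_per_mbp(**p)) for name, p in platforms.items()}
print(cost_table)
```

These two results reproduce the 694 and 120 shown on the slide. The Illumina entry is left out because the slide lists more than one throughput figure for it and does not say which one the per-Mbp value is based on.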
9Biological Oceanography Data Challenges
- Wide variety and heterogeneity of data types
- Oceanographic cruise data
- Oceanographic time series data
- Laboratory and field experiments
- Remote sensing datasets
- Data from gliders, AUVs and moorings
- Genomics, metagenomics, gene expression data
- Numerical simulations and synthesis products
- Distributed data (multi-institution researchers)
- Need to balance PI, project and public data accessibility
- Data visualization and analysis needs
- Long term archiving requirements
10Why do biological oceanographers need DataSpace?
11 UH, MIT, OSU, UCSC, WHOI, MBARI
12 DataSpace partners: MIT-OSU
Oceanographic Science Partners: Ed DeLong (MIT), Ricardo Letelier (OSU)
Library and IT Partners: MacKenzie Smith (MIT), Terry Reese (OSU)
DeLong and Letelier: Co-PIs on three major projects
- Center for Microbial Oceanography: Research and Education (C-MORE)
- Microbial Oceanography of Oxygen Minimum Zones (MOOMZ)
- Microbial diversity and activity in seasonal hypoxic waters (MI-LOCO)
13Existing Data Portal
http://cmore.soest.hawaii.edu/data.htm
Currently a distributed approach. Consists of
weblinks to individually managed heterogeneous
datasets.
14Where is the data now? (Oceanographic data)
BCO-DMO
Biological and Chemical Oceanography Data Management Office database
http://osprey.bcodmo.org/index.cfm
15Where is the data now? (Genomic/metagenomic)
In-house Databases
Public Databases: NCBI and CAMERA
- National Center for Biotechnology Information (NCBI): http://www.ncbi.nlm.nih.gov/
- Community Cyberinfrastructure for Advanced Marine Microbial Ecology Research and Analysis (CAMERA): http://camera.calit2.net/
16Why do biological oceanographers need DataSpace?
- Data access, storage, search not centralized
- Large heterogeneous datasets
- Complex data management/sharing requirements
- Shared by multiple institutions and investigators
- Long term requirements (2017)
- Need cross-investigator, institution, project search
- Currently lots of data is lost, e.g. not utilizable
17Why do biological oceanographers need DataSpace?
- How many autonomous surveys, cruises, mooring datasets, hydrocasts, deckboard experiments had chlorophyll concentrations above or below X?
- Of those data, how many had light levels and oxygen concentrations corresponding to Y and Z?
- Of those data, how many have corresponding microbial community taxonomic composition and gene content data? (retrieve)
- What is the relationship between light, chlorophyll, oxygen and microbial community taxonomic composition and gene content, across all datasets?
- How do taxa and gene content relate to oxygen levels and the balance of production and consumption? To greenhouse gas (GHG) levels?
- Are there specific gene proxies that predict oxygen or GHG levels?
Note: centralized data access, search and storage will also drive the way we (scientists) ask our questions, and collect and annotate our data. A collaboration between scientists, IT, curators and database managers.
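The first three questions form a chain of successively narrower filters over one pooled set of observations. A minimal sketch of that chain follows, assuming all observations have already been brought into a single record type. The `Sample` class, its field names, the thresholds and the records are all hypothetical, and because the slide does not state the direction of the chlorophyll comparison, greater-than is assumed.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Sample:
    """One observation from any source; every field name here is hypothetical."""
    source: str                   # e.g. "cruise", "mooring", "hydrocast"
    chlorophyll: Optional[float]  # None when the variable was not measured
    light: Optional[float]
    oxygen: Optional[float]
    has_metagenome: bool          # True if taxonomic/gene content data exist


def in_range(value: Optional[float], bounds: Tuple[float, float]) -> bool:
    """True when value was measured and lies inside the closed interval."""
    low, high = bounds
    return value is not None and low <= value <= high


def staged_query(samples, chl_threshold, light_bounds, oxygen_bounds):
    """Apply the three filters in order and return every intermediate result."""
    # Step 1: chlorophyll relative to X (greater-than is an assumption here).
    step1 = [s for s in samples
             if s.chlorophyll is not None and s.chlorophyll > chl_threshold]
    # Step 2: of those, light and oxygen corresponding to Y and Z.
    step2 = [s for s in step1
             if in_range(s.light, light_bounds)
             and in_range(s.oxygen, oxygen_bounds)]
    # Step 3: of those, samples with matching community composition data.
    step3 = [s for s in step2 if s.has_metagenome]
    return step1, step2, step3


# Made-up records, for illustration only.
samples = [
    Sample("cruise", 0.8, 120.0, 210.0, True),
    Sample("mooring", 0.9, 110.0, 205.0, False),
    Sample("hydrocast", 0.2, 130.0, 200.0, True),
    Sample("glider", 1.1, None, 190.0, True),
]

step1, step2, step3 = staged_query(samples, 0.5, (100.0, 150.0), (180.0, 220.0))
print(len(step1), len(step2), len(step3))
```

Each step only makes sense if the previous one ran against the same pool of records, which is the practical argument for centralized search: with datasets held separately, the glider record with no light measurement would be dropped or kept depending on where it happened to be stored.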
18The DataSpace Project Biological Oceanography
- Enable new discoveries by facilitating access, search and storage of large, complex heterogeneous datasets
- Provide infrastructure for digital archiving and preservation at appropriate scales matching scope/complexity of data
- Enable more integrated intra- and inter-project collaborations, analyses, data encoding, documentation, sharing, visualizing, and preservation
- Establish standards and best practices to capture, express, encode and publish the policies related to archived data
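One way to read "capture, express, encode and publish" a policy is as a small machine-readable record stored next to each dataset. The sketch below is an illustration under assumptions, not the project's design. The three access tiers come from the PI, project and public distinction made earlier in the deck, while the class names, the dataset identifier and the release date are hypothetical.

```python
import json
from dataclasses import dataclass
from datetime import date
from enum import IntEnum


class Audience(IntEnum):
    """Access tiers named on the slides, ordered from widest to narrowest."""
    PUBLIC = 0
    PROJECT = 1
    PI = 2


@dataclass(frozen=True)
class AccessPolicy:
    dataset_id: str          # hypothetical identifier
    restricted_to: Audience  # widest tier allowed before public release
    public_after: date       # hypothetical release date

    def can_read(self, requester: Audience, today: date) -> bool:
        """Everyone may read after release; before it, only the allowed tiers."""
        if today >= self.public_after:
            return True
        return requester >= self.restricted_to

    def to_json(self) -> str:
        """Serialize the policy so it can be published next to the dataset."""
        return json.dumps({
            "dataset_id": self.dataset_id,
            "restricted_to": self.restricted_to.name,
            "public_after": self.public_after.isoformat(),
        }, sort_keys=True)


policy = AccessPolicy("example-dataset", Audience.PROJECT, date(2030, 1, 1))
print(policy.can_read(Audience.PUBLIC, date(2029, 6, 1)))
print(policy.can_read(Audience.PI, date(2029, 6, 1)))
print(policy.to_json())
```

Keeping the policy as data rather than as prose in a README means the same record can drive both the access check and the published description of who may use the dataset and when.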
19The DataSpace Project Biological Oceanography