Storing Data - PowerPoint PPT Presentation

Title:

Storing Data

Description:

Storing Data Forever: Funding Long-Term Preservation of Research Data. Special thanks to MacKenzie Smith, MIT Libraries, Managing Research Data 101, https ...

Slides: 29
Provided by: Serge229
Learn more at: http://www.stonesoup.org
Transcript and Presenter's Notes

1
Storing Data Forever
  • Funding Long-Term Preservation of
  • Research Data

2
Special Thanks To
  • MacKenzie Smith, MIT Libraries
  • Managing Research Data 101
  • https://libshare.library.gatech.edu/clearspace/docs/DOC-3634.pdf;jsessionid=DF96E09B9D6BE9E5EC62A27717DC5868

3
What is Data?
  • Numbers?
  • Recorded? Collected? Generated?
  • Images? Video? Audio?
  • Shoah
  • In what format?
  • Code?
  • Publications/Text?
  • In what format?
  • Transcription service
  • Is pure raw data useful?
  • May require extensive meta-data to be useful

4
What is Forever?
  • Longer than a typical project?
  • Longer than a typical career?
  • Longer than a typical institution?
  • 5 years, 10 years, 25 years, 100 years?
  • Suggestion: treat data the same way a library
    treats books
  • Intent is to preserve indefinitely
  • As long as practical, feasible
  • Cannot be precisely defined

5
Why Save Data Forever
  • Because we have to
  • Funding agencies want data sharing plans
  • NIH Data Sharing Policy (2003)
  • http://grants.nih.gov/grants/guide/notice-files/NOT-OD-03-032.html
  • "all investigator-initiated applications with
    direct costs greater than $500,000 in any single
    year will be expected to address data sharing in
    their application."

6
NIH Data Sharing Policy
  • Applicants may request funds for data sharing
    and archiving. The financial issues should be
    addressed in the budget section of the
    application.
  • Specifics depend on the grant and are published
    in the RFP, RFA, or PA

7
NSF Data Archiving Policy
  • Division of Social and Economic Sciences
  • http://www.nsf.gov/sbe/ses/common/archive.jsp
  • Grantees from all fields will develop and submit
    specific plans to share materials collected with
    NSF support, except where this is inappropriate
    or impossible.

8
NSF Data Archiving
  • From Grant Proposal Guide
  • NSF expects PIs to share with other researchers,
    at no more than incremental cost and within a
    reasonable time, the data, samples, physical
    collections and other supporting materials
    created or gathered in the course of the work.
  • Specifics depend on grant and program officer

9
NSF Data Sharing Policy
  • Hot off the Presses
  • Science Insider, May 5, reports: Edward Seidel,
    acting head of NSF's mathematics and physical
    sciences directorate, described NSF's intention
    to require all applicants to submit a data
    management plan along with their grant
    application in a presentation this morning to the
    National Science Board, NSF's oversight body.
    NSF's current policy requires grantees to share
    their data within a reasonable length of time so
    long as the cost is modest. "That's nice, but it
    doesn't have much teeth," said Seidel. Under the
    new policy, which is expected to be unveiled this
    fall, a researcher would submit a data management
    plan as a two-page supplement to any regular
    grant proposal. That would make it an element of
    the merit review process.

10
Other Agency Policies
  • See Gary King's page on Data Sharing and
    Replication
  • http://gking.harvard.edu/replication.shtml
  • See National Academy of Sciences, Ensuring the
    Integrity, Accessibility, and Stewardship of
    Research Data in the Digital Age, July 2009
  • http://www.nap.edu/catalog/12615.html

11
Why Save Data Forever
  • Because we want to
  • Available to ourselves and our students and
    colleagues
  • Where are the data sitting today? On a
    departmental server? On a computer under your
    desk? On a CD or DVD somewhere?
  • Where is your dissertation data?
  • Available to future scholars, including ourselves

12
Why Save Data Forever
  • Because we need to
  • Encourage honesty?
  • Gregor Mendel probably cheated
  • Like open source, help uncover mistakes and bugs?
  • Open Data Movement
  • Mostly library/catalog data, map data, WordNet
  • Open Access Movement
  • Mostly publications
  • Because it's not our data

13
Current Storage Models
  • Let someone else do it
  • Government agency/lab/bureau
  • NOAA National Geophysical Data Center
  • GenBank (DNA data)
  • fMRIDC (fMRI publications and data)
  • NCSA Astronomy Digital Image Library

14
Current Storage Models
  • Professional society/Journals
  • Global Ocean Observing System: coordinates
    distributed data
  • Dryad: ecology/evolutionary biology
  • Nice folks at another University
  • ICPSR, University of Michigan (political/social)
  • Dryad: ecology/evolutionary biology
  • Protein Data Bank (PDB): 3-D protein data
  • NCSA Astronomical Image Library
  • Sloan Digital Sky Survey
  • The Cloud

15
Digital preservation/curation timeline
  • 2000: Library of Congress, $100M for National
    Digital Information Infrastructure and
    Preservation Program (NDIIPP)
  • 2004: UK Digital Curation Centre (DCC)
  • 2004: NDIIPP gives $14M to 8 partners
  • 2007: Blue Ribbon Task Force on Sustainable
    Digital Preservation and Access

16
Digital preservation/curation timeline (2)
  • 2007: NSF Office of Cyberinfrastructure (OCI)
    Sustainable Digital Data Preservation and Access
    Network Partners (DataNet) solicitation
  • 2009: First 2 DataNet awards

17
Conferences and groups
  • Preservation and Archiving Special Interest Group
    (PASIG)
  • International Conference on Preservation of
    Digital Objects (iPRES)
  • Open Repositories (OR)

18
Current Funding Models
  • Institution/department pays
  • Grants pay monthly/yearly
  • Haphazard
  • Some grant money
  • Some departmental money
  • Use whatever is available
  • Don't worry, someone will pay

19
What are we doing? Survey says:
13. Long-term (preservation) storage of research data

  Answer                             Responses   Percent
  1  No                                   3        16%
  2  Yes, centrally run                  11        58%
  3  Yes, departmentally run              9        47%
  4  Yes, run otherwise (specify)         3        16%

20
14. Are your centrally run long-term data
storage/preservation systems:

  Answer                             Responses   Percent
  1  Funded by charge back                3        27%
  2  Funded centrally                    10        91%
  3  Funded otherwise (specify)           4        36%

21
14. Are your centrally run long-term data
storage/preservation systems:
Funded otherwise (specify)
  • grant-funded
  • central and faculty. There is uncertainty on this
    front.
  • also through the condo-style central cluster
    system
  • grants

22
15. Are your departmentally run long-term data
storage/preservation systems:

  Answer                             Responses   Percent
  1  Funded by charge back                3        33%
  2  Funded departmentally                8        89%
  3  Funded otherwise (specify)           3        33%

23
Current Funding Models
  • Most require some form of on-going payment
  • Advantages
  • Capitalist approach to data storage
  • If someone wants to pay, data gets saved
  • Natural expiration process
  • Disadvantages
  • Capitalist approach to data storage
  • Who pays to save rarely used data?

24
Different Approach
  • PAY ONCE, STORE ENDLESSLY (POSE)
  • Why Pay Once?
  • Grants expire often and quickly
  • Researchers expire pretty often
  • How to Store Forever?
  • Administrators expire slowly
  • Institutions expire rarely

25
The Business Model (1)
  • I = initial cost of storage
  • d = rate at which storage costs decrease yearly,
    expressed as a fraction (e.g., 20% would be 0.2)
  • r = how often, in years, storage is replaced
  • T = cost to store the data forever
  • T = I + (1-d)^r I + (1-d)^{2r} I + ...
  • If d = 20%, r = 4:
  • T = I + (0.8^4) I + (0.8^8) I + ...
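
The series on this slide can be tabulated directly. The following is a minimal Python sketch, not part of the original deck: the function name `purchase_costs` and the choice of six purchases are illustrative assumptions, and the initial cost is normalized to I = 1 so every figure reads as a multiple of I.

```python
def purchase_costs(initial_cost, yearly_decrease, replace_every, purchases):
    """Price paid at each purchase: the initial buy, then each replacement."""
    # Prices fall by `yearly_decrease` per year, so after `replace_every`
    # years a like-for-like replacement costs this fraction of the last one.
    factor = (1.0 - yearly_decrease) ** replace_every
    return [initial_cost * factor ** k for k in range(purchases)]


REPLACE_EVERY = 4  # years, the slide's r
costs = purchase_costs(initial_cost=1.0, yearly_decrease=0.2,
                       replace_every=REPLACE_EVERY, purchases=6)

running_total = 0.0
for k, cost in enumerate(costs):
    running_total += cost
    print(f"year {k * REPLACE_EVERY:2d}: pay {cost:.4f} I, total {running_total:.4f} I")
```

Each purchase costs about 41% of the previous one (0.8^4 = 0.4096), so the running total flattens quickly: after six purchases (20 years) it is about 1.69 I.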

26
The Business Model (2)
  • If d > 0,
  • T = I + (1-d)^r I + (1-d)^{2r} I + ...
      = I / (1 - (1-d)^r)
  • For d = 20%, r = 4: T ≈ 2I (more precisely,
    about 1.69 I)
  • Charge 2x initial storage cost, save half, store
    forever!
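
The closed form on this slide is the sum of a geometric series. A short derivation, offered as a sketch: the symbol q is introduced here for convenience and does not appear in the original slides, and the condition d < 1 is an added assumption (a cost cannot fall by more than 100% per year).

```latex
\begin{align*}
q &= (1-d)^r, \qquad 0 < q < 1 \ \text{ for } 0 < d < 1,\ r > 0 \\
T &= I + qI + q^2 I + \cdots
   = I \sum_{k=0}^{\infty} q^k
   = \frac{I}{1-q}
   = \frac{I}{1-(1-d)^r} \\
d = 0.2,\ r = 4:\quad
q &= 0.8^4 = 0.4096, \qquad
T = \frac{I}{0.5904} \approx 1.69\, I
\end{align*}
```

Charging 2x the initial cost therefore leaves some margin above the computed figure.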

The Serge Equation (patent pending, $0.01/gigabyte)
Because this will result in a surge in demand
for long-term data storage.

27
An Example DataSpace at Princeton
  • FC costs decrease by about 16% per year
  • SATA costs decrease by about 17% per year
  • Additional savings every few years from new
    storage

28
The Serge for DataSpace
  • SATA cost: $1.81/GB
  • Replace every four years
  • Costs decrease by 20% per year
  • Serge = 1.81 / (1 - 0.8^4) ≈ $3/GB
  • Adding tape backup jumps this to $5/GB

$5K one-time to store a terabyte forever
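
The DataSpace arithmetic can be checked with a few lines of Python. This is a sketch under stated assumptions rather than Princeton's actual pricing code: the function name `serge_cost` is hypothetical, the $5/GB tape-inclusive figure is taken from the slide as given (the slides do not show how it is derived), and a terabyte is assumed to be 1000 GB.

```python
def serge_cost(initial_cost, yearly_decrease, replace_every):
    """Closed-form pay-once cost: I / (1 - (1 - d)^r)."""
    if not 0.0 < yearly_decrease < 1.0:
        raise ValueError("yearly_decrease must be strictly between 0 and 1")
    return initial_cost / (1.0 - (1.0 - yearly_decrease) ** replace_every)


SATA_COST_PER_GB = 1.81   # dollars per GB, from the slide
YEARLY_DECREASE = 0.20    # the slide's 20% per year
REPLACE_EVERY_YEARS = 4   # the slide's replacement interval
WITH_TAPE_PER_GB = 5.00   # dollars per GB, slide's figure with tape backup
GB_PER_TB = 1000          # assumption: decimal terabyte

disk_only = serge_cost(SATA_COST_PER_GB, YEARLY_DECREASE, REPLACE_EVERY_YEARS)
per_terabyte = WITH_TAPE_PER_GB * GB_PER_TB

print(f"disk only, forever: ${disk_only:.2f}/GB")
print(f"with tape backup:   ${per_terabyte:,.0f}/TB one-time")
```

The disk-only result comes out at roughly $3.07/GB, which the slide rounds to $3/GB.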