Scientific Data Libraries

Provided by: michae59
1
Scientific Data Libraries
New paradigm for science. Old style: form
hypothesis, design experiment, run experiment,
analyze results, evaluate hypothesis. New style:
form hypothesis, look up data to test it,
evaluate hypothesis. Molecular biology has been
first, astronomy next; many other fields will
follow.
2
Model or lookup?
Weather: measure today and run equations, or
measure today and find a similar day in the
past? Chess: the opening and endgame are done
by lookup; the middle game is done by calculation.
3
Protein Data Bank
22,700 protein structures; growth over the last
thirty years
4
Alcohol dehydrogenase
5
Sky pictures

6
National Virtual Observatory
Traditionally, astronomers figured out what they
needed to see in the sky to test their theories,
then signed up for two weeks at an observatory
such as Kitt Peak, and sat there at night taking
photographs. Now the Sloan Digital Sky Survey
and other resources may let them do their work
without using a telescope. The Large Synoptic
Survey Telescope will gather 7-10 terabytes PER
NIGHT and 10 petabytes/yr.
7
2 Micron, Sloan survey
Finding a brown dwarf.
8
IRIS seismic data consortium
10
Rhododendron in CalFlora database
11
Medical MRI scan (UCLA)
12
Digital orthophotoquad
13
Eckerd College dolphin digital library
14
Tim Rowe vertebrate fossil CT scans
15
Peter Allen Beauvais Cathedral
16
Marc Levoy Forum Urbis Romae
17
Q: Where will the Data Come From? A: Sensor
Applications
  • Earth Observation
  • 15 PB by 2007
  • Medical Images Information Health Monitoring
  • Potential 1 GB/patient/y → 1 EB/y
  • Video Monitoring
  • 1E8 video cameras @ 1E5 MBps → 10 TB/s → 100
    EB/y → filtered???
  • Airplane Engines
  • 1 GB sensor data/flight,
  • 100,000 engine hours/day
  • 30 PB/y
  • Smart Dust: ?? EB/y

This slide taken from a presentation by Jim Gray
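
The volume estimates on this slide are unit conversions, so they can be checked with a few lines of arithmetic. The sketch below is not from the original presentation: it assumes decimal (SI) byte units and a 365-day year, and the function names are made up for illustration. It only works out what the quoted figures imply.

```python
# Back-of-the-envelope check of the data-volume figures quoted on the slide.
# Assumption: decimal (SI) units; the slide does not say SI or binary.
GB, TB, PB, EB = 1e9, 1e12, 1e15, 1e18
SECONDS_PER_YEAR = 365 * 24 * 3600  # ignores leap years


def implied_patients(per_patient_per_year=1 * GB, total_per_year=1 * EB):
    """Number of patients needed for 1 GB/patient/y to add up to 1 EB/y."""
    return total_per_year / per_patient_per_year


def bytes_per_year(bytes_per_second):
    """Yearly volume produced by a sustained rate in bytes per second."""
    return bytes_per_second * SECONDS_PER_YEAR


def implied_camera_rate(total_rate=10 * TB, cameras=1e8):
    """Per-camera rate (bytes/s) implied by a 10 TB/s total over 1E8 cameras."""
    return total_rate / cameras


def gb_per_engine_hour(total_per_year=30 * PB, engine_hours_per_day=100_000):
    """Sensor data per engine hour (in GB) implied by 30 PB/y."""
    return total_per_year / (engine_hours_per_day * 365) / GB


if __name__ == "__main__":
    print(f"patients implied by 1 EB/y: {implied_patients():.0e}")
    print(f"10 TB/s sustained for a year: {bytes_per_year(10 * TB) / EB:.0f} EB")
    print(f"per-camera rate for 10 TB/s: {implied_camera_rate():.0e} bytes/s")
    print(f"data per engine hour: {gb_per_engine_hour():.2f} GB")
```

Under these assumptions the medical figure implies about a billion patients, a sustained 10 TB/s comes to roughly 315 EB/y (the same order of magnitude as the slide's 100 EB/y), the 10 TB/s total corresponds to about 1E5 bytes per second per camera, and 30 PB/y works out to a little under 1 GB per engine hour.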
18
Data sharing ethics
  • Vary by field
  • Molecular biology: you can't publish a paper
    reporting a protein structure without depositing
    the structure in the public data bank. Genomic
    data is also public.
  • Astronomy: the convention is that you get two
    years' use of the data you collect, then must
    make it available to others.
  • Dead Sea Scrolls: kept secret for forty years.
  • Yet molecular biology data has potentially
    enormous economic value, whereas cosmology and
    ancient scrolls have none.
  • What should we urge on new fields?

19
Cyberinfrastructure
  • NSF has traditionally paid for some
    infrastructure
  • Supercomputer centers (now some $80M/yr)
  • Backbone networking (perhaps some $40M/yr)
  • What about content? (NSF/NIH support much
    already)
  • Cyberinfrastructure task force looked at this;
    however, the recommendation for support of data is
    mixed with a proposal to go to 7 supercomputer
    centers, and the total is $1B/yr, which is
    politically unrealistic.
  • Librarians, and scientists with data, don't have
    the organization or political weight of
    supercomputing.

20
Large scale storage
Where are these resources? Generally in
computer centers, or in scientific departments,
or sometimes at private corporations (Microsoft,
in particular). Not enough in libraries: in
general, libraries do not have funds to support
such services and are not well placed to get
them. We need more cooperative projects;
examples are UCSB and UIUC.
21
Guess at the future
  • Written material, as a storage problem, is
    insignificant compared with data. The data
    requires too much specialized knowledge to share
    easily.
  • Each project, as well as storing its data, is
    likely to store its own publications.
  • Libraries might be marginalized, with only old
    stuff.
  • What to do?
  • Develop techniques for general data storage to
    let libraries share this work
  • Create an ethic for public sharing
  • Find public funding for data storage.