Information Management in a Non-Bibliographic Environment: Scientific Data

1
Information Management in a Non-Bibliographic
Environment: Scientific Data
  • Joseph A. Hourclé
  • 2007-Nov-20
  • FLICC Learning @ Lunch

2
About Me
3
STEREO: Solar TErrestrial RElations Observatory

4
The Virtual Solar Observatory

5
The Virtual Solar Observatory
  • Federated Search of Solar Physics Data
  • 14 organizations (currently)
  • 4 more organizations being integrated
  • 62 instruments
  • Hundreds of distinct data collections
  • 10s of millions of records
  • Terabytes of Data

6
The data is growing
  • STEREO
      • Launched Oct 2006
      • Over 1.5 million images @ up to 8 MB
  • Hinode (Sunrise aka Solar-B)
      • Launched Sept 2006
      • Over 3 million images @ up to 8 MB
  • SDO
      • Scheduled to launch Aug 2008
      • 1 image per second @ 32 MB
      • 1.5 TB/day dedicated connection
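
As a rough check on the SDO figures above, the raw daily volume implied by one 32 MB image per second can be worked out directly. This is a back-of-the-envelope sketch: it assumes decimal units (1 TB = 1,000,000 MB) and an uninterrupted cadence, neither of which the slides state, and the function name is illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def daily_volume_tb(images_per_second: float, mb_per_image: float) -> float:
    """Raw data volume per day in decimal terabytes (assumed: 1 TB = 1e6 MB)."""
    mb_per_day = images_per_second * mb_per_image * SECONDS_PER_DAY
    return mb_per_day / 1_000_000


sdo_raw = daily_volume_tb(images_per_second=1, mb_per_image=32)
print(f"{sdo_raw:.2f} TB/day")  # 2.76 TB/day
```

The uncompressed figure comes out above the 1.5 TB/day connection quoted on the slide; the slides do not say how the two are reconciled.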

7
Other disciplines have even more data
  • NVO: US National Virtual Observatory
      • LSST (Large Synoptic Survey Telescope)
          • Scheduled to start observing in 2012
          • 7-10 TB/night, 3.2 Gpix images
          • 10 PB/yr
  • EOS/DIS: Earth Observing System / Data Information
    System
      • About 2 TB/day, per satellite (8?)
      • Planned to be 16 PB

8
and we're not the only one
  • Heliospheric
  • Magnetospheric
  • Radiation Belt
  • ITM (upper atmosphere)
  • NVO / IVOA: nighttime astronomy
  • PDS: planetary
  • EOS: earth

9
What is Scientific Data?

10
How is Scientific Data Gathered?
  • Scientist thinks up a problem
  • Scientist (and Engineers) create an instrument to
    conduct an investigation
  • The instrument collects data via sensors
  • Data are calibrated
  • Data are written into scientifically useful
    formats
  • Data are distributed to the scientists

11
But really, what is data?
  • There is no formal definition.
  • It's as ambiguous as the term "book"
  • "Data" may be shorthand for:
      • Data Collection
      • Data Series
      • Data Set
      • Data Product
      • Data Granule

12
The problem with data
  • Every investigation has different data needs
  • Each investigation organizes and catalogs the
    data to answer their scientific question
  • What is good data for one group may not be
    useful for another
  • Because data is being collected continuously,
    there may not be a consistent boundary on one
    granule of data
  • Some data is tracked as individual values, and
    only packaged upon request
  • Mostly time-series data, not images
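
To illustrate the last few points, here is a minimal sketch of an archive that stores individual time-stamped values and only packages them into a granule when someone asks for a time range. The `Sample` and `package_granule` names, the half-open interval convention, and the sample values are all assumptions made for illustration, not any archive's actual interface.

```python
from bisect import bisect_left
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Sample:
    time: datetime
    value: float


def package_granule(samples: list[Sample], start: datetime, end: datetime) -> list[Sample]:
    """Return the samples with start <= time < end.

    `samples` must already be sorted by time. The granule boundary is
    whatever the requester chooses; nothing in the stored data fixes it.
    """
    times = [s.time for s in samples]
    lo = bisect_left(times, start)
    hi = bisect_left(times, end)
    return samples[lo:hi]


# Hypothetical series with one value per minute
series = [Sample(datetime(2007, 11, 20, 12, m), float(m)) for m in range(10)]
granule = package_granule(
    series, datetime(2007, 11, 20, 12, 2), datetime(2007, 11, 20, 12, 5)
)
print([s.value for s in granule])  # [2.0, 3.0, 4.0]
```

Two requesters asking for overlapping ranges get different granules from the same stored values, which is why a single consistent granule boundary may not exist.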

13
Types of Data Archives
  • Instrument Archives
      • Maintained by the PI team
      • Little or no consideration towards re-use
  • Resident Archive
      • Maintained by a specific discipline
      • Re-use within the given discipline
  • Long-Term Archive
      • Required for federally funded studies
      • Focus on preservation, not use of data

14
Active Archives
  • Still changing
  • May be ingesting from an active mission
  • May still be processing their data
  • May serve multiple editions or processed states
    of the data
  • Final Data in Physical Units typically isn't
    available until one or more years after the
    mission
  • Not directly comparable with data from other
    instruments until then

15
Isn't this just Knowledge Management?
  • There is no knowledge in the raw data
  • But there is knowledge in the design of the
    instrument's sensors
      • What spectral range are the instruments sensitive
        to?
      • What are the instrument's possible operating
        modes?
  • Knowledge of the instrument's sensors affects how
    the scientists interpret data
  • The scientists have to interpret the results to
    determine the knowledge
  • May be reluctant to have others catalog their
    data, as it requires understanding the science

16
Multiple Operating Modes: Filters on SOHO/EIT
171Å
195Å
284Å
304Å
17
Known Sensor Issues: SOHO/LASCO
18
Knowledge Mgmt, cont'd
  • We do have Event and Feature Catalogs
  • Scientists will record when/where they think
    something interesting is occurring, and share
    with others.

19
Data Processing: Raw Image (Linear)
20
Data Processing: Calibrated (Greyscale)
21
Data Processing: Before Calibration
22
Data Processing: Best Calibration
23
Data Processing: CCD Aging
24
CCD Calibration
195Å
171Å
304Å
284Å
25
Higher Level Data
26
The Problems
  • Cross discipline translation is difficult
  • Concepts of what makes data useful differ
    between disciplines
  • Different disciplines use different search
    parameters
      • VSO: time, spectral range, location on sun
          • Always looking at the same object
      • VHO: location of observer, time, spectral range
          • Observatories are moving, in situ measurements
      • EOS: location of object observed
      • NVO: direction of pointing (assumed from earth)
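
The mismatch in search parameters can be sketched as a lookup of which query fields each system keys on. This is only an illustration: the field names, the `split_query` helper, and the sample query values are hypothetical, and the real systems expose richer interfaces than a flat set of keys.

```python
# Search parameters each system keys on, as listed on the slide.
# The snake_case field names are invented for this sketch.
SEARCH_PARAMETERS = {
    "VSO": {"time", "spectral_range", "location_on_sun"},
    "VHO": {"observer_location", "time", "spectral_range"},
    "EOS": {"object_location"},
    "NVO": {"pointing_direction"},
}


def split_query(system: str, query: dict) -> tuple[dict, dict]:
    """Split a query into the parts a system can use and the parts it cannot."""
    accepted = SEARCH_PARAMETERS[system]
    usable = {k: v for k, v in query.items() if k in accepted}
    dropped = {k: v for k, v in query.items() if k not in accepted}
    return usable, dropped


# Hypothetical solar-physics style query
query = {
    "time": "2007-11-20",
    "spectral_range": "171-304 Å",
    "location_on_sun": "N10W20",
}

for system in SEARCH_PARAMETERS:
    usable, dropped = split_query(system, query)
    print(system, "usable:", sorted(usable), "dropped:", sorted(dropped))
```

A query phrased in VSO terms carries over only partly to VHO and not at all to EOS or NVO, which is the cross-discipline translation problem in miniature.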

27
Problems, cont'd
  • Even when there is agreement, there are still
    problems
  • Which time is important?
      • Start time?
      • Average time?
      • Spacecraft time?
  • Which coordinate system is used?
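
The "which time?" question can be made concrete with a small sketch showing that one exposure yields several defensible timestamps. It assumes a uniform exposure, so that the average time is the midpoint, and the 12-second exposure is an invented value. Spacecraft time is not modelled, because converting it depends on clock details the slides do not give.

```python
from datetime import datetime, timedelta


def candidate_times(start: datetime, exposure: timedelta) -> dict[str, datetime]:
    """Timestamps that could each be recorded as 'the' time of one exposure."""
    return {
        "start": start,
        "midpoint": start + exposure / 2,  # average time, assuming uniform exposure
        "end": start + exposure,
    }


# Hypothetical observation: 12-second exposure starting at noon
times = candidate_times(datetime(2007, 11, 20, 12, 0, 0), timedelta(seconds=12))
for label, t in times.items():
    print(label, t.isoformat())
```

Two catalogs that record different choices from this set will disagree by seconds on the same observation unless the convention is documented alongside the value.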

28
Problems, still cont'd
  • Each discipline is working on solutions within
    their field
  • Build systems that suit the needs of their
    community
  • Each discipline has different first-class data
  • Currently working on metadata standards so data
    can be discovered and used by other disciplines
      • SPASE, MMI, GEON
  • Some work on ontologies to help with discovery
    and use
      • VSTO, SWEET, GEON, SESDI

29
Lots of Permutations
30
I know what you're thinking
31
And it mostly works
32
How does this affect libraries?
  • The library is a changing organism
  • Data is relatively unanalyzed in LIS
  • Data connects to bibliographic records, and
    vice versa
      • What data was used in this journal article?
      • Where can I get documentation on using this data?
      • Has anyone published anything using this data?
  • Data connects to other data
      • What other instruments observed a given event?
      • Is there an alternate version that better meets
        my needs?
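
The two-way link between data and bibliographic records can be sketched as a pair of indexes kept in step. The `CitationIndex` class and the identifiers below are hypothetical placeholders, not an existing catalog schema.

```python
from collections import defaultdict


class CitationIndex:
    """Bidirectional links between publications and the data sets they used."""

    def __init__(self) -> None:
        self._data_by_article: dict[str, set[str]] = defaultdict(set)
        self._articles_by_data: dict[str, set[str]] = defaultdict(set)

    def link(self, article_id: str, dataset_id: str) -> None:
        # Record the link in both directions so either question is one lookup.
        self._data_by_article[article_id].add(dataset_id)
        self._articles_by_data[dataset_id].add(article_id)

    def data_used_in(self, article_id: str) -> set[str]:
        """What data was used in this journal article?"""
        return set(self._data_by_article.get(article_id, ()))

    def articles_using(self, dataset_id: str) -> set[str]:
        """Has anyone published anything using this data?"""
        return set(self._articles_by_data.get(dataset_id, ()))


# Placeholder identifiers, invented for the example
index = CitationIndex()
index.link("article-1", "dataset-a")
index.link("article-1", "dataset-b")
index.link("article-2", "dataset-a")

print(sorted(index.data_used_in("article-1")))
print(sorted(index.articles_using("dataset-a")))
```

Keeping both directions means a reader starting from an article and a scientist starting from a data set can each reach the other side without scanning every record.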

33
There's funding for research
  • NSF
      • CDI: Cyber-Enabled Discovery and Innovation
      • INTEROP: Community-based Data Interoperability
        Networks
      • IIS: Information and Intelligent Systems
      • DataNet: Sustainable Digital Data Preservation
        and Access Network Partners
  • NASA
      • AISR: Advanced Info. Systems Research
      • ACCESS: Advancing Collaborative Connections for
        Earth Science Access

34
Sunspot on 15 July 2002 from the Swedish 1-m
Solar Telescope on La Palma
35
http://virtualsolar.org/ http://stereo.gsfc.nasa.gov
  • joseph.a.hourcle@nasa.gov

36

37

38
(No Transcript)