Regional Databases and Archives: the Effects of Scale


Title: Regional Databases and Archives: the Effects of Scale


1
Regional Databases and Archives: the Effects of
Scale
  • A Presentation for Scalable Information Networks
    for the Environment Workshop
  • October 31, 2001
  • San Diego, California
  • Raymond McCord
  • Oak Ridge National Laboratory
  • Oak Ridge National Laboratory is operated by
    UT-Battelle, LLC, for the U.S. Department of
    Energy under contract DE-AC05-00OR22725

2
Credits
  • Concepts are derived from managing data for
    environmental projects over the past 25 years.
  • Variations of the concepts have been observed
    in these disciplines:
  • plant community research
  • impact assessment in marine systems
  • national acid rain surveys
  • Environmental monitoring and cleanup projects at
    DOE facilities
  • Military land use assessment
  • Climate change research (atmospheric research)
  • Ideas are freely traded with Dick Olson (ORNL)

3
Presentation Strategy
  • Motivation and concerns
  • Archive overview
  • Definition, components, functions, why / why not,
    examples
  • Archives and scale
  • Effects of scale
  • Mitigate scale effects
  • Generate and manage metadata
  • Future Archive issues to resolve

4
My Motivation and Concerns
The enemy is our behavior. Will we change or
whine???
  • Motivation
  • Describe observations about the effects of scale
    on Archives
  • Describe remedies to minimize scale effects
  • Minimize remedy pain
  • Concerns
  • Preaching to the choir!!
  • Nothing new will happen!!
  • Continuing unnecessary limits to future science!!

5
"You can't keep running in here and demanding
data every two years"
Challenge: engage scientists in the process of
archiving their data and provide the
mechanism for archiving.
Source: American Scientist, Vol. 88(6), p. 525.
6
Archives and Scale: Presumptions
  • Regional data live in Archives
  • Information sharing is important
  • The archiving can be improved
  • Archive neurons are metadata
  • Multidisciplinary data will foster broader
    ecological discoveries
  • The limited number of permanent data archives for
    ecological data will increase

7
What Is an Archive?
8
What Is a Data Archive?
  • A data archive is a permanent, electronic
    collection of datasets with accompanying metadata
    such that users of the data can acquire,
    understand, and use the data.
  • More than a long-term backup
  • More than an index or catalog with pointers to
    datasets stored elsewhere
  • For more details, see Michener, W. K. and J. W.
    Brunt. 2000. Ecological Data: Design,
    Management and Processing. Blackwell Science.
    180 pp.

9
Components of an Archive
  • Data and metadata
  • Storage devices
  • Information system
  • Network connections
  • Staff
  • Data/metadata preparation and review
  • Systems development and maintenance
  • User support

10
Archive Functions
  • Store data
  • Submitted by others
  • Build catalog and structure
  • Maintain storage across technology generations
  • Review new data (QA, metadata)
  • Advertise contents
  • Find data for users
  • Query and browse logic
  • Distribute data
  • Provide access to data
  • References to documentation

11
Data Centers at ORNL
  • CDIAC - Carbon Dioxide Information Analysis
    Center
  • ARM Archive - Atmospheric Radiation Measurement
    Program
  • ORNL DAAC - Distributed Active Archive Center for
    Biogeochemical Dynamics
  • NARSTO - tropospheric air pollution information
    for North America
  • OREIS - Oak Ridge Environmental Information System

12
Atmospheric Radiation Measurement (ARM) Program
  • ARM research questions
  • What happens to all of the sunlight energy?
  • How is light absorbed by clouds?
  • What does partly cloudy mean? Statistically?
    Spatially?
  • What types of clouds form? When and How?
  • ARM is a once in a lifetime research adventure
    for atmospheric scientists
  • ARM research includes instrumentation, system
    development, data analysis, and modeling (climate
    and process)

13
ARM Measurements Scope
All data collection is highly automated -- a
REAL BLAST!!
Data collection is now a peer outcome with
scientific discovery
14
ARM Archive
  • ARM Archive stores and provides access to the
    entire accumulation of data
  • Currently 5 million files and 14,000 GB and
    growing
  • The ARM data in the Archive will be accessed for
    research for many years (decades)
  • Currently distributes 50-100,000 files per month
    (100-200 GB)
  • More information
  • ARM Program: www.arm.gov
  • ARM Archive: www.archive.arm.gov
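
The holdings and distribution figures above imply a rough average file size. The following is a back-of-the-envelope sketch in Python, not an official ARM statistic. It assumes decimal units (1 GB = 1000 MB) and reads "50-100,000 files" as 50,000 to 100,000 files; the variable names are illustrative.

```python
# Back-of-the-envelope arithmetic using only the figures on this slide.
# Assumption: decimal units (1 GB = 1000 MB).
MB_PER_GB = 1000

holdings_files = 5_000_000   # files currently in the Archive
holdings_gb = 14_000         # GB currently in the Archive

# Average size of a stored file, in MB.
avg_stored_mb = holdings_gb * MB_PER_GB / holdings_files

# Monthly distribution, read as 50,000-100,000 files and 100-200 GB.
# Assumption: the low ends of both ranges go together, as do the high ends.
low_files, high_files = 50_000, 100_000
low_gb, high_gb = 100, 200
avg_sent_mb_low = low_gb * MB_PER_GB / low_files
avg_sent_mb_high = high_gb * MB_PER_GB / high_files

# Share of the file count that is distributed in one month.
monthly_share_low = low_files / holdings_files
monthly_share_high = high_files / holdings_files

print(f"average stored file: {avg_stored_mb:.1f} MB")
print(f"average distributed file: {avg_sent_mb_low:.1f}-{avg_sent_mb_high:.1f} MB")
print(f"monthly share of file count: {monthly_share_low:.0%}-{monthly_share_high:.0%}")
```

Under these assumptions the stored files average a few megabytes each, which is why the file count, rather than the raw volume, tends to dominate catalog design.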

15
ARM Archive Schematic: Archive Input and Output
[Diagram. Input side: other ARM systems, incoming
data files, Data Reception, Mass Storage System
(data files, backup), operations metadata, catalog
metadata. Output side: Archive web user interface,
query specifications (location, date, measurement),
Data Retrieval, file list, requested files, user
copy.]
16
Data Flow
Data
Metadata
User Interface
Network
Core archive functions
17
Why Archive??
I am doing Science. Trust me.
18
Cycles of Research: An Information View
[Cycle diagram. Labels: Problem Definition (Research
Objectives), Planning, Measurement Collection,
Automation and review, Archive of Data, Selection
and extraction, Analysis and modeling, Information
review, Publications, Original Observations,
Secondary Observations. Time scales shown: 20 yrs
and 200 yrs.]
19
Why Don't I Archive My Data?
  • No incentives - what's in it for me?
  • No acknowledgment - does a dataset count as a
    paper?
  • Give up publication rights - will somebody scoop
    me?
  • Poor planning - it was not in the Plan
  • No resources - who's going to pay for it?
  • Lack of training - what do I do first?
  • Unsure about metadata content - how much is
    enough?

20
Why Should I Archive My Data? (management hints!!)
  • Career advancement (give them credit)
  • you will get some recognition
  • you can publish data paper in ESA Ecological
    Archives
  • it may help me do science with broader scope
  • Professional incentives (give them training)
  • good scientific practice (create peer pressure)
  • Institutional incentives (have expectations)
  • required by the sponsor
  • Technological advances (give them systems)
  • it's easier and there are more options

21
Archiving Supports Science
  • Metadata required for archiving will improve data
    quality
  • Extends data usefulness
  • Increases your information base for doing
    research
  • data volume and diversity
  • Permits replication of results

A KEY concept of Science
22
The Effects of Project Scale on Archives
Metadata are archive neurons??
23
Metadata Depends on Your World View
  • Investigator
  • Doesn't need extensive formal metadata
  • Project
  • Metadata needed for project integration and
    modeling activities
  • Project data manager may help write metadata
  • Data archive
  • More detailed metadata (e.g., spatial
    coordinates)
  • More standardization (e.g., keywords) to
    communicate clearly with future users
  • Who writes the metadata?

24
(In the beginning, was the measurement. It was
formless and desolate. Without context)
Measurement
25
Single Experiment View
parameter name
Measurement
sample ID
location
date
26
Research Project View
parameter name
media
QA flag
Measurement
sample ID
location
date
27
Long-term or Multidisciplinary View
method
parameter name
Units
media
QA flag
Measurement
records
generator
sample ID
location
date
28
Integrated System Archive View
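[Diagram: the Measurement record with its attributes,
each linked to a definition or supporting system]
  • Measurement: method, parameter name, units, media,
    QA flag, records, generator, sample ID, location,
    date
  • Parameter def.: words, words, units, method
  • Method def.: lab, field
  • Units def.
  • QA def.: date, words, words
  • Record system
  • Sample def.: type, date, location, generator
  • GIS: coord., elev., type, depth
  • Organization details: org. type, name, custodian,
    address, etc.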
29
Another View of Scale
30
Project Scale and Recorded Metadata
[Diagram: recorded metadata plotted against
increasing user scope, with the levels PI, Group,
Program, and Archive]
  • Units
  • Method
  • QA flag
  • Media
  • Parameter name
  • Measurement
  • Date
  • Sample ID
  • Location
  • Generator
  • Records
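
One way to make the growth of metadata with scope concrete is to treat each view as a set of required fields. The following is a minimal sketch in Python, assuming the field sets follow the diagram labels of the Single Experiment, Research Project, and Long-term or Multidisciplinary views. The dictionary, function, and record names are hypothetical, and the sample values are made up.

```python
# Field sets taken from the diagram labels of the three views.
SINGLE_EXPERIMENT = {"measurement", "parameter name", "sample ID",
                     "location", "date"}
RESEARCH_PROJECT = SINGLE_EXPERIMENT | {"media", "QA flag"}
LONG_TERM = RESEARCH_PROJECT | {"method", "units", "records", "generator"}

# Hypothetical lookup from view name to its required fields.
VIEWS = {
    "single experiment": SINGLE_EXPERIMENT,
    "research project": RESEARCH_PROJECT,
    "long-term": LONG_TERM,
}


def missing_fields(record, view):
    """List the fields a record still lacks for the given view."""
    present = {name for name, value in record.items() if value is not None}
    return sorted(VIEWS[view] - present, key=str.lower)


# A made-up record that is complete for a single experiment only.
pi_record = {
    "measurement": 15.2,
    "parameter name": "air temperature",
    "sample ID": "S-001",
    "location": "plot 3",
    "date": "2001-10-31",
}

for view in VIEWS:
    print(view, "->", missing_fields(pi_record, view))
```

A record that satisfies its original investigator reports no gaps at the first level and a growing list at each wider scope, which is the effect of scale the diagram describes.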

31
Data Maturation and Scale
  • Individual Investigators
  • collect data, quality assure, document, analyze,
    publish
  • Groups or Science Teams
  • collate data, enhance, synthesize, model, publish
  • Project Information System
  • collate data, review completeness, maintain data
    for project
  • Data Distribution and Archive Center
  • long-term archive, distribute freely to users
  • Master Data Directory
  • searchable index with pointers to data

32
Preparing for Archiving
I will not wait. I will not wait. I will not
wait. I will not
33
Generic Environmental Data Model (Which Piece Is
First?)
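[Diagram: the Measurement record with its attributes,
each linked to a definition or supporting system]
  • Measurement: method, parameter name, units, media,
    QA flag, records, generator, sample ID, location,
    date
  • Parameter def.: words, words, units, method
  • Method def.: lab, field
  • Units def.
  • QA def.: date, words, words
  • Record system
  • Sample def.: type, date, location, generator
  • GIS: coord., elev., type, depth
  • Organization details: org. type, name, custodian,
    address, etc.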
34
Sequence of Information Birth
[Diagram: the same data model, showing the order in
which its pieces come into existence]
  • Measurement: method, parameter name, units, media,
    QA flag, records, generator, sample ID, location,
    date
  • Parameter def.: words, words, units, method
  • Method def.: lab, field
  • Units def.
  • QA def.: date, words, words
  • Record system
  • Sample def.: type, date, location, generator
  • GIS: coord., elev., type, depth
  • Organization details: org. type, name, custodian,
    address, etc.
35
Research Publishing Metadata
  • Metadata design can be a checklist for research
    planning
  • Metadata preparation can be integrated with
    publication process
  • Metadata are an investment in current and future
    science

36
Where to Archive Data?
37
Archive Choices
  • What determines your options?
  • Sponsor requirements
  • Repository access
  • Metadata requirements
  • Scalable storage
  • Personal web pages and files
  • Project or network data centers
  • Federal data centers
  • Links transcend storage structures
  • Master directory
  • Mercury

38
Personal Web Page
  • It's fun, rewarding, relatively easy, can share
    data quickly, can control access to data
  • Data issues??
  • complete metadata
  • QA checks
  • Connected to basic archival center functions??
  • ready access to data (24 h/d, 7 d/wk)
  • user support
  • data available on multiple media
  • secure, backed-up, long-term storage

39
ESA Ecological Archives
  • Publishing datasets as peer reviewed, citable
    papers (with volume and page numbers)
  • Data papers are announced in abstract form in a
    print journal with data available electronically
  • Citation example
  • Esser, G., H.F.H. Lieth, J.M.O. Scurlock and R.J.
    Olson. 2000. Osnabrück net primary productivity
    data set. (Ecological Archives data paper
    E081-011). Ecology 81, 1177-1177.
  • Bill Michener, Editor
  • http://esa.sdsc.edu/esapubs/Journals_main.htm

40
Master Data Directory
  • Provides search capability and pointers to a
    source of the data (Center does not archive data)
  • Maintains standard keywords/indices
  • Collects metadata from many sources
  • Examples
  • Global Change Master Directory (GCMD)
    http://gcmd.gsfc.nasa.gov
  • ORNL DAAC Mercury System http://mercury.ornl.gov

41
What is Mercury?
Mercury is used to assist an investigator with
documenting data and making these data available
to others.
1. The data provider uses the Metadata Editor to
create a metadata file containing links to the
data and documentation
2. Mercury harvests the metadata and builds an
index (the NASA / ORNL Metadata Index)
3. Users query the index
4. Full metadata are returned to the user,
including links back to the data provider
5. User links to the data provider's server
6. Data and documentation are downloaded
directly from the data provider
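
The harvest, index, and query steps can be illustrated with a small inverted index. This is a minimal sketch in Python of the general pattern only, not Mercury's actual implementation. The record fields, function names, and example.com-style URLs are hypothetical.

```python
from collections import defaultdict

# Hypothetical metadata records, standing in for files a data provider
# would create with a metadata editor. Each holds a link to the data.
RECORDS = [
    {"id": "ds1", "title": "Net primary productivity plots",
     "keywords": ["productivity", "vegetation"],
     "data_url": "http://provider-a.example/ds1"},
    {"id": "ds2", "title": "Soil carbon survey",
     "keywords": ["soil", "carbon"],
     "data_url": "http://provider-b.example/ds2"},
]


def harvest(records):
    """Build an inverted index mapping each term to record ids."""
    index = defaultdict(set)
    for rec in records:
        terms = rec["title"].lower().split()
        terms += [keyword.lower() for keyword in rec["keywords"]]
        for term in terms:
            index[term].add(rec["id"])
    return index


def query(index, records, term):
    """Return the full metadata records that match a single term."""
    by_id = {rec["id"]: rec for rec in records}
    matches = index.get(term.lower(), set())
    return [by_id[rec_id] for rec_id in sorted(matches)]


index = harvest(RECORDS)
hits = query(index, RECORDS, "carbon")

# The index stores only metadata; the data stay with the provider,
# so the user follows the returned link to download them.
print([hit["data_url"] for hit in hits])
```

The design point is that the directory never holds the data themselves: a query returns metadata with a pointer, and the download happens at the provider's server.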
42
Regional Archives
43
Sources of Regional Data
  • Carbon Dioxide Information Analysis Center
  • National Geophysical Data Center
  • National Environmental Satellite, Data, and
    Information Service
  • National Soils Data Access Facility
  • National Water Information System
  • Forest Inventory and Analysis
  • Breeding Bird Survey
  • Threatened and Endangered Species
  • Global Change Master Directory

44
NASA EOSDIS Distributed Active Archive Centers
45
Global scale, 280 parameters: surface,
atmospheric, fluxes
46
Future Issues to Resolve
  • Size, diversity, and longevity
  • Accommodating change
  • Teaching good practices

47
Issues: Size, Diversity, Longevity
  • Size
  • Online vs. Offline
  • Database vs. File structure
  • Multiple institutions
  • Too big for technology migration??
  • Diversity
  • Increased logic and documentation for finding
    data
  • Spatial distribution
  • Increased potential for uniqueness conflicts
  • Longevity
  • Too old to explain or decode
  • Too much evolution of methods and practices
  • Asynchronous change in data and metadata

48
Issues: Planning and Requirements
  • Plan for archiving early and ongoing
  • Avoids missing metadata
  • Avoids panic
  • Improves overall data quality and consistency
  • Consider the timing of requirements
  • Requirements
  • Standards: to be or not to be?
  • Documentation expectations
  • Accessibility

It's mine!! It's my data!! You CAN'T have it!!
49
Research Implies Change
[Cycle diagram. Labels: Research, Discovery, New
questions, New information requirements, repeat]
Not always true for other information systems
50
Issues: Accommodating Change
  • Change must be considered in the design
  • Things that will change
  • Access expectations
  • Logical hierarchy of information scope
  • New parameters
  • New disciplines
  • New study sites
  • New data sources or methods

51
Issues: More Changes
  • Unpredictable variation is no excuse!!
  • Often used as an excuse to avoid standards
  • Cannot avoid all of it, but try
  • Missing values will occur: plan ahead
  • Do not do: Temp, temp, t, T, temperature
  • Be clear, avoid ambiguity
  • Minimal observational intensity is no excuse!!
  • Quick study, no documentation??

The unexpected are rare and most valuable??
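
The warning about "Temp, temp, t, T, temperature" can be illustrated with a controlled vocabulary applied when data are loaded. This is a minimal sketch in Python, assuming a hypothetical alias table in which all five spellings mean the same parameter. In real data "t" or "T" might mean something else, such as time, so the mapping has to be decided per project.

```python
# Hypothetical alias table: one canonical parameter name and the
# variant spellings that should all map to it.
ALIASES = {
    "temperature": {"temp", "t", "temperature"},
}


def build_lookup(aliases):
    """Flatten the alias table into a variant -> canonical lookup."""
    lookup = {}
    for canonical, variants in aliases.items():
        for variant in variants:
            lookup[variant.lower()] = canonical
    return lookup


LOOKUP = build_lookup(ALIASES)


def canonical_name(raw):
    """Map a raw column label to its canonical parameter name.

    Unknown labels raise an error instead of passing through, so that
    new spellings are reviewed rather than silently archived.
    """
    key = raw.strip().lower()
    if key not in LOOKUP:
        raise KeyError(f"unknown parameter name: {raw!r}")
    return LOOKUP[key]


raw_names = ["Temp", "temp", "t", "T", "temperature"]
resolved = {canonical_name(name) for name in raw_names}
print(resolved)
```

Rejecting unknown labels is a deliberate choice here: it keeps unintended variation visible at load time, when the person who knows what the label means is still available.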
52
Rules for Creating Databases for Archiving
  • Unique occurrences
  • Each type of measurement is represented in a
    consistent way
  • Each measurement event is represented by only one
    value
  • Identifiers
  • Each value is associated with a parameter name
  • Each measurement value has a quality indicator
    and link to a method description
  • Place and time
  • Each value is associated with a unique place name
    with a quantitatively defined location
    (geographic coordinates)
  • Each value is associated with a date and time
  • Data Storage and Transport
  • Data are stored or managed with a database
    management system or equivalent
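
Several of these rules can be expressed as database constraints. The following is a minimal sketch using Python's built-in sqlite3 module, assuming a hypothetical schema. The table names, column names, and sample values are illustrative and are not from the slides.

```python
import sqlite3

# Hypothetical schema; each constraint mirrors one rule from the slide.
SCHEMA = """
CREATE TABLE parameter (
    name       TEXT PRIMARY KEY,      -- one consistent name per type
    definition TEXT NOT NULL
);
CREATE TABLE method (
    method_id   TEXT PRIMARY KEY,
    description TEXT NOT NULL
);
CREATE TABLE place (
    name      TEXT PRIMARY KEY,       -- unique place name
    latitude  REAL NOT NULL,          -- quantitatively defined location
    longitude REAL NOT NULL
);
CREATE TABLE measurement (
    parameter   TEXT NOT NULL REFERENCES parameter(name),
    place       TEXT NOT NULL REFERENCES place(name),
    measured_at TEXT NOT NULL,        -- date and time, ISO 8601 text
    value       REAL,                 -- NULL marks a missing value
    qa_flag     TEXT NOT NULL,        -- quality indicator
    method_id   TEXT NOT NULL REFERENCES method(method_id),
    UNIQUE (parameter, place, measured_at)  -- one value per event
);
"""


def open_archive_db():
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


conn = open_archive_db()
conn.execute("INSERT INTO parameter VALUES ('air_temperature', 'made-up definition')")
conn.execute("INSERT INTO method VALUES ('M1', 'made-up method description')")
conn.execute("INSERT INTO place VALUES ('site_A', 36.0, -84.3)")

row = ("air_temperature", "site_A", "2001-10-31T12:00:00", 15.2, "good", "M1")
conn.execute("INSERT INTO measurement VALUES (?, ?, ?, ?, ?, ?)", row)

# A second value for the same measurement event violates the UNIQUE rule.
try:
    conn.execute("INSERT INTO measurement VALUES (?, ?, ?, ?, ?, ?)", row)
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print("duplicate rejected:", duplicate_rejected)
```

Putting the rules in the schema means violations are caught when data are loaded, rather than discovered years later by an archive user.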

53
Best Practices for Preparing Ecological and
Ground-Based Data Sets to Share and Archive
  • Best Practices include
  • Assign descriptive file names
  • Use consistent and stable file formats
  • Define the parameters
  • Use consistent data organization
  • Perform basic quality assurance
  • Assign descriptive data set titles
  • Provide documentation
  • Published: Cook et al. 2001. Ecological
    Bulletin
  • http://www.daac.ornl.gov/DAAC/bestpractices.html

54
Reflecting Into the Future
55
Workshop Reactions
  • Distributed (sensor) processing
  • Yes / No
  • Automated QA
  • Getting data dirty
  • Metadata early
  • 10X easier, scalable
  • Differentiate standards
  • Intentional variance only
  • Partition / isolate exceptions when possible
  • Look for 3, 5, 10X changes
  • 20-30 not worthwhile

56
Summary Points
  • Archives need structure and standards
  • Social and education solutions VERY important
  • Metadata are the neurons of Archives
  • Metadata early better than late
  • Need to think about our choices.

57
Future Thoughts
  • Will we be able to know "Where are we?" in the
    information structure?
  • How many 30 KB files are on a 100 GB tape
    cartridge?
  • The future limits will not be technology
  • But our minds
  • We need to plan NOW how to best leverage the
    future
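
The question about 30 KB files on a 100 GB tape has a quick answer. This is a rough sketch in Python, assuming decimal units (1 KB = 1000 bytes, 1 GB = 1000**3 bytes) and ignoring any per-file or tape-format overhead.

```python
# Assumptions: decimal units and no per-file or tape-format overhead.
FILE_BYTES = 30 * 1000        # one 30 KB file
TAPE_BYTES = 100 * 1000**3    # one 100 GB tape cartridge

# Whole files that fit on one cartridge.
files_per_tape = TAPE_BYTES // FILE_BYTES

print(f"{files_per_tape:,} files per cartridge")
```

Under these assumptions a single cartridge holds a little over 3.3 million files, which shows why locating one file depends on the catalog rather than on the storage medium.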

58
A Future Scientist's View
  • I told my college-age daughter about the Japanese
    announcement of 1 TB of optical memory in 1 cubic
    centimeter.
  • Her reply:
  • "We need to know how to think critically and
    select what kinds of projects and data we need to
    keep because the limiting factor will be our
    minds, not the technology."

59
Looking Forward to a Future With Archives!!