Title: Preservation Environment Working Group
1 Preservation Environment Working Group
- Officers
  - Bruce Barkstrom (NASA Langley)
  - Reagan Moore (SDSC)
- Goals
  - Demonstrate interoperability between multiple preservation environments that are based on data grid technology
- Interactions with Astro Working Group
  - IVOA preservation working group
  - Define standards for preservation of astronomy collections
    - Sustainability
    - Governance
    - Preservation authenticity, integrity, infrastructure independence
  - Standards
    - FITS data format
    - UCD semantics
    - Hyperatlas plates
    - IVOA access services
2 Intellectual Property Policy
- I acknowledge that participation in GGF8 is subject to the GGF Intellectual Property Policy.
- Intellectual Property Notices Note Well: All statements related to the activities of the GGF and addressed to the GGF are subject to all provisions of Section 17 of GFD-C.1 (.pdf), which grants to the GGF and its participants certain licenses and rights in such statements. Such statements include verbal statements in GGF meetings, as well as written and electronic communications made at any time or place, which are addressed to
  - the GGF plenary session,
  - any GGF working group or portion thereof,
  - the GFSG, or any member thereof on behalf of the GFSG,
  - the GFAC, or any member thereof on behalf of the GFAC,
  - any GGF mailing list, including any working group or research group list, or any other list functioning under GGF auspices,
  - the GFD Editor or the GWD process
- Statements made outside of a GGF meeting, mailing list or other function, that are clearly not intended to be input to a GGF activity, group or function, are not subject to these provisions.
- Excerpt from Section 17 of GFD-C.1: Where the GFSG knows of rights, or claimed rights, the GGF secretariat shall attempt to obtain from the claimant of such rights, a written assurance that upon approval by the GFSG of the relevant GGF document(s), any party will be able to obtain the right to implement, use and distribute the technology or works when implementing, using or distributing technology based upon the specific specification(s) under openly specified, reasonable, non-discriminatory terms. The working group or research group proposing the use of the technology with respect to which the proprietary rights are claimed may assist the GGF secretariat in this effort. The results of this procedure shall not affect advancement of document, except that the GFSG may defer approval where a delay may facilitate the obtaining of such assurances. The results will, however, be recorded by the GGF Secretariat, and made available. The GFSG may also direct that a summary of the results be included in any GFD published containing the specification.
- GGF Intellectual Property Policies are adapted from the IETF Intellectual Property Policies that support the Internet Standards Process.
3 Preservation Components
- Authenticity - manage links to preservation metadata
  - Data grid
  - OGSA naming / OGSA DAIS / Information Dissemination / DFDL
- Integrity - assure data and metadata are not corrupted, track chain of custody, manage access controls, update state information
  - Data grid
  - OGSA naming / OGSA DAIS / Grid File Systems / OGSA Data / Grid Information Retrieval / OGSA Authorization
- Infrastructure independence - assure that no dependencies are introduced on use of a particular vendor product
  - Data grid
  - Grid File Systems / DFDL / OGSA Data Replication / Grid Storage Management / GridFTP / Transaction Management / OGSA Data / Grid Remote Procedure Call
4 Preservation Approach
- Standard semantics
  - IVOA - Uniform Content Descriptors
- Standard data encoding format
  - IVOA - FITS file
- Standard access services
  - IVOA - Cone Search, Simple Image Access Protocol, Simple Spectrum Access Protocol, VOEvent notification, Mosaic service
- Standard validation services
  - FITS header validation - correct coordinate information
  - Hyperatlas standard plates - re-project pixels to standard plate
- Federation across independent systems
  - Address sustainability by replicating across sustainability models
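The "FITS header validation - correct coordinate information" service can be illustrated with a minimal sketch. It assumes the header has already been read into a plain Python dict and checks only the standard FITS world-coordinate keywords (CTYPEn, CRVALn, CRPIXn). The function name and the exact set of checks are hypothetical and are not taken from any IVOA service.

```python
# Hypothetical helper, not part of any IVOA or FITS library.
REQUIRED_KEYS = ("CTYPE1", "CTYPE2", "CRVAL1", "CRVAL2", "CRPIX1", "CRPIX2")


def validate_coordinates(header):
    """Return a list of problems; an empty list means the header passed."""
    problems = [f"missing keyword {key}" for key in REQUIRED_KEYS if key not in header]
    if problems:
        return problems
    for axis in ("1", "2"):
        ctype = str(header["CTYPE" + axis])
        value = header["CRVAL" + axis]
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"CRVAL{axis} is not numeric")
        elif ctype.startswith("RA") and not 0.0 <= value < 360.0:
            problems.append(f"CRVAL{axis} = {value} is outside the right ascension range")
        elif ctype.startswith("DEC") and not -90.0 <= value <= 90.0:
            problems.append(f"CRVAL{axis} = {value} is outside the declination range")
    return problems


# Illustrative header values (made up for this example).
header = {
    "CTYPE1": "RA---TAN", "CTYPE2": "DEC--TAN",
    "CRVAL1": 210.5, "CRVAL2": -12.25,
    "CRPIX1": 256.0, "CRPIX2": 256.0,
}
print(validate_coordinates(header))  # prints []
```

A production validator would also need to check the pixel scale (CDELTn or the CD matrix) and non-equatorial coordinate systems, which this sketch ignores.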
5 Data Grids as Basis for Preservation
- Authenticity mechanisms
  - Link images to preservation metadata
  - Provenance information for source of image (FITS header extraction)
  - Descriptive information - UCDs
- Integrity mechanisms
  - Chain of custody - tracking where images have been stored
  - Audit trail - tracking operations performed on images
  - Persistent name spaces for users, files, metadata
  - Checksums
  - Replicas
  - Validation of checksums, synchronization of replicas
  - Federation - managing integrity across independent data grids
- Infrastructure independence
  - Ability to migrate archives onto new technology
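The chain of custody, audit trail, and persistent name space listed above can be pictured with a small in-memory sketch. The class, its fields, and the storage URLs are all hypothetical; a real data grid keeps this state in a metadata catalog rather than in a Python object.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class PreservedImage:
    logical_name: str                                # persistent name, never changes
    replicas: set = field(default_factory=set)       # where copies live now
    custody: list = field(default_factory=list)      # every location ever used, in order
    audit_trail: list = field(default_factory=list)  # (time, user, operation, detail)

    def _log(self, user, operation, detail):
        self.audit_trail.append((datetime.now(timezone.utc), user, operation, detail))

    def store(self, user, location):
        """Register a new replica and record it in the chain of custody."""
        self.replicas.add(location)
        self.custody.append(location)
        self._log(user, "store", location)

    def migrate(self, user, old, new):
        """Move a replica to new technology; the logical name is untouched."""
        self.replicas.remove(old)  # raises KeyError if 'old' is not a replica
        self.replicas.add(new)
        self.custody.append(new)
        self._log(user, "migrate", f"{old} -> {new}")


image = PreservedImage("/archive/2mass/plate-0001.fits")
image.store("archivist-a", "tape://silo-1/0001")
image.migrate("archivist-b", "tape://silo-1/0001", "disk://array-7/0001")
print(image.custody)
```

Because users refer only to the logical name, the migration changes where the bytes live without changing how the image is identified, which is the sense of infrastructure independence used on this slide.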
6 NOAO Preservation - Irene Barg
- Federated SRB data grids
- Goals
  - Replicate images
  - Deposit into an archive
  - Maintain availability
  - Capture data daily
- Implementation
  - Federation of data grids
  - Pull environment
  - Reliable transport
  - Preservation environment
  - Separate data grid
  - Reliable storage
  - Archive
7 Sustainability - Federation of Federations
GGF Data Grid Interoperability Demonstration
8 Preservation at Scale
- Creation of standard plates for publication in a Hyperatlas - Roy Williams (Caltech)
  - Used Montage mosaic code developed at IPAC/Caltech (John Good)
  - Created mosaics by re-projecting 4,121,440 images from the 2MASS archive of 8 TB that had been replicated to the Teragrid.
  - Because of overlap, required manipulating 6,275,494 files, and 14 TB of data.
  - Processing time was over 100,000 CPU-hours on the Teragrid.
  - Each mosaic covered a 6 degree square
  - Tiled each mosaic into a 12x12 array
  - Registered plates into the Hyperatlas
- Advantages
  - Standard projection
  - Ability to composite images for improved signal to noise ratio
  - Incorporated domain knowledge in generation of the standard product
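The figures on this slide imply a few derived quantities, worked out below. The tile size assumes the 12x12 tiles divide the 6 degree mosaic evenly with no overlap between tiles, which the slide does not state. The CPU time per image is a lower bound, because the slide gives the total only as "over 100,000" CPU-hours.

```python
# Figures quoted on the slide.
MOSAIC_SIDE_DEG = 6.0
TILES_PER_SIDE = 12
images = 4_121_440
files_handled = 6_275_494
input_tb, processed_tb = 8, 14
cpu_hours = 100_000  # lower bound

# Derived quantities (assuming evenly divided, non-overlapping tiles).
tile_side_deg = MOSAIC_SIDE_DEG / TILES_PER_SIDE   # 0.5 degree per tile
tiles_per_mosaic = TILES_PER_SIDE ** 2             # 144 tiles per mosaic
file_overlap_factor = files_handled / images       # about 1.52 files per image
data_overlap_factor = processed_tb / input_tb      # 1.75
cpu_seconds_per_image = cpu_hours * 3600 / images  # about 87 s, at least

print(tile_side_deg, tiles_per_mosaic)
print(round(file_overlap_factor, 2), data_overlap_factor)
print(round(cpu_seconds_per_image))
```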
9 Collection-based Approach
- Authenticity - assertions made by creator of records
  - Provenance metadata
  - Descriptive metadata
  - Encapsulation of metadata with data in an Archival Information Package
- Validation of consistency between authenticity metadata and stored data
  - Verify data file exists for each metadata record
  - Verify for each stored data file, a metadata record exists
- Validation of provenance metadata
  - Verify consistency of defined metadata attributes across all records
  - Verify preservation consistency constraints (a record appears only once)
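The four verification steps above reduce to set comparisons between the metadata catalog and the stored files. The sketch below assumes each metadata record is a dict with a "file" key naming its data file, and it interprets "consistency of defined metadata attributes" as every record carrying the same attribute names. Both are assumptions made for illustration.

```python
from collections import Counter


def check_collection(metadata_records, stored_files):
    """Compare metadata records against stored files and report inconsistencies."""
    names = [record["file"] for record in metadata_records]
    catalogued = set(names)
    stored = set(stored_files)
    # Attribute names of the first record serve as the reference schema.
    expected = set(metadata_records[0]) if metadata_records else set()
    return {
        "records_without_file": sorted(catalogued - stored),
        "files_without_record": sorted(stored - catalogued),
        "duplicate_records": sorted(n for n, count in Counter(names).items() if count > 1),
        "inconsistent_attributes": sorted(
            record["file"] for record in metadata_records if set(record) != expected
        ),
    }


# Illustrative records and files (made up for this example).
records = [
    {"file": "a.fits", "creator": "survey", "date": "2003-01-01"},
    {"file": "b.fits", "creator": "survey", "date": "2003-01-02"},
    {"file": "b.fits", "creator": "survey", "date": "2003-01-02"},
    {"file": "c.fits", "creator": "survey"},
]
report = check_collection(records, ["a.fits", "b.fits", "d.fits"])
print(report)
```

An archive passes validation only when all four lists in the report are empty.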
10 Collection-based Approach
- Authenticity
  - Validation of assertions about the collection
  - Characterization of assertions as management policies
  - Mapping of management policies to executable rules
  - Specification of state information on which the rules operate
  - Specification of state information to manage rule outcomes
- Implementation (granularity of application: type of rule)
  - Enterprise: setting of rule parameters
  - Archives: aperiodic rule
  - Collection: periodic rules
  - Record: atomic rules
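One way to read the granularity list is as a layering of rules, sketched below for a single made-up policy: "every record must have been verified within a set number of days". The policy, the parameter name, and the record fields are all hypothetical. The sketch only shows how an enterprise-level parameter, an atomic record rule, a periodic collection rule, and an aperiodic archive rule could fit together.

```python
# Enterprise level: sets the rule parameters.
ENTERPRISE_PARAMS = {"max_days_since_check": 365}


def atomic_rule(record, params):
    """Record level: evaluate one record and store the outcome as state."""
    ok = record["days_since_check"] <= params["max_days_since_check"]
    record["rule_outcome"] = "ok" if ok else "recheck"
    return ok


def periodic_rule(collection, params):
    """Collection level: meant to run on a schedule over every record."""
    return [record["name"] for record in collection if not atomic_rule(record, params)]


def aperiodic_rule(archive, params):
    """Archive level: run on demand, for example after a migration."""
    return {name: periodic_rule(collection, params) for name, collection in archive.items()}


# Illustrative archive contents (made up for this example).
archive = {
    "2mass": [
        {"name": "a.fits", "days_since_check": 10},
        {"name": "b.fits", "days_since_check": 400},
    ],
    "plates": [{"name": "p1.fits", "days_since_check": 365}],
}
print(aperiodic_rule(archive, ENTERPRISE_PARAMS))
```

The "days_since_check" field is the state information the rule operates on, and "rule_outcome" is the state information that records the result.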
11 Collection-based Approach
- Integrity - assertions made by archivists that both the data and metadata are uncorrupted, the chain of custody can be tracked, all actions are performed by identified persons, and the risk of data loss has been minimized
- Requires mechanisms for
  - Checksums - checks based on file size, System5 checksum, MD5 checksum
  - Replicas, backups, versions
  - Synchronization - between replicas, between system buffers and storage, between archives and local storage
  - Federation - replication of both metadata and data, while coordinating name spaces
  - Authentication - unique identity for archivists independently of storage system
  - Authorization - access controls managed independently of storage system
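The checksum and synchronization mechanisms can be sketched with the Python standard library. The sketch covers two of the checks named above, file size and MD5, and assumes the expected values were recorded when the file was registered. The function names are hypothetical, and the System5 checksum is not implemented here.

```python
import hashlib
import os


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replica_is_intact(path, expected_size, expected_md5):
    """Cheap size check first, then the MD5 comparison."""
    if os.path.getsize(path) != expected_size:
        return False
    return md5_of(path) == expected_md5


def stale_replicas(replica_paths, expected_size, expected_md5):
    """Return the replicas that no longer match the registered values."""
    return [
        path for path in replica_paths
        if not replica_is_intact(path, expected_size, expected_md5)
    ]
```

Replicas returned by `stale_replicas` are the ones a synchronization step would need to overwrite from an intact copy.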
12 Implementations
- NARA
- Research prototype persistent archive
- Electronic Records Archive
- Persistent Archive Testbed
- SDSC
- NSDL persistent archive
- CDL Digital Preservation Repository
- NASA Langley
- Archive Next Generation - ANGe
- Taiwan
- Caspar / Digital Curation Centre
- Diligent
13 Preservation Services
- Appraisal
  - DAIS / Grid File Systems
- Accession
  - GridFTP / Grid File Systems / DAIS / Transaction Management / OGSA Data / OGSA Naming
- Description
  - DAIS / OGSA Naming / DFDL / Transaction Management
- Arrangement
  - Grid File Systems / DAIS
- Preservation
  - Grid File Systems / Grid Storage Management / OGSA Data Replication / GridFTP / Transaction Management / OGSA Naming
- Access
  - DAIS / DFDL / Grid File Systems / GridFTP / Transaction Management
14 Propose Preservation Demonstration
- Formal validation of existing archives
  - Consistency between metadata and stored data
  - Verification of name space integrity
- Formal extraction of records
  - Bulk operations to extract metadata
- Formal deposition of records into a federated data grid
  - Federation with a second data grid
  - Bulk operations to load metadata and data into remote data grid
- Formal validation of new archives
  - Consistency between metadata and stored data
  - Verification of name space integrity
- Formal export of records from the new archive and import back into the original archives, without loss of authenticity or integrity
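The round trip proposed above can be mocked up in memory to show what "without loss of authenticity or integrity" would mean as a test. In this sketch an archive is a dict mapping a logical name to a pair of data bytes and a metadata dict, and a manifest records an MD5 checksum plus the metadata for each name. These structures and function names are hypothetical stand-ins for two federated data grids, not an actual grid API.

```python
import hashlib


def manifest(archive):
    """Summarise an archive as {logical name: (checksum, sorted metadata items)}."""
    return {
        name: (hashlib.md5(data).hexdigest(), tuple(sorted(meta.items())))
        for name, (data, meta) in archive.items()
    }


def bulk_copy(source):
    """Stand-in for a bulk extract from one grid and load into another."""
    return {name: (bytes(data), dict(meta)) for name, (data, meta) in source.items()}


# Illustrative archive contents (made up for this example).
original = {
    "/archive/a.fits": (b"image-a", {"creator": "survey", "date": "2003-01-01"}),
    "/archive/b.fits": (b"image-b", {"creator": "survey", "date": "2003-01-02"}),
}

before = manifest(original)               # formal validation of the existing archive
remote = bulk_copy(original)              # deposition into the second data grid
remote_ok = manifest(remote) == before    # formal validation of the new archive
restored = bulk_copy(remote)              # export and import back
round_trip_ok = manifest(restored) == before
print(remote_ok, round_trip_ok)
```

Comparing manifests checks integrity (checksums), authenticity metadata, and the name space (the set of logical names) in one step. A real demonstration would run the same comparison against the catalogs of the two data grids.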