MultiArchival Syndicated Storage Platform - PowerPoint PPT Presentation

1
Multi-Archival Syndicated Storage Platform
Bryan Beecher, University of Michigan
Director, Computing and Network Services
E: bryan_at_umich.edu
W: http://www.icpsr.umich.edu/ICPSR/staff/beecher.html/
Micah Altman, Harvard University
Archival Director, Henry A. Murray Research Archive
Associate Director, Harvard-MIT Data Center
Senior Research Scientist, Institute for Quantitative Social Sciences
E: micah_altman_at_harvard.edu
W: http://maltman.hmdc.harvard.edu/
2
This talk
  • Roadmap
  • Why replicate for preservation?
  • What institutional model for replication does
    Data-PASS use?
  • How do we build on LOCKSS to support these
    institutional needs?
  • Collaborators and Conspirators
  • Leonid Andreev, IQSS; Steve Burling, ICPSR;
    Jonathan Crabtree, Odum; Marc Maynard, Roper;
    Nancy McGovern, ICPSR

3
Nexuses for Preservation Failure
  • Technical
  • Media failure: storage conditions, media
    characteristics
  • Format obsolescence
  • Preservation infrastructure software failure
  • Storage infrastructure software failure
  • Storage infrastructure hardware failure
  • External Threats to Institutions
  • Third party attacks
  • Institutional funding
  • Change in legal regimes
  • Quis custodiet ipsos custodes?
  • Unintentional curatorial modification
  • Loss of institutional knowledge and skills
  • Intentional curatorial deaccessioning
  • Change in institutional mission

4
Replication as Part of a Multi-Institutional
Preservation Strategy
  • There are potential single points of failure in
    technology, organization, and legal regimes
  • Diversify your portfolio: multiple software
    systems, hardware, organization
  • Find diverse partners: diverse business models,
    legal regimes

http://failblog.org/2008/02/08/floppy-fail/
5
Preservation is impossible to demonstrate
conclusively
  • Consider organizational credentials
  • No organization is absolutely certain to be
    reliable
  • Consider the trust relationships across
    institutions

http://flickr.com/photos/phauly/35555985/
6
Data-PASS Requirements for SSP
  • Policy Driven
  • Institutional policy creates formal replication
    commitments
  • Replication commitments are described in
    metadata, using a schema
  • Metadata drives:
  • Configuration of replication network
  • Auditing of replication network

7
Requirements (more)
  • Asymmetric commitments: partners vary in
  • storage commitments to replication
  • size of holdings being replicated
  • what holdings of other partners they replicate

8
Requirements (more)
  • Completeness
  • Complete public holdings of each partner
  • Retain previous versions of holdings
  • Include
  • metadata
  • data
  • documentation
  • legal agreements

9
Requirements (more)
  • Restoration guarantees
  • Restore groups of versioned content
  • to owning archive
  • to replication hosts
  • Institutional failure restoration: transfer
    entire holdings of an archive to another

10
Requirements (more)
  • Trust Verification
  • Each partner is trusted
  • to hold the public content of others (and not to
    disseminate it improperly)
  • to add units to be harvested
  • No partner is trusted to be super-user
  • No deletion (or direct manipulation) of
    replication storage owned by another partner
  • Legal agreements reinforce trust model
  • Schema-based auditing is used to
  • verify replication guarantees are met
  • record replication and storage commitments
  • document related TRAC criteria

11
SSP Commitment Schema
  • Network level
  • Identification: name, description, contact,
    access point URI
  • Capabilities: protocol version, number of
    replicates maintained, replication frequency,
    versioning/deletion support
  • Human-readable documentation: restrictions on
    content that may be placed in the network;
    services guaranteed by the network; Virtual
    Organization policies relating to network
    maintenance
  • Host level
  • Identification: name, description, contact,
    access point URI
  • Capabilities: protocol version, storage available
  • Human-readable: terms of use; documentation of
    hardware, software and operating personnel in
    support of TRAC criteria
  • Archival unit level
  • Identification: name, description, contact,
    access point URI
  • Attributes: update frequency, plugin required for
    harvesting, storage required
  • Terms of use: required statement of content
    compliance with network terms; dissemination
    terms and conditions
  • TRAC Integration
  • A number of elements comprise documentation
    showing how the replication system itself
    supports relevant TRAC criteria
  • Other elements may be used to include text, or
    to reference external text, that documents
    evidence of compliance with TRAC criteria
  • Specific TRAC criteria are identified implicitly
    and can be identified explicitly with attributes
  • Schema documentation describes each element's
    relevance to TRAC and its mapping to particular
    TRAC criteria
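
To make the three schema levels concrete, here is a rough sketch of how a schema instance might be modeled in memory. The slides do not give the actual schema syntax, so every class name, field name, unit, and value below is an assumption made for illustration; only the grouping into network, host, and archival unit (AU) levels comes from the slide.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Identification:
    """Identification block shared by all three levels."""
    name: str
    description: str
    contact: str
    access_point_uri: str


@dataclass
class ArchivalUnit:
    ident: Identification
    update_frequency_days: int   # attribute: update frequency
    plugin: str                  # attribute: plugin required for harvesting
    storage_required_gb: float   # attribute: storage required
    terms_of_use: str


@dataclass
class Host:
    ident: Identification
    protocol_version: str
    storage_available_gb: float
    terms_of_use: str


@dataclass
class Network:
    ident: Identification
    protocol_version: str
    replicas_maintained: int
    replication_frequency_days: int
    hosts: List[Host] = field(default_factory=list)
    archival_units: List[ArchivalUnit] = field(default_factory=list)

    def storage_committed_gb(self) -> float:
        """Storage needed to hold every AU at the promised replica count."""
        per_copy = sum(au.storage_required_gb for au in self.archival_units)
        return per_copy * self.replicas_maintained

    def storage_available_gb(self) -> float:
        """Total storage offered by all hosts."""
        return sum(h.storage_available_gb for h in self.hosts)


def _ident(name: str) -> Identification:
    # Placeholder identification values; example.org is not a real endpoint.
    return Identification(name, "example", "admin@example.org",
                          "http://example.org/" + name)


net = Network(
    ident=_ident("demo-network"),
    protocol_version="1",
    replicas_maintained=3,
    replication_frequency_days=30,
    hosts=[Host(_ident("host-%d" % i), "1", 100.0, "see agreement")
           for i in (1, 2, 3)],
    archival_units=[
        ArchivalUnit(_ident("au-a"), 30, "example-plugin", 40.0, "public"),
        ArchivalUnit(_ident("au-b"), 90, "example-plugin", 10.0, "public"),
    ],
)
```

A simple sanity check on such an instance is that `net.storage_committed_gb()` does not exceed `net.storage_available_gb()`, which is the kind of question the schema is meant to make answerable.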

12
Main SSP Use Cases
  • Initialization: given a schema instance, distribute
    AU harvesting responsibility to hosts
  • Auditing: does the current host harvesting allocation
    history match the replication commitment in the schema?
  • Recovery of hosts
  • Deliver AU content to source archive
  • Addition of AUs, hosts
  • Growth of AUs over initial commitment
  • Assumptions
  • Nothing is deleted
  • Resources in network grow monotonically
  • Off-the-path behavior is detected
    automatically, resolved manually
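
The initialization use case, distributing AU harvesting responsibility to hosts, could be sketched as a greedy allocation. This is an illustrative assumption rather than the method used by SSP or LOCKSS: the `allocate` function, the largest-first ordering, and the example sizes are all hypothetical.

```python
from typing import Dict, List


def allocate(au_sizes: Dict[str, float],
             host_capacity: Dict[str, float],
             replicas: int) -> Dict[str, List[str]]:
    """Assign each AU to `replicas` distinct hosts without exceeding capacity.

    Greedy heuristic: place the largest AUs first, each on the hosts that
    currently have the most free space.
    """
    free = dict(host_capacity)
    plan: Dict[str, List[str]] = {}
    # Largest AU first; ties broken by name so the result is deterministic.
    for au, size in sorted(au_sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        candidates = sorted((h for h in free if free[h] >= size),
                            key=lambda h: (-free[h], h))
        if len(candidates) < replicas:
            # Off-the-path case: report it and leave resolution to a person.
            raise ValueError("cannot place %d replicas of %s" % (replicas, au))
        chosen = candidates[:replicas]
        for h in chosen:
            free[h] -= size
        plan[au] = sorted(chosen)
    return plan


plan = allocate(
    au_sizes={"au-a": 40, "au-b": 25, "au-c": 10},
    host_capacity={"host-1": 100, "host-2": 60, "host-3": 50},
    replicas=2,
)
```

Because the stated assumptions are that nothing is deleted and resources only grow, adding AUs or hosts later can extend an existing plan rather than recompute it; the sketch above covers only the initial distribution.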

13
DRAFT Use Case: Initialization
14
DRAFT Use Case: Auditing
15
Our approach: LOCKSS
  • LOCKSS
  • CLOCKSS
  • Very easy to build and deploy
  • 5 minutes
  • Very easy to plug into public LOCKSS network
  • 5 minutes
  • Very easy to manage thereafter: it is basically
    an appliance
  • Also easy to set up
  • Grouping your CLOCKSS devices into a private
    network and paring the 20k-line configuration
    file down to the right 200-line configuration
    file is not
  • You are managing a network now, not a device

16
SSP LOCKSS Technology Integration
  • Standard LOCKSS used for
  • Harvesting
  • Recovery
  • New LOCKSS bulk update mechanism used for
  • Initial configuration
  • Adding AUs
  • CLOCKSS mechanisms (certificates, cache monitor)
  • Content delivery
  • Optimize recovery
  • Auditing
  • Data-PASS customizations for schema processing
  • Translating schema instance into bulk update
    requests
  • Reporting on compliance based on cache monitor
    database
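
Compliance reporting amounts to comparing what the schema promises with what the hosts actually hold. The sketch below assumes the observed state has already been read out of the cache monitor database into a plain mapping; the slides do not describe that database, so the `observed` structure and the `audit` function are hypothetical stand-ins.

```python
from typing import Dict, List, Set


def audit(committed: Dict[str, int],
          observed: Dict[str, Set[str]]) -> List[str]:
    """Compare replica commitments with observed holdings.

    committed: AU name -> number of replicas the schema promises
    observed:  AU name -> set of hosts currently holding that AU
    Returns human-readable findings; an empty list means compliant.
    """
    findings: List[str] = []
    for au, required in sorted(committed.items()):
        present = len(observed.get(au, set()))
        if present < required:
            findings.append("%s: %d of %d replicas present"
                            % (au, present, required))
    # Content held by hosts but absent from the schema is also off-the-path.
    for au in sorted(set(observed) - set(committed)):
        findings.append("%s: harvested but not in schema" % au)
    return findings


report = audit(
    committed={"au-a": 2, "au-b": 2},
    observed={"au-a": {"host-1", "host-2"},
              "au-b": {"host-1"},
              "au-z": {"host-3"}},
)
```

Findings are only reported, not repaired, which matches the stated assumption that off-the-path behavior is detected automatically and resolved manually.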

17
Progress so far
  • Summer 2007: Attended the MetaArchive LOCKSS
    tutorial
  • Very good overview of LOCKSS
  • Summer 2007: SSP system requirements developed
    and approved
  • Winter 2007: First public LOCKSS network nodes
    built at two Data-PASS sites
  • Winter 2007: SSP Replication Commitment Schema
    developed
  • Spring 2008: Completed test harvest of MRA
    collection into LOCKSS
  • Spring 2008: SSP system use cases developed
  • Spring 2008: Prototype plugin developed to
    harvest Dataverse Networks
  • Spring 2008: Data-PASS sites joined into a
    single Private LOCKSS Network (PLN)
  • Spring 2008: Met with LOCKSS developers to review
    use cases
  • SSP will leverage functionality in the works by
    the LOCKSS team

18
Data-PASS PLN as of June 2008
19
Summary
  • Replication ameliorates institutional risks to
    preservation
  • Data-PASS requires policy-based, auditable,
    asymmetric replication commitments
  • Formalize policy in a schema
  • (Re)configure and audit LOCKSS using the schema
  • Replication uses standard LOCKSS mechanisms