ERA Project: Research Testbed and Related Outcomes

1
ERA Project: Research Testbed and Related Outcomes
  • Joseph JaJa, Mike Smorul, and Sangchul Song
  • Institute for Advanced Computer Studies
  • Department of Electrical and Computer Engineering
  • University of Maryland, College Park

2
Background
  • Started as an ERA project focusing on setting up
    and testing a distributed archiving
    infrastructure.
  • Outcomes include the development of archiving
    tools and services that are scalable and platform
    independent.
  • Complementary research efforts have been more
    recently supported by NSF, Library of Congress,
    and the Mellon Foundation.

3
Transcontinental Persistent Archive Prototype
(TPAP): ERA Project
  • Partnership between NARA, San Diego Supercomputer
    Center, and the University of Maryland.
  • A distributed testbed built on a set of
    heterogeneous grid bricks linked by the SRB data
    grid technology.
  • Outcomes: scalable, platform-independent tools
    and technologies tested and evaluated over TPAP.

4
Outcomes of the ERA Research Testbed
  • Empirical testing and evaluation of technologies
    using extensive NARA-selected collections.
  • PAWN: a flexible software environment for ingestion
    and for handling producer-archive interactions,
    developed in extensive collaboration with NARA.
  • Release of Version 1.0 of ACE, a tool for
    policy-driven auditing to ensure the long-term
    authenticity of the digital holdings of an archive.
  • Tracking and monitoring tool for the digital
    holdings of an archive, part of the ERA TPAP
    Project.

5
Overall Methodology: ADAPT
  • Layered digital object architecture and a set of
    modular tools built using open standards and web
    technologies.
  • Can easily accommodate emerging standards and
    policies.
  • Will evolve gracefully as the underlying
    technologies change.
  • Evaluation and demonstration of tools on widely
    different collections.

6
Software Developed and Tested on TPAP

7
PAWN: Producer Archive Workflow Network, a
Collaborative Effort with NARA
  • Software that provides a flexible and
    customizable ingestion framework
  • Handles the process in a reliable and secure
    fashion, from package assembly to archival
    storage
  • Simple interface for end-users
  • Flexible interface for archive managers
  • Designed for use in multiple contexts

8
Overall Organization
  • Producers organized into domains, each domain
    contains a transfer agreement negotiated with the
    archive.
  • Each domain contains a hierarchical organization
    of data grouped into record sets/templates
    (convenient groupings from the transfer
    agreement).
  • An end-user operates within a domain with record
    sets associated with the account.
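
The organization above can be pictured as a small data model. The following is a minimal sketch only: the class names, fields, and the `visible_record_sets` helper are hypothetical illustrations and are not taken from PAWN itself.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Set


@dataclass
class RecordSet:
    """A node in the hierarchical grouping of data inside a domain."""
    name: str
    children: List["RecordSet"] = field(default_factory=list)

    def walk(self) -> Iterator["RecordSet"]:
        # Yield this record set, then every nested record set, depth first.
        yield self
        for child in self.children:
            yield from child.walk()


@dataclass
class Domain:
    """A producer domain with its transfer agreement (hypothetical model)."""
    name: str
    transfer_agreement: str  # identifier of the negotiated agreement
    record_sets: List[RecordSet] = field(default_factory=list)
    # account name -> names of the record sets associated with that account
    accounts: Dict[str, Set[str]] = field(default_factory=dict)

    def visible_record_sets(self, account: str) -> List[str]:
        # An end-user only sees record sets tied to their own account.
        allowed = self.accounts.get(account, set())
        return [rs.name
                for top in self.record_sets
                for rs in top.walk()
                if rs.name in allowed]
```

An account with no associated record sets simply sees an empty list, which mirrors the idea that an end-user operates only within the record sets tied to the account.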

9
Producer-Archive Agreement

10
Package Workflow Overview
  • Create the Producer-Archive Agreement and the
    client package template.
  • Create a package based on the template.
  • Once approved, packages can be archived.
  • Rejected packages can be held until rectified or
    deleted for resubmission.
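
The workflow above reads naturally as a small state machine. The sketch below is one possible reading, not PAWN's actual implementation: the state names, the `ALLOWED` table, and the `advance` function are all assumptions made for illustration.

```python
from enum import Enum


class State(Enum):
    SUBMITTED = "submitted"
    APPROVED = "approved"
    REJECTED = "rejected"
    HELD = "held"
    ARCHIVED = "archived"
    DELETED = "deleted"


# Hypothetical transition table derived from the bullets above.
ALLOWED = {
    State.SUBMITTED: {State.APPROVED, State.REJECTED},
    State.APPROVED: {State.ARCHIVED},
    State.REJECTED: {State.HELD, State.DELETED},
    State.HELD: {State.SUBMITTED},  # rectified, then sent back for review
    State.ARCHIVED: set(),
    State.DELETED: set(),
}


def advance(current: State, target: State) -> State:
    """Return the new state, or raise if the transition is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(
            f"cannot move from {current.value} to {target.value}")
    return target
```

Keeping the transitions in a single table makes it easy to check that a package can never reach archival storage without passing through approval.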

11
Customizable Components
  • Definable Roles
      • Actions in PAWN can be grouped to create
        arbitrary types of users
  • Flexible Approval Requirements
      • Signature requirements can be placed on parts
        of a package.
  • Automated Processing
      • API for creating processes to validate,
        transform, approve, or publish items in a
        package
      • Processes can be invoked manually or
        automatically
      • Processes may have dependencies on item
        approval
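
To make the role and process ideas above concrete, here is a minimal sketch. It does not use the PAWN API: the `ROLES` table, the `Process` class, and the `run_automatic` function are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple

# A role is just a named group of actions (hypothetical action names).
ROLES: Dict[str, Set[str]] = {
    "submitter": {"create_package", "add_item"},
    "reviewer": {"approve_item", "reject_item"},
}


def can(role: str, action: str) -> bool:
    """True if the role's action group includes the action."""
    return action in ROLES.get(role, set())


@dataclass
class Process:
    name: str
    run: Callable[[str], str]     # takes an item id, returns a result
    needs_approval: bool = False  # dependency on item approval
    automatic: bool = True        # False means manual invocation only


def run_automatic(processes: List[Process], item: str,
                  approved: bool) -> List[Tuple[str, str]]:
    """Run automatic processes whose approval dependency is satisfied."""
    results = []
    for p in processes:
        if not p.automatic:
            continue  # left for a user to invoke by hand
        if p.needs_approval and not approved:
            continue  # dependency on approval not yet met
        results.append((p.name, p.run(item)))
    return results
```

With this shape, a validation step can run as soon as an item arrives, while a publishing step waits until the item has been approved.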

12
PAWN Summary
  • Flexible environment to handle ingestion between
    many producers and an archive.
  • Very little effort for producers to push their
    data into the archive.
  • Granular workflow definition, from fully
    automated to completely manual.
  • Easy to include new standards (metadata,
    packaging, ...).
  • Tested extensively in TPAP environment.
  • Interest from different communities including
    NDIIPP.

13
ACE: Auditing Control Environment
  • Software to protect the integrity of digital
    assets in the long term, in the face of:
      • Hardware/media degradation
      • Security breaches, malicious alterations
      • Infrequent access to most data
      • Evolution of cryptographic schemes
  • Underpinnings are based on rigorous cryptographic
    techniques.
  • Scalable, cost-effective, and can interoperate
    with any archiving architecture.

14
ACE Basic Methodology
  • Three-tiered cryptographic information:
      • An integrity token (IT) for each digital
        object is generated upon its deposit into the
        archive (1 KB per object).
      • Cryptographic summary information (CSI) is
        periodically computed over the generated
        integrity tokens (100 MB/year).
      • Very compact cryptographic summaries
        (witnesses) are generated periodically
        (2-3 KB/year).
  • Each tier is periodically audited separately
    according to policies set by managers.
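
The three tiers above can be illustrated with a toy example. This is a simplified sketch under stated assumptions, not the ACE implementation: the slides do not specify the construction, so the sketch assumes SHA-256 and a simple pairwise hash tree, treats a token as a bare object digest, and uses hypothetical function names throughout.

```python
import hashlib
from typing import List


def digest(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def fold(digests: List[bytes]) -> bytes:
    """Combine digests pairwise into a single root value (hash tree)."""
    if not digests:
        raise ValueError("nothing to summarize")
    level = list(digests)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last node on odd levels
        level = [digest(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]


def integrity_tokens(objects: List[bytes]) -> List[bytes]:
    # Tier 1 (assumed form): one digest per deposited object.
    return [digest(obj) for obj in objects]


def csi(tokens: List[bytes]) -> bytes:
    # Tier 2 (assumed form): a summary over the tokens of one round.
    return fold(tokens)


def witness(summaries: List[bytes]) -> bytes:
    # Tier 3 (assumed form): a compact value over a period's summaries.
    return fold(summaries)
```

Because each tier is derived from the one below it, changing any single object changes its token, the round's summary, and the witness, which is what allows each tier to be audited against the next.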

15
ACE System Architecture

16
Software Developed in Version 1.0
  • Audit Local Files: Audit Manager software
    periodically audits files as specified by the
    archive manager.
  • Audit Local Manager: an independent IMS can
    verify the correctness of the local audit
    manager.
  • Independent Auditing: any third party can audit
    the IMS using the published witness values.
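
A local file audit of the kind described above can be sketched as a digest comparison. This is an illustration only and does not reproduce the ACE Audit Manager: the `registry` mapping, the status strings, and the choice of SHA-256 are assumptions.

```python
import hashlib
from pathlib import Path
from typing import Dict


def file_digest(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def audit(registry: Dict[str, str]) -> Dict[str, str]:
    """Compare each file against its recorded digest.

    registry maps a file path to the hex digest recorded at deposit time.
    Returns a status per path: "ok", "altered", or "missing".
    """
    report = {}
    for path, expected in registry.items():
        if not Path(path).is_file():
            report[path] = "missing"
        elif file_digest(path) != expected:
            report[path] = "altered"
        else:
            report[path] = "ok"
    return report
```

In practice such a function would be run on a schedule set by the archive manager, with the report handed to whatever alerting or repair process the archive uses.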

17
ACE Summary
  • Third-party auditable
  • Cryptographically rigorous yet cost-effective
  • Update-aware
  • Highly interoperable
  • Scalable
  • High Performance
  • Easily configured
  • Version 1.0 just released after extensive testing
    on large collections using the ERA research
    testbed.
  • Currently running on the Chronopolis testbed.

18
Tracking and Replication Monitoring: ERA Project
in Support of TPAP
  • Portal that provides an overview of a
    collection's status across different zones.
  • Ensures that new objects are replicated to
    relevant sites.
  • Tracks files at master locations and periodically
    copies new files to replica sites.
  • Logs actions on a collection and errors during
    replication.
  • Currently in use on TPAP and Chronopolis.
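
One pass of the track-and-copy cycle described above might look like the following. This is a minimal sketch, not the monitoring tool itself: the `replicate` function, the `copy` callback, and the log tuple layout are hypothetical.

```python
from typing import Callable, Dict, List, Set, Tuple


def replicate(master: Set[str],
              replicas: Dict[str, Set[str]],
              copy: Callable[[str, str], None]) -> List[Tuple[str, ...]]:
    """Copy files present at the master but missing at each replica site.

    master   : file names at the master location
    replicas : site name -> file names already present there (updated
               in place after each successful copy)
    copy     : caller-supplied function copy(name, site); it is expected
               to raise OSError on failure
    Returns a log of ("copied", site, name) and
    ("error", site, name, message) entries.
    """
    log = []
    for site, present in replicas.items():
        for name in sorted(master - present):
            try:
                copy(name, site)
                present.add(name)
                log.append(("copied", site, name))
            except OSError as exc:
                log.append(("error", site, name, str(exc)))
    return log
```

Files that fail to copy stay missing from the site's set, so the next periodic pass retries them automatically.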

19
Other Technologies
  • PAWN-related:
      • APIs for different packaging technologies
        (METS and XFDU).
      • ICDL Book Builder: interface to enable bulk
        ingestion of digital objects already managed
        by a database.
  • FOCUS (FOrmat CUration Service): a scalable and
    secure registry for persistent information and
    services applied to formats.

20
Conclusion
  • Partnership with NARA has been critical in
    enabling an extensive research program.
  • Focus has been on empirical testing on a
    distributed research testbed using a wide variety
    of NARA collections.
  • Outcomes include the development of tools and
    services in support of ingestion and preservation
    for long-term archives.
  • Recent expansion into new areas such as web
    archiving, information discovery, and access.