Title: Dr. Martin Halbert
1. Comparison of Strategies and Policies for
Building Distributed Digital Preservation
Infrastructure: Initial Findings from the
MetaArchive Cooperative
- Dr. Martin Halbert
- MetaArchive Cooperative
- Wednesday, December 3, 2008
- International Digital Curation Conference
- Edinburgh, Scotland
2. Overview
- Needs of cultural memory organizations (CMOs) for digital preservation infrastructure that led to the creation of MetaArchive
- Framing comparison of some major digital preservation efforts and service offerings
- Common distributed digital preservation (DDP) strategies
- Findings from the MetaArchive Cooperative about DDP cooperatives
3. Cultural Memory Organizations (CMOs)
- Small to medium-sized libraries
- Small research institutes
- Historical associations
- Archives
- Museums
- NOT enormous national agencies (US LoC, UK BL)
- Organizations responsible for institutional memory / research assets of their communities
- "Culture" here means any resource of primary research value, for humanities, science, or other scholarship
4. Gaps in Digital Preservation Efforts
- 66% of cultural heritage institutions (academic libraries, archives, art museums, public libraries, and other similar kinds of institutions) report that no one is responsible for digital preservation activities
- 30% of all archives have been backed up one time or not at all
Source: 2005 NEDCC Survey by Bishoff and Clareson
5. The Problem
- CMOs are rapidly digitizing or acquiring local digital archives with long-term value for both scholarly and public research purposes
- Yet CMO professionals most often lack affordable and scalable DP infrastructures
- This lack of access to effective means for long-term preservation of digital content is aggravated by a lack of consensus on DP issues and on professional roles and responsibilities
6. Digital Curation/Preservation: An Emerging Field
- Historically, CMOs have been responsible for preservation of institutional memory
- CMO administrators and funders are uncertain about how to carry out these responsibilities in the digital age
- No consensus in CMOs on roles, best practices, or priorities in digital preservation
- Many competing frameworks and assumptions brought forward from external groups and practitioners seeking to create this new field
7. What led to MetaArchive?
- Planning meetings by a group of US librarians and archivists in 2002-2003 on concerns about preserving digital archives
- Felt that we needed to do something practical to help each other preserve our data
- Not based on studies, just on our shared anxieties and the wish to do something together to keep our (expensive) digital materials preserved and viable
8. The Need for Collaborative Approaches
- "The increased number and diversity of those concerned with digital preservation, coupled with the current general scarcity of resources for preservation infrastructure, suggests that new collaborative relationships that cross institutional and sector boundaries could provide important and promising ways to deal with the data preservation challenge. These collaborations could potentially help spread the burden of preservation, create economies of scale needed to support it, and mitigate the risks of data loss."
Source: The Need for Formalized Trust in Digital Repository Collaborative Infrastructure, NSF/JISC Repositories Workshop (April 16, 2007)
9. MetaArchive
- A distributed digital preservation cooperative for digital archives
- Established in 2003 under the auspices of and with funding from the National Digital Information Infrastructure and Preservation Program (NDIIPP) of the US Library of Congress
- A functioning DDP network using and building open source software
- Organized as an incorporated nonprofit cooperative of libraries and other cultural memory organizations
- Sustained by organizational membership fees, a cooperative agreement with the US LoC, and other sponsored funding
- Provides training and models for other groups to establish similar distributed digital preservation networks
- Fosters broader awareness of digital preservation issues
- Designed to address in-the-trenches needs of CMOs after environmental scans of other options
10. Comparison of Selected Digital Preservation Efforts
- National Scientific Research Agency Efforts
  - PubMed Central Efforts in US and UK
  - Social Science Dataset Archives (UK DA, US ICPSR)
  - Big-Science Agency Efforts (UKRDS, NSF DataNET)
- Cross-Disciplinary National Efforts
  - US NDIIPP
  - UK PLANETS
- Non-Governmental E-Journal DP Efforts
  - LOCKSS
  - Portico
11. Differences and Variations
- Variation evident in understanding of what constitutes digital curation/preservation (scope, practices, priorities)
- Relative differences in prescriptivity and degree of centralization (top-down vs. bottom-up planning) between UK and US
- Many specific differences in preservation and access aims and technologies
12. Similar Patterns
- Emphasis on collaboration between groups to accomplish digital curation/preservation
- Exploration of new professional roles, expertise, models, and best practices
- Virtually all efforts examined embrace distributed digital preservation strategies
- Most programs (then and now) do not directly address the needs of CMOs
13. Distributed Digital Preservation Strategies
- Digital curation/preservation starts with secure and distributed bit preservation and good metadata
- Technology for secure replication: many good DDP options (we use a private LOCKSS network)
- Collaboration for digital curation/preservation: provides a framework for systematically exploring new data curation lifecycle roles for CMOs to carry out their core responsibility for curating institutional memory materials
- Cooperative strategies for sustaining distributed digital preservation infrastructures
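To make the idea of distributed bit preservation concrete, here is a minimal sketch of a fixity audit across replicas held at different sites. It is an illustration only and is not the LOCKSS polling protocol or MetaArchive's software: the function names, node names, and the simple strict-majority rule are all assumptions made for this example.

```python
import hashlib
from collections import Counter
from typing import Dict, List, Optional, Tuple


def digest(data: bytes) -> str:
    """Return the SHA-256 hex digest used as the fixity value for one copy."""
    return hashlib.sha256(data).hexdigest()


def audit_replicas(replicas: Dict[str, bytes]) -> Tuple[Optional[str], List[str]]:
    """Compare the copies of one item held at different nodes.

    Returns (agreed_digest, nodes_needing_repair). agreed_digest is None
    when no strict majority of nodes agrees; in that case every node is
    listed so the item can be reviewed by hand.
    """
    if not replicas:
        raise ValueError("at least one replica is required")
    digests = {node: digest(data) for node, data in replicas.items()}
    winner, votes = Counter(digests.values()).most_common(1)[0]
    if votes * 2 <= len(digests):  # no strict majority
        return None, sorted(digests)
    damaged = sorted(node for node, d in digests.items() if d != winner)
    return winner, damaged


# Hypothetical node names and content, for illustration only.
copies = {
    "node-a": b"collection item, version 1",
    "node-b": b"collection item, version 1",
    "node-c": b"collection item, versiXn 1",  # simulated bit damage
}
agreed, to_repair = audit_replicas(copies)
print(to_repair)  # ['node-c']
```

A node flagged for repair would then fetch a fresh copy from a node in the agreeing majority. Real networks add authentication, scheduling, and safeguards against a compromised majority, all of which this sketch leaves out.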
14. MetaArchive Phase I (2004-2007)
- Developed a functioning network for distributed digital preservation (DDP) used by institutions with a shared subject domain focus for mutual benefit
- Developed this technical solution for DDP based on a reuse of LOCKSS technology, in the form of a separate network with higher capacity nodes
- Created a conspectus database to capture collection-level preservation metadata pre-ingest
- Created an administrative nonprofit corporation as an independent legal entity for membership agreements
- Now preserving via DDP more than 650 collections from many different organizations
15. Collection Variety
- Collections include:
  - Images
  - Text files
  - Multimedia files
  - Datasets
  - Program executables
16. MetaArchive Membership
- 11 institutions currently
- Emory, GA Tech, Auburn, VA Tech, FSU, Louisville, Hull, Rice, Boston College, Folger, and US Library of Congress
- Membership doubled within the past year, with plans to double again in the next 12 months
- Now undertaking strategic alliances with other membership organizations to provide DDP services (NDLTD)
17. Catalytic Efforts
- Host workshops in distributed digital preservation strategies
- Instruct new MetaArchive members in network processes
- Advise other groups considering DDP approaches
- Advised/assisted in creation of two additional DDPNs
  - Alabama
  - Arizona
18. MetaArchive Phase II (2007-2010)
- Established additional distributed archives
  - African Diaspora
  - Electronic Theses and Dissertations
  - Early modern literature
- New software tools for enhanced conspectus, interoperability with grid computing, format migration services
- Became international with the addition of the University of Hull in the UK
- Upcoming DDP workshops
- Plan to double in size each year (on average) for this period, to reach a robust cooperative size
- With funding from NHPRC, will provide consulting and outreach services on the MetaArchive model for DDP services
19. Membership Levels
- Contributing Member Sites are institutions that need to preserve digital content, and therefore decide to contribute digital content into the preservation network. The preservation network acts for the common good to preserve the at-risk content submitted by the contributing sites. Contributing sites may also be preservation sites.
- Preservation Member Sites are responsible for the basic ongoing activity of preserving digital content. At a minimum, every preservation site must include responsible staff and a node server of the relevant preservation network. Preservation sites collectively comprise a preservation network.
- Sustaining Member Sites are responsible for the steering committee of the cooperative and for technical development of the computer systems that enable the preservation network. Sustaining sites may also be preservation sites and/or contributing sites.
20. Individual Roles
- Program Managers are leaders who accept responsibility for coordinating the activities of a digital preservation network.
- Data Wranglers are programmers and other technically adept workers who prepare local digital archives for ingestion into a preservation network.
- System Administrators are staff members who maintain individual preservation node servers of the relevant preservation network.
- Selectors are staff who identify and prioritize content to be preserved. They will most often be knowledgeable about the content of an institution's digital archives, and may be the same individuals who originally created or acquired the archives.
21. Findings: Why DDP Cooperatives?
- Enables collaborative pooling of resources (staff, expertise areas, technology, infrastructure, funds)
- Also allows institutions to retain ownership individually of their part of the infrastructure, expertise, and operations
- Defuses competitive jockeying between CMOs: no one institution is the primary leader to which the others sign agreements
- Allows for decentered ongoing operations as individual institutions may join or leave
- Flexible cooperatives can be assembled quickly without onerous new overhead, by leveraging sunk costs in existing institutions
- Nonprofit organization promotes trust by other institutions from the public sector
22. Questions and Answers
- Some contacts
  - Martin Halbert (MetaArchive President, Emory representative) mhalber_at_emory.edu
  - Tyler Walters (MetaArchive Treasurer, GA Tech representative) tyler.walters_at_library.gatech.edu
  - Katherine Skinner (MetaArchive Executive Director) kskinne_at_emory.edu
  - Martha Anderson (LoC Program Officer) mande_at_loc.gov