Title: Digital Asset Management Part 7: Archiving and Preservation
1 Digital Asset Management, Part 7: Archiving and Preservation
- Presented by Anthony D. Smith
- Prepared for OceanTeacher Academy Training
Course, 2 7 October 2009
2 Archiving and Preservation
- In general, archives consist of records which have been selected for permanent or long-term preservation on grounds of their enduring cultural, historical or evidentiary value. (Wikipedia)
3 Archiving and Preservation
- The National Archives and Records Administration (NARA) in the U.S. states that its preservation objective is to:
  - Ensure the material health of Federal records in our custody so they remain available for public use as long as possible.
- Definition: Preservation encompasses the activities which prolong the usable life of archival records. Preservation activities are designed to minimize the physical and chemical deterioration of records and to prevent the loss of informational content.
4 Archiving and Preservation
The ambitious conservationist lives by the following principles:
- Understand the physical properties of an information resource and how environmental conditions affect its physical composition.
- Recognize that creating new instantiations is sometimes important to the continued use of an informational resource.
5 Archiving and Preservation
- Digital preservation has the same concerns as physical conservation.
- New instantiations or interpretations are needed to support continued use.
- Both of these basic concepts are addressed in the Open Archival Information System (OAIS) reference model.
6 Archiving and Preservation
- Open Archival Information System (OAIS)
  - Provides a framework for understanding, and increasing awareness of, the archival concepts needed for long-term digital information preservation and access.
  - "Long term" means long enough to be concerned with the impacts of changing technologies, including support for new media and data formats, or with a changing user community. Long term may extend indefinitely.
7 Archiving and Preservation
- OAIS is a reference model defined by the Consultative Committee for Space Data Systems (CCSDS) in 2002.
- Member agencies:
  - Agenzia Spaziale Italiana (ASI)/Italy.
  - British National Space Centre (BNSC)/United Kingdom.
  - Canadian Space Agency (CSA)/Canada.
  - Centre National d'Etudes Spatiales (CNES)/France.
  - Deutsches Zentrum für Luft- und Raumfahrt e.V. (DLR)/Germany.
  - European Space Agency (ESA)/Europe.
  - Instituto Nacional de Pesquisas Espaciais (INPE)/Brazil.
  - National Aeronautics and Space Administration (NASA)/USA.
  - National Space Development Agency of Japan (NASDA)/Japan.
  - Russian Space Agency (RSA)/Russian Federation.
8 Archiving and Preservation
- The OAIS reference model has been widely adopted by the information management community.
- The reference model addresses a full range of archival information preservation functions, including:
  - Ingest, archival storage, data management, access, and dissemination.
- It also addresses:
  - The migration of digital information to new media and forms.
  - The data models used to represent information.
  - The role of software in information preservation, and the exchange of digital information among archives.
- It identifies both internal and external interfaces to the archive functions, and it identifies a number of high-level services at these interfaces.
9 Archiving and Preservation: OAIS
[Figure: the OAIS environment, showing Producer, Consumer, and Management]
- Producer is the role played by those persons, or client systems, which provide the information to be preserved.
- Management is the role played by those who set overall OAIS policy as one component in a broader policy domain. In other words, management control of the OAIS is only one of Management's responsibilities. Management is not involved in day-to-day archive operations; that responsibility is handled within the OAIS by an administrative functional entity, described in Section 4.1 of the OAIS reference model.
- Consumer is the role played by those persons, or client systems, that interact with OAIS services to find and acquire preserved information of interest. A special class of Consumers is the Designated Community: the set of Consumers who should be able to understand the preserved information.
10 Archiving and Preservation: OAIS
- Content Information: the target information for preservation. It might be a PDF, a TIFF, or even a complex web site.
- Preservation Description Information (PDI): contains four types of preservation information:
  - Provenance: describes the source of the Content Information, who has had custody of it since its origination, and its processing history.
  - Context: how the Content Information relates to other information in the world. There may be a description here of why it was produced. If it's an article, this section might include cited sources.
  - Reference: any identifiers that uniquely define the item, for example an ISBN.
  - Fixity: protects against undocumented change. Fixity is usually implemented with checksum error checking.
- Packaging Information: binds the Content Information and the PDI, and will also include information related to the operational environment of the Content Information.
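To make the OAIS package components concrete, here is a minimal sketch in Python. It assumes a package can be modeled as one object holding the Content Information as raw bytes, a PDI record with the four kinds of information listed above, and a free-form packaging dictionary. The class and field names (`PreservationDescription`, `InformationPackage`, `seal`, `is_intact`) are hypothetical illustrations, not terms defined by the OAIS standard, and SHA-256 is just one possible fixity algorithm.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class PreservationDescription:
    """The four kinds of PDI described above (field names are illustrative)."""
    provenance: list[str]  # custody and processing history, oldest first
    context: str           # how the content relates to other information
    reference: str         # identifier that uniquely defines the item
    fixity: str = ""       # checksum recorded when the package is sealed


@dataclass
class InformationPackage:
    """Hypothetical packaging that binds Content Information to its PDI."""
    content: bytes                     # the Content Information itself
    pdi: PreservationDescription
    packaging: dict[str, str] = field(default_factory=dict)  # e.g. format notes

    def seal(self) -> None:
        # Record fixity so that later, undocumented change can be detected.
        self.pdi.fixity = hashlib.sha256(self.content).hexdigest()

    def is_intact(self) -> bool:
        # True only if the content still matches the recorded checksum.
        return self.pdi.fixity == hashlib.sha256(self.content).hexdigest()


pkg = InformationPackage(
    content=b"%PDF-1.4 example bytes",
    pdi=PreservationDescription(
        provenance=["Deposited by author", "Migrated to PDF/A"],
        context="Conference paper; cited sources listed in its bibliography",
        reference="local:item-0001",  # hypothetical identifier scheme
    ),
    packaging={"format": "application/pdf"},
)
pkg.seal()
print(pkg.is_intact())  # True
```

Keeping the fixity value inside the PDI, rather than beside the package, mirrors the slide's grouping: the checksum travels with the description of the content it protects.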
11 Archiving and Preservation: OAIS
[Figure: OAIS entity/relationship model]
12 Archiving and Preservation
- What scenarios warrant digital preservation?
- Deteriorating analog
- Born digital
- Digitization to enhance access?
13 Archiving and Preservation
- What is metadata? Metadata is data about data.
- Descriptive: describes a resource for purposes such as discovery and identification. It can include elements such as title, abstract, author, and keywords.
- Structural: indicates how compound objects are put together, for example, how pages are ordered to form chapters.
- Administrative: provides information to help manage a resource, such as when and how it was created, file type and other technical information, and who can access it. Subtypes include:
  - Preservation
  - Rights management
  - Technical
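A small illustration of how the three metadata types divide up the information about one resource. This is a sketch only: the record below is a hypothetical digitized report, and its keys are invented for the example rather than taken from any metadata standard such as Dublin Core or METS. The helper `reading_order` shows what structural metadata is for: reassembling pages into chapter order.

```python
# A hypothetical metadata record for a digitized two-chapter report,
# grouped by the metadata types listed above (keys are illustrative).
record = {
    "descriptive": {
        "title": "Coastal Survey Report",
        "author": "A. Author",
        "keywords": ["ocean", "survey"],
    },
    "structural": {
        # Ordered page images that make up each chapter.
        "chapters": [
            {"label": "Chapter 1", "pages": ["p001.tif", "p002.tif"]},
            {"label": "Chapter 2", "pages": ["p003.tif"]},
        ],
    },
    "administrative": {
        "technical": {"file_type": "image/tiff"},
        "rights": {"access": "public"},
        "preservation": {"checksum_algorithm": "SHA-256"},
    },
}


def reading_order(rec: dict) -> list[str]:
    """Use structural metadata to list page files in reading order."""
    return [page for ch in rec["structural"]["chapters"] for page in ch["pages"]]


print(reading_order(record))  # ['p001.tif', 'p002.tif', 'p003.tif']
```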
14 Archiving and Preservation: PREMIS
- PREMIS (Preservation Metadata: Implementation Strategies)
- Charge: Develop a core set of implementable preservation metadata, broadly applicable across a wide range of digital preservation contexts and supported by guidelines and recommendations for creation, management, and use.
15 Archiving and Preservation: PREMIS
- The entities in the PREMIS data model are defined as follows:
  - Intellectual Entity: a set of content that is considered a single intellectual unit for purposes of management and description.
  - Object (or Digital Object): a discrete unit of information in digital form.
  - Event: an action that involves or impacts at least one Object or Agent associated with or known by the preservation repository.
  - Agent: a person, organization, or software program/system associated with Events in the life of an Object, or with Rights attached to an Object.
  - Rights: assertions of one or more rights or permissions pertaining to an Object and/or Agent.
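The relationships among the PREMIS entities can be sketched with Python dataclasses. This is a simplified, hypothetical model: the field names are chosen for readability and are not the semantic units of the PREMIS Data Dictionary, and the Intellectual Entity is reduced to a plain label on each Object. What it does show is the shape of the model: an Event links one or more Objects to the Agents involved, and Rights attach to Objects and/or Agents.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Agent:
    name: str
    agent_type: str  # "person", "organization", or "software"


@dataclass
class DigitalObject:
    identifier: str
    intellectual_entity: str  # label of the intellectual unit it belongs to


@dataclass
class Event:
    event_type: str
    objects: list[DigitalObject]  # at least one Object involved or impacted
    agents: list[Agent]           # who or what carried out the action
    outcome: str
    when: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


@dataclass
class Rights:
    statement: str
    objects: list[DigitalObject]  # Objects the rights pertain to
    agents: list[Agent]           # Agents the rights pertain to


report = DigitalObject("obj-001", intellectual_entity="Annual Report 2008")
checker = Agent("checksum-tool", agent_type="software")
check_event = Event("fixity check", [report], [checker], outcome="pass")
print(check_event.event_type, check_event.objects[0].identifier, check_event.outcome)
# fixity check obj-001 pass
```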
16 Archiving and Preservation: File Management and Data Integrity
17 Archiving and Preservation: File Management and Data Integrity
18 Archiving and Preservation: File Management and Data Integrity
- Some strategies for redundancy:
  - RAID (Redundant Array of Inexpensive Disks, also expanded as Redundant Array of Independent Disks)
    - A set of disk drives is used to store data.
    - There are different RAID levels, but essentially all of them (RAID 0, which only stripes data, is the exception) store data in a manner where it can be reconstructed in the event of a disk drive failure.
    - The failed drive is removed, a new drive is installed, and the data is automatically reconstructed.
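The reconstruction step can be illustrated with single-parity arithmetic of the kind used by RAID 5. This is a simplified sketch, not a RAID implementation: real RAID 5 rotates parity across drives and operates on the block device, whereas here each "drive" is just one equal-length byte string. The parity block is the XOR of the data blocks, so any single missing block equals the XOR of everything that survives.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data drives, each holding one equal-length block of a stripe.
data = [b"OCEAN", b"TEACH", b"ER-09"]
parity = xor_blocks(data)  # stored on a fourth drive

# Drive 1 fails; rebuild its block from the surviving drives plus parity.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt)  # b'TEACH'
```

Because XOR is its own inverse, the same function both computes parity and rebuilds a lost block; this is why one parity drive protects against the loss of any one drive, but not two.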
19 Archiving and Preservation: File Management and Data Integrity
- Some strategies for redundancy:
  - Offline backup
    - Tape backup is the most widely used system in a typical IT server facility.
    - A formal backup policy outlines:
      - A backup schedule (frequency)
      - The type of backup at given intervals:
        - Differential: data that has changed since the last full backup
        - Incremental: changes since the last backup of any type
        - Full: everything
    - Good practice should include multiple sets of tapes, allowing for offsite storage to minimize the risk of loss from fire, etc.
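The difference between the three backup types comes down to which reference time a file's change is compared against. The sketch below assumes change is detected by file modification time and that the times of the last full backup and the last backup of any type are known; real backup software may instead track changes through its own catalog or archive flags. The function name and sample files are hypothetical.

```python
from datetime import datetime


def files_to_back_up(modified: dict[str, datetime], backup_type: str,
                     last_full: datetime, last_backup: datetime) -> list[str]:
    """Select files for one backup run.

    modified:    file name -> last-modified time
    last_full:   time of the most recent full backup
    last_backup: time of the most recent backup of any type
    """
    if backup_type == "full":
        return sorted(modified)          # everything
    if backup_type == "differential":
        since = last_full                # changed since the last full backup
    elif backup_type == "incremental":
        since = last_backup              # changed since the last backup of any type
    else:
        raise ValueError(f"unknown backup type: {backup_type}")
    return sorted(name for name, mtime in modified.items() if mtime > since)


files = {
    "a.tif": datetime(2009, 10, 1),
    "b.tif": datetime(2009, 10, 3),
    "c.tif": datetime(2009, 10, 5),
}
last_full = datetime(2009, 10, 2)    # full backup taken here
last_backup = datetime(2009, 10, 4)  # later incremental taken here

print(files_to_back_up(files, "full", last_full, last_backup))
# ['a.tif', 'b.tif', 'c.tif']
print(files_to_back_up(files, "differential", last_full, last_backup))
# ['b.tif', 'c.tif']
print(files_to_back_up(files, "incremental", last_full, last_backup))
# ['c.tif']
```

The trade-off follows directly: a restore from differentials needs only the last full backup plus the latest differential, while a restore from incrementals needs the full backup plus every incremental since, in exchange for smaller individual runs.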
20 Archiving and Preservation: File Management and Data Integrity
- Some strategies for redundancy:
  - LOCKSS (Lots of Copies Keep Stuff Safe)
    - Based at Stanford University Libraries, LOCKSS is an international community initiative that provides libraries with digital preservation tools and support so that they can easily and inexpensively collect and preserve their own copies of authorized e-content.
    - How it works: a library uses LOCKSS software to turn a low-cost PC into a digital preservation appliance called a LOCKSS Box, which performs the following four functions:
      - It collects content from the target web sites using a web crawler similar to those used by search engines.
      - It continually compares the content it has collected with the same content collected by other LOCKSS Boxes, and repairs any differences.
      - It acts as a web proxy or cache, providing browsers in the library's community with access to the publisher's content or the preserved content as appropriate.
      - It provides a web-based administrative interface that allows library staff to target new journals for preservation, monitor the state of the journals being preserved, and control access to the preserved journals.
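The second LOCKSS function, comparing copies across boxes and repairing differences, can be illustrated with a toy majority vote. This is only a sketch of the idea: the actual LOCKSS polling protocol is considerably more elaborate and is designed to resist malicious or faulty peers, and the names `digest` and `poll_and_repair` are hypothetical. Here every box's copy of one article is hashed, the digest held by a strict majority wins, and disagreeing copies are replaced.

```python
import hashlib
from collections import Counter


def digest(content: bytes) -> str:
    """Hash a copy so boxes can compare content without exchanging it."""
    return hashlib.sha256(content).hexdigest()


def poll_and_repair(boxes: dict[str, bytes]) -> dict[str, bytes]:
    """Return every box's copy after repairing copies that disagree with the majority.

    `boxes` maps a (hypothetical) box name to that box's copy of one article.
    """
    votes = Counter(digest(copy) for copy in boxes.values())
    winner, count = votes.most_common(1)[0]
    if count <= len(boxes) // 2:
        # No strict majority: flag for staff rather than guess which copy is right.
        raise RuntimeError("no majority agreement; manual review needed")
    good_copy = next(c for c in boxes.values() if digest(c) == winner)
    return {name: good_copy for name in boxes}


boxes = {
    "box-a": b"<html>Article v1</html>",
    "box-b": b"<html>Article v1</html>",
    "box-c": b"<html>Artic1e v1</html>",  # silently damaged copy
}
repaired = poll_and_repair(boxes)
print(repaired["box-c"] == boxes["box-a"])  # True
```

The sketch also shows why "lots of copies" matters: with only two boxes that disagree, there is no majority, and the damage can be detected but not automatically repaired.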
21 Archiving and Preservation: File Management and Data Integrity
- Fixity: in preservation terms, means that the digital object has not been changed between two points in time or events. (PREMIS definition)
- Natural events in the lifecycle of a digital file may cause change to a file:
  - Submission
  - Migration
  - Transfer to different media
  - Network transfer
  - Time
- Typically, a fixity check is run on ingest to compute an algorithmic value (such as a checksum) for the file.
  - The value is stored and serves as a reference value for later use.
- Monitoring for no change over time:
  - At specified intervals, the files are re-evaluated for their algorithmic value, and this value is compared to the reference value.
  - If the values are equal, the archived file has survived another day.
  - If the values are unequal, action must be taken quickly. Normally, a backup copy is called into service.
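The ingest-then-monitor cycle can be sketched with Python's standard `hashlib` module. The sketch assumes SHA-256 as the algorithm and a simple in-memory dictionary of reference values; a repository would store those values durably (for example, alongside its preservation metadata). The function names `compute_fixity` and `audit` are hypothetical, and a missing file is treated as a failed check.

```python
import hashlib
import tempfile
from pathlib import Path


def compute_fixity(path: Path, algorithm: str = "sha256") -> str:
    """Return the file's hex digest, reading in chunks to handle large files."""
    h = hashlib.new(algorithm)
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def audit(reference: dict[Path, str]) -> list[Path]:
    """Re-evaluate every file; return those that are missing or have changed."""
    return [
        p for p, expected in reference.items()
        if not p.exists() or compute_fixity(p) != expected
    ]


with tempfile.TemporaryDirectory() as d:
    scan = Path(d) / "scan.tif"
    scan.write_bytes(b"original pixels")
    reference = {scan: compute_fixity(scan)}  # stored on ingest
    print(audit(reference))                   # [] : survived another day
    scan.write_bytes(b"0riginal pixels")      # simulate silent corruption
    print(audit(reference) == [scan])         # True: restore from backup
```

The same algorithm must be used at ingest and at every later check; comparing, say, an MD5 reference value against a SHA-256 result would flag every file as changed.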
22 E-Science/Cyberinfrastructure
- We need a definition!
  - Describes computationally intensive science that is carried out in highly distributed network environments.
  - Science that uses immense data sets that require grid computing.
  - The term sometimes includes technologies that enable distributed collaboration, such as the Access Grid.
23 E-Science/Cyberinfrastructure
- At this stage, the primary question being asked by libraries is: What exactly is our role in this?
24 Defining Digital Asset Management
- Management objectives:
  - Acquire and publish, organize, provide access, curate, and de-accession resources.
25 Scholarly Communication
[Figure: the scholarly communication process. Stages shown include raw, processed, analyzed, and published data/datasets; unpublished and published research (traditional and non-traditional); and secondary/tertiary resources. In the past, libraries were involved mainly at the traditional research publication end. Metadata and curation profiles for data (i.e., data repositories) allow forward/backward movement through the scholarly communication process, and there are currently many attempts to data-mine to uncover data. Source: D. Scott Brandt (Purdue), ARL Workshop on New Collaborative Relationships, 26-27 September 2006.]
26 E-Science/Cyberinfrastructure
- Long-term stewardship is not about saving bytes; it's about creating, building, and evolving expertise in the community.
- There are multiple players/responsible parties in the problem (and solution) space, who have varying levels of understanding of and interest in the issues:
  - Universities
  - Libraries and librarians (lower-case l)
  - Domain specialists
  - Computer scientists
  - Standards-setting bodies
  - Editors
  - Professional societies
  - Publishers
  - Commercial and not-for-profit vendors
  - Funding agencies
Source: "The Role of Academic Libraries in the Digital Data Universe," Break-Out Session: New Partnership Models, Bob Hanisch and Brian Schottlaender, co-leaders, ARL Workshop on New Collaborative Relationships, 26-27 September 2006.
27 E-Science/Cyberinfrastructure
- My own institution:
  - Formed an e-science committee to begin investigating that looming question: what is our role?
  - Started a dialogue with our High Performance Computing Center.
    - 576-CPU IBM p5-575 cluster and a 224-CPU Linux Xeon cluster
  - Developing a survey instrument to better understand the data needs of our faculty.
28 E-Science/Cyberinfrastructure
- How can libraries help?
- Should we just stay out of the way?
- What about our responsibility to safeguard the recorded record of humankind?
- They've never asked us to store data before.
- Why has it suddenly become important?
29 Part 7: Archiving and Preservation