Title: Digital Preservation - Outline
1Digital Preservation - Outline
- Introduction - Definitions, Facts, Challenges
- Digital Archiving: A Life Cycle View
- Metadata
- Strategies
- RUL Projects
- Trusted Digital Repositories
2Digital Dark Ages?
- "As we move into the electronic era of digital
objects it is important to know that there are
new barbarians at the gate and that we are moving
into an era where much of what we know today,
much of what is coded and written electronically,
will be lost forever. We are, to my mind, living
in the midst of digital Dark Ages; consequently,
much as monks of times past, it falls to
librarians and archivists to hold to the
tradition which reveres history and the published
heritage of our times." (Kuny, 1998)
3Preservation
- The urge to preserve is endemic to our roles as
librarians.
- The patent office, home to nearly 6.5 million
patents dating to 1790, is converting to an
electronic database and discarding a significant
portion of its paper files after they have been
scanned and digitized. (Mitchell, A. (2001).
Ingenuity's Blueprints, Into History's Dustbin.
NY Times, December 30, 2001, p. A1.)
- A scenario: A truck loaded with hazardous waste
is headed toward a dump site. Will our
descendants know where we have buried the waste?
(Bide, et al, 1999)
4Digital Preservation: Some Numbers
- $20 trillion loss of information expected over
the next 20 years (Lysakowski and Leibowitz,
2000)
- Within 10 years, the total number of electronic
records could be doubling every 60 minutes.
- From an economic model, the cost of converting
from MS-Office95 to Office97 is estimated at
711,110 work years.
- 80 million books in the US are rapidly
deteriorating
- Yale University states that 80% of their
collection is endangered.
- Print material
- All print material (ASCII text) published in the
world each year could be stored in about 5
terabytes
- Images
- Over 80 billion photographs are taken each year,
which would take 400 petabytes to store.
5Numbers continued (from
http://www.ccsf.caltech.edu/roy/dataquan/)
- Megabyte: one million bytes
- Gigabyte: 1000 megabytes
- Terabyte: 1000 gigabytes
- 10 terabytes: the printed collection of the US
Library of Congress
- Petabyte: 1000 terabytes
- 2 petabytes: all the material in US academic
research libraries
- Exabyte: 1000 petabytes
- 5 exabytes: all words ever spoken by human beings
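The decimal units on this slide make the photograph estimate on the previous slide easy to check. The following is a minimal sketch, assuming power-of-1000 units as the slide defines them; the constant and function names are illustrative and not part of the original presentation.

```python
# Decimal storage units as defined on the slide (powers of 1000).
MEGABYTE = 10**6
GIGABYTE = 1000 * MEGABYTE
TERABYTE = 1000 * GIGABYTE
PETABYTE = 1000 * TERABYTE
EXABYTE = 1000 * PETABYTE


def average_item_size(total_bytes: int, item_count: int) -> float:
    """Return the average size in bytes of one item in a collection."""
    return total_bytes / item_count


# Figures quoted on the slides: 80 billion photographs, 400 petabytes.
photos_per_year = 80 * 10**9
photo_storage = 400 * PETABYTE

per_photo = average_item_size(photo_storage, photos_per_year)
print(per_photo / MEGABYTE)  # average megabytes per photograph
```

Under those two figures, the estimate budgets about 5 megabytes per photograph.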
6Preservation in Digital Libraries
- Preservation: The managerial, financial, and
technical issues involved in preserving library
(or archive) materials in all formats - and/or
their information content - so as to maximize
their useful life (Eden, 1997)
- Digital preservation is defined as the managed
activities necessary for ensuring:
- 1. The long-term maintenance of a byte stream, and
- 2. Continued accessibility of the contents through
time and changing technology.
- Digital Libraries vs. Digital Archives: Archives
make a commitment to long-term preservation of
digital information. (Joint Task Force on
Digital Archiving)
7Why Would You Digitally Preserve?
- Protect original print artifact
- Provide access by accurately representing
originals
- Preserve material that exists in electronic form
only
- Enhance research by improving originals
- High resolution imagery to study details
- Searchable text
8The Challenges of Digital Preservation
- Lack of standards (or too many standards)
- Lack of documentation on production and use
- Cost and rapid obsolescence of technology
- Impermanence of the medium
- Mutability of the content (easily changed; legal
issues)
- Version control
- Need to guarantee integrity of digital
information
- Migration of information (driven by external
factors)
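One common way to address the integrity challenge listed above is to record a cryptographic digest when an object is deposited and recompute it later. The sketch below assumes SHA-256 from Python's standard `hashlib`; the function names and sample content are hypothetical, and the slides do not prescribe any particular algorithm.

```python
import hashlib


def fixity_digest(data: bytes) -> str:
    """Return the SHA-256 digest of a byte stream as a hex string."""
    return hashlib.sha256(data).hexdigest()


def is_unchanged(data: bytes, recorded_digest: str) -> bool:
    """Compare the current digest with the one recorded at deposit time."""
    return fixity_digest(data) == recorded_digest


# Hypothetical content, used only to illustrate the check.
original = b"Report of the 1864 town meeting"
recorded = fixity_digest(original)

# A one-character change, the kind of mutation the slide warns about.
altered = b"Report of the 1865 town meeting"

print(is_unchanged(original, recorded))
print(is_unchanged(altered, recorded))
```

A digest detects that content changed but not who changed it or why, so it complements version control rather than replacing it.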
9What to Archive: A Checklist
- Historical and research value
- Aesthetic and artistic merit
- Uniqueness of an item
- Subject content relevant to Institution
- Access Restrictions and inventory
- Condition
- Frequency of use; frequency of change
- Ownership
- Redundancy: concern for loss or modification
- Length of preservation
- Is any other institution archiving the material?
10Candidates for Preservation
- Material created (not digitized) in digital
format
- Reference databases (online catalogs, subject
specific indexes, etc.)
- Electronic journals
- Digital maps
- Data
- Websites (e.g. research guides, web-based
databases, documents)
- Government information
- Census data, international statistics (Do we rely
on the government to preserve this material?)
- Consortiums such as the Inter-university Consortium
for Political and Social Research (ICPSR) have a
role.
- Print material/manuscripts that are digitized for
access and/or preservation
- Original documents not retained (e.g. as in the
NJ Environmental Digital Library)
- Original document retained (as in Special
Collections)
- Electronic (analog) media that is digitized
(audio, video tapes)
11Digital Archiving: A Life Cycle View
- Creation
- Acquisition and Collection Development
- Identification and Cataloging
- Storage
- Preservation (incl. Metadata)
- Access
- from (Hodge, 2000)
12Digital Preservation Strategies
- Migration: transferring digital materials from
one medium or format to another because of
obsolescence, failure in media, software updates,
standards, etc.
- Emulation: refers to the process of mimicking, in
software, a piece of hardware or software so that
other processes think the original
equipment/function is still available in its
original form.
(http://www.nla.gov.au/padi/topics/17.html)
- Encapsulation: A technique of grouping together a
digital object and anything else necessary to
provide access to that object. This technique
aims to overcome the problems of the
technological obsolescence of file formats
because the details of how to interpret the
digital bits in the object can be part of the
encapsulated information.
(http://www.nla.gov.au/padi/topics/17.html)
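The encapsulation idea above can be illustrated by packaging an object together with a description of how to interpret it. This is only a sketch under stated assumptions: the ZIP container, the `manifest.json` name, and the manifest fields are choices made for illustration, not a format named in the slides.

```python
import io
import json
import zipfile


def encapsulate(name: str, content: bytes, description: dict) -> bytes:
    """Bundle an object with a manifest describing how to interpret it."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr(name, content)
        # The manifest file name and its fields are hypothetical.
        package.writestr("manifest.json", json.dumps(description, indent=2))
    return buffer.getvalue()


def read_manifest(package_bytes: bytes) -> dict:
    """Recover the interpretation details from an encapsulated package."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        return json.loads(package.read("manifest.json"))


package = encapsulate(
    "letter.txt",
    "Dear Sir,".encode("ascii"),
    {"object": "letter.txt", "format": "plain text", "encoding": "US-ASCII"},
)
print(read_manifest(package)["encoding"])
```

The container itself can become obsolete, so an approach like this still depends on the manifest and packaging format being simple and well documented.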
13Migration of Digital Information
- Reasons for Migration
- Medium refreshing (e.g. rewrite a CD)
- Medium conversion (diskette to CD)
- Format conversion (ascii to pdf)
- Version upgrade (Office97 to Office2000)
- Migration of technical environment (W98 to NT)
14The Migration Process
- Error Prone
- Labor intensive and expensive
- Governed by external factors
- The only approach that works for now
15RUL Projects: A Sampling
- Medieval and Early Modern Data Bank
- Eagleton Public Opinion Polls
- The Augustine Collection
- REALITI: A Digital Preservation Framework
16Medieval and Early Modern Data Bank - MEMDB
- Characteristics
- At http://www.scc.rutgers.edu/memdb
- Content: commodity prices in the medieval period
- Access: public domain
- Compiler: Co-directors of MEMDB
- Owner: RUL?
- Archiver: (who should archive?)
- Type: Database on the web
- Format: html, Active Server Pages, MS-Access
- Metadata reqmts: numeric data
- Questions: What is the primary document? How long
should it be preserved? Extent of document?
Owner? Preserve look & feel?
17Eagleton Public Opinion Polls
- Characteristics
- At http://www.scc.rutgers.edu/eagleton_tst
- Content: New Jersey public opinion (1970 - )
- Access: public domain
- Compiler: Eagleton Institute
- Owner: Eagleton/Star Ledger
- Archiver: RUL/Scholarly Communication Center
- Type: database on the Web
- Format: html, MS-Access, portable SPSS files
- Metadata reqmts: questionnaires, numeric data
- Questions: Preserve look & feel, SPSS
(proprietary software)
18The Augustine Collection
- Characteristics
- At http://www.scc.rutgers.edu/augustine
- Content: Photographs from 19th Century New Jersey
Shore
- Access: public domain
- Compiler: William F. Augustine
- Owner: RUL Special Collections
- Archiver: RUL Special Collections
- Type: image archive
- Format: html, jpeg
- Metadata reqmts: original artifacts
- Questions: image format, preserve digital
archive, individual items/collection
19REALITI: A Digital Preservation Framework
(Rutgers Electronic Access to Library
Information thru Technology Integration)
- Characteristics
- At http://www.scc.rutgers.edu/realiti
- Content: Civil War period in New Jersey
- Access: public domain
- Compiler: RUL Special Collections
- Owner: RUL
- Archiver: RUL Special Collections/SCC
- Type: Images on the Web
- Format: html, ColdFusion, MS-Access, PDF,
djvu, tiff
- Metadata: preservation, multiple formats
- Questions: formats, compression, metadata,
original artifact
20Preservation Metadata for Digital Collections
- Collection Level
- Persistent identifier
- Date of creation
- Structural type (e.g. ASCII text, jpeg images,
etc.)
- Technical infrastructure: files, databases, html,
etc.
- File description
- System requirements
- Installation requirements
- Storage information
- Access inhibitors
- Access facilitators
- Preservation action permission
- Validation (information about validation
mechanism)
- Relationships (to other objects)
- (continued)
- Quirks (any characteristic that may cause loss
in functionality)
- Archiving decision (work)
- Decision reason (work)
- Institution responsible for archiving decision
- Archiving decision (manifestation)
- Decision reason (manifestation)
- Institution responsible for archiving decision
(manifestation)
- Intention type
- Institution with preservation responsibility
- Process
- Record creator
- Other
(from National Library of Australia,
http://www.nla.gov.au/preserve/pmeta.html)
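A handful of the elements listed above can be modeled as a simple record to show how such metadata might travel with a collection. This is a hedged sketch: the class name, the choice of fields, the JSON serialization, and every sample value are assumptions for illustration, not the National Library of Australia's schema.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class PreservationRecord:
    """A small, illustrative subset of the collection-level elements."""

    persistent_identifier: str
    date_of_creation: str
    structural_type: str
    system_requirements: list = field(default_factory=list)
    quirks: list = field(default_factory=list)
    institution_with_preservation_responsibility: str = ""


# All values below are hypothetical placeholders.
record = PreservationRecord(
    persistent_identifier="example:collection-001",
    date_of_creation="2001-01-01",
    structural_type="jpeg images",
    system_requirements=["web browser"],
    institution_with_preservation_responsibility="RUL Special Collections",
)

# Serialize to plain text so the record stays readable without this code.
print(json.dumps(asdict(record), indent=2))
```

Keeping the record in a plain-text serialization is one way to avoid making the metadata itself depend on proprietary software.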
21Trusted Digital Repositories
(http://www.rlg.org/pr/pr2001-attributes.html)
- A Proposed Framework for a Trusted Archival
Agent
- Administrative: adherence to agreed upon
standards
- Organizational: commitment to management on
behalf of depositors
- Financial: adherence to good business practices
- Technological: infrastructure in place with
upgrade policies
- Security: policies for security, auditability,
and backup
- Procedural: repository practices will be in
place and documented.
22Possible Organizational Models (Who might be a
digital repository?)
- Originators (e.g. individual researchers)
- Publishers (What happens when the publisher goes
out of business?)
- Libraries, museums, and other conservator
institutions
- National libraries and archives
- Cooperative service agencies (e.g. OCLC, RLG,
ICPSR for social science research)
- Segmented market providers (e.g. Bell & Howell
for preserving dissertation literature and Early
English Books)
- Private storage providers
- Computer centers
- Scholarly associations (e.g. American Institute
of Physics)
- Indexing and abstracting services
- Certified digital archives.
23Institutional Efforts
- OCLC and Web Document Digital Archive (WDDA)
Project
- Tools for libraries and archives to preserve and
maintain access to digital content
- At http://www.oclc.org/digitalpreservation
- RLG Cultural Materials
- Cultural Materials is being developed through
members to set the conditions for contributing
and distributing their digital surrogates of
valuable collections.
- The goal is a growing, significant, online
resource and service solution.
- At http://www.rlg.org/culturalres
- LOCKSS: A permanent web publishing and access
system
- Addresses problem of material no longer available
from the publisher
- Modeled on distributed print libraries. Reich, et
al. (2001). D-Lib Magazine, 7, (6).
- OAIS: Open Archival Information System Reference
Model
- Requirements for any system responsible for
preserving any type of information over a long
period.
- At http://ssdoo.gsfc.nasa.gov/nost/isoas
24Digital Preservation: Concluding Thoughts
- Librarians and archivists are a key to the
solution.
- "A major academic scandal will have to happen
first . . . in order to focus attention and
resources." (Graham, 2000).
- A combination of solutions will be employed,
including migration and emulation.
- Digital archaeology will be used to recover
lost data.
25Preservation Resources on the Web
- Institutional Issues
- ARL Preservation Program
(http://www.arl.org/preserv/index.html)
- Digital Preservation Needs and Requirements in
RLG Member Institutions
(http://www.rlg.org/preserv/digpres.html)
- RLG DigiNews (http://www.rlg.org/preserv/diginews/)
- Technical Information/Papers
- Avoiding Technological Quicksand
(http://www.clir.org/pubs/reports/rothenberg/contents.html)
- PADI - Preserving Access to Digital Information -
from the National Library of Australia
(http://www.nla.gov.au/padi/).
- Background Papers and Technical Information -
from LOC American Memory site
(http://memory.loc.gov/ammem/ftpfiles.html)
- Preservation of electronic information - a
bibliography
(http://homes.ukoln.ac.uk/lismd/preservation.html)
- Digital Imaging Tutorial -
http://www.library.cornell.edu/preservation/tutorial/
26More Information on the Web
- Technical Information/Papers (continued)
- CLIR Publications
(http://www.clir.org/pubs/reports/reports.html)
- Kuny, T. (1998/May). The digital dark ages?
Challenges in the preservation of electronic
information. International Preservation News,
(17). At http://www.ifla.org/VI/4/news/17-98.htm#2
- Hodge, G. M. (2000). Best practices for digital
archiving: An information life cycle approach.
D-Lib Magazine, 6, (1), available at
http://www.dlib.org/dlib/january00/01hodge.html
- Handbooks
- Hunter, G. S. (2000). Preserving Digital
Information: A How-To-Do-It Manual. New York:
Neal-Schuman Publishers
- Sitts, M. K. (2000). Handbook for Digital
Projects: A Management Tool for Preservation and
Access. Andover, Massachusetts: Northeast
Document Conservation Center
27References
- Bide, M., Potter, E., & Watkinson, A. (1999).
Digital Preservation: an introduction to the
standards issues surrounding the deposit of
non-print publications. At
www.bic.org.uk/digpres.doc
- Graham, P. (2000). RLG and Archiving at the
heart of the research library mission. RLG News,
Winter 2000, (50), pp. 12-13.
- Graham, P. (1998/February). Digital strategies
for the Rutgers University Libraries: a white
paper draft. DRAFT 4.
- Hedstrom, M., & Montgomery, S. (1998). Digital
Preservation Needs and Requirements in RLG Member
Institutions: A Study Commissioned by the
Research Libraries Group. Available at
http://www.rlg.org/preserv/digpres.html
- Hodge, G. (2000). Best practices for digital
archiving: An information life cycle approach.
D-Lib Magazine, 6, (1). Available at
http://www.dlib.org/dlib/january00/01hodge.html.
- Lysakowski, R., & Leibowitz, Z. (2000). Looming
information age crisis expected to cause
trillion-dollar losses over the next 20 years.
Titanic 2020: a call to action. Available at
http://www.censa.org.
- Rothenberg, J. (1998/January). Avoiding
Technological Quicksand: Finding a Viable
Technical Foundation for Digital Preservation.
Available at
http://www.clir.org/pubs/reports/rothenberg/contents.html
29Migration: Complexity of the Technical
Environment
30Preservation in Digital Libraries
- Preservation: The managerial, financial, and
technical issues involved in preserving library
(or archive) materials in all formats - and/or
their information content - so as to maximize
their useful life (Eden, 1997)
- Digital preservation: The term refers exclusively
to the preservation (whatever exactly that
entails) of material which is available solely?
in electronic form (Bide, 1999).
- And the digital version is considered to be the
primary archival item. (Hedstrom, 1998)
- Digital Libraries vs. Digital Archives: Archives
make a commitment to long-term preservation of
digital information. (Joint Task Force on
Digital Archiving)
31Digital Archiving - Getting Started
- Form an archiving working group
- Prepare a preliminary policy statement
- Trial the policy statement with several small,
existing projects
- Examine what others are doing and bring in best
ideas.
- Collaborate with others who are interested in
digital preservation.
- Initiate forums on digital archiving; invite
colleagues, students, researchers, etc.
- Submit a recommendation for a digital archiving
program and next steps.
32Trusted Digital Repositories
(http://www.rlg.org/pr/pr2001-attributes.html)
- A Proposed Definition (from RLG document)
- Technology Infrastructure
- Auditability, security, and communication
- Backup policies incl. avoiding, detecting and
restoring corrupted data
- Organization
- Certification
- Compliance
- Reputation and performance
- Agreements between creators and providers
- Open sharing of what is being preserved and for
whom
- Balanced risk, benefit, and cost
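The backup policy item above (detecting and restoring corrupted data) can be sketched as an audit over several stored copies. The example assumes that a SHA-256 digest was recorded at deposit time and that copies are held in a simple in-memory mapping; the function names and sample data are hypothetical, and this is not a procedure taken from the RLG document.

```python
import hashlib


def digest(data: bytes) -> str:
    """Return the SHA-256 digest of a byte stream as a hex string."""
    return hashlib.sha256(data).hexdigest()


def audit_and_repair(copies: dict, recorded_digest: str) -> list:
    """Overwrite corrupted copies with a verified one.

    Returns the names of the copies that were repaired.
    """
    good = next(
        (data for data in copies.values() if digest(data) == recorded_digest),
        None,
    )
    if good is None:
        raise ValueError("no intact copy available to restore from")
    repaired = []
    for name, data in list(copies.items()):
        if digest(data) != recorded_digest:
            copies[name] = good
            repaired.append(name)
    return repaired


# Hypothetical stored copies; the backup has one corrupted byte.
master = b"scanned page 1"
recorded = digest(master)
copies = {"primary": master, "backup": b"scanned pagx 1"}

print(audit_and_repair(copies, recorded))
```

The repair step only works while at least one copy still matches the recorded digest, which is one argument for keeping more than two copies in separate locations.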