Title: Preservation Rumination
1Preservation Rumination
- Priscilla Caplan,
- FCLA
- OCLC DSS
- February 16, 2005
2Preservation Basics
3THE NEED FOR DIGITAL PRESERVATION
- Number of academic/scholarly journals published online: 15,757
- Percent of U.S. federal government publications produced only online in 2003: 65 percent
- Estimated percent of U.S. federal government publications available only online by 2008: 90 percent
- From California Digital Library
- http://www.cdlib.org/inside/projects/preservation
4The problem of abundance
5The problem of ephemerality
- Percent of web-based references in scientific articles from 3 major journals inaccessible within 2 years of publication: 21
- Proportion of websites in 1998 gone in 1999: 44
- Life of an average website: 44 days
6The problems of media life expectancy and obsolescence
7The problem of format obsolescence
9The problem of rights
10Preservation strategies
Renderability
Media management
Viability
Secure storage
Integrity
Identity
Description
Capture Selection
Availability
The Preservation Pyramid
11Authenticity
12"Traditionally, preserving things meant keeping them unchanged; however, if we hold on to digital information without modifications, accessing the information will become increasingly more difficult, if not impossible."
From The Paradox of Preservation, Su-Shing Chen
13"Preservation metadata ...is the information necessary to maintain the viability, renderability, and understandability of digital resources over the long-term."
OCLC/RLG Preservation Metadata Framework Working Group
Understandability
14Understandability
Authenticity
Preservation strategies
Renderability
Media management
Viability
Integrity
Secure storage
Identity
Description
Availability
Capture Selection
Revised Preservation Pyramid
15Who is doing preservation?
- Research Libraries
- Government Archives
- Historical Societies
- Individual Collectors
16Who is doing digital preservation?
- Research Libraries
- Government Archives
- Historical Societies
- Individual Collectors
- National Libraries
- Research Centers
- Public broadcasting
17[Figure: Revised Preservation Pyramid, same levels as slide 14, with DSPACE marked on it]
18[Figure: Revised Preservation Pyramid, same levels as slide 14, with LOCKSS marked on it]
19[Figure: Revised Preservation Pyramid, same levels as slide 14, with OCLC Digital Archive marked on it]
20[Figure: Revised Preservation Pyramid, same levels as slide 14, with LC Minerva marked on it]
21FCLA Digital Archive
[Figure: Revised Preservation Pyramid, same levels as slide 14]
22Preservation in Action
24State Universities
FCLA
25- Designed as a dark archive
- Preservation repository functions only
- Based on OAIS functional architecture
- Bit-level and full preservation
- Format migration and normalization
26 OAIS Functional Architecture
27DAITSS Functional Architecture
Reporting
LIBRARY
LIBRARY
Mgmt DB
Ingest
SIP
Access
DIP
AIP
Storage management
28DAITSS Data Model
Information Package
Intellectual entity (1)
Data File (1..n)
Bitstream (0..n)
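The cardinalities on this slide can be sketched as a small set of Python classes. This is only an illustration of the relationships as listed (one intellectual entity, one or more data files, zero or more bitstreams per file). The class names, field names, and the reading that bitstreams belong to data files are assumptions, not the actual DAITSS schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Bitstream:
    """A stream of bits inside a data file (hypothetical fields)."""
    bitstream_id: str
    description: str = ""


@dataclass
class DataFile:
    """One file in the package; it may hold zero or more bitstreams (0..n)."""
    path: str
    bitstreams: List[Bitstream] = field(default_factory=list)


@dataclass
class InformationPackage:
    """One intellectual entity (1) represented by one or more data files (1..n)."""
    intellectual_entity: str
    data_files: List[DataFile]

    def __post_init__(self) -> None:
        # Enforce the 1..n cardinality for data files.
        if not self.data_files:
            raise ValueError("an information package needs at least one data file")


# Usage: a package with two files, one of which carries a bitstream.
package = InformationPackage(
    intellectual_entity="Example thesis",
    data_files=[
        DataFile("thesis.pdf", [Bitstream("b1", "embedded image")]),
        DataFile("defense.avi"),
    ],
)
```

Putting the cardinality check in the constructor means a package with no data files cannot be created by accident.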
29DAITSS Data File Object
DAITSS Bitstream Object
30Risk Management
- Storing multiple master copies of files
- Calculating two message digests
- Storing metadata as XML and in an RDBMS
- Normalizing when possible
- Always retaining original
- Action plans and background papers
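As an illustration of the "two message digests" item, the sketch below computes two independent digests for a file in a single pass and compares them with recorded values. The slide does not say which algorithms are used, so SHA-1 and SHA-256 are assumptions here, as are the function names.

```python
import hashlib
from pathlib import Path
from typing import Dict


def compute_digests(path: Path, chunk_size: int = 1 << 20) -> Dict[str, str]:
    """Return two digests of the file, reading it once in chunks."""
    sha1 = hashlib.sha1()      # assumed algorithm, not named in the slides
    sha256 = hashlib.sha256()  # assumed algorithm, not named in the slides
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            sha1.update(chunk)
            sha256.update(chunk)
    return {"sha1": sha1.hexdigest(), "sha256": sha256.hexdigest()}


def is_intact(path: Path, recorded: Dict[str, str]) -> bool:
    """True only if both freshly computed digests match the recorded ones."""
    return compute_digests(path) == recorded
```

Recording two digests from different algorithms means a corrupted or altered file would have to collide under both at once to go unnoticed.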
31Ingest Functions
- METS validation and metadata extraction
- Virus check and checksum verification
- File format identification
- Creation of Data File and Bitstream objects
- Harvesting of external files
- Normalization and Forward Migration
- Technical, relationship and event metadata
- AIP creation
- Storage update
- Data table update
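One of the ingest steps above, file format identification, can be sketched by checking a file's leading bytes against well-known signatures. This is a minimal illustration covering only the formats named in these slides (PDF, TIFF, AVI, XML). The function name is hypothetical, and the slides do not describe how DAITSS itself identifies formats.

```python
from typing import Optional


def identify_format(header: bytes) -> Optional[str]:
    """Guess a format from the first bytes of a file, or return None."""
    if header.startswith(b"%PDF-"):
        return "PDF"
    # TIFF starts with a byte-order mark followed by the number 42.
    if header[:4] in (b"II*\x00", b"MM\x00*"):
        return "TIFF"
    # AVI is a RIFF container whose form type at offset 8 is "AVI ".
    if header[:4] == b"RIFF" and header[8:12] == b"AVI ":
        return "AVI"
    # XML usually opens with a declaration; this sketch ignores byte-order marks.
    if header.lstrip().startswith(b"<?xml"):
        return "XML"
    return None


def identify_file(path: str) -> Optional[str]:
    """Read the first 16 bytes of a file and identify its format."""
    with open(path, "rb") as handle:
        return identify_format(handle.read(16))
```

A file that returns None here would still need handling; a real archive would need a far larger signature set and a policy for unrecognized files.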
32Ingest Example: A simple SIP
[Figure: a SIP; labels: XML, PDF, AVI]
33[Figure: AIP diagram; labels: AIP, SIP, XML, TIFF, PDF, AVI, Database]
34Future Plans
- Find partners to install at other places
- Finish DAITSS
- Release under open source license
- Build a community of developers for different formats
35References
- Priscilla Caplan: www.fcla.edu/pcaplan, pcaplan_at_ufl.edu
- FCLA Digital Archive: www.fcla.edu/digitalArchive
- Terry Kuny, A Digital Dark Ages?: www.ifla.org/IV/ifla63/63kuny1.pdf
- PREMIS Implementation Survey: www.oclc.org/research/projects/pmwg/surveyreport.pdf
- Roy Rosenzweig, Scarcity or Abundance?: www.historycooperative.org/journals/ahr/108.3/rosenzweig.html
- O'Neill et al., Trends in the Evolution of the Public Web: www.dlib.org/dlib/april03/lavoie/04lavoie.html
- Clifford Lynch, Authenticity and Integrity in the Digital Environment: www.clir.org/pubs/reports/pub92/lynch.html