Title: Digital Preservation
1 Digital Preservation
2 The Past is Prologue
- Developing Preservation Approaches
3 Diagram by Nancy Y. McGovern based on PhD research, March 2001
4 Five Stages of Digital Preservation
- Digitization leads to understanding that digital content needs to be managed and protected
- Digital Preservation Projects are initiated
- Digital Preservation Projects segue into Programs
- Digital Preservation Programs become comprehensive and coordinated
- Institutional Programs embrace Inter-institutional Collaboration
5 Digital Preservation Officer
- First DPO appointed January 2002
- http://www.library.cornell.edu/iris/dpo/
- coordinates digital preservation policy development and implementation
- serves as the liaison to digital preservation initiatives and projects
- develops a conceptual framework for a cohesive digital preservation program
6 Models and Standards
- Attributes of a Trusted Digital Repository (RLG-OCLC)
- http://www.rlg.org/longterm/attributes01.pdf
- OAIS Reference Model (CCSDS)
- http://www.ccsds.org/documents/pdf/CCSDS-650.0-R-2.pdf
7 Models and Standards
- SIP Transfer Issues
- Producer-Archive Interface Methodology Abstract Standard (CCSDS)
- http://ssdoo.gsfc.nasa.gov/nost/isoas/CCSDS-651.0-W-1.pdf
- AIP Components (OCLC/RLG PMWG)
- Content Information
- Preservation Description Information
- http://www.oclc.org/research/pmwg/
- Format Issues
- Draft Standard - Data Dictionary - Technical Metadata for Digital Still Images (NISO)
- http://www.niso.org/committees/committee_au.html
8 Attributes of a Trusted Repository
9 1. Administrative responsibility
- Provide evidence of fundamental commitment to standards, best practices
- Commit to OAIS model
- Meet standards on environment (6)
- Share measurements with depositors (6)
- Involve external community experts in validating/certifying practices (6)
- Commit to transparency and accountability (6)
10 2. Organizational viability
- Demonstrate viability and trustworthiness (3)
- Reflect commitment to long-term retention/management in mission statements
- Have appropriate legal status, staff and professional development (1)(3)
- Establish transparent business practices, effective management policies (6)(3)
- Define inclusive agreements with depositors (6)
- Review/maintain policies and procedures (6)
- Undertake risk management, contingency and succession (trusted inheritors) planning (6)(3)
11 3. Financial sustainability
- Establish/maintain good business practices and an auditable business plan (1)(2)
- Demonstrate financial fitness and ongoing financial commitment (1)(2)
- Balance risk, benefit, investment, expenditure
- Maintain adequate budget and reserves and actively seek potential funding sources
12 4. Technological suitability
- Consider/adopt appropriate preservation strategies (6)
- Ensure appropriate infrastructure for acquisition, storage, access (5)
- Establish technology management policy for repository (2)(3)
- Comply with relevant standards and best practices, adequate expertise (6)
- Undergo regular external audits on system components and performance (6)
13 5. System security
- Assure security of systems for digital assets (3)
- Establish policies and procedures to meet requirements (4)(6)
- Stress processes that will detect, avoid and repair loss, document and notify of changes and resulting actions (4)(6)
14 6. Procedural accountability
- Enact policies and procedures for tasks and functions, document practices (1)(2)
- Establish monitoring mechanisms to ensure continued operation of systems and procedures (4)(5)
- Record/justify preservation strategies (1)(2)
- Set up feedback mechanisms for problem resolution and to negotiate evolving requirements between providers and consumers (1)(2)
15 Framework Components
- Administrative Responsibility
- Organizational Viability
- Financial Sustainability
- Technological Suitability
- System Security
- Procedural Accountability
16 Diagram by Nancy Y. McGovern based upon the RLG-OCLC Attributes of a Trusted Repository
17 Open Archival Information System (OAIS)
18 Framework to Model
19 Overview of the OAIS Model
from Reference Model for an Open Archival Information System 4
20 OAIS Categories
- Data Object
- Representation Information
- (Structure, Semantic, and Other Information)
- Content Information 1
- (Data Object + Representation Information)
- Preservation Description Information 2
- (Reference, Context, Provenance and Fixity Information)
- Descriptive Information
- (Content Information + PDI)
- Packaging Information
- physically and logically binds
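The way these categories nest can be sketched as a small data structure. The following is a minimal illustration, not part of the OAIS specification: the class and field names are hypothetical, SHA-256 is assumed as the fixity mechanism, and Descriptive Information is modeled simply as a dictionary derived from Content Information and PDI.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class RepresentationInformation:
    structure: str      # how the bits are organized, e.g. a file format name
    semantics: str      # what the structured data means
    other: str = ""     # any further information needed to interpret the object


@dataclass
class ContentInformation:
    data_object: bytes                          # the bits being preserved
    representation: RepresentationInformation   # what makes the bits intelligible


@dataclass
class PreservationDescriptionInformation:
    reference: str      # identifier for the content
    context: str        # relationship to its environment
    provenance: str     # history and chain of custody
    fixity: str         # here: hex SHA-256 digest of the data object (assumption)


@dataclass
class InformationPackage:
    content: ContentInformation
    pdi: PreservationDescriptionInformation
    packaging: str      # note on how the components are bound together

    def fixity_ok(self) -> bool:
        """Recompute the digest and compare it with the recorded fixity value."""
        digest = hashlib.sha256(self.content.data_object).hexdigest()
        return digest == self.pdi.fixity

    def descriptive_information(self) -> dict:
        """Derive a finding-aid record from Content Information and PDI."""
        return {
            "id": self.pdi.reference,
            "format": self.content.representation.structure,
            "provenance": self.pdi.provenance,
        }


def build_package(data: bytes, identifier: str, fmt: str) -> InformationPackage:
    """Hypothetical helper that assembles a package and records its fixity."""
    rep = RepresentationInformation(structure=fmt, semantics="unspecified")
    pdi = PreservationDescriptionInformation(
        reference=identifier,
        context="unspecified",
        provenance="created by build_package",
        fixity=hashlib.sha256(data).hexdigest(),
    )
    return InformationPackage(ContentInformation(data, rep), pdi, packaging="in-memory")
```

Calling `build_package(b"...", "example-id", "text/plain")` and later `fixity_ok()` shows the role of Fixity Information: any change to the Data Object makes the check fail.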
21 OAIS at Cornell
22 Preserving Essential Elements
- Content
- Context
- Structure
- Appearance
- Behavior
23 Emulation
- Jeff Rothenberg
- Dutch National Library
- IBM
- CAMiLEON Project
- David Bearman
24 Migration
- Risk Management of Digital Information: A File Format Investigation
- Charles Dollar
- Margaret Hedstrom
- CAMiLEON Project
- Dutch Testbed Project
25 XML and Object-Based
- NARA and SDSC
- Dutch Testbed Project
- Victoria Electronic Records Project (VERS)
- Harvard SIP proposal
26 Project Prism
- CUL Research Team
- Anne R. Kenney
- Nancy Y. McGovern
- Peter Botticelli
- Richard Entlich
27 Risk Management Stages
Typical Stages               Prism Stages
1. Risk identification       1. Data gathering and characterization
2. Risk classification       1. Data gathering and characterization
3. Risk assessment           2. Simple risk declaration; 3. Contextualized declaration/detection
4. Risk analysis             2. Simple risk declaration; 3. Contextualized declaration/detection
5. Program implementation    4. Automated enforcement
28 Levels of Context
- Web page
- as a stand-alone object, ignoring its hyperlinks
- in local context, considering the links into it and out from it
- Web site
- as a semantically coherent set of linked Web pages
- as an entity in a broader technical and organizational context
30 Page-level Monitoring
- Formatting TIDY
- Standards compliance
- Document structure
- Metadata
- HTTP headers
- HTML headers
- Changes
- Content
- Location
- Links
- Out-link structure
- In-link structure
- Intra-site
- Hub
- Volatility
- Page provenance
- URL parsing
- Log analysis
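Two of the page-level checks listed above, content change detection and out-link structure, can be sketched with the Python standard library alone. This is an illustrative sketch rather than Project Prism's tooling: the function names are hypothetical, a SHA-256 digest is assumed as the change signal, and the page is passed in as a string instead of being fetched over HTTP.

```python
import hashlib
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def content_fingerprint(html: str) -> str:
    """Digest of the page body; a different digest means the content changed."""
    return hashlib.sha256(html.encode("utf-8")).hexdigest()


def has_changed(old_html: str, new_html: str) -> bool:
    return content_fingerprint(old_html) != content_fingerprint(new_html)


def classify_out_links(page_url: str, html: str) -> dict:
    """Resolve each link against the page URL and split intra-site from external."""
    parser = LinkExtractor()
    parser.feed(html)
    site = urlparse(page_url).netloc
    result = {"intra_site": [], "external": []}
    for href in parser.links:
        absolute = urljoin(page_url, href)
        parts = urlparse(absolute)
        if parts.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, and similar non-web links
        key = "intra_site" if parts.netloc == site else "external"
        result[key].append(absolute)
    return result
```

Comparing fingerprints across visits gives a crude volatility measure; a real monitor would likely normalize the markup first (the slide mentions TIDY) so that cosmetic changes are not reported as content changes.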
31 Site-level Monitoring
- Graph analysis
- Static site analysis and Longitudinal study
- Aggregate page analyses
- Site maintenance indicators
- Backup and archiving policies and procedures
- Hardware and software environment
- Network configuration and maintenance
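As a rough illustration of the graph analysis and longitudinal study items, a site can be treated as a directed graph of pages and compared across snapshots. The sketch below makes several assumptions: the graph is a plain dictionary mapping each page to the pages it links to, the hub threshold is arbitrary, and all function names are hypothetical.

```python
def link_degrees(site_graph: dict) -> dict:
    """Count distinct in-links and out-links for every page in the graph."""
    degrees = {
        page: {"in": 0, "out": len(set(targets))}
        for page, targets in site_graph.items()
    }
    for page, targets in site_graph.items():
        for target in set(targets):
            # Pages that are linked to but never crawled still get an entry.
            degrees.setdefault(target, {"in": 0, "out": 0})
            degrees[target]["in"] += 1
    return degrees


def hubs(site_graph: dict, min_out_links: int = 3) -> list:
    """Pages with many out-links; the threshold is an arbitrary assumption."""
    degrees = link_degrees(site_graph)
    return sorted(p for p, d in degrees.items() if d["out"] >= min_out_links)


def orphans(site_graph: dict, root: str) -> list:
    """Pages other than the root that no page links to."""
    degrees = link_degrees(site_graph)
    return sorted(p for p, d in degrees.items() if d["in"] == 0 and p != root)


def snapshot_diff(old_graph: dict, new_graph: dict) -> dict:
    """Longitudinal comparison: which pages appeared or disappeared."""
    old_pages, new_pages = set(old_graph), set(new_graph)
    return {
        "added": sorted(new_pages - old_pages),
        "removed": sorted(old_pages - new_pages),
    }
```

Orphaned pages and pages that vanish between snapshots are the kind of site maintenance indicators that could feed a risk assessment, though which signals matter is a policy decision outside this sketch.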
32 Research Plan
- Preservation Risk Management for Web Resources: Virtual Remote Control in Cornell's Project Prism
- By Anne R. Kenney, Nancy Y. McGovern, Peter Botticelli, Richard Entlich, Carl Lagoze, and Sandra Payette
- D-Lib Magazine, January 2002
- http://www.dlib.org/dlib/january02/kenney/01kenney.html
34 Publisher-Based Digital Archives
35 Subject-Based Digital Archives
36 Intersection of Digital Archives
Format-based
38 Relevant Initiatives
- Metadata Encoding and Transmission Standard (METS)
- http://www.loc.gov/standards/mets/
- highlighted Web site in RLG DigiNews February 2002
- Flexible and Extensible Digital Object and Repository Architecture (FEDORA)
- Mellon Fedora Project
- http://fedora.comm.nsdlib.org
- Slides from January 2002 briefing: http://www.cs.cornell.edu/payette/presentations
39 Relevant External Projects
- NEDLIB
- http://www.kb.nl/coop/nedlib/
- CAMiLEON (CEDARS)
- http://www.si.umich.edu/CAMILEON/index.htm
- http://www.leeds.ac.uk/cedars/
- PANDORA
- http://pandora.nla.gov.au/index.html
- Harvard University LDI
- http://hul.harvard.edu/ldi/
- NARA and SDSC
- http://www.nara.gov/era/