Title: Archival, Digital Preservation, and Records Management
1 Archival, Digital Preservation, and Records Management
- David Millman, Columbia University
- Ron Thielen, University of Chicago
2 Agenda
- Difference between an Archive, Repository, and Records Management
- The Three Reasons to Archive
- The State of the Industry, Government, Higher Ed
- Standards
- Policies and Processes
- Steps Toward Archival
- Some Key Issues
3 Differences between an Archive, Repository, and Records Management
- Institutional Repository: A system for collecting, preserving, and disseminating scholarly content.
- Archive: A collection of data that is maintained as a long-term record of a business, application, or information state. Archives are typically kept for auditing, regulatory, analysis or reference purposes rather than for application or data recovery. (SNIA)
- Records Management: The systematic control of records throughout their life cycle. (ARMA)
4 Reasons to Archive
- Legal and Regulatory Compliance
- As an Aid to Corporate Memory in Order to Improve Operational Effectiveness
- To Preserve Material of Potentially Historic and Enduring Value
5 Legal and Regulatory Issues
- Some financial records need to be retained for statutory periods varying up to 10 years
- Medical research records need to be retained beyond the life of the subject
- Lack of a process for retaining records may be, at best, a lack of due diligence and, at worst, obstruction
- Courts are increasingly unwilling to accept the argument that discovery would be too difficult or expensive
- In some cases they are fining companies that are too slow to comply with court orders
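The retention bullets above amount to rules of the form "keep record class X for at least N years". A minimal sketch of how such a rule might be checked follows. Everything in it is an assumption made for illustration: the class names, the `earliest_disposal` and `may_dispose` helpers, and the `legal_hold` flag are hypothetical. The 10-year figure is only the upper bound quoted on the slide, and "no disposal date" stands in for "beyond the life of the subject". Real periods come from statute and counsel, not from this table.

```python
from datetime import date
from typing import Optional

# Hypothetical retention schedule (illustrative only, not legal guidance).
# The value is the minimum retention in years; None means "no disposal date".
RETENTION_YEARS = {
    "financial": 10,           # upper bound quoted on the slide
    "medical_research": None,  # assumption: stands in for "beyond the life of the subject"
}


def earliest_disposal(record_class: str, created: date) -> Optional[date]:
    """Return the first date a record may be disposed of, or None if never."""
    years = RETENTION_YEARS.get(record_class)
    if years is None:
        # Unknown classes and open-ended classes are both kept (fail safe).
        return None
    try:
        return created.replace(year=created.year + years)
    except ValueError:
        # Created on 29 February and the target year is not a leap year:
        # round forward to 1 March so the record is never released early.
        return date(created.year + years, 3, 1)


def may_dispose(record_class: str, created: date, today: date,
                legal_hold: bool = False) -> bool:
    """True only if the retention period has elapsed and no hold applies."""
    if legal_hold:
        # Assumption: a pending discovery request or court order blocks disposal.
        return False
    limit = earliest_disposal(record_class, created)
    return limit is not None and today >= limit
```

The sketch fails safe on purpose: anything it cannot classify is retained, which reflects the slide's point that the risk lies in having no defensible process.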
6 Improve Operational Effectiveness
- Act as an Aid to Institutional Memory
- Assist Institutional Governance by Capturing the Rationale for Decisions
- "Operational" in our Context Extends to Scholarly Effectiveness
7 Historic and Enduring Value
- Not always possible to know a priori what will have enduring value
- Will a researcher in the next century be more interested in the content of a particular web site, or in how the content was presented and how we interacted with it through the browser interface? Both.
8 State of the IT Industry
- Used to be all about compliance
- Increasing awareness that there are other reasons for archival
- Scan of IT Industry Organizations
- Scan of IT Vendors
- Scan of Government Initiatives
- Scan of Higher Education Initiatives
9 IT Industry Organizations
- SNIA (Storage Networking Industry Association) Data Management Forum (DMF)
- LTACSI (Long Term Archive and Compliance Storage Initiative)
- 100 Year Archive Task Force
- SDDF (Self Describing Data Format) Task Force
- ARMA: Association of Records Managers and Administrators (aka RIM Professionals). Working with the SNIA
- AIIM: Association for Information and Image Management. Believes that ISO adoption of PDF/A is the way to address preservation
10 Scan of IT Vendors
- Niche (generally seem to get it)
- Archivas, Permabit, Yosemite
- 800 lb Gorillas (some get it, some don't)
- HP, IBM, EMC, Sun (aka StorageTek)
- Archival Vendors (generally don't seem to get it)
- Commvault, Zantaz, ZipLip, iLumin,
11 Survey of Government Authorities and Initiatives
- LOC: Library of Congress
- NARA: National Archives and Records Administration
- NDIIPP: National Digital Information Infrastructure and Preservation Program
12 Survey of Higher Education and Library Initiatives
- DSpace (an institutional repository, not an archive)
- FEDORA (ditto)
- Stanford LOCKSS (Lots of Copies Keep Stuff Safe)
- DAITSS (Dark Archive in the Sunshine State)
- NEDLIB (Networked European Deposit Library)
- JORUM (repository service, U.K.)
- Columbia (DSpace pilots; FEDORA in Socioeconomic Data Center Long-Term Archive)
- CDAD (Chicago Digital Archive Depository)
- RLG Digital Repository Certification
- UCSD / SRB (Storage Resource Broker)
- JHOVE (Harvard--object validation service)
13 Standards (formal, ad-hoc, and otherwise)
- OAIS: Open Archival Information System
- PREMIS: Preservation Metadata Standard
- METS: Metadata Encoding and Transmission Standard
- EAD: Encoded Archival Description
- MADS: Metadata Authority Description Schema
- MODS: Metadata Object Description Schema
- DOD 5015.2: Design Criteria Standard for Electronic Records Management Software Applications
- ISO 15489 (Records Management)
- and on and on and ...
14 Standards for Access and Interoperation
- Institutional Repository service vs Archive
- Scholarly/Instructional Access issues
- Discovery
- Interoperation/reuse
- Citation stability
- Digital Library issues
- Content structure
- Format migration
16 Policy/process
- Strategies
- email: nightly incrementals (a backup strategy)
- digital library: quarterly curator sign-off (an archival strategy)
- Faculty buy-in
- minimum metadata?
- education
17 Education experiment: Spectrum of Stability
(Diagram labels)
- Citable working paper
- Publication
- Versioning
- Active collaboration
- Multiple users w/collab space functions
- File system metaphor / w/some metadata
- Institutional repository / metadata
- Preserved / archived / cataloged
- Library curation
- Scholarly research activity
18 Five Steps to Archival
- Backup - a backup is not an archive, but backup processes, support personnel, and infrastructure may (or may not) support parts of the archival infrastructure
- Simple Bitstream Preservation - keep from losing the information; adds fixity checking and digital media asset management to backup
- Records Management - adds policy based classification and information life-cycle management
- Intellectual Content Preservation - keep the format current; migrate (or emulate) formats and structures
- Archival - adds bibliographic and administrative metadata
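Fixity checking, the addition named in the bitstream preservation step, usually means recording a checksum for every file at ingest and recomputing it later to detect silent change or loss. The following is a minimal sketch using Python's standard library. The function names (`sha256_of`, `build_manifest`, `verify_manifest`) and the choice of SHA-256 are assumptions made for illustration, not something the slides prescribe.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large objects do not have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> Dict[str, str]:
    """Map each file under root (by relative POSIX path) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root: Path, manifest: Dict[str, str]) -> List[str]:
    """Return a sorted list of problems; an empty list means every file matched."""
    problems = []
    for rel_path, expected in sorted(manifest.items()):
        target = root / rel_path
        if not target.is_file():
            problems.append(f"missing: {rel_path}")
        elif sha256_of(target) != expected:
            problems.append(f"changed: {rel_path}")
    return problems
```

In practice the manifest would be stored apart from the content it describes and rechecked on a schedule. A checksum only detects damage; repairing it needs a second copy.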
19 Sampling of Issues
- Not Enough Cooperation to Build Standards Based Archival Systems
- It's not just about the data
- Metadata is key. Where does it come from (harvest, contributor, cataloger)?
- Context is often necessary (e.g. roles, organizational structures both formal and informal, provenance)
- A Backup is not an Archive
- IP / DRM
- Whose Archive Is It?
- Digital Media Asset Management (tape is dead, long live tape)
- Balancing Collection of Everything vs. Determining Suitability of Material for Archival (Selection Criteria)
- Data Classification (Metadata Driven, Policy Based Selection Processes?)
- Requirements for Research Preservation and Dissemination
- Fixity Checking and Repair
- Disaster Recovery
- ?
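The "Fixity Checking and Repair" item above implies that detection is only half the job: a damaged copy can be repaired only if intact copies exist elsewhere. One simplified way to think about it is a majority vote across replicas, sketched below. This is an illustration under stated assumptions, not the LOCKSS polling protocol or any vendor's mechanism: the site names, the in-memory `copies` dictionary, and the `majority_content` and `repair` helpers are all hypothetical, and it assumes a strict majority of copies is undamaged.

```python
import hashlib
from collections import Counter
from typing import Dict, List, Optional


def majority_content(copies: Dict[str, bytes]) -> Optional[bytes]:
    """Return the content whose SHA-256 digest a strict majority of copies share."""
    if not copies:
        return None
    digests = {site: hashlib.sha256(data).hexdigest() for site, data in copies.items()}
    digest, votes = Counter(digests.values()).most_common(1)[0]
    if votes * 2 <= len(copies):
        return None  # no strict majority: do not guess which copy is right
    for site, data in copies.items():
        if digests[site] == digest:
            return data
    return None


def repair(copies: Dict[str, bytes]) -> List[str]:
    """Overwrite minority copies in place; return the sorted names of repaired sites."""
    good = majority_content(copies)
    if good is None:
        raise ValueError("no majority copy available; manual review needed")
    repaired = sorted(site for site, data in copies.items() if data != good)
    for site in repaired:
        copies[site] = good
    return repaired
```

With three copies where one has a flipped byte, `repair` would rewrite that one copy and report its site name. With only two copies that disagree it raises, since nothing in the data says which one is correct.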