Title: Planning to Maximize Longevity of Digital Information
1Planning to Maximize Longevity of Digital
Information
- Howard Besser
- UCLA School of Education & Information
- http://www.gseis.ucla.edu/howard
2Planning to Maximize Longevity of Digital Info
- The Ecology Metaphor
- Why are you Managing this Information?
- Major Issues Facing Digital Projects
- The Short Life of Digital Info
- Important Planning Considerations
- Key Considerations for Imaging Projects
3The Ecology Metaphor
4Why are you Managing this Information?
- Organizational mission type
- Users
- Uses
5Major Issues Facing Digital Projects
- Dangerous Changes in Intellectual Property Law
- Intellectual Access
- Storage
- Delivery
- Integration with other tools
- Interoperability
6Serious Longevity Problems
- What we know from prior widespread digital file formats
- Images separating from their metadata
- Inaccessibility of software needed to view a work
- Inability to even decode the file format of a work
7The Short Life of Digital Info: Digital Longevity Problems
- Disappearing Information
- The Viewing Problem
- The Scrambling Problem
- The Inter-relation Problem
- The Custodial Problem
- The Translation Problem
8The Viewing Problem
- Digital Info requires a whole infrastructure to view it
- Each piece of that infrastructure is changing at an incredibly rapid rate
- How can we ever hope to deal with all the permutations and combinations?
9The Scrambling Problem: Dangers from
- Compression to ease storage & delivery
- Container Architecture to enhance digital commerce
10The Inter-relation Problem
- Info is increasingly inter-related to other info
- How do we make our own Info persist when it points to and integrates with Info owned by others?
- What is the boundary of a set of information (or even of a digital object)?
11The Custodial Problem
- In the past, much of survival was due to redundancy
- How do we decide what to save?
- Who should save it?
- Mellon-funded E-Journal Archives
- How should they save it?
12The Custodial Problem: How to save information?
- Methods for later access
- Refreshing
- Migration
- Emulation
- Issues of authenticity and evidence
13The Translation Problem
- Content translated into new delivery devices changes meaning
- A photo vs. a painting
- If Info is produced originally in digital form in one encoded format, will it be the same when translated into another format?
- Behaviors
14Pieces of the Solution (1/2)
- We need to insist upon clearly readable standardized ways for digital objects to self-identify their formats
- We should discourage scrambling
- We need to better understand how information inter-relates to other Info, and what constitutes boundaries of Info objects
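One way to picture what "self-identifying" means in practice is the leading signature bytes that many existing formats already carry. The sketch below is only an illustration under that assumption: the table lists a few widely documented signatures, and the function names (`identify_format`, `identify_file`) are hypothetical, not part of any standard or tool named in this talk.

```python
# Illustrative only: a few well-known leading signatures ("magic numbers").
# A real registry would be far larger and would also record versions.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
    (b"%PDF-", "PDF"),
]


def identify_format(header: bytes) -> str:
    """Return a format name for the leading bytes of a file, or 'unknown'."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"


def identify_file(path: str) -> str:
    """Read only the first few bytes of a file and try to name its format."""
    with open(path, "rb") as f:
        return identify_format(f.read(16))
```

A file whose contents have been scrambled (compressed or wrapped in a container) no longer begins with a recognizable signature, so a check like this falls through to "unknown".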
15Pieces of the Solution (2/2)
- People and organizations wishing to make information persist need guidelines of how to go about doing it
- We need to better understand how translating from one storage or display format to another affects the meaning of a work
- We need to save the behaviors of a digital object, not just its contents
16Conceptual Approaches to Digital Preservation
- Refreshing -- always necessary due to volatility of physical strata
- Impact on evidential value
- Migration -- advantages & disadvantages
- Emulation -- advantages & disadvantages
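Refreshing, in the sense used here, copies the same bits onto new media without changing the format. A minimal sketch of that step follows, assuming a SHA-256 digest is an acceptable fixity check; the function names `sha256_of` and `refresh` are hypothetical and chosen only for illustration.

```python
import hashlib
import shutil


def sha256_of(path: str) -> str:
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh(src: str, dst: str) -> str:
    """Copy src to dst (new media) and confirm the bits are unchanged.

    Returns the digest so it can be kept as evidence of the copy.
    """
    before = sha256_of(src)
    shutil.copyfile(src, dst)
    after = sha256_of(dst)
    if before != after:
        raise IOError(f"refresh of {src} changed the bits: {before} != {after}")
    return after
```

Keeping the returned digest alongside the file gives a later custodian something to compare against, which bears on the evidential-value question. Migration and emulation change the format or the environment, so a bit-level check like this one does not cover them.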
17To deal with Immediately
18Persistent IDs--the Problem
- Need to separate work ID from work location
- URNs probably won't be ready until 2003
- Becomes a business process issue when one organization maintains the resource and another organization references it (i.e., licensed from vendors or managed by separate administrative structures)
19More Persistent IDs--the Approach for today
- PURLs
- Handles
- HTTP redirects
- And worry about costs now and conversion costs
when URNs become feasible
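PURLs, Handles and HTTP redirects share one idea: references cite a stable ID, and a resolver maps that ID to wherever the work currently lives. The sketch below illustrates only that idea; the `Resolver` class, the ID string and the example.org URLs are hypothetical, and it does not reproduce the actual PURL or Handle software.

```python
class Resolver:
    """Toy resolver: maps a persistent work ID to its current location."""

    def __init__(self):
        self._locations = {}

    def register(self, work_id: str, url: str) -> None:
        """Record (or update) where the work currently lives."""
        self._locations[work_id] = url

    def resolve(self, work_id: str):
        """Answer as an HTTP redirect would: (status code, headers)."""
        url = self._locations.get(work_id)
        if url is None:
            return 404, {}
        return 302, {"Location": url}


resolver = Resolver()
resolver.register("work/1234", "http://server-a.example.org/images/1234.tif")
# The work moves to another server; citations of "work/1234" need no change.
resolver.register("work/1234", "http://server-b.example.org/archive/1234.tif")
status, headers = resolver.resolve("work/1234")
```

Only the resolver's table is updated when a resource moves, which is why keeping that table current becomes a business process issue once two organizations are involved.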
20Data Set Management: More issues with referencing IDs
- References for mirror sites
- References for back-up sites when main site is down or bottle-necked
- References for off-site copies and archival copies
21Metadata can be the first line of defense
- Can tell you
- where the file is (if you can't find the file)
- where more info about the file is (if you have the file but most other metadata has become separated)
- what the file format is
- what the compression scheme is
- what application program and version is needed for the file
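The items in this list can be pictured as a small record that travels with each file. The following is a minimal sketch, assuming a JSON sidecar is an acceptable carrier; the field names and the `FileRecord`, `to_sidecar` and `from_sidecar` names are hypothetical and do not come from any metadata standard.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class FileRecord:
    """One record per file, covering the questions listed on the slide."""
    file_location: str        # where the file is
    more_info_location: str   # where more info about the file is
    file_format: str          # what the file format is
    compression: str          # what the compression scheme is
    application: str          # what application program is needed
    application_version: str  # and which version of it


def to_sidecar(record: FileRecord) -> str:
    """Serialize the record as human-readable JSON text."""
    return json.dumps(asdict(record), indent=2, sort_keys=True)


def from_sidecar(text: str) -> FileRecord:
    """Rebuild the record from its JSON text."""
    return FileRecord(**json.loads(text))
```

Because the sidecar is plain text, it stays readable even when the application needed for the file itself is no longer available.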
22Structural Metadata Issues
- http://sunsite.berkeley.edu/moa2
23Architecture Separating Longevity and Delivery
Servers
24Groups Working on the Big Problem (http://sunsite.Berkeley.EDU/Longevity/)
- CPA Task Force
- Getty Time Bits Conference Follow-ups
- Emulation experiments in US and Europe
- NEDLIB, CURL, Michigan
- Mellon-funded E-Journal Archive experiments
- Internet Archive
- Long Now
25Time Bits
26Time Bits Participants
- Stewart Brand
- Howard Besser
- Brian Eno
- Danny Hillis
- Peter Lyman
- Brewster Kahle
- Kevin Kelly
- Jaron Lanier
- Doug Carlston
- John Heilemann
- Ben Davis
- Margaret MacLean
- Bruce Sterling
- Paul Saffo
27Groups Working on Pieces of the Big Problem (http://sunsite.berkeley.edu/Longevity/)
- Internet Archive
- Long Now
- Emulation experiments in US and Europe
- NEDLIB, CURL, Michigan
28Journal Archiving
- License, don't own; may not even be able to obtain the right to make an archival copy
- Increasingly no paper back-up at all
- Usually we don't have the important redundancy factor
- Stanford's LOCKSS Project (Lots of Copies Keeps Stuff Safe) and its problems (http://lockss.stanford.edu)
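The redundancy factor can be made concrete with a simple comparison among independently held copies. The sketch below is a simplified majority vote over checksums, offered only to illustrate why many copies help; it is not the LOCKSS protocol, and the `audit_copies` function and holder names are hypothetical.

```python
from collections import Counter


def audit_copies(digests: dict) -> tuple:
    """Compare checksums reported by several holders of the same work.

    digests maps a holder name to the checksum of that holder's copy.
    Returns (consensus checksum, sorted list of holders that disagree).
    """
    if not digests:
        raise ValueError("no copies to compare")
    consensus, votes = Counter(digests.values()).most_common(1)[0]
    if votes <= len(digests) / 2:
        raise ValueError("no majority: cannot tell which copies are intact")
    suspects = sorted(h for h, d in digests.items() if d != consensus)
    return consensus, suspects
```

With a single licensed copy and no paper back-up there is nothing to vote against, so damage to that one copy cannot be detected this way.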
29Complexity of Rich Media
- Works often have artistic nature (including video games)
- Enormous number of elements can, at times, be very important to preserve (pacing, original artifact, elements used to construct the artifact)
- Too complex to save every one of these aspects for every type of material
- Importance of saving documentation
30Important Planning Considerations
- File Formats
- Choosing Interoperable Systems
- Adhere to standards
- Vendors with large installed base
- Refreshing and/or Migration
31Key Considerations for Imaging Projects
- Users' Needs
- Image Quality
- Intellectual Property
- Standards
- Topology
- Tools & Processes
32Key Considerations for Imaging Projects (1 of 3)
- Users' Needs
- Quality of Digital Surrogate
- Interoperable desktop applications
- Image Quality
- Archival
- Current online delivery
33Key Considerations for Imaging Projects (2 of 3)
- Intellectual Property
- Standards
- Modular and Layered Architecture
- Terminology
- Technical imaging information
- Topology
34Key Considerations for Imaging Projects (3 of 3)
- Tools & Processes
- Scanners
- Compression techniques
- Linking files
- Workflow
- Interoperable desktop applications
35Some nuts-and-bolts Planning Considerations
- Think about users (and potential users), uses, and type of material/collection
- Scan at the highest quality that does not exceed the needs of the likely potential users/uses/material
- Do not let today's delivery limitations influence your scanning file sizes; understand the difference between digital masters and derivative files used for delivery
- Many documents which appear to be bitonal actually are better represented with greyscale scans
- Include color bar and ruler in the scan
- Use objective measurements to determine scanner settings (do NOT attempt to make the image look good on your particular monitor or use image processing to color correct)
- Don't use lossy compression
- Store in a common (standardized) file format
- Capture as much metadata as is reasonably possible (including metadata about the scanning process itself)
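The storage cost behind these choices is simple arithmetic: pixels across times pixels down times bits per pixel. The sketch below works through it for an uncompressed master; the function name `uncompressed_bytes` and the example page size and resolution are assumptions made for illustration, not recommendations from the talk.

```python
def uncompressed_bytes(width_in: float, height_in: float,
                       dpi: int, bits_per_pixel: int) -> int:
    """Size in bytes of the raw pixel data for an uncompressed scan.

    Ignores file headers and any embedded metadata.
    """
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return (pixels * bits_per_pixel + 7) // 8  # round up to whole bytes


# An 8 x 10 inch original scanned at 600 dpi:
bitonal = uncompressed_bytes(8, 10, 600, 1)     # 1 bit per pixel
greyscale = uncompressed_bytes(8, 10, 600, 8)   # 8 bits per pixel
colour = uncompressed_bytes(8, 10, 600, 24)     # 24 bits per pixel
```

Under these assumptions the bitonal master is 3.6 million bytes, the greyscale master 28.8 million and the 24-bit colour master 86.4 million. Figures like these are why masters and delivery derivatives are treated as separate files.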
36One Final Question: Who will collect the digital works of today that should become the Special Collections of tomorrow?
- web sites
- zines
- electronic journals
- listserv and email discussions
- drafts of works that later become famous
37Planning to Maximize Longevity of Digital
Information
- Howard Besser
- UCLA School of Education & Information
- http://sunsite.berkeley.edu/Longevity/
- http://www.gseis.ucla.edu/howard
- http://sunsite.berkeley.edu/moa2
- http://lockss.stanford.edu
- http://www.longnow.com/10klibrary/TimeBitsDisc/
- http://www.archive.org/