Title: Creating Working Digital Libraries
1Creating Working Digital Libraries
- Howard Besser
- UCLA School of Education & Information
- http://www.gseis.ucla.edu/howard
2Creating Working Digital Libraries
- Moving from Digital Collections to Digital Libraries
- Interoperability
- Importance of Standards
- Longevity
- Best Practices for Managing Digital Projects
- Some Wild Musings
3Moving from Digital Collections to Digital
Libraries
- What's the difference?
- Recent history of Library Automation
4Developmental Stages
- Experiment with methods
- Build real operational systems
- Build interoperable operational systems
5Traditional Digital Library Model
6Ideal Digital Library Model
7Developmental Stages
- Experiment with methods
- Build real operational systems
- Build interoperable operational systems
- For DL Initiatives
- For OPACs
- For I&A Services
- For Image Retrieval
8Key problems we're facing
- Discovery
- Interoperability
- Longevity
9For Interoperability, Digital Libraries Need Standards
- Descriptive Metadata for consistent description
- Discovery Metadata for finding
- Administrative Metadata for viewing and maintaining
- Structural Metadata for navigation
- ... Terms & Conditions Metadata for controlling access...
10Metadata is not just indexing terms
- CBIR attributes used for retrieval on color, shape, texture, etc.
- Structural attributes used for page-turning
- Administrative attributes used for managing a digital work over time
- IPR attributes to limit unauthorized use
- Identification attributes to determine what application software is needed to view a particular digital work
- Can be located anywhere
11Why are Standards and Metadata consensus
important?
- Managing digital files over time
- Longevity
- Interoperability
- Veracity
- Recording in a consistent manner
- Will give vendors incentive to create
applications that support this
12Why Standards?
- Why do we need standards?
- To make information universally available to users
- To facilitate sharing and interchange of information
- To preserve information (make it safe from changes in hardware and software)
- Standards only work if communities widely accept them, but they're necessary for communities to work together
13Serious Longevity Problems
- What we know from prior widespread digital file formats
- Images separating from their metadata
- Inaccessibility of software needed to view an image
- Inability to even decode the file format of an image
14Journal Archiving
- License, don't own; may not even be able to obtain the right to make an archival copy
- Increasingly no paper back-up at all
- Usually we don't have the important redundancy factor
- Stanford's LOCKSS Project (Lots of Copies Keeps Stuff Safe) and its problems (http://lockss.stanford.edu)
15The Short Life of Digital Info: Digital Longevity Problems
- Disappearing Information
- The Viewing Problem
- The Scrambling Problem
- The Inter-relation Problem
- The Custodial Problem
- The Translation Problem
16The Viewing Problem
- Digital Info requires a whole infrastructure to view it
- Each piece of that infrastructure is changing at an incredibly rapid rate
- How can we ever hope to deal with all the permutations and combinations?
17The Scrambling Problem: Dangers from
- Compression to ease storage & delivery
- Container Architecture to enhance digital commerce
18The Inter-relation Problem
- Info is increasingly inter-related to other info
- How do we make our own Info persist when it points to and integrates with Info owned by others?
- What is the boundary of a set of information (or even of a digital object)?
19The Custodial Problem
- How do we decide what to save?
- Who should save it?
- How should they save it?
- methods for later access: emulation, migration, etc.
- issues of authenticity and evidence
20The Translation Problem
- Content translated into new delivery devices changes meaning
- A photo vs. a painting
- If Info is produced originally in digital form in one encoded format, will it be the same when translated into another format?
- Behaviors
21Pieces of the Solution (1/2)
- We need to insist upon clearly readable standardized ways for digital objects to self-identify their formats
- We should discourage scrambling
- We need to better understand how information inter-relates to other Info, and what constitutes boundaries of Info objects
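A minimal sketch of what format self-identification can look like in practice: several common image formats begin with a fixed byte signature, so a file can be recognized without relying on its name. The signature table below is deliberately tiny and illustrative, and the function name `sniff_format` is made up for this example; it is not a complete format registry.

```python
# Illustrative only: a handful of well-known leading-byte signatures.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
]


def sniff_format(header: bytes) -> str:
    """Return a format name if the leading bytes match a known signature."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"


# Usage: pass the first few bytes read from a file.
png_guess = sniff_format(b"\x89PNG\r\n\x1a\n" + b"\x00" * 8)
tiff_guess = sniff_format(b"II*\x00rest-of-file")
mystery_guess = sniff_format(b"no signature here")
```

A format with no stable signature falls through to "unknown", which is the case the slide warns about.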
22Pieces of the Solution (2/2)
- People and organizations wishing to make information persist need guidelines on how to go about doing it
- We need to better understand how translating from one storage or display format to another affects the meaning of a work
- We need to save the behaviors of a digital object, not just its contents
23Metadata can be the first line of defense
- Can tell you:
- where the file is (if you can't find the file)
- where more info about the file is (if you have the file but most other metadata has become separated)
- what the file format is
- what the compression scheme is
- what application program and version is needed for the file
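The items on this slide can be sketched as a small record that travels with a file. This is only an illustration: the class name, field names, and example values are assumptions made for this sketch and do not come from any particular metadata standard.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DefensiveMetadata:
    """Hypothetical record holding the facts listed on the slide."""
    file_location: Optional[str] = None        # where the file is
    more_info_location: Optional[str] = None   # where fuller metadata lives
    file_format: Optional[str] = None
    compression_scheme: Optional[str] = None
    application: Optional[str] = None          # program needed to view the file
    application_version: Optional[str] = None

    def missing_fields(self) -> List[str]:
        """List the facts this record cannot yet answer."""
        return [name for name, value in vars(self).items() if value is None]


# Usage: a partially filled record shows what still needs to be captured.
record = DefensiveMetadata(file_format="TIFF", compression_scheme="none")
gaps = record.missing_fields()
```

Checking for gaps at capture time is cheaper than discovering them after the file and its metadata have separated.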
24Groups Working on the Big Longevity Problem
http://sunsite.Berkeley.EDU/Imaging/Databases/Longevity/
- CPA Task Force
- Getty Time & Bits Conference follow-up
- NEDLIB, CURL, Michigan
- Internet Archive
- Long Now
25Migration/Refreshing
- Impact on evidential value
26Best Practices for Managing Digital Projects
- Who will your users be?
- Best Practices Guidelines
- Workflow and Management Issues
27Why are you Managing this Information?
- Organizational mission type
- Users
- Uses
28Scanning Best Practices
- Think about users (and potential users), uses, and type of material/collection
- Scan at the highest quality that does not exceed the likely potential users/uses/material
- Do not let today's delivery limitations influence your scanning file sizes; understand the difference between digital masters and derivative files used for delivery
- Many documents which appear to be bitonal actually are better represented with greyscale scans
- Include color bar and ruler in the scan
- Use objective measurements to determine scanner settings (do NOT attempt to make the image good on your particular monitor or use image processing to color correct)
- Don't use lossy compression
- Store in a common (standardized) file format
- Capture as much metadata as is reasonably possible (including metadata about the scanning process itself)
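To make the master-versus-derivative distinction concrete, here is a rough arithmetic sketch of uncompressed image size. The page size, resolutions, and bit depths are illustrative assumptions, not recommendations from the slides, and the calculation ignores file headers and row padding.

```python
def uncompressed_size_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Approximate raw pixel data size for a scan of the given dimensions."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


# Assumed example: an 8 x 10 inch original.
master_color = uncompressed_size_bytes(8, 10, 600, 24)    # 86,400,000 bytes
master_grey = uncompressed_size_bytes(8, 10, 600, 8)      # 28,800,000 bytes
master_bitonal = uncompressed_size_bytes(8, 10, 600, 1)   # 3,600,000 bytes
delivery_copy = uncompressed_size_bytes(8, 10, 72, 24)    # 1,244,160 bytes
```

The delivery copy can always be derived from the master, but not the other way round, which is why delivery limits should not set the master's size.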
29Why Scale is important
30Digital Object Behaviors
31Metadata Standards (from MOA2)
- Administrative Metadata
- for enhancing resource management
- Structural Metadata
- for reflecting internal hierarchies and relationships between parts
- Raw/Seared/Cooked
32Workflow and Management Issues
- Managing multiple image files
- Persistent Identification
- Making your works accessible throughout the Net
33The number of variant forms of a work can be
enormous
- different views of the same object
- different scans of the same photo
- different resolutions
- different compression schemes
- different compression ratios
- different file storage formats
- different details of the same image
- ...
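A small sketch of why the count grows so quickly: each axis of variation multiplies the others. The axis names and values below are invented for illustration, and the product is an upper bound, since not every combination is meaningful in practice.

```python
from itertools import product

# Hypothetical axes of variation for a single work.
variant_axes = {
    "view": ["front", "back"],
    "scan": ["scan-1", "scan-2"],
    "resolution": ["browse", "medium", "hi-res"],
    "compression_scheme": ["none", "LZW", "JPEG"],
    "file_format": ["TIFF", "PICT", "JFIF"],
}


def enumerate_variants(axes):
    """Return every combination of axis values as a list of dicts."""
    names = list(axes)
    return [dict(zip(names, combo)) for combo in product(*axes.values())]


variants = enumerate_variants(variant_axes)
variant_count = len(variants)  # 2 * 2 * 3 * 3 * 3 = 108
```

Adding compression ratios or image details as further axes multiplies the total again.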
34Image Families
35Identification/Provenance
- how to deal with different versions (browse, hi-res, medium res) derived from the same scan or different encoding schemes (TIFF, PICT, JFIF)
- Vocabulary Standards to express this
- VRA Surrogate Categories
- CIMI's "Image Elements"
36Persistent IDs--the Problem
- Need to separate work ID from work location
- URNs probably won't be ready until 2003
- Becomes a business process issue when one organization maintains the resource and another organization references it (i.e., licensed from vendors or managed by separate administrative structures)
37More Persistent IDs--the Approach for today
- PURLs
- Handles
- HTTP redirects
- And worry about costs now and conversion costs
when URNs become feasible
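The idea shared by these approaches, separating a work's ID from its current location, can be sketched with a toy resolver. This is not how PURL or Handle servers are implemented; the class, the ID scheme, and the example.org URLs are all hypothetical.

```python
class Resolver:
    """Toy lookup table mapping a stable work ID to its current location."""

    def __init__(self):
        self._locations = {}

    def register(self, work_id, url):
        """Set or update the current location for a work ID."""
        self._locations[work_id] = url

    def resolve(self, work_id):
        """Return (status, location), mimicking an HTTP redirect response."""
        url = self._locations.get(work_id)
        if url is None:
            return 404, None
        return 302, url


# Usage: the file moves, but the ID that others cite stays the same.
resolver = Resolver()
resolver.register("work:0001", "http://old.example.org/images/0001.tif")
before_move = resolver.resolve("work:0001")
resolver.register("work:0001", "http://new.example.org/masters/0001.tif")
after_move = resolver.resolve("work:0001")
unknown = resolver.resolve("work:9999")
```

Only the resolver's table changes when a resource moves; every external reference to the ID keeps working.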
38Data Set Management: More issues with referencing IDs
- References for mirror sites
- References for back-up sites when main site is down or bottle-necked
- References for off-site copies and archival copies
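One way to picture these extra references is an ordered list of locations for the same ID, tried in turn. The sketch below assumes a caller-supplied availability check; the roles, URLs, and the `is_available` callable are hypothetical stand-ins, and no network access is performed.

```python
def choose_location(candidates, is_available):
    """Return the first (role, url) pair whose URL passes the availability check."""
    for role, url in candidates:
        if is_available(url):
            return role, url
    return None


# Hypothetical locations for one work, in order of preference.
candidates = [
    ("main", "http://main.example.org/0001"),
    ("mirror", "http://mirror.example.org/0001"),
    ("backup", "http://backup.example.org/0001"),
]

# Stand-in availability check: pretend the main site is down.
down = {"http://main.example.org/0001"}
chosen = choose_location(candidates, lambda url: url not in down)
```

Archival and off-site copies could be added as further entries at the end of the list.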
39Making your works accessible throughout the Net
- The DLF/Mellon meeting
- An administrative and political issue as much as a technical one
40Some Wild Musings
- Movement towards packages and away from MARC
- The disappearance of OPACs
41Containers and Packages of Metadata: Warwick, not MARC
- modular
- overlapping
- extensible
- community-based
- designed for a networked world to aid commonality between communities while still providing full functionality within each community
42DC Qualifiers
- allows one community to express important nuances and qualifications, while still making the basic importance available to communities with simple needs
- our community can reflect alternate title, transliterated title, and main title, yet they will all be found under a simple Web search under title
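A minimal sketch of the behavior described here: a qualified search can distinguish kinds of title, while a simple search ignores the qualifier and finds them all. The tuple layout, the qualifier names, and the sample values are assumptions for illustration, not the official Dublin Core qualifier vocabulary.

```python
# Each statement is (element, qualifier, value); qualifier None means unqualified.
record = [
    ("title", None, "War and Peace"),
    ("title", "alternative", "War & Peace"),
    ("title", "transliterated", "Voina i mir"),
    ("creator", None, "Tolstoy, Leo"),
]


def simple_search(statements, element):
    """Ignore qualifiers: return every value recorded under the element."""
    return [value for el, _qualifier, value in statements if el == element]


def qualified_search(statements, element, qualifier):
    """Return only the values carrying the given qualifier."""
    return [value for el, q, value in statements if el == element and q == qualifier]


all_titles = simple_search(record, "title")
transliterated = qualified_search(record, "title", "transliterated")
```

A community with simple needs uses only the first function; a community that cares about the nuance uses the second on the same data.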
43Crosswalks
- mapping between differing metadata structures
- eliminate the need for monolithic, universally adopted standards
- focus on flexibility and interoperability
- RDF-based metadata registries
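A crosswalk can be sketched as a lookup table applied to a record. The field names below are simplified stand-ins loosely modeled on MARC tags and Dublin Core elements; the table and sample record are assumptions for illustration, not an authoritative mapping.

```python
# Hypothetical crosswalk: source field name -> target element name.
CROSSWALK = {
    "245a": "title",
    "100a": "creator",
    "260c": "date",
}


def apply_crosswalk(source_record, crosswalk):
    """Map source fields to target elements; keep unmapped fields separately."""
    mapped, unmapped = {}, {}
    for field, value in source_record.items():
        target = crosswalk.get(field)
        if target is None:
            unmapped[field] = value  # nothing equivalent in the target scheme
        else:
            mapped.setdefault(target, []).append(value)
    return mapped, unmapped


source_record = {
    "245a": "Sample Title",
    "100a": "Sample, Author",
    "260c": "1999",
    "852h": "Local call number",
}
mapped, unmapped = apply_crosswalk(source_record, CROSSWALK)
```

Keeping the unmapped fields visible shows what a crosswalk loses, which matters when the target scheme is simpler than the source.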
44Crosswalk Example
45Do we still need OPACs?
- Why repeat almost identical bibliographic descriptions in each local system?
- Why not store only local information locally, and link to bibliographic descriptions stored in the major utilities?
- Could our acquisition systems for monographs begin to use the acquisition systems imposed on us by our parent organizations (like those for supplies)?
46Creating Working Digital Libraries
- Moving from Digital Collections to Digital Libraries
- Interoperability
- Importance of Standards
- Longevity
- Best Practices for Managing Digital Projects
- Some Wild Musings
47Creating Working Digital Libraries
- Howard Besser
- UCLA School of Education & Information
- http://www.getty.edu/gri/standard/intrometadata/
- http://www.ifla.org/II/metadata.htm
- http://sunsite.Berkeley.EDU/Imaging/Databases/standards
- http://sunsite.Berkeley.EDU/moa2/
- http://sunsite.Berkeley.EDU/Longevity/
- http://purl.oclc.org/metadata/dublin_core/
- http://www.gseis.ucla.edu/howard/image-meta.html
- http://www.gseis.ucla.edu/howard/Metadata/UC-May00/
- http://sunsite.berkeley.edu/Metadata/sp2000.html
- http://www.gseis.ucla.edu/howard/