Creating Working Digital Libraries

Besser, LITA Digital Imaging Preconference, 7/7/00

1
Creating Working Digital Libraries

Howard Besser
UCLA School of Education & Information Studies
http://www.gseis.ucla.edu/howard

2
Creating Working Digital Libraries:

Moving from Digital Collections to Digital Libraries
Interoperability
Importance of Standards
Longevity
Best Practices for Managing Digital Projects
Some Wild Musings

3
Moving from Digital Collections to Digital Libraries

What's the difference?
Recent history of Library Automation

4
Developmental Stages

Experiment with methods
Build real operational systems
Build interoperable operational systems

5
Traditional Digital Library Model
6
Ideal Digital Library Model
7
Developmental Stages

Experiment with methods
Build real operational systems
Build interoperable operational systems
For DL Initiatives
For OPACs
For I&A Services
For Image Retrieval

8
Key problems we're facing

Discovery
Interoperability
Longevity

9
For Interoperability, Digital Libraries Need Standards

Descriptive Metadata for consistent description
Discovery Metadata for finding
Administrative Metadata for viewing and maintaining
Structural Metadata for navigation
...Terms & Conditions Metadata for controlling access...
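As an illustration, the categories above can be read as separate slots on one record, which makes a gap (for example, no access terms yet) easy to spot. This is a minimal sketch that assumes a simple dictionary per category. The class, field names, and sample values are hypothetical and do not come from any metadata standard.

```python
from dataclasses import dataclass, field

@dataclass
class DigitalObjectRecord:
    """Hypothetical record with one slot per metadata category on the slide."""
    identifier: str
    descriptive: dict = field(default_factory=dict)       # consistent description
    discovery: dict = field(default_factory=dict)         # terms used for finding
    administrative: dict = field(default_factory=dict)    # viewing and maintaining
    structural: dict = field(default_factory=dict)        # navigation between parts
    terms_conditions: dict = field(default_factory=dict)  # controlling access

    # Plain class attribute (no annotation), so not a dataclass field.
    CATEGORIES = ("descriptive", "discovery", "administrative",
                  "structural", "terms_conditions")

    def missing_categories(self) -> list:
        """Names of categories that have not been filled in yet."""
        return [name for name in self.CATEGORIES if not getattr(self, name)]

record = DigitalObjectRecord(
    identifier="example:0001",  # hypothetical identifier
    descriptive={"title": "Letter, 1898"},
    administrative={"format": "image/tiff", "compression": "none"},
    structural={"pages": ["p1.tif", "p2.tif"]},
)
print(record.missing_categories())  # ['discovery', 'terms_conditions']
```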

10
Metadata is not just indexing terms

CBIR (content-based image retrieval) attributes used for retrieval on color, shape, texture, etc.
Structural attributes used for page-turning
Administrative attributes used for managing a digital work over time
IPR (intellectual property rights) attributes to limit unauthorized use
Identification attributes to determine what application software is needed to view a particular digital work
Can be located anywhere

11
Why are Standards and Metadata consensus important?

Managing digital files over time
Longevity
Interoperability
Veracity
Recording in a consistent manner
Will give vendors incentive to create applications that support this

12
Why Standards?

Why do we need standards?
To make information universally available to users
To facilitate sharing and interchange of information
To preserve information (make it safe from changes in hardware and software)
Standards only work if communities widely accept them, but they're necessary for communities to work together

13
Serious Longevity Problems

What we know from prior widespread digital file formats:
Images separating from their metadata
Inaccessibility of software needed to view an image
Inability to even decode the file format of an image

14
Journal Archiving

License, don't own: we may not even be able to obtain the right to make an archival copy
Increasingly no paper back-up at all
Usually we don't have the important redundancy factor
Stanford's LOCKSS Project (Lots of Copies Keeps Stuff Safe) and its problems (http://lockss.stanford.edu)
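The redundancy idea behind "lots of copies" can be sketched roughly as follows. Each site fingerprints its copy of the same item, and any copy that disagrees with the majority is flagged for repair from a site that agrees. This is a much-simplified illustration and not the actual LOCKSS polling protocol. The site names, the use of SHA-256, and the majority rule are all assumptions.

```python
import hashlib
from collections import Counter

def digest(data: bytes) -> str:
    """SHA-256 fingerprint of one stored copy."""
    return hashlib.sha256(data).hexdigest()

def find_damaged_copies(copies: dict) -> list:
    """Return the sites whose copy disagrees with the majority of copies.

    Assumes a strict majority of the copies are still intact; with a tie
    the "majority" picked here is arbitrary.
    """
    digests = {site: digest(data) for site, data in copies.items()}
    majority_digest, _votes = Counter(digests.values()).most_common(1)[0]
    return sorted(site for site, d in digests.items() if d != majority_digest)

# Hypothetical sites holding the same licensed journal article.
copies = {
    "site-a": b"Vol. 12, article text",
    "site-b": b"Vol. 12, article text",
    "site-c": b"Vol. 12, articIe text",  # one character silently corrupted
}
print(find_damaged_copies(copies))  # ['site-c']
```

With only one copy there is nothing to vote against. That is the missing "redundancy factor" the slide points to.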

15
The Short Life of Digital Info: Digital Longevity Problems

Disappearing Information
The Viewing Problem
The Scrambling Problem
The Inter-relation Problem
The Custodial Problem
The Translation Problem

16
The Viewing Problem

Digital Info requires a whole infrastructure to view it
Each piece of that infrastructure is changing at an incredibly rapid rate
How can we ever hope to deal with all the permutations and combinations?

17
The Scrambling Problem: Dangers from

Compression to ease storage & delivery
Container Architecture to enhance digital commerce

18
The Inter-relation Problem

-Info is increasingly inter-related to other info
-How do we make our own Info persist when it points to and integrates with Info owned by others?
-What is the boundary of a set of information (or even of a digital object)?

19
The Custodial Problem

How do we decide what to save?
Who should save it?
How should they save it?
-methods for later access: emulation, migration, etc.
-issues of authenticity and evidence

20
The Translation Problem

Content translated into new delivery devices changes meaning
-A photo vs. a painting
-If Info is produced originally in digital form in one encoded format, will it be the same when translated into another format?
Behaviors

21
Pieces of the Solution (1/2)

-We need to insist upon clearly readable, standardized ways for digital objects to self-identify their formats
-We should discourage scrambling
-We need to better understand how information inter-relates to other Info, and what constitutes the boundaries of Info objects
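Many formats today only partly self-identify. They begin with a fixed byte signature ("magic number"), but a reader still needs an outside table of signatures to interpret those bytes. The sketch below shows that kind of lookup. The table is a small hand-picked subset and not a format registry. The function names are hypothetical.

```python
# Leading-byte signatures ("magic numbers") for a few common image formats.
SIGNATURES = [
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
    (b"\xff\xd8\xff", "JPEG (incl. JFIF)"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
]

def sniff_format(header: bytes) -> str:
    """Guess a file's format from its first bytes; 'unknown' if no match."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"

def sniff_file(path: str) -> str:
    """Read only the first 16 bytes of a file and sniff them."""
    with open(path, "rb") as f:
        return sniff_format(f.read(16))

print(sniff_format(b"II*\x00\x08\x00\x00\x00"))  # TIFF (little-endian)
print(sniff_format(b"\x00\x00\x00\x00"))          # unknown
```

Once the table is lost or falls out of date, the answer becomes "unknown". This is why the slide asks for clearly readable, standardized self-identification rather than knowledge held outside the file.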

22
Pieces of the Solution (2/2)

-People and organizations wishing to make information persist need guidelines on how to go about doing it
-We need to better understand how translating from one storage or display format to another affects the meaning of a work
-We need to save the behaviors of a digital object, not just its contents

23
Metadata can be the first line of defense

Can tell you:
where the file is (if you can't find the file)
where more info about the file is (if you have the file but most other metadata has become separated)
what the file format is
what the compression scheme is
what application program and version is needed for the file

24
Groups Working on the Big Longevity Problem: http://sunsite.Berkeley.EDU/Imaging/Databases/Longevity/

CPA Task Force
Getty Time Bits Conference follow-up
NEDLIB, CURL, Michigan
Internet Archive
Long Now

25
Migration/Refreshing

Impact on evidential value
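One concrete way the evidential-value question appears is this. Refreshing copies bits to new media, so it can be checked against a stored checksum. Migration to a new format changes the bits, so the same check can no longer prove the content is unchanged. This is a minimal sketch that assumes a SHA-256 digest was recorded in administrative metadata; the slide names no method. The byte-string stand-ins for media and format conversion are hypothetical.

```python
import hashlib

def sha256_of(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def refresh(data: bytes, recorded_digest: str) -> bytes:
    """Copy the bytes (stand-in for writing to new media) and verify fixity."""
    new_copy = bytes(bytearray(data))  # a fresh, bit-for-bit copy
    if sha256_of(new_copy) != recorded_digest:
        raise ValueError("fixity check failed: copy differs from original")
    return new_copy

original = b"master scan bytes"
recorded = sha256_of(original)           # digest kept in administrative metadata

refreshed = refresh(original, recorded)  # refreshing: bits unchanged, check passes
migrated = original.upper()              # stand-in for conversion to a new format
print(sha256_of(migrated) == recorded)   # False: the bits, and the digest, changed
```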

26
Best Practices for Managing Digital Projects

Who will your users be?
Best Practices Guidelines
Workflow and Management Issues

27
Why are you Managing this Information?

Organizational mission type
Users
Uses

28
Scanning Best Practices

Think about users (and potential users), uses, and type of material/collection
Scan at the highest quality that does not exceed the likely potential users/uses/material
Do not let today's delivery limitations influence your scanning file sizes; understand the difference between digital masters and derivative files used for delivery
Many documents which appear to be bitonal actually are better represented with greyscale scans

Include color bar and ruler in the scan
Use objective measurements to determine scanner settings (do NOT attempt to make the image good on your particular monitor or use image processing to color correct)
Don't use lossy compression
Store in a common (standardized) file format
Capture as much metadata as is reasonably possible (including metadata about the scanning process itself)
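Some rough arithmetic shows what the bitonal versus greyscale choice costs in master file size. The calculation is uncompressed width times height in pixels, times bits per pixel. The 8 x 10 inch original and 600 dpi below are illustrative values and not recommendations from the slides. Format headers are ignored, and derivatives for delivery would be made separately from these masters.

```python
def uncompressed_bytes(width_in: float, height_in: float,
                       dpi: int, bits_per_pixel: int) -> int:
    """Raster size of an uncompressed scan, ignoring file-format overhead."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return (pixels * bits_per_pixel + 7) // 8  # round up to whole bytes

# Hypothetical 8 x 10 inch original scanned at 600 dpi.
for label, bits in [("bitonal", 1), ("greyscale", 8), ("24-bit color", 24)]:
    print(label, uncompressed_bytes(8, 10, 600, bits))
# bitonal 3600000
# greyscale 28800000
# 24-bit color 86400000
```

Greyscale is 8 times the size of bitonal at the same resolution. This is the trade-off to weigh against the point that many "bitonal" documents are better represented in greyscale.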

29
Why Scale is important
30
Digital Object Behaviors

Book example

31
Metadata Standards (from MOA2)

Administrative Metadata
for enhancing resource management
Structural Metadata
for reflecting internal hierarchies and relationships between parts
Raw/Seared/Cooked

32
Workflow and Management Issues

Managing multiple image files
Persistent Identification
Making your works accessible throughout the Net

33
The number of variant forms of a work can be enormous

different views of the same object
different scans of the same photo
different resolutions
different compression schemes
different compression ratios
different file storage formats
different details of the same image
...

34
Image Families
35
Identification/Provenance

how to deal with different versions (browse, hi-res, medium-res) derived from the same scan or different encoding schemes (TIFF, PICT, JFIF)
Vocabulary Standards to express this
VRA Surrogate Categories
CIMI's "Image Elements"
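One way to record this provenance is to give every derivative a pointer to the file it was made from. Then any browse or hi-res version can be traced back to its master scan. This is a minimal sketch. The role labels and file IDs are placeholders and not terms from the VRA or CIMI vocabularies, and the code assumes the derivation links contain no cycles.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class ImageFile:
    file_id: str
    role: str                           # placeholder label, e.g. "master", "browse"
    encoding: str                       # e.g. "TIFF", "PICT", "JFIF"
    derived_from: Optional[str] = None  # parent file_id; None for the master scan

def lineage(file_id: str, files: List[ImageFile]) -> List[str]:
    """Follow derived_from links from one file back to its master scan."""
    by_id = {f.file_id: f for f in files}
    chain = [file_id]
    while by_id[chain[-1]].derived_from is not None:
        chain.append(by_id[chain[-1]].derived_from)
    return chain

family = [
    ImageFile("scan-01", "master", "TIFF"),
    ImageFile("scan-01-hi", "hi-res", "JFIF", derived_from="scan-01"),
    ImageFile("scan-01-browse", "browse", "JFIF", derived_from="scan-01-hi"),
]
print(lineage("scan-01-browse", family))  # ['scan-01-browse', 'scan-01-hi', 'scan-01']
```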

36
Persistent IDs--the Problem

Need to separate work ID from work location
URNs probably won't be ready until 2003
Becomes a business process issue when one organization maintains the resource and another organization references it (e.g., licensed from vendors or managed by separate administrative structures)

37
More Persistent IDs--the Approach for today

PURLs
Handles
HTTP redirects
And worry about costs now and conversion costs when URNs become feasible
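The redirect approach separates a work's ID from its location. A resolver keeps a table that maps persistent IDs to current URLs and answers each request with an HTTP redirect. The sketch below models only that lookup. It is not PURL or Handle software, and the IDs and hostnames are made up.

```python
from typing import Dict, Tuple

# Hypothetical resolver table: persistent ID -> current location.
# Citations use the ID; only this table changes when a resource moves.
RESOLVER: Dict[str, str] = {
    "/letters/0001": "http://server-a.example.org/images/0001.tif",
}

def resolve(persistent_id: str) -> Tuple[int, Dict[str, str]]:
    """Return an HTTP status and headers: a 302 redirect, or 404 if unknown."""
    location = RESOLVER.get(persistent_id)
    if location is None:
        return 404, {}
    return 302, {"Location": location}

# The image moves to a new server: one table entry changes, no citation does.
RESOLVER["/letters/0001"] = "http://server-b.example.org/archive/0001.tif"
print(resolve("/letters/0001"))
# (302, {'Location': 'http://server-b.example.org/archive/0001.tif'})
print(resolve("/letters/9999"))  # (404, {})
```

Someone still has to keep that table current. That maintenance is the business-process issue noted on the previous slide.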

38
Data Set Management: More issues with referencing IDs

References for mirror sites
References for back-up sites when main site is down or bottle-necked
References for off-site copies and archival copies
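When one ID has several locations, a reference can list them in priority order: the main site, then mirrors, then back-up copies. The first location that is reachable is used. This is a minimal sketch. The URLs are made up, and `is_up` is a hypothetical availability check passed in by the caller; in practice it might issue an HTTP request.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical table: one persistent ID, several locations in priority order.
LOCATIONS: Dict[str, List[str]] = {
    "letters/0001": [
        "http://main.example.org/0001.tif",
        "http://mirror.example.org/0001.tif",
        "http://backup.example.org/0001.tif",
    ],
}

def first_available(pid: str, is_up: Callable[[str], bool]) -> Optional[str]:
    """Return the first listed location for pid that is_up reports as reachable."""
    for url in LOCATIONS.get(pid, []):
        if is_up(url):
            return url
    return None

down = {"http://main.example.org/0001.tif"}  # pretend the main site is down
print(first_available("letters/0001", lambda url: url not in down))
# http://mirror.example.org/0001.tif
```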

39
Making your works accessible throughout the Net

The DLF/Mellon meeting
An administrative and political issue as much as a technical one

40
Some Wild Musings

Movement towards packages and away from MARC
The disappearance of OPACs

41
Containers and Packages of Metadata: Warwick, not MARC

modular
overlapping
extensible
community-based
designed for a networked world to aid commonality between communities while still providing full functionality within each community

42
DC Qualifiers

allows one community to express important nuances and qualifications, while still making the basic importance available to communities with simple needs
our community can reflect alternate title, transliterated title, and main title, yet they will all be found under a simple Web search under title
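The title example can be sketched as "dumbing down". A client that does not understand a qualifier ignores it and files the value under the plain element, so every title variant still answers a simple title search. This is a minimal sketch. The record, the qualifier labels, and the substring search are illustrative assumptions and not an official DCMI list or algorithm.

```python
from collections import defaultdict

# Hypothetical qualified record: (element, qualifier, value).
# The qualifier labels are illustrative, not an official DCMI list.
qualified_record = [
    ("title", None, "Война и мир"),
    ("title", "alternative", "War and Peace"),
    ("title", "transliterated", "Voina i mir"),
    ("creator", None, "Tolstoy, Leo"),
]

def dumb_down(record):
    """Drop qualifiers so every value is filed under its plain element."""
    simple = defaultdict(list)
    for element, _qualifier, value in record:
        simple[element].append(value)
    return dict(simple)

def matches(simple_record, element, term):
    """Simple case-insensitive substring search within one element."""
    return any(term.lower() in v.lower() for v in simple_record.get(element, []))

simple = dumb_down(qualified_record)
print(simple["title"])  # ['Война и мир', 'War and Peace', 'Voina i mir']
print(matches(simple, "title", "voina"), matches(simple, "title", "war and peace"))
# True True
```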

43
Crosswalks

mapping between differing metadata structures
eliminate the need for monolithic, universally adopted standards
focus on flexibility and interoperability
RDF-based metadata registries
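At its simplest, a crosswalk is a field-by-field mapping from one structure to another. The target element names below (title, creator, date, format) are Dublin Core elements. The local source schema, its field names, and the sample record are hypothetical.

```python
# Hypothetical crosswalk from a local image-catalog schema to Dublin Core elements.
LOCAL_TO_DC = {
    "work_title": "title",
    "artist": "creator",
    "date_created": "date",
    "image_format": "format",
}

def crosswalk(record: dict, mapping: dict):
    """Map fields through `mapping`; return (converted record, unmapped field names)."""
    converted, unmapped = {}, []
    for field_name, value in record.items():
        target = mapping.get(field_name)
        if target is None:
            unmapped.append(field_name)  # reported, not silently dropped
        else:
            converted.setdefault(target, []).append(value)
    return converted, unmapped

local = {"work_title": "Haystacks", "artist": "Monet, Claude",
         "date_created": "1891", "shelf_note": "Box 4"}
dc, leftovers = crosswalk(local, LOCAL_TO_DC)
print(dc)         # {'title': ['Haystacks'], 'creator': ['Monet, Claude'], 'date': ['1891']}
print(leftovers)  # ['shelf_note']
```

Crosswalks are usually lossy, so the sketch returns the fields it could not map instead of discarding them.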

44
Crosswalk Example
45
Do we still need OPACs?

Why repeat almost identical bibliographic descriptions in each local system?
Why not store only local information locally, and link to bibliographic descriptions stored in the major utilities?
Could our acquisition systems for monographs begin to use the acquisition systems imposed on us by our parent organizations (like those for supplies)?

46
Creating Working Digital Libraries