Title: Breaking down the walls
1 Breaking down the walls
- Moving libraries from collectors to portals
Carl Lagoze, Cornell University, lagoze_at_cs.cornell.edu
2 The Library should selectively adopt the portal
model for targeted program areas. By creating
links from the Library's Web site, this approach
would make available the ever-increasing body of
research materials distributed across the
Internet. The Library would be responsible for
carefully selecting and arranging for access to
licensed commercial resources for its users, but
it would not house local copies of materials or
assume responsibility for long-term
preservation.
LC21: A Digital Strategy for the Library of Congress, page 5
3 LC21: A Digital Strategy for the Library of Congress, page 5
4 Towards a Virtual Control Zone
Some of the most fundamental aspects of library
operations entail the existence of a border,
across which objects of information are
transferred and maintained. Such a perimeter,
demarcating a single, distributed digital library
(the "control zone"), needs to be created and
managed by the academic library community at the
earliest opportunity.
Ross Atkinson, Library Quarterly, 1996
5 Why distributed collections?
- Scale of the Web
- Prevalence of new publishing models and agents
- Increasing complexity of licensing and access management
- Dynamic nature of content
6 Towards Hybrid Portals
- Traditional portal (e.g., Yahoo!)
- linkage without responsibility
- Hybrid Portal
- assertion of (some semblance of) a curatorial role over linked objects
7 New models have cultural/organizational ramifications
- Performance and ranking metrics: "bigger is better"
- Levels of confidence
- Trust
8 ... that can be assisted by new technical foundations
- Digital object architectures
- that enable aggregating and customizing content for local access and management
- Metadata frameworks
- that model changes of objects and their management over time
- OAI Harvesting Protocol
- for exchange of structured information
- Preservation models
- that enable non-cooperative and cooperative offsite monitoring
9 Digital Object Architectures: aggregating and localizing distributed content
- Acknowledgements
- Naomi Dushay
- Sandy Payette
- Thornton Staples (U. Va.)
- Ross Wayland (U. Va.)
10 From Mediators to Value-Added Surrogates
- Wiederhold: mediators between raw data and end-user applications for integration and transformation
- Paepcke: mediators as foundation for digital library interoperability
- Payette and Lagoze: mediators (V-A surrogates) to aggregate and create a localized service layer for distributed resources
11 FEDORA Digital Object Model
12 Establishing a Virtual Control Zone
13 V-A Surrogate Applications
- Access management
- Shared responsibility among trusted partners
- Enhanced and customized functionality
- Examples: reference linking, format translation, special needs
- Preservation
- Monitoring "significant" events and acting on them
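A minimal sketch of the surrogate idea, assuming a surrogate is simply a local object that holds a reference to a remotely managed resource and layers access control and a translation service over it. The names `RemoteObject`, `ValueAddedSurrogate`, and `disseminate`, along with the sample data, are hypothetical illustrations and are not taken from the FEDORA prototype.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set


@dataclass
class RemoteObject:
    """Stand-in for content held and managed by another institution."""
    identifier: str
    content: str


@dataclass
class ValueAddedSurrogate:
    """Local layer over a remote object: adds policy and services, keeps no copy."""
    target: RemoteObject
    allowed_users: Set[str] = field(default_factory=set)
    services: Dict[str, Callable[[str], str]] = field(default_factory=dict)

    def disseminate(self, user: str, service: str) -> str:
        # Access management: the local library enforces its own policy.
        if user not in self.allowed_users:
            raise PermissionError(f"{user} may not access {self.target.identifier}")
        # Enhanced functionality: apply a locally defined transformation.
        return self.services[service](self.target.content)


remote = RemoteObject("hypothetical:object-a", "lecture slides")
surrogate = ValueAddedSurrogate(
    target=remote,
    allowed_users={"alice"},
    services={"as-is": lambda text: text, "large-print": str.upper},
)
print(surrogate.disseminate("alice", "large-print"))  # LECTURE SLIDES
```

The remote object is untouched; everything the local library adds (who may see it, in what form) lives in the surrogate.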
14 DigitalObject A
- View Slides
- View Video
- View synchronized presentation using applet
Context Broker A
15 Context Broker A
16 Where we are now
- Ongoing FEDORA reference prototype
- http://www.cs.cornell.edu/cdlrg/FEDORA.html
- Policy enforcement research
- Content mediation
- Proposed joint deployment with University of Virginia
- Open source scalable implementation of FEDORA architecture
- Testing and deployment with a number of research library partners
17 Event-Aware Metadata Frameworks: describing changes over time
- Acknowledgements
- Dan Brickley (ILRT, Bristol)
- Martin Doerr (FORTH, Crete)
- Jane Hunter (DSTC, Brisbane)
18 Distributed Content: The Metadata Challenge
- From fixed, contained physical artifacts to fluid, distributed digital objects
- Need for basis of trust and authenticity in network environment
- Decentralization and specialization of resource description and need for mapping formalisms
19 Multi-entity nature of object description
20 Attribute/Value approaches to metadata
The playwright of Hamlet was Shakespeare
(Figure labels: Hamlet, "has a creator", Shakespeare)
21 ... run into problems for richer descriptions
The playwright of Hamlet was Shakespeare, who was born in Stratford
(Figure labels: Hamlet, "has a creator", "birthplace", Stratford)
22 ... because of their failure to model entity distinctions
(Figure labels: nodes R1, R2, Hamlet, Shakespeare, Stratford; arcs title, creator, name, birthplace)
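The contrast drawn on the three slides above can be sketched in a few lines. This is an illustration only: the arrangement of the labels R1, R2, title, creator, name, and birthplace into triples is an assumed reading of the figure, and the helper names `objects` and `birthplace_of_creator` are hypothetical.

```python
# Flat attribute/value record: every attribute hangs off the single resource,
# so nothing says that "birthplace" belongs to the creator and not the play.
flat_record = {
    "title": "Hamlet",
    "creator": "Shakespeare",
    "birthplace": "Stratford",
}

# Entity-aware description: the work (R1) and the person (R2) are distinct
# nodes, and each property is attached to the entity it actually describes.
triples = [
    ("R1", "title", "Hamlet"),
    ("R1", "creator", "R2"),
    ("R2", "name", "Shakespeare"),
    ("R2", "birthplace", "Stratford"),
]


def objects(subject, predicate):
    """Return every object of (subject, predicate, ?) in the triple list."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def birthplace_of_creator(work):
    """Follow work -> creator -> birthplace across two entities."""
    return [
        place
        for creator in objects(work, "creator")
        for place in objects(creator, "birthplace")
    ]


print(birthplace_of_creator("R1"))  # ['Stratford']
print(objects("R1", "birthplace"))  # [] : the play itself has no birthplace
```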
23 ABC/Harmony Event-aware metadata model
- Recognizing inherent lifecycle aspects of description (esp. of digital content)
- Modeling incorporates time (events and situations) as first-class objects
- Supplies clear attachment points for agents, roles, occurrent properties
- Resource description as a story-telling activity
24 Resource-centric Metadata
Title: Anna Karenina
Author: Leo Tolstoy
Illustrator: Orest Vereisky
Translator: Margaret Wettlin
Date Created: 1877
Date Translated: 1978
Description: Adultery, Depression
Birthplace: Moscow
Birthdate: 1828
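The resource-centric record above mixes facts about the work, its translation, and the people involved. A minimal sketch of the alternative, treating events as first-class objects with agents attached by role, might look like the following. The `Event` class and helper functions are hypothetical and do not reproduce the ABC/Harmony vocabulary; only the two dated events from the record are modeled, since the slide gives no date for the illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Event:
    """An event in the life of a resource, with agents attached by role."""
    kind: str               # e.g. "creation", "translation"
    year: int
    agents: Dict[str, str]  # role -> agent name


# The story of the resource, told as a sequence of events.
events = [
    Event("translation", 1978, {"translator": "Margaret Wettlin"}),
    Event("creation", 1877, {"author": "Leo Tolstoy"}),
]


def timeline() -> List[Tuple[int, str]]:
    """Order the events chronologically."""
    return [(e.year, e.kind) for e in sorted(events, key=lambda e: e.year)]


def events_with_agent(name: str) -> List[Event]:
    """Find every event in which the named agent took part, in any role."""
    return [e for e in events if name in e.agents.values()]


print(timeline())  # [(1877, 'creation'), (1978, 'translation')]
print([e.kind for e in events_with_agent("Margaret Wettlin")])  # ['translation']
```

Each date and each agent now has an unambiguous attachment point: the event it belongs to.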
25 (No Transcript)
26 Queries over descriptive graphs
Rudolf Squish: http://swordfish.rdfweb.org/rdfquery
List details of events where Lagoze is a participating agent:
    SELECT ?title, ?type, ?time, ?place, ?name
    FROM http://ilrt.org/discovery/harmony/oai.rdf
    WHERE (web::type ?event abc::Event)
          (abc::context ?event ?context)
          ..
    AND ?name ~ lagoze
    USING web FOR http://www.w3.org/1999/02/22-rdf-syntax-ns#
27 Where we are now
- Stabilization of model
- Collaboration with museum/CIDOC community for joint modeling principles
- Plans
- RDF API for model elements
- UI for metadata creation
- Query engine testing
28 Open Archives Initiative: facilitating exchange of structured information
- Acknowledgements
- Herbert Van de Sompel
- OAI Steering and Technical Committees
29 Open Archives Initiative
- Testing the hypotheses
- exposing metadata in various forms will facilitate creation of value-added services
- key to deployable DL infrastructure is low entry cost
- Individual communities can/will customize common infrastructure
30 Where we've come from
- Late 1999 Santa Fe UPS meeting: increase impact of eprint initiatives through federation
- Santa Fe Convention: metadata harvesting among eprint archives
- Increasing interest outside the eprint community
- Research libraries
- Museums
- Publishers
31 Progress over the past year
- OAI workshops at US and EC DL conferences
- Organizational stability
- Executive committee and steering committee
- September 2000 technical meeting
- Reframe and rethink technical solutions for broader domain
- Extensive testing and refinement of technical infrastructure
32 Technical Infrastructure: key technical features
- Deploy-now technology: 80/20 rule
- Two-party model: providers and consumers
- Simple HTTP encoding
- XML schema for some degree of protocol conformance
- Extensibility
- Multiple item-level metadata
- Collection-level metadata
33 OAI protocol requests
(Figure labels: service provider, data provider)
- Supporting protocol requests
- Identify
- ListMetadataFormats
- ListSets
- Harvesting protocol requests
- ListRecords
- ListIdentifiers
- GetRecord
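Since the protocol uses a simple HTTP encoding, a request from a service provider to a data provider can be sketched as a URL with the request name as a query parameter. The six request names below come from the slide; the base URL is a made-up placeholder, and the `verb` and `metadataPrefix` parameter names and the `oai_dc` value follow common OAI protocol usage but are assumptions relative to this talk, so check them against the specification.

```python
from urllib.parse import urlencode

# Request names as listed on the slide.
SUPPORTING_REQUESTS = {"Identify", "ListMetadataFormats", "ListSets"}
HARVESTING_REQUESTS = {"ListRecords", "ListIdentifiers", "GetRecord"}


def build_request(base_url: str, verb: str, **arguments: str) -> str:
    """Encode one protocol request as an HTTP GET URL."""
    if verb not in SUPPORTING_REQUESTS | HARVESTING_REQUESTS:
        raise ValueError(f"unknown request: {verb}")
    query = urlencode({"verb": verb, **arguments})
    return f"{base_url}?{query}"


# Hypothetical data provider endpoint, for illustration only.
BASE_URL = "http://archive.example.org/oai"

url = build_request(BASE_URL, "ListRecords", metadataPrefix="oai_dc")
print(url)
# http://archive.example.org/oai?verb=ListRecords&metadataPrefix=oai_dc
```

A harvester would typically call the supporting requests first to learn what a data provider offers, then loop over the harvesting requests.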
34 Where we are now
- Stable 1.0 protocol specification
- Hopefully, self-documenting infrastructure
- http://www.openarchives.org
- 27 registered data providers
- Increasing number of tools available
- Research initiatives
- NSF-funded NSDL
- EC-funded Cyclades
- Andrew W. Mellon service proposals
- EC-funded community building
35 Where do we go from here?
- Controlling the stampede
- Maintaining the organizational model: lean and mean while encouraging community-specific exploitation
- Encouraging testing, especially through deployment and especially service development
- Encouraging metadata diversification: this isn't just about Dublin Core!!!
- Preservation
- Document access
- Authentication
36 OAI Metadata Research
- Dictionary of metadata terms (Tom Baker)
- Mandating usage rules has only limited effectiveness
- Compiling usage of those terms is vital to machine understanding and interoperability
- Provide context heuristics for search engine and indexer processing
- Large-scale deployment of OAI and web crawling enables (partial) automation of usage compilation (e.g., data mining of term usage)
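As a rough illustration of what automated usage compilation could mean, the sketch below counts how often each metadata term appears across a batch of harvested records. The records are invented sample data and `compile_usage` is a hypothetical helper; real usage mining would also look at the values and their context, not just the term names.

```python
from collections import Counter
from typing import Dict, Iterable

# Invented sample of harvested records, keyed by metadata term.
harvested_records = [
    {"title": "Record one", "creator": "A. Author", "date": "2000"},
    {"title": "Record two", "creator": "B. Author"},
    {"title": "Record three", "subject": "digital libraries"},
]


def compile_usage(records: Iterable[Dict[str, str]]) -> Counter:
    """Count in how many records each metadata term is used."""
    usage = Counter()
    for record in records:
        usage.update(record.keys())
    return usage


usage = compile_usage(harvested_records)
print(usage.most_common(2))  # [('title', 3), ('creator', 2)]
```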
37 Preservation Models: monitoring threats to distributed content
- Acknowledgements
- Bill Arms
- Peter Botticelli (CUL)
- Anne Kenney (CUL)
38 Preservation Remote Control
- Organization Issues
- assured preservation may not be possible without direct custodial control.
- what are the levels of acceptability and for which types of resources?
- Technical Issues
- what are the technologies for remote control at the various levels of assurance deemed acceptable by the library?
- what is the probability of a reasonable level of preservation in the context of such technologies?
39 Cost vs. Functionality
40 Leveraging Current Work
- Event-based metadata
- Metadata harvesting
- Longevity and threats to digital resources
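One way these strands could combine is a monitor that periodically checks remote content and emits an event when something significant happens. The sketch below is an assumption about how such monitoring might work, not a description of the experiments in this talk: it compares stored fingerprints against freshly retrieved content and reports "changed" or "missing" events. All names and sample data are hypothetical.

```python
import hashlib
from typing import Dict, List, Optional, Tuple


def fingerprint(content: bytes) -> str:
    """Digest used to detect that remote content has changed."""
    return hashlib.sha256(content).hexdigest()


def detect_events(
    baseline: Dict[str, str],
    snapshot: Dict[str, Optional[bytes]],
) -> List[Tuple[str, str]]:
    """Compare earlier fingerprints with what can be retrieved now.

    baseline maps identifier -> fingerprint recorded at the last check;
    snapshot maps identifier -> current bytes (absent or None if unreachable).
    """
    events = []
    for identifier, old_digest in baseline.items():
        content = snapshot.get(identifier)
        if content is None:
            events.append((identifier, "missing"))
        elif fingerprint(content) != old_digest:
            events.append((identifier, "changed"))
    return events


baseline = {
    "obj-1": fingerprint(b"stable text"),
    "obj-2": fingerprint(b"first version"),
    "obj-3": fingerprint(b"soon to vanish"),
}
snapshot = {"obj-1": b"stable text", "obj-2": b"second version"}

print(detect_events(baseline, snapshot))
# [('obj-2', 'changed'), ('obj-3', 'missing')]
```

This needs no cooperation from the remote site beyond serving the content; a cooperating site could instead expose its own change events through metadata harvesting.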
41 Level 0 Experiment
42 Level 1 Experiment
43 One of Six Core Integration Demonstration Projects for the NSDL
44 How Big might the NSDL be?
The NSDL aims to be comprehensive -- all branches of science, all levels of education, very broadly defined.
Five-year targets:
- 1,000,000 different users
- 10,000,000 digital objects
- 100,000 independent sites
Requires low-cost, scalable technology: automated collection building and maintenance
45 Levels of Interoperability: Metadata Harvesting
Agreements on simple protocol and metadata standard(s)
Example: Metadata harvesting protocol of the Open Archives Initiative (MHP)
- Moderate-quality services
- Low cost of entry to participating sites
- Moderately large numbers of loosely collaborating sites
Promising but still an emerging approach
46 Levels of Interoperability: Gathering
Robots gather collections automatically with no participation from individual sites
Examples: Web search services (e.g., Google), CiteSeer (a.k.a. ResearchIndex)
- Restricted but useful services
- Zero cost of entry to gathered sites
- Very large numbers of independent sites
Only suitable for open access collections