Title: Digital Library Architecture: A Service-Based Approach
1. Digital Library Architecture: A Service-Based Approach
Mo i Rana, Norway, November 10, 1998
Sandra Payette, Department of Computer Science, Cornell University
payette_at_cs.cornell.edu
http://www2.cs.cornell.edu/payette/presentations/DL-architecture.ppt
2. Overview
- Why talk about DL architecture?
- Digital Libraries - the architectural perspective
- Review of service-based architecture
- NCSTRL - a working example
- Dienst - existing service-oriented architecture
- Cornell next generation (component-oriented)
- Conclusion
3. Why Talk about Digital Library Architecture?
- Web alone is not a digital library
- Commercial packages limited
  - limited flexibility
  - standards issues
  - network-enabled applications, not DL architecture
- Must position for broader DL opportunities
4. Web by itself not a DL Architecture
- Documents - Files, CGI, MIME-Types
- Naming - URLs
- Document Servers - HTTP servers
- Resource Discovery - web crawlers
- Collections - web pages, ad-hoc
- Intellectual property (IP) - access control lists, passwords, ad-hoc
5. WWW Infrastructure Evolving
- Resource Description Framework (RDF)
  - will allow rich metadata semantics for documents
  - http://www.w3.org/RDF/
- Extensible Markup Language (XML)
  - will allow highly structured documents and rich linking (relationship) capabilities
  - http://www.w3.org/XML/
- Uniform Resource Names (URNs)
  - will allow for persistent, globally unique identifiers
6. But still need Digital Library Architecture
- Richer document model - digital objects
- Persistent, unique naming - URNs
- Well-defined digital library services
- Better facilities for resource discovery
- Flexible definition of collections
- Management of distributed content services
- Rights management for intellectual property
7. Digital Library Interoperability
8. Digital Library Architecture: Key Principles
- Open Architecture
  - functionality partitioned into set of well-defined services
  - services accessible via well-defined protocol
- Modularization
  - promotes interoperability
  - scalable to different clientele (research library, informal web)
- Federation
  - enable aggregations into logical collections
- Distribution
  - of content (collections) and services
  - of administration and management of DL
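The principles above can be illustrated with a minimal sketch. It assumes a hypothetical `IndexService` protocol with a single `search` operation; the class names, identifiers, and titles are invented for illustration and are not taken from any system named in this talk. Each site runs its own index, and a federated collection aggregates them behind the same operation.

```python
from __future__ import annotations

from typing import Protocol


class IndexService(Protocol):
    """Hypothetical protocol: anything that can answer a keyword search."""

    def search(self, term: str) -> list[str]: ...


class InMemoryIndex:
    """One site's index; stands in for an independently administered server."""

    def __init__(self, records: dict[str, str]) -> None:
        self._records = records  # document identifier -> title

    def search(self, term: str) -> list[str]:
        term = term.lower()
        return sorted(doc_id for doc_id, title in self._records.items()
                      if term in title.lower())


class FederatedCollection:
    """A logical collection aggregated from several member indexes."""

    def __init__(self, members: list[IndexService]) -> None:
        self._members = members

    def search(self, term: str) -> list[str]:
        hits: list[str] = []
        for member in self._members:
            # Every member is reached through the same operation.
            hits.extend(member.search(term))
        return sorted(hits)


# Illustrative data only.
site_a = InMemoryIndex({"siteA/TR-1": "Digital Library Architecture"})
site_b = InMemoryIndex({"siteB/TR-7": "Library Services and Protocols",
                        "siteB/TR-9": "Operating Systems"})
federation = FederatedCollection([site_a, site_b])
print(federation.search("library"))
```

Because the federation only depends on the `search` operation, a member can be replaced by a different implementation without changing the caller, which is the plug-in/plug-out property modularization aims for.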
9. Component-Ware Digital Libraries
Digital Objects
10. NCSTRL: A Working Example
A Globally Distributed Digital Library
120 Institutions in US, Europe, and Asia
11. NCSTRL Participants: collections federated
- 120 institutions
- Universities/labs - research reports
- European Research Consortium for Informatics and Mathematics (ERCIM)
- Los Alamos (Physics pre-prints, ACM)
- D-Lib Magazine
- 40 independent servers
12. Federation of Collections
13. Documents in Distributed Repositories
14. Multi-Format Document Model
15. NCSTRL: Real-world testbed for ...
- modular system based on a standard open architecture
- study of hard, real-world problems: policy issues, quality of service, federation of publishers
- creation of a self-sustaining international federated digital collection
16. Dienst: NCSTRL technical base
- Implements a service-based architecture for distributed digital libraries
- Protocol and reference implementation
- Network of services
- WWW browser access
- Uniform search over distributed indexes
- Access to documents in distributed repositories
- Access to multi-formatted documents
17. Dienst: Service-Based Architecture
- Document model
- Naming service (CNRI's Handle System)
- Repository service
- Indexer service
- Collection service
- User Interface service
18. Dienst Document Model
20. Dienst Document Protocol
- Documents addressable through their URNs
- Document service requests
  - get document metadata
  - get document formats
  - get document in format
  - get document partition (page) in format
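The four document service requests can be sketched as an in-memory model. This is an illustration under stated assumptions, not the Dienst wire protocol: the `Repository` class, its method names, the format names, and the placeholder page contents are all hypothetical. Only the identifier and title are borrowed from the further-reading list of this talk.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical document record: metadata plus pages stored per format."""
    urn: str
    metadata: dict
    # format name -> list of page contents in that format
    formats: dict = field(default_factory=dict)


class Repository:
    """Toy repository answering the four requests listed on the slide."""

    def __init__(self):
        self._docs = {}

    def add(self, doc):
        self._docs[doc.urn] = doc

    def get_metadata(self, urn):
        # "get document metadata"
        return self._docs[urn].metadata

    def get_formats(self, urn):
        # "get document formats"
        return sorted(self._docs[urn].formats)

    def get_document(self, urn, fmt):
        # "get document in format": all pages of one format
        return self._docs[urn].formats[fmt]

    def get_page(self, urn, fmt, page):
        # "get document partition (page) in format"; pages are numbered from 1
        return self._docs[urn].formats[fmt][page - 1]


repo = Repository()
repo.add(Document(
    urn="ncstrl.cornell/TR98-1690",
    metadata={"title": "An Infrastructure for Open-Architecture Digital Libraries"},
    # Placeholder contents, invented for the example.
    formats={"text": ["page 1 text", "page 2 text"],
             "postscript": ["%!PS page 1", "%!PS page 2"]},
))

print(repo.get_formats("ncstrl.cornell/TR98-1690"))
print(repo.get_page("ncstrl.cornell/TR98-1690", "text", 2))
```

Every request takes the document's name as its first argument, which mirrors the point that documents are addressed through their URNs rather than through file locations.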
21. Dienst 5.0 Document Protocol
- More complex document model
  - versions
  - hierarchical part specification
  - binders (multi-part documents)
- Structure service request
  - Reveal, in XML, full or collapsed structure of a document
    - e.g., chapters, sections, figures, etc.
  - Describe multiple views of a document
    - e.g., bibliography, content, thumbnails
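As a rough illustration of revealing a document's structure in XML, either in full or collapsed, the sketch below serializes a nested structure with Python's standard `xml.etree.ElementTree`. The element names, the `title` attribute, and the `max_depth` parameter are assumptions made for this example; the actual Dienst 5.0 structure format is not shown in the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical nested structure: (kind, title, children)
DOC = ("document", "Sample Report", [
    ("chapter", "Introduction", [
        ("section", "Motivation", []),
        ("figure", "Figure 1", []),
    ]),
    ("chapter", "Architecture", [
        ("section", "Services", []),
    ]),
])


def to_xml(node, max_depth=None, depth=0):
    """Build an element tree; stop descending at max_depth for a collapsed view."""
    kind, title, children = node
    elem = ET.Element(kind, {"title": title})
    if max_depth is None or depth < max_depth:
        for child in children:
            elem.append(to_xml(child, max_depth, depth + 1))
    return elem


# Full structure: chapters with their sections and figures.
print(ET.tostring(to_xml(DOC), encoding="unicode"))

# Collapsed structure: chapters only.
print(ET.tostring(to_xml(DOC, max_depth=1), encoding="unicode"))
```

A client could request the collapsed form first to build a table of contents and ask for the full form only when the user drills into a chapter.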
22. Dienst Core Services
WWW browser
Dienst User Interface
23. Dienst Protocol: Building Gateways to Non-Conforming Sites
24. Dienst Collection Service
25. Naming Service
- Documents identified by globally unique names
- Names are persistent, permanent
- Registered names resolve to specific location (URL)
Persistent Identifier (e.g., URN): cnri.dlib/april97-payette
Naming Authority: cnri.dlib
Item Name: april97-payette
Location (URL): http://www.somewebserver.org/somedirectory/somefile
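The split between a persistent name and its current location can be sketched as a small lookup table. This is a toy model, not the Handle System API: `Resolver`, `register`, `relocate`, and the second URL are hypothetical, while the identifier and first URL are the ones shown on the slide.

```python
def split_identifier(identifier):
    """Split 'naming-authority/item-name' at the first slash."""
    authority, sep, item = identifier.partition("/")
    if not sep or not authority or not item:
        raise ValueError(f"not of the form authority/item: {identifier!r}")
    return authority, item


class Resolver:
    """Toy resolution table mapping persistent names to current URLs."""

    def __init__(self):
        self._table = {}

    def register(self, identifier, url):
        split_identifier(identifier)  # reject malformed names
        self._table[identifier] = url

    def relocate(self, identifier, new_url):
        # The name stays fixed; only the location it resolves to changes.
        if identifier not in self._table:
            raise KeyError(identifier)
        self._table[identifier] = new_url

    def resolve(self, identifier):
        return self._table[identifier]


resolver = Resolver()
name = "cnri.dlib/april97-payette"
resolver.register(name, "http://www.somewebserver.org/somedirectory/somefile")
print(split_identifier(name))
print(resolver.resolve(name))

# Hypothetical move to another server: citations using the name keep working.
resolver.relocate(name, "http://www.otherserver.example/newdirectory/somefile")
print(resolver.resolve(name))
```

The design choice to show is that documents cite the name, never the URL, so moving a file requires one update in the resolver instead of fixing every reference.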
26. Identifiers: Current Initiatives
- IETF Uniform Resource Names (URN)
  - specification of URN framework
  - requirements for resolution systems
  - syntax definition
- Existing Systems
  - CNRI's Handle System (used by NCSTRL)
  - OCLC PURLs
  - DOI Initiative
27. Looking Ahead: Current Research at Cornell
- Digital Objects and Repository
- FEDORA
- Joint work in Interoperability with CNRI
- Access Management
- Resource Discovery
- STARTS (Cornell/Stanford collaboration)
- Intelligent Distributed Searching
- Collection Definition
28. Digital Object is...
getSection getArticle
getTrack getLabel
getChapter getPage
getFrame getLength
recognizable by what it can do
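The idea that a digital object is recognizable by what it can do can be sketched with behavior checks instead of type checks. The method names come from the slide; which kind of object carries which pair is an assumption here, and the `Book` and `Recording` classes and the `can` and `behaviors` helpers are hypothetical.

```python
class Book:
    """Assumed to be the kind of object offering getChapter and getPage."""

    def __init__(self, chapters):
        self._chapters = chapters  # list of chapters, each a list of page strings

    def getChapter(self, number):
        return self._chapters[number - 1]

    def getPage(self, number):
        pages = [page for chapter in self._chapters for page in chapter]
        return pages[number - 1]


class Recording:
    """Assumed to be the kind of object offering getTrack and getLabel."""

    def __init__(self, label, tracks):
        self._label = label
        self._tracks = tracks

    def getTrack(self, number):
        return self._tracks[number - 1]

    def getLabel(self):
        return self._label


def behaviors(obj):
    """List the get* operations an object offers."""
    return sorted(name for name in dir(obj)
                  if name.startswith("get") and callable(getattr(obj, name)))


def can(obj, *names):
    """True if the object offers every named operation."""
    return all(callable(getattr(obj, name, None)) for name in names)


book = Book([["p1", "p2"], ["p3"]])
cd = Recording("Example Label", ["track one", "track two"])

for obj in (book, cd):
    # The client never inspects the class, only the available behaviors.
    if can(obj, "getChapter", "getPage"):
        print("book-like:", obj.getPage(3))
    elif can(obj, "getTrack", "getLabel"):
        print("recording-like:", obj.getTrack(1))
```

The client code asks only whether an operation is available, so a new kind of object that offers the same operations works without any change to the client.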
29. What the client sees vs. What the object is
Book
Content-Type Interfaces
MARC
Mechanism
Structure
30. FEDORA DigitalObject
31. FEDORA: Extensibility for Content Types
- Simple, familiar content types
- Complex, compound, dynamic content types
32. Resource Discovery
- Meta-Searching for Resource Discovery
  - query multiple document sources
  - choose best sources to evaluate a query
  - evaluate the query at these sources
  - merge the query results from these sources
- Stanford Protocol Proposal for Internet Retrieval and Search (STARTS)
  - www-db.stanford.edu/gravano/starts.html
  - www.cs.cornell.edu/NCSTRL/STARTS/STARTShome.htm
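The meta-searching steps listed above (choose sources, evaluate, merge) can be sketched as follows. This is not the STARTS protocol: the `Source` class, the word-set summary used for source selection, and the term-count score used for merging are simplifications assumed for the example, and the sample documents are invented.

```python
class Source:
    """A searchable document source with a crude content summary."""

    def __init__(self, name, docs):
        self.name = name
        self.docs = docs  # document id -> text

    def summary(self):
        """Set of words the source contains (a stand-in for source metadata)."""
        words = set()
        for text in self.docs.values():
            words.update(text.lower().split())
        return words

    def evaluate(self, terms):
        """Return (score, doc id) pairs; score = number of query terms matched."""
        results = []
        for doc_id, text in self.docs.items():
            words = set(text.lower().split())
            score = sum(1 for term in terms if term in words)
            if score:
                results.append((score, doc_id))
        return results


def choose_sources(sources, terms, limit):
    """Keep the sources whose summaries overlap the query most, up to limit."""
    wanted = set(terms)
    ranked = sorted(sources, key=lambda s: len(s.summary() & wanted), reverse=True)
    return [s for s in ranked[:limit] if s.summary() & wanted]


def meta_search(sources, query, limit=2):
    terms = query.lower().split()
    merged = []
    for source in choose_sources(sources, terms, limit):
        merged.extend(source.evaluate(terms))
    # Merge: best score first, ties broken by document id.
    merged.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in merged]


sources = [
    Source("A", {"a1": "digital library architecture", "a2": "compiler design"}),
    Source("B", {"b1": "library services"}),
    Source("C", {"c1": "operating systems"}),
]
print(meta_search(sources, "digital library"))
```

Merging by raw score only works here because every source scores the same way; when sources rank results differently, a shared protocol for exchanging source metadata and result scores is what makes the merge step meaningful.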
33. Distributed Collection Service Definition and Access
User Interface
Intelligent routing based on regional conditions
Central Collection Server
34. Conclusions: Design with an Eye Toward the Future
- Know limitations of ad-hoc web development and commercial packages
- Embrace a service-based approach
  - modular designs increase flexibility, extensibility, plug-in/plug-out
  - well-defined services with protocols to enable federation and interoperability
  - can utilize various technologies or commercial software underneath the service layers
- Watch Web developments in XML and RDF
35. Further reading
- Lagoze and Payette, An Infrastructure for Open-Architecture Digital Libraries, http://ncstrl.cs.cornell.edu/Dienst/UI/1.0/Display/ncstrl.cornell/TR98-1690
- Davis and Lagoze, NCSTRL: Design and Deployment of a Globally Distributed Digital Library, Draft of submission to IEEE Computer Special Issue on Digital Libraries, February 1999. http://www2.cs.cornell.edu/lagoze/papers/NCSTRL-IEEE3.doc
- Payette, Persistent Identifiers, RLG DigiNews, http://www.rlg.org/preserv/diginews/diginews22.html
- Payette and Lagoze, Flexible and Extensible Digital Object and Repository Architecture (FEDORA), http://www2.cs.cornell.edu/NCSTRL/CDLRG/FEDORA.html