Defining Collections in Distributed Digital Libraries


1
Defining Collections in Distributed Digital
Libraries
  • Carl Lagoze, David Fielding
  • D-Lib Magazine, November 1998

2
Introduction
3
Order and Chaos in Global Information Space (I)
  • Characteristics of the Web
  • Universality
  • Quantity without quality
  • Uniformity
  • Specialized and domain-specific tools,
    technologies, and guidance are difficult or
    impossible to find
  • Decentralization
  • Difficult to impose the organizational structures
    necessary to ensure information integrity
    (reliability and accessibility), security and
    privacy for content and users, and survivability
    (preservation) of information

4
Order and Chaos in Global Information Space (II)
  • Selection, organization, and specialization
    should be permitted without being imposed
  • Mechanisms for selection, organization, and
    specialization should be flexible, extensible,
    and independent of other characteristics of the DL,
    such as how content and services are physically
    distributed or how and by whom the components of
    the DL are managed

5
Role of Collection Development in the Traditional
Library
  • Selection - defining resources belonging to a
    collection
  • Physical containment or demarcation in
    traditional library
  • Specialization - resource discovery aids or
    cataloging techniques tailored to the collection
    or the audience
  • Administration - management and preservation
    policies that conform to the collection
    characteristics.

6
Collection Service Architecture
  • Defines collection membership through criteria
  • Subject classification, language, or genre
  • Allow automatic and/or dynamic selection of
    resources from a set of distributed information
    sources, based on either metadata about those
    resources or the content within the resources
  • Provide query routing and query pre-processing
    and post-processing facilities to facilitate
    resource discovery
  • Tailored to the characteristics of the collection
  • Act as a distributed metadata repository,
    storing, disseminating, and processing data
    relevant to the management and administration of
    objects in the collection

7
Relationship with Overall DL Systems
  • The collection service is one of several services
    in the component-based digital library
    architecture
  • A repository service for storing digital content
  • A naming service for registering and resolving
    unique names for objects
  • An index service that processes queries for
    content discovery

8
Why Separate the Collection Service from the Others?
  • A digital object (DO) may belong to more than one
    collection
  • Defined by multiple collection services under
    separate administration
  • Collection membership and administration are
    distinct from definitions of query capabilities
    (defined by index services)
  • Collection definition is relatively lightweight
  • The logically distinct collection service defined
    in this paper is fundamentally a simple query
    routing mechanism that requires no access to the
    content of individual digital objects

9
Component-Based Distributed Digital Libraries
10
Cornell Digital Library Research Group (CDLRG)
  • Deployment of distributed digital libraries
  • Open Architecture - the functionality of a DL
    system is available in the form of distinct
    function units (services), each of which has
    operational semantics exposed through an open
    protocol.
  • Federation - DLs are managed aggregations of
    these functional units (or services) and the
    resources to which they provide access. New
    functionality can be added to these systems
    through the implementation of value-added
    services, which interact with existing services
    using established protocols.
  • Distribution - The components (and content) of a
    digital library may be spread over the global
    Internet, but are presented to the user as a
    single uniform system.
  • NCSTRL, Dienst, CRADDL

11
Interaction of Core Digital Library Services
12
Core Services of DL (I)
  • Content: digital objects (DO)
  • Byte streams, content-specific behaviors,
    secure access (rights management mechanisms)
  • Repository service: deposit, storage, and access
    to DOs
  • A DO is considered contained within a repository
    if the URN of that object resolves to the
    respective repository
  • DOs are identified by globally-unique names
    registered with the naming service. The naming
    service is able to resolve a URN to one or more
    physical locations.
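
The containment rule above (a DO belongs to a repository if its URN resolves there) can be sketched as follows. This is a minimal illustration, not the actual naming service: the `NamingService` class, the `is_contained` helper, and the URN and host names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class NamingService:
    """Hypothetical registry mapping a URN to one or more repository locations."""
    registry: dict[str, list[str]] = field(default_factory=dict)

    def register(self, urn: str, location: str) -> None:
        # A URN may be registered at several physical locations.
        self.registry.setdefault(urn, []).append(location)

    def resolve(self, urn: str) -> list[str]:
        # Unknown URNs resolve to no locations at all.
        return list(self.registry.get(urn, []))


def is_contained(naming: NamingService, urn: str, repository: str) -> bool:
    # Containment rule from the slide: the URN resolves to that repository.
    return repository in naming.resolve(urn)


naming = NamingService()
naming.register("urn:example:tr-98-1", "repo-a.example.org")
naming.register("urn:example:tr-98-1", "repo-b.example.org")
```

Under this sketch, `is_contained(naming, "urn:example:tr-98-1", "repo-b.example.org")` is true, while an unregistered URN is contained nowhere.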

13
Core Services of DL (II)
  • Index service: discovery of DOs via query.
  • DOs indexed by an index service may be
    distributed
  • Queries return result sets that contain the URNs
    of digital objects that match the query (and
    possibly other meta-information).
  • Collection service: aggregation of access to sets
    of DOs and services into meaningful collections.
  • User interface services or gateways:
    human-centered entry points to the functionality
    of the digital library

14
Modular Design -- Customized DLs
Selection happens independently at each level
  • Creators of digital resources
  • Repository managers may adopt policies that
    implicitly select the digital objects that can be
    deposited into the repository.
  • Administrators of index servers select the
    digital objects that are indexed in that server.
  • Collection services apply broader (not digital
    object specific) selection mechanisms against the
    query interfaces of one or more index services.
  • User interface gateways select one or more
    collections that users can search over and access
    objects within.

15
Defining a Collection in a Distributed Digital
Library
16
Collection Definition (I)
  • A collection is logically defined as a set of
    criteria for selecting resources from the broader
    information space.
  • Static criteria
  • URNs or ISBNs
  • The set of resources that are stored in a
    specific repository
  • Dynamic criteria
  • Dublin Core subject element with the value
    "computer science".
  • Advanced Natural Language techniques
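
One way to picture a collection as "a set of criteria" is to model each criterion as a predicate over an object's metadata. The sketch below is illustrative only: the record layout, the `dc:subject` key, and the helper names are assumptions, not part of the paper or of Dublin Core tooling.

```python
from typing import Callable

# A digital object is reduced here to a flat metadata dictionary (assumption).
DigitalObject = dict[str, str]
Criterion = Callable[[DigitalObject], bool]


def static_urns(urns: set[str]) -> Criterion:
    # Static criterion: membership is an explicit list of identifiers.
    return lambda obj: obj.get("urn") in urns


def in_repository(repository: str) -> Criterion:
    # Static criterion: everything stored in one specific repository.
    return lambda obj: obj.get("repository") == repository


def dc_subject(value: str) -> Criterion:
    # Dynamic criterion: evaluated against metadata, so new matching
    # objects join the collection without any list being edited.
    return lambda obj: obj.get("dc:subject", "").lower() == value.lower()


def select(objects: list[DigitalObject], criterion: Criterion) -> list[str]:
    # Return the URNs of the objects that satisfy the criterion.
    return [o["urn"] for o in objects if criterion(o)]


objects = [
    {"urn": "urn:example:1", "repository": "repo-a", "dc:subject": "Computer Science"},
    {"urn": "urn:example:2", "repository": "repo-b", "dc:subject": "Physics"},
    {"urn": "urn:example:3", "repository": "repo-b", "dc:subject": "computer science"},
]
cs_members = select(objects, dc_subject("computer science"))
```

Here `cs_members` contains the first and third URNs. Appending another "computer science" record to `objects` would add it to the collection on the next `select` call, which is the point of a dynamic criterion.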

17
Collection Definition (II)
  • A collection is operationally defined in terms of
    resource discovery
  • The resources in the collection are those that
    can be directly found using those resource
    discovery tools.
  • Collection-specific resource discovery tools have
    the following characteristics
  • Direct queries only to those index servers that
    can return objects in the collection (query
    routing).
  • Employ filtering techniques to select only those
    objects in the respective index servers that fit
    the collection criteria
  • Employ resource discovery aids specialized for
    the collection
  • Domain-specific stop-word lists, stemming
    algorithms, thesauri
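
The three characteristics above (query routing, filtering, and collection-specific discovery aids) can be combined in a small sketch. Everything here is an assumption made for illustration: the `IndexServer` and `CollectionSearch` classes, the toy title matching, and the sample records are not taken from Dienst or CRADDL.

```python
from dataclasses import dataclass


@dataclass
class IndexServer:
    name: str
    records: list[dict]  # each record: {"urn", "subject", "title"}

    def search(self, terms: list[str]) -> list[dict]:
        # Toy matching: a record matches if every term occurs in its title.
        return [r for r in self.records
                if all(t in r["title"].lower() for t in terms)]


@dataclass
class CollectionSearch:
    routed_servers: list[IndexServer]  # query routing: only these are asked
    subject: str                       # filtering criterion for the collection
    stop_words: set[str]               # collection-specific discovery aid

    def query(self, text: str) -> list[str]:
        # Pre-process the query with the collection's stop-word list.
        terms = [w for w in text.lower().split() if w not in self.stop_words]
        hits = []
        for server in self.routed_servers:
            for record in server.search(terms):
                # Post-filter: keep only objects that fit the criteria.
                if record["subject"] == self.subject:
                    hits.append(record["urn"])
        return hits


cs_index = IndexServer("cs-index", [
    {"urn": "urn:example:1", "subject": "computer science",
     "title": "Distributed digital libraries"},
    {"urn": "urn:example:2", "subject": "physics",
     "title": "Distributed sensor libraries"},
])
other_index = IndexServer("other-index", [
    {"urn": "urn:example:3", "subject": "computer science",
     "title": "Distributed libraries elsewhere"},
])

collection = CollectionSearch([cs_index], "computer science", {"the", "of", "in"})
results = collection.query("the distributed libraries")
```

`results` holds only `urn:example:1`: the physics record is filtered out, and `urn:example:3` is never seen because `other_index` is not in the routing list. In this operational sense the third object is not part of the collection.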

18
Defining Collections -- Resource Discovery
19
Advantages
  • Location and Administrative Independence
  • No linkage between the membership of a resource
    in a collection and its location in a repository
    nor its collocation with other member objects.
  • Collections can be created, and subsequently shut
    down, on demand; resources do not need to be
    moved to physical locations; in fact, no changes
    need to be made to the objects themselves.
  • Dynamic Membership
  • Extensibility -- offer an opportunity to employ
    more dynamic and contextual criteria as they are
    developed
  • Example: criteria based on analysis of link topology

20
Collection Service Implementations
21
Dienst Collection Service
  • Information to be accessed in the Dienst
    collection service
  • The list of publishing authorities that are part
    of the collection.
  • The network location (e.g., foo.ncstrl.org, port 80)
  • Meta information about each of the index servers
  • primary or secondary, last update of the index,
    performance information, etc.
  • Correspondence of index servers to repository
    servers
  • Provides the index servers with information on
    the repository servers from which they should
    download meta information for indexing.
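
The information listed above could be held in a structure like the following. This is a sketch under stated assumptions, not the Dienst data model: the class and field names are hypothetical, and apart from foo.ncstrl.org port 80 (taken from the slide) all values are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class IndexServerInfo:
    host: str
    port: int
    role: str          # "primary" or "secondary"
    last_update: str   # date of the last index update (format is an assumption)
    # Repository servers from which this index server downloads
    # meta information for indexing.
    repositories: list[str] = field(default_factory=list)


@dataclass
class DienstCollectionInfo:
    publishing_authorities: list[str]
    index_servers: list[IndexServerInfo]

    def primary_servers(self) -> list[str]:
        # Hosts of the index servers marked as primary.
        return [s.host for s in self.index_servers if s.role == "primary"]

    def repositories_for(self, host: str) -> list[str]:
        # Correspondence of an index server to its repository servers.
        for server in self.index_servers:
            if server.host == host:
                return list(server.repositories)
        return []


info = DienstCollectionInfo(
    publishing_authorities=["authority-a", "authority-b"],
    index_servers=[
        IndexServerInfo("foo.ncstrl.org", 80, "primary", "1998-11-01",
                        ["repo-a.example.org", "repo-b.example.org"]),
        IndexServerInfo("backup.example.org", 80, "secondary", "1998-10-15",
                        ["repo-a.example.org"]),
    ],
)
```

A user interface or index server could then ask `info.repositories_for("foo.ncstrl.org")` to learn which repositories to harvest, or `info.primary_servers()` to pick where to send queries first.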

22
23
Limitations of Dienst
  • Collection criteria are hard-wired
  • The Dienst protocol and server implementation
    limits the ability of user interface servers to
    interact with more than one collection (and its
    associated set of sub-collections)
  • The Dienst architecture incorrectly conflates
    the functions of the user interface service
    with query routing.
  • Query routing should be separate from the user
    interface
  • Limits our capacity to perform query routing
    that is highly collection specific

24
CRADDL Collection Service
  • Implemented as a set of distributed servers that
    act as a metadata repository for collection
    specific information, and that perform collection
    specific query routing
  • Each collection service maps to a single
    collection; in effect, a collection exists and is
    accessible in the digital library infrastructure
    if there is a collection service for it.
  • Collection service : collection = 1:1

25
Collection Service and User Interface Servers
  • Main consumers of the CS: user interface gateways
    (UI)
  • UI gives human-friendly access to one or more
    collections
  • Interaction between UI and CS (through protocol
    requests)
  • The exchange of metadata about the collection
  • A collection description: name and free-text
    description
  • Assist UI and users to choose collections for
    queries
  • The elements of the collection hierarchy
  • Query capability and customization information
  • Facilitate the creation of collection-specialized
    query forms by a user interface
  • Submission of query requests and the return of
    corresponding result sets

26
Collection Service
Customized UI
27
Components of CS
  • Central collection server (CCS) -- central point
    of management of a collection.
  • Responsible for the creation and modification of:
  • Collection criteria
  • Index server tables - set of index servers used
    for searches
  • Collection metadata - collection description,
    query capabilities
  • One or more collection query routers (CQR)
  • Local, replicated access to collection metadata
    -- reliability
  • Query routing tailored for local conditions
  • Localized query routing and connectivity region
  • A UI contacts a CQR for collection metadata
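
The split between one central collection server and several query routers can be sketched as below. This is a hedged illustration only: the class names, the `snapshot`/`sync` replication step, and the "prefer index servers in the router's own region" policy are assumptions standing in for whatever CRADDL actually does.

```python
from dataclasses import dataclass, field


@dataclass
class CentralCollectionServer:
    """Single point of management for one collection (CCS)."""
    description: str
    index_servers: dict[str, str]  # index server name -> connectivity region
    version: int = 0

    def add_index_server(self, name: str, region: str) -> None:
        # All modifications to the index server table happen at the CCS.
        self.index_servers[name] = region
        self.version += 1

    def snapshot(self) -> dict:
        # Copy of the collection metadata handed to routers for replication.
        return {"version": self.version,
                "description": self.description,
                "index_servers": dict(self.index_servers)}


@dataclass
class CollectionQueryRouter:
    """Local replica of collection metadata plus local routing (CQR)."""
    region: str
    metadata: dict = field(default_factory=dict)

    def sync(self, ccs: CentralCollectionServer) -> None:
        self.metadata = ccs.snapshot()

    def route(self) -> list[str]:
        servers = self.metadata.get("index_servers", {})
        local = sorted(n for n, r in servers.items() if r == self.region)
        # Assumed policy: prefer servers in the router's own connectivity
        # region and fall back to every known server otherwise.
        return local or sorted(servers)


ccs = CentralCollectionServer("CS technical reports",
                              {"index-eu": "europe", "index-us": "north-america"})
eu_router = CollectionQueryRouter("europe")
asia_router = CollectionQueryRouter("asia")
eu_router.sync(ccs)
asia_router.sync(ccs)
```

Because each router works from its own replica, it keeps answering from the last synced metadata even if the CCS changes or is unreachable. A later `sync` picks up new index servers.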

28
Distributed collection service and connectivity
regions