Open Access to Digital Libraries - PowerPoint PPT Presentation

Slides: 49
Provided by: Pasqual85

1
Open Access to Digital Libraries
  • A second-generation DL: Scholnet
  • Pasquale Pagano
  • pagano_at_iei.pi.cnr.it
  • ISTI-CNR

2
Scholnet: a 2nd-generation DL
  • Scholnet is a digital library infrastructure for
    supporting communication and collaboration among
    networked communities.
  • In addition to the provision of standard digital
    library capabilities for information acquisition,
    description, archiving, access, cross-language
    search and retrieval, and dissemination, it will
    provide support for
  • non-textual data types
  • hypermedia annotation
  • personalized information dissemination.

3
Document Model and Type
  • Without documents there would be no DLs!
  • For efficiency, especially when handling
    millions of documents and gigabytes of data,
    optimised storage is crucial.
  • These problems shift, and sometimes partially
    disappear, when one considers the entire life and
    social context of a document, or when one focuses
    on other topics related to document management,
    such as
  • Document Model
  • Document Type
  • Multilingual Documents
  • Multimedia Documents
  • Structured Documents
  • Distributed Collections

4
Digital Library Document Model
  • A DL document model needs to have a number of
    features that allow for storage of content in
    multiple forms (e.g., text, image, video, audio)
    and dissemination of that content in multiple
    variations.
  • The features that the document model must support
    are
  • Globally unique naming of documents using
    Handles, a URN scheme whose names consist of
    a unique naming authority and an identity string
    that is unique within that authority
  • Multiple versions
  • Logical structuring of documents
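
The naming rule above can be illustrated with a minimal sketch. It assumes the usual Handle layout of a naming authority and an identity string separated by the first "/". The class, the function, and the sample handle are hypothetical and are not part of Scholnet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Handle:
    naming_authority: str  # globally unique authority
    identity: str          # string unique within that authority

    def __str__(self) -> str:
        return f"{self.naming_authority}/{self.identity}"


def parse_handle(text: str) -> Handle:
    """Split a handle at the first '/' into authority and identity string."""
    authority, sep, identity = text.partition("/")
    if not sep or not authority or not identity:
        raise ValueError(f"not a valid handle: {text!r}")
    return Handle(authority, identity)


# Hypothetical handle, used only for illustration.
h = parse_handle("example.authority/report-2001-07")
print(h.naming_authority)
print(h.identity)
```

Because uniqueness is only required within an authority, two authorities may reuse the same identity string without producing a name clash.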

DoMDL
5
Digital Library Document Model
  • Logical structuring of documents
  • Multiple types of descriptive metadata that can
    be associated with the document itself or
    sub-parts (e.g., individual chapters, pages,
    etc.) of the document.
  • Multiple views of a document, which are alternate
    expressions or structural breakdowns of the
    content encapsulated in the document. For
    example, a document representing a musical work
    may have an audio view, a textual view (the
    lyrics), and video view (of someone performing
    the music). Each of these views can have separate
    logical structuring (and metadata).
  • Hierarchical sub-structuring of each view, such
    as the sections, chapters, and pages in a book.
  • Multiple content-types (i.e., MIME types of
    output streams), binders (i.e., mechanisms for
    encapsulating multiple logical objects, such as
    scanned images, into a single output stream), and
    compression schemes.
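
As a rough illustration of the features listed above, the sketch below models a document with versions, views, hierarchical sub-parts, metadata at any level, and manifestations with a content type and compression scheme. All class and field names are assumptions made for this example; they are not the actual DoMDL definitions.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Manifestation:
    mime_type: str             # content type of the output stream
    compression: str = "none"  # compression scheme, if any


@dataclass
class Node:
    """A view, or one of its hierarchical sub-parts (chapter, page, ...)."""
    name: str
    metadata: Dict[str, str] = field(default_factory=dict)
    manifestations: List[Manifestation] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)

    def walk(self, depth: int = 0):
        """Yield (depth, node) pairs for this node and all its sub-parts."""
        yield depth, self
        for child in self.children:
            yield from child.walk(depth + 1)


@dataclass
class Document:
    handle: str
    version: int
    metadata: Dict[str, str] = field(default_factory=dict)
    views: List[Node] = field(default_factory=list)


# The musical work from the slide: audio, textual (lyrics) and video views.
song = Document(
    handle="example.authority/song-1",  # hypothetical handle
    version=1,
    metadata={"title": "A musical work"},
    views=[
        Node("audio", manifestations=[Manifestation("audio/mpeg")]),
        Node("lyrics", manifestations=[Manifestation("text/plain")],
             children=[Node("verse 1"), Node("verse 2")]),
        Node("video", manifestations=[Manifestation("video/mpeg")]),
    ],
)

for view in song.views:
    for depth, node in view.walk():
        print("  " * depth + node.name)
```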

6
Example 1
  • Postscript, Adobe Acrobat, MSWord
  • Postscript, Adobe Acrobat
  • Postscript, MSWord
  • Postscript, Adobe Acrobat
  • ..
  • Book
  • Postscript, MSWord
  • Chapter 1
  • Paragraph 1
  • Chapter 2
  • Paragraph 1
  • Paragraph 2
  • ..

7
Example 2
  • Article
      • English Version
          • Postscript, Adobe Acrobat
          • Chapter 1
          • Chapter 2
      • Italian Version
          • Postscript
          • Capitolo 1
          • .
          • Capitolo 2

8
Example 3
  • Article
      • Text
          • Postscript, Adobe Acrobat, MSWord
          • Part 1 Introduction
              • Postscript, Adobe Acrobat
          • Part 2 Content
      • Composite
          • SMIL
      • Video
          • Avi, Mpeg
          • Part 1 Introduction
              • Mpeg
          • Part 2 Content
          • .
      • Slide
          • Slide1
              • Gif

Metadata
9
Digital Library Document Type
  • One social issue with documents relates to
    culture and language. Whereas there are many
    causes of the movement towards English as a basis
    for global scientific and technical interchange,
    DLs may actually lead to an increase in
    availability of non-English content.
  • At the foundation, there are issues of
  • character encoding
  • searching multilingual collections

MULTILINGUAL
Unicode provides a single coding scheme suitable
for all natural languages. It was originally
designed as a 16-bit code, but its code space now
extends to U+10FFFF. In particular, UTF-8 (one of
the UCS - Universal Character Set - Transformation
Formats) is a variable-length encoding of Unicode,
using one to four bytes per character, that allows
Unicode to be used in a convenient and
backwards-compatible way in environments that,
like Unix, were designed entirely around ASCII.
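
The backward compatibility of UTF-8 with ASCII can be checked directly. The short sketch below uses only Python's built-in string encoding; the sample words are arbitrary and chosen just to mix scripts.

```python
samples = ["library", "biblioteca", "βιβλιοθήκη", "図書館"]

for word in samples:
    encoded = word.encode("utf-8")
    print(word, "->", len(word), "characters,", len(encoded), "bytes")

# Pure ASCII text is byte-for-byte identical when encoded as UTF-8.
assert "library".encode("utf-8") == "library".encode("ascii")

# Characters outside ASCII take two or more bytes each.
assert len("図書館".encode("utf-8")) == 9
```

This is why tools written for ASCII can usually pass UTF-8 text through unchanged: every ASCII byte keeps its meaning, and the bytes of multibyte characters never fall in the ASCII range.
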
10
Digital Library Document Type
  • Documents are made up of one or more streams,
    often with a structure imposed (e.g., a raster
    organisation of a pixel stream represents a
    colour image).
  • Multimedia documents' streams usually must be
    synchronised in some way, therefore a special
    standard for handling this needs to be adopted.
  • At the foundation, there are issues of
  • Critical storage organization
  • Streaming access

MULTIMEDIA
11
Digital Library Document Type
  • While multimedia depends on the stream
    abstraction, structured documents require both
    the abstractions of streams and structures.
  • DLs typically include hierarchical documents,
    metadata describing them, annotations, and
    links to other related documents. Therefore
    special techniques for searching on structure
    need to be adopted.
  • An alternative approach shifts the burden of
    handling structure in documents to the user, by
    providing multiple layers of filters and tools
    that hide the complexity of the document
    structure.

STRUCTURED
12
Digital Library Annotation
  • Allows authorized users to create and share
    annotations
  • Integrates rating, comment, annotation and
    reference-linking features into the digital
    library infrastructure
  • Stores annotations on documents and makes them
    available to authorized users

13
Digital Library Annotation
Document 2
Document 1
14
Digital Library Annotation
15
Digital Library Annotation
16
Digital Library Services
  • The Scholnet Digital Library supports
  • Information Acquisition and Description
  • Virtual Information Space Presentation
  • Cross-language Search and Retrieval
  • Information Space Browsing
  • Document Annotation
  • Annotation Discovery
  • Document Access
  • Automatic Personalized Dissemination

17
Digital Library Services
  • The Scholnet Digital Library supports
  • Document Archiving
  • Document Publishing
  • Information Space Organization
  • User Rights Management

18
Information Acquisition and Description
  • Scholnet supports requests for the submission,
    withdrawal, and replacement of documents by
    authorized users.
  • Scholnet offers a Graphical User Interface (GUI)
    to simplify the description and document
    structure management phases.
  • The GUI is generic with respect to the digital
    library document space selected:
      • Metadata formats
      • Structure of the document

19
Virtual Information Space Presentation
  • The Scholnet Information Space is organized in
    collections.
  • A collection is a virtual aggregation of
    documents, selected by a collection administrator
    user, identified by a name and a description.
  • To search, browse, or enrich a collection, the
    user is presented with the appropriate set of
    services.

20
Virtual Information Space Presentation
  • Collections make it possible to personalize the
    organization of documents to satisfy the needs
    of different communities.

21
Cross-language Search and Retrieval
  • Scholnet supports different kinds of search for
    different user needs
  • Simple Search
  • Fielded Search
  • Search Across
  • To improve the quality of the results, relevance
    feedback is used to modify the query.
  • All search types return a structured response
    containing a list in which each element provides
    information on, and a link to, a document.

22
Cross-language Search and Retrieval: Simple Search
  • The simple search allows the user to search the
    documents in the selected collection(s) using the
    specified metadata formats without restricting
    his/her search criteria to specific fields.
  • This method performs a full text search and
    therefore neither numeric ranges nor controlled
    vocabularies can be applied.

23
Cross-language Search and Retrieval: Fielded Search
  • The fielded search allows the user to restrict
    his/her search criteria to specific fields of the
    selected metadata formats, and to combine these
    fields using the appropriate belief and
    not-belief operators.
  • This method allows the use of date and numeric
    ranges, as well as right types and controlled
    vocabularies.
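
The difference between the simple and the fielded search can be sketched over a few in-memory records. This is only an illustration: the records and field names are made up, and plain "must" / "must not" conditions stand in for the belief and not-belief operators, whose exact semantics the slides do not define.

```python
from datetime import date

# Hypothetical metadata records; field names are assumptions.
RECORDS = [
    {"handle": "example/1", "title": "Digital library services",
     "author": "Rossi", "date": date(2001, 3, 1)},
    {"handle": "example/2", "title": "Annotation of video documents",
     "author": "Bianchi", "date": date(2002, 6, 15)},
    {"handle": "example/3", "title": "Library annotation services",
     "author": "Rossi", "date": date(2003, 1, 10)},
]


def simple_search(records, text):
    """Match the text against every field value, with no field restriction."""
    needle = text.lower()
    return [r for r in records
            if any(needle in str(v).lower() for v in r.values())]


def fielded_search(records, must=None, must_not=None, date_range=None):
    """Restrict matching to named fields; optionally filter on a date range."""
    must, must_not = must or {}, must_not or {}
    hits = []
    for r in records:
        if any(v.lower() not in r[f].lower() for f, v in must.items()):
            continue
        if any(v.lower() in r[f].lower() for f, v in must_not.items()):
            continue
        if date_range and not (date_range[0] <= r["date"] <= date_range[1]):
            continue
        hits.append(r)
    return hits


print([r["handle"] for r in simple_search(RECORDS, "annotation")])
print([r["handle"] for r in fielded_search(
    RECORDS,
    must={"author": "Rossi"},
    must_not={"title": "annotation"},
    date_range=(date(2000, 1, 1), date(2002, 12, 31)),
)])
```

Only the fielded variant can apply the date range, because it knows which field holds a date. The simple search treats every value as text.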

24
Cross-language Search and Retrieval: Search Across
  • The user has to specify a value for an attribute
    that supports the cross-querying capability, and
    the characteristic of the subset of the results
    that he/she wants to receive.
  • For example, let us suppose that the subject
    attribute supports the cross-querying capability
    with respect to the language. This means that the
    user can specify a subject in Italian and
    retrieve all documents containing related
    subjects, even if these documents are described
    in a different language.

25
Cross-language Search and Retrieval: Search Across
  • The simplest approach is to look up words or
    phrases in dictionaries, and to use the
    translated terms to search collections in
    other languages.
  • However, more sophisticated approaches can be
    used:
  • Controlled vocabulary searching using a
    multilingual thesaurus
      • Users can browse the version of the thesaurus
        in their preferred language and extract
        relevant terms to use in queries. The English
        version of the thesaurus serves as the
        interlingua and translation medium.
  • Free text searching
      • The user first launches a query in his/her
        preferred language and indicates the most
        relevant results retrieved. The system then
        automatically extracts the most relevant
        terms from these documents and uses them to
        query documents in the rest of the
        collection.
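
The simplest approach above, dictionary lookup, can be sketched in a few lines. The dictionary entries and function name below are illustrative assumptions, not Scholnet data or code.

```python
# Toy Italian-to-English dictionary; entries are illustrative only.
IT_EN = {
    "biblioteca": ["library"],
    "digitale": ["digital"],
    "annotazione": ["annotation", "note"],
}


def translate_query(query, dictionary):
    """Replace each known term by its translations; keep unknown terms."""
    translated = []
    for term in query.lower().split():
        translated.extend(dictionary.get(term, [term]))
    return translated


print(translate_query("Biblioteca digitale annotazione XML", IT_EN))
```

Terms with several translations widen the query, which is one reason why thesaurus-based and feedback-based approaches are described as more sophisticated.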

26
Cross-language Search and Retrieval: Relevance Feedback
  • Relevance feedback is used to modify the query
    in order to improve the quality of the results.
  • The user indicates which of the documents
    retrieved in response to a search are relevant.
    The service uses this information to modify the
    original query. The new query is then run and
    its results are returned to the user.
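
The slides do not say how the query is modified. One simple possibility, shown here purely as an assumption, is to add the terms that occur most often in the documents the user marked as relevant.

```python
from collections import Counter

STOPWORDS = {"the", "of", "and", "in", "a", "for"}


def expand_query(query_terms, relevant_docs, extra_terms=2):
    """Add the most frequent new terms from the relevant documents."""
    counts = Counter()
    for text in relevant_docs:
        for term in text.lower().split():
            if term not in STOPWORDS and term not in query_terms:
                counts[term] += 1
    new_terms = [t for t, _ in counts.most_common(extra_terms)]
    return list(query_terms) + new_terms


# Texts of the documents the user marked as relevant (made up).
relevant = [
    "annotation of digital video archives",
    "video annotation in digital libraries",
]
print(expand_query(["annotation"], relevant))
```

The expanded query is then submitted again, which corresponds to the "new query" step described above.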

27
Cross-language Search and Retrieval
  • The search methods return a list of documents in
    which each element of the list contains the
    fields specified in one of the available
    result-set formats.
  • The result-set formats are chosen by the
    administrator and can be changed dynamically.
  • Header
      • handle of and link to a document
  • Short
      • handle of and link to a document
      • title, author, publication date
  • Long
      • handle of and link to a document
      • title, author, publication date
      • subject, description
28
Information Space Browsing
  • The Browsing Service allows users to browse the
    content of a collection (a virtual set of
    documents) using a specific field of the
    selected metadata formats.
  • It returns a structured response containing a
    list in which each element contains the fields
    specified in one of the available result-set
    formats and a link to a document.
  • The result-set formats are chosen by the
    administrator and can be changed dynamically.

29
Document Annotation
  • Authorized users can annotate a document or its
    parts specifying descriptive information and
    access rights to the annotation, plus information
    about one or more of the following sections
  • Evaluate
  • Criticize
  • Comment
  • Is related with

30
Annotation Discovery
  • Annotations can be discovered by
  • accessing the document to which they are related
  • searching the annotation space by
      • Author
      • Description
      • Comment
      • Relation
      • Evaluation
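
A minimal sketch of an annotation record and of discovery by field is given below. It assumes that access rights are a simple set of user names allowed to read the annotation. All names are hypothetical and the matching rule (case-insensitive substring) is a simplification.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Annotation:
    target: str                      # handle of the document or part
    author: str
    description: str
    readers: Set[str] = field(default_factory=set)  # users allowed to read
    evaluation: Optional[int] = None
    criticism: Optional[str] = None
    comment: Optional[str] = None
    relation: Optional[str] = None   # handle of a related document


def _text(value) -> str:
    """Render a field value as lower-case text; missing values are empty."""
    return "" if value is None else str(value).lower()


def discover(annotations: List[Annotation], user: str, **criteria):
    """Return annotations readable by `user` whose fields contain the criteria."""
    hits = []
    for a in annotations:
        if user not in a.readers:
            continue
        if all(_text(v) in _text(getattr(a, k)) for k, v in criteria.items()):
            hits.append(a)
    return hits


notes = [
    Annotation("example/doc-1", "rossi", "Review of chapter 2",
               readers={"rossi", "bianchi"}, evaluation=4,
               comment="Clear presentation"),
    Annotation("example/doc-1", "bianchi", "Private remark",
               readers={"bianchi"}, comment="Check the references"),
]

print([a.description for a in discover(notes, "rossi", author="rossi")])
print([a.description for a in discover(notes, "bianchi", comment="references")])
```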

31
Document Access
  • A document is a digital object that
  • may contain other digital objects
  • is composed of entities
  • An entity is the minimal logical component that
    can be accessed by authorized users
  • A description can be associated with each digital
    object or entity

32
Document Access
33
Document Access
34
Document Access
35
Automatic Personalized Dissemination
  • Automatic notification to the user when a new
    document matching the user's information needs
    (user profile) is published in the digital
    library
  • The user personalizes his/her profile with
    topics of interest:
      • topic name
      • keywords
      • categories
      • notification flag
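
The profile structure above suggests a simple matching step at publication time. The sketch below assumes that a topic matches when it shares at least one keyword with the new document and, if the topic lists categories, at least one category as well. That rule and all names are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Topic:
    name: str
    keywords: Set[str]
    categories: Set[str] = field(default_factory=set)
    notify: bool = True  # the notification flag


@dataclass
class Profile:
    user: str
    topics: List[Topic]


def users_to_notify(profiles, doc_terms, doc_categories):
    """Yield (user, topic name) for every active topic the document matches."""
    for profile in profiles:
        for topic in profile.topics:
            if not topic.notify:
                continue
            keyword_hit = bool(topic.keywords & doc_terms)
            category_hit = (not topic.categories
                            or bool(topic.categories & doc_categories))
            if keyword_hit and category_hit:
                yield profile.user, topic.name


profiles = [
    Profile("rossi", [Topic("DL", {"library", "metadata"})]),
    Profile("bianchi", [Topic("Video", {"video"}, notify=False)]),
]
new_doc_terms = {"digital", "library", "video"}
print(list(users_to_notify(profiles, new_doc_terms, {"reports"})))
```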

36
Digital Library Services
  • The Scholnet Digital Library supports
  • Information Acquisition and Description
  • Virtual Information Space Presentation
  • Cross-language Search and Retrieval
  • Information Space Browsing
  • Document Annotation
  • Annotation Discovery
  • Document Access
  • Automatic Personalized Dissemination

37
Digital Library Services
  • The Scholnet Digital Library supports
  • Document Archiving
  • Document Publishing
  • Information Space Organization
  • User Rights Management

38
Document Archiving
  • Scholnet supports the archiving of millions of
    documents geographically distributed over
    thousands of different publishers
  • Scholnet supports automatic format transcoding
    in accordance with the choices of the publisher
    and the document author
  • Scholnet supports automatic compression and
    decompression of any manifestation in accordance
    with the choices of the publisher and the
    document author
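
Compression chosen per manifestation can be sketched with Python's standard gzip and bz2 modules. The set of schemes, the function names, and the sample content are assumptions. The slides do not list which schemes Scholnet offers.

```python
import bz2
import gzip

# Schemes offered here are illustrative; the slides do not list Scholnet's.
COMPRESSORS = {
    "none": (lambda data: data, lambda data: data),
    "gzip": (gzip.compress, gzip.decompress),
    "bzip2": (bz2.compress, bz2.decompress),
}


def archive(manifestation: bytes, scheme: str) -> bytes:
    """Compress a manifestation with the scheme chosen for it."""
    compress, _ = COMPRESSORS[scheme]
    return compress(manifestation)


def retrieve(stored: bytes, scheme: str) -> bytes:
    """Return the original bytes of an archived manifestation."""
    _, decompress = COMPRESSORS[scheme]
    return decompress(stored)


# Made-up, highly repetitive content standing in for a PostScript file.
content = b"%!PS-Adobe-3.0\n" + b"0 0 moveto 100 100 lineto stroke\n" * 200
stored = archive(content, "gzip")
print(len(content), "->", len(stored), "bytes")
assert retrieve(stored, "gzip") == content
```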

39
Document Publishing
  • The Publisher Administrator can approve, review,
    or reject any request submitted by a registered
    user for the submission, withdrawal, or editing
    of document metadata and document objects
  • Incoming requests are stored in a temporary area
    that is not publicly available.

40
Document Publishing
  • Scholnet is also able to import documents from
  • Document repositories stored on a file system.
  • Simple sets of XML files.
  • Structured sets of XML description and content
    manifestations.

41
Information Space Organization
  • The Scholnet Information Space is organized in
    collections.
  • A collection is a virtual aggregation of
    documents, selected by a collection administrator
    user, identified by a name and a description.
  • Every collection must supply the information
    necessary to manage this virtual aggregation of
    documents. This information is used by the other
    services to handle the objects in the
    collection.

42
Information Space Organization
  • The Scholnet Collection Service GUI helps the
    user to
  • Create a new collection
  • Refine an existing collection
  • In all cases the following steps are to be
    performed
  • Selection of the publishing institution
  • Selection of the query language
  • Specification of the condition
  • Specification of the descriptive collection
    metadata (name, description, subject).
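
The steps above can be read as the definition of a virtual collection: descriptive metadata plus a membership condition evaluated over the documents of the selected publishers. In the sketch below a Python predicate stands in for a condition written in the selected query language. All names and records are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Record = Dict[str, str]


@dataclass
class Collection:
    name: str
    description: str
    subject: str
    publishers: List[str]                # selected publishing institutions
    condition: Callable[[Record], bool]  # membership condition

    def members(self, records: List[Record]) -> List[Record]:
        """Evaluate the collection on demand; no documents are copied."""
        return [r for r in records
                if r["publisher"] in self.publishers and self.condition(r)]


records = [
    {"handle": "example/1", "publisher": "Publisher A",
     "subject": "digital libraries"},
    {"handle": "example/2", "publisher": "Publisher A",
     "subject": "networks"},
    {"handle": "example/3", "publisher": "Publisher B",
     "subject": "digital libraries"},
]

dl_reports = Collection(
    name="DL reports",
    description="Reports on digital libraries",
    subject="digital libraries",
    publishers=["Publisher A"],
    condition=lambda r: "digital" in r["subject"],
)

print([r["handle"] for r in dl_reports.members(records)])
```

Because membership is computed from the condition, a document published later by a selected publisher joins the collection without any further action.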

43
User Rights Management
  • The Digital Library administrator is established
    at service start-up time and can assign special
    rights to the registered users in order to give
    them the appropriate administration rights on
    those services of the architecture that need an
    administrator.
  • In the current version of the Scholnet system,
    these services are the Registry, the Repository,
    the Library Management, and the Collection
    Service.

44
User Rights Management
  • A Registry Administrator can assign and remove
    user rights. The DL Administrator is a Registry
    Administrator.
  • A Repository Administrator can assign document
    submission rights for his/her publisher(s) to any
    registered user.
  • Such users become submitters and can also edit
    or delete documents.

45
User Rights Management
  • A Library Management Administrator can approve or
    reject any submission, edit, or withdrawal request
    submitted by a submitter to his/her publisher.
  • A Collection Service Administrator can create
    collections, and edit or delete his/her own
    collections.
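
The rights described on these slides can be summarised as a role-to-action table. The sketch below is a simplification: the role and action names are assumptions, and per-publisher and per-collection scoping is left out.

```python
from enum import Enum, auto


class Role(Enum):
    REGISTERED = auto()
    SUBMITTER = auto()
    REGISTRY_ADMIN = auto()
    REPOSITORY_ADMIN = auto()
    LIBRARY_MANAGEMENT_ADMIN = auto()
    COLLECTION_ADMIN = auto()


# Actions per role, paraphrased from the slides; action names are assumptions.
PERMISSIONS = {
    Role.REGISTERED: set(),
    Role.SUBMITTER: {"submit", "edit", "delete"},
    Role.REGISTRY_ADMIN: {"assign_rights", "remove_rights"},
    Role.REPOSITORY_ADMIN: {"grant_submission"},
    Role.LIBRARY_MANAGEMENT_ADMIN: {"approve_request", "reject_request"},
    Role.COLLECTION_ADMIN: {"create_collection", "edit_own_collection",
                            "delete_own_collection"},
}


def can(roles, action):
    """True if any of the user's roles allows the action."""
    return any(action in PERMISSIONS[role] for role in roles)


print(can({Role.SUBMITTER}, "submit"))
print(can({Role.REGISTERED}, "submit"))
print(can({Role.REGISTERED, Role.COLLECTION_ADMIN}, "create_collection"))
```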

46
User Rights Management
  • USER: Generic, Registered, Submitter
  • ADMINISTRATOR: Registry, Repository, Library
    Management, Collection
47
Scholnet Infrastructure Data
  • Documents
  • Repository Service
  • No. of documents managed > 100 thousand
  • No. of files stored > ½ million

48
Scholnet Infrastructure Metadata
  • Documents