Title: Open Access to Digital Libraries
1Open Access to Digital Libraries
- A second generation DL: Scholnet
- Pasquale Pagano
- pagano_at_iei.pi.cnr.it
- ISTI-CNR
2Scholnet 2nd generation DL
- Scholnet is a digital library infrastructure for
supporting communication and collaboration among
networked communities.
- In addition to the provision of standard digital
library capabilities for information acquisition,
description, archiving, access, cross-language
search and retrieval, and dissemination, it will
provide support for
- non-textual data types
- hypermedia annotation
- personalized information dissemination.
3Document Model and Type
- Without documents there would be no DLs!
- For efficiency, especially when handling
millions of documents and gigabytes of data,
optimised storage is crucial.
- These problems shift, and sometimes partially
disappear, when one considers the entire life and
social context of a document, or when one focuses
on other topics related to document management.
These may be
- Document Model
- Document Type
- Multilingual Documents
- Multimedia Documents
- Structured Documents
- Distributed Collections
4Digital Library Document Model
- A DL document model needs to have a number of
features that allow for storage of content in
multiple forms (e.g., text, image, video, audio)
and dissemination of that content in multiple
variations.
- The features that the document model must support
are
- Globally unique naming of documents using
Handles, a URN scheme with names that consist of
a unique naming authority and an identity string
that is unique within that authority
- Multiple versions
- Logical structuring of documents
DoMDL
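The naming scheme above can be illustrated with a minimal sketch. It assumes a handle is written as the naming authority, a slash, and the identity string; the handle value and the `parse_handle` helper are hypothetical and not part of Scholnet.

```python
from typing import NamedTuple


class Handle(NamedTuple):
    authority: str  # naming authority, unique across the whole system
    local_id: str   # identity string, unique only within that authority


def parse_handle(name: str) -> Handle:
    """Split 'authority/identity' at the first slash."""
    authority, sep, local_id = name.partition("/")
    if not sep or not authority or not local_id:
        raise ValueError(f"not a valid handle: {name!r}")
    return Handle(authority, local_id)


# Hypothetical handle, used only for illustration.
h = parse_handle("example.authority/doc-2001-42")
```

Splitting at the first slash only means the identity string may itself contain slashes, so each authority stays free to choose its own local naming.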
5Digital Library Document Model
- Logical structuring of documents
- Multiple types of descriptive metadata that can
be associated with the document itself or
sub-parts (e.g., individual chapters, pages,
etc.) of the document.
- Multiple views of a document, which are alternate
expressions or structural breakdowns of the
content encapsulated in the document. For
example, a document representing a musical work
may have an audio view, a textual view (the
lyrics), and video view (of someone performing
the music). Each of these views can have separate
logical structuring (and metadata).
- Hierarchical sub-structuring of each view, such
as the sections, chapters, and pages in a book.
- Multiple content-types (i.e., MIME types of
output streams), binders (i.e., mechanisms for
encapsulating multiple logical objects, such as
scanned images, into a single output stream), and
compression schemes.
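One way to picture the features listed above is as a small set of nested record types. The sketch below is an assumption made for illustration: the class and field names are hypothetical and do not reproduce the actual DoMDL definition.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Manifestation:
    mime_type: str         # content-type of the output stream
    compression: str = ""  # optional compression scheme


@dataclass
class View:
    name: str
    metadata: Dict[str, str] = field(default_factory=dict)
    manifestations: List[Manifestation] = field(default_factory=list)
    parts: List["View"] = field(default_factory=list)  # hierarchical sub-structure


@dataclass
class Version:
    label: str
    views: List[View] = field(default_factory=list)


@dataclass
class Document:
    handle: str
    metadata: Dict[str, str] = field(default_factory=dict)
    versions: List[Version] = field(default_factory=list)


def count_parts(view: View) -> int:
    """Count a view together with all of its nested sub-parts."""
    return 1 + sum(count_parts(p) for p in view.parts)


# The musical-work example: one version with audio, textual and video views.
lyrics = View("text", parts=[View("verse 1"), View("verse 2")])
song = Document(
    handle="example.authority/song-1",  # hypothetical handle
    versions=[Version("v1", views=[View("audio"), lyrics, View("video")])],
)
```

Because every view carries its own metadata, manifestations, and sub-parts, each view can be structured independently of the others.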
6Example 1
- Postscript, Adobe Acrobat, MSWord
- Postscript, Adobe Acrobat
- Postscript, MSWord
- Postscript, Adobe Acrobat
- ..
- Book
- Postscript, MSWord
- Chapter 1
- Paragraph 1
- Chapter 2
- Paragraph 1
- Paragraph 2
- ..
7Example 2
- Article
- English Version
- Postscript, Adobe Acrobat
- Chapter 1
-
- Chapter 2
- Italian Version
- Postscript
- Capitolo 1
- .
- Capitolo 2
8Example 3
- Article
- Text
- Postscript, Adobe Acrobat, MSWord
- Part 1 Introduction
- Postscript, Adobe Acrobat
- Part 2 Content
-
- Composite
- SMIL
- Video
- Avi, Mpeg
- Part 1 Introduction
- Mpeg
- Part 2 Content
- .
- Slide
- Slide1
- Gif
9Digital Library Document Type
- One social issue with documents relates to
culture and language. Whereas there are many
causes of the movement towards English as a basis
for global scientific and technical interchange,
DLs may actually lead to an increase in
availability of non-English content.
- At the foundation, there are issues of
- character encoding
- searching multilingual collections
Unicode provides a single coding scheme suitable
for all natural languages. It was originally
designed as a 16-bit code and has since been
extended to cover code points up to U+10FFFF. In
particular, UTF-8 (one of the UCS - Universal
Character Set - Transformation Formats) is a
variable-length multibyte encoding of Unicode
that allows Unicode to be used in a convenient and
backwards compatible way in environments that,
like Unix, were designed entirely around ASCII.
MULTILINGUAL
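The backwards compatibility of UTF-8 with ASCII can be checked directly. The short sketch below uses only Python's built-in string encoding; the sample words are arbitrary.

```python
ascii_text = "library"
accented = "università"  # ten characters, the last one is outside ASCII
clef = "\U0001D11E"      # musical G clef, a code point above U+FFFF

# Pure ASCII text has exactly the same bytes in ASCII and in UTF-8.
same_bytes = ascii_text.encode("utf-8") == ascii_text.encode("ascii")

# Non-ASCII characters take more than one byte each.
accented_bytes = accented.encode("utf-8")
clef_bytes = clef.encode("utf-8")

print(same_bytes, len(accented), len(accented_bytes), len(clef_bytes))
```

Tools that only understand ASCII can therefore still process the ASCII portion of UTF-8 text unchanged.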
10Digital Library Document Type
- Documents are made up of one or more streams,
often with a structure imposed (e.g., a raster
organisation of a pixel stream represents a
colour image).
- Multimedia documents' streams usually must be
synchronised in some way, therefore a special
standard for handling this needs to be adopted.
- At the foundation, there are issues of
- Critical storage organization
- Streaming access
MULTIMEDIA
11Digital Library Document Type
- While multimedia depends on the stream
abstraction, structured documents require both
the abstractions of streams and structures.
- DLs typically may include both hierarchical
documents and metadata describing them,
annotations, and links to other related
documents. Therefore special techniques for
searching on structure need to be adopted.
- An alternative approach shifts the burden of
handling structure in documents to the user, by
allowing multiple layers of filters and tools
able to hide the complexity of the document
structure.
STRUCTURED
12Digital Library Annotation
- Allows authorized users to create and share
annotations
- Integrates rating, comments, annotation and
reference linking features into the digital
library infrastructure
- Stores annotations on documents and makes them
available to authorized users
13Digital Library Annotation
Document 2
Document 1
14Digital Library Annotation
15Digital Library Annotation
16Digital Library Services
- The Scholnet Digital Library supports
- Information Acquisition and Description
- Virtual Information Space Presentation
- Cross-language Search and Retrieval
- Information Space Browsing
- Document Annotation
- Annotation Discovery
- Document Access
- Automatic Personalized Dissemination
17Digital Library Services
- The Scholnet Digital Library supports
- Document Archiving
- Document Publishing
- Information Space Organization
- User Rights Management
18Information Acquisition and Description
- Scholnet supports requests for the submission,
withdrawal, and replacement of documents
submitted by authorized users.
- Scholnet offers a Graphical User Interface (GUI)
to simplify the description and document
structure management phases.
- The GUI is generic with respect to the digital
library document space selected
- Metadata formats
- Structure of the document
-
19Virtual Information Space Presentation
- The Scholnet Information Space is organized in
collections.
- A collection is a virtual aggregation of
documents, selected by a collection administrator
user and identified by a name and a description.
- The appropriate set of services is presented to
the user to search, browse, or enrich any
collection.
20Virtual Information Space Presentation
- Collections make it possible to personalize the
organization of documents in order to satisfy the
needs of different communities.
21Cross-language Search and Retrieval
- Scholnet supports different kinds of search for
different user needs
- Simple Search
- Fielded Search
- Search Across
- To improve the quality of the results, Relevance
Feedback is used as a solution for query
modification.
- All search types return a structured response
containing a list in which each element provides
information on, and a link to, a document.
22Cross-language Search and Retrieval Simple Search
- The simple search allows the user to search the
documents in the selected collection(s) using the
specified metadata formats without restricting
his/her search criteria to specific fields.
- This method performs a full text search and
therefore neither numeric ranges nor controlled
vocabularies can be applied.
23Cross-language Search and Retrieval Fielded
Search
- The fielded search allows the user to restrict
his/her search criteria to specific fields of the
selected metadata formats, and to combine these
fields using the appropriate belief and not
belief operator.
- This method allows the use of date and numeric
ranges, as well as right types and controlled
vocabulary.
24Cross-language Search and Retrieval Search Across
- The user has to specify a value for an attribute
that supports the cross-querying capability, and
the characteristic of the subset of the results
that he/she wants to receive.
- For example, let us suppose that the subject
attribute supports the cross-querying capability
with respect to the language. This means that the
user can specify a subject in Italian and
retrieve all documents containing related
subjects, even if these documents are described
in a different language.
25Cross-language Search and Retrieval Search Across
- The simplest approach is to look up words or
phrases in dictionaries, and to use the
translated terms to search collections in
other languages.
- However, other more sophisticated approaches can
be used
- Controlled vocabulary searching using a
multilingual thesaurus
- Users can browse the version of the thesaurus in
their preferred language and extract relevant
terms to use in queries. The English version of
the thesaurus serves as the interlingua and
translation medium.
- Free text searching
- The user first launches a query in his/her
preferred language and indicates the most
relevant results retrieved. The system then
automatically extracts the most relevant terms
from these documents and uses them to query
documents in the rest of the collection.
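The simplest approach, dictionary lookup, can be sketched in a few lines. The tiny Italian-to-English dictionary and the `translate_query` helper below are hypothetical and only illustrate the idea; they are not the Scholnet implementation.

```python
from typing import Dict, List

# Hypothetical bilingual dictionary: one source word may map to several targets.
IT_EN: Dict[str, List[str]] = {
    "biblioteca": ["library"],
    "digitale": ["digital"],
    "ricerca": ["search", "research"],
}


def translate_query(query: str, dictionary: Dict[str, List[str]]) -> List[str]:
    """Replace each query word by all of its dictionary translations."""
    terms: List[str] = []
    for word in query.lower().split():
        # Words missing from the dictionary are kept unchanged.
        terms.extend(dictionary.get(word, [word]))
    return terms


translated = translate_query("Biblioteca digitale", IT_EN)
```

A word with several translations, such as "ricerca" here, contributes all of them to the translated query, which is one reason the more sophisticated approaches listed above may give better precision.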
26Cross-language Search and Retrieval Relevance
Feedback
- Relevance feedback is used as a solution for
query modification in order to improve the
quality of the results.
- The user has to mark the relevant documents
among those retrieved in response to a search.
The service uses this information to modify the
original query. The new query is performed and
the results are returned to the user.
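The feedback loop above can be sketched as a simple query expansion step. The slides do not say how Scholnet modifies the query, so the term-frequency expansion below is a stand-in chosen for illustration, and `expand_query` is a hypothetical name.

```python
from collections import Counter
from typing import List


def expand_query(query: List[str], relevant_docs: List[str],
                 extra_terms: int = 2) -> List[str]:
    """Add the most frequent new terms found in the documents marked relevant."""
    counts: Counter = Counter()
    for doc in relevant_docs:
        # Count only terms that are not already part of the query.
        counts.update(w for w in doc.lower().split() if w not in query)
    new_terms = [term for term, _ in counts.most_common(extra_terms)]
    return query + new_terms


marked_relevant = [
    "digital library annotation service",
    "annotation of digital documents",
    "annotation sharing",
]
new_query = expand_query(["digital"], marked_relevant, extra_terms=1)
```

A realistic version would at least filter stop words and weight terms, but the control flow is the same: take the user's judgments, derive a modified query, and run it again.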
27Cross-language Search and Retrieval
- The search methods return a list of documents in
which each element of the list contains the
fields specified in one of the available
result-set formats.
- The result-set formats are chosen by the
administrator and can be changed dynamically.
- Header
- handle of and link to a document
- Short
- handle of and link to a document
- title, author, publication date
- Long
- handle of and link to a document
- title, author, publication date
- subject, description
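The three result-set formats can be read as field lists applied to each matching record. The sketch below assumes records are plain dictionaries; the field keys, the sample record, and the `format_result` helper are hypothetical.

```python
from typing import Dict, List

# Field lists for the Header, Short and Long formats described above.
RESULT_SET_FORMATS: Dict[str, List[str]] = {
    "header": ["handle", "link"],
    "short": ["handle", "link", "title", "author", "publication_date"],
    "long": ["handle", "link", "title", "author", "publication_date",
             "subject", "description"],
}


def format_result(record: Dict[str, str], fmt: str) -> Dict[str, str]:
    """Keep only the fields that belong to the chosen result-set format."""
    return {f: record[f] for f in RESULT_SET_FORMATS[fmt] if f in record}


record = {
    "handle": "example.authority/doc-1",          # hypothetical values
    "link": "http://example.org/doc-1",
    "title": "A sample report",
    "author": "A. Author",
    "publication_date": "2001",
    "subject": "digital libraries",
    "description": "An illustrative record.",
}
header_view = format_result(record, "header")
```

Keeping the formats in a table rather than in code is one way an administrator could change them at run time.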
28Information Space Browsing
- The Browsing Service allows the user to browse
the content of a collection (a virtual set of
documents) using a specific field of the selected
metadata formats.
- It returns a structured response containing a
list in which each element contains the fields
specified in one of the available result-set
formats and a link to a document.
- The result-set formats are chosen by the
administrator and can be changed dynamically.
29Document Annotation
- Authorized users can annotate a document or its
parts, specifying descriptive information and
access rights to the annotation, plus information
about one or more of the following sections
- Evaluate
- Criticize
- Comment
- Is related with
30Annotation Discovery
- Annotations can be discovered
- Accessing the document to which they are related
- Searching in the annotation space for
- Author
- Description
- Comment
- Relation
- Evaluation
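Both discovery paths can be sketched over a small in-memory annotation store. The `Annotation` record, its field names, and the two helper functions are assumptions made for illustration; access rights are left out to keep the sketch short.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Annotation:
    document: str         # handle of the annotated document (or part)
    author: str
    description: str
    comment: str = ""
    relation: str = ""    # handle of a related document, if any
    evaluation: str = ""  # free-text rating


SEARCHABLE = ("author", "description", "comment", "relation", "evaluation")


def for_document(annotations: List[Annotation], handle: str) -> List[Annotation]:
    """Discovery path 1: annotations attached to a given document."""
    return [a for a in annotations if a.document == handle]


def search(annotations: List[Annotation], field_name: str, text: str) -> List[Annotation]:
    """Discovery path 2: case-insensitive search on one annotation field."""
    if field_name not in SEARCHABLE:
        raise ValueError(f"cannot search on {field_name!r}")
    needle = text.lower()
    return [a for a in annotations if needle in getattr(a, field_name).lower()]


store = [
    Annotation("example/doc-1", "Rossi", "Review", comment="Clear and useful"),
    Annotation("example/doc-2", "Bianchi", "Link", relation="example/doc-1"),
]
```

For example, `search(store, "relation", "doc-1")` finds the annotation that links the second document to the first.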
31Document Access
- A document is a digital object that
- may contain other digital objects
- is composed of entities
- An entity is the minimal logical component that
can be accessed by authorized users
- A description can be associated with each digital
object or entity
32Document Access
33Document Access
34Document Access
35Automatic Personalized Dissemination
- Automatic notification to the user when a new
document matching the user's information needs
(user profile) is published in the digital
library
- The user personalizes his/her profile with
topics of interest
- topic name
- keywords
- categories
- notification flag
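The profile fields listed above suggest a simple matching step at publication time. The sketch below assumes a topic matches when it shares a keyword or a category with the new document; this rule, the `Topic` record, and `matching_topics` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Topic:
    name: str
    keywords: Set[str]
    categories: Set[str] = field(default_factory=set)
    notify: bool = True  # notification flag


def matching_topics(topics: List[Topic], doc_words: Set[str],
                    doc_categories: Set[str]) -> List[str]:
    """Names of the topics that should trigger a notification."""
    return [
        t.name
        for t in topics
        if t.notify and (t.keywords & doc_words or t.categories & doc_categories)
    ]


profile = [
    Topic("annotations", {"annotation", "comment"}),
    Topic("multimedia", {"video"}, categories={"audio-visual"}),
    Topic("muted", {"annotation"}, notify=False),
]
hits = matching_topics(profile, {"hypermedia", "annotation"}, {"reports"})
```

Topics whose notification flag is switched off stay in the profile but never produce a notification.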
38Document Archiving
- Scholnet supports the archiving of millions of
documents geographically distributed over
thousands of different publishers
- Scholnet supports automatic format transcoding
in accordance with the choices of the publisher
and the document author
- Scholnet supports the automatic compression and
decompression of any manifestation in accordance
with the choices of the publisher and the
document author
39Document Publishing
- The Publisher Administrator can approve, review,
or reject any request submitted by a registered
user for the submission, withdrawal, or editing
of document metadata and document objects
- Incoming requests are stored in a temporary
area that is not publicly available.
40Document Publishing
- Scholnet is also able to import documents from
- Document repositories stored on a file system.
- Simple sets of XML files.
- Structured sets of XML description and content
manifestations.
41Information Space Organization
- The Scholnet Information Space is organized in
collections.
- A collection is a virtual aggregation of
documents, selected by a collection administrator
user and identified by a name and a description.
- Any collection must supply the information
necessary to manage this virtual aggregation of
documents. This information is used by the other
services in order to handle the objects in the
collection.
42Information Space Organization
- The Scholnet Collection Service GUI helps the
user to
- Create a new collection
- Refine an existing collection
- In all cases the following steps are to be
performed
- Selection of the publishing institution
- Selection of the query language
- Specification of the condition
- Specification of the descriptive collection
metadata (name, description, subject).
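Because a collection is virtual, it can be represented by its defining condition rather than by a stored list of documents. The sketch below assumes records are dictionaries and the condition is a Python predicate; it leaves out the query-language step, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Record = Dict[str, str]


@dataclass
class Collection:
    name: str
    description: str
    subject: str
    publisher: str                       # selected publishing institution
    condition: Callable[[Record], bool]  # membership condition

    def members(self, repository: List[Record]) -> List[Record]:
        """Evaluate the condition on demand; no documents are copied."""
        return [
            r for r in repository
            if r.get("publisher") == self.publisher and self.condition(r)
        ]


repository = [
    {"publisher": "inst-a", "title": "Report 1", "year": "2000"},
    {"publisher": "inst-a", "title": "Report 2", "year": "2001"},
    {"publisher": "inst-b", "title": "Report 3", "year": "2001"},
]
recent = Collection(
    name="Recent reports",
    description="Reports published in 2001",
    subject="reports",
    publisher="inst-a",
    condition=lambda r: r.get("year") == "2001",
)
```

Refining an existing collection then amounts to replacing its condition with a narrower one.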
43User Rights Management
- The Digital Library administrator is established
at service start-up time and can assign special
rights to the registered users in order to give
them the appropriate administration rights on
those services of the architecture that need an
administrator.
- In the current version of the Scholnet system,
these services are the Registry, the Repository,
the Library Management, and the Collection
Service.
44User Rights Management
- A Registry Administrator can assign and remove
user rights. The DL Administrator is a Registry
Administrator.
- A Repository Administrator can assign document
submission rights in his/her publisher(s) to any
registered user.
- Such users become submitters and can also edit
or delete documents.
45User Rights Management
- A Library Management Administrator can approve or
reject any submission, edit, or withdrawal request
submitted by a submitter to his/her publisher.
- A Collection Service Administrator can create
collections, and edit or delete his/her own
collections.
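The administrator roles described in this section can be summarised as a role-to-action table. The role and action identifiers below are hypothetical labels chosen for the sketch, not names used by the Scholnet services.

```python
from typing import Dict, Set

PERMISSIONS: Dict[str, Set[str]] = {
    "registry_admin": {"assign_rights", "remove_rights"},
    "repository_admin": {"assign_submission_rights"},
    "library_management_admin": {"approve_request", "reject_request"},
    "collection_admin": {"create_collection", "edit_own_collection",
                         "delete_own_collection"},
    "submitter": {"submit_document", "edit_document", "delete_document"},
}


def can(roles: Set[str], action: str) -> bool:
    """True if at least one of the user's roles grants the action."""
    return any(action in PERMISSIONS.get(role, set()) for role in roles)


# The DL Administrator is a Registry Administrator.
dl_admin_roles = {"registry_admin"}
submitter_roles = {"submitter"}
```

A user may hold several roles at once, which is why `can` takes a set of roles rather than a single one.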
46User Rights Management
- USER
- Generic
- Registered
- Submitter
- ADMINISTRATOR
- Registry
- Repository
- Library Management
- Collection
47Scholnet Infrastructure Data
- Repository Service
- Number of documents managed > 100 thousand
- Number of files stored > ½ million
48Scholnet Infrastructure Metadata