Title: Open Access to Digital Libraries
1Open Access to Digital Libraries
- Donatella Castelli
- Pasquale Pagano
- Manuele Simi
- ISTI-CNR
- Pisa
2Outline
- 15 May 2003
- 09.30 What is a digital library?
- What is a second generation digital library?
- 11.00 Coffee break
- 11.30 A second generation DL: Scholnet
- Introduction to its functionality
- DoMDL document model
- Annotation model
- 13.00 Lunch
- 14.30 Scholnet demo
- 15.45 Break
- 16.00 Experimentation of Scholnet by group 1
- 17.00 End of Day 1
3Outline (cont.)
- 16 May 2003
- 09.30 Experimentation of Scholnet by group 2
- 10.15 OpenDLib demo
- 10.30 The Scholnet Architecture
- 11.00 Coffee break
- 11.30 Scholnet demo
- 12.00 How to set up a Scholnet DL
- 13.00 Lunch
- 14.30 Comparison with other DL systems
- 15.00 Discussion and Questionnaire
- 15.30 End of Day 2
4What is a DL? (traditional definition)
- "An institution which performs and/or supports (at least) the functions of a library in the context of distributed, networked collections of information objects in digital form"
- Nicholas Belkin
- Eighth DELOS Workshop
- Stockholm, 1998
5Digital objects
- Not only document descriptions (metadata)
- but also
- documents (texts, videos, sounds, 3D images, data, programs, maps, ...)
- Not only a catalogue.
- but also a repository
6DL basic services
- Acquisition
- Submission
- Repository
- Search/Browsing/Retrieval
- Dissemination
- User Interface
7A digital library is thus the analogue of
- A digital library
- A digital museum
- A digital archive
- A digital audio-video archive
-
- A data center
- ...
8The Origin
- The library systems
- The Web
9Library systems in the past
- from direct (on-line) communication ...
- ... to communication via the Web (HTTP protocol)
10Web Access to OPACs
- Generic users access only the search/retrieval service through the Web
- The software modules that implement the other services do not communicate through the Web
- [Diagram labels: Internet, HTTP protocol, Web interface, Search, Cataloguing Service, Loan Service, Catalogue]
11Web access to OPACs
- Each catalogue is accessible through its own user interface
- User interfaces differ in
- access points
- names of the access points
- language
- graphics
- ...
- Users must be familiar with many interfaces
- No cross-searches are possible
12An example of a user interface
13A solution: Z39.50
- Standard communication protocol for information search and retrieval
- It establishes the rules that regulate the communication between the clients and the servers (automated catalogues)
- [Diagram labels: Client, Translation, Protocol Messages, Translation, Server]
14Z39.50
- It allows the search and retrieval of bibliographic information from different distributed OPACs by issuing a single query through a common user interface
- [Diagram labels: Z39.50 Client with User Interface, Z39.50 Protocol, two Z39.50 Servers, OPAC-A, OPAC-B]
15Virtual Library
- Parallel access to selected Z39.50 OPACs
- Common user interface
- Web access with HTTP-to-Z39.50 mapping (and vice versa)
- Traditional bibliographic search based on OPACs
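The parallel-access idea on this slide can be sketched as follows. This is a minimal illustration and not Z39.50 itself: the `Catalogue` class, its in-memory titles, and `federated_search` are hypothetical stand-ins for real Z39.50 clients and servers.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class Catalogue:
    """Hypothetical stand-in for one OPAC behind a Z39.50 server."""
    name: str
    titles: list = field(default_factory=list)

    def search(self, term):
        # A real server would translate the protocol query into its native one.
        return [(self.name, t) for t in self.titles if term.lower() in t.lower()]


def federated_search(catalogues, term):
    """Send the same query to every catalogue in parallel and merge the hits."""
    with ThreadPoolExecutor() as pool:
        per_catalogue = list(pool.map(lambda c: c.search(term), catalogues))
    return [hit for hits in per_catalogue for hit in hits]


opac_a = Catalogue("OPAC-A", ["Digital Libraries", "Cataloguing Rules"])
opac_b = Catalogue("OPAC-B", ["Introduction to Digital Archives"])
results = federated_search([opac_a, opac_b], "digital")
```

The user issues one query and gets one merged list, while each catalogue keeps answering in its own way behind a common interface.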
16Use of Z39.50
- The use of Z39.50 is limited to catalogue resources
- No project implements the protocol functions for the management of electronic documents
17The Web
- 1990s: global access to information resources
- Different types of resources, stored on distributed Internet servers and accessible through the World Wide Web (WWW)
- The Web allows access to a specific resource by specifying its network address (URL)
18 Search engines
- Index the words in the Web pages
- Allow resource discovery (without explicitly knowing the address)
- Implement their own resource selection policies and their own indexing techniques
- Offer statistical/probabilistic search
- Return Web addresses
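Word indexing as described above can be sketched with a tiny inverted index. This is an illustrative assumption, not how any particular search engine works: it does plain Boolean matching rather than statistical ranking, and the page URLs and the function names `build_index` and `search` are made up for the example.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return set()
    hits = [index.get(word, set()) for word in words]
    return set.intersection(*hits)


# Hypothetical pages, used only to exercise the index.
pages = {
    "http://example.org/a": "Digital libraries and metadata",
    "http://example.org/b": "Library loan service",
}
index = build_index(pages)
```

Note that a query for "library" does not find the page that only says "libraries": word matching alone is blind to such variants, which is part of the noise problem discussed on the following slides.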
19Resource discovery on the Web
- "Finding relevant information on the World Wide Web has become increasingly problematic due to the explosive growth of networked resources. Current Web indexing evolved rapidly to fill the demand for resource discovery tools, but that indexing, while useful, is a poor substitute for richer varieties of resource description."
- Dublin Core Metadata Initiative <http://www.ietf.org/rfc/rfc2413.txt>
20The problem of noise in resource discovery
- Some form of cataloguing of the resources available on the Internet is needed in order to achieve a good balance between recall and precision
- Descriptive rules must be suitable for all types of information resources
Dublin Core Metadata Format
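Recall and precision, mentioned on this slide, have standard definitions: precision is the fraction of retrieved items that are relevant, and recall is the fraction of relevant items that were retrieved. A minimal sketch follows; the document identifiers are invented for illustration.

```python
def precision_recall(retrieved, relevant):
    """Compute (precision, recall) for one query from two collections of ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: 4 documents returned, 3 documents actually relevant.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d2", "d5"])
```

Here two of the four returned documents are relevant (precision 0.5) and two of the three relevant documents were found (recall about 0.67). Noise lowers precision; missed resources lower recall.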
21What is Dublin Core?
- Dublin Core metadata is used to supplement existing methods for searching and indexing Web-based metadata, regardless of whether the corresponding resource is an electronic document or a "real" physical object.
- Dublin Core metadata provides card catalog-like definitions for defining the properties of objects for Web-based resource discovery systems.
22What is the DC Metadata Element Set?
- It is a set of 16 descriptive semantic definitions. It represents a core set of elements likely to be useful across a broad range of vertical industries and disciplines of study
- Title
- Creator
- Subject
- Description
- Publisher
- Contributor
- Date
- Type
- Format
- Identifier
- Source
- Language
- Relation
- Coverage
- Rights
- Audience
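As a rough illustration of how the elements listed above are used, a description can be held as a simple mapping from element names to values. The lowercase keys, the `unknown_elements` helper and the sample values are assumptions made for this sketch; real Dublin Core records are normally serialised in formats such as XML or RDF.

```python
# The 16 element names listed on this slide, in lowercase.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher", "contributor",
    "date", "type", "format", "identifier", "source", "language",
    "relation", "coverage", "rights", "audience",
}


def unknown_elements(record):
    """Return the keys of a record that are not in the element list above."""
    return sorted(set(record) - DC_ELEMENTS)


# Illustrative record describing this course; every element is optional.
record = {
    "title": "Open Access to Digital Libraries",
    "creator": ["Donatella Castelli", "Pasquale Pagano", "Manuele Simi"],
    "date": "2003-05-15",
}
```

Every element is optional and repeatable, which is why `creator` holds a list here.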
23Who can benefit from using DC metadata?
- Dublin Core metadata is being used as the basis for descriptive systems by several interest groups such as
- educational organizations
- libraries
- government institutions
- scientific research sectors
- Web page authors
- businesses requiring more searchable sites
- corporations with vast knowledge management systems
24More Info about DC and metadata
- http://dublincore.org
- http://dublincore.org/usage/terms/dc/current-elements/
- Maria Bruna Baldacci, Rappresentazioni formalizzate, http://dlibcenter.iei.pi.cnr.it/it/index.html
25First generation DLs in the US
- 1994-1998: Digital Library Initiative, Phase I
- Funded by
- National Science Foundation (NSF)
- Department of Defense Advanced Research Projects Agency (DARPA)
- National Aeronautics and Space Administration (NASA)
- Objective
- "The focus is to dramatically advance the means to collect, store, and organize information in digital forms, and make it available for searching, retrieval, and processing via communication networks, all in user-friendly ways"
26First generation DLs in Europe
- 1996
- ERCIM Technical Reference Digital Library (ERCIM: European Research Consortium for Informatics and Mathematics)
27An example: the NCSTRL DL
- Networked Computer Science Technical Reports Library (NCSTRL)
- Focus
- Proving that a DL can be created as a federation of distributed services
- Result
- System operational on around one hundred distributed servers
28Distributed services
- Users access the system through a Web interface
- The services are distributed on the Internet. They communicate through an established protocol.
- [Diagram labels: Internet, Interface Service, Search Service, Browse Service, three Repository Services, Loan Service?]
29NCSTRL (cont.)
- Documents
- Computer Science Technical Reports published by more than one hundred research institutions
- Descriptive metadata format
- Author, title, abstract
- (provided by the author of the document)
- Services
- Search and browse on author, title and abstract
- Submission is carried out by the author by sending the document and its metadata to the NCSTRL administrator
30Another example: the Informedia DL
- Centralized audio-video DL system
- Focus
- Automatic content metadata extraction through the integration of various technologies
- Speech understanding for automatically derived transcripts
- Face, text and object recognition
- Key frame extraction and indexing
- Geocoding
- Topic assignment
31The Informedia DL
- Documents
- Audio-video resources (mainly news)
- Metadata
- Terms extracted from the transcript and from the image captions
- Locations
- Key frames
- Faces
- Names of the speakers
- Video abstract
-
32The Informedia DL
- Services
- Search based on
- Free text
- Image similarity
- Face and object similarity
- Geographical information
- Multiple presentation styles of query results
33Informedia DL underlying technologies
- Automatic indexing through
- Speech understanding for automatically derived transcripts
- Face, text and object recognition
- Key frame extraction and indexing
- Geocoding
- Topic assignment
- Automatic abstract generation
34The Informedia DL example 1
35The Informedia DL example 2
36From the first to the second generation DLs
- A DL is not only an instrument for a
- wider
- cheaper
- faster
- dissemination of information, but it can also be
- a means of supporting communication and collaboration between the members of a community of interest
37Second generation DLs
- A DL can offer more
- New types of digital objects
- New services
38New types of digital objects
- Multimedia
- Structured
- Annotated
- Multilingual
-
The new document types enrich the possible forms
of remote collaboration among the members of a
community of interest
39Multimedia documents
- Videos and slides of tutorials, seminars, lectures
- Training sessions
- Project presentations
- Demos
40Structured documents
41Annotated documents
- rating
- comment
- description
- link
- agreement
- disagreement
- explanation
-
- on the whole document or on its parts
- authored by different people
- public or restricted
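The properties listed above (a kind, a target that is the whole document or a part, an author, and a visibility) suggest a simple record structure. The sketch below is an assumption made for illustration and is not the Scholnet annotation model presented later in the course; in particular, "restricted" is simplified here to "visible to its author only".

```python
from dataclasses import dataclass
from typing import Optional

# Annotation kinds named on this slide.
KINDS = {"rating", "comment", "description", "link",
         "agreement", "disagreement", "explanation"}


@dataclass(frozen=True)
class Annotation:
    kind: str
    author: str
    body: str
    document_id: str
    part: Optional[str] = None  # None means the whole document
    public: bool = True

    def __post_init__(self):
        if self.kind not in KINDS:
            raise ValueError(f"unknown annotation kind: {self.kind}")


def visible_to(annotations, user):
    """Public annotations plus the user's own restricted ones (simplification)."""
    return [a for a in annotations if a.public or a.author == user]


notes = [
    Annotation("rating", "anna", "4/5", "doc1"),
    Annotation("comment", "bob", "check slide 3", "doc1", part="slide-3", public=False),
]
```

A real DL would replace the author-only rule with group-based access rights.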
42Multilingual documents
- Documents in different languages can be maintained in the same DL
- These documents can be accessed by querying in the language of the document and in any other supported language
43New Services (1)
- New document types impose a re-thinking of the traditional library services
- Submission
- Description
- Search
- Dissemination
-
44An example: the acquisition of video documents
It must be possible to structure the video into
meaningful parts (sequences, scenes, frames)
45The description of video documents
... and to describe the video and its parts separately
46Another example: the search
- Multiple search type options
- Free text search
- Fielded search
- Monolingual and cross-language search
- Similarity search
- Search using the document structure
- Search on annotations
- ...
47An example of a complex query
- All the seminars
- that contain a slide that
- is about XML
- and contain a video in which
- there is an image of Serge Abiteboul
- and that were created after December 2000
- and that have a good rating
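The query above combines conditions on document structure (slides, video), on extracted content (a recognised face), on metadata (creation date) and on annotations (rating). A minimal sketch of it as a filter follows; the `Seminar` fields, the sample data and the rating threshold of 4.0 for "good" are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Seminar:
    title: str
    created: date
    rating: float
    slide_topics: list = field(default_factory=list)
    people_in_video: list = field(default_factory=list)


def matches(seminar, min_rating=4.0):
    """True if the seminar satisfies every condition of the example query."""
    return (
        "XML" in seminar.slide_topics
        and "Serge Abiteboul" in seminar.people_in_video
        and seminar.created > date(2000, 12, 31)
        and seminar.rating >= min_rating
    )


# Hypothetical collection, used only to exercise the filter.
seminars = [
    Seminar("Querying XML", date(2001, 3, 1), 4.5, ["XML"], ["Serge Abiteboul"]),
    Seminar("Early XML talk", date(1999, 6, 1), 4.8, ["XML"], ["Serge Abiteboul"]),
]
selected = [s.title for s in seminars if matches(s)]
```

In a real DL each condition would be answered by a different index (text, image similarity, fielded metadata, annotations) and the results combined.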
48New services (2)
- New services (not necessarily document-centered), enabled by recently developed technologies, can be included in a DL to extend its potential uses
- Recommenders
- Co-operative work services
- Peer-review support services
- Authoring services
- E-learning tools
- ...
49An example: a collaborative environment
50An example: a collaborative environment
51An example: a recommender system
52A new research trend: from a DL ...
- DLs have been developed as ad-hoc systems
- These systems require a great investment in terms of manpower and technologies for the implementation of the software and its maintenance
- Skilled personnel are needed
- Few communities can build their own DL
53... to a digital library service system
- A digital library service system
- A flexible and open DL system that offers DL services. It can be customized according to the characteristics of the context where the DL must operate and to the needs of its users
- (like a DBMS)
54Example of customization dimensions
- Service specific parameters
- Metadata formats
- Document types
- Annotation model
- Controlled vocabulary
- Query language
- Formats of the results
55Customization is not enough
- Each community needs specific services in addition to the basic services
- (basic services: e.g. search and browse)
- For example
- A community of physicists may need a specific service to test the consistency of published results
- A worldwide medical community may need a translation service
56Openness is also required
- Open means that new services can be easily added (expandability)
- Each community of users can add its own specific services
- The use of a DL may raise new requirements: other services can be added over the DL lifetime (dynamic expandability) to cover emerging needs
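Dynamic expandability can be pictured as a registry to which services are added while the system is running. The sketch below is a toy under stated assumptions: the `DigitalLibrary` class, its methods and the two sample services are hypothetical and do not describe how Scholnet or OpenDLib implement openness.

```python
class DigitalLibrary:
    """Toy DL that accepts new services at run time."""

    def __init__(self):
        self._services = {}

    def register(self, name, service):
        """Add a service under a unique name; existing services are untouched."""
        if name in self._services:
            raise ValueError(f"service already registered: {name}")
        self._services[name] = service

    def call(self, name, *args):
        """Invoke a registered service by name."""
        return self._services[name](*args)

    def services(self):
        return sorted(self._services)


dl = DigitalLibrary()
# A basic service available from the start.
dl.register("search", lambda docs, term: [d for d in docs if term in d])
# A community-specific service added later, without changing the class above.
dl.register("translate", lambda text: f"[translated] {text}")
```

The point of the design is that adding `translate` requires no change to the code that provides `search`.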
57Another trend: exploiting existing content
- The production of the DL content is a very expensive process
- DLs can also be built by exploiting content stored by existing distributed heterogeneous sources
58The reference model
Each service operates on the data of multiple
archives
Data providers open their archives
59Data and service providers
- An existing source can act as a data provider and a service provider
- Advantages of this approach
- Third-party services that operate on existing data can be implemented
- A source can be accessed through advanced services built by others
60Interoperability solutions
- Several solutions are possible
- The services apply schema mappings
- Data providers implement a more or less complex protocol (e.g. OAI)
- Automatic mapping generation (a current research topic)
61An example: NSDL
- The National Science, Mathematics, Engineering and Technology Education Digital Library (NSDL)
- Over the next five years NSDL is expected to serve millions of users and provide access to tens of millions of digital resources
62NSDL core architecture
- [Diagram labels: Users, Portals, Search and Discovery Services, Metadata Repository, Direct entry, Gathering, OAI Harvest, Collections]
63Spectrum of interoperability
- To achieve widespread adoption, the cost of adoption must be low
- Few collections have metadata conforming to common and well-established standards, if they have metadata at all
- Sources do not necessarily implement a protocol that allows harvesting of resources
64The NSDL metadata strategy
- Collect (through a variety of ingest mechanisms) item metadata from cooperating collections in any of eight supported native formats
- When appropriate, crosswalk native metadata to Qualified Dublin Core, which will provide a lingua franca for interoperability
- When item-level metadata does not exist and where possible, process content and generate metadata automatically
- Accept that item-level metadata will not always exist. Concentrate limited human effort on the creation of collection-level metadata
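A crosswalk, as mentioned in the strategy above, is essentially a field-by-field mapping from a native schema to a common one. The sketch below assumes a made-up native format (`name`, `author`, `summary`, `year`) and maps it to simple Dublin Core element names; it is not one of the NSDL crosswalks and it ignores Dublin Core qualifiers.

```python
# Hypothetical native-field to Dublin Core element mapping.
CROSSWALK = {
    "name": "title",
    "author": "creator",
    "summary": "description",
    "year": "date",
}


def to_dublin_core(native, crosswalk=CROSSWALK):
    """Return (dc_record, unmapped) for one native metadata record."""
    dc, unmapped = {}, {}
    for key, value in native.items():
        target = crosswalk.get(key)
        if target is None:
            unmapped[key] = value  # kept aside rather than silently dropped
        else:
            dc.setdefault(target, []).append(value)
    return dc, unmapped


native_record = {"name": "Plate tectonics lesson", "author": "J. Doe", "shelf": "B12"}
dc_record, leftovers = to_dublin_core(native_record)
```

Fields without a counterpart in the target schema are returned separately, which makes the information loss of a crosswalk visible.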
65Mechanisms for metadata entry
- Metadata ingest via the Open Archives Initiative Protocol for Metadata Harvesting
- Metadata ingest via FTP, e-mail or web upload
- (XML-based text file, Excel spreadsheet, tab-delimited text file)
- Metadata ingest by direct entry
- (by authorised users)
- Metadata ingest by gathering
- (web crawling, automatic metadata generation)
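The first mechanism above, harvesting via the Open Archives Initiative protocol, works through plain HTTP requests whose arguments name a verb and a metadata format. The sketch below only builds such a request URL and does not contact any server; the base URL is a placeholder and the helper name `list_records_url` is an assumption, while `verb`, `metadataPrefix`, `from` and `set` are argument names defined by the protocol.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None, set_spec=None):
    """Build an OAI-PMH ListRecords request URL for a data provider."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date  # harvest only records changed since this date
    if set_spec:
        params["set"] = set_spec  # restrict the harvest to one set of the archive
    return f"{base_url}?{urlencode(params)}"


# Placeholder endpoint, not a real data provider.
url = list_records_url("http://example.org/oai", from_date="2003-05-01")
```

The `from` argument is what makes incremental harvesting cheap: a service provider asks only for what changed since its last visit.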
66User interface through portals
- The NSDL users will be very diverse, including students, instructors, the public at all levels, librarians, community interest groups, NSDL federated partners
- Access to the DL will be through portals (main portal, specialised portals, personalized portals)