Title: An Evaluation and Implementation for capturing,
1 An Institutional Repository System
(An Evaluation and Implementation for capturing, describing and publishing digital works)
S. Shashi Nath, IKM, Jan-2004
2 Agenda
- Project Title and Goal
- Introduction
- What is DSpace
- A little History and Milestones
- What it Does
- What can we put in
- DSpace information model
- Features
- Architecture
- DSpace Federation
- Implementation
3 Project Title and Goal
- Title
- Implementing the DSpace system to capture, describe and publish digital works
- Goal
- To demonstrate the implementation of the latest version of the DSpace system to
- Capture and describe digital works using a submission workflow module
- Distribute digital works over the web through a search and retrieval system
- Provide long-term preservation of digital works
4 Introduction
- "Where is the wisdom we have lost in knowledge? Where is the knowledge we have lost in information?" - T.S. Eliot
- Institutional Repository: a container for an institution's scholarly output.
- Mission is to provide reliable, long-term access to managed digital resources to its designated community, now and in the future.
- Key Drivers
- Open Access Movement: Self-Archiving
- Open Source Software Development
- Open Archives Initiative: Standards for Open Access
- Focus on Long-Term Access and Preservation
5 About DSpace
- What is DSpace
- DSpace is a specialized digital asset management/Institutional Repository system developed by the MIT Libraries with support from the HP-MIT Alliance, designed to
- Provide a platform to build an Institutional Repository
- Support long-term preservation of digital material
- Allow creation, indexing and searching of metadata
- Enable institutions to capture and describe digital works using a custom submission workflow module
- Distribute an institution's digital works over the web through a search and retrieval system
- Be distributable as open source (available under the BSD open source license to other research institutions to run as-is, or to modify and extend as needed)
6 About DSpace
A little History
- March 2000: HP awarded $1.8 million to the MIT Libraries for an 18-month collaboration to build DSpace.
- HP Labs and MIT Libraries released the system worldwide on Nov. 4, 2002.
Who's working with DSpace
- Cambridge University
- Columbia University
- Cornell University
- MIT
- University of Ohio
- University of Rochester
- University of Toronto
- University of Washington
- and many other universities
Milestones
- Early adopters beta test DSpace: March - Sept., 2002
- DSpace adds new communities at MIT: September 30, 2002
- DSpace content becomes publicly accessible: September 30, 2002
- DSpace launch event: November 4, 2002
- DSpace source code released under Open Source BSD license: November 4, 2002
- Early federators begin to collaborate with DSpace: Fall, 2002
7 What it Does
- Captures
- Digital research material in any format
- Directly from creators (faculty)
- Large-scale, stable, managed long-term storage
- Describes
- Descriptive, technical, rights metadata
- Persistent identifiers
- Distributes
- Via WWW, with necessary access control
- Preserves
- Bitstream guaranteed
8 What can we put in
Possible DSpace Content
- Articles
- Preprints, e-prints
- Technical Reports
- Working Papers
- Conference Papers
- E-theses
- Audio/Video
- Datasets: statistical, geospatial etc.
- Images: visual, scientific
- Teaching material: lecture notes, visualizations, simulations
- Digitized library collections
9 DSpace information model
10 DSpace information model
- Communities
- Departments, Labs, Research Centers, Schools
- Collections (in communities)
- Distinct groupings of like items
- Items (in collections)
- Logical content objects
- Receive persistent identifier
- Bitstreams (in items)
- Individual files
- Receive preservation treatment
- Versioning: item versions can be
- All instances of a work in different formats
- E.g. the XML, PDF, and PostScript versions
- All editions of a work over time
- Metadata lists all available versions of items
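The containment hierarchy above (communities hold collections, collections hold items, items hold bitstreams) can be sketched as plain data classes. This is only an illustration of the model, not DSpace's actual Java classes; the class names, field names and the handle value "1234/1" are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Bitstream:
    """An individual file belonging to an item."""
    name: str
    mime_type: str


@dataclass
class Item:
    """A logical content object; it carries the persistent identifier."""
    title: str
    handle: str  # hypothetical identifier string, e.g. "1234/1"
    bitstreams: List[Bitstream] = field(default_factory=list)


@dataclass
class Collection:
    """A grouping of like items inside a community."""
    name: str
    items: List[Item] = field(default_factory=list)


@dataclass
class Community:
    """A department, lab, research center or school."""
    name: str
    collections: List[Collection] = field(default_factory=list)


def formats_of(item: Item) -> List[str]:
    """List the distinct formats in which an item's files are available."""
    return sorted({b.mime_type for b in item.bitstreams})


# One work held in two formats, as in the XML/PDF example above.
report = Item(
    title="Example technical report",
    handle="1234/1",
    bitstreams=[
        Bitstream("report.pdf", "application/pdf"),
        Bitstream("report.xml", "text/xml"),
    ],
)
lab = Community("Example Lab", [Collection("Technical Reports", [report])])
```

Calling `formats_of(report)` lists the formats held for that one item, which is the kind of information the metadata is said to expose for versions.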
11 Features
- User Interface
- Web based, for submission, end-users and system administrators
- Search and retrieval of items by browsing or searching the metadata
- Workflow
- Enables differing submission workflows for communities
- Models "e-people" who have "roles" in the workflow of a particular community in the context of a given collection
- Open Archives Initiative (OAI)
- OAI-PMH 2.0 compatible; uses the OCLC OAICat
- Persistent Identifiers (Handles)
- Implements CNRI handles as the persistent identifier associated with each item
- Access Control
- DSpace allows contributors to limit access to items in DSpace, at both the collection and the individual item level.
- Metadata Schema
- Utilises Qualified Dublin Core.
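As a rough illustration of what OAI-PMH 2.0 compatibility means for a harvester, the sketch below builds two request URLs using verbs defined by the protocol (`Identify`, `ListRecords`) and the mandatory `oai_dc` metadata prefix. The base URL is a placeholder assumed for the example; each repository publishes its own.

```python
from typing import Optional
from urllib.parse import urlencode

# Hypothetical base URL, assumed for illustration only.
BASE_URL = "http://repository.example.org/oai/request"


def identify_url(base_url: str) -> str:
    """Build an Identify request, which describes the repository itself."""
    return base_url + "?" + urlencode({"verb": "Identify"})


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc",
                     from_date: Optional[str] = None) -> str:
    """Build a ListRecords request, optionally limited by datestamp."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        params["from"] = from_date  # selective harvesting, e.g. "2004-01-01"
    return base_url + "?" + urlencode(params)
```

A harvester would fetch these URLs over HTTP and parse the XML responses; that part is left out here.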
12 Features
- Preservation
- "Bit preservation", where a digital file is carefully preserved exactly as it was created without the slightest change (Known and Unsupported Formats)
- "Functional preservation", where the digital file is kept usable as technology formats, media, and paradigms evolve (Supported Formats)
- Technology platform
- Designed to run on the UNIX platform; original code is in Java.
- Includes an RDBMS (PostgreSQL), a Web server and Java servlet engine (Apache and Tomcat), Jena (an RDF toolkit from HP Labs), OAICat from OCLC, Lucene 1.2 (index/search), and several other useful libraries.
- Search and Retrieval
- Items are described using a qualified version of the Dublin Core metadata schema. These descriptions are entered into a relational database, which is used by the search engine to retrieve items.
- Browsing through title, date and author indices
- Keyword searching
- Indexed by Search Engines
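One common way to check "bit preservation" is to record a checksum when a file is deposited and compare it later. The sketch below shows that idea only; the choice of SHA-256 and the function names are assumptions for the example and do not describe how DSpace itself stores or verifies checksums.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Return a hex digest that changes if any bit of the data changes."""
    return hashlib.sha256(data).hexdigest()


def is_unchanged(data: bytes, recorded: str) -> bool:
    """Compare the current content against the digest recorded at deposit."""
    return checksum(data) == recorded


# Record the digest at deposit time, then verify later copies against it.
original = b"contents of a deposited file"
recorded_digest = checksum(original)
```

A later audit would call `is_unchanged` on the stored bytes; a `False` result means the file no longer matches what was deposited.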
13 Architecture
14 Architecture
- The DSpace architecture is a straightforward three-layer architecture, including storage, business, and application layers, each with a documented API to allow for future customization and enhancement.
- The storage layer is implemented using the file system, as managed by PostgreSQL database tables.
- The business layer is where the DSpace-specific functionality resides, including the workflow, content management, administration, and search and browse modules. Each module has an API to allow DSpace adopters to replace or enhance that function as desired.
- The application layer covers the interfaces to the system: the web UI and batch loader, in particular, but also the OAI support and Handle server for resolving persistent identifiers to DSpace items.
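The layering described above can be pictured with a toy example in which each layer talks only to the one below it. All class and function names here are hypothetical stand-ins; the real DSpace layers are Java APIs backed by PostgreSQL and the file system.

```python
from typing import Dict, List


class StorageLayer:
    """Storage layer stand-in: keeps item metadata in memory."""

    def __init__(self) -> None:
        self._items: Dict[str, Dict[str, str]] = {}

    def save(self, handle: str, metadata: Dict[str, str]) -> None:
        self._items[handle] = dict(metadata)

    def all_items(self) -> Dict[str, Dict[str, str]]:
        return dict(self._items)


class BusinessLayer:
    """Business layer stand-in: search logic built on the storage API."""

    def __init__(self, storage: StorageLayer) -> None:
        self._storage = storage

    def search_titles(self, keyword: str) -> List[str]:
        keyword = keyword.lower()
        return sorted(
            handle
            for handle, md in self._storage.all_items().items()
            if keyword in md.get("title", "").lower()
        )


def application_layer_search(business: BusinessLayer, query: str) -> str:
    """Application layer stand-in: formats results for a user."""
    hits = business.search_titles(query)
    return "\n".join(hits) if hits else "No items found"
```

Because the application function depends only on `BusinessLayer`, the storage class could be swapped out without touching it, which is the point of giving each layer its own API.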
15 DSpace Federation
- The DSpace Federation includes minimally all the research institutions, libraries, and other cultural heritage institutions that are using the DSpace digital repository system.
- Members of the Federation share the following goals:
- Sharing in the development and maintenance of the DSpace source code.
- Developing a critical corpus of content that represents the intellectual output of the world's leading research institutions.
- Promoting the continued development of the DSpace service through the open source community.
- Promoting the interoperability of archival repositories.
- Ensuring the long-term preservation of scholarly work by complying with published standards and supporting national and international initiatives to develop standards in this domain.
16 Implementation
Installed Prerequisite Software
1. UNIX-like OS: installed RedHat Linux 9.0
2. Java SDK 1.3 or later: installed Java SDK 1.4.2
3. Ant 1.4 or later: installed Jakarta Ant 1.5.1 (a make tool for Java applications)
4. Tomcat 4.0: installed Tomcat 5.0 (a web server for Java servlets and JSP)
5. PostgreSQL 7.3 or later: installed 7.3.2 (an RDBMS)
6. Java libraries
i. activation.jar
ii. mail.jar
iii. servlet.jar
Installed DSpace
- Release 1.1.1 (released on 29-Aug-2003; 1.2 due in March 2004)
- Tweaked configuration of DSpace
17 Implementation
Installed Handle Server
- Installed the included Handle server
- Obtained Handle Prefix from CNRI, Handle System (1875)
- Tweaked configuration of DSpace for Handle Server
- Administered Handle Server and made configuration changes
- Resolved Global Handles successfully
Customized DSpace
- Changed several JSPs to change the look of the Repository
- Added custom content like links, IISc. License etc.
Ensured OAI Compliance
- Tweaked DSpace configuration for OAI compliance
- Registered and tested DSpace with OAI Repository Explorer
- Successfully harvested metadata via OAI Repository Explorer
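To show what "resolving global handles" refers to, the sketch below composes a handle from a prefix and a local suffix and turns it into a URL on the public Handle System proxy (hdl.handle.net). The prefix 1875 is the one quoted above; the suffix "42" and the function names are made up for the example.

```python
HANDLE_PROXY = "http://hdl.handle.net/"  # public Handle System proxy


def make_handle(prefix: str, suffix: str) -> str:
    """Join a naming-authority prefix and a local suffix into a handle."""
    if not prefix or not suffix or "/" in prefix:
        raise ValueError("prefix and suffix must be non-empty; prefix has no '/'")
    return f"{prefix}/{suffix}"


def handle_url(handle: str) -> str:
    """Return the proxy URL that redirects to the item's current location."""
    return HANDLE_PROXY + handle


# Hypothetical item suffix "42" under the prefix mentioned on the slide.
example_handle = make_handle("1875", "42")
```

The value of the indirection is that the URL built by `handle_url` can stay in citations even if the repository's own web address changes.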
18 Implementation
Ran DSpace via Secure Sockets Layer
- Installed Tomcat SSL (https protocol) successfully
Installed OAI Harvester Plugin
- Installed OAI Harvester Plugin for DSpace (from keplerDspace project)
- Tweaked configuration such as Data Provider details etc.
- Installed Simple API for XML (SAX) Parser
- Successfully harvested metadata
- Full text caching also possible without use of Handle Server
Working with DSpace
- Created Communities/Collections (with a view to test different file types)
- Tested Web UI for submission as well as Batch Import and Export for Items
- Explored and administered the full Admin. Interface (Groups, Authorizations, E-persons, E-Mail Notifications, Workflows, Metadata Registry, Bitstream Registry etc.)
- Tested Search, Advanced Search and Browse features
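For batch import, each item is prepared as a small directory of files plus a metadata file. The sketch below generates the two text files for one item, assuming the simple archive layout as I understand it (a `dublin_core.xml` file of `dcvalue` elements and a `contents` file naming the bitstreams); the exact layout should be checked against the documentation of the installed release. The function names and sample values are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def dublin_core_xml(values: List[Tuple[str, str, str]]) -> str:
    """Serialize (element, qualifier, text) triples as qualified Dublin Core.

    Use the qualifier "none" for an unqualified element.
    """
    root = ET.Element("dublin_core")
    for element, qualifier, text in values:
        dcvalue = ET.SubElement(
            root, "dcvalue", element=element, qualifier=qualifier
        )
        dcvalue.text = text
    return ET.tostring(root, encoding="unicode")


def contents_file(filenames: List[str]) -> str:
    """List the item's files, one name per line."""
    return "".join(name + "\n" for name in filenames)


# Sample metadata for one hypothetical item.
sample_xml = dublin_core_xml([
    ("title", "none", "Example report"),
    ("contributor", "author", "Doe, J."),
])
```

Writing `sample_xml` and the output of `contents_file` into an item directory alongside the listed files would give the importer one item to load.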
19 Wish-list
- Better and Comprehensive Documentation (Maybe Mine)
- Extensive Online Help (Really lacking)
- Binaries would make life and installation easier
- Stable OAI harvester and URL importing function
- METS Schema Adherence
- Browse by Subject
- Custom Sorting of Search Results (by relevance, by title etc.)
- Full Text Indexing
- Multifile Linking in Item Support
- Thumbnail Support for content
- Collection/Item in Multiple Community/Collection Support
- Sub-Communities Support
- Bitstream Level Handles and Metadata
20 Opinion
- In an institution like IISc. the workflow model suits perfectly, allowing stringent peer review.
- Robust Repository System. Champion in the making, for the Open Access Movement.
- Greater transparency in workflows and the submission process will lead to popular usage.
- Tough work ahead to promote it and open access
- IT'S NOT EASY, BUT SURELY NOT IMPOSSIBLE
21 Thanks for your patience. Any queries? Let's move on to the LIVE DEMO