Title: DSpace SRB Integration
1. DSpace / SRB Integration
- Luc Declerck and Chris Frymann
- University of California, San Diego Libraries
- CNI Fall Task Force Meeting
- Portland, Oregon
- December 6-7, 2004
2. This Presentation Will
- Provide a brief overview of DSpace and SRB
- Report on a project to integrate the two systems
3. Success of the Web
- Much of the value and success of the Web is a result of its
- Ease of use
- Enormous size
- Put simply, this has been achieved through
- A simple user interface
- Transparent access to distributed storage
4. Simple Interface
Transparent Access to Distributed Storage
[Diagram: Web Browser and Web Server, paralleled by DSpace and SRB]
5. Project Participants
- San Diego Supercomputer Center (SDSC)
- Member of the National Partnership for Advanced Computational Infrastructure (NPACI), an NSF-sponsored program
- MIT Libraries (MIT)
- UC San Diego Libraries (UCSD)
- National Archives and Records Administration
(NARA)
6. Main Project Goal
- Demonstrate that integration of DSpace with SRB
leads to improved functionality for both systems
7. DSpace
- Jointly developed by
- MIT Libraries
- Hewlett-Packard (HP)
8. DSpace Is Familiar As
- A simple, user-friendly front end providing
- Digital content ingestion
- Indexing, search and discovery
- Content management
- Dissemination services
9. Federation Services
[Diagram: SRB]
10. DSpace 2.0 Discussion and Planning
- AssetStore API
- AIP Model
- Modularity Mechanism
- UI Framework
- http://simile.mit.edu/dspace-wiki/DspaceTwo
11. DSpace Availability
- Available to any type of organization as
- Free, open-source software
- That can be customized and extended
- Download from
- http://sourceforge.net/projects/dspace/
12. SRB
- Storage Resource Broker
- Developed at San Diego Supercomputer Center
13. SRB
- Server software and programming interfaces
- Allows applications that store and retrieve files to treat a diverse collection of physical storage devices as a single logical resource
- Utilizes data grid and federation technologies
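To make "a single logical resource" concrete, here is a minimal Python sketch under stated assumptions: the `PhysicalDevice` and `LogicalResource` classes and the most-free-space placement rule are illustrative inventions, not SRB's implementation or API. Applications call `store` and `locate` with file names only; which device holds the bytes stays behind the abstraction.

```python
from dataclasses import dataclass, field


@dataclass
class PhysicalDevice:
    """One physical storage device, modelled only by name and capacity (GB)."""
    name: str
    capacity_gb: float
    used_gb: float = 0.0

    @property
    def free_gb(self) -> float:
        return self.capacity_gb - self.used_gb


@dataclass
class LogicalResource:
    """Presents several physical devices to applications as one resource (hypothetical)."""
    devices: list
    placement: dict = field(default_factory=dict)  # file name -> device name

    @property
    def capacity_gb(self) -> float:
        return sum(d.capacity_gb for d in self.devices)

    def store(self, file_name: str, size_gb: float) -> str:
        # Assumed placement policy: the device with the most free space
        # that can hold the whole file.
        candidates = [d for d in self.devices if d.free_gb >= size_gb]
        if not candidates:
            raise OSError(f"no device can hold {file_name} ({size_gb} GB)")
        target = max(candidates, key=lambda d: d.free_gb)
        target.used_gb += size_gb
        self.placement[file_name] = target.name
        return target.name

    def locate(self, file_name: str) -> str:
        # Applications use only the file name; the resource knows the device.
        return self.placement[file_name]


# Ten 200 GB drives in one box, as on the storage-server slide.
box = LogicalResource([PhysicalDevice(f"drive{i}", 200) for i in range(1, 11)])
print(box.capacity_gb)                     # 2000
print(box.store("slide-00001.tif", 0.05))  # drive1
```

A real broker would also weigh device type, cost, and replication rules; the single placement rule here only keeps the example short.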
14. [Diagram: Applications (DSpace, Fedora, CDL DPR, Other) sit on the SRB broker, which sits on storage]
15. Basic Storage Resource
[Image: 200 GB Disk Drive]
16. Storage Resource
[Diagram: Rackmount Storage Server holding 10 drives of 0.2 TB each, 2 Terabytes/box]
SRB lets us treat it as a single logical resource
17. Single Logical Resource: 12 TB
[Diagram: Rack of Storage Servers, Server 1 to Server 6]
18. Single Logical Resource: 50 TB
[Diagram: Room of Racks, four racks of 12 TB each]
19. 200 TB Single Logical Resource
[Diagram: Applications access the 200 TB resource through SRB]
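Slides 15 to 19 build their capacity figures by multiplication. Reading the counts off the slides (ten 0.2 TB drives per box, six servers per rack, four racks labelled in the room), the arithmetic is as follows. The four racks drawn give 48 TB, which the slide rounds to 50 TB; the slides do not break down the 200 TB total.

```latex
\begin{aligned}
\text{one box}  &= 10 \times 0.2\ \text{TB} = 2\ \text{TB} \\
\text{one rack} &= 6 \times 2\ \text{TB} = 12\ \text{TB} \\
\text{one room} &= 4 \times 12\ \text{TB} = 48\ \text{TB} \approx 50\ \text{TB}
\end{aligned}
```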
20. What SRB Does
- Connects, replicates, syncs, and archives heterogeneous resources in a logical manner, using abstraction mechanisms
- Provides a way to access files and computers based on their attributes rather than just their names or physical locations
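Attribute-based access can be pictured as a query against a metadata catalog instead of a path lookup. The sketch below assumes a flat in-memory catalog; `CatalogEntry`, `find`, and every name and location in it are hypothetical, and SRB's actual catalog (MCAT) query interface is not shown here.

```python
from dataclasses import dataclass


@dataclass
class CatalogEntry:
    logical_name: str        # the name applications see
    physical_location: str   # where the bytes actually are (hypothetical format)
    attributes: dict         # descriptive / system metadata


def find(catalog, **conditions):
    """Logical names of entries whose attributes match every condition."""
    return [
        e.logical_name
        for e in catalog
        if all(e.attributes.get(k) == v for k, v in conditions.items())
    ]


catalog = [
    CatalogEntry("/ucsd/slides/s0001.tif", "diskA:/vol3/2c1.tif",
                 {"collection": "slides", "format": "image/tiff"}),
    CatalogEntry("/ucsd/slides/s0002.jpg", "tape1:/arch/77.jpg",
                 {"collection": "slides", "format": "image/jpeg"}),
    CatalogEntry("/ucsd/newsreels/n01.mpg", "diskB:/vol1/n01.mpg",
                 {"collection": "newsreels", "format": "video/mpeg"}),
]
print(find(catalog, collection="slides", format="image/tiff"))
# ['/ucsd/slides/s0001.tif']
```

The caller never mentions a disk or tape; the physical location stays in the catalog and can change without affecting the query.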
21. SDSC Storage Resource Broker and Meta-data Catalog
[Diagram: SRB architecture. Access APIs and clients: InQ, MySRB, DSpace, Application, Linux I/O, OAI, WSDL, DLL / Python, Java, NT Browsers, GridFTP. SRB Server components: Federation, Consistency Management / Authorization-Authentication, Logical Name Space, Latency Management, Data Transport, Metadata Transport. Lower layers: Storage Abstraction, Catalog Abstraction, Drivers, HRM, and Databases (DB2, Oracle, Sybase, SQLServer).]
22. SRB Data Grid Abstractions
- Logical name space for files
- Global persistent identifier
- Storage repository abstraction
- Standard operations supported on storage systems
- Information repository abstraction
- Standard operations to manage collections in databases
- Access abstraction
- Standard interface to support alternate APIs
- Latency management mechanisms
- Aggregation, parallel I/O, replication, caching
- Security interoperability
- GSSAPI, inter-realm authentication,
collection-based authorization
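The storage repository abstraction means higher layers call the same small set of operations whatever the back end. A sketch of that idea, assuming a three-operation interface (`put`, `get`, `delete`) chosen for illustration; `StorageDriver` and both drivers are hypothetical, and SRB's real driver operations are more extensive and are not reproduced here.

```python
import os
import tempfile
from abc import ABC, abstractmethod


class StorageDriver(ABC):
    """Standard operations every storage system must support (hypothetical set)."""

    @abstractmethod
    def put(self, path: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, path: str) -> bytes: ...

    @abstractmethod
    def delete(self, path: str) -> None: ...


class MemoryDriver(StorageDriver):
    """Stands in for a non-file back end such as a database or archive."""

    def __init__(self):
        self._blobs = {}

    def put(self, path, data):
        self._blobs[path] = bytes(data)

    def get(self, path):
        return self._blobs[path]

    def delete(self, path):
        del self._blobs[path]


class FileSystemDriver(StorageDriver):
    """Maps the same operations onto a local directory tree."""

    def __init__(self, root: str):
        self.root = root

    def _full(self, path):
        return os.path.join(self.root, path.lstrip("/"))

    def put(self, path, data):
        full = self._full(path)
        os.makedirs(os.path.dirname(full), exist_ok=True)
        with open(full, "wb") as f:
            f.write(data)

    def get(self, path):
        with open(self._full(path), "rb") as f:
            return f.read()

    def delete(self, path):
        os.remove(self._full(path))


def copy(src: StorageDriver, dst: StorageDriver, path: str) -> None:
    # Written once against the abstraction; works for any pair of drivers.
    dst.put(path, src.get(path))


mem = MemoryDriver()
mem.put("/coll/a.txt", b"hello")
with tempfile.TemporaryDirectory() as root:
    fs = FileSystemDriver(root)
    copy(mem, fs, "/coll/a.txt")
    print(fs.get("/coll/a.txt"))  # b'hello'
```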
23. Data Grids and Federation
- Data grids provide the ability to name, organize, and manage data on distributed storage resources
- Federation provides a way to name, organize, and manage data on multiple data grids
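One way to picture federation is a single name space spanning several independently run grids. This sketch assumes each grid is identified by a zone name used as the first path component; `DataGrid`, `Federation`, and all names and locations are hypothetical, and the sketch does not model how SRB zones actually exchange catalog information.

```python
class DataGrid:
    """One data grid (zone) with its own catalog: logical name -> physical location."""

    def __init__(self, zone: str):
        self.zone = zone
        self.catalog = {}

    def register(self, logical_name: str, physical_location: str) -> None:
        self.catalog[logical_name] = physical_location


class Federation:
    """Names data across grids by prefixing each local name with its zone."""

    def __init__(self, *grids: DataGrid):
        self.grids = {g.zone: g for g in grids}

    def resolve(self, federated_name: str) -> str:
        # "/mit/theses/t42.pdf" -> zone "mit", local name "/theses/t42.pdf"
        zone, _, local = federated_name.lstrip("/").partition("/")
        return self.grids[zone].catalog["/" + local]

    def names(self) -> list:
        # One name space covering every member grid.
        return sorted(f"/{zone}{name}"
                      for zone, g in self.grids.items()
                      for name in g.catalog)


ucsd, mit = DataGrid("ucsd"), DataGrid("mit")
ucsd.register("/slides/s1.tif", "ucsd-disk3:/vault/ab/s1.tif")
mit.register("/theses/t42.pdf", "mit-store:/bits/42")
fed = Federation(ucsd, mit)
print(fed.resolve("/mit/theses/t42.pdf"))  # mit-store:/bits/42
print(fed.names())  # ['/mit/theses/t42.pdf', '/ucsd/slides/s1.tif']
```

Each grid keeps managing its own catalog; the federation layer only adds the naming that ties them together.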
24. Federated SRB Server Model
[Diagram labels: Peer-to-peer Brokering, Read Application, Logical Name or Attribute Condition, SRB servers, SRB agents, Server(s) Spawning, Parallel Data Access, Data Access, resources R1 and R2, MCAT]
MCAT functions:
1. Logical-to-physical mapping
2. Identification of replicas
3. Access and audit control
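The three MCAT functions listed above can be combined into one lookup path. This is a sketch only: `Mcat`, `Replica`, the per-object reader sets, and the "prefer a named resource, else the first online replica" rule are assumptions for illustration, not the MCAT schema or SRB's replica-selection logic.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Replica:
    resource: str   # physical resource, e.g. "R1"
    path: str       # location on that resource
    online: bool = True


@dataclass
class Mcat:
    replicas: dict = field(default_factory=dict)  # logical name -> [Replica]
    readers: dict = field(default_factory=dict)   # logical name -> {user, ...}
    audit: list = field(default_factory=list)     # (time, user, name, allowed)

    def open(self, user: str, logical_name: str,
             preferred: Optional[str] = None) -> Replica:
        # Access and audit control: check permission, log every attempt.
        allowed = user in self.readers.get(logical_name, set())
        self.audit.append((datetime.now(timezone.utc), user, logical_name, allowed))
        if not allowed:
            raise PermissionError(f"{user} may not read {logical_name}")
        # Logical-to-physical mapping plus replica identification:
        # keep only reachable replicas, prefer the requested resource.
        live = [r for r in self.replicas.get(logical_name, []) if r.online]
        if not live:
            raise FileNotFoundError(logical_name)
        return next((r for r in live if r.resource == preferred), live[0])


mcat = Mcat()
mcat.replicas["/ucsd/slides/s1.tif"] = [
    Replica("R1", "/vault1/s1.tif", online=False),
    Replica("R2", "/vault2/s1.tif"),
]
mcat.readers["/ucsd/slides/s1.tif"] = {"curator"}
print(mcat.open("curator", "/ucsd/slides/s1.tif").resource)  # R2
```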
25. SRB Availability
- SRB source distributed to academic and research institutions
- Commercial use access through UCSD Technology Transfer Office
- William Decker, WJDecker_at_ucsd.edu
- Commercial version from
- http://www.nirvanastorage.com
26. SRB Info Resources
- SRB Homepage
- http://www.npaci.edu/DICE/SRB/
27. The Project
28. Main Goal
- Extension of DSpace storage capability
- Use SRB as filestore for DSpace bitstreams
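Using SRB as the filestore works if DSpace's ingest and dissemination code reaches storage only through a narrow interface. The sketch assumes such an interface (`BitstreamStore`, with `store` and `retrieve`) and shows a local-directory back end beside a grid back end; the class names, the random hex internal IDs, and `FakeGridClient` (an in-memory stand-in for a real SRB client) are all hypothetical, not DSpace's or SRB's actual classes.

```python
import io
import os
import tempfile
import uuid
from typing import BinaryIO, Protocol


class BitstreamStore(Protocol):
    """What ingest and dissemination need from storage (hypothetical)."""
    def store(self, stream: BinaryIO) -> str: ...
    def retrieve(self, internal_id: str) -> bytes: ...


class LocalAssetStore:
    """Bitstreams as files in a local directory tree."""

    def __init__(self, root: str):
        self.root = root

    def _path(self, internal_id: str) -> str:
        return os.path.join(self.root, internal_id[:2], internal_id)

    def store(self, stream: BinaryIO) -> str:
        internal_id = uuid.uuid4().hex
        path = self._path(internal_id)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(stream.read())
        return internal_id

    def retrieve(self, internal_id: str) -> bytes:
        with open(self._path(internal_id), "rb") as f:
            return f.read()


class GridAssetStore:
    """Same contract, but each call becomes a data-grid access call."""

    def __init__(self, client, collection: str):
        self.client = client          # hypothetical: anything with put()/get()
        self.collection = collection  # hypothetical grid collection for bitstreams

    def store(self, stream: BinaryIO) -> str:
        internal_id = uuid.uuid4().hex
        self.client.put(f"{self.collection}/{internal_id}", stream.read())
        return internal_id

    def retrieve(self, internal_id: str) -> bytes:
        return self.client.get(f"{self.collection}/{internal_id}")


class FakeGridClient:
    """In-memory stand-in for a real SRB client, for trying the sketch."""

    def __init__(self):
        self.objects = {}

    def put(self, name, data):
        self.objects[name] = data

    def get(self, name):
        return self.objects[name]


def ingest(store: BitstreamStore, content: bytes) -> str:
    # Ingest code is identical whichever store is configured.
    return store.store(io.BytesIO(content))


with tempfile.TemporaryDirectory() as root:
    for backend in (LocalAssetStore(root), GridAssetStore(FakeGridClient(), "/dspace/bits")):
        bid = ingest(backend, b"slide bytes")
        assert backend.retrieve(bid) == b"slide bytes"
```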
29. Simple User Interface, Unlimited Storage
[Diagram: DSpace provides content ingestion, discovery, and dissemination; SRB provides a uniform interface to distributed, heterogeneous storage]
30. Implementation Steps
- Replace DSpace file system calls with SRB access calls
- Employ a METS-based Archival Information Package (AIP)
- Enable exchange of data and metadata between independent DSpace and SRB systems
- Validate authenticity of exchanged content
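Validating exchanged content typically comes down to fixity checking: the sender records a checksum for each file, and the receiver recomputes and compares. The sketch assumes MD5 (the algorithm DSpace records for its bitstreams; METS `file` elements can also carry `CHECKSUM` and `CHECKSUMTYPE` attributes) and a plain name-to-checksum manifest, which is an assumption about how the checksums travel. A matching checksum shows the bytes were not altered in transit; it is not a signature and does not by itself prove who produced them.

```python
import hashlib


def md5_of(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


def verify_transfer(manifest: dict, received: dict) -> dict:
    """Check content received from another system against the sender's checksums.

    manifest: file name -> MD5 recorded by the sending system
    received: file name -> bytes as stored by the receiving system
    """
    report = {}
    for name, expected in manifest.items():
        if name not in received:
            report[name] = "missing"
        elif md5_of(received[name]) != expected:
            report[name] = "mismatch"
        else:
            report[name] = "ok"
    for name in received.keys() - manifest.keys():
        report[name] = "unexpected"   # arrived but was never declared
    return report


sent = {"s1.tif": b"pixels", "s1.xml": b"<mets/>"}
manifest = {name: md5_of(data) for name, data in sent.items()}
received = {"s1.tif": b"pixels", "s1.xml": b"<mets />"}
print(verify_transfer(manifest, received))
# {'s1.tif': 'ok', 's1.xml': 'mismatch'}
```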
31. UCSD Libraries to Provide Test Collections
- Still Images
- Over 200,000 Digitized Slides
- Approximately 4 Terabytes
- Moving Images
- California movie newsreel footage
- Size of collection to be determined
32. Testing Will Explore
- Management of terabyte scale collections
- Automating aspects of archival workflow
- Integration of METS
- Automated verification and validation checking
33. Schedule and Deliverables
- Year 1: Develop Prototype
- Develop functional requirements
- Specify standard interfaces and METS profiles
- Prototype implementation of specified design
- Ingest data, evaluate functionality and
performance
34. Schedule and Deliverables
- Year 2: Implementations
- Federation with additional systems, possibly
- CDL
- OCLC
- Fedora
- Scalability testing
- Ingestion of more content types
35. Progress So Far
- Use Cases
- Data Model (AIP)
- Project timeline
- Data preparation
- Batch ingestion of files into SRB
- METS Profile development
- Code integration
- Single item ingest and retrieval
- Batch registration of existing SRB resources into
DSpace
36. Data Model
- Paired Content and Metadata Files
- Metadata encoded in standard METS profiles
- Stand-alone METS files used to describe arbitrary
levels of aggregation of lower level objects
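A minimal sketch of that data model using only Python's standard library: one METS document for a single content file, and a stand-alone METS document for an aggregation whose `div` elements point at lower-level METS files through `mptr`. The element names come from the METS schema, but the function names, IDs, and file names are hypothetical, and the project's own METS profiles (including their descriptive metadata sections) are not reproduced here.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS)
ET.register_namespace("xlink", XLINK)


def mets_tag(tag: str) -> str:
    return f"{{{METS}}}{tag}"


def item_mets(content_href: str, mimetype: str) -> ET.Element:
    """Item level: one content file, listed in fileSec and placed in structMap."""
    root = ET.Element(mets_tag("mets"))
    grp = ET.SubElement(ET.SubElement(root, mets_tag("fileSec")), mets_tag("fileGrp"))
    f = ET.SubElement(grp, mets_tag("file"), {"ID": "FILE1", "MIMETYPE": mimetype})
    ET.SubElement(f, mets_tag("FLocat"),
                  {"LOCTYPE": "URL", f"{{{XLINK}}}href": content_href})
    div = ET.SubElement(ET.SubElement(root, mets_tag("structMap")), mets_tag("div"))
    ET.SubElement(div, mets_tag("fptr"), {"FILEID": "FILE1"})
    return root


def aggregation_mets(label: str, member_hrefs: list) -> ET.Element:
    """Any higher level: a structMap whose divs point at lower-level METS files."""
    root = ET.Element(mets_tag("mets"), {"LABEL": label})
    top = ET.SubElement(ET.SubElement(root, mets_tag("structMap")),
                        mets_tag("div"), {"LABEL": label})
    for order, href in enumerate(member_hrefs, start=1):
        div = ET.SubElement(top, mets_tag("div"), {"ORDER": str(order)})
        ET.SubElement(div, mets_tag("mptr"),
                      {"LOCTYPE": "URL", f"{{{XLINK}}}href": href})
    return root


item = item_mets("s0001.tif", "image/tiff")
batch = aggregation_mets("Slides, batch 1", ["s0001.mets.xml", "s0002.mets.xml"])
print(ET.tostring(batch, encoding="unicode"))
```

Because an aggregation document can itself be listed as a member of another aggregation, the same function covers any number of levels.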
37. (No transcript)
38. DSpace/SRB Integration Paths
[Diagram labels: Single Item Ingest into DSpace/SRB; DSpace Batch Import Utility; Batch Registration into DSpace/SRB; Batch Ingestion into SRB; Future. Components: DSpace with its DB (content file metadata), SRB with its Mcat, QDC and METS metadata, content files, and the storage layer.]
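The batch paths into DSpace rely on its Batch Import Utility, which reads items in the Simple Archive Format: one directory per item holding the content files, a `contents` file, and a `dublin_core.xml` file of `dcvalue` elements. The sketch below builds the `dublin_core.xml` text from QDC-style keys such as `dc.date.created`; the input shape (pairs of SRB path and metadata dictionary), the `registration_batch` helper, and its `bitstream` field are hypothetical, and the contents-file syntax for registering a file that stays in SRB is not shown.

```python
import xml.etree.ElementTree as ET


def dublin_core_xml(qdc: dict) -> str:
    """Render {"dc.title": [...], "dc.date.created": [...]} as dublin_core.xml."""
    root = ET.Element("dublin_core")
    for key, values in qdc.items():
        _, element, *rest = key.split(".")      # "dc", element, optional qualifier
        qualifier = rest[0] if rest else "none"
        for value in values:
            dc = ET.SubElement(root, "dcvalue",
                               {"element": element, "qualifier": qualifier})
            dc.text = value                     # ElementTree escapes &, <, >
    return ET.tostring(root, encoding="unicode")


def registration_batch(srb_listing) -> dict:
    """One Simple Archive Format item per (srb_path, qdc) pair (hypothetical helper).

    "bitstream" only records where the file already lives in SRB; how that is
    written into the item's contents file for registration is not shown here.
    """
    return {
        f"item_{n:04d}": {"dublin_core.xml": dublin_core_xml(qdc), "bitstream": srb_path}
        for n, (srb_path, qdc) in enumerate(srb_listing)
    }


listing = [("/ucsd/slides/s0001.tif",
            {"dc.title": ["Bridge at dusk"], "dc.date.created": ["1972"]})]
print(registration_batch(listing)["item_0000"]["dublin_core.xml"])
```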
39. Federation
[Diagram: DSpace and SRB installations at UCSD, CDL, and MIT]
41. Peer-to-Peer Federation
- Occasional Interchange - for specified users
- Replicated Catalogs - entire state information replication
- Resource Interaction - data replication
- Replicated Data Zones - no user interactions between zones
- Master-Slave Zones - slaves replicate data from master zone
- Snow-Flake Zones - hierarchy of data replication zones
- User / Data Replica Zones - user access from remote to host zone
- Nomadic Zones - synchronize local zone to parent zone
- Free-floating myZone - synchronize without a parent zone
- Archival BackUp Zone - synchronize to an archive
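Master-Slave Zones, for example, amount to one-way synchronization. This sketch assumes each zone's holdings can be modelled as a dictionary from logical name to bytes and uses MD5 checksums to detect changed objects; the function names and sample data are hypothetical, and SRB's actual zone synchronization is not shown.

```python
import hashlib


def checksum(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


def sync_from_master(master: dict, slave: dict) -> list:
    """One pass of master-to-slave zone replication.

    master, slave: logical name -> bytes. The slave receives every object it
    lacks or holds a different version of; nothing flows back to the master,
    and objects that exist only on the slave are left untouched.
    """
    copied = []
    for name, data in sorted(master.items()):
        # Compare checksums; a real system would read them from each zone's
        # catalog rather than recomputing them from the bytes.
        if name not in slave or checksum(slave[name]) != checksum(data):
            slave[name] = data
            copied.append(name)
    return copied


master = {"/ucsd/a.tif": b"v2", "/ucsd/b.tif": b"v1"}
slave = {"/ucsd/a.tif": b"v1", "/ucsd/local-note.txt": b"x"}
print(sync_from_master(master, slave))  # ['/ucsd/a.tif', '/ucsd/b.tif']
print(sync_from_master(master, slave))  # []
```

Reversing the roles of the two arguments gives the Nomadic Zones direction (local zone pushed to its parent).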
42. Principal Peer-to-Peer Federation Approaches
43. Future Plans / Challenges
- Planning intersection with DSpace 2.0 evolution
- Extension of Preservation Architecture (AIP)
- Naming (implementation of ARKs)
- Exploring alternative modes of control
- DSpace
- SRB APIs
- Federated zone configuration
- Deeper integration of DSpace and SRB metadata databases
- Explore life-cycle management of integrated resources
- Explore relationship to CDL DPR
- Handling of compound documents
44. Conclusion: Expected Results / Benefits
- DSpace users
- Federated collection management through distributed grid technology
- Exchange of METS encoded collections
- SRB users
- User friendly ingest mechanism
- Extended life-cycle management
- Exchange of METS encoded collections
45. Simple User Interface
Simple Access to Unlimited Storage
[Diagram: DSpace and SRB]
46. Q&A
- Project website
- http://libnet.ucsd.edu/nara