Title: An Interim Report from DAWG
1. An Interim Report from DAWG
- Digital Architecture and Infrastructure Working Group
- Chartered by Grace Agnew to:
  - Develop policies and procedures to support an integrated, secure, and effective common infrastructure
  - Develop a digital library infrastructure to support an integrated, sustainable digital library initiative
- Goals include:
  - Provide sustainability of the digital content and technology platform
  - Support the RUL Data Architecture
  - Apply new interoperability protocols
  - Support state-wide initiatives
2. DAWG Team
- Anne Butman
- Tom Frusciano
- Judy Gardner
- Michael Giarlo
- Nick Gonzaga
- Dave Hoover
- Patrick Huey
- Ron Jantz (chair)
- Sam McDonald
- Ann Montanaro
- Lynn Mullins
- Robert Nahory
- Jeffery Triggs
- Karen Wenk
- Yang Yu
3. Challenges in Digital Libraries
- Integration across diverse digital collections
- Scale to millions of objects
- Flexibility to handle many digital formats
- Ability to customize by adding special tools and services
- Preservation of digital objects
- Sustainability and interoperability
4. Initial Focus of DAWG
- Infrastructure
  - Evaluating and selecting a large mass storage system to accommodate millions of digital objects
- Architecture
  - Developing the architecture and prototype for an RU digital library network
5. Concepts and Terminology
- RU Digital Library Network (DLN)
  - A system of people, standards, and software/hardware that provides access to, management of, and preservation of digital repositories of interest to RU.
- RUL Digital Library Repository (DLR)
  - A repository designed and managed by RUL to contain and provide access to digital resources created by RU and RUL. The DLR is part of the DLN.
- Digital Object Architecture: support of complex objects
  - Multiple manifestations, e.g. a book represented as images, text, and digital sound
  - Multiple formats, e.g. a map represented as TIFF, DjVu, and MrSID
  - Multiple behaviors, e.g. display at different resolutions, rotate a 3D object, etc.
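
The complex-object idea above (one object, several manifestations, formats, and behaviors) can be sketched as a small data model. This is only an illustration: the class names, field names, and the identifier are hypothetical and are not taken from the DAWG architecture.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Manifestation:
    """One way of representing the object, e.g. page images or a transcription."""
    kind: str                                          # e.g. "image", "text", "sound"
    formats: List[str] = field(default_factory=list)   # e.g. ["tiff", "djvu"]


@dataclass
class DigitalObject:
    """A complex object: several manifestations plus named behaviors."""
    identifier: str
    manifestations: List[Manifestation] = field(default_factory=list)
    behaviors: List[str] = field(default_factory=list)  # e.g. "rotate"

    def formats_for(self, kind: str) -> List[str]:
        """Return every format available for one kind of manifestation."""
        return [f for m in self.manifestations if m.kind == kind for f in m.formats]


# A map held in three image formats, with one display behavior.
historic_map = DigitalObject(
    identifier="example-map-1",  # hypothetical identifier
    manifestations=[Manifestation("image", ["tiff", "djvu", "mrsid"])],
    behaviors=["display-at-resolution"],
)
print(historic_map.formats_for("image"))
```

Keeping formats under a manifestation, rather than directly on the object, lets a book carry page images and a text transcription side by side without mixing their files.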
6. Architecture Design Philosophy
- Design principles: interoperability, sustainability, and extensibility.
- Informed by the Open Archival Information System (OAIS) Reference Model.
- Designed to contain the output of RU (both scholarly material and administrative data).
- Policy decisions will determine content and how distributed or centralized the repository will ultimately become.
- Will accommodate a virtual network of repositories, enabling access to existing metadata repositories (IRIS, Luna) as well as providing a framework for accessing and searching external metadata resources.
- The technological framework and content must be sustainable.
- All information resources, on submittal to the repository, should have, at a minimum, a core set of metadata that can be mapped to RU Core.
- The architecture is flexible (customizable) and extensible. For example, discipline-specific portals can be developed.
7. RU Digital Library Network - Features
- Large-scale, stable digital repository
- Searching across multiple repositories
- Searching and browsing using RU Core
- Flexible metadata support
- Access through portals by community, content, and format
- Easy-to-use submission process
- Digital preservation with persistent identifiers
- Flexible digital object architecture
- Access to existing digital collections
- Sustainability through open source, standards, and support of critical workflow processes
8. RU Digital Library Network - Possible Content
- Maps (e.g. digitized historic New Jersey maps)
- Historic documents
- Electronic journals
- 3D objects (e.g. glass art, Roman coins, scrapbooks)
- Multimedia objects (e.g. digital video)
- Special ebook collections
- Numeric data
- Preprints, learning objects from RU faculty
- Dissertations
- Operational and administrative RU reports
- Object-level access to existing digital collections (e.g. NJEDL)
- Searchable metadata collected through harvesting
9. RU Digital Library Network
[Diagram labels: Search and Browse Interface; Federated (Z39.50), tightly coupled; RU Digital Library Repository; IRIS; Harvested (OAI-PMH), loosely coupled; Other Nodes]
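
The loosely coupled, harvested path named on this slide can be sketched as follows. OAI-PMH requests are plain HTTP queries with a `verb` parameter, and `oai_dc` is the Dublin Core format every OAI-PMH repository must offer. The endpoint URL and the sample response below are made up for illustration, and the sketch ignores resumption tokens and error responses.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH ListRecords request URL for a repository endpoint."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def extract_titles(response_xml: str) -> list:
    """Pull every Dublin Core title out of a ListRecords response."""
    root = ET.fromstring(response_xml)
    return [el.text for el in root.iter(f"{{{DC_NS}}}title")]


# Hand-written sample response (not from a real repository).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Historic map of New Jersey</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("http://example.org/oai"))  # hypothetical endpoint
print(extract_titles(SAMPLE_RESPONSE))
```

Harvesting copies metadata into a local index ahead of time, so a search does not depend on the remote node being reachable when the query runs. A federated Z39.50 search, by contrast, queries each target live.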
10. Cross-Repository Searching: An Early DLN Prototype
11. Digital Object Structure: Three Types
[Diagram labels: METS Wrapper; Metadata; Pointer to External Digital Object; Harvested Metadata; Digital Objects]
12. Repository Architecture and Metadata
- METS (Metadata Encoding and Transmission Standard) will be used to encapsulate descriptive, preservation, structural, and behavior metadata.
- For interoperability, all metadata schemas must map to NJCore and Dublin Core.
- The architecture must support creation of simple (NJCore, Dublin Core) and complex (FGDC, MPEG-7, IEEE LOM, etc.) metadata.
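
To make the METS wrapper concrete, here is a minimal sketch that builds a METS document with a descriptive section, a file section pointing at external content, and a structural map. The element names (`dmdSec`, `mdWrap`, `fileSec`, `FLocat`, `structMap`) come from the METS schema. The IDs, title, and file URL are placeholders, and the result is not validated against the schema. Preservation and behavior metadata, which would go in `amdSec` and `behaviorSec`, are left out.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Register prefixes so serialized output uses mets:, xlink:, and dc:.
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)
ET.register_namespace("dc", DC_NS)


def build_mets(title: str, file_url: str) -> ET.Element:
    """Build a minimal METS wrapper around one title and one external file."""
    mets = ET.Element(f"{{{METS_NS}}}mets")

    # Descriptive metadata section wrapping a Dublin Core title.
    dmd = ET.SubElement(mets, f"{{{METS_NS}}}dmdSec", ID="dmd1")
    wrap = ET.SubElement(dmd, f"{{{METS_NS}}}mdWrap", MDTYPE="DC")
    xml_data = ET.SubElement(wrap, f"{{{METS_NS}}}xmlData")
    ET.SubElement(xml_data, f"{{{DC_NS}}}title").text = title

    # File section: a pointer to the content file by URL.
    file_sec = ET.SubElement(mets, f"{{{METS_NS}}}fileSec")
    group = ET.SubElement(file_sec, f"{{{METS_NS}}}fileGrp")
    file_el = ET.SubElement(group, f"{{{METS_NS}}}file", ID="file1")
    ET.SubElement(
        file_el,
        f"{{{METS_NS}}}FLocat",
        {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": file_url},
    )

    # Structural map tying the file and its description to one division.
    struct = ET.SubElement(mets, f"{{{METS_NS}}}structMap")
    div = ET.SubElement(struct, f"{{{METS_NS}}}div", DMDID="dmd1")
    ET.SubElement(div, f"{{{METS_NS}}}fptr", FILEID="file1")
    return mets


# Placeholder title and URL, for illustration only.
mets_doc = build_mets("Historic map of New Jersey", "http://example.org/map.tif")
print(ET.tostring(mets_doc, encoding="unicode"))
```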
13. Metadata and Dynamic Mapping: An Example
[Diagram labels: Input METS Wrapper; FGDC for maps; Preservation; Structure; Object]
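
One way to picture the mapping from a complex schema to a simple core is a crosswalk table. The sketch below maps a few FGDC short element names to Dublin Core elements. The chosen pairs are an illustrative assumption, not the NJCore or RU Core mapping, and the sample record is invented.

```python
# Illustrative crosswalk from FGDC short names to Dublin Core elements.
# These pairs are an assumption for the example, not an official mapping.
FGDC_TO_DC = {
    "title": "title",
    "origin": "creator",
    "pubdate": "date",
    "abstract": "description",
    "themekey": "subject",
}


def to_dublin_core(fgdc_record: dict) -> dict:
    """Map known FGDC fields to Dublin Core; unmapped fields are dropped."""
    dc = {}
    for field_name, value in fgdc_record.items():
        target = FGDC_TO_DC.get(field_name)
        if target is not None:
            # Dublin Core elements are repeatable, so collect values in lists.
            dc.setdefault(target, []).append(value)
    return dc


# Invented sample record; "bounding" has no Dublin Core target in this table.
sample = {"title": "Map of Newark", "origin": "Example Surveyor", "bounding": "-74.3 40.6"}
print(to_dublin_core(sample))
```

Fields without a core equivalent stay available in the original FGDC record, so mapping down to the core costs search precision but does not lose data.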
14. Open Source Digital Repositories
- DSpace: a digital library repository
  - DSpace is a specialized type of digital asset management or content management system: it manages and distributes digital items, made up of digital files (or bitstreams), and allows for the creation, indexing, and searching of associated metadata to locate and retrieve the items. It is designed to support the long-term preservation of the digital material stored in the repository. (http://dspace.rutgers.edu)
- Fedora: a digital object repository
  - Fedora is a foundation upon which interoperable web-based digital libraries can be built. Fedora consists of APIs (application program interfaces) for creating access and management applications.
15. Archival Storage and Preservation
- A physically separate archive is managed for preservation purposes.
- The archive is separate from the presentation form (website) and the daily backup.
- The intent of the archive is to capture all the required forms of the digital material in non-proprietary formats.
- Each digital object would have preservation metadata and a persistent ID.
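
As a small illustration of the persistent ID idea, a Handle has the form prefix/suffix and can be resolved through the public Handle proxy at hdl.handle.net. The prefix and suffix below are hypothetical; a real prefix would be assigned to the institution through the Handle System.

```python
from urllib.parse import quote

HANDLE_PROXY = "https://hdl.handle.net"  # public proxy resolver for handles


def handle_url(prefix: str, suffix: str) -> str:
    """Return the proxy URL that resolves a handle of the form prefix/suffix."""
    handle = f"{prefix}/{suffix}"
    # Percent-encode anything unsafe in a URL path, but keep the slash.
    return f"{HANDLE_PROXY}/{quote(handle, safe='/')}"


# Hypothetical prefix and suffix, for illustration only.
print(handle_url("12345", "map-001"))
```

Because citations carry the handle rather than a server path, the archive can move objects between storage systems and only the handle record needs updating.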
16. Mass Storage System - Requirements
- Initial capacity of 10 to 20 terabytes (TB)
- Extensible to 100 TB, 200 TB, and beyond
- Low management overhead
- Information must survive migrations across software and platforms
- History/audit trails required for each object
- Mirroring to a remote cluster (e.g. a cluster in NB and one in Newark) to provide offsite backup
- Global name space across all RUL locations
- Platforms required: Windows 2000, Unix, Linux
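
A rough back-of-the-envelope check relates these capacity figures to the "millions of objects" goal. The 20 MB average object size is purely an assumption for the arithmetic; the report does not state one, and real sizes vary widely by format.

```python
TB = 10**12  # decimal terabyte, in bytes


def objects_that_fit(capacity_tb: int, avg_object_bytes: int) -> int:
    """Number of whole objects that fit in the given capacity (one copy each)."""
    return (capacity_tb * TB) // avg_object_bytes


AVG_OBJECT_BYTES = 20 * 10**6  # assumed 20 MB average; not from the report

for capacity in (20, 200):
    print(capacity, "TB ->", objects_that_fit(capacity, AVG_OBJECT_BYTES), "objects")
```

Under that assumption, the 20 TB initial tier holds about one million objects and the 200 TB tier about ten million, before counting the mirrored copy on the remote cluster.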
18. Technologies and Standards
- Persistent ID: CNRI Handle System
- OAI-PMH: Open Archives Initiative Protocol for Metadata Harvesting
- METS: Metadata Encoding and Transmission Standard
- OpenURL
- SCORM
19. Detailed Requirements
- Ingest
- Administration
- Access
- Data Management
- Preservation
- Storage
- System Level
20. Progress To Date
- Infrastructure
  - Commercial product discussions and quote from EMC for mass storage. Also examining ADIC and IBM's Storage Tank (an open source product).
  - CamdenBase directories/permissions standardized for transfer to Systems.
  - Developed initial criteria for an RUL server registry.
- Architecture
  - Educating ourselves in various technologies: 1) OAI-PMH, 2) CNRI Handle System, 3) SCORM, 4) Z39.50/YAZ, 5) METS, 6) OpenURL
  - Draft for requirements and architecture
  - Cross-repository search prototype
  - Downloaded DSpace (from MIT) and started evaluation
  - UVa (Fedora) visit planned for early March
21. Next Steps
- Fedora: visit UVa for half-day tutorial
- Continue reviewing and select mass storage system
- Prepare interim communication package
- High-level architecture and requirements
- Cross-repository searching prototype
- Preliminary assessment of DSpace and Fedora
- Communicate and get feedback
- Begin more detailed evaluation of DSpace and Fedora
- Produce architecture/functional specification
- Develop prototype with sample content
22. Tasks and Timeline for DAWG-A
- January, 2003: Requirements/Architecture document
- February, 2003: Discussion and feedback with RUL and RU
- March - May, 2003: Evaluation of candidate systems (DSpace, Fedora, et al.)
- June - August, 2003: Select system and prototype sample content
- September - November, 2003: Prototype trial of multiple repositories
23. Tasks and Timeline for DAWG-I
- December, 2002: Determine requirements for mass storage system
- December - January, 2003: Transfer CamdenBase to Systems, test, and evaluate process
- December - January, 2003: Research and evaluate possible mass storage products
- March, 2003: Recommend mass storage solution
- March, 2003: Develop RUL server registry criteria