An Interim Report from DAWG

Provided by: SCC106
1
An Interim Report from DAWG
  • Digital Architecture and Infrastructure Working
    Group
  • Chartered by Grace Agnew to:
      • Develop policies and procedures to support an
        integrated, secure, and effective common
        infrastructure
      • Develop a digital library infrastructure to
        support an integrated, sustainable digital
        library initiative
  • Goals include:
      • Provide sustainability of the digital content and
        technology platform
      • Support the RUL Data Architecture
      • Apply new interoperability protocols
      • Support state-wide initiatives

2
DAWG Team
  • Anne Butman
  • Tom Frusciano
  • Judy Gardner
  • Michael Giarlo
  • Nick Gonzaga
  • Dave Hoover
  • Patrick Huey

  • Ron Jantz (chair)
  • Sam McDonald
  • Ann Montanaro
  • Lynn Mullins
  • Robert Nahory
  • Jeffery Triggs
  • Karen Wenk
  • Yang Yu

3
Challenges in Digital Libraries
  • Integration across diverse digital collections
  • Scale to millions of objects
  • Flexibility to handle many digital formats
  • Ability to customize by adding special tools and
    services
  • Preservation of digital objects
  • Sustainability and interoperability

4
Initial Focus of DAWG
  • Infrastructure
      • Evaluating and selecting a large mass storage
        system to accommodate millions of digital objects
  • Architecture
      • Developing the architecture and prototype for an
        RU digital library network

5
Concepts and Terminology
  • RU Digital Library Network (DLN)
      • A system of people, standards, and
        software/hardware that provides access to,
        management of, and preservation of digital
        repositories of interest to RU.
  • RUL Digital Library Repository (DLR)
      • A repository that is designed and managed by RUL
        to contain and provide access to digital
        resources created by RU and RUL. The DLR is
        part of the DLN.
  • Digital Object Architecture: support of complex
    objects
      • multiple manifestations, e.g. a book represented
        as images, text, and digital sound
      • multiple formats, e.g. a map represented as TIFF,
        DjVu, and MrSID
      • multiple behaviors, e.g. display at different
        resolutions, rotate a 3D object, etc.
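
A complex object of the kind described on this slide could be modeled roughly as follows. This is a minimal sketch, not the DAWG design: the class names, field names, identifier, and file paths are all hypothetical and chosen only to illustrate manifestations, formats, and behaviors.

```python
from dataclasses import dataclass, field


@dataclass
class Datastream:
    """One stored file belonging to a manifestation."""
    file_format: str  # e.g. "TIFF", "DjVu", "MrSID"
    location: str     # hypothetical storage path


@dataclass
class Manifestation:
    """One way of representing the object, e.g. page images or text."""
    kind: str
    datastreams: list[Datastream] = field(default_factory=list)


@dataclass
class DigitalObject:
    """A complex object: several manifestations plus named behaviors."""
    identifier: str
    manifestations: list[Manifestation] = field(default_factory=list)
    behaviors: list[str] = field(default_factory=list)

    def formats(self) -> set[str]:
        """Return every file format held across all manifestations."""
        return {
            ds.file_format
            for m in self.manifestations
            for ds in m.datastreams
        }


# Hypothetical example: one map held in three image formats.
historic_map = DigitalObject(
    identifier="example-map-1",
    manifestations=[
        Manifestation("image", [
            Datastream("TIFF", "maps/example-map-1.tif"),
            Datastream("DjVu", "maps/example-map-1.djvu"),
            Datastream("MrSID", "maps/example-map-1.sid"),
        ]),
    ],
    behaviors=["display_at_resolution"],
)
```

Keeping formats under a manifestation, rather than directly on the object, lets a book carry an image manifestation and a text manifestation side by side.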

6
Architecture Design Philosophy
  • Design Principles: Interoperability,
    Sustainability, and Extensibility
  • Informed by the Open Archival Information System
    (OAIS) Reference Model.
  • Designed to contain the output of RU (both
    scholarly material and administrative data).
  • Policy decisions will determine content and how
    distributed or centralized the repository will
    ultimately become.
  • Will accommodate a virtual network of repositories
    enabling access to existing metadata repositories
    (IRIS, Luna) as well as providing a framework for
    accessing and searching external metadata
    resources.
  • The technological framework and content must be
    sustainable.
  • All information resources, on submittal to the
    repository, should have, at a minimum, a core set
    of metadata that can be mapped to RU Core.
  • The architecture is flexible (customizable) and
    extensible. For example, discipline-specific
    portals can be developed.

7
RU Digital Library Network - Features
  • Large scale, stable, digital repository
  • Searching across multiple repositories
  • Searching and browsing using RU Core
  • Flexible metadata support
  • Access through portals by community, content, and
    format
  • Easy to use submission process
  • Digital preservation with persistent identifiers
  • Flexible, digital object architecture
  • Access to existing digital collections
  • Sustainability through open-source, standards,
    and support of critical workflow processes.

8
RU Digital Library Network Possible Content
  • Maps (e.g. digitized historic New Jersey Maps)
  • Historic documents
  • Electronic Journals
  • 3D objects (e.g. glass art, Roman coins,
    scrapbooks)
  • Multimedia objects (e.g. digital video)
  • Special ebook collections
  • Numeric data
  • Preprints, learning objects from RU faculty
  • Dissertations
  • Operational and Administrative RU Reports
  • Object level access to existing digital
    collections (e.g. NJEDL)
  • Searchable metadata collected through harvesting.

9
RU Digital Library Network
[Diagram labels: Search and Browse Interface; Federated (Z39.50, tightly coupled); RU Digital Library Repository; IRIS; Harvested (OAI-PMH, loosely coupled); Other Nodes]
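
The harvested (loosely coupled) path copies metadata from other nodes ahead of time, whereas the federated path queries each target live. A minimal sketch of how a harvester might form an OAI-PMH `ListRecords` request is shown here. The `verb`, `metadataPrefix`, `from`, and `set` parameters come from the OAI-PMH protocol; the function name and the endpoint URL in the usage note are hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode


def list_records_url(
    base_url: str,
    metadata_prefix: str = "oai_dc",
    from_date: Optional[str] = None,
    set_spec: Optional[str] = None,
) -> str:
    """Build an OAI-PMH ListRecords request URL for one repository."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        # Selective harvesting: only records changed since this date.
        params["from"] = from_date
    if set_spec:
        # Restrict the harvest to one set defined by the repository.
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"
```

For example, `list_records_url("https://repository.example.edu/oai", from_date="2003-01-01")` would request all Dublin Core records changed since the start of 2003 from that (made-up) endpoint.
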
10
Cross-Repository Searching: An Early DLN Prototype
11
Digital Object Structure: Three Types
[Diagram labels: METS Wrapper; Metadata; Ptr to External Digital Object; Harvested Metadata; Digital Objects]
12
Repository Architecture and Metadata
  • METS (Metadata Encoding and Transmission
    Standard) will be used to encapsulate
    descriptive, preservation, structural and
    behavior metadata.
  • For interoperability, all metadata schemas must
    map to NJCore and Dublin Core
  • The architecture must support creation of simple
    (NJCore, Dublin Core) and complex metadata (FGDC,
    MPEG-7, IEEE LOM, etc.)
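
To make the METS encapsulation concrete, here is a minimal sketch that wraps one Dublin Core title and one file pointer in a METS document using Python's standard library. The element and attribute names (`dmdSec`, `mdWrap`, `fileSec`, `FLocat`, `structMap`) are from the METS schema; the function name, IDs, and sample values are hypothetical, and the output has not been validated against the schema. Preservation and behavior metadata, which METS places in other sections, are omitted for brevity.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
DC_NS = "http://purl.org/dc/elements/1.1/"

ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)
ET.register_namespace("dc", DC_NS)


def build_mets(object_id: str, title: str, file_url: str, mimetype: str) -> ET.Element:
    """Return a small METS tree: descriptive metadata, one file, one structMap."""
    mets = ET.Element(f"{{{METS_NS}}}mets", {"OBJID": object_id})

    # Descriptive metadata section wrapping a Dublin Core title.
    dmd = ET.SubElement(mets, f"{{{METS_NS}}}dmdSec", {"ID": "DMD1"})
    wrap = ET.SubElement(dmd, f"{{{METS_NS}}}mdWrap", {"MDTYPE": "DC"})
    xml_data = ET.SubElement(wrap, f"{{{METS_NS}}}xmlData")
    ET.SubElement(xml_data, f"{{{DC_NS}}}title").text = title

    # File section: a pointer to the content file.
    file_sec = ET.SubElement(mets, f"{{{METS_NS}}}fileSec")
    group = ET.SubElement(file_sec, f"{{{METS_NS}}}fileGrp", {"USE": "master"})
    file_el = ET.SubElement(
        group, f"{{{METS_NS}}}file", {"ID": "FILE1", "MIMETYPE": mimetype}
    )
    ET.SubElement(
        file_el,
        f"{{{METS_NS}}}FLocat",
        {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": file_url},
    )

    # Structural map tying the metadata and the file together.
    struct_map = ET.SubElement(mets, f"{{{METS_NS}}}structMap")
    div = ET.SubElement(struct_map, f"{{{METS_NS}}}div", {"DMDID": "DMD1"})
    ET.SubElement(div, f"{{{METS_NS}}}fptr", {"FILEID": "FILE1"})
    return mets


doc = build_mets(
    "example-map-1",                          # hypothetical object ID
    "Map of New Jersey",                      # sample title
    "https://repository.example.edu/m1.tif",  # hypothetical file URL
    "image/tiff",
)
xml_text = ET.tostring(doc, encoding="unicode")
```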

13
Metadata and Dynamic Mapping: An Example
[Diagram labels: Input METS Wrapper; FGDC for maps; Preservation; Structure; Object]
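
One way to picture the mapping from a complex schema to a simple one is a crosswalk table applied at search or ingest time. The sketch below maps a few FGDC short element names to Dublin Core elements. The table is illustrative only and is not the mapping used by DAWG; the function name and the sample record values are hypothetical.

```python
# Illustrative crosswalk: FGDC short element name -> Dublin Core element.
FGDC_TO_DC = {
    "title": "title",
    "origin": "creator",
    "abstract": "description",
    "pubdate": "date",
    "themekey": "subject",
}


def to_dublin_core(record: dict[str, str], crosswalk: dict[str, str]) -> dict[str, str]:
    """Keep only the fields the crosswalk knows and rename them."""
    return {
        crosswalk[name]: value
        for name, value in record.items()
        if name in crosswalk
    }


# Sample FGDC-style record with made-up values.
fgdc_record = {
    "title": "Map of New Jersey",
    "origin": "Example Surveyor",
    "pubdate": "1850",
    "westbc": "-75.6",  # bounding coordinate: no simple DC equivalent here
}
dc_record = to_dublin_core(fgdc_record, FGDC_TO_DC)
```

Fields without a crosswalk entry, such as the bounding coordinate, stay in the full FGDC record and are simply absent from the simple view.
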
14
Open Source Digital Repositories
  • DSpace: A digital library repository
      • DSpace is a specialized type of digital asset
        management or content management system: it
        manages and distributes digital items, made up of
        digital files (or bitstreams), and allows for
        the creation, indexing, and searching of
        associated metadata to locate and retrieve the
        items. It is designed to support the long-term
        preservation of the digital material stored in
        the repository. (http://dspace.rutgers.edu)
  • Fedora: A digital object repository
      • Fedora is a foundation upon which interoperable
        web-based digital libraries can be built. Fedora
        consists of APIs (application programming
        interfaces) for creating access and management
        applications.

15
Archival Storage and Preservation
  • A physically separate archive is managed for
    preservation purposes.
  • The archive is separate from the presentation
    form (website) and the daily backup.
  • The intent of the archive is to capture all the
    required forms of the digital material in
    non-proprietary format.
  • Each digital object would have preservation
    metadata and a persistent ID.
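
A minimal sketch of what per-object preservation metadata with a persistent ID might hold is given here. It assumes a handle-style identifier of the form prefix/suffix and a SHA-256 checksum for fixity checking; the checksum is a common preservation technique rather than something stated in the slides, and the record fields, function names, and the prefix used in the usage note are hypothetical.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class PreservationRecord:
    """Minimal preservation metadata for one archived object."""
    persistent_id: str  # handle-style "prefix/suffix"
    sha256: str         # fixity value recorded at ingest
    size_bytes: int


def make_record(prefix: str, suffix: str, content: bytes) -> PreservationRecord:
    """Record the identifier, checksum, and size when the object is archived."""
    return PreservationRecord(
        persistent_id=f"{prefix}/{suffix}",
        sha256=hashlib.sha256(content).hexdigest(),
        size_bytes=len(content),
    )


def is_intact(record: PreservationRecord, content: bytes) -> bool:
    """Re-check the stored bytes against the values recorded at ingest."""
    return (
        len(content) == record.size_bytes
        and hashlib.sha256(content).hexdigest() == record.sha256
    )
```

A record created with `make_record("12345", "obj-1", data)` can be re-verified after each migration by calling `is_intact(record, data_read_back)`.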

16
Mass Storage System - Requirements
  • Initial capacity of 10 to 20 terabytes (TB)
  • Extensible to 100 TB, 200 TB, and beyond
  • Low management overhead
  • Information must survive migrations across
    software and platforms
  • History/audit trails required for each object
  • Mirroring to a remote cluster (e.g. a cluster in
    NB and one in Newark) to provide offsite backup.
  • Global name space across all RUL locations
  • Platforms required: Windows 2000, Unix, Linux
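
As a rough sizing aid for these requirements, the arithmetic below relates capacity, object count, and mirroring. It is a back-of-envelope sketch: the object counts are assumptions chosen for illustration, decimal units (1 TB = 10^12 bytes) are assumed, and the function names are hypothetical.

```python
BYTES_PER_TB = 10**12
BYTES_PER_MB = 10**6


def average_object_size_mb(capacity_tb: float, object_count: int) -> float:
    """Average size per object if the capacity were filled exactly."""
    return capacity_tb * BYTES_PER_TB / object_count / BYTES_PER_MB


def raw_capacity_tb(usable_tb: float, copies: int = 2) -> float:
    """Raw storage needed when every object is kept at `copies` sites."""
    return usable_tb * copies


# Assumed figure: one million objects in the 10 TB initial configuration.
initial_avg_mb = average_object_size_mb(10, 1_000_000)

# Mirroring 20 TB to one remote cluster means two copies in total.
mirrored_tb = raw_capacity_tb(20)
```

Under these assumptions, one million objects in 10 TB leaves about 10 MB per object, and mirroring a 20 TB store to a second cluster requires 40 TB of raw capacity across both sites.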

17
(No Transcript)
18
Technologies and Standards
  • Persistent ID: CNRI Handle System
  • OAI-PMH: Protocol for Metadata Harvesting
  • METS: Metadata Encoding and Transmission
    Standard
  • OpenURL
  • SCORM

19
Detailed Requirements
  • Ingest
  • Administration
  • Access
  • Data Management
  • Preservation
  • Storage
  • System Level

20
Progress To Date
  • Infrastructure
      • Commercial product discussions and quote from EMC
        for mass storage. Also examining ADIC and IBM's
        Storage Tank (an open source product).
      • CamdenBase directories/permissions standardized
        for transfer to Systems.
      • Developed initial criteria for an RUL server
        registry
  • Architecture
      • Educating ourselves in various technologies: 1)
        OAI-PMH, 2) CNRI Handle System, 3) SCORM, 4)
        Z39.50/YAZ, 5) METS, 6) OpenURL
      • Draft for requirements and architecture
      • Cross-repository search prototype
      • Downloaded DSpace (from MIT) - started evaluation
      • UVa (Fedora) visit planned for early March

21
Next Steps
  • Fedora: visit UVa for half-day tutorial
  • Continue reviewing and select mass storage system
  • Prepare interim communication package:
      • High-level architecture and requirements
      • Cross-repository searching prototype
      • Preliminary assessment of DSpace and Fedora
  • Communicate and get feedback
  • Begin more detailed evaluation of DSpace and
    Fedora
  • Produce architecture/functional specification
  • Develop prototype with sample content

22
Tasks and Timeline for DAWG-A
  • January, 2003: Requirements/Architecture
    document
  • February, 2003: Discussion, feedback with RUL
    and RU
  • March - May, 2003: Evaluation of candidate
    systems (DSpace, Fedora, et al.)
  • June - August, 2003: Select system and prototype
    sample content
  • September - November, 2003: Prototype-trial of
    multiple repositories

23
Tasks and Timeline for DAWG-I
  • December, 2002 - Determine requirements for mass
    storage system
  • December - January, 2003 - Transfer CamdenBase to
    Systems, test, and evaluate process
  • December - January, 2003 - Research and evaluate
    possible mass storage products
  • March, 2003 - Recommend mass storage solution
  • March, 2003 - Develop RUL server registry criteria