Architecting Extensible Digital Repository Services

1
Architecting Extensible Digital Repository
Services
  • Robert Chavez, Robert Dockins, Anoop Kumar,
    Matthew Mcvey, Ranjani Saigal, Nikolai Schwertner
  • Tufts University, Medford, MA
  • Fedora Users Conference, Rutgers University,
    May 13, 2005

2
An Overview
  • Digital Collections at Tufts
  • Reasons for developing Tufts Digital Repository
    (TDR)
  • Some design requirements and goals
  • The TDR architecture and services
  • Applications that interface with TDR
  • Tufts Digital Library
  • VUE
  • Future Directions

3
A Brief History of Digital Collections at Tufts
  • Pre-existing Digital Projects/Libraries/Collections
  • Perseus Digital Library
  • Tufts University Science Knowledgebase
    (TUSK-Medicine)
  • Artifact Image Library (Art History)
  • Miscellaneous projects
  • Crime and Punishment, Faculty Publications,
    Faculty Datasets, many and varied content
    management systems
  • Digital Collections and Archives (DCA)
  • steward of the University's permanently valuable
    digital records and collections
  • many and varied digital collections
  • university records

4
Why TDR?
  • Digital collections and materials are continually
    growing, adding content in a variety of formats.
  • Original architectures and systems were not built
    to accommodate such expansion.
  • Original architectures and systems were not built
    to facilitate interoperability or sharing of
    resources.
  • Needed a university-wide digital repository that
    could manage the ever-increasing content while
    continuing to serve discipline-specific needs
    and leveraging existing and new tools and
    services.
  • Need for DCA to support digital data warehouse
    services and digital archival storage services
    for digital content of enduring value.

5
Who?
  • Digital Collections and Archives (DCA) and Academic
    Technology (AT) partnered to create a digital
    repository and digital library application for
    managing content while supporting teaching and
    learning at the university.
  • Roles (a bit over-simplified)
  • DCA: content developers, collection and deposit
    policy creators, managers of repository
  • AT: content developers, applications and overall
    system architects and developers

6
Design Requirements
  • Persistence
  • Enforce unique persistent identifiers
  • Manage identifiers for multiple projects
  • Assurance that the data will be preserved and
    retrievable over time
  • Ingest
  • Enforce archival standards
  • Ability to incorporate appraisal
  • Automated ingest workflow
  • Management
  • Use of information packages to facilitate storage
    and dissemination
  • Incorporate content models
  • Rights/access management
  • Access/Interoperability
  • Digital resources should be accessible to
    multiple applications and systems
  • Authorization policies must be enforced
  • Scalability
  • (Re)Usability
  • Leverage existing and new tools and services

7
TDR Architecture
  • (Diagram) Components: Interfacing Services, Caching
    Service, Naming Service, Application Interface,
    FedoraClient, Fedora Repository Service, Drop Box,
    Ingestion Service, Search Interface, Search Index,
    Search Service, Indexing Service
  • Legend: P = Data Provider, A = Administrator,
    U = User; arrows represent flow of data

8
Services of TDR
9
Current System Architecture
10
TDL Application
  • How it all fits together: a working application
  • http://dl.tufts.edu

11
General TDL application search transaction process
  • (Diagram) Stages in the order listed on the slide;
    U marks the user
  • TDL App: Search Interface (JSP)
  • Search Service: Oracle Query Builder (Java app.)
  • Search Index: Main Index (Oracle)
  • Search Service: Results Collation (Java app.)
  • TDL App: Search Results (Search Interface)
  • Search Index: XML Index (Oracle)
  • Naming Service: URN-PID resolution (MySQL)
  • Repository Service: Object Dissemination (Fedora)
  • TDL App: Disseminator Viewer (JSP)

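
The search transaction on this slide can be read as a chain of three lookups: the search index returns URNs, the naming service resolves each URN to a repository PID, and the repository service disseminates the object. The sketch below illustrates that chaining only. It is written in Python rather than the Java/JSP stack named on the slide, and the function name `run_search`, the stub data, and the identifiers in it are all hypothetical.

```python
def run_search(query, search_index, urn_to_pid, disseminate):
    """Chain the stages: index lookup -> URN-to-PID resolution -> dissemination."""
    results = []
    for urn in search_index(query):           # Search Service + Search Index
        pid = urn_to_pid(urn)                 # Naming Service (URN-PID resolution)
        results.append(disseminate(pid))      # Repository Service (dissemination)
    return results


# Stub stages standing in for Oracle, MySQL, and Fedora (all data is made up).
INDEX = {"coins": ["ns1:ns2:obj-1", "ns1:ns2:obj-2"]}
URN_TABLE = {"ns1:ns2:obj-1": "demo:1", "ns1:ns2:obj-2": "demo:2"}


def stub_index(query):
    return INDEX.get(query, [])


def stub_resolver(urn):
    return URN_TABLE[urn]


def stub_disseminator(pid):
    return f"<object pid='{pid}'/>"
```

Passing the stages in as functions keeps each service replaceable, which matches the modular intent of the architecture.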
12
TDL Architecture
  • Drop Box and Ingestion Service
  • Naming Service
  • Fedora Repository Service at Tufts
  • Indexing and Search Services
  • Interfacing Services

13
Drop Box and Ingestion Service
  • automate the process of preparing materials for
    ingest
  • validate materials before ingest
  • primarily for large-scale ingests
  • not an object factory (i.e., not a tool for
    building individual objects)
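
The "validate before ingest" step for a large batch might look roughly like the following. This is a minimal sketch, not the Tufts implementation: the `Submission` fields, the validation rules, and the function names are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Submission:
    """One item left in the drop box (hypothetical structure)."""
    urn: str
    content_model: str
    files: list = field(default_factory=list)


def validate_submission(item, known_models):
    """Return a list of problems; an empty list means the item may be ingested."""
    problems = []
    if not item.urn:
        problems.append("missing URN")
    if item.content_model not in known_models:
        problems.append(f"unknown content model: {item.content_model}")
    if not item.files:
        problems.append("no files attached")
    return problems


def partition_batch(batch, known_models):
    """Split a batch into ingestable items and rejected items with their reasons."""
    accepted, rejected = [], []
    for item in batch:
        problems = validate_submission(item, known_models)
        if problems:
            rejected.append((item, problems))
        else:
            accepted.append(item)
    return accepted, rejected
```

Collecting every problem per item, rather than stopping at the first, suits large-scale ingests where a provider needs one complete report per batch.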

15
TDL Architecture
  • Drop Box and Ingestion Service
  • Naming Service
  • Fedora Repository Service
  • Indexing and Search Services
  • Interfacing Services

16
Naming Service
  • Assigns, reserves and resolves URNs
  • The URN has a flexible structure that can be
    tailored to the needs of a particular naming
    convention.
  • Example: namespace1:namespace2:namespace3:object_id
  • Manages repositories
  • multiple production repositories, backup
    repositories, etc.
  • Tufts URN format examples
  • tufts:dca:central:MS10233.1345
  • Perseus:text:1999.04.0006
  • 97.5224.77-1729-47
  • URN Properties
  • Provides unique ID to objects deposited into
    repository
  • Service assures resolution to unique resource.
  • Implementation
  • MySQL, Java class, JSP Management console
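
The assign, reserve, and resolve operations can be illustrated with a small in-memory registry. This is only a sketch under stated assumptions: the slides name MySQL and a Java class as the real implementation, while the class below uses a Python dictionary, and its method names, the repository label, and the sample identifiers are hypothetical.

```python
class NamingService:
    """In-memory stand-in for the URN registry."""

    def __init__(self):
        # URN -> (repository name, PID), or None while the URN is only reserved
        self._records = {}

    def reserve(self, urn):
        """Reserve a URN so that no other project can claim it."""
        if urn in self._records:
            raise ValueError(f"URN already in use: {urn}")
        self._records[urn] = None

    def assign(self, urn, repository, pid):
        """Bind a URN to an object held in a specific repository."""
        if self._records.get(urn) is not None:
            raise ValueError(f"URN already assigned: {urn}")
        self._records[urn] = (repository, pid)

    def resolve(self, urn):
        """Return the (repository, PID) pair for a URN, or raise KeyError."""
        record = self._records.get(urn)
        if record is None:
            raise KeyError(f"URN not assigned: {urn}")
        return record
```

Storing the repository name alongside the PID is one way to support the "multiple production repositories, backup repositories" case: the same service can resolve a URN to whichever repository currently holds the object.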

17
Tufts Naming Service
18
TDL Architecture
  • Drop Box and Ingestion Service
  • Naming Service
  • Fedora Repository Service
  • Indexing and Search Service
  • Interfacing Services

19
Fedora Repository Service
  • Fedora met many of our critical needs
  • Modular nature of the repository service
  • Management of digital content over time
    (versioning, etc.)
  • Aggregation of mixed, possibly distributed, data
    into complex objects
  • The ability to specify multiple content
    disseminations of these objects
  • The ability to associate rights management
    schemes with these disseminations.

20
Fedora Repository Service, cont.
  • Tufts Implementation Details
  • External data stores
  • Modeling behaviors and content
  • Piece of a larger architecture, not an
    out-of-the-box solution
  • Tufts Repository Models/Policies
  • Fedora at Tufts serves several purposes
  • Archival/institutional repository
  • Guarantee functional preservation
  • Data warehouse
  • Guarantee bitstream preservation
  • Active Repository
  • Active workspace: constantly updated content
    (e.g., faculty data sets, faculty pubs, content
    mapping)

21
Behavior Definitions
  • Atomic units: sets of standardized behaviors
  • Building blocks of content models
  • Allow for flexible reuse of data
  • Contribute to inter-repository sharing of
    objects
  • Dissemination of standard output: XML, plain
    text, binary format
  • Rendering/processing of disseminations is the
    responsibility of applications implemented over
    the repository.

22
Content Models
  • Unique content models built from content modeling
    components.
  • Digital Objects that subscribe to a given content
    model inherit all methods established by a
    particular behavior.
  • Digital objects can subscribe to content models
    that suit their type or class.
  • Functional, not presentation-specific
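
The relationship between behavior definitions, content models, and digital objects can be sketched as simple composition. The code below is illustrative only and does not use Fedora's actual object formats: the class names, the behavior names, and the method names assigned to each behavior are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BehaviorDefinition:
    """An atomic unit: a named set of standardized method names."""
    name: str
    methods: frozenset


@dataclass(frozen=True)
class ContentModel:
    """A content model assembled from behavior definitions."""
    name: str
    behaviors: tuple

    def methods(self):
        """Union of the methods contributed by every behavior in the model."""
        combined = set()
        for behavior in self.behaviors:
            combined |= behavior.methods
        return combined


@dataclass
class DigitalObject:
    """An object that subscribes to exactly one content model."""
    pid: str
    model: ContentModel

    def supports(self, method):
        """True if the object's content model provides the method."""
        return method in self.model.methods()
```

Because an object only records which model it subscribes to, adding a behavior to a model makes the new methods available to every subscribed object without touching the objects themselves.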

23
Implementation Challenges
  • Processing large (>10 MB) XML Documents
  • XML databases
  • Processing large images
  • Imaging servers
  • Streaming Media
  • GIS data
  • Modeling Collections
  • Advanced Searching
  • Shopping cart searching
  • Caching Disseminations

24
TDL Architecture
  • Drop Box and Ingestion Service
  • Naming Service
  • Fedora Repository Service
  • Indexing and Search Service
  • Interfacing Services

25
Indexing and Search Service
  • Indexing
  • Digital objects piped through from ingestion
    service
  • Metadata index
  • Full-text index
  • Specialized XML index
  • Implementation
  • Java indexing application
  • Oracle database
  • Supported Types of Search
  • Basic full-text
  • Basic metadata
  • Advanced metadata
  • Accessing the service
  • HTTP GET/POST
  • SOAP
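
Access over HTTP GET could be sketched as building a query URL for each supported search type. The slides do not give the service's address or parameter names, so the endpoint and the `type`, `q`, and `field.*` parameters below are hypothetical placeholders.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; the slides do not give the search service's address.
SEARCH_ENDPOINT = "http://search.example.edu/tdr/search"


def build_search_url(search_type, terms, fields=None):
    """Build an HTTP GET URL for a search request (parameter names are assumed)."""
    params = {"type": search_type, "q": terms}
    # Advanced metadata search: one parameter per metadata field.
    for name, value in (fields or {}).items():
        params[f"field.{name}"] = value
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"
```

For example, `build_search_url("fulltext", "roman coins")` produces a URL whose query string is `type=fulltext&q=roman+coins`.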

26
TDL Architecture
  • Drop Box and Ingestion Service
  • Naming Service
  • Fedora Repository Service at Tufts
  • Indexing Service and Search Engine
  • Interfacing Services

27
Interfacing Services
  • An important design requirement for TDR was to
    allow current digital library applications to
    easily interface with TDR and provide access to
    the content in the digital repository within
    their own environments in a seamless fashion.
  • Current applications like VUE can interface with
    this service to allow their tools to disseminate
    the content that resides in TDL
  • The service is being designed not only to support
    current applications but also to accommodate the
    needs of future yet-to-be-defined applications
    like course management systems, learning tools,
    portals etc.

28
Fedora OKI Bridge
29
Applications Accessing TDR Content
  • Tufts Digital Library Application
  • http://dl.tufts.edu/
  • Visual Understanding Environment (VUE)
  • http://vue.tccs.tufts.edu/

31
Future Directions
  • Revised search service (Zebra?)
  • XML database for metadata and XML objects (eXist)
  • Customization and enhancement to address a wide
    variety of needs (e.g., University Records).
  • Object factory: a workbench for building certain
    classes of objects
  • Automated browsing service for Repository.
  • Authentication and authorization modules
  • Asset Definitions
  • Collection Modeling
  • Federation

32
Asset Definitions
  • The purpose of the Fedora Asset Definition is to
    define and expose content types and methods of
    objects/assets in a repository in a standard way.
    The goal is to facilitate access between
    applications and digital repositories, between
    one digital repository and another, etc.
  • Some of the questions that we asked ourselves
    during our repository and application development
    helped us form the concept of an Asset
    Definition. For example:
  • How can an application find out which
    objects/assets a particular repository holds,
    and how should it refer to those objects?
  • If one has an object/asset in a repository, how
    does one describe it so that other applications
    can understand what they can do with it?

33
Asset Definitions, cont.
  • getFullAssetDefintion
  • getPreview
  • getDescription
  • getFullView
  • getDefaultContent
  • getDescMetadata
  • getAdminMetadata
  • getThumbnail
  • getScreenSize
  • getMaxSize
  • getDynamicView
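
One way to picture an asset definition is as an interface that every asset exposes, so an application can work with any asset without knowing its repository. The sketch below models three of the method names listed above as an abstract base class. The return types, the `InMemoryAsset` class, and the `summarize` helper are assumptions; the slides give only the method names.

```python
from abc import ABC, abstractmethod


class Asset(ABC):
    """A subset of the asset-definition methods named on the slide."""

    @abstractmethod
    def getDescription(self) -> str: ...

    @abstractmethod
    def getThumbnail(self) -> bytes: ...

    @abstractmethod
    def getDefaultContent(self) -> bytes: ...


class InMemoryAsset(Asset):
    """Hypothetical asset backed by values held in memory."""

    def __init__(self, description, thumbnail, content):
        self._description = description
        self._thumbnail = thumbnail
        self._content = content

    def getDescription(self) -> str:
        return self._description

    def getThumbnail(self) -> bytes:
        return self._thumbnail

    def getDefaultContent(self) -> bytes:
        return self._content


def summarize(asset: Asset) -> str:
    """Application code that relies only on the shared interface."""
    return f"{asset.getDescription()} ({len(asset.getDefaultContent())} bytes)"
```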

34
Collection Modeling
35
Collection Modeling
  • Object Relationships
  • Extend Fedora RDF to create collection networks
  • Recursive disseminators to track paths in the
    network
  • Facilitate access to sets of materials
  • Facilitate management of digital objects
  • Facilitate browsing of sets of materials
  • http://nikolai.tccs.tufts.edu:1980/fedora/get/demo:collectionAll/demo:Collection/viewMembers/
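
Tracking paths through a collection network amounts to a recursive walk over membership relationships. The sketch below shows that idea on a plain dictionary; it does not use Fedora's RDF relationships or disseminators, and the function name, the `members_of` mapping, and the sample identifiers are hypothetical.

```python
def collect_members(collection, members_of, seen=None):
    """Recursively gather every object reachable from a collection.

    members_of maps a collection identifier to its direct members.
    The seen set prevents revisiting nodes, so cycles cannot loop forever.
    """
    if seen is None:
        seen = {collection}
    found = []
    for child in members_of.get(collection, []):
        if child in seen:
            continue
        seen.add(child)
        found.append(child)
        # A member may itself be a collection with members of its own.
        found.extend(collect_members(child, members_of, seen))
    return found


# Made-up network: one object sits in two collections, and one
# sub-collection points back at the root.
NETWORK = {
    "demo:all": ["demo:a", "demo:b"],
    "demo:a": ["demo:obj1"],
    "demo:b": ["demo:obj1", "demo:all"],
}
```

Each object is reported once even when it belongs to several collections, which is usually what a browsing view of a set of materials needs.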