Fedora: Selecting and Implementing an Open Source Software Digital Repository

1
Fedora: Selecting and Implementing an Open Source
Software Digital Repository
  • Jon Dunn
  • Digital Library Program, Indiana University
  • RLG Members Forum, December 12, 2003

2
Outline
  • What is a repository and why do we need it?
  • Background on IU environment
  • Background on Fedora
  • Fedora Digital Object Model
  • The Fedora Architecture
  • Fedora use at IU EVIADA
  • Future Fedora use

3
Why a repository?
  • Isn't what we have good enough?
  • Web servers, delivery systems
  • File servers
  • Databases
  • Hierarchical storage systems
  • Why do libraries need repositories?

4
A digital object is more than just a file!
Example: Electronic Book
Metadata
Delivery page image files (JPEG)
Hi-res page image files (TIFF)
Text file (TEI/XML)
5
A digital object is more than just a file!
Example: Archival Collection
EAD Finding Aid
6
DL Objects
  • Digital library objects have many parts
  • Metadata
  • Descriptive, administrative, structural,
    preservation,
  • Preservation/archival files (several)
  • Delivery files (several)
  • How do we keep them connected and organized?
  • Now: Good practice in file naming, directory
    organization, project documentation (not
    scalable!)
  • Future: Digital object repository

7
Repository Purposes
  • Access
  • Web access to digital files and metadata
  • Services/applications for searching, browsing,
    transformation, etc.
  • Preservation
  • Secure storage for digital files and metadata
  • Services for integrity checking, migration,
    conversion, etc.
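
As an illustration of the "integrity checking" service named above, here is a minimal sketch in Python. It is not taken from the presentation or from Fedora: the function names, the manifest layout (relative path mapped to a SHA-256 digest), and the sample file names are all assumptions made for the example.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def failed_fixity(manifest: dict[str, str], root: Path) -> list[str]:
    """Return the relative paths that are missing or whose digest changed."""
    failures = []
    for relative_path, expected in sorted(manifest.items()):
        target = root / relative_path
        if not target.is_file() or sha256_of(target) != expected:
            failures.append(relative_path)
    return failures


def demo() -> list[str]:
    """Record digests for two hypothetical files, damage one, and re-check."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "page001.tif").write_bytes(b"master image bytes")
        (root / "book.xml").write_bytes(b"<TEI/>")
        names = ("page001.tif", "book.xml")
        manifest = {name: sha256_of(root / name) for name in names}
        # Simulate silent corruption of the archival master.
        (root / "page001.tif").write_bytes(b"damaged bytes")
        return failed_fixity(manifest, root)
```

A scheduled job could run a check like `failed_fixity` against stored digests and report any paths it returns; where the digests are kept and how repairs happen are left open here.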

8
Data Persistence
  • Key is migration
  • Keeping the bits alive
  • Physical media
  • Logical media format
  • Keeping the bits understandable
  • File format
  • Metadata
  • Small pockets of digital content pose a problem
    for migration

9
DL Object Repository
Preservation version in MDSS
Repository System
Users and Applications Access and Management
Delivery version(s) on web server
Metadata records
10
Motivation for a Digital Repository at Indiana
University
  • Many pockets of digital content and metadata
  • Difficult to sustain
  • Variable tech support, replacement funding
  • Harder to preserve, migrate data forward to new
    software and hardware
  • Harder to budget for
  • Difficult to build common services and
    applications
  • Cross-collection search
  • Standard interfaces for viewing and playing
    content
  • Interfaces to course management and other IT
    services
  • OAI data providers
  • Preservation services (integrity checks, etc.)

11
Not a New Model
  • Digital Repository
  • Common system for storing, managing, and
    providing access to digital content and metadata
  • Integrated Library System
  • Common system for storing, managing, and
    providing access to MARC records

12
Digital Repository vs. Institutional
Repository
  • Digital repository
  • Common storage for digital content and metadata
  • Basic infrastructure component: "plumbing"
  • Institutional repository
  • Often implies focus on one application:
    institutional content, research output
  • e.g. MIT DSpace
  • capture, store, index, preserve, and
    redistribute the intellectual output of a
    university's research faculty in digital formats

13
Background: IU Digital Library Program
  • Mission
  • dedicated to the production, maintenance,
    distribution, and preservation of a wide range of
    high quality networked information resources for
    scholars and students at Indiana University and
    elsewhere

14
IU Digital Library Program
  • Established in 1997
  • Collaborative venture
  • University Libraries (IUL)
  • University Information Technology Services (UITS)
  • School of Library and Information Science (SLIS)
  • School of Informatics
  • Funding provided by Libraries and UITS
  • University-wide responsibility: 8 campuses
  • Responsibility beyond just the Libraries

15
IU Digital Library Program: Areas of
Responsibility
  • Digital conversion
  • Metadata
  • Usability / UI design
  • Infrastructure
  • Software development
  • DL research
  • Both direct involvement and consulting roles

16
IU Digital Library Program Staff
  • 12.5 full-time equivalent (FTE) permanent staff
  • 3 librarians
  • 9 professional staff: IT, digital conversion,
    UI/usability
  • 1 support staff (.5 FTE)
  • 10 grant-funded IT staff
  • Student staff, including graduate assistants and
    interns from the School of Library and
    Information Science and Computer Science

17
Object Types at IU
  • Books
  • Manuscripts
  • Photographs
  • Art images
  • Music audio
  • Video
  • Sheet music
  • Musical score images
  • Music notation files
  • and more

18
Questions In Repository Planning at IU
  • Scope
  • Just library?
  • Museums and archives?
  • All campuses?
  • Other digital content
  • Instructional (e.g. faculty materials in
    OnCourse)
  • Business (PR, Athletics, etc.)
  • Funding model
  • Standards
  • Minimum requirements for content formats and
    metadata
  • Tools/services/applications
  • What else is needed to make a repository
    useful/usable for preservation and access?

19
Repository Evaluation Criteria
  • Flexibility
  • Not a rigid data model
  • Support for many media types, complex digital
    objects
  • Not locked into one technology platform (OS,
    database)
  • Extensibility
  • Use of modern technologies
  • Easy integration with other systems/tools
  • Means of extension/modification
  • Support for DL standards, particularly metadata
  • Sustainability
  • Supportability
  • Cost

20
Fedora
  • FEDORA
  • Flexible
  • Extensible
  • Digital
  • Object and
  • Repository
  • Architecture

21
Fedora - Background
  • Began as CS research project at Cornell, 1997-98
  • Architecture
  • Reference implementation
  • UVa Libraries became interested in 2000
  • Trying to create a DL architecture
  • No commercial solutions found
  • Mellon-funded project, 2001-2003
  • Joint UVa/Cornell project
  • Update technologies
  • Make use of relational database
  • Make more production-ready
  • IU member of deployment group engaged in testing

22
Fedora - Technical Environment
  • Open Source software
  • Written in Java
  • OS Platforms
  • Windows
  • Linux / Unix
  • Mac OS X (not yet officially supported)
  • Database support
  • MySQL
  • McKoi
  • Oracle8i, Oracle9i

23
What does Fedora do?
  • Manages files or references to files that make up
    digital objects
  • Manages associations between objects and
    interfaces
  • Invokes behaviors of objects
  • Basic DL plumbing

24
What does Fedora not do?
  • Searching/browsing of metadata and content
  • End-user UI for display/navigation of metadata
    and content
  • Cataloging tools
  • Preservation services
  • Fedora is DL plumbing, not an out-of-the-box
    complete DL system

25
Fedora 1.2 Software Feature Set
  • Open Fedora APIs
  • Repository as web services
  • Flexible Digital Object Model
  • Content View: objects as bundle of items (content
    and metadata)
  • Service View: objects as a set of service methods
    (behaviors)
  • Extensible functionality by associating services
    with objects
  • Repository System
  • Core Services: Management, Access/Search, OAI-PMH
  • Storage: XML object store, relational db object
    cache, relational db object registry
  • Mediation: auto-dispatching to distributed web
    services for content transformation
  • Auto-Indexing: system metadata and DC record of
    each object
  • HTTP Basic Authentication and Access Control
  • Built-in disseminator services: XSLT transform,
    image manipulation, XML-to-PDF
  • Content Versioning
  • Automatic version control (saves version of
    content/metadata when modified)
  • Enables date-time stamped API requests (see
    object as it looked at a point in time)
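
The content versioning idea above (keep every saved version, answer requests "as of" a date-time) can be sketched roughly as follows. This is an illustrative Python model only, not Fedora's implementation or API: the class `VersionedDatastream` and its methods are hypothetical names invented for the example.

```python
from bisect import bisect_right
from datetime import datetime


class VersionedDatastream:
    """Keeps every saved version of one piece of content, oldest first."""

    def __init__(self) -> None:
        self._stamps: list[datetime] = []
        self._contents: list[bytes] = []

    def save(self, content: bytes, when: datetime) -> None:
        """Append a new version; earlier versions are never overwritten."""
        if self._stamps and when <= self._stamps[-1]:
            raise ValueError("versions must be saved in increasing time order")
        self._stamps.append(when)
        self._contents.append(content)

    def as_of(self, when: datetime) -> bytes:
        """Return the version that was current at the given moment."""
        # Count the versions saved at or before `when`; the last one wins.
        index = bisect_right(self._stamps, when)
        if index == 0:
            raise LookupError("no version existed at that time")
        return self._contents[index - 1]

    def latest(self) -> bytes:
        """Return the most recently saved version."""
        if not self._contents:
            raise LookupError("nothing has been saved yet")
        return self._contents[-1]
```

For example, after saving one version on 1 January 2003 and another on 1 June 2003, `as_of(datetime(2003, 3, 1))` returns the January content, while `latest()` returns the June content.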

26
The Fedora Object Model
  • PID: persistent unique identifier
  • Datastreams: represent content or metadata
  • System Metadata: manage and track the object in
    the system
  • Disseminator(s): a service for transforming or
    presenting the object
  • Behavior Definition
  • Behavior Mechanism
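
To make the parts of the object model concrete, the following Python sketch models them as plain data classes and builds the electronic book from the earlier example slide. It is a teaching aid under stated assumptions, not Fedora's storage format: every field name, the datastream IDs, the PID "demo:1", and the file locations are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Datastream:
    """One piece of content or metadata belonging to an object."""
    datastream_id: str
    mime_type: str
    location: str  # hypothetical: a file path or URL for the bytes


@dataclass
class Disseminator:
    """Binds an abstract interface to a concrete implementation."""
    behavior_definition: str  # name of the interface
    behavior_mechanism: str   # name of the implementing service


@dataclass
class DigitalObject:
    pid: str  # persistent unique identifier
    system_metadata: dict[str, str] = field(default_factory=dict)
    datastreams: dict[str, Datastream] = field(default_factory=dict)
    disseminators: list[Disseminator] = field(default_factory=list)

    def add_datastream(self, datastream: Datastream) -> None:
        """Attach a datastream, refusing duplicate IDs."""
        if datastream.datastream_id in self.datastreams:
            raise ValueError(f"duplicate datastream: {datastream.datastream_id}")
        self.datastreams[datastream.datastream_id] = datastream


def build_example_book() -> DigitalObject:
    """Assemble an electronic book: metadata, page images, and TEI text."""
    book = DigitalObject(pid="demo:1", system_metadata={"state": "active"})
    book.add_datastream(Datastream("METADATA", "text/xml", "book/metadata.xml"))
    book.add_datastream(Datastream("DELIVERY_JPEG", "image/jpeg", "book/p001.jpg"))
    book.add_datastream(Datastream("MASTER_TIFF", "image/tiff", "book/p001.tif"))
    book.add_datastream(Datastream("TEXT_TEI", "text/xml", "book/text.xml"))
    book.disseminators.append(
        Disseminator("BasicImageInterface", "TwoFileImageMechanism")
    )
    return book
```

The point of the sketch is that one identifier (the PID) keeps the metadata, the archival files, and the delivery files together, instead of relying on file naming conventions.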

27
Object Model Example: Image Objects
  • Two File Image Object
  • Data
  • High Resolution Version: tif
  • Low Resolution Version: jpg
  • MrSID File Image Object
  • Data
  • MrSID File

28
Basic Image Interface: Behavior Definitions
  • getHighResolutionTIF
  • getLowResolutionJPG

29
Implementations: Behavior Mechanisms
  • Two File Image Object
  • getHighResolutionTIF
  • returns high resolution TIF
  • getLowResolutionJPG
  • returns low resolution JPG
  • MrSID Image Object
  • getHighResolutionTIF
  • processes the MrSID file to return a high
    resolution TIF file of the image
  • getLowResolutionJPG
  • processes the MrSID file to return a low
    resolution JPG of the image
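
The split between a behavior definition (the interface) and behavior mechanisms (the implementations) maps closely onto interfaces in ordinary code. The Python sketch below mirrors the two image objects above; it is an analogy rather than Fedora code. The class names are hypothetical, and the `decoder` callable is a stand-in for real MrSID processing, which is not shown.

```python
from typing import Callable, Protocol


class BasicImageInterface(Protocol):
    """Behavior definition: what any image object must be able to do."""

    def get_high_resolution_tif(self) -> bytes: ...

    def get_low_resolution_jpg(self) -> bytes: ...


class TwoFileImageMechanism:
    """Behavior mechanism for objects that store both files directly."""

    def __init__(self, tif: bytes, jpg: bytes) -> None:
        self._tif = tif
        self._jpg = jpg

    def get_high_resolution_tif(self) -> bytes:
        return self._tif

    def get_low_resolution_jpg(self) -> bytes:
        return self._jpg


class MrSidImageMechanism:
    """Behavior mechanism for objects that store a single MrSID file."""

    def __init__(self, sid: bytes, decoder: Callable[[bytes, str], bytes]) -> None:
        self._sid = sid
        self._decoder = decoder  # hypothetical stand-in for a MrSID converter

    def get_high_resolution_tif(self) -> bytes:
        return self._decoder(self._sid, "tif")

    def get_low_resolution_jpg(self) -> bytes:
        return self._decoder(self._sid, "jpg")


def fetch_both(image: BasicImageInterface) -> tuple[bytes, bytes]:
    """Client code sees only the interface, never the storage layout."""
    return image.get_high_resolution_tif(), image.get_low_resolution_jpg()


def fake_decoder(data: bytes, target_format: str) -> bytes:
    """Placeholder conversion used only to make the example runnable."""
    return target_format.encode() + b":" + data
```

A client calling `fetch_both` works the same way with either mechanism, which is the property that lets the underlying file formats change without breaking applications.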

30
FEDORA's Interface Implementation
Behavior Definition Object
Data Object
Behavior Mechanism Object
31
Fedora Architecture
32
Client and Web Service Interactions
[Diagram; labels: users, web browser, client applications, server application, Fedora Service APIs, Fedora Repository System, External Service Dispatch, Content Transform Services, APIs]
33
Current Fedora Use at IU: EVIADA
  • EVIADA
  • Ethnomusicological Video for Instruction and
    Analysis Digital Archive (!)
  • Goals
  • Digital archive of ethnomusicology field video
  • Instructional tool
  • Partnership with University of Michigan
  • Funding from Andrew W. Mellon Foundation

34
Current Fedora Use at IU: EVIADA
  • Complex objects
  • Many versions of content
  • Original analog video
  • Digital Betacam tape
  • Digital file master: 50 Mbps MPEG-2
  • Derivative files: MPEG-1, QuickTime, Real, ???
  • Many types of metadata
  • Collection-level descriptive metadata
  • Annotations: event, scene, action
  • Technical, preservation, digital provenance
  • Using METS, MODS, MARC

35
Current Fedora Use at IU: EVIADA
  • Fedora used to manage content and metadata
  • Streaming video files will be redirected
    content
  • Web application built with Java, Struts
    framework, Oracle9i XDB
  • Web-based annotation tool
  • Creates METS structmap and MODS records

37
Future Fedora Software Releases
December 2003 to December 2004
  • Fedora Object XML (FOXML)
  • Internal storage format: direct expression of
    Fedora object model
  • Better support for relationships (kinship
    metadata)
  • Better support for audit trail (event history)
  • Format identifiers for dynamic service binding
  • Shibboleth authentication
  • Policy Enforcement
  • XACML expression language
  • Fedora policy enforcement module
  • Web interface for easy content submission
  • Batch object modification utility
  • Administrative Reporting
  • Object Event History (ABC/RDF disseminations)
  • Better support for collections
  • New ingest and export formats (METS 1.3, DIDL)

38
Future Fedora Development Proposals
  • Digital Library in a Box
  • Full-featured DL application with Fedora inside
  • Optimized for common set of content types
  • Fedora Power Server
  • Integrity Management Tools
  • Service and link liveness checker
  • Fault Tolerance
  • Mirroring and Replication
  • Peer-to-peer interoperability features
  • Repository clustering
  • Load balancing
  • Object Creation Tools
  • Workflow applications based on content models
  • Web interface for document/content submission

39
Implementing Fedora at IU beyond EVIADA: Next
Steps
  • Define scope
  • Define content, metadata standards
  • Import existing content into Fedora
  • Initial focus on images?
  • Define and implement applications
  • Example: Common image search service
  • Ongoing process

40
Who should use Fedora?
  • Now
  • Willingness to do programming, development
  • Willingness to be on the bleeding edge
  • Sufficient IT / DL staff
  • Interested in cooperating with others to define
    best practices
  • Future: Lower barriers to entry

41
  • Thanks to
  • Corey Keith, Library of Congress
  • Sandy Payette, Cornell University
  • More information on Fedora
  • www.fedora.info
  • My contact information
  • Jon Dunn, jwd_at_indiana.edu, 812-855-0953