Title: Fedora: Selecting and Implementing an Open Source Software Digital Repository
- Jon Dunn
- Digital Library Program, Indiana University
- RLG Members Forum, December 12, 2003
2. Outline
- What is a repository and why do we need it?
- Background on IU environment
- Background on Fedora
- Fedora Digital Object Model
- The Fedora Architecture
- Fedora use at IU: EVIADA
- Future Fedora use
3. Why a repository?
- Isn't what we have good enough?
- Web servers, delivery systems
- File servers
- Databases
- Hierarchical storage systems
- Why do libraries need repositories?
4. A digital object is more than just a file!
Example: Electronic Book
- Metadata
- Delivery page image files (JPEG)
- Hi-res page image files (TIFF)
- Text file (TEI/XML)
5. A digital object is more than just a file!
Example: Archival Collection
- EAD Finding Aid
6. DL Objects
- Digital library objects have many parts
- Metadata
- Descriptive, administrative, structural, preservation
- Preservation/archival files (several)
- Delivery files (several)
- How do we keep them connected and organized?
- Now: good practice in file naming, directory organization, project documentation (not scalable!)
- Future: digital object repository
7. Repository Purposes
- Access
- Web access to digital files and metadata
- Services/applications for searching, browsing, transformation, etc.
- Preservation
- Secure storage for digital files and metadata
- Services for integrity checking, migration, conversion, etc.
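As an illustration of the "integrity checking" service named above (a service built around the repository, since preservation services are listed later among the things Fedora itself does not do), the Python below is a minimal sketch assuming fixity is checked with SHA-256 checksums recorded at ingest. The function names and the manifest layout are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(paths) -> dict:
    """At ingest time, record the expected checksum of each file."""
    return {str(p): sha256_of(Path(p)) for p in paths}


def check_fixity(manifest: dict) -> list:
    """Return the files that are missing or whose bytes no longer match."""
    failures = []
    for name, expected in manifest.items():
        path = Path(name)
        if not path.is_file() or sha256_of(path) != expected:
            failures.append(name)
    return failures


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        master = Path(tmp) / "page001.tif"        # hypothetical preservation file
        master.write_bytes(b"pretend TIFF bytes")
        manifest = build_manifest([master])
        print(check_fixity(manifest))             # no failures yet
        master.write_bytes(b"silently corrupted")
        print(check_fixity(manifest))             # the changed file is reported
```

Storing the manifest separately from the files means a periodic job can rerun `check_fixity` and flag damage before a migration copies corrupted bits forward.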
8. Data Persistence
- Key is migration
- Keeping the bits alive
- Physical media
- Logical media format
- Keeping the bits understandable
- File format
- Metadata
- Small pockets of digital content pose a problem for migration
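Keeping the bits understandable begins with knowing which file format each file is in. A minimal sketch, assuming identification by leading "magic" bytes alone (real format-identification tools check much more), might look like this. The signatures are the standard ones for TIFF, JPEG, and XML with a declaration. The function name and the fallback MIME type are illustrative choices.

```python
# Leading "magic" bytes for a few formats named on these slides.
SIGNATURES = [
    (b"II*\x00", "image/tiff"),      # little-endian TIFF
    (b"MM\x00*", "image/tiff"),      # big-endian TIFF
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"<?xml", "application/xml"),   # TEI, EAD, METS, but only if an XML declaration comes first
]


def identify_format(header: bytes) -> str:
    """Guess a MIME type from the first bytes of a file."""
    for magic, mime in SIGNATURES:
        if header.startswith(magic):
            return mime
    return "application/octet-stream"  # unknown: flag for human review


if __name__ == "__main__":
    import sys

    for name in sys.argv[1:]:
        with open(name, "rb") as f:
            print(name, identify_format(f.read(16)))
```

A file that falls through to `application/octet-stream` is exactly the kind of "small pocket" that is hard to migrate. Recording the identified format as metadata at ingest makes later migration decisions easier.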
9. DL Object Repository
[Diagram labels: Repository System; Users and Applications: Access and Management; Preservation version in MDSS; Delivery version(s) on web server; Metadata records]
10. Motivation for a Digital Repository at Indiana University
- Many pockets of digital content and metadata
- Difficult to sustain
- Variable tech support, replacement funding
- Harder to preserve, migrate data forward to new software and hardware
- Harder to budget for
- Difficult to build common services and applications
- Cross-collection search
- Standard interfaces for viewing and playing content
- Interfaces to course management and other IT services
- OAI data providers
- Preservation services (integrity checks, etc.)
11. Not a New Model
- Digital Repository
- Common system for storing, managing, and providing access to digital content and metadata
- Integrated Library System
- Common system for storing, managing, and providing access to MARC records
12. Digital Repository vs. Institutional Repository
- Digital repository
- Common storage for digital content and metadata
- Basic infrastructure component: "plumbing"
- Institutional repository
- Often implies focus on one application: institutional content, research output
- e.g. MIT DSpace
- "capture, store, index, preserve, and redistribute the intellectual output of a university's research faculty in digital formats"
13. Background: IU Digital Library Program
- Mission
- "dedicated to the production, maintenance, distribution, and preservation of a wide range of high quality networked information resources for scholars and students at Indiana University and elsewhere"
14. IU Digital Library Program
- Established in 1997
- Collaborative venture
- University Libraries (IUL)
- University Information Technology Services (UITS)
- School of Library and Information Science (SLIS)
- School of Informatics
- Funding provided by Libraries and UITS
- University-wide responsibility: 8 campuses
- Responsibility beyond just the Libraries
15. IU Digital Library Program: Areas of Responsibility
- Digital conversion
- Metadata
- Usability / UI design
- Infrastructure
- Software development
- DL research
- Both direct involvement and consulting roles
16. IU Digital Library Program Staff
- 12.5 full-time equivalent (FTE) permanent staff
- 3 librarians
- 9 professional staff: IT, digital conversion, UI/usability
- 1 support staff (0.5 FTE)
- 10 grant-funded IT staff
- Student staff, including graduate assistants and interns from the School of Library and Information Science and Computer Science
17. Object Types at IU
- Books
- Manuscripts
- Photographs
- Art images
- Music audio
- Video
- Sheet music
- Musical score images
- Music notation files
- and more
18. Questions in Repository Planning at IU
- Scope
- Just library?
- Museums and archives?
- All campuses?
- Other digital content
- Instructional (e.g. faculty materials in OnCourse)
- Business (PR, Athletics, etc.)
- Funding model
- Standards
- Minimum requirements for content formats and metadata
- Tools/services/applications
- What else is needed to make a repository useful/usable for preservation and access?
19. Repository Evaluation Criteria
- Flexibility
- Not a rigid data model
- Support for many media types, complex digital objects
- Not locked into one technology platform (OS, database)
- Extensibility
- Use of modern technologies
- Easy integration with other systems/tools
- Means of extension/modification
- Support for DL standards, particularly metadata
- Sustainability
- Supportability
- Cost
20. Fedora
- FEDORA
- Flexible
- Extensible
- Digital
- Object and
- Repository
- Architecture
21. Fedora - Background
- Began as CS research project at Cornell 1997-98
- Architecture
- Reference implementation
- UVa Libraries became interested in 2000
- Trying to create a DL architecture
- No commercial solutions found
- Mellon-funded project 2001-2003
- Joint UVa/Cornell project
- Update technologies
- Make use of relational database
- Make more production-ready
- IU member of deployment group engaged in testing
22. Fedora - Technical Environment
- Open Source software
- Written in Java
- OS Platforms
- Windows
- Linux / Unix
- Mac OS X (not yet officially supported)
- Database support
- MySQL
- McKoi
- Oracle8i, Oracle9i
23. What does Fedora do?
- Manages files, or references to files, that make up digital objects
- Manages associations between objects and interfaces
- Invokes behaviors of objects
- Basic DL plumbing
24. What does Fedora not do?
- Searching/browsing of metadata and content
- End-user UI for display/navigation of metadata and content
- Cataloging tools
- Preservation services
- Fedora is DL plumbing, not an out-of-the-box complete DL system
25. Fedora 1.2 Software Feature Set
- Open Fedora APIs
- Repository as web services
- Flexible Digital Object Model
- Content View: objects as a bundle of items (content and metadata)
- Service View: objects as a set of service methods (behaviors)
- Extensible functionality by associating services with objects
- Repository System
- Core Services: Management, Access/Search, OAI-PMH
- Storage: XML object store, relational database object cache, relational database object registry
- Mediation: auto-dispatching to distributed web services for content transformation
- Auto-Indexing: system metadata and DC record of each object
- HTTP Basic Authentication and Access Control
- Built-in disseminator services: XSLT transformation, image manipulation, XML-to-PDF
- Content Versioning
- Automatic version control (saves a version of content/metadata when modified)
- Enables date-time stamped API requests (see the object as it looked at a point in time)
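The versioning behavior listed above (save a new version on every modification, then answer a request "as of" a date-time) can be sketched as an in-memory model. This is illustrative only, assuming versions arrive in time order. It is not Fedora's storage format or API, and the class and method names are hypothetical.

```python
from bisect import bisect_right
from datetime import datetime


class VersionedDatastream:
    """Keeps every version of a datastream, ordered by modification time."""

    def __init__(self) -> None:
        self._times: list[datetime] = []   # ascending modification timestamps
        self._contents: list[bytes] = []   # content saved at each timestamp

    def modify(self, content: bytes, when: datetime) -> None:
        """Save a new version; earlier versions are never overwritten."""
        if self._times and when <= self._times[-1]:
            raise ValueError("versions must be added in time order")
        self._times.append(when)
        self._contents.append(content)

    def as_of(self, when: datetime) -> bytes:
        """Return the content as it looked at the given point in time."""
        i = bisect_right(self._times, when)  # versions saved at or before `when`
        if i == 0:
            raise LookupError("no version existed at that time")
        return self._contents[i - 1]

    def current(self) -> bytes:
        """Return the latest version."""
        if not self._contents:
            raise LookupError("datastream has no versions")
        return self._contents[-1]


if __name__ == "__main__":
    from datetime import timezone

    ds = VersionedDatastream()
    ds.modify(b"<dc>first draft</dc>", datetime(2003, 1, 1, tzinfo=timezone.utc))
    ds.modify(b"<dc>corrected</dc>", datetime(2003, 6, 1, tzinfo=timezone.utc))
    print(ds.as_of(datetime(2003, 3, 1, tzinfo=timezone.utc)))  # the first draft
```

Because old versions are appended rather than replaced, a date-time stamped request is just a binary search over the modification timestamps.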
26. The Fedora Object Model
- PID: persistent unique identifier
- Datastreams: represent content or metadata
- System Metadata: manage and track the object in the system
- Disseminator(s): a service for transforming or presenting the object
- Behavior Definition
- Behavior Mechanism
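The parts of an object listed above can be pictured as a simple data structure. The Python dataclasses below are a hedged sketch of that model, not Fedora's actual XML serialization. The field names, the `demo:` PIDs, and the datastream IDs are illustrative assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Datastream:
    """Content or metadata, e.g. a TIFF master or a Dublin Core record."""
    ds_id: str
    mime_type: str
    location: str  # where the bytes live (illustrative: a path or URL)


@dataclass
class Disseminator:
    """Binds an object's datastreams to a service that presents them."""
    bdef_pid: str             # Behavior Definition: the interface offered
    bmech_pid: str            # Behavior Mechanism: the implementation used
    bindings: dict[str, str]  # mechanism input name -> datastream ID


@dataclass
class DigitalObject:
    pid: str  # persistent unique identifier
    datastreams: dict[str, Datastream] = field(default_factory=dict)
    disseminators: list[Disseminator] = field(default_factory=list)
    # System metadata for managing and tracking the object
    created: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def add_datastream(self, ds: Datastream) -> None:
        self.datastreams[ds.ds_id] = ds


if __name__ == "__main__":
    # A two-file image object (all identifiers are made up for illustration)
    obj = DigitalObject("demo:image1")
    obj.add_datastream(Datastream("HIRES", "image/tiff", "masters/img1.tif"))
    obj.add_datastream(Datastream("LORES", "image/jpeg", "web/img1.jpg"))
    obj.disseminators.append(
        Disseminator("demo:bdef-image", "demo:bmech-twofile",
                     {"tif": "HIRES", "jpg": "LORES"}))
    print(sorted(obj.datastreams))
```

Keeping the behavior definition and mechanism as references (PIDs) rather than code inside the object is what lets the same interface be implemented differently for different kinds of objects.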
27. Object Model Example: Image Objects
- Two File Image Object
- Data
- High Resolution Version: TIF
- Low Resolution Version: JPG
- MrSID File Image Object
- Data
- MrSID File
28. Basic Image Interface: Behavior Definitions
- getHighResolutionTIF
- getLowResolutionJPG
29. Implementations: Behavior Mechanisms
- Two File Image Object
- getHighResolutionTIF
- returns high resolution TIF
- getLowResolutionJPG
- returns low resolution JPG
- MrSID Image Object
- getHighResolutionTIF
- processes the MrSID file to return a high resolution TIF file of the image
- getLowResolutionJPG
- processes the MrSID file to return a low resolution JPG of the image
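The split between one behavior definition (the interface) and several behavior mechanisms (implementations) described above maps naturally onto an abstract class with two subclasses. The Python below is a sketch under that analogy only. In Fedora these are separate repository objects bound through disseminators. `convert_mrsid` is a hypothetical stand-in, because decoding MrSID requires a separate decoder that is not shown here. The method names follow the slides rather than Python's usual snake_case.

```python
from abc import ABC, abstractmethod


class BasicImage(ABC):
    """Behavior definition: what any image object must be able to do."""

    @abstractmethod
    def getHighResolutionTIF(self) -> bytes: ...

    @abstractmethod
    def getLowResolutionJPG(self) -> bytes: ...


class TwoFileImage(BasicImage):
    """Mechanism for objects that store both versions as datastreams."""

    def __init__(self, tif: bytes, jpg: bytes):
        self.tif, self.jpg = tif, jpg

    def getHighResolutionTIF(self) -> bytes:
        return self.tif  # return the stored file unchanged

    def getLowResolutionJPG(self) -> bytes:
        return self.jpg


def convert_mrsid(sid: bytes, fmt: str, scale: float) -> bytes:
    """Hypothetical placeholder for a real MrSID decoder; it only tags the bytes."""
    return f"{fmt}@{scale}:".encode() + sid


class MrSIDImage(BasicImage):
    """Mechanism for objects that store a single MrSID file."""

    def __init__(self, sid: bytes):
        self.sid = sid

    def getHighResolutionTIF(self) -> bytes:
        return convert_mrsid(self.sid, "TIF", 1.0)  # derived on request

    def getLowResolutionJPG(self) -> bytes:
        return convert_mrsid(self.sid, "JPG", 0.25)


def show_thumbnail(image: BasicImage) -> bytes:
    # Client code depends only on the behavior definition.
    return image.getLowResolutionJPG()
```

A client such as `show_thumbnail` works unchanged for both kinds of object, which is the point of separating the interface from its implementations.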
30. FEDORA's Interface Implementation
[Diagram: Behavior Definition Object, Data Object, Behavior Mechanism Object]
31. Fedora Architecture
32. Client and Web Service Interactions
[Diagram labels: users; web browser; client applications; server application; Fedora Service APIs; Fedora Repository System; External Service Dispatch; Content Transform Services (each with an API)]
33. Current Fedora Use at IU: EVIADA
- EVIADA
- Ethnomusicological Video for Instruction and Analysis Digital Archive (!)
- Goals
- Digital archive of ethnomusicology field video
- Instructional tool
- Partnership with University of Michigan
- Funding from Andrew W. Mellon Foundation
34. Current Fedora Use at IU: EVIADA
- Complex objects
- Many versions of content
- Original analog video
- Digital Betacam tape
- Digital file master: 50 Mbps MPEG-2
- Derivative files: MPEG-1, QuickTime, Real, ???
- Many types of metadata
- Collection-level descriptive metadata
- Annotations: event, scene, action
- Technical, preservation, digital provenance
- Using METS, MODS, MARC
35. Current Fedora Use at IU: EVIADA
- Fedora used to manage content and metadata
- Streaming video files will be "redirected" content
- Web application built with Java, Struts framework, Oracle9i XDB
- Web-based annotation tool
- Creates METS structMap and MODS records
36. (No transcript)
37. Future Fedora Software Releases
December 2003 to December 2004
- Fedora Object XML (FOXML)
- Internal storage format: direct expression of Fedora object model
- Better support for relationships (kinship metadata)
- Better support for audit trail (event history)
- Format identifiers for dynamic service binding
- Shibboleth authentication
- Policy Enforcement
- XACML expression language
- Fedora policy enforcement module
- Web interface for easy content submission
- Batch object modification utility
- Administrative Reporting
- Object Event History (ABC/RDF disseminations)
- Better support for collections
- New ingest and export formats (METS 1.3, DIDL)
38. Future Fedora Development Proposals
- Digital Library in a Box
- Full-featured DL application with Fedora inside
- Optimized for common set of content types
- Fedora Power Server
- Integrity Management Tools
- Service and link liveness checker
- Fault Tolerance
- Mirroring and Replication
- Peer-to-peer interoperability features
- Repository clustering
- Load balancing
- Object Creation Tools
- Workflow applications based on content models
- Web interface for document/content submission
39. Implementing Fedora at IU beyond EVIADA: Next Steps
- Define scope
- Define content, metadata standards
- Import existing content into Fedora
- Initial focus on images?
- Define and implement applications
- Example: common image search service
- Ongoing process
40. Who should use Fedora?
- Now
- Willingness to do programming, development
- Willingness to be on the bleeding edge
- Sufficient IT / DL staff
- Interested in cooperating with others to define best practices
- Future: lower barriers to entry
41.
- Thanks to
- Corey Keith, Library of Congress
- Sandy Payette, Cornell University
- More information on Fedora
- www.fedora.info
- My contact information
- Jon Dunn, jwd_at_indiana.edu, 812-855-0953