Title: The Fedora Project March 10, 2003
1The Fedora Project March 10, 2003
- Sandy Payette
- Cornell Information Science
2Motivation
- The Problem of Complex Content
3Digital Library Content: not just documents ...
- Complex, compound, dynamic objects
4Key Research Questions
- How can clients interact with heterogeneous collections of complex objects in a simple and interoperable manner?
- How can complex objects be designed to be both generic and genre-specific at the same time?
- How can we associate services and tools with objects to provide different presentations or transformations of the object content?
- How can we associate specialized, fine-grained access control policies with specific objects, or with groups of objects?
- How can we facilitate the long-term management and preservation of complex objects with dependencies on distributed content and services?
5Shortcomings of commercial digital library products
- Narrow focus on specific media formats (e.g. image databases, document management)
- Fail to effectively address interrelationships among digital entities
- Fail to address interoperability: no open interfaces to facilitate sharing of services; no standard protocols for cross-system interoperability
- Fail to provide facilities for managing programs and tools that are integral to delivering digital content.
- Not extensible: do not enable easy integration of new tools and services
- Do not address fine-grained access control and preservation issues.
6The Flexible Extensible Digital Object Repository Architecture (FEDORA)
- DARPA and NSF-funded research at Cornell (1997-present)
- CORBA-based reference implementation (Payette/Lagoze)
- Extensive interoperability testing (with Arms/Blanchi/Overly)
- Policy Enforcement (Payette/Schneider)
- Interpreted and re-implemented at U of Virginia (1999-)
- Simple web-oriented implementation, focused on access to collections
- Java servlet and relational db
- Testbed of 10,000,000 objects with performance metrics (1999-2001)
- Mellon-funded FEDORA software (2002-)
- University of Virginia and Cornell: joint development
- Open source
- Web services and XML
- Mediation of distributed services
- Preservation focus
7The Fedora Architecture
- Digital Object Model
- The Repository
- Web Services
8FEDORA Basic Object Architecture
- Digital Object Model
- Container to aggregate digital content of any type
- Data or metadata
- Local or distributed
- Behavior definitions (like abstract interfaces)
- Hooks to external services
- Enables multiple disseminations of content
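The comparison of behavior definitions to abstract interfaces can be sketched in Python. This is only an illustration of the idea, not Fedora code: the names `ImageBehavior`, `get_thumbnail`, `TruncatingImageService`, and `disseminate` are hypothetical, and the "thumbnail" is a stand-in that just truncates bytes.

```python
from abc import ABC, abstractmethod


class ImageBehavior(ABC):
    """Hypothetical behavior definition: names an operation, implements none."""

    @abstractmethod
    def get_thumbnail(self, content: bytes) -> bytes:
        """Return a reduced rendition of the content."""


class TruncatingImageService(ImageBehavior):
    """Hypothetical service that fulfils the behavior definition.

    A real service would resize an image; this stand-in keeps the first 4 bytes.
    """

    def get_thumbnail(self, content: bytes) -> bytes:
        return content[:4]


def disseminate(service: ImageBehavior, content: bytes) -> bytes:
    # The caller depends only on the abstract contract, so any service
    # that implements ImageBehavior can be swapped in.
    return service.get_thumbnail(content)
```

Because the caller sees only the abstract contract, a different service could be hooked in later without changing how the object's content is requested.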
9Digital Object Model: Functional View
[Figure: functional view of the digital object model; labeled element: Application]
10Digital Object Model: Architectural View
- Persistent ID (PID): globally unique persistent id
- Disseminators: public view; access methods for obtaining disseminations of digital object content
- System Metadata: internal view; metadata necessary to manage the object
- Datastreams: protected view; content that makes up the basis of the object
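The four parts of the architectural view map naturally onto a small data structure. The following is a minimal sketch under stated assumptions: the class and field names, the PID value `demo:1`, and the `getUpperCase` method are all hypothetical and are not taken from the Fedora implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Datastream:
    """Protected view: one item of content, held locally or referenced remotely."""
    stream_id: str
    mime_type: str
    content: bytes = b""
    location: str = ""  # hypothetical: URL of distributed content, if not local


@dataclass
class DigitalObject:
    pid: str  # globally unique persistent id
    # Internal view: metadata needed to manage the object.
    system_metadata: Dict[str, str] = field(default_factory=dict)
    datastreams: Dict[str, Datastream] = field(default_factory=dict)
    # Public view: named access methods that produce disseminations.
    disseminators: Dict[str, Callable[["DigitalObject"], bytes]] = field(
        default_factory=dict
    )

    def disseminate(self, method: str) -> bytes:
        # Clients go through a disseminator rather than reading datastreams.
        return self.disseminators[method](self)


def build_example() -> DigitalObject:
    obj = DigitalObject(pid="demo:1", system_metadata={"state": "active"})
    obj.datastreams["TEXT"] = Datastream("TEXT", "text/plain", b"hello fedora")
    obj.disseminators["getUpperCase"] = (
        lambda o: o.datastreams["TEXT"].content.upper()
    )
    return obj
```

Calling `build_example().disseminate("getUpperCase")` returns a transformed view of the datastream, which is one way to picture "multiple disseminations" of the same underlying content.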
11Digital Object Model: Service Relationships
12FEDORA Basic Repository Architecture
- Repository System
- Object Management
- Lifecycle (Ingest/create → Store → Delete → Approve → Purge)
- Validation
- PID Generation
- Version management
- Access Control
- Preservation support
- Object Access
- Object Dissemination
- Object Reflection
- Service Mediation
13Fedora: A Programmer's View
- Understanding the system implementation
- Web Services
- Server Design
14What is a Web Service?
- A distributed application that runs over the internet.
- An addressable network endpoint that receives structured messages and returns structured responses.
- A web application that publishes an open interface through which clients can send requests and receive responses.
15How is this different from plain old web applications?
- Formally defined API (application programming interface): defines a set of abstract operations for a web service
- Published bindings for clients to run operations
- Standard protocol for invoking operations on the service.
- XML as standard means of encoding service requests and responses.
16Why are Web Services important?
- Interoperability
- Web applications can interact and build upon each other
- Data is transferred in an interoperable manner (HTTP)
- Data is encoded in an interoperable format (XML)
- Works in a decentralized, distributed, operating-system independent environment.
- Standards-oriented
- Means to expose complex operations with rich data typing (via XML Schema language typing)
- Ease of integrating distributed systems via the Web
- W3C effort to develop this service architecture
17How are Web Services Implemented?
- Simple Object Access Protocol (SOAP)
- SOAP is a messaging protocol that can run over different transport protocols (e.g., HTTP, SMTP)
- Operation oriented (send a request to an endpoint)
- Like CORBA, RMI, DCOM, but for the Web and simpler
- Application APIs can be defined and published using the Web Service Description Language (WSDL)
- Requests and responses sent as XML messages
- Supports simple and complex data typing in requests and responses
- Supports transmission of binary data within request or response packages
18How are Web Services Implemented?
- REST (Representational State Transfer)
- URI + HTTP + XML
- URI/resource driven: message built into a URL
- HTTP GET or POST
- Response is XML data
- Issues
- Not a standard, but a style of doing web apps. Arguably it just gives a fancy name to how lots of people do applications on the web by default. Nothing really new here; it just argues to do things the way we have been, maybe a little more standardized by using XML.
- Fragile service definition: URLs change
- No data typing on requests
- Limited ability to transmit complex requests on URL
- W3C is behind SOAP; one strong voice out there for REST (Prescod).
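To make "message built into a URL" and "no data typing on requests" concrete, here is a small sketch using only the Python standard library. The base URL `http://example.org/service`, the `spelling` resource, and the parameter names are hypothetical; no request is actually sent.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_rest_url(base: str, resource: str, params: dict) -> str:
    """Encode a request entirely in a URL, REST-style."""
    return f"{base.rstrip('/')}/{resource}?{urlencode(params)}"


def read_params(url: str) -> dict:
    """Recover the query parameters as a receiving service would see them."""
    return parse_qs(urlsplit(url).query)


# Hypothetical endpoint and parameters; "max" is passed as an integer.
url = build_rest_url(
    "http://example.org/service", "spelling", {"phrase": "payet", "max": 3}
)
params = read_params(url)
```

On the receiving side `params["max"]` is the string `"3"`, not the integer `3`: the URL carries no type information, so any typing has to be agreed on outside the request itself.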
19Example of Web Service using SOAP
[Figure: My Application sends a SOAP request (XML) over SOAP/HTTP to the Google Web Service, invoking doSpellingSuggestion(payet); the SOAP response (XML) comes back over SOAP/HTTP with the suggestion "payette".]
20XML SOAP Request
<?xml version="1.0" encoding="UTF-8"?>
<SOAP-ENV:Envelope
    xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance"
    xmlns:xsd="http://www.w3.org/1999/XMLSchema">
  <SOAP-ENV:Body>
    <m:doSpellingSuggestion xmlns:m="urn:GoogleSearch">
      <key>/e325JlNPASJu</key>
      <phrase>payet</phrase>
    </m:doSpellingSuggestion>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>
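A request of this shape can be assembled programmatically. The sketch below uses Python's standard `xml.etree.ElementTree`; the function name `build_request` and the placeholder key are hypothetical, and the message is only built, not sent to any service.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
GOOGLE = "urn:GoogleSearch"

# Ask ElementTree to serialize with the same prefixes as the slide.
ET.register_namespace("SOAP-ENV", SOAP_ENV)
ET.register_namespace("m", GOOGLE)


def build_request(key: str, phrase: str) -> str:
    """Build a doSpellingSuggestion SOAP envelope as a string."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{GOOGLE}}}doSpellingSuggestion")
    # The operation's arguments are plain, unqualified child elements.
    ET.SubElement(call, "key").text = key
    ET.SubElement(call, "phrase").text = phrase
    # ElementTree emits only namespaces that are used, so the xsi/xsd
    # declarations shown on the slide do not appear here.
    return ET.tostring(envelope, encoding="unicode")


request_xml = build_request("PLACEHOLDER-KEY", "payet")
```

In practice a SOAP toolkit generates this plumbing from the WSDL; building it by hand simply shows that the request is an ordinary XML document with an envelope, a body, and one element per operation call.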
21XML SOAP Response
<?xml version="1.0" encoding="UTF-8"?>
<SOAP-ENV:Envelope
    xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance"
    xmlns:xsd="http://www.w3.org/1999/XMLSchema">
  <SOAP-ENV:Body>
    <ns1:doSpellingSuggestionResponse xmlns:ns1="urn:GoogleSearch"
        SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
      <return xsi:type="xsd:string">payette</return>
    </ns1:doSpellingSuggestionResponse>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>
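On the client side, the returned value has to be pulled out of the envelope. The following is a minimal sketch with the Python standard library, assuming a response shaped like the one on this slide; the function name `extract_return` and the prefixes in `NS` are choices made for this example.

```python
import xml.etree.ElementTree as ET

RESPONSE = b"""<?xml version="1.0" encoding="UTF-8"?>
<SOAP-ENV:Envelope
    xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance"
    xmlns:xsd="http://www.w3.org/1999/XMLSchema">
  <SOAP-ENV:Body>
    <ns1:doSpellingSuggestionResponse xmlns:ns1="urn:GoogleSearch"
        SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
      <return xsi:type="xsd:string">payette</return>
    </ns1:doSpellingSuggestionResponse>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>
"""

# Prefixes used in the search path; they need not match the document's own.
NS = {
    "soap": "http://schemas.xmlsoap.org/soap/envelope/",
    "g": "urn:GoogleSearch",
}


def extract_return(xml_bytes: bytes) -> str:
    """Return the text of the <return> element inside the SOAP body."""
    root = ET.fromstring(xml_bytes)
    node = root.find("soap:Body/g:doSpellingSuggestionResponse/return", NS)
    if node is None or node.text is None:
        raise ValueError("no return value in SOAP response")
    return node.text


suggestion = extract_return(RESPONSE)
```

The `xsi:type="xsd:string"` attribute is what carries the data typing mentioned earlier; this sketch ignores it and treats the value as text.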
22Fedora and Web Services
- Fedora Repository system exposed as two related Web services
- Access (API-A) and Management (API-M)
- Both described using WSDL
- Both have SOAP and HTTP bindings
- Back-end services
- Digital object behaviors implemented as linkages to other distributed web services
- Service binding metadata (WSDL) stored in special Fedora objects.
- Fedora Repository system acts as a mediator to these services.
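The mediator role can be pictured with a toy model: the repository holds content and binding information, and forwards each dissemination request to whichever back-end service is bound to that object and method. Everything here is hypothetical (class, method, and service names, and the PID `demo:1`); plain Python functions stand in for remote web services, and a dictionary stands in for WSDL binding metadata.

```python
from typing import Callable, Dict, Tuple


def uppercase_service(data: bytes) -> bytes:
    """Stand-in for a remote back-end web service."""
    return data.upper()


def reverse_service(data: bytes) -> bytes:
    """A second stand-in service, to show that bindings are swappable."""
    return data[::-1]


class MediatingRepository:
    """Toy repository that mediates between clients and back-end services."""

    def __init__(self) -> None:
        self._content: Dict[str, bytes] = {}
        # (pid, method name) -> service; stands in for stored binding metadata.
        self._bindings: Dict[Tuple[str, str], Callable[[bytes], bytes]] = {}

    def ingest(self, pid: str, content: bytes) -> None:
        self._content[pid] = content

    def bind(self, pid: str, method: str, service: Callable[[bytes], bytes]) -> None:
        self._bindings[(pid, method)] = service

    def get_dissemination(self, pid: str, method: str) -> bytes:
        # The client never calls the back-end service directly.
        service = self._bindings[(pid, method)]
        return service(self._content[pid])


repo = MediatingRepository()
repo.ingest("demo:1", b"fedora")
repo.bind("demo:1", "getShout", uppercase_service)
repo.bind("demo:1", "getMirror", reverse_service)
```

Because clients address only the repository, a back-end service can be replaced by rebinding the method, without changing the request a client makes.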
23Fedora Web Services View
24Fedora Server Design
- 3-Tiered Architecture
- Modular and Extensible
- System Diagram
25Server Design: 3 Layers
26System Diagram
27Fedora Implementation Technologies
- Fedora Web Services Layer
- Apache Axis for SOAP over HTTP
- Apache Tomcat 4.1
- Core Repository System
- Sun Java J2SDK1.4
- Xerces 2-2.0.2 for XML parsing and validation
- Saxon 6.5 for XSLT transformation
- Schematron 1.5 for validation
- MySQL 3.23.52 and Mckoi relational databases
- Deployment Platforms
- Windows 2000, NT, XP
- Solaris
- Linux
28DEMO
29Deployment Partners
- Los Alamos National Laboratory: Research Library
- Library of Congress: Motion Picture and Recorded Sound Division
- Indiana University: Digital Library group
- King's College London: Humanities Computing
- NYU: Humanities Computing
- Northwestern University: Academic Computing
- Oxford: Oxford Digital Library and The Refugee Studies Center
- Tufts: Digital Collections and Archives Department