Title: WMS, RUcore and Fedora Mini-Conference
1. WMS, RUcore and Fedora Mini-Conference
- Wednesday Morning
  - Greetings and Introduction - Grace
  - Collaboration and Architecture Overview - Ron
  - RUcore Data Model - Grace
  - WMS Tutorial - Mary Beth, Kalaivani, Sharon
  - Lunch (box lunch in conference room)
- Wednesday Afternoon
  - Hands-On Experience - Mary Beth, Kalaivani, Sharon
  - Feedback from WMS sessions
  - Collaboration Discussion - All
2. WMS, RUcore and Fedora Mini-Conference
- Thursday Morning
  - Brief Recap - Ron
  - WMS architecture - Yang
  - User Interface, Search engine and collections - Chad
  - Management services - Ron
  - Lunch (on your own)
- Thursday Afternoon
  - Further collaboration discussion
  - Wrap-up and next steps
3. Possible Areas for Collaboration
- Sharing Content
  - Exchange, harvesting
  - Federated Searching
- Fedora Experimentation
  - Relationship services
  - Directory ingest
  - Use of XACML
  - Very large files
  - Event management
- Data Registries
  - File formats
  - Content Models
- Software Development
  - Requirements
  - Sharing software
  - Joint development
  - Life cycle support
4. Fedora Enterprise Architecture - Major Goals, 2007 through 2009
- Paradigm Focus
- Scholarly Communication Collaboration
- Libraries and Museums Access and Publishing
- Infinite Scalability
- Size of and number of objects
- Capacity and throughput (e.g. ingest 20TB a day)
- Life cycle preservation
- Trust Model
- Transactions - Begin/Commit
- Transactions across repositories
- Enable graph-based objects (compound objects)
5. Persistence and Layered Architecture
Applications
Data
6. Layered Architecture - RUcore
- Applications and Portals (NJDH, RUcore, workflow, etc.)
- Middleware Services (searching, alerting, integrity, etc.)
- Fedora Core Framework
- FOXML Datastreams
7. RUcore - How it Works
[Diagram. Labels: RUCORE Portal; NJ Digital Highway; Custom Portals; Dissertations; Faculty Submissions; User, Collection, Preservation Services; Workflow Management System; Fedora Repository Service; Digital Object Ingest; Digital Object Repository (Fedora).]
8. Simple and Compound Objects
[Diagram: Article Object (Simple). Labels: Persistent ID; Metadata; Behaviors (Disseminators); Data streams.]
- SMAP1 - StrMap (TOC)
- DJVU1 - presentation
- PDF1 - presentation
- XML1 - OCR text
- ARCH1 - Archival master (TIFFs of each page)
[Diagram: Compound Object - Graph Model. Labels: article; A1; A2; IsAnnotationOf (shown twice).]
9. Collections in RUcore
- A digital collection is simply a grouping of objects according to some criteria.
- Types of digital collections in RUcore:
  - Explicit: a digital collection whose object membership is specified explicitly within the descriptive metadata.
  - Dynamic: a digital collection of objects that are grouped according to user-specified criteria.
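
The two membership types can be sketched in a few lines of Python. This is only an illustration, not RUcore code: the `DigitalObject` class, its `member_of` and `subjects` fields, the collection identifiers, and the sample PIDs are all hypothetical stand-ins for whatever the descriptive metadata actually records.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    pid: str
    title: str
    # Hypothetical: collection IDs named in the object's descriptive metadata.
    member_of: set = field(default_factory=set)
    # Hypothetical: subject terms a dynamic collection might search on.
    subjects: set = field(default_factory=set)


def explicit_members(collection_id, objects):
    """Explicit collection: the object's own metadata names the collection."""
    return [o for o in objects if collection_id in o.member_of]


def dynamic_members(criteria, objects):
    """Dynamic collection: any object matching user-specified criteria."""
    return [o for o in objects if criteria(o)]


# Made-up sample objects for illustration only.
objects = [
    DigitalObject("rutgers-lib:1", "Thesis on wetlands", {"ru-etds"}, {"GIS", "wetlands"}),
    DigitalObject("rutgers-lib:2", "County land-use map", {"njdh"}, {"GIS"}),
    DigitalObject("rutgers-lib:3", "Oral history interview", {"njdh"}, {"history"}),
]

njdh = explicit_members("njdh", objects)
gis = dynamic_members(lambda o: "GIS" in o.subjects, objects)
print([o.pid for o in njdh])
print([o.pid for o in gis])
```

Note that an object can sit in one explicit collection and still be picked up by any number of dynamic collections, because dynamic membership is computed from a query rather than stored in the object.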
10. Using Explicit and Dynamic Collections
- Personal Collections
- Department Collections
  - Including faculty personal collections (e.g. preprints, reports, etc.)
  - ETDs for the department
- Centers and Grant-Funded Research
  - New Jersey Digital Highway
  - Center for Remote Sensing and Spatial Analysis (CRSSA): access and preservation of GIS resources related to New Jersey
11. RUcore Collection Architecture
[Diagram. Legend: circles are collection objects, rectangles are content objects; a solid line is explicit membership, a dashed line is dynamic membership. Labels: RUCORE; NJDH (Grant Project); Rutgers University Libraries; Rutgers University; Eagleton Archive; Centers/Departments; General Collections; Special Collections.]
12. Collection Architecture - Lefty
[Diagram. Legend: a solid line is explicit membership, a dashed line is dynamic membership. Labels: RUCORE; NWestern (1782.1); RUL (1782.1); Center/Dept Collections; RU ETDs; FacColl One; FacColl Two; Dept. ETDs.]
- http://hdl.rutgers.edu/1782.1/NorthwesternU.collection.165
- http://hdl.rutgers.edu/1782.1/PennStateUniv.collection.164
- http://hdl.rutgers.edu/1782.1/PrincetonUniv.collection.166
13. Management Services (incl. Collection and Preservation)
- Management
  - Super-user editing (handles, datastreams, metadata)
  - Purging an object
  - Export (FOXML, METS)
- Collections
  - Collection administration
  - Statistics
- Preservation
  - Creation of archival master
  - Creation of persistent ID (handle)
  - Checksum verification
14. Management Services
- Access to individual objects is provided by a special search portal that uses the same indexes as the public search but adds Fedora API management functionality:
  - Viewing, exporting, and/or purging objects
  - Editing metadata, adding/changing datastreams
  - Validating objects, checking audit trails, testing signatures
- There is a special Fedora database search that allows access to all objects, whether or not they are members of an active collection.
15. Collection Administration
- Edit collection information
- Add parents to a collection
- Add dynamic search terms to a collection
- Generate an XML structure map
16. Collections - Indexing and Ingest
- Active collections may be indexed individually or all together at any time, though this is typically done by a nightly cron job.
- Ingest is done through the management API and is typically called by the WMS program, but it may also be called directly from the management interface.
17. Preservation - Alerting
- All Fedora API management functions trigger alerting messages, are stored in the Fedora audit trails, and are registered in the collection statistics database.
- Statistics are kept for all object downloads as well as editing activities and may be accessed at the collection or repository level.
18. Preservation - PIDs and Handles
- Handles are normally created as part of the ingest process, but may be manually created, changed, or purged on a per-object basis using the management interface.
- Three global registries for RU:
  - 1782.1 - Rutgers University Libraries
  - 1782.2 - Rutgers University
  - 1782.3 - NJ Digital Highway
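
As a rough illustration of how a handle prefix maps to a registry, the sketch below builds and splits handle URLs of the shape used by the collection URLs on the Collection Architecture slide. The function names `handle_url` and `split_handle` are hypothetical, and the assumption that every handle resolves under `hdl.rutgers.edu` as prefix/suffix is drawn only from those three example URLs.

```python
from urllib.parse import urlparse

# Prefix-to-registry table taken from the slide above.
REGISTRIES = {
    "1782.1": "Rutgers University Libraries",
    "1782.2": "Rutgers University",
    "1782.3": "NJ Digital Highway",
}

# Assumption: resolver host as it appears in the example collection URLs.
RESOLVER = "http://hdl.rutgers.edu"


def handle_url(prefix, suffix):
    """Build a resolvable handle URL, rejecting unknown prefixes."""
    if prefix not in REGISTRIES:
        raise ValueError(f"unknown handle prefix: {prefix}")
    return f"{RESOLVER}/{prefix}/{suffix}"


def split_handle(url):
    """Return (prefix, suffix, registry name) for a handle URL."""
    path = urlparse(url).path.lstrip("/")
    prefix, _, suffix = path.partition("/")
    return prefix, suffix, REGISTRIES.get(prefix, "unknown")
```

For example, `split_handle` applied to the Northwestern collection URL yields the prefix `1782.1`, the suffix `NorthwesternU.collection.165`, and the registry name "Rutgers University Libraries".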
19. Object Integrity - Verifying Checksums
- Archival datastreams have SHA1 checksums, created during the WMS pipeline process, as well as file size data, both stored in the technical metadata section of each object.
- SHA1 checksums are tested using the sha1sum checking algorithm in conjunction with a management function that polls the repository and extracts sha1sum character strings from the techMD of individual objects or groups of objects. The function has a calendar feature that allows it to be run as a cron job on a subset of objects for each day of the week, with result reports emailed to the appropriate data managers.
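
The checking step described above can be sketched with Python's standard `hashlib`. This is a minimal illustration under stated assumptions, not the RUcore management function: the function names are hypothetical, the recorded checksum and file size are passed in directly rather than extracted from techMD, and the weekday split is one simple way a "subset per day of the week" could be chosen.

```python
import datetime
import hashlib
from pathlib import Path


def sha1_of_file(path, chunk_size=1 << 20):
    """Compute the SHA1 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_sha1, expected_size):
    """Compare a file against its recorded size and SHA1; return a list of problems."""
    problems = []
    actual_size = Path(path).stat().st_size
    if actual_size != expected_size:
        problems.append(f"size {actual_size} != recorded {expected_size}")
    if sha1_of_file(path) != expected_sha1.lower():
        problems.append("SHA1 mismatch")
    return problems


def todays_subset(pids, today=None, buckets=7):
    """Assumed scheme: spread objects over the week by position in sorted order."""
    weekday = (today or datetime.date.today()).weekday()  # Monday is 0
    return [pid for i, pid in enumerate(sorted(pids)) if i % buckets == weekday]
```

A nightly cron job could call `todays_subset` on the full PID list, run `verify` for each archival datastream in that subset, and email any non-empty problem lists to the data managers.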
20. Certification as a Trusted Repository
- Ultimately, we want to become certified as a trusted repository. There are four major areas:
  A. Organization
     Repository staff have skills appropriate to their duties.
  B. Repository Functions
     Repository actively monitors Archival Information Package integrity.
  C. Designated Community
     Repository defines its Designated Community.
  D. Technologies
     Repository has technologies to monitor security.
- RLG/NARA draft: An Audit Checklist for the Certification of Trusted Digital Repositories
21. Preservation Services Architecture
[Diagram. Labels: Preservation Portal; Preservation Services; Alerting; Migration; Statistics; Monitoring; ...; Event Messaging; Preservation Integrity; Preservation Monitoring; Fedora Repository Service; Content Models; Digital Object Repository; Format Registry; Fedora Service Framework.]
22. Content Models (Content Model Dissemination Architecture, CMDA)
- The CM object specifies constraints on the digital object (DO)
  - MIME type and format
  - Min/max number of datastreams
  - Whether multiple datastreams are ordered
- The CM is used to determine runtime behavior
  - On ingest, Fedora validates the DO based on CM constraints
  - Disseminators are not bound into the DO
  - Run-time binding occurs through the CM object and the RELS-EXT datastream
- The CM can point to a format registry
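
To make the idea of constraint checking on ingest concrete, here is a small Python sketch. It is not Fedora's CMDA implementation: the `DatastreamRule` and `ContentModel` classes, the matching of datastreams by ID prefix, and the specific MIME types in the example are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DatastreamRule:
    ds_id_prefix: str                # hypothetical: "PDF" matches PDF1, PDF2, ...
    mime_type: str                   # required MIME type for matching datastreams
    min_count: int = 0
    max_count: Optional[int] = None  # None means no upper bound


@dataclass
class ContentModel:
    name: str
    rules: list


def validate(datastreams, cm):
    """Check a DO's datastreams (ID -> MIME type) against CM rules; return violations."""
    errors = []
    for rule in cm.rules:
        matching = {i: m for i, m in datastreams.items() if i.startswith(rule.ds_id_prefix)}
        count = len(matching)
        if count < rule.min_count:
            errors.append(f"{rule.ds_id_prefix}: {count} found, at least {rule.min_count} required")
        if rule.max_count is not None and count > rule.max_count:
            errors.append(f"{rule.ds_id_prefix}: {count} found, at most {rule.max_count} allowed")
        for ds_id, mime in matching.items():
            if mime != rule.mime_type:
                errors.append(f"{ds_id}: MIME type {mime}, expected {rule.mime_type}")
    return errors


# Illustrative content model; the rules are made up, not taken from RUcore.
article_cm = ContentModel("article", [
    DatastreamRule("PDF", "application/pdf", min_count=1, max_count=1),
    DatastreamRule("ARCH", "image/tiff", min_count=1),
    DatastreamRule("XML", "text/xml", max_count=1),
])

ok = validate({"PDF1": "application/pdf", "ARCH1": "image/tiff", "XML1": "text/xml"}, article_cm)
bad = validate({"PDF1": "application/pdf", "PDF2": "text/html"}, article_cm)
print(ok)
print(bad)
```

Keeping the rules in the CM object rather than in each DO mirrors the slide's point that behavior is bound at run time through the content model, so changing a rule does not require touching every object.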
23. Content Models, Formats, and Disseminators
24. Events and Outcomes
- An event is an
  - ... action that involves at least one object, agent, and/or rights entity (PREMIS).
  - ... occurrence that is significant to the performance of a task.
- Event outcome: a situation or state that follows an event and is a result of the event.
25. Fedora Event Management
- Generic Framework
  - Events can have messages which are associated with all types of services (preservation, collection, user, etc.)
  - Messages represent events with actions and outcomes
  - Fedora will provide a middleware messaging solution based on an open-source Java Message Service (JMS) implementation
- Fedora Working Group Focus
  - Preservation events are atomic (i.e. associated with a Fedora API)
  - The event message will be based on the PREMIS event entity
  - Initial types: ingest, delete, modify, fixityCheck
26. The Event Message
- Event message structure
  - The message payload will be XML-based and use the PREMIS event entity semantic units
  - Global identifiers (URIs) will be used for event type and outcome
- An example might look like the following:

    <event>
      <eventIdentifier>
        <eventIdentifierType>Rucore event</eventIdentifierType>
        <eventIdentifierValue>30169</eventIdentifierValue>
      </eventIdentifier>
      <eventType>info:premis/preservation/event/ingest</eventType>
      <eventDateTime>2006-07-16T19:20:30</eventDateTime>
      <eventDetail>(to be used for general information)</eventDetail>
      <eventOutcomeInformation>
        <eventOutcome>info:premis/preservation/outcome/success</eventOutcome>
        <eventOutcomeDetail>(more text)</eventOutcomeDetail>
      </eventOutcomeInformation>
      <linkingAgentIdentifier>rutgers-lib:200</linkingAgentIdentifier>
      <linkingAgentIdentifier>rutgers-lib:400</linkingAgentIdentifier>
      <linkingObjectIdentifier>rutgers-lib:4291</linkingObjectIdentifier>
    </event>
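
A message with the element layout of the example above could be assembled with Python's standard `xml.etree.ElementTree`. This is a sketch only: `build_event` and its parameters are hypothetical names, the element names are copied from the slide's example rather than from a PREMIS schema, and no namespace or schema validation is attempted.

```python
import xml.etree.ElementTree as ET


def build_event(event_id, event_type, when, outcome, agents, obj,
                detail="", outcome_detail=""):
    """Assemble an event element following the layout of the slide's example."""
    event = ET.Element("event")
    ident = ET.SubElement(event, "eventIdentifier")
    ET.SubElement(ident, "eventIdentifierType").text = "Rucore event"
    ET.SubElement(ident, "eventIdentifierValue").text = str(event_id)
    ET.SubElement(event, "eventType").text = event_type
    ET.SubElement(event, "eventDateTime").text = when
    ET.SubElement(event, "eventDetail").text = detail
    info = ET.SubElement(event, "eventOutcomeInformation")
    ET.SubElement(info, "eventOutcome").text = outcome
    ET.SubElement(info, "eventOutcomeDetail").text = outcome_detail
    for agent in agents:  # one linkingAgentIdentifier per agent
        ET.SubElement(event, "linkingAgentIdentifier").text = agent
    ET.SubElement(event, "linkingObjectIdentifier").text = obj
    return event


# Values mirror the slide's example; the identifiers are illustrative.
event = build_event(
    30169,
    "info:premis/preservation/event/ingest",
    "2006-07-16T19:20:30",
    "info:premis/preservation/outcome/success",
    ["rutgers-lib:200", "rutgers-lib:400"],
    "rutgers-lib:4291",
)
xml_text = ET.tostring(event, encoding="unicode")
print(xml_text)
```

Building the payload through an XML library rather than string concatenation guarantees matched open and close tags and correct escaping of any special characters in the detail text.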
27. Event Management - Ingest (Using the publisher/subscriber model)
[Diagram. Labels: User Input; Workflow Management System; Digital Object Ingest; Digital Object Repository (Fedora); JMS Topic Queue; event messages <eventType>ingest</eventType>, <eventType>delete</eventType>, and further <eventType> messages.]
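
The publisher/subscriber model named in this slide's title can be illustrated with a tiny in-process broker. This is a stand-in written in Python, not JMS and not Fedora's messaging layer: `TopicBroker`, the two subscriber functions, and the message dictionary are hypothetical, and a real JMS topic would add durable delivery, network transport, and asynchronous consumers.

```python
from collections import defaultdict


class TopicBroker:
    """Minimal in-process publish/subscribe broker keyed by event type."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type, callback):
        """Register a callback to receive every message of this event type."""
        self._subscribers[event_type].append(callback)

    def publish(self, event_type, message):
        """Deliver the message to each subscriber; return how many received it."""
        callbacks = list(self._subscribers.get(event_type, []))
        for callback in callbacks:
            callback(message)
        return len(callbacks)


# Hypothetical subscribers standing in for alerting and statistics services.
alerts = []
counts = defaultdict(int)


def alerting_service(message):
    alerts.append(f"{message['eventType']}: {message['object']}")


def statistics_service(message):
    counts[message["eventType"]] += 1


broker = TopicBroker()
broker.subscribe("ingest", alerting_service)
broker.subscribe("ingest", statistics_service)
broker.subscribe("delete", statistics_service)

# The repository acts as publisher once an ingest completes.
delivered = broker.publish("ingest", {"eventType": "ingest", "object": "rutgers-lib:4291"})
print(delivered, alerts, dict(counts))
```

The publisher never needs to know which services exist; adding, say, a migration service is a matter of one more `subscribe` call, which is the main appeal of the topic model for preservation events.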