Title: A repository case study: The University of Hull
1 A repository case study: The University of Hull
- RSP Fedora training days
- 22-23 January 2009
- Richard Green
- r.green_at_hull.ac.uk
2 A Fedora case study
- The material that follows describes the
development of the Institutional Repository and
associated services at the University of Hull.
- The repository uses Fedora (currently v2.2.3) and
a ? marks places where Fedora's features have
been particularly useful.
3 Question 1
- What is a repository?
- What should a repository do?
4 Wikipedia: a repository
- A repository is a place where data are stored
and maintained for future retrieval. A repository
can be:
- a place where data are stored
- a place where specifically digital data are
stored
- a site where eprints are located
- a place where multiple databases or files are
located for distribution over a network,
- a computer location that is directly accessible
to the user without having to travel across a
network.
- Well, what about an institutional repository then?
5 An institutional repository?
- What is an institutional repository?
- DSpace? EPrints? Fedora? ....?
- a showcase for intellectual output?
- a passive part of a person's workflow?
- an active part of a person's workflow?
- a records management system?
- a preservation system?
- ?????
6 All at the same time?
- Firstly, the notion that one 'institutional
repository' should hold all of a university's
e-objects is an absurd one, and generally
recognized by my audiences as soon as I say it.
The present state of software does not support
such a scheme, nor are the characteristics of the
objects anywhere near uniform. A great deal of
time and money is wasted by people who haven't
yet realized this simple fact. A university needs
several e-repositories or e-libraries,
whatever you call them.
- Arthur Sale
- Professor of Computing (Research), University of
Tasmania
- JISC mail list posting, 14 January 2006
7 All at the same time?
- Institutional repositories are a set of
services that a university offers to the members
of its community for the management and
dissemination of digital materials created by the
institution and its community members.
- Clifford Lynch
- Director, Coalition for Networked Information
- Institutional Repositories: Essential
Infrastructure for Scholarship in the Digital Age
(2003)
8 All at the same time?
- The repositories landscape is wide and hugely
varied, with quite extreme views of what does and
does not, should and should not (could and could
not?) constitute an institutional repository
(IR).
- For better or for worse, Hull has taken the view
that its Institutional Repository should be
content agnostic: effectively the Cliff Lynch
view.
9 The University of Hull IR
- Hull decided to take the IR upstream and offer
it to users as part of their workflow prior to
any sort of publication: a My Repository space
- Repository envisaged as:
- Web Services based
- wide range of content
- IR to be a central, infrastructural service
interacting with other core services
10 IR as part of an infrastructure
11 What do we think they want?
12 The JISC-funded RepoMMan Project
- To develop a simple user interface to My
Repository
- Web Services, orchestrated by BPEL (Business
Process Execution Language), Fedora underneath
- (Fedora can offer the web-services ?, the content
agnosticism ?, and scalability ?)
- Browser-based UI
- after experimentation, this was developed using
FLEX
13 Involving the user
- User needs analysis: what do people want? (not,
what do we think they want?)
- user: researcher, member of Learning and Teaching
teams, administrator, postgraduate, (potentially)
undergraduate, ...
- Interviews followed by on-line survey: first at
Hull, then more widespread
14 Question 2
- What might a user want to get from My
Repository?
- What might a user want to put into My
Repository?
15 What the user wants from My Repository
- we take it as a sine qua non that a personal
repository interface should not make it difficult
to do something that is currently achieved easily
- the repository interface must allow structuring
of a user's personal storage space and have the
capacity to hold potentially large numbers of
objects, possibly of a range of differing types,
for each user ?
- the repository should provide an easily usable
versioning facility (it must be easy to version a
file and to revert to an earlier version) ?
- the repository should allow sharing of a private
document with a closed group of collaborators and
should provide some sort of locking facility so
that conflicting revisions cannot occur
- the repository must make public exposure of
content easy and controllable, taking account of
digital rights issues as part of that process
- Green R (2005) R-D3 Report on research user
requirements on-line survey
16 What the user wants from My Repository
- SAMP
- Storage
- (safe, backed up regularly)
- Access
- (easy and from anywhere they have a browser)
- Managed
- (full version control) ?
- Preservation
- (to know it is there when they want it, short-
and possibly long-term) ?
17 What the user wants in My Repository
- Document files (for example .doc .rtf/rtfd .pdf
.xsd .ps)
- Image files (for example .jpg/jpeg .gif .png .psd
.tif/tiff .eps)
- Audio files (for example .wav .mp3 .aac)
- Video files (for example .wmv .avi .rm .mpg (and
its variants))
- Spreadsheet files (for example .xls .xsc)
- Statistics files (for example from a package like
SPSS)
- Diagrams or CAD (for example from packages such
as Visio or AutoCAD)
- Database files (for example SQL, MySQL, Oracle or
Access files)
- Presentation files (for example PowerPoint files)
- Web pages
- Simple text files (this would include .txt and
.XML files, for example)
- Archive formats (for example Zip or Stuffit
files)
- Specialist text formats (for example from LaTeX)
- Source code and binaries
- ...
- Green R (2005) R-D3 Report on research user
requirements on-line survey
18 The RepoMMan tool
19 The RepoMMan tool
- Left hand side browses local computer
- Right hand side shows the repository represented
as a file structure. In fact folders are
digital collection objects and files are
digital objects
- The large arrows provide up/download
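The folder-and-file presentation described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, the `pid` values and the `as_file_tree` helper are hypothetical and are not part of Fedora or the RepoMMan tool.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    pid: str    # hypothetical identifier, for illustration only
    label: str


@dataclass
class CollectionObject:
    pid: str
    label: str
    members: list = field(default_factory=list)


def as_file_tree(node, depth=0):
    """Return indented lines showing collections as folders and objects as files."""
    indent = "  " * depth
    if isinstance(node, CollectionObject):
        lines = [f"{indent}{node.label}/"]
        for member in node.members:
            lines.extend(as_file_tree(member, depth + 1))
        return lines
    return [f"{indent}{node.label}"]


root = CollectionObject("demo:1", "My Repository", [
    CollectionObject("demo:2", "papers", [DigitalObject("demo:3", "draft.doc")]),
    DigitalObject("demo:4", "notes.txt"),
])
print("\n".join(as_file_tree(root)))
```

The user sees a familiar directory listing, while the repository only ever stores collection objects and digital objects.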
20 The RepoMMan tool
- (Re-) uploading an object creates a version
- Double clicking a file (object) accesses past
versions
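The versioning behaviour on this slide (every re-upload creates a version, and past versions stay reachable) might be modelled as below. It is an in-memory sketch, not Fedora's API: `VersionedObject`, `upload` and `revert_to` are hypothetical names, and treating a revert as a fresh upload of old content is an assumption made so that history is never lost.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Version:
    number: int
    content: bytes
    created: datetime


@dataclass
class VersionedObject:
    """Hypothetical stand-in for a repository object; not Fedora's API."""
    name: str
    versions: list = field(default_factory=list)

    def upload(self, content: bytes) -> Version:
        # Every upload, including a re-upload, appends a new version.
        version = Version(len(self.versions) + 1, content,
                          datetime.now(timezone.utc))
        self.versions.append(version)
        return version

    def current(self) -> Version:
        return self.versions[-1]

    def revert_to(self, number: int) -> Version:
        # Assumption: reverting re-uploads the old content as a new version.
        if not 1 <= number <= len(self.versions):
            raise ValueError(f"no version {number}")
        return self.upload(self.versions[number - 1].content)


obj = VersionedObject("thesis.doc")
obj.upload(b"draft 1")
obj.upload(b"draft 2")
obj.revert_to(1)
```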
21 The RepoMMan tool
- The tool generates metadata for text objects
(.doc, .pdf, .html, .txt) on demand using local
data (web services etc) and Data Fountains iVia
metadata software
Data Fountains: see http://dfnsdl.ucr.edu
22 The system
23 Three-tier stack
- Model View Controller layer providing user
interface
- BPEL orchestrating Web Services (Fedora and
other) to move files and objects around
- Fedora drawing on ID Management System and
University Storage Area Network
24 Three-tier stack: simple deposit
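As a rough illustration of the middle tier, the sketch below shows an orchestration function calling two services in order for a simple deposit. It is plain Python standing in for the role BPEL plays in the stack; `IdentityService`, `RepositoryService`, `deposit` and the `demo:` identifiers are all hypothetical and do not reflect the real Web Service interfaces.

```python
class IdentityService:
    """Hypothetical stand-in for the ID Management System."""

    def __init__(self, known_users):
        self.known_users = set(known_users)

    def is_known(self, user):
        return user in self.known_users


class RepositoryService:
    """Hypothetical stand-in for the repository tier (Fedora in the talk)."""

    def __init__(self):
        self.objects = {}

    def ingest(self, user, name, data):
        pid = f"demo:{len(self.objects) + 1}"
        self.objects[pid] = {"owner": user, "name": name, "data": data}
        return pid


def deposit(identity, repository, user, name, data):
    """Orchestration tier: check the user, then ingest the file."""
    if not identity.is_known(user):
        raise PermissionError(f"unknown user: {user}")
    return repository.ingest(user, name, data)


identity = IdentityService({"rgreen"})
repository = RepositoryService()
pid = deposit(identity, repository, "rgreen", "paper.pdf", b"%PDF")
```

The user-interface tier would only ever call `deposit`; it needs no knowledge of which services sit underneath.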
25 The RepoMMan tool
- The sharing button is not yet implemented but
is still firmly on our shopping list.
- to allow sharing with specified people or groups,
for instance for co-authoring
- Publish is just now implemented but as part of
a bigger scheme (REMAP).
- Once published
26 The main UI: Muradora
- The main repository UI is currently a customised
Muradora.
- Available direct (www) or through the University
portal
- Repository content also available through VLE,
departmental websites etc.
- New UI being developed in conjunction with Fedora
Commons, the University of Virginia and Stanford
University (the Hydra Project).
27 An enterprise production IR
- The requirements of an enterprise, production
repository call for a slightly more complex
provision than we described earlier
- For security (sanity?) reasons My Repository
and the public-facing repository at Hull are
different instances of Fedora. Whole thing needs
watertight security. (Probably paranoia on our
part.)
28 Production IR
29 Enter the JISC-funded REMAP Project
- The REMAP project will provide additional
functionality to the RepoMMan publish process
- Repository objects will be enhanced at the point
of publication to service the needs of records
management and digital preservation (RMDP) ?
- REMAP involves University Archivist and Records
Manager and the Spoken Word Services team at
Glasgow Caledonian University
- Started with user-needs gathering
30 The Publish process
- The digital object in My Repository is cloned
- new object belongs to IR, author has no rights
- The object is reconstructed to fit a standard
content model ? depending on the type of payload,
including RMDP flags
- Object is deposited in a staging area for
checking
- Email (or preferred message) sent to author
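The steps on this slide can be sketched as a single function. This is a minimal illustration, assuming a simple in-memory object: `RepoObject`, the `CONTENT_MODELS` table, the flag names and the message text are all hypothetical, and the real process runs against Fedora rather than Python lists.

```python
import copy
from dataclasses import dataclass, field

# Hypothetical mapping from payload type to a content model name.
CONTENT_MODELS = {".pdf": "text", ".doc": "text", ".jpg": "image", ".wav": "audio"}


@dataclass
class RepoObject:
    label: str
    payload_type: str
    owner: str
    content_model: str = "none"
    rmdp_flags: dict = field(default_factory=dict)
    state: str = "private"


def publish(private_obj, staging_area, outbox, author_email):
    clone = copy.deepcopy(private_obj)   # clone; the private original is untouched
    clone.owner = "IR"                   # the clone belongs to the IR, not the author
    # Rebuild to a standard content model chosen by payload type.
    clone.content_model = CONTENT_MODELS.get(clone.payload_type, "generic")
    # Placeholder RMDP flags, to be set during checking (names are assumptions).
    clone.rmdp_flags = {"review_due": None, "embargo_until": None}
    clone.state = "staged"
    staging_area.append(clone)           # deposit in the staging area for checking
    outbox.append((author_email, f"'{clone.label}' is in staging, awaiting checks"))
    return clone


draft = RepoObject("thesis.pdf", ".pdf", "author1")
staging, outbox = [], []
staged = publish(draft, staging, outbox, "author1@example.org")
```

Working on a deep copy is what lets the author keep editing the private object while the published clone moves through checking.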
31 The Publish process
- Object checked, tweaked and ?approved
- RMDP flags set according to content ?
- Object published with appropriate security ?
- not necessarily available to all
- Author emailed with URL
- Calendar server checks object regularly and
actions any necessary alerts
32 Example alerts
- Events
- Information only: an event has taken place
- Requiring action (workflow): this has just
happened, you need to do that
- Dates
- Information only: object zzzz seems to have
stalled in a workflow since dd/mm/yy
- Information only: document xxxx has not been
deposited, the deadline was dd/mm/yy
- Requiring action: revision/review/update due
- Requiring action: specified lifespan reached.
Hide?
- Requiring action: embargo date reached. Unhide?
- Status
- The repository contains nnn objects of type .vvv
- Green R, Awre C, Burg J, Mays V, Wallace I (2007)
REMAP records management and preservation
requirements
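The date-driven alerts above could be produced by a periodic check along the following lines. This is a sketch only: the dictionary keys (`review_due`, `expires`, `embargo_until`) and the `due_alerts` function are hypothetical, chosen to mirror the three "requiring action" examples on the slide.

```python
from datetime import date


def due_alerts(obj, today):
    """Return the 'requiring action' alerts whose date has been reached."""
    alerts = []
    if obj.get("review_due") and obj["review_due"] <= today:
        alerts.append("Requiring action: revision/review/update due")
    if obj.get("expires") and obj["expires"] <= today:
        alerts.append("Requiring action: specified lifespan reached. Hide?")
    if obj.get("embargo_until") and obj["embargo_until"] <= today:
        alerts.append("Requiring action: embargo date reached. Unhide?")
    return alerts


paper = {
    "label": "zzzz",
    "embargo_until": date(2009, 1, 1),   # already passed on the check date
    "expires": date(2010, 1, 1),         # still in the future
}
alerts = due_alerts(paper, today=date(2009, 1, 22))
```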
33 Responses
- Take default action
- Snooze for xx days
- Alerts must be capable of grouping
- all the 2008 Geography papers, not twenty alerts
for individual Geography papers
- Yes/no to all and more granular choice
- hide these but not those
- Messages and alerts need an importance attached
to them so that administrators can rank by
urgency
- Invoke preservation action
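Grouping and ranking, as asked for above, might look like this. The alert fields (`department`, `year`, `message`, `importance`) and the rule that a group takes the highest importance of its members are assumptions made for the sketch, not REMAP's design.

```python
from collections import defaultdict


def group_alerts(alerts):
    """Collapse alerts that share department, year and message; rank by importance."""
    groups = defaultdict(list)
    for alert in alerts:
        key = (alert["department"], alert["year"], alert["message"])
        groups[key].append(alert)
    summaries = [
        {
            "department": department,
            "year": year,
            "message": message,
            "count": len(items),
            # Assumption: a group is as urgent as its most urgent member.
            "importance": max(item["importance"] for item in items),
        }
        for (department, year, message), items in groups.items()
    ]
    return sorted(summaries, key=lambda s: s["importance"], reverse=True)


geography = [
    {"department": "Geography", "year": 2008,
     "message": "review due", "importance": 2}
    for _ in range(20)
]
history = [{"department": "History", "year": 2007,
            "message": "embargo date reached", "importance": 5}]
ranked = group_alerts(geography + history)
```

An administrator then answers one grouped alert for the twenty Geography papers, with the more urgent History alert listed first.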
34 ...but
- ...but that is work in progress, due to finish
March 2009.
- and then there is Hydra
- a collaboration between Universities of Hull,
Stanford and Virginia with Fedora Commons
- to build a highly flexible, (re-)configurable,
end-to-end workflow-based Fedora system, used
through a browser client, from My Repository
right through to preservation
35 And the thanks!!!
- This work owes much to colleagues elsewhere, key
amongst them:
- Ian Dolphin
- now International Director of the e-Framework for
Education and Research, previously Director of
e-Strategy, University of Hull
- Chris Awre
- Information Architect, eSIG, University of Hull
- Robert Sherratt
- Technical Manager, eSIG, University of Hull
- Simon Lamb
- Lead software developer, RepoMMan and REMAP,
eSIG, University of Hull
- numerous contacts around the UK, and
internationally, involved with Fedora, Data
Fountains, Muradora and BPEL and, of course, the
JISC who funded (and fund!) much of this work.
36 Links
- www.hull.ac.uk/esig/repomman
- www.hull.ac.uk/remap
- edocs.hull.ac.uk
- r.green_at_hull.ac.uk