Title: Optimising metadata workflows in a distributed information environment
1. Optimising metadata workflows in a distributed information environment
- R. John Robertson and Jane Barton, Centre for Digital Library Research, University of Strathclyde, UK
2. Overview
- Introductions and definitions
  - Metadata, workflow and optimisation
  - Diversity and the distributed information environment
- Models and frameworks
  - Generic models: repositories, objects and metadata
  - Existing models and frameworks
  - Developing a metadata lifecycle model
- Using the metadata lifecycle model to optimise workflow
- Moving forward
3. Metadata, workflow and optimisation
- Metadata: good quality metadata, i.e. metadata that meets repository requirements
- Metadata workflow: quality assured metadata by design, i.e. metadata creation and QA processes designed to meet repository requirements with the available resources
- Metadata workflow optimisation: refining metadata workflow to improve quality and enhance metadata
  - Critical to the functionality, interoperability and sustainability of repositories
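A minimal Python sketch of the definition above: it assumes a repository's requirements can be written down as a set of mandatory elements plus controlled vocabularies for some elements, so that "meets repository requirements" becomes a checkable condition. The class and function names, element names and vocabulary values are hypothetical, not taken from any particular application profile.

```python
from dataclasses import dataclass, field


@dataclass
class RepositoryRequirements:
    """Hypothetical statement of what one repository requires of its metadata."""
    mandatory: set[str]
    # element name -> allowed values (a controlled vocabulary for that element)
    vocabularies: dict[str, set[str]] = field(default_factory=dict)


def qa_check(record: dict[str, list[str]], reqs: RepositoryRequirements) -> list[str]:
    """Return the problems found; an empty list means the record meets the requirements."""
    problems = []
    # Every mandatory element must be present with at least one value.
    for element in sorted(reqs.mandatory):
        if not record.get(element):
            problems.append(f"missing mandatory element: {element}")
    # Every value of a vocabulary-controlled element must come from that vocabulary.
    for element, allowed in reqs.vocabularies.items():
        for value in record.get(element, []):
            if value not in allowed:
                problems.append(f"{element}: '{value}' not in controlled vocabulary")
    return problems


reqs = RepositoryRequirements(
    mandatory={"title", "creator", "subject"},
    vocabularies={"type": {"Text", "Image", "Dataset"}},
)
record = {"title": ["Annual report"], "creator": ["Smith, A."], "type": ["Report"]}
print(qa_check(record, reqs))
# ['missing mandatory element: subject', "type: 'Report' not in controlled vocabulary"]
```

In a workflow designed for quality assured metadata, a check of this kind would sit at the QA step, with the requirements themselves chosen to match the resources actually available.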
4. Optimising metadata workflow
- (Barton, J. and Robertson, R.J. Designing workflows for quality assured metadata. CETIS Metadata and Digital Repositories SIG Meeting, Edinburgh, 10 March 2005.)
5. Diversity and the dIE
- In the wider environment, there is considerable diversity
  - of purpose
  - of metadata requirements
  - of metadata creation processes and priorities
- Diversity presents challenges for interoperability between repositories
- Diversity also offers potential for refinement of metadata workflow among repositories
  - Assumes/requires persistent object identifiers
6. Optimising metadata workflow in the dIE
- Workflow optimisation requires a model of the dIE
  - to facilitate strategic partnerships
  - to inform the allocation of resources
  - to foster a holistic approach to the creation, augmentation and enhancement of metadata
- To achieve this, two conditions must be met:
  - local workflow must be articulated
  - local workflow must be placed in the context of the wider environment
7. Reference models for workflow optimisation
- Ecology of repositories
  - provides a typology of repositories and associated services
  - models the relationships between them and between their domains
- Object lifecycle model
  - profiles objects within repositories and their movement, transformation and adaptation within the dIE
- Metadata lifecycle model
  - profiles metadata within repositories and its movement, augmentation and enhancement within the dIE
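To make the metadata lifecycle model's "movement, augmentation and enhancement" profile more concrete, here is a minimal Python sketch of one way it might be recorded: as a trail of events keyed on a persistent object identifier, which, as noted above, the approach assumes. The four stages, the class and function names, the identifier and the repository names are illustrative assumptions, not part of the model as presented.

```python
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    CREATED = "created"      # metadata first created in a repository
    MOVED = "moved"          # harvested or imported by another repository
    AUGMENTED = "augmented"  # new elements added
    ENHANCED = "enhanced"    # existing values improved


@dataclass(frozen=True)
class LifecycleEvent:
    object_id: str                  # persistent identifier of the described object
    stage: Stage
    repository: str
    elements: tuple[str, ...] = ()  # elements affected by this event


def repositories_holding(trail: list[LifecycleEvent], object_id: str) -> list[str]:
    """Repositories through which metadata about object_id has passed, in first-seen order."""
    seen: list[str] = []
    for event in trail:
        if event.object_id == object_id and event.repository not in seen:
            seen.append(event.repository)
    return seen


def contributions(trail: list[LifecycleEvent], object_id: str) -> dict[str, set[str]]:
    """Which elements each repository created, augmented or enhanced for object_id."""
    result: dict[str, set[str]] = {}
    for event in trail:
        if event.object_id == object_id and event.stage is not Stage.MOVED:
            result.setdefault(event.repository, set()).update(event.elements)
    return result


trail = [
    LifecycleEvent("urn:example:obj1", Stage.CREATED, "dept_repo", ("title", "creator")),
    LifecycleEvent("urn:example:obj1", Stage.MOVED, "ir"),
    LifecycleEvent("urn:example:obj1", Stage.AUGMENTED, "ir", ("subject",)),
    LifecycleEvent("urn:example:obj1", Stage.ENHANCED, "aggregator", ("subject",)),
]
print(repositories_holding(trail, "urn:example:obj1"))  # ['dept_repo', 'ir', 'aggregator']
print(contributions(trail, "urn:example:obj1")["ir"])   # {'subject'}
```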
8. Existing models and frameworks
- Existing models that relate to (parts of) the reference models:
  - the E-Learning Framework
  - McLean and Blinco's "cosmic view"
  - the JISC Information Environment
  - CORDRA
  - the work of Gonçalves et al.
9. The E-Learning Framework (ELF)
- A common approach to service-oriented architectures for education via
  - a definitional model of service components
  - standards and tools to support their interoperability
- Addresses a specific domain and provides a typology of functions within that domain
- (The E-Learning Framework. http://www.elframework.org)
10. McLean and Blinco's "cosmic view"
- A service domain typology of repositories
  - more comprehensive than ELF but less detailed
  - highlights the potential for a cross-domain approach
  - identifies the need for better articulation of context and for methodologies to deal with complex contextual issues
- (McLean, N. The ecology of repository services: a cosmic view. ECDL, 2004. http://www.ecdl2004.org/presentations/mclean/)
11. The JISC Information Environment
- Provides convenient access to a comprehensive collection of scholarly and educational materials
  - can be viewed as a specific implementation of ELF
  - provides a superstructure to inform and co-ordinate technical infrastructure development
  - focuses on technical solutions to support structural and syntactical interoperability
  - is taking a lead in addressing unresolved issues in the object lifecycle
- (JISC. Strategic activities: Information Environment. 2004. http://www.jisc.ac.uk/about_info_env.html)
12. CORDRA
- Enables access to a wide range of learning object repositories (LORs) through federated searching
  - sets a high common denominator for participating LORs
  - creates a community of repositories behind an interoperability boundary
  - assumes federation as the method of interaction, with metadata integration rather than interoperability, so there is little potential for metadata workflow optimisation
- (Kraan, W. and Mason, J. Issues in federating repositories: a report on the first International CORDRA Workshop. D-Lib Magazine, 11(3), 2005.)
13. Gonçalves et al.'s 5S
- Complex formal taxonomy of repositories
  - comprehensively catalogues repositories from five perspectives
  - engages with all three reference models, but does not engage with interactions and offers only a static view
- (Gonçalves, M.A. et al. Streams, structures, spaces, scenarios, societies (5S): a formal model for digital libraries. ACM Transactions on Information Systems, 22(2), 2004.)
14. Existing models and frameworks
- In general, existing models
  - address structural and syntactic interactions to a degree, but do not address semantic interactions
  - provide voices, vocabularies and grammar for repositories
  - could usefully be extended to profile not only what repositories do but how they might interact with each other
15. Developing a metadata lifecycle model
- A metadata lifecycle model (MLM) must
  - include profiles of each repository's metadata, ideally at element level, more realistically in terms of structure, semantics and syntax
  - distinguish between local requirements and those of the wider community
  - enable clusters of similar repositories to be identified and relationships established
  - include processes carried out as a result of these relationships, formal or informal
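The requirements above can be sketched as a data structure. This is a minimal Python illustration only: it assumes a repository's profile can be reduced to a structure (element set), a syntax, and a set of controlled vocabularies standing in for semantics, with local and community requirements kept apart, and it uses Jaccard similarity of vocabularies as a stand-in measure of "similar". The class and function names, repository names and the 0.5 threshold are all hypothetical.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class MetadataProfile:
    """Hypothetical MLM profile of one repository's metadata."""
    name: str
    structure: str          # element set or schema, e.g. "LOM" or "DC"
    syntax: str             # serialisation, e.g. "XML"
    vocabularies: set[str]  # semantics: controlled vocabularies in use
    local_requirements: set[str] = field(default_factory=set)
    community_requirements: set[str] = field(default_factory=set)
    # relationships with other repositories: (partner name, "formal" or "informal")
    relationships: list[tuple[str, str]] = field(default_factory=list)


def jaccard(a: set[str], b: set[str]) -> float:
    """|a & b| / |a | b|, or 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_pairs(profiles: list[MetadataProfile], threshold: float = 0.5):
    """Pairs sharing structure and syntax whose vocabularies overlap at least `threshold`."""
    pairs = []
    for p, q in combinations(profiles, 2):
        if p.structure == q.structure and p.syntax == q.syntax:
            score = jaccard(p.vocabularies, q.vocabularies)
            if score >= threshold:
                pairs.append((p.name, q.name, round(score, 2)))
    return pairs


profiles = [
    MetadataProfile("lor_a", "LOM", "XML", {"MeSH", "LCSH"}),
    MetadataProfile("lor_b", "LOM", "XML", {"MeSH", "LCSH", "DDC"}),
    MetadataProfile("ir_c", "DC", "XML", {"LCSH"}),
]
print(similar_pairs(profiles))  # [('lor_a', 'lor_b', 0.67)]
```

Requiring identical structure and syntax before comparing semantics is a deliberately strict simplification; a fuller model would treat crosswalks and mappings as bridging those differences.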
16. Components of the model
17. Using the MLM to optimise workflow
- The MLM enables repositories to optimise workflow by
  - exploiting known metadata sources elsewhere in the dIE via intelligent import or harvesting
  - exploiting formal metadata relationships between repositories and services via negotiation and the establishment of minimum standards
- The MLM also provides a framework for assessing the cost/benefit of, e.g., implementing particular metadata elements or participating in consortia
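A minimal Python sketch of "exploiting known metadata sources" via intelligent import: given the elements a local record lacks, look up which repositories in the dIE are known to supply them. It assumes a hypothetical MLM lookup mapping each repository to the elements it can provide for objects sharing a persistent identifier, and treats the listing order of sources as a preference ranking; the function and repository names are illustrative.

```python
def plan_import(
    missing: set[str], supplies: dict[str, set[str]]
) -> tuple[dict[str, str], set[str]]:
    """Match each missing element to a known source.

    `supplies` is a hypothetical MLM lookup: repository name -> elements it can
    provide for an object identified by a shared persistent identifier. Sources
    are tried in the order listed, standing in for a preference ranking.
    Returns (element -> chosen source, elements no known source supplies).
    """
    plan: dict[str, str] = {}
    unresolved: set[str] = set()
    for element in sorted(missing):
        source = next((s for s, elems in supplies.items() if element in elems), None)
        if source is None:
            unresolved.add(element)  # no known source: must be created locally
        else:
            plan[element] = source
    return plan, unresolved


supplies = {
    "institutional_repo": {"creator", "date"},
    "subject_repo": {"subject", "audience"},
}
plan, unresolved = plan_import({"subject", "date", "rights"}, supplies)
print(plan)        # {'date': 'institutional_repo', 'subject': 'subject_repo'}
print(unresolved)  # {'rights'}
```

The unresolved set marks where local effort would be concentrated, which is also where a cost/benefit judgement about implementing an element would apply.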
18. Using the MLM: example
- The NSDL (National Science Digital Library) is a centralised service harvesting metadata from multiple sources
  - breaks harvested metadata into elements and assigns provenance metadata to them
  - creates optimum records by combining metadata elements from various sources
  - creates metadata profiles of sources to enable these processes to be automated
  - demonstrates that metadata workflow optimisation and intelligent harvesting can yield real benefits
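The general idea of element-level provenance and combining elements into an optimum record can be sketched in a few lines of Python. This is not the NSDL's implementation, which is not described here: it assumes a hypothetical source profile giving each source a quality score per element, and for each element keeps the values from the best-scoring source. Scores, names and records are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    """One harvested metadata value, tagged with its provenance."""
    element: str
    value: str
    source: str


def split_record(record: dict[str, list[str]], source: str) -> list[Statement]:
    """Break a harvested record into element-level statements carrying provenance."""
    return [Statement(el, v, source) for el, values in record.items() for v in values]


def optimum_record(
    statements: list[Statement], profiles: dict[str, dict[str, float]]
) -> dict[str, list[Statement]]:
    """For each element, keep only the values from the source rated highest for it.

    `profiles` is a hypothetical source profile: source -> {element: score}.
    Unscored (source, element) pairs count as 0.0; on a tie the first source seen wins.
    """
    best: dict[str, tuple[float, str]] = {}
    for st in statements:
        score = profiles.get(st.source, {}).get(st.element, 0.0)
        if st.element not in best or score > best[st.element][0]:
            best[st.element] = (score, st.source)
    merged: dict[str, list[Statement]] = {}
    for st in statements:
        if best[st.element][1] == st.source:
            merged.setdefault(st.element, []).append(st)
    return merged


profiles = {"repo_a": {"title": 0.9, "subject": 0.3}, "repo_b": {"title": 0.5, "subject": 0.8}}
harvest_a = {"title": ["Heart anatomy"], "subject": ["Biology"]}
harvest_b = {"title": ["Heart anatomy (notes)"], "subject": ["Heart", "Cardiovascular System"]}
statements = split_record(harvest_a, "repo_a") + split_record(harvest_b, "repo_b")
merged = optimum_record(statements, profiles)
for element, values in merged.items():
    print(element, [s.value for s in values], values[0].source)
# title ['Heart anatomy'] repo_a
# subject ['Heart', 'Cardiovascular System'] repo_b
```

Because every kept value is still a `Statement`, the combined record retains its provenance, which is what allows a source to later re-harvest an improved record.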
19. Using the MLM: use cases
- An LOR using LOM wants to harvest metadata records; it has crosswalks and mappings for structure and syntax, and seeks repositories with a similar semantic approach
- A federated search service wants to dynamically select search targets that can support MeSH
- A departmental repository enhances its metadata by re-harvesting general subject terms from its IR and specialist subject terms from a subject repository
- A centralised service augments metadata automatically, and the original source re-harvests the improved record
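The federated search and departmental repository use cases reduce to fairly simple operations once profiles exist. A minimal Python sketch, assuming each search target's profile records the vocabularies it supports and that subject terms are plain strings compared exactly; the function names, target names and terms are hypothetical.

```python
def select_targets(profiles: dict[str, set[str]], vocabulary: str) -> list[str]:
    """Names of search targets whose (hypothetical) profile lists the vocabulary."""
    return sorted(name for name, vocabularies in profiles.items() if vocabulary in vocabularies)


def enhance_subjects(local: list[str], general: list[str], specialist: list[str]) -> list[str]:
    """Merge re-harvested subject terms into a local list, keeping order, dropping exact duplicates."""
    merged: list[str] = []
    for term in local + general + specialist:
        if term not in merged:
            merged.append(term)
    return merged


targets = {"medical_portal": {"MeSH"}, "engineering_lor": {"LCSH"}, "health_ir": {"MeSH", "LCSH"}}
print(select_targets(targets, "MeSH"))  # ['health_ir', 'medical_portal']
print(enhance_subjects(["Cardiology"], ["Medicine", "Cardiology"], ["Heart Diseases"]))
# ['Cardiology', 'Medicine', 'Heart Diseases']
```

Exact string matching is the weakest point here: in practice terms from different sources would need normalising, or matching via vocabulary identifiers, before duplicates could be reliably detected.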
20. Moving forward
- In the context of rapid repository development with limited resources, available resources must be used as effectively as possible
- Optimising metadata workflow across the dIE can enable repositories to
  - expand element sets without compromising on quality
  - expand functionality
  - improve ingest processes
  - support more automatic metadata transformation and enhancement
21. Moving forward
- Development of the MLM to support metadata workflow optimisation requires
  - a standard way of profiling repositories at repository, object and metadata level
  - integration with registry projects for repositories, standards, application profiles and vocabularies
  - at individual repository level, a method for the design of metadata workflows that makes reference to, and exploits, workflows elsewhere in the dIE