Title: Interoperation and Infrastructure for Digital Archiving: the LuKII Project
1. Interoperation and Infrastructure for Digital
Archiving: the LuKII Project
by
- Michael Seadle, Peter Schirmbacher
  (Humboldt-Universität zu Berlin)
- Reinhard Altenhöner, Tobias Steinke
  (Deutsche Nationalbibliothek)
Berlin School of Library and Information Science
2. Introduction
In June 2007, a DFG-sponsored workshop on
digital archiving took place in
Berlin. Interoperability between LOCKSS (Lots
of Copies Keep Stuff Safe) and KOPAL
(Co-operative Development of a Long-Term Digital
Information Archive) was one of the most
discussed ideas that emerged from that workshop.
Michael Seadle Berlin School of Library
Information Science Humboldt Universität zu
Berlin
3. Scholarly infrastructure
Today's scholarly infrastructure depends
heavily on digital materials. In some fields,
particularly in the natural sciences, digital
publication is taken for granted. More
publishers are launching new journals only in
digital formats and open-access publications are
almost exclusively digital.
4. Repositories
Repositories offer ways to collect and
give access to digital information. They lack
the infrastructure for integrity checking with a
statistically significant likelihood of finding
and addressing integrity problems, and for
addressing usability problems through regular
migration.
5. Open Access
Germany has played a leading part
internationally in the open access
movement. As a result, its institutional
repositories contain a wealth of research works.
6. Cost Effectiveness
Cost-effectiveness is key because long-term
digital archiving is expensive. Universities
and their libraries have grown accustomed to
paying the costs for retaining paper works,
including their housing, handling and repair
after heavy use. Those costs will not go away
any time soon, which means that the cost of
digital preservation comes in addition to, not
instead of, existing costs.
7. LuKII goals
The first goal of this project is to establish
interoperability between KOPAL (from Germany) and
LOCKSS (from the US) in order to marry German
goals for migration and usability with
cost-effective bitstream preservation. The
second goal is to test the prototype
interoperable system by harvesting a wide variety
of data from German OPUS and eDoc institutional
repositories.
8. LOCKSS
LOCKSS (Lots of Copies Keep Stuff Safe), from
Stanford University, is arguably the earliest
digital preservation and dissemination system.
It is known in particular for its robustness in
maintaining the integrity of the digital
object. LOCKSS has faced genuine attack
scenarios, shifted platforms, and tested format
migration network-wide.
9. Bitstream integrity
Bitstream integrity is broadly seen in the US
as the sine qua non of long-term digital
archiving. If the file is damaged,
usability/readability and authenticity cease to
be meaningful. LOCKSS is neutral toward
usability/readability solutions and can function
with more than one.
10. Archival Storage
The Archival Storage in LOCKSS uses seven
separate nodes to check routinely on the
integrity of an archived bitstream and to take
action to replace a damaged copy. The updated
version is copied to other LOCKSS boxes in the
network, but the older version is also retained
in case of future need.
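The idea of several nodes checking one another can be sketched in a few lines of Python. This is only a simplified illustration using plain majority voting over SHA-256 digests; the actual LOCKSS polling protocol is considerably more elaborate, and the function names and node labels here are hypothetical.

```python
import hashlib
from collections import Counter


def digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of a bitstream."""
    return hashlib.sha256(data).hexdigest()


def audit_and_repair(copies: dict[str, bytes]) -> list[str]:
    """Compare the copies held by several nodes and repair the outliers.

    `copies` maps a (hypothetical) node name to the bytes that node holds.
    The digest held by a strict majority of nodes is treated as correct.
    Returns the names of the nodes whose copy was replaced.
    """
    votes = Counter(digest(data) for data in copies.values())
    winning_digest, count = votes.most_common(1)[0]
    if count <= len(copies) // 2:
        raise ValueError("no majority: cannot decide which copy is intact")

    # Take any copy that matches the majority digest as the repair source.
    good = next(d for d in copies.values() if digest(d) == winning_digest)

    repaired = []
    for node, data in list(copies.items()):
        if digest(data) != winning_digest:
            copies[node] = good  # overwrite the damaged copy
            repaired.append(node)
    return repaired
```

With seven nodes, a single damaged copy is outvoted six to one and replaced from an intact peer; if no digest reaches a majority, the sketch refuses to guess.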
11. Context
Context plays an important role in LOCKSS.
The URL of the original work is stored with the
digital object. This not only allows the system
to recognize and refer back to the original
version of a digital document in order to check
routinely for changes without requiring human
intervention, but also lets the system know if
the original for some reason ceases to be
available online.
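A minimal sketch of this check, assuming the archived object simply carries its source URL alongside its bytes. The class, the function and the injected `fetch` callable are hypothetical names chosen for illustration, not part of LOCKSS.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ArchivedObject:
    source_url: str  # URL of the original work, stored with the object
    content: bytes   # the preserved bitstream


def check_against_source(
    obj: ArchivedObject,
    fetch: Callable[[str], Optional[bytes]],
) -> str:
    """Compare an archived object with what its source URL serves today.

    `fetch` is supplied by the caller and returns the current bytes at a
    URL, or None if the original is no longer available online.
    """
    current = fetch(obj.source_url)
    if current is None:
        return "unavailable"
    if hashlib.sha256(current).digest() != hashlib.sha256(obj.content).digest():
        return "changed"
    return "unchanged"
```

Passing `fetch` in as a parameter keeps the sketch runnable without network access; a real system would plug in an HTTP client at that point.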
12. Ingest
The current LOCKSS ingest process (its SIP or
Submission Information Package in OAIS terms)
uses a crawler that efficiently harvests all
documents in a standard tree-structure website
when it has permission from a manifest on the
server being harvested. The manifest serves as
a guarantee to publishers that the LOCKSS crawler
only takes materials that they have explicitly
authorized.
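The permission-gated harvest described above might look roughly like the following sketch. It models a site as an in-memory dictionary so that it runs without a network; the permission phrase is a placeholder rather than the wording LOCKSS actually looks for, and `harvest` is a hypothetical name.

```python
from collections import deque

# Placeholder phrase for illustration only, not the real LOCKSS wording.
PERMISSION_PHRASE = "LOCKSS may collect this site"


def harvest(
    manifest_url: str,
    start_url: str,
    pages: dict[str, tuple[str, list[str]]],
) -> dict[str, str]:
    """Crawl a tree-structured site, but only if its manifest grants permission.

    `pages` stands in for the web: it maps a URL to (text, outgoing links).
    Returns the harvested documents keyed by URL, or an empty dict if the
    manifest is missing or does not contain the permission phrase.
    """
    manifest = pages.get(manifest_url)
    if manifest is None or PERMISSION_PHRASE not in manifest[0]:
        return {}

    harvested: dict[str, str] = {}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        if url in harvested or url not in pages:
            continue
        if not url.startswith(start_url):
            continue  # stay inside the authorized tree
        text, links = pages[url]
        harvested[url] = text
        queue.extend(links)
    return harvested
```

The prefix test on `start_url` is one simple way to keep the crawl inside the part of the site the publisher has authorized; links that lead elsewhere are ignored.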
13. Cost-effectiveness
Cost-effectiveness has been an integral feature
of LOCKSS design from the outset. LOCKSS keeps
costs low by running on cheap and simple equipment.
The fact that it is open source means that
libraries and other preservation-oriented
institutions world-wide can use it without paying
for permission. LOCKSS is used by 197
libraries and institutions in 19 countries.
14. LOCKSS Alliance
LOCKSS Alliance membership is not required
for the use of an open source package like
LOCKSS, though it is strongly encouraged as a way
of sharing development and support costs.
15. Community
LOCKSS looks to a community of developers at
member institutions of the LOCKSS Alliance to
help to keep it up to date. This
community-based co-development on the Linux model
is particularly cost-effective. Cost is
obviously a factor for a commercial firm with
profits to make.
16. KOPAL Background
The goal of the KOPAL project (2004-2007),
funded by the Federal Ministry for Education and
Research (Bundesministerium für Bildung und
Forschung), was the cooperative development of a
long-term digital information archive. The
archival system is based on DIAS by IBM, which
was originally developed for the Koninklijke
Bibliotheek of the Netherlands (KB).
17. KOPAL
The German National Library and the Staats- und
Universitätsbibliothek Göttingen (SUB Göttingen)
use KOPAL, whose DIAS (Digital Archive
Information System) core was developed by IBM for
the National Library of the Netherlands.
Additional open source software has enhanced
the ingest procedures and has provided tools to
enable preservation planning activities like
systematic migration workflows.
18. KOPAL users
The DIAS system for the KOPAL solution is
currently used by two clients, DNB and SUB
Göttingen. Their data are stored and accessible
independently of each other. The system
is located at Göttingen, which is responsible for
guaranteeing bitstream preservation.
19. Universal Object Format
The KOPAL system tries to deal with the problem
of obsolete file formats and rendering
environments by supporting file format migration
throughout its architecture. Every archival
package is in an open, defined format called
Universal Object Format, which describes a
structure to record metadata for preservation
together with the content files.
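As a rough illustration of what "metadata for preservation together with the content files" can mean in practice, the following Python sketch models a package as a small data structure. It does not reproduce the Universal Object Format itself; every class and field name is a hypothetical stand-in.

```python
from dataclasses import dataclass, field


@dataclass
class ContentFile:
    path: str
    format_name: str  # file format as recorded in the technical metadata
    sha256: str       # checksum recorded for later integrity checks


@dataclass
class ArchivalPackage:
    identifier: str
    files: list[ContentFile] = field(default_factory=list)
    preservation_metadata: dict[str, str] = field(default_factory=dict)

    def files_needing_migration(self, obsolete_formats: set[str]) -> list[ContentFile]:
        """Return the content files whose recorded format is considered obsolete."""
        return [f for f in self.files if f.format_name in obsolete_formats]


# Example: one package holding two files, one of them in an ageing format.
package = ArchivalPackage(
    identifier="example-object-1",
    files=[
        ContentFile("thesis.doc", "Word 6", "placeholder-checksum-1"),
        ContentFile("thesis.pdf", "PDF/A-1", "placeholder-checksum-2"),
    ],
    preservation_metadata={"ingested": "2009-12-01"},
)
candidates = package.files_needing_migration({"Word 6"})
```

Because the format of each file is recorded inside the package, a migration workflow can select candidates by querying the metadata instead of re-inspecting every file.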
20. koLibRI
The koLibRI Java software library was developed
by the German National Library and SUB Göttingen
within the KOPAL project to support the
integration of DIAS in the local IT
infrastructure of the clients. Its tasks are:
- Encapsulate the communication with DIAS
- Create archival objects conforming to the
  Universal Object Format
- Automatically generate technical metadata with
  the tool JHOVE
- Manage the ingest and the access to DIAS
- Manage the workflow to migrate file formats in
  archival objects based on given parameters and
  migration tools
21. KOPAL advantages
KOPAL gains several advantages in working with
LOCKSS:
- LOCKSS's strength in preserving bitstream integrity
- LOCKSS's effective dissemination package
- The shared support and development structure of
  the LOCKSS Alliance
22. LOCKSS advantages
KOPAL's state-of-the-art presentation
environment offers a solution for digital objects
that are no longer usable. Since KOPAL's
systematic migration workflow guarantees the
long-term usability and accessibility of digital
objects, it complements the functions of LOCKSS
well.
23. 1st Objective
The goal of this project is to make open access
repositories in Germany, both discipline-specific
and institutional, more robust over time. The
first objective involves establishing a LOCKSS
network in Germany and providing the technical
support to maintain it without constant reference
to the LOCKSS teams in Stanford or Edinburgh.
24. 2nd Objective
Interoperability with KOPAL is the second
objective. David Rosenthal (Stanford/LOCKSS)
suggested in private correspondence the
following three types of interoperability:
- Transfer interoperability
- Dissemination interoperability
- Audit interoperability
25. 3rd Objective
The third objective is to test the
interoperability prototype (the LuKII
prototype) by harvesting digital contents from a
selection of German institutional repositories
from the OA-Netzwerk-Projekt.
26. 3rd Objective
Among the key development issues for this third
objective are:
- ingest automation,
- cost-effective metadata creation,
- format migration testing.
It is absolutely essential to free long-term
digital archiving systems as much as possible
from the need for costly human intervention.
27. Current status
The project has the following rough timeline:
- March/April: hiring staff
- May: development of the LOCKSS network in
  Germany
- June: training for Berlin technical staff at
  Stanford
- July/August: programming for METS and query
  support at Stanford; programming for SFTP
  crawler, and parsing / extracting METS metadata,
  at Berlin
- September: koLibRI generation of data for testing
  LOCKSS modifications at D-NB; implementation into
  test LOCKSS network (Berlin / Stanford)
- October: first repository data load; start of
  iterative tool development
28. Conclusion
Scholarly research on long-term digital
archiving is just beginning. Today's system
designs may no longer be the ideal in 50 or 100
years. The more that systems can cooperate and
interoperate, the greater the chances that
investments in archiving systems can be carried
into the future.
29. Sources
- Deutsche Initiative für Netzwerkinformationen
  (2009), Open Access-Netzwerk Projekt. Available
  (Dec 2009): http://www.dini.de/projekte/oa-netzwerk/
- Library of Congress (2009), Metadata Encoding
  and Transmission Standard. Available (Dec 2009):
  http://www.loc.gov/standards/mets/
- Library of Congress, National Digital Information
  Infrastructure and Preservation Program (2009),
  WARC, Web ARChive file format. Available
  (Dec 2009):
  http://www.digitalpreservation.gov/formats/fdd/fdd000236.shtml
- LOCKSS (2009), Libraries. Available (Dec 2009):
  http://www.lockss.org/lockss/Libraries
- LOCKSS (2009), Publications. Available (Dec
  2009): http://www.lockss.org/lockss/Publications
- Country (Ranking Web of Repositories).
- Seadle, Michael, and Elke Greifeneder (2008), In
  archiving we trust: Results from a workshop at
  Humboldt University in Berlin. First Monday
  13(1).
- Directory of open access journals. Available at
  http://www.doaj.org/doaj?func=findJournals
  (accessed January 23, 2009).