1
Interoperation and Infrastructure for Digital Archiving: the LuKII Project
  • by
  • Michael Seadle, Peter Schirmbacher
  • Humboldt-Universität zu Berlin
  • Reinhard Altenhöner, Tobias Steinke
  • Deutsche Nationalbibliothek

Berlin School of Library and Information Science
2
Introduction
In June 2007, a DFG-sponsored workshop on digital archiving took place in Berlin. Interoperability between LOCKSS (Lots of Copies Keep Stuff Safe) and KOPAL (Co-operative Development of a Long-Term Digital Information Archive) was one of the most discussed ideas to emerge from that workshop.
Michael Seadle Berlin School of Library
Information Science Humboldt Universität zu
Berlin
3
Scholarly infrastructure
Today's scholarly infrastructure depends heavily on digital materials. In some fields, particularly the natural sciences, digital publication is taken for granted. A growing number of publishers launch new journals in digital form only, and open-access publications are almost exclusively digital.
4
Repositories
Repositories offer ways to collect digital information and give access to it. However, they lack the infrastructure to check integrity with a statistically significant likelihood of finding and fixing problems, or to address usability problems through regular format migration.
5
Open Access
Germany has played a leading international role in the open access movement. As a result, its institutional repositories contain a wealth of research works.
6
Cost Effectiveness
Cost-effectiveness is key because long-term digital archiving is expensive. Universities and their libraries have grown accustomed to paying the costs of retaining paper works, including housing, handling, and repair after heavy use. Those costs will not go away any time soon, which means that the cost of digital preservation comes in addition to existing costs, not instead of them.
7
LuKII goals
The first goal of this project is to establish interoperability between KOPAL (from Germany) and LOCKSS (from the US) in order to marry German goals for migration and usability with cost-effective bitstream preservation. The second goal is to test the interoperable prototype system by harvesting a wide variety of data from German OPUS and eDoc institutional repositories.
8
LOCKSS
LOCKSS (Lots of Copies Keep Stuff Safe), from Stanford University, is arguably the earliest digital preservation and dissemination system. It is known in particular for its robustness in maintaining the integrity of digital objects. LOCKSS has faced genuine attack scenarios, moved to new platforms, and tested format migration across the whole network.
9
Bitstream integrity
Bitstream integrity is broadly seen in the US as the sine qua non of long-term digital archiving: if the file is damaged, usability/readability and authenticity cease to be meaningful. LOCKSS is neutral toward usability/readability solutions and can work with more than one.
10
Archival Storage
Archival Storage in LOCKSS uses seven separate nodes that routinely check the integrity of an archived bitstream and take action to replace a damaged copy. The updated version is copied to the other LOCKSS boxes in the network, but the older version is also retained in case of future need.
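The audit-and-repair cycle described above can be illustrated with a deliberately simplified sketch. It assumes that each node reports a SHA-256 digest of its copy and that a simple majority decides which copy is correct. The function names are hypothetical, and the real LOCKSS polling protocol is considerably more elaborate than a single majority vote. Keeping the replaced copy mirrors the remark that older versions are retained.

```python
import hashlib
from collections import Counter
from typing import Optional


def digest(data: bytes) -> str:
    """SHA-256 fingerprint of one node's copy of the bitstream."""
    return hashlib.sha256(data).hexdigest()


def audit_copies(copies: dict[str, bytes]) -> tuple[Optional[str], list[str]]:
    """Hypothetical simple-majority audit across the nodes holding a copy.

    Returns the majority digest and the nodes whose copies disagree with it,
    or (None, []) when no strict majority exists and a human must decide.
    """
    digests = {node: digest(data) for node, data in copies.items()}
    winner, votes = Counter(digests.values()).most_common(1)[0]
    if votes * 2 <= len(copies):
        return None, []
    damaged = [node for node, d in digests.items() if d != winner]
    return winner, damaged


def repair(copies: dict[str, bytes], damaged: list[str],
           good_digest: str) -> tuple[dict[str, bytes], dict[str, bytes]]:
    """Replace damaged copies with a majority copy; keep the old versions."""
    good = next(data for data in copies.values() if digest(data) == good_digest)
    repaired = dict(copies)
    retained = {}
    for node in damaged:
        retained[node] = copies[node]  # older version kept in case of future need
        repaired[node] = good
    return repaired, retained


# Seven nodes, one of which has suffered simulated bit rot.
nodes = {f"node{i}": b"archived article" for i in range(1, 8)}
nodes["node3"] = b"archived articlX"
good, bad = audit_copies(nodes)
fixed, old = repair(nodes, bad, good)
print(bad)  # ['node3']
```

A strict majority is required so that a split vote is escalated rather than "repaired" in the wrong direction.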
11
Context
Context plays an important role in LOCKSS. The URL of the original work is stored with the digital object. This lets the system refer back to the original version of a document to check routinely for changes without human intervention, and it also lets the system detect when the original, for whatever reason, is no longer available online.
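A minimal sketch of how a stored URL supports these checks, assuming the archive keeps a SHA-256 digest of its copy and that some separate component fetches the live URL, returning None when the original is gone. `ArchivedObject` and `check_original` are hypothetical names, and network access is deliberately left out.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class ArchivedObject:
    """A preserved document stored together with the URL it came from."""
    url: str
    content: bytes

    def sha256(self) -> str:
        return hashlib.sha256(self.content).hexdigest()


def check_original(obj: ArchivedObject, fetched: Optional[bytes]) -> str:
    """Hypothetical helper: compare the live original with the archived copy.

    `fetched` is whatever the stored URL returns today, or None if it no
    longer resolves (for example an HTTP 404 or a vanished host).
    """
    if fetched is None:
        return "original unavailable"
    if hashlib.sha256(fetched).hexdigest() == obj.sha256():
        return "unchanged"
    return "changed"


obj = ArchivedObject("http://repository.example.org/paper.pdf", b"version 1")
print(check_original(obj, b"version 2"))  # changed
print(check_original(obj, None))          # original unavailable
```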
12
Ingest
The current LOCKSS ingest process (in OAIS terms, the handling of its SIP, or Submission Information Package) uses a crawler that efficiently harvests all documents in a website with a standard tree structure, provided that a manifest on the harvested server grants permission. The manifest guarantees to publishers that the LOCKSS crawler takes only materials they have explicitly authorized.
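The permission check can be sketched as follows. Two assumptions are made here: that the manifest page carries the LOCKSS permission statement in the wording stored in `PERMISSION_STATEMENT` (the exact text should be checked against the LOCKSS documentation), and a hypothetical scoping rule under which the crawler follows only links beneath the directory that holds the manifest. Fetching pages and extracting links are omitted.

```python
from urllib.parse import urljoin

# Assumed wording of the LOCKSS permission statement; verify against the
# LOCKSS documentation before relying on it.
PERMISSION_STATEMENT = (
    "LOCKSS system has permission to collect, preserve, and serve this Archival Unit"
)


def has_permission(manifest_html: str) -> bool:
    """True if the manifest contains the permission statement.

    Whitespace is normalised so that line breaks in the HTML do not matter.
    """
    text = " ".join(manifest_html.split()).lower()
    return PERMISSION_STATEMENT.lower() in text


def crawl_plan(manifest_url: str, manifest_html: str, links: list[str]) -> list[str]:
    """Return the links the crawler may harvest (hypothetical scoping rule)."""
    if not has_permission(manifest_html):
        return []  # no explicit authorization: take nothing
    base = urljoin(manifest_url, ".")  # directory containing the manifest
    return [url for url in links if url.startswith(base)]


manifest = "http://repo.example.org/oa/manifest.html"
page = "<p>LOCKSS system has permission to collect, preserve,\n and serve this Archival Unit.</p>"
links = [
    "http://repo.example.org/oa/vol1/a.pdf",
    "http://repo.example.org/private/b.pdf",
    "http://other.example.com/c.pdf",
]
print(crawl_plan(manifest, page, links))  # ['http://repo.example.org/oa/vol1/a.pdf']
```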
13
Cost-effectiveness
Cost-effectiveness has been an integral feature of the LOCKSS design from the outset. It reduces costs by running on cheap, simple equipment. Because it is open source, libraries and other preservation-oriented institutions worldwide can use it without paying for permission. LOCKSS is used by 197 libraries and institutions in 19 countries.
14
LOCKSS Alliance
LOCKSS Alliance membership is not required to use an open-source package like LOCKSS, though it is strongly encouraged as a way of sharing development and support costs.
15
Community
LOCKSS relies on a community of developers at member institutions of the LOCKSS Alliance to help keep it up to date. This community-based co-development on the Linux model is particularly cost-effective. Cost is obviously a factor for a commercial firm with profits to make.
16
KOPAL Background
The goal of the KOPAL project (2004–2007), funded by the Federal Ministry of Education and Research (Bundesministerium für Bildung und Forschung), was the cooperative development of a long-term digital information archive. The archival system is based on IBM's DIAS, which was originally developed for the Koninklijke Bibliotheek of the Netherlands (KB).
17
KOPAL
The German National Library and the Staats- und Universitätsbibliothek Göttingen (SUB Göttingen) use KOPAL, whose core, DIAS (Digital Information Archiving System), was developed by IBM for the National Library of the Netherlands. Additional open-source software has enhanced the ingest procedures and provided tools that enable preservation planning activities such as systematic migration workflows.
18
KOPAL users
The DIAS system for the KOPAL solution is currently used by two clients, DNB and SUB Göttingen. Their data are stored and accessed independently of each other. The system is hosted in Göttingen, which is responsible for guaranteeing bitstream preservation.
19
Universal Object Format
The KOPAL system tackles the problem of obsolete file formats and rendering environments by supporting file format migration throughout its architecture. Every archival package uses an openly defined format called the Universal Object Format, which specifies a structure for recording preservation metadata together with the content files.
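The idea of keeping preservation metadata together with the content files can be sketched with standard METS elements (`fileSec`, `fileGrp`, `file`, `FLocat`). We assume here that the Universal Object Format builds on METS; the sketch does not reproduce the actual UOF schema, records only a MIME type and a SHA-256 checksum per file, and `build_package` is a hypothetical helper.

```python
import hashlib
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS)
ET.register_namespace("xlink", XLINK)


def build_package(files: dict[str, tuple[str, bytes]]) -> str:
    """Hypothetical helper: describe content files in METS-style XML.

    `files` maps a path inside the package to (MIME type, file content).
    Real UOF packages carry far richer technical metadata than this.
    """
    root = ET.Element(f"{{{METS}}}mets")
    file_sec = ET.SubElement(root, f"{{{METS}}}fileSec")
    group = ET.SubElement(file_sec, f"{{{METS}}}fileGrp", USE="content")
    for n, (path, (mimetype, data)) in enumerate(sorted(files.items()), start=1):
        file_el = ET.SubElement(
            group,
            f"{{{METS}}}file",
            ID=f"FILE{n:03d}",
            MIMETYPE=mimetype,
            CHECKSUM=hashlib.sha256(data).hexdigest(),  # fixity information
            CHECKSUMTYPE="SHA-256",
        )
        # Point from the metadata record to the content file it describes.
        ET.SubElement(file_el, f"{{{METS}}}FLocat",
                      {"LOCTYPE": "URL", f"{{{XLINK}}}href": path})
    return ET.tostring(root, encoding="unicode")


print(build_package({"text.pdf": ("application/pdf", b"%PDF-1.4 demo")}))
```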
20
koLibRI
The koLibRI Java software library was developed by the German National Library and SUB Göttingen within the KOPAL project to support the integration of DIAS into the clients' local IT infrastructure. Its tasks are to:
  • encapsulate communication with DIAS
  • create archival objects conforming to the Universal Object Format
  • automatically generate technical metadata with the JHOVE tool
  • manage ingest into and access to DIAS
  • manage the workflow for migrating file formats in archival objects based on given parameters and migration tools
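The last task, migration driven by given parameters and tools, can be pictured as a policy table that maps at-risk formats to target formats and tools. This Python sketch is not koLibRI's (Java) API: `MigrationRule`, `plan_migrations`, and the tool name in the example are hypothetical, and the MIME types are assumed to come from a characterisation step such as the JHOVE output the list mentions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MigrationRule:
    source: str  # MIME type judged to be at risk of obsolescence
    target: str  # MIME type to migrate to
    tool: str    # name of the migration tool to run (hypothetical)


def plan_migrations(files: dict[str, str],
                    rules: list[MigrationRule]) -> list[tuple[str, MigrationRule]]:
    """Match each file's MIME type against the policy and return the work list.

    `files` maps a path inside an archival object to its MIME type.
    Files without a matching rule are left as they are.
    """
    by_source = {rule.source: rule for rule in rules}
    return [(path, by_source[mime])
            for path, mime in sorted(files.items())
            if mime in by_source]


policy = [MigrationRule("image/gif", "image/png", "gif-to-png")]
obj = {"fig1.gif": "image/gif", "text.pdf": "application/pdf"}
for path, rule in plan_migrations(obj, policy):
    print(f"{path}: {rule.source} -> {rule.target} via {rule.tool}")
# fig1.gif: image/gif -> image/png via gif-to-png
```

Keeping the policy as data rather than code means a new rule can be added when a format becomes obsolete without changing the workflow itself.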

21
KOPAL advantages
KOPAL gains several advantages from working with LOCKSS:
  • LOCKSS's strength in preserving bitstream integrity
  • LOCKSS's effective dissemination package
  • the shared support and development structure of the LOCKSS Alliance
22
LOCKSS advantages
KOPAL's state-of-the-art presentation environment offers a solution for digital objects that are no longer usable. Because KOPAL's systematic migration workflow guarantees the long-term usability and accessibility of digital objects, it complements the functions of LOCKSS well.
23
1st Objective
The goal of this project is to make open access repositories in Germany, both discipline-specific and institutional, more robust over time. The first objective is to establish a LOCKSS network in Germany and to provide the technical support needed to maintain it without constant recourse to the LOCKSS teams in Stanford or Edinburgh.
24
2nd Objective
Interoperability with KOPAL is the second objective.
In private correspondence, David Rosenthal (Stanford/LOCKSS) suggested the following three types of interoperability:
  • transfer interoperability
  • dissemination interoperability
  • audit interoperability

25
3rd Objective
The third objective is to test the interoperability prototype (the LuKII prototype) by harvesting digital content from a selection of German institutional repositories in the OA-Netzwerk project.
26
3rd Objective
Among the key development issues for this third objective are:
  • ingest automation
  • cost-effective metadata creation
  • format migration testing
An absolutely essential feature of long-term digital archiving systems is that they be as free as possible from the need for costly human intervention.

27
Current status
The project has the following rough timeline:
  • March/April: hiring staff
  • May: development of the LOCKSS network in Germany
  • June: training for Berlin technical staff at Stanford
  • July/August: programming for METS and query support at Stanford; programming for an SFTP crawler and for parsing and extracting METS metadata in Berlin
  • September: koLibRI generation of data for testing the LOCKSS modifications at D-NB; implementation in the Berlin/Stanford test LOCKSS network
  • October: first repository data load; start of iterative tool development
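For the work on parsing and extracting METS metadata, a minimal sketch using Python's standard `xml.etree.ElementTree` might pull out, for each file entry, its identifier, MIME type, checksum, and location. The element and attribute names follow the METS schema; which fields LuKII actually extracted is not stated here, so `extract_files` and the sample record (including its placeholder checksum) are illustrative assumptions.

```python
import xml.etree.ElementTree as ET
from typing import Optional

NS = {"mets": "http://www.loc.gov/METS/"}
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def extract_files(mets_xml: str) -> list[dict[str, Optional[str]]]:
    """Hypothetical helper: list every mets:file with ID, type, checksum, location."""
    root = ET.fromstring(mets_xml)
    records = []
    for file_el in root.iterfind(".//mets:file", NS):
        loc = file_el.find("mets:FLocat", NS)
        records.append({
            "id": file_el.get("ID"),
            "mimetype": file_el.get("MIMETYPE"),
            "checksum": file_el.get("CHECKSUM"),
            "href": loc.get(XLINK_HREF) if loc is not None else None,
        })
    return records


# Illustrative sample record; the checksum is a placeholder, not a real digest.
SAMPLE = """<mets:mets xmlns:mets="http://www.loc.gov/METS/"
           xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:fileSec>
    <mets:fileGrp USE="content">
      <mets:file ID="FILE001" MIMETYPE="application/pdf"
                 CHECKSUM="0123abcd" CHECKSUMTYPE="SHA-256">
        <mets:FLocat LOCTYPE="URL"
                     xlink:href="http://repository.example.org/oa/a.pdf"/>
      </mets:file>
    </mets:fileGrp>
  </mets:fileSec>
</mets:mets>"""

print(extract_files(SAMPLE))
```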

28
Conclusion
Scholarly research on long-term digital archiving is just beginning. Today's system designs may no longer be ideal in 50 or 100 years. The more that systems can cooperate and interoperate, the greater the chance that investments in archiving systems can be carried into the future.
29
Sources
  • Deutsche Initiative für Netzwerkinformation (2009). Open Access-Netzwerk Projekt. Available (Dec 2009): http://www.dini.de/projekte/oa-netzwerk/
  • Library of Congress (2009). Metadata Encoding and Transmission Standard. Available (Dec 2009): http://www.loc.gov/standards/mets/
  • Library of Congress, National Digital Information Infrastructure and Preservation Program (2009). WARC, Web ARChive file format. Available (Dec 2009): http://www.digitalpreservation.gov/formats/fdd/fdd000236.shtml
  • LOCKSS (2009). Libraries. Available (Dec 2009): http://www.lockss.org/lockss/Libraries
  • LOCKSS (2009). Publications. Available (Dec 2009): http://www.lockss.org/lockss/Publications
  • Country (Ranking Web of Repositories).
  • Seadle, Michael, and Elke Greifeneder (2008). In archiving we trust: Results from a workshop at Humboldt University in Berlin. First Monday 13(1).
  • Directory of Open Access Journals. Available (accessed January 23, 2009): http://www.doaj.org/doaj?func=findJournals
