Title: The Planets Digital Preservation Project
1The Planets Digital Preservation Project
- DLF, April 2007
- Adam Farquhar
2Outline
- The Digital Preservation problem
- The Planets project
- Goals
- Status
- Architecture
- The Planets testbed
- The Planets interoperability framework
- Preservation planning
3Digital information at risk
- Our society risks a gaping hole in the cultural and scientific record unless we act now
- European National Libraries and Archives
- Have the legal responsibility and the legislative framework to safeguard digital information
- Must provide sustained access to cultural and scientific knowledge
- Have limited ability to ensure that today's digital information will be accessible for future generations
- Meeting the challenge of preserving access goes beyond the capabilities of any single institution
4EU Support for digital preservation
- Major initiative in the Information Society Technologies (IST) Framework Programme 6 Call 5
- Two Integrated Projects funded: Planets (BL), Caspar (CCLRC)
- Coordinated action: DPE (HATII at Glasgow)
- Planets builds on strong digital archiving and preservation programmes at European, national and institutional levels
- Addresses core digital preservation challenges
- Uses an empirical approach to learn what works and why
- Four-year project starting June 2006 with a €15m budget
5Losing digital information hurts everyone
- An NHS doctor needs a 1987 clinical study found on Google Scholar
- She tries to open the dvi file, but can't
- A father shows his children the computer game he wrote in school
- He wrote the game in PDP assembler
- He stored the program on paper tape
- A small business owner wants to market the energy-saving device she developed in 1985
- She carefully stored all of the files
- Now she doesn't have the applications to read the documents, spreadsheets, and CAD drawings
- The CAD company is long out of business
6Losing digital information costs opportunity
- A university research lab has provided its data, technical reports, and software on-line since 1984 and on the web since 1990. The professor retires and closes the lab in 2004
- A university IP officer wants to defend a patent challenge
- A biographer wants to review the unpublished work
- A former student wants to revive a line of research
- The digital files
- Some are damaged
- Some rely on applications that are out-of-use
- Some rely on hardware that is unavailable
- Some rely on an environment that no longer exists
- Some rely on information that no-one recorded
7Losing digital information costs money
- An oil company collected extensive data for a reservoir and wants to exploit it in 2007
- All documents and data are held in v1.3 of an integrated management product
- They now use v9.0 and can't read or access it
- An oilfield services company collected dipmeter data in the 1970s
- Stored on 7-track tapes
- Recorded in optimised formats
- Difficult and expensive to repeat measurement data
8How big is the problem?
- Who is touched by digital preservation problems?
- Individual consumers
- Small and medium sized enterprises
- Large corporations
- University libraries, faculties, institutes
- Publishers
- Libraries
- Local, regional, national governments
- Every person or organisation that keeps digital material for more than 15 years!
- Estimates suggest Europe loses €3bn per year in business value
9Planets
- Four-year EU-funded (FP6) digital preservation research and technology development project
- Increase Europe's ability to ensure long-term access to its cultural and scientific heritage
- Improve decision-making about long-term preservation
- Ensure long-term access to valued digital content
- Control the costs of preservation actions through increased automation and scalable infrastructure
- Ensure wide adoption across the user community and establish a marketplace for preservation services and tools
- Build practical solutions
- Integrate existing expertise, designs and tools
- Share and build
10Planets
- Brings together Archives, Libraries, researchers and technology companies
- Builds on strong digital archiving and preservation programmes
- Addresses core challenges
- Focuses on needs of Libraries and Archives
- Will provide an interoperable framework to enable
- Third-parties to provide tools and services
- Vendors to integrate preservation services
- Content owners to ensure long-term access to their digital content
- Will use an empirical approach to gather evidence
- Outreach shows potential to make a difference
11Planets partners I
- The British Library
- National Library, Netherlands
- Austrian National Library
- State and University Library, Denmark
- Royal Library, Denmark
- National Archives, UK
- Swiss Federal Archives
- National Archives, Netherlands
12Planets partners II
- Tessella Plc
- IBM Netherlands
- Microsoft Research, Cambridge
- Austrian Research Centers
- HATII at University of Glasgow
- University of Freiburg
- Technical University of Vienna
- University of Cologne
13The Team
14Planets approach
- Planning services that empower organisations to define, evaluate, and execute preservation plans
- Methodologies, tools and services for Characterisation of digital objects
- Innovative solutions for Preservation Actions
- An Interoperability Framework provides distributed services
- A Testbed enables objective evaluation of protocols, tools, services and plans
- Outreach, workshops and training to engage the user and vendor communities
15Project architecture reflects problem structure
Digital Content
Preservation Action Services
Org. Context
Testbed: evaluation and validation services
External Context
Characterisation Services
Interoperability Framework
16Status
- Fall 06
- Built the team
- Gathered initial requirements
- Conducted workshops and surveys
- Winter 07
- Built specifications
- Evaluated component technologies
- Spring 07
- Finalised many technical and implementation decisions
- Started to build tools and services
- Summer 07
- Initial prototypes completed
- First experiments conducted
17Key technology choices
- Extensive use of XML throughout architecture
- Extensive use of web services
- Extensive use of Java and open source components
- JSF (JavaServer Faces) for user interfaces
- Workflows
- BPEL (Business Process Execution Language) to describe experiments and plans
- Eclipse BPEL workflow designer
- Repository and interfaces
- JSR-170 Repository API
- Access to digital archives
- Jackrabbit to manage intermediate storage and data
- Sandbox technology for some third-party tools
- JBoss application server
18Testbed
- Provides a foundation for objective evaluation
- Load content
- Experiment: collect data, evaluate results, compare outcomes
- Validate preservation plans
- Benchmark tools and services
- Consists of
- Data storage, hardware, Planets software, testbed software
- Benchmark and other content
- Provides resources for
- The project partners
- The preservation community
- External organisations
- Tool and service certification
19Testbed Screen Shot
20Testbed Screen Shot
Run Experiment
21Interoperability Framework
- Provide the glue to hold the Planets tools and services together
- Provide service registries
- Characterisation services
- Preservation action services
- Provide shared services
- Security, authentication, authorisation
- Monitoring, logging, auditing
- Intermediate data, repository, file system space
- Execute and manage workflows
- Enable third-parties to provide tools and services
- Enable vendors to integrate preservation services
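The idea of a service registry for characterisation and preservation action services can be illustrated with a minimal sketch. This is not the Planets implementation: the `Service` and `ServiceRegistry` classes, their fields, and the example service names are all hypothetical, and a real registry would sit behind web services rather than in memory.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    """A registered tool or service (all fields are illustrative assumptions)."""
    name: str
    kind: str            # e.g. "characterisation" or "preservation-action"
    input_format: str    # format the service accepts, e.g. "dvi"
    provider: str        # who supplies it: a partner, vendor, or third party


class ServiceRegistry:
    """In-memory registry indexed by (kind, input_format)."""

    def __init__(self):
        self._index = defaultdict(list)

    def register(self, service):
        self._index[(service.kind, service.input_format)].append(service)

    def find(self, kind, input_format):
        # Return a copy so callers cannot mutate the registry's own list.
        return list(self._index.get((kind, input_format), []))


# Hypothetical services, registered by different kinds of provider.
registry = ServiceRegistry()
registry.register(Service("dvi-props", "characterisation", "dvi", "partner"))
registry.register(Service("dvi-to-pdf", "preservation-action", "dvi", "third party"))
registry.register(Service("doc-to-odf", "preservation-action", "doc", "vendor"))

for svc in registry.find("preservation-action", "dvi"):
    print(svc.name, "from", svc.provider)
```

Indexing by service kind and input format means a workflow can ask "which preservation actions accept this format?" without knowing who supplied the tool, which is what lets third parties and vendors plug in.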
22Applications
Testbed
Preservation Planning
Characterisation Services
Workflow Designer
Admin Tool
Interoperability Framework
Service Bus
Security
Monitoring
Workflow Execution Engine
Transaction Manager
Work Space
Exception Handling
Database Layer
23Interoperability Framework Workflow design
24Preservation planning
Preservation Policy
Plans
Preservation Planner
Plan Evaluator
Content Profile
Usage Profile
Sample Content
Plan
Actions
25Preservation plan execution
Delivery
Adaptor
Executor
Repository
Plan
Content
26Content characterisation
- Characterise content to support preservation
- Reduce up-front metadata costs
- E.g., Harvard segmented images based on tool parameters
- Build on TNA's PRONOM for file-format identification
- Define a characterisation language (XCDL)
- Define an extraction language (XCEL)
- Define a pluggable interpreter
- Extend to measure loss due to actions
- All transformations cause loss
- Leverage understanding to improve file formats
- Address a root cause of digital obsolescence
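Measuring loss due to a preservation action can be sketched as: characterise the object before and after the action, then compare the two sets of properties. The example below is only an illustration under stated assumptions. It uses a plain dictionary in place of XCDL, a hand-written function in place of an XCEL-driven extractor, and a deliberately lossy ASCII conversion as a stand-in for a real migration tool. All function names are hypothetical.

```python
def characterise(text):
    """Extract a few simple properties from a text object."""
    return {
        "characters": len(text),
        "lines": text.count("\n") + 1,
        "non_ascii_characters": sum(1 for ch in text if ord(ch) > 127),
    }


def migrate_to_ascii(text):
    """Stand-in for a lossy preservation action: drop non-ASCII characters."""
    return text.encode("ascii", errors="ignore").decode("ascii")


def measure_loss(before, after):
    """Return the properties whose values changed, as (before, after) pairs."""
    return {
        name: (before[name], after[name])
        for name in before
        if before[name] != after[name]
    }


original = "Café menu\nPrice: 3 €"
migrated = migrate_to_ascii(original)
loss = measure_loss(characterise(original), characterise(migrated))
print(loss)
# prints {'characters': (20, 18), 'non_ascii_characters': (2, 0)}
```

Only properties that the characterisation step extracts can show up as loss, so the choice of properties determines what a comparison like this can detect.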
27Preservation actions
- Transform content
- Wrap third-party transformation tools
- Fill gaps with new tools
- Preserve relational databases
- Build on Swiss Archive work
- Preserve Office content
- Build on MSFT tools
- Transform environments
- Modular emulation of the full hardware/software environment
- Provides full look-and-feel
- Superb for highly dynamic content
- Leverage Virtual Machine technology
- Layered durable emulation
- Build on IBM Universal Virtual Computer (UVC)
- Establish abstract device drivers
28Conclusion
- Planets
- Is a major EU co-funded digital preservation project
- Addresses the needs of Libraries and Archives
- Has made substantial progress towards a service-oriented preservation infrastructure
- Looks forward to working with the international Digital Library community to test, evaluate, refine, improve
29Questions?
Thanks! http://www.planets-project.eu/ info@planets-project.eu