Title: Office formats
1 Practical Experiences of the Digital Preservation
Testbed: Office formats
Jacqueline Slats
File Formats for Preservation, ERPANET, May 10-11 2004,
Vienna, Austria
2- Digital Preservation Testbed is performing
experiments on three strategies for preserving
records without affecting their authenticity:
  - Migration
  - XML
  - Emulation (Universal Virtual Computer)
- It is assessing their practical use for the Dutch
situation.
3- Experiments are taking place on
- text documents
- spreadsheets
- electronic mail
- databases.
4 Research Questions
- Advantages of each preservation approach?
- Factors affecting each approach?
- Effectiveness of each approach?
- Basic requirements for preservation (context,
content, structure, appearance, behaviour)
- Which metadata are essential for preservation?
5 Experiment Process (1)
- Step 1 Definition of the process
- Step 2 Preparation for the process
- Step 3 Authenticity requirements and evaluation
checklist
- Step 4 Design of the experiments
- Step 5 Resource specification
- Step 6 Go/no go decision
6 Experiment Process (2)
- Step 7 Development of the experiment
- Step 8 Test experiment
- Step 9 Go/no go decision
- Step 10 Run experiment
- Step 11 Evaluate experiment
- Step 12 Consider results
7 Testbed Team
8 Digital record as a combination of
- Hardware
- Software
- Computer file
9 Basic Requirements for Preservation
- Context
- Content
- Structure
- Appearance
- Behaviour
10 Basic Requirements for Preservation of text
documents: Context
- Organisational context, such as name of
organisation, business process, date, relation
with other documents
- Preservation log file, with information about
original and current file formats, name and
version of hardware, software and OS, and
preservation actions
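
The preservation log file described above could be modelled as a small data structure. The following is only a minimal sketch: the class names, field names and sample values are hypothetical and are not taken from the Testbed's own specification.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class PreservationAction:
    """One preservation action applied to the record (hypothetical shape)."""
    performed_on: date
    description: str


@dataclass
class PreservationLog:
    """Fields follow the slide: formats, hardware, software, OS, actions."""
    original_format: str
    current_format: str
    hardware: str
    software: str
    operating_system: str
    actions: list[PreservationAction] = field(default_factory=list)

    def record_migration(self, new_format: str, performed_on: date) -> None:
        """Log a migration and update the current format."""
        self.actions.append(
            PreservationAction(
                performed_on,
                f"Migrated from {self.current_format} to {new_format}",
            )
        )
        self.current_format = new_format


# Illustrative values only; none of these come from the presentation.
log = PreservationLog(
    original_format="WordPerfect 5.1",
    current_format="WordPerfect 5.1",
    hardware="Example PC",
    software="Example word processor 1.0",
    operating_system="Example OS 1.0",
)
log.record_migration("PDF", date(2004, 5, 10))
```

Keeping the original format separate from the current format means the log still shows where the record started after any number of migrations.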
11 Basic Requirements for Preservation of text
documents: Content
- All content must be preserved, including headers
and footers, table of contents, document
properties and remarks
- Plain text must always be readable
12 Basic Requirements for Preservation of text
documents: Structure
- Structure of the document must be preserved, in
order to represent the logical relations between
the components of the document, such as the order
of chapters and paragraphs, and the correct
position of inserted remarks, footnotes and images
13 Basic Requirements for Preservation of text
documents: Appearance
- The appearance of the original and the preserved
version need not be identical, but the new
appearance must not in any way affect the meaning
of the original record
14 Basic Requirements for Preservation of text
documents: Behaviour
- Description of active links must be preserved
- Active behaviour that updates the content need
not be preserved, but proof of this
behaviour-driven content must be
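
The five requirements (context, content, structure, appearance, behaviour) lend themselves to a simple evaluation checklist. The sketch below is an assumption about how such a checklist might be coded; the names `Requirement` and `unmet_requirements` are hypothetical, and the Testbed's actual checklist is not reproduced here.

```python
from enum import Enum


class Requirement(Enum):
    CONTEXT = "context"
    CONTENT = "content"
    STRUCTURE = "structure"
    APPEARANCE = "appearance"
    BEHAVIOUR = "behaviour"


def unmet_requirements(findings: dict[Requirement, bool]) -> list[Requirement]:
    """Return requirements that were not met.

    A requirement with no recorded finding is treated as unmet.
    """
    return [req for req in Requirement if not findings.get(req, False)]


# Illustrative evaluation of one preserved text document.
findings = {
    Requirement.CONTEXT: True,
    Requirement.CONTENT: True,
    Requirement.STRUCTURE: True,
    Requirement.APPEARANCE: True,   # may differ, as long as meaning is unaffected
    Requirement.BEHAVIOUR: False,   # e.g. description of active links missing
}
failed = unmet_requirements(findings)
```

Treating a missing finding as unmet is a deliberate choice in this sketch, so that an incomplete evaluation cannot pass by omission.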
15 Text documents (1)
Approach
- Migration from an older version of an application
to a newer version of this application
- Migration from an application to a standard
format (PDF)
- Migration of old records created in one word
processor to another (WP to Word)
Results
- For the short term; needs to be repeated every
few years; manual checking only if migration is
automated
- PDF is suitable to represent text documents
authentically, especially the appearance
- Met authenticity requirements only after manual
intervention
16 Text documents (2)
Approach
Results
- XML is able to represent the context, content,
structure and behaviour of text documents
authentically. To represent appearance an
additional stylesheet is required.
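
To illustrate how XML can carry context, content and structure while leaving appearance to a separate stylesheet, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The element names (`record`, `context`, `document`, `paragraph`) and the sample values are hypothetical; they are not the Testbed's schema.

```python
import xml.etree.ElementTree as ET


def build_record(title, organisation, paragraphs):
    """Build a small XML record with explicit context and structure."""
    record = ET.Element("record")

    # Context: who created the document (hypothetical element names).
    context = ET.SubElement(record, "context")
    ET.SubElement(context, "organisation").text = organisation

    # Content and structure: paragraphs keep their order explicitly.
    document = ET.SubElement(record, "document")
    ET.SubElement(document, "title").text = title
    for number, text in enumerate(paragraphs, start=1):
        para = ET.SubElement(document, "paragraph", {"order": str(number)})
        para.text = text
    return record


record = build_record(
    "Annual report",
    "Example Agency",
    ["First paragraph.", "Second paragraph."],
)
xml_text = ET.tostring(record, encoding="unicode")
```

Nothing in this record says how the text should look; fonts, margins and layout would have to come from an accompanying stylesheet, which is not shown.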
17 File format XML
Pros
- Open standard, controlled through W3C
- Platform-independent
- Self-describing and human-readable
- Well equipped to preserve content, context and
structure
Cons
- Difficult to fully preserve the appearance of a
textual document
- XML, its related standards and their use form
complex material; much pioneering work still
needs to be done
18 File format PDF
Pros
- PDF is openly and freely published
- Platform-independent
- Widely used standard
- Well equipped to preserve content, context and
appearance
Cons
- Adobe controls the development of PDF
- Sometimes loss of information (e.g. diacritic
characters)
- No full-bodied ASCII basis
19 Decision table for preservation of text documents
Text document
- Implicit structure
  - P < 10 years: PDF (alternative: backwards
  compatibility)
  - P > 10 years: PDF
- Explicit structure
  - P < 10 years: PDF or XML (alternative: backwards
  compatibility)
  - P > 10 years: PDF or XML
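
The decision table can be expressed as a small function. This is a sketch under stated assumptions: "P" is read as a period in years and "jr" as the Dutch abbreviation for years, the function name `recommend_format` is hypothetical, and a period of exactly 10 years (which the slide does not cover) is treated here as long term.

```python
from typing import Optional


def recommend_format(explicit_structure: bool, period_years: float) -> dict:
    """Return the suggested formats and any short-term alternative.

    Assumption: a period of exactly 10 years counts as long term.
    """
    # Documents with explicit structure may go to PDF or XML;
    # documents with only implicit structure go to PDF.
    formats = ["PDF", "XML"] if explicit_structure else ["PDF"]

    # For periods under 10 years the table lists backwards
    # compatibility as an alternative to converting at all.
    alternative: Optional[str] = (
        "backwards compatibility" if period_years < 10 else None
    )
    return {"formats": formats, "alternative": alternative}


short_implicit = recommend_format(explicit_structure=False, period_years=5)
long_explicit = recommend_format(explicit_structure=True, period_years=25)
```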
20 Preserved Object
[Diagram: components of the preserved object:
original file, XML file, PDF file, preservation
log file, metadata, image file, DTD or schema,
style sheet]
21 For further information about the Testbed
- Website: www.digitaleduurzaamheid.nl
- E-mail: testbed_at_nationaalarchief.nl