Title: Ingest Strategies for Digital Libraries
1Ingest Strategies for Digital Libraries
- Seamus Ross and
- Adam Rusbridge
- HATII, University of Glasgow
2Introduction
- Investigate Requirements, Procedures and Surrounding Issues of Ingest
- Focused on Metadata Requirements
- Prototype Repository with PHP/MySQL
- Based Preservation Metadata on NLNZ Schema
- Subset of MARC21 for Bibliographic Metadata
- Pragmatic and Practical
3Objectives
- To investigate the
- Development of Workflows
- Representation of Complex Objects
- Metadata Requirements
- Feasibility and Implications of Manual Acquisition
- Time and Cost Requirements
4Media Types
- Published CD-ROMs and Floppy Disks
- - Specialist software often required, complex navigational requirements
- Unpublished CD-ROMs and Floppy Disks
- - Unstructured, undocumented (often backup) collections
5Development
- Prototype Repository Development Targeted
- Defined Minimum Repository Structure
- Preservation and Bibliographic Metadata Strategy
- Linux OS, MySQL, PHP
6Analysis of Media
- Disks had little or no additional information at submission
- Necessary to execute and view all objects
- Preliminary Work Included
- - Packaging, Documentation, System Requirements and Entry Points
- Search Internet for Semantic Information
- Execute under Native Platform
7Investigation
- Establish Content of Media
- - e.g. Hybrid Disk, Unrelated Objects
- Validate Completeness and Integrity
- - e.g. Virus Checking, Comparisons of Table of Contents and Checksums
- Identification of File Types
- - Reliance on Format Identification websites (www.filext.com)
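The completeness and integrity check above (comparing a table of contents and checksums) can be sketched as follows. This is a minimal illustration, not the tooling used in the pilot: the choice of SHA-256 and the helper names `sha256_of`, `build_manifest` and `compare_manifests` are assumptions made for the example.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict:
    """Map each file path (relative to root) to its checksum.

    The keys act as a table of contents; the values are the checksums.
    """
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_manifests(expected: dict, actual: dict) -> dict:
    """Report files that are missing, unexpected, or changed."""
    return {
        "missing": sorted(set(expected) - set(actual)),
        "unexpected": sorted(set(actual) - set(expected)),
        "changed": sorted(
            name
            for name in set(expected) & set(actual)
            if expected[name] != actual[name]
        ),
    }
```

A manifest built at acquisition can be compared with one built after copying the media into the repository; an empty report in all three categories suggests the copy is complete and unaltered.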
8Investigation
- Alternative Methods of Identification
- - Binary editors required for unconventional extensions. Unsurprisingly, the information is difficult to interpret
- Time consuming and technically demanding
- Many Open-Source Utilities Required
- - Difficult to find, technically demanding to use
- Time Consuming, Mentally Exhausting, Ambiguous
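Much of what a binary editor reveals about an unconventionally named file is its leading byte signature. A minimal sketch of automating that inspection is given below. The signature table is a tiny illustrative subset of well-known formats, and the names `SIGNATURES`, `identify_by_signature` and `identify_file` are hypothetical; a real service would draw on a file format registry.

```python
from typing import Optional

# Well-known leading byte signatures ("magic numbers").
# Deliberately small; an assumption for illustration, not a registry.
SIGNATURES = [
    (b"%PDF-", "PDF document"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"PK\x03\x04", "ZIP container"),
]


def identify_by_signature(header: bytes) -> Optional[str]:
    """Return a format label if the header starts with a known signature."""
    for magic, label in SIGNATURES:
        if header.startswith(magic):
            return label
    return None  # unknown: falls back to manual inspection


def identify_file(path: str) -> Optional[str]:
    """Identify a file from its first bytes, ignoring its extension."""
    with open(path, "rb") as handle:
        return identify_by_signature(handle.read(16))
```

Files for which this returns `None` are the ones that still need the manual, time-consuming inspection described above.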
9A Workflow Model
Selection
Archiving/ Ingest
Description
Registration
Ingest Prep
Delivery Acquisition
Quarantine Virus Checking
Verification
10Representation of Objects
[Slides 10 to 21 repeat this title over a diagram with six numbered elements (1 to 6); the diagram itself is not reproduced here.]
22Metadata Requirements
- Minimum metadata required on deposit
- - In our sample set, inversely proportional to available documentation
- - Creator, Title, Description, Audience, Group
- - Automation easier with submission standards
- - Dependent on curators of selection
- Too easy to skip non-mandatory elements
- Easier to extract metadata on original platform
- How can we ensure correct representation?
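One way to stop elements being skipped is to check each deposit record before it is accepted. The sketch below assumes the five elements named above (Creator, Title, Description, Audience, Group) are the required minimum and that a record is a plain dictionary; `REQUIRED_ELEMENTS` and `missing_elements` are hypothetical names, not part of the prototype repository.

```python
# Minimum deposit elements taken from the slide above; treating all five
# as mandatory is an assumption made for this example.
REQUIRED_ELEMENTS = ("Creator", "Title", "Description", "Audience", "Group")


def missing_elements(record: dict) -> list:
    """Return the required elements that are absent or blank in a record."""
    return [
        name
        for name in REQUIRED_ELEMENTS
        if not str(record.get(name) or "").strip()
    ]


def is_acceptable(record: dict) -> bool:
    """A record is acceptable only when no required element is missing."""
    return not missing_elements(record)
```

For example, a record supplying only a title and a blank creator would be reported as missing Creator, Description, Audience and Group, and could be returned to the depositor before ingest.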
23Feasibility of Manual Acquisition
- Many utilities available, development required
- Automation essential at the file level
- - Too many files and errors
- NLNZ Extract Tool 80 Technical Metadata
- - Current limitation of formats
- - More binary filetypes needed
- Collaboration with File Format Registries
- Automation of Semantic Metadata Extraction...?
24Cost Requirements
- Difficult to determine costs from pilot
- Automation reduces costs
- Number and Expertise of Technicians
- Adherence to Submission Requirements or Standards could reduce costs
- - Only to point of content selection and appraisal, which is often also in-house
25Conclusions
- Lack of understanding and awareness must be addressed
- Focused goal of Preservation Metadata
- Collaborating system of Preservation, Bibliographic, Collection Management, Authentication and File Format Information
- Several schemas available, what's best for your needs?
26Conclusions
- Further Investigation and Guidance necessary
- - Infrastructures must be developed and implemented
- - Institutions must tailor tried and tested solutions for their needs