Tools Development and Demonstration: North Carolina Geospatial Data Archiving Project

1
Tools Development and Demonstration: North Carolina Geospatial Data Archiving Project
  • Jim Tuttle
  • North Carolina State University Libraries

2
Process Overview
  • Data transfer
  • Threat and format analysis, validation
  • Archive package organization
  • Selective format migration
  • Metadata normalization and supplementation
  • Source metadata translation
  • Statistics collection
  • Extra-repository AIP management

3
Data Transfer
  • Python MD5 checksum (md5sum) comparison
  • 'Transfer set' metadata capture in 'Seed file'
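Note: a minimal sketch of the checksum comparison step, not the project's actual script. It assumes the sender supplies a manifest mapping relative paths to MD5 hex digests; the names md5_of_file, verify_transfer, and the manifest layout are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_transfer(manifest, root):
    """Compare recorded checksums with the files received under root.

    manifest: dict of relative path -> expected MD5 hex digest
    (an assumed layout, not a documented NCGDAP format).
    Returns the relative paths that are missing or do not match.
    """
    failures = []
    for rel_path, expected in manifest.items():
        target = Path(root) / rel_path
        if not target.is_file() or md5_of_file(target) != expected.lower():
            failures.append(rel_path)
    return failures
```

An empty list from verify_transfer means every file in the manifest arrived intact; anything else would be flagged for re-transfer.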

4
Threat and format analysis, validation
  • Python wrappers for the following:
  • Virus scanning: ClamAV
  • Compressed files (tar, zip, gzip, bzip)
  • Geodatabases (extension and size)
  • Executable files (magic numbers)
  • JHOVE validation
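Note: a minimal sketch of the compressed-file and executable checks, using only the Python standard library. The magic-number tables and the classify function are illustrative assumptions; the ClamAV and JHOVE wrappers are not shown.

```python
import tarfile
import zipfile

# Leading bytes ("magic numbers") for a few common formats.
# These tables are illustrative, not an exhaustive list.
EXECUTABLE_MAGIC = {
    b"\x7fELF": "ELF executable",
    b"MZ": "DOS/Windows executable",
}
COMPRESSED_MAGIC = {
    b"\x1f\x8b": "gzip",
    b"BZh": "bzip2",
}


def classify(path):
    """Return a label for executables and archives, or None otherwise."""
    with open(path, "rb") as handle:
        head = handle.read(8)
    # Executables are checked first so self-extracting archives are flagged.
    for magic, label in EXECUTABLE_MAGIC.items():
        if head.startswith(magic):
            return label
    if zipfile.is_zipfile(path):
        return "zip"
    for magic, label in COMPRESSED_MAGIC.items():
        if head.startswith(magic):
            return label
    # Checked last: is_tarfile also accepts compressed tarballs.
    if tarfile.is_tarfile(path):
        return "tar"
    return None
```

Checking file contents rather than extensions means a renamed executable or archive is still caught.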

5
Archive package organization
  • ESRI ArcGIS toolbar for selected formats

6
Archive package organization
  • Rule-based Python logic
  • filestem
  • extension relationships (multi-file format validation)
  • directory structure
  • directory structure
  • Manual intervention
  • metadata.doc
  • NOID assignment
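Note: a minimal sketch of the filestem and extension-relationship rules, assuming forward-slash paths. The rule table covers only the shapefile case (.shp, .shx, and .dbf are the mandatory parts of a shapefile); the function names and the table itself are hypothetical, not the project's rule set.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Assumed rule table: primary extension -> extensions that must accompany it.
REQUIRED_EXTENSIONS = {
    ".shp": {".shp", ".shx", ".dbf"},
}


def group_by_filestem(paths):
    """Group file extensions by shared filestem (path minus last suffix)."""
    groups = defaultdict(set)
    for path in paths:
        p = PurePosixPath(path)
        groups[str(p.with_suffix(""))].add(p.suffix.lower())
    return dict(groups)


def incomplete_datasets(paths):
    """Return filestem -> sorted list of missing required extensions."""
    problems = {}
    for stem, extensions in group_by_filestem(paths).items():
        for primary, required in REQUIRED_EXTENSIONS.items():
            if primary in extensions and not required <= extensions:
                problems[stem] = sorted(required - extensions)
    return problems
```

Datasets reported by incomplete_datasets would be candidates for the manual intervention step.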

7
Selective Format Migration
  • Conversions using ArcGIS toolbar
  • e00 interchange to coverage to shapefile
  • geodatabase to raster, shapefile, etc.
  • Original files retained

8
Metadata Normalization and Supplementation
  • Agency-specific XML templates in ArcCatalog with
    synchronization flags
  • Provenance and curation metadata scripted

9
Source Metadata Translation
  • Hub-and-spoke model à la ECHO DEPository
  • repository-agnostic
  • modular conversion hub
  • facilitates repository software migration and
    inter-archive exchange
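Note: a minimal sketch of the hub-and-spoke idea. Each format (spoke) supplies one converter to a neutral hub record and one back, so N formats need 2N converters rather than one per pair. The MetadataHub class, the spoke names, and the field names are hypothetical and do not reflect the ECHO DEPository tools or the project's code.

```python
class MetadataHub:
    """Translate records between formats via a neutral hub representation."""

    def __init__(self):
        self._to_hub = {}
        self._from_hub = {}

    def register(self, name, to_hub, from_hub):
        """Add a spoke: one converter into the hub form, one out of it."""
        self._to_hub[name] = to_hub
        self._from_hub[name] = from_hub

    def translate(self, record, source, target):
        """Convert record from the source spoke to the target spoke."""
        neutral = self._to_hub[source](record)
        return self._from_hub[target](neutral)


# Two hypothetical spokes that differ only in how they name the title field.
hub = MetadataHub()
hub.register(
    "agency_xml",
    to_hub=lambda r: {"title": r["Title"]},
    from_hub=lambda n: {"Title": n["title"]},
)
hub.register(
    "repository",
    to_hub=lambda r: {"title": r["dc:title"]},
    from_hub=lambda n: {"dc:title": n["title"]},
)
```

Migrating to different repository software then means registering one new spoke; the existing converters are untouched.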

10
Statistics Collection
  • Python-scripted statistics generation
  • number of files by format
  • cumulative size by format
  • mean file size
  • collection size
  • agency contribution
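Note: a minimal sketch of the statistics step, assuming format is approximated by file extension. The collect_statistics function and its result keys are hypothetical; agency contribution is omitted because it depends on how agencies map to directories, which the slides do not state.

```python
import os
from collections import defaultdict


def collect_statistics(root):
    """Walk root and summarise file counts and sizes by extension."""
    by_format = defaultdict(lambda: {"files": 0, "bytes": 0})
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            extension = os.path.splitext(name)[1].lower() or "(none)"
            entry = by_format[extension]
            entry["files"] += 1
            entry["bytes"] += os.path.getsize(os.path.join(dirpath, name))
    total_files = sum(e["files"] for e in by_format.values())
    total_bytes = sum(e["bytes"] for e in by_format.values())
    return {
        "by_format": dict(by_format),       # number of files, cumulative size
        "collection_files": total_files,
        "collection_bytes": total_bytes,    # collection size
        "mean_file_bytes": total_bytes / total_files if total_files else 0.0,
    }
```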

11
Extra-repository AIP management
  • Workflow Management Database populated as a spoke
    on the metadata/ingest hub
  • External tracking of NOID, Handle, ISO keywords,
    other metadata for interaction with other systems

12
Questions?
  • Jim Tuttle
  • Geospatial Data Librarian and Project Coordinator,
    NCGDAP
  • NCSU Libraries
  • jim_tuttle at ncsu dot edu
  • http://www.lib.ncsu.edu/ncgdap/