METS Case Study: The NYU Digital Library Team

Slides: 23
Provided by: Leslie5
Learn more at: https://www.loc.gov

Transcript and Presenter's Notes


1
METS Case Study The NYU Digital Library Team
  • METS Opening Day
  • 27 October, 2003
  • Leslie Myrick

2
Projects at NYU using METS
  • EAD Finding Aid Project
  • Tokyo Tribunal Proceedings
  • Afghanistan Digital Library
  • CRL Political Web Archiving Project
  • DRAM
  • Hemispheric Institute
  • REPO History Sign Project

3
WHY METS? (1)
  • METS was formulated to serve as a:
    • Submission Information Package (SIP)
    • Archival Information Package (AIP)
    • Dissemination Information Package (DIP)

4
Why METS? (2)
  • In other words, it's a:
    • Transfer Syntax
    • Archival Syntax
    • Functional Syntax

5
METS and Complex Digital Objects
  • Finding aid images with multiple scans/versions
  • Page turner for photo albums, documents, books:
    Edisto Album, Tokyo Tribunal brief, Afghanistan
    Digital Library
  • Multimedia/Time-Based Media Navigators:
    Hemispheric Institute SMIL Viewer
  • Web Site Navigator: CRL Political Communications
    Web Archiving Project

6
Using METS as a SIP
  • Berol Collection Finding Aid: in negotiations
    with RLG Cultural Materials Project
  • METS will be bundled with objects + EAD

7
METS as a Functional Syntax
  • METS is designed not only for transfer and archival
    management, but also for giving access to and
    navigating an object
  • METS + XSLT can create dynamic interfaces with
    links to resources and their metadata
  • METS can be dumped into Oracle, indexed, and
    searched using context-aware queries
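
As a rough illustration of METS as a functional syntax, the sketch below walks a structMap in page order and resolves each file pointer to its location, which is roughly the job a page-turner stylesheet has to do. It is written in Python rather than XSLT for brevity. The sample document, its IDs, labels, and file names are invented for the example; only the METS and XLink namespace URIs come from the standards.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"

# Hypothetical two-page object; not taken from any NYU project.
SAMPLE_METS = """<mets:mets xmlns:mets="http://www.loc.gov/METS/"
           xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:fileSec>
    <mets:fileGrp USE="reference">
      <mets:file ID="F1" MIMETYPE="image/jpeg">
        <mets:FLocat LOCTYPE="URL" xlink:href="page001s.jpg"/>
      </mets:file>
      <mets:file ID="F2" MIMETYPE="image/jpeg">
        <mets:FLocat LOCTYPE="URL" xlink:href="page002s.jpg"/>
      </mets:file>
    </mets:fileGrp>
  </mets:fileSec>
  <mets:structMap TYPE="physical">
    <mets:div TYPE="album">
      <mets:div TYPE="page" ORDER="2" LABEL="Page 2">
        <mets:fptr FILEID="F2"/>
      </mets:div>
      <mets:div TYPE="page" ORDER="1" LABEL="Page 1">
        <mets:fptr FILEID="F1"/>
      </mets:div>
    </mets:div>
  </mets:structMap>
</mets:mets>"""


def page_sequence(mets_xml):
    """Return (order, label, href) tuples for every page div, sorted by ORDER."""
    root = ET.fromstring(mets_xml)

    # Map each file ID in the fileSec to the location given by its FLocat.
    locations = {}
    for file_el in root.iter("{%s}file" % METS_NS):
        flocat = file_el.find("{%s}FLocat" % METS_NS)
        if flocat is not None:
            locations[file_el.get("ID")] = flocat.get("{%s}href" % XLINK_NS)

    # Follow each page div's file pointer back to that location.
    pages = []
    for div in root.iter("{%s}div" % METS_NS):
        if div.get("TYPE") != "page":
            continue
        fptr = div.find("{%s}fptr" % METS_NS)
        if fptr is None:
            continue
        pages.append((int(div.get("ORDER", "0")),
                      div.get("LABEL"),
                      locations.get(fptr.get("FILEID"))))
    return sorted(pages)


if __name__ == "__main__":
    for order, label, href in page_sequence(SAMPLE_METS):
        print(order, label, href)
```

Sorting on ORDER rather than document order means the viewer does not depend on how the divs happen to be serialized.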

8
METS Plays Well With Others
  • We have:
    • EAD Finding Aids pointing to METS
    • METS pointing to Finding Aids and MARCXML records
    • METS pointing to and manipulating TEI

9
METS and Extensions at NYU
  • MODS and DC for descriptive metadata
  • MIX for images/technical
  • textMD for text/technical
  • LC A/V Prototype: smptetechMD, AES
  • Missing links: an overall preservation schema
    plugin (PREMIS); a rights MD schema

10
Ingredients (so far)
  • Perl
  • MySQL and some Oracle
  • Tomcat
  • Servlets and JSP
  • Saxon and XT
  • XSLT

11
Tools for Creation
  • zeroDB Database
  • Input via interface as well as batch loading of
    metadata extracted by scripts
    (e.g. ImageMagick identify, arcscraper.pl)
  • Outputs METS using Perl DBI

12
Tools for Dissemination
  • Page-turners
  • Multimedia Viewers
  • Thumbnail Browsers

13
Typical METS Creation Workflow
  • ImageMagick extraction of image metadata
  • Database input (batch and manual entry) of
    descriptive and technical metadata
  • Generation of METS using Perl DBI against MySQL

14
ImageMagick Verbose Dump
  • Image: taqw_001s.jpg
  • Format: JPEG (Joint Photographic Experts Group
    JFIF format)
  • Geometry: 625x886
  • Class: DirectClass
  • Type: true color
  • Depth: 8 bits-per-pixel component
  • Colors: 33080
  • Profile-color: 552 bytes
  • Profile-iptc: 5636 bytes unknown
  • Resolution: 100x100 pixels/inch
  • Filesize: 210kb
  • Interlace: None
  • Background Color: white
  • Border Color: dfdfdf
  • Matte Color: grey74
  • Iterations: 0
  • Compression: JPEG
  • Signature: 8c37d0b82374d8eaa6b4d6b062699a9b8d7d86f2ba1d4e320f2226181d062822
  • Tainted: False

15
ImageMagick Non-Verbose Dump
  • taqw-fr001.tif TIFF 6500x6817 DirectClass 8-bit
    126mb 4.3u 0:06
  • taqw-fr001s.jpg[1] JPEG 625x886 DirectClass 8-bit
    191kb 0.0u 0:01
  • taqw-fr001t.jpg[2] JPEG 100x142 DirectClass 8-bit
    9954b 0.0u 0:01
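
A one-line dump like the ones above is easy to turn into technical metadata fields for batch loading. The sketch below is a hypothetical Python helper, not one of the project's scripts. It assumes the first three whitespace-separated fields are file name, format, and geometry, as in the dump shown, and it will not handle file names that contain spaces.

```python
def parse_identify_line(line):
    """Split one line of non-verbose identify output into basic fields.

    Assumes the layout "<file> <format> <width>x<height> ..." and ignores
    everything after the geometry field.
    """
    tokens = line.split()
    if len(tokens) < 3:
        raise ValueError("expected at least file, format and geometry: %r" % line)
    name, fmt, geometry = tokens[0], tokens[1], tokens[2]
    # Some versions append a page offset such as "+0+0"; drop it if present.
    size = geometry.split("+")[0]
    width, height = (int(part) for part in size.split("x"))
    return {"file": name, "format": fmt, "width": width, "height": height}


if __name__ == "__main__":
    sample = "taqw-fr001.tif TIFF 6500x6817 DirectClass 8-bit 126mb"
    print(parse_identify_line(sample))
```

The resulting width and height are the kind of values that would end up in a MIX technical metadata record.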

16
Extracting METS from a DB
  • doWebArchive.cgi
  • MODS for homepage; DC for pages
  • MIX for images/technical
  • textMD for web page/technical

17
METS for Discovery
  • Dump METS files into Oracle as CLOBs
  • Create Oracle interMedia index
  • XML-aware full-text search
  • Example: CRL Political Web Archiving Project
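
To make "XML-aware" concrete: the point is that a query can be restricted to the text of one element instead of matching anywhere in the file. The sketch below imitates that idea in plain Python over a dictionary of documents. It does not use Oracle or reproduce its query syntax, and the sample MODS records and the `search_within` helper are invented for the example. In a real METS file these records would sit inside a dmdSec.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"

# Hypothetical descriptive records keyed by document ID.
DOCS = {
    "a": """<mods:mods xmlns:mods="http://www.loc.gov/mods/v3">
              <mods:titleInfo><mods:title>Election News</mods:title></mods:titleInfo>
              <mods:note>Harvested from an NGO site</mods:note>
            </mods:mods>""",
    "b": """<mods:mods xmlns:mods="http://www.loc.gov/mods/v3">
              <mods:titleInfo><mods:title>Party Platform</mods:title></mods:titleInfo>
              <mods:note>Mentions the election only in passing</mods:note>
            </mods:mods>""",
}


def search_within(documents, local_name, term, ns=MODS_NS):
    """Return IDs of documents where `term` occurs inside element `local_name`."""
    tag = "{%s}%s" % (ns, local_name)
    needle = term.lower()
    hits = []
    for doc_id, xml_text in documents.items():
        root = ET.fromstring(xml_text)
        # Match only against text contained in the requested element.
        for element in root.iter(tag):
            if needle in "".join(element.itertext()).lower():
                hits.append(doc_id)
                break
    return sorted(hits)


if __name__ == "__main__":
    print(search_within(DOCS, "title", "election"))
    print(search_within(DOCS, "note", "election"))
```

A plain full-text search for "election" would return both documents; scoping the match to the title element returns only the first.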

18
CRL Political Web Archive
  • Collaboration between Stanford, Cornell, Texas,
    NYU, IA under aegis of CRL, Mellon
  • Sub-Saharan Africa, South East Asia, Latin
    America, Western Europe
  • Testbed: 400 URLs; websites from radical groups,
    NGOs
  • Internet Archive .arc files

19
Internet Archive .arc files
  • .arc file: 100 MB aggregate of harvested files,
    along with HTTP headers and a crawler-generated
    header for each file
  • Fine as a simple SIP, but basically unmanageable
    as an AIP or DIP
  • At present accessed using byte offsets to grab
    content from aggregate file
  • Only searchable by URL (Wayback Machine)
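
Access by byte offset works roughly as sketched below: an index records where each harvested file starts in the aggregate and how long it is, and a reader seeks to that position. This is a simplified illustration only. It does not implement the actual .arc record layout or its headers, and the helper names, URLs, and payloads are invented.

```python
import io


def build_aggregate(records):
    """Concatenate payloads into one blob and index them by URL.

    Returns (blob, index), where index maps url -> (offset, length).
    """
    buffer = io.BytesIO()
    index = {}
    for url, payload in records:
        index[url] = (buffer.tell(), len(payload))
        buffer.write(payload)
    return buffer.getvalue(), index


def fetch(stream, index, url):
    """Read a single record out of the aggregate using its byte offset."""
    offset, length = index[url]
    stream.seek(offset)
    return stream.read(length)


if __name__ == "__main__":
    blob, index = build_aggregate([
        ("http://example.org/", b"<html>home</html>"),
        ("http://example.org/about", b"<html>about</html>"),
    ])
    print(index)
    print(fetch(io.BytesIO(blob), index, "http://example.org/about"))
```

The index is keyed only by URL, which mirrors the limitation noted above: without extracted metadata, nothing else about the content is searchable.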

20
Automated extraction of text-based metadata (e.g.
web pages)
  • arcscraper.pl
    • Descriptive and technical MD for object
  • datscraper.pl
    • Checksums, titles
    • Links from each object
  • makeLinkTable.pl
    • Creates link-to-object relationships
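
The kind of extraction these scripts perform can be sketched in a few lines. The code below is not arcscraper.pl, datscraper.pl, or makeLinkTable.pl. It is a hypothetical Python equivalent built on the standard library that pulls a title, a checksum, and outgoing links from one harvested page, then flattens the links into (source, target) rows. The choice of SHA-256 and the sample page are assumptions made for the example.

```python
import hashlib
from html.parser import HTMLParser


class PageScraper(HTMLParser):
    """Collect the <title> text and every <a href> value from a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def scrape(url, html_bytes):
    """Return title, checksum, and links for one harvested page."""
    parser = PageScraper()
    parser.feed(html_bytes.decode("utf-8", errors="replace"))
    parser.close()
    return {
        "url": url,
        "title": parser.title.strip(),
        "checksum": hashlib.sha256(html_bytes).hexdigest(),
        "links": parser.links,
    }


def link_table(pages):
    """Flatten scraped pages into (source_url, target_href) rows."""
    return [(page["url"], target) for page in pages for target in page["links"]]


if __name__ == "__main__":
    html = (b"<html><head><title> Home </title></head>"
            b"<body><a href='/about'>About</a> <a href='/news'>News</a></body></html>")
    page = scrape("http://example.org/", html)
    print(page["title"], page["checksum"][:12])
    print(link_table([page]))
```

Rows of this shape could be loaded into a database table to record which object links to which.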

21
Go to Videotape

22
The Future?
  • Persistent Identifiers
  • Preservation Metadata Schema
  • Java development
  • Move from Oracle to Cheshire II