Title: METS Case Study: The NYU Digital Library Team
1METS Case Study: The NYU Digital Library Team
- METS Opening Day
- 27 October, 2003
- Leslie Myrick
2Projects at NYU using METS
- EAD Finding Aid Project
- Tokyo Tribunal Proceedings
- Afghanistan Digital Library
- CRL Political Web Archiving Project
- DRAM
- Hemispheric Institute
- REPO History Sign Project
3WHY METS? (1)
- METS was formulated to serve as a
- Submission Information Package
- Archival Information Package
- Dissemination Information Package
4Why METS? (2)
- In other words, it's a
- Transfer Syntax
- Archival Syntax
- Functional Syntax
5METS and Complex Digital Objects
- Finding aid images with multiple scans/versions
- Page turner for photo albums, documents, books: Edisto Album, Tokyo Tribunal brief, Afghanistan Digital Library
- Multimedia/Time-Based Media Navigators: Hemispheric Institute SMIL Viewer
- Web Site Navigator: CRL Political Communications Web Archiving Project
6Using METS as a SIP
- Berol Collection Finding Aid -- in negotiations with RLG Cultural Materials Project
- METS will be bundled with objects and EAD
7METS as a Functional Syntax
- METS designed not only for transfer and archival management, but also for giving access to and navigating an object
- METS with XSLT can create dynamic interfaces with links to resources and their metadata
- METS can be dumped into Oracle, indexed and searched using context-aware queries
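A minimal sketch of the navigation idea, written in Python rather than the XSLT named on the slide: it reads the METS structural map, resolves each page's file pointer against the file section, and returns an ordered list of page links that an interface could render. The sample document, its file names, and the `page_links` helper are hypothetical illustrations, not NYU's actual METS profile.

```python
import xml.etree.ElementTree as ET

# Standard METS and XLink namespaces.
NS = {"mets": "http://www.loc.gov/METS/"}
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Hypothetical two-page object; real METS files carry far more metadata.
SAMPLE_METS = """<mets:mets xmlns:mets="http://www.loc.gov/METS/"
                            xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:fileSec>
    <mets:fileGrp USE="screen">
      <mets:file ID="F1"><mets:FLocat LOCTYPE="URL" xlink:href="images/page001s.jpg"/></mets:file>
      <mets:file ID="F2"><mets:FLocat LOCTYPE="URL" xlink:href="images/page002s.jpg"/></mets:file>
    </mets:fileGrp>
  </mets:fileSec>
  <mets:structMap TYPE="physical">
    <mets:div TYPE="album" LABEL="Sample album">
      <mets:div TYPE="page" ORDER="2" LABEL="Page 2"><mets:fptr FILEID="F2"/></mets:div>
      <mets:div TYPE="page" ORDER="1" LABEL="Page 1"><mets:fptr FILEID="F1"/></mets:div>
    </mets:div>
  </mets:structMap>
</mets:mets>"""


def page_links(mets_xml):
    """Return (order, label, href) tuples for each page div, sorted by ORDER."""
    root = ET.fromstring(mets_xml)

    # Map file IDs to their locations from the file section.
    hrefs = {}
    for file_el in root.iterfind(".//mets:file", NS):
        flocat = file_el.find("mets:FLocat", NS)
        if flocat is not None:
            hrefs[file_el.get("ID")] = flocat.get(XLINK_HREF)

    # Walk the structural map and resolve each page's file pointer.
    pages = []
    for div in root.iterfind(".//mets:structMap//mets:div[@TYPE='page']", NS):
        fptr = div.find("mets:fptr", NS)
        if fptr is None:
            continue
        pages.append((int(div.get("ORDER", "0")),
                      div.get("LABEL", ""),
                      hrefs.get(fptr.get("FILEID"))))
    return sorted(pages)


for order, label, href in page_links(SAMPLE_METS):
    print(order, label, href)
```

A page turner only needs this ordered list plus a current position; previous and next links fall out of the list index.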
8METS Plays Well With Others
- We have
- EAD Finding Aids pointing to METS
- METS pointing to Finding Aids and MARCXML records
- METS pointing to and manipulating TEI
9METS and Extensions at NYU
- MODS and DC for descriptive
- MIX for Images/technical
- textMD for text/technical
- LC A/V Prototype: smptetechMD, AES
- Missing links: overall preservation schema plugin (PREMIS), rights MD schema
10Ingredients (so far)
- Perl
- MySQL and some Oracle
- Tomcat
- Servlets and jsp
- Saxon and XT
- XSLT
11Tools for Creation
- zeroDB Database
- Input via interface as well as batch loading of metadata extracted by scripts
- e.g. ImageMagick identify, arcscraper.pl
- Outputs METS using Perl DBI
12Tools for Dissemination
- Page-turners
- Multimedia Viewers
- Thumbnail Browsers
13Typical METS Creation Workflow
- ImageMagick extraction of image metadata
- Database input (batch and manual entry) of descriptive and technical metadata
- Generation of METS using Perl DBI against MySQL
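The last step of this workflow can be sketched as follows. This is an illustration under stated assumptions, not the team's script: it uses Python with an in-memory SQLite database as a stand-in for Perl DBI against MySQL, and the `image` table, its columns, and the `build_mets` helper are hypothetical. It emits only a file section and a physical structural map; descriptive and technical metadata sections are left out.

```python
import sqlite3
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def m(tag):
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def build_mets(conn, object_id):
    """Build a bare-bones METS document from rows in the (hypothetical) image table."""
    root = ET.Element(m("mets"), {"OBJID": object_id})
    file_grp = ET.SubElement(ET.SubElement(root, m("fileSec")), m("fileGrp"))
    struct_map = ET.SubElement(root, m("structMap"), {"TYPE": "physical"})
    top_div = ET.SubElement(struct_map, m("div"), {"TYPE": "object"})

    rows = conn.execute(
        "SELECT seq, filename, mimetype FROM image "
        "WHERE object_id = ? ORDER BY seq",
        (object_id,),
    )
    for seq, filename, mimetype in rows:
        file_id = f"F{seq}"
        # One <file> per image, pointing at its location.
        file_el = ET.SubElement(file_grp, m("file"),
                                {"ID": file_id, "MIMETYPE": mimetype})
        ET.SubElement(file_el, m("FLocat"),
                      {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": filename})
        # One page <div> per image, linked back to the file by ID.
        page = ET.SubElement(top_div, m("div"),
                             {"TYPE": "page", "ORDER": str(seq)})
        ET.SubElement(page, m("fptr"), {"FILEID": file_id})
    return ET.tostring(root, encoding="unicode")


# Demo data (made up for the example).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE image (object_id TEXT, seq INTEGER, "
             "filename TEXT, mimetype TEXT)")
conn.executemany(
    "INSERT INTO image VALUES (?, ?, ?, ?)",
    [("demo-album", 2, "page002s.jpg", "image/jpeg"),
     ("demo-album", 1, "page001s.jpg", "image/jpeg")],
)
mets_xml = build_mets(conn, "demo-album")
print(mets_xml)
```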
14ImageMagick Verbose Dump
- Image: taqw_001s.jpg
- Format: JPEG (Joint Photographic Experts Group JFIF format)
- Geometry: 625x886
- Class: DirectClass
- Type: true color
- Depth: 8 bits-per-pixel component
- Colors: 33080
- Profile-color: 552 bytes
- Profile-iptc: 5636 bytes (unknown)
- Resolution: 100x100 pixels/inch
- Filesize: 210kb
- Interlace: None
- Background Color: white
- Border Color: dfdfdf
- Matte Color: grey74
- Iterations: 0
- Compression: JPEG
- Signature: 8c37d0b82374d8eaa6b4d6b062699a9b8d7d86f2ba1d4e320f2226181d062822
- Tainted: False
15ImageMagick non-Verbose Dump
- taqw-fr001.tif TIFF 6500x6817 DirectClass 8-bit 126mb 4.3u 0:06
- taqw-fr001s.jpg[1] JPEG 625x886 DirectClass 8-bit 191kb 0.0u 0:01
- taqw-fr001t.jpg[2] JPEG 100x142 DirectClass 8-bit 9954b 0.0u 0:01
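Lines like these can be turned into technical metadata fields for batch loading. The sketch below is in Python and assumes the field order shown on this slide (name, format, width x height, class, depth, file size); the `parse_identify_line` helper and its field names are hypothetical, and other ImageMagick versions print a different layout, so the pattern would need adjusting.

```python
import re

# Field order assumed from the slide: name, format, WxH, class, depth, size.
# Trailing timing fields are ignored.
LINE_RE = re.compile(
    r"^(?P<name>\S+)\s+(?P<format>\S+)\s+"
    r"(?P<width>\d+)x(?P<height>\d+)\s+"
    r"(?P<image_class>\S+)\s+(?P<depth>\d+)-bit\s+(?P<size>\S+)"
)


def parse_identify_line(line):
    """Parse one non-verbose identify line into a dict, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    for key in ("width", "height", "depth"):
        record[key] = int(record[key])
    return record


record = parse_identify_line(
    "taqw-fr001.tif TIFF 6500x6817 DirectClass 8-bit 126mb 4.3u 0:06"
)
print(record)
```

Returning `None` for unmatched lines lets a batch loader log and skip unexpected output instead of failing midway through a directory of scans.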
16Extracting METS from a DB
- doWebArchive.cgi
- MODS for homepage, DC for pages
- MIX for images/technical
- textMD for web page/technical
17METS for Discovery
- Dump METS files into Oracle as CLOB
- Create Oracle Intermedia index
- XML-aware full-text search
- Example: CRL Political Web Archiving Project
18CRL Political Web Archive
- Collaboration between Stanford, Cornell, Texas, NYU, IA under aegis of CRL, Mellon
- Sub-Saharan Africa, South East Asia, Latin America, Western Europe
- Testbed: 400 URLs, websites from radical groups, NGOs
- Internet Archive .arc files
19Internet Archive .arc files
- .arc file: 100 MB aggregate of harvested files, along with HTTP headers and crawler-generated header for each file
- Fine as a simple SIP, but basically unmanageable as an AIP or DIP
- At present accessed using byte offsets to grab content from aggregate file
- Only searchable by URL (Wayback Machine)
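Byte-offset access can be illustrated with a short Python sketch. It does not implement the actual .arc record layout: it simply concatenates payloads into one file, keeps a separate index of (offset, length) per URL, and seeks to retrieve a record. The helper names, the index structure, and the example URLs are hypothetical.

```python
import os
import tempfile


def build_aggregate(path, records):
    """Write payloads back to back; return an index of url -> (offset, length)."""
    index = {}
    with open(path, "wb") as out:
        for url, payload in records:
            index[url] = (out.tell(), len(payload))
            out.write(payload)
    return index


def read_record(path, index, url):
    """Fetch one payload from the aggregate by seeking to its byte offset."""
    offset, length = index[url]
    with open(path, "rb") as agg:
        agg.seek(offset)
        return agg.read(length)


with tempfile.TemporaryDirectory() as tmp:
    agg_path = os.path.join(tmp, "demo.agg")
    index = build_aggregate(agg_path, [
        ("http://example.org/a", b"<html>first page</html>"),
        ("http://example.org/b", b"<html>second page</html>"),
    ])
    page = read_record(agg_path, index, "http://example.org/b")

print(index)
print(page)
```

The sketch also shows why such a file is hard to manage as an AIP or DIP: nothing inside the aggregate describes the objects, so every use depends on an external index keyed by URL.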
20Automated extraction of text-based metadata, e.g. web pages
- arcscraper.pl
- Descriptive and technical MD for object
- datscraper.pl
- Checksums, titles
- Links from each object
- makeLinkTable.pl
- Creates link-to-object relationships
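The kind of extraction these scripts perform can be sketched in Python (the originals are Perl and are not reproduced here). The example pulls a title, outgoing links, and a checksum from one harvested page; the `PageScraper` class, the `scrape` function, and the choice of MD5 are assumptions made for illustration.

```python
import hashlib
from html.parser import HTMLParser


class PageScraper(HTMLParser):
    """Collect the <title> text and every <a href> from an HTML page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def scrape(html_bytes):
    """Return title, links, and an MD5 checksum (algorithm assumed) for one page."""
    parser = PageScraper()
    parser.feed(html_bytes.decode("utf-8", errors="replace"))
    parser.close()
    return {
        "title": parser.title.strip(),
        "links": parser.links,
        "md5": hashlib.md5(html_bytes).hexdigest(),
    }


sample = (b"<html><head><title> Demo Page </title></head>"
          b"<body><a href='about.html'>About</a>"
          b"<a href='http://example.org/'>Home</a></body></html>")
info = scrape(sample)
print(info)
```

The list of links per page is the raw material for a link table: each (page, link) pair becomes one row relating an object to the objects it points to.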
21Go to Videotape
22The Future?
- Persistent Identifiers
- Preservation Metadata Schema
- Java development
- Move from Oracle to Cheshire II