Title: Metadata and Digital Libraries
1. Metadata and Digital Libraries
- The use of TEI and EAD in the University of Toronto's Barren Lands Digital Archive
By Marlene van Ballegooie and Sian Meikle
University of Toronto Libraries
2. Background to the Barren Lands
- Project focused on two exploratory surveys conducted by J.B. Tyrrell in 1893 and 1894
- Approximately 5,000 images
- Full-text searchable
10. Online Archives: what approach?
- Question: how to represent archives in an online environment?
- Exhibition approach: low searchability, but provides context
- Database approach: high searchability, low context
- Neither is enough for Barren Lands
11. Online Archives: what standards?
- Library of Congress American Memory project, presenting its archival collections online, uses EAD
- California Digital Library: collection-level MARC records link to EAD finding aids
12. What is SGML?
- EAD is an application of SGML
- SGML: Standard Generalized Markup Language
- Device and system independent
- SGML is a metalanguage: a set of rules for defining a markup language
- SGML Document Type Definitions (DTDs) spell out the allowable language
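The idea that a DTD spells out the allowable language can be approximated in a few lines. This is only a sketch: the element names and the `ALLOWED_CHILDREN` table are hypothetical, and a real DTD also constrains order, repetition, and attributes, none of which this toy check covers.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model: for each element, the child elements it may hold.
# A real DTD also states order and cardinality; this table does not.
ALLOWED_CHILDREN = {
    "document": {"title", "author"},
    "title": set(),
    "author": {"firstname", "surname"},
    "firstname": set(),
    "surname": set(),
}

def violations(element):
    """Return messages for every element that breaks the content model."""
    if element.tag not in ALLOWED_CHILDREN:
        return ["unknown element <%s>" % element.tag]
    problems = []
    for child in element:
        if child.tag not in ALLOWED_CHILDREN[element.tag]:
            problems.append(
                "<%s> not allowed inside <%s>" % (child.tag, element.tag)
            )
        else:
            problems.extend(violations(child))
    return problems

good = ET.fromstring(
    "<document><title>T</title><author><surname>S</surname></author></document>"
)
bad = ET.fromstring("<document><surname>S</surname></document>")

print(violations(good))
print(violations(bad))
```

The first document follows the table and produces no messages; the second places a surname directly under the root, which the table does not permit.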
13. SGML Example
- Shows structural relationships between the elements of a document
<document>
  <title>The story of my life</title>
  <author>
    <firstname>John</firstname>
    <surname>Smith</surname>
  </author>
</document>
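The structural relationships in this example can be made visible by walking the element tree. The sketch below assumes the example is written as well-formed XML and uses Python's standard `xml.etree.ElementTree` parser; the `outline` helper is a name invented for illustration.

```python
import xml.etree.ElementTree as ET

# The slide's example, written as well-formed XML.
SAMPLE = """
<document>
  <title>The story of my life</title>
  <author>
    <firstname>John</firstname>
    <surname>Smith</surname>
  </author>
</document>
"""

def outline(element, depth=0):
    """Return one (depth, tag, text) tuple per element, in document order."""
    text = (element.text or "").strip()
    rows = [(depth, element.tag, text)]
    for child in element:
        rows.extend(outline(child, depth + 1))
    return rows

root = ET.fromstring(SAMPLE)
rows = outline(root)

# Indentation mirrors nesting: author contains firstname and surname.
for depth, tag, text in rows:
    print("  " * depth + tag + (": " + text if text else ""))
```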
14. What is XML?
- SGML is large and complex
- XML (Extensible Markup Language) is a subset of SGML
- Developed by the W3C
- Designed for use over the web
- Many DTDs are available in both SGML and XML versions
15. Encoded Archival Description (EAD)
- EAD is a DTD for encoding archival finding aids
- EAD is a non-proprietary standard
- Available in SGML and XML versions
- Represents the multi-level nature of archives
16. EAD Structure
<ead>
  <eadheader>Contains info on title, author, creation date, etc. of the finding aid</eadheader>
  <archdesc>Wrapper that holds all descriptive info such as title, date, extent, biographical sketch, scope and content, administrative and arrangement info, notes, subject headings, etc.
    <c01>Series level description, e.g. Correspondence
      <c02>Sub-series level description, e.g. Family Correspondence
        <c03>File level description, e.g. Letters to William C. Tyrrell
        </c03>
      </c02>
    </c01>
  </archdesc>
</ead>
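The multi-level nesting of series, sub-series, and file can be read back out of a finding aid with a short recursive walk. This is a sketch over a hypothetical, heavily simplified record: the element nesting follows the slide, the `level` attribute and `<did>/<unittitle>` children are standard EAD but were added here for illustration, and the `<dsc>` wrapper that real records place around components is omitted to match the slide.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, simplified finding aid (not a valid EAD record).
FINDING_AID = """
<ead>
  <eadheader>Header for the finding aid</eadheader>
  <archdesc>
    <c01 level="series">
      <did><unittitle>Correspondence</unittitle></did>
      <c02 level="subseries">
        <did><unittitle>Family Correspondence</unittitle></did>
        <c03 level="file">
          <did><unittitle>Letters to William C. Tyrrell</unittitle></did>
        </c03>
      </c02>
    </c01>
  </archdesc>
</ead>
"""

# Numbered component tags: c01, c02, c03, and so on.
COMPONENT = re.compile(r"c\d\d$")

def components(element, depth=0):
    """Yield (depth, level, title) for each numbered component, top down."""
    for child in element:
        if COMPONENT.match(child.tag):
            title = child.findtext("did/unittitle", default="")
            yield depth, child.get("level", ""), title
            yield from components(child, depth + 1)

archdesc = ET.fromstring(FINDING_AID).find("archdesc")
levels = list(components(archdesc))
for depth, level, title in levels:
    print("  " * depth + level + ": " + title)
```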
17. Encoding Finding Aids With EAD
- EAD allows for description of items, BUT it was not sufficient for the Barren Lands project. Why?
- Could not include all metadata elements
- Could not embed OCR effectively
- Could not include the necessary structural metadata
18. Describing Items
- What metadata standards are used to describe items?
- For books, TEI Lite is a popular standard
- Used by Making of America (MOA) and Early Canadiana Online (ECO)
19. What is TEI?
- International standard for encoding electronic textual materials
- Developed by humanities scholars beginning in 1987; guidelines published in 1994
- TEI Lite: a subset of TEI containing the most commonly used tags
20. TEI Example
- The header
- <teiHeader>
  - <fileDesc> </fileDesc>
    - Bibliographic information: author, title, publisher, etc.
  - <encodingDesc> </encodingDesc>
    - How the material was modified when digitized
  - <profileDesc> </profileDesc>
    - Non-bibliographic information: subject descriptors, etc.
  - <revisionDesc> </revisionDesc>
    - Revision log for digital version
- </teiHeader>
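The four parts of the header can be pulled apart with a standard XML parser. The sketch below uses a hypothetical, unvalidated header: only the four top-level element names come from the slide, and the child elements and sample values are placeholders invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical header with placeholder content (not checked against the TEI DTD).
HEADER = """
<teiHeader>
  <fileDesc>
    <titleStmt>
      <title>Sample title</title>
      <author>Sample author</author>
    </titleStmt>
  </fileDesc>
  <encodingDesc><p>Text captured by OCR and corrected by hand.</p></encodingDesc>
  <profileDesc><textClass><keywords><term>Sample subject</term></keywords></textClass></profileDesc>
  <revisionDesc><change>Initial encoding.</change></revisionDesc>
</teiHeader>
"""

# Role of each section, as described on the slide.
ROLES = {
    "fileDesc": "bibliographic information",
    "encodingDesc": "how the material was modified when digitized",
    "profileDesc": "non-bibliographic information",
    "revisionDesc": "revision log for the digital version",
}

header = ET.fromstring(HEADER)
sections = [child.tag for child in header]
title = header.findtext("fileDesc/titleStmt/title")

for tag in sections:
    print(tag + ": " + ROLES.get(tag, "not covered on the slide"))
print("title: " + title)
```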
21. TEI Example
- The body
- <body>
  - The document itself. Contains structural elements in turn. Some examples:
  - <div1> ... <div7>
    - Describe major structural divisions hierarchically
  - <p>, <pb>
    - Mark paragraphs and the boundaries between pages of text
  - <titlePage>, <epigraph>, <salute>, ...
    - For the title page, epigraph, salutation, and so on
  - <ref>, <xref>
    - For pointers to other places, possibly with explanatory text
- </body>
22. Item Level Description
- Barren Lands tags included:
- <teiHeader> and its sub-elements to capture descriptive metadata
- <xref> for linking documents
- <p>, <pb> to demarcate pages and hold structural metadata
23. Text Encoding
- Descriptive metadata encoded in the TEI header
- Used optical character recognition (OCR) and manual rekeying to obtain text files
- Placed each page of text within paragraph tags <p> in the body of the TEI record
24. Structural Metadata
- Structural metadata needed to keep track of image sequence
- Encoded in the empty page break <pb> element
- Example:
<pb id="tyrrell/text/T10001/0004" n="0" type="ill" title="A Valley in the Barrens" rotate="yes">
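A reader application could recover the image sequence by collecting the page break elements in document order. The sketch below is hypothetical: the attribute names and the first element's values follow the slide's example, the second page break is invented, and the reading of `id` as an image path and `n` as a sequence number is an assumption rather than something the slides state. XML parsers also need the self-closing form `<pb .../>`, unlike the SGML form on the slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical TEI body. The first <pb> copies the slide's attribute values;
# the second <pb> and both paragraphs are invented for illustration.
BODY = """
<body>
  <pb id="tyrrell/text/T10001/0004" n="0" type="ill"
      title="A Valley in the Barrens" rotate="yes"/>
  <p>Text of the first page.</p>
  <pb id="tyrrell/text/T10001/0005" n="1" type="text" title="" rotate="no"/>
  <p>Text of the second page.</p>
</body>
"""

def image_sequence(body):
    """Return one dictionary per page break, in document order."""
    pages = []
    for pb in body.iter("pb"):
        pages.append({
            "image": pb.get("id"),               # assumed to identify the image
            "sequence": int(pb.get("n", "0")),   # assumed to be the page order
            "kind": pb.get("type", ""),
            "title": pb.get("title", ""),
            "rotate": pb.get("rotate") == "yes",
        })
    return pages

pages = image_sequence(ET.fromstring(BODY))
for page in pages:
    print(page["sequence"], page["image"], page["title"])
```

Keeping this information in attributes of an empty element leaves the page text itself untouched, which matches the deck's point that structural metadata was added without changing the intent of the DTD.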
25. Barren Lands: Two-tiered approach
[Diagram: an EAD finding aid, three TEI item descriptions, and their items. Link labels shown in the diagram: "Extrefloc link" and "Programmed link".]
27. Did this approach work?
- Modifications preserved the intent of the DTD, but added information via attributes
- Minimal structural encoding allowed us to meet project needs cost-effectively
- Use of standards allowed us to use standard crosswalks to extract information for data warehousing
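A crosswalk is essentially a table that maps fields in one schema to fields in another. The slides do not say which target schema or mapping was used, so the sketch below is purely illustrative: it maps a few TEI header paths to Dublin Core element names, and the `CROSSWALK` table, the sample header, and the `to_dublin_core` helper are all assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical crosswalk: TEI header path -> Dublin Core element name.
CROSSWALK = {
    "fileDesc/titleStmt/title": "title",
    "fileDesc/titleStmt/author": "creator",
    "fileDesc/publicationStmt/publisher": "publisher",
    "fileDesc/publicationStmt/date": "date",
}

# Hypothetical header with placeholder values; it has no <date> on purpose.
HEADER = """
<teiHeader>
  <fileDesc>
    <titleStmt>
      <title>Sample title</title>
      <author>Sample author</author>
    </titleStmt>
    <publicationStmt>
      <publisher>Sample publisher</publisher>
    </publicationStmt>
  </fileDesc>
</teiHeader>
"""

def to_dublin_core(header):
    """Build a flat record from the header, skipping fields that are absent."""
    record = {}
    for path, dc_name in CROSSWALK.items():
        value = header.findtext(path)
        if value and value.strip():
            record[dc_name] = value.strip()
    return record

record = to_dublin_core(ET.fromstring(HEADER))
print(record)
```

A flat record like this is the kind of output that could be loaded into a warehouse table, one column per target element.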
28. Further resources: standards
- XML in 10 points (W3C): http://www.w3.org/XML/1999/XML-in-10-points
- TEI Lite tag list: http://www.tei-c.org/Lite/U5-taglist.html
- Encoded Archival Description: An Introduction and Overview, by Daniel Pitti: http://www.dlib.org/dlib/november99/11pitti.html
- Library of Congress standards: http://www.loc.gov/standards/
- Digital Library Federation standards: http://www.diglib.org/standards.htm
29. Further resources: digital archives
- Images Canada: http://www.imagescanada.ca/
- Making of America: http://moa.umdl.umich.edu/index.html
- American Memory: http://memory.loc.gov/
- Barren Lands: http://digital.library.utoronto.ca/Tyrrell/
- Virginia Digital Library: http://www.lva.lib.va.us/dlp/