1
Metadata and Digital Libraries
  • The use of TEI and EAD in the University of
    Toronto's Barren Lands Digital Archive

By Marlene van Ballegooie and Sian Meikle
University of Toronto Libraries
2
Background to the Barren Lands
  • Project focused on two exploratory surveys
    conducted by J.B. Tyrrell in 1893 and 1894
  • Approximately 5,000 images
  • Full-text searchable

3
Background to the Barren Lands
  • Maps

4
Background to the Barren Lands
  • Letters

5
Background to the Barren Lands
  • Diaries

6
Background to the Barren Lands
  • Notebooks

7
Background to the Barren Lands
  • Photographs

8
Background to the Barren Lands
  • Newspaper clippings

9
Background to the Barren Lands
  • Published works

10
Online Archives: What Approach?
  • Question: how should archives be represented in an online
    environment?
  • Exhibition approach: low searchability, but
    provides context
  • Database approach: high searchability, but low
    context
  • Neither approach alone was sufficient for Barren Lands

11
Online Archives: What Standards?
  • The Library of Congress American Memory project uses
    EAD to present its archival collections online
  • California Digital Library: collection-level MARC
    records link to EAD finding aids

12
What is SGML?
  • EAD is an application of SGML
  • SGML: Standard Generalized Markup Language
  • Device- and system-independent
  • SGML is a metalanguage: a set of rules for defining
    markup languages
  • SGML Document Type Definitions (DTDs) spell out
    the allowable markup: which elements may appear and how they nest

13
SGML Example
  • Shows structural relationships between the
    elements of a document

    <document>
      <title>The story of my life</title>
      <author>
        <firstname>John</firstname>
        <surname>Smith</surname>
      </author>
    </document>
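
This example is also well-formed XML. That means a short sketch with Python's standard-library `xml.etree.ElementTree` can make the structural relationships explicit by listing each element's path from the root. The helper `element_paths` is hypothetical and exists only for illustration.

```python
import xml.etree.ElementTree as ET

# The slide's example; it is well-formed XML as well as SGML.
SAMPLE = """
<document>
  <title>The story of my life</title>
  <author>
    <firstname>John</firstname>
    <surname>Smith</surname>
  </author>
</document>
"""


def element_paths(elem, prefix=""):
    """Yield (path, text) for elem and its descendants, in document order."""
    path = f"{prefix}/{elem.tag}"
    yield path, (elem.text or "").strip()
    for child in elem:
        yield from element_paths(child, path)


root = ET.fromstring(SAMPLE.strip())
for path, text in element_paths(root):
    print(path, text)
```

Each printed path shows the nesting directly. For example, `/document/author/surname` records that the surname belongs to the author, which in turn belongs to the document.
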
14
What is XML?
  • SGML is large and complex
  • XML (Extensible Markup Language) is a subset of
    SGML
  • Developed by the W3C
  • Designed for use over the Web
  • Many DTDs are available in both SGML and XML versions
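
One practical simplification is that SGML lets a DTD declare some end tags as omissible (tag minimization), while XML requires every element to be closed explicitly or written in empty-element form. The minimal Python sketch below uses the standard-library XML parser to show a minimized, SGML-style fragment being rejected. Both fragments and the helper `is_well_formed` are hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if Python's XML parser accepts the markup as well-formed."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# SGML-style minimization: the </item> end tags are left out. An SGML DTD
# may permit this, but XML never does.
minimized = "<list><item>First<item>Second</list>"
# The same content with every element closed explicitly.
explicit = "<list><item>First</item><item>Second</item></list>"

print(is_well_formed(minimized), is_well_formed(explicit))
```
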

15
Encoded Archival Description (EAD)
  • EAD is a DTD for encoding archival finding aids
  • EAD is a non-proprietary standard
  • Available in SGML and XML versions
  • Represents the multi-level nature of archives

16
EAD Structure
  • <ead>
  • <eadheader>Contains info on the title, author,
    creation date, etc. of the finding
    aid</eadheader>
  • <archdesc>Wrapper that holds all descriptive info
    such as title, date, extent, biographical sketch,
    scope and content, administrative and arrangement
    info, notes, subject headings, etc.
  • <c01>Series-level description, e.g.
    Correspondence
  • <c02>Sub-series-level description, e.g. Family
    Correspondence
  • <c03>File-level description, e.g. Letters to
    William C. Tyrrell
  • </c03></c02></c01></archdesc></ead>
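
The multi-level nesting above can be walked with a small Python sketch (stdlib `xml.etree.ElementTree`). The sketch makes a few assumptions, stated here and in the code:
- It follows EAD 2002 in wrapping the numbered components in `<dsc>`, which the slide leaves out.
- It puts each title in `<did><unittitle>`.
- It omits required elements such as `<eadid>`, so the fragment is illustrative rather than valid EAD.

The finding-aid content and the helper `components` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Skeleton finding aid modelled on the slide. EAD 2002 wraps the numbered
# components in <dsc>; required elements such as <eadid> are omitted, so
# this illustrates the nesting only and is not a valid EAD instance.
FINDING_AID = """<ead>
  <eadheader>
    <filedesc><titlestmt><titleproper>Example finding aid</titleproper></titlestmt></filedesc>
  </eadheader>
  <archdesc level="fonds">
    <did><unittitle>Example fonds</unittitle></did>
    <dsc>
      <c01 level="series">
        <did><unittitle>Correspondence</unittitle></did>
        <c02 level="subseries">
          <did><unittitle>Family Correspondence</unittitle></did>
          <c03 level="file">
            <did><unittitle>Letters to William C. Tyrrell</unittitle></did>
          </c03>
        </c02>
      </c01>
    </dsc>
  </archdesc>
</ead>"""

# Numbered components run from <c01> to <c12> in EAD 2002.
COMPONENT = re.compile(r"c(0[1-9]|1[0-2])")


def components(elem, depth=0):
    """Yield (depth, level, title) for each numbered component, top down."""
    for child in elem:
        if COMPONENT.fullmatch(child.tag):
            title = child.findtext("did/unittitle", default="").strip()
            yield depth, child.get("level"), title
            yield from components(child, depth + 1)


dsc = ET.fromstring(FINDING_AID).find("archdesc/dsc")
for depth, level, title in components(dsc):
    print("  " * depth + f"{level}: {title}")
```

The indented output mirrors the series / sub-series / file hierarchy that the slide describes.
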

17
Encoding Finding Aids With EAD
  • EAD allows for description of items, BUT it was not
    sufficient for the Barren Lands project. Why?
  • Could not include all metadata elements
  • Could not embed OCR effectively
  • Could not include the necessary structural
    metadata

18
Describing Items
  • What metadata standards are used to describe
    items?
  • For books, TEI Lite is a popular standard
  • Used by Making of America (MOA) and Early Canadiana
    Online (ECO)

19
What is TEI?
  • International standard for encoding electronic
    textual materials
  • Developed by humanities scholars beginning in 1987;
    guidelines published in 1994
  • TEI Lite: a subset of TEI containing the most
    commonly used tags

20
TEI Example
  • The header
  • <teiHeader>
  • <fileDesc> </fileDesc>
    • Bibliographic information: author, title,
      publisher, etc.
  • <encodingDesc> </encodingDesc>
    • How the material was modified when digitized
  • <profileDesc> </profileDesc>
    • Non-bibliographic information: subject
      descriptors, etc.
  • <revisionDesc> </revisionDesc>
    • Revision log for the digital version
  • </teiHeader>
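
To see how these sections are used, the Python sketch below (stdlib `xml.etree.ElementTree`) reads a few bibliographic fields from a minimal header. It assumes a P4-style header with no namespace; P5 TEI uses the `http://www.tei-c.org/ns/1.0` namespace. All content values are placeholders, the fragment has not been validated against the TEI DTD, and `summarize_header` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Minimal P4-style (no namespace) TEI header with the four sections named on
# the slide. Values are placeholders; not validated against the TEI DTD.
HEADER = """<teiHeader>
  <fileDesc>
    <titleStmt>
      <title>Example title</title>
      <author>Example author</author>
    </titleStmt>
    <publicationStmt><publisher>Example publisher</publisher></publicationStmt>
    <sourceDesc><p>Example source description.</p></sourceDesc>
  </fileDesc>
  <encodingDesc><p>Example note on how the text was digitized.</p></encodingDesc>
  <profileDesc>
    <textClass><keywords><term>Example subject</term></keywords></textClass>
  </profileDesc>
  <revisionDesc><change><item>Example revision entry.</item></change></revisionDesc>
</teiHeader>"""


def summarize_header(header):
    """Collect a few <fileDesc> fields plus the names of the top-level sections."""
    return {
        "title": header.findtext("fileDesc/titleStmt/title"),
        "author": header.findtext("fileDesc/titleStmt/author"),
        "publisher": header.findtext("fileDesc/publicationStmt/publisher"),
        "subjects": [term.text for term in header.iter("term")],
        "sections": [child.tag for child in header],
    }


print(summarize_header(ET.fromstring(HEADER)))
```
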

21
TEI Example
  • The body
  • <body>
  • The document itself, which in turn contains structural
    elements. Some examples:
  • <div1> through <div7>
    • Describe major structural divisions
      hierarchically
  • <p>, <pb>
    • Mark paragraphs and the boundaries between pages of text
  • <titlePage>, <epigraph>, <salute>
    • For the title page, epigraph, salutation, and so
      on
  • <ref>, <xref>
    • For pointers to other places, possibly with
      explanatory text
  • </body>
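
The hierarchical numbered divisions can be outlined with a short Python sketch (stdlib `xml.etree.ElementTree`). The sketch reports each `<div1>` to `<div7>` together with the page breaks that fall inside it. The body content and the helper `divisions` are hypothetical. Because `iter()` also visits descendants, a division's page breaks include those of its subdivisions.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical TEI body using numbered divisions, paragraphs, and page breaks.
BODY = """<body>
  <div1 type="chapter">
    <head>Chapter one</head>
    <pb n="1"/>
    <p>First paragraph.</p>
    <div2 type="section">
      <head>A section</head>
      <p>Second paragraph.</p>
      <pb n="2"/>
      <p>Third paragraph.</p>
    </div2>
  </div1>
  <div1 type="chapter">
    <head>Chapter two</head>
    <pb n="3"/>
    <p>Fourth paragraph.</p>
  </div1>
</body>"""

DIVISION = re.compile(r"div([1-7])")


def divisions(elem):
    """Yield (level, type, heading, page_breaks) for each <div1>..<div7>, top down."""
    for child in elem:
        match = DIVISION.fullmatch(child.tag)
        if match:
            # iter() includes descendants, so nested page breaks are counted too.
            page_breaks = [pb.get("n") for pb in child.iter("pb")]
            yield int(match.group(1)), child.get("type"), child.findtext("head"), page_breaks
            yield from divisions(child)


for level, kind, heading, page_breaks in divisions(ET.fromstring(BODY)):
    print("  " * (level - 1) + f"{kind}: {heading} (page breaks {', '.join(page_breaks)})")
```
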

22
Item Level Description
  • Barren Lands tags included:
  • <teiHeader> and its sub-elements to capture
    descriptive metadata
  • <xref> for linking documents
  • <p>, <pb> to demarcate pages and hold structural
    metadata

23
Text Encoding
  • Descriptive metadata encoded in the <teiHeader>
  • Used optical character recognition (OCR) and
    manual rekeying to obtain text files
  • Placed each page of text within paragraph tags
    (<p>) in the body of the TEI record

24
Structural Metadata
  • Structural metadata needed to keep track of the image
    sequence
  • Encoded in the empty page break <pb> element
  • Example:
    <pb id='tyrrell/text/T10001/0004' n='0'
    type='ill' title='A Valley in the Barrens'
    rotate='yes'>
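
The Python sketch below (stdlib `xml.etree.ElementTree`) shows how the image sequence could be recovered from `<pb>` attributes like those in the example. It rests on several assumptions:
- It uses XML syntax, so the empty element is written `<pb .../>`.
- It treats `type="ill"` as marking an illustration page.
- It treats `rotate="yes"` as a boolean flag, since the slide does not say how the project's software interpreted these values.

The first `id` and its attributes come from the slide. The second page's values and the helper `page_sequence` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Page-by-page encoding: an empty <pb> carries the structural metadata and the
# page's text follows in a <p>. The first <pb> comes from the slide; the
# second is hypothetical. Meanings of type/rotate are assumed.
TEXT = """<body>
  <pb id="tyrrell/text/T10001/0004" n="0" type="ill"
      title="A Valley in the Barrens" rotate="yes"/>
  <p></p>
  <pb id="tyrrell/text/T10001/0005" n="1"/>
  <p>Example OCR text for the following page.</p>
</body>"""


def page_sequence(body):
    """Return one record per <pb>, in document order, i.e. the image sequence."""
    return [
        {
            "image_id": pb.get("id"),
            "page": pb.get("n"),
            "illustration": pb.get("type") == "ill",  # assumed meaning of type='ill'
            "caption": pb.get("title"),
            "rotate": pb.get("rotate") == "yes",      # assumed boolean flag
        }
        for pb in body.iter("pb")
    ]


for record in page_sequence(ET.fromstring(TEXT)):
    print(record)
```

Keeping this metadata in attributes of the empty `<pb>` leaves the page text in `<p>` untouched. This is consistent with the project's later remark that information was added via attributes.
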

25
Barren Lands: Two-Tiered Approach
[Diagram: an EAD finding aid lists items; each item is connected to its own TEI item description by an <extrefloc> link and by a programmed link.]
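
The link between the two tiers can be checked with a rough Python sketch (stdlib `xml.etree.ElementTree`). The sketch pairs each EAD link target with an entry in an index of TEI item descriptions and flags targets that have no TEI record. It makes these assumptions:
- Links carry an `href` attribute holding a TEI document id.
- The diagram names `<extrefloc>`, but the fragment uses the simpler `<extref>`; both tags are handled.
- The dictionary `TEI_INDEX` stands in for the "programmed link".

All ids, titles, and the helper `resolve_links` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical EAD file-level component whose links point at TEI item
# descriptions by document id (href holding a TEI id is an assumption).
COMPONENT = """<c03 level="file">
  <did><unittitle>Letters to William C. Tyrrell</unittitle></did>
  <scopecontent>
    <p><extref href="T10001">First letter</extref></p>
    <p><extref href="T10002">Second letter</extref></p>
  </scopecontent>
</c03>"""

# Hypothetical stand-in for the TEI tier: document id -> title from its header.
TEI_INDEX = {"T10001": "Example letter A", "T10002": "Example letter B"}

LINK_TAGS = {"extref", "extrefloc"}


def resolve_links(component, tei_index):
    """Pair each EAD link target with its TEI record, flagging dangling links."""
    links = []
    for elem in component.iter():
        target = elem.get("href")
        if elem.tag in LINK_TAGS and target:
            links.append((target, tei_index.get(target, "MISSING")))
    return links


print(resolve_links(ET.fromstring(COMPONENT), TEI_INDEX))
```
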
27
Did this approach work?
  • Modifications preserved the intent of the DTDs but added
    information via attributes
  • Minimal structural encoding allowed us to meet
    project needs cost-effectively
  • Use of standards allowed us to use standard
    crosswalks to extract information for data
    warehousing
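
The Python sketch below (stdlib `xml.etree.ElementTree`) gives an idea of what a crosswalk does: it maps TEI header paths to Dublin Core element names. The mapping table is a simplified, hypothetical one written for illustration. It is not any published TEI-to-Dublin-Core crosswalk, which would cover many more elements. The header content is placeholder text.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical TEI-header-to-Dublin-Core mapping.
TEI_TO_DC = {
    "fileDesc/titleStmt/title": "title",
    "fileDesc/titleStmt/author": "creator",
    "fileDesc/publicationStmt/publisher": "publisher",
    "profileDesc/textClass/keywords/term": "subject",
}

HEADER = """<teiHeader>
  <fileDesc>
    <titleStmt><title>Example title</title><author>Example author</author></titleStmt>
    <publicationStmt><publisher>Example publisher</publisher></publicationStmt>
  </fileDesc>
  <profileDesc><textClass><keywords>
    <term>Example subject 1</term><term>Example subject 2</term>
  </keywords></textClass></profileDesc>
</teiHeader>"""


def crosswalk(header):
    """Map TEI header fields to Dublin Core element names (values are repeatable)."""
    record = {}
    for path, dc_element in TEI_TO_DC.items():
        for elem in header.findall(path):
            text = (elem.text or "").strip()
            if text:
                record.setdefault(dc_element, []).append(text)
    return record


print(crosswalk(ET.fromstring(HEADER)))
```

Because every record follows the same header structure, one mapping table can in principle be applied across a whole collection. That uniformity is what makes standards-based crosswalks convenient for loading a data warehouse.
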

28
Further Resources: Standards
  • XML in 10 Points (W3C): http://www.w3.org/XML/1999/XML-in-10-points
  • TEI Lite tag list: http://www.tei-c.org/Lite/U5-taglist.html
  • Encoded Archival Description: An Introduction and
    Overview, by Daniel Pitti: http://www.dlib.org/dlib/november99/11pitti.html
  • Library of Congress standards: http://www.loc.gov/standards/
  • Digital Library Federation standards: http://www.diglib.org/standards.htm

29
Further Resources: Digital Archives
  • Images Canada: http://www.imagescanada.ca/
  • Making of America: http://moa.umdl.umich.edu/index.html
  • American Memory: http://memory.loc.gov/
  • Barren Lands: http://digital.library.utoronto.ca/Tyrrell/
  • Virginia Digital Library: http://www.lva.lib.va.us/dlp/