Title: Emerging Standards for Complex Works
1Emerging Standards for Complex Works
- Howard Besser
- UCLA School of Education & Information
- http://www.gseis.ucla.edu/howard
2Emerging Standards for Complex Works
- Background Context for Standards
- MOA2 Structural & Administrative Metadata
- NISO/DLF Technical Imaging Standards
- Identification/provenance
- Rich Media
- Longevity
3Key problems we're facing
- Discovery
- Longevity
- Interoperability
4Traditional Digital Library Model
5Ideal Digital Library Model
6For Interoperability, Digital Libraries Need Standards
- Descriptive Metadata for consistent description
- Discovery Metadata for finding
- Administrative Metadata for viewing and maintaining
- Structural Metadata for navigation
- ... Terms & Conditions Metadata for controlling access ...
7Why are Standards and Metadata consensus important?
- Managing digital files over time
- Longevity
- Interoperability
- Veracity
- Recording in a consistent manner
- Will give vendors incentive to create applications that support this
8Collaborative Metadata Projects
- Dublin Core
- NSF/ERCIM Digital Collaboratory
- OCLC CORC Project
- Visual Resources Association (VRA) Core
- Encoded Archival Description (EAD)
- Consortium for the Computer Interchange of Museum Information (CIMI)
- Records Export for Art and Cultural Heritage (REACH)
9CORC--Cooperative Online Resource Catalog
- both bib records & webliographies (pathfinders)
- supports both AACR2/MARC and DC
- began 1/99, scheduled availability 7/00
- 100-200 participants
- Academic libraries
- OCLC networks, special libraries, public libraries, state & national libraries, consortia
10Making of America II
- Background of the DLF Project
- Administrative Metadata
- Structural Metadata
11MOA2 Goal is Interoperability
12DLF Metadata for Interoperability Testbed: the MOA II Project
- R & D
- Distributed Repositories
- Transportation, 1869-1900
- Testbed Project
- Best Practices
- Structural and administrative metadata
13Previous Projects/Background
- Library Standards Background
- UC Berkeley Background
- Finding Aids
- EAD
- SGML
- EAD & Digital Archives
14MOA II Classes of Objects
- Continuous Tone Photos
- Photo Albums
- Diaries, journals, letterpress books
- Ledgers
- Correspondence
15MOA II Metadata
- Administrative Metadata
- for enhancing resource management
- Structural Metadata
- for reflecting internal hierarchies and relationships btwn parts
- Raw/Seared/Cooked
16MOA II Behaviors
17MOA II Best practices
- Use/Users/Collection
- Benchmarking
- Masters vs. Derivatives
- Scanning
- Administrative Metadata
- Structural Metadata
18Scanning Best Practices
- Think about users (and potential users), uses, and type of material/collection
- Scan at the highest quality that does not exceed the likely potential users/uses/material
- Do not let today's delivery limitations influence your scanning file sizes; understand the difference between digital masters and derivative files used for delivery
- Many documents which appear to be bitonal actually are better represented with greyscale scans
- Include color bar and ruler in the scan
- Use objective measurements to determine scanner settings (do NOT attempt to make the image good on your particular monitor or use image processing to color correct)
- Don't use lossy compression
- Store in a common (standardized) file format
- Capture as much metadata as is reasonably possible (including metadata about the scanning process itself)
19Why Scale is important
20Administrative Metadata: to uniquely identify a digital resource and manage it over time
- Information about where the various pieces/versions of the object reside
- Information to view the digital object
- Information about the scanning process
21Structural Metadata: that which is relevant to presentation of the digital object to the user
- metadata defining the "object": a book, a diary, a photo album
- metadata defining the sub-objects: pages (physical) or chapters and subheads (intellectual)
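A minimal sketch of how the object/sub-object hierarchy on this slide might be held in memory, assuming a simple tree. The class name `Node`, the `kind` values, and the sample diary are hypothetical illustrations, not part of the MOA II specification.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One unit of a digital object: the object itself or a sub-object."""
    label: str   # e.g. "Diary", "Entry: January", "Page 1"
    kind: str    # hypothetical values: "object", "intellectual", "physical"
    children: List["Node"] = field(default_factory=list)


def outline(node: Node, depth: int = 0) -> List[str]:
    """Flatten the hierarchy into indented lines a viewer could use for navigation."""
    lines = ["  " * depth + f"{node.label} [{node.kind}]"]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


# Hypothetical diary: intellectual divisions (entries) containing physical pages.
diary = Node("Diary", "object", [
    Node("Entry: January", "intellectual", [
        Node("Page 1", "physical"),
        Node("Page 2", "physical"),
    ]),
    Node("Entry: February", "intellectual", [Node("Page 3", "physical")]),
])

for line in outline(diary):
    print(line)
```

Keeping physical and intellectual divisions in the same tree is one possible design; a real system might keep them as two parallel hierarchies over the same page images.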
22SGML, XML, HTML
- TEI for structured humanities text
- EAD for Finding Aids
23NISO/DLF Image Metadata Workshop: Possible Goals
- Metadata fields
- Rules for Field Contents (authority control)
- Core set of necessary fields
- Syntax for expressing fields and contents (headers)
24Image Metadata: Focus on Metadata that may prove helpful for
- management
- use
- preservation
- ...
25Image Metadata: Break-out Groups Work Done
- Characteristics and Features of Images
- Image Production and Reformatting Features
- Image Identification and Integrity
26NISO/DLF Image Metadata Workshop (4/99): Image Technical Information, Possible Goals
- Metadata fields
- Rules for Field Contents (authority control)
- Core set of necessary fields
- Syntax for expressing fields and contents (headers)
27Image Metadata: Focus on Metadata that may prove helpful for
- management
- use
- preservation
- ...
28Image Metadata: Break-out Groups Work Done
- Characteristics and Features of Images
- Image Production and Reformatting Features
- Image Identification and Integrity
29Image Metadata Elements for Data Dictionary: Data Dictionary Entries
- Element Name
- Definition (short) of the element name
- Is the element required? (Identified as Mandatory, Mandatory if Applicable, Recommended, Optional)
- How is the value of the element represented?
- Examples
- When is this data collected?
- What is the purpose of this data?
- Who would the identified users be?
- How is the metadata used?
- What other metadata standards reference it?
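The questions on this slide could be captured as one record per element. The sketch below assumes a plain Python dataclass; the field names and the sample "Compression" entry (including its obligation level) are hypothetical, chosen only to mirror the bullets above.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Obligation(Enum):
    """The four requirement levels named on the slide."""
    MANDATORY = "Mandatory"
    MANDATORY_IF_APPLICABLE = "Mandatory if Applicable"
    RECOMMENDED = "Recommended"
    OPTIONAL = "Optional"


@dataclass
class DictionaryEntry:
    element_name: str
    definition: str
    obligation: Obligation
    representation: str                      # how the value is represented
    examples: List[str] = field(default_factory=list)
    collected_when: str = ""                 # when the data is collected
    purpose: str = ""                        # what the data is for
    users: List[str] = field(default_factory=list)
    usage: str = ""                          # how the metadata is used
    related_standards: List[str] = field(default_factory=list)

    def missing_fields(self) -> List[str]:
        """Names of fields still left empty, in declaration order."""
        return [name for name, value in vars(self).items() if value in ("", [])]


# Hypothetical, partially completed entry.
entry = DictionaryEntry(
    element_name="Compression",
    definition="Compression scheme applied to the image data",
    obligation=Obligation.MANDATORY_IF_APPLICABLE,
    representation="controlled term",
    examples=["none", "LZW"],
)
print(entry.missing_fields())
```

A check like `missing_fields` is one way a working group could track which entries still need answers before a data dictionary is circulated.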
30Image Metadata Elements for Data Dictionary: Characteristics and Features Element List
- Format Issues
- Resolution Issues
- Encoding
- Compression
- Others
31Image Metadata Elements for Data Dictionary: Image Production Element List (Pertaining to the Image)
- In-image target(s)
- System target(s), associated with the object
- Responsible agent
- Rationale
- Hardware
- Software
32Image Metadata Elements for Data Dictionary: Image Production Element List (Pertaining to the Process)
- Format of the image
- Intrinsic characteristics of the image
- Identification
- Provides a means for defining methodology including documentation and rationale
- Who is involved with the file?
- Who created the image file?
- Who commissioned the creation of the image file (i.e., the chartering entity), as opposed to Who is the responsible agency? Who is the owner?
- Where
- What
- When: necessary dates including capture date/time, modification
- Checksum
- Navigational aid
- Encoding tools
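The element list above includes a checksum. As a rough sketch of how a recorded checksum can later confirm that an image file is unchanged, the example below uses Python's standard `hashlib`. The function names, the choice of SHA-256, and the stand-in file are assumptions for illustration; the slide does not name an algorithm.

```python
import hashlib
import os
import tempfile


def file_checksum(path: str, algorithm: str = "sha256", chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large master images need not fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path: str, recorded: str, algorithm: str = "sha256") -> bool:
    """Compare the file's current checksum with the value stored in its metadata."""
    return file_checksum(path, algorithm) == recorded


# Demonstration with a stand-in file (not a real TIFF).
with tempfile.NamedTemporaryFile(delete=False, suffix=".tif") as tmp:
    tmp.write(b"stand-in bytes for a master image")
    master_path = tmp.name

recorded = file_checksum(master_path)        # stored at capture time
unchanged = is_intact(master_path, recorded)

with open(master_path, "ab") as handle:      # simulate later corruption
    handle.write(b"!")
changed = is_intact(master_path, recorded)

os.remove(master_path)
print(unchanged, changed)
```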
33Image Metadata: NISO/DLF Image Metadata In Progress
- Data Dictionary for both Characteristics & Features and for Image Production Elements due end of 6/00
34Finding Image Origins
35Identification/Provenance (Images)
- The number of variant forms of a work can be enormous
- Image Families
- A digital image frequently has many layers of parentage
- Information about the parentage that can indicate the quality and veracity of the image (Dublin Core "Source" and "Relation")
- how to deal with different versions derived from the same scan or different encoding schemes
- Vocabulary Standards to express this
36The number of variant forms of a work can be enormous
- different views of the same object
- different lighting of the same object
- different scans of the same photo
- different resolutions
- different compression schemes
- different compression ratios
- different file storage formats
- different details of the same image
- ...
37Image Families
38Identification/Provenance
- how to deal with different versions (browse, hi-res, medium res) derived from the same scan or different encoding schemes (TIFF, PICT, JFIF)
- Vocabulary Standards to express this
- VRA Surrogate Categories
- CIMI's "Image Elements"
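One way to picture an image family is to give each version a pointer to the version it was derived from, in the spirit of Dublin Core "Source". This is a sketch under that assumption; the identifiers, the `role` labels, and the `lineage` helper are hypothetical and are not drawn from VRA or CIMI.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ImageVersion:
    identifier: str
    file_format: str               # e.g. "TIFF", "JFIF"
    role: str                      # e.g. "master scan", "hi-res", "browse"
    source: Optional[str] = None   # identifier of the parent version, if any


def lineage(versions: Dict[str, ImageVersion], identifier: str) -> List[str]:
    """Walk from a version back through its parents to the original scan."""
    chain: List[str] = []
    current: Optional[str] = identifier
    while current is not None and current not in chain:  # guard against cycles
        chain.append(current)
        current = versions[current].source
    return chain


# Hypothetical family: one master scan and two derivatives.
family = {v.identifier: v for v in [
    ImageVersion("photo-001-master", "TIFF", "master scan"),
    ImageVersion("photo-001-hires", "JFIF", "hi-res", source="photo-001-master"),
    ImageVersion("photo-001-browse", "JFIF", "browse", source="photo-001-hires"),
]}

print(lineage(family, "photo-001-browse"))
```

Following the chain back to the master is what lets a user judge how many generations of processing separate a browse image from the original scan.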
39Other Metadata
- Description of depiction/surrogate (What VRA calls its "Surrogate Categories")
- Description of original object
- Rights and Reproduction Information
- Location Information
40Metadata for Digital Commerce
41<Indecs>
- formal structure for describing and uniquely identifying intellectual property itself, the people and businesses involved in its trading, and the agreements which they make about it (primarily for publishing, music, and visual arts)
- will develop high-level specifications for the services that will be required to implement a global IP trading system based on this <indecs> generic data model
- focus is on encoding rights at a high level, not on resource discovery
- likely to involve metadata schema registration and directory to allow interoperation of personal identifiers for rightsholders and users
- supported by EEC DG-13
- First meeting July 1999
- http://www.indecs.org/
42Problems & Potentials of Rich Media
- Types of Rich Media
- Technologies and problems
- Opportunities--a scenario
- Metadata
- Indexing
43Some Types of Rich Media
- Moving image materials
- Multimedia
- Interactive programs
- Computer art
44After an uphill battle, tech and Tinseltown find
common ground (USA Today, 3/3/00)
45Projected Changes: Prospect of digitized movies already has some mourning loss of film (SF Chronicle, 3/5/00)
46Video Technology to Make the Head Spin (NYT, 3/2/00)
47ECI - Hole in Space (both)
48ECI - 84-locations
49ECI - 84-Community Memory
50ECI - 84-MOCA
51ECI - Avatars & Humans
52ECI - Avatar Stage
53Complexity of Rich Media
- Works often have artistic nature (including video games)
- Enormous number of elements can, at times, be very important to preserve (pacing, original artifact, elements used to construct the artifact)
- Too complex to save every one of these aspects for every type of material
- Importance of saving documentation
54Rich Media Technologies
- Streaming media vs. Downloaded files
- Bandwidth and compression
- Need to offload functions onto clients
55The Inter-relation Problem
- Info is increasingly inter-related to other info
- How do we make our own Info persist when it points to and integrates with Info owned by others?
- What is the boundary of a set of information (or even of a digital object)?
56The Translation Problem
- Content translated into new delivery devices changes meaning
- A photo vs. a painting
- If Info is produced originally in digital form in one encoded format, will it be the same when translated into another format?
- Behaviors
57Problems of Rich Media
- Complexity of formats (storage & compression)
- Synchronicity between media/streams
- Pieces and Boundaries
- Persistent IDs
- Interactivity
- Historical context
- Content
- Recontextualization (Postmodernism)
58Opportunities--a scenario
- Huge stable online DB of rich media (Prelinger Archives)
- Creators create new works that consist mainly of links to and transitions btwn pieces of the rich media DB
- Works are not really assembled until run-time
- Securing IP permission may shift from capital-intensive producer to end-user
- Economics of media production may change drastically
59Structural Metadata for Complex Objects
60Synchronized Multimedia Integration Language (SMIL)
- For repurposing and reuse in different ways
- Use XML to reference various pieces in different ways
- Supported by RealMedia but not Microsoft or Macromedia
61MPEG 4
- Object-oriented
- Very low level of granularity (even objects vs. backgrounds)
- Scalable bandwidth use
- Binary Format for Scenes (BIFS) borrows concepts from VRML
62Indexing of Moving Image Materials
- Whole works vs. parts of Works
- MPEG 7
- Approaches to segmentation & thumbnail representation
- Closed caption indexing
- Audio description indexing
- Semiotics
63Other Types of Metadata
- Longevity
- Identification/Provenance
- Rights Management
64The Short Life of Digital Info: Digital Longevity Problems
- Disappearing Information
- The Viewing Problem
- The Scrambling Problem
- The Inter-relation Problem
- The Custodial Problem
- The Translation Problem
65The Viewing Problem
- Digital Info requires a whole infrastructure to view it
- Each piece of that infrastructure is changing at an incredibly rapid rate
- How can we ever hope to deal with all the permutations and combinations?
66The Scrambling Problem: Dangers from
- Compression to ease storage & delivery
- Container Architecture to enhance digital commerce
67The Inter-relation Problem
- Info is increasingly inter-related to other info
- How do we make our own Info persist when it points to and integrates with Info owned by others?
- What is the boundary of a set of information (or even of a digital object)?
68The Custodial Problem
- How do we decide what to save?
- Who should save it?
- How should they save it?
- methods for later access: emulation, migration, etc.
- issues of authenticity and evidence
69The Translation Problem
- Content translated into new delivery devices changes meaning
- A photo vs. a painting
- If Info is produced originally in digital form in one encoded format, will it be the same when translated into another format?
- Behaviors
70Pieces of the Solution (1/2)
- We need to insist upon clearly readable standardized ways for digital objects to self-identify their formats
- We should discourage scrambling
- We need to better understand how information inter-relates to other Info, and what constitutes boundaries of Info objects
71Pieces of the Solution (2/2)
- People and organizations wishing to make information persist need guidelines of how to go about doing it
- We need to better understand how translating from one storage or display format to another affects the meaning of a work
- We need to save the behaviors of a digital object, not just its contents
72Metadata can be the first line of defense
- Can tell you
- where the file is (if you can't find the file)
- where more info about the file is (if you have the file but most other metadata has become separated)
- what the file format is
- what the compression scheme is
- what application program and version is needed for the file
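When the metadata that records a file's format has been lost, the opening bytes of many image formats still identify them. The sketch below shows this fallback using well-known signatures for TIFF, JPEG, PNG, and GIF; the function name `sniff_format` and the short signature table are illustrative assumptions, not a complete registry.

```python
from typing import Optional

# Leading byte signatures of a few common image formats.
SIGNATURES = [
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
]


def sniff_format(header: bytes) -> Optional[str]:
    """Return a format name if the header starts with a known signature, else None."""
    for signature, name in SIGNATURES:
        if header.startswith(signature):
            return name
    return None


print(sniff_format(b"II*\x00\x08\x00\x00\x00"))
print(sniff_format(b"\xff\xd8\xff\xe0"))
print(sniff_format(b"no recognizable signature"))
```

A signature only identifies the container format. The compression scheme and the application version needed to render the file still have to come from recorded metadata, which is the slide's point.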
73Groups Working on the Big Longevity Problem (http://sunsite.Berkeley.EDU/Imaging/Databases/Longevity/)
- CPA Task Force
- CPA Study Group
- Getty Time & Bits Conference
- Internet Archive
- Long Now
74Migration/Refreshing
- Impact on evidential value
75Emerging Standards for Complex Works
- Howard Besser
- UCLA School of Education & Information
- http://www.gseis.ucla.edu/howard/image-meta.html
- http://sunsite.Berkeley.EDU/moa2/
- http://www.gseis.ucla.edu/howard/Classes/287-moving.html
- http://www.gseis.ucla.edu/howard/Classes/287-mov-index-bib.html
- http://www.gseis.ucla.edu/howard/Metadata/UC-May00/
- http://www.getty.edu/gri/standard/intrometadata/
- http://sunsite.Berkeley.EDU/Imaging/Databases/standards
- http://sunsite.Berkeley.EDU/Longevity/
- http://www.ifla.org/II/metadata.htm
- http://purl.oclc.org/metadata/dublin_core/
- http://purl.oclc.org/corc/
- http://lcweb.loc.gov/ead/
- http://sunsite.berkeley.edu/Metadata/sp2000.html
76Data Structures: The VRA Core
- 28 elements specifically for visual resource collections
- Work Description Categories
- Visual Document Description Categories
- http://www.oberlin.edu/art/vra/dsc.html
77VRA Core: Work Description Categories
- Work type
- Title
- Measurements
- Material
- Technique
- Creator
- Role
- Date
- Repository name
- Repository place
- Repository number
- Current site
- Original site
- Style/period/group/movement
- Nationality/culture
- Subject
- Related work
- Relationship type
- Notes
78VRA Core: Visual Document Description Categories
- Visual document type
- Visual document format
- Visual document measurements
- Visual document date
- Visual document owner
- Visual document owner number
- Visual document view description
- Visual document subject
- Visual document source
79Thesaurus for Graphic Materials
- designed for subject indexing of pictorial materials, particularly large general collections of historical images
- for cataloging and retrieval
- good for general audiences and broad approaches to the material
- TGM-I: Subject Terms; TGM-II: Genre and Physical Characteristic Terms
- http://lcweb.loc.gov/rr/print/tgm/toc.html
80AAT
- 120,000 terms
- for describing objects, textual materials, images, architecture, and material culture from antiquity to present
- large and complex
- http://www.getty.edu/gri/vocabularies/
81ULAN
- name authority
- http://www.getty.edu/gri/vocabularies/
82Thesaurus of Geographic Names
- over 1 million records
- hierarchical and global
- throughout history
- most records include coordinates and descriptive
notes
83Semantics/Syntax/Structure
- Semantics
- meaning, as defined by a community to meet their particular needs (DC)
- Syntax
- a systematic arrangement of data elements for machine processing
- facilitates the exchange and use of metadata among various applications (HTML, XML, RDF)
- Structure
- a formal arrangement of the syntax with the goal of consistent representation of the semantics (rules defining field contents like 1/11/99)
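The date "1/11/99" on this slide shows why content rules matter: without an agreed rule it can be read two ways. A small sketch using Python's standard `datetime` makes the ambiguity concrete; the helper name `to_iso` and the choice of ISO 8601 as the target form are assumptions for illustration.

```python
from datetime import datetime


def to_iso(value: str, rule: str) -> str:
    """Parse a date string under an explicit content rule and return ISO 8601."""
    return datetime.strptime(value, rule).date().isoformat()


raw = "1/11/99"
month_first = to_iso(raw, "%m/%d/%y")   # rule: month/day/year
day_first = to_iso(raw, "%d/%m/%y")     # rule: day/month/year

print(month_first)   # 1999-01-11
print(day_first)     # 1999-11-01
```

The same field contents yield dates ten months apart, so a shared structure rule has to travel with the metadata for the semantics to survive exchange.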
84Metadata Mapping
- Crosswalks
- Resource Description Framework (RDF)
85Crosswalks
- mapping btwn differing metadata structures
- eliminate the need for monolithic, universally adopted standards
- focus on flexibility and interoperability
- RDF-based metadata registries
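At its simplest, a crosswalk is a lookup table from one scheme's element names to another's. The sketch below maps a few VRA Core work categories to Dublin Core elements. The specific pairings are illustrative guesses and not an official crosswalk, and the sample record is invented.

```python
from typing import Dict, List, Tuple

# Illustrative mapping only; not an endorsed VRA-to-Dublin-Core crosswalk.
VRA_TO_DC: Dict[str, str] = {
    "Title": "Title",
    "Creator": "Creator",
    "Date": "Date",
    "Subject": "Subject",
    "Work type": "Type",
    "Related work": "Relation",
}


def crosswalk(record: Dict[str, str],
              mapping: Dict[str, str]) -> Tuple[Dict[str, List[str]], Dict[str, str]]:
    """Translate a record; return (mapped fields, fields with no target element)."""
    mapped: Dict[str, List[str]] = {}
    unmapped: Dict[str, str] = {}
    for name, value in record.items():
        target = mapping.get(name)
        if target is None:
            unmapped[name] = value           # nothing equivalent in the target scheme
        else:
            mapped.setdefault(target, []).append(value)
    return mapped, unmapped


# Invented sample record using VRA Core category names.
vra_record = {
    "Title": "View of a rail yard",
    "Work type": "photograph",
    "Technique": "albumen print",
}
dc_record, leftovers = crosswalk(vra_record, VRA_TO_DC)
print(dc_record)
print(leftovers)
```

Returning the unmapped fields separately makes the cost of a crosswalk visible: detail that the richer scheme records can be lost when moving to the simpler one.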
86Crosswalk Example
87Resource Description Framework (RDF, spec released 2/99)
- W3C Metadata activity
- designed to move the Web beyond simple links to semantically-rich relationships btwn resources
- metadata application using XML as a common syntax for exchange and processing
- flexible architecture for managing diverse application-specific metadata packets that can be processed by machines
- associates resources, property types, and corresponding values
- http://www.w3.org/RDF/
88RDF
- Resources (character strings, names, digital objects)
- Property (is the author of)
- Value
- resources, properties, relationships
- many different relationships can be reflected
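The resource/property/value model can be sketched as a list of three-part statements. The example below is a plain-Python illustration, not an RDF library; the `Triple` class, the `values_of` helper, and the sample statements (including the "Subject" value) are hypothetical.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    resource: str        # the thing being described
    property_type: str   # the kind of relationship
    value: str           # a literal or another resource


def values_of(triples: List[Triple], resource: str, property_type: str) -> List[str]:
    """Collect every value asserted for one property of one resource."""
    return [t.value for t in triples
            if t.resource == resource and t.property_type == property_type]


# Hypothetical statements about one web resource.
page = "http://www.gseis.ucla.edu/howard"
statements = [
    Triple(page, "Creator", "Howard Besser"),
    Triple(page, "Subject", "metadata"),
    Triple(page, "Subject", "digital longevity"),
]

print(values_of(statements, page, "Creator"))
print(values_of(statements, page, "Subject"))
```

Because each statement stands alone, any number of relationships can be attached to the same resource without changing a fixed record layout.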
89XML-encoded RDF
- <?xml:namespace ns="http://www.w3.org/RDF/RDF" prefix="RDF" ?>
- <?xml:namespace ns="http://purl.oclc.org/DC/" prefix="DC" ?>
- <RDF:RDF>
- <DC:Creator>Howard Besser</DC:Creator>
- </RDF:Description>
- </RDF:RDF>
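For comparison, a statement like the one on this slide can be generated with Python's standard `xml.etree.ElementTree`. This sketch uses the namespace URIs of the later standardized RDF/XML syntax and the Dublin Core 1.1 element set rather than the draft-era namespaces shown on the slide, and the described resource URL is an assumption, since the slide does not show which resource the statement is about.

```python
import xml.etree.ElementTree as ET

# Namespaces from the later standardized RDF/XML and Dublin Core 1.1,
# not the draft-era URIs shown on the slide.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)


def describe(about: str, creator: str) -> str:
    """Serialize one 'resource has creator' statement as RDF/XML."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    description = ET.SubElement(
        root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": about}
    )
    ET.SubElement(description, f"{{{DC_NS}}}creator").text = creator
    return ET.tostring(root, encoding="unicode")


# The resource URL here is an assumed example.
rdf_xml = describe("http://www.gseis.ucla.edu/howard", "Howard Besser")
print(rdf_xml)
```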