Title: OAI and Metadata Aggregation
1. OAI and Metadata Aggregation
Sarah Shreeves
University of Illinois at Urbana-Champaign
LIS 450 RO: Representing and Organizing Information Resources
March 7, 2004
2. Outline
- What is the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)?
- OAI projects at the University of Illinois and what we've learned
3. OAI is a tool
- Set of rules that defines the communication between systems (like FTP and HTTP)
- All about moving metadata (not data) around
- Assumes widely distributed content, but centralized services
- A building block for digital library services
- The purpose of OAI is to foster interoperability
4. OAI is not:
- Metadata
- A search tool
- A database
5. How OAI Works
[Diagram: Service Provider and Data Provider]
- Data providers and service providers
- OAI requests are sent via HTTP
- Responses are sent in valid XML
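As a sketch of the request side, assuming Python's standard library and a hypothetical endpoint `http://example.org/oai`, an OAI-PMH request is simply an HTTP GET whose `verb` parameter names one of the protocol's six requests. `build_oai_request` is an illustrative helper, not part of any OAI toolkit.

```python
from urllib.parse import urlencode

# The six request verbs defined by OAI-PMH 2.0.
OAI_VERBS = {
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
}


def build_oai_request(base_url: str, verb: str, **arguments: str) -> str:
    """Return the HTTP GET URL for one OAI-PMH request.

    base_url is the data provider's OAI endpoint (hypothetical in the example
    below); keyword arguments become query parameters such as metadataPrefix,
    set, identifier or resumptionToken.
    """
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    params = {"verb": verb}
    params.update(arguments)
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    # Ask a (hypothetical) data provider for all records as simple Dublin Core.
    print(build_oai_request("http://example.org/oai", "ListRecords", metadataPrefix="oai_dc"))
```

Because `from` is a Python keyword, a date-bounded request would pass it as `**{"from": "2004-01-01"}`. The response to any of these URLs is an XML document, which the service provider then parses.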
6. OAI Use of Dublin Core
- DC is OAI's lowest common denominator
- BUT: OAI supports and encourages the use of other community-driven metadata schemas
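OAI-PMH requires every data provider to be able to return unqualified Dublin Core under the `oai_dc` metadata prefix, which is what makes DC the common denominator. A minimal sketch of reading one such record with `xml.etree.ElementTree` follows. The element values come from the fish record later in this deck; the identifier `oai:example.org:fish-1` and the helper name `parse_oai_dc` are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}
DC_PREFIX = "{" + NS["dc"] + "}"

# Hypothetical record; element values taken from the fish image record.
SAMPLE_RECORD = """<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header><identifier>oai:example.org:fish-1</identifier></header>
  <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:publisher>UMMZ Fish Division</dc:publisher>
      <dc:format>jpeg</dc:format>
      <dc:type>image</dc:type>
      <dc:subject>1926-05-18</dc:subject>
    </oai_dc:dc>
  </metadata>
</record>"""


def parse_oai_dc(record_xml: str) -> dict[str, list[str]]:
    """Map each Dublin Core element name to its list of values.

    Simple DC elements are optional and repeatable, hence lists.
    """
    record = ET.fromstring(record_xml)
    dc = record.find("oai:metadata/oai_dc:dc", NS)
    if dc is None:  # e.g. a deleted record has a header but no metadata
        return {}
    values: dict[str, list[str]] = defaultdict(list)
    for element in dc:  # direct children of <oai_dc:dc>
        if element.tag.startswith(DC_PREFIX) and element.text:
            values[element.tag[len(DC_PREFIX):]].append(element.text.strip())
    return dict(values)
```

`parse_oai_dc(SAMPLE_RECORD)` yields `publisher`, `format`, `type` and `subject` entries and nothing else. A richer community schema would be requested with a different `metadataPrefix` and parsed with its own namespace.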
7. Harvesting vs. Federation
- Different approaches to interoperability
- Federation: services are run remotely on remote data (e.g., broadcast searching)
- Harvesting: metadata is transferred from the remote source to the destination where the services are located
- OAI is a harvesting tool
8. OAI Compared to Z39.50
9. Why Use OAI?
- Content is in non-Z39.50-enabled locations
- A metadata provider is more lightweight than Z39.50 and scales well
- The service provider wishes to augment search services, or metadata normalization is needed
- Portals can use both Z39.50 and OAI
10. Who uses OAI?
- Approximately 400 data providers
- Basic building block of the National Science Digital Library (NSDL)
- Incorporated into DSpace and Eprints.org
- Part of CONTENTdm, Michigan's DLXS, and other products
- International use: Open Archives Forum in Europe, UK and EU
11. OAI Projects at UIUC
- NSF-funded Second Generation Digital Mathematics Resources
- Mellon-funded OAI Metadata Harvesting Project
  - http://nergal.grainger.uiuc.edu/search/
- IMLS Digital Collections and Content Project
13. Challenges of Metadata Aggregation
- Heterogeneity of items described
- Loss of context / information loss
- Knowledge structures differ
- So:
  - Native metadata schemas differ
  - Controlled vocabularies differ
  - Use and presentation of items differ
14. Challenges of Metadata Aggregation
- Metadata quality issues emphasized:
  - Completeness
  - Provenance
  - Accuracy
  - Conformance to expectations
  - Logical consistency/coherence
  - Timeliness
  - Accessibility
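Conformance to expectations can often be checked mechanically. As an illustration only, assume a service provider expects `dc:date` values in the W3CDTF profile of ISO 8601, restricted here to YYYY, YYYY-MM or YYYY-MM-DD. The hypothetical helpers below score a set of dates against that expectation; the sample values in the note after the code come from the fish and coverlet records later in this deck.

```python
import re
from datetime import date

# Assumed expectation: YYYY, YYYY-MM or YYYY-MM-DD (times and zones ignored here).
W3CDTF_DATE = re.compile(r"(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?")


def conforms_to_w3cdtf(value: str) -> bool:
    """True if value matches the assumed date pattern and is a real calendar date."""
    match = W3CDTF_DATE.fullmatch(value.strip())
    if match is None:
        return False
    year, month, day = match.groups()
    try:
        # Reject impossible values such as 1926-13-40.
        date(int(year), int(month or 1), int(day or 1))
    except ValueError:
        return False
    return True


def date_conformance(values: list[str]) -> float:
    """Share of date values meeting the expectation (1.0 means all conform)."""
    if not values:
        return 0.0
    return sum(conforms_to_w3cdtf(v) for v in values) / len(values)
```

`date_conformance(["1926-05-18", "2001-04-05", "1912-1920?", "20011107162451"])` returns 0.5: the uncertain range and the packed timestamp fail the check, even though each may be meaningful inside its home collection.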
15. Metadata for different communities
16. Metadata for different communities
17. Loss of Context: Record in OAI aggregation
18. Context: Record in native database
19. Loss of context / data
20. Loss of context / data
21. Completeness of Metadata
- identifier: http://images.umdl.umich.edu/cgi/i/image/image-idx?viewentrysubviewdetailccfish3icentryidX-0802viewid1004_112
- publisher: UMMZ Fish Division
- format: jpeg
- type: image
- subject: 1926-05-18
- subject: 1926081218Trib. to Sixteen Cr. Trib. Pine River, Manistee R.R10WS26 S27JAM26-46005T21N1926/05/18
- language: UND
- description: Flora and Fauna of the Great Lakes Region
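The record above can be scored for completeness against the fifteen elements of simple Dublin Core. This sketch rests on one assumption: completeness is measured only by whether each element has at least one non-empty value. `completeness` and `FISH_RECORD` are illustrative names, and the identifier URL is abbreviated.

```python
# The fifteen elements of simple (unqualified) Dublin Core.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)

# The fish image record from this slide (identifier URL abbreviated).
FISH_RECORD = {
    "identifier": ["http://images.umdl.umich.edu/cgi/i/image/image-idx?..."],
    "publisher": ["UMMZ Fish Division"],
    "format": ["jpeg"],
    "type": ["image"],
    "subject": [
        "1926-05-18",
        "1926081218Trib. to Sixteen Cr. Trib. Pine River, Manistee R.R10WS26 "
        "S27JAM26-46005T21N1926/05/18",
    ],
    "language": ["UND"],
    "description": ["Flora and Fauna of the Great Lakes Region"],
}


def completeness(record: dict[str, list[str]]) -> tuple[float, list[str]]:
    """Return (share of DC elements with a non-empty value, names of missing elements)."""
    present = [e for e in DC_ELEMENTS if any(v.strip() for v in record.get(e, []))]
    missing = [e for e in DC_ELEMENTS if e not in present]
    return len(present) / len(DC_ELEMENTS), missing
```

For this record the score is 7/15, with title, creator, contributor, date, source, relation, coverage and rights missing. Presence is not the same as usefulness, though: `language` holds `UND` (undetermined), and the first `subject` is really a date.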
23. Granularity of Description: Excerpt of Metadata Record Describing "Cotton coverlet with embroidered butterfly design"
- Description: Digital image of a single-sized cotton coverlet for a bed with embroidered butterfly design. Handmade by Anna F. Ginsberg Hayutin.
- Source: Materials: cotton and embroidery floss. Dimensions: 71 in. x 86 in. Markings: top right hand corner has 1 1/2 in. x 1/2 in. label cut outs at upper left and right hand side for head board fabric is woven in a variation of a rib weave color each of yellow and gray hand-embroidered cotton butterflies and flowers from two shades of each color of embroidery floss - blue, pink, green and purple and single top 20 in. bordered with blue and black cotton embroidery thread stitches used for embroidery running stitch, chain stitch, French knot and back stitches selvage edges left unfinished lower edges turned under and finished with large gray running stitches made with embroidery floss.
- Format: Epson Expression 836 XL Scanner with Adobe Photoshop version 5.5; 300 dpi; 21-53K bytes. Available via the World Wide Web.
- Coverage:
- Date: Created 2001-09-19 09:45:18; Updated 20011107162451; Created 2001-04-05; Created 1912-1920?
- Type: Image
24. Granularity of Description: Excerpt of Metadata Record Describing American Woven Coverlet
- Description: Materials: Textile--Multi, PigmentDye; Manufacturing Process: Weaving--Hand, Spinning, Dyeing; Hand-loomed blue wool and white linen coverlet, worked in overshot weave in plain geometric variant of a checkerboard pattern. Coverlet is constructed from finely spun, indigo-dyed wool and undyed linen, woven with considerable skill. Although the pattern is simpler, the overall craftsmanship is higher than 1934.01.0094A. -- D. Schrishuhn, 11/19/99. This coverlet is an example of early "overshot" weaving construction, probably dating to the 1820's and is not attributable to any particular weaver. -- Georgette Meredith, 10/9/1973
- Source:
- Format: 228 x 169 x 1.2 cm (1,629 g)
- Coverage: Euro-American; America, North; United States; Indiana? Illinois?
- Date: Early 19th c. CE
- Type: cultural physical object original
25. Challenge: Range of vocabularies in use
[Chart: Controlled vocabularies in use for IMLS NLG projects (results from a survey of 65 NLG projects with digital content)]
26. Data providers can:
- Create metadata for interoperability
- Create reusable metadata: think beyond your local users and environment
- Use well-structured and well-defined schemas: move beyond simple DC
- Use and identify controlled vocabularies
27. Service providers can:
- Analyze metadata, and cluster and normalize some aspects
- Build indexes based on type of resource (image, text, physical object) rather than collection
- Build custom interfaces and selective views for target audiences / domains
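Indexing by type of resource rather than by collection first requires normalizing the `dc:type` values that different providers send. A minimal sketch, assuming a hypothetical keyword table: `image`, `Image` and `cultural physical object original` (all from the records in this deck) map onto a small set of types, and unmatched values are flagged `unknown` for human review. Collection names and record ids in the example are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical table: pattern on the raw dc:type value -> normalized resource type.
# Tried in order; \b keeps "text" from matching inside "Textile".
TYPE_PATTERNS = [
    (re.compile(r"\bimage\b", re.IGNORECASE), "image"),
    (re.compile(r"\btext\b", re.IGNORECASE), "text"),
    (re.compile(r"\bphysical object\b", re.IGNORECASE), "physical object"),
]


def normalize_type(raw: str) -> str:
    """Map a free-text dc:type value to one of the service's resource types."""
    for pattern, resource_type in TYPE_PATTERNS:
        if pattern.search(raw):
            return resource_type
    return "unknown"  # left for a human to review


def index_by_type(records: list[tuple[str, str, str]]) -> dict[str, list[tuple[str, str]]]:
    """Group (collection, record_id, raw_type) triples by normalized type."""
    index: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for collection, record_id, raw_type in records:
        index[normalize_type(raw_type)].append((collection, record_id))
    return dict(index)


if __name__ == "__main__":
    harvested = [
        ("fish", "rec-1", "image"),
        ("quilts", "rec-2", "Image"),
        ("coverlets", "rec-3", "cultural physical object original"),
    ]
    print(index_by_type(harvested))
```

Matching on whole words keeps `Textile--Multi` from being misread as `text`; a production service would more likely rely on a curated crosswalk than on a keyword list.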
28. Recap
- OAI is a tool
- OAI is easy; metadata is hard
- Better metadata means better interoperability
29. Resources
- Open Archives Initiative
  - http://www.openarchives.org
- Mellon Illinois OAI project
  - http://oai.grainger.uiuc.edu
- IMLS Digital Collections and Content Project
  - http://imlsdcc.grainger.uiuc.edu
30. Contact Information
Sarah Shreeves
Project Coordinator, IMLS Digital Collections and Content
Visiting Assistant Professor of Library Administration
University of Illinois Library at Urbana-Champaign
Email: sshreeve_at_uiuc.edu
Phone: 217.244.7809