Title: Modeling a Natural Language Gateway to Metadata-Enabled Resources
1 Modeling a Natural Language Gateway to
Metadata-Enabled Resources
- Lynne C. Howarth
- Faculty of Information Studies
- University of Toronto, Canada
2 Acknowledgements
- Funding provided by the Social Sciences and
Humanities Research Council of Canada (SSHRC SRG
410-99-1287)
3 Introduction
- "The development of technologies that enable
access to information regardless of geographic or
language barriers is a key factor for truly
global sharing of knowledge."
- Source: Oard, D., et al. 1999. Multilingual
Information Discovery and AccesS (MIDAS). D-Lib
Magazine, October 1999. Available at URL:
http://www.dlib.org/dlib/october99/Ioard.html
4 Introduction - 2
- End-user expectations for seamless cross-language
retrieval are increasing with greater use of and
access to Web resources (Large & Moukdad, 2000)
- Cross-language retrieval systems are in experimental
stages (Peters & Braschler, 2001)
- Need for systems that obviate the requirement of
understanding underlying metadata structures and
tagging (Buckland et al., 1999)
- Current research focus in the metadata arena has
tended to be on syntax, with less focus on semantics
5 Research Objectives
- Building on previous research (Howarth, Cronin, &
Hannaford, 2002; 2003):
- to develop and refine a common set of labelled
categories to serve as a natural language
gateway to metadata-enabled resources,
enhancing:
- Semantic interoperability
- Language interoperability
- Multilingual access
- Cross-domain searching
6 Crosswalks - 1
- Identified and analysed the structure and content
of eight metadata schemes from various domains:
- Encoded Archival Description (EAD)
- Dublin Core (DC) (qualified)
- Government (now Global) Information Locator
Service (GILS)
- Text Encoding Initiative (TEI)
- Visual Resources Association (VRA) Visual
Document Description Categories
- Consortium for Interchange of Museum Information
(CIMI) (now ceased)
- Digital Geospatial Metadata (DGM)
- ONIX (Online Information Exchange) (publishing
domain)
7 Crosswalks - 2
- Using MARC21 as a baseline, and existing
cross-schema crosswalks as benchmarks, created
a master crosswalk incorporating all elements
from each of the eight metadata schemas (previous
slide)
- Crosswalks used to identify:
- elements that matched across all schemes
- elements that corresponded between two systems or
among three or more
- elements that were clearly unique to a domain
- Source: Cromwell-Kessler, W. (1998)
8 Crosswalk Example
- Top cells: Schema-specific term
- Bottom cells: Like terms (cross-schema)
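The three-way sorting that a master crosswalk supports (elements matched across all schemes, elements corresponding between two or more, elements unique to a domain) can be sketched in Python. This is a minimal illustration under stated assumptions, not the project's actual crosswalk: only the scheme abbreviations come from the slides, while the row identifiers, element names, and the functions `classify_row` and `group_rows` are invented placeholders.

```python
from collections import defaultdict

# Scheme abbreviations as listed on the Crosswalks - 1 slide.
SCHEMES = ["EAD", "DC", "GILS", "TEI", "VRA", "CIMI", "DGM", "ONIX"]

# Hypothetical master crosswalk: each row groups "like" elements across
# schemes. All row ids and element names are invented for illustration.
MASTER_CROSSWALK = {
    "row-a": {scheme: f"{scheme.lower()}-element-a" for scheme in SCHEMES},
    "row-b": {"EAD": "ead-element-b", "DC": "dc-element-b", "TEI": "tei-element-b"},
    "row-c": {"DGM": "dgm-element-c", "GILS": "gils-element-c"},
    "row-d": {"DGM": "dgm-element-d"},
}


def classify_row(row, schemes=SCHEMES):
    """Return which of the three crosswalk groups a row falls into."""
    present = [scheme for scheme in schemes if scheme in row]
    if not present:
        raise ValueError("a crosswalk row needs at least one scheme element")
    if len(present) == len(schemes):
        return "matched across all schemes"
    if len(present) >= 2:
        return "corresponds between two or more schemes"
    return "unique to a domain"


def group_rows(crosswalk):
    """Group row ids by the classification of each row."""
    groups = defaultdict(list)
    for row_id, row in crosswalk.items():
        groups[classify_row(row)].append(row_id)
    return dict(groups)


if __name__ == "__main__":
    for label, rows in group_rows(MASTER_CROSSWALK).items():
        print(f"{label}: {rows}")
```

Keeping each row as a mapping from scheme to element name mirrors the top-cell/bottom-cell layout of the example: the keys play the role of the schema-specific terms and the row as a whole stands for the set of like terms.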
9 Categorization
- Crosswalks used as a framework for deriving a set
of common categories (n = 17; see Table 1)
- Metatags from each schema assigned to one or more
of the 17 categories; of 885 tags assessed:
- 680 tags assigned 1:1 with a category
- 75 tags assigned to > 1 category, e.g.:
- temporal keyword (DGM) > Date Time Period
Subject
- altformavail (EAD) > Contact Information
Identifiers Physical Format Place
- 130 tags not included in any category (DGM: 110;
ONIX: 9; DC: 6; EAD: 2; others: 1 each; GILS: 0)
- Categories were assigned natural language labels
and definitions developed for each
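The tag counts on this slide can be cross-checked with a little arithmetic. The sketch below uses only the figures reported above; the function names and dictionary labels are assumptions introduced for illustration. It confirms that the three groups sum to the 885 tags assessed, and that the uncategorized tags not attributed to DGM, ONIX, DC, EAD, or GILS leave a remainder of three for the other schemes.

```python
# Counts reported on the Categorization slide.
TOTAL_TAGS = 885
ASSIGNMENTS = {
    "one category (1:1)": 680,
    "more than one category": 75,
    "no category": 130,
}

# Uncategorized tags by scheme, as reported (the "others" entry is left out
# here so that it can be derived as a remainder).
UNCATEGORIZED_BY_SCHEME = {"DGM": 110, "ONIX": 9, "DC": 6, "EAD": 2, "GILS": 0}


def percentage_shares(counts, total):
    """Express each count as a percentage of the total, to one decimal."""
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


def remainder_for_others(breakdown, total_uncategorized):
    """Uncategorized tags not accounted for by the named schemes."""
    return total_uncategorized - sum(breakdown.values())


if __name__ == "__main__":
    # The three groups should partition the full set of assessed tags.
    assert sum(ASSIGNMENTS.values()) == TOTAL_TAGS
    print(percentage_shares(ASSIGNMENTS, TOTAL_TAGS))
    print(remainder_for_others(UNCATEGORIZED_BY_SCHEME, ASSIGNMENTS["no category"]))
```

On these figures roughly three quarters of the tags map to exactly one category, and most of the uncategorized tags come from a single scheme (DGM).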
10 Categories (see Table 1)
11 Exercises & Focus Group Testing
- Potential clarity and utility of labelled
categories tested using quantitative (assigned
activities) and qualitative (focus group
discussions) approaches
- Categorization exercises - purpose:
- Resolve any semantic ambiguities ("fuzzy" terms
that defied ready assignment to any one category)
- Refine category definitions to ensure that
categories contain the kinds of concepts the end
user expects
- Once categories are validated in English, can
broaden to multilingual environments
12 Participants - 1
- Participants recruited from University of Toronto
environment (total = 19)
- Division of participants into two cohorts:
- Experts (librarians) (n = 12)
- Novices (students) (n = 7)
- First-year students, with undergraduate degrees
in the Social Sciences or Humanities
- No completed courses in information
retrieval/search strategies, or cataloguing
13 Category Matching Activities
- All participants (n = 19) asked to match category
names to definitions (could match > 1)
- All participants asked their level of agreement with:
- "In general, I found it easy to make the match"
- "In general, this category name represents what is
described in the definition"
- "In general, I found the definition helpful in
clarifying the meaning of the category"
- Each cohort given a randomly generated set of 28
concepts (or subcategories, each with a
definition) and asked to assign each to a
category or categories (see Figure 1)
14 Example of Categorization Exercise (see also
Figure 1 in paper)
- Point of Contact: Identifies an organization or
person serving as the point of contact; also
includes information on methods for making
contact.
- I would put this concept into the following
category/categories: check (✓) as many as
apply
- _____ Contact Information
- _____ Names
- _____ None of the above
- And/or
- ____ I would suggest the category
name(s)_______________ from the list
- ____ I would like to suggest my own category
name__________________
- ____ I don't know
15 Procedures
- After paper exercise completed, facilitator led
discussion
- Discussion focused on elements that participants
either could not categorize or for which they
assigned a new category name
- Of particular note for this paper are:
- "Names" category
- "Subject" category
- "Title" category
16 Findings - Activity 1
- Matching elements to definitions
- 10.5% of participants did not link any definition
to the "Title" category; definition described as
"weird", "strange", "sterile"
- For "Subject", categories "Summary
Description" and "Genre Type" were deemed
interchangeable by some participants
- For "Names", categories "Roles" and "Contact
Information" were considered interchangeable by
some participants; element label described as
"fuzzy"
- Moderate to high ambiguity associated with key
categories in search strategies
17 Findings - Activity 2
- Likert scale questions (1 = strongly agree; 5 =
strongly disagree) - rankings for categories
"Names", "Subject", and "Title"
- "In general, I found it easy to make the match"
- Subject: 5th (mean 1.63)
- Title: 9th (mean 2.42)
- Names: 11th (mean 2.53) - total ranks out of
13
- "In general, this category name represents what is
described in the definition"
- Subject: 5th (mean 1.86)
- Names: 10th (mean 2.26)
- Title: 11th (mean 2.42) - total ranks out
of 13
18 Findings - Activity 2
- Likert scale questions (1 = strongly agree; 5 =
strongly disagree) - rankings for categories
"Names", "Subject", and "Title"
- "In general, I found the definition helpful in
clarifying the meaning of the category"
- Subject: 6th (mean 2.11)
- Names: 7th (mean 2.26)
- Title: 8th (mean 2.32) - total ranks out of
12
- Despite expectations of transparency, category
labels were somewhat unclear, even confusing
- Definitions marginally more helpful for "Names"
and "Title" than for "Subject"
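The relative ordering of the three categories on each Likert statement follows directly from the reported means, since a lower mean indicates stronger agreement. The sketch below uses only the means given on the two Activity 2 slides; the function names and the shortened statement labels are assumptions for illustration. The overall ranks (5th, 9th, and so on) cannot be reproduced here because the means for the remaining categories are not given.

```python
# Mean Likert scores as reported on the two Activity 2 slides
# (1 = strongly agree, 5 = strongly disagree). Statement labels are
# shortened paraphrases introduced for this sketch.
REPORTED_MEANS = {
    "easy to make the match": {"Subject": 1.63, "Title": 2.42, "Names": 2.53},
    "name represents definition": {"Subject": 1.86, "Names": 2.26, "Title": 2.42},
    "definition helpful": {"Subject": 2.11, "Names": 2.26, "Title": 2.32},
}


def order_by_agreement(means):
    """Sort category names from strongest to weakest mean agreement."""
    return sorted(means, key=means.get)


def spread(means):
    """Gap between the weakest and strongest mean, to two decimals."""
    return round(max(means.values()) - min(means.values()), 2)


if __name__ == "__main__":
    for statement, means in REPORTED_MEANS.items():
        print(statement, order_by_agreement(means), spread(means))
```

"Subject" comes first on all three statements, and the gap between the three categories is narrowest on the statement about how helpful the definition was.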
19 Table 2: Subcategory to Category Assignment by
Element Name and Participant Cohort
20 Findings - Activity 3
- Correct matching of subcategories
- Novices successfully matched subcategory to
category for 17.9% of 28 assigned terms
- Experts successfully matched subcategory to
category for only 7% of 28 assigned terms
- Experts supplied more write-in suggestions
than novices
- Novices more likely than experts to make an
educated guess or to respond "I don't know"
than to offer a new category or indicate "none of the above"
21 Findings - Activity 3
- Novices and experts alike perceived
category label and definition for "Subject" to be
semantically unambiguous
- "Names" and "Title" were perceived by both
cohorts to have semantic flaws
- Many subcategories associated with "Names" were
highly confusing; may require rethinking and
refining
22 Q - What's in a Name? A - Ambiguity and Confusion
- Focus group participants respond - 1
- Well, it was interesting, challenging. It
really makes you realize how much terminology
we're all tied by and how troubling it really is
(general agreement). I mean, people, you know
like us that are allegedly finding information
(laughs) and doing research all the time and
we're going "what does this mean?", "I don't know
what this is" so imagine the role of the user who
is more baffled, presumably.
23 Q - What's in a Name? A - Arbitrariness and
Inconsistency
- Focus group participants respond - 2
- Well yeah. It brings to mind how arbitrary and
difficult it is to assign those things, you know.
And how subjective and why it is inconsistent,
you know why you've got resources under one
category because one person dealt with them and
why, you know, someone else did something
different and even though, you know, you can
communicate and look them up and make things
uniform, there's still lots of inconsistencies.
It's human nature (general agreement/laughter).
24 Q - What's in a Name? A - The Importance of
Context
- Focus group participants respond - 3
- It is kind of hard, just looking at it sort of
abstractly, sort of broken apart like this
without being able to look at a few records or
something, you know, because when you're actually
using it, the context always does help. I mean
that's part of understanding it, so you know,
just because it's sometimes hard to understand
then, how some of the things relate to one
another cuz you don't know how they're going to
be put together on the screen, that made it
harder in some places.
25 Next Steps - 1
- If semantic congruence is misaligned in a
monolingual environment, then it is likely problematic
to map to multilingual, multicultural
applications to create a universal gateway
- Categories ("Names", "Title", "Subject") that might
be assumed to be more readily understood may pose
particular challenges for language
interoperability
26 Next Steps - 2
- Matrix of scenarios to be addressed:
- English language query > retrieves monolingual
English results
- English language query > retrieves multilingual
results
- Other language query > retrieves monolingual
results by language of query
- Other language query > retrieves multilingual
results
- Multiple languages query > retrieves results in
languages defined by query
- Multiple languages query > retrieves
multilingual results
- Universal gateway based on a common set of
derived categories that readily map to other
languages? Useful? Possible?
- Some reflections on research direction
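The six scenarios on this slide form a three-by-two matrix: three kinds of query language crossed with two result scopes. A minimal Python sketch of that enumeration follows; the constant names, the scope labels "query-bound" and "multilingual", and the functions `describe_results` and `scenario_matrix` are assumptions introduced for illustration rather than terms from the project.

```python
from itertools import product

# Query types from the slide.
QUERY_LANGUAGES = ["English", "other language", "multiple languages"]

# Hypothetical labels for the two result scopes: results limited to the
# language(s) of the query, or results in any language.
RESULT_SCOPES = ["query-bound", "multilingual"]


def describe_results(query_language, scope):
    """Return the slide's wording for the results of one scenario."""
    if scope == "multilingual":
        return "multilingual results"
    if query_language == "English":
        return "monolingual English results"
    if query_language == "multiple languages":
        return "results in languages defined by query"
    return "monolingual results by language of query"


def scenario_matrix():
    """Enumerate every (query language, expected results) pair."""
    return [
        (query_language, describe_results(query_language, scope))
        for query_language, scope in product(QUERY_LANGUAGES, RESULT_SCOPES)
    ]


if __name__ == "__main__":
    for query_language, results in scenario_matrix():
        print(f"{query_language} query > retrieves {results}")
```

Writing the matrix as a product of two short lists makes it easy to see that adding a further query type or result scope extends the set of scenarios without rewriting the existing ones.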
27 For More Information . . .
- Please visit the project Web site at:
- http://www.fis.utoronto.ca/special/metadata/
- E-mail: metadata@fis.utoronto.ca
28 Thank you!