Modeling a Natural Language Gateway to Metadata-Enabled Resources

1
Modeling a Natural Language Gateway to
Metadata-Enabled Resources

Lynne C. Howarth
Faculty of Information Studies
University of Toronto, Canada

2
Acknowledgements

Funding provided by the Social Sciences and
Humanities Research Council of Canada (SSHRC SRG
410-99-1287)

3
Introduction

The development of technologies that enable
access to information regardless of geographic or
language barriers is a key factor for truly
global sharing of knowledge.
Source: Oard, D., et al. 1999. Multilingual
Information Discovery and AccesS (MIDAS). D-Lib
Magazine, October 1999. Available at URL:
http://www.dlib.org/dlib/october99/Ioard.html

4
Introduction - 2

End-user expectations for seamless cross-language
retrieval are increasing with greater use of and
access to Web resources (Large & Moukdad, 2000)
Cross-language retrieval systems are in experimental
stages (Peters & Braschler, 2001)
Need for systems that obviate the requirement of
understanding underlying metadata structures and
tagging (Buckland et al., 1999)
Current research focus in the metadata arena has
tended to be on syntax; less focus on semantics

5
Research Objectives

Building on previous research (Howarth, Cronin, &
Hannaford, 2002; 2003),
to develop and refine a common set of labelled
categories to serve as a natural language
gateway to metadata-enabled resources,
enhancing:
Semantic interoperability
Language interoperability
Multilingual access
Cross-domain searching

6
Crosswalks - 1

Identified and analysed the structure and content
of eight metadata schemes from various domains:
Encoded Archival Description (EAD)
Dublin Core (DC) (qualified)
Government (now Global) Information Locator
Service (GILS)
Text Encoding Initiative (TEI)
Visual Resources Association (VRA) Visual
Document Description Categories
Consortium for Interchange of Museum Information
(CIMI) (now ceased)
Digital Geospatial Metadata (DGM)
ONIX (Online Information Exchange) (publishing
domain)

7
Crosswalks - 2

Using MARC21 as a baseline, and existing
cross-schema crosswalks as benchmarks, created
a master crosswalk incorporating all elements
from each of the eight metadata schemas (previous
slide)
Crosswalks used to identify:
elements that matched across all schemes
elements that corresponded between two systems or
among three or more
elements that were clearly unique to a domain
Source: Cromwell-Kessler, W. (1998)
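
The three-way sorting of elements described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the procedure used in the study: the tiny crosswalk, the concept labels, and the function name classify_concepts are all assumptions, and the element-to-concept pairings are invented for the example rather than taken from the master crosswalk.

```python
from collections import defaultdict

# Hypothetical mini-crosswalk: scheme -> {scheme-specific element: shared concept}.
# The pairings are illustrative assumptions, not the study's actual mappings.
CROSSWALK = {
    "DC": {"title": "Title", "creator": "Names", "coverage": "Place"},
    "EAD": {"unittitle": "Title", "origination": "Names"},
    "TEI": {"titleStmt": "Title", "encodingDesc": "Encoding"},
}


def classify_concepts(crosswalk):
    """Group shared concepts by how many schemes carry a matching element."""
    schemes_by_concept = defaultdict(set)
    for scheme, elements in crosswalk.items():
        for concept in elements.values():
            schemes_by_concept[concept].add(scheme)

    total = len(crosswalk)
    result = {"all": [], "some": [], "unique": []}
    for concept, schemes in sorted(schemes_by_concept.items()):
        if len(schemes) == total:
            result["all"].append(concept)      # matched across all schemes
        elif len(schemes) >= 2:
            result["some"].append(concept)     # matched in two or more, not all
        else:
            result["unique"].append(concept)   # found in a single scheme only
    return result


groups = classify_concepts(CROSSWALK)
print(groups)
```

With eight schemes instead of three, the same grouping would separate elements common to every scheme from those shared by a subset and those unique to one domain.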

8
Crosswalk Example

Top cells: Schema-specific term
Bottom cells: Like terms (cross-schema)

9
Categorization

Crosswalks used as a framework for deriving a set
of common categories (n = 17; see Table 1)
Metatags from each schema assigned to one or more
of the 17 categories; of 885 tags assessed:
680 tags assigned 1:1 with a category
75 tags assigned to > 1 category, e.g.
temporal keyword (DGM) > Date Time Period
Subject
altformavail (EAD) > Contact Information
Identifiers Physical Format Place
130 tags not included in any category (DGM: 110;
ONIX: 9; DC: 6; EAD: 2; others: 1; GILS: 0)
Categories were assigned natural language labels
and definitions developed for each
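
As a quick arithmetic check on the tag counts above, the short Python sketch below confirms that the three groups account for all 885 tags and expresses each as a share of the total. The counts come from the slide; the variable names and the rounding to one decimal place are assumptions made for illustration.

```python
# Counts reported on the slide above.
TOTAL_TAGS = 885
tally = {
    "assigned to exactly one category": 680,
    "assigned to more than one category": 75,
    "not included in any category": 130,
}

# The three groups should account for every tag assessed.
assert sum(tally.values()) == TOTAL_TAGS

# Share of all assessed tags in each group, as a percentage.
shares = {label: round(100 * n / TOTAL_TAGS, 1) for label, n in tally.items()}
for label, pct in shares.items():
    print(f"{label}: {tally[label]} tags ({pct}%)")
```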

10
Categories (see Table 1)

11
Exercises & Focus Group Testing

Potential clarity and utility of labeled
categories tested using quantitative (assigned
activities) and qualitative (focus group
discussions) approaches
Categorization exercises - purpose:
Resolve any semantic ambiguities (fuzzy terms
that defied ready assignment to any one category)
Refine category definitions to ensure that
categories contain the kinds of concepts the end
user expects
Once categories are validated in English, can
broaden to multilingual environments

12
Participants - 1

Participants recruited from University of Toronto
environment (total = 19)
Division of participants into two cohorts:
Experts (librarians) (n = 12)
Novices (students) (n = 7)
First year students, with undergraduate degrees
in the Social Sciences or Humanities
No completed courses in information
retrieval/search strategies, or cataloguing

13
Category Matching Activities

All participants (n = 19) asked to match category
names to definitions (could match > 1)
All participants asked level of agreement with:
"In general, I found it easy to make the match"
"In general, this category name represents what is
described in the definition"
"In general, I found the definition helpful in
clarifying the meaning of the category"
Each cohort given a randomly generated set of 28
concepts (or subcategories, each with a
definition) and asked to assign each to a
category or categories (see Figure 1)

14
Example of Categorization Exercise (see also
Figure 1 in paper)

Point of Contact: Identifies an organization or
person serving as the point of contact; also
includes information on methods for making
contact.
I would put this concept into the following
category/categories; check (✓) as many as
apply:
_____ Contact Information
_____ Names
_____ None of the above
And/or
____ I would suggest the category
name(s) _______________ from the
list
____ I would like to suggest my own category
name: __________________
____ I don't know

15
Procedures

After paper exercise completed, facilitator led
discussion
Discussion focused on elements that participants
either could not categorize or for which they
assigned a new category name
Of particular note for this paper are:
Names category
Subject category
Title category

16
Findings - Activity 1

Matching elements to definitions
10.5% of participants did not link any definition
to the Title category; definition described as
"weird," "strange," "sterile"
For Subject, categories "Summary Description"
and "Genre Type" were deemed
interchangeable by some participants
For Names, categories "Roles" and "Contact
Information" were considered interchangeable by
some participants; element label described as
"fuzzy"
Moderate to high ambiguity associated with key
categories in search strategies

17
Findings - Activity 2

Likert scale questions (1 = strongly agree; 5 =
strongly disagree) - rankings for categories
Names, Subject, and Title
"In general, I found it easy to make the match"
Subject: 5th (mean = 1.63)
Title: 9th (mean = 2.42)
Names: 11th (mean = 2.53) - total ranks out of
13
"In general, this category name represents what is
described in the definition"
Subject: 5th (mean = 1.86)
Names: 10th (mean = 2.26)
Title: 11th (mean = 2.42) - total ranks out
of 13

18
Findings - Activity 2

Likert scale questions (1 = strongly agree; 5 =
strongly disagree) - rankings for categories
Names, Subject, and Title
"In general, I found the definition helpful in
clarifying the meaning of the category"
Subject: 6th (mean = 2.11)
Names: 7th (mean = 2.26)
Title: 8th (mean = 2.32) - total ranks out of
12
Despite expectations of transparency, category
labels were somewhat unclear, even confusing
Definitions marginally more helpful for Names
and Title than for Subject
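
The rank-by-mean ordering used in the Activity 2 slides can be sketched as follows. This is a minimal illustration, assuming the scale direction given on the slides (1 = strongly agree), so a lower mean signals stronger agreement. Only the three reported categories are included, so the ranks produced here are relative to those three and do not reproduce the 5th to 11th positions on the slides. The dictionary keys and the function name rank_by_agreement are hypothetical.

```python
# Mean Likert scores as reported on the slides
# (1 = strongly agree, 5 = strongly disagree).
MEANS = {
    "easy to make the match": {"Subject": 1.63, "Title": 2.42, "Names": 2.53},
    "name represents definition": {"Subject": 1.86, "Names": 2.26, "Title": 2.42},
    "definition helpful": {"Subject": 2.11, "Names": 2.26, "Title": 2.32},
}


def rank_by_agreement(means):
    """Order categories from strongest agreement (lowest mean) to weakest."""
    return sorted(means, key=means.get)


rankings = {question: rank_by_agreement(m) for question, m in MEANS.items()}
for question, order in rankings.items():
    print(f"{question}: {' < '.join(order)}")
```

Among these three categories, Subject comes first on every question, which matches its consistently higher placement on the slides.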

19

Table 2: Subcategory to Category Assignment by
Element Name and Participant Cohort

20
Findings - Activity 3

Correct matching of subcategories
Novices successfully matched subcategory to
category for 17.9 of 28 assigned terms
Experts successfully matched subcategory to
category for only 7 of 28 assigned terms
Experts supplied more write-in suggestions
than novices
Novices more likely than experts to make an
educated guess or to respond "I don't know"
than to offer a new category or indicate "none of the above"

21
Findings - Activity 3

Novices and experts alike perceived the
category label and definition for Subject to be
semantically unambiguous
Names and Title were perceived by both
cohorts to have semantic flaws
Many subcategories associated with names are
highly confusing; may require rethinking and
refining

22
Q - What's in a Name?
A - Ambiguity and Confusion

Focus group participants respond - 1
Well, it was interesting, challenging. It
really makes you realize how much terminology
we're all tied by and how troubling it really is
(general agreement). I mean, people, you know
like us that are allegedly finding information
(laughs) and doing research all the time and
we're going "what does this mean?", "I don't know
what this is" so imagine the role of the user who
is more baffled, presumably.

23
Q - What's in a Name?
A - Arbitrariness and Inconsistency

Focus group participants respond - 2
Well yeah. It brings to mind how arbitrary and
difficult it is to assign those things, you know.
And how subjective and why it is inconsistent,
you know why you've got resources under one
category because one person dealt with them and
why, you know, someone else did something
different and even though, you know, you can
communicate and look them up and make things
uniform, there's still lots of inconsistencies.
It's human nature (general agreement/laughter).

24
Q - What's in a Name?
A - The Importance of Context

Focus group participants respond - 3
It is kind of hard, just looking at it sort of
abstractly, sort of broken apart like this
without being able to look at a few records or
something, you know, because when you're actually
using it, the context always does help. I mean
that's part of understanding it, so you know,
just because it's sometimes hard to understand
then, how some of the things relate to one
another cuz you don't know how they're going to
be put together on the screen, that made it
harder in some places.

25
Next Steps - 1

If semantic congruence is misaligned in a
monolingual environment, then it is likely problematic
to map to multilingual, multicultural
applications to create a universal gateway
Categories (Names, Title, Subject) that might
be assumed to be more readily understood may pose
particular challenges for language
interoperability

26
Next Steps - 2

Matrix of Scenarios to be addressed:
English language query > retrieves monolingual
English results
English language query > retrieves multilingual
results
Other language query > retrieves monolingual
results by language of query
Other language query > retrieves multilingual
results
Multiple languages query > retrieves results in
languages defined by query
Multiple languages query > retrieves
multilingual results
Universal gateway based on a common set of
derived categories that readily map to other
languages? Useful? Possible?
Some reflections on research direction
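
The matrix of scenarios on this slide is the cross product of three query types and two result scopes. A minimal Python sketch of that enumeration follows; the labels in QUERY_LANGUAGES and RESULT_SCOPES and the describe helper are hypothetical names chosen for illustration, not terms from the study.

```python
from itertools import product

# Hypothetical labels for the two axes of the scenario matrix.
QUERY_LANGUAGES = ["English", "Other", "Multiple"]
RESULT_SCOPES = ["query-language", "multilingual"]


def describe(query, scope):
    """Render one cell of the matrix as a readable scenario."""
    noun = "languages" if query == "Multiple" else "language"
    if scope == "multilingual":
        results = "multilingual results"
    else:
        results = "results in the language(s) of the query"
    return f"{query} {noun} query > retrieves {results}"


# Every combination of query type and result scope: 3 x 2 = 6 scenarios.
scenarios = list(product(QUERY_LANGUAGES, RESULT_SCOPES))
for query, scope in scenarios:
    print(describe(query, scope))
```

Enumerating the cells this way makes it easy to confirm that each of the six scenarios is covered when test cases are drawn up.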