Experiments in Ontology Alignment - PowerPoint PPT Presentation

1
Experiments in Ontology Alignment

Eduard Hovy
Information Sciences Institute
University of Southern California
www.isi.edu/hovy

2
Outline

Some ontologies: middle models
Alignment
Step 1: semi-automated
Step 2: manual
Upper Model features
Omega

3
Approaching a deep ontology
Used in Senseval2

Simple term taxonomy (e.g., WordNet)
Inventory of sense/meaning terms
Inventory of word-specific role frames for verbs
and nouns (PropBank and NomBank)
Semantic classes for entities with simple
inheritance
No inference support
Shallow semantic ontology (e.g., Omega)
Semantic classes for events
Inventory of class-based role frames for verbs
and nouns
Support for simple class-based inferences over
roles and events (e.g., temporal relations,
causal relations, state-changes)
Linked to annotated sentences and rep frames
Deep ontology (e.g., CYC)
Structure for formal concept definitions
Repository for axioms and support of inference

OntoBank
Future
4
Parsimonious vs profligate

Parsimonious
Few symbols
Easy to see conceptual relatedness
Easy to define and run inferences
Hard to compose complex meanings

Profligate
Many symbols
Hard to determine conceptual relatedness
Hard work to define inferences
No need to compose complex meanings
Easy to fall into the trap of
semantics-by-capitalization (or "wishful mnemonics"; McDermott,
Artificial Intelligence Meets Natural Stupidity,
1981)

There is no correct position: what you choose
depends on how much inference you need vs. how
complex your domain is
5
CYC middle
Lenat, www.cyc.com

Built by CYC: Artificial Intelligence reasoning
and databases
Hundreds of thousands of concepts
Various termsets available over past years
Many interesting capabilities

6
WordNet
Miller & Fellbaum, wordnet.princeton.edu

Being built by Miller and Fellbaum at Princeton:
cognitive scientists
Synonymous senses of words grouped into synsets
(approx. 120,000 synsets)
Rudimentary Upper Model; all Middle Model
Nouns organized by hyponymy (ISA); average depth
of noun hierarchy: 12
Verbs weakly organized by hyponymy; avg. depth: 3
Adjectives organized as star structures
(quasi-synonym clusters related to antonym
clusters)
Also meronym (part-of) and other relations, and
recently includes sense frequency values
Used for many NLP applications, but effectiveness
is controversial
IR study claims WordNet not useful (Voorhees)
QA work, using axioms in Extended WordNet
(Moldovan), shows great promise
Word sense disambiguation shows WordNet has too
many senses

7
Mikrokosmos
Nirenburg et al., crl.nmsu.edu/Research/Projects/mikro/

Intermittently being built by Nirenburg et al. at
New Mexico State U and U of Maryland: NLP people
About 6000 concepts, 250 relations (slots)
Focus on lexicon: define cores of meaning
clusters and differentiate at the word/sense
level; includes about 25K English and 25K Spanish
(and some other) words
Used as Interlingua symbol repository for MT, in
Text Meaning Rep (TMR) notation
Nice feature: facets on slots
Value: value of the slot (may be a formula)
Strength: certainty/probability
Aspect: constant/intermittent/etc.
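
The facets-on-slots idea can be pictured as a small data structure. The following is only a minimal sketch, not Mikrokosmos's actual representation: the class names `Slot` and `Concept`, the `COLOR` slot, and the numbers are all hypothetical; only the three facet names (Value, Strength, Aspect) come from the slide.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Union


@dataclass
class Slot:
    """One slot (relation) on a concept, carrying the three facets from the slide."""
    value: Union[str, float]          # Value: the filler (a formula could be kept as a string)
    strength: Optional[float] = None  # Strength: certainty/probability, assumed here to lie in [0, 1]
    aspect: Optional[str] = None      # Aspect: e.g. "constant" or "intermittent"


@dataclass
class Concept:
    """A concept with named slots (hypothetical layout)."""
    name: str
    slots: Dict[str, Slot] = field(default_factory=dict)


# Illustrative instance only; the slot name and numbers are made up.
apple = Concept("APPLE")
apple.slots["COLOR"] = Slot(value="red", strength=0.7, aspect="constant")
```

Keeping the facets on the slot rather than on the concept lets the same relation carry a different certainty or temporal character for each concept that uses it.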

8
Aligning ontologies

Instead of building an ontology (with all the
problems that entails), can one just combine
existing ones?
Find the most popular concepts and organization
Merge the definitions
Identify individual errors and problem areas
I tried this in 1996-97 (Hovy, LREC 1998)
Project funded by IBM: align Upper Models of CYC,
Penman, and Mikrokosmos
Built alignment routines and created merge
Conceptual mismatch problems were significant!
Since then, a fairly large group of researchers
has been doing this; there is a competition every year

9
Outline

Some ontologies: middle models
Alignment
Step 1: semi-automated
Step 2: manual
Upper Model features
Omega

10
Omega construction methodology

Methodology
Upper Model: build by hand, merge
Middle Model: merge existing term taxonomies;
start with basic ontology (terminology
taxonomy) and enrich
Lower Model and Instance Base: acquire knowledge
by text harvesting; machine learning over text
Evaluate each component acquisition

11
Omega sources (Hovy et al. 03)

Our own new work (ISI): 400 nodes
WordNet 2 (Princeton): 110,000 nodes
Mikrokosmos (New Mexico State U): 6,000 nodes
Penman Upper Model (ISI): 300 nodes
12
General alignment and merging

Goal: find attachment point(s) in ontology for
node/term from somewhere else (ontology, website,
metadata schema, etc.)
It's hard to do manually and very hard to do
automatically: the system needs to understand the
semantics of the entities to be aligned

13
Alignment & merging, Stage 1: Semi-automatic

Goal: find attachment point in ontology for
node/term from somewhere else (ontology, website,
metadata schema, etc.)
Procedure: for each new term/concept
1. extract and format info: name, definition,
associated text, local taxonomy cluster, etc.
2. apply alignment suggestion heuristics (NAME,
DEFINITION, HIERARCHY, DISPERSAL match) against
big ontology, to get proposed attachment points
with strengths (Hovy 98); test with numerous
parameter combinations, see
http://edc.isi.edu/alignment/ (Hovy et al. 01)
3. automatically combine suggested alignments
(Fleischman et al. 03)
4. apply validation checks
5. manually accept or reject suggestions
Process developed in early 1990s (Agirre et al.
94; Knight & Luk 94; Okumura & Hovy 96; Hovy 98;
Hovy et al. 01)
Not stunningly accurate, but can speed up manual
alignment markedly
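
The five-step procedure can be sketched as a small loop. This is a toy illustration under stated assumptions, not the system described on the slide: `name_score` (character similarity via `difflib`) and `definition_score` (token overlap after stop-word removal) are simple stand-ins for the NAME and DEFINITION heuristics, the product used to rank proposals is a placeholder combination, the validation step is omitted, and all function and field names are hypothetical.

```python
from difflib import SequenceMatcher

# Small illustrative stop-word list (an assumption, not the original list).
STOP_WORDS = {"a", "an", "the", "of", "in", "or", "and", "to",
              "for", "with", "that", "as", "be", "can"}


def name_score(a, b):
    """Stand-in for the NAME heuristic: character-level similarity of two names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def definition_score(a, b):
    """Stand-in for the DEFINITION heuristic: Jaccard overlap of content words."""
    ta = {w for w in a.lower().split() if w not in STOP_WORDS}
    tb = {w for w in b.lower().split() if w not in STOP_WORDS}
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def suggest(term, ontology, top_k=3):
    """Steps 2-3: score the term against every concept and rank the proposals."""
    scored = []
    for concept in ontology:
        n = name_score(term["name"], concept["name"])
        d = definition_score(term["definition"], concept["definition"])
        scored.append((n * d, concept["name"]))  # placeholder combination
    scored.sort(reverse=True)
    return scored[:top_k]


def align(terms, ontology, accept):
    """Step 5: a human reviewer (the `accept` callback) confirms or rejects."""
    links = []
    for term in terms:
        for score, target in suggest(term, ontology):
            if score > 0 and accept(term["name"], target, score):
                links.append((term["name"], target))
                break
    return links


# Definitions borrowed from the example records later in the deck.
ontology = [
    {"name": "foodstuff",
     "definition": "a substance that can be used or prepared for use as food"},
    {"name": "archipelago",
     "definition": "many scattered islands in a large body of water"},
]
terms = [
    {"name": "FOODSTUFF",
     "definition": "a substance that can be used or prepared for use as food"},
    {"name": "ARCHIPELAGO", "definition": "a sea with many islands"},
]
links = align(terms, ontology, accept=lambda src, tgt, score: True)
```

Here the reviewer callback accepts everything; in practice that callback is where the manual accept/reject decision would sit, which is why the heuristics only need to be good enough to rank plausible candidates near the top.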

14
Automated link proposal heuristics

Types of alignment suggestion heuristics
Text Matches (Knight & Luk 94; Dalianis & Hovy
98)
concept names (cognates; reward for delimiter
confluence...)
textual definitions (string matching, demorphing,
stop words...)
Hierarchy Matches
shared superconcepts, to filter ambiguity
(Knight & Luk 94)
semantic distance (Agirre et al. 94)
semantic group dispersal (Hovy and Philpot 97)
Data Item and Form Matches
inter-concept relations (Ageno et al. 94; Rigau &
Agirre 95)
slot-filler restrictions (Okumura & Hovy 94)
Suggestion combination function
E.g., score = sqrt(namescore) * defscore * (10 *
taxscore)
Validation procedures
Hierarchy-based validation (Chalupsky & Hovy
98)
new superconcept test
disjunction test
cycles/bowties test
Content-based validation (Russ 98)
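
The combination function can be checked against the NAME, DEF, TAX and COMB values printed in the example records on the two Outcome slides. The sketch below reads the formula as a product of the square root of the name score, the definition score, and ten times the taxonomy score. Two details are assumptions inferred from those printed values rather than stated in the deck: a taxonomy score of zero is treated as a neutral factor of 1, and the displayed COMB value is the product divided by 10. The function name `combined_score` is hypothetical.

```python
import math


def combined_score(name, definition, tax):
    """Combine the three heuristic scores into one suggestion strength."""
    # Assumption: TAX == 0 means "no taxonomic evidence", so use a neutral factor.
    tax_factor = 10 * tax if tax > 0 else 1.0
    # Assumption: the displayed COMB value is scaled down by 10.
    return math.sqrt(name) * definition * tax_factor / 10


# (NAME, DEF, TAX, COMB) exactly as printed on the Outcome slides.
reported = {
    "FOODSTUFF":   (91, 10.00, 0.140, 13.355),
    "LIBRARY":     (59, 3.57, 0.000, 2.742),
    "GEISHA":      (46, 2.27, 0.000, 1.540),
    "ARCHIPELAGO": (131, 1.33, 0.000, 1.522),
}

# Difference between the recomputed and the printed COMB value per record.
errors = {
    concept: abs(combined_score(n, d, t) - comb)
    for concept, (n, d, t, comb) in reported.items()
}
```

Under these assumptions all four records are reproduced to within about 0.0005, which is what one would expect given that DEF is printed to only two decimals. Note that the library, geisha and archipelago suggestions all have a zero taxonomy score, so their ranking rests entirely on name and definition matches.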

15
Experimental results

Ontologies
Penman Upper Model (350)
CYC top region (2400) [Lenat Lehmann 96]
MIKROKOSMOS (4790 concepts) [Mahesh 96]
SENSUS top region (6768)
Recall (how many correct links were missed?)
difficult to count! 32.4 million pairs
Precision (how many suggested links are
correct?)
0.252 (strict)
0.517 (lenient)
After 5 runs
883 suggestions (13% of SENSUS candidates)
correct 244 (3.6%)
near miss 256 (3.8%)
wrong 383 (5.6%)
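
The counts on this slide can be cross-checked with a few lines of arithmetic. This is a sketch of one reading of the numbers, not something the slide states: it assumes the pair count is the MIKROKOSMOS concept count times the SENSUS top-region size, and that the bracketed figures are percentages of the 6768 SENSUS candidates. The variable names are hypothetical.

```python
mikro_concepts = 4790   # MIKROKOSMOS concepts, from the slide
sensus_top = 6768       # SENSUS top region, from the slide

# Assumption: every Mikrokosmos concept is a candidate partner for every SENSUS node.
candidate_pairs = mikro_concepts * sensus_top

# Outcome of the 883 suggestions after 5 runs, from the slide.
suggestions = {"correct": 244, "near miss": 256, "wrong": 383}
total = sum(suggestions.values())

# Each category as a percentage of the SENSUS candidates.
shares = {label: 100 * count / sensus_top for label, count in suggestions.items()}
total_share = 100 * total / sensus_top
```

The product is 32,418,720, which matches the 32.4 million pairs quoted for recall, and the three categories sum to the 883 suggestions, or about 13% of SENSUS candidates. The shares come out near 3.61, 3.78 and 5.66, in line with the slide's 3.6, 3.8 and 5.6 if the last figure was truncated rather than rounded.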

16
Outline

Some ontologies: middle models
Alignment
Step 1: semi-automated
Step 2: manual
Upper Model features
Omega

17
Omega alignment process, Stage 2: Manual

Created Upper Region (300 nodes) manually
Manually snipped tops off Mikro and WordNet, then
attached them to fringe of Upper Region
Automatically aligned bottom fringe of Mikro into
WordNet
Automatically aligned sides of bubbles
Checked manually

18
(No Transcript)
19
Problem 1

Is Amber Decomposable or Nondecomposable?
The stone sense of it (Mikro) is; the resin
sense (WordNet) is not
What to do??

20
Outcome 1: Good and Misleading

S@foodstuff<food
   a substance that can be used or prepared for use as food
   superconcepts (S@food)
M@FOODSTUFF (COMB 13.355 NAME 91 DEF 10.00 TAX 0.140)
   a substance that can be used or prepared for use as food
   superconcepts (M@FOOD M@MATERIAL)
----------------------------------------
S@library>bibliotheca
   a collection of literary documents or records kept for reference
   superconcepts (S@aggregation)
M@LIBRARY (COMB 2.742 NAME 59 DEF 3.57 TAX 0.000)
   a place in which literary and artistic materials such as books periodicals
   newspapers pamphlets and prints are kept for reading or reference an
   institution or foundation maintaining such a collection
   superconcepts (M@ACADEMIC-BUILDING)

A document collection or a place?
21
Outcome 2: Unclear and Error!

S@geisha
   a Japanese woman trained to entertain men with conversation and singing
   and dancing
   superconcepts (S@adult female S@Japanese<Asian)
M@GEISHA (COMB 1.540 NAME 46 DEF 2.27 TAX 0.000)
   a Japanese girl trained as an entertainer to serve as a hired entertainer
   to men
   superconcepts (M@ENTERTAINMENT-ROLE)
----------------------------------------
S@archipelago
   many scattered islands in a large body of water
   superconcepts (S@dry land)
M@ARCHIPELAGO (COMB 1.522 NAME 131 DEF 1.33 TAX 0.000)
   a sea with many islands
   superconcepts (M@SEA)

A person or a function?
Land or sea?
22
When are two concepts the same? Guarino's
Identity Criteria

Material: the stuff
Topological: the shape
Morphological: the parts
Functional: the use
Meronymical: the members
Social: the societal role
(see also Pustejovsky's qualia)

A water glass, before and after being smashed;
the ACL in 1964 and in 2064
23
Shishkebobs (Hovy et al. in prep)

Library ISA Building (and hence can't buy things)
Library ISA Institution (and hence can buy
things)
So: Building <-> Institution <-> Location; a
Library is all these

Also Country <-> Nation <-> Government (GPE)
France: the land, the people, and the rulers
Also Field-of-Study <-> Activity <->
Result-of-Process
(Science, Medicine, Architecture, Art)
Also Company <-> Product <-> Stock
He worked at Coke, drank Coke, and owned Coke
(shares)
We found about 400 potential shishkebobs

Shishkebobs: concept senses or metonymy
rings? A continuum, from on-the-fly meaning
shadings to full metonymy
Link regular alternation possibilities at general
level in ontology; allow meaning shift for
semantic interpretation, where needed
Using shishkebobs makes merging ontologies easier
(possible?): you respect each ontology's
perspective
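
One way to picture a shishkebob is as a single term skewering several readings, with the context choosing which reading applies. The sketch below is only an illustration of that idea, not Omega's implementation: the aspect triples come from the slide, while the class name `Shishkebob`, the `PATTERNS` table, the `PREDICATE_PREFERS` table and the `reading` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Shishkebob:
    """A regular alternation: one term whose readings sit under several superconcepts."""
    aspects: Tuple[str, ...]


# Alternation patterns from the slide; the table layout itself is an assumption.
PATTERNS = {
    "Library": Shishkebob(("Building", "Institution", "Location")),
    "France": Shishkebob(("Country", "Nation", "Government")),
    "Coke": Shishkebob(("Company", "Product", "Stock")),
}

# Hypothetical selectional preferences: which aspect a predicate asks for.
PREDICATE_PREFERS = {
    "buy": "Institution",   # institutions can buy things, buildings cannot
    "work at": "Company",
    "drink": "Product",
    "own": "Stock",
}


def reading(term: str, predicate: str) -> Optional[str]:
    """Return the aspect of `term` selected by `predicate`, or None if none fits."""
    wanted = PREDICATE_PREFERS.get(predicate)
    kebab = PATTERNS.get(term)
    if kebab is not None and wanted in kebab.aspects:
        return wanted
    return None


coke_readings = [reading("Coke", p) for p in ("work at", "drink", "own")]
```

With this layout, "He worked at Coke, drank Coke, and owned Coke" resolves to the Company, Product and Stock readings in turn, and a library can buy things through its Institution reading without the Building reading having to be dropped from either source ontology.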

24
Outline

Some ontologies: middle models
Alignment
Step 1: semi-automated
Step 2: manual
Upper Model features
Omega

25
Problem 2: Upper Model features: Local lattices

The standard KR approach
Find a primitive concept (undefined)
Specialize it in various ways by adding various
differentiae
Define these differentiae elsewhere in the
ontology
Don't confuse definitional aspects with mere
properties!
An apple is-a fruit with essential differentium
XXX and with properties colour=red,
size=tennis-ball-sized
Problems
What are the differentiae?
How do you order them?
Local lattices
Create small lattices: localized points of
differentium combination