Title: Experiments in Ontology Alignment
1. Experiments in Ontology Alignment
- Eduard Hovy
- Information Sciences Institute
- University of Southern California
- www.isi.edu/hovy
2. Outline
- Some ontologies middle models
- Alignment
- Step 1 semi-automated
- Step 2 manual
- Upper Model features
- Omega
3. Approaching a deep ontology
Used in Senseval2
- Simple term taxonomy (e.g., WordNet)
- Inventory of sense/meaning terms
- Inventory of word-specific role frames for verbs and nouns (PropBank and NomBank)
- Semantic classes for entities with simple inheritance
- No inference support
- Shallow semantic ontology (e.g., Omega)
- Semantic classes for events
- Inventory of class-based role frames for verbs and nouns
- Support for simple class-based inferences over roles and events (e.g., temporal relations, causal relations, state changes)
- Linked to annotated sentences and rep frames
- Deep ontology (e.g., CYC)
- Structure for formal concept definitions
- Repository for axioms and support of inference
OntoBank
Future
4. Parsimonious vs profligate
- Parsimonious
- Few symbols
- Easy to see conceptual relatedness
- Easy to define and run inferences
- Hard to compose complex meanings
- Profligate
- Many symbols
- Hard to determine conceptual relatedness
- Hard work to define inferences
- No need to compose complex meanings
- Easy to fall into the trap of semantics-by-capitalization (or wishful mnemonics; McDermott, Artificial Intelligence Meets Natural Stupidity, 1981)
There is no correct position: what you choose depends on how much inference you need vs how complex your domain is
5. CYC middle
Lenat, www.cyc.com
- Built by CYC (Artificial Intelligence reasoning and databases)
- Hundreds of thousands of concepts
- Various termsets available over past years
- Many interesting capabilities
6. WordNet
Miller, Fellbaum, wordnet.princeton.edu
- Being built by Miller and Fellbaum at Princeton (cognitive scientists)
- Synonymous senses of words grouped into synsets (approx. 120,000 synsets)
- Rudimentary Upper Model; all Middle Model
- Nouns organized by hyponym (ISA); average depth of noun hierarchy 12
- Verbs weakly organized by hyponym; avg depth 3
- Adjectives organized as star structures (quasi-synonym clusters related to antonym clusters)
- Also meronym (part-of) and other relations, and recently includes sense frequency values
- Used for many NLP applications, but effectiveness is controversial
- IR study claims WordNet not useful (Voorhees)
- QA work, using axioms in Extended WordNet (Moldovan), shows great promise
- Wordsense disambiguation shows WordNet has too many senses
7. Mikrokosmos
Nirenburg et al., crl.nmsu.edu/Research/Projects/mikro/
- Intermittently being built by Nirenburg et al. at New Mexico State U and U of Maryland (NLP people)
- About 6000 concepts, 250 relations (slots)
- Focus on lexicon: define cores of meaning clusters and differentiate at the word/sense level; includes about 25K English and 25K Spanish (and some other) words
- Used as Interlingua symbol repository for MT, in Text Meaning Rep (TMR) notation
- Nice feature: facets on slots
- Value: value of the slot (may be a formula)
- Strength: certainty/probability
- Aspect: constant/intermittent/etc.
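A minimal sketch of how the three facets on a slot could be represented, assuming Python dataclasses. The class names `Slot` and `Concept`, the [0, 1] range for Strength, and the example concept are hypothetical. They are not taken from Mikrokosmos or from the TMR notation.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Union


@dataclass
class Slot:
    """One relation (slot) on a concept, carrying the three facets."""
    value: Union[str, float]          # Value facet: a filler, or a formula kept as text
    strength: Optional[float] = None  # Strength facet: certainty/probability (assumed in [0, 1])
    aspect: Optional[str] = None      # Aspect facet: e.g. "constant" or "intermittent"

    def __post_init__(self) -> None:
        if self.strength is not None and not 0.0 <= self.strength <= 1.0:
            raise ValueError("strength must lie in [0, 1]")


@dataclass
class Concept:
    name: str
    slots: Dict[str, Slot] = field(default_factory=dict)


# Hypothetical concept, slot name, and filler, for illustration only.
geyser = Concept("GEYSER")
geyser.slots["ERUPTS"] = Slot(value="HOT-WATER", strength=0.9, aspect="intermittent")
print(geyser.slots["ERUPTS"].aspect)  # intermittent
```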
8. Aligning ontologies
- Instead of building an ontology (with all the problems that entails), can one just combine existing ones?
- Find the most popular concepts and organization
- Merge the definitions
- Identify individual errors and problem areas
- I tried this in 1996-97 (Hovy, LREC 1998)
- Project funded by IBM: align Upper Models of CYC, Penman, and Mikrokosmos
- Built alignment routines and created merge
- Conceptual mismatch problems were significant!
- Since then, a fairly large group of researchers has been doing this; there is a competition every year
9. Outline
- Some ontologies middle models
- Alignment
- Step 1 semi-automated
- Step 2 manual
- Upper Model features
- Omega
10. Omega construction methodology
- Methodology
- Upper Model: build by hand; merge
- Middle Model: merge existing term taxonomies; start with basic ontology (terminology taxonomy) and enrich
- Lower Model and Instance Base: acquire knowledge by text harvesting; machine learning over text
- Evaluate each component acquisition
11. Omega sources (Hovy et al. 03)
- Our own new work (ISI): 400 nodes
- WordNet 2 (Princeton): 110,000 nodes
- Mikrokosmos (New Mexico State U): 6,000 nodes
- Penman Upper Model (ISI): 300 nodes
12. General alignment and merging
- Goal: find attachment point(s) in ontology for node/term from somewhere else (ontology, website, metadata schema, etc.)
- It's hard to do manually and very hard to do automatically: the system needs to understand the semantics of the entities to be aligned
13. Alignment and merging, Stage 1: Semi-automatic
- Goal: find attachment point in ontology for node/term from somewhere else (ontology, website, metadata schema, etc.)
- Procedure: for each new term/concept
- 1. extract and format info: name, definition, associated text, local taxonomy cluster, etc.
- 2. apply alignment suggestion heuristics (NAME, DEFINITION, HIERARCHY, DISPERSAL match) against big ontology, to get proposed attachment points with strengths (Hovy 98); test with numerous parameter combinations, see http://edc.isi.edu/alignment/ (Hovy et al. 01)
- 3. automatically combine suggested alignments (Fleischman et al. 03)
- 4. apply validation checks
- 5. manually accept or reject suggestions
- Process developed in early 1990s (Agirre et al. 94; Knight and Luk 94; Okumura and Hovy 96; Hovy 98; Hovy et al. 01)
- Not stunningly accurate, but can speed up manual alignment markedly
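The five-step procedure on this slide can be sketched as a small driver loop. This is an illustration under stated assumptions, not the original alignment software. The function name `propose_alignments`, the dictionary layout of terms and ontology nodes, and the toy NAME heuristic are all hypothetical. The heuristics, the combination function, and the validation check are passed in as plain callables.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]
Suggestion = Tuple[str, float]  # (candidate attachment point, strength)


def propose_alignments(
    term: Record,
    ontology: Dict[str, Record],
    heuristics: Dict[str, Callable[[Record, Record], float]],
    combine: Callable[[Dict[str, float]], float],
    validate: Callable[[Record, str], bool],
    top_n: int = 5,
) -> List[Suggestion]:
    """Steps 2-4: score every candidate node, combine scores, drop invalid ones."""
    ranked: List[Suggestion] = []
    for node_name, node in ontology.items():
        scores = {name: h(term, node) for name, h in heuristics.items()}  # step 2
        strength = combine(scores)                                        # step 3
        if strength > 0 and validate(term, node_name):                    # step 4
            ranked.append((node_name, strength))
    ranked.sort(key=lambda s: s[1], reverse=True)
    return ranked[:top_n]  # step 5: a person accepts or rejects these


# Toy data. The heuristic below is a stand-in, not one of the published ones.
ontology = {"FOOD": {"name": "food"}, "SEA": {"name": "sea"}}
term = {"name": "foodstuff"}  # step 1: extracted and formatted info
heuristics = {"NAME": lambda t, n: 1.0 if n["name"] in t["name"] else 0.0}

suggestions = propose_alignments(
    term,
    ontology,
    heuristics,
    combine=lambda s: sum(s.values()),
    validate=lambda t, n: True,
)
print(suggestions)  # [('FOOD', 1.0)]
```

Keeping the heuristics, combiner, and validator as parameters mirrors the slide's point that many parameter combinations were tested.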
14. Automated link proposal heuristics
- Types of alignment suggestion heuristics
- Text Matches (Knight and Luk 94, Dalianis and Hovy 98)
- concept names (cognates; reward for delimiter confluence...)
- textual definitions (string matching, demorphing, stop words...)
- Hierarchy Matches
- shared superconcepts, to filter ambiguity (Knight and Luk 94)
- semantic distance (Agirre et al. 94)
- semantic group dispersal (Hovy and Philpot 97)
- Data Item and Form Matches
- inter-concept relations (Ageno et al. 94; Rigau and Agirre 95)
- slot-filler restrictions (Okumura and Hovy 94)
- Suggestion combination function
- E.g., score = √namescore defscore (10 taxscore)
- Validation procedures
- Hierarchy-based validation (Chalupsky and Hovy 98)
- new superconcept test
- disjunction test
- cycles/bowties test
- Content-based validation (Russ 98)
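To make the heuristic and validation types on this slide concrete, here is a minimal sketch of a name match, a definition match, and a cycle test. These are simplified stand-ins, not the published heuristics. The use of `difflib.SequenceMatcher` for names, Jaccard overlap for definitions, the stop-word list, and all function names are assumptions made for illustration. The score combination function is deliberately left out.

```python
from difflib import SequenceMatcher
from typing import Dict, Set

# Assumed stop-word list, for illustration only.
STOP_WORDS = {"a", "an", "the", "of", "or", "and", "to", "in", "for",
              "as", "that", "be", "can", "with"}


def _normalize(name: str) -> str:
    """Lower-case a concept name and treat delimiters as spaces."""
    return name.lower().replace("-", " ").replace("_", " ")


def _content_words(definition: str) -> Set[str]:
    """Words of a definition with stop words removed."""
    return {w for w in definition.lower().split() if w not in STOP_WORDS}


def name_score(a: str, b: str) -> float:
    """Similarity of two concept names, in [0, 1]."""
    return SequenceMatcher(None, _normalize(a), _normalize(b)).ratio()


def definition_score(a: str, b: str) -> float:
    """Jaccard overlap of the content words of two textual definitions."""
    wa, wb = _content_words(a), _content_words(b)
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


def creates_cycle(parents: Dict[str, Set[str]], child: str, new_parent: str) -> bool:
    """True if linking child ISA new_parent would make child its own ancestor."""
    stack, seen = [new_parent], set()
    while stack:
        node = stack.pop()
        if node == child:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, ()))
    return False


definition = "a substance that can be used or prepared for use as food"
print(name_score("foodstuff", "FOODSTUFF"))       # 1.0
print(definition_score(definition, definition))   # 1.0

isa = {"apple": {"fruit"}, "fruit": {"food"}}     # toy hierarchy
print(creates_cycle(isa, "food", "apple"))        # True
print(creates_cycle(isa, "pear", "fruit"))        # False
```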
15. Experimental results
- Ontologies
- Penman Upper Model (350)
- CYC top region (2400) Lenat Lehmann 96
- MIKROKOSMOS (4790 concepts) Mahesh 96
- SENSUS top region (6768)
- Recall (how many correct links were missed?)
- difficult to count! 32.4 million pairs
- Precision (how many suggested links are correct?)
- 0.252 (strict)
- 0.517 (lenient)
- After 5 runs
- 883 suggestions (13% of SENSUS candidates)
- correct: 244 (3.6%)
- near miss: 256 (3.8%)
- wrong: 383 (5.6%)
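A short arithmetic check of the counts on this slide. It assumes that the 32.4 million pairs are the MIKROKOSMOS concepts crossed with the SENSUS top region, and that the parenthesized figures are shares of the 6768 SENSUS candidates. Under those assumptions the results agree with the slide to within rounding. The variable names are illustrative.

```python
mikrokosmos_concepts = 4790
sensus_top_region = 6768

# Every MIKROKOSMOS concept paired with every SENSUS top-region concept.
pairs = mikrokosmos_concepts * sensus_top_region
print(f"{pairs:,} candidate pairs")  # 32,418,720 candidate pairs

outcomes = {"suggestions": 883, "correct": 244, "near miss": 256, "wrong": 383}
for label, count in outcomes.items():
    share = 100 * count / sensus_top_region
    print(f"{label}: {count} ({share:.1f}% of SENSUS candidates)")

# The three outcome classes account for all suggestions.
classified = outcomes["correct"] + outcomes["near miss"] + outcomes["wrong"]
print(classified == outcomes["suggestions"])  # True
```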
16. Outline
- Some ontologies middle models
- Alignment
- Step 1 semi-automated
- Step 2 manual
- Upper Model features
- Omega
17. Omega alignment process, Stage 2: Manual
- Created Upper Region (300 nodes) manually
- Manually snipped tops off Mikro and WordNet, then attached them to fringe of Upper Region
- Automatically aligned bottom fringe of Mikro into WordNet
- Automatically aligned sides of bubbles
- Checked manually
19. Problem 1
- Is Amber Decomposable or Nondecomposable?
- The stone sense of it (Mikro) is; the resin sense (WordNet) is not
- What to do??
20. Outcome 1: Good and Misleading
- S@foodstuff<food
- a substance that can be used or prepared for use as food
- superconcepts (S@food)
- M@FOODSTUFF (COMB 13.355 NAME 91 DEF 10.00 TAX 0.140)
- a substance that can be used or prepared for use as food
- superconcepts (M@FOOD M@MATERIAL)
- ----------------------------------------
- S@library>bibliotheca
- a collection of literary documents or records kept for reference
- superconcepts (S@aggregation)
- M@LIBRARY (COMB 2.742 NAME 59 DEF 3.57 TAX 0.000)
- a place in which literary and artistic materials such as books periodicals newspapers pamphlets and prints are kept for reading or reference an institution or foundation maintaining such a collection
- superconcepts (M@ACADEMIC-BUILDING)
A document collection or a place?
21. Outcome 2: Unclear and Error!
- S@geisha
- a Japanese woman trained to entertain men with conversation and singing and dancing
- superconcepts (S@adult female S@Japanese<Asian)
- M@GEISHA (COMB 1.540 NAME 46 DEF 2.27 TAX 0.000)
- a Japanese girl trained as an entertainer to serve as a hired entertainer to men
- superconcepts (M@ENTERTAINMENT-ROLE)
- ----------------------------------------
- S@archipelago
- many scattered islands in a large body of water
- superconcepts (S@dry land)
- M@ARCHIPELAGO (COMB 1.522 NAME 131 DEF 1.33 TAX 0.000)
- a sea with many islands
- superconcepts (M@SEA)
A person or a function?
Land or sea?
22. When are two concepts the same? Guarino's Identity Criteria
- Material: the stuff
- Topological: the shape
- Morphological: the parts
- Functional: the use
- Meronymical: the members
- Social: the societal role
- (see also Pustejovsky's qualia)
A water glass, before and after being smashed; the ACL in 1964 and in 2064
23. Shishkebobs (Hovy et al. in prep)
- Library ISA Building (and hence can't buy things)
- Library ISA Institution (and hence can buy things)
- SO Building / Institution / Location: a Library is all these
- Also Country / Nation / Government (GPE)
- France: the land, the people, and the rulers
- Also Field-of-Study / Activity / Result-of-Process
- (Science, Medicine, Architecture, Art)
- Also Company / Product / Stock
- He worked at Coke, drank Coke, and owned Coke (shares)
- We found about 400 potential shishkebobs
- Shishkebobs: concept senses or metonymy rings? A continuum, from on-the-fly meaning shadings to full metonymy
- Link regular alternation possibilities at general level in ontology; allow meaning shift for semantic interpretation, where needed
- Using shishkebobs makes merging ontologies easier (possible?): you respect each ontology's perspective
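One way to picture a shishkebob is as a ring of alternating readings attached to a single term, with the interpreter picking whichever reading licenses a given predicate. The sketch below is only an illustration of that idea. The class `Shishkebob`, the `CAPABILITIES` table, and the predicate names are hypothetical and do not describe how Omega stores these links.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional


@dataclass(frozen=True)
class Shishkebob:
    """A ring of concepts that regularly alternate as readings of one term."""
    facets: FrozenSet[str]


# Hypothetical table of which reading licenses which predicate.
CAPABILITIES: Dict[str, FrozenSet[str]] = {
    "Building": frozenset({"has-roof", "be-entered"}),
    "Institution": frozenset({"buy-things", "employ-people"}),
    "Location": frozenset({"be-near"}),
}


def reading_for(kebab: Shishkebob, predicate: str) -> Optional[str]:
    """Return the facet under which the predicate makes sense, if any."""
    for facet in sorted(kebab.facets):
        if predicate in CAPABILITIES.get(facet, frozenset()):
            return facet
    return None


library = Shishkebob(frozenset({"Building", "Institution", "Location"}))
print(reading_for(library, "buy-things"))  # Institution
print(reading_for(library, "has-roof"))    # Building
print(reading_for(library, "fly"))         # None
```

Because each source ontology's reading survives as its own facet, neither the Building view nor the Institution view has to be discarded when the ontologies are merged.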
24. Outline
- Some ontologies middle models
- Alignment
- Step 1 semi-automated
- Step 2 manual
- Upper Model features
- Omega
25. Problem 2: Upper Model features: Local lattices
- The standard KR approach
- Find a primitive concept (undefined)
- Specialize it in various ways by adding various differentiae
- Define these differentiae elsewhere in the ontology
- Don't confuse definitional aspects with mere properties!
- An apple is-a fruit with essential differentium XXX and with properties colour: red, size: tennis-ball-sized
- Problems
- What are the differentiae?
- How do you order them?
- Local lattices
- Create small lattices: localized points of differentium combination
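A local lattice can be pictured as every combination of a few differentiae applied to one primitive concept, ordered by inclusion. The sketch below builds such a lattice under that reading. The function names, the node-naming scheme, and the example differentiae are hypothetical and are not taken from the Omega Upper Model.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple


def local_lattice(primitive: str, differentiae: List[str]) -> Dict[FrozenSet[str], str]:
    """Name one node per subset of differentiae; the empty subset is the primitive."""
    nodes: Dict[FrozenSet[str], str] = {}
    for k in range(len(differentiae) + 1):
        for combo in combinations(differentiae, k):
            label = "-".join(combo + (primitive,)) if combo else primitive
            nodes[frozenset(combo)] = label
    return nodes


def isa_links(nodes: Dict[FrozenSet[str], str]) -> List[Tuple[str, str]]:
    """Link each node to every node that has exactly one differentia fewer."""
    return sorted(
        (nodes[child], nodes[child - {d}])
        for child in nodes
        for d in child
    )


# Hypothetical differentiae, for illustration only.
lattice = local_lattice("Object", ["Tangible", "Living"])
print(sorted(lattice.values()))
for child, parent in isa_links(lattice):
    print(f"{child} ISA {parent}")
```

With n differentiae the lattice has 2^n nodes, which is one reason to keep such lattices small and local rather than applying every differentia across the whole ontology.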
26. Omega Upper Model
- About 300 concepts
- Built by hand
- Several mutually exclusive branch points
- Several local lattices
- Top children: Object, Event, Property
27. Outline
- Some ontologies middle models
- Alignment
- Step 1 semi-automated
- Step 2 manual
- Upper Model features
- Omega
28. Omega content and framework
www.omega.edu
Goal: one environment for various ontologies and resources
- Concepts: 120,604 concept/term entries (76 MB)
- WordNet (Princeton: Miller and Fellbaum)
- Mikrokosmos (NMSU: Nirenburg et al.)
- Penman Upper Model (ISI: Bateman et al.)
- 25,000 noun-noun compounds (ISI: Pantel)
- Lexicon / sense space
- 156,142 English words; 33,822 Spanish words
- 271,243 word senses
- 13,000 frames of verb arg structure with case roles
- LCS case roles (Dorr), 6.3 MB
- PropBank roleframes (Palmer et al.), 5.3 MB
- FrameNet roleframes (Fillmore et al.), 2.8 MB
- WordNet verb frames (Fellbaum), 1.8 MB
- Associated information (not all complete)
- WordNet subj domains (Magnini and Cavaglia), 1.2 MB
- Various relations learned from text (ISI: Pantel)
- TAP domain groupings (Stanford: Guha)
- SemCor term frequencies, 7.5 MB
- Topic signatures (Basque U: Agirre et al.), 2.7 GB
- Instances: 10.1 GB
- 1.1 million persons harvested from text
- 765,000 facts harvested from text
- 5.7 million locations from USGS and NGA
- Framework (over 28 million statements of concepts, relations, instances)
- Available in PowerLoom
- Instances in RDF
- With database/MySQL
- Online browser
- Clustering software
- Term and ontology alignment software
29. Omega browser Mammoth
30. Omega hierarchy display
31. Omega sense frames
32. Thank you!