1
Experiments in Ontology Alignment
  • Eduard Hovy
  • Information Sciences Institute
  • University of Southern California
  • www.isi.edu/hovy

2
Outline
  1. Some ontologies: middle models
  2. Alignment
  3. Step 1: semi-automated
  4. Step 2: manual
  5. Upper Model features
  6. Omega

3
Approaching a deep ontology
Used in Senseval2
  • Simple term taxonomy (e.g., WordNet)
    • Inventory of sense/meaning terms
    • Inventory of word-specific role frames for verbs and nouns (PropBank and NomBank)
    • Semantic classes for entities with simple inheritance
    • No inference support
  • Shallow semantic ontology (e.g., Omega)
    • Semantic classes for events
    • Inventory of class-based role frames for verbs and nouns
    • Support for simple class-based inferences over roles and events (e.g., temporal relations, causal relations, state-changes)
    • Linked to annotated sentences and rep frames
  • Deep ontology (e.g., CYC)
    • Structure for formal concept definitions
    • Repository for axioms and support of inference

OntoBank
Future
4
Parsimonious vs profligate
  • Parsimonious
    • Few symbols
    • Easy to see conceptual relatedness
    • Easy to define and run inferences
    • Hard to compose complex meanings
  • Profligate
    • Many symbols
    • Hard to determine conceptual relatedness
    • Hard work to define inferences
    • No need to compose complex meanings
    • Easy to fall into the trap of semantics-by-capitalization (or "wishful mnemonics": McDermott, "Artificial Intelligence Meets Natural Stupidity", 1981)

There is no correct position: what you choose
depends on how much inference you need vs. how
complex your domain is
5
CYC middle
Lenat, www.cyc.com
  • Built by CYC: Artificial Intelligence reasoning
    and databases
  • Hundreds of thousands of concepts
  • Various termsets available over past years
  • Many interesting capabilities

6
WordNet
Miller & Fellbaum, wordnet.princeton.edu
  • Being built by Miller and Fellbaum at Princeton: cognitive scientists
  • Synonymous senses of words grouped into synsets: approx. 120,000 synsets
  • Rudimentary Upper Model; all Middle Model
  • Nouns organized by hyponym (ISA): average depth of noun hierarchy 12
  • Verbs weakly organized by hyponym: avg. depth 3
  • Adjectives organized as star structures (quasi-synonym clusters related to antonym clusters)
  • Also meronym (part-of) and other relations, and recently includes sense frequency values
  • Used for many NLP applications, but effectiveness is controversial
    • IR study claims WordNet not useful (Voorhees)
    • QA work, using axioms in Extended WordNet (Moldovan), shows great promise
    • Word sense disambiguation shows WordNet has too many senses

7
Mikrokosmos
Nirenburg et al., crl.nmsu.edu/Research/Projects/mikro/
  • Intermittently being built by Nirenburg et al. at New Mexico State U and U of Maryland: NLP people
  • About 6000 concepts, 250 relations (slots)
  • Focus on lexicon: define cores of meaning clusters and differentiate at the word/sense level; includes about 25K English and 25K Spanish (and some other) words
  • Used as Interlingua symbol repository for MT, in Text Meaning Rep (TMR) notation
  • Nice feature: facets on slots
    • Value: value of the slot (may be a formula)
    • Strength: certainty/probability
    • Aspect: constant/intermittent/etc.

8
Aligning ontologies
  • Instead of building an ontology (with all the problems that entails), can one just combine existing ones?
    • Find the most popular concepts and organization
    • Merge the definitions
    • Identify individual errors and problem areas
  • I tried this in 1996-97 (Hovy, LREC 1998)
    • Project funded by IBM: align Upper Models of CYC, Penman, and Mikrokosmos
    • Built alignment routines and created merge
    • Conceptual mismatch problems were significant!
  • Since then, a fairly large group of researchers has been doing this; there is a competition every year

9
Outline
  1. Some ontologies: middle models
  2. Alignment
  3. Step 1: semi-automated
  4. Step 2: manual
  5. Upper Model features
  6. Omega

10
Omega construction methodology
  • Methodology
    • Upper Model: build by hand, merge
    • Middle Model: merge existing term taxonomies; start with basic ontology (terminology taxonomy) and enrich
    • Lower Model and Instance Base: acquire knowledge by text harvesting (machine learning over text)
  • Evaluate each component acquisition

11
Omega sources (Hovy et al. 03)
  • Our own new work (ISI): 400 nodes
  • WordNet 2 (Princeton): 110,000 nodes
  • Mikrokosmos (New Mexico State U): 6,000 nodes
  • Penman Upper Model (ISI): 300 nodes
12
General alignment and merging
  • Goal: find attachment point(s) in ontology for node/term from somewhere else (ontology, website, metadata schema, etc.)
  • It's hard to do manually and very hard to do automatically: the system needs to understand the semantics of the entities to be aligned

13
Alignment & merging, Stage 1: Semi-automatic
  • Goal: find attachment point in ontology for node/term from somewhere else (ontology, website, metadata schema, etc.)
  • Procedure: for each new term/concept
    1. extract and format info: name, definition, associated text, local taxonomy cluster, etc.
    2. apply alignment suggestion heuristics (NAME, DEFINITION, HIERARCHY, DISPERSAL match) against big ontology, to get proposed attachment points with strengths (Hovy 98); test with numerous parameter combinations, see http://edc.isi.edu/alignment/ (Hovy et al. 01)
    3. automatically combine suggested alignments (Fleischman et al. 03)
    4. apply validation checks
    5. manually accept or reject suggestions
  • Process developed in early 1990s (Agirre et al. 94; Knight & Luk 94; Okumura & Hovy 96; Hovy 98; Hovy et al. 01)
  • Not stunningly accurate, but can speed up manual
    alignment markedly
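
The first steps of this procedure can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Concept` record, the stop-word list, the toy name and definition scorers, the additive combination and the `top_n` cutoff are all hypothetical stand-ins, not the heuristics of Hovy 98. The two definitions for "archipelago" are taken from the talk; the library definition is shortened.

```python
import re
from dataclasses import dataclass

# Hypothetical stop-word list; the list used by the real system is not given.
STOP_WORDS = {"a", "an", "the", "of", "in", "or", "and", "to", "for",
              "that", "with", "as", "be", "can"}


@dataclass(frozen=True)
class Concept:
    name: str
    definition: str


def content_words(text):
    """Lowercase the text, keep alphabetic tokens, drop stop words."""
    return {t for t in re.findall(r"[a-z]+", text.lower())
            if t not in STOP_WORDS}


def name_score(new, candidate):
    """Toy NAME match: 1.0 if the names are equal ignoring case, else 0.0."""
    return 1.0 if new.name.lower() == candidate.name.lower() else 0.0


def definition_score(new, candidate):
    """Toy DEFINITION match: Jaccard overlap of content words."""
    a = content_words(new.definition)
    b = content_words(candidate.definition)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def propose(new, ontology, top_n=3):
    """Score every candidate, then keep the strongest few.

    The two scores are simply added here; that is an arbitrary choice.
    """
    scored = [(name_score(new, c) + definition_score(new, c), c)
              for c in ontology]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]


# Step 1: the new term and a tiny target ontology, formatted as records.
ontology = [
    Concept("ARCHIPELAGO", "a sea with many islands"),
    Concept("LIBRARY",
            "a place in which literary and artistic materials are kept"),
]
new_term = Concept("archipelago",
                   "many scattered islands in a large body of water")

# Step 5 (manual accept/reject) would start from this ranked list.
for score, candidate in propose(new_term, ontology):
    print(f"{candidate.name}: {score:.3f}")
```

Even this toy version shows why step 5 stays manual: the names match and the definitions share words, yet one concept is land and the other is sea.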

14
Automated link proposal heuristics
  • Types of alignment suggestion heuristics
    • Text Matches (Knight & Luk 94; Dalianis & Hovy 98)
      • concept names (cognates; reward for delimiter confluence...)
      • textual definitions (string matching, demorphing, stop words...)
    • Hierarchy Matches
      • shared superconcepts, to filter ambiguity (Knight & Luk 94)
      • semantic distance (Agirre et al. 94)
      • semantic group dispersal (Hovy and Philpot 97)
    • Data Item and Form Matches
      • inter-concept relations (Ageno et al. 94; Rigau & Agirre 95)
      • slot-filler restrictions (Okumura & Hovy 94)
  • Suggestion combination function
    • E.g., score = √namescore × defscore × (10 × taxscore)
  • Validation procedures
    • Hierarchy-based validation (Chalupsky & Hovy 98)
      • new superconcept test
      • disjunction test
      • cycles/bowties test
    • Content-based validation (Russ 98)
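
One of the hierarchy-based checks, the cycles test, can be illustrated with a small sketch. The assumptions here are mine: the taxonomy is a dict mapping each concept to its direct superconcepts, and a proposed ISA link is rejected if the proposed parent already lies below the child. The slide does not say how Chalupsky & Hovy 98 implement the test, and bowties are not covered. The `OBJECT` root in the toy data is hypothetical.

```python
def ancestors(taxonomy, concept):
    """Collect every superconcept reachable from concept via ISA links."""
    seen = set()
    stack = list(taxonomy.get(concept, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(taxonomy.get(node, ()))
    return seen


def creates_cycle(taxonomy, child, parent):
    """True if adding the link 'child ISA parent' would close a loop."""
    return child == parent or child in ancestors(taxonomy, parent)


# Toy taxonomy: concept -> direct superconcepts.
taxonomy = {
    "FOODSTUFF": ["FOOD", "MATERIAL"],
    "FOOD": ["OBJECT"],      # OBJECT is a made-up root for this example
    "MATERIAL": ["OBJECT"],
}

print(creates_cycle(taxonomy, "FOODSTUFF", "OBJECT"))  # link is harmless
print(creates_cycle(taxonomy, "OBJECT", "FOODSTUFF"))  # link would loop
```

A suggestion that fails such a check can be dropped before it reaches the manual accept/reject step.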

15
Experimental results
  • Ontologies
    • Penman Upper Model (350)
    • CYC top region (2400) Lenat Lehmann 96
    • MIKROKOSMOS (4790 concepts) (Mahesh 96)
    • SENSUS top region (6768)
  • Recall (how many correct links were missed?)
    • difficult to count! 32.4 million pairs
  • Precision (how many suggested links are correct?)
    • 0.252 (strict)
    • 0.517 (lenient)
  • After 5 runs
    • 883 suggestions (13% of SENSUS candidates)
    • correct: 244 (3.6%)
    • near miss: 256 (3.8%)
    • wrong: 383 (5.6%)
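
The counts on this slide can be cross-checked with a few lines of arithmetic. The sketch assumes, as the numbers suggest but the slide does not state, that the 32.4 million pairs are the product of the SENSUS top region and MIKROKOSMOS sizes, and that the percentages are taken over the 6768 SENSUS concepts.

```python
SENSUS_TOP = 6768    # SENSUS top region concepts (from the slide)
MIKROKOSMOS = 4790   # MIKROKOSMOS concepts (from the slide)

# Assumed reading: every SENSUS concept paired with every MIKROKOSMOS concept.
candidate_pairs = SENSUS_TOP * MIKROKOSMOS

counts = {"correct": 244, "near miss": 256, "wrong": 383}
total_suggestions = sum(counts.values())


def percent_of_sensus(count):
    """Share of the SENSUS top region, in percent, to one decimal place."""
    return round(100.0 * count / SENSUS_TOP, 1)


shares = {label: percent_of_sensus(n) for label, n in counts.items()}
suggestion_share = percent_of_sensus(total_suggestions)

print(f"{candidate_pairs:,} candidate pairs")
print(f"{total_suggestions} suggestions = {suggestion_share}% of SENSUS")
for label, share in shares.items():
    print(f"{label}: {share}%")
```

The product is 32,418,720, and the three outcome counts add up to the 883 suggestions. 383/6768 comes out at about 5.66%, which rounds to 5.7 rather than the 5.6 shown, so the slide's figure was probably truncated. The precision figures 0.252 and 0.517 are not recomputed here, since the slide does not say which counts they were derived from.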

16
Outline
  1. Some ontologies: middle models
  2. Alignment
  3. Step 1: semi-automated
  4. Step 2: manual
  5. Upper Model features
  6. Omega

17
Omega alignment process, Stage 2: Manual
  • Created Upper Region (300 nodes) manually
  • Manually snipped tops off Mikro and WordNet, then
    attached them to fringe of Upper Region
  • Automatically aligned bottom fringe of Mikro into
    WordNet
  • Automatically aligned sides of bubbles
  • Checked manually

19
Problem 1
  • Is Amber Decomposable or Nondecomposable?
  • The stone sense of it (Mikro) is; the resin sense (WordNet) is not
  • What to do??

20
Outcome 1: Good and Misleading
  • S@foodstuff<food
    a substance that can be used or prepared for use as food
    superconcepts (S@food)
  • M@FOODSTUFF (COMB 13.355 NAME 91 DEF 10.00 TAX 0.140)
    a substance that can be used or prepared for use as food
    superconcepts (M@FOOD M@MATERIAL)
  ----------------------------------------
  • S@library>bibliotheca
    a collection of literary documents or records kept for reference
    superconcepts (S@aggregation)
  • M@LIBRARY (COMB 2.742 NAME 59 DEF 3.57 TAX 0.000)
    a place in which literary and artistic materials such as books periodicals newspapers pamphlets and prints are kept for reading or reference an institution or foundation maintaining such a collection
    superconcepts (M@ACADEMIC-BUILDING)

A document collection or a place?
21
Outcome 2: Unclear and Error!
  • S@geisha
    a Japanese woman trained to entertain men with conversation and singing and dancing
    superconcepts (S@adult female S@Japanese<Asian)
  • M@GEISHA (COMB 1.540 NAME 46 DEF 2.27 TAX 0.000)
    a Japanese girl trained as an entertainer to serve as a hired entertainer to men
    superconcepts (M@ENTERTAINMENT-ROLE)
  ----------------------------------------
  • S@archipelago
    many scattered islands in a large body of water
    superconcepts (S@dry land)
  • M@ARCHIPELAGO (COMB 1.522 NAME 131 DEF 1.33 TAX 0.000)
    a sea with many islands
    superconcepts (M@SEA)

A person or a function?
Land or sea?
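
The COMB values in the listings on the last two slides are consistent with a simple product of the component scores. The sketch below is a reading of those four numbers, not the documented implementation: it assumes COMB = sqrt(NAME) × DEF × (10 × TAX) / 10, with the taxonomy factor replaced by 1 when TAX is 0. The division by 10 and the TAX = 0 rule are inferred from these examples only.

```python
import math


def comb(name, definition, tax):
    """Hypothetical combination: sqrt(NAME) * DEF * tax factor / 10.

    The tax factor is 10 * TAX, or 1 when TAX is 0 (an inferred rule).
    """
    tax_factor = 10.0 * tax if tax > 0 else 1.0
    return math.sqrt(name) * definition * tax_factor / 10.0


# (NAME, DEF, TAX, reported COMB), copied from the slides.
reported = {
    "FOODSTUFF": (91, 10.00, 0.140, 13.355),
    "LIBRARY": (59, 3.57, 0.000, 2.742),
    "GEISHA": (46, 2.27, 0.000, 1.540),
    "ARCHIPELAGO": (131, 1.33, 0.000, 1.522),
}

for concept, (name, definition, tax, expected) in reported.items():
    computed = comb(name, definition, tax)
    print(f"{concept:12s} computed {computed:7.3f}  reported {expected:7.3f}")
```

Under this reading the computed and reported values agree to three decimals for all four pairs. With only one nonzero TAX value available, other rules for the taxonomy term would fit equally well.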
22
When are two concepts the same? Guarino's
Identity Criteria
  • Material: the stuff
  • Topological: the shape
  • Morphological: the parts
  • Functional: the use
  • Meronymical: the members
  • Social: the societal role
  • (see also Pustejovsky's qualia)

A water glass, before and after being smashed;
the ACL in 1964 and in 2064
23
Shishkebobs (Hovy et al. in prep)
  • Library ISA Building (and hence can't buy things)
  • Library ISA Institution (and hence can buy things)
  • So: Building - Institution - Location: a Library is all these
  • Also: Country - Nation - Government (GPE)
    • France: the land, the people, and the rulers
  • Also: Field-of-Study - Activity - Result-of-Process
    • (Science, Medicine, Architecture, Art)
  • Also: Company - Product - Stock
    • He worked at Coke, drank Coke, and owned Coke (shares)
  • We found about 400 potential shishkebobs
  • Shishkebobs: concept senses or metonymy rings? A continuum, from on-the-fly meaning shadings to full metonymy
  • Link regular alternation possibilities at general level in ontology; allow meaning shift for semantic interpretation, where needed
  • Using shishkebobs makes merging ontologies easier (possible?): you respect each ontology's perspective
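
The shishkebob idea can be pictured as a small data structure: a set of linked facets declared once, so that interpretation can shift to whichever facet a context requires. The sketch is purely illustrative; the `Shishkebob` class, the `CAPABILITIES` table and the `resolve` function are hypothetical and not part of Omega.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shishkebob:
    """A group of facets that one concept can stand for."""
    facets: tuple


# Hypothetical table of what each facet is able to do.
CAPABILITIES = {
    "Building": {"be-entered"},
    "Institution": {"buy"},
    "Location": {"be-located-at"},
}

# The Library example from the slide: all three facets at once.
RINGS = {"Library": Shishkebob(("Building", "Institution", "Location"))}


def resolve(concept, needed):
    """Return the facet of concept that supports needed, or None."""
    ring = RINGS.get(concept)
    if ring is None:
        return None
    for facet in ring.facets:
        if needed in CAPABILITIES.get(facet, set()):
            return facet
    return None


print(resolve("Library", "buy"))         # the Institution reading
print(resolve("Library", "be-entered"))  # the Building reading
```

With such a structure, "the library bought new books" and "she walked into the library" select different facets of one entry, so neither source ontology's view has to be discarded.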

24
Outline
  1. Some ontologies: middle models
  2. Alignment
  3. Step 1: semi-automated
  4. Step 2: manual
  5. Upper Model features
  6. Omega

25
Problem 2: Upper Model features: Local lattices
  • The standard KR approach
    • Find a primitive concept (undefined)
    • Specialize it in various ways by adding various differentiae
    • Define these differentiae elsewhere in the ontology
    • Don't confuse definitional aspects with mere properties!
    • An apple is-a fruit with essential differentium XXX and with properties colour=red, size=tennis-ball-sized
  • Problems
    • What are the differentiae?
    • How do you order them?
  • Local lattices
    • Create small lattices: localized points of differentium combination
26
Omega Upper Model
  • About 300 concepts
  • Built by hand
  • Several mutually exclusive branch points
  • Several local lattices
  • Top children: Object, Event, Property

27
Outline
  1. Some ontologies: middle models
  2. Alignment
  3. Step 1: semi-automated
  4. Step 2: manual
  5. Upper Model features
  6. Omega

28
Omega content and framework
www.omega.edu
Goal: one environment for various ontologies and
resources
  • Concepts: 120,604 concept/term entries (76 MB)
    • WordNet (Princeton: Miller & Fellbaum)
    • Mikrokosmos (NMSU: Nirenburg et al.)
    • Penman Upper Model (ISI: Bateman et al.)
    • 25,000 noun-noun compounds (ISI: Pantel)
  • Lexicon / sense space
    • 156,142 English words; 33,822 Spanish words
    • 271,243 word senses
  • 13,000 frames of verb arg structure with case roles
    • LCS case roles (Dorr), 6.3 MB
    • PropBank roleframes (Palmer et al.), 5.3 MB
    • FrameNet roleframes (Fillmore et al.), 2.8 MB
    • WordNet verb frames (Fellbaum), 1.8 MB
  • Associated information (not all complete)
    • WordNet subj. domains (Magnini & Cavaglia), 1.2 MB
    • Various relations learned from text (ISI: Pantel)
    • TAP domain groupings (Stanford: Guha)
    • SemCor term frequencies, 7.5 MB
    • Topic signatures (Basque U: Agirre et al.), 2.7 GB
  • Instances: 10.1 GB
    • 1.1 million persons harvested from text
    • 765,000 facts harvested from text
    • 5.7 million locations from USGS and NGA
  • Framework (over 28 million statements of concepts, relations, instances)
    • Available in PowerLoom
    • Instances in RDF
    • With database/MySQL
    • Online browser
    • Clustering software
    • Term and ontology alignment software

29
Omega browser Mammoth
30
Omega hierarchy display
31
Omega sense frames
32
Thank you!