Ontology mapping needs context - PowerPoint PPT Presentation

Slides: 45
Provided by: frankvan4

Transcript and Presenter's Notes

Title: Ontology mapping needs context


1
Ontology mapping needs context & approximation
  • Frank van Harmelen
  • Vrije Universiteit Amsterdam

2
Or
  • How to make ontology mapping less like database
    integration
  • and more like a social conversation

3
Two obvious intuitions
  • The Semantic Web needs ontology mapping
  • Ontology mapping needs background knowledge
  • Ontology mapping needs approximation

4
Which Semantic Web?
  • Version 1: "Semantic Web as Web of Data" (TBL)
  • recipe: expose databases on the web, use RDF,
    integrate
  • meta-data from:
  • expressing DB schema semantics in
    machine-interpretable ways
  • enable integration and unexpected re-use

5
Which Semantic Web?
  • Version 2: Enrichment of the current Web
  • recipe: annotate, classify, index
  • meta-data from:
  • automatically producing markup: named-entity
    recognition, concept extraction, tagging, etc.
  • enable personalisation, search, browsing, ...

6
Which Semantic Web?
  • Version 1: Semantic Web as Web of Data (data-oriented)
  • Version 2: Enrichment of the current Web (user-oriented)
  • Different use-cases
  • Different techniques
  • Different users

7
Which Semantic Web?
  • Version 1: Semantic Web as Web of Data
  • Version 2: Enrichment of the current Web
  • But both need ontologies for semantic agreement
  • between sources
  • between source and user
8
Ontology research is almost done...?
  • we know what they are: consensual, formalised
    models of a domain
  • we know how to make and maintain them (methods,
    tools, experience)
  • we know how to deploy them (search,
    personalisation, data integration, ...)
  • Main remaining open questions:
  • Automatic construction (learning)
  • Automatic mapping (integration)

9
Three obvious intuitions
  • The Semantic Web needs ontology mapping
  • Ontology mapping needs background knowledge
    (Ph.D. student)
  • Ontology mapping needs approximation
    (young researcher, post-doc)

10
  • This work with
  • Zharko Aleksovski & Michel Klein

11
Does context knowledge help mapping?
12
The general idea
[Diagram: source and target linked through background knowledge; inference in the background knowledge yields the mapping between source and target]
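The diagram's pipeline can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the method from the talk: anchoring is taken as given (a dict from each source or target term to the background concepts it was anchored to), the background knowledge is reduced to a plain is-a graph, and "inference" is just transitive reachability. The concept names, the `BACKGROUND` graph and the helpers `ancestors` and `infer_mapping` are hypothetical.

```python
from collections import deque

# Hypothetical toy background ontology: concept -> set of direct is-a parents.
BACKGROUND = {
    "viral_pneumonia": {"pneumonia"},
    "pneumonia": {"lung_disease"},
    "lung_disease": {"respiratory_disease"},
}


def ancestors(concept, graph):
    """All concepts reachable from `concept` via is-a edges, including itself."""
    seen = {concept}
    queue = deque([concept])
    while queue:
        node = queue.popleft()
        for parent in graph.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def infer_mapping(source_anchors, target_anchors, graph):
    """Infer (source term, target term) whenever an anchor of the source term
    is subsumed, in the background, by an anchor of the target term."""
    mapping = set()
    for s_term, s_concepts in source_anchors.items():
        # Everything the source term's anchors are subsumed by in the background.
        reachable = set().union(*(ancestors(c, graph) for c in s_concepts))
        for t_term, t_concepts in target_anchors.items():
            if reachable & t_concepts:
                mapping.add((s_term, t_term))
    return mapping


# Usage with hypothetical anchors (term -> background concepts it is anchored to).
source = {"S: viral pneumonia": {"viral_pneumonia"}}
target = {"T: respiratory disease": {"respiratory_disease"}, "T: fracture": {"fracture"}}
print(infer_mapping(source, target, BACKGROUND))
# {('S: viral pneumonia', 'T: respiratory disease')}
```

Neither list needs any internal structure here: all the structure used for the inference step comes from the background graph.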
13
a realistic example
  • Two Amsterdam hospitals (OLVG, AMC)
  • Two Intensive Care Units, different vocabs
  • Want to compare quality of care
  • OLVG-1400
  • 1400 terms in a flat list
  • used in the first 24 hours of stay
  • some implicit hierarchy (e.g. 6 types of Diabetes
    Mellitus)
  • some redundancy (spelling mistakes)
  • AMC: similar list, but from a different hospital

14
Context ontology used
  • DICE
  • 2500 concepts (5000 terms), 4500 links
  • Formalised in DL
  • five main categories:
  • tractus (e.g. nervous_system, respiratory_system)
  • aetiology (e.g. virus, poisoning)
  • abnormality (e.g. fracture, tumor)
  • action (e.g. biopsy, observation, removal)
  • anatomic_location (e.g. lungs, skin)

15
Baseline Linguistic methods
  • Combine lexical analysis with hierarchical
    structure
  • 313 suggested matches, around 70% correct
  • 209 suggested matches, around 90% correct
  • High precision, low recall (the easy cases)

16
Now use background knowledge
[Diagram: OLVG (1400 terms, flat) and AMC (1400 terms, flat) linked through DICE (2500 concepts, 4500 links); inference in DICE yields the OLVG-AMC mapping]
17
Example found with context knowledge (beyond
lexical)
18
Example 2
19
Anchoring strength
  • Anchoring = substring + trivial morphology

Anchored on N aspects         OLVG         AMC
N = 5                            0           2
N = 4                            0         198
N = 3                            4         711
N = 2                          144         285
N = 1                          401         208
Total nr. of anchored terms    549 (39%)  1404 (96%)
Total nr. of anchorings       1298        5816
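A toy version of "substring + trivial morphology" anchoring, together with the anchoring-strength count N from the table, might look like the sketch below. It rests on assumptions: the five-entry `CONCEPT_ASPECT` fragment is invented for illustration (it is not DICE), and "trivial morphology" is taken to mean lowercasing plus stripping a plural -s. Plain substring matching is deliberately naive and will also fire inside longer words.

```python
import re

# Hypothetical fragment of a DICE-like background ontology:
# concept label -> the aspect (DICE main category) it belongs to.
CONCEPT_ASPECT = {
    "lung": "anatomic_location",
    "tumor": "abnormality",
    "biopsy": "action",
    "virus": "aetiology",
    "respiratory": "tractus",
}


def normalise(text):
    """Assumed 'trivial morphology': lowercase, keep letter runs, strip a plural -s."""
    words = re.findall(r"[a-z]+", text.lower())
    return " ".join(w[:-1] if w.endswith("s") and len(w) > 3 else w for w in words)


def anchors(term, concept_aspect=CONCEPT_ASPECT):
    """Background concepts whose normalised label is a substring of the normalised term."""
    norm_term = normalise(term)
    return {c for c in concept_aspect if normalise(c) in norm_term}


def anchoring_strength(term, concept_aspect=CONCEPT_ASPECT):
    """Number of distinct aspects the term is anchored on (the N in the table)."""
    return len({concept_aspect[c] for c in anchors(term, concept_aspect)})


print(sorted(anchors("Biopsy of lung tumors")))     # ['biopsy', 'lung', 'tumor']
print(anchoring_strength("Biopsy of lung tumors"))  # 3
```

Labels and terms go through the same normalisation, so a stripped -s on both sides (e.g. "virus" becoming "viru") still lines up.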
20
Results
  • Example matchings discovered
  • OLVG: Acute respiratory failure; AMC: Asthma
    cardiale
  • OLVG: Aspergillus fumigatus; AMC: Aspergilloom
  • OLVG: duodenum perforation; AMC: Gut perforation
  • OLVG: HIV; AMC: AIDS
  • OLVG: Aorta thoracalis dissectie type B;
    AMC: Dissection of artery

21
Experimental results
  • Source & target: flat lists of 1400 ICU terms
    each
  • Background: DICE (2300 concepts in DL)
  • Manual Gold Standard (n = 200)
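With only a small manually judged sample (here n = 200), precision and recall have to be estimated on that sample. The sketch below shows one common way to do this. It is an assumption about the evaluation protocol, not a description of how these figures were computed, and the pairs are hypothetical.

```python
def precision_recall(found, judged_pairs, correct_pairs):
    """Estimate precision and recall against a small manually judged sample.

    found:         (source, target) pairs proposed by a matcher
    judged_pairs:  every pair in the manual sample (judged correct or incorrect)
    correct_pairs: the subset of judged_pairs a human marked as correct
    """
    found_judged = found & judged_pairs          # proposals we have a verdict for
    hits = found_judged & correct_pairs
    precision = len(hits) / len(found_judged) if found_judged else 0.0
    recall = len(hits) / len(correct_pairs) if correct_pairs else 0.0
    return precision, recall


# Hypothetical toy data, not the ICU gold standard.
judged = {("a", "x"), ("b", "y"), ("c", "z"), ("d", "w")}
correct = {("a", "x"), ("c", "z"), ("d", "w")}
found = {("a", "x"), ("b", "y"), ("e", "v")}
print(precision_recall(found, judged, correct))  # (0.5, 0.3333333333333333)
```

Proposals outside the sample (here `("e", "v")`) are simply not counted, since there is no verdict for them.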

22
Does more context knowledge help?
23
Adding more context
  • Only lexical
  • DICE (2500 concepts)
  • MeSH (22000 concepts)
  • ICD-10 (11000 concepts)
  • Anchoring strength

Anchored on   DICE   MeSH ICD-10
4 aspects        0      8      0
3 aspects        0     89      0
2 aspects      135    201      0
1 aspect       413    694     80
total          548    992     80
24
Results with multiple ontologies
Recall and precision (chart labels: Separate, Joint)
                Lexical   ICD-10     DICE     MeSH
Recall (%)           64       64       76       88
Precision (%)        95       95       94       89
  • Monotonic improvement
  • Independent of order
  • Linear increase of cost
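The three bullets follow if the joint result is taken to be the union of the matches found with each ontology separately, which is an assumption about how the results were combined. Set union is commutative and associative (independent of order), can only add matches (monotonic), and needs one matching run per ontology (linear cost). A minimal sketch with hypothetical match sets; the names `LEXICAL`, `DICE` and `MESH` are illustrative and do not hold real results.

```python
from functools import reduce
from itertools import permutations


def combine(mappings):
    """Joint mapping: union of the matches found with each background source."""
    return reduce(lambda acc, m: acc | m, mappings, frozenset())


def cumulative_sizes(mappings):
    """Number of joint matches after adding each source in turn."""
    sizes, acc = [], frozenset()
    for m in mappings:
        acc = acc | m
        sizes.append(len(acc))
    return sizes


# Hypothetical per-source match sets (pairs of source and target terms).
LEXICAL = {("a", "x")}
DICE = {("a", "x"), ("b", "y")}
MESH = {("b", "y"), ("c", "z")}

sources = [LEXICAL, DICE, MESH]
print(cumulative_sizes(sources))  # [1, 2, 3]: never decreases
print(all(combine(p) == combine(sources) for p in permutations(sources)))  # True
```

Only the set of found matches is guaranteed to grow. Precision can still drop as matches are added, as the MeSH column shows.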
25
does structured context knowledge help?
26
Exploiting structure
  • CRISP: 700 concepts, broader-than
  • MeSH: 1475 concepts, broader-than
  • FMA: 75,000 concepts, 160 relation types (we
    used is-a & part-of)

27
Using the structure or not ?
  • $(S <_a B) \wedge (B < B') \wedge (B' <_a T) \Rightarrow (S <_i T)$

28
Using the structure or not ?
  • $(S <_a B) \wedge (B < B') \wedge (B' <_a T) \Rightarrow (S <_i T)$
  • No use of structure
  • Only stated is-a & part-of
  • Transitive chains of is-a, and transitive
    chains of part-of
  • Transitive chains of is-a and part-of
  • One chain of part-of before one chain of is-a
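The five variants differ only in which background paths count as B < B′ in the chaining rule. The sketch below computes, for a background concept B, the set of B′ each variant allows; a mapping S <i T would then be concluded when S is anchored to B and T is anchored to some B′ in that set. The graph representation, the regime names and the abstract toy graph are hypothetical, chosen so that the five variants give five different answers.

```python
from collections import defaultdict


def build_graph(edges):
    """edges: (child, relation, parent) triples; returns relation -> child -> parents."""
    graph = defaultdict(lambda: defaultdict(set))
    for child, rel, parent in edges:
        graph[rel][child].add(parent)
    return graph


def closure(start, graph, rels):
    """Concepts reachable from `start` along edges whose relation is in `rels` (reflexive)."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for rel in rels:
            for parent in graph[rel].get(node, ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
    return seen


def broader(b, graph, regime):
    """Background concepts B' with B < B' under one of the five variants."""
    if regime == "none":                 # no use of structure
        return {b}
    if regime == "stated":               # only stated is-a & part-of
        return {b} | graph["is_a"].get(b, set()) | graph["part_of"].get(b, set())
    if regime == "separate":             # is-a chains, and part-of chains
        return closure(b, graph, ["is_a"]) | closure(b, graph, ["part_of"])
    if regime == "mixed":                # chains mixing is-a and part-of
        return closure(b, graph, ["is_a", "part_of"])
    if regime == "part_of_then_is_a":    # one part-of chain, then one is-a chain
        return set().union(*(closure(p, graph, ["is_a"])
                             for p in closure(b, graph, ["part_of"])))
    raise ValueError(f"unknown regime: {regime}")


# Hypothetical abstract toy graph, chosen so that every variant gives a different answer.
G = build_graph([
    ("s", "is_a", "a"), ("a", "is_a", "b"), ("b", "part_of", "z"),
    ("s", "part_of", "p"), ("p", "part_of", "q"), ("q", "is_a", "r"),
])
for regime in ["none", "stated", "separate", "mixed", "part_of_then_is_a"]:
    print(regime, sorted(broader("s", G, regime)))
```

On this toy graph the variants return 1, 3, 5, 7 and 6 concepts for B = s. Mixed closures are the most permissive, because they also follow an is-a step followed by a part-of step (s to a to b to z).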

29
Examples
30
Examples
31
Matching results (CRISP to MeSH)
(Gold Standard, n = 30)

Recall (number of matches; incr. = % increase over Exp.1)
                                                   total  incr.
Exp.1 Direct                        448   417   156   1021      -
Exp.2 Indir. is-a & part-of         395   516   405   1316     29
Exp.3 Indir. separate closures      395   933  1402   2730    167
Exp.4 Indir. mixed closures         395  1511  2228   4143    306
Exp.5 Indir. part-of before is-a    395   972  1800   3167    210

Precision (correct = % judged correct)
                                                   total  correct
Exp.1 Direct                         17    18     3     38     100
Exp.4 Indir. mixed closures          14    39    59    112      94
Exp.5 Indir. part-of before is-a     14    37    50    101     100
32
Three obvious intuitions
  • The Semantic Web needs ontology mapping
  • Ontology mapping needs background knowledge
  • Ontology mapping needs approximation
    (young researcher, post-doc)

33
  • This work with
  • Zharko Aleksovski, Risto Gligorov & Warner ten Kate

34
Approximating subsumptions (and hence mappings)
  • query: A ⊑ B ?
  • with B = B1 ⊓ B2 ⊓ B3, the query becomes:
    A ⊑ B1, A ⊑ B2, A ⊑ B3 ?

[Diagram: concept A and the conjuncts B1, B2, B3 of B]
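The decomposition on this slide is the standard Description Logic rule for conjunction. The extension of B1 ⊓ B2 ⊓ B3 is the intersection of the conjuncts' extensions, so A is contained in it exactly when A is contained in each conjunct:

```latex
B \equiv B_1 \sqcap B_2 \sqcap B_3
\quad\Longrightarrow\quad
\bigl( A \sqsubseteq B \iff A \sqsubseteq B_1 \;\wedge\; A \sqsubseteq B_2 \;\wedge\; A \sqsubseteq B_3 \bigr)
```

Because ignoring a conjunct only removes a condition from the right-hand side, skipping some A ⊑ Bi can turn a failed subsumption into a successful one, never the reverse.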
35
Approximating subsumptions
  • Use Google distance to decide which
    subproblems are reasonable to focus on
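  • Google distance:
    $\mathrm{NGD}(x,y) = \dfrac{\max\{\log f(x), \log f(y)\} - \log f(x,y)}{\log M - \min\{\log f(x), \log f(y)\}}$
  • where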
  • f(x) is the number of Google hits for x
  • f(x,y) is the number of Google hits for the
    tuple of search items x and y
  • M is the number of web pages indexed by Google
  • symmetric conditional probability of
    co-occurrence
  • estimate of semantic distance
  • estimate of contribution to B1 ⊓ B2 ⊓ B3
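The Google distance formula translates directly into code. A minimal sketch, assuming the hit counts are supplied by the caller: nothing here queries a search engine, and the counts in the usage lines are made up. Returning infinity when x and y never co-occur is a choice of this sketch. The base of the logarithm cancels out in the ratio, so natural logarithms are used.

```python
import math


def ngd(f_x, f_y, f_xy, m):
    """Normalised Google Distance from raw hit counts.

    f_x, f_y: hits for x and for y; f_xy: hits for x and y together;
    m: total number of indexed pages. Assumes 0 < f_xy <= min(f_x, f_y) < m.
    """
    if f_xy == 0:
        return math.inf  # never co-occur: treat as maximally distant
    log_x, log_y, log_xy = math.log(f_x), math.log(f_y), math.log(f_xy)
    return (max(log_x, log_y) - log_xy) / (math.log(m) - min(log_x, log_y))


# Hypothetical hit counts.
print(ngd(1000, 1000, 1000, 10**10))  # 0.0: x and y always co-occur
print(ngd(200, 50, 10, 10**6) == ngd(50, 200, 10, 10**6))  # True: symmetric
```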

36
Google distance
37
Google distance
[Diagram: terms arranged by Google distance: animal, plant, sheep, cow, vegetarian, mad cow]
38
Google for sloppy matching
  • Algorithm for A ⊑ B (B = B1 ⊓ B2 ⊓ B3)
  • determine NGD(B, Bi) = εi, for i = 1, 2, 3
  • incrementally:
  • increase the sloppiness threshold θ
  • allow ignoring those A ⊑ Bi whose εi falls under the threshold θ
  • match if the remaining A ⊑ Bj hold
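The algorithm steps can be sketched as follows. The slide does not spell out exactly how εi is compared with θ; this sketch assumes a conjunct may be ignored once εi ≤ θ, with weights in (0, 1], which is the reading that fits "θ = 0: classical, θ = 1: trivial" on the next slide. The sub-check results and weights are hypothetical inputs (in practice they would come from a DL reasoner and from NGD), and `sloppy_match` is an illustrative name.

```python
def sloppy_match(holds, weights, theta):
    """Sloppy check of A <= B1 and ... and Bn at threshold theta.

    holds:   Bi -> result of the classical sub-check A <= Bi (e.g. from a reasoner)
    weights: Bi -> eps_i, assumed to be derived from NGD(B, Bi) and to lie in (0, 1]
    ASSUMPTION (not stated on the slide): Bi may be ignored once eps_i <= theta,
    so theta = 0 ignores nothing and theta = 1 ignores every conjunct.
    """
    remaining = [b for b in holds if weights[b] > theta]
    return all(holds[b] for b in remaining)


# Hypothetical sub-check results and weights for one candidate match A <= B.
holds = {"B1": True, "B2": False, "B3": True}
weights = {"B1": 0.7, "B2": 0.3, "B3": 0.6}

# Incrementally increase the sloppiness threshold: the match appears at theta = 0.3.
for theta in (0.0, 0.2, 0.3, 0.5, 1.0):
    print(theta, sloppy_match(holds, weights, theta))
```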

39
Properties of sloppy matching
  • When the sloppiness threshold θ goes up, the set of
    matches grows monotonically
  • θ = 0: classical matching
  • θ = 1: trivial matching
  • Ideally compute εi such that
  • desirable matches become true at low θ
  • undesirable matches become true only at high
    θ
  • Use random selection of Bi as baseline
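Assuming a conjunct Bi may be ignored once εi ≤ θ (the slides do not state the exact rule), each candidate match has a critical threshold: the largest εi among its failing sub-checks. The match holds exactly for θ at or above that value, which is why the set of matches grows monotonically, and ranking candidates by it shows whether desirable matches come in at low θ. A sketch with hypothetical candidates and an assumed random-weight baseline; `critical_threshold` and `random_weights` are illustrative names.

```python
import random


def critical_threshold(holds, weights):
    """Smallest theta at which the sloppy match A <= B1 and ... and Bn succeeds.

    Assumes conjunct Bi may be ignored once eps_i <= theta, so the match holds
    exactly when every failing sub-check A <= Bi has eps_i <= theta.
    """
    return max((weights[b] for b, ok in holds.items() if not ok), default=0.0)


def random_weights(conjuncts, rng):
    """Baseline: weights in (0, 1] drawn at random instead of from NGD."""
    return {b: 1.0 - rng.random() for b in conjuncts}


# Hypothetical candidate matches: name -> (sub-check results, NGD-derived weights).
candidates = {
    "good": ({"B1": True, "B2": False}, {"B1": 0.8, "B2": 0.1}),
    "bad": ({"B1": False, "B2": False}, {"B1": 0.9, "B2": 0.6}),
}
ranked = sorted(candidates, key=lambda c: critical_threshold(*candidates[c]))
print(ranked)  # ['good', 'bad']: the desirable match becomes true at a lower theta

# Same candidates with random weights, for comparison with the NGD-based ranking.
rng = random.Random(0)
baseline = {c: critical_threshold(h, random_weights(h, rng)) for c, (h, _) in candidates.items()}
print(baseline)
```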

40
Experiments in music domain
  • Size: 465 classes
  • very sloppy terms → good
  • Depth: 2 levels
41
Experiment
  • Manual Gold Standard, N = 50, random pairs
[Chart: precision and recall of classical, random and NGD-based matching; marked thresholds θ = 0.53 (97) and θ = 0.5 (60); further labels 20 and 7]
16-05-2006
42
wrapping up
43
Three obvious intuitions
  • The Semantic Web needs ontology mapping
  • Ontology mapping needs background knowledge
  • Ontology mapping needs approximation

44
So that
  • shared context & approximation make
    ontology mapping a bit more like a social
    conversation

45
Future: distributed/P2P setting
[Diagram: source, target, background knowledge, inference and mapping in a distributed/P2P setting]
46
Questions & discussion
  • Frank.van.Harmelen_at_cs.vu.nl
  • http://www.cs.vu.nl/frankh