Title: Ontology mapping needs context
1Ontology mapping needs context & approximation
- Frank van Harmelen
- Vrije Universiteit Amsterdam
2Or
- How to make ontology-mapping less like database integration
- and more like a social conversation
3Three obvious intuitions
- The Semantic Web needs ontology mapping
- Ontology mapping needs background knowledge
- Ontology mapping needs approximation
4Which Semantic Web?
- Version 1: "Semantic Web as Web of Data" (TBL)
- recipe: expose databases on the web, use RDF, integrate
- meta-data from:
- expressing DB schema semantics in machine-interpretable ways
- enable integration and unexpected re-use
5Which Semantic Web?
- Version 2: Enrichment of the current Web
- recipe: annotate, classify, index
- meta-data from:
- automatically producing markup: named-entity recognition, concept extraction, tagging, etc.
- enable personalisation, search, browse, ...
6Which Semantic Web?
- Version 1: Semantic Web as Web of Data (data-oriented)
- Version 2: Enrichment of the current Web (user-oriented)
- Different use-cases
- Different techniques
- Different users
7Which Semantic Web?
- Version 1: Semantic Web as Web of Data
- Version 2: Enrichment of the current Web
- But both need ontologies for semantic agreement
- between sources
- between source & user
8Ontology research is almost done..?
- we know what they are: consensual, formalised models of a domain
- we know how to make and maintain them (methods, tools, experience)
- we know how to deploy them (search, personalisation, data-integration, ...)
- Main remaining open questions:
- Automatic construction (learning)
- Automatic mapping (integration)
9Three obvious intuitions
- The Semantic Web needs ontology mapping
- Ontology mapping needs background knowledge
- Ontology mapping needs approximation
[Callouts on slide: Ph.D. student (AIO); young researcher (post-doc)]
10This work with
- Zharko Aleksovski
- Michel Klein
11Does context knowledge help mapping?
12The general idea
[Diagram: source and target vocabularies, linked via inference over background knowledge to produce a mapping]
13A realistic example
- Two Amsterdam hospitals (OLVG, AMC)
- Two Intensive Care Units, different vocabularies
- Want to compare quality of care
- OLVG-1400:
- 1400 terms in a flat list
- used in the first 24 hours of stay
- some implicit hierarchy (e.g. 6 types of Diabetes Mellitus)
- some redundancy (spelling mistakes)
- AMC: similar list, but from a different hospital
14Context ontology used
- DICE
- 2500 concepts (5000 terms), 4500 links
- Formalised in DL
- five main categories
- tractus (e.g. nervous_system, respiratory_system)
- aetiology (e.g. virus, poisoning)
- abnormality (e.g. fracture, tumor)
- action (e.g. biopsy, observation, removal)
- anatomic_location (e.g. lungs, skin)
15Baseline: Linguistic methods
- Combine lexical analysis with hierarchical structure
- 313 suggested matches, around 70% correct
- 209 suggested matches, around 90% correct
- High precision, low recall (the easy cases)
16Now use background knowledge
[Diagram: OLVG (1400 terms, flat) and AMC (1400 terms, flat) as the two vocabularies; inference over DICE (2500 concepts, 4500 links) as background knowledge produces the mapping]
17Example found with context knowledge (beyond lexical)
18Example 2
19Anchoring strength
- Anchoring = substring + trivial morphology
anchored on N aspects        OLVG        AMC
N=5                          0           2
N=4                          0           198
N=3                          4           711
N=2                          144         285
N=1                          401         208
total nr. of anchored terms  549 (39%)   1404 (96%)
total nr. of anchorings      1298        5816
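The anchoring step can be illustrated with a minimal sketch. It assumes that "substring + trivial morphology" means word-level matching after lowercasing and stripping a plural "s"; the slides do not spell out the exact rules. The function names (`normalise`, `tokens`, `anchors`) and the small background table are hypothetical, with concept labels borrowed from the DICE examples listed earlier.

```python
import re

def normalise(word: str) -> str:
    """Trivial morphology (an assumption): lowercase and strip a plural 's'."""
    word = word.lower()
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        word = word[:-1]
    return word

def tokens(text: str) -> list[str]:
    """Split a term or label into normalised alphabetic words."""
    return [normalise(w) for w in re.findall(r"[A-Za-z]+", text)]

def anchors(term: str, background: dict[str, str]) -> dict[str, set[str]]:
    """Map each aspect (category) to the background concepts found in `term`.

    `background` maps a concept label to its aspect. A concept anchors the
    term when its normalised words occur as a contiguous run in the term.
    """
    term_tokens = tokens(term)
    found: dict[str, set[str]] = {}
    for label, aspect in background.items():
        label_tokens = tokens(label)
        n = len(label_tokens)
        if any(term_tokens[i:i + n] == label_tokens
               for i in range(len(term_tokens) - n + 1)):
            found.setdefault(aspect, set()).add(label)
    return found

# Hypothetical mini background ontology: concept label -> aspect.
BACKGROUND = {
    "lungs": "anatomic_location",
    "skin": "anatomic_location",
    "tumor": "abnormality",
    "fracture": "abnormality",
}
```

Under these assumptions, `anchors("Tumors of the lung", BACKGROUND)` anchors the term on two aspects, which would count towards the N=2 row of the table.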
20Results
- Example matchings discovered:
- OLVG: Acute respiratory failure; AMC: Asthma cardiale
- OLVG: Aspergillus fumigatus; AMC: Aspergilloom
- OLVG: duodenum perforation; AMC: Gut perforation
- OLVG: HIV; AMC: AIDS
- OLVG: Aorta thoracalis dissectie type B; AMC: Dissection of artery
21Experimental results
- Source & target: flat lists of 1400 ICU terms each
- Background: DICE (2300 concepts in DL)
- Manual Gold Standard (n=200)
22Does more context knowledge help?
23Adding more context
- Only lexical
- DICE (2500 concepts)
- MeSH (22000 concepts)
- ICD-10 (11000 concepts)
- Anchoring strength
DICE MeSH ICD10
4 aspects 0 8 0
3 aspects 0 89 0
2 aspects 135 201 0
1 aspect 413 694 80
total 548 992 80
24Results with multiple ontologies
(slide labels: Separate, Joint)
               Lexical  ICD-10  DICE  MeSH
Recall (%)     64       64      76    88
Precision (%)  95       95      94    89
- Monotonic improvement
- Independent of order
- Linear increase of cost
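One simple reading of the joint setting is that the matches found through each background ontology are unioned. That reading is an assumption, not something the slides state, but it would explain the three properties listed: a union can only grow, does not depend on order, and costs one extra pass per added ontology. The sketch below uses match pairs from the earlier examples; their assignment to particular sources is purely hypothetical.

```python
from itertools import permutations

Match = tuple[str, str]

def joint(per_source: dict[str, set[Match]], order: list[str]) -> list[set[Match]]:
    """Cumulative match sets after adding each background source in `order`."""
    found: set[Match] = set()
    steps = []
    for name in order:          # one pass per source: cost grows linearly
        found = found | per_source[name]
        steps.append(found)
    return steps

# Hypothetical assignment of example matches to background sources.
PER_SOURCE = {
    "Lexical": {("duodenum perforation", "Gut perforation")},
    "DICE": {("HIV", "AIDS"), ("duodenum perforation", "Gut perforation")},
    "MeSH": {("Aspergillus fumigatus", "Aspergilloom"), ("HIV", "AIDS")},
}

# The final joint result is the same for every order of adding sources.
finals = {frozenset(joint(PER_SOURCE, list(order))[-1])
          for order in permutations(PER_SOURCE)}
```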
25Does structured context knowledge help?
26Exploiting structure
- CRISP: 700 concepts, broader-than
- MeSH: 1475 concepts, broader-than
- FMA: 75,000 concepts, 160 relation-types (we used is-a & part-of)
27Using the structure or not ?
- $(S <_a B) \wedge (B < B') \wedge (B' <_a T) \rightarrow (S <_i T)$
28Using the structure or not ?
- $(S <_a B) \wedge (B < B') \wedge (B' <_a T) \rightarrow (S <_i T)$
- No use of structure
- Only stated is-a & part-of
- Transitive chains of is-a, and transitive chains of part-of
- Transitive chains of is-a and part-of
- One chain of part-of before one chain of is-a
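The five ways of using structure can be sketched as five reachability rules over the background ontology. The sketch assumes that S and T are already anchored to background concepts, that "no use of structure" means both are anchored to the same concept, and that a chain may be empty in the last regime. The regime names, helper functions and the toy ontology are all hypothetical and not taken from CRISP, MeSH or FMA.

```python
from collections import deque

Edges = dict[str, set[str]]

# Hypothetical toy background ontology.
PART_OF: Edges = {"alveolus": {"lung"}, "lung": {"respiratory_system"},
                  "organ_system": {"body"}}
IS_A: Edges = {"lung": {"organ"}, "organ": {"anatomical_structure"},
               "respiratory_system": {"organ_system"}}

def closure(start: str, *edge_sets: Edges) -> set[str]:
    """Concepts reachable from `start` via one or more edges, mixing all edge_sets."""
    seen: set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for edges in edge_sets:
            for nxt in edges.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return seen

def reachable(b: str, regime: str, is_a: Edges = IS_A, part_of: Edges = PART_OF) -> set[str]:
    """Background concepts B' related to B under the given regime."""
    if regime == "none":                  # no use of structure
        return {b}
    if regime == "stated":                # only stated is-a & part-of
        return {b} | is_a.get(b, set()) | part_of.get(b, set())
    if regime == "separate":              # is-a chains, and part-of chains
        return {b} | closure(b, is_a) | closure(b, part_of)
    if regime == "mixed":                 # chains mixing is-a and part-of
        return {b} | closure(b, is_a, part_of)
    if regime == "part_of_then_is_a":     # one part-of chain, then one is-a chain
        result = {b} | closure(b, part_of)
        for node in list(result):
            result |= closure(node, is_a)
        return result
    raise ValueError(f"unknown regime: {regime}")

def inferred(source_anchor: str, target_anchor: str, regime: str) -> bool:
    """True if a mapping follows between terms anchored at the two concepts."""
    return target_anchor in reachable(source_anchor, regime)
```

In this toy ontology, `inferred("alveolus", "body", regime)` holds only for `"mixed"`, because reaching `body` needs a part-of step after an is-a step.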
29Examples
30Examples
31Matching results (CRISP to MeSH)
(Golden Standard n=30)
Recall                              [?]   [?]   [?]   total  incr.
Exp.1: Direct                       448   417   156   1021   -
Exp.2: Indir. is-a & part-of        395   516   405   1316   29%
Exp.3: Indir. separate closures     395   933   1402  2730   167%
Exp.4: Indir. mixed closures        395   1511  2228  4143   306%
Exp.5: Indir. part-of before is-a   395   972   1800  3167   210%
Precision                           [?]   [?]   [?]   total  correct
Exp.1: Direct                       17    18    3     38     100%
Exp.4: Indir. mixed closures        14    39    59    112    94%
Exp.5: Indir. part-of before is-a   14    37    50    101    100%
32Three obvious intuitions
- The Semantic Web needs ontology mapping
- Ontology mapping needs background knowledge
- Ontology mapping needs approximation
[Callouts on slide: young researcher, post-doc]
33This work with
- Zharko Aleksovski
- Risto Gligorov
- Warner ten Kate
34Approximating subsumptions (and hence mappings)
- query: $A \sqsubseteq B$ ?
- $B = B_1 \sqcap B_2 \sqcap B_3$: $A \sqsubseteq B_1$, $A \sqsubseteq B_2$, $A \sqsubseteq B_3$ ?
[Diagram labels: B1, B3, A]
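Splitting the query into one subquery per conjunct can be sketched as follows. Classically, A is subsumed by a conjunction exactly when it is subsumed by every conjunct. The sketch assumes a toy hierarchy of stated parent links; the class names are hypothetical and loosely borrowed from the animal example used later in the slides.

```python
def is_subsumed(a: str, b: str, parents: dict[str, set[str]]) -> bool:
    """True if a is subsumed by b, following stated parent links transitively."""
    stack, seen = [a], set()
    while stack:
        node = stack.pop()
        if node == b:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, ()))
    return False

def check_conjuncts(a: str, conjuncts: list[str],
                    parents: dict[str, set[str]]) -> dict[str, bool]:
    """Answer each subquery 'a subsumed by B_i' separately."""
    return {b_i: is_subsumed(a, b_i, parents) for b_i in conjuncts}

# Hypothetical hierarchy: sheep is a herbivore, a herbivore is an animal.
PARENTS = {"sheep": {"herbivore"}, "herbivore": {"animal"}}

results = check_conjuncts("sheep", ["animal", "herbivore", "pet"], PARENTS)
classical = all(results.values())   # the classical answer needs every conjunct
```

Here two of the three subqueries succeed, so the classical answer is negative even though most of the evidence points the other way. That is the kind of case approximation is meant to address.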
35Approximating subsumptions
- Use Google distance to decide which subproblems are reasonable to focus on
- Google distance: $NGD(x, y) = \frac{\max\{\log f(x), \log f(y)\} - \log f(x, y)}{\log M - \min\{\log f(x), \log f(y)\}}$
- where
- f(x) is the number of Google hits for x
- f(x,y) is the number of Google hits for the tuple of search items x and y
- M is the number of web pages indexed by Google
- symmetric conditional probability of co-occurrence
- estimate of semantic distance
- estimate of contribution to $B_1 \sqcap B_2 \sqcap B_3$
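The normalised Google distance (NGD), in the standard form given by Cilibrasi and Vitányi, can be computed directly from the three counts f(x), f(y), f(x,y) and the index size M. The sketch below is a plain implementation of that definition. The function name `ngd` and all hit counts in the usage note are hypothetical, and no real search API is called.

```python
import math

def ngd(fx: float, fy: float, fxy: float, m: float) -> float:
    """Normalised Google distance from hit counts.

    fx, fy: hits for x and for y; fxy: hits for x and y together;
    m: number of indexed pages. Returns infinity when the terms never
    co-occur (or one has no hits), since the logarithm is undefined there.
    """
    if fx <= 0 or fy <= 0 or fxy <= 0:
        return math.inf
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    numerator = max(log_fx, log_fy) - log_fxy
    denominator = math.log(m) - min(log_fx, log_fy)
    return numerator / denominator
```

With made-up counts, two terms that always occur together get distance 0 (`ngd(1000, 1000, 1000, 10**6)`), and the value is symmetric in x and y. In practice the result can exceed 1 for rarely co-occurring terms, so a matcher that needs scores in [0, 1] would have to clip or rescale; that is a design choice the slides do not specify.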
36Google distance
37Google distance
[Figure: Google distances between the terms animal, plant, sheep, cow, vegetarian, mad cow]
38Google for sloppy matching
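- Algorithm for $A \sqsubseteq B$ ($B = B_1 \sqcap B_2 \sqcap B_3$)
- determine $NGD(B, B_i) = \sigma_i$, $i = 1, 2, 3$
- incrementally:
- increase sloppiness threshold $\epsilon$
- allow to ignore $A \sqsubseteq B_i$ with $\sigma_i \le \epsilon$
- match if remaining $A \sqsubseteq B_j$ hold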
39Properties of sloppy matching
- When sloppiness threshold $\epsilon$ goes up, the set of matches grows monotonically
- $\epsilon = 0$: classical matching
- $\epsilon = 1$: trivial matching
- Ideally: compute $\sigma_i$ such that
- desirable matches become true at low $\epsilon$
- undesirable matches become true only at high $\epsilon$
- Use random selection of $B_i$ as baseline
40Experiments in music domain
Size: 465 classes
Depth: 2 levels
very sloppy terms → good
41Experiment
Manual Gold Standard, N=50, random pairs
[Plot: precision and recall of classical, random and NGD-based matching; values annotated on the plot: $\epsilon$ = 0.53, 97, $\epsilon$ = 0.5, 60, 20, 7]
16-05-2006
42Wrapping up
43Three obvious intuitions
- The Semantic Web needs ontology mapping
- Ontology mapping needs background knowledge
- Ontology mapping needs approximation
44So that
- shared context & approximation make ontology-mapping a bit more like a social conversation
45Future: Distributed/P2P setting
[Diagram: source and target vocabularies, linked via inference over background knowledge to produce a mapping]
46Questions & discussion
- Frank.van.Harmelen_at_cs.vu.nl
- http://www.cs.vu.nl/frankh