Ontology mapping needs context

About This Presentation

Title:

Ontology mapping needs context

Description:

CRISP: 700 concepts, broader-than. MeSH: 1475 concepts, broader-than ... CRISP (738) MeSH (1475) FMA (75.000) anchoring. anchoring. mapping. inference. 27 ...

Slides: 45

Provided by: frankvan4

Transcript and Presenter's Notes

Title: Ontology mapping needs context

1
Ontology mapping needs context & approximation

Frank van Harmelen
Vrije Universiteit Amsterdam

2
Or

How to make ontology-mapping less like database integration

and more like a social conversation

3
Three obvious intuitions

The Semantic Web needs ontology mapping

Ontology mapping needs background knowledge

Ontology mapping needs approximation

4
Which Semantic Web?

Version 1: "Semantic Web as Web of Data" (TBL)
recipe: expose databases on the web, use RDF, integrate
meta-data from: expressing DB schema semantics in machine-interpretable ways
enable integration and unexpected re-use

5
Which Semantic Web?

Version 2: Enrichment of the current Web
recipe: annotate, classify, index
meta-data from: automatically producing markup (named-entity recognition, concept extraction, tagging, etc.)
enable personalisation, search, browse, ...

6
Which Semantic Web?

Version 1: Semantic Web as Web of Data (data-oriented)

Version 2: Enrichment of the current Web (user-oriented)

Different use-cases
Different techniques
Different users
7
Which Semantic Web?

Version 1: Semantic Web as Web of Data

Version 2: Enrichment of the current Web

But both need ontologies for semantic agreement

between sources
between source & user
8
Ontology research is almost done...?

we know what they are: consensual, formalised models of a domain
we know how to make and maintain them (methods, tools, experience)
we know how to deploy them (search, personalisation, data-integration, ...)
Main remaining open questions:
Automatic construction (learning)
Automatic mapping (integration)

9
Three obvious intuitions

The Semantic Web needs ontology mapping

Ontology mapping needs background knowledge (Ph.D. student, AIO)

Ontology mapping needs approximation (young researcher, post-doc)
10

This work with
Zharko Aleksovski, Michel Klein

11
Does context knowledge help mapping?
12
The general idea
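[Diagram: source and target are both anchored in background knowledge; inference over the background knowledge yields the mapping between source and target]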
13
a realistic example

Two Amsterdam hospitals (OLVG, AMC)
Two Intensive Care Units, different vocabularies
Want to compare quality of care
OLVG-1400:
1400 terms in a flat list
used in the first 24 hours of stay
some implicit hierarchy (e.g. 6 types of Diabetes Mellitus)
some redundancy (spelling mistakes)
AMC: similar list, but from a different hospital

14
Context ontology used

DICE
2500 concepts (5000 terms), 4500 links
Formalised in DL
five main categories
tractus (e.g. nervous_system, respiratory_system)
aetiology (e.g. virus, poisoning)
abnormality (e.g. fracture, tumor)
action (e.g. biopsy, observation, removal)
anatomic_location (e.g. lungs, skin)

15
Baseline: Linguistic methods

Combine lexical analysis with hierarchical structure
313 suggested matches, around 70% correct
209 suggested matches, around 90% correct
High precision, low recall (the easy cases)

16
Now use background knowledge
[Diagram: OLVG (1400, flat) and AMC (1400, flat) are both anchored in DICE (2500 concepts, 4500 links); inference over DICE yields the mapping]
17
Example found with context knowledge (beyond
lexical)
18
Example 2
19
Anchoring strength

Anchoring = substring + trivial morphology

anchored on N aspects          OLVG          AMC
N=5                            0             2
N=4                            0             198
N=3                            4             711
N=2                            144           285
N=1                            401           208
total nr. of anchored terms    549 (39%)     1404 (96%)
total nr. of anchorings        1298          5816
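
A minimal sketch of what "substring + trivial morphology" anchoring could look like. The normalisation rules (lower-casing, stripping a plural "s"), the function names and the miniature background ontology are all assumptions made for illustration; the slides do not specify them.

```python
import re

def normalise(text):
    """Lower-case, split into word tokens, strip a plural 's' (crude morphology)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t[:-1] if len(t) > 3 and t.endswith("s") else t for t in tokens]

def contains(haystack, needle):
    """True if the token list `needle` occurs contiguously in `haystack`."""
    n = len(needle)
    return n > 0 and any(haystack[i:i + n] == needle
                         for i in range(len(haystack) - n + 1))

def anchor(term, background):
    """Return {aspect: [labels]} for background labels found inside the term."""
    term_tokens = normalise(term)
    anchors = {}
    for aspect, labels in background.items():
        hits = [label for label in labels
                if contains(term_tokens, normalise(label))]
        if hits:
            anchors[aspect] = hits
    return anchors

# Hypothetical miniature background ontology, keyed by DICE-style aspect.
background = {
    "abnormality": ["perforation", "fracture"],
    "anatomic_location": ["duodenum", "lungs"],
}

result = anchor("Duodenum perforations", background)
print(result)
print("anchored on", len(result), "aspects")
```

Counting the keys of the returned dictionary gives the "anchored on N aspects" figure for a single term under these assumptions.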
20
Results

Example matchings discovered
OLVG: Acute respiratory failure / AMC: Asthma cardiale
OLVG: Aspergillus fumigatus / AMC: Aspergilloom
OLVG: duodenum perforation / AMC: Gut perforation
OLVG: HIV / AMC: AIDS
OLVG: Aorta thoracalis dissectie type B / AMC: Dissection of artery

21
Experimental results

Source & target: flat lists of 1400 ICU terms each
Background: DICE (2300 concepts in DL)
Manual Gold Standard (n=200)

22
Does more context knowledge help?
23
Adding more context

Only lexical
DICE (2500 concepts)
MeSH (22000 concepts)
ICD-10 (11000 concepts)
Anchoring strength

             DICE    MeSH    ICD10
4 aspects    0       8       0
3 aspects    0       89      0
2 aspects    135     201     0
1 aspect     413     694     80
total        548     992     80
24
Results with multiple ontologies
Separate     Lexical   ICD-10   DICE   MeSH
Recall       64        64       76     88
Precision    95        95       94     89

Monotonic improvement
Independent of order
Linear increase of cost

Joint
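
One way to read the three properties above is to assume that the joint result is simply the union of the matches found with each background ontology. The slides do not state that this is how the joint run was computed, so the sketch below is an assumption, and the example pairs and variable names are hypothetical.

```python
def joint_matches(lexical, per_ontology):
    """Union of the lexical matches with the matches found via each ontology."""
    matches = set(lexical)
    for found in per_ontology:      # one pass per ontology: cost grows linearly
        matches |= set(found)       # a union can only add matches: monotonic
    return matches

# Hypothetical match sets (source term, target term).
lexical = {("duodenum perforation", "Gut perforation")}
via_first = {("HIV", "AIDS")}
via_second = {("HIV", "AIDS"), ("Aspergillus fumigatus", "Aspergilloom")}

joint = joint_matches(lexical, [via_first, via_second])
print(len(joint))
```

Because set union is commutative and associative, the order in which the ontologies are added does not change the result under this assumption.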
25
does structured context knowledge help?
26
Exploiting structure

CRISP: 700 concepts, broader-than
MeSH: 1475 concepts, broader-than
FMA: 75.000 concepts, 160 relation-types (we used is-a & part-of)

27
Using the structure or not?

$(S <_a B) \wedge (B < B') \wedge (B' <_a T) \Rightarrow (S <_i T)$

28
Using the structure or not?

$(S <_a B) \wedge (B < B') \wedge (B' <_a T) \Rightarrow (S <_i T)$
No use of structure
Only stated is-a & part-of
Transitive chains of is-a, and transitive chains of part-of
Transitive chains of is-a and part-of
One chain of part-of before one chain of is-a
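
The five variants above can be sketched as five ways of building the background relation that sits in the middle of the inference rule. This is an illustrative sketch only: anchors are simplified to (term, concept) pairs, the direct case is modelled as both terms anchoring on the same concept, the fifth variant is assumed to also allow a pure part-of or pure is-a chain, and the tiny ontology is hypothetical.

```python
from itertools import product

def closure(edges):
    """Transitive closure of a set of (child, parent) pairs."""
    result = set(edges)
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(list(result), repeat=2):
            if b == c and (a, d) not in result:
                result.add((a, d))
                changed = True
    return result

def compose(first, second):
    """All (a, d) such that (a, b) is in first and (b, d) is in second."""
    return {(a, d) for (a, b) in first for (c, d) in second if b == c}

def background_relation(is_a, part_of, mode):
    """Pairs (B, B') of background concepts treated as related in each variant."""
    if mode == "direct":                 # no use of structure
        return set()
    if mode == "stated":                 # only stated is-a and part-of
        return is_a | part_of
    if mode == "separate":               # is-a chains, and part-of chains
        return closure(is_a) | closure(part_of)
    if mode == "mixed":                  # chains mixing is-a and part-of
        return closure(is_a | part_of)
    if mode == "part_of_then_is_a":      # one part-of chain, then one is-a chain
        p, i = closure(part_of), closure(is_a)
        return p | i | compose(p, i)
    raise ValueError(mode)

def infer(source_anchors, target_anchors, relation):
    """Anchor S on B, relate B to B', anchor T on B': infer a mapping (S, T)."""
    return {(s, t)
            for s, b in source_anchors
            for t, b2 in target_anchors
            if b == b2 or (b, b2) in relation}

# Hypothetical background fragment and anchors.
part_of = {("aortic_arch", "aorta")}
is_a = {("aorta", "artery"), ("artery", "blood_vessel")}
source_anchors = [("source: aortic arch", "aortic_arch")]
target_anchors = [("target: blood vessel", "blood_vessel")]

modes = ["direct", "stated", "separate", "mixed", "part_of_then_is_a"]
counts = {m: len(infer(source_anchors, target_anchors,
                       background_relation(is_a, part_of, m))) for m in modes}
print(counts)
```

In this toy fragment only the last two variants connect the two terms, since the path needs one part-of step followed by is-a steps.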

29
Examples
30
Examples
31
Matching results (CRISP to MeSH)
(Gold Standard, n=30)

Recall                                                      total   incr. (%)
Exp.1 Direct                         448    417    156     1021    -
Exp.2 Indir. is-a & part-of          395    516    405     1316    29
Exp.3 Indir. separate closures       395    933    1402    2730    167
Exp.4 Indir. mixed closures          395    1511   2228    4143    306
Exp.5 Indir. part-of before is-a     395    972    1800    3167    210

Precision                                                   total   correct (%)
Exp.1 Direct                         17     18     3       38      100
Exp.4 Indir. mixed closures          14     39     59      112     94
Exp.5 Indir. part-of before is-a     14     37     50      101     100
32
Three obvious intuitions

The Semantic Web needs ontology mapping

Ontology mapping needs background knowledge

Ontology mapping needs approximation (young researcher, post-doc)
33

This work with
Zharko Aleksovski, Risto Gligorov, Warner ten Kate

34
Approximating subsumptions (and hence mappings)

query: $A \sqsubseteq B$ ?
$B = B_1 \sqcap B_2 \sqcap B_3 \Rightarrow A \sqsubseteq B_1,\ A \sqsubseteq B_2,\ A \sqsubseteq B_3$ ?

[Figure labels: B1, B3, A]
35
Approximating subsumptions

Use Google distance to decide which subproblems are reasonable to focus on
Google distance:
$\mathrm{NGD}(x, y) = \dfrac{\max\{\log f(x), \log f(y)\} - \log f(x, y)}{\log M - \min\{\log f(x), \log f(y)\}}$
where
f(x) is the number of Google hits for x
f(x,y) is the number of Google hits for the tuple of search items x and y
M is the number of web pages indexed by Google

symmetric conditional probability of co-occurrence
estimate of semantic distance
estimate of contribution to $B_1 \sqcap B_2 \sqcap B_3$
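
A minimal sketch of the distance computation, assuming the standard Normalised Google Distance of Cilibrasi and Vitányi, which uses exactly the quantities f(x), f(x,y) and M defined above. The hit counts in the example are made up, and the function name `ngd` is chosen here for illustration.

```python
import math

def ngd(f_x, f_y, f_xy, m):
    """Normalised Google Distance from hit counts.

    NGD = (max(log f(x), log f(y)) - log f(x,y))
          / (log M - min(log f(x), log f(y)))
    """
    if f_xy == 0:
        return math.inf  # the two terms never co-occur
    log_x, log_y, log_xy = math.log(f_x), math.log(f_y), math.log(f_xy)
    return (max(log_x, log_y) - log_xy) / (math.log(m) - min(log_x, log_y))

# Made-up counts: two terms with 1000 hits each in an index of 10**6 pages.
always_together = ngd(1000, 1000, 1000, 10**6)   # co-occur on every page
rarely_together = ngd(1000, 1000, 10, 10**6)     # co-occur on 10 pages
print(always_together, rarely_together)
```

Terms that always co-occur get distance 0, and the distance grows as the co-occurrence count drops relative to the individual counts.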

36
Google distance
37
Google distance
[Figure: Google distances between the terms animal, plant, sheep, cow, vegetarian, madcow]
38
Google for sloppy matching

Algorithm for $A \sqsubseteq B$ ($B = B_1 \sqcap B_2 \sqcap B_3$)
determine $\mathrm{NGD}(B, B_i)$ for $i = 1, 2, 3$
incrementally:
increase the sloppiness threshold
allow ignoring those $A \sqsubseteq B_i$ that the threshold admits
match if the remaining $A \sqsubseteq B_j$ hold
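
The incremental procedure above can be sketched as follows. The exact rule for when a conjunct may be ignored did not survive in the transcript, so this sketch assumes each conjunct carries a weight in [0, 1] (taken to be derived from the NGD values in some way) and may be ignored once the threshold reaches that weight. The function names, the weights and the step size are hypothetical.

```python
def sloppy_match(holds, weights, threshold):
    """Decide A ⊑ B1 ⊓ ... ⊓ Bn, ignoring conjuncts admitted by the threshold.

    holds[i]   -- whether the classical check A ⊑ Bi succeeds
    weights[i] -- assumed value in [0, 1] attached to Bi
    """
    # Assumption: a conjunct is ignored when its weight is <= threshold.
    remaining = [ok for ok, w in zip(holds, weights) if w > threshold]
    return all(remaining)

def lowest_threshold(holds, weights, steps=100):
    """Raise the threshold step by step; return the first value that matches."""
    for k in range(steps + 1):
        threshold = k / steps
        if sloppy_match(holds, weights, threshold):
            return threshold
    return None  # only reachable if some weight lies above 1

# Hypothetical case: A ⊑ B1 and A ⊑ B3 hold classically, A ⊑ B2 does not.
holds = [True, False, True]
weights = [0.8, 0.3, 0.6]

print(sloppy_match(holds, weights, 0.0))   # classical matching: no match
print(lowest_threshold(holds, weights))    # first threshold that ignores B2
```

With positive weights a threshold of 0 ignores nothing, and a threshold of 1 ignores every conjunct, so any pair matches.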

39
Properties of sloppy matching

When the sloppiness threshold goes up, the set of matches grows monotonically
threshold = 0: classical matching
threshold = 1: trivial matching
Ideally compute the values for the $B_i$ such that
desirable matches become true at a low threshold
undesirable matches become true only at a high threshold
Use random selection of $B_i$ as baseline
40
Experiments in music domain
Size: 465 classes
very sloppy terms → good
Depth: 2 levels
41
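Experiment
Manual Gold Standard, N=50, random pairs
[Plot: precision against recall for classical, random and NGD matching; surviving labels: threshold 0.53 with 97, threshold 0.5 with 60, and the values 20 and 7]
16-05-2006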
42
wrapping up
43
Three obvious intuitions