Semantic Schema Matching - PowerPoint PPT Presentation

About This Presentation

Title:

Semantic Schema Matching

Description:

C4 = C_Electronics ⊓ (C_Cameras ⊔ C_Photo) ⊓ C_Digital_Cameras. Two types of concepts of nodes. (Figure: example tree with nodes Electronics, Cameras and Photo, Digital Cameras and PC, numbered 1 to 4.) ...

Slides: 29

Provided by: PavelS7

Transcript and Presenter's Notes

1
Semantic Schema Matching
Pavel Shvaiko
joint work with Fausto Giunchiglia and Mikalai Yatskevich
13th International Conference on
Cooperative Information Systems
(CoopIS), 3 November 2005, Agia Napa, Cyprus
2
Outline

Introduction
Semantic Matching
Element Level
Structure Level
Semantic Matching with Attributes
Comparative Evaluation
Conclusions and Future Work

3
Introduction
Information sources (e.g., XML schemas) can be
viewed as graph-like structures containing terms
and their inter-relationships
Matching takes two graph-like structures and
produces a mapping between the nodes of the
graphs that correspond semantically to each other
4

Semantic Matching

5
Semantic Matching
Semantic matching: given two graphs G1 and G2,
for any node n1i ∈ G1, find the strongest
semantic relation R (=, ⊑, ⊒, ⊥) holding with node n2j ∈ G2
We compute semantic relations by analyzing the
meaning (concepts, not labels) which is codified
in the elements and the structures of schemas
Technically, labels at nodes written in natural
language are translated into propositional
logical formulas which explicitly codify the
labels' intended meaning. This allows us to
reduce the matching problem to a propositional
validity problem
6
Concept of a label, concept of a node
Concept of a label is the propositional formula
which stands for the set of data instances that
one would classify under the label it encodes
Concept at a node is the propositional
formula which represents the set of data
instances which one would classify under a node,
given that it has a certain label and that it is
in a certain position in a tree
7
Four Macro Steps

Given two labeled trees T1 and T2, do
1. For all labels in T1 and T2, compute concepts at labels
2. For all nodes in T1 and T2, compute concepts at nodes
3. For all pairs of labels in T1 and T2, compute relations between concepts at labels
4. For all pairs of nodes in T1 and T2, compute relations between concepts at nodes

Steps 1 and 2 constitute the preprocessing
phase, and are executed once, and then again each
time a schema is changed (OFF-LINE part)
Steps 3 and 4 constitute the matching phase, and
are executed every time the two schemas are to be
matched (ON-LINE part)
8
Step 1: compute concepts at labels

The idea
Translate labels at nodes written in natural
language into propositional logical formulas
which explicitly codify the labels' intended
meaning
Preprocessing
Tokenization. Labels are parsed into tokens
(according to punctuation, spaces, etc.). E.g., Photo
and Cameras → <Photo, and, Cameras>
Lemmatization. Tokens are morphologically
analyzed in order to find all their possible
basic forms. E.g., Cameras → Camera
Building atomic concepts. An oracle (WordNet) is
used to extract senses of lemmas. E.g., Camera
has 2 senses
Building complex concepts. Prepositions and
conjunctions are translated into logical
connectives and used to build complex
concepts out of the atomic concepts
E.g., C_Cameras_and_Photo = <Cameras, U(WN_Camera)> ⊔ <Photo, U(WN_Photo)>,
where U is the union of the senses that WordNet
attaches to lemmas
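The Step 1 pipeline can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the S-Match implementation: the SENSES dictionary is a hypothetical stand-in for WordNet, lemmatize is a naive plural stripper, and the rule that words inside a group are conjoined while "and"/"or" separate disjuncts is a simplification of the label grammar. Formulas are returned as plain strings, with | for disjunction and & for conjunction.

```python
import re

# Hypothetical stand-in for WordNet: lemma -> list of sense identifiers.
SENSES = {
    "camera": ["camera#1", "camera#2"],
    "photo": ["photo#1"],
    "electronics": ["electronics#1"],
}

def tokenize(label):
    """Split a label into tokens on whitespace, underscores and punctuation."""
    return [t for t in re.split(r"[\s_\-,.;:/]+", label) if t]

def lemmatize(token):
    """Naive lemmatizer (assumption): strip a plural 's' if the result is a known lemma."""
    t = token.lower()
    if t.endswith("s") and t[:-1] in SENSES:
        return t[:-1]
    return t

def atomic_concept(lemma):
    """Union of the senses attached to a lemma; unknown lemmas stay as plain atoms."""
    senses = SENSES.get(lemma, [lemma])
    return senses[0] if len(senses) == 1 else "(" + " | ".join(senses) + ")"

def concept_at_label(label):
    """Words within a group are conjoined; 'and'/'or' separate disjuncts."""
    groups, current = [], []
    for tok in tokenize(label):
        if tok.lower() in ("and", "or"):
            groups.append(current)
            current = []
        else:
            current.append(atomic_concept(lemmatize(tok)))
    groups.append(current)
    parts = [" & ".join(g) for g in groups if g]
    if len(parts) == 1:
        return parts[0]
    return " | ".join("(" + p + ")" if " & " in p else p for p in parts)

print(concept_at_label("Photo and Cameras"))  # photo#1 | (camera#1 | camera#2)
print(concept_at_label("Digital Cameras"))    # digital & (camera#1 | camera#2)
```

Treating "and" as disjunction mirrors the slide's example: a node labeled Cameras and Photo covers items that are cameras or photos.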

9
Step 2: compute concepts at nodes

The idea
Extend concepts at labels by capturing the
knowledge residing in the structure of the graph in
order to define the context in which the given
concept at a label occurs
Computation
The concept at a node n is computed as the
conjunction of the concepts at labels of all nodes
on the path from the root to n, including n itself
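A minimal sketch of Step 2, assuming a tree stored as node id -> (label, parent id) and concepts at labels already computed in Step 1 (written symbolically here). The tree is a hypothetical reconstruction of the Electronics example: only node 4 and its concept C4 come from the presentation description; the other ids and the placement of PC are illustrative.

```python
# Hypothetical tree: node id -> (label, parent id); the root has parent None.
TREE = {
    1: ("Electronics", None),
    2: ("Cameras and Photo", 1),
    3: ("PC", 1),
    4: ("Digital Cameras", 2),
}

# Concepts at labels, as produced by Step 1 (symbolic placeholders).
LABEL_CONCEPT = {
    "Electronics": "C_Electronics",
    "Cameras and Photo": "(C_Cameras | C_Photo)",
    "PC": "C_PC",
    "Digital Cameras": "C_Digital_Cameras",
}

def path_from_root(node, tree):
    """Return the node ids from the root down to `node`, inclusive."""
    path = []
    while node is not None:
        path.append(node)
        node = tree[node][1]
    return path[::-1]

def concept_at_node(node, tree, label_concept):
    """Conjoin the concepts at labels of every node on the root-to-node path."""
    return " & ".join(label_concept[tree[n][0]] for n in path_from_root(node, tree))

print(concept_at_node(4, TREE, LABEL_CONCEPT))
# C_Electronics & (C_Cameras | C_Photo) & C_Digital_Cameras
```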

Semantic Matching
Element Level (Step 3)

11
Step 3: compute relations between (atomic)
concepts at labels

The idea
Exploit a priori knowledge, e.g., lexical or domain
knowledge, with the help of element-level
semantic matchers

12
Step 3: Element-level semantic matchers

Sense-based matchers take two WordNet senses as
input and produce semantic relations by exploiting
(direct) lexical relations of WordNet
String-based matchers take two labels as input
and produce semantic relations by exploiting string
comparison techniques

13
Step 3: Sense-based matchers. WordNet

The WordNet matcher computes relations between schema
entities in terms of the lexical relationships of
WordNet
A ⊑ B if A is a hyponym or meronym of B
Brand ⊑ Name
A ⊒ B if A is a hypernym or holonym of B
Europe ⊒ Greece
A = B if they are synonyms
Quantity = Amount
A ⊥ B if they are antonyms or siblings in the part-of
hierarchy
Microprocessors ⊥ PC Board
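The WordNet matcher boils down to a lookup from lexical relations to semantic relations. The sketch below is a simplification: it works on words rather than senses, and small hand-written tables (hypothetical, filled only with the slide's examples) stand in for WordNet.

```python
# Hypothetical toy lexical tables standing in for WordNet (words, not senses).
HYPONYM_OF = {("brand", "name")}            # (a, b): a is a kind of b
MERONYM_OF = {("greece", "europe")}         # (a, b): a is a part of b
SYNONYMS = {frozenset({"quantity", "amount"})}
ANTONYMS = set()
PART_OF_SIBLINGS = {frozenset({"microprocessors", "pc board"})}

def wordnet_match(a, b):
    """Map a lexical relation between a and b to a semantic relation."""
    a, b = a.lower(), b.lower()
    pair = frozenset({a, b})
    if a == b or pair in SYNONYMS:
        return "="
    if (a, b) in HYPONYM_OF or (a, b) in MERONYM_OF:
        return "⊑"   # a is less general than b
    if (b, a) in HYPONYM_OF or (b, a) in MERONYM_OF:
        return "⊒"   # a is more general than b
    if pair in ANTONYMS or pair in PART_OF_SIBLINGS:
        return "⊥"
    return "idk"

for a, b in [("Brand", "Name"), ("Europe", "Greece"),
             ("Quantity", "Amount"), ("Microprocessors", "PC Board")]:
    print(a, wordnet_match(a, b), b)
```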

14
Step 3: String-based matchers. Suffix

Suffix takes as input two strings and checks
whether the first one ends with the second one.
It returns the equivalence relation (=) in this
case, and idk otherwise
Suffix is effective at matching cognate words
and similar acronyms, but syntactic similarity
often does not imply semantic relatedness
PID = ID
telephone = phone
sword = word (a false positive)
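The Suffix matcher is short enough to state directly. Case-insensitive comparison and the guard against an empty second string are assumptions of this sketch, not details given on the slide.

```python
def suffix_match(first, second):
    """Return '=' if `first` ends with `second` (case-insensitive), else 'idk'."""
    if not second:
        return "idk"  # every string ends with the empty string; treat it as no evidence
    return "=" if first.lower().endswith(second.lower()) else "idk"

print(suffix_match("PID", "ID"))            # =
print(suffix_match("telephone", "phone"))   # =
print(suffix_match("sword", "word"))        # = (a false positive)
print(suffix_match("ID", "PID"))            # idk
```

The third call reproduces the limitation noted above: the strings overlap syntactically, but the words are unrelated.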

Semantic Matching
Structure Level (Step 4)

16
Step 4: compute relations between concepts at
nodes

The idea
Decompose the graph (tree) matching problem into
a set of node matching problems
Translate each node matching problem, namely a
pair of nodes with a possible relation between
them, into a propositional formula
Check the propositional formula for validity
Assumption
The information we use, namely labels of nodes and
knowledge of WordNet, is globally consistent
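A node matching task can be illustrated with a brute-force validity check: a relation rel holds between concepts C1 and C2 if Axioms → rel(C1, C2) is valid, where ⊑ is read as C1 → C2, ⊒ as C2 → C1, = as both, and ⊥ as ¬(C1 ∧ C2). Enumerating every truth assignment is exponential and only suitable for toy formulas; S-Match relies on SAT-based reasoning instead. The variable names and the axiom below are hypothetical.

```python
from itertools import product

# Formulas are nested tuples: ("var", n), ("not", f), ("and", f, g), ("or", f, g), ("imp", f, g).
def V(name): return ("var", name)
def NOT(f): return ("not", f)
def AND(f, g): return ("and", f, g)
def OR(f, g): return ("or", f, g)
def IMP(f, g): return ("imp", f, g)

def variables(f):
    if f[0] == "var":
        return {f[1]}
    return set().union(*(variables(g) for g in f[1:]))

def evaluate(f, env):
    op = f[0]
    if op == "var":
        return env[f[1]]
    if op == "not":
        return not evaluate(f[1], env)
    a, b = evaluate(f[1], env), evaluate(f[2], env)
    return {"and": a and b, "or": a or b, "imp": (not a) or b}[op]

def is_valid(f):
    """True iff f is true under every truth assignment (exponential in #variables)."""
    names = sorted(variables(f))
    return all(evaluate(f, dict(zip(names, bits)))
               for bits in product([False, True], repeat=len(names)))

def node_relation(axioms, c1, c2):
    """Strongest of =, ⊑, ⊒, ⊥ such that Axioms -> rel(c1, c2) is valid, else 'idk'."""
    def holds(goal):
        return is_valid(IMP(axioms, goal))
    less, more = holds(IMP(c1, c2)), holds(IMP(c2, c1))
    if less and more:
        return "="
    if less:
        return "⊑"
    if more:
        return "⊒"
    if holds(NOT(AND(c1, c2))):
        return "⊥"
    return "idk"

# Hypothetical task with one element-level axiom from Step 3: digital_camera ⊑ camera.
axioms = IMP(V("digital_camera"), V("camera"))
c1 = AND(V("electronics"), AND(OR(V("camera"), V("photo")), V("digital_camera")))
c2 = AND(V("electronics"), V("camera"))
print(node_relation(axioms, c1, c2))  # ⊑
```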

17
Step 4: Example of a node matching task
18
Step 4: Efficient semantic matching

Conjunctive concepts at nodes
The matching formula is Horn
Satisfiability can be determined in linear time
A SAT solver requires quadratic time
We developed an ad hoc linear-time reasoning
procedure
It avoids conversion to a propositional formula
It reasons directly on the axioms matrix
Disjunctive concepts at nodes
The matching formula is not in CNF by construction
Most SAT solvers require the input formula to be
in CNF
Conversion to CNF may lead to an exponential space
explosion
We exploit a structure-preserving transformation
The size of the resulting CNF formula is linear
with respect to the original formula
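For conjunctive concepts, checking C1 ⊑ C2 amounts to showing that Axioms ∧ C1 ∧ ¬C2 is unsatisfiable, and that formula is Horn: C1 contributes unit facts, ¬C2 is a single negative clause, and axioms of the form a → b or a → ¬b are Horn clauses. The sketch below uses the classic counter-based linear-time Horn satisfiability algorithm (forward chaining in the style of Dowling and Gallier) as a stand-in; it is not the S-Match procedure that reasons on the axioms matrix. The axiom is hypothetical.

```python
from collections import deque

def horn_sat(clauses):
    """Linear-time Horn satisfiability by forward chaining.

    clauses: list of (body, head) meaning AND(body) -> head; head None means AND(body) -> False.
    Returns the minimal model (set of true atoms), or None if unsatisfiable.
    """
    remaining = [len(set(body)) for body, _ in clauses]  # unproven body atoms per clause
    occurs = {}                                           # atom -> clauses whose body mentions it
    for i, (body, _) in enumerate(clauses):
        for atom in set(body):
            occurs.setdefault(atom, []).append(i)
    queue = deque()
    for i, (body, head) in enumerate(clauses):
        if remaining[i] == 0:          # facts: empty body
            if head is None:
                return None            # an empty goal clause is already false
            queue.append(head)
    true = set()
    while queue:
        atom = queue.popleft()
        if atom in true:
            continue
        true.add(atom)
        for i in occurs.get(atom, []):
            remaining[i] -= 1
            if remaining[i] == 0:
                head = clauses[i][1]
                if head is None:
                    return None        # a goal clause's whole body became true
                queue.append(head)
    return true

def less_general(c1_atoms, c2_atoms, axioms):
    """C1 ⊑ C2 for conjunctive concepts iff Axioms ∧ C1 ∧ ¬C2 is unsatisfiable."""
    clauses = list(axioms)
    clauses += [([], a) for a in c1_atoms]    # C1 as unit facts
    clauses.append((list(c2_atoms), None))    # ¬C2 as one negative clause
    return horn_sat(clauses) is None

# Hypothetical axiom from Step 3: digital_camera -> camera.
axioms = [(["digital_camera"], "camera")]
print(less_general({"electronics", "digital_camera"}, {"electronics", "camera"}, axioms))  # True
print(less_general({"electronics", "camera"}, {"electronics", "digital_camera"}, axioms))  # False
```

Each atom is added to the model at most once and each body occurrence is decremented at most once, which is what keeps the running time linear in the size of the clause set.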

Semantic Matching
with Attributes

20
Semantic matching with attributes

Observations
Semantic matching is based on the idea of
matching concepts, not their direct physical
implementations
Determining mappings is a first step towards the
ultimate goal of, e.g., data translation or query
mediation
Attributes are <attribute-name, type> pairs
<PID, string>, <Brand, string>, <ID, int>

Two alternatives
Exploit datatypes
Discard datatypes
Appropriateness
Exploiting datatypes: <Price, double> ⊒ <Price, float>
Discarding datatypes: <Price, double> = <Price, float>
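One way to realize the two alternatives is to match attribute names first and, only when exploiting datatypes, refine the result with a relation between types. In this sketch the name matcher is plain case-insensitive equality, and the WIDENS_TO table (every float value is also a double value, every int value is also a long value) is a hypothetical assumption rather than a rule taken from the slides.

```python
# Hypothetical widening table: (narrow, wide) means every value of `narrow` is also a value of `wide`.
WIDENS_TO = {("float", "double"), ("int", "long")}

def type_relation(t1, t2):
    if t1 == t2:
        return "="
    if (t2, t1) in WIDENS_TO:
        return "⊒"   # t1 is the wider type
    if (t1, t2) in WIDENS_TO:
        return "⊑"   # t1 is the narrower type
    return "idk"

def match_attributes(attr1, attr2, exploit_datatypes):
    """Compare <attribute-name, type> pairs, optionally taking datatypes into account."""
    (name1, type1), (name2, type2) = attr1, attr2
    if name1.lower() != name2.lower():    # placeholder name matcher
        return "idk"
    return type_relation(type1, type2) if exploit_datatypes else "="

print(match_attributes(("Price", "double"), ("Price", "float"), exploit_datatypes=True))   # ⊒
print(match_attributes(("Price", "double"), ("Price", "float"), exploit_datatypes=False))  # =
```

Which variant is appropriate depends on the task: when translating data from double to float, the ⊒ answer flags a possible loss of precision that the = answer hides.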

Comparative Evaluation

22
Testing methodology

Matching systems
S-Match vs. Cupid, COMA and SF as implemented
in Rondo

Measuring match quality and performance
Expert mappings are inherently subjective
Two degrees of freedom
Directionality
Use of Oracles
Indicators
Precision, [0, 1]; Recall, [0, 1]
Overall, [-1, 1]; F-measure, [0, 1]
Time, sec.
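The quality indicators can be computed from the set of mappings a system returns and the expert (expected) mappings. The sketch uses the usual definitions from the schema matching literature: Precision and Recall over correct mappings, F-measure as their harmonic mean, and Overall as Recall * (2 - 1/Precision), which simplifies to (correct - false positives) / |expected|. The mapping triples are made-up sample data.

```python
def match_quality(found, expected):
    """Precision, Recall, F-measure and Overall for sets of (node1, node2, relation) triples."""
    found, expected = set(found), set(expected)
    correct = len(found & expected)
    precision = correct / len(found) if found else 0.0
    recall = correct / len(expected) if expected else 0.0
    f_measure = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # Overall = Recall * (2 - 1/Precision) = (correct - false positives) / |expected|
    overall = (2 * correct - len(found)) / len(expected) if expected else 0.0
    return {"precision": precision, "recall": recall,
            "f_measure": f_measure, "overall": overall}

expected = {("A", "X", "="), ("B", "Y", "⊑"), ("C", "Z", "⊒"),
            ("D", "W", "="), ("E", "V", "⊥")}
found = {("A", "X", "="), ("B", "Y", "⊑"), ("C", "Z", "⊒"), ("D", "V", "=")}
print(match_quality(found, expected))
# precision 0.75, recall 0.6, F-measure ≈ 0.667, overall 0.4
```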

23
Experimental results
PC: Pentium IV 1.7 GHz, 512 MB RAM, Windows XP
24

Conclusions and Future Work

25
Conclusions

The semantic schema matching approach
builds on the advances of previous
solutions at the element level by providing a
library of element-level matchers
guarantees correctness and completeness of its
results at the structure level by using
model-based techniques
Quality and performance evaluation (S-Match)
Automated reasoning techniques (e.g., SAT)
provide good performance for industrial-strength
matching tasks
The challenge is not efficiency but rather
missing domain knowledge

26
Future work

Iterative semantic matching
Recall increase via multiple strategies
Interactive semantic matching
GUI
Customizing technology
Extensive evaluation
Testing methodology
Industry-strength tasks
