Title: Semantic Schema Matching
1Semantic Schema Matching
Pavel Shvaiko
joint work with Fausto Giunchiglia and Mikalai
Yatskevich
13th International Conference on
Cooperative Information Systems
(CoopIS), 3 November 2005, Agia Napa, Cyprus
2Outline
- Introduction
- Semantic Matching
- Element Level
- Structure Level
- Semantic Matching with Attributes
- Comparative Evaluation
- Conclusions and Future Work
3Introduction
Information sources (e.g., XML schemas) can be
viewed as graph-like structures containing terms
and their inter-relationships
Matching takes two graph-like structures and
produces a mapping between the nodes of the
graphs that correspond semantically to each other
5Semantic Matching
Semantic Matching: Given two graphs G1 and G2,
for any node n1i ∈ G1, find the strongest
semantic relation R holding with node n2j ∈ G2
We compute semantic relations by analyzing the
meaning (concepts, not labels) which is codified
in the elements and the structures of schemas
Technically, labels at nodes written in natural
language are translated into propositional
logical formulas which explicitly codify the
labels' intended meaning. This allows us to
codify the matching problem into a propositional
validity problem.
6Concept of a label and concept of a node
Concept of a label is the propositional formula
which stands for the set of data instances that
one would classify under the label it encodes
Concept at a node is the propositional formula
which represents the set of data instances which
one would classify under a node, given that it
has a certain label and that it is in a certain
position in a tree
7Four Macro Steps
Given two labeled trees T1 and T2, do:
- Step 1: for all labels in T1 and T2, compute
concepts at labels
- Step 2: for all nodes in T1 and T2, compute
concepts at nodes
- Step 3: for all pairs of labels in T1 and T2,
compute relations between concepts at labels
- Step 4: for all pairs of nodes in T1 and T2,
compute relations between concepts at nodes
- Steps 1 and 2 constitute the preprocessing
phase, and are executed once and then again each
time a schema is changed (OFF-LINE part)
- Steps 3 and 4 constitute the matching phase, and
are executed every time the two schemas are to be
matched (ON-LINE part)
8Step 1 compute concepts at labels
- The idea
- Translate labels at nodes written in natural
language into propositional logical formulas
which explicitly codify the labels' intended
meaning
- Preprocessing
- Tokenization. Labels are parsed into tokens
according to punctuation, spaces, etc. E.g.,
Photo and Cameras → <Photo, and, Cameras>
- Lemmatization. Tokens are morphologically
analyzed in order to find all their possible
basic forms. E.g., Cameras → Camera
- Building atomic concepts. An oracle (WordNet) is
used to extract senses of lemmas. E.g., Camera
has 2 senses
- Building complex concepts. Prepositions and
conjunctions are translated into logical
connectives and used to build complex concepts
out of the atomic concepts
- E.g., C_Cameras_and_Photo = <Cameras,
U(WN_Camera)> ⊔ <Photo, U(WN_Photo)>, where U is
a union of the senses that WordNet attaches to
lemmas
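The preprocessing of a single label can be sketched in a few lines of Python. This is a minimal illustration, not the S-Match implementation: `TOY_SENSES` is a hypothetical stand-in for WordNet, the lemmatizer only strips a plural "s", and mapping the conjunctions "and"/"or" to disjunction is an assumption of this sketch.

```python
import re

# Hypothetical stand-in for WordNet: lemma -> sense identifiers.
# "camera" gets 2 senses as on the slide; the rest is made up.
TOY_SENSES = {"camera": ["camera#1", "camera#2"], "photo": ["photo#1"]}

# Assumed mapping of natural-language connectives to logical ones.
CONNECTIVES = {"and": "|", "or": "|"}


def tokenize(label):
    """Split a label on whitespace and common punctuation."""
    return [t for t in re.split(r"[\s_,.;:/-]+", label) if t]


def lemmatize(token):
    """Very rough lemmatizer: lower-case, strip a plural 's' if known."""
    word = token.lower()
    if word.endswith("s") and word[:-1] in TOY_SENSES:
        return word[:-1]
    return word


def concept_at_label(label):
    """Build a formula string from atomic concepts and connectives.

    Only labels of the form "X and Y" / "X or Y" are handled here.
    """
    parts = []
    for token in tokenize(label):
        lemma = lemmatize(token)
        if lemma in CONNECTIVES:
            parts.append(CONNECTIVES[lemma])
        else:
            joined = ", ".join(TOY_SENSES.get(lemma, []))
            parts.append(f"<{lemma}, {{{joined}}}>")
    return " ".join(parts)


tokens = tokenize("Photo and Cameras")
formula = concept_at_label("Photo and Cameras")
```

A real implementation would query WordNet for the senses and would also need to handle multi-word labels without explicit connectives.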
9Step 2 compute concepts at nodes
- The idea
- Extend concepts at labels by capturing the
knowledge residing in the structure of a graph in
order to define the context in which the given
concept at a label occurs
- Computation
- The concept at a node n is computed as the
conjunction of the concepts at labels located
above the given node, including the node itself
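As a rough sketch of this computation, the function below walks from a node up to the root and conjoins the concepts at labels found on the way. The tree encoding (a `parent` dictionary), the node identifiers, and the labels are all hypothetical; formulas are kept as plain strings.

```python
def concept_at_node(node, parent, label_concept):
    """Conjoin the label concepts on the path from the root to `node`."""
    path = []
    while node is not None:
        path.append(label_concept[node])
        node = parent.get(node)  # None once the root has been passed
    # Root first, the node itself last.
    return " & ".join(f"({c})" for c in reversed(path))


# Hypothetical two-node tree: n1 (root) -> n2
parent = {"n1": None, "n2": "n1"}
label_concept = {"n1": "Europe", "n2": "Pictures"}

root_concept = concept_at_node("n1", parent, label_concept)
child_concept = concept_at_node("n2", parent, label_concept)
```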
10- Semantic Matching
- Element Level (Step 3)
11Step 3 compute relations between (atomic)
concepts at labels
- The idea
- Exploit a priori knowledge, e.g., lexical, domain
knowledge, with the help of element level
semantic matchers
12Step 3 Element level semantic matchers
- Sense-based matchers have two WordNet senses in
input and produce semantic relations exploiting
(direct) lexical relations of WordNet
- String-based matchers have two labels in input
and produce semantic relations exploiting string
comparison techniques
13Step 3 Sense-based matchers. WordNet
- The WordNet matcher computes relations between
schema entities in terms of the lexical
relations of WordNet
- A ⊑ B if A is a hyponym or meronym of B
- Brand ⊑ Name
- A ⊒ B if A is a hypernym or holonym of B
- Europe ⊒ Greece
- A = B if they are synonyms
- Quantity = Amount
- A ⊥ B if they are antonyms or siblings in the
part-of hierarchy
- Microprocessors ⊥ PC Board
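The translation from lexical to semantic relations amounts to a table lookup, sketched below. `TOY_LEXICAL_RELATIONS` is a hypothetical stand-in for WordNet queries, populated only with the slide's examples, and returning Idk when no relation is known is an assumption of this sketch.

```python
# Hypothetical stand-in for WordNet: the lexical relation of the
# first entry with respect to the second one.
TOY_LEXICAL_RELATIONS = {
    ("brand", "name"): "hyponym",
    ("europe", "greece"): "holonym",
    ("quantity", "amount"): "synonym",
    ("microprocessors", "pc board"): "part-of sibling",
}

# Lexical relation -> semantic relation, following the slide.
SEMANTIC_RELATION = {
    "hyponym": "less general",    # A ⊑ B
    "meronym": "less general",    # A ⊑ B
    "hypernym": "more general",   # A ⊒ B
    "holonym": "more general",    # A ⊒ B
    "synonym": "equivalent",      # A = B
    "antonym": "disjoint",        # A ⊥ B
    "part-of sibling": "disjoint",
}


def sense_matcher(a, b):
    """Return the semantic relation of a with respect to b, or 'Idk'."""
    lexical = TOY_LEXICAL_RELATIONS.get((a.lower(), b.lower()))
    return SEMANTIC_RELATION.get(lexical, "Idk")
```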
14Step 3 String-based matchers. Suffix
- Suffix takes as input two strings and checks
whether the first one ends with the second one.
It returns the equivalence relation in this case,
and Idk otherwise
- Suffix is efficient in matching cognate words
and similar acronyms, but often syntactic
similarity does not imply semantic relatedness
- PID = ID
- telephone = phone
- sword = word (a false match: the strings are
similar, the meanings are not)
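A minimal sketch of the Suffix matcher follows. Comparing case-insensitively is an assumption of this sketch; the slide only specifies the "ends with" test and the two possible results.

```python
def suffix_matcher(first, second):
    """Return '=' if `first` ends with `second`, otherwise 'Idk'."""
    if first.lower().endswith(second.lower()):
        return "="
    return "Idk"


results = {
    ("PID", "ID"): suffix_matcher("PID", "ID"),
    ("telephone", "phone"): suffix_matcher("telephone", "phone"),
    ("sword", "word"): suffix_matcher("sword", "word"),
    ("phone", "telephone"): suffix_matcher("phone", "telephone"),
}
```

Note that the check is directional: swapping the arguments of a matching pair yields Idk.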
15- Semantic Matching
- Structure Level (Step 4)
16Step 4 compute relations between concepts at
nodes
- The idea
- Decompose the graph (tree) matching problem into
the set of node matching problems
- Translate each node matching problem, namely
pairs of nodes with possible relations between
them, into a propositional formula
- Check the propositional formula for validity
- Assumption
- Information we use, namely labels of nodes and
knowledge of WordNet, is globally consistent
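A single node matching problem can be sketched as a validity check of "axioms imply relation". The code below uses a brute-force truth table instead of a SAT solver, which is only practical for a handful of variables. The example nodes, the label "Pictures", and the order in which relations are tested are assumptions of this sketch; only the axiom Greece → Europe reflects the Europe ⊒ Greece example from the element level.

```python
from itertools import product


def implies(p, q):
    return (not p) or q


def is_valid(formula, variables):
    """True if `formula` holds under every truth assignment."""
    return all(
        formula(dict(zip(variables, values)))
        for values in product([False, True], repeat=len(variables))
    )


def strongest_relation(axioms, c1, c2, variables):
    """Return the first relation rel for which axioms -> rel(c1, c2) is valid."""
    candidates = [
        ("equivalent", lambda v: c1(v) == c2(v)),
        ("less general", lambda v: implies(c1(v), c2(v))),
        ("more general", lambda v: implies(c2(v), c1(v))),
        ("disjoint", lambda v: not (c1(v) and c2(v))),
    ]
    for name, rel in candidates:
        if is_valid(lambda v, rel=rel: implies(axioms(v), rel(v)), variables):
            return name
    return "Idk"


# Hypothetical task: node 1 is Europe/Pictures, node 2 is Greece/Pictures.
variables = ["Europe", "Greece", "Pictures"]
axioms = lambda v: implies(v["Greece"], v["Europe"])  # Europe ⊒ Greece
node1 = lambda v: v["Europe"] and v["Pictures"]
node2 = lambda v: v["Greece"] and v["Pictures"]

relation = strongest_relation(axioms, node1, node2, variables)
```

In practice the validity of a formula is checked by showing that its negation is unsatisfiable, which is where a SAT solver comes in.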
17Step 4 Example of a node matching task
18Step 4 Efficient semantic matching
- Conjunctive concepts at nodes
- Matching formula is Horn
- Satisfiability can be determined in linear time
- SAT solver requires quadratic time
- We developed an ad hoc linear time reasoning
procedure
- Avoid conversion to propositional formula
- Reason on the axioms matrix
- Disjunctive concepts at nodes
- Matching formula is not in CNF by construction
- Most SAT solvers require the input formula to be
in CNF
- Conversion to CNF may lead to exponential space
explosion
- Exploit structure preserving transformation
- Size of formula in CNF is linear with respect to
the original formula
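To illustrate why a structure preserving transformation keeps the CNF linear in size, here is a sketch of a Tseitin-style encoding. The slides do not name the specific variant used, so this choice is an assumption, as are the tuple-based formula representation and the `_t` prefix for fresh variables. Each binary connective adds one fresh variable and exactly three clauses, and the result is equisatisfiable with the input.

```python
from itertools import count


def negate(literal):
    """Negate a literal written as 'x' or '~x'."""
    return literal[1:] if literal.startswith("~") else "~" + literal


def tseitin(formula):
    """Return (root_literal, clauses) for a nested-tuple formula.

    Formulas are variable names (str) or tuples:
    ("not", f), ("and", f, g), ("or", f, g).
    """
    fresh = count(1)
    clauses = []

    def encode(f):
        if isinstance(f, str):
            return f
        if f[0] == "not":
            return negate(encode(f[1]))
        a, b = encode(f[1]), encode(f[2])
        x = f"_t{next(fresh)}"  # fresh variable naming this subformula
        if f[0] == "and":    # x <-> (a & b)
            clauses.extend([[negate(x), a], [negate(x), b],
                            [x, negate(a), negate(b)]])
        elif f[0] == "or":   # x <-> (a | b)
            clauses.extend([[x, negate(a)], [x, negate(b)],
                            [negate(x), a, b]])
        else:
            raise ValueError(f"unknown connective: {f[0]}")
        return x

    root = encode(formula)
    clauses.append([root])  # assert the whole formula
    return root, clauses


# (a & b) | (c & d): three connectives -> 3 * 3 clauses + 1 unit clause
root, cnf = tseitin(("or", ("and", "a", "b"), ("and", "c", "d")))
```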
19- Semantic Matching with Attributes
20Semantic matching with attributes
- Observations
- Semantic matching is based on the idea of
matching concepts, not their direct physical
implementations
- Determining mappings is a first step towards the
ultimate goal of, e.g., data translation, query
mediation
- Attributes are <attribute-name, type> pairs
- <PID, string>, <Brand, string>, <ID, int>
- Two alternatives
- Exploit datatypes
- Discard datatypes
- Appropriateness
- <Price, double> ? <Price, float>
- <Price, double> = <Price, float>
22Testing methodology
- Matching systems
- S-Match vs. Cupid, COMA and SF as implemented
in Rondo
- Measuring match quality and performance
- Expert mappings are inherently subjective
- Two degrees of freedom
- Directionality
- Use of Oracles
- Indicators
- Precision, [0,1]; Recall, [0,1]
- Overall, [-1,1]; F-measure, [0,1]
- Time, sec.
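The quality indicators can be computed from the set of mappings a system finds and the set of expert mappings, as sketched below. The slides do not spell out the formulas, so the standard definitions are assumed here, with Overall taken as Recall * (2 - 1/Precision), written in the equivalent form (TP - FP) / |expected|. The mapping elements in the example are hypothetical.

```python
def match_quality(found, expected):
    """Return precision, recall, overall and F-measure for two mapping sets."""
    found, expected = set(found), set(expected)
    true_pos = len(found & expected)
    false_pos = len(found - expected)

    precision = true_pos / len(found) if found else 0.0
    recall = true_pos / len(expected) if expected else 0.0
    # Equivalent to recall * (2 - 1 / precision) when precision > 0.
    overall = (true_pos - false_pos) / len(expected) if expected else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0

    return {"precision": precision, "recall": recall,
            "overall": overall, "f_measure": f_measure}


# Hypothetical mappings as (node in T1, node in T2) pairs.
found = {("a", 1), ("b", 2), ("c", 3), ("d", 4)}
expected = {("a", 1), ("b", 2), ("e", 5)}
quality = match_quality(found, expected)
```

With two correct and two wrong mappings out of three expected ones, Overall drops to zero: the effort of removing the false positives cancels the benefit of the correct matches.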
23Experimental results
PC: Pentium IV 1.7 GHz, 512 MB RAM, Windows XP
24- Conclusions and Future Work
25Conclusions
- Semantic schema matching approach
- builds on the advances of the previous
solutions at the element level by providing a
library of element level matchers
- guarantees correctness and completeness of its
results at the structure level by using
model-based techniques
- Quality and performance evaluation (S-Match)
- Automated reasoning techniques (e.g., SAT)
provide good performance for industrial-strength
matching tasks
- The challenge is not efficiency but rather
missing domain knowledge
26Future work
- Iterative semantic matching
- Recall increase via multiple strategies
- Interactive semantic matching
- GUI
- Customizing technology
- Extensive evaluation
- Testing methodology
- Industry-strength tasks