Semantic Schema Matching

1
Semantic Schema Matching
Pavel Shvaiko
joint work with Fausto Giunchiglia and Mikalai
Yatskevich
13th International Conference on
Cooperative Information Systems
(CoopIS), 3 November 2005, Agia Napa, Cyprus
2
Outline
  • Introduction
  • Semantic Matching
  • Element Level
  • Structure Level
  • Semantic Matching with Attributes
  • Comparative Evaluation
  • Conclusions and Future Work

3
Introduction
Information sources (e.g., XML schemas) can be
viewed as graph-like structures containing terms
and their inter-relationships.
Matching takes two graph-like structures and
produces a mapping between the nodes of the
graphs that correspond semantically to each other.
4
  • Semantic Matching

5
Semantic Matching
Semantic Matching: Given two graphs G1 and G2,
for any node n1i ∈ G1, find the strongest
semantic relation R holding with node n2j ∈ G2.
We compute semantic relations by analyzing the
meaning (concepts, not labels) which is codified
in the elements and the structures of schemas.
Technically, labels at nodes written in natural
language are translated into propositional
logical formulas which explicitly codify the
labels' intended meaning. This allows us to
codify the matching problem into a propositional
validity problem.
6
Concept of a label and concept at a node
Concept of a label is the propositional formula
which stands for the set of data instances that
one would classify under the label it encodes.
Concept at a node is the propositional
formula which represents the set of data
instances which one would classify under a node,
given that it has a certain label and that it is
in a certain position in a tree.
7
Four Macro Steps
Given two labeled trees T1 and T2, do:
  • Step 1: for all labels in T1 and T2, compute
    concepts at labels
  • Step 2: for all nodes in T1 and T2, compute
    concepts at nodes
  • Step 3: for all pairs of labels in T1 and T2,
    compute relations between concepts at labels
  • Step 4: for all pairs of nodes in T1 and T2,
    compute relations between concepts at nodes
  • Steps 1 and 2 constitute the preprocessing
    phase, and are executed once and then again each
    time a schema is changed (off-line part)
  • Steps 3 and 4 constitute the matching phase, and
    are executed every time the two schemas are to be
    matched (on-line part)

8
Step 1: compute concepts at labels
  • The idea
  • Translate labels at nodes written in natural
    language into propositional logical formulas
    which explicitly codify the labels' intended
    meaning
  • Preprocessing
  • Tokenization. Labels are parsed into tokens
    (according to punctuation, spaces, etc.). E.g.,
    Photo and Cameras → <Photo, and, Cameras>
  • Lemmatization. Tokens are morphologically
    analyzed in order to find all their possible
    basic forms. E.g., Cameras → Camera
  • Building atomic concepts. An oracle (WordNet) is
    used to extract senses of lemmas. E.g., Camera
    has 2 senses
  • Building complex concepts. Prepositions and
    conjunctions are translated into logical
    connectives and used to build complex
    concepts out of the atomic concepts
  • E.g., C_Cameras_and_Photo = <Cameras,
    U(WN_Camera)> ⊔ <Photo, U(WN_Photo)>,
    where U is a union of the senses that WordNet
    attaches to lemmas
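
The preprocessing pipeline above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not S-Match code: the table `TOY_SENSES` stands in for the WordNet oracle, the lemmatizer merely strips a plural "s", and the mapping of "and" to disjunction in `CONNECTIVES` is an assumption of this sketch. All names are hypothetical.

```python
import re

# Hypothetical stand-in for the WordNet oracle: lemma -> set of sense ids.
TOY_SENSES = {
    "camera": {"camera#1", "camera#2"},
    "photo": {"photo#1"},
}

# Assumed mapping from natural-language connectives to logical ones.
CONNECTIVES = {"and": "|", "or": "|"}


def tokenize(label):
    """Split a label into tokens on spaces and punctuation."""
    return [t for t in re.split(r"[\s_,.;:/()]+", label) if t]


def lemmatize(token):
    """Very naive lemmatizer: lower-case and strip a plural 's'."""
    word = token.lower()
    if word.endswith("s") and len(word) > 3:
        return word[:-1]
    return word


def concept_of_label(label):
    """Return the complex concept as a list of atomic concepts and connectives.

    An atomic concept is a pair (token, senses), where senses is the union
    of the senses the toy oracle attaches to the token's lemma.
    """
    formula = []
    for token in tokenize(label):
        lemma = lemmatize(token)
        if lemma in CONNECTIVES:
            formula.append(CONNECTIVES[lemma])
        else:
            formula.append((token, frozenset(TOY_SENSES.get(lemma, set()))))
    return formula
```

Calling `concept_of_label("Cameras and Photo")` yields the two atomic concepts joined by the assumed disjunction connective. A real implementation would replace the toy table with actual WordNet lookups and a proper morphological analyzer.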

9
Step 2: compute concepts at nodes
  • The idea
  • Extend concepts at labels by capturing the
    knowledge residing in the structure of a graph in
    order to define the context in which the given
    concept at a label occurs
  • Computation
  • Concept at a node for some node n is computed as
    a conjunction of concepts at labels located above
    the given node, including the node itself
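
As a rough sketch of this computation, the Python below walks from a node up to the root and conjoins the concepts at labels found on the way. The toy tree, the string representation of formulas, and the names `PARENT`, `LABEL_CONCEPT` and `concept_at_node` are hypothetical and chosen only for illustration.

```python
# Hypothetical toy tree: node -> parent (None marks the root).
PARENT = {
    "Electronics": None,
    "Cameras and Photo": "Electronics",
    "Digital Cameras": "Cameras and Photo",
}

# Concepts at labels, written here as plain formula strings.
LABEL_CONCEPT = {
    "Electronics": "Electronics",
    "Cameras and Photo": "Cameras | Photo",
    "Digital Cameras": "DigitalCameras",
}


def concept_at_node(parent, label_concept, node):
    """Conjoin the concepts at labels on the path from the root to `node`."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]  # move one level up
    # Reverse so that the root's concept comes first.
    return " & ".join(f"({label_concept[n]})" for n in reversed(path))
```

For the node "Digital Cameras" this gives the conjunction of the root concept, the concept of "Cameras and Photo", and the node's own concept at label.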

10
  • Semantic Matching
  • Element Level (Step 3)

11
Step 3: compute relations between (atomic)
concepts at labels
  • The idea
  • Exploit a priori knowledge, e.g., lexical, domain
    knowledge, with the help of element level
    semantic matchers

12
Step 3: Element level semantic matchers
  • Sense-based matchers take two WordNet senses as
    input and produce semantic relations exploiting
    (direct) lexical relations of WordNet
  • String-based matchers take two labels as input
    and produce semantic relations exploiting string
    comparison techniques

13
Step 3: Sense-based matchers. WordNet
  • The WordNet matcher computes relations between
    schema entities in terms of the lexical
    relationships of WordNet
  • A ⊑ B if A is a hyponym or meronym of B
  • Brand ⊑ Name
  • A ⊒ B if A is a hypernym or holonym of B
  • Europe ⊒ Greece
  • A = B if they are synonyms
  • Quantity = Amount
  • A ⊥ B if they are antonyms or siblings in the
    part-of hierarchy
  • Microprocessors ⊥ PC Board
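
A sense-based matcher of this kind can be sketched as a lookup over lexical facts that returns less general (⊑), more general (⊒), equivalence (=), disjointness (⊥), or Idk when nothing is known. The fact tables below are hypothetical stand-ins for WordNet, filled only with the examples from this slide, and `sense_relation` is an illustrative name rather than an actual S-Match function.

```python
# Hypothetical lexical facts standing in for WordNet.
# (general, specific) pairs: the first term is a hypernym or holonym of the second.
MORE_GENERAL = {("name", "brand"), ("europe", "greece")}
SYNONYMS = {frozenset({"quantity", "amount"})}
# Antonyms or siblings in the part-of hierarchy.
DISJOINT = {frozenset({"microprocessor", "pc board"})}


def sense_relation(a, b):
    """Return the semantic relation between two senses, or 'Idk'."""
    if a == b or frozenset({a, b}) in SYNONYMS:
        return "="
    if (b, a) in MORE_GENERAL:  # b is a hypernym/holonym of a
        return "⊑"
    if (a, b) in MORE_GENERAL:  # a is a hypernym/holonym of b
        return "⊒"
    if frozenset({a, b}) in DISJOINT:
        return "⊥"
    return "Idk"
```

Only direct facts are consulted here, which mirrors the "(direct) lexical relations" wording; following chains of hypernyms would be a separate design decision.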

14
Step 3: String-based matchers. Suffix
  • Suffix takes as input two strings and checks
    whether the first one ends with the second one.
    It returns the equivalence relation in this case,
    and Idk otherwise
  • Suffix is efficient in matching cognate words
    and similar acronyms, but often syntactic
    similarity does not imply semantic relatedness
  • PID = ID
  • telephone = phone
  • sword = word (a false match: the strings are
    similar, the meanings are not)
15
  • Semantic Matching
  • Structure Level (Step 4)

16
Step 4: compute relations between concepts at
nodes
  • The idea
  • Decompose the graph (tree) matching problem into
    the set of node matching problems
  • Translate each node matching problem, namely
    pairs of nodes with possible relations between
    them, into a propositional formula
  • Check the propositional formula for validity
  • Assumption
  • Information we use, namely labels of nodes and
    knowledge of WordNet, is globally consistent
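
To make the idea concrete, here is a small sketch of a node matching task checked by brute force over all truth assignments. It tests, for each candidate relation, whether "axioms imply relation" is valid. The encoding of formulas as Python functions, the toy axioms, and all names are assumptions of this example; a real system would hand the formula to a SAT solver instead of enumerating assignments.

```python
from itertools import product


def is_valid(formula, variables):
    """True if `formula` holds under every truth assignment to `variables`."""
    return all(
        formula(dict(zip(variables, values)))
        for values in product([False, True], repeat=len(variables))
    )


def node_relation(axioms, c1, c2, variables):
    """Return the first relation, tried from strongest to weakest, entailed by the axioms."""
    def implies(p, q):
        return (not p) or q

    candidates = [
        ("=", lambda v: c1(v) == c2(v)),
        ("⊑", lambda v: implies(c1(v), c2(v))),
        ("⊒", lambda v: implies(c2(v), c1(v))),
        ("⊥", lambda v: not (c1(v) and c2(v))),
    ]
    for symbol, rel in candidates:
        if is_valid(lambda v, rel=rel: implies(axioms(v), rel(v)), variables):
            return symbol
    return "Idk"


# Toy task: atomic concepts from two trees, suffixed 1 and 2.
VARIABLES = ["electronics1", "electronics2", "camera1", "camera2"]


def toy_axioms(v):
    # Assumed element-level results: electronics1 = electronics2, camera1 ⊑ camera2.
    return (v["electronics1"] == v["electronics2"]) and ((not v["camera1"]) or v["camera2"])


def toy_c1(v):
    return v["electronics1"] and v["camera1"]


def toy_c2(v):
    return v["electronics2"] and v["camera2"]
```

With these toy inputs, `node_relation(toy_axioms, toy_c1, toy_c2, VARIABLES)` reports that the first node is less general than the second. The enumeration is exponential in the number of variables, so it only serves to show what is being checked.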

17
Step 4: Example of a node matching task
18
Step 4: Efficient semantic matching
  • Conjunctive concepts at nodes
  • Matching formula is Horn
  • Satisfiability can be determined in linear time
  • SAT solver requires quadratic time
  • We developed ad hoc linear time reasoning
    procedure
  • Avoid conversion to propositional formula
  • Reason on the axioms matrix
  • Disjunctive concepts at nodes
  • Matching formula is not in CNF by construction
  • Most SAT solvers require the input formula to be
    in CNF
  • Conversion to CNF may lead to exponential space
    explosion
  • Exploit structure preserving transformation
  • Size of the formula in CNF is linear with respect
    to the size of the original formula
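
One well-known structure-preserving transformation is the Tseitin-style encoding, sketched below. The slides do not say which variant is used, so this is a generic illustration under that assumption; the tuple representation of formulas and the name `tseitin` are hypothetical. Each connective gets a fresh variable and at most three clauses, which is why the result stays linear in the size of the input.

```python
from itertools import count


def tseitin(formula):
    """Return (clauses, ids) for a CNF that is equisatisfiable with `formula`.

    A formula is a variable name (str) or a tuple:
    ("not", f), ("and", f, g) or ("or", f, g).
    Clauses are lists of non-zero ints; a negative int is a negated variable.
    """
    ids = {}          # variable name -> integer id
    fresh = count(1)  # generator of unused integer ids
    clauses = []

    def encode(f):
        if isinstance(f, str):
            if f not in ids:
                ids[f] = next(fresh)
            return ids[f]
        op = f[0]
        if op == "not":
            return -encode(f[1])
        a, b = encode(f[1]), encode(f[2])
        x = next(fresh)  # fresh variable naming this subformula
        if op == "and":    # x <-> (a & b)
            clauses.extend([[-x, a], [-x, b], [x, -a, -b]])
        elif op == "or":   # x <-> (a | b)
            clauses.extend([[x, -a], [x, -b], [-x, a, b]])
        else:
            raise ValueError(f"unknown connective: {op}")
        return x

    clauses.append([encode(formula)])  # assert the whole formula
    return clauses, ids
```

For example, `tseitin(("or", ("and", "a", "b"), ("and", "c", "d")))` introduces three fresh variables, one per connective, instead of multiplying out the disjunction.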

19
  • Semantic Matching
  • with Attributes

20
Semantic matching with attributes
  • Observations
  • Semantic matching is based on the idea of
    matching concepts, not their direct physical
    implementations
  • Determining mappings is a first step towards the
    ultimate goal of, e.g., data translation, query
    mediation, etc.
  • Attributes are <attribute-name, type> pairs
  • <PID, string>, <Brand, string>, <ID, int>
  • Two alternatives
  • Exploit datatypes
  • Discard datatypes
  • Appropriateness
  • <Price, double> ? <Price, float>
  • <Price, double> = <Price, float>

21
  • Comparative Evaluation

22
Testing methodology
  • Matching systems
  • S-Match vs. Cupid, COMA and SF as implemented
    in Rondo
  • Measuring match quality and performance
  • Expert mappings are inherently subjective
  • Two degrees of freedom
  • Directionality
  • Use of Oracles
  • Indicators
  • Precision, [0,1]; Recall, [0,1]
  • Overall, [-1,1]; F-measure, [0,1]
  • Time, sec.
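
As a sketch of how these indicators are typically computed from a set of found correspondences and a set of expert correspondences, consider the Python below. It assumes the usual definitions: F-measure as the harmonic mean of precision and recall, and Overall as 1 minus the number of false positives and missed matches divided by the number of expert matches. The slides do not spell out the formulas, so treat these as assumptions; the function name is hypothetical.

```python
def match_quality(found, expert):
    """Compute precision, recall, F-measure and Overall for two sets of mappings."""
    found, expert = set(found), set(expert)
    correct = len(found & expert)
    false_positives = len(found - expert)
    missed = len(expert - found)

    precision = correct / len(found) if found else 0.0
    recall = correct / len(expert) if expert else 0.0
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    # Assumed definition: post-match effort needed to fix the result.
    overall = 1 - (false_positives + missed) / len(expert) if expert else 0.0

    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "overall": overall,
    }
```

For instance, with four expert mappings of which three are found, plus one wrong extra mapping, precision and recall are both 0.75 and Overall is 0.5.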

23
Experimental results
PC: Pentium IV 1.7 GHz, 512 MB RAM, Windows XP
24
  • Conclusions and Future Work

25
Conclusions
  • Semantic schema matching approach
  • builds on the advances of the previous
    solutions at the element level by providing a
    library of element level matchers
  • guarantees correctness and completeness of its
    results at the structure level by using
    model-based techniques
  • Quality and performance evaluation (S-Match)
  • Automated reasoning techniques (e.g., SAT)
    provide good performance for industrial-strength
    matching tasks
  • The challenge is not efficiency but rather
    missing domain knowledge

26
Future work
  • Iterative semantic matching
  • Recall increase via multiple strategies
  • Interactive semantic matching
  • GUI
  • Customizing technology
  • Extensive evaluation
  • Testing methodology
  • Industry-strength tasks

27
References
  • Project website - ACCORD: http://www.dit.unitn.it/accord/
  • Ontology Matching: http://www.OntologyMatching.org
  • F. Giunchiglia, P. Shvaiko: Semantic matching.
    Knowledge Engineering Review Journal, 18(3),
    2003.
  • P. Bouquet, L. Serafini, S. Zanobini: Semantic
    coordination: a new approach and an application.
    In Proceedings of ISWC, 2003.
  • P. Avesani, F. Giunchiglia, M. Yatskevich: A
    large scale taxonomy mapping evaluation. In
    Proceedings of ISWC, 2005.
  • P. Shvaiko and J. Euzenat: A survey of
    schema-based matching approaches. Journal on Data
    Semantics, IV, 2005.
  • C. Ghidini, F. Giunchiglia: Local models
    semantics, or contextual reasoning = locality +
    compatibility. Artificial Intelligence Journal,
    127(3), 2001.

28
  • Thank you!