A survey of approaches to automatic schema matching



Slides: 36

Provided by: steliospa


1
A survey of approaches to automatic schema matching

Presenter: Pantouvakis Stelios

2
Introduction

Schema matching is typically done by hand in current implementations
Drawbacks: time- and effort-consuming, error-prone
Need for automatic schema matching in:
data (schema) integration
e-business
data warehousing
semantic query processing

3
The Match operator

Schema: a set of elements connected by some structure
Independent of representation (XML, ER model, OO model, directed graph, etc.)
Mapping: a set of mapping elements (certain elements of S1 mapped to certain elements of S2), plus a mapping expression for each mapping element that specifies the relation

4
The Match operator

Mapping expressions may be:
scalar comparisons (=, <)
functions (addition, concatenation)
ER-style relationships (is-a, part-of)
set-oriented relationships (overlaps, contains)
The symbol ≅ denotes a mapping element without specifying its mapping expression

5
The Match operator

The Match operation is a function that takes two schemas S1 and S2 as input and returns a mapping between them (the match result)
Its implementation is similar to Join: for each element of S1, check whether each element of S2 matches, and produce an output. But there are differences:
it operates on metadata (schema elements)
each element of S1 may match multiple elements of S2
many comparison expressions may be used
mappings may have multiple mapping expressions
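
The join-like evaluation described on this slide can be sketched as a nested loop over schema elements. This is a minimal illustration, not the presenter's implementation: the names `MappingElement`, `match`, `same_name`, the threshold of 0.5, and the attribute `Customer.firstname` are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class MappingElement:
    """One candidate correspondence between an S1 and an S2 element."""
    s1_element: str
    s2_element: str
    similarity: float


def match(s1: List[str], s2: List[str],
          similarity: Callable[[str, str], float],
          threshold: float = 0.5) -> List[MappingElement]:
    """Compare every element of s1 with every element of s2, like a join."""
    mapping = []
    for e1 in s1:
        for e2 in s2:
            score = similarity(e1, e2)
            if score >= threshold:  # an S1 element may keep several S2 partners
                mapping.append(MappingElement(e1, e2, score))
    return mapping


def same_name(a: str, b: str) -> float:
    """Toy criterion (assumption): equal last path component, ignoring case."""
    return 1.0 if a.split(".")[-1].lower() == b.split(".")[-1].lower() else 0.0


result = match(["Cust.FirstName", "Cust.LastName"],
               ["Customer.firstname", "Customer.Contact"],
               same_name)
```

Unlike a relational join, the loop compares metadata (element names) rather than rows, and any similarity function can be plugged in.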

6
Example: Mappings

Mappings may be:
Cust.C = Customer.CustID
Concatenate(Cust.FirstName, Cust.LastName) = Customer.Contact
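
To make the role of a mapping expression concrete, the two mappings on this slide can be written as small functions applied to one source record. This is only a sketch: the tuple layout, the helper `apply_mappings`, the space used as separator in `concatenate`, and the sample values are assumptions.

```python
def concatenate(*parts: str) -> str:
    """Mapping expression: join the source values (separator is an assumption)."""
    return " ".join(parts)


# Each mapping element: (source attributes, mapping expression, target attribute).
mappings = [
    (("C",), lambda c: c, "CustID"),                     # plain equality
    (("FirstName", "LastName"), concatenate, "Contact"),  # function expression
]


def apply_mappings(row: dict) -> dict:
    """Build a Customer record from a Cust record using the mapping expressions."""
    return {target: expr(*(row[attr] for attr in sources))
            for sources, expr, target in mappings}


cust_row = {"C": 17, "FirstName": "Ada", "LastName": "Lovelace"}
customer_row = apply_mappings(cust_row)
```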

7
Architecture of generic match
8
Architecture of generic match

In general, it is not possible to determine fully
automatically all matches between two schemas
The implementation of Match should therefore only
determine match candidates
The user has to accept, reject or change them
The user should be able to specify matches for
elements for which the system was unable to find
satisfactory match candidates

9
Classification of schema matching approaches

One Match operator may use multiple matching algorithms (matchers)
Different matchers work better in different application domains
The categorization of individual matchers is examined first

10
Classification of schema matching approaches

Instance vs. schema: matchers can consider instance data or only schema-level information
Element vs. structure matching: match individual schema elements or combinations of elements
Language vs. constraint: a matcher can use a linguistic approach or a constraint-based approach

11
Classification of schema matching approaches

Matching cardinality: the match result may relate multiple elements of the two schemas
Auxiliary information: matchers may also use dictionaries, global schemas, previous matching decisions and user input

13
Schema-level matchers

In general:
They consider schema information such as name, description, data type, relationship types (part-of, is-a, etc.), constraints and schema structure
Matchers may find multiple match candidates, attaching to each a degree of similarity in the range 0-1 in order to identify the best candidates
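
Selecting the best candidates from similarity degrees in the range 0-1 might look like the following sketch. The function name `best_candidates`, the threshold, the `top_k` cut-off and the sample scores are assumptions chosen for illustration.

```python
def best_candidates(scores, threshold=0.5, top_k=2):
    """Keep candidates at or above the threshold, best first, at most top_k."""
    kept = [(elem, s) for elem, s in scores.items() if s >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]


# Hypothetical similarity degrees of one S1 element against four S2 elements.
scores = {"CustID": 0.9, "Company": 0.2, "Contact": 0.6, "Phone": 0.55}
top = best_candidates(scores)
```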

14
Schema-level matchers: 1. Granularity of match (element-level vs. structure-level)

Element-level matching:
for each element of S1, determine the matching elements in S2
may operate at the atomic level (attributes) or at a higher level (entities, classes, relational tables), but considers each element in isolation, ignoring its substructure and components

15
Schema-level matchers: 1. Granularity of match (element-level vs. structure-level)

Structure-level matching:
matches combinations of elements that appear together in a structure in S1 with combinations of elements in S2
full match: complete structures match
partial match: some components of each structure match
may use equivalence patterns (from a library) (e.g. is-a hierarchy ↔ single structure with Boolean attribute)
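
One way to tell a full from a partial structural match is to check how many components on each side find a partner on the other side. The sketch below makes several assumptions: the function names, the tiny synonym table, and every component name other than ZIP and PostalCode are hypothetical.

```python
SYNONYMS = {frozenset({"zip", "postalcode"})}  # assumed auxiliary dictionary


def element_match(a: str, b: str) -> bool:
    """Element-level test: same name ignoring case, or a known synonym pair."""
    a, b = a.lower(), b.lower()
    return a == b or frozenset({a, b}) in SYNONYMS


def structure_match(comps1, comps2):
    """Classify two structures as 'full', 'partial' or 'none' match."""
    hit1 = [a for a in comps1 if any(element_match(a, b) for b in comps2)]
    hit2 = [b for b in comps2 if any(element_match(a, b) for a in comps1)]
    total = len(comps1) + len(comps2)
    ratio = (len(hit1) + len(hit2)) / total if total else 0.0
    if total and ratio == 1.0:
        return "full", ratio
    return ("partial" if ratio > 0 else "none"), ratio


full = structure_match(["Street", "City", "ZIP"],
                       ["Street", "City", "PostalCode"])
partial = structure_match(["Street", "City", "ZIP"],
                          ["Street", "Country"])
```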

16
Example: Full / partial structural match
Atomic-level match:
Address.ZIP ≅ CustomerAddress.PostalCode
17
Example: Equivalence pattern
18
Schema-level matchers: 2. Match cardinality

Each element of S1 (or S2) may participate in 0, 1 or many mapping elements
Within an individual mapping element, one or more S1 elements can match one or more S2 elements
Cases are:
1:1, 1:n, n:1 (local cardinality)
n:m (global cardinality; requires structural match)
Most existing approaches handle 1:1 and 1:n
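
The cardinality of a single mapping element follows directly from how many elements it has on each side. A minimal sketch, in which the function name `cardinality` and all attribute names are hypothetical:

```python
def cardinality(s1_elems, s2_elems):
    """Return '1:1', '1:n', 'n:1' or 'n:m' for one mapping element."""
    left = "1" if len(s1_elems) == 1 else "n"
    if len(s2_elems) == 1:
        right = "1"
    else:
        right = "n" if left == "1" else "m"
    return f"{left}:{right}"


examples = [
    cardinality(["Price"], ["Amount"]),                          # 1:1
    cardinality(["Name"], ["FirstName", "LastName"]),            # 1:n
    cardinality(["Price", "Tax"], ["Cost"]),                     # n:1
    cardinality(["Street", "City"], ["AddrLine1", "AddrLine2"]), # n:m
]
```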

19
Example: Match cardinalities
20
Schema-level matchers: 3. Linguistic approaches

Matchers use names and text to find semantically similar schema elements
They need dictionaries (general-purpose, domain- or enterprise-specific, even multilingual)
Such specific dictionaries require much effort to build
Homonyms may mislead the matcher

21
Schema-level matchers: 3. Linguistic approaches

Name matching:
equality of names
equality of canonical names (Cust ≅ CustNo)
equality of synonyms (make ≅ brand)
equality of hypernyms (book is-a publication and article is-a publication imply book ≅ article)
similarity based on pronunciation or soundex
user-provided name matches (reportsTo ≅ manager)
May be used for element- or structure-based matchers, or even to match different levels (author.name ≅ AuthorName)
Not limited to 1:1 matches (phone ≅ homePhone, officePhone)
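
Several of the name-matching criteria listed on this slide can be chained in one small function. The sketch below is an illustration under assumptions: the score values 1.0 / 0.9 / 0.6, the helper names, and the way names are canonicalized (lower-casing and dropping non-letters) are choices made here, and the dictionaries contain only the pairs mentioned on the slide. The Soundex routine follows the common American Soundex rules.

```python
import re

SYNONYMS = {frozenset({"make", "brand"})}
USER_MATCHES = {frozenset({"reportsto", "manager"})}

# Soundex digit for each consonant; vowels, h, w and y have no code.
CODES = {c: d
         for d, letters in enumerate(
             ["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1)
         for c in letters}


def canonical(name: str) -> str:
    """Lower-case the name and drop everything that is not a letter."""
    return re.sub(r"[^a-z]", "", name.lower())


def soundex(name: str) -> str:
    """Four-character Soundex code, e.g. 'Robert' and 'Rupert' share one code."""
    letters = canonical(name)
    if not letters:
        return ""
    digits = []
    prev = CODES.get(letters[0])
    for c in letters[1:]:
        if c in "hw":          # h and w do not separate equal codes
            continue
        code = CODES.get(c)
        if code is not None and code != prev:
            digits.append(str(code))
        prev = code
    return (letters[0].upper() + "".join(digits) + "000")[:4]


def name_similarity(a: str, b: str) -> float:
    """Try the criteria from strongest to weakest evidence."""
    ca, cb = canonical(a), canonical(b)
    if ca == cb:
        return 1.0
    pair = frozenset({ca, cb})
    if pair in USER_MATCHES or pair in SYNONYMS:
        return 0.9
    if soundex(ca) == soundex(cb):
        return 0.6
    return 0.0
```

For instance, `name_similarity("author.name", "AuthorName")` falls into the canonical-name case, while `name_similarity("make", "brand")` is decided by the synonym table.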

22
Schema-level matchers: 3. Linguistic approaches

Description matching:
Uses natural-language comments on schema elements to match them
This can be as simple as extracting words for synonym comparison, or as sophisticated as using natural language understanding technology to find semantically equivalent expressions
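
The simple end of description matching, extracting words from comments and comparing them, can be sketched with a word-overlap (Jaccard) score. The stop-word list, the function names and the two sample comments are assumptions; a real matcher would also consult a synonym dictionary.

```python
import re

STOPWORDS = {"the", "of", "a", "an", "for", "to", "in", "and"}


def words(text: str) -> set:
    """Extract lower-cased words from a comment, dropping stop words."""
    return {w for w in re.findall(r"[a-z]+", text.lower())
            if w not in STOPWORDS}


def description_similarity(d1: str, d2: str) -> float:
    """Jaccard overlap of the word sets of two element descriptions."""
    w1, w2 = words(d1), words(d2)
    union = w1 | w2
    return len(w1 & w2) / len(union) if union else 0.0


score = description_similarity("Postal code of the customer",
                               "Customer postal code for deliveries")
```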

Example
23
Schema-level matchers: 4. Constraint-based approaches

Schemas often contain constraints that define data types and value ranges, uniqueness, optionality, relationship types and cardinalities
If both schemas contain such information, the matcher can use it to match elements
Used alone, this criterion will produce many matching errors
Still, this approach can be combined with other matchers to limit match candidates
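
A constraint-based comparison can be sketched as the fraction of constraints on which two elements agree. The `Constraints` record, its three fields and the sample attributes are assumptions for illustration; the example also shows why this criterion alone is ambiguous.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Constraints:
    """A few constraints a schema may declare for an attribute (assumed set)."""
    data_type: str
    unique: bool
    optional: bool


def constraint_similarity(a: Constraints, b: Constraints) -> float:
    """Fraction of the compared constraints on which both elements agree."""
    checks = [a.data_type == b.data_type,
              a.unique == b.unique,
              a.optional == b.optional]
    return sum(checks) / len(checks)


pno = Constraints("int", unique=True, optional=False)
emp_no = Constraints("int", unique=True, optional=False)
dept_no = Constraints("int", unique=True, optional=False)
name = Constraints("string", unique=False, optional=False)

# Both EmpNo and DeptNo look perfect for Pno: constraints only narrow the field.
sim_emp = constraint_similarity(pno, emp_no)
sim_dept = constraint_similarity(pno, dept_no)
sim_name = constraint_similarity(pno, name)
```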

24
Schema-level matchers: 5. Reusing schema and mapping information

Improve the effectiveness of Match by supporting:
the reuse of common schema components (schemas from the same domain are often very similar); reusable components range from atomic-level components to entire schema fragments
the reuse of previously determined mappings: if the matching S↔S2 has already been done and the matching S1↔S2 is needed, S1↔S could optionally be found instead (if that is easier)
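
Reusing a previously determined mapping amounts to composing two mappings through the intermediate schema S. A minimal sketch for 1:1 mappings stored as dictionaries; the function name `compose` and all element names are hypothetical.

```python
def compose(s1_to_s: dict, s_to_s2: dict) -> dict:
    """Derive S1 -> S2 candidates by chaining S1 -> S with the stored S -> S2."""
    return {e1: s_to_s2[e] for e1, e in s1_to_s.items() if e in s_to_s2}


# Stored result of an earlier match task (S -> S2).
s_to_s2 = {"CustomerNo": "CustID", "CustomerName": "Company"}
# New, presumably easier, match task (S1 -> S).
s1_to_s = {"CNo": "CustomerNo", "CName": "CustomerName", "Fax": "FaxNumber"}

s1_to_s2 = compose(s1_to_s, s_to_s2)
```

Elements such as `Fax`, whose partner in S has no stored counterpart in S2, drop out and still need a direct match.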

25
Example: Reuse of previously determined mappings
26
Instance-level matchers

Instance-level data can give insight into the
contents and meaning of schema elements, using
frequencies of words, combination of words, range
of values etc.
Useful when schema information is limited and
when semi-structured data is used
Even when schema information is available, this approach can help decide between equally plausible matches

27
Instance-level matchers

Applicable to most of the above approaches, but especially to:
linguistic approaches
constraint-based approaches
e.g. a constraint-based matcher may use an instance-level check to choose Pno ≅ EmpNo rather than Pno ≅ DeptNo, based on the value ranges of the three attributes
Main drawback: the possibly large number of schema elements for which instances must be evaluated
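
The Pno / EmpNo / DeptNo example can be sketched as a comparison of observed value ranges. The overlap measure used here (shared interval divided by combined interval) and the sample values are assumptions; the slide only says that the ranges of values decide.

```python
def range_overlap(values_a, values_b) -> float:
    """Length of the shared value interval relative to the combined interval."""
    lo = max(min(values_a), min(values_b))
    hi = min(max(values_a), max(values_b))
    if hi < lo:
        return 0.0  # the two ranges are disjoint
    span = max(max(values_a), max(values_b)) - min(min(values_a), min(values_b))
    return (hi - lo) / span if span else 1.0


# Hypothetical instance data for the three attributes.
pno = [1001, 1500, 1999]
emp_no = [1000, 1200, 2000]
dept_no = [10, 20, 30]

overlap_emp = range_overlap(pno, emp_no)
overlap_dept = range_overlap(pno, dept_no)
best = "EmpNo" if overlap_emp > overlap_dept else "DeptNo"
```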

28
Combining different matchers

There are several types of matchers. They can be combined into a single Match operator in two ways:
Hybrid matcher: integrates multiple matching criteria
Composite matcher: combines the results of independently executed matchers (including hybrid matchers)
Approaches must consider whether the criteria are applied simultaneously or in a specific order

29
Combining different matchers: Hybrid matcher

Typically uses a hard-wired combination of particular matching techniques that are executed simultaneously or in a fixed order
Better match candidates and better performance than a composite matcher:
poor match candidates can be filtered out early
reduced number of passes
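
A hybrid matcher in the sense of this slide might hard-wire a cheap constraint check in front of a more expensive name comparison, so that poor candidates are dropped early within a single pass. In this sketch the data-type check, the use of `difflib.SequenceMatcher` from the Python standard library as the name criterion, the 0.5 threshold and the sample schemas are all assumptions.

```python
from difflib import SequenceMatcher


def name_score(a: str, b: str) -> float:
    """String similarity of two names, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def hybrid_match(s1: dict, s2: dict, threshold: float = 0.5):
    """s1 and s2 map element names to data types; one pass, fixed order."""
    candidates = []
    for n1, t1 in s1.items():
        for n2, t2 in s2.items():
            if t1 != t2:        # step 1: type filter removes poor candidates early
                continue
            score = name_score(n1, n2)   # step 2: only for surviving pairs
            if score >= threshold:
                candidates.append((n1, n2, score))
    return candidates


s1 = {"CustNo": "int", "Name": "string"}
s2 = {"CustomerNo": "int", "CustName": "string", "Discount": "int"}
pairs = {(a, b) for a, b, _ in hybrid_match(s1, s2)}
```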

30
Combining different matchers: Composite matcher

Allows a selection among several matchers
The user can choose the matchers to be executed, either simultaneously or in a specific order, and the way to combine their results, so that it best suits the particular domain
The composite matcher may also find a selection and order automatically
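
In contrast to the hybrid case, a composite matcher runs its matchers independently and combines their results afterwards, with both the matcher list and the combination rule left to the user. A minimal sketch; the two toy matchers, the function names and the combination rules shown are assumptions.

```python
def exact_name(a: str, b: str) -> float:
    """Matcher 1: names are identical, ignoring case."""
    return 1.0 if a.lower() == b.lower() else 0.0


def shared_prefix(a: str, b: str) -> float:
    """Matcher 2: length of the common prefix relative to the longer name."""
    a, b = a.lower(), b.lower()
    common = 0
    for ca, cb in zip(a, b):
        if ca != cb:
            break
        common += 1
    longest = max(len(a), len(b))
    return common / longest if longest else 0.0


def average(scores):
    return sum(scores) / len(scores)


def composite_match(a, b, matchers, combine=max):
    """Run the selected matchers independently, then combine their scores."""
    return combine([m(a, b) for m in matchers])


chosen = [exact_name, shared_prefix]          # user-selected matchers
optimistic = composite_match("CustNo", "CustomerNo", chosen, combine=max)
cautious = composite_match("CustNo", "CustomerNo", chosen, combine=average)
```

Swapping `combine` changes how strongly a single enthusiastic matcher influences the result, which is the kind of domain-specific tuning the slide refers to.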

34
Conclusion

User interaction is necessary in any case because
the implementation of Match can only determine
match candidates which a user can accept, reject
or change
The more configurable the matcher is, the better the results that can be obtained
Current implementations have yet to explore a more general view of the problem (independence of schema representation, more criteria for the user to choose among, applicability in various domains)

35
Comments

If the user must check all matches and has to intervene in most of the matcher's steps, where do we save time and effort by doing the work automatically?
Time and space complexity of the (multiple) algorithms?
