An Overview of Schema Matching Approaches

1
An Overview of Schema Matching Approaches
  • Alsayed Algergawy

2
Outline
  • What is schema matching?
  • Where is schema matching used?
  • Schema matching challenges
  • Generic schema matching system architecture
  • Schema matching approaches
  • Existing prototypes and comparison
  • Conclusions

3
Why a survey?
  • A survey is useful when:
  • comparing different schema matching approaches
  • developing a new match algorithm
  • implementing a schema matching component

4
Schema Matching
  • Schema matching is defined as the task of
    finding the semantic correspondences between
    elements of two schemas.

[Figure: the Match operation takes schemas S1 and S2, plus auxiliary information, as input and produces the match result.]
5
  • Input information
  • Schema information: element names, data types,
    constraints, ...
  • Instance data: used to characterize the content
    and semantics of schema elements
  • Auxiliary information: dictionaries, user input,
    previous match results
  • Output information
  • Mapping (match result): a set of mapping
    elements, each of which specifies that certain
    elements of S1 are mapped to certain elements of
    S2, written <ID, Si1, Sj2, R>
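
A mapping element can be sketched as a small record type. This is only an illustration: the field names are hypothetical, and the slide does not say what R holds, so the code assumes it is a similarity score between 0 and 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MappingElement:
    """One mapping element <ID, Si1, Sj2, R>."""
    id: int           # identifier of the mapping element
    s1_element: str   # element of schema S1
    s2_element: str   # element of schema S2
    r: float          # ASSUMPTION: similarity score in [0, 1]


def make_mapping(triples):
    """Build a mapping (a list of mapping elements) from (s1, s2, r) triples."""
    return [MappingElement(i, s1, s2, r)
            for i, (s1, s2, r) in enumerate(triples, start=1)]


# Hypothetical element names, for illustration only.
mapping = make_mapping([
    ("Customer.name", "Client.fullName", 0.8),
    ("Customer.zip", "Client.postalCode", 0.6),
])
```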
6
Where is schema matching used?
  • To motivate the importance of schema
    matching, we summarize its use in several
    application domains:
  • Database application domains: data integration,
    data warehouse, e-commerce, query processing,
    peer data management, model management
  • Semantic Web: semantic web services, XML/HTML
    to an ontology

7
Data Integration
  • Problem: Construct a global view from a set of
    independently constructed schemas that differ
    in structure and terminology.
  • In the AI setting, this is the problem of
    integrating independently developed ontologies
    into a single ontology.
  • Solution: Schema matching is performed to find
    relationships between concepts in each schema.
    The matching elements can then be unified.

8
Data Warehouse
  • Problem: Integrating data sources into a data
    warehouse when the source and warehouse formats
    differ.
  • Solution: Use matching to find the elements of
    the source that are also present in the
    warehouse. The details of their semantics can
    then be examined to integrate the two.

9
E-commerce
  • Problem: Message translation. Each trading
    partner uses its own message format.
  • Solution: A match operation reduces the
    amount of manual work needed to specify how the
    formats are related.

10
Query Processing
  • Problem: The terms used in the user's query may
    differ from those in the database.
  • Solution: Matching is used to map the
    user-specified concepts in the query to schema
    elements.
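
As a rough illustration of mapping query terms to schema elements, the sketch below uses plain string similarity from Python's standard library. The column names, the query terms, and the 0.5 cutoff are assumptions made for the example; a real system would likely also use the semantic techniques covered later.

```python
from difflib import get_close_matches


def map_query_terms(terms, schema_elements, cutoff=0.5):
    """Map each user term to its closest schema element name, or None."""
    mapping = {}
    for term in terms:
        hits = get_close_matches(term.lower(), schema_elements,
                                 n=1, cutoff=cutoff)
        mapping[term] = hits[0] if hits else None
    return mapping


# Hypothetical schema and query terms.
columns = ["emp_salary", "emp_name", "dept_id"]
result = map_query_terms(["salary", "name", "weather"], columns)
```

A term with no sufficiently similar element, such as "weather" here, is mapped to None instead of being forced onto a poor candidate.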

11
Challenges of Schema Matching
  • Despite its pervasiveness and importance,
    schema matching remains an extremely difficult
    problem
  • The semantics of the involved elements can be
    inferred from only a few information sources,
    typically the creators of data, documentation,
    and associated schema and data.
  • Schema elements are matched based on clues in the
    schema and data.
  • Schema and data clues are often incomplete.
  • To decide that element s of schema S matches
    element t of schema T, one must examine all other
    elements of T to make sure that there is no other
    element that matches s better than t.
  • To make matters worse, matching is often
    subjective, depending on the application.
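
The requirement to examine all other elements of T before accepting a match can be sketched as a search for the best-scoring candidate. The similarity function here (difflib's ratio on lower-cased names) is only a stand-in chosen for the example, not a method named in the slides.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Stand-in similarity measure: string similarity of the two names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_match(s: str, t_elements: list[str]) -> tuple[str, float]:
    """Compare s with every element of T and return the best-scoring one."""
    scored = [(t, similarity(s, t)) for t in t_elements]
    return max(scored, key=lambda pair: pair[1])


# Hypothetical element names.
element, score = best_match("zipcode", ["name", "zip_code", "city"])
```

Matching every element of S this way costs one comparison per pair of elements, which is part of why matching large schemas is expensive.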

12
Challenges (cont.)
  • Currently, schema matching is largely performed
    manually, which makes it a tedious,
    time-consuming, error-prone, and expensive
    process.
  • To reduce the manual effort as much as
    possible, semi-automatic matching approaches are
    required.
  • It is not possible to determine all
    correspondences between two schemas fully
    automatically, because of their semantic
    heterogeneity.

13
Generic Schema Matching System Architecture
[Figure: generic architecture. Labels shown: Application/Tool 1 (semantic web), Application/Tool 2 (e-commerce), Application/Tool 3 (warehousing), Application/Tool 4 (schema integration); schema import/export; generic match implementation; general libraries; internal schema representation.]
14
General Schema Matching Procedure
  • The schema matching process requires the
    following main steps
  • Importing external schemas
  • Identifying the elements to be matched
  • Applying the match algorithm(s)
  • Exporting the match results
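
The four steps can be sketched end to end as follows. Everything concrete in this sketch is an assumption made for illustration: schemas are given as JSON objects mapping table names to column lists, the match algorithm is plain string similarity on column names, and the 0.5 threshold is arbitrary.

```python
import json
from difflib import SequenceMatcher


def import_schema(text: str) -> dict:
    """Step 1: import an external schema (assumed JSON: table -> columns)."""
    return json.loads(text)


def elements(schema: dict) -> list[str]:
    """Step 2: identify the elements to be matched (qualified column names)."""
    return [f"{table}.{col}" for table, cols in schema.items() for col in cols]


def match(src: list[str], tgt: list[str], threshold: float = 0.5):
    """Step 3: apply a match algorithm (here: similarity of column names)."""
    result = []
    for s in src:
        for t in tgt:
            a, b = s.split(".")[-1].lower(), t.split(".")[-1].lower()
            score = SequenceMatcher(None, a, b).ratio()
            if score >= threshold:
                result.append((s, t, round(score, 2)))
    return result


def export_result(result) -> str:
    """Step 4: export the match result (here: as a JSON string)."""
    return json.dumps([{"source": s, "target": t, "score": r}
                       for s, t, r in result])


s1 = import_schema('{"Customer": ["name", "zip"]}')
s2 = import_schema('{"Client": ["name", "zipcode", "phone"]}')
matches = match(elements(s1), elements(s2))
exported = export_result(matches)
```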

15
Schema Matching Approaches
  • Schema matching approaches differ from
    each other according to:
  • the input information
  • the way the input information is processed, and
  • the characteristics of the output information
  • A matcher can be individual or combining
  • Individual matchers are classified by: schema
    vs. instance, element vs. structure, language
    vs. constraint, matching cardinality, and
    auxiliary information
  • Combining matchers are classified as hybrid
    vs. composite

16
Classification of Schema Matching Approaches
[Figure: classification tree of schema matching approaches]
  • Individual matchers
  • Schema-based, element level: linguistic,
    constraint-based
  • Schema-based, structure level: constraint-based
  • Instance-based, element level: linguistic,
    constraint-based
  • Combining matchers: hybrid matchers, composite
    matchers
  • Further criteria: match cardinality, auxiliary
    information used
  • Sample approaches: word frequency, name
    similarity, description similarity, global
    namespaces, group matching, type similarity, key
    properties, value pattern and ranges
Source: Rahm, E., P.A. Bernstein: A Survey of Approaches to
Automatic Schema Matching
17
Schema-based Approaches
  • Consider only schema information
  • This includes the properties of schema elements
    and the relationships between them
  • Applied at either the element level or the
    structure level
  • The two main approaches are linguistic-based and
    constraint-based

18
Linguistic-based Approaches
  • Syntactic (comparing name strings): exact string
    matching (equality of names) or approximate
    string matching (n-gram, Soundex, ...)
  • Semantic (comparing name meanings): makes use of
    auxiliary sources for terminological
    relationships (synonyms, ...)
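
A minimal sketch of the syntactic side: exact matching next to an n-gram similarity. The choice of trigrams and of the Dice coefficient as the score is an assumption for this example; the slide only names n-grams as one approximate technique.

```python
def exact_match(a: str, b: str) -> bool:
    """Exact string matching: the two names are equal."""
    return a == b


def ngrams(name: str, n: int = 3) -> set[str]:
    """Set of character n-grams of the lower-cased name."""
    s = name.lower()
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def ngram_similarity(a: str, b: str, n: int = 3) -> float:
    """Dice coefficient over the two n-gram sets, between 0.0 and 1.0."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    if not ga or not gb:
        # Names shorter than n have no n-grams; fall back to equality.
        return 1.0 if a.lower() == b.lower() else 0.0
    return 2 * len(ga & gb) / (len(ga) + len(gb))


# "phone" shares all 3 of its trigrams with "telephone" (7 trigrams).
score = ngram_similarity("phone", "telephone")
```

Exact matching rejects this pair outright, while the n-gram score still reflects the shared substring.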
19
The issues affecting name matching
  • Multi-word names
  • Cryptic names
  • Homonyms

Preprocessing (tokenization, removal of stop
words, ...) and abbreviation dictionaries are
required
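
A small sketch of such preprocessing for element names. The stop-word list and the abbreviation dictionary below are made-up examples; in practice both would come from auxiliary sources.

```python
import re

# ASSUMPTION: tiny illustrative word lists, not taken from the slides.
STOP_WORDS = {"of", "the", "and", "a", "an", "for"}
ABBREVIATIONS = {"cust": "customer", "no": "number",
                 "addr": "address", "qty": "quantity"}


def tokenize(name: str) -> list[str]:
    """Split a name on camelCase boundaries and non-alphanumeric characters."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", name)
    return [t.lower() for t in re.split(r"[^A-Za-z0-9]+", spaced) if t]


def normalize(name: str) -> list[str]:
    """Tokenize, drop stop words, and expand known abbreviations."""
    tokens = [t for t in tokenize(name) if t not in STOP_WORDS]
    return [ABBREVIATIONS.get(t, t) for t in tokens]


tokens_a = normalize("CustNo")
tokens_b = normalize("date_of_birth")
```

After normalization, a cryptic name such as "CustNo" can be compared token by token with "customer_number". Homonyms are not addressed by this step.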
20
Instance-based Approaches
  • Why?
  • When useful schema information is limited
  • To complement schema-based approaches
  • To match instance-level data
  • Limitations:
  • The availability of representative data
  • The quality of the provided instance data
  • The execution time

21
Instance-based approaches
  • Levels: element-level, structure-level
  • Techniques: heuristic comparison, AI techniques
  • Text-based data: IR techniques (keywords),
    applying name matching
  • Numerical/string data: constraint-based
    characteristics (data length, data type, value
    range, value distribution)
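
The constraint-based characteristics can be sketched as a simple profile computed from a column's instance values. This sketch assumes values arrive as non-empty lists of strings, covers data type, data length, and value range only (value distribution is left out), and uses a deliberately crude compatibility check.

```python
def is_number(value: str) -> bool:
    """True if the string parses as a number."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def profile(values: list[str]) -> dict:
    """Summarize a column by data type, length range, and value range."""
    numeric = all(is_number(v) for v in values)
    lengths = [len(v) for v in values]
    result = {
        "type": "numeric" if numeric else "string",
        "min_len": min(lengths),
        "max_len": max(lengths),
    }
    if numeric:
        nums = [float(v) for v in values]
        result["min_val"], result["max_val"] = min(nums), max(nums)
    return result


def compatible(p: dict, q: dict) -> bool:
    """Crude check: same data type and overlapping length ranges."""
    same_type = p["type"] == q["type"]
    overlap = p["min_len"] <= q["max_len"] and q["min_len"] <= p["max_len"]
    return same_type and overlap


# Hypothetical instance data for two postal-code columns and a city column.
zip_a = profile(["10115", "80331"])
zip_b = profile(["04109"])
city = profile(["Berlin", "Leipzig"])
```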
22
Combination Approaches
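
One way to picture a composite matcher, assuming the usual reading that it combines the results of independently executed matchers, is a weighted average of their scores. The two matcher functions, the element dictionaries, and the weights below are illustrative assumptions.

```python
from difflib import SequenceMatcher


def name_matcher(a: dict, b: dict) -> float:
    """Linguistic matcher: string similarity of the element names."""
    return SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()


def type_matcher(a: dict, b: dict) -> float:
    """Constraint-based matcher: 1.0 if the data types agree, else 0.0."""
    return 1.0 if a["type"] == b["type"] else 0.0


def composite(a: dict, b: dict, matchers, weights) -> float:
    """Weighted average of independently computed matcher scores."""
    total = sum(weights)
    return sum(w * m(a, b) for m, w in zip(matchers, weights)) / total


# Hypothetical elements: same name, different data types.
e1 = {"name": "zip", "type": "string"}
e2 = {"name": "zip", "type": "int"}
score = composite(e1, e2, [name_matcher, type_matcher], [3, 1])
```

A hybrid matcher would instead apply several criteria inside a single matcher rather than merging separate results afterwards.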
23
Existing Prototypes
Schema-based Prototypes
  • DELTA 95
  • DIKE 98
  • MOMIS 99
  • CUPID 2001
  • SF 2002
  • COMA 2001
  • COMA 2005
Instance-based Prototypes
  • SEMINT 94, 2000
  • AutoPlex 2001
  • CLIO 2001
  • LSD 2001, 2002
  • Corpus-based matching 2003
  • IMAP 2004

24
Prototype Comparison
  • Architecture
  • Schema representation
  • Result (mapping) representation
  • Input information
  • Matching algorithms
  • Match execution

25
(No Transcript)
26
(No Transcript)
27
(No Transcript)
28
Conclusions
  • Schema matching is a pervasive, important,
    and extremely difficult problem
  • Schema matching is primarily studied as a piece
    of other applications
  • It should be studied independently
  • Schema matching is still largely performed
    manually
  • The existing approaches are incomplete solutions,
    exploiting at most a few approaches

29
References
  • Aumüller, D., H.H. Do, S. Massmann, E. Rahm:
    Schema and Ontology Matching with COMA++
    (Software Demonstration). Proc. 24th ACM SIGMOD
    Intl. Conf. Management of Data, 2005
  • Berlin, J., A. Motro: Database Schema Matching
    Using Machine Learning with Feature Selection.
    Proc. 14th Intl. Conf. Advanced Information
    Systems Engineering (CAiSE), 2002
  • Clifton, C., E. Housman, A. Rosenthal: Experience
    with a Combined Approach to Attribute-Matching
    Across Heterogeneous Databases. Proc. IFIP 2.6
    Working Conf. Database Semantics, 1996
  • Do, H.H., E. Rahm: COMA - A System for Flexible
    Combination of Schema Matching Approaches. Proc.
    Intl. Conf. Very Large Databases (VLDB), 2002
  • Doan, A.H., A. Halevy: Semantic Integration
    Research in the Database Community: A Brief
    Survey. AI Magazine, Special Issue on Semantic
    Integration, 2005
  • Rahm, E., P.A. Bernstein: A Survey of Approaches
    to Automatic Schema Matching. VLDB Journal,
    10(4), 2001

30
Thank you