Title: An Overview of Schema Matching Approaches
1. An Overview of Schema Matching Approaches
2. Outline
- What is schema matching?
- Where is schema matching used?
- Schema matching challenges
- Generic schema matching system architecture
- Schema matching approaches
- Existing prototypes and comparison
- Conclusions
3. Why a survey?
- A survey is useful when
  - Comparing different approaches to schema matching
  - Developing a new match algorithm
  - Implementing a schema matching component
4. Schema Matching
- Schema matching is defined as the task of finding the semantic correspondences between elements of two schemas.
(Figure: S1, S2, and auxiliary information are the inputs to Match, which produces the Match Result.)
5.
- Input information
  - Schema information: element names, data types, constraints, ...
  - Instance data: used to characterize the content and semantics of schema elements
  - Auxiliary information: dictionaries, user input, previous match results
- Output information
  - Mapping (match result): a mapping is defined as a set of mapping elements, each of which specifies that certain elements of S1 are mapped to certain elements of S2: <ID, S1_i, S2_j, R>
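The mapping-element tuple above can be pictured as a small record type. The following is a minimal sketch, not a format taken from any particular prototype. The class name, field names, and example schema elements are hypothetical, and since the slide does not define R, treating it as a label for the kind of correspondence is an assumption.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class MappingElement:
    """One mapping element <ID, S1 elements, S2 elements, R>."""
    id: str
    s1_elements: Tuple[str, ...]  # elements of schema S1
    s2_elements: Tuple[str, ...]  # elements of schema S2
    relation: str                 # R: assumed here to label the kind of correspondence


# A mapping (match result) is a collection of mapping elements.
# All element names below are made up for illustration.
mapping = [
    MappingElement("m1", ("Customer.name",), ("Client.fullName",), "equivalent"),
    MappingElement("m2", ("Customer.first", "Customer.last"), ("Client.fullName",), "concatenation"),
]


def targets_of(mapping, s1_element):
    """Return the sorted S2 elements linked to the given S1 element."""
    return sorted({t for m in mapping if s1_element in m.s1_elements for t in m.s2_elements})
```

Storing tuples of elements on both sides lets one record express correspondences that involve more than one element per schema, which is what "certain elements of S1 are mapped to certain elements of S2" allows.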
6. Where is schema matching used?
- To motivate the importance of schema matching, we summarize its use in several application domains
- Database application domains
- Data integration
- Data warehouse
- E-commerce
- Query processing
- Peer data management
- Model management
- Semantic Web
- Semantic web services
- XML/HTML to an ontology
7. Data Integration
- Problem: Construct a global view from a set of independently constructed schemas.
  - Different structures and terminologies
- In the AI setting, this is the problem of integrating independently developed ontologies into a single ontology.
- Solution: Schema matching is performed to find relationships between concepts in each schema. The matching elements can then be unified.
8. Data Warehouse
- Problem: Integrating data sources into a data warehouse.
  - Different formats between the source and the warehouse.
- Solution: Use matching to find the elements of the source that are also present in the warehouse. The details of the semantics can then be examined to integrate the two.
9. E-commerce
- Problem: Message translation.
  - Each trading partner uses its own message format.
- Solution: A match operation would reduce the amount of manual work needed to specify how the formats are related.
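To illustrate how a match result could be put to work for message translation, here is a minimal sketch. It assumes the match result has already been reduced to a one-to-one dictionary of field names; the function name, field names, and sample order are hypothetical.

```python
def translate_message(message, field_mapping):
    """Rename the fields of `message` according to `field_mapping`.

    Fields without a correspondence are returned separately so that
    they can be handled manually.
    """
    translated, unmatched = {}, {}
    for field, value in message.items():
        if field in field_mapping:
            translated[field_mapping[field]] = value
        else:
            unmatched[field] = value
    return translated, unmatched


# Hypothetical match result between two partners' purchase-order formats.
po_mapping = {"OrderNo": "order_id", "CustName": "customer", "Qty": "quantity"}
order = {"OrderNo": "A-17", "CustName": "ACME", "Qty": 3, "Remark": "rush"}
translated, unmatched = translate_message(order, po_mapping)
```

Only the `Remark` field is left for manual attention here, which is the sense in which matching reduces, rather than eliminates, manual work.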
10. Query Processing
- Problem: The terms used in the user's query may be different from those in the database.
- Solution: Matching is used to map the user-specified concepts in the query to schema elements.
11. Challenges of Schema Matching
- Despite its pervasiveness and importance, schema matching remains an extremely difficult problem
- The semantics of the involved elements can be inferred from only a few information sources, typically the creators of the data, documentation, and the associated schema and data.
- Schema elements are matched based on clues in the schema and data.
- Schema and data clues are often incomplete.
- To decide that element s of schema S matches element t of schema T, one must examine all other elements of T to make sure that there is no other element that matches s better than t.
- To make matters worse, matching is often subjective, depending on the application.
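The requirement to examine every element of T before accepting a match for s can be sketched as a selection over a similarity matrix. This is only an illustration: how the scores are computed is left open, and the threshold value, function name, and element names are assumptions.

```python
def best_matches(sim, threshold=0.5):
    """Pick, for each element s of S, the best-scoring element t of T.

    `sim[s][t]` is an assumed similarity score in [0, 1]. Every candidate t
    is examined, and a match is kept only if its score reaches `threshold`.
    """
    result = {}
    for s, row in sim.items():
        if not row:
            continue  # no candidates in T for this element
        t, score = max(row.items(), key=lambda kv: kv[1])
        if score >= threshold:
            result[s] = t
    return result


# Hypothetical scores between elements of S (outer keys) and T (inner keys).
scores = {
    "zip": {"postal_code": 0.8, "city": 0.3},
    "tel": {"postal_code": 0.1, "city": 0.2},
}
selected = best_matches(scores)
```

The cost grows with the product of the two schema sizes, and the threshold is exactly the kind of application-dependent choice that makes matching subjective.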
12. Challenges (cont.)
- Currently, schema matching is largely performed manually, which makes it a tedious, time-consuming, error-prone, and expensive process.
- To reduce the amount of manual effort as much as possible, semi-automatic approaches are required.
- It is not possible to determine all correspondences between two schemas fully automatically, because of their semantic heterogeneity.
13. Generic Schema Matching System Architecture
(Figure components)
Application/Tool 1 (Semantic Web)
Application/Tool 2 (E-commerce)
Application/Tool 3 (warehousing)
Application/Tool 4 (schema integration)
Schema import/export
Generic match implementation
General libraries
Internal schema representation
14. General Schema Matching Procedure
- The schema matching process requires the following main steps:
- Importing external schemas
- Identifying the elements to be matched
- Applying the match algorithm(s)
- Exporting the match results
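The four steps above can be sketched as a small pipeline. Everything specific here is an assumption made for illustration: the JSON schema format, the function names, and the placeholder matcher, which only checks case-insensitive name equality.

```python
import json


def import_schema(json_text):
    # Step 1: import an external schema into an internal representation.
    # Assumed external format: {"name": ..., "elements": {element: type}}.
    raw = json.loads(json_text)
    return {"name": raw["name"], "elements": dict(raw["elements"])}


def identify_elements(schema):
    # Step 2: choose the elements to be matched (here: all of them, sorted).
    return sorted(schema["elements"])


def name_equality_matcher(a, b):
    # Step 3 placeholder algorithm: case-insensitive name equality.
    return 1.0 if a.lower() == b.lower() else 0.0


def run_match(s1_text, s2_text, matcher=name_equality_matcher):
    s1, s2 = import_schema(s1_text), import_schema(s2_text)
    pairs = [
        {"s1": a, "s2": b, "score": matcher(a, b)}
        for a in identify_elements(s1)
        for b in identify_elements(s2)
    ]
    # Step 4: export the match result (pairs with a positive score) as JSON.
    return json.dumps([p for p in pairs if p["score"] > 0])


s1_text = '{"name": "S1", "elements": {"Name": "string", "Zip": "string"}}'
s2_text = '{"name": "S2", "elements": {"name": "varchar", "city": "varchar"}}'
exported = run_match(s1_text, s2_text)
```

Passing the matcher as a parameter keeps the import and export steps independent of the match algorithm, in the spirit of a generic match implementation.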
15. Schema Matching Approaches
- Schema matching approaches differ from each other according to
  - The input information
  - The way the input information is processed, and
  - The characteristics of the output information
- An approach can be
  - Individual / combining
- For individual matchers
  - Schema vs. instance
  - Element vs. structure
  - Language vs. constraint
  - Matching cardinality
  - Auxiliary information
- For the combining matchers
  - Hybrid vs. composite
16. Classification of Schema Matching Approaches
(Figure: classification tree)
Schema Matching Approaches
- Individual Matchers
  - Schema-based
    - Element Level
      - Linguistic (sample approaches: name similarity, description similarity, global namespaces)
      - Constraint-based (sample approaches: type similarity, key properties)
    - Structure Level
      - Constraint-based
  - Instance-based
    - Element Level
      - Linguistic
      - Constraint-based
- Combining Matchers
  - Hybrid Matchers
  - Composite Matchers
Further criteria:
- Match cardinality
- Auxiliary information used
Source: Rahm E, Bernstein P. A survey of approaches to automatic schema matching
17. Schema-based Approaches
- Consider only schema information
- Include different properties of schema elements and the relationships between them
- Apply at either the element level or the structure level
- The two main approaches are linguistic-based and constraint-based
18. Linguistic-based Approaches
Syntactically: comparing name strings
- Exact string matching (equality of names)
- Approximate string matching (n-gram, Soundex, ...)
Semantically: comparing name meaning
- Make use of auxiliary sources and terminological relationships (synonyms, ...)
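A minimal sketch of the syntactic and semantic comparisons listed above follows. It combines exact matching, a synonym lookup, and a character n-gram similarity based on the Dice coefficient, which is one common choice among several. The synonym table is hypothetical, and Soundex is not covered.

```python
def ngrams(name, n=3):
    """Return the set of character n-grams of a lower-cased name."""
    s = name.lower()
    if len(s) < n:
        return {s}
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def ngram_similarity(a, b, n=3):
    """Dice coefficient over the two sets of character n-grams."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    return 2 * len(ga & gb) / (len(ga) + len(gb))


# Hypothetical auxiliary source: pairs of names treated as synonyms.
SYNONYMS = {frozenset({"car", "automobile"}), frozenset({"client", "customer"})}


def name_similarity(a, b):
    a_l, b_l = a.lower(), b.lower()
    if a_l == b_l:
        return 1.0                      # exact string match
    if frozenset({a_l, b_l}) in SYNONYMS:
        return 1.0                      # semantic match via the synonym table
    return ngram_similarity(a, b)       # approximate string match
```

For example, "address" and "addr" share two of their seven trigrams in total, giving a score of 4/7, whereas "price" and "zip" share none.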
19. Issues affecting name matching
- Multi-word names
- Cryptic names
- Homonyms
Preprocessing (tokenization, removal of stop words, ...)
Abbreviation dictionaries are required
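The preprocessing steps named above (tokenization, stop-word removal, abbreviation expansion) might look like the following sketch. The stop-word list and abbreviation dictionary are small hypothetical examples; a real system would need much larger, domain-specific ones, and homonyms are not addressed here.

```python
import re

STOP_WORDS = {"of", "the", "and", "to"}  # illustrative only
ABBREVIATIONS = {                        # hypothetical abbreviation dictionary
    "cust": "customer",
    "no": "number",
    "addr": "address",
    "qty": "quantity",
}


def normalize_name(name):
    """Turn an element name into a list of normalized tokens."""
    # Insert a space at lower-to-upper camelCase boundaries, e.g. custNo -> cust No.
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", name)
    # Tokenize on any run of non-alphanumeric characters.
    tokens = [t.lower() for t in re.split(r"[^A-Za-z0-9]+", spaced) if t]
    # Expand abbreviations, then drop stop words.
    tokens = [ABBREVIATIONS.get(t, t) for t in tokens]
    return [t for t in tokens if t not in STOP_WORDS]
```

After normalization, names such as `custNo` and `Customer_Number` yield the same token list, so a simple name matcher can compare them directly.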
20. Instance-based Approaches
- Why?
- When useful schema information is limited
- To complement schema-based approaches
- To match instance-level data
- Limitation
- The availability of representative data
- The quality of the provided instance data
- The execution time
21. Instance-based Approaches
Element-level
Structure-level
Heuristic comparison
AI techniques
Text-based
Numerical/string
Using IR techniques (keywords), applying name matching
Constraint-based characteristics: data length, data type, value range, value distribution
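As an illustration of the constraint-based characteristics above, the sketch below profiles a column of instance values and compares two profiles. The chosen features, the use of a distinct-value ratio as a crude stand-in for the value distribution, and the equal-weight scoring are all assumptions rather than a method taken from any of the prototypes.

```python
def profile(values):
    """Summarize a non-empty column of instance values."""
    if not values:
        raise ValueError("profile() needs at least one value")
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    lengths = [len(str(v)) for v in values]
    return {
        "type": "numeric" if numeric else "string",          # data type
        "avg_length": sum(lengths) / len(lengths),           # data length
        "min": min(values) if numeric else None,             # value range
        "max": max(values) if numeric else None,
        "distinct_ratio": len(set(values)) / len(values),    # crude distribution proxy
    }


def profile_similarity(p, q):
    """Score two profiles in [0, 1]; different data types never match."""
    if p["type"] != q["type"]:
        return 0.0
    longest = max(p["avg_length"], q["avg_length"])
    length_sim = 1.0 if longest == 0 else min(p["avg_length"], q["avg_length"]) / longest
    distinct_sim = 1.0 - abs(p["distinct_ratio"] - q["distinct_ratio"])
    return (length_sim + distinct_sim) / 2


zip_a = profile(["04109", "04103", "10115"])
zip_b = profile(["90210", "10001", "60601"])
ages = profile([34, 51, 34])
```

Two columns of five-character codes score 1.0 against each other and 0.0 against the numeric ages column. The value range is recorded in the profile but not used in this simple score.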
22. Combination Approaches
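In the classification used by the Rahm and Bernstein survey, a composite matcher combines the results of independently executed matchers, whereas a hybrid matcher integrates several criteria inside one algorithm. The sketch below shows one possible composite-style combination using a weighted average. The two matchers, the weights, and the element representation are hypothetical.

```python
def same_name(a, b):
    # Linguistic criterion: case-insensitive name equality.
    return 1.0 if a["name"].lower() == b["name"].lower() else 0.0


def same_type(a, b):
    # Constraint-based criterion: identical declared data type.
    return 1.0 if a["type"] == b["type"] else 0.0


def combine(matchers, weights, a, b):
    """Run each matcher independently and aggregate by weighted average."""
    total = sum(weights)
    return sum(w * m(a, b) for m, w in zip(matchers, weights)) / total


elem_s1 = {"name": "Zip", "type": "string"}
elem_s2 = {"name": "zip", "type": "int"}
combined = combine([same_name, same_type], [3, 1], elem_s1, elem_s2)
```

Here the names agree and the types differ, so with weights 3 and 1 the combined score is 0.75. Other aggregation choices, such as taking the maximum or minimum score, fit the same interface.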
23. Existing Prototypes
Schema-based Prototypes
- DELTA 1995
- DIKE 1998
- MOMIS 1999
- CUPID 2001
- SF 2002
- COMA 2002
- COMA++ 2005
Instance-based Prototypes
- SEMINT 1994, 2000
- AutoPlex 2001
- CLIO 2001
- LSD 2001, 2002
- Corpus-based matching 2003
- IMAP 2004
24. Prototype Comparison
- Architecture
- Schema representation
- Result (mapping) representation
- Input information
- Matching algorithms
- Match execution
28. Conclusions
- Schema matching is a pervasive, important, and extremely difficult problem
- Schema matching is primarily studied as a piece of other applications
- It should be studied independently
- Schema matching is still largely performed manually
- The existing approaches are incomplete solutions, each exploiting at most a few matching approaches
29. References
- Aumüller, D., H.H. Do, S. Massmann, E. Rahm: Schema and Ontology Matching with COMA++ (Software Demonstration). Proc. 24th ACM SIGMOD Intl. Conf. Management of Data, 2005
- Berlin, J., A. Motro: Database Schema Matching Using Machine Learning with Feature Selection. Proc. 14th Intl. Conf. Advanced Information Systems Engineering (CAiSE), 2002
- Clifton, C., E. Housman, A. Rosenthal: Experience with a Combined Approach to Attribute-Matching Across Heterogeneous Databases. Proc. IFIP 2.6 Working Conf. Database Semantics, 1996
- Do, H.H., E. Rahm: COMA - A System for Flexible Combination of Schema Matching Approaches. Proc. Intl. Conf. Very Large Databases (VLDB), 2002
- Doan, A.H., A. Halevy: Semantic Integration Research in the Database Community: A Brief Survey. AI Magazine, Special Issue on Semantic Integration, 2005
- Rahm, E., P.A. Bernstein: A Survey of Approaches to Automatic Schema Matching. VLDB Journal, 10(4), 2001
30. Thank you