iMAP: Discovering Complex Semantic Matches between Database Schemas

1
iMAP: Discovering Complex Semantic Matches between Database Schemas
  • Robin Dhamankar, Yoonkyong Lee, AnHai Doan, Alon
    Halevy, Pedro Domingos
  • Presented by Trung Nguyen

2
Introduction
  • Semantic mapping is crucial in combining
    disparate data sources
  • Current work focuses on 1-1 mappings
  • This paper explores complex mappings
  • Accomplished by
      • Using searchers
      • Exploiting domain knowledge

3
Motivating Example
4
Transferring Data from S to T
  • Two steps
      • Schema matching (1-1 mappings and complex mappings)
      • Create query expressions
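
The two steps can be illustrated with a minimal sketch: once matches are known, they are turned into a query expression over the source. The table, column names, and match expressions below are hypothetical and do not come from the slides; `build_query` is an illustrative helper, not part of iMAP.

```python
import sqlite3

# Hypothetical matches from target attributes to source expressions.
matches = {
    "area": "sqft",                          # 1-1 match
    "list_price": "price * (1 + fee_rate)",  # complex match (arithmetic)
    "address": "city || ', ' || state",      # complex match (concatenation)
}

def build_query(matches, source_table):
    """Turn target-attribute -> source-expression matches into one SELECT."""
    columns = ", ".join(f"{expr} AS {target}" for target, expr in matches.items())
    return f"SELECT {columns} FROM {source_table}"

# Hypothetical source table S with one row of sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE houses (sqft INTEGER, price REAL, fee_rate REAL, city TEXT, state TEXT)"
)
conn.execute("INSERT INTO houses VALUES (1500, 200000.0, 0.25, 'Seattle', 'WA')")

query = build_query(matches, "houses")
rows = conn.execute(query).fetchall()
```

Here `rows` holds the source data reshaped into the target schema; the 1-1 match copies a column, while the complex matches combine several columns.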

5
iMAP's Architecture
6
Match Generator
  • Discover a small set of candidate matches quickly
  • Key idea: use search
  • Problem: the search space is infinite
  • Solution
      • Use different kinds of searchers in parallel
      • Special-purpose searchers can be added
      • Use other techniques to prune out candidates

7
The Internals of a Searcher
  • Search strategy
      • How to search efficiently?
      • → Use beam search
  • Match evaluation
      • How to evaluate a match candidate?
      • → Apply different techniques: machine learning,
        statistics, heuristics
  • Termination condition
      • When to terminate?
      • → When the difference between Qk and Qk+1 is less
        than some threshold
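
The three parts of a searcher can be sketched together as follows. This is a minimal illustration, not iMAP's implementation: it assumes Qk is the best candidate score at iteration k, uses string similarity from `difflib` as a stand-in evaluator, and runs on hypothetical overlap data. All names (`SOURCE`, `TARGET`, `score`, `expand`, `beam_search`) are made up for the example.

```python
from difflib import SequenceMatcher

# Hypothetical overlap data: source columns and the target column to explain.
SOURCE = {
    "first": ["Ann", "Bob"],
    "last": ["Lee", "Kim"],
    "city": ["Rome", "Oslo"],
}
TARGET = ["Ann Lee", "Bob Kim"]

def score(candidate):
    """Match evaluation: average similarity of concatenated columns to the target."""
    total = 0.0
    for row, expected in enumerate(TARGET):
        produced = " ".join(SOURCE[col][row] for col in candidate)
        total += SequenceMatcher(None, produced, expected).ratio()
    return total / len(TARGET)

def expand(candidate):
    """Grow a candidate by appending one column it does not use yet."""
    return [candidate + (col,) for col in SOURCE if col not in candidate]

def beam_search(initial, beam_width=3, threshold=0.01, max_iters=10):
    """Search strategy: keep only the beam_width best candidates per iteration."""
    def rank(cands):
        # Sort by descending score; break ties by name for determinism.
        return sorted(cands, key=lambda c: (-score(c), c))[:beam_width]

    beam = rank(initial)
    best_prev = score(beam[0])
    for _ in range(max_iters):
        candidates = set(beam)
        for cand in beam:
            candidates.update(expand(cand))
        beam = rank(candidates)
        best_now = score(beam[0])
        # Termination condition: stop once the best score barely improves.
        if best_now - best_prev < threshold:
            break
        best_prev = best_now
    return beam

best = beam_search([(col,) for col in SOURCE])[0]
```

On this toy data the search settles on the concatenation of `first` and `last`, since no longer expression improves the best score.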

8
Types of Searchers (1)
  • Text
  • Numeric
  • Category
  • Schema-mismatch
  • Unit conversion
  • Date

9
Types of Searchers (2)
10
Similarity Estimator
  • Further evaluates the candidates produced by the
    Match Generator
  • Problem: scores reported by searchers may not
    be accurate
  • Solution: employ multiple evaluator modules
      • Name-based evaluator
      • Naïve Bayes evaluator
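
A rough sketch of how evaluator scores might be combined is shown below. The slides do not say how iMAP combines them, so plain averaging is an assumption, and `name_score` and `combined_score` are hypothetical helpers. The Naïve Bayes evaluator is omitted for brevity.

```python
from difflib import SequenceMatcher

def name_score(target_name, expression_columns):
    """Name-based evaluator: best string similarity between the target
    attribute name and any source column used by the candidate."""
    return max(
        SequenceMatcher(None, target_name.lower(), col.lower()).ratio()
        for col in expression_columns
    )

def combined_score(searcher_score, evaluator_scores):
    """Average the searcher's own score with each evaluator's score
    (averaging is an assumption made for this sketch)."""
    scores = [searcher_score, *evaluator_scores]
    return sum(scores) / len(scores)

# Hypothetical candidate for target attribute "price".
candidate_columns = ["list_price", "fee_rate"]
final = combined_score(0.7, [name_score("price", candidate_columns)])
```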

11
Match Selector
  • Search for the best global match assignment
  • Problem: the highest-scoring candidate may not be
    acceptable
  • Solution: integrate domain constraints
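
The idea of picking a global assignment under constraints can be sketched as below. Exhaustive enumeration is used here purely for simplicity and is not claimed to be iMAP's search method; the attribute names, scores, and the `distinct_sources` constraint are hypothetical.

```python
from itertools import product

def select_matches(candidates, constraints):
    """Pick one candidate per target attribute, maximizing total score
    among assignments that satisfy every constraint.

    candidates: {target: [(expression, score), ...]}
    constraints: functions taking {target: expression} and returning bool.
    """
    targets = list(candidates)
    best, best_total = None, float("-inf")
    for combo in product(*(candidates[t] for t in targets)):
        assignment = {t: expr for t, (expr, _) in zip(targets, combo)}
        if not all(ok(assignment) for ok in constraints):
            continue
        total = sum(s for _, s in combo)
        if total > best_total:
            best, best_total = assignment, total
    return best

def distinct_sources(assignment):
    """Hypothetical constraint: no source expression serves two targets."""
    exprs = list(assignment.values())
    return len(exprs) == len(set(exprs))

candidates = {
    "num_baths": [("beds", 0.9), ("full_baths + half_baths", 0.8)],
    "num_beds": [("beds", 0.85)],
}
chosen = select_matches(candidates, [distinct_sources])
```

The top-scoring candidate for `num_baths` is rejected because it would reuse `beds`, so the lower-scoring but acceptable candidate is selected.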

12
Exploiting Domain Knowledge
  • To direct search process
  • To prune out unqualified candidates

13
Types of Domain Knowledge
  • Domain constraints → a constraint may involve
      • A single attribute in S
      • Two attributes in S
      • Multiple attributes in T
  • Past Complex Matches
  • Overlap Data
  • External Data

14
Generating Explanations
  • Explain the matches being made
  • Help users gain insight into the matching process
  • Users can provide feedback to converge on the correct
    matches quickly

15
Types of User Questions
  • Explain an existing match
      • Why is match X present in the output?
  • Explain an absent match
      • Why is match Y NOT present in the output?
  • Explain match ranking
      • Why is match X ranked higher than match Y in the
        output?

16
The Explanation Module
  • Uses a dependency graph
  • The dependency graph records the flow of matches,
    data, and assumptions into and out of system
    components
  • Nodes are
      • Schema attributes
      • Assumptions
      • Candidate matches
      • Pieces of domain knowledge (e.g., constraints)
  • The dependency graph in iMAP is small
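
A dependency graph of this kind can be sketched as a simple adjacency structure. The class, its method names, and the node labels are hypothetical; the slides do not describe iMAP's actual data structure.

```python
class DependencyGraph:
    """Records which nodes (matches, assumptions, constraints) a node depends on."""

    def __init__(self):
        self.parents = {}  # node -> list of nodes it depends on

    def add_dependency(self, node, depends_on):
        self.parents.setdefault(node, []).append(depends_on)

    def explain(self, node):
        """Return every node that the given node depends on, depth-first."""
        seen, order = set(), []
        stack = list(reversed(self.parents.get(node, [])))
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            order.append(current)
            stack.extend(reversed(self.parents.get(current, [])))
        return order

# Hypothetical trace for one match in the output.
graph = DependencyGraph()
match = "match: month_posted = month(date_posted)"
graph.add_dependency(match, "candidate from date searcher")
graph.add_dependency(match, "constraint: month_posted is unrelated to price")
graph.add_dependency("candidate from date searcher", "assumption: date_posted is a date")
trace = graph.explain(match)
```

Walking the graph backwards from a match yields the candidates, assumptions, and constraints that led to it, which is the raw material for answering "why is match X present?".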

17
A Sample Dependency Graph
18
Sample Explanations to User Questions
19
Empirical Evaluation (1)
  • Domains and data sources
  • Real Estate
  • Inventory
  • Cricket
  • Financial Wizard
  • Data processing
  • Experiments: overlapping and disjoint data

20
Empirical Evaluation (2)
21
Performance Result (1)
  • Overall and 1-1 matching accuracy

22
Performance Result (2)
  • Complex matching accuracy: Top-1

23
Performance Result (3)
  • Complex matching accuracy: Top-3
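
Top-1 and top-3 accuracy can be computed as in the sketch below. It assumes top-k accuracy means the fraction of target attributes whose correct match appears among the k highest-ranked candidates; the slides do not spell out the definition, and the data is invented.

```python
def top_k_accuracy(ranked, gold, k):
    """Fraction of target attributes whose correct match is in the top k.

    ranked: {target: [candidate expressions, best first]}
    gold:   {target: correct expression}
    """
    hits = sum(
        1 for target, correct in gold.items()
        if correct in ranked.get(target, [])[:k]
    )
    return hits / len(gold)

# Hypothetical ranked output and gold-standard matches.
ranked = {
    "address": ["concat(city, state)", "city", "state"],
    "list_price": ["price", "price + fee_rate", "price * (1 + fee_rate)"],
}
gold = {
    "address": "concat(city, state)",
    "list_price": "price * (1 + fee_rate)",
}
top1 = top_k_accuracy(ranked, gold, 1)
top3 = top_k_accuracy(ranked, gold, 3)
```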

24
Performance Discussion
  • Sometimes iMAP adds noise to components → reduced
    accuracy
  • For disjoint data, discovering meaningful numeric
    relationships is very difficult
  • Correct complex matches may not be in the top 1 but
    are in the top 3

25
Conclusion
  • First system that explores complex mappings
  • Employs searchers to help find matches
  • Utilizes domain knowledge to guide the search and
    the evaluation process
  • Offers a novel explanation facility

26
Thanks for your attention! Questions?