Schema Mapping: Experiences and Lessons Learned - PowerPoint PPT Presentation

Slides: 14
Provided by: degByuEdu
Learn more at: https://www.deg.byu.edu
Transcript and Presenter's Notes

1
Schema Mapping: Experiences and Lessons Learned
  • Yihong Ding
  • Data Extraction Group
  • Brigham Young University
  • Sponsored by NSF

2
Schema Mapping
  • Semantic correspondence between two schemas
  • Significance
    • data integration
    • data warehouses
    • ontology merging
    • message translation in e-commerce
    • semantic query processing
    • etc.

3
Schema Representation
[Figure: two example schemas drawn as graphs. Node labels: House, MLS (twice), Bedrooms, beds, SQFT, Basic_features, Golf course, Water front, location, location_description, Location, Address, Street, City, State, Agent, agent, Name, name, Phone_day, Phone_evening, cell phone, home phone, office phone.]
4
1:1 Mapping Cardinality
[Figure: the two-schema diagram from slide 3, repeated to illustrate 1:1 mappings.]
5
n:1 Mapping Cardinality
[Figure: the two-schema diagram from slide 3, repeated to illustrate n:1 mappings.]
6
n:m Mapping Cardinality
[Figure: the two-schema diagram from slide 3, repeated to illustrate n:m mappings.]
7
Object-Set Matcher (schema-level)
  • Name-based matcher
    • string and substring comparison
    • linguistic methods: stemming, stop words,
      removing ignorable characters, etc.
    • thesaurus: WordNet, etc.
  • 1:1 mapping cardinality

Example matches: Agent ↔ agent, Name ↔ name
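
The name-based steps listed on this slide can be sketched roughly as follows. This is a minimal illustration, not the presenter's implementation: the stop-word list, the crude plural stripping (standing in for a real stemmer), and the tiny `SYNONYMS` table (standing in for a thesaurus such as WordNet) are all hypothetical.

```python
import re

# Hypothetical stop words and synonym groups; a real matcher might consult WordNet.
STOP_WORDS = {"of", "the", "a", "an"}
SYNONYMS = [{"phone", "telephone"}, {"agent", "realtor"}]


def tokens(name):
    """Lowercase, split on ignorable characters, drop stop words, crudely stem."""
    out = set()
    for word in re.split(r"[^a-z0-9]+", name.lower()):
        if not word or word in STOP_WORDS:
            continue
        # Naive plural stripping stands in for a real stemmer.
        if len(word) > 3 and word.endswith("s"):
            word = word[:-1]
        out.add(word)
    return out


def same_concept(a, b):
    """Two tokens agree if equal, one contains the other, or they are listed synonyms."""
    if a == b:
        return True
    if len(a) >= 3 and len(b) >= 3 and (a in b or b in a):
        return True
    return any(a in group and b in group for group in SYNONYMS)


def name_match(name1, name2):
    """Every token of the shorter name needs a counterpart in the longer one."""
    t1, t2 = tokens(name1), tokens(name2)
    if not t1 or not t2:
        return False
    small, large = (t1, t2) if len(t1) <= len(t2) else (t2, t1)
    return all(any(same_concept(a, b) for b in large) for a in small)
```

With these assumptions, `name_match("Agent", "agent")` and `name_match("beds", "Bedrooms")` succeed, while `name_match("SQFT", "City")` does not.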
8
Object-Set Matcher (instance-level)
  • Data frame
    • multiple regular expressions in Perl style
    • as simple as a list of data values
  • Data-frame matcher
    • compares recognized data values
    • benefit: able to recognize disjunctive data-value
      sets
    • bias: a data frame may not correspond 100% to the
      semantics
    • limitation: a needed data frame might not exist
  • 1:1 mapping cardinality

Example data frame: Car Model: Ford, Honda, Chevy, Toyota
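
A rough sketch of the data-frame idea, assuming a data frame is just a list of regular expressions and that two object sets match when the same frame recognizes most values on both sides. The frame contents, the phone pattern, and the 0.8 threshold are illustrative assumptions, not values from the presentation.

```python
import re

# Hypothetical data frames: each is a list of compiled regular expressions.
# The car frame is "as simple as a list of data values".
CAR_FRAME = [re.compile(r"\b(Ford|Honda|Chevy|Toyota)\b", re.IGNORECASE)]
PHONE_FRAME = [re.compile(r"\(?\d{3}\)?[-. ]?\d{3}[-. ]\d{4}")]


def hit_rate(frame, values):
    """Fraction of values recognized by at least one expression in the frame."""
    if not values:
        return 0.0
    hits = sum(1 for v in values if any(p.search(v) for p in frame))
    return hits / len(values)


def data_frame_match(frame, values1, values2, threshold=0.8):
    """Two object sets match when the frame recognizes most values on both sides."""
    return (hit_rate(frame, values1) >= threshold
            and hit_rate(frame, values2) >= threshold)
```

Note that `data_frame_match(CAR_FRAME, ["Ford", "Toyota"], ["Honda", "Chevy"])` succeeds even though the two value lists share no values, which is the benefit the slide names; comparing raw values directly would find no overlap.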
9
Extended Data-Frame Matcher (instance-level)
  • n:1 mapping cardinality
  • Add a STRICT_SUBSTRING operation
  • With the help of structural analysis

Schema 1: location
Schema 2: Address, Street, City, State
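
One way to picture the STRICT_SUBSTRING operation, assuming it holds when a value of one object set occurs inside, but is not equal to, a value of another. The sample values and the 0.8 threshold below are made up for illustration, and the structural analysis the slide mentions is not modelled here.

```python
def strict_substring(part, whole):
    """True when part occurs inside whole and is not the whole string."""
    p, w = part.strip().lower(), whole.strip().lower()
    return bool(p) and p != w and p in w


def covers(component_values, composite_values, threshold=0.8):
    """Most component values occur strictly inside some composite value."""
    if not component_values:
        return False
    hits = sum(
        1 for c in component_values
        if any(strict_substring(c, w) for w in composite_values)
    )
    return hits / len(component_values) >= threshold


def n_to_one(components, composite_values):
    """Names of the component object sets that map into the composite one."""
    return [name for name, values in components.items()
            if covers(values, composite_values)]


# Made-up instance data for the slide's example schemas.
location = ["123 Main St, Provo, UT", "45 Center St, Orem, UT"]
schema2 = {
    "Street": ["123 Main St", "45 Center St"],
    "City": ["Provo", "Orem"],
    "State": ["UT"],
    "SQFT": ["2100", "1850"],
}
mapped = n_to_one(schema2, location)
```

Under these assumptions `mapped` contains Street, City, and State but not SQFT, giving the n:1 mapping onto location.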
10
Direct Structure Matcher
  • Comparing structure similarity between two
    candidate schemas
  • 1:1 mapping cardinality

[Figure: two schema fragments compared by structure. Node labels: Agent, agent, Name, name, Fax, fax, Location, Address, phone, phone_day.]
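
The slide does not say how structure similarity is computed. As a stand-in, the sketch below uses Jaccard overlap of neighbor names, which is a swapped-in technique and not necessarily the presenter's measure. The neighbor lists and the 0.5 threshold are hypothetical.

```python
def normalize(name):
    """Lowercase and treat underscores as spaces so Name and name compare equal."""
    return name.replace("_", " ").strip().lower()


def neighbor_similarity(neighbors1, neighbors2):
    """Jaccard overlap of the normalized neighbor names."""
    a = {normalize(n) for n in neighbors1}
    b = {normalize(n) for n in neighbors2}
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def structure_match(schema1, schema2, threshold=0.5):
    """Greedy 1:1 pairing of object sets whose neighborhoods look alike.

    Each schema is a dict: object-set name -> list of neighbor names.
    """
    pairs, used = [], set()
    for name1, nb1 in schema1.items():
        best, best_score = None, threshold
        for name2, nb2 in schema2.items():
            if name2 in used:
                continue
            score = neighbor_similarity(nb1, nb2)
            if score >= best_score:
                best, best_score = name2, score
        if best is not None:
            used.add(best)
            pairs.append((name1, best))
    return pairs


# Hypothetical neighborhoods built from the labels on this slide.
schema1 = {"Agent": ["Name", "Fax", "Location"]}
schema2 = {"agent": ["name", "fax", "Address"], "House": ["MLS", "SQFT"]}
pairs = structure_match(schema1, schema2)
```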
11
Reference Structure Matcher
  • If A and B match C, then A matches B.
  • Able to solve n:m mapping cardinality
  • 1:1, n:1, and n:m mapping cardinalities

[Figure: a reference structure for Phone (Day Phone, Cell Phone, Evening Phone, Office Phone, Home Phone) linking the phone object sets of Schema 1 and Schema 2 (Home Phone, Evening Phone, Cell Phone, Day Phone, Office Phone).]
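
The rule "if A and B match C, then A matches B" can be sketched as grouping object sets from both schemas under a shared reference concept. The `REFERENCE` table and the sample schemas are hypothetical, and name lookup by token set is a simplification chosen so that Phone_day and Day Phone land on the same entry.

```python
from collections import defaultdict


def key(name):
    """Order-insensitive token set, so 'Phone_day' and 'Day Phone' agree."""
    return frozenset(name.replace("_", " ").lower().split())


# Hypothetical reference structure: concept -> names known to express it.
REFERENCE = {
    "Phone": {key(n) for n in (
        "Day Phone", "Cell Phone", "Evening Phone", "Office Phone", "Home Phone",
    )},
}


def concept_of(name):
    """Return the reference concept a name belongs to, or None."""
    k = key(name)
    for concept, members in REFERENCE.items():
        if k in members:
            return concept
    return None


def reference_match(schema1, schema2):
    """Group object sets from both schemas under the concept they both match."""
    groups = defaultdict(lambda: ([], []))
    for side, schema in enumerate((schema1, schema2)):
        for name in schema:
            concept = concept_of(name)
            if concept is not None:
                groups[concept][side].append(name)
    # Keep only concepts that both schemas touch; each is one n:m mapping.
    return {c: sides for c, sides in groups.items() if sides[0] and sides[1]}


mapping = reference_match(
    ["Phone_day", "Phone_evening", "MLS"],
    ["cell phone", "home phone", "office phone", "SQFT"],
)
```

Here two object sets on one side are mapped as a cluster to three on the other, an n:m match that pairwise comparison of names alone would not produce.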
12
Experiments
Application (Number of Schemes)  Precision (%)  Recall (%)  F (%)  Number Matches  Number Correct  Number Incorrect
Faculty Member (5)               100            100         100    540             540             0
Course Schedule (5)              99             93          96     490             454             6
Real Estate (5)                  90             94          92     876             820             92

Indirect matches: precision 87%, recall 94%, F-measure 90%
Data borrowed from Univ. of Washington (DDH, SIGMOD'01)

  • Rough comparison with U of W results
    • Faculty Member: accuracy 92%
    • Course Schedule: accuracy 71%
    • Real Estate (2 tests): accuracy 75%
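
The precision, recall, and F figures in the experiments table can be reproduced from its counts, assuming "Number Matches" is the number of expected matches; that reading is an assumption, though it is consistent with every row.

```python
def prf(expected, correct, incorrect):
    """Precision, recall, and F-measure as whole percentages."""
    precision = correct / (correct + incorrect)
    recall = correct / expected
    f_measure = 2 * precision * recall / (precision + recall)
    return tuple(round(100 * x) for x in (precision, recall, f_measure))


# Counts taken from the experiments table: (expected, correct, incorrect).
results = {
    "Faculty Member": prf(540, 540, 0),
    "Course Schedule": prf(490, 454, 6),
    "Real Estate": prf(876, 820, 92),
}
```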

13
Lessons Learned
  • n:1 and n:m matches occur frequently.
    • 22% (97/437) in DMD03 (Course Catalog, Company
      Profile)
    • 45% (287/638) (Car Ads, Cell Phones, Real Estate)
  • Reference structures provide a way to solve the
    long-standing, hard cluster-mapping (n:m
    cardinality) problem.
  • Data frames improve the instance-level matchers.
  • The combination of schema-level and
    instance-level matchers improves the results.