Title: Schema Mapping: Experiences and Lessons Learned
1 Schema Mapping: Experiences and Lessons Learned
- Yihong Ding
- Data Extraction Group
- Brigham Young University
- Sponsored by NSF
2 Schema Mapping
- Semantic correspondence between two schemas
- Significance
  - data integration
  - data warehouses
  - ontology merging
  - message translation in e-commerce
  - semantic query processing
  - etc.
3 Schema Representation
[Figure: two example schemas drawn as graphs. Node labels, in extraction order: Phone_evening, MLS, MLS, Bedrooms, Basic_features, location, House, Agent, beds, SQFT, Name, location_description, agent, Phone_day, Golf course, Water front, Location, Address, name, cell phone, Street, City, State, home phone, office phone]
4 1:1 Mapping Cardinality
[Figure: the two example schemas from slide 3, shown again to illustrate 1:1 mappings]
5 n:1 Mapping Cardinality
[Figure: the two example schemas from slide 3, shown again to illustrate n:1 mappings]
6 n:m Mapping Cardinality
[Figure: the two example schemas from slide 3, shown again to illustrate n:m mappings]
7 Object-Set Matcher (schema-level)
- Name-based matcher
  - string and substring comparison
  - linguistic methods: stemming, stop words, removing ignorable characters, etc.
  - thesaurus: WordNet, etc.
- 1:1 mapping cardinality
[Figure: example name matches. Agent and agent; Name and name]
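Note: the name-based matching described on this slide can be sketched in Python as follows. This is a minimal illustration, not the matcher from the talk. The function names, the tiny stop-word list, the crude plural-stripping "stemmer", and the scores 1.0 and 0.5 are all assumptions made for the example, and the thesaurus lookup (e.g. WordNet) is left out.

```python
import re

STOP_WORDS = {"of", "the", "a", "an"}  # hypothetical stop-word list


def normalize(name):
    """Lowercase, split on ignorable characters, drop stop words, stem crudely."""
    tokens = re.split(r"[^a-z0-9]+", name.lower())
    stemmed = []
    for tok in tokens:
        if not tok or tok in STOP_WORDS:
            continue
        # Crude plural stripping; a real stemmer would do considerably more.
        if len(tok) > 3 and tok.endswith("s"):
            tok = tok[:-1]
        stemmed.append(tok)
    return stemmed


def name_similarity(a, b):
    """1.0 for equal normalized names, 0.5 for substring containment, else 0.0."""
    na, nb = "".join(normalize(a)), "".join(normalize(b))
    if not na or not nb:
        return 0.0
    if na == nb:
        return 1.0
    if na in nb or nb in na:
        return 0.5
    return 0.0
```

With these assumptions, `name_similarity("Agent", "agent")` is an exact match after normalization, while `name_similarity("Bedrooms", "beds")` is only a substring match, which is the kind of case where instance-level evidence helps.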
8 Object-Set Matcher (instance-level)
- Data frame
  - multiple regular expressions in Perl style
  - can be as simple as a list of data values
- Data-frame matcher
  - compares recognized data values
  - benefit: able to recognize disjunctive data value sets
  - bias: a data frame may not correspond 100% with the semantics
  - limitation: a needed data frame might not exist
- 1:1 mapping cardinality
[Figure: example data frame. Car Model: Ford, Honda, Chevy, Toyota]
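Note: a data-frame matcher of the kind described on this slide might look like the Python sketch below. The `DATA_FRAMES` table, the phone pattern, the function names, and the 0.8 threshold are hypothetical choices for illustration. The slide only states that a data frame is a set of Perl-style regular expressions (or a plain value list) and that the matcher compares the data values it recognizes.

```python
import re

# Hypothetical data frames: each concept maps to its recognizing expressions.
DATA_FRAMES = {
    "CarModel": [re.compile(r"^(Ford|Honda|Chevy|Toyota)$", re.IGNORECASE)],
    "Phone": [re.compile(r"^\(?\d{3}\)?[-. ]?\d{3}[-. ]?\d{4}$")],
}


def recognized_fraction(frame, values):
    """Fraction of instance values matched by any expression of the data frame."""
    if not values:
        return 0.0
    patterns = DATA_FRAMES[frame]
    hits = sum(1 for v in values if any(p.search(v.strip()) for p in patterns))
    return hits / len(values)


def data_frame_match(values_a, values_b, threshold=0.8):
    """Data frames that recognize both value sets at or above the threshold."""
    return [
        frame
        for frame in DATA_FRAMES
        if recognized_fraction(frame, values_a) >= threshold
        and recognized_fraction(frame, values_b) >= threshold
    ]
```

Because both value sets are compared against the data frame rather than against each other, two object sets with no values in common (for example `["Ford", "Honda"]` and `["Toyota", "Chevy"]`) can still be matched.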
9 Extended Data-Frame Matcher (instance-level)
- n:1 mapping cardinality
- Add a STRICT_SUBSTRING operation
- With the help of structural analysis
[Figure: Schema 1: location. Schema 2: Address, Street, City, State]
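Note: the slide does not define STRICT_SUBSTRING. The Python sketch below assumes one plausible reading: a data frame recognizes a non-empty, proper part of a source value, so that several sibling attributes (Street, City, State) can together account for one attribute (location). The patterns, function names, and threshold are hypothetical, and the sibling list stands in for the structural analysis.

```python
import re

# Hypothetical data frames for the components of an address.
COMPONENT_FRAMES = {
    "Street": re.compile(r"\d+\s+[A-Za-z]+(\s[A-Za-z]+)*\s(St|Ave|Rd|Dr)\b"),
    "City": re.compile(r"\b(Provo|Orem|Seattle)\b"),
    "State": re.compile(r"\b(UT|WA)\b"),
}


def strict_substring(pattern, value):
    """True if the pattern recognizes a non-empty, proper part of the value."""
    m = pattern.search(value)
    return m is not None and 0 < len(m.group(0)) < len(value)


def n_to_1_candidates(source_values, sibling_attrs, threshold=0.8):
    """Sibling attributes whose data frames hit strict substrings of the source values."""
    result = []
    for attr in sibling_attrs:
        hits = sum(strict_substring(COMPONENT_FRAMES[attr], v) for v in source_values)
        if source_values and hits / len(source_values) >= threshold:
            result.append(attr)
    return result
```

For invented values such as `"123 Main St, Provo, UT"`, all three sibling attributes qualify, which suggests an n:1 mapping from Street, City, and State to location.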
10 Direct Structure Matcher
- Compares structural similarity between two candidate schemas
- 1:1 mapping cardinality
[Figure: two schema fragments. Node labels, in extraction order: Name, agent, Agent, Fax, Location, name, phone_day, fax, phone, Address]
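Note: the slide does not say how structural similarity is measured. As one simple stand-in, the Python sketch below scores two object sets by the Jaccard overlap of their neighbors' lowercased names. The adjacency-dictionary representation, the function name, and the assignment of the figure's labels to two schemas in the usage example are assumptions for illustration only.

```python
def neighbor_similarity(schema_a, node_a, schema_b, node_b):
    """Jaccard overlap of the lowercased neighbor names of two object sets."""
    na = {n.lower() for n in schema_a.get(node_a, ())}
    nb = {n.lower() for n in schema_b.get(node_b, ())}
    if not na and not nb:
        return 0.0
    return len(na & nb) / len(na | nb)


# Hypothetical arrangement of the labels shown in the figure.
schema_1 = {"Agent": ["Name", "Fax", "Location", "phone_day"]}
schema_2 = {"agent": ["name", "fax", "phone", "Address"]}
score = neighbor_similarity(schema_1, "Agent", schema_2, "agent")
```

Here two of the six distinct neighbor names coincide. A fuller matcher would compare neighbors using the name-based and data-frame matchers rather than exact lowercase equality.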
11 Reference Structure Matcher
- If A and B both match C, then A matches B.
- Able to solve n:m mapping cardinality
- Handles 1:1, n:1, and n:m mapping cardinalities
[Figure: reference structure with node labels Phone, Day Phone, Cell Phone, Evening Phone, Office Phone, Home Phone; Schema 2 and Schema 1 with node labels Home Phone, Evening Phone, Cell Phone, Day Phone, Office Phone]
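Note: the rule "if A and B both match C, then A matches B" can be sketched in Python as grouping the elements of both schemas under a shared reference concept. The `REFERENCE` table and the function name are hypothetical. Membership is tested by exact lowercase lookup here, where a real matcher would presumably reuse the name-based and data-frame matchers.

```python
from collections import defaultdict

# Hypothetical reference structure: concept -> member names it covers.
REFERENCE = {
    "Phone": {"day phone", "cell phone", "evening phone", "office phone", "home phone"},
}


def reference_matches(schema_1, schema_2):
    """Group elements of both schemas that match the same reference concept."""
    groups = defaultdict(lambda: ([], []))
    for side, schema in enumerate((schema_1, schema_2)):
        for element in schema:
            for concept, members in REFERENCE.items():
                if element.lower() in members:
                    groups[concept][side].append(element)
    # Keep only concepts reached from both sides: A ~ C and B ~ C imply A ~ B.
    return {c: g for c, g in groups.items() if g[0] and g[1]}
```

Each returned entry pairs a group of n elements from the first schema with a group of m elements from the second, so 1:1, n:1, and n:m mappings come out of the same procedure.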
12 Experiments

| Application (Number of Schemes) | Precision (%) | Recall (%) | F (%) | Number Matches | Number Correct | Number Incorrect |
|---|---|---|---|---|---|---|
| Faculty Member (5) | 100 | 100 | 100 | 540 | 540 | 0 |
| Course Schedule (5) | 99 | 93 | 96 | 490 | 454 | 6 |
| Real Estate (5) | 90 | 94 | 92 | 876 | 820 | 92 |

Indirect matches: precision 87%, recall 94%, F-measure 90%
Data borrowed from Univ. of Washington [DDH, SIGMOD'01]
- Rough comparison with U of W results
  - Faculty Member: accuracy 92%
  - Course Schedule: accuracy 71%
  - Real Estate (2 tests): accuracy 75%
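Note: the percentages in the experiments table are consistent with the standard definitions of precision, recall, and F-measure, assuming that "Number Matches" is the number of true matches to be found. The Python sketch below recomputes them from the counts. The function name is hypothetical.

```python
def scores(n_matches, n_correct, n_incorrect):
    """Return (precision, recall, F-measure) as fractions in [0, 1]."""
    found = n_correct + n_incorrect  # everything the matcher proposed
    precision = n_correct / found if found else 0.0
    recall = n_correct / n_matches if n_matches else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Counts from the Course Schedule and Real Estate rows.
course = scores(490, 454, 6)
real_estate = scores(876, 820, 92)
```

Rounded to whole percentages, `course` gives 99, 93, 96 and `real_estate` gives 90, 94, 92, matching the table.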
13 Lessons Learned
- n:1 and n:m matches occur frequently.
  - 22% (97/437) [DMD03] (Course Catalog, Company Profile)
  - 45% (287/638) (Car Ads, Cell Phones, Real Estate)
- Reference structures provide a way to solve the long-standing, hard cluster-mapping (n:m cardinality) problem.
- Data frames improve the instance-level matchers.
- The combination of schema-level and instance-level matchers improves the results.