Title: Discovering Direct and Indirect Matches for Schema Elements
- Li Xu
- Data Extraction Group
- Brigham Young University
- Sponsored by NSF
2. Problem
[Figure: car schemas (one labeled Source) with elements Car, Year, Make, Model, Make Model, Color, Feature, Body Type, Style, Cost, Phone.]
3. Applications
- Data Integration
- Schema Integration
- Message Mapping
- Data Translation
4. Approach
- Direct Matches
- Indirect Matches
  - Union
  - Selection
  - Composition
  - Decomposition
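
The four kinds of indirect match listed above can be pictured as simple transformations on source values. The sketch below is an illustration only: the slides do not define the operators formally, so the function names, the sample values, and the separator-based split are all assumptions.

```python
# Illustrative only: hypothetical helpers, not the authors' implementation.

def union(*columns):
    """Union: several source value sets feed one target element."""
    merged = []
    for column in columns:
        for value in column:
            if value not in merged:
                merged.append(value)
    return merged

def selection(values, predicate):
    """Selection: only a subset of the source values belongs to the target element."""
    return [v for v in values if predicate(v)]

def composition(left, right, sep=" "):
    """Composition: two source elements are concatenated into one target element."""
    return [f"{a}{sep}{b}" for a, b in zip(left, right)]

def decomposition(values, sep=" "):
    """Decomposition: one source element is split into two target elements.

    Assumes every value contains the separator at least once.
    """
    pairs = [v.split(sep, 1) for v in values]
    return [p[0] for p in pairs], [p[1] for p in pairs]

makes = ["Ford", "Ford"]
models = ["Mustang", "Taurus"]
make_model = composition(makes, models)
print(make_model)                 # ['Ford Mustang', 'Ford Taurus']
print(decomposition(make_model))  # (['Ford', 'Ford'], ['Mustang', 'Taurus'])
```

Composition and decomposition are written as inverses here, which matches the Make / Model versus Make Model case; real data would need a more careful split than a single separator.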
5. Union and Selection
[Figure: car schemas (one labeled Source) with elements Car, Year, Make, Model, Make Model, Color, Feature, Body Type, Style, Cost, Phone.]
6. Composition and Decomposition
[Figure: car schemas (one labeled Source) with elements Car, Year, Make, Model, Make Model, Color, Feature, Body Type, Style, Cost, Phone.]
7. Matching Techniques
- Terminological Relationships
- Value Characteristics
- Expected Data Values
- Structure
8. Terminological Relationships
- WordNet
- Machine-Learned Rules
- Example (Make, Brand)
- The number of different common hypernym roots of A and B
- The sum of the number of senses of A and B
- The sum of the distances of A and B to a common hypernym
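
To make the three measures above concrete, here is a minimal sketch that computes them over a toy hypernym table. The table is an invented stand-in for WordNet (the sense names and links are hypothetical, not real WordNet entries), and taking the minimum over all common hypernyms for the distance measure is an assumption, since the slide does not say which common hypernym is used.

```python
# Toy stand-in for WordNet: sense -> parent sense (its hypernym); None marks a root.
# All entries are invented for illustration.
HYPERNYM = {
    "make#1": "kind#1",
    "brand#1": "kind#1",
    "kind#1": "category#1",
    "category#1": None,
    "brand#2": "symbol#1",
    "symbol#1": None,
}
# Word -> its senses (hypothetical inventory).
SENSES = {"make": ["make#1"], "brand": ["brand#1", "brand#2"]}

def ancestors(sense):
    """Map each hypernym of `sense` (including itself) to its distance in edges."""
    dist, depth = {}, 0
    while sense is not None:
        dist[sense] = depth
        sense, depth = HYPERNYM[sense], depth + 1
    return dist

def roots(ancestor_maps):
    """Collect the root senses reachable from any of the given ancestor maps."""
    return {s for d in ancestor_maps for s in d if HYPERNYM[s] is None}

def features(a, b):
    anc_a = [ancestors(s) for s in SENSES[a]]
    anc_b = [ancestors(s) for s in SENSES[b]]
    # Distance sums over every sense pair and every hypernym they share.
    distances = [da[h] + db[h]
                 for da in anc_a for db in anc_b
                 for h in da.keys() & db.keys()]
    return {
        "common_roots": len(roots(anc_a) & roots(anc_b)),
        "sense_count": len(SENSES[a]) + len(SENSES[b]),
        "min_distance_sum": min(distances) if distances else None,
    }

print(features("make", "brand"))
# {'common_roots': 1, 'sense_count': 3, 'min_distance_sum': 2}
```

Feature values like these would be the inputs to the machine-learned rules; how the rules combine them is not shown in the slides.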
9. Value Characteristics
- Machine Learning
- Features [LC94]
  - String length, numeric ratio, space ratio
  - Mean, variance, coefficient of variation, standard deviation
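
As a rough illustration of the features listed above, the following sketch summarizes a column of string values. It is not the [LC94] feature set itself: the exact definitions (per-value ratios averaged over the column, population statistics computed over string lengths) and the function name are assumptions.

```python
import statistics

def value_features(values):
    """Summarize a non-empty column of string values with simple character statistics."""
    lengths = [len(v) for v in values]
    # Ratios are computed per value (skipping empty strings), then averaged.
    numeric = [sum(c.isdigit() for c in v) / len(v) for v in values if v]
    spaces = [sum(c.isspace() for c in v) / len(v) for v in values if v]
    mean = statistics.mean(lengths)
    stdev = statistics.pstdev(lengths)
    return {
        "mean_length": mean,
        "variance": statistics.pvariance(lengths),
        "std_dev": stdev,
        "coeff_variation": stdev / mean if mean else 0.0,
        "numeric_ratio": statistics.mean(numeric) if numeric else 0.0,
        "space_ratio": statistics.mean(spaces) if spaces else 0.0,
    }

print(value_features(["1999", "2001", "98"]))
```

Two columns whose feature vectors are close (for example, two Year columns that are all digits and about four characters long) become candidates for a match.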
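10. Expected Values
- Application Concepts
- Data Recognizers
  - CarMake
    - ford
    - honda
    - ...
  - CarModel
    - accord
    - mustang
    - taurus
    - ...
[Figure: expected-value example with schemas labeled Target and Source. One schema has a single element Make Model (values: Ford Mustang, Ford Taurus, Ford F150), recognized as CarMake . CarModel; the other has Brand (values: Acura, Audi, BMW) and Model (values: Legend, Mustang, A4), recognized as CarMake and CarModel.]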
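
A data recognizer can be approximated by a pattern that is tested against an element's values. The sketch below uses only the example values that appear on this slide; the regular expressions, the `recognize` function, and the 0.8 threshold are assumptions made for illustration, not the recognizers used in the work.

```python
import re

# Hypothetical recognizers built from the slide's example values only.
CAR_MAKE = r"(?:ford|honda|acura|audi|bmw)"
CAR_MODEL = r"(?:accord|mustang|taurus|legend|a4|f150)"
RECOGNIZERS = {
    "CarMake": re.compile(rf"^{CAR_MAKE}$", re.IGNORECASE),
    "CarModel": re.compile(rf"^{CAR_MODEL}$", re.IGNORECASE),
    "CarMake . CarModel": re.compile(rf"^{CAR_MAKE}\s+{CAR_MODEL}$", re.IGNORECASE),
}

def recognize(values, threshold=0.8):
    """Return the concepts whose recognizer matches at least `threshold` of the values."""
    hits = []
    for concept, pattern in RECOGNIZERS.items():
        matched = sum(bool(pattern.match(v.strip())) for v in values)
        if values and matched / len(values) >= threshold:
            hits.append(concept)
    return hits

print(recognize(["Ford Mustang", "Ford Taurus", "Ford F150"]))  # ['CarMake . CarModel']
print(recognize(["Acura", "Audi", "BMW"]))                      # ['CarMake']
print(recognize(["Legend", "Mustang", "A4"]))                   # ['CarModel']
```

When one element is recognized as CarMake . CarModel and the other schema has separate CarMake and CarModel elements, the pair is a candidate for a composition or decomposition match.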
11. Structure
[Figure: Target and Source purchase-order schemas with elements PO, POLines, POShipTo, POBillTo, Count, Item, Line, Qty, UoM, City, Street, PurchaseOrder, Items, DeliverTo, InvoiceTo, Address, ItemCount, ItemNumber, Quantity, UnitOfMeasure.]
12. Structure (Cont.)
[Figure: Target and Source purchase-order schemas with elements PO, POLines, POShipTo, POBillTo, Count, Item, Line, Qty, UoM, City, Street, PurchaseOrder, Items, DeliverTo, InvoiceTo, Address, ItemNumber, Quantity, UnitOfMeasure.]
13. Structure (Cont.)
[Figure: Target and Source purchase-order schemas with elements PO, POLines, POShipTo, POBillTo, Count, Item, Line, Qty, UoM, City, Street, PurchaseOrder, Items, DeliverTo, InvoiceTo, ItemNumber, Quantity, UnitOfMeasure.]
14. Structure (Cont.)
[Figure: Target and Source purchase-order schemas with elements PO, POLines, POShipTo, POBillTo, Count, Item, Line, Qty, UoM, City, Street, PurchaseOrder, Items, DeliverTo, InvoiceTo, ItemNumber, Quantity, UnitOfMeasure.]
15. Structure (Cont.)
[Figure: Target and Source purchase-order schemas with elements PO, POLines, POShipTo, POBillTo, Count, Item, Line, Qty, UoM, City, Street, PurchaseOrder, Items, DeliverTo, InvoiceTo, ItemNumber, Quantity, UnitOfMeasure.]
16. Experiments
- Methodology
- Measures
  - Precision
  - Recall
  - F-Measure
17. Results
Application (Number of Schemas)  Precision (%)  Recall (%)  F (%)  Correct  False Positive  False Negative
Course Schedule (5)              98             93          96     119      2               9
Faculty Member (5)               100            100         100    140      0               0
Real Estate (5)                  92             96          94     235      20              10
Indirect Matches: 94% (precision, recall, F-measure)
- Data borrowed from Univ. of Washington
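
The percentages reported above follow from the Correct, False Positive, and False Negative counts. The sketch below recomputes them, assuming the F-measure is the usual harmonic mean of precision and recall (the slides do not state the formula); the function name is hypothetical.

```python
def scores(correct, false_positive, false_negative):
    """Return (precision, recall, F-measure) as whole percentages.

    Assumes at least one correct match, so no denominator is zero.
    """
    precision = correct / (correct + false_positive)
    recall = correct / (correct + false_negative)
    f_measure = 2 * precision * recall / (precision + recall)
    return tuple(round(100 * x) for x in (precision, recall, f_measure))

print(scores(119, 2, 9))    # Course Schedule: (98, 93, 96)
print(scores(140, 0, 0))    # Faculty Member:  (100, 100, 100)
print(scores(235, 20, 10))  # Real Estate:     (92, 96, 94)
```

All three rows agree with the reported figures under this definition.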
18. Ground-Truthing
[Figure: schema fragment with elements Agents Or Firms, Agent name, Firm name, Firm location, Contact, Cell phone, Office phone, Email, Fax.]
19. Limitation (Expected Data Value)
20. Contributions
- Direct Matches
- Indirect Matches
- Expected values
- Structure
- High Precision and High Recall