Discovering Direct and Indirect Matches for Schema Elements

1
Discovering Direct and Indirect Matches for
Schema Elements
  • Li Xu
  • Data Extraction Group
  • Brigham Young University
  • Sponsored by NSF

2
Problem
[Figure: schemas to be matched in a car domain, labeled Source, with elements such as Car, Make, Model, Make Model, Year, Color, Body Type, Style, Feature, Cost, and Phone]

3
Applications
  • Data Integration
  • Schema Integration
  • Message Mapping
  • Data Translation

4
Approach
  • Direct Matches
  • Indirect Matches
    • Union
    • Selection
    • Composition
    • Decomposition

5
Union and Selection
[Figure: the car-domain schemas (Car, Make, Model, Make Model, Year, Color, Body Type, Style, Feature, Cost, Phone), illustrating union and selection matches]

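As a rough illustration, union and selection can be read as operations on the instance data that populates a target element. The sketch below assumes a toy representation in which each source element is a list of records (dicts); the record fields and values and the function names `union_match` and `selection_match` are hypothetical, not taken from the presentation.

```python
from typing import Callable

# A source element's instance data, one dict per record (assumed representation).
Record = dict[str, str]


def union_match(*sources: list[Record]) -> list[Record]:
    """Union: a target element draws its values from several source elements."""
    combined: list[Record] = []
    for source in sources:
        combined.extend(source)
    return combined


def selection_match(source: list[Record], keep: Callable[[Record], bool]) -> list[Record]:
    """Selection: a target element takes only the source values that satisfy a condition."""
    return [record for record in source if keep(record)]


# Hypothetical records from two source elements that both describe cars.
listing_a = [{"Make": "Ford", "Style": "Truck"}, {"Make": "Honda", "Style": "Sedan"}]
listing_b = [{"Make": "Audi", "Style": "Sedan"}]

all_cars = union_match(listing_a, listing_b)
sedans = selection_match(all_cars, lambda r: r["Style"] == "Sedan")
print(len(all_cars), [r["Make"] for r in sedans])  # 3 ['Honda', 'Audi']
```

The selection condition is supplied as a plain predicate here; discovering which condition to use is the hard part of the matching problem and is not shown.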
6
Composition and Decomposition
[Figure: the car-domain schemas (Car, Make, Model, Make Model, Year, Color, Body Type, Style, Feature, Cost, Phone), illustrating composition and decomposition matches]

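Composition and decomposition work at the value level: several source values combine into one target value, or one source value splits into several. A minimal sketch, assuming a single space separates make and model; the functions `compose` and `decompose` are hypothetical, and the naive first-token split would fail on multi-word makes, where a recognizer for known makes would be needed instead.

```python
def compose(make: str, model: str) -> str:
    """Composition: several source values combine into one target value."""
    return f"{make} {model}"


def decompose(make_model: str) -> tuple[str, str]:
    """Decomposition: one source value splits into several target values.

    Assumes (hypothetically) that the make is the first whitespace-separated
    token and everything after it is the model.
    """
    make, _, model = make_model.partition(" ")
    return make, model


print(compose("Ford", "Mustang"))  # Ford Mustang
print(decompose("Ford F150"))      # ('Ford', 'F150')
```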
7
Matching Techniques
  • Terminological Relationships
  • Value Characteristics
  • Expected Data Values
  • Structure

8
Terminological Relationships
  • WordNet
  • Machine-Learned Rules
  • Example (Make, Brand)
  • WordNet-derived measures for a word pair (A, B):
    • The number of different common hypernym roots of A and B
    • The sum of the number of senses of A and B
    • The sum of the distances of A and B to a common hypernym

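The three measures can be computed once each word's senses and hypernym chains are known. The sketch below is only an illustration: it substitutes a tiny hand-written, hypothetical taxonomy for WordNet (real WordNet synsets can have several hypernyms; here each has one), and it assumes the distance measure means the closest common hypernym over all sense pairs. The machine-learned rules that consume these measures are not reproduced.

```python
from typing import Optional

# Hypothetical toy taxonomy standing in for WordNet (NOT real WordNet data).
# Each word maps to its senses; each sense has at most one hypernym (parent).
SENSES: dict[str, list[str]] = {
    "make": ["make.n.1"],
    "brand": ["brand.n.1", "brand.n.2"],
}
PARENT: dict[str, Optional[str]] = {
    "make.n.1": "kind.n.1",
    "brand.n.1": "kind.n.1",
    "brand.n.2": "mark.n.1",
    "kind.n.1": "abstraction.n.1",
    "mark.n.1": "abstraction.n.1",
    "abstraction.n.1": None,  # a root of the hierarchy
}


def hypernym_distances(sense: str) -> dict[str, int]:
    """Map the sense and every hypernym above it to its distance from the sense."""
    distances: dict[str, int] = {}
    node: Optional[str] = sense
    step = 0
    while node is not None:
        distances[node] = step
        node = PARENT[node]
        step += 1
    return distances


def root_of(sense: str) -> str:
    """The root is the hypernym farthest from the sense."""
    distances = hypernym_distances(sense)
    return max(distances, key=lambda h: distances[h])


def word_pair_features(a: str, b: str) -> tuple[int, int, Optional[int]]:
    # Measure 1: number of different hypernym roots reached by senses of both A and B.
    common_roots = len({root_of(s) for s in SENSES[a]} & {root_of(s) for s in SENSES[b]})
    # Measure 2: total number of senses of A and B.
    sense_total = len(SENSES[a]) + len(SENSES[b])
    # Measure 3 (assumed reading): smallest summed distance to a common hypernym,
    # over all sense pairs; None if the words share no hypernym.
    closest: Optional[int] = None
    for sense_a in SENSES[a]:
        dist_a = hypernym_distances(sense_a)
        for sense_b in SENSES[b]:
            dist_b = hypernym_distances(sense_b)
            for hypernym in dist_a.keys() & dist_b.keys():
                total = dist_a[hypernym] + dist_b[hypernym]
                if closest is None or total < closest:
                    closest = total
    return common_roots, sense_total, closest


print(word_pair_features("make", "brand"))  # (1, 3, 2)
```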
9
Value Characteristics
  • Machine Learning
  • Features [LC94]
    • String length, numeric ratio, space ratio
    • Mean, variance, coefficient of variation, standard deviation
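The listed features can be computed directly from a column's values before being handed to a learner. A minimal sketch, assuming character ratios are taken over all characters in the column and that population (not sample) statistics are used; the exact definitions in [LC94] may differ, and the function names are hypothetical.

```python
import statistics


def string_features(values: list[str]) -> dict[str, float]:
    """Character-level features over all values of one column (assumed definitions)."""
    total_chars = sum(len(v) for v in values)
    digits = sum(ch.isdigit() for v in values for ch in v)
    spaces = sum(ch == " " for v in values for ch in v)
    return {
        "avg_length": total_chars / len(values),
        "numeric_ratio": digits / total_chars,
        "space_ratio": spaces / total_chars,
    }


def numeric_features(numbers: list[float]) -> dict[str, float]:
    """Summary statistics over a numeric column (population formulas assumed)."""
    mean = statistics.mean(numbers)
    std_dev = statistics.pstdev(numbers)
    return {
        "mean": mean,
        "variance": statistics.pvariance(numbers),
        "std_dev": std_dev,
        # Coefficient of variation: spread relative to the mean (undefined for a zero mean).
        "coeff_of_variation": std_dev / mean,
    }


print(string_features(["Ford F150", "Audi A4"]))
# {'avg_length': 8.0, 'numeric_ratio': 0.25, 'space_ratio': 0.125}
print(numeric_features([2, 4, 4, 4, 5, 5, 7, 9]))
```

Each column then becomes one feature vector, so columns from different schemas can be compared or classified by the learner.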

10
Expected Values
  • Application Concepts
  • Data Recognizers
    • CarMake
      • ford
      • honda
    • CarModel
      • accord
      • mustang
      • taurus

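[Figure: example columns from schemas labeled Target and Source. A Make Model column holds values such as "Ford Mustang", "Ford Taurus", and "Ford F150" and is recognized as CarMake . CarModel; the other schema has a Brand column (Acura, Audi, BMW), recognized as CarMake, and a Model column (Legend, Mustang, A4), recognized as CarModel.]
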
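Expected-value recognizers can expose an indirect match: if a target column's values look like a CarMake followed by a CarModel, and two source columns hold CarMake and CarModel values, the target is a candidate composition of those columns. The sketch below assumes simple lexicon lookups seeded with this slide's example values; the lexicons and all function names are hypothetical, and real recognizers would use much larger lists or patterns.

```python
from collections import Counter
from typing import Optional

# Hypothetical lexicons seeded with the example values on this slide.
LEXICONS: dict[str, set[str]] = {
    "CarMake": {"ford", "honda", "acura", "audi", "bmw"},
    "CarModel": {"accord", "mustang", "taurus", "f150", "legend", "a4"},
}


def recognize(token: str) -> Optional[str]:
    """Return the application concept whose lexicon contains the token, if any."""
    for concept, words in LEXICONS.items():
        if token.lower() in words:
            return concept
    return None


def value_pattern(value: str) -> str:
    """Label each whitespace-separated token; '?' marks unrecognized tokens."""
    return " . ".join(recognize(tok) or "?" for tok in value.split())


def column_pattern(values: list[str]) -> str:
    """The most frequent token pattern over a column's values."""
    return Counter(value_pattern(v) for v in values).most_common(1)[0][0]


def composition_sources(target_values: list[str],
                        source_columns: dict[str, list[str]]) -> Optional[list[str]]:
    """If the target pattern concatenates concepts that each match one source
    column, return those source columns in order (a candidate composition)."""
    parts = column_pattern(target_values).split(" . ")
    by_concept = {column_pattern(vals): name for name, vals in source_columns.items()}
    if len(parts) > 1 and all(p in by_concept for p in parts):
        return [by_concept[p] for p in parts]
    return None


make_model = ["Ford Mustang", "Ford Taurus", "Ford F150"]
source = {"Brand": ["Acura", "Audi", "BMW"], "Model": ["Legend", "Mustang", "A4"]}
print(column_pattern(make_model))               # CarMake . CarModel
print(composition_sources(make_model, source))  # ['Brand', 'Model']
```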
11
Structure
[Figure: two purchase-order schema trees, labeled Target and Source, rooted at PO and PurchaseOrder; elements include POShipTo, POBillTo, POLines, DeliverTo, InvoiceTo, Address, Items, Item, Count, ItemCount, Line, ItemNumber, Qty, Quantity, UoM, UnitOfMeasure, City, and Street]

12
Structure (Cont.)
[Figure: the PO and PurchaseOrder schema trees, continued]

13
Structure (Cont.)
[Figure: the PO and PurchaseOrder schema trees, continued]

14
Structure (Cont.)
[Figure: the PO and PurchaseOrder schema trees, continued]

15
Structure (Cont.)
[Figure: the PO and PurchaseOrder schema trees, continued]

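One common way to use structure is to score two internal nodes by how many of the leaves beneath them already match. The sketch below uses such a leaf-overlap heuristic; it is a swapped-in illustration, not necessarily the technique behind these slides. The schema fragments, their nesting, and the leaf correspondences (assumed to come from the other matchers) are all hypothetical.

```python
# Hypothetical fragments of the two purchase-order schemas; the nesting is
# assumed for illustration. Inner nodes map to dicts; leaves map to None.
PO = {
    "POLines": {"Count": None, "Item": {"Line": None, "Qty": None, "UoM": None}},
    "POShipTo": {"City": None, "Street": None},
}
PURCHASE_ORDER = {
    "Items": {"ItemCount": None,
              "Item": {"ItemNumber": None, "Quantity": None, "UnitOfMeasure": None}},
    "DeliverTo": {"Address": {"City": None, "Street": None}},
}

# Leaf correspondences assumed to come from the other matchers
# (terminology, value characteristics, expected values).
LEAF_MATCHES = {
    ("Count", "ItemCount"), ("Line", "ItemNumber"), ("Qty", "Quantity"),
    ("UoM", "UnitOfMeasure"), ("City", "City"), ("Street", "Street"),
}


def leaf_names(subtree: dict) -> list[str]:
    """Collect the names of all leaves below a subtree."""
    names: list[str] = []
    for name, child in subtree.items():
        if child is None:
            names.append(name)
        else:
            names.extend(leaf_names(child))
    return names


def structural_similarity(left: dict, right: dict) -> float:
    """Fraction of leaves, on both sides, that match some leaf on the other side."""
    left_leaves, right_leaves = leaf_names(left), leaf_names(right)
    left_hits = sum(any((x, y) in LEAF_MATCHES for y in right_leaves) for x in left_leaves)
    right_hits = sum(any((x, y) in LEAF_MATCHES for x in left_leaves) for y in right_leaves)
    return (left_hits + right_hits) / (len(left_leaves) + len(right_leaves))


for left_name, left_sub in PO.items():
    for right_name, right_sub in PURCHASE_ORDER.items():
        print(left_name, right_name, structural_similarity(left_sub, right_sub))
# POLines Items 1.0
# POLines DeliverTo 0.0
# POShipTo Items 0.0
# POShipTo DeliverTo 1.0
```

Leaf overlap alone cannot tell two nodes with identical leaves apart (for example, two address nodes that both hold City and Street), so in practice it would be combined with name-based evidence.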
16
Experiments
  • Methodology
  • Measures
    • Precision
    • Recall
    • F-measure
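Assuming the standard definitions, precision is C / (C + FP), recall is C / (C + FN), and the F-measure is their harmonic mean 2PR / (P + R), where C counts correct matches, FP false positives, and FN false negatives. With these definitions, the counts in the Results table reproduce its rounded percentages. The function name below is hypothetical.

```python
def precision_recall_f(correct: int, false_pos: int, false_neg: int) -> tuple[float, float, float]:
    """Precision, recall, and F-measure from match counts (0.0 when undefined)."""
    precision = correct / (correct + false_pos) if correct + false_pos else 0.0
    recall = correct / (correct + false_neg) if correct + false_neg else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return precision, recall, f_measure


# Course Schedule counts from the Results table: 119 correct, 2 FP, 9 FN.
p, r, f = precision_recall_f(119, 2, 9)
print(f"{p:.0%} {r:.0%} {f:.0%}")  # 98% 93% 96%
```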

17
Results
Application (number of schemas)  Precision (%)  Recall (%)  F-measure (%)  Correct  False positives  False negatives
Course Schedule (5)              98             93          96             119      2                9
Faculty Member (5)               100            100         100            140      0                0
Real Estate (5)                  92             96          94             235      20               10
Indirect matches: 94% (precision, recall, and F-measure)
  • Data borrowed from Univ. of Washington

18
Ground-Truthing
[Figure: schema elements such as Agents Or Firms, Agent name, Firm name, Firm location, Contact, Office phone, Cell phone, Email, and Fax]

19
Limitation (Expected Data Value)
20
Contributions
  • Direct Matches
  • Indirect Matches
  • Expected values
  • Structure
  • High Precision and High Recall