Mini-Ontology Generation from Canonicalized Tables

Slides: 17
Provided by: sgl91
Learn more at: https://www.deg.byu.edu
Transcript and Presenter's Notes

1
Mini-Ontology Generation from Canonicalized Tables
  • Stephen Lynn
  • Data Extraction Research Group
  • Department of Computer Science
  • Brigham Young University

Supported by the
2
TANGO Overview
The TANGO (Table ANalysis for Generating Ontologies) project
consists of the following three components:
  1. Transform tables into a canonicalized form
  2. Generate mini-ontologies
  3. Merge into a growing ontology

3
Sample Input
Region and State Information
Location     Population (2000)  Latitude  Longitude
Northeast            2,122,869
Delaware               817,376        45        -90
Maine                1,305,493        44        -93
Northwest            9,690,665
Oregon               3,559,547        45       -120
Washington           6,131,118        43       -120
Sample Output
4
Mini-Ontology GeneratOr (MOGO)
  • Concept/Value Recognition
  • Relationship Discovery
  • Constraint Discovery

5
Concept/Value Recognition
  • Lexical Clues
    • Labels as data values
    • Data value assignment
  • Data Frame Clues
    • Labels as data values
    • Data value assignment
  • Default
    • Classifies any unclassified elements according to
      a simple heuristic.
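
The three classification steps above might be sketched as follows. This is a minimal illustration, not MOGO's actual implementation: the function name `classify_cell`, the `NUMBER` pattern, and the small `DATA_FRAME_HINTS` set are hypothetical stand-ins for lexical clues and data frames, and the default rule (treat anything still unclassified as a data value) is an assumption, since the slide does not say what the simple heuristic is.

```python
import re

# Hypothetical lexical clue: strings that look like numbers are data values.
# Accepts plain numbers ("-120") and comma-grouped numbers ("2,122,869").
NUMBER = re.compile(r"^-?\d{1,3}(,\d{3})*(\.\d+)?$|^-?\d+(\.\d+)?$")

# Hypothetical stand-in for a data frame library: known concept labels.
DATA_FRAME_HINTS = {"location", "population", "latitude", "longitude"}


def classify_cell(text: str) -> str:
    """Return 'concept' or 'value' for one table cell (illustrative only)."""
    cleaned = text.strip()
    if NUMBER.match(cleaned):
        return "value"  # lexical clue: numeric literal
    # Drop a parenthesized qualifier such as "(2000)" before the lookup.
    base = re.sub(r"\s*\(.*?\)\s*", "", cleaned).lower()
    if base in DATA_FRAME_HINTS:
        return "concept"  # data frame clue: recognized label
    return "value"  # assumed default for otherwise unclassified cells


cells = ["Population (2000)", "Latitude", "2,122,869", "-120", "Delaware"]
labels = {cell: classify_cell(cell) for cell in cells}
```

Under these assumptions, "Delaware" falls through to the default and is treated as a data value even though it appears in a label position, which is the "labels as data values" case named on the slide.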

6
Relationship Discovery
  • Dimension Tree Mappings
  • Lexical Clues
  • Generalization/Specialization
  • Aggregation
  • Data Frames
  • Ontology Fragment Merge

7
Constraint Discovery
  • Generalization/Specialization
  • Computed Values
  • Functional Relationships
  • Optional Participation

Region and State Information
Location     Population (2000)  Latitude  Longitude
Northeast            2,122,869
Delaware               817,376        45        -90
Maine                1,305,493        44        -93
Northwest            9,690,665
Oregon               3,559,547        45       -120
Washington           6,131,118        43       -120
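
Three of the constraint types listed on this slide can be checked against the sample table in a few lines. The sketch below is illustrative only and is not taken from MOGO: the helper names (`computed_totals`, `is_functional`, `is_optional`) are hypothetical, and the grouping of states under regions in `REGIONS` is an assumption based on row order.

```python
# Sample table rows: (location, population, latitude, longitude).
# None marks an empty cell in the original table.
ROWS = [
    ("Northeast", 2_122_869, None, None),
    ("Delaware", 817_376, 45, -90),
    ("Maine", 1_305_493, 44, -93),
    ("Northwest", 9_690_665, None, None),
    ("Oregon", 3_559_547, 45, -120),
    ("Washington", 6_131_118, 43, -120),
]

# Assumed nesting: each region is followed by its member states.
REGIONS = {
    "Northeast": ["Delaware", "Maine"],
    "Northwest": ["Oregon", "Washington"],
}


def computed_totals(rows, regions):
    """Regions whose population equals the sum of their states' populations."""
    pop = {name: p for name, p, _, _ in rows}
    return [r for r, states in regions.items()
            if pop[r] == sum(pop[s] for s in states)]


def is_functional(rows, col):
    """True if every location maps to at most one value in column `col`."""
    seen = {}
    for row in rows:
        value = row[col]
        if value is None:
            continue
        if seen.setdefault(row[0], value) != value:
            return False
    return True


def is_optional(rows, col):
    """True if at least one location has no value in column `col`."""
    return any(row[col] is None for row in rows)


totals = computed_totals(ROWS, REGIONS)
```

On this data both region populations are sums of their states (a computed value), each location has exactly one population (a functional relationship), and latitude and longitude are absent for the region rows (optional participation).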
8
Validation
  • Concept/Value Recognition
    • Correctly identified concepts
    • Missed concepts
    • False positives
    • Data values assignment
  • Relationship Discovery
    • Valid relationship sets
    • Invalid relationship sets
    • Missed relationship sets
  • Constraint Discovery
    • Valid constraints
    • Invalid constraints
    • Missed constraints

                        Precision (%)  Recall (%)  F-measure (%)
Concept Recognition                87          94             90
Relationship Discovery             73          81             77
Constraint Discovery               89          91             90
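
The F-measure column can be reproduced from the other two columns. The sketch below assumes the balanced F-measure (the harmonic mean of precision and recall); the slides do not state which variant was used, but this one matches all three reported values after rounding. The names `f_measure` and `REPORTED` are illustrative.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F-measure) from the table, in percent.
REPORTED = {
    "Concept Recognition": (87, 94, 90),
    "Relationship Discovery": (73, 81, 77),
    "Constraint Discovery": (89, 91, 90),
}

# Recompute each F-measure and round to the nearest whole percent.
recomputed = {task: round(f_measure(p, r)) for task, (p, r, _) in REPORTED.items()}
```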
9
Concept Recognition
  • What we counted
    • Correct/Incorrect/Missing Concepts
    • Correct/Incorrect/Missing Labels
    • Data value assignments

10
Relationship Discovery
  • What we counted
    • Correct/incorrect/missing relationship sets
    • Correct/incorrect/missing aggregations and
      generalization/specializations

11
Constraint Discovery
  • What we counted
    • Correct/Incorrect/Missing
      • Generalization/Specialization constraints
      • Computed value constraints
      • Functional constraints
      • Optional constraints

12
Concept Recognition
  • Successes
    • 98% of concepts identified
    • Missing label identification
    • 97% of values assigned to correct concept
  • Common problems
    • Finding an appropriate label
    • Duplicate concepts

13
Relationship Discovery
  • Recall of 92% for relationship sets
  • Missing aggregations and generalizations/specializations
    • Only found in label nesting

14
Constraint Discovery
  • F-measure of 98% for functional relationship sets
  • Poor computed value discovery
    • Rows/Columns with totals

15
Conclusions
  • Tool to generate mini-ontologies
  • Assessment of accuracy of automatic generation

                        Precision (%)  Recall (%)  F-measure (%)
Concept Recognition                87          94             90
Relationship Discovery             73          81             77
Constraint Discovery               89          91             90
16
Future Work
  • Tool Enhancements
    • Linguistic processing
    • Data frame library
    • Domain-specific heuristics
  • Alternate Uses
    • Annotation for the Semantic Web