Title: Mini-Ontology Generation from Canonicalized Tables
- Stephen Lynn
- Data Extraction Research Group
- Department of Computer Science
- Brigham Young University
Supported by the
TANGO Overview
The TANGO (Table ANalysis for Generating Ontologies) project consists of three components:
- Transform tables into a canonicalized form
- Generate mini-ontologies
- Merge into a growing ontology
Sample Input
Region and State Information
Location        Population (2000)  Latitude  Longitude
Northeast       2,122,869
  Delaware      817,376            45        -90
  Maine         1,305,493          44        -93
Northwest       9,690,665
  Oregon        3,559,547          45        -120
  Washington    6,131,118          43        -120
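The slides do not show TANGO's canonical table format. The following is only a hypothetical, simplified encoding of the sample input, in which each data cell is keyed by its path in the Location dimension (a region, or a region followed by a state) and by its column label. The names canonicalize, REGIONS and COLUMNS are illustrative. Knowing the region names in advance is also an assumption, since the real system has to infer nesting from the table layout.

```python
from typing import Dict, Optional, Tuple

# Hypothetical: region labels are assumed known; TANGO infers nesting from layout.
REGIONS = {"Northeast", "Northwest"}
COLUMNS = ("Population (2000)", "Latitude", "Longitude")

Path = Tuple[str, ...]
Cells = Dict[Tuple[Path, str], int]

# Sample input rows; None marks an empty cell.
SAMPLE_ROWS = [
    ("Northeast", "2,122,869", None, None),
    ("Delaware", "817,376", "45", "-90"),
    ("Maine", "1,305,493", "44", "-93"),
    ("Northwest", "9,690,665", None, None),
    ("Oregon", "3,559,547", "45", "-120"),
    ("Washington", "6,131,118", "43", "-120"),
]


def canonicalize(rows) -> Cells:
    """Key each non-empty cell by (location path, column label)."""
    cells: Cells = {}
    region: Optional[str] = None
    for location, *raw_values in rows:
        if location in REGIONS:
            region = location
            path: Path = (location,)
        elif region is not None:
            path = (region, location)  # a state nested under the last region
        else:
            raise ValueError(f"state {location!r} appears before any region")
        for label, raw in zip(COLUMNS, raw_values):
            if raw is not None:
                cells[(path, label)] = int(raw.replace(",", ""))
    return cells


cells = canonicalize(SAMPLE_ROWS)
print(len(cells))  # 2 region cells + 4 states x 3 columns = 14
```

Region rows contribute only a population cell, so the empty latitude and longitude cells simply do not appear in the canonical form.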
Sample Output
Mini-Ontology GeneratOr (MOGO)
- Concept/Value Recognition
- Relationship Discovery
- Constraint Discovery
Concept/Value Recognition
- Lexical Clues
  - Labels as data values
  - Data value assignment
- Data Frame Clues
  - Labels as data values
  - Data value assignment
- Default
  - Classifies any unclassified elements according to a simple heuristic.
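The recognition order on this slide (lexical clues, then data frame clues, then a default) can be sketched as follows. The DATA_FRAMES patterns, the concept names and the default rule (fall back to the column label) are assumptions made for illustration. The slides do not specify MOGO's data frames or its default heuristic.

```python
import re
from typing import List

# Hypothetical data frames: each concept is recognized by a value pattern.
DATA_FRAMES = {
    "Population": re.compile(r"\d{1,3}(,\d{3})+"),
    "Latitude": re.compile(r"-?\d{1,2}(\.\d+)?"),
    "Longitude": re.compile(r"-?\d{1,3}(\.\d+)?"),
}


def recognize(label: str, values: List[str]) -> str:
    """Assign a concept to a column from its label and values."""
    # 1. Lexical clue: the label contains a known concept name.
    for concept in DATA_FRAMES:
        if concept.lower() in label.lower():
            return concept
    # 2. Data frame clue: all values fit exactly one concept's pattern.
    fits = [c for c, pattern in DATA_FRAMES.items()
            if values and all(pattern.fullmatch(v) for v in values)]
    if len(fits) == 1:
        return fits[0]
    # 3. Default (assumed heuristic): use the label text itself.
    return label.strip() or "Unknown"


print(recognize("Population (2000)", ["2,122,869", "817,376"]))  # Population
print(recognize("Lon", ["-90", "-93", "-120"]))  # Longitude
print(recognize("Location", ["Delaware", "Maine"]))  # Location
```

A column holding only values such as 45 and 44 fits both the Latitude and Longitude patterns, so this sketch falls through to the default. That kind of ambiguity is one reason to try lexical clues first.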
Relationship Discovery
- Dimension Tree Mappings
- Lexical Clues
  - Generalization/Specialization
  - Aggregation
- Data Frames
- Ontology Fragment Merge
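Here is one way to picture a dimension tree mapping. Nested levels of the row dimension yield a relationship set between adjacent levels. Each column label yields a relationship set with every level at which it has values. The level names Region and State and the mapping rule itself are assumptions for illustration, because the sample table only labels the dimension Location.

```python
from typing import Dict, List, Set, Tuple

Path = Tuple[str, ...]


def relationship_sets(levels: List[str],
                      cells: Dict[Tuple[Path, str], int]) -> Set[Tuple[str, str]]:
    """Derive (concept, concept) relationship sets from a nested row dimension.

    levels names the row-dimension levels from outermost to innermost;
    each cell key is (path in that dimension, column label).
    """
    rels: Set[Tuple[str, str]] = set()
    # Label nesting: each outer level is related to the level nested inside it.
    for outer, inner in zip(levels, levels[1:]):
        rels.add((outer, inner))
    # A column concept is related to each level whose paths carry a value for it.
    for path, label in cells:
        rels.add((levels[len(path) - 1], label))
    return rels


cells = {
    (("Northeast",), "Population"): 2122869,
    (("Northeast", "Delaware"), "Population"): 817376,
    (("Northeast", "Delaware"), "Latitude"): 45,
}
print(sorted(relationship_sets(["Region", "State"], cells)))
# [('Region', 'Population'), ('Region', 'State'), ('State', 'Latitude'), ('State', 'Population')]
```

Latitude has values only at the State level, so it is related only to State. Population also appears on region rows, so it is related to both levels.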
Constraint Discovery
- Generalization/Specialization
- Computed Values
- Functional Relationships
- Optional Participation
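Two of these constraints can be checked directly against observed (object, value) pairs. The sketch below tests whether a relationship is functional in one direction and whether participation in it is optional. The function names are hypothetical. A check on sample data can only suggest a constraint, not prove it.

```python
from typing import Dict, Hashable, Iterable, Tuple

Pair = Tuple[Hashable, Hashable]


def is_functional(pairs: Iterable[Pair]) -> bool:
    """True if no left-hand object is paired with two different values."""
    seen: Dict[Hashable, Hashable] = {}
    for left, right in pairs:
        if seen.setdefault(left, right) != right:
            return False
    return True


def is_optional(objects: Iterable[Hashable], pairs: Iterable[Pair]) -> bool:
    """True if some object never participates on the left-hand side."""
    participating = {left for left, _ in pairs}
    return any(obj not in participating for obj in objects)


locations = ["Northeast", "Delaware", "Maine", "Northwest", "Oregon", "Washington"]
latitude = [("Delaware", 45), ("Maine", 44), ("Oregon", 45), ("Washington", 43)]

print(is_functional(latitude))                       # True: one latitude per location
print(is_functional([(v, k) for k, v in latitude]))  # False: 45 maps to two states
print(is_optional(locations, latitude))              # True: regions have no latitude
```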
Validation
- Concept/Value Recognition
  - Correctly identified concepts
  - Missed concepts
  - False positives
  - Data value assignment
- Relationship Discovery
  - Valid relationship sets
  - Invalid relationship sets
  - Missed relationship sets
- Constraint Discovery
  - Valid constraints
  - Invalid constraints
  - Missed constraints
Component               Precision (%)  Recall (%)  F-measure (%)
Concept Recognition     87             94          90
Relationship Discovery  73             81          77
Constraint Discovery    89             91          90
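The table presumably uses the usual definitions: precision is correct / (correct + incorrect), recall is correct / (correct + missed), and F-measure is their harmonic mean. Under that assumption, the check below confirms that the reported F-measures agree with the reported precision and recall. The function names are illustrative.

```python
def precision(correct: int, incorrect: int) -> float:
    """Fraction of reported items that are correct."""
    total = correct + incorrect
    return correct / total if total else 0.0


def recall(correct: int, missed: int) -> float:
    """Fraction of expected items that were found."""
    total = correct + missed
    return correct / total if total else 0.0


def f_measure(p: float, r: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if p + r else 0.0


# Reported precision and recall (%) from the results table.
reported = {
    "Concept Recognition": (87, 94),
    "Relationship Discovery": (73, 81),
    "Constraint Discovery": (89, 91),
}
for task, (p, r) in reported.items():
    print(task, round(f_measure(p, r)))
# Concept Recognition 90
# Relationship Discovery 77
# Constraint Discovery 90
```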
Concept Recognition
- What we counted
  - Correct/incorrect/missing concepts
  - Correct/incorrect/missing labels
  - Data value assignments
Relationship Discovery
- What we counted
  - Correct/incorrect/missing relationship sets
  - Correct/incorrect/missing aggregations and generalizations/specializations
Constraint Discovery
- What we counted
  - Correct/incorrect/missing
    - Generalization/specialization constraints
    - Computed value constraints
    - Functional constraints
    - Optional constraints
Concept Recognition
- Successes
  - 98% of concepts identified
  - Missing label identification
  - 97% of values assigned to the correct concept
- Common problems
  - Finding an appropriate label
  - Duplicate concepts
Relationship Discovery
- Recall of 92% for relationship sets
- Missing aggregations and generalizations/specializations
  - Only found in label nesting
Constraint Discovery
- F-measure of 98% for functional relationship sets
- Poor computed value discovery
  - Rows/columns with totals
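Rows and columns holding totals are the problem case for computed values. A minimal check, assuming the label nesting is already known, flags a parent row whose value equals the sum of the rows nested under it. In the sample input, each region's population is exactly the sum of its two states' populations. The function name total_rows is hypothetical. The check ignores averages, rounding, and totals that are not tied to label nesting.

```python
from typing import Dict, List


def total_rows(values: Dict[str, int], children: Dict[str, List[str]]) -> List[str]:
    """Return parent labels whose value equals the sum of their children's values."""
    found = []
    for parent, kids in children.items():
        if parent in values and kids and all(k in values for k in kids):
            if values[parent] == sum(values[k] for k in kids):
                found.append(parent)
    return found


population = {
    "Northeast": 2_122_869, "Delaware": 817_376, "Maine": 1_305_493,
    "Northwest": 9_690_665, "Oregon": 3_559_547, "Washington": 6_131_118,
}
nesting = {"Northeast": ["Delaware", "Maine"], "Northwest": ["Oregon", "Washington"]}
print(total_rows(population, nesting))  # ['Northeast', 'Northwest']
```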
Conclusions
- Tool to generate mini-ontologies
- Assessment of accuracy of automatic generation
Future Work
- Tool Enhancements
  - Linguistic processing
  - Data frame library
  - Domain-specific heuristics
- Alternate Uses
  - Annotation for the Semantic Web