Title: Semantically Conceptualizing and Annotating Tables
1Semantically Conceptualizing and Annotating Tables
- Stephen Lynn and David W. Embley
- Data Extraction Research Group
- Department of Computer Science
- Brigham Young University
Supported by the
2Overview
- Context
- WoK: Web of Knowledge
- TANGO: Table ANalysis for Generating Ontologies
- MOGO: Mini-Ontology GeneratOr
- Semantic Enrichment via MOGO
- Implementation
- Experimentation
- Enhancements
- Challenges and Opportunities
3WoK a Web of Knowledge
4TANGO
TANGO repeatedly turns raw tables into conceptual
mini-ontologies and integrates them into a
growing ontology.
Growing Ontology
5MOGO
MOGO generates mini-ontologies from interpreted
tables.
6MOGO Overview
- Table
- Interpretation
- Yields a canonical table
- Canonical Table
- Concept/Value Recognition
- Relationship Discovery
- Constraint Discovery
- Yields a semantically enriched conceptual model
- Mini-ontology
- Integration into a growing ontology
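The stages listed above can be pictured as a simple pipeline. The following is a minimal sketch only: `CanonicalTable`, `MiniOntology`, and the three stage functions are hypothetical names chosen for illustration, not MOGO's actual API, and the stage bodies are trivial stand-ins for the real recognition logic. It also assumes, purely for illustration, that an interpreted table maps dotted label paths to cell values.

```python
from dataclasses import dataclass, field


@dataclass
class CanonicalTable:
    """Hypothetical interpreted table: tuple of dotted label paths -> cell value."""
    cells: dict  # e.g. {("Region.Northeast", "Year.2000"): "123"}


@dataclass
class MiniOntology:
    concepts: set = field(default_factory=set)
    relationships: set = field(default_factory=set)
    constraints: set = field(default_factory=set)


def recognize_concepts(table, onto):
    # Placeholder rule: the root of each dotted label path is treated as a concept.
    for key in table.cells:
        for path in key:
            onto.concepts.add(path.split(".")[0])
    return onto


def discover_relationships(table, onto):
    # Placeholder rule: concepts that jointly index a cell are considered related.
    for key in table.cells:
        roots = tuple(sorted({path.split(".")[0] for path in key}))
        if len(roots) > 1:
            onto.relationships.add(roots)
    return onto


def discover_constraints(table, onto):
    # Placeholder: real constraint discovery would inspect the data values.
    return onto


def generate_mini_ontology(table):
    """Run the three enrichment stages in order over one canonical table."""
    onto = MiniOntology()
    for stage in (recognize_concepts, discover_relationships, discover_constraints):
        onto = stage(table, onto)
    return onto
```

Keeping each stage as a separate function mirrors the slide's ordering, so one stage can be replaced without touching the others.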
7Sample Input
Sample Output
8Concept/Value Recognition
- Lexical Clues
- Labels as data values
- Data value assignment
- Data Frame Clues
- Labels as data values
- Data value assignment
- Default
- Recognize concepts and values by syntax and
layout
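As a rough illustration of the "labels as data values" idea under data frame clues, the sketch below treats a data frame as a concept name paired with a regular-expression value recognizer. The `DATA_FRAMES` table, its two patterns, and `classify_labels` are assumptions made for this example; the slides do not specify how MOGO's data frames are encoded.

```python
import re

# Hypothetical data frames: concept name -> recognizer for its values.
DATA_FRAMES = {
    "Year": re.compile(r"^(1[89]|20)\d{2}$"),
    "Percentage": re.compile(r"^\d+(\.\d+)?\s?%$"),
}


def classify_labels(labels):
    """Return the concept whose recognizer matches every label, else None.

    A non-None result suggests the labels are data values of that concept
    rather than concept names in their own right.
    """
    if not labels:
        return None
    for concept, pattern in DATA_FRAMES.items():
        if all(pattern.match(label.strip()) for label in labels):
            return concept
    return None
```

For example, column labels such as "2000" and "2001" would be classified as values of a Year concept, while labels such as "Region" and "State" match no recognizer and stay as candidate concept names.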
11Relationship Discovery
- Dimension Tree Mappings
- Lexical Clues
- Generalization/Specialization
- Aggregation
- Data Frames
- Ontology Fragment Merge
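One way to picture dimension tree mappings is to build a tree from nested labels and read each parent/child pair as a candidate relationship. This is a hedged sketch, not MOGO's algorithm: the dotted-path label representation and the functions `dimension_tree` and `parent_child_pairs` are assumptions for illustration, and deciding whether a pair is an aggregation, a generalization/specialization, or an ordinary relationship set would need the lexical and data frame clues listed above.

```python
def dimension_tree(paths):
    """Build a nested dict from dotted label paths such as 'Location.Region.Northeast'."""
    tree = {}
    for path in paths:
        node = tree
        for label in path.split("."):
            node = node.setdefault(label, {})
    return tree


def parent_child_pairs(tree, parent=None):
    """Yield (parent, child) label pairs in depth-first order.

    Each pair is only a candidate relationship; its kind is not decided here.
    """
    for label, children in tree.items():
        if parent is not None:
            yield (parent, label)
        yield from parent_child_pairs(children, label)
```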
13Constraint Discovery
- Generalization/Specialization
- Computed Values
- Functional Relationships
- Optional Participation
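Three of these constraint kinds can be checked directly against table data. The sketch below is illustrative only: the function names, the treatment of empty cells as evidence of optional participation, and the restriction of computed values to simple sums are assumptions, not details given in the slides.

```python
def is_functional(pairs):
    """True if every left-hand value maps to exactly one right-hand value."""
    seen = {}
    for left, right in pairs:
        # setdefault stores the first right-hand value seen for each left-hand value.
        if seen.setdefault(left, right) != right:
            return False
    return True


def is_optional(values):
    """True if at least one cell is empty, so participation cannot be mandatory."""
    return any(v is None or str(v).strip() == "" for v in values)


def is_computed_total(parts, total, tol=1e-9):
    """True if `total` equals the sum of `parts` within a small tolerance."""
    return abs(sum(parts) - total) <= tol
```

A relationship set that passes `is_functional` on the observed data is only a candidate functional constraint, since a larger sample could still contain a counterexample.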
14Validation
- Concept/Value Recognition
- Correctly identified concepts
- Missed concepts
- False positives
- Data values assignment
- Relationship Discovery
- Valid relationship sets
- Invalid relationship sets
- Missed relationship sets
- Constraint Discovery
- Valid constraints
- Invalid constraints
- Missed constraints
15Concept Recognition
- Counted
- Correct/Incorrect/Missing Concepts
- Correct/Incorrect/Missing Labels
- Data value assignments
16Relationship Discovery
- Counted
- Correct/incorrect/missing relationship sets
- Correct/incorrect/missing aggregations and
generalization/specializations
17Constraint Discovery
- Counted
- Correct/Incorrect/Missing
- Generalization/Specialization constraints
- Computed value constraints
- Functional constraints
- Optional constraints
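The correct/incorrect/missing counts above are the inputs to the usual precision, recall, and F-measure scores. The slides do not state the formulas used, so the sketch below assumes the standard definitions, with F-measure as the harmonic mean of precision and recall; the function name is hypothetical.

```python
def precision_recall_f(correct, incorrect, missing):
    """Compute (precision, recall, F-measure) from evaluation counts.

    correct   -- items found that are right (true positives)
    incorrect -- items found that are wrong (false positives)
    missing   -- items that should have been found but were not (false negatives)
    """
    found = correct + incorrect
    expected = correct + missing
    precision = correct / found if found else 0.0
    recall = correct / expected if expected else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure
```

For instance, 8 correct, 2 incorrect, and 2 missing relationship sets give a precision, recall, and F-measure of 0.8 each.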
18Concept Recognition
- Successes
- 98% of concepts identified
- Missing label identification
- 97% of values assigned to correct concept
- Common problems
- Finding an appropriate label
- Duplicate concepts
19Relationship Discovery
- Recall of 92% for relationship sets
- Missing aggregations and gen./spec.s (only found in label nesting)
- Unnecessary rel. sets generated (are computable)
20Constraint Discovery
- F-measure of 98% for functional relationship sets
- Computed value discovery
- Functional/non-functional? Lists in cells
21MOGO Contributions
- Tool to generate mini-ontologies
- Accuracy encouraging
22Opportunities and Challenges: MOGO
- Enhancements
- Check for inter-label relationships
- Check for more complex computations
- Check for lists in cells
- Wish List
- Data-frame library
- Atomic knowledge components
- Instance recognizers
- Library of molecular components
- Semi-automatic construction of a WordNet-like
resource for knowledge components
23Summary
- MOGO
- Semantic Enrichment
- Encouraging Results
- But More Possible
- Broader Implications: Vision and Challenges
- TANGO
- WoK
- Web of Data
- Semantic Annotation
- User-friendly Query Answering
www.deg.byu.edu embley_at_cs.byu.edu
24Opportunities and Challenges: TANGO
- Table Interpretation
- Transforming tables to F-logic [Pivk07]
- Layout-independent table representation [Jha08]
- Table interpretation by sibling tables [Tao07]
- Semantic Enhancement / Ontology Generation
- Naming unnamed table concepts [Pivk07]
- MOGO [Lynn09]
- Semi-automatic Ontology Integration
- Ontology Matching [Euzenat07]
- Ontology-mapping tools [Falconer07]
- Direct and indirect schema mappings for TANGO [Xu06]
25Opportunities and Challenges: WoK
- Web of Data
- "The Semantic Web is a web of data." [W3C]
- Upcoming special issue of Journal of Web Semantics
- Enabling a Web of Knowledge [Tao09]
- Information Extraction
- Domain-independent IE from web tables [Gatterbauer07]
- Open IE [Banko07]
26Opportunities and Challenges: WoK
- Semantic Annotation wrt Ontologies
- Linking Data to Ontologies [Poggi08]
- TISP [Tao07]
- FOCIH [Tao09]
- Reasoning and Query Answering
- Description Logics [Baader03]
- NLIDB Community
- AskOntos [Ding06]
- SerFR [Al-Muhammed07]
27References
- [Al-Muhammed07] Al-Muhammed and Embley, Ontology-Based Constraint Recognition for Free-Form Service Requests, Proceedings of the 23rd International Conference on Data Engineering, 2007.
- [Baader03] Baader, Calvanese, McGuinness, Nardi and Patel-Schneider, The Description Logic Handbook, Cambridge University Press, 2003.
- [Banko07] Banko, Cafarella, Soderland, Broadhead and Etzioni, Open Information Extraction from the Web, Proceedings of the International Joint Conference on Artificial Intelligence, 2007.
- [Ding06] Ding, Embley and Liddle, Automatic Creation and Simplified Querying of Semantic Web Content: An Approach Based on Information-Extraction Ontologies, Proceedings of the First Asian Semantic Web Conference, 2006.
- [Euzenat07] Euzenat and Shvaiko, Ontology Matching, Springer Verlag, 2007.
- [Falconer07] Falconer, Noy and Storey, Ontology Mapping: A User Survey, Proceedings of the Second International Workshop on Ontology Mapping, 2007.
- [Gatterbauer07] Gatterbauer, Bohunsky, Herzog and Pollak, Towards Domain-Independent Information Extraction from Web Tables, Proceedings of the Sixteenth International World Wide Web Conference, 2007.
- [Jha08] Jha and Nagy, Wang Notation Tool: Layout Independent Representation of Tables, Proceedings of the 19th International Conference on Pattern Recognition, 2008.
- [Pivk07] Pivk, Sure, Cimiano, Gams, Rajkovic and Studer, Transforming Arbitrary Tables into Logical Form with TARTAR, Data & Knowledge Engineering, 2007.
- [Poggi08] Poggi, Lembo, Calvanese, De Giacomo, Lenzerini and Rosati, Linking Data to Ontologies, Journal on Data Semantics, 2008.
- [Tao07] Tao and Embley, Automatic Hidden-Web Table Interpretation by Sibling Page Comparison, Proceedings of the 26th International Conference on Conceptual Modeling, 2007.
- [Tao09] Tao, Embley and Liddle, Enabling a Web of Knowledge, Technical Report, tango.byu.edu/papers, 2009.
- [Xu06] Xu and Embley, A Composite Approach to Automating Direct and Indirect Schema Mappings, Information Systems, 2006.