Semantically Conceptualizing and Annotating Tables - PowerPoint PPT Presentation

About This Presentation
Title:

Semantically Conceptualizing and Annotating Tables

Description:

Semantically Conceptualizing and Annotating Tables Stephen Lynn & David W. Embley Data Extraction Research Group Department of Computer Science Brigham Young University – PowerPoint PPT presentation

Slides: 28
Provided by: sgl91
Learn more at: https://www.deg.byu.edu

Transcript and Presenter's Notes

Title: Semantically Conceptualizing and Annotating Tables


1
Semantically Conceptualizing and Annotating Tables
  • Stephen Lynn and David W. Embley
  • Data Extraction Research Group
  • Department of Computer Science
  • Brigham Young University

Supported by the
2
Overview
  • Context
  • WoK: Web of Knowledge
  • TANGO: Table ANalysis for Generating Ontologies
  • MOGO: Mini-Ontology GeneratOr
  • Semantic Enrichment via MOGO
  • Implementation
  • Experimentation
  • Enhancements
  • Challenges & Opportunities

3
WoK: a Web of Knowledge
4
TANGO
fleck     velter
          gonsity (ld/gg)   hepth (gd)
burlam    1.2               120
falder    2.3               230
multon    2.5               400
TANGO repeatedly turns raw tables into conceptual
mini-ontologies and integrates them into a
growing ontology.
Growing Ontology
5
MOGO
MOGO generates mini-ontologies from interpreted
tables.
6
MOGO Overview
  • Table Interpretation
      • Yields a canonical table
  • Canonical Table
      • Concept/Value Recognition
      • Relationship Discovery
      • Constraint Discovery
      • Yields a semantically enriched conceptual model
  • Mini-ontology
      • Integration into a growing ontology

MOGO
7
Sample Input
Region and State Information
Location     Population (2000)   Latitude   Longitude
Northeast    2,122,869
Delaware     817,376             45         -90
Maine        1,305,493           44         -93
Northwest    9,690,665
Oregon       3,559,547           45         -120
Washington   6,131,118           43         -120
Sample Output
8
Concept/Value Recognition
  • Lexical Clues
      • Labels as data values
      • Data value assignment
  • Data Frame Clues
      • Labels as data values
      • Data value assignment
  • Default
      • Recognize concepts and values by syntax and layout
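
The default rule above (recognizing concepts and values by syntax and layout) can be illustrated with a minimal sketch. It is not MOGO's implementation: the row-based table encoding, the names `value_kind` and `recognize`, and the number pattern are all assumptions made for illustration. Header cells are taken as concept labels, the cells beneath them as values, and each value is classified by its syntax alone.

```python
import re

# Hypothetical canonical form of the sample table: a header row plus data rows.
HEADER = ["Location", "Population (2000)", "Latitude", "Longitude"]
ROWS = [
    ["Northeast", "2,122,869", "", ""],
    ["Delaware", "817,376", "45", "-90"],
    ["Maine", "1,305,493", "44", "-93"],
    ["Northwest", "9,690,665", "", ""],
    ["Oregon", "3,559,547", "45", "-120"],
    ["Washington", "6,131,118", "43", "-120"],
]

# Assumed syntax rule: optionally signed numbers, with or without thousands commas.
NUMBER = re.compile(r"^-?\d{1,3}(,\d{3})*(\.\d+)?$|^-?\d+(\.\d+)?$")


def value_kind(cell):
    """Classify one cell by syntax alone (an assumed rule, not MOGO's)."""
    if cell == "":
        return "empty"
    return "number" if NUMBER.match(cell) else "text"


def recognize(header, rows):
    """Layout rule: header cells name concepts; non-empty cells below are values."""
    concepts = {}
    for col, label in enumerate(header):
        values = [row[col] for row in rows if row[col] != ""]
        kinds = {value_kind(v) for v in values}
        concepts[label] = {"values": values, "kinds": kinds}
    return concepts


result = recognize(HEADER, ROWS)
```

Under these assumptions, `Location` comes out as a text-valued concept and the other three columns as number-valued concepts. Lexical and data-frame clues would refine this, for example by splitting `Location` into the concepts Region and State.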

9
Concept/Value Recognition

Region
State
11
Relationship Discovery
  • Dimension Tree Mappings
  • Lexical Clues
      • Generalization/Specialization
      • Aggregation
  • Data Frames
  • Ontology Fragment Merge

13
Constraint Discovery
  • Generalization/Specialization
  • Computed Values
  • Functional Relationships
  • Optional Participation
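
As a rough illustration of three of the constraint kinds listed above, the following sketch checks computed values, functional relationships, and optional participation against the Region and State sample table from slide 7. It is a sketch under stated assumptions, not MOGO's algorithm: the tuple encoding, the parent-region field (read off the table layout), and the function names are hypothetical.

```python
def to_int(text):
    """Parse a population string such as '2,122,869'."""
    return int(text.replace(",", ""))


# Rows of the sample table. The last field (parent region) is an assumption
# read off the layout: each state is listed under its region.
RECORDS = [
    # (location, population, latitude, longitude, parent)
    ("Northeast", "2,122,869", None, None, None),
    ("Delaware", "817,376", "45", "-90", "Northeast"),
    ("Maine", "1,305,493", "44", "-93", "Northeast"),
    ("Northwest", "9,690,665", None, None, None),
    ("Oregon", "3,559,547", "45", "-120", "Northwest"),
    ("Washington", "6,131,118", "43", "-120", "Northwest"),
]


def computed_value_holds(records):
    """True if every region's population equals the sum of its states'."""
    totals = {}
    for _, population, _, _, parent in records:
        if parent is not None:
            totals[parent] = totals.get(parent, 0) + to_int(population)
    own = {loc: to_int(pop) for loc, pop, _, _, _ in records}
    return all(own[region] == total for region, total in totals.items())


def is_functional(records, key, value):
    """True if each non-missing key field maps to exactly one value field."""
    seen = {}
    for record in records:
        k, v = record[key], record[value]
        if k is None:
            continue
        if seen.setdefault(k, v) != v:
            return False
    return True


def is_optional(records, field):
    """True if at least one record has no value for the field."""
    return any(record[field] is None for record in records)
```

On this data the regional populations are computable from the state populations, Location to Population is functional, Latitude to Location is not (two states share latitude 45), and Latitude is optional because the region rows carry none.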

14
Validation
  • Concept/Value Recognition
      • Correctly identified concepts
      • Missed concepts
      • False positives
      • Data value assignments
  • Relationship Discovery
      • Valid relationship sets
      • Invalid relationship sets
      • Missed relationship sets
  • Constraint Discovery
      • Valid constraints
      • Invalid constraints
      • Missed constraints

                         Precision (%)   Recall (%)   F-measure (%)
Concept Recognition      87              94           90
Relationship Discovery   73              81           77
Constraint Discovery     89              91           90
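
The reported F-measures are consistent with the balanced harmonic mean of precision and recall (F1). The slides do not state which F-measure variant was used, so that is an assumption here; the short check below recomputes each value from the precision and recall in the table. The names `f_measure`, `REPORTED`, and `recomputed` are illustrative.

```python
def f_measure(precision, recall):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F-measure), all in percent, from the table above.
REPORTED = {
    "Concept Recognition": (87, 94, 90),
    "Relationship Discovery": (73, 81, 77),
    "Constraint Discovery": (89, 91, 90),
}

# Recompute each F-measure and round to whole percent for comparison.
recomputed = {
    task: round(f_measure(p, r)) for task, (p, r, _) in REPORTED.items()
}

for task, (p, r, reported) in REPORTED.items():
    print(f"{task}: reported {reported}, recomputed {recomputed[task]}")
```

Rounded to whole percentages, the recomputed values match the reported ones for all three tasks.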
15
Concept Recognition
  • Counted
      • Correct/incorrect/missing concepts
      • Correct/incorrect/missing labels
      • Data value assignments

16
Relationship Discovery
  • Counted
      • Correct/incorrect/missing relationship sets
      • Correct/incorrect/missing aggregations and
        generalization/specializations

17
Constraint Discovery
  • Counted
      • Correct/incorrect/missing
          • Generalization/specialization constraints
          • Computed value constraints
          • Functional constraints
          • Optional constraints

18
Concept Recognition
  • Successes
      • 98% of concepts identified
      • Missing label identification
      • 97% of values assigned to the correct concept
  • Common problems
      • Finding an appropriate label
      • Duplicate concepts

19
Relationship Discovery
  • Recall of 92% for relationship sets
  • Missing aggregations and generalization/specializations
    (only found in label nesting)
  • Unnecessary relationship sets generated (they are computable)

20
Constraint Discovery
  • F-measure of 98% for functional relationship sets
  • Computed value discovery
  • Functional/non-functional: lists in cells

21
MOGO Contributions
  • Tool to generate mini-ontologies
  • Accuracy encouraging

22
Opportunities & Challenges: MOGO
  • Enhancements
      • Check for inter-label relationships
      • Check for more complex computations
      • Check for lists in cells
  • Wish List
      • Data-frame library
      • Atomic knowledge components
      • Instance recognizers
      • Library of molecular components
      • Semi-automatic construction of a WordNet-like
        resource for knowledge components

23
Summary
  • MOGO
      • Semantic Enrichment
      • Encouraging Results
      • But More Possible
  • Broader Implications: Vision & Challenges
      • TANGO
      • WoK
          • Web of Data
          • Semantic Annotation
          • User-friendly Query Answering

www.deg.byu.edu embley_at_cs.byu.edu
24
Opportunities & Challenges: TANGO
  • Table Interpretation
      • Transforming tables to F-logic [Pivk07]
      • Layout-independent table representation [Jha08]
      • Table interpretation by sibling tables [Tao07]
  • Semantic Enhancement / Ontology Generation
      • Naming unnamed table concepts [Pivk07]
      • MOGO [Lynn09]
  • Semi-automatic Ontology Integration
      • Ontology Matching [Euzenat07]
      • Ontology-mapping tools [Falconer07]
      • Direct and indirect schema mappings for TANGO [Xu06]

25
Opportunities & Challenges: WoK
  • Web of Data
      • "The Semantic Web is a web of data." [W3C]
      • Upcoming special issue of Journal of Web Semantics
      • Enabling a Web of Knowledge [Tao09]
  • Information Extraction
      • Domain-independent IE from web tables [Gatterbauer07]
      • Open IE [Banko07]

26
Opportunities & Challenges: WoK
  • Semantic Annotation wrt Ontologies
      • Linking Data to Ontologies [Poggi08]
      • TISP [Tao07]
      • FOCIH [Tao09]
  • Reasoning & Query Answering
      • Description Logics [Baader03]
      • NLIDB Community
      • AskOntos [Ding06]
      • SerFR [Al-Muhammed07]

27
References
  • [Al-Muhammed07] Al-Muhammed and Embley,
    Ontology-Based Constraint Recognition for
    Free-Form Service Requests, Proceedings of the
    23rd International Conference on Data
    Engineering, 2007.
  • [Baader03] Baader, Calvanese, McGuinness, Nardi and
    Patel-Schneider, The Description Logic Handbook,
    Cambridge University Press, 2003.
  • [Banko07] Banko, Cafarella, Soderland, Broadhead
    and Etzioni, Open Information Extraction from
    the Web, Proceedings of the International Joint
    Conference on Artificial Intelligence, 2007.
  • [Ding06] Ding, Embley and Liddle, Automatic
    Creation and Simplified Querying of Semantic Web
    Content: An Approach Based on
    Information-Extraction Ontologies, Proceedings of
    the First Asian Semantic Web Conference, 2006.
  • [Euzenat07] Euzenat and Shvaiko, Ontology
    Matching, Springer Verlag, 2007.
  • [Falconer07] Falconer, Noy and Storey, Ontology
    Mapping: A User Survey, Proceedings of the Second
    International Workshop on Ontology Mapping, 2007.
  • [Gatterbauer07] Gatterbauer, Bohunsky, Herzog and
    Pollak, Towards Domain-Independent Information
    Extraction from Web Tables, Proceedings of the
    Sixteenth International World Wide Web
    Conference, 2007.
  • [Jha08] Jha and Nagy, Wang Notation Tool: Layout
    Independent Representation of Tables,
    Proceedings of the 19th International Conference
    on Pattern Recognition, 2008.
  • [Pivk07] Pivk, Sure, Cimiano, Gams, Rajkovic and
    Studer, Transforming Arbitrary Tables into
    Logical Form with TARTAR, Data & Knowledge
    Engineering, 2007.
  • [Poggi08] Poggi, Lembo, Calvanese, De Giacomo,
    Lenzerini and Rosati, Linking Data to
    Ontologies, Journal on Data Semantics, 2008.
  • [Tao07] Tao and Embley, Automatic Hidden-Web
    Table Interpretation by Sibling Page Comparison,
    Proceedings of the 26th International Conference
    on Conceptual Modeling, 2007.
  • [Tao09] Tao, Embley and Liddle, Enabling a Web
    of Knowledge, Technical Report
    tango.byu.edu/papers, 2009.
  • [Xu06] Xu and Embley, A Composite Approach to
    Automating Direct and Indirect Schema Mappings,
    Information Systems, 2006.