Generating Data-Extraction Ontologies By Example

Slides: 31
Provided by: joez
Learn more at: https://www.deg.byu.edu

Transcript and Presenter's Notes

1
Generating Data-Extraction Ontologies By Example
  • Joe Zhou
  • Data Extraction Group
  • Brigham Young University

2
Background
  • The World Wide Web contains a huge amount of useful
    information.
  • Web data extraction is necessary for querying the
    data of interest.
  • Most wrappers generate extraction patterns based
    on delimiters or HTML tags, so they are
    source-dependent.
  • The BYU ontology-based technique, by contrast, is
    resilient.

3
Problem and Solution
  • The BYU ontology-based approach requires that
    ontology experts generate data-extraction
    ontologies for the domains of interest to
    ordinary users.
  • A principal effort of our research is to automate
    the ontology-generation process as much as
    possible.
  • We developed a system, OntoByE (Ontology By
    Example), to generate data-extraction ontologies
    semi-automatically.

4
Extraction Ontology
  • Object sets, Relationship sets and Constraints
  • Data frames for Lexical Object Sets

5
Extraction Ontology
Object sets, Relationship sets and Constraints
Data frame for Digital Zoom
6
OntoByE System Overview and Architecture
7
OntoByE User Interface
8
Form Editor: Basic Form Elements
9
Form Editor: Nesting Forms
10
Form Editor: Creating Forms for Digital Camera Application
11
Training Web Document Preparation
12
Ontology Generator: Workflow
Data Frames
Object Sets, Relationship Sets and Constraints
Extraction Ontology
13
Ontology Generator: Form Analyzer
Sample Form
Object and Relationship Sets and Constraints
  • BaseForm 01 A 1
  • BaseForm 03 B 1
  • BaseForm 0 C 1
  • BaseForm 03 D1 1 D2 1 D3 1
  • BaseForm 0 E1 1 E2 1 E3 1

14
Ontology Generator: Form Analyzer
  • Object and Relationship Sets and Constraints
  • Digital Camera Application Forms

15
Ontology Generator: Context Phrase Locator
16
Ontology Generator: Data Frame Matcher
  • Data Frame Matching Heuristics
      • Number of matched data
  • Data Frame Ranking Heuristics
      • Number of matched data
      • Keywords and/or Contexts Matching
      • Order of Specialization/Generalization
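
The ranking heuristics listed above can be sketched as a simple sort. This is a minimal illustration, not OntoByE's actual implementation: the `DataFrame` class, the `specificity` field, the rule that a frame must match at least one marked value, and the choice to prefer more specialized frames on ties are all assumptions made for the example.

```python
import re
from dataclasses import dataclass, field


@dataclass
class DataFrame:
    name: str
    value_pattern: str                      # regex recognizing values
    keywords: list = field(default_factory=list)
    specificity: int = 0                    # hypothetical: higher = more specialized


def rank_data_frames(frames, marked_values, context_text):
    """Rank candidate data frames for one object set, best first."""
    context = context_text.lower()
    scored = []
    for frame in frames:
        # Count how many user-marked values the frame recognizes in full.
        matched = sum(
            1 for v in marked_values if re.fullmatch(frame.value_pattern, v)
        )
        if matched == 0:
            continue  # assumed matching rule: at least one value must match
        keyword_hits = sum(1 for k in frame.keywords if k.lower() in context)
        scored.append((matched, keyword_hits, frame.specificity, frame.name))
    # Sort by matched count, then keyword hits, then specialization (assumed order).
    scored.sort(key=lambda s: (-s[0], -s[1], -s[2], s[3]))
    return [name for _, _, _, name in scored]


frames = [
    DataFrame("RealNumber", r"[+-]?\d+\.\d+", specificity=0),
    DataFrame("SmallPositiveReal", r"\d{1,2}\.\d{1,2}", specificity=1),
    DataFrame("Price", r"\$\d+(\.\d{2})?", keywords=["price"], specificity=1),
]
ranked = rank_data_frames(frames, ["4.1", "3.2", "10.0"], "Digital Zoom - 4.1 x")
print(ranked)
```

Here both numeric frames match all three marked values, so the tie is broken by the assumed specialization order; the Price frame matches nothing and is dropped.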

17
Ontology Generator: Keyword and Context Recognizer
18
Ontology Generator: Data Frame Editor
19
Extraction Ontology
20
Experimental Preparation
  • Selected two domains of interest
      • Digital Camera Application and Apartment Rental
        Application
  • Constructed an initial data frame library
      • Integer (any integer value), SmallPositiveInteger
        (from 1 to 99), SingleDigit (from 0 to 9),
        RealNumber (any real value), SmallPositiveReal
        (from 0.01 to 99.99), Date, Email, PhoneNumber,
        and Price
  • Created application-dependent forms for each
    application
  • Collected 5 sample pages from different web sites
    for each domain
  • Marked desired data on sample pages
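
The numeric part of the initial library can be pictured as named value recognizers. The sketch below is an assumption-laden illustration: the regular expressions, the `LIBRARY` table, and the `matching_frames` helper are hypothetical, the ranges come from the slide text, RealNumber is assumed to require a decimal point, and Date, Email, PhoneNumber, and Price are left out.

```python
import re

# Hypothetical encoding: name -> (regex, optional inclusive numeric range).
LIBRARY = {
    "Integer": (r"[+-]?\d+", None),
    "SmallPositiveInteger": (r"\d{1,2}", (1, 99)),
    "SingleDigit": (r"\d", (0, 9)),
    "RealNumber": (r"[+-]?\d+\.\d+", None),
    "SmallPositiveReal": (r"\d{1,2}\.\d{1,2}", (0.01, 99.99)),
}


def matching_frames(value):
    """Return the names of library data frames that accept `value`."""
    names = []
    for name, (pattern, bounds) in LIBRARY.items():
        if re.fullmatch(pattern, value) is None:
            continue
        if bounds is not None:
            low, high = bounds
            # The regex only checks shape; the range check enforces the limits.
            if not (low <= float(value) <= high):
                continue
        names.append(name)
    return names


print(matching_frames("4.1"))
print(matching_frames("7"))
```

A single marked value usually fits several frames at once, which is why the data frame matcher needs ranking heuristics to choose among them.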

21
Experimental Results: Digital Camera Application
22
Experimental Results: Digital Camera Application
23
Experimental Results: Apartment Rental Application
24
Experimental Results: Apartment Rental Application
25
Experimental Results: Apartment Rental Application
26
Experimental Observations: Strengths of OntoByE
  • OntoByE provides a friendly and intuitive
    interface that helps ordinary users describe data
    of interest without exposing them to abstract
    ontology concepts.
  • With a small initial data frame library and a
    small set of sample pages, OntoByE works well to
    search for and suggest appropriate existing data
    frames for object sets with
    application-independent values.
  • OntoByE successfully recognizes possible keywords
    and contexts for user-marked data from sample
    pages and helps users create new data frames
    with those keywords and contexts.

27
Experimental Observations: Limitations of OntoByE
  • The performance of searching for or constructing
    data frames by OntoByE is limited by the scope
    and the quality of prior knowledge
  • The accuracy and completeness of keyword and
    context expression construction are limited by
    the number and representativeness of user samples
  • Constructing value expressions for
    application-dependent data frames requires that
    users know how to write regular expressions.
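
To give a sense of what writing such a value expression involves, here is a hedged sketch for a Digital Zoom data frame that recognizes values such as "4.1 x". Both patterns and the `extract_digital_zoom` helper are illustrative assumptions, not expressions taken from OntoByE.

```python
import re

# Hypothetical value expression: a number followed by "x", e.g. "4.1 x" or "4x".
DIGITAL_ZOOM_VALUE = re.compile(r"\b(\d{1,2}(?:\.\d)?)\s*x\b", re.IGNORECASE)
# Hypothetical keyword expression expected somewhere in the same text.
DIGITAL_ZOOM_KEYWORD = re.compile(r"digital\s+zoom", re.IGNORECASE)


def extract_digital_zoom(text):
    """Return the zoom factor as a string if keyword and value both occur, else None."""
    if DIGITAL_ZOOM_KEYWORD.search(text) is None:
        return None
    match = DIGITAL_ZOOM_VALUE.search(text)
    return match.group(1) if match else None


print(extract_digital_zoom("Digital Zoom - 4.1 x"))
print(extract_digital_zoom("Optical Zoom - 3 x"))
```

The sketch takes the first number-plus-"x" it finds once the keyword is present, so text listing several zoom factors would need a tighter context expression.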

28
Conclusion
  • We implemented a user-friendly interface for
    ordinary users to take advantage of our
    ontology-based web data-extraction approach.
  • We developed a framework for interacting with
    ordinary users to generate ontologies by example.
  • Our experiments demonstrate that OntoByE works
    well to generate ontologies with the assistance
    of limited prior knowledge. As prior knowledge
    expands over time, OntoByE should achieve better
    performance.

29
Future Work
  • Have OntoByE learn to build application-dependent
    lexicons for users' applications
  • Improve the sub-components of the back-end
    ontology generator, e.g., the Context Phrase Locator

30
  • The end