WebOQL: Restructuring Documents, Databases and Webs

1
WebOQL: Restructuring Documents, Databases and Webs
  • Gustavo O. Arocena
  • Alberto O. Mendelzon

2
Background
  • Background
      • widespread use of the Web
      • new data management problems
  • What is WebOQL?
      • a combination of architecture, data model, and query
        language for extracting information from online
        structured documents effectively, without
        custom-tailored programs

3
Data Model
  • Tree-based data model
      • hypertrees: ordered arc-labeled trees
      • two kinds of arcs: internal and external
      • internal arcs represent structured objects
      • external arcs represent references among objects
      • external arcs cannot have descendants
      • the record labeling an external arc must have a
        field named Url
  • Hypertrees are useful because they subsume three
    abstractions: collections, nesting, and ordering
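
The constraints on this slide can be sketched in Python. This is only an illustration, not WebOQL's implementation: the `Arc` class, its field names, and the sample records and URL are all assumptions made for the example. A hypertree is modeled as an ordered list of arcs stemming from a root.

```python
from dataclasses import dataclass, field


@dataclass
class Arc:
    """One arc of a hypertree (hypothetical representation)."""
    label: dict                      # the record that labels the arc
    external: bool = False           # internal arcs are the default
    subtree: list = field(default_factory=list)  # ordered list of Arc

    def __post_init__(self):
        # The two rules stated for external arcs on the slide.
        if self.external:
            if "Url" not in self.label:
                raise ValueError("external arc needs a field named Url")
            if self.subtree:
                raise ValueError("external arc cannot have descendants")


# Nesting: a group contains a paper, which references its full text.
full_text = Arc({"Label": "Full text", "Url": "http://example.org/p.ps"},
                external=True)
paper = Arc({"Title": "Some Paper", "Authors": "Smith"}, subtree=[full_text])
group = [Arc({"Group": "Databases"}, subtree=[paper])]
```

Python lists give ordering and collections for free, and the `subtree` field gives nesting, which are the three abstractions the slide says hypertrees subsume.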

4
Data Model
  • Web
      • a pair (t, F) consisting of a hypertree t and a
        function F that maps URLs to hypertrees
      • t is the schema of the web
      • F is the browsing function of the web
  • Tails
      • the tails of a tree t are the trees obtained by
        chopping off prefixes of t
  • Simple trees
      • the simple trees of t are the trees composed of one
        arc stemming from t's root, followed by the subtree
        at the end of that arc
  • Subtrees
      • the subtrees of t are the trees at the end of
        the arcs that stem from t's root
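
A small Python sketch may make the three decompositions concrete. It assumes a tree is a list of `(label, subtree)` pairs, a representation chosen for this example only. It also assumes that the tails include the tree itself and exclude the empty tree; the slide does not settle that detail.

```python
def tails(t):
    """Trees obtained by chopping off prefixes of t (t itself comes first)."""
    return [t[i:] for i in range(len(t))]


def simple_trees(t):
    """One tree per arc stemming from the root: that arc plus its subtree."""
    return [[arc] for arc in t]


def subtrees(t):
    """The trees hanging at the end of the arcs that stem from the root."""
    return [sub for (_label, sub) in t]


# Hypothetical three-arc tree; only the second arc has a descendant.
t = [
    ({"Tag": "H3"}, []),
    ({"Tag": "P"}, [({"Text": "hello"}, [])]),
    ({"Tag": "H3"}, []),
]

for name, parts in [("tails", tails(t)),
                    ("simple trees", simple_trees(t)),
                    ("subtrees", subtrees(t))]:
    print(name, [len(p) for p in parts])
```

Each simple tree always has exactly one arc at its root, while a tail can have several; that difference is what distinguishes ordinary variables from tail variables later in the deck.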

5
Data Model
  • Operators
      • Prime (') returns the first subtree of its argument
      • Peek (.) extracts a field from the record that
        labels the first outgoing arc of its argument
      • Hang ([ ]) builds an arc labeled with a record
        formed from its arguments
      • Tilde (~) is used for string pattern matching
      • Concatenate (+) juxtaposes two trees
      • Head (& n) returns the first n simple trees of a
        tree (n > 0)
      • Tail (!) returns all but the first simple tree of
        a tree
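
The operators can be mimicked with a few Python functions. This is a sketch under assumptions, not the WebOQL engine: trees are lists of `(label, subtree)` pairs, the function names are invented for the example, and string matching is approximated with Python regular expressions.

```python
import re


def prime(t):
    """Prime: the first subtree of t."""
    return t[0][1]


def peek(t, field_name):
    """Peek: a field of the record labeling t's first outgoing arc."""
    return t[0][0][field_name]


def hang(record, subtree=None):
    """Hang: a tree with a single arc labeled by record, above subtree."""
    return [(dict(record), list(subtree or []))]


def concatenate(t1, t2):
    """Concatenate: juxtapose the arcs of two trees, keeping their order."""
    return t1 + t2


def head(t, n=1):
    """Head: the first n simple trees of t (n > 0)."""
    return t[:n]


def tail(t):
    """Tail: all but the first simple tree of t."""
    return t[1:]


def matches(text, pattern):
    """Stand-in for string pattern matching, using Python regex syntax."""
    return re.search(pattern, text) is not None


# Hypothetical sample tree with three arcs.
t = [({"Title": "A"}, [({"Text": "x"}, [])]),
     ({"Title": "B"}, []),
     ({"Title": "C"}, [])]
item = hang({"Tag": "LI"}, tail(t))   # an LI arc above the last two arcs
```

With this representation, `concatenate(head(t), tail(t))` rebuilds `t`, which is a quick sanity check that Head and Tail split a tree without losing arcs.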

6
Operators
7
Query
  • Input of a query in WebOQL: URL dereferencing
      • URL dereferencing means replacing a URL with the
        result of applying the browsing function of the
        current web to it
  • Web restructuring query
      • a function that maps a web into another web
      • example:
          select [y.Title, y.Url] as schema
          from x in csPapers, y in x'
          where y.Authors ~ "Smith"
      • generalization:
          select q1 as s1, q2 as s2, ..., qm as sm
      • each qi is a query
      • each si is either a string query or the keyword schema
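
The example query can be read as a nested iteration over simple trees. The Python sketch below imitates it under several assumptions: trees are lists of `(label, subtree)` pairs, the browsing function `F` is a plain dictionary, and the URL and paper records are made up for illustration.

```python
import re

# Hypothetical browsing function F: maps URLs to hypertrees.
F = {
    "http://example.org/csPapers": [
        ({"Group": "Databases"}, [
            ({"Title": "Paper A", "Authors": "Smith, Jones",
              "Url": "http://example.org/a"}, []),
            ({"Title": "Paper B", "Authors": "Lee",
              "Url": "http://example.org/b"}, []),
        ]),
        ({"Group": "Networks"}, [
            ({"Title": "Paper C", "Authors": "Smith",
              "Url": "http://example.org/c"}, []),
        ]),
    ],
}


def browse(url):
    """URL dereferencing: apply the browsing function to a URL."""
    return F[url]


def simple_trees(t):
    return [[arc] for arc in t]


def prime(t):
    return t[0][1]


def peek(t, field_name):
    return t[0][0][field_name]


cs_papers = browse("http://example.org/csPapers")

# select [y.Title, y.Url]  from x in csPapers, y in x'
# where y.Authors ~ "Smith"
schema = [
    ({"Title": peek(y, "Title"), "Url": peek(y, "Url")}, [])
    for x in simple_trees(cs_papers)      # x ranges over the groups
    for y in simple_trees(prime(x))       # y ranges over a group's papers
    if re.search("Smith", peek(y, "Authors"))
]
```

The resulting `schema` is itself a tree: one arc per matching paper, in document order. The sketch covers only the selection; building the browsing function of the new web is left out.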

8
Restructuring Web
9
Structured Documents
  • Structured documents are modeled as abstract syntax
    trees (ASTs)
  • Each arc in an AST corresponds either to a
    subdocument enclosed in an occurrence of a paired
    tag, to a nonpaired tag, or to a piece of
    untagged text
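
As a rough illustration of the three kinds of arcs, the sketch below builds such a tree from HTML with Python's standard `html.parser`. The label layout (`Tag` and `Text` fields), the small set of nonpaired tags, and the class name are assumptions of this example, not the mapping WebOQL itself uses.

```python
from html.parser import HTMLParser

# Nonpaired tags recognized by this sketch (deliberately incomplete).
VOID = {"br", "hr", "img"}


class ASTBuilder(HTMLParser):
    """Builds a tree as nested lists of (label, subtree) pairs."""

    def __init__(self):
        super().__init__()
        self.root = []
        self.stack = [self.root]     # subtree currently being filled

    def handle_starttag(self, tag, attrs):
        subtree = []
        label = {"Tag": tag.upper(), **dict(attrs)}
        self.stack[-1].append((label, subtree))
        if tag not in VOID:          # paired tag: descend into it
            self.stack.append(subtree)

    def handle_endtag(self, tag):
        if tag not in VOID and len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:                     # a piece of untagged text
            self.stack[-1].append(({"Text": text}, []))


builder = ASTBuilder()
builder.feed("<h3>Title</h3><p>Hello<br>world</p>")
builder.close()
ast = builder.root
```

Here the `H3` and `P` arcs come from paired tags and carry subtrees, the `BR` arc comes from a nonpaired tag, and the three text arcs are leaves.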

10
Restructuring Documents
  • Navigational patterns
      • regular expressions over an alphabet of record
        predicates
  • Tail variables
      • iterate over tails instead of simple trees
      • indicated by a capital letter
      • example:
          [Tag: "OL" /
            select [Tag: "LI" / X&3]
            from X in browse("Card Punching.html")!
            where X.Tag = "H3"]
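
To show what iterating over tails buys, here is a Python sketch of a query of this shape: for every tail that starts with an H3 arc, keep its first three arcs as one list item. The tree representation (lists of `(label, subtree)` pairs), the sample document, and the reading of the example as "head of length 3" are assumptions of this sketch.

```python
def tails(t):
    """Trees obtained by chopping off prefixes of t."""
    return [t[i:] for i in range(len(t))]


# Hypothetical flat document: headings followed by paragraphs.
doc = [
    ({"Tag": "H1"}, []),
    ({"Tag": "H3"}, []),
    ({"Tag": "P", "Text": "a1"}, []),
    ({"Tag": "P", "Text": "a2"}, []),
    ({"Tag": "P", "Text": "a3"}, []),
    ({"Tag": "H3"}, []),
    ({"Tag": "P", "Text": "b1"}, []),
]

# X ranges over the tails of the document after its first arc is dropped.
items = [
    ({"Tag": "LI"}, X[:3])               # first three simple trees of X
    for X in tails(doc[1:])
    if X[0][0].get("Tag") == "H3"        # the tail must begin with an H3
]
result = [({"Tag": "OL"}, items)]
```

A variable over simple trees would see each H3 arc in isolation; a tail variable also sees the arcs that follow it, which is why it can group a heading with the paragraphs after it.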