1
Symmetrically Exploiting XML
  • Shuohao Zhang and Curtis Dyreson
  • School of E.E. and Computer Science
  • Washington State University
  • Pullman, Washington, USA
  • The 15th International World Wide Web Conference
  • May 2006
  • Edinburgh, Scotland

2
1970s Database Controversy
  • Hierarchical model vs. relational model
  • Codd: symmetric exploitation of data
  • A part/project query works on some structures, but not all
  • Path expressions are asymmetric
  • Currently, all XML query languages use path expressions

[Figure: alternative hierarchical arrangements of Part, Project, and Commit]

3
Querying Data with Path Expressions
  • Task: find books by E. F. Codd
  • XQuery:
    return doc("author.xml")//author[name = 'E. F. Codd']/book

4
Same Data, Different Structure
[Figure: the same data drawn as two trees. One is rooted at author, with name and book children; the other is rooted at book, with title, author/name, publisher, and price. Values shown: E. F. Codd; DB, 46.95; Automata, 9.99; Addison Wesley; Academic Press.]
  • Same task: find books by E. F. Codd
  • Needs a different XQuery:
    return doc("book.xml")//book[author/name = 'E. F. Codd']
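
The structure dependence can be reproduced outside XQuery. The following is a minimal sketch in Python with the standard xml.etree.ElementTree module; the two sample documents, the bib wrapper element, and the helper names are assumptions made for illustration, not data from the talk. Each helper mirrors one of the two path expressions and returns nothing when run against the other structure.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: the same facts nested in two different ways.
AUTHOR_XML = (
    "<bib><author><name>E. F. Codd</name>"
    "<book><title>DB</title><price>46.95</price></book>"
    "<book><title>Automata</title><price>9.99</price></book>"
    "</author></bib>"
)
BOOK_XML = (
    "<bib>"
    "<book><title>DB</title><author><name>E. F. Codd</name></author>"
    "<price>46.95</price></book>"
    "<book><title>Automata</title><author><name>E. F. Codd</name></author>"
    "<price>9.99</price></book>"
    "</bib>"
)

def books_author_rooted(root):
    # Mirrors //author[name = 'E. F. Codd']/book
    return root.findall(".//author[name='E. F. Codd']/book")

def books_book_rooted(root):
    # Mirrors //book[author/name = 'E. F. Codd']; ElementTree predicates do not
    # accept a nested path, so the condition is checked in Python instead.
    return [b for b in root.iter("book")
            if b.findtext("author/name") == "E. F. Codd"]

author_doc = ET.fromstring(AUTHOR_XML)
book_doc = ET.fromstring(BOOK_XML)

print([b.findtext("title") for b in books_author_rooted(author_doc)])  # ['DB', 'Automata']
print([b.findtext("title") for b in books_book_rooted(book_doc)])      # ['DB', 'Automata']
print(books_author_rooted(book_doc))  # [] : the author-rooted query finds nothing here
print(books_book_rooted(author_doc))  # [] : and vice versa
```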

5
Goal
  • Make same query work on different structures
  • Useful when there is a lack of schema knowledge, heterogeneous
    data, irregular data, or schema evolution
  • The problem of differing label sets is set aside; others are
    working on it

6
Existing Axes are Directional
[Figure: the directional axes around a context node: ancestor, self, preceding, following, descendant]

7
Proposal: A Non-directional Axis
[Figure: the same context node, with the ancestor, self, preceding, following, and descendant regions covered by one non-directional axis]

10
The Closest Axis
  • Syntax: closest::
  • ->name is an abbreviation for closest::name
  • Semantics: a function that takes a context node and returns
    a sequence of closest nodes

11
Closest Axis of the First Title
  • closest::* returns a list of five nodes
  • closest::price returns the first price node

[Figure: an author tree with a name child and two book children; each book has title, publisher, and price]

12
When the First Book Lacks a Price
  • Node selection is restricted by minimal type distance
  • The minimal distance between a title and a price is 2
  • closest::price returns an empty list

[Figure: the same author tree, but the first book has no price child]

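
The role of type distance can be checked with a small Python sketch. It assumes that distance means the number of edges on the tree path between two nodes and that every title (and every price) in the document has the same type; the sample data, the pairing of publishers with titles, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

def parent_map(root):
    """Child -> parent lookup; ElementTree nodes do not store their parent."""
    return {child: node for node in root.iter() for child in node}

def distance(a, b, parent):
    """Edges on the tree path from a to b, through their lowest common ancestor."""
    steps_up_from_a, n, d = {}, a, 0
    while n is not None:
        steps_up_from_a[n] = d
        n, d = parent.get(n), d + 1
    n, d = b, 0
    while n not in steps_up_from_a:
        n, d = parent[n], d + 1
    return d + steps_up_from_a[n]

def closest_prices(root, title):
    """Prices whose distance from `title` equals the title/price type distance."""
    parent = parent_map(root)
    titles, prices = list(root.iter("title")), list(root.iter("price"))
    # Type distance: the minimum over every (title, price) pair in the document.
    type_distance = min(distance(t, p, parent) for t in titles for p in prices)
    return [p for p in prices if distance(title, p, parent) == type_distance]

BOOK = "<book><title>{}</title><publisher>{}</publisher>{}</book>"

def author_doc(first_price):
    # Hypothetical data modelled on the author tree in the slides.
    return ET.fromstring(
        "<author><name>E. F. Codd</name>"
        + BOOK.format("DB", "Addison Wesley", first_price)
        + BOOK.format("Automata", "Academic Press", "<price>9.99</price>")
        + "</author>"
    )

complete = author_doc("<price>46.95</price>")
damaged = author_doc("")  # the first book lacks a price

print([p.text for p in closest_prices(complete, complete.find("book/title"))])  # ['46.95']
print(closest_prices(damaged, damaged.find("book/title")))                      # []
```

In the damaged document the only price is 4 edges away from the first title, which exceeds the type distance of 2, so nothing is returned rather than a price belonging to another book.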
13
Type Distance is Crucial
  • closest::name for each book?
  • Root-to-node path type:
  • author/name
  • author/book/publisher/name

[Figure: the author tree, now with a second name node below a publisher]

14
Querying with the Closest Axes
  • Same query:
    return doc("any.xml")->author[->name = 'E. F. Codd']->book

[Figure: the query is evaluated by a closest axis-enabled XQuery evaluation engine, producing Result3]

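
A rough Python sketch of how one closest-based query can serve both structures follows. It is a naive illustration, not the authors' engine: the Doc class, the books_by helper, the bib wrapper element standing in for the document node, and the sample data (including the second author) are all assumptions. A node's type is taken to be its root-to-node label path, and a candidate is closest when its distance from the context node equals the minimum distance over all node pairs of the two types.

```python
import xml.etree.ElementTree as ET

class Doc:
    def __init__(self, xml_text):
        self.root = ET.fromstring(xml_text)
        self.parent = {self.root: None}
        self.type = {self.root: self.root.tag}  # type = root-to-node label path
        for node in self.root.iter():
            for child in node:
                self.parent[child] = node
                self.type[child] = self.type[node] + "/" + child.tag

    def distance(self, a, b):
        """Edges on the tree path between a and b."""
        up, n, d = {}, a, 0
        while n is not None:
            up[n] = d
            n, d = self.parent[n], d + 1
        n, d = b, 0
        while n not in up:
            n, d = self.parent[n], d + 1
        return d + up[n]

    def closest(self, node, label):
        """Naive stand-in for closest::label evaluated from `node`."""
        out = []
        for cand in self.root.iter(label):
            if cand is node:
                continue
            # Type distance: minimum distance over all pairs of these two types.
            type_distance = min(
                self.distance(x, y)
                for x in self.root.iter(node.tag) if self.type[x] == self.type[node]
                for y in self.root.iter(label)
                if self.type[y] == self.type[cand] and y is not x
            )
            if self.distance(node, cand) == type_distance:
                out.append(cand)
        return out

def books_by(doc, who):
    """Mirrors doc->author[->name = who]->book, then reads each book's closest title."""
    books = []
    for author in doc.closest(doc.root, "author"):
        if any(n.text == who for n in doc.closest(author, "name")):
            books += [b for b in doc.closest(author, "book") if b not in books]
    return [t.text for b in books for t in doc.closest(b, "title")]

# Hypothetical data: the same facts, author-rooted and book-rooted.
AUTHOR_ROOTED = (
    "<bib><author><name>E. F. Codd</name>"
    "<book><title>DB</title></book><book><title>Automata</title></book></author>"
    "<author><name>Another Author</name><book><title>Other</title></book></author></bib>"
)
BOOK_ROOTED = (
    "<bib><book><title>DB</title><author><name>E. F. Codd</name></author></book>"
    "<book><title>Automata</title><author><name>E. F. Codd</name></author></book>"
    "<book><title>Other</title><author><name>Another Author</name></author></book></bib>"
)

print(books_by(Doc(AUTHOR_ROOTED), "E. F. Codd"))  # ['DB', 'Automata']
print(books_by(Doc(BOOK_ROOTED), "E. F. Codd"))    # ['DB', 'Automata']
```

The evaluation never says whether book is above or below author, which is what lets the same call work on both documents.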
15
Querying with Directional Axes
Query1 -- return doc("author.xml")//author[name = 'E. F. Codd']/book
Query2 --
Query3 -- return doc("book.xml")//book[author/name = 'E. F. Codd']
[Figure: each query is sent to the XQuery evaluation engine, yielding Result1, Result2, and Result3]

16
In-memory Implementation
  • Naïve approach
  • Compute Closest for every node
  • Time complexity is O(sn²)
  • s: number of labels in the signature
  • n: number of nodes
  • Converting to a path expression
  • Find the closest price for title
  • Non-directional expression: closest::price
  • Directional (path) expression: parent::*/child::price

[Figure: tree of node types: author, book, name, title, publisher, price]

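
One plausible way to derive the directional expression from the two types is sketched below in Python. It is not taken from the talk: the function name is hypothetical, types are assumed to be root-to-node label paths, and the rewrite is only valid when some pair of nodes of the two types actually meets at the deepest shared ancestor type (otherwise the type distance is larger than this construction assumes).

```python
def to_path_expression(context_type, target_type):
    """Rewrite closest::<target> for nodes of context_type as a directional path.

    Both arguments are root-to-node label paths such as "author/book/title".
    """
    ctx, tgt = context_type.split("/"), target_type.split("/")
    # Length of the common prefix of the two type paths.
    shared = 0
    while shared < min(len(ctx), len(tgt)) and ctx[shared] == tgt[shared]:
        shared += 1
    up = ["parent::*"] * (len(ctx) - shared)              # climb to the shared ancestor
    down = ["child::" + label for label in tgt[shared:]]  # descend to the target
    return "/".join(up + down) or "self::*"               # degenerate case: identical types

print(to_path_expression("author/book/title", "author/book/price"))
# parent::*/child::price
```

For the title and price types this yields the same parent::*/child::price expression given on the slide.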
17
Experiment
  • Compare directional vs. non-directional
  • for $b in doc("bib.xml")//title/closest::publisher
    return $b
  • for $b in doc("bib.xml")//title/..//publisher
    return $b
  • Implemented closest in eXist (an XML DBMS)

18
Persistent Implementation
  • Take advantage of type indexes
  • LCA-join
  • Every Closest pair is related via an LCA (lowest common ancestor)
  • Idea is to merge lists of types
  • Time complexity is O(sn)

19
Related Work
  • Data integration
  • TSIMMIS: Garcia-Molina et al. (Journal of Intelligent
    Information Systems, 1997)
  • YAT: Christophides, Cluet, Siméon (SIGMOD Record, June 2000)
  • SilkRoute: Fernandez, Tan, Suciu (WWW 2000)
  • LCA-related techniques
  • Schmidt, Kersten, Windhouwer (ICDE 2001)
  • Cohen, Mamou, Kanza, Sagiv (VLDB 2003)
  • Li, Yu, Jagadish (VLDB 2004)

20
Related Research Projects
  • XML Restructuring: Zhang, Dyreson (IIWeb 2006)
  • XML Compaction: Zhang, Dyreson, Dang (DASFAA 2006)
  • Common theme: symmetric exploitation!

21
Conclusion
  • Current XQuery depends on path expressions
  • A path expression is directional (asymmetric)
  • May break down if structure changes
  • The closest axis is non-directional (symmetric)
  • Simple in syntax
  • Can be easily integrated in XQuery
  • Can be implemented efficiently
  • In-memory
  • Persistent

22
Thank You!