Title: XTree for Declarative XML Querying
1XTree for Declarative XML Querying
- Zhuo Chen, Tok Wang Ling,
- Mengchi Liu, and Gillian Dobbie
- January 2004
2Outlines
- Introduction
- Preliminaries
- XTree
- Algorithm to transform XTree query to XQuery
- Conclusion and future works
4Introduction
- How to query XML documents is an important issue in XML research
- Various query languages have been proposed
- XPath, XQuery, Lorel, XML-GL, XQL, XML-QL, XSLT, YATL, XDuce, rule-based semantic querying, declarative XML querying, etc.
- XQuery, which is based on XPath, has been selected as the basis for an official W3C query language for XML
5Introduction
- In this paper, we will
- Analyze the limitations of XPath
- Propose a new set of syntax rules called XTree, which is a generalization of XPath
- Show how XTree can efficiently replace the notations of XPath
- Give algorithms to convert queries based on XTree expressions to standard XQuery queries
6Outlines
- Introduction
- Preliminaries
- Background on XPath
- Limitations of XPath
- XTree
- Algorithm to transform XTree query to XQuery
- Conclusion and future works
7Preliminaries
- XPath
- A W3C standard
- A set of syntax rules for defining parts of an XML document
- It uses paths to identify nodes (elements and attributes) in XML documents
- These path expressions look very much like paths in a computer file system
8Background on XPath
- Sample XML document of a bibliography
<bib name="IT">
  <book id="b001" year="1994">
    <title>TCP/IP Illustrated</title>
    <author><last>Stevens</last><first>W.</first></author>
    <publisher>Addison-Wesley</publisher>
  </book>
  <book id="b002" year="1992">
    <title>Advanced Programming in the Unix Environment</title>
    <author><last>Stevens</last><first>W.</first></author>
    <publisher>Addison-Wesley</publisher>
  </book>
  <book id="b003" year="2000">
    <title>Data on the Web</title>
    <edition>3</edition>
    <author><last>Abiteboul</last><first>Serge</first></author>
    <author><last>Buneman</last><first>Peter</first></author>
    <author><last>Suciu</last><first>Dan</first></author>
    <publisher>Morgan Kaufmann</publisher>
  </book>
  <journal id="j001" year="1998">
    <title>XML</title>
    <editor><last>Date</last><first>C.</first></editor>
    <editor><last>Gerbarg</last><first>M.</first></editor>
    <publisher>Morgan Kaufmann</publisher>
  </journal>
</bib>
9Background on XPath
- XPath examples
- /bib/book/@year
- Get attribute year of each book
- /bib/book/author
- Get element author of each book
- //author
- Get all elements named author, regardless of their absolute paths
- /bib/book/*
- Get all sub-elements of each book
- /bib/book/@*
- Get all attributes of each book
- /bib/book[2]
- Get the second book element
- /bib/book[last()]
- Get the last book element
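A minimal Python sketch of the path examples on this slide, assuming the standard-library xml.etree.ElementTree module and an abbreviated copy of the sample document (publishers and the journal are left out). ElementTree implements only a subset of XPath and cannot return attribute nodes, so the two attribute examples read the values from each book element instead.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the sample bibliography (an assumption made for brevity).
BIB = """
<bib name="IT">
  <book id="b001" year="1994"><title>TCP/IP Illustrated</title>
    <author><last>Stevens</last><first>W.</first></author></book>
  <book id="b002" year="1992"><title>Advanced Programming in the Unix Environment</title>
    <author><last>Stevens</last><first>W.</first></author></book>
  <book id="b003" year="2000"><title>Data on the Web</title><edition>3</edition>
    <author><last>Abiteboul</last><first>Serge</first></author>
    <author><last>Buneman</last><first>Peter</first></author>
    <author><last>Suciu</last><first>Dan</first></author></book>
</bib>
"""
root = ET.fromstring(BIB)  # root is the <bib> element, so paths below are relative to it

# /bib/book/@year : attribute values are read per book
years = [book.get("year") for book in root.findall("book")]
# /bib/book/author
book_authors = root.findall("book/author")
# //author
all_authors = root.findall(".//author")
# /bib/book/*
children = root.findall("book/*")
# /bib/book/@* : all attributes of each book, as one dict per book
attributes = [dict(book.attrib) for book in root.findall("book")]
# /bib/book[2]
second = root.find("book[2]")
# /bib/book[last()]
last = root.find("book[last()]")

print(years, second.get("id"), last.get("id"))
```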
10Background on XQuery
- XQuery
- An XML query language for searching XML documents
- Based on XPath
- FLWOR statements
- For, Let, Where, Order by, Return
- The for clause iterates the variable over the result of its expression
- The let clause binds the variable to the result of its expression
- Complex queries (nested clauses)
- Complex result constructions
- User-defined functions
11Background on XQuery
- XQuery example
- List the year and title of all books published after 1995
XQuery
for $book in /bib/book
where $book/@year > 1995
return <book> { $book/@year } { $book/title } </book>
Result
<book year="2000"> <title>Data on the Web</title> </book>
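For comparison, the same for/where/return logic can be sketched in Python. This is only an illustration, assuming xml.etree.ElementTree and a cut-down document; the function name books_after is hypothetical.

```python
import xml.etree.ElementTree as ET

# Cut-down sample document: only the year attribute and title are kept.
BIB = """<bib>
  <book year="1994"><title>TCP/IP Illustrated</title></book>
  <book year="1992"><title>Advanced Programming in the Unix Environment</title></book>
  <book year="2000"><title>Data on the Web</title></book>
</bib>"""

def books_after(root, year):
    """Return a new <book year=...><title>...</title></book> per matching book."""
    results = []
    for book in root.findall("book"):                  # for $book in /bib/book
        if int(book.get("year")) > year:               # where $book/@year > 1995
            out = ET.Element("book", {"year": book.get("year")})
            ET.SubElement(out, "title").text = book.findtext("title")
            results.append(out)                        # return <book> ... </book>
    return results

result = books_after(ET.fromstring(BIB), 1995)
print([ET.tostring(b, encoding="unicode") for b in result])
```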
12Limitations of XPath
- XPath has some limitations
- 1. We can only assign one variable for each XPath expression
- It is just a linear path, unlike XML's tree structure
- Inefficient
- If a query needs to get values from several places, it has to use several paths
- 2. It is difficult to reveal the relationship among correlated XPaths
- This may cause mistakes if a user does not pay attention when writing a query
- E.g., if we want to output the title and author of each book
- XPath 1: /bib/book/title, XPath 2: /bib/book/author
- Wrong! The above two paths are not correlated
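The pitfall in limitation 2 can be reproduced in a few lines of Python. This is a sketch assuming xml.etree.ElementTree and an abbreviated sample document; it only illustrates why the two paths must share the book node.

```python
import xml.etree.ElementTree as ET

# Abbreviated sample document: titles and author last names only.
BIB = """<bib>
  <book><title>TCP/IP Illustrated</title><author><last>Stevens</last></author></book>
  <book><title>Advanced Programming in the Unix Environment</title>
    <author><last>Stevens</last></author></book>
  <book><title>Data on the Web</title><author><last>Abiteboul</last></author>
    <author><last>Buneman</last></author><author><last>Suciu</last></author></book>
</bib>"""
root = ET.fromstring(BIB)

# Two independent paths: the result lists no longer say which author wrote which title.
titles = [t.text for t in root.findall("book/title")]
authors = [a.findtext("last") for a in root.findall("book/author")]

# Correlated version: both paths are evaluated relative to the same book.
pairs = [
    (book.findtext("title"), [a.findtext("last") for a in book.findall("author")])
    for book in root.findall("book")
]

print(len(titles), len(authors))
print(pairs[2])
```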
13Limitations of XPath
- XPath has some limitations
- 3. XPath is inefficient at expressing a query that returns elements at path A while the condition is on a distant path B
- Difficult to distinguish the condition branch from the target branch
- Especially for multiple conditions and nested conditions
- E.g., find the value of the publisher id of a book which has an author with last name Stevens and first name W.
- /bib/book/author[last="Stevens" and first="W."]/../publisher/@pubid
- 4. XPath expressions are only used in the querying part of XQuery, not in the result construction part
- In XQuery, the result construction part mixes literal text, variable evaluation and even nested sub-queries
- The whole query is difficult to read and comprehend
14Limitations of XPath
- XPath has some limitations
- 5. XPath can only bind a variable to the whole node (element or attribute) structure, which is a name-value pair
- If we want to get the substructure of the node, we have to invoke built-in functions
- local-name() to get the node name
- string() to get the string value
- Difficult to query XML documents with unknown structure, or to rename the nodes in the result
- E.g., suppose we do not know the sub-structure of the book element, and we want to re-structure books in this way: keep text nodes and sub-elements unchanged, but convert attributes to sub-elements
for $book in /bib/book
let $attrib := $book/@*
return <book> { $book/text(), $book/* } <attribute name="{local-name($attrib)}" value="{string($attrib)}"/> </book>
15Outlines
- Introduction
- Preliminaries
- XTree
- Basic syntax
- XTree for querying
- XTree for result construction
- Algorithm to transform XTree query to XQuery
- Conclusion and future works
16XTree
- XTree is a generalization of XPath
- XTree has a tree structure like XML
- XTree is more efficient than XPath
- In the querying part, one XTree expression can bind multiple variables
- In XQuery, one XPath expression can only bind one variable
- In the result construction part, one XTree expression can be used to define the result format
- Avoids nested structure in the query
- Makes the whole query easier to read and understand
- Supports list-valued variables explicitly, and determines their values uniquely
17XTree syntax
- Similar to that of XPath
- / means parent-child hierarchy
- // means any number of levels down (ancestor-descendant)
- ( ) in front indicates the URL of the document
- Sibling tree nodes are enclosed by [ ], and separated by commas
- [ ] can be nested
- In XTree, conditions are written directly as branches, without XPath-style predicates
- Use logic variables as place holders to bind/match the values at their places
- → to assign variables in the querying part
- ← to get values from variables in the result construction part
- Only the sub-trees of interest are written in XTree, not the whole XML tree structure
18XTree for querying
- Symbol → assigns the values of the nodes on its left side to the variable on its right side
- Example. For the sample bibliography document, suppose we want to get the year and title of each book, and its authors' last names and first names
- We can use the variables $y, $t, $first, $last to bind them respectively, as in the following XTree expression
- /bib/book/[@year→$y, title→$t, author/[last→$last, first→$first]]
- We can instantiate many variables in one XTree expression
- The above XTree expression corresponds to the following 6 XPath expressions in XQuery
for $book in /bib/book, $y in $book/@year,
$t in $book/title, $author in $book/author,
$last in $author/last, $first in $author/first
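To make the six correlated for clauses concrete, here is a hedged Python sketch of the variable bindings they produce. It assumes xml.etree.ElementTree and an abbreviated sample document; the generator name bindings and the dict layout are illustrative choices, not part of XTree.

```python
import xml.etree.ElementTree as ET

# Abbreviated sample document (publishers and the journal omitted).
BIB = """<bib>
  <book year="1994"><title>TCP/IP Illustrated</title>
    <author><last>Stevens</last><first>W.</first></author></book>
  <book year="1992"><title>Advanced Programming in the Unix Environment</title>
    <author><last>Stevens</last><first>W.</first></author></book>
  <book year="2000"><title>Data on the Web</title>
    <author><last>Abiteboul</last><first>Serge</first></author>
    <author><last>Buneman</last><first>Peter</first></author>
    <author><last>Suciu</last><first>Dan</first></author></book>
</bib>"""

def bindings(root):
    """Yield one dict of variable values per combination, like nested for clauses."""
    for book in root.findall("book"):                     # for $book in /bib/book
        y = book.get("year")                              # $y in $book/@year
        if y is None:
            continue                                      # no @year, no binding
        for t in book.findall("title"):                   # $t in $book/title
            for author in book.findall("author"):         # $author in $book/author
                for last in author.findall("last"):       # $last in $author/last
                    for first in author.findall("first"): # $first in $author/first
                        yield {"y": y, "t": t.text,
                               "last": last.text, "first": first.text}

rows = list(bindings(ET.fromstring(BIB)))
for row in rows:
    print(row)
```

Every variable below the book level is evaluated relative to the same book (or author), which is the correlation that a single tree-shaped expression states once.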
19XTree for querying
- XTree allows a user to use path abbreviations as in XPath
- Example. Suppose we want to get the last name and first name elements at whatever depth in the document; we can write the following XTree expression
- /bib//[last→$last, first→$first]
- The square brackets enclosing the two elements last and first specify that these two elements are siblings.
- According to the XML document, the parent of the sibling elements last and first is /bib/book/author or /bib/journal/editor
20XTree for querying
- XTree allows a user to bind variables to the structure of an XML document
- A user can place a variable $var on the left side of the → symbol
- Here $var will bind to the name of the corresponding node
- Example. Suppose we want to obtain some attribute with value 2000 in some book element, and bind variable $b to that book
- /bib/book→$b/@$attr="2000"
- According to the sample document, $b will bind to the third book, and $attr will bind to the attribute name year.
21XTree
- Two types of variables
- Single-valued variables
- $X
- An element instance of the specified path
- List-valued variables
- {$X}
- A list of all $X instances
- Explicitly indicated by a pair of curly braces
- Note that both sibling nodes and list-valued variables are enclosed by braces
- Sibling nodes have commas as separators within the braces
- List-valued variables do not have commas within the braces
22List-valued variables
- Object-oriented functions of list-valued
variables
- Aggregate functions
- Suppose list-valued variable {$nums} binds to a list of numbers
- {$nums}.count() returns the number of items in the list
- {$nums}.avg() returns the average value of items in the list
- {$nums}.min() returns the minimum value in the list
- {$nums}.max() returns the maximum value in the list
- {$nums}.sum() returns the sum of values in the list
23List-valued variables
- Object-oriented functions of list-valued
variables
- List operations
- Suppose list-valued variable {$names} binds to a list of name elements
- {$names}.[1-3, 6] returns a sublist of the 1st to 3rd items and the 6th item
- {$names}.last() returns the last item in the list
- {$names}.sort() sorts the items in the list in ascending order
- {$names}.sort_desc() sorts the items in the list in descending order
- {$names}.distinct() eliminates duplicate items in the list
- {$names}.random(3) picks out 3 items randomly
- $name ∈ {$names} checks whether an item is in the list
- {$names1} ⊆ {$names2} checks whether the first list is a sub-list of the second list
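The functions on the last two slides can be mimicked with a small Python class. This is a sketch only: the class ListVar, the method select (standing in for the positional sublist), and is_sublist_of are hypothetical names, and "sub-list" is assumed here to mean that every item of the first list occurs in the second.

```python
import random
from statistics import mean

class ListVar:
    """Hypothetical Python stand-in for an XTree list-valued variable."""

    def __init__(self, items):
        self.items = list(items)

    # Aggregate functions
    def count(self):
        return len(self.items)

    def avg(self):
        return mean(self.items)

    def min(self):
        return min(self.items)

    def max(self):
        return max(self.items)

    def sum(self):
        return sum(self.items)

    # List operations
    def select(self, *positions):
        """Pick 1-based positions; a (first, last) tuple is an inclusive range."""
        picked = []
        for p in positions:
            if isinstance(p, tuple):
                picked.extend(self.items[p[0] - 1:p[1]])
            else:
                picked.append(self.items[p - 1])
        return ListVar(picked)

    def last(self):
        return self.items[-1]

    def sort(self):
        return ListVar(sorted(self.items))

    def sort_desc(self):
        return ListVar(sorted(self.items, reverse=True))

    def distinct(self):
        # dict keys keep first-seen order, so duplicates are dropped stably
        return ListVar(dict.fromkeys(self.items))

    def random(self, n):
        return ListVar(random.sample(self.items, n))

    def __contains__(self, item):          # membership test
        return item in self.items

    def is_sublist_of(self, other):        # containment test (assumed semantics)
        return all(x in other.items for x in self.items)

nums = ListVar([4, 8, 15, 16, 23, 42])
names = ListVar(["Stevens", "Stevens", "Abiteboul", "Buneman", "Suciu", "Date"])
print(nums.count(), nums.sum(), nums.avg())
print(names.select((1, 3), 6).items)
```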
24Semantics of list-valued variables
- Definition 1. The associated path of variable $a (or {$a}) is the absolute path expression from the root to the nodes represented by $a (or {$a}).
- /bib/book→$b/title→$t
- The associated path of $t is /bib/book/title.
- Definition 2. Variable $a is an ancestor variable of $b if $a and $b are defined in the same XTree expression, and the associated path of $a is a prefix of the associated path of $b.
- /bib/book→$b/[title→$t, author→$a]
- $b is an ancestor variable of $t and $a, but $t is not an ancestor variable of $a.
25Semantics of list-valued variables
- Definition 3. In an XTree expression, when a variable is bound to a value in the query evaluation, the variable is instantiated.
- /bib/book/[author→$a/first→$first, title→$t]
- In the evaluation, when we have reached /bib/book/author, $a is instantiated; when we reach /bib/book/author/first, $first is instantiated.
- Definition 4. The value of list-valued variable {$a} is a list of all instances of $a with all its ancestor variables instantiated.
- /bib/book/author→$a: {$a} means all the author elements of all the books
- /bib/book→$b/author→$a: {$a} means all the authors of a certain book $b
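The two readings in Definition 4 can be illustrated with a short Python sketch, assuming xml.etree.ElementTree and an abbreviated sample document. The names all_a and per_book are illustrative only.

```python
import xml.etree.ElementTree as ET

# Abbreviated sample document: book ids and author last names only.
BIB = """<bib>
  <book id="b001"><author><last>Stevens</last></author></book>
  <book id="b002"><author><last>Stevens</last></author></book>
  <book id="b003"><author><last>Abiteboul</last></author>
    <author><last>Buneman</last></author><author><last>Suciu</last></author></book>
</bib>"""
root = ET.fromstring(BIB)

# No ancestor variable on the path /bib/book/author:
# the list-valued variable covers all author elements of all books.
all_a = [a.findtext("last") for a in root.findall("book/author")]

# With ancestor variable b on /bib/book:
# the list is evaluated once per instantiation of b, so it covers one book's authors.
per_book = {
    b.get("id"): [a.findtext("last") for a in b.findall("author")]
    for b in root.findall("book")
}

print(all_a)
print(per_book)
```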
26XTree for result construction
- An XTree expression can also be used to define the result format
- Symbol ← gets the values of the variables on its right side and assigns them to the expression on its left side
- The result construction part is just one XTree expression
- No nested structure as in the return clause of XQuery
- Since XTree already has a tree structure
- Easy to read and understand
- Must be concrete
- No condition checking or uncertainty in the structure
- Unlike XTree expressions in the querying part
27XTree for result construction
- Example. We want to list the titles and publishers of books published after 1993. Suppose we have bound the variables by the following XTree expression
- /bib/book/[@year>1993, title→$t, publisher→$p]
- We can write the following XTree expression to define the result format
- /result/recentbook/[title←$t, publisher←$p]
- The result format is defined as follows: under the root result, each recentbook element stores the title and publisher of that book
<result>
  <recentbook>
    <title>TCP/IP Illustrated</title>
    <publisher>Addison-Wesley</publisher>
  </recentbook>
  <recentbook>
    <title>Data on the Web</title>
    <publisher>Morgan Kaufmann</publisher>
  </recentbook>
</result>
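As a rough cross-check of this example, the same selection and result shape can be produced in Python. This is a sketch assuming xml.etree.ElementTree and an abbreviated sample document; the function name recent_books is hypothetical.

```python
import xml.etree.ElementTree as ET

# Abbreviated sample document: year, title and publisher of each book.
BIB = """<bib>
  <book year="1994"><title>TCP/IP Illustrated</title>
    <publisher>Addison-Wesley</publisher></book>
  <book year="1992"><title>Advanced Programming in the Unix Environment</title>
    <publisher>Addison-Wesley</publisher></book>
  <book year="2000"><title>Data on the Web</title>
    <publisher>Morgan Kaufmann</publisher></book>
</bib>"""

def recent_books(root, after_year):
    """Build <result> with one <recentbook> per (title, publisher) binding."""
    result = ET.Element("result")
    for book in root.findall("book"):
        if int(book.get("year")) <= after_year:    # condition branch: @year > after_year
            continue
        for t in book.findall("title"):            # binding for the title variable
            for p in book.findall("publisher"):    # binding for the publisher variable
                recent = ET.SubElement(result, "recentbook")
                ET.SubElement(recent, "title").text = t.text
                ET.SubElement(recent, "publisher").text = p.text
    return result

out = recent_books(ET.fromstring(BIB), 1993)
print(ET.tostring(out, encoding="unicode"))
```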
28XTree for result construction
- Example. For each book, show the title, the number of authors and the first author. Suppose the variable bindings are defined in the following XTree expression
- /bib/book/[title→$t, author→{$a}]
- We can write the following XTree expression to return the result
- /result/book/[title←$t, authNum←{$a}.count(), author←{$a}[1]]
- {$a}.count() counts the number of items in the {$a} list
- {$a}[1] returns the first item in the {$a} list
- Output
<result>
  <book>
    <title>TCP/IP Illustrated</title>
    <authNum>1</authNum>
    <author><last>Stevens</last><first>W.</first></author>
  </book>
  <book>
    <title>Advanced Programming in the Unix Environment</title>
    <authNum>1</authNum>
    <author><last>Stevens</last><first>W.</first></author>
  </book>
  <book>
    <title>Data on the Web</title>
    <authNum>3</authNum>
    <author><last>Abiteboul</last><first>Serge</first></author>
  </book>
</result>
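The count and first-item functions in this example can be sketched in Python as follows, assuming xml.etree.ElementTree and an abbreviated sample document. The function name summarize is hypothetical, and the per-book author list plays the role of the list-valued variable.

```python
import copy
import xml.etree.ElementTree as ET

# Abbreviated sample document: titles and authors only.
BIB = """<bib>
  <book><title>TCP/IP Illustrated</title>
    <author><last>Stevens</last><first>W.</first></author></book>
  <book><title>Advanced Programming in the Unix Environment</title>
    <author><last>Stevens</last><first>W.</first></author></book>
  <book><title>Data on the Web</title>
    <author><last>Abiteboul</last><first>Serge</first></author>
    <author><last>Buneman</last><first>Peter</first></author>
    <author><last>Suciu</last><first>Dan</first></author></book>
</bib>"""

def summarize(root):
    """Build <result> with title, author count and first author of each book."""
    result = ET.Element("result")
    for b in root.findall("book"):
        authors = b.findall("author")                 # all authors of this one book
        for t in b.findall("title"):
            book = ET.SubElement(result, "book")
            ET.SubElement(book, "title").text = t.text
            ET.SubElement(book, "authNum").text = str(len(authors))   # count()
            if authors:
                book.append(copy.deepcopy(authors[0]))                # first item
    return result

out = summarize(ET.fromstring(BIB))
print([(b.findtext("title"), b.findtext("authNum")) for b in out.findall("book")])
```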
29XTree for result construction
- The right side of the ← symbol can be
- A pre-defined variable or an invocation of functions on variables
- Literal text, indicating static content
- Omitted, indicating an empty value
- Example. Suppose we want to return a book whose title is Computer Architecture, and which does not have a specified author; we can write the following XTree expression
- /bib/book/[title←"Computer Architecture", no-author]
- It will output the following XML segment
<bib>
  <book>
    <title>Computer Architecture</title>
    <no-author/>
  </book>
</bib>
30XTree for result construction
- A query based on XTree expressions has QWOC (Query-Where-Order by-Construct) statements
- The query clause contains one or more XTree expressions for selection and variable binding
- The where clause is optional; it defines constraints
- The order by clause is optional; it defines the ordering
- The construct clause contains one XTree expression to define the output format
31Outlines
- Introduction
- Preliminaries
- XTree
- Algorithm to transform XTree query to XQuery
- An algorithm to transform an XTree expression in the query part to a set of XPath expressions
- An algorithm to transform an XTree expression in the result construction part to some nested XQuery expressions
- Conclusion and future works
32Transformation algorithm for querying part
- Transform an XTree expression in the querying part to a set of XPath expressions
- Not as trivial as just extracting each path associated with a variable to be an XPath expression
- Variables may correlate to each other by some common ancestors
- We have to use such common ancestors to constrain the descendant variables
- The common ancestors we want are just the branching nodes (the nodes just before every pair of square brackets for branching)
- Use a stack to store such common ancestors for later use
33Transformation algorithm for querying part
- Process the XTree expression from left to right; for each common ancestor of variables (except the root), assign a single-valued variable to it if it is not originally bound to a variable
- Translate each single-valued variable to an XPath expression in a for clause; translate each list-valued variable to an XPath expression in a let clause
- Try to write the path expression of a variable as a relative path from its nearest ancestor variable (make use of the stack)
- If it has such an ancestor variable, write its path expression as the relative path from that ancestor variable
- If it does not have any ancestor variable, write its path expression as the absolute path from the root
- The output paths will be in depth-first order of the XTree
34Transformation algorithm for querying part
/bib/[book/[title→$t, author→{$a}], journal/[title→$jt, editor/[last→$last, first→$first]]]
/bib/[book→$b/[title→$t, author→{$a}], journal/[title→$jt, editor/[last→$last, first→$first]]]
/bib/[book→$b/[title→$t, author→{$a}], journal→$j/[title→$jt, editor/[last→$last, first→$first]]]
/bib/[book→$b/[title→$t, author→{$a}], journal→$j/[title→$jt, editor→$e/[last→$last, first→$first]]]
for $b in /bib/book
for $t in $b/title
let $a := $b/author
for $j in /bib/journal
for $jt in $j/title
for $e in $j/editor
for $last in $e/last
for $first in $e/first
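The querying-part transformation can be sketched in Python as below. This is an illustration under stated assumptions, not the authors' implementation: the XTree expression is given as an already-parsed tree of hypothetical Node objects, generated variables are named after the first letter of the node (as in the example above), only single-valued variables act as ancestors for relative paths, and the recursion's call stack stands in for the explicit stack.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str                 # element name, or "@attr" for an attribute
    var: str = ""             # bound variable name without "$"; "" if unbound
    is_list: bool = False     # True for a list-valued variable
    children: list = field(default_factory=list)

def to_clauses(root):
    """Return for/let clauses in depth-first order of the XTree."""
    clauses, used = [], set()

    def collect(node):        # reserve the variable names written by the user
        if node.var:
            used.add(node.var)
        for child in node.children:
            collect(child)

    def fresh(name):          # hypothetical naming: first letter, numbered on clash
        base = name.lstrip("@")[0]
        candidate, i = base, 1
        while candidate in used:
            i += 1
            candidate = f"{base}{i}"
        used.add(candidate)
        return candidate

    def visit(node, prefix, is_root):
        # prefix is "$v" of the nearest ancestor variable plus any steps below it,
        # or the absolute path so far when there is no ancestor variable
        path = f"{prefix}/{node.name}"
        var = node.var
        if not var and len(node.children) > 1 and not is_root:
            var = fresh(node.name)              # branching node: common ancestor
        if var and node.is_list:
            clauses.append(f"let ${var} := {path}")
        elif var:
            clauses.append(f"for ${var} in {path}")
            path = f"${var}"                    # descendants become relative to $var
        for child in node.children:
            visit(child, path, False)

    collect(root)
    visit(root, "", True)
    return clauses

xtree = Node("bib", children=[
    Node("book", children=[Node("title", "t"), Node("author", "a", is_list=True)]),
    Node("journal", children=[
        Node("title", "jt"),
        Node("editor", children=[Node("last", "last"), Node("first", "first")]),
    ]),
])
clauses = to_clauses(xtree)
print("\n".join(clauses))
```

Parsing the textual XTree syntax into Node objects is left out on purpose; the sketch covers only the variable assignment and path rewriting steps.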
35Transformation algorithm for result construction part
- Transform an XTree expression in the result construction part to some XQuery expressions
- More complicated
- We will often encounter nested sub-queries in XQuery
- Consider the case where the node name that receives the variable value is different from the node name where the variable was bound in the querying part
- Process the XTree expression step by step
- Find the corresponding XPath expression of each variable among the XPaths generated by the previous algorithm
- Translate each variable value substitution to some XQuery statement
- Use curly braces to form sub-query blocks according to the structure of the XTree expression in the construct clause
36Transformation algorithm for result construction part
query /bib/[book/[title→$t, author→{$a}], journal/[title→$jt, editor/[last→$last, first→$first]]]
construct /result/[book/[name←$t, authors/[@count←{$a}.count(), au←{$a}]], journal/[title←$jt, editor/[first←$first, last←$last]]]
- Generated XPath expressions of the querying part
for $b in /bib/book
for $t in $b/title
let $a := $b/author
for $j in /bib/journal
for $jt in $j/title
for $e in $j/editor
for $last in $e/last
for $first in $e/first
37Transformation algorithm for result construction part
38Outlines
- Introduction
- Preliminaries
- XTree
- Algorithm to transform XTree query to XQuery
- Conclusion and future works
- Conclusion
- Future works
39Conclusion
- Discussed the limitations of XPath
- Proposed a new set of syntax rules called XTree
- XTree has a tree structure
- In the querying part, one XTree expression can bind multiple variables
- In the result construction part, one XTree expression can define the result format
- List-valued variables are explicitly indicated, and their values are uniquely determined
- XTree is more compact and convenient to use than XPath
- Designed algorithms to transform a query based on XTree expressions to a standard XQuery query
40Future works
- Implement an XTree query parser
- Queries based on XTree expressions can be executed directly
- The query evaluation will be more efficient with this approach, since we will have a global view of the whole query tree
- Extend the transformation algorithms to support queries with join, negation, grouping and recursion
- Optimize the output XQuery queries of our transformation algorithms according to the schema of the XML document
- Follow the ongoing development of XPath to continuously enhance XTree
42Thank you