Title: Query Languages
1Query Languages
- Why a query language? Extracting, Restructuring, Integration, Browsing
- XML-QL
- http://www.w3.org/TR/NOTE-xml-ql
- http://db.cis.upenn.edu/XML-QL/
- XPATH (part of a query language)
- http://www.w3.org/TR/xpath
- XSLT
- http://www.w3.org/TR/xslt
- http://www.mulberrytech.com/quickref/XSLTquickref.pdf
2XQuery -- likely to gain acceptance...
- and, like other things in W3C, not necessarily the best.
- XQuery is a group of projects. General blather and informal specifications: http://www.w3.org/XML/Query
- Ingredients
- A concrete syntax: http://www.w3.org/TR/xquery
- Based on XPath: http://www.w3.org/TR/xpath.html
- A formal semantics and algebra: http://www.w3.org/TR/query-semantics/
- Some test cases: http://www.w3.org/TR/xmlquery-use-cases
3Query Languages and DTDs
- The DOM does not interact with DTDs (later levels may do this)
- For query languages there is almost no interaction (only to find out the names of IDs and IDREFs)
- XDUCE (developed at Penn) is the only well-developed language that uses DTDs as a type system. www.cis.upenn.edu/hahosoya/xduce/
4XML-QL (XML Query Language)
- W3C proposal, August 1998
- authors
- Mary Fernandez, AT&T
- Dana Florescu, INRIA
- Alon Levy, Univ. of Washington
- Dan Suciu, AT&T
- Alin Deutsch, Univ. of Pennsylvania
5Address Book Revisited
- Caesar
- Caesar Imperator
- The Capitol
- Rome, OH 98765
- (321) 786 2543
- (321) 786 2543
- (321) 786 2543
- jc@forum.rome.org
6XML-QL Pattern Matching
Find Caesar's e-mail address:
where
Caesar
$e
in http://db.cis.upenn.edu/peter/address.xml
construct $e
Result: jc@forum.rome.org
Data Extraction
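The where / in / construct pattern above can be approximated in Python as a rough sketch. The element names person, name and email are assumptions, since the markup of the address book is not preserved here, and the inline document stands in for the remote address.xml.

```python
import xml.etree.ElementTree as ET

# Hypothetical address book; element names are assumptions, not the original markup.
ADDRESS_XML = """
<addresses>
  <person><name>Caesar</name><email>jc@forum.rome.org</email></person>
  <person><name>Brutus</name><email>mb@philippi.com</email></person>
</addresses>
"""

def emails_of(root, name):
    """'where' part: match persons whose name equals `name` and bind e to
    each of their email elements; 'construct' part: return the bindings."""
    return [
        email.text
        for person in root.iter("person")
        if any(n.text == name for n in person.findall("name"))
        for email in person.findall("email")
    ]

root = ET.fromstring(ADDRESS_XML)
caesar_emails = emails_of(root, "Caesar")
print(caesar_emails)
```

The pattern acts as a filter plus variable binding: every match of the pattern in the document yields one binding of e.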
7XML-QL Constructing New XML Data
Whom can we contact electronically?
where
$g
$e
in http://...
construct
$g
$e
Result:
Caesar Imperator jc@forum.rome.org
Brutus mb@philippi.com
...
Data Restructuring
8XML-QL Joins
Who of our contacts was involved in a movie?
where
$g
$e
in http://address.xml
$t
$g
in http://www.imdb.com
construct
$g
$t
$e
9XML-QL Joins (contd)
Caesar Imperator
jc@forum.rome.org
Asterix and Cleopatra
Dr. Strangelove
strangelov@love.the.bomb
Dr. Strangelove or How I Stopped ...
...
Data Integration
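A join of this kind binds the same variable (g) in two sources and keeps only the combinations where both patterns match. The sketch below shows the idea in Python; the element names (person, name, email, movie, title, actor) and both inline documents are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for the address book and the movie database.
ADDRESSES = ET.fromstring(
    "<addresses>"
    "<person><name>Caesar Imperator</name><email>jc@forum.rome.org</email></person>"
    "<person><name>Brutus</name><email>mb@philippi.com</email></person>"
    "</addresses>"
)
MOVIES = ET.fromstring(
    "<movies>"
    "<movie><title>Asterix and Cleopatra</title><actor>Caesar Imperator</actor></movie>"
    "</movies>"
)

def contacts_in_movies(addresses, movies):
    """Join the two sources on the shared name g; return (g, t, e) triples."""
    results = []
    for person in addresses.iter("person"):
        g = person.findtext("name")
        e = person.findtext("email")
        for movie in movies.iter("movie"):
            # The join condition: g must also occur in the movie source.
            if any(actor.text == g for actor in movie.findall("actor")):
                results.append((g, movie.findtext("title"), e))
    return results

joined = contacts_in_movies(ADDRESSES, MOVIES)
print(joined)
```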
10XML-QL Data Model
- Directed, labeled graph
- Tags represented as edge labels
- Sets of attribute name-value pairs as node labels
- Two models: ordered and unordered
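One possible reading of this data model, as a sketch: every element becomes an edge labeled with its tag, pointing at a fresh node whose label is the element's attribute set. The function name to_graph and the sample document are assumptions; text content is left out to keep the example short.

```python
import xml.etree.ElementTree as ET
from itertools import count

def to_graph(root):
    """Return (edges, node_labels): edges are (source, tag, target) triples,
    node_labels maps each node id to its attribute name-value pairs."""
    ids = count()
    edges = []
    node_labels = {next(ids): {}}  # node 0 is the graph root

    def visit(elem, parent_id):
        node_id = next(ids)
        edges.append((parent_id, elem.tag, node_id))  # tag as edge label
        node_labels[node_id] = dict(elem.attrib)      # attributes as node label
        for child in elem:
            visit(child, node_id)

    visit(root, 0)
    return edges, node_labels

sample = ET.fromstring(
    '<person id="p1"><name>Caesar</name><tel type="home">(321) 786 2543</tel></person>'
)
edges, node_labels = to_graph(sample)
print(edges)
print(node_labels)
```

Keeping the edges in a list preserves document order (the ordered model); converting the list to a set would give the unordered model.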
11XML-QL Data Model (contd)
- Caesar
- Caesar Imperator
- The Capitol
- Rome, OH 98765
- (321) 786 2543
- (321) 786 2543
- (321) 786 2543
- jc@forum.rome.org
12XSL (Extensible Stylesheet Language)
- W3C working draft by Adobe Systems
- Original purpose was to specify rendering of XML documents (mainly by Web browsers, HTML)
- Consists of two parts
- an XML transformation language
- a formatting vocabulary denoting typographic abstractions (paragraph, page, rule, footer, etc.)
13The XSL Processor
original document + stylesheet -> transformer -> restructured document (elements are formatting objects) -> formatter (plug in your favorite) -> presentation (other formats possible)
14Formatting Example
original document
An example
This is a test. This is another test.
transformed document
An example
This is a test. This is another test.
15 Template Rules
16 Template Rules (contd)
where
$n
$e
construct
$n
$e
17XSL vs. XML-QL
Columns: XML-QL, XSL
Rows: XML output; no schema required; data extraction; data restructuring; data integration; schema browsing; relational complete
18
19URLs -- XPath
- http://www.w3.org/TR/xpath
- This is the recommendation. Dense. Few examples. Difficult to extract the big picture from the morass of detail
- http://www.zvon.org/xxl/XPathTutorial/General/examples.html
- A tutorial with some simple examples. Maybe too simple. There are lots of tutorials on the web.
20URLs -- XQuery
- http://www.w3.org/TR/xquery/
- The basic recommendation. Plenty of examples, so work through these first.
- http://www.w3.org/TR/query-semantics/
- A formal semantics for XQuery. Despite its forbidding title, it is remarkably readable. It also discusses a type system for XQuery.
- http://www.w3.org/TR/xmlquery-use-cases
- A bunch of example queries and their solution in XQuery (not surprising, since XQuery is Turing-complete!)
21How to Identify nodes in a Tree -- Regular Path Expressions
In the normal syntax of regular expressions:
db.emps.emp
db.(depts.dept.mgr | emps.emp)
db._.name
(tree figure; leaf values: Mary, Bill, John)
N.B. Regular path expressions have nothing to do with regular expressions in DTDs
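One way to experiment with regular path expressions is to translate them into ordinary regular expressions over root-to-node label paths. The sketch below does that; the tree, the helper names, the use of | for alternation and the reading of _ as "any single label" are assumptions made for illustration. It returns the matching label paths, not node objects.

```python
import re

def compile_path(expr):
    """Translate a regular path expression into a regex over label paths
    written as 'db/emps/emp/' (every label is followed by '/')."""
    parts = []
    for token in re.findall(r"[A-Za-z0-9_-]+|[().|*]", expr):
        if token == "_":
            parts.append(r"(?:[^/]+/)")           # wildcard: any one label
        elif token == ".":
            continue                               # concatenation is implicit
        elif token in "()|*":
            parts.append(token)                    # regex operators pass through
        else:
            parts.append("(?:" + re.escape(token) + "/)")
    return re.compile("".join(parts))

# Hypothetical tree: (label, children)
TREE = ("db", [
    ("emps", [("emp", [("name", [])]), ("emp", [("name", [])])]),
    ("depts", [("dept", [("mgr", [("name", [])])])]),
])

def label_paths(node, prefix=""):
    """Yield the label path of every node, in document order."""
    label, children = node
    path = prefix + label + "/"
    yield path
    for child in children:
        yield from label_paths(child, path)

def select(expr, tree):
    pattern = compile_path(expr)
    return [p for p in label_paths(tree) if pattern.fullmatch(p)]

print(select("db.emps.emp", TREE))
print(select("db.(depts.dept.mgr | emps.emp)", TREE))
```

Because each label is wrapped in its own group, a star applied to a parenthesized sequence repeats the whole sequence of labels.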
22More examples
With the DTD MOTHER)
the regular path expression (PERSON.MOTHER)* identifies matrilineal ancestry.
XPATH is a superset of a subset of regular path expressions. (It cannot express this set of nodes.) However, it is not limited to moving down the tree.
23XPath
- Primary goal: to permit access to some nodes of a given document
- XPath main construct: axis navigation
- An XPath path consists of one or more navigation steps, separated by /
- A navigation step is a triplet: axis, node-test, list of predicates
- Examples
- /descendant::node()/child::author
- /descendant::node()/child::author[parent/attribute::booktitle = "XML"][2]
- XPath also offers some shortcuts
- no axis means child
- // means /descendant-or-self::node()/
24XPath- child axis navigation
- author is shorthand for child::author. Examples:
- aaa -- all the child nodes labeled aaa (1,3)
- aaa/bbb -- all the bbb children of aaa children (4)
- */bbb -- all the bbb grandchildren of any child (4,6)
- . -- the context node
- / -- the root node
25XPath- child axis navigation (cont)
- /doc -- all the doc children of the root
- ./aaa -- all the aaa children of the context node (equivalent to aaa)
- text() -- all the text children of the context node
- node() -- all the children of the context node (includes text nodes, but not attribute nodes, which are not children)
- .. -- parent of the context node
- .// -- the context node and all its descendants
- // -- the root node and all its descendants
- //para -- all the para nodes in the document
- //text() -- all the text nodes in the document
- @font -- the font attribute node of the context node
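Several of the child-axis forms listed above can be tried with Python's xml.etree.ElementTree, which implements a subset of XPath (not the full language). The document below is hypothetical; its id attributes only mimic node numbers like the ones quoted on the slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical tree: the context node is <doc>; ids play the role of node numbers.
doc = ET.fromstring(
    "<doc>"
    "<aaa id='1'><bbb id='4'/></aaa>"
    "<ccc id='2'><bbb id='6'/></ccc>"
    "<aaa id='3'/>"
    "</doc>"
)

def ids(path):
    """Evaluate a path relative to the context node and list the ids found."""
    return [elem.get("id") for elem in doc.findall(path)]

print(ids("aaa"))       # aaa children of the context node
print(ids("aaa/bbb"))   # bbb children of aaa children
print(ids("*/bbb"))     # bbb grandchildren through any child
print(ids(".//bbb"))    # bbb descendants of the context node
print(ids("./aaa"))     # same as "aaa"
```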
26Predicates
- *[2] -- the second child node of the context node
- chapter[5] -- the fifth chapter child of the context node
- *[last()] -- the last child node of the context node
- chapter[title="introduction"] -- the chapter children of the context node that have one or more title children whose string-value is introduction (the string-value is the concatenation of all the text on descendant text nodes)
- person[.//firstname = "joe"] -- the person children of the context node that have in their descendants a firstname element with string-value joe
- From the XPath specification
- NOTE: If $x is bound to a node-set, then $x = "foo" does not mean the same as not($x != "foo").
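The note from the specification follows from the existential meaning of comparisons on node-sets: a comparison holds if it holds for at least one node. A small Python sketch, modelling a node-set as a list of string-values (the function names are assumptions):

```python
def nodeset_eq(nodes, value):
    """$x = value: true if SOME node has this string-value."""
    return any(n == value for n in nodes)

def nodeset_ne(nodes, value):
    """$x != value: true if SOME node has a different string-value."""
    return any(n != value for n in nodes)

mixed = ["foo", "bar"]
empty = []

# On a mixed node-set both $x = "foo" and $x != "foo" hold,
# so not($x != "foo") is false while $x = "foo" is true.
print(nodeset_eq(mixed, "foo"), not nodeset_ne(mixed, "foo"))

# On an empty node-set neither comparison holds,
# so not($x != "foo") is true while $x = "foo" is false.
print(nodeset_eq(empty, "foo"), not nodeset_ne(empty, "foo"))
```

In other words, not($x != "foo") asks whether every node equals foo, which is a different question from whether some node does.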
27Unions of Path Expressions
- employee | consultant -- the union of the employee and consultant nodes that are children of the context node
- For some reason person/(employee|consultant) -- as in regular path expressions -- is not allowed
- However person/node()[boolean(employee|consultant)] is allowed!!
- From the XPATH specification
- The boolean function converts its argument to a boolean as follows
- a number is true if and only if it is neither positive or negative zero nor NaN
- a node-set is true if and only if it is non-empty
- a string is true if and only if its length is non-zero
- an object of a type other than the four basic types is converted to a boolean in a way that is dependent on that type
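The conversion rules quoted above translate almost directly into code. This sketch models a node-set as a Python list, tuple or set, which is an assumption made for illustration; xpath_boolean is a hypothetical name.

```python
def xpath_boolean(value):
    """Convert a value to a boolean following the rules quoted above."""
    if isinstance(value, bool):
        return value                               # already a boolean
    if isinstance(value, (int, float)):
        # true iff neither (positive or negative) zero nor NaN;
        # NaN is the only number that differs from itself
        return not (value == 0 or value != value)
    if isinstance(value, str):
        return len(value) > 0                      # non-empty string
    if isinstance(value, (list, tuple, set, frozenset)):
        return len(value) > 0                      # non-empty node-set
    raise TypeError("conversion depends on the type: %r" % type(value))

print(xpath_boolean(-0.0), xpath_boolean(float("nan")), xpath_boolean(0.5))
print(xpath_boolean(""), xpath_boolean("false"))
print(xpath_boolean([]), xpath_boolean(["employee"]))
```

Note that the string "false" converts to true, because only the length of the string matters.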
28Axis navigation
- So far, nearly all our expressions have moved us down the tree by moving to child nodes. Exceptions were
- . -- stay where you are
- / -- go to the root
- // -- all descendants of the root
- .// -- all descendants of the context node
- All other expressions have been abbreviations using child::, e.g. child::para. child is an example of an axis
- XPath has several axes: ancestor, ancestor-or-self, attribute, child, descendant, descendant-or-self, following, following-sibling, namespace, parent, preceding, preceding-sibling, self
- Some of these (self, parent) describe single nodes, others describe sequences of nodes.
29XPath Navigation Axes(merci, Arnaud Sahuguet)
(figure: axes shown relative to the context node: ancestor, following-sibling, preceding-sibling, self, child, attribute, following, preceding, namespace, descendant)
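To make the axes concrete, here is a rough Python sketch that computes several of them for element nodes only. The document and all function names are hypothetical; results are returned in document order, and attribute and namespace nodes are ignored.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: a has children b and c; b has d and e; c has f.
doc = ET.fromstring("<a><b><d/><e/></b><c><f/></c></a>")
parent = {child: p for p in doc.iter() for child in p}
order = list(doc.iter())  # all elements in document order

def ancestors(n):
    out = []
    while n in parent:
        n = parent[n]
        out.append(n)
    return out

def descendants(n):
    return list(n.iter())[1:]  # iter() yields n itself first

def siblings(n):
    return list(parent[n]) if n in parent else [n]

def following_siblings(n):
    sibs = siblings(n)
    return sibs[sibs.index(n) + 1:]

def preceding_siblings(n):
    sibs = siblings(n)
    return sibs[:sibs.index(n)]

def following(n):
    # everything after n in document order, except its descendants
    inside = set(descendants(n))
    return [m for m in order[order.index(n) + 1:] if m not in inside]

def preceding(n):
    # everything before n in document order, except its ancestors
    above = set(ancestors(n))
    return [m for m in order[:order.index(n)] if m not in above]

def tags(nodes):
    return [n.tag for n in nodes]

e = doc.find("b/e")
b = doc.find("b")
print(tags(ancestors(e)), tags(preceding(e)), tags(following(e)))
```

For any context node, the ancestor, descendant, following, preceding and self axes together cover every element of the document exactly once.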
30XPath abbreviated syntax
- (nothing) -- child::
- @ -- attribute::
- // -- /descendant-or-self::node()/
- . -- self::node()
- .// -- descendant-or-self::node()/
- .. -- parent::node()
- / -- (document root)
31XPath
- Reasonably widely adopted -- in XML-Schema and query languages.
- Neither more expressive nor less expressive than regular path expressions (can't do (ab)*)
- Particularly messy in some areas
- defining order of results
- overloading of operations,
- e.g. chapter/title = "Introduction"
- why not "Introduction" IN chapter/title ?
32XQuery
- proposed by Chamberlin, Robie and Florescu
- (from the authors' slides)
- Leverage the most effective features of several existing and proposed query languages
- Design a small, clean, implementable language
- Cover the functionality required by all the XML Query use cases in a single language
- Write queries that fit on a slide
33XQuery = XPath + comprehension syntax
34XQuery
35Examples from XQuery
List the titles of books published by Morgan Kaufmann in 1998.
FOR $b IN document("bib.xml")//book
WHERE $b/publisher = "Morgan Kaufmann"
AND $b/@year = "1998"
RETURN $b/title
XPath expressions in orange
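For readers who want to run something, the FOR / WHERE / RETURN shape of this query maps onto a Python comprehension. The sketch below uses xml.etree.ElementTree and a made-up stand-in for bib.xml; the book data is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for bib.xml.
BIB = ET.fromstring(
    "<bib>"
    "<book year='1998'><title>Book A</title><publisher>Morgan Kaufmann</publisher></book>"
    "<book year='1997'><title>Book B</title><publisher>Morgan Kaufmann</publisher></book>"
    "<book year='1998'><title>Book C</title><publisher>Addison-Wesley</publisher></book>"
    "</bib>"
)

titles = [
    title.text                                   # RETURN $b/title
    for b in BIB.iter("book")                    # FOR $b IN document("bib.xml")//book
    if any(p.text == "Morgan Kaufmann"           # WHERE $b/publisher = "Morgan Kaufmann"
           for p in b.findall("publisher"))
    and b.get("year") == "1998"                  # AND $b/@year = "1998"
    for title in b.findall("title")
]
print(titles)
```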
36DTD for Sample Document
37Examples from XQuery (cont)
List each publisher and the average price of its books.
FOR $p IN distinct(document("bib.xml")//publisher)
LET $a := avg(document("bib.xml")//book[publisher = $p]/price)
RETURN
$p/text()
$a
LET binds a variable to a value. It does not cause an iteration. Does this create a (well-formed) XML document?
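A Python sketch of the same computation may help separate the roles of FOR and LET: the loop iterates over distinct publishers, while the average is computed once per iteration. The element names name, avgprice and result, and the sample data, are assumptions.

```python
import xml.etree.ElementTree as ET
from statistics import mean

# Hypothetical stand-in for bib.xml.
BIB = ET.fromstring(
    "<bib>"
    "<book><publisher>Morgan Kaufmann</publisher><price>40</price></book>"
    "<book><publisher>Morgan Kaufmann</publisher><price>60</price></book>"
    "<book><publisher>Addison-Wesley</publisher><price>30</price></book>"
    "</bib>"
)

def average_prices(bib):
    result = ET.Element("result")  # one enclosing root element
    # distinct(...): dict.fromkeys removes duplicates and keeps first-seen order
    for name in dict.fromkeys(p.text for p in bib.iter("publisher")):  # FOR
        prices = [                                                      # LET
            float(price.text)
            for b in bib.iter("book")
            if any(q.text == name for q in b.findall("publisher"))
            for price in b.findall("price")
        ]
        entry = ET.SubElement(result, "publisher")
        ET.SubElement(entry, "name").text = name
        ET.SubElement(entry, "avgprice").text = str(mean(prices))
    return result

averages = [(e.findtext("name"), e.findtext("avgprice")) for e in average_prices(BIB)]
print(averages)
```

On the question of well-formedness: a bare sequence of elements is not a well-formed document, since a document needs a single root element, which is why this sketch wraps the output in one.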
38Examples from XQuery (cont)
List the publishers who have published more than 100 books.
FOR $p IN distinct(document("bib.xml")//publisher)
LET $b := document("bib.xml")//book[publisher = $p]
WHERE count($b) > 100
RETURN $p
What about efficiency?
39Examples from XQuery (cont)
Invert the structure of the input document so that each distinct author element contains a sequence of book-titles.
FOR $a IN distinct(document("bib.xml")//author)
RETURN
$a/text()
FOR $b IN document("bib.xml")//book[author = $a]
RETURN $b/title
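The nested FOR can be read as a loop inside a loop. The following Python sketch inverts a small, hypothetical bibliography; the output element names (authors, author, name, title) are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for bib.xml.
BIB = ET.fromstring(
    "<bib>"
    "<book><title>T1</title><author>A</author><author>B</author></book>"
    "<book><title>T2</title><author>A</author></book>"
    "</bib>"
)

def invert(bib):
    out = ET.Element("authors")
    # outer FOR: one iteration per distinct author, in first-seen order
    for name in dict.fromkeys(a.text for a in bib.iter("author")):
        author = ET.SubElement(out, "author")
        ET.SubElement(author, "name").text = name
        # inner FOR: every book that lists this author
        for b in bib.iter("book"):
            if any(a.text == name for a in b.findall("author")):
                for t in b.findall("title"):
                    ET.SubElement(author, "title").text = t.text
    return out

inverted = {
    a.findtext("name"): [t.text for t in a.findall("title")]
    for a in invert(BIB)
}
print(inverted)
```

Evaluated naively like this, the inner loop rescans every book once per author, so the cost grows with the product of the two counts.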