Title: XML Data Management 8. XQuery
1XML Data Management 8. XQuery
2Requirements for an XML Query Language
- David Maier, W3C XML Query Requirements
- Closedness output must be XML
- Composability wherever a set of XML elements is
required, a subquery is allowed as well - Support for key operations
- selection
- extraction, projection
- restructuring
- combination, join
- fusion of elements
3Requirements for an XML Query Language
- Can benefit from a schema, but should also be
applicable without - Retains the order of nodes
- Formal semantics
- structure of results should be derivable from
query - defines equivalence of queries
- Queries should be representable in XML
- ? documents can have embedded queries
4How Does One Design a Query Language?
- In most query languages, there are two aspects to
a query - Retrieving data (e.g., from where in SQL)
- Creating output (e.g., select in SQL)
- Retrieval consists of
- Pattern matching (e.g., from )
- Filtering (e.g., where )
- although these cannot always be clearly
distinguished
5XQuery Principles
- Data Model identical with the XPath data model
- documents are ordered, labeled trees
- nodes have identity
- nodes can have simple or complex types
(defined in XML Schema) - A query result is an ordered list/sequence of
items - (nodes, values, attributes, etc., but not lists)
- special case the empty list ()
6XQuery Principles (cntd)
- XQuery can be used without schemas, but
can be checked against DTDs and XML schemas - XQuery is a functional language
- no statements
- evaluation of expressions
- function definitions
- modules
7The Recipes DTD (Reminder)
- lt!ELEMENT recipes (recipe)gt
- lt!ELEMENT recipe (title, ingredient,
preparation, nutrition)gt - lt!ELEMENT title (PCDATA)gt
- lt!ELEMENT ingredient (ingredient,
preparation?)gt - lt!ATTLIST ingredient
- name CDATA REQUIRED
- amount CDATA IMPLIED
- unit CDATA IMPLIEDgt
- lt!ELEMENT preparation (step)gt
- lt!ELEMENT step (PCDATA)gt
- lt!ELEMENT nutrition EMPTYgt
- lt!ATTLIST nutrition
- calories CDATA REQUIRED
- fat CDATA REQUIREDgt
8A Query over the Recipes Document
lttitlesgt for r in doc("recipes.xml")//recip
e return r/title lt/titlesgt returns
lttitlesgt lttitlegtBeef Parmesan with Garlic
Angel Hair Pastalt/titlegt lttitlegtRicotta
Pielt/titlegt lt/titlesgt
9Query Features
lttitlesgt for r in doc("recipes.xml")//recipe
return r/title lt/titlesgt
doc(String) returns input document
Sequence of results,one for each variable binding
10An Equivalent Stylesheet Template
- ltxsltemplate match"/"gt
- lttitlesgt
- ltxslfor-each select"//recipe"gt
- ltxslcopy-of select"title"/gt
- lt/xslfor-eachgt
- lt/titlesgt
- lt/xsltemplategt
11Features Summary
- The result is a new XML document
- A query consists of parts that are returned as is
- ... and others that are evaluated (everything in
... ) - Calling the function doc(String) returns
an
input document - XPath is used to retrieve node sets and values
- Iteration over node sets for
binds a variable to all nodes in a node set - Variables can be used in XPath expressions
- return returns a sequence of results,
one for each binding
of a variable
12XPath is a Fragment of XQuery
- doc("recipes.xml")//recipe1/title
- returns
- lttitlegtBeef Parmesan with Garlic Angel Hair
Pastalt/titlegt - doc("recipes.xml")//recipeposition()lt3
/title - returns
- lttitlegtBeef Parmesan with Garlic Angel Hair
Pastalt/titlegt, - lttitlegtRicotta Pielt/titlegt,
- lttitlegtLinguine alla Pescadoralt/titlegt
an element
a list of elements
13Beware Attributes in XPath
- doc("recipes.xml")//recipe1/ingredient1
/_at_name - ? attribute name "beef cube steak"
- string(doc("recipes.xml")//recipe1
/ingredient1/_at_name) - ? "beef cube steak"
an attribute, represented as a constructor for
an attribute node (not in Saxon)
a value of type string
14Beware Attributes in XPath (cntd.)
- ltfirst-ingredientgtstring(doc("recipes.xml")//rec
ipe1 /ingredient1/_at_name)
lt/first-ingredientgt -
- ? ltfirst-ingredientgtbeef cube steaklt/first-ingred
ientgt
an element with string content
15Beware Attributes in XPath (cntd.)
- ltfirst-ingredientgtdoc("recipes.xml")//recipe1
/ingredient1/_at_name - lt/first-ingredientgt
-
- ? ltfirst-ingredient name"beef cube steak"/gt
an element with an attribute
- Note The XML that we write down is only the
surface structure of the data model that is
underlying XQuery
16Beware Attributes in XPath (cntd.)
- ltfirst-ingredient
- oldName"doc("recipes.xml")//recipe1
/ingredient1/_at_name"gtBeeflt/first-i
ngredientgt - ? ltfirst-ingredient oldName"beef cube steak"gt
- Beef
- lt/first-ingredientgt
17Constructor Syntax
- For all constituents of documents, there are
constructors - element first-ingredient
-
- attribute oldName
- string(doc("recipes.xml")//recipe1
/ingredient1/_at_name), - "Beef"
-
- equivalent to the notation on the
previous slide
18Iteration with the For-Clause
- Syntax for var in xpath-expr
- Example for r in doc("recipes.xml")//recipe
return string(r) - The expression creates a list of bindings for a
variable var - If var occurs in an expression exp,
then exp is evaluated for
each binding - For-clauses can be nested
for r in doc("recipes.xml")//recipefor v in
doc("vegetables.xml")//vegetable return ...
19What Does This Return?
-
- for i in (1,2,3)for j in (1,2,3) return
element concat("x",i j) i j
20Nested For-clauses Example
- ltmy-recipesgt
- for r in doc("recipes.xml")//recipe
- return
- ltmy-recipe title"r/title"gt
- for i in r//ingredient
- return
- ltmy-ingredientgt
- string(i/_at_name)
- lt/my-ingredientgt
-
- lt/my-recipegt
-
- lt/my-recipesgt
Returns my-recipes with titles as attributes
and my-ingredients with names as text content
21The Equivalent Stylesheet Template
- ltxsltemplate match"/"gt
- ltmy-recipesgt
- ltxslfor-each select".//recipe"gt
- ltmy-recipe title"title"gt
- ltxslfor-each select"ingredient"gt
- ltmy-ingredientgt
- ltxslvalue-of select"_at_name"/gt
- lt/my-ingredientgt
- lt/xslfor-eachgt
- lt/my-recipegt
- lt/xslfor-eachgt
- lt/my-recipesgt
- lt/xsltemplategt
22The Let Clause
- Syntax let var xpath-expr
- binds variable var to a list of nodes,
with the nodes in document order - does not iterate over the list
- allows one to keep intermediate results for reuse
(not
possible in SQL) - Example
- let oorecps doc("recipes.xml")//recipe
.//ingredient/_at_name"olive oil"
23Let Clause Example
ltcalory-contentgt let oorecps
doc("recipes.xml")//recipe
.//ingredient/_at_name"olive oil" for r in
oorecps return ltcaloriesgt r/title/text()
" " string(r/nutrition/_at_calories)
lt/caloriesgtlt/calory-contentgt
Note the implicit string concatenation
Calories of recipes with olive oil
24Let Clause Example (cntd.)
- The query returns
- ltcalory-contentgt
- ltcaloriesgtBeef Parmesan 1167lt/caloriesgt
- ltcaloriesgtLinguine alla Pescadora
532lt/caloriesgt - lt/calory-contentgt
25The Where Clause
- Syntax where ltconditiongt
- occurs before return clause
- similar to predicates in XPath
- comparisons on nodes
- for node equality
- ltlt and gtgt for document order
- Example
for r in doc("recipes.xml")//recipe where
r//ingredient/_at_name"olive oil" return ...
26Quantifiers
- Syntax some/every var in ltnode-setgt
satisfies ltexprgt - var is bound to all nodes in ltnode-setgt
- Test succeeds if ltexprgt is true for some/every
binding - Note if ltnode-setgt is empty, then
some is false and all is
true
27Quantifiers (Example)
- Recipes that have some compound ingredient
- Recipes where every top level ingredient is
non-compound
for r in doc("recipes.xml")//recipe where some
i in r/ingredient satisfies i/ingredient
return r/title
for r in doc("recipes.xml")//recipe where every
i in r/ingredient satisfies
not(i/ingredient) return r/title
28Element Fusion
- To every recipe, add the attribute calories!
- ltresultgt
- let rs doc("recipes.xml")//recipe
- for r in rs return
- ltrecipegt
- r/nutrition/_at_calories
- r/title
- lt/recipegt
- lt/resultgt
29Element Fusion (cntd.)
- The query result
- ltresultgt
- ltrecipe calories"1167"gt
- lttitlegtBeef Parmesan with Garlic Angel Hair
Pastalt/titlegt - lt/recipegt
- ltrecipe calories"349"gtlttitlegtRicotta
Pielt/titlegtlt/recipegt - ltrecipe calories"532"gtlttitlegtLinguine
Pescadorolt/titlegtlt/recipegt - ltrecipe calories"612"gtlttitlegtZuppa
Ingleselt/titlegtlt/recipegt - ltrecipe calories"8892"gt
- lttitlegtCailles en Sarcophageslt/titlegt
- lt/recipegt
- lt/resultgt
30Fusion with Mixed Syntax
- We mix constructor and XMLSyntax
- element result
- let rs doc("recipes.xml")//recipe
- for r in rs return
- ltrecipegt
- attribute calories r/nutrition/_at_calories
- r/title
- lt/recipegt
31The Same with Constructor Syntax Only
- element result
- let rs doc("recipes.xml")//recipe
- for r in rs return
- element recipe
-
- attribute calories r/nutrition/_at_calories,
- r/title
-
-
32Join
- Pair every ingredient with the recipes where it
is used! - let rs doc("recipes.xml")//recipe
- for i in rs//ingredient
- for r in rs
- where r//ingredient/_at_namei/_at_name
- return
- ltusedingt
- i/_at_name
- r/title
- lt/usedingt
33Join (cntd.)
- The query result
-
- ltusedin name"beef cube steak"gt
- lttitlegtBeef Parmesan with Garlic Angel Hair
Pastalt/titlegt - lt/usedingt,
- ltusedin name"onion, sliced into thin rings"gt
- lttitlegtBeef Parmesan with Garlic Angel Hair
Pastalt/titlegt - lt/usedingt,
- ltusedin name"green bell pepper, sliced in
rings"gt - lttitlegtBeef Parmesan with Garlic Angel Hair
Pastalt/titlegt - lt/usedingt
34Join Exercise
- Return all pairs of ingredients such that
- the ingredients have the same name,
- but occur with different amounts
- and return
- the recipes where each of them is used
- together with the amount being used in those
recipes, - while returning every pair only once.
- Could a query for these ingredients be expressed
in XPath?
35Document Inversion
- For every ingredient, return all the recipes
where it is used! - ltresultgt
- let rs doc("recipes.xml")//recipe
- for i in rs//ingredient
- return
- ltingredientgt
- i/_at_
- rs.//ingredient/_at_namei/_at_name/title
- lt/ingredientgt
- lt/resultgt
36Document Inversion (cntd.)
- The query result
- ltresultgt ltingredient amount"1"
name"Alchermes liquor" unit"cup"gt
lttitlegtZuppa Ingleselt/titlegt lt/ingredientgt -
- ltingredient amount"2" name"olive oil"
unit"tablespoon"gt lttitlegtBeef Parmesan with
Garlic Angel Hair Pastalt/titlegt
lttitlegtLinguine Pescadorolt/titlegt - lt/ingredientgt
-
37Eliminating Duplicates
- The function distinct-values(Node Set)
- extracts the values of a sequence of nodes
- creates a duplicate free list of values
- Note the coercion nodes are cast as values!
- Example
- let rs doc("recipes.xml")//recipereturn
distinct-values(rs//ingredient/_at_name) - yields
- xdtuntypedAtomic("beef cube steak"),
- xdtuntypedAtomic("onion, sliced into thin
rings"), - ...
by the Galaxengine
38Avoiding Multiple Results in a Join
- We want that every ingredient is listed only
once - Eliminate duplicates using distinct-values!
-
- ltresultgtlet rs doc("recipes.xml")//recipe
for in in distinct-values(
rs//ingredient/_at_name) return ltrecipes
with"in"gt rs.//ingredient/_at_namein/tit
le lt/recipesgt lt/resultgt
39Avoiding Multiple Results (cntd.)
- The query result
- ltresultgt ltrecipes with"beef cube steak"gt
lttitlegtBeef Parmesan with Garlic Angel Hair
Pastalt/titlegt lt/recipesgt - ltrecipes with"onion, sliced into thin
rings"gt lttitlegtBeef Parmesan with Garlic
Angel Hair Pastalt/titlegt lt/recipesgt...
ltrecipes with"salt"gt lttitlegtLinguine
Pescadorolt/titlegt lttitlegtCailles en
Sarcophageslt/titlegt - lt/recipesgt
- ...
40The Order By Clause
- Syntax order by expr ascending
descending - for iname in doc("recipes.xml")//_at_name order
by iname descending return string(iname) - yields
- "whole peppercorns","whole baby clams","white
sugar",...
41The Order By Clause (cntd.)
- let rs doc("recipes.xml")//_at_namefor r in
rsorder by r/nutrition/_at_caloriesreturn
r/title -
- In which order will the titles come?
42The Order By Clause (cntd.)
- The interpreter must be told whether the values
- should be regarded as numbers or as strings
(alphanumerical sorting is default) - for r in rsorder by number(r/nutrition/_at_calor
ies)return r/title - Note
- The query returns titles ...
- but the ordering is according to calories, which
do not appear in the output - Also possible in SQL! What if combined with
distinct-values?
43FLWOR Expresssions (pronounced flower)
- We have now seen the main ingredients of XQuery
- For and Let clauses, which can be mixed
- a Where clause imposing conditions
- an Order by clause, which determines the order of
results - a Return clause, which constructs the output.
- Combining these yields FLWOR expressions.
44Conditionals
- if (expr) then expr else expr
- Example
- let is doc("recipes.xml")//ingredientfor i
in isnot(ingredient)let u if
(not(i/_at_unit)) then attribute unit
"pieces" else () - creates an attribute unit"pieces" if none
existsand an empty item list otherwise
45Conditionals (cntd.)
- We use the conditional to construct variants of
ingredients - let is doc("recipes.xml")//ingredientfor i
in isnot(ingredient) let u if
(not(i/_at_unit)) then attribute
"unit" "pieces" else
()returnltingredientgt i/_at_ ult/ingredientgt
46Conditionals (cntd.)
- The query result
- ltingredient name"beef cube steak" amount"1.5"
unit"pound"/gt, - ...
- ltingredient name"eggs" amount"12"
unit"pieces"/gt,
47Exercises
- Write queries that produce
- A list, containing for every recipe the recipe's
title element and an element with the number of
calories - The same, ordered according to calories
- The same, alphabetically ordered according to
title - The same, ordered according to the fat content
- The same, with title as attribute and calories as
content. - A list, containing for every recipe the top level
ingredients, dropping the lower level ingredients
48Sample Solution 1
- A list, containing for every recipe the recipe's
title element and an element with the number of
calories - ltresultgt
- for r in doc("recipes.xml")//recipe
- return
- (r/title,
- ltcaloriesgt
- number(r//_at_calories)
- lt/caloriesgt)
-
- lt/resultgt
The results returned are 2-element lists. The
list constructor is ( . , . )
49Sample Solution 6
- ltresultsgt for r in doc("recipes.xml")//recipe
return ltrecipegt attribute title
r/title, for i in r/ingredient
return if (not(i/ingredient)) then
i else ltingredientgt
i/_at_ lt/ingredientgt lt/recipegt
lt/resultsgt
50Aggregation
- Aggregation functions count, sum, avg, min, max
- Example The number of recipes with olive oil
- let doc doc("recipes.xml)returnltnumbergt
count(doc//recipe .//ingredient/_at_name
"olive oil")lt/numbergt
51Grouping and Aggregation
- For each recipe, the number of simple
ingredients - for r in doc("recipes.xml")//recipereturnltnumb
ergt attribute title r/title/text()
count(r//ingredientnot(ingredient))lt/numbergt
52Grouping and Aggregation (cntd.)
- The query result
- ltnumber title"Beef Parmesan with Garlic Angel
Hair Pasta"gt
11lt/numbergt, - ltnumber title"Ricotta Pie"gt12lt/numbergt,
- ltnumber title"Linguine Pescadoro"gt15lt/numbergt,
- ltnumber title"Zuppa Inglese"gt8lt/numbergt,
- ltnumber title"Cailles en Sarcophages"gt30lt/numbergt
53Grouping and Aggregation (cntd.)
- A list, containing for every ingredient, the
number of occurrences of that ingredient - let d doc("recipes.xml")
- let is distinct-values(d//ingredient/_at_name)
- return
- ltresultgt
- for i in is
- order by i
- return
- ltingredient name"i"gt
- count(d//ingredient_at_namei)
- lt/ingredientgt
- lt/resultgt
54Nested Aggregation
- The recipe with the maximal number of calories!
- let rs doc("recipes.xml")//recipelet
maxCal max(rs//_at_calories)for r in
rswhere r//_at_calories maxCalreturn
string(r/title) - returns
- "Cailles en Sarcophages"
55User-defined Functions
- declare function localfac(n as xsinteger) as
xsinteger -
- if (n 0)
- then 1
- else n localfac(n - 1)
-
- localfac(10)
FunctionDeclaration
FunctionCall
56Example Nested Ingredients
- declare function localnest(n as xsinteger,
content as xsstring) - as element()
-
- if (n 0)
- then element ingredientcontent
- else element ingredientlocalnest(n -
1,content) -
- localnest(3,"Stuff")
57What Does this Function Return?
- declare function localdepth(n as node())
- as xsinteger
-
- if (fnempty(n/))
- then 1
- else let cdepths
- for c in n/ return localdepth(c)
- return fnmax(cdepths) 1
-
58Exercise
- Write a function
- localelement-copy
- that
- takes as input a node ( XML tree)
- produces as output a copy of the tree, but
without the attributes