Title: Introduction to XQuery
1 - Introduction to XQuery
- Bob DuCharme
- www.snee.com/bob
- bob_at_snee.com
- these slides: www.snee.com/xml
2 What is XQuery?
- "A query language that uses the structure of XML intelligently can
  express queries across all these kinds of data, whether physically
  stored in XML or viewed as XML via middleware. This specification
  describes a query language called XQuery, which is designed to be
  broadly applicable across many types of XML data sources."
- Source: XQuery 1.0: An XML Query Language, W3C Working Draft
3 History
- February 1998: XML (Rec)
- November 1999: XSLT 1.0, XPath 1.0 (Recs)
- (as of 8 June 2005) XPath 2.0, XSLT 2.0, XQuery 1.0 in Last Call Working Draft status
- Steps for a W3C standard:
  - Working Draft
  - Last Call Working Draft
  - Candidate Recommendation
  - Proposed Recommendation
  - Recommendation
4 input1.xml sample document
    <doc>
      <p>This is a sample file.</p>
      <p>This line <emph>really</emph> has an inline element.</p>
      <p>This line doesn't.</p>
      <p>Do <emph>you</emph> like inline elements?</p>
    </doc>
5 Our first query
- Querying from the command line:
    java net.sf.saxon.Query "{doc('input1.xml')//p[emph]}"
- Result:
    <?xml version="1.0" encoding="UTF-8"?>
    <p>This line <emph>really</emph> has an inline element.</p>
    <p>Do <emph>you</emph> like inline elements?</p>
6 Query stored in a file
- xq1.xqy:
    (: Here is an XQuery comment. :)
    doc('data1.xml')//p[emph]
- Executing it:
    java net.sf.saxon.Query xq1.xqy
7 Simplifying the command line
- Linux shell script xquery:
    java net.sf.saxon.Query $1 $2 $3 $4 $5 $6
- Windows batch file xquery.bat:
    java net.sf.saxon.Query %1 %2 %3 %4 %5 %6
- (assuming saxon8.jar is in classpath)
- Executing either:
    xquery xq1.xqy
8 Data for more serious examples
- RecipeML DTD and documentation: http://www.formatdata.com/recipeml
- Squirrel's RecipeML Archive: http://dsquirrel.tripod.com/recipeml/indexrecipes2.html
- My sample: 294 files
9 RecipeML: typical structure
    <recipeml version="0.5">
      <recipe>
        <head>
          <title>Walnut Vinaigrette</title>
          <categories><cat>Dressings</cat></categories>
          <yield>1</yield>
        </head>
        <ingredients>
          <ing>
            <amt><qty>1</qty><unit>cup</unit></amt>
            <item>Canned No Salt Chicken</item></ing>
          <ing>
          <!-- more ing elements -->
        </ingredients>
        <directions>
          <step>Bring chicken broth to a boil.</step>
10 Saxon and collection() function
- Argument to function names a document in this format:
    <collection>
      <doc href="_Band__Sloppy_Joes.xml"/>
      <doc href="_Cheese__Fricadelle.xml"/>
      <!-- more doc elements... -->
      <doc href="Walton_Mountain_Coffee_Cake.xml"/>
      <doc href="Walty's_Dressing.xml"/>
      <doc href="Wan_Tan_(Wonton).xml"/>
    </collection>
11 Looking for some sugar
    collection('recipeml/docs.xml')/recipeml/recipe/head/title
      [//ingredients/ing/item[contains(.,'sugar')]]
12 A more SQL-like approach
    for $ingredient in collection('recipeml/docs.xml')//
        ingredients/ing/item[contains(.,'sugar')]
    return $ingredient/../../../head/title
13 Outputting well-formed XML
    <sweets>
    {
      let $target := 'sugar'
      for $ingredient in collection('recipeml/docs.xml')//
          ingredients/ing/item[contains(., $target)]
      return $ingredient/../../../head/title
    }
    </sweets>
14 FLWOR expressions
- for
- let
- where
- order by
- return
- "a FLWOR expression ... supports iteration and
binding of variables to intermediate results.
This kind of expression is often useful for
computing joins between two or more documents and
for restructuring data."
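As a minimal sketch of how the five clauses fit together, the query below runs over an inline sequence of integers, so it needs no input document. The numbers and the sq element name are made up for illustration and are not part of the recipe data.

```xquery
(: All five FLWOR clauses in one expression. :)
for $n in (3, 1, 4, 1, 5, 9, 2, 6)   (: iterate over each item :)
let $square := $n * $n               (: bind an intermediate result :)
where $square > 10                   (: keep only the larger squares :)
order by $n descending               (: sort what is left :)
return <sq n="{$n}">{$square}</sq>   (: build one element per item :)
```

Saved to a file and run the same way as xq1.xqy, this should return four sq elements, for 9, 6, 5, and 4 in that order.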
15 Extracting subsets: XPath vs. FLWOR approach
- Get the title element for each recipe whose yield is greater than 20:
    collection('recipeml/docs.xml')/recipeml/recipe/head/title[../yield > 20]
- Go through all the documents in the collection, and for any with a
  yield of more than 20, get the title:
    for $doc in collection('recipeml/docs.xml')/recipeml
    where $doc/recipe/head/yield > 20
    return $doc/recipe/head/title
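To compare the two approaches without the recipe collection, here is a self-contained sketch that runs both against a small inline document. The recipes wrapper element, the "Big Batch Chili" entry, and its yield of 24 are invented for the example; only the element names inside recipe follow the RecipeML structure shown earlier.

```xquery
(: Made-up sample data standing in for the recipe collection. :)
let $recipes :=
  <recipes>
    <recipe><head><title>Big Batch Chili</title><yield>24</yield></head></recipe>
    <recipe><head><title>Walnut Vinaigrette</title><yield>1</yield></head></recipe>
  </recipes>
return
  <result>
    <xpath>{ $recipes/recipe/head/title[../yield > 20] }</xpath>
    <flwor>{
      for $r in $recipes/recipe
      where $r/head/yield > 20
      return $r/head/title
    }</flwor>
  </result>
```

Both the xpath and flwor elements should end up holding the same single title. The sketch uses the general comparison > rather than gt: with untyped data like this, gt treats the yield as a string and raises a type error when comparing it to the integer 20.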
16 Doing more with the for clause variable
    (: Create an HTML page linking to recipes
       that serve more than 20 people. :)
    <html><head><title>Food for a Crowd</title></head>
    <body>
    <h1>Food for a Crowd</h1>
    {
      for $doc in collection('recipeml/docs.xml')
      where $doc/recipeml/recipe/head/yield > 20
      return
        <p><a href="{document-uri($doc)}">
          {$doc/recipeml/recipe/head/title/text()}
        </a></p>
    }
    </body></html>
17 Calling functions from a let clause
    (: Which recipe(s) serves the most people? :)
    let $maxYield :=
      max(collection('recipeml/docs.xml')/recipeml/recipe/head/yield)
    return collection('recipeml/docs.xml')/recipeml/
      recipe[head/yield = $maxYield]
18 distinct-values and order by
    (: A unique, sorted list of all unique
       ingredients in the recipe collection,
       with URLs to link to the recipes. :)
    <ingredients>
    {
      for $ingr in
        distinct-values(collection('recipeml/docs.xml')/
          recipeml/recipe/ingredients/ing/item)
      order by $ingr
      return
        <item name="{$ingr}">
        {
          for $doc in
            collection('recipeml/docs.xml')
          where $doc/recipeml/recipe/
            ingredients/ing/item = $ingr
19 distinct-values and order by, continued
          return
            <title url="{document-uri($doc)}">
              {$doc/recipeml/recipe/head/title/text()}
            </title>
        }
        </item>
    }
    </ingredients>
20 Excerpt from output
    <ingredients>
      <!-- some item elements removed -->
      <item name=" (12-oz) tomato paste ">
        <title url="file:/C:/dat/recipeml/_Best_Ever__Pizza_Sauce.xml">
          "Best Ever" Pizza Sauce</title>
      </item>
      <item name=" Baking Powder">
        <title url="file:/c:/dat/recipeml/_Blondie__Brownies.xml">
          "Blondie" Brownies</title>
        <title url="file:/c:/dat/recipeml/Walnut_Pound_Cake.xml">
          Walnut Pound Cake</title>
      </item>
      <item name=" Baking Soda ">
        <title url="file:/c:/dat/recipeml/_Faux__Sourdough.xml">
          "Faux" Sourdough</title>
          "Gold Room" Scones</title>
        <title url="file:/c:/dat/recipeml/_Outrageous_Chocolate_Chipper.xml">
          "Outrageous" Chocolate-Oatmeal Chipper
          (Cooki</title>
      </item>
      <item name="Baking soda">
        <title url="file:/c:/dat/recipeml/_First__Ginger_Cookies.xml">
          "First" Ginger Molasses Cookies</title>
        <title url="file:/c:/dat/recipeml/_Foot_in_the_Cake.xml">
          "Foot in the Fire" Chocolate Cake</title>
      </item>
      <item name="Tomato paste">
        <title url="file:/C:/dat/recipeml/Crawfish_Etouff'ee.xml">
          "Frank's Place" Crawfish Etouff'ee
        </title>
21 RecipeML: varying markup richness
- One way to do it:
    <ing><item>
      (12-oz) tomato paste
    </item></ing>
- Another way:
    <ing>
      <amt>
        <qty>12</qty>
        <unit>oz</unit>
      </amt>
      <item>tomato paste</item>
    </ing>
22 Normalizing data with declared functions
    (: A unique, sorted list of all unique ingredients in
       the recipe collection, with URLs to link to them.
       Ingredient names get normalized by functions
       declared in the query prolog. :)
    declare namespace sn = "http://www.snee.com/ns/misc/";
    declare function sn:normIngName($ingName) as xs:string {
      (: Normalize ingredient name. :)
      (: remove parenthesized expression that may begin
         string, e.g. in "(10 ozs) Rotel diced tomatoes" :)
      let $normedName := replace($ingName,"\(.*?\)\s*","")
      (: convert to all lower-case :)
      let $normedName := lower-case($normedName)
      (: replace multiple spaces with a
         single one :)
      let $normedName := normalize-space($normedName)
      return $normedName
    };
23 Normalizing data with functions, part 2 of 3
    declare function sn:normIngList($ingList) as item()* {
      (: Normalize a list of ingredient names. :)
      for $ingName in $ingList
      return sn:normIngName($ingName)
    };

    <ingredients>
    {
      let $normIngNames :=
        sn:normIngList(collection('recipeml/docs.xml')//
          ing/item)
24 Normalizing data with functions, part 3 of 3
      for $ingr in distinct-values($normIngNames)
      order by $ingr
      return
        <item name="{$ingr}">
        {
          for $doc in
            collection('recipeml/docs.xml'),
            $i in $doc/recipeml/recipe/ingredients/ing/item
          where sn:normIngName($i) = $ingr
          return
            <title url="{document-uri($doc)}">
              {$doc/recipeml/recipe/head/title/text()}
            </title>
        }
        </item>
    }
    </ingredients>
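To check the normalization on its own, without the recipe collection, a small sketch like the following may help. The function name local:norm is hypothetical; it is a compact stand-in that applies the same three steps as the ingredient-name function declared on slide 22 (strip a parenthesized expression, lower-case, collapse whitespace), and it assumes the regular expression was meant to match a parenthesized run plus any trailing whitespace. The four input strings are item names copied from the output excerpt on slide 20.

```xquery
(: local:norm is a hypothetical, compact stand-in for the slide 22 function. :)
declare function local:norm($name as xs:string) as xs:string {
  normalize-space(lower-case(replace($name, "\(.*?\)\s*", "")))
};

(: Raw ingredient names taken from the output excerpt on slide 20. :)
let $raw := (" (12-oz) tomato paste ", "Tomato paste",
             " Baking Soda ", "Baking soda")
return distinct-values(for $r in $raw return local:norm($r))
```

The four raw names should collapse to two distinct values, "tomato paste" and "baking soda", which is why the normalized query groups recipes that the earlier distinct-values query listed under separate items.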
25 Specs at http://www.w3.org/TR
- XQuery 1.0: An XML Query Language
- XQuery 1.0 and XPath 2.0 Formal Semantics
- XQuery 1.0 and XPath 2.0 Data Model
- XSLT 2.0 and XQuery 1.0 Serialization
- XQuery 1.0 and XPath 2.0 Functions and Operators
- XML Query Use Cases
26 Other resources
- eXist: http://www.exist-db.org
- MarkLogic: http://www.marklogic.com
- Mike Kay, Comparing XSLT and XQuery:
  http://idealliance.org/proceedings/xtech05/papers/02-03-01/
- http://www.w3.org/TR:
  - XQuery Update Requirements
  - XQuery 1.0 and XPath 2.0 Full-Text