Introduction to XQuery - PowerPoint PPT Presentation

Provided by: snee

Transcript and Presenter's Notes

Title: Introduction to XQuery


1
  • Introduction to XQuery
  • Bob DuCharme
  • www.snee.com/bob
  • bob_at_snee.com
  • these slides: www.snee.com/xml

2
What is XQuery?
  • "A query language that uses the structure of XML
    intelligently can express queries across all
    these kinds of data, whether physically stored in
    XML or viewed as XML via middleware. This
    specification describes a query language called
    XQuery, which is designed to be broadly
    applicable across many types of XML data
    sources."
  • XQuery 1.0: An XML Query Language,
    W3C Working Draft

3
History
  • February 1998: XML (Rec)
  • November 1999: XSLT 1.0, XPath 1.0 (Recs)
  • As of 8 June 2005: XPath 2.0, XSLT 2.0, and XQuery
    1.0 are in Last Call Working Draft status
  • Steps for a W3C standard:
  • Working Draft
  • Last Call Working Draft
  • Candidate Recommendation
  • Proposed Recommendation
  • Recommendation

4
input1.xml: sample document
  • <doc>
  • <p>This is a sample file.</p>
  • <p>This line <emph>really</emph> has an inline element.</p>
  • <p>This line doesn't.</p>
  • <p>Do <emph>you</emph> like inline elements?</p>
  • </doc>

5
Our first query
  • Querying from the command line:
  • java net.sf.saxon.Query "{doc('input1.xml')//p[emph]}"
  • Result:
  • <?xml version="1.0" encoding="UTF-8"?>
  • <p>This line <emph>really</emph> has an inline element.</p>
  • <p>Do <emph>you</emph> like inline elements?</p>
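
The query above selects only the p elements that have an emph child. As a minimal sketch, the same selection can be tried without any input file by binding the sample document to a variable; the variable name $sample is an illustrative assumption and does not appear in the original slides.

```xquery
(: Self-contained sketch: the sample document is inlined instead of
   being read with doc(); $sample is a hypothetical variable name. :)
let $sample :=
  <doc>
    <p>This is a sample file.</p>
    <p>This line <emph>really</emph> has an inline element.</p>
    <p>This line doesn't.</p>
    <p>Do <emph>you</emph> like inline elements?</p>
  </doc>
(: The predicate [emph] keeps only p elements with an emph child. :)
return $sample//p[emph]
```

This should return the second and fourth p elements, matching the result shown on the slide.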

6
Query stored in a file
  • xq1.xqy:
  • (: Here is an XQuery comment. :)
  • doc('data1.xml')//p[emph]
  • Executing it:
  • java net.sf.saxon.Query xq1.xqy

7
Simplifying the command line
  • Linux shell script xquery:
  • java net.sf.saxon.Query $1 $2 $3 $4 $5 $6
  • Windows batch file xquery.bat:
  • java net.sf.saxon.Query %1 %2 %3 %4 %5 %6
  • (assuming saxon8.jar is in the classpath)
  • Executing either:
  • xquery xq1.xqy

8
Data for more serious examples
  • RecipeML DTD and documentation:
    http://www.formatdata.com/recipeml
  • Squirrel's RecipeML Archive:
    http://dsquirrel.tripod.com/recipeml/indexrecipes2.html
  • My sample: 294 files

9
RecipeML: typical structure
  • <recipeml version="0.5">
  • <recipe>
  • <head>
  • <title>Walnut Vinaigrette</title>
  • <categories><cat>Dressings</cat></categories>
  • <yield>1</yield>
  • </head>
  • <ingredients>
  • <ing>
  • <amt><qty>1</qty><unit>cup</unit></amt>
  • <item>Canned No Salt Chicken</item></ing>
  • <ing>
  • <!-- more ing elements -->
  • </ingredients>
  • <directions>
  • <step>Bring chicken broth to a boil.</step>

10
Saxon and collection() function
  • The argument to the function names a document in this
    format:
  • <collection>
  • <doc href="_Band__Sloppy_Joes.xml"/>
  • <doc href="_Cheese__Fricadelle.xml"/>
  • <!-- more doc elements... -->
  • <doc href="Walton_Mountain_Coffee_Cake.xml"/>
  • <doc href="Walty's_Dressing.xml"/>
  • <doc href="Wan_Tan_(Wonton).xml"/>
  • </collection>

11
Looking for some sugar
  • collection('recipeml/docs.xml')/recipeml/recipe/head/
    title[//ingredients/ing/item[contains(.,'sugar')]]

12
A more SQL-like approach
  • for $ingredient in collection('recipeml/docs.xml')//
  • ingredients/ing/item[contains(.,'sugar')]
  • return $ingredient/../../../head/title

13
Outputting well-formed XML
  • <sweets>{
  • let $target := 'sugar'
  • for $ingredient in collection('recipeml/docs.xml')//
  • ingredients/ing/item[contains(., $target)]
  • return $ingredient/../../../head/title
  • }</sweets>

14
FLWOR expressions
  • for
  • let
  • where
  • order by
  • return
  • "a FLWOR expression ... supports iteration and
    binding of variables to intermediate results.
    This kind of expression is often useful for
    computing joins between two or more documents and
    for restructuring data."
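
The five clauses can be seen working together in one short query. The following is a minimal sketch over inlined data rather than the recipe collection; the element names recipes and big, the variable names, and two of the three recipe titles are made up for illustration.

```xquery
(: Hypothetical inline data; only "Walnut Vinaigrette" with a yield
   of 1 comes from the slides. :)
let $recipes :=
  <recipes>
    <recipe><title>Party Punch</title><yield>30</yield></recipe>
    <recipe><title>Walnut Vinaigrette</title><yield>1</yield></recipe>
    <recipe><title>Chili for a Crowd</title><yield>24</yield></recipe>
  </recipes>
(: for: iterate over each recipe :)
for $r in $recipes/recipe
(: let: bind an intermediate result, here the yield as an integer :)
let $servings := xs:integer($r/yield)
(: where: filter the iteration :)
where $servings gt 20
(: order by: sort what is left :)
order by $servings descending
(: return: build one result item per surviving recipe :)
return <big servings="{$servings}">{string($r/title)}</big>
```

Because the yield is cast to xs:integer first, the value comparison gt is safe here; comparing untyped element content directly is usually done with the general comparison operator > instead.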

15
Extracting subsets: XPath vs. FLWOR approach
  • Get the title element for each recipe whose yield
    is greater than 20:
  • collection('recipeml/docs.xml')/recipeml/
    recipe/head/title[../yield > 20]
  • Go through all the documents in the collection,
    and for any with a yield of more than 20, get the
    title:
  • for $doc in collection('recipeml/docs.xml')/recipeml
  • where $doc/recipe/head/yield > 20
  • return $doc/recipe/head/title

16
Doing more with the for clause variable
  • (: Create an HTML page linking to recipes
  • that serve more than 20 people. :)
  • <html><head><title>Food for a Crowd</title></head>
  • <body>
  • <h1>Food for a Crowd</h1>
  • { for $doc in collection('recipeml/docs.xml')
  • where $doc/recipeml/recipe/head/yield > 20
  • return
  • <p><a href="{document-uri($doc)}">
  • {$doc/recipeml/recipe/head/title/text()}
  • </a></p> }
  • </body></html>

17
Calling functions from a let clause
  • (: Which recipe(s) serves the most people? :)
  • let $maxYield :=
  • max(collection('recipeml/docs.xml')/recipeml/
    recipe/head/yield)
  • return collection('recipeml/docs.xml')/recipeml/
    recipe[head/yield = $maxYield]
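
The "(s)" in the comment matters: more than one recipe can share the highest yield, and the predicate returns all of them. A minimal sketch with inlined data, where the recipes wrapper element and two of the titles are hypothetical:

```xquery
(: Hypothetical inline data with two recipes tied for the top yield. :)
let $recipes :=
  <recipes>
    <recipe><head><title>Party Punch</title><yield>30</yield></head></recipe>
    <recipe><head><title>Walnut Vinaigrette</title><yield>1</yield></head></recipe>
    <recipe><head><title>Chili for a Crowd</title><yield>30</yield></head></recipe>
  </recipes>
(: max() treats the untyped yield values as numbers. :)
let $maxYield := max($recipes/recipe/head/yield)
(: The general comparison = selects every recipe that reaches the maximum. :)
return $recipes/recipe[head/yield = $maxYield]/head/title
```

This should return the title elements of both recipes with a yield of 30.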

18
distinct-values and order by
  • (: A sorted list of all unique
  • ingredients in the recipe collection,
  • with URLs to link to the recipes. :)
  • <ingredients>
  • { for $ingr in
  • distinct-values( collection('recipeml/docs.xml')/
  • recipeml/recipe/ingredients/ing/item )
  • order by $ingr
  • return
  • <item name="{$ingr}">
  • { for $doc in
  • collection('recipeml/docs.xml')
  • where $doc/recipeml/recipe/
  • ingredients/ing/item = $ingr

19
distinct-values and order by, continued
  • return
  • <title url="{document-uri($doc)}">
  • {$doc/recipeml/recipe/head/title/text()}
  • </title> }
  • </item> }
  • </ingredients>

20
Excerpt from output
  • <ingredients>
  • <!-- some item elements removed -->
  • <item name=" (12-oz) tomato paste ">
  • <title url="file:/C:/dat/recipeml/_Best_Ever__Pizza_Sauce.xml">
  • "Best Ever" Pizza Sauce</title>
  • </item>
  • <item name=" Baking Powder">
  • <title url="file:/c:/dat/recipeml/_Blondie__Brownies.xml">
  • "Blondie" Brownies</title>
  • <title url="file:/c:/dat/recipeml/Walnut_Pound_Cake.xml">
  • Walnut Pound Cake</title>
  • </item>
  • <item name=" Baking Soda ">
  • <title url="file:/c:/dat/recipeml/_Faux__Sourdough.xml">
  • "Faux" Sourdough</title>
  • "Gold Room" Scones</title>
  • <title url="file:/c:/dat/recipeml/_Outrageous_Chocolate_Chipper.xml">
  • "Outrageous" Chocolate-Oatmeal Chipper
  • (Cooki</title>
  • </item>
  • <item name="Baking soda">
  • <title url="file:/c:/dat/recipeml/_First__Ginger_Cookies.xml">
  • "First" Ginger Molasses Cookies</title>
  • <title url="file:/c:/dat/recipeml/_Foot_in_the_Cake.xml">
  • "Foot in the Fire" Chocolate Cake</title>
  • </item>
  • <item name="Tomato paste">
  • <title url="file:/C:/dat/recipeml/Crawfish_Etouff'ee.xml">
  • "Frank's Place" Crawfish Etouff'ee
  • </title>
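
The excerpt lists the same ingredient more than once because distinct-values() compares strings exactly, so differences in case and surrounding whitespace produce separate entries. A minimal sketch using literal strings taken from the output above; the variable names are illustrative assumptions:

```xquery
(: Four raw ingredient names, one of them an exact duplicate. :)
let $items := (" Baking Soda ", "Baking soda", "Baking soda", "Tomato paste")
(: distinct-values() removes only the exact duplicate. :)
for $i in distinct-values($items)
order by $i
return <item name="{$i}"/>
```

This yields three item elements rather than two, since " Baking Soda " and "Baking soda" are different strings.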

21
RecipeML: varying markup richness
  • One way to do it:
  • <ing><item>
  • (12-oz) tomato paste
  • </item></ing>
  • Another way:
  • <ing>
  • <amt>
  • <qty>12</qty>
  • <unit>oz</unit>
  • </amt>
  • <item>tomato paste</item>
  • </ing>

22
Normalizing data with declared functions
  • (: A sorted list of all unique ingredients in
  • the recipe collection, with URLs to link to them.
  • Ingredient names get normalized by functions
  • declared in the query prolog. :)
  • declare namespace sn = "http://www.snee.com/ns/misc/";
  • declare function sn:normIngName($ingName) as xs:string {
  • (: Normalize ingredient name. :)
  • (: remove parenthesized expression that may begin
  • string, e.g. in "(10 ozs) Rotel diced tomatoes" :)
  • let $normedName := replace($ingName,"\(.*?\)\s*","")
  • (: convert to all lower-case :)
  • let $normedName := lower-case($normedName)
  • (: replace multiple spaces with a
  • single one :)
  • let $normedName := normalize-space($normedName)
  • return $normedName
  • };

23
Normalizing data with functions, part 2 of 3
  • declare function sn:normIngList($ingList) as item()* {
  • (: Normalize a list of ingredient names. :)
  • for $ingName in $ingList
  • return sn:normIngName($ingName)
  • };
  • <ingredients>
  • { let $normIngNames :=
  • sn:normIngList(collection('recipeml/docs.xml')//
  • ing/item)

24
Normalizing data with functions, part 3 of 3
  • for $ingr in distinct-values($normIngNames)
  • order by $ingr
  • return
  • <item name="{$ingr}">
  • { for $doc in
  • collection('recipeml/docs.xml'),
  • $i in $doc/recipeml/recipe/ingredients/ing/item
  • where sn:normIngName($i) = $ingr
  • return
  • <title url="{document-uri($doc)}">
  • {$doc/recipeml/recipe/head/title/text()}
  • </title> }
  • </item> }
  • </ingredients>
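
The three normalization steps (strip a parenthesized expression, lower-case, collapse whitespace) can be checked on a single value before running them over the whole collection. This is a minimal sketch, assuming the regular expression `\(.*?\)\s*` for the first step; the variable names are hypothetical, and the input string is one of the ingredient names from the earlier output excerpt.

```xquery
(: One raw ingredient name as it appears in the recipe data. :)
let $raw := " (12-oz) tomato paste "
(: Step 1: drop the parenthesized quantity and the spaces after it. :)
let $noParens := replace($raw, "\(.*?\)\s*", "")
(: Step 2: convert to lower case. :)
let $lower := lower-case($noParens)
(: Step 3: trim and collapse whitespace. :)
return normalize-space($lower)
```

The result is the string "tomato paste", which is also what "Tomato paste" normalizes to, so both raw names end up under the same item.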

25
Specs at http://www.w3.org/TR
  • XQuery 1.0: An XML Query Language
  • XQuery 1.0 and XPath 2.0 Formal Semantics
  • XQuery 1.0 and XPath 2.0 Data Model
  • XSLT 2.0 and XQuery 1.0 Serialization
  • XQuery 1.0 and XPath 2.0 Functions and Operators
  • XML Query Use Cases

26
Other resources
  • eXist: http://www.exist-db.org
  • MarkLogic: http://www.marklogic.com
  • Mike Kay, Comparing XSLT and XQuery:
    http://idealliance.org/proceedings/xtech05/papers/02-03-01/
  • http://www.w3.org/TR:
  • XQuery Update Requirements
  • XQuery 1.0 and XPath 2.0 Full-Text