Title: Querying XML: XQuery and XSLT
1Querying XML: XQuery and XSLT
- Zachary G. Ives
- University of Pennsylvania
- CIS 550 Database Information Systems
- October 21, 2009
Some slide content courtesy of Susan Davidson,
Dan Suciu, Raghu Ramakrishnan
2Reminders
- Homework 4 handed out
- XQuery and XPath
- Midterm on Monday
- Will have more detailed project specs next week
3Querying XML
- How do you query a directed graph? A tree?
- The standard approach used by many XML, semistructured-data, and object query languages:
- Define some sort of a template describing traversals from the root of the directed graph
- In XML, the basis of this template is called an XPath
- In its simplest form, an XPath is like a path in a file system
- But there are more elaborate versions with axes, predicates, etc.
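A minimal Python sketch of the file-system analogy, using the standard library's xml.etree.ElementTree, which implements only a small subset of XPath. The sample document, its keys, and its titles are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature DBLP-style document (not real data).
DOC = """
<dblp>
  <article key="a1"><title>Paper A</title><year>1997</year></article>
  <article key="a2"><title>Paper B</title><year>2002</year></article>
  <mastersthesis key="m1"><title>Thesis C</title><year>2002</year></mastersthesis>
</dblp>
"""

root = ET.fromstring(DOC)  # root is the <dblp> element

# Simple path, like a directory path: every title under an article.
article_titles = [t.text for t in root.findall("./article/title")]

# Path with a predicate: only children whose <year> child equals 2002.
titles_2002 = [e.find("title").text for e in root.findall("./*[year='2002']")]

# Descendant step (//): every <year> anywhere below the root.
all_years = [y.text for y in root.findall(".//year")]

print(article_titles, titles_2002, all_years)
```

A full XPath engine adds further axes (parent, sibling, ancestor) and richer predicates than this subset offers.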
4XML Data Model Visualized
[Figure: a DBLP XML document drawn as a tree. Legend: root, processing instruction (p-i), element, attribute, text. Nodes shown: Root; ?xml; dblp; mastersthesis and article; attributes mdate and key; elements author, title, year, school, editor, journal, volume, ee; sample text leaves such as 2002, 1992, 1997, ms/Brown92.]
5XQuery
- A strongly-typed, Turing-complete XML manipulation language
- Attempts to do static typechecking against XML Schema
- Based on an object model derived from Schema
- Unlike SQL, fully compositional, highly orthogonal
- Inputs and outputs are collections (sequences or bags) of XML nodes
- Anywhere a particular type of object may be used, may use the results of a query of the same type
- Designed mostly by DB and functional language people
- Attempts to satisfy the needs of data management and document management
- The database-style core is mostly complete (even has support for NULLs in XML!!)
- The document keyword querying features are still in the works; this shows in the order-preserving default model
6XQuery's Basic Form
- Has an analogous form to SQL's SELECT..FROM..WHERE..GROUP BY..ORDER BY
- The model: bind nodes (or node sets) to variables; operate over each legal combination of bindings; produce a set of nodes
- "FLWOR" statement (note case sensitivity!):
- for: iterators that bind variables
- let: collections
- where: conditions
- order by: order-conditions (older version was "SORTBY")
- return: output constructor
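The five FLWOR stages can be mimicked with ordinary loops. The following is a rough Python sketch of the evaluation model, not of how a real XQuery engine works; the sample data and the function name flwor are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical input; element names loosely follow the DBLP examples.
DOC = """
<dblp>
  <proceedings><title>VLDB</title><yr>1999</yr></proceedings>
  <proceedings><title>SIGMOD</title><yr>1999</yr></proceedings>
  <proceedings><title>ICDE</title><yr>2001</yr></proceedings>
</dblp>
"""

def flwor(root):
    tuples = []
    # for: iterate over every legal combination of bindings ($p, $yr)
    for p in root.findall("./proceedings"):
        for yr in p.findall("./yr"):
            # let: bind a whole collection (here, all titles of $p)
            titles = [t.text for t in p.findall("./title")]
            # where: keep only the binding tuples that pass the condition
            if yr.text == "1999":
                tuples.append((titles[0] if titles else "", p))
    # order by: sort the surviving tuples on a key
    tuples.sort(key=lambda pair: pair[0])
    # return: construct one output element per tuple
    out = ET.Element("root-tag")
    for _, p in tuples:
        ET.SubElement(out, "proc").append(p)
    return out

result = flwor(ET.fromstring(DOC))
print(ET.tostring(result, encoding="unicode"))
```

Each (title, p) pair plays the role of one binding tuple that flows from the for clause to the return clause.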
7Iterations in XQuery
- A series of (possibly nested) FOR statements assigning the results of XPaths to variables:
    for $root in document("http://my.org/my.xml")
    for $sub in $root/rootElement,
        $sub2 in $sub/subElement, ...
- Something like a template that pattern-matches, produces a "binding tuple"
- For each of these, we evaluate the WHERE and possibly output the RETURN template
- document() or doc() function specifies an input file as a URI
- Old version was "document"; now "doc", but it depends on your XQuery implementation
8Two XQuery Examples
    <root-tag> {
      for $p in document("dblp.xml")/dblp/proceedings,
          $yr in $p/yr
      where $yr = 1999
      return <proc> {$p} </proc>
    } </root-tag>

    for $i in document("dblp.xml")/dblp/inproceedings[author/text() = "John Smith"]
    return <smith-paper>
             <title>{ $i/title/text() }</title>
             <key>{ $i/@key }</key>
             { $i/crossref }
           </smith-paper>
9Nesting in XQuery
- Nesting XML trees is perhaps the most common operation
- In XQuery, it's easy: put a subquery in the return clause where you want things to repeat!
    for $u in document("dblp.xml")/universities
    where $u/country = "USA"
    return <ms-theses-99>
             { $u/title }
             { for $mt in $u/../mastersthesis
               where $mt/year/text() = 1999 and ____________
               return $mt/title }
           </ms-theses-99>
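A hedged Python sketch of the same nesting pattern: the inner loop sits where the repeated output is wanted. The sample data is invented, and the join condition on school is an assumption of this sketch only; the slide leaves that condition blank.

```python
import xml.etree.ElementTree as ET

# Hypothetical data; the layout and the <school> join key are assumptions.
DOC = """
<dblp>
  <universities><title>Penn</title><country>USA</country></universities>
  <universities><title>ETH</title><country>Switzerland</country></universities>
  <mastersthesis><title>T1</title><year>1999</year><school>Penn</school></mastersthesis>
  <mastersthesis><title>T2</title><year>1998</year><school>Penn</school></mastersthesis>
  <mastersthesis><title>T3</title><year>1999</year><school>ETH</school></mastersthesis>
</dblp>
"""

def theses_by_university(root):
    out = []
    for u in root.findall("./universities"):            # outer for
        if u.findtext("country") != "USA":              # outer where
            continue
        wrapper = ET.Element("ms-theses-99")            # outer return
        ET.SubElement(wrapper, "title").text = u.findtext("title")
        for mt in root.findall("./mastersthesis"):      # nested subquery
            # Assumed join condition (not taken from the slide).
            same_school = mt.findtext("school") == u.findtext("title")
            if mt.findtext("year") == "1999" and same_school:
                ET.SubElement(wrapper, "title").text = mt.findtext("title")
        out.append(wrapper)
    return out

nested = theses_by_university(ET.fromstring(DOC))
for element in nested:
    print(ET.tostring(element, encoding="unicode"))
```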
10Collections & Aggregation in XQuery
- In XQuery, many operations return collections
- XPaths, sub-XQueries, functions over these, ...
- The let clause assigns the results to a variable
- Aggregation simply applies a function over a collection, where the function returns a value (very elegant!)
    let $allpapers := document("dblp.xml")/dblp/article
    return <article-authors>
             <count> { fn:count(fn:distinct-values($allpapers/author)) } </count>
             { for $paper in doc("dblp.xml")/dblp/article
               let $pauth := $paper/author
               return <paper> { $paper/title }
                        <count> { fn:count($pauth) } </count>
                      </paper> }
           </article-authors>
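As a rough illustration, the same two aggregates can be written in Python, where an aggregate is likewise just a function applied to a collection. The sample document is made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data.
DOC = """
<dblp>
  <article><title>P1</title><author>Smith</author></article>
  <article><title>P2</title><author>Gray</author><author>Smith</author></article>
</dblp>
"""

root = ET.fromstring(DOC)

# let: bind the whole collection of articles to one variable.
all_papers = root.findall("./article")

# count(distinct-values(...)): collect author values, de-duplicate, count.
author_values = [a.text for p in all_papers for a in p.findall("./author")]
distinct_author_count = len(set(author_values))

# Per-paper aggregate: count the authors bound by the inner let.
authors_per_paper = {p.findtext("title"): len(p.findall("./author"))
                     for p in all_papers}

print(distinct_author_count, authors_per_paper)
```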
11Collections, Ctd.
- Unlike in SQL, we can compose aggregations and create new collections from old:
    <result> {
      let $avgItemsSold := fn:avg(
        for $order in document("my.xml")/orders/order
        let $totalSold := fn:sum($order/item/quantity)
        return $totalSold)
      return $avgItemsSold
    } </result>
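A small Python sketch of composing aggregates, with invented order data: the inner comprehension produces a new collection of per-order totals, and the outer aggregate averages that collection.

```python
import xml.etree.ElementTree as ET
from statistics import mean

# Hypothetical orders document.
DOC = """
<orders>
  <order><item><quantity>2</quantity></item><item><quantity>3</quantity></item></order>
  <order><item><quantity>7</quantity></item></order>
</orders>
"""

root = ET.fromstring(DOC)

# Inner query: one sum per order yields a new collection of totals.
totals = [sum(int(q.text) for q in order.findall("./item/quantity"))
          for order in root.findall("./order")]

# Outer aggregate: the average is applied to that derived collection.
avg_items_sold = mean(totals)

print(totals, avg_items_sold)
```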
12Distinct-ness
- In XQuery, DISTINCT-ness happens as a function over a collection
- But since we have nodes, we can do duplicate removal according to value or node
- Can do fn:distinct-values(collection) to remove duplicate values, or fn:distinct-nodes(collection) to remove duplicate nodes (fn:distinct-nodes comes from early working drafts and is not part of the final XQuery 1.0 function library)
    for $years in fn:distinct-values(doc("dblp.xml")//year/text())
    return $years
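The difference between the two notions of duplicate can be sketched in Python, using text equality for values and object identity as a stand-in for node identity. The data is made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: two different <year> nodes share the value 1999.
DOC = """
<dblp>
  <article><year>1999</year></article>
  <article><year>1999</year></article>
  <article><year>2002</year></article>
</dblp>
"""

root = ET.fromstring(DOC)
years = root.findall(".//year")

# Duplicate removal by value: compare text content, keeping first-seen order.
distinct_values = list(dict.fromkeys(y.text for y in years))

# Duplicate removal by node: the same node listed twice collapses to one,
# but two different nodes with equal text both survive.
with_repeats = years + years[:1]
distinct_nodes = list({id(y): y for y in with_repeats}.values())

print(distinct_values, len(with_repeats), len(distinct_nodes))
```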
13Sorting in XQuery
- SQL actually allows you to sort its output, with a special ORDER BY clause (which we haven't discussed, but which specifies a sort key list)
- XQuery borrows this idea
- In XQuery, what we order is the sequence of "result tuples" output by the return clause:
    for $x in document("dblp.xml")/proceedings
    order by $x/title/text()
    return $x
14What If Order Doesn't Matter?
- By default:
- SQL is unordered
- XQuery is ordered everywhere!
- But unordered queries are much faster to answer
- XQuery has a way of telling the query engine to avoid preserving order:
    unordered { for $x in (mypath) ... }
15Querying & Defining Metadata: Can't Do This in SQL
- Can get a node's name by querying node-name():
    for $x in document("dblp.xml")/dblp/*
    return node-name($x)
- Can construct elements and attributes using computed names:
    for $x in document("dblp.xml")/dblp/*,
        $year in $x/year,
        $title in $x/title/text()
    return
      element { node-name($x) } {
        attribute { concat("year-", $year) } { $title }
      }
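A brief Python sketch of the same idea, with invented data: element names are read as data, then reused as computed element and attribute names.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data.
DOC = """
<dblp>
  <article><title>P1</title><year>1997</year></article>
  <mastersthesis><title>T1</title><year>1992</year></mastersthesis>
</dblp>
"""

root = ET.fromstring(DOC)

# Counterpart of node-name($x): the element name is data we can query.
names = [child.tag for child in root]

# Computed constructors: element and attribute names come from the data.
rebuilt = []
for x in root:
    for year in x.findall("./year"):
        for title in x.findall("./title"):
            rebuilt.append(ET.Element(x.tag, {"year-" + year.text: title.text}))

for element in rebuilt:
    print(ET.tostring(element, encoding="unicode"))
```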
16XQuery Summary
- Very flexible and powerful language for XML
- Clean and orthogonal: can always replace a collection with an expression that creates collections
- DB and document-oriented (we hope)
- The core is relatively clean and easy to understand
- Turing complete; we'll talk more about XQuery functions soon
17XSL(T): The Bridge Back to HTML
- XSL (Extensible Stylesheet Language) is actually divided into two parts:
- XSL-FO: formatting for XML
- XSLT: a special transformation language
- We'll leave XSL-FO for you to read off www.w3.org, if you're interested
- XSLT is actually able to convert from XML → HTML, which is how many people do their formatting today
- Products like Apache Cocoon generally translate XML → HTML on the server side
18A Different Style of Language
- XSLT is based on a series of templates that match different parts of an XML document
- There's a policy for what rule or template is applied if more than one matches (it's not what you'd think!)
- XSLT templates can invoke other templates
- XSLT templates can be nonterminating (beware!)
- XSLT templates are based on XPath matches, and we can also apply other templates (potentially to selected XPaths)
- Within each template, we describe what should be output
- (Matches to text default to outputting it)
19An XSLT Stylesheet
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:template match="/dblp">
        <html><head>This is DBLP</head>
        <body>
          <xsl:apply-templates />
        </body>
        </html>
      </xsl:template>
      <xsl:template match="inproceedings">
        <h2><xsl:apply-templates select="title" /></h2>
        <p><xsl:apply-templates select="author"/></p>
      </xsl:template>
      ...
    </xsl:stylesheet>
20Results of XSLT Stylesheet
Input:
    <dblp>
      <inproceedings>
        <title>Paper1</title>
        <author>Smith</author>
      </inproceedings>
      <inproceedings>
        <author>Chakrabarti</author>
        <author>Gray</author>
        <title>Paper2</title>
      </inproceedings>
    </dblp>
Output (whitespace aside):
    <html><head>This is DBLP</head>
    <body>
      <h2>Paper1</h2>
      <p>Smith</p>
      <h2>Paper2</h2>
      <p>ChakrabartiGray</p>
    </body>
    </html>
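To make the template-matching model concrete, here is a toy Python sketch of rule dispatch. It is not an XSLT processor: rules are looked up by element name only, which stands in for XPath match patterns and ignores XSLT's conflict-resolution policy. All function names are invented. Under these rules, all authors selected inside one inproceedings element end up in a single <p>, because that rule emits one <p> around everything it selects.

```python
import xml.etree.ElementTree as ET

SOURCE = (
    "<dblp>"
    "<inproceedings><title>Paper1</title><author>Smith</author></inproceedings>"
    "<inproceedings><author>Chakrabarti</author><author>Gray</author>"
    "<title>Paper2</title></inproceedings>"
    "</dblp>"
)

def apply_templates(nodes):
    """Apply the matching rule to each node and concatenate the output."""
    return "".join(RULES.get(n.tag, builtin_rule)(n) for n in nodes)

def builtin_rule(node):
    # Default behaviour: output the node's text, then recurse into children.
    return (node.text or "") + "".join(
        apply_templates([child]) + (child.tail or "") for child in node)

def dblp_rule(node):
    # Counterpart of the template matching /dblp.
    return ("<html><head>This is DBLP</head><body>"
            + apply_templates(list(node)) + "</body></html>")

def inproceedings_rule(node):
    # Counterpart of the template matching inproceedings.
    return ("<h2>" + apply_templates(node.findall("title")) + "</h2>"
            + "<p>" + apply_templates(node.findall("author")) + "</p>")

# Rule table: element name -> template function.
RULES = {"dblp": dblp_rule, "inproceedings": inproceedings_rule}

html = apply_templates([ET.fromstring(SOURCE)])
print(html)
```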
21What XSLT Can and Can't Do
- XSLT is great at converting XML to other formats
- XML → diagrams in SVG, HTML, LaTeX, ...
- XSLT doesn't do joins (well), it only works on one XML file at a time, and it's limited in certain respects
- It's not a query language, really
- But it's a very good formatting language
- Most web browsers (post Netscape 4.7x) support XSLT and XSL formatting objects
- But most real implementations use XSLT with something like Apache Cocoon
- You may want to use XSL/XSLT for your projects; see www.w3.org/TR/xslt for the spec
22Querying XML
- We've seen three XML manipulation formalisms today:
- XPath: the basic language for "projecting and selecting" (evaluating path expressions and predicates) over XML
- XQuery: a statically typed, Turing-complete XML processing language
- XSLT: a template-based language for transforming XML documents
- Each is extremely useful for certain applications!
23Views in SQL and XQuery
- A view is a named query
- We use the name of the view to invoke the query (treating it as if it were the relation it returns)
SQL:
    CREATE VIEW V(A,B,C) AS
      SELECT A,B,C FROM R WHERE R.A = 123
XQuery:
    declare function local:V() as element(content)* {
      for $r in doc("R")/root/tree,
          $a in $r/a, $b in $r/b, $c in $r/c
      where $a = 123
      return <content>{$a, $b, $c}</content>
    };
Using the views:
    SELECT * FROM V, R
    WHERE V.B = 5 AND V.C = R.C

    for $v in local:V(), $r in doc("R")/root/tree
    where $v/b = $r/b
    return $v
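The SQL side can be tried out with Python's built-in sqlite3 module. This is a minimal sketch: the column types and the three rows of R are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema and data for R.
    CREATE TABLE R (A INTEGER, B INTEGER, C TEXT);
    INSERT INTO R VALUES (123, 5, 'x'), (123, 6, 'y'), (999, 5, 'x');
    CREATE VIEW V(A, B, C) AS
        SELECT A, B, C FROM R WHERE R.A = 123;
""")

# The view is used by name, exactly as if it were a stored relation.
rows = conn.execute("""
    SELECT V.A, V.B, V.C, R.A
    FROM V, R
    WHERE V.B = 5 AND V.C = R.C
    ORDER BY R.A
""").fetchall()

print(rows)
```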
24What's Useful about Views
- Providing security/access control
- We can assign users permissions on different views
- Can select or project so we only reveal what we want!
- Can be used as relations in other queries
- Allows the user to query things that make more sense
- Describe transformations from one schema (the base relations) to another (the output of the view)
- The basis of converting from XML to relations or vice versa
- This will be incredibly useful in data integration, discussed soon
- Allow us to define recursive queries
25Materialized vs. Virtual Views
- A virtual view is a named query that is actually re-computed every time it is used; it is merged with the referencing query:
    CREATE VIEW V(A,B,C) AS
      SELECT A,B,C FROM R WHERE R.A = 123
    SELECT * FROM V, R
    WHERE V.B = 5 AND V.C = R.C
- A materialized view is one that is computed once and its results are stored as a table
- Think of this as a cached answer
- These are incredibly useful!
- Techniques exist for using materialized views to answer other queries
- Materialized views are the basis of relating tables in different schemas
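A short sqlite3 sketch of the difference. SQLite has no materialized views, so a table created from the query stands in for one here; that substitution, the view names, and the sample rows are all assumptions of this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE R (A INTEGER, B INTEGER, C TEXT);
    INSERT INTO R VALUES (123, 5, 'x');
    -- Virtual view: only the query text is stored.
    CREATE VIEW V_virtual AS SELECT A, B, C FROM R WHERE A = 123;
    -- Stand-in for a materialized view: the answer is stored as a table.
    CREATE TABLE V_materialized AS SELECT A, B, C FROM R WHERE A = 123;
""")

# The base relation changes after both views were defined.
conn.execute("INSERT INTO R VALUES (123, 6, 'y')")

virtual_count = conn.execute("SELECT COUNT(*) FROM V_virtual").fetchone()[0]
cached_count = conn.execute("SELECT COUNT(*) FROM V_materialized").fetchone()[0]

print(virtual_count, cached_count)
```

The virtual view sees the new tuple because it is re-evaluated against R; the stored copy stays stale until something refreshes it.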
26Views Should Stay Fresh
- Views (sometimes called intensional relations) behave, from the perspective of a query language, exactly like base relations (extensional relations)
- But there's an association that should be maintained:
- If tuples change in the base relation, they should change in the view (whether it's materialized or not)
- If tuples change in the view, that should reflect in the base relation(s)
27View Maintenance and the View Update Problem
- There exist algorithms to incrementally recompute a materialized view when the base relations change
- We can try to propagate view changes to the base relations
- However, there are lots of views that aren't easily updatable
- We can ensure views are updatable by enforcing certain constraints (e.g., no aggregation), but this limits the kinds of views we can have!
[Figure: tables R, S, and R ⋈ S, with a tuple of the view marked "delete?"]
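One simple form of incremental maintenance can be sketched in Python for a join view: when a tuple is inserted into or deleted from R, only that tuple is joined with S and the result is added to or removed from the stored view. The relations, their contents, and the function names are hypothetical. Plain set difference is safe here only because each view tuple has exactly one derivation; general algorithms keep derivation counts.

```python
# Hypothetical base relations: R(a, c) and S(c, d), joined on c.
R = {(1, "x"), (2, "y")}
S = {("x", 10), ("x", 11), ("y", 20)}

def join(r_tuples, s_tuples):
    """Natural join on the shared attribute c."""
    return {(a, c, d) for (a, c) in r_tuples for (c2, d) in s_tuples if c == c2}

view = join(R, S)  # materialized once, then maintained incrementally

def insert_into_r(t):
    # Delta rule: join only the new tuple with S.
    if t not in R:
        R.add(t)
        view.update(join({t}, S))

def delete_from_r(t):
    # Remove exactly the view tuples that were derived from t.
    if t in R:
        R.remove(t)
        view.difference_update(join({t}, S))

insert_into_r((3, "x"))
delete_from_r((2, "y"))

maintained = set(view)
recomputed = join(R, S)
print(maintained == recomputed)
```

Going the other way is the hard part: deleting a single tuple from the view could be achieved by deleting from R, from S, or from both, which is why such view updates are ambiguous.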
28Next Time
- Can we have views in XML over tables in relations?
- Or vice versa?
- What other things can we use views for?