Title: XML, XML Schema, XPath and XQuery
Slides collated from various sources, many from
Dan Suciu at Univ. of Washington
XML
- W3C standard to complement HTML
- origins: structured text, SGML
- motivation:
  - HTML describes presentation
  - XML describes content
- http://www.w3.org/TR/2000/REC-xml-20001006 (version 2, 10/2000)
From HTML to XML
HTML describes the presentation
HTML
<h1> Bibliography </h1>
<p> <i> Foundations of Databases </i>
    Abiteboul, Hull, Vianu
    <br> Addison Wesley, 1995
<p> <i> Data on the Web </i>
    Abiteboul, Buneman, Suciu
    <br> Morgan Kaufmann, 1999
XML
<bibliography>
  <book> <title> Foundations </title>
    <author> Abiteboul </author>
    <author> Hull </author>
    <author> Vianu </author>
    <publisher> Addison Wesley </publisher>
    <year> 1995 </year>
  </book>
</bibliography>
XML describes the content
XML Terminology
- tags: book, title, author, ...
- start tag: <book>, end tag: </book>
- elements: <book>...</book>, <author>...</author>
- elements are nested
- empty element: <red></red>, abbreviated <red/>
- an XML document has a single root element
well-formed XML document: one whose tags match
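A quick way to see the well-formedness rules in action is to hand a few documents to a parser. This is a minimal sketch using Python's standard-library `xml.etree.ElementTree`, which is our choice of tool and not something the slides prescribe. The helper name `is_well_formed` is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if `text` parses as one well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


good = "<book><title>Foundations</title><red/></book>"
mismatched = "<book><title>Foundations</book></title>"  # tags do not nest properly
two_roots = "<book/><book/>"                           # more than one root element

print(is_well_formed(good))        # True
print(is_well_formed(mismatched))  # False
print(is_well_formed(two_roots))   # False

# <red></red> and <red/> denote the same empty element.
print(ET.tostring(ET.fromstring("<red></red>")) == ET.tostring(ET.fromstring("<red/>")))  # True
```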
More XML: Attributes
<book price = "55" currency = "USD">
  <title> Foundations of Databases </title>
  <author> Abiteboul </author>
  <year> 1995 </year>
</book>
attributes are alternative ways to represent data
More XML: Oids and References
<person id="o555"> <name> Jane </name> </person>
<person id="o456"> <name> Mary </name>
  <children idref="o123 o555"/>
</person>
<person id="o123" mother="o456"><name>John</name>
</person>
oids and references in XML are just syntax
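Because oids and references are "just syntax", a plain parser gives `id`, `idref` and `mother` no special meaning. Following a reference is the application's job. The sketch below builds that index by hand with Python's `xml.etree.ElementTree`. The `<people>` wrapper is our own addition, needed so the three persons form one document, and `name_of` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# The <people> wrapper is hypothetical: a document needs a single root element.
DOC = """<people>
  <person id="o555"><name>Jane</name></person>
  <person id="o456"><name>Mary</name><children idref="o123 o555"/></person>
  <person id="o123" mother="o456"><name>John</name></person>
</people>"""

root = ET.fromstring(DOC)

# Without a DTD or schema, ids are ordinary attributes, so index them ourselves.
by_id = {e.get("id"): e for e in root.iter() if e.get("id") is not None}


def name_of(oid):
    return by_id[oid].findtext("name")


john = by_id["o123"]
print(name_of(john.get("mother")))  # Mary

# An IDREFS-style value is a whitespace-separated list of ids.
children = by_id["o456"].find("children").get("idref").split()
print([name_of(c) for c in children])  # ['John', 'Jane']
```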
XML Namespaces
- http://www.w3.org/TR/REC-xml-names (1/99)
- name ::= [prefix:]localpart
<book xmlns:isbn="www.isbn-org.org/def">
  <title> ... </title>
  <number> 15 </number>
  <isbn:number> ... </isbn:number>
</book>
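A namespace-aware parser keeps the two `number` elements apart by replacing the prefix with the namespace name it is bound to. This is a minimal sketch with Python's `xml.etree.ElementTree`, which writes qualified names as `{namespace}localpart`. The element contents "Foundations" and "isbn-value" are placeholders.

```python
import xml.etree.ElementTree as ET

DOC = """<book xmlns:isbn="www.isbn-org.org/def">
  <title>Foundations</title>
  <number>15</number>
  <isbn:number>isbn-value</isbn:number>
</book>"""

root = ET.fromstring(DOC)

# The prefix is gone after parsing; the namespace name is what identifies the tag.
print([child.tag for child in root])
# ['title', 'number', '{www.isbn-org.org/def}number']

print(root.find("number").text)  # 15 (only the unqualified <number> matches)

# A query binds its own prefix; only the namespace name has to match.
print(root.find("x:number", {"x": "www.isbn-org.org/def"}).text)  # isbn-value
```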
XML Namespaces
- syntactic: <number>, <isbn:number>
- semantic: provide URL for schema
<tag xmlns:mystyle = "http://...">
  <mystyle:title> ... </mystyle:title>
  <mystyle:number> ...
</tag>
XML Data Model
- Several competing models:
- Document Object Model (DOM)
  - http://www.w3.org/TR/2001/WD-DOM-Level-3-CMLS-20010209/ (2/2001)
  - class hierarchy (node, element, attribute, ...)
  - objects have behavior
  - defines API to inspect/modify the document
- Infoset
- PSV (post schema validation)
- XML Query data model
XML Schemas
- http://www.w3.org/TR/xmlschema-1/ (10/2000)
- generalizes DTDs
- uses XML syntax
- two documents: structure and datatypes
  - http://www.w3.org/TR/xmlschema-1
  - http://www.w3.org/TR/xmlschema-2
- XML Schema is complex
XML Schemas
<xsd:element name="paper" type="papertype"/>
<xsd:complexType name="papertype">
  <xsd:sequence>
    <xsd:element name="title" type="xsd:string"/>
    <xsd:element name="author" minOccurs="0" maxOccurs="unbounded"/>
    <xsd:element name="year"/>
    <xsd:choice>
      <xsd:element name="journal"/>
      <xsd:element name="conference"/>
    </xsd:choice>
  </xsd:sequence>
</xsd:complexType>
DTD: <!ELEMENT paper (title,author*,year, (journal|conference))>
Elements vs. Types in XML Schema
<xsd:element name="person">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="name" type="xsd:string"/>
      <xsd:element name="address" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>

<xsd:element name="person" type="ttt"/>
<xsd:complexType name="ttt">
  <xsd:sequence>
    <xsd:element name="name" type="xsd:string"/>
    <xsd:element name="address" type="xsd:string"/>
  </xsd:sequence>
</xsd:complexType>
DTD: <!ELEMENT person (name,address)>
Elements vs. Types in XML Schema
- Types
- Simple types (integers, strings, ...)
- Complex types (regular expressions, like in DTDs)
- Element-type-element alternation
- Root element has a complex type
- That type is a regular expression of elements
- Those elements have their complex types...
- ...
- On the leaves we have simple types
Local and Global Types in XML Schema
- Local type:
  <xsd:element name="person">
    [define locally the person's type]
  </xsd:element>
- Global type:
  <xsd:element name="person" type="ttt"/>
  <xsd:complexType name="ttt">
    [define here the type ttt]
  </xsd:complexType>
Global types can be reused in other elements
Local vs. Global Elements in XML Schema
- Local element:
  <xsd:complexType name="ttt">
    <xsd:sequence>
      <xsd:element name="address" type="..."/> ...
    </xsd:sequence>
  </xsd:complexType>
- Global element:
  <xsd:element name="address" type="..."/>
  <xsd:complexType name="ttt">
    <xsd:sequence>
      <xsd:element ref="address"/> ...
    </xsd:sequence>
  </xsd:complexType>
Global elements: like in DTDs
Regular Expressions in XML Schema
- Recall the element-type-element alternation:
  <xsd:complexType name="....">
    [regular expression on elements]
  </xsd:complexType>
- Regular expressions:
  - <xsd:sequence> A B C </...>  =  A B C
  - <xsd:choice> A B C </...>  =  A | B | C
  - <xsd:group> A B C </...>  =  (A B C)
  - <xsd:... minOccurs="0" maxOccurs="unbounded"> .. </...>  =  (...)*
  - <xsd:... minOccurs="0" maxOccurs="1"> .. </...>  =  (...)?
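Since a complex type is a regular expression over element names, checking one element's children amounts to regex matching. This sketch encodes the child names as a string and matches them with Python's `re`. It does not use a real XML Schema validator, and Python's standard library does not ship one. The `paper` content model and the helper `matches_model` are illustrative assumptions that follow the correspondences on this slide.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical "paper" model: sequence(title, author [minOccurs="0"
# maxOccurs="unbounded"], year, choice(journal, conference)), i.e.
#   title author* year (journal | conference)
# Each child name is written as "name;" so adjacent names cannot blur together.
PAPER_MODEL = re.compile(r"title;(author;)*year;(journal;|conference;)")


def matches_model(element, model):
    """Check only the order and occurrence of child element names."""
    encoded = "".join(child.tag + ";" for child in element)
    return model.fullmatch(encoded) is not None


ok = ET.fromstring("<paper><title/><author/><author/><year/><journal/></paper>")
no_author = ET.fromstring("<paper><title/><year/><conference/></paper>")
both = ET.fromstring("<paper><title/><year/><journal/><conference/></paper>")

print(matches_model(ok, PAPER_MODEL))         # True
print(matches_model(no_author, PAPER_MODEL))  # True
print(matches_model(both, PAPER_MODEL))       # False: the choice allows only one
```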
Attributes in XML Schema
<xsd:element name="paper" type="papertype"/>
<xsd:complexType name="papertype">
  <xsd:sequence>
    <xsd:element name="title" type="xsd:string"/>
    . . . . . .
  </xsd:sequence>
  <xsd:attribute name="language" type="xsd:NMTOKEN" fixed="English"/>
</xsd:complexType>
Attributes are associated with the type, not with the element.
Only complex types can have attributes, so adding attributes to simple types takes more trouble.
Derived Types by Extensions
<complexType name="Address">
  <sequence>
    <element name="street" type="string"/>
    <element name="city" type="string"/>
  </sequence>
</complexType>
<complexType name="USAddress">
  <complexContent>
    <extension base="ipo:Address">
      <sequence>
        <element name="state" type="ipo:USState"/>
        <element name="zip" type="positiveInteger"/>
      </sequence>
    </extension>
  </complexContent>
</complexType>
Corresponds to inheritance
Keys in XML Schema
XML:
<purchaseReport>
  <regions>
    <zip code="95819">
      <part number="872-AA" quantity="1"/>
      <part number="926-AA" quantity="1"/>
      <part number="833-AA" quantity="1"/>
      <part number="455-BX" quantity="1"/>
    </zip>
    <zip code="63143">
      <part number="455-BX" quantity="4"/>
    </zip>
  </regions>
  <parts>
    <part number="872-AA">Lawnmower</part>
    <part number="926-AA">Baby Monitor</part>
    <part number="833-AA">Lapis Necklace</part>
    <part number="455-BX">Sturdy Shelves</part>
  </parts>
</purchaseReport>
XML Schema:
<key name="NumKey">
  <selector xpath="parts/part"/>
  <field xpath="@number"/>
</key>
Keys in XML Schema
<key name="someDummyNameHere">
  <selector xpath="p"/>
  <field xpath="p1"/>
  <field xpath="p2"/>
  . . .
  <field xpath="pk"/>
</key>
<unique name="someDummyNameHere">
  <selector xpath="p"/>
  <field xpath="p1"/>
  <field xpath="p2"/>
  . . .
  <field xpath="pk"/>
</unique>
Note: all XPath expressions start at the element currently being defined.
The fields must identify a single node.
Keys in XML Schema
- unique: guarantees uniqueness
- key: guarantees uniqueness and existence
- All XPath expressions are restricted:
  - /a/b | /a/c OK for selector
  - //a/b/*/c OK for field
- Note: better than the DTD ID mechanism
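The difference between `unique` and `key` shows up when a selected node lacks a field. This sketch approximates both checks in Python with `xml.etree.ElementTree`. It assumes every field is an attribute (like `@number`), and the "Unnumbered item" part is made up. Following the XML Schema rules, `unique` ignores nodes with a missing field, while `key` rejects them. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET


def field_values(scope, selector, fields):
    """One tuple per node matched by `selector` below `scope`.
    Assumption: fields are attribute names; a missing one becomes None."""
    return [tuple(node.get(f) for f in fields) for node in scope.findall(selector)]


def satisfies_unique(rows):
    """unique: among nodes that have every field, no tuple may repeat."""
    complete = [r for r in rows if None not in r]
    return len(complete) == len(set(complete))


def satisfies_key(rows):
    """key: every node must have every field, and the tuples must be unique."""
    return all(None not in r for r in rows) and satisfies_unique(rows)


REPORT = """<purchaseReport><parts>
  <part number="872-AA">Lawnmower</part>
  <part number="926-AA">Baby Monitor</part>
  <part>Unnumbered item</part>
</parts></purchaseReport>"""

rows = field_values(ET.fromstring(REPORT), "parts/part", ["number"])
print(satisfies_unique(rows))  # True: the unnumbered part is simply skipped
print(satisfies_key(rows))     # False: a key also requires that number exists
```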
Keys in XML Schema
<key name="fullName">
  <selector xpath=".//person"/>
  <field xpath="forename"/>
  <field xpath="surname"/>
</key>
<unique name="nearlyID">
  <selector xpath=".//*"/>
  <field xpath="@id"/>
</unique>
Recall: each person must have a single forename and a single surname
Foreign Keys in XML Schema
<keyref name="personRef" refer="fullName">
  <selector xpath=".//personPointer"/>
  <field xpath="@first"/>
  <field xpath="@last"/>
</keyref>
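A `keyref` requires each pointer's field values to match an existing key. The sketch below approximates the `fullName` / `personRef` pair in Python with `xml.etree.ElementTree` and reports the dangling pointers. The `<root>` wrapper and all person names are hypothetical data.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the wrapper and names are made up for illustration.
DOC = """<root>
  <person><forename>Jane</forename><surname>Doe</surname></person>
  <person><forename>John</forename><surname>Roe</surname></person>
  <personPointer first="Jane" last="Doe"/>
  <personPointer first="Ann" last="Poe"/>
</root>"""
root = ET.fromstring(DOC)

# key "fullName": selector .//person, fields forename and surname (child elements)
full_names = {(p.findtext("forename"), p.findtext("surname"))
              for p in root.iterfind(".//person")}

# keyref "personRef": selector .//personPointer, fields @first and @last
dangling = [(ptr.get("first"), ptr.get("last"))
            for ptr in root.iterfind(".//personPointer")
            if (ptr.get("first"), ptr.get("last")) not in full_names]

print(dangling)  # [('Ann', 'Poe')]
```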
XPath
XPath
- Goal: permit access to some nodes of a document
- XPath main construct: axis navigation
- An XPath path consists of one or more navigation steps, separated by /
- Navigation step: axis::node-test[predicates]
- Examples:
  - /descendant::node()/child::author
  - /descendant::node()/child::author[parent::*/attribute::booktitle = "XML2"]
- XPath also offers shortcuts:
  - no axis means child::
  - // ≡ /descendant-or-self::node()/
XPath: Child axis navigation
- author is shorthand for child::author. Examples:
  - aaa -- all the child nodes labeled aaa (1,3)
  - aaa/bbb -- all the bbb grandchildren of aaa children (4)
  - */bbb -- all the bbb grandchildren of any child (4,6)
  - . -- the context node
  - / -- the root node
XPath: Child axis navigation
- /doc -- all the doc children of the root
- ./aaa -- all the aaa children of the context node (equivalent to aaa)
- text() -- all the text children of the context node
- node() -- all the children of the context node (includes text nodes; attribute nodes are not children in XPath and are reached through the attribute axis, @)
- .. -- parent of the context node
- .// -- the context node and all its descendants
- // -- the root node and all its descendants
- //text() -- all the text nodes in the document
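Python's `xml.etree.ElementTree` implements a small subset of XPath that is evaluated relative to an element, which is enough to try the child-axis patterns above. In this sketch the `<doc>` element plays the context node, and the document itself is made up. ElementTree has no `text()` step, so `itertext()` stands in for gathering text in document order. Whitespace-only text nodes are filtered out here, although real XPath would return them too.

```python
import xml.etree.ElementTree as ET

DOC = """<doc>
  <aaa><bbb>1</bbb></aaa>
  <ccc><bbb>2</bbb></ccc>
  <aaa>text<bbb>3</bbb></aaa>
</doc>"""

# fromstring returns the <doc> element; it serves as the context node.
ctx = ET.fromstring(DOC)

print([e.tag for e in ctx.findall("aaa")])       # ['aaa', 'aaa']
print([e.text for e in ctx.findall("aaa/bbb")])  # ['1', '3']
print([e.text for e in ctx.findall("*/bbb")])    # ['1', '2', '3']
print(ctx.find(".") is ctx)                      # True
print(len(ctx.findall(".//*")))                  # 6 descendant elements
print([t for t in ctx.itertext() if t.strip()])  # ['1', '2', 'text', '3']
```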
Predicates
- *[2] -- the second child element of the context node
- chapter[5] -- the fifth chapter child of the context node
- *[last()] -- the last child element of the context node
- chapter[title="introduction"] -- the chapter children of the context node that have one or more title children whose string-value is "introduction" (the string-value is the concatenation of all the text on descendant text nodes)
- person[.//firstname = "Joe"] -- the person children of the context node that have in their descendants a firstname element with string-value "Joe"
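ElementTree supports positional predicates, `last()`, and `[tag='text']`, which compares a child's full string-value. It does not support `[.//firstname = "Joe"]`. This sketch uses the supported forms on a made-up `<book>` and filters the last case by hand. The data is hypothetical. Note that ElementTree's `[n]` counts siblings with the same tag, so `chapter[2]` behaves as in XPath but `*[2]` may not.

```python
import xml.etree.ElementTree as ET

BOOK = ET.fromstring("""<book>
  <chapter><title>Preface</title></chapter>
  <chapter><title>introduction</title></chapter>
  <chapter><title>Models</title></chapter>
</book>""")

print(BOOK.find("chapter[2]").findtext("title"))           # introduction
print(BOOK.find("chapter[last()]").findtext("title"))      # Models
print(len(BOOK.findall("chapter[title='introduction']")))  # 1

PEOPLE = ET.fromstring("""<people>
  <person><name><firstname>Joe</firstname></name></person>
  <person><name><firstname>Ann</firstname></name></person>
</people>""")

# person[.//firstname = "Joe"]: compare the string-value of each descendant firstname.
joes = [p for p in PEOPLE.findall("person")
        if any("".join(f.itertext()) == "Joe" for f in p.iter("firstname"))]
print(len(joes))  # 1
```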
Axis navigation
- So far, nearly all our expressions have moved us down by moving to child nodes. Exceptions were:
  - . -- stay where you are
  - / -- go to the root
  - // -- all descendants of the root
  - .// -- all descendants of the context node
- XPath has several axes: ancestor, ancestor-or-self, attribute, child, descendant, descendant-or-self, following, following-sibling, namespace, parent, preceding, preceding-sibling, self
- Some of these (self, parent) describe single nodes; others describe sequences of nodes.
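The upward and sideways axes need a way back from a node to its parent. Python's ElementTree elements do not store one, so this sketch builds a child-to-parent map and derives the ancestor, following-sibling and preceding-sibling axes from it. The tree and the helper names are hypothetical, and the root element stands in for XPath's document root.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<a><b><c/><d/><e/></b><f/></a>")

# ElementTree keeps no parent pointers, so build a child -> parent map once.
parent_of = {child: p for p in root.iter() for child in p}


def ancestors(node):
    """ancestor axis: parent, grandparent, ..., up to the root element."""
    result = []
    while node in parent_of:
        node = parent_of[node]
        result.append(node)
    return result


def siblings(node):
    """All children of node's parent (empty for the root)."""
    return list(parent_of[node]) if node in parent_of else []


def following_siblings(node):
    """following-sibling axis, in document order."""
    sibs = siblings(node)
    return sibs[sibs.index(node) + 1:] if sibs else []


def preceding_siblings(node):
    """preceding-sibling axis, listed here in document order."""
    sibs = siblings(node)
    return sibs[:sibs.index(node)] if sibs else []


d = root.find("b/d")
print([n.tag for n in ancestors(d)])           # ['b', 'a']
print([n.tag for n in following_siblings(d)])  # ['e']
print([n.tag for n in preceding_siblings(d)])  # ['c']
```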
XPath Navigation Axes
[Figure: diagram of the XPath navigation axes relative to the context node: ancestor, following-sibling, preceding-sibling, self, child, attribute, following, preceding, namespace, descendant]
XPath abbreviated syntax
  (nothing)   child::
  @           attribute::
  //          /descendant-or-self::node()/
  .           self::node()
  .//         ./descendant-or-self::node()/
  ..          parent::node()
  /           (document root)
XPath
- Reasonably widely adopted -- in XML Schema and query languages.
- Neither more expressive nor less expressive than regular path expressions
Query Languages: XQuery
Summary of XQuery
- FLWR expressions
- FOR and LET expressions
- Collections and sorting
- Resources:
  - XQuery: A Query Language for XML. Chamberlin, Florescu, et al.
  - W3C recommendation: www.w3.org/TR/xquery/
XQuery
- Based on Quilt (which is based on XML-QL)
- http://www.w3.org/TR/xquery/ (2/2001)
- XML Query data model (ordered)
FLWR ("Flower") Expressions
- FOR ... LET... FOR... LET...
- WHERE...
- RETURN...
XQuery
- Find all book titles published after 1995:
FOR $x IN document("bib.xml")/bib/book
WHERE $x/year > 1995
RETURN $x/title
Result:
<title> abc </title>
<title> def </title>
<title> ghi </title>
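For readers without an XQuery engine at hand, the FOR/WHERE/RETURN pipeline maps directly onto a Python list comprehension over ElementTree nodes. This is only a sketch of the semantics: the `bib.xml` contents are hypothetical, and `year` is converted to `int` explicitly for the comparison.

```python
import xml.etree.ElementTree as ET

# Hypothetical contents of bib.xml; only the structure /bib/book/{title,year} matters.
BIB = ET.fromstring("""<bib>
  <book><title>abc</title><year>1997</year></book>
  <book><title>old</title><year>1990</year></book>
  <book><title>def</title><year>1999</year></book>
</bib>""")

titles = [
    x.find("title")                    # RETURN $x/title
    for x in BIB.findall("book")       # FOR $x IN document("bib.xml")/bib/book
    if int(x.findtext("year")) > 1995  # WHERE $x/year > 1995
]
print([t.text for t in titles])  # ['abc', 'def']
```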
XQuery
- For each author of a book by Morgan Kaufmann, list all books she published:
FOR $a IN distinct(document("bib.xml")/bib/book[publisher="Morgan Kaufmann"]/author)
RETURN <result>
         $a,
         FOR $t IN /bib/book[author=$a]/title
         RETURN $t
       </result>
distinct = a function that eliminates duplicates
XQuery
- Result:
<result>
  <author>Jones</author>
  <title> abc </title>
  <title> def </title>
</result>
<result>
  <author> Smith </author>
  <title> ghi </title>
</result>
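The nested query above can be mimicked in Python. An outer loop runs over the distinct authors, and an inner comprehension re-scans the books for each author. This is a minimal sketch. The `bib.xml` data is hypothetical and was chosen so the output mirrors the result slide. `dict.fromkeys` plays `distinct` but also keeps first-seen order, which XQuery's unordered `distinct` does not promise.

```python
import xml.etree.ElementTree as ET

# Hypothetical bib.xml contents.
BIB = ET.fromstring("""<bib>
  <book><title>abc</title><author>Jones</author><publisher>Morgan Kaufmann</publisher></book>
  <book><title>def</title><author>Jones</author><publisher>Addison Wesley</publisher></book>
  <book><title>ghi</title><author>Smith</author><publisher>Morgan Kaufmann</publisher></book>
</bib>""")

# distinct(.../book[publisher="Morgan Kaufmann"]/author)
authors = list(dict.fromkeys(
    a.text for a in BIB.findall("book[publisher='Morgan Kaufmann']/author")))

results = []
for a in authors:                                  # FOR $a IN distinct(...)
    titles = [b.findtext("title")                  # FOR $t IN /bib/book[author=$a]/title
              for b in BIB.findall("book")
              if a in [x.text for x in b.findall("author")]]
    results.append((a, titles))                    # RETURN <result> $a, ... </result>

print(results)  # [('Jones', ['abc', 'def']), ('Smith', ['ghi'])]
```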
XQuery
- FOR $x in expr -- binds $x to each element in the list expr
- LET $x := expr -- binds $x to the entire list expr
  - Useful for common subexpressions and for aggregations
XQuery
<big_publishers>
  FOR $p IN distinct(document("bib.xml")//publisher)
  LET $b := document("bib.xml")/bib/book[publisher = $p]
  WHERE count($b) > 100
  RETURN $p
</big_publishers>
count = a (aggregate) function that returns the number of elements
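The LET clause binds `$b` to a whole list, which is what lets `count($b)` aggregate over it. In Python that is simply a list bound inside the loop. In this sketch the catalogue is hypothetical and tiny, so the threshold is lowered from the slide's 100 to a hypothetical `THRESHOLD = 1`.

```python
import xml.etree.ElementTree as ET

THRESHOLD = 1  # hypothetical; the slide uses 100

BIB = ET.fromstring("""<bib>
  <book><title>abc</title><publisher>Morgan Kaufmann</publisher></book>
  <book><title>def</title><publisher>Addison Wesley</publisher></book>
  <book><title>ghi</title><publisher>Morgan Kaufmann</publisher></book>
</bib>""")

big_publishers = []
for p in dict.fromkeys(e.text for e in BIB.iter("publisher")):   # FOR $p IN distinct(//publisher)
    b = [bk for bk in BIB.findall("book")
         if bk.findtext("publisher") == p]                        # LET $b := /bib/book[publisher = $p]
    if len(b) > THRESHOLD:                                        # WHERE count($b) > THRESHOLD
        big_publishers.append(p)                                  # RETURN $p

print(big_publishers)  # ['Morgan Kaufmann']
```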
XQuery
- Find books whose price is larger than average:
LET $a := avg(document("bib.xml")/bib/book/@price)
FOR $b in document("bib.xml")/bib/book
WHERE $b/@price > $a
RETURN $b
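Here the LET is evaluated once, before the FOR, so the average is computed a single time and then compared against every book. This is a minimal Python sketch of that order of evaluation, over hypothetical prices.

```python
import xml.etree.ElementTree as ET

# Hypothetical bib.xml contents.
BIB = ET.fromstring("""<bib>
  <book price="30"><title>cheap</title></book>
  <book price="50"><title>mid</title></book>
  <book price="100"><title>dear</title></book>
</bib>""")

prices = [float(b.get("price")) for b in BIB.findall("book")]
a = sum(prices) / len(prices)                    # LET $a := avg(.../book/@price)

above = [b.findtext("title")                     # RETURN $b (title only, for brevity)
         for b in BIB.findall("book")            # FOR $b IN .../bib/book
         if float(b.get("price")) > a]           # WHERE $b/@price > $a

print(a, above)  # 60.0 ['dear']
```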
XQuery
- Summary:
  - FOR-LET-WHERE-RETURN = FLWR
FOR/LET clauses → list of tuples
WHERE clause → list of tuples
RETURN clause → instance of XQuery data model
FOR vs. LET
- FOR
  - Binds node variables → iteration
- LET
  - Binds collection variables → one value
FOR vs. LET
FOR $x IN document("bib.xml")/bib/book
RETURN <result> $x </result>
Returns:
<result> <book>...</book></result>
<result> <book>...</book></result>
<result> <book>...</book></result>
...

LET $x := document("bib.xml")/bib/book
RETURN <result> $x </result>
Returns:
<result> <book>...</book>
         <book>...</book>
         <book>...</book>
         ...
</result>
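The two result shapes follow from how often the RETURN clause runs. It runs once per FOR binding, but only once for a LET binding of the whole list. This Python sketch builds both outputs over a hypothetical three-book `bib.xml`. The books are copied with `copy.deepcopy` so each constructed `<result>` owns its children.

```python
import copy
import xml.etree.ElementTree as ET

BIB = ET.fromstring("<bib><book/><book/><book/></bib>")  # hypothetical bib.xml
books = BIB.findall("book")

# FOR $x IN .../book RETURN <result> $x </result>: one <result> per binding
for_output = []
for x in books:
    result = ET.Element("result")
    result.append(copy.deepcopy(x))
    for_output.append(result)

# LET $x := .../book RETURN <result> $x </result>: one binding, one <result>
let_output = ET.Element("result")
let_output.extend([copy.deepcopy(x) for x in books])

print(len(for_output), [len(r) for r in for_output])  # 3 [1, 1, 1]
print(len(let_output))                                # 3
```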
Collections in XQuery
- Ordered and unordered collections
  - /bib/book/author = an ordered collection
  - distinct(/bib/book/author) = an unordered collection
- LET $a := /bib/book → $a is a collection
- $b/author → a collection (several authors...)
RETURN <result> $b/author </result>
Returns:
<result> <author>...</author>
         <author>...</author>
         <author>...</author>
         ...
</result>
Sorting in XQuery
<publisher_list>
  FOR $p IN distinct(document("bib.xml")//publisher)
  RETURN <publisher>
           <name> $p/text() </name> ,
           FOR $b IN document("bib.xml")//book[publisher = $p]
           RETURN <book>
                    $b/title ,
                    $b/@price
                  </book> SORTBY(price DESCENDING)
         </publisher> SORTBY(name)
</publisher_list>
Sorting in XQuery
- Sorting arguments refer to the name space of the RETURN clause, not the FOR clause.
- To sort on an element you don't want to display, first return it, then remove it with an additional query.
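The nested sort can be reproduced with Python's `sorted`. Publishers are ordered by name, and each publisher's books by price, descending. This sketch works on hypothetical data and differs in one design point. It sorts the source books before building the output, whereas the slide's SORTBY names fields of the constructed RETURN result. Carrying `price` over as an attribute of `<book>` is also our assumption.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical bib.xml contents.
BIB = ET.fromstring("""<bib>
  <book price="30"><title>abc</title><publisher>Morgan Kaufmann</publisher></book>
  <book price="100"><title>def</title><publisher>Addison Wesley</publisher></book>
  <book price="50"><title>ghi</title><publisher>Morgan Kaufmann</publisher></book>
</bib>""")

publisher_list = ET.Element("publisher_list")
names = dict.fromkeys(p.text for p in BIB.iter("publisher"))     # distinct(//publisher)
for name in sorted(names):                                       # SORTBY(name)
    pub = ET.SubElement(publisher_list, "publisher")
    ET.SubElement(pub, "name").text = name
    books = [b for b in BIB.iter("book") if b.findtext("publisher") == name]
    for b in sorted(books, key=lambda b: float(b.get("price")),
                    reverse=True):                               # SORTBY(price DESCENDING)
        out = ET.SubElement(pub, "book", price=b.get("price"))
        out.append(copy.deepcopy(b.find("title")))

print([p.findtext("name") for p in publisher_list])
# ['Addison Wesley', 'Morgan Kaufmann']
print([b.findtext("title") for b in publisher_list[1].findall("book")])
# ['ghi', 'abc']
```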
If-Then-Else
FOR $h IN //holding
RETURN <holding>
         $h/title,
         IF $h/@type = "Journal"
         THEN $h/editor
         ELSE $h/author
       </holding> SORTBY (title)
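The conditional chooses which children go into each constructed `<holding>`. This Python sketch expresses the IF-THEN-ELSE as a conditional expression and the SORTBY as a final sort on the constructed titles. The holdings data ("Some Journal", "E. Editor") is hypothetical.

```python
import copy
import xml.etree.ElementTree as ET

HOLDINGS = ET.fromstring("""<holdings>
  <holding type="Journal"><title>Some Journal</title><editor>E. Editor</editor></holding>
  <holding type="Book"><title>Data on the Web</title><author>Abiteboul</author><author>Buneman</author><author>Suciu</author></holding>
</holdings>""")

results = []
for h in HOLDINGS.iter("holding"):                    # FOR $h IN //holding
    out = ET.Element("holding")
    out.append(copy.deepcopy(h.find("title")))        # $h/title
    # IF $h/@type = "Journal" THEN $h/editor ELSE $h/author
    chosen = h.findall("editor") if h.get("type") == "Journal" else h.findall("author")
    out.extend([copy.deepcopy(e) for e in chosen])
    results.append(out)
results.sort(key=lambda e: e.findtext("title"))      # SORTBY (title)

print([r.findtext("title") for r in results])  # ['Data on the Web', 'Some Journal']
```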
Existential Quantifiers
FOR $b IN //book
WHERE SOME $p IN $b//para SATISFIES
        contains($p, "sailing")
        AND contains($p, "windsurfing")
RETURN $b/title
Universal Quantifiers
FOR $b IN //book
WHERE EVERY $p IN $b//para SATISFIES
        contains($p, "sailing")
RETURN $b/title
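SOME and EVERY correspond to Python's `any` and `all`, and `contains` to the substring test `in`. This is a minimal sketch over a hypothetical library. It also shows an edge case that matches XQuery semantics: EVERY over an empty sequence is true, so a book with no `para` passes the universal test.

```python
import xml.etree.ElementTree as ET

# Hypothetical library.
LIB = ET.fromstring("""<lib>
  <book><title>A</title><para>sailing and windsurfing</para><para>sailing only</para></book>
  <book><title>B</title><para>sailing</para><para>rowing</para></book>
  <book><title>C</title></book>
</lib>""")


def paras(book):
    """String-values of all para descendants ($b//para)."""
    return ["".join(p.itertext()) for p in book.iter("para")]


# SOME $p IN $b//para SATISFIES contains($p, "sailing") AND contains($p, "windsurfing")
some = [b.findtext("title") for b in LIB.iter("book")
        if any("sailing" in t and "windsurfing" in t for t in paras(b))]

# EVERY $p IN $b//para SATISFIES contains($p, "sailing")
every = [b.findtext("title") for b in LIB.iter("book")
         if all("sailing" in t for t in paras(b))]

print(some)   # ['A']
print(every)  # ['A', 'C']  (C has no para, and EVERY over nothing is true)
```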