Title: XML Basic 2 XPath and XQuery
- 2002, 10, 14
- Kim, Jung Hoon
XPath (XML Path Language)
Contents
- Introduction
- Basics
- Location Paths
- Expressions
- Core Function Library
- Data Model
Introduction
- A language for addressing parts of an XML document, designed to be used by XSLT and XPointer
- A building block for other W3C standards
- W3C Recommendation, 16 November 1999
  - Version 1.0
- Features
  - Addresses parts of an XML document
  - Provides a common syntax and semantics for functionality shared by XSLT and XPointer
  - Supports manipulation of strings, numbers, and booleans
  - Models an XML document as a tree of nodes
  - Operates on the abstract, logical structure of an XML document
  - Fully supports XML Namespaces
Basics
- Node types in the XML document tree
  - Root node
  - Element node
  - Attribute node
  - Namespace node
  - Processing-instruction node
  - Comment node
  - Text node
Basics
- Expressions
  - The primary syntactic construct in XPath
  - Evaluate to one of four basic types
    - Node-set
      - An unordered collection of nodes without duplicates
    - Boolean
      - True or false
    - Number
      - A floating-point number
    - String
      - A sequence of UCS characters
Basics
- The context consists of
  - A node (the context node)
  - A pair of non-zero positive integers (the context position and the context size)
  - A set of variable bindings
    - Mappings from variable names to variable values
  - A function library
    - Mappings from function names to functions
  - The set of namespace declarations in scope for the expression
Location Path
- The most important kind of expression in XPath
- Selects a set of nodes relative to the context node
- Consists of one or more location steps
- Location step = axis + node test + optional predicates (axis::node-test[predicate])
- Provides syntactic abbreviations
  - / : the root node of the document
- document element? parent
Location Path
- Relative location path
  - One or more location steps separated by /
  - Location steps are composed from left to right
  - Each node selected by one step is used as a context node for the following step
- Absolute location path
  - / followed by an optional relative location path
- Location step
  - axis::node-test[predicates]
  - Ex) child::NAME[position()=5]
Location Step
- Axes
  - Specify the tree relationship between the nodes selected by the location step and the context node
(Figure: the axes around the context node (self): ancestor, ancestor-or-self, parent, preceding, preceding-sibling, following, following-sibling, child, descendant, and descendant-or-self, drawn against document order with the forward and reverse directions marked.)
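The axes can be sketched in Python with the standard `xml.etree.ElementTree` module. This is a minimal illustration under stated assumptions, not a conforming XPath engine: the sample document is hypothetical, only element nodes are modelled (ElementTree has no separate attribute, namespace, or text nodes), and reverse axes are returned nearest-first, which is the order XPath uses when counting positions on a reverse axis.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; only element nodes are modelled.
root = ET.fromstring("<a><b><c/><d/></b><e><f/></e><g/></a>")

# ElementTree keeps no parent pointers, so build a child -> parent map once.
parent_of = {child: parent for parent in root.iter() for child in parent}
# Document-order index of every element (iter() is a pre-order traversal).
order = {node: i for i, node in enumerate(root.iter())}


def parent(n):
    return [parent_of[n]] if n in parent_of else []


def ancestor(n):
    # Reverse axis: nearest ancestor first.
    out = []
    while n in parent_of:
        n = parent_of[n]
        out.append(n)
    return out


def child(n):
    return list(n)


def descendant(n):
    # iter() yields n itself first, then its descendants in document order.
    return list(n.iter())[1:]


def following_sibling(n):
    sibs = child(parent_of[n]) if n in parent_of else [n]
    return sibs[sibs.index(n) + 1:]


def preceding_sibling(n):
    # Reverse axis: nearest sibling first.
    sibs = child(parent_of[n]) if n in parent_of else [n]
    return sibs[:sibs.index(n)][::-1]


def following(n):
    # Everything after n in document order, excluding n's descendants.
    inside = set(n.iter())
    return [m for m in root.iter() if order[m] > order[n] and m not in inside]


def preceding(n):
    # Everything before n in document order, excluding n's ancestors; nearest first.
    anc = set(ancestor(n))
    return [m for m in root.iter() if order[m] < order[n] and m not in anc][::-1]


def tags(nodes):
    return [m.tag for m in nodes]


e = root.find("e")
print(tags(ancestor(e)), tags(preceding_sibling(e)), tags(following(e)), tags(preceding(e)))
# prints ['a'] ['b'] ['g'] ['d', 'c', 'b']
```

Note that `following` skips `f` because it is a descendant of `e`, and `preceding` skips `a` because it is an ancestor; together with `ancestor`, `descendant`, and `self`, these axes partition the element nodes of the document.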
Node Test
- Specifies the node type and expanded-name of the nodes selected by the location step
- Every axis has a principal node type (PNT)
  - Attribute axis: attribute
  - Namespace axis: namespace
  - Other axes: element
- Node tests
  - * : true for any node of the principal node type
  - text() : true for any text node
  - comment() : true for any comment node
  - processing-instruction() : true for any processing-instruction node
  - node() : true for any node of any type
  - QName : true for nodes of the principal node type that have the same expanded-name as the QName
Predicates
- Filter a node-set with respect to an axis to produce a new node-set
- [Expr]
  - Expr is evaluated for each node to be filtered
  - If true, the node is included in the new node-set
- para[3] is equivalent to para[position()=3]
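A sketch of how a predicate filters the node-set selected by a step, assuming Python's `xml.etree.ElementTree` and a hypothetical document with five `para` children. `apply_predicate` is an illustrative helper, not a library function: it hands each node its 1-based position and the context size, so `[3]` and `[position()=3]` select the same node. ElementTree's small XPath subset accepts the abbreviated forms `para[3]` and `para[last()]`, and the asserts compare against them.

```python
import xml.etree.ElementTree as ET

# Hypothetical context node with five para children, numbered n=1..5.
doc = ET.fromstring("<doc>" + "".join(f'<para n="{i}"/>' for i in range(1, 6)) + "</doc>")


def apply_predicate(nodes, pred):
    """Keep the nodes for which pred(node, position, size) is true.

    position is 1-based, as in XPath; size is the context size (last()).
    """
    size = len(nodes)
    return [n for pos, n in enumerate(nodes, start=1) if pred(n, pos, size)]


paras = doc.findall("para")                                       # child::para
third = apply_predicate(paras, lambda n, pos, size: pos == 3)     # [position()=3]
last = apply_predicate(paras, lambda n, pos, size: pos == size)   # [position()=last()]

# ElementTree's limited XPath subset accepts the abbreviated forms directly.
assert third == doc.findall("para[3]")
assert last == doc.findall("para[last()]")
print([n.get("n") for n in third], [n.get("n") for n in last])    # ['3'] ['5']
```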
Location Path Examples
- /
  - Selects the document root node
- /descendant::para
  - Selects all the para elements in the same document as the context node
- child::para[position()=1]
  - Selects the first para child of the context node
- preceding-sibling::chapter[position()=1]
  - Selects the previous chapter sibling of the context node
- child::para[attribute::type="warning"][position()=5]
  - Selects the 5th para child of the context node that has a type attribute with value warning
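The same examples can be approximated with Python's `xml.etree.ElementTree`, under some assumptions: the document, the `id` values, and the `n` numbers are hypothetical, the last example uses position 2 rather than 5 because the sample is small, and the steps ElementTree's XPath subset does not cover (the preceding-sibling axis, and a position counted inside an already-filtered node-set) are written out by hand.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; chapter ids and para numbers exist only for this sketch.
doc = ET.fromstring(
    "<book>"
    "<chapter id='c1'><para n='0'/></chapter>"
    "<chapter id='c2'>"
    "<para type='warning' n='1'/><para n='2'/><para type='warning' n='3'/>"
    "</chapter>"
    "<chapter id='c3'/>"
    "</book>"
)
c1, c2, c3 = doc.findall("chapter")

# /descendant::para : every para element in the document
all_paras = doc.findall(".//para")

# child::para[position()=1] with c2 as the context node
first_para = c2.findall("para[1]")

# preceding-sibling::chapter[position()=1] with c3 as the context node.
# Walk the earlier siblings nearest first, as XPath does on a reverse axis.
siblings = list(doc)
before = siblings[:siblings.index(c3)]
prev_chapter = [s for s in reversed(before) if s.tag == "chapter"][:1]

# child::para[attribute::type="warning"][position()=2] with c2 as the context:
# the position counts within the node-set left by the first predicate.
warnings = c2.findall("para[@type='warning']")
second_warning = warnings[1:2]

print(len(all_paras), first_para[0].get("n"), prev_chapter[0].get("id"),
      second_warning[0].get("n"))   # 4 1 c2 3
```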
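Abbreviated Syntax
- Location step
  - child:: can be omitted from a location step (child is the default axis)
    - Ex) child::div/child::para abbreviates to div/para
  - attribute:: abbreviates to @
    - Ex) child::para[attribute::type="warning"] abbreviates to para[@type="warning"]
  - /descendant-or-self::node()/ abbreviates to //
    - Ex) /descendant-or-self::node()/child::para abbreviates to //para
  - self::node() abbreviates to .
    - Ex) self::node()/descendant-or-self::node()/child::para abbreviates to .//para
  - parent::node() abbreviates to ..
    - Ex) parent::node()/child::title abbreviates to ../title
- //para[1] is not the same as /descendant::para[1]
  - //para[1] selects every para element that is the first para child of its parent
  - /descendant::para[1] selects only the first para element in the document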
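The difference between `//para[1]` and `/descendant::para[1]` can be checked with a small, hypothetical document in Python. This sketch assumes `xml.etree.ElementTree`; its `.//para[1]` form follows the `//para[1]` reading (first `para` child of each parent), while the first `para` in the whole document is simply the first item `iter("para")` yields.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with two sections of two paras each.
doc = ET.fromstring(
    "<doc>"
    "<sec><para n='1'/><para n='2'/></sec>"
    "<sec><para n='3'/><para n='4'/></sec>"
    "</doc>"
)

# //para[1] = /descendant-or-self::node()/child::para[1]:
# for every node, keep its first para child.
first_para_children = [p for node in doc.iter() for p in node.findall("para")[:1]]

# /descendant::para[1]: the first para element in document order.
first_para_in_doc = next(doc.iter("para"))

print([p.get("n") for p in first_para_children], first_para_in_doc.get("n"))
# ['1', '3'] 1
```

The predicate binds to the step it follows, so in `//para[1]` the position is counted among each parent's `para` children rather than across the whole document.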
Other Expressions
- Function calls
  - FunctionName(Argument1, Argument2, ...)
- Node-sets
  - Location paths
  - | operator
    - Computes the union of its operands (node-sets)
- Booleans
  - or, and, =, !=, <, <=, >, >=
  - < and <= must be escaped as &lt; and &lt;= in XML documents
  - Comparisons are left associative
    - 3 > 2 > 1 is false: it is evaluated as (3 > 2) > 1, i.e. true() > 1, and true() converts to the number 1
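A minimal Python sketch of why `3 > 2 > 1` is false in XPath 1.0. It assumes only the rule that, when neither operand is a node-set, `<`, `<=`, `>`, and `>=` convert both operands to numbers; `xpath_gt` is a hypothetical helper name. Python is a useful contrast because it chains comparisons instead of nesting them.

```python
def xpath_gt(a, b):
    """XPath 1.0 '>' for two non-node-set operands: both are converted to numbers.

    float(True) is 1.0 and float(False) is 0.0, matching XPath's
    boolean-to-number conversion.
    """
    return float(a) > float(b)


# 3 > 2 > 1 parses as (3 > 2) > 1 because comparisons are left associative.
step = xpath_gt(3, 2)          # True
result = xpath_gt(step, 1)     # true() > 1  ->  1 > 1  ->  False
print(step, result)            # True False

# Python chains comparisons instead (3 > 2 and 2 > 1), so it gives True:
print(3 > 2 > 1)               # True
```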
Other Expressions
- Numbers
  - Floating-point numbers
    - IEEE 754 double-precision 64-bit format
    - A NaN (Not-a-Number) value
    - Positive and negative infinity
    - Positive and negative zero
  - +, -, *, div, mod
  - The - operator typically needs to be preceded by whitespace
    - foo-bar is a single name; foo - bar subtracts bar from foo
- Strings
  - A sequence of zero or more characters
  - A character in XPath
    - A single Unicode abstract character
    - May correspond to a pair of 16-bit Unicode code values (a surrogate pair)
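Because XPath numbers are IEEE 754 doubles, `div` never raises on a zero divisor, and `mod` is the remainder of a truncating division. Python floats are IEEE 754 doubles too, but Python raises on division by zero and its `%` floors. The sketch below wraps those differences in two hypothetical helpers, `xpath_div` and `xpath_mod`; it is an illustration, not a library API.

```python
import math


def xpath_div(a, b):
    """IEEE 754 division as XPath defines it: dividing by zero never raises."""
    a, b = float(a), float(b)
    if b == 0.0:
        if a == 0.0 or math.isnan(a):
            return math.nan
        # The sign depends on both operands; b may be negative zero.
        return math.copysign(math.inf, a) * math.copysign(1.0, b)
    return a / b


def xpath_mod(a, b):
    """Remainder of a truncating division: the result takes the dividend's sign."""
    a, b = float(a), float(b)
    if b == 0.0 or math.isinf(a):
        return math.nan          # math.fmod would raise ValueError here
    return math.fmod(a, b)


print(xpath_mod(5, 2), xpath_mod(5, -2), xpath_mod(-5, 2), xpath_mod(-5, -2))
# 1.0 1.0 -1.0 -1.0
print(5 % -2)                    # -1: Python's % floors instead
print(xpath_div(1, 0), xpath_div(1, -0.0), xpath_div(0, 0))
# inf -inf nan
```

The `xpath_div(1, -0.0)` case shows why positive and negative zero are distinct values even though they compare equal.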
Core Function Library
- Every XPath implementation must include the core function library
- Node-set functions
  - number last()
  - number position()
  - number count(node-set)
  - node-set id(object)
    - Ex) id("foo")
  - string local-name(node-set?)
  - string namespace-uri(node-set?)
  - string name(node-set?)
Core Function Library
- String functions
  - string string(object?)
  - string concat(string, string, string*)
  - boolean starts-with(string, string)
  - boolean contains(string, string)
  - string substring-before(string, string)
  - string substring-after(string, string)
  - string substring(string, number, number?)
  - number string-length(string?)
  - string normalize-space(string?)
  - string translate(string, string, string)
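Several of the string functions have rules that are easy to misread: `substring` is 1-based and rounds its arguments, and `translate` deletes characters that have no replacement. The Python sketch below reproduces those rules for finite numeric arguments only (NaN and infinity are not handled). The function names simply mirror the XPath ones and are hypothetical helpers. Python strings are indexed by code point, which matches XPath's notion of a character as one Unicode abstract character.

```python
import math
import re


def xpath_round(x):
    """XPath round() for finite x: nearest integer, ties toward positive infinity."""
    return math.floor(x + 0.5)


def substring(s, start, length=None):
    """Characters at 1-based positions p with round(start) <= p < round(start) + round(length)."""
    first = xpath_round(start)
    if length is None:
        return "".join(ch for p, ch in enumerate(s, start=1) if p >= first)
    end = first + xpath_round(length)
    return "".join(ch for p, ch in enumerate(s, start=1) if first <= p < end)


def substring_before(s, t):
    i = s.find(t)
    return s[:i] if i >= 0 else ""


def substring_after(s, t):
    i = s.find(t)
    return s[i + len(t):] if i >= 0 else ""


def normalize_space(s):
    # XPath whitespace is exactly space, tab, carriage return and line feed.
    return " ".join(tok for tok in re.split(r"[ \t\r\n]+", s) if tok)


def translate(s, src, dst):
    # Characters of src beyond len(dst) are deleted; the first occurrence in src wins.
    table = {}
    for i, ch in enumerate(src):
        table.setdefault(ord(ch), dst[i] if i < len(dst) else None)
    return s.translate(table)


print(substring("12345", 2, 3), substring("12345", 1.5, 2.6), substring("12345", 0, 3))
# 234 234 12
print(substring_before("1999/04/01", "/"), substring_after("1999/04/01", "/"))
# 1999 04/01
print(translate("bar", "abc", "ABC"), translate("--aaa--", "abc-", "ABC"))
# BAr AAA
```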
Core Function Library
- Boolean functions
  - boolean boolean(object)
  - boolean not(boolean)
  - boolean true()
  - boolean false()
  - boolean lang(string)
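The conversion performed by `boolean(object)` is the one XPath applies wherever a boolean is needed. A minimal Python sketch of the XPath 1.0 rules follows; it assumes a node-set can be modelled as a Python list, and `xpath_boolean`/`xpath_not` are hypothetical names. A common surprise is that the string "false" converts to true because it is non-empty.

```python
import math


def xpath_boolean(obj):
    """boolean(object) from the XPath 1.0 core library.

    Assumption of this sketch: a node-set is modelled as a Python list.
    """
    if isinstance(obj, bool):          # check bool first: bool is a subclass of int
        return obj
    if isinstance(obj, (int, float)):
        # True unless the number is positive or negative zero, or NaN.
        return obj != 0 and not math.isnan(obj)
    if isinstance(obj, str):
        return len(obj) > 0            # any non-empty string, even "false"
    if isinstance(obj, list):
        return len(obj) > 0            # a node-set is true if it is non-empty
    raise TypeError(f"no XPath conversion for {type(obj).__name__}")


def xpath_not(obj):
    return not xpath_boolean(obj)


print(xpath_boolean("false"), xpath_boolean(-0.0), xpath_boolean(float("nan")))
# True False False
```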
Core Function Library
- Number functions
  - number number(object?)
  - number sum(node-set)
  - number floor(number)
  - number ceiling(number)
  - number round(number)
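`number()` and `round()` differ from their Python counterparts in ways that matter in practice. XPath's number syntax has no exponent and no leading `+`, and `round()` sends ties toward positive infinity while Python's `round` sends them to the nearest even integer. A Python sketch of both rules, with hypothetical helper names and node-set arguments left out:

```python
import math
import re

# XPath 1.0 Number syntax: digits with an optional fraction, or '.' plus digits.
# There is no exponent and no leading '+'; XPath whitespace may surround it.
_NUMBER = re.compile(r"[ \t\r\n]*(-?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+))[ \t\r\n]*")


def xpath_number(obj):
    """number(object) for booleans, numbers and strings (node-sets omitted)."""
    if isinstance(obj, bool):
        return 1.0 if obj else 0.0
    if isinstance(obj, (int, float)):
        return float(obj)
    if isinstance(obj, str):
        m = _NUMBER.fullmatch(obj)
        return float(m.group(1)) if m else math.nan
    raise TypeError(f"unsupported type: {type(obj).__name__}")


def xpath_round(x):
    """round(): nearest integer, ties toward positive infinity; keeps -0 results."""
    if math.isnan(x) or math.isinf(x):
        return x
    r = float(math.floor(x + 0.5))
    return math.copysign(0.0, x) if r == 0.0 else r


print(xpath_number("  12 "), xpath_number("1e3"), xpath_number(True))   # 12.0 nan 1.0
print(xpath_round(2.5), round(2.5))            # 3.0 2
print(xpath_round(-2.5), xpath_round(-0.3))    # -2.0 -0.0
```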
Data Model
- How does XPath model an XML document as a tree?
- Node types
  - Root nodes
  - Element nodes
  - Text nodes
  - Attribute nodes
  - Namespace nodes
  - Processing instruction nodes
  - Comment nodes
- string-value
  - Every node has a string-value
- expanded-name = local part + namespace URI (only some node types have one)
Data Model
- Root node
  - The element node for the document element is a child of the root node
  - Processing instruction and comment nodes outside the document element are also children
  - string-value
    - Concatenation of the string-values of all text node descendants
  - No expanded-name
Data Model
- Element node
  - Children: element, comment, processing instruction, and text nodes
  - Unique IDs
    - Attributes declared in the DTD as type ID
  - string-value
    - Concatenation of the string-values of all text node descendants
  - Has an expanded-name
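The string-value of an element (and of the root node) is the concatenation of its text-node descendants in document order. In Python's `xml.etree.ElementTree`, text is stored in the `.text` and `.tail` attributes rather than in separate text nodes, but `itertext()` walks them in the same order, so it gives the same concatenation. The sketch assumes that correspondence and reuses the light-bulb `step` element from the XQuery part of this lecture.

```python
import xml.etree.ElementTree as ET

el = ET.fromstring(
    "<step>Rotate it <warning>slowly</warning> counterclockwise.</step>"
)


def string_value(elem):
    """Concatenate the character data of all text descendants, in document order."""
    return "".join(elem.itertext())


print(repr(string_value(el)))                    # 'Rotate it slowly counterclockwise.'
print(repr(string_value(el.find("warning"))))    # 'slowly'
```

The inner `warning` element's string-value stops at its end tag: the text after it belongs to the parent, not to `warning`.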
Data Model
- Attribute node
  - Each element node has an associated set of attribute nodes
  - The element is the parent of each of these attribute nodes
    - But an attribute node is not a child of its parent element
  - = tests whether two nodes have the same value
    - Not whether they are the same node
  - A defaulted attribute is treated the same as a specified attribute
  - string-value
    - The normalized value
  - Has an expanded-name
Data Model
- Namespace node
  - Each element has an associated set of namespace nodes
    - One for each distinct namespace prefix in scope for the element
    - One for the default namespace, if one is in scope for the element
  - Not a child of its parent element
  - string-value
    - The namespace URI
  - Has an expanded-name
    - Local part: the namespace prefix (empty for the default namespace)
    - Namespace URI: null
Data Model
- Processing instruction node
  - One for every processing instruction
  - string-value
    - The part of the processing instruction following the target
  - Has an expanded-name
    - Local part: the processing instruction's target
    - Namespace URI: null
  - The XML declaration is not a processing instruction
Data Model
- Comment node
  - string-value
    - The content of the comment
  - No expanded-name
- Text node
  - Character data
  - Never has an immediately following or preceding sibling that is a text node
  - string-value
    - The character data
  - No expanded-name
References and Resources
- http://www.w3c.org/TR/xpath
- http://www.w3c.org/TR/xpath20
- http://www.w3.org/TR/REC-xml-names
- http://www.w3.org/TR/WD-xptr
- http://www.w3.org/TR/xslt
XQuery (XML Query Language)
- 2002, 10, 14
- ???
- DB Lab. EECS Dept. KAIST
Contents
- Introduction
- Basics
- Path Expressions
- Constructors
- FLWR Expressions
- Order-Related Expressions
- Conditional Expressions
- Quantified Expressions
- Types
- Query Prolog
History
- Dec. 1998: W3C sponsors a workshop on XML Query
- Oct. 1999: W3C charters the XML Query Working Group
- 2000: The Working Group publishes requirements, use cases, and a data model
- June 2000: The Quilt proposal is presented at WebDB
- Feb. 2001: First working draft of the XQuery language
- Currently
  - XQuery 1.0
  - W3C Working Draft
Introduction
- A query language for XML
- Derived from Quilt
- Borrowed features
  - XPath, XQL
    - Path expression syntax
  - XML-QL
    - Binding variables
  - SQL
    - A series of clauses based on keywords
    - Provides a pattern for restructuring data
  - OQL
    - Functional language
Basics
- Namespace prefixes used in this lecture
  - xs: http://www.w3.org/2001/XMLSchema
  - xsi: http://www.w3.org/2001/XMLSchema-instance
  - xf: http://www.w3.org/2002/08/xquery-functions
Basics
- Query data model
  - A value is either
    - The error value, or
    - An ordered sequence of zero or more items
  - An item is either
    - A node, or
    - An atomic value
  - Kinds of nodes
    - Document, element, attribute, text, comment, processing instruction, namespace
Basics
- Facts about values
  - There is no distinction between an item and a sequence of length one
  - There is no null value
  - A sequence can be empty
  - Sequences can contain heterogeneous values
  - All sequences are ordered
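These facts can be mimicked in Python by modelling a sequence as a tuple and flattening whenever sequences are combined. XQuery sequences never nest, so `(1, (2, 3))` is the same value as `(1, 2, 3)`, and an item is the same as a one-item sequence. The helper `seq` is hypothetical, a minimal sketch of that flattening rule rather than part of any XQuery API.

```python
def seq(*items):
    """Build an XQuery-style sequence (a tuple), flattening nested sequences.

    Sequences never nest, and an item is indistinguishable from a sequence
    of length one, so both forms below produce the same value.
    """
    out = []
    for item in items:
        if isinstance(item, tuple):
            out.extend(seq(*item))   # splice the inner sequence's items in place
        else:
            out.append(item)
    return tuple(out)


print(seq(1, (2, 3)))              # (1, 2, 3)
print(seq(1, (), ((2,), 3)))       # (1, 2, 3): empty sequences vanish
print(seq(5) == seq((5,)))         # True: item vs. length-one sequence
print(seq(1, "a", 2.5))            # (1, 'a', 2.5): heterogeneous items are allowed
```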
Basics
  <?xml version="1.0"?>
  <!-- Requires one trained person -->
  <procedure title="Removing a light bulb">
    <time unit="sec">15</time>
    <step>Grip bulb.</step>
    <step>
      Rotate it <warning>slowly</warning> counterclockwise.
    </step>
  </procedure>
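Basics
- Data model representation
(Figure: the data model tree for the document above, with node kinds marked D (document), C (comment), E (element), A (attribute), and T (text). The procedure element carries the attribute title="Removing a light bulb" and has the children time, step, and step; time carries unit="sec" and contains the text 15; the first step contains the text "Grip bulb."; the second step contains the text "Rotate it", the warning element with the text "slowly", and the text "counterclockwise.")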
Basics
- Facts about nodes
  - Nodes have identity
    - Cf) atomic values don't
  - Element and attribute nodes have a type annotation
    - Generated by validating the node
    - May be a complex type
    - Type may be unknown
      - xs:anyType for element nodes
      - xs:anySimpleType for attribute nodes
  - Each node has a typed value
    - A sequence of atomic values (or ERROR)
    - Type may be unknown
      - xs:anySimpleType
  - There is a document order among nodes
Basics
- General XQuery rules
  - XQuery is a case-sensitive language
  - Keywords are in lower case
  - XQuery is a functional language
  - XQuery is a strongly typed language
  - Every expression has a value and no side effects
  - Expressions propagate the error value
    - Exceptions: and, or, and quantified expressions have early-out semantics
Basics
- Functions
  - Function calls
    - three-argument-function(1, 2, 3)
    - two-argument-function(1, (2, 3))
  - Functions are not overloaded (except certain built-ins)
  - Evaluating a function call
    - Convert the arguments to the expected types and bind the parameters
    - Evaluate the function body
    - Convert the result to the expected result type
Path Expressions
- Inherited from XPath 1.0
- A path always returns a sequence of distinct nodes in document order
- A path consists of a series of steps: E1/E2/E3
- Each step can be any expression that returns a sequence of nodes
- What E1/E2 means
  - Evaluate E1; it must be a set of nodes
  - For each node N in E1, evaluate E2 with N as the context node
  - Union together all the E2 values
  - Eliminate duplicate node identities and sort in document order
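The last step, eliminating duplicates and sorting, is easy to overlook, and it matters when E1 contains nested nodes. In the Python sketch below (hypothetical document, `xml.etree.ElementTree` elements used as hashable node identities, `path` a hypothetical helper), both `a` elements reach the inner `b`, so a naive concatenation lists it twice while the path semantics returns each node once, in document order.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: an a element nested inside another a element.
root = ET.fromstring("<a><a><b id='1'/></a><b id='2'/></a>")
order = {n: i for i, n in enumerate(root.iter())}     # document-order index


def path(e1_nodes, e2):
    """E1/E2: evaluate E2 once per node of E1, union, dedupe, sort."""
    result = set()
    for n in e1_nodes:
        result.update(e2(n))                 # set union drops duplicates by identity
    return sorted(result, key=order.__getitem__)    # back to document order


def descendant_b(n):                         # plays the role of E2 = descendant::b
    return [m for m in n.iter() if m.tag == "b" and m is not n]


all_a = [n for n in root.iter() if n.tag == "a"]      # E1 = //a (2 nested nodes)
naive = [m for n in all_a for m in descendant_b(n)]
result = path(all_a, descendant_b)
print([m.get("id") for m in naive], [m.get("id") for m in result])
# ['1', '2', '1'] ['1', '2']
```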
Path Expressions
- Axis step
  - A frequently used kind of step
  - Maps a node into a sequence of related nodes
  - An axis step has three parts
    - The axis (defines the direction of movement)
    - The node test (qualifies by name or kind of node)
    - Zero or more predicates
  - Ex) child::product[price > 100]
- Axis steps often use an abbreviated syntax
  - product[price > 100]
Axes
- XPath axes
  - Forward axes
    - child
    - descendant
    - attribute
    - self
    - descendant-or-self
    - following-sibling
    - following
    - namespace
  - Reverse axes
    - parent
    - ancestor
    - preceding-sibling
    - preceding
    - ancestor-or-self
- XQuery axes
  - Forward axes
    - child
    - descendant
    - attribute
    - self
    - descendant-or-self
  - Reverse axes
    - parent
Predicates
- Serve as a filter on a sequence
- What E1[E2] means
  - For each item e in the value of E1, evaluate E2 with
    - Context item = e
    - Context position = the position of e within the value of E1
  - Retain those items in E1 for which the predicate truth value of E2 is true
- The predicate truth value of an expression E
  - If E has a boolean value: use that value
    - Ex) emps[salary > 5000]
  - If E has a numeric value: TRUE if E is equal to the context position, otherwise FALSE
    - Ex) emps[5]
  - If E is an empty sequence: FALSE
  - If E is a non-empty node sequence: TRUE
    - Ex) emps[secretary]
  - Otherwise: return an error
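A Python sketch of the predicate truth value and of E1[E2] filtering, following the rules listed above. It rests on stated assumptions: `Node` is a hypothetical stand-in for an element node, sequences are tuples, Python `bool` and `int`/`float` stand in for the XQuery boolean and numeric types, and the `emps` data is made up for illustration.

```python
class Node:
    """Hypothetical stand-in for an XML element node; compared by identity."""

    def __init__(self, name, **fields):
        self.name = name
        self.fields = fields


def predicate_truth_value(v, position):
    """Predicate truth value of v, with sequences modelled as tuples."""
    if not isinstance(v, tuple):
        v = (v,)                          # an item is a sequence of length one
    if len(v) == 0:
        return False                      # empty sequence
    if all(isinstance(x, Node) for x in v):
        return True                       # non-empty node sequence
    if len(v) == 1 and isinstance(v[0], bool):
        return v[0]                       # boolean: use the value itself
    if len(v) == 1 and isinstance(v[0], (int, float)):
        return v[0] == position           # numeric: compare with context position
    raise TypeError(f"no predicate truth value for {v!r}")


def filter_sequence(seq, predicate):
    """E1[E2]: evaluate E2 once per item, passing the item's 1-based position."""
    return tuple(item for pos, item in enumerate(seq, start=1)
                 if predicate_truth_value(predicate(item, pos), pos))


emps = (
    Node("emp", salary=4000),
    Node("emp", salary=6000, secretary=(Node("secretary"),)),
    Node("emp", salary=7000),
)
high_paid = filter_sequence(emps, lambda e, pos: e.fields["salary"] > 5000)   # emps[salary > 5000]
second = filter_sequence(emps, lambda e, pos: 2)                              # emps[2]
with_secretary = filter_sequence(emps, lambda e, pos: e.fields.get("secretary", ()))  # emps[secretary]
print(len(high_paid), second[0] is emps[1], with_secretary == (emps[1],))   # 2 True True
```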
Expressions
- Combining sequences: union, intersect, except
  - Return sequences of distinct nodes in document order
- Arithmetic operators: +, -, *, div, mod
  - Extract typed values from nodes
  - Cast xs:anySimpleType to double
  - Promote numeric operands to a common type
  - Multiple values: error
  - If an operand is (), return ()
  - Arithmetic is supported for numeric and date/time types
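The arithmetic rules for sequence operands can be sketched in Python with tuples as sequences. The assumptions are loud ones: untyped data is modelled as `str` and cast to a Python float (standing in for double), Python's own int/float promotion stands in for XQuery numeric promotion, and `arith`/`atomize` are hypothetical helpers covering only `+`, `-`, `*`, and `div`.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "div": operator.truediv}


def atomize(value):
    # Assumption: untyped data is modelled as str and cast to double (float).
    return float(value) if isinstance(value, str) else value


def arith(op, left, right):
    """Apply an arithmetic operator to two sequences (tuples), per the rules above."""
    if left == () or right == ():
        return ()                                  # an empty operand yields ()
    if len(left) > 1 or len(right) > 1:
        raise TypeError("arithmetic operand has more than one item")
    a, b = atomize(left[0]), atomize(right[0])
    # Python promotes int and float to a common type, much like numeric promotion.
    return (OPS[op](a, b),)


print(arith("+", ("3",), (4,)))     # (7.0,)
print(arith("*", (), (2,)))         # ()
print(arith("-", (10,), (4,)))      # (6,)
```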
Expressions
- Comparison operators
  - eq, ne, gt, ge, lt, le
    - Compare single atomic values
  - =, !=, >, >=, <, <=
    - Compare sequences of values, with existential semantics
  - is, isnot
    - Compare two nodes, based on node identity
  - <<, >>, precedes, follows
    - Compare two nodes, based on document order
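The existential semantics of the general comparisons (`=`, `!=`, and so on) is what separates them from the value comparisons (`eq`, `ne`, and so on). A minimal Python sketch, with tuples as sequences and hypothetical helper names: `=` is true if some pair of items is equal, so `(1, 2) = (1, 2)` and `(1, 2) != (1, 2)` are both true.

```python
def value_eq(a, b):
    """eq: each operand must be a single atomic value."""
    if len(a) != 1 or len(b) != 1:
        raise TypeError("eq compares single atomic values")
    return a[0] == b[0]


def general_eq(a, b):
    """=: true if SOME item of a equals SOME item of b (existential)."""
    return any(x == y for x in a for y in b)


def general_ne(a, b):
    """!=: true if SOME item of a differs from SOME item of b."""
    return any(x != y for x in a for y in b)


print(general_eq((1, 2), (2, 3)))    # True: 2 = 2
print(general_ne((1, 2), (1, 2)))    # True: 1 != 2, so = and != can both hold
print(general_eq((), (1,)))          # False: no pair to compare
print(value_eq((5,), (5,)))          # True
```

Python's `any` stops at the first matching pair, which mirrors the early-out evaluation these comparisons allow.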
Expressions
- Logical expressions
  - Operators: and, or
  - Function: not()
  - Return TRUE or FALSE (2-valued logic)
  - The result depends on the effective boolean value (EBV) of the operands
    - If an operand is of type boolean, it serves as its own EBV
    - If an operand is (), its EBV is FALSE
    - If an operand is a non-empty node sequence, its EBV is TRUE
    - In any other case, an error is returned
  - Early-out semantics
Element Constructors
- To construct an element with a known name and content, use XML syntax
- If the content of an element or attribute must be computed, use a nested expression enclosed in { }
  <book isbn="12345">
    <title>Huckleberry Finn</title>
  </book>

  <book isbn="{$x}">
    {$b/title}
  </book>
Computed Constructor
- An alternative way to create nodes
- Used when both the name and the content must be computed
- Examples
  element {name-expr} {content-expr}
  attribute {name-expr} {content-expr}
  document {content-expr}

  element book {
    attribute isbn {"isbn-0060229357"},
    element author {
      element first {"Crockett"},
      element last {"Johnson"}
    }
  }

  element {xf:node-name($e)} {2 * xf:data($e)}
Whitespace in Constructors
- Boundary whitespace
  - Occurs at the boundaries between tags and/or enclosed expressions
- Declare xmlspace in the query prolog
  - strip, or not declared: boundary whitespace is not preserved
  - preserve: boundary whitespace is preserved
- Examples of whitespace
  - <a> {"abc"} </a> : the whitespace surrounding {"abc"} is boundary whitespace
  - <a> z {"abc"}</a> : the whitespace surrounding the z is not boundary whitespace
  - <a>&#x20;{"abc"}</a> : a character reference such as &#x20; is not boundary whitespace
  - <a>{" "}</a> : whitespace generated by an enclosed expression is not boundary whitespace
Other Constructors
- Processing instructions
- Comments
  - XQuery comments: {-- ... --}
- CDATA sections
  <?format role="output" ?>
  <!-- Tags are ignored in the following section -->
  <![CDATA[ <address>123 Roosevelt Ave. Flushing, NY 11368</address> ]]>
FLWR Expressions
- A FLWR expression binds some variables, applies a predicate, and constructs a new result
- FOR and LET clauses
  - Generate a list of tuples of bound variables, preserving document order
- WHERE clause
  - Applies a predicate, eliminating some of the tuples
- RETURN clause
  - Is executed for each surviving tuple, generating an ordered list of outputs
  for $var in expr
  let $var := expr
  where expr
  return expr
FLWR Expressions
- let clause
  - Each variable is bound directly to the result of an expression
- Example query
  let $s := (<one/>, <two/>, <three/>)
  return <out>{$s}</out>
- Result
  <out><one/><two/><three/></out>
FLWR Expressions
- for clause
  - Creates tuples of variable bindings
  - Binds the variables to the Cartesian product of the sequences of values
- Example query
  for $s in (<one/>, <two/>, <three/>)
  return <out>{$s}</out>
- Result
  <out><one/></out>
  <out><two/></out>
  <out><three/></out>
- Example query
  for xs:integer $i in (1, 2), xs:integer $j in (3, 4)
  return <tuple><i>{$i}</i><j>{$j}</j></tuple>
- Result
  <tuple><i>1</i><j>3</j></tuple>
  <tuple><i>1</i><j>4</j></tuple>
  <tuple><i>2</i><j>3</j></tuple>
  <tuple><i>2</i><j>4</j></tuple>
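The difference between `let` and `for` maps neatly onto Python: `let` binds the whole sequence once, while `for` produces one binding per item, and several `for` variables range over the Cartesian product. In this sketch, strings stand in for the elements `<one/>`, `<two/>`, `<three/>` and tuples stand in for the constructed `<out>` elements; both are assumptions made only for illustration.

```python
import itertools

# Strings stand in for the elements <one/>, <two/>, <three/>.
s = ("one", "two", "three")

# let $s := (...) return <out>{$s}</out>: one binding of the whole sequence.
let_result = [("out", s)]

# for $s in (...) return <out>{$s}</out>: one binding per item.
for_result = [("out", (item,)) for item in s]

# for $i in (1, 2), $j in (3, 4): tuples come from the Cartesian product,
# with the first variable varying slowest.
pairs = [(i, j) for i in (1, 2) for j in (3, 4)]

print(len(let_result), len(for_result))                    # 1 3
print(pairs)                                               # [(1, 3), (1, 4), (2, 3), (2, 4)]
print(pairs == list(itertools.product((1, 2), (3, 4))))    # True
```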
FLWR Expressions
- Sample input (bib.xml)
  <bib>
    <book>
      <title>TCP/IP Illustrated</title>
      <author>W. Stevens</author>
      <publisher>Addison-Wesley</publisher>
    </book>
    <book>
      <title>Advanced Programming in the Unix environment</title>
      <author>W. Stevens</author>
      <publisher>Addison-Wesley</publisher>
    </book>
  </bib>
FLWR Expressions
  <authlist>
  {
    let $input := document("bib.xml")
    for $a in distinct-values($input//author)
    return
      <author>
      {
        <name>{$a/text()}</name>,
        <books>
        {
          for $b in $input//book
          where $b/author = $a
          return $b/title
        }
        </books>
      }
      </author>
  }
  </authlist>
FLWR Expressions
- Result
  <authlist>
    <author>
      <name>W. Stevens</name>
      <books>
        <title>TCP/IP Illustrated</title>
        <title>Advanced Programming in the Unix environment</title>
      </books>
    </author>
  </authlist>
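For comparison, the same grouping can be written imperatively with Python's `xml.etree.ElementTree`. This is a sketch, not a translation of XQuery's evaluation model. `dict.fromkeys` plays the role of `distinct-values` (keeping first-seen order), and the `any(...)` test mirrors the existential `$b/author = $a` comparison, so a book with several authors would still match.

```python
import xml.etree.ElementTree as ET

bib = ET.fromstring(
    "<bib>"
    "<book><title>TCP/IP Illustrated</title><author>W. Stevens</author>"
    "<publisher>Addison-Wesley</publisher></book>"
    "<book><title>Advanced Programming in the Unix environment</title>"
    "<author>W. Stevens</author><publisher>Addison-Wesley</publisher></book>"
    "</bib>"
)

authlist = ET.Element("authlist")
# for $a in distinct-values($input//author): distinct names, first-seen order
for a in dict.fromkeys(au.text for au in bib.iter("author")):
    author = ET.SubElement(authlist, "author")
    ET.SubElement(author, "name").text = a
    books = ET.SubElement(author, "books")
    for b in bib.iter("book"):
        # where $b/author = $a  (true if any author of b matches)
        if any(x.text == a for x in b.findall("author")):
            ET.SubElement(books, "title").text = b.find("title").text

print(ET.tostring(authlist, encoding="unicode"))
```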
Sort Expressions
- expr1 sortby (expr2, ...)
  - For each item I in expr1, expr2 is evaluated with I as the focus
  - The resulting values are used to reorder the items in expr1
  - Value of expr1: the input sequence
  - expr2, ...: the ordering expressions
  - Result: the output sequence
    - Contains all input items, but possibly in a different order
- Examples
  //book[price > 100] sortby (author[1], title)
  data(//book/author) sortby (.)
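`sortby` behaves like Python's `sorted` with a tuple key: the first ordering expression is the primary key, the next breaks ties, and every input item survives. The book records below are hypothetical placeholders (titles, authors, and prices are invented for the sketch), with `authors[0]` standing in for `author[1]`.

```python
# Hypothetical book records; titles, authors and prices are placeholders.
books = [
    {"title": "Title C", "price": 150, "authors": ["Author B", "Author A"]},
    {"title": "Title A", "price": 120, "authors": ["Author B"]},
    {"title": "Title B", "price": 110, "authors": ["Author A"]},
    {"title": "Title D", "price": 50, "authors": ["Author A"]},
]

# //book[price > 100] sortby (author[1], title)
expensive = sorted(
    (b for b in books if b["price"] > 100),
    key=lambda b: (b["authors"][0], b["title"]),   # first key, then tie-breaker
)

# data(//book/author) sortby (.)  -- duplicates are kept, only the order changes
authors = sorted(a for b in books for a in b["authors"])

print([b["title"] for b in expensive])   # ['Title B', 'Title A', 'Title C']
print(authors)
```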
Unordered Expressions
- unordered keyword
  - A prefix to any expression
  - Indicates that the order of its result is not significant
  - Enables optimization
Conditional Expressions
- if (expr1) then expr2 else expr3
  - expr1: test expression
  - expr2: then-expression
  - expr3: else-expression
- If the effective boolean value of expr1 is
  - TRUE: the value of expr2 is returned
  - FALSE: the value of expr3 is returned
- Examples
  if ($widget1/unit-cost < $widget2/unit-cost)
  then $widget1
  else $widget2

  if ($part/@discounted)
  then $part/wholesale
  else $part/retail
Quantified Expressions
- some/every $var1 in expr1, $var2 in expr2, ... satisfies expr3
  - expr1, expr2, ...: expressions that bind the variables
  - expr3: a test expression
- some
  - TRUE if at least one evaluation of the test expression has EBV true
  - FALSE if the in-clauses generate zero binding tuples
- every
  - TRUE if every evaluation of the test expression has EBV true
  - TRUE if the in-clauses generate zero binding tuples
- Allow early-out for errors
- Example
  some $x in (1, 2, 3), xs:integer $y in (2, 3, 4)
  satisfies $x + $y = 4
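Python's `any` and `all` over a nested generator have the same shape as `some` and `every`. They iterate over the binding tuples, they stop early once the answer is known, and on zero tuples they return false and true respectively. A short sketch of the example above:

```python
xs, ys = (1, 2, 3), (2, 3, 4)

# some $x in (1, 2, 3), $y in (2, 3, 4) satisfies $x + $y = 4
some_result = any(x + y == 4 for x in xs for y in ys)
# every $x in (1, 2, 3), $y in (2, 3, 4) satisfies $x + $y = 4
every_result = all(x + y == 4 for x in xs for y in ys)

print(some_result, every_result)   # True False  (1 + 3 = 4, but 1 + 2 != 4)
# Zero binding tuples: some -> false, every -> true.
print(any(()), all(()))            # False True
```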
SequenceType
- Describes the type of an XQuery value (a sequence)
- For any item
  - item
- For atomic values
  - atomic value
  - A named atomic value type
  - untyped
- For nodes
  - node, text, processing-instruction, comment, document, element, attribute
validate Expression
- validate {expr}
  - Evaluates expr, then serializes its value as an XML string and invokes the schema validator on it
  - Elements and attributes that are recognized by the validator receive type annotations
  - <a>5</a> has the annotation xs:anyType
  - validate {<a>5</a>} might have the annotation hatsize
Testing Types
- instance of
  - expr instance of ST
  - True if the value of expr matches the type named in ST (a SequenceType)
- typeswitch
  - typeswitch (opexpr)
      case ST1 return expr1
      case ST2 return expr2
      default return defaultexpr
  - ST1, ST2, ...: SequenceTypes
  - Executes one branch, based on the type of its operand
Tinkering with Types
- cast as ST (expr)
  - Converts a value to the target type
  - Only for predefined type pairs and derived -> base type
  - May return an error at run time
- treat as ST (expr)
  - Serves as a compile-time promise
  - At run time, returns an error if the type of expr is not ST
  - Ex) treat as element of type USAddress ($myaddress)
- assert as ST (expr)
  - Serves as a compile-time assertion
  - Compile-time error if the static type of expr is not ST
  - Ex) assert as PurchaseOrder ($query)
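The run-time behaviour of these type operations can be loosely imitated in Python, with Python types standing in for SequenceTypes (`int` for xs:integer, `float` for xs:double, `str` for xs:string). That mapping and all helper names are hypothetical. `assert as` is a purely compile-time check and has no counterpart here. The contrast to notice: `cast as` converts a value, while `treat as` only checks it and fails at run time.

```python
# Hypothetical mapping: Python types stand in for SequenceTypes
# (int ~ xs:integer, float ~ xs:double, str ~ xs:string).


def instance_of(value, st):
    """expr instance of ST: a pure test, never converts."""
    return isinstance(value, st)


def cast_as(st, value):
    """cast as ST (expr): converts, and may fail at run time."""
    return st(value)


def treat_as(st, value):
    """treat as ST (expr): no conversion; a run-time error if the type is wrong."""
    if not isinstance(value, st):
        raise TypeError(f"{value!r} is not of type {st.__name__}")
    return value


def typeswitch(value, cases, default):
    """Run the first branch whose type matches, otherwise the default branch."""
    for st, branch in cases:
        if isinstance(value, st):
            return branch(value)
    return default(value)


print(instance_of(5, int), cast_as(int, "5"))   # True 5
kind = typeswitch(3.5, [(int, lambda v: "integer"), (float, lambda v: "double")],
                  lambda v: "other")
print(kind)                                     # double
try:
    treat_as(int, "5")
except TypeError as err:
    print("treat as failed:", err)
```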
Structure of an XQuery
- Query = Query Prolog + Query Body
- Query Prolog
  - Declarations and definitions
  - Create the environment for query processing
    - Namespace declarations
    - Schema imports
    - An xmlspace declaration
    - A default collation
    - Function definitions
- Query Body
  - A sequence of expressions
  - Defines the result of the query
Namespace Declarations
- Defines a namespace prefix and associates it with a namespace URI
- Examples
  declare namespace foo = "http://example.org"
  <foo:bar>Lentils</foo:bar>

  {-- Error: multiple declarations of namespace --}
  declare namespace xx = "http://example.org/foo"
  declare namespace xx = "http://example.org/bar"
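The reason a prefix may be declared only once is that a prefix is just a local alias for one URI; what identifies a name is the expanded name (namespace URI plus local part). Python's `xml.etree.ElementTree` shows this directly. It stores tags in `{uri}local` form and resolves query prefixes through a dict, which by construction maps each prefix to a single URI. The input document below is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical input that binds the prefix foo to http://example.org.
doc = ET.fromstring(
    '<list xmlns:foo="http://example.org"><foo:bar>Lentils</foo:bar></list>'
)

item = doc[0]
print(item.tag)   # {http://example.org}bar : the expanded name, prefix discarded

# A query-side prefix binding, playing the role of declare namespace.
# A dict maps each prefix to exactly one URI.
ns = {"foo": "http://example.org"}
print(doc.find("foo:bar", ns).text)        # Lentils

# Only the URI matters, so a different prefix bound to the same URI matches too.
other = {"x": "http://example.org"}
print(doc.find("x:bar", other) is item)    # True
```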
Namespace Declarations
- Default namespace declarations
  - default element namespace
  - default function namespace
- No default element namespace declared
  - Unqualified names of elements and types are in no namespace
- No default function namespace declared
  - Unqualified names of functions are in the namespace of XPath/XQuery functions
- Predefined namespace prefixes
  - xml: http://www.w3.org/XML/1998/namespace
  - xs: http://www.w3.org/2001/XMLSchema
  - xsd: http://www.w3.org/2001/XMLSchema-datatypes
  - xsi: http://www.w3.org/2001/XMLSchema-instance
Schema Imports
- Imports the element and attribute declarations and the type definitions from a schema
- No effect on the in-scope namespaces
- Example
  import schema "http://www.w3.org/1999/xhtml"
    at "http://example.org/xhtml/xhtml.xsd"
  declare namespace xhtml = "http://www.w3.org/1999/xhtml"
  document("aspect.html")//xhtml:table
xmlspace Declarations
- Controls whether boundary whitespace is preserved
- Example
  declare xmlspace preserve
Default Collation
- Used by all functions and operators if no other collation is specified
- Identified by a URI
- Example
  default collation = "http://example.org/languages/Icelandic"
Function Definitions
- Define your own functions
- Default parameter and return type
  - xs:anyType
  - The returns clause can be omitted
- Example
  define function summary(element employee $emps)
    returns element dept
  { expr }
References and Resources
- http://www.w3c.org/TR/xquery
- http://www.w3.org/TR/query-datamodel
- http://www.w3.org/TR/xquery-operators
- http://www.w3.org/TR/query-semantics
- http://www.w3c.org/XML/Query
- http://www.w3.org/XML/Schema
- http://www.w3.org/TR/REC-xml-names
- Progress Report on XQuery, Don Chamberlin
  - http://www.almaden.ibm.com/cs/people/chamberlin/