XML Basic 2 XPath and XQuery

Slides: 75
Provided by: dbserver

1
XML Basic 2: XPath and XQuery
  • 2002, 10, 14
  • Kim, Jung Hoon

2
XPath(XML Path Language)
3
Contents
  • Introduction
  • Basics
  • Location Paths
  • Expressions
  • Core Function Library
  • Data Model

4
Introduction
  • Language for addressing parts of an XML document,
    designed to be used by XSLT and XPointer
  • Building blocks for other W3C standards
  • W3C Recommendation 16 November 1999
  • Version 1.0
  • Features
  • Addresses parts of an XML document
  • A common syntax and semantics for functionality
  • XSLT, XPointer
  • Manipulation of strings, numbers, booleans
  • Models an XML document as a tree of nodes
  • Operate on the abstract, logical structure of an
    XML document
  • Fully supports XML Namespaces

5
Basics
  • Node type for XML document tree
  • Root node
  • Element node
  • Attribute node
  • Namespace node
  • Processing-instruction node
  • Comment node
  • Text node

6
Basics
  • Expression
  • Primary syntactic construct in XPath
  • Four basic types
  • Node-set
  • An unordered collection of nodes without
    duplicates
  • Boolean
  • True or false
  • Number
  • Floating-point number
  • String
  • a sequence of UCS characters

7
Basics
  • Context consists of
  • A node(context node)
  • A pair of non-zero positive integers(context
    position and context size)
  • A set of variable bindings
  • Mappings from variable names to variable values
  • A function library
  • Mappings from function names to functions
  • The set of namespace declarations in scope for
    the expression

8
Location Path
  • Most important expression in XPath
  • Selects a set of nodes relative to the context
    node
  • Consists of location steps
  • Location step: axis, node test, optional
    predicates
  • Provides syntactic abbreviations
  • / : the document root
  • /* : the document element; .. : the parent

9
Location Path
  • Relative location path
  • One or more location steps separated by /
  • Location steps are composed together from left to
    right
  • Each node in the resulting set of one step
  • Used as a context node for the following step
  • Absolute location path
  • / followed by an optional relative location path
  • Location Step
  • axis::node-test[predicate]...
  • Ex) child::NAME[position()=5]
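
The left-to-right composition of location steps can be sketched in Python with the standard xml.etree.ElementTree module. This is only an illustration: the sample document and the helper names child_step and evaluate_path are assumptions, and only the child axis is modelled.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, not taken from the slides.
DOC = ("<doc><chapter><para>a</para><para>b</para></chapter>"
       "<chapter><para>c</para></chapter></doc>")

def child_step(context_nodes, name):
    """Apply the step child::name to every context node and collect the results."""
    result = []
    for node in context_nodes:
        result.extend(child for child in node if child.tag == name)
    return result

def evaluate_path(context_node, steps):
    """Compose steps left to right: each result node becomes a context node for the next step."""
    nodes = [context_node]
    for name in steps:
        nodes = child_step(nodes, name)
    return nodes

root = ET.fromstring(DOC)
paras = evaluate_path(root, ["chapter", "para"])  # child::chapter/child::para
print([p.text for p in paras])
```

The result agrees with what ElementTree's own limited path support returns for root.findall("chapter/para").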

10
Location Step
  • Axes
  • Specifies the tree relationship between the node
    selected by location step and the context node

[Figure: the XPath axes relative to the context node (self): ancestor, ancestor-or-self, parent, preceding, preceding-sibling, following-sibling, following, child, descendant, descendant-or-self, with forward and reverse directions shown in document order]
11
Node Test
  • Specifies the node type and expanded-name of the
    node
  • Every axis has a principal node type
  • attribute axis: attribute
  • namespace axis: namespace
  • Other axes: element
  • Node Test
  • * : true for any node of the principal node type
  • text() : true for any text node
  • comment() : true for any comment node
  • processing-instruction() : true for any processing
    instruction
  • node() : true for any node of any type
  • QName : true for any node of the principal node type
    that has the same expanded-name as the QName

12
Predicates
  • Filters a node-set w.r.t. an axis to produce a
    new node-set
  • [Expr]
  • Expr is evaluated for each node to be filtered
  • If true, the node is included in the new node-set
  • para[3] is equivalent to para[position()=3]
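
A minimal Python sketch of a positional predicate, assuming a made-up sample document and a hypothetical helper filter_by_position. It filters the node list produced by child::para by 1-based context position and compares the result with ElementTree's built-in para[3] shorthand.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, not taken from the slides.
DOC = "<sec><para>one</para><note/><para>two</para><para>three</para></sec>"

def filter_by_position(nodes, wanted):
    """Keep the node whose 1-based context position equals `wanted`, like [position()=wanted]."""
    return [node for pos, node in enumerate(nodes, start=1) if pos == wanted]

sec = ET.fromstring(DOC)
paras = sec.findall("para")            # child::para
third = filter_by_position(paras, 3)   # child::para[position()=3]
print([p.text for p in third])
print([p.text for p in sec.findall("para[3]")])
```

The note element is not counted, because the predicate filters the node-set selected by child::para, not all children.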

13
Location Path Examples
  • /
  • Selects the document root node
  • /descendant::para
  • Selects all the para elements in the same
    document
  • child::para[position()=1]
  • Selects the first para child of the context node
  • preceding-sibling::chapter[position()=1]
  • Selects the previous chapter sibling of the
    context node
  • child::para[attribute::type='warning'][position()=5]
  • Selects the 5th para child of the context node
    that has a type attribute with value warning

14
Abbreviated Syntax
  • Location Step
  • child:: can be omitted from a location step -
    Default axis
  • Ex) child::div/child::para → div/para
  • attribute:: → @
  • Ex) child::para[attribute::type='warning'] →
    para[@type='warning']
  • /descendant-or-self::node()/ → //
  • Ex) /descendant-or-self::node()/child::para →
    //para
  • self::node() → .
  • Ex) self::node()/descendant-or-self::node()/child::para
    → .//para
  • parent::node() → ..
  • Ex) parent::node()/child::title → ../title
  • //para[1] is not the same as /descendant::para[1]
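
The difference between //para[1] and /descendant::para[1] can be seen with a small Python sketch. The sample document is an assumption; ElementTree has no descendant:: axis syntax, so the second case is emulated with iter().

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, not taken from the slides.
DOC = ("<doc><div><para>a</para><para>b</para></div>"
       "<div><para>c</para></div></doc>")
root = ET.fromstring(DOC)

# //para[1]: every para that is the first para child of its parent.
first_per_parent = [p.text for p in root.findall(".//para[1]")]

# /descendant::para[1]: the first para among all descendants, in document order.
first_descendant = [p.text for p in list(root.iter("para"))[:1]]

print(first_per_parent)
print(first_descendant)
```

In the abbreviated form the predicate applies to the child::para step inside each parent, which is why it can select more than one node.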

15
Other Expressions
  • Function calls
  • FunctionName(Argument1, Argument2, ...)
  • Node-sets
  • Location path
  • | operator
  • Computes the union of its operands(node-sets)
  • Booleans
  • or, and, =, !=, <=, <, >=, >
  • <=, < must be quoted like &lt;=, &lt; in XML
    documents
  • Left associative
  • 3 > 2 > 1 evaluates to false
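
Why 3 > 2 > 1 is false can be traced with a short Python sketch. The helper names are hypothetical; the sketch assumes only the XPath 1.0 rule that relational operators compare non-node-set operands as numbers, with true converted to 1 and false to 0.

```python
def xpath_number(value):
    """Convert a boolean to a number as XPath 1.0 does: true -> 1, false -> 0."""
    return float(value) if isinstance(value, bool) else value

def xpath_gt(left, right):
    """The > operator on numbers and booleans: both operands are compared as numbers."""
    return xpath_number(left) > xpath_number(right)

# Left associativity: 3 > 2 > 1 parses as (3 > 2) > 1.
inner = xpath_gt(3, 2)        # true
result = xpath_gt(inner, 1)   # true becomes 1, and 1 > 1 does not hold
print(result)
```

Python's own chained comparison 3 > 2 > 1 means (3 > 2) and (2 > 1), so it is not a model for the XPath behaviour.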

16
Other Expressions
  • Numbers
  • Floating-point number
  • Double-precision 64-bit format
  • NaN(Not-a-Number) value
  • Positive, negative infinity
  • Positive, negative zero
  • +, -, div, mod
  • The - operator typically needs to be preceded by
    whitespace
  • foo-bar is a name; foo - bar is a subtraction
  • Strings
  • A sequence of zero or more characters
  • A character in XPath
  • A single Unicode abstract character
  • May be a pair of 16-bit Unicode code values

17
Core Function Library
  • XPath implementations must always include
  • Node set functions
  • number last()
  • number position()
  • number count(node-set)
  • node-set id(object)
  • Ex) id("foo")
  • string local-name(node-set?)
  • string namespace-uri(node-set?)
  • string name(node-set?)

18
Core Function Library
  • String Functions
  • string string(object?)
  • string concat(string, string, string*)
  • boolean starts-with(string, string)
  • boolean contains(string, string)
  • string substring-before(string, string)
  • string substring-after(string, string)
  • string substring(string, number, number?)
  • number string-length(string?)
  • string normalize-space(string?)
  • string translate(string, string, string)
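
The behaviour of a few of these string functions can be sketched in Python. The function names below are hypothetical Python counterparts, written from the XPath 1.0 definitions, and are not part of any library.

```python
import re

def substring_before(s, sep):
    """Text before the first occurrence of sep, or "" if sep does not occur."""
    if sep == "":
        return ""
    head, found, _ = s.partition(sep)
    return head if found else ""

def substring_after(s, sep):
    """Text after the first occurrence of sep, or "" if sep does not occur."""
    if sep == "":
        return s
    _, found, tail = s.partition(sep)
    return tail if found else ""

def normalize_space(s):
    """Strip leading/trailing whitespace and collapse inner runs to one space."""
    return re.sub(r"[ \t\r\n]+", " ", s.strip(" \t\r\n"))

def translate(s, src, dst):
    """Replace characters of src by the character at the same position in dst.

    Characters of src without a counterpart in dst are removed;
    the first occurrence of a character in src wins.
    """
    mapping = {}
    for i, ch in enumerate(src):
        if ch not in mapping:
            mapping[ch] = dst[i] if i < len(dst) else ""
    return "".join(mapping.get(ch, ch) for ch in s)

print(substring_before("1999/04/01", "/"))
print(substring_after("1999/04/01", "/"))
print(translate("--aaa--", "abc-", "ABC"))
```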

19
Core Function Library
  • Boolean functions
  • boolean boolean(object)
  • boolean not(boolean)
  • boolean true()
  • boolean false()
  • boolean lang(string)

20
Core Function Library
  • Number functions
  • number number(object?)
  • number sum(node-set)
  • number floor(number)
  • number ceiling(number)
  • number round(number)
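
XPath's round() resolves ties toward positive infinity, which differs from Python's built-in round(). The following is a rough sketch with a hypothetical helper xpath_round; it ignores negative zero and can be off for values a hair below a .5 boundary because of floating-point addition.

```python
import math

def xpath_round(x):
    """Round to the nearest integer; ties go toward positive infinity."""
    if math.isnan(x) or math.isinf(x):
        return x
    return float(math.floor(x + 0.5))

print(xpath_round(2.5), round(2.5))
print(xpath_round(-2.5), round(-2.5))
```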

21
Data Model
  • How XPath models an XML document as a tree
  • Node type
  • Root nodes
  • Element nodes
  • Text nodes
  • Attribute nodes
  • Namespace nodes
  • Processing instruction nodes
  • Comment nodes
  • string-value
  • expanded-name: local part + namespace URI

22
Data Model
  • Root node
  • The element node for the document element is a child
  • Processing instruction and comment nodes can also be
    children
  • string-value
  • Concatenation of the string-values of all text
    node descendants
  • No expanded-name

23
Data Model
  • Element node
  • Element, comment, processing instruction, text
    node for children
  • Unique IDs
  • Attributes declared in DTD as type ID
  • string-value
  • Concatenation of the string-values of all text
    node descendants
  • Has expanded-name
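
The string-value of an element node can be computed in Python by joining all descendant text in document order. A minimal sketch, assuming ElementTree and a small sample element; string_value is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

def string_value(element):
    """Concatenate the text of all text-node descendants in document order."""
    return "".join(element.itertext())

step = ET.fromstring(
    "<step>Rotate it <warning>slowly</warning> counterclockwise.</step>"
)
print(string_value(step))
```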

24
Data Model
  • Attribute node
  • Each element node has an associated set of
    attribute nodes
  • The element is the parent of each of the
    attribute nodes
  • Not a child of its parent element
  • = tests whether two nodes have the same value
  • Not whether they are the same node
  • A defaulted attribute is treated the same as a
    specified attribute
  • string-value
  • Normalized value
  • Has expanded-name

25
Data Model
  • Namespace node
  • Each element has an associated set of namespace
    nodes
  • For each distinct namespace prefix in scope for
    the element
  • One for the default namespace if in scope for the
    element
  • Not a child of its parent element
  • string-value
  • Namespace URI
  • Has expanded-name
  • Namespace URI: null

26
Data Model
  • Processing instruction node
  • For every processing instruction
  • string-value
  • Part of the processing instruction following the
    target
  • Has expanded-name
  • Local part: the processing instruction's target
  • Namespace URI: null
  • XML declaration is not a processing instruction

27
Data Model
  • Comment node
  • string-value
  • Content of the comment
  • No expanded-name
  • Text Nodes
  • Character data
  • Never have an immediately following/preceding
    sibling text node
  • string-value
  • Contents
  • No expanded-name

28
References and Resources
  • http://www.w3c.org/TR/xpath
  • http://www.w3c.org/TR/xpath20
  • http://www.w3.org/TR/REC-xml-names
  • http://www.w3.org/TR/WD-xptr
  • http://www.w3.org/TR/xslt

29
XQuery(XML Query Language)
  • 2002, 10, 14
  • ???
  • DB Lab. EECS Dept. KAIST

30
Contents
  • Introduction
  • Basics
  • Path Expressions
  • Constructors
  • FLWR Expressions
  • Order-Related Expressions
  • Conditional Expressions
  • Quantified Expressions
  • Types
  • Query Prolog

31
History
  • Dec. 1998: W3C sponsors workshop on XML Query
  • Oct. 1999: W3C charters XML Query working group
  • 2000: Working Group publishes requirements, use
    cases, data model
  • June 2000: Quilt proposal presented at WebDB
  • Feb. 2001: First working draft of XQuery language
  • Currently
  • XQuery 1.0
  • W3C Working Draft

32
Introduction
  • Query language
  • Derived from Quilt
  • Borrowed Feature
  • XPath, XQL
  • Path expression syntax
  • XML-QL
  • Binding variables
  • SQL
  • A series of clauses based on keywords
  • Provide a pattern for restructuring data
  • OQL
  • Functional language

33
Basics
  • Namespace prefixes to be used in this lecture
  • xs:
  • http://www.w3.org/2001/XMLSchema
  • xsi:
  • http://www.w3.org/2001/XMLSchema-instance
  • xf:
  • http://www.w3.org/2002/08/xquery-functions

34
Basics
  • Query data model
  • A value
  • Error value
  • An ordered sequence of zero or more items
  • An item
  • A Node
  • An atomic value
  • Kinds of nodes
  • Document, Element, Attribute, Text, Comment,
    Processing instruction, Namespace

35
Basics
  • Facts about values
  • No distinction between an item and a sequence of
    length one
  • No null value
  • A sequence can be empty
  • Sequences can contain heterogeneous values
  • All sequences are ordered

36
Basics
  • An XML Document

<?xml version="1.0"?>
<!-- Requires one trained person -->
<procedure title="Removing a light bulb">
  <time unit="sec">15</time>
  <step>Grip bulb.</step>
  <step>
    Rotate it
    <warning>slowly</warning>
    counterclockwise.
  </step>
</procedure>
37
Basics
  • Data model representation

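[Figure: data model tree of the document above. The document node (D) has a comment node (C) and the procedure element (E) as children. procedure has the attribute (A) title="Removing a light bulb" and three element children: time (attribute unit="sec", text node "15"), step (text node "Grip bulb.") and step (text node "Rotate it", element warning with text node "slowly", text node "counterclockwise.").]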
38
Basics
  • Facts about nodes
  • Nodes have identity
  • Cf.) atomic values don't
  • Element and attribute nodes have a type
    annotation
  • Generated by validating the node
  • May be a complex type
  • Type may be unknown
  • xs:anyType for element node
  • xs:anySimpleType for attribute node
  • Each node has a typed value
  • A sequence of atomic values (or ERROR)
  • Type may be unknown
  • xs:anySimpleType
  • There is a document order among nodes

39
Basics
  • General XQuery rules
  • XQuery is a case-sensitive language
  • Keywords are in lower-case
  • XQuery is a functional language
  • XQuery is a strongly-typed language
  • Every expression has a value and no side effects
  • Expressions propagate the error value
  • Exception: and, or, and quantifiers have early-out
    semantics

40
Basics
  • Functions
  • Function calls
  • Three-argument-function (1, 2, 3)
  • Two-argument-function (1, (2, 3))
  • Functions are not overloaded(except certain
    built-ins)
  • Evaluating a function call
  • Convert arguments to expected types and bind
    parameters
  • Evaluate function body
  • Convert result to expected result type

41
Path Expressions
  • Inherited from XPath 1.0
  • A path always returns a sequence of distinct
    nodes in document order
  • A path consists of a series of steps: E1/E2/E3...
  • Each step can be any expression that returns a
    sequence of nodes
  • What E1/E2 means
  • Evaluate E1; it must be a set of nodes
  • For each node N in E1, evaluate E2 with N as
    context node
  • Union together all the E2-values
  • Eliminate duplicate node-ids and sort in document
    order
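
The four-part meaning of E1/E2 can be sketched in Python. This is an illustration under assumptions: the helper names document_order and path_step are hypothetical, nodes are ElementTree elements, and E2 is modelled as a plain function from a context node to a list of nodes.

```python
import xml.etree.ElementTree as ET

def document_order(root):
    """Map every element to its position in document order."""
    return {node: index for index, node in enumerate(root.iter())}

def path_step(root, e1_nodes, e2):
    """Evaluate E1/E2: run E2 once per node of E1, union, dedupe, sort."""
    order = document_order(root)
    seen = set()
    result = []
    for context_node in e1_nodes:        # each node N in E1 is the context node
        for node in e2(context_node):    # evaluate E2 with N as context
            if node not in seen:         # eliminate duplicates by node identity
                seen.add(node)
                result.append(node)
    return sorted(result, key=order.__getitem__)  # document order

# Hypothetical sample document, not taken from the slides.
root = ET.fromstring('<a><b><c id="1"/></b><b><c id="2"/></b></a>')
e1 = list(root.iter("b"))[::-1] + [root]   # deliberately out of document order
found = path_step(root, e1, lambda n: list(n.iter("c")))
print([c.get("id") for c in found])
```

Each c element is reached twice here (once from its b parent, once from the root), yet it appears once in the result.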

42
Path Expressions
  • Axis step
  • A frequently-used kind of step
  • Maps a node into a sequence of related nodes
  • An axis step has three parts
  • The axis(defines the direction of movement)
  • The node test(qualifies by name or kind of node)
  • Zero or more predicates
  • Ex) child::product[price > 100]
  • Axis steps often use an abbreviated syntax
  • product[price > 100]

43
Axes
  • XPath axes
  • Forward axes
  • child
  • descendant
  • attribute
  • self
  • descendant-or-self
  • following-sibling
  • following
  • namespace
  • Reverse axes
  • parent
  • ancestor
  • preceding-sibling
  • preceding
  • ancestor-or-self
  • XQuery axes
  • Forward axes
  • child
  • descendant
  • attribute
  • self
  • descendant-or-self
  • Reverse axes
  • parent

44
Predicates
  • Serve as a filter on a sequence
  • What E1[E2] means
  • For each item e in the value of E1, evaluate E2
    with
  • Context item: e
  • Context position: position of e within the value
    of E1
  • Retain those items in E1 for which the predicate
    truth value of E2 is true
  • The predicate truth value of an expression E
  • If E has a boolean value: use that value
  • Ex) $emps[salary > 5000]
  • If E has a numeric value: TRUE if the value is equal
    to the context position, otherwise FALSE
  • Ex) $emps[5]
  • If E is an empty sequence: FALSE
  • If E is a non-empty node sequence: TRUE
  • Ex) $emps[secretary]
  • Otherwise, return an error
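
The predicate truth value rules can be written down as a small Python function. This is a sketch under assumptions: values are modelled as Python booleans, numbers, or lists of ElementTree elements, and the names predicate_truth_value and filter_sequence as well as the sample data are hypothetical.

```python
import xml.etree.ElementTree as ET

def predicate_truth_value(value, context_position):
    """Apply the rules above to the value of E2."""
    if isinstance(value, bool):            # test bool first: bool is a subclass of int
        return value
    if isinstance(value, (int, float)):    # numeric: compare with the context position
        return value == context_position
    if isinstance(value, list):
        if not value:                      # empty sequence
            return False
        if all(isinstance(v, ET.Element) for v in value):
            return True                    # non-empty node sequence
    raise TypeError("value has no predicate truth value")

def filter_sequence(items, predicate):
    """E1[E2]: keep the items for which the predicate truth value of E2 is true."""
    return [item for pos, item in enumerate(items, start=1)
            if predicate_truth_value(predicate(item, pos), pos)]

# Hypothetical sample data, not taken from the slides.
emps = list(ET.fromstring(
    "<emps>"
    "<emp><name>Ann</name><salary>6000</salary><secretary>Bo</secretary></emp>"
    "<emp><name>Cy</name><salary>4000</salary></emp>"
    "</emps>"
))
rich = filter_sequence(emps, lambda e, pos: float(e.findtext("salary")) > 5000)
second = filter_sequence(emps, lambda e, pos: 2)
with_secretary = filter_sequence(emps, lambda e, pos: e.findall("secretary"))
print([e.findtext("name") for e in rich])
print([e.findtext("name") for e in second])
print([e.findtext("name") for e in with_secretary])
```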

45
Expressions
  • Combining sequences: union, intersect, except
  • Return sequences of distinct nodes in document
    order
  • Arithmetic operators: + - * div mod
  • Extract typed values from node
  • Cast xs:anySimpleType to double
  • Promote numeric operands to a common type
  • Multiple values: error
  • If operand is (), return ()
  • Arithmetic supported for numeric and date/time
    types

46
Expressions
  • Comparison operators
  • eq ne gt ge lt le
  • Compare single atomic values
  • = != > >= < <=
  • Compare sequences of values, with existential
    semantics
  • is isnot
  • Compare two nodes, based on node identity
  • << >> precedes follows
  • Compare two nodes, based on document order
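
Existential semantics for the general comparison operators can be sketched in Python as "true if any pair of items satisfies the operator". The helper name general_compare is hypothetical, and sequences are modelled as Python lists of atomic values.

```python
import operator

def general_compare(left, right, op):
    """True if op holds for at least one pair (a, b) with a in left, b in right."""
    return any(op(a, b) for a in left for b in right)

equal = general_compare([1, 2], [2, 3], operator.eq)      # 2 = 2
not_equal = general_compare([1, 2], [2, 3], operator.ne)  # 1 != 2
empty = general_compare([], [1], operator.eq)             # no pairs at all
print(equal, not_equal, empty)
```

With these semantics both = and != can hold for the same pair of sequences, and any comparison against an empty sequence is false.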

47
Expressions
  • Logical expressions
  • Operators and or
  • Function: not()
  • Return TRUE or FALSE (2-valued logic)
  • Result depends on effective boolean value of
    operands
  • If operand is of type boolean, it serves as its
    own EBV
  • If operand is (), EBV is FALSE
  • If operand is a non-empty node sequence, EBV is
    TRUE
  • In any other case, return an error
  • Early-out semantics

48
Element Constructors
  • To construct an element with a known name and
    content, use XML syntax
  • If the content of an element or attribute must be
    computed, use a nested expression enclosed in { }

<book isbn="12345">
  <title>Huckleberry Finn</title>
</book>

<book isbn="{$x}">
  {$b/title}
</book>
49
Computed Constructor
  • An alternative way to create nodes
  • If both the name and the content must be
    computed
  • Examples

element {name-expr} {content-expr}
attribute {name-expr} {content-expr}
document {content-expr}

element book {
  attribute isbn {"isbn-0060229357"},
  element author {
    element first {"Crockett"},
    element last {"Johnson"}
  }
}

element {xf:node-name($e)} {2 * xf:data($e)}
50
Whitespace in Constructors
  • Boundary whitespace
  • Occur in the boundaries between tags and/or
    enclosed expressions
  • Declare xmlspace in the Query Prolog
  • strip or not declared: boundary whitespace is not
    preserved
  • preserve: boundary whitespace is preserved
  • Examples for whitespace

<a> {"abc"} </a>
  whitespace surrounding {"abc"} is boundary whitespace
<a> z {"abc"}</a>
  whitespace surrounding the z is not boundary whitespace
<a>&#x20;{"abc"}</a>
  a character reference such as &#x20; is not boundary whitespace
<a>{" "}</a>
  whitespace generated by an enclosed expression is not boundary whitespace
51
Other Constructors
  • Processing instruction
  • Comments
  • XQuery comments: {-- --}
  • CDATA sections

<?format role="output" ?>
<!-- Tags are ignored in the following section -->
<![CDATA[ <address>123 Roosevelt Ave. Flushing, NY 11368</address> ]]>
52
FLWR Expressions
  • A FLWR expression binds some variables, applies a
    predicate, constructs a new result
  • FOR, LET clauses
  • Generate a list of tuples of bound variables,
    preserving document order
  • WHERE clause
  • Applies a predicate, eliminating some of the
    tuples
  • RETURN clause
  • Is executed for each surviving tuple, generating
    an ordered list of outputs

for $var in expr
let $var := expr
where expr
return expr
53
FLWR Expressions
  • let clause
  • Each variable is bound directly to the result of
    an expression
  • Example Query

let $s := (<one/>, <two/>, <three/>)
return <out>{$s}</out>

  • Result

<out> <one/> <two/> <three/> </out>
54
FLWR Expressions
  • for clause
  • Create tuples of variable bindings
  • Cartesian product of the sequences of values
  • Examples

Query:
for $s in (<one/>, <two/>, <three/>)
return <out>{$s}</out>

Result:
<out> <one/> </out>
<out> <two/> </out>
<out> <three/> </out>

Query:
for xs:integer $i in (1, 2), xs:integer $j in (3, 4)
return <tuple> <i>{ $i }</i> <j>{ $j }</j> </tuple>

Result:
<tuple> <i>1</i> <j>3</j> </tuple>
<tuple> <i>1</i> <j>4</j> </tuple>
<tuple> <i>2</i> <j>3</j> </tuple>
<tuple> <i>2</i> <j>4</j> </tuple>
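
The tuple stream produced by a for clause with several variables behaves like a Cartesian product in which the first variable changes slowest. A minimal Python sketch, with for_clause as a hypothetical helper name:

```python
from itertools import product

def for_clause(*sequences):
    """Binding tuples for `for $i in s1, $j in s2, ...` in generation order."""
    return list(product(*sequences))

tuples = for_clause((1, 2), (3, 4))
rendered = ["<tuple> <i>%d</i> <j>%d</j> </tuple>" % t for t in tuples]
for line in rendered:
    print(line)
```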
55
FLWR Expressions
  • Example document bib.xml

<bib>
  <book>
    <title>TCP/IP Illustrated</title>
    <author>W. Stevens</author>
    <publisher>Addison-Wesley</publisher>
  </book>
  <book>
    <title>Advanced Programming in the Unix environment</title>
    <author>W. Stevens</author>
    <publisher>Addison-Wesley</publisher>
  </book>
</bib>
56
FLWR Expressions
  • Example query

<authlist>
{
  let $input := document("bib.xml")
  for $a in distinct-values($input//author)
  return
    <author>
      <name> { $a/text() } </name>
      <books>
        {
          for $b in $input//book
          where $b/author = $a
          return $b/title
        }
      </books>
    </author>
}
</authlist>
57
FLWR Expressions
  • Result

<authlist>
  <author>
    <name>W. Stevens</name>
    <books>
      <title>TCP/IP Illustrated</title>
      <title>Advanced Programming in the Unix environment</title>
    </books>
  </author>
</authlist>
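
For comparison, the same restructuring can be written imperatively in Python with ElementTree. This is only a sketch of what the query computes, not how an XQuery processor works; author_list is a hypothetical name, and distinct-values is approximated by de-duplicating author text in first-seen order.

```python
import xml.etree.ElementTree as ET

BIB = """<bib>
  <book><title>TCP/IP Illustrated</title><author>W. Stevens</author>
    <publisher>Addison-Wesley</publisher></book>
  <book><title>Advanced Programming in the Unix environment</title>
    <author>W. Stevens</author><publisher>Addison-Wesley</publisher></book>
</bib>"""

def author_list(bib):
    """Group book titles under each distinct author."""
    authlist = ET.Element("authlist")
    names = list(dict.fromkeys(a.text for a in bib.iter("author")))
    for name in names:                                   # for $a in distinct-values(...)
        author = ET.SubElement(authlist, "author")
        ET.SubElement(author, "name").text = name
        books = ET.SubElement(author, "books")
        for book in bib.iter("book"):                    # for $b in $input//book
            if any(a.text == name for a in book.findall("author")):  # where $b/author = $a
                ET.SubElement(books, "title").text = book.findtext("title")
    return authlist

result = author_list(ET.fromstring(BIB))
print(ET.tostring(result, encoding="unicode"))
```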
58
Sort Expressions
  • expr1 sort by (expr2, ...)
  • For each item I in expr1, expr2 is evaluated with
    I as focus
  • Resulting values used to reorder the items in expr1
  • Value of expr1: input sequence
  • expr2, ...: ordering expressions
  • Result: output sequence
  • Contains all input items, but possibly in a
    different order
  • Examples

//book[price > 100] sort by (author[1], title)
data(//book/author) sort by (.)
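
The first example can be mimicked in Python by evaluating each ordering expression per item and sorting on the resulting tuple. The data and the helper name sort_by are assumptions made for illustration.

```python
# Hypothetical sample data, not taken from the slides.
books = [
    {"title": "B", "authors": ["Zed"], "price": 150},
    {"title": "A", "authors": ["Zed", "Amy"], "price": 120},
    {"title": "C", "authors": ["Amy"], "price": 80},
    {"title": "D", "authors": ["Amy"], "price": 200},
]

def sort_by(items, *ordering):
    """Reorder items by the values of the ordering expressions, evaluated per item."""
    return sorted(items, key=lambda item: tuple(expr(item) for expr in ordering))

expensive = [b for b in books if b["price"] > 100]           # //book[price > 100]
ordered = sort_by(expensive,
                  lambda b: b["authors"][0],                 # author[1]
                  lambda b: b["title"])                      # title
titles = [b["title"] for b in ordered]
print(titles)
```

The output contains every input item exactly once; only the order changes.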
59
Unordered Expressions
  • unordered keyword
  • A prefix to any expression
  • Indicate that its order is not significant
  • Optimization

60
Conditional Expressions
  • if (expr1) then expr2 else expr3
  • expr1: test expression
  • expr2: then-expression
  • expr3: else-expression
  • If the effective boolean value of expr1 is
  • TRUE: value of expr2 is returned
  • FALSE: value of expr3 is returned
  • Examples

if ($widget1/unit-cost < $widget2/unit-cost)
then $widget1 else $widget2

if ($part/@discounted) then $part/wholesale
else $part/retail
61
Quantified Expressions
  • some/every $var1 in expr1, $var2 in expr2, ...
    satisfies expr3
  • expr1, expr2, ...: expressions for binding variables
  • expr3: a test expression
  • some
  • TRUE if at least one evaluation of the test
    expression has EBV true
  • FALSE if the in-clauses generate zero
    binding-tuples
  • every
  • TRUE if every evaluation of the test expression
    has the EBV true
  • TRUE if the in-clauses generate zero
    binding-tuples
  • Allow early-out for errors
  • Example

some $x in (1, 2, 3), xs:integer $y in (2, 3, 4)
satisfies $x + $y = 4
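
Quantified expressions map naturally onto Python's any() and all() over the binding tuples. A minimal sketch, with some and every as hypothetical helper names; type declarations and error handling are left out.

```python
from itertools import product

def some(sequences, test):
    """TRUE if at least one binding tuple satisfies the test; FALSE for zero tuples."""
    return any(test(*binding) for binding in product(*sequences))

def every(sequences, test):
    """TRUE if every binding tuple satisfies the test; TRUE for zero tuples."""
    return all(test(*binding) for binding in product(*sequences))

ranges = [(1, 2, 3), (2, 3, 4)]
some_result = some(ranges, lambda x, y: x + y == 4)    # 1 + 3 and 2 + 2 qualify
every_result = every(ranges, lambda x, y: x + y == 4)  # 1 + 2 does not
print(some_result, every_result)
```

Both any() and all() stop at the first deciding tuple, which mirrors the early-out behaviour mentioned above.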
62
SequenceType
  • Describes the type of an XQuery value(sequence)
  • For any item
  • item
  • For atomic values
  • atomic value
  • A named atomic value type
  • untyped
  • For nodes
  • node, text, processing-instruction, comment,
    document, element, attribute

63
SequenceType
64
validate Expression
  • validate {expr}
  • Evaluate expr, then serialize its value as an XML
    string and invoke the schema validator on it
  • Elements and attributes that are recognized by
    the validator receive type annotations
  • <a>5</a> has annotation xs:anyType
  • validate {<a>5</a>} might have annotation
    hatsize

65
Testing Types
  • instance of
  • expr instance of ST
  • True if the value of expr matches the type name
    in ST(SequenceType)
  • typeswitch
  • typeswitch (opexpr) case ST1 return expr1 case ST2
    return expr2 ... default return defaultexpr
  • ST1, ST2, ...: SequenceType
  • Executes one branch, based on the type of its
    operand

66
Tinkering with Types
  • cast as ST (expr)
  • Converts value to target type
  • Only for predefined type pairs and derived ->
    base type
  • May return error at run-time
  • treat as ST (expr)
  • Serves as a compile-time promise
  • At run-time, returns an error if type of expr is
    not ST
  • Ex) treat as element of type USAddress
    ($myaddress)
  • assert as ST (expr)
  • Serves as a compile-time assertion
  • Compile-time error if static type of expr is not
    ST
  • Ex) assert as PurchaseOrder (query)

67
Structure of an XQuery
  • Query = Query Prolog + Query Body
  • Query Prolog
  • Declarations, definitions
  • Create the environment for query processing
  • Namespace declarations
  • Schema imports
  • An xmlspace declaration
  • A default collation
  • Function definitions
  • Query Body
  • A sequence of expressions
  • Define the result of the query

68
Namespace Declarations
  • Defines a namespace prefix, associates it with a
    namespace URI
  • Examples

declare namespace foo = "http://example.org"
<foo:bar>Lentils</foo:bar>

{-- Error: multiple declarations of namespace --}
declare namespace xx = "http://example.org/foo"
declare namespace xx = "http://example.org/bar"
69
Namespace Declarations
  • Default namespace declarations
  • default element namespace
  • default function namespace
  • No default element namespace
  • Unqualified name of elements and types
  • In no namespace
  • No default function namespace
  • Unqualified name of functions
  • In the namespace of XPath/XQuery functions
  • Predefined namespace prefixes
  • xml = http://www.w3.org/XML/1998/namespace
  • xs = http://www.w3.org/2001/XMLSchema
  • xsd = http://www.w3.org/2001/XMLSchema-datatypes
  • xsi = http://www.w3.org/2001/XMLSchema-instance

70
Schema Imports
  • Imports the element, attribute declarations and
    type definitions from a schema
  • No effect on the in-scope namespaces
  • Example

import schema "http://www.w3.org/1999/xhtml"
  at "http://example.org/xhtml/xhtml.xsd"
declare namespace xhtml = "http://www.w3.org/1999/xhtml"
document("aspect.html")//xhtml:table
71
xmlspace Declarations
  • Controls whether boundary whitespace is preserved
  • Example

declare xmlspace = preserve
72
Default Collation
  • Used by all functions and operators if no other
    collation is specified
  • Identified by a URI
  • Example

default collation = "http://example.org/languages/Icelandic"
73
Function Definitions
  • Define functions of one's own
  • Default parameter/return type
  • xs:anyType
  • returns clause can be omitted
  • Example

define function summary(element employee $emps)
returns element dept { expr }
74
References and Resources
  • http://www.w3c.org/TR/xquery
  • http://www.w3.org/TR/query-datamodel
  • http://www.w3.org/TR/xquery-operators
  • http://www.w3.org/TR/query-semantics
  • http://www.w3c.org/XML/Query
  • http://www.w3.org/XML/Schema
  • http://www.w3.org/TR/REC-xml-names
  • Progress Report on XQuery, Don Chamberlin
  • http://www.almaden.ibm.com/cs/people/chamberlin/