Title: XPath
What is XPath?
- A language designed to be used by XSL Transformations (XSLT), XLink, XPointer and XML Query.
- Primary purpose: to address parts of an XML document, and to provide basic facilities for manipulating strings, numbers and booleans.
Outline
- Introduction
- Data Model
- XPath Syntax
- Location Path
- General XPath Expressions
- Core Function Library
- XPath Utilities
- Conclusion
Introduction
- W3C Recommendation, 16 November 1999.
- Latest version: http://www.w3.org/TR/xpath
- XPath uses a compact, string-based syntax rather than an XML element-based one.
- It operates on the abstract, logical structure of an XML document rather than on its surface syntax.
- It uses a path notation (as in URLs) to navigate through this hierarchical tree structure.
Introduction (cont.)
- XPath models an XML document as a tree of nodes and defines how to compute a string-value for each type of node.
- Supports namespaces.
- The expression (Expr) is the primary syntactic construct of XPath.
Data Model
- XPath's way of representing an XML document: a tree of nodes.
- The tree contains 7 types of node:
- Root node
- Element nodes
- Attribute nodes
- Namespace nodes
- Processing instruction nodes
- Comment nodes
- Text nodes
- The nodes are kept in document order: the order in which the first character of each node's XML representation (for an element, its start-tag) occurs in the document.
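The node types above can be made concrete in Python. This is a sketch under stated assumptions. It uses the standard-library xml.etree.ElementTree module, which does not store XPath nodes directly: text sits in .text and .tail, attributes sit in a dict, namespace nodes are not exposed, and comments or processing instructions outside the document element are dropped. The hypothetical helpers document_order and _walk therefore rebuild an XPath-style node list in document order, with a placeholder for the root node. The insert_comments option needs Python 3.8 or later.

```python
import xml.etree.ElementTree as ET

def document_order(document_element):
    """Yield (kind, value) pairs in XPath document order (hypothetical helper).
    A placeholder stands in for the root node; namespace nodes are omitted."""
    yield ("root", None)
    yield from _walk(document_element)

def _walk(elem):
    if elem.tag is ET.Comment:
        yield ("comment", elem.text)
        return
    if elem.tag is ET.ProcessingInstruction:
        yield ("processing-instruction", elem.text)
        return
    yield ("element", elem.tag)
    # Attribute nodes come after their element and before its children.
    for name, value in elem.attrib.items():
        yield ("attribute", f"{name}={value}")
    if elem.text:
        yield ("text", elem.text)
    for child in elem:
        yield from _walk(child)
        # ElementTree keeps the text that follows a child in child.tail.
        if child.tail:
            yield ("text", child.tail)

# ElementTree drops comments and PIs unless asked to keep them.
parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True, insert_pis=True))
root = ET.fromstring(
    '<bib><!--simple XML document-->'
    '<book price="25.00" pages="400">'
    '<publisher>IDG books</publisher><title>XML complete</title>'
    '</book></bib>',
    parser=parser,
)

nodes = list(document_order(root))
for kind, value in nodes:
    print(kind, value)
```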
Data Model Example
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="bib.xsl"?>
<!-- simple XML document -->
<bib>
  <book price="25.00" pages="400">
    <publisher>IDG books</publisher>
    <author>
      <first-name>Rick</first-name>
      <last-name>Hull</last-name>
    </author>
    <author>Simon North</author>
    <title>XML complete</title>
    <year>1997</year>
  </book>
  <book>
    <publisher>Freeman</publisher>
    <author>Jeffrey D. Ullman</author>
    <title>Principles of Database</title>
    <year>1998</year>
  </book>
</bib>
XPath Syntax
- The expression is the primary syntactic construct in XPath.
- An expression is evaluated to yield an object of one of 4 basic types:
- node-set (an unordered collection of nodes without duplicates)
- boolean (true or false)
- number (a double-precision floating-point number)
- string (a sequence of UCS characters)
- Expression evaluation occurs with respect to a context, which is specified by XSLT or XPointer (context node, context position and size, variable bindings, function library and namespace declarations).
- The location path is one important kind of expression.
- A location path selects a set of nodes relative to the context node.
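Evaluation relative to a context node can be tried with ElementTree. Its find/findall methods accept a limited subset of the abbreviated XPath syntax and evaluate it relative to the element they are called on. This is a minimal sketch that assumes a cut-down copy of the bib example and a hypothetical helper, years().

```python
import xml.etree.ElementTree as ET

bib = ET.fromstring(
    "<bib>"
    "<book><title>XML complete</title><year>1997</year></book>"
    "<book><title>Principles of Database</title><year>1998</year></book>"
    "</bib>"
)
first_book, second_book = bib.findall("book")

def years(context):
    # Evaluate the relative path "year" with `context` as the context node.
    return [y.text for y in context.findall("year")]

print(years(first_book))   # ['1997']
print(years(second_book))  # ['1998']
print(years(bib))          # [] (no year element is a child of bib)
print([y.text for y in bib.findall("book/year")])  # ['1997', '1998']
```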
Location Path
- A location path is the mechanism for addressing parts of an XML document, similar to file system addressing.
- Ex. /book/year (selects the year children of the document element, provided that element is a book)
- Every location path can be expressed using a straightforward but rather verbose syntax:
- unabbreviated (verbose) syntax
- Ex. child::* (selects all element children of the context node)
- abbreviated syntax
- Ex. * (equivalent to the unabbreviated example above)
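The two spellings of child::* can be compared in Python. In this sketch, the hypothetical helper child_elements writes out the child axis with the * node test by hand. findall("*") relies on ElementTree's support for the abbreviated form, since ElementTree does not accept unabbreviated axis syntax such as child::*.

```python
import xml.etree.ElementTree as ET

def child_elements(context):
    """child::* spelled out (hypothetical helper): the element children of the
    context node in document order. Comment/PI objects, whose tag is not a
    string in ElementTree, are skipped because * matches elements only."""
    return [node for node in context if isinstance(node.tag, str)]

book = ET.fromstring(
    "<book><publisher>IDG books</publisher>"
    "<title>XML complete</title><year>1997</year></book>"
)

verbose = [e.tag for e in child_elements(book)]   # child::*
abbreviated = [e.tag for e in book.findall("*")]  # *
print(verbose, verbose == abbreviated)
```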
Location Path (cont.)
- Two types of path: relative and absolute.
- A relative location path consists of a sequence of one or more location steps separated by /.
- An absolute location path consists of / optionally followed by a relative location path.
- Ex. child::bib/child::book (selects the book element children of the bib element children of the context node)
- Ex. / (selects the root node of the document containing the context node)
Location Path Examples
- Verbose syntax (syntactic abbreviations exist for common cases). Examples in unabbreviated form:
- child::book selects the book element children of the context node
- child::* selects all element children of the context node
- attribute::price selects the price attribute of the context node
- descendant::book selects all book descendants of the context node
- self::book selects the context node if it is a book element (otherwise selects nothing)
- child::*/child::book selects all book grandchildren of the context node
- / selects the document root (which is always the parent of the document element)
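Several of the unabbreviated examples can be mimicked with small Python functions over ElementTree. The helpers child, descendant, self_axis, attribute and path are hypothetical names for illustration, not a library API. attribute returns attribute values because ElementTree has no attribute nodes, and path simply chains steps the way / does. The document is a made-up variant of the bib example.

```python
import xml.etree.ElementTree as ET

# Hypothetical axis helpers; each maps one context element to a list of results.
def child(node, name="*"):
    return [c for c in node if name == "*" or c.tag == name]

def descendant(node, name="*"):
    # Element.iter() starts with the element itself, which descendant excludes.
    return [d for d in node.iter() if d is not node and (name == "*" or d.tag == name)]

def self_axis(node, name="*"):
    return [node] if name == "*" or node.tag == name else []

def attribute(node, name):
    # ElementTree has no attribute nodes, so the attribute value stands in.
    return [node.attrib[name]] if name in node.attrib else []

def path(context, *steps):
    """Apply steps left to right, as '/' does in a relative location path."""
    nodes = [context]
    for step in steps:
        nodes = [result for current in nodes for result in step(current)]
    return nodes

# A made-up variant of the bib example with one book nested one level deeper.
bib = ET.fromstring(
    '<bib><book price="25.00"><title>XML complete</title></book>'
    "<shelf><book><title>Principles of Database</title></book></shelf></bib>"
)

books = path(bib, lambda n: child(n, "book"))                 # child::book
all_books = path(bib, lambda n: descendant(n, "book"))        # descendant::book
grandchildren = path(bib, child, lambda n: child(n, "book"))  # child::*/child::book
print(len(books), len(all_books))                    # 1 2
print(grandchildren[0].find("title").text)           # Principles of Database
print(attribute(books[0], "price"))                  # ['25.00']
print(self_axis(books[0], "book") == books, self_axis(bib, "book"))  # True []
```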
Location Steps
- A location step has 3 parts:
- axis (specifies the tree relationship between the selected nodes and the context node)
- node test (specifies the node type and expanded-name of the selected nodes)
- predicates (arbitrary expressions that further refine the selected node-set)
- Syntax: the axis name and node test separated by a double colon, followed by zero or more expressions, each in square brackets, e.g. axis::node-test[predicate].
- A location step is evaluated by generating an initial node-set from the axis (relationship to the context node) and the node test (node type and expanded-name), then filtering that node-set by each of the predicates in turn.
- Ex. child::book[position()=1]: child is the name of the axis, book is the node test, and position()=1 is a predicate.
- Ex. descendant::book[position()=1] first selects all book element descendants of the context node, then keeps only the first of them.
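The two-phase evaluation of a location step can be sketched as follows: axis and node test first, then each predicate in turn. This is an illustrative model, not an XPath engine. Axes, node tests and predicates are plain Python callables with hypothetical names, and a predicate receives the context position and size explicitly. The document places one book inside a made-up shelf element, so the child and descendant versions of [position()=1] give different answers.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List

Axis = Callable[[ET.Element], List[ET.Element]]
NodeTest = Callable[[ET.Element], bool]
# A predicate sees (node, context position, context size), positions from 1.
Predicate = Callable[[ET.Element, int, int], bool]

def location_step(context: ET.Element, axis: Axis, node_test: NodeTest,
                  *predicates: Predicate) -> List[ET.Element]:
    # 1. Initial node-set: nodes on the axis that pass the node test.
    nodes = [n for n in axis(context) if node_test(n)]
    # 2. Filter by each predicate in turn; positions are renumbered against
    #    the node-set left by the previous predicate.
    for predicate in predicates:
        size = len(nodes)
        nodes = [n for pos, n in enumerate(nodes, start=1) if predicate(n, pos, size)]
    return nodes

def child_axis(node: ET.Element) -> List[ET.Element]:
    return list(node)

def descendant_axis(node: ET.Element) -> List[ET.Element]:
    return [d for d in node.iter() if d is not node]

def is_book(node: ET.Element) -> bool:  # node test "book"
    return node.tag == "book"

def position_is_1(node: ET.Element, pos: int, size: int) -> bool:  # [position()=1]
    return pos == 1

bib = ET.fromstring(
    "<bib><shelf><book><title>Principles of Database</title></book></shelf>"
    "<book><title>XML complete</title></book></bib>"
)

def titles(nodes):
    return [n.find("title").text for n in nodes]

print(titles(location_step(bib, child_axis, is_book, position_is_1)))       # ['XML complete']
print(titles(location_step(bib, descendant_axis, is_book, position_is_1)))  # ['Principles of Database']
```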
Location Steps (cont.)
- Axes
- 13 axes are defined in XPath:
- ancestor, ancestor-or-self
- attribute
- child
- descendant, descendant-or-self
- self
- following
- preceding
- following-sibling, preceding-sibling
- namespace
- parent
- Node test
- Identifies the type and expanded-name of a node.
- Can be a name, a wildcard (*) or a node-type test written like a function call, such as text() or node().
- Ex. child::text() selects the text node children of the context node.
- Ex. child::book selects the book element children of the context node.
- Ex. attribute::* selects all attributes of the context node.
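ElementTree elements do not record their parent, so the reverse axes need a parent map. In this sketch, the hypothetical helpers ancestor, preceding_sibling and following_sibling return reverse axes nearest node first. That is the order XPath uses when counting positions on a reverse axis. The XPath root node is not represented, so ancestor stops at the document element.

```python
import xml.etree.ElementTree as ET

def parent_map(root):
    # ElementTree elements do not know their parent, so record it once.
    return {child: parent for parent in root.iter() for child in parent}

def ancestor(node, parents):
    # Reverse axis: listed nearest first, which is the order position() uses.
    result = []
    while node in parents:
        node = parents[node]
        result.append(node)
    return result

def following_sibling(node, parents):
    if node not in parents:
        return []
    siblings = list(parents[node])
    return siblings[siblings.index(node) + 1:]

def preceding_sibling(node, parents):
    # Reverse axis: nearest sibling first.
    if node not in parents:
        return []
    siblings = list(parents[node])
    return siblings[:siblings.index(node)][::-1]

bib = ET.fromstring(
    "<bib><book><publisher/><author/><title/><year/></book></bib>")
parents = parent_map(bib)
title = bib.find("book/title")

print([e.tag for e in ancestor(title, parents)])           # ['book', 'bib']
print([e.tag for e in preceding_sibling(title, parents)])  # ['author', 'publisher']
print([e.tag for e in following_sibling(title, parents)])  # ['year']
# preceding-sibling::*[1] is therefore the nearest preceding sibling.
print(preceding_sibling(title, parents)[0].tag)            # author
```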
Location Step (cont.)
- Predicate
- A predicate filters a node-set with respect to an axis to produce a new node-set.
- It is an XPath expression (normally a boolean expression) in square brackets following the basis (axis and node test).
- Ex. child::book[attribute::price=25]
- (selects all book children of the context node that have a price attribute with value 25)
- A predicate [Expr] is evaluated by evaluating Expr and converting the result to a boolean (true or false). If the result is a number, it is true exactly when it equals the context position, so [3] means [position()=3]. Any other result is converted as if by the boolean() function.
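Two details of predicate evaluation are easy to miss. Comparing an attribute with a number converts the attribute's string-value to a number, and a predicate whose value is a number is true only at that context position. This is a minimal Python sketch that assumes two hypothetical helpers: to_number (a simplified number() for strings) and predicate_holds.

```python
import math
import re
import xml.etree.ElementTree as ET

# XPath 1.0 Number syntax, with optional minus sign and surrounding whitespace.
_NUMBER = re.compile(r"[ \t\r\n]*-?([0-9]+(\.[0-9]*)?|\.[0-9]+)[ \t\r\n]*")

def to_number(text):
    """XPath number() applied to a string: NaN unless it is a plain decimal."""
    return float(text) if _NUMBER.fullmatch(text) else math.nan

def predicate_holds(result, position):
    """How a predicate's value is turned into true/false."""
    if isinstance(result, bool):
        return result
    if isinstance(result, (int, float)):
        return result == position          # [3] means [position()=3]
    return len(result) > 0                 # string or node-set: non-empty

bib = ET.fromstring('<bib><book price="25.00"/><book price="30"/><book/></bib>')
books = bib.findall("book")

# child::book[attribute::price=25]: comparing with a number converts the
# attribute's string-value to a number, so "25.00" matches 25.
cheap = [b for pos, b in enumerate(books, start=1)
         if predicate_holds("price" in b.attrib and to_number(b.get("price")) == 25, pos)]

# child::book[2]: a numeric predicate is compared with the context position.
second = [b for pos, b in enumerate(books, start=1) if predicate_holds(2, pos)]

print([b.get("price") for b in cheap])    # ['25.00']
print([b.get("price") for b in second])   # ['30']
# ElementTree's own [@price='25'] is a plain string comparison, so it finds nothing.
print(bib.findall("book[@price='25']"))   # []
```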
Examples
- Axis and node test
- descendant::publisher
- (selects the publisher elements that are descendants of the context node)
- attribute::*
- (selects all attributes of the context node)
- Basis and predicate
- child::book[3]
- (selects the 3rd book child of the context node)
- child::*[self::author or self::year][position()=last()]
- (selects the last author or year child of the context node)
- child::book[attribute::pages=400][5]
- (selects the fifth book child of the context node that has a pages attribute with value 400)
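Predicates are applied one after another, so [attribute::pages=400][5] and [5][attribute::pages=400] select different nodes. The sketch below shows this with a hypothetical filter_by helper and made-up data of ten books. The pages values are compared numerically, as XPath would compare an attribute with the number 400.

```python
import xml.etree.ElementTree as ET

def filter_by(nodes, predicate):
    """Apply one predicate, numbering positions from 1 within `nodes`."""
    size = len(nodes)
    return [n for pos, n in enumerate(nodes, start=1) if predicate(n, pos, size)]

def has_400_pages(node, pos, size):   # [attribute::pages=400]
    return float(node.get("pages")) == 400

def fifth(node, pos, size):           # [5]
    return pos == 5

def is_last(node, pos, size):         # [position()=last()]
    return pos == size

# Hypothetical data: ten books; the odd-numbered ones have 400 pages.
bib = ET.fromstring("<bib>" + "".join(
    f'<book id="b{i}" pages="{400 if i % 2 else 250}"/>' for i in range(1, 11)
) + "</bib>")
books = bib.findall("book")

def ids(nodes):
    return [n.get("id") for n in nodes]

print(ids(filter_by(filter_by(books, has_400_pages), fifth)))  # ['b9']
print(ids(filter_by(filter_by(books, fifth), has_400_pages)))  # ['b5']
print(ids(filter_by(books, is_last)))                          # ['b10']
```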
Abbreviated Syntax
- Abbreviated syntax is a simpler way to express location paths.
- Abbreviations exist only for common cases, not for every case.
- Every abbreviated form can be converted to the unabbreviated one.
- child:: can be omitted from a location step (child is the default axis). Ex. bib/book is equivalent to child::bib/child::book
- attribute:: can be abbreviated to @. Ex. book[@price=25] is short for child::book[attribute::price=25]
- // is short for /descendant-or-self::node()/. Ex. book//author is short for child::book/descendant-or-self::node()/child::author
- A location step of . is short for self::node(). Ex. .//book is short for self::node()/descendant-or-self::node()/child::book
- A location step of .. is short for parent::node(). Ex. ../title is short for parent::node()/child::title
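ElementTree accepts a subset of the abbreviated syntax directly, which makes it a convenient way to experiment with the forms above. This sketch runs against a copy of the bib example. Note that ElementTree compares [@price='25.00'] as strings rather than numbers, and .. cannot step above the element that findall is called on.

```python
import xml.etree.ElementTree as ET

bib = ET.fromstring(
    '<bib>'
    '<book price="25.00" pages="400"><author><first-name>Rick</first-name></author>'
    '<author>Simon North</author><title>XML complete</title></book>'
    '<book><author>Jeffrey D. Ullman</author><title>Principles of Database</title></book>'
    '</bib>'
)

# book == child::book (child is the default axis)
print(len(bib.findall("book")))                   # 2
# book[@price='25.00'] == child::book[attribute::price='25.00']
print(len(bib.findall("book[@price='25.00']")))   # 1
# book//author == child::book/descendant-or-self::node()/child::author
print(len(bib.findall("book//author")))           # 3
# .//book == self::node()/descendant-or-self::node()/child::book
print(len(bib.findall(".//book")))                # 2
# book/title/.. uses .. (parent::node()) to step back up to each book
print([e.tag for e in bib.findall("book/title/..")])  # ['book', 'book']
```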
Expressions
- Function Calls
- Node-sets
- Booleans
- Numbers
- Strings
Function Calls
- A function call expression is evaluated by using the FunctionName to identify a function in the function library of the expression evaluation context.
- Each argument is converted to the type the function expects:
- to type string as if by calling the string() function,
- to type boolean as if by calling the boolean() function,
- to type number as if by calling the number() function.
- An argument that is not of type node-set cannot be converted to a node-set.
- Ex. the position() function returns the context position (the context node's position in the context node list) as a number.
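The three conversion rules can be sketched in Python. The functions xpath_string, xpath_number and xpath_boolean are hypothetical names. Node-sets are modelled as lists of ElementTree elements in document order and numbers as Python floats. Number formatting is simplified, because Python may print an exponent where XPath's string() would not.

```python
import math
import re
import xml.etree.ElementTree as ET

_NUMBER = re.compile(r"[ \t\r\n]*-?([0-9]+(\.[0-9]*)?|\.[0-9]+)[ \t\r\n]*")

def string_value(node: ET.Element) -> str:
    # String-value of an element: concatenation of all its descendant text.
    return "".join(node.itertext())

def xpath_string(value) -> str:
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, float):
        if math.isnan(value):
            return "NaN"
        if math.isinf(value):
            return "Infinity" if value > 0 else "-Infinity"
        if value == int(value):
            return str(int(value))     # 3.0 -> "3", -0.0 -> "0"
        return str(value)              # simplified formatting
    if isinstance(value, str):
        return value
    # Node-set: string-value of the first node, or "" if it is empty.
    return string_value(value[0]) if value else ""

def xpath_number(value) -> float:
    if isinstance(value, bool):
        return 1.0 if value else 0.0
    if isinstance(value, float):
        return value
    if isinstance(value, str):
        return float(value) if _NUMBER.fullmatch(value) else math.nan
    return xpath_number(xpath_string(value))  # node-set: via string()

def xpath_boolean(value) -> bool:
    if isinstance(value, bool):
        return value
    if isinstance(value, float):
        return not (value == 0 or math.isnan(value))
    return len(value) > 0              # string or node-set: non-empty

book = ET.fromstring("<book><title>XML complete</title><year> 1997 </year></book>")
print(xpath_number(book.findall("year")))      # 1997.0
print(xpath_boolean(book.findall("price")))    # False
print(xpath_string(2.0), xpath_string(True))   # 2 true
```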
Node-sets
- A location path can be used as an expression.
- The expression returns the set of nodes selected
by the path.
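A node-set never contains the same node twice. ElementTree's path evaluation already removes duplicates for .., and this can be contrasted with a hand-written step over a plain list. as_node_set is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

bib = ET.fromstring("<bib><book/><book/><book/></bib>")

# child::book/parent::node(): all three books share one parent, and a node-set
# has no duplicates, so the result holds the bib element once.
print([e.tag for e in bib.findall("book/..")])   # ['bib']

# Taking the same step by hand over a plain list keeps one entry per book.
parents = {child: parent for parent in bib.iter() for child in parent}
naive = [parents[book] for book in bib.findall("book")]
print(len(naive))                                # 3

def as_node_set(nodes):
    """Drop repeated nodes, keeping first occurrences (hypothetical helper)."""
    return list(dict.fromkeys(nodes))

print(as_node_set(naive) == [bib])               # True
```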
Booleans
- A boolean has one of two values: true or false.
- The following operators produce boolean values; or and and combine two boolean expressions according to the usual rules of boolean logic:
- or
- and
- =, !=
- <, <=, >, >=
- Ex. book = "XML complete" or book = "Principles of Database"
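When one operand of = or != is a node-set, XPath compares each node's string-value, and the result is true if any comparison succeeds. This sketch uses hypothetical helpers node_set_equals and node_set_not_equals. It also shows two surprising consequences: = and != can both be true, and both are false for an empty node-set.

```python
import xml.etree.ElementTree as ET

def string_value(node):
    return "".join(node.itertext())

def node_set_equals(nodes, s):
    """XPath node-set = string: true if SOME node's string-value equals s."""
    return any(string_value(n) == s for n in nodes)

def node_set_not_equals(nodes, s):
    """XPath node-set != string: true if SOME node's string-value differs."""
    return any(string_value(n) != s for n in nodes)

bib = ET.fromstring(
    "<bib><book><title>XML complete</title></book>"
    "<book><title>Principles of Database</title></book></bib>")
titles = bib.findall("book/title")

# book/title = "XML complete" or book/title = "Principles of Database"
print(node_set_equals(titles, "XML complete")
      or node_set_equals(titles, "Principles of Database"))                       # True
# = and != can both be true for the same node-set ...
print(node_set_equals(titles, "XML complete"), node_set_not_equals(titles, "XML complete"))  # True True
# ... and both are false for an empty node-set.
empty = bib.findall("magazine/title")
print(node_set_equals(empty, "x"), node_set_not_equals(empty, "x"))               # False False
```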
Numbers
- A number is a double-precision (64-bit IEEE 754) floating-point number; XPath has no separate integer type.
- The basic arithmetic operators are +, -, *, div and mod.
- Ex. @id div 10
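div and mod do not behave like Python's / and % operators. Division by zero yields Infinity or NaN rather than an error, and mod takes the sign of the dividend. This sketch uses hypothetical helpers xpath_div and xpath_mod, with math.fmod providing the truncating remainder.

```python
import math

def xpath_div(a: float, b: float) -> float:
    """XPath 'div': IEEE 754 division, so dividing by zero does not raise."""
    if b == 0:
        if a == 0 or math.isnan(a):
            return math.nan
        # The infinity's sign follows both operands (b may be -0.0).
        return math.copysign(math.inf, a) * math.copysign(1.0, b)
    return a / b

def xpath_mod(a: float, b: float) -> float:
    """XPath 'mod': remainder of a truncating division (sign of the dividend),
    which matches math.fmod rather than Python's % operator."""
    if b == 0 or math.isnan(a) or math.isnan(b) or math.isinf(a):
        return math.nan
    return math.fmod(a, b)

print(xpath_div(42.0, 10.0))                 # 4.2, e.g. @id div 10 with id="42"
print(xpath_mod(-5.0, 2.0), -5.0 % 2.0)      # -1.0 1.0
print(xpath_div(1.0, 0.0), xpath_div(0.0, 0.0))  # inf nan
```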
Strings
- A string is a sequence of zero or more characters.
- A string literal may be enclosed in either single or double quotes.
- Comparison operators: =, !=
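XPath 1.0 string literals have no escape mechanism, so a literal cannot contain the quote character that encloses it. This sketch shows a hypothetical helper, xpath_literal, that picks a suitable quote character and falls back to concat() when the text contains both kinds.

```python
import xml.etree.ElementTree as ET

def xpath_literal(text: str) -> str:
    """Quote text as an XPath 1.0 string literal (hypothetical helper)."""
    if '"' not in text:
        return f'"{text}"'
    if "'" not in text:
        return f"'{text}'"
    # Both quote kinds present: split on " and rebuild the text with concat(),
    # supplying each double quote as the single-quoted literal '"'.
    pieces = []
    for i, part in enumerate(text.split('"')):
        if i > 0:
            pieces.append("'\"'")
        if part:
            pieces.append(f'"{part}"')
    return "concat(" + ", ".join(pieces) + ")"

print(xpath_literal("XML complete"))   # "XML complete"
print(xpath_literal('say "hi"'))       # 'say "hi"'
print(xpath_literal('It\'s "done"'))   # concat("It's ", '"', "done", '"')

# The quoted form can be used in an ElementTree [tag='text'] predicate.
bib = ET.fromstring("<bib><book><title>XML complete</title></book></bib>")
print(len(bib.findall(f"book[title={xpath_literal('XML complete')}]")))  # 1
```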
Core Function Library
- XPath defines a core set of functions for use in expressions.
- All XPath implementations must implement the core function library.
- Four types of function:
- Node-set functions operate on, or return information about, node-sets.
- String functions are used for basic string operations.
- Ex. substring("12345", 0, 3) returns "12"
- Boolean functions all return true or false.
- Number functions are used for basic number operations.
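substring() counts characters from 1 and rounds its arguments, which is why substring("12345", 0, 3) returns "12". This sketch shows the rule in Python. The hypothetical xpath_round helper rounds halves towards positive infinity, as XPath's round() does.

```python
import math
from typing import Optional

def xpath_round(x: float) -> float:
    """XPath round(): nearest integer, halves rounded towards +infinity."""
    if math.isnan(x) or math.isinf(x):
        return x
    return float(math.floor(x + 0.5))

def substring(s: str, start: float, length: Optional[float] = None) -> str:
    """XPath 1.0 substring(): keeps the characters whose 1-based position p
    satisfies round(start) <= p < round(start) + round(length)."""
    first = xpath_round(start)
    if length is None:
        return "".join(ch for p, ch in enumerate(s, start=1) if p >= first)
    end = first + xpath_round(length)  # NaN if an operand is NaN, or for -inf + inf
    return "".join(ch for p, ch in enumerate(s, start=1) if first <= p < end)

print(substring("12345", 0, 3))       # 12
print(substring("12345", 1.5, 2.6))   # 234
print(substring("12345", 2))          # 2345
```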
XPath Utilities
- Miscellaneous utilities related to XPath:
- http://www.xmlsoftware.com/xpath/
- XPath Visualiser
- A tool that evaluates an XPath expression and presents the resulting node-set visually.
- Lets you experiment with XPath to find the correct expression.
- Displays the XML source document much like Internet Explorer's default view, with the same syntax colouring and collapsible/expandable container nodes.
- Makes learning XPath very straightforward.
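XPath Visualiser
[Screenshot of XPath Visualiser, with callouts for the context node, the XPath input field, the tree view of the XML document and the evaluation result, in which the selected nodes are highlighted.]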
Conclusion
- XPath is a compact expression language for pattern matching, i.e. selecting parts of XML documents.
- Provides a concise way to address parts of an XML document.
- Forms the basis for XSLT, XPointer and the work of the XML Query WG; it is a W3C Recommendation.
- Using XPath basically requires learning the abbreviated syntax of location path expressions and the functions of the core library.