Title: Introduction to XSLT
1Introduction to XSLT
Evan Lenz, XML/XSLT Consultant
http://xmlportfolio.com
evan@evanlenz.net
August 2, 2005
O'Reilly Open Source Convention, August 1-5, 2005
2Who is this guy?
- Evan Lenz
- Majored in music
- Over 5 years ago, read Michael Kay's XSLT Programmer's Reference cover-to-cover while sitting by his newborn son's hospital bed
- Participated on the XSL Working Group for a couple years
- Wrote XSLT 1.0 Pocket Reference (due out this month)
- Preparing for entrance to a Ph.D. program in Digital Arts and Experimental Media
3Why does he like XSLT?
- XSLT is
- Powerful
- Small
- Beautiful
- In high demand
- Fun to learn
- Fun to teach
4What should I expect this afternoon?
- Fasten your seatbelts
- A variety of interactive exercises and traditional presentation
- Feel free to feel overwhelmed
- You're learning more than you think!
- Try your best while you're here and it will be time well spent
- Have fun!
5What's with the handouts?
- The big handout is a late-stage draft of XSLT 1.0 Pocket Reference, due out this month
- If you would like a complimentary copy of the final book, put your name and mailing address on the sign-up sheet
- The smaller handout contains exercises that we will be using today
6XSLT from 30,000 feet
7What is XSLT?
- XSL Transformations
- A language for transforming XML documents into other XML documents
- W3C Recommendation
- http://www.w3.org/TR/xslt
- Version 1.0: 1999-11-16
8OK, then what is XSL?
- Extensible Stylesheet Language
- A language for expressing stylesheets
- W3C Recommendation
- http://www.w3.org/TR/xsl
- Version 1.0: 2001-10-15
- Has 2 parts
- XSLT
- Refactored out of XSL so that it could proceed independently
- XSL-FO
- Formatting Objects
9What is XPath?
- XML Path Language
- A language for addressing parts of an XML document
- W3C Recommendation
- http://www.w3.org/TR/xpath
- Version 1.0: 1999-11-16
- Released on the same day as XSLT 1.0
- The expression language used in XSLT
10A relationship of subsets
- XPath is part of XSLT
- XSLT is part of XSL
- Today we are concerned only with the inner two circles
- XSLT and XPath
- XSL, a.k.a. XSL-FO, is out of scope for today
11What is XSLT used for?
- Common applications
- Stylesheets for converting XML to HTML
- Generating Web pages or whole websites
- DocBook -> HTML
- Transformations from one document type to another
- *ML to *ML: as many potential applications as there are XML document types
- RSS, SVG, UBL, LegalXML, HrXML, XBRL
- Office applications
- SpreadsheetML, WordML, Keynote XML, OOo XML, PowerPoint (in next version), Access XML, etc.
- Extracting data from documents
- Modifying or fixing up documents
12Where is XSLT used?
- Every platform
- Windows, Linux, Mac, UNIX, Java
- Many browsers support XSLT natively
- Firefox/Mozilla, Internet Explorer, Safari
- Many frameworks use or support XSLT
- .NET, Java, LAMP
- PHP5 now uses libxslt
- Cocoon, 4Suite, Amazon web services, Google appliance, Cisco routers, etc., etc.
- XSLT IS EVERYWHERE!!
13Interoperable implementations?
- In terms of interoperability, XSLT is unmatched among languages having multiple implementations
- Java
- Saxon: http://saxon.sf.net (open-source)
- Xalan-J: http://xml.apache.org/xalan-j/ (open-source)
- Windows
- MSXML: fast, fully conformant
- Python
- 4xslt: http://www.4suite.org (open-source)
- C
- libxslt: http://xmlsoft.org (open-source; used in Firefox, Safari, PHP5, etc.)
- Xalan-C: http://xml.apache.org/xalan-c/ (open-source)
14Enough already, let's see some code!
15Example XML file
- INPUT: names.xml
  <people>
    <person>
      <givenName>Joe</givenName>
      <familyName>Johnson</familyName>
    </person>
    <person>
      <givenName>Jane</givenName>
      <familyName>Johnson</familyName>
    </person>
    <person>
      <givenName>Jim</givenName>
      <familyName>Johannson</familyName>
    </person>
    <person>
      <givenName>Jody</givenName>
      <familyName>Johannson</familyName>
    </person>
  </people>
16A very simple stylesheet, names.xsl
17OUTPUT: the result of the transformation
saxon names.xml names.xsl >names.html
18Or we could open the XML directly in the browser
- Oops, we must first add a processing instruction (PI) to the top, like this
  <?xml-stylesheet type="text/xsl" href="names.xsl"?>
  <people>
    <!-- ... -->
  </people>
19That's better. Displays as HTML but viewing
source shows it's just XML.
20One more example for now
<?xml-stylesheet type="text/xsl" href="article.xsl"?>
<article>
  <heading>This is a short article</heading>
  <para>This is the <emphasis>first</emphasis> paragraph.</para>
  <para>This is the <strong>second</strong> paragraph.</para>
</article>
21A rule-oriented stylesheet
22A rule-oriented stylesheet, cont.
23OUTPUT article.xml transformed to HTML
24See a pattern here?
25XPath in a nutshell
26How XPath fits in XSLT
- XPath expressions appear in attribute values, e.g.
- <xsl:for-each select="/people/person"/>
- <xsl:value-of select="givenName"/>
- <xsl:apply-templates select="/article/para"/>
- What these mean
- /people/person
- Select all person child elements of all people child elements of the root node
- givenName
- Select all givenName child elements of the context node
- /article/para
- Select all para child elements of all article child elements of the root node
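The selections above can be tried outside XSLT. What follows is only a minimal sketch, assuming Python's standard xml.etree.ElementTree module. That module implements just a subset of XPath 1.0 and has no root node, so /people/person is approximated by evaluating person with the people document element as the context node. The data is the names.xml example from earlier, compacted.

```python
import xml.etree.ElementTree as ET

NAMES_XML = """
<people>
  <person><givenName>Joe</givenName><familyName>Johnson</familyName></person>
  <person><givenName>Jane</givenName><familyName>Johnson</familyName></person>
  <person><givenName>Jim</givenName><familyName>Johannson</familyName></person>
  <person><givenName>Jody</givenName><familyName>Johannson</familyName></person>
</people>
"""

# fromstring() returns the document element (people), not an XPath root node
people = ET.fromstring(NAMES_XML)

# Rough analogue of /people/person: person children of the people element
persons = people.findall("person")

# Analogue of the relative path givenName, evaluated once per person,
# each time with that person as the context node
given_names = [person.findtext("givenName") for person in persons]
print(given_names)
```

The same relative path (givenName) yields a different result for each person, which is the point of the context node.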
27The skinny on XPath
- XPath is an expression language
- The only thing you can do with XPath is write expressions
- When we say expression, we mean XPath expression
- Every expression returns a value
- XPath 1.0 has just four data types
- Node-set (the most important)
- String
- Number
- Boolean
- All expressions are evaluated in a context
- Understanding context is crucial to understanding
XPath
28Path expressions
- Expressions that return node-sets are sometimes called path expressions
- A node-set is
- An unordered collection of zero or more nodes
- Every expression is evaluated relative to exactly one context node
- The context node is analogous to the current directory in a filesystem
- On a CLI, dir/* expands to all the files in the dir directory inside the current directory
- As an XPath expression, dir/* would select all the element children of all the dir element children of the context node
29A filesystem analogy
- Addressing files
- Relative
- dir/*
- ../file
- Absolute
- /home/elenz/file.txt
- Addressing XML nodes
- Relative
- body/p
- ../table
- Absolute
- /html/body/p
30QUIZ 1: You have 5 minutes
31Go! Use this cheat sheet
- para selects the para element children of the context node
- * selects all element children of the context node
- node() selects all children of the context node
- @name selects the name attribute of the context node
- @* selects all the attributes of the context node
- para[1] selects the first para child of the context node
- para[last()] selects the last para child of the context node
- */para selects all para grandchildren of the context node
- /doc/chapter[5]/section[2] selects the second section of the fifth chapter of the doc
- chapter//para selects the para element descendants of the chapter element children of the context node
- //para selects all the para descendants of the document root and thus selects all para elements in the same document as the context node
- . selects the context node
- .//para selects the para element descendants of the context node
- .. selects the parent of the context node
- title
- ../@lang selects the lang attribute of the parent of the context node
32XPath is all about trees
- A venture into the abstract world of the XPath data model
- Start filling out the NOTES page
33The XPath data model
- An abstraction of an XML document, after parsing
- In XSLT, models the source tree, stylesheet tree, result tree
- An XML document is a tree of nodes
- There are 7 kinds of nodes (memorize these!)
- Root node
- Element node
- Attribute node
- Text node
- Comment node
- Processing Instruction (PI) node
- Namespace node
34Root nodes
- Every XML document has exactly one root node
- An invisible container for the whole document
- The XPath expression / selects the root node of the same document as the context node
- The root node is not an element
- Instead, the document element or root element is a child of the root node
- It can also contain
- Processing instruction (PI) nodes
- Comment nodes
- XSLT extension to XPath data model
- Root node may contain text nodes
- Root node may contain more than one element node
35Element nodes
- There is one element node for each element that appears in a document. (Duh.)
- Example: <foo><bar/></foo>
- There are two element nodes above: foo and bar.
- The foo element contains the bar element node.
- Element nodes can contain
- Text nodes
- Other element nodes
- Comment nodes
- Processing instruction (PI) nodes
36Node property: children
- Applies only to
- Element nodes
- Root nodes
- Consists of
- Ordered list of zero or more other nodes
- 4 kinds of nodes can be children (memorize this subset!)
- Element nodes
- Text nodes
- Comment nodes
- Processing instruction (PI) nodes
- Instead of "Lions, Tigers, and Bears, Oh My," chant
- "Elements, comments, text, PIs! Elements, comments, text, PIs!"
- Example: <foo><bar/> <!-- hi --> </foo>
- The foo element's children consist of four nodes, in order
- 1) element, 2) text, 3) comment, 4) text
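The four children in that example can be listed programmatically. This is a small sketch, assuming Python's standard xml.dom.minidom module. The DOM is a different model from the XPath data model, but for this document the two agree on the children of foo. The KIND lookup table is an illustrative helper, not part of any API.

```python
from xml.dom import minidom, Node

doc = minidom.parseString("<foo><bar/> <!-- hi --> </foo>")
foo = doc.documentElement

# Illustrative mapping from DOM node types to the names used on the slide
KIND = {
    Node.ELEMENT_NODE: "element",
    Node.TEXT_NODE: "text",
    Node.COMMENT_NODE: "comment",
    Node.PROCESSING_INSTRUCTION_NODE: "PI",
}

# childNodes is an ordered list, matching the "ordered list" children property
child_kinds = [KIND[child.nodeType] for child in foo.childNodes]
print(child_kinds)
```

The two text children are the single spaces on either side of the comment.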
37Why should I memorize that subset of four?
- Knowing what types of nodes can be children is crucial to understanding what this little, unassuming instruction does (as we shall see)
- <xsl:apply-templates/>
- So remember
- Elements, comments, text, PIs!
- Elements, comments, text, PIs!
38How to access the children
- Use the child axis, e.g. (in non-abbreviated form)
- child::node()
- Selects all children of the context node
- child::*
- Selects all child elements of the context node
- child::paragraph
- Selects all child elements named paragraph
- child::xyz:foo
- Selects all child elements named foo in the namespace designated by the xyz prefix
- child::xyz:*
- Selects all child elements that are in the namespace designated by the xyz prefix
39Attribute nodes
- There is one attribute node for each attribute that appears in a document. (Duh again.)
- Example: <foo bar="bat" bang="baz"/>
- There are two attribute nodes in the above example
- bar and bang
40Node property: attributes
- Applies only to
- Element nodes
- Consists of
- Unordered list of zero or more attribute nodes
- For example
- <doc lang="en"/>
- The doc element's attributes property consists of
one lang attribute
41How to select attributes
- Use the attribute axis, e.g. (in abbreviated form)
- @lang
- Selects the attribute named lang
- @* or @node()
- Selects all attributes of the context node
- @abc:foo
- Selects the attribute named foo in the namespace designated by the abc prefix
- @abc:*
- Selects all attributes that are in the namespace designated by the abc prefix
42Text nodes
- There is one text node for each contiguous sequence of character data in a document
- Text nodes are never adjacent siblings to each other
- Adjacent text nodes are always automatically merged into one text node (e.g., when creating the result tree in XSLT)
- Lexical details are thrown away
- The XPath data model knows nothing about
- CDATA sections, entity references, or character references
- Example: <foo>&lt;</foo>
- There is one text node in the above document (a < character)
- Example: <foo><![CDATA[<]]></foo>
- Identical to the first example, as far as XPath is concerned
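That lexical details disappear after parsing can be checked with any XML parser. A minimal sketch, assuming Python's standard xml.etree.ElementTree module (not an XPath data model implementation, but it discards the same lexical details):

```python
import xml.etree.ElementTree as ET

# One document spells the character with an entity reference,
# the other wraps it in a CDATA section
with_entity = ET.fromstring("<foo>&lt;</foo>")
with_cdata = ET.fromstring("<foo><![CDATA[<]]></foo>")

# After parsing, both hold the same single character of text
print(repr(with_entity.text), repr(with_cdata.text))
```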
43Text node quiz
<foo>
  <bar>Hello world.</bar>
</foo>
- How many text nodes are in the above document?
44Text node quiz: ANSWER
<foo>
  <bar>Hello world.</bar>
</foo>
- How many text nodes in the above document?
- ANSWER: 3
- 1: Linefeed, space, space
- 2: Hello world.
- 3: Linefeed
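The quiz answer can be reproduced by walking the parsed tree. This is a sketch, assuming Python's standard xml.dom.minidom module and assuming the quiz document is laid out with a linefeed and two spaces before bar and a linefeed after it, as the answer describes. The text_nodes helper is hypothetical, written only for this illustration.

```python
from xml.dom import minidom, Node

# Assumed layout of the quiz document, reconstructed from the answer
QUIZ_XML = "<foo>\n  <bar>Hello world.</bar>\n</foo>"

def text_nodes(node):
    """Collect the data of every text node below `node`, in document order."""
    found = []
    for child in node.childNodes:
        if child.nodeType == Node.TEXT_NODE:
            found.append(child.data)
        else:
            found.extend(text_nodes(child))
    return found

texts = text_nodes(minidom.parseString(QUIZ_XML))
print(len(texts), texts)
```

Whitespace-only text between elements counts as text nodes, which is why the answer is 3 and not 1.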
45How to select text nodes
- Use the text() node test
- text()
- Short for child::text()
- descendant::text()
- Selects all text nodes that are descendants of
the context node
46Comment nodes
- There is one comment node for each comment
- Example
- <!--This is a comment node-->
47How to select comments
- Use the comment() node test on the child axis
- comment()
- Short for child::comment()
48Processing instruction (PI) nodes
- There is one PI node for each PI
- The XML declaration is not a PI
- <?xml version="1.0"?> is not a PI
- (It's not a node at all but just a lexical detail that XPath knows nothing about.)
- Example
- (This is a PI.)
- <?xml-stylesheet type="text/xsl" href="a.xsl"?>
49How to select processing instructions
- Use the processing-instruction() node test
- Any PI
- processing-instruction()
- Selects all PI children of the context node
- Short for child::processing-instruction()
- PI with a specific target
- processing-instruction('xml-stylesheet')
- Selects all xml-stylesheet processing instruction
children of the context node
50Namespace nodes
- There is one namespace node for each in-scope namespace URI/prefix binding for each element in a document. (No duh... er... what?)
- Always includes this (implicit) binding (used by reserved attributes xml:lang and xml:space, etc.)
- Prefix: xml
- URI: http://www.w3.org/XML/1998/namespace
- Example: <foo/>
- There is one namespace node in the above document
- Example: <foo xmlns="http://example.com"/>
- There are two namespace nodes in the above document
- The implicit xml one (see above)
- And this one
- Prefix: "" (empty string, i.e. the default namespace)
- URI: http://example.com
51Node property: namespace nodes
- As with the attributes property, applies only to
- Element nodes
- Consists of
- Unordered list of zero or more namespace nodes
- For example
- <foo xmlns:xyz="http://example.com"/>
- The foo element's namespace nodes property
consists of two namespace nodes (one for xyz and
one for xml)
52How to select namespace nodes
- Use the namespace axis
- namespace::*
- Selects all of the context node's namespace nodes
- namespace::node()
- Same as above
- namespace::xyz
- Selects the context node's namespace node that declares the xyz prefix.
53Node property: parent
- Applies to
- All node types except root node
- Element nodes
- Text nodes
- Comment nodes
- Processing instruction (PI) nodes
- Attribute nodes
- Namespace nodes
- Consists of
- Exactly one other node
- Root node, or
- Element node
- Example: <foo bar="bat"/>
- The bar attribute's parent is the foo element
54How to access the parent node
- Use the parent axis
- ..
- Selects the parent node of the context node
- Short for parent::node()
- parent::doc
- Selects the parent node of the context node provided that it is an element named doc
- (otherwise returns an empty node-set)
- parent::*
- Selects the parent node of the context node provided that it is an element
55A riddle
- You have a parent but you are not a child.
- What are you?
56A riddle, cont.
- You have a parent but you are not a child.
- What are you?
- Hint
- Only 4 node types are children, but 6 node types have parents
- 6 - 4 = 2 ...
57The answer
- You have a parent but you are not a child.
- What are you?
- Hint
- 4 node types are children, but 6 node types have parents
- 6 - 4 = 2 ...
- ANSWER
- A namespace node or an attribute node of course!
- Embracing the asymmetry and moving on...
58Derived node relationships
- A node's descendants consist of the transitive closure of the children property
- A fancy way of saying
- My children and my grandchildren and my great grandchildren and their kids and so on
- A node's ancestors consist of the transitive closure of the parent property
- A fancy way of saying
- My parent and my grandparent and my great grandparent and its parent and so on
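"Transitive closure" can be spelled out in a few lines of code. A minimal sketch, assuming Python's standard xml.dom.minidom module; the descendants and ancestors helpers are hypothetical names for this illustration, and the DOM document node stands in for the XPath root node.

```python
from xml.dom import minidom

def descendants(node):
    """Children, then their children, and so on, in document order."""
    result = []
    for child in node.childNodes:
        result.append(child)
        result.extend(descendants(child))
    return result

def ancestors(node):
    """Parent, then its parent, and so on, nearest ancestor first."""
    result = []
    while node.parentNode is not None:
        node = node.parentNode
        result.append(node)
    return result

doc = minidom.parseString("<a><b><c/></b></a>")
c = doc.getElementsByTagName("c")[0]

print([n.nodeName for n in descendants(doc)])
print([n.nodeName for n in ancestors(c)])
```

The ancestor list ends with the document node, which has no parent of its own.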
59Shooting blanks is okay
- QUIZ: How many nodes will each of the following expressions return?
- parent::comment()
- attribute::text()
- ancestor::processing-instruction()
- namespace::xyz:*
60Shooting blanks is okay
- QUIZ: How many nodes will each of the following expressions return?
- parent::comment()
- attribute::text()
- ancestor::processing-instruction()
- namespace::xyz:*
- ANSWER: 0, by definition
- These expressions are perfectly legal; they're just guaranteed to return empty
61Node property: string-value
- Applicable to
- All node types
- Root: concatenation of all descendant text node string-values
- Element: concatenation of all descendant text node string-values
- Attribute: normalized attribute value
- Text: character data (always at least one character)
- Comment: the content of the comment
- PI: text following the PI target and whitespace
- e.g., type="text/xsl" href="style.xsl" is the string-value of an example stylesheet PI
- Namespace node: the namespace URI
- Use <xsl:value-of/> to insert the string-value of a node into the result tree
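The string-value of an element is easy to imitate. A minimal sketch, assuming Python's standard xml.etree.ElementTree module; string_value is a hypothetical helper, and the para element is borrowed from the article example earlier in the deck.

```python
import xml.etree.ElementTree as ET

para = ET.fromstring(
    "<para>This is the <emphasis>first</emphasis> paragraph.</para>"
)

def string_value(element):
    """Concatenate all descendant text in document order."""
    return "".join(element.itertext())

print(string_value(para))
```

The markup of the nested emphasis element disappears; only its text contributes to the result.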
62Node property: expanded-name
- Applicable to
- Elements and attributes
- Local part: local name of node, returned by local-name()
- URI part: namespace name (URI) of node, returned by namespace-uri()
- PIs
- Local part: the PI target, e.g., xml-stylesheet
- URI part: (always null)
- Namespace nodes
- Local part: the namespace prefix, e.g., xml or xyz or empty string ("") in the case of a default namespace
- URI part: (always null)
- Root, text, and comment nodes do not have names
63Document order
- There is an ordering for all nodes in a document called document order
- The root node is always the first node in a document
- The rest are ordered according to where their XML representation begins
- Except that the relative order of attributes and namespace nodes on the same element is implementation-defined
- Why should I care about document order?
- Because it's the default order in which nodes are processed by both <xsl:for-each> and <xsl:apply-templates>
64Quiz: counting nodes
- How many nodes are in the following XML document?
65Answer
- How many nodes are in the following XML document?
- 15!
66The first 14 nodes in the QUIZ 1 example
67Quiz review
- para
- Short for child::para
- *
- Short for child::*
- node()
- Short for child::node()
- @name
- Short for attribute::name
- @*
- Short for attribute::*
68Quiz review
- para[1]
- Short for child::para[1]
- Equivalent to child::para[position() = 1]
- para[last()]
- Short for child::para[last()]
- Equivalent to child::para[position() = last()]
- */para
- Short for child::*/child::para
- /doc/chapter[5]/section[2]
- Short for /child::doc/child::chapter[5]/child::section[2]
69Quiz review
- chapter//para
- Short for
- child::chapter/descendant-or-self::node()/child::para
- //para
- Short for
- /descendant-or-self::node()/para
- .
- Short for
- self::node()
- .//para
- Short for
- self::node()/descendant-or-self::node()/child::para
70Quiz review
- ..
- Short for parent::node()
- title
- Short for child::title
- ../@lang
- Short for parent::node()/attribute::lang
71Summary of abbreviations
- XPath has five abbreviations. They are
- . is short for self::node()
- .. is short for parent::node()
- @ is short for attribute::
- // is short for /descendant-or-self::node()/
- foo is short for child::foo
72XPath, the language
- Descending from the clouds
- Keep filling out that NOTES page
73XPath basics review
- XPath is an expression language
- Every expression returns a value
- XPath 1.0 has just four data types (write these down!)
- Node-set (the most important)
- String
- Number
- Boolean
- All expressions are evaluated in a context
- Understanding context is crucial to understanding
XPath
74XPath context
- All XPath expressions (whether in XSLT or not) are evaluated in a context
- The context consists of 6 parts
- The context node
- The context size (an integer 1 or higher)
- Returned by the last() function
- The context position (an integer 1 or higher)
- Returned by the position() function
- A set of namespace/prefix declarations in scope for the expression
- Used to evaluate QNames in the expression, e.g., xyz:foo/xyz:bar
- A set of variable bindings
- A function library
75XPath context, cont.
- The context comprises the entire world for an XPath expression, so to speak.
- Other than its context, there is no input to an XPath expression. It consists of everything outside the expression itself that may affect the resulting value of the expression.
- The context indicates
- Where you are
- Where in the tree
- Context node
- Where in processing
- Context size (the size of an arbitrary list of nodes being processed)
- Context position (the position of the current node in that list)
- What is available to you
- What variables you can reference
- What namespace prefixes you can use
- What functions you can call
76XPath syntax overview
- XPath supports these kinds of expressions
- Variable references
- $foo, $bar, etc.
- Function calls
- starts-with($str,'a')
- true()
- round($num)
- Parenthesized expressions
- (//para)
- ($foo | $bar)
- String literals
- "foo", 'bar', etc.
- Numbers
- 13, 24.7, .007, etc.
- cont...
77XPath syntax overview, cont.
- Node-set expressions
- /html/body/p[2]/text()
- //@person | //person
- (.//note | para/fnote)[1]
- $ns[@id='xyz']
- Arithmetic expressions
- (($x - 5) * 2) div -3
- $pos mod 2
- Boolean expressions
- $is-good and $is-valid
- $x > 4
- position() != last()
78Node-set expressions
- A node-set is
- An unordered collection of zero or more nodes
- Node-set expressions include
- Location paths (the most important kind of expression!)
- foo/bar[3]
- Union expressions (union of two node-set expressions using the | operator)
- $set | foo/bar[3]
- Filtered expressions (a predicate applied to any expression using the predicate operator [])
- $set[.='good']
- Path expressions (any expression composed with a location path using the / or // operators)
- $set//bar
79Location paths: a formal definition
- A location path
- Is the most important kind of XPath expression
- Returns a node-set
- Can be absolute or relative
- Relative
- One or more steps separated by /
- foo
- foo/bar
- Absolute
- /
- Selects the root node of the document that contains the context node
- The only location path that doesn't have any steps in it
- / followed by a relative location path
- /foo
- /foo/bar
80Location path steps
- A location path step has 3 parts
- An axis specifier
- A node test
- Zero or more predicates
- For example: child::paragraph[string-length(.) > 100]
- The above is equivalent to this abbreviated form
- paragraph[string-length(.) > 100]
- (because child is the default axis)
- It selects each paragraph child whose string-value is greater than 100 characters in length
81How a step is evaluated
- Moving from left to right
- The axis identifies a set of nodes relative to the context node.
- The node test acts as a filter on that set.
- Each of any number of optional predicates in turn acts as a filter on the set identified by the preceding predicates and node test to its left.
- For example
- child::paragraph[string-length(.)>100]
- The child axis identifies all the children of the context node.
- Among those, the paragraph node test selects only the elements named paragraph.
- Among those, the string-length(.)>100 predicate filters out all but the nodes whose string-value is greater than 100 characters long.
82The axis
- That's this part
- Can be any one of 13 axes
- child
- self
- parent
- descendant
- descendant-or-self
- ancestor
- ancestor-or-self
- following
- following-sibling
- preceding
- preceding-sibling
- attribute
- namespace
83The 13 XPath axes
- What each axis contains
- child
- The children of the context node.
- descendant
- The descendants of the context node (children, children's children, etc.).
- parent
- The parent of the context node (empty if context node is root node).
- ancestor
- The ancestors of the context node (parent, parent's parent, etc.).
84The 13 XPath axes, cont.
- What each axis contains, cont.
- attribute
- The attributes of the context node (empty if context node is not an element).
- namespace
- The namespace nodes of the context node (empty if context node is not an element).
- self
- Just the context node itself.
- descendant-or-self
- The context node and descendants of the context node.
- ancestor-or-self
- The context node and ancestors of the context node.
85The 13 XPath axes, cont.
- What each axis contains, cont.
- following-sibling
- All nodes with the same parent as the context node that come after the context node in document order (empty if context node is an attribute or namespace node).
- preceding-sibling
- All nodes with the same parent as the context node that come before the context node in document order (excluding attributes and namespace nodes).
- following
- All nodes after the context node in document order, excluding descendants, attributes, and namespace nodes.
- preceding
- All nodes before the context node in document order, excluding ancestors, attributes, and namespace nodes.
86A little observation about siblings
- The types of nodes that can be siblings are the same as the types of nodes that can be children
- Elements, comments, text, PIs!
- That's because attributes and namespace nodes are by definition not siblings to anyone, not even to each other. They're just attached to their parent.
87The node test
- That's this part
- A node test is a filter on an axis
- There are two kinds of node test
- Node type tests
- Any node: node()
- Specific node type: text(), comment(), processing-instruction()
- Specific PI target: processing-instruction('foo')
- Name tests
- Wildcard (any name): *
- Namespace-qualified wildcard (any local name within a particular namespace): xyz:*, abc:*, etc.
- QName (a specific expanded-name): foo, xyz:foo, the highlighted example, etc.
- Node type tests are not functions
- They're special forms that only happen to look like functions
88Node type tests
- The most inclusive node test: node()
- It includes every node regardless of its name or node type
- Thus, it effectively selects all nodes on the given axis
- child::node() selects all children
- ancestor::node() selects all ancestors
- etc.
- Specific node type
- text(), comment(), processing-instruction()
- Selects only the nodes of the given type from the given axis
- descendant::text() selects all descendant text nodes
- following::comment() selects all following comment nodes
- preceding::processing-instruction() selects all preceding PI nodes
- Specific PI target
- child::processing-instruction('xml-stylesheet')
89Name tests
- Name tests only select nodes of one type at a time, depending on the axis
- This is called the principal node type for an axis
- Attributes while on the attribute axis
- Namespace nodes while on the namespace axis
- Element nodes while on every other axis
- The wildcard: *
- Selects all nodes of the principal node type
- child::* selects all element nodes on the child axis
- attribute::* selects all attribute nodes on the attribute axis
- effectively no different than attribute::node() because the attribute axis can only ever contain attribute nodes
90Name tests, cont.
- Namespace-qualified wildcards: xyz:*
- Selects all nodes of the principal node type whose expanded-name has a particular URI part
- child::xyz:* selects all element nodes on the child axis that are in the namespace designated by the xyz prefix
- QNames: foo, xyz:foo, etc.
- Selects all nodes of the principal node type whose expanded-name has a particular local part and a particular URI part
- child::foo selects all element nodes on the child axis that have local name foo and that are not in a namespace
- ancestor::xyz:foo selects all element nodes on the ancestor axis that have local name foo and that are in the namespace designated by the xyz prefix
91How multiple steps are evaluated
- The rightmost step indicates what nodes are returned
- table/tr/td
- The above location path returns a node-set of zero or more td elements
- /doc/section[5]/para[text() and @*]
- What does the above location path return?
- Each step is evaluated once for each node returned by the step to its left, using that node as the context node for the evaluation
- The result is the union of the node-sets returned by all the evaluations of the rightmost step
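The step-by-step rule can be written out by hand and compared with a library's own path evaluation. This is a sketch only, assuming Python's standard xml.etree.ElementTree module and steps that are plain child-axis name tests; evaluate_steps is a hypothetical helper, not how any real XPath engine is implemented.

```python
import xml.etree.ElementTree as ET

body = ET.fromstring(
    "<body>"
    "<table><tr><td>1</td><td>2</td></tr><tr><td>3</td></tr></table>"
    "<table><tr><td>4</td></tr></table>"
    "</body>"
)

def evaluate_steps(context, steps):
    """Evaluate each step once per node produced by the step to its left."""
    current = [context]
    for step in steps:
        union = []
        for node in current:                 # node acts as the context node
            for hit in node.findall(step):   # child elements with that name
                if all(hit is not seen for seen in union):
                    union.append(hit)        # union: no node appears twice
        current = union
    return current

stepwise = evaluate_steps(body, ["table", "tr", "td"])
direct = body.findall("table/tr/td")
print([td.text for td in stepwise])
```

Only the td elements survive to the end: the table and tr nodes serve as intermediate context nodes and are not part of the result.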
92Predicates
- A predicate filters a node-set to produce a new node-set
- price[. > 5]
- Of all the price child elements, return only those whose string-value, when converted to a number, is greater than 5
- The predicate expression is evaluated once for each node in the node-set to be filtered, using that node as the context node for the evaluation
- The result is converted to a boolean (if necessary)
- If true, the node is retained in the result
- If false, the node is excluded from the result
93Numeric predicates
- When the predicate expression evaluates to a number
- It is interpreted in a special way, such that
- foo[5] is short for foo[position()=5]
- foo[last()] is short for foo[position()=last()]
94Context size in predicates
- The last() function returns the context size
- The number of nodes returned by the step (or arbitrary node-set expression) to its left
- foo[last()] evaluates to foo[5] if there are a total of 5 foo elements
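The claim about foo[last()] and foo[5] can be checked directly. A minimal sketch, assuming Python's standard xml.etree.ElementTree module, whose limited XPath subset does include numeric and last() predicates; the n attribute is added only so the elements can be told apart.

```python
import xml.etree.ElementTree as ET

# Build <doc> with five foo children, labelled n="1" through n="5"
doc = ET.fromstring(
    "<doc>" + "".join(f"<foo n='{i}'/>" for i in range(1, 6)) + "</doc>"
)

fifth = doc.findall("foo[5]")       # position() = 5
last = doc.findall("foo[last()]")   # position() = last()

print([foo.get("n") for foo in fifth], [foo.get("n") for foo in last])
```

With five foo children the context size is 5, so both expressions pick the same element.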
95Context position in predicates
- The position() function returns the context position
- The proximity position of the context node for the current predicate evaluation
- The relative position of the node among all the nodes being filtered in document order
- foo[5] returns the 5th foo element in document order
- Unless the step uses one of the four reverse axes
- preceding
- preceding-sibling
- ancestor
- ancestor-or-self
- ancestor::node()[1] is equivalent to ..
- preceding-sibling::foo[1] returns the first foo element in reverse document order
96Step filters vs. Expression filters
- A predicate can be used to filter two different things
- A location path step
- An arbitrary node-set expression
- $node-set[@foo='bar']
- Gotchas
- //para[1] vs. (//para)[1]
- ancestor::*[1] vs. (ancestor::*)[1]
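The first gotcha is easy to see on a small document. This is a sketch, assuming Python's standard xml.etree.ElementTree module. It uses the relative forms .//para[1] and (.//para)[1], and because ElementTree cannot parse the parenthesized form, that one is emulated by taking the first item of the full result.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<doc>"
    "<sec><para>a</para><para>b</para></sec>"
    "<sec><para>c</para></sec>"
    "</doc>"
)

# .//para[1]: the predicate belongs to the para step, so it keeps every
# para that is the first para child of its own parent
per_parent = [p.text for p in doc.findall(".//para[1]")]

# (.//para)[1]: the predicate filters the whole node-set, keeping only the
# first para in document order (emulated here with a list slice)
overall = [p.text for p in doc.findall(".//para")[:1]]

print(per_parent, overall)
```

The step filter returns one para per section, while the expression filter returns a single para for the whole document.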
97Comparisons with node-sets
- price > 20
- True if there are any price element children whose string-value when converted to a number is greater than 20
- foo[bar = bat]
- Select all foo elements that have any bar element child and any bat element child that have the same string-value
- Comparisons with empty node-sets always return false
- Gotcha
- foo != 2
- foo != bar
- Use not() for the true complement
- not(foo = 2)
- not(foo = bar)
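The gotcha follows from the "any" semantics of node-set comparisons, which can be imitated in plain code. This is an emulation, not an XPath engine: compare is a hypothetical helper, and the lists stand in for the numeric values of the foo elements.

```python
import operator

def compare(node_values, number, op):
    """Node-set vs. number: true if ANY node's value satisfies op."""
    return any(op(value, number) for value in node_values)

foo = [1.0, 2.0]   # hypothetical values of two foo elements
nothing = []       # an empty node-set

some_differ = compare(foo, 2.0, operator.ne)             # foo != 2
none_equal = not compare(foo, 2.0, operator.eq)          # not(foo = 2)
empty_differs = compare(nothing, 2.0, operator.ne)       # foo != 2, no foo
empty_none_equal = not compare(nothing, 2.0, operator.eq)  # not(foo = 2), no foo

print(some_differ, none_equal, empty_differs, empty_none_equal)
```

With values 1 and 2, foo != 2 is true (one value differs) while not(foo = 2) is false (one value is equal), so the two are not interchangeable. With no foo at all, the answers flip.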
98Functions overview
- String functions
- string(), concat(), starts-with(), contains(), substring-before(), substring-after(), substring(), string-length(), normalize-space(), translate()
- Node-set functions
- last(), position(), count(), id(), local-name(), namespace-uri(), name()
- Boolean functions
- boolean(), not(), true(), false(), lang()
- Number functions
- number(), sum(), floor(), ceiling(), round()
- XSLT adds
- document(), key(), generate-id(), system-property(), format-number(), current(), element-available(), function-available(), unparsed-entity-uri()
99XSLT element overview
100XSLT elements, by use case
- Creating nodes: xsl:element, xsl:attribute, xsl:text, xsl:comment, xsl:processing-instruction
- Copying nodes: xsl:copy-of, xsl:copy
- Repetition (looping): xsl:for-each
- Sorting: xsl:sort
- Conditional processing: xsl:choose, xsl:if
- Computing or extracting a value: xsl:value-of
- Defining variables and parameters: xsl:variable, xsl:param
- Defining and calling subprocedures (named templates): xsl:template, xsl:call-template
- Defining and applying template rules: xsl:template, xsl:apply-templates, xsl:apply-imports
- Numbering and number formatting: xsl:number, xsl:decimal-format
- Debugging: xsl:message
101XSLT elements, cont.
- Combining stylesheets (modularization): xsl:import, xsl:include
- Compatibility: xsl:fallback
- Building lookup indexes: xsl:key
- XSLT code generation: xsl:namespace-alias
- Output formatting: xsl:output
- Whitespace stripping: xsl:strip-space, xsl:preserve-space
102XSLT's processing model
103The end: construct a result tree
104The means: process lists
- If XPath is about trees, then XSLT is about lists
- Populate arbitrary nodes from the source tree into lists
- Iterate over those lists
- For each node in the list, create part of the result tree
- Source tree -> List processing -> Result tree
- Thus, there is always
- a current node list, and
- a current node
105Two mechanisms for iterating over lists
- xslapply-templates and xslfor-each
- They both iterate over the nodes of a given
node-set - Supplied by the XPath expression in the select
attribute - For example
- ltxslapply-templates select"para"/gt
- Populate the current node list with para
elements, sorted in document order. For each para
element, invoke the best-matching template rule.
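To make the two iteration mechanisms concrete, here is a minimal sketch that uses both on the same node-set. It assumes a hypothetical input document with a doc element containing para children. In both cases position() and last() refer to the current node list.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumed input: a doc element with para children -->
  <xsl:template match="doc">
    <body>
      <!-- xsl:apply-templates: the processor picks the
           best-matching template rule for each para -->
      <xsl:apply-templates select="para"/>

      <!-- xsl:for-each: the content below is instantiated
           once for each para in the current node list -->
      <xsl:for-each select="para">
        <p>
          <xsl:value-of select="position()"/>
          <xsl:text> of </xsl:text>
          <xsl:value-of select="last()"/>
        </p>
      </xsl:for-each>
    </body>
  </xsl:template>

  <!-- Invoked by the xsl:apply-templates call above -->
  <xsl:template match="para">
    <p><xsl:value-of select="."/></p>
  </xsl:template>
</xsl:stylesheet>
```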
106All XSLT processing begins with...
- A virtual call to
- <xsl:apply-templates select="/"/>
- The current node list initially consists of just
one node
- The root node of the source tree
- In other words, the XSLT processor invokes the
template rule that matches the root node
- This call constructs the entire result tree
- Nothing happens before it
- Nothing happens after it
107Your job as an XSLT stylesheet author...
- ...is to define, using template rules, what happens
when the XSLT processor executes this
instruction
- <xsl:apply-templates select="/"/>
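One common way to take control of that initial call is to write a template rule that matches the root node. The sketch below is only an illustration; the html and body wrapper elements are an arbitrary choice, not something the workshop prescribes.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Invoked once, by the virtual initial call on the root node -->
  <xsl:template match="/">
    <html>
      <body>
        <!-- Hand the root node's children on to other template rules
             (built-in ones, if the stylesheet defines no others) -->
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```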
108Template rules
- An XSLT stylesheet contains a set of template
rules
- Two kinds of template rule
- Those you define
- Those that XSLT defines for you
- These are called the built-in template rules.
- There is a built-in template rule for each of the
7 types of node
- Ensures that every call to xsl:apply-templates
finds a matching template rule
- Even if your stylesheet contains no explicit
template rules at all
109The empty stylesheet
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
</xsl:stylesheet>
- If you apply the above stylesheet to the example
XML from QUIZ 1...
- What will the result be?
110The result
xsltproc empty.xsl quiz1.xml
<?xml version="1.0"?>
This is a simple XML
document You can do it! There's nothing
to it! Go fast! This will be
interesting Here we go... sub-chapter
Who ever heard of nested chapters?!
another sub-chapter End of sub-chapter
No more nested chapters for now...
111Template rules that you define
- When you define template rules, you override the
default behavior
- An explicit template rule is
- An xsl:template element that has a match
attribute
- For example
<xsl:template match="foo">
  <!-- construct part of the result tree -->
  <xsl:apply-templates/>
  <!--...-->
</xsl:template>
112Applying template rules
- <xsl:apply-templates/>
- Short for
- <xsl:apply-templates select="node()"/>
- Process all child nodes of the context node
113Applying template rules: an OOP analogy
- <xsl:apply-templates/>
- For each item in the list
- Invoke the same polymorphic function
- Each template rule is an implementation of that
polymorphic function
114Patterns
- The value of the match attribute is a pattern
- Looks like an XPath expression
- Uses a subset of XPath syntax
- But has a more passive role
- Does the current node match this pattern? Yes or
no.
- When xsl:apply-templates is invoked, for each
node in the list, the XSLT processor searches all
the patterns of the stylesheet for the
best-matching one
115Example patterns
- Example patterns
- /
- /doc[@format='simple']
- bar
- foo/bar
- section//para
- @foo
- @*
- node()
- text()
- *
- xyz:*
116Does the pattern match?
- Informal
- If this pattern were an expression, would the
node in question ever be selected by it?
- Formal
- A node matches a pattern if the node is a member
of the result of evaluating the pattern as an
expression with respect to some possible context
node.
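A few of the example patterns, put to work in template rules, might look like the sketch below. The result elements (p, b) and the class value are illustrative assumptions. Note that a pattern is tested against a node no matter which context node the selecting expression started from.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Matches any para that has a section ancestor, however deep -->
  <xsl:template match="section//para">
    <p class="in-section"><xsl:apply-templates/></p>
  </xsl:template>

  <!-- Matches any bar element whose parent is a foo element -->
  <xsl:template match="foo/bar">
    <b><xsl:apply-templates/></b>
  </xsl:template>

  <!-- Matches any attribute named foo. It is only invoked when
       attributes are selected explicitly, e.g. with
       <xsl:apply-templates select="@*"/> -->
  <xsl:template match="@foo">
    <xsl:value-of select="."/>
  </xsl:template>
</xsl:stylesheet>
```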
117Template rules with multiple patterns
- Separate the alternative patterns with |
- <xsl:template match="foo | bar">...
- Is short for
<xsl:template match="foo">
  <!--...-->
</xsl:template>
<xsl:template match="bar">
  <!--...-->
</xsl:template>
118What about conflicts?
- A foo element would match both of these template
rules
<xsl:template match="foo">
  <!--...-->
</xsl:template>
<xsl:template match="*">
  <!--...-->
</xsl:template>
- Which one gets invoked by <xsl:apply-templates
select="foo"/>?
119Two steps to resolving conflicts
- When more than one template rule matches
- Eliminate rules with lower import precedence.
- Eliminate rules with lower priority.
- Only one rule should be left; otherwise it is an error (a processor may recover by choosing the rule that occurs last in the stylesheet)
- Import precedence depends on what file the rule
occurs in
- Where it occurs in the import tree (via
xsl:import)
- Priority depends on
- The priority attribute of the xsl:template
element, or
- The default priority (when priority attribute is
absent)
120Default priority
- Priority is a positive or negative decimal number
- The higher the number, the higher the priority
- There are four default priorities
- -.5
- -.25
- 0
- .5
- Number line, lowest to highest: -.5, -.25, 0, .5
121Default priority depends on...
- ...the syntax of the match pattern
- The most common pattern format has a priority of
0
- 0
- Match a particular name
- foo, xyz:foo, @foo, @xyz:foo,
processing-instruction('foo')
- .5
- The highest default priority
- Any pattern with a predicate or multiple steps
- foo/bar, foo[2], foo[@good='yes']
122The lower default priorities are...
- -.25
- One-step wildcards within a namespace
- xyz:*, @xyz:*
- -.5
- The lowest default priority
- One-step wildcards regardless of name
- *, @*, text(), comment(), processing-instruction(),
node()
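The sketch below shows how default priorities settle most conflicts, and where an explicit priority attribute becomes necessary. The urgent attribute and all result element names are hypothetical, chosen only for illustration.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Default priority -.5: one-step wildcard for any element -->
  <xsl:template match="*">
    <other><xsl:apply-templates/></other>
  </xsl:template>

  <!-- Default priority 0: beats the wildcard for foo elements -->
  <xsl:template match="foo">
    <plain-foo><xsl:apply-templates/></plain-foo>
  </xsl:template>

  <!-- Both patterns below have a predicate, so both default to .5.
       A foo carrying both attributes matches both rules; the explicit
       priority on the second rule settles that conflict. -->
  <xsl:template match="foo[@good='yes']">
    <good-foo><xsl:apply-templates/></good-foo>
  </xsl:template>

  <xsl:template match="foo[@urgent='yes']" priority="1">
    <urgent-foo><xsl:apply-templates/></urgent-foo>
  </xsl:template>
</xsl:stylesheet>
```

With this stylesheet, a plain foo element goes to the match="foo" rule rather than the wildcard, which also answers the earlier conflict question.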
123Modes
- Modes allow you to process the same node again
but do something different this time
- <xsl:apply-templates select="heading"
mode="toc"/>
- <xsl:template match="heading" mode="toc">...
- When the mode attribute is absent, that means the
default (unnamed) mode
- You can segment your template rules into sets
organized by concern
- What they generate in the result tree
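As a minimal sketch of modes in action, the stylesheet below processes each heading twice: once in the toc mode to build a table of contents, and once in the default mode for the body. The doc, heading, and para element names and the HTML output are assumptions for illustration.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="doc">
    <html>
      <body>
        <ul>
          <!-- First pass over the headings, in the toc mode -->
          <xsl:apply-templates select="heading" mode="toc"/>
        </ul>
        <!-- Second pass over all children, in the default mode -->
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>

  <!-- Only considered when the toc mode is requested -->
  <xsl:template match="heading" mode="toc">
    <li><xsl:value-of select="."/></li>
  </xsl:template>

  <!-- Default (unnamed) mode -->
  <xsl:template match="heading">
    <h2><xsl:value-of select="."/></h2>
  </xsl:template>

  <xsl:template match="para">
    <p><xsl:apply-templates/></p>
  </xsl:template>
</xsl:stylesheet>
```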
124The built-in template rules
- For elements and root nodes
- Apply templates to children
<xsl:template match="/ | *">
  <xsl:apply-templates/>
</xsl:template>
- For text nodes and attribute nodes
- Output the string-value of the node
<xsl:template match="text() | @*">
  <xsl:value-of select="."/>
</xsl:template>
125The built-in template rules
- For processing instructions and comments
- Do nothing
<xsl:template match="comment() | processing-instruction()"/>
- For namespace nodes
- Do nothing
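Because the built-in rule for text nodes copies all text to the output, a frequent first step is to override it. The sketch below suppresses text by default and emits only the content of title elements; title is an assumed element name, not one taken from the quiz document.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Override the built-in rule for text nodes: output nothing.
       Elements are still traversed by the built-in element rule. -->
  <xsl:template match="text()"/>

  <!-- Emit the string-value of each (assumed) title element -->
  <xsl:template match="title">
    <xsl:value-of select="."/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```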
126Template rule content
- Three kinds of elements
- XSLT instructions
- Any element in the XSLT namespace, e.g.,
<xsl:value-of/>
- Literal result elements
- Any element in any other namespace, or no
namespace
- Creates a shallow copy of itself in the result
tree
- Extension elements
- Any element in a namespace that's declared as an
extension namespace (using the
extension-element-prefixes attribute on the
xsl:stylesheet element)
127Attribute value templates
- Attributes on literal result elements can contain
dynamic values, delimited by curly braces
- <para class="{@format}">...
- To include a literal curly brace, double it
- <foo bar="{{not interpreted as XPath}}"/>
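A minimal sketch of both cases in one literal result element follows. The para element and its format attribute come from the slide; the p result element and the title attribute are illustrative assumptions.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="para">
    <!-- {@format} is evaluated as XPath against the current para.
         The doubled braces in title yield the literal text
         {literal braces} in the result. -->
    <p class="{@format}" title="{{literal braces}}">
      <xsl:apply-templates/>
    </p>
  </xsl:template>
</xsl:stylesheet>
```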
128Miscellaneous topics...
130The template rule engine
- A lot goes on behind the scenes
- xsl:apply-templates is the most important
instruction in XSLT
- <xsl:apply-templates select="para"/>
- This means: "Apply templates to the para element
children of the context node."
131But what does "apply templates" mean?
- <xsl:apply-templates select="para"/>
- Let the para node-set populate the current node
list (in document order)
- For each node in the list, invoke the
best-matching template rule
- A template rule
134Namespace node quiz
- Example
- <foo xmlns:e="http://example.com"><bar/></foo>
- Quiz: How many namespace nodes are in the above
document?
135Answer
- Example
- <foo xmlns="http://example.com"><bar/></foo>
- Quiz: How many namespace nodes are in the above
document?
- ANSWER: 4
- Two namespace nodes for each element: one for the
declared namespace, one for the implicit xml
namespace
- As we'll see, namespace nodes are a property of
the element for which they're in scope.
- Doesn't that make for a huge proliferation of
namespace nodes?
- Yes.
- Should I care?
- Hardly ever.
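One way to check the quiz answer yourself is to count the nodes on the namespace axis. This is a sketch that assumes your processor supports the namespace axis (some implementations, especially browser-based ones, may not). Applied to the example document, a conforming XSLT 1.0 processor should report the count given in the answer above.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Count the namespace nodes attached to every element -->
    <xsl:value-of select="count(//*/namespace::*)"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```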
136QNames in XPath/XSLT
- QNames are expanded
- Into local and URI parts
- Using a set of namespace/prefix declarations
- Supplied in the XPath expression context
- This does not include a default namespace
declaration (declared by xmlns)
- Thus, if you want to select nodes in a particular
namespace, then you must use a prefix
- In other words, a QName without a prefix always
designates a node that is not in a namespace
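The practical consequence can be sketched as follows. The ex prefix is an arbitrary choice made in the stylesheet, and the result element names are hypothetical; what matters is the namespace URI the prefix is bound to, not the prefix the source document happens to use.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:ex="http://example.com"
    exclude-result-prefixes="ex">

  <!-- Matches foo in the http://example.com namespace, whether the
       source declares it with a prefix or as the default namespace -->
  <xsl:template match="ex:foo">
    <matched-namespaced-foo/>
  </xsl:template>

  <!-- Matches only a foo element that is in no namespace -->
  <xsl:template match="foo">
    <matched-plain-foo/>
  </xsl:template>
</xsl:stylesheet>
```

Applied to the default-namespace version of the quiz document, only the first rule would fire for the foo element, since an unprefixed name in a pattern never picks up a default namespace.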