Title: Introduction to XSLT
1Introduction to XSLT
- David G. Durand
- Director, Electronic Publishing Services, Ingenta Inc.
- Adjunct Associate Professor, Brown University
2Thanks
3What is XSLT?
- eXtensible Stylesheet Language for Transformations
- Language for transforming XML documents
- A programming language for XML documents
- A functional language, based on value substitution
- Augmented with pattern matching
- And also template substitution to construct output (based on namespaces)
- Uses XML syntax
4Why transform?
- Convert one schema to another
- I say potato, you say paragraph
- Rearrange data for formatting
- Present style languages can't re-order or copy
- see section <xref sid="sec37"/>
- Project or select document portions
5Some special transforms
- XML to HTML: for old browsers
- XML to LaTeX: for TeX layout
- XML to SVG: graphs, charts, trees
- XML to tab-delimited: for db/stat packages
- XML to plain text: occasionally useful
- XML to FO: XSL formatting objects
6Document Transformation
- The perspective is tree editing, not syntax
- Basic operations
- Changes to node properties
- Structural rearrangement
- Several models for this kind of task
7Models for tree editing
- Functional
- Rewrite rule-based
- Template-based
- Imperative
8Functional tree rewriting
- Recursive processing
- Invoke start function at the root, construct a new tree
- Can think of this as node functions
- Result is compositional: substitution is generally nested
- Side effects often avoided: caching values, clarity.
9Rule-based (rewriting systems)
- A transformation is defined by a list of pattern/result pairs
- Each is a piece of a tree with holes (variables)
- A match leads to replacement of the matched tree nodes by a result tree
- Variables shared between pattern and result allow preservation and rearrangement of arbitrary data
- Powerful, incremental definitions
- Non-deterministic processing
10Template based processing
- This is a model in which a pattern document is the starting point
- This model is very familiar from many web-based systems.
- It contains literal results interleaved with queries and sometimes imperative code
- Well-suited to repetitive or rigid structures
- Often requires extensions to deal with recursion and looping
- Frequently appropriate for database-style XML
11Imperative
- Parser calls imperative code, which uses
- Stacks
- Global variables
- Explicit output commands
- Result is a side effect.
- Reasoning about the program may be hard, but creating it often starts out easily
- This approach makes it easy to create non-XML, or ill-formed XML documents
12What's the biggest drawback to tree editing?
- Buffering!
- You need a copy of the tree to edit
- This means that it's very easy to build a transformer for a document entirely in-memory
- Doing this from secondary storage is fairly subtle, and has its own performance penalties
- This is a complex speed/size/coding effort tradeoff
- This is one reason imperative approaches are sometimes appealing even to purists.
13What side are we on?
- XSLT falls squarely in the middle
- Styles of XSLT transform
- Functional
- Rule-based
- Template-based
- Imperative (although unusual)
14XSLT and transformation styles
- Rule-based substitution (but results are like template languages)
- XPath addressing also looks like queries in traditional template languages
- Limited non-determinism
- Sufficient control over rule evaluation order that functional transformations are easy
15Where does XSLT fit?
- Dependencies
- XML -> XPath -> XSLT -> XSL
- The WGs involved
- XSL Working Group; XML Linking for XPath
- Status
- Full W3C Recommendation, in wide use
- http://www.w3.org/TR/xsl/
16XML Documents as trees of nodes
- Root
- Elements
- Attributes
- Text Nodes (not characters)
- Namespaces
- Processing Instructions
- Comments
17XML Document order
- Root -- First
- Elements -- Occur in order of their starts
- Text Nodes -- As if children (leaves)
- Attributes, namespaces -- Attached to element, unordered
- PIs, comments -- Leaves like text nodes
18Other XML notions
- XML declaration: identifies a document as intending to conform to XML rules
- DTD or schema: rules for permissible elements and attributes for a genre
- Well-formedness: correct XML syntax, but maybe not valid to specified DTD
- XML name token: ok as element/attr names
- Stylesheet PI: hooks document to stylesheet.
19XPath and its use in XSLT
- An expression language over XML trees
- Used to identify sets of elements
- all paragraphs
- all paragraphs directly inside footnotes
- the section with ID="sec37"
- footnotes with author="Knuth"
- first paragraph in each section
- the parent of each caption
- Then you can say what to do with them
20The math
- For all nodes, gaps between children are numbered.
- Before first child counts as 0
- Text nodes count like elements
- So <p>a <em>big</em>thing</p> has three children in the p
- Characters count within text nodes
- Before the first character is 0
21Counting locations
22Basic kinds of pointing
- Directions of navigation/specification
- Finding elements by ID
- Finding nearby/related elements
- ancestors, children, following-siblings, etc.
- Finding attributes and namespaces applicable to an element
- Finding strings in content
- Qualifications
- Test properties of locations
- Attributes, types, content
- Combine all these in multiple "steps"
23Anatomy of a location step
child::para[@type="weak"][3]
- axis name: child
- node test: para
- predicate: [@type="weak"], made of an attribute reference (@type) and a literal string ("weak")
- position test: [3]
Finds the third child of the current node that (a) is an element of type 'para' and (b) has a 'type' attribute whose value is 'weak'.
Case matters.
24Location step details
- A node test tests
- The element type for axes of elements,
- attribute name for axes of attributes,
- or an explicit node-type test, e.g. text()
- Multiple predicates (left-to-right)
- Predicates can be arbitrary expressions
- Shorthand
- Default axis is child
- Number in predicate is position test
- E.g. chap[4]/sec[5]/para
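A minimal sketch of the shorthand in use, assuming a hypothetical input whose document element `doc` has several `para` children carrying a `type` attribute (the element and attribute names are illustrative only). Both `select` expressions pick out the same node: the second relies on the default child axis, the `@` abbreviation, and a bare number as a position test.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Hypothetical input: <doc> containing <para type="..."> children -->
  <xsl:template match="/doc">
    <!-- Unabbreviated: explicit axes and an explicit position() test -->
    <xsl:value-of select="child::para[attribute::type='weak'][position()=3]"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Abbreviated: default child axis, @ for attribute::, [3] for position -->
    <xsl:value-of select="para[@type='weak'][3]"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

Run against such a document, the two lines of output should be identical, which is a quick way to check that an abbreviation means what you think it means.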
25The simplest functions
- root()
- Locates the root of the containing resource
- (not the document element!)
- <?foo?><doc>...</doc><!-- hi -->
- Abbreviation: /
- id()
- Locates element with that ID value
- Note: finding this requires DTD
- Cf. descendant::*[attribute::id='foo']
- Can have multiple ID tokens in argument
- id('chap37 sec12 xyzzy')
26root() and id()
root()/
id("p37")
27Relative axes in general
- Locate nodes by genetic relationships
- Axis name specifies relationship
- Always count outward from starting point
- Predicates (more later) follow in [ ]
- Pick from candidates along the axis
- Test serial position
- Test element type name or attribute values
- Can have embedded XPointer expressions
- Special "node test" predicates after "::"
- child::para tests element type
- child::nodetype() tests node type
28child axis
- child::type[predicate]
- Locates direct substructures
- (only elements have children)
- (attributes are not children)
- First (eldest) child that is a para
- child::para[1]
- Last (youngest) child that is a para
- child::para[last()]
- Abbreviation: / (child is the default axis)
- sec[2]/para[1]
29parent and self axes
- parent
- Gets direct parent of a node
- Only elements and root can be parents
- Every node but root has a parent
- Element containing one with id='foo'
- id('foo')/parent::*
- Can use predicates to filter
- id('foo bar')/parent::section
- self
- Returns the node you started at ('context')
30child, parent, self
id('intro')/child::*[2]
id("summary")/self::*
id('p37')/parent::*
31ancestor, ancestor-or-self
- ancestor
- Locates direct and indirect ancestors
- First ancestor (parent) that is a div
- ancestor::div[1]
- All ancestors that are divs
- ancestor::div
- ancestor-or-self
- Same except the context node counts
- Example: sec containing id foo, even if the id is on the sec itself
- id('foo')/ancestor-or-self::sec
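As a sketch of the ancestor axes inside a stylesheet, the template below reports, for each `para`, the title of its nearest enclosing `sec` and its depth in the tree. The element names `sec`, `title`, and `para` are assumptions about the input, chosen to echo the examples on the slide.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="para">
    <!-- ancestor is a reverse axis, so [1] is the nearest enclosing sec -->
    <xsl:value-of select="ancestor::sec[1]/title"/>
    <xsl:text> (depth </xsl:text>
    <!-- ancestor-or-self counts the para itself as well as its element ancestors -->
    <xsl:value-of select="count(ancestor-or-self::*)"/>
    <xsl:text>)&#10;</xsl:text>
  </xsl:template>

  <!-- Suppress the built-in copying of text everywhere else -->
  <xsl:template match="text()"/>
</xsl:stylesheet>
```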
32ancestor()
id('p37')/ancestor::chapter[1]
id('p37')/ancestor::*[3]
id('baz')/ancestor-or-self::*[self::p or self::a]
33descendant axis
- Descendant (not descendent!)
- Locates direct and indirect sub-nodes
- Depth-first, left-to-right (= start-tag order)
- Third descendant that is a para
- descendant::para[3]
- All descendants that are FOOTNOTEs
- descendant::FOOTNOTE
- Abbreviation: // (strictly, short for /descendant-or-self::node()/)
- descendant-or-self
- Same except that context node counts
34descendant, -or-self
/descendant-or-self::chapter
id('intro')/descendant-or-self::chapter
/descendant::section[1]
35Preceding/following -sibling
- preceding-sibling, following-sibling
- Locate preceding (older)/following (younger) siblings
- Closest node is 1; farthest is "last"
- PIs, comments, text nodes count
36preceding/following
- preceding and following
- Works in order of start-tags (pre-order)
- Locate many nodes other than ancestors
- Not frequently useful
- Can land you at odd places since the tree structure is not really involved
- One useful case
- Find prev/next element X, wherever
- <manuscript-page-start n="25" />
- <footnote>
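The "one useful case" above can be sketched as follows, assuming the input marks page breaks with empty `manuscript-page-start` milestone elements that carry the page number in `n`, as in the slide's example. The output wording is illustrative.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="footnote">
    <xsl:text>Footnote starts on manuscript page </xsl:text>
    <!-- preceding is a reverse axis: [1] is the closest milestone before
         this footnote, wherever it sits in the tree -->
    <xsl:value-of select="preceding::manuscript-page-start[1]/@n"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <!-- Suppress the built-in copying of text everywhere else -->
  <xsl:template match="text()"/>
</xsl:stylesheet>
```

Because the milestone is an empty element it can never be an ancestor of the footnote, so the `preceding` axis (which skips ancestors) always sees it.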
37following-sibling() etc.
id('intro')/following-sibling::*[last()]
id('summary')/preceding::p[3]
38Preceding, another way
id('summary')/preceding::p[3]
<doc>
  <title>Intro</title>
  <abstract></abstract>
  <chapter ID='intro'>
    <title></title>
    <section>
      <title> </title>
      <p ID='p377'> </p>                      < 3 (the one selected)
      <p><a name='baz'/> </p>                 < 2
      <list> </list>
      <p> <xref href='id(intro)'/> </p>       < 1
    </section>
    <section> </section>
  </chapter>
  <chapter ID='concepts'> </chapter>
  <chapter ID='summary'> </chapter>
</doc>
Real document would have lots of text nodes
"<" indicates the candidate nodes, counting back to the right one
39attribute axis
- attribute::name
- Locates attribute specification, not value
- Abbreviation: @
- To refer to second attribute of <p id="hello" status="draft">
- Use id('hello')/attribute::status (or id('hello')/@status)
- Careful!
- Attributes of an element are unordered
- Attributes have parent elements, but are not
their children
40namespace axis
- XML namespaces are declared via attributes
- And apply throughout descendants
- <sec xmlns:my="http://..."><my:title>...
- Much like attribute nodes
- All active namespaces are accessible via the namespace axis from a given element.
- Distinct elements do not share ns nodes
41Summary: axes and functions
- root( ), id( )
- parent, self, child
- ancestor, ancestor-or-self
- descendant, descendant-or-self
- preceding-, following-sibling
- preceding, following
- attribute, namespace
42XPointer datatypes
- Strings
- Not the same as the location of some text
- Unicode abstract characters
- (implementers must normalize surrogate pairs)
- Quote literals with ' or "
- Numbers
- IEEE 754 standard floating point
- Booleans
- true() and false()
- Locations and location sets
43XPointer operators/functions
- Math: + - * div mod
- (- allowed in names, so precede by space)
- sum(location-set), floor(), ceiling(), round()
- id('foo')//img[@height * @width > 100]
- Logic: or, and, not()
- id('foo')//*[@type='a' or @type='b']
- Comparisons: = != < > <= >=
- id('foo')//img[@height < @width]
- Escape when needed in XML
- <a href="http://www.example.com/foo.xml#xpointer(id('foo')//img[@height &lt; @width])">
44Comparisons
- For node sets
- A comparison is true if there is a node in each set for which the comparison on the string values is true.
- For other things
- If at least one side is Boolean, compare Boolean
- If at least one side is a number, compare numeric
- Else convert to strings and compare
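The "there is a node" rule has a consequence that surprises most newcomers: `=` and `!=` on a node-set can both be true at once. A small sketch, assuming a hypothetical input `<order><price>5</price><price>20</price></order>`; here one side is a number, so each node's string value is converted to a number before comparing.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Hypothetical input: <order><price>5</price><price>20</price></order> -->
  <xsl:template match="/order">
    <!-- True when ANY price is above 10 (here: 20) -->
    <xsl:if test="price &gt; 10">some price exceeds 10&#10;</xsl:if>

    <!-- Also true: SOME price (here: 5) is not equal to 20 -->
    <xsl:if test="price != 20">some price differs from 20&#10;</xsl:if>

    <!-- True only when NO price equals 20; false for this input -->
    <xsl:if test="not(price = 20)">no price equals 20&#10;</xsl:if>
  </xsl:template>
</xsl:stylesheet>
```

For the sample input the first two messages are produced and the third is not, so `price != 20` is not the negation of `price = 20`; use `not(...)` when "none of them" is what you mean.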
45Specialty functions
- last() returns the number of locations in the current context (candidate set)
- position() returns where the current location is in the context
- name() returns the node's "expanded name" (including namespace)
- string(), boolean(), number()
- lang(string) to test a location's xml:lang value
46String functions
- concat(string, ...)
- starts-with(string, string)
- contains(string, string)
- substring-before(string, string)
- substring-after(string, string)
- substring(string, number, number?) -- positions count from 1!
- string-length(string)
- normalize-space(string)
- translate(string, from, to)
47Advanced notes on strings
- Every node has a "string value"
- "=" comparison does not mean "the same node"
- Concatenated text of all descendants
- No spaces inserted (e.g. between list items)
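A short sketch of these points, assuming a hypothetical input `<list><item>a</item><item>b</item><item>a</item></list>` with no whitespace between the tags. The `count()` of a union is one common XSLT 1.0 idiom for testing node identity; it is shown here as one option, not the only one.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Hypothetical input:
       <list><item>a</item><item>b</item><item>a</item></list> -->
  <xsl:template match="list">
    <!-- String value of the list: "aba", with no separators inserted -->
    <xsl:value-of select="."/>
    <xsl:text>&#10;</xsl:text>

    <!-- "=" compares string values, so two distinct nodes can be equal -->
    <xsl:if test="item[1] = item[3]">items 1 and 3 have equal text&#10;</xsl:if>

    <!-- A union holds each node once: two members means two different nodes -->
    <xsl:if test="count(item[1] | item[3]) = 2">but they are different nodes&#10;</xsl:if>
  </xsl:template>
</xsl:stylesheet>
```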
48On to XSLT proper
49What's inside an XSLT transform?
- Any number of templates
- A template uses XPath to match nodes
- Highest priority matching template selected
- Then the template takes over and generates
- Literal output XML (based on namespace)
- Computational results (of XSLT functions)
- Results of further template applications
- Results of queries on the document
- Many options
50The process
- XSLT takes
- A source XML document
- A transform (XSLT program)
- XSLT applies templates to found nodes
- (may delete or include the rest)
- (may process in document or tree or any order)
- XSLT generates
- A result XML or text document
51The boilerplate
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Identity transform: copy each node, then recurse into its
       attributes and children -->
  <xsl:template match="*|/|@*|text()">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
52From Copy to Transform
<?xml version="1.0"?>
<!-- Rename all p elements to para -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Default: copy everything unchanged -->
  <xsl:template match="*|/|@*|text()" priority="1">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>
  <!-- Higher priority: p becomes para -->
  <xsl:template match="p" priority="2">
    <para><xsl:apply-templates/></para>
  </xsl:template>
</xsl:stylesheet>
53How do you apply one?
- Refer via stylesheet PI
- Defined in W3C xml-stylesheet rec
- <?xml-stylesheet href="URI" type="..." title="..." media="..." charset="..." alternate="yes" ?>
- Apply via standalone program
- E.g. XT, Xalan, Saxon (see Web for latest versions)
54Caveats
- Many constructs have extra options
- There are more constructs
- We will not cover all these
- For example
- <xsl:stylesheet id="ID" extension-element-prefixes="my-Fns" exclude-result-prefixes="html" version="1.0" xml:space="default">
55Template styles
- Push vs. Pull templates
- Or, per Michael Kay
- Fill-in-the-blanks
- Looks like output document with pulls to merge
- Navigation
- Adds top-level <xsl:transform>, macros
- Rule-based
- Conceptually, a template for each element type
- Computational
- Gory processing to generate markup from none
56At the top level
- Key thing: templates
- Also several option-settings
- <xsl:import> -- must be first
- <xsl:include>
- <xsl:strip-space> or <xsl:preserve-space>
- <xsl:output>, <xsl:decimal-format>
- <xsl:key>, <xsl:namespace-alias>
- <xsl:attribute-set>, <xsl:variable>, <xsl:param>
- Most of these are more advanced.
57Anatomy of a template
- XPath to select elements to apply template to
- (this is where programming/scripting comes in)
- XML to output, for each instance selected
- Embedded within that output
- XSLT instruction elements
- Literal output (including XML tags)
- References to content to transclude
- Place to put results of transforming the element's children (if desired)
58Trivial Templates: Tag Renaming
<xsl:template match="div[@type='idx']">
  <index><xsl:apply-templates/></index>
</xsl:template>
<xsl:template match="div1">
  <!-- div1 becomes a generic div that records its level -->
  <div level="1"><xsl:apply-templates/></div>
</xsl:template>
59Trivial Templates: Mapping to HTML
<xsl:template match="fn[@auth='Knuth']">
  <blockquote style="color:red">
    <xsl:apply-templates/>
  </blockquote>
</xsl:template>
<xsl:template match="price">
  <u><b><xsl:apply-templates/></b></u>
</xsl:template>
60Template options
- match="xpath"
- Which elements to apply template to
- name="qname"
- Name a template for later reference
- mode="qname" -- (limit template to work in a certain named mode -- more later)
- xml:space="default|preserve"
- Override inherited space-handling
- priority="n" -- for conflicting rules
61The ultimate default
- Elements are not copied
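- Attribute values and text are copied (attributes only when a template selects them)
- Thus a transform with no templates except for the root strips markup from a document
<xsl:transform version="1.0"
               xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <xsl:apply-templates/>
  </xsl:template>
</xsl:transform>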
62Priority example
- Delete all nested <list>s
<xsl:template match="list/list" priority="2">
  <!-- deleted nested list -->
</xsl:template>
<xsl:template match="list" priority="1">
  <list><xsl:apply-templates/></list>
</xsl:template>
63Template priority
- Multiple templates may match an element
- <template priority="3" match="h1">
- <template priority="5" match="*[@class='big']">
- <template priority="9" match="h1[@id='S1']">
- Highest priority number wins
- Priorities are real numbers, positive or negative
- Templates with no priority attribute get a default one
- All defaults have priority -0.5 <= p <= 0.5
64What goes in a template?
- Literal XML to output
- Pull references to other content
- Instructions to generate more output
- Setting and using variables
- Invoking other templates like macros
- Manually constructed XML constructs
- Conditional instructions (if, choose, etc.)
- Auto-numbering hacks
65Instructions: apply-templates
- <xsl:apply-templates select="xpath" mode="qname">
- Main use (no attributes or content)
- mark where to include result of processing children
- select
- Include certain children
- select="secure|public"
- Pull (transclude) anything from elsewhere
- select="//*[@id='warning17']"
- mode: Apply only templates of this mode
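A sketch of `select` and `mode` working together, assuming a hypothetical input `<doc>` made of `sec` elements that each hold a `title` and some `para` children. The same `sec` nodes are processed twice: once in a `toc` mode (a name invented for this example) to build a contents list, and once in the default mode to produce the body.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <xsl:template match="/doc">
    <html>
      <body>
        <!-- First pass: only templates declared with mode="toc" apply -->
        <ul><xsl:apply-templates select="sec" mode="toc"/></ul>
        <!-- Second pass: default mode -->
        <xsl:apply-templates select="sec"/>
      </body>
    </html>
  </xsl:template>

  <xsl:template match="sec" mode="toc">
    <li><xsl:value-of select="title"/></li>
  </xsl:template>

  <xsl:template match="sec">
    <h2><xsl:value-of select="title"/></h2>
    <!-- select limits processing to certain children; title is skipped -->
    <xsl:apply-templates select="para"/>
  </xsl:template>

  <xsl:template match="para">
    <p><xsl:apply-templates/></p>
  </xsl:template>
</xsl:stylesheet>
```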
66Keeping things in variables
- 2 types (names are XML qnames)
- Variables are assigned once and for all
- Parameters can be overridden later
- Value types
- A template
- The result of instantiating a template
- Node-set, string, Boolean, or number
- An RTF (result tree fragment) is a restricted type of node-set
- References: $varname
67Setting XSLT variables
- Default parameters declared at top level
- <xsl:param name="p" select="s"/>
- or
- <xsl:param name="p"> ...template... </xsl:param>
- Override via similar xsl:with-param
- <xsl:with-param name="p"> ...template... </xsl:with-param>
68Instructions: call-template
- Invoke a template (like a subroutine)
- <xsl:call-template name="t"> <xsl:with-param name="p" select="xpath"/> </xsl:call-template>
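A sketch of a named template used as a subroutine. The template name `labelled`, the parameter `label`, and the input elements `warning` and `note` are all invented for illustration. The parameter declares a default with `select`; one caller overrides it with `xsl:with-param` and the other accepts the default.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Named template: callable from anywhere, never matched by itself -->
  <xsl:template name="labelled">
    <xsl:param name="label" select="'Note'"/>
    <p>
      <b><xsl:value-of select="$label"/>: </b>
      <!-- call-template keeps the caller's current node, so this
           processes the children of the calling warning or note -->
      <xsl:apply-templates/>
    </p>
  </xsl:template>

  <xsl:template match="warning">
    <xsl:call-template name="labelled">
      <xsl:with-param name="label" select="'Warning'"/>
    </xsl:call-template>
  </xsl:template>

  <xsl:template match="note">
    <!-- No with-param: the default 'Note' is used -->
    <xsl:call-template name="labelled"/>
  </xsl:template>
</xsl:stylesheet>
```

Note the nested quotes in `select="'Warning'"`: without the inner quotes the value would be read as a path selecting child elements named Warning.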
69Using XSLT variables
- Limited processing can be done on RTFs
- Mainly string processing
- Embed variables via $varname
- Can do for markup as well as content
- Can process via functions (later)
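A sketch of variables used in both content and markup. The top-level parameter `base`, the local variable `target`, and the input element `xref` with an `href` attribute are assumptions for the example. Inside a literal result attribute, curly braces mark an attribute value template, which is how a variable gets into markup.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Top-level parameter: a default that the caller may override -->
  <xsl:param name="base" select="'http://www.example.com/'"/>

  <xsl:template match="xref">
    <!-- Local variable: assigned once, visible until the end of this template -->
    <xsl:variable name="target" select="concat($base, @href)"/>

    <!-- {$target} embeds the value in markup; value-of embeds it in content -->
    <a href="{$target}">
      <xsl:value-of select="$target"/>
    </a>
  </xsl:template>
</xsl:stylesheet>
```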
70<xsl:value-of>
- <xsl:value-of select="expr" disable-output-escaping="yes|no"/>
- Outputs the string value of the selected node (for a node-set, only the first node in document order).
- Any type can be cast to string.
71<xsl:copy-of>
- <xsl:copy-of select="expr"/>
- No content allowed
- Select attribute picks what to copy
- Using the usual XPath method
- The result is copied
- A node-set is copied (entire forest of subtrees)
- An RTF is copied (likewise)
- Anything else is cast to a string that is copied
- No processing is allowed en route
72<xsl:copy>
- <xsl:copy use-attribute-sets="qnames"> ...template... </xsl:copy>
- Generates the start- and end-tags
- Does not include attributes or children
- May contain <xsl:apply-templates/> etc.
73Conditional constructs
74<xsl:if>
- <xsl:if test="boolean-expr"> ...template... </xsl:if>
- Applies the template only if the expression evaluates to true.
- These can be nested
- No else construct
- See also xsl:choose (case or switch)
- E.g. test="@show='T'"
75<xsl:choose>
- Like select/switch/case statement
- Good for handling enumerated attributes
- <xsl:choose>
    <xsl:when test="boolean-expr"> ...template... </xsl:when>
    <xsl:otherwise> ...template... </xsl:otherwise>
  </xsl:choose>
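A sketch of `xsl:choose` handling an enumerated attribute, assuming a hypothetical `para` element whose `status` attribute is `draft`, `final`, or absent. The class names in the output are invented for the example.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="para">
    <xsl:choose>
      <!-- The first xsl:when whose test is true wins; later ones are skipped -->
      <xsl:when test="@status='draft'">
        <p class="draft"><xsl:apply-templates/></p>
      </xsl:when>
      <xsl:when test="@status='final'">
        <p class="final"><xsl:apply-templates/></p>
      </xsl:when>
      <!-- The "else" branch that xsl:if lacks -->
      <xsl:otherwise>
        <p><xsl:apply-templates/></p>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```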
76<xsl:for-each>
- <xsl:for-each select="node-set-expr">
- May contain
- xsl:sort -- any number of keys
- Template
- Applies template to each node found
- <xsl:sort select="string-expr" lang="lg" data-type="text|number|qname" order="ascending|descending" case-order="upper-first|lower-first"/>
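A sketch of `xsl:for-each` with a sort key, assuming a hypothetical `glossary` element whose `entry` children each contain a `term` and a `def`. Any `xsl:sort` elements must come first inside the `xsl:for-each`, before the template body.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="glossary">
    <dl>
      <xsl:for-each select="entry">
        <!-- Sort keys go first; add more xsl:sort elements for tie-breakers -->
        <xsl:sort select="term" data-type="text" order="ascending"/>
        <!-- Inside the loop the current node is each entry in sorted order -->
        <dt><xsl:value-of select="term"/></dt>
        <dd><xsl:value-of select="def"/></dd>
      </xsl:for-each>
    </dl>
  </xsl:template>
</xsl:stylesheet>
```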
77More macro-type instructions
78<xsl:apply-imports>
- Affects templates imported via xsl:import that would not otherwise be applied
- Imported templates have lowest priority
- Invoke from within a template
79<xsl:variable>
- Declares a variable
- Variables are scoped to where declared
- <xsl:variable name="qname" select="expr"/>
- or <xsl:variable name="qname"> ...template... </xsl:variable> (select or content, not both)
80<xsl:message>
- Issues a message (where it appears is up to the processor; it is not part of the result tree)
- terminate="yes|no"
- Message is specified via contained template
- Thus may include data from source
81<xsl:fallback>
- Provides backup for when an instruction fails
- Contains template to use
- Example
- trying to use an unknown extension instruction
82<xsl:number>
- Used to generate auto-numbering
- <xsl:number
    level="single|multiple|any"
    count="pattern"                         -- which nodes count?
    from="pattern"                          -- starting point
    value="number-expr"                     -- force value
    format="s"                              -- (not covering)
    lang="lg"                               -- lang to use
    letter-value="alphabetic|traditional"
    grouping-separator="char"               -- 1,000
    grouping-size="number"                  -- 3 in EN
  />
83Numbering example
<xsl:template match="list">
  <xsl:element name="toplist">
    <xsl:attribute name="marker">
      <xsl:number level="single"/> <!-- count defaults to siblings -->
    </xsl:attribute>
  </xsl:element>
</xsl:template>
- multiple -- gathers up sibling numbers of ancestors
- <xsl:number level="multiple" format="1.1.1" count="chap|sec|ssec"/>
84Building XML from parts
- Why?
- Generate element type name, etc. by expression
- Content is any template
- <xsl:element name="qname" namespace="uri" use-attribute-sets="qnames">
- <xsl:attribute name="qname" namespace="uri">
- <xsl:processing-instruction name="ncname">
- <xsl:comment>
- <xsl:text disable-output-escaping="yes">
85Attributes for XML constructors
-                                 name  namesp  u-a-s  d-o-e
- <xsl:element>                    x      x       x
- <xsl:attribute>                  x      x
- <xsl:processing-instruction>     x
- <xsl:comment>
- <xsl:text>                                             x
86Examples
- Generate element + attributes
<xsl:element name="{concat(@type,'-',@n)}">
  <xsl:attribute name="style">font-size:12pt; display:inline</xsl:attribute>
  <xsl:apply-templates/>
</xsl:element>
- (more on concat() later)
87Data handling via functions
- For strings
- For numbers
- For truth values
- For XML information
88String values
- Anything can be cast to a string
- Boolean: true or false
- Numbers: To decimal
- Nodes
- Root, Elements: character content of all descendants
- Text nodes: the character content
- Attributes: the attribute value
- Comments, PIs: the character content
- Namespaces: the namespace's URI
89For strings
- string(object) -- explicit type-cast
- concat(s1, s2, s3, ...) -- concatenate
- substring(s, offset, length)
- substring-after(s,s), substring-before(s,s)
- translate(s,from,to)
- Substitute chars in from, with ones from to
- normalize-space(s) -- nuke extra whitespace
- contains(s1,s2), starts-with(s1,s2)
- Returns true or false
- string-length(s) -- length in characters
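A sketch combining several of these functions, assuming a hypothetical input element `<author>  Knuth,   Donald </author>` with untidy whitespace. XPath 1.0 has no upper-case function, so `translate()` with two alphabets is the usual stand-in; it only handles the letters listed.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Hypothetical input: <author>  Knuth,   Donald </author> -->
  <xsl:template match="author">
    <!-- Trim the ends and collapse runs of whitespace: "Knuth, Donald" -->
    <xsl:variable name="clean" select="normalize-space(.)"/>

    <!-- Given name: everything after the comma and space -->
    <xsl:value-of select="substring-after($clean, ', ')"/>
    <xsl:text> </xsl:text>

    <!-- Family name: everything before the comma, upper-cased by translate() -->
    <xsl:value-of select="translate(substring-before($clean, ','),
                                    'abcdefghijklmnopqrstuvwxyz',
                                    'ABCDEFGHIJKLMNOPQRSTUVWXYZ')"/>
    <xsl:text>&#10;</xsl:text>

    <!-- substring() positions count from 1: this is the initial "K" -->
    <xsl:value-of select="substring($clean, 1, 1)"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

For the sample input this writes "Donald KNUTH" on the first line and "K" on the second.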
90For numbers and logic
- number()
- ceiling(), floor(), round(), sum()
- boolean()
- true(), false(), not()
91For XML information
- id(object)
- If arg is a node-set, each node is cast to string
- E.g. id(//footnote/@ref) gets the elements that the ref attributes point to
- Else arg is cast to a string
- Picks the nodes with IDs in the list
- Many space-separated IDs may be included
- lang()
92For looking around the context
- count(node-set)
- Returns number of nodes in the argument
- last()
- Returns number of nodes in the context
- position()
- Returns the position of the current node in the context
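A sketch showing how the three differ, assuming a hypothetical `list` element with `item` children. `count()` measures whatever node-set it is given, while `position()` and `last()` describe the context set up by the enclosing `xsl:for-each`.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="list">
    <!-- count() takes an explicit node-set argument -->
    <xsl:value-of select="count(item)"/>
    <xsl:text> items&#10;</xsl:text>

    <xsl:for-each select="item">
      <!-- position() and last() refer to the set being iterated -->
      <xsl:value-of select="position()"/>
      <xsl:text> of </xsl:text>
      <xsl:value-of select="last()"/>
      <xsl:text>: </xsl:text>
      <xsl:value-of select="."/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```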
93For names and namespaces
- local-name(node-set?)
- Returns local part of the name of the first node
- name(node-set?)
- Returns entire qualified name of the first node
- namespace-uri(node-set?)
- Returns the URI identifying the namespace of the first node
94A few examples
- Creating an SVG graphical overview of your document
- Counting and displaying document statistics
- Testing beliefs about document structure
- Merging in annotations or transcluded data
95Oddities of XPath and XSLT
- Navigational language for specifying pattern matches
- You specify the tree pattern implicitly by specifying a query for a node where a pattern will be replaced
- This sometimes makes the structure less explicit
- You can invoke further processing on children
- You use template-style access functions rather than pattern variables
96Surface Oddities
- The language is a mixture of predicate / query and structural pattern
- Unix path syntax and query syntax make a peculiar mix
- Matching within XSLT is always relative to a particular node, so the first few times results can be very puzzling
97Strategies for XSLT
- Try to pick a single style as much as possible
- May vary by project
- Mixing may be necessary but can get confusing
- Be sure you understand (and probably override) the default rules
- Shorter patterns are better
- <xsl:value-of> and <xsl:if> may be easier to deal with than a complex path
98Strategies
- Use several filters in a row
- It's often easier to manage a series of global changes than interactions between several complex conditions.
- Intermediate results make debugging easier
- Intermediate results may be cacheable
- Critical for online applications
- Where possible code things one element at a time
99References
- Key sites: http://www.w3.org/Style/XSL, http://www.mulberrytech.com/xsl/xsl-list, http://www.oasis-open.org/cover/xsl.html
- Interactive XSLT reference: http://www.zvon.org/xxl/XSLTreference/Output/
- XSLT 2nd Edition Programmer's Reference, Michael Kay. Good reference: clear, but not really a tutorial
- XSLT and XPath On the Edge, Tennison. And her other books