Title: Inside an XSLT Processor
1. Inside an XSLT Processor
- Michael Kay, ICL
- 19 May 2000
2. About me
- ICL Fellow, systems architect
- Database background
- Developer of SAXON
- Author of XSLT Programmer's Reference, published by Wrox Press
- Recently joined XSL WG as invited expert
3. About this talk
- The XSLT Processing Model
- Structure of an XSLT Processor
- Performance
  - current limitations
  - possible ways forward
- Ideas on future development of the language
4. The XSLT Processing Model: first approximation
[Diagram: the Stylesheet and the Source Document feed the Transformation Process, which produces the Result Document]
5. The XSLT Processing Model: in more detail
[Diagram: the Stylesheet and the Source Document are each parsed into a tree (Stylesheet Tree, Source Tree); the Transformation Process builds a Result Tree, which is serialized to the Result Document]
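The three stages in the diagram (parsing, transformation, serialization) can be sketched in a few lines. This is only an illustration using Python's standard `xml.etree.ElementTree`, not an XSLT processor: the "stylesheet" is hard-coded as the hypothetical function `transform`, and the element names `doc`, `para`, `html`, `body`, and `p` are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

def transform(source_tree):
    """Stand-in for the transformation process: source tree in, result tree out."""
    html = ET.Element("html")
    body = ET.SubElement(html, "body")
    for para in source_tree.iter("para"):
        p = ET.SubElement(body, "p")   # build a node of the result tree
        p.text = para.text
    return html

def run(source_text):
    source_tree = ET.fromstring(source_text)              # parsing
    result_tree = transform(source_tree)                  # transformation
    return ET.tostring(result_tree, encoding="unicode")   # serialization
```

The point of the sketch is that the transformation itself works tree to tree; text only appears at the two ends.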
6. An XSLT Template Rule
    <xsl:template match="appendix/para1">
      <h4><xsl:number level="single"/>
        <xsl:value-of select="@title"/></h4>
      <p><xsl:apply-templates/></p>
    </xsl:template>
Labels on the slide: Pattern (the match attribute), Result Element (for example h4), XPath Expression (the select attribute), Instruction (for example xsl:apply-templates)
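To make the role of the pattern concrete, here is a minimal sketch of one common way to test a node against a pattern built only from child steps, such as the one in the match attribute above: compare the steps right to left while walking up through the node's ancestors. The helper names `parent_map` and `matches` are hypothetical, predicates and other pattern syntax are ignored, and nothing here is claimed to be how any particular processor does it.

```python
import xml.etree.ElementTree as ET

def parent_map(root):
    """ElementTree has no parent pointers, so record them once."""
    return {child: parent for parent in root.iter() for child in parent}

def matches(node, pattern, parents):
    """True if node matches a pattern of child steps such as 'appendix/para1'."""
    current = node
    # Check the last step against the node, then each earlier step
    # against the next ancestor up.
    for name in reversed(pattern.split("/")):
        if current is None or current.tag != name:
            return False
        current = parents.get(current)
    return True
```

Matching right to left means a node whose own name differs from the last step is rejected after a single comparison.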
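7. Architecture of an XSLT processor
[Diagram: components of the processor]
- Stylesheet side: Stylesheet, XML Parser, Tree Builder, XSLT compiler and XPath compiler, Compiled Stylesheet
- Source side: Source, XML Parser, Tree Builder, Source Tree
- Run time: XSLT interpreter and XPath interpreter
- Output side: Output Manager, then XML serializer, HTML serializer, or Text serializer, Result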
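The split between a compile step and a run-time step can be illustrated with a toy. In this sketch the "stylesheet" is just a hypothetical mapping from source element names to result element names; `compile_stylesheet` does its work once and returns a function that can be applied to many source documents, with the `method` argument standing in for the choice of serializer. All names are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

def compile_stylesheet(rules):
    """Toy compile step: rules maps source element names to result names."""
    rule_base = dict(rules)   # prepared once, reused for every source document

    def run(source_text, method="xml"):
        tree = ET.fromstring(source_text)        # XML parser + tree builder
        for e in tree.iter():                    # toy interpreter: rename elements
            e.tag = rule_base.get(e.tag, e.tag)
        # Output manager: pick the XML, HTML, or text serializer.
        return ET.tostring(tree, encoding="unicode", method=method)

    return run
```

Usage: `to_html = compile_stylesheet({"doc": "body", "para": "p"})`, then call `to_html(...)` for each incoming document.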
8. At compile time
- Parse and validate the stylesheet
- Parse and validate all XPath expressions and attribute value templates
- Build rule base for matching patterns
- Resolve references to named variables, functions, and templates
- Flatten the import tree
- Optimize XPath expressions
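As an illustration of the "rule base" step, one plausible approach is to index template rules by the element name in the last step of each pattern, so that at run time only a few candidate rules need a full pattern test. The sketch below assumes patterns made of simple child steps; the function names and the `(pattern, template)` pair representation are hypothetical.

```python
from collections import defaultdict

def build_rule_base(rules):
    """Index (pattern, template) pairs by the last step of the pattern."""
    rule_base = defaultdict(list)
    for pattern, template in rules:
        last_step = pattern.split("/")[-1]
        rule_base[last_step].append((pattern, template))
    return rule_base

def candidates(rule_base, element_name):
    """Rules worth testing for this element: same name, plus wildcard rules."""
    return rule_base.get(element_name, []) + rule_base.get("*", [])
```

A real rule base would also have to order candidates by import precedence and priority; that is left out here.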
9. Where does the time go?
[Chart: processing time divided among Compile Stylesheet, Build Source Tree, Process Templates, and Serialize Output]
10. Is Performance a Problem?
- Client side: usually not
  - XSLT processing is generally faster than download speed
- Server side: sometimes
  - CPU usage when handling very high throughput
  - Memory problems when handling very large documents
11. Some performance tips
- Keep documents small: split them first
- Process once, at publishing time
  - or use caching
- Do several simple transforms in series
- Avoid complex patterns in template rules
- Use keys
- Use external functions
- Avoid "//item"
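The reasoning behind the last two tips about keys and "//item" can be sketched outside XSLT. A path that starts with "//" may require visiting every element of the document each time it is evaluated, whereas a key is an index built in one pass and then consulted directly. The functions below are hypothetical stand-ins (not processor code), and the `item` element with a `code` attribute is an assumption for the example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def find_by_scan(root, code):
    """Like //item[@code = ...]: walks the whole document on every call."""
    return [e for e in root.iter("item") if e.get("code") == code]

def build_key(root, name, attribute):
    """Like a key declaration: one pass over the tree builds an index."""
    index = defaultdict(list)
    for e in root.iter(name):
        index[e.get(attribute)].append(e)
    return index
```

After `index = build_key(root, "item", "code")`, each lookup `index.get("a", [])` is a dictionary access rather than a document scan, which matters when the lookup happens once per source node.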
12. Performance progress
[Chart: throughput milestones of 20 sec/Mb, 5 sec/Mb, and 1 sec/Mb, with a "Today" marker, plotted against the two groups of techniques below]
Simple optimization:
- Stylesheet compilation
- Java code optimization
- Lazy evaluation
- Simple XPath optimization
- Tail recursion
Advanced optimization:
- Incremental parsing
- Pipelining
- Use of schema
- Pattern matching
- Full XPath optimization
- Compile to bytecodes
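Of the techniques named on this slide, lazy evaluation is easy to show in miniature. In the sketch below a node selection is a Python generator, so a consumer that only needs the first node stops the search early. The `visited` list is an instrumentation aid added for the example, and both function names are hypothetical.

```python
import xml.etree.ElementTree as ET

def select(root, name, visited):
    """Lazily yield elements called `name`, recording every node visited."""
    for e in root.iter():
        visited.append(e.tag)
        if e.tag == name:
            yield e

def first(nodes):
    """Return the first node of a lazy sequence, or None if it is empty."""
    return next(nodes, None)
```

With an eager implementation the whole node-set would be built before the first node could be taken; here the traversal stops as soon as one match has been produced.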
13. Interesting research areas
- Database integration: transforming a document without loading into memory
- Applying regular expression theory
- Execution as a sequence of serial passes
- Using schema knowledge at compile time
- Eager node numbering
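The idea of transforming a document without holding it all in memory, in a single serial pass, can be illustrated with an event-driven parse. This is a sketch under narrow assumptions: the input is a flat list of hypothetical `item` elements with a `title` attribute, and the "transformation" only extracts those titles. It uses `iterparse` from Python's standard library and discards each item once it has been handled.

```python
import io
import xml.etree.ElementTree as ET

def stream_titles(stream):
    """Yield the title of each item as soon as its end tag has been read."""
    for event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "item":
            yield elem.get("title")
            elem.clear()   # drop the item's content; only an empty shell remains

# Example: any file-like object works; a real input would be an open file.
sample = io.BytesIO(b'<list><item title="A"/><item title="B"/></list>')
titles = list(stream_titles(sample))
```

This only works because each output depends on a single item; a stylesheet that can navigate anywhere in the source tree is much harder to run this way, which is presumably why it is listed as a research area.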
14. Potential language features
- Serial transformation language?
- Multi-pass stylesheets
- Higher-level "relational" constructs: grouping, joins, logical quantifiers
- Richer data types
- Assignment statement ????
15. Summary
- XSLT language is now stable
- XSLT processor technology is starting to be well understood
- First crop of products are capable of significant performance
- Now the research needs to start on the next phase of optimization techniques