Title: XPipe - An XML Processing Methodology
1XPipe - An XML Processing Methodology
- XML SIG, NY USA
- Feb 12, 2002
- Sean McGrath
- CTO
- Propylon
2What is XPipe?
- It is an architecture / methodology / framework
for developing robust, scalable, manageable XML
processing systems.
- Based on proven mechanical manufacturing
techniques. Specifically:
- The Assembly Line Principle
- Component assembly and component re-use
3What is XPipe?
- An open source project hosted on Sourceforge
- http://xpipe.sourceforge.net
- A contribution to the blossoming meme of using
pipeline-based processing to tame the burgeoning
complexity of XML transformations
- (If you do not find XML transformation
complicated, you are not sufficiently well
informed.)
- (And no, XSLT does not solve all your problems)
4What is XPipe?
- A way of thinking about systems that focuses on
structured dataflows rather than Object APIs
- It is also
- A Scandinavian sewage treatment technology
- An exhaust pipe system for high performance
engines
- A VT100 based strategy game for DEC's VAX/VMS
Operating System
5Contents of this talk
- The XPipe philosophy
- Major functional elements
- Some examples
- The XGrid and Commoditized XML Processing
- Some anticipated objections (and answers)
- Relationship to other technologies
6Contents of this talk
- Current status
- Current problems
- Future plans
- Some (contentious) musings
- Something cold to drink
7XPipe Philosophy
- XML is all about (potentially) complex,
hierarchical data structures
8XPipe Philosophy
Cars are complex, hierarchical structures
Henry Ford's Model T Ford Assembly Line, 1914
9XPipe Philosophy
Lunch is a complex, hierarchical structure
Lunch Assembly Line. NY, 2002
10XPipe Philosophy
We are complex, hierarchical structures
11XPipe philosophy
- What have these scenes got in common?
- Complex construction of cars, tuna melts and
tendons made possible and efficient through
- assembly line manufacturing
- re-usable component processes and component
materials
- Why not apply this approach to XML
manufacturing?
12XPipe philosophy
- Why does the assembly line approach work?
- Transformation task decomposition
- Re-usable transformation components
- Transformation decomposition is the key to
complexity management. Just ask
- Henry Ford
- Herbert Simon (The Two Watchmakers, The
Architecture of Complexity)
- George Miller (7 ± 2)
- Adam Smith (An Inquiry into the Nature and Causes
of the Wealth of Nations, 1776)
- Any electrical or chemical engineer.
13XPipe philosophy
- Component re-use is the key to productivity
- Ask any form of engineer (electrical, chemical
etc.) apart from software engineers
- Component re-use remains a holy grail in software
engineering
- XPipe is yet another attempt
14XPipe philosophy
- A lot of data processing for the foreseeable future
will consist of XML to XML transformation
- A lot of non-XML data processing can consist of
XML to XML transformations with the addition of
top and tail transformations
- Mantra
- Get data into XML as quickly as possible
- Keep it in XML until the last possible minute
- Bring all your XML tools to bear on solving the
data processing problem
15XPipe philosophy
[Diagram: Non-XML Input -> Top Transformation -> Input XML -> Output XML -> Tail Transformation -> Non-XML Output]
16XPipe philosophy
- The philosophy hinges on the fact that every
complex XML transformation can be broken down
into a series of smaller ones that can be chained
together
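The chaining idea can be sketched in a few lines of Python. This is only an illustration under assumptions: the two stages (strip_whitespace, count_children), the run_pipe helper and the "items" attribute are hypothetical names invented here, not part of XPipe. Each stage takes XML text and returns XML text, so stages can be chained in any order.

```python
import xml.etree.ElementTree as ET
from functools import reduce


def strip_whitespace(xml_text):
    """Stage 1 (hypothetical): trim whitespace around all text nodes."""
    root = ET.fromstring(xml_text)
    for el in root.iter():
        if el.text:
            el.text = el.text.strip()
        if el.tail:
            el.tail = el.tail.strip()
    return ET.tostring(root, encoding="unicode")


def count_children(xml_text):
    """Stage 2 (hypothetical): record the child count on the root element."""
    root = ET.fromstring(xml_text)
    root.set("items", str(len(root)))
    return ET.tostring(root, encoding="unicode")


def run_pipe(stages, xml_text):
    """Feed the XML text through each stage in order."""
    return reduce(lambda text, stage: stage(text), stages, xml_text)


pipe = [strip_whitespace, count_children]
print(run_pipe(pipe, "<list> <item> a </item> <item> b </item> </list>"))
# prints <list items="2"><item>a</item><item>b</item></list>
```

Because every stage has the same XML-in, XML-out shape, a stage can be tested alone or dropped into another pipe unchanged.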
17XPipe philosophy
- Only so many ways to re-arrange an XML tree
structure
- A finite number of fundamental transformations,
from which all higher order transformations can
be derived
18XPipe philosophy
- Transformation Decomposition leads to
- a series of small, manageable, stand-alone
problems with an XML input spec and an XML
output spec.
- Can build, test, use and then re-use these
transformation components
- Very team development friendly
- High cohesion, loose coupling, just like the
professor advised
19XPipe philosophy
- Pipeline approach means you can mix 'n' match
black-box components that internally use whatever
paradigm best suited the problem
- Lexical
- SAX
- DOM
- XSLT
- XDuce, Pyxie, Haskell, AF-NG
20Sample XPipe
[Diagram: a sample XPipe. Stage labels: DB / CMS; Character Set Mods; Add Doctype, validate, strip doctype; Re-arrange Elements; Validation; Stats; FTP; SQL Replace; XHTML Generate. Technology labels: Lexical, DOM, Schematron / RelaxNG / Rhino, Jython, Java, XSLT.]
21XPipe philosophy
- Assertion: developers would use a component
based approach to XML processing if they did not
have to write the plumbing (orchestration,
exception handling) themselves
- "Gee, this problem is complex. Maybe I'll do it
in multiple stages! Gee, now I have to
orchestrate the stages somehow. Batch files/shell
scripts/driver program: all ugly and error
prone. Maybe I'll just write a single program
after all."
22XPipe philosophy
- "Professional developers spend 50 percent of
their time writing plumbing" (Adam Bosworth)
- XPipe aims to look after the plumbing, letting
developers concentrate on the interesting stuff
23Philosophy Summary
- Preambles
- Make things as complex as necessary but not more
complex than necessary
- Solve all the world's problems but only one at a
time
- Don't even think about performance until it is
too late, then it will look after itself
- Only increase complexity linearly w.r.t.
functionality and only in elevator-pitch-sized
functionality quanta
24Philosophy Summary 1/2
- Data processing = data transformation w.r.t.
time.
- XML is the current runaway winner in the
self-descriptive data stakes and a very good QDDL
(Quiescent Data Description Language)
25Philosophy summary 2/2
- Inside every complex XML transformation is a
sequence of simpler XML transformations trying to
get out: a Pipe
- Decomposed transformation = new transformations +
already componentized transformations ->
Component Reuse
- Inside every graph transformation (read
workflow or business process model) is a
combination of simple Pipes trying to get out
26XPipe Philosophy
Leveled architecture: levels build on one
another but any level is usable independently of
higher levels
[Diagram: three stacked levels, each with its own In and Out: Level 2 - XRigs, Level 1 - XPipes, Level 0 - XComponents]
27Major Functional Elements XComponents
In
Out
- Developed in any language that runs on the Java
Virtual Machine (Jython, Java, XSLT, Rhino
(JavaScript) etc.)
- All XComponents are standalone programs of the
form
- Name InputXML OutputXML ErrorXML
Optional Args
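A minimal sketch of the "Name InputXML OutputXML ErrorXML Optional Args" shape, written in Python for brevity. Everything specific here is an assumption made for illustration: the function name run_component, the upper-casing behaviour, the errors/error layout of the error document and the 0/1 return codes are not taken from the XPipe schemas.

```python
import xml.etree.ElementTree as ET


def run_component(in_path, out_path, err_path, *args):
    """Hypothetical XComponent body: upper-case all element text.

    The three paths mirror InputXML, OutputXML and ErrorXML; *args stands
    in for the optional arguments (unused in this sketch). A standalone
    program would call this with sys.argv[1:].
    Returns 0 on success and 1 on failure.
    """
    errors = ET.Element("errors")  # assumed layout of the error document
    try:
        tree = ET.parse(in_path)
        for el in tree.iter():
            if el.text:
                el.text = el.text.upper()
        tree.write(out_path, encoding="utf-8", xml_declaration=True)
    except (ET.ParseError, OSError) as exc:
        ET.SubElement(errors, "error").text = str(exc)
    # The error document is always written, even when it is empty.
    ET.ElementTree(errors).write(err_path, encoding="utf-8", xml_declaration=True)
    return 1 if len(errors) else 0
```

Keeping errors in their own XML stream means a downstream stage, or a monitor, can process failures with the same tools as ordinary data.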
28Major Functional Elements - XComponents
- XComponents described in XML form. An XComponent
consists of
- Metadata (keywords etc.)
- Documentation
- Pre and Post Conditions
- Unit Tests (input, output XML stream pairs +
Pre/Post Conditions)
- Code (Java / Jython / XSLT / Exec)
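How pre/post conditions and input/output test pairs might wrap a component can be sketched as follows. This is not the XComponent descriptor format: with_contract, run_unit_tests and the lexical rename component are hypothetical helpers, and conditions are modelled as plain Python predicates.

```python
def with_contract(component, pre, post, enabled=True):
    """Wrap a text-in, text-out component with pre and post condition checks."""
    def checked(xml_text):
        if enabled and not pre(xml_text):
            raise ValueError("precondition failed")
        result = component(xml_text)
        if enabled and not post(xml_text, result):
            raise ValueError("postcondition failed")
        return result
    return checked


def run_unit_tests(component, pairs):
    """Return the (input, expected, actual) triples that did not match."""
    failures = []
    for given, expected in pairs:
        actual = component(given)
        if actual != expected:
            failures.append((given, expected, actual))
    return failures


# A deliberately naive lexical component: rename foo tags to bar.
def rename_foo(xml_text):
    return xml_text.replace("<foo>", "<bar>").replace("</foo>", "</bar>")


checked = with_contract(
    rename_foo,
    pre=lambda text: "<foo>" in text,             # input must contain a foo
    post=lambda text, out: "<foo>" not in out,    # no foo may survive
)
print(run_unit_tests(checked, [("<foo>baz</foo>", "<bar>baz</bar>")]))
# prints []
```

The enabled flag plays the role of an on/off switch, so the same component can run checked during development and unchecked in production.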
29Major Functional Elements XPipes
In
Out
- A linear assembly of XComponents that together
achieve some useful transformation function
- Described in XML
- Documentation
- Metadata (keywords etc.)
- Pre/Post conditions
- Unit Tests (input, output XML stream pairs +
Pre/Post Conditions)
- References to XComponents (URIs) which are
resolved when the XPipe is installed/executed
30Major Functional Elements XRigs
Out
In
In
Out
- An assembly of XPipes that together achieve some
useful transformation function
- Described in XML
- Documentation
- Metadata (keywords etc.)
- Pre/Post conditions
- Unit Tests (input, output XML stream pairs +
Pre/Post Conditions)
- References to XPipes (URIs) which are resolved
when the XRig is installed/executed
31Major Functional Elements
- Unit Testers
- XComponent, XPipe and XRig level Test Harnesses
- Executives
- XComponent, XPipe and XRig level Execution
Environments (on-the-fly, disk install, compiled,
web service)
- (Executing an XComponent is identical to
executing an XPipe of arity 1, which is identical to
executing an XRig of arity 1)
32Major Functional Elements
- Executives
- Uniprocessor Execution
- Executed on 1 CPU, possibly with separate threads
for each instantiated X
- Multiprocessor Execution (Vapor)
- XML based protocol to implement Job Shop work
distribution over a P2P network (XJCL)
33Major Functional Elements XPipe Monitor (Vapor)
34Major Functionality Elements Miscellany (Vapor)
- Whizzy GUI Component and Pipe Editors
- XComponent Creators
- Wrap Java, XSLT etc. into XComponent compliant
XML, Ant build target
- XComponent Proxies: pretend to be a simple
XComponent but invoke some external functionality,
from Windows DLL to SOAP end-point
- XPipe masquerading as XComponent: this could be
a very powerful paradigm
35Major Functionality Elements Miscellany (Vapor)
- Compilers / Packers
- Pack XPipes/XRigs into standalone XPipes/XRigs
for distribution (with or without an executive)
- Compile pure XSLT XPipe into a self contained
translet (self contained or as an XComponent)
- Compile away/optimize intermediate files via a
variety of tricks (Jackson Inversion, Java IO
hook, shadow marshalling etc.)
36Simple XComponent examples
- Fundamental Operation: Rename Element
- Rename
- Input: <foo>baz</foo>
- Output: <bar>baz</bar>
[Diagram: input and output element trees]
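A Rename component is small enough to sketch in full. This version uses Python's standard ElementTree rather than any XPipe object model, and the function name rename is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET


def rename(xml_text, old, new):
    """Rename every element called `old` to `new`, leaving content alone."""
    root = ET.fromstring(xml_text)
    for el in list(root.iter(old)):
        el.tag = new
    return ET.tostring(root, encoding="unicode")


print(rename("<foo>baz</foo>", "foo", "bar"))
# prints <bar>baz</bar>
```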
37Simple XComponent examples
- Fundamental Operation - Peel
- Input: <foo><bar>baz</bar></foo>
- Output: <foo>baz</foo>
[Diagram: input and output element trees]
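Peel removes a wrapper element but keeps what was inside it. The sketch below is one way to do that with Python's ElementTree; the helper names are hypothetical, and it assumes the element being peeled is never the document root.

```python
import xml.etree.ElementTree as ET


def _append_text(parent, index, text):
    """Append text to whatever precedes child position `index` in parent."""
    if not text:
        return
    if index == 0:
        parent.text = (parent.text or "") + text
    else:
        prev = parent[index - 1]
        prev.tail = (prev.tail or "") + text


def _find_child(root, tag):
    """Return (parent, index, child) for the first child element named tag."""
    for parent in root.iter():
        for index, child in enumerate(parent):
            if child.tag == tag:
                return parent, index, child
    return None


def peel(xml_text, tag):
    """Remove every non-root `tag` element, splicing its content into its parent."""
    root = ET.fromstring(xml_text)
    found = _find_child(root, tag)
    while found:
        parent, index, child = found
        parent.remove(child)
        _append_text(parent, index, child.text)
        grandchildren = list(child)
        for offset, grandchild in enumerate(grandchildren):
            parent.insert(index + offset, grandchild)
        _append_text(parent, index + len(grandchildren), child.tail)
        found = _find_child(root, tag)
    return ET.tostring(root, encoding="unicode")


print(peel("<foo><bar>baz</bar></foo>", "bar"))
# prints <foo>baz</foo>
```

Most of the code deals with ElementTree's text/tail model; the transformation itself is the three splice steps inside the loop.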
38Simple XComponent examples
- Compound Operation - Matryoshka
- Input
- <foo><bar>baz</bar></foo>
- Output
- <foo></foo><bar></bar>baz
[Diagram: input and output element trees]
39Simple XComponent examples
- KlingonCloak
- Input
- <foo><bar>baz</bar></foo>
- Output
- <tag name="foo"><tag name="bar">baz</tag></tag>
[Diagram: input and output element trees; the output nodes are labelled "tag type=foo" and "tag type=bar"]
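KlingonCloak hides each element name behind a generic element and moves the original name into an attribute. A sketch with Python's ElementTree follows; the uncloak inverse is an addition for illustration, and an existing "name" attribute on an input element would be overwritten in this simple version.

```python
import xml.etree.ElementTree as ET


def cloak(xml_text):
    """Replace every element <x> with <tag name="x">."""
    root = ET.fromstring(xml_text)
    for el in list(root.iter()):
        el.set("name", el.tag)
        el.tag = "tag"
    return ET.tostring(root, encoding="unicode")


def uncloak(xml_text):
    """Inverse of cloak (hypothetical): restore names from the name attribute."""
    root = ET.fromstring(xml_text)
    for el in list(root.iter("tag")):
        el.tag = el.attrib.pop("name")
    return ET.tostring(root, encoding="unicode")


print(cloak("<foo><bar>baz</bar></foo>"))
# prints <tag name="foo"><tag name="bar">baz</tag></tag>
```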
40Sample XComponents
- Once you start thinking in terms of Pipes,
components appear everywhere
- Regular fragmentations
- Doctype changer
- Namespace normalizer
- Character set transcoder
- Hash generator
- Architectural Forms
- RelaxNG/Schematron etc
- A validator can be thought of as a component in
an XPipe that mirrors its input on its output
41Sample XComponents
- Reading a file is an XML to XML transformation
- <file>lewisscarrol.xml</file>
- <poem><line>Twas brillig, and the slithy tomes,
did gyre and gimbal in the wave</line></poem>
42Sample XComponents
- Arithmetic is an XML to XML transformation
- <expr>1 + 2</expr>
- <res>3</res>
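Arithmetic as a transformation can be sketched as a component that reads an expr element holding an infix expression such as 1 + 2 and emits a res element. The function names and the choice of supported operators are assumptions; Python's ast module is used so that only plain arithmetic is evaluated.

```python
import ast
import operator
import xml.etree.ElementTree as ET

# Operators this sketch is willing to evaluate.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _evaluate(node):
    """Evaluate a parsed expression made only of numbers and + - * /."""
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_evaluate(node.left), _evaluate(node.right))
    raise ValueError("unsupported expression")


def arithmetic(xml_text):
    """Turn <expr>...</expr> into <res>...</res>."""
    expr = ET.fromstring(xml_text)
    value = _evaluate(ast.parse(expr.text.strip(), mode="eval").body)
    res = ET.Element("res")
    res.text = str(value)
    return ET.tostring(res, encoding="unicode")


print(arithmetic("<expr>1 + 2</expr>"))
# prints <res>3</res>
```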
43Sample XComponents
- Unix pipe utilities e.g. tr
- hello world
- HELLO WORLD
44Sample XComponents
- Conditionals are XML to XML transformation tee
junctions triggered by XPaths
[Diagram: In -> "if XPath" junction, with an "if XPath TRUE" branch and an "if XPath FALSE" branch]
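A tee junction can be sketched as a function that tests the document against a path expression and hands the unchanged XML to one of two downstream branches. The name tee and the branch callables are hypothetical; note that Python's ElementTree supports only a subset of XPath, which is enough for this illustration.

```python
import xml.etree.ElementTree as ET


def tee(xml_text, xpath, true_branch, false_branch):
    """Route the document to true_branch if xpath matches, else false_branch."""
    root = ET.fromstring(xml_text)
    matched = root.find(xpath) is not None
    branch = true_branch if matched else false_branch
    return branch(xml_text)


# Two stand-in branches that just label what they receive.
def express(xml_text):
    return ("express", xml_text)


def standard(xml_text):
    return ("standard", xml_text)


print(tee("<order><rush /></order>", "./rush", express, standard)[0])
# prints express
```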
45Validation as an XComponent
[Diagram: a RelaxNG / Schematron / Jython / Java / JACL XComponent with XML A on its Input, XML A on its Output, and a Validation Log on Error]
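A validator that mirrors its input can be sketched as below. The rule being checked (a list of required child elements) is a stand-in chosen to keep the example in the standard library; a real component would delegate to RelaxNG or Schematron. The three-part return value and the decision to report only well-formedness failures as errors are assumptions.

```python
import xml.etree.ElementTree as ET


def validate(xml_text, required_children):
    """Return (output, log, error).

    output mirrors the input unchanged when it is well formed,
    log lists rule violations, and error is set only when the
    input cannot be parsed at all.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return None, [], str(exc)
    log = [
        f"<{root.tag}> is missing required child <{name}>"
        for name in required_children
        if root.find(name) is None
    ]
    return xml_text, log, None


output, log, error = validate("<poem><line>x</line></poem>", ["title", "line"])
print(log)
# prints ['<poem> is missing required child <title>']
```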
46Some related open technologies
- Unix Pipes
- SAX Filters
- TRAX
- XBeans
- Cocoon
- AxKit
- Ant
- JXTA
- Translets
- TupleSpaces
47The XGrid
- Grid Technologies: computational power on tap
(http://www.gridforum.org)
- The XGrid: computational power on tap to
execute XPipes/XRigs
48The XGrid
Out
In
Out
DMZ
49Some objections (with some answers)
- It will be slow
- No it won't
- Premature optimization is the root
of all evil!
- Speed is a three headed monster. I'm old enough
to have left the X axis and currently heading for
Y through Z
[Diagram: The 3 Axes to Speed]
50Some objections (with some answers)
- It will be slow (cont.)
- Massive Parallelism will kill all von Neumann
throughput arguments
- Documents per second, not seconds per document:
throughput is the true measure of XML processing
speed
- Document fulcra: Locality of reference (Denning)
applies to XML processing (more on this later)
- A myriad of compile time optimizations on
XPipes possible
- Keep the architecture simple and speed will
sort itself out
51Some objections (with some answers)
- Component based software? Harumph! We have heard
that one before
- XPipe is data flow based, not API based (COM, VBX,
CORBA). The payload is what is important, not
the plumbing
- Information integration (needed on the server
side), not application integration (needed on the
client side)
52Document fulcra and the scatter/gather pattern
- For any given task t to be performed on documents
conforming to schema s, there is a fragment
expression that can be used to chop any document
into n pieces on which t can be performed
independently
- These points are called fulcra and are a function
of (t,s)
53Document fulcra and scatter/gather pattern
- Having identified the fulcra:
- Chop the input document into fragments (scatter
phase)
- Perform t
- Join all the processed fragments together to
constitute the output document (gather phase)
- Three-stage XPipe: scatter and gather are (or more
accurately soon will be) standard XPipe components
54Document Fulcra
[Diagram, with a TIME axis: Input Doc -> Scatter -> n fragments -> Invoke t (one t per fragment) -> n fragments -> Gather -> Output Doc]
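The scatter, perform t, gather sequence can be sketched as follows. This is not the XPipe implementation: the function names, the use of a thread pool, and the simplifying assumption that the root element contains nothing but fulcrum elements (root attributes and other content are dropped) are all choices made for this illustration.

```python
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor


def scatter(xml_text, fulcrum):
    """Chop the document into one XML fragment per fulcrum element."""
    root = ET.fromstring(xml_text)
    fragments = []
    for el in root.findall(fulcrum):
        el.tail = None  # do not carry inter-element whitespace along
        fragments.append(ET.tostring(el, encoding="unicode"))
    return root.tag, fragments


def gather(root_tag, fragments):
    """Join processed fragments back under a single root element."""
    root = ET.Element(root_tag)
    root.extend([ET.fromstring(fragment) for fragment in fragments])
    return ET.tostring(root, encoding="unicode")


def scatter_gather(xml_text, fulcrum, task, workers=4):
    """Run task on every fragment independently, keeping document order."""
    root_tag, fragments = scatter(xml_text, fulcrum)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        processed = list(pool.map(task, fragments))  # map preserves order
    return gather(root_tag, processed)


def shout(fragment_xml):
    """Example task t: upper-case the text of one fragment."""
    el = ET.fromstring(fragment_xml)
    el.text = (el.text or "").upper()
    return ET.tostring(el, encoding="unicode")


print(scatter_gather("<recs><rec>a</rec><rec>b</rec></recs>", "rec", shout))
# prints <recs><rec>A</rec><rec>B</rec></recs>
```

Since each fragment is processed on its own, memory use is bounded by the largest fragment rather than by the whole document, and the pool could be replaced by separate machines without changing task.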
55Document Fulcra
- For data-oriented XML, the fulcra often coincide
with the record iteration in the XML schema and
may be independent of t.
- For document-oriented XML, the fulcra are much
more dependent on t.
- <Colloquial>A good fulcra-based scatter/gather
will make performance head north faster, cheaper
and with a higher upper limit than any amount of
hand-crafted, genius-level XML coding of your
transformations.</Colloquial>
56The XSLT/DOM -> SAX non sequitur
- XSLT and DOM are memory bound: trade-off between
ease of use and resource usage, ease of use
favoured
- SAX is not memory bound: trade-off between ease
of use and resource usage, low resource usage
favoured
- On xml-dev, users are often advised to rewrite their
apps using SAX! Ugh!
57XSLT/DOM -> XPipe
- XPipe and scatter/gather allow you to keep the
ease of use of XSLT/DOM with the finite resource
utilization of SAX
- As long as you can identify a good fulcrum
function
- They exist more often than not
- If they exist, they are very easily found
58Current status
- The philosophy is known to work
- Seven years a-growing in consulting company (IDM
1995, Digitome)
- Uniprocessor XPipe used to develop
- 80-C pipe from Hub notation for a complex
document type to a legacy mainframe display
notation. 120 page spec.
- 20-C pipe for semantic validation of legislation
documents
59Current Status
- Version 0.6
- Schemas for XPipes and XComponents on
xpipe.sourceforge.net. Feedback required
- Sample components (Java/XSLT/Jython) and some
documentation
- Simple, illustrative XComponent and XPipe
uniprocessor executive
60Current Status
- Object model for XComponents in Jython / Java
(David Starr)
- Object model for XPipes in Jython
- Execution, testing utilities in Jython
- Start of a NetBeans based XComponent editor
61Current Status
- Uniprocessor XPipe used to develop
- 80-C pipe from Hub notation for a complex
document type to a legacy mainframe display
notation. 120 page spec.
- 20-C pipe for semantic validation of legislation
documents
- XPipe and XComponent validators
62Current Status
- Some aspects of the XComponent model need testing
- Parameters
- Exec XComponents
- Pre/Post condition checking
- This will be a point release in late Feb. Then
focus on developing the XComponent repository in
parallel with core dev.
- Scatter/Gather raises some interesting scheduling
issues currently being grappled with
- Balance between developer-hit and ease of
execution currently in favour of low developer-hit
63Current Problems
- No GUI stuff and not enough documentation?
- Everybody agrees that an XML document is a tree
but
- The content and structure of the tree depends on
the parser
- The content and structure of re-generated XML
(The round-tripping problem)
- Roll on XML-SW!
64Current Problems
- Naming things
- Taxonomy of XTLs (XML Transformation Languages)
- Taxonomy of re-usable XComponents and XPipes
65Current Problems
- Flexible transformation scheduling is hard
- Optimal transformation scheduling is very hard
- Calling all process engineers: help!
66Future Plans
- Evangelize the idea that DTD validated XML 1.0 is
just Well Formed XML that has been through a pipe
consisting of
- A transclusion component (entity expansion)
- A macro pre-processor (conditional marked
sections)
- An attribute decorator (implied/fixed attributes)
- A grammar checker
67Valid XML
[Diagram: Well Formed XML -> Parameter Entity Expansion -> Conditional Sections -> General Entity Expansion -> Attribute Decoration -> Grammar Validation -> Valid XML]
68Future plans
- When DOCTYPE goes away (which it will), provide
all DTD functionality as a set of XComponents
69Future Plans
- Getting to the point where we can grow the
XComponent repository is priority 1
- XRigs, XPipes, and XComponents as web services
(SOAP/XML-RPC, WSDL, UDDI etc.)
- Getting the P2P and Grid Technology communities'
input into XGrid/XJCL
- See if a P2P execution environment for
XRigs/XPipes can be shortcircuited e.g. JXTA
- Getting help to develop the XPipe reference
implementation on Sourceforge
70Future Plans
- Development of commercial implementations of
XPipe integrated with leading EAI systems
(Ongoing)
- Use of SCADA tools to develop XPipe process
control and monitoring systems
- Use of UML tools to create XPipes and XRigs using
state transition diagrams
71Future Plans
- Use of Animation Engineering techniques for CAXTE
tools (Computer Aided XML Transformation
Engineering)
- Digging around swarm intelligence, hierarchy
theory, complexity theory, self-assembly,
bio-informatics and nanofabrication for concepts
and tools applicable to XML transformations
72In conclusion
- XPipe is simple
- Simplicity works!
- Plenty of evidence outside of XML engineering
that this approach will work
- Plenty of lore and tools from other fields of
science can be brought to bear to build systems
using the XPipe approach
73Musings 1 - Debugging
- XPipe is very debugging friendly
- log2(N) time required for fault diagnosis
- Probes in the form of loggers, RelaxNG
validators, easily plug-inable to a pipe to watch
what is going on.
- Pre/Post condition on/off switch is a useful
design by contract debugger
- Unit testing at Rig, Pipe and Component level
allows layer-at-a-time re-assembly after a fault
has been fixed.
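The log2(N) claim can be read as a binary search over the stages of a pipe. The sketch below makes that reading concrete under stated assumptions: the pipe input is good, the final output is bad, and once a stage breaks the data every later output stays bad. The names run_prefix, first_faulty_stage and is_good are hypothetical; is_good stands in for a probe such as a validator.

```python
def run_prefix(stages, xml_text, count):
    """Return the output of the first `count` stages of the pipe."""
    for stage in stages[:count]:
        xml_text = stage(xml_text)
    return xml_text


def first_faulty_stage(stages, xml_text, is_good):
    """Bisect for the stage that first breaks the data.

    Returns (index of the faulty stage, number of probes used).
    Invariant: output after `lo` stages is good, after `hi` stages is bad.
    """
    lo, hi = 0, len(stages)
    probes = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        probes += 1
        if is_good(run_prefix(stages, xml_text, mid)):
            lo = mid
        else:
            hi = mid
    return hi - 1, probes


def passthrough(xml_text):
    return xml_text


def buggy(xml_text):
    return xml_text.replace("ok", "BROKEN")


# An 8-stage pipe whose sixth stage (index 5) is faulty.
stages = [passthrough] * 5 + [buggy] + [passthrough] * 2
print(first_faulty_stage(stages, "<doc>ok</doc>", lambda text: "BROKEN" not in text))
# prints (5, 3)
```

If the executive keeps the intermediate files, each probe is just a check of an existing file, so the three probes here replace up to eight inspections.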
74Musings 2 - Inbetweening and XComponent
development
- Transformation analysts spec the transformation
- Only need to code new components
- Spec XComponent or XPipe with doc, pre/post
etc. but no code
- Built-in JIT-style acceptance test
- Outsource friendly and third-party market friendly
75Musing 3 - Web Services
- First generation will be a total blind alley:
RPC
- Document Oriented Messaging, not Object Oriented
Messaging: the next stage in encapsulation and
loose coupling. Something like XPipe will be a
pre-requisite.
76Musing 4 Parametric Typing of XComponents
- Numerous XComponents that do the same thing, not
necessarily duplication
- Space
- Time
- Infoset considerations
77Musing 5 Pre-validation Transformation
- Killing ourselves seeking one-shot expressivity
in schema validation languages
- Many complex validations become a lot simpler if
you do some transformation(s) first
- Co-occurrence constraints
- Contextual constraints
- Clear analog with formatting (pre-flow
transformation(s) + flow)
78Musing 6 location, location, location
- Abstraction 1: keep code and data on the same
high-speed bus (monolithic systems)
- Abstraction 2: allow code to be downloaded from
the Web; sandbox required owing to security
issues
- Abstraction 3: leave the code out there and
move the data; bandwidth issues and data >> code
79Musing 6 location, location, location
- Monolithic: bad (have to install stuff, which
is very 20th century)
- Sandbox: bad (the better the sandbox, the less
useful the code running in it.)
- XGrid: design as if data is pulled by the code
(easy model) but DMZ the code. Data: the only
thing that flows over the firewall is the
transformed data
80Thank you
- http://xpipe.sourceforge.net