Title: How Much Pain for XML's Gains?
1. How Much Pain for XML's Gains?
- Michael Champion
- Sr. Technologist, Software AG USA
- XML 2004
2. Outline
- Measuring the Pain
- Diagnosing the Causes
- Proposed Analgesics
- Just a Bunch of Snake Oil?
- Conclusions
Laocoön: "Beware of geeks bearing gifts"
3. XML Pain vs Existing Formats
- SOAP-HTTP vs RMI
- Sun's research shows 10x overhead for SOAP
- XML vs CSV
- Nicola/John show XML parsing 26x slower
- SOAP vs CDR
- Kohlhoff/Steele find SOAP 2-4x bigger, 8-10x slower
- SOAP vs FIX
- Kohlhoff/Steele find SOAP 3-4x bigger, 9x slower
- WS-Security vs SSL
- Don Box asserts 10x slower
- Generally, people find that XML imposes an overhead of about an order of magnitude
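The ratios above come from the cited studies and are not reproduced here. As a rough way to check the order-of-magnitude claim against your own data, a harness like the following parses the same records from CSV and from element-tagged XML and compares best-of-N wall-clock times. The record layout (`id`, `symbol`, `price`) and the choice of Python's standard `csv` and `xml.etree.ElementTree` modules are assumptions for illustration; the ratio it prints will vary with the parser, the data, and the machine.

```python
import csv
import io
import time
import xml.etree.ElementTree as ET

def make_records(n):
    """Synthetic rows; the (id, symbol, price) layout is hypothetical."""
    return [(str(i), f"SYM{i % 50}", f"{i * 0.25:.2f}") for i in range(n)]

def to_csv(rows):
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()

def to_xml(rows):
    parts = ["<orders>"]
    for oid, sym, price in rows:
        parts.append(f"<order><id>{oid}</id><symbol>{sym}</symbol>"
                     f"<price>{price}</price></order>")
    parts.append("</orders>")
    return "".join(parts)

def parse_csv(text):
    return [tuple(row) for row in csv.reader(io.StringIO(text))]

def parse_xml(text):
    # Builds a full tree of node objects before any field is read.
    root = ET.fromstring(text)
    return [(o.findtext("id"), o.findtext("symbol"), o.findtext("price"))
            for o in root]

def best_time(fn, arg, repeat=5):
    """Best-of-N wall-clock time in seconds, to damp scheduling noise."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        fn(arg)
        best = min(best, time.perf_counter() - start)
    return best

if __name__ == "__main__":
    rows = make_records(10_000)
    csv_text, xml_text = to_csv(rows), to_xml(rows)
    # Both formats must carry exactly the same information.
    assert parse_csv(csv_text) == parse_xml(xml_text) == rows
    ratio = best_time(parse_xml, xml_text) / best_time(parse_csv, csv_text)
    print(f"size: {len(xml_text) / len(csv_text):.1f}x, parse time: {ratio:.1f}x")
```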
4. Diagnosing the Pain
- Element and attribute labels and namespace declarations create the bloat
- Performance bottlenecks include
- Well-formedness checking
- Unicode character conversion
- Char by char string processing
- Node object construction
- Entity reference expansion
- Not to mention schema validation!
- The issue is not text vs binary
- But XML's particular constraints
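To see where the bytes go, the sketch below serializes one record three ways (comma-separated values, element-tagged XML, and the same XML with a namespace prefix and declaration) and reports what fraction of each serialization is actual field data. The record, the element names, and the namespace URI `urn:example:orders` are all hypothetical.

```python
import xml.etree.ElementTree as ET

RECORD = {"id": "42", "symbol": "IBM", "price": "97.25"}  # hypothetical record

def as_csv(rec):
    return ",".join(rec.values())

def as_xml(rec, prefix="", ns_decl=""):
    # Every value is wrapped in a start tag and an end tag that repeat its label.
    inner = "".join(f"<{prefix}{k}>{v}</{prefix}{k}>" for k, v in rec.items())
    return f"<{prefix}order{ns_decl}>{inner}</{prefix}order>"

def payload_share(doc, rec):
    """Fraction of the serialized characters that are field values."""
    return sum(len(v) for v in rec.values()) / len(doc)

def variants(rec):
    return {
        "csv": as_csv(rec),
        "xml": as_xml(rec),
        "xml+ns": as_xml(rec, prefix="o:",
                         ns_decl=' xmlns:o="urn:example:orders"'),
    }

if __name__ == "__main__":
    for name, doc in variants(RECORD).items():
        if name != "csv":
            ET.fromstring(doc)  # sanity check: the XML variants are well-formed
        print(f"{name:7} {len(doc):3d} chars, "
              f"{payload_share(doc, RECORD):.0%} payload")
```

Because label cost is fixed per field regardless of value size, the relative overhead is worst for small, field-rich messages such as SOAP requests.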
5. Where Does It Hurt?
- Wireless industry
- XML bandwidth requirements excessive
- Maps
- Images
- Enterprise Transaction Processing
- SOAP-based messaging
- Multiple parse-serialize steps
- XML-aware routers, firewalls,...
- See Binary XML WG Use Case doc!
6. Relieving the Pain
- Moore's Law?
- Doesn't apply to batteries!
- Wireless bandwidth constrained by fundamental physical laws
- In military scenarios, least power/bandwidth at the pointy end
- GZIP?
- Not for small documents
- Considerable processing overhead
- Only improves user latency in low-bandwidth, big-CPU scenarios
- Better code?
- IBM and MS seem convinced that parsers can be much faster
- Doesn't help with bandwidth
Acetylsalicylic Acid (Aspirin)
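The small-document problem follows from GZIP's fixed framing: every member carries a 10-byte header and an 8-byte trailer (RFC 1952) before any compression happens, and DEFLATE finds little repetition to exploit in a short message. The sketch below, using Python's standard `gzip` module and a hypothetical order record, compares sizes for a tiny fragment, one record, and a thousand records.

```python
import gzip

# Hypothetical single-order document used only for illustration.
RECORD = b"<order><id>42</id><symbol>IBM</symbol><price>97.25</price></order>"

def gzip_ratio(data: bytes) -> float:
    """Compressed size / original size; above 1.0 means GZIP made it bigger."""
    return len(gzip.compress(data)) / len(data)

SAMPLES = {
    "tiny fragment": b"<id>42</id>",
    "one record": RECORD,
    "1000 records": b"<orders>" + RECORD * 1000 + b"</orders>",
}

if __name__ == "__main__":
    for name, data in SAMPLES.items():
        packed = gzip.compress(data)
        print(f"{name:13} {len(data):6d} -> {len(packed):5d} bytes "
              f"(ratio {gzip_ratio(data):.2f})")
```

Large, repetitive documents compress very well; short messages gain little or grow, and both ends still pay the CPU cost of compressing and decompressing.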
7. More Proposed Analgesics
- Hardware Acceleration?
- Not much real-world info found
- Ask DataPower, Sarvega, Tarari, ...
- Format Simplification?
- SOAP forbids DTDs
- Obvious interoperability issues!
- Binary Infoset Serializations?
- Much experimentation in the wild
- W3C investigating the value of standards
- Assuming a shared schema gives the best technical results
- No shared schema has the best use cases, but only a 3-5x speedup over XML text
- Hybrid approaches such as VTD-XML
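The difference between the two binary cases is what has to travel with the data. The sketch below encodes a hypothetical order record with Python's `struct` module in two ways. With a shared schema, both ends agree on field order and types in advance, so only fixed-width values are sent. Without one, a simple self-describing encoding must carry every field name (length-prefixed) alongside its value. Both layouts are invented for illustration and are not any proposed W3C or vendor format.

```python
import struct
from typing import NamedTuple

class Order(NamedTuple):
    id: int
    symbol: str
    price: float

# Hypothetical shared schema: little-endian uint32 id, 8-byte ASCII symbol
# (NUL-padded by struct), float64 price. No labels go on the wire.
ORDER_SCHEMA = struct.Struct("<I8sd")

def encode_shared(order: Order) -> bytes:
    symbol = order.symbol.encode("ascii")
    if len(symbol) > 8:
        raise ValueError("symbol does not fit the 8-byte schema field")
    return ORDER_SCHEMA.pack(order.id, symbol, order.price)

def decode_shared(data: bytes) -> Order:
    oid, symbol, price = ORDER_SCHEMA.unpack(data)
    return Order(oid, symbol.rstrip(b"\0").decode("ascii"), price)

def encode_self_describing(order: Order) -> bytes:
    # No shared schema: each field is sent as name length, name, value length, value.
    out = bytearray()
    for name, value in zip(Order._fields, order):
        for token in (name.encode("ascii"), str(value).encode("ascii")):
            out.append(len(token))  # assumes every token is shorter than 256 bytes
            out += token
    return bytes(out)

def encode_xml(order: Order) -> bytes:
    return (f"<order><id>{order.id}</id><symbol>{order.symbol}</symbol>"
            f"<price>{order.price}</price></order>").encode("ascii")

if __name__ == "__main__":
    order = Order(42, "IBM", 97.25)
    assert decode_shared(encode_shared(order)) == order
    for label, enc in [("shared schema", encode_shared),
                       ("self-describing", encode_self_describing),
                       ("XML text", encode_xml)]:
        print(f"{label:15} {len(enc(order)):3d} bytes")
```

The shared-schema form also avoids tag matching and text-to-number conversion on decode, but it fails as soon as the two sides disagree about the schema, which is the interoperability cost of that approach.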
8. Lots of Second Opinions
Premature optimization is the root of all evil
XML is about interop, stupid!!!
Fix your XML; don't expect standards to accommodate your bad practices
If XML doesn't fit your needs, avoid it; don't pollute it for the rest of us
9. Binary XML Snake Oil?
- Binary XML is an oxymoron
- There is ALREADY an unmanageable number of XML variants; we don't need more.
- A one-size-fits-all binary format is a pipe dream
- Industry-specific binary standards are fine; a W3C core standard is premature
- Better to invest in optimizing tools for existing formats
10. Facts Not In Serious Dispute
10x XML overhead vs app-specific formats
Wireless needs what XML offers but with less overhead
GZIP is not the cure for bandwidth pain
Moore's Law does not apply to batteries or wireless networks
It's a user-perceived delay problem, not a bandwidth problem
Overhead is probably NOT a problem for the majority of existing XML users!
11. Perfect Storm of XML Politics?
Binary XML vs XML text
Infoset vs bits on the wire
Subsets / Profiles vs Complete Recommendations
12. Personal Assessment
- Convenience has always come at a performance cost
- Convenience eventually wins
- Right now, XML text overhead inhibits adoption in niches
- REAL pain in niches that the XML family could address
- This is a genuine dilemma for W3C and mainstream vendors
- XML is NOT ubiquitous where it causes more pain than gain
Laocoön: punished for speaking unpleasant truths
13. Recommendations
- Don't deny unpleasant facts about XML pains
- Moore's Law won't make them all go away
- Be smart about XML
- "Doctor, doctor, it hurts when I do stupid thing X!"
- Densely code information if bandwidth is an issue
- Use the right tool for the speed/convenience tradeoff
- Consider hybrid formats such as VTD-XML
- Don't use off-the-shelf XML when you need a database
- Let enterprise-class tools do the heavy lifting
- DBMS, middleware, inference engines ...
- Specialized XML processing hardware
- Leave technology evolution to Darwin, not Berners-Lee
- Mature standards good, premature standardization bad. Problems and solutions will find each other!
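The idea behind hybrid formats such as VTD-XML is to keep the original XML text and build a compact index of token positions instead of a tree of node objects. The sketch below is a drastically simplified illustration of that idea, not the VTD-XML API or its record layout. It assumes a well-formed document with no attributes, comments, processing instructions, CDATA sections, or entity references, and records only start-tag names and text runs as (kind, offset, length, depth) tuples. The names `build_index` and `find_text` are hypothetical.

```python
import re
from typing import List, NamedTuple, Optional

class Token(NamedTuple):
    kind: str    # "start" (element name) or "text"
    offset: int  # character offset into the original document
    length: int
    depth: int   # nesting depth of the owning element (root = 0)

# Assumption: tags are plain <name> / </name> with no attributes.
TAG = re.compile(r"<(/?)([A-Za-z_][\w.-]*)>")

def build_index(doc: str) -> List[Token]:
    """Scan once, recording where tokens are instead of copying them out."""
    tokens: List[Token] = []
    depth, pos = -1, 0
    for m in TAG.finditer(doc):
        if doc[pos:m.start()].strip():  # non-whitespace text before this tag
            tokens.append(Token("text", pos, m.start() - pos, depth))
        if m.group(1):                  # end tag closes the current element
            depth -= 1
        else:                           # start tag opens a deeper element
            depth += 1
            tokens.append(Token("start", m.start(2), len(m.group(2)), depth))
        pos = m.end()
    return tokens

def token_text(doc: str, tok: Token) -> str:
    # Strings are materialized only when a caller asks for them.
    return doc[tok.offset:tok.offset + tok.length]

def find_text(doc: str, tokens: List[Token], name: str) -> Optional[str]:
    """Leading text of the first element called `name`, or None."""
    for i, tok in enumerate(tokens):
        if tok.kind == "start" and token_text(doc, tok) == name:
            nxt = tokens[i + 1] if i + 1 < len(tokens) else None
            if nxt is not None and nxt.kind == "text" and nxt.depth == tok.depth:
                return token_text(doc, nxt)
            return None
    return None

if __name__ == "__main__":
    doc = "<order><id>42</id><symbol>IBM</symbol><price>97.25</price></order>"
    index = build_index(doc)
    print(len(index), "tokens; symbol =", find_text(doc, index, "symbol"))
```

Because the document text is never copied into node objects, lookups return slices of the original buffer. VTD-XML itself packs each descriptor into a fixed-width integer, which this sketch does not attempt.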
14. Further Reading
- Proceedings of W3C Workshop on Binary Interchange of XML Information Item Sets. http://www.w3.org/2003/08/binary-interchange-workshop/Report.html
- Sun Microsystems position paper to W3C Workshop: Fast Web Services
- Matthias Nicola, Jasmi John. XML Parsing: A Threat to Database Performance. CIKM '03
- Christopher Kohlhoff, Robert Steele. Evaluating SOAP for High Performance Business Applications.
- Jimmy Zhang. Better, Faster XML Processing with VTD-XML. http://www.devx.com/xml/Article/22219
- Michael Leventhal. Binary Showdown. XML Journal, September 2003
- W3C XML Binary Characterization Use Cases. http://www.w3.org/TR/xbc-use-cases/