Efficient XML Interchange - PowerPoint PPT Presentation

Slides: 25
Provided by: Office20041654

1
Efficient XML Interchange
2
XML
  • Why is XML good?
  • A widely accepted standard for data
    representation
  • Fairly simple format
  • Flexible
  • It's not used by everyone, but it's used by
    enough people to support a rich tools
    environment
  • It's flexible enough to be used in lots of
    contexts
  • It's text-based and human-readable, which makes
    it a good archival format

3
XML
  • XML in 10 points
  • http://www.w3.org/XML/1999/XML-in-10-Points
  • Includes point (3), "XML is text, but isn't meant
    to be read", and point (4), "XML is verbose by design"
  • XML can be read by humans (but shouldn't need to be),
    and it is not very compact

4
XML
  • These design principles also make it very
    difficult to use XML in some environments:
  • Wireless military links: low bandwidth
  • Mobile devices: battery-life limitations
  • Processing efficiency: parsing XML can take
    a lot of CPU cycles
  • Data binding

5
Limitations
  • Many ships have 64 kbit/s links at best, so it is
    problematic to ship XML across them
  • CPUs are on the Moore's law curve, but battery power
    is limited by the state of chemistry. We can't
    assume that faster processors will save us. Many
    applications run on handheld devices with
    limited battery power (cell phones, etc.)
  • Cell phones don't necessarily have strong CPUs,
    so parsing XML can be expensive relative to other
    tasks
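Here is a back-of-the-envelope Java sketch that puts the bandwidth problem in numbers. It estimates how long the 1000-PDU DIS documents from the Results slides would take to send over a 64 kbit/s link. It assumes the link delivers its full nominal rate with no protocol overhead or retransmissions, which a real link will not. The class name LinkTime is only for illustration.

```java
// Back-of-the-envelope transfer time over a 64 kbit/s link.
// Assumes the full nominal rate is available: no protocol overhead, no retransmissions.
final class LinkTime {
    // 64 kbit/s expressed in bits per second
    static final double LINK_BITS_PER_SECOND = 64_000.0;

    // Seconds needed to send the given number of bytes at the nominal rate
    static double seconds(long bytes) {
        return bytes * 8 / LINK_BITS_PER_SECOND;
    }

    public static void main(String[] args) {
        // 1000-PDU sizes from the DIS results table, in bytes
        long xmlBytes = 3_924_680L;
        long exiBytes = 365_564L;
        System.out.printf("Text XML: %.1f s%n", seconds(xmlBytes));
        System.out.printf("EXI:      %.1f s%n", seconds(exiBytes));
    }
}
```

Under those assumptions the text XML takes about 490 seconds (over eight minutes). The EXI version takes about 46 seconds.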

6
Data Binding
  • This is a more subtle problem.
  • <Point x="1.0" y="2.0"/>
  • How do you convert this to an object? You need to
    parse the string "1.0", then convert it to a
    binary representation
  • It's the difference between
  • String x
  • and
  • float x
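Here is a minimal Java sketch of that binding step. It assumes Java 16 or later for the record type, and it assumes the parser hands over attribute values as a name-to-string map. PointBinding and bind are hypothetical names. The point is only that every numeric field needs an explicit string-to-float conversion.

```java
import java.util.Map;

// Illustrative data-binding sketch: a text XML parser reports attribute values
// as strings, but the application object wants floats.
final class PointBinding {
    // Target object with numeric fields rather than strings
    record Point(float x, float y) {}

    // Hypothetical binder: converts each attribute string to a float
    static Point bind(Map<String, String> attrs) {
        float x = Float.parseFloat(attrs.get("x")); // "1.0" -> 1.0f
        float y = Float.parseFloat(attrs.get("y")); // "2.0" -> 2.0f
        return new Point(x, y);
    }

    public static void main(String[] args) {
        // Attribute values of <Point x="1.0" y="2.0"/> as a parser would report them
        Point p = bind(Map.of("x", "1.0", "y", "2.0"));
        System.out.println(p); // Point[x=1.0, y=2.0]
    }
}
```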

7
Data Binding
  • Typically something comes in from the wire, and
    you have to do the Java equivalent of
  • Float.parseFloat("1.0")
  • This is expensive when working with numeric-heavy
    documents
  • It is much more efficient to keep the value X in
    a binary representation in the document, then
    simply read it on the receiving side
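The standard java.nio.ByteBuffer class can sketch this contrast. One path carries a float as text and parses it on arrival. The other carries the float's 4-byte IEEE 754 form and reads it back directly. This shows the general idea only. It is not EXI's wire format, which uses its own float representation.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Contrast: carrying a float as text versus as raw IEEE 754 bytes.
// Illustrative only; this is not EXI's wire format.
final class BinaryFloat {
    // Text route: send characters, then parse them back into a float on arrival
    static float viaText(float value) {
        byte[] wire = Float.toString(value).getBytes(StandardCharsets.UTF_8);
        String received = new String(wire, StandardCharsets.UTF_8);
        return Float.parseFloat(received); // string -> float conversion on every read
    }

    // Binary route: send the 4-byte IEEE 754 form and read it back directly
    static float viaBinary(float value) {
        byte[] wire = ByteBuffer.allocate(Float.BYTES).putFloat(value).array();
        return ByteBuffer.wrap(wire).getFloat(); // no text parsing
    }

    // Number of bytes the text route puts on the wire
    static int textSize(float value) {
        return Float.toString(value).getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        float x = 1234.5678f;
        System.out.println("text bytes: " + textSize(x) + ", binary bytes: " + Float.BYTES);
        System.out.println(viaText(x) == viaBinary(x)); // true
    }
}
```

Both routes recover the same value. The binary route skips the character-to-number conversion on every read.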

8
Efficient XML Interchange
  • EXI relaxes some of the requirements of XML in
    order to be more compact, faster to parse, and
    have better data binding characteristics
  • Relax the human-readable requirement
  • Allow binary data
  • What you get is an alternate encoding of the XML
    infoset that is more compact, faster to parse,
    and lets XML be deployed in environments where
    it previously could not be

9
EXI
  • EXI is being developed by a W3C working group and
    is on the standards track. The hope is that it
    will become a W3C-blessed encoding of the XML
    infoset
  • The working group draft is now working its way
    toward approval.
  • Approval needs multiple implementations, sign-off
    from the W3C Technical Architecture Group, and
    approval from other W3C working groups
    (encryption, processors, etc.)

10
EXI
  • Represents the same data as an XML document,
    only in a more efficient encoding
  • Minimal impact on other XML technologies, such
    as encryption
  • More efficient to parse, with better data-binding
    performance

11
EXI
  • http://www.w3.org/XML/EXI
  • Includes the format specification, an EXI
    primer, and best practices
  • Note that one thing that is NOT specified is an
    API for accessing the data. This is a significant
    omission
  • Without a standardized typed API, we still
    have to go through string representations

12
Typed API
  • What is meant by a typed API?
  • DOM and SAX return string values
  • Attr anAttribute;
  • // DOM returns a String attribute value here
  • String val = anAttribute.getValue();
  • And then we need to convert val into a float via
  • Float aFloat = Float.parseFloat(val);
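Here is a runnable version of the DOM snippet above. It uses only the JDK's built-in javax.xml.parsers and org.w3c.dom classes. The class and method names (DomStringValues, readX) are illustrative. As on the slide, the attribute value still arrives as a String and has to be converted by hand.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// DOM gives attribute values back as Strings; the caller converts them.
final class DomStringValues {
    static float readX(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Element root = doc.getDocumentElement();
        Attr anAttribute = root.getAttributeNode("x");
        String val = anAttribute.getValue(); // DOM returns text, e.g. "1.0"
        return Float.parseFloat(val);        // explicit string -> float step
    }

    public static void main(String[] args) throws Exception {
        System.out.println(readX("<Point x=\"1.0\" y=\"2.0\"/>")); // 1.0
    }
}
```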

13
Typed API
  • But what we often want is the value in the type
    specified by the schema
  • Float aFloat = anAttribute.getFloat();
  • There are proposals for a generalized typed API,
    but it is not part of this standard
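For illustration only, here is what such a typed accessor might look like in Java. TypedAttr and DecodedFloatAttr are hypothetical names and are not part of DOM, SAX, or the EXI specification. The sketch assumes a decoder that already holds the value as a float, as a schema-informed decoder could, so the caller never sees a string.

```java
// Hypothetical typed-attribute API: NOT part of DOM, SAX, or the EXI specification.
final class DecodedFloatAttr implements TypedAttr {
    private final String name;
    private final float value; // assumed already decoded from a binary representation

    DecodedFloatAttr(String name, float value) {
        this.name = name;
        this.value = value;
    }

    @Override public String getName() { return name; }
    @Override public float getFloat() { return value; }

    public static void main(String[] args) {
        TypedAttr anAttribute = new DecodedFloatAttr("x", 1.0f);
        Float aFloat = anAttribute.getFloat(); // the call the slide wishes DOM offered
        System.out.println(anAttribute.getName() + " = " + aFloat); // x = 1.0
    }
}

// The hypothetical interface: callers ask for a float and get one, with no string step
interface TypedAttr {
    String getName();
    float getFloat();
}
```

With this design, type conversion happens once inside the decoder rather than in every application that reads the attribute.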

14
EXI
  • EXI has several options to handle different
    situations.
  • You have an XML document and a schema
  • You have an XML document but no schema
  • You have an XML document, and a schema that
    almost, but not quite, matches the document

15
Element and Attribute Names
  • Tag names take up a lot of space, and can be
    somewhat expensive to parse
  • <Name first="James" last="Madison">
  • <State>Virginia</State>
  • </Name>
  • Count up the characters used for markup
    here: 31/55, so 50-60% of the file size is markup tags
  • If we replace the tag names with numeric
    stand-ins, the document becomes much more compact
    and faster to parse
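Here is a small Java sketch of the numeric stand-in idea. It uses the JDK's built-in StAX parser (javax.xml.stream) to walk the example above and gives each distinct element or attribute name an integer ID the first time it appears. The class NameIds is hypothetical and deliberately simplified. Real EXI also handles namespace URIs and prefixes, and in schema-informed mode it can pre-populate its name tables from the schema instead of learning them from the document.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

// Sketch: replace element and attribute names with small integer IDs.
// Simplified illustration, not EXI's actual name encoding.
final class NameIds {
    final Map<String, Integer> ids = new LinkedHashMap<>();

    // Returns the ID for a name, assigning the next free one on first sight
    int idFor(String name) {
        Integer id = ids.get(name);
        if (id == null) {
            id = ids.size();
            ids.put(name, id);
        }
        return id;
    }

    // Emits one ID per element or attribute name occurrence. End tags carry no
    // name here, because a decoder can close elements by nesting.
    List<Integer> encodeNames(String xml) throws Exception {
        List<Integer> out = new ArrayList<>();
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        while (r.hasNext()) {
            if (r.next() == XMLStreamConstants.START_ELEMENT) {
                out.add(idFor(r.getLocalName()));
                for (int i = 0; i < r.getAttributeCount(); i++) {
                    out.add(idFor(r.getAttributeLocalName(i)));
                }
            }
        }
        r.close();
        return out;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Name first=\"James\" last=\"Madison\"><State>Virginia</State></Name>";
        NameIds enc = new NameIds();
        System.out.println(enc.encodeNames(xml)); // [0, 1, 2, 3]
        System.out.println(enc.ids);
    }
}
```

In a longer document with many Name records, each record would repeat the small integers 0 to 3 instead of the tag text.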

16
Schema-Informed
  • If you have a schema, it gives you type
    information about the XML document. You know that
    in <foo x="1.0"/> the x is a float value
    rather than a string, because the schema tells
    you so.
  • That means you can store the 1.0 value in a
    binary format, which is generally more compact
    and can give better data binding
    with a typed API

17
Schemaless
  • What if you don't have a schema? Then you
    can't exploit type information. But EXI should
    support this situation, because it is meant to be a
    general solution
  • EXI handles this by replacing repeated strings
    with a compact identifier

18
Schemaless
  • <Address town="Monterey" zip="93943"/>
  • The strings Monterey and the zip code are
    likely to be repeated many times in an XML
    document. We can create a table of these values,
    and then use the table ID rather than the whole
    string

String      ID
Monterey    1
93940       2
San Jose    3
98842       5
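Here is a minimal Java sketch of the string-table idea. The first time a value appears, it is sent literally and added to the table. Every later occurrence is sent as its table ID. ValueTable and the "lit:"/"id:" output labels are hypothetical. EXI's actual rules, with separate per-name and global value tables and optional capacity limits, are more detailed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of a value string table: first occurrence is sent literally and
// added to the table; later occurrences are sent as the table ID.
final class ValueTable {
    private final Map<String, Integer> table = new HashMap<>();

    // Returns "lit:<value>" on a miss or "id:<n>" on a hit
    String encode(String value) {
        Integer id = table.get(value);
        if (id != null) {
            return "id:" + id;
        }
        table.put(value, table.size() + 1); // IDs start at 1, as in the table above
        return "lit:" + value;
    }

    public static void main(String[] args) {
        ValueTable vt = new ValueTable();
        List<String> out = new ArrayList<>();
        for (String v : new String[] {"Monterey", "93940", "Monterey", "San Jose", "93940"}) {
            out.add(vt.encode(v));
        }
        System.out.println(out); // [lit:Monterey, lit:93940, id:1, lit:San Jose, id:2]
    }
}
```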
19
Almost Schemas
  • If you have a document that doesn't quite match
    the schema, EXI can take a forgiving attitude. It
    uses the schema to encode the types it knows
    about, and uses strings and string table
    identifiers to handle the ones not described by
    the schema
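Here is a toy Java encoder that shows this forgiving behavior. A hypothetical schema, modeled as a map from attribute name to type, says x is a float, so x is written as 4 binary bytes. An attribute the schema does not describe falls back to a plain string. The one-byte tag and the use of DataOutputStream are assumptions for illustration. They are not EXI's actual layout, which uses grammar-driven event codes. A real encoder would also run the fallback strings through the string table.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Map;

// Toy schema-informed encoder with a schemaless fallback (not EXI's bit layout).
final class AlmostSchema {
    enum Type { FLOAT }

    static byte[] encodeAttribute(Map<String, Type> schema, String name, String value)
            throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        if (schema.get(name) == Type.FLOAT) {
            out.writeByte(1);                        // tag: typed float follows
            out.writeFloat(Float.parseFloat(value)); // 4 binary bytes
        } else {
            out.writeByte(0);                        // tag: plain string follows
            out.writeUTF(value);                     // 2-byte length + characters
        }
        out.flush();
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        Map<String, Type> schema = Map.of("x", Type.FLOAT); // schema knows x, not "note"
        System.out.println(encodeAttribute(schema, "x", "1.0").length);           // 5
        System.out.println(encodeAttribute(schema, "note", "unexpected").length); // 13
    }
}
```

Here the typed x costs 5 bytes (tag plus float). The unexpected note attribute costs 13 bytes (tag, 2-byte length, and 10 characters).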

20
Implementations
  • As of now there is one implementation of the
    draft spec, Efficient XML from Agile Delta
    (http://www.agiledelta.com)
  • Other open-source and commercial projects are
    underway
  • The standards process requires that multiple
    independent implementations be available before
    the standard is approved

21
Results
  • Example: Distributed Interactive Simulation (DIS)
    is an IEEE standard for modeling and simulation.
    It is a binary standard whose messages contain
    (x,y,z) coordinates, velocity, acceleration, and
    other numeric-heavy data
  • We created an XML representation of the binary
    DIS standard

22
Results
             DIS binary (bytes)   DIS XML (bytes)   EXI (bytes)
1 PDU                       144             1,167           129
1000 PDUs               464,480         3,924,680       365,564
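A few lines of Java arithmetic over the byte counts in the table can check the size comparison. Nothing beyond those numbers is assumed, and DisRatios is only an illustrative name.

```java
// Size ratios computed from the DIS results table (all values in bytes).
final class DisRatios {
    static double ratio(long a, long b) {
        return (double) a / b;
    }

    public static void main(String[] args) {
        long bin1 = 144, xml1 = 1_167, exi1 = 129;
        long binK = 464_480, xmlK = 3_924_680, exiK = 365_564;
        System.out.printf("1 PDU:     EXI/binary = %.2f, XML/EXI = %.1fx%n",
                ratio(exi1, bin1), ratio(xml1, exi1));
        System.out.printf("1000 PDUs: EXI/binary = %.2f, XML/EXI = %.1fx%n",
                ratio(exiK, binK), ratio(xmlK, exiK));
    }
}
```

EXI comes out about 10% smaller than the binary encoding for a single PDU and about 21% smaller for 1000 PDUs. It is roughly 9 to 11 times smaller than the text XML.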
23
Results
  • EXI is somewhat smaller than the original binary
    format. The exact EXI size varies somewhat depending
    on the numeric data, while the original binary
    format is always the same size. EXI seems to be
    consistently smaller, though
  • AND it is marked up in a way that makes it
    equivalent to an XML file. This means we can
    use all the tools of the XML ecosystem
    by simply converting it to a text XML
    representation

24
Conclusions
  • Replace all text XML with EXI? No! EXI is
    intended to extend XML into use cases
    that text XML could not serve. XML mostly does fine
    in its existing environments
  • EXI can be used to XML-ify existing binary
    protocols and get slightly better performance
    with greatly increased interoperability (no one
    knows DIS binary, everyone knows XML)
  • Next great frontier: typed XML APIs