A Table-Driven Streaming XML Parsing Methodology for High-Performance Web Services

Provided by: IBMU491
Learn more at: https://ww2.cs.fsu.edu

1
A Table-Driven Streaming XML Parsing Methodology
for High-Performance Web Services
  • Wei Zhang
  • Robert van Engelen

2
Outline
  • XML Performance
  • Related Work: Schema-Specific XML Parsing (SSP)
  • Table-Driven Streaming XML Parsing (TDX)
  • Experiment Results
  • Conclusion

3
XML Performance
  • XML messaging is at the heart of Web Services
  • XML is widely seen as underperforming
  • Increasingly, XML is being used in processes that
    demand high performance
  • Performance with validation is even worse
  • Validation is typically applied during
    debugging and testing, and is often disabled
    in production systems

4
Why are traditional XML parsers slow?
5
Traditional parsers' performance issues (1)
  • Three stages of XML processing
  • Well-formedness parsing
  • Validation
  • Application data handling

[Figure: processing layers, XML -> Parsing -> Validation -> application]
6
Traditional parsers' performance issues (2)
  • Frequent access to the schema
  • Comparisons done on strings (typically inefficient)
  • Work duplicated between validator and
    deserializer
  • Repeated data format validation and conversion
    (e.g., string/integer)
  • Data copying

7
Related Work: Schema-Specific XML Parsing (1)
  • Idea
  • Constructing a parser that is hard-coded to
    process XML by exploiting schema information
  • Merging well-formedness parsing and validation

[Figure: XML -> merged parsing and validation -> application]
8
Related Work: Schema-Specific XML Parsing (2)
  • Merging parsing and validation by
  • Constructing a PDA [Chiu 03]
  • No namespace support
  • Converting from NFA to DFA may result in
    exponentially growing space requirements
  • Constructing a DFA [van Engelen 04]
  • Cannot process cyclic XML schemas
  • gSOAP toolkit [van Engelen 04]
  • Based on recursive-descent parsing
  • Not suitable for generic XML parsing without
    application data (de)serialization

9
Table-Driven Streaming XML Parsing Methodology
(TDX)
An integrated approach to XML parsing,
validation, deserialization, and even
application-specific events for high-performance
Web services
10
Table-Driven Streaming XML Parsing Methodology (1)
  • An LL(1) grammar can be generated from the schema
  • XML well-formedness can be verified
    through grammar productions
  • XML structure can be verified through grammar
    productions
  • e.g., occurrence constraints, enumeration simpleType
  • CDATA value validation can be accomplished by
    semantic actions
  • Application-specific events can also be encoded
    as semantic actions

11
An Illustrative Example (1)
Schema (abbreviated syntax)
<element name="example" type="example_type"/>
12
An Illustrative Example (1)
Schema (abbreviated syntax)
<element name="example" type="example_type"/>
Grammar
(1) s -> <example> t </example>
(2) t -> t1 t2 t3
(3) t1 -> <id> CDATA </id>        //isIdType()
(4) t2 -> <value> CDATA </value>  //isValueType()
(5) t3 -> <state> v </state>
(6) v -> ON EVENT                 //doStateOn()
(7) v -> OFF
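The grammar above can be written down as plain data, which is roughly what a table generator would consume. The following is a minimal sketch under assumptions: the `Production` type, the `derive` helper, and the way actions are attached to productions are illustrative names, not part of TDX. It expands `s` top-down and records which semantic actions would fire.

```python
from typing import List, NamedTuple, Optional


class Production(NamedTuple):
    lhs: str
    rhs: List[str]
    action: Optional[str] = None  # name of the semantic action, if any


# The seven productions from the slide; action names are taken from its comments.
GRAMMAR = [
    Production("s", ["<example>", "t", "</example>"]),
    Production("t", ["t1", "t2", "t3"]),
    Production("t1", ["<id>", "CDATA", "</id>"], "isIdType"),
    Production("t2", ["<value>", "CDATA", "</value>"], "isValueType"),
    Production("t3", ["<state>", "v", "</state>"]),
    Production("v", ["ON", "EVENT"], "doStateOn"),
    Production("v", ["OFF"]),
]
NONTERMINALS = {p.lhs for p in GRAMMAR}


def derive(symbol: str, state: str, fired: List[str]) -> List[str]:
    """Expand `symbol` top-down and return the terminal sequence.

    `state` ("ON" or "OFF") selects the alternative for v, the only
    nonterminal with two productions. Action names are appended to
    `fired` once the production's right-hand side has been expanded.
    """
    if symbol not in NONTERMINALS:
        return [symbol]
    options = [p for p in GRAMMAR if p.lhs == symbol]
    if len(options) == 1:
        prod = options[0]
    else:
        prod = next(p for p in options if p.rhs[0] == state)
    tokens: List[str] = []
    for sym in prod.rhs:
        tokens += derive(sym, state, fired)
    if prod.action:
        fired.append(prod.action)
    return tokens
```

Calling `derive("s", "ON", fired)` yields the terminal string of the document and fills `fired` with the three action names in document order; with `"OFF"` the state action is not recorded.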
13
An Illustrative Example (2)
s
  <example> t </example>
    t1 t2 t3
      <id> CDATA </id>          invoke isIdType()
      <value> CDATA </value>    invoke isValueType()
      <state> v </state>
        ON EVENT                invoke doStateOn()
Top-down parsing tree
14
TDX Architecture(1)
15
TDX Architecture(2)
16
TDX Architecture(3)
17
TDX Architecture(4)
18
TDX Modularity
  • The TDX parsing engine is schema-independent
  • Hot-swappable modules for SSP

19
TDX Construction Toolkit (1)
  • Two code generators: WSDL2TDX and LL2Table
  • Given a schema or WSDL specification, the toolkit
    automatically generates tables for the parsing engine

20
TDX Construction Toolkit (2)
  • Why two generators?
  • Application-specific events cannot be generated
    automatically
  • Splitting generation into two steps allows
    insertion of application-specific events

21
TDX Scanner/Tokenizer
  • The TDX scanner is also the runtime tokenizer
  • Why tokenization?
  • Comparisons done on tokens (more efficient)
  • Defined by component tags
  • Element names, attribute names
  • Classified as starting tags, ending tags
  • Enumeration values
  • CDATA, EVENT
  • Normalized namespace binding
  • <namespace, tag_name>

22
Scanner/Tokenizer example
<book xmlns="x.org"
xmlns:y="y.org"> <title> XML Bible </title>
<author> <name> Bob </name>
<y:title> professor </y:title>
</author> </book>
Part of tokens
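The idea of normalized namespace bindings can be sketched in a few lines. This is an illustration only, not the TDX scanner: the regular expressions, `tokenize`, and `to_token_ids` are hypothetical helpers, and the token table is built on the fly here, whereas TDX generates its tables from the schema ahead of time. The point is that `<title>` and `<y:title>` resolve to different `<namespace, tag_name>` pairs and therefore to different integer tokens, so later comparisons are integer comparisons.

```python
import re

# Simplified patterns: no comments, CDATA sections, or entity handling.
TAG = re.compile(r'<(/?)([\w:.-]+)([^<>]*?)(/?)>|([^<>]+)')
ATTR = re.compile(r'([\w:.-]+)\s*=\s*"([^"]*)"')

DOC = ('<book xmlns="x.org" xmlns:y="y.org"><title>XML Bible</title>'
       '<author><name>Bob</name><y:title>professor</y:title></author></book>')


def tokenize(xml):
    """Yield (kind, namespace, value) triples with prefixes resolved."""
    scopes = [{"": None}]  # stack of prefix -> namespace bindings
    for m in TAG.finditer(xml):
        closing, qname, attrs, selfclose, text = m.groups()
        if text is not None:
            if text.strip():
                yield ("CDATA", None, text.strip())
            continue
        if not closing:
            # A start tag opens a new scope that inherits the enclosing bindings.
            bindings = dict(scopes[-1])
            for name, value in ATTR.findall(attrs):
                if name == "xmlns":
                    bindings[""] = value
                elif name.startswith("xmlns:"):
                    bindings[name[len("xmlns:"):]] = value
            scopes.append(bindings)
        prefix, _, local = qname.rpartition(":")
        namespace = scopes[-1].get(prefix)
        if not closing:
            yield ("START", namespace, local)
        if closing or selfclose:
            yield ("END", namespace, local)
            scopes.pop()


def to_token_ids(triples):
    """Map each distinct tag token to a small integer; all text shares one CDATA id."""
    table, ids = {}, []
    for kind, namespace, value in triples:
        key = (kind,) if kind == "CDATA" else (kind, namespace, value)
        ids.append(table.setdefault(key, len(table)))
    return ids, table
```

Running `to_token_ids(tokenize(DOC))` on the book example gives the two `title` start tags different ids because their resolved namespaces differ, while the three text nodes share the single CDATA id.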
23
Mapping Rules
  • Define a mapping from XML Schema to LL(1) grammars
  • Preserves structural constraints
  • Many types of validation constraints are
    incorporated in the resulting grammar productions
  • e.g., occurrence constraints
  • Some type-checking constraints are incorporated
    as grammar productions
  • e.g., enumeration simpleType

24
Sample Mapping Rules
25
Mapping Example
<complexType name="example">
  <sequence>
    <element name="id" type="id_type" minOccurs="0"/>
    <element name="value" type="value_type" minOccurs="0"
             maxOccurs="unbounded"/>
  </sequence>
</complexType>
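The slide with the sample mapping rules is not preserved in this transcript, so the following is only a sketch of one conventional way to turn occurrence constraints into grammar productions, not necessarily the rules TDX uses. The function name `occurrence_productions` and the `_optN` / `_rest` naming of helper nonterminals are assumptions; the token names `bI`, `CD`, `eI`, `bV`, `eV` follow the table-generation example.

```python
EPS = ()  # empty right-hand side (epsilon)


def occurrence_productions(nt, item, min_occurs=1, max_occurs=1):
    """Return {nonterminal: [rhs, ...]} accepting `item` min..max times.

    `item` is a tuple of grammar symbols for one occurrence of the element.
    `max_occurs=None` stands for maxOccurs="unbounded".
    """
    item = tuple(item)
    rules = {}
    required = item * min_occurs  # mandatory occurrences, laid out in sequence
    if max_occurs is None:
        # Unbounded: a right-recursive helper that repeats the item or stops.
        rest = nt + "_rest"
        rules[nt] = [required + (rest,)]
        rules[rest] = [item + (rest,), EPS]
    else:
        # Bounded: a chain of nested optionals, built innermost first.
        tail = ()
        for k in range(max_occurs - min_occurs, 0, -1):
            name = f"{nt}_opt{k}"
            rules[name] = [item + tail, EPS]
            tail = (name,)
        rules[nt] = [required + tail]
    return rules


# The two elements of the mapping example above.
ID_RULES = occurrence_productions("t1", ("bI", "CD", "eI"), 0, 1)
VALUE_RULES = occurrence_productions("t2", ("bV", "CD", "eV"), 0, None)
```

Each optional or repeated occurrence starts with a distinct start-tag token, which is what keeps the choice between "one more item" and epsilon decidable with one token of lookahead, provided the element that follows in the sequence starts with a different tag.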
26
TDX Table Generation Example
Grammar
(1) s -> bE t eE
(2) t -> t1 t2 t3
(3) t1 -> bI CD eI    //isIdType()
(4) t2 -> bV CD eV    //isValueType()
(5) t3 -> bS v eS     //doStateOn()
(6) v -> cON EV
(7) v -> cOFF
27
TDX Table Generation Example(2)
LL(1) Parse Table
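The parse table itself is not preserved in this transcript. As a rough sketch of how such a table can be derived from the tokenized grammar, the code below uses the textbook FIRST-set construction restricted to grammars without epsilon productions (which holds for this example). `GRAMMAR`, `first`, and `build_table` are illustrative names, and nothing here is claimed to match the output of LL2Table.

```python
# Productions of the tokenized example grammar, keyed by production number.
GRAMMAR = {
    1: ("s", ["bE", "t", "eE"]),
    2: ("t", ["t1", "t2", "t3"]),
    3: ("t1", ["bI", "CD", "eI"]),
    4: ("t2", ["bV", "CD", "eV"]),
    5: ("t3", ["bS", "v", "eS"]),
    6: ("v", ["cON", "EV"]),
    7: ("v", ["cOFF"]),
}
NONTERMINALS = {lhs for lhs, _ in GRAMMAR.values()}


def first(symbol, seen=frozenset()):
    """FIRST set of a symbol, assuming no production derives the empty string."""
    if symbol not in NONTERMINALS:
        return {symbol}
    if symbol in seen:
        raise ValueError(f"left recursion through {symbol}")
    result = set()
    for lhs, rhs in GRAMMAR.values():
        if lhs == symbol:
            result |= first(rhs[0], seen | {symbol})
    return result


def build_table():
    """Return {(nonterminal, lookahead token): production number}."""
    table = {}
    for number, (lhs, rhs) in GRAMMAR.items():
        for terminal in first(rhs[0]):
            if (lhs, terminal) in table:
                raise ValueError(f"not LL(1): conflict at {(lhs, terminal)}")
            table[(lhs, terminal)] = number
    return table
```

For this grammar the table has seven entries, one per production; for instance the pair `("v", "cON")` selects production 6 and `("v", "cOFF")` selects production 7.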
28
TDX Parsing Engine Example
[Figure: TDX parsing engine with parsing table, input tokens, and stack]
Input read so far: bE
Stack: s
29
Parsing Example (contd)
Input read so far: bE
Stack: bE t eE
30
Parsing Example (contd)
Input read so far: bE bI
Stack: t eE
31
Parsing Example (contd)
Input read so far: bE bI
Stack: t1 t2 t3 eE
32
Parsing Example (contd)
Input read so far: bE bI
Stack: bI CD eI t2 t3 eE
33
Parsing Example (contd)
Input read so far: bE bI CD
Stack: CD eI t2 t3 eE
invoke isIdType()
34
Parsing Example (contd)
Input read so far: bE bI CD
Stack: [contents not recoverable]
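The stack trace on these slides follows the standard table-driven LL(1) loop: pop the top of the stack, expand it through the table if it is a nonterminal, otherwise match it against the current token. The sketch below is one way to write that loop and makes several assumptions that the transcript does not confirm: validation actions are attached to the `CD` symbol of a production, the `EV` symbol is treated as an event marker that fires an action without consuming input, and the action bodies are placeholders.

```python
# Table and productions for the tokenized example grammar (hard-coded for the sketch).
TABLE = {("s", "bE"): 1, ("t", "bI"): 2, ("t1", "bI"): 3, ("t2", "bV"): 4,
         ("t3", "bS"): 5, ("v", "cON"): 6, ("v", "cOFF"): 7}
# A (symbol, action) pair marks a symbol that triggers a semantic action.
RHS = {
    1: ["bE", "t", "eE"],
    2: ["t1", "t2", "t3"],
    3: ["bI", ("CD", "isIdType"), "eI"],
    4: ["bV", ("CD", "isValueType"), "eV"],
    5: ["bS", "v", "eS"],
    6: ["cON", ("EV", "doStateOn")],
    7: ["cOFF"],
}
NONTERMINALS = {"s", "t", "t1", "t2", "t3", "v"}


def parse(tokens, actions):
    """Parse a list of (token, text) pairs; raise on structural or value errors."""
    stack, pos = ["s"], 0
    while stack:
        top = stack.pop()
        symbol, action = top if isinstance(top, tuple) else (top, None)
        if symbol == "EV":            # event marker: fire and move on
            actions[action](None)
            continue
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        token, text = tokens[pos]
        if symbol in NONTERMINALS:    # expand through the parse table
            number = TABLE.get((symbol, token))
            if number is None:
                raise SyntaxError(f"unexpected {token} while parsing {symbol}")
            stack.extend(reversed(RHS[number]))
        elif symbol == token:         # match a terminal, validating CDATA if asked
            if action and not actions[action](text):
                raise ValueError(f"{action} rejected {text!r}")
            pos += 1
        else:
            raise SyntaxError(f"expected {symbol}, got {token}")
    if pos != len(tokens):
        raise SyntaxError("trailing input")
    return True


def make_actions(log):
    """Placeholder actions that record their name in `log`."""
    def is_id(text):
        log.append("isIdType")
        return text.isdigit()

    def is_value(text):
        log.append("isValueType")
        try:
            float(text)
            return True
        except ValueError:
            return False

    def state_on(_):
        log.append("doStateOn")
        return True

    return {"isIdType": is_id, "isValueType": is_value, "doStateOn": state_on}
```

Because the engine only consults `TABLE` and `RHS`, swapping in tables generated for another schema would not require changing `parse`, which is the modularity argument made earlier in the deck.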
35
Experiment Results
  • Test environment
  • 2.4 GHz P4, 512 MB RAM, Red Hat Linux 3.2.2-5,
    GNU Compiler g.3.2.3 with option -O2
  • Memory-resident XML message
  • Measured elapsed real time using gettimeofday()
    over 100 runs
  • Compared with
  • DFA-based Parser
  • gSOAP 2.7
  • eXpat 1.2
  • Xerces 2.7.0
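
The measurement procedure described above (a memory-resident message, wall-clock time, 100 runs) can be mirrored with a small harness. This is only an analogue: the original measurements were taken with a C toolchain, whereas this sketch uses Python's `time.perf_counter`, and `mean_elapsed_seconds` is a hypothetical helper name.

```python
import time


def mean_elapsed_seconds(parse, message, runs=100):
    """Average wall-clock time of parse(message) over `runs` calls.

    The message is passed in already loaded, so no I/O is timed.
    """
    start = time.perf_counter()
    for _ in range(runs):
        parse(message)
    return (time.perf_counter() - start) / runs
```

Any callable that takes the in-memory message can be passed as `parse`, which makes it straightforward to time several parsers against the same document.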

36
Experiment Results (contd)
  • XML Schema for echoString (abbreviated syntax)
<schema>
  <element name="echoString">
    <complexType>
      <sequence>
        <element name="input" type="xsd:string"
                 maxOccurs="unbounded"/>
      </sequence>
    </complexType>
  </element>
</schema>

37
Parsing Performance (1)
XML document size: 1024 B
38
Parsing Performance (2)
39
Conclusions
  • TDX is fast
  • Integrated approach across layers
  • Avoid schema access at runtime
  • Comparison done on tokens
  • Avoid data copying
  • Avoid format conversions
  • Minimized function calls
  • Optimization based on schema structure

40
Conclusions (contd)
  • XML can be parsed, validated, and deserialized
    efficiently for high-performance Web services
    using a table-driven methodology
  • Can be up to several times faster than
    industry-strength high-performance validating XML
    parsers
  • The table-driven methodology can offer a high level
    of modularity, and
  • Provides a mechanism for integrating
    application-specific events, such as SOAP
    deserializers

41
  • Thank You