XML Processing Performance Comparison with XPB4J

Transcript and Presenter's Notes

Title: XML Processing Performance Comparison with XPB4J


1
  • XML Processing Performance Comparison with XPB4J

July 25, 2002
Pankaj Kumar, Web Services Architect, HP
2
Agenda
  • XPB4J Whys and Whats?
  • XStat Processing
  • How to run XPB4J? -- Show it with a Demo
  • Measurements
  • Parsing/Processing APIs and implementations
  • What are we looking for?
  • Input Data
  • Measurement Method
  • Results
  • What Next?
  • How can you benefit ( and contribute )?

3
Why?
  • Input for Design and Development
  • Performance Modeling
  • Comparing parser/processor performance
  • Learning XML
  • Having Fun!!

4
A different kind of benchmark
  • A benchmark for developers
  • Traditional benchmarks are for vendors of systems
    to be used as a sales tool
  • XPB4J is for developers to study and understand
    • Performance tradeoffs
    • Performance modeling
    • Performance tuning
  • Focus on relative numbers
  • No single metric

5
Components of XPB4J
  • Infrastructure ( Java code and Jakarta-Ant
    scripts ) to run the processing code on input
    data and report the performance numbers and
    results.
  • A framework to plug any XML processing code
  • A couple of light-weight Java interfaces
  • A specific processing code -- XStat Processing
    code

6
XStat Processing
  • Collect structural statistics on an XML file
    • No. of times an element occurred
    • No. of times it had a particular element as
      parent
    • No. of times it had a particular element as child
    • No. of times it had a particular attribute
    • Amount of character data it had
    • Whether the element was empty
  • Other assumptions
    • Namespaces ignored. Take qualified names as the
      element identifiers.
    • No validation.
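The statistics listed above can be gathered in a single streaming pass. The following is a minimal sketch using the JDK's SAX API, not the actual XStat code from XPB4J; the class name `XStatSketch`, its map fields, and the key formats are assumptions made for illustration. It covers occurrence counts, parent relationships, attribute counts, and character data, and it uses qualified names with no validation, as the slide specifies.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration only; this is not the XPB4J XStat implementation.
class XStatSketch extends DefaultHandler {
    final Map<String, Integer> occurrences = new TreeMap<>();  // element -> count
    final Map<String, Integer> parentCounts = new TreeMap<>(); // "child<-parent" -> count
    final Map<String, Integer> attrCounts = new TreeMap<>();   // "element@attr" -> count
    final Map<String, Integer> charData = new TreeMap<>();     // element -> chars seen
    private final Deque<String> open = new ArrayDeque<>();     // stack of open elements

    private static void bump(Map<String, Integer> m, String key, int by) {
        m.merge(key, by, Integer::sum);
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        bump(occurrences, qName, 1);
        if (!open.isEmpty()) {
            // The same pair also records "parent had this child".
            bump(parentCounts, qName + "<-" + open.peek(), 1);
        }
        for (int i = 0; i < atts.getLength(); i++) {
            bump(attrCounts, qName + "@" + atts.getQName(i), 1);
        }
        open.push(qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (!open.isEmpty()) {
            bump(charData, open.peek(), length);
        }
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        open.pop();
    }

    // Parses an XML string and returns the populated handler.
    static XStatSketch collect(String xml) {
        try {
            XStatSketch handler = new XStatSketch();
            // The default factory is non-validating and not namespace-aware.
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

Calling `XStatSketch.collect(xml)` and reading the four maps gives the per-element numbers; tracking whether an element was empty would need one extra flag per open element.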

7
How to run XPB4J?
  • Download it from http://www.pankaj-k.net/xpb4j as
    a .zip file
  • Extract it. It creates subdirectory xpb4j-0.90
  • Make sure that you have
    • JDK 1.4.x, with JAVA_HOME set to its base
      directory
    • Jakarta-Ant 1.4.x or higher, with its bin directory
      in PATH
  • Issue the command: ant run
  • Changing Input Data and other parameters
  • Changing Parser implementations

8
XPB4J Demo
  • XPB4J Demo

9
What determines processing time?
  • Processing Activity
  • Input Data: Type and size of data
  • Machine ( CPU, RAM, OS, Disk, ... )
  • JVM implementation
  • JVM state: Steady, First few executions
  • Processing API: SAX, XmlPull, DOM, JDOM,
    DOM4J, XSLT
  • Parser/Processor implementation

10
Parsing/Processing APIs and implementations
  • SAX: JDK 1.4.0, Xerces-2.0.1, GNU JAXP 1.0 beta1,
    Piccolo 1.02
  • XmlPull: XPP3, kXML
  • DOM: JDK 1.4.0, Xerces, GNU JAXP
  • JDOM (beta8)
  • DOM4J 1.3
  • XSLT: JDK 1.4.0, xalan-2.3.1
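When comparing implementations behind the same API, it helps to confirm which one is actually loaded. JAXP factories are looked up at run time and can be overridden with system properties such as `javax.xml.parsers.SAXParserFactory`. The slides do not show how XPB4J's Ant scripts switch implementations, so the sketch below only reports what the running JVM resolves; the class name `JaxpImplSketch` is an assumption.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;

// Hypothetical helper: reports which JAXP implementations the JVM resolved.
class JaxpImplSketch {
    // Returns the implementation class names for SAX, DOM and XSLT, in that order.
    static String[] implementations() {
        return new String[] {
            SAXParserFactory.newInstance().getClass().getName(),
            DocumentBuilderFactory.newInstance().getClass().getName(),
            TransformerFactory.newInstance().getClass().getName()
        };
    }

    public static void main(String[] args) {
        String[] labels = { "SAX", "DOM", "XSLT" };
        String[] impls = implementations();
        for (int i = 0; i < impls.length; i++) {
            System.out.println(labels[i] + ": " + impls[i]);
        }
    }
}
```

Running it with, for example, `-Djavax.xml.parsers.SAXParserFactory=<factory class>` on the command line changes the first line of output, provided that class is on the classpath.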

11
Input Data
Search results from Google's Web Services API
on "Bill Gates"

12
Measurement Machine
  • Self-assembled Server
  • AMD Athlon 900MHz CPU
  • 512 MB RAM
  • Dual boot -- Windows 2000/Mandrake Linux 8.1

13
Measurement Loop
  // Pseudo code. Won't compile.
  for (int r = 0; r < runs; r++) {
      Runtime.gc();  // Hope that this will force garbage collection.
      long startMem = Runtime.totalMemory() - Runtime.freeMemory();
      long startTime = System.currentTimeMillis();
      for (int l = 0; l < loopcount; l++) {
          for (file f in input files) {  // Do the processing.
              process f;
          }
      }
      long endTime = System.currentTimeMillis();
      long endMem = Runtime.totalMemory() - Runtime.freeMemory();
      int avgPT = (endTime - startTime) / loopcount;
      int memU = (endMem - startMem) / 1024;
      System.out.println("Processing Time: " + avgPT + " milli secs.");
      System.out.println("Memory Use: " + memU + " KB.");
  }
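A compilable version of the same loop might look like the sketch below. It is an approximation under stated assumptions, not the XPB4J source: the class name `MeasureLoopSketch`, the `Runnable` standing in for "process every input file", and the placeholder workload in `main` are all hypothetical. As in the pseudocode, the `gc()` call is only a request to the JVM, so the memory figure is indicative at best.

```java
// Hypothetical, compilable rendering of the measurement loop.
class MeasureLoopSketch {
    // Returns { average processing time in ms, memory growth in KB } for one run.
    static long[] measure(Runnable task, int loopCount) {
        Runtime rt = Runtime.getRuntime();
        rt.gc(); // A hint only; the JVM may ignore it.
        long startMem = rt.totalMemory() - rt.freeMemory();
        long startTime = System.currentTimeMillis();
        for (int l = 0; l < loopCount; l++) {
            task.run(); // Stand-in for processing all input files once.
        }
        long endTime = System.currentTimeMillis();
        long endMem = rt.totalMemory() - rt.freeMemory();
        long avgPT = (endTime - startTime) / loopCount;
        long memU = (endMem - startMem) / 1024;
        return new long[] { avgPT, memU };
    }

    public static void main(String[] args) {
        int runs = 3;       // assumed value
        int loopCount = 10; // assumed value
        Runnable task = () -> {
            // Placeholder workload; replace with real XML processing.
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 100_000; i++) {
                sb.append(i);
            }
        };
        for (int r = 0; r < runs; r++) {
            long[] result = measure(task, loopCount);
            System.out.println("Processing Time: " + result[0] + " milli secs.");
            System.out.println("Memory Use: " + result[1] + " KB.");
        }
    }
}
```

Repeating the whole measurement for several runs, as the outer loop does, lets early runs absorb class loading and JIT compilation so that later runs approximate the steady state.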

14
Questions 1
  • How does performance vary with SAX parsers?
  • Fixed
    • Measurement Machine
    • JVM: Sun's JDK1.4.0
    • Processing Activity: XStat
    • Processing API: SAX
    • JVM State: Steady
  • Variable
    • SAX Parser: JDK1.4, Piccolo 1.02, Xerces 2.0.1,
      GNUJAXP-Beta1, Xerces 1.4.4
    • Input Data: DS1, DS2, DS3

15
Results 1
16
Questions 2
  • How does performance vary with DOM parsers?
  • Fixed
    • Measurement Machine
    • JVM: Sun's JDK1.4.0
    • Processing Activity: XStat
    • Processing API: DOM
    • JVM State: Steady
  • Variable
    • DOM Parser: JDK1.4, Xerces 2.0.1, GNUJAXP-Beta1,
      Xerces 1.4.4
    • Input Data: DS1, DS2, DS3
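For contrast with streaming APIs, a DOM-based version of the same kind of processing first builds the whole tree in memory and then walks it. The sketch below counts element occurrences that way with the JDK's DOM parser; it is an illustration only, and the class name `DomCountSketch` and its methods are assumptions, not XPB4J code.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical illustration of tree-based processing with DOM.
class DomCountSketch {
    // Recursively visits every node and counts elements by qualified name.
    static void walk(Node node, Map<String, Integer> counts) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            counts.merge(node.getNodeName(), 1, Integer::sum);
        }
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            walk(c, counts);
        }
    }

    // Parses the XML into a DOM tree, then walks it.
    static Map<String, Integer> count(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Map<String, Integer> counts = new TreeMap<>();
            walk(doc, counts);
            return counts;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

The measured time for a DOM run therefore includes both tree construction and traversal, which is worth keeping in mind when reading DOM numbers next to SAX numbers.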

17
Results 2
18
Questions 3
  • How does performance vary with XmlPull parsers?
  • Fixed
    • Measurement Machine
    • JVM: Sun's JDK1.4.0
    • Processing Activity: XStat
    • Processing API: XmlPull
    • JVM State: Steady
  • Variable
    • XmlPull Parser: XPP3, kXML
    • Input Data: DS1, DS2, DS3
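In a pull-parsing style, the application asks the parser for the next event instead of receiving callbacks. XPP3 and kXML implement the XmlPull API, which is not part of the JDK, so the sketch below swaps in StAX (`javax.xml.stream`), a different pull API that ships with later JDKs. It shows the programming model only and says nothing about XmlPull performance; the class name `PullCountSketch` is an assumption.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

// Hypothetical illustration of pull parsing, using StAX as a stand-in for XmlPull.
class PullCountSketch {
    // Counts element occurrences by pulling events until the document ends.
    static Map<String, Integer> count(String xml) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            Map<String, Integer> counts = new TreeMap<>();
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    // StAX reports local names; the input here uses no prefixes.
                    counts.merge(reader.getLocalName(), 1, Integer::sum);
                }
            }
            reader.close();
            return counts;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

Because the loop is ordinary application code, state such as the current parent element can live in local variables rather than in handler fields.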

19
Results 3
20
Questions 4
  • How does performance vary with Memory Tree
    oriented parsers/processors?
  • Fixed
    • Measurement Machine
    • JVM: Sun's JDK1.4.0
    • Processing Activity: XStat
    • Processing API: Memory Tree oriented
    • JVM State: Steady
  • Variable
    • Parser/Processor: JDK1.4 DOM Parser, JDOM beta8,
      DOM4J, JDK1.4 XSLT Processor
    • Input Data: DS1, DS2, DS3

21
Results 4
22
Questions 5
  • How does performance compare across best of
    XmlPull, SAX and DOM parsers?
  • Fixed
    • Measurement Machine
    • JVM: Sun's JDK1.4.0
    • Processing Activity: XStat
    • JVM State: Steady
  • Variable
    • Parser/Processor: XPP3, JDK1.4 DOM, JDK1.4 SAX
    • Input Data: DS1, DS2, DS3

23
Results 5
24
Questions 6
  • How does performance vary with JVM?
  • Fixed
    • Measurement Machine
    • Processing Activity: XStat
    • JVM State: Steady
    • Input Data: DS2
  • Variable
    • Parser/Processor: XPP3, Xerces 1.4.4
    • JVM: IBM-JDK1.3, JRockit1.3.1, Sun's JDK1.3.1,
      Sun's JDK1.4

25
Results 6
26
Questions 7
  • How does performance vary with JVM warmup?
  • Fixed
    • Measurement Machine
    • Processing Activity: XStat
    • JVM: JDK 1.4.0
    • Input Data: DS2
  • Variable
    • Parser/Processor: XPP3, JDK1.4, JDOM beta8,
      DOM4J
    • JVM State: First time, Steady
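One way to observe the "first time" versus "steady" distinction is to time the very first parse in a fresh JVM separately from the average of later parses. The sketch below does this with the JDK's SAX parser and a no-op handler; the class name `WarmupSketch`, its methods, and the sample input are assumptions, and the numbers it prints will vary by machine and JVM.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration of first-call versus steady-state timing.
class WarmupSketch {
    // Times one complete parse, including factory and parser creation.
    static long parseNanos(String xml) {
        try {
            long start = System.nanoTime();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return System.nanoTime() - start;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Returns { first-call time, mean time of the following calls } in nanoseconds.
    static long[] firstVsSteady(String xml, int steadyCalls) {
        long first = parseNanos(xml);
        long total = 0;
        for (int i = 0; i < steadyCalls; i++) {
            total += parseNanos(xml);
        }
        return new long[] { first, total / steadyCalls };
    }

    public static void main(String[] args) {
        long[] t = firstVsSteady("<a><b>text</b><b/></a>", 200);
        System.out.println("First call:  " + t[0] + " ns");
        System.out.println("Steady mean: " + t[1] + " ns");
    }
}
```

The first call typically pays for class loading and interpretation before the JIT compiler has run, so it should be measured in a fresh JVM to be meaningful.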

27
Results 7
28
Questions 8
  • How does memory use vary with parser/processor?
  • Fixed
    • Measurement Machine
    • Processing Activity: XStat
    • JVM: JDK 1.4.0
    • JVM State: Steady
    • Input Data: DS2
  • Variable
    • Parser/Processor: XPP3, JDK1.4, JDOM beta8, DOM4J

29
Results 8
30
Questions 9
  • How does performance vary with input xml
    filesize?
  • Fixed
    • Measurement Machine
    • Processing Activity: XStat
    • JVM: JDK 1.4.0
    • JVM State: Steady
  • Variable
    • Parser/Processor: XPP3, JDK1.4, JDOM beta8,
      DOM4J
    • Input Data: 100KB, 1MB, 10MB

31
Results 9
32
Questions 10
  • How does memory use vary with input xml filesize?
  • Fixed
    • Measurement Machine
    • Processing Activity: XStat
    • JVM: JDK 1.4.0
    • JVM State: Steady
  • Variable
    • Parser/Processor: XPP3, JDK1.4, JDOM beta8,
      DOM4J
    • Input Data: 100KB, 1MB, 10MB

33
Results 10
34
Questions 11
  • Any Interesting Observation?
  • Fixed
    • Measurement Machine
    • Processing Activity: XStat
    • JVM: JDK 1.4.0
    • JVM State: Steady
    • Parser/Processor: JDOM beta8
    • Input Data: 100KB, 1MB, 10MB
  • Variable
    • Node traversal loop: Loop1, Loop2

35
Questions 11 ( Contd. )
Loop1:
    List children = elem.getChildren();
    for (int i = 0; i < children.size(); i++)
        collectStat((Element) children.get(i), sc);
Loop2:
    ListIterator li = children.listIterator();
    while (li.hasNext())
        collectStat((Element) li.next(), sc);
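Whether the two loops differ in cost depends on the `List` implementation that `getChildren()` returns, which the slides do not state. As a stand-in, the sketch below runs both traversal styles over plain `java.util` lists: indexed `get(i)` on a `LinkedList` has to walk the list on every call, while an iterator advances one step at a time. The class name `TraversalSketch` and the summing workload are assumptions; no JDOM classes are involved.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;

// Hypothetical illustration of index-based versus iterator-based traversal.
class TraversalSketch {
    // Loop1 style: indexed access.
    static long sumByIndex(List<Integer> items) {
        long sum = 0;
        for (int i = 0; i < items.size(); i++) {
            sum += items.get(i);
        }
        return sum;
    }

    // Loop2 style: iterator access.
    static long sumByIterator(List<Integer> items) {
        long sum = 0;
        ListIterator<Integer> li = items.listIterator();
        while (li.hasNext()) {
            sum += li.next();
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> linked = new LinkedList<>();
        for (int i = 0; i < 50_000; i++) {
            linked.add(i);
        }
        List<Integer> array = new ArrayList<>(linked);
        for (List<Integer> list : List.of(linked, array)) {
            long t0 = System.nanoTime();
            sumByIndex(list);
            long t1 = System.nanoTime();
            sumByIterator(list);
            long t2 = System.nanoTime();
            System.out.println(list.getClass().getSimpleName()
                    + " index: " + (t1 - t0) / 1_000 + " us, iterator: "
                    + (t2 - t1) / 1_000 + " us");
        }
    }
}
```

Both methods return the same result on any list; only the access pattern, and therefore the cost on list types without fast random access, differs.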
36
Results 11
37
Caveats
  • Different APIs are not perfect substitutes
  • XSLT processors are significantly different from
    parsers
  • Performance should be only one criterion among
    many others
  • XStat is an artificial processing activity and
    favors the SAX/XmlPull APIs

38
What Next?
  • Comparison with C/C++ Parsers/Processors
  • Dynamic generation of input data
  • Framework improvements
  • Better Reporting and Presentation
  • More processing activities
  • Better tuning ?!

39
How can you benefit ( and contribute )?
  • Benefit from XPB4J
    • Gain insight from the report
    • Learn XML by playing with code
    • Validate your assumptions
    • Tune your parser/processor ( if you are an
      implementer )
  • Contribute to XPB4J
    • Run it under your environment and share your
      results
    • Write processing code
    • Extend the framework
  • Discussion mailing list:
    xpb4j-users_at_lists.sourceforge.net

40
  • Q & A