Title: The Connection Factory
1 The Connection Factory
Jeroen van Rotterdam, CTO
May 19th, WWW9
2 Contents
- Xhive setup
- XPath
- XPath performance issues within XML collections
3 Xhive
- OO-XML database
- Highly scalable
- High granularity
- W3C DOM L2 compliant
- XPath 1.0 compliant
4 Architecture
5 Architecture
6 Why XPath
Competing solutions
- XML-QL: Where-In constructs
- XQL: limited
- SQL: no alternative
XPath is a complete pattern-matching language.
7 XPath
Advantages
- fairly complete
- multiple axes
- supported by W3C
- base for XPointer, XLink
- base for XML Query WG
- user-defined functions
Disadvantages
- document oriented
- slightly different tree model than DOM
- no updates
8 Extending DOM
Collection setup
Every document is a Bastard Node
9 Library Node
Advantages
- Natural extension of DOM
- extensible
- closely related to directory structures
- searchable with XPath
10 Library Node
Disadvantages
- potential bottleneck
11 XPath
- XPath in a large PDOM collection environment
1. Address memory issues
2. Solve differences in specs
3. Address performance issues
12 Memory issues
- Avoid recursion
- Make subresults persistence-capable
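The first bullet can be sketched as follows: a document-order traversal that keeps its own stack, so deep documents do not grow the call stack, and that yields nodes one at a time. This is a minimal illustration with Python's `xml.dom.minidom`, not Xhive code; `iter_elements` is a hypothetical name.

```python
from xml.dom import minidom

def iter_elements(root):
    """Yield element nodes under root in document order without recursion."""
    stack = [root]  # explicit stack replaces the call stack
    while stack:
        node = stack.pop()
        if node.nodeType == node.ELEMENT_NODE:
            yield node
        # Push children in reverse so the first child is popped first.
        stack.extend(reversed(node.childNodes))

doc = minidom.parseString("<book><chapter><para/></chapter><chapter/></book>")
names = [e.tagName for e in iter_elements(doc.documentElement)]
```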
13 Solve differences
Differences in specs are, for instance:
- getParent on attributes vs. ownerElement
- namespace nodes
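The attribute case can be shown with any DOM Level 2 implementation. In DOM an attribute has no parent node and is reached through ownerElement, while in the XPath data model the element is the attribute's parent. A small sketch with Python's `xml.dom.minidom`; `xpath_parent` is a hypothetical helper, not an Xhive API.

```python
from xml.dom import minidom

def xpath_parent(node):
    """Parent in the XPath data model, computed from DOM properties."""
    if node.nodeType == node.ATTRIBUTE_NODE:
        return node.ownerElement  # DOM: parentNode of an Attr is None
    return node.parentNode

doc = minidom.parseString('<book id="b1"><title>XML</title></book>')
book = doc.documentElement
attr = book.getAttributeNode("id")
```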
14 Performance
Increase XPath performance
- Query analysis
- Avoid reparsing
- Lazy evaluation
- Index structures
- Cache strategy
- DTD analysis
- Statistical data
15 Performance
1. Query analysis
a. Can I simplify my query? For instance: /childchapter55
16 Performance
1. Query analysis
b. Does your query depend on the context node? Absolute queries are context independent.
Give me all chapters where the title is the same as the book title:
//chapter[title=string(/book/title)]
Evaluate string(/book/title) only once.
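A hand-written sketch of that optimization, assuming the query compares each chapter's title with the book title as the slide describes in words. The absolute subexpression is hoisted out of the per-chapter loop. This uses Python's `xml.etree.ElementTree` for illustration only; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

def chapters_with_book_title(book):
    # Context-independent part: evaluated once, outside the loop.
    book_title = book.findtext("title")
    # Context-dependent part: evaluated for every candidate chapter.
    return [ch for ch in book.iter("chapter")
            if any(t.text == book_title for t in ch.findall("title"))]

book = ET.fromstring(
    "<book><title>XPath</title>"
    "<chapter><title>Intro</title></chapter>"
    "<chapter><title>XPath</title></chapter></book>")
matches = chapters_with_book_title(book)
```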
17 Performance
2. Storing parsed queries
Compile and optimize queries only once
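One way to picture this is a cache keyed by the query string, so a repeated query skips the parse step. The sketch below assumes a toy path language of plain child steps (`a/b/c`); `compile_path` and `evaluate` are hypothetical names and the parser stands in for a real XPath compiler.

```python
from functools import lru_cache
import xml.etree.ElementTree as ET

@lru_cache(maxsize=256)
def compile_path(expr):
    """Parse a simple 'a/b/c' child path into a tuple of steps."""
    return tuple(step for step in expr.split("/") if step)

def evaluate(root, expr):
    """Apply the child steps of expr, starting from the children of root."""
    nodes = [root]
    for step in compile_path(expr):  # parsed form comes from the cache
        nodes = [child for node in nodes for child in node if child.tag == step]
    return nodes

book = ET.fromstring("<book><chapter><title>A</title></chapter></book>")
```

Calling `evaluate(book, "chapter/title")` twice parses the string once; `compile_path.cache_info()` reports the hit.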
18 Performance
3. Lazy evaluation
For instance, operations on nodesets:
- booleans (evaluate first node)
- strings (first in doc order)
- number (string to number)
Example: give me all chapters which have paragraphs
/chapter[paragraph]
Finding 1 paragraph will do
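The boolean case can be sketched with a lazily produced node sequence: the test stops as soon as the first node arrives. This is an illustration with Python generators and `xml.etree.ElementTree`; `has_any` and `chapters_with_paragraphs` are hypothetical names.

```python
import xml.etree.ElementTree as ET

def has_any(nodes):
    """Boolean value of a lazily produced node-set: stop at the first node."""
    return next(iter(nodes), None) is not None

def chapters_with_paragraphs(book):
    # iterfind() yields matches one by one, so only one paragraph is fetched.
    return [ch for ch in book.iter("chapter")
            if has_any(ch.iterfind("paragraph"))]

book = ET.fromstring(
    '<book><chapter n="1"><paragraph/><paragraph/></chapter>'
    '<chapter n="2"/></book>')
with_paragraphs = chapters_with_paragraphs(book)
```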
19 Performance
4. Indexing
- getFirstChildElementByName(String name)
- getNextSiblingElementBySameName()
- getFirstChildByType(short type)
- getNextSiblingByType(short type)
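How such calls might be backed by an index is sketched below for the two name-based methods. This is an assumption for illustration, not the Xhive implementation: an in-memory map from (parent, name) to the children with that name, built over `xml.dom.minidom`. The class and method names are hypothetical.

```python
from xml.dom import minidom

class ChildNameIndex:
    """Index of child elements by name; the document must stay alive (keys use id())."""

    def __init__(self, root):
        self._children = {}  # (id(parent), name) -> same-name children, in order
        self._position = {}  # id(child) -> index within its same-name list
        stack = [root]
        while stack:
            parent = stack.pop()
            for child in parent.childNodes:
                if child.nodeType != child.ELEMENT_NODE:
                    continue
                same_name = self._children.setdefault((id(parent), child.tagName), [])
                self._position[id(child)] = len(same_name)
                same_name.append(child)
                stack.append(child)

    def first_child_element_by_name(self, parent, name):
        same_name = self._children.get((id(parent), name), [])
        return same_name[0] if same_name else None

    def next_sibling_element_by_same_name(self, node):
        same_name = self._children.get((id(node.parentNode), node.tagName), [])
        nxt = self._position.get(id(node), len(same_name)) + 1
        return same_name[nxt] if nxt < len(same_name) else None

doc = minidom.parseString(
    '<book><title/><chapter n="1"/><note/><chapter n="2"/></book>')
index = ChildNameIndex(doc.documentElement)
```

Both lookups avoid scanning unrelated siblings such as `title` and `note`.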
20 Performance
5. Caching strategy
Top-level paging/cluster strategy
21 Performance
6. Use DTD information
For instance: /child::chapter/child::book[4]
Might return null if you have info on the DTDs used.
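The idea can be sketched as a check against the content models before any data is touched. The table below is a made-up content model, not one taken from the talk, and `can_match` is a hypothetical helper: if the DTD never allows a step's element inside the previous one, the query cannot match.

```python
# Hypothetical content model: element name -> child element names the DTD allows.
ALLOWED_CHILDREN = {
    "book": {"title", "chapter"},
    "chapter": {"title", "paragraph"},
    "title": set(),
    "paragraph": set(),
}

def can_match(steps, allowed=ALLOWED_CHILDREN):
    """Return False if the DTD rules out this chain of child steps."""
    for parent, child in zip(steps, steps[1:]):
        if child not in allowed.get(parent, set()):
            return False  # no valid document contains child under parent
    return True

impossible = can_match(["chapter", "book"])
possible = can_match(["book", "chapter", "paragraph"])
```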
22 Performance
7. Gather statistical info
DTDs or Xschema specify structures that may occur, not what's actually in your collection.
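A minimal sketch of what such statistics could look like, assuming simple per-name element counts over the collection; the talk does not say which statistics are gathered. The counts can then be used to test the rarest element first. Function names are hypothetical.

```python
from collections import Counter
import xml.etree.ElementTree as ET

def element_counts(documents):
    """Count how often each element name actually occurs in the collection."""
    counts = Counter()
    for root in documents:
        counts.update(el.tag for el in root.iter())
    return counts

def cheapest_step_first(names, counts):
    """Order element names so the rarest (most selective) is checked first."""
    return sorted(names, key=lambda name: counts[name])

docs = [ET.fromstring(
    "<book><chapter><paragraph/><paragraph/><paragraph/></chapter><chapter/></book>")]
counts = element_counts(docs)
order = cheapest_step_first(["paragraph", "chapter", "appendix"], counts)
```

A name with a count of zero, like `appendix` here, may be allowed by the DTD yet never occur, so a query on it can be answered without scanning.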
23 Conclusion
- DOM within database environments
- XPath on top of a PDOM
- XPath is fairly complete
- Focus on performance
24 WWW9
Beta testers, developers wanted.
Email info_at_xhive.com
Have fun...