Title: XQuery Implementation in a Relational Database System
1XQuery Implementation in a Relational Database
System
- Shankar Pal
- Istvan Cseri, Oliver Seeliger, Michael Rys,
Gideon Schaller, Wei Yu, Dragan Tomic, Adrian
Baras, Brandon Berg, Denis Churin, Eugene Kogan - SQL Server
- Microsoft Corp
2Overview
- Background
- XML Support in SQL Server 2005
- OrdPath labeling of XML nodes
- XML indexes PATH, VALUE, PROPERTY
- Main topic XQuery compilation
- Architecture
- XML operators
- Mapping XML operators to relational ops
- Conclusions
3Background XML Support in SQL Server 2005
- Create table DOCS ( ID int primary key,
- XDOC xml)
- XML stored in an internal, binary form (blob)
- Optionally typed by a collection of XML schemas
- Used for storage and query optimizations
- 3 of 5 methods on XML data type
- query() returns XML type
- value() returns scalar value
- exist() checks conditions on XML nodes
- XML indexing
- More information at http//msdn.microsoft.com/xml
4Background XQuery embedded in SQL
- Retrieve section titles from ltbookgt wrapped in
new lttopicgt elements - SELECT ID, XDOC
- .query('
- for s in /BOOK/SECTION
- return lttopicgt
- data(s/TITLE)
- lt/topicgt
- ')
- FROM DOCS
5Background XQuery supported features
- XQuery clauses for, where, return and
order by - XPath axes child, descendant, parent,
attribute, self and descendant-or-self - Functions numeric, string, Boolean, nodes,
context, sequences, aggregate, constructor, data
accessor - SQL Server extension functions to access SQL
variable and column data within XQuery - Numeric operators (, -, , div, mod)
- Value comparison operators (eq, ne, lt, gt, le,
ge) - General comparison operators (, !, lt, gt, lt, gt)
6Background SIGMOD04 ORDPATH Label of Nodes
node1 is ancestor of node2 ?? ORDPATH
(node1) is prefix of ORDPATH
(node2)
BOOK 1
Section 1.3
Section 1.5
_at_ISBN 1.1
Figure 1.3.3
Title 1.3.1
Title 1.5.1
Figure 1.5.3
node1 precedes node2 in document order ??
ORDPATH (node1) lt ORDPATH (node2)
ORDPATH(1.3) id lt Descendant_Limit (1.3) 1.4
7Background VLDB 2004Indexing XML column
- Primary XML index on an XML column
- Creates Btree tree on data model content of the
XML nodes - Adds column Path_ID for the reversed, encoded
path from each XML node to root of XML tree - OrdPath labeling schema is used for XML nodes
- Relative order of nodes
- Document hierarchy
8Background XML example
- INSERT INTO myTable VALUES (7,
- ltBook xmlns"myns" ISBN "1-55860-3612"gt
- ltSectiongt
- ltTitlegtBad Bugslt/Titlegt
- lt/Sectiongt
- ltSectiongt
- ltTitlegt Tree frogs lt/Titlegt
- ltFiguregtlt/Figuregt
- lt/Sectiongt
- lt/Bookgt)
9Background Primary XML Index Entries
- Encoding of tags types stored in system
meta-data - Additional details not shown
Clustering key
10Background Secondary XML indexes
- To speed up different classes of commonly
occurring queries - Statistics created on key columns of the primary
and secondary XML indexes - Used for cost-based selection of secondary XML
indexes
11Background Handling Types
- If XML column is typed
- Values are stored in XML blob and XML indexes
with appropriate typing - Untyped XML
- Values are stored as strings
- Convert to appropriate types for operations
- SQL typed values stored in primary XML index
- Most SQL types are compatible with XQuery types
(integer) - Value comparisons on XML index columns suffice
- Some types (e.g. xsdatetime) are stored in
internal format and processed specially
12XQuery Processing Architecture
- XQuery Compiler
- Parses XQuery expr
- Checks static type correctness
- Type annotations
- Applies static optimiztns
- Path collapsing
- Rewrites using XML schemas
- XML Operator Mapper
- Recursively traverses XML algebra tree
- Converts each XmlOp to reln operator sub-tree
- Mapping depends upon existence of primary XML
index
XQuery expression
XQuery Compiler
XML algebra tree (XmlOp ops)
XML Operator Mapper
Relational Operator Tree (relational operators)
Reln Query Processor
13Examples of XML Operators
14XML Operator Mapping Overview
PATH Index
XQUERY
PK
OrdPath
XML
PK
Primary XML Index
VALUE Index
PROPERTY Index
Special handling for SELECT XDOC
REL tree
15New operators
- Some produce N rows from M (? N) rows
- XML_Reader streaming, pull-model XML parser
- XML_Serializer to serialize query result as XML
- Some are for efficiency
- Contains to evaluate XQuery contains()
- TextAdd to evaluate the XQuery function
string() - Data to evaluate XQuery data() function
- Some are for specific needs
- Check validate XML during insertion or
modification
16XML Operator Mapping
- Following categories
- Mapping of XPath expressions
- Mapping of XQuery expressions
- Mapping of XQuery built-in functions
17XPath Expressions
- Two cases
- Fully known, forward paths without branching
after path collapsing - Paths without branching that are not fully known
after path collapsing - Segments of the path cannot be collapsed or a
path is split into multiple segments - Occurs most commonly for paths containing
wildcard steps, //, self and parent axes - Evaluated using LIKE operator on XML idx
18Non-indexed XML, Full Path
- XML_Reader produces subtrees of ltSECTIONgt
- Node table rows
- Contains OrdPath
- No PK or PATH_ID
- XML_Serialize reassembles those row into XML data
type - To output result
XML operator tree XmlOp_Path PATH
/BOOK/SECTION Rel operator tree XML_Serializ
e XML_Reader (XDOC, /BOOK/SECTION)
19Query Execution on XML Blob
- XDOC column value in each row parsed at runtime
- Parser is XmlReader (not DOM)
- Evaluate simple XPath (without branching) during
parsing - Rest of processing done in memory using
relational operators - // and are also pushed into XML_Reader
SELECT ID, XDOC.query ('
/BOOK/SECTION 2 ') FROM DOCS
20Sample query execution using Primary XML Index
- /Book/Section ? 31 (by XML Op Mapper)
Clustering key
21Indexed XML, Full Path
- XmlOp_Path mapped to SELECT
- GET(PXI) rows from primary XML index
- Match PATH_ID
- Not shown
- JOIN with base table on PK
XML_Serialize
Assemble Subtree
Apply
Select (b)
Select
b.OrdP OrdPlt DL(b)
GET (PXI)
GET (PXI)
Path_IDSECTIONBOOK
22XML index PATH
- Speeds up path evaluations
- Example /Book/Section ? 31
23Indexed XML, Imprecise Paths
- /BOOK/SECTION//
- TITLE
- Matched using LIKE operator on Path_ID
XML_Serialize
Apply
Assemble subtree of ltTITLEgt
Select (s)
GET (PXI)
Path_ID LIKE TITLESECTIONBOOK
24Predicate Evaluation
- /BOOK_at_ISBN 12
- Search value compared with VALUE column in PXI
- Collapsed path /BOOK/_at_ISBN
- Induce index seeks
- Reduce intermediate result size
- Parent check Par(b)
- Using OrdPath
- Value conversion might be needed
XML_Serialize
Apply
Apply
Assemble subtree of ltBOOKgt
Select (b)
Select
Path_ID_at_ISBNBOOK VALUE12 Par(b)
GET (PXI)
GET (PXI)
Path_IDBOOK
25Ordinal Predicate
- /BOOKn
- Adds ranking column to the rows for ltBOOKgt
elements - Retrieves the nth ltBOOKgt node
- Special optimizations
- 1 ? TOP 1 ascending
- last() ? TOP 1 descending
- Avoids sorting when input is sorted
- Example in XML_Serializer
26Error handling
- Static type errors at compilation time
- Raises static type errors if an expression could
fail at runtime due to type safety violation - Addition of string to integer
- Querying non-existent node name in typed XML
- Non-singleton in eq
- Some can be fixed using explicit cast or ordinal
specification - Dynamic error converted to empty sequence
- Yields correct result in predicates without
negations
27for Iterator
XML_Serialize
- for s in /BOOK//SECTION
- where s/_at_num gt 3
- return s/TITLE
- XML op for for is XmlOp_Apply
- Maps to APPLY
- Binds s and iterates over ltSECTIONgt
- Determines its ltTITLEgt children
- Nested for and for with multiple bindings
turn into nested APPLY - Each APPLY binds to a different variable
Apply
Assemble ltSECTIONgt
Apply (s)
Path_ID LIKE TITLESECTIONBOOK Par(s)
Select
Exists
Select (s)
Select
GET(PXI)
GET (PXI)
Path_ID LIKE _at_numSECBK VALUE gt 3 Par(s)
Path_ID LIKE SECTIONBOOK
28XQuery order by and where
- Order by
- Sorts rows based on order-by expression
- Adds a ranking column to these rows
- Ranking column converted into OrdPath values
- Yield the new order of the rows
- Fits rest of query processing framework
- Where
- Becomes SELECT on input sequence
- Filters rows satisfying specified condition
29XQuery return
- Return nodes sequence in document order
- Use OrdPath values and
- XML_Serialize operator
- New element and sequence constructions
- Merge constructed and existing nodes into a
single sequence (SWITCH_UNION)
30XQuery Functions Operators
- Built-in fn and op are mapped to relational fn
and op if possible - fncount() ? count()
- Additional support for XQuery types, functions
and operators that cannot be mapped directly - Intrinsics
31Optimizations
- Exploiting Ordered Sets
- Sorting information (OrdPath) made available to
further relational operators - XML_Serialize is an example
- Using static type information
- Eliminates CONVERT() in operations
- Allows range scan on VALUE index
32Conclusions
- Built-up infrastructure for query processing
framework - Other XQuery features (such as let and
typeswitch) can be implemented - Data modification language
- Fits into relational query processing framework
- XQuery features can be implemented using rel
operators - Optimizations pose the biggest challenges
- More cost-based optimizations can be done
- Enhanced costing model (e.g. choice of PXI)
- Matching materialized views
33Thank you!