Sam Idicula, Oracle XML DB Development Team - PowerPoint PPT Presentation

This presentation contains information proprietary to Oracle Corporation ... Binary XML Storage and Query Processing in Oracle Sam Idicula, Oracle XML DB Development Team – PowerPoint PPT presentation

1
Binary XML Storage and Query Processing in Oracle
VLDB 2009
  • Sam Idicula, Oracle XML DB Development Team

2
Outline
  • Motivation
  • Binary XML Overview
  • Storage Format Details
  • Query Processing
  • Performance Evaluation
  • Conclusion

3
Previous Oracle XML Storage Models
  • CLOB Storage
  • Text representation preserves exact form of
    original document (including white spaces)
  • Very good performance for insert and full retrieval
  • Size bloat (from tags, string representation of
    dates, numbers, etc.)
  • Need to parse the document for all XML processing
  • Query and DML processing are not efficient
  • Memory overhead with DOM
  • Mid-tier does not take advantage of parsing and
    validation already done on DB tier (and
    vice-versa)


4
Previous Oracle XML Storage Models
  • Object Relational Storage (OR)
  • XML Schema-based mapping to object-relational
    tables
  • Preserves DOM fidelity (more than traditional
    shredding)
  • Simple XPaths translate to table/column access
  • Very good query performance for highly structured
    use cases
  • Flexibility is limited due to schema dependency
  • Insert, full retrieval, etc. perform poorly

5
Motivation/Goals for Binary XML
  • Bridge the gap between two extremes:
  • Structure-unaware text representation: full
    flexibility, poor query performance
  • Object-relational mapping: heavily dependent on
    rigid structure
  • Several customer use cases fall in between these
    extremes
  • Native format that can:
  • Handle full spectrum of XML database use cases
  • Optimize for semi-structured use cases
  • Provide good performance for a wide variety of
    operations
  • Retain flexibility advantage of XML data model
    while providing good performance

6
Customer use cases
[Figure: customer use cases on a spectrum from "High Flexibility" to "Low Flexibility", with the label "Majority of semi-structured customer use cases"]

7
Motivation/Goals for Binary XML
  • XML Schema usage
  • Need to be efficient for query processing on
    schemaless data and loosely structured schemas
  • Ability to use schema constraints for more
    efficient processing
  • Provide good performance for a wide range of
    operations:
  • Query
  • DML: insert/load, partial (piecewise) update
  • Full-document and fragment retrieval
  • Schema validation and evolution
  • Mid-tier integration

8
Oracle Binary XML Overview
  • Compact Schema-aware XML Format
  • Pre-parsed tokenized binary representation
  • Addresses space bloat associated with XML 1.x
    serialization
  • Intended for use in all tiers of the Oracle stack:
  • Oracle XML DB
  • Oracle iAS / XDK Java
  • Exploits XML Schema information if available
  • Also supports non-schema-based encoding
  • Preserves Infoset or Data Model fidelity, not
    byte-level fidelity
  • Can create an XML Index for query optimization

9
Oracle Binary XML
[Figure: the WebCache, AppServer, Database, and Client tiers, labeled "Oracle Binary XML"]
  • Mid-tier processing: Oracle XDK Java support
  • Binary XML allows direct access to
    fragments/sub-trees
  • XML processing optimization: scalable mid-tier DOM

10
Format Details
  • Opcodes roughly corresponding to SAX events
  • Each opcode has a fixed number of operands
  • Document-ordered serialization of opcodes
  • Stored as a BLOB
  • Tag names are tokenized into qname IDs, defined
    either in a central repository or inline
  • Optimized opcodes for simple elements, repeating
    elements etc.
  • Uses native data-types in the presence of XML
    schema
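
A minimal sketch of the idea on this slide: SAX events become opcodes, and tag names become small integer qname IDs. The opcode values, the byte layout, and the names Encoder and encode are assumptions made for illustration; the slides do not give Oracle's actual opcode set or format. Attributes and schema-driven native data types are left out.

```python
import struct
import xml.sax

# Hypothetical opcode values; the real opcode set is not given in the slides.
OP_START, OP_END, OP_TEXT = 1, 2, 3


class Encoder(xml.sax.ContentHandler):
    """Turns SAX events into a document-ordered byte stream of opcodes."""

    def __init__(self):
        super().__init__()
        self.qname_ids = {}     # tag name -> small integer token
        self.out = bytearray()  # the encoded stream (would be stored as a BLOB)

    def _token(self, name):
        # Assign the next free ID the first time a tag name is seen.
        return self.qname_ids.setdefault(name, len(self.qname_ids))

    def startElement(self, name, attrs):
        # OP_START has exactly one operand: a 2-byte qname ID.
        self.out += struct.pack(">BH", OP_START, self._token(name))

    def endElement(self, name):
        # OP_END has no operands.
        self.out += struct.pack(">B", OP_END)

    def characters(self, content):
        # OP_TEXT has one operand (a 2-byte length) followed by the bytes.
        data = content.encode("utf-8")
        if data.strip():
            self.out += struct.pack(">BH", OP_TEXT, len(data)) + data


def encode(xml_text):
    """Return (encoded bytes, qname ID table) for an XML string."""
    encoder = Encoder()
    xml.sax.parseString(xml_text.encode("utf-8"), encoder)
    return bytes(encoder.out), encoder.qname_ids
```

Calling encode("<po><ref>A1</ref><ref>B2</ref></po>") stores each tag name once in the ID table, while the stream itself carries only the 2-byte IDs. This is where the space saving over repeated text tags comes from.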

11
Streaming Capabilities
  • Streaming XPath evaluation
  • XPathTable with NFA: multiple XPaths evaluated in
    a single pass
  • Forward axes
  • Streaming partial updates
  • Most common update scenarios handled in a
    streaming manner
  • e.g. updateXML('/purchaseOrder/Reference/text()',
    'XXXX')
  • Can be directly applied on disk, avoiding
    expensive DOM construction
  • Takes advantage of the Oracle SecureFile LOB
    storage to perform delta updates
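
A minimal sketch of a streaming partial update, assuming the document is available as a stream of (kind, value) events. That event format and the function name update_text are hypothetical. Only the current path is kept in memory, so no DOM is built. The SecureFile LOB delta update is not modeled here.

```python
def update_text(events, target_path, new_value):
    """Yield events unchanged, except text directly under target_path."""
    path = []  # names of the currently open elements
    for kind, value in events:
        if kind == "start":
            path.append(value)
        elif kind == "end":
            path.pop()
        elif kind == "text" and path == target_path:
            value = new_value  # the only event that is rewritten
        yield kind, value


# Mirrors the effect of updating /purchaseOrder/Reference/text() to XXXX.
doc = [
    ("start", "purchaseOrder"),
    ("start", "Reference"),
    ("text", "OLD-REF"),
    ("end", "Reference"),
    ("start", "User"),
    ("text", "SBELL"),
    ("end", "User"),
    ("end", "purchaseOrder"),
]
updated = list(update_text(doc, ["purchaseOrder", "Reference"], "XXXX"))
```

Because update_text is a generator, the output can be written as the input is read, one event at a time.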

12
Query Processing Architecture
[Figure: architecture diagram with the components XQuery, SQL/XML, DB XQuery Rewrite, XMLIndex (path-based and table-based), Functional Evaluation (Streaming XPath), and Binary XML]

13
Document-level Summary
  • Long-term goal: efficient tree-oriented
    navigation
  • Important for query execution
  • Pure streaming is too costly over large documents
  • Current implementation:
  • Start and end offsets for large subtrees
  • Threshold for "large" can be adjusted
  • Used for skipping to end of subtree
  • Working on significant enhancements
  • Handling all axes
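
A minimal sketch of the current implementation described above: record start and end offsets only for subtrees at or above a size threshold, then use them to jump past a subtree. Offsets here are indexes into a list of (kind, value) events rather than byte offsets, and the names build_summary and skip_subtree are hypothetical.

```python
def build_summary(events, threshold):
    """Map start offset -> end offset for subtrees of >= threshold events."""
    summary, stack = {}, []
    for offset, (kind, _) in enumerate(events):
        if kind == "start":
            stack.append(offset)
        elif kind == "end":
            start = stack.pop()
            if offset - start + 1 >= threshold:  # only "large" subtrees
                summary[start] = offset
    return summary


def skip_subtree(events, summary, start):
    """Return the offset just past the subtree that starts at `start`."""
    if start in summary:
        return summary[start] + 1  # one jump, no scanning
    depth, offset = 0, start
    while True:  # fallback: scan to the matching end event
        kind = events[offset][0]
        depth += (kind == "start") - (kind == "end")
        offset += 1
        if depth == 0:
            return offset


events = [
    ("start", "a"),
    ("start", "b"), ("text", "x"), ("end", "b"),
    ("start", "c"), ("end", "c"),
    ("end", "a"),
]
summary = build_summary(events, threshold=3)
```

Raising the threshold shrinks the summary at the cost of more scanning for mid-sized subtrees, which is the trade-off the adjustable threshold exposes.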

14
Search-based Decoder
  • Goal: search for a simple XPath or XPath location
    step in a Binary XML stream
  • Main search params are (axis, qname ID)
  • Supports wild cards
  • OR of multiple qname IDs allowed
  • Return only when there's a result or search is
    done
  • Skip irrelevant subtrees
  • Using summary if possible
  • Schema-aware search
  • Can search for kidnum or child-position instead
    of qname ID
  • Can terminate search earlier based on schema
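
A minimal sketch of a search keyed on (axis, qname ID), assuming a list of (kind, value) events in which start and end events carry integer qname IDs. The function names and event format are hypothetical. On the child axis, whole child subtrees are skipped because nothing inside them can match. The summary-based skip and the schema-aware variants (kidnum, early termination) are not modeled.

```python
def subtree_end(events, start):
    """Offset of the end event matching the start event at `start`."""
    depth = 0
    for offset in range(start, len(events)):
        kind = events[offset][0]
        depth += (kind == "start") - (kind == "end")
        if depth == 0:
            return offset
    raise ValueError("unbalanced event stream")


def search(events, context, axis, qname_ids=None):
    """Yield offsets of elements on `axis` ("child" or "descendant") of the
    element starting at `context` whose qname ID is in `qname_ids`.
    qname_ids=None acts as a wildcard; a set with several IDs acts as OR."""
    end = subtree_end(events, context)
    offset = context + 1
    while offset < end:
        kind, value = events[offset]
        if kind != "start":
            offset += 1
            continue
        if qname_ids is None or value in qname_ids:
            yield offset
        if axis == "child":
            # Skip the whole child subtree: grandchildren cannot match.
            offset = subtree_end(events, offset) + 1
        else:
            offset += 1


# Hypothetical qname IDs: 0 = po, 1 = ref, 2 = item, 3 = name
events = [
    ("start", 0),
    ("start", 1), ("text", "A1"), ("end", 1),
    ("start", 2), ("start", 3), ("text", "pen"), ("end", 3), ("end", 2),
    ("start", 2), ("end", 2),
    ("end", 0),
]
items = list(search(events, 0, "child", {2}))
```

The caller only sees offsets of matching elements, so it is called back only when there is a result or the search is done.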

15
Schema-aware NFA
  • Goal: evaluate multiple XPaths in a single pass
    over the document
  • Uses a YFilter-like approach to build the NFA
  • Works in conjunction with search-based decoder
  • Translates transitions to searches when possible
  • Push unbranched linear state transition paths
    into search-based decoder
  • Uses XML schema when available
  • Use of kidnum instead of qname ID
  • Sequence and occurrence constraints
  • Derives a strict sequential constraint
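
A minimal sketch of evaluating several XPaths in one pass with a shared-prefix NFA, in the spirit of the YFilter-like approach named above. It handles only the child axis, "//", and the "*" wildcard over a list of (kind, name) events. The function names and event format are hypothetical, and the schema-aware parts (kidnum, sequence constraints, pushing linear paths into the decoder) are not modeled.

```python
def build_nfa(xpaths):
    """Build one NFA for all paths; common prefixes share states."""
    trans = [{}]     # trans[s][label] -> next state
    loops = [False]  # loops[s]: s stays active at every deeper level ("//")
    accepts = {}     # accepting state -> XPaths that end there

    def step_to(state, label, loop=False):
        if label not in trans[state]:
            trans.append({})
            loops.append(loop)
            trans[state][label] = len(trans) - 1
        return trans[state][label]

    for xpath in xpaths:
        state = 0
        for step in xpath.split("/")[1:]:
            if step == "":
                # Empty step comes from "//": an epsilon move (stored under
                # the key "//") into a self-looping state.
                state = step_to(state, "//", loop=True)
            else:
                state = step_to(state, step)
        accepts.setdefault(state, []).append(xpath)
    return trans, loops, accepts


def match_all(events, xpaths):
    """Return {xpath: [offsets of matching start events]} in one pass."""
    trans, loops, accepts = build_nfa(xpaths)

    def closure(states):
        # Follow "//" epsilon moves; a looping state never has one itself.
        return states | {trans[s]["//"] for s in states if "//" in trans[s]}

    results = {xpath: [] for xpath in xpaths}
    stack = [closure({0})]  # one set of active states per open element
    for offset, (kind, name) in enumerate(events):
        if kind == "start":
            active = set()
            for s in stack[-1]:
                for label in (name, "*"):
                    if label in trans[s]:
                        active.add(trans[s][label])
                if loops[s]:
                    active.add(s)
            active = closure(active)
            for s in active:
                for xpath in accepts.get(s, ()):
                    results[xpath].append(offset)
            stack.append(active)
        elif kind == "end":
            stack.pop()
    return results


events = [
    ("start", "po"),
    ("start", "ref"), ("end", "ref"),
    ("start", "item"), ("start", "name"), ("end", "name"), ("end", "item"),
    ("end", "po"),
]
xpaths = ["/po/item/name", "//name", "/po/*/name", "/po/ref"]
matches = match_all(events, xpaths)
```

Three of the four paths share the state reached on "po", so that step is evaluated once per element rather than once per XPath.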

16
Performance: Query (XMark)
  • Ratio of elapsed time (geometric mean) for 100M
    XMark doc
  • SB: schema-based
  • NSB: non-schema-based
  • CLOB is 144x
  • No indexes

17
Performance: Insert
  • Ratio of elapsed time for XMark 10M doc
  • SB: schema-based

18
Performance: Full Retrieval
  • Ratio of elapsed time for XMark 10M doc
  • SB: schema-based

19
Performance: Compression
  • D1: structured
  • D2: semi-structured
  • D3: document-centric
  • Based on actual customer datasets with a mix of
    XML document sizes
  • Further compression possible via SecureFile LOB
    compression

20
Summary
  • Binary XML
  • Native XML storage format
  • Handle full spectrum of XML use cases
  • Schema-aware
  • Query Processing Optimizations
  • Search-based Decoder
  • Document-level Summary
  • Performance Results

21
For more information
  • Contact
  • Sam.Idicula_at_Oracle.com
  • Downloads, technical documentation
  • http://www.oracle.com/technology/tech/xml/xmldb/index.html