Title: LegoDB: Cost-based XML to Relational
1LegoDBCost-based XML to Relational Shredding
- Jerome Simeon
- Bell Labs Lucent Technologies
- joint work with
- Juliana Freire (Bell Labs, OGI)
- Jayant Haritsa (IISc, Bangalore)
- Maya Ramanath (IISc, Bangalore)
- Prasan Roy (Bell Labs, ITT Bombay)
2(No Transcript)
3Why Storing XML in a RDBMS?
- For some applications it makes sense to use a
relational database backend - Leverage many years of development of relational
technology - concurrency control/transaction support
- Scalability
- Safety (crash recovery, duplication)
- Integrate with existing data stored in an RDBMS
- Performance matters!!
- But storing and querying XML data in an
- RDBMS is a non-trivial task
4XML and Relational Databases
- Mismatch between the relational model and XML.
- How to store XML data into relational tables?
- Mapping (Shredding) XML data into flat and
regular tables - How to evaluate XML queries over relational
tables? - Mapping XQuery into SQL (or SQL-XML?)...
- Litterature filled with various mapping
proposals - Many variations over binary tables (Florescu et
al, Grust et al). - Shanmugasundaram et. All try to inline as much
nested elements as possible in the same table.
There are many alternative mappings!
5(No Transcript)
6...have various performances
Q1 Simple selection query Q2 Join involving a
fragment of the reviews Q3 publishing query
No given mapping is best
7The LegoDB Storage "Shredding" Engine
- An optimization approach
- automatically explores a space of possible
mappings - selects the mapping which has the lowest cost for
a given application - Important features
- Application-driven takes into account schema,
data statistics and query workload - Logical/physical independence interface is
XML-based (XML Schema, XQuery, XML data
statistics) - Leverage existing technology XML standards
XML-specific operations for generating space of
mappings relational optimizer for evaluating
configurations
8(No Transcript)
9A word about mapping queries
XQuery for x in //show/review where
contains(x/review, Potter) SQL select
review from Show_Table, Review_Table where
Parent_show Show_id // join here! and review
contains Potter
10Transforming Schemas
- Key idea A given document can be validated by
different XML Schemas - Different but equivalent regular expressions can
be used to define an element - The presence or absence of a type name does not
change the semantics of an XML Schema - Applying transformations that manipulate the
types (but preserve the element structure of
schema) leads to a space of distinct relational
configurations - Define XML Schema transformations that
- Exploit the structure of the schema, and
- lead to useful relational configurations
11(No Transcript)
12Regular expression rewritings
(group Agroup B),group C gt group Dgroup E
horizontal table partition define group
D(group A,group C) define group E(group B,group
C) group A gt put
first item in a table, put the group A, group A
rest in a different
table element type string extracts
certain element gt
in separate table element a type string
element (a) type string
...........
13(No Transcript)
14Searching for a good configuration
- Cost is key use a relational optimizer as a
black box - Support different cost-models
- Quality of selected configuration depends on the
accuracy of the optimizer! - Set of possible configurations that result from
applying the rewritings is very large - possibly
infinite! - How to search for the optimal solution?
- LegoDB use a greedy search
- Importance of statistics for cost evaluation
- Large collections vs. small collections (e.g.,
many new york time review or not?) - Selectivity of predicates
- XML structure distribution (distribution of
children / parent relation)
15(No Transcript)
16Conclusion
- Cost-based approach for generating relational
storage for XML - takes application characteristics into account
- schema data 'statistics' queries
- Performance shows that storage significant
performance improvements - The same is likely to be true for other indexing
techniques - Full text index on XML (best for text queries?)
- Native XML indexes (best for Xpath queries
without schema?) - Files!!! (Best when a lot of small documents an
no need for concurrency control) - There is no one best way to store XML...
- We should try to hide that from the user!
17A few other things I've done...
- Mapping XQuery to SQL/XML
- Work with Jonathan Robie
- SQL/XML is an extension of SQL to XML
- Standard mapping from table to XML document
- Standard mapping from relational schema to XML
Schema - Extension of SQL query language to builds XML
elements - Goal is to evaluation XQueries on top of an ODBC
driver supporting XQuery - Approach identify a fragment of XQuery which has
a direct syntactic mapping into SQL / XML - Surprise the syntactic approach worked really
well, because of the way XQuery is designed.
(i.e., FLWOR close to SQL statement). - Galax XQuery 1.0 implementation
18My take on the Data Binding problem
- The main problem will be about the mismatch
between type systems - SQL vs. XML Schema vs. Object-Oriented
- I've never seen a good proposal that brings any
two type systems together (then builds a language
one top of it) - Where I see the best chance of success
- Use XML has the data model (can represent pretty
much anything). - XML Schema can represent relational schema (see
SQL/XML standard) - Can it represent Object-Oriented?