Title: Efficiently Publishing Relational Data as XML Documents
1Efficiently Publishing Relational Data as XML
Documents
Jayavel ShanmugasundaramUniv. Wisconsin/IBM
Almaden
Eugene ShekitaRimon BarrMichael CareyBruce
LindsayHamid PiraheshBerthold Reinwald
Joint work with
IBM Almaden Research Center
2Outline
3XML Example
John
Mary
Internet
Recycling
4What is the big deal about XML?
- Elegantly models complex, hierarchical/
graph-structured data - Domain-specific tags (unlike HTML)
- Standardized!
- Fast emerging as dominant standard for data
exchange on the WWW
5Why Relational Data?
- Most business data stored in relational databases
- Unlikely to change in the near future
- Scalability, Reliability, Performance, Tools
- Need efficient means to publish relational data
as XML documents
6Usage Scenario
Application/User Query to produce XML Documents
XML Result (processed or displayed in browser)
The Internet
Existing Database System (RDBMS)
7Outline
8Example Relational Schema
9XML Representation
-
-
- John
- Mary
-
-
- Internet
- Recycling
-
10Main Issues
- Relational data is flat, XML is a tagged graph
- How do we specify translation from flat model to
a graph model? - A query language to map from relations to XML
- How do we transform flat representations to
tagged nested representations? - Efficient implementation strategies
11Outline
- Why?
- How?
- Language?
- Mechanism?
- Which?
- Hence
12SQL Key Ideas
- Sub-queries to specify nesting
- Scalar functions to specify tags/attributes
- XML Constructors
- Aggregate functions to group child elements
13Example Relational Schema
14SQL Query to publish XML
Select DEPT(d.name,
,
)From Department d
15SQL XML Constructor
Define XML Constructor DEPT(dname varchar(20),
emplist xml,
projlist xml) As ( namedname emplist
projlist
)
16SQL Query to publish XML
Select DEPT(d.name,
,
)From Department d
17SQL Query to publish XML
Select DEPT(d.name,
(Select XMLAGG(EMP(e.name))
From Employee e Where
e.deptno d.deptno),
(Select XMLAGG(PROJ(p.name))
From Project p Where
p.deptno d.deptno) )From
Department d
18Query Result
()
-
-
- John
- Mary
-
-
- Internet
- Recycling
-
19Outline
- Why?
- How?
- Language?
- Mechanism?
- Which?
- Hence
20Relations to XML Issues
- Two main differences
- Nesting (structuring)
- Tagging
- Space of alternatives
Late Tagging
Early Tagging
Outside Engine
Outside Engine
Early Structuring
Inside Engine
Inside Engine
Outside Engine
Late Structuring
Inside Engine
21Stored Procedure Approach
Early Tagging, Early Structuring, Outside Engine
- Issue queries for sub-structures and tag them
- Could be a Stored Procedure
(10, Purchasing)
DBMS Engine
Department
(John) (Mary)
(Internet) (Recycling)
Employee
Project
- Problem Too many SQL queries!
22Correlated CLOB Approach
Early Tagging, Early Structuring, Inside Engine
Select DEPT(d.name,
(Select XMLAGG(EMP(e.name))
From Employee e Where
e.deptno d.deptno),
(Select XMLAGG(PROJ(p.name))
From Project p Where
p.deptno d.deptno) )From
Department d
- Problem Correlated execution of sub-queries
23De-Correlated CLOB Approach
Early Tagging, Early Structuring, Inside Engine
- Compute employee lists associated with all
departments - Compute project lists associated with all
departments - Join results above on department id
- Problem CLOBs during query processing
24Late Tagging, Late Structuring
- XML document content produced without structure
(in arbitrary order) - Tagger enforces order as final step
Result XML Document
Tagging
Unstructured content
Relational QueryProcessing
25Redundant Relation Approach
Late Tagging, Late Structuring
- How do we represent nested content as relations?
- Problem Large relation due to data redundancy!
26Outer Union Approach
Late Tagging, Late Structuring
- How do we represent nested content as relations?
(Purchasing, null, Internet , 0) (Purchasing,
null, Recycling, 0) (Purchasing, John, null
, 1) (Purchasing, Mary, null , 1)
(Purchasing, Internet) (Purchasing, Recycling)
(Purchasing, John) (Purchasing, Mary)
(10, Purchasing)
- Problem Wide tuples (having many columns)
27Hash-based Tagger
Late Tagging, Late Structuring
- Results not structured early
- In arbitrary order
- Tagger has to enforce order during tagging
- Hash-based approach
- Inside/Outside engine tagger
- Problem Requires memory for entire document
28Late Tagging, Early Structuring
- Structured XML document content produced
- Tagger just adds tags (constant space)
Result XML Document
Tagging
Structured content
Relational QueryProcessing
29Sorted Outer Union Approach
Late Tagging, Early Structuring
A B n D n n n
A B n n E n n
B
C
A n C n n F n
D
E
F
G
A n C n n n G
Sort By Aid, Bid, Cid
- Problem Only partial ordering required
30Constant Space Tagger
Late Tagging, Late Structuring
- Detects changes in XML document hierarchy
- Adds appropriate opening/closing tags
- Inside/outside engine
31Classification of Alternatives
Late Tagging
Early Tagging
Inside Engine
Inside Engine
De-Correlated CLOB
Sorted Outer Union(Tagging inside)
Outside Engine
Outside Engine
Correlated CLOB
EarlyStructuring
Sorted Outer Union(Tagging outside)
Stored Procedure
Inside Engine
Unsorted Outer Union(Tagging inside)
Outside Engine
LateStructuring
Unsorted Outer Union(Tagging outside)
32Outline
- Why?
- How?
- Language?
- Mechanism?
- Which?
- Hence
33Where Does Time Go?
34Performance Evaluation Summary
Late Tagging
Early Tagging
?
Inside Engine
Inside Engine
De-Correlated CLOB
Sorted Outer Union(Tagging inside)
Outside Engine
Outside Engine
Correlated CLOB
EarlyStructuring
Sorted Outer Union(Tagging outside)
Stored Procedure
Inside Engine
Unsorted Outer Union(Tagging inside)
?
Outside Engine
LateStructuring
Unsorted Outer Union(Tagging outside)
35Outline
- Why?
- How?
- Language?
- Mechanism?
- Which?
- Hence
36Conclusion
- Publishing XML from relational sources important
in Internet - SQL-based language specification
- Implementation Alternatives
- Inside engine Outside engine
- Unsorted Outer Union sufficient main memory
- Sorted Outer Union otherwise (most stable)
37Related Work
- SilkRoute (WWW 2000)
- Oracles XML extensions (ICDE 2000)
- Microsofts XDR
- XPERANTO (VLDB 2000 - demo tomorrow)
38Performance Evaluation
Database Size
Query Depth
Query Fan Out
39Effect of Query Depth
40De-Correlated CLOB Approach
Early Tagging, Early Structuring, Inside Engine
With EmpStruct (deptname, empinfo) AS (
Select d.deptname,
XMLAGG(EMP(employee, e.empname)) From
department d left join employee e on
d.deptid e.deptid Group By d.deptname)
With ProjStruct (deptname, projinfo) AS (
Select d.deptname,
XMLAGG(PROJ(employee, p.projname)) From
department d left join project p on
d.deptid e.deptid Group By d.deptname)
Select DEPT(name, d1.empinfo, d2.projinfo)) From
EmpStruct d1 full join ProjStruct d2
on d1.deptname d2.deptname
- Problem CLOBs during processing