Efficiently Publishing Relational Data as XML Documents - PowerPoint PPT Presentation

About This Presentation
Title:

Efficiently Publishing Relational Data as XML Documents

Description:

Michael Carey. Bruce Lindsay. Hamid Pirahesh. Berthold Reinwald. Jayavel Shanmugasundaram ... (Purchasing, Mary) (10, Purchasing) (Purchasing, null, Internet , 0) ... – PowerPoint PPT presentation

Number of Views:64
Avg rating:3.0/5.0
Slides: 41
Provided by: jayavelsha
Category:

less

Transcript and Presenter's Notes

Title: Efficiently Publishing Relational Data as XML Documents


1
Efficiently Publishing Relational Data as XML
Documents
Jayavel ShanmugasundaramUniv. Wisconsin/IBM
Almaden
Eugene ShekitaRimon BarrMichael CareyBruce
LindsayHamid PiraheshBerthold Reinwald
Joint work with
IBM Almaden Research Center
2
Outline
  • Why?
  • How?
  • Which?
  • Hence

3
XML Example

John
Mary
Internet
Recycling

4
What is the big deal about XML?
  • Elegantly models complex, hierarchical/
    graph-structured data
  • Domain-specific tags (unlike HTML)
  • Standardized!
  • Fast emerging as dominant standard for data
    exchange on the WWW

5
Why Relational Data?
  • Most business data stored in relational databases
  • Unlikely to change in the near future
  • Scalability, Reliability, Performance, Tools
  • Need efficient means to publish relational data
    as XML documents

6
Usage Scenario
Application/User Query to produce XML Documents
XML Result (processed or displayed in browser)
The Internet
Existing Database System (RDBMS)
7
Outline
  • Why?
  • How?
  • Which?
  • Hence

8
Example Relational Schema
9
XML Representation
  • John
  • Mary
  • Internet
  • Recycling

10
Main Issues
  • Relational data is flat, XML is a tagged graph
  • How do we specify translation from flat model to
    a graph model?
  • A query language to map from relations to XML
  • How do we transform flat representations to
    tagged nested representations?
  • Efficient implementation strategies

11
Outline
  • Why?
  • How?
  • Language?
  • Mechanism?
  • Which?
  • Hence

12
SQL Key Ideas
  • Sub-queries to specify nesting
  • Scalar functions to specify tags/attributes
  • XML Constructors
  • Aggregate functions to group child elements

13
Example Relational Schema
14
SQL Query to publish XML
Select DEPT(d.name,
,

)From Department d
15
SQL XML Constructor
Define XML Constructor DEPT(dname varchar(20),

emplist xml,
projlist xml) As ( namedname emplist
projlist
)
16
SQL Query to publish XML
Select DEPT(d.name,
,

)From Department d
17
SQL Query to publish XML
Select DEPT(d.name,
(Select XMLAGG(EMP(e.name))
From Employee e Where
e.deptno d.deptno),
(Select XMLAGG(PROJ(p.name))
From Project p Where
p.deptno d.deptno) )From
Department d
18
Query Result
()
  • John
  • Mary
  • Internet
  • Recycling

19
Outline
  • Why?
  • How?
  • Language?
  • Mechanism?
  • Which?
  • Hence

20
Relations to XML Issues
  • Two main differences
  • Nesting (structuring)
  • Tagging
  • Space of alternatives

Late Tagging
Early Tagging
Outside Engine
Outside Engine
Early Structuring
Inside Engine
Inside Engine
Outside Engine
Late Structuring
Inside Engine
21
Stored Procedure Approach
Early Tagging, Early Structuring, Outside Engine
  • Issue queries for sub-structures and tag them
  • Could be a Stored Procedure

(10, Purchasing)
DBMS Engine
Department
(John) (Mary)
(Internet) (Recycling)
Employee
Project
  • Problem Too many SQL queries!

22
Correlated CLOB Approach
Early Tagging, Early Structuring, Inside Engine
Select DEPT(d.name,
(Select XMLAGG(EMP(e.name))
From Employee e Where
e.deptno d.deptno),
(Select XMLAGG(PROJ(p.name))
From Project p Where
p.deptno d.deptno) )From
Department d
  • Problem Correlated execution of sub-queries

23
De-Correlated CLOB Approach
Early Tagging, Early Structuring, Inside Engine
  • Compute employee lists associated with all
    departments
  • Compute project lists associated with all
    departments
  • Join results above on department id
  • Problem CLOBs during query processing

24
Late Tagging, Late Structuring
  • XML document content produced without structure
    (in arbitrary order)
  • Tagger enforces order as final step

Result XML Document
Tagging
Unstructured content
Relational QueryProcessing
25
Redundant Relation Approach
Late Tagging, Late Structuring
  • How do we represent nested content as relations?
  • Problem Large relation due to data redundancy!

26
Outer Union Approach
Late Tagging, Late Structuring
  • How do we represent nested content as relations?

(Purchasing, null, Internet , 0) (Purchasing,
null, Recycling, 0) (Purchasing, John, null
, 1) (Purchasing, Mary, null , 1)
(Purchasing, Internet) (Purchasing, Recycling)
(Purchasing, John) (Purchasing, Mary)
(10, Purchasing)
  • Problem Wide tuples (having many columns)

27
Hash-based Tagger
Late Tagging, Late Structuring
  • Results not structured early
  • In arbitrary order
  • Tagger has to enforce order during tagging
  • Hash-based approach
  • Inside/Outside engine tagger
  • Problem Requires memory for entire document

28
Late Tagging, Early Structuring
  • Structured XML document content produced
  • Tagger just adds tags (constant space)

Result XML Document
Tagging
Structured content
Relational QueryProcessing
29
Sorted Outer Union Approach
Late Tagging, Early Structuring
A B n D n n n
A B n n E n n
B
C
A n C n n F n
D
E
F
G
A n C n n n G
Sort By Aid, Bid, Cid
  • Problem Only partial ordering required

30
Constant Space Tagger
Late Tagging, Late Structuring
  • Detects changes in XML document hierarchy
  • Adds appropriate opening/closing tags
  • Inside/outside engine

31
Classification of Alternatives
Late Tagging
Early Tagging
Inside Engine
Inside Engine
De-Correlated CLOB
Sorted Outer Union(Tagging inside)
Outside Engine
Outside Engine
Correlated CLOB
EarlyStructuring
Sorted Outer Union(Tagging outside)
Stored Procedure
Inside Engine
Unsorted Outer Union(Tagging inside)
Outside Engine
LateStructuring
Unsorted Outer Union(Tagging outside)
32
Outline
  • Why?
  • How?
  • Language?
  • Mechanism?
  • Which?
  • Hence

33
Where Does Time Go?
34
Performance Evaluation Summary
Late Tagging
Early Tagging
?
Inside Engine
Inside Engine
De-Correlated CLOB
Sorted Outer Union(Tagging inside)
Outside Engine
Outside Engine
Correlated CLOB
EarlyStructuring
Sorted Outer Union(Tagging outside)
Stored Procedure
Inside Engine
Unsorted Outer Union(Tagging inside)
?
Outside Engine
LateStructuring
Unsorted Outer Union(Tagging outside)
35
Outline
  • Why?
  • How?
  • Language?
  • Mechanism?
  • Which?
  • Hence

36
Conclusion
  • Publishing XML from relational sources important
    in Internet
  • SQL-based language specification
  • Implementation Alternatives
  • Inside engine Outside engine
  • Unsorted Outer Union sufficient main memory
  • Sorted Outer Union otherwise (most stable)

37
Related Work
  • SilkRoute (WWW 2000)
  • Oracles XML extensions (ICDE 2000)
  • Microsofts XDR
  • XPERANTO (VLDB 2000 - demo tomorrow)

38
Performance Evaluation
Database Size
Query Depth
Query Fan Out
39
Effect of Query Depth
40
De-Correlated CLOB Approach
Early Tagging, Early Structuring, Inside Engine
With EmpStruct (deptname, empinfo) AS (
Select d.deptname,
XMLAGG(EMP(employee, e.empname)) From
department d left join employee e on
d.deptid e.deptid Group By d.deptname)
With ProjStruct (deptname, projinfo) AS (
Select d.deptname,
XMLAGG(PROJ(employee, p.projname)) From
department d left join project p on
d.deptid e.deptid Group By d.deptname)
Select DEPT(name, d1.empinfo, d2.projinfo)) From
EmpStruct d1 full join ProjStruct d2
on d1.deptname d2.deptname
  • Problem CLOBs during processing
Write a Comment
User Comments (0)
About PowerShow.com