Efficiently Publishing Relational Data as XML Documents - PowerPoint PPT Presentation

About This Presentation
Title:

Efficiently Publishing Relational Data as XML Documents

Description:

Efficiently Publishing Relational Data as XML Documents. Jayavel Shanmugasundaram ... XML rapidly emerging as a global standard ... – PowerPoint PPT presentation

Number of Views:64
Avg rating:3.0/5.0
Slides: 31
Provided by: san57
Category:

less

Transcript and Presenter's Notes

Title: Efficiently Publishing Relational Data as XML Documents


1
Efficiently Publishing Relational Data as XML
Documents
  • Jayavel Shanmugasundaram et al.
  • Proceedings -VLDB 2000, Cairo.

2
What drove them?
  • Noit wasnt the chaffeur
  • XML rapidly emerging as a global standard
  • Large amount of data stored in RDBMS and needs to
    be exchanged

3
Primary Issues
  • Language specification
  • Implementation what method works best?
  • Adding TAG and STRUCTURE - when do you do these
    operations?

4
Roadmap
  • Language specification
  • Implementation
  • Early tagging, structuring
  • Late tagging, structuring
  • Early structure, late tagging
  • Performance Evaluation

5
Our little sample
ltcustomer idC1gt ltnamegt Dilys Thomas lt/namegt
ltaccountsgt ltaccount idA1gt34552211233lt/ac
countgt ltaccount idA2gt98653412223lt/accountgt
lt! Dilys we know where your money isgt
lt/accountsgt ltpordersgt ltporder idPO1
acctA1gt ltitemsgt ltitem
idI1gtGift for Consulate VISA Womanlt/itemgt
ltitem idI1gtTravellers chequeslt/itemgt
lt/itemsgt ltpaymentsgt ltpayment
idP1gt due Feb 12 lt/paymentgt lt!thats today!!gt
lt/paymentsgt lt/pordergt
lt/pordersgt lt/customergt
  • Note the
  • Elements
  • Names/Tags
  • ID Refs

6
Underlying tables
Customer(id int, name varchar)
PurchOrder(id int, custID int,
acctID varchar,
date varchar)
Account(id varchar, custID int,
acctnum int)
Item(id int, poID int, desc
varchar)
Payment(id int, poID int,
desc varchar)
7
SQL-based language spec.
Sqlfunctions Define XMLConstruct ITEM(id int,
desc varchar) AS ltitem ididgtdesc
lt/itemgt Sqlaggregates Select XMLAGG(ITEM(id,
desc)) From Item //
returns an XML aggregation of items
8
Sample query
Select cust.name, CUST(cust.id, cust.name,
(Select XMLAGG(ACCT(acct.id,
acct.acctnum)) From Account
acct Where acct.custIdcust.id
), (Select XMLAGG(PORDER(porder.i
d, porder.acct, porder.date,
(Select XMLAGG(ITEM(item.id,
item.desc))
From Item item
Where item.poidporder.id)
(Select XMLAGG(PAYMENT(pay.id,pay.desc)
) From Payment
pay, Where
pay.poidporder.id))) From
PurchOrder porder Where
porder.custIDcust.id)) From Customer cust

Constructs XML from the relational tables.
9
Roadmap
  • Language specification
  • Implementation
  • Early tagging, structuring
  • Late tagging, structuring
  • Early structure, late tagging
  • Performance Evaluation

10
Implementation alternatives
TAG
STRUCTURE
DB
Result
  • Late/early tagging
  • Late/early structuring
  • (No late structuringearly tagging)

11
Early tagging and structuring
  • Stored Procedure
  • Explicitly issue nested queries
  • Get corresponding nested data using other queries
  • Done outside relational engine. Tag/str as soon
    as results are available.
  • Too many queries per tuple.
  • Fixed order (nested loop join)

12
Contd
  • Correlated CLOB
  • Push queries into the engine
  • Plug in XMLAGG, XMLCONSTRUCT support into engine
  • Have to handle huge CLOBS in the engine
  • Fixed join order
  • Decorrelated CLOB
  • Decorrelate and use Outer Joins no longer fixed
    order
  • Still carry around CLOBs (due to early tagging!)

13
Roadmap
  • Language specification
  • Implementation
  • Early tagging, structuring
  • Late tagging, structuring
  • Early structure, late tagging
  • Performance Evaluation

14
Late tagging and Structuring
  • 2 phases -gt Content creation Tagging/Structuring
  • Redundant Relation
  • Blindly join all constituent tables
  • Parent data repeated
  • Unsorted Outer Union
  • Decorrelate query, compute common subexpressions
    and use Outer Joins
  • Take an Outer Union of result tables
  • Columns grow with width/depth of XML doc. - Path
    Outer Union

15
Contd
  • Alternatively, dont repeat Parent node at every
    child.
  • Feed parent into Outer Union and only keep parent
    Ids with children. Node Outer Union
  • Greatly increases no of tuples generated

16
Outer Union
  • Note Outer Joins to retain parents.
  • OU
  • Separate column in result for each column of
    input
  • Unused cols set to NULL
  • Type column added for each row.

17
Contd
  • 2 phases -gt Content creation Tagging/Structuring
  • Inside the Engine
  • XMLAGG, XMLCons support required
  • Final step after content generated
  • CLOBs not carried around
  • Outside the Engine
  • GROUP Siblings
  • Eliminate Duplicates
  • Extract info and TAG

18
Grouping data
  • HASH!
  • Every row in the final table has a column with
    name of element with all parents (a.b.c.d.e)
  • Check if lta.b.c.dgt hashes true, TAG accordingly
    and add as another child at that level
  • Else check if lta.b.cgt hashes true, add lta.b.c.dgt
    and then lta.b.c.d.egt
  • And so ontill you either find hash or hit root
    element.
  • Tuples can come in any order. Sufficient mem
    required!

19
Roadmap
  • Language specification
  • Implementation
  • Early tagging, structuring
  • Late tagging, structuring
  • Early structure, late tagging
  • Performance Evaluation

20
Late tagging, Early Structuring
  • As beforeonly, now SORT the outer union
  • Ensure
  • Parent info comes before child
  • Info about node and desc. completed before any
    other node info starts
  • Ordering follows user-def condns

21
Sort and Tag
  • Sort on Pkeys
  • Define an order on Pkeys (CustID, AcId, POId,
    ItemId, PaymentID) based on structure of XML
    Doc.
  • Parent tuples will have filled values for first
    few cols and null for the later ones
  • Nulls sort low
  • Tag in constant memory
  • Maximum amount of info to be stored is
    proportional only to the depth of the XML Doc.

22
Roadmap
  • Language specification
  • Implementation
  • Early tagging, structuring
  • Late tagging, structuring
  • Early structure, late tagging
  • Performance Evaluation

23
Modeling transformations
  • Query fanout
  • Query nesting depth
  • Root nodes (tuples in the root table)
  • Leaf tuples (tuples corresp to all leaf nodes)

Structure
Result size
24
Results (graph time!!)
  • Inside the engine versions are about 3 times
    as fast as outside counterparts

25
Inside vs- Outside the engine
  • Query Execution
  • Bind out
  • Tag/Structure
  • Write XML to file
  • Bind-out time not required for Inside Engine
    approaches

26
Query Fan out
  • Increasing QFO -gt Greater Joins -gt More time
  • CorrCLOB has to use Nested Loop Join Order bad
    performance
  • Unsorted OU better than Sorted OU. Sorting cost gt
    Cost of complex tagging
  • DecorrCLOB optimized by DB2 engine. CLOBs
    retained in memory (low fanout)

27
Query Depth
  • DecorrCLOB huge increase!
  • Complexity of queries increases.
  • Engine makes bad choices (sorting after XMLAGG
    etc)

28
Number of Roots
  • Outer Union approaches not affected
  • CorrCLOB at root1 equiv to just 2 queries!

29
Number of tuples, Memory
  • If sufficient memory, no great changes!
  • If not, Unsorted OU which requires large space
    for tagging, fails. Overflow!
  • Sorted OU based on scalable sorting. Adapt to
    large size and less mem better.

30
Roadmap
  • Language specification
  • Implementation
  • Early tagging, structuring
  • Late tagging, structuring
  • Early structure, late tagging
  • Performance Evaluation
  • Quo vadis?
Write a Comment
User Comments (0)
About PowerShow.com