Title: Efficiently Publishing Relational Data as XML Documents
1Efficiently Publishing Relational Data as XML
Documents
- Jayavel Shanmugasundaram et al.
- Proceedings -VLDB 2000, Cairo.
2What drove them?
- Noit wasnt the chaffeur
- XML rapidly emerging as a global standard
- Large amount of data stored in RDBMS and needs to
be exchanged
3Primary Issues
- Language specification
- Implementation what method works best?
- Adding TAG and STRUCTURE - when do you do these
operations?
4Roadmap
- Language specification
- Implementation
- Early tagging, structuring
- Late tagging, structuring
- Early structure, late tagging
- Performance Evaluation
5Our little sample
ltcustomer idC1gt ltnamegt Dilys Thomas lt/namegt
ltaccountsgt ltaccount idA1gt34552211233lt/ac
countgt ltaccount idA2gt98653412223lt/accountgt
lt! Dilys we know where your money isgt
lt/accountsgt ltpordersgt ltporder idPO1
acctA1gt ltitemsgt ltitem
idI1gtGift for Consulate VISA Womanlt/itemgt
ltitem idI1gtTravellers chequeslt/itemgt
lt/itemsgt ltpaymentsgt ltpayment
idP1gt due Feb 12 lt/paymentgt lt!thats today!!gt
lt/paymentsgt lt/pordergt
lt/pordersgt lt/customergt
- Note the
- Elements
- Names/Tags
- ID Refs
6Underlying tables
Customer(id int, name varchar)
PurchOrder(id int, custID int,
acctID varchar,
date varchar)
Account(id varchar, custID int,
acctnum int)
Item(id int, poID int, desc
varchar)
Payment(id int, poID int,
desc varchar)
7SQL-based language spec.
Sqlfunctions Define XMLConstruct ITEM(id int,
desc varchar) AS ltitem ididgtdesc
lt/itemgt Sqlaggregates Select XMLAGG(ITEM(id,
desc)) From Item //
returns an XML aggregation of items
8Sample query
Select cust.name, CUST(cust.id, cust.name,
(Select XMLAGG(ACCT(acct.id,
acct.acctnum)) From Account
acct Where acct.custIdcust.id
), (Select XMLAGG(PORDER(porder.i
d, porder.acct, porder.date,
(Select XMLAGG(ITEM(item.id,
item.desc))
From Item item
Where item.poidporder.id)
(Select XMLAGG(PAYMENT(pay.id,pay.desc)
) From Payment
pay, Where
pay.poidporder.id))) From
PurchOrder porder Where
porder.custIDcust.id)) From Customer cust
Constructs XML from the relational tables.
9Roadmap
- Language specification
- Implementation
- Early tagging, structuring
- Late tagging, structuring
- Early structure, late tagging
- Performance Evaluation
10Implementation alternatives
TAG
STRUCTURE
DB
Result
- Late/early tagging
- Late/early structuring
- (No late structuringearly tagging)
11Early tagging and structuring
- Stored Procedure
- Explicitly issue nested queries
- Get corresponding nested data using other queries
- Done outside relational engine. Tag/str as soon
as results are available. - Too many queries per tuple.
- Fixed order (nested loop join)
12Contd
- Correlated CLOB
- Push queries into the engine
- Plug in XMLAGG, XMLCONSTRUCT support into engine
- Have to handle huge CLOBS in the engine
- Fixed join order
- Decorrelated CLOB
- Decorrelate and use Outer Joins no longer fixed
order - Still carry around CLOBs (due to early tagging!)
13Roadmap
- Language specification
- Implementation
- Early tagging, structuring
- Late tagging, structuring
- Early structure, late tagging
- Performance Evaluation
14Late tagging and Structuring
- 2 phases -gt Content creation Tagging/Structuring
- Redundant Relation
- Blindly join all constituent tables
- Parent data repeated
- Unsorted Outer Union
- Decorrelate query, compute common subexpressions
and use Outer Joins - Take an Outer Union of result tables
- Columns grow with width/depth of XML doc. - Path
Outer Union
15Contd
- Alternatively, dont repeat Parent node at every
child. - Feed parent into Outer Union and only keep parent
Ids with children. Node Outer Union - Greatly increases no of tuples generated
16Outer Union
- Note Outer Joins to retain parents.
- OU
- Separate column in result for each column of
input - Unused cols set to NULL
- Type column added for each row.
17Contd
- 2 phases -gt Content creation Tagging/Structuring
- Inside the Engine
- XMLAGG, XMLCons support required
- Final step after content generated
- CLOBs not carried around
- Outside the Engine
- GROUP Siblings
- Eliminate Duplicates
- Extract info and TAG
18Grouping data
- HASH!
- Every row in the final table has a column with
name of element with all parents (a.b.c.d.e) - Check if lta.b.c.dgt hashes true, TAG accordingly
and add as another child at that level - Else check if lta.b.cgt hashes true, add lta.b.c.dgt
and then lta.b.c.d.egt - And so ontill you either find hash or hit root
element. - Tuples can come in any order. Sufficient mem
required!
19Roadmap
- Language specification
- Implementation
- Early tagging, structuring
- Late tagging, structuring
- Early structure, late tagging
- Performance Evaluation
20Late tagging, Early Structuring
- As beforeonly, now SORT the outer union
- Ensure
- Parent info comes before child
- Info about node and desc. completed before any
other node info starts - Ordering follows user-def condns
21Sort and Tag
- Sort on Pkeys
- Define an order on Pkeys (CustID, AcId, POId,
ItemId, PaymentID) based on structure of XML
Doc. - Parent tuples will have filled values for first
few cols and null for the later ones - Nulls sort low
- Tag in constant memory
- Maximum amount of info to be stored is
proportional only to the depth of the XML Doc.
22Roadmap
- Language specification
- Implementation
- Early tagging, structuring
- Late tagging, structuring
- Early structure, late tagging
- Performance Evaluation
23Modeling transformations
- Query fanout
- Query nesting depth
- Root nodes (tuples in the root table)
- Leaf tuples (tuples corresp to all leaf nodes)
Structure
Result size
24Results (graph time!!)
- Inside the engine versions are about 3 times
as fast as outside counterparts
25Inside vs- Outside the engine
- Query Execution
- Bind out
- Tag/Structure
- Write XML to file
- Bind-out time not required for Inside Engine
approaches
26Query Fan out
- Increasing QFO -gt Greater Joins -gt More time
- CorrCLOB has to use Nested Loop Join Order bad
performance - Unsorted OU better than Sorted OU. Sorting cost gt
Cost of complex tagging - DecorrCLOB optimized by DB2 engine. CLOBs
retained in memory (low fanout)
27Query Depth
- DecorrCLOB huge increase!
- Complexity of queries increases.
- Engine makes bad choices (sorting after XMLAGG
etc)
28Number of Roots
- Outer Union approaches not affected
- CorrCLOB at root1 equiv to just 2 queries!
29Number of tuples, Memory
- If sufficient memory, no great changes!
- If not, Unsorted OU which requires large space
for tagging, fails. Overflow! - Sorted OU based on scalable sorting. Adapt to
large size and less mem better.
30Roadmap
- Language specification
- Implementation
- Early tagging, structuring
- Late tagging, structuring
- Early structure, late tagging
- Performance Evaluation
- Quo vadis?