Bridging Relational Technology and XML - PowerPoint PPT Presentation

About This Presentation
Title:

Bridging Relational Technology and XML

Description:

Bridging Relational Technology and XML Jayavel Shanmugasundaram University of Wisconsin & IBM Almaden Research Center – PowerPoint PPT presentation

Number of Views:144
Avg rating:3.0/5.0
Slides: 74
Provided by: IBMU648
Learn more at: https://dsf.berkeley.edu
Category:

less

Transcript and Presenter's Notes

Title: Bridging Relational Technology and XML


1
Bridging Relational Technologyand XML
  • Jayavel Shanmugasundaram

University of Wisconsin IBM Almaden Research
Center
2
Business to Business Interactions
Cars R Us
Tires R Us
3
Shift in Application Developers Conceptual Data
Model
XML
4
Are XML Database Systemsthe Answer?
Cars R Us
Tires R Us
Purchasing Application
XML DatabaseSystem
5
Why use Relational Database Systems?
  • Highly reliable, scalable, optimized for
    performance, advanced functionality
  • Result of 30 years of Research Development
  • XML database systems are not industrial
    strength and not expected to be in the
    foreseeable future
  • Existing data and applications
  • XML applications have to inter-operate with
    existing relational data and applications
  • Not enough incentive to move all existing
    business applications to XML database systems
  • Remember object-oriented database systems?

6
A Solution
Cars R Us
Tires R Us
eXtensible Markup Language (XML)
Internet
Order Fulfillment Application
Purchasing Application
Relational DatabaseSystem
Relational DatabaseSystem
7
XML Translation Layer(Contributions)
  • Store and query XML documents
  • Harnesses relational database technology for this
    purpose
  • Publish existing relational data as XML documents
  • Allows relational data to be viewed in XML terms

8
Bridging Relational Technologyand XML
XML
9
Outline
  • Motivation High-level Solution
  • Background (Relations, XML)
  • Storing and Querying XML Documents
  • Publishing Relational Data as XML Documents
  • Conclusion

10
Relational Schema and Data
PurchaseOrder
Id
Customer
Year
Month
Day
200I
Cars R Us
10
June
1999
300I
Bikes R Us
null
July
1999
Item
Payment
Name
Cost
Quantity
Installment
Percentage
Pid
Pid
1
200I
200I
Firestone Tire
50
2000.00
40
200I
300I
Schwinn Tire
100
60
2
2500.00
300I
300I
Trek Tire
20
100
1
400.00
200I
Goodyear Tire
200
8000.00
11
XML Document
ltPurchaseOrder id200I customerCars R Usgt
ltDategt ltDaygt 10 lt/Daygt
ltMonthgt June lt/Monthgt ltYeargt 1999
lt/Yeargt lt/Dategt ltItem nameFirestone
Tire cost2000.00gt ltQuantitygt 50
lt/Quantitygt lt/Itemgt ltItem
nameGoodyear Tire cost8000.00gt
ltQuantitygt 200 lt/Quantitygt lt/Itemgt
ltPaymentgt 40 lt/Paymentgt ltPaymentgt 60
lt/Paymentgt lt/PurchaseOrdergt
12
XML Document
ltPurchaseOrder id200I customerCars R Usgt
ltDategt ltDaygt 10 lt/Daygt
ltMonthgt June lt/Monthgt ltYeargt 1999
lt/Yeargt lt/Dategt ltItem nameFirestone
Tire cost2000.00gt ltQuantitygt 50
lt/Quantitygt lt/Itemgt ltItem
nameGoodyear Tire cost8000.00gt
ltQuantitygt 200 lt/Quantitygt lt/Itemgt
ltPaymentgt 40 lt/Paymentgt ltPaymentgt 60
lt/Paymentgt lt/PurchaseOrdergt
Self-describing tags
Nested structure
13
XML Document
ltPurchaseOrder id200I customerCars R Usgt
ltDategt ltDaygt 10 lt/Daygt
ltMonthgt June lt/Monthgt ltYeargt 1999
lt/Yeargt lt/Dategt ltItem nameFirestone
Tire cost2000.00gt ltQuantitygt 50
lt/Quantitygt lt/Itemgt ltItem
nameGoodyear Tire cost8000.00gt
ltQuantitygt 200 lt/Quantitygt lt/Itemgt
ltPaymentgt 40 lt/Paymentgt ltPaymentgt 60
lt/Paymentgt lt/PurchaseOrdergt
Self-describing tags
Nested structure
Nested sets
14
XML Document
ltPurchaseOrder id200I customerCars R Usgt
ltDategt ltDaygt 10 lt/Daygt
ltMonthgt June lt/Monthgt ltYeargt 1999
lt/Yeargt lt/Dategt ltItem nameFirestone
Tire cost2000.00gt ltQuantitygt 50
lt/Quantitygt lt/Itemgt ltItem
nameGoodyear Tire cost8000.00gt
ltQuantitygt 200 lt/Quantitygt lt/Itemgt
ltPaymentgt 40 lt/Paymentgt ltPaymentgt 60
lt/Paymentgt lt/PurchaseOrdergt
Self-describing tags
Nested structure
Nested sets
Order
15
XML Schema
ltPurchaseOrder idinteger customerstringgt
Date (Item) (Payment)lt/PurchaseOrdergt
PurchaseOrder
Date
ltDategt Day? Month Yearlt/Dategt
Day
ltDaygt integer lt/Daygt
Month
ltMonthgt string lt/Monthgt
Year
ltYeargt integer lt/Yeargt
Item
ltItem namestring costfloatgt
Quantitylt/Itemgt
and so on
16
XML Schema (contd.)
ltPurchaseOrder idinteger customerstringgt
Date? (Item Payment)lt/PurchaseOrdergt
PurchaseOrder
ltPurchaseOrder idinteger customerstringgt
(Date Payment) (Item (Item Item)
Payment)lt/PurchaseOrdergt
PurchaseOrder
ltPurchaseOrder idinteger customerstringgt
Date Item (PurchaseOrder) Paymentlt/PurchaseO
rdergt
PurchaseOrder
17
XML Query
Find all the items bought by Cars R Us in 1999
For po in /PurchaseOrderWhere
po/_at_customer Cars R Us and po/date/year
1999 Return po/Item
18
XML Query (contd.)
//Item
//Item5
//Item Before //Payment
/(Item/(Item/Payment)/(Payment Item))/Date
19
Outline
  • Motivation High-level Solution
  • Background (Relations, XML)
  • Storing and Querying XML Documents
  • Publishing Relational Data as XML Documents
  • Conclusion

20
Storing and Querying XML DocumentsShanmugasundar
am et. al., VLDB99
XML Translation Layer
Relational Database System
21
Outline
  • Motivation High-level Solution
  • Background (Relations, XML)
  • Storing and Querying XML Documents
  • Relational Schema Design and XML Storage
  • Query Mapping and Result Construction
  • Publishing Relational Data as XML Documents
  • Conclusion

22
XML Schema
ltPurchaseOrder idinteger customerstringgt
(Date (Payment)) (Item (Item Item)
Payment)lt/PurchaseOrdergt
PurchaseOrder
23
Desired Properties of Generated Relational Schema
R
  • All XML documents conforming to XML schema should
    be mappable to tuples in R
  • All queries over XML documents should be
    mappable to SQL queries over R
  • Not Required Ability to re-generate XML schema
    from R

24
Simplifying XML Schemas
  • XML schemas can be simplified for translation
    purposes

ltPurchaseOrder idinteger customerstringgt
(Date (Payment)) (Item (Item Item)
Payment)lt/PurchaseOrdergt
PurchaseOrder
ltPurchaseOrder idinteger customerstringgt
Date? (Item) (Payment)lt/PurchaseOrdergt
PurchaseOrder
  • All without undermining storage and query
    functionality!

25
Why is Simplification Possible?
  • Structure in XML schemas can be captured
  • Partly in relational schema
  • Partly as data values

ltPurchaseOrder idinteger customerstringgt
Date? (Item) (Payment)lt/PurchaseOrdergt
PurchaseOrder
  • Order field to capture order among siblings
  • Sufficient to answer ordered XML queries
  • PurchaseOrder/Item5
  • PurchaseOrder/Item AFTER PurchaseOrder/Payment
  • Sufficient to reconstruct XML document

26
Simplification Desiderata
  • Simplify structure, but preserve differences that
    matter in relational model
  • Single occurrence (attribute)
  • Zero or one occurrences (nullable attribute)
  • Zero or more occurrences (relation)

ltPurchaseOrder idinteger customerstringgt
(Date (Payment)) (Item (Item Item)
Payment)lt/PurchaseOrdergt
PurchaseOrder
ltPurchaseOrder idinteger customerstringgt
Date? (Item) (Payment)lt/PurchaseOrdergt
PurchaseOrder
27
Translation Normal Form
  • An XML schema production is either of the form

ltP attr1type1 attrmtypemgt a1 ap
ap1? aq? aq1 ar lt/Pgt
P
where ai ? aj
  • or of the form

ltP attr1type1 attrmtypemgt
type lt/Pgt
P
28
Example Simplification Rules
(e1 e2)
e1? e2?
(Date (Payment)) (Item (Item Item) Payment)
Date? (Item)? (Item (Item Item) Payment)
e?
e
Date? (Payment)? (Item (Item Item) Payment)
Date? (Item) (Item (Item Item) Payment)
29
Simplified XML Schema
ltPurchaseOrder idinteger customerstringgt
Date (Item) (Payment)lt/PurchaseOrdergt
PurchaseOrder
Date
ltDategt Day? Month Yearlt/Dategt
Day
ltDaygt integer lt/Daygt
Month
ltMonthgt string lt/Monthgt
Year
ltYeargt integer lt/Yeargt
Item
ltItem namestring costfloatgt
Quantitylt/Itemgt
and so on
30
Relational Schema Generation
PurchaseOrder (id, customer)
1


Date
Item (name, cost)
Payment
?
1
1
1
Day
Month
Year
Quantity
Satisfy Fourth normal form
Minimize Number of joins for path expressions
31
Generated Relational Schemaand Shredded XML
Document
PurchaseOrder
Id
Customer
Year
Month
Day
200I
Cars R Us
10
June
1999
Item
Payment
Name
Order
Value
Order
Cost
Quantity
Pid
Pid
2
200I
200I
Firestone Tire
2000.00
40
50
1
200I
200I
Goodyear Tire
200
8000.00
3
60
4
32
Recursive XML Schema
PurchaseOrder (id, customer)


Item (name)
1
Quantity
33
Relational Schema Generation and XML Document
Shredding (Completeness and Optimality)
  • Any XML Schema X can be mapped to a relational
    schema R, and
  • Any XML document XD conforming to X can be
    converted to tuples in R
  • Further, XD can be recovered from the tuples in R
  • Also minimizes the number of joins for path
    expressions (given fourth normal form)

34
Outline
  • Motivation High-level Solution
  • Background (Relations, XML)
  • Storing and Querying XML Documents
  • Relational Schema Design and XML Storage
  • Query Mapping and Result Construction
  • Publishing Relational Data as XML Documents
  • Conclusion

35
XML Query
Find all the items bought by Cars R Us in 1999
For po in /PurchaseOrderWhere
po/_at_customer Cars R Us and po/date/year
1999 Return po/Item
36
Path Expression Automata(Moore Machines)
PurchaseOrder
/PurchaseOrder/Item
Item

Item
//Item
37
XML Schema Automaton(Mealy Machine)
PurchaseOrder (id, customer)
Date
Item (name, cost)
Payment
Day
Month
Year
Quantity
38
Intersected Automaton
PurchaseOrder (customer)
Cars R Us
Date
Item (name, cost)
Year
Quantity
1999
39
Generated SQL Query
Select i.name, i.cost, i.quantity From
PurchaseOrder p, Item i Where p.customer Cars
R Us and p.year 1999 and
p.id i.pid
Predicates
Join condition
40
Recursive XML Query
PurchaseOrder (id, customer)
Find all items (directly or indirectly)under a
Cars R Us purchase order
Item (name)
For po in /PurchaseOrderWhere
po/_at_customer Cars R Us Return po//Item
Quantity
41
Recursive Automata Intersection
PurchaseOrder (customer)
PurchaseOrder (id, customer)
Cars R Us
Item (name)
Quantity
42
Recursive SQL Generation
ResultItems (id, name, quantity) as (
PurchaseOrder (customer)
Select it.id, it.name, it.quantityFrom
PurchaseOrder po, Item itWhere po.customer
Cars R Us and po.id it.pid
Cars R Us
43
SQL Generation for Path Expressions (Completeness)
  • (Almost) all path expressions can be translated
    to SQL
  • SQL does not support seamless querying across
    data and meta-data (schema)
  • SQL based on first-order logic (with least
    fix-point recursion)
  • Meta-data query capability provided in the XML
    translation layer
  • Implemented and optimized using a higher-order
    operator

44
Constructing XML Results
ltitem name Firestone Tire cost2000.00gt
ltquantitygt 50 lt/quantitygt lt/itemgt ltitem
nameGoodyear Tire cost8000.00gt
ltquantitygt 200 lt/quantitygt lt/itemgt
(Firestone Tire, 2000.00, 50) (Goodyear
Tire, 8000.00, 200)
45
Complex XML Construction
PurchaseOrder (id, customer)
Cars R Us
Date
Item (name, cost)
Payment
Day
Month
Year
Quantity
1999
46
Outline
  • Motivation High-level Solution
  • Background (Relations, XML)
  • Storing and Querying XML Documents
  • Publishing Relational Data as XML Documents
  • Conclusion

47
Relational Schema and Data
PurchaseOrder
Id
Customer
Year
Month
Day
200I
Cars R Us
10
June
1999
Item
Payment
Name
Cost
Quantity
Installment
Percentage
Pid
Pid
1
200I
200I
Firestone Tire
50
2000.00
40
200I
200I
Goodyear Tire
200
8000.00
60
2
48
XML Document
ltPurchaseOrder id200I customerCars R Usgt
ltDategt ltDaygt 10 lt/Daygt
ltMonthgt June lt/Monthgt ltYeargt 1999
lt/Yeargt lt/Dategt ltItem nameFirestone
Tire cost2000.00gt ltQuantitygt 50
lt/Quantitygt lt/Itemgt ltItem
nameGoodyear Tire cost8000.00gt
ltQuantitygt 200 lt/Quantitygt lt/Itemgt
ltPaymentgt 40 lt/Paymentgt ltPaymentgt 60
lt/Paymentgt lt/PurchaseOrdergt
49
Naïve Approach
  • Issue many SQL queries that mirror the structure
    of the XML document to be constructed
  • Tag nested structures as they are produced

(200I, Cars R Us, 10, June, 1999)
DBMS Engine
PurchaseOrder
Item
(40) (60)
(Firestone Tire, 2000.00, 50) (Goodyear Tire,
8000.00, 200)
Payment
Problem 1 Too many SQL queries
Problem 2 Fixed (nested loop) join strategy
50
Relations to XML IssuesShanmugasundaram et.
al., VLDB00
  • Two main differences
  • Ordered nested structures
  • Self-describing tags
  • Space of alternatives

Late Tagging
Early Tagging
Early Structuring
Late Structuring
51
Naïve (Stored Procedure) Approach
Early Tagging, Early Structuring, Outside Engine
  • Issue queries for sub-structures and tag them
  • Could be a Stored Procedure to maximize
    performance

(200I, Cars R Us, 10, June, 1999)
DBMS Engine
PurchaseOrder
Item
(40) (60)
(Firestone Tire, 2000.00, 50) (Goodyear Tire,
8000.00, 200)
Payment
Problem 1 Too many SQL queries
Problem 2 Fixed (nested loop) join strategy
52
CLOB Approach
Early Tagging, Early Structuring, Inside Engine
  • Push down computation inside engine
  • Use extensibility features of SQL
  • Scalar functions for tagging
  • Aggregate functions for creating nested
    structures
  • Character Large Objects (CLOBs) to hold
    intermediate tagged results

Problem CLOBs during query execution
(storage, size estimation, copying)
53
Late Tagging, Late Structuring
  • XML document content produced without tags or
    structure (in arbitrary order)
  • Tagger adds tags and structure as final step

Result XML Document
Tagging
Unstructured content
Relational QueryProcessing
54
Outer Union Approach
Late Tagging, Late Structuring
(200I, null , null, null , null ,
null , null , null, 1 ,
40, 2) (200I, Cars R Us, 10 , June, 1999,
null , null , null, null,
null, 0) (200I, null , null, null ,
null , null , null ,
null, 2 , 60, 2) (200I, null ,
null, null , null , Firestone Tire , 2000.00,
50 , null, null, 1) (200I, null ,
null, null , null , Goodyear Tire, 8000.00,
200, null, null, 1)
(200I, 2, 60) (200I, 1, 40)
(200I, Firestone Tire, 2000.00, 50) (200I,
Goodyear Tire, 8000.00, 200)
(200I, Cars R Us, 10, June, 1999)
Problem Wide tuples (having many columns)
55
Hash-based Tagger
Late Tagging, Late Structuring
  • Outer union results not structured early
  • In arbitrary order
  • Tagger has to enforce nesting order during
    tagging
  • Essentially a nested group-by operation
  • Hash-based approach
  • Hash on ancestor ids
  • Group siblings together and tag them
  • Inside/Outside engine tagger

Problem Requires memory for entire document
56
Late Tagging, Early Structuring
  • Structured XML document content produced
  • In document order
  • Sorted Outer Union approach
  • Tagger just adds tags
  • In constant space
  • Inside/outside engine

Result XML Document
Tagging
Structured content
Relational QueryProcessing
57
Sorted Outer Union Approach
Late Tagging, Late Structuring
(200I, null , null, null , null ,
null , null , null, 1 ,
40, 2) (200I, Cars R Us, 10 , June, 1999,
null , null , null, null,
null, 0) (200I, null , null, null ,
null , null , null ,
null, 2 , 60, 2) (200I, null ,
null, null , null , Firestone Tire , 2000.00,
50 , null, null, 1) (200I, null ,
null, null , null , Goodyear Tire, 8000.00,
200, null, null, 1)
(200I, 2, 60) (200I, 1, 40)
(200I, Firestone Tire, 2000.00, 50) (200I,
Goodyear Tire, 8000.00, 200)
(200I, Cars R Us, 10, June, 1999)
58
Sorted Outer Union Approach
Late Tagging, Late Structuring
(200I, Cars R Us, 10 , June, 1999,
null , null , null, null, null,
0) (200I, null , null, null , null
, Firestone Tire , 2000.00, 50 , null, null,
1) (200I, null , null, null , null
, Goodyear Tire, 8000.00, 200, null, null,
1) (200I, null , null, null , null
, null , null , null,
1 , 40, 2) (200I, null , null, null
, null , null , null ,
null, 2 , 60, 2)
(200I, 2, 60) (200I, 1, 40)
(200I, Firestone Tire, 2000.00, 50) (200I,
Goodyear Tire, 8000.00, 200)
(200I, Cars R Us, 10, June, 1999)
Problem Enforces Total Order
59
Classification of Alternatives
Late Tagging
Early Tagging
Inside Engine
Inside Engine
Sorted Outer Union(Tagging inside)
CLOB
Outside Engine
Outside Engine
EarlyStructuring
Sorted Outer Union(Tagging outside)
Stored Procedure
Inside Engine
Unsorted Outer Union(Tagging inside)
Outside Engine
LateStructuring
Unsorted Outer Union(Tagging outside)
60
Performance Evaluation
366 MHz Pentium II, 256MB
Database Size 10MB 100MB
Query Depth
Query Fan Out
61
Where Does Time Go?
Database Size 10MB, Query Fan Out 2, Query
Depth 2
62
Effect of Query Depth
Database Size 10MB, Query Fan Out 2
63
Memory Considerations
  • Sufficient memory
  • Unsorted outer union is method of choice
  • Efficient hash-based tagger
  • Limited memory (large documents)
  • Sorted outer union is method of choice
  • Relational sort is highly scalable

64
XML Document Construction (Completeness and
Performance)
  • Any nested XML document can be constructed using
    outer union approaches
  • 9x faster than previous approaches
  • 10 MB of data
  • 17 seconds for sorted outer union approach
  • 160 seconds for naïve XML application developer
    approach

65
Outline
  • Motivation High-level Solution
  • Background (Relations, XML)
  • Storing and Querying XML Documents
  • Publishing Relational Data as XML Documents
  • Conclusion

66
Conclusion
  • XML has emerged as the Internet data format
  • But relational database systems will continue to
    be used for data management tasks
  • Internet application developers currently have to
    explicitly bridge this data model gap
  • Can we design a system that automatically bridges
    this gap for application developers?

67
For cust in /CustomerWhere cust/name
Jack Return cust
68
// First prepare all the SQL statements to be
executed and create cursors for them Exec SQL
Prepare CustStmt From select cust.id, cust.name
from Customer cust where cust.name Jack Exec
SQL Declare CustCursor Cursor For CustStmt Exec
SQL Prepare AcctStmt From select acct.id,
acct.acctnum from Account acct where acct.custId
? Exec SQL Declare AcctCursor Cursor For
AcctStmtExec SQL Prepare PorderStmt From select
porder.id, porder.acct, porder.date from
PurchOrder porder
where porder.custId
? Exec SQL Declare PorderCursor Cursor For
PorderStmtExec SQL Prepare ItemStmt From select
item.id, item.desc from Item item where item.poId
? Exec SQL Declare ItemCursor Cursor For
ItemStmtExec SQL Prepare PayStmt From select
pay.id, pay.desc from Payment pay where item.poId
? Exec SQL Declare PayCursor Cursor For
PayStmt// Now execute SQL statements in nested
order of XML document result. Start with
customer XMLresult Exec SQL Open
CustCursorwhile (CustCursor has more rows)
Exec SQL Fetch CustCursor Into custId,
custName XMLResult ltcustomer id
custId gtltnamegt custName
lt/namegtltaccountsgt // For each customer,
issue sub-query to get account information and
add to custAccts Exec SQL Open AcctCursor
Using custId while (AcctCursor has more
rows) Exec SQL Fetch AcctCursor
Into acctId, acctNum XMLResult
ltaccount id acctId gt acctNum
lt/accountgt XMLResult
lt/accountsgtltpordersgt // For each
customer, issue sub-query to get purchase order
information and add to custPorders Exec SQL
Open PorderCursor Using custId while
(PorderCursor has more rows) Exec
SQL Fetch PorderCursor Into poId, poAcct,
poDate XMLResult ltporder id
poId acctpoAcct gtltdategtpoDate
lt/dategtltitemsgt // For each
purchase order, issue a sub-query to get item
information and add to porderItems
Exec SQL Open ItemCursor Using poId
while (ItemCursor has more rows)
Exec SQL Fetch ItemCursor Into itemId,
itemDesc XMLResult ltitem
id itemId gt itemDesc lt/itemgt
XMLResult
lt/itemsgtltpaymentsgt // For each
purchase order, issue a sub-query to get payment
information and add to porderPays
Exec SQL Open PayCursor Using poId
while (PayCursor has more rows)
Exec SQL Fetch PayCursor Into payId, payDesc
XMLResult ltpayment id
payId gt payDesc lt/paymentgt
XMLResult lt/paymentsgtlt/pordergt
// End of looping over all purchase
orders associated with a customer
XMLResult lt/customergt Return
XMLResult as one result row reset XMLResult
// loop until all customers are tagged and
output
69
Conclusion (Contd.)
  • Yes! XPERANTO is the first such system
  • Allows users to
  • Store and query XML documents using a relational
    database system
  • Publish existing relational data as XML documents
  • using a high-level XML query language
  • Also provides a dramatic improvement in
    performance

70
Relational Database System Vendors
  • IBM, Microsoft, Oracle, Informix,
  • SQL extensions for XML
  • XML Translation Layer
  • Pure XML philosophy provides high-level XML
    query interface
  • SQL extensions for XML, while better than writing
    applications, is still low-level
  • More powerful than XML-extended SQL
  • SQL just not designed with nifty XML features in
    mind

71
Related Research
  • Storing and querying schema-less XML documents
    using a relational database system
  • STORED Deutsch et. al.
  • Binary decomposition Florescu Kossmann
  • XML Views of relational databases
  • SilkRoute Fernandez et. al.
  • Mediators

72
Future Work
  • Functionality
  • Update XML documents
  • Insert/Update XML views of relational data
  • Information-retrieval style queries
  • Performance
  • Query-aware relational schema generation
  • Caching
  • Relational engine extensions

73
(No Transcript)
Write a Comment
User Comments (0)
About PowerShow.com