Provided by: connpadrai

Learn more at: http://web.cs.wpi.edu

1
On Relational Support for XML Publishing

Beyond Sorting and Tagging
Surajit Chaudhuri
Raghav Kaushik
Jeffrey F. Naughton
Presented by
Conn Doherty

2
Outline

Motivation
Observations
XML
Topic of Paper
GApply Operator Approach
Transformation Rules
Experiments and Results
Related Work
Conclusions
Future Problems

3
Motivation

Does the need for efficient XML publishing bring
any new requirements for relational query
engines, or is sorting query results in the
relational engine and tagging them in middleware
sufficient?

4
Observations

The mismatch between the XML data model and the
relational model requires relational engines to
be enhanced for efficiency
Need support for relation-valued variables

5
XML

Extensible Markup Language (more precisely,
a metalanguage or meta-metalanguage)
Rapidly emerging as a standard for exchanging
business data
Substantial interest in publishing existing
relational data as XML

6
Current XML Publishing

Most focus has been on issues external to the
RDBMS
Determining the class of XML views that can be
defined
Languages used to specify the conversion from
relational data to XML
Methods of composing XML queries with XML views
Data warehousing has caused focus on similar
issues internal to RDBMS

7
Primary Topic of Paper

Focus closely on the class of SQL queries that
are typically generated by XML publishing
applications
Ask whether anything needs to be changed within the
relational engine to evaluate these queries
efficiently

8
YES!

Differences in the XML and relational data models
cause awkward and inefficient translations of XML
queries to relational SQL queries
Main Issue:
XML's hierarchical model makes it very convenient
and natural to apply operators to subtrees

9
Part Supplier Example

Part and Supplier Data Set
supplier(s_key, s_name)
partsupp(ps_suppkey, ps_partkey)
part(p_partkey, p_name, p_retailprice)

10
Part Supplier Example

Query Q1: For each supplier element, return the
names and retail prices of all parts supplied by
that supplier, and also the overall average
retail price of all parts supplied

Example XML Document:
<suppliers>
  <supplier>
    <sname>S1</sname>
    <parts>
      <part><pname>P1</pname><retailprice>10</retailprice></part>
      <part><pname>P2</pname><retailprice>10</retailprice></part>
    </parts>
  </supplier>
  <supplier>
    <sname>S2</sname>
    <parts>
      <part><pname>P21</pname><retailprice>12</retailprice></part>
      <part><pname>P22</pname><retailprice>13</retailprice></part>
    </parts>
  </supplier>
</suppliers>

11
Example Queries

XQuery:
for $s in /doc(tpch.xml)/suppliers/supplier
return <ret>
  $s/s_suppkey
  <parts>
    for $p in $s/part
    return <part>
      $p/p_name
      $p/p_retailprice
    </part>
  </parts>
  avg($s/part/p_retailprice)
</ret>

SQL:
(select ps_suppkey, p_name, p_retailprice, null
from partsupp, part
where ps_partkey = p_partkey
union all
select ps_suppkey, null, null, avg(p_retailprice)
from partsupp, part
where ps_partkey = p_partkey
group by ps_suppkey)
order by ps_suppkey

In SQL (relational data model) the query is hard
to express and inefficient to evaluate
SQL cannot bind a variable to a set of tuples and
execute subqueries on that set
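
To see what the SQL formulation of Q1 actually returns, here is a minimal sketch that runs it with Python's built-in sqlite3 module on the four parts from the example document. The helper names (build_demo_db, sorted_outer_union), the integer keys and the avg_price alias are assumptions made for illustration; the outer parentheses are dropped because SQLite does not accept them around a compound select.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database holding the tiny example data set."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        create table part(p_partkey integer, p_name text, p_retailprice real);
        create table partsupp(ps_suppkey integer, ps_partkey integer);
        insert into part values (1, 'P1', 10), (2, 'P2', 10),
                                (3, 'P21', 12), (4, 'P22', 13);
        insert into partsupp values (1, 1), (1, 2), (2, 3), (2, 4);
    """)
    return conn


def sorted_outer_union(conn):
    """Run the union-all formulation of Q1, ordered by supplier key."""
    sql = """
        select ps_suppkey, p_name, p_retailprice, null as avg_price
        from partsupp, part
        where ps_partkey = p_partkey
        union all
        select ps_suppkey, null, null, avg(p_retailprice)
        from partsupp, part
        where ps_partkey = p_partkey
        group by ps_suppkey
        order by ps_suppkey
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    for row in sorted_outer_union(build_demo_db()):
        print(row)
```

Note that the join of partsupp and part appears twice in the statement: once for the part rows and once for the per-supplier average. That repetition is the inefficiency the slide points at.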

12
3 Angle Approach

1) New operator, GApply
Binds a variable to sets of tuples
Allows subqueries to be executed over the set of
tuples (tmp relation) bound to a variable
2) Propose transformation rules to modify query
plan trees that contain the GApply operator
3) Expose the GApply operator in SQL syntax

13
GApply Operator

Syntax: GApply(GCols, PGQ)
GCols: grouping/partitioning columns
PGQ: per-group query
Input tuple stream is partitioned on GCols
PGQ is applied to each group
Output is the union of the PGQ results taken
over all groups
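
The operator semantics above can be sketched in a few lines of Python. This is only an illustration of partition, apply, and union, not the engine implementation; the function names gapply and pgq1 and the dictionary row format are assumptions.

```python
from collections import defaultdict


def gapply(rows, gcols, pgq):
    """Partition rows on gcols, run pgq on each group, union the results."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[c] for c in gcols)].append(row)
    result = []
    for group in groups.values():
        result.extend(pgq(group))  # union (all) of the per-group outputs
    return result


def pgq1(group):
    """Per-group query for Q1: every part of the group, then the group average."""
    parts = [(r["p_name"], r["p_retailprice"], None) for r in group]
    avg = sum(r["p_retailprice"] for r in group) / len(group)
    return parts + [(None, None, avg)]


# Outer tuple stream: partsupp joined with part (hypothetical sample rows).
joined = [
    {"ps_suppkey": 1, "p_name": "P1", "p_retailprice": 10},
    {"ps_suppkey": 1, "p_name": "P2", "p_retailprice": 10},
    {"ps_suppkey": 2, "p_name": "P21", "p_retailprice": 12},
    {"ps_suppkey": 2, "p_name": "P22", "p_retailprice": 13},
]
q1 = gapply(joined, ["ps_suppkey"], pgq1)
```

Here the join is computed once and both the part rows and the average are derived from the same group.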

14
Terminology

Outer tuple stream: the input tuple stream
Inner query: the per-group query
Outer child of GApply: root of the outer query
Inner child of GApply: root of the inner query

15
PGQ Restrictions

The PGQ may operate only on the temporary relation
associated with the group of tuples
This style of evaluation is also known as groupwise processing
Operators allowed in PGQ: scan, select, project,
distinct, apply, exists, union(all), groupby,
aggregate, and orderby

16
Physical Implementation

Two Phases
Partitioning Phase
Implemented using sorting or hashing
Execution Phase
Performed in nested loop fashion
PGQ is evaluated on each group of tuples
Each group is a temporary relation bound to a
relation-valued parameter, group
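
The two phases can be sketched as follows, with one partitioning function per strategy and a nested-loop execution phase. All function names are hypothetical and in-memory lists stand in for the tuple stream; a real engine would spill to disk and stream groups rather than materialize them.

```python
from itertools import groupby


def _key(gcols):
    """Build a function that extracts the grouping key from a row."""
    return lambda row: tuple(row[c] for c in gcols)


def partition_by_sorting(rows, gcols):
    """Partitioning phase, sort-based: sort on the key, then cut into runs."""
    key = _key(gcols)
    return [list(g) for _, g in groupby(sorted(rows, key=key), key=key)]


def partition_by_hashing(rows, gcols):
    """Partitioning phase, hash-based: one bucket per distinct key."""
    key = _key(gcols)
    buckets = {}
    for row in rows:
        buckets.setdefault(key(row), []).append(row)
    return list(buckets.values())


def execute_phase(groups, pgq):
    """Execution phase: nested loop, evaluating the PGQ once per group."""
    out = []
    for group in groups:  # each group plays the role of the tmp relation
        out.extend(pgq(group))
    return out
```

Both strategies produce the same groups; only the order in which groups are emitted differs (key order for sorting, first-seen order for hashing).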

17
Implementation Diagram
[Figure: GApply plan. A nested-loop (NL) operator combines the outer child (outer query, partition phase) with the inner child (inner query, execution phase), which reads the temporary relation "group".]

18
Expose GApply in Syntax

Difficult for the parser and optimizer to
determine when GApply applies
Tests on Microsoft SQL Server 2000, where the GApply
operator is not exposed in the syntax:
The optimizer identifies the need for GApply only sometimes
Using GApply in each case speeds up the queries
considerably

19
Proposed Syntax

Proposed extension to SQL syntax
SQL query performing groupwise processing:
select gapply(PGQ(x)) as <column list>
from <relation list>
where <conditions>
group by <grouping columns> : x
x is a relation-valued variable

20
Example Query in Syntax

Query Q1:
select gapply(PGQ1(tmpSupp))
from partsupp, part
where ps_partkey = p_partkey
group by ps_suppkey : tmpSupp
PGQ1(tmpSupp):
select p_name, p_retailprice, null
from tmpSupp
union all
select null, null, avg(p_retailprice)
from tmpSupp

21
Transformation Rules

Precise semantics of the operators
Three categories
1) Pushing Computation into the Outer Query
Placing Projections Before GApply
Placing Selections Before GApply
Converting GApply to groupby
2) Group Selection
3) Pushing GApply Below Joins

22
Rule 2

Group Selection
Consider a PGQ that returns either the whole group
(subtree) or nothing, depending on a predicate
Two methods to evaluate:
1) Join suppliers with parts, group by suppkey, check
the selection predicate on each group, and return the group if it holds
2) Apply the selection first to get the qualifying suppkeys,
then join to return their groups
The second method wins if the predicate is highly
selective

23
Rule 2 cont.

Example:
for $s in /doc(tpch.xml)/suppliers
/supplier[part/p_retailprice > 1000]
return $s
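
The two evaluation strategies for group selection can be compared with a small sketch. It works on an already joined list of rows, which is a simplification: in the engine the second strategy would restrict the join itself. The function names and sample rows are hypothetical.

```python
def group_selection_per_group(joined, threshold):
    """Method 1: group first, test the predicate on each group, keep whole groups."""
    groups = {}
    for row in joined:
        groups.setdefault(row["ps_suppkey"], []).append(row)
    out = []
    for group in groups.values():
        if any(r["p_retailprice"] > threshold for r in group):
            out.extend(group)
    return out


def group_selection_keys_first(joined, threshold):
    """Method 2: select the qualifying keys first, then fetch only their rows."""
    keys = {r["ps_suppkey"] for r in joined if r["p_retailprice"] > threshold}
    return [r for r in joined if r["ps_suppkey"] in keys]


sample = [
    {"ps_suppkey": 1, "p_name": "P1", "p_retailprice": 10},
    {"ps_suppkey": 1, "p_name": "P2", "p_retailprice": 10},
    {"ps_suppkey": 2, "p_name": "P21", "p_retailprice": 12},
    {"ps_suppkey": 2, "p_name": "P22", "p_retailprice": 13},
]
```

With a selective predicate the key set in the second method is small, so far fewer rows need to be grouped, which is the intuition behind the rule.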

24
Integrating Rules in Optimizer

None of the rules above can be applied in a loop,
so the optimizer terminates
Optimizer must estimate the cost of the GApply
operation

25
Preliminary Experiments

Performance study
Measure how effectively the GApply operator speeds up
queries
Understand impact of each proposed transformation
rule
Microsoft SQL Server 2000
Supports GApply internally, without exposing it in the syntax
The experiments need control over when GApply is invoked
So the operation of GApply is simulated on the client side

26
Client Side Simulation of GApply

Partition
Sorting
Hashing (simulation)
Execute
Store result of outer query in temporary table
For each distinct tmp group relation, evaluate
PGQ on that relation, then union all results
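
A minimal sketch of such a client-side simulation, using Python's sqlite3 module instead of SQL Server. The table names outer_result and tmp_group, the helper names, and the sample data are assumptions; partitioning is done here by filtering on each distinct key, which is simpler than the sorting and hashing variants listed above.

```python
import sqlite3

OUTER_SQL = ("select ps_suppkey, p_name, p_retailprice "
             "from partsupp, part where ps_partkey = p_partkey")
PGQ1_SQL = ("select p_name, p_retailprice, null from tmp_group "
            "union all "
            "select null, null, avg(p_retailprice) from tmp_group")


def demo_db():
    """In-memory database with the four parts of the running example."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        create table part(p_partkey integer, p_name text, p_retailprice real);
        create table partsupp(ps_suppkey integer, ps_partkey integer);
        insert into part values (1, 'P1', 10), (2, 'P2', 10),
                                (3, 'P21', 12), (4, 'P22', 13);
        insert into partsupp values (1, 1), (1, 2), (2, 3), (2, 4);
    """)
    return conn


def client_side_gapply(conn, outer_sql, gcol, pgq_sql):
    """Simulate GApply: materialize the outer query, then run the PGQ per group."""
    # Store the result of the outer query in a temporary table.
    conn.execute(f"create temp table outer_result as {outer_sql}")
    # Empty temporary relation with the same columns, refilled for each group.
    conn.execute("create temp table tmp_group as "
                 "select * from outer_result where 0")
    keys = [k for (k,) in conn.execute(
        f"select distinct {gcol} from outer_result order by {gcol}")]
    results = []
    for key in keys:
        conn.execute("delete from tmp_group")
        conn.execute(
            f"insert into tmp_group select * from outer_result where {gcol} = ?",
            (key,))
        results.extend(conn.execute(pgq_sql).fetchall())  # union all
    return results
```

Every group costs several round trips to the database, so a simulation of this kind carries overhead that a server-side operator would not.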

27
Estimate Running Time

Measure both elapsed time and CPU time
Operator trees have GApply as the topmost
operator
Expect real elapsed time to be lower in a full server-side
implementation

28
Setup

Experimental Setup
TPC-H benchmark data
5GB database
Server
1 GHz processor
784 MB main memory
512 MB buffer pool
Each query was run several times and the average
taken

29
Results

Effectiveness of GApply
Comparable whether performing partitioning using
sorting or hashing
Tested 4 queries representing a wide range of
queries

30
GApply Effectiveness Results

Main conclusions
GApply is a useful operator even for simple
XQuery queries
Yields speedups of up to a factor of 2
Queries representative of a wide class of queries
Q4 took 20% longer with the client-side
implementation
Q1, Q2, Q3 expect performance improvements with
server side implementation

(Results shown for hash-based partitioning)

31
Results cont.

Effectiveness of Optimization Rules
Tested the improvement obtained by firing each
rule
Performance metric is elapsed time
Method
Choose relevant parameterized query
Vary parameter and find performance benefit for
each value
Benefit: ratio of elapsed time without the rule to
elapsed time with the rule fired

32
Rule Effectiveness Example

Query:
for $s in /doc(tpch.xml)/suppliers
/supplier[part/p_retailprice > x]
return $s
The parameter x determines the selectivity of the
selection

33
Results cont.

Effectiveness of Optimization Rules
Main conclusions
Proposed rules can have significant impact on
elapsed time of a query involving GApply
Some rules always lowered the cost of the query,
while others sometimes lowered and sometimes increased
the cost
Benefit of converting GApply to groupby is
comparatively lower

34
Related Work

Xperanto Project
Concluded that pushing as much computation as possible
into the relational engine is best
SilkRoute Project
Language to specify the conversion between
relational data and XML
ROLEX Project
To avoid inefficient parsing in applications, the
relational engine returns a navigable result tree
Difference
This paper asks whether the whole process of XML publishing
has any impact on the core relational operators
(YES)

35
Conclusions

The relational engine must provide support for
binding variables to sets of tuples
Required support can be enabled through the
GApply operator with seamless integration into
existing relational engines
Operator should be exposed in the syntax
Optimization rules are needed

36
Future Problems

How should modified syntax be exploited by
algorithms to translate XML queries over XML
views of relational data?
Any other changes needed to meet the requirements
of XML publishing?
What changes are needed in the optimizer if the
relational database returns navigable results?

37
Other Papers

D. Chatziantoniou and K. A. Ross. Querying
multiple features of groups in relational
databases. In VLDB, 1996.
Extension to SQL syntax with relational algebra
implementation
D. Chatziantoniou and K. A. Ross. Groupwise
processing of relational queries. In VLDB, 1997.
Methods to identify group query components
C. A. Galindo-Legaria and M. M. Joshi. Orthogonal
optimization of subqueries and aggregation. In
SIGMOD, 2001.
Introduction of segmentApply operator and many
transformation rules