Mary F. Fernandez

1 / 33
About This Presentation
Title:

Mary F. Fernandez

Description:

To view product code, call CN1.2.6.1.1= SELECT p.code. FROM ... adds a key to each SQL ... XML generator consumes tuple streams and construct one single ... – PowerPoint PPT presentation

Number of Views:17
Avg rating:3.0/5.0
Slides: 34
Provided by: hillma

less

Transcript and Presenter's Notes

Title: Mary F. Fernandez


1
SilkRoute A Framework for Publishing Rational
Data in XML
  • Mary F. Fernandez
  • Yana Kadiyska
  • Dan Suciu
  • Atsuyuki Morishima
  • Wang Chiew Tan

ACM Transactions on Database System 2002, volume
27, no. 4
2
Outline
  • Introduction
  • System Architecture
  • XQuery to SQL Translation
  • XML Publishing Optimization
  • Conclusion

3
Introduction
4
Introduction
  • XML as language for data exchange between
    applications
  • Platform neutral
  • Storage independent
  • Semantics rich
  • But most existing data are stored in RDBMS
  • Conversion from RDBMS to XML should be
  • General
  • Selective
  • Efficient

5
Design Goals
  • General
  • Two different data representations
  • Inherent complex conversion
  • No public XML schema matches exactly relational
    counterparts
  • Need for mapping multiple XML to same relational
    source under different situations

XML Relational
Nested Flat
Unnormalized Normalized
Public Proprietary
6
Design Goals
  • Selective
  • XML schema as virtual view
  • XQuery may only extract parts of the view
  • Only materialize the data fragment needed
  • Efficient
  • Reduce computational overhead as much as possible
  • Exploit optimization and evaluation engines
    provided by DB

7
Contributions
  • XQuery to SQL Translation
  • Separate structure of XML output from computation
    which constructs it
  • Introduce view forest which nodes are mapping
    from XML to SQL
  • Optimization Problem for XML Publishing
  • XQuery is mapped to one or more SQL
  • Selecting the most efficient set

8
System Architecture
9
System Architecture
  • 3-tier architecture
  • Backend database for storage
  • SilkRoute as middleware
  • Third party application (Client)

10
System Architecture
  • SilkRoute transforms relational schema to
    canonical view forest

11
System Architecture
  • Administrator uses XQuery to extract public XML
    view over canonical view, this gives public view
    forest

12
System Architecture
  • Client application uses XQuery to data in
    interest, this gives application view forest

13
System Architecture
  • Planner inside SilkRoute gets application view
    forest to produce
  • SQLQuery (which then enters RDBMS)
  • XMLTemplate (which then enters XML Generator)

14
System Architecture
  • Tuples from RDBMS RDBMS is fed into the
    XMLTemplate inside generator
  • Final XML output is then returned to the client

15
XQuery to SQL Translation
16
View Forest
  • Internal representation of XML, a critical role
    in SilkRoute
  • Each node carries
  • XML label
  • SQL query fragment
  • For internal nodes, XML label is either element
    or attribute names SQL fragment contains FROM
    and WHERE clauses
  • For leaf nodes, XML label is atomic type SQL
    fragment consists of SELECT clause

17
View Forest SQL Association
  • A complete SQL query Cn is associated with each
    node n
  • SELECT if n is leaf node, use its SELECT
    clause
  • otherwise SELECT
  • FROM concat all FROM clauses of n and its
    ancestor
  • WHERE concat all WHERE clauses of n and its
    ancestor

18
View Forest An Example
  • To view the list of clothing item, call CN1.1.5.1
  • SELECT c.price
  • FROM Clothing c
  • To view all data, call CN1.1
  • SELECT
  • FROM Clothing c

19
An Example
  • To view product code, call CN1.2.6.1.1
  • SELECT p.code
  • FROM Clothing c, Problems p
  • WHERE c.category outerwear
  • AND p.pid c.pid

20
VFCA
  • View Forest Composition Algorithm
  • Translate XQuery into view forest
  • Used in view forest composer
  • Operate on a subset of XQuery syntax, XQueryCore,
    which includes everything but NOT
  • Recursive functions and operators (eg before ,
    after, is, isnot
  • Functions depending on XML document order

21
VFCA
  • Suppose
  • I instance of relational schema
  • V(I) view forest corresponding to the instance
    I
  • Q(V(I)) XQuery over the view forest V(I), the
    result return from this query is again a view
    forest, denoted by VQ(I)
  • Input An XQueryCore expression Q over view
    forests V1, , Vm
  • Output A view forest VQ VFCA(V1, , Vm) such
    that
  • Q(V1, , Vm) VQ(I) for any I of the schema

22
VFCA
  • Traditional XQuery Evaluation
  • Inner XQuery is evaluated first to return results
    for outer XQuery
  • Producing a number of intermediate XML outputs
  • VFCA
  • Only view forest is produced, eliminating
    intermediate results
  • Data extraction is done once using the set of SQL
    realized from forest

23
XML Publishing Optimization
24
View Forest Evaluation
  • Steps in evaluating view forest
  • Planner adds a key to each SQL fragment
  • Planner partitions each view tree into one or
    more subtrees
  • Each subtree is realized to a SQL
  • The set of all SQL is submitted to relational
    engine
  • XML generator consumes tuple streams and
    construct one single integrated relation
  • XML output is produced by nesting tuples
    integrated relation and tagging each element

25
View Forest Evaluation
26
View Forest Evaluation
  • The set of SQL emitted from view forest
    evaluation is called a plan
  • Given a view forest , there can be many different
    plans
  • (exponential to the number of edges)
  • Two extreme cases
  • Emit a SQL for each node (fully partitioned
    strategy)
  • Emit a SQL for each tree in the forest (unified
    strategy)

27
View Forest Evaluation
  • LEFT OUTER JOIN to connect a parent and child
    nodes SQL
  • UNION to bind together sibling nodes SQL
  • ORDER BY is needed to allow XML generator takes
    only one single pass over the tuple streams
  • The tree can also be split into connected
    components
  • SQL is then generated for each component

28
View Forest Evaluation
  • SELECT 1 AS L1, c.pid, L2, (c.price Q.discount)
    as sale, Q.code, Q.comments
  • FROM Clothing c LEFT OUTER JOIN
  • ((SELECT 1 AS L2, d.pid AS pid, d.discount AS
    discount, null AS code, null AS comments FROM
    Discount d)
  • UNION
  • (SELECT 2 AS L2, p.pid AS pid, null AS discount,
    p.code AS code, p.comments AS comments FROM
    Problems p)) AS Q ON c.pid Q.pid
  • ORDER BY L1, c.pid, L2, Q.code

29
View Forest Evaluation
  • SELECT 1 as L1, c.pid, (d.discount c.price)
  • FROM Cloothing c
  • LEFT OUTER JOIN (SELECT d.pid, d.discount FROM
    Discount d)
  • ON c.pid d.pid ORDER BY c.pid
  • SELECT 2 as L1, c.pid, p.comments
  • FROM Clothing c, Problems p
  • WHERE c.pid p.pid
  • ORDER BY c.pid, p.code

30
Choosing Efficient Plan
  • As there exists exponential number of plans,
    SilkRoute resorts to heuristcs
  • UNION and OUTER JOIN are costly operations
  • Only choosing edges whose two associated queries
    are less expensive to evaluate together than
    separately
  • Cost Difference cost(qc) (cost(q1) cost(q2))
  • Where qc is the combined query of parent and
    child nodes
  • And q1 and q2 are the parent query and child
    query respectively

31
Choosing Efficient Plan
  • Seek for cost estimates from relational source
    engine
  • The estimate cost of a query q is obtained by
  • cost(q, a, b) a evaluation_cost(q) b
    data_size(q)
  • AND data_size(q) f(attrs(q) cardinality(q))
  • Relational engine provides evaluation_cost and
    cardinality
  • a and b are weights

32
Results
33
Conclusion
  • Proprietary commercial products have prospered in
    this field
  • DB2
  • SQL Server
  • Oracle
  • Native XML database
  • SilkRoute free!! (work with MySQL/PostgreSQL)

34
Thank you!!
Write a Comment
User Comments (0)