XML Query processing - PowerPoint PPT Presentation

1 / 80
About This Presentation
Title:

XML Query processing

Description:

Though typically parent-child relationships. Often don't have good measure of 'fan-out' ... They claim ARIES needs some modifications why? Their changes: ... – PowerPoint PPT presentation

Number of Views:114
Avg rating:3.0/5.0
Slides: 81
Provided by: VV37
Category:
Tags: xml | processing | query

less

Transcript and Presenter's Notes

Title: XML Query processing


1
XML Query processing
  • Vasilis Vassalos ATHENS UNIVERSITY OF ECONOMICS
    AND BUSINESS
  • COMPUTER SCIENCE DEPARTMENT

2
eXtensible Markup Language
Markup indicates structure and semantics
ltbibliographygt ltpaper ID "XML-rewriting"gt
ltauthorsgt ltauthorgtY.Papakonstantinoult/autho
rgt ltauthorgtV. Vassaloslt/authorgt
lt/authorsgt ltfullPaper source"rewrite"/gt
lttitlegtQuery Rewriting for semistructured
datalt/titlegt ltbooktitlegtSIGMOD
99lt/booktitlegt lt/papergt lt/bibliographygt
Decoupled from presentation
3
Elements and their Content
Element Content
element name
ltbibliographygt ltpaper ID "XML-rewriting"gt
ltauthorsgt ltauthorgtY.Papakonstantinoult/auth
orgt ltauthorgtV. Vassaloslt/authorgt
lt/authorsgt ltfullPaper source"rewrite"/gt
lttitlegtQuery Rewriting for semistr.datalt/titlegt
ltbooktitlegtSIGMOD 99lt/booktitlegt
lt/papergt lt/bibliographygt
element
Empty Element
Character content
4
Path Expressions
In the second chapter of the document zoo.xml
find the figures with caption Tree Frogs
doc(zoo.xml)//chapter2//figurecaptionTree
Frogs
book
chapter
appendix
part
chapter
chapter
chapter
part
section
chapter
paragraph
paragraph
figure
figure
caption
caption
Tree Frogs
Just Frogs
5
More Path Expressions
Find the first immediate chapter subelements of
immediate part subelements of the document
zoo.xml and retrieve figures that have
doc(zoo.xml)/part/chapter1//figurecaptionTr
ee Frogs
book
chapter
appendix
part
chapter
chapter
chapter
part
section
chapter
paragraph
paragraph
figure
figure
caption
caption
Tree Frogs
Just Frogs
6
Element Construction
In the second chapter of the document zoo.xml
find the figures with caption Tree Frogs and
place them into an element called result
ltresultgt doc(zoo.xml)//chapter2//figurecapti
onTree Frogs lt/resultgt
result
figure
caption
Tree Frogs
7
Bibliography Example Data Set
ltbibgt ltbook ISBN12345gt ltauthorgt Aho
lt/authorgt ltauthorgt Hopcroft lt/authorgt
ltauthorgt Ullman lt/authorgt lttitlegt Automata
Theory lt/titlegt ltpublishergt Morgan Kaufmann
lt/publishergt ltyeargt 1998 gt/yeargt lt/bookgt
ltbookgt ltauthorgt Ullman lt/authorgt lttitlegt
Database Systems lt/titlegt ltpublishergt Morgan
Kaufmann lt/publishergt ltyeargt 1998 gt/yeargt
lt/bookgt ltbookgt ltauthorgt Berners-Lee lt/authorgt
ltauthorgt Robie lt/authorgt ltauthorgt Bosak
lt/authorgt lttitlegt Semantic Web lt/titlegt
ltpublishergt Prentice Hall lt/publishergt ltyeargt
1998 lt/yeargt lt/bookgt lt/bibgt
8
Reviews Example Data Set
ltreviewsgt ltreviewgt lttitlegt Automata Theory
lt/titlegt ltcommentgt Its the best in automata
theory lt/commentgt ltcommentgt A definitive
textbook lt/commentgt lt/reviewgt lt/reviewsgt
9
For-Let-Where-Return (FLWR)
List the titles of books published by Morgan
Kaufmann
FOR b in doc(bib.xml)//book WHERE b/publisher
Morgan Kaufmann RETURN b/title
bib
book
book
book
book
title
title
title
year
year
year
publisher
publisher
publisher
1998
1998
1998
Morgan Kaufmann
Morgan Kaufmann
Prentice Hall
10
(Tuples of) Variable Bindings
bib
book
book
book
title
title
title
year
year
year
publisher
publisher
publisher
1998
1998
1998
Morgan Kaufmann
Morgan Kaufmann
Prentice Hall
b book book book
FOR/LET
Ordered lists of tuples of variable bindings
WHERE
b book book
Tuples of that satisfy the conditions
RETURN
title
title
List of trees
year
11
FOR b in doc(bib.xml)//book WHERE b/year gt
1990 RETURN b/author
Return the list of authors who published after
1990
12
Tuples
List publishers who have published more than 1
book
FOR p in distinct(doc(bib.xml)//publisher) LET
b document(bib.xml)//bookpublisher
p WHERE count(b) gt 1 RETURN p
Tuples (p, b) are formulated
13
Boolean Expressions in WHERE
List the titles of books published by Morgan
Kaufmann in 1998
FOR b in doc(bib.xml)//book WHERE b/publisher
Morgan Kaufmann AND b/year 1998 RETURN
b/title
14
Joins
For every book with a matching review output a
book_with_review that contains all the
attributes and subelements of book and the
comment subelements of review
FOR b in doc(bib.xml)/book, r in
doc(review.xml)/review WHERE b/title
r/title RETURN ltbook_with_reviewgt b/_at_,
b/, r/comment lt/book_with_reviewgt
ltbook_with_review ISBN1234gt ltauthorgt Aho
lt/authorgt ltauthorgt Hopcroft lt/authorgt
ltauthorgt Ullman lt/authorgt lttitlegt Automata
Theory lt/titlegt ltpublishergt Morgan Kaufmann
lt/publishergt ltyeargt 1998 gt/yeargt ltcommentgt
Its the best in automata theory lt/commentgt
ltcommentgt A definitive textbook lt/commentgt
lt/book_with_reviewgt
15
Relax Order Conditions
List the titles of books published by Morgan
Kaufmann in 1998
FOR b in unordered(doc(bib.xml)//book) WHERE
b/publisher Morgan Kaufmann AND b/year
1998 RETURN b/title
Very important feature in dealing with relational
sources and other set-oriented sources.
SELECT title FROM bib WHERE publisher Morgan
Kaufmann AND year 1998
Depending on the indices and access methods used,
the SQL query processor may deliver the tuples in
different order
16
Nested queries
FOR a IN distinct(document(bib.xml)//author) RE
TURN ltauthorgt ltnamegt a/text() lt/namegt
FOR b IN document(bib.xml)//bookauthor
a RETURN b/title lt/authorgt
Invert the structure of the input document so
that there is a list of author elements
containing the name of the author and the list of
books he wrote
17
Conditionals
FOR b IN doc(bib.xml)/book RETURN ltshortgt
b/title ltauthorgt IF count(b/author) lt
3 b/author ELSE
b/author1, ltauthorgtand otherslt/authorgt
lt/shortgt
18
Existential and Universal Quantification
Return books where at least one of the authors is
Ullman
FOR b in doc(bib.xml)/book WHERE b/author
Ullman RETURN b
Return books where all authors are Ullman
FOR b in doc(bib.xml)/book WHERE EVERY author
IN b/author SATISFIES author
Ullman RETURN b
19
Functions
DEFINE FUNCTION depth(e) RETURNS xsdinteger
IF (empty(e/) THEN 1 ELSE max(depth(e/) 1
FOR b in doc(bib.xml)/book RETURN depth(b)
FOR a in doc(bib.xml)/book/author RETURN
ltallcapsnamegt allcaps(a) lt/allcapsnamegt
20
XML What Makes It Hard?
  • Its not normalized
  • It conceptually centers around some origin,
    meaning that navigation becomes central
  • Contrast with E-R diagrams
  • How to store the hierarchy?
  • Complex navigation
  • Updates, locking
  • Optimization
  • Also, its ordered
  • May restrict order of evaluation (or at least
    presentation)
  • Makes updates more complex
  • Many of these issues arent unique to XML
  • Semistructured databases, esp. with ordered
    collections, were similar

21
XML Whats It Good For?
  • Integration of heterogeneous data
  • More on this later
  • Collections of text documents, e.g., the Web, doc
    DBs
  • How would we want to query those?
  • IR/text queries, path queries, XQueries?
  • Interchanging data
  • SOAP messages, RSS, XML streams
  • Perhaps subsets of data from RDBMSs
  • Storing native, database-like XML data
  • Caching
  • Logging of XML messages
  • ?

22
Lots of XML Research Out There
  • Integration
  • Specialized query processing techniques
  • Integration specification tools
  • Text
  • Hybrids of database and IR techniques for search
  • (e.g., Amer-Yahia Shanmugasundaram, Weikum
    Ramakrishnan, )
  • Interchange
  • Web service verification
  • XML stream processing
  • XML databases
  • Natix, TIMBER,
  • Tamino, DB2 UDB, Oracle,

23
The Main Focal Points
  • XML Query processing
  • XML Query Algebra
  • XPath processing techniques
  • XQuery optimization
  • Streaming XML/XML for integration
  • RDBMS ? XML export
  • Partitioning of computation between source and
    mediator
  • Streaming XPath engines
  • XML databases
  • XML to relational issues
  • Hierarchical storage locking (Natix, TIMBER,
    BerkeleyDB, Tamino, )
  • Query optimization
  • XML with documents
  • Inverted indices
  • Integration of ranking into DBMS
  • Interaction between structure and content

24
XML as a Wire Format
  • RDBMS ? XML export
  • SilkRoute and Xperanto, outer unions
  • Interaction with RDBMS optimization techniques
  • Updates Tatarinov01
  • Cascading updates are already possible in RDBMSs
  • Updating XML views
  • Streaming XML
  • SAX-based XPath-matching engines
    Ives01AltinelFranklin00Green02
    DiaoFranklinChen
  • Push-down of XPath matching as early as possible
  • Query decomposition (still in need of a standard
    means of pushing XQuery to a source)
  • Subsets of XQuery that are amenable to streaming

25
XML in a Database
  • Use a legacy RDBMS
  • Shredding Shanmugasundaram99 and many others
  • Path-based encodings Cooper01
  • Region-based encodings Bruno02Chen04
  • Translation to SQL
  • Order preservation in updates Tatarinov02,
  • Whats novel here? How does this relate to
    materialized views and warehousing?
  • Native XML databases
  • Hierarchical storage (Natix, TIMBER, BerkeleyDB,
    Tamino, )
  • Updates and locking
  • Query optimization

26
Query Processing for XML
  • Why is optimization harder?
  • Hierarchy means many more joins (conceptually)
  • traverse, tree-match, x-scan, unnest,
    path, op
  • Though typically parent-child relationships
  • Often dont have good measure of fan-out
  • More ways of optimizing this
  • Order preservation limits processing in many ways
  • Nested content left outer join
  • Except that we need to cluster a collection with
    the parent
  • Relationship with NF2 approach
  • Tags (dont really add much complexity except in
    trying to encode efficiently)
  • Complex functions and recursion
  • Few real DB systems implement these fully
  • New operators (staircase join, stack-based joins)
  • Why is storage harder?
  • Thats the focus of Natix, really

27
The Natix System
  • In contrast to many pieces of work on XML,
    focuses on the bottom layers
  • Physical layout
  • Indexing
  • Locking/concurrency control
  • Logging/recovery

28
Physical Layout
  • What are our options in storing XML trees?
  • At some level, its all smoke-and-mirrors
  • Need to map to flat byte sequences on disk
  • But several options
  • Shred completely, as in many RDBMS mappings
  • Each path may get its own contiguous set of pages
  • e.g., vectorized XML Buneman et al.
  • An element
  • With path index instead of foreign keys
    Georgiadis
  • An element may get its 11 children
  • e.g., shared inlining Shanmugasundaram and
    Chen
  • All content may be in one table
  • e.g., Florescu/Kossmann and most interval
    encoded XML
  • We may embed a few items on the same page and
    overflow the rest
  • How collections are often stored in ORDBMS
  • We may try to cluster XML trees on the same page,
    as interpreted BLOBs
  • This is Natixs approach (and also IBMs DB2)
  • Pros and cons of these approaches?

29
Challenges of the Page-per-Tree Approach
  • How big of a tree?
  • What happens if the XML overflows the tree?
  • Natix claims an adaptive approach to choosing the
    trees granularity
  • Primarily based on balancing the tree,
    constraints on children that must appear with a
    parent
  • Natix uses a B Tree-like scheme for achieving
    balance and splitting a tree across pages

30
Example
Split point in parent page
Note proxy nodes
31
That Was Simple But What about Updates?
  • Clearly, insertions and deletions can affect
    things
  • Deletion may ultimately require us to rebalance
  • Ditto with insertion
  • But insertion also may make us run out of space
    what to do?
  • Their approach add another page ultimately may
    need to split at multiple levels, as in B Tree
  • Others have studied this problem and used integer
    encoding schemes (plus B Trees) for the order

32
Does this Help?
  • The Natix experiments in this paper were pretty
    weak, so we wont look at them in more detail
  • The IBM guys say their approach, which is
    similar, works significantly better than Oracles
    shredded approach
  • Experiments by our group dont support this
    thesis.

33
Theres More to Updates than the Pages
  • What about concurrency control and recovery?
  • We already have a notion of hierarchical locks,
    but they claim
  • If we want to support IDREF traversal, and
    indexing directly to nodes, we need more
  • Whats the idea behind SPP locking?

34
Logging
  • They claim ARIES needs some modifications why?
  • Their changes
  • Need to make subtree updates more efficient
    dont want to write a log entry for each subtree
    insertion
  • Use (a copy of) the page itself as a means of
    tracking what was inserted, then batch-apply to
    WAL
  • Annihilators if we undo a tree creation, then
    we probably dont need to worry about undoing
    later changes to that tree
  • A few minor tweaks to minimize undo/redo when
    only one transaction touches a page

35
Annihilators
36
Assessment
  • Native XML storage isnt really all that
    different from other means of storage
  • There are probably some good reasons to make a
    few tweaks in locking
  • Optimization stays harder
  • Using smart techniques for XPath to SQL
    translation (ie, support for XML view creation)
    would probably make RDBMSs come close to
    delivering the same performance, modulo locking

37
Text-Based XML
  • The fundamental questions
  • How should we model ranking in query processing?
  • Simply as another value (e.g., Amer-Yahia
    Shanmugasundaram)
  • Using a probabilistic model or as an undefined
    metric
  • e.g., Weikum and Ramakrishnan work-in-progress
  • How does structure affect ranking?
  • PageRank-style (e.g., Shanmugasundaram et al.)
  • Query relaxation (FleXPath)
  • Other?
  • How do we achieve efficient pruning?
  • A search Cohen 98
  • Fagins Threshold Algorithm
  • Custom logic?
  • How do we integrate keyword indexing with
    structural indexing?
  • Multiple indices (e.g., Lore, Natix, )
  • Integrated indices (e.g., ViST)

38
Questions
  • Where are the main challenges of XML processing
    at this point?
  • Are we working on the right problems? Whats XML
    going to be used for, anyway?

39
XQuery Queries Views for XML
ltcustomersgt for cust in document(db)/custom
er return ltcustomergt cust/id,
for order in document(db)/order where
order/cid cust/id return ltordergt
order/id lt/ordergt lt/customergt
lt/customersgt
40
Multiple Plans are Possible
  • Retrieve customers
  • For each customer find matching orders

41
XML query processing
  • Native
  • Algebra-based
  • Holistic
  • Storage issues
  • Based on Relational storage and QP engines
  • XML to Relational mapping issues
  • Off the shelf Query processor vs additional
    operators

42
An Algebra-Based Query Processor Architecture
Client
XQuery
Navigation Requests
Results
XQuery Views
Translation to Algebra
Algebra Plan
Source Schemas Types
Source Description
Rewriter/Optimizer
Physical Algebra Plan
Functions
Plan Execution Engine
Function Description
43
Query Processing on Tuple-Oriented Algebra
Enables
  • Well-known efficient physical implementations of
    the operators
  • Join optimization
  • Nested queries by nested plans or group-by
  • Efficient iterator model

44
XQuery Queries Views for XML
ltcustomersgt for cust in document(db)/custom
er return ltcustomergt cust/id,
for order in document(db)/order where
order/cid cust/id return ltordergt
order/id lt/ordergt lt/customergt
lt/customersgt
45
Access and Navigation
getD cust, id ? cust_id
db
customer_table customer name
John id 56 customer
name George id 58
getD db1, customer ? cust
source db, db1
46
Simplification Using Schema Inference
Since cust_id ? cust and cust is useless
otherwise
db
customer_table customer name
John id 56 customer
name George id 58
getD db1, customer/id ? cust_id
i1
i2
source db, db1
47
Nested Plans
ct i2 o21
apply part, p ? orders
for part
48
Joins and Selections
cust_id
db1 cust_id db2 order cust_id2
order_id
?cust_id2?
db2 order cust_id2 order_id
getD order, id ? order_id
getD order, cid ? cust_id2
getD db2, order ? order
nestedSrc part
source db, db2
49
Constructors
listify oidE ? orders
o2
crEl order, oidL ? oidE
o1
crList orders ? oidL
order_id o1
o2
50
Algebra Example
51
Plan
52
Replacing Nested Joins with GroupBy/Outerjoin
Combinations
apply part, p ? R
apply part, p ? R
p3
p3
nestedSrc part
groupBy S(p1) ? part
p2
nestedSrc part
for part
p1
p1
p2
53
Multiple Possible Plans
54
Optimizations Performed during Rewriting
  • Operations pushed to sources
  • selections, projections, joins, sorts
  • Join order selection
  • Join type selection
  • sort/merge, parameter passing, nested loops
  • Nested plan elimination
  • nested plans converted to outer joins with
    group-by
  • Research topic
  • Constant folding
  • constant expressions evaluated at plan
    compilation time
  • Multi-threaded evaluation of web services
  • Unused element elimination in views
  • Condition push-down through views
  • Research topic

55
Indexing
  • In order to do efficient query processing,
    indexes are used
  • Reminder An index is a structure that points
    directly to nodes satisfying a given constraint
  • More indexes usually allow query processing to be
    more efficient, but also take up more space
    (time/space tradeoff)

56
Basic Indexes
  • At minimum, the following indexes are usually
    stored
  • Value indexes for each value appearing in the
    tree there is a list of nodes containing the
    value
  • Element indexes for each element name appearing
    in the tree, there is a list of nodes with the
    corresponding element
  • Sometimes also structure indexes for certain
    XPath expressions, there is a list of nodes that
    satisfy the expression

57
Example Value Indexes
1
transaction
2
account
11
4
buy
sell
3
89-344
12
shares
5
shares
13
14
ticker
6
7
30
ticker
15
100
exch
8
exch
16
17
NYSE
GE
10
9
NYSE
WEBM
58
Example Element Indexes
1
transaction
2
account
11
4
buy
sell
3
89-344
12
shares
5
shares
13
14
ticker
6
7
30
ticker
15
100
exch
8
exch
16
17
NYSE
GE
10
9
NYSE
WEBM
59
Example Structure Indexes
1
transaction
2
account
11
4
buy
sell
3
89-344
12
shares
5
shares
13
14
ticker
6
7
30
ticker
15
100
exch
8
exch
16
17
NYSE
GE
10
9
NYSE
WEBM
60
Query Processing
  • Suppose that we only have value indexes and
    element indexes
  • How should we process the query //buy//exch ?
  • Strategy 1 Find buy elements. Then traverse the
    subtree of these elements to look for exch
    elements
  • Strategy 2 Find exch elements. Then traverse the
    ancestors of these elements to look for buy
    elements
  • Which is a better strategy?

61
//buy//exch Strategy 1
1
transaction
2
account
11
4
buy
sell
3
89-344
12
shares
5
shares
13
14
ticker
6
7
30
ticker
15
100
exch
8
exch
16
17
NYSE
GE
10
9
NYSE
WEBM
62
//buy//exch Strategy 2
1
transaction
2
account
11
4
buy
sell
3
89-344
12
shares
5
shares
13
14
ticker
6
7
30
ticker
15
100
exch
8
exch
16
17
NYSE
GE
10
9
NYSE
WEBM
63
Improving the Execution
  • Instead of storing a running id for each element,
    store triple (start, end, level)
  • Find buy elements
  • Find exch elements
  • Merge these two lists by finding exch elements
    that are nested within buy elements
  • Level is used in case we are interested in
    finding children, not descendents

64
//buy//exch Improved
Start
End
Level
buy
(4,10,2)
Merge the 2 lists by finding descendent elements
exch
(15,17,4)
(8,9,4)
65
Merging Lists
  • What is the complexity of merging the lists?
  • Is it enough to go through each list once?
  • Assuming the lists are sorted by start?
  • Example Suppose we want to find all pairs of a
    and b such that b is a descendent of a

a
b
a
b
b
66
Merging Lists Example
  • Example Suppose we want to find all pairs of a
    and b such that b is a descendent of a

a
(3,6,2)
(1,7,1)
(5,5,3)
b
(4,4,3)
(2,2,2)
67
Merging Lists Example
  • Example Suppose we want to find all pairs of a
    and b such that b is a descendent of a

a
(3,6,2)
(1,7,1)
(5,5,3)
b
(4,4,3)
(2,2,2)
68
Merging Lists Example
  • We did extra work
  • Need a method to find the correct place to start
    in the b list

a
(3,6,2)
(1,7,1)
(5,5,3)
b
(4,4,3)
(2,2,2)
69
Minimizing the Work
  • Several algorithms have been defined to minimize
    the amount of work required, by identifying
    exactly where to restart
  • See
  • Shu-Yao Chien, Zografoula Vagena, Donghui Zhang,
    Vassilis J. Tsotras, Carlo Zaniolo, Efficient
    Structural Joins on Indexed XML Documents
    Proc.of VLDB 2002
  • Shurug Al-Khalifa, H. V. Jagadish, Nick Koudas,
    Jingesh M. Patel, Divesh Srivastava, Yuqing Wu,
    Structural Joins A Primitive for Efficient XML
    Query Pattern Matching, ICDE 2002
  • Nicolas Bruno, Nick Koudas, Divesh Srivastava,
    Holistic Twig Joins Optimal XML Pattern
    Matching, ACM SIGMOD 2002

70
Tree Pattern Can Be Computed From Structural
Relationships
Descendent edge
Child edge
book
title
author
jane
XML
71
Stack-Tree Algorithms Intuition
  • A depth-first traversal of a tree can be
    performed in linear time, using a stack as large
    as the height of the tree.
  • An ancestor-descendant structural relationship is
    manifested as the ancestor appearing higher on
    the stack than the descendant.
  • Unfortunately, a depth-first traversal requires
    going over all the tree.

72
Stack-Tree Algorithms
  • We present 2 algorithms
  • Stack-Tree-Desc that returns the result ordered
    by (desc-start, anc-start)
  • Stack-Tree-Anc that returns the result ordered by
    (anc-start, desc-start)
  • Why is the ordering of the result of interest?

73
Stack-Tree-Desc
  • a Alist-gtfirst node d Dlist-gtfirst node
    OutputList NULL
  • while (lists are not empty)
  • if (a.startPos lt d.startPos) then e a else e
    d
  • while (e.startPos gt stack.Top().endPos)
    stack.Pop()
  • if (e a)
  • stack.Push(a)
  • a a-gtnextNode
  • else
  • for each a in stack do append (a, d) to
    OutputList
  • d d-gtnextNode

74
Stack-Tree-Desc //section//paragraph
article
paragraph
section
paragraph
paragraph
section
paragraph
paragraph
section
paragraph
paragraph
Bla,..Bla,..
Bla,..Bla,..
75
Stack-Tree-Desc //section//paragraph
article
Alist
paragraph
section
paragraph
paragraph
section
paragraph
paragraph
section
paragraph
paragraph
Bla,..Bla,..
Bla,..Bla,..
76
Stack-Tree-Desc //section//paragraph
article
Dlist
paragraph
section
paragraph
paragraph
section
paragraph
paragraph
section
paragraph
paragraph
Bla,..Bla,..
Bla,..Bla,..
77
Stack-Tree-Desc //section//paragraph
article
d7
a1
paragraph
section
d1
a2
d6
paragraph
paragraph
section
a3
d2
d5
paragraph
paragraph
section
d4
d3
paragraph
paragraph
Bla,..Bla,..
Bla,..Bla,..
78
Stack-Tree-Desc //section//paragraph
Alist
a1
a2
a3
Dlist
d1
d4
d2
d5
d3
d6
article
d7
paragraph
a1
section
d1
d6
paragraph
paragraph
a2
section
d5
d2
paragraph
paragraph
a3
section
d4
d3
paragraph
paragraph
Bla,..Bla,..
Bla,..Bla,..
79
Stack-Tree-Desc
a1 d1 a2 d2 a3 d3 d4 d5 d6 d7
d7
a1
d1
d6
a2
d2
d5
a3
d3
d4
a3
a2
Stack
a1

(a1,d3),(a2,d3),(a3,d3)
(a1,d1)
(a1,d2),(a2,d2)
Output
(a1,d4),(a2,d4),(a3,d4)
(a1,d5),(a2,d5)
(a1,d6)
80
Analysis of Stack-Tree-Dec
  • O(Alist Dlist OutputList) for
    ancestor-descendant structural relationships.
  • Each Alist element is pushed once and popped
    once, so stack operations take O(Alist).
  • The inner for loop outputs a new pair each
    time, so its total time is O(OutputList).

81
XML and XQuery for integration
  • XML facilitates integration
  • Richer data model
  • Nesting, path queries, irregular structure
  • Schema can be used for optimization
  • As in relational
  • XML is lingua franca for many heterogeneous
    systems
  • Mediator query processor data model XML
  • Underlying systems often relational
  • Going back and forth XML to relational
Write a Comment
User Comments (0)
About PowerShow.com