Processing XQuery queries - PowerPoint PPT Presentation

1 / 149
About This Presentation
Title:

Processing XQuery queries

Description:

brooch' id='item1' id='item2' 15. Nodes and node identity ... brooch' 2. 15. 4. 3. 6 'Art Nouveau. gold pin' Item name. before. item description. Also node names ... – PowerPoint PPT presentation

Number of Views:76
Avg rating:3.0/5.0
Slides: 150
Provided by: proje50
Category:

less

Transcript and Presenter's Notes

Title: Processing XQuery queries


1
Processing XQuery queries
Andrei Arion (andrei.arion_at_inria.fr) (Slides of
Ioana Manolescu) Gemo group, INRIA Futurs
2
Context XML query processing
  • Several possible flows
  • Data comes from persistent (disk-based) storage
  • First load, then query
  • Query processing "at first sight"
  • Data is queried when it is first seen
  • Not our topic LMP02, FSC03, BCF03, FHK03

3
XML query processing scenarios
Loading
endElem("XML") ... beginElem("XML")
ltXML docgt
ltXML docgt
Query
In-memory data structures
Java / Lisp / CAML...
Querying
Query
Result
Result
Result
4
XML query processing scenarios (1/2)
  • "Persistent store"
  • Logging / archiving an ongoing activity
  • Clients, orders, products...
  • Structured text (documentation, news, image
    annotations, scientific data...)
  • Warehousing XML
  • "At first sight"
  • Fast processing of incoming documents
  • Web service messages
  • Workflow coordination
  • Many small documents to process
  • In-memory, programming language approach feasible

5
XML query processing scenarios (2/2)
  • "Persistent store"
  • Heavier
  • Needs loading
  • All DBMS goodies
  • Set-at-a-time processing
  • Query optimization
  • Persistence
  • Transactions
  • Concurrence control
  • View-based management...
  • "At first sight"
  • Lighter
  • May blend easily into a programming framework
  • Data marshalling is a pain
  • In real life, there are not just databases

6
Processing XML queries in a persistent store
roadmap
7
Roadmap
  • Part 1 Query processing stages
  • Part 2 Storing XML documents in a persistent
    database
  • Requirements
  • Storage models
  • Impact on query processing
  • Part 3 Some physical query execution methods
  • Part 4 Logical-level optimizations for XQuery

8
Part 1 query processing stages
9
Query processing stages in a persistent database
First stages in query processing query parsing
optimization
Query parser
Query optimizer
Logical query plan
Logical query plan
XQuery
Abstract Syntax Tree
Physical query plan
Logical query plan
Logical query plan
10
Query processing stages in a persistent database
Last stage in query processing query execution
Execution engine
Physical query plan
Storage system
11
Comparison with relational databases
12
Part 2Storing XML documents in a persistent
database
13
2.0 Requirements for an XML storage method
14
Requirements for an XML storage method
  • Completeness
  • Must preserve all information content of the
    document
  • Amenable to efficient processing
  • Navigation queries benefit from fragmentation
  • Reconstruction queries suffer from fragmentation
  • Flexible
  • Generic
  • Must not require precise schema information
  • Human-understandable

15
Sample XML document
auctions
item
item
id"item1"
id"item2"
name
description
comment
name
description
"Gold pin"
"Surveillance camera"
"Remarkable dragon brooch"
text
parlist
text
parlist
"8 channel"
listitem
listitem
"Art Nouveau gold pin"
listitem
parlist
text
text
listitem
listitem
"also pendant"
"Video recorder, 8 dome cameras"
text
text
"Large diamond"
"Circa 1900"
16
Nodes and node identity
  • 7 types of nodes document, element, attribute,
    text, namespace, processing instruction, comment
    XQDM
  • Element, attribute, namespace, PI nodes have a
    unique identity
  • (ElemID, attr name) determine attr. value ? key
    issue is element identity
  • In-memory processing "the pointer is the ID"
  • Persistent stores must materialize some
    persistent IDs (not necessarily for all elements)

17
Assigning persistent IDs to elements
1
auctions
item
item
id"item1"
id"item2"
2
16
17
15
name
description
comment
name
description
18
4
3
"Gold pin"
"Surveillance camera"
"Remarkable dragon brooch"
text
parlist
19
20
5
text
parlist
6
"8 channel"
7
listitem
"Art Nouveau gold pin"
listitem
13
21
listitem
parlist
8
text
14
text
22
9
11
listitem
listitem
"also pendant"
"Video recorder, 8 dome cameras"
10
text
12
text
"Large diamond"
"Circa 1900"
18
Data values
...
  • Text nodes
  • Level 0 bunch ofstrings
  • Level 1 strings, numbers,booleans
  • Level 2 bags of words, numbers, boolean
  • This is still a simplification

item
id"item2"
price
name
description
"200"
"Surveillance camera"
text
parlist
"8 channel"
listitem
text
"Video recorder, 8 dome cameras"
19
Document structure relationships among nodes
  • Level 0 store parent-child relationships
  • Given a node, it must be possible to find
  • Its children
  • Its parent
  • Parent-child relationships between elements
  • "Ownership" relationships between an element and
    an attribute
  • "Text value" relationships between elements and
    text
  • Elements may have several text children

20
Document structure relationships among nodes
1
auctions
item
item
id"item1"
id"item2"
2
16
17
15
name
description
comment
name
description
18
4
3
"Gold pin"
"Surveillance camera"
"Remarkable dragon brooch"
text
parlist
19
20
text
parlist
5
6
  • Element 1 is parent of elements 2 and 16
  • Element 2 has the attribute id"item1"
  • Element 3 has the text child "Gold pin"

21
Document structure order and names
  • Nodes in an XML document appear in a well-defined
    total order
  • It must be possible to retrieve this order

...
item
2
Item name before item description Also node
names
15
name
description
comment
4
3
"Gold pin"
"Remarkable dragon brooch"
5
text
parlist
6
"Art Nouveau gold pin"
7
listitem
listitem
8
22
Document structure, revisited
  • Structure
  • Invariants ("constraints", "schema", "DTD")
  • Instances (particular instantiations of a degree
    of freedom left by the invariants)
  • Many formalisms for specifying the invariants
  • DTDs, XML Schemas
  • Structural summaries GW97
  • Graph schemas FS98
  • Richer constraint languages

23
Invariants in XML document structure
1
auctions
item
item
id"item1"
id"item2"
2
16
17
15
name
description
comment
name
description
18
4
3
"Gold pin"
"Surveillance camera"
"Remarkable dragon brooch"
All element children of the ltauctionsgt element
are named ltitemgt
text
parlist
18
19
5
text
parlist
5
"8 channel"
7
listitem
"Art Nouveau gold pin"
listitem
13
20
listitem
parlist
8
text
14
text
21
9
11
listitem
listitem
"also pendant"
"Video recorder, 8 dome cameras"
10
text
12
text
"Large diamond"
"Circa 1900"
24
Invariants in XML document structure
1
auctions
item
2
item
id"item1"
id"item2"
16
17
15
name
description
comment
name
description
18
3
3
"Gold pin"
"Surveillance camera"
"Remarkable dragon brooch"
Each ltitemgt element has an attribute labeled "id"
text
parlist
18
19
5
text
parlist
5
"8 channel"
7
listitem
"Art Nouveau gold pin"
listitem
13
20
listitem
parlist
8
text
14
text
21
listitem
listitem
9
11
"also pendant"
"Video recorder, 8 dome cameras"
text
10
12
text
"Large diamond"
"Circa 1900"
25
Invariants in document structure
"All itemID values in adocument are unique."
  • "For each value of theitem attribute of
    anltopen_auctiongt element, there exists an ltitemgt
    element having the itemID attribute with the
    same value"
  • Change the "item" attribute into a subelement...

26
Instance structure information
1
auctions
item
item
id"item1"
id"item2"
2
17
18
15
name
description
comment
name
description
19
4
3
"Gold pin"
"Surveillance camera"
"Remarkable dragon brooch"
The first ltitemgt element has a ltcommentgt child
the second ltitemgt element does not.
text
parlist
18
20
5
text
parlist
5
"8 channel"
7
listitem
listitem
"Art Nouveau gold pin"
13
21
listitem
parlist
8
text
14
text
22
listitem
listitem
9
11
"also pendant"
"Video recorder, 8 dome cameras"
text
10
12
text
"Large diamond"
"Circa 1900"
27
Instance structure information
1
auctions
item
item
id"item1"
id"item2"
2
16
16
The ltlistitemgt element number 6 hasa ltparlistgt
child the ltlistitemgt elem. number 8, 10, 12, 20
do not have ltparlistgt children.
14
name
description
comment
name
description
18
3
3
"Gold pin"
"Surveillance camera"
"Remarkable dragon brooch"
text
parlist
19
20
5
text
parlist
5
"8 channel"
listitem
"Art Nouveau gold pin"
listitem
14
7
21
listitem
parlist
8
text
15
text
22
9
11
listitem
listitem
"also pendant"
"Video recorder, 8 dome cameras"
10
text
12
text
"Large diamond"
"Circa 1900"
28
Storage completeness summary
  • Need to store
  • Node identity and order
  • (Typed) data values
  • Document structure invariants particular
    instances
  • Many invariants is good (regular data)... but
    they should remain small (to handle easily)
  • DTDs, XML Schemas are there, but do not express
    all desirable constraints
  • Complex constraints require special care for
    updates

29
XPath and XQuery performance requirements for an
XML store
  • Path expressions result in duplicate-free lists
    of nodes E.g. //parlist//listitem
  • Lots of existential semantics
  • 1, 2 2, 3 true
  • 2, 3 3, 4 true
  • 1, 2 3, 4 false
  • 1, 2 ! 3, 4 true
  • Complicates indexing
  • Path expressions return whole subtrees
  • Element constructors copy whole subtrees

parlist
listitem
listitem
parlist
listitem
listitem
30
XPath and XQuery performance requirements for an
XML store
  • The semantics of x op y is defined by a large
    switch on the types associated to x and y
  • Thus, type information should be available
    during execution automata-based implementation
    (see typing course)
  • Type information is also helpful before
    execution, for optimization
  • Also used to organize the storage (see later)
  • Many atomic types exist a correct implementation
    should support them
  • We will focus on storing ordered trees with
    simple values

31
Where (most) persistent storage systems stand
  • Provide selective data access
  • Provide set-at-a-time execution primitives, for
    a language whose formal semantics is defined
    tuple-at-a-time (recall this is allowed by FS !)
  • Advantage performance
  • Selective data access
  • Set-at-a-time processing allows for better disk
    access locality
  • Hash join vs. nested loops join
  • Naive navigation-based tuple-at-a-time path query
    processors have complexity exponential in the
    size of the query GKP02 set-at-a-time scales
    better.

32
Where (most) persistent storage systems stand
  • Provide set-at-a-time execution primitives, for
    a language whose formal semantics is defined
    tuple-at-a-time (recall this is allowed by FS !)
  • Disadvantages
  • Must make explicit effort to prove correctness
  • Some features may not be doable this way
  • But they do not seem to be the essential ones

33
2.1Persistent storage models for XML
34
Storage models for XML
  • They are determined by
  • data model tuples or trees
  • fragmentation strategy choice of invariant
  • Choose some property node name, node path,
  • Group together all tuples/trees that have the
    same value for the same property E.g. table A
    contains all A elements E.g. collection C1 has
    all trees on path /A/B
  • Store each group in a separate structure

35
Problems related to the choice of fragmentation
(1)
  • Access path selection choosing the best way to
    retrieve from disk a given data set
  • Parameters data model, available storage
    structures
  • Relational scan vs. index-based access
  • Object navigation vs. scanning class extents

36
Problems related to the choice of fragmentation
(2)
  • View-based query rewriting reformulating a query
    based on a set of (materialized) views
  • Parameters data model, view and query languages
  • Local-as-view data integration

Mediator schema
view definitions
Source schema 2
Source schema 1
37
Problems related to the choice of data
fragmentation (3)
  • Materialized view/index selection choosing a set
    of views to materialize to speed up certain
    queries (supposes view-based query rewriting)
  • Parameters query language, query workload,
    space
  • Materialized view selection in a data warehouse
    all sales of beer, or sales of beer in May,
    or sales of beer in France by brand...
  • Supported in current-release relational DBMSs
  • For XML ?

38
Problems related to the choice of data
fragmentation (4)
  • Automatic index and view selection choosing a
    set of views and indexes to speed up certain
    queries
  • Parameters query language, query workload,
    space
  • Index tuning wizards (SQL Server, DB2)

39
The simplest store no fragmentation(introduced
for OEM PGW95 )
  • OEM Object exchange model
  • Labeled, directed, unordered graph of objects
  • Objects have unique identity
  • Atomic objects values (simple atomic types)

40
Storing OEM objects in LORE MAG97
  • Objects clustered in pages in depth-first order,
    including simple value leaves
  • Basic physical operator Scan(obj, path)

Auctions
item
item
open_auctions
object
object
comment
name
description
auction
description
auction
name
...
bids
bids
initial
initial
...
...
...
...
41
Navigation in a persistent graph
  • Navigation-based, tuple-at-a-time, pointer-chasing

Scan(Auctions, "item")
2 pages accessed
Auctions
item
item
open_auctions
object
object
comment
name
description
auction
description
auction
name
...
bids
bids
initial
initial
...
...
...
...
42
Navigation in a persistent graph
  • Scan(Auctions, "item.description")

4 pages accessed
4 pages accessed
Scan(Auctions, "open_auctions.auction.object")
Auctions
item
item
open_auctions
object
object
name
comment
description
auction
description
auction
name
...
bids
bids
initial
initial
...
...
...
...
43
Indexing objects in a graph MW97,MWA98,MW99a,MW9
9b
  • VIndex(l, o, pred) all objects o with an
    incoming l-edge, satisfying pred
  • LIndex(o, l, p) all parents of o via an l-edge
  • "Reverse pointers"
  • BIndex(x, l, y) all edges labeled l

select X from Auction.open_auctions.auction
X where X.initial lt 10
tuple at a time
bulk access
44
Indexing objects in a graph MW97
  • PIndex(p, o) all objects o reachable by the path
    p

select X from Auction.open_auctions.auction.initia
l X where X.initial lt 10
Set at a time
Tuple at a time
45
The idea behind path indexes DataGuides GW97
46
The idea behind path indexes DataGuides GW97
  • Graph-shaped summaries of graph data
  • Invariants extracted from the data ("a posteriori
    schema")
  • Groups all nodes reachableby the same paths

Auctions
oid1
open_auctions
item
oid12
oid2,oid15
object
auction
comment
description
oid15,oid16
name
initial
bids
oid40,oid41
oid22,oid23
oid24,oid25
oid30,oid31
oid20,oid21
...
...
47
Another graph indexing model graph schema FS98
  • Simplifications of the data instance (less
    labels)

48
Path query optimization based on graph schemas
FS98
  • Prune path queriesAuctions..description..listi
    tem becomesAuctions.item.description.listitem
    schema-based simplification
  • Associate extents to graph states, and use
    theextents to answer indexing

49
More on graph indexing
  • Graph indexing
  • Partition nodes into equivalence classes
  • Store the extent of each equivalence class, use
    it as "pre-cooked" answer to some queries
  • Equivalence notions
  • Reachable by some common paths DataGuide MW97
  • Reachable by exactly the same paths 1-index
    MS99 or, equivalently,indistinguishable by any
    forward path expression
  • Indistinguishable by any (forward and backward)
    path expression FB Index ABS99,KBN02
  • Indistinguishable by the (forward and backward)
    path expressions in the set Q covering index
    KBN02
  • Indistinguishable by any path expression of
    length lt k A(k) index KSB02

50
FB index
  • Group together nodes reachable by exactly the
    same paths
  • Path language
  • Navigate along one edge in both directions
  • Navigate along any number of edges, in both
    directions
  • n1 n2 for any path expression p, either n1 and
    n2 are in the answer of p, or neither are in the
    answer of p.

1
a
a
a
a
Data graph
FB index
2
4
3
5
b
c
b
b
6
5
7
8
51
Computing the FB index
All nodes in A have a parent in B, or none does.
All nodes in A have a parent in group BCD...,
or none does.
All nodes in any group have parents in exactly
the same groups
  • Group of nodes A is succ-stable wrt B iffA ?
    succ(B) or A and succ(B) are
    disjoint

A partition is succ-stable if all node groups are
pairwise succ-stable.
All nodes in any group have children in exactly
the same groups
  • Index built on succ-stable partition
  • if there is an edge from A to B in the index,
  • then from every node in ext(A) there is an edge
    to node in ext(B)

Pred- and succ-stable partition leads to an index
covering all path expressions
A partition is pred-stable if all node groups are
pairwise pred-stable.
The FB index is the smallest graph index both
pred-stable and succ-stable.
52
Computing the FB index
  • P label partition on the nodes (XML style)
  • Repeat
  • Reverse edges in the graph.
  • Refine P to make it succ-stable.
  • Reverse back edges in the graph.
  • Refine P to make it pred-stable.
  • Until P does no longer change.

53
Computing the FB index
  • P label partition on the nodes (XML style)
  • Repeat
  • Reverse edges in the graph.
  • Refine P to make it succ-stable.
  • Reverse back edges in the graph.
  • Refine P to make it pred-stable.
  • Until P does no longer change.

54
Computing the FB index
  • P label partition on the nodes (XML style)
  • Repeat
  • Reverse edges in the graph.
  • Refine P to make it succ-stable.
  • Reverse back edges in the graph.
  • Refine P to make it pred-stable.
  • Until P does no longer change.

55
Summary persistent graph / tree storage and
indexing
  • Very simple storage models
  • Quite simple value indexing MWA98
  • Multiple graph schema/index structures
  • Identify invariants / regularity / interesting
    node groups
  • Use interesting node groups
  • Simplify path queries
  • Basis for indexing
  • Store IDs of all nodes in an interesting group.
    Access them directly (avoid navigation).
  • Formalism behind it bisimulation ABS99

56
2.2Persistent storage models for XMLthe
relational option
57
Storing XML documents in relational databases
XML document
Load in RDBMS
Recompose document
  • Query processing
  • Translate XQuery in SQL
  • Run SQL query
  • Serialize XML result
  • RDBMS
  • ACID properties
  • Performance

58
Relational storage models for XML
  • It must be possible to store any document
  • Documents may have an XML schema
  • Relational schemas seek to
  • Exploit / identify structural invariants
  • "Tables for fixed or frequent structures"
  • Factorize or avoid storage of labels
  • Factorize storage of values
  • Compared to graph storage, relational XML storage
    is "already" partitioning / organizing the data
    (even without index)

59
Classification of relational storage schemes for
XML documents
  • No XML schema assumption ( "Tree OEM")
  • Generic same relational schema for all
  • Derive relational schema from the data
  • Based on an XML schema
  • Based only on an XML schema
  • Based on schema and cost information
  • User-defined

60
Simplest relational model "Edge" FK99
auctions
1
item
2
id"item1"
4
description
3
comment
"Remarkable dragon brooch"
parlist
5

6
listitem
listitem
10
7
parlist
text
General (no schema or queries used) No regularity
assumed ID may reflect document order Index on
pID, and (name,target)
listitem
listitem
8
9
"also pendant"
text
text
"Large diamond"
"Circa 1900"
61
Path query processing on "Edge"
//item //comment/text() //item_at_id"item1"/
description
select target from Edge where name"item" select
target from Edge where name"comment" select
e3.target from Edge e1, Edge e2, Edge
e3 where e1.name"item" and
e1.targete2.pID and e2.name"id" and
e2.target"item1" and
e1.targete3.pID and e3.name"descr"
Actual query answering also requires
reconstructing the element... will discuss that
later
3-way join on the Edge table
62
Partitioned "Edge"
  • EdgeAuction(pID,ord,target)
  • EdgeItem(pID,ord,target)
  • EdgeID(pID,ord,target)
  • EdgeDescription(pID,ord,target)

Similar to graph index,but this is the storage
Interesting groups of nodes those with the same
label Store tags in schema, not in data Some code
on the side keeps the mapping between tags and
table names
3-way join on (much) smaller tables
63
Other storage schemes derived from "Edge"
  • Universal relation FK99EdgeAuction
    EdgeItem EdgeID ...
  • Query-dictated materialized views over the
    partitioned Edge tablesEdgeAuction
    EdgeItemEdgeAuction EdgeID
  • Materialized view selection problem

64
Relational schemas based on frequent tree
structures DFS99
  • Analyze elements to find frequent patterns
  • Create one table for each type with enough
    support in the databaseE.g. one relation for
    articles with up to 3 author
  • Stores the remaining elements in overflow
    tablesE.g. the remaining article authors
  • Materialized views over partitioned Edge, based
    on pattern frequency
  • Performance benefits less joins for many queries
  • "Interesting node groups" are frequent ones
  • Most tags stored in the schema, not in the data

65
Path partitioning SKV00
  • 1 relation per root-to-leaf data path Path(id1,
    id2, ord)
  • 1 relation per root-to-inner node data path
    Val(id, val)

auctions
1
auctions (id1, id2, ord) auctions.item(id1, id2,
ord) auctions.item.id(id1, id2,
ord) auctions.item.name(id1, id2,
ord) ... auctions.item.id.val (id,
val) auction.item.name.val (id, val) ...
item
id"item1"
2
4
3
5
7
name
description
comment
15
6
"Gold pin"
"Remarkable dragon brooch"
16
10
text
parlist
8
11
listitem
listitem
"Art Nouveau gold pin"
12
9
66
Path partitioning SKV00
  • (Very) similar to graph indexes
  • IDs are stored once per descendent
  • Values stored once
  • No join required for linearpath expressions
    wildcards
  • Dataguide is neededto find elementary paths for
    a given path expression

auctions (id1, id2, ord) auctions.item(id1, id2,
ord) auctions.item.id(id1, id2,
ord) auctions.item.name(id1, id2,
ord) ... auctions.item.id.val (id,
val) auction.item.name.val (id, val) ...
67
Path partitioning SKV00
  • IDs are stored once per descendent
  • Values stored once
  • No join required for linearpath expressions
    wildcards
  • Dataguide is neededto find elementary paths for
    a given path expression

auctions
//listitem
item
id
name
description
comment
text
parlist
text
text
listitem
text
parlist
text
listitem
text
68
XRel path partitioning interval encoding
YAT01
Factorize path information in Path table (1,
"/auctions")(2, "/auctions/item")(3,
/auctions/item/name") Assign region IDs Start,
End to nodes N1 ancestor of N2 iffN1.Start lt
N2.Start andN1.End gt N2.End Children
relationship extra condition on paths
  • Path(PathID,PathExpr)
  • Element(DocID, PathID, Start, End, Ordinal)
  • Text(DocID, PathID, Start, End, Value)
  • Attribute(DocID, PathID, Start, End, Value)

auctions
69
XRel path partitioning interval encoding
YAT01
  • Path(PathID,PathExpr)
  • Element(DocID, PathID, Start, End, Ordinal)
  • Text(DocID, PathID, Start, End, Value)
  • Attribute(DocID, PathID, Start, End, Value)
  • Processing regular path expressions match PE
    against Path.PathExpr (string pattern matching)
  • Thanks to region IDs, no joins required for
    linear path expressions
  • Theta-joins still needed for branching path
    expressions
  • match paths, then use indexes on Element.pathID

70
XParent path partitioning materialized paths
JLW01
  • LabelPath(ID, lengh, pathExpr)
  • Data(pathID, nID, ord, value)
  • Element(pathID, ord, nID)
  • ParentChild(pID, cID)
  • Same string-based path encoding scheme as XRel
  • Drops region-based IDs, to avoid theta-joins
  • Parent-child relationships stored as with Edge
  • Ancestor-descendent relationships established
    based on paths
  • XRel, XParent still single large tables, only
    viable with indices

71
Classification of relational storage schemes for
XML documents
  • No XML schema assumption ( "Tree OEM")
  • Generic same relational schema for all
  • Derive relational schema from the data
  • Based on an XML schema
  • Based only on an XML schema
  • Based on schema and cost information
  • User-defined

72
DTD-derived relational storage schemes STH99
  • Use a DTD graph derived from the DTD

lt!ELEMENT a (b, f, d)gt lt!ELEMENT b
(c,d)gt lt!ATTLIST b bID REQUIRED IDgt lt!ELEMENT c
(e, g)gt lt!ELEMENT d PCDATAgt lt!ELEMENT e (b?)
gt lt!ATTLIST e bIDRef IDREFgt lt!ELEMENT
f PCDATAgt lt!ELEMENT g PCDATAgt
73
"Basic" model STH99
  • 1 relation for every DTD graph node
  • XML documents can be rooted at any element
  • This relation contains all data contents
    accessible from the DTD graph node, except
  • children (create separate relation)
  • nodes with backpointers (group idref with
    parent) A(a.ID, a.f.val, a.d.val)B(b.ID,
    b.parentID, b.val, bID.val, b.d.val)A.B(a.b.ID,
    a.b.parentID, a.b.bID.val, a.b.d.val)C(c.ID,
    c.parentID, c.e.val, c.e. bIDRef.val)B.C(b.c.ID,
    b.c.parentID, b.c.val)C.G(c.g.ID, c.g.parentID,
    c.g.val)D(d.ID, d.parentID, d.val) E(e.ID,
    e.parentID, e.val, e. bIDRef.val)F(f.ID,
    f.parentID, f.val)G(g.ID, g.parentID, g.val)

74
"Basic" model STH99
  • Node labels stored in the relational schema
  • Parent-child links materialized // requires
    joins
  • Path information split between rel. schema and
    DTD
  • Many tables, unions required, redundancy
  • A(a.ID, a.f.val, a.d.val)B(b.ID, b.parentID,
    b.ID.val, b.d.val)A.B(a.b.ID, a.b.parentID,
    a.b.ID.val, a.b.d.val)C(c.ID, c.parentID,
    c.e.val, c.e. bIDRef.val)B.C(b.c.ID,
    b.c.parentID)C.G(c.g.ID, c.g.parentID,
    c.g.val)D(d.ID, d.parentID, d.val) E(e.ID,
    e.parentID, e.val, e. bIDRef.val)F(f.ID,
    f.parentID, f.val)G(g.ID, g.parentID, g.val)

75
"Shared" model STH99
  • 1 relation for every DTD graph node with
    in-degree 0 or gt1, or under a or backpointer
  • Less relations, no redundancy
  • Requires joins for reconstruction
  • Order support is complicated by ID absence

a
A(a.ID, a.f.val, a.f.isRoot)B(b.ID, b.parentID,
b.ID.val)C(c.ID, c.parentID, c.e.val, c.e.
bIDRef.val, c.e.isRoot)D(d.ID, d.parentID,
d.parentCode, d.val) G(g.ID, g.parentID,
g.val)

b
f

?
c
d

e
g
76
"Hybrid" model STH99
  • 1 relation for every DTD graph node with indegree
    0, under a , or under a backpointer
  • Less relations, no redundancy
  • Requires joins for reconstruction
  • Order support is complicated by ID absence

a

A(a.ID, a.f.val, a.f.isRoot, a.d.val)B(b.ID,
b.parentID, b.ID.val, b.d.val)C(c.ID,
c.parentID, c.e.val, c.e. bIDRef.val,
c.e.isRoot)D(d.ID, d.parentID, d.val) G(g.ID,
g.parentID, g.val)
b
f

?
c
d

e
g
77
DTD-derived models
  • Labels stored in the schema only some IDs stored
  • E-R analysis of data in XML
  • "If every A has 1 D, and all Ds have A parents,
    then D is a property of A otherwise, D is an
    entity"
  • DTD (graph) must be kept and exploited to
    translate queries, handle inlined elements, ...
  • Long path queries may be simplified, then matched
    accessing few tables
  • To bind many branches, (several) joins still
    needed

78
Classification of relational storage schemes for
XML documents
  • No XML schema assumption ( "Tree OEM")
  • Generic same relational schema for all
  • Derive relational schema from the data
  • Based on an XML schema
  • Based only on an XML schema
  • Based on schema and cost information
  • User-defined

79
Relational models based on a query workload
BFR02
  • Adapt relational schemas to documents (XML
    schema), queries
  • Problem different XML schemas may accept the
    same sets of documents, yet produce different
    relational schemas !
  • Fixed storage for a given XML schema (shared)
  • Transformation rules for moving between
    equivalent XML Schemas
  • Cost-based optimization in the space of possible
    transformations
  • Impact of a given XML Schema transformation
    estimated by the RDBMS's optimizer

80
Cost-based choice of a relational schema
XML schema
RDBMS optimizer
81
Schema transformation rules driven by workload
queries
  • Inlining / Outlining
  • type Ab Integer, C, d, type Ce String
    equivalent totype Ab Integer, e String,
    d
  • Inlining useful if C is always queried through
    ancestor A
  • Union Factorization / Distribution
  • (a, (bc)) equivalent to (a, b) (a, c)
  • at1t2 equivalent to at1 at2
  • Useful to separate if at1 often queried
    together, at2 rarely or never queried together

82
Schema transformation rules driven by workload
queries
  • Repetitions merge / split
  • a equivalent to (a, a)
  • If the first ltagt is isolated, it can be inlined
    with parent
  • Wildcard rewritings
  • Ab String equivalent to a b (cd),
    where ctag1String and d(! tag1)String
  • If a/b/tag1 often queried, a/b/other never
    queried, separate them.

83
Cost-based choice of relational schema
  • Materialized view selection for a dataset and
    workload
  • This is the storage
  • Optimizer estimates can be wrong, but the
    optimizer will make the same mistake when
    choosing the best plan
  • Search space explored
  • Node labels factorized in the schema
  • Schema management module needed to identify
    pertinent relations
  • Various points in the search space vary the
    number of unions and joins required by a query

84
Classification of relational storage schemes for
XML documents
  • No XML schema assumption ( "Tree OEM")
  • Generic same relational schema for all
  • Derive relational schema from the data
  • Based on an XML schema
  • Based only on an XML schema
  • Based on schema and cost information
  • User-defined

85
User-defined storage model for XML MFK01,DS03
Does each item have exactly one description ?
Does each item have exactly one price ?
  • Express (relational) storage by custom
    (algebraic) expressions over the XML document
  • Relation materialized view over the XML
    document
  • Finding useful tables requires view-based query
    rewriting
  • R(y,z) - Auctions.item x, x._at_id.text() y,
    x.price.text() z
  • S(u,v) - Auctions.item t, t._at_id.text() u,
    t.description.text() v
  • for x in //item
  • return ltresgt x/price, x/description lt/resgt

Is Auctions.item the same as //item ?
Is _at_id a key for item ?
Not so fast.
86
User-defined storage model for XML MFK01,DS03
Does each item have exactly one description ?
Does each item have exactly one price ?
  • Express (relational) storage by custom
    expressions over the XML document
  • Relation materialized view over the XML
    document
  • Finding useful tables requires view-based query
    rewriting

Is Auctions.item the same as //item ?
  • XPath containment
  • Functional dependency
  • Cardinality constraints

Is _at_id a key for item ?
Query containment /rewriting under
constraints Techniques based on the chase DS03
87
User-defined storage model for XML MFK01,DS03
  • Express (relational) storage by custom
    expressions over the XML document
  • Relation materialized view over the XML
    document
  • Other issue we would want to write
  • R(XID, y,z) - Auctions.item x, x.id XID,
    x._at_id.text() y, x.price.text() z
  • S(XID, u,v) - Auctions.item t, x.id XID
    t._at_id.text() u, t.description.text() v
  • x.id not in the document ! But, in the data model
    XQDM

88
User-defined storage model for XML
  • Express (relational) storage by custom
    expressions over the XML
  • Must check storage completeness
  • Most generic potential for good performance
    (materialized views !)
  • Can also express non-relational storage models
  • Rewriting is complex.
  • Poor man's solution cut in flexibility (and
    performance)
  • Less freedom in the mappings
  • Assign IDs to all elements
  • Map each element to a table...

89
Summary of relational storage models for XML
  • Relations alone only go that far
  • Many solutions around S-P-J materialized view
    selection over partitioned Edge table
  • Flexible (or generic) storage requires view-based
    query rewriting
  • Interesting performance advantages stem from
    various encodings path, ID, ...
  • Fragmentation (horizontal/vertical) facilitates
    navigation and complicates reconstruction

90
Part 2.3 Native XML storage. Conclusions and
examples
91
Native XML storage models
  • Native storage persistent trees JKC02,FHK02
  • XML nodes split among disk pages
  • Logical XML document Physical
    representation
  • Node representation optimized based on fixed page
    size
  • Storage evolution on updates B-trees FHK02

92
Native XML storage and indexing models
  • Persistent trees only allow navigation ?
  • Approach taken in Timber (U. Michigan) and Natix
  • Attach region-based labels to XML elements
  • Use structural joins structural join or region
    labels
  • Index labels by tag. This largely outperforms
    navigation HBG03.
  • XQueC approach (Gemo U. Calabria MAB04 )
  • Attach region-based labels to XML elements
    partition region labels by incoming path (as
    storage) MAB04
  • Partition identifiers by the paths
  • More concise (tags stored in schema)
  • Less duplicate elimination, joins more unions
    required
  • More native encoding storage indexing schemes
  • WKF03, K03 based on binary tree encoding,
    RM04,ZKO04

93
Native storage and indexing models summary
  • Overall transformation
  • Document -- logical encoding/labeling -- mapping
    to physical store
  • Some labeling schemes reduce the number of joins
    (ORDPATHS)
  • Navigation outperformed by index-based access
  • Persistent trees provide for fast serialization
  • Construction of new, complex result remains
    complex
  • Likely to yield interesting encoding and storage
    solutions

94
Part 3 query execution techniques
95
Plan
  • Labeling schemes and their desireable properties
  • Fundamental primitive for path queries
    structural joins
  • XML element reconstruction

96
Why start with labeling schemes ?
  • Idea assign labels to XML elements
  • unique identifiers
  • useful information for query processing
  • Source of big performance improvements over
    relational storage traditional joins
  • Many labeling schemes
  • trade-off between space occupancy, information
    contents, and suitability to updates CKM02
  • most frequent one region-based ("pre-post")
  • shortcomings and alternatives
  • new ones still being produced

97
Region-based labeling schemes
ltbgt child of ltagt region of ltbgt included in region
of ltagt
  • Idea label elements to reflect nesting
    (containment)
  • ltagt Some content here and then a ltbgt element lt/bgt
    lt/agt
  • Label ltagt with 1, 2, and ltbgt with 2, 1
  • Add also nesting level
  • label ltagt with 1, 2, 1
  • label ltbgt with 2, 1, 2
  • Variant start and end counting in characters in
    the file (recycled IR indexing technique)

98
Region-based structural identifiers
1,0,22
auctions
16,1,21
2,1,14
item
item
id"item1"
id"item2"
15,2,13
3,2,1
4,2,12
17,2,15
18,2,20
name
description
comment
name
description
"Gold pin"
"Surveillance camera"
"Remarkable dragon brooch"
19,3,16
20,3,19
6,3,11
text
parlist
text
parlist
5,3,2
"8 channel"
7,4,8
13,4,10
21,4,18
listitem
listitem
"Art Nouveau gold pin"
listitem
8,5,7
14,5,9
22,5,17
parlist
text
text
9,6,4
11,6,6
listitem
listitem
"also pendant"
"Video recorder, 8 dome cameras"
10,7,3
12,7,5
text
text
"Large diamond"
"Circa 1900"
99
The interest of region-based identifiers
  • Element e1 is an ancestor of element e2 iff
    e1.pre lt e2.pre and e1.post gt e2.post
  • Element e1 is a parent of element e2 iffe1.pre
    lt e2.pre and e1.post gt e2.post and
    e1.depth1e2.depth
  • Establishing ancestor-descendent relationships
    becomes as easy as establishing parent-child
    relationships
  • Structural join (see next)

The end of navigation !
100
Query primitive structural join
  • Relationship established throughsimple
    comparisons
  • select from Element e1, Element e2where e1.doc
    e2.doc and e1.pre lt e2.pre and e1.post gt
    e2.post
  • Non-equality join !
  • Relational implementations perform poorly
    ZND01
  • Alternative efficient stack-based algorithms
    AJK02
  • Algorithm StackTreeDesc AJK02, holistic twig
    joins BKS02
  • Supporting indexes JLW03, join ordering
    WPJ03

101
Algorithm StackTreeDesc
Result //b join //g
Stack
//b
//g
2, 5, 2 3, 3, 3 7, 14, 2 11,12, 3
5, 2, 4 10, 7, 4 13,10, 4 14,11, 4
102
Algorithm StackTreeDesc
Result //b join //g
Stack
//b
//g
3, 3, 3 7, 14, 2 11,12, 3
5, 2, 4 10, 7, 4 13,10, 4 14,11, 4
2, 5, 2
103
Algorithm StackTreeDesc
Result //b join //g
Stack
//b
//g

7, 14, 2 11,12, 3
5, 2, 4 10, 7, 4 13,10, 4 14,11, 4
3, 3, 3 2, 5, 2
104
Algorithm StackTreeDesc
Result //b join //g
Stack
//b
//g

7, 14, 2 11,12, 3
5, 2, 4 10, 7, 4 13,10, 4 14,11, 4
2, 5, 2, 5, 2, 4 3, 3, 3, 5, 2, 4
3, 3, 3 2, 5, 2
105
Algorithm StackTreeDesc
Result //b join //g
Stack
//b
//g

7, 14, 2 11,12, 3
10, 7, 4 13,10, 4 14,11, 4
2, 5, 2, 5, 2, 4 3, 3, 3, 5, 2, 4
3, 3, 3 2, 5, 2
106
Algorithm StackTreeDesc
Result //b join //g
Stack
//b
//g

7, 14, 2 11,12, 3
10, 7, 4 13,10, 4 14,11, 4
2, 5, 2, 5, 2, 4 3, 3, 3, 5, 2, 4
2, 5, 2
107
Algorithm StackTreeDesc
Result //b join //g
Stack
//b
//g

7, 14, 2 11,12, 3
10, 7, 4 13,10, 4 14,11, 4
2, 5, 2, 5, 2, 4 3, 3, 3, 5, 2, 4
108
Algorithm StackTreeDesc
Result //b join //g
Stack
//b
//g

11,12, 3
10, 7, 4 13,10, 4 14,11, 4
2, 5, 2, 5, 2, 4 3, 3, 3, 5, 2, 4
7, 14, 2
109
Algorithm StackTreeDesc
Result //b join //g
Stack
//b
//g

10, 7, 4 13,10, 4 14,11, 4
2, 5, 2, 5, 2, 4 3, 3, 3, 5, 2, 4
11,12,3 7, 14, 2
110
Algorithm StackTreeDesc
Result //b join //g
Stack
//b
//g

13,10, 4 14,11, 4
2, 5, 2, 5, 2, 4 3, 3, 3, 5, 2,
4 7, 14, 2, 10, 7, 4
11,12,3 7, 14, 2
111
Algorithm StackTreeDesc
Result //b join //g
Stack
//b
//g

13,10, 4 14,11, 4
2, 5, 2, 5, 2, 4 3, 3, 3, 5, 2,
4 7, 14, 2, 10, 7, 4
11,12,3 7, 14, 2
7 ,14, 2, 13,10,4 11,12,3, 13,10,4
112
Algorithm StackTreeDesc
Result //b join //g
Stack
//b
//g

14,11, 4
2, 5, 2, 5, 2, 4 3, 3, 3, 5, 2,
4 7, 14, 2, 10, 7, 4
11,12,3 7, 14, 2
7, 14, 2, 13,10,4 11,12,3, 13,10,4
7, 14, 2, 14,11,4 11,12,3, 14,11,4
113
Algorithm StackTreeDesc
Result //b join //g
Stack
//b
//g

2, 5, 2, 5, 2, 4 3, 3, 3, 5, 2,
4 7, 14, 2, 10, 7, 4
7, 14, 2, 13,10,4 11,12,3, 13,10,4
7, 14, 2, 14,11,4 11,12,3, 14,11,4
114
Remarks on StackTreeDesc
Result //b join //g
  • I/O cost in O(mn) stack
  • Does not compute //b//g (duplicates !)
  • Returns result in descendent order
  • Similar algorithm for result in ancestor order


2, 5, 2, 5, 2, 4 3, 3, 3, 5, 2,
4 7, 14, 2, 10, 7, 4
7, 14, 2, 13,10,4 11,12,3, 13,10,4
7, 14, 2, 14,11,4 11,12,3, 14,11,4
115
Skipping structural joins
  • Avoid useless comparisons avoid scanning all the
    input

a1
a2
a3
d1
d2
d3
d5
d6
d4
Can be replaced with
This requires searching the descendent by pre
116
Structural joins summary BKS02
  • Linear complexity join algorithms based on region
    identifiers
  • Sub-linear variants exist, based on skipping
  • Algorithmic vs. disk I/O reduction
  • Based on ordered storage/indexes
    JLW03,MAB04,G02
  • Holistic twig joins reduce intermediary results
  • Cost of matching a twig pattern
  • Data access cost
  • Join cost (including intermediary results)
  • Sort cost Duplicate elimination cost

Depends on storage/index
Depends on algorithm
Depends on join order, query plan
117
Region-based identifiers and updates
  • Possible solutions
  • Leave empty intervals LM01
  • Use real numbers JKC02
  • Multi-versioning CTZ02
  • Append another "discriminating" label to pre,
    depth, post

16,2,21
item
id"item2"
17,3,15
18,3,20
name
description
...
19,4,16
18,4,16
text
parlist
"8 channel"
listitem
text
...
118
Other labeling scheme ORDPATHS NNP04
1
  • Implemented in MS SQL Server
  • Label each node by a sequence of integer numbers
  • Initial loading all odd numbers
  • This scheme records order depth
    parentancestor-descendent relationships.

1.1
1.3
1.5
1.3.1
1.3.3
Insert first sibling decrease leftmost code by
2 Insert last sibling increase rightmost code by
2 Insert between siblings use even numbers (does
not affect node depth)
119
Internal representation of ORDPATHs
1
  • L0 O0 L1 O1 ... Li Oi ... where
  • Oi is the i-th number
  • Li encodes Oi's length
  • Li's must satisfy prefix property
  • Partition -max, max in intervals of length 2i
  • Build prefix tree on the partitioning

1.1
1.3
1.5
1.3.1
1.3.3
0
-21,-6 -5,-2 -1,0 1,1
2,3 4,7 8,23
If Oi?-21,-6, Li000, -21 is 0000, ..., -6 is
1111 If Oi?-5,-2, Li001, -5 is 00, ...,
-2 is 11 If Oi?-1,0, Li01, -1 is 0, 0
is 1 If Oi?1,1, Li10, Oi is not
represented If Oi?2,3, Li110, 2 is
0, 3 is 1 If Oi?4,7, Li1110, 4 is 00,
7 is 11 If Oi?8,23, Li1111, 8 is 0000, 23
is 1111
0
1
0
1
0
1
0
1
0
Moderate length on real-life documents MBF03
1
1
120
(Re)constructing XML elements
  • Query /Auctions/item
  • This concerns
  • /Auctions/item/Auctions/item/name/Auctions/item
    /_at_id/Auctions/item/comment/Auctions/item/descrip
    tion/Auctions/item/description/text/Auctions/ite
    m/description/parlist/Auctions/item/description/p
    arlist/listitem.............
  • Simplification item, comment, name

i1 - - i2 - n1 i3 c1 - i4 c2 n2 i5 c3
n3 i5 c3 n4 i5 c4 n3 i5 c4 n4
i1 - i2 - i3 c1 i4 c2 i5 c3 i5 c4
121
(Re)constructing XML elements
  • /Auctions/item

i1 i2 i3 c1 i4 c2 i5 c3 i5 c4
i1 i2 n1 i3 i4 n2 i5 n3 i5 n4
item i1,..i5
name n1,...n4
122
(Re)constructing XML elements
  • /Auctions/item
  • Two alternatives
  • Run P twice
  • Materialize intermediary results
  • Result of fragmentation
  • Same problem for complex returned results

i1 i2 i3 c1 i4 c2 i5 c3 i5 c4
i1 i2 n1 i3 i4 n2 i5 n3 i5 n4
Plan P
name n1,...n4
123
Case study 1 XQueCNative storage system based
on path partitioning
124
(No Transcript)
125
Query processing on path-partitioned storage
  • Linear path queries / or //-separated labels
  • Match the query on the path summary -gt summary
    nodes
  • Merge the ID sequences of the corresponding
    summary nodes
  • No sort, duplicate-elimination needed
  • More complex XQuery queries
  • Infer possible paths for every variable
  • Prune some paths based on path summary
    constraint-based minimization query
    simplification
  • Scan and join just IDs on the proper paths

126
//samerica//keyword
127
Query processing based on a path summary
  • for n in //item/name...Paths for n
    /site/regions/africa/item/name,
    /site/regions/asia/item/name,...
    /site/regions/samerica/item/name

Tag partitioning
128
Query processing based on a path summary
  • for v in (///) t1 (///) t2 (///) t3 ...
    (///) tk...
  • Path partitioning 1 merge
  • Tag partitioning extra data reads (k-1) way
    join
  • for v1 in p1, v2 in v1/p2, ..., vk in
    (v1v2...vk-1)/pk...
  • Path partitioning k-way join (need to bind all
    vi's !)
  • Tag partitioning m-way join, mS(length(pi))
  • Minimization
  • Eliminate existential branches //itemname
  • Reduce path sets //listitem//parlist
  • Reasoning about duplicates and order
  • //anc//desc iff anc is not recursive, anc and
    desc order coincide

129
Case study 2 ULoadGeneric XML storage and
materialized view managerhttp//www-rocq.inria.f
r/gemo/XAM
Andrei Arion INRIA Futurs and Univ. Paris XI,
France Veronique Benzaken Univ. Paris XI,
France Ioana Manolescu INRIA Futurs, France Ravi
Vijay IIT Bombay, India
130
ULoad prototype demonstration (VLDB 2005)
  • XML materialized view management for XQuery
  • Materialized view creation
  • Data extraction loading in native/relational
    repository
  • Query answering over the materialized views
  • Materialized view extraction from XQuery queries
  • Guidance in choosing views and writing queries
    satisfiability / answerability tests
  • Constraint-based query answering

131
Formalism for describing complex XML materialized
views XAMs
132
Loading XAMsin a store
133
Querying a database of XAMs
134
Logical query plans over XAMs
135
Testing query satisfiability
136
Testing query coverage by the stored XAMs
137
Behind the scene structural constraints
138
Conclusions
139
Conclusions
  • (XML) persistent databases need
  • 1. storage model
  • 2. indexes,
  • 3. materialized views
  • 4. efficient execution techniques
  • Choosing a storage model difficult because XML is
    so far from what should go on disk
  • Verbose (labels) type-heterogeneous
  • Difficult to reconciliate order, conciseness, and
    updates
  • Fragmentation favors "for" clause and hurts
    "return" clause (selective access vs. fast
    serialization)
  • Efficient execution techniques rely on labeling
    schemes

140
References
141
References
  • ABC04 A.Arion, A.Bonifati, G.Costa,
    S.D'Aguanno, I.Manolescu and A.Pugliese.
    "Efficient Query Evaluation over Compressed XML
    Data", EDBT 2004
  • ABS99 S.Abiteboul, P.Buneman and D.Suciu. "Data
    on the web from relations to semistructured data
    and XML", Morgan Kauffmann, 1999.
  • AJK02 S.Al-Khalifa, H.V.Jagadish, N.Koudas,
    J.Patel, D.Srivastava and Y.Wu. "Structural
    Joins A Primitive for Efficient XML Query
    Pattern Matching", ICDE 2002
  • BCF03 V.Benzaken, G.Castagna and A.Frisch.
    "CDuce an XML-centric general-purpose language",
    ICFP 2003

142
References
  • BFR02 P.Bohannon, J.Freire, P.Roy and
    J.Simeon. "From XML Schema to Relations A
    Cost-Based Approach to XML Storage", ICDE 2002
  • BKS02 N.Bruno, N.Koudas and D.Srivastava.
    "Holistic Twig Joins Optimal XML Pattern
    Matching", SIGMOD 2002
  • CKM02 E.Cohen, H.Kaplan and T.Milo. "Labeling
    Dynamic XML", PODS 2002
  • CSF01 B.Cooper, N.Sample, M.Franklin,
    G.Hjaltason and M.Shadmon. "A Fast Index for
    Semistructured Data", VLDB 2001
  • CTZ01 S.Chien, V.Tsotras, C.Zaniolo and
    D.Zhang. "Efficient complex query support for
    multiversion XML documents", EDBT 2002
  • DFS99 A.Deutsch, M.Fernandez and D.Suciu.
    "Storing Semistructured Data with STORED", SIGMOD
    1999
  • DT03 A.Deutsch and V.Tannen. "MARS A System
    for Publishing XML from Mixed and Redundant
    Storage", VLDB 2003

143
References
  • FHK02 T.Fiebig, S.Helmer, C-C. Kanne,
    G.Moerkotte, J.Neumann, R.Schiele and T.Westman.
    "Anatomy of a native XML base management system",
    VLDB Journal 2002.
  • FHK03 D.Florescu, C.Hillery, D.Kossmann,
    P.Lucas, F.Riccardi, T.Westmann, M.Carey,
    A.Sundararajan and G.Agrawal. "The BEA/XQRL
    Streaming XQuery Processor", VLDB 2003
  • FK99 D.Florescu and D.Kossmann. "Storing and
    Querying XML Using an RDBMS. IEEE Data
    Engineering Bulletin, 1999.
  • FS98 M.Fernandez and D.Suciu. "Optimizing
    Regular Path Expressions Using Graph Schemas",
    ICDE 1998
  • FSC03 M.Fernandez and J.Simeon and B.Choi and
    A.Marian and G.Sur. "Implementing XQuery 1.0 The
    Galax Experience", VLDB 2003

144
References
  • G02 T.Grust. "Accelerating XPath location
    steps", VLDB 2002
  • GKP02 G.Gottlob, C.Koch and R.Pichler.
    "Efficient Algorithms for Processing XPath
    Queries", VLDB 2002
  • GW97 R.Goldman and J.Widom. "DataGuides
    Enabling Query Formulation and Optimization in
    Semistructured Databases", VLDB 1997
  • HBG03 A.Halverson, J.Burger, L.Galanis,
    A.Kini, R.Krishnamurthy, A.Rao, F.Tian, S.Viglas,
    Y.Wang, J. Naughton and D.DeWitt. "Mixed Mode XML
    Querey Processing", VLDB 2003
  • JLW01 H.Jiang, H.Lu, W.Wang and J.Yu. "Path
    Materialization Revisited An Efficient Storage
    Model for XML Data", AICE 2001

145
References
  • JKC02 H.V.Jagadish, S.Al-Khalifa, A.Chapman,
    L.Lakshmanan, A.Nierman, S.Paparizos, J.Patel,
    D.Srivastava, N.Wiwatwattana, Y.Wu and C.Yu.
    "TIMBER a Native XML database", VLDB Journal
    2002
  • JLW03 H.Jiang, H.Lu, W.Wang and B.Ooi.
    "XR-Tree Indexing XML Data for Efficient
    Structural Joins", ICDE 2003.
  • K03 C.Koch. "Efficient Processing of Expressive
    Node-Selecting Queries on XML Data in Secondary
    Storage A Tree Automata-based Approach", VLDB
    2003
  • KBN02 R.Kaushik, P.Bohannon, J.Naughton and
    H.Korth. "Covering Indexes for Branching Path
    Queries", SIGMOD 2002
  • LM01 Q. Li and B.Moon. "Indexing and querying
    XML data for regular path expressions", VLDB 2001
  • LMP02 B. Ludascher, P.Mukhopadhyay and
    Y.Pap
Write a Comment
User Comments (0)
About PowerShow.com