An XML Query Engine for Network-Bound Data

1
An XML Query Engine for Network-Bound Data
  • VLDB J. 11(4) 380-402 (2002)

2
What is it about?
  • Tukwila: an XQuery processing system
  • Evaluates XQuery expressions over streaming XML
    documents
  • It (claims that it) can be used as a data
    integration system: an XQuery may involve
    different XML documents from different sources

3
Storing and querying XML
  • How to implement an XPath or XQuery processor?
  • It depends on how you store XML documents:
  • shred XML documents into relations
  • store XML documents in an appropriate data model
    (e.g. B-trees)
  • store XML documents as raw files
  • The first two options allow faster query
    evaluation (indexing, storing auxiliary
    information)
  • With raw files, use a parser to read and query
    XML documents
4
Store XML docs as raw files
  • Use a DOM XML parser and then evaluate the query

[Figure: the raw XML file (<?xml version='1.0'?> ...) on secondary storage is parsed into a DOM tree in memory (store, booklist1, booklist2, book, magazine); the query /store//book is evaluated on the tree and returns the book elements as results.]
  • Why not?
  • Must first parse the whole document
  • Must keep the whole DOM tree in memory
  • Bad for web applications where the first results
    must be fetched quickly
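
The DOM approach on this slide can be sketched in a few lines of Python with the standard xml.dom.minidom module. The document content and the function name books_via_dom are assumptions made for illustration; only the /store//book query and the element names come from the slide.

```python
import xml.dom.minidom

# Hypothetical document, shaped like the store/booklist example on the slide.
DOC = """<?xml version="1.0"?>
<store>
  <booklist1><book>A</book><magazine>M1</magazine></booklist1>
  <booklist2><book>B</book><book>C</book><magazine>M2</magazine></booklist2>
</store>"""

def books_via_dom(text):
    # The whole document is parsed into an in-memory tree
    # before the first result can be produced.
    dom = xml.dom.minidom.parseString(text)
    root = dom.documentElement
    if root.tagName != "store":
        return []
    # /store//book: every book descendant of the root store element.
    return [b.firstChild.data for b in root.getElementsByTagName("book")]

print(books_via_dom(DOC))
```

Nothing is returned until parseString has consumed the entire input, which is the drawback the bullets above point out.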

5
Store XML docs as raw files
  • Or use a SAX XML parser

[Figure: the SAX parser reads the raw XML file (<?xml version='1.0'?> ...) from secondary storage and hands the query /store//book one element at a time ("Next Element"): store, booklist1, book, magazine, booklist2.]
  • Why?
  • No need to load the entire document into memory
  • Handles the XML document as a stream
  • But
  • It is more difficult to evaluate a query over a
    streamed XML document
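
A minimal sketch of the same /store//book query evaluated over SAX events, using Python's standard xml.sax module. The handler name, the callback, and the sample document are assumptions; the sketch also assumes that book elements are not nested inside each other.

```python
import xml.sax

class BookHandler(xml.sax.ContentHandler):
    """Evaluates /store//book over SAX events, keeping only the path of open elements."""

    def __init__(self, emit):
        super().__init__()
        self.emit = emit      # callback invoked once per matching book
        self.path = []        # names of the currently open elements
        self.text = None      # text buffer, set only while inside a matching book

    def startElement(self, name, attrs):
        self.path.append(name)
        # A book matches if it is a proper descendant of a root store element.
        if name == "book" and len(self.path) > 1 and self.path[0] == "store":
            self.text = []

    def characters(self, content):
        if self.text is not None:
            self.text.append(content)

    def endElement(self, name):
        if name == "book" and self.text is not None:
            self.emit("".join(self.text))   # the result is available immediately
            self.text = None
        self.path.pop()

def books_via_sax(data):
    results = []
    xml.sax.parseString(data, BookHandler(results.append))
    return results

# Hypothetical document with the element names used on the slide.
DOC = (b"<store><booklist1><book>A</book><magazine>M1</magazine></booklist1>"
       b"<booklist2><book>B</book><book>C</book></booklist2></store>")
print(books_via_sax(DOC))
```

The handler has to track the open-element path itself. That bookkeeping is the extra difficulty of querying a stream, and it grows quickly once the query has more than one path.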

6
Store XML docs as raw files
  • Or use a SAX XML parser

[Figure: the same setup, but the XML source sits across the network: the SAX parser reads its data stream and hands the query /store//book the next element.]
Despite that, it could be efficient for web
applications, data integration, etc.
That is what the Tukwila system is based on!
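
To illustrate why streaming suits network-bound data, here is a small sketch that feeds a SAX parser chunk by chunk, the way data would arrive from a socket. The chunks are a hard-coded stand-in for network reads, and BookCollector and results_per_chunk are hypothetical names; feed and close are the incremental-parsing methods of the parser returned by xml.sax.make_parser.

```python
import xml.sax

class BookCollector(xml.sax.ContentHandler):
    """Collects the text of every book element as soon as it closes."""

    def __init__(self):
        super().__init__()
        self.books = []
        self.buf = None

    def startElement(self, name, attrs):
        if name == "book":
            self.buf = []

    def characters(self, content):
        if self.buf is not None:
            self.buf.append(content)

    def endElement(self, name):
        if name == "book" and self.buf is not None:
            self.books.append("".join(self.buf))
            self.buf = None

def results_per_chunk(chunks):
    """Returns a snapshot of the results available after each chunk has arrived."""
    handler = BookCollector()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    snapshots = []
    for chunk in chunks:
        parser.feed(chunk)                  # parse only what has arrived so far
        snapshots.append(list(handler.books))
    parser.close()                          # signal the end of the stream
    return snapshots

# Stand-in for data arriving over the network in three pieces.
CHUNKS = [
    "<store><booklist1><book>A</book>",
    "<magazine>M</magazine></booklist1>",
    "<booklist2><book>B</book></booklist2></store>",
]
print(results_per_chunk(CHUNKS))
```

The first book is already available after the first chunk, long before the document is complete.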
7
Tukwila System - Architecture
[Figure: Tukwila architecture. Labels: XQuery, XQuery optimizer, simple XPath expressions, data stream, streams of tuples, XML Tree Manager, XML trees, streams of partly tagged tuples, XML Producer.]
8
The query optimizer
  • Input: an XQuery expression
  • Output: a query plan involving the available
    operators
  • XScan operators:
  • Sit at the leaves of the plan tree
  • Read an XML document as a stream
  • Evaluate one or more simple XPath expressions
    over the stream
  • Output a pipelined relation
  • Web-join operators for joining two XML documents
  • Other operators for projecting, joining,
    sorting, XML construction, etc.

9
The query optimizer
10
The XScan operator
  • Input: an XML data stream and several XPath
    expressions
  • Output: a pipelined relation (streaming tuples)
  • For each reference to a different XML document,
    an XScan operator is invoked. It evaluates
    several simple XPath expressions derived from
    the XQuery and pipelines a denormalized relation
    with one column for each binding variable

11
The XScan operator
12
The XScan Operator
  • For each XPath expression in the for clause of
    the XQuery statement:
  • a separate deterministic finite automaton is
    created
  • a separate relation is created with columns
    corresponding to the binding variables
  • The final state of each automaton corresponds to
    a new binding for the respective variable
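
A minimal sketch of such an automaton, assuming a path made only of child steps (no //, wildcards, or predicates). The transition-table layout, the DEAD state, and the event tuples are illustrative choices, not the paper's data structures. A stack of states is kept so that a closing tag returns the automaton to the state of the parent element.

```python
DEAD = -1   # state for elements that are off the path

def compile_path(path):
    """Turns a child-only path such as 'db/book' into a transition table."""
    steps = path.strip("/").split("/")
    table = {(i, name): i + 1 for i, name in enumerate(steps)}
    return table, len(steps)        # the final state is the number of steps

def run(table, final, events):
    """events: ('start', name) or ('end', name) pairs.
    Yields the index of every event at which the final state is reached."""
    stack = [0]                     # state 0 = nothing matched yet
    for i, (kind, name) in enumerate(events):
        if kind == "start":
            state = table.get((stack[-1], name), DEAD)
            stack.append(state)
            if state == final:      # a new binding for the variable
                yield i
        else:
            stack.pop()             # back to the parent's state

EVENTS = [
    ("start", "db"), ("start", "book"),
    ("start", "title"), ("end", "title"),
    ("start", "authors"),
    ("start", "author"), ("end", "author"),
    ("start", "author"), ("end", "author"),
    ("end", "authors"), ("end", "book"), ("end", "db"),
]
table, final = compile_path("db/book/authors/author")
print(list(run(table, final, EVENTS)))
```

Each of the two author start tags drives the automaton into its final state, so the variable receives two bindings.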

13
The XScan operator
[Figure: the deterministic automata, the XML Tree Manager, and, at first, one relation for each binding variable (b, a, d).]
14
The XScan operator
  • As the stream is parsed by the SAX parser, the
    automata change states
  • Once an automaton reaches a final state:
  • a new node is bound to the respective binding
    variable
  • a new tuple is inserted into the respective
    relation, with the value of the binding variable
    in the respective column
  • If the node is an attribute or a simple element,
    the value stored is the actual simple value of
    the node
  • If the node is a complex element, the value
    stored is a reference to the element in the XML
    tree, maintained in the XML Tree Manager in text
    form
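
The value-versus-reference rule can be sketched as follows. This is a simplification under stated assumptions: the class name Binder, the path tuples, and the sample document are hypothetical; bindings are recorded when an element closes (the slides bind when the final state is reached), so reference numbers differ from those in the animation; attributes and character escaping are ignored.

```python
import xml.sax

class Binder(xml.sax.ContentHandler):
    """Binds variables to nodes: simple nodes by value, complex nodes by reference."""

    def __init__(self, paths):
        super().__init__()
        self.paths = paths                        # variable -> path as a tuple of names
        self.relations = {v: [] for v in paths}   # one relation per binding variable
        self.tree_manager = {}                    # reference id -> element kept as text
        self.frames = []                          # one frame per open element

    def startElement(self, name, attrs):
        if self.frames:
            self.frames[-1]["complex"] = True     # the parent has a child element
        self.frames.append({"name": name, "parts": [], "complex": False})

    def characters(self, content):
        if self.frames:
            self.frames[-1]["parts"].append(content)

    def endElement(self, name):
        path = tuple(f["name"] for f in self.frames)
        frame = self.frames.pop()
        body = "".join(frame["parts"])
        if self.frames:                           # hand the serialized text to the parent
            self.frames[-1]["parts"].append("<%s>%s</%s>" % (name, body, name))
        for var, var_path in self.paths.items():
            if var_path == path:
                if frame["complex"]:
                    ref = len(self.tree_manager) + 1
                    self.tree_manager[ref] = body     # keep the subtree in text form
                    self.relations[var].append(ref)   # store only the reference
                else:
                    self.relations[var].append(body)  # store the simple value itself

def bind(data, paths):
    handler = Binder(paths)
    xml.sax.parseString(data, handler)
    return handler

# Hypothetical document; the details content is invented for illustration.
DOC = (b"<db><book><title>XQuery from the Experts</title>"
       b"<authors><author>Don Chamberlin</author><author>Michael Kay</author></authors>"
       b"<details><note>n/a</note></details></book></db>")
PATHS = {"b": ("db", "book"),
         "a": ("db", "book", "authors", "author"),
         "d": ("db", "book", "details")}
h = bind(DOC, PATHS)
print(h.relations)
```

Here the a relation holds author names directly, while b and d hold integer references into the tree manager.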

15
The XScan operator
[Figure: animation step, current element: db. The relations for b, a and d are still empty. XML Tree Manager: db]
16
The XScan operator
[Figure: animation step, current element: book. Relation b receives the tuple 1. XML Tree Manager: db book]
17
The XScan operator
[Figure: animation step, current element: authors. Relation b holds 1. XML Tree Manager: db book title "XQuery from the Experts" authors]
18
The XScan operator
[Figure: animation step, inside authors. Relation b holds 1; relation a receives Don Chamberlin. XML Tree Manager: db book title "XQuery from the Experts" authors author "Don Chamberlin"]
19
The XScan operator
[Figure: animation step, inside authors. Relation b holds 1; relation a holds Don Chamberlin, Michael Kay, Denise Draper. XML Tree Manager: db book title "XQuery from the Experts" authors author "Don Chamberlin" author "Michael Kay" author "Denise Draper"]
20
The XScan operator
[Figure: animation step, current element: details. Relation b holds 1; relation a holds Don Chamberlin, Michael Kay, Denise Draper; relation d receives 2.]
21
The XScan operator
  • The output of the XScan operator is the join of
    the relations corresponding to the automata
  • The join takes place gradually
  • Whenever a new tuple is to be inserted into the
    root relation:
  • all the current relations are joined
  • the result relation is pipelined
  • all the tuples in the current relations are
    deleted
  • the new tuple is inserted into the root
    relation
  • and so on
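
The gradual join can be sketched as a Python generator. This assumes every non-root variable is bound directly under the root variable, so the join of the current relations reduces to a cross product; the function names and the final flush at end of stream are assumptions, not something the slides spell out.

```python
from itertools import product

def flush(relations, variables):
    """Joins the current relations (a cross product here) and empties them."""
    rows = list(product(*(relations[v] for v in variables)))
    for v in variables:
        relations[v].clear()
    return rows

def gradual_join(bindings, variables, root):
    """bindings: (variable, value) pairs in document order.
    Yields output tuples with one column per variable."""
    relations = {v: [] for v in variables}
    for var, value in bindings:
        if var == root:
            # A new root binding is about to arrive: join, pipeline, delete.
            yield from flush(relations, variables)
        relations[var].append(value)
    # Assumption: the last group is flushed when the stream ends.
    yield from flush(relations, variables)

BINDINGS = [("b", 1), ("a", "Don Chamberlin"), ("a", "Michael Kay"),
            ("a", "Denise Draper"), ("d", 2)]
for row in gradual_join(BINDINGS, ["a", "b", "d"], root="b"):
    print(row)
```

Because the relations are emptied after every flush, memory use is bounded by the bindings under a single root node rather than by the whole document.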

22
The XScan operator
[Figure: at first there is one relation for each binding variable; these are later joined into a single relation:]
a                b    d
Don Chamberlin   1    2
Michael Kay      1    2
Denise Draper    1    2
23
The XScan operator - problems
  • Only simple XPath expressions are handled
  • Selection predicates: only simple predicates over
    values, e.g. a/b[@d=4], but not a/b[h/j/t=5]
  • Only forward paths
  • The output relation may be huge due to
    denormalization
  • The entire XML document is also kept as text (in
    a slightly different format)

24
The web-join operator
  • In an XQuery expression, a join between different
    XML documents may occur

25
The web-join operator
  • One way to do this is to use an XScan operator to
    read from each source and then join the result
    relations using a relational join algorithm
  • What if the join with the second source is highly
    selective?
  • What if a source requires input values before it
    returns an answer? (e.g. an online bookseller may
    require an author or title)

26
The web-join operator
  • The web-join operator is inspired by the
    dependent join used in distributed relational
    query processing
  • The main idea:
  • read data from one source (with an XScan
    operator)
  • send the results to the other source (via HTTP
    POST or SOAP)
  • read the answer (again with an XScan operator)
  • join the two relations
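
The four steps above amount to a dependent join, sketched here without any real network access. The fetch argument is a hypothetical stand-in for "send an HTTP POST or SOAP request and read the answer with XScan"; the per-value cache is a design choice of this sketch, not something the slides describe.

```python
def web_join(left_rows, key, fetch):
    """left_rows: dicts read from the first source.
    fetch(value): returns the second source's rows (dicts) for that input value."""
    cache = {}
    for row in left_rows:
        value = row[key]
        if value not in cache:
            cache[value] = list(fetch(value))   # one request per distinct value
        for match in cache[value]:
            yield {**row, **match}              # joined tuple

# Hypothetical second source: a bookseller that requires an author as input.
CATALOG = {"Kay": [{"title": "T1"}, {"title": "T2"}]}
requests = []

def fake_bookseller(author):
    requests.append(author)
    return CATALOG.get(author, [])

left = [{"author": "Kay"}, {"author": "Draper"}, {"author": "Kay"}]
for joined in web_join(left, "author", fake_bookseller):
    print(joined)
```

Only values that actually occur in the first source are sent to the second one, which helps when the join is highly selective or when the source cannot be read without input values.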

27
The web-join operator
  • The big question: how to send the results to a
    source?
  • The paper doesn't explain!
  • It claims that it uses HTTP POST or SOAP
    requests, but this requires:
  • knowing the query capabilities of the sources
  • XPath? Simple sequences of values?
  • knowing their schema

28
XML Construction
  • In XQuery, the return clause builds a tree and
    inserts references to binding variables within
    this tree
  • Special operators (output, element, result) add
    structural information to the final binding
    tuples and finally output the result in XML form

29
XML Construction
XQuery expression:
for ... return <book> <name> $lst $fst </name>
<publisher> $p </publisher> </book>
Binding tuple: lst, fst, p
[Figure: the binding tuple extended with structural entries, labeled fst, lst, name/2, p, publisher/1, book/2.]
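
One possible reading of the name/2, publisher/1 and book/2 labels is that each element operator wraps the given number of preceding items in a tag. The sketch below follows that reading with a small stack machine. This interpretation, the function name construct, and the sample values are all assumptions, and character escaping is ignored.

```python
def construct(program, binding):
    """program: variable names and 'tag/n' element operators, in postfix order.
    binding: one binding tuple as a dict from variable name to value."""
    stack = []
    for op in program:
        if "/" in op:
            tag, count = op.split("/")
            count = int(count)
            children = stack[-count:]       # the items this element wraps
            del stack[-count:]
            stack.append("<%s>%s</%s>" % (tag, " ".join(children), tag))
        else:
            stack.append(str(binding[op]))  # plain reference to a bound variable
    return "".join(stack)

PROGRAM = ["lst", "fst", "name/2", "p", "publisher/1", "book/2"]
TUPLE = {"lst": "Kay", "fst": "Michael", "p": "Some Press"}   # hypothetical values
print(construct(PROGRAM, TUPLE))
```

Running the program once per binding tuple produces one book element per tuple, which matches the idea of adding structural information to the tuples and serializing them at the end.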
30
Performance
31
Conclusions
  • Querying streaming data is useful in web
    applications
  • Automata may be used for evaluating XML queries
    over streaming XML data
  • A data integration system should forward queries
    to the sources and then combine the results
  • The Tukwila system processes and evaluates queries
    in the mediator
  • The sources send entire XML documents to the
    mediator
  • The web-join description is quite vague and not
    sufficient