Title: A Query Algebra for Fragmented XML Stream Data
1 A Query Algebra for Fragmented XML Stream Data
- Sujoe Bose
- Leonidas Fegaras
- David Levine
- Vamsi Chaluvadi
- University of Texas at Arlington
2 Processing Streamed XML Data
- Most web servers are pull-based
- A client submits a request, and the server returns the requested data
- This does not scale well to large numbers of clients or large query results
- Alternative method: push-based dissemination
- The server broadcasts or multicasts data in a continuous stream
- The client connects to multiple streams and evaluates queries locally
- No handshaking, no error correction
- All processing is done at the client side
- The only tasks performed by the server are slicing, scheduling, and broadcasting data
- Critical data may be repeated more often than non-critical data
- Invalid data may be revoked
- New updates may be broadcast as soon as they become available
3 A Framework for Processing XML Streams
- The server slices an XML data source into XML fragments. Each fragment:
- is a filler that fills a hole
- may contain holes, which can be filled by other fragments
- is wrapped with control information, such as its unique hole ID, the path that reaches this fragment, etc.
- The client opens connections to streams and evaluates XQueries against these streams
- For large streams, it is a bad idea to reconstruct the streamed data in the client's memory
- The client needs to process fragments as soon as they become available from the server
- There are blocking operators that require unbounded memory:
- Sorting
- Joins between two streams or self-joins
- Group-by with aggregation
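The slicing step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `Fragment` and `slice_document`, the idea of cutting at a caller-supplied set of tags, the plain `hole` tag without a namespace prefix, and the sequential ID assignment are all assumptions made for the example, and the timestamp/structure ID (`tsid`) is left out.

```python
import itertools
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Fragment:
    filler_id: int       # ID of the hole that this fragment fills
    path: str            # path from the document root to this fragment
    content: ET.Element  # payload; may itself contain <hole id="..."/> elements


def slice_document(root, cut_tags):
    """Cut the tree at every element whose tag is in cut_tags (hypothetical policy)."""
    ids = itertools.count(1)
    fragments = []

    def copy_with_holes(elem, path):
        # Copy elem; each child is either copied recursively or replaced by a hole.
        clone = ET.Element(elem.tag, dict(elem.attrib))
        clone.text = elem.text
        for child in elem:
            child_path = f"{path}/{child.tag}"
            if child.tag in cut_tags:
                hole_id = next(ids)
                hole = ET.SubElement(clone, "hole", {"id": str(hole_id)})
                hole.tail = child.tail
                emit(child, hole_id, child_path)
            else:
                sub = copy_with_holes(child, child_path)
                sub.tail = child.tail
                clone.append(sub)
        return clone

    def emit(elem, filler_id, path):
        fragments.append(Fragment(filler_id, path, copy_with_holes(elem, path)))

    emit(root, 0, "/" + root.tag)  # the root fragment fills an implicit hole 0
    return sorted(fragments, key=lambda f: f.filler_id)


# Illustrative input only.
doc = ET.fromstring(
    "<commodities><vendor><name>Wal-Mart</name><items>"
    "<item><name>PDA</name></item><item><name>Calculator</name></item>"
    "</items></vendor></commodities>"
)
fragments = slice_document(doc, cut_tags={"item"})
for frag in fragments:
    print(frag.filler_id, frag.path, ET.tostring(frag.content, encoding="unicode"))
```

Each fragment carries its own path, so a client could in principle match a path expression against a fragment without waiting for the rest of the document.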
4 The Fragmented Hole-Filler Model
<commodities>
  <vendor>
    <name> Wal-Mart </name>
    <items>
      <stream:hole id="10" tsid="5"/>
      <stream:hole id="20" tsid="5"/>
      ...
  </vendor>
  ...
</commodities>

<stream:filler id="10" tsid="5">
  <item>
    <name> PDA </name>
    <make> HP </make>
    <model> PalmPilot </model>
    <price currency="USD">315.25</price>
  </item>
</stream:filler>

<stream:filler id="20" tsid="5">
  <item>
    <name> Calculator </name>
    <make> Casio </make>
    <model> FX-100 </model>
    <price currency="USD">50.25</price>
  </item>
</stream:filler>
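How hole IDs tie fillers to their context can be sketched with a small client-side routine. This is an illustration under stated assumptions, not the method proposed in the talk (which argues against materializing large streams in memory): the namespace URI `urn:example:stream` bound to the `stream:` prefix, and the helper names `parse_filler` and `fill_holes`, are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = "urn:example:stream"  # hypothetical namespace URI for the stream: prefix
HOLE = f"{{{NS}}}hole"
FILLER = f"{{{NS}}}filler"


def parse_filler(text):
    """Return (id, payload elements) for one <stream:filler> fragment."""
    filler = ET.fromstring(text)
    if filler.tag != FILLER:
        raise ValueError("not a filler fragment")
    return filler.get("id"), list(filler)


def fill_holes(elem, fillers):
    """Replace, in place, every hole under elem whose filler has arrived."""
    children = []
    for child in list(elem):
        if child.tag == HOLE and child.get("id") in fillers:
            children.extend(fillers[child.get("id")])
        else:
            children.append(child)  # holes without a filler stay as they are
    elem[:] = children
    for child in elem:
        fill_holes(child, fillers)  # fillers may contain further holes


root = ET.fromstring(
    f'<commodities xmlns:stream="{NS}"><vendor><name>Wal-Mart</name><items>'
    '<stream:hole id="10" tsid="5"/><stream:hole id="20" tsid="5"/>'
    "</items></vendor></commodities>"
)
arrived = [
    f'<stream:filler xmlns:stream="{NS}" id="10" tsid="5"><item><name>PDA</name>'
    '<price currency="USD">315.25</price></item></stream:filler>',
    f'<stream:filler xmlns:stream="{NS}" id="20" tsid="5"><item><name>Calculator</name>'
    '<price currency="USD">50.25</price></item></stream:filler>',
]
fillers = dict(parse_filler(text) for text in arrived)
fill_holes(root, fillers)
names = [item.findtext("name") for item in root.iter("item")]
print(names)
```

A hole whose filler has not yet arrived is simply left in the tree, which mirrors the fact that fragments may be broadcast in any order.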
5 An Algebra for Stored XML Data
- Based on the nested-relational algebra
- $?_{v}(T)$: access the XML data source T using v
- $\sigma_{pred}(X)$: select the fragments from X that satisfy pred
- $\pi_{v_1,\ldots,v_n}(X)$: project
- $X \cup Y$: merge
- $X \bowtie_{pred} Y$: join
- $\mu_{pred}^{v,path}(X)$: unnest (retrieve descendants of elements)
- $\Delta_{pred}^{\oplus,h}(X)$: apply h and reduce by $\oplus$
- $\Gamma_{gs,pred}^{v,\oplus,h}(X)$: group by gs, apply h to each group, and reduce each group by $\oplus$
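6 Semantics
- $?_{v}(T) = \{\, \langle v = T \rangle \,\}$
- $\sigma_{pred}(X) = \{\, t \mid t \in X,\ pred(t) \,\}$
- $\pi_{v_1,\ldots,v_n}(X) = \{\, \langle v_1 = t.v_1, \ldots, v_n = t.v_n \rangle \mid t \in X \,\}$
- $X \cup Y = X \mathbin{+\!\!+} Y$
- $X \bowtie_{pred} Y = \{\, t_x \circ t_y \mid t_x \in X,\ t_y \in Y,\ pred(t_x, t_y) \,\}$
- $\mu_{pred}^{v,path}(X) = \{\, t \circ \langle v = w \rangle \mid t \in X,\ w \in PATH(t, path),\ pred(t, w) \,\}$
- $\Delta_{pred}^{\oplus,h}(X) = \oplus / \{\, h(t) \mid t \in X,\ pred(t) \,\}$
- $\Gamma_{gs,pred}^{v,\oplus,h}(X)$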
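The operator semantics can be mirrored by a few list comprehensions. The following is a sketch under assumptions, not the authors' implementation: tuples are modelled as Python dicts, streams as lists, PATH as a caller-supplied function, and all function names (`source`, `select`, `project`, `merge`, `join`, `unnest`, `reduce_`) are chosen for the example. Group-by is omitted because its definition is not spelled out above.

```python
from functools import reduce as fold


def source(v, T):
    # Access data source T, binding it to variable v.
    return [{v: T}]


def select(pred, X):
    return [t for t in X if pred(t)]


def project(names, X):
    return [{v: t[v] for v in names} for t in X]


def merge(X, Y):
    # Merge modelled as list concatenation (an assumption of this sketch).
    return list(X) + list(Y)


def join(pred, X, Y):
    # Concatenate every pair of tuples that satisfies pred.
    return [{**tx, **ty} for tx in X for ty in Y if pred(tx, ty)]


def unnest(v, path, X, pred=lambda t, w: True):
    # path(t) plays the role of PATH(t, path): it returns the reachable values.
    return [{**t, v: w} for t in X for w in path(t) if pred(t, w)]


def reduce_(acc, h, X, zero, pred=lambda t: True):
    # Apply h to each qualifying tuple and fold the results with acc.
    return fold(acc, (h(t) for t in X if pred(t)), zero)


# Illustrative data only.
books = source("v", [{"title": "A", "year": 1994}, {"title": "B", "year": 1990}])
tuples = unnest("b", lambda t: t["v"], books)
recent = select(lambda t: t["b"]["year"] > 1991, tuples)
titles = reduce_(lambda xs, x: xs + [x], lambda t: t["b"]["title"], recent, [])
print(titles)
```

Because every operator here consumes and produces a list, the sketch evaluates eagerly; a streaming client would more likely use generators so that tuples flow through the plan one at a time.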
7 Example 1
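8 Example 1 (cont.)
Algebraic plan, from the root operator down to the data source:
- $\Delta^{\cup,\ \text{element(book, \$b/title)}}$
- $\sigma_{\text{\$b/publisher = "Addison-Wesley" and \$b/@year > 1991}}$
- $\mu^{\text{\$b},\ \text{\$v/bib/book}}$
- $?_{\text{\$v}}(\text{document("http://www.bn.com")})$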
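The plan above can be traced bottom-up on a small in-memory document. This is only a sketch: the sample XML, the function name `run_plan`, and the use of Python's `xml.etree.ElementTree` in place of a streamed source are assumptions made for illustration, and the plan is evaluated eagerly rather than fragment by fragment.

```python
import xml.etree.ElementTree as ET

# Illustrative sample document; titles and publishers are placeholders.
SAMPLE = """<bib>
  <book year="1994"><title>Title A</title><publisher>Addison-Wesley</publisher></book>
  <book year="1990"><title>Title B</title><publisher>Addison-Wesley</publisher></book>
  <book year="2000"><title>Title C</title><publisher>Other Press</publisher></book>
</bib>"""


def run_plan(xml_text):
    # Source access: bind v to the document (here, its root element <bib>).
    stream = [{"v": ET.fromstring(xml_text)}]
    # Unnest: bind b to every book reached by v/bib/book.
    stream = [{**t, "b": b} for t in stream for b in t["v"].findall("book")]
    # Select: publisher = "Addison-Wesley" and @year > 1991.
    stream = [
        t for t in stream
        if t["b"].findtext("publisher") == "Addison-Wesley"
        and int(t["b"].get("year")) > 1991
    ]
    # Reduce: build element(book, b/title) for each tuple and collect the results.
    result = []
    for t in stream:
        book = ET.Element("book")
        book.extend(t["b"].findall("title"))
        result.append(book)
    return result


books = run_plan(SAMPLE)
for book in books:
    print(ET.tostring(book, encoding="unicode"))
```

Each step consumes the tuple list produced by the step below it in the plan, which is the same order in which the operators appear when the plan is read from the data source upward.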
9 Example 2
for $u in document("users.xml")//user_tuple
return <user> { $u/name }
         { for $b in document("bids.xml")//bid_tuple[userid = $u/userid]/itemno,
               $i in document("items.xml")//item_tuple[itemno = $b]
           return <bid> { $i/description/text() } </bid>
           sortby(.) }
       </user>
sortby(name)
Algebraic plan (operator annotations):
- sort, elem(bid, $i/description/text())
- $i/itemno = $b
- sort($u/name), elem(user, $u/name