Title: Introduction To XML Algebra
1Introduction To XML Algebra
- Wan Liu
- Bintou Kane
- Advanced Database
- Instructor Elka
- 2/11/2002
2Outline
- Reasons for XML algebra
- Niagara algebra
- AT&T Algebra
3Data Model and Design
- We need a clear framework to design a database.
- A data model is like choosing the data structures appropriate to a program's needs. It is an abstract type system.
- A relational database is implemented with tables; XML is a newer format for information integration.
4Why XML Algebra?
- It is common to translate a query language into an algebra.
- First, the algebra is used to give a semantics for the query language.
- Second, the algebra is used to support query optimization.
5XML Algebra History
- Lore Algebra (August 1999)
- -- Stanford University
- IBM Algebra (September 1999)
- -- Oracle, IBM, Microsoft Corp.
- YAT Algebra (May 2000)
- AT&T Algebra (June 2000)
- -- AT&T Bell Labs
- Niagara Algebra (2001)
- -- University of Wisconsin-Madison
6NIAGARA
- Title: Following the Paths of XML Data: An Algebraic Framework for XML Query Evaluation
- By Leonidas Galanis, Efstratios Viglas, David J. DeWitt, Jeffrey F. Naughton, and David Maier.
7Outline
- Concepts of Niagara Algebra
- Operations
- Optimization
8Goals of Niagara Algebra
- Be independent of schema information
- Query on both structure and content
- Generate simple, flexible, yet powerful algebraic expressions
- Allow reuse of traditional optimization techniques
9Example XML Source Documents
Invoice.xml
<Invoice_Document>
  <invoice No="1">
    <account_number>2</account_number>
    <carrier>ATT</carrier>
    <total>0.25</total>
  </invoice>
  <invoice>
    <account_number>1</account_number>
    <carrier>Sprint</carrier>
    <total>1.20</total>
  </invoice>
  <invoice>
    <account_number>1</account_number>
    <carrier>ATT</carrier>
    <total>0.75</total>
  </invoice>
</Invoice_Document>
Customer.xml
<Customer_Document>
  <customer>
    <account>1</account>
    <name>Tom</name>
  </customer>
  <customer>
    <account>2</account>
    <name>George</name>
  </customer>
</Customer_Document>
10XML Data Model and Tree Graph
<Invoice_Document>
  <invoice>
    <number>2</number>
    <carrier>ATT</carrier>
    <total>0.25</total>
  </invoice>
  <invoice>
    <number>1</number>
    <carrier>Sprint</carrier>
    <total>1.20</total>
  </invoice>
</Invoice_Document>
[Figure: tree graph of the document. The root Invoice_Document has two Invoice children; each Invoice has number, carrier, and total children, with leaf values (2, ATT, 0.25) and (1, Sprint, 1.20).]
Ordered tree graph, semi-structured data
11XML Data Model [GVDNM01]
- Collection of bags of vertices.
- Vertices in a bag have no order.
- Example bag: [Root invoice.xml, invoice, invoice.account_number]
- invoice: <invoice> invoice-element-content </invoice>
- invoice.account_number: <account_number> element-content </account_number>
12Data Model
- Bag elements are reachable by path expressions.
- A path expression consists of two parts:
- An entry point
- A relative forward part
- Example: account_number in invoice (relative forward part account_number, entry point invoice)
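The bag model can be sketched in a few lines of Python. This is only an illustration, not Niagara's implementation: it assumes a bag can be modelled as a dict keyed by entry-point name (so the vertices carry no order), and it uses the standard-library `xml.etree.ElementTree` parser on a shortened copy of the invoice document.

```python
import xml.etree.ElementTree as ET

# Two invoices from the example document, inlined so the sketch runs on its own.
INVOICE_XML = """
<Invoice_Document>
  <invoice><account_number>2</account_number><carrier>ATT</carrier></invoice>
  <invoice><account_number>1</account_number><carrier>Sprint</carrier></invoice>
</Invoice_Document>
"""

root = ET.fromstring(INVOICE_XML)

# One bag per (invoice, account_number) pair. The dict keys play the role of
# entry points; "invoice.account_number" is the vertex reached from "invoice"
# by the relative forward part "account_number".
bags = [
    {"invoice": invoice, "invoice.account_number": account}
    for invoice in root.findall("invoice")
    for account in invoice.findall("account_number")
]

account_numbers = sorted(bag["invoice.account_number"].text for bag in bags)
print(account_numbers)
```

Looking a vertex up by its entry-point name, rather than by position, is what makes the order of vertices inside a bag irrelevant.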
13Operators
- Source S, Follow φ, Select σ, Join ⋈, Rename ρ, Expose ε, Vertex ν, Group γ, Union ∪, Intersection ∩, Difference -, Cartesian Product ×.
14 Source Operator S
- Input: a list of documents
- Output: a collection of singleton bags
- Examples:
- S(*): all known XML documents
- S(invoice.xml): all XML documents whose filename matches invoice.xml
- S(*, schema.dtd): all known XML documents that conform to schema.dtd
15Follow operator φ
- Input: a path expression in entry point notation
- Functionality: extracts the vertices reachable by the path expression
- Output: a new bag that consists of the extracted vertex plus all the contents of the original bag (in the case of unnesting follow)
16Follow operator (Example)
- Input bag: [Root invoice.xml, invoice]
- invoice: <invoice> invoice-element-content </invoice>
- Unnesting Follow: φ(carrier in invoice)
- Output bag: [Root invoice.xml, invoice, invoice.carrier]
- invoice.carrier: <carrier> carrier-element-content </carrier>
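A minimal sketch of unnesting follow, assuming bags are Python dicts keyed by entry-point name. The function name `follow` and its parameters are illustrative choices, not part of the Niagara system.

```python
import xml.etree.ElementTree as ET

def follow(bags, entry_point, step):
    """Unnesting follow: emit one bag per vertex reached by `step` from `entry_point`."""
    out = []
    for bag in bags:
        for child in bag[entry_point].findall(step):
            new_bag = dict(bag)                       # keep the original bag's contents
            new_bag[f"{entry_point}.{step}"] = child  # add the extracted vertex
            out.append(new_bag)
    return out

doc = ET.fromstring(
    "<Invoice_Document>"
    "<invoice><carrier>ATT</carrier></invoice>"
    "<invoice><carrier>Sprint</carrier></invoice>"
    "<invoice/>"  # an invoice without a carrier produces no output bag
    "</Invoice_Document>"
)

invoices = [{"invoice": inv} for inv in doc.findall("invoice")]
with_carrier = follow(invoices, "invoice", "carrier")
carriers = [bag["invoice.carrier"].text for bag in with_carrier]
print(carriers)
```

Because a bag is emitted only when the step matches, an invoice with no carrier silently drops out of the result.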
17Select operator σ
- Input: a set of bags
- Functionality: filters the bags of a collection using a predicate
- Output: the set of bags that satisfy the predicate
- Predicate: logical operators (∧, ∨, ¬) or simple qualifications (>, <, ≥, ≤, =, ≠)
18Select operator (Example)
- Input bags: [Root invoice.xml, invoice], [Root invoice.xml, invoice]
- invoice (in each bag): <invoice> invoice-element-content </invoice>
- Select: σ(invoice.carrier = Sprint)
- Output bag: [Root invoice.xml, invoice], keeping only the invoice that satisfies the predicate
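The select operator can be sketched as a filter over bags. For brevity this assumes each bag already holds plain values rather than XML vertices; the `select` helper is a hypothetical name, and the data is the three invoices from the example document.

```python
def select(bags, predicate):
    """Keep only the bags for which the predicate holds."""
    return [bag for bag in bags if predicate(bag)]

# The three example invoices, flattened to plain values (an assumption of this sketch).
invoices = [
    {"invoice.account_number": 2, "invoice.carrier": "ATT", "invoice.total": 0.25},
    {"invoice.account_number": 1, "invoice.carrier": "Sprint", "invoice.total": 1.20},
    {"invoice.account_number": 1, "invoice.carrier": "ATT", "invoice.total": 0.75},
]

# A simple qualification: carrier = Sprint
sprint = select(invoices, lambda b: b["invoice.carrier"] == "Sprint")

# Simple qualifications combined with a logical operator: carrier = ATT AND total > 0.5
big_att = select(
    invoices,
    lambda b: b["invoice.carrier"] == "ATT" and b["invoice.total"] > 0.5,
)
```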
19Join operator ⋈
- Input: two collections of bags
- Functionality: joins the two collections based on a predicate
- Output: the concatenation of the pairs of bags that satisfy the predicate
20Join operator (Example)
- Input bags: [Root invoice.xml, invoice] and [Root customer.xml, customer]
- invoice: <invoice> invoice-element-content </invoice>
- customer: <customer> customer-element-content </customer>
- Join predicate: account_number in invoice = number in customer
- Output bag: [Root invoice.xml, invoice, Root customer.xml, customer]
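As a rough sketch, the join can be written as a nested loop that concatenates every pair of bags satisfying the predicate. Bags are again assumed to be dicts of plain values, and `join` is an illustrative name; a real engine would likely use a more efficient join algorithm.

```python
def join(left, right, predicate):
    """Concatenate each pair of bags (one from each side) that satisfies the predicate."""
    return [{**l, **r} for l in left for r in right if predicate(l, r)]

# Values taken from the example Invoice.xml and Customer.xml documents.
invoices = [
    {"invoice.account_number": 2, "invoice.total": 0.25},
    {"invoice.account_number": 1, "invoice.total": 1.20},
    {"invoice.account_number": 1, "invoice.total": 0.75},
]
customers = [
    {"customer.account": 1, "customer.name": "Tom"},
    {"customer.account": 2, "customer.name": "George"},
]

joined = join(
    invoices,
    customers,
    lambda i, c: i["invoice.account_number"] == c["customer.account"],
)
names = [bag["customer.name"] for bag in joined]
print(names)
```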
21Expose operator ε
- Input: a list of path expressions of the vertices to be exposed
- Output: a set of bags that contain the vertices in the parameter list, in the same order
22Expose operator (Example)
- Input bag: [Root invoice.xml, invoice, invoice.carrier, invoice.bill_period]
- Expose: ε(bill_period, carrier)
- Output bag: [Root invoice.xml, invoice.bill_period, invoice.carrier]
- invoice.bill_period: <bill_period> bill_period-element-content </bill_period>
- invoice.carrier: <carrier> carrier-element-content </carrier>
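Expose behaves like an ordered projection. The sketch below assumes dict-based bags with plain values and relies on Python dicts preserving insertion order; `expose` is a hypothetical helper. It uses total and account_number instead of bill_period, since the example documents contain no bill_period values.

```python
def expose(bags, names):
    """Keep only the named vertices of each bag, in the order given by `names`."""
    return [{name: bag[name] for name in names} for bag in bags]

bags = [
    {"invoice.account_number": 1, "invoice.carrier": "Sprint", "invoice.total": 1.20},
]

# The parameter list fixes both which vertices appear and their order.
exposed = expose(bags, ["invoice.total", "invoice.account_number"])
print(list(exposed[0].items()))
```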
23Vertex operator ν
- Creates the actual XML vertex that will encompass everything created by an expose operator
- Example:
ν(Customer_invoice) ε(ν(account) invoice.account_number, ν(inv_total) invoice.total)
24Other operators
- Group γ is used for arbitrary grouping of elements based on their values
- Aggregate functions (e.g. average) can be used with the group operator
- Rename ρ changes the entry point annotation of the elements of a bag
- Example: ρ(invoice.bill_period, date)
25Example XML Source Documents
Invoice.xml
<Invoice_Document>
  <invoice>
    <account_number>2</account_number>
    <carrier>ATT</carrier>
    <total>0.25</total>
  </invoice>
  <invoice>
    <account_number>1</account_number>
    <carrier>Sprint</carrier>
    <total>1.20</total>
  </invoice>
  <invoice>
    <account_number>1</account_number>
    <total>0.75</total>
  </invoice>
  <auditor>maria</auditor>
</Invoice_Document>
Customer.xml
<Customer_Document>
  <customer>
    <account>1</account>
    <name>Tom</name>
  </customer>
  <customer>
    <account>2</account>
    <name>George</name>
  </customer>
</Customer_Document>
26Xquery Example
- List the account number, customer name, and invoice total for all invoices that have carrier "Sprint".
FOR $i in (invoices.xml)//invoice,
    $c in (customers.xml)//customer
WHERE $i/carrier = "Sprint" and
      $i/account_number = $c/account
RETURN
  <Sprint_invoices>
    $i/account_number,
    $c/name,
    $i/total
  </Sprint_invoices>
27Example Xquery output
<Sprint_invoices>
  <account_number>1</account_number>
  <name>Tom</name>
  <total>1.20</total>
</Sprint_invoices>
28Algebra Tree Execution
Algebra tree, evaluated bottom-up; each operator is shown with its output:
Expose (.account_number, .name, .total) -> account_number, name, total
  Join (.invoice.account_number = .customer.account) -> invoice(2) customer(1)
    Select (carrier = Sprint) -> invoice(2)
      Follow (.invoice) -> invoice(1), invoice(2), invoice(3)
        Source (Invoices.xml)
    Follow (.customer) -> customer(1), customer(2)
      Source (customers.xml)
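The whole plan can be traced end to end with a short Python sketch. It is an approximation under stated assumptions: the two documents are inlined as strings, each operator is written as a list comprehension over dict-based bags, and the standard-library `xml.etree.ElementTree` stands in for the engine's XML access.

```python
import xml.etree.ElementTree as ET

invoices_doc = ET.fromstring(
    "<Invoice_Document>"
    "<invoice><account_number>2</account_number><carrier>ATT</carrier><total>0.25</total></invoice>"
    "<invoice><account_number>1</account_number><carrier>Sprint</carrier><total>1.20</total></invoice>"
    "<invoice><account_number>1</account_number><total>0.75</total></invoice>"
    "<auditor>maria</auditor>"
    "</Invoice_Document>"
)
customers_doc = ET.fromstring(
    "<Customer_Document>"
    "<customer><account>1</account><name>Tom</name></customer>"
    "<customer><account>2</account><name>George</name></customer>"
    "</Customer_Document>"
)

# Source + Follow(.invoice) and Source + Follow(.customer)
invoices = [{"invoice": e} for e in invoices_doc.findall("invoice")]
customers = [{"customer": e} for e in customers_doc.findall("customer")]

# Select(carrier = Sprint): findtext returns None when the invoice has no carrier
sprint = [b for b in invoices if b["invoice"].findtext("carrier") == "Sprint"]

# Join(.invoice.account_number = .customer.account)
joined = [
    {**i, **c}
    for i in sprint
    for c in customers
    if i["invoice"].findtext("account_number") == c["customer"].findtext("account")
]

# Expose(.account_number, .name, .total)
result = [
    (
        b["invoice"].findtext("account_number"),
        b["customer"].findtext("name"),
        b["invoice"].findtext("total"),
    )
    for b in joined
]
print(result)
```

Running the selection before the join means only one invoice bag, rather than three, is compared against the customers.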
29Optimization with Niagara
- The optimizer is based on the Niagara algebra
- Use the operations more efficiently
- Produce simpler expressions by combining operations
30Language Convention
- A and B are path expressions
- A < B: path expression A is a prefix of B
- A∩B: the common prefix of paths A and B (the greatest common prefix)
- ⊥: the null path expression
31Use of Rule 8.5
- Take advantage of rule 8.5
- It allows optimization based on path selectivity
- It applies to the unnesting follow operation φμ
32
- φμ(A) φμ(B) = φμ(B) φμ(A)
- True when
- there exists a C such that C < A and C < B
- and C = A∩B
- or A∩B = ⊥
- Interchangeability of Follow operations
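The interchangeability can be checked on the example data with a small sketch. It assumes dict-based bags and a hypothetical `follow` helper for unnesting follow; the two paths share only the prefix invoice. The result is the same in either order, but the intermediate collection is smaller when the more selective path is followed first.

```python
import xml.etree.ElementTree as ET

def follow(bags, entry_point, step):
    """Unnesting follow over dict-based bags (illustrative helper)."""
    return [
        {**bag, f"{entry_point}.{step}": child}
        for bag in bags
        for child in bag[entry_point].findall(step)
    ]

def as_multiset(bags):
    """Normalise a collection of bags so two collections can be compared ignoring order."""
    return sorted(
        (b["invoice.account_number"].text, b["invoice.carrier"].text) for b in bags
    )

doc = ET.fromstring(
    "<Invoice_Document>"
    "<invoice><account_number>2</account_number><carrier>ATT</carrier></invoice>"
    "<invoice><account_number>1</account_number><carrier>Sprint</carrier></invoice>"
    "<invoice><account_number>1</account_number></invoice>"  # no carrier
    "</Invoice_Document>"
)
start = [{"invoice": inv} for inv in doc.findall("invoice")]

after_acc = follow(start, "invoice", "account_number")  # every invoice survives
after_carrier = follow(start, "invoice", "carrier")     # the carrier-less invoice is dropped

acc_then_carrier = follow(after_acc, "invoice", "carrier")
carrier_then_acc = follow(after_carrier, "invoice", "account_number")

same_result = as_multiset(acc_then_carrier) == as_multiset(carrier_then_acc)
print(same_result, len(after_acc), len(after_carrier))
```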
33Application of 8.5 With Invoice
- φμ(acc_Num in invoice) φμ(carrier in invoice)
- =
- φμ(carrier in invoice) φμ(acc_Num in invoice)
- Both share the common prefix invoice
- Case A∩B = invoice
34Benefit of Rule Application
- Note: if
- acc_Num is required for each invoice element
- carrier is not required for an invoice element
- then applying φμ(carrier in invoice) first, followed by φμ(acc_Num in invoice),
- makes more sense than the opposite order. Why?
35
- Reduction of the input size by the first sub-operation, φμ(carrier in invoice)
- Should we, or can we, apply rule 8.5 below?
- φμ(acc_Num in invoice) φμ(acc_Num in Customer)
- Why?
36
- acc_Num in invoice and acc_Num in Customer are totally different paths
- The case is A∩B = ⊥
- Then yes
37Rules 8.7, 8.9, 8.11
- Interesting: they help identify when and where to use selection σ
- to decrease the size of the input to subsequent operations
- Example: the algebra tree on slide 28
- Selection is applied before the join.
38Addition would be
- Give a computation for finding when a rule can be applied automatically in a given case, and then apply it.
41AT&T Algebra Introduction
- The algebra is derived from the nested relational algebra.
- The AT&T algebra makes heavy use of list comprehensions, a standard notation in the functional programming community.
- The AT&T algebra uses the functional programming language Haskell as a notation for presenting the algebra.
42AT&T data model
- The data model merges attribute and element nodes, and eliminates comments.
- Declare the basic type Node.
- text :: String -> Node
- elem :: Tag -> [Node] -> Node
- ref :: Node -> Node
- Example value:
elem "bib" [ elem "book" [ elem "@year" [ text "1999" ], elem "title" [ text "Data on the Web" ] ] ]
- Corresponding XML:
<bib>
  <book year="1999">
    <title>Data on the Web</title>
    <year>1999</year>
  </book>
</bib>
43Basic Type Declarations
- To find the type of a node:
- isText :: Node -> Bool
- isElem :: Node -> Bool
- isRef :: Node -> Bool
- For a text node: string :: Node -> String
- For an element node:
- 1) tag :: Node -> Tag
- 2) children :: Node -> [Node]
- For a reference node:
- dereference :: Node -> Node
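The node type and its accessors can be mirrored in Python as a rough analogue of the Haskell declarations. The class names `Text`, `Elem`, and `Ref` and the snake_case function names are choices made for this sketch; attributes are modelled as elements whose tag starts with "@", following the merged attribute/element model.

```python
from dataclasses import dataclass
from typing import List, Union

@dataclass
class Text:
    string: str            # string :: Node -> String

@dataclass
class Elem:
    tag: str               # tag :: Node -> Tag
    children: List["Node"] # children :: Node -> [Node]

@dataclass
class Ref:
    target: "Node"         # dereference :: Node -> Node

Node = Union[Text, Elem, Ref]

def is_text(node: Node) -> bool:
    return isinstance(node, Text)

def is_elem(node: Node) -> bool:
    return isinstance(node, Elem)

def is_ref(node: Node) -> bool:
    return isinstance(node, Ref)

# elem "bib" [ elem "book" [ elem "@year" [ text "1999" ], elem "title" [ text "Data on the Web" ] ] ]
bib = Elem("bib", [
    Elem("book", [
        Elem("@year", [Text("1999")]),
        Elem("title", [Text("Data on the Web")]),
    ]),
])

book = bib.children[0]
year = book.children[0]
print(year.tag, year.children[0].string)
```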
44Nested relational algebra
- In the nested relational approach, data is composed of tuples and lists.
- Tuple values and tuple types are written in round brackets.
- (1999,"Data on the Web","Abiteboul") :: (Int,String,String)
- Decompose values:
- year :: (Int,String,String) -> Int
- year (x,y,l) = x
45Nested relational algebra
- Comprehensions: list comprehensions can be used to express fundamental query operations: navigation, cartesian product, nesting, joins.
- Example: [ value x | x <- children book0, is "author" x ]
- ==> [ "Abiteboul" ]
- General form: [ exp | qual1,...,qualn ], where each qualifier is one of
- a filter: bool-exp
- a generator: pat <- list-exp
46Nested relational algebra
- Using comprehensions to write queries.
- Navigate:
- follow :: Tag -> Node -> [Node]
- follow t x = [ y | y <- children x, is t y ]
- Cartesian product:
- [ (value y, value z)
-   | x <- follow "book" bib0,
-     y <- follow "title" x,
-     z <- follow "author" x ]
- ==> [ ("Data on the Web", "Abiteboul") ]
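Python list comprehensions follow the same generator-and-filter pattern, so the navigation and cartesian-product queries translate almost line for line. The sketch below is an analogue, not the algebra itself: nodes are assumed to be plain `(tag, children)` tuples with text stored as strings, and the book is given the three authors used in the regular-expression example, so the product yields three pairs rather than one.

```python
def elem(tag, *kids):
    """Build an element node as a (tag, children) tuple; text children are plain strings."""
    return (tag, list(kids))

def children(node):
    return node[1]

def is_tag(tag, node):
    return isinstance(node, tuple) and node[0] == tag

def value(node):
    """Concatenate the text children of an element."""
    return "".join(kid for kid in children(node) if isinstance(kid, str))

def follow(tag, node):
    # follow t x = [ y | y <- children x, is t y ]
    return [y for y in children(node) if is_tag(tag, y)]

book0 = elem(
    "book",
    elem("@year", "1999"),
    elem("title", "Data on the Web"),
    elem("author", "Abiteboul"),
    elem("author", "Buneman"),
    elem("author", "Suciu"),
)
bib0 = elem("bib", book0)

# [ value x | x <- children book0, is "author" x ]
authors = [value(x) for x in children(book0) if is_tag("author", x)]

# Cartesian product of titles and authors within each book
pairs = [
    (value(y), value(z))
    for x in follow("book", bib0)
    for y in follow("title", x)
    for z in follow("author", x)
]
print(authors)
print(pairs)
```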
47Nested relational algebra
reviews0 = elem "reviews" [ elem "book" [ elem "title" [ text "Data on the Web" ], elem "review" [ text "This is great!" ] ] ]
bib0 = elem "bib" [ elem "book" [ elem "@year" [ text "1999" ], elem "title" [ text "Data on the Web" ] ] ]
[ (value y, int (value z), value w)
  | x <- follow "book" bib0,
    y <- follow "title" x,
    z <- follow "@year" x,
    u <- follow "book" reviews0,
    v <- follow "title" u,
    w <- follow "review" u,
    y == v ]
==> [ ("Data on the Web", 1999, "This is great!") ]
48Nested relational algebra
- Regular expression matching
reg0 :: Reg (Node,Node,[Node])
reg0 = ( (x,y,u) | x <- item "@year", y <- item "title", u <- rep (item "author") )
match :: Reg a -> Node -> a
Result:
match reg0 book0
==> ( elem "@year" [ text "1999" ],
      elem "title" [ text "Data on the Web" ],
      [ elem "author" [ text "Abiteboul" ],
        elem "author" [ text "Buneman" ],
        elem "author" [ text "Suciu" ] ] )
49Nested relational algebra
- Sorting
- sortBy :: (a -> a -> Bool) -> [a] -> [a]
- sortBy (<=) [3,1,2,1] ==> [1,1,2,3]
- Grouping
- groupBy :: (a -> a -> Bool) -> [a] -> [[a]]
- groupBy (==) [3,1,2,1] ==> [[2],[1,1],[3]]
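The two functions can be approximated in Python. This sketch assumes `sort_by` takes a "less than or equal" test and `group_by` takes an equality test, as on the slide; both names are illustrative. The grouping here collects all equal elements, not only adjacent ones, and returns groups in order of first appearance, so the group order may differ from the slide's.

```python
import operator
from functools import cmp_to_key

def sort_by(leq, xs):
    """Sort using a boolean ordering test: leq(a, b) means a may come before b."""
    def compare(a, b):
        if leq(a, b) and leq(b, a):
            return 0
        return -1 if leq(a, b) else 1
    return sorted(xs, key=cmp_to_key(compare))

def group_by(eq, xs):
    """Partition xs into groups of mutually equal elements."""
    groups = []
    for x in xs:
        for group in groups:
            if eq(group[0], x):
                group.append(x)
                break
        else:
            groups.append([x])  # no existing group matched
    return groups

sorted_values = sort_by(operator.le, [3, 1, 2, 1])
grouped_values = group_by(operator.eq, [3, 1, 2, 1])
print(sorted_values)
print(grouped_values)
```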
50Cross Comparisons of Algebra
- Niagara and AT&T: standalone XML algebras
- Niagara was proposed after the W3C had selected its proposed standard
- and has operators that operate on sets of bags
- The AT&T algebra was chosen as the proposed standard by the W3C
- -- its expressions resemble a high-level query language
- -- the latest version of the document is referred to as "Semantics of XML Query Language XQuery"
51Future Work
- Need more evaluation strategies that would allow for flexible query plans
- Develop physical operators that take advantage of physical storage structures, and generate a mapping from the query tree to a physical query plan