Title: XML queries and updates
1XML queries and updates
- Introduction
- XML Data Model and Type System
- XML Queries
- Technical discussions on XQuery design
- XML Updates
- Conclusions
3What is XML?
- The Extensible Markup Language (XML) is the universal format for structured documents and data on the Web.
- Base specifications
- XML 1.0, W3C Recommendation Feb '98
- Namespaces, W3C Recommendation Jan '99
4Simple XML Data Example
- <book year="1967" xmlns:amz="www.amazon.com">
- <title>The politics of experience</title>
- <author>R.D. Laing</author>
- <amz:ref amz:isbn="1341-1444-555"/>
- <section>
- The great and true Amphibian, whose nature is disposed to...
- <title>Persons and experience</title>
- Even facts become...
- </section>
- </book>
5The secrets of the XML success
- XML is a data representation format
- XML is universal
- XML is human readable
- XML is machine readable
- XML is international
- XML is platform independent
- XML is vendor independent
- XML is endorsed by the W3C
- XML is not a new technology
- XML is not only a data representation format
6XML as a family of technologies
- XML Information Set
- XML Schema
- XML Query
- The Extensible Stylesheet Language Transformations (XSLT)
- XML Forms
- XML Protocol
- XML Encryption
- XML Signature
- Others
- almost all the pieces needed for a reasonably
good Web Services puzzle
7Major application domains for XML
- Data exchange on the Web
- e.g. Health Level Seven (HL7), http://www.hl7.org/
- Application integration on the Web
- e.g. ebXML, http://www.ebxml.org/
- Document exchange on the Web
- e.g. Encoded Archival Description Application
8The role of an XML query language
- Why a query language for XML?
- Preserve the logical/physical data independence
- The semantics is described in terms of an abstract data model, independent of the physical data storage
- Declarative programming
- Such programs should describe the "what", not the "how"
- Why a native query language?
- We need to deal with the specificities of XML (hierarchical, ordered, textual, potentially schema-less structure)
9XML query languages state of the art
- Query languages for graph data
- e.g. GOOD, GraphLog, Clean
- Query languages/scripting languages for the WEB
- e.g. WebSQL, WebOQL, WebL
- Query languages for semi-structured data
- e.g. MSL, UnQL, StruQL, YATL
10XML query languages state of the art
- Research query languages for XML
- e.g. XML-QL, Lorel, XML-GL, Quilt, XDuce
- Industry query languages for XML
- e.g. XQL, OQL extensions to query SGML documents
- W3C standard processing languages for XML
- e.g. XPath, XSLT
- Standard W3C XML query language: XQuery
11W3C Query Working Group - History
- Sept 1999: WG creation and first F2F
- Currently 30 W3C member companies
- Twelve F2F meetings and 80 telecons so far
- Public WDs every three months
- http://www.w3.org/XML/Query
12W3C Query Working Group - Goal
- "The goal of the XML Query WG is to produce
- - an abstract data model for XML documents,
- - a set of query operators on that data model,
- - a query language based on these query
13XML Many Environments
W3C XML Query Data Model
14W3C XML Working Group - Status
- June 2001: new or revised working drafts
- XML Query Requirements
- XML Query Use Cases
- XML Query 1.0 and XPath 2.0 Data Model
- XML Query 1.0 Formal Semantics
- XQuery 1.0: An XML Query Language
- XML Syntax for XQuery 1.0 (XQueryX)
15General XML query requirements
- Non-procedural, declarative query language
- XML syntax for the query language, but also a human-readable syntax
- Protocol independent
- Standard error conditions
- Should not preclude updates
16XML Query Use Cases
- Use Case Organization
- Description, DTD/Schema, Input Data, Queries and Results
- Current Use Cases
- "XMP": Experiences and exemplars
- "TREE": Queries that preserve hierarchy
- "SEQ": Queries based on sequences
- "R": Access to relational data
- "TEXT": Full-text search
- "NS": Queries using namespaces
- "PARTS": Recursive computation
- "REF": Queries based on references
17XML Abstract Data Model
- Common to XPath 2.0 and XQuery 1.0
- A logical model composed of
- a set of logical entities
- constructors and accessors for each entity
- Based on the notion of an ordered tree
- XML data cannot be modeled as a simple tree
18XML Abstract Data Model Entities
- Nodes
- Node = Document | Element | Attribute | Text | Namespaces | PI | Comment
- Simple values (all XML Schema simple types)
- string, boolean, ID, IDREF, decimal, QName, URI, ...
- Sequences
- Errors
- Schema components
19Document Nodes Constructors Accessors
- Constructor
- document-node : URI x Sequence[1,](Element | Text | PI | Comment) -> DocumentNode
- Accessors
- base-uri : DocumentNode -> URI
- children : DocumentNode -> Sequence[1,](ElementNode | TextNode | PI | Comment)
- string-value : DocumentNode -> string
20Attribute Nodes Constructors Accessors
- Constructor
- attribute-node : QName x string x SchemaComponent -> AttributeNode
- Accessors
- name : AttributeNode -> QName
- type : AttributeNode -> SchemaComponent
- typed-value : AttributeNode -> Sequence(SimpleValue)
- string-value : AttributeNode -> string
- parent : AttributeNode -> Sequence[0,1](Node)
21Sequences Constructors Accessors
- Constructors
- empty-sequence : () -> Sequence
- append : Sequence x Sequence -> Sequence
- Accessors
- empty : Sequence -> boolean
- head : Sequence -> UnitValue
- tail : Sequence -> Sequence
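The sequence constructors and accessors listed on this slide can be mimicked in a few lines of Python. This is only a sketch under one assumption, namely that a sequence is modeled as a flat Python tuple; the function names mirror the slide, and `unit` is a hypothetical helper added for the example.

```python
# Assumption: a data-model sequence is represented as a flat Python tuple.

def empty_sequence():
    """empty-sequence : () -> Sequence"""
    return ()

def unit(value):
    """Hypothetical helper: wrap one value as a one-item sequence."""
    return (value,)

def append(left, right):
    """append : Sequence x Sequence -> Sequence (the result is never nested)."""
    return tuple(left) + tuple(right)

def empty(seq):
    """empty : Sequence -> boolean"""
    return len(seq) == 0

def head(seq):
    """head : Sequence -> UnitValue"""
    if empty(seq):
        raise ValueError("head of an empty sequence")
    return seq[0]

def tail(seq):
    """tail : Sequence -> Sequence"""
    return seq[1:]

s = append(append(unit("a"), unit("b")), append(empty_sequence(), unit("c")))
```

Because `append` always concatenates, appending an empty sequence changes nothing and no sequence ever contains another sequence.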
22XML data model - conclusion
- Complete with respect to XML
- Relatively simple design
- ordered trees, node-labeled, with node identity
- Semantics of the query language relies on the data model constructors and accessors
- Relationship with the other W3C XML-related standards
- Clear mapping to/from the XML Infoset
- Less clear relationship with the Document Object Model (DOM)
- Less clear relationship with XML Schema and the type system
23 The Xquery type system
- XQuery's original design had a powerful type system (based on XDuce)
- The type system can
- (1) statically detect errors in the queries
- (2) infer the type of the result of valid queries
- (3) ensure statically that the result of a given query is of a given (expected) type if the input dataset is guaranteed to be of a given type
- Queries on types
- Big debate: XML type system vs. XML Schema
24Xquery in a nutshell
- Functional language
- A query is an expression
- The result of the query is the result of the evaluation of the expression
- Expressions are evaluated in a certain environment
- Strongly typed
- Every expression has a type
- Statically typed
- The type of the result of an expression can be detected statically
- Formal semantics based on the XML Abstract Data Model
- Dual syntax: XML and non-XML
- Influenced by SQL, OQL, XQL, XPath, Quilt
25Xquery expressions
- Constants and variables
- expression1 operator expression2
- function(expression1, ..., expressionN)
- XPath expressions (for navigation)
- FLWR expressions (for iteration)
- SORTBY expressions
- Quantified expressions
- Conditional expressions
- Type-related expressions
- XML node constructors (elements, attributes, etc)
- XQuery expressions can be nested with full generality!
26First XML queries
- 1+1
- $x
- $x/title
- $x/price+1
- document("www.amazon.com/books.xml")
27Xquery functions and operators
- Arithmetic operators
- +, -, *, div, =, !=, <, etc.
- Logical operators
- and, not, or
- Collection-oriented operators
- union, intersection, difference, empty, distinct, count, sum, avg, min, max, etc.
- Global topological order related operators
- before, after, unordered
- XML-specific functions
- document, name, string-value, typed-value, etc.
- Many open issues related to the semantics of these operators
28Xpath expressions
- General syntax
- expression / step
- Two syntaxes: abbreviated or not
- Step in the non-abbreviated syntax
- axis::nodeTest
- The axis controls the navigation direction in the tree
- ancestor, ancestor-or-self, attribute, child, descendant, descendant-or-self, following, following-sibling, namespace, parent, preceding, preceding-sibling, self
- Node test by
- Name (e.g. publisher, myNS:publisher, *:publisher, myNS:*, *)
- Type (e.g. node(), comment(), text())
29Examples of path expressions
- document("bibliography.xml")/child::bib
- $x/child::bib/child::book/attribute::year
- $x/parent::*
- $x/ancestor::*/descendant::comment()
30Semantics of XPath expressions
- Semantics of path expressions in XPath 1.0
- (1) Ordered forests of nodes as input, ordered forests of nodes as output
- (2) For each root node in the input forest, select the nodes in the same document that obey the given axis
- (3) Among those, select and return the ones that satisfy the node test
- (4) No duplicates are allowed in the output
- (5) Output nodes are ordered by document order
- (6) Nodes preserve their identity
- No type error for book/nose
- A list of lists is automatically flattened
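The six rules on this slide can be sketched in Python. This is a minimal illustration, not the normative XPath algorithm: the `Node` class, the `number` helper and the three-axis `AXES` table are hypothetical names made up for this example, and node tests are limited to element names.

```python
class Node:
    """Hypothetical tree node: a name, children, a parent link, a document-order rank."""
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)
        self.parent = None
        self.order = -1
        for child in self.children:
            child.parent = self

def number(root):
    """Assign document order (pre-order position) to every node under root."""
    counter = 0
    stack = [root]
    while stack:
        node = stack.pop()
        node.order = counter
        counter += 1
        stack.extend(reversed(node.children))
    return root

def descendants(node):
    for child in node.children:
        yield child
        yield from descendants(child)

AXES = {
    "child": lambda n: n.children,
    "descendant": descendants,
    "parent": lambda n: [n.parent] if n.parent is not None else [],
}

def step(forest, axis, name):
    """Evaluate one step: follow the axis, apply the name test,
    drop duplicates by node identity, return nodes in document order."""
    selected = {}
    for node in forest:
        for candidate in AXES[axis](node):
            if candidate.name == name:
                selected[id(candidate)] = candidate  # identity-based, not value-based
    return sorted(selected.values(), key=lambda n: n.order)

t1, a1, t2 = Node("title"), Node("author"), Node("title")
b1, b2 = Node("book", [t1, a1]), Node("book", [t2])
bib = number(Node("bib", [b1, b2]))
```

With this toy tree, `step([b2, b1], "child", "title")` returns the titles in document order even though the input was reversed, `step([a1, t1], "parent", "book")` returns the shared parent once, and `step([bib], "child", "nose")` is simply empty rather than an error.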
31Xpath abbreviated syntax (1)
- Axis can be missing
- By default the child axis
- $x/child::person -> $x/person
- Shorthands for common axes
- Descendant-or-self
- $x/descendant-or-self::comment() -> $x//comment()
- Parent
- $x/parent::* -> $x/..
- Attribute
- $x/attribute::year -> $x/@year
- Self
- $x/self::* -> $x/.
32Xpath abbreviated syntax (2)
- Implicit root node
- root/bib -> /bib
- root -> /
- Implicit current node (inside the second-order functions)
- self::*/title -> ./title
- self::*/title -> title
33Simple iteration expression
- Syntax
- for variable in expression1 return expression2
- Example
- for $x in document("bibliography.xml")/bib/book
- return $x/title
- Semantics
- bind the variable to each root node of the forest returned by expression1
- for each such binding evaluate expression2
- concatenate the resulting forests
- lists of lists are automatically flattened
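The iteration semantics on this slide amounts to a flat map. The sketch below assumes forests are plain Python lists and uses made-up sample data in place of bibliography.xml; `for_expr` is a hypothetical name, not part of any XQuery API.

```python
def for_expr(forest, body):
    """Sketch of 'for variable in expression1 return expression2' over Python lists."""
    result = []
    for item in forest:            # bind the variable to each item in turn
        result.extend(body(item))  # evaluate the body and concatenate (flatten)
    return result

# Made-up sample data standing in for the books of a bibliography
books = [
    {"title": ["Book A"], "author": ["Author 1", "Author 2"]},
    {"title": ["Book B"], "author": ["Author 3"]},
]

titles = for_expr(books, lambda x: x["title"])
authors = for_expr(books, lambda x: x["author"])
```

Here `authors` is one flat list of three names, not a list of two lists, which is the automatic flattening the slide refers to.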
34Local variable declaration
- Syntax
- let variable := expression1 return expression2
- Example
- let $x := document("bibliography.xml")/bib/book
- return count($x)
- Semantics
- bind the variable to the result of expression1
- add this binding to the current environment
- evaluate expression2
- remove the local variable from the environment.
35Conditional expressions
- Syntax
- if ( expression1 ) then expression2 else expression3
- Example
- if ( $book/@year < 1980 )
- then "old book"
- else "new book"
- Semantics
- If expression1 evaluates to true then return the
result of the evaluation of expression2 else
return the result of the evaluation of
36FLWR expressions
- Syntactic sugar that combines FOR, LET, IF
- Example
- for $x in //bib/book /* like the FROM in SQL */
- let $y := $x/author /* no analog in SQL */
- where $x/title = "The politics of experience" /* like the WHERE in SQL */
- return count($y) /* like the SELECT in SQL */
37FLWR expression semantics
- FLWR expression
- for $x in //bib/book
- let $y := $x/author
- where $x/title = "The politics of experience"
- return count($y)
- Semantically equivalent to
- for $x in //bib/book
- return (let $y := $x/author
- return if ($x/title = "The politics of experience")
- then count($y)
- else ()
- )
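The equivalence stated on this slide can be checked on a toy example. The sketch below assumes books are Python dicts with made-up titles and authors; `flwr` writes the query as one comprehension, `desugared` spells out the nested for / let / if form.

```python
# Made-up sample data; titles and authors are placeholders.
books = [
    {"title": "Book A", "author": ["Author 1"]},
    {"title": "Book B", "author": ["Author 1", "Author 2"]},
]

def flwr(books, wanted):
    """for / let / where / return as a single comprehension."""
    return [len(y) for x in books for y in [x["author"]] if x["title"] == wanted]

def desugared(books, wanted):
    """The same query as nested for, let and if-then-else."""
    result = []
    for x in books:                                      # for x in the books
        y = x["author"]                                  # let y be the authors of x
        part = [len(y)] if x["title"] == wanted else []  # if ... then count(y) else ()
        result.extend(part)                              # empty results vanish on concatenation
    return result
```

The `else ()` branch contributes an empty sequence, so non-matching books leave no trace in the output; that is what makes the `where` clause expressible as a conditional.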
38More FLWR expression examples
- Selections
- for $b in document("bib.xml")//book
- where $b/publisher = "Springer Verlag" and
- $b/@year = "1998"
- return $b/title
- Joins
- for $b in document("bib.xml")//book,
- $p in //publisher
- where $b/publisher = $p/name
- return $b/title, $p/address/title, $p/name
39Xpath filter predicates
- Syntax
- expression1 [ expression2 ]
- [ ] is an overloaded operator
- Filtering by predicate
- //book [ ./author/firstname = "ronald" ]
- //book [ @price < 25 ]
- //book [ count(author [ @gender = "female" ]) > 0 ]
- Filtering by position
- /book[3]
- /book[3]/author[1]
- /book[3]/author[1 to 2]
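Both uses of the filter operator can be tried with Python's standard `xml.etree.ElementTree`, which implements only a small XPath subset (positional predicates, attribute equality, child existence). The document below is made up for the example; the predicate on a nested path is not supported by ElementTree, so it is written as a Python filter.

```python
import xml.etree.ElementTree as ET

# Made-up sample document
bib = ET.fromstring("""
<bib>
  <book year="1998"><title>T1</title><author><firstname>ronald</firstname></author></book>
  <book year="2001"><title>T2</title></book>
  <book year="2001"><title>T3</title><author><firstname>ann</firstname></author></book>
</bib>
""")

def titles(books):
    return [b.findtext("title") for b in books]

third = titles(bib.findall("book[3]"))                # filtering by position
with_author = titles(bib.findall("book[author]"))     # filtering by existence of a child
from_2001 = titles(bib.findall("book[@year='2001']")) # filtering by attribute value

# Predicate on a nested path, done in Python: keep a book if SOME
# author/firstname equals "ronald".
by_ronald = titles(
    b for b in bib.findall("book")
    if any(f.text == "ronald" for f in b.findall("author/firstname"))
)
```

The same square brackets select by position when the predicate is a number and by truth value otherwise, which is the overloading the slide points out.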
40Quantified expressions
- Syntax
- some variable in expression1 satisfies expression2
- every variable in expression1 satisfies expression2
- Examples
- some $x in //book satisfies $x/price > 200
- //book[some $x in author satisfies $x/@gender = "female"]
- for $x in //book
- where every $y in $x/author satisfies $y/@gender = "female"
- return $x/title
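Quantified expressions map directly onto Python's `any` and `all`. The sketch below uses made-up data; note that, like `all` on an empty list, an "every" quantifier over an empty sequence is satisfied vacuously, so a book without authors passes the last test.

```python
# Made-up sample data
books = [
    {"title": "Book A", "price": 250, "authors": [{"gender": "female"}]},
    {"title": "Book B", "price": 40,
     "authors": [{"gender": "female"}, {"gender": "male"}]},
    {"title": "Book C", "price": 15, "authors": []},
]

# some x in the books satisfies price > 200
some_expensive = any(x["price"] > 200 for x in books)

# books having SOME female author
with_female_author = [
    x["title"] for x in books
    if any(a["gender"] == "female" for a in x["authors"])
]

# titles of books where EVERY author is female
all_female = [
    x["title"] for x in books
    if all(a["gender"] == "female" for a in x["authors"])
]
```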
41SORTBY expressions
- Syntax
- expression0 SORTBY ( expression1 [ASCENDING | DESCENDING], ..., expressionK [ASCENDING | DESCENDING] )
- Examples
- //book sortby ( @price )
- //book[@year = 2001]/author sortby (lastname, firstname)
- for $x in //book
- where empty($x/author)
- return $x
- sortby (title)
42Global document order queries
- Syntax
- expression1 ( before | after ) expression2
- Examples
- //section before //section[title = "Persons and experiences"]
- //paragraph after //section[name = "Introduction"]
- before //paragraphcontains(Xq
43Xquery element constructors
- Standard XML elements
- <section title="Persons and experiences">This is a section of the book entitled <title>The politics of Experience</title> written by <author>Ronald Laing</author>.</section>
- Dynamically constructed elements
- <section title={$s/title}>This is a section of the book entitled {$s/ancestor::book/title} written by {for $a in $s/ancestor::book/author return <author>{concat($a/firstname, $a/lastname)}</author>}.</section>
44Complex Xquery example
- <bibliography>
- {for $x in //book[@year = 2001]
- return
- <book title={$x/title}>
- {if (empty($x/author))
- then $x/editor/affiliation
- else $x/author}
- </book>}
- </bibliography>
45Xquery operators on datatypes
- INSTANCEOF
- returns true if its first operand is an instance of the type named in its second operand
- CAST
- is used to convert a value from one datatype to another
- TREAT
- causes the query processor to treat an expression as though its datatype were a subtype of its static type
- TYPESWITCH
- branching based on the dynamic type of the input
46Dealing with node identity
- All nodes in the data model have node identity
- Node identity is preserved through queries
- Two equality functions for nodes
- Value based
- Identity based
47Local function declarations
- Example
- function number_paragraphs($x ns:section)
- returns xsd:integer
- {
- count($x/paragraph) +
- sum(for $y in $x/section
- return number_paragraphs($y))
- }
- number_paragraphs(/bib/book[title = "The politics of experience"]/section[1])
48 Joins in XQuery
- <books-with-prices>
- {for $a in document("amazon.xml")/book,
- $b in document("bn.xml")/book
- where $b/isbn = $a/isbn
- return
- <book>
- {$a/title}
- <price-amazon>{$a/price}</price-amazon>
- <price-bn>{$b/price}</price-bn>
- </book>}
- </books-with-prices>
49 Left-outer joins in XQuery
- <books-with-prices>
- {for $a in document("amazon.xml")/book
- return
- <book>
- {$a/title}
- <price-amazon>{$a/price}</price-amazon>
- {for $b in document("bn.xml")/book
- where $b/isbn = $a/isbn
- return <price-bn>{$b/price}</price-bn>}
- </book>}
- </books-with-prices>
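The left-outer-join pattern on this slide is a nested iteration: the outer loop always emits a book, and the inner loop adds a second price only when a match exists. A rough Python equivalent, with made-up records standing in for the two documents:

```python
# Made-up sample data for the two sources
amazon = [
    {"isbn": "111", "title": "Book A", "price": 30.0},
    {"isbn": "222", "title": "Book B", "price": 12.5},
]
bn = [{"isbn": "111", "price": 28.0}]

def left_outer_join(left, right):
    rows = []
    for a in left:                                   # outer for: every left book appears
        bn_prices = [b["price"] for b in right       # nested for + where
                     if b["isbn"] == a["isbn"]]
        rows.append({"title": a["title"],
                     "price-amazon": a["price"],
                     "price-bn": bn_prices})         # stays empty when nothing matches
    return rows

rows = left_outer_join(amazon, bn)
```

No null value is needed: an unmatched book simply carries an empty list where the second price would be, just as the constructed element would have no price-bn child.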
50 Full-outer joins in Xquery
- let $allISBNs := distinct(document("amazon.xml")/book/isbn union document("bn.xml")/book/isbn)
- return
- <books-with-prices>
- {for $isbn in $allISBNs
- return
- <book>
- {for $a in document("amazon.xml")/book[isbn = $isbn]
- return <price-amazon>{$a/price}</price-amazon>}
- {for $b in document("bn.xml")/book[isbn = $isbn]
- return <price-bn>{$b/price}</price-bn>}
- </book>}
- </books-with-prices>
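The full-outer-join pattern first collects every key from both sources and then probes each source per key. A Python sketch with made-up records; the keys are sorted here only to make the output deterministic, which is an addition of this example and not something the query does.

```python
# Made-up sample data for the two sources
amazon = [{"isbn": "111", "price": 30.0}, {"isbn": "222", "price": 12.5}]
bn = [{"isbn": "111", "price": 28.0}, {"isbn": "333", "price": 9.0}]

def full_outer_join(left, right):
    # distinct union of the ISBNs from both sources
    all_isbns = sorted({x["isbn"] for x in left} | {x["isbn"] for x in right})
    return [
        {
            "isbn": isbn,
            "price-amazon": [a["price"] for a in left if a["isbn"] == isbn],
            "price-bn": [b["price"] for b in right if b["isbn"] == isbn],
        }
        for isbn in all_isbns
    ]

rows = full_outer_join(amazon, bn)
```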
51Group-by and Having
- Example
- for $a in distinct(//book/author/lastname)
- let $books := //book[some $y in author/lastname satisfies $y = $a]
- where count($books) > 10
- return <result>
- {$a/name} {$books[1 to 10]}
- </result>
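The group-by and having pattern on this slide (iterate over distinct keys, collect the group with a let, filter on the group size) can be sketched in Python as follows. The data is made up, and `threshold` and `limit` are hypothetical parameters standing in for the constant 10 used twice in the example.

```python
# Made-up sample data
books = [
    {"title": "Book A", "authors": ["Smith"]},
    {"title": "Book B", "authors": ["Smith", "Jones"]},
    {"title": "Book C", "authors": ["Jones"]},
    {"title": "Book D", "authors": ["Smith"]},
]

def group_by_author(books, threshold, limit):
    result = []
    lastnames = sorted({n for b in books for n in b["authors"]})  # distinct keys
    for a in lastnames:
        group = [b for b in books if a in b["authors"]]           # the group for this key
        if len(group) > threshold:                                # "having" condition
            result.append((a, [b["title"] for b in group[:limit]]))
    return result

groups = group_by_author(books, threshold=2, limit=2)
```

The grouping is expressed as a join of the distinct keys back against the data, so each group costs a scan unless the processor recognizes the pattern.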
52Views in Xquery
- Views are supported in XQuery via functions
- non-parameterized views via functions with no arguments
- parameterized views via functions with at least one argument
- XQuery supports recursive views
- unrestricted form of recursion
- Termination is not guaranteed automatically
53Many open issues
- Relationship with XPath
- E.g. should we preserve the implicit casting operations of XPath 1.0?
- Relationship with XML Schema
- Bi-directional mapping between the XML Schema concepts and the XQuery type system concepts
- Schema validation vs. type checking
- Name-based subtyping vs. structural subtyping
- Human-readable (non-XML) syntax for types?
- XQuery functions and operators built-in library
- More sophisticated support for full-text search
- and many more
54Xquery implementations
- Microsoft
- Software AG
- Kweelt
- Lucent
- Univ. Darmstadt
- HiFive.com
- FatDog.com
55XML query language summary
- Expressive power
- Major functionality of XML-QL, XQL, SQL, OQL: query the many kinds of data XML contains!
- Use-case driven approach
- Can be implemented in many environments
- Traditional databases, XML repositories, XML programming libraries, etc.
- Queries may combine data from many sources
- Minimalist design
- Small, easy to understand, clean semantics?
- A quilt, not a camel
- One language replaces DOM + XPath + XSLT
- Expressive, concise, easy to learn?
- Implementable, optimizable
- Data integration for multiple sources
- Several current implementations
- Preliminary update proposal
- Future work
- Scripting language for XML
- Workflow language for XML
- For more information about the W3C XML Query Language WG activity, please visit
- W3C XML Query
57Some of XQuery's debates
58Procedural difficulties
- Language designed by a committee
- Hard to avoid the Camel
- Strong interaction with other W3C WG
- Not too much coordination among the W3C WG
- No preexisting global vision or architecture
(bottom up design, like the Web itself !)
59Technical argument (1)
- Problem 1: equality is neither transitive nor reflexive
- $x = 3 and $x = 4 can evaluate to true
- $x < 2 and $x > 4 can also evaluate to true
- $x = 3 and $x != 3 can evaluate to true
- $x = $y and $y = $z does not imply $x = $z
- $x = $x can evaluate to false
- Source
- Equality (and all the other relational operators) has an implicit existential quantifier in XPath 1.0
- Nasty consequences
- high probability of user errors and intense frustration
- good old query evaluation algorithms don't work anymore
- schema evolution is badly handled
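Each of the surprising cases on this slide follows from the implicit existential quantifier. The Python sketch below models operands as lists of values and a comparison as "some pair of items satisfies the operator"; `general` is a hypothetical helper name chosen for this example.

```python
import operator

def general(op, left, right):
    """Existential comparison: true if SOME pair of items satisfies op."""
    return any(op(a, b) for a in left for b in right)

x = [3, 4]
both_equal = general(operator.eq, x, [3]) and general(operator.eq, x, [4])
equal_and_unequal = general(operator.eq, x, [3]) and general(operator.ne, x, [3])

w = [1, 5]
below_and_above = general(operator.lt, w, [2]) and general(operator.gt, w, [4])

# Not transitive: p = q and q = r, yet p = r fails.
p, q, r = [1], [1, 2], [2]
p_eq_q = general(operator.eq, p, q)
q_eq_r = general(operator.eq, q, r)
p_eq_r = general(operator.eq, p, r)

# Not reflexive: an empty operand has no item to witness the comparison.
empty_eq_itself = general(operator.eq, [], [])
```

As soon as an operand can hold zero or several items, the usual algebraic laws of comparison stop holding, which is why rewrites that assume them are no longer safe.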
60Technical argument (2)
- Problem 2: implicit data conversions
- from an element to the element content
- from an attribute to the attribute content
- from a sequence to a value (by taking the first member)
- from a typed value to string
- from a string to a typed value
- from any typed value to a Boolean
- (e.g. from a node set to Boolean)
- Examples of bad cases
- //bookprice is not the same as //bookprice0
- ltbookgt_at_pricelt/bookgt is not the same as
61Technical argument (3)
- Problem 2: implicit data conversions
- Source
- Backward compatibility with XPath 1.0
- Dealing with the semi-structured aspect of the data
- Trying to avoid static or dynamic errors as much as possible
- Bad consequences
- the result of the evaluation of an expression can depend on the context where the expression appears
- high probability of user errors and intense
62Technical argument (4)
- Problem 3: / is not a simple projection
- (//book sortby @price)/title will be sorted by document order, not by price
- Source
- Backwards compatibility with XPath 1.0
- Bad consequences
- high probability of user errors and more frustration
- / often requires materialization (for sorting and duplicate elimination)
- difficult to parallelize and stream
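The effect described on this slide can be reproduced with a toy model in Python. Books are made-up dicts carrying an explicit document-order rank; `slash_title` imitates a path step (deduplicate, then return in document order) while `plain_projection` is the order-preserving projection a user would probably expect. Both function names are hypothetical.

```python
# Made-up sample data; "order" is the position of the book in the document.
books = [
    {"order": 1, "title": "A", "price": 30},
    {"order": 2, "title": "B", "price": 10},
    {"order": 3, "title": "C", "price": 20},
]

by_price = sorted(books, key=lambda b: b["price"])  # the sorted input sequence

def slash_title(seq):
    """Path-step behaviour: remove duplicates, return results in document order."""
    unique = {b["order"]: b for b in seq}
    return [unique[k]["title"] for k in sorted(unique)]

def plain_projection(seq):
    """A simple projection keeps the order of its input."""
    return [b["title"] for b in seq]

after_step = slash_title(by_price)
after_projection = plain_projection(by_price)
```

The step has to see its whole input before it can emit anything in document order, which is the materialization cost mentioned in the last bullets.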
63You can help
- Designing such a language is VERY hard!
- Your opinion matters!
- A year from now it will be too late
- Please help review the specifications and send comments to
- www-xml-query-comments@w3c.org
64XML update language
- Declarative update language
- XML data model tree modification
- E.g. node deletion, insertion, replacement
- Metadata replacement
- Built on top of the XML query language
- Initial proposal from some of the XML Query WG members
- Not an official working draft of the W3C!
- Already supported by some XQuery implementations
65XML update statements
- Simple update statements
- InsertStatement
- DeleteStatement
- RenameStatement
- ReplaceStatement
- MoveStatement
- Complex update statements
66INSERT statement
- Syntax
- insert expression1 ( into | after | before ) expression2
- Examples
- insert <publisher>Morgan Kaufmann</publisher>
- after //book[title = "The politics of experience"]/title
- insert <comment>This is a great paragraph!</comment>
- before //book[author/lastname = "Laing"]/section[1]
67DELETE statement
- Syntax
- delete expression
- Examples
- delete //book/@price > 100
- delete //book[1]/section[1 to 3]/comment()
- delete //comment()
68RENAME statement
- Syntax
- rename expression as expression
- Examples
- rename //book as publication
- rename //book/@price as amazon_price
69REPLACE statement
- Syntax
- replace expression1 with expression2
- Examples
- replace //book[1]/title with <title>Some new title</title>
- replace //book[1]/@price/data() with 25.50
- replace //book[1]/@price/data() with
70MOVE statement
- Syntax
- move expression1 ( before | after | into ) expression2
- Examples
- move //book[1]/section[1]/paragraph[2] before
- //book[1]/section[2]/paragraph[1]
- move //book[1]/@price into //book[1]/publisher
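For comparison, the simple update statements can be imitated imperatively with Python's standard `xml.etree.ElementTree`. This is only a sketch of the tree modifications involved, on a made-up document; it says nothing about how the proposed update language is implemented.

```python
import xml.etree.ElementTree as ET

# Made-up sample document
doc = ET.fromstring(
    "<bib><book price='120'><title>Old title</title><note>draft</note></book></bib>"
)
book = doc.find("book")
title = book.find("title")

# insert a publisher element after the title
publisher = ET.Element("publisher")
publisher.text = "Morgan Kaufmann"
book.insert(list(book).index(title) + 1, publisher)

# replace the title with a new title element
new_title = ET.Element("title")
new_title.text = "Some new title"
position = list(book).index(title)
book.remove(title)
book.insert(position, new_title)

# delete every note child of the book
for note in book.findall("note"):
    book.remove(note)

# rename the price attribute as amazon_price
book.set("amazon_price", book.attrib.pop("price"))

# rename the book element as publication
book.tag = "publication"
```

Each imperative step needs the parent of the target node and an explicit position, which is the bookkeeping a declarative update statement hides behind a path expression.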