Title: XML queries and updates
1XML queries and updates
2Outline
- Introduction
- XML Data Model and Type System
- XML Queries
- Technical discussions on Xquery design
- XML Updates
- Conclusions
3What is XML?
- The Extensible Markup Language (XML) is the universal format for structured documents and data on the Web.
- Base specifications
- XML 1.0, W3C Recommendation Feb '98
- Namespaces, W3C Recommendation Jan '99
4Simple XML Data Example
- <book year="1967" xmlns:amz="www.amazon.com">
- <title>The politics of experience</title>
- <author>R.D. Laing</author>
- <amz:ref amz:isbn="1341-1444-555"/>
- <section>
- The great and true Amphibian, whose nature is disposed to..
- <title>Persons and experience</title>
- Even facts become...
- </section>
- </book>
5The secrets of the XML success
- XML is a data representation format
- XML is universal
- XML is human readable
- XML is machine readable
- XML is international
- XML is platform independent
- XML is vendor independent
- XML is endorsed by the W3C
- XML is not a new technology
- XML is not only a data representation format
6XML as a family of technologies
- XML Information Set
- XML Schema
- XML Query
- Extensible Stylesheet Language Transformations (XSLT)
- XML Forms
- XML Protocol
- XML Encryption
- XML Signature
- Others
- almost all the pieces needed for a reasonably
good Web Services puzzle
7Major application domains for XML
- Data exchange on the Web
- e.g. Health Level Seven http://www.hl7.org/
- Application integration on the Web
- e.g. ebXML http://www.ebxml.org/
- Document exchange on the Web
- e.g. Encoded Archival Description Application http://lcweb.loc.gov/ead/
8The role of an XML query language
- Why a query language for XML?
- Preserve the logical/physical data independence
- The semantics is described in terms of an abstract data model, independent of the physical data storage
- Declarative programming
- Such programs should describe the what, not the how
- Why a native query language?
- We need to deal with the specificities of XML (hierarchical, ordered, textual, potentially schema-less structure)
9XML query languages state of the art
- Query languages for graph data
- e.g. GOOD, GraphLog, Clean
- Query languages/scripting languages for the WEB
- e.g. WebSQL, WebOQL, WebL
- Query languages for semi-structured data
- e.g. MSL, UnQL, StruQL, YATL
10XML query languages state of the art
- Research query languages for XML
- e.g. XML-QL, Lorel, XML-GL, Quilt, XDuce
- Industry query languages for XML
- e.g. XQL, OQL extensions to query SGML documents
- W3C standard processing languages for XML
- e.g. XPath, XSLT
- Standard W3C XML Query Language: XQuery
11W3C Query Working Group - History
- Sept 1999 WG creation and first F2F
- Currently 30 W3C member companies
- Twelve F2F meetings and 80 telecons so far
- Public WDs every three months
- http://www.w3.org/XML/Query
12W3C Query Working Group - Goal
- "The goal of the XML Query WG is to produce
- - an abstract data model for XML documents,
- - a set of query operators on that data model,
- - a query language based on these query operators"
13XML Many Environments
[Figure labels: DOM, SAX, DBMS, XQuery, W3C XML Query Data Model, XML, Java, COBOL]
14W3C XML Working Group - Status
- June 2001 new or revised working drafts
- XML Query Requirements
- XML Query Use Cases
- XML Query 1.0 and XPath 2.0 Data Model
- XML Query 1.0 Formal Semantics
- XQuery 1.0: An XML Query Language
- XML Syntax for XQuery 1.0 (XQueryX)
15General XML query requirements
- Non-procedural, declarative query language
- XML syntax for query language but also a human readable syntax
- Protocol independent
- Standard error conditions
- Should not preclude updates
16XML Query Use Cases
- Use Case Organization
- Description, DTD/Schema, Input Data, Queries and Results
- Current Use Cases
- "XMP": Experiences and exemplars
- "TREE": Queries that preserve hierarchy
- "SEQ": Queries based on sequences
- "R": Access to relational data
- "TEXT": Full-text search
- "NS": Queries using namespaces
- "PARTS": Recursive computation
- "REF": Queries based on references
17XML Abstract Data Model
- Common for XPath 2.0 and XQuery 1.0
- A logical model composed of
- a set of logical entities
- constructors and accessors for each entity
- Based on the notion of an ordered tree
- XML data cannot be modeled as a simple tree
18XML Abstract Data Model Entities
- Nodes
- Node: Document | Element | Attribute | Text | Namespaces | PI | Comment
- Simple values (all XML Schema simple types)
- string, boolean, ID, IDREF, decimal, QName, URI, ...
- Sequences
- Errors
- Schema components
19Document Nodes Constructors Accessors
- Constructor
- document-node : URI X Sequence[1,*](Element | Text | PI | Comment) -> DocumentNode
- Accessors
- base-uri : DocumentNode -> URI
- children : DocumentNode -> Sequence[1,*](ElementNode | TextNode | PI | Comment)
- string-value : DocumentNode -> string
20Attribute Nodes Constructors Accessors
- Constructor
- attribute-node : QName X string X SchemaComponent -> AttributeNode
- Accessors
- name : AttributeNode -> QName
- type : AttributeNode -> SchemaComponent
- typed-value : AttributeNode -> Sequence(SimpleValue)
- string-value : AttributeNode -> string
- parent : AttributeNode -> Sequence[0,1](Node)
21Sequences Constructors Accessors
- Constructors
- empty-sequence : () -> Sequence
- append : Sequence X Sequence -> Sequence
- Accessors
- empty : Sequence -> boolean
- head : Sequence -> UnitValue
- tail : Sequence -> Sequence
22XML data model - conclusion
- Complete with respect to XML
- Relatively simple design
- ordered trees, node-labeled, with node identity
- Semantics of the query language relies on the data model constructors and accessors
- Relationship with the other W3C XML related standards
- Clear mapping to/from the XML Infoset
- Less clear relationship with Document Object Model (DOM)
- Less clear relationship with the XML Schema and the type system
23 The XQuery type system
- XQuery's original design had a powerful type system (based on XDuce)
- The type system can
- (1) statically detect errors in the queries
- (2) infer the type of the result of valid queries
- (3) ensure statically that the result of a given query is of a given (expected) type if the input dataset is guaranteed to be of a given type
- Queries on types
- Big debate: XML type system vs. XML Schema
24XQuery in a nutshell
- Functional language
- A query is an expression
- The result of the query is the result of the evaluation of the expression
- Expressions are evaluated in a certain environment
- Strongly typed
- Every expression has a type
- Statically typed
- The type of the result of an expression can be detected statically
- Formal semantics based on XML Abstract Data Model
- Dual syntax: XML and non-XML
- Influenced by SQL, OQL, XQL, XPath, Quilt
25XQuery expressions
- Constants and variables
- expression1 operator expression2
- function(expression1, ..., expressionN)
- XPath expressions (for navigation)
- FLWR expressions (for iteration)
- SORTBY expressions
- Quantified expressions
- Conditional expressions
- Type-related expressions
- XML node constructors (elements, attributes, etc)
- XQuery expressions can be nested with full generality!
26First XML queries
- 1+1
- $x
- $x/title
- $x/price + 1
- document("www.amazon.com/books.xml")
27XQuery functions and operators
- Arithmetic operators
- +, -, *, div, =, !=, <, etc
- Logical operators
- and, not, or
- Collection oriented operators
- union, intersection, difference, empty, distinct, count, sum, avg, min, max, etc
- Global topological order related operators
- before, after, unordered
- XML specific functions
- document, name, string-value, typed-value, etc
- Many open issues related to the semantics of these operators
28XPath expressions
- General syntax
- expression / step
- Two syntaxes: abbreviated or not
- Step in the non-abbreviated syntax
- axis::nodeTest
- The axis controls the navigation direction in the tree
- ancestor, ancestor-or-self, attribute, child, descendant, descendant-or-self, following, following-sibling, namespace, parent, preceding, preceding-sibling, self
- Node test by
- Name (e.g. publisher, myNS:publisher, *:publisher, myNS:*, *:*)
- Type (e.g. node(), comment(), text())
29Examples of path expressions
- document("bibliography.xml")/child::bib
- $x/child::bib/child::book/attribute::year
- $x/parent::*
- $x/ancestor::*/descendant::comment()
30Semantics of XPath expressions
- Semantics of path expressions in XPath 1.0
- (1) Ordered forests of nodes as input, ordered forests of nodes as output
- (2) For each root node in the input forest, select the nodes in the same document that lie on the given axis
- (3) Among those, select and return the ones that satisfy the node test.
- (4) No duplicates are allowed in the output
- (5) Output nodes are ordered by the document order
- (6) Nodes preserve their identity
- No type error for book/nose
- A list of lists is automatically flattened
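The six points above can be sketched in Python for a single child-axis step. This is only an illustration under stated assumptions: the helper names (document_order, child_step) and the sample document are made up, only the child axis with a name test is modeled, and Python's xml.etree.ElementTree stands in for the XML data model.

```python
import xml.etree.ElementTree as ET

def document_order(root):
    """Map each element (by identity) to its position in a pre-order traversal."""
    return {id(node): i for i, node in enumerate(root.iter())}

def child_step(root, input_nodes, name):
    """Apply one child::name step to a list of input nodes (hypothetical helper)."""
    order = document_order(root)
    selected, seen = [], set()
    for node in input_nodes:              # (2) follow the axis from every input node
        for child in node:                #     here: the child axis only
            if child.tag == name and id(child) not in seen:  # (3) node test, (4) no duplicates
                seen.add(id(child))
                selected.append(child)    # (6) the same node objects are returned
    selected.sort(key=lambda n: order[id(n)])  # (5) document order
    return selected

doc = ET.fromstring(
    "<bib><book><title>A</title></book><book><title>B</title></book></bib>"
)
books = child_step(doc, [doc], "book")
# Feed the books reversed and duplicated: the output is still deduplicated
# and sorted in document order.
titles = child_step(doc, list(reversed(books)) + books, "title")
noses = child_step(doc, books, "nose")    # no such child: empty result, no error
print([t.text for t in titles], noses)
```

Because the step always re-sorts and deduplicates, the order of the input forest has no influence on the order of the output.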
31XPath abbreviated syntax (1)
- Axis can be missing
- By default the child axis
- $x/child::person -> $x/person
- Short-hands for common axes
- Descendant-or-self
- $x/descendant-or-self::comment() -> $x//comment()
- Parent
- $x/parent::* -> $x/..
- Attribute
- $x/attribute::year -> $x/@year
- Self
- $x/self::* -> $x/.
32XPath abbreviated syntax (2)
- Implicit root node
- root/bib -> /bib
- root -> /
- Implicit current node (inside second-order functions)
- self::*/title -> ./title
- self::*/title -> title
33Simple iteration expression
- Syntax
- for $variable in expression1 return expression2
- Example
- for $x in document("bibliography.xml")/bib/book
- return $x/title
- Semantics
- bind the variable to each root node of the forest returned by expression1
- for each such binding evaluate expression2
- concatenate the resulting forests
- lists of lists are automatically flattened
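A minimal Python sketch of these iteration semantics, assuming forests are modeled as Python lists. The helper name for_each and the sample bibliography are hypothetical.

```python
import xml.etree.ElementTree as ET

def for_each(items, body):
    """Evaluate body once per binding and concatenate the results.

    body returns a list for each binding; extend() keeps the overall
    result flat, so a list of lists never appears.
    """
    result = []
    for item in items:
        result.extend(body(item))
    return result

bib = ET.fromstring(
    "<bib>"
    "<book><title>T1</title><author>A</author><author>B</author></book>"
    "<book><title>T2</title><author>C</author></book>"
    "</bib>"
)
# Roughly: for $x in /bib/book return $x/author
authors = for_each(bib.findall("book"), lambda x: x.findall("author"))
print([a.text for a in authors])
```

The first book contributes two authors and the second contributes one, yet the result is a single flat list of three nodes.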
34Local variable declaration
- Syntax
- let $variable := expression1 return expression2
- Example
- let $x := document("bibliography.xml")/bib/book
- return count($x)
- Semantics
- bind the variable to the result of expression1
- add this binding to the current environment
- evaluate expression2
- remove the local variable from the environment.
35Conditional expressions
- Syntax
- if ( expression1 ) then expression2 else expression3
- Example
- if ( $book/@year < 1980 )
- then "old book"
- else "new book"
- Semantics
- If expression1 evaluates to true then return the result of the evaluation of expression2, else return the result of the evaluation of expression3.
36FLWR expressions
- Syntactic sugar that combines FOR, LET, IF
- Example
- for $x in //bib/book /* like the FROM in SQL */
- let $y := $x/author /* no analog in SQL */
- where $x/title = "The politics of experience" /* like the WHERE in SQL */
- return count($y) /* like the SELECT in SQL */
37FLWR expression semantics
- FLWR expression
- for $x in //bib/book
- let $y := $x/author
- where $x/title = "The politics of experience"
- return count($y)
- Semantically equivalent to
- for $x in //bib/book
- return (let $y := $x/author
- return if ($x/title = "The politics of experience")
- then count($y)
- else ()
- )
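The equivalence can be checked on a small example. The Python sketch below is an illustration only: the function names flwr and desugared and the sample data are assumptions, and lists stand in for XQuery sequences (the empty list plays the role of ()).

```python
import xml.etree.ElementTree as ET

bib = ET.fromstring(
    "<bib>"
    "<book><title>The politics of experience</title><author>R.D. Laing</author></book>"
    "<book><title>Other</title><author>X</author><author>Y</author></book>"
    "</bib>"
)
WANTED = "The politics of experience"

def flwr(books):
    """for / let / where / return written as one comprehension."""
    return [len(y)                                   # return count($y)
            for x in books                           # for $x in ...
            for y in [x.findall("author")]           # let $y := $x/author
            if x.findtext("title") == WANTED]        # where ...

def desugared(books):
    """The same query using only for, let and if/then/else."""
    result = []
    for x in books:
        y = x.findall("author")
        # if (...) then count($y) else (); extending with [] adds nothing
        result.extend([len(y)] if x.findtext("title") == WANTED else [])
    return result

books = bib.findall("book")
print(flwr(books), desugared(books))
```

The where clause disappears in the second form because returning the empty sequence for a binding has the same effect as skipping it.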
38More FLWR expression examples
- Selections
- for $b in document("bib.xml")//book
- where $b/publisher = "Springer Verlag" and
- $b/@year = "1998"
- return $b/title
- Joins
- for $b in document("bib.xml")//book,
- $p in //publisher
- where $b/publisher = $p/name
- return $b/title, $p/address/title, $p/name
39XPath filter predicates
- Syntax
- expression1 [ expression2 ]
- [ ] is an overloaded operator
- Filtering by predicate
- //book [ ./author/firstname = "ronald" ]
- //book [ @price < 25 ]
- //book [ count(author [ @gender = "female" ]) > 0 ]
- Filtering by position
- /book[3]
- /book[3]/author[1]
- /book[3]/author[1 to 2]
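The overloading of the filter operator can be sketched in Python by dispatching on the kind of predicate. Everything here is an assumption made for illustration: the helper name filter_seq, the use of dictionaries for books, and the encoding of "1 to 2" as a tuple.

```python
def filter_seq(items, predicate):
    """Sketch of expression1[expression2] over a Python list.

    predicate is one of:
      - an int n          -> keep the n-th item (positions are 1-based)
      - a (lo, hi) tuple  -> keep items lo..hi inclusive, like "lo to hi"
      - a callable        -> keep items for which it returns a truthy value
    """
    if isinstance(predicate, int):
        return items[predicate - 1:predicate] if predicate >= 1 else []
    if isinstance(predicate, tuple):
        lo, hi = predicate
        return items[max(lo, 1) - 1:hi]
    return [item for item in items if predicate(item)]

books = [
    {"title": "A", "price": 30, "authors": ["ronald"]},
    {"title": "B", "price": 20, "authors": ["mary", "john", "ann"]},
    {"title": "C", "price": 10, "authors": []},
]
cheap = filter_seq(books, lambda b: b["price"] < 25)   # like //book[@price < 25]
third = filter_seq(books, 3)                           # like /book[3]
# like /book[2]/author[1 to 2]
first_two = filter_seq(filter_seq(books, 2)[0]["authors"], (1, 2))
print([b["title"] for b in cheap], third[0]["title"], first_two)
```

The same brackets therefore mean "test each item" in one case and "pick by position" in the other, depending only on the type of the inner expression.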
40Quantified expressions
- Syntax
- some $variable in expression1 satisfies expression2
- every $variable in expression1 satisfies expression2
- Examples
- some $x in //book satisfies $x/price > 200
- //book[some $x in author satisfies $x/@gender = "female"]
- for $x in //book
- where every $y in $x/author satisfies $y/@gender = "female"
- return $x/title
41SORTBY expressions
- Syntax
- expression0 SORTBY
- ( expression1 [ASCENDING | DESCENDING], ...,
- expressionK [ASCENDING | DESCENDING] )
- Examples
- //book sortby ( @price )
- //book[@year = "2001"]/author sortby (lastname, firstname)
- for $x in //book
- where empty($x/author)
- return $x
- sortby (title)
42Global document order queries
- Syntax
- expression1 ( before | after ) expression2
- Examples
- //section before //section[title = "Persons and experiences"]
- //paragraph after //section[name = "Introduction"]
- before //paragraph[contains("Xquery")]
43XQuery element constructors
- Standard XML elements
- <section title="Persons and experiences"> This is a section of the book entitled <title>The politics of Experience</title> written by <author>Ronald Laing</author>. </section>
- Dynamically constructed elements
- <section title={ $s/title }> This is a section of the book entitled { $s/ancestor::book/title } written by { for $a in $s/ancestor::book/author return <author>{ concat($a/firstname, $a/lastname) }</author> }. </section>
44Complex XQuery example
- <bibliography>
- {
- for $x in //book[@year = "2001"]
- return
- <book title={ $x/title }>
- {
- if (empty($x/author))
- then $x/editor/affiliation
- else $x/author
- }
- </book>
- }
- </bibliography>
45XQuery operators on datatypes
- INSTANCEOF
- returns True if its first operand is an instance of the type named in its second operand
- CAST
- is used to convert a value from one datatype to another
- TREAT
- causes the query processor to treat an expression as though its datatype were a subtype of its static type
- TYPESWITCH
- branching based on the dynamic type of the input data
46Dealing with node identity
- All nodes in the data model have node identity
- Node identity is preserved through queries
- Two equality functions for nodes
- Value based
- Identity based
47Local function declarations
- Example
- function number_paragraphs($x ns:section)
- return xsd:integer
- { count($x/paragraph)
- + sum(for $y in $x/section
- return number_paragraphs($y)) }
- number_paragraphs(/bib/book[title = "The politics of experience"]/section[1])
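For comparison, here is the same recursion as a Python sketch over ElementTree. The element names section and paragraph come from the slide; the sample document is made up, and the sketch assumes, as the recursive call suggests, that the two counts are added together.

```python
import xml.etree.ElementTree as ET

def number_paragraphs(section):
    """Count paragraphs in a section, including all nested sections."""
    direct = len(section.findall("paragraph"))           # count($x/paragraph)
    nested = sum(number_paragraphs(sub)                  # sum(for $y in $x/section ...)
                 for sub in section.findall("section"))
    return direct + nested

section = ET.fromstring(
    "<section>"
    "<paragraph/><paragraph/>"
    "<section><paragraph/><section><paragraph/></section></section>"
    "</section>"
)
total = number_paragraphs(section)
print(total)
```

The recursion terminates here because the document tree is finite; nothing in the function itself guarantees termination in general.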
48 Joins in XQuery
- <books-with-prices>
- {
- for $a in document("amazon.xml")/book,
- $b in document("bn.xml")/book
- where $b/isbn = $a/isbn
- return
- <book>
- { $a/title }
- <price-amazon>{ $a/price }</price-amazon>
- <price-bn>{ $b/price }</price-bn>
- </book>
- }
- </books-with-prices>
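The join above is a nested iteration with a filter. A minimal Python sketch, assuming the two documents have already been loaded into lists of dictionaries (the data and the function name inner_join are hypothetical):

```python
amazon = [
    {"isbn": "1", "title": "T1", "price": 10.0},
    {"isbn": "2", "title": "T2", "price": 20.0},
]
bn = [
    {"isbn": "2", "price": 18.0},
    {"isbn": "3", "price": 30.0},
]

def inner_join(amazon, bn):
    """One output book per (a, b) pair whose ISBNs match."""
    return [
        {"title": a["title"], "price-amazon": a["price"], "price-bn": b["price"]}
        for a in amazon                 # for $a in ...
        for b in bn                     #     $b in ...
        if b["isbn"] == a["isbn"]       # where $b/isbn = $a/isbn
    ]

joined = inner_join(amazon, bn)
print(joined)
```

Books that appear in only one of the two sources produce no output, which is what distinguishes this from the outer joins.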
49 Left-outer joins in XQuery
- <books-with-prices>
- {
- for $a in document("amazon.xml")/book
- return
- <book>
- { $a/title }
- <price-amazon>{ $a/price }</price-amazon>
- {
- for $b in document("bn.xml")/book
- where $b/isbn = $a/isbn
- return <price-bn>{ $b/price }</price-bn>
- }
- </book>
- }
- </books-with-prices>
50 Full-outer joins in XQuery
- let $allISBNs := distinct(document("amazon.xml")/book/isbn union
- document("bn.xml")/book/isbn)
- return
- <books-with-prices>
- {
- for $isbn in $allISBNs
- return
- <book>
- {
- for $a in document("amazon.xml")/book[isbn = $isbn]
- return <price-amazon>{ $a/price }</price-amazon>
- }
- {
- for $b in document("bn.xml")/book[isbn = $isbn]
- return <price-bn>{ $b/price }</price-bn>
- }
- </book>
- }
- </books-with-prices>
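A Python sketch of the same full-outer-join pattern: collect the distinct ISBNs from both sources, then look up each side independently. The sample data and the name full_outer_join are assumptions, and the isbn key in each output entry is added only to make the result readable.

```python
amazon = [{"isbn": "1", "price": 10.0}, {"isbn": "2", "price": 20.0}]
bn = [{"isbn": "2", "price": 18.0}, {"isbn": "3", "price": 30.0}]

def full_outer_join(amazon, bn):
    """One entry per distinct ISBN; a missing side yields an empty list."""
    all_isbns = []
    for book in amazon + bn:              # distinct(... union ...)
        if book["isbn"] not in all_isbns:
            all_isbns.append(book["isbn"])
    result = []
    for isbn in all_isbns:                # for $isbn in $allISBNs
        result.append({
            "isbn": isbn,
            "price-amazon": [a["price"] for a in amazon if a["isbn"] == isbn],
            "price-bn": [b["price"] for b in bn if b["isbn"] == isbn],
        })
    return result

prices = full_outer_join(amazon, bn)
print(prices)
```

An empty list here corresponds to the inner for expression returning the empty sequence, so the book element is still produced when one store does not carry the title.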
51Group-by and Having
- Example
- for $a in distinct(//book/author/lastname)
- let $books := //book[some $y in author/lastname satisfies $y = $a]
- where count($books) > 10
- return <result>
- { $a/name } { $books[1 to 10] }
- </result>
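The pattern is: iterate over the distinct grouping keys (group-by), bind the matching group with let, and filter groups with where (having). A Python sketch with made-up data and smaller thresholds than the slide's 10; the function name prolific_authors is hypothetical.

```python
books = [
    {"title": "B1", "authors": ["Laing"]},
    {"title": "B2", "authors": ["Laing", "Smith"]},
    {"title": "B3", "authors": ["Laing"]},
]

def prolific_authors(books, min_books, keep):
    """Authors with more than min_books books, with their first `keep` titles."""
    lastnames = []
    for book in books:                                   # distinct(//book/author/lastname)
        for name in book["authors"]:
            if name not in lastnames:
                lastnames.append(name)
    result = []
    for a in lastnames:                                  # for $a in ...
        mine = [b for b in books if a in b["authors"]]   # let $books := ...
        if len(mine) > min_books:                        # where count($books) > ...
            result.append((a, [b["title"] for b in mine[:keep]]))  # $books[1 to keep]
    return result

groups = prolific_authors(books, min_books=2, keep=2)
print(groups)
```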
52Views in XQuery
- Views are supported in XQuery via functions
- non-parameterized views via functions with no arguments
- parameterized views via functions with at least one argument
- XQuery supports recursive views
- unrestricted form of recursion
- Termination is not guaranteed automatically
53Many open issues
- Relationship with XPath
- E.g. should we preserve the implicit casting operations of XPath 1.0?
- Relationship with XML Schema
- Bi-directional mapping between the XML Schema concepts and the XQuery type system concepts
- Schema validation vs. type checking
- Name-based subtyping vs. structural subtyping
- Human readable (non-XML) syntax for types?
- XQuery functions and operators built-in library
- More sophisticated support for full text search
- and many more
54XQuery implementations
- Microsoft
- Software AG
- Kweelt
- Lucent
- Univ. Darmstadt
- HiFive.com
- FatDog.com
55XML query language summary
- Expressive power
- Major functionality of XML-QL, XQL, SQL, OQL: query the many kinds of data XML contains!
- Use-case driven approach
- Can be implemented in many environments
- Traditional databases, XML repositories, XML programming libraries, etc.
- Queries may combine data from many sources
- Minimalist design
- Small, easy to understand, clean semantics?
- A quilt, not a camel
56Conclusion
- One language replaces DOM + XPath + XSLT
- Expressive, concise, easy to learn?
- Implementable, optimizable
- Data integration for multiple sources
- Several current implementations
- Preliminary update proposal
- Future work
- Scripting language for XML
- Workflow language for XML
- For more information about the W3C XML Query Language WG activity, please visit W3C XML Query
57Some of XQuery's debates
58Procedural difficulties
- Language designed by a committee
- Hard to avoid the Camel
- Strong interaction with other W3C WG
- Not too much coordination among the W3C WG
- No preexisting global vision or architecture
(bottom up design, like the Web itself !)
59Technical argument (1)
- Problem 1: equality is neither transitive nor reflexive
- $x = 3 and $x = 4 can evaluate to true
- $x < 2 and $x > 4 can also evaluate to true
- $x = 3 and $x != 3 can evaluate to true
- $x = $y and $y = $z does not imply $x = $z
- $x = $x can evaluate to false
- Source
- Equality (and all the other relational operators) has an implicit existential quantifier in XPath 1.0
- Nasty consequences
- high probability of user errors and intense frustration
- good old query evaluation algorithms don't work anymore
- schema evolution is badly handled
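Each of the surprising cases above follows directly from the existential reading of comparisons. A small Python sketch, assuming node sets are modeled as lists of numbers; general_compare is a hypothetical helper name, not an XPath function.

```python
import operator

def general_compare(op, left, right):
    """True if SOME pair (a, b) drawn from left and right satisfies op."""
    return any(op(a, b) for a in left for b in right)

eq, ne, lt, gt = operator.eq, operator.ne, operator.lt, operator.gt

x = [3, 4]
both_equal = general_compare(eq, x, [3]) and general_compare(eq, x, [4])
equal_and_not = general_compare(eq, x, [3]) and general_compare(ne, x, [3])

w = [1, 5]
below_and_above = general_compare(lt, w, [2]) and general_compare(gt, w, [4])

# Not transitive: a = b and b = c hold, but a = c does not.
a, b, c = [1], [1, 2], [2]
transitive_fails = (general_compare(eq, a, b) and general_compare(eq, b, c)
                    and not general_compare(eq, a, c))

# Not reflexive: the empty sequence is not equal to itself.
empty = []
reflexive_fails = not general_compare(eq, empty, empty)

print(both_equal, equal_and_not, below_and_above, transitive_fails, reflexive_fails)
```

Every flag evaluates to True, which is why optimizations that rely on the usual algebraic laws of equality cannot be applied blindly.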
60Technical argument (2)
- Problem 2: implicit data conversions
- from an element to the element content
- from an attribute to the attribute content
- from a sequence to a value (by taking the first member)
- from a typed value to string
- from a string to a typed value
- from any typed value to a Boolean
- (e.g. from a node set to Boolean)
- Examples of bad cases
- //book[price] is not the same as //book[price+0]
- <book>{@price}</book> is not the same as <book>{@price+0}</book>
61Technical argument (3)
- Problem 2: implicit data conversions
- Source
- Backward compatibility with XPath 1.0
- Dealing with the semi-structured aspect of the data
- Trying to avoid static or dynamic errors as much as possible
- Bad consequences
- the result of the evaluation of an expression can depend on the context where the expression appears
- high probability of user errors and intense frustration
62Technical argument (4)
- Problem 3: / is not a simple projection
- (//book sortby @price)/title will be sorted by document order, not by price
- Source
- Backwards compatibility with XPath 1.0
- Bad consequences
- high probability of user errors and more frustration
- / often requires materialization (for sorting and duplicate elimination)
- difficult to parallelize and stream
63You can help
- Designing such a language is VERY hard!
- Your opinion matters!
- A year from now it will be too late
- Please help review the specifications and send comments to www-xml-query-comments@w3c.org
64XML update language
- Declarative update language
- XML data model tree modification
- E.g. nodes deletion, insertion, replacement
- Metadata replacement
- Built on top of the XML query language
- Initial proposal from some of the XML Query WG members
- Not an official working draft of the W3C!
- Already supported by some XQuery implementations
65XML update statements
- Simple update statements
- InsertStatement
- DeleteStatement
- RenameStatement
- ReplaceStatement
- MoveStatement
- Complex update statements
66INSERT statement
- Syntax
- insert expression1 ( into | after | before ) expression2
- Examples
- insert <publisher>Morgan Kaufmann</publisher>
- after //book[title = "The politics of experience"]/title
- insert <comment>This is a great paragraph!</comment>
- before //book[author/lastname = "Laing"]/section[1]/paragraph[2]
67DELETE statement
- Syntax
- delete expression
- Examples
- delete //book/@price[. > 100]
- delete //book[1]/section[1 to 3]/comment()
- delete //comment()
68RENAME statement
- Syntax
- rename expression as expression
- Examples
- rename //book as publication
- rename //book/@price as amazon_price
69REPLACE statement
- Syntax
- replace expression1 with expression2
- Examples
- replace //book[1]/title with <title>Some new title</title>
- replace //book[1]/@price/data() with 25.50
- replace //book[1]/@price/data() with //book[1]/@price/data() + 5
70MOVE statement
- Syntax
- move expression1 ( before | after | into ) expression2
- Examples
- move //book[1]/section[1]/paragraph[2] before
- //book[1]/section[2]/paragraph[1]
- move //book[1]/@price into //book[1]/publisher
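To make the tree-modification view of these statements concrete, here is a rough Python analogue using xml.etree.ElementTree. It is a sketch under stated assumptions, not the proposed update language: the helper insert_after and the sample book are made up, ElementTree has no parent pointers so the parent is passed explicitly, and the price is increased by 5 only as an example.

```python
import xml.etree.ElementTree as ET

def insert_after(parent, anchor, new):
    """Insert `new` as the sibling immediately following `anchor` (hypothetical helper)."""
    parent.insert(list(parent).index(anchor) + 1, new)

book = ET.fromstring('<book price="20"><title>Old title</title><section/></book>')
title = book.find("title")

# insert <publisher>Morgan Kaufmann</publisher> after .../title
publisher = ET.Element("publisher")
publisher.text = "Morgan Kaufmann"
insert_after(book, title, publisher)

# delete .../section
book.remove(book.find("section"))

# replace .../title with <title>Some new title</title>
new_title = ET.Element("title")
new_title.text = "Some new title"
position = list(book).index(title)
book.remove(title)
book.insert(position, new_title)

# replace .../@price/data() with a value computed from the old one
book.set("price", str(float(book.get("price")) + 5))

# rename .../@price as amazon_price, then rename the element itself
book.set("amazon_price", book.attrib.pop("price"))
book.tag = "publication"

print(ET.tostring(book, encoding="unicode"))
```

Unlike the declarative statements on the slides, each step here mutates the tree immediately, so the order of the steps matters.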