Title: XML Programming Techniques
1XML Programming Techniques
- Daniela Florescu, Oracle
- Donald Kossmann, ETH
2Why this tutorial?
- Has XML changed the way we build apps?
- No! (just another layer made things worse!)
- Should XML change the way we build apps?
- Yes! (our hypothesis)
- So what are the options/tradeoffs?
3Overview
- Introduction
- Applications & Architectures
- Interfaces to existing languages (Java, .NET, ...)
- XML APIs: SAX, DOM, StAX
- Code generators: JAXB 2.0, XML Beans, SDO, EMF
- Extensions to existing programming languages
- JavaScript (ECMA), AJAX, PHP
- SQL/XML
- Microsoft's XLinq
- Native XML Programming Languages
- Domain-specific languages: BPEL
- Pure XML Type System: XQuery, XSLT, XQueryP
- Research: Curl, XL, XDuce, Links, XQuery!, SIMKIN
- Comparison of existing solutions
4Killer Advantages of XML
- Platform/vendor independent, international (Unicode)
- Human and machine readable
- Serialization of data
- Hype and people
- Tools and human resources available
- Standardization, secure investment
- Family of technologies
- XQuery, XML Schema, SOAP, WS-Security, ...
- (all building blocks for SOA)
- XML is not new!
- Best of breed from OO, DB, documents, distributed systems, ...
5Killer Advantages
- Decouple Data from Application
- Data lives longer than code (legacy problem)
- Data first, schema later (pay as you go along)
- Spectrum: unstructured to structured data
- Potentially all data
- Pay as you go along
- Spectrum: data, metadata, code
- Potentially all information
- Avoid technology jungle: one size fits all
6Some Problems of XML
- Not complete: pieces of the puzzle are missing
- RDF compatibility, programming, ...
- Bottom-up standardization
- Bottom-up product development
- Too much fluff
- Do you need processing instructions?
- No references, no support for N:M relationships
- No design methodology
- ER / UML were not designed for XML
- Some things are good and bad
- Lexical and binary representation of data
- All data are context-sensitive (no cut & paste!)
7Why is programming for XML different?
- XML is not based on entities and relationships
- XML decouples data from its interpretation
- Data first, schema later
- Spectrum: unstructured to structured data
- Spectrum: data, metadata, code
- Don't bury the killer advantages of XML in the programming language!
8Typical XML Applications
- Blogs: RSS, Atom
- Why XML: platform-independent, serialization, structured-unstructured data
- Unused potential: RSS as a building block of any streaming application
- EAI: Web Services, REST
- Why XML: family of standards, serialization, platform-independent, machine readable
- Unused potential: performance, declarative programming, strong typing
9Typical XML Applications (ctd.)
- Office: OpenOffice, Microsoft Office
- Why XML: structured-unstructured data, hype
- Unused potential: ???
- Scientific Data
- Why XML: data first/schema later, hype, structured-unstructured data
- Unused potential: ???
- Eclipse (XMI), Configuration Files
- Why XML: XML is not new, human readable, data/code/metadata
- Unused potential: data first/schema later
10XML Architectures
[Figure: three layers labeled XML, Objects, SQL]
- XML: another layer for communication & presentation
- Leave everything else as before
- XML makes things worse (another layer)
- More marshalling, more logging, more complexity
11XML Architectures
[Figure: SQL, XML, and Objects on a common runtime]
- Common runtime: ideally no marshalling
- Exploit best of all worlds
- Not clear how to do the cut
- Example: Microsoft LINQ
12XML Architectures
[Figure: architecture diagram with one component labeled XML]
- XML used by different components at different layers for different purposes
- Examples: Eclipse, PHP (most frameworks)
13XML Architectures
[Figure: architecture diagram with seven boxes labeled XML]
- XML everywhere and nowhere
- Example: WebLogic, WebSphere
14XML Architectures
[Figure: architecture diagram consisting of a single block labeled XML]
- XML everywhere
- Only a little bit of native code
- Jim Gray: Extremist Approach (ACM Queue)
- Example: XQuery, XQueryP
15What is right for me?
- How deep does the XML go into architecture?
- Wrap XML as an additional layer
- How big is wrapper compared to rest of code?
- Am I too lazy to learn a new language?
- Cost to train people, how safe is that investment
- What tools support my SE process?
- Do I have a methodology for the XML app?
- What application? What computations?
- What kind of XML data?
- Persistent, data on the wire, typed, distributed, ...
- What kind of XML data model?
- Serialized XML, Infoset, PSVI, XDM, ...
16What is right for me?
- Optimizability, performance
- Cost for data marshalling
- Can I stream data? No need to parse the whole message
- Do things several times (e.g., logging, checking integrity)
- Productivity of programmers
- Technology jungle vs. one unified model
- Optimization, logging, ... are all automatic: focus on application logic and not on mundane tasks
- Static typing of programs
- Programming style (declarative vs. imperative)
- Standard compliance: W3C XML family
- Other domain-specific goodies
- Support for push / events, error handling, logging, asynchronous computation, ...
- Exploits / exposes killer advantages of XML
- XML syntax?
18Overview of XML APIs
- DOM
- Any XML application, updates & navigational read
- SAX
- Low-level XML processing, no updates, only forward navigation
- StAX (JSR 173), XMLPullParser
- Low level like SAX, but pull (instead of push)
- TokenIterator (BEA XQuery processor)
- Like JSR 173, but full support for XQuery data model
- XQJ / JSR 225
- Standard for Java interface for XQuery results
- Microsoft XMLReader Streaming API
- Microsoft's streaming XML interface
- (Many more that I have omitted.)
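The push/pull distinction between SAX and StAX can be sketched with Python's standard library, which offers analogous interfaces: `xml.sax` (push, callback-driven) and `xml.etree.ElementTree.XMLPullParser` (pull, the application asks for events). This is only an illustration of the two styles, not the Java APIs named above; the document and the `OwnerHandler` class are made up for the example.

```python
import xml.sax
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for this illustration.
DOC = ("<accounts><account><owner>D. Duck</owner></account>"
       "<account><owner>M. Mouse</owner></account></accounts>")


class OwnerHandler(xml.sax.ContentHandler):
    """Push style: the parser drives and calls back for every event."""

    def __init__(self):
        super().__init__()
        self.owners = []
        self._in_owner = False
        self._text = []

    def startElement(self, name, attrs):
        if name == "owner":
            self._in_owner = True
            self._text = []

    def characters(self, content):
        # Text may arrive in several chunks, so collect the pieces.
        if self._in_owner:
            self._text.append(content)

    def endElement(self, name):
        if name == "owner":
            self.owners.append("".join(self._text))
            self._in_owner = False


def owners_push(doc):
    handler = OwnerHandler()
    xml.sax.parseString(doc.encode("utf-8"), handler)
    return handler.owners


def owners_pull(doc):
    """Pull style: the application drives and fetches events when ready."""
    parser = ET.XMLPullParser(events=("end",))
    parser.feed(doc)
    parser.close()
    return [elem.text for _event, elem in parser.read_events()
            if elem.tag == "owner"]
```

Both functions return the same list of owner names. The push version has to keep parsing state in handler fields, while the pull version keeps control flow in the caller, which is the usual argument for pull-style APIs.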
19Classification Criteria
- Navigational access?
- Random access (by node id)?
- Decouple navigation from data reads?
- Updates?
- Infoset or XQuery Data Model?
- Target programming language?
- Target data consumer?
20Decoupling
- Idea
- methods to navigate through data (XML tree)
- methods to read properties at current position (node)
- Example: DOM (tree-based model)
- navigation: firstChild, parentNode, nextSibling, ...
- properties: nodeName, getNamedItem, ...
- (updates: createElement, setNamedItem, ...)
- Assessment
- good: read parts of document, integrate existing stores
- bad: materialize temp. query results, transformations
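A minimal sketch of the decoupled style, using Python's `xml.dom.minidom` as a stand-in for a DOM implementation. Navigation (`firstChild`, `nextSibling`, `parentNode`) and property reads (`nodeName`, `getNamedItem`) are separate calls on the current node. The sample document and the `child_elements` helper are assumptions made for the illustration.

```python
from xml.dom.minidom import parseString

# Hypothetical sample document without whitespace between elements.
doc = parseString(
    '<BankAccount><owner id="4711">D. Duck</owner>'
    '<balance curr="EUR">123.54</balance></BankAccount>'
)


def child_elements(node):
    """Walk the children with navigation calls, read names with property calls."""
    names = []
    child = node.firstChild               # navigation
    while child is not None:
        if child.nodeType == child.ELEMENT_NODE:
            names.append(child.nodeName)  # property read at current position
        child = child.nextSibling         # navigation
    return names


root = doc.documentElement
owner = root.firstChild
owner_id = owner.attributes.getNamedItem("id").value
parent_name = owner.parentNode.nodeName
```

Only the parts of the tree that the program navigates to are ever read, which is the "read parts of document" advantage listed above.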
21Non Decoupling
- Idea
- Combined navigation & read properties
- Special methods for fast forward, reverse navigation
- Example: TokenIterator (token stream)
- Token getNext(), void skipToNextNode(), ...
- Assessment
- good: fewer method calls, stream-based processing
- good: integration of data from multiple sources
- bad: difficult to wrap existing XML data sources
- bad: reverse navigation tricky, difficult programming model
22Classification of APIs
23Summary XML APIs
- Good: programmers stay in their world
- Bad: APIs are clumsy (not declarative)
- Bad: no logical/physical data independence
- Bad: APIs require data marshalling
- Programming via XML APIs: extreme case
- How deep XML goes into architecture
- How lazy am I to learn a new language
25Code Generators
- Idea
- Input: XML Schema (XSD)
- Output: code in target language (mostly Java)
- Examples
- JAXB: XML <-> Java objects, (un-)marshalling
- Given XML and Java class, automatic translation
- Many similar open source projects (e.g. Castor)
- XML Beans: Java getters and setters for XML
- Compiles Java interfaces based on XSD
- Implements an XML store: XPath/XQuery access
- Open Source, but owned by BEA
- SDO, EMF (see next slides)
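To make the idea concrete, here is a hand-written Python sketch of the kind of binding class a code generator would emit from a schema. Nothing here is produced by JAXB, XML Beans, or any other tool named above; the `BankAccount` type and its `owner` and `balance` fields are assumptions chosen for the illustration.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET


@dataclass
class BankAccount:
    """Typed object view of a <BankAccount> element (hypothetical schema)."""
    owner: str
    balance: float

    @classmethod
    def unmarshal(cls, xml_text):
        # XML -> object: every field is converted to its declared type.
        root = ET.fromstring(xml_text)
        return cls(owner=root.findtext("owner"),
                   balance=float(root.findtext("balance")))

    def marshal(self):
        # Object -> XML: the element layout is fixed by the binding.
        root = ET.Element("BankAccount")
        ET.SubElement(root, "owner").text = self.owner
        ET.SubElement(root, "balance").text = repr(self.balance)
        return ET.tostring(root, encoding="unicode")


account = BankAccount.unmarshal(
    "<BankAccount><owner>D. Duck</owner><balance>123.54</balance></BankAccount>"
)
```

The application code only sees typed getters and setters, but every crossing between XML and objects pays the marshalling cost, and anything in the document that the class does not model is lost.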
26Eclipse Modeling Framework (EMF)
- Background: Model Driven Architecture
- Idea: compile (Java) code from model
- EMF supports the following models
- UML 2.0 diagrams (e.g., IBM Rational Rose)
- XMI (XML Metadata Interchange)
- Annotated Java
- XML Schema (but restricted!!!)
- Reference: http://www.eclipse.org/emf
27EMF ECore and EObject
- ECore is a meta model
- Model to describe models
- All models (UML, etc.) are described with ECore
- Analogue: XML Schema
- EObject is a model to represent instances
- All instances (Java objects) implement EObject
- Analogue: XML instance
28XML Schema vs. ECore
[Figure: XML Schema and ECore connected by "describes" arrows; files shown: XMLSchema.xsd, ECore.ecore, ECore.xsd, XMLSchema.ecore]
29EMF from UML Example
- UML 2.0 Class Diagram
- Generated Java Code
    public interface BankAccount extends EObject {
        String getOwner();
        void setOwner(String value);
        double getBalance();
        void setBalance(double value);
        ...
    }
- Generated code is annotated; it can be manually extended and regenerated
- Generates interfaces & implementation (i.e., class)
- Very big community (!)
30EMF from XSD
    <xsd:schema targetNamespace="..."
                xmlns:xsd="...">
      <xsd:complexType name="BankAccount">
        <xsd:sequence>
          <xsd:element name="owner" type="xsd:string"/>
          <xsd:element name="balance" type="xsd:double"/>
        </xsd:sequence>
      </xsd:complexType>
    </xsd:schema>
- Creates the same Java (interface & class)
- Works for simple cases
- Does not work for complex XML Schemas
- Generated Java not always equivalent to XML Schema
31Summary EMF
- Very popular in MDA community
- If you believe in MDA, here you go
- Technical advantages
- References are part of ECore (fixes XML bug)
- ECore shares some of the XML advantages
- EObjects are strongly typed
- Technical disadvantages (common to all CGs)
- Does not support whole XML Schema
- Does not support declarative programming
- Optimizability à la DB is not likely to happen
- Platform: Java & Eclipse
- If you hate Microsoft, here you go
- Code generators > XML APIs (productivity)
- Schema-based static typing, data independence
32SDO, ADO.NET
- SDO: Service Data Objects (J2EE platform)
- BEA, IBM, Oracle et al.
- ADO: ActiveX Data Objects (.NET platform)
- Microsoft
- Uniform access to data from different sources
- In particular XML, Web sources
- Java or C# interface to access any kind of data
- Protocol for disconnected client/server access
- Client propagates change lists to server
- Implementation: IBM's SDO on top of EMF
- Conceived by IBM as an extension of EMF
- w.r.t. XML binding, similar tradeoffs as EMF
34ECMAScript JavaScript JScript
- History
- Started 1995: Sun and Netscape
- March 1996: Netscape Navigator 2.0
- August 1996: Microsoft IE 3.0 (JScript)
- June 1997, 1998: first standards (ECMAScript)
- Dec. 1999: ECMA-262 (current version): regular expressions, formatting, try/catch, ...
- June 2004, Dec 2005: E4X (ECMAScript for XML)
- Purpose
- enliven Web pages (dynamic Web-based GUIs)
- Scripting language for experts and users
- http://www.ecma-international.org
35ECMAScript Overview
- object-based language (not fully OO)
- Objects have properties (e.g., name, balance)
- Properties contain objects, primitives, methods
- Primitives: e.g., Boolean, String, null
- Properties have attributes (e.g., ReadOnly)
- Objects are created through constructors
- Constructors use prototypes
- Built-in objects: Object, Array, Function, ...
- Example objects: pop-up, menu, text field, ...
- Event-based language (there is no main)
- Attach code to events (mouse, errors, aborts, ...)
- Syntax resembles Java, C, Self
36E4X (ECMA-357)
- Simplify access and manipulation of XML
- DOM perceived as too clumsy
- XML is a primitive (like String, Boolean, ...)
    var x = new XML();
    x = <BankAccount>
          <owner id="4711">D. Duck</owner>
          <balance curr="EUR">123.54</balance>
        </BankAccount>;
37E4X
- Access to elements
- Child access: "."
    x.balance
- Attribute axis: ".@"
    x.balance.@curr
- Iteration
    var total = 0;
    for each (x in allBankAccounts.BankAccount)
        total += x.balance;
- Updates
- Delete nodes
    delete x.comment
- Insert nodes
    x.comment += <comment>blabla</comment>
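For readers without an E4X engine at hand, the same four operations (child access, attribute access, iteration, update) can be approximated with Python's `xml.etree.ElementTree`. This is an analogy only: ElementTree is a library API rather than a language-level XML type, and the sample data is made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data mirroring the BankAccount example.
all_bank_accounts = ET.fromstring(
    "<allBankAccounts>"
    '<BankAccount><owner id="4711">D. Duck</owner>'
    '<balance curr="EUR">123.54</balance><comment>old</comment></BankAccount>'
    '<BankAccount><owner id="4712">M. Mouse</owner>'
    '<balance curr="EUR">10.00</balance></BankAccount>'
    "</allBankAccounts>"
)

x = all_bank_accounts.find("BankAccount")

# Child access (E4X: x.balance) and attribute axis (E4X: x.balance.@curr).
balance = x.find("balance")
currency = balance.get("curr")

# Iteration over all accounts, summing the balances.
total = sum(float(account.findtext("balance"))
            for account in all_bank_accounts.findall("BankAccount"))

# Delete nodes (E4X: delete x.comment), then insert a new comment.
for comment in x.findall("comment"):
    x.remove(comment)
ET.SubElement(x, "comment").text = "blabla"
```

The explicit `float(...)` conversion and the `find`/`findall` calls show what E4X hides behind its dot and `@` operators.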
38AJAX: Asynchronous JavaScript and XML
- Goal: fine-grained interaction between Web browser and Web server
- Faster, more interactive, user-friendly Web GUI
- Web GUI should be as powerful as desktop GUI
- Idea: exploit JavaScript, HTTP and XML
- JavaScript has methods to invoke HTTP requests
- AJAX uses XML to ship data from/to server
- Why so successful?
- Nothing new: it is all there already
- Just do it!
39AJAX Example
- HTML Form
    <form> Product:
      <input type="text" id="pname" onkeyup="autoComp(this.value)"/>
    </form>
- JavaScript
    function autoComp(str) {
      var url = "www.myapp.com/pname.do?" + "p=" + str;
      xmlHttp.open("GET", url, true);
      xmlHttp.send(null);
    }
40PHP
- Compile first, execute later: interpreter
- Compiles into intermediate language
- Executes opcodes (might contain a lot of functionality)
- Dynamically typed language
- Types include integer, float, boolean, string, array (hash), object, null
- Type juggling is automatic at runtime based on context
41PHP Accessing XML
- Treats XML values as if they were native PHP types
- Takes advantage of the new Zend Engine II Overloading API
- Takes advantage of the dynamic nature of PHP
- Uses the Gnome project's libxml2 library
42Simple Access to XML
43Proposal XML Content Store
- Goals
- Process and manage XML data from many sources: web services, RSS feeds, messages, configuration files, user data
- Create an API to abstract CRUD details
- Results
- Allow for rapid application design without worrying about tedious persistence details
- Implementation Example
- API: PHP
- Persistence layer: upcoming release of DB2, code-named Viper, with native XML support
44Summary
- JavaScript, AJAX, PHP are very popular
- Essential building block of Web 2.0
- Good: mature platforms, great community
- Good: domain-specific goodies
- E4X and PHP provide native support for XML
- XML data type
- Syntax to access and manipulate XML
- E4X, PHP are not compatible with standards
- they argue that this is a feature
- Bad: but they do miss some of the XML advantages
46Processing XML with SQL
- Mapping XML data into tuples in a relational database, then use (a variant of) SQL
- User-controlled shredding, then use classical SQL
- Model-driven shredding (Florescu, Kossmann, 99)
- Edge, binary approach: alternatives
- Corresponds to generic APIs (e.g. DOM) for Java
- Plus: very general, integrates well with relational data
- Minus: poor performance
- Schema-based shredding (Shanmugasundaram et al. 99)
- Map XML Schema / DTD to SQL DDL
- Plus: integrates well with relational data
- Minus: missing tools, complicated
- Automatic shredding, then use SQL/XML
- Plus: usability, logical/physical data independence
- Minus: less user control
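The model-driven ("edge") style of shredding can be sketched with Python's `sqlite3` and `xml.etree.ElementTree`. This is a simplified illustration under stated assumptions, not the exact scheme from the cited papers: the `edge` table layout, its column names, and the `shred` function are invented for the example, and attributes and mixed content are ignored.

```python
import sqlite3
import xml.etree.ElementTree as ET


def shred(xml_text, conn):
    """Store every element as one row: (id, parent id, position, name, text)."""
    conn.execute(
        "CREATE TABLE edge (id INTEGER PRIMARY KEY, parent INTEGER, "
        "ord INTEGER, name TEXT, value TEXT)"
    )
    counter = 0

    def visit(elem, parent_id, ordinal):
        nonlocal counter
        counter += 1
        node_id = counter
        text = (elem.text or "").strip() or None
        conn.execute("INSERT INTO edge VALUES (?, ?, ?, ?, ?)",
                     (node_id, parent_id, ordinal, elem.tag, text))
        for position, child in enumerate(elem, start=1):
            visit(child, node_id, position)

    visit(ET.fromstring(xml_text), None, 1)


conn = sqlite3.connect(":memory:")
shred("<bib><book><title>T1</title><author>A</author></book>"
      "<book><title>T2</title></book></bib>", conn)

# The path //book/title becomes a self-join on the generic edge table.
titles = [row[0] for row in conn.execute(
    "SELECT t.value FROM edge b JOIN edge t ON t.parent = b.id "
    "WHERE b.name = 'book' AND t.name = 'title' ORDER BY b.id"
)]
```

Any document fits this one table, which is the "very general" plus; every path step costs one more self-join, which is where the poor performance comes from.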
47History of SQL / XML
- First edition: part of SQL:2003
- Part 14 of the SQL standard
- Pre-dates XQuery standard
- Limited functionality: storage and publishing
- Second edition: work in progress
- More complete integration of XQuery & XQuery Data Model
- Advanced query capabilities
- Expected to be published in 2006
48XML Type in SQL
- A new type (like varchar, date, numeric)
- SQL:2003: XML type restricted to
- XML document or
- XML element or
- Sequence of XML elements
- SQL / XML, 2nd edition
- Full support of XQuery Data Model
- XML(SEQUENCE), XML(ANY CONTENT), ...
49Example (SQL:2003)
    create table books(
      title varchar(20),
      authors XML)
No schema validation, no typing!
50XML View on Relational Data
[Table: Phantasy-People]
    SELECT XMLGEN('<Person id="{$Id}">{$Name}</Person>') AS Person
    FROM Phantasy-People
51XML View on XML Data
    SELECT Title, XMLGEN('<pa>{$Authors[1]/text()}</pa>') AS PrimA
    FROM MyAuthors
52XMLAGG
SalesTable
    SELECT Product, XMLAGG(XMLELEMENT(NAME S, Sales)) AS AllSales
    FROM SalesTable
    GROUP BY Product
53SQL / XML 2nd Edition
- XML datatype will support XQuery data model
- XML(UNTYPED CONTENT): old XML infoset model
- XML(SEQUENCE): holds heterogeneous sequences
- ... (other parameterized types: validated data possible! Non-well-formed XML data possible, too.)
- Full XML Schema support and validation
- XMLQuery() function
- create XML content using XQuery
- XMLTable() function
- Shred XML to relational data using XQuery
- Mapping between SQL & XQuery data model
- XMLCAST between XML and SQL types
54XMLExists
    SELECT Title FROM books
    WHERE XMLEXISTS(Authors, '//author = "et al."')
- Explicit PASSING also possible (see XMLQuery)
55XMLQuery expression
- SQL expression: use in SELECT for constructing XML
    select XMLQuery(
             'for $i in ./PurchaseOrder
              where $i/PoNo = $j/val
              return $i//Item'
             passing p.pocol,
                     xmlelement("val", 2100) as "j"
             returning content)
    from purchaseorder p
- Result
    <Item itemno="21"><Quantity>200</Quantity>..</Item>
    <Item itemno="22"><Quantity>22</Quantity>..</Item>
- pocol maps to the default (context) item
- XMLElement value maps to $j
56XMLTable construct
- Used in FROM clause: translate XML into relational data
- Splits up result into SQL columns, passing always BY REF
    select items.pos, items.itemno, items.quantity
    from purchaseorder p,
         XMLTable('for $i in /PurchaseOrder//Items
                   where $i/Quantity > 200
                   return $i' passing p.pocol
                  columns pos for ordinality,
                          itemno number path 'ItemNo',
                          quantity number DEFAULT 0 path 'Quantity'
         ) items

    POS    ITEMNO      QUANTITY
    ------ ----------- ------------
    1      21          21
    2      22          0
- Relational columns returned in result
- Ordinality returns sequential position
- Default value is used if path does not return a value
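What XMLTable does to each row can be imitated in a few lines of Python: pick the row-generating nodes, number them (ordinality), and evaluate one relative path per column, falling back to a default when the path yields nothing. The `xml_table` helper, its parameter layout, and the sample purchase order are assumptions made for this sketch; the `where` filter of the query above is left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical purchase order; the second item has no <Quantity>.
PO = """<PurchaseOrder>
  <Items><ItemNo>21</ItemNo><Quantity>21</Quantity></Items>
  <Items><ItemNo>22</ItemNo></Items>
</PurchaseOrder>"""


def xml_table(xml_text, row_path, columns):
    """columns: list of (column name, relative path, converter, default)."""
    root = ET.fromstring(xml_text)
    rows = []
    # enumerate(..., start=1) plays the role of FOR ORDINALITY.
    for pos, item in enumerate(root.findall(row_path), start=1):
        row = {"pos": pos}
        for name, path, convert, default in columns:
            text = item.findtext(path)
            # DEFAULT applies when the path returns no value.
            row[name] = convert(text) if text is not None else default
        rows.append(row)
    return rows


rows = xml_table(PO, ".//Items", [
    ("itemno", "ItemNo", int, None),
    ("quantity", "Quantity", int, 0),
])
```

In a database the same mapping runs inside the query engine, so the resulting rows can be joined with ordinary relational tables without leaving SQL.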
57SQL/XML
- Good
- Takes advantage of the entire SQL infrastructure (e.g. triggers, PL/SQL)
- Transactional support
- Scalability, clustering, reliability
- Global optimization (XML and relational)
- Standard implemented and supported by Microsoft, Oracle, IBM, DataDirect, etc.
- Bad
- Requires data to be loaded in the database
- not good for temporary XML data
- not worth the effort for small volumes of data
- database: complex component, hard to fit in an architectural diagram
- Blend of the two languages (SQL, XQuery) isn't natural or easy to use
- XQuery not supported entirely by database engines
- No XML updates à la XQuery yet
59Xlinq in .NET
- http://msdn.microsoft.com/data/linq/
[Figure: LINQ stack. C# and Visual Basic on top of the .NET Common Language Integration and the Standard Query Operators; DLinq: declarative access to persistent relational data; XLinq: declarative access to transient XML data]
60XLinq main concepts
- XML type added as a basic type (C#, VB)
- Infoset, no typed data
- No support for the XML Data Model (XDM)
- Temporary, not persistent XML data
- Library of basic XML manipulation functions (e.g. navigation, construction)
- Basic .NET Standard Query Operators
- Collection-oriented set of operations
- Second order
- General, not XML specific
- High-level syntax similar to SELECT-FROM-WHERE
- Natively integrated with the language, not through APIs
- Goal: eliminate the need for DOM processing
61.NET Standard Query Operators
- Set of second-order operators
- similar to the relational algebra
- Work on all ordered collections in .NET
- In particular, they work on collections of XML elements
- Build your own algebraic query execution plan by hand!
62.NET Standard Query Operators
- Where(selectFunction)
    Items.Where(i => i.price < 100)
- Select(mappingFunction)
    Products.Select(p => new {p.name, p.price})
- SelectMany(mappingFunction)
    Customers.SelectMany(c => c.orders)
- Take, Skip
    Products.OrderByDescending(p => p.price).Take(3)
- TakeWhile, SkipWhile(predicate)
    Products.OrderByDescending(p => p.price).TakeWhile(p => p.price < 100)
63.NET Standard Query Operators
- Join(outer, inner, outerKeySelection, innerKeySelection, resultSelector)
    Customers.Join(orders, c => c.CustomerID, o => o.CustomerID,
                   (c, o) => new {c.name, o.Total})
- GroupJoin(outer, inner, outerKeySelection, innerKeySelection, resultSelector)
    Customers.GroupJoin(orders, c => c.CustomerID, o => o.CustomerID,
                        (c, co) => new {c.name, co.Sum(o => o.Total)})
- OrderBy(comparisonFunct), ThenBy(comparisonFunct)
    Collection.OrderBy().ThenBy().ThenBy()
64.NET Standard Query Operators
- GroupBy(collection, keySelector)
- GroupBy(collection, equalityComparer)
- Distinct, Union, Intersect, Except
- Based on GetHashCode and Equals
- ToDictionary(collection, keySelector)
- Creates a one-to-one dictionary
- ToLookup(collection, keySelector)
- Creates a one-to-many dictionary
- Any(collection, predicate), All(collection, predicate)
    products.Any(p => p.price > 100)
- Sum, Count, Min, Max, Average, Aggregate
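The operators listed on these slides are ordinary second-order functions over collections, so their behavior can be illustrated in any language with lambdas. The sketch below uses Python's built-ins and `itertools` as rough counterparts; it is not .NET code, and the `products` data is invented for the example.

```python
from itertools import takewhile
from operator import itemgetter

# Hypothetical sample collection.
products = [
    {"name": "pen", "price": 2, "category": "office"},
    {"name": "desk", "price": 150, "category": "furniture"},
    {"name": "lamp", "price": 40, "category": "furniture"},
]

# OrderByDescending
by_price_desc = sorted(products, key=itemgetter("price"), reverse=True)

# Where + Select
cheap_names = [p["name"] for p in products if p["price"] < 100]

# Take
top_two = [p["name"] for p in by_price_desc[:2]]

# TakeWhile: stops at the first element that fails the predicate.
leading_expensive = [p["name"] for p in
                     takewhile(lambda p: p["price"] >= 100, by_price_desc)]

# Any
any_over_100 = any(p["price"] > 100 for p in products)

# ToLookup: a one-to-many dictionary keyed by category.
names_by_category = {}
for p in products:
    names_by_category.setdefault(p["category"], []).append(p["name"])

# Sum
total_price = sum(p["price"] for p in products)
```

As on the slides, the programmer picks the operator order by hand: sorting before taking, filtering before projecting. Nothing reorders these steps automatically.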
65Constructing XML data
- C#, VB: (nested) functional notation
    new XMLElement("person",
        new XMLAttribute("age", 45),
        new XMLElement("name", "Patrick Hines"),
        new XMLElement("phone", "425-555-0144"))
- VB 9.0: inlined XML with dynamic content
    <contact>
      <name><%= myName %></name>
    </contact>
66A more complex example
    new XMLElement("contracts", contracts.
        Where(c => c.address.city == "New York").
        OrderBy(c => c.age).
        Select(c => new XMLElement("contact",
            new XMLElement("name", c.name),
            new XMLElement("phone", c.phone))))
Linq works across data models (objects, tuples, XML)
67Navigation primitives in XLinq
- Similar to the path axes in XPath 1.0
- Nodes(): retrieves all the children
- Elements(): retrieves all element children
- Elements(name): selects child elements by name
- Attributes()
- Parent()
- Descendants()
- Etc.
68Updating primitives in XLinq
- Add()
- add new content to an existing XML tree
- Remove()
- Delete nodes from a tree
- ReplaceContent()
- Replaces the content of a node
- SetElement()
- Particular case of ReplaceContent
- SetAttribute()
69Declarative XML querying in XLinq
- Select-From-Where style syntax directly supported in C# 3.0 (no API barrier)
- Can be logically mapped into a combination of query operators (see above)
    from c in contacts.Elements("contact"),
         average = contacts.Elements("contact").
                   Average(x => (int) x.Element("netWorth"))
    where (int) c.Element("netWorth") > average
    orderby (string) c.Element("name")
    select c
70Conclusion on XLinq
- Good
- Usability for .NET developers (simple tasks)
- Integration with the rest of .NET's tools and libraries
- Bad
- No support for typed data
- No static analysis
- No schema-based static typing
- No optimization based on static knowledge
- Blend of imperative and declarative code is problematic
- Semantics: lazy evaluation
- Semantics: error handling
- Semantics: imperative "and" and "or" are non-commutative
- Optimization: global dataflow analysis is hard
- Optimization: streaming and indexing are explicit
72WS-BPEL
- Web Service Business Process Execution Language (version 2.0)
- OASIS, May 2006 working draft
- Not a general purpose programming language
- Designed for a specific task
- Specification of the implementation of a Web Service created by the composition and orchestration of other Web Services
- Created by logically merging two previous XML programming languages
- WSFL (IBM)
- XLANG (Microsoft)
- Implemented by Microsoft, Oracle, IBM, SAP
73WS-BPEL programs
[Figure: order process exchanging the messages: place order, order confirmation, ship order, pickup notification, receive invoice, send invoice respond, payment confirmation. Process steps shown, in order:]
    receive place order
    send ship order
    if (shipCompleted) send order notice (completed) else send order notice (!completed)
    receive update notification
    update ship history
    receive invoice
    send invoice response
    receive payment confirmation
    send order confirmation
74Main concepts
- Traditional workflow concepts adapted to the reality of XML and Web Services
- Ports, messages and operations (WSDL)
- Describe the external interface of the process
- Activities
- Describe how various components are assembled into complex execution logic
- Variables
- Internal state of the program
- Error and compensation handlers
- Describe the behavior in case of dynamic faults
- Correlation sets
- To describe how various process instances participate in complex conversations
- Scopes
75WS-BPEL query and expression languages
- XML data model, query language and expression language are black boxes for the main language
- By default: Infoset (untyped data) and XPath 1.0
- Uses XSLT 1.0 for data transformation (doXslTransform)
- Allows other data models and languages
- XDM (XQuery Data Model)
- XPath 2.0
- XQuery
    <assign>
      <copy>
        <from>$po/lineItem[@prodCode=$myProd]/amt * $exchRate</from>
        <to>$convertPO/lineItem[@prodCode=$myProd]</to>
      </copy>
    </assign>
76WS-BPEL simple activities
- assign and copy
- invoke
- receive
- throw
- wait
- empty
- exit
- user defined activities (extensibility mechanism)
77WS-BPEL structured activities
- sequence
- if
- while
- repeatUntil
- pick
- selectively choosing an activity
- flow
- for parallel and control dependency processing
- forEach
78WS-BPEL active behavior
- Each scope can have event handlers
- They execute concurrently
- They start when the parent scope starts
- OnEvent
- Waiting for a particular type of message
- OnAlarm
- For (duration value), until (specific point in time)
- repeatEvery
79WS-BPEL error handling
- Support for Long Running Transactions
- Mechanism for specifying the compensation logic (sagas)
- Compensation handlers associated with scopes
80Compensation example
    <scope>
      <compensationHandler>
        <invoke partnerLink="Seller" portType="Purchasing"
                operation="CancelPurchase"
                inputVariable="getResponse"
                outputVariable="getConfirmation">
          <correlations>
            <correlation set="PurchaseOrder" pattern="request"/>
          </correlations>
        </invoke>
      </compensationHandler>
      <invoke partnerLink="Seller" portType="Purchasing"
              operation="Purchase"
              inputVariable="sendPurchaseOrder"
              outputVariable="getResponse">
        <correlations>
          <correlation set="PurchaseOrder" pattern="request" initiate="yes"/>
        </correlations>
      </invoke>
    </scope>
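The saga idea behind compensation handlers can be sketched outside BPEL: each completed step registers an undo action, and on failure the registered actions run in reverse order. The `Saga` class and the `purchase`, `cancel_purchase`, and `ship` functions below are hypothetical names for this illustration and do not correspond to any BPEL engine API.

```python
class Saga:
    """Runs steps and remembers how to undo the ones that completed."""

    def __init__(self):
        self._compensations = []

    def run(self, action, compensation):
        result = action()
        # Register the undo step only after the action succeeded.
        self._compensations.append(compensation)
        return result

    def compensate(self):
        # Undo completed steps in reverse order of completion.
        while self._compensations:
            self._compensations.pop()()


log = []


def purchase():
    log.append("Purchase")


def cancel_purchase():
    log.append("CancelPurchase")


def ship():
    raise RuntimeError("shipping failed")


saga = Saga()
try:
    saga.run(purchase, cancel_purchase)
    saga.run(ship, lambda: log.append("CancelShipment"))
except RuntimeError:
    saga.compensate()
```

Because `ship` fails before it completes, only the purchase is compensated. This mirrors the scope above, where CancelPurchase is the compensation for a successfully completed Purchase invocation.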
81WS-BPEL conclusion
- Good
- Easy specification of Web Services orchestration
- High level
- Useful constructs (parallelism, compensation, events, etc.)
- Bad
- Separation between control flow and expression/query language
- Impact on static typing, automatic optimization, usability
83W3C XQuery, Xpath, XSLT
[Figure: XSLT 1.0 uses XPath 1.0 (1999) as a sublanguage. XPath 2.0 (2006) extends XPath 1.0 and is almost backwards compatible. XSLT 2.0 uses XPath 2.0 as a sublanguage. XQuery 1.0 extends XPath 2.0 with FLWOR expressions, node constructors, and validation.]
84XQuery 1.0 vs. XSLT 2.0
- Equivalent expressive power
- Same data model, type system, function library
- Different programming paradigms
- Iteration-based for XQuery
- Recursive template-based for XSLT
- Two different syntaxes for the same language
- XQuery easier when shape of the data is known
- XSLT easier to use when shape of the data is unknown
- Implementations often use the same runtime for both
- Oracle, Saxon
- Better language integration in the future
[Figure: XQuery and XSLT 2.0 layered on XPath 2.0, the function library, the XML type system (XML Schema), and the XML Data Model (XDM)]
85XML Data Model (XDM)
- Abstract (i.e. logical) data model for XML data
- Same role for XPath 2.0, XQuery and XSLT 2.0 as the relational data model for SQL
- Purely logical: no standard storage or access model (on purpose)
- XQuery, XPath 2.0 and XSLT 2.0 are closed with respect to XDM
[Figure: Infoset and PSVI map into the XML Data Model, on which XQuery, XPath 2.0 and XSLT 2.0 operate]
86XML Data Model (XDM)
Remember Lisp?
- Instance of the data model
- a sequence composed of zero or more items
- The empty sequence is often considered as the null value
- Items
- nodes or atomic values
- Nodes
- document | element | attribute | text | namespace | PI | comment
- Atomic values
- Instances of all XML Schema atomic types
- string, boolean, ID, IDREF, decimal, QName, URI, ...
- untyped atomic values
- Typed (i.e. schema validated) and untyped (i.e. non schema validated) nodes and values
87Xpath 2.0/XQuery/XSLT 2.0 type system
- Types are imported from XML Schemas
- Standard static typing for XQuery and XPath 2.0
- Optional feature
- Pessimistic/conservative
- XSLT 2.0 has no standard static typing rules
- Dynamic dispatch makes dataflow analysis very hard
- The goal of the type system is to
- statically detect errors in the queries
- infer the type of the result of valid queries
- statically ensure that the result of a given query is of a given (expected) type if the input dataset is guaranteed to be of a given type
88What is XQuery ?
- A programming language that can express arbitrary XML to XML data transformations
- Logical/physical data independence
- Declarative
- Side-effect free
- Strongly typed language
- An expression language for XML.
- Such expressions are embeddable in a variety of environments (programming languages, APIs, etc.)
89XQuery vs. SQL
[Figure: SQL and XQuery compared on persistent data, large volume, transacted data, and declarative processing]
- SQL works on the relational data model. XQuery works on the XML Data Model (XDM).
- Is XQuery the XML replacement for SQL? No. XQuery is not a query language, but a declarative programming language.
90XQuery programs
- An XQuery program
- a prolog + an expression
- Role of the prolog
- Populate the context where the expression is compiled and evaluated
- Prolog contains
- namespace definitions
- schema imports
- default element and function namespace
- function definitions
- collation declarations
- function library imports
- global and external variable definitions, etc.
- The prolog is the link between the XQuery expression and the environment where the expression is embedded
91XQuery expressions
    XQuery Expr := Constants | Variable | FunctionCalls | PathExpr |
                   ComparisonExpr | ArithmeticExpr | LogicExpr |
                   FLWRExpr | ConditionalExpr | QuantifiedExpr |
                   TypeSwitchExpr | InstanceofExpr | CastExpr |
                   UnionExpr | IntersectExceptExpr |
                   ConstructorExpr | ValidateExpr
- Expressions can be nested with full generality!
- Functional programming heritage.
92Path expressions
- doc("bibliography.xml")/bib
- $x/child::bib/child::book/@year
- $x/parent::*
- $x/child::*/descendant::comment()
- $x/child::element(*, ns:PoType)
- $x/attribute::attribute(*, xs:integer)
- $x/ancestor::document-node(schema-element(ns:PO))
- $x/(child::element(*, xs:date) | attribute::attribute(*, xs:date))
- $x/f(.)
93FLWOR expressions
- Similar to the Select-From-Where of SQL
- Clauses: FOR, LET, WHERE, ORDER BY, RETURN
- Example
- for $x in //bib/book  (: similar to FROM in SQL :)
- let $y := $x/author  (: no analogy in SQL :)
- where $x/title = "The politics of experience"  (: similar to WHERE in SQL :)
- order by $x/year  (: similar to the ORDER BY clause :)
- return count($y)  (: similar to SELECT in SQL :)
FOR var IN expr
LET var := expr
WHERE expr
ORDER BY expr
RETURN expr
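For readers who think in imperative terms, the clauses of the example query can be traced step by step. The following is only a sketch in Python using the standard library's ElementTree, not how an XQuery engine evaluates the query; the sample document, the function name `author_counts`, and the numeric comparison of years are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; element names follow the query on the slide.
BIB = """<bib>
  <book><title>The politics of experience</title><year>1990</year>
    <author>A</author><author>B</author></book>
  <book><title>Another title</title><year>1980</year>
    <author>C</author></book>
  <book><title>The politics of experience</title><year>1970</year>
    <author>D</author></book>
</bib>"""


def author_counts(xml_text, wanted_title):
    """Imperative reading of the FLWOR example (illustrative only)."""
    root = ET.fromstring(xml_text)
    tuples = []
    for book in root.iter("book"):                   # FOR: one tuple per book
        authors = book.findall("author")             # LET: bind the whole sequence
        if book.findtext("title") == wanted_title:   # WHERE: filter the tuples
            tuples.append((book, authors))
    # ORDER BY: years are compared as integers here (an assumption)
    tuples.sort(key=lambda t: int(t[0].findtext("year")))
    return [len(authors) for _, authors in tuples]   # RETURN: one value per tuple
```

Note that LET binds all authors of a book at once, which is why the slide says it has no analogy in SQL.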
94Node constructors
- Constructing new nodes
- Elements, attributes, documents, processing instructions, comments, text
- Constant vs. dynamically evaluated content
- <result>
- literal text content
- </result>
- <result>
- {$x/name}
- </result>
- <result>
- some content here {$x/text()} and some more here
- </result>
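The difference between constant and dynamically evaluated content can be mimicked in a host language. This is a rough sketch with Python's ElementTree, assuming `x` is an element bound like the variable in the constructors above; the function names are hypothetical, and the code only approximates the copy semantics of XQuery constructors.

```python
import copy
import xml.etree.ElementTree as ET


def result_with_names(x):
    """Rough analogue of a <result> element whose content is x/name."""
    result = ET.Element("result")
    for name in x.findall("name"):
        node = copy.deepcopy(name)   # constructors copy nodes, they do not move them
        node.tail = None             # drop any text that followed the original node
        result.append(node)
    return result


def result_with_text(x):
    """Rough analogue of mixing literal text with the text children of x."""
    # Text children of x: its leading text plus the text after each child element
    direct_text = (x.text or "") + "".join(child.tail or "" for child in x)
    result = ET.Element("result")
    result.text = "some content here " + direct_text + " and some more here"
    return result
```

The source element is left untouched in both cases, which is the point of constructing new nodes instead of updating existing ones.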
95Functions in XQuery
- In-place XQuery functions
- declare function ns:foo($x as xs:integer) as element()
- { <a>{$x + 1}</a> };
- Can be recursive and mutually recursive
- Support for external functions
- Support for library of modules
XQuery functions play the role of database views
96Dynamic dispatch in XSLT
- Order of templates depends on the data
- Very useful while dealing with irregular XML structures
- <xsl:template match="/">
- <axsl:stylesheet version="2.0">
- <xsl:apply-templates/>
- </axsl:stylesheet>
- </xsl:template>
- <xsl:template match="elements">
- <axsl:template match="/">
- <axsl:comment select="system-property('xsl:version')"/>
- <axsl:apply-templates/>
- </axsl:template>
- </xsl:template>
- <xsl:template match="block">
- <axsl:template match="{.}">
- <fo:block><axsl:apply-templates/></fo:block>
- </axsl:template>
- </xsl:template>
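The idea that the data, not the program, decides which template runs next can be sketched outside XSLT. The following Python sketch assumes a much simpler matching model than XSLT (templates are keyed by element name only, no priorities or modes, and text following child elements is ignored); all names in it are hypothetical.

```python
import xml.etree.ElementTree as ET

TEMPLATES = {}   # element name -> rule function


def template(match):
    """Register a rule for elements with the given name."""
    def register(fn):
        TEMPLATES[match] = fn
        return fn
    return register


def dispatch(node):
    """Pick the rule from the node itself; fall back to the default rule."""
    return TEMPLATES.get(node.tag, default_rule)(node)


def apply_templates(node):
    """Process the children in document order, whatever they turn out to be."""
    return "".join(dispatch(child) for child in node)


def default_rule(node):
    # Loose analogue of XSLT's built-in rule: emit text, recurse into children
    return (node.text or "") + apply_templates(node)


@template("title")
def title_rule(node):
    return "# " + (node.text or "") + "\n"


@template("para")
def para_rule(node):
    return (node.text or "") + "\n"
```

The rules never call each other directly, so an input with elements in a different order, or with unexpected elements, is still processed without changing the code.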
97XQuery/XPath 2.0 Full Text
- XML data frequently contains text
- XQuery/XPath 2.0 Full Text extension provides search capabilities
- Use case example: RSS/blogs filtering
- FTSelections: special kind of Boolean predicates
- Operators
- words, and, or, not, mild not, order, scope, distance, window, times
- Match options
- Case, diacritics, stemming, thesauri, stop words, language, wildcards
- Scoring
98XQuery Full Text Example
- for $book in doc("http://bstore1.example.com/full-text.xml")/books/book
- let $title := $book/metadata/title[. ftcontains "improving" && "usability" distance at most 2 words ordered at start]
- where count($title) > 0
- return $title
99XML Update facility
- XML Update Facility: W3C Working Draft
- Ability to modify nodes in an XDM instance in a declarative fashion
- Primitive update operations
- insert <age>24</age> into $person[name="Jim"]
- delete $book[@year<2000]
- rename $article as publication
- replace ($books/book)[1] with <book>...</book>
- replace value of $title with "New Title"
100XML Update Facility (2)
- Conditional updates
- if ($book/year < 2000)
- then delete $book/year
- else rename $book/year as publicationTime
- Collection-oriented updates
- for $x in $book
- where $x/year < 200
- do rename $x as oldBook
- XML transformations using the update syntax
- Single snapshot query
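Single-snapshot evaluation can be pictured as two phases: the query runs against an unchanged document while collecting the requested updates, and the updates are applied only once the query has finished. The sketch below mimics the collection-oriented rename above with Python's ElementTree; the function name, the `cutoff` parameter, and the sample shape of the data are assumptions.

```python
import xml.etree.ElementTree as ET


def rename_old_books(root, cutoff):
    """Two-phase evaluation: collect the updates, then apply them all at the end."""
    pending = []                              # pending update list
    for book in root.findall("book"):         # query phase: reads only
        if int(book.findtext("year")) < cutoff:
            pending.append(book)
    for book in pending:                      # apply phase: the snapshot ends here
        book.tag = "oldBook"
    return len(pending)
```

Because nothing changes during the query phase, the order in which the books are visited cannot affect the result.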
102Procedural extensions to XQuery
- Very controversial topic
- Old research
- XL project (Florescu, Kossmann, 2001)
- New Research
- XQuery! (Simeon, Ghelli)
- XQueryP (Carey, Chamberlin, Kossmann, Florescu, Robie)
- Industrial pressure
- E.g. MarkLogic's XML application development platform
- Long history of adding control flow logic to query languages
- More than 15 years of success of PL/SQL and other procedural extensions for SQL
- SQL might have failed otherwise!
103What functionality is missing in XQuery (after adding updates)?
- The ability for programs to see the results of their own side-effects during the computation
- The ability to invoke external computations that cannot participate in a snapshot semantics
- The ability to preserve state during computation
- The ability to recover (in a controlled way) from dynamic errors
104XQueryP proposal
- Submitted by several companies to W3C
- Oracle, BEA, DataDirect, etc
- Under consideration for standardization
- Surprisingly, very small extensions to XQuery can satisfy many new use case scenarios (not all, unfortunately)
105The XQueryP technical proposal
- A well-defined evaluation order for XQuery expressions (sequential order)
- Paradigm shift for the database people
- Does not mean that optimizability is reduced!
- Reduce the granularity of the snapshot to each individual atomic update expression
- Adds three new kinds of expressions
- Block
- Set
- While
106(1) Sequential evaluation order
- Slight modification to existing rules
- FLWOR: the FLWO clauses are evaluated first and result in a tuple stream; then the Return clause is evaluated in order for each tuple. Side-effects made by one row are visible to the subsequent rows.
- COMMA: subexpressions are evaluated in order
- (UPDATING) FUNCTION CALL: arguments are evaluated first, before the body gets evaluated
Required (only) if we add side-effects that are immediately visible to the program, e.g. variable assignments or single-snapshot atomic updates; otherwise the semantics is not deterministic.
107(2) Reduce snapshot granularity
- Today: update snapshot = entire query
- Change
- Every single atomic update expression (insert, delete, rename, replace) is executed and made effective immediately
- Semantics is deterministic because of the sequential evaluation order (point 1)
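The effect of shrinking the snapshot can be observed by reading the document between updates. This Python sketch contrasts the two granularities on a list of `item` elements; the function name, the `immediate` flag, and the element names are assumptions chosen for the illustration.

```python
import xml.etree.ElementTree as ET


def delete_and_observe(root, immediate):
    """Delete every <item>, recording how many items are visible after each step."""
    seen = []
    pending = []
    for item in root.findall("item"):
        if immediate:
            root.remove(item)          # per-update snapshot: effective at once
        else:
            pending.append(item)       # whole-query snapshot: deferred
        seen.append(len(root.findall("item")))
    for item in pending:               # end of the whole-query snapshot
        root.remove(item)
    return seen
```

With `immediate=True` each step sees one item fewer; with `immediate=False` every step still sees the original document, and the deletions only show up after the loop.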
108(3) Adding new expressions
- Block expressions
- Assignment expressions
- While expressions
109Block expression
- Syntax
- { ( BlockDecl ; )* Expr ( ; Expr )* }
- BlockDecl
- declare $VarName TypeDecl? ( := ExprSingle )? ( , $VarName TypeDecl? ( := ExprSingle )? )*
- Semantics
- Declare a set of updatable variables (in order), whose scope is only the block expression
- Evaluate each expression (in order) and make the effects visible immediately
- Return the value of the last expression
- Updating if body contains an updating expression
110Assignment expression
- Syntax
- set $VarName := ExprSingle
- Semantics
- Change the value of the variable
- Variable has to be external or declared in a block (no let, for or typeswitch)
- Updating expression
- Semantics is deterministic because of the sequential evaluation order
111While expression
- Syntax
- while ( ExprSingle ) return Expr
- Semantics
- Evaluate the test condition
- If true, then evaluate the return clause and repeat
- If false, return the concatenation of the values returned by all previous evaluations of return
- Syntactic sugar, mostly for convenience
- Could be written using recursive functions
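The semantics listed above, and the claim that a while expression is only sugar for recursion, can be checked with a small model. In this Python sketch the test and the return clause are passed as zero-argument callables and sequences are modelled as lists; `make_counter` is a hypothetical helper that stands in for an updatable block variable.

```python
def while_expr(test, body):
    """Evaluate body while test() holds; return the concatenated results."""
    out = []
    while test():
        out.extend(body())
    return out


def while_rec(test, body):
    """The same behaviour written as a recursive function."""
    if not test():
        return []
    first = body()                     # evaluated before the recursive step
    return list(first) + while_rec(test, body)


def make_counter(limit):
    """Hypothetical stateful test/body pair: counts from 1 up to limit."""
    state = {"i": 0}

    def test():
        return state["i"] < limit

    def body():
        state["i"] += 1                # side effect visible to the next test
        return [state["i"]]

    return test, body
```

Both versions depend on the body's side effect being visible to the next evaluation of the test, which is exactly what the sequential evaluation order guarantees.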
112Atomic Blocks
- Syntax
- atomic { . . . }
- Semantics
- If the evaluation of Expr does not raise errors, then result is returned
- If the evaluation of Expr raises a dynamic error, then no partial side-effects are performed (all are rolled back) and the result is the error
- Only the largest atomic scope is effective
- Note: XQuery! had a similar construct
- Snap vs. atomic
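The all-or-nothing behaviour can be approximated in a host language by copying the data before running the block. The sketch below is one simple way to get that effect with Python's ElementTree, assuming the whole tree fits in memory; it is not how an XQueryP processor would implement rollback, it does not model nested atomic scopes, and the helper blocks are hypothetical.

```python
import copy
import xml.etree.ElementTree as ET


def atomic(root, block):
    """Run block(root); on error undo every change to root, then re-raise."""
    backup = copy.deepcopy(root)
    try:
        return block(root)             # no error: the result is returned
    except Exception:
        root.clear()                   # drops children, attributes, text, tail
        root.tag = backup.tag
        root.text, root.tail = backup.text, backup.tail
        root.attrib.update(backup.attrib)
        root.extend(list(backup))      # restore the saved children
        raise                          # the result is the error


def delete_first(root):
    """Hypothetical block that succeeds."""
    root.remove(root[0])
    return len(root)


def delete_then_fail(root):
    """Hypothetical block that updates the tree and then fails."""
    root.remove(root[0])
    raise ValueError("simulated dynamic error")
```

After a failed block the tree has the same content as before, although references to child elements obtained earlier now point at detached nodes.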
113XQueryP example
- declare updating function local:prune($d as xs:integer) as xs:integer
- {
- declare $count as xs:integer := 0;
- for $m in /mail/message[date lt $d]
- return { do delete $m;
- set $count := $count + 1
- };
- $count
- }
114More complex example
- declare updating function myNs:cumCost($projects) as element()*
- {
- declare $total-cost as xs:decimal := 0;
- for $p in $projects[year eq 2005]
- return {
- set $total-cost := $total-cost + $p/cost;
- <project>
- <name>{$p/name}</name>
- <cost>{$p/cost}</cost>
- <cumCost>{$total-cost}</cumCost>
- </project>
- }
- }
Today: this requires an additional self-join or a recursive function
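The running total is the kind of computation that becomes a plain loop once state can be preserved between iterations. As a point of comparison, here is the same logic in Python with ElementTree; the function name `cum_cost`, the `year` parameter, and the choice to copy text values only (not whole elements) are assumptions of this sketch.

```python
from decimal import Decimal
import xml.etree.ElementTree as ET


def cum_cost(projects, year=2005):
    """Emit one <project> per matching input, carrying a running cost total."""
    total = Decimal(0)                          # the updatable block variable
    out = []
    for p in projects:
        if p.findtext("year") != str(year):     # filter on the year
            continue
        total += Decimal(p.findtext("cost"))    # assignment, visible to later rows
        project = ET.Element("project")
        ET.SubElement(project, "name").text = p.findtext("name")
        ET.SubElement(project, "cost").text = p.findtext("cost")
        ET.SubElement(project, "cumCost").text = str(total)
        out.append(project)
    return out
```

Without mutable state, each output row would have to recompute the sum over all earlier rows, which is the self-join the slide refers to.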
115XQueryP conclusion
- If successful, can provide a platform for building XML-only applications
- No more SQL, no more Java/C
- Declarative programming and usability
- Good: less code, higher level
- Bad: fewer programmers can do it, harder debugging
- Automatic optimization
- Compilers will be very complex to build
- Better chances of success
116Research projects
- XL
- Web Services implementation
- Xduce
- Static typing, pattern matching
- Links
- XML programming without tiers
- XQuery!
- Make XQuery fully compositional with side-effects
- User controlled granularity for snapshots
118XML programming for what kind of application ?
- Simple XML serialization for communication (XML at the end)
- XLinq, Java APIs
- Web distributed XML communication
- Ajax
- Complex XML computations (HealthCare7, XBRL)
- XQuery, XQueryP, XLinq
- Orchestration of Web Service messages
- BPEL
- Process a mix of relational and XML data
- SQL/XML
- Formatting XML content
- XSLT
- Unfortunately, many (most) applications have several of those needs at the same time!
- Changing paradigms is very costly
119What community what background?
- XML is a unifying factor for the various CS communities
- For the moment, each community wrongly believes it is solving the XML problem
- Global XML picture missing in each community
[Figure: communities around XML: programming languages, databases, content management, workflow]
120XML programming where in the architecture ?
- What tier in the architecture ?
- Client, server, middle tier ?
- Same language on all the tiers ?
- XQuery can run on all tiers
- ECMAScript and PHP weren't designed to scale on a large server, but for the middle tier
- Which one will run on a mobile phone ?
- XML might have an impact on the existence of today's multi-tiered architectures
[Figure: tiers, top to bottom: Client (XHTML, scripts); Communication (XML); Application logic (Java/C/PhP); Storage (supports XML)]
121Programming style
- All styles
- Imperative programming APIs (Java DOM/SAX)
- Declarative (XQuery, XQueryP)
- Imperative + declarative (XLinq)
- Workflow (BPEL)
- Recursive template (XSLT)
- Choice
- Usability: based on what people are already used to doing
- Performance: declarative is easier to optimize
- None of those alternatives provides a complete XML programming solution
- All will evolve in the future
- Which one will provide all the functionality required ?
122How much weight does XML have in the language ?
- One of thousands of APIs
- E.g. Java DOM
- Language is agnostic to the existence of XML
- More serious syntactic extension
- XLinq, SQL/XML
- XML is one feature among others in the language
- Nothing but XML
- XQuery, XPath, XSLT, XQueryP, BPEL
- Try to process real XML (complex or not, good or bad), not to simplify it or fix it
- XML is a given
123Compliance to the W3C family of standards
- XML is not an orphan: it comes with an Italian-style family of W3C standards
- Infoset, Namespaces, XML Schema, XLink, XForms, XHTML, binary XML, etc.
- Forced to live well together by W3C rules