Title: Web services and data integration
1Web services and data integration
- S. Abiteboul Omar Benjelloun Tova Milo
- INRIA and Xyleme INRIA INRIA and Tel Aviv
- Serge.Abiteboul@inria.fr
- Singapore, December 2002
2Organization
- The context
- Accessing information on the Web
- Web services
- SOAP
- WSDL
- UDDI
- Active XML
- AXML documents
- AXML services
- Architecture and implementation
- Applications
- Conclusion
3The context
- The Web and XML are dramatically changing the management of distributed information
4Distributed data management
- Warehousing
- Mediation
- Management of data in cooperative work
- Management of data in distributed scientific applications
- Mobile data management
- Document management
- Web sites
- Portals, etc.
- Information used to live in islands and this is
changing
5The Web of yesterday
- Protocol: HTTP
- Documents: HTML
- Millions of independent Web sites and billions of documents
- Browsing and full-text indexing
- Publication of databases using forms
- Data management with the Web
- HTML is primarily to be read by humans
- Data management applications over Web data
- Based on hand-made wrappers
- Expensive, incomplete, short-lived, not adapted to the constant change of the Web
- No real support for distributed data management!
6Information used to live in islands but it is
changing
- Different formats: relational, metadata, documents, text, DXF
- A Web standard for data exchange, XML, is fixing it
- XML captures all kinds of information over a wide spectrum
- XML comes with a family of emerging standards: XML Schema, XSL/T, XQuery, domain-specific schemas
- Different computers, platforms, languages, applications
- A standard for Web services, SOAP, is fixing it
- SOAP allows ubiquitous computing on the Internet
- SOAP comes with a family of emerging standards: WSDL, UDDI
- This provides a uniform access to information
- The dream for distributed data management
7The information spectrum
[Figure: a spectrum of information running from structured data to semi-structured data and XML, with the labels "Meta data" and "Hierarchy". Examples placed along it: books, contracts, catalogs, bank accounts, emails, financial reports, insurance policies, economical analysis, derivatives, inventory, political analysis, insurance claims, financial news, sports news, resumes.]
8What can be captured with XML?
- Very structured information such as database, knowledge base
- Most DBMS now export in XML
- Semi-structured data such as data exchange formats (ASN.1, SGML), e.g., technical documentation
- Less structured data: documents
- Meta-data: author, date, status
- Existing structure in them: chapter, section, table of contents and index
- Possibly tagging of elements in it (citation, lists)
- Links to other documents
- Plain text
- Meta data for unstructured data such as images and sound
9A standard for information: XML
- Labeled ordered trees where leaves are text
- Marriage of document and database worlds
- Marriage of full-text indexing and structure indexing
- Is it the ultimate data model? No
- Purely syntax: more semantics needed
- Is it OK for now? Definitely yes (because it is a standard)
10The main asset of XML: typing
- Applications need typing and XML data can be typed if needed (DTD and XML Schema)
- Trees
- Logical granularity: neither page nor document level but the piece of information that is needed
- Semantics and structure are in tags and paths
- product-table/product/reference
- product-table/product/price
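
The point about tags and paths can be illustrated with a minimal sketch. It assumes a small, invented catalog whose element names follow the two paths above. The helper `values_at` and the sample values are hypothetical and only show how a path selects the piece of information that is needed.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog: the element names come from the paths on the slide,
# the values are invented for illustration.
CATALOG = """
<product-table>
  <product><reference>A-17</reference><price>12.50</price></product>
  <product><reference>B-42</reference><price>99.00</price></product>
</product-table>
"""

def values_at(xml_text, path):
    """Return the text of every element reached by `path`, starting at the root."""
    root = ET.fromstring(xml_text)
    # The first step of the path names the root element itself.
    first, _, rest = path.partition("/")
    if root.tag != first:
        return []  # the path does not start at this document's root
    if not rest:
        return [root.text]
    return [element.text for element in root.findall(rest)]

references = values_at(CATALOG, "product-table/product/reference")
prices = values_at(CATALOG, "product-table/product/price")
print(references, prices)
```

The granularity is the element, not the page or the document: each path returns only the references or only the prices.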
11A standard for distributed computing: Web services
- Possibility to activate a method on some remote Web server
- Exchange information in XML: input and result are in XML
- Ubiquitous XML distributed computing infrastructure
- 2 main applications
- E-commerce
- Access to remote data
- With XML and Web services, it is possible
- To get information from virtually anywhere
- To provide information to virtually anywhere
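
To make "input and result are in XML" concrete, here is a small sketch that builds a SOAP 1.1 request envelope and reads it back, with no network involved. The method name `GetTemperature` and its `city` parameter are assumptions made up for the example. A real service would define its own method and parameter names in its WSDL description.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace

def build_request(method, params):
    """Wrap a method call and its parameters in a SOAP envelope (as text)."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, method)
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")

def read_request(message):
    """Extract the method name and parameters from a SOAP envelope."""
    envelope = ET.fromstring(message)
    call = envelope.find(f"{{{SOAP_NS}}}Body")[0]  # first child of Body is the call
    return call.tag, {child.tag: child.text for child in call}

# Hypothetical method and parameter, for illustration only.
message = build_request("GetTemperature", {"city": "Paris"})
method, params = read_request(message)
print(method, params)
```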
12The basic picture
[Figure: a Web client sends a query m( ) as XML in SOAP messages over the Internet to a SOAP service, seen as a black box, and receives the answer as XML.]
13Accessing and integrating information
14Accessing remote information
[Figure: an application using gene banks queries some data services that provide candidate genes, then uses some processing services. The gene banks come in multiple formats and multiple protocols.]
15Same with Web services
[Figure: the same application using gene banks, where the data services that provide candidate genes and the processing services are all reached through the Web.]
16The big picture peer2peer
[Figure: Web services and DB Web services exchanging queries over the Web, on top of data warehouses, databases, Web pages, PCs, PDAs and cell phones.]
17The main roles
[Figure: three roles. The service provider publishes to the service registry, the client looks up the registry, and the client binds to the service provider.]
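
The three roles can be sketched in a few lines. This is a toy, in-memory stand-in and not UDDI: the class `ServiceRegistry`, the topic "gismos" and the provider name are all hypothetical.

```python
class ServiceRegistry:
    """Toy in-memory registry: providers publish, clients look up."""

    def __init__(self):
        self._entries = {}  # topic -> list of (provider name, callable service)

    def publish(self, topic, provider, service):
        self._entries.setdefault(topic, []).append((provider, service))

    def look_up(self, topic):
        return list(self._entries.get(topic, []))

# Service provider: publishes its service under a topic (names are made up).
registry = ServiceRegistry()
registry.publish("gismos", "gismo-shop", lambda query: f"gismo-shop answers {query!r}")

# Client: looks up the registry, then binds to the provider it found.
provider, service = registry.look_up("gismos")[0]
answer = service("price of a gismo")
print(provider, answer)
```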
18Simple view: looking for information about Gismos
- Query some yellow-pages
- Who knows about Gismos?
- Negotiate with Gismo specialists
- Nature of the service
- Quality, cost
- Get the information
- Order, payment, delivery
- Integration in my information system
- Eventually publish information
- and all this automatically
19Data integration Logical view
[Figure: a mediator or warehouse consults service directories, gets service descriptions, and reaches source1, source2 and source3 through wrapper1, wrapper2 and wrapper3.]
20The Web service solution
[Figure: the Web service stack on the Web. XML and SOAP at the base; WSDL for data and service description; UDDI for the data and service repository; RDF for data and service semantics; WSFL for workflow.]
21Mediation with Web services
[Figure: a mediator connected through the Web to service directories, service descriptions, and to source1, source2 and source3 through wrapper1, wrapper2 and wrapper3.]
- Web services
- Service directories
- Service descriptions
- Wrappers
- Sources
- Mediators/warehouses
22Advantages for data integration
- A universal model for data integration: XML
- Solves the heterogeneity issue
- A universal protocol for distribution: SOAP
- Simple Object Access Protocol (something like Corba)
- A language for describing the interface of data sources: WSDL
- Web Services Description Language (something like IDL)
- Solves the interoperability issue
- A standard for publication and discovery of information: UDDI
- Universal Description, Discovery and Integration
- A standard for describing the semantics of sources: RDF
- Resource Description Framework
23Advantages (continued): the goal
- The system can find a new source of information using UDDI
- Understand its syntax using WSDL
- Understand its semantics using RDF
- Get it using SOAP
- The information is in XML, can be restructured and integrated automatically
- Not yet. But soon?
24Jargon
Help! WSFL, XHTML, .NET, XML, DTD, RDF, RosettaNet, XSL-FO, Xschema, namespace, XSL, ebXML, XSLT, HTTPS, SOAP, HTTP, OASIS, OAGIS, ICE, MIME, WSDL, UDDI, RSS
25Active XML
- Joint work with Bernd Amann, Jérôme Baumgarten, Angela Bonifati, Ioana Manolescu, Frederic Ngoc and others
26AXML: XML with embedded SOAP calls
[Figure: an AXML peer is both client and server. As a Web client it sends a query m( ) in SOAP messages over the Internet and receives an AXML answer. As a Web server it answers queries q1(1,2), Q2, Q3 (XPath, XQuery) over its AXML data.]
27Active XML
- Peer-to-peer architecture
- Each Active XML peer
- Repository: manages Active XML data with embedded Web service calls
- Web client: activates calls in the documents
- Web server: provides Web services defined as (parameterized) queries over the repository
[Figure: AXML peers communicating through SOAP]
28Build on existing standards
- Tree data: XML
- internal data representation and
- data exchange
- Web services: SOAP, WSDL
- Query languages: XQuery/XPath
[Figure labels: XML, AXML]
29AXML peer: repository of AXML documents
  <directory>
    <dep name="Toy">
      <sc>toy.xyz.com/GetToyPersonel()</sc>
    </dep>
    <dep name="DVD">
      <sc>dvd2000.com/GetDVDPersonnel()</sc>
    </dep>
  </directory>
Service calls: a document may contain calls to any SOAP Web service (e-bay.net, google.com, etc.) and to any AXML Web service
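
As a rough illustration of what "embedded service calls" means for a program reading such a document, the sketch below parses the directory example and lists its calls. It assumes the `<sc>` element holds the call as plain text, as in the slide. The function name `service_calls` is hypothetical.

```python
import xml.etree.ElementTree as ET

AXML = """
<directory>
  <dep name="Toy"><sc>toy.xyz.com/GetToyPersonel()</sc></dep>
  <dep name="DVD"><sc>dvd2000.com/GetDVDPersonnel()</sc></dep>
</directory>
"""

def service_calls(xml_text):
    """Return (department name, call text) for each embedded <sc> element."""
    root = ET.fromstring(xml_text)
    return [(dep.get("name"), sc.text.strip())
            for dep in root.iter("dep")
            for sc in dep.findall("sc")]

calls = service_calls(AXML)
for department, call in calls:
    print(department, "->", call)
```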
30AXML peer: Web client
  <directory>
    <dep name="Toy">
      <person pname="Smith">
        <phone>01</phone>
        <pda>
          <sc>toy.xyz.com/GetPDA(../../@pname)</sc>
        </pda>
      </person>
      <sc>toy.xyz.com/GetToyPersonel()</sc>
    </dep>
    <dep name="DVD">
      <sc>dvd2000.com/GetDVDPersonnel()</sc>
    </dep>
  </directory>
Result
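
The client side can be sketched as follows: each known call is activated and its result is placed next to the call, which stays in the document. This is a simplified sketch under assumptions, not the AXML implementation: the remote service is replaced by a local stub, and `activate_calls` and `SERVICES` are hypothetical names.

```python
import xml.etree.ElementTree as ET

def get_toy_personnel():
    """Stub standing in for the remote SOAP service (invented data)."""
    return [ET.fromstring('<person pname="Smith"><phone>01</phone></person>')]

# Maps the text of a call to the stub that answers it.
SERVICES = {"toy.xyz.com/GetToyPersonel()": get_toy_personnel}

def activate_calls(root, services):
    """Insert each known call's result just before its <sc>, keeping the call."""
    for parent in list(root.iter()):          # snapshot: we mutate while walking
        for sc in parent.findall("sc"):
            stub = services.get(sc.text.strip())
            if stub is None:
                continue                      # unknown service: leave the call as is
            position = list(parent).index(sc)
            for offset, result in enumerate(stub()):
                parent.insert(position + offset, result)
    return root

doc = ET.fromstring(
    '<directory><dep name="Toy"><sc>toy.xyz.com/GetToyPersonel()</sc></dep>'
    '<dep name="DVD"><sc>dvd2000.com/GetDVDPersonnel()</sc></dep></directory>'
)
activate_calls(doc, SERVICES)
toy = doc.find("dep[@name='Toy']")
dvd = doc.find("dep[@name='DVD']")
print(ET.tostring(doc, encoding="unicode"))
```

Keeping the `<sc>` element in place lets the same call be activated again later, for instance when the retrieved data is no longer valid.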
31Controlling the evaluation
- Activation of calls and data lifespan are controlled
- frequency: when is the service called? (call each day)
- validity: how long is the retrieved data valid?
- mode: immediate or lazy?
32Example: control attributes
  <directory>
    <dep name="Toy">
      <sc valid="1 week" mode="immediate">
        toy.xyz.com/GetToyPersonel()
      </sc>
    </dep>
    <dep name="DVD">
      <sc valid="0" mode="lazy">
        dvd2000.com/GetDVDPersonnel()
      </sc>
    </dep>
  </directory>
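
One possible reading of the control attributes is sketched below. The semantics are assumptions, not taken from the slides: "immediate" is read as "refresh as soon as the data has expired", "lazy" as "refresh only when the data is actually requested", and the validity is parsed from strings such as "1 week" or "0". The names `parse_validity` and `must_call` are hypothetical.

```python
from datetime import datetime, timedelta

UNITS = {"day": timedelta(days=1), "week": timedelta(weeks=1)}

def parse_validity(text):
    """Turn '1 week', '3 days' or '0' into a timedelta (assumed syntax)."""
    parts = text.split()
    count = int(parts[0])
    unit = UNITS[parts[1].rstrip("s")] if len(parts) > 1 else timedelta(days=1)
    return count * unit

def must_call(mode, valid, last_called, now, data_requested):
    """Decide whether the service call should be fired now."""
    if mode == "lazy" and not data_requested:
        return False                      # lazy: wait until someone needs the data
    if last_called is None:
        return True                       # never called: nothing to reuse
    return now - last_called >= parse_validity(valid)

now = datetime(2002, 12, 1)
fresh = must_call("immediate", "1 week", now - timedelta(days=3), now, False)
stale = must_call("immediate", "1 week", now - timedelta(days=8), now, False)
lazy_idle = must_call("lazy", "0", now, now, False)
lazy_asked = must_call("lazy", "0", now, now, True)
print(fresh, stale, lazy_idle, lazy_asked)
```

With a validity of 0 the data is never considered fresh, so a lazy call is fired on every request and never in between.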
33AXML peer: Web server
- AXML Web services defined using XQuery over AXML documents
  let service Get-Toy-Personnel( ) be
    for $a in document("toy.xyz.com/members.axml")/member,
        $b in $a//name,
        $c in $a//phone,
        $d in $a//pda
    return <person pname="{$b/text()}"> {$c} {$d} </person>
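
For readers less familiar with XQuery, here is a rough Python rendering of what the service definition above computes. It is only an approximation under assumptions: the content of members.axml is invented, the document is passed as a string instead of being fetched, and the function name is hypothetical.

```python
import copy
import xml.etree.ElementTree as ET

# Invented stand-in for toy.xyz.com/members.axml.
MEMBERS = """
<members>
  <member><name>Smith</name><phone>01</phone><pda>smith-pda</pda></member>
</members>
"""

def get_toy_personnel(members_xml):
    """Build one <person> per (name, phone, pda) combination of each member."""
    root = ET.fromstring(members_xml)
    people = []
    for member in root.iter("member"):
        for name in member.iter("name"):          # $a//name
            for phone in member.iter("phone"):    # $a//phone
                for pda in member.iter("pda"):    # $a//pda
                    person = ET.Element("person", pname=name.text)
                    person.append(copy.deepcopy(phone))
                    person.append(copy.deepcopy(pda))
                    people.append(person)
    return people

people = get_toy_personnel(MEMBERS)
print([ET.tostring(person, encoding="unicode") for person in people])
```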
34The crux: the exchange of AXML data
- Arguments and results of calls are AXML
- Data is thus intensional and dynamic
- Distributed computing: by sending data containing service calls, one can delegate some work to other peers
- Partial computations: by returning data containing service calls, one can give to the receiver the control of these calls
- All this can be controlled
35Example: Tourist guide
- <sc>yahoo.com/Temp(Paris)</sc>
- I need to evaluate the temperature of Paris
- I call Yahoo: <sc>meteoF.com/t(Paris)</sc>
- I call meteoF: <t type="celsius">0</t>
- I am asked what is the temperature of Paris
- <t type="celsius">0</t>
- <sc>meteoF.com/t(Paris)</sc>
- <sc>yahoo.com/Temp(Paris)</sc>
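
The chain in this example can be sketched as a loop that keeps following service calls until plain data is reached. The two services are replaced by a lookup table of canned answers, which is an assumption for illustration, as are the name `evaluate` and the `max_steps` guard against endless chains.

```python
import xml.etree.ElementTree as ET

# Canned answers standing in for the two remote services (illustrative only):
# the first service answers with another call, the second with a value.
SERVICES = {
    "yahoo.com/Temp(Paris)": "<sc>meteoF.com/t(Paris)</sc>",
    "meteoF.com/t(Paris)": '<t type="celsius">0</t>',
}

def evaluate(fragment, max_steps=10):
    """Follow service calls until plain data is reached; return (data, trace)."""
    trace = []
    node = ET.fromstring(fragment)
    while node.tag == "sc":
        if len(trace) >= max_steps:
            raise RuntimeError("too many chained calls")
        call = node.text.strip()
        trace.append(call)
        node = ET.fromstring(SERVICES[call])
    return node, trace

result, trace = evaluate("<sc>yahoo.com/Temp(Paris)</sc>")
print(trace, ET.tostring(result, encoding="unicode"))
```

A peer that is asked for the temperature can stop this loop at any step and return whatever it currently holds: the value, or a call the receiver will have to activate itself.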
36Continuous services
- Inside the tourist guide: new events
- Pull mode: standard SOAP query
- Ask once a week
- Push mode: subscription to a continuous service
- When new events are announced, they are pushed to the AXML document
- Possibility to define AXML continuous services
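
Push mode can be sketched with a plain subscription callback. Everything here is an in-process stand-in under assumptions: the class `ContinuousService`, the `<guide>`/`<events>` document and the event titles are invented, and the SOAP transport is left out.

```python
import xml.etree.ElementTree as ET

class ContinuousService:
    """Toy continuous service: pushes every new event to its subscribers."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def announce(self, title):
        for callback in self._subscribers:
            event = ET.Element("event")   # a fresh element for each subscriber
            event.text = title
            callback(event)

# The tourist guide document subscribes once; events then arrive by themselves.
guide = ET.fromstring("<guide><events/></guide>")
events = guide.find("events")
service = ContinuousService()
service.subscribe(events.append)
service.announce("Jazz festival")
service.announce("Museum night")
print(ET.tostring(guide, encoding="unicode"))
```

In pull mode the guide would instead call the service itself on a schedule, for example once a week.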
37Architecture and implementation
38Global architecture
[Figure: global architecture. AXML peer S1 comprises an XQuery processor, an evaluator, a SOAP wrapper, a SOAP service, a SOAP client and an AXML document store with read and update access. It exchanges queries, service calls and service results (XML and AXML over SOAP) with AXML peers S2 and S3, and consults service descriptions.]
39Implementation
- Sun's Java SDK 1.4 (includes XML parser, XPath processor, XSLT engine)
- Apache Tomcat 4.0 servlet engine
- Apache Axis SOAP toolkit 1.0 beta 3
- X-OQL query processor, persistent DOM repository
- JSP-based user interface, using JSTL 1.0 standard tag library
- First prototype
- No lazy evaluation
- No continuous services
- Ongoing work on typing, security, replication
- Demo for VLDB'02
- P2P auctioning system
40Illustration: 3 applications
41Application 1: Warehousing
- Construction of warehouses with Web data
- Monitoring of changes on the Web
- Kind of services that are used
- Google search engine
- wget
- Classification
- XML Diff and site changes
- Page monitoring system
- etc.
42Application 2: Mobile data
- AXML peers as mobile entities
- Active data store with query capabilities
- Metadata and object profiles
- Issues
- Storage services for mobile objects
- Processing services for mobile objects
- Use proxies for that
- European Project DBGlobe
43Application 2: Mobile data
- Light-weight AXML peers
- PDA, cellular phone, laptop
- Limited storage, network bandwidth
- Sometimes disconnected
- Limited functionalities
- E.g., support for continuous services based on a
mail server and SMTP
44Application 2: context awareness
- Where am I? (geographical position)
- Where is the nearest AXML proxy? (network position)
- Active use of this information
- For providing context-dependent data (e.g., time, temperature, nearest restaurants, etc.)
- For selecting services (e.g., choose a nearby proxy for caching)
45Application 3: P2P Auction
- Each peer proposes some auctions
- The document records the peer's items and the bids
- Each peer knows about some auctions of other peers
- Each peer can bid on any auction
- The peer remembers the bids she has placed
- When an auction closes, the winner is notified
- No centralization
46Conclusion and on-going work
47AXML services
- A simple, declarative way to create Web services compatible with current standards for Web services invocation
- AXML services are powerful tools for data integration
- They allow for new, powerful features
- Intensional parameters and results: AXML documents (containing service calls) that are exchanged
- Continuous services: send back a stream of answers (SOAP messages) to the caller
48Many issues
- Security
- Typing of parameters
- Lazy evaluation and optimization
- Replication
- Mobility: DBGlobe project
- Termination
- Implementation
- Foundations
- And more
49Security
- Peers exchange AXML documents containing service calls
- A server (resp. client) might ask the client (resp. server) to do something bad
  <sc>qod.com/QuoteOfDay</sc>
  <quote date="july 8th 2002">
    My heart was bumping
    <context>Tskitishvili, picked 5th in the NBA draft by the Denver Nuggets</context>
    <sc>buy.com/BuyCar( BMW Z3 )</sc>
  </quote>
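
One simple defence a receiving peer could apply is to scan an incoming answer for embedded calls before activating anything. The sketch below assumes a host-based trust policy. The set `TRUSTED_HOSTS` and the function `untrusted_calls` are hypothetical and not part of AXML.

```python
import xml.etree.ElementTree as ET

TRUSTED_HOSTS = {"qod.com"}  # assumed policy: only calls to these hosts may be activated

# The answer returned by the quote-of-the-day service in the example above.
ANSWER = """
<quote date="july 8th 2002">My heart was bumping
  <context>Tskitishvili, picked 5th in the NBA draft by the Denver Nuggets</context>
  <sc>buy.com/BuyCar( BMW Z3 )</sc>
</quote>
"""

def untrusted_calls(xml_text, trusted):
    """List the embedded calls whose host is not in the trusted set."""
    root = ET.fromstring(xml_text)
    calls = [sc.text.strip() for sc in root.iter("sc")]
    return [call for call in calls if call.split("/", 1)[0] not in trusted]

suspicious = untrusted_calls(ANSWER, TRUSTED_HOSTS)
print(suspicious)
```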
50Using types to control the use of services
[Figure: Peer1's data contains calls f and g; Peer2 accepts f. Evaluate g before sending data.]
Peer1 tells which kind of data it exports and Peer2 which kind it accepts
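
The idea of evaluating g before sending can be sketched on the sender side: any call the receiver has not declared as accepted is replaced by its result before the data leaves. This is a minimal sketch under assumptions. The accepted set stands in for a real type declaration, and `prepare_for` and the stub for g are hypothetical.

```python
import xml.etree.ElementTree as ET

def prepare_for(doc, accepted, local_services):
    """Replace every <sc> the receiver does not accept by its evaluated result."""
    for parent in list(doc.iter()):           # snapshot: we mutate while walking
        for sc in parent.findall("sc"):
            name = sc.text.strip()
            if name in accepted:
                continue                      # the receiver can activate this one itself
            position = list(parent).index(sc)
            parent.remove(sc)
            # Raises KeyError if the sender cannot evaluate the call either.
            parent.insert(position, local_services[name]())
    return doc

# Peer1's data contains calls f and g; Peer2 declares that it accepts only f.
doc = ET.fromstring("<data><sc>f</sc><sc>g</sc></data>")
prepare_for(
    doc,
    accepted={"f"},
    local_services={"g": lambda: ET.fromstring("<value>42</value>")},
)
print(ET.tostring(doc, encoding="unicode"))
```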
51Distribution and replication
- Motivated by mobile devices with limited
resources - Allows to distribute one XML document on several
peers - Allows to replicate an XML-sub-tree on several
peers - Query optimization
52Thanx. More questions: Serge.Abiteboul@inria.fr