Title: INT-2: XQuery Levels the Data Integration Playing Field
1. INT-2: XQuery Levels the Data Integration Playing Field
Carlo (Minollo) Innocenti
DataDirect XML Technologies, Program Manager
2. Agenda
- The Problem
- Why XQuery
- XQuery for Java API (XQJ)
- XQuery for Data Integration
- XQuery Demos, Code Walk-throughs
- Summary
3. A Typical Data Integration Problem
- The web server needs to fetch the user's personal data, stock holdings and live stock data to compile a report to send back to the user
- The user submits a request for a report about their stock holdings
- A public service offers live (delayed) stock prices
- Different repositories are used for different parts of the information necessary to create a stock holdings report
4. Some implementation constraints
- Java/JSP code accessing the various Java APIs and generating the HTML report
- HTML
- SOAP through AXIS
- Java Open Client or JDBC
- dBASE IV APIs
- JDBC
5. A dangerous approach
- Data Access Layer
6. The XQuery Vision
- XQuery
8. What is XQuery?
- W3C Query Language for XML
- Native XML Programming Language
- The SQL for XML
- Designed to query, process, and create XML
- High level functionality
- Find anything in an XML structure
- Querying and combining data
- Creating XML structures
- Functions
- User-defined function libraries
9. XQuery Basics
- Path Expressions: Finding Data
    doc("holdings.xml")/holdings/entry
- FLWOR Expressions: Querying and Combining Data
    for $h in doc("holdings.xml")/holdings/holding,
        $c in doc("companies.xml")/companies/company
    where $h/userid = "Minollo"
      and $c/ticker = $h/stockticker
    return $c/name
10. XQuery Basics
- Path Expressions, FLWOR Expressions, and XML Constructors
    for $h in doc("holdings.xml")/holdings/entry,
        $c in doc("companies.xml")/companies/company
    where $h/userid = "Minollo"
      and $c/ticker = $h/stockticker
    return
      <company ticker="{ $c/ticker }">
        { $c/companyname }
        { $c/annualrevenues }
      </company>
11. Functions and Modules
- A Function in a Library Module
    module namespace stock = "http://tagsalad.com/stocks";
    declare function stock:companies($user as xs:string)
    {
      for $h in doc("holdings")/holdings/entry,
          $c in doc("companies")/companies/company
      where $h/userid = $user
        and $c/ticker = $h/stockticker
      return <company ticker="{ $c/ticker }"></company>
    };
12. Functions and Modules (2)
- Importing and Using a Library Module
    import module namespace stock = "http://tagsalad.com/stocks";
    stock:companies("Minollo")
13. Why XQuery?
- Native Support for XML
- Conventional programming and query languages are not designed for XML
- No more parse, navigate, cast, repeat: XML is the native datatype
- Designed for Data Integration
- Native XML and non-XML data can be used the same way
- Vastly simplifies development when input includes XML, relational, EDI
- Requires support from implementation for the data sources you need
14. Why XQuery?
- XML Output is Directly Useful
- XML is becoming the industry standard for data exchange
- Dynamic Web Sites
- Publishing Applications
- Web Messages
- We normally don't exchange SQL tables or present them to users!
- Programmer Productivity
- Readable, declarative code: transparent, easier to maintain
- 7 to 20 times less code than Java + SQL + JDBC + XML APIs
15. Why XQuery?
- Performance
- Declarative code can be optimized by the XQuery Engine
- Relational database vendors and experts very involved in the design
- Actually, performance depends on the implementation
16. Benefits of XQuery
- Data Integration is harder without XQuery!
- Every data source is different
- Many applications use several languages and APIs to address data sources (e.g. Java/JDBC/DOM, SQL, Perl, XSLT)
- Mediating among data sources accounts for a lot of code
- XQuery treats all data sources as XML
17. Benefits of XQuery
- Processing XML is harder without XQuery!
- Most programming languages don't know XML structures
- Parse, navigate, cast, repeat
- XML is the native data structure for XQuery
- XML Reporting is harder without XQuery!
- XML input and output may have very complex structure
- Many different desired XML outputs
- Data Integration, Native XML Processing are needed
- XQuery gives full query processing for any XML input and output
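To make "parse, navigate, cast, repeat" concrete, here is a minimal sketch of the holdings/companies join from the XQuery Basics slides, written against the JDK's DOM and XPath 1.0 APIs rather than an XQuery engine. The class name `HoldingsJoin`, the method `companyNamesFor`, and the inline sample documents are hypothetical and chosen only for illustration; the element names loosely follow the slide queries.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class HoldingsJoin {
    // Hypothetical sample data shaped like the holdings/companies documents in the slides.
    static final String HOLDINGS = "<holdings>"
        + "<entry><userid>Minollo</userid><stockticker>PRGS</stockticker></entry>"
        + "<entry><userid>Jonathan</userid><stockticker>AMZN</stockticker></entry>"
        + "</holdings>";
    static final String COMPANIES = "<companies>"
        + "<company><ticker>AMZN</ticker><name>Amazon.com, Inc.</name></company>"
        + "<company><ticker>PRGS</ticker><name>Progress Software</name></company>"
        + "</companies>";

    // Parse: turn the XML text into a DOM tree.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Returns the names of the companies held by the given user.
    static List<String> companyNamesFor(String user) {
        try {
            Document holdings = parse(HOLDINGS);
            Document companies = parse(COMPANIES);
            XPath xpath = XPathFactory.newInstance().newXPath();
            List<String> names = new ArrayList<>();

            // Navigate and cast: every node-set result comes back as Object.
            NodeList entries = (NodeList) xpath.evaluate(
                "/holdings/entry", holdings, XPathConstants.NODESET);
            NodeList allCompanies = (NodeList) xpath.evaluate(
                "/companies/company", companies, XPathConstants.NODESET);

            // Repeat: the join condition is hand-written as nested loops.
            for (int i = 0; i < entries.getLength(); i++) {
                Node entry = entries.item(i);
                if (!user.equals(xpath.evaluate("userid", entry))) {
                    continue;
                }
                String ticker = xpath.evaluate("stockticker", entry);
                for (int j = 0; j < allCompanies.getLength(); j++) {
                    Node company = allCompanies.item(j);
                    if (ticker.equals(xpath.evaluate("ticker", company))) {
                        names.add(xpath.evaluate("name", company));
                    }
                }
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException("Query failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(companyNamesFor("Minollo"));
    }
}
```

The five-line FLWOR expression on the XQuery Basics slide expresses the same join declaratively; in this sketch the join strategy is fixed by the nested loops, which leaves an engine nothing to optimize.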
19. What is XQJ?
- XQuery API for Java (XQJ), JSR 225
- The JDBC for XQuery
20. Benefits of XQJ
- Industry Standard, similar to JDBC
- No need to learn a new proprietary API for each product and each version
- Can build on existing JDBC knowledge
- Lets XQuery fit into any Java architecture
- Queries can be created or parameterized at run-time
- Example: A portfolio for a given user at a given date
- Interfaces are designed for use in J2EE applications
- Example: Results can be retrieved as DOM, SAX, StAX, or text
22. An XQuery architecture
23. DataDirect XQuery
- High performance
- Scalable
- Embeddable
- Plugs into any Java architecture
- Accesses almost any data source
- No dependency on servers
- Standards-based
24. XQuery can be fast for relational data!
    <portfolio>
      <company ticker="AMZN">
        <companyname>Amazon.com, Inc.</companyname>
        <annualrevenues>7780</annualrevenues>
      </company>
      <company ticker="EBAY">
        <companyname>eBay Inc.</companyname>
        <annualrevenues>22600</annualrevenues>
      </company>
      <company ticker="IBM">
        <companyname>Int'l Business Machines C</companyname>
        <annualrevenues>128200</annualrevenues>
      </company>
      <company ticker="PRGS">
        <companyname>Progress Software</companyname>
        <annualrevenues>493.4</annualrevenues>
      </company>
    </portfolio>
- Highly optimized for relational sources
- Minimizes retrieval of data
- No more rows than needed
- No more columns than needed
- Uses database functionality
- Joins
- Sorting
- Etc.
- Optimizes for each SQL dialect
- Efficient JDBC retrieval
- Embeds DataDirect JDBC technology
- Optimizations added to support XQuery
- Supports incremental retrieval
- Optimizes for XML hierarchies
- Sort-merge algorithm
- Minimal cost of XML construction
- Leverages SQL library
- Supports hints
HOLDINGS
USERID    TICKER  SHARES
Jonathan  PRGS    23
Minollo   PRGS    4000000
Jonathan  AMZN    3000
Minollo   AMZN    3000

COMPANIES
TICKER  NAME               REVENUES
AMZN    Amazon.com, Inc.   7780
EBAY    eBay Inc.          22600
PRGS    Progress Software  493.4
YHOO    Yahoo! Inc.        10700
25. XQuery can be fast for XML files!
    <?xml version="1.0" encoding="UTF-8"?>
    <soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
                   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                   xmlns:xsd="http://www.w3.org/2001/XMLSchema">
      <soap:Body>
        <GetQuotesResponse xmlns="http://swanandmokashi.com">
          <GetQuotesResult>
            <Quote>
              <CompanyName>APPLE COMPUTER</CompanyName>
              <StockTicker>AAPL</StockTicker>
              <StockQuote>74.17</StockQuote>
              <LastUpdated>9/14/2006 4:01pm</LastUpdated>
              <Change>1.17</Change>
              <PercentChange>1.82</PercentChange>
              <OpenPrice>N/A</OpenPrice>
              <DayHighPrice>N/A</DayHighPrice>
              <DayLowPrice>N/A</DayLowPrice>
              <Volume>0</Volume>
              <MarketCap>63.266B</MarketCap>
              <YearRange>47.87 - 86.40</YearRange>
              <ExDividendDate>21-Nov-95</ExDividendDate>
              <DividendYield>N/A</DividendYield>
              <DividendPerShare>0.00</DividendPerShare>
            </Quote>
          </GetQuotesResult>
        </GetQuotesResponse>
      </soap:Body>
    </soap:Envelope>
- General XQuery rewrites
- Constant-folding, elimination of common sub-expressions, loop rewrites, ordering rewrites, etc.
- Document projection
- XML construction accounts for much of the cost
- Don't build parts of the document that the query doesn't need!
- Document streaming
- Discard parts of the document when no longer needed
- Keeps memory usage nearly constant regardless of file size
- Multiple gigabytes can be queried
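The idea behind document streaming can be illustrated outside an XQuery engine with the JDK's StAX pull parser. The sketch below is not how DataDirect XQuery implements streaming; it only shows the principle of reading a quote document front to back and keeping nothing but the value being looked for. The class `StreamingQuoteLookup`, the method `quoteFor`, and the simplified sample document (including the second, made-up `TEST` quote) are assumptions for illustration; the element names and AAPL values come from the SOAP response on the slide.

```java
import java.io.StringReader;
import java.util.Optional;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class StreamingQuoteLookup {
    // Simplified, hypothetical sample; the TEST quote is made up for the example.
    static final String SAMPLE = "<GetQuotesResult>"
        + "<Quote><CompanyName>APPLE COMPUTER</CompanyName>"
        + "<StockTicker>AAPL</StockTicker><StockQuote>74.17</StockQuote></Quote>"
        + "<Quote><CompanyName>TEST COMPANY</CompanyName>"
        + "<StockTicker>TEST</StockTicker><StockQuote>1.00</StockQuote></Quote>"
        + "</GetQuotesResult>";

    // Returns the StockQuote text for the wanted ticker, if present.
    static Optional<String> quoteFor(String xml, String wantedTicker) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
            String currentTicker = null;
            while (reader.hasNext()) {
                if (reader.next() != XMLStreamConstants.START_ELEMENT) {
                    continue; // text and end tags are skipped, never stored
                }
                String name = reader.getLocalName();
                if (name.equals("Quote")) {
                    currentTicker = null; // forget the previous quote entirely
                } else if (name.equals("StockTicker")) {
                    currentTicker = reader.getElementText();
                } else if (name.equals("StockQuote")
                        && wantedTicker.equals(currentTicker)) {
                    return Optional.of(reader.getElementText());
                }
            }
            return Optional.empty();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not read XML stream", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(quoteFor(SAMPLE, "AAPL").orElse("not found"));
    }
}
```

Only one ticker string is held at any time, so memory use does not grow with the number of `Quote` elements. A streaming query engine applies the same principle automatically, based on what the query actually needs.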
26. XQuery can use XML Converters
- EDI File
    ISA00DATADIRECT00STYLUS200601DATA DIRECT 01STYLUS STUDIO 060504121200503200654321 0I'
    GSBFDATADIRECTSTYLUS200620060504121212256X005030'
    ST1053389'
    BGN28102420060504121212GM'
    NM12L4Progress Software Corporation'
    N314 Oak Park Drive'
    N4BedfordMA01730USAA'
    REF1ZPRGS'
    NM12L4Apple Computer, Inc.'
    N31 Infinite Loop'
    N4CupertinoCA95014USAA'
    REF1ZAAPL'
    SE113389'
    GE1256'
    IEA1200654321'
    doc("adapter://EDI?ticker-request.edi")
    <X12>
      <ISA>
        <ISA01><!--I01 Authorization Information Qualifier-->00<!--No Authorization Information Present (No Meaningful Information in I02)--></ISA01>
        <ISA02><!--I02 Authorization Information-->DATADIRECT</ISA02>
        <ISA03><!--I03 Security Information Qualifier-->00<!--No Security Information Present (No Meaningful Information in I04)--></ISA03>
        <ISA04><!--I04 Security Information-->STYLUS2006</ISA04>
        <ISA05><!--I05 Interchange ID Qualifier-->01<!--Duns (Dun & Bradstreet)--></ISA05>
        <ISA06><!--I06 Interchange Sender ID-->DATA DIRECT</ISA06>
        <ISA07><!--I05 Interchange ID Qualifier-->01<!--Duns (Dun & Bradstreet)--></ISA07>
        <ISA08><!--I07 Interchange Receiver ID-->STYLUS STUDIO</ISA08>
        <ISA09><!--I08 Interchange Date-->060504<!--2006-05-04--></ISA09>
        <ISA10><!--I09 Interchange Time-->1212</ISA10>
        <ISA11><!--I65 Repetition Separator--></ISA11>
        <ISA12><!--I11 Interchange Control Version Number-->00503<!--Standards Approved for Publication by ASC X12 Procedures Review Board through October 2005--></ISA12>
        <ISA13><!--I12 Interchange Control Number-->200654321</ISA13>
        <ISA14><!--I13 Acknowledgment Requested-->0<!--No Interchange Acknowledgment Requested--></ISA14>
        <ISA15><!--I14 Interchange Usage Indicator-->I<!--Information--></ISA15>
        <ISA16><!--I15 Component Element Separator--></ISA16>
      </ISA>
      <GS>
        <GS01><!--479 Functional Identifier Code-->BF<!--Business Entity Filings (105)--></GS01>
        <GS02><!--142 Application Sender's Code-->DATADIRECT</GS02>
        <GS03><!--124 Application Receiver's Code-->STYLUS2006</GS03>
        <GS04><!--373 Date-->20060504<!--2006-05-04--></GS04>
        <GS05><!--337 Time-->121212</GS05>
        <GS06><!--28 Group Control Number-->256</GS06>
        <GS07><!--455 Responsible Agency Code-->X<!--Accredited Standards Committee X12--></GS07>
        <GS08><!--480 Version / Release / Industry Identifier Code-->005030<!--Standards Approved for Publication by ASC X12 Procedures Review Board through October 2005--></GS08>
      </GS>
- Convert non-XML format to XML on-the-fly!
- EDI message types
- Comma-delimited or tab-delimited files
- dBase
- RTF
- mbox
- Batch conversions are supported
- Custom conversions
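As a rough illustration of what converting a delimited file to XML on the fly involves, the following sketch turns tab-delimited text with a header row into an XML string. It is not the DataDirect XML Converters API; the class `TabToXml`, its methods, and the chosen root and row element names are hypothetical. It assumes the header cells are valid XML element names and that fields contain no embedded tabs or line breaks.

```java
final class TabToXml {
    // Escapes the characters that are not allowed in XML element content.
    static String escape(String text) {
        return text.replace("&", "&amp;")
                   .replace("<", "&lt;")
                   .replace(">", "&gt;");
    }

    // Converts tab-delimited text (first line = column names) into XML.
    // Assumption: every header cell is a valid XML element name.
    static String convert(String text, String rootName, String rowName) {
        String[] lines = text.trim().split("\\R");
        String[] headers = lines[0].split("\t");
        StringBuilder out = new StringBuilder();
        out.append('<').append(rootName).append('>');
        for (int i = 1; i < lines.length; i++) {
            if (lines[i].isEmpty()) {
                continue; // skip blank lines between records
            }
            String[] cells = lines[i].split("\t", -1);
            out.append('<').append(rowName).append('>');
            for (int c = 0; c < headers.length && c < cells.length; c++) {
                out.append('<').append(headers[c]).append('>')
                   .append(escape(cells[c]))
                   .append("</").append(headers[c]).append('>');
            }
            out.append("</").append(rowName).append('>');
        }
        out.append("</").append(rootName).append('>');
        return out.toString();
    }

    public static void main(String[] args) {
        // Rows taken from the HOLDINGS table shown earlier in the deck.
        String table = "USERID\tTICKER\tSHARES\n"
                     + "Jonathan\tPRGS\t23\n"
                     + "Minollo\tAMZN\t3000\n";
        System.out.println(convert(table, "holdings", "entry"));
    }
}
```

Once the data is exposed as XML in this way, a query can treat it like any other document, which is the point of the converter approach on this slide.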
27. XQuery can access Web Services
    declare function local:amazon-listing($isbn)
    {
      <tns:Request>
        <tns:Condition>All</tns:Condition>
        <tns:DeliveryMethod>Ship</tns:DeliveryMethod>
        <tns:FutureLaunchDate/>
        <tns:IdType>ASIN</tns:IdType>
        <tns:ItemId>{ $isbn }</tns:ItemId>
        <tns:ResponseGroup>Medium</tns:ResponseGroup>
      </tns:Request>
    };
    let $loc :=
      <location address="http://soap.amazon.com/onca/soap?Service=AWSECommerceService"
                soapaction="http://soap.amazon.com" />
    let $payload := local:amazon-listing("0395518482")
    return wscall($loc, $payload)
- Leverage existing SOA architecture in queries!
- Integrate queries with web services
- Easily generate complex web service requests
- Vastly increases the reach of your queries
28. Questions so far?
30. A Data Integration problem
- Java/JSP code accessing the various Java APIs and generating the HTML report
- HTML
- SOAP through AXIS
- Java Open Client or JDBC
- dBASE IV APIs
- JDBC
31. A dangerous approach
- Data Access Layer
32. The XQuery Vision
- XQuery
33. The DataDirect XQuery Solution
- HTML
34. Step by step
- XQuery to aggregate data from the multiple data sources
- XQuery to publish an HTML or XSL-FO (PDF) report directly
- Pipelining multiple XQueries with validation steps
- Exposing an XQuery Web Service and consuming it from OpenEdge
41. DataDirect XQuery
42. Getting Started
- Examples and tutorials: http://www.xquery.com
- XQuery Tutorial
- XQJ Tutorial
- DataDirect XQuery Tutorial
43. Questions?
44. Thank you for your time