Title: TECH854 XML TECHNOLOGIES Programming with XML REVISION Week 13
1TECH854 XML TECHNOLOGIES Programming with XML REVISION Week 13
2PROGRAMMING WITH XML
- Week 5 Parsing XML - SAX
- Week 6 XML to objects - DOM
- Week 7 XML Web Applications
- Week 8 XML Integration, Data Exchange and Databases
- Making XML work in real applications
- This part of the course required programming (i.e. Java) and assumes programming expertise
- Plenty of examples and code templates
- Application in Assignment 2
3[Figure: integration diagram. Labels: Microsoft, IBM, Java, XML Mediator, Web Online Inventory, Warehouse (Oracle), Web Server (Apache), HTML, Web Online Shopping, XML Transformers, Web Content Management, XML Content]
XML is the format for data exchange and content management
4Class Exercise: How to read in an XML file?
<?xml version="1.0"?>
<orders>
  <orderitem personid="235">
    <product>choc</product>
    <qty>535</qty>
  </orderitem>
</orders>
Write some pseudo code to read the XML file and turn it into an order object. What are some issues?
5Processors
- XML has well developed processors for this task
- Take advantage of this work
- Months of development time plus wide use in the marketplace
- Issues
- New standards (e.g. what about schema?)
- Complexities: CDATA, external entities, parameter entities, namespaces
- Recall why XML has taken off as a data interchange format
- Because we already have processors and an agreed syntax, we do not have to reinvent the wheel of syntax, grammars and parsing for one-off formats!
6DOM versus SAX
xml.fujitsu.com/en/tech/dom/
7SAX Parser
- SAX provides an event-based interface to the parser.
- User callbacks are associated with events for
- startElement
- endElement
- characters (text)
- startDocument etc.
- The callbacks are passed the data associated with the tag or the text of the character data.
- Good for large XML files, especially where the processing required is linear.
- For us this is lower level, so it is a good place to start
- However, SAX requires much more programming than other XML processors
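A minimal Java sketch of the event-based style, applied to the orders document from the class exercise. The class names OrderSaxDemo, OrderItem and OrderHandler are hypothetical and not part of the course code; the parser calls are standard JAXP/SAX. It assumes a non-namespace-aware parser, so qName carries the tag name.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class OrderSaxDemo {

    // Hypothetical value class holding one <orderitem>.
    static class OrderItem {
        String personId;
        String product;
        int qty;
    }

    // Handler that builds OrderItem objects from SAX events.
    static class OrderHandler extends DefaultHandler {
        final List<OrderItem> items = new ArrayList<>();
        private final StringBuilder text = new StringBuilder();
        private OrderItem current;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            text.setLength(0); // forget text seen before this tag
            if (qName.equals("orderitem")) {
                current = new OrderItem();
                current.personId = atts.getValue("personid");
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // The parser may deliver one text run in several calls, so accumulate.
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (current == null) {
                return;
            }
            String value = text.toString().trim();
            if (qName.equals("product")) {
                current.product = value;
            } else if (qName.equals("qty")) {
                current.qty = Integer.parseInt(value);
            } else if (qName.equals("orderitem")) {
                items.add(current);
                current = null;
            }
        }
    }

    // Parses the XML text and returns the order items found in it.
    static List<OrderItem> parse(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            OrderHandler handler = new OrderHandler();
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.items;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse orders", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\"?><orders><orderitem personid=\"235\">"
                + "<product>choc</product><qty>535</qty></orderitem></orders>";
        for (OrderItem item : parse(xml)) {
            System.out.println(item.personId + " ordered " + item.qty + " x " + item.product);
        }
    }
}
```

The text buffer is needed because characters() is not guaranteed to deliver an element's text in a single call; the handler only knows the text is complete when endElement arrives.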
8Step 2 - Understand Framework
DefaultHandler
http://java.sun.com/xml/jaxp/dist/1.1/docs/tutorial/overview/3_apis.html
9SAX Event Handlers
- startDocument()
- endDocument()
- startElement(String uri, String localName, String qName, Attributes atts)
- endElement(String uri, String localName, String qName)
- characters(char[] ch, int start, int length)
- ignorableWhitespace(char[] ch, int start, int length)
- setDocumentLocator(Locator locator)
- uri - namespace URI
- localName - unprefixed name
- qName - name with prefix, e.g. ora:element
10Back to Echo Program
These methods are part of the Echo2 class defined earlier
Recall the line: saxParser.parse(new File(argv[0]), handler);
This registers the object as the handler. Then, when events occur, the parser calls back into that same object ("call thyself").
11Echo Program
endElement
characters
why StringBuffer ??
12Exceptions
try {
    // statements that could cause a problem
} catch (SAXException e) {
    // error statements
}
Exception methods: getMessage(), plus getLineNumber() and getSystemId() on SAXParseException
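A short sketch of how these exception methods can be used, assuming a document that is not well formed. The class name SaxErrorDemo and the method firstErrorLine are hypothetical; SAXParseException and its accessors are standard SAX.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class SaxErrorDemo {

    // Returns the line number of the first fatal parse error,
    // or -1 if the document parses without error.
    static int firstErrorLine(String xml) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return -1;
        } catch (SAXParseException e) {
            // SAXParseException adds position information to SAXException.
            System.err.println("Parse error at line " + e.getLineNumber()
                    + " in " + e.getSystemId() + ": " + e.getMessage());
            return e.getLineNumber();
        } catch (SAXException | ParserConfigurationException | IOException e) {
            throw new IllegalStateException("Parser could not run", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstErrorLine("<orders><orderitem/></orders>"));
        System.out.println(firstErrorLine("<orders>\n<orderitem>\n</orders>"));
    }
}
```

Because the input here comes from a string rather than a file, getSystemId() has no file name to report and typically returns null.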
13JAXP
- JAXP - Java API For XML Processing
- makes it easier to process XML data
- Abstraction layer between program and SAX, DOM, XSL
- Provides namespace support
- The examples so far have used JAXP, which means imports and factory calls are easier
- Underneath, it still uses the DOM and SAX APIs
14DOM - Document Object Model
- Origins in W3C
- DOM is a specification to represent the content and model of XML documents across all programming languages and tools
- Based on tree model
- A set of bindings and classes that use the DOM itself
- When to use
- If changing the XML file: inserting or deleting elements or changing structure
- Navigating to parts of the XML file
- Complex hierarchies
- Memory intensive
15Class Exercise: How to process Trees
Design a simple class and methods for trees. Now write a pseudo program to pretty print the tree.
16DOM Framework
17DOM API Framework
- org.w3c.dom - W3C DOM package
- javax.xml.parsers - Java DOM API
- DocumentBuilderFactory - creates instance of factory
- DocumentBuilder - does the work
- parse() - you call parse to do the work
- Document - the top level document
- Node - any sort of node in tree
- NodeList - a list of nodes (children)
- NamedNodeMap - use to get attributes
- SAXParseException, SAXException, ParserConfigurationException
18 19Node Methods
- short getNodeType()
- Node.DOCUMENT_NODE, Node.ELEMENT_NODE, Node.TEXT_NODE etc.
- String getNodeName()
- String getNodeValue()
- NodeList getChildNodes()
- boolean hasChildNodes()
- NamedNodeMap getAttributes()
20nodeName and nodeValue
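A small sketch of what getNodeName() and getNodeValue() return for the common node types. The class name NodeInfoDemo, the describe helper and the one-line sample document are assumptions made for illustration; the DOM calls are standard.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class NodeInfoDemo {

    // Parses XML text into a DOM Document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse document", e);
        }
    }

    // Summarises the three basic properties every Node exposes.
    static String describe(Node node) {
        return "type=" + node.getNodeType()
                + " name=" + node.getNodeName()
                + " value=" + node.getNodeValue();
    }

    public static void main(String[] args) {
        Document doc = parse("<product id=\"7\">choc</product>");
        Node root = doc.getDocumentElement();
        System.out.println(describe(doc));                            // document node
        System.out.println(describe(root));                           // element node
        System.out.println(describe(root.getAttributes().item(0)));   // attribute node
        System.out.println(describe(root.getFirstChild()));           // text node
    }
}
```

Document and element nodes have a null nodeValue; the text of an element lives in its child text node, whose nodeName is the fixed string #text.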
21Start the recursion
// call from main
theDocument = builder.parse(new File(argv[0]));
// start your processing here
doit.printTree(theDocument, argv[0]);

// printTree(): doc becomes a Node!
public void printTree(Document doc, String fname) {
    System.out.println("Pretty Printing Tree " + fname);
    // Start the ball rolling from the top
    treeRecursivePrint(doc, "");
}

// treeRecursivePrint(node, level)
public void treeRecursivePrint(Node node, String level) {
    System.out.println(level + prettyString(node));
    // ...
}
22Recurse down tree
public void treeRecursivePrint(Node node, String level) {
    System.out.println(level + prettyString(node));
    // now recurse through attributes
    if (node.getNodeType() == Node.ELEMENT_NODE) {
        NamedNodeMap attributes = node.getAttributes();
        if (attributes != null) {
            for (int i = 0; i < attributes.getLength(); i++) {
                treeRecursivePrint(attributes.item(i), level + indent + ">");
            }
        }
    }
    // process children: recurse through children
    if (node.hasChildNodes()) {
        NodeList nodes = node.getChildNodes();
        if (nodes != null) {
            for (int i = 0; i < nodes.getLength(); i++) {
                treeRecursivePrint(nodes.item(i), level + indent);
            }
        }
    }
} // end treeRecursivePrint
23Simple Creation
// SimpleDomCreate.java
public static void main(String[] argv)
        throws IOException, DOMException, ParserConfigurationException {
    DomPPrint myDomPPrint = new DomPPrint();
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    DocumentBuilder db = dbf.newDocumentBuilder();
    Document doc = db.newDocument();
    Element root = doc.createElement("root");
    Attr tmp;
    doc.appendChild(root);
    System.out.println("Created root");
    Element header = doc.createElement("header");
    header.appendChild(doc.createTextNode("This is header"));
    root.appendChild(header);
    root.appendChild(doc.createTextNode("\nTextual contents of root element\n "));
    root.appendChild(doc.createElement("footer"));
    // ...
Key calls: doc.createElement(tag), doc.createTextNode(txt), node.appendChild(childnode)
24Deleting
// Now delete some children to show how it is done
// Using for loops can get you into trouble
Node delNode = root;
while (delNode.hasChildNodes()) {
    delNode.removeChild(delNode.getFirstChild());
}
System.out.println("After Deleting children of root DEL>");
myDomPPrint.treeRecursivePrint(doc, "DEL>");
Must use this while construct to delete children.
Note that for loops fail to remove all the children because the NodeList is live.
Note that attributes must be deleted separately.
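A sketch that makes the live NodeList problem concrete by comparing an index-based for loop with the while construct. The class name RemoveChildrenDemo, its methods and the four sample child elements are assumptions made for illustration.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class RemoveChildrenDemo {

    // Builds <root><a/><b/><c/><d/></root> in memory.
    static Element sampleRoot() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element root = doc.createElement("root");
            doc.appendChild(root);
            for (String name : new String[] {"a", "b", "c", "d"}) {
                root.appendChild(doc.createElement(name));
            }
            return root;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No DOM builder available", e);
        }
    }

    // Index-based loop: the list shrinks under the index, so every
    // second child is skipped. Returns the number of children left.
    static int removeWithForLoop(Element root) {
        NodeList children = root.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            root.removeChild(children.item(i));
        }
        return root.getChildNodes().getLength();
    }

    // The construct from the slide: always remove the first child.
    static int removeWithWhileLoop(Element root) {
        while (root.hasChildNodes()) {
            root.removeChild(root.getFirstChild());
        }
        return root.getChildNodes().getLength();
    }

    public static void main(String[] args) {
        System.out.println("for loop leaves:   " + removeWithForLoop(sampleRoot()));
        System.out.println("while loop leaves: " + removeWithWhileLoop(sampleRoot()));
    }
}
```

With four children, the for loop removes a and c and then stops, because after two removals the list length (2) is no longer greater than the index (2).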
25Clients, HTTP and Servers
[Figure: Browsers (IE, Netscape), Internet, HTTP, TCP/IP, Web Server, Servlet Engine, Servlet, DB Server, XML files, XSLT files]
26HTML: <html><body> <h1>My Mall</h1> <table><tr><td>Product</td>
XML: <?xml version="1.0"?> <catalog name="My Mall"> <product>
[Figure: Web Browser, http, Web Server, Servlet Engine, html, XML, XSLT, DataBase]
- Servlets encapsulate HTTP
- Servlets read client data
- (Forms, request headers)
- Generate results
- Send results back to client
- (Headers, status results, HTML)
27Servlets - Hello World 1
import java.io.*;
import javax.servlet.*;
import javax.servlet.http.*;

public class HelloWorld1 extends HttpServlet {
    public void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        PrintWriter out = response.getWriter();
        out.println("Hello World 1");
    }
}
Key points: imports; extend HttpServlet; doGet(request, response)
http://localhost:8080/servlet/HelloWorld1
28Servlets - Hello World 2
public class HelloWorld2 extends HttpServlet {
    public void doGet(HttpServletRequest request, HttpServletResponse response)
            throws IOException, ServletException {
        response.setContentType("text/html");
        PrintWriter out = response.getWriter();
        out.println("<html>");
        out.println("<head>");
        String title = "Hello World";
        out.println("<title>" + title + "</title>");
        out.println("</head>");
        out.println("<body bgcolor=white>");
        out.println("<h1><font color=\"green\">" + title + "</font></h1>");
        String param = request.getParameter("param");
        if (param != null)
            out.println("Thanks for the lovely param '<b>" + param + "</b>'");
        out.println("</body>");
        out.println("</html>");
    }
}
Key calls: setContentType, getWriter, request.getParameter(paramname)
29Web Interactivity via Forms
- FORMS
- Enhanced HTML documents to collect information
- Form inputs
- text, password, radio, checkboxes, textareas, selection lists, option lists
- Submit buttons
- method POST (usual)
- method GET
- Web Server processes the form inputs then composes the reply
- Servlets
- Generate the Form HTML dynamically
- Process the result using the doPost method
30Issue
- HTTP is stateless
- Each doGet, doPost are single requests
- HTTP does not keep track of clients
- How to keep track of user state ?
- e.g. shopping cart, accumulating information, saving re-entry of passwords etc.
- In CGI we had to use cookies, fancy URLs or hidden fields in the form
- Servlets have high level functionality to deal with session information
31HttpSession
- Session Object
- unique for each client
- Under the hood maintained by Java using cookies or URL rewriting
- Accessing the session object from HttpServletRequest
- HttpSession session = req.getSession(boolean create)
- Testing the session object
- if (session.isNew())
- if (session == null)
- We can add and get objects from the session object
- session.putValue(obj-name, myobject)
- later
- myobject = session.getValue(obj-name)
- (putValue/getValue are deprecated in later Servlet APIs in favour of setAttribute/getAttribute)
32Framework Design Issues
- Three-Tier Architecture: a common architecture for servlet-based applications.
- The application logic is implemented in a set of helper classes.
- Methods on objects of these classes are invoked by the service methods in the servlets.
- http://www.subrahmanyam.com/articles/servlets/ServletIssues.html
33The Presentation Nightmare
- Java Servlets contain embedded HTML statements with presentation code everywhere
- What happens if you want to redesign the look and feel of the site?
- What happens if you want to be browser specific? Work with PDAs?
- A maintenance nightmare!
- JSP only helps marginally
34The case for XSLT
- XSLT can be used to separate data, program logic and presentation
- XML and XSLT can be developed independently of servlet code
- Modularity of presentation improves maintenance
- Can target multiple client devices via different stylesheets
- Weakness
- Adds a layer of abstraction and an extra step; slower runtime performance
- Java and XSLT, Burke, O'Reilly, chap 4, 6
35XSLT Conceptual Model
Servlet (controller)
request
HTML (view)
response
XSLT Processor
XML (Model)
XSLT Stylesheets (view)
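A minimal sketch of the XML (model) plus stylesheet (view) step, using the JAXP transformation API that ships with the JDK. The class name XsltViewDemo, the catalog data and the stylesheet are assumptions made for illustration; in a servlet the result would be written to the response writer instead of a string.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltViewDemo {

    // Model: hypothetical catalog data.
    static final String CATALOG_XML =
            "<catalog name=\"My Mall\"><product>choc</product></catalog>";

    // View: a stylesheet that renders the catalog as an HTML table.
    static final String STYLESHEET =
            "<xsl:stylesheet version=\"1.0\""
            + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
            + "<xsl:output method=\"html\" indent=\"no\"/>"
            + "<xsl:template match=\"/catalog\">"
            + "<html><body><h1><xsl:value-of select=\"@name\"/></h1>"
            + "<table><xsl:for-each select=\"product\">"
            + "<tr><td><xsl:value-of select=\".\"/></td></tr>"
            + "</xsl:for-each></table></body></html>"
            + "</xsl:template></xsl:stylesheet>";

    // Applies the stylesheet to the XML and returns the generated markup.
    static String render(String xml, String xslt) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                    new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(render(CATALOG_XML, STYLESHEET));
    }
}
```

Targeting a different device then means passing a different stylesheet to render; the Java code and the XML stay unchanged.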
36XML - Query and Databases
- SemiStructured Information
- XML Query
- XML Persistence
- XML Data design
- Native XML Databases
- XML enabling and relational databases
- Future directions
37Convergence Of Disparate Data Frameworks
Object Model
Relational Model
Document Model
Semi-Structured Data
XML
XML allows representation of information from previously disparate worlds: database centric (everything is a relation), object centric (everything is an object) and document centric. XML allows convergence.
38The Semistructured Data Model
[Figure: Object Exchange Model (OEM) graph. Labels: Bib; complex objects o1, o12, o24, o29, o43; edges paper, paper, book, references, author, title, year, page, http, publisher, firstname, lastname, first, last; atomic objects include Serge, Abiteboul, Victor, Vianu, 1997, 25, 96]
39XQuery
- New full-powered query language for XML with both document-centric and data-centric capabilities
- Expressive power
- Relational joins
- Navigation and hierarchy structure
- Compositionality (node sets)
- Reconstruction of new node sets
- Combining documents
- Filtering, sorting, functions
- see Maier, Database desiderata for query
languages
40Example
Make an alphabetic list of publishers. Within each publisher, make a list of books, each containing a title and a price, in descending order by price.
<publisher_list>
  FOR $p IN distinct(document("bib.xml")//publisher)
  RETURN
    <publisher>
      <name> $p/text() </name>
      FOR $b IN document("bib.xml")//book[publisher = $p]
      RETURN
        <book> $b/title $b/price </book> SORTBY(price DESCENDING)
    </publisher> SORTBY(name)
</publisher_list>
41XQuery FLWR Expressions
- A FLWR expression binds some expressions, applies a predicate, and constructs a new result.
- expr can contain FLWR expressions
- nested building blocks
FOR and LET clauses generate a list of tuples of
bound expressions, preserving document order.
WHERE clause applies a predicate, eliminating
some of the tuples
RETURN clause is executed for each surviving
tuple, generating an ordered list of outputs
42List the titles of books published by Morgan Kaufmann in 1998.
FOR $b IN document("bib.xml")//book
WHERE $b/publisher = "Morgan Kaufmann" AND $b/year = "1998"
RETURN $b/title
List each publisher and the average price of its books.
FOR $p IN distinct(document("bib.xml")//publisher)
LET $a := avg(document("bib.xml")//book[publisher = $p]/price)
RETURN
  <publisher>
    <name> $p/text() </name>
    <avgprice> $a </avgprice>
  </publisher>
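The JDK does not include an XQuery processor, so the sketch below swaps in XPath 1.0 (javax.xml.xpath) to express the first query, the titles of books published by Morgan Kaufmann in 1998. The class name BibXPathDemo and the sample bibliography data (Book A, Book B, Book C) are assumptions made for illustration; the WHERE clause becomes an XPath predicate.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class BibXPathDemo {

    // Hypothetical stand-in for bib.xml.
    static final String BIB_XML = "<bib>"
            + "<book><title>Book A</title><publisher>Morgan Kaufmann</publisher>"
            + "<year>1998</year></book>"
            + "<book><title>Book B</title><publisher>Morgan Kaufmann</publisher>"
            + "<year>2000</year></book>"
            + "<book><title>Book C</title><publisher>Other Press</publisher>"
            + "<year>1998</year></book>"
            + "</bib>";

    // FOR $b IN //book WHERE publisher and year match RETURN $b/title,
    // written as a single XPath path with a predicate.
    static final String QUERY =
            "//book[publisher = 'Morgan Kaufmann' and year = '1998']/title";

    // Evaluates QUERY against the XML text and returns the title strings.
    static List<String> titles(String xml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList hits = (NodeList) xpath.evaluate(
                    QUERY, new InputSource(new StringReader(xml)),
                    XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < hits.getLength(); i++) {
                result.add(hits.item(i).getTextContent());
            }
            return result;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("Query failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(titles(BIB_XML));
    }
}
```

XPath can select and filter existing nodes, but it cannot construct new elements such as the publisher and avgprice wrappers in the second query; that is the part XQuery adds.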
43Constructing Elements
<book isbn="isbn-0060229357">
  <title>Harold and the Purple Crayon</title>
  <author>
    <first>Crockett</first>
    <last>Johnson</last>
  </author>
</book>

for $i in //book RETURN
<example>
  <p> Here is a query. </p>
  <eg> $i//title </eg>
  <p> Here is the result of the above query. </p>
  <eg>{ $i//title }</eg>
  <surname>{ $i/author/last/text() }</surname>
</example>

Result:
<example>
  <p> Here is a query. </p>
  <eg> $i//title </eg>
  <p> Here is the result of the above query. </p>
  <eg><title>Harold and the Purple Crayon</title></eg>
  <surname>Johnson</surname>
</example>
44FOR versus LET
- FOR
- Binds node variables -> iteration
- LET
- Binds collection variables -> one value

FOR $x IN document("bib.xml")/bib/book RETURN <result> $x </result>
Returns:
<result> <book>...</book> </result>
<result> <book>...</book> </result>
<result> <book>...</book> </result>
...

LET $x := document("bib.xml")/bib/book RETURN <result> $x </result>
Returns:
<result> <book>...</book> <book>...</book> <book>...</book> ... </result>
45SQL - Expressive Power
- XQuery uses a "for let where .. return" syntax
- for <-> SQL from
- where <-> SQL where
- return <-> SQL select
- let allows temporary variables, and has no equivalent in SQL
- let binds to a set of nodes
- group by has no equivalent yet in XQuery
46SQL Joins
<result>
  {
    for $u in document("users.xml")//user_tuple
    for $i in document("items.xml")//item_tuple
    where $u/rating > "C"
      and $i/reserve_price > 1000
      and $i/offered_by = $u/userid
    return
      <warning>
        { $u/name }
        { $u/rating }
        { $i/description }
        { $i/reserve_price }
      </warning>
  }
</result>
1.4.4.3 Q3 Find cases where a user with a rating worse (alphabetically, greater) than "C" is offering an item with a reserve price of more than 1000.
47Is XML a Database ?
- XML is a collection of data
- XML is self-describing, portable and has rich expressiveness to represent any tree or graph structure
- but verbose, needs parsing
- XML provides
- elementary storage (XML documents)
- schemas (DTDs, XML Schema)
- query languages (XQuery, XPath)
- programming interfaces (SAX, DOM, JDOM)
48Storing XML
- Flat Files
- lightweight data storage
- OK for some applications (configuration files, small in-house systems)
- Relational Systems
- Universal Systems (relational + true XML extensions)
- evolving to truer XML model
- Native XML Database
- Other categories
- middleware, XML Servers, XML Application Servers (Zope, Cocoon), Content management (Documentum, Vignette), caching systems
http://www.rpbourret.com/xml/XMLDatabaseProds.htm
49Pure Relational Databases
- Convert your XML file to relational design
- For each complex element, create a table CE and primary key
- For each element with mixed content, create a separate table to store PCDATA, with a link back to the parent element using a foreign key
- For each single element and attribute, create a column in table CE
- Repeat for each complex child element, using foreign keys to link back to the parent
or do proper analysis using ERA!
50Containment or Pointers
containment
pointers
51Storing XML in Relational Databases - CLOB
- Store as string (Character Large Object)
- E.g. store each top level element as a string field of a tuple in a database
- Use a separate relation for each top-level element type
- E.g. account, customer, depositor
- Indexing
- Store values of subelements/attributes as extra indexes
- Benefits
- Can store any XML data even without DTD
- As long as there are many top-level elements in a document, strings are small compared to the full document, allowing faster access to individual elements.
- Drawback: need to parse strings to access values inside the elements; parsing is slow.
52Storing XML as Relations
- Tree representation: model XML data as a general tree and store using relations
- nodes(id, type, label, value)
- child(child-id, parent-id)
- Each element/attribute is given a unique identifier
- type (element or attribute), label = tag, value = content
- child notes the parent-child relationships in the tree
- Can add an extra attribute to child to record ordering of children
- Benefit: can store any XML data, even without DTD
- Drawbacks
- Data is broken up into too many pieces, increasing space overheads
- Even simple queries require a large number of joins, which can be slow
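A sketch of the tree representation: it walks a DOM tree and emits rows for the nodes(id, type, label, value) and child(child-id, parent-id) relations. The class name XmlShredder, the comma-separated row strings and the pre-order id numbering are assumptions made for illustration; a real system would insert the rows into database tables.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class XmlShredder {
    final List<String> nodes = new ArrayList<>(); // rows: id,type,label,value
    final List<String> child = new ArrayList<>(); // rows: child-id,parent-id
    private int nextId = 1;

    // Parses the XML text and shreds it into the two relations.
    static XmlShredder shred(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            XmlShredder shredder = new XmlShredder();
            shredder.visit(root, 0);
            return shredder;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse document", e);
        }
    }

    // Emits one nodes row per element and attribute, plus child rows
    // linking each to its parent. A parentId of 0 means "no parent".
    private void visit(Element element, int parentId) {
        int id = nextId++;
        nodes.add(id + ",element," + element.getTagName() + "," + ownText(element));
        if (parentId != 0) {
            child.add(id + "," + parentId);
        }
        NamedNodeMap atts = element.getAttributes();
        for (int i = 0; i < atts.getLength(); i++) {
            Node att = atts.item(i);
            int attId = nextId++;
            nodes.add(attId + ",attribute," + att.getNodeName() + "," + att.getNodeValue());
            child.add(attId + "," + id);
        }
        for (Node c = element.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE) {
                visit((Element) c, id);
            }
        }
    }

    // Text directly inside the element, or null if there is none.
    private static String ownText(Element element) {
        StringBuilder sb = new StringBuilder();
        for (Node c = element.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.TEXT_NODE) {
                sb.append(c.getNodeValue());
            }
        }
        String text = sb.toString().trim();
        return text.isEmpty() ? null : text;
    }

    public static void main(String[] args) {
        XmlShredder s = shred("<orders><orderitem personid=\"235\">"
                + "<product>choc</product><qty>535</qty></orderitem></orders>");
        s.nodes.forEach(row -> System.out.println("nodes(" + row + ")"));
        s.child.forEach(row -> System.out.println("child(" + row + ")"));
    }
}
```

Even this five-node document produces nine rows, and rebuilding one orderitem requires joining nodes with child once per level, which illustrates the drawbacks listed above.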
53The Semistructured Data Model
[Figure: Object Exchange Model (OEM) graph, the same diagram as on slide 38]
54Mismatches between XML/RDBMS
- XML
- Data in single hierarchy
- Nodes have elements and/or attribute values
- Elements are nested
- Elements are ordered
- Schema optional
- Direct storage/retrieval of simple docs
- Query with XML standards
- RDBMS (normalized)
- Data in multiple tables
- Cells have single value
- Atomic cell values
- Row/column order not defined
- Schema required
- Joins necessary to retrieve simple docs
- Query with SQL retrofitted for XML
Michael Champion, Storing XML in Databases
But the XML hierarchy does not address the issues of redundant information or different nestings, which is the whole rationale for RDBMS!
55XML Universal Databases
- Future trends
- Relational databases are being XML enabled to become truer XML repositories
- Moving beyond simple XML wrappers around relations
- and/or storing XML as LOB (binary, character)
- Trend towards unification of XML content and relational data
- New XML architectures and features (dynamic XML Views, automatic mappings)
- see for example
- Seybold report, Oracle XML DB: Uniting XML Content and Data, 2002
- XTables - Bridging Relational Technology and XML - IBM 2002