Title: XML Tools
1 XML Tools
2 XML Processing
[Figure: XML processing pipeline. Labels: storage system; XML document; document parser (well-formedness checks, reference expansion); XML infoset; document validator (with DTD or XML schema); XML infoset (annotated); application.]
3 DOM
- The Document Object Model (DOM) is a platform- and language-neutral interface that allows programs and scripts to dynamically access and update the content and structure of XML documents.
- The following is part of the DOM interface (Java binding):

    public interface Node {
        public String getNodeName ();
        public String getNodeValue ();
        public NodeList getChildNodes ();
        public NamedNodeMap getAttributes ();
    }

    public interface Element extends Node {
        // returns all descendant elements with the given tag name
        public NodeList getElementsByTagName ( String name );
    }

    public interface Document extends Node {
        public Element getDocumentElement ();
    }

    public interface NodeList {
        public int getLength ();
        public Node item ( int index );
    }
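The interface methods listed above can be exercised on a small in-memory document. The following is a minimal sketch, not part of the original slides: the class name `DomWalk`, its helper methods, and the sample XML string are hypothetical, and it additionally uses `Node.getNodeType()`, which belongs to the standard DOM API but is not in the listing above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomWalk {
    // Parse an XML string into a DOM Document.
    static Document parse(String xml) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return db.parse(new InputSource(new StringReader(xml)));
    }

    // Names of the element children of a node, comma-separated, in document order.
    static String childElementNames(Node parent) {
        StringBuilder sb = new StringBuilder();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node c = children.item(i);
            if (c.getNodeType() == Node.ELEMENT_NODE) {  // skip text nodes
                if (sb.length() > 0) sb.append(",");
                sb.append(c.getNodeName());
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample input, not taken from the slides.
        Document doc = parse("<dept id=\"cse\"><name>CS</name><tel>555</tel></dept>");
        Element root = doc.getDocumentElement();
        System.out.println(root.getNodeName());                                      // dept
        System.out.println(root.getAttributes().getNamedItem("id").getNodeValue());  // cse
        System.out.println(childElementNames(root));                                 // name,tel
        System.out.println(root.getElementsByTagName("tel").getLength());            // 1
    }
}
```

Note that `getChildNodes()` also returns text nodes, which is why the helper filters on the node type before reading the name.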
4 DOM Example

    import java.io.File;
    import javax.xml.parsers.*;
    import org.w3c.dom.*;

    class Test {
        public static void main ( String[] args ) throws Exception {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            DocumentBuilder db = dbf.newDocumentBuilder();
            Document doc = db.parse(new File("depts.xml"));
            NodeList nodes = doc.getDocumentElement().getChildNodes();
            for (int i = 0; i < nodes.getLength(); i++) {
                Node n = nodes.item(i);
                NodeList ndl = n.getChildNodes();
                for (int k = 0; k < ndl.getLength(); k++) {
                    Node m = ndl.item(k);
                    // compare strings with equals(), not ==
                    if ( m.getNodeName().equals("dept")
                         && m.getFirstChild().getNodeValue().equals("cse") ) {
                        NodeList ncl = ((Element) m).getElementsByTagName("tel");
                        for (int j = 0; j < ncl.getLength(); j++) {
                            // the body of this loop is cut off in the source slide
                        }
                    }
                }
            }
        }
    }
5 Better Programming

    import java.io.File;
    import javax.xml.parsers.*;
    import org.w3c.dom.*;
    import java.util.Vector;

    // A sequence of DOM nodes that supports chained navigation steps.
    class Sequence extends Vector<Node> {
        Sequence () { super(); }

        // Start from the root element of the given XML file.
        Sequence ( String filename ) throws Exception {
            super();
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            DocumentBuilder db = dbf.newDocumentBuilder();
            Document doc = db.parse(new File(filename));
            add(doc.getDocumentElement());
        }

        // All children named tagname of every node in this sequence.
        Sequence child ( String tagname ) {
            Sequence result = new Sequence();
            for (int i = 0; i < size(); i++) {
                Node n = elementAt(i);
                NodeList c = n.getChildNodes();
                for (int k = 0; k < c.getLength(); k++)
                    if (c.item(k).getNodeName().equals(tagname))
                        result.add(c.item(k));
            }
            return result;
        }

        void print () {
            for (int i = 0; i < size(); i++)
                System.out.println(elementAt(i).toString());
        }
    }

    class DOM {
        public static void main ( String[] args ) throws Exception {
            (new Sequence("cs.xml")).child("gradstudent").child("name").print();
        }
    }
6 SAX
- SAX is the Simple API for XML. It allows you to process a document as it is being read (in contrast to DOM, which requires the entire document to be read before it takes any action).
- The SAX API is event based.
- The XML parser sends events, such as the start or the end of an element, to an event handler, which processes the information.
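As a minimal sketch of the event-based style described above, the following handler counts start-of-element events while the parser reads the input. The class name `ElementCounter` and the sample XML string are hypothetical and not part of the slides; the parser and handler classes are the standard JAXP/SAX ones.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class ElementCounter extends DefaultHandler {
    int count = 0;

    // Called by the parser each time an element opens.
    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        count++;
    }

    // Parse the XML string and return the number of elements seen.
    static int countElements(String xml) throws Exception {
        ElementCounter handler = new ElementCounter();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.count;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countElements("<a><b/><b><c/></b></a>"));  // prints 4
    }
}
```

No tree is built at any point; the handler keeps only the state it needs (here a single counter), which is the main reason to prefer SAX for large inputs.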
7 Parser Events
- Receive notification of the beginning of a document:
    void startDocument ()
- Receive notification of the end of a document:
    void endDocument ()
- Receive notification of the beginning of an element:
    void startElement ( String namespace, String localName,
                        String qName, Attributes atts )
- Receive notification of the end of an element:
    void endElement ( String namespace, String localName,
                      String qName )
- Receive notification of character data:
    void characters ( char[] ch, int start, int length )
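A common use of these callbacks is to collect the text of one kind of element. The sketch below assumes hypothetical names (`TextCollector`, `collect`) and a made-up input string. It buffers character data because a SAX parser is allowed to deliver the text of one element in several `characters` calls, and it does not handle an element nested inside another element of the same name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class TextCollector extends DefaultHandler {
    private final String tag;                         // element name to collect
    private final StringBuilder buf = new StringBuilder();
    private boolean inside = false;                   // are we inside a <tag> element?
    final List<String> values = new ArrayList<>();

    TextCollector(String tag) { this.tag = tag; }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (qName.equals(tag)) { inside = true; buf.setLength(0); }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inside) buf.append(ch, start, length);    // may be called more than once
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals(tag)) { values.add(buf.toString().trim()); inside = false; }
    }

    // Return the trimmed text of every <tag> element in the XML string.
    static List<String> collect(String xml, String tag) throws Exception {
        TextCollector handler = new TextCollector(tag);
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.values;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<name><lastname> Smith </lastname><firstname>John</firstname></name>";
        System.out.println(collect(xml, "lastname"));  // prints [Smith]
    }
}
```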
8 SAX Example: a Printer

    import java.io.FileReader;
    import javax.xml.parsers.*;
    import org.xml.sax.*;
    import org.xml.sax.helpers.*;

    // Prints the events it receives back out as XML text.
    class Printer extends DefaultHandler {
        public Printer () { super(); }
        public void startDocument () {}
        public void endDocument () { System.out.println(); }
        public void startElement ( String uri, String name,
                                   String tag, Attributes atts ) {
            System.out.print("<" + tag + ">");
        }
        public void endElement ( String uri, String name, String tag ) {
            System.out.print("</" + tag + ">");
        }
        public void characters ( char[] text, int start, int length ) {
            System.out.print(new String(text,start,length));
        }
    }
9 The Child Handler

    class Child extends DefaultHandler {
        DefaultHandler next;   // the next handler in the pipeline
        String ptag;           // the tagname of the child
        boolean keep;          // are we keeping or skipping events?
        short level;           // the depth level of the current element

        public Child ( String s, DefaultHandler n ) {
            super();
            next = n; ptag = s;
            keep = false; level = 0;
        }

        public void startDocument () throws SAXException {
            next.startDocument();
        }

        public void endDocument () throws SAXException {
            next.endDocument();
        }
10 The Child Handler (cont.)

        public void startElement ( String nm, String ln, String qn,
                                   Attributes a ) throws SAXException {
            // level 1 means a child of the outermost element seen by this handler
            if (level++ == 1)
                keep = ptag.equals(qn);
            if (keep)
                next.startElement(nm,ln,qn,a);
        }

        public void endElement ( String nm, String ln, String qn )
                throws SAXException {
            if (keep)
                next.endElement(nm,ln,qn);
            // stop forwarding once the selected child has been closed
            if (--level == 1)
                keep = false;
        }

        public void characters ( char[] text, int start, int length )
                throws SAXException {
            if (keep)
                next.characters(text,start,length);
        }
    }
11 Forming the Pipeline

    class SAX {
        public static void main ( String[] args ) throws Exception {
            SAXParserFactory pf = SAXParserFactory.newInstance();
            SAXParser parser = pf.newSAXParser();
            DefaultHandler handler
                = new Child("gradstudent",
                      new Child("name",
                          new Printer()));
            parser.parse(new InputSource(new FileReader("cs.xml")),
                         handler);
        }
    }

[Figure: SAX parser -> Child:gradstudent -> Child:name -> Printer]
12 Example
- Input stream:

    <department>
      <deptname>
        Computer Science
      </deptname>
      <gradstudent>
        <name>
          <lastname>
            Smith
          </lastname>
          <firstname>
            John
          </firstname>
        </name>
      </gradstudent>
      ...
    </department>

- SAX events (SD/ED = start/end of document, SE/EE = start/end of element, C = characters):

    SD  SE department  SE deptname  C Computer Science  EE deptname
    SE gradstudent  SE name  SE lastname  C Smith  EE lastname
    SE firstname  C John  EE firstname  EE name  EE gradstudent
    ...  EE department  ED

[Figure: the event stream passes through Child gradstudent, then Child name, then Printer.]
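The event listing above can be reproduced with a small tracing handler. This is a sketch under stated assumptions: the class name `EventTrace` is hypothetical, the abbreviations SD, ED, SE, EE and C are read as start/end of document, start/end of element and characters, and whitespace-only text (which a real parser also reports through `characters`) is dropped so that the trace matches the listing.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class EventTrace extends DefaultHandler {
    private final StringBuilder out = new StringBuilder();   // the trace so far
    private final StringBuilder text = new StringBuilder();  // pending character data

    private void emit(String s) {
        if (out.length() > 0) out.append(' ');
        out.append(s);
    }

    // Report buffered text as one C event, ignoring whitespace-only text.
    private void flushText() {
        String t = text.toString().trim();
        if (!t.isEmpty()) emit("C " + t);
        text.setLength(0);
    }

    @Override public void startDocument() { emit("SD"); }
    @Override public void endDocument() { flushText(); emit("ED"); }
    @Override public void startElement(String uri, String ln, String qn, Attributes a) {
        flushText(); emit("SE " + qn);
    }
    @Override public void endElement(String uri, String ln, String qn) {
        flushText(); emit("EE " + qn);
    }
    @Override public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    static String trace(String xml) throws Exception {
        EventTrace handler = new EventTrace();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(trace(
            "<department><deptname> Computer Science </deptname>"
          + "<gradstudent><name><lastname>Smith</lastname>"
          + "<firstname>John</firstname></name></gradstudent></department>"));
    }
}
```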
13 XSL Transformation
- A stylesheet specification language for converting XML documents into various forms (XML, HTML, plain text, etc.).
- Can transform each XML element into another element, add new elements to the output file, or remove elements.
- Can rearrange and sort elements, test and make decisions about which elements to display, and much more.
- Based on XPath.

    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:template match="/">
        <students>
          <xsl:copy-of select="//student/name"/>
        </students>
      </xsl:template>
    </xsl:stylesheet>
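To see what this stylesheet produces, it can be run from Java with the standard `javax.xml.transform` API. The sketch below is an illustration only: the class name `CopyNames`, the helper `transform`, and the input document are hypothetical, and the literal `<students>` element is assumed to sit inside a template that matches the document root.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class CopyNames {
    static final String XSL =
        "<xsl:stylesheet version=\"1.0\""
      + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:template match=\"/\">"
      + "<students><xsl:copy-of select=\"//student/name\"/></students>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Apply the stylesheet text to the XML text and return the result as a string.
    static String transform(String xsl, String xml) throws Exception {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xsl)));
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter w = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(w));
        return w.toString();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical input document.
        String xml = "<cs><student><name>Ann</name><age>20</age></student>"
                   + "<student><name>Bob</name></student></cs>";
        System.out.println(transform(XSL, xml));
    }
}
```

With this input the result keeps only the two `name` elements, wrapped in `students`; the `age` element is not selected and so does not appear.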
14 XSLT Templates
- XSL uses XPath to define parts of the source document that match one or more predefined templates.
- When a match is found, XSLT transforms the matching part of the source document into the result document.
- The parts of the source document that do not match any explicit template are processed by the default templates: their text ends up in the result document, but their tags do not.
- Form:

    <xsl:template match="XPath expression">
      ...
    </xsl:template>

- The default (implicit) templates visit all nodes and strip out all tags:

    <xsl:template match="*|/">
      <xsl:apply-templates/>
    </xsl:template>
    <xsl:template match="text()|@*">
      <xsl:value-of select="."/>
    </xsl:template>
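The effect of the default templates can be checked with a stylesheet that defines no templates at all. In the sketch below, the class name `DefaultTemplates`, the helper `transform`, and the input are hypothetical; the stylesheet contains only an `xsl:output` declaration so that the result is plain text.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class DefaultTemplates {
    // A stylesheet with no templates: only the built-in rules apply.
    static final String XSL =
        "<xsl:stylesheet version=\"1.0\""
      + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"text\"/>"
      + "</xsl:stylesheet>";

    // Apply the stylesheet text to the XML text and return the result as a string.
    static String transform(String xsl, String xml) throws Exception {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xsl)));
        StringWriter w = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(w));
        return w.toString();
    }

    public static void main(String[] args) throws Exception {
        // Tags are dropped, text is kept.
        System.out.println(transform(XSL, "<a><b>x</b><c>y</c></a>"));  // prints xy
    }
}
```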
15 Other XSLT Elements
- <xsl:value-of select="XPath expression"/>
  - selects the value of an XML element and adds it to the output stream of the transformation, e.g. <xsl:value-of select="//books/book/author"/>.
- <xsl:copy-of select="XPath expression"/>
  - copies the entire XML element to the output stream of the transformation.
- <xsl:apply-templates select="XPath expression"/>
  - applies the template rules to the nodes selected by the XPath expression.
- <xsl:element name="{XPath expression}"> ... </xsl:element>
  - adds an element to the output with a tag name derived from the XPath expression.
- Example:

    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:template match="employee">
        <b> <xsl:apply-templates select="node()"/> </b>
      </xsl:template>
      <xsl:template match="surname">
        <i> <xsl:value-of select="."/> </i>
      </xsl:template>
    </xsl:stylesheet>
16 Copy the Entire Document

    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:template match="/">
        <xsl:apply-templates/>
      </xsl:template>
      <xsl:template match="text()">
        <xsl:value-of select="."/>
      </xsl:template>
      <xsl:template match="*">
        <xsl:element name="{name(.)}">
          <xsl:apply-templates/>
        </xsl:element>
      </xsl:template>
    </xsl:stylesheet>
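A quick way to check this stylesheet is to run it on a small document. The sketch below uses hypothetical names (`CopyDocument`, `transform`) and a made-up input, and it assumes the element name is given as the attribute value template `{name(.)}`. It also illustrates one limitation: `xsl:apply-templates` without a `select` visits only child nodes, so attributes are not carried over.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class CopyDocument {
    static final String XSL =
        "<xsl:stylesheet version=\"1.0\""
      + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:template match=\"/\"><xsl:apply-templates/></xsl:template>"
      + "<xsl:template match=\"text()\"><xsl:value-of select=\".\"/></xsl:template>"
      + "<xsl:template match=\"*\">"
      + "<xsl:element name=\"{name(.)}\"><xsl:apply-templates/></xsl:element>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Apply the stylesheet text to the XML text and return the result as a string.
    static String transform(String xsl, String xml) throws Exception {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xsl)));
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter w = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(w));
        return w.toString();
    }

    public static void main(String[] args) throws Exception {
        // Elements and text are rebuilt; the id attribute is not copied.
        System.out.println(transform(XSL, "<a id=\"1\"><b>x</b></a>"));
    }
}
```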
17 More on XSLT
- Conflict resolution: more specific templates override more general templates. Templates are assigned default priorities, but these can be overridden using priority="n" in a template.
- Modes can be used to group templates together. A template with no mode belongs to the empty (default) mode.

    <xsl:template match="..." mode="A">
      <xsl:apply-templates mode="B"/>
    </xsl:template>

- Conditional and loop statements:

    <xsl:if test="XPath predicate"> body </xsl:if>
    <xsl:for-each select="XPath"> body </xsl:for-each>

- Variables can be used to name data:

    <xsl:variable name="x"> value </xsl:variable>

- Variables are referenced as $x in XPath expressions.
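The loop, conditional and variable constructs can be combined in one short stylesheet. The following is a sketch with hypothetical names (`LoopAndTest`, `transform`, the variable `limit`) and made-up student data: it lists the names of students whose age exceeds the variable's value.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class LoopAndTest {
    static final String XSL =
        "<xsl:stylesheet version=\"1.0\""
      + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"text\"/>"
      + "<xsl:variable name=\"limit\" select=\"25\"/>"      // referenced below as $limit
      + "<xsl:template match=\"/\">"
      + "<xsl:for-each select=\"//student\">"
      + "<xsl:if test=\"age &gt; $limit\">"
      + "<xsl:value-of select=\"name\"/><xsl:text>;</xsl:text>"
      + "</xsl:if>"
      + "</xsl:for-each>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Apply the stylesheet text to the XML text and return the result as a string.
    static String transform(String xsl, String xml) throws Exception {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xsl)));
        StringWriter w = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(w));
        return w.toString();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical input document.
        String xml = "<cs>"
            + "<student><name>Ann</name><age>30</age></student>"
            + "<student><name>Bob</name><age>20</age></student>"
            + "<student><name>Cy</name><age>27</age></student>"
            + "</cs>";
        System.out.println(transform(XSL, xml));  // prints Ann;Cy;
    }
}
```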
18 Using XSLT

    import javax.xml.parsers.*;
    import org.xml.sax.*;
    import org.w3c.dom.*;
    import javax.xml.transform.*;
    import javax.xml.transform.dom.*;
    import javax.xml.transform.stream.*;
    import java.io.*;

    class XSLT {
        public static void main ( String[] argv ) throws Exception {
            File stylesheet = new File("x.xsl");
            File xmlfile = new File("a.xml");
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            DocumentBuilder db = dbf.newDocumentBuilder();
            Document document = db.parse(xmlfile);
            StreamSource stylesource = new StreamSource(stylesheet);
            TransformerFactory tf = TransformerFactory.newInstance();
            Transformer transformer = tf.newTransformer(stylesource);
            DOMSource source = new DOMSource(document);