Ace104 Lecture 6 - PowerPoint PPT Presentation

Provided by: peopleCs

1
Ace104 Lecture 6
  • Parsing XML into programming languages

2
Parsing XML
  • Goal: read XML files into data structures in
    programming languages
  • Possible strategies:
  • Parse by hand with some reusable libraries
  • Parse into a generic tree structure
  • Parse as a sequence of events
  • Automagically parse to language-specific objects

3
Parsing by-hand
  • Advantages
  • Complete control
  • Good for simple needs; can build on a regex package
  • Disadvantages
  • Must write the initial code yourself, even if it
    becomes generalized
  • Pretty tedious and error prone.
  • Gets very hard when using schema or DTD to
    validate
  • No one does this anymore

4
Parsing into generic tree structure
  • Advantages
  • Industry-wide, language neutral W3C standard
    exists called DOM (Document Object Model)
  • Learning DOM for one language makes it easy to
    learn for any other
  • As of JAXP 1.2, support for Schema
  • Have to write much less code to get XML to
    something you want to manipulate in your program
  • Disadvantages
  • Non-intuitive API; doesn't take full advantage of
    Java
  • Still quite a bit of work

5
What is JAXP?
  • JAXP: Java API for XML Processing
  • In the Java language, the definitions of these
    standard APIs (together with the XSLT API) make up
    a set of interfaces known as JAXP
  • Java also provides standard implementations
    together with a vendor pluggability layer
  • Some of these come standard with the J2SDK; others
    are only available with the Java Web Services
    Developer Pack
  • We will study these shortly

6
Another alternative
  • JDOM: native Java published API for representing
    XML as a tree
  • Like DOM but much more Java-specific and
    object-oriented
  • However, not supported by other languages
  • Also, no support for schema
  • dom4j: another alternative

7
JAXB
  • JAXB: Java Architecture for XML Binding
  • Defines an API for automagically representing XML
    schema as collections of Java classes
  • Most convenient for application programming
  • Will cover next class

8
DOM
9
About DOM
  • Stands for Document Object Model
  • A World Wide Web Consortium (W3C) standard
  • The standard keeps adding new features: Level 3
    Core became a W3C Recommendation in April 2004
  • We'll cover most of the basics. There's always
    more, and it's always changing.

10
DOM abstraction layer in Java: architecture
Emphasis is on allowing vendors to supply their
own DOM implementation without requiring changes
to source code
Returns specific parser implementation
org.w3c.dom.Document
11
Sample Code
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    /* set some factory options here */
    DocumentBuilder builder = factory.newDocumentBuilder();
    Document doc = builder.parse(xmlFile);
The factory instance determines the parser
implementation. It can be changed with a runtime
System property. The JDK has a default; Xerces is
much better.
From the factory one obtains an instance of the
parser.
xmlFile can be a java.io.File, an InputStream,
etc.
For reference:
    javax.xml.parsers.DocumentBuilderFactory
    javax.xml.parsers.DocumentBuilder
    org.w3c.dom.Document
Notice that the Document class comes from the
W3C-specified bindings.
12
Validation
  • Note that by default the parser will not validate
    against a schema or DTD
  • As of JAXP 1.2, Java provides a default parser
    that can handle most schema features
  • See next slide for details on how to set it up

13
Important: Schema validation
    String JAXP_SCHEMA_LANGUAGE =
        "http://java.sun.com/xml/jaxp/properties/schemaLanguage";
    String W3C_XML_SCHEMA =
        "http://www.w3.org/2001/XMLSchema";
Next, you need to configure DocumentBuilderFactory to
generate a namespace-aware, validating parser
that uses XML Schema:
    DocumentBuilderFactory factory =
        DocumentBuilderFactory.newInstance();
    factory.setNamespaceAware(true);
    factory.setValidating(true);
    try {
        factory.setAttribute(JAXP_SCHEMA_LANGUAGE, W3C_XML_SCHEMA);
    } catch (IllegalArgumentException x) {
        // Happens if the parser does not support JAXP 1.2
        ...
    }
14
Associating document with schema
  • An XML file can be associated with a schema in
    two ways:
  • Directly in the XML file, in the regular way
  • Programmatically from Java
  • The latter is done as
  • factory.setAttribute(JAXP_SCHEMA_SOURCE,
    new File(schemaSource));
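
The two validation slides can be put together as the sketch below. It is only an illustration under stated assumptions: the one-element schema, the temporary file, and the class and method names (SchemaValidationSketch, validate) are hypothetical and not part of the course examples. It also assumes the parser in use accepts the JAXP 1.2 schemaLanguage and schemaSource attributes, which the JDK default parser does.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

class SchemaValidationSketch {
    static final String JAXP_SCHEMA_LANGUAGE =
        "http://java.sun.com/xml/jaxp/properties/schemaLanguage";
    static final String JAXP_SCHEMA_SOURCE =
        "http://java.sun.com/xml/jaxp/properties/schemaSource";
    static final String W3C_XML_SCHEMA = "http://www.w3.org/2001/XMLSchema";

    // Hypothetical schema for illustration: a <grade> element holding an integer.
    static final String SCHEMA =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='grade' type='xs:int'/>"
      + "</xs:schema>";

    /** Parses xml against SCHEMA and returns the validation messages (empty if valid). */
    static List<String> validate(String xml) throws Exception {
        // The slide passes a File as the schema source, so write the schema to one.
        File schemaFile = File.createTempFile("grade", ".xsd");
        schemaFile.deleteOnExit();
        Files.write(schemaFile.toPath(), SCHEMA.getBytes(StandardCharsets.UTF_8));

        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setValidating(true);
        factory.setAttribute(JAXP_SCHEMA_LANGUAGE, W3C_XML_SCHEMA);
        factory.setAttribute(JAXP_SCHEMA_SOURCE, schemaFile);

        // Without an ErrorHandler, validation errors are only reported on stderr.
        List<String> problems = new ArrayList<>();
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { problems.add(e.getMessage()); }
            public void error(SAXParseException e) { problems.add(e.getMessage()); }
            public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
        });
        builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return problems;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(validate("<grade>95</grade>"));        // no problems expected
        System.out.println(validate("<grade>excellent</grade>")); // schema violations expected
    }
}
```

Collecting messages in an ErrorHandler is a design choice for the sketch; a real application might instead rethrow the first error to abort parsing.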

15
A few notes
  • The factory makes it easy to switch parser
    implementations
  • Java provides a simple DOM implementation, but it
    is much better to use a vendor-supplied one when
    doing serious work
  • Xerces, part of the Apache project, is installed on
    the cluster as an Eclipse plugin. We'll use it
    next week.
  • Note that some properties are not supported by
    all parser implementations.

16
Document object
  • Once a Document object is obtained, there is a
    rich API to manipulate it.
  • The first call is usually
  • Element root = doc.getDocumentElement();
  • This gets the root element of the Document as an
    instance of the Element class
  • Note that Element extends Node and has methods
    getNodeType(), getNodeName(), getNodeValue(), and
    getChildNodes()

17
Types of Nodes
  • Note that there are many types of Nodes (i.e.
    subtypes of Node)
  • Attr, CDATASection, Comment, Document,
    DocumentFragment, DocumentType, Element, Entity,
    EntityReference, Notation, ProcessingInstruction,
    Text
  • Each of these has a special and non-obvious
    associated type, value, and name.
  • The standards are language-neutral and are
    specified in the chart on the following slide
  • Important: keep this chart nearby when using DOM
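
The type, name, and value conventions can also be checked directly. The sketch below is an illustration only: the class name NodeInfoSketch, the helper describe, and the sample document are hypothetical. It prints getNodeType(), getNodeName(), and getNodeValue() for a Document, an Element, an Attr, a Text, and a Comment node.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class NodeInfoSketch {
    /** Formats a node as "type | name | value". */
    static String describe(Node n) {
        return n.getNodeType() + " | " + n.getNodeName() + " | " + n.getNodeValue();
    }

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<note id=\"7\">hi<!--c--></note>");
        Element root = doc.getDocumentElement();
        System.out.println(describe(doc));                         // 9 | #document | null
        System.out.println(describe(root));                        // 1 | note | null
        System.out.println(describe(root.getAttributeNode("id"))); // 2 | id | 7
        System.out.println(describe(root.getFirstChild()));        // 3 | #text | hi
        System.out.println(describe(root.getLastChild()));         // 8 | #comment | c
    }
}
```

Note that an Element has a null value: its text lives in child Text nodes, which is one of the non-obvious points the chart captures.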

18
(No Transcript)
19
DOM Exercise
Write a function to do a depth-first printout of
the node information of a given XML file, called
as recursePrint(root).
Assume you have access to the following:
    printNodeInfo(Node node): prints the name, type,
        and value of the input node
    boolean Node.hasChildNodes(): checks if a node
        has any children
    NodeList Node.getChildNodes(): gets a list of all
        child nodes
    Node NodeList.item(int num): selects the num-th
        child node
public static void recursePrint(Node node)
20
DOM Exercise: Answer
    public static void recursePrint(Node node) {
        printNodeInfo(node);
        if (!node.hasChildNodes()) return;
        NodeList nodes = node.getChildNodes();
        for (int i = 0; i < nodes.getLength(); i++) {
            // visit each child, and through it the child's own subtree
            recursePrint(nodes.item(i));
        }
    }
21
Transforming XML
22
The JAXP Transformation Packages
  • JAXP Transformation APIs
  • javax.xml.transform
  • This package defines the factory class you use to
    get a Transformer object. You then configure the
    transformer with input (Source) and output
    (Result) objects, and invoke its transform()
    method to make the transformation happen. The
    source and result objects are created using
    classes from one of the other three packages.
  • javax.xml.transform.dom
  • Defines the DOMSource and DOMResult classes that
    let you use a DOM as an input to or output from a
    transformation.
  • javax.xml.transform.sax
  • Defines the SAXSource and SAXResult classes that
    let you use a SAX event generator as input to a
    transformation, or deliver SAX events as output
    to a SAX event processor.
  • javax.xml.transform.stream
  • Defines the StreamSource and StreamResult classes
    that let you use an I/O stream as an input to or
    output from a transformation.
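
As a sketch of how the Source and Result classes plug into a Transformer, the example below wires a StreamSource to a DOMResult and a DOMSource to a StreamResult using an identity transformer (one created without a stylesheet). The class and method names (TransformWiringSketch, toDom, toText) are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Node;

class TransformWiringSketch {
    /** Stream to DOM: an identity transform that builds a tree from text. */
    static Node toDom(String xml) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        DOMResult result = new DOMResult();
        t.transform(new StreamSource(new StringReader(xml)), result);
        return result.getNode();
    }

    /** DOM to stream: an identity transform that serializes a tree to text. */
    static String toText(Node node) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(node), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Round trip: text -> DOM -> text
        System.out.println(toText(toDom("<a><b>1</b></a>")));
    }
}
```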

23
Transformer Architecture
24
Writing DOM to XML
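    import java.io.File;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.dom.DOMSource;
    import javax.xml.transform.stream.StreamResult;
    import org.w3c.dom.Document;

    public class WriteDOM {
        public static void main(String[] argv) throws Exception {
            // Parse the input file into a DOM tree
            File f = new File(argv[0]);
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document document = builder.parse(f);
            // Identity transform: DOM in, XML text out
            TransformerFactory tFactory = TransformerFactory.newInstance();
            Transformer transformer = tFactory.newTransformer();
            DOMSource source = new DOMSource(document);
            StreamResult result = new StreamResult(System.out);
            transformer.transform(source, result);
        }
    }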

25
Creating a DOM from scratch
  • Sometimes you may want to create a DOM tree
    directly in memory. This is done with
  • DocumentBuilderFactory factory =
    DocumentBuilderFactory.newInstance();
  • DocumentBuilder builder =
    factory.newDocumentBuilder();
  • Document document = builder.newDocument();

26
Manipulating Nodes
  • Once the root node is obtained, typical tree
    methods exist to manipulate other elements:
  • boolean node.hasChildNodes()
  • NodeList node.getChildNodes()
  • Node node.getNextSibling()
  • Node node.getParentNode()
  • String node.getNodeValue()
  • String node.getNodeName()
  • String node.getTextContent()
  • void setNodeValue(String nodeValue)
  • Node insertBefore(Node newChild, Node refChild)
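
A minimal sketch of creating a tree in memory and then using a few of these methods follows. The class name BuildDomSketch and the element names are hypothetical choices for illustration.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class BuildDomSketch {
    /** Builds <sequence><number>3</number><number>8</number></sequence> in memory. */
    static Element build() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder().newDocument();
        Element root = doc.createElement("sequence");
        doc.appendChild(root);

        Element five = doc.createElement("number");
        five.appendChild(doc.createTextNode("5"));
        root.appendChild(five);

        // insertBefore places the new child ahead of the reference child.
        Element three = doc.createElement("number");
        three.appendChild(doc.createTextNode("3"));
        root.insertBefore(three, five);

        // setNodeValue on a Text node replaces its character data.
        five.getFirstChild().setNodeValue("8");
        return root;
    }

    public static void main(String[] args) throws Exception {
        Element root = build();
        // Walk the children with getFirstChild / getNextSibling.
        for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
            System.out.println(n.getNodeName() + " = " + n.getTextContent()
                + " (parent: " + n.getParentNode().getNodeName() + ")");
        }
        // prints:
        // number = 3 (parent: sequence)
        // number = 8 (parent: sequence)
    }
}
```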

27
JDOM
28
JDOM Motivation (from Elliotte Rusty Harold)
  • Unfortunately DOM suffers from a number of design
    flaws and limitations that make it less than
    ideal as a Java API for processing XML
  • DOM had to be backwards compatible with the
    hackish, poorly thought out, unplanned object
    models used in third generation web browsers.
  • DOM was designed by a committee trying to
    reconcile differences between the object models
    implemented by Netscape, Microsoft, and other
    vendors. They needed a solution that was at least
    minimally acceptable to everybody, which resulted
    in an API that's maximally acceptable to no one.
  • DOM is a cross-language API defined in IDL, and
    thus limited to those features and classes that
    are available in essentially all programming
    languages, including not fully object-oriented
    scripting languages like JavaScript and Visual
    Basic. It is a lowest common denominator API. It
    does not take full advantage of Java, nor does it
    adhere to Java best practices, naming
    conventions, and coding standards.
  • DOM must work for both HTML (not just XHTML, but
    traditional malformed HTML) and XML.

29
Some sample JDOM
To create the element <fibonacci/>
In JDOM:
    Element element = new Element("fibonacci");
In DOM:
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    DocumentBuilder builder = factory.newDocumentBuilder();
    DOMImplementation impl = builder.getDOMImplementation();
    Document doc = impl.createDocument(null, "Fibonacci_Numbers", null);
    Element element = doc.createElement("fibonacci");
In JDOM, adding text and an attribute:
    Element element = new Element("fibonacci");
    element.setText("8");
    element.setAttribute("index", "6");
Extremely simple and intuitive!
30
More JDOM
  • To create this element
  • <sequence>
  •   <number>3</number>
  •   <number>5</number>
  • </sequence>
  • Element element = new Element("sequence");
  • Element firstNumber = new Element("number");
  • Element secondNumber = new Element("number");
  • firstNumber.setText("3");
  • secondNumber.setText("5");
  • element.addContent(firstNumber);
  • element.addContent(secondNumber);

31
Parsing XML file with JDOM
    import org.jdom.*;
    import org.jdom.input.SAXBuilder;
    import java.io.IOException;
    import java.util.*;

    public class ElementLister {

        public static void main(String[] args) {
            if (args.length == 0) {
                System.out.println("Usage: java ElementLister URL");
                return;
            }
            SAXBuilder builder = new SAXBuilder();
            try {
                Document doc = builder.build(args[0]);
                Element root = doc.getRootElement();
                listChildren(root, 0);
            }
            // indicates a well-formedness error
            catch (JDOMException e) {
                System.out.println(args[0] + " is not well-formed.");
                System.out.println(e.getMessage());
            }
            catch (IOException e) {
                System.out.println(e);
            }
        }

        public static void listChildren(Element current, int depth) {
            printSpaces(depth);
            System.out.println(current.getName());
            List children = current.getChildren();
            Iterator iterator = children.iterator();
            while (iterator.hasNext()) {
                Element child = (Element) iterator.next();
                listChildren(child, depth + 1);
            }
        }

        private static void printSpaces(int n) {
            for (int i = 0; i < n; i++) {
                System.out.print(' ');
            }
        }
    }
32
SAX
  • Simple API for XML

33
About SAX
  • SAX in Java is hosted on SourceForge
  • SAX is not a W3C standard
  • Originated purely in Java
  • Other languages have chosen to implement in their
    own ways based on this prototype

34
SAX vs.
  • Please don't compare unrelated things
  • SAX is an alternative to DOM, but realize that
    DOM is often built on top of SAX
  • SAX and DOM do not compete with JAXP
  • They do both compete with JAXB implementations

35
How a SAX parser works
  • SAX parser scans an xml stream on the fly and
    responds to certain parsing events as it
    encounters them.
  • This is very different from digesting an entire
    XML document into memory.
  • Much faster, requires less memory.
  • However, need to reparse if you need to revisit
    data.

36
Obtaining a SAX parser
  • Important classes:
  • javax.xml.parsers.SAXParserFactory
  • javax.xml.parsers.SAXParser
  • javax.xml.parsers.ParserConfigurationException
  • // get the parser
  • SAXParserFactory factory =
    SAXParserFactory.newInstance();
  • SAXParser saxParser = factory.newSAXParser();
  • // parse the document
  • saxParser.parse(new File(argv[0]), handler);

37
DefaultHandler
  • Note that an event handler has to be passed to
    the SAX parser.
  • This must implement the interface
  • org.xml.sax.ContentHandler
  • Easier to extend the adapter
  • org.xml.sax.helpers.DefaultHandler

38
Overriding Handler methods
  • Most important methods to override
  • void startDocument()
  • Called once when document parsing begins
  • void endDocument()
  • Called once when parsing ends
  • void startElement(...)
  • Called each time an element begin tag is
    encountered
  • void endElement(...)
  • Called each time an element end tag is
    encountered
  • void characters(...)
  • Called one or more times between startElement and
    endElement calls to deliver accumulated character
    data

39
startElement
  • public void startElement(
  •     String namespaceURI, // namespace URI, if one is associated
  •     String sName,        // simple (nonqualified) name
  •     String qName,        // qualified name
  •     Attributes attrs)    // list of attributes
  • Attribute info is obtained by querying the
    Attributes object.
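
As an illustration of querying the Attributes object, the sketch below extends DefaultHandler and records every attribute it sees. The class name AttributeLister, its list method, and the sample document are hypothetical. The parser here is the default, non-namespace-aware one, so the qualified names are the ones to rely on.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class AttributeLister extends DefaultHandler {
    final List<String> seen = new ArrayList<>();

    @Override
    public void startElement(String namespaceURI, String sName,
                             String qName, Attributes attrs) {
        // Attributes can be walked by index (or looked up by name with getValue(String)).
        for (int i = 0; i < attrs.getLength(); i++) {
            seen.add(qName + "@" + attrs.getQName(i) + "=" + attrs.getValue(i));
        }
    }

    /** Parses xml and returns "element@attribute=value" for each attribute found. */
    static List<String> list(String xml) throws Exception {
        AttributeLister handler = new AttributeLister();
        SAXParserFactory.newInstance().newSAXParser().parse(
            new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.seen;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(list("<students><student id=\"1\" year=\"2\"/></students>"));
    }
}
```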

40
Characters
  • public void characters(
  •     char[] buf,  // buffer of chars accumulated
  •     int offset,  // index of the first char in buf
  •     int len)     // number of chars
  • Note: characters may be called more than once
    between begin tag / end tag
  • Also, mixed-content elements require careful
    handling
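
Because characters may fire several times for one element, a common pattern is to append to a buffer and only read it in endElement. The sketch below assumes a hypothetical document of <title> elements; the class name TextCollector is likewise made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class TextCollector extends DefaultHandler {
    private final StringBuilder buffer = new StringBuilder();
    final List<String> titles = new ArrayList<>();

    @Override
    public void startElement(String uri, String sName, String qName, Attributes attrs) {
        buffer.setLength(0); // start fresh for each element
    }

    @Override
    public void characters(char[] buf, int offset, int len) {
        buffer.append(buf, offset, len); // may be one of several chunks
    }

    @Override
    public void endElement(String uri, String sName, String qName) {
        if ("title".equals(qName)) {
            titles.add(buffer.toString().trim()); // the text is complete only here
        }
    }

    static List<String> titles(String xml) throws Exception {
        TextCollector handler = new TextCollector();
        SAXParserFactory.newInstance().newSAXParser().parse(
            new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.titles;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(titles(
            "<books><book><title>XML &amp; Java</title></book>"
          + "<book><title>SAX</title></book></books>"));
        // prints [XML & Java, SAX]
    }
}
```

Resetting the buffer on every start tag is enough for leaf elements like these; mixed content would need a stack of buffers or similar bookkeeping.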

41
Entity references
  • Recall that entity references are special
    character sequences for referring to characters
    that have special meaning in XML syntax
  • &lt; is <
  • &gt; is >
  • In SAX these are automatically converted and
    passed to the characters stream unless they are
    part of a CDATA section
42
Choosing a Parser
  • Choosing your Parser Implementation
  • If no other factory class is specified, the
    default SAXParserFactory class is used. To use a
    different manufacturer's parser, you can change
    the value of the system property that points
    to it. You can do that from the command line,
    like this:
  • java -Djavax.xml.parsers.SAXParserFactory=yourFactoryHere ...
  • The factory name you specify must be a fully
    qualified class name (all package prefixes
    included). For more information, see the
    documentation of the newInstance() method of the
    SAXParserFactory class.
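
To check which factory is actually in effect, one option is simply to print the class that newInstance() returns. This is a small sketch; the class name WhichParser is hypothetical, and the output depends on the JDK and on any -D setting given at launch.

```java
import javax.xml.parsers.SAXParserFactory;

class WhichParser {
    /** Name of the SAXParserFactory implementation selected at runtime. */
    static String factoryClassName() {
        return SAXParserFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        // null unless -Djavax.xml.parsers.SAXParserFactory=... was given
        System.out.println(System.getProperty("javax.xml.parsers.SAXParserFactory"));
        System.out.println(factoryClassName());
    }
}
```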

43
Validating SAX Parsers
    String JAXP_SCHEMA_LANGUAGE =
        "http://java.sun.com/xml/jaxp/properties/schemaLanguage";
    String W3C_XML_SCHEMA =
        "http://www.w3.org/2001/XMLSchema";
Next, you need to configure SAXParserFactory to
generate a namespace-aware, validating parser
that uses XML Schema. Unlike the DOM case, the
property is set on the SAXParser, not on the
factory:
    SAXParserFactory factory = SAXParserFactory.newInstance();
    factory.setNamespaceAware(true);
    factory.setValidating(true);
    SAXParser saxParser = factory.newSAXParser();
    try {
        saxParser.setProperty(JAXP_SCHEMA_LANGUAGE, W3C_XML_SCHEMA);
    } catch (SAXNotRecognizedException x) {
        // Happens if the parser does not support JAXP 1.2
        ...
    }
44
Transforming arbitrary data structures using SAX
and Transformer
45
Goal
  • Now that we know SAX and a little about
    Transformations, there are some cool things we
    can do.
  • One immediate thing is to create XML files from
    plain text files with the help of a faux SAX
    parser
  • This turns out to be more robust than doing it by
    hand

46
Transformers
  • Recall that transformers easily let us go between
    any source and result by arbitrary wirings of
  • StreamSource / StreamResult
  • SAXSource / SAXResult
  • DOMSource / DOMResult
  • We used this to write a DOM tree to an XML file
  • Now we will use a SAXSource together with a
    StreamResult to convert our text file

47
Strategy
  • We construct our own SAX parser, i.e. a class that
    implements the XMLReader interface
  • This class must have a parse method (among
    others)
  • We use parse to read our input file and fire the
    appropriate SAX events, rather than handcoding
    the Strings ourselves.

48
Main snippet
    public static void main(String[] argv) throws Exception {
        // Create SAX parser
        StudentReader saxReader = new StudentReader();
        // Create transformer
        TransformerFactory tFactory = TransformerFactory.newInstance();
        Transformer transformer = tFactory.newTransformer();
        // Use text file as Transformer source
        FileReader fr = new FileReader("students.txt");
        BufferedReader br = new BufferedReader(fr);
        InputSource inputSource = new InputSource(br);
        SAXSource source = new SAXSource(saxReader, inputSource);
        // Use System.out as result
        StreamResult result = new StreamResult(System.out);
        transformer.transform(source, result);
    }
49
XMLReader implementation
  • To have a valid SAXSource we need a class that
    implements the XMLReader interface:
  • public void parse(InputSource input)
  • public void setContentHandler(ContentHandler
    handler)
  • public ContentHandler getContentHandler()
  • ...
  • Shown are the important methods for a simple app
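
The course's StudentReader is not reproduced in the slides, so the following is only a guess at its general shape, under loudly stated assumptions: the input format (one student name per line), the element names students and student, and the convert helper are all hypothetical. It shows the technique described above: parse() reads the text and fires SAX events, and an identity Transformer turns those events into XML.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.ContentHandler;
import org.xml.sax.DTDHandler;
import org.xml.sax.EntityResolver;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;

class StudentReader implements XMLReader {
    private ContentHandler handler;

    public void setContentHandler(ContentHandler h) { handler = h; }
    public ContentHandler getContentHandler() { return handler; }

    /** Reads one name per line (assumed format) and fires the matching SAX events. */
    public void parse(InputSource input) throws IOException, SAXException {
        BufferedReader in = new BufferedReader(input.getCharacterStream());
        AttributesImpl none = new AttributesImpl();
        handler.startDocument();
        handler.startElement("", "students", "students", none);
        for (String line = in.readLine(); line != null; line = in.readLine()) {
            if (line.trim().isEmpty()) continue;
            char[] name = line.trim().toCharArray();
            handler.startElement("", "student", "student", none);
            handler.characters(name, 0, name.length);
            handler.endElement("", "student", "student");
        }
        handler.endElement("", "students", "students");
        handler.endDocument();
    }

    // The remaining XMLReader methods are stubs; this simple case does not need them.
    public void parse(String systemId) throws SAXException {
        throw new SAXException("this sketch only reads character streams");
    }
    public boolean getFeature(String name) { return false; }
    public void setFeature(String name, boolean value) { }
    public Object getProperty(String name) { return null; }
    public void setProperty(String name, Object value) { }
    public void setEntityResolver(EntityResolver r) { }
    public EntityResolver getEntityResolver() { return null; }
    public void setDTDHandler(DTDHandler h) { }
    public DTDHandler getDTDHandler() { return null; }
    public void setErrorHandler(ErrorHandler h) { }
    public ErrorHandler getErrorHandler() { return null; }

    /** Wires the reader into a SAXSource and serializes the events as XML text. */
    static String convert(String text) throws Exception {
        StringWriter out = new StringWriter();
        SAXSource source = new SAXSource(new StudentReader(),
                                         new InputSource(new StringReader(text)));
        TransformerFactory.newInstance().newTransformer()
            .transform(source, new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(convert("Alice\nBob\n"));
    }
}
```

Letting the Transformer serialize the events means escaping and well-formedness are handled by the library, which is the robustness advantage over concatenating strings by hand.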

50
See Course Examples for details
51
JAXB
  • Java Architecture for XML Binding

52
What is JAXB?
  • JAXB defines the behavior of a standard set of
    tools and interfaces that automatically generate
    Java class files from XML schema
  • JAXB is a framework or architecture, not an
    implementation.
  • Sun provides a reference implementation of JAXB
    with the Java Web Services Developer Pack,
    available as a separate download:
    http://java.sun.com/webservices/downloads/webservicespack.html

53
JAXB vs. DOM and SAX
  • JAXB is a higher level construct than DOM or SAX
  • DOM represents XML documents as generic trees
  • SAX represents XML documents as generic event
    streams
  • JAXB represents XML documents as Java classes
    with properties that are specific to the
    particular XML document
  • E.g. book.xml becomes Book.java with getTitle,
    setTitle, etc.
  • JAXB thus requires almost no knowledge of XML to
    be able to programmatically process XML documents!
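
To make the contrast concrete without requiring a JAXB installation, the sketch below uses a hand-written Book class as a stand-in for what a binding compiler would generate. It is not actual xjc output, and the fromXml helper is hypothetical. The helper contains the generic-tree code that a binding framework would spare the application programmer.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

/** Hand-written stand-in for a schema-derived class; not generated by xjc. */
class Book {
    private String title;

    public String getTitle() { return title; }
    public void setTitle(String title) { this.title = title; }

    /** The DOM plumbing a binding framework would otherwise do for us. */
    static Book fromXml(String xml) throws Exception {
        Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
            .getDocumentElement();
        Book book = new Book();
        book.setTitle(root.getElementsByTagName("title").item(0).getTextContent());
        return book;
    }

    public static void main(String[] args) throws Exception {
        Book book = fromXml("<book><title>Processing XML with Java</title></book>");
        // Application code only sees a plain Java property.
        System.out.println(book.getTitle());
    }
}
```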

54
High-level comparison
  • Before diving into the details of JAXB, it's good
    to see a bird's-eye view of the difference between
    JAXB and SAX and/or DOM-like parsers
  • Study the books/ examples under the examples/jaxb
    directory on the course website

55
JAXB steps
  • We start by assuming that you have a valid
    installation of the Java Web Services Developer
    Pack version 3. We cover these installation
    details later
  • Using JAXB then requires several steps:
  • Run the binding compiler on the schema file to
    automagically produce the appropriate Java class
    files
  • Compile the Java class files (the Ant tool helps
    here)
  • Study the autogenerated API to learn what Java
    types have been created
  • Create a program that unmarshals an XML document
    into these elementary data structures

56
Running binding compiler
  • <install_dir>/jaxb/bin/xjc.sh -p test.jaxb
    books.xsd -d work
  • xjc.sh: executes the binding compiler
  • -p test.jaxb: place resulting class files in
    package test.jaxb
  • books.xsd: run the compiler on schema books.xsd
  • -d work: place resulting files in a directory
    called work/
  • Note that this creates a huge number of files
    that together represent the content of the
    books.xsd schema as a set of Java classes
  • It is not necessary to know all of these classes.
    We'll study them only at a high level so we can
    understand how to use them

57
Example: students.xsd
58
Generated interfaces
  • xjc.sh -p test.lottery students.xsd
  • This generates the following interfaces
  • test/lottery/ObjectFactory.java
  • Contains methods for generating instances of the
    interfaces
  • test/lottery/Students.java
  • Represents the root node <students>
  • test/lottery/StudentsType.java
  • Represents the unnamed type of each student object

59
Generated implementations
  • Each interface is implemented in the impl
    directory
  • test/lottery/impl/StudentsImpl.java
  • Vendor-specific implementation of the Students
    interface
  • test/lottery/impl/StudentsTypeImpl.java
  • Vendor-specific implementation of the
    StudentsType interface

60
Compilation
  • Next, the generated classes must be compiled:
  • javac students/*.java students/impl/*.java
  • CLASSPATH requires many jar files:
  • jaxb/lib/*.jar
  • jwsdp-shared/lib/*.jar
  • jaxp/lib/**/*.jar
  • Note: an Ant buildfile (like a Java makefile)
    makes this much easier. More on this later

61
Generated docs
  • Java API docs for these classes are generated in
  • students/docs/api/*.html
  • After bindings are generated, one usually works
    directly through these API docs to learn how to
    access/manipulate the XML data.

62
Sample Programs
63
Sample Programs
  • Easiest way to learn is to cover certain generic
    sample cases. These are all on the course website
    under ace104/lesson6/examples
  • Summary of examples
  • student/
  • Use JAXB to read an xml document composed of a
    single student complex type
  • student/
  • Same, but for an xml document composed of a
    sequence of such student types of indefinite
    length
  • purchaseOrder/
  • Another read example, but for a more complex
    schema

64
Sample programs, cont
  • Course examples, cont
  • create-marshal
  • Purchase-order example modified to create in
    memory and write to XML
  • modify-marshal
  • Purchase-order example modified to read XML,
    change it and write back to XML
  • Study these examples!

65
Some additional JAXB details
66
Binding Data Types
  • Default Java datatype bindings can be found at
  • http://java.sun.com/webservices/docs/1.3/tutorial/doc/JAXBWorks5.html
  • These defaults can be changed if required for an
    application
  • Also, name bindings are fairly standard changes of
    names to forms acceptable in the Java programming
    language
  • See other binding rules on subsequent pages

67
Default binding rules summary
  • The JAXB binding model follows the default
    binding rules summarized below
  • Bind the following to Java package
  • XML Namespace URI
  • Bind the following XML Schema components to Java
    content interface
  • Named complex type
  • Anonymous inlined type definition of an element
    declaration
  • Bind to typesafe enum class
  • A named simple type definition with a base type
    that derives from "xsd:NCName" and has
    enumeration facets.
  • Bind the following XML Schema components to a
    Java Element interface
  • A global element declaration to an Element
    interface.
  • Local element declaration that can be inserted
    into a general content list.
  • Bind to Java property
  • Attribute use
  • Particle with a term that is an element reference
    or local element declaration.

68
End