Title: Processing XML Part II
- Parser Operations with DOM and SAX overview
- XML Validation with examples
- Processing XML with SAX (locally and on the internet)
FixedFloatSwap.xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd">
<FixedFloatSwap>
  <Notional>100</Notional>
  <Fixed_Rate>5</Fixed_Rate>
  <NumYears>3</NumYears>
  <NumPayments>6</NumPayments>
</FixedFloatSwap>
FixedFloatSwap.dtd
<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumYears, NumPayments)>
<!ELEMENT Notional (#PCDATA)>
<!ELEMENT Fixed_Rate (#PCDATA)>
<!ELEMENT NumYears (#PCDATA)>
<!ELEMENT NumPayments (#PCDATA)>
Operation of a Tree-based Parser
[Diagram: XML Document and XML DTD -> Tree-Based Parser (Valid) -> Document Tree -> Application Logic]
Tree Benefits
- Some data preparation tasks require early access to data that is further along in the document (e.g. we wish to extract titles to build a table of contents)
- New tree construction is easier (e.g. XSLT works from a tree to convert FpML to WML)
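
The table-of-contents case can be sketched with the DOM API that ships with Java. This is only an illustration under assumptions: the class name TitleLister, the element name "title" and the sample document are hypothetical and not part of the course examples.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: lists the text of every <title> element in document order.
class TitleLister {

    static List<String> titles(String xml) {
        try {
            // Build the whole tree first; afterwards every node is reachable at once,
            // including titles that appear late in the document.
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName("title");
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<report><section><title>Swaps</title><p>...</p></section>"
                   + "<section><title>Options</title><p>...</p></section></report>";
        System.out.println(titles(xml)); // prints [Swaps, Options]
    }
}
```

The price of this convenience is that the whole document is held in memory before the first title is read.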
Operation of an Event Based Parser
[Diagram: XML Document and XML DTD -> Event-Based Parser (Valid) -> Application Logic]

Events reported to the application logic:
public void startDocument()
public void endDocument()
public void startElement(String name, AttributeList attrs)
public void endElement(String name)
public void characters(char buf[], int offset, int len)

Validity errors reported to the application logic:
public void error(SAXParseException e) throws SAXException {
    System.out.println("\n\n--Invalid document ---" + e);
}
Event-Driven Benefits
- We do not need the memory required for trees
- Parsing can be done faster with no tree
construction going on
XML Validation
A batch validating process involves comparing the DTD against a complete document instance and producing a report containing any errors or warnings. Software developers should consider batch validation to be analogous to program compilation, with similar errors detected.
Interactive validation involves constant comparison of the DTD against a document as it is being created.
XML Validation
The benefits of validating documents against a DTD include:
- Programmers can write extraction and manipulation filters without fear of their software ever processing unexpected input.
- Using an XML-aware word processor, authors and editors can be guided and constrained to produce conforming documents.
XML Validation Examples
XML elements may contain further, embedded elements, and the entire document must be enclosed by a single document element. The degree to which an element's content is organized into child elements is often termed its granularity. Some hierarchical structures may be recursive. The Document Type Definition (DTD) contains rules for each element allowed within a specific class of documents.
We'll run this program against several XML files with DTDs.
// Validate.java
import java.io.*;
import org.xml.sax.*;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;

public class Validate extends HandlerBase {

    public static boolean valid = true;

    public static void main(String argv[]) {
        if (argv.length != 1) {
            System.err.println("Usage: java Validate filename.xml");
            System.exit(1);
        }
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true);
        try {
            SAXParser saxParser = factory.newSAXParser();
            saxParser.parse(new File(argv[0]), new Validate());
        } catch (Throwable t) {
            t.printStackTrace();
        }
        System.out.println("Valid document is " + valid);
        System.exit(0);
    }

    // Called by the parser for each validity error
    public void error(SAXParseException e) throws SAXException {
        System.out.println(e.toString());
        valid = false;
    }
}
XML Document
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd">
<FixedFloatSwap>
  <Notional>100</Notional>
  <Fixed_Rate>5</Fixed_Rate>
  <NumYears>3</NumYears>
  <NumPayments>6</NumPayments>
</FixedFloatSwap>

DTD
<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumYears, NumPayments)>
<!ELEMENT Notional (#PCDATA)>
<!ELEMENT Fixed_Rate (#PCDATA)>
<!ELEMENT NumYears (#PCDATA)>
<!ELEMENT NumPayments (#PCDATA)>

Valid document is true
XML Document
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd">
<FixedFloatSwap>
  <Notional>100</Notional>
  <Fixed_Rate>5</Fixed_Rate>
  <NumYears>3</NumYears>
  <NumPayments>6</NumPayments>
</FixedFloatSwap>

DTD
<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumPayments)>
<!ELEMENT Notional (#PCDATA)>
<!ELEMENT Fixed_Rate (#PCDATA)>
<!ELEMENT NumPayments (#PCDATA)>

Valid document is false
XML Document
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Swaps SYSTEM "FixedFloatSwap.dtd">
<Swaps>
  <FixedFloatSwap>
    <Notional>100</Notional>
    <Fixed_Rate>5</Fixed_Rate>
    <NumYears>3</NumYears>
    <NumPayments>6</NumPayments>
  </FixedFloatSwap>
  <FixedFloatSwap>
    <Notional>100</Notional>
    <Fixed_Rate>5</Fixed_Rate>
    <NumYears>3</NumYears>
    <NumPayments>6</NumPayments>
  </FixedFloatSwap>
</Swaps>
DTD
<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT Swaps (FixedFloatSwap+)>
<!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumYears, NumPayments)>
<!ELEMENT Notional (#PCDATA)>
<!ELEMENT Fixed_Rate (#PCDATA)>
<!ELEMENT NumYears (#PCDATA)>
<!ELEMENT NumPayments (#PCDATA)>

C:\McCarthy\www\46-928\examples\sax>java Validate FixedFloatSwap.xml
Valid document is true

Quantity Indicators
?  0 or 1 time
+  1 or more times
*  0 or more times
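
The effect of the three quantity indicators can be checked with a small program. This is a sketch under assumptions: it uses the SAX2 DefaultHandler class rather than the HandlerBase class used in these slides, places the DTD in an internal subset so that no files are needed, and the class name QuantityIndicatorDemo is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo: validates <Swaps> against a content model given as a string.
class QuantityIndicatorDemo {

    // Returns true when `children` satisfies <!ELEMENT Swaps contentModel>.
    static boolean isValid(String contentModel, String children) {
        String xml = "<!DOCTYPE Swaps [\n"
                + "  <!ELEMENT Swaps " + contentModel + ">\n"
                + "  <!ELEMENT FixedFloatSwap (#PCDATA)>\n"
                + "]>\n"
                + "<Swaps>" + children + "</Swaps>";
        final boolean[] valid = {true};
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true);
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)),
                    new DefaultHandler() {
                        @Override
                        public void error(SAXParseException e) {
                            valid[0] = false; // validity errors are reported here
                        }
                    });
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return valid[0];
    }

    public static void main(String[] args) {
        String one = "<FixedFloatSwap>a</FixedFloatSwap>";
        System.out.println(isValid("(FixedFloatSwap?)", ""));        // true
        System.out.println(isValid("(FixedFloatSwap?)", one + one)); // false
        System.out.println(isValid("(FixedFloatSwap+)", ""));        // false
        System.out.println(isValid("(FixedFloatSwap+)", one + one)); // true
        System.out.println(isValid("(FixedFloatSwap*)", ""));        // true
    }
}
```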
The locations where document text data is allowed are indicated by the keyword #PCDATA (Parsed Character Data).

XML Document
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd">
<FixedFloatSwap>
  <Notional>100</Notional>
  <Fixed_Rate>5</Fixed_Rate>
  <NumYears>
    <StartYear>2000</StartYear>
    <EndYear>2002</EndYear>
  </NumYears>
  <NumPayments>6</NumPayments>
</FixedFloatSwap>
DTD
<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumYears, NumPayments)>
<!ELEMENT Notional (#PCDATA)>
<!ELEMENT Fixed_Rate (#PCDATA)>
<!ELEMENT NumYears (#PCDATA)>
<!ELEMENT NumPayments (#PCDATA)>

Output of program after being modified to display the error.
C:\McCarthy\www\46-928\examples\sax>java Validate FixedFloatSwap.xml
org.xml.sax.SAXParseException: Element "NumYears" does not allow "StartYear" -- (#PCDATA)
org.xml.sax.SAXParseException: Element type "StartYear" is not declared.
org.xml.sax.SAXParseException: Element "NumYears" does not allow "EndYear" -- (#PCDATA)
org.xml.sax.SAXParseException: Element type "EndYear" is not declared.
Valid document is false
There are strict rules which must be applied when an element is allowed to contain both text and child elements. The #PCDATA keyword must be the first token in the group, and the group must be a choice group (using | not ,). The group must be optional and repeatable (followed by *). This is known as a mixed content model.
DTD
<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT Mixed (emph)>
<!ELEMENT emph (#PCDATA | sub | super)*>
<!ELEMENT sub (#PCDATA)>
<!ELEMENT super (#PCDATA)>

XML Document
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Mixed SYSTEM "Mixed.dtd">
<Mixed>
  <emph>H<sub>2</sub>O is water.</emph>
</Mixed>

Valid document is true
Attributes
An attribute is associated with a particular element by the DTD and is assigned an attribute type. The attribute type can restrict the range of values it can hold. Example attribute types include:
CDATA    indicates a simple string of characters
NMTOKEN  indicates a word or token
A named token group such as (left | center | right)
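
How attribute declarations restrict values, and how a declared default is supplied when the document omits an attribute, can be observed from a SAX handler. The sketch below rests on assumptions: it uses the SAX2 DefaultHandler and Attributes types instead of the SAX1 classes used in these slides, and the class name AttributeRulesDemo and the "unit" attribute with its default are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo: reports the attributes the parser delivers and any validity errors.
class AttributeRulesDemo {

    static final String DTD =
          "<!DOCTYPE Notional [\n"
        + "  <!ELEMENT Notional (#PCDATA)>\n"
        + "  <!ATTLIST Notional currency (Dollars | Pounds) #REQUIRED\n"
        + "                     note     CDATA              #IMPLIED\n"
        + "                     unit     NMTOKEN            \"millions\">\n"
        + "]>\n";

    // `attributes` is the text placed inside the <Notional ...> start tag.
    static List<String> report(String attributes) {
        final List<String> lines = new ArrayList<>();
        String xml = DTD + "<Notional " + attributes + ">100</Notional>";
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true);
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)),
                    new DefaultHandler() {
                        @Override
                        public void startElement(String uri, String localName,
                                                 String qName, Attributes atts) {
                            for (int i = 0; i < atts.getLength(); i++) {
                                lines.add(atts.getQName(i) + "=" + atts.getValue(i));
                            }
                        }
                        @Override
                        public void error(SAXParseException e) {
                            lines.add("error: " + e.getMessage());
                        }
                    });
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        System.out.println(report("currency=\"Pounds\"")); // includes the defaulted unit=millions
        System.out.println(report("currency=\"Euros\""));  // value outside the token group: error
        System.out.println(report(""));                    // #REQUIRED attribute missing: error
    }
}
```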
DTD
<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumYears, NumPayments)>
<!ELEMENT Notional (#PCDATA)>
<!ELEMENT Fixed_Rate (#PCDATA)>
<!ELEMENT NumYears (#PCDATA)>
<!ELEMENT NumPayments (#PCDATA)>
<!ATTLIST Notional currency (Dollars | Pounds) #REQUIRED>

XML Document
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd">
<FixedFloatSwap>
  <Notional>100</Notional>
  <Fixed_Rate>5</Fixed_Rate>
  <NumYears>3</NumYears>
  <NumPayments>6</NumPayments>
</FixedFloatSwap>

C:\McCarthy\www\46-928\examples\sax>java Validate FixedFloatSwap.xml
org.xml.sax.SAXParseException: Attribute value for "currency" is #REQUIRED.
Valid document is false
DTD
<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumYears, NumPayments)>
<!ELEMENT Notional (#PCDATA)>
<!ELEMENT Fixed_Rate (#PCDATA)>
<!ELEMENT NumYears (#PCDATA)>
<!ELEMENT NumPayments (#PCDATA)>
<!ATTLIST Notional currency (Dollars | Pounds) #REQUIRED>

XML Document
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd">
<FixedFloatSwap>
  <Notional currency="Pounds">100</Notional>
  <Fixed_Rate>5</Fixed_Rate>
  <NumYears>3</NumYears>
  <NumPayments>6</NumPayments>
</FixedFloatSwap>

Valid document is true
DTD
<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumYears, NumPayments)>
<!ELEMENT Notional (#PCDATA)>
<!ELEMENT Fixed_Rate (#PCDATA)>
<!ELEMENT NumYears (#PCDATA)>
<!ELEMENT NumPayments (#PCDATA)>
<!ATTLIST Notional currency (Dollars | Pounds) #REQUIRED>
<!ATTLIST FixedFloatSwap note CDATA #IMPLIED>

XML Document
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd">
<FixedFloatSwap>
  <Notional currency="Pounds">100</Notional>
  <Fixed_Rate>5</Fixed_Rate>
  <NumYears>3</NumYears>
  <NumPayments>6</NumPayments>
</FixedFloatSwap>

Valid document is true
#IMPLIED means optional
DTD
<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumYears, NumPayments)>
<!ELEMENT Notional (#PCDATA)>
<!ELEMENT Fixed_Rate (#PCDATA)>
<!ELEMENT NumYears (#PCDATA)>
<!ELEMENT NumPayments (#PCDATA)>
<!ATTLIST Notional currency (Dollars | Pounds) #REQUIRED>
<!ATTLIST FixedFloatSwap note CDATA #IMPLIED>

XML Document
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd">
<FixedFloatSwap note="For your eyes only">
  <Notional currency="Pounds">100</Notional>
  <Fixed_Rate>5</Fixed_Rate>
  <NumYears>3</NumYears>
  <NumPayments>6</NumPayments>
</FixedFloatSwap>

Valid document is true
Document using a General Entity
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd" [
  <!ENTITY bankname "Mellon National Bank and Trust">
]>
<FixedFloatSwap>
  <Bank>&bankname;</Bank>
  <Notional>100</Notional>
  <Fixed_Rate>5</Fixed_Rate>
  <NumYears>3</NumYears>
  <NumPayments>6</NumPayments>
</FixedFloatSwap>

DTD
<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT FixedFloatSwap (Bank, Notional, Fixed_Rate, NumYears, NumPayments)>
<!ELEMENT Bank (#PCDATA)>
<!ELEMENT Notional (#PCDATA)>
<!ELEMENT Fixed_Rate (#PCDATA)>
<!ELEMENT NumYears (#PCDATA)>
<!ELEMENT NumPayments (#PCDATA)>

Valid document is true
XSLT Program
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:template match="Bank">
    <WML>
      <CARD>
        <xsl:apply-templates/>
      </CARD>
    </WML>
  </xsl:template>
  <xsl:template match="Notional | Fixed_Rate | NumYears | NumPayments">
  </xsl:template>
</xsl:stylesheet>
XSLT Output
C:\McCarthy\www\46-928\examples\sax>java -Dcom.jclark.xsl.sax.parser=com.jclark.xml.sax.CommentDriver com.jclark.xsl.sax.Driver FixedFloatSwap.xml FixedFloatSwap.xsl FixedFloatSwap.wml
C:\McCarthy\www\46-928\examples\sax>type FixedFloatSwap.wml
<?xml version="1.0" encoding="utf-8"?>
<WML><CARD>Mellon National Bank and Trust</CARD></WML>
An external text entity
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd" [
  <!ENTITY bankname SYSTEM "JustAFile.dat">
]>
<FixedFloatSwap>
  <Bank>&bankname;</Bank>
  <Notional>100</Notional>
  <Fixed_Rate>5</Fixed_Rate>
  <NumYears>3</NumYears>
  <NumPayments>6</NumPayments>
</FixedFloatSwap>
JustAFile.dat
Mellon Bank And Trust Corporation When you need a friend!

XSLT Output
<?xml version="1.0" encoding="utf-8"?>
<WML><CARD>Mellon Bank And Trust Corporation When you need a friend!</CARD></WML>
Internal Parameter Entities

XML Document
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd">
<FixedFloatSwap>
  <Notional>100</Notional>
  <Fixed_Rate>5</Fixed_Rate>
  <NumYears>3</NumYears>
  <NumPayments>6</NumPayments>
</FixedFloatSwap>

DTD
<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumYears, NumPayments)>
<!ENTITY % parsedCharacterData "(#PCDATA)">
<!ELEMENT Notional %parsedCharacterData;>
<!ELEMENT Fixed_Rate (#PCDATA)>
<!ELEMENT NumYears (#PCDATA)>
<!ELEMENT NumPayments (#PCDATA)>
General Entity defined in the DTD

XML Document
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd">
<FixedFloatSwap>
  <Bank>&bankname;</Bank>
  <Notional>100</Notional>
  <Fixed_Rate>5</Fixed_Rate>
  <NumYears>3</NumYears>
  <NumPayments>6</NumPayments>
</FixedFloatSwap>

DTD
<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT FixedFloatSwap (Bank, Notional, Fixed_Rate, NumYears, NumPayments)>
<!ENTITY bankname "Mellon National Bank and Trust Corporation">
<!ELEMENT Bank (#PCDATA)>
<!ELEMENT Notional (#PCDATA)>
<!ELEMENT Fixed_Rate (#PCDATA)>
<!ELEMENT NumYears (#PCDATA)>
<!ELEMENT NumPayments (#PCDATA)>
CDATA Section

XML Document
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd">
<FixedFloatSwap>
  <Notional>100</Notional>
  <Fixed_Rate>5</Fixed_Rate>
  <NumYears>3</NumYears>
  <NumPayments>6</NumPayments>
  <Note>
    <![CDATA[This is text that <b>will not be parsed for markup]]>
  </Note>
</FixedFloatSwap>

DTD
<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumYears, NumPayments, Note)>
<!ELEMENT Notional (#PCDATA)>
<!ELEMENT Fixed_Rate (#PCDATA)>
<!ELEMENT NumYears (#PCDATA)>
<!ELEMENT NumPayments (#PCDATA)>
<!ELEMENT Note (#PCDATA)>
XSLT Program
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:template match="Note">
    <WML>
      <CARD>
        <xsl:apply-templates/>
      </CARD>
    </WML>
  </xsl:template>
  <xsl:template match="Notional | Fixed_Rate | NumYears | NumPayments">
  </xsl:template>
</xsl:stylesheet>
XSLT Output
<?xml version="1.0" encoding="utf-8"?>
<WML><CARD>
This is text that <b>will not be parsed for markup
</CARD></WML>
DTD Components
Order.xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE ORDER SYSTEM "order.dtd">
<!-- example order form -->
<ORDER SOURCE="web" CUSTOMERTYPE="consumer" CURRENCY="USD">
  <addresses>
    <address ADDTYPE="billship">
      <firstname>Kevin</firstname>
      <lastname>Dick</lastname>
      <street ORDER="1">123 Anywhere Lane</street>
      <street ORDER="2">Apt 1b</street>
      <city>Palo Alto</city>
      <state>CA</state>
      <postal>94303</postal>
      <country>USA</country>
    </address>
An order may have more than one address.
    <address ADDTYPE="bill">
      <firstname>Kevin</firstname>
      <lastname>Dick</lastname>
      <street ORDER="1">123 Not The Same Lane</street>
      <street ORDER="2">Work Place</street>
      <city>Palo Alto</city>
      <state>CA</state>
      <postal>94300</postal>
      <country>USA</country>
    </address>
  </addresses>
Several products may be purchased.
  <lineitems>
    <lineitem ID="line1">
      <product CAT="MBoard">440BX Motherboard</product>
      <quantity>1</quantity>
      <unitprice>200</unitprice>
    </lineitem>
    <lineitem ID="line2">
      <product CAT="RAM">128 MB PC-100 DIMM</product>
      <quantity>2</quantity>
      <unitprice>175</unitprice>
    </lineitem>
    <lineitem ID="line3">
      <product CAT="CDROM">40x CD-ROM</product>
      <quantity>1</quantity>
      <unitprice>50</unitprice>
    </lineitem>
  </lineitems>
The payment is with a Visa card.
  <payment>
    <card CARDTYPE="VISA">
      <cardholder>Kevin S. Dick</cardholder>
      <cardnumber>11111-22222-33333</cardnumber>
      <expiration>01/01</expiration>
    </card>
  </payment>
</ORDER>

Valid document is true
order.dtd
Define an order based on other elements.
<?xml version="1.0" encoding="UTF-8"?>
<!-- Example Order form DTD adapted from XML: A Manager's Guide -->
<!-- Define an ORDER element -->
<!ELEMENT ORDER (addresses, lineitems, payment)>
<!ATTLIST ORDER
    SOURCE       (web | phone | retail)  #REQUIRED
    CUSTOMERTYPE (consumer | business)   "consumer"
    CURRENCY     CDATA                   "USD">
External parameter entities
The other elements are in their own dtd files.
<!ENTITY % anAddress SYSTEM "address.dtd">
%anAddress;
<!-- Collection of Addresses -->
<!ELEMENT addresses (address+)>
<!ENTITY % aLineItem SYSTEM "lineitem.dtd">
%aLineItem;
<!-- Collection of LineItems -->
<!ELEMENT lineitems (lineitem+)>
<!ENTITY % aPayment SYSTEM "payment.dtd">
%aPayment;
address.dtd
<!-- Address Structure -->
<!ELEMENT address (firstname, middlename?, lastname, street+, city, state, postal, country)>
<!ELEMENT firstname (#PCDATA)>
<!ELEMENT middlename (#PCDATA)>
<!ELEMENT lastname (#PCDATA)>
<!ELEMENT street (#PCDATA)>
<!ELEMENT city (#PCDATA)>
<!ELEMENT state (#PCDATA)>
<!ELEMENT postal (#PCDATA)>
<!ELEMENT country (#PCDATA)>
<!ATTLIST address ADDTYPE (bill | ship | billship) "billship">
<!ATTLIST street ORDER CDATA #IMPLIED>
lineitem.dtd
<!ELEMENT lineitem (product, quantity, unitprice)>
<!ATTLIST lineitem ID ID #REQUIRED>
<!ELEMENT product (#PCDATA)>
<!ATTLIST product CAT (CDROM | MBoard | RAM) #REQUIRED>
<!ELEMENT quantity (#PCDATA)>
<!ELEMENT unitprice (#PCDATA)>
payment.dtd
<!ELEMENT payment (card | PO)>
<!ELEMENT card (cardholder, cardnumber, expiration)>
<!ELEMENT cardholder (#PCDATA)>
<!ELEMENT cardnumber (#PCDATA)>
<!ELEMENT expiration (#PCDATA)>
<!ELEMENT PO (number, authorization)>
<!ELEMENT number (#PCDATA)>
<!ELEMENT authorization (#PCDATA)>
<!ATTLIST card CARDTYPE (VISA | MasterCard | Amex) #REQUIRED>
Processing XML with SAX
- Important interfaces and classes are found in the org.xml.sax package
- We will look at the following interfaces and then study an example
- interface DocumentHandler -- reports on document events
- interface ErrorHandler -- reports on validity errors
- class HandlerBase implements both of the above plus two others
public interface DocumentHandler
Receive notification of general document events. This is the main interface that most SAX applications implement: if the application needs to be informed of basic parsing events, it implements this interface and registers an instance with the SAX parser. The parser uses the instance to report basic document-related events like the start and end of elements and character data.
Some methods from the DocumentHandler interface
void characters(char[] ch, int start, int length)
    Receive notification of character data.
void endDocument()
    Receive notification of the end of a document.
void endElement(java.lang.String name)
    Receive notification of the end of an element.
void startDocument()
    Receive notification of the beginning of a document.
void startElement(java.lang.String name, AttributeList atts)
    Receive notification of the beginning of an element.
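
DocumentHandler and HandlerBase belong to SAX1 and are deprecated in current Java releases; SAX2 offers the same callbacks through ContentHandler and the DefaultHandler convenience class, with namespace-aware signatures. A minimal sketch of the SAX2 form follows. The class name EventTrace and the buffering of character data are illustrative choices, not part of the course code.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical SAX2 handler: records document events as strings.
class EventTrace extends DefaultHandler {
    private final List<String> events = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();

    @Override public void startDocument() { events.add("startDocument"); }
    @Override public void endDocument()   { events.add("endDocument"); }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        flushText();
        events.add("startElement " + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        flushText();
        events.add("endElement " + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // A parser may deliver one run of text in several calls, so buffer it.
        text.append(ch, start, length);
    }

    // Emits buffered text as a single event, skipping whitespace-only runs.
    private void flushText() {
        String s = text.toString().trim();
        if (!s.isEmpty()) events.add("characters " + s);
        text.setLength(0);
    }

    static List<String> trace(String xml) {
        EventTrace handler = new EventTrace();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler.events;
    }

    public static void main(String[] args) {
        for (String event : trace("<FixedFloatSwap><Notional>100</Notional></FixedFloatSwap>")) {
            System.out.println(event);
        }
    }
}
```

With a factory that is not namespace aware, which is the default, the element name arrives in the qName argument.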
public interface ErrorHandler
Basic interface for SAX error handlers. If a SAX application needs to implement customized error handling, it must implement this interface and then register an instance with the SAX parser. The parser will then report all errors and warnings through this interface.

Some methods are:
void error(SAXParseException exception)
    Receive notification of a recoverable error.
void fatalError(SAXParseException exception)
    Receive notification of a non-recoverable error.
void warning(SAXParseException exception)
    Receive notification of a warning.
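
The difference between the three severities can be seen by counting the calls. The following is a sketch under assumptions: it extends the SAX2 DefaultHandler class (which also implements ErrorHandler), and the class name SeverityCounter and the sample documents are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: counts how often each ErrorHandler method is called.
class SeverityCounter extends DefaultHandler {
    int warnings, errors, fatalErrors;

    @Override public void warning(SAXParseException e)    { warnings++; }
    @Override public void error(SAXParseException e)      { errors++; }      // e.g. validity errors
    @Override public void fatalError(SAXParseException e) { fatalErrors++; } // e.g. not well-formed

    static SeverityCounter check(String xml) {
        SeverityCounter counter = new SeverityCounter();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true);
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), counter);
        } catch (SAXException e) {
            // Parsing stops after a fatal error; the counter has already recorded it.
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return counter;
    }

    public static void main(String[] args) {
        String dtd = "<!DOCTYPE a [<!ELEMENT a (#PCDATA)>]>";
        SeverityCounter valid = check(dtd + "<a>ok</a>");
        SeverityCounter invalid = check(dtd + "<a><b/></a>"); // <b> is not declared
        SeverityCounter broken = check(dtd + "<a>");          // missing end tag
        System.out.println("valid:   errors=" + valid.errors + " fatal=" + valid.fatalErrors);
        System.out.println("invalid: errors=" + invalid.errors + " fatal=" + invalid.fatalErrors);
        System.out.println("broken:  errors=" + broken.errors + " fatal=" + broken.fatalErrors);
    }
}
```

A validity error lets parsing continue, which is why the Validate program above can report several problems in one run; a well-formedness problem ends the parse.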
public class HandlerBase extends java.lang.Object
    implements EntityResolver, DTDHandler, DocumentHandler, ErrorHandler
Default base class for handlers. This class implements the default behaviour for four SAX interfaces: EntityResolver, DTDHandler, DocumentHandler, and ErrorHandler.
Input
FixedFloatSwap.dtd
<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT FixedFloatSwap (Bank, Notional, Fixed_Rate, NumYears, NumPayments)>
<!ELEMENT Bank (#PCDATA)>
<!ELEMENT Notional (#PCDATA)>
<!ATTLIST Notional currency (dollars | pounds) #REQUIRED>
<!ELEMENT Fixed_Rate (#PCDATA)>
<!ELEMENT NumYears (#PCDATA)>
<!ELEMENT NumPayments (#PCDATA)>
Input
FixedFloatSwap.xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd" [
  <!ENTITY bankname "Pittsburgh National Corporation">
]>
<FixedFloatSwap>
  <Bank>&bankname;</Bank>
  <Notional currency="pounds">100</Notional>
  <Fixed_Rate>5</Fixed_Rate>
  <NumYears>3</NumYears>
  <NumPayments>6</NumPayments>
</FixedFloatSwap>
Java event-driven processing
Processing
// NotifyStr.java
// Adapted from XML and Java by Maruyama, Tamura and Uramoto
// IBM Tokyo Research, Addison-Wesley
import java.io.*;
import org.xml.sax.*;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;

public class NotifyStr extends HandlerBase {

    public static void main(String argv[]) {
        if (argv.length != 1) {
            System.err.println("Usage: java NotifyStr filename.xml");
            System.exit(1);
        }
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true);
        NotifyStr myHandler = new NotifyStr();
        try {
            SAXParser saxParser = factory.newSAXParser();
            saxParser.parse(new File(argv[0]), myHandler);
        } catch (Throwable t) {
            t.printStackTrace();
        }
        System.exit(0);
    }

    public NotifyStr() {}

    public void startDocument() throws SAXException {
        System.out.println("startDocument called");
    }

    public void endDocument() throws SAXException {
        System.out.println("endDocument called");
    }

    public void startElement(String Name, AttributeList aMap) throws SAXException {
        System.out.println("startElement called element name " + Name);
        // examine the attributes
        for (int i = 0; i < aMap.getLength(); i++) {
            String attName = aMap.getName(i);
            String type = aMap.getType(i);
            String value = aMap.getValue(i);
            System.out.println(" attribute name " + attName +
                               " type " + type + " value " + value);
        }
    }

    public void endElement(String name) throws SAXException {
        System.out.println("endElement is called" + name);
    }

    public void characters(char[] ch, int start, int length) throws SAXException {
        // build String from char array
        String dataFound = new String(ch, start, length);
        System.out.println("characters called" + dataFound);
    }

    public void error(SAXParseException e) throws SAXException {
        System.out.println("Parsing error");
        System.out.println(e.toString());
    }
}
Output
C:\McCarthy\www\46-928\examples\sax>java NotifyStr FixedFloatSwap.xml
startDocument called
startElement called element name FixedFloatSwap
startElement called element name Bank
characters calledPittsburgh National Corporation
endElement is calledBank
startElement called element name Notional
 attribute name currency type ENUMERATION value pounds
characters called100
endElement is calledNotional
startElement called element name Fixed_Rate
characters called5
endElement is calledFixed_Rate
startElement called element name NumYears
characters called3
endElement is calledNumYears
startElement called element name NumPayments
characters called6
endElement is calledNumPayments
endElement is calledFixedFloatSwap
endDocument called
Accessing the swap from Jigsaw
Saved under Www/fpml/ServerSwap.xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FixedFloatSwap [
  <!ENTITY bankname "Pittsburgh National Corporation">
]>
<FixedFloatSwap>
  <Bank>&bankname;</Bank>
  <Notional currency="pounds">100</Notional>
  <Fixed_Rate>5</Fixed_Rate>
  <NumYears>3</NumYears>
  <NumPayments>6</NumPayments>
</FixedFloatSwap>
Servlet Code
// This servlet file is stored in WWW/Jigsaw/servlet/GetXML.java
// This servlet reads a user selected xml file from
// the Www/fpml directory and returns it to the client.
import java.io.*;
import java.util.*;
import javax.servlet.*;
import javax.servlet.http.*;

public class GetXML extends HttpServlet {

    public void doGet(HttpServletRequest req, HttpServletResponse res)
            throws ServletException, IOException {

        String theData = "";
        // the extra path names the requested file; drop the leading "/"
        String extraPath = req.getPathInfo();
        extraPath = extraPath.substring(1);

        // read the file and write it to the client
        try {
            // open the file and wrap it in a BufferedReader
            FileInputStream theFile = new FileInputStream(
                "c:\\Jigsaw\\Jigsaw\\Jigsaw\\Www\\fpml\\" + extraPath);
            //DataInputStream dis = new DataInputStream(theFile);
            InputStreamReader is = new InputStreamReader(theFile);
            BufferedReader br = new BufferedReader(is);

            // read the file into the string theData
            String thisLine;
            while ((thisLine = br.readLine()) != null) {
                theData += thisLine + "\n";
            }
        } catch (Exception e) {
            System.err.println("Error " + e);
        }

        PrintWriter out = res.getWriter();
        out.write(theData);
        System.out.println("Wrote document to client");
        // write data to console
        System.out.println(theData);
        out.close();
    }
}
// Sax Client
import java.io.*;
import org.xml.sax.*;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;

public class JigsawNotifyStr extends HandlerBase {

    public static void main(String argv[]) {
        if (argv.length != 1) {
            System.err.println("Usage: java JigsawNotifyStr filename.xml");
            System.exit(1);
        }
        String serverString = "http://localhost:8001/servlet/getXML/";
        String fileName = argv[0];

        // the parser will fetch the document from the servlet URL
        InputSource is = new InputSource(serverString + fileName);
        System.out.println("Got the input source");

        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true);
        JigsawNotifyStr myHandler = new JigsawNotifyStr();
        try {
            SAXParser saxParser = factory.newSAXParser();
            saxParser.parse(is, myHandler);
        } catch (Throwable t) {
            System.out.println("Big error");
            t.printStackTrace();
        }
        System.exit(0);
    }

    public JigsawNotifyStr() {}

    public void startDocument() throws SAXException {
        System.out.println("startDocument called");
    }

    public void endDocument() throws SAXException {
        System.out.println("endDocument called");
    }

    // startElement, endElement and characters: same as before

    public void error(SAXParseException e) throws SAXException {
        // describe each error and show each error method
        System.out.println("Parsing error");
        System.out.println(e.toString());
    }
}
Being served by the servlet
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FixedFloatSwap [
  <!ENTITY bankname "Pittsburgh National Corporation">
]>
<FixedFloatSwap>
  <Bank>&bankname;</Bank>
  <Notional currency="pounds">100</Notional>
  <Fixed_Rate>5</Fixed_Rate>
  <NumYears>3</NumYears>
  <NumPayments>6</NumPayments>
</FixedFloatSwap>
Got the input source
startDocument called
Parsing error
org.xml.sax.SAXParseException: Element type "FixedFloatSwap" is not declared.
startElement called element name FixedFloatSwap
characters called
Parsing error
org.xml.sax.SAXParseException: Element type "Bank" is not declared.
startElement called element name Bank
characters calledPittsburgh National Corporation
endElement is calledBank
characters called
Parsing error
org.xml.sax.SAXParseException: Element type "Notional" is not declared.
Parsing error
org.xml.sax.SAXParseException: Attribute "currency" is not declared for element "Notional".
startElement called element name Notional
 attribute name currency type CDATA value pounds
characters called100
endElement is calledNotional
characters called

We have some parsing errors.
Do you see why?
Parsing error
org.xml.sax.SAXParseException: Element type "Fixed_Rate" is not declared.
startElement called element name Fixed_Rate
characters called5
endElement is calledFixed_Rate
characters called
Parsing error
org.xml.sax.SAXParseException: Element type "NumYears" is not declared.
startElement called element name NumYears
characters called3
endElement is calledNumYears
characters called
Parsing error
org.xml.sax.SAXParseException: Element type "NumPayments" is not declared.
startElement called element name NumPayments
characters called6
endElement is calledNumPayments
characters called
endElement is calledFixedFloatSwap
endDocument called
The InputSource Class
The SAX and DOM parsers need XML input. The output produced by these parsers amounts to a series of method calls (SAX) or an application programmer interface to the tree (DOM). An InputSource object can be used to provide input to the parser.
[Diagram: InputSource -> SAX or DOM parser -> Events (SAX) or Tree (DOM) -> application]
So, how do we build an InputSource object?
Some InputSource constructors
InputSource(String systemId)           // a URI such as a file path or URL
InputSource(InputStream byteStream)
InputSource(Reader characterStream)

For example:
String text = "<a>some xml</a>";
StringReader sr = new StringReader(text);
InputSource is = new InputSource(sr);
myParser.parse(is);
But what about the DTD?
public interface EntityResolver
Basic interface for resolving entities. If a SAX application needs to implement customized handling for external entities, it must implement this interface and register an instance with the SAX parser using the parser's setEntityResolver method. The parser will then allow the application to intercept any external entities (including the external DTD subset and external parameter entities, if any) before including them.
EntityResolver
public InputSource resolveEntity(String publicId, String systemId) {
    // Add this method to the client above. The systemId String
    // holds the path to the dtd as specified in the xml document.
    // We may now access the dtd from a servlet and return an
    // InputSource, or return null and let the parser resolve the
    // external entity.
    System.out.println("Attempting to resolve" +
                       "Public id " + publicId +
                       "System id " + systemId);
    return null;
}
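
Returning a non-null InputSource can be sketched as follows. This is an illustration under assumptions: it extends the SAX2 DefaultHandler class, supplies the DTD from an in-memory string instead of fetching it from a servlet, and the class name InMemoryDtdResolver is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: answers requests for FixedFloatSwap.dtd from a string.
class InMemoryDtdResolver extends DefaultHandler {

    static final String DTD =
          "<!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumYears, NumPayments)>\n"
        + "<!ELEMENT Notional (#PCDATA)>\n"
        + "<!ELEMENT Fixed_Rate (#PCDATA)>\n"
        + "<!ELEMENT NumYears (#PCDATA)>\n"
        + "<!ELEMENT NumPayments (#PCDATA)>\n";

    int errors;               // validity errors seen so far
    String requestedSystemId; // what the parser asked for

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        requestedSystemId = systemId;
        // The parser may have turned the relative name into an absolute URI.
        if (systemId != null && systemId.endsWith("FixedFloatSwap.dtd")) {
            InputSource source = new InputSource(new StringReader(DTD));
            source.setSystemId(systemId);
            return source;
        }
        return null; // let the parser resolve anything else itself
    }

    @Override
    public void error(SAXParseException e) { errors++; }

    static InMemoryDtdResolver validate(String xml) {
        InMemoryDtdResolver handler = new InMemoryDtdResolver();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true);
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler;
    }

    public static void main(String[] args) {
        String xml = "<!DOCTYPE FixedFloatSwap SYSTEM \"FixedFloatSwap.dtd\">"
                + "<FixedFloatSwap><Notional>100</Notional><Fixed_Rate>5</Fixed_Rate>"
                + "<NumYears>3</NumYears><NumPayments>6</NumPayments></FixedFloatSwap>";
        InMemoryDtdResolver result = validate(xml);
        System.out.println("Requested: " + result.requestedSystemId);
        System.out.println("Valid document is " + (result.errors == 0)); // true
    }
}
```

Because the resolver supplies the DTD itself, the document validates even though no FixedFloatSwap.dtd file exists on disk.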