Processing XML Part II - PowerPoint PPT Presentation

Slides: 74
Provided by: mm77
1
Processing XML Part II
  • Parser Operations with DOM and SAX overview
  • XML Validation with examples
  • Processing XML with SAX (locally and on the
    internet)

2
FixedFloatSwap.xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd">
<FixedFloatSwap>
  <Notional>100</Notional>
  <Fixed_Rate>5</Fixed_Rate>
  <NumYears>3</NumYears>
  <NumPayments>6</NumPayments>
</FixedFloatSwap>
3
FixedFloatSwap.dtd
<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumYears, NumPayments)>
<!ELEMENT Notional (#PCDATA)>
<!ELEMENT Fixed_Rate (#PCDATA)>
<!ELEMENT NumYears (#PCDATA)>
<!ELEMENT NumPayments (#PCDATA)>
4
Operation of a Tree-based Parser
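[Diagram: the XML document and the XML DTD are read by the tree-based parser; for a valid document the parser builds a document tree, which the application logic then works on.]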
5
Tree Benefits
  • Some data preparation tasks require early access to data that is
    further along in the document (e.g. we wish to extract titles to
    build a table of contents)
  • New tree construction is easier (e.g. XSLT works from a tree to
    convert FpML to WML)
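
The table-of-contents case can be sketched with the JAXP DOM API. This is only an illustration and not part of the original slides: the class name `TocFromTree`, the `title` element name and the sample document are assumptions made for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class: collects titles from a parsed tree.
class TocFromTree {

    // Build the whole tree first, then pull out every <title> in document order.
    static List<String> titles(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName("title");
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }

    public static void main(String[] args) {
        // Assumed sample document; any element names would do.
        String xml = "<report>"
                   + "<section><title>Swaps</title><para>text</para></section>"
                   + "<section><title>Options</title><para>text</para></section>"
                   + "</report>";
        // All titles are available before any section body is processed.
        for (String t : titles(xml)) {
            System.out.println(t);
        }
    }
}
```

Because the tree is complete before `titles` looks at it, the order in which the application visits nodes is independent of the order in which they appear in the file.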

6
Operation of an Event Based Parser
[Diagram: the XML document and the XML DTD are read by the event-based parser; for a valid document the parser reports events directly to the application logic.]
7
Operation of an Event Based Parser
[Diagram: the XML document and the XML DTD are read by the event-based parser, which calls methods in the application logic.]
Methods called while the document is valid:
    public void startDocument ()
    public void endDocument ()
    public void startElement (String name, AttributeList attrs)
    public void endElement (String name)
    public void characters (char buf[], int offset, int len)
Method called when the document is invalid:
    public void error(SAXParseException e) throws SAXException {
        System.out.println("\n\n--Invalid document ---" + e);
    }
8
Event-Driven Benefits
  • We do not need the memory required for trees
  • Parsing can be done faster with no tree
    construction going on

9
XML Validation
A batch validating process involves comparing the
DTD against a complete document instance and
producing a report containing any errors or
warnings. Software developers should consider
batch validation to be analogous to program
compilation, with similar errors
detected. Interactive validation involves
constant comparison of the DTD against a document
as it is being created.
10
XML Validation
  • The benefits of validating documents against a DTD include:
  • Programmers can write extraction and manipulation filters without
    fear of their software ever processing unexpected input.
  • Using an XML-aware word processor, authors and editors can be
    guided and constrained to produce conforming documents.

11
XML Validation Examples
XML elements may contain further, embedded
elements, and the entire document must be
enclosed by a single document element. The
degree to which an element's content is organized
into child elements is often termed its
granularity. Some hierarchical structures may be
recursive. The Document Type Definition (DTD)
contains rules for each element allowed within a
specific class of documents.
12
We'll run this program against several XML files with DTDs.
// Validate.java
import java.io.*;
import org.xml.sax.*;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;

public class Validate extends HandlerBase {

    public static boolean valid = true;

    public static void main (String argv[]) {
        if (argv.length != 1) {
            System.err.println ("Usage: java Validate filename.xml");
            System.exit (1);
        }
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true);
13
        try {
            SAXParser saxParser = factory.newSAXParser();
            saxParser.parse(new File(argv[0]), new Validate());
        }
        catch (Throwable t) {
            t.printStackTrace ();
        }
        System.out.println("Valid document is " + valid);
        System.exit (0);
    }

    // Called by the parser for each validity error
    public void error(SAXParseException e) throws SAXException {
        System.out.println(e.toString());
        valid = false;
    }
}
14
XML Document
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd">
<FixedFloatSwap>
  <Notional>100</Notional>
  <Fixed_Rate>5</Fixed_Rate>
  <NumYears>3</NumYears>
  <NumPayments>6</NumPayments>
</FixedFloatSwap>
DTD
<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumYears, NumPayments)>
<!ELEMENT Notional (#PCDATA)>
<!ELEMENT Fixed_Rate (#PCDATA)>
<!ELEMENT NumYears (#PCDATA)>
<!ELEMENT NumPayments (#PCDATA)>
Valid document is true
15
XML Document
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd">
<FixedFloatSwap>
  <Notional>100</Notional>
  <Fixed_Rate>5</Fixed_Rate>
  <NumYears>3</NumYears>
  <NumPayments>6</NumPayments>
</FixedFloatSwap>
DTD
<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumPayments)>
<!ELEMENT Notional (#PCDATA)>
<!ELEMENT Fixed_Rate (#PCDATA)>
<!ELEMENT NumPayments (#PCDATA)>
Valid document is false
16
XML Document
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Swaps SYSTEM "FixedFloatSwap.dtd">
<Swaps>
  <FixedFloatSwap>
    <Notional>100</Notional>
    <Fixed_Rate>5</Fixed_Rate>
    <NumYears>3</NumYears>
    <NumPayments>6</NumPayments>
  </FixedFloatSwap>
  <FixedFloatSwap>
    <Notional>100</Notional>
    <Fixed_Rate>5</Fixed_Rate>
    <NumYears>3</NumYears>
    <NumPayments>6</NumPayments>
  </FixedFloatSwap>
</Swaps>
17
DTD
<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT Swaps (FixedFloatSwap+)>
<!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumYears, NumPayments)>
<!ELEMENT Notional (#PCDATA)>
<!ELEMENT Fixed_Rate (#PCDATA)>
<!ELEMENT NumYears (#PCDATA)>
<!ELEMENT NumPayments (#PCDATA)>
C:\McCarthy\www\46-928\examples\sax>java Validate FixedFloatSwap.xml
Valid document is true
Quantity Indicators
    ?   0 or 1 time
    +   1 or more times
    *   0 or more times
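
The three quantity indicators can be tried out with a small validating check. The following is a sketch under stated assumptions, not code from the slides: the class `QuantityIndicators` and its helper methods are hypothetical, the DTD is placed in the document's internal subset so that the example needs no files, and the SAX2 `DefaultHandler` is used in place of `HandlerBase`.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: checks a content model against 0, 1 or 2 children.
class QuantityIndicators {

    // Returns true when the document is valid against its internal DTD subset.
    static boolean isValid(String xml) {
        final boolean[] valid = { true };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true);
            SAXParser parser = factory.newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    valid[0] = false;   // validity errors arrive here
                }
            });
        } catch (Exception e) {
            return false;               // not well-formed, or parser setup failed
        }
        return valid[0];
    }

    // Builds a Swaps document with the given content model and number of children.
    static String swaps(String model, int count) {
        StringBuilder sb = new StringBuilder();
        sb.append("<!DOCTYPE Swaps [")
          .append("<!ELEMENT Swaps (").append(model).append(")>")
          .append("<!ELEMENT FixedFloatSwap (#PCDATA)>")
          .append("]><Swaps>");
        for (int i = 0; i < count; i++) {
            sb.append("<FixedFloatSwap>x</FixedFloatSwap>");
        }
        return sb.append("</Swaps>").toString();
    }

    public static void main(String[] args) {
        for (String indicator : new String[] { "?", "+", "*" }) {
            for (int count = 0; count <= 2; count++) {
                System.out.println("FixedFloatSwap" + indicator + " with " + count
                        + " child(ren): valid = "
                        + isValid(swaps("FixedFloatSwap" + indicator, count)));
            }
        }
    }
}
```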
18
The locations where document text data is allowed
are indicated by the keyword #PCDATA (Parsed
Character Data).
XML Document
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd">
<FixedFloatSwap>
  <Notional>100</Notional>
  <Fixed_Rate>5</Fixed_Rate>
  <NumYears>
    <StartYear>2000</StartYear>
    <EndYear>2002</EndYear>
  </NumYears>
  <NumPayments>6</NumPayments>
</FixedFloatSwap>
19
DTD
<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumYears, NumPayments)>
<!ELEMENT Notional (#PCDATA)>
<!ELEMENT Fixed_Rate (#PCDATA)>
<!ELEMENT NumYears (#PCDATA)>
<!ELEMENT NumPayments (#PCDATA)>
Output of program after being modified to
display the error.
C:\McCarthy\www\46-928\examples\sax>java Validate FixedFloatSwap.xml
org.xml.sax.SAXParseException: Element "NumYears" does not allow "StartYear" -- (#PCDATA)
org.xml.sax.SAXParseException: Element type "StartYear" is not declared.
org.xml.sax.SAXParseException: Element "NumYears" does not allow "EndYear" -- (#PCDATA)
org.xml.sax.SAXParseException: Element type "EndYear" is not declared.
Valid document is false
20
There are strict rules which must be applied when
an element is allowed to contain both text and
child elements. The #PCDATA keyword must be the
first token in the group, and the group must be a
choice group (using | not ,). The group must
be optional and repeatable (that is, followed by *).
This is known as a mixed content model.
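
A mixed content model can be exercised with a few lines of validating SAX code. The sketch below is illustrative only: the class `MixedContentCheck` and its method are hypothetical, the DTD is carried in the internal subset, and the SAX2 `DefaultHandler` stands in for `HandlerBase`.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: lists validity errors for a mixed content model.
class MixedContentCheck {

    // #PCDATA comes first, the group is a choice group, and it ends with *.
    static final String DOCTYPE =
          "<!DOCTYPE Mixed ["
        + "<!ELEMENT Mixed (emph)>"
        + "<!ELEMENT emph (#PCDATA | sub | super)*>"
        + "<!ELEMENT sub (#PCDATA)>"
        + "<!ELEMENT super (#PCDATA)>"
        + "]>";

    // Returns the validity error messages reported for the given document body.
    static List<String> validityErrors(String body) {
        List<String> errors = new ArrayList<>();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true);
            factory.newSAXParser().parse(
                new InputSource(new StringReader(DOCTYPE + body)),
                new DefaultHandler() {
                    @Override
                    public void error(SAXParseException e) {
                        errors.add(e.getMessage());
                    }
                });
        } catch (Exception e) {
            errors.add("fatal: " + e.getMessage());
        }
        return errors;
    }

    public static void main(String[] args) {
        // Text and child elements interleaved inside emph: allowed.
        System.out.println(validityErrors(
            "<Mixed><emph>H<sub>2</sub>O is water.</emph></Mixed>"));
        // Text directly inside Mixed, which has element content only: reported.
        System.out.println(validityErrors(
            "<Mixed>loose text<emph>x</emph></Mixed>"));
    }
}
```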
21
lt?xml version"1.0" encoding"utf-8"?gt lt!ELEMENT
Mixed (emph) gt lt!ELEMENT emph (PCDATA sub
super) gt lt!ELEMENT sub (PCDATA)gt lt!ELEMENT
super (PCDATA)gt
DTD
XML Document
lt?xml version"1.0" encoding"UTF-8"?gt lt!DOCTYPE
Mixed SYSTEM "Mixed.dtd"gt ltMixedgt
ltemphgtHltsubgt2lt/subgtO is water.lt/emphgt lt/Mixedgt
Valid document is true
22
Attributes
An attribute is associated with a particular
element by the DTD and is assigned an attribute
type. The attribute type can restrict the range
of values it can hold. Example attribute types
include:
    CDATA     indicates a simple string of characters
    NMTOKEN   indicates a word or token
    A named token group such as (left | center | right)
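
How an attribute type restricts values can be seen by validating a single element against a named token group. This is a minimal sketch rather than code from the slides: the class `CurrencyAttributeCheck` is hypothetical, the declaration is placed in the internal subset, and the value `Euros` is an invented out-of-range example.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: validates one element with an enumerated attribute.
class CurrencyAttributeCheck {

    // currency must be present and must be one of the listed tokens.
    static final String DOCTYPE =
          "<!DOCTYPE Notional ["
        + "<!ELEMENT Notional (#PCDATA)>"
        + "<!ATTLIST Notional currency (Dollars | Pounds) #REQUIRED>"
        + "]>";

    static boolean isValid(String element) {
        final boolean[] valid = { true };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true);
            factory.newSAXParser().parse(
                new InputSource(new StringReader(DOCTYPE + element)),
                new DefaultHandler() {
                    @Override
                    public void error(SAXParseException e) {
                        valid[0] = false;
                    }
                });
        } catch (Exception e) {
            return false;   // not well-formed, or parser setup failed
        }
        return valid[0];
    }

    public static void main(String[] args) {
        System.out.println(isValid("<Notional currency=\"Pounds\">100</Notional>"));
        System.out.println(isValid("<Notional currency=\"Euros\">100</Notional>"));
        System.out.println(isValid("<Notional>100</Notional>"));
    }
}
```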
23
DTD
<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumYears, NumPayments)>
<!ELEMENT Notional (#PCDATA)>
<!ELEMENT Fixed_Rate (#PCDATA)>
<!ELEMENT NumYears (#PCDATA)>
<!ELEMENT NumPayments (#PCDATA)>
<!ATTLIST Notional currency (Dollars | Pounds) #REQUIRED>
XML Document
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd">
<FixedFloatSwap>
  <Notional>100</Notional>
  <Fixed_Rate>5</Fixed_Rate>
  <NumYears>3</NumYears>
  <NumPayments>6</NumPayments>
</FixedFloatSwap>
C:\McCarthy\www\46-928\examples\sax>java Validate FixedFloatSwap.xml
org.xml.sax.SAXParseException: Attribute value for "currency" is #REQUIRED.
Valid document is false
24
DTD
<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumYears, NumPayments)>
<!ELEMENT Notional (#PCDATA)>
<!ELEMENT Fixed_Rate (#PCDATA)>
<!ELEMENT NumYears (#PCDATA)>
<!ELEMENT NumPayments (#PCDATA)>
<!ATTLIST Notional currency (Dollars | Pounds) #REQUIRED>
XML Document
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd">
<FixedFloatSwap>
  <Notional currency="Pounds">100</Notional>
  <Fixed_Rate>5</Fixed_Rate>
  <NumYears>3</NumYears>
  <NumPayments>6</NumPayments>
</FixedFloatSwap>
Valid document is true
25
DTD
<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumYears, NumPayments)>
<!ELEMENT Notional (#PCDATA)>
<!ELEMENT Fixed_Rate (#PCDATA)>
<!ELEMENT NumYears (#PCDATA)>
<!ELEMENT NumPayments (#PCDATA)>
<!ATTLIST Notional currency (Dollars | Pounds) #REQUIRED>
<!ATTLIST FixedFloatSwap note CDATA #IMPLIED>
XML Document
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd">
<FixedFloatSwap>
  <Notional currency="Pounds">100</Notional>
  <Fixed_Rate>5</Fixed_Rate>
  <NumYears>3</NumYears>
  <NumPayments>6</NumPayments>
</FixedFloatSwap>
Valid document is true
#IMPLIED means optional
26
DTD
<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumYears, NumPayments)>
<!ELEMENT Notional (#PCDATA)>
<!ELEMENT Fixed_Rate (#PCDATA)>
<!ELEMENT NumYears (#PCDATA)>
<!ELEMENT NumPayments (#PCDATA)>
<!ATTLIST Notional currency (Dollars | Pounds) #REQUIRED>
<!ATTLIST FixedFloatSwap note CDATA #IMPLIED>
XML Document
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd">
<FixedFloatSwap note="For your eyes only">
  <Notional currency="Pounds">100</Notional>
  <Fixed_Rate>5</Fixed_Rate>
  <NumYears>3</NumYears>
  <NumPayments>6</NumPayments>
</FixedFloatSwap>
Valid document is true
27
Document using a General Entity
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd" [
  <!ENTITY bankname "Mellon National Bank and Trust">
]>
<FixedFloatSwap>
  <Bank>&bankname;</Bank>
  <Notional>100</Notional>
  <Fixed_Rate>5</Fixed_Rate>
  <NumYears>3</NumYears>
  <NumPayments>6</NumPayments>
</FixedFloatSwap>
DTD
<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT FixedFloatSwap (Bank, Notional, Fixed_Rate, NumYears, NumPayments)>
<!ELEMENT Bank (#PCDATA)>
<!ELEMENT Notional (#PCDATA)>
<!ELEMENT Fixed_Rate (#PCDATA)>
<!ELEMENT NumYears (#PCDATA)>
<!ELEMENT NumPayments (#PCDATA)>
Validate is true
28
XSLT Program
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:template match="Bank">
    <WML>
      <CARD>
        <xsl:apply-templates/>
      </CARD>
    </WML>
  </xsl:template>
  <xsl:template match="Notional | Fixed_Rate | NumYears | NumPayments">
  </xsl:template>
</xsl:stylesheet>
29
XSLT OUTPUT
C:\McCarthy\www\46-928\examples\sax>java -Dcom.jclark.xsl.sax.parser=com.jclark.xml.sax.CommentDriver com.jclark.xsl.sax.Driver FixedFloatSwap.xml FixedFloatSwap.xsl FixedFloatSwap.wml
C:\McCarthy\www\46-928\examples\sax>type FixedFloatSwap.wml
<?xml version="1.0" encoding="utf-8"?>
<WML><CARD>Mellon National Bank and Trust</CARD></WML>
30
An external text entity
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd" [
  <!ENTITY bankname SYSTEM "JustAFile.dat">
]>
<FixedFloatSwap>
  <Bank>&bankname;</Bank>
  <Notional>100</Notional>
  <Fixed_Rate>5</Fixed_Rate>
  <NumYears>3</NumYears>
  <NumPayments>6</NumPayments>
</FixedFloatSwap>
31
JustAFile.dat
Mellon Bank And Trust Corporation When you need a friend!
XSLT Output
<?xml version="1.0" encoding="utf-8"?>
<WML><CARD>Mellon Bank And Trust Corporation When you need a friend!</CARD></WML>
32
XML Document
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd">
<FixedFloatSwap>
  <Notional>100</Notional>
  <Fixed_Rate>5</Fixed_Rate>
  <NumYears>3</NumYears>
  <NumPayments>6</NumPayments>
</FixedFloatSwap>
Internal Parameter Entities
DTD
<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumYears, NumPayments)>
<!ENTITY % parsedCharacterData "(#PCDATA)">
<!ELEMENT Notional %parsedCharacterData;>
<!ELEMENT Fixed_Rate (#PCDATA)>
<!ELEMENT NumYears (#PCDATA)>
<!ELEMENT NumPayments (#PCDATA)>
33
XML Document
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd">
<FixedFloatSwap>
  <Bank>&bankname;</Bank>
  <Notional>100</Notional>
  <Fixed_Rate>5</Fixed_Rate>
  <NumYears>3</NumYears>
  <NumPayments>6</NumPayments>
</FixedFloatSwap>
General Entity defined in the DTD
DTD
<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT FixedFloatSwap (Bank, Notional, Fixed_Rate, NumYears, NumPayments)>
<!ENTITY bankname "Mellon National Bank and Trust Corporation">
<!ELEMENT Bank (#PCDATA)>
<!ELEMENT Notional (#PCDATA)>
<!ELEMENT Fixed_Rate (#PCDATA)>
<!ELEMENT NumYears (#PCDATA)>
<!ELEMENT NumPayments (#PCDATA)>
34
CDATA Section
XML Document
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd">
<FixedFloatSwap>
  <Notional>100</Notional>
  <Fixed_Rate>5</Fixed_Rate>
  <NumYears>3</NumYears>
  <NumPayments>6</NumPayments>
  <Note>
    <![CDATA[This is text that <b>will not be parsed for markup]]>
  </Note>
</FixedFloatSwap>
DTD
<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumYears, NumPayments, Note)>
<!ELEMENT Notional (#PCDATA)>
<!ELEMENT Fixed_Rate (#PCDATA)>
<!ELEMENT NumYears (#PCDATA)>
<!ELEMENT NumPayments (#PCDATA)>
<!ELEMENT Note (#PCDATA)>
35
XSLT Program
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:template match="Note">
    <WML>
      <CARD>
        <xsl:apply-templates/>
      </CARD>
    </WML>
  </xsl:template>
  <xsl:template match="Notional | Fixed_Rate | NumYears | NumPayments">
  </xsl:template>
</xsl:stylesheet>
36
XSLT Output
<?xml version="1.0" encoding="utf-8"?>
<WML><CARD>
This is text that &lt;b&gt;will not be parsed for markup
</CARD></WML>
37
DTD Components
Order.xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE ORDER SYSTEM "order.dtd">
<!-- example order form -->
<ORDER SOURCE="web" CUSTOMERTYPE="consumer" CURRENCY="USD">
  <addresses>
    <address ADDTYPE="billship">
      <firstname>Kevin</firstname>
      <lastname>Dick</lastname>
      <street ORDER="1">123 Anywhere Lane</street>
      <street ORDER="2">Apt 1b</street>
      <city>Palo Alto</city>
      <state>CA</state>
      <postal>94303</postal>
      <country>USA</country>
    </address>
38
    <address ADDTYPE="bill">
      <firstname>Kevin</firstname>
      <lastname>Dick</lastname>
      <street ORDER="1">123 Not The Same Lane</street>
      <street ORDER="2">Work Place</street>
      <city>Palo Alto</city>
      <state>CA</state>
      <postal>94300</postal>
      <country>USA</country>
    </address>
  </addresses>
An order may have more than one address.
39
  <lineitems>
    <lineitem ID="line1">
      <product CAT="MBoard">440BX Motherboard</product>
      <quantity>1</quantity>
      <unitprice>200</unitprice>
    </lineitem>
    <lineitem ID="line2">
      <product CAT="RAM">128 MB PC-100 DIMM</product>
      <quantity>2</quantity>
      <unitprice>175</unitprice>
    </lineitem>
    <lineitem ID="line3">
      <product CAT="CDROM">40x CD-ROM</product>
      <quantity>1</quantity>
      <unitprice>50</unitprice>
    </lineitem>
  </lineitems>
Several products may be purchased.
40
  <payment>
    <card CARDTYPE="VISA">
      <cardholder>Kevin S. Dick</cardholder>
      <cardnumber>11111-22222-33333</cardnumber>
      <expiration>01/01</expiration>
    </card>
  </payment>
</ORDER>
The payment is with a Visa card.
Valid document is true
41
order.dtd
<?xml version="1.0" encoding="UTF-8"?>
<!-- Example Order form DTD adapted from XML: A Manager's Guide -->
<!-- Define an ORDER element -->
<!ELEMENT ORDER (addresses, lineitems, payment)>
<!ATTLIST ORDER
    SOURCE        (web | phone | retail)   #REQUIRED
    CUSTOMERTYPE  (consumer | business)    "consumer"
    CURRENCY      CDATA                    "USD"
>
Define an order based on other elements.
42
<!ENTITY % anAddress SYSTEM "address.dtd">
%anAddress;
<!-- Collection of Addresses -->
<!ELEMENT addresses (address+)>
<!ENTITY % aLineItem SYSTEM "lineitem.dtd">
%aLineItem;
<!-- Collection of LineItems -->
<!ELEMENT lineitems (lineitem+)>
<!ENTITY % aPayment SYSTEM "payment.dtd">
%aPayment;
External parameter entities
The other elements are in their own DTD files.
43
address.dtd
<!-- Address Structure -->
<!ELEMENT address (firstname, middlename?, lastname, street+,
                   city, state, postal, country)>
<!ELEMENT firstname (#PCDATA)>
<!ELEMENT middlename (#PCDATA)>
<!ELEMENT lastname (#PCDATA)>
<!ELEMENT street (#PCDATA)>
<!ELEMENT city (#PCDATA)>
<!ELEMENT state (#PCDATA)>
<!ELEMENT postal (#PCDATA)>
<!ELEMENT country (#PCDATA)>
<!ATTLIST address ADDTYPE (bill | ship | billship) "billship">
<!ATTLIST street ORDER CDATA #IMPLIED>
44
lineitem.dtd
<!ELEMENT lineitem (product, quantity, unitprice)>
<!ATTLIST lineitem ID ID #REQUIRED>
<!ELEMENT product (#PCDATA)>
<!ATTLIST product CAT (CDROM | MBoard | RAM) #REQUIRED>
<!ELEMENT quantity (#PCDATA)>
<!ELEMENT unitprice (#PCDATA)>
45
payment.dtd
<!ELEMENT payment (card | PO)>
<!ELEMENT card (cardholder, cardnumber, expiration)>
<!ELEMENT cardholder (#PCDATA)>
<!ELEMENT cardnumber (#PCDATA)>
<!ELEMENT expiration (#PCDATA)>
<!ELEMENT PO (number, authorization)>
<!ELEMENT number (#PCDATA)>
<!ELEMENT authorization (#PCDATA)>
<!ATTLIST card CARDTYPE (VISA | MasterCard | Amex) #REQUIRED>
46
Processing XML with SAX
  • Important interfaces and classes are found in
    org.xml.sax package
  • We will look at the following interfaces and
    then study an example
  • interface DocumentHandler -- reports on
    document events
  • interface ErrorHandler -- reports on
    validity errors
  • class HandlerBase -- implements both of the
    above plus two others

47
public interface DocumentHandler
Receive notification of general document events. This
is the main interface that most SAX applications
implement: if the application needs to be
informed of basic parsing events, it implements
this interface and registers an instance with the
SAX parser. The parser uses the instance to
report basic document-related events like
the start and end of elements and character data.
48
Some methods from the DocumentHandler Interface
void characters(char[] ch, int start, int length)
    Receive notification of character data.
void endDocument()
    Receive notification of the end of a document.
void endElement(java.lang.String name)
    Receive notification of the end of an element.
void startDocument()
    Receive notification of the beginning of a document.
void startElement(java.lang.String name, AttributeList atts)
    Receive notification of the beginning of an element.
49
public interface ErrorHandler
Basic interface for SAX error handlers. If a SAX application
needs to implement customized error handling, it
must implement this interface and then register
an instance with the SAX parser. The parser will
then report all errors and warnings through this
interface.
Some methods are:
void error(SAXParseException exception)
    Receive notification of a recoverable error.
void fatalError(SAXParseException exception)
    Receive notification of a non-recoverable error.
void warning(SAXParseException exception)
    Receive notification of a warning.
50
public class HandlerBase extends java.lang.Object
    implements EntityResolver, DTDHandler, DocumentHandler, ErrorHandler
Default base class for handlers. This class implements
the default behaviour for four SAX interfaces:
EntityResolver, DTDHandler, DocumentHandler, and
ErrorHandler.
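
HandlerBase belongs to SAX 1 and is marked deprecated in current JDKs; in SAX 2 the same convenience role is played by `org.xml.sax.helpers.DefaultHandler`. The sketch below shows the pattern of extending the base class and overriding only the callbacks of interest. It is an illustration, not code from the slides, and the class name `ElementCounter` is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: overrides two callbacks, inherits the rest.
class ElementCounter extends DefaultHandler {

    int elements = 0;
    final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        elements++;                       // one call per start tag
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);   // may be called several times per text run
    }

    // Parses the given XML text and returns the handler holding the results.
    static ElementCounter run(String xml) {
        ElementCounter handler = new ElementCounter();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
        return handler;
    }

    public static void main(String[] args) {
        ElementCounter c = run("<FixedFloatSwap><Notional>100</Notional>"
                             + "<NumYears>3</NumYears></FixedFloatSwap>");
        System.out.println(c.elements + " elements, text = " + c.text);
    }
}
```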
51
FixedFloatSwap.dtd
<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT FixedFloatSwap (Bank, Notional, Fixed_Rate, NumYears, NumPayments)>
<!ELEMENT Bank (#PCDATA)>
<!ELEMENT Notional (#PCDATA)>
<!ATTLIST Notional currency (dollars | pounds) #REQUIRED>
<!ELEMENT Fixed_Rate (#PCDATA)>
<!ELEMENT NumYears (#PCDATA)>
<!ELEMENT NumPayments (#PCDATA)>
Input
52
FixedFloatSwap.xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd" [
  <!ENTITY bankname "Pittsburgh National Corporation">
]>
<FixedFloatSwap>
  <Bank>&bankname;</Bank>
  <Notional currency="pounds">100</Notional>
  <Fixed_Rate>5</Fixed_Rate>
  <NumYears>3</NumYears>
  <NumPayments>6</NumPayments>
</FixedFloatSwap>
Input
53
Java event-driven processing
// NotifyStr.java
// Adapted from XML and Java by Maruyama, Tamura and Uramoto
// IBM Tokyo Research, Addison-Wesley
import java.io.*;
import org.xml.sax.*;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
Processing
54
public class NotifyStr extends HandlerBase {

    public static void main (String argv[]) {
        if (argv.length != 1) {
            System.err.println ("Usage: java NotifyStr filename.xml");
            System.exit (1);
        }
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true);
        NotifyStr myHandler = new NotifyStr();
        try {
            SAXParser saxParser = factory.newSAXParser();
            saxParser.parse(new File(argv[0]), myHandler);
        }
        catch (Throwable t) {
            t.printStackTrace ();
        }
        System.exit (0);
    }
55
    public NotifyStr() {}

    public void startDocument() throws SAXException {
        System.out.println("startDocument called");
    }

    public void endDocument() throws SAXException {
        System.out.println("endDocument called");
    }
56
    public void startElement(String Name, AttributeList aMap) throws SAXException {
        System.out.println("startElement called element name " + Name);
        // examine the attributes
        for (int i = 0; i < aMap.getLength(); i++) {
            String attName = aMap.getName(i);
            String type = aMap.getType(i);
            String value = aMap.getValue(i);
            System.out.println(" attribute name " + attName
                             + " type " + type + " value " + value);
        }
    }

57
    public void endElement(String name) throws SAXException {
        System.out.println("endElement is called" + name);
    }

    public void characters(char[] ch, int start, int length) throws SAXException {
        // build String from char array
        String dataFound = new String(ch, start, length);
        System.out.println("characters called" + dataFound);
    }
58
    public void error(SAXParseException e) throws SAXException {
        System.out.println("Parsing error");
        System.out.println(e.toString());
    }
}
59
Output
C:\McCarthy\www\46-928\examples\sax>java NotifyStr FixedFloatSwap.xml
startDocument called
startElement called element name FixedFloatSwap
startElement called element name Bank
characters calledPittsburgh National Corporation
endElement is calledBank
startElement called element name Notional
 attribute name currency type ENUMERATION value pounds
characters called100
endElement is calledNotional
startElement called element name Fixed_Rate
characters called5
endElement is calledFixed_Rate
startElement called element name NumYears
characters called3
endElement is calledNumYears
startElement called element name NumPayments
characters called6
endElement is calledNumPayments
endElement is calledFixedFloatSwap
endDocument called
60
Accessing the swap from Jigsaw
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FixedFloatSwap [
  <!ENTITY bankname "Pittsburgh National Corporation">
]>
<FixedFloatSwap>
  <Bank>&bankname;</Bank>
  <Notional currency="pounds">100</Notional>
  <Fixed_Rate>5</Fixed_Rate>
  <NumYears>3</NumYears>
  <NumPayments>6</NumPayments>
</FixedFloatSwap>
Saved under Www/fpml/ServerSwap.xml
61
Servlet Code
// This servlet file is stored in WWW/Jigsaw/servlet/GetXML.java
// This servlet returns a user selected xml file from
// the Www/fpml directory and returns it to the client.
import java.io.*;
import java.util.*;
import javax.servlet.*;
import javax.servlet.http.*;

public class GetXML extends HttpServlet {

    public void doGet(HttpServletRequest req, HttpServletResponse res)
            throws ServletException, IOException {
        String theData = "";
        String extraPath = req.getPathInfo();
        extraPath = extraPath.substring(1);
62
        // read the file and write it to the client
        try {
            // open file and create a DataInputStream
            FileInputStream theFile =
                new FileInputStream("c:\\Jigsaw\\Jigsaw\\Jigsaw\\Www\\fpml\\" + extraPath);
            //DataInputStream dis = new DataInputStream(theFile);
            InputStreamReader is = new InputStreamReader(theFile);
            BufferedReader br = new BufferedReader(is);
            // read the file into the string theData
            String thisLine;
            while ((thisLine = br.readLine()) != null) {
                theData += thisLine + "\n";
            }
        }
        catch (Exception e) {
            System.err.println("Error " + e);
        }

63
        PrintWriter out = res.getWriter();
        out.write(theData);
        System.out.println("Wrote document to client");
        // write data to console
        System.out.println(theData);
        out.close();
    }
}

64
// Sax Client
import java.io.*;
import org.xml.sax.*;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;

public class JigsawNotifyStr extends HandlerBase {

    public static void main (String argv[]) {
        if (argv.length != 1) {
            System.err.println ("Usage: java NotifyStr filename.xml");
            System.exit (1);
        }
        String serverString = "http://localhost:8001/servlet/getXML/";
        String fileName = argv[0];
65
        InputSource is = new InputSource(serverString + fileName);
        System.out.println("Got the input source");
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true);
        JigsawNotifyStr myHandler = new JigsawNotifyStr();
        try {
            SAXParser saxParser = factory.newSAXParser();
            saxParser.parse(is, myHandler);
        }
        catch (Throwable t) {
            System.out.println("Big error");
            t.printStackTrace ();
        }
        System.exit (0);
    }
66
    public JigsawNotifyStr() {}

    public void startDocument() throws SAXException {
        System.out.println("startDocument called");
    }

    public void endDocument() throws SAXException {
        System.out.println("endDocument called");
    }

    // Same as before

    public void error(SAXParseException e) throws SAXException {
        // describe each error and show each error method
        System.out.println("Parsing error");
        System.out.println(e.toString());
    }
}
67
Being served by the servlet
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FixedFloatSwap [
  <!ENTITY bankname "Pittsburgh National Corporation">
]>
<FixedFloatSwap>
  <Bank>&bankname;</Bank>
  <Notional currency="pounds">100</Notional>
  <Fixed_Rate>5</Fixed_Rate>
  <NumYears>3</NumYears>
  <NumPayments>6</NumPayments>
</FixedFloatSwap>
68
Got the input source
startDocument called
Parsing error
org.xml.sax.SAXParseException: Element type "FixedFloatSwap" is not declared.
startElement called element name FixedFloatSwap
characters called
Parsing error
org.xml.sax.SAXParseException: Element type "Bank" is not declared.
startElement called element name Bank
characters calledPittsburgh National Corporation
endElement is calledBank
characters called
Parsing error
org.xml.sax.SAXParseException: Element type "Notional" is not declared.
Parsing error
org.xml.sax.SAXParseException: Attribute "currency" is not declared for element "Notional".
startElement called element name Notional
 attribute name currency type CDATA value pounds
characters called100
endElement is calledNotional
characters called
We have some parsing errors.
Do you see why?
69
Parsing error
org.xml.sax.SAXParseException: Element type "Fixed_Rate" is not declared.
startElement called element name Fixed_Rate
characters called5
endElement is calledFixed_Rate
characters called
Parsing error
org.xml.sax.SAXParseException: Element type "NumYears" is not declared.
startElement called element name NumYears
characters called3
endElement is calledNumYears
characters called
Parsing error
org.xml.sax.SAXParseException: Element type "NumPayments" is not declared.
startElement called element name NumPayments
characters called6
endElement is calledNumPayments
characters called
endElement is calledFixedFloatSwap
endDocument called
70
The InputSource Class
The SAX and DOM parsers need XML input. The
output produced by these parsers amounts to a
series of method calls (SAX) or an application
programmer interface to the tree (DOM). An
InputSource object can be used to provide input
to the parser.
[Diagram: an InputSource feeds the SAX or DOM parser, which delivers events (SAX) or a tree (DOM) to the application.]
So, how do we build an InputSource object?
71
Some InputSource constructors
InputSource(String pathToFile)
InputSource(InputStream byteStream)
InputSource(Reader characterStream)
For example:
String text = "<a>some xml</a>";
StringReader sr = new StringReader(text);
InputSource is = new InputSource(sr);
myParser.parse(is);
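
The byte-stream and character-stream constructors can be compared side by side. The following is a minimal sketch, not code from the slides: the class `InputSourceForms` and its `rootName` helper are hypothetical. Note that the String constructor of the real `org.xml.sax.InputSource` class takes a system identifier (a file path or URI), not the XML text itself.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: feeds the same XML to a SAX parser in two ways.
class InputSourceForms {

    // Parses the source and returns the name of the document element.
    static String rootName(InputSource source) {
        final String[] root = { null };
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(source, new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes atts) {
                    if (root[0] == null) {
                        root[0] = qName;   // first start tag is the document element
                    }
                }
            });
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
        return root[0];
    }

    public static void main(String[] args) {
        String text = "<a>some xml</a>";

        // Character stream: the parser receives already-decoded characters.
        System.out.println(rootName(new InputSource(new StringReader(text))));

        // Byte stream: the parser detects the encoding itself.
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        System.out.println(rootName(new InputSource(new ByteArrayInputStream(bytes))));
    }
}
```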
72
But what about the DTD?
public interface EntityResolver
Basic interface for resolving entities. If a SAX application
needs to implement customized handling for
external entities, it must implement this
interface and register an instance with the SAX
parser using the parser's setEntityResolver
method. The parser will then allow the
application to intercept any external entities
(including the external DTD subset and external
parameter entities, if any) before including them.
73
EntityResolver
public InputSource resolveEntity(String publicId, String systemId) {
    // Add this method to the client above. The systemId String
    // holds the path to the dtd as specified in the xml document.
    // We may now access the dtd from a servlet and return an
    // InputSource or return null and let the parser resolve the
    // external entity.
    System.out.println("Attempting to resolve" + "Public id " + publicId
                       + "System id " + systemId);
    return null;
}
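
A resolver that actually supplies the DTD, instead of returning null, might look like the sketch below. It is an assumption-laden illustration and not the author's client: the class `DtdFromMemory` is hypothetical, the DTD text is held in a string rather than fetched from a servlet, the content model is shortened, and the SAX2 `DefaultHandler` (which also implements `EntityResolver`) is used in place of `HandlerBase`.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: serves the external DTD subset from memory.
class DtdFromMemory extends DefaultHandler {

    // Assumed, shortened DTD used only for this example.
    static final String DTD =
          "<!ELEMENT FixedFloatSwap (Notional, NumYears)>"
        + "<!ELEMENT Notional (#PCDATA)>"
        + "<!ELEMENT NumYears (#PCDATA)>";

    boolean valid = true;

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        // The parser asks here before it tries to open the system identifier.
        if (systemId != null && systemId.endsWith("FixedFloatSwap.dtd")) {
            return new InputSource(new StringReader(DTD));
        }
        return null;   // let the parser resolve anything else itself
    }

    @Override
    public void error(SAXParseException e) {
        valid = false;
    }

    // Validates the document, taking the DTD from resolveEntity above.
    static boolean isValid(String xml) {
        DtdFromMemory handler = new DtdFromMemory();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true);
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            return false;   // not well-formed, or the DTD could not be read
        }
        return handler.valid;
    }

    public static void main(String[] args) {
        String doc = "<!DOCTYPE FixedFloatSwap SYSTEM \"FixedFloatSwap.dtd\">"
                   + "<FixedFloatSwap><Notional>100</Notional>"
                   + "<NumYears>3</NumYears></FixedFloatSwap>";
        System.out.println("Valid document is " + isValid(doc));
    }
}
```

Passing the handler to `parse` registers it as entity resolver as well as content and error handler, so no separate `setEntityResolver` call is needed in this form.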