15
Processing XML
2 Overview
- Parsing XML documents
- Document Object Model (DOM)
- Simple API for XML (SAX)
- Class generation
3 What's the Problem?
<?xml version="1.0"?>
<books>
  <book>
    <title>The XML Handbook</title>
    <author>Goldfarb</author>
    <author>Prescod</author>
    <publisher>Prentice Hall</publisher>
    <pages>655</pages>
    <isbn>0130811521</isbn>
    <price currency="USD">44.95</price>
  </book>
  <book>
    <title>XML Design</title>
    <author>Spencer</author>
    <publisher>Wrox Press</publisher>
    ...
  </book>
</books>
[Diagram: XML document -> ? -> Book]
4 Parsing XML Documents
[Diagram labels: Document, DTD / Schema, DOM, SAX]
5 Parser
- Project X (Sun Microsystems)
- Ælfred (Microstar Software)
- XML4J (IBM)
- Lark (Tim Bray)
- MSXML (Microsoft)
- XJ (Data Channel)
- Xerces (Apache)
- ...
6 The Document Object Model
XML Document
<?xml version="1.0"?>
<books>
  <book>
    <title>The XML Handbook</title>
    <author>Goldfarb</author>
    <author>Prescod</author>
    <publisher>Prentice Hall</publisher>
    <pages>655</pages>
    <isbn>0130811521</isbn>
    <price currency="USD">44.95</price>
  </book>
  <book>
    <title>XML Design</title>
    <author>Spencer</author>
    <publisher>Wrox Press</publisher>
    ...
  </book>
</books>
Structure
books
  book
    title: The XML Handbook
    author: Goldfarb
    author: Prescod
    publisher: Prentice Hall
    pages: 655
    isbn: ...
  book
    ...
7 The Document Object Model
- Provides a standard interface for access to and manipulation of XML structures.
- Represents documents in the form of a hierarchy of nodes.
- Is platform- and programming-language-neutral
- Is a recommendation of the W3C (October 1, 1998)
- Is implemented by many parsers
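The "hierarchy of nodes" can be made visible with a few lines of code. The following is only a sketch: it uses Python's standard-library `xml.dom.minidom` as a stand-in for the parsers named in this deck, a shortened version of the books document, and a hypothetical helper called `outline`.

```python
from xml.dom.minidom import parseString

# Shortened books document; written without whitespace between elements
# so that no whitespace-only text nodes appear in the tree.
XML = (
    "<books><book>"
    "<title>The XML Handbook</title>"
    "<author>Goldfarb</author>"
    "<author>Prescod</author>"
    "</book></books>"
)

def outline(node, depth=0):
    """Return one indented line per element and per non-empty text node."""
    lines = []
    if node.nodeType == node.ELEMENT_NODE:
        lines.append("  " * depth + node.nodeName)
    elif node.nodeType == node.TEXT_NODE and node.data.strip():
        lines.append("  " * depth + repr(node.data))
    for child in node.childNodes:
        lines.extend(outline(child, depth + 1))
    return lines

doc = parseString(XML)
tree_lines = outline(doc.documentElement)
print("\n".join(tree_lines))
```

Each text value is a child node of its element, which is why the data appears one level below the tag name.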
8 DOM - Structure Model
[Diagram: the books tree from slide 6, with its parts labelled by the DOM interfaces Document, Node, Element, and NodeList]
9 The Document Interface
Method                        Result
doctype                       DocumentType
implementation                DOMImplementation
documentElement               Element
getElementsByTagName(String)  NodeList
createTextNode(String)        Text
createComment(String)         Comment
createElement(String)         Element
createCDATASection(String)    CDATASection
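A minimal sketch of the Document interface in use, assuming Python's `xml.dom.minidom` as the DOM implementation. The serialisation call `toxml()` is a minidom convenience and not part of the W3C interface; the comment and CDATA contents are made up for illustration.

```python
from xml.dom.minidom import parseString

doc = parseString("<books><book><title>XML Design</title></book></books>")

root = doc.documentElement                   # Element
titles = doc.getElementsByTagName("title")   # NodeList
print(root.tagName, titles.length)

# The create* methods return nodes that belong to the document
# but are not part of the tree until they are attached.
book = doc.createElement("book")
title = doc.createElement("title")
title.appendChild(doc.createTextNode("The XML Handbook"))
book.appendChild(title)
book.appendChild(doc.createComment("added by the example"))
book.appendChild(doc.createCDATASection("a < b"))
root.appendChild(book)

result = root.toxml()   # minidom-specific serialisation helper
print(result)
```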
10 The Node Interface
Method                               Result
nodeName                             String
nodeValue                            String
nodeType                             short
parentNode                           Node
childNodes                           NodeList
firstChild                           Node
lastChild                            Node
previousSibling                      Node
nextSibling                          Node
attributes                           NamedNodeMap
insertBefore(Node new, Node ref)     Node
replaceChild(Node new, Node old)     Node
removeChild(Node)                    Node
hasChildNodes()                      Boolean
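The members listed above can be tried out as follows. This is a sketch using Python's `xml.dom.minidom`; the small document and the variable names are chosen for illustration only.

```python
from xml.dom.minidom import parseString

doc = parseString(
    "<book><title>XML Design</title><author>Spencer</author></book>"
)
book = doc.documentElement
title, author = book.firstChild, book.lastChild

# Read-only properties
print(title.nodeName, title.nodeType)       # tag name and numeric node type
print(title.firstChild.nodeValue)           # text of the title
print(author.previousSibling is title)      # siblings share a parent
print(title.parentNode is book)
print(book.hasChildNodes(), book.attributes.length)

# Tree-changing methods; each returns a node
isbn = doc.createElement("isbn")
isbn.appendChild(doc.createTextNode("0130811521"))
book.insertBefore(isbn, author)             # (new node, reference node)

publisher = doc.createElement("publisher")
replaced = book.replaceChild(publisher, author)   # returns the old node
removed = book.removeChild(publisher)             # returns the removed node

names = [child.nodeName for child in book.childNodes]
print(names)
```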
11 Node Types / Node Names
Node type (field)             nodeType  nodeName
ELEMENT_NODE                  1         tagName
ATTRIBUTE_NODE                2         name of attribute
TEXT_NODE                     3         "#text"
CDATA_SECTION_NODE            4         "#cdata-section"
ENTITY_REFERENCE_NODE         5         name of entity referenced
ENTITY_NODE                   6         entity name
PROCESSING_INSTRUCTION_NODE   7         target
COMMENT_NODE                  8         "#comment"
DOCUMENT_NODE                 9         "#document"
DOCUMENT_TYPE_NODE            10        document type name
DOCUMENT_FRAGMENT_NODE        11        "#document-fragment"
NOTATION_NODE                 12        notation name
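Several rows of this table can be checked directly. The sketch below assumes Python's `xml.dom.minidom`; the attribute `id`, the processing instruction `render`, and the node contents are invented for the example. Entity, document type, and notation nodes are left out because they require a DTD.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

doc = parseString('<book id="b1">text</book>')
book = doc.documentElement

samples = [
    book,                                            # element
    book.getAttributeNode("id"),                     # attribute
    book.firstChild,                                 # text
    doc.createCDATASection("raw"),                   # CDATA section
    doc.createProcessingInstruction("render", "fast"),
    doc.createComment("note"),
    doc,                                             # the document itself
    doc.createDocumentFragment(),
]

# Pair each node's numeric type with its node name
rows = [(node.nodeType, node.nodeName) for node in samples]
for node_type, node_name in rows:
    print(node_type, node_name)

# The numeric codes are also available as named constants
print(Node.ELEMENT_NODE, Node.TEXT_NODE, Node.DOCUMENT_NODE)
```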
12 The NodeList Interface
Method      Result
length      int
item(int)   Node
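A NodeList is walked with an index loop over `length` and `item()`. A short sketch, again assuming Python's `xml.dom.minidom`:

```python
from xml.dom.minidom import parseString

doc = parseString(
    "<book><author>Goldfarb</author><author>Prescod</author></book>"
)
authors = doc.getElementsByTagName("author")   # NodeList

# Index-based iteration, as the interface defines it
names = [authors.item(i).firstChild.data for i in range(authors.length)]
print(names)

# An index outside the list yields no node rather than an error
past_end = authors.item(authors.length)
print(past_end)
```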
13 The Element Interface
Method                                     Result
tagName                                    String
getAttribute(String)                       String
setAttribute(String name, String value)    void
removeAttribute(String)                    void
getAttributeNode(String)                   Attr
setAttributeNode(Attr)                     Attr
removeAttributeNode(Attr)                  Attr
getElementsByTagName(String)               NodeList
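The attribute methods come in two flavours: string-based and Attr-node-based. The sketch below exercises both on the `price` element from the books document, assuming Python's `xml.dom.minidom`; the `tax` attribute and the value `EUR` are invented for the example.

```python
from xml.dom.minidom import parseString

doc = parseString('<price currency="USD">44.95</price>')
price = doc.documentElement
print(price.tagName)

# String-based access
original = price.getAttribute("currency")
price.setAttribute("currency", "EUR")
missing = price.getAttribute("missing")      # empty string when absent

# Attr-node-based access
attr = price.getAttributeNode("currency")
changed = attr.value
tax = doc.createAttribute("tax")             # hypothetical extra attribute
tax.value = "included"
price.setAttributeNode(tax)
attribute_names = sorted(price.attributes.keys())

# Remove both attributes again, once per flavour
price.removeAttributeNode(attr)
price.removeAttribute("tax")
final_xml = price.toxml()                    # minidom-specific helper
print(original, changed, attribute_names, final_xml)
```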
14 DOM Methods for Navigation
parentNode
nextSibling
previousSibling
firstChild
lastChild
childNodes(length, item())
getElementsByTagName
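A sketch of moving through the books tree with these members, assuming Python's `xml.dom.minidom` and a shortened, whitespace-free copy of the books document:

```python
from xml.dom.minidom import parseString

doc = parseString(
    "<books>"
    "<book><title>The XML Handbook</title>"
    "<author>Goldfarb</author><author>Prescod</author></book>"
    "<book><title>XML Design</title><author>Spencer</author></book>"
    "</books>"
)
root = doc.documentElement

first_book = root.firstChild
second_book = first_book.nextSibling
print(root.lastChild is second_book)

# Sideways and upwards
last_author = first_book.lastChild
earlier_author = last_author.previousSibling.firstChild.data
parent_name = second_book.parentNode.nodeName

# Direct children through childNodes (length, item())
kids = first_book.childNodes
child_names = [kids.item(i).nodeName for i in range(kids.length)]

# All descendants with a given tag name
titles = [t.firstChild.data for t in root.getElementsByTagName("title")]
print(earlier_author, parent_name, child_names, titles)
```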
15 DOM Methods for Manipulation
appendChild
insertBefore
replaceChild
removeChild
createElement
createAttribute
createTextNode
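The following sketch uses all seven methods to complete a partial book entry. It assumes Python's `xml.dom.minidom`; the helper `text_element` and the `note` element in the starting document are hypothetical and exist only to give `replaceChild` and `removeChild` something to act on.

```python
from xml.dom.minidom import parseString

doc = parseString(
    "<books><book><title>XML Handbook</title>"
    "<author>Goldfarb</author><note>draft</note></book></books>"
)
book = doc.documentElement.firstChild
note = book.lastChild

def text_element(tag, text):
    """Hypothetical helper: createElement plus createTextNode."""
    element = doc.createElement(tag)
    element.appendChild(doc.createTextNode(text))
    return element

# insertBefore: second author goes in front of the note
book.insertBefore(text_element("author", "Prescod"), note)

# appendChild: publisher and price go to the end
book.appendChild(text_element("publisher", "Prentice Hall"))
price = text_element("price", "44.95")
currency = doc.createAttribute("currency")
currency.value = "USD"
price.setAttributeNode(currency)
book.appendChild(price)

# replaceChild: swap the abbreviated title for the full one
book.replaceChild(text_element("title", "The XML Handbook"), book.firstChild)

# removeChild: drop the note
book.removeChild(note)

result = book.toxml()   # minidom-specific serialisation helper
print(result)
```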
16 Example
books
  book
    author: Goldfarb
    author: Prescod
  book
    author: Spencer
doc.documentElement.childNodes.item(0).getElementsByTagName("author").item(1).childNodes.item(0).data
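The expression is easier to follow when each step is given a name. The sketch below does that with Python's `xml.dom.minidom`, assuming a tree with two authors in the first book and one in the second and no whitespace nodes between elements; the variable names are illustrative.

```python
from xml.dom.minidom import parseString

doc = parseString(
    "<books>"
    "<book><author>Goldfarb</author><author>Prescod</author></book>"
    "<book><author>Spencer</author></book>"
    "</books>"
)

root = doc.documentElement                          # books
first_book = root.childNodes.item(0)                # first book
authors = first_book.getElementsByTagName("author") # both authors of that book
second_author = authors.item(1)                     # second author element
text_node = second_author.childNodes.item(0)        # its text node
value = text_node.data                              # the text itself

# The same path written as one chained expression
chained = (
    doc.documentElement.childNodes.item(0)
    .getElementsByTagName("author")
    .item(1).childNodes.item(0).data
)
print(value, chained)
```

With a parser that keeps whitespace between elements, `childNodes.item(0)` could be a text node instead of the first book, so the path depends on how the document was loaded.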
17 Script
<HTML>
<HEAD><TITLE>DOM Example</TITLE></HEAD>
<BODY>
<H1>DOM Example</H1>
<SCRIPT LANGUAGE="JavaScript">
  var doc, root, book1, authors, author2;
  // Create the MSXML parser and load the document synchronously
  doc = new ActiveXObject("Microsoft.XMLDOM");
  doc.async = false;
  doc.load("books.xml");
  if (doc.parseError.errorCode != 0) {
    // Loading failed: show the parser's explanation
    alert(doc.parseError.reason);
  } else {
    root = doc.documentElement;
    document.write("Name of Root node: " + root.nodeName + "<BR>");
    document.write("Type of Root node: " + root.nodeType + "<BR>");
    book1 = root.childNodes.item(0);
    authors = book1.getElementsByTagName("author");
    document.write("Number of authors: " + authors.length + "<BR>");
    author2 = authors.item(1);
    document.write("Name of second author: " + author2.childNodes.item(0).data);
  }
</SCRIPT>
</BODY></HTML>
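The script above depends on Internet Explorer's ActiveX parser. As a rough, portable counterpart, the sketch below follows the same steps with Python's `xml.dom.minidom`. The function name `report` is hypothetical, the document is passed in as a string instead of being loaded from books.xml, and a parse failure surfaces as an `ExpatError` exception in place of the `parseError` object.

```python
from xml.dom.minidom import parseString
from xml.parsers.expat import ExpatError

def report(xml_text):
    """Return the lines the script would write, or a parse error message."""
    try:
        doc = parseString(xml_text)
    except ExpatError as err:
        return ["Parse error: " + str(err)]
    root = doc.documentElement
    book1 = root.childNodes.item(0)
    authors = book1.getElementsByTagName("author")
    author2 = authors.item(1)
    return [
        "Name of Root node: " + root.nodeName,
        "Type of Root node: " + str(root.nodeType),
        "Number of authors: " + str(authors.length),
        "Name of second author: " + author2.childNodes.item(0).data,
    ]

good = (
    "<books><book><title>The XML Handbook</title>"
    "<author>Goldfarb</author><author>Prescod</author></book></books>"
)
good_lines = report(good)
bad_lines = report("<books><book>")   # not well-formed: elements never closed
print("\n".join(good_lines))
print(bad_lines[0])
```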
18 SAX - Simple API for XML
[Diagram labels: Document, DTD, Application]
19 SAX - Simple API for XML
- Event-driven parsing model
- "Don't call the DOM, the parser calls you."
- Developed by the members of the XML-DEV mailing list
- Released on May 11, 1998
- Supported by many parsers ...
- ... but Ælfred is the Saxon king.
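To show what "the parser calls you" means in practice, the sketch below records every callback the parser makes. It assumes Python's standard-library `xml.sax`; the class name `EventLogger` is made up for the example.

```python
import xml.sax

class EventLogger(xml.sax.ContentHandler):
    """Records each callback the parser makes, in order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def characters(self, content):
        if content.strip():                 # skip whitespace-only text
            self.events.append(("text", content))

    def endElement(self, name):
        self.events.append(("end", name))

handler = EventLogger()
xml.sax.parseString(
    b"<book><title>XML Design</title><author>Spencer</author></book>",
    handler,
)
for event in handler.events:
    print(event)
```

No tree is built at any point; the application sees only the stream of events and keeps whatever state it needs.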
20 Procedure
- DOM
  - Creating a parser instance
  - Parsing the whole document
  - Processing the DOM tree
- SAX
  - Creating a parser instance
  - Registering event handlers with the parser
  - Parser calls the event handlers during parsing
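The two procedures can be compared side by side on the same task, here collecting all author names. This is a sketch with Python's `xml.dom.minidom` and `xml.sax`; the class name `AuthorCollector` is hypothetical, and in the DOM half the parser instance is created implicitly by `parseString`.

```python
import io
import xml.sax
from xml.dom.minidom import parseString

XML = (
    b"<books>"
    b"<book><author>Goldfarb</author><author>Prescod</author></book>"
    b"<book><author>Spencer</author></book>"
    b"</books>"
)

# DOM: parse the whole document, then process the finished tree
doc = parseString(XML)
dom_authors = [a.firstChild.data for a in doc.getElementsByTagName("author")]

# SAX: create a parser, register a handler, let the parser call it
class AuthorCollector(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.authors = []
        self._buffer = None          # list of text chunks while inside <author>

    def startElement(self, name, attrs):
        if name == "author":
            self._buffer = []

    def characters(self, content):
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "author":
            self.authors.append("".join(self._buffer))
            self._buffer = None

parser = xml.sax.make_parser()
collector = AuthorCollector()
parser.setContentHandler(collector)
parser.parse(io.BytesIO(XML))
sax_authors = collector.authors

print(dom_authors)
print(sax_authors)
```

The DOM version is shorter but holds the entire document in memory; the SAX version needs explicit state yet never keeps more than the current author's text.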
21 Namespace Support
<?xml version="1.0"?>
<order xmlns="http://www.net-standard.com/namespaces/order"
       xmlns:bk="http://www.net-standard.com/namespaces/books"
       xmlns:cust="http://www.net-standard.com/namespaces/customer">
  ...
  <bk:book>
    <bk:title>XML Handbook</bk:title>
    <bk:isbn>0130811521</bk:isbn>
  </bk:book>
  ....
</order>
22 Access to Qualified Elements
Node "book"
qualified name:  bk:book
namespace URI:   http://www.net-standard.com/namespaces/books
prefix:          bk
local name:      book
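The four values can be read from a parsed node. The sketch below assumes Python's `xml.dom.minidom`, which exposes them under the DOM Level 2 property names `nodeName`, `namespaceURI`, `prefix`, and `localName`; other parsers may name them differently. The order document is shortened and its elided parts are left out.

```python
from xml.dom.minidom import parseString

BOOKS_NS = "http://www.net-standard.com/namespaces/books"

ORDER_XML = (
    '<order xmlns="http://www.net-standard.com/namespaces/order"'
    ' xmlns:bk="http://www.net-standard.com/namespaces/books"'
    ' xmlns:cust="http://www.net-standard.com/namespaces/customer">'
    "<bk:book><bk:title>XML Handbook</bk:title>"
    "<bk:isbn>0130811521</bk:isbn></bk:book>"
    "</order>"
)

doc = parseString(ORDER_XML)

# Look the element up by namespace URI and local name, not by prefix
book = doc.getElementsByTagNameNS(BOOKS_NS, "book").item(0)
parts = (book.nodeName, book.namespaceURI, book.prefix, book.localName)
print(parts)

# A plain tag-name lookup compares the qualified name and finds nothing
unqualified_hits = doc.getElementsByTagName("book").length
print(unqualified_hits)
```

Looking up by namespace URI keeps the code working even if a document author chooses a different prefix for the same namespace.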
23 Generation of Data Structures
24 Summary
- To avoid expensive text processing, applications use an XML parser that creates a DOM tree of a document.
- The DOM provides a standardized API to access the content of documents and to manipulate them.
- Alternatively or additionally, applications can work event-based using the SAX interface, which is provided by many parsers.