Title: XML for Ecommerce II
2 XML processing model
- An XML processor is used to read XML documents and provide access to their content and structure.
- The XML processor does this work on behalf of an application.
- The specification defines which information the processor should provide to the application.
3 Parsing
- Input: an XML document
- Basic task: is the document well-formed?
- Validating parsers additionally check: is the document valid?
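As a minimal sketch of the basic parsing task, the Java class below asks the JDK's built-in JAXP SAX parser (javax.xml.parsers.SAXParserFactory, assumed here instead of the parsers listed later) to read a document without validating it; the class and method names are illustrative. If the parse finishes without a fatal error, the document is well-formed.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class: answers "is the document well-formed?"
class WellFormedCheck {
    static boolean isWellFormed(String xml) {
        try {
            // JAXP SAX parsers are non-validating unless configured otherwise.
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            // DefaultHandler rethrows fatal errors and ignores everything else.
            reader.setErrorHandler(new DefaultHandler());
            reader.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // A fatal error (e.g. mismatched tags) aborts the parse: not well-formed.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<doc><para>Hello</para></doc>")); // true
        System.out.println(isWellFormed("<doc><para>Hello</doc></para>")); // false
    }
}
```

Checking validity as well would mean switching validation on, as described on the Validation slide.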
4 Parsing
- Parsers produce data structures that other tools and applications can use.
- There are two kinds of APIs: tree-based and event-based.
5 Tree-based API
- Compiles an XML document into an internal tree structure.
- Allows an application to navigate the tree.
- The Document Object Model (DOM) is a tree-based API for XML and HTML documents.
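A short sketch of the tree-based style, assuming the DOM implementation that ships with the JDK (javax.xml.parsers.DocumentBuilderFactory); the class name and the path-listing walk are illustrative only. The whole document is turned into an in-memory tree first, and only then does the application navigate it.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class DomWalk {
    // Builds the complete DOM tree, then walks it depth-first.
    static List<String> elementPaths(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            List<String> out = new ArrayList<>();
            walk(doc.getDocumentElement(), "", out);
            return out;
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse document", e);
        }
    }

    // Records a "/parent/child" path for every element node in the tree.
    private static void walk(Node node, String parentPath, List<String> out) {
        String path = parentPath + "/" + node.getNodeName();
        out.add(path);
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                walk(child, path, out);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(elementPaths("<doc><para>Hello, world!</para><para/></doc>"));
        // [/doc, /doc/para, /doc/para]
    }
}
```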
6 Event-based API
- Reports parsing events (such as the start and end of elements) directly to the application through callbacks.
- The application implements handlers to deal with the different events.
- Example: Simple API for XML (SAX)
7 Example
<?xml version="1.0"?>
<doc>
<para>Hello, world!</para>
</doc>
Events reported to the application:
start document
start element: doc
start element: para
characters: Hello, world!
end element: para
end element: doc
end document
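The event sequence above can be reproduced with a small SAX handler. This sketch assumes the JDK's built-in SAX parser and uses the slide's event labels; the class name is illustrative. The parser also reports the line breaks between tags as character data, which the sketch skips when they are whitespace only, and it buffers character data because one text run may arrive in several characters() calls.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class EventTrace {
    // Returns one label per SAX event, in the order the parser reports them.
    static List<String> trace(String xml) {
        List<String> events = new ArrayList<>();
        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            // Emits buffered character data (trimmed); whitespace-only runs are skipped.
            private void flushText() {
                String s = text.toString().trim();
                if (!s.isEmpty()) {
                    events.add("characters: " + s);
                }
                text.setLength(0);
            }

            @Override public void startDocument() { events.add("start document"); }
            @Override public void endDocument() { events.add("end document"); }

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                flushText();
                events.add("start element: " + qName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                flushText();
                events.add("end element: " + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length); // may be called several times per text run
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse document", e);
        }
        return events;
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\"?>\n<doc>\n<para>Hello, world!</para>\n</doc>";
        trace(xml).forEach(System.out::println);
    }
}
```

Nothing is kept beyond the current text buffer, which is what lets event-based parsing avoid caching the whole document.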
8 Example (cont.)
- An application handles these events as they occur, just as it would handle events from a graphical user interface (mouse clicks, etc.).
- There is no need to cache the entire document in memory or in secondary storage.
9 Tree-based vs. event-based
- Tree-based APIs are useful for a wide range of applications, but they may need a lot of resources if the document is large.
- Some applications need to build their own tree structures, and it is very inefficient to build a parse tree only to map it to another tree.
10 Tree-based vs. event-based
- An event-based API provides simpler, lower-level access to an XML document.
- Because the document is processed sequentially, one can parse documents much larger than the available system memory.
- Applications can construct their own data structures in their own callback event handlers.
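To illustrate building an application's own data structure directly from callbacks, with no intermediate parse tree, here is a sketch that collects item prices into a map keyed by part number. It assumes the item markup of the purchase-order example later in these notes (item elements with a partNum attribute and a price child); the class name and the choice of a LinkedHashMap are illustrative.

```java
import java.io.StringReader;
import java.math.BigDecimal;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class PriceIndex {
    // Maps each item's partNum attribute to its price, in document order.
    static Map<String, BigDecimal> pricesByPart(String xml) {
        Map<String, BigDecimal> prices = new LinkedHashMap<>();
        DefaultHandler handler = new DefaultHandler() {
            private String currentPart;                      // partNum of the enclosing <item>
            private final StringBuilder text = new StringBuilder();
            private boolean inPrice;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (qName.equals("item")) {
                    currentPart = atts.getValue("partNum");
                } else if (qName.equals("price")) {
                    inPrice = true;
                    text.setLength(0);
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (inPrice) text.append(ch, start, length); // may arrive in pieces
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (qName.equals("price")) {
                    inPrice = false;
                    prices.put(currentPart, new BigDecimal(text.toString().trim()));
                } else if (qName.equals("item")) {
                    currentPart = null;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse document", e);
        }
        return prices;
    }

    public static void main(String[] args) {
        String xml = "<items>"
                + "<item partNum=\"872-AA\"><productName>Lawnmower</productName><price>148.95</price></item>"
                + "<item partNum=\"926-AA\"><productName>Baby Monitor</productName><price>39.98</price></item>"
                + "</items>";
        System.out.println(pricesByPart(xml)); // {872-AA=148.95, 926-AA=39.98}
    }
}
```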
11 We need a parser...
- Apache Xerces: http://xml.apache.org
- IBM XML4J: http://alphaworks.ibm.com
- XP: http://www.jclark.com/xml/xp
- many others
12 ...and the SAX classes
- http://www.megginson.com/SAX/
- The SAX classes often come bundled with the parser distribution.
- Some parsers only support SAX 1.0; the latest version is 2.0.
13 Starting a SAX parser
import org.xml.sax.XMLReader;
import org.apache.xerces.parsers.SAXParser;

// uri: the system identifier (URI) of the document to parse
XMLReader parser = new SAXParser();
parser.parse(uri);
14 Content handlers
- To let the application do something useful with XML data while it is being parsed, we must register handlers with the SAX parser.
- A handler is a set of callbacks: application code that is run at important events during a document's parsing.
15 Core handler interfaces in SAX
- org.xml.sax.ContentHandler
- org.xml.sax.ErrorHandler
- org.xml.sax.DTDHandler
- org.xml.sax.EntityResolver
16 Custom application classes
- Custom application classes that perform specific actions during parsing can implement each of the core interfaces.
- Implementation classes are registered with the parser through the XMLReader methods setContentHandler(), setErrorHandler(), setDTDHandler() and setEntityResolver().
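A sketch of registering implementation classes with an XMLReader. It obtains the reader from the JDK's JAXP factory rather than instantiating Xerces directly (an assumption so that the example needs no extra libraries); ElementCounter and LoggingErrorHandler are hypothetical application classes.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class HandlerRegistration {
    // Hypothetical content handler: counts start-element events.
    static class ElementCounter extends DefaultHandler {
        int count;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            count++;
        }
    }

    // Hypothetical error handler: logs recoverable problems, aborts on fatal ones.
    static class LoggingErrorHandler implements ErrorHandler {
        @Override
        public void warning(SAXParseException e) {
            System.err.println("warning at line " + e.getLineNumber() + ": " + e.getMessage());
        }

        @Override
        public void error(SAXParseException e) {
            System.err.println("error at line " + e.getLineNumber() + ": " + e.getMessage());
        }

        @Override
        public void fatalError(SAXParseException e) throws SAXException {
            throw e;
        }
    }

    static int countElements(String xml) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            ElementCounter counter = new ElementCounter();
            reader.setContentHandler(counter);                 // register the ContentHandler
            reader.setErrorHandler(new LoggingErrorHandler()); // register the ErrorHandler
            reader.parse(new InputSource(new StringReader(xml)));
            return counter.count;
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countElements("<order><shipTo/><billTo/><items/></order>")); // 4
    }
}
```

Extending DefaultHandler keeps the content handler short, because it supplies empty implementations of the ContentHandler methods the application does not need.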
17 Example content handlers
class MyContentHandler implements ContentHandler {
    public void startDocument() throws SAXException {
        System.out.println("Parsing begins...");
    }
    public void endDocument() throws SAXException {
        System.out.println("...Parsing ends.");
    }
    // the other ContentHandler methods follow on the next slides
}
18 Element handlers
public void startElement(String namespaceURI, String localName,
                         String rawName, Attributes atts)
        throws SAXException {
    System.out.print("startElement: " + localName);
    if (!namespaceURI.equals("")) {
        System.out.println(" in namespace " + namespaceURI
                + " (" + rawName + ")");
    } else {
        System.out.println(" has no associated namespace");
    }
    for (int i = 0; i < atts.getLength(); i++) {
        System.out.println("  Attribute: " + atts.getLocalName(i)
                + "=" + atts.getValue(i));
    }
}
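The handler above relies on namespace processing. In the sketch below, which assumes the JDK's JAXP SAX parser, the factory is switched to namespace-aware mode, since without it namespaceURI and localName are typically passed as empty strings. The class name and the urn:example:po namespace are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class NamespaceReport {
    // Same logic as the slide's startElement(), collecting lines instead of printing.
    static List<String> report(String xml) {
        List<String> lines = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String namespaceURI, String localName,
                                     String rawName, Attributes atts) {
                if (!namespaceURI.equals("")) {
                    lines.add(localName + " in namespace " + namespaceURI + " (" + rawName + ")");
                } else {
                    lines.add(localName + " has no associated namespace");
                }
                for (int i = 0; i < atts.getLength(); i++) {
                    lines.add("  attribute " + atts.getLocalName(i) + "=" + atts.getValue(i));
                }
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // fills in namespaceURI and localName
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse document", e);
        }
        return lines;
    }

    public static void main(String[] args) {
        String xml = "<po:order xmlns:po=\"urn:example:po\" orderDate=\"1999-10-20\">"
                + "<comment>Rush</comment></po:order>";
        report(xml).forEach(System.out::println);
    }
}
```

With namespace processing on, the xmlns:po declaration itself is not passed in atts, so only orderDate is listed for po:order.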
19 endElement
public void endElement(String namespaceURI, String localName,
                       String rawName) throws SAXException {
    System.out.println("endElement: " + localName + "\n");
}
20 Character data
public void characters(char[] ch, int start, int length)
        throws SAXException {
    String s = new String(ch, start, length);
    System.out.println("characters: " + s);
}
- The parser may return all contiguous character data at once, or split it into multiple method invocations.
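Because character data can be split across calls, a handler usually buffers it and only interprets it at the next element boundary. A sketch, assuming the JDK's SAX parser; the class name and the "element: text" output format are illustrative. An entity reference such as &amp; inside the text is a common place for a parser to split the data.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class LeafText {
    // Returns "element: text" for every element whose own text is not blank.
    static List<String> leafText(String xml) {
        List<String> out = new ArrayList<>();
        StringBuilder buffer = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                buffer.setLength(0); // a new element starts: discard text seen so far
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                buffer.append(ch, start, length); // may hold only part of the text
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                String text = buffer.toString().trim(); // the text is complete here
                if (!text.isEmpty()) {
                    out.add(qName + ": " + text);
                }
                buffer.setLength(0);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse document", e);
        }
        return out;
    }

    public static void main(String[] args) {
        String xml = "<item partNum=\"926-AA\"><productName>Baby Monitor</productName>"
                + "<comment>Ships &amp; arrives in 2 days</comment></item>";
        leafText(xml).forEach(System.out::println);
    }
}
```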
21 Processing instructions
- XML documents may contain processing instructions (PIs).
- A processing instruction tells an application to perform some specific task.
- Form: <?target instructions?>
22 Handlers for PIs
public void processingInstruction(String target, String data)
        throws SAXException {
    System.out.println("PI: Target: " + target + " and Data: " + data);
}
- The application could receive instructions and set variables or execute methods to perform application-specific processing.
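A sketch of a handler that receives processing instructions, assuming the JDK's SAX parser. The render target and its mode setting are hypothetical; an application would dispatch on the target name at the point marked in the code. The XML declaration looks like a PI but is not reported as one.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class PiCollector {
    // Collects "target -> data" for every processing instruction in the document.
    static List<String> collect(String xml) {
        List<String> pis = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void processingInstruction(String target, String data) {
                // A real application would dispatch on the target here,
                // e.g. set a variable or call a method.
                pis.add(target + " -> " + data);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse document", e);
        }
        return pis;
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\"?>"
                + "<?xml-stylesheet type=\"text/xsl\" href=\"order.xsl\"?>"
                + "<order><?render mode=\"compact\"?></order>";
        collect(xml).forEach(System.out::println);
    }
}
```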
23 Validation
- Some parsers are validating, some non-validating.
- Some parsers can do both.
- SAX method to turn validation on:
parser.setFeature("http://xml.org/sax/features/validation", true);
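A sketch of the same feature switch on a JAXP-provided XMLReader, with a DTD in the internal subset. The contact/phone/email DTD is a hypothetical example, and the class name is illustrative. Validity errors are reported to ErrorHandler.error() and do not stop the parse, so the handler collects them; without a registered ErrorHandler the application would not receive them.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class DtdValidation {
    // Parses with DTD validation switched on; returns the validity errors reported.
    static List<String> validityErrors(String xml) {
        List<String> errors = new ArrayList<>();
        try {
            XMLReader parser = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            parser.setFeature("http://xml.org/sax/features/validation", true);
            parser.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    errors.add(e.getMessage()); // validity errors are recoverable: keep going
                }
            });
            parser.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse document", e);
        }
        return errors;
    }

    public static void main(String[] args) {
        String dtd = "<!DOCTYPE contact [<!ELEMENT contact (phone, email?)>"
                + "<!ELEMENT phone (#PCDATA)><!ELEMENT email (#PCDATA)>]>";
        System.out.println(validityErrors(dtd + "<contact><phone>555-0100</phone></contact>").size()); // 0
        System.out.println(validityErrors(dtd + "<contact><email>x@example.com</email></contact>"));
    }
}
```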
24 Ignorable whitespace
- A validating parser can decide which whitespace can be ignored.
- For a non-validating parser, all whitespace is just characters.
- Content handler method:
public void ignorableWhitespace(char[] ch, int start, int length)
        throws SAXException
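A sketch contrasting the two callbacks, assuming the JDK's SAX parser with validation switched on and a small hypothetical DTD in which doc has element-only content. The indentation whitespace between the tags is then expected to arrive through ignorableWhitespace(), while the text of para arrives through characters(). The class name is illustrative.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class IgnorableWhitespaceDemo {
    // What the two ContentHandler callbacks received during one parse.
    static final class Result {
        int ignorableChars;                             // via ignorableWhitespace()
        final StringBuilder text = new StringBuilder(); // via characters()
    }

    static Result parse(String xml) {
        Result result = new Result();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                result.text.append(ch, start, length);
            }

            @Override
            public void ignorableWhitespace(char[] ch, int start, int length) {
                result.ignorableChars += length;
            }
        };
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setFeature("http://xml.org/sax/features/validation", true);
            reader.setContentHandler(handler);
            reader.setErrorHandler(handler); // DefaultHandler ignores recoverable errors
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse document", e);
        }
        return result;
    }

    public static void main(String[] args) {
        String xml = "<!DOCTYPE doc [<!ELEMENT doc (para)><!ELEMENT para (#PCDATA)>]>\n"
                + "<doc>\n  <para>Hi</para>\n</doc>";
        Result r = parse(xml);
        System.out.println("characters: [" + r.text + "], ignorable chars: " + r.ignorableChars);
    }
}
```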
25 XML Schema
- DTDs have drawbacks:
  - They can only define the element structure and attributes.
  - They cannot define any database-like constraints for elements:
    - Value (min, max, etc.)
    - Type (integer, string, etc.)
  - DTDs are not written in XML and thus cannot be processed with the same tools as XML documents (XSL(T), etc.).
- XML Schema:
  - Is written in XML.
  - Avoids most of the DTD drawbacks.
26 XML Schema
- XML Schema Part 1: Structures
  - Element structure definition as with DTDs (elements, attributes), plus enhanced ways to control structures
- XML Schema Part 2: Datatypes
  - Primitive datatypes (string, boolean, float, etc.)
  - Datatypes derived from primitive datatypes (time, recurringDate)
  - Constraining facets for each datatype (minLength, maxLength, pattern, precision, etc.)
- Information about schemas: http://www.w3c.org/XML/Schema/
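A sketch of a derived simple type with a constraining facet, checked with the JDK's javax.xml.validation API. It is written against the final XML Schema 1.0 namespace (http://www.w3.org/2001/XMLSchema); the quantity type mirrors the one in the Items example later on, and the class name is illustrative.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class FacetCheck {
    // Anonymous simple type: positiveInteger restricted by the maxExclusive facet.
    private static final String XSD =
            "<xsd:schema xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\">"
            + "<xsd:element name=\"quantity\"><xsd:simpleType>"
            + "<xsd:restriction base=\"xsd:positiveInteger\">"
            + "<xsd:maxExclusive value=\"100\"/>"
            + "</xsd:restriction></xsd:simpleType></xsd:element>"
            + "</xsd:schema>";

    private static final Schema SCHEMA = compile(XSD);

    private static Schema compile(String xsd) {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema does not compile", e);
        }
    }

    static boolean isValid(String xml) {
        try {
            // Without an ErrorHandler, the Validator throws on the first error.
            SCHEMA.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (String q : new String[] {"5", "99", "100", "0", "two"}) {
            System.out.println(q + " -> " + isValid("<quantity>" + q + "</quantity>"));
        }
    }
}
```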
27 Complex and simple types
- Complex types allow elements in their content and may have attributes.
- Simple types cannot have element content and cannot have attributes.
28 Reminder: DTD declarations
- <!ELEMENT name (fname, lname)>
- <!ELEMENT address (name, street, ((city, state, zipcode) | (zipcode, city)))>
- <!ELEMENT contact (address, phone, email?)>
- <!ELEMENT contact2 (address | phone | email)>
29 Example USAddress type
<xsd:complexType name="USAddress">
  <xsd:sequence>
    <xsd:element name="name"   type="xsd:string"/>
    <xsd:element name="street" type="xsd:string"/>
    <xsd:element name="city"   type="xsd:string"/>
    <xsd:element name="state"  type="xsd:string"/>
    <xsd:element name="zip"    type="xsd:decimal"/>
  </xsd:sequence>
  <xsd:attribute name="country" type="xsd:NMTOKEN" fixed="US"/>
</xsd:complexType>
30 Example PurchaseOrderType
<xsd:complexType name="PurchaseOrderType">
  <xsd:sequence>
    <xsd:element name="shipTo" type="USAddress"/>
    <xsd:element name="billTo" type="USAddress"/>
    <xsd:element ref="comment" minOccurs="0"/>
    <xsd:element name="items"  type="Items"/>
  </xsd:sequence>
  <xsd:attribute name="orderDate" type="xsd:date"/>
</xsd:complexType>
31 Notes
- The element declarations for shipTo and billTo associate different element names with the same complex type.
- Attribute declarations must reference simple types.
- The element comment is declared elsewhere in the schema (here it is only referenced).
32 Notes (cont.)
- An element is optional if minOccurs = 0.
- maxOccurs gives the maximum number of times an element may appear.
- Attributes may appear once or not at all.
- The use attribute in an attribute declaration indicates whether the attribute is required or optional; a fixed or a default value can also be specified.
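A sketch exercising minOccurs, maxOccurs and use with the JDK's schema validator. The schema is a hypothetical cut-down item list: maxOccurs="2" and the fixed country attribute are illustrative choices, and the syntax follows the final XML Schema 1.0 Recommendation, where fixed and default values are given with the fixed and default attributes. The class name is illustrative.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class OccurrenceCheck {
    // Hypothetical schema: 1..2 items, optional comment,
    // required partNum, optional country whose value is fixed to US.
    private static final String XSD =
            "<xsd:schema xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\">"
            + "<xsd:element name=\"items\"><xsd:complexType><xsd:sequence>"
            + "<xsd:element name=\"item\" minOccurs=\"1\" maxOccurs=\"2\"><xsd:complexType>"
            + "<xsd:sequence>"
            + "<xsd:element name=\"productName\" type=\"xsd:string\"/>"
            + "<xsd:element name=\"comment\" type=\"xsd:string\" minOccurs=\"0\"/>"
            + "</xsd:sequence>"
            + "<xsd:attribute name=\"partNum\" type=\"xsd:string\" use=\"required\"/>"
            + "<xsd:attribute name=\"country\" type=\"xsd:NMTOKEN\" fixed=\"US\"/>"
            + "</xsd:complexType></xsd:element>"
            + "</xsd:sequence></xsd:complexType></xsd:element>"
            + "</xsd:schema>";

    private static final Schema SCHEMA = compile(XSD);

    private static Schema compile(String xsd) {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema does not compile", e);
        }
    }

    static boolean isValid(String xml) {
        try {
            SCHEMA.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // first constraint violation found by the validator
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String item = "<item partNum=\"872-AA\"><productName>Lawnmower</productName></item>";
        System.out.println(isValid("<items>" + item + "</items>"));               // true
        System.out.println(isValid("<items>" + item + item + item + "</items>")); // false: maxOccurs
        System.out.println(isValid("<items><item><productName>Lawnmower</productName></item></items>")); // false: partNum missing
    }
}
```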
33 More examples
<items>
  <item partNum="872-AA">
    <productName>Lawnmower</productName>
    <quantity>1</quantity>
    <price>148.95</price>
    <comment>Confirm this is electric</comment>
  </item>
  <item partNum="926-AA">
    <productName>Baby Monitor</productName>
    <quantity>1</quantity>
    <price>39.98</price>
    <shipDate>1999-05-21</shipDate>
  </item>
</items>
34 Example Items type
<xsd:complexType name="Items">
  <xsd:sequence>
    <xsd:element name="item" minOccurs="0" maxOccurs="unbounded">
      <xsd:complexType>
        <xsd:sequence>
          <xsd:element name="productName" type="xsd:string"/>
          <xsd:element name="quantity">
            <xsd:simpleType>
              <xsd:restriction base="xsd:positiveInteger">
                <xsd:maxExclusive value="100"/>
              </xsd:restriction>
            </xsd:simpleType>
          </xsd:element>
          <xsd:element name="price" type="xsd:decimal"/>
          <xsd:element ref="comment" minOccurs="0"/>
          <xsd:element name="shipDate" type="xsd:date" minOccurs="0"/>
        </xsd:sequence>
        <xsd:attribute name="partNum" type="Sku"/>
      </xsd:complexType>
    </xsd:element>
  </xsd:sequence>
</xsd:complexType>
<xsd:simpleType name="Sku">
  <xsd:restriction base="xsd:string">
    <xsd:pattern value="\d{3}-[A-Z]{2}"/>
  </xsd:restriction>
</xsd:simpleType>
35 Patterns
<xsd:simpleType name="Sku">
  <xsd:restriction base="xsd:string">
    <xsd:pattern value="\d{3}-[A-Z]{2}"/>
  </xsd:restriction>
</xsd:simpleType>
- Three digits followed by a hyphen followed by two upper-case ASCII letters.
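The same pattern can be checked in Java, as a sketch using java.util.regex (the class name is illustrative). An XML Schema pattern must match the whole value, which corresponds to Matcher.matches() rather than find(). One difference to keep in mind: in XML Schema \d matches any Unicode decimal digit, while Java's \d matches only ASCII digits unless the UNICODE_CHARACTER_CLASS flag is set.

```java
import java.util.regex.Pattern;

class SkuCheck {
    // Same pattern as the Sku type: three digits, a hyphen, two upper-case ASCII letters.
    private static final Pattern SKU = Pattern.compile("\\d{3}-[A-Z]{2}");

    // matches() requires the whole value to match, like an XML Schema pattern facet.
    static boolean isSku(String value) {
        return SKU.matcher(value).matches();
    }

    public static void main(String[] args) {
        for (String v : new String[] {"872-AA", "926-aa", "1234-AB", "872-AAA"}) {
            System.out.println(v + " -> " + isSku(v));
        }
    }
}
```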
36 Building content models
- <xsd:sequence>: fixed order
- <xsd:choice>: choice of one of the alternatives
- <xsd:group>: grouping (groups can also be named)
- <xsd:all>: no order specified
37 Null values
- A missing element may mean many things: unknown, not applicable, etc.
- An attribute can indicate that the element content is null (nil).
in the schema:
<xsd:element name="shipDate" type="xsd:date" nillable="true"/>
in the document (xsi bound to http://www.w3.org/2001/XMLSchema-instance):
<shipDate xsi:nil="true"></shipDate>
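A sketch of nil handling with the JDK's schema validator, using the final XML Schema 1.0 spelling (nillable in the schema, xsi:nil in the instance). The one-element item schema and the class name are hypothetical. An element marked nil must be empty, and an empty element that is not marked nil still has to be a valid date.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class NilCheck {
    // Hypothetical schema: item contains one nillable shipDate.
    private static final String XSD =
            "<xsd:schema xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\">"
            + "<xsd:element name=\"item\"><xsd:complexType><xsd:sequence>"
            + "<xsd:element name=\"shipDate\" type=\"xsd:date\" nillable=\"true\"/>"
            + "</xsd:sequence></xsd:complexType></xsd:element>"
            + "</xsd:schema>";

    // Namespace declaration for the xsi prefix used in instance documents.
    static final String XSI = "xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\"";

    private static final Schema SCHEMA = compile(XSD);

    private static Schema compile(String xsd) {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema does not compile", e);
        }
    }

    static boolean isValid(String xml) {
        try {
            SCHEMA.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<item><shipDate>1999-05-21</shipDate></item>"));          // true
        System.out.println(isValid("<item " + XSI + "><shipDate xsi:nil=\"true\"/></item>")); // true
        System.out.println(isValid("<item><shipDate/></item>")); // false: empty is not a date
    }
}
```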
38 Specifying uniqueness
- XML Schema makes it possible to indicate that any attribute or element value must be unique within a certain scope.
- The unique element first selects a set of elements (selector), then identifies the attribute or element (field), relative to each selected element, that has to be unique within the scope of the set of selected elements.
39 Defining keys and their references
- Keys and key references can also be defined:
<xsd:key name="pNumKey">
  <xsd:selector xpath="parts/part"/>
  <xsd:field xpath="@number"/>
</xsd:key>
<xsd:keyref name="dummy2" refer="pNumKey">
  <xsd:selector xpath="regions/zip/part"/>
  <xsd:field xpath="@number"/>
</xsd:keyref>
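A sketch that checks the key and keyref above with the JDK's schema validator. The surrounding catalog/parts/regions/zip structure, the PartRef type and the class name are hypothetical, chosen only so that the selector paths parts/part and regions/zip/part resolve; the syntax follows the final XML Schema 1.0 Recommendation. A duplicate part number violates the key, and a zip entry naming an unknown part violates the keyref. xsd:unique works the same way, except that its fields may be absent.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class KeyCheck {
    private static final String XSD =
            "<xsd:schema xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\">"
            + "<xsd:complexType name=\"PartRef\">"
            + "<xsd:attribute name=\"number\" type=\"xsd:string\" use=\"required\"/>"
            + "</xsd:complexType>"
            + "<xsd:element name=\"catalog\"><xsd:complexType><xsd:sequence>"
            + "<xsd:element name=\"parts\"><xsd:complexType><xsd:sequence>"
            + "<xsd:element name=\"part\" type=\"PartRef\" maxOccurs=\"unbounded\"/>"
            + "</xsd:sequence></xsd:complexType></xsd:element>"
            + "<xsd:element name=\"regions\"><xsd:complexType><xsd:sequence>"
            + "<xsd:element name=\"zip\" maxOccurs=\"unbounded\"><xsd:complexType><xsd:sequence>"
            + "<xsd:element name=\"part\" type=\"PartRef\" minOccurs=\"0\" maxOccurs=\"unbounded\"/>"
            + "</xsd:sequence></xsd:complexType></xsd:element>"
            + "</xsd:sequence></xsd:complexType></xsd:element>"
            + "</xsd:sequence></xsd:complexType>"
            // identity constraints, as on the slide
            + "<xsd:key name=\"pNumKey\">"
            + "<xsd:selector xpath=\"parts/part\"/><xsd:field xpath=\"@number\"/>"
            + "</xsd:key>"
            + "<xsd:keyref name=\"dummy2\" refer=\"pNumKey\">"
            + "<xsd:selector xpath=\"regions/zip/part\"/><xsd:field xpath=\"@number\"/>"
            + "</xsd:keyref>"
            + "</xsd:element>"
            + "</xsd:schema>";

    private static final Schema SCHEMA = compile(XSD);

    private static Schema compile(String xsd) {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema does not compile", e);
        }
    }

    static boolean isValid(String xml) {
        try {
            SCHEMA.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<catalog><parts><part number=\"1\"/><part number=\"2\"/></parts>"
                + "<regions><zip><part number=\"1\"/></zip></regions></catalog>")); // true
        System.out.println(isValid("<catalog><parts><part number=\"1\"/></parts>"
                + "<regions><zip><part number=\"3\"/></zip></regions></catalog>")); // false: keyref
    }
}
```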
40 XML Query Languages
- Currently:
  - There is no recommendation/standard available, only drafts.
  - Different proposals were made in 1998; work is in progress.
- XML Query Requirements:
  - Requirements draft 16.8.2000
  - Query language expected by the end of 2000
- XML Query Data Model:
  - Draft 11.5.2000
- More on XML query languages: http://www.w3.org/XML/Query/
41 XML Query Languages
- Required features of an XML query language:
  - Support operations (selection, projection, aggregation, sorting, etc.) on all data types:
    - Choose a part of the data based on content or structure.
    - Also operations on the hierarchy and sequence of document structures.
  - Structural preservation and transformation:
    - Preserve the relative hierarchy and sequence of input document structures in the query results.
    - Transform XML structures and create new XML structures.
  - Combination and joining:
    - Combine related information from different parts of a given document or from multiple documents.
42 XML Query Languages
- Required features of an XML query language (cont'd):
  - Closure property:
    - The result of a query on XML documents is also an XML document (usually not valid, but well-formed).
    - The results of a query can be used as input to another query.
- Notes:
  - HTML is layout-oriented, so queries cannot be carried out efficiently.
  - XML is not layout-oriented but represents structure, so DTDs and structure information can be used in queries.
  - XML query languages are still under construction, but prototype languages exist (e.g., XML-QL, XQL, Lorel).
43 XML Query Languages
- We want our query to collect elements from manufacturer documents (in temp.database.xml) listing the manufacturer's name, year, models, vendors, price, etc., to create new <car> elements.
- The results should list the make, model, vendor, rank, and price (in this order).
- Lorel:
Select xml(car(select X.vehicle.make,
                      X.vehicle.model,
                      X.vehicle.vendor,
                      X.manufacturer.rank,
                      X.vehicle.price
               from temp.database.xml X))
44 XML Query Languages
- XML-QL:
WHERE <manufacturer>
        <mn_name>$mn</mn_name>
        <vehiclemodel>
          <model><mo_name>$mon</mo_name><rank>$r</rank></model>
          <vehicle>
            <price>$y</price>
            <vendor>$v</vendor>
          </vehicle>
        </vehiclemodel>
      </manufacturer> IN "www.nhcs\temp.database.xml"
CONSTRUCT <car>
            <make>$mn</make>
            <mo_name>$mon</mo_name>
            <vendor>$v</vendor>
            <rank>$r</rank>
            <price>$y</price>
          </car>
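XML-QL and Lorel were research prototypes, so as a rough runnable analogue the sketch below expresses the same WHERE/CONSTRUCT idea in Java with XPath 1.0 (javax.xml.xpath) and DOM. This is a swapped-in technique, not XML-QL. The sample database, its values, the result wrapper element and the class name are hypothetical, and each vehiclemodel is assumed to hold one model and one vehicle. Because the output is again XML, it could serve as input to a further query, which is the closure property described earlier.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class CarQuery {
    static String query(String databaseXml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document input = builder.parse(new InputSource(new StringReader(databaseXml)));
            Document output = builder.newDocument();
            Element result = output.createElement("result");
            output.appendChild(result);

            XPath xpath = XPathFactory.newInstance().newXPath();
            // "WHERE" part: one binding per vehiclemodel of any manufacturer.
            NodeList models = (NodeList) xpath.evaluate(
                    "/database/manufacturer/vehiclemodel", input, XPathConstants.NODESET);
            for (int i = 0; i < models.getLength(); i++) {
                Node vm = models.item(i);
                // "CONSTRUCT" part: make, model, vendor, rank, price, in this order.
                Element car = output.createElement("car");
                addChild(output, car, "make", xpath.evaluate("../mn_name", vm));
                addChild(output, car, "mo_name", xpath.evaluate("model/mo_name", vm));
                addChild(output, car, "vendor", xpath.evaluate("vehicle/vendor", vm));
                addChild(output, car, "rank", xpath.evaluate("model/rank", vm));
                addChild(output, car, "price", xpath.evaluate("vehicle/price", vm));
                result.appendChild(car);
            }

            // Serialize the new document: the query result is itself XML.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(output), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("query failed", e);
        }
    }

    private static void addChild(Document doc, Element parent, String name, String text) {
        Element child = doc.createElement(name);
        child.setTextContent(text);
        parent.appendChild(child);
    }

    public static void main(String[] args) {
        String db = "<database><manufacturer><mn_name>ExampleMotors</mn_name>"
                + "<vehiclemodel><model><mo_name>Model A</mo_name><rank>3</rank></model>"
                + "<vehicle><price>19000</price><vendor>Vendor X</vendor></vehicle>"
                + "</vehiclemodel></manufacturer></database>";
        System.out.println(query(db));
    }
}
```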