Title: SAX : Simple API for XML
1. SAX: Simple API for XML

    #include <qxml.h>
    #include <qfile.h>
    #include <iostream>
    using namespace std;

    // Prints the element structure of a document, one indented name per element.
    class StructureParser : public QXmlDefaultHandler
    {
    public:
        bool startDocument();
        bool startElement( const QString&, const QString&, const QString&,
                           const QXmlAttributes& );
        bool endElement( const QString&, const QString&, const QString& );
    private:
        QString indent;
    };

    bool StructureParser::startDocument()
    {
        indent = "";
        return TRUE;
    }

    bool StructureParser::startElement( const QString&, const QString&,
                                        const QString& qName,
                                        const QXmlAttributes& )
    {
        cout << indent << qName << endl;
        indent += "    ";            // one level deeper
        return TRUE;
    }

    bool StructureParser::endElement( const QString&, const QString&,
                                      const QString& )
    {
        indent.remove( 0, 4 );       // back to the parent level
        return TRUE;
    }

    int main( int argc, char* argv[] )
    {
        StructureParser handler;
        QFile xmlFile( argv[1] );
        QXmlInputSource source( xmlFile );
        QXmlSimpleReader reader;
        reader.setContentHandler( &handler );
        reader.parse( source );
        return 0;
    }
2. SAX: Simple API for XML
- Event-based
- Drawbacks
  - No random access
  - Complex searches can be difficult to implement
  - DTD not available
  - Lexical information not available (comments, ...)
  - Read-only
  - Not supported by current browsers
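The event-based model is not specific to Qt. As a minimal sketch, the same structure printer can be written with Python's standard-library `xml.sax` module; the names `StructurePrinter` and `outline` and the sample document are assumptions made for this illustration only.

```python
import xml.sax


class StructurePrinter(xml.sax.ContentHandler):
    """Collects one indented line per start-element event."""

    def __init__(self):
        super().__init__()
        self.indent = ""
        self.lines = []

    def startElement(self, name, attrs):
        # Called by the parser each time an opening tag is read.
        self.lines.append(self.indent + name)
        self.indent += "    "

    def endElement(self, name):
        # Called at the matching closing tag: go back one level.
        self.indent = self.indent[:-4]


def outline(xml_text):
    """Parse xml_text and return its element structure as indented lines."""
    handler = StructurePrinter()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.lines


if __name__ == "__main__":
    sample = "<Catalog><Book><Title>XML First steps</Title></Book></Catalog>"
    print("\n".join(outline(sample)))
```

The handler never sees the document as a whole, only a stream of events, which is why random access and complex searches are hard to implement on top of SAX.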
3. Namespaces
- Solution to a drawback of the DTD
  - Ambiguity (name collision)
  - The tag <monitor> has a different meaning in the context of a music studio vocabulary, a computer-related vocabulary, or a nuclear power plant vocabulary
- Declaring a namespace
  - xmlns:name="URI"

    <Collection xmlns:catalog="http://myserver.org/PubCatalog.dtd">
      <catalog:Book />
    </Collection>
4. Namespaces
- Scope: namespace declarations have scope in the same way that variable declarations do in programming languages.
- Default namespace: declared by omitting the prefix in the declaration

    <Collection xmlns="http://myserver.org/PubCatalog.dtd">
      <Book />
      <table xmlns:html="http://www.w3.org/TR/REC/REC-html40">
        <html:tr>
          <html:td>XML in 2 hours</html:td>
        </html:tr>
      </table>
    </Collection>
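To see how scoping resolves each name, the following sketch parses the Collection example with Python's standard-library `xml.etree.ElementTree`, which reports every element name in `{URI}local-name` form. The function name `qualified_names` is an assumption for illustration.

```python
import xml.etree.ElementTree as ET

DOC = """
<Collection xmlns="http://myserver.org/PubCatalog.dtd">
  <Book />
  <table xmlns:html="http://www.w3.org/TR/REC/REC-html40">
    <html:tr>
      <html:td>XML in 2 hours</html:td>
    </html:tr>
  </table>
</Collection>
"""


def qualified_names(xml_text):
    """Return every element name in document order, in {URI}local form."""
    root = ET.fromstring(xml_text)
    return [element.tag for element in root.iter()]


if __name__ == "__main__":
    for name in qualified_names(DOC):
        print(name)
```

Unprefixed elements (Collection, Book, table) fall into the default namespace declared on the root, while the `html:` prefix is only usable inside the element that declares it.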
5. Schemas
- Metadata (data about data)
- Proposed replacement for the DTD
- Advantages
  - Use the same syntax as XML
  - Ability to explore the schema using DOM
  - Extensible
  - Strong data typing
  - Support for inheritance
6. Schemas: RDF
- Academic proposal: Resource Description Framework
  - Model and Syntax
  - RDF Schemas
- Oriented around 3 concepts: resources, properties and statements
  - Resource: any entity that can be referred to by a URI
  - Properties
    - Constraints: limit the types of values that can be assigned to a property
  - Statements: description of relations between resources
- Very powerful but laborious
7. Schemas: XML-Data
- Microsoft proposal: XML-Data
- Syntactic schemas: set of rules describing how to write documents using markup
- Conceptual schemas: describe relationships between concepts or objects
- Usable for the description of an SQL query
- Written in XML
- Strong data typing
- Constraints on allowed values
- Open or closed models
- Expanded ID / IDREF
- Relation
- Alias
- Correlative
8. XML Schemas vs DTD

DTD

    <!ELEMENT Name (Title?, First, Middle?, Last, Suffix?)>
    <!ELEMENT Title (#PCDATA)>
    <!ELEMENT First (#PCDATA)>
    <!ELEMENT Middle (#PCDATA)>
    <!ELEMENT Last (#PCDATA)>
    <!ELEMENT Suffix (#PCDATA)>

XML Schema

    <Schema ...>
      <element name="Name">
        <type>
          <element name="Title" type="string" minOccurs="0" maxOccurs="1"/>
          <element name="First" type="string" />
          <element name="Middle" type="string" minOccurs="0" maxOccurs="1"/>
          <element name="Last" type="string" />
          <element name="Suffix" type="string" minOccurs="0" maxOccurs="1"/>
        </type>
      </element>
    </Schema>
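The occurrence constraints in the Name declaration can be read as a small ordered table. The sketch below checks an element against such a table; it is an illustration only, not a schema validator. It assumes that an omitted minOccurs or maxOccurs means exactly one occurrence, and the names `NAME_MODEL` and `matches_model` are made up for this example.

```python
import xml.etree.ElementTree as ET

# (element name, minOccurs, maxOccurs), in the order the content model requires.
NAME_MODEL = [
    ("Title", 0, 1),
    ("First", 1, 1),
    ("Middle", 0, 1),
    ("Last", 1, 1),
    ("Suffix", 0, 1),
]


def matches_model(element, model):
    """Return True if the children of element follow model in order."""
    children = [child.tag for child in element]
    position = 0
    for name, min_occurs, max_occurs in model:
        count = 0
        # Consume as many consecutive children with this name as allowed.
        while (position < len(children)
               and children[position] == name
               and count < max_occurs):
            position += 1
            count += 1
        if count < min_occurs:
            return False
    # Anything left over is an unexpected or repeated element.
    return position == len(children)


if __name__ == "__main__":
    ok = ET.fromstring("<Name><First>Ada</First><Last>Lovelace</Last></Name>")
    print(matches_model(ok, NAME_MODEL))
```

The greedy matching is enough here because every entry in the model has a distinct name; a general content model would need a real automaton.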
9. XML Schemas
- Preamble

    <schema targetNS="http://myserver.org/schemaname.xsd"
            version="1.0"
            xmlns="http://www.w3.org/1999/XMLSchema">
      ....
    </schema>

- Simple type definition: constraint on information that does not include elements

    <datatype name="smallInt" source="integer">
      <MinExclusive value="0" />
      <MaxExclusive value="10" />
    </datatype>
10. XML Schemas
- Complex type definition

    <type name="Collection">
      <element ... />
      <attribute ... />
    </type>

- Attribute declarations
  - name (mandatory)
  - minOccurs
  - maxOccurs
  - default
  - fixed

    <attribute name="serialNo" type="integer" default="0" />
11. XML Schemas
- Attribute groups

    <attributeGroup name="BookParameters">
      <attribute name="serialNum" type="integer" default="0" />
      <attribute name="ISBN" type="string" />
    </attributeGroup>
    <type name="Book">
      <attributeGroup ref="BookParameters" />
    </type>

- Content models
  - <group>
  - unconstrained: content of any kind
  - empty: empty element
  - mixed: elements and character data
- Ordering
  - seq: elements must follow an exact order
  - choice: exactly one element appears
12. XML Schemas
- Element declarations
  - <element name="elementName" type="elementType" />
- Primitive types
  - string
  - boolean: true / false
  - float: 32-bit floating point
  - double: 64-bit floating point
  - decimal: numeric type with a smaller range than float
  - timeInstant: combination of date and time encoded as a string (YYYY-MM-DDThh:mm:ss.sss)
  - timeDuration: combination of date and time components encoded as PnYnMnDTnHnMnS
  - recurringInstant: recurring instant of time (timeInstant pattern with "-" replacing any period that is not defined)
- Generated types
  - NMTOKEN
  - ID
  - IDREF
  - ENTITY
  - integer
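The two time-related encodings are easiest to understand by decoding them. The following sketch reads a timeInstant with `datetime.strptime` and splits a timeDuration with a regular expression. The function names and the sample values are assumptions for illustration, and the pattern is deliberately lenient (it accepts a bare "P", for instance).

```python
import re
from datetime import datetime

# PnYnMnDTnHnMnS: every component is optional, "T" introduces the time part.
DURATION = re.compile(
    r"^P(?:(?P<years>\d+)Y)?(?:(?P<months>\d+)M)?(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?"
    r"(?:(?P<seconds>\d+(?:\.\d+)?)S)?)?$"
)


def parse_instant(text):
    """Decode a YYYY-MM-DDThh:mm:ss.sss string into a datetime."""
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%S.%f")


def parse_duration(text):
    """Split a PnYnMnDTnHnMnS string into its numeric components."""
    match = DURATION.match(text)
    if match is None:
        raise ValueError(f"not a duration: {text!r}")
    return {name: float(value) if value else 0.0
            for name, value in match.groupdict().items()}


if __name__ == "__main__":
    print(parse_instant("1999-12-31T23:59:59.500"))
    print(parse_duration("P1Y2M3DT4H5M6S"))
```

Note that "M" means months before the "T" separator and minutes after it, which is why the separator cannot be dropped.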
13. XML Schemas
- XML Schema is not yet fully implemented
  - XML-Data Reduced (Microsoft's IE 5)
  - Partial port of XML Schemas (IBM's XML4J)
14. Linking
- Problems with HTML hyperlinks
  - Embedded in the source document
  - One-way navigation
  - Connect only two resources
  - Do not specify the behaviour of the rendering engine
- XLink
  - Simple links
    - <!ELEMENT xlink:type xlink:href xlink:role xlink:title xlink:show xlink:actuate />
    - xlink:type: always "xlink:simple"
    - xlink:href: destination URI of the link
    - xlink:role: function of the element in the link
    - xlink:title: title of the link
    - xlink:show
      - new: target content is to be rendered in a separate context
      - replace: target content should replace the source content
      - embedded: content will be embedded at the link position
    - xlink:actuate
      - onRequest: the user has to trigger the link
      - onLoad: the link is activated at load time
15. Linking

    <xlink:simple xmlns:xlink="http://www.w3.org/1999/xlink/namespace/"
                  xlink:href="books.xml"
                  xlink:title="Books list"
                  xlink:show="new"
                  xlink:actuate="onRequest" />

- Extended links
  - <!ELEMENT xlink:type (xlink:title, xlink:arc, xlink:locator, (xlink:arc | xlink:locator), xlink:resource)>
  - xlink:type: always 'extended'
  - xlink:title: title of the link
  - xlink:locator: location participating in an extended link
  - xlink:arc: defines connections between 2 locators
    - xlink:from: defines the start point of the link
    - xlink:to: defines the end point of the link
  - xlink:resource: inline elements of an extended link
16. Linking

    <xlink:extended xmlns:xlink="http://www.w3.org/1999/xlink/namespace/"
                    role="bookstructure"
                    title="Book structure">
      <xlink:locator href="book.xml" role="parent" title="XML first steps" />
      <xlink:locator href="chapter1.xml" role="part" title="Chapter 1" />
      <xlink:locator href="chapter2.xml" role="part" title="Chapter 2" />
      <xlink:arc from="parent" to="part" show="replace" actuate="onRequest" />
    </xlink:extended>

- Defines the connections between
  - 'XML first steps' and 'Chapter 1'
  - 'XML first steps' and 'Chapter 2'
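How one arc produces two connections can be sketched in a few lines: an arc links every locator whose role matches its from value to every locator whose role matches its to value. The code below follows the slide's early XLink syntax as written and is not a conforming XLink processor; `expand_arcs` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink/namespace/"

LINK = """
<xlink:extended xmlns:xlink="http://www.w3.org/1999/xlink/namespace/"
                role="bookstructure" title="Book structure">
  <xlink:locator href="book.xml" role="parent" title="XML first steps" />
  <xlink:locator href="chapter1.xml" role="part" title="Chapter 1" />
  <xlink:locator href="chapter2.xml" role="part" title="Chapter 2" />
  <xlink:arc from="parent" to="part" show="replace" actuate="onRequest" />
</xlink:extended>
"""


def expand_arcs(xml_text):
    """Return (from title, to title) pairs for every arc in the link."""
    link = ET.fromstring(xml_text)
    locators = link.findall(f"{{{XLINK}}}locator")
    pairs = []
    for arc in link.findall(f"{{{XLINK}}}arc"):
        sources = [loc for loc in locators if loc.get("role") == arc.get("from")]
        targets = [loc for loc in locators if loc.get("role") == arc.get("to")]
        # One arc stands for the full cross product of sources and targets.
        for source in sources:
            for target in targets:
                pairs.append((source.get("title"), target.get("title")))
    return pairs


if __name__ == "__main__":
    for start, end in expand_arcs(LINK):
        print(start, "->", end)
```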
17. Linking
- Out-of-line extended links
  - Define connections between non-XML or read-only objects
  - Role attribute set to 'xlink:external-linkset'
- XPointer: pointing to part of an XML document
  - Specification in a URI
  - Child sequence identification

    http://myserver.org/catalog.xml#xpointer(book1)
        points to the element with ID "book1"

    http://myserver.org/catalog.xml#/1/2/4
        points to the first element of the document, then to the second child of this element, then to the fourth child of that element
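A child sequence such as /1/2/4 is simply a path of 1-based child positions. The sketch below resolves one against a parsed document; the function name `resolve_child_sequence` and the sample catalog (including its Price element) are assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def resolve_child_sequence(root, sequence):
    """Follow a child sequence such as "/1/2/4" starting from the document.

    Step numbers are 1-based; the first step selects the document element.
    Returns None when a step points past the available children.
    """
    steps = [int(part) for part in sequence.strip("/").split("/")]
    if steps[0] != 1:
        return None  # a document has exactly one top-level element
    current = root
    for step in steps[1:]:
        children = list(current)  # element children only, in document order
        if step < 1 or step > len(children):
            return None
        current = children[step - 1]
    return current


if __name__ == "__main__":
    catalog = ET.fromstring(
        "<Catalog><Book/><Book><Title/><Pages/><Abstract/><Price/></Book></Catalog>"
    )
    print(resolve_child_sequence(catalog, "/1/2/4").tag)
```

Child sequences are compact but fragile: inserting one element earlier in the document silently changes what the pointer selects, which is why ID-based pointers are usually preferred when IDs exist.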
18. Linking
- XPath
  - Location steps
    - Way to select nodes from an XML document
    - Work relative to the current node
  - Axis
    - child: all the children of the current node
    - attribute: attribute nodes of the current node
    - namespace
    - descendant: all the children, their children, and so on
    - parent
    - ancestor: the parent node, grandparent node, ...
  - Node tests
    - Allow specific elements to be selected
    - node()
    - text()
    - comment()
  - Predicate
    - Boolean condition
    - Numbers
19. Linking
- Node set functions
  - number position(): returns the current position
  - number count(node-set): returns the number of nodes
  - node-set id(object): returns the node set containing the nodes whose ID matches those in the object parameter
- String functions
  - string concat(string, string, ...): concatenates the arguments
  - boolean starts-with(string, string): true if the first string starts with the second
  - boolean contains(string, string): true if the first string contains the second
- Boolean functions
  - boolean not(boolean): true if its argument is false
  - boolean boolean(object): returns a boolean depending on the object
20. Linking
- Location set specification
  - axis::node-test[predicate]
- XPointer extensions to XPath
  - Point: particular location within character content
  - Range: XML content between two points
  - Additional functions

    child::Book[position() <= 3]
        points to the first 3 <Book> child elements
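Location steps can be tried out with Python's standard-library `xml.etree.ElementTree`, which supports a limited subset of XPath (child and descendant steps plus simple predicates, but no position() function). The sketch below is therefore an approximation: list slicing stands in for the position predicate, and the third and fourth catalog entries are made-up sample data.

```python
import xml.etree.ElementTree as ET

CATALOG = """
<Catalog>
  <Book><Title>XML First steps</Title><Pages>55</Pages></Book>
  <Book><Title>Thinking in Java</Title><Pages>475</Pages></Book>
  <Book><Title>Book three</Title><Pages>120</Pages></Book>
  <Book><Title>Book four</Title><Pages>80</Pages></Book>
</Catalog>
"""

root = ET.fromstring(CATALOG)

# child axis: the <Book> children of the current node
books = root.findall("Book")

# descendant axis: every <Title> below the current node
titles = [title.text for title in root.findall(".//Title")]

# predicate: only the <Book> whose <Title> child has the given text
matches = root.findall("Book[Title='XML First steps']")

# ElementTree has no position() function; slicing the child list
# selects the first three <Book> children instead.
first_three = books[:3]

if __name__ == "__main__":
    print(titles)
    print([book.findtext("Title") for book in first_three])
```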
21. Querying
- Ability to access a portion of an XML document and process it
- XML-QL
  - Tries to mimic the SQL language

    CONSTRUCT <Titles> {
      WHERE <Book> <Title>$t</Title> </Book>
            IN "http://myserver.org/catalog.xml"
      CONSTRUCT <Title>$t</Title>
    } </Titles>

    returns

    <Titles>
      <Title> XML First steps </Title>
      <Title> ... </Title>
    </Titles>
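The WHERE and CONSTRUCT clauses map onto a pattern match followed by building a new tree. As a rough sketch of that idea (not an XML-QL implementation), the same result can be produced by hand with `xml.etree.ElementTree`; the function name `construct_titles` and the inline sample catalog are assumptions.

```python
import xml.etree.ElementTree as ET


def construct_titles(catalog_xml):
    """Bind each <Title> inside a <Book> and wrap the results in <Titles>."""
    catalog = ET.fromstring(catalog_xml)
    titles = ET.Element("Titles")           # outer CONSTRUCT <Titles>
    for book in catalog.iter("Book"):       # WHERE <Book> ...
        for title in book.findall("Title"):  # ... <Title>$t</Title> </Book>
            ET.SubElement(titles, "Title").text = title.text  # inner CONSTRUCT
    return titles


if __name__ == "__main__":
    sample = (
        "<Catalog>"
        "<Book><Title>XML First steps</Title></Book>"
        "<Book><Title>Thinking in Java</Title></Book>"
        "</Catalog>"
    )
    print(ET.tostring(construct_titles(sample), encoding="unicode"))
```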
22. Querying
- XQL
- XPath and XSLT
- Row-wise restriction

    //Title
    //Book[Book.Title="XML First steps"]

    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
      <xsl:template match="/Catalog">
        <xsl:copy>
          <xsl:for-each select="Book[Title='XML First steps']">
            <xsl:copy>
              <xsl:apply-templates mode="childnodes"/>
            </xsl:copy>
          </xsl:for-each>
        </xsl:copy>
      </xsl:template>
      <xsl:template mode="childnodes" match="*">
        <xsl:copy>
          <xsl:apply-templates mode="childnodes" />
        </xsl:copy>
      </xsl:template>
    </xsl:stylesheet>

23. Querying
- Column-wise restriction

    <xsl:apply-templates select="Title" mode="childnodes"/>

- Summarizing

    <xsl:copy>
      <xsl:element name="totalPages">
        <xsl:value-of select="sum(//Book[./Title='Xml']/Pages)"/>
      </xsl:element>
    </xsl:copy>

- Sorting

    <xsl:for-each select="...">
      <xsl:sort select="Title"/>
      ...
    </xsl:for-each>
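The four query operations have direct counterparts in ordinary list processing. The sketch below expresses each one in plain Python over a parsed catalog, as a way to see what the stylesheets compute; it is not XSLT, and the function names are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

CATALOG = """
<Catalog>
  <Book><Title>XML First steps</Title><Pages>55</Pages></Book>
  <Book><Title>Thinking in Java</Title><Pages>475</Pages></Book>
</Catalog>
"""


def select_rows(root, title):
    """Row-wise restriction: keep only the books with the given title."""
    return [b for b in root.findall("Book") if b.findtext("Title") == title]


def select_column(root, tag):
    """Column-wise restriction: keep one child element's text per book."""
    return [b.findtext(tag) for b in root.findall("Book")]


def total_pages(root):
    """Summarizing: add up the <Pages> value of every book."""
    return sum(int(b.findtext("Pages")) for b in root.findall("Book"))


def sorted_titles(root):
    """Sorting: book titles in alphabetical order."""
    return sorted(select_column(root, "Title"))


if __name__ == "__main__":
    catalog = ET.fromstring(CATALOG)
    print(select_column(catalog, "Title"))
    print(total_pages(catalog))
    print(sorted_titles(catalog))
```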
24. Querying
- Other functions available
  - Inner joins
  - Outer joins
  - Returning information from more than one source
  - Procedural processing
25. Transforming XML (XSLT)
- Reasons
  - Structural transformations
    - Translation from one vocabulary to another
  - Creating dynamic documents
    - Reordering, filtering
  - Transformation for rendering
    - Converting to HTML, WML, ...
- Manipulation of structure, not documents
- Use XSL stylesheets
26. Transforming XML

Catalog.xsl

    <?xml version="1.0" ?>
    <xsl:stylesheet version="1.0"
                    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output method="html" />
      <xsl:template match="/">
        <html>
          <head>
            <title>Book Catalog</title>
          </head>
          <body>
            <xsl:apply-templates select="//Title" />
          </body>
        </html>
      </xsl:template>
      <xsl:template match="Title">
        <DIV style="font-family: Arial; font-weight: 700; font-size: 14pt;">
          <SPAN>
            <xsl:value-of select="." />
          </SPAN>
        </DIV>
      </xsl:template>
    </xsl:stylesheet>

Source XML document

    <?xml version="1.0"?>
    <Catalog>
      <Book>
        <Title> XML First steps </Title>
        <Pages> 55 </Pages>
        <Abstract> First approach to XML </Abstract>
      </Book>
      <Book>
        <Title> Thinking in Java </Title>
        <Pages> 475 </Pages>
        <Abstract> ... </Abstract>
      </Book>
      ....
    </Catalog>
27. Transforming XML

    <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
    <html>
      <head>
        <title>Book Catalog</title>
      </head>
      <body>
        <DIV style="font-family: Arial; font-weight: 700; font-size: 14pt;">
          <SPAN> XML First steps </SPAN>
        </DIV>
        <DIV style="font-family: Arial; font-weight: 700; font-size: 14pt;">
          <SPAN> Thinking in Java </SPAN>
        </DIV>
      </body>
    </html>
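To make the two templates concrete, the sketch below builds the same HTML by hand with `xml.etree.ElementTree`. Python's standard library has no XSLT processor, so this is a hand-written equivalent of what the stylesheet describes rather than a run of the stylesheet itself; it also omits the DOCTYPE line, and the function name `catalog_to_html` is an assumption.

```python
import xml.etree.ElementTree as ET

STYLE = "font-family: Arial; font-weight: 700; font-size: 14pt;"


def catalog_to_html(xml_text):
    """Build the HTML tree that the stylesheet's two templates describe."""
    catalog = ET.fromstring(xml_text)

    # Template for "/": the fixed page skeleton.
    html = ET.Element("html")
    head = ET.SubElement(html, "head")
    ET.SubElement(head, "title").text = "Book Catalog"
    body = ET.SubElement(html, "body")

    # Template for "Title": one styled DIV/SPAN per <Title> in the document.
    for title in catalog.iter("Title"):
        div = ET.SubElement(body, "DIV", {"style": STYLE})
        ET.SubElement(div, "SPAN").text = title.text

    return ET.tostring(html, encoding="unicode", method="html")


if __name__ == "__main__":
    sample = (
        "<Catalog>"
        "<Book><Title>XML First steps</Title><Pages>55</Pages></Book>"
        "<Book><Title>Thinking in Java</Title><Pages>475</Pages></Book>"
        "</Catalog>"
    )
    print(catalog_to_html(sample))
```

The loop plays the role of apply-templates: the page skeleton is produced once, and the Title rule fires once per matching node, in document order.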
28. Transforming XML
- XSLT elements
  - <xsl:stylesheet>: definition of the stylesheet
  - <xsl:output method="html" />: desired output
  - <xsl:template match="/">: template
  - <xsl:template name="templateName">: named template (that can take arguments)
  - <xsl:call-template name="templateName" />: calling a named template
  - <xsl:apply-templates select="//Title" />: applying sub-templates
  - <xsl:value-of select="." />: value of the current element
  - <xsl:strip-space elements="..." />: remove whitespace-only text in the listed elements
  - <xsl:preserve-space elements="..." />: keep whitespace-only text in the listed elements
  - <xsl:include href="URI" />: inclusion of an external stylesheet
  - <xsl:import href="URI" />: inclusion with inheritance (imported rules can be overridden)
  - <xsl:for-each select="...">: repetition
  - <xsl:sort select="...">: sorting
  - <xsl:if>: condition (if construct)
  - <xsl:choose>: switch (if / else-if construct)
29. Transforming XML
- XSLT elements (cont'd)
  - <xsl:number value="..." position="...">: numbering
  - <xsl:copy>: copy of the XML structure
30. Conclusions
- Pros
  - XML is a highly portable data modeling language
  - Focuses on inter-application data exchange
  - Can reflect database structure
  - Permits easy querying and transforming
- Cons
  - Not completely implemented
  - May be laborious (DTD)
31. References
- XML 1.0 Recommendation
  - http://www.w3.org/XML
- DOM Level-2 specification
  - http://www.w3.org/TR/DOM-Level-2
- SAX specification
  - http://www.megginson.com/SAX/
- General information on XML
  - http://www.xml.org