Title: XML once over lightly
1XML - once over lightly
- Keith Beattie
- Mary Thompson
- Monte Goode
- Karlo Berket
2Outline
- Introduction
- Namespaces
- XPath, XPointer, XInclude, XLink (ksb)
- XML Schema
- A few well known schemas
- Parsing
- Java tools
- Python tools
- Authoring tools
3eXtensible Markup Language (XML)
- XML: structure and data expressed so that it can be parsed programmatically, is relatively human-readable, and is eXtensible
- XML Schemas: not just well-formed XML, but conforming to some model
4XML - Intro
- Subset of SGML
- Goals
- Usable over the Internet (like HTML)
- Easy to write tools for (authoring tools, filters, translators, etc.)
- Compatible with SGML (apply existing tools)
- Not dependent on a DTD
- Minimum of optional features, for compatibility
- Human readable (text, not binary)
- Terseness is of minimal importance
5XML - Definition
- Specification for creating new markup languages (for structured documents)
- XML does not define a set of tags; you do that
- An XML document must be well-formed (syntactically correct)
- Tags properly nested, opened and closed, correct use of entities, etc.
- An XML document may optionally be validated against a DTD and/or a Schema.
- A DTD or Schema defines the new markup language (but is not required).
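A minimal sketch of the well-formedness requirement, assuming Python's standard xml.etree.ElementTree module as the parser. The helper name is_well_formed and the two sample strings are made up for illustration. This parser checks syntax only; it does not validate against a DTD or Schema.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if text parses as XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

good = "<person><name>Keith Jackson</name></person>"
bad = "<person><name>Keith Jackson</person></name>"   # tags improperly nested

print(is_well_formed(good))
print(is_well_formed(bad))
```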
6XML - Example
    <?xml version='1.0' encoding='utf-8'?>
    <!DOCTYPE addressbook SYSTEM "addressbook.dtd">
    <!-- ksb's addressbook -->
    <addressbook>
      <person>
        <name>Keith Jackson</name>
        <phone type='moble'>555-555-5555</phone>
        <phone type='land'>999-999-9999</phone>
        <address>1 Any Lane, Berkeley, 94111</address>
        <password><![CDATA[d@n'tue><e]]></password>
      </person>
      <person>
        <name>Andrew &amp; Lisa Wright</name>
        <phone type='land'>888-888-8888</phone>
        <phone type='moble'>444-444-4444</phone>
        <address/>
      </person>
    </addressbook>
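As a rough illustration of what "parsed programmatically" means for the address book, the sketch below reads a trimmed copy of it with Python's standard xml.etree.ElementTree. The variable names are assumptions made for this example, and the XML declaration, DOCTYPE and password elements are left out to keep it short.

```python
import xml.etree.ElementTree as ET

ADDRESSBOOK = """\
<addressbook>
  <person>
    <name>Keith Jackson</name>
    <phone type='moble'>555-555-5555</phone>
    <phone type='land'>999-999-9999</phone>
    <address>1 Any Lane, Berkeley, 94111</address>
  </person>
  <person>
    <name>Andrew &amp; Lisa Wright</name>
    <phone type='land'>888-888-8888</phone>
    <phone type='moble'>444-444-4444</phone>
    <address/>
  </person>
</addressbook>
"""

root = ET.fromstring(ADDRESSBOOK)

# Map each person's name to a {phone type: number} dictionary.
phones_by_name = {
    person.findtext("name"): {p.get("type"): p.text for p in person.findall("phone")}
    for person in root.findall("person")
}

for name, phones in phones_by_name.items():
    print(name, phones)
```

Note that the entity reference &amp; comes back as a plain "&" in the parsed name, and the empty address element has no text.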
7DTD - Example
    <?xml version='1.0' encoding='utf-8'?>
    <!-- DTD for ksb's addressbook -->
    <!ELEMENT addressbook (person*)>
    <!ELEMENT person (name, phone*, address, password?)>
    <!ELEMENT name (#PCDATA)>
    <!ELEMENT phone (#PCDATA)>
    <!ATTLIST phone type (moble|land) #REQUIRED>
    <!ELEMENT address ANY>
    <!ELEMENT password ANY>
- Limited type definition
- No hierarchy of elements
- XML Schema addresses these shortcomings
8XML Namespaces Intro
- A means to avoid name conflicts between XML elements and attributes
- Elements and attributes can now have the form <prefix>:<local name>, where <prefix> is the namespace id associated with a URI
- Since a URI can name either a physical or an abstract resource, nothing needs to actually exist at that URI
9XML Namespace URIs
- URIs have two general forms
- URL
- http://www.lbl.gov/akenti
- URN
- urn:www.lbl.gov:akenti
- urn:uuid:A941CFC3-736E-48E5-A691-C5B2FE036555
- A namespace URI is just a string with some guarantee of uniqueness
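A small sketch of the "just a string" point, assuming Python's standard uuid module as one way to mint a urn:uuid name. The variable names are made up for illustration.

```python
import uuid

url_style = "http://www.lbl.gov/akenti"   # URL form, taken from the list above
urn_style = uuid.uuid4().urn              # 'urn:uuid:...'; random, so different on every run

# Namespace URIs are compared as plain strings; nothing is ever fetched.
print(url_style == "http://www.lbl.gov/akenti")
print(urn_style)
```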
10XML Namespace Example
    <?xml version='1.0' encoding='utf-8'?>
    <!-- ksb's addressbook -->
    <ksb:addressbook
        xmlns:ksb='http://www.stobo.org/addressbook'
        xmlns:lbl='http://www.lbl.gov/addressbook'
        xmlns='http://www.w3.org/addressbook'>
      <ksb:person>
        <lbl:name>Keith Jackson</lbl:name>
        <phone ksb:type='moble'>555-555-5555</phone>
        <phone ksb:type='land'>999-999-9999</phone>
        <address>1 Any Lane, Berkeley, 94111</address>
        <ksb:password><![CDATA[d@n'tueme]]></ksb:password>
      </ksb:person>
    </ksb:addressbook>
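The sketch below shows how a namespace-aware parser sees a trimmed copy of this document, assuming Python's standard xml.etree.ElementTree. The short prefixes k, l and d are arbitrary choices for this example; only the URIs have to match the document.

```python
import xml.etree.ElementTree as ET

DOC = """\
<ksb:addressbook
    xmlns:ksb='http://www.stobo.org/addressbook'
    xmlns:lbl='http://www.lbl.gov/addressbook'
    xmlns='http://www.w3.org/addressbook'>
  <ksb:person>
    <lbl:name>Keith Jackson</lbl:name>
    <phone ksb:type='moble'>555-555-5555</phone>
  </ksb:person>
</ksb:addressbook>
"""

root = ET.fromstring(DOC)

# ElementTree expands every prefix to "{namespace URI}local name".
print(root.tag)

# Prefixes used in queries are local to this mapping.
ns = {
    "k": "http://www.stobo.org/addressbook",
    "l": "http://www.lbl.gov/addressbook",
    "d": "http://www.w3.org/addressbook",   # the default (unprefixed) namespace
}
name = root.find("k:person/l:name", ns)
phone = root.find("k:person/d:phone", ns)
phone_type = phone.get("{http://www.stobo.org/addressbook}type")
print(name.text, phone.text, phone_type)
```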
11Some XML syntaxes
- XPath (XML Path Language)
- searches an XML document using a path-like string (more on this later)
- XPointer
- an addressing scheme for individual parts of an XML document; think <A NAME="foo"> in HTML
- XInclude
- include mechanism for XML documents
- XLink
- linking syntax for XML docs; think <A HREF="foo"> in HTML
13XPath - Intro
- The grep (or query language) of XML
- An XPath expression is applied to an XML document (or DOM node) and returns one of the following
- A collection of nodes
- A Boolean value
- A floating-point value
- A string
- Used primarily in transforms (XSLT) and XPointer
14XPath - Syntax
- XPath does not use XML syntax but its own query strings
- Simple example
    ./xp.py "person/phone" < addressbook.xml
    <Element Node Name='phone' with 1 attributes and 1 children>
    <Element Node Name='phone' with 1 attributes and 1 children>
    <Element Node Name='phone' with 1 attributes and 1 children>
    <Element Node Name='phone' with 1 attributes and 1 children>
- Not as simple example
    ./xp.py "/addressbook/person[name='Keith Jackson']/phone[@type='moble']/text()" < addressbook.xml
    555-555-5555
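The xp.py script itself is not shown in these slides. As a stand-in, the sketch below runs the same two queries with the limited XPath subset in Python's standard xml.etree.ElementTree, which is an assumption made for this example rather than the tool used above. That subset has no text() step, so the text is read from the matched element instead.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("""\
<addressbook>
  <person>
    <name>Keith Jackson</name>
    <phone type='moble'>555-555-5555</phone>
    <phone type='land'>999-999-9999</phone>
  </person>
  <person>
    <name>Andrew &amp; Lisa Wright</name>
    <phone type='land'>888-888-8888</phone>
    <phone type='moble'>444-444-4444</phone>
  </person>
</addressbook>
""")

# Simple path: every phone element under every person (relative to the root element).
all_phones = root.findall("person/phone")
print(len(all_phones))

# Path with predicates: Keith Jackson's 'moble' number.
match = root.find("person[name='Keith Jackson']/phone[@type='moble']")
print(match.text)
```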
15Well Known XML Schemas
- XML Schema: language to define other schemas
- XSLT (Extensible Stylesheet Language Transformations): XML-to-XML translation
- SOAP (Simple Object Access Protocol): RPC mechanism and serialization format (typically over HTTP)
- WSDL (Web Services Description Language): network services as endpoints operating on messages
16XML Schema
- Language to define other schemas
- http://www.w3.org/TR/xmlschema-0/
- Defines
- element, sequence, choice, attributes, complexType, simpleType
- Data types
- int, long, string, token, dateTime
- http://www.w3.org/TR/xmlschema-2/
17Example Schema
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
               targetNamespace="http://www-stobo.org/addressbook">
      <xs:element name="addressbook">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="person" type="personType"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
      <xs:complexType name="personType">
        <xs:sequence>
          <xs:element name="name" type="xs:string"/>
          <xs:element name="phone" minOccurs="0" maxOccurs="unbounded" type="phoneType"/>
          <xs:element name="address" maxOccurs="unbounded" type="xs:string"/>
          <xs:element name="password" minOccurs="0"/>
        </xs:sequence>
      </xs:complexType>
18Example Schema (cont)
      <xs:complexType name="phoneType">
        <xs:simpleContent>
          <xs:extension base="xs:string">
            <xs:attribute name="type" use="required">
              <xs:simpleType>
                <xs:restriction base="xs:string">
                  <xs:enumeration value="moble"/>
                  <xs:enumeration value="land"/>
                </xs:restriction>
              </xs:simpleType>
            </xs:attribute>
          </xs:extension>
        </xs:simpleContent>
      </xs:complexType>
    </xs:schema>
19XSL and XSLT
- XSL - Extensible Stylesheet Language
- a vocabulary for specifying formatting
- XSLT - XSL Transformations
- Used to map documents from one XML schema to another
- Source tree -> result tree
20SOAP
- Simple Object Access Protocol
- originally intended to implement RPC
- now used for any kind of XML message
- http://www.w3.org/TR/soap12-part0/ (also soap12-part1/ and soap12-part2/)
- Envelope
- Header, Body, Fault
- Body and Headers are defined as sequences of
anything
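A minimal sketch of the Envelope / Header / Body layout, built with Python's standard xml.etree.ElementTree. The SOAP namespace used here is the one from the final SOAP 1.2 Recommendation (earlier drafts used other URIs). The application namespace and the requestId, lookupPhone and name elements are hypothetical, chosen only to show that Header and Body carry application-defined content.

```python
import xml.etree.ElementTree as ET

# Namespace of the final SOAP 1.2 Recommendation.
SOAP_NS = "http://www.w3.org/2003/05/soap-envelope"
# Hypothetical application namespace and element names, for illustration only.
APP_NS = "http://www.example.org/addressbook"

ET.register_namespace("env", SOAP_NS)
ET.register_namespace("ab", APP_NS)

envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
header = ET.SubElement(envelope, f"{{{SOAP_NS}}}Header")
body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")

# Header and Body accept arbitrary application-defined child elements.
ET.SubElement(header, f"{{{APP_NS}}}requestId").text = "42"
lookup = ET.SubElement(body, f"{{{APP_NS}}}lookupPhone")
ET.SubElement(lookup, f"{{{APP_NS}}}name").text = "Keith Jackson"

message = ET.tostring(envelope, encoding="unicode")
print(message)
```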
21SOAP (cont)
- SOAP processing model
- assumes a distributed model where messages go from a sender node through some number of intermediary nodes to an ultimate receiver node
- The nodes can be addressed by role names and each can do some processing of the message
22WSDL
- Web Services Description Language
- http://www.w3.org/TR/wsdl12
- Describes the messages
- Operations - exchanges of messages
- PortTypes - collection of operations
- There is a binding document that describes SOAP,
HTTP and MIME bindings
23Resource Description Framework - RDF
- W3C Semantic Web Activity
- provides a model for metadata
- for use in knowledge representation systems
- describes the context of a document, so that documents can be searched for by content
- applies to non-text documents
24XML Security Languages
- Define vocabularies to express
- Authentication assertions or identity credentials
- Authorization assertions or licenses
- Attribute assertions - attribute credentials
- Enabling secure XML messages
- Queries and Responses about security questions
- Security policies - requirements of the resource
- Security contexts - information about the user
25XML Security Schemas
- XML Signature (W3C and IETF)
- Digital signatures for XML transactions
- XML Encryption (W3C)
- Encrypting data and representing the results in XML
- XKMS - key management (W3C)
- (MS, Verisign) delegate signature processing to a trust server on the WWW (for thin or mobile clients)
26XML Digital Signature
- Specifies XML digital signature processing rules and syntax
- Digital signatures provide
- Message integrity
- Message authentication
- Signer authentication
- Signed data can be within the XML that includes the signature or elsewhere
- Can sign all or part of a document
27XML Encryption
- Process for encrypting data and representing the result in XML
- Data can be
- arbitrary data (including an XML document)
- an XML element or element content
- An XML encryption element
- contains or references the cipher data
- specifies the encryption method
- specifies key info.
28XML Schema Reference Card
- Qualifier      URI
- xs             http://www.w3.org/2001/XMLSchema
- S              http://www.w3.org/2002/06/soap-envelope
- xsl            http://www.w3.org/1999/XSL/Format
- xslt           http://www.w3.org/1999/XSL/Transform
- Xpath          http://www.w3.org/TR/xpath
- dsig           http://www.w3.org/2000/09/xmldsig#
- xenc           http://www.w3.org/2001/04/xmlenc#
- xkms           http://www.xkms.org/schema/xkms-2001-01-20
- saml           urn:oasis:names:tc:SAML:1.0:assertion
- samlp          urn:oasis:names:tc:SAML:1.0:protocol
- xacml          urn:oasis:names:tc:xacml:1.0:policy
- xacml-context  urn:oasis:names:tc:xacml:1.0:context
29DOM - Document Object Model (W3C recommendation)
- DOM Structure
- Document, element, entity reference, text...
- Nodes, NodeList, NamedNodeMap
- Fundamental Interfaces
- DOMImplementation: createDocument
- Document: createAttribute, createElement, getElementsByTagName
- Node: removeChild
- NamedNodeMap: getNamedItem
30DOM Parsers
- Parse the entire document into a DOM tree
- Provide functions to examine pieces of the tree
- Provide a createDocument interface for generating an XML document from a DOM tree
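The three steps above (parse, examine, generate) can be sketched with Python's standard xml.dom.minidom, chosen here as an assumption since the slide does not name a specific DOM parser. The sample data and variable names are illustrative.

```python
from xml.dom.minidom import getDOMImplementation, parseString

# Parse: the whole document is read into an in-memory tree.
doc = parseString(
    "<addressbook><person><name>Keith Jackson</name>"
    "<phone type='moble'>555-555-5555</phone></person></addressbook>"
)

# Examine: walk pieces of the tree through the DOM interfaces.
phone = doc.getElementsByTagName("phone")[0]
print(phone.getAttribute("type"), phone.firstChild.data)

# Generate: build a new document through createDocument and serialize it.
new_doc = getDOMImplementation().createDocument(None, "addressbook", None)
person = new_doc.createElement("person")
name = new_doc.createElement("name")
name.appendChild(new_doc.createTextNode("Mary Thompson"))
person.appendChild(name)
new_doc.documentElement.appendChild(person)
print(new_doc.documentElement.toxml())
```

Because the whole tree lives in memory, this style suits small and medium documents that need random access or modification.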
31Simple API for XML (SAX)
- Started as a Java event-based parser for XML
- Reports parsing events through callbacks
- startElement(uri, localName, qName, Attributes)
- Attributes: getName, getType, getValue
- characters
- http://www.saxproject.org/ (distributed through SourceForge)
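A sketch of the callback style using Python's standard xml.sax module. Its non-namespace startElement callback takes (name, attrs), which is simpler than the Java signature listed above. The PhoneHandler class and its fields are made up for this example.

```python
import xml.sax

class PhoneHandler(xml.sax.ContentHandler):
    """Collect (type, number) pairs from phone elements as events arrive."""

    def __init__(self):
        super().__init__()
        self.phones = []
        self._type = None      # type attribute of the phone element being read
        self._chunks = []

    def startElement(self, name, attrs):
        if name == "phone":
            self._type = attrs.getValue("type")
            self._chunks = []

    def characters(self, content):
        # Text may be delivered in several pieces, so accumulate it.
        if self._type is not None:
            self._chunks.append(content)

    def endElement(self, name):
        if name == "phone":
            self.phones.append((self._type, "".join(self._chunks)))
            self._type = None

handler = PhoneHandler()
xml.sax.parseString(
    b"<person><name>Keith Jackson</name>"
    b"<phone type='moble'>555-555-5555</phone>"
    b"<phone type='land'>999-999-9999</phone></person>",
    handler,
)
print(handler.phones)
```

Unlike a DOM parser, nothing is retained unless the handler stores it, so memory use stays flat for large documents.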
32Parsers
- Xerces - Apache
- Java, C++; Perl and COM wrappers
- Does both DOM and SAX parsing
- Validates
- Java only
- IBM XML4J, Microsoft, Oracle, Netscape
- Sun's JAXP (DOM and SAX)
- C
- Expat - C (non-validating)
33Java Tools
- J2SE (since 1.4), previously JAXP 1.0
- SAX and DOM interfaces
- Interface for XSLT (generic, SAX, DOM, streams)
- Java WSDP
- JAXP 1.2.2: parsers and XSLT
- JAXB: schema -> Java classes
- JAXR: access to XML registries
- JAX-RPC: SOAP RPC
- SAAJ: SOAP with attachments
34Java Tools
- Apache
- Xerces: SAX and DOM parser
- Xalan: XSLT and XPath
- Xindice: native XML database
- IBM
- XML Processing Plus Plus: stream-based APIs
- XML Parser for Java
- X-IT: batch processing and transformation of XML files
35Python Tools
- Python Standard Library
- provides both a SAX and DOM interface
- interface to the Expat parser (Expat is a stream and callback based parser similar to SAX)
- PyXML: a richer set of XML tools
- fuller-featured DOM implementations
- adapters to allow various parsers to use the SAX interface
- XPath expression parsing and XSLT transform tools
- link: http://pyxml.sourceforge.net/
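For the standard library's Expat interface mentioned above, a minimal sketch using xml.parsers.expat. The handler function names and the events list are assumptions made for this example. Setting buffer_text asks the parser to deliver contiguous text in one callback where it can.

```python
import xml.parsers.expat

events = []

def start_element(name, attrs):
    events.append(("start", name, attrs))

def end_element(name):
    events.append(("end", name))

def char_data(data):
    if data.strip():           # skip whitespace-only text
        events.append(("text", data))

parser = xml.parsers.expat.ParserCreate()
parser.buffer_text = True
parser.StartElementHandler = start_element
parser.EndElementHandler = end_element
parser.CharacterDataHandler = char_data

# The second argument marks this as the final chunk of input.
parser.Parse("<phone type='moble'>555-555-5555</phone>", True)
print(events)
```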
36More Python Tools
- 4Suite
- even more SAX/DOM implementations
- plus XPath, XPointer, XInclude, XLink
- RDF support including an XML/RDF data repository and server (data access, query, transformation, etc.)
- link: http://www.4suite.org/
37Python SOAP Support
- SOAPy - WSDL/Web Service oriented SOAP library
- Provides a stand-alone http-like server to run your SOAP services
- link: http://soapy.sourceforge.net/
- ZSI (Zolera SOAP Infrastructure)
- offers robust support for strict typing
- conversion between native Python datatypes and SOAP syntax
- provides a stand-alone http-like server to run your SOAP services
- link: http://pywebsvcs.sourceforge.net/
38Editing Tools
- PSGML major mode for Emacs
- Navigation, colorization, validation and other editing functions
- Supports loading of DTDs, maintaining a DTD library, and some editing based on the loaded DTD
- Done in a standard Emacs-like text view
- http://sourceforge.net/projects/psgml
- GUI XML editors
- Offer a tree view of the XML document with text-box editing fields
- Marginal utility