Title: An XML Tutorial
1 An XML Tutorial
Hannes Marais, Systems Research Center, Palo Alto, California
marais_at_pa.dec.com
2 What is XML?
Extensible Markup Language (XML) is a W3C proposed recommendation for a file format to easily and cheaply distribute electronic documents on the World Wide Web.
3 Example Documents
- Books
- User manuals
- Product catalogs
- Order forms
- Medical documents
- Tax forms
- Mathematical formulas
- Chemical formulas
- Drug descriptions
- Dictionaries
- Newspapers
- Style sheets
- Musical scores
- Library indices
- Protein sequences
- Bibliographies
- Database schemas
4 Example E-Mail Document in XML
    <?XML VERSION="1.0"?>
    <!-- This is a sample email data file -->
    <!DOCTYPE mail SYSTEM "email.dtd" [
      <!ENTITY ingo "Ingo.Macherius_at_tu-clausthal.de">
      <!ENTITY henning "hb_at_ix.heise.de">
    ]>
    <mail>
      <Recipient>&henning;</Recipient>
      <Sender>&ingo;</Sender>
      <Date>Mon, 21 Apr 1997 09:27:55 +0200</Date>
      <Subject>XML literature</Subject>
      <Textbody>
        <p>Hello Mr <Name>Behme</Name>,</p>
        <p>Please read <Name>Jon Bosak</Name>'s introductory text</p>
        <p>"SGML, Java and the Future of the Web"</p>
        <p>Best wishes,</p>
        <p><Name>Ingo Macherius</Name></p>
      </Textbody>
    </mail>
5 XML Features in a Nutshell
- Extensibility
  - HTML with user-defined tags
  - Can be used in any domain
- Structure
  - Can represent trees and graph structures (database schemas, OO hierarchies, ...)
- Validation
  - Consuming applications can check for structural validity on importation
6 XML Development Timeline
- 1986: Standard Generalized Markup Language (SGML), ISO 8879:1986
- Nov 1995: HTML 2.0
- Nov 1996: Simplified / stripped-down SGML draft (dubbed XML)
- Jan 1997: HTML 3.2
- Aug 1997: XML Working Draft
- Dec 1997: XML 1.0 Proposed Recommendation; HTML 4.0 Recommendation
7 Architectural Dependencies
(Layered diagram, top to bottom: Instances / Domains; then the markup languages RDF, CDF, CML, ... and HTML, ...; then XML; then SGML at the base.)
8 Overview of the Tutorial
- W3C Design Goals of XML
- The XML Format
- Example Applications
- Conclusions
9 XML shall ...
- Be usable over the Internet
- Support a wide variety of applications
- Be SGML compatible
- Be easy to write
- Be easy to process by program
- Have no optional features
- Be human-legible and clear
- Be designed quickly
- Have a formal and concise design
10 XML Markup Overview
- XML documents contain
  - Character Data
  - Comments & Escaped content
  - Processing instructions
  - Elements
  - Document Type Definition Markup
11 Character Data
- Unicode (ISO 10646) characters without markup
- Example
    <p>Hello Mr <Name>Behme</Name>,</p>
    <p>Please read <Name>Jon Bosak</Name>'s introductory text</p>
    <p>"SGML, Java and the Future of the Web"</p>
    <p>Best wishes,</p>
    <p><Name>Ingo Macherius</Name></p>
12 Comments & Escaped Content
- Comments are ignored during processing; text inside a CDATA section is passed through without being interpreted as markup
- Example
    <!-- This is a sample email data file -->
    <![CDATA[ any markup here ]]>
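The difference between the two constructs can be seen with a short sketch. It assumes Python's standard `xml.etree.ElementTree` parser as the consuming application; the `note` element and its text are made up for illustration.

```python
import xml.etree.ElementTree as ET

# One comment and one CDATA section inside a hypothetical <note> element.
doc = "<note><!-- reviewer remark --><![CDATA[<b>not a tag</b> & more]]></note>"

note = ET.fromstring(doc)

# The default parser drops the comment. The CDATA content arrives as plain
# character data: "<b>" is text here, not a child element.
print(note.text)
print(len(note))  # number of child elements
```

The escaped text comes back verbatim and the element has no children, which is the point of the CDATA escape.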
13 Processing Instructions
- Special instructions to the XML consumer application
- Example
    <?XML VERSION="1.0" ?>
14 Elements
- Consists of a start tag, body, and end tag
- Body can be any other markup (even nested)
- Examples
    <p>Best wishes,</p>
    <p><Name>Ingo Macherius</Name></p>
- Empty element shorthand
    <p></p>
    <p/>
15 Attributes
- Optional (Attribute, Value) pairs associated with elements
- Example
    <person firstname="Jon" surname="Bosak">
      <address>Sun Microsystems</address>
      <e-mail>bosak_at_sun.com</e-mail>
    </person>
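As a rough illustration of how a consuming program sees attributes versus element content, the sketch below reads a person record with Python's standard `xml.etree.ElementTree` module. The choice of parser is an assumption; the tutorial does not prescribe one.

```python
import xml.etree.ElementTree as ET

doc = """
<person firstname="Jon" surname="Bosak">
  <address>Sun Microsystems</address>
  <e-mail>bosak_at_sun.com</e-mail>
</person>
"""

person = ET.fromstring(doc)

# Attributes are exposed as a name -> value mapping on the element.
print(person.attrib)
print(person.get("surname"))

# Attributes are optional, so a lookup may need a fallback value.
print(person.get("title", "n/a"))

# Nested elements are reached by tag name, not through the attribute mapping.
print(person.findtext("e-mail"))
```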
16 Doc Type Declarations
- Identifies the XML instance being used with a reference to the Document Type Definition (DTD)
- Example
    <!DOCTYPE mail SYSTEM "email.dtd" [
      <!ENTITY ingo "Ingo.Macherius_at_tu-clausthal.de">
      <!ENTITY henning "hb_at_ix.heise.de">
    ]>
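The entities declared inside the document type declaration act as named text substitutions. A minimal sketch, assuming Python's standard `xml.etree.ElementTree` parser and leaving out the external `email.dtd` reference so that the snippet is self-contained:

```python
import xml.etree.ElementTree as ET

# Only the internal declarations are kept; no external DTD is loaded here.
doc = """<!DOCTYPE mail [
  <!ENTITY ingo "Ingo.Macherius_at_tu-clausthal.de">
  <!ENTITY henning "hb_at_ix.heise.de">
]>
<mail>
  <Recipient>&henning;</Recipient>
  <Sender>&ingo;</Sender>
</mail>"""

mail = ET.fromstring(doc)

# Each entity reference is replaced by its declared text during parsing.
print(mail.findtext("Recipient"))
print(mail.findtext("Sender"))
```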
17 Document Type Definition (DTD)
- Identifies the syntax of the XML flavor being used, e.g. CDF, RDF, CML, ...
- Meta-information about the document contents
  - Valid element names
  - Valid attribute names and values
  - How elements can nest in each other
- Typically the DTD is stored in a separate document
- DTD does not say anything about document semantics
18 Well-formed vs. Valid Documents
- Well-formed
  - Conforms to the basic XML syntax
  - Can be parsed without regard to the DTD
- Valid
  - Well-formed
  - Conforms to its DTD
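A well-formedness check needs nothing but a parser. The sketch below assumes Python's standard `xml.etree.ElementTree`, which does not validate against a DTD, so it covers only the first half of the distinction; `is_well_formed` is a helper name chosen for this example.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the text parses as XML, without consulting any DTD."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Properly nested tags parse; overlapping tags do not.
print(is_well_formed("<p>Best wishes, <Name>Ingo</Name></p>"))
print(is_well_formed("<p>Best wishes, <Name>Ingo</p></Name>"))
```

Checking validity would additionally require a validating parser that reads the DTD.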
19 DTD Element Declarations
- Specifies a valid element and its valid contents (what can be nested inside the element?)
- Uses regular expressions to define valid contents
- Examples
    <!ELEMENT br EMPTY>    <!-- empty element -->
    <!ELEMENT p ANY>       <!-- allows everything -->
    <!ELEMENT mail (subject | from | to | textbody)*>
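To make the "regular expression" reading of a content model concrete, here is a minimal sketch that translates a model into a Python regex over the sequence of child tag names. The helper names and the sample model are assumptions made for this example, and the translation handles only names, `,`, `|`, `?`, `*`, `+` and parentheses (not `EMPTY`, `ANY` or `#PCDATA`).

```python
import re
import xml.etree.ElementTree as ET

NAME = re.compile(r"[A-Za-z_][\w.-]*")

def model_to_regex(model):
    """Turn a DTD-style content model into a regex over child tag names."""
    body = re.sub(r"\s+", "", model)     # drop whitespace
    body = body.replace("(", "(?:")      # groups do not need to capture
    # Each element name becomes a token that ends with a single space.
    body = NAME.sub(lambda m: "(?:" + re.escape(m.group(0)) + " )", body)
    body = body.replace(",", "")         # a sequence is plain concatenation
    return re.compile(body)

def children_match(elem, model):
    """Check the direct children of elem against the content model."""
    tags = "".join(child.tag + " " for child in elem)
    return model_to_regex(model).fullmatch(tags) is not None

# Hypothetical content model, not taken from the tutorial's email.dtd.
MODEL = "(subject, from, to+, textbody)"

good = ET.fromstring("<mail><subject/><from/><to/><to/><textbody/></mail>")
bad = ET.fromstring("<mail><from/><subject/><to/><textbody/></mail>")

print(children_match(good, MODEL))
print(children_match(bad, MODEL))
```

The first document matches because `to+` allows repeated recipients; the second fails because `from` appears before `subject`.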
20 DTD Attribute List Declarations
- Defines allowed attribute names and values of an element
- Example
    <!ATTLIST list
        listtype  (bullets|ordered|glossary)  "glossary"
        name      CDATA                       #REQUIRED >

        Name      Type                        Default value
21 DTD: Some Attribute Types
- CDATA - Any value
- ID - Unique identifier for the XML element
- IDREF - Reference to an element with a specific ID
- IDREFS - Sequence of IDREFs
- ...
22 Quick review
- Comments & Escaped content
  - <!-- --> <![CDATA[ ]]>
- Processing instructions
  - <?XML ?>
- Elements
  - <A href="http://www.w3.org">W3C</A>
- Document Type Definition Markup
  - DOCTYPE, ELEMENT, ATTLIST
23 Convention: Data Structures
    <node name="x">
      <node name="a"> This is a. </node>
      <node name="b"/>
    </node>
    <node id="node101"> This is 101.</node>
    <start ref="node101"/>
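Nesting gives a tree directly; graph edges are expressed by pairing an `id` with a `ref` that a program resolves after parsing. A small sketch of that lookup, assuming Python's `xml.etree.ElementTree`; the `graph` wrapper element and the second node are added here only to make a complete document.

```python
import xml.etree.ElementTree as ET

doc = """
<graph>
  <node id="node101">This is 101.</node>
  <node id="node102">This is 102.</node>
  <start ref="node101"/>
</graph>
"""

root = ET.fromstring(doc)

# Index every element that carries an id attribute.
by_id = {el.get("id"): el for el in root.iter() if el.get("id") is not None}

# Follow the reference from <start> to the node it points at.
start = root.find("start")
target = by_id[start.get("ref")]
print(target.text)
```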
24 Convention: Link identification
- Convention for identifying and displaying links (URLs)
- Tells how to display links
  - Embed, replace page, new window
- Examples
    <A XML-LINK="SIMPLE" HREF="http://www.w3.org" SHOW="EMBED">W3C</A>
    <LINKSET XML-LINK="EXTENDED">
      <LINK XML-LINK="LOCATOR" HREF="..."/>
      <LINK XML-LINK="LOCATOR" HREF="..."/>
    </LINKSET>
25 Convention: Extended Pointers (XPointers)
- String that identifies a specific element in a document
- Can be used wherever URLs are
- XPointer traces a relative path through the XML parse tree
- Expressible relationships
  - Child, parent, descendant, ancestor, preceding, following, sibling, ...
- Example
    CHILD(3, DIV1)CHILD(4, DIV2)CHILD(29, P)
  means the 29th paragraph (P) of the 4th subdivision (DIV2) of the 3rd division (DIV1)
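The path-tracing idea can be sketched in a few lines. The resolver below handles only the CHILD(n, TAG) steps shown in the example and reads each step as "the n-th child with that tag", as the slide describes; it is an illustration, not an implementation of the XPointer draft, and `resolve` is a name invented here.

```python
import re
import xml.etree.ElementTree as ET

STEP = re.compile(r"CHILD\(\s*(\d+)\s*,\s*([\w.-]+)\s*\)")

def resolve(root, pointer):
    """Follow each CHILD(n, TAG) step; return None if a step has no match."""
    node = root
    for count, tag in STEP.findall(pointer):
        same_tag = [child for child in node if child.tag == tag]
        index = int(count) - 1          # CHILD counts from 1
        if index < 0 or index >= len(same_tag):
            return None
        node = same_tag[index]
    return node

# Small made-up document: the second DIV1 holds one DIV2 with three paragraphs.
doc = ET.fromstring(
    "<DOC><DIV1/><DIV1><DIV2><P>a</P><P>b</P><P>c</P></DIV2></DIV1></DOC>"
)

hit = resolve(doc, "CHILD(2, DIV1)CHILD(1, DIV2)CHILD(3, P)")
print(hit.text)
print(resolve(doc, "CHILD(3, DIV1)"))
```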
26 Applications of XML
- Channel Definition Format (CDF)
- XML/EDI
- Extensible Style Language (XSL)
27 Example: Channel Definition Format
- Domain: Internet Push Technology
- Defined by Microsoft
- PointCast is a major user
- CDF file on a web site refers to a set of newspaper articles (in HTML) called a channel
- Client periodically fetches CDF file from server, then fetches newspaper articles described in CDF file
- Note: Open Software Description (OSD) from Microsoft does the same for application distribution
28 CDF Example Document
    <CHANNEL Title="AUFORA"
             LongName="AUFORA News"
             Abstract="AUFORA offers the latest UFO and astronomy news. We do not subscribe to the sensationalism which plagues UFOlogy today, rather we investigate in an unbiased, objective, and scientific manner."
             InfoURI="http://www.aufora.org/"
             SELF="http://www.aufora.org/misc/pc.cdf"
             ContentID="11037"
             Frequency="24"
             Authenticate="No">
      <ITEM Title="Mysterious happenings in Australian Outback"
            HREF="http://www.aufora.org/news/16.html"
            Type="HTML"
            Show="Channel"
            Precache="Yes"
            Authenticate="No">
      </ITEM>
      ...
    </CHANNEL>
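The client side of this exchange comes down to parsing the channel file and collecting the article addresses to fetch. The sketch below shows only that parsing step on a shortened copy of the channel; the function name is invented for this example and no network access is performed.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the channel description, with one item.
cdf_text = """
<CHANNEL Title="AUFORA" SELF="http://www.aufora.org/misc/pc.cdf" Frequency="24">
  <ITEM Title="Mysterious happenings in Australian Outback"
        HREF="http://www.aufora.org/news/16.html"
        Type="HTML" Precache="Yes"/>
</CHANNEL>
"""

def items_to_fetch(text):
    """Return (title, url) pairs for every ITEM in a channel description."""
    channel = ET.fromstring(text)
    return [(item.get("Title"), item.get("HREF")) for item in channel.iter("ITEM")]

for title, url in items_to_fetch(cdf_text):
    print(title, "->", url)
```

A real client would re-read the file named by `SELF` on its own schedule and then download each listed URL.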
29 Example: XML/EDI
- Domain: Inter-business Electronic Commerce, interoperation of XML and EDI applications
- Maps XML <=> EDIFACT messages
- XML message size = EDIFACT message size x 1.35
- Mapping established purely with DTDs
- An example: EDItEUR Book Ordering Messages (book trade distribution, book supply to libraries, new serial subscriptions, subscription renewals, despatch, claims), by the European Group for Electronic Commerce in the Books and Serial Sectors
30 EDItEUR Message in XML
    <!DOCTYPE Book-Order PUBLIC "-//EDItEUR//DTD Book Order Message//EN">
    <Book-Order Supplier="4012345000094" Send-to="http://www.bic.org/order.in">
      <title>EDItEUR Lite-EDI Book Ordering</title>
      <Order-No>967634</Order-No>
      <Message-Date>19961002</Message-Date>
      <Buyer-EAN>5412345000176</Buyer-EAN>
      <Order-Line Reference-No="0528837">
        <ISBN>0316907235</ISBN>
        <Author-Title>Labaln, Brian/Chrome</Author-Title>
      </Order-Line>
    </Book-Order>
31 EDItEUR DTD
    <!ELEMENT Book-Order (title?, Order-No, Message-Date, Buyer-EAN, Order-Line) >
    <!ATTLIST Book-Order
        EDI-Prefix  CDATA   #FIXED "UNHME00579ORDERSD93AUNEAN007"
        EDI-Suffix  CDATA   #FIXED "UNSS'CNT22'UNT18ME00579"
        Send-to     CDATA   #REQUIRED
        Supplier    CDATA   #REQUIRED >
    <!ELEMENT Order-No (#PCDATA) >
    <!ATTLIST Order-No
        EDI-Prefix  CDATA   #FIXED "BGM220"
        Datatype    NAME    #FIXED "C8"
        Size        NUMBER  #FIXED "8"
        Title       CDATA   "Book Order No" >
32 Extensible Style Language (XSL)
- Domain: Publishing
- Proposed by Microsoft & Co.
- Specifies how XML is to be presented: maps XML files to HTML files
- Mapping specification written in XML itself (!)
- Rule-based mappings. Example: map <title>s occurring in <div1>s to <H2>
- Future browsers will have XSL support built in: XML + XSL + HTML4 = industrial-strength publication
- Plugin for XSL available since January 1998 (MXSL)
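The rule-based idea can be imitated in a few lines of ordinary code. The sketch below is not XSL syntax: it uses a Python dictionary as a stand-in for the rule set, and the rule table, function name and sample document are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Each rule maps (parent tag, element tag) in the source to an HTML tag.
RULES = {
    ("div1", "title"): "H2",   # the rule from the example above
    ("div1", "p"): "P",        # extra hypothetical rule
}

def to_html(source):
    """Walk the source tree and emit one HTML element per matching rule."""
    body = ET.Element("BODY")

    def walk(elem, parent_tag):
        html_tag = RULES.get((parent_tag, elem.tag))
        if html_tag is not None:
            ET.SubElement(body, html_tag).text = (elem.text or "").strip()
        for child in elem:
            walk(child, elem.tag)

    walk(source, None)
    return body

doc = ET.fromstring(
    "<doc><title>Top</title><div1><title>Intro</title><p>Hello</p></div1></doc>"
)
html = ET.tostring(to_html(doc), encoding="unicode")
print(html)
```

The top-level `<title>` produces no output because no rule matches a title whose parent is `<doc>`; only the title inside `<div1>` becomes an `<H2>`.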
33 The book order displayed using XSL
34 Example: Web Bookstore
(Diagram. Components: Bookstore Database, Web Server, Bookstore Client, two Publishers, AltaVista. Message labels: "Order this book", "Browse", "Let me index all your books", "Where can I find the cheapest book?", "Reorder books", "I publish ...". Data formats on the links: XML; HTML or XML + XSL.)
35 And some more applications ...
- Web Collections
- Meta Content Framework
- XML-Data
- Name Spaces in XML
- Chemical Markup Language
- Bioinformatic Sequence Markup Language (BSML)
- Open Financial Exchange
- Open Trading Protocol (OTP)
- Encoded Archival Description (EAD)
- Translation Memory Exchange (TMX)
- Scripting News in XML
- Tutorial Markup Language (TML)
- Mathematical Markup Language
- OpenTag Markup
- Metadata PICS
- Synchronized Multimedia Integration Language (SMIL)
- Web Interface Definition Language (WIDL)
- Information and Content Exchange (ICE)
- Ontology and Conceptual Knowledge Markup Languages
- Cold Fusion Markup Language (CFML)
- Java Speech Markup Language (JSML)
- Resource Description Framework (RDF)
Source: www.sil.org
36 Conclusions
- With XML the Internet will move towards distributed document processing in a big way
- Biggest opportunity for inter-business e-commerce
  - Newcomers to Internet-based e-commerce will probably skip EDI completely and go for the cheaper XML-based approach
  - Less than 80K of 6.2M U.S. businesses use an EDI system (2)
  - Only an estimated 125K organizations world-wide use EDI
  - The cost and complexity of EDI make it an insurmountable obstacle for small to medium-sized businesses
- Protocols for next-generation Internet businesses
  - HTTP
  - SSL / Authentication
  - XML
  - HTML
37 Software Opportunities
- XML export & import from/to databases
- Mapping XML to legacy formats & back
- Mapping between different XML DTDs
- DTDs and tools for specific application areas (for example, healthcare, product catalogs, enterprise information repositories, supply chain integration)
- Tools and languages for processing XML
  - WIDL, WebL, ...
38 Further reading
World Wide Web Consortium (W3C), http://www.w3.org/XML