Title: Extensible Markup Language XML
1 Extensible Markup Language (XML)
3 Introduction
- Extensible Markup Language (XML) was developed in 1996 by the World Wide Web Consortium's (W3C's) XML Working Group.
- XML is a portable, widely supported, open technology for describing data.
- It is a standard for storing data that is exchanged between applications.
- It is both human-readable and machine-readable.
- Examples
- Mathematical formulae
- Software configuration instructions
- Music
- Recipes
- Financial reports
4 XML Documents
<?xml version = "1.0"?>

<!-- Fig. 18.1: article.xml -->
<!-- Article structured with XML -->

<article>
   <title>Simple XML</title>
   <date>December 6, 2001</date>

   <author>
      <firstName>Tem</firstName>
      <lastName>Nieto</lastName>
   </author>

   <summary>XML is pretty easy.</summary>

   <content>In this chapter, we present a wide variety of examples
      that use XML.
   </content>
</article>
- What to look for
- XML declaration (line 1)
- Version information parameter
- Comments
- Tags (< >)
- Start tag, end tag
- Elements
- Root element
- Element hierarchies
- Filename extension .xml
- XML parsers: msxml, Xerces
- Style sheet (next slide)
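The points above (root element, element hierarchy, and the role of an XML parser) can be sketched with a few lines of code. This is a minimal illustration that assumes Python's standard `xml.etree.ElementTree` parser in place of msxml or Xerces; the helper name `outline` is hypothetical and not part of the slides.

```python
import xml.etree.ElementTree as ET

# article.xml from the slide, kept in a string so the sketch is self-contained.
ARTICLE_XML = """<?xml version = "1.0"?>
<!-- Fig. 18.1: article.xml -->
<article>
   <title>Simple XML</title>
   <date>December 6, 2001</date>
   <author>
      <firstName>Tem</firstName>
      <lastName>Nieto</lastName>
   </author>
   <summary>XML is pretty easy.</summary>
   <content>In this chapter, we present a wide variety of examples
      that use XML.
   </content>
</article>"""

# The parser checks well-formedness and returns the root element.
root = ET.fromstring(ARTICLE_XML)


def outline(element, depth=0):
    """Return one indented line per element, mirroring the element hierarchy."""
    lines = ["   " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


print("Root element:", root.tag)
print("Children of the root:", [child.tag for child in root])
print("Author last name:", root.findtext("author/lastName"))
print("\n".join(outline(root)))
```

The comment in the document is skipped by the default parser, so only elements appear in the outline.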
5 XML Documents
- An XML document style sheet
- What to look for
- Container elements are marked by + and - signs.
- A minus sign indicates that the container (parent) element's child elements are being displayed.
- Clicking the minus sign collapses the display.
- XML data have a tree structure made up of nodes.
6 XML Documents
<?xml version = "1.0"?>

<!-- Fig. 18.3: letter.xml -->
<!-- Business letter formatted with XML -->

<letter>
   <contact type = "from">
      <name>Jane Doe</name>
      <address1>Box 12345</address1>
      <address2>15 Any Ave.</address2>
      <city>Othertown</city>
      <state>Otherstate</state>
      <zip>67890</zip>
      <phone>555-4321</phone>
      <flag gender = "F" />
   </contact>

   <contact type = "to">
      <name>John Doe</name>
      <address1>123 Main St.</address1>
      <address2></address2>
      <city>Anytown</city>
      <state>Anystate</state>
      <zip>12345</zip>
      <phone>555-1234</phone>
      <flag gender = "M" />
   </contact>

   <salutation>Dear Sir</salutation>

   <paragraph>It is our privilege to inform you about our new
      database managed with <technology>XML</technology>. This
      new system allows you to reduce the load on your
      inventory list server by having the client machine
      perform the work of sorting and filtering the data.
   </paragraph>

   <paragraph>Please visit our Web site for availability
      and pricing.
   </paragraph>

   <closing>Sincerely</closing>
   <signature>Ms. Doe</signature>
</letter>
- What to look for
- 1. Root element: letter
- 2. Child elements
- contact
- salutation
- paragraph
- closing
- signature
- 3. Placement of data in attributes
- <contact type = "from">
- 4. Empty elements
- <flag gender = "F"></flag>, which can also be written <flag gender = "F" />
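Reading attribute data and recognizing empty elements can be sketched as follows. The example assumes Python's standard `xml.etree.ElementTree` module and uses a shortened fragment of letter.xml; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Shortened fragment of letter.xml; both spellings of an empty element are used.
LETTER_FRAGMENT = """<letter>
   <contact type = "from">
      <name>Jane Doe</name>
      <flag gender = "F" />
   </contact>
   <contact type = "to">
      <name>John Doe</name>
      <flag gender = "M"></flag>
   </contact>
</letter>"""

letter = ET.fromstring(LETTER_FRAGMENT)

# Data placed in attributes is read with get(); element content with findtext().
contacts = {c.get("type"): c.findtext("name") for c in letter.findall("contact")}
genders = [c.find("flag").get("gender") for c in letter.findall("contact")]

# An empty element has no character data and no child elements,
# whichever of the two notations was used to write it.
flags_empty = [flag.text is None and len(flag) == 0 for flag in letter.iter("flag")]

print(contacts)
print(genders)
print(flags_empty)
```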
7 XML Documents
8 XML Namespaces
- Object-oriented programming languages such as C# provide large class libraries that group their features into namespaces. Namespaces prevent naming collisions between programmer-defined identifiers and class library identifiers.
- XML provides namespaces, which are a means of uniquely identifying XML elements.
- XML-based languages, called vocabularies, such as XML Schema, Extensible Stylesheet Language, and BizTalk use namespaces to identify their elements.
- Elements are differentiated via namespace prefixes, which identify the namespace to which an element belongs: <bank_of_america:check>Amount</bank_of_america:check>
<?xml version = "1.0"?>

<!-- Fig. 18.4: namespace.xml -->
<!-- Demonstrating namespaces -->

<text:directory xmlns:text = "urn:deitel:textInfo"
   xmlns:image = "urn:deitel:imageInfo">

   <text:file filename = "book.xml">
      <text:description>A book list</text:description>
   </text:file>

   <image:file filename = "funny.jpg">
      <image:description>A funny picture</image:description>
      <image:size width = "200" height = "100"/>
   </image:file>

</text:directory>
9 XML Namespaces
- What to look for
- Attribute xmlns creates two namespace prefixes: text and image.
- Each namespace prefix is bound to a URI.
- Document authors must ensure uniqueness.
- Use URLs for URIs.
- Specify default namespaces (next slide).
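How a parser uses the prefix-to-URI binding can be sketched with Python's standard `xml.etree.ElementTree` module (an assumption; the slides do not show parser code for this step). The parser replaces each prefix with its URI, so two elements both named file stay distinct.

```python
import xml.etree.ElementTree as ET

# namespace.xml from the slide.
NAMESPACE_XML = """<text:directory xmlns:text = "urn:deitel:textInfo"
   xmlns:image = "urn:deitel:imageInfo">
   <text:file filename = "book.xml">
      <text:description>A book list</text:description>
   </text:file>
   <image:file filename = "funny.jpg">
      <image:description>A funny picture</image:description>
      <image:size width = "200" height = "100"/>
   </image:file>
</text:directory>"""

directory = ET.fromstring(NAMESPACE_XML)

# Prefixes used in queries are mapped to URIs by this dictionary; the prefix
# names here need not match the ones in the document, only the URIs must.
NS = {"text": "urn:deitel:textInfo", "image": "urn:deitel:imageInfo"}

text_file = directory.find("text:file", NS)
image_file = directory.find("image:file", NS)

print(directory.tag)                 # tag is stored as {URI}localname
print(text_file.get("filename"))
print(image_file.get("filename"))
print(image_file.find("image:size", NS).attrib)
```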
10 XML Namespaces
<?xml version = "1.0"?>

<!-- Fig. 18.5: defaultnamespace.xml -->
<!-- Using default namespaces -->

<directory xmlns = "urn:deitel:textInfo"
   xmlns:image = "urn:deitel:imageInfo">

   <file filename = "book.xml">
      <description>A book list</description>
   </file>

   <image:file filename = "funny.jpg">
      <image:description>A funny picture</image:description>
      <image:size width = "200" height = "100"/>
   </image:file>

</directory>
- What to look for
- <directory> declares a default namespace using attribute xmlns with a URI as its value.
- Child elements need not be qualified by a namespace prefix.
- Element file is in the namespace urn:deitel:textInfo.
- Compare with the preceding code, where file and description were prefixed with text.
- Notice that the default namespace is overridden with the image prefix.
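The effect of a default namespace can be checked by listing the namespace each element ends up in. This sketch again assumes Python's standard `xml.etree.ElementTree` module; `split_tag` is a hypothetical helper written for this illustration.

```python
import xml.etree.ElementTree as ET

# defaultnamespace.xml from the slide.
DEFAULT_NS_XML = """<directory xmlns = "urn:deitel:textInfo"
   xmlns:image = "urn:deitel:imageInfo">
   <file filename = "book.xml">
      <description>A book list</description>
   </file>
   <image:file filename = "funny.jpg">
      <image:description>A funny picture</image:description>
      <image:size width = "200" height = "100"/>
   </image:file>
</directory>"""


def split_tag(tag):
    """Split an ElementTree tag of the form {uri}local into (uri, local)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return "", tag


root = ET.fromstring(DEFAULT_NS_XML)
namespaces = [split_tag(element.tag) for element in root.iter()]

for uri, local in namespaces:
    print(f"{local:12} {uri}")
```

Unprefixed elements inherit urn:deitel:textInfo from the default namespace, while elements carrying the image prefix are placed in urn:deitel:imageInfo.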
11 Document Object Model (DOM)
- XML documents are text files, so data can be retrieved from them using file I/O.
- XML parsers store document data as tree structures in memory.
- This hierarchical tree structure is called a Document Object Model (DOM) tree; it is created by a DOM parser.
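- Parent nodes
- Child nodes
- Sibling nodes
- Ancestor nodes
- Root node
- C#: System.Xml namespace
[Figure: DOM tree for article.xml. Root node article has children title, date, author, summary, and contents; author has children firstName and lastName.]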
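The node relationships listed above (parent, child, sibling, ancestor, root) can be explored on a small DOM tree. The slides use C# and System.Xml; this sketch assumes Python's standard `xml.dom.minidom` instead, and the three helper functions are hypothetical names introduced for the illustration.

```python
from xml.dom import minidom

# Shortened article.xml; the whitespace between tags becomes text nodes in the DOM.
ARTICLE_XML = """<article>
   <title>Simple XML</title>
   <author>
      <firstName>Tem</firstName>
      <lastName>Nieto</lastName>
   </author>
</article>"""

doc = minidom.parseString(ARTICLE_XML)
root = doc.documentElement  # root node of the element tree


def element_children(node):
    """Child nodes that are elements (whitespace text nodes are skipped)."""
    return [c for c in node.childNodes if c.nodeType == c.ELEMENT_NODE]


def next_element_sibling(node):
    """Next sibling that is an element, or None if there is none."""
    sibling = node.nextSibling
    while sibling is not None and sibling.nodeType != sibling.ELEMENT_NODE:
        sibling = sibling.nextSibling
    return sibling


def ancestors(node):
    """Tag names of all element ancestors, nearest first."""
    names = []
    parent = node.parentNode
    while parent is not None and parent.nodeType == parent.ELEMENT_NODE:
        names.append(parent.tagName)
        parent = parent.parentNode
    return names


author = root.getElementsByTagName("author")[0]
first_name = element_children(author)[0]

print("root:", root.tagName)
print("children of root:", [c.tagName for c in element_children(root)])
print("parent of firstName:", first_name.parentNode.tagName)
print("sibling of firstName:", next_element_sibling(first_name).tagName)
print("ancestors of firstName:", ancestors(first_name))
```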
12 Document Object Model (DOM)
- XML Node Reader Code Example
- Data used by this program
13 Document Object Model (DOM)
- What to look for
- 1. Code
- 2. Data
- 3. Use of class TreeView
- 4. Method BuildTree: how the tree is built and displayed.
- 5. Use of the switch statement
- 6. How the tree depth increases.
- 7. XmlTextWriter stream.
- 8. XmlTextReader
- 9. Improving efficiency using XPathNavigator.
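The C# code for BuildTree is not reproduced in this transcript, so the following is only a rough analogue of the idea: read the document node by node, branch on the kind of event, and track how the depth grows and shrinks. It assumes Python's standard `xml.etree.ElementTree.iterparse` as a stand-in for XmlTextReader; `read_with_depth` is a hypothetical function name.

```python
import io
import xml.etree.ElementTree as ET

ARTICLE_XML = b"""<article>
   <title>Simple XML</title>
   <date>December 6, 2001</date>
   <author>
      <firstName>Tem</firstName>
      <lastName>Nieto</lastName>
   </author>
   <summary>XML is pretty easy.</summary>
   <content>In this chapter, we present a wide variety of examples
      that use XML.
   </content>
</article>"""


def read_with_depth(xml_bytes):
    """Return (depth, tag) pairs in document order using a forward-only reader."""
    depth = 0
    rows = []
    events = ET.iterparse(io.BytesIO(xml_bytes), events=("start", "end"))
    for event, element in events:
        # Branch on the event type, much as a switch on the node type would.
        if event == "start":
            rows.append((depth, element.tag))
            depth += 1  # entering an element: its children are one level deeper
        else:
            depth -= 1  # leaving an element: back to the parent's level
    return rows


rows = read_with_depth(ARTICLE_XML)
for depth, tag in rows:
    print("   " * depth + tag)
```

A GUI version would add a tree node where this sketch appends a row; the depth bookkeeping stays the same.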
14 Document Object Model (DOM)
- 1. Code
- 2. Data
- 3. Program uses a TreeView control and TreeNode objects to display the XML document's node structure.
- 4. TreeNode list is updated each time the XPathNavigator is positioned to a new node. Nodes are added to and deleted from the TreeView to reflect the XPathNavigator's location in the DOM tree.
- 5. XPathDocument object
- 6. Method CreateNavigator
- 7. Traversal methods
- MoveToFirstChild
- MoveToParent
- MoveToNext
- MoveToPrevious
- 8. XPathNodeIterator
- 9. Method DisplayIterator
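The navigation steps above have rough counterparts in most XML libraries. The sketch below assumes Python's standard `xml.etree.ElementTree`, which supports a limited subset of XPath; it is not the .NET XPathNavigator API, and the comments only point out which traversal method each line loosely resembles.

```python
import xml.etree.ElementTree as ET

ARTICLE_XML = """<article>
   <title>Simple XML</title>
   <date>December 6, 2001</date>
   <author>
      <firstName>Tem</firstName>
      <lastName>Nieto</lastName>
   </author>
   <summary>XML is pretty easy.</summary>
   <content>In this chapter, we present a wide variety of examples
      that use XML.
   </content>
</article>"""

root = ET.fromstring(ARTICLE_XML)

# Loosely comparable to MoveToFirstChild: the first child element of the root.
first_child = root[0]

# Loosely comparable to MoveToParent: ".." selects the parent of the matched node.
parent_of_first_name = root.find("author/firstName/..")

# Loosely comparable to MoveToNext: siblings are reached through the parent's child list.
author_children = list(root.find("author"))
next_after_first_name = author_children[author_children.index(root.find("author/firstName")) + 1]

# Loosely comparable to an XPathNodeIterator: iterfind yields matches lazily.
author_parts = [element.tag for element in root.iterfind("author/*")]
descendant_count = len(root.findall(".//*"))

print(first_child.tag, parent_of_first_name.tag, next_after_first_name.tag)
print(author_parts, descendant_count)
```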
15 Document Type Definitions
- XML documents can reference optional documents that specify how the XML documents should be structured. These optional documents are called Document Type Definitions (DTDs) and Schemas.
- When a DTD is provided, validating parsers read the DTD and check the XML document's structure against it.
- If the XML document conforms to the DTD, the XML document is valid.
- DTDs provide a means for type checking XML: they confirm that elements contain the proper attributes and child elements, in the proper sequence.
- They use EBNF (Extended Backus-Naur Form) grammar to describe an XML document's content.
16 Document Type Definitions
<!-- Fig. 18.12: letter.dtd -->
<!-- DTD document for letter.xml -->

<!ELEMENT letter ( contact+, salutation, paragraph+,
   closing, signature )>

<!ELEMENT contact ( name, address1, address2, city, state,
   zip, phone, flag )>
<!ATTLIST contact type CDATA #IMPLIED>

<!ELEMENT name ( #PCDATA )>
<!ELEMENT address1 ( #PCDATA )>
<!ELEMENT address2 ( #PCDATA )>
<!ELEMENT city ( #PCDATA )>
<!ELEMENT state ( #PCDATA )>
<!ELEMENT zip ( #PCDATA )>
<!ELEMENT phone ( #PCDATA )>
<!ELEMENT flag EMPTY>
<!ATTLIST flag gender (M | F) "M">

<!ELEMENT salutation ( #PCDATA )>
<!ELEMENT closing ( #PCDATA )>
<!ELEMENT paragraph ( #PCDATA )>
<!ELEMENT signature ( #PCDATA )>
- What to look for
- 1. Rules for element letter
- a) one or more contact elements (+ occurrence indicator)
- b) one salutation
- c) one or more paragraph elements (+ occurrence indicator)
- d) one closing, and one signature
- 2. The ? (zero or one) and * (zero or more) occurrence indicators mark optional data.
- 3. Items are ordered.
- 4. ATTLIST defines attributes, e.g. type for the contact element.
- 5. #IMPLIED specifies that if the attribute is missing, the program can provide a value or ignore the missing attribute.
- 6. #REQUIRED means mandatory.
- 7. #FIXED means immutable.
- 8. A #PCDATA element can store parsed character data but no markup; & must be replaced by &amp; and < by &lt;.
- 9. An EMPTY element cannot contain character data.
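The ordering and occurrence rules for element letter can be illustrated by translating its content model into a regular expression over the sequence of child element names. This is a teaching sketch, not a DTD validator: it assumes a flat, comma-separated content model (no nested groups or | choices), uses Python's standard library only, and the function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Content model of <letter>, copied from letter.dtd.
LETTER_MODEL = "contact+, salutation, paragraph+, closing, signature"


def model_to_regex(model):
    """Turn 'a+, b, c?' into a regex matching the space-terminated names 'a a b '."""
    parts = []
    for item in model.split(","):
        item = item.strip()
        if item[-1] in "+*?":
            name, indicator = item[:-1], item[-1]
        else:
            name, indicator = item, ""
        parts.append("(?:%s )%s" % (re.escape(name), indicator))
    return re.compile("".join(parts))


def matches_model(element, model):
    """True if the element's children appear in the order and counts the model allows."""
    child_names = "".join(child.tag + " " for child in element)
    return model_to_regex(model).fullmatch(child_names) is not None


valid = ET.fromstring(
    "<letter><contact/><contact/><salutation/><paragraph/>"
    "<paragraph/><closing/><signature/></letter>"
)
missing_salutation = ET.fromstring(
    "<letter><contact/><paragraph/><closing/><signature/></letter>"
)
wrong_order = ET.fromstring(
    "<letter><salutation/><contact/><paragraph/><closing/><signature/></letter>"
)

print(matches_model(valid, LETTER_MODEL))
print(matches_model(missing_salutation, LETTER_MODEL))
print(matches_model(wrong_order, LETTER_MODEL))
```

A real validating parser also checks attributes, #PCDATA content, and EMPTY declarations, which this sketch leaves out.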
17 Document Type Definitions
<?xml version = "1.0"?>

<!-- Fig. 18.13: letter2.xml -->
<!-- Business letter formatted with XML -->

<!DOCTYPE letter SYSTEM "letter.dtd">

<letter>
   <contact type = "from">
      <name>Jane Doe</name>
      <address1>Box 12345</address1>
      <address2>15 Any Ave.</address2>
      <city>Othertown</city>
      <state>Otherstate</state>
      <zip>67890</zip>
      <phone>555-4321</phone>
      <flag gender = "F" />
   </contact>

   <contact type = "to">
      <name>John Doe</name>
      <address1>123 Main St.</address1>
      <address2></address2>
      <city>Anytown</city>
      <state>Anystate</state>
      <zip>12345</zip>
      <phone>555-1234</phone>
      <flag gender = "M" />
   </contact>

   <salutation>Dear Sir</salutation>

   <paragraph>It is our privilege to inform you about our new
      database managed with XML. This new system allows
      you to reduce the load on your inventory list
      server by having the client machine perform the work of
      sorting and filtering the data.
   </paragraph>

   <paragraph>Please visit our Web site for availability
      and pricing.
   </paragraph>

   <closing>Sincerely</closing>
   <signature>Ms. Doe</signature>
</letter>
- The document above conforms to letter.dtd.
- Notice that it references letter.dtd in its DOCTYPE declaration.
- Microsoft's XML Validator is available as a free download.
18 Microsoft XML Schemas
- Alternatives to DTDs are Schemas.
- DTDs cannot be manipulated programmatically (searched, modified), and they do not provide a means for describing an element's data type.
- Schemas do not use EBNF.
- Schemas are XML documents.
<?xml version = "1.0"?>

<!-- Fig. 18.17: book.xdr -->
<!-- Schema document to which book.xml conforms -->

<Schema xmlns = "urn:schemas-microsoft-com:xml-data">
   <ElementType name = "title" content = "textOnly"
      model = "closed" />

   <ElementType name = "book" content = "eltOnly" model = "closed">
      <element type = "title" minOccurs = "1" maxOccurs = "1" />
   </ElementType>

   <ElementType name = "books" content = "eltOnly" model = "closed">
      <element type = "book" minOccurs = "0" maxOccurs = "*" />
   </ElementType>
</Schema>
<?xml version = "1.0"?>

<!-- Fig. 18.16: bookxdr.xml -->
<!-- XML file that marks up book data -->

<books xmlns = "x-schema:book.xdr">
   <book>
      <title>C How to Program</title>
   </book>

   <book>
      <title>Java How to Program, 4/e</title>
   </book>

   <book>
      <title>Visual Basic .NET How to Program</title>
   </book>

   <book>
      <title>Advanced Java 2 Platform How to Program</title>
   </book>

   <book>
      <title>Python How to Program</title>
   </book>
</books>
- What to look for
- title cannot contain child elements.
- Attribute content with value textOnly specifies that the element contains only parsed character data.
- Attribute model with value closed means a conforming XML document can contain only elements specified in the schema.
- eltOnly means the element can contain only child elements, not mixed content such as text together with other elements.
- title is a child element of book; minOccurs and maxOccurs set how many times it may occur.
19 W3C XML Schema
- Like Microsoft, the W3C has created its own schema language, W3C XML Schema.
- W3C XML Schema documents use the filename extension .xsd.
<?xml version = "1.0"?>

<!-- Fig. 18.19: book.xsd -->
<!-- Simple W3C XML Schema document -->

<xsd:schema xmlns:xsd = "http://www.w3.org/2001/XMLSchema"
   xmlns:deitel = "http://www.deitel.com/booklist"
   targetNamespace = "http://www.deitel.com/booklist">

   <xsd:element name = "books" type = "deitel:BooksType"/>

   <xsd:complexType name = "BooksType">
      <xsd:sequence>
         <xsd:element name = "book" type = "deitel:BookType"
            minOccurs = "1" maxOccurs = "unbounded"/>
      </xsd:sequence>
   </xsd:complexType>

   <xsd:complexType name = "BookType">
      <xsd:sequence>
         <xsd:element name = "title" type = "xsd:string"/>
      </xsd:sequence>
   </xsd:complexType>

</xsd:schema>
<?xml version = "1.0"?>

<!-- Fig. 18.18: bookxsd.xml -->
<!-- Document that conforms to W3C Schema -->

<deitel:books xmlns:deitel = "http://www.deitel.com/booklist">
   <book>
      <title>e-Business and e-Commerce How to Program</title>
   </book>
   <book>
      <title>Python How to Program</title>
   </book>
</deitel:books>
20 W3C XML Schema
- What to look for in the preceding slide
- W3C XML Schema namespace
- xsd prefix
- Root element schema contains elements that define the XML document's structure.
- Binding URI to namespace prefix
- targetNamespace: the namespace for elements and attributes that this schema defines.
- element defines an element.
- Name and data type of an element.
- Any element that contains attributes or child elements is a complex type.
- Simple types such as xsd:string, xsd:date, xsd:int
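One detail of the example is easy to miss: only the root element books carries the deitel prefix, while book and title are written without one. Because book.xsd does not set elementFormDefault, locally declared elements are unqualified by default in W3C XML Schema, which is why the instance document is written that way. The sketch below only inspects the namespaces with Python's standard `xml.etree.ElementTree`; it does not perform schema validation, which the standard library does not provide.

```python
import xml.etree.ElementTree as ET

# bookxsd.xml from the slide.
BOOKXSD_XML = """<deitel:books xmlns:deitel = "http://www.deitel.com/booklist">
   <book>
      <title>e-Business and e-Commerce How to Program</title>
   </book>
   <book>
      <title>Python How to Program</title>
   </book>
</deitel:books>"""

TARGET_NS = "http://www.deitel.com/booklist"  # targetNamespace of book.xsd

books = ET.fromstring(BOOKXSD_XML)

# The globally declared root element is in the target namespace ...
root_is_qualified = books.tag == "{%s}books" % TARGET_NS

# ... while the locally declared book and title elements are in no namespace.
local_tags = sorted({element.tag for element in books.iter()} - {books.tag})

# BooksType requires at least one book (minOccurs = "1").
titles = [book.findtext("title") for book in books.findall("book")]

print(root_is_qualified, local_tags, titles)
```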
21 Schema Validation in C#
- Classes from the .NET FCL: XmlValidatingReader performs XML validation
- Code Example
- XDR validator, XSD validator
- XmlSchemaCollection
- Adding elements to collection
- XmlReader
- Registration of ValidationEventHandler
- Node-by-node validation
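The C# validation code is not included in this transcript. As a rough picture of the pattern (register a handler, then validate node by node and report each problem to the handler), here is a small Python sketch. Everything in it is an assumption made for illustration: the `RULES` table is a hand-written stand-in for the occurrence rules of book.xdr, `validate` is a hypothetical function, and the sample documents omit the x-schema namespace so that tag names stay plain. It is not the .NET XmlValidatingReader API.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule table: parent -> {child: (minOccurs, maxOccurs)}.
# None stands for an unbounded maximum ("*" in XDR).
RULES = {
    "books": {"book": (0, None)},
    "book": {"title": (1, 1)},
    "title": {},
}


def validate(root, rules, on_error):
    """Visit every element and call on_error(message) for each rule violation."""
    if root.tag not in rules:
        on_error(f"unexpected root element <{root.tag}>")
        return
    for element in root.iter():
        allowed = rules.get(element.tag)
        if allowed is None:
            continue  # unknown element: already reported by its parent
        counts = {}
        for child in element:
            counts[child.tag] = counts.get(child.tag, 0) + 1
        for name in counts:
            if name not in allowed:  # closed model: only declared children
                on_error(f"<{name}> is not allowed inside <{element.tag}>")
        for name, (low, high) in allowed.items():
            found = counts.get(name, 0)
            if found < low or (high is not None and found > high):
                on_error(f"<{element.tag}> has {found} <{name}> children")


GOOD_BOOKS = "<books><book><title>Python How to Program</title></book></books>"
BAD_BOOKS = (
    "<books>"
    "<book><title>A</title><title>B</title></book>"
    "<book><price>10</price></book>"
    "</books>"
)

good_errors, bad_errors = [], []
validate(ET.fromstring(GOOD_BOOKS), RULES, good_errors.append)  # handler = list.append
validate(ET.fromstring(BAD_BOOKS), RULES, bad_errors.append)

print(good_errors)
for message in bad_errors:
    print(message)
```

Passing the handler in as a callback keeps the traversal separate from the decision of what to do with an error (collect it, log it, or stop), which is the role the slides assign to the registered event handler.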
22 Extensible Stylesheet Language: XslTransform
- Extensible Stylesheet Language (XSL) is an XML vocabulary for formatting XML data.
- XSL Transformations (XSLT) creates formatted text-based documents from XML documents.
- The process is called a transformation and needs two tree structures.
- The source tree is the XML document being transformed.
- The result tree is the result of the transformation.
- XSLT processors include Microsoft's msxml and Apache's Xalan.
23 Extensible Stylesheet Language: XslTransform
<?xml version = "1.0"?>

<!-- Fig. 18.23: sorting.xml -->
<!-- Usage of elements and attributes -->

<?xml:stylesheet type = "text/xsl" href = "sorting.xsl"?>

<book isbn = "999-99999-9-X">
   <title>Deitel&apos;s XML Primer</title>

   <author>
      <firstName>Paul</firstName>
      <lastName>Deitel</lastName>
   </author>

   <chapters>
      <frontMatter>
         <preface pages = "2"/>
         <contents pages = "5"/>
         <illustrations pages = "4"/>
      </frontMatter>

      <chapter number = "3" pages = "44">
         Advanced XML</chapter>
      <chapter number = "2" pages = "35">
         Intermediate XML</chapter>
      <appendix number = "B" pages = "26">
         Parsers and Tools</appendix>
      <appendix number = "A" pages = "7">
         Entities</appendix>
      <chapter number = "1" pages = "28">
         XML Fundamentals</chapter>
   </chapters>

   <media type = "CD"/>
</book>
XML Document
- XSL document that transforms XML
- The <?xml:stylesheet ... ?> line is a processing instruction (PI), which contains application-specific information that is embedded in the XML document.
- XSLT documents contain one or more xsl:template elements that specify which information is output to the result tree.
- The first template element in the XSL document matches the document's root node. When the document root is encountered, the template is applied, and any text marked up by this element that is not in the namespace referenced by xsl is output to the result tree.
- This XSL style sheet creates an XHTML document.
- What to look for
- Document title
- Book's author
- Extracting child elements
- Sorting of chapters
- Use of an XSL variable to store the value of a book's page count
- Code example to apply style sheet to XML document
- Style sheet (sports.xsl)
- sports.xml
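The style sheet sorting.xsl itself is not reproduced in this transcript. To make the source-tree and result-tree idea concrete, the sketch below performs a comparable transformation by hand: it extracts the title and author, sorts the chapters by number, totals the page counts, and builds a small XHTML-like result tree. It uses Python's standard `xml.etree.ElementTree` rather than an XSLT processor (the Python standard library has none), so it illustrates the effect of the transformation, not XSLT syntax; the shape of the output is an assumption.

```python
import xml.etree.ElementTree as ET

# Source tree: sorting.xml from the slide (processing instruction omitted).
SORTING_XML = """<book isbn = "999-99999-9-X">
   <title>Deitel&apos;s XML Primer</title>
   <author>
      <firstName>Paul</firstName>
      <lastName>Deitel</lastName>
   </author>
   <chapters>
      <frontMatter>
         <preface pages = "2"/>
         <contents pages = "5"/>
         <illustrations pages = "4"/>
      </frontMatter>
      <chapter number = "3" pages = "44">
         Advanced XML</chapter>
      <chapter number = "2" pages = "35">
         Intermediate XML</chapter>
      <appendix number = "B" pages = "26">
         Parsers and Tools</appendix>
      <appendix number = "A" pages = "7">
         Entities</appendix>
      <chapter number = "1" pages = "28">
         XML Fundamentals</chapter>
   </chapters>
   <media type = "CD"/>
</book>"""

book = ET.fromstring(SORTING_XML)

# Extract child elements: document title and the book's author.
title = book.findtext("title")
author = "%s %s" % (book.findtext("author/firstName"), book.findtext("author/lastName"))

# Sort the chapters by their number attribute.
chapters = sorted(book.findall("chapters/chapter"), key=lambda c: int(c.get("number")))
rows = [(c.get("number"), c.text.strip(), int(c.get("pages"))) for c in chapters]

# Store the page count in a variable, as the style sheet does with an XSL variable.
total_pages = sum(int(e.get("pages")) for e in book.find("chapters").iter() if e.get("pages"))

# Build the result tree.
html = ET.Element("html")
body = ET.SubElement(html, "body")
ET.SubElement(body, "h1").text = title
ET.SubElement(body, "h2").text = "by " + author
table = ET.SubElement(body, "table")
for number, name, pages in rows:
    row = ET.SubElement(table, "tr")
    ET.SubElement(row, "td").text = "Chapter " + number
    ET.SubElement(row, "td").text = "%s (%d pages)" % (name, pages)
ET.SubElement(body, "p").text = "Pages: %d" % total_pages

print(ET.tostring(html, encoding="unicode"))
```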
24 Assignment