Title: XML
1. XML
2. XML
- XML, like HTML, is derived from the Standard Generalized Markup Language (SGML).
3. A brief introduction to XML: a simple XML doc
- <?xml version = "1.0"?>
- <!-- a simple xml example; this is a comment -->
- <mymessage>
-    <message>Welcome to XML!</message>
- </mymessage>
4. In validator (file is in examples\ch05\intro.xml)
5. XML documents and format
- An XML document contains data, not formatting information. As we'll learn, there are ways (XSL and FO files, for example) to provide formatting for XML, analogous to the way CSS provides formatting for HTML.
6. XML
- XML documents are typically stored in a file with the suffix .xml, though this is not required. They can be created with any editor (save as plain text). Many packages, such as MS Word, can save files as type .xml.
- An XML document contains a single root element, which contains the other elements. Anything appearing before the root is called the prolog. Elements directly under the root are its children. The structure is recursive.
- In the example, the root's child message contains the text "Welcome to XML!".
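The recursive root-and-children structure can be explored with a short sketch. It assumes Python's standard xml.etree.ElementTree module, which is not a tool used in these slides, and the helper name outline is hypothetical.

```python
import xml.etree.ElementTree as ET

# The simple document from the introduction slide.
DOC = """<?xml version="1.0"?>
<!-- a simple xml example -->
<mymessage>
   <message>Welcome to XML!</message>
</mymessage>"""


def outline(element, depth=0):
    """Return one (depth, tag, text) row per element, parents before children."""
    text = (element.text or "").strip()
    rows = [(depth, element.tag, text)]
    for child in element:
        # The same function handles every level: the structure is recursive.
        rows.extend(outline(child, depth + 1))
    return rows


root = ET.fromstring(DOC)
for depth, tag, text in outline(root):
    print("  " * depth + tag, text)
```

The root is reported at depth 0 and its child message at depth 1; a deeper document would simply produce more rows.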
7. The character set
- XML characters are CR, LF and Unicode characters.
- An XML document consists of markup and character data.
- Markup is enclosed in angle brackets, < and > (like HTML).
- Character data appears between the start and end tags.
- An XML parser passes whitespace characters to the application. Insignificant whitespace can be collapsed in a process called normalization.
- It is a good idea to add whitespace to an XML document for readability.
- &, <, >, ' and " are reserved characters. An entity reference makes it possible to use these as characters in the character data part of an XML document.
- Entity references begin with & and end with ;
- In this way character data is not confused with markup.
- Single and double quotes are used to delimit attribute values.
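As a rough illustration of entity references, the sketch below escapes reserved characters before placing text between tags and then checks that a parser recovers the original text. It assumes Python's standard library (xml.sax.saxutils.escape and xml.etree.ElementTree); the sample text is made up for the example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Character data that contains reserved characters.
raw = "if (a < b && b > 0) return;"

# escape() replaces &, < and > with &amp;, &lt; and &gt;.
escaped = escape(raw)
print(escaped)

# Once escaped, the text can sit between tags without being read as markup.
message = ET.fromstring("<message>" + escaped + "</message>")

# The parser turns the entity references back into the original characters.
print(message.text == raw)
```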
8. More on syntax
- There must be exactly one root.
- Proper nesting of elements is required.
- Start tags require close tags.
- Unlike HTML, the author can define her own tags in XML.
- Tags are case sensitive.
- The parser needs to distinguish markup from character data.
- Typically, whitespace is normalized (reduced to one whitespace character).
- Entity references are marked with an ampersand and allow us to use metacharacters (<, > and so on) which are part of the language syntax.
- Entity references (for example, &lt;) allow us to represent and distinguish the reserved characters <, > and & in XML.
- In character data, the reserved characters < and & may appear only as entity references.
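The syntax rules above can be tried out with a small well-formedness check. This is a sketch that assumes Python's xml.etree.ElementTree as the parser; the function name is_well_formed and the sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Each sample breaks (or keeps) one of the rules listed on the slide.
samples = {
    "one root, properly nested": "<a><b>text</b></a>",
    "two roots": "<a/><b/>",
    "improper nesting": "<a><b></a></b>",
    "missing close tag": "<a><b>text</a>",
    "case mismatch in tags": "<Note>text</note>",
    "raw ampersand in character data": "<a>fish & chips</a>",
}

results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(label, "->", "well-formed" if ok else "rejected")
```

Only the first sample is accepted; every other one violates a single rule from the list.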
9. XML intro continued
- A DOM-based parser returns a tree structure. A DOM parser must process the entire document to create a (Java) object, which may be 3 or 4 times the size of the original. This is not advisable if there are storage size constraints.
- A SAX (Simple API for XML) based parser returns events. SAX parsers have a smaller footprint.
- Many parsers can be downloaded for free, and several come with Java 1.4.
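The difference between the two parser styles can be sketched side by side. The slides refer to Java parsers; this example instead assumes Python's standard xml.dom.minidom and xml.sax modules, and the class name EventLogger is hypothetical.

```python
import xml.sax
from xml.dom import minidom

DOC = "<mymessage><message>Welcome to XML!</message></mymessage>"

# DOM style: the whole document is parsed into a tree that stays in memory.
dom = minidom.parseString(DOC)
message_node = dom.documentElement.getElementsByTagName("message")[0]
dom_text = message_node.firstChild.data


# SAX style: the parser calls back with events and keeps no tree.
class EventLogger(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def characters(self, content):
        # Text may arrive in several pieces, so each piece is logged.
        self.events.append(("text", content))

    def endElement(self, name):
        self.events.append(("end", name))


logger = EventLogger()
xml.sax.parseString(DOC.encode("utf-8"), logger)

print(dom_text)
print(logger.events)
```

With DOM the application navigates a finished tree; with SAX it must keep any state it needs while the events stream past.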
10. A brief introduction to XML
- An XML validator parses an XML document and indicates whether it is correct.
- A number of free validators are available, including one from MS, which I downloaded and used in this ppt.
11. Validator
- Microsoft provides a validating program free for download (with JavaScript and VBScript versions) at
- http://msdn.microsoft.com/archive/default.asp?url=/archive/en-us/samples/internet/xml/xml_validator/default.asp
- Or search MSDN for "validator".
- There are others out there:
- http://validator.w3.org/
- http://www.stg.brown.edu/service/xmlvalid/
- http://www.w3schools.com/XML/xml_validator.asp
12. Link to validator program on my w drive
- http://employees.oneonta.edu/higgindm/internet%20programming/validate_js.htm
- This is a link for the JavaScript validator.
- http://employees.oneonta.edu/higgindm/internet%20programming/validate_vbs.htm
- This is a link for the VBScript validator.
13. MS Validator: http://employees.oneonta.edu/higgindm/internet%20programming/validate_js.htm
14. Parser continued
- The parser will indicate if the document is well-formed.
- In DOM-based parsing, a plus sign (+) in the left margin indicates a node has children and a minus sign (-) indicates all child nodes have been expanded.
- The MS Validator uses color coding to indicate that child nodes can be expanded.
- An element that stores other elements is called a container element.
- The parser makes the document content available for further processing if it is well-formed.
15. Validator example
16. Validator
17. Reserved characters
- <message>&lt;&gt;&amp;</message> would enable a character data message to contain the characters <>&
18. DTD: document type definition
- A DTD file may contain the definition of an XML structure.
- XML files may refer back to a DTD.
- If an XML document has a DTD or Schema, a validating parser can determine not merely if it is well-formed XML, but whether it is valid.
- Valid means conforming to a DTD or schema.
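The distinction between well-formed and valid can be illustrated with a deliberately simplified sketch. Python's standard library does not validate against DTDs, so the dictionary CONTENT_MODEL below is a hypothetical stand-in for a single DTD rule (a welcome element that must contain from followed by subject, the element names used in the lang.dtd example of this deck). It is not a real validating parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for one DTD rule: element name -> required child sequence.
CONTENT_MODEL = {"welcome": ["from", "subject"]}


def check(text):
    """Classify a document as not well-formed, well-formed only, or valid."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        return "not well-formed"
    expected = CONTENT_MODEL.get(root.tag)
    children = [child.tag for child in root]
    if expected is None or children != expected:
        return "well-formed but not valid"
    return "valid"


good = check("<welcome><from>a</from><subject>b</subject></welcome>")
missing_from = check("<welcome><subject>b</subject></welcome>")
broken = check("<welcome><from>a</welcome>")
print(good, missing_from, broken, sep="\n")
```

A document can pass the first test and still fail the second, which is the gap a validating parser closes.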
19. Another example: Unicode
- lang.xml (next slide) uses Unicode character references (such as &#1583;) to represent Arabic words.
- lang.dtd (also shown in a later slide) defines entities whose replacement text is Unicode (Arabic) characters; the XML file uses them through entity references.
20. DTD: document type definition. A DTD file may contain the definition of an XML structure.
- <?xml version = "1.0"?>
- <!-- Fig. 5.4 : lang.xml -->
- <!-- Demonstrating Unicode -->
- <!DOCTYPE welcome SYSTEM "lang.dtd">
- <welcome>
-    <from>
-       <!-- Deitel and Associates -->
-       &#1583;&#1575;&#1610;&#1578;&#1614;&#1604;
-       &#1571;&#1606;&#1583;
-       <!-- entity -->
-       &assoc;
-    </from>
-    <subject>
-       <!-- Welcome to the world of Unicode -->
-       &#1571;&#1607;&#1604;&#1575;&#1611;
-       &#1576;&#1603;&#1605;
-       &#1601;&#1610;&#1616;
-       &#1593;&#1575;&#1604;&#1605;
-       <!-- entity -->
21. lang.dtd
- <!-- lang.dtd -->
- <!ELEMENT welcome ( from, subject )>
- <!ELEMENT from ( #PCDATA )>
- <!ELEMENT subject ( #PCDATA )>
- <!ENTITY assoc "&#1571;&#1587;&#1617;&#1608;&#1588;&#1616;&#1610;&#1614;&#1578;&#1618;&#1587;">
- <!ENTITY text "&#1575;&#1604;&#1610;&#1608;&#1606;&#1610;&#1603;&#1608;&#1583;">
22. lang.xml in validator
23. lang.xml in IE
24. About the example
- The DTD reference contains DOCTYPE, the name of the root, the SYSTEM flag indicating the DTD file is external, and the name of that file.
- Root element welcome contains two elements, from and subject.
- Some lines contain character references for Unicode characters.
- The DTD also defines some other entity references.
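A small sketch can show how character references and DTD-defined entities end up as ordinary Unicode text. It assumes Python's xml.etree.ElementTree, which does not load an external DTD file such as lang.dtd, so the entity is declared in an internal DTD subset instead; the shortened entity value and element contents are made up for the example.

```python
import xml.etree.ElementTree as ET

# An internal DTD subset stands in for the external lang.dtd (an assumption of
# this sketch): it declares one entity whose value is two Arabic characters.
DOC = """<?xml version="1.0"?>
<!DOCTYPE welcome [
  <!ENTITY assoc "&#1571;&#1587;">
]>
<welcome>
  <from>&#1583;&#1575; &assoc;</from>
  <subject>Unicode</subject>
</welcome>"""

root = ET.fromstring(DOC)
from_text = root.find("from").text

# Each &#NNNN; reference becomes the character with that decimal code point,
# and &assoc; is replaced by the text declared in the DTD subset.
code_points = [ord(ch) for ch in from_text]
print(code_points)
```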
25. More about markup
- An XML element may end with /> if it is an empty element, as in
- <emptyelt xxxx />
- but otherwise must have a complete end tag, as in
- <sometag> xxxxxxxxxxx </sometag>
- Elements may or may not have content (child elements or character data).
- Elements may have 0 or more attributes associated with them. Attributes appear in the element's start tag:
- <car doors = "4"/>
- Attribute values must appear in single or double quotes.
- Element and attribute names may not contain blanks.
- Here, element car has attribute doors with value 4.
- Attribute names may be of any length and may contain letters, digits, underscores, hyphens and periods, but must start with a letter or underscore.
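The car example can be read back with a parser to see how attributes and empty elements appear to an application. This is a sketch assuming Python's xml.etree.ElementTree; the helper name parses is hypothetical.

```python
import xml.etree.ElementTree as ET

# The empty element from the slide, with its single attribute.
car = ET.fromstring('<car doors = "4"/>')

doors = car.get("doors")                        # attribute values are strings
is_empty = len(car) == 0 and car.text is None   # no children, no character data


def parses(text):
    """Return True if the text is accepted as well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


single_quoted_ok = parses("<car doors = '4'/>")  # single quotes are allowed
unquoted_ok = parses("<car doors = 4/>")         # a missing quote is an error

print(doors, is_empty, single_quoted_ok, unquoted_ok)
```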
26. usage.xml uses a stylesheet
- <?xml version = "1.0"?>
- <!-- Fig. 5.5 : usage.xml -->
- <!-- Usage of elements and attributes -->
- <?xml:stylesheet type = "text/xsl" href = "usage.xsl"?>
- <book isbn = "999-99999-9-X">
-    <title>Deitel&apos;s XML Primer</title>
-    <author>
-       <firstName>Paul</firstName>
-       <lastName>Deitel</lastName>
-    </author>
-    <chapters>
-       <preface num = "1" pages = "2">Welcome</preface>
-       <chapter num = "1" pages = "4">Easy XML</chapter>
-       <chapter num = "2" pages = "2">XML Elements?</chapter>
27. usage.xsl
- In notes
- <? xxxxx ?> in usage.xml represents a PI (that is, a processing instruction). A PI consists of a PI target (xml:stylesheet, in this example) and a PI value. Note the syntax.
- PIs can be used to help authors embed application-specific data in an XML document. If the application processing the XML doesn't use the PI, then it has no effect on the XML document content.
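To see the target and value of a processing instruction from an application's point of view, here is a sketch that assumes Python's xml.dom.minidom. It uses the hyphenated target xml-stylesheet, the spelling standardized by the W3C, which is an assumption that differs from the spelling on the slide; the book element is shortened for the example.

```python
from xml.dom import minidom

DOC = """<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="usage.xsl"?>
<book isbn="999-99999-9-X"><title>XML Primer</title></book>"""

dom = minidom.parseString(DOC)

# Processing instructions in the prolog are children of the document node,
# alongside the root element.
pis = [
    (node.target, node.data)
    for node in dom.childNodes
    if node.nodeType == node.PROCESSING_INSTRUCTION_NODE
]
print(pis)

# The PI does not change the element content the application sees.
print(dom.documentElement.tagName)
```

An application that does not care about stylesheets can simply skip these nodes.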
28. usage.xml in validator
29. usage.xml document loaded into IE: the browser uses the stylesheet to generate HTML
30. CDATA
- The character data appearing in CDATA sections is not processed as markup by the XML parser; it is passed through as is.
- CDATA might be used for JavaScript or VBScript.
- CDATA starts with <![CDATA[ and ends with ]]>
- CDATA may contain reserved characters, but not the text ]]>
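The effect of a CDATA section can be checked with a short sketch, assuming Python's xml.etree.ElementTree; the code fragment inside the section and the helper name parse_text are made up for the example.

```python
import xml.etree.ElementTree as ET

# Text full of reserved characters, as program code often is.
code = "if ( x < 5 && y != 3 ) cerr << x;"


def parse_text(document):
    """Return the root element's text, or None if the document is rejected."""
    try:
        return ET.fromstring(document).text
    except ET.ParseError:
        return None


# Placed directly between tags, the reserved characters break the document.
plain = parse_text("<sample>" + code + "</sample>")

# Inside a CDATA section the same characters are passed through untouched.
wrapped = parse_text("<sample><![CDATA[" + code + "]]></sample>")

print(plain)
print(wrapped)
```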
31. Text example 5.7
- <?xml version = "1.0"?>
- <!-- Fig. 5.7 : cdata.xml -->
- <!-- CDATA section containing C++ code -->
- <book title = "C++ How to Program" edition = "3">
-    <sample>
-       // C++ comment
-       if ( this-&gt;getX() &lt; 5 &amp;&amp; value[ 0 ] != 3 )
-          cerr &lt;&lt; this-&gt;displayError();
-    </sample>
-    <sample>
-       <![CDATA[
-       // C++ comment
-       if ( this->getX() < 5 && value[ 0 ] != 3 )
-          cerr << this->displayError();
-       ]]>
-    </sample>
-    C++ How to Program by Deitel &amp; Deitel
- </book>
32. CDATA example from text 5.7
33. cdata.xml in MS validator (file is in examples\ch05)
34. letter.xml - I removed blank lines to get it to fit here
- <?xml version = "1.0"?>
- <letter>
-    <contact type = "from">
-       <name>Jane Doe</name>
-       <address1>Box 12345</address1>
-       <address2>15 Any Ave.</address2>
-       <city>Othertown</city>
-       <state>Otherstate</state>
-       <zip>67890</zip>
-       <phone>555-4321</phone>
-       <flag gender = "F"/>
-    </contact>
-    <contact type = "to">
-       <name>John Doe</name>
-       <address1>123 Main St.</address1>
-       <address2></address2>
-       <city>Anytown</city>
-       <state>Anystate</state>
-       <zip>12345</zip>
35. letter.xml in validator
36. Namespaces
- Naming collisions can occur when XML authors use the same tag names.
- Namespaces provide a mechanism for making tag references unambiguous.
- A namespace prefix appears in the start and end tags, followed by a colon. So,
- <movie:character>Scrooge</movie:character> can be differentiated from <ascii:character>colon</ascii:character>
- Namespace prefixes are tied to a unique URI in the XML document. Almost any name can be used to create a namespace prefix.
- In this example ascii and movie are namespace prefixes. Namespace prefixes can precede element and attribute names to avoid collisions.
- A URL may be used for a URI. The only requirement, though, is uniqueness, as the URLs are not visited by the parser.
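How a parser keeps the two character elements apart can be sketched as follows, assuming Python's xml.etree.ElementTree. The root element cast and both URNs are hypothetical, since the slide does not give URIs for the movie and ascii prefixes.

```python
import xml.etree.ElementTree as ET

# Hypothetical URIs: they only need to be unique and are never visited.
DOC = """<cast xmlns:movie="urn:example:movieInfo"
      xmlns:ascii="urn:example:asciiInfo">
  <movie:character>Scrooge</movie:character>
  <ascii:character>colon</ascii:character>
</cast>"""

root = ET.fromstring(DOC)

# ElementTree replaces each prefix with its URI, written as {uri}localname,
# so the two character elements get different tags.
tags = [child.tag for child in root]
print(tags)

# Searches name the namespace through a prefix map chosen by the caller;
# the prefix m used here need not match the one in the document.
movie_characters = [e.text for e in root.findall("m:character", {"m": "urn:example:movieInfo"})]
print(movie_characters)
```

The prefix is only shorthand; the URI it is bound to is what distinguishes the names.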
37. Namespace example 5.8
- <?xml version = "1.0"?>
- <!-- Fig. 5.8 : namespace.xml -->
- <!-- Namespaces -->
- <text:directory xmlns:text = "urn:deitel:textInfo"
-    xmlns:image = "urn:deitel:imageInfo">
-    <text:file filename = "book.xml">
-       <text:description>A book list</text:description>
-    </text:file>
-    <image:file filename = "funny.jpg">
-       <image:description>A funny picture</image:description>
-       <image:size width = "200" height = "100"/>
-    </image:file>
- </text:directory>
38. namespace.xml in validator (file is in examples\ch05)
39. namespace.xml example 5.8 in IE
40. Namespaces continued
- Providing a prefix can be tedious. A default namespace can be created, and elements used in the XML document from this namespace do not need prefixes. (Unprefixed attributes are not placed in the default namespace.)
41. Default namespaces
- <?xml version = "1.0"?>
- <!-- Fig. 5.9 : defaultnamespace.xml -->
- <!-- Using Default Namespaces -->
- <directory xmlns = "urn:deitel:textInfo"
-    xmlns:image = "urn:deitel:imageInfo">
-    <file filename = "book.xml">
-       <description>A book list</description>
-    </file>
-    <image:file filename = "funny.jpg">
-       <image:description>A funny picture</image:description>
-       <image:size width = "200" height = "100"/>
-    </image:file>
- </directory>
42. Default namespaces
- Now, file is in the default namespace.
- Compare this example to the earlier namespace example, where text and image were distinct namespaces.
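The effect of the default namespace can be confirmed with a short sketch, assuming Python's xml.etree.ElementTree. The document is a condensed form of the default namespace example, with the URNs urn:deitel:textInfo and urn:deitel:imageInfo as reconstructed from the slide.

```python
import xml.etree.ElementTree as ET

DOC = """<directory xmlns="urn:deitel:textInfo"
           xmlns:image="urn:deitel:imageInfo">
   <file filename="book.xml">
      <description>A book list</description>
   </file>
   <image:file filename="funny.jpg">
      <image:description>A funny picture</image:description>
   </image:file>
</directory>"""

root = ET.fromstring(DOC)

# Unprefixed elements land in the default namespace; prefixed ones do not.
files = {child.tag: child.get("filename") for child in root}
for tag, filename in files.items():
    print(tag, filename)

# The unprefixed attribute keeps its plain name: it has no namespace.
first_file_attributes = list(root[0].attrib)
print(first_file_attributes)
```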
43. defaultnamespace.xml in IE
44. Day planner case study (to be continued)
- <?xml version = "1.0"?>
- <!-- Fig. 5.10 : planner.xml -->
- <!-- Day Planner XML document -->
- <planner>
-    <year value = "2000">
-       <date month = "7" day = "15">
-          <note time = "1430">Doctor&apos;s appointment</note>
-          <note time = "1620">Physics class at BH291C</note>
-       </date>
-       <date month = "7" day = "4">
-          <note>Independence Day</note>
-       </date>
-       <date month = "7" day = "20">
-          <note time = "0900">General Meeting in room 32-A</note>
-       </date>
-       <date month = "7" day = "20">
-          <note time = "1900">Party at Joe&apos;s</note>
-       </date>
-       <date month = "7" day = "20">
45. planner.xml in validator
46. Day planner using a Java GUI. A SAX parser is used to parse the document (in text chapter 8).
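The textbook version is a Java GUI that is not reproduced in these slides. As a rough idea of how a SAX parser might pull the notes out of the planner, here is a sketch in Python's xml.sax instead; the class name NoteCollector is hypothetical and the document contains only the first two dates from the planner slide.

```python
import xml.sax

PLANNER = b"""<planner>
  <year value="2000">
    <date month="7" day="15">
      <note time="1430">Doctor&apos;s appointment</note>
      <note time="1620">Physics class at BH291C</note>
    </date>
    <date month="7" day="4">
      <note>Independence Day</note>
    </date>
  </year>
</planner>"""


class NoteCollector(xml.sax.ContentHandler):
    """Collect (month, day, time, text) for every note element."""

    def __init__(self):
        super().__init__()
        self.notes = []
        self._date = None   # (month, day) of the enclosing date element
        self._time = None   # time attribute of the current note, if any
        self._text = None   # text pieces gathered while inside a note

    def startElement(self, name, attrs):
        if name == "date":
            self._date = (attrs.get("month"), attrs.get("day"))
        elif name == "note":
            self._time = attrs.get("time")
            self._text = []

    def characters(self, content):
        if self._text is not None:
            self._text.append(content)

    def endElement(self, name):
        if name == "note":
            month, day = self._date
            self.notes.append((month, day, self._time, "".join(self._text)))
            self._text = None


collector = NoteCollector()
xml.sax.parseString(PLANNER, collector)
for note in collector.notes:
    print(note)
```

Because SAX keeps no tree, the handler itself remembers which date it is inside when a note arrives.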
47. Homework on this section
- Install an XML validator.
- Create your own XML file and validate it.
- Post screenshots of your XML file and say what validator you used.