Title: XML Technology
1XML Technology
- Nizar Mabroukeh
- nizar_at_ccse.kfupm.edu.sa
- COE 445
- KFUPM April, 2001
2We will cover
- What is XML?
- Components of an XML Document
- Document Type Definition (DTD)
- XML Data Islands
- Parsing XML and DOM
- XML presentation with CSS, XSL and XSLT
- XPath, XML and Database Integration
- Why use XML?
- Creating your own XML vocabulary
- Review of XML Applications and Tools
- XML Resources
3What is XML?
4Markup Languages
- SGML
- Standard Generalized Markup Language
- Mother of Markup Languages
- HTML
- Most popular presentation language for web
- XML
- Draws heavily on the merits and shortcomings of HTML and SGML
5Issues with HTML
- Merits
- Very easy to use and learn
- Presentation technology
- It is the most popular
- Shortcomings
- NOT a data technology
- Poor Searching
- There is no intelligence about content/data
- We lose the association between meaning and content
- Data cannot be represented hierarchically
- Limited set of tags
6How does XML look?
- Simple XML data would look like this:
- <book>
- <title> XML Tech </title>
- <author> YAG </author>
- <level> Freshman </level>
- </book>
- <book> is called the root node
7XML and HTML
- Similar in appearance
- Both are based on SGML
- BUT
- XML describes data
- HTML displays data
8XML (eXtensible Markup Language)
9What Is XML?
- XML is a
- platform-independent,
- self-describing,
- expandable,
- standard data exchange format
- that can be used either independently or
- embedded and used within other solutions.
10Platform Independent
- Windows
- Unix
- Macintosh
- Mainframe
11Self-Describing
- Example
- <DATE>July 26, 1998</DATE>
- Describes the information, not the presentation. The format is flexible.
12Expandable and Extensible
- HTML has a fixed set of tags
- <H1>, <B>, <PRE>
- XML lets you have your own tags
- <dangerous-substance>, <Shakespearean-character>, <cash-equivalent>
13Standard
- W3C (World Wide Web Consortium) www.w3c.org
- The XML 1.0 specification was issued in February 1998 as a standards-based text format for interchange of data
- The W3C XML Working Group designed XML as a simplified subset of SGML
14Standard
- Unlike HTML, the XML specification does not define any particular tag names; instead it defines general syntactic rules that enable developers to create their own domain-specific vocabularies of tags.
15Context
- Greater context to the information
- Tree structure is natural in XML
- <balance-sheet>
- <total-assets>
- <asset-type name="current">
- <amount-period year="1998">
- <amount> 41,000,000 </amount>
- </amount-period>
- </asset-type>
- </total-assets>
- </balance-sheet>
16Freedom
- Extensible markup language
- Customized tags
- Tags give meaning to the content
- Separates data from style
17Why XML?
- Derived as a subset of SGML
- Allows you to define your own tags
- XML: <author> YAG </author>
- HTML: <B> YAG </B>
- Provides meaningful, readable data
- Meaningful searches can be performed
- Much simpler than SGML
- SGML spec: 300 pages; XML spec: 33 pages
- Purely a Data Technology
- Supports compound documents.
18XML Advantages
- Web based
- Extensible
- License-free
- Platform independent
- Single end-to-end IT solution for electronic
information exchanges
19XML Documents
20XML Elements
- An XML element is made up of a start tag, an end tag, and data in between. The start and end tags describe the data within the tags, which is considered the value of the element.
- For example, the following XML element is a <chairman> element with the value "Sadiq Sait":
- <chairman>Sadiq Sait</chairman>
- Elements can be empty; an empty element is written <chairman/>
22XML Attributes
- An element can optionally contain one or more attributes. An attribute is a name-value pair separated by an equal sign (=).
- Example:
- <CITY ZIP="31261">Dhahran</CITY>
- Here, ZIP="31261" is an attribute of the <CITY> element. Attributes are used as meta information.
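The element and attribute terms above can be checked with a parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree module (the choice of Python is an assumption of this example, not something the slides use); it reads the tag name, value and attributes of the two sample elements.

```python
import xml.etree.ElementTree as ET

# Element with a value, taken from the elements slide.
chairman = ET.fromstring("<chairman>Sadiq Sait</chairman>")
print(chairman.tag, "->", chairman.text)

# Element carrying one attribute, taken from the attributes slide.
city = ET.fromstring('<CITY ZIP="31261">Dhahran</CITY>')
print(city.tag, city.attrib, city.text)

# An empty element parses fine and simply has no text value.
empty = ET.fromstring("<chairman/>")
print(empty.tag, empty.text)
```

Attribute values always come back as strings, so a ZIP code such as 31261 has to be converted explicitly if a number is needed.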
23XML is
24Parts of an XML document
Version Declaration
Document Type Definition (DTD)
<root> BODY </root>
25Version Declaration
- <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
- Encoding
- Supports UTF-8, UTF-16 and other encodings
- In short: provides for multilingual data
- Standalone
- Indicates whether the document has any markup
declarations that are external to the document
26XML data
- This is how XML data would look (the body of the document):
- <books>
- <book>
- <name> Codebook 6.0 </name>
- <author> YAG </author>
- <level> Intermediate Advanced </level>
- </book>
- <book>
- <name> Java for Beginners </name>
- <author> Dale </author>
- <level> Beginner </level>
- </book>
- </books>
27XML Document Rules
- ONE root element, e.g. <books>
- ALL tags start AND end, e.g. <book></book>
- Tags cannot overlap and are case sensitive
- Attribute values enclosed in quotes, e.g. <book ISBN="21-458-65-0">
- Attributes not repeated in an element, e.g. <para id="1" sid="4">
- FIRST item must be the XML declaration, e.g. <?xml version="1.0"?>
28Two types of XML documents
- Well-formed XML documents
- Valid XML documents
29Well-formed document
- Must contain one or more elements
- Must contain a uniquely named root element
- All other elements within the root element must be nested correctly
- An XML parser will reject malformed documents (the method of rejection will vary by parser author)
- Documents that contain XML and HTML tags are common
- HTML within an XML document must be well-formed
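As a rough illustration of how a parser rejects malformed documents, the sketch below uses Python's standard xml.etree.ElementTree, which raises ParseError on input that is not well-formed. The helper name is_well_formed and the sample strings are made up for this example.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts the text, False if it rejects it."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Hypothetical samples, one per rule from the slides.
samples = {
    "nested correctly": "<books><book><name>Codebook 6.0</name></book></books>",
    "overlapping tags": "<books><book><name>Codebook 6.0</book></name></books>",
    "two root elements": "<book/><book/>",
    "case mismatch": "<Book></book>",
    "unquoted attribute": "<book ISBN=21-458-65-0></book>",
}

for label, text in samples.items():
    print(label, "->", is_well_formed(text))
```

Only the first sample is accepted; the other four each break one of the well-formedness rules.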
30Valid XML document
- The XML document must be well formed
- Should contain a Document Type Definition
- A DTD is a schema which contains the constraints for the XML document
- It contains element definitions and their attributes
- Attribute values should comply with the following rules
- Cannot contain <, or the same quote character (" or ') that delimits the value
- Elements must be nested correctly
31Document Type Definition
- A DTD is a text document that defines the lexicon of legal names for tags in a particular XML vocabulary
- It also defines how tags should be nested
- It can be written as code inside the XML file or specified externally as a separate text file with the extension .dtd
32Sample DTD
- <!-- Uses EBNF (Extended Backus Naur Form) -->
- <!DOCTYPE book [
- <!ELEMENT book (name, author, level)>
- <!ELEMENT name (#PCDATA)>
- <!ELEMENT author (#PCDATA)>
- <!ELEMENT level (#PCDATA)>
- <!ATTLIST author email CDATA #IMPLIED>
- ]>
- A DTD may be specified externally in a file with the .dtd extension: <!DOCTYPE book SYSTEM "book.dtd">
33More on DTD
- Special software can help you build your DTD visually instead of writing all this code by hand; one example package is XML Authority from Extensibility
- An XML document is associated with its corresponding DTD for validation using the <!DOCTYPE> declaration
34Why use a DTD?
- Application independent way of sharing data
- Industries or trading parties can agree on a standard for interchanging data
- Verification that data received from trading parties is valid.
35Complete XML document
- <?xml version="1.0" ?>
- <!DOCTYPE book [
- <!ELEMENT book (name, author, level?)>
- <!ELEMENT name (#PCDATA)>
- <!ELEMENT author (#PCDATA)>
- <!ELEMENT level (#PCDATA)>
- ]>
- <book>
- <name> Codebook 6.0 </name>
- <author> YAG </author>
- <level> Intermediate Advanced </level>
- </book>
36OR
- <?xml version="1.0" ?>
- <!DOCTYPE book SYSTEM "book.dtd">
- <book>
- <name> Codebook 6.0 </name>
- <author> YAG </author>
- <level> Intermediate Advanced </level>
- </book>
37XML Document Pluses
- Tightly Structured
- Extensible
- Easily models data
- Useful for applications and transfer between applications
- Interchangeable
38XML Data Islands
39XML Data Islands
- A data island is an XML document that exists within an HTML page.
- It allows you to script against the XML document without having to load it through script or through the <OBJECT> tag.
- Almost anything that can be in a well-formed XML document can be inside a data island
40How to create XML data island
- The XML for a data island in HTML can be either
- Inline using ID,
- or called from an outside xml file using SRC,
- or created using a <script> tag
41Inline data island
- The <XML> element marks the beginning of the data island, and its ID attribute provides a name that you can use to reference the data island.
- <XML ID="XMLID">
- <customer>
- <name>Mark Hanson</name> <custID>81422</custID>
- </customer>
- </XML>
42XML referenced from outside file
- Referenced through a SRC attribute on the <XML> tag
- <XML ID="XMLID" SRC="customer.xml"></XML>
43Created using <script> tag
- <SCRIPT LANGUAGE="xml" ID="XMLID">
- <customer>
- <name>Mark Hanson</name> <custID>81422</custID>
- </customer>
- </SCRIPT>
44Parsing XML
45What is XML Parsing
- For a computer program to access the structured information in the document in a meaningful way, parsing is required
- The parser first reads the stream of characters and recognizes the syntactic details of elements, attributes and text in the document
- Then, the parser exposes the hierarchical set of information in the document as a tree of related elements, attributes and text items
46- The logical tree of information items created after parsing the XML document is called the Information Set, or Infoset
- This can then be manipulated in different ways and data extracted for use in applications, databases, etc.
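To make the idea of a tree of information items concrete, here is a small sketch that parses a document and prints its hierarchy. It assumes Python's standard xml.etree.ElementTree as the parser; the function name outline is invented for the example.

```python
import xml.etree.ElementTree as ET

XML_TEXT = (
    "<books><book>"
    "<name>Codebook 6.0</name><author>YAG</author>"
    "</book></books>"
)

def outline(element, depth=0):
    """Return one indented line per element, visiting the tree depth-first."""
    text = (element.text or "").strip()
    line = "  " * depth + element.tag + (": " + text if text else "")
    lines = [line]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines

tree_lines = outline(ET.fromstring(XML_TEXT))
print("\n".join(tree_lines))
```

Each level of indentation in the output corresponds to one level of nesting in the source document.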
47XML Parsers
- Always check for well-formedness
- Can be validating or non-validating
- Validation requires association with a DTD
- Included in Microsoft Internet Explorer 5.0
- Language-neutral programming model
- By using W3C XML 1.0 and the XML DOM it supports JavaScript, VBScript, Java, C, Perl
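The difference between a validating and a non-validating parser can be seen with a document that is well-formed but breaks its own DTD. The sketch below assumes Python's standard xml.etree.ElementTree, which is built on the non-validating expat parser: it reads the internal DTD but does not enforce it. The sample document is invented for the example.

```python
import xml.etree.ElementTree as ET

# Well-formed, but NOT valid: the DTD requires an <author> after <name>.
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE book [
<!ELEMENT book (name, author, level?)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT level (#PCDATA)>
]>
<book>
  <name>Codebook 6.0</name>
</book>"""

# A non-validating parser only checks well-formedness, so this succeeds.
root = ET.fromstring(DOCUMENT)
children = [child.tag for child in root]
print(root.tag, children)
```

A validating parser would be expected to report the missing author element; checking that in Python needs a third-party library, which is outside the scope of this sketch.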
48Manipulating XML using the DOM
- W3C provides a standard API called the Document Object Model (DOM) to access an XML document's infoset
- The DOM API provides a complete set of operations to programmatically manipulate the node tree, including navigating the nodes in the hierarchy, creating and appending new nodes, removing nodes, etc.
49- Once you are done making changes to the node tree, you can save it by serializing the infoset back into an XML document
- Diagram: parsing turns XML into an infoset; serialization turns the infoset back into XML
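The parse, modify, serialize cycle can be sketched with Python's standard xml.dom.minidom, a small implementation of the W3C DOM. This stands in for the Internet Explorer parser discussed in these slides, so treat it as an illustration of the DOM calls rather than of that product.

```python
from xml.dom import minidom

# Parsing: XML text becomes a node tree.
doc = minidom.parseString(
    "<books><book><name>Codebook 6.0</name></book></books>"
)

# Navigating: the root element, then its first child (the <book>).
book = doc.documentElement.firstChild

# Creating and appending a new node with a text value.
author = doc.createElement("author")
author.appendChild(doc.createTextNode("YAG"))
book.appendChild(author)

# Serialization: the modified tree becomes XML text again.
serialized = doc.documentElement.toxml()
print(serialized)
```

The input is written without whitespace between tags because the DOM treats such whitespace as text nodes, which would make firstChild a text node instead of the book element.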
50DOM Properties and Methods
- An XML document object is created when an XML data island is loaded and parsed, and it has properties and methods
- XMLDocument: returns a reference to the XML DOM exposed by the object
- documentElement: returns the root element
- childNodes: returns a node list containing children (if any)
- item(id): accesses individual nodes through an index (zero based)
- text: returns the text content of the node
- Let's look at an example
51DOM Example
- <XML ID="xmlDocument">
- <class>
- <student studentID="13429">
- <name>Jane Smith</name>
- <GPA>3.8</GPA>
- </student>
- </class>
- </XML>
- All of the below begin with xmlDocument.documentElement.childNodes.item(0)
- .childNodes.item(0).text returns "Jane Smith"
- .childNodes.item(1).text returns "3.8"
- .text returns "Jane Smith 3.8", i.e. name and GPA
- Note: everything is case sensitive here
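The same navigation can be tried outside the browser. The sketch below uses Python's standard xml.dom.minidom, which offers documentElement, childNodes and item() but has no text property; the helper text_of is a made-up approximation of that Microsoft-specific property and joins descendant text with spaces.

```python
from xml.dom import minidom

# Same data as the island above, without whitespace between tags so that
# childNodes contains only elements.
XML_TEXT = (
    '<class><student studentID="13429">'
    "<name>Jane Smith</name><GPA>3.8</GPA>"
    "</student></class>"
)

def text_of(node):
    """Collect the text of a node and its descendants, space-separated."""
    if node.nodeType == node.TEXT_NODE:
        return node.data
    parts = [text_of(child) for child in node.childNodes]
    return " ".join(part for part in parts if part)

document = minidom.parseString(XML_TEXT)
student = document.documentElement.childNodes.item(0)

name_text = text_of(student.childNodes.item(0))
gpa_text = text_of(student.childNodes.item(1))
student_text = text_of(student)
student_id = student.getAttribute("studentID")
print(name_text, gpa_text, student_text, student_id)
```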
52XML Presentation
53Viewing XML
- Unlike HTML, XML does not predefine display properties for specific elements.
- C++ Data Source Object (DSO)
- Binds XML to HTML and gives better performance, built into IE 5.0
- Cascading Style Sheets (CSS)
- Extensible Stylesheet Language (XSL)
54CSS
- Separate file with a .css extension
- object { property-name: property-value }
- DIV { color: red; font-size: 16pt }
- /* Comments are entered the C way here */
55Displaying XML with CSS
- paragraph
- {
- COLOR: red;
- FONT-FAMILY: 'Book Antiqua';
- FONT-VARIANT: small-caps;
- FONT-WEIGHT: bolder
- }
- preamble
- {
- COLOR: blue;
- FONT-FAMILY: 'Book Antiqua';
- FONT-VARIANT: small-caps;
- FONT-WEIGHT: bolder
- }
- <?xml-stylesheet type="text/css" href="const.css"?>
- Modify the presentation of XML Documents
- Follow normal CSS syntax
- Referenced in the XML source document
56XSL
- eXtensible Stylesheet Language
- Syntax is similar to XML
- In fact, XSL is written in XML
57- The aim of XSL is to provide a simple but powerful stylesheet syntax, which can be used to define how XML documents should appear on the screen.
- An XSL stylesheet transforms an XML document into a suitable form for presentation
- It allows us to control how, and which parts of, the document should be shown to the user.
58- The XSL processor takes an XML document as input,
and translates it into a different XML document
suitable for output. This resultant XML document
can be passed through a separate tool to add the
finishing touches ready for presentation.
59Displaying XML with XSL
- <xsl:for-each select="books/book" order-by="author">
- <TR VALIGN="top"> <TD>
- <xsl:value-of select="name"/>
- </TD>
- <TD>
- <xsl:value-of select="author"/>
- </TD>
- </TR>
- </xsl:for-each>
- More powerful than CSS
- XSL documents are XML documents
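To show what the for-each loop above produces, here is a hand-written Python equivalent. It is not an XSL processor: it simply mimics the same steps (select each book, order by author, emit one table row with name and author) using the standard library, and the function name books_to_rows is invented for the example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

BOOKS = (
    "<books>"
    "<book><name>Codebook 6.0</name><author>YAG</author></book>"
    "<book><name>Java for Beginners</name><author>Dale</author></book>"
    "</books>"
)

def books_to_rows(xml_text):
    """Mimic the for-each: one HTML table row per book, ordered by author."""
    root = ET.fromstring(xml_text)  # root is the <books> element
    books = sorted(
        root.findall("book"),
        key=lambda book: book.findtext("author", default=""),
    )
    rows = []
    for book in books:
        name = escape(book.findtext("name", default=""))
        author = escape(book.findtext("author", default=""))
        rows.append(f'<TR VALIGN="top"><TD>{name}</TD><TD>{author}</TD></TR>')
    return rows

rows = books_to_rows(BOOKS)
print("\n".join(rows))
```

With a stylesheet, the same logic lives in a separate XSL document, so the presentation can change without touching application code.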
60XSL Styling Tools
61Integrated use of XML
62XSLT
- eXtensible Stylesheet Language Transformations
- Transforms XML from one tree-based structure to
another
63Advantages of XSLT
- Convert between XML vocabularies used by different applications
- Present data from an XML document by transforming it into HTML, or another format that's appropriate to the user or special device requesting the data
64Database Integration
- Query and Search XML documents
65XPath
- XML Path Language is a W3C standard
- It is a declarative language
- Used to interrogate an XML document to select
subsets of information
66- XPath provides a method for addressing parts of an XML document.
- Allows string, number and boolean manipulation
- Treats an XML document as a set of nodes, and allows matching
- It is called a Path language because it is designed like the path notation in URLs and files in directories
67XPath Example
- /MovieList/Movie/Cast/Actor returns all info about all actors
- /MovieList/Movie/Cast returns all info about Cast, including that of Director and Actor and Actress
- To find attributes instead of elements we use @
- /MovieList/Movie/@Title returns titles of all movies
- //Actress/@Role returns the Role of any Actresses anywhere in the document
68- Find the Cast of any movies directed by Minghella with a running time greater than 130 minutes
- /MovieList/Movie[Director/Last="Minghella" and @RunningTime > 130]/Cast
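The queries above can be approximated in Python. The standard xml.etree.ElementTree module supports only a subset of XPath (no attribute selection and no numeric comparisons), so this sketch uses simple paths and does the filtering in plain Python. The movie data is invented purely for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data shaped like the MovieList used in the examples.
MOVIES = """<MovieList>
  <Movie Title="Film A" RunningTime="162">
    <Director><Last>Minghella</Last></Director>
    <Cast>
      <Actor Role="Pilot">Actor One</Actor>
      <Actress Role="Nurse">Actress One</Actress>
    </Cast>
  </Movie>
  <Movie Title="Film B" RunningTime="106">
    <Director><Last>Minghella</Last></Director>
    <Cast><Actress Role="Pianist">Actress Two</Actress></Cast>
  </Movie>
  <Movie Title="Film C" RunningTime="194">
    <Director><Last>Other</Last></Director>
    <Cast><Actor Role="Artist">Actor Two</Actor></Cast>
  </Movie>
</MovieList>"""

root = ET.fromstring(MOVIES)  # root is the <MovieList> element

# /MovieList/Movie/@Title
titles = [movie.get("Title") for movie in root.findall("Movie")]

# //Actress/@Role
roles = [actress.get("Role") for actress in root.iter("Actress")]

# Movies directed by Minghella running longer than 130 minutes, then their Cast.
selected = [
    movie for movie in root.findall("Movie")
    if movie.findtext("Director/Last") == "Minghella"
    and int(movie.get("RunningTime")) > 130
]
cast_names = [person.text for movie in selected for person in movie.find("Cast")]

print(titles, roles, cast_names)
```

A full XPath engine would evaluate the predicate expression directly; the list comprehension here plays the role of the bracketed condition.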
69Where to Use XPath
70Using XPath
- Create an XPath for each unit of information
- Carry the XPaths with the information as it is transformed into other formats
- Use XPaths to link language strings and labels with the information they describe
- Generate an XSL stylesheet that uses the XPaths to generate the outgoing message
71You should be using XML
- More reasons for you to use XML
72Why should I use XML
- XML enables a data web of information services: it is a vendor-neutral, platform-neutral, language-neutral technology
- Simplifies application integration; consider the following example
73Simplifies Integration
- If a company has
- Machines running OS from Sun, HP, IBM
- Databases from Oracle, IBM, Microsoft
- Packaged applications from Oracle, SAP
- An XML-based representation of data and the HTTP protocol might be the only things these various systems can ever hope to have in common!
- Especially if you want to integrate them over the Internet
74Why should I use XML
- XML also simplifies information publishing and reuse
- With XML and XSLT you can easily
- Separate data from presentation, allowing you to change the look of information without affecting application code
- Publish the same data using output styles specific to each kind of requesting device: browser, cell phone, PDA or another computer, etc.
75- WML: Wireless Markup Language for cell phones and PDAs
- SVG: Scalable Vector Graphics language for rendering rich, data-driven images
- XSL Formatting Objects: for high-quality printed output
76Creating your own XML Vocabulary
77XML Vocabulary
- The set of XML elements and attributes for a certain application is called an XML Vocabulary
- Can be used in more than one XML document for the same application
78Create your own
- Begin each document with an XML declaration
- <?xml version="1.0"?> (case-sensitive)
- Use only one top-level document element
- cannot be repeated
- Match opening and closing tags properly (notice case-sensitivity)
79- Add comments between
- <!-- and --> characters
- Start element and attribute names with a letter. Names cannot contain spaces
- Put attributes in the opening tag
- Enclose attribute values in matching quotes
80- Use only simple text as attribute values
- Use &lt; and &amp; instead of < and & for literal less-than and ampersand characters
- Write empty elements as <ElementName/>
- If you follow these rules then your document will be a well-formed XML document
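The escaping and quoting rules are easy to get wrong by hand. As a minimal sketch, Python's standard xml.sax.saxutils module provides escape for element text and quoteattr for attribute values; the element name note and the sample strings are invented for the example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

raw_text = "profit < cost & rising"
raw_attribute = 'a "quoted" value'

# escape() replaces &, < and > with entity references.
escaped = escape(raw_text)

# quoteattr() escapes the value and wraps it in suitable matching quotes.
quoted = quoteattr(raw_attribute)

document = "<note level=%s>%s</note>" % (quoted, escaped)
print(document)

# Parsing the result gives back the original, unescaped strings.
parsed = ET.fromstring(document)
print(parsed.get("level"), "|", parsed.text)
```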
81Review of Applications and Tools
82XML Tools
- Help author the grammars (Schema, Filter, Updates)
- View, edit and manage XML
- Define mappings between XML logical views and databases
83How is XML used?
- Publishing
- Create once, use many
- Database Management
- Database integration
- e-Business
- Key driver
84XML Creation Tools
- Two schools
- Data centric
- Document centric
85Microsoft XML Notepad
msdn.microsoft.com/xml/notepad
- Free
- Tree-based
- Great search
86CueSoft eXML
www.cuesoft.com
- Tree or source view
- Attribute handling
- No search
- No nav bar in source view
87Techno2000 Clip!
- Powerful
- Flexible
- Wizards
- DTD creation
http://www.t2000-usa.com/download/download_index.html
88Vervet Logic XML Pro
www.vervetlogic.com
89Document Oriented
- Examples
- SoftQuad XMetaL
- Corel WordPerfect 9
- Arbortext Adept Editor
- Not as good for data development
- Require DTD or equivalent
90XSL Tester
www.vbxml.com
91XML Authority from Extensibility
92XML Authority from Extensibility
93Other Choices
- Web-based
- DTD Generator
- www.pault.com/Xmltube/dtdgen.html
- IBM Suite
- Visual DTD
- Visual XML Transformation
94Utilities
- File manipulation
- XML Junction
- www.xmljunction.com
- ODBC2XML
95Security
- IBM's XML Security Suite
- www.alphaworks.ibm.com/tech/xmlsecuritysuite
96Programming and XML
- Programming tools and environments
- Java including JAXP
- Visual Basic (VB) and Visual Basic for Applications (VBA)
- Databases and other tools
- Document vs. event driven processing (DOM vs. SAX)
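The contrast between document-driven (DOM) and event-driven (SAX) processing can be seen in a few lines. A DOM parser builds the whole tree first, while a SAX parser calls handler methods as it reads. The sketch below uses Python's standard xml.sax module; the class name NameCollector is invented for the example.

```python
import xml.sax

class NameCollector(xml.sax.ContentHandler):
    """Collect the text of every <name> element as parse events arrive."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._in_name = False
        self._buffer = []

    def startElement(self, tag, attrs):
        if tag == "name":
            self._in_name = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so buffer it.
        if self._in_name:
            self._buffer.append(content)

    def endElement(self, tag):
        if tag == "name":
            self.names.append("".join(self._buffer).strip())
            self._in_name = False

BOOKS = (
    b"<books>"
    b"<book><name> Codebook 6.0 </name><author> YAG </author></book>"
    b"<book><name> Java for Beginners </name><author> Dale </author></book>"
    b"</books>"
)

handler = NameCollector()
xml.sax.parseString(BOOKS, handler)
print(handler.names)
```

No tree is kept in memory here, which is the usual argument for event-driven parsing of large documents; the price is that the handler must track its own state.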
97Resources for XML
98Software Sources
- Interesting sites
- www.xmlsoftware.com
- alphaworks.ibm.com
- www.garshol.priv.no/download/xmltools/std_ix.html
99WebSites
- The Mother Ship
- www.w3c.org/xml
- Heavy Stuff
- www.ibm.com/developer/xml
- msdn.microsoft.com/xml
- www.oracle.com/xml/
- java.sun.com/xml/
100More Resources
- XML Information sites
- www.xmlinfo.com
- www.xml.com
- www.xml.org
- www.xmlelephant.com
- metalab.unc.edu/xml/
- www.ucc.ie/xml/
- XML Tutorials
- www.xml101.com
101More Resources
- XSL Information
- http://www.mulberrytech.com/xsl/xsl-list
- VB XML Information
- mailto:vbxml-subscribe@onelist.com
102And finally
- We're only at the very start of the Web revolution.
- XML is the fundamental building block
- XML technology is moving forward, and standards are rapidly evolving
103Thank You