Title: Web Engineering/Advanced Web Engineering
1. Web Engineering/Advanced Web Engineering
- Week 11 - XML (Extensible Markup Language)
2. Introduction
- XML
- The Extensible Markup Language (XML) is a document processing standard published by the World Wide Web Consortium (W3C). It is derived from the Standard Generalised Markup Language (SGML).
- XML can be searched, sorted, manipulated and rendered using the Extensible Stylesheet Language (XSL)
- Highly portable
- Files end in the .xml extension
3. From HTML to XML
- HTML's major drawback: information loses its structure when translated into HTML
- HTML is a presentation-oriented markup language, so information embodied in it is difficult to process
- Information and knowledge servers are overloaded because they have to both search for information and perform format processing
- Servers often answer the same request many times if users request several views of the same data
4. From HTML to XML
- HTML
- Lacks extensibility: authors can't create tags or attributes to parameterise or semantically qualify data
- Lacks structure: does not support the specification of deep structures needed to represent database schemas or object-oriented hierarchies
- Lacks validation: does not support a language specification that lets applications check imported data's structural validity
5. XML
- XML is not for displaying information but for managing information.
- A working group of the World Wide Web Consortium (W3C) created XML as a standard for creating markup languages.
- It was designed for distributing structured documents over the web
- A kind of light SGML (Standard Generalised Markup Language) simplified to meet Web requirements
- Unlike HTML, XML lets users
- Extract data from a document
- Define their own tags and attributes
- Define data structures and nest document structures to any complexity level
- Make applications that validate a document's structure. Any XML document can contain an optional description of its grammar for use by applications that perform structural validation
6. XML
- The problem that XML helps us to solve is how to transfer data between servers, or between the client and the server.
- Transferring data using a binary format is complicated, because different platforms tend not to represent or order data the same way.
- Therefore XML, a standardised text format, becomes an attractive alternative.
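The byte-ordering problem can be seen in a few lines of Python. This is only a sketch: the value 2 and the element name `hours` are made up for illustration. The same integer is packed in two byte orders, read back with the wrong one, and then sent as XML text instead.

```python
import struct
import xml.etree.ElementTree as ET

value = 2

# The same 32-bit integer packed with the two common byte orders.
little_endian = struct.pack("<I", value)  # least significant byte first
big_endian = struct.pack(">I", value)     # most significant byte first

# A receiver that assumes the wrong order reads a different number.
misread = struct.unpack("<I", big_endian)[0]

# As XML the value travels as text, so byte order never comes into play.
element = ET.Element("hours")  # hypothetical element name
element.text = str(value)
xml_text = ET.tostring(element, encoding="unicode")
round_trip = int(ET.fromstring(xml_text).text)

print(little_endian.hex(), big_endian.hex(), misread)
print(xml_text, round_trip)
```

The text form is larger than four raw bytes, which is the price paid for not having to agree on a binary layout.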
7. XML
- It is a markup language for describing structured data: content is separated from presentation
- XML documents contain only data
- Applications decide how to display the data
- Language for creating markup languages
- Can create new tags
- XML documents contain only data, not formatting instructions, so applications that process XML documents must decide how to display the document's data.
- For example, a PDA (personal digital assistant) may render an XML document differently than a wireless phone or desktop computer would render that document.
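As a rough illustration of applications deciding how to display the data, the sketch below renders one XML document in two ways. The document, the element names and the two "devices" are hypothetical; real devices would use far richer rendering logic.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: it carries data only, no formatting instructions.
DOC = "<module><title>XML Lecture</title><hours>2</hours></module>"

def render_desktop(xml_text):
    """Render for a large screen: one labelled field per line."""
    root = ET.fromstring(xml_text)
    return "\n".join(f"{child.tag.capitalize()}: {child.text}" for child in root)

def render_phone(xml_text):
    """Render for a small screen: a single compact line."""
    root = ET.fromstring(xml_text)
    return f"{root.findtext('title')} ({root.findtext('hours')}h)"

print(render_desktop(DOC))
print(render_phone(DOC))
```

Neither function changes the document; the presentation lives entirely in the application.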
8. XML
- XML is a meta-language
- With HTML, existing markup is static: <HEAD> and <BODY>, for example, are tightly integrated into the HTML standard and cannot be changed or extended.
- XML, on the other hand, allows you to create your own markup tags and configure each to your liking, for example
- <WebEngHeading>
- <WebEngSummary>
- <WebEngReallyWildFont>
- Each of these elements can be defined through user-defined document type definitions (DTDs) and stylesheets and applied to one or more XML documents.
- There are no correct tags for an XML document, except those defined by the author
9. XML applications
- XML permits document authors to create markup for virtually any type of information.
- Authors can create entirely new markup languages for describing specific types of data, including mathematical formulas, chemical molecular structures, music, recipes etc.
- XHTML
- VoiceXML (for speech)
- MathML (for mathematics)
- SMIL (the Synchronized Multimedia Integration Language, for multimedia presentations)
- CML (Chemical Markup Language, for chemistry)
- XBRL (Extensible Business Reporting Language, for financial data exchange)
10. DTD, valid and well-formed
- An XML document can optionally reference a document that defines that XML document's structure. This document is either a Document Type Definition (DTD) or a schema.
- A DTD (document type definition) is a syntactic specification used as a model for XML documents.
- It contains definitions of the elements and their attached attributes.
- When an XML document references a DTD or schema, some parsers (called validating parsers) can read the DTD/schema and check that the XML document follows the structure that the DTD/schema defines.
- If it conforms to the DTD/schema it is valid.
- If the parser can process an XML document successfully it is well-formed (syntactically correct).
- By definition, a valid document is well-formed.
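A minimal sketch of a well-formedness check, using Python's standard `xml.etree.ElementTree` parser. That parser is non-validating: it reports syntax errors but does not check a document against a DTD, so this shows well-formedness only, not validity. The two sample documents are made up.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the parser can process xml_text without a syntax error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

good = "<letter><salutation>Dear Sir</salutation></letter>"
bad = "<letter><salutation>Dear Sir</letter>"  # <salutation> is never closed

print(is_well_formed(good))
print(is_well_formed(bad))
```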
11. XML Parsers
- Processing an XML document requires a software program called an XML parser (or processor). These are available at no charge in many languages (Java, Python, C etc.).
- www.xml.com/xml/pub/Guide/XML_Parsers
- Parsers check an XML document's syntax and enable software programs to process marked-up data. XML parsers can support the Document Object Model (DOM) or the Simple API for XML (SAX).
- DOM: builds a tree structure containing the XML document's data
- SAX: processes the document and generates events
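The difference between the two parser styles can be sketched with Python's standard `xml.dom.minidom` and `xml.sax` modules. The sample document and the `TagLogger` handler class are hypothetical names chosen for this illustration.

```python
import xml.dom.minidom
import xml.sax

DOC = "<letter><salutation>Dear Sir</salutation><closing>All yours</closing></letter>"

# DOM: the whole document is parsed into a tree that can be navigated afterwards.
dom = xml.dom.minidom.parseString(DOC)
dom_tags = [node.tagName for node in dom.documentElement.childNodes]

# SAX: the parser calls back into a handler as it meets each piece of markup.
class TagLogger(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))

handler = TagLogger()
xml.sax.parseString(DOC.encode("utf-8"), handler)

print(dom_tags)
print(handler.events)
```

DOM keeps the full tree in memory, which is convenient for random access; SAX keeps nothing unless the handler stores it, which suits large documents read once from start to end.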
12. Structuring Data
- Element types
- Can be declared to describe data structure
- XML elements
- Root element
- Must be exactly one per XML document
- Contains all other elements in document
- Lines preceding the root element are called the prolog
- Container element
- Contains sub-elements (children)
- Empty element
- No matching end tag
- In HTML, IMG
- Terminate with forward slash (/)
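The three kinds of element can be picked out of a small document as follows. This is a sketch with a made-up document; it also shows that a parser rejects a document with two top-level elements, since exactly one root is required.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with a root, a container element and an empty element.
DOC = '<letter><contact><name>Walter Engine</name><flag gender="M"/></contact></letter>'

root = ET.fromstring(DOC)
contact = root.find("contact")   # container element: it has children
flag = contact.find("flag")      # empty element: written as <flag .../>

root_tag = root.tag
child_tags = [child.tag for child in contact]
flag_is_empty = len(flag) == 0 and flag.text is None

# A second top-level element breaks the "exactly one root" rule.
try:
    ET.fromstring("<letter/><letter/>")
    two_roots_ok = True
except ET.ParseError:
    two_roots_ok = False

print(root_tag, child_tags, flag_is_empty, two_roots_ok)
```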
13. Example 1
- Declaration of XML version
- External DTD needed (webengxml.dtd), referenced with the SYSTEM keyword
<?xml version="1.0" standalone="no"?>
<!DOCTYPE WebEng:Modules SYSTEM "webengxml.dtd">
<!-- comment: any text -->
<WebEng:Modules xmlns:WebEng="http://www.webeng.ac.uk/">
  <WebEng:Title>XML Lecture</WebEng:Title>
  <WebEng:Summary>All about XML</WebEng:Summary>
  <WebEng:Hours>2</WebEng:Hours>
</WebEng:Modules>
- Root element: outermost element in the document; denotes start and end points
- Descendants of the <WebEng:Modules> element
14. Extensible Stylesheet Language (XSL)
- xmlns
- Defines an XML namespace
- Identifies collections of element type declarations so that they do not conflict with declarations of the same name created by other programmers
- Predefined namespaces
- xml, xsl
- Programmers can create their own namespaces
- <Title>XML Lecture</Title>
- <Title>CGI Lecture</Title>
- Can be differentiated by using namespaces
- <WebEng:Title>XML Lecture</WebEng:Title>
- <ObjectDev:Title>Inheritance Lecture</ObjectDev:Title>
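A short sketch of how a program tells two same-named elements apart by namespace, using Python's `xml.etree.ElementTree`. The `http://www.example.org/objectdev/` URI and the wrapping `modules` element are assumptions made for this example; only `http://www.webeng.ac.uk/` comes from the lecture.

```python
import xml.etree.ElementTree as ET

# The ObjectDev URI and the <modules> wrapper are hypothetical.
DOC = """
<modules xmlns:WebEng="http://www.webeng.ac.uk/"
         xmlns:ObjectDev="http://www.example.org/objectdev/">
  <WebEng:Title>XML Lecture</WebEng:Title>
  <ObjectDev:Title>Inheritance Lecture</ObjectDev:Title>
</modules>
"""

NS = {
    "we": "http://www.webeng.ac.uk/",
    "od": "http://www.example.org/objectdev/",
}

root = ET.fromstring(DOC)

# The prefix used in a query is local to the query; matching is done on the URI.
web_titles = [e.text for e in root.findall("we:Title", NS)]
obj_titles = [e.text for e in root.findall("od:Title", NS)]

# ElementTree stores each name in expanded form: {uri}local-name.
expanded_tags = [child.tag for child in root]

print(web_titles, obj_titles)
print(expanded_tags)
```

The prefixes `we` and `od` differ from those in the document on purpose: a prefix is only shorthand, and the namespace URI is what identifies the element.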
15. Example 2: XML file
<?xml version="1.0"?>
<!-- Web Eng letter formatted with XML -->
<!DOCTYPE letter SYSTEM "walter.dtd">
<letter>
  <contact type="from">
    <name>Walter Engine</name>
    <address1>23 Harley St.</address1>
    <address2></address2>
    <city>Sunderland</city>
    <postcode>WE6 0DD</postcode>
    <flag gender="M"/>
  </contact>
  <contact type="to">
    <name>T. Blair</name>
    <address1>10 Downing St.</address1>
    <address2>Nice Area</address2>
    <city>London</city>
    <postcode></postcode>
    <flag gender="M"/>
  </contact>
  <salutation>Dear Sir</salutation>
  <paragraph>We would be very happy to teach about Web Engineering.</paragraph>
  <closing>All yours</closing>
  <signature>Mr. Engine</signature>
</letter>
16. Document Type Definitions (DTD)
- Document Type Definition
- Specifies the list of element types, attributes and their relationships to each other
- Optional, but recommended for program conformity
- !ELEMENT
- Element type declaration: defines the rules for an element
- Plus sign (+): one or more occurrences
- Asterisk (*): any number of occurrences
- Question mark (?): either zero or exactly one occurrence
- Omitted operator: exactly one occurrence
- #PCDATA
- The element can store parsed character data
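The occurrence operators behave much like their regular-expression counterparts. The toy sketch below translates a flat content model into a regular expression and checks an element's children against it. It is not a validating parser: it handles only comma-separated sequences (no nesting, no `|` choices), and the content model and helper names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

def content_model_to_regex(model):
    """Translate a flat, comma-separated content model into a regular expression.

    Handles only sequences such as "(a+, b, c?, d*)": no nesting, no "|" choices.
    """
    parts = []
    for item in model.strip("() ").split(","):
        item = item.strip()
        operator = item[-1] if item[-1] in "+*?" else ""
        name = item.rstrip("+*?")
        # Each child is matched as "name " so that names cannot run together.
        parts.append(f"(?:{re.escape(name)} ){operator}")
    return re.compile("".join(parts))

def children_match(element, model):
    """Check an element's direct children against the content model."""
    sequence = "".join(f"{child.tag} " for child in element)
    return content_model_to_regex(model).fullmatch(sequence) is not None

MODEL = "(contact+, salutation, paragraph*, closing?, signature)"  # hypothetical

ok = ET.fromstring("<letter><contact/><contact/><salutation/><signature/></letter>")
missing_contact = ET.fromstring("<letter><salutation/><signature/></letter>")
two_signatures = ET.fromstring(
    "<letter><contact/><salutation/><signature/><signature/></letter>"
)

print(children_match(ok, MODEL))
print(children_match(missing_contact, MODEL))
print(children_match(two_signatures, MODEL))
```

The first document passes because `contact+` accepts two contacts while `paragraph*` and `closing?` accept none. The second fails because `+` needs at least one contact, and the third fails because an element with no operator must appear exactly once.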
17. Document Type Definitions (DTD)
- !ATTLIST
- Defines attributes for an element
- #IMPLIED
- The attribute is optional: a document can supply it or omit it
- #REQUIRED
- The specified attribute must be present in the document
- #FIXED
- The specified attribute always has the given value; if it appears in the document, it must have that value
18. Example 3: DTD file
<!-- walter.dtd -->
<!ELEMENT letter (contact+, salutation, paragraph, closing, signature)>
<!ELEMENT contact (name, address1, address2, city, postcode, flag)>
<!ATTLIST contact type CDATA #IMPLIED>
<!ELEMENT name (#PCDATA)>
<!ELEMENT address1 (#PCDATA)>
<!ELEMENT address2 (#PCDATA)>
<!ELEMENT city (#PCDATA)>
<!ELEMENT postcode (#PCDATA)>
<!ATTLIST flag gender (M | F) "M">
- walter.dtd
- Declares elements and the elements' attributes
- #IMPLIED indicates that the attribute is optional; if the document omits it, no default is supplied and the application decides how to handle it
- CDATA states that the attribute contains a string
- #PCDATA specifies parsed character data
- EMPTY specifies that the element does not contain content (commonly used for elements that only carry attributes)
19. Example 3: IE display
20. Customized Markup Languages
- Customized Markup Languages
- Can create own tags to describe data, creating a new markup language
- MathML
- Wireless Markup Language (WML)
- Extensible Business Reporting Language (XBRL)
- Electronic Business XML (ebXML)
- Financial Products Markup Language (FpML)
21. MathML
- MathML
- Developed by W3C for describing mathematical notations and expressions
- Amaya browser
- www.w3.org/Amaya/User/BinDist.html
22. WML
- Wireless Markup Language
- Allows portions of Web pages to be displayed on wireless devices
- Works with Wireless Application Protocol (WAP)
- www.wapforum.org
- www.xml.com/pub/Guide/WML
23. XBRL
- Extensible Business Reporting Language (XBRL)
- Facilitates the creation, exchange and validation of financial information
- Namespaces
- Minimize conflicts between XML elements with the same name
- Example
- <school:subject>English</school:subject>
- <medical:subject>Thrombosis</medical:subject>
24. ebXML
- Electronic Business XML (ebXML)
- Used for exchanging business data
- www.ebxml.org
25. FpML
- Financial Products Markup Language (FpML)
- Emerging standard for exchanging financial information over the Internet
- www.fpml.org
26. Other Markup Languages
- Chemical Markup Language (CML) www.xml-cml.org
- VoiceXML www.voicexml.org
- Synchronized Multimedia Integration Language (SMIL) www.w3.org/AudioVideo
- Vector Markup Language (VML) www.w3.org/TR/NOTE-VML
- Product Data Markup Language (PDML) www.pdml.org
- Commerce XML (cXML) www.cxml.org/home
- XMI (XML Metadata Interchange) www.omg.org
- Trading Partner Agreement Markup Language (tpaML) www-4.ibm.com/software/developer/library/tpaml.html
- Small to Medium Business XML (SMBXML) www.smbxml.org
- Financial XML (FinXML) www.finxml.org
- Financial Information Exchange Markup Language (FixML) www.fixprotocol.org
27. Using XML with HTML
- XML documents are data sources
- XML documents embedded in HTML documents
- Using the XML tag
- Embedded XML document called a data island
- <XML ID="xmldoc"></XML>
- Marks boundaries of data island
- Attribute ID
- Name used to reference the data island
- DATASRC="#name" attribute
- In the opening TABLE element's start-tag, binds the specified data island to the table
- To use bound data
- Use a SPAN element with a DATAFLD attribute
28. Document Object Model (DOM)
- Document Object Model (DOM)
- Retrieving data from a text file is impractical
- DOM created when XML file is parsed
- Hierarchical tree structure
- Node: each name in the tree structure
- Single root node contains all other nodes
- Tree structure for article.xml
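The hierarchical tree can be made visible by walking the DOM and printing one indented line per element. This is a sketch only: the document below is a hypothetical stand-in, since the contents of article.xml are not shown here, and `outline` is a helper name invented for the example.

```python
import xml.dom.minidom

# Hypothetical stand-in document; not the real article.xml.
DOC = "<article><title>XML</title><author><name>W. Engine</name></author></article>"

def outline(node, depth=0):
    """Return one indented line per element node, visiting the tree depth-first."""
    lines = []
    if node.nodeType == node.ELEMENT_NODE:
        lines.append("  " * depth + node.tagName)
        for child in node.childNodes:
            lines.extend(outline(child, depth + 1))
    return lines

dom = xml.dom.minidom.parseString(DOC)
tree_lines = outline(dom.documentElement)
print("\n".join(tree_lines))
```

Text nodes are skipped here so that only element names appear; a fuller walk would also report the character data held in each text node.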
29. Document Object Model (DOM)
30. Extensible Stylesheet Language (XSL)
- XML documents can be placed in their own file
- Referenced in HTML document
- <XML ID="name" SRC="fileName.html"></XML>
- xsl:for-each element
- Iterates over items in specified document
31. Microsoft Schema
- Schema
- Microsoft's expansion of the DTD
- Called XML-Data
- Developed so that document definitions can be created using XML syntax
- Schemas or DTDs
- May be used to specify a document's grammar
- DTDs may be preferred because Microsoft's schema language is proprietary technology
32. Extensible Hypertext Markup Language (XHTML)
- XHTML
- Allows
- Complex documents to be created by combining HTML elements with XML's extensibility
- Ability to create new elements
- Example: an XHTML document might combine HTML elements with MathML and CML elements
- Well-formed documents
- Each XHTML document validated using DTDs
- Features provide structure HTML lacks
- Uses XML syntax
- All tags lowercase and closed
33. Validation of xhtmlExample.html
34. Microsoft BizTalk
- Internet data exchange
- Sending data between organizations is difficult
- Different platforms, applications and data specifications
- XML simplifies data transfers
- Microsoft BizTalk
- Manages and facilitates business transactions
- Ensures uniformity
- Three parts
- BizTalk Server
- BizTalk Framework
- BizTalk Schema Library
35. Simple Object Access Protocol (SOAP)
- Data transfers
- Clients with little processing power invoke method calls on other machines
- Behind firewalls
- SOAP messages
- Envelope: structure for describing a method call
- Request: remote procedure call
- Response: HTTP response containing results of method call
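A minimal sketch of building a SOAP request envelope with Python's `xml.etree.ElementTree`. The envelope namespace is the SOAP 1.1 one; the service namespace, the method name `getModuleHours` and the parameter `moduleTitle` are invented for illustration and do not refer to a real service.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "http://www.example.org/modules"            # hypothetical service namespace

def build_request(method, **params):
    """Wrap a method call and its parameters in a SOAP envelope."""
    ET.register_namespace("soap", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{method}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{SERVICE_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")

# getModuleHours and moduleTitle are made-up names for illustration.
request = build_request("getModuleHours", moduleTitle="XML Lecture")
print(request)
```

In practice the resulting text would typically be sent as the body of an HTTP POST, and the result of the method call would come back inside a similar envelope in the HTTP response.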