Title: Introduction to XML
1Introduction to XML
2Contents
- XML Motivation
- XML DTD
- XML Namespaces
- XML Schema
3Quick Introduction
- XML stands for
- eXtensible Markup Language
- It is a language for creating new languages
- In particular, it is designed to create tagged languages, similar to HTML
- It is considered extensible because it allows the developer to create new tags
- as compared to HTML, where the set of tags is fixed and unknown tags are ignored by browsers
4An additional problem
- An additional problem can be seen by viewing the HTML source of the CNN website
- This page is filled with headlines and text/images that support those headlines
- A major headline looks like this
- <H3><A href="..." class="t1">Earliest certified election results in Florida 6 p.m. EST</A></H3>
- A minor headline looks like this
- &nbsp;&nbsp;&#149;&nbsp;<a href="...">Bush sues 4 counties over absentee ballots</a><br>
5The XML approach
- Imagine if the source for CNN's webpage looked like this
- <story>
- <headline class="important">Election returns due at 6 PM EST.</headline>
- <supportingText>Blah Blah Blah</supportingText>
- </story>
- Here, structure is preserved
- It would be very easy to write a program to grab the headlines out of this document
- How do we handle presentation?
- XSLT
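The claim that a program can easily grab the headlines can be sketched in a few lines of Python using the standard library's ElementTree parser. This is only an illustration: the wrapping <page> element, the second story, and the "minor" class value are assumptions added so that the example has more than one headline to find.

```python
import xml.etree.ElementTree as ET

# Hypothetical page source using the story/headline markup from the slide.
# The <page> wrapper and the second story are assumptions for illustration.
PAGE = """
<page>
  <story>
    <headline class="important">Election returns due at 6 PM EST.</headline>
    <supportingText>Blah Blah Blah</supportingText>
  </story>
  <story>
    <headline class="minor">Bush sues 4 counties over absentee ballots</headline>
    <supportingText>Blah Blah Blah</supportingText>
  </story>
</page>
"""

def grab_headlines(xml_text):
    """Return (class, text) for every <headline> element in the document."""
    root = ET.fromstring(xml_text)
    return [(h.get("class"), h.text) for h in root.iter("headline")]

headlines = grab_headlines(PAGE)
for css_class, text in headlines:
    print(css_class, "-", text)
```

Because the markup names the structure directly, the program never has to guess which <H3> or <a> happens to hold a headline.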
6XML definitions
- An XML document consists of the following parts
- a Document Type Definition (or DTD)
- Data
- The DTD defines the structure of the data that follows it. A parser can thus read the DTD and know how to parse the data that follows it
- As such, XML documents are said to be self-describing: all the information for parsing the data is contained in the document itself
7Well-Formed XML Documents
- XML documents are considered well-formed if they conform to the XML syntax rules
- Well-formed documents can be parsed by any XML parser without the need for a DTD
- The parser can use the syntax rules to parse the document cleanly, but without the DTD it does not know if the document is valid
8Valid XML Documents
- An XML document is considered valid if
- (1) it is well-formed and
- (2) it conforms to the rules specified in its associated DTD
- That is, if the DTD says that a <p> tag can only contain <b> tags and plain text, then a <p> tag which contains an <em> tag is considered invalid
9XML Elements
- XML documents consist of one or more elements.
- Elements consist of a pair of tags and (optionally) enclosed text.
- <TITLE>The XML Companion</TITLE>
- Elements may have attributes.
- <TITLE type="book">The XML Companion</TITLE>
- Elements may contain other elements.
- <REFERENCE><TITLE type="book">The XML Companion</TITLE></REFERENCE>
- Empty elements may be self-closing.
- <PICTURE src="mypic.jpg"></PICTURE>
- <PICTURE src="mypic.jpg" />
10XML Rules (vs HTML)
- All elements must be closed.
- Elements cannot overlap.
- <B>bold <I>bold italic</B> italic</I> (this is illegal!)
- XML is case sensitive.
- <b> and <B> are different tags
- Attribute values must be enclosed in quotation marks.
- <PICTURE src="mypic.jpg" />
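These rules can be tried out against a real parser. The sketch below feeds small snippets to Python's standard ElementTree parser and records which ones it accepts; the sample snippets and the is_well_formed helper are illustrative names, not part of the original slides.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the text parses as XML, False on a syntax error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# One snippet per rule; the labels are only for display.
samples = {
    "closed and nested": "<p><B>bold <I>bold italic</I></B></p>",
    "unclosed element": "<p><br></p>",
    "overlapping elements": "<p><B>bold <I>bold italic</B> italic</I></p>",
    "case mismatch": "<b>text</B>",
    "unquoted attribute": "<PICTURE src=mypic.jpg />",
    "quoted attribute": '<PICTURE src="mypic.jpg" />',
}

results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(f"{label}: {'well-formed' if ok else 'rejected'}")
```

Note that this checks well-formedness only. The standard-library parser does not validate against a DTD, so a document it accepts may still be invalid.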
11A Simple XML Document
<?xml version="1.0" ?>
<booklist title="Some XML Books">
</booklist>
XML declaration
Root element (one per document)
12A Simple XML Document
<?xml version="1.0" ?>
<!DOCTYPE booklist SYSTEM "books.dtd">
<booklist title="Some XML Books">
</booklist>
Define root element and specify DTD.
13A Simple XML Document
<?xml version="1.0" ?>
<!DOCTYPE booklist SYSTEM "books.dtd">
<!-- This is a comment -->
<booklist title="Some XML Books">
</booklist>
This is a comment (as SGML / HTML)
14A Simple XML Document
<?xml version="1.0" ?>
<!DOCTYPE booklist SYSTEM "books.dtd">
<!-- This is a comment -->
<?xml-stylesheet type="text/xsl" href="iti-xml2.xsl"?>
<booklist title="Some XML Books">
</booklist>
This defines the XSL stylesheet
15A Simple XML Document
<?xml version="1.0" ?>
<!DOCTYPE booklist SYSTEM "books.dtd">
<!-- This is a comment -->
<?xml-stylesheet type="text/xsl" href="books3.xsl"?>
<?cocoon-process type="xslt"?>
<booklist title="Some XML Books">
</booklist>
This is a Cocoon processing directive (NB not standard XML, but required by Cocoon 1.7.4).
16Adding Content
<booklist title="Some XML Books">
  <book>
    <author>
      <name>St. Laurent</name>
      <initial>S</initial>
    </author>
    <date>1998</date>
    <title edition="Second">XML: A Primer</title>
    <publisher>MIS Press</publisher>
    <website href="http://www.simonstl.com/xmlprim/" />
    <rating stars="4"/>
  </book>
</booklist>
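To see how a program consumes this content, the following sketch reads the book list with Python's standard ElementTree module and turns each book into a dictionary. The dictionary keys are arbitrary names chosen for the example; nested elements are reached with a path, attributes with get().

```python
import xml.etree.ElementTree as ET

BOOKLIST = """<booklist title="Some XML Books">
  <book>
    <author><name>St. Laurent</name><initial>S</initial></author>
    <date>1998</date>
    <title edition="Second">XML: A Primer</title>
    <publisher>MIS Press</publisher>
    <website href="http://www.simonstl.com/xmlprim/" />
    <rating stars="4"/>
  </book>
</booklist>"""

root = ET.fromstring(BOOKLIST)
books = []
for book in root.findall("book"):
    books.append({
        "author": book.findtext("author/name"),        # nested element text
        "title": book.findtext("title"),               # element text
        "edition": book.find("title").get("edition"),  # attribute value
        "year": int(book.findtext("date")),
        "stars": int(book.find("rating").get("stars")),  # empty element, data in attribute
    })

print(root.get("title"))
print(books)
```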
17Benefits of a DTD
- DTDs are optional in XML
- DTD allows validation of documents
- DTD defines the application
- Vital for collaborative development
- IPR implications
- DTD allows entity definitions (i.e. symbols, shortcuts, foreign characters etc.).
18Document Declaration
- The document type declaration comes after the XML declaration
- Its tag name is DOCTYPE
- There are two forms
- internal
- <!DOCTYPE greeting [ ... DTD goes here ... ]>
- external
- <!DOCTYPE greeting SYSTEM "greeting.dtd">
- We will cover the first form
19DTD Syntax
- The DTD is where you declare the elements (a.k.a. tags) and attributes that will appear in your XML document
- In defining elements, you use regular expressions to declare the order in which elements are to appear
- Attributes can be associated with elements and can have default values associated with them
20DTD for a Class Gradebook
- <!DOCTYPE gradebook [
- <!ELEMENT gradebook (class, student*)>
- <!ELEMENT class (name, studentsEnrolled)>
- <!ATTLIST class semester CDATA #REQUIRED>
- <!ELEMENT name (#PCDATA)>
- <!ELEMENT studentsEnrolled (#PCDATA)>
- <!ELEMENT student (name, grade*)>
- <!ELEMENT grade (#PCDATA)>
- <!ATTLIST grade name CDATA #REQUIRED>
- ]>
21An XML Example from the DTD
- <?xml version="1.0"?>
- <!DOCTYPE gradebook [ ... insert the declarations from the previous slide here ... ]>
- <gradebook>
- <class semester="Fall 2000">
- <name>CSCI 3308</name>
- <studentsEnrolled>117</studentsEnrolled>
- </class>
- <student>
- <name>Ken Anderson</name>
- <grade name="lab0">10</grade>
- <grade name="lab1">9</grade>
- </student>
- </gradebook>
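The idea that a DTD content model is a regular expression over child elements can be illustrated with a small hand-rolled check. This is a sketch, not a DTD validator: Python's standard library does not validate against DTDs, so the content models are rewritten by hand as regular expressions over child tag names. The repetition markers (zero or more students, zero or more grades) are an assumption about the gradebook DTD, and attribute requirements are not checked.

```python
import re
import xml.etree.ElementTree as ET

# Content models from the gradebook DTD, rewritten by hand as regular
# expressions over a space-terminated sequence of child tag names.
# The '*' repetitions are an assumption about the original declarations.
CONTENT_MODELS = {
    "gradebook": r"class (student )*",
    "class": r"name studentsEnrolled ",
    "student": r"name (grade )*",
}

def check_children(element):
    """Recursively compare each element's child order with its content model."""
    model = CONTENT_MODELS.get(element.tag)
    if model is not None:
        sequence = "".join(child.tag + " " for child in element)
        if re.fullmatch(model, sequence) is None:
            return False
    return all(check_children(child) for child in element)

GOOD = """<gradebook>
  <class semester="Fall 2000"><name>CSCI 3308</name><studentsEnrolled>117</studentsEnrolled></class>
  <student><name>Ken Anderson</name><grade name="lab0">10</grade><grade name="lab1">9</grade></student>
</gradebook>"""

# Well-formed, but the required <class> element is missing.
BAD = "<gradebook><student><name>Ken Anderson</name></student></gradebook>"

good_ok = check_children(ET.fromstring(GOOD))
bad_ok = check_children(ET.fromstring(BAD))
print(good_ok, bad_ok)
```

The second document parses without error, which is exactly the difference between well-formed and valid: only the content model rejects it.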
22Schema Overview
- An XML schema is an XML document containing a formal description of what comprises a valid XML document; it defines the elements of an XML document and how these are structured.
- The following schema instructions and guidelines refer to a schema written in the W3C XML Schema language. http://www.w3.org/2001/XMLSchema
- An XML document described by a schema is called an instance document; if it satisfies all the constraints specified by the schema, it is considered to be schema-valid.
- Various methods can be used to associate an instance document with a schema; here we use the xsi:schemaLocation attribute of the root element of the instance document.
- To allow the exchange of XML documents between different organizations, proper use of namespaces is required to prevent misunderstandings.
23Namespaces
- Namespaces have two purposes in XML
- To distinguish between elements and attributes from different vocabularies with different meanings that happen to share the same name
- To group all the related elements and attributes from a single XML application together so that software can easily recognize them.
- Namespaces are implemented by attaching a prefix to each element and attribute, separated by a colon. Everything before the colon is called the prefix, everything after the colon is called the local part, and the complete name, including the colon, is called the qualified name, QName, or raw name.
- Example
- <fi:InterestFisheries>
- <iccat:InterestFisheries>
24Namespaces
- Each prefix is mapped to a URI by an xmlns:prefix attribute; the URI is the real namespace, while the prefix is only a conventional abbreviation.
- Examples
- xmlns:iccat="http://www.iccat.es/schema"
- xmlns:fi="http://www.fao.org/fi/figis/devcon/"
- xmlns:xs="http://www.w3.org/2001/XMLSchema"
- xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
- xsi:schemaLocation="http://www.iccat.es/schema iccat.xsd"
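The point that the URI, not the prefix, is the real namespace shows up directly in how a parser reports names. In the sketch below, Python's ElementTree expands each qualified name to "{URI}local-part". The <report> wrapper element and the element text are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Two vocabularies that both define InterestFisheries; the wrapper element
# <report> and the text content are hypothetical, added for this example.
DOC = """<report xmlns:fi="http://www.fao.org/fi/figis/devcon/"
        xmlns:iccat="http://www.iccat.es/schema">
  <fi:InterestFisheries>FAO view</fi:InterestFisheries>
  <iccat:InterestFisheries>ICCAT view</iccat:InterestFisheries>
</report>"""

# Prefix-to-URI map used for lookups; the prefixes here are local to this
# program and need not match the prefixes used in the document.
NS = {"fi": "http://www.fao.org/fi/figis/devcon/",
      "iccat": "http://www.iccat.es/schema"}

root = ET.fromstring(DOC)
# ElementTree expands each QName to "{namespace-URI}local-part".
expanded = [child.tag for child in root]
fi_text = root.findtext("fi:InterestFisheries", namespaces=NS)
iccat_text = root.findtext("iccat:InterestFisheries", namespaces=NS)
print(expanded)
print(fi_text, "/", iccat_text)
```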
25XML Namespaces
- Namespaces in XML are optional
- Namespaces ensure that element names are unique and resolve conflicts among the names of elements
- In different contexts a given tag might mean different things, e.g. consider <BOOK>
- To me it might mean a book in a bibliography
- To a bookshop it might contain stock details
- To a travel agent it might contain information about flight bookings!
- Namespaces attach unique labels to a given tag set.
- URLs are usually used as namespace labels.
26Document Sources: More Information
- References
- Kenneth M. Anderson, http://www.cs.colorado.edu/users/kena/classes/3308/f00/lectures
- Tim Brailsford, http://www.cs.nott.ac.uk/tjb/iti-xml/
- General XML Information
- http://www.w3c.org/xml/, http://www.xml.com/
- Free XML Parsers
- http://xml.apache.org/ - Java and C++ parsers (with bindings for Perl and COM)
- http://www.alphaworks.ibm.com/tech/xml4j - IBM's Java parser for XML
- http://www.alphaworks.ibm.com/tech/xml4c - IBM's C++ parser for XML
- http://www.opentext.com/services/content_management_services/xml_sgml_solutions.html
27Introduction to XML Schema
28Resources
- An Introduction to XML Schema
- http://www.cs.colorado.edu/kena/classes/7818/f01/presentations/schema.ppt
- XML Schema is a W3C Recommendation
- http://www.w3.org/XML/Schema
29Motivation
- Purpose of DTD
- Sharing grammar/data with others
- Validation by the parser
- Defaulting of values.
- Shortcomings of DTD
- a very limited capability for specifying datatypes
- a set of datatypes incompatible with those found in databases
- a syntax inconsistent with XML
30XML Schema Requirements
- Structural Schemas
- Besides covering what DTDs provide, there are specific goals beyond DTD
- Integration with namespaces
- Definition of incomplete constraints on the content of an element type
- Integration of structural schemas with primitive data types
- Inheritance
31XML Schema Requirements (2)
- Primitive Data Typing
- Based on experience with SQL, Java primitives.
- Conformance
- Define the relation of schemata to XML document
instances, and obligations on schema-aware
processors.
32Example (DTD)
- BookStore.dtd
- <!ELEMENT BookStore (Book+)>
- <!ELEMENT Book (Title, Author, Date, ISBN, Publisher)>
- <!ELEMENT Title (#PCDATA)>
- <!ELEMENT Author (#PCDATA)>
- <!ELEMENT Date (#PCDATA)>
- <!ELEMENT ISBN (#PCDATA)>
- <!ELEMENT Publisher (#PCDATA)>
33Example (Schema)
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.books.org"
            xmlns="http://www.books.org"
            elementFormDefault="qualified">
  <xsd:element name="BookStore">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="Book" minOccurs="1" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="Book">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="Title" minOccurs="1" maxOccurs="1"/>
        <xsd:element ref="Author" minOccurs="1" maxOccurs="1"/>
        <xsd:element ref="Date" minOccurs="1" maxOccurs="1"/>
        <xsd:element ref="ISBN" minOccurs="1" maxOccurs="1"/>
        <xsd:element ref="Publisher" minOccurs="1" maxOccurs="1"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="Title" type="xsd:string"/>
  <xsd:element name="Author" type="xsd:string"/>
  <xsd:element name="Date" type="xsd:string"/>
  <xsd:element name="ISBN" type="xsd:string"/>
  <xsd:element name="Publisher" type="xsd:string"/>
</xsd:schema>
34Example (vocabulary)
35Data Types
- Complex types allow elements in their content and may carry attributes
- Simple types cannot have element content and cannot carry attributes, such as integer.
- A ur-type definition is present in each XML Schema, serving as the root of the type definition hierarchy for that schema.
- Primitive datatypes are those that are not defined in terms of other datatypes
- Derived datatypes are those that are defined in terms of other datatypes.
37An Instance Document
<?xml version="1.0"?>
<BookStore xmlns="http://www.books.org"                              (1)
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"     (2)
           xsi:schemaLocation="http://www.books.org BookStore.xsd">  (3)
  <Book>
    <Title>My Life and Times</Title>
    <Author>Paul McCartney</Author>
    <Date>July, 1998</Date>
    <ISBN>94303-12021-43892</ISBN>
    <Publisher>McMillin Publishing</Publisher>
  </Book>
  ...
</BookStore>
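How a processor finds the schema for an instance document can be sketched as follows. The code reads the xsi:schemaLocation attribute from the root element and splits it into pairs of namespace URI and schema location. It does not perform schema validation, which Python's standard library does not provide; the instance is shortened to a single Title for brevity, and the helper name schema_locations is an assumption.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Shortened version of the instance document from the slide.
INSTANCE = """<?xml version="1.0"?>
<BookStore xmlns="http://www.books.org"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:schemaLocation="http://www.books.org BookStore.xsd">
  <Book>
    <Title>My Life and Times</Title>
  </Book>
</BookStore>"""

def schema_locations(root):
    """Split xsi:schemaLocation into (namespace URI, schema location) pairs."""
    tokens = root.get(f"{{{XSI}}}schemaLocation", "").split()
    return list(zip(tokens[0::2], tokens[1::2]))

root = ET.fromstring(INSTANCE)
# The root tag is reported as "{uri}BookStore"; pull out the uri part.
root_namespace = root.tag[1:].split("}")[0]
pairs = schema_locations(root)
print(root_namespace)
print(pairs)
```

The namespace of the root element matches the first half of the pair, which is how the attribute tells a processor which schema file describes which namespace.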
38Multiple Level Checking
[Diagram labels: BookStore.xml, BookStore.xsd, Validator, XMLSchema.xsd]
39Web Service Summary
40Web Service
- Three Main Parts
- Simple Object Access Protocol (SOAP)
- Web Services Description Language (WSDL)
- Universal Description, Discovery, and Integration
(UDDI)
41Web Service
Web Service Stack Diagram
42Web Service
- Introduction to SOAP
- SOAP is used to transfer information
- Simple XML Message
- Remote Procedure Call
- Strengths of SOAP
- Lightweight Protocol
- Text-based XML format
- Can use HTTP Protocol
43Web Service
- Simple Object Access Protocol (SOAP)
- SOAP Message
- Envelope
- Header: client authentication, transaction management
- Body: includes the information that the receiver should finally get
- Fault element
44Web Service
- Simple Object Access Protocol (SOAP)
- SOAP Message
- Envelope: top element of a SOAP message
- Header: client authentication, transaction management
- actor and mustUnderstand attributes of the auth element
- Body: includes the information that the receiver should finally get
- Information: RPC request, RPC result, error in execution
- Fault element
- SOAP Encoding
- How to process data
- Ex) String title = "Book" -> <title xsi:type="xsd:string">Book</title>
- encodingStyle attribute
- env:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/"
- Simple Type
- Compound Type
45Web Service
- SOAP Encoding
- Simple Type
- int
- float
- negativeInteger
- string
- enumeration
- Compound Type
- Compound type value and structure
- Array
46Web Service
- SOAP Encoding
- Compound Type
- <ns0:addBook3>
- <Book_1 href="#ID1"/>
- </ns0:addBook3>
- <book id="ID1" xsi:type="ns1:Book">
- <title xsi:type="xsd:string">book1</title>
- <price xsi:type="xsd:int">29000</price>
- </book>
- Deserialization by the message receiver
- Book Book_1 = new Book();
- Book_1.setTitle("book1");
- Book_1.setPrice(29000);
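The deserialization step can be sketched in Python: follow the href to the element carrying the matching id, convert each field according to its xsi:type, and build an object. This is a simplified illustration, not a SOAP library. The namespace URIs bound to ns0 and ns1, the <body> wrapper, the Book dataclass, and the CONVERTERS table are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Namespace URIs for ns0 and ns1 and the <body> wrapper are placeholders
# invented for this example.
MESSAGE = """<body xmlns:ns0="urn:example-service" xmlns:ns1="urn:example-types"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <ns0:addBook3><Book_1 href="#ID1"/></ns0:addBook3>
  <book id="ID1" xsi:type="ns1:Book">
    <title xsi:type="xsd:string">book1</title>
    <price xsi:type="xsd:int">29000</price>
  </book>
</body>"""

@dataclass
class Book:
    title: str
    price: int

# Converters for the two simple types used in the message.
CONVERTERS = {"xsd:string": str, "xsd:int": int}

def deserialize_book(root, href):
    """Follow an href such as "#ID1" to the element with that id and build a Book."""
    target = next(e for e in root.iter() if e.get("id") == href.lstrip("#"))
    fields = {}
    for child in target:
        convert = CONVERTERS[child.get(f"{{{XSI}}}type")]
        fields[child.tag] = convert(child.text)
    return Book(**fields)

root = ET.fromstring(MESSAGE)
reference = root.find("{urn:example-service}addBook3/Book_1")
book_1 = deserialize_book(root, reference.get("href"))
print(book_1)
```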
47Web Service
- Example of SOAP Message: request to get the weather for a zip code
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
                   xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance"
                   xmlns:xsd="http://www.w3.org/1999/XMLSchema">
  <SOAP-ENV:Body>
    <ns1:getTemp xmlns:ns1="urn:xmethods-Temperature"
                 SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
      <zipcode xsi:type="xsd:string">11211</zipcode>
    </ns1:getTemp>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>
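From the receiver's side, handling such a request amounts to finding the Body, taking its first child as the remote procedure call, and reading the parameters. The sketch below does this with Python's standard ElementTree module; the function name read_rpc_request is an assumption, and no type conversion or fault handling is attempted.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"

MESSAGE = """<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
                   xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance"
                   xmlns:xsd="http://www.w3.org/1999/XMLSchema">
  <SOAP-ENV:Body>
    <ns1:getTemp xmlns:ns1="urn:xmethods-Temperature"
                 SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
      <zipcode xsi:type="xsd:string">11211</zipcode>
    </ns1:getTemp>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>"""

def read_rpc_request(xml_text):
    """Return (service namespace, method name, parameters) from a SOAP RPC request."""
    envelope = ET.fromstring(xml_text)
    body = envelope.find(f"{{{SOAP_ENV}}}Body")
    call = body[0]                              # first child of Body is the call
    namespace, method = call.tag[1:].split("}")  # "{uri}getTemp" -> uri, getTemp
    params = {param.tag: param.text for param in call}
    return namespace, method, params

namespace, method, params = read_rpc_request(MESSAGE)
print(namespace, method, params)
```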
48Web Service
- Simple Object Access Protocol (SOAP)
- SOAP Message Transport
- Binding: how to combine with a transport protocol
- HTTP Binding
- HTTP: start line, header, body
- SOAPAction: for RPC
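The HTTP binding can be pictured as an ordinary POST request whose body is the SOAP envelope and whose headers include SOAPAction. The sketch below only constructs the request object with Python's urllib and never sends it. The endpoint URL and the SOAPAction value are hypothetical, and the envelope is reduced to an empty Body.

```python
import urllib.request

# Hypothetical endpoint and SOAPAction value; neither comes from the slides.
ENDPOINT = "http://example.com/soap/temperature"
SOAP_ACTION = '"urn:xmethods-Temperature#getTemp"'

# Minimal envelope with an empty Body, used as the HTTP request body.
ENVELOPE = (
    '<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">'
    "<SOAP-ENV:Body/>"
    "</SOAP-ENV:Envelope>"
).encode("utf-8")

def build_http_request(envelope):
    """Wrap a SOAP envelope in an HTTP POST request (constructed, not sent)."""
    return urllib.request.Request(
        ENDPOINT,
        data=envelope,
        headers={
            "Content-Type": "text/xml; charset=utf-8",
            "SOAPAction": SOAP_ACTION,
        },
        method="POST",
    )

request = build_http_request(ENVELOPE)
print(request.get_method(), request.full_url)   # start line information
for name, value in request.header_items():      # headers (names are case-insensitive)
    print(f"{name}: {value}")
print(request.data.decode("utf-8"))              # body
```

urllib normalizes header names when storing them, so the header is retrieved as "Soapaction"; HTTP header names are case-insensitive, so this does not change the meaning of the request.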
49Web Service
- Web Services Description Language (WSDL)
- Specification of Web Service Function
- Document Structure
<definitions>
  <types> Complex types for arguments and return types </types>
  <message> Describe arguments and return values </message>
  <portType> // interface
    <operation> Describe remote procedures </operation>
  </portType>
  <binding> Protocol used for invoking: SOAP (application client), HTTP (web client), MIME (char binary data) </binding>
  <service>
    <port> URL of Web Service (endpoint) </port>
  </service>
</definitions>
50Web Service
- Universal Description, Discovery, and Integration (UDDI)
- Create, Store, Search information
- UDDI Data Structure
- White Pages information: company name, address, telephone number, and description
- Yellow Pages information: classification by industry (NAICS), by product (UNSPSC), and by area
- Green Pages information: technical information about the company, e.g. endpoint URL, URL of WSDL document
51Web Service
UDDI Structure
Element Name           Usage                                                                 Information Classification
<businessEntity>       Company name, address                                                 Corresponds to White Pages
<publisherAssertion>   Association among businessEntity elements                             Corresponds to White Pages
<identifierBag>        Substitution ID for businessEntity                                    Corresponds to Yellow Pages
<categoryBag>          Information for classification                                        Corresponds to Yellow Pages
<businessService>      Web Service name and description for the company                      Corresponds to Green Pages
<bindingTemplate>      Endpoint URL, tModel reference                                        Corresponds to Green Pages
<tModel>               URL of WSDL to define methods, argument data types for Web service    Corresponds to Green Pages
52Web Service
<tModel> </tModel>
<businessEntity>
  <businessService>
    <bindingTemplate> Reference </bindingTemplate>
  </businessService>
</businessEntity>
<publisherAssertions>
  <publisherAssertion> Association </publisherAssertion>
</publisherAssertions>
53Web Service Demonstration