Title: eXtensible Markup Language
1eXtensible Markup Language
- Jesús Ibáñez, Toni Navarrete, Josep Blat
- Universitat Pompeu Fabra
2eXtensible Markup Language
- New Internet mark-up metalanguage
- Previously SGML, HTML, DHTMLs
- Extensibility, structure and validation
- SGML adaptation for WWW
3eXtensible Markup Language
- Defined as a standard by the W3C (Generic SGML Editorial Review Board, XML Working Group)
- XML ≠ HTML
- XML = SGML--
- XML, DTD (Document Type Definition) and XSL (eXtensible Stylesheet Language)
4Main Characteristics
- Semantic description of document content
- Uncoupling the semantic description from the presentation
- Allowing each user community to define its own tags, for instance <PRICE>, <AUTHOR>, <SECTION>, <DATE>, <IMPORTANCE LEVEL="Expert">
5XML Example (without DTD)
- <?xml version="1.0" standalone="yes"?>
- <conversation>
- <greeting>Hello world!</greeting>
- <answer>Stop it, I'm getting off!</answer>
- </conversation>
6Example with DTD (1)
- <!DOCTYPE Book [
- <!ELEMENT Book (Title, Author, Date, ISBN, Publisher)>
- <!ELEMENT Title (#PCDATA)>
- <!ELEMENT Author (#PCDATA)>
- <!ELEMENT Date (#PCDATA)>
- <!ELEMENT ISBN (#PCDATA)>
- <!ELEMENT Publisher (#PCDATA)>
- ]>
7Example with DTD (2)
- <?xml version="1.0" standalone="no"?>
- <!DOCTYPE Book SYSTEM "file://localhost/xml-course/xsl/Book.dtd">
- <Book>
- <Title>My Life and Times</Title>
- <Author>Paul McCartney</Author>
- <Date>July, 1998</Date>
- <ISBN>94303-12021-43892</ISBN>
- <Publisher>McMillan Publishing</Publisher>
- </Book>
8DTDs
- Allow new sets of tags to be created
- Examples
- <!ELEMENT Title (#PCDATA)>
- <!ELEMENT Disk (Disk)+> (1 or more)
- <!ELEMENT Book (Book)*> (0 or more)
- ? (0 or 1), "," (sequence), | (option)
- Attributes
- <!ATTLIST ARTICLE DATE CDATA #IMPLIED> (CDATA means Character Data)
- <!ATTLIST PERSON GENDER (male | female) #IMPLIED> (optional)
- <!ATTLIST PERSON GENDER (male | female) #REQUIRED> (required; a default value such as "male" can be given instead of #REQUIRED, but not together with it)
9DTDs
- <!DOCTYPE Discography [
- <!ELEMENT Discography (Disk+)>
- <!ELEMENT Disk (Title, Group, Song+)>
- <!ELEMENT Title (#PCDATA)>
- <!ELEMENT Group (#PCDATA)>
- <!ELEMENT Song (titleS, Duration)>
- <!ELEMENT titleS (#PCDATA)>
- <!ELEMENT Duration (#PCDATA)>
- ]>
10DTDs
- <Discography>
- <Disk>
- <Title>Brothers in Arms</Title>
- <Group>Dire Straits</Group>
- <Song>
- <titleS>Money for nothing</titleS>
- <Duration>520</Duration>
- </Song>
- <Song>
- <titleS>So far away</titleS>
- <Duration>410</Duration>
- </Song>
- ...
- </Disk>
- <Disk>
- <Title>On every street</Title>
- <Group>Dire Straits</Group>
- <Song>
- ...
11DTDs
- <!DOCTYPE publications [
- <!ELEMENT publications (disk | book)*>
- <!ELEMENT book ... >
- <!ELEMENT disk ... >
- ]>
12DTDs
- <publications>
- <disk>
- <titledisk>Brothers in Arms</titledisk>
- <group>Dire Straits</group>
- <song>
- <titleS>Money for nothing</titleS>
- <duration>520</duration>
- </song>
- ...
- </disk>
- <book>
- <titlebook>Cien años de soledad</titlebook>
- <writer>Gabriel García Márquez</writer>
- ...
- </book>
- <book>
- <titlebook>La ciudad de los prodigios</titlebook>
- <writer>Eduardo Mendoza</writer>
- ...
13DTDs
- <?xml version="1.0"?>
- <!DOCTYPE file [
- <!ELEMENT file (name, surname+, address, picture?)>
- <!ELEMENT name (#PCDATA)>
- <!ATTLIST name sex (male | female) #IMPLIED>
- <!ELEMENT surname (#PCDATA)>
- <!ELEMENT address (#PCDATA)>
- <!ELEMENT picture EMPTY>
- ]>
- <file>
- <name sex="male">Toni</name>
- <surname>Navarrete</surname>
- <surname>Terrasa</surname>
- <address>Rambla 32</address>
- </file>
14Well formed vs valid
- Valid XML: the content conforms to the rules of the associated DTD.
- The completeness, correct format and attribute values of the XML data are ensured.
- Well formed: conforms to XML syntax
- An XML document without a DTD can be well formed but, of course, cannot be valid.
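The well-formedness half of this distinction can be sketched with Python's standard library. This is only an illustration: xml.etree.ElementTree checks syntax and does not validate against a DTD, and the helper name is_well_formed is a hypothetical one chosen here.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text parses as XML (syntax only, no DTD validation)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


good = "<conversation><greeting>Hello world!</greeting></conversation>"
# <greeting> is never closed, so the closing tag does not match
bad = "<conversation><greeting>Hello world!</conversation>"

print(is_well_formed(good))
print(is_well_formed(bad))
```

Checking validity as well would need a validating parser, which the Python standard library does not provide.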
15XML Schemata
- XML Schemata define the structure of XML documents (same as DTDs)
- BUT in XML syntax. Advantages: same parser to validate, tools for dynamic creation
- Use of Namespaces
- Improved data type definition (41 instead of 10, plus user-defined)
- Object orientation: allows new types by extension or restriction of previous ones
- Validation (a document w.r.t. a schema, a schema w.r.t. the schema of schemas)
16Schema definition
- An XML document whose root is schema and within which elements and attributes are defined
- <?xml version="1.0"?>
- <schema>
- ... elements and attributes definition
- </schema>
- Element definition
- <element name="name of the element"
- type="type of the element"
- options...
- >
17Simple types of elements
- string: character string
- boolean (false, 0, true, 1)
- float (32 bits)
- double (64 bits)
- decimal (arbitrary precision; integer is derived from it)
- timeDuration
- recurringDuration (several subtypes)
- binary
- uriReference (Uniform Resource Identifier)
- And types derived from these basic ones
18Data type structure
19Example
- <?xml version="1.0" encoding="ISO-8859-1"?>
- <bookshop>
- <book isbn="84-111-1111-1">
- <title>El Quijote</title>
- <author>Miguel de Cervantes</author>
- <publisher>Plaza y Janés</publisher>
- <character>Don Quijote</character>
- <character>Sancho Panza</character>
- <character>Dulcinea</character>
- <character>Rocinante</character>
- </book>
- <book isbn="84-222-2222-2">
- <title>La ciudad de los prodigios</title>
- <author>Eduardo Mendoza</author>
- <publisher>Seix-Barral</publisher>
- <character>Onofre Bouvila</character>
- <character>Efren Castells</character>
- </book>
- <book isbn="84-333-3333-3">
XML document previous to schema definition
20Building blocks simple elements and cardinality
- Simple elements
- <element name="title" type="string" />
- <element name="author" type="string" />
- <element name="publisher" type="string" />
- <element name="character"
- minOccurs="0" maxOccurs="unbounded" />
- A DTD would be like
- <!ELEMENT title (#PCDATA)>
- In the cardinality definition, minOccurs and maxOccurs replace the DTD symbols ?, * and +
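The correspondence between the DTD occurrence symbols and the minOccurs/maxOccurs attributes can be sketched as a small lookup table. The names OCCURRENCE and element_decl are hypothetical helpers introduced only for this illustration; the generated text follows the element syntax used on these slides.

```python
# DTD occurrence symbol -> (minOccurs, maxOccurs) in XML Schema
OCCURRENCE = {
    "": ("1", "1"),            # no symbol: exactly one
    "?": ("0", "1"),           # zero or one
    "*": ("0", "unbounded"),   # zero or more
    "+": ("1", "unbounded"),   # one or more
}


def element_decl(name, type_name, symbol=""):
    """Build an <element> declaration equivalent to a DTD occurrence symbol."""
    min_occurs, max_occurs = OCCURRENCE[symbol]
    return (f'<element name="{name}" type="{type_name}" '
            f'minOccurs="{min_occurs}" maxOccurs="{max_occurs}" />')


print(element_decl("character", "string", "*"))
```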
21Building blocks Complex types
- The element book is composite, thus we define it as a complex type
- <element name="book">
- <complexType>
- <sequence>
- <element name="title" type="string" />
- <element name="author" type="string" />
- <element name="publisher" type="string" />
- <element name="character" minOccurs="0" maxOccurs="unbounded" />
- </sequence>
- </complexType>
- </element>
22Alternative naming complex types
- We could also define a complex type with a name
- <element name="book" type="Booktype" />
- <complexType name="Booktype">
- <element name="title" type="string" />
- <element name="author" type="string" />
- <element name="publisher" type="string" />
- <element name="character" minOccurs="0" maxOccurs="unbounded" />
- </complexType>
23Remark the combination of both is not allowed
- <element name="book" type="Booktype">
- <complexType name="Booktype">
- <element name="title" type="string" />
- <element name="author" type="string" />
- <element name="publisher" type="string" />
- <element name="character" minOccurs="0" maxOccurs="unbounded" />
- </complexType>
- </element>
24Building blocks empty elements
- Elements such as the HTML tags <hr> or <img ...> are empty
- <hr />
- <img src="image.gif" />
- Empty has to be declared as an implicit complex type
<element name="hr">
<complexType content="empty" />
</element>
<element name="img">
<complexType content="empty">
<attribute name="src" type="string" />
</complexType>
</element>
25A level upwards ...
- Let us define bookshop
- <element name="bookshop">
- <complexType>
- <element name="book"
- minOccurs="0" maxOccurs="unbounded">
- <complexType>
- ...
- </complexType>
- </element>
- </complexType>
- </element>
A schema definition is a BOTTOM-UP process
26Attribute definition
- Elements can have attributes associated with them
- In DTDs, we would write
- <!ATTLIST book isbn CDATA #REQUIRED>
- In XML Schema
- <attribute name="name of the attribute"
- type="type of the attribute"
- options of the attribute ...
- >
27Attribute definition
- Attributes are declared at the end of the element definition
- <element name="book" minOccurs="0" maxOccurs="unbounded">
- <complexType>
- <element name="title" type="string" />
- <element name="author" type="string" />
- <element name="publisher" type="string" />
- <element name="character"
- minOccurs="0" maxOccurs="unbounded" />
- <attribute name="isbn" type="string" />
- </complexType>
- </element>
28General ordering
- The definitions are ordered for better legibility
- 1) Simple types definition
- 2) Attributes definition
- 3) Complex types definition
29Referencing the schema
- We then add the schema reference in the XML document. Assume the document is book.xml and the bookshop schema is book.xsd; then we would write
- <?xml version="1.0" encoding="ISO-8859-1"?>
- <bookshop
- xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance"
- xsi:noNamespaceSchemaLocation="book.xsd"
- >
- ...
- </bookshop>
30Namespaces
- An XML Namespace is a collection of names (of elements and attributes) identified by a URI
- Namespaces are a very flexible tool. They promote the re-use and mixing of schemata and names.
- For instance, we could use elements from two namespaces
- <BOOKS>
- <bk:BOOK xmlns:bk="urn:BookLovers.org:BookInfo"
- xmlns:money="urn:Finance:Money">
- <bk:TITLE>A Suitable Boy</bk:TITLE>
- <bk:PRICE money:currency="US Dollar">22.95</bk:PRICE>
- </bk:BOOK>
- </BOOKS>
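How a namespace-aware parser sees such a document can be sketched with Python's xml.etree.ElementTree, which expands every prefixed name to the form {URI}localname. The prefix map ns is a local convenience chosen for this example; only the URIs have to match those in the document.

```python
import xml.etree.ElementTree as ET

doc = """
<BOOKS>
  <bk:BOOK xmlns:bk="urn:BookLovers.org:BookInfo"
           xmlns:money="urn:Finance:Money">
    <bk:TITLE>A Suitable Boy</bk:TITLE>
    <bk:PRICE money:currency="US Dollar">22.95</bk:PRICE>
  </bk:BOOK>
</BOOKS>
"""

root = ET.fromstring(doc)

# Prefixes used in the search paths below; they need not equal the document's
ns = {"bk": "urn:BookLovers.org:BookInfo", "money": "urn:Finance:Money"}

title = root.find("bk:BOOK/bk:TITLE", ns)
price = root.find("bk:BOOK/bk:PRICE", ns)
# Attributes in a namespace are addressed by their expanded name
currency = price.get("{urn:Finance:Money}currency")

print(title.tag)
print(title.text, price.text, currency)
```

The element BOOKS has no prefix and no default namespace applies to it, so its tag stays a plain local name.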
31Namespaces
- http://www.w3.org/2000/10/XMLSchema
- This is the Namespace for the schemata. The prefix xsd is usually used; if there is no prefix, it is the default namespace
- http://www.w3.org/2000/10/XMLSchema-instance
- Namespace for the documents instantiated from a schema. The prefix xsi is usually used.
32Example
- <schema xmlns="http://www.w3.org/2000/10/XMLSchema" (1)
- targetNamespace="http://www.upf.es/namespaces/Book" (2)
- elementFormDefault="qualified" (3)
- xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance"
- xsi:schemaLocation=
- "http://www.w3.org/2000/10/XMLSchema
- http://www.w3.org/2000/10/XMLSchema.xsd"
- xmlns:bk="http://www.upf.es/namespaces/Book">
(1) Indicates the default namespace, which is XMLSchema
(2) Indicates that the elements and attributes in this schema are defined in the namespace http://www.upf.es/namespaces/Book
(3) Indicates that all the elements created in this namespace and used in the instantiated documents have to be qualified (if we had used unqualified, only the global elements would have to be)
33Example (2)
- <schema xmlns="http://www.w3.org/2000/10/XMLSchema"
- targetNamespace="http://www.upf.es/namespaces/Book"
- elementFormDefault="qualified"
- xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance" (4)
- xsi:schemaLocation= (5)
- "http://www.w3.org/2000/10/XMLSchema (6)
- http://www.w3.org/2000/10/XMLSchema.xsd" (7)
- xmlns:bk="http://www.upf.es/namespaces/Book">
(4) Indicates that this XML document is instantiated from the general Schema of Schemata
(5) The attribute schemaLocation is defined in the xsi namespace
(6) The namespace for the general Schema of Schemata
(7) URI of this Schema of Schemata
34Example (3)
- <schema xmlns="http://www.w3.org/2000/10/XMLSchema"
- targetNamespace="http://www.upf.es/namespaces/Book"
- elementFormDefault="qualified"
- xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance"
- xsi:schemaLocation=
- "http://www.w3.org/2000/10/XMLSchema
- http://www.w3.org/2000/10/XMLSchema.xsd"
- xmlns:bk="http://www.upf.es/namespaces/Book"> (8)
(8) We give a prefix to the target namespace to facilitate its use in documents, for instance <element ref="bk:Title" minOccurs="1" maxOccurs="1"/>
35Example (and 4)
- In the instantiated document
- <bookshop xmlns="http://www.upf.es/namespaces/Book" (1)
- xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance" (2)
- xsi:schemaLocation="http://www.upf.es/namespaces/book.xsd"> (3)
(1) We define the default namespace of the document
(2) We include the namespace where schema instantiation is defined (xsi)
(3) With schemaLocation we specify where the Schema for this document is (book.xsd)
36Other important concepts
- ID and IDREFS
- DOM (Document Object Model)
- X-path
- X-pointer
- X-link
37ID and IDREFS
- ID: attribute for the unique identification of an element. Similar role to a URI. Example assigning the identity attack
- <paragraph id="attack">Suddenly the skies were filled with aircraft</paragraph>
- IDREFS (identity references): the easiest way of referring to an ID. Example: in a DTD we declared, as attributes of employee, empnumber as an ID and boss as IDREFS; here we say that Hank's ID is emp126 and his boss is emp124 (defined earlier)
- <employee empnumber="emp126" boss="emp124">Hank</employee>
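Resolving such references can be sketched as follows. This is an illustration under stated assumptions: the root element staff and the name given to employee emp124 are hypothetical, and since xml.etree.ElementTree does not read attribute types from a DTD, the code simply treats empnumber as the ID and boss as a whitespace-separated list of references.

```python
import xml.etree.ElementTree as ET

# <staff> and the name "Alice" are invented for the example
doc = """
<staff>
  <employee empnumber="emp124">Alice</employee>
  <employee empnumber="emp126" boss="emp124">Hank</employee>
</staff>
"""

root = ET.fromstring(doc)

# Index every employee by its ID value
by_id = {e.get("empnumber"): e for e in root.iter("employee")}


def bosses_of(employee):
    """Follow the IDREFS-style boss attribute to the referenced elements."""
    return [by_id[ref] for ref in employee.get("boss", "").split()]


hank = by_id["emp126"]
print([boss.text for boss in bosses_of(hank)])
```

A validating parser would additionally check that every ID is unique and that every reference points to an existing ID.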
38DOM (Document Object Model)
- DOM is a technology for accessing and manipulating parts of an XML document
- DOM models a document as a tree whose nodes are its elements
- Properties and methods of these objects then allow access and manipulation
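A minimal sketch of DOM access and manipulation, using Python's xml.dom.minidom as one available DOM implementation. The text of the added answer element is invented for the example.

```python
from xml.dom import minidom

dom = minidom.parseString(
    "<conversation><greeting>Hello world!</greeting></conversation>"
)

# Access: navigate the tree through properties and methods
root = dom.documentElement
greeting = root.getElementsByTagName("greeting")[0]
print(root.tagName, greeting.firstChild.data)

# Manipulation: create an element with a text node and attach it to the root
answer = dom.createElement("answer")
answer.appendChild(dom.createTextNode("Hello to you too"))
root.appendChild(answer)

print(root.toxml())
```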
39X-PATH
- X-Path is a language for referencing parts of an XML document
- It is used, for instance, to transform a document through XSL
- X-Path is based upon DOM and uses paths (similar to URLs) to reference parts of a document
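Path expressions of this kind can be tried with Python's xml.etree.ElementTree, which supports only a limited subset of X-Path; the sketch below stays within that subset. The document is a shortened version of the bookshop example.

```python
import xml.etree.ElementTree as ET

doc = """
<bookshop>
  <book isbn="84-111-1111-1">
    <title>El Quijote</title>
    <character>Don Quijote</character>
    <character>Sancho Panza</character>
  </book>
  <book isbn="84-222-2222-2">
    <title>La ciudad de los prodigios</title>
    <character>Efren Castells</character>
  </book>
</bookshop>
"""

root = ET.fromstring(doc)

# Child steps: every title of every book under the root
titles = [t.text for t in root.findall("./book/title")]

# Predicate on an attribute value
second = root.find("./book[@isbn='84-222-2222-2']/title").text

# Descendant step: all character elements at any depth
characters = [c.text for c in root.findall(".//character")]

print(titles)
print(second)
print(characters)
```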
40X-POINTER
- X-Pointer is a language for pointing at a part of an XML document
- X-Pointer uses X-Path for pointing
- X-Pointer enables linking
41Linking using XML X-LINK
- X-Link is a language for describing how to link resources in XML
- We use attributes for the element link in the namespace xlink at "http://www.w3.org/XML/XLink/1.0"
- The attributes are used to describe end-points, traversal, effect, resources
42Tools
- XML Browsers (visualisers)
- XML Editors
- XML Parsers
- XML Servers
- Relational DB to XML converters
- XSL Editors
- XSL Processors
43XSL
- Allows a design to be incorporated into an XML document, generating HTML, PDF, mail, SMS message, ...
- Using CSS and DSSSL (SGML)
44XSL
<?xml version="1.0"?>
<!DOCTYPE BookCatalogue SYSTEM "file://localhost/xml-course/xsl/BookCatalogue.dtd">
<BookCatalogue>
  <Book>
    <Title>My Life and Times</Title>
    <Author>Paul McCartney</Author>
    <Date>July, 1998</Date>
    <ISBN>94303-12021-43892</ISBN>
    <Publisher>McMillin Publishing</Publisher>
  </Book>
  <Book>
    <Title>Illusions: The Adventures of a Reluctant Messiah</Title>
    <Author>Richard Bach</Author>
    <Date>1977</Date>
    <ISBN>0-440-34319-4</ISBN>
    <Publisher>Dell Publishing Co.</Publisher>
  </Book>
  <Book>
    <Title>The First and Last Freedom</Title>
    <Author>J. Krishnamurti</Author>
    <Date>1954</Date>
    <ISBN>0-06-064831-7</ISBN>
    <Publisher>Harper &amp; Row</Publisher>
  </Book>
</BookCatalogue>
45XSL
Document /
  PI <?xml version="1.0"?>
  DocumentType <!DOCTYPE BookCatalogue ...>
  Element BookCatalogue
    Element Book
      Element Title
        Text My Life ...
      Element Author
        Text Paul McCartney
      Element Date
        Text July, 1998
      Element ISBN
        Text 94303-12021-43892
      Element Publisher
        Text McMillin Publishing
    Element Book
      ...
    Element Book
      ...
46XSL
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:template match="/"> <xsl:apply-templates/> </xsl:template>
  <xsl:template match="BookCatalogue"> <xsl:apply-templates/> </xsl:template>
  <xsl:template match="Book"> <xsl:apply-templates/> </xsl:template>
  <xsl:template match="Title"> <xsl:apply-templates/> </xsl:template>
  <xsl:template match="Author"> <xsl:apply-templates/> </xsl:template>
  <xsl:template match="Date"> <xsl:apply-templates/> </xsl:template>
  <xsl:template match="ISBN"> <xsl:apply-templates/> </xsl:template>
  <xsl:template match="Publisher"> <xsl:apply-templates/> </xsl:template>
  <xsl:template match="text()"> <xsl:value-of select="."/> </xsl:template>
</xsl:stylesheet>
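Every element template in this stylesheet only recurses into its children, and the text() template copies text through, so the result is the concatenated text of the document. The Python sketch below imitates that traversal by hand to make the recursion visible; it is not an XSLT processor (the Python standard library has none), and the function name apply_templates is just a label borrowed from the stylesheet.

```python
import xml.etree.ElementTree as ET


def apply_templates(element):
    """Recurse into child elements and copy all text through, in document order."""
    parts = [element.text or ""]          # text before the first child
    for child in element:
        parts.append(apply_templates(child))
        parts.append(child.tail or "")    # text following the child
    return "".join(parts)


doc = ("<Book><Title>My Life and Times</Title>"
       "<Author>Paul McCartney</Author></Book>")
root = ET.fromstring(doc)

print(apply_templates(root))
```

The same string is returned by "".join(root.itertext()), which is the idiomatic shortcut in ElementTree.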
47XSL
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:template match="/">
    <HTML>
      <HEAD> <TITLE>Book Catalogue</TITLE> </HEAD>
      <BODY> <xsl:apply-templates/> </BODY>
    </HTML>
  </xsl:template>
  <xsl:template match="BookCatalogue"> <xsl:apply-templates/> </xsl:template>
  <xsl:template match="Book"> <xsl:apply-templates/> </xsl:template>
  <xsl:template match="Title"> <xsl:apply-templates/> </xsl:template>
  <xsl:template match="Author"> <xsl:apply-templates/> </xsl:template>
  <xsl:template match="Date"> <xsl:apply-templates/> </xsl:template>
  <xsl:template match="ISBN"> <xsl:apply-templates/> </xsl:template>
  <xsl:template match="Publisher"> <xsl:apply-templates/> </xsl:template>
  <xsl:template match="text()"> <xsl:value-of select="."/> </xsl:template>
</xsl:stylesheet>
BookCatalogue.xsl (the HTML, HEAD, TITLE and BODY tags in the first template were added to the previous stylesheet)
48XML-based formats
- XML is an architecture, not an application
- SMIL (Synchronized Multimedia Integration Language)
- RDF (Resource Description Framework) for metadata
- CDF (Channel Definition Format): Microsoft channels
- MathML (Mathematical Markup Language)
- CML (Chemical Markup Language)
- BSML (Bioinformatic Sequence Markup Language)
- JML
- WIDL (B2B integration)
49Processing
- Two approaches to processing XML documents using Java as the programming language
- DOM (Document Object Model)
- tree structure (nodes, elements and text), most used
- SAX (serial access with the Simple API for XML)
- event based
- Fastest, lower memory requirements, more difficult to program
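The event-based style can be sketched as follows. The slides refer to Java; this illustration uses Python's xml.sax module instead, whose handler callbacks (startElement, characters, endElement) mirror the SAX interface. The class name TitleCollector and the shortened bookshop document are assumptions made for the example.

```python
import xml.sax


class TitleCollector(xml.sax.ContentHandler):
    """Collect the text of every <title> element as parse events arrive."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it
        if self._in_title:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._in_title = False


doc = (b"<bookshop>"
       b"<book><title>El Quijote</title></book>"
       b"<book><title>La ciudad de los prodigios</title></book>"
       b"</bookshop>")

handler = TitleCollector()
xml.sax.parseString(doc, handler)
print(handler.titles)
```

No tree is kept in memory, which is where the lower memory use comes from; the price is that the handler has to track its own state, as the _in_title flag shows.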
50Some references
- http://www.w3.org/
- Official web site with all the standards
- http://www.xml.com/
- Web site from O'Reilly publishers. A lot of good documentation and resources.
- http://www.xfront.com/
- Very good tutorials on XSL and XML Schema
- http://xml.apache.org
- Apache parsers and documentation (Xerces, Xalan, ...)
- Java and XML. B. McLAUGHLIN. O'Reilly, 2000
- Interesting on their combination using Apache parsers