Title: XML Extensible Markup Language
1XML Extensible Markup Language
- Aleksandar Bogdanovski
- Programming Environment LABoratory
- alebo_at_ida.liu.se
2What is XML?
- Informal definition
- XML is a markup language for representing documents that contain structured information.
- New in XML?
- http://www.w3.org/XML/1999/XML-in-10-points.html
3Document
- The word "document" refers not only to traditional documents, but also to the countless number of other XML "data formats". These include vector graphics, e-commerce transactions, mathematical equations, object metadata, server APIs, and a thousand other kinds of structured information.
4SGML XML HTML
- SGML provides arbitrary structure. Full SGML systems solve large, complex problems that justify their expense.
- XML is defined as an application profile of SGML or, roughly speaking, a restricted form of SGML. XML specifies neither semantics nor a tag set. XML provides a facility to define tags and the structural relationships between them.
- In HTML, both the tag semantics and the tag set are fixed.
5Ten Commandments of XML
- XML shall be straightforwardly usable over the Internet.
- XML shall support a wide variety of applications.
- XML shall be compatible with SGML.
- It shall be easy to write programs which process XML documents.
- The number of optional features in XML is to be kept to the absolute minimum, ideally zero.
- XML documents should be human-legible and reasonably clear.
- The XML design should be prepared quickly.
- The design of XML shall be formal and concise.
- XML documents shall be easy to create.
- Terseness in XML markup is of minimal importance.
6Look and feel of XML Document
- A simple XML document:
    <?xml version="1.0"?>
    <oldjoke>
    <burns>Say <quote>goodnight</quote>, Gracie.</burns>
    <allen><quote>Goodnight, Gracie.</quote></allen>
    <applause/>
    </oldjoke>
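To get a feel for how a program sees this document, here is a minimal sketch that parses it with Python's standard xml.etree.ElementTree module. The helper names outline and spoken_text are made up for this illustration and are not part of the slides.

```python
import xml.etree.ElementTree as ET

# The slide's sample document, reproduced as a Python string.
OLDJOKE = """<?xml version="1.0"?>
<oldjoke>
<burns>Say <quote>goodnight</quote>, Gracie.</burns>
<allen><quote>Goodnight, Gracie.</quote></allen>
<applause/>
</oldjoke>"""


def outline(xml_text):
    """Return the root tag and the tags of its direct children."""
    root = ET.fromstring(xml_text)
    return root.tag, [child.tag for child in root]


def spoken_text(xml_text, speaker):
    """Join all character data found inside the first <speaker> element."""
    root = ET.fromstring(xml_text)
    return "".join(root.find(speaker).itertext())


if __name__ == "__main__":
    print(outline(OLDJOKE))
    print(spoken_text(OLDJOKE, "burns"))
```

The nested quote element does not get in the way: itertext walks into it, so the line comes back as one string.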
7Markup in XML
- There are six kinds of markup in XML:
- Elements
- Entity references
- Comments
- Processing instructions (PIs)
- Marked (CDATA) sections
- Document type declarations (DTD)
8Elements
- Elements are the most common form of markup.
- Elements identify the nature of the content they surround.
- Some elements may be empty, in which case they have no content.
- An element begins with a start-tag, <element>, ends with an end-tag, </element>, and has some content in between.
- Example:
    <burns>Say <quote>goodnight</quote>, Gracie.</burns>
- Attributes are name-value pairs that occur in start-tags after the element name.
- Example:
    <volvo type="s40">
- In XML, all attribute values must be quoted.
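The quoting rule is easy to try out. The following sketch, again using Python's xml.etree.ElementTree, reads the attributes of a single element and shows that an unquoted value is rejected by the parser. The function names attributes_of and parses are assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def attributes_of(xml_text):
    """Parse a single element and return its attributes as a plain dict."""
    return dict(ET.fromstring(xml_text).attrib)


def parses(xml_text):
    """Report whether the parser accepts the text at all."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    print(attributes_of('<volvo type="s40"/>'))
    print(parses("<volvo type='s40'/>"))  # single quotes are also allowed
    print(parses("<volvo type=s40/>"))    # unquoted value
```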
9Entity references
- Entities are used to represent special characters in XML. They are also used to refer to often repeated or varying text and to include the content of external files.
- Every entity must have a unique name.
- A special form of entity reference, called a character reference, can be used to insert arbitrary Unicode characters into your document.
- Character references take one of two forms: decimal references, &#8478;, and hexadecimal references, &#x211E;.
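As a quick check that the two forms are interchangeable, this sketch parses both references with Python's xml.etree.ElementTree and compares the resulting character. The wrapper element name symbol and the helper text_of are invented for the example.

```python
import xml.etree.ElementTree as ET


def text_of(xml_text):
    """Return the character data directly inside the root element."""
    return ET.fromstring(xml_text).text


# 8478 in decimal and 211E in hexadecimal name the same code point.
DECIMAL = text_of("<symbol>&#8478;</symbol>")
HEXADECIMAL = text_of("<symbol>&#x211E;</symbol>")

if __name__ == "__main__":
    print(DECIMAL == HEXADECIMAL, hex(ord(DECIMAL)))
```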
10Comments
- Comments begin with <!-- and end with -->. Comments can contain any data except the literal string --.
- Comments can be placed anywhere between markup in the document.
- Comments are not part of the textual content of an XML document. An XML processor is not required to pass them along to an application.
11Processing instructions (PIs)
- Like comments, PIs are not textually part of the XML document, but the XML processor is required to pass them to an application.
- Processing instructions have the form <?name pidata?>.
- The name, called the PI target, identifies the PI to the application. Any data that follows the PI target is optional.
- The names used in PIs may be declared as notations in order to formally identify them.
- PI names beginning with xml are reserved for XML standardization.
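To see a processor handing PIs to an application, the sketch below uses Python's xml.dom.minidom, which keeps processing instructions as nodes, and collects each PI target together with its data. The targets layout and render are hypothetical, as is the function name processing_instructions.

```python
from xml.dom import Node, minidom

# Two made-up PIs: one before the root element, one inside it.
DOC = '<?layout page="A4"?><doc><?render mode="draft"?><p>text</p></doc>'


def processing_instructions(xml_text):
    """Return (target, data) pairs for every PI, in document order."""
    found = []

    def walk(node):
        if node.nodeType == Node.PROCESSING_INSTRUCTION_NODE:
            found.append((node.target, node.data))
        for child in node.childNodes:
            walk(child)

    walk(minidom.parseString(xml_text))
    return found


if __name__ == "__main__":
    for target, data in processing_instructions(DOC):
        print(target, "->", data)
```

The parser does not interpret the data part. It is up to the application that recognizes the target to make sense of it.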
12CDATA sections
- A CDATA section instructs the parser to ignore most markup characters.
- Example:
    <![CDATA[
    *p = &q;
    b = (i <= 3);
    ]]>
- Between the start of the section, <![CDATA[, and the end of the section, ]]>, all character data is passed directly to the application, without interpretation.
- The only string that cannot occur in a CDATA section is ]]>.
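A small experiment makes the effect visible. In this sketch, which assumes Python's xml.etree.ElementTree and a made-up wrapper element named code, the same source text is accepted inside a CDATA section and rejected outside one.

```python
import xml.etree.ElementTree as ET

SOURCE = "*p = &q; b = (i <= 3);"
WITH_CDATA = "<code><![CDATA[" + SOURCE + "]]></code>"
WITHOUT_CDATA = "<code>" + SOURCE + "</code>"


def text_or_none(xml_text):
    """Return the root element's text, or None if the input is rejected."""
    try:
        return ET.fromstring(xml_text).text
    except ET.ParseError:
        return None


if __name__ == "__main__":
    print(text_or_none(WITH_CDATA))     # the characters arrive untouched
    print(text_or_none(WITHOUT_CDATA))  # & and < are treated as markup
```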
13Document Type Declaration (DTD)
- One of the greatest strengths of XML is that it allows you to create your own tag names. But it is not meaningful for tags to occur in a completely arbitrary order.
- Example:
    <gracie><quote><oldjoke>Goodnight,
    <applause/>Gracie</oldjoke></quote>
    <burns><gracie>Say <quote>goodnight</quote>,
    </gracie>Gracie.</burns></gracie>
- This doesn't make any sense, but syntactically there's nothing wrong with it. For the document to have a meaning, some constraints on the sequence and nesting of tags should be imposed.
14Document Type Declaration (DTD)
- Constraints are expressed in declarations.
- Declarations allow a document to communicate meta-information to the parser about its content.
- There are four kinds of declarations in XML:
- Element Type Declarations
- Attribute List Declarations
- Entity Declarations
- Notation Declarations
15Element Type Declaration
- Element type declarations identify the names of elements and the nature of their content.
- Example:
    <!ELEMENT oldjoke  (burns, allen, applause?)>
    <!ELEMENT burns    (#PCDATA | quote)*>
    <!ELEMENT allen    (#PCDATA | quote)*>
    <!ELEMENT quote    (#PCDATA)*>
    <!ELEMENT applause EMPTY>
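One way to read a content model such as (burns, allen, applause?) is as a regular expression over the sequence of child element names. The sketch below hand-codes that single rule in Python. It is only an illustration of the idea, not a validating parser, and the names OLDJOKE_MODEL and matches_oldjoke_model are assumptions.

```python
import re
import xml.etree.ElementTree as ET

# (burns, allen, applause?) rewritten as a regex over space-terminated names:
# a comma means "followed by" and ? means "optional".
OLDJOKE_MODEL = re.compile(r"burns allen (applause )?")


def matches_oldjoke_model(xml_text):
    """Check only the children of <oldjoke> against the content model."""
    root = ET.fromstring(xml_text)
    sequence = "".join(child.tag + " " for child in root)
    return root.tag == "oldjoke" and OLDJOKE_MODEL.fullmatch(sequence) is not None


if __name__ == "__main__":
    print(matches_oldjoke_model("<oldjoke><burns/><allen/><applause/></oldjoke>"))
    print(matches_oldjoke_model("<oldjoke><allen/><burns/></oldjoke>"))
```

A real validating processor derives such checks from the declarations themselves and also verifies the content of every nested element.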
16Attribute List Declarations
- Attribute list declarations identify which elements may have attributes, what attributes they may have, what values the attributes may hold, and what value is the default.
- Example:
    <!ATTLIST oldjoke
        name   ID                   #REQUIRED
        label  CDATA                #IMPLIED
        status ( funny | notfunny ) 'funny'>
- Each attribute in a declaration has three parts: a name, a type, and a default value.
17Attribute List Declarations
There are six possible attribute types:
- CDATA
- CDATA attributes are strings; any text is allowed.
- ID
- The value of an ID attribute must be a name. All of the ID values used in a document must be different. IDs uniquely identify individual elements in a document. An element can have only a single ID attribute.
- IDREF or IDREFS
- An IDREF attribute's value must be the value of a single ID attribute on some element in the document. The value of an IDREFS attribute may contain multiple IDREF values separated by white space.
- ENTITY or ENTITIES
- An ENTITY attribute's value must be the name of a single entity. The value of an ENTITIES attribute may contain multiple entity names separated by white space.
- NMTOKEN or NMTOKENS
- A restricted form of string attribute. An NMTOKEN attribute must consist of a single word, but there are no additional constraints. The value of an NMTOKENS attribute may contain multiple NMTOKEN values separated by white space.
- A list of names
- The value of the attribute must be taken from a specific list of names. This is frequently called an enumerated type. Alternatively, you can specify that the names must match a notation name.
18Attribute List Declarations
There are four possible default values:
- #REQUIRED
- The attribute must have an explicitly specified value on every occurrence of the element in the document.
- #IMPLIED
- The attribute value is not required, and no default value is provided. If a value is not specified, the XML processor must proceed without one.
- "value"
- An attribute can be given any legal value as a default. The attribute value is not required on each element in the document, and if it is not present, it will appear to be the specified default.
- #FIXED "value"
- An attribute declaration may specify that an attribute has a fixed value. In this case, the attribute is not required, but if it occurs, it must have the specified value. If it is not present, it will appear to be the specified default.
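The phrase "it will appear to be the specified default" can be observed directly. The sketch below assumes the expat-based parser behind Python's xml.etree.ElementTree, which does not validate but does, as far as I know, apply attribute defaults declared in the internal DTD subset. The declaration is a trimmed copy of the oldjoke example, and oldjoke_attributes is a made-up helper.

```python
import xml.etree.ElementTree as ET

# Internal DTD subset with one optional attribute and one defaulted attribute.
DTD = """<!DOCTYPE oldjoke [
<!ATTLIST oldjoke
    label  CDATA                #IMPLIED
    status ( funny | notfunny ) 'funny'>
]>"""


def oldjoke_attributes(element_text):
    """Parse the element behind the DTD and return its attributes as a dict."""
    return dict(ET.fromstring(DTD + element_text).attrib)


if __name__ == "__main__":
    print(oldjoke_attributes("<oldjoke/>"))
    print(oldjoke_attributes('<oldjoke status="notfunny" label="x"/>'))
```

The #IMPLIED attribute simply stays absent when it is not written, while the defaulted one shows up even though the start-tag never mentions it.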
19Entity Declarations
- Entity declarations allow you to associate a name with some other fragment of content. That construct can be a chunk of regular text, a chunk of the document type declaration, or a reference to an external file containing either text or binary data.
- Example:
    <!ENTITY ATI "ArborText, Inc.">
    <!ENTITY boilerplate SYSTEM "/standard/legalnotice.xml">
    <!ENTITY ATIlogo SYSTEM "/standard/logo.gif" NDATA GIF87A>
- There are three kinds of entities:
- Internal Entities
- External Entities
- Parameter Entities
20Internal Entities
- Internal entities associate a name with a string of literal text.
- Example:
    <!ENTITY ATI "ArborText, Inc.">
- Internal entities let you define shortcuts for frequently typed text or for text that is expected to change, such as the revision status of a document.
- The XML specification predefines five internal entities:
- &lt; produces the left angle bracket, <
- &gt; produces the right angle bracket, >
- &amp; produces the ampersand, &
- &apos; produces a single quote character (an apostrophe), '
- &quot; produces a double quote character, "
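Both kinds of internal entity can be tried with Python's xml.etree.ElementTree, whose underlying parser expands the five predefined entities and internal entities declared in the document's own DTD subset. This is a minimal sketch. The element names t and memo are invented, and the ATI declaration is the one from the slide.

```python
import xml.etree.ElementTree as ET

PREDEFINED = "<t>&lt; &gt; &amp; &apos; &quot;</t>"

WITH_ATI = """<!DOCTYPE memo [
<!ENTITY ATI "ArborText, Inc.">
]>
<memo>Published by &ATI;</memo>"""


def text_of(xml_text):
    """Return the character data directly inside the root element."""
    return ET.fromstring(xml_text).text


if __name__ == "__main__":
    print(text_of(PREDEFINED))
    print(text_of(WITH_ATI))
```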
21External Entities
- External entities associate a name with the content of another file.
- External entities contain either text or binary data.
- Example:
    <!ENTITY boilerplate SYSTEM "/standard/legalnotice.xml">
    <!ENTITY ATIlogo SYSTEM "/standard/logo.gif" NDATA GIF87A>
- The textual content of the external file is inserted at the point of reference and parsed as part of the referring document.
- Binary data is not parsed and may only be referenced in an attribute.
22Parameter Entities
- Parameter entities can only occur in the document type declaration.
- Parameter entity references are immediately expanded in the document type declaration, and their replacement text is part of the declaration.
- Example:
    <!ENTITY % personcontent "#PCDATA | quote">
    <!ELEMENT burns (%personcontent;)*>
    <!ELEMENT allen (%personcontent;)*>
23Notation Declaration
- Notation declarations identify specific types of external binary data. This information is passed to the processing application, which may make whatever use of it it wishes.
- Example:
    <!NOTATION GIF87A SYSTEM "GIF">
24Including a DTD
- The DTD must be the first thing in the document after the optional processing instructions and comments.
- The DTD identifies the root element of the document and may contain additional declarations.
- Example:
    <?xml version="1.0" standalone="no"?>
    <!DOCTYPE chapter SYSTEM "dbook.dtd" [
    <!ENTITY % ulink.module "IGNORE">
    <!ELEMENT ulink (#PCDATA)*>
    <!ATTLIST ulink
        xml:link        CDATA   #FIXED "SIMPLE"
        xml-attributes  CDATA   #FIXED "HREF URL"
        URL             CDATA   #REQUIRED>
    ]>
    <chapter>...</chapter>
25Well-formed Documents
- A document can only be well-formed if it obeys the syntax of XML.
The document must meet all of the following conditions:
- The document instance must conform to the grammar of XML documents.
- The replacement text for all parameter entities referenced inside a markup declaration consists of zero or more complete markup declarations.
- No attribute may appear more than once on the same start-tag.
- String attribute values cannot contain references to external entities.
- Non-empty tags must be properly nested.
- Parameter entities must be declared before they are used.
- All entities except the following must be declared: amp, lt, gt, apos, and quot.
- A binary entity cannot be referenced in the flow of content; it can only be used in an attribute declared as ENTITY or ENTITIES.
- Neither text nor parameter entities are allowed to be recursive, directly or indirectly.
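Several of these conditions can be probed with any non-validating parser. The sketch below feeds a few small samples to Python's xml.etree.ElementTree and records whether each is rejected. The sample documents and the helper name well_formedness_error are made up for the illustration, and the exact error messages depend on the parser version.

```python
import xml.etree.ElementTree as ET


def well_formedness_error(xml_text):
    """Return None if the text parses, otherwise the parser's error message."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return str(err)
    return None


SAMPLES = {
    "properly nested": "<a><b>text</b></a>",
    "improperly nested": "<a><b>text</a></b>",
    "duplicate attribute": '<a x="1" x="2"/>',
    "undeclared entity": "<a>&nosuch;</a>",
}

if __name__ == "__main__":
    for label, sample in SAMPLES.items():
        print(label, "->", well_formedness_error(sample) or "well-formed")
```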
26Valid Documents
- A well-formed document is valid only if it contains a proper document type declaration and if the document obeys the constraints of that declaration (element sequence and nesting is valid, required attributes are provided, attribute values are of the correct type, etc.). The XML specification identifies all of the criteria in detail.
27The XML family
- A set of modules that offer useful services to accomplish important and frequently demanded tasks.
- The XML family consists of:
- XLink
- XPointer
- XSchemas
- CSS XSL XSLT
- DOM etc.
28XLink
- Some of the highlights of XLink are:
- XLink gives you control over the semantics of the link.
- XLink introduces Extended Links. Extended Links can involve more than two resources.
- There are two types of links:
- Simple Links
- Extended Links
29Simple Links
- A Simple Link strongly resembles an HTML <A> link.
- Example:
    <link xml:link="simple" href="locator">Link Text</link>
- A Simple Link identifies a link between two resources, one of which is the content of the linking element itself. This is an in-line link.
- The locator identifies the other resource. The locator may be a URL, a query, or an Extended Pointer.
30Extended Links
- Extended Links allow you to express relationships between more than two resources.
- Example:
    <elink xml:link="extended" role="annotation">
    <locator xml:link="locator" href="text.loc">The Text</locator>
    <locator xml:link="locator" href="annot1.loc">Annotations</locator>
    <locator xml:link="locator" href="annot2.loc">More Annotations</locator>
    <locator xml:link="locator" href="litcrit.loc">Literary Criticism</locator>
    </elink>
31XPointer
- XPointers offer a syntax that allows you to locate a resource by traversing the element tree of the document containing the resource.
- Example:
    child(2,oldjoke).(3,.)
- This locates the third child (whatever it may be) of the second oldjoke in the document.
- XPointers can span regions of the tree.
- Example:
    span(child(2,oldjoke),child(3,oldjoke))
- This selects the second and third oldjoke elements in the document.
32XSchemas
- XML Schemas help developers to precisely define the structures of their own XML-based formats.
33CSS XSL XSLT
- CSS, the style sheet language, is applicable to XML as it is to HTML.
- XSL is the advanced language for expressing style sheets. It is based on XSLT, a transformation language used for rearranging, adding, and deleting tags and attributes.
34Conclusion
- XML isn't always the best solution, but it is always worth considering.