XML Extensible Markup Language - PowerPoint PPT Presentation

About This Presentation
Title:

XML Extensible Markup Language

Description:

XML is a markup language for representation of documents which contain structured information. ... &apos; produces a single quote character (an apostrophe) ...

Slides: 35
Provided by: levonsa

Transcript and Presenter's Notes


1
XML Extensible Markup Language
  • Aleksandar Bogdanovski
  • Programming Environment LABoratory
  • alebo_at_ida.liu.se

2
What is XML?
  • Informal definition
  • XML is a markup language for the representation of
    documents which contain structured information.

New in XML?
http://www.w3.org/XML/1999/XML-in-10-points.html
3
Document
  • The word "document" refers not only to
    traditional documents, but also to the countless
    number of other XML "data formats". These include
    vector graphics, e-commerce transactions,
    mathematical equations, object metadata, server
    APIs, and a thousand other kinds of structured
    information.

4
SGML XML HTML
  • SGML provides arbitrary structure. Full SGML
    systems solve large, complex problems that
    justify their expense.
  • XML is defined as an application profile of SGML,
    or roughly speaking, a restricted form of SGML.
    XML specifies neither semantics nor a tag set.
    XML provides a facility to define tags and the
    structural relationships between them.
  • In HTML, both the tag semantics and the tag set
    are fixed.

5
10 Commandments of XML
  1. XML shall be straightforwardly usable over the
    Internet.
  2. XML shall support a wide variety of applications.
  3. XML shall be compatible with SGML.
  4. It shall be easy to write programs which process
    XML documents.
  5. The number of optional features in XML is to be
    kept to the absolute minimum, ideally zero.
  6. XML documents should be human-legible and
    reasonably clear.
  7. The XML design should be prepared quickly.
  8. The design of XML shall be formal and concise.
  9. XML documents shall be easy to create.
  10. Terseness in XML markup is of minimal importance.

6
Look and feel of XML Document
  • A Simple XML Document
  • <?xml version="1.0"?>
  • <oldjoke>
  • <burns>Say <quote>goodnight</quote>, Gracie.</burns>
  • <allen><quote>Goodnight, Gracie.</quote></allen>
  • <applause/>
  • </oldjoke>
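
As a rough illustration of how an application sees this document, the sketch below parses it with Python's standard xml.etree.ElementTree module. The choice of Python and the variable names are assumptions made for this example; the slides do not prescribe a parser.

```python
import xml.etree.ElementTree as ET

# The simple document from this slide, held in a string.
DOC = """<?xml version="1.0"?>
<oldjoke>
<burns>Say <quote>goodnight</quote>, Gracie.</burns>
<allen><quote>Goodnight, Gracie.</quote></allen>
<applause/>
</oldjoke>"""

root = ET.fromstring(DOC)

# Names of the elements directly inside the root element.
child_names = [child.tag for child in root]

# All character data inside <burns>, including the nested <quote>.
burns_text = "".join(root.find("burns").itertext())

print(root.tag)
print(child_names)
print(burns_text)
```

The parser exposes the nesting of elements as a tree, which is why the text of burns has to be collected from the element and its quote child together.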

7
Markup in XML
  • There are six kinds of markup in XML:
  • Elements
  • Entity references
  • Comments
  • Processing instructions (PIs)
  • Marked (CDATA) sections
  • Document type declarations (DTD)

8
Elements
  • Elements are the most common form of markup.
  • Elements identify the nature of the content they
    surround.
  • Some elements may be empty, in which case they
    have no content.
  • An element begins with a start-tag, <element>, ends
    with an end-tag, </element>, and has some content
    in between.
  • Example
  • <burns>Say <quote>goodnight</quote>, Gracie.</burns>
  • Attributes are name-value pairs that occur in
    start-tags after the element name.
  • Example
  • <volvo type="s40">
  • In XML, all attribute values must be quoted.
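
A minimal sketch of the quoting rule, assuming Python's xml.etree.ElementTree as the parser (an assumption of this example, not something the slides name). The volvo element is written here as an empty element so that the snippet is a complete document.

```python
import xml.etree.ElementTree as ET

# A quoted attribute value parses, and the value is available by name.
car = ET.fromstring('<volvo type="s40"/>')
model = car.get("type")

# The same tag with an unquoted value is rejected by the parser.
try:
    ET.fromstring("<volvo type=s40/>")
    unquoted_accepted = True
except ET.ParseError:
    unquoted_accepted = False

print(model, unquoted_accepted)
```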

9
Entity references
  • To represent special characters in XML, entities
    are used. Entities are also used to refer to
    often repeated or varying text and to include the
    content of external files.
  • Every entity must have a unique name.
  • A special form of entity reference, called a
    character reference, can be used to insert
    arbitrary Unicode characters into your document.
  • Character references take one of two forms:
    decimal references, &#8478;, and hexadecimal
    references, &#x211E;.
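
The two forms of character reference name the same code point, since hexadecimal 211E equals decimal 8478. The sketch below checks this with Python's xml.etree.ElementTree; the wrapper element name rx is hypothetical and chosen only for the example.

```python
import xml.etree.ElementTree as ET

# The same Unicode character written as a decimal and as a
# hexadecimal character reference. "rx" is a made-up element name.
decimal = ET.fromstring("<rx>&#8478;</rx>").text
hexadecimal = ET.fromstring("<rx>&#x211E;</rx>").text

print(decimal == hexadecimal, ord(decimal), hex(ord(hexadecimal)))
```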

10
Comments
  • Comments begin with <!-- and end with -->.
    Comments can contain any data except the literal
    string --.
  • Comments can be placed anywhere between markup in
    the document.
  • Comments are not part of the textual content of
    an XML document. An XML processor is not required
    to pass them along to an application.
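
One processor that uses this freedom is Python's xml.etree.ElementTree: by default it drops comments, and it keeps them only when asked to. The sketch below assumes Python 3.8 or later, where TreeBuilder accepts insert_comments; the sample comment text is invented for the example.

```python
import xml.etree.ElementTree as ET

SOURCE = (
    "<oldjoke><!-- setup line -->"
    "<burns>Say goodnight, Gracie.</burns></oldjoke>"
)

# Default parser: the comment is not passed on to the application.
plain = ET.fromstring(SOURCE)
plain_children = len(list(plain))

# Opting in: a TreeBuilder that inserts comments into the tree
# (the insert_comments option needs Python 3.8 or later).
parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
with_comments = ET.fromstring(SOURCE, parser=parser)
kept_children = len(list(with_comments))
comment_text = with_comments[0].text

print(plain_children, kept_children, repr(comment_text))
```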

11
Processing instructions (PIs)
  • Like comments, PIs are not textually part of the
    XML document, but the XML processor is required
    to pass them to an application.
  • Processing instructions have the form <?name
    pidata?>.
  • The name, called the PI target, identifies the PI
    to the application. Any data that follows the PI
    target is optional.
  • The names used in PIs may be declared as
    notations in order to formally identify them.
  • PI names beginning with xml are reserved for XML
    standardization.
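
A small sketch of how a PI reaches an application, using Python's xml.dom.minidom, which exposes each PI as a node with a target and data. The PI target render and its data are hypothetical, invented for this example.

```python
from xml.dom import minidom

# "render" is a made-up PI target; only the application that
# recognises it would act on the data that follows.
doc = minidom.parseString('<?render mode="draft"?><oldjoke/>')

pis = [
    node for node in doc.childNodes
    if node.nodeType == node.PROCESSING_INSTRUCTION_NODE
]
target = pis[0].target
data = pis[0].data

print(target, data)
```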

12
CDATA sections
  • A CDATA section instructs the parser to ignore most
    markup characters.
  • Example
  • <![CDATA[
  • *p = &q;
  • b = (i <= 3);
  • ]]>
  • Between the start of the section, <![CDATA[, and
    the end of the section, ]]>, all character data
    is passed directly to the application, without
    interpretation.
  • The only string that cannot occur in a CDATA
    section is ]]>.
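
The sketch below, assuming Python's xml.etree.ElementTree, shows that text inside a CDATA section arrives unchanged and equals the same text written with entity references. The C-like lines and the wrapper element name code are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Markup characters inside the CDATA section are not interpreted.
WITH_CDATA = """<code><![CDATA[
*p = &q;
b = (i <= 3);
]]></code>"""

# The same content without CDATA needs & and < to be escaped.
ESCAPED = """<code>
*p = &amp;q;
b = (i &lt;= 3);
</code>"""

cdata_text = ET.fromstring(WITH_CDATA).text
escaped_text = ET.fromstring(ESCAPED).text

print(cdata_text == escaped_text)
print(cdata_text)
```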

13
Document Type Declaration (DTD)
  • One of the greatest strengths of XML is that it
    allows you to create your own tag names. But it is
    not meaningful for tags to occur in a completely
    arbitrary order.
  • Example
  • <gracie><quote><oldjoke>Goodnight,
  • <applause/>Gracie</oldjoke></quote>
  • <burns><gracie>Say <quote>goodnight</quote>,
  • </gracie>Gracie.</burns>
  • This doesn't make any sense, but syntactically
    there's nothing wrong. For the document to have a
    meaning, some constraints on the sequence and
    nesting of tags should be imposed.

14
Document Type Declaration (DTD)
  • Constraints are expressed in declarations.
  • Declarations allow a document to communicate
    meta-information about its content to the parser.
  • There are four kinds of declarations in XML:
  • Element Type Declarations
  • Attribute List Declarations
  • Entity Declarations
  • Notation Declarations

15
Element Type Declaration
  • Element type declarations identify the names of
    elements and the nature of their content.
  • Example
  • <!ELEMENT oldjoke  (burns, allen, applause?)>
  • <!ELEMENT burns    (#PCDATA | quote)*>
  • <!ELEMENT allen    (#PCDATA | quote)*>
  • <!ELEMENT quote    (#PCDATA)>
  • <!ELEMENT applause EMPTY>
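
A content model such as (burns, allen, applause?) behaves much like a regular expression over the names of the child elements. The sketch below hand-translates only that one model into a Python regular expression; it is an illustration of the idea under that assumption, not a DTD validator, and the function name is invented for the example.

```python
import re
import xml.etree.ElementTree as ET

# (burns, allen, applause?) as a regex over space-terminated child
# names: burns, then allen, then an optional applause.
OLDJOKE_MODEL = re.compile(r"burns allen (applause )?")


def matches_oldjoke_model(xml_text):
    """Check the children of an oldjoke root against the model above."""
    root = ET.fromstring(xml_text)
    names = "".join(child.tag + " " for child in root)
    return root.tag == "oldjoke" and OLDJOKE_MODEL.fullmatch(names) is not None


good = matches_oldjoke_model("<oldjoke><burns/><allen/><applause/></oldjoke>")
no_applause = matches_oldjoke_model("<oldjoke><burns/><allen/></oldjoke>")
wrong_order = matches_oldjoke_model("<oldjoke><allen/><burns/></oldjoke>")

print(good, no_applause, wrong_order)
```

A validating parser performs this kind of check for every declared element, including the mixed-content models that this sketch ignores.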

16
Attribute List Declarations
  • Attribute list declarations identify which
    elements may have attributes, what attributes
    they may have, what values the attributes may
    hold, and what value is the default.
  • Example
  • <!ATTLIST oldjoke
  •   name   ID                   #REQUIRED
  •   label  CDATA                #IMPLIED
  •   status ( funny | notfunny ) 'funny'>
  • Each attribute in a declaration has three parts: a
    name, a type, and a default value.

17
Attribute List Declarations
There are six possible attribute types:
  • CDATA
  • CDATA attributes are strings; any text is
    allowed.
  • ID
  • The value of an ID attribute must be a name. All
    of the ID values used in a document must be
    different. IDs uniquely identify individual
    elements in a document. Elements can have only a
    single ID attribute.
  • IDREF or IDREFS
  • An IDREF attribute's value must be the value of a
    single ID attribute on some element in the
    document. The value of an IDREFS attribute may
    contain multiple IDREF values separated by white
    space.
  • ENTITY or ENTITIES
  • An ENTITY attribute's value must be the name of a
    single entity. The value of an ENTITIES attribute
    may contain multiple entity names separated by
    white space.
  • NMTOKEN or NMTOKENS
  • A restricted form of string attribute. An NMTOKEN
    attribute must consist of a single word, but
    there are no additional constraints. The value of
    an NMTOKENS attribute may contain multiple
    NMTOKEN values separated by white space.
  • A list of names
  • The value of the attribute must be taken from a
    specific list of names. This is frequently called
    an enumerated type. Alternatively, you can
    specify that the names must match a notation name.

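
The ID and IDREF rules can be sketched as a small check written by hand. The code below is an illustration only: it assumes, hypothetically, that the ID attribute is called name and the IDREF attribute is called ref, whereas a validating parser would take both from the ATTLIST declarations.

```python
import xml.etree.ElementTree as ET


def check_ids(root, id_attr="name", idref_attr="ref"):
    """Return (duplicate ID values, IDREF values that match no ID).

    The attribute names are assumptions made for this sketch.
    """
    seen = set()
    duplicates = []
    for element in root.iter():
        value = element.get(id_attr)
        if value is not None:
            if value in seen:
                duplicates.append(value)
            seen.add(value)
    dangling = [
        element.get(idref_attr)
        for element in root.iter()
        if element.get(idref_attr) is not None
        and element.get(idref_attr) not in seen
    ]
    return duplicates, dangling


doc = ET.fromstring(
    '<jokes><oldjoke name="j1"/><oldjoke name="j1"/>'
    '<see ref="j2"/><see ref="j1"/></jokes>'
)
duplicates, dangling = check_ids(doc)
print(duplicates, dangling)
```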
18
Attribute List Declarations
There are four possible default values:
  • #REQUIRED
  • The attribute must have an explicitly specified
    value on every occurrence of the element in the
    document.
  • #IMPLIED
  • The attribute value is not required, and no
    default value is provided. If a value is not
    specified, the XML processor must proceed without
    one.
  • "value"
  • An attribute can be given any legal value as a
    default. The attribute value is not required on
    each element in the document, and if it is not
    present, it will appear to be the specified
    default.
  • #FIXED "value"
  • An attribute declaration may specify that an
    attribute has a fixed value. In this case, the
    attribute is not required, but if it occurs, it
    must have the specified value. If it is not
    present, it will appear to be the specified
    default.
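
The effect of a declared default can be observed with Python's xml.etree.ElementTree, whose underlying expat parser reads attribute declarations from the internal DTD subset. This is a sketch under that assumption; it shows defaulting only, since ElementTree does not validate and so does not enforce #REQUIRED or #FIXED.

```python
import xml.etree.ElementTree as ET

# An oldjoke element with no attributes written on it. The internal
# subset declares an optional label and a status defaulting to 'funny'.
SOURCE = """<!DOCTYPE oldjoke [
<!ATTLIST oldjoke
  label  CDATA                #IMPLIED
  status ( funny | notfunny ) 'funny'>
]>
<oldjoke/>"""

joke = ET.fromstring(SOURCE)
status = joke.get("status")  # supplied from the declared default
label = joke.get("label")    # #IMPLIED: no value and no default

print(status, label)
```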

19
Entity Declarations
  • Entity declarations allow you to associate a name
    with some other fragment of content. That
    construct can be a chunk of regular text, a chunk
    of the document type declaration, or a reference
    to an external file containing either text or
    binary data.
  • Example
  • <!ENTITY ATI "ArborText, Inc.">
  • <!ENTITY boilerplate SYSTEM "/standard/legalnotice.xml">
  • <!ENTITY ATIlogo SYSTEM "/standard/logo.gif"
    NDATA GIF87A>
  • There are three kinds of entities:
  • Internal Entities
  • External Entities
  • Parameter Entities

20
Internal Entities
  • Internal entities associate a name with a
    string of literal text.
  • Example
  • <!ENTITY ATI "ArborText, Inc.">
  • Internal entities allow you to define shortcuts
    for frequently typed text or text that is
    expected to change, such as the revision status
    of a document.
  • The XML specification predefines five internal
    entities:
  • &lt; produces the left angle bracket, <
  • &gt; produces the right angle bracket, >
  • &amp; produces the ampersand, &
  • &apos; produces a single quote character (an
    apostrophe), '
  • &quot; produces a double quote character, "
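
A short sketch of entity expansion, assuming Python's xml.etree.ElementTree: the ATI entity is declared in the internal subset and used next to the five predefined entities. The wrapper element name note is hypothetical.

```python
import xml.etree.ElementTree as ET

# ATI is declared in the internal subset; lt, amp, gt, apos and
# quot are predefined and need no declaration.
SOURCE = """<!DOCTYPE note [
<!ENTITY ATI "ArborText, Inc.">
]>
<note>&ATI; &lt;&amp;&gt; &apos;&quot;</note>"""

text = ET.fromstring(SOURCE).text
print(text)
```

The application receives only the replacement text; the entity references themselves are gone by the time the parser hands over the content.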

21
External Entities
  • External entities associate a name with the
    content of another file.
  • External entities contain either text or binary
    data.
  • Example
  • <!ENTITY boilerplate SYSTEM "/standard/legalnotice.xml">
  • <!ENTITY ATIlogo SYSTEM "/standard/logo.gif"
    NDATA GIF87A>
  • The textual content of the external file is
    inserted at the point of reference and parsed as
    part of the referring document.
  • Binary data is not parsed and may only be
    referenced in an attribute.

22
Parameter Entities
  • Parameter entities can only occur in the document
    type declaration.
  • Parameter entity references are immediately
    expanded in the document type declaration, and
    their replacement text is part of the declaration.
  • Example
  • <!ENTITY % personcontent "#PCDATA | quote">
  • <!ELEMENT burns (%personcontent;)*>
  • <!ELEMENT allen (%personcontent;)*>

23
Notation Declaration
  • Notation declarations identify specific types of
    external binary data. This information is passed
    to the processing application, which may make
    whatever use of it it wishes.
  • Example
  • <!NOTATION GIF87A SYSTEM "GIF">

24
Including a DTD
  • The DTD must be the first thing in the document
    after the optional processing instructions and
    comments.
  • The DTD identifies the root element of the
    document and may contain additional declarations.
  • Example
  • <?xml version="1.0" standalone="no"?>
  • <!DOCTYPE chapter SYSTEM "dbook.dtd" [
  • <!ENTITY % ulink.module "IGNORE">
  • <!ELEMENT ulink (#PCDATA)>
  • <!ATTLIST ulink
  •   xml:link        CDATA  #FIXED "SIMPLE"
  •   xml-attributes  CDATA  #FIXED "HREF URL"
  •   URL             CDATA  #REQUIRED>
  • ]>
  • <chapter>...</chapter>

25
Well-formed Documents
  • A document can only be well-formed if it obeys
    the syntax of XML.

The document must meet all of the following
conditions:
  • The document instance must conform to the grammar
    of XML documents.
  • The replacement text for all parameter entities
    referenced inside a markup declaration consists
    of zero or more complete markup declarations.
  • No attribute may appear more than once on the
    same start-tag.
  • String attribute values cannot contain references
    to external entities.
  • Non-empty tags must be properly nested.
  • Parameter entities must be declared before they
    are used.
  • All entities must be declared, except the
    following: amp, lt, gt, apos, and quot.
  • A binary entity cannot be referenced in the flow
    of content; it can only be used in an attribute
    declared as ENTITY or ENTITIES.
  • Neither text nor parameter entities are allowed
    to be recursive, directly or indirectly.
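
Several of these conditions can be probed with any non-validating parser. The sketch below uses Python's xml.etree.ElementTree and treats a ParseError as "not well-formed"; the helper name and the sample fragments are assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text, False on a ParseError."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


results = {
    "properly nested": is_well_formed("<a><b></b></a>"),
    "overlapping tags": is_well_formed("<a><b></a></b>"),
    "duplicate attribute": is_well_formed('<a x="1" x="2"/>'),
    "undeclared entity": is_well_formed("<a>&nosuch;</a>"),
}

for name, ok in results.items():
    print(name, ok)
```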

26
Valid Documents
  • A well-formed document is valid only if it
    contains a proper document type declaration and
    if the document obeys the constraints of that
    declaration (element sequence and nesting is
    valid, required attributes are provided,
    attribute values are of the correct type, etc.).
    The XML specification identifies all of the
    criteria in detail.

27
The XML family
  • A set of modules that offer useful services to
    accomplish important and frequently demanded
    tasks.
  • The XML family consists of:
  • XLink
  • XPointer
  • XSchemas
  • CSS, XSL, XSLT
  • DOM, etc.

28
XLink
  • Some of the highlights of XLink are:
  • XLink gives you control over the semantics of
    the link.
  • XLink introduces Extended Links. Extended Links
    can involve more than two resources.
  • There are two types of links:
  • Simple Links
  • Extended Links

29
Simple Links
  • A Simple Link strongly resembles an HTML <A>
    link.
  • Example
  • <link xml:link="simple" href="locator">Link
    Text</link>
  • A Simple Link identifies a link between two
    resources, one of which is the content of the
    linking element itself. This is an in-line link.
  • The locator identifies the other resource. The
    locator may be a URL, a query, or an Extended
    Pointer.

30
Extended Links
  • Extended Links allow you to express relationships
    between more than two resources.
  • Example
  • <elink xml:link="extended" role="annotation">
  • <locator xml:link="locator" href="text.loc">
  • The Text</locator>
  • <locator xml:link="locator" href="annot1.loc">
  • Annotations</locator>
  • <locator xml:link="locator" href="annot2.loc">
  • More Annotations</locator>
  • <locator xml:link="locator" href="litcrit.loc">
  • Literary Criticism</locator>
  • </elink>

31
XPointer
  • XPointers offer a syntax that allows you to
    locate a resource by traversing the element tree
    of the document containing the resource.
  • Example
  • child(2,oldjoke).(3,.)
  • locates the third child (whatever it may be) of
    the second oldjoke in the document.
  • XPointers can span regions of the tree.
  • Example
  • span(child(2,oldjoke),child(3,oldjoke))
  • selects the second and third oldjokes in the
    document.

32
XSchemas
  • XML Schemas help developers to precisely define
    the structures of their own XML-based formats.

33
CSS XSL XSLT
  • CSS, the style sheet language, is applicable to
    XML as it is to HTML.
  • XSL is the advanced language for expressing style
    sheets. It is based on XSLT, a transformation
    language used for rearranging, adding and
    deleting tags and attributes.

34
Conclusion
  • XML isn't always the best solution, but it is
    always worth considering.