Extensible Markup Language XML - PowerPoint PPT Presentation

Provided by: asuman9
Transcript and Presenter's Notes


1
Extensible Markup Language (XML)
2
XML
  • Extensible Markup Language has become the
    universal standard for representing data
  • XML started out as a standard data exchange
    format for the Web
  • Yet, it has quickly become the fundamental
    instrument in the development of Web-based online
    information services and electronic commerce
    applications
  • Almost all recent electronic commerce standards
    are based on XML

3
XML
  • A subset of SGML (Standard Generalized Markup
    Language); it is defined by the World Wide Web
    Consortium (http://www.w3.org)
  • It is a fee-free open standard.
  • HTML provides a universal method of displaying
    data; XML provides a universal method of
    describing data
  • Provides the ability to describe data in an open
    text-based format and deliver it using the
    standard HTTP protocol

4
XML
  • At present, many applications on the Web use XML
    for hosting large amounts of structured and
    semi-structured data
  • Representation of information in XML documents
    has been increasing at an astonishing pace
  • According to Meta Group, by 2003, about 65% of
    corporate data will be stored in an XML format

5
XML: The Unifying Technology
XML Messaging
Internet
6
Maturity of Web Infrastructure
Technology
Standard
Innovation
Browse the Web
Program the Web
7
XML helps address the challenge
  • The data is self-describing
  • e.g. the meaning of the data is included:
    identifiers surround every bit of data,
    indicating what it means
  • Far more flexible method of representing
    transmitted information
  • e.g. batched orders sent together can have
    different fields and format without breaking apps
    on each end
  • Open, standard technologies for moving,
    processing and validating the data
  • e.g. the XML parser can automatically parse,
    validate, and feed the information to an
    application, instead of every application having
    to include this functionality

8
XML: An Example
Data stream in a typical interface:
Electronic Commerce, 100, Turban, 25,
Addison-Wesley
Same data stream in XML:
<TITLE>Electronic Commerce</TITLE> <QUANTITY>100</QUANTITY> <AUTHOR>Turban</AUTHOR>
<PRICE>25</PRICE> <PUBLISHER>Addison-Wesley</PUBLISHER>
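The difference between the positional stream and the tagged stream can be sketched in Python. This is only an illustration: the ORDER wrapper and the element names TITLE, AUTHOR and PRICE are assumptions chosen for the example.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Flat stream: the meaning of each field depends only on its position.
flat = "Electronic Commerce, 100, Turban, 25, Addison-Wesley"
fields = next(csv.reader(io.StringIO(flat), skipinitialspace=True))

# Same data as XML: every value carries its own label.
# The element names are illustrative assumptions.
xml_text = (
    "<ORDER>"
    "<TITLE>Electronic Commerce</TITLE>"
    "<QUANTITY>100</QUANTITY>"
    "<AUTHOR>Turban</AUTHOR>"
    "<PRICE>25</PRICE>"
    "<PUBLISHER>Addison-Wesley</PUBLISHER>"
    "</ORDER>"
)
order = ET.fromstring(xml_text)
record = {child.tag: child.text for child in order}

print(fields[1])           # the reader must know that index 1 is the quantity
print(record["QUANTITY"])  # the XML reader looks the value up by name
```

Reordering the XML elements would not change `record`, whereas reordering the flat stream silently changes what `fields[1]` means.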
9
Markup (or Tagging)
  • XML uses textual markups to define data
  • An XML document consists of a collection of
    tagged elements, each containing a start tag
    (<tag>), an end tag (</tag>), and the
    content between the two tags
  • Example
  • <PONumber>1234ABCD</PONumber>

10
Tagging Data in XML
  • <PONumber>1234ABCD</PONumber>
  • Considering the content only, it is not possible
    to understand what 1234ABCD stands for
  • The tag name PONumber intuitively tells that the
    content is a purchase order number
  • Similarly, an XML element might be tagged as
    name, gender, birth date, salary, price, ...
  • XML is extensible in the sense that users can
    create their own vocabularies; the tag names are
    neither predefined nor limited
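A minimal sketch of tagging with Python's standard ElementTree module: it reads the tag name and content of the PONumber element, then creates an element with a freely chosen name. The Salary element and its value are hypothetical.

```python
import xml.etree.ElementTree as ET

# Reading a tagged element: the tag name says what the content means.
element = ET.fromstring("<PONumber>1234ABCD</PONumber>")
tag_name = element.tag   # "PONumber"
content = element.text   # "1234ABCD"

# Extensibility: any tag name can be introduced without prior registration.
# "Salary" and its value are made up for this example.
salary = ET.Element("Salary")
salary.text = "4200"
serialized = ET.tostring(salary, encoding="unicode")

print(tag_name, content, serialized)
```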

11
Adding Structure to data
  • Tagged elements may be nested to any depth to
    provide structured data, or may be repeated to
    represent a list of values
  • A well-formed XML document contains a single root
    element, which constitutes the top level of
    nesting
  • In other words, a well-formed XML document
    represents a tree of elements

12
Giving Meaning and Structure to Data
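  • <PONumber>1234ABCD</PONumber>
  • <PurchaseOrderDate>20030601</PurchaseOrderDate>
  • <QuantityOrdered>16</QuantityOrdered>
  • <UnitPrice>95</UnitPrice>
Callouts: Start Tag, An Element, Another Element, An Attribute, Data, End Tag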
13
Giving Structure to Data
PurchaseOrderRequest
  PONumber
  PurchaseOrderDate
  LineItem
    ItemEAN_Identification
    QuantityOrdered
    UnitPrice
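The element tree can be walked programmatically. The sketch below assumes a purchase order nested as PurchaseOrderRequest containing PONumber, PurchaseOrderDate and LineItem, with the remaining elements inside LineItem; the EAN value is a placeholder, not data from the slides.

```python
import xml.etree.ElementTree as ET

# Assumed nesting; the ItemEAN_Identification value is a placeholder.
doc = """
<PurchaseOrderRequest>
  <PONumber>1234ABCD</PONumber>
  <PurchaseOrderDate>20030601</PurchaseOrderDate>
  <LineItem>
    <ItemEAN_Identification>0000000000000</ItemEAN_Identification>
    <QuantityOrdered>16</QuantityOrdered>
    <UnitPrice>95</UnitPrice>
  </LineItem>
</PurchaseOrderRequest>
""".strip()


def outline(element, depth=0):
    """Return one indented line per element, depth-first."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(doc)
tree_lines = outline(root)
print("\n".join(tree_lines))

# Nested data is addressed by its path from the root.
quantity = root.find("LineItem/QuantityOrdered").text
```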
14
Well-formed and valid XML documents
  • There are two levels of correctness of an XML
    document
  • Well-formed. A well-formed document conforms to
    all of XML's syntax rules. For example, if an
    element has an opening tag with no closing tag
    and is not self-closing, it is not well-formed.
  • Valid. A valid document additionally conforms to
    some semantic rules. These rules are either
    user-defined, or included as an XML schema or
    DTD.
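Well-formedness can be checked with any XML parser. The sketch below uses Python's standard ElementTree, which checks syntax only; it does not validate against a schema or DTD, so the second level of correctness needs a separate validating tool.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text parses as XML, False on any syntax error."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<a><b/></a>"))  # closed and properly nested
print(is_well_formed("<a><b></a>"))   # <b> is never closed
print(is_well_formed("<a/><b/>"))     # two root elements
```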

15
Well-formed documents: XML syntax
  • The only indispensable syntactical requirement is
    that the document has exactly one root element
    (alternatively called the document element).
  • The root element can be preceded by an optional
    XML declaration, e.g. <?xml version="1.0" encoding="UTF-8"?>
  • It states the version of XML in use
  • It may also specify the character encoding and
    external dependencies.
  • The specification requires that processors of XML
    support the pan-Unicode character encodings UTF-8
    and UTF-16

16
Well-formed documents: XML syntax
  • XML comments start with <!-- and end with -->.
  • The text enclosed by the root tags may contain an
    arbitrary number of XML elements. The basic
    syntax for one element is
  • <name attribute="value">content</name>
  • Here, content is some text which may again
    contain XML elements.

17
Another example
18
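Well-formed documents: XML syntax
  • Attribute values must always be quoted, using
    single or double quotes ('...' or "...")
  • Each attribute name may appear only once in
    any element.
  • Proper nesting: elements may never overlap. The
    following is not well-formed because em and
    strong overlap
  • <p>Normal <em>emphasized <strong>strong
    emphasized</em> strong</strong></p>

  • Empty-element tag: an element with no content has
    three equivalent forms, <name></name>, <name />
    and <name/>, and may still carry attributes, e.g.
  • author="John" genre="science-fiction"
    date="2009-Jan-01"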
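These syntax rules can be exercised with a parser. In the sketch below the element names book and foo are hypothetical; the point is which inputs the standard ElementTree parser rejects and that the three empty-element spellings parse to the same result.

```python
import xml.etree.ElementTree as ET


def parses(text):
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Overlapping elements, unquoted values and repeated attributes are rejected.
overlap_ok = parses("<p>Normal <em>emphasized <strong>strong emphasized</em> strong</strong></p>")
unquoted_ok = parses("<book author=John/>")
duplicate_ok = parses('<book author="John" author="Jane"/>')
quoted_ok = parses("<book author='John' genre=\"science-fiction\"/>")

# The three empty-element forms are equivalent after parsing.
forms = [ET.fromstring(t) for t in ("<foo></foo>", "<foo />", "<foo/>")]
same = all(f.tag == "foo" and f.text is None and len(f) == 0 for f in forms)

print(overlap_ok, unquoted_ok, duplicate_ok, quoted_ok, same)
```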

19
Entity references
  • An entity in XML is a named body of data, usually
    text, such as an unusual character.
  • An entity reference is a placeholder that
    represents that entity
  • It consists of the entity's name preceded by an
    ampersand ("&") and followed by a semicolon
    (";").
  • XML has five predeclared entities
  • &amp; ampersand (&)
  • &lt; less than (<)
  • &gt; greater than (>)
  • &apos; apostrophe (')
  • &quot; quotation mark (")
  • More entities can be declared in the document's
    Document Type Definition (DTD), covered later.
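A short sketch of the predeclared entities in practice, using Python's standard library: the parser turns the five references back into characters, and `xml.sax.saxutils.escape` produces references when writing text. The sample strings are made up.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Parsing replaces each predeclared entity reference with its character.
parsed = ET.fromstring(
    "<t>&lt;a&gt; &amp; &apos;b&apos; &quot;c&quot;</t>"
).text

# Writing: escape() replaces &, < and > so the text is safe inside an element.
escaped = escape("AT&T < IBM")
round_trip = ET.fromstring("<t>" + escaped + "</t>").text

print(parsed)
print(escaped)
```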

20
Well-formed documents
  • The document complies with its declared character
    encoding.
  • The encoding may be declared either externally
    ("Content-Type" header of HTTP) or internally.
  • Element names are case-sensitive.
  • ...
  • Choosing meaningful names implies the semantics
    of elements and attributes to a human reader

21
Valid documents: XML semantics
  • By leaving the names, allowable hierarchy, and
    meanings of the elements and attributes open and
    definable by a customizable schema or DTD, XML
    provides a syntactic foundation for the creation
    of purpose specific, XML-based markup languages.
  • The schema merely supplements the syntax rules
    with a set of constraints.
  • Schemas typically restrict element and attribute
    names and their allowable containment hierarchies

  • For example, an element named 'birthday' contains
    three elements, 'year', 'month' and 'day', each
    holding only character data.
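The birthday constraint can be illustrated with a hand-written check. This is not a schema validator, only a sketch of what such a rule restricts; requiring the order year, month, day is an assumption of this example.

```python
import xml.etree.ElementTree as ET


def is_valid_birthday(text):
    """Hand-rolled check of the 'birthday' rule; not a real schema validator."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        return False  # not even well-formed
    if root.tag != "birthday":
        return False
    # Assumed rule: exactly year, month, day, in this order.
    if [child.tag for child in root] != ["year", "month", "day"]:
        return False
    # Each child holds character data only: no nested elements, not empty.
    return all(len(c) == 0 and bool((c.text or "").strip()) for c in root)


good = "<birthday><year>1990</year><month>06</month><day>01</day></birthday>"
missing = "<birthday><year>1990</year></birthday>"
nested = "<birthday><year>1990</year><month>06</month><day><n>1</n></day></birthday>"
print(is_valid_birthday(good), is_valid_birthday(missing), is_valid_birthday(nested))
```

All three sample documents are well-formed; only the first also satisfies the structural rule.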

22
Valid documents: XML semantics
  • An XML document that complies with a particular
    schema/DTD, in addition to being well-formed, is
    said to be valid.
  • An XML schema is expressed in terms of constraints
    on the structure and content of documents
  • Before SGML and XML, software designers had to
    define special file formats and special-purpose
    parsers and writers.
  • XML's regular structure and strict parsing rules
    allow software designers to leave parsing to
    standard tools
  • Well-tested tools exist to validate an XML
    document "against" a schema

23
Document Type Definition (DTD)
  • The principal purpose of the DTD is to declare
    the hierarchy of document elements
  • A document type definition defines
  • The names of the elements,
  • The content model of each element,
  • How often and in which order elements may
    appear,
  • Whether end-tags can be omitted (an SGML feature;
    XML does not allow tag omission),
  • The possible presence of attributes and their
    default values,
  • The names of the entities

24
An Example DTD
  • <!ELEMENT PurchaseOrderRequest (PONumber,
    PurchaseOrderDate, LineItem)>
  • <!ELEMENT LineItem (ItemEAN_Identification,
    QuantityOrdered, UnitPrice)>
  • <!-- other elements are skipped --> ...

25
DTDs
  • A DTD specifies the structure of an XML element
    by specifying the names of its sub-elements and
    attributes
  • Sub-element structure is specified using the
    operators
  • * set with zero or more elements
  • + set with one or more elements
  • ? optional
  • | or
  • All values are assumed to be string values,
    unless the type is ANY in which case the value
    can be an arbitrary XML fragment
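The DTD occurrence operators behave like their regular-expression counterparts. The sketch below translates a simple content model into a regex over the sequence of child names; it is a toy for illustration, handles only names, commas, |, *, +, ? and parentheses, and ignores #PCDATA, EMPTY and ANY. The element names order, item and note are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

NAME = re.compile(r"[A-Za-z_][\w.\-]*")


def model_to_regex(model):
    """Turn e.g. '(item+, note?)' into a regex over 'item;item;note;'."""
    compact = re.sub(r"\s+", "", model)
    # Each element name becomes a group that matches 'name;'.
    with_names = NAME.sub(
        lambda m: "(?:" + re.escape(m.group(0)) + ";)", compact
    )
    # ',' means sequence, which is plain concatenation in a regex.
    return with_names.replace(",", "")


def matches(element, model):
    """Check the element's direct children against a content model."""
    children = "".join(child.tag + ";" for child in element)
    return re.fullmatch(model_to_regex(model), children) is not None


order = ET.fromstring("<order><item/><item/><note/></order>")
print(matches(order, "(item+, note?)"))       # one or more items, optional note
print(matches(order, "(item*, note, note)"))  # would need two notes
```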

26
DTDs
  • There is a special attribute id which can occur
    once for each element
  • EMPTY- the element has no content
  • Empty elements usually have attributes that give
    them useful properties
  • There is no concept of a root of a document: an
    XML document conforming to a DTD can be rooted at
    any element specified in the DTD

27
Element Identity, Ids, and ID References
  • To support element sharing, XML reserves an
    attribute of type ID, which allows a unique key
    to be associated with an element
  • An attribute of type IDREF allows an element to
    refer to another element with the designated key;
    an attribute of type IDREFS may refer to multiple
    elements
  • John
    Smith
  • ...
  • ....
  • 1995
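A sketch of how an application might follow ID references. ElementTree does not read attribute types from a DTD, so this example simply assumes that the attribute named id plays the ID role and that parents is an IDREFS-style list of keys; the document and names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: "id" acts as the ID attribute,
# "parents" as a whitespace-separated IDREFS attribute.
doc = """
<people>
  <person id="p1"><name>John Smith</name></person>
  <person id="p2"><name>Mary Smith</name></person>
  <person id="p3" parents="p1 p2"><name>Ann Smith</name></person>
</people>
""".strip()

root = ET.fromstring(doc)

# Index every element that carries a key.
by_id = {el.get("id"): el for el in root.iter() if el.get("id") is not None}


def resolve(element, attribute):
    """Return the elements referenced by an IDREF/IDREFS-style attribute."""
    return [by_id[key] for key in element.get(attribute, "").split()]


parents = [p.findtext("name") for p in resolve(by_id["p3"], "parents")]
print(parents)
```

The referenced person elements are shared rather than copied, which is how ID and IDREF let a tree-shaped document express graph-like links.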

28
Entities
  • Entities represent the physical structure of an
    XML document
  • Two types of entities
  • General entities apply within the top level
    element and in attribute values
  • Parameter entities apply within the internal and
    external DTD subsets
  • Entity reference in a document
  • This contract is between &receipent;
    and &contractor; and the award is
    &payment;.
  • Entity reference expanded
  • This contract is between METU and EC
    and the award is 1 EURO.
  • By changing the entity declarations you can
    create any contract.
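The contract example can be reproduced with entity declarations in an internal DTD subset, which Python's standard ElementTree parser expands while parsing. The contract root element name is an assumption; the entity names and values follow the slide.

```python
import xml.etree.ElementTree as ET

# General entities declared in the internal DTD subset.
# The root element name "contract" is an assumption.
contract = """<!DOCTYPE contract [
  <!ENTITY receipent "METU">
  <!ENTITY contractor "EC">
  <!ENTITY payment "1 EURO">
]>
<contract>This contract is between &receipent; and &contractor; and the award is &payment;.</contract>"""

text = ET.fromstring(contract).text
print(text)
```

Changing only the three declarations yields a different contract from the same body text. Entity expansion should be used with care on untrusted input, since nested entities can be abused to exhaust memory.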

29
General Entities
  • General entity declaration
  • <!ENTITY xml "Extensible Markup Language">
  • Entity reference in a document
  • The &xml; is derived from ISO 8879, an
    International Standard. An attribute value can
    use it too: label="&xml;"
  • Entity reference expanded
  • The Extensible Markup Language is derived
    from ISO 8879, an International Standard.
    label="Extensible Markup Language"

30
Parameter Entities
  • A parameter entity is for use only in DTDs
  • Parameter entities carry information for use in
    the markup declaration, often a set of common
    attributes shared by several elements or a link
    to an outside DTD.
  • Parameter entities whose references are purely
    within DTD are known as internal entities,
    whereas references that draw information from
    outside files are external entities
  • Parameter entities use a % sign both in their
    declaration and in their references to
    distinguish themselves from general entities

31
Parameter Entities
  • Parameter entity declaration
  • Parameter entity reference in DTD
  • Parameter entity reference expanded

32
DTDs
  • The oldest schema format for XML
  • Disadvantages
  • It has no support for newer features of XML, most
    importantly namespaces.
  • It lacks expressiveness. Certain formal aspects
    of an XML document cannot be captured in a DTD.
  • It uses a custom non-XML syntax, inherited from
    SGML, to describe the schema.
  • Still used in many applications because it is
    considered the easiest to read and write.

33
Valid documents: XML semantics
  • Other schema languages
  • XML Schema (XSD) (will see)
  • RELAX NG (specified by OASIS, now an ISO standard
    as part of DSDL)
  • ISO DSDL (Document Schema Description Languages)

  • Schematron

34
XML Namespaces
  • Namespaces are a simple and straightforward way
    to distinguish names used in XML documents, no
    matter where they come from
  • The only reason namespaces exist is to give
    elements and attributes programmer-friendly names
    that will be unique across the whole Internet

35
Example
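  <h:html xmlns:xdc="http://www.xml.com/books"
          xmlns:h="http://www.w3.org/HTML/1998/html4">
   <h:head><h:title>Book Review</h:title></h:head>
   <h:body>
    <xdc:bookreview>
     <xdc:title>XML: A Primer</xdc:title>
     <h:table>
      <h:tr align="center">
       <h:td>Author</h:td><h:td>Price</h:td>
       <h:td>Pages</h:td><h:td>Date</h:td></h:tr>
      <h:tr align="left">
       <h:td><xdc:author>Simon St. Laurent</xdc:author></h:td>
       <h:td><xdc:price>31.98</xdc:price></h:td>
       <h:td><xdc:pages>352</xdc:pages></h:td>
       <h:td><xdc:date>1998/01</xdc:date></h:td>
      </h:tr>
     </h:table>
    </xdc:bookreview>
   </h:body>
  </h:html>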

36
XML Namespaces
  • The prefixes are linked to the full names using
    the attributes on the top element whose names
    begin with xmlns
  • The prefixes are just shorthand placeholders for
    the full names
  • Those full names are URIs, i.e. Web addresses
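A sketch of how a parser sees prefixes and full names, using Python's standard ElementTree. The h namespace URI comes from the example; the xdc URI and the shortened document are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened document; the xdc namespace URI is an assumption.
doc = """
<h:html xmlns:xdc="http://www.xml.com/books"
        xmlns:h="http://www.w3.org/HTML/1998/html4">
  <h:head><h:title>Book Review</h:title></h:head>
  <h:body><xdc:title>XML: A Primer</xdc:title></h:body>
</h:html>
""".strip()

root = ET.fromstring(doc)

# The parser replaces each prefix with its full name: {URI}localname.
print(root.tag)

# Queries map prefixes of our own choosing to the same URIs.
ns = {
    "h": "http://www.w3.org/HTML/1998/html4",
    "xdc": "http://www.xml.com/books",
}
html_title = root.find("h:head/h:title", ns).text
book_title = root.find(".//xdc:title", ns).text

# A different prefix bound to the same URI names the same element type.
other = ET.fromstring('<x:html xmlns:x="http://www.w3.org/HTML/1998/html4"/>')
same_name = other.tag == root.tag
```

The two title elements do not clash because their full names differ, even though the local name is the same.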

37
Extensibility in XML
  • Anyone can invent new tags and attach a meaning
    to those tags
  • But if every user creates their own XML definition
    for describing their data, it is not possible to
    achieve interoperability
  • For example, one may prefer to use the tag name
    POR, while another prefers using the tag name
    PurchaseOrderReq
  • In other words, a tagged document is not very
    useful without some kind of agreement on the tags
    among inter-operating applications

38
Extensibility in XML
  • Anyone can invent new tags and attach a meaning
    to those tags
  • For example
  • <MobileDevice>This device</MobileDevice>
  • <HandHeldDevice>This device</HandHeldDevice>
  • But if every user creates their own XML definition
    for describing their data, it is not possible to
    achieve interoperability

39
Agreement on tags is necessary
  • In other words, a tagged document is not very
    useful without some kind of agreement on the tags
    among inter-operating applications

Mobile Device
Hand Held Device
40
Many Efforts for Standardized Tags
  • HL7 for healthcare
  • RosettaNet for supply chain integration in
    Information Technology and Electronic Components
    domain
  • GS1 again in supply chain
  • ebXML for eBusiness
  • Common Business Library (CBL) for electronic
    catalogs, purchase orders, etc.

41
XML Parsers
  • A parser takes an XML document and makes its
    structure and content available to an application
    through an API
  • There are two main Application Programming
    Interfaces (APIs) that parsers provide
  • Document Object Model (DOM) and
  • Simple API for XML (SAX)
  • Today, many parsers are both DOM and SAX compliant
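The two API styles can be compared on the same input with Python's standard library: `xml.dom.minidom` builds the whole tree in memory, while `xml.sax` delivers events to callbacks as it reads. The order/item document and the ItemCollector class are hypothetical.

```python
import xml.sax
from xml.dom import minidom

doc = "<order><item>pen</item><item>ink</item></order>"

# DOM: parse everything first, then navigate the in-memory tree.
dom = minidom.parseString(doc)
dom_items = [node.firstChild.data for node in dom.getElementsByTagName("item")]


# SAX: the parser calls back as it meets start tags, text and end tags.
class ItemCollector(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.items = []
        self._in_item = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "item":
            self._in_item = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so collect it.
        if self._in_item:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "item":
            self.items.append("".join(self._buffer))
            self._in_item = False


handler = ItemCollector()
xml.sax.parseString(doc.encode("utf-8"), handler)
sax_items = handler.items

print(dom_items, sax_items)
```

DOM is convenient for random access and modification; SAX keeps memory use low on large documents because nothing is retained unless the handler stores it.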

42
XML DOM Parser
A parser validates and makes the data
contained in an XML document available
to the application
43
XSLT Processor
  • Converts an XML document to another form
  • An XSL style sheet is a set of transformation
    instructions for converting a source XML document
    to a target document

44
(No Transcript)
45
Why XML?
46
XML vs EDI
47
XML vs EDI
48
XML vs EDI
49
Critique of XML: Advantages
  • It is text-based.
  • It supports Unicode, allowing almost any
    information in any written human language to be
    communicated.
  • It can represent the most general computer
    science data structures: records, lists and
    trees.
  • Its self-documenting format describes structure
    and field names as well as specific values.
  • The strict syntax and parsing requirements make
    the necessary parsing algorithms extremely
    simple, efficient, and consistent.
  • XML is heavily used as a format for document
    storage and processing, both online and offline.

  • It is based on international standards.

50
Critique of XML: Advantages
  • It allows validation using schema languages such
    as XSD and Schematron, which makes effective
    unit-testing, firewalls, acceptance testing,
    contractual specification and software
    construction easier.
  • The hierarchical structure is suitable for most
    (but not all) types of documents.
  • It manifests as plain text files, which are less
    restrictive than other proprietary document
    formats.
  • It is platform-independent
  • Forward and backward compatibility are relatively
    easy to maintain
  • Its predecessor, SGML, has been in use since
    1986, so there is extensive experience and
    software available.
  • An element fragment of a well-formed XML document
    is also a well-formed XML document.

51
Critique of XML: Disadvantages
  • XML syntax is redundant or large relative to
    binary representations of similar data.
  • The redundancy may affect application efficiency
    through higher storage, transmission and
    processing costs.
  • XML syntax is verbose relative to other
    alternative 'text-based' data transmission
    formats.
  • No intrinsic data type support: XML provides no
    specific notion of "integer", "string",
    "boolean", "date", and so on

52
Critique of XML: Disadvantages
  • The hierarchical model for representation is
    limited in comparison to the relational model or
    an object oriented graph.
  • Expressing overlapping (non-hierarchical) node
    relationships requires extra effort.
  • XML namespaces are problematic to use and
    namespace support can be difficult to correctly
    implement in an XML parser.
  • XML is commonly depicted as "self-documenting"
    but this depiction ignores critical ambiguities.

53
Some well-known XML based languages and
applications
  • RSS: Rich Site Summary
  • Ajax
  • SOAP: Simple Object Access Protocol
  • WSDL: Web Services Description Language
  • SVG: Scalable Vector Graphics
  • Office applications: OASIS, OpenOffice,
    Microsoft Office
  • HL7 Clinical Document Architecture (CDA)
  • ...

54
HL7 Clinical Document Architecture (CDA)
  • A specification for document exchange using
  • XML,
  • the HL7 Reference Information Model (RIM)
  • Version 3 methodology
  • and vocabulary (SNOMED, ICD, local, ...)
  • CDA Header
  • Metadata required for document discovery,
    management, retrieval
  • CDA Body
  • Clinical report
  • Discharge Summary
  • Referral

55
Clinical Document Architecture
56
HL7 CDA
  • Level One
  • The unconstrained CDA specification
  • Only the header is well structured
  • Level Two
  • Section Level Templates are applied with coded
    terms
  • Level Three
  • Entry Level Templates are applied
  • Machine Processable!

57
HL7 CDA Example