XML Vocabularies: Opportunities for Efficiency and Reliability

Learn more at: https://www.hytime.org

1
XML Vocabularies: Opportunities for Efficiency
and Reliability
  • Steven R. Newcomb
  • srn_at_techno.com
  • TechnoTeacher and ISOGEN Intl Corp.

2
A Markup Vocabulary is a list of names
  • Minimally, XML parsing yields elements with named
    types (tag names).
  • The list of these named element types (tag names)
    is the vocabulary of the document. (The names
    of their attributes are also part of the
    vocabulary.)
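
As a minimal sketch of this definition, the following Python collects the element type names and attribute names that a document actually uses. The sample document and its names are invented for illustration; only the standard library parser is assumed.

```python
import xml.etree.ElementTree as ET

def vocabulary(xml_text):
    """Return (element type names, attribute names) used in a document."""
    root = ET.fromstring(xml_text)
    element_names = set()
    attribute_names = set()
    for element in root.iter():          # iter() visits the root and all descendants
        element_names.add(element.tag)
        attribute_names.update(element.attrib.keys())
    return element_names, attribute_names

# Hypothetical sample document; the names are invented for illustration.
SAMPLE = '<order id="7"><item sku="A1">Widget</item><item sku="B2">Gadget</item></order>'

elements, attributes = vocabulary(SAMPLE)
print(sorted(elements))    # ['item', 'order']
print(sorted(attributes))  # ['id', 'sku']
```

Note that this yields only the list of names in use; it says nothing about what any of the names mean.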

3
Is information in XML interchangeable?
[XML example: the markup was lost in extraction; only the character data
"UDLINGBLON 29ow GUINEA FOWL 2000 3 9" survives.]
(Vocabulary: hperm, thone, kallow, spec, date)

4
XML Namespaces
  • XML Namespaces are vocabularies. The XML
    Namespace recommendation is a step on the road
    toward interoperability for XML messages.
  • A namespace amounts to an abstract place
    where there is a list of element type names (tag
    names) and/or attribute names.
  • URIs specify the namespaces in use.
  • There is no requirement that the specified URI is
    valid, much less that the indicated resource
    conforms to any sort of specification.
  • XML Namespaces provide a way for names to be
    guaranteed to be unique, and that's all.
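
A small sketch of the uniqueness point, using Python's standard parser. Both URIs are made-up identifiers and nothing needs to exist at either address; the parser simply treats the URI as part of each name.

```python
import xml.etree.ElementTree as ET

# Both URIs are invented for illustration; the parser never dereferences them.
DOC = """
<msg xmlns:a="http://example.com/ns/a" xmlns:b="http://example.com/ns/b">
  <a:date>2000-03-09</a:date>
  <b:date>a kind of fruit</b:date>
</msg>
"""

def split_qualified_name(tag):
    """Split ElementTree's '{uri}local' form into (uri, local name)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return None, tag

root = ET.fromstring(DOC)
tags = [child.tag for child in root]
for tag in tags:
    print(split_qualified_name(tag))
```

The two "date" elements end up with different qualified names, so they cannot be confused. What either one means is still left entirely to the reader.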

5
XML Namespaces and expectations
  • In some sense, an XML resource that uses the
    names of an XML namespace must inherit from it
    expectations as to the meaning and conventional
    use of each of the names.
  • (Right? Otherwise, why use it at all?)

6
Do namespaces help with interchange?
[XML example: the markup was lost in extraction; only the fragment
"thoneUDLINGBLON 29low GUINEA FOWL 2000 3 9" survives.]
(Vocabulary in the HP namespace: hperm, thone, kallow, spec, date)

7
XML Namespaces: Unresolved issues
  • How to express a namespace so it can be shared?
    What is the list of names?
  • How to write processing software for a namespace?
  • How to determine whether software for a namespace
    works according to the expectations of other
    users of the namespace?
  • How to determine whether an XML resource conforms
    to the syntactic and semantic requirements of the
    namespace?
  • How to determine, when information interchange
    fails, whose software is at fault? (The software
    that created the XML resource, or the recipient
    application?)
  • Accountability is vital to interchange.

8
XML Vocabularies in open environments
  • Ideally, an XML resource is self-describing.
    Since many XML resources use the same
    vocabularies, it's efficient to describe them in
    terms of the vocabularies they use.
  • Anybody who receives a well-described XML
    resource should be able to interpret it
    accurately.
  • Anybody should be able to create an XML resource
    that uses a vocabulary correctly, so that its
    recipient will interpret it accurately.
  • Vocabularies should be able to support entire
    industries and areas of human endeavor, in open,
    multivendor environments.
  • Vocabularies should offer huge advantages in
    efficiency and reliability.

9
XML Vocabularies in closed environments
  • Closed syndicates and would-be cartels need to
    resolve the same issues, so that their XML
    messages will interoperate.
  • It's extremely inefficient for each syndicate to
    invent the methodologies and tools for
    guaranteeing reliable vocabulary-based
    interoperability.
  • It's also a net contraction in the noosphere of
    the syndicate. Where to find technical
    expertise? How to maintain it? Etc.
  • Enlightened self-interest demands that the same
    methodologies and tools that support open
    interoperability be used internally.
  • Vocabularies should offer huge advantages in
    efficiency and reliability.

10
Methodologies and Tools for Vocabularies
  • Vocabularies can be used to make XML resources
    fully self-describing, fully interchangeable and
    fully interoperable, down to the last syntactic
    and semantic feature.
  • This can be accomplished using existing W3C and
    ISO recommendations and standards, all from the
    XML and SGML families of recommendations and
    standards.
  • Alternatively, the same principles could be
    applied using different modeling syntaxes,
    purpose-built for the Web.
  • But if it can be done without reinventing
    everything, why bother?

11
Processing of XML resources: 2 stages
  • The first stage of vocabulary processing can be
    accomplished by a single generic piece of
    software, the XML parser.
  • XML parsers don't do much vocabulary processing
    yet.
  • First stage: vocabulary syntax processing and
    validation
  • Check for conformance of the XML resource to each
    of the vocabularies it uses, to see whether
    invalid names were used.
  • Check for conformance to the structural model
    (DTD?) of each vocabulary used.
  • Is each name used in a valid context with respect
    to the other names in the same namespace?
    Meanings of names may change with context!
  • Check for conformance of data and attribute
    values to lexical models of valid data of each
    element/attribute in each vocabulary.
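
The three first-stage checks (valid names, valid context, valid data) can be sketched as follows. The vocabulary model here is a hypothetical Python dictionary invented for illustration, standing in for a DTD or similar formalism. It is far less expressive than a real content model.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical vocabulary model: for each element type, the child names it
# may contain and (optionally) a regular expression its text must match.
VOCABULARY = {
    "order": {"children": {"item", "date"}, "text": None},
    "item":  {"children": set(), "text": r"\S.*"},
    "date":  {"children": set(), "text": r"\d{4}-\d{2}-\d{2}"},
}

def validate(xml_text, vocabulary=VOCABULARY):
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []

    def check(element):
        model = vocabulary.get(element.tag)
        if model is None:                      # check 1: is the name in the vocabulary?
            problems.append(f"unknown name: {element.tag}")
            return
        text = (element.text or "").strip()
        pattern = model["text"]
        if pattern is not None and not re.fullmatch(pattern, text):
            problems.append(f"bad data in {element.tag}: {text!r}")   # check 3: lexical model
        for child in element:
            if child.tag in vocabulary and child.tag not in model["children"]:
                problems.append(f"{child.tag} not allowed inside {element.tag}")  # check 2: context
            check(child)

    check(ET.fromstring(xml_text))
    return problems

good = validate('<order><item>Widget</item><date>2000-03-09</date></order>')
bad = validate('<order><item>Widget<date>2000-03-09</date></item>'
               '<price>3</price><date>soon</date></order>')
print(good)  # []
print(bad)
```

Because the model is data rather than code, the same `validate` function serves any vocabulary expressed in this form, which is the kind of reuse argued for here.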

12
Processing of XML resources: 2 stages
  • Second stage of vocabulary processing is semantic
    interpretation of the vocabularies.
  • Since all vocabularies are different, according
    to the natures of their applications, no generic
    piece of software can interpret all vocabularies.
  • However, a paradigm in which vocabulary-specific
    processing need never include code which is
    duplicated in software that processes any other
    vocabulary could offer significant efficiencies
    and enhanced reliability.
  • More on this in a moment.

13
More efficiency/reliability in Stage 1
  • (Reminder: Stage 1 is vocabulary syntax
    processing and validation.)
  • Provide a formalism for the expression of
    vocabularies: the list of names, the contexts in
    which names can be used, and lexical models for
    the data contained in elements and in the value
    of attributes named in vocabularies.
  • The existing DTD formalism can already do most of
    this.
  • Let's not force applications to duplicate the
    functionality of checking the validity of
    vocabulary usage in XML resources. Let's build it
    into re-usable validating XML parsers.
  • They already validate against DTDs. Why not use
    that existing functionality for inherited
    vocabularies?

14
SX already validates inherited vocabularies.
  • There is an ISO standard for declaring, in an XML
    resource, conformance to one or more inheritable
    XML vocabularies. (In the ISO context, such a
    vocabulary is called an inheritable information
    architecture.)
  • Vocabularies can inherit from other vocabularies.
  • A single XML resource can inherit from more than
    one vocabulary.
  • Vocabularies are expressed using ordinary DTD
    syntax (with minor, optional enhancements).
  • Demonstration using the Topic Map inheritable
    vocabulary.

15
How to document vocabularies?
  • It would be great to be able to document
    vocabularies more effectively than we can now.

16
Which constructs are the comments about?
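[Annotated XML example: the markup and comments were lost in extraction;
only the fragment "date) UDLINGBLON 29 GUINEA FOWL 2000 3 9" survives.]
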
17
Documenting vocabularies
  • Topic maps are an extremely powerful way of
    documenting DTDs.
  • ...but that's another story for another time.

18
More efficiency/reliability in Stage 2
  • Reminder: Stage 2 is application-specific
    (i.e., vocabulary-specific) processing of XML
    resources, after parsing and other processing
    common to all XML resources has already been
    done.
  • Stage 2 is about resource interoperability, not
    just about interchangeability. It's about how we
    can guarantee that everyone understands the
    resource in the same way.
  • It's about the meaning of each name in a
    vocabulary.
  • It's about the meaning of the data associated
    with each vocabulary name in each resource that
    uses the vocabulary.
  • It's about expectations: the resource creator's
    expectations about what will be understood by
    recipients of the resource, and the recipients'
    expectations about the kinds of things that a
    resource that uses a certain vocabulary can say.

19
More efficiency/reliability in Stage 2
  • No generic processor can understand all
    vocabularies. In general, a special processor is
    needed for each vocabulary.
  • Still, there are huge opportunities, even in
    Stage 2, for efficiency and reliability:
  • There can be a common way to express
    vocabulary-specific semantics.
  • At least some of these expressions can be formal
    and machine-readable, so tools can be built that
    enhance the productivity of application builders.
  • Many XML resources can inherit multiple
    vocabularies, thus recycling existing knowledge
    about vocabularies, and avoiding redundant
    learning cycles. (Example: XLL combined with
    BizTalk.)
  • A re-usable software engine can be built for
    each vocabulary, and means for plugging such
    engines into applications can be developed.
    (Same example applies.)

20
Modeling is the key
  • In Stage 1 of XML resource processing, models of
    the structural and lexical requirements
    associated with each vocabulary can drive a
    generic parsing/validating process.
  • In Stage 2 of XML resource processing, models of
    the abstract information sets that can be
    conveyed by specific vocabularies can be created.
  • These abstract APIs give names to each of the
    properties of the information set that emerges
    from processing a vocabulary.
  • Abstract API models are contracts between
    programmers, just as a DTD is a contract between
    information users and providers.
  • In an actual implementation of a vocabulary
    processing engine, these property names can
    become function calls (or whatever).
  • In other words, these abstract information set
    models can drive a generic engine-building
    process that produces vocabulary-specific
    engines.
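
One way to picture a model-driven engine builder, as a rough sketch only: a declarative table of property names drives the generation of a vocabulary-specific class. The "order" vocabulary, the property names, and `build_engine` are all hypothetical, and the model is written as Python functions rather than in any standard modeling syntax.

```python
import xml.etree.ElementTree as ET

# Hypothetical abstract information set model for a toy "order" vocabulary:
# each property name maps to a function that computes it from the parsed element.
ORDER_PROPERTIES = {
    "item_count": lambda order: len(order.findall("item")),
    "total_quantity": lambda order: sum(
        int(item.get("qty", "1")) for item in order.findall("item")
    ),
}

def build_engine(class_name, properties):
    """Generate a class whose attributes are the model's property names."""
    def __init__(self, element):
        self._element = element

    namespace = {"__init__": __init__}
    for name, compute in properties.items():
        # The default argument binds this iteration's function to the property.
        namespace[name] = property(lambda self, _c=compute: _c(self._element))
    return type(class_name, (object,), namespace)

Order = build_engine("Order", ORDER_PROPERTIES)
order = Order(ET.fromstring('<order><item qty="2"/><item/><item qty="5"/></order>'))
print(order.item_count, order.total_quantity)  # 3 8
```

Application code asks for `order.total_quantity` and never walks the element structure itself, so the interchange syntax can change without breaking callers as long as the property names hold.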

21
Bi-directional transformation
  • All XML resources convey information that really
    has two forms:
  • The interchangeable (but otherwise useless) XML
    form, and
  • The parsed, processed, application-internal form.
  • Stages 1 and 2 are about the conversion from
    the interchange form to the useful form. The
    other transformation -- from the useful form to
    the interchange form -- is at least equally
    important.
  • For reliable, efficient information interchange,
    the nature of both transformations must be
    documented.
  • It would be great if the URI of the vocabulary's
    namespace pointed at a document that had both
    models, and explained the algorithms involved in
    transforming information between them.
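
The two transformations can be sketched as a pair of functions that round-trip between an interchange form and an internal form. The `reading` element, its `sensor` attribute, and the `Reading` class are invented for illustration.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

@dataclass
class Reading:
    """Application-internal form of a hypothetical <reading> element."""
    sensor: str
    value: float

def from_xml(xml_text):
    """Interchange form -> internal form."""
    element = ET.fromstring(xml_text)
    return Reading(sensor=element.get("sensor"), value=float(element.text))

def to_xml(reading):
    """Internal form -> interchange form."""
    element = ET.Element("reading", {"sensor": reading.sensor})
    element.text = repr(reading.value)
    return ET.tostring(element, encoding="unicode")

original = '<reading sensor="t1">21.5</reading>'
internal = from_xml(original)
print(internal)
print(to_xml(internal) == original)  # True
```

Checking that each direction undoes the other is one simple way to test that both transformations have been documented consistently.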

22
A common fallacy: DTD is API
  • The fallacy is that the structure of an XML
    resource should also be the API to the
    information it contains.
  • Trying to make the element structure also be the
    API makes it impossible to have both a good
    interchange structure and a good API. The
    attempt introduces inefficiency and invites
    unreliability of information interchange.
  • The Document Object Model (DOM) is an API to the
    generic structure of XML resources. It is not
    and can never be the API to the information sets
    conveyed by all vocabularies.
  • If, e.g., the XLL vocabulary's functionality gets
    built into the DOM, what vocabulary's
    functionality shouldn't be built into the DOM?
  • No committee can possibly do all this work!

23
Desirable qualities in an interchange syntax
  • Maximal appropriateness to the information it
    conveys: intrinsic character of information well
    reflected in interchange structure.
  • Communications efficiency: no redundancy
  • Validatability: no ambiguity
  • Neutrality: no hidden assumptions about platform,
    vendor or application
  • Self-description: conformance to intelligible,
    well-documented formal model

24
Interchange syntax model is a contract
  • DTD is a contract between:
  • information creators
  • information consumers
  • applications developers
  • DTD enhanced with type checking, lexical typing,
    etc., is a more detailed contract between the
    same players

25
Desirable qualities in an Abstract API
  • Maximal convenience for applications developers
  • Abstract API is intuitive for learning and use
  • Abstract APIs often need redundant access
    methods, for the convenience of programmers
  • Processing tasks common to all applications
    (beyond parsing and validation) are supported by
    the implementation of the abstract API.
  • Abstract API should include both:
  • Properties directly derivable from syntactic
    structure of interchange form.
  • Properties implicit in architecture but not
    reflected in syntactic structures.
  • Neutrality: no hidden assumptions about platform,
    vendor or application.
  • Self-description: API is intelligible,
    well-documented

26
Abstract API model is a contract, too
  • ...between programmers of applications that, with
    respect to a given vocabulary:
  • Create XML resources.
  • Receive XML resources and use the information
    they convey.
  • Support the creation of XML resources that link
    to the emergent properties of other resources.
  • Support the querying of XML resources with
    respect to the values of specific emergent
    properties.

27
Two sides of one coin
  • The interchange syntax model and the abstract API
    are two aspects of the same information set:
  • Syntax model: consensus about the interchange
    format of the information set
  • Abstract API: consensus about the abstract
    properties of the information set

28
XML needs
  • Enhanced syntactic modeling capabilities for
    generic XML processing/validation.
  • Especially: means for inheriting multiple
    vocabularies in XML instances, and for proving
    that they are all used correctly.
  • Note: lexical modeling features, and many other
    syntactic enhancements, can be made to XML by
    means of vocabularies.
  • Semantic modeling capabilities that allow us to
    give names to the emergent properties of XML
    resources that use vocabularies.
  • A convention, such as that which exists for XML
    Namespaces today, for pointing to these models
    from within XML resources, so as to indicate the
    use of a given vocabulary.

29
Semantic modeling: emergent properties
  • Example of an emergent property: the property
    of being a target of an xlink (considering XLL as
    a vocabulary, as it is in ISO-land).
  • All emergent properties of a vocabulary must be
    described clearly, comprehensively,
    unambiguously, and formally, because:
  • accuracy and reliability are important.
  • the information is expected to be useful in
    multi-vendor application environments (if not,
    why inherit a vocabulary at all?).
  • implementation of vocabulary-specific
    applications must be done at reasonable cost.

30
Semantic validation becomes a side-effect
  • Computing an emergent property value often isn't
    possible without validating the interchanged
    information on which the computation is based.
  • For example, if an element that inherits from a
    vocabulary specifies a "start-time" attribute and
    an "end-time" attribute, we may intend that the
    duration of time between the start-time and the
    end-time be calculable and that it fall within a
    certain range (or at least be non-negative). In
    any case, we can't calculate the value of the
    duration property unless the start-time and
    end-time values exist and are amenable to
    calculation.
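
The start-time/end-time example can be sketched directly. The code assumes, purely for illustration, that both attribute values are ISO 8601 date-times that Python's `datetime.fromisoformat` accepts; the slide does not say what lexical form the values take.

```python
from datetime import datetime
import xml.etree.ElementTree as ET

def duration_seconds(element):
    """Compute the emergent 'duration' property, or raise ValueError."""
    start_text = element.get("start-time")
    end_text = element.get("end-time")
    if start_text is None or end_text is None:
        raise ValueError("start-time and end-time are both required")
    # Assumption: ISO 8601 values. fromisoformat raises ValueError otherwise.
    start = datetime.fromisoformat(start_text)
    end = datetime.fromisoformat(end_text)
    try:
        seconds = (end - start).total_seconds()
    except TypeError:
        # Raised when one value carries a UTC offset and the other does not.
        raise ValueError("start-time and end-time are not comparable")
    if seconds < 0:
        raise ValueError("end-time precedes start-time")
    return seconds

event = ET.fromstring(
    '<event start-time="2000-03-09T10:00:00" end-time="2000-03-09T11:30:00"/>'
)
print(duration_seconds(event))  # 5400.0
```

No separate validation pass is written here: every way the input can be unusable surfaces as a failure to compute the property.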

31
A standard property language exists
  • It's called "Property Sets."
  • A property set is an XML document that conforms
    to the ISO standard DTD for property sets.
  • Already in commercial use: the software already
    works with XML.
  • Every class of information component (node),
    and every property of every class, has a unique
    name.
  • These names can be used in queries.
  • This whole idea is often called "the Grove
    Paradigm." It's the basis of SGML processing,
    and the SGML Property Set aided the development
    of the DOM.

32
In the Grove Paradigm...
  • Vocabulary-specific engines can be plugged
    together in applications that support XML
    resources that use multiple vocabularies.
  • Vocabulary-specific engines generate a "grove"
    (object graph with relevant Property Set as
    schema) from any vocabulary-conforming XML
    instance.
  • Vocabulary-specific engines can mature and offer
    reliable semantic validation and processing
    services in a variety of application contexts,
    instead of being rebuilt in each application.
  • Time and cost of developing applications is
    reduced, while reliability of information
    interchange increases.
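
To make the idea of a grove concrete, here is a toy sketch: an object graph whose node classes and property names are fixed by a schema, and which can be queried by those names. The schema below is a hypothetical Python dictionary, not the ISO property set DTD, and the "order" vocabulary is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a property set: node class -> its property names.
PROPERTY_SET = {
    "order": ("items",),
    "item": ("name", "quantity"),
}

class Node:
    """A grove node: a class name plus exactly the properties the schema names."""
    def __init__(self, class_name, **properties):
        expected = set(PROPERTY_SET[class_name])
        if set(properties) != expected:
            raise ValueError(f"{class_name} node must have exactly {sorted(expected)}")
        self.class_name = class_name
        self.properties = properties

def build_grove(xml_text):
    """A toy vocabulary-specific engine: XML instance in, grove out."""
    root = ET.fromstring(xml_text)
    items = [
        Node("item", name=item.text, quantity=int(item.get("qty", "1")))
        for item in root.findall("item")
    ]
    return Node("order", items=items)

def query(node, class_name, prop):
    """Yield the value of prop on every node of class_name in the grove."""
    if node.class_name == class_name:
        yield node.properties[prop]
    for value in node.properties.values():
        children = value if isinstance(value, list) else [value]
        for child in children:
            if isinstance(child, Node):
                yield from query(child, class_name, prop)

grove = build_grove('<order><item qty="2">Widget</item><item>Gadget</item></order>')
print(list(query(grove, "item", "quantity")))  # [2, 1]
```

The `query` function knows nothing about orders or items; it relies only on class and property names, which is what lets generic tools work over groves produced by different engines.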

33
The Grove Paradigm is Portable
  • The Grove Paradigm is highly portable: it can be
    used with any notation, not just XML and SGML.
  • Property sets can be used as a way to represent
    consensus about how to address the abstract
    properties of any notation.
  • Think about it: a vocabulary is a notation. (And
    XML is a notation for vocabulary-notations.)
  • Let's look at some groves! (GroveMinder demo.)

34
Summary: Designing XML Vocabularies
  • Questions to ask:
  • Must certain semantic processing and validation
    operations be performed by all applications of
    this vocabulary?
  • Will more than one application have to deal with
    this vocabulary?
  • If so, its syntactic requirements deserve to be
    made explicit in a DTD (or something like a DTD),
    and
  • A property set (or other explicit Abstract API)
    defined for it will pay big dividends:
  • in software reuse
  • in achieving widespread consensus about what the
    vocabulary really means
  • in determining what went wrong when
    vocabulary-mediated information interchange fails

35
The preceding SX and GroveMinder demos are
available from Steve Newcomb
  • srn_at_techno.com