Title: XML Vocabularies: Opportunities for Efficiency and Reliability
1. XML Vocabularies: Opportunities for Efficiency and Reliability
- Steven R. Newcomb
- srn_at_techno.com
- TechnoTeacher and ISOGEN Intl Corp.
2. A Markup Vocabulary is a list of names
- Minimally, XML parsing yields elements with named types (tag names).
- The list of these named element types (tag names) is the vocabulary of the document. (The names of their attributes are also part of the vocabulary.)
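To make "vocabulary" concrete, here is a minimal Python sketch that lists the element type names and attribute names a document actually uses, via the standard library's xml.etree.ElementTree. The function name vocabulary and the sample document are invented for illustration. Note that this recovers only the names in use; it says nothing about what they mean.

```python
import xml.etree.ElementTree as ET

def vocabulary(xml_text: str) -> dict[str, set[str]]:
    """Collect the element type names and attribute names used in a document.

    Illustrative helper (hypothetical name), not part of any standard API.
    """
    root = ET.fromstring(xml_text)
    elements: set[str] = set()
    attributes: set[str] = set()
    for elem in root.iter():            # iter() visits the root and all descendants
        elements.add(elem.tag)          # the element type name (tag name)
        attributes.update(elem.attrib)  # attribute names are the keys of attrib
    return {"elements": elements, "attributes": attributes}

# Invented sample document.
sample = """<order date="2000-03-09">
  <item sku="A1">pheasant</item>
  <item sku="B2">quail</item>
</order>"""

print(vocabulary(sample))
```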
3. Is information in XML interchangeable?
[Figure: an example XML resource marked up with an unfamiliar vocabulary (hperm, thone, kallow, spec, date).]
4. XML Namespaces
- XML Namespaces are vocabularies. The XML Namespaces recommendation is a step on the road toward interoperability for XML messages.
- A namespace amounts to an abstract place where there is a list of element type names (tag names) and/or attribute names.
- URIs specify the namespaces in use.
- There is no requirement that the specified URI be valid, much less that the indicated resource conform to any sort of specification.
- XML Namespaces provide a way for names to be guaranteed to be unique, and that's all.
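A small Python illustration of the last two points, using xml.etree.ElementTree. The namespace URIs below are hypothetical and are never fetched: the parser simply pairs each local name with its URI (written {uri}local), which makes the two date names distinct without saying anything about what either one means.

```python
import xml.etree.ElementTree as ET

# Two elements share the local name "date" but sit in different (hypothetical)
# namespaces. The URIs are treated as opaque strings; nothing is dereferenced.
doc = """<r xmlns:a="http://example.com/ns/a" xmlns:b="http://example.com/ns/b">
  <a:date>2000-03-09</a:date>
  <b:date>9 March</b:date>
</r>"""

root = ET.fromstring(doc)
# ElementTree reports each namespaced name as "{uri}local".
qualified_names = [child.tag for child in root]
print(qualified_names)
# ['{http://example.com/ns/a}date', '{http://example.com/ns/b}date']

# Same local name, different qualified names: unique, but no shared meaning implied.
local_names = [tag.split("}")[1] for tag in qualified_names]
```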
5. XML Namespaces and expectations
- In some sense, an XML resource that uses the names of an XML namespace must inherit from it expectations as to the meaning and conventional use of each of the names.
- (Right? Otherwise, why use it at all?)
6. Do namespaces help with interchange?
[Figure: the example XML resource again, now with its vocabulary (hperm, thone, kallow, spec, date) in the HP namespace.]
7. XML Namespaces: Unresolved issues
- How to express a namespace so it can be shared? What is the list of names?
- How to write processing software for a namespace?
- How to determine whether software for a namespace works according to the expectations of other users of the namespace?
- How to determine whether an XML resource conforms to the syntactic and semantic requirements of the namespace?
- How to determine, when information interchange fails, whose software is at fault? (The software that created the XML resource, or the recipient application?)
- Accountability is vital to interchange.
8. XML Vocabularies in open environments
- Ideally, an XML resource is self-describing. Since many XML resources use the same vocabularies, it's efficient to describe them in terms of the vocabularies they use.
- Anybody who receives a well-described XML resource should be able to interpret it accurately.
- Anybody should be able to create an XML resource that uses a vocabulary correctly, so that its recipient will interpret it accurately.
- Vocabularies should be able to support entire industries and areas of human endeavor, in open, multivendor environments.
- Vocabularies should offer huge advantages in efficiency and reliability.
9. XML Vocabularies in closed environments
- Closed syndicates and would-be cartels need to resolve the same issues, so that their XML messages will interoperate.
- It's extremely inefficient for each syndicate to invent its own methodologies and tools for guaranteeing reliable vocabulary-based interoperability.
- It's also a net contraction in the noosphere of the syndicate. Where to find technical expertise? How to maintain it? Etc.
- Enlightened self-interest demands that the same methodologies and tools that support open interoperability be used internally.
- Vocabularies should offer huge advantages in efficiency and reliability.
10. Methodologies and Tools for Vocabularies
- Vocabularies can be used to make XML resources fully self-describing, fully interchangeable and fully interoperable, down to the last syntactic and semantic feature.
- This can be accomplished using existing W3C and ISO recommendations and standards, all from the XML and SGML families of recommendations and standards.
- Alternatively, the same principles could be applied using different modeling syntaxes, purpose-built for the Web.
- ...but if it can be done without reinventing everything, why bother?
11. Processing of XML resources: 2 stages
- The first stage of vocabulary processing can be accomplished by a single generic piece of software, the XML parser.
- XML parsers don't do much vocabulary processing yet.
- First stage: vocabulary syntax processing and validation
  - Check for conformance of the XML resource to each of the vocabularies it uses, to see whether invalid names were used.
  - Check for conformance to the structural model (DTD?) of each vocabulary used.
  - Is each name used in a valid context with respect to the other names in the same namespace? Meanings of names may change with context!
  - Check for conformance of data and attribute values to lexical models of valid data of each element/attribute in each vocabulary.
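The Stage 1 checks above can be sketched in Python roughly as follows. This is a toy under stated assumptions: the vocabulary model is a hypothetical Python dict (VOCAB) standing in for a DTD plus lexical models, element contexts are reduced to sets of allowed children, and lexical models are regular expressions. validate is an illustrative name, not an existing parser API.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical vocabulary model: the names it defines, the contexts (allowed
# children) in which each element may appear, and lexical patterns for
# attribute values and element content (None = content not checked).
VOCAB = {
    "order": {"children": {"item"}, "attrs": {"date": r"\d{4}-\d{2}-\d{2}"}, "text": None},
    "item":  {"children": set(),    "attrs": {"qty": r"[1-9]\d*"},           "text": r".+"},
}

def validate(xml_text: str, vocab: dict = VOCAB) -> list[str]:
    """Return Stage 1 errors for a resource (an empty list means it conforms)."""
    errors: list[str] = []

    def check(elem: ET.Element) -> None:
        rules = vocab.get(elem.tag)
        if rules is None:                          # is the name in the vocabulary?
            errors.append(f"unknown element <{elem.tag}>")
            return
        for child in elem:                         # is each child valid in this context?
            if child.tag not in rules["children"]:
                errors.append(f"<{child.tag}> not allowed inside <{elem.tag}>")
            check(child)
        for name, value in elem.attrib.items():    # lexical checks on attribute values
            pattern = rules["attrs"].get(name)
            if pattern is None:
                errors.append(f"unknown attribute {name} on <{elem.tag}>")
            elif not re.fullmatch(pattern, value):
                errors.append(f"bad value {value!r} for {name} on <{elem.tag}>")
        text = (elem.text or "").strip()           # lexical check on data content
        if rules["text"] is not None and not re.fullmatch(rules["text"], text):
            errors.append(f"bad content in <{elem.tag}>")

    check(ET.fromstring(xml_text))
    return errors

good = '<order date="2000-03-09"><item qty="2">pheasant</item></order>'
bad = '<order date="March 9"><item qty="0">x</item><note/></order>'
errs = validate(bad)
```

The point of the design is that validate is generic: only VOCAB changes from one vocabulary to the next.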
12. Processing of XML resources: 2 stages
- The second stage of vocabulary processing is semantic interpretation of the vocabularies.
- Since all vocabularies are different, according to the natures of their applications, no generic piece of software can interpret all vocabularies.
- However, a paradigm in which vocabulary-specific processing never duplicates code found in software that processes any other vocabulary could offer significant efficiencies and enhanced reliability.
- More on this in a moment.
13. More efficiency/reliability in Stage 1
- (Reminder: Stage 1 is vocabulary syntax processing and validation.)
- Provide a formalism for the expression of vocabularies: the list of names, the contexts in which names can be used, and lexical models for the data contained in elements and in the values of attributes named in vocabularies.
- The existing DTD formalism can already do most of this.
- Let's not force applications to duplicate the functionality of checking the validity of vocabulary usage in XML resources. Let's build it into reusable validating XML parsers.
- They already validate against DTDs. Why not use that existing functionality for inherited vocabularies?
14. SX already validates inherited vocabularies.
- There is an ISO standard for declaring, in an XML resource, conformance to one or more inheritable XML vocabularies. (In the ISO context, such a vocabulary is called an inheritable information architecture.)
- Vocabularies can inherit from other vocabularies.
- A single XML resource can inherit from more than one vocabulary.
- Vocabularies are expressed using ordinary DTD syntax (with minor, optional enhancements).
- Demonstration using the Topic Map inheritable vocabulary.
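The idea of one resource inheriting from more than one vocabulary can be sketched as below. This is a deliberately simplified Python illustration, not SX and not the ISO architecture mechanism itself. It assumes each client element names its base element type in an attribute named after the architecture (loosely modeled on architectural form attributes); the architectures tm and lnk and their element lists are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical base vocabularies ("architectures"), each reduced to a set of
# element type names. "tm" stands in for a topic-map vocabulary, "lnk" for a
# linking vocabulary.
ARCHITECTURES = {
    "tm":  {"topicmap", "topic", "occurrence"},
    "lnk": {"link", "anchor"},
}

def architectural_view(xml_text: str, arch: str) -> list[tuple[str, str]]:
    """Map each client element that declares a form in `arch` to that form.

    Simplification (assumption): the form is read from an attribute named after
    the architecture, e.g. tm="topic". Raises ValueError if a declared form is
    not in the base vocabulary.
    """
    allowed = ARCHITECTURES[arch]
    view: list[tuple[str, str]] = []
    for elem in ET.fromstring(xml_text).iter():
        form = elem.get(arch)
        if form is None:
            continue  # this element does not inherit from this architecture
        if form not in allowed:
            raise ValueError(f"<{elem.tag}> claims unknown {arch} form {form!r}")
        view.append((elem.tag, form))
    return view

# One client document inheriting from two architectures at once.
doc = """<birdlist tm="topicmap">
  <bird tm="topic"><sighting tm="occurrence" lnk="link">2000</sighting></bird>
</birdlist>"""

print(architectural_view(doc, "tm"))
print(architectural_view(doc, "lnk"))
```

Each architecture gets its own view of the same document, so each base vocabulary can be validated independently.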
15. How to document vocabularies?
- It would be great to be able to document
vocabularies more effectively than we can now.
16. Which constructs are the comments about?
[Figure: the example XML resource, annotated with comments.]
17. Documenting vocabularies
- Topic maps are an extremely powerful way of documenting DTDs.
- ...but that's another story for another time.
18. More efficiency/reliability in Stage 2
- Reminder: Stage 2 is application-specific (i.e., vocabulary-specific) processing of XML resources, after parsing and other processing common to all XML resources has already been done.
- Stage 2 is about resource interoperability, not just interchangeability. It's about how we can guarantee that everyone understands the resource in the same way.
- It's about the meaning of each name in a vocabulary.
- It's about the meaning of the data associated with each vocabulary name in each resource that uses the vocabulary.
- It's about expectations: the resource creators' expectations about what will be understood by recipients of the resource, and the recipients' expectations about the kinds of things that a resource that uses a certain vocabulary can say.
19. More efficiency/reliability in Stage 2
- No generic processor can understand all vocabularies. In general, a special processor is needed for each vocabulary.
- Still, there are huge opportunities, even in Stage 2, for efficiency and reliability:
  - There can be a common way to express vocabulary-specific semantics.
  - At least some of these expressions can be formal and machine-readable, so tools can be built that enhance the productivity of application builders.
  - Many XML resources can inherit multiple vocabularies, thus recycling existing knowledge about vocabularies and avoiding redundant learning cycles. (Example: XLL combined with BizTalk.)
  - A reusable software engine can be built for each vocabulary, and means for plugging such engines into applications can be developed. (The same example applies.)
20. Modeling is the key
- In Stage 1 of XML resource processing, models of the structural and lexical requirements associated with each vocabulary can drive a generic parsing/validating process.
- In Stage 2 of XML resource processing, models of the abstract information sets that can be conveyed by specific vocabularies can be created.
- These abstract APIs give names to each of the properties of the information set that emerges from processing a vocabulary.
- Abstract API models are contracts between programmers, just as a DTD is a contract between information users and providers.
- In an actual implementation of a vocabulary processing engine, these property names can become function calls (or whatever).
- In other words, these abstract information set models can drive a generic engine-building process that produces vocabulary-specific engines.
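One way to read "models can drive a generic engine-building process" is sketched below in Python. An abstract API model (here a hypothetical dict, ORDER_MODEL, mapping property names to rules) is turned by a generic function into a class whose properties carry those names. build_engine and the order vocabulary are assumptions for illustration; a real property-set-driven engine would be far more elaborate.

```python
import xml.etree.ElementTree as ET
from typing import Callable

Rule = Callable[[ET.Element], object]

# Hypothetical abstract API model for a tiny "order" vocabulary: each property
# name is paired with a rule that computes it from the parsed element.
ORDER_MODEL: dict[str, Rule] = {
    "date":       lambda e: e.get("date"),
    "item_names": lambda e: [i.text for i in e.findall("item")],
    "item_count": lambda e: len(e.findall("item")),
}

def build_engine(name: str, model: dict[str, Rule]) -> type:
    """Generic engine builder: turn a property model into a class with read-only properties."""
    def __init__(self, elem: ET.Element) -> None:
        self._elem = elem

    namespace: dict[str, object] = {"__init__": __init__}
    for prop, rule in model.items():
        # Bind `rule` as a default argument so each property keeps its own rule.
        namespace[prop] = property(lambda self, rule=rule: rule(self._elem))
    return type(name, (), namespace)

OrderEngine = build_engine("OrderEngine", ORDER_MODEL)
order = OrderEngine(ET.fromstring('<order date="2000-03-09"><item>a</item><item>b</item></order>'))
print(order.date, order.item_count, order.item_names)
```

Here the property names in the model become attribute accesses; they could equally become function calls in another language.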
21. Bi-directional transformation
- All XML resources convey information that really has two forms:
  - the interchangeable (but otherwise useless) XML form, and
  - the parsed, processed, application-internal form.
- Stages 1 and 2 are about the conversion from the interchange form to the useful form. The other transformation, from the useful form to the interchange form, is at least equally important.
- For reliable, efficient information interchange, the nature of both transformations must be documented.
- It would be great if the URI of the vocabulary's namespace pointed at a document that had both models, and explained the algorithms involved in transforming information between them.
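A minimal sketch of documenting both directions, assuming a hypothetical sighting vocabulary and a Python dataclass as the application-internal form. The round-trip check at the end is one simple way to state that the two transformations agree.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

@dataclass
class Sighting:
    """Hypothetical application-internal (useful) form."""
    species: str
    year: int

def from_xml(text: str) -> Sighting:
    """Interchange form -> useful form."""
    elem = ET.fromstring(text)
    # Assumes the year attribute is present and numeric; a real engine would validate it.
    return Sighting(species=elem.findtext("species", default=""), year=int(elem.get("year")))

def to_xml(s: Sighting) -> str:
    """Useful form -> interchange form."""
    elem = ET.Element("sighting", {"year": str(s.year)})
    ET.SubElement(elem, "species").text = s.species
    return ET.tostring(elem, encoding="unicode")

original = Sighting("pheasant", 2000)
round_tripped = from_xml(to_xml(original))
print(to_xml(original))
```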
22. A common fallacy: DTD is API
- The fallacy: the structure of an XML resource should also be the API to the information it contains.
- Trying to make the element structure also be the API makes it impossible to have both a good interchange structure and a good API. The attempt introduces inefficiency and invites unreliability of information interchange.
- The Document Object Model (DOM) is an API to the generic structure of XML resources. It is not, and can never be, the API to the information sets conveyed by all vocabularies.
- If, e.g., the XLL vocabulary's functionality gets built into the DOM, what vocabulary's functionality shouldn't be built into the DOM?
- No committee can possibly do all this work!
23. Desirable qualities in an interchange syntax
- Maximal appropriateness to the information it conveys
  - intrinsic character of information well reflected in interchange structure
- Communications efficiency
  - no redundancy
- Validatability
  - no ambiguity
- Neutrality
  - no hidden assumptions about platform, vendor or application
- Self-description
  - conformance to intelligible, well-documented formal model
24. Interchange syntax model is a contract
- A DTD is a contract between
  - information creators
  - information consumers
  - application developers
- A DTD enhanced with type checking, lexical typing, etc., is a more detailed contract between the same players.
25. Desirable qualities in an Abstract API
- Maximal convenience for application developers
  - Abstract API is intuitive for learning and use.
  - Abstract APIs often need redundant access methods, for the convenience of programmers.
- Processing tasks common to all applications (beyond parsing and validation) are supported by the implementation of the abstract API.
- Abstract API should include both
  - properties directly derivable from the syntactic structure of the interchange form, and
  - properties implicit in the architecture but not reflected in syntactic structures.
- Neutrality
  - no hidden assumptions about platform, vendor or application
- Self-description
  - API is intelligible and well-documented
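The distinction between directly derivable properties, implicit properties, and redundant convenience methods might look like this in a Python abstract API. The Order class, its attribute names, and the price-in-cents convention are hypothetical.

```python
import xml.etree.ElementTree as ET

class Order:
    """Hypothetical abstract API over an <order> element of an invented vocabulary."""

    def __init__(self, elem: ET.Element) -> None:
        # (sku, quantity, unit price in cents) for each <item>; assumes the attributes exist.
        self._items = [
            (i.get("sku"), int(i.get("qty")), int(i.get("price")))
            for i in elem.findall("item")
        ]

    @property
    def skus(self) -> list[str]:
        """Directly derivable from the syntactic structure."""
        return [sku for sku, _, _ in self._items]

    @property
    def total_cents(self) -> int:
        """Implicit in the vocabulary's semantics; no element carries it."""
        return sum(qty * price for _, qty, price in self._items)

    def quantity_of(self, sku: str) -> int:
        """Redundant convenience accessor; callers could compute this themselves."""
        return sum(qty for s, qty, _ in self._items if s == sku)

doc = '<order><item sku="A1" qty="2" price="150"/><item sku="B2" qty="1" price="400"/></order>'
order = Order(ET.fromstring(doc))
print(order.skus, order.total_cents, order.quantity_of("A1"))
```

Note that the interchange form never states a total; the API exposes it anyway, which is exactly where the API and the element structure part ways.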
26. Abstract API model is a contract, too
- ...between programmers of applications that, with respect to a given vocabulary:
  - Create XML resources.
  - Receive XML resources and use the information they convey.
  - Support the creation of XML resources that link to the emergent properties of other resources.
  - Support the querying of XML resources with respect to the values of specific emergent properties.
27. Two sides of one coin
- The interchange syntax model and the abstract API are two aspects of the same information set.
  - Syntax model: consensus about the interchange format of the information set
  - Abstract API: consensus about the abstract properties of the information set
28. XML needs
- Enhanced syntactic modeling capabilities for generic XML processing/validation.
  - Especially: means for inheriting multiple vocabularies in XML instances, and for proving that they are all used correctly.
  - Note: lexical modeling features, and many other syntactic enhancements, can be added to XML by means of vocabularies.
- Semantic modeling capabilities that allow us to give names to the emergent properties of XML resources that use vocabularies.
- A convention, such as the one that exists for XML Namespaces today, for pointing to these models from within XML resources, so as to indicate the use of a given vocabulary.
29. Semantic modeling: emergent properties
- Example of an emergent property: the property of being a target of an xlink (considering XLL as a vocabulary, as it is in ISO-land).
- All emergent properties of a vocabulary must be described clearly, comprehensively, unambiguously, and formally, because
  - accuracy and reliability are important;
  - the information is expected to be useful in multivendor application environments (if not, why inherit a vocabulary at all?);
  - implementation of vocabulary-specific applications must be done at reasonable cost.
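As a rough Python sketch of computing that emergent property: no element's own markup says "I am a link target"; the property emerges only from other elements' links. The sketch assumes XLink-style xlink:href attributes and, as a simplification, considers only same-document references of the form #id. link_targets is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# XLink namespace; ElementTree reports namespaced attributes as "{uri}local".
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

def link_targets(xml_text: str) -> dict[str, bool]:
    """For every element with an id, compute 'is the target of an xlink'.

    Simplification (assumption): only same-document references "#id" count.
    """
    root = ET.fromstring(xml_text)
    referenced = {
        e.get(XLINK_HREF)[1:]
        for e in root.iter()
        if e.get(XLINK_HREF, "").startswith("#")
    }
    return {e.get("id"): e.get("id") in referenced for e in root.iter() if e.get("id") is not None}

doc = """<doc xmlns:xlink="http://www.w3.org/1999/xlink">
  <sec id="intro"/>
  <sec id="details"/>
  <ref xlink:href="#details"/>
</doc>"""

targets = link_targets(doc)
print(targets)
```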
30. Semantic validation becomes a side effect
- Computing an emergent property value often isn't possible without validating the interchanged information on which the computation is based.
- For example, if an element that inherits from a vocabulary specifies a "start-time" attribute and an "end-time" attribute, we may intend that the duration of time between the start-time and the end-time be calculable and that it fall within a certain range (or at least be non-negative). In any case, we can't calculate the value of the duration property unless the start-time and end-time values exist and are amenable to calculation.
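The start-time/end-time example can be sketched in Python as below. The 24-hour upper bound, the ISO 8601 attribute format, and the names duration, is_valid and InvalidResource are assumptions for illustration. Validation falls out of the attempt to compute the property: if the duration cannot be computed, or is out of range, the resource is rejected.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta

class InvalidResource(ValueError):
    """Raised when the data a property depends on is missing, malformed or out of range."""

# Hypothetical range: the vocabulary is assumed to allow events of up to 24 hours.
MAX_DURATION = timedelta(hours=24)

def duration(elem: ET.Element) -> timedelta:
    """Compute the emergent 'duration' property; validation happens along the way."""
    try:
        start = datetime.fromisoformat(elem.attrib["start-time"])
        end = datetime.fromisoformat(elem.attrib["end-time"])
    except KeyError as absent:
        raise InvalidResource(f"missing attribute {absent}") from None
    except ValueError as bad:
        raise InvalidResource(f"unparseable time: {bad}") from None
    d = end - start
    if not timedelta(0) <= d <= MAX_DURATION:
        raise InvalidResource(f"duration {d} outside the allowed range")
    return d

def is_valid(elem: ET.Element) -> bool:
    """Semantic validation as a side effect: valid iff the property is computable."""
    try:
        duration(elem)
    except InvalidResource:
        return False
    return True

ok = ET.fromstring('<event start-time="2000-03-09T10:00" end-time="2000-03-09T12:30"/>')
backwards = ET.fromstring('<event start-time="2000-03-09T12:00" end-time="2000-03-09T10:00"/>')
incomplete = ET.fromstring('<event start-time="2000-03-09T12:00"/>')
print(duration(ok), is_valid(backwards), is_valid(incomplete))
```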
31. A standard property language exists
- It's called "Property Sets".
- A property set is an XML document that conforms to the ISO standard DTD for property sets.
- Already in commercial use; the software already works with XML.
- Every class of information component (node), and every property of every class, has a unique name.
- These names can be used in queries.
- This whole idea is often called "the Grove Paradigm". It's the basis of SGML processing, and the SGML Property Set aided the development of the DOM.
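To illustrate only the naming idea (unique class and property names that queries can use), here is a drastically simplified Python stand-in. It is not the ISO property set DTD or any grove engine's API; the class names, property names and the query function are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical, drastically simplified property set: each node class name maps
# to the names of the properties that class defines.
PROPERTY_SET: dict[str, set[str]] = {
    "element":  {"name", "attributes", "children"},
    "datachar": {"char"},
}

@dataclass
class Node:
    cls: str
    props: dict = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Reject properties that the property set does not define for this class.
        unknown = set(self.props) - PROPERTY_SET[self.cls]
        if unknown:
            raise ValueError(f"class {self.cls!r} has no properties {sorted(unknown)}")

def query(nodes: list[Node], cls: str, prop: str, value: object) -> list[Node]:
    """Select nodes by class name and property name, as a query language might."""
    if prop not in PROPERTY_SET.get(cls, set()):
        raise KeyError(f"property {prop!r} is not defined for class {cls!r}")
    return [n for n in nodes if n.cls == cls and n.props.get(prop) == value]

nodes = [
    Node("element", {"name": "order"}),
    Node("element", {"name": "item"}),
    Node("datachar", {"char": "x"}),
]
items = query(nodes, "element", "name", "item")
print(items)
```

Because the names are fixed by the (stand-in) property set, a query that mentions an undefined property fails immediately instead of silently matching nothing.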
32. In the Grove Paradigm...
- Vocabulary-specific engines can be plugged together in applications that support XML resources that use multiple vocabularies.
- Vocabulary-specific engines generate a "grove" (an object graph with the relevant Property Set as its schema) from any vocabulary-conforming XML instance.
- Vocabulary-specific engines can mature and offer reliable semantic validation and processing services in a variety of application contexts, instead of being rebuilt in each application.
- The time and cost of developing applications are reduced, while the reliability of information interchange increases.
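Plugging vocabulary-specific engines together could be sketched in Python as a registry keyed by namespace URI. Everything here is hypothetical: the two vocabularies, the engine signature, and the flat list of (name, properties) pairs, which is only a stand-in for a real grove (an object graph with a property set as its schema).

```python
import xml.etree.ElementTree as ET
from typing import Callable, Optional

# An engine takes (local name, element) and returns that element's properties,
# or None if it computes nothing for this element type.
Engine = Callable[[str, ET.Element], Optional[dict]]
ENGINES: dict[str, Engine] = {}

def engine(namespace_uri: str) -> Callable[[Engine], Engine]:
    """Decorator that plugs in a vocabulary-specific engine under its namespace URI."""
    def register(fn: Engine) -> Engine:
        ENGINES[namespace_uri] = fn
        return fn
    return register

@engine("urn:example:shipping")  # hypothetical vocabulary
def shipping_engine(local: str, elem: ET.Element) -> Optional[dict]:
    return {"carrier": elem.get("carrier")} if local == "shipment" else None

@engine("urn:example:orders")    # hypothetical vocabulary
def orders_engine(local: str, elem: ET.Element) -> Optional[dict]:
    return {"item_count": len(elem)} if local == "order" else None

def split_name(tag: str) -> tuple[str, str]:
    """Split ElementTree's '{uri}local' notation into (uri, local)."""
    if tag.startswith("{"):
        uri, _, local = tag[1:].partition("}")
        return uri, local
    return "", tag

def build_grove(xml_text: str) -> list[tuple[str, dict]]:
    """Dispatch each element to the engine for its vocabulary and collect the results."""
    nodes = []
    for elem in ET.fromstring(xml_text).iter():
        uri, local = split_name(elem.tag)
        plug_in = ENGINES.get(uri)
        props = plug_in(local, elem) if plug_in else None
        if props is not None:
            nodes.append((local, props))
    return nodes

# One resource using two vocabularies at once.
doc = """<shipment xmlns="urn:example:shipping" xmlns:o="urn:example:orders" carrier="post">
  <o:order><o:item/><o:item/></o:order>
</shipment>"""
grove = build_grove(doc)
print(grove)
```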
33. The Grove Paradigm is Portable
- The Grove Paradigm is highly portable: it can be used with any notation, not just XML and SGML.
- Property sets can be used as a way to represent consensus about how to address the abstract properties of any notation.
- Think about it: a vocabulary is a notation. (And XML is a notation for vocabulary-notations.)
- Let's look at some groves! (GroveMinder demo.)
34. Summary: Designing XML Vocabularies
- Questions to ask:
  - Must certain semantic processing and validation operations be performed by all applications of this vocabulary?
  - Will more than one application have to deal with this vocabulary?
- If so, its syntactic requirements deserve to be made explicit in a DTD (or something like a DTD), and
- a property set (or other explicit Abstract API) defined for it will pay big dividends
  - in software reuse
  - in achieving widespread consensus about what the vocabulary really means
  - in determining what went wrong when vocabulary-mediated information interchange fails
35. The preceding SX and GroveMinder demos are available from Steve Newcomb.