1. A Flexible XML-Based Glossary Approach for the
Federal Government
- By Ken Sall
- for the US Federal XML Community of Practice
- January 19, 2005
2. Problem Statement
- After examining standard glossary terminology
(ISO 1087 and others), define an XML Schema or
DTD that models all useful aspects of a term
and its definition.
- Should be applicable to any government agency.
- Consider flexibility and collaborative
development as key design criteria. Many
different agencies may use the model and many
individuals may author specific term definitions.
- Create an XSLT stylesheet that knows about the
model and displays an XML glossary instance
document as HTML in any modern browser.
- Eventually consider XSL-FO for PDF rendering of
the glossary.
3. Design Goals
- Standards-Based - XML element names are loosely
based on an international standard, ISO 1087.
- Flexible - The Glossary DTD, although initially a
strawman to stimulate discussion, is fairly
flexible with few required elements, many
optional elements, and several repeatable
elements.
- Provides a Framework - Since so few elements are
required, terms can be added even before
definitions are known. These terms act as
placeholders that are fully supported by the DTD
and XSLT. (For example, see the stub terms "DTD"
and "XSLT" in the example instance.)
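As a rough illustration of the "few required elements" idea, the following Python sketch scans a small glossary instance and lists the stub terms that have no definition yet. The element names Glossary, Term, Name, DefinitionSection, Concept, and Definition come from the terminology slides; the rule that Name is the only required child of Term, the sample instance, and the helper name `find_stub_terms` are assumptions made for illustration, not the actual strawman DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance: "DTD" and "XSLT" are stub terms with no definition yet.
GLOSSARY_XML = """
<Glossary>
  <Term>
    <Name>ontology</Name>
    <DefinitionSection>
      <Concept>knowledge management</Concept>
      <Definition>Defines the common words and concepts used to describe
        and represent an area of knowledge.</Definition>
    </DefinitionSection>
  </Term>
  <Term><Name>DTD</Name></Term>
  <Term><Name>XSLT</Name></Term>
</Glossary>
"""


def find_stub_terms(glossary_xml):
    """Return the names of terms that have a Name but no DefinitionSection."""
    root = ET.fromstring(glossary_xml)
    stubs = []
    for term in root.findall("Term"):
        name = term.findtext("Name")
        if not name or not name.strip():
            # Assumed rule: Name is the one element every Term must carry.
            raise ValueError("Term is missing its required Name element")
        if term.find("DefinitionSection") is None:
            stubs.append(name.strip())
    return stubs


print(find_stub_terms(GLOSSARY_XML))
```

A stub term is still a valid entry under this assumed rule, so authors can register a term first and fill in its definition later.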
4. Design Goals (continued)
- Specialized - Any term may have multiple
definitions so that different agencies may use
the same term with their own specialized meaning,
where necessary.
- Collaborative - Since an XSLT stylesheet is used
to sort the terms alphabetically, many
individuals can work on their own glossary
fragments (XML instances of the Glossary DTD). At
any time, the various contributions can be easily
merged without manual editing.
- Leverages Links - Search links are automatically
generated for each term by means of the XSLT,
both to help kick-start and to augment the
definition.
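The sorting and link-generation goals can be sketched outside XSLT as well. The Python below is a minimal sketch, not the presentation's stylesheet: it sorts term names case-insensitively and builds one search link per term. The search URL prefix and the function name `sorted_terms_with_links` are assumptions; any search service that accepts a `q` query parameter would serve the same purpose.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote_plus

# Assumption: a search engine that accepts the term in a "q" query parameter.
SEARCH_URL = "https://www.google.com/search?q="


def sorted_terms_with_links(glossary_xml):
    """Return (name, search_link) pairs sorted alphabetically, ignoring case."""
    root = ET.fromstring(glossary_xml)
    names = [t.findtext("Name", default="").strip() for t in root.findall("Term")]
    names = [n for n in names if n]          # skip terms without a usable Name
    names.sort(key=str.casefold)             # case-insensitive alphabetical order
    return [(name, SEARCH_URL + quote_plus(name)) for name in names]


fragment = (
    "<Glossary>"
    "<Term><Name>XSLT</Name></Term>"
    "<Term><Name>semantic web</Name></Term>"
    "<Term><Name>DTD</Name></Term>"
    "</Glossary>"
)
for name, link in sorted_terms_with_links(fragment):
    print(name, link)
```

Because the order is computed at display time, authors do not need to keep their fragments sorted by hand.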
5. ISO 1087 Terminology (etc.)
Key ISO 1087 Used ISO 1087
Used Unused
- Characteristic: Abstraction of a property of an
object or of a set of objects. Note -
Characteristics are used for describing concepts.
(ISO 1087-1:2000, 3.2.4)
- Concept: A unit of thought constituted through
abstraction on the basis of properties common to
a set of objects. Note - Concepts are not bound
to particular languages. They are, however,
influenced by the social or cultural background.
(ISO 1087:1990) Unit of knowledge created by a
unique combination of characteristics. (ISO
1087-1:2000, 3.2.1)
- Definition: Statement which describes a concept
and permits its differentiation from other
concepts within a system of concepts. (ISO
1087:1990) Representation of a concept by a
descriptive statement which serves to
differentiate it from related concepts. (ISO
1087-1:2000, 3.3.1)
6. ISO 1087 Terminology (etc.)
- Designation: Representation of a concept by a
sign which denotes it. (ISO 1087-1:2000, 3.4.1)
- Dictionary (see terminology and vocabulary):
Structured collection of lexical units with
linguistic information about each of them. (ISO
1087:1990)
7. ISO 1087 Terminology (etc.)
- Entry, Headword: The term headword appears in two
different meanings. In lexicography, a headword
is the word used as the heading in a dictionary
entry or encyclopedia. In a descriptive
terminology entry where no preference is given to
any one term, there is no head term, but if
preference is given to a term, head term is
sometimes used in analogy to lexicography, as is
main entry term. (Wright & Budin, 1997)
8. ISO 1087 Terminology (etc.)
- Glossary (see dictionary, terminology,
vocabulary): Alphabetical list of terms or words
found in or relating to a specific topic or text.
It may or may not include explanations. Note -
The distinguishing criterion is that glossaries
are considered to reside in backmatter attached
to books and other publications rather than being
independent works in their own right. Glossaries
are sometimes perceived as being less scientific
in intent and methodology than terminologies,
terminology standards, and even vocabularies,
although a certain degree of synonymy exists.
(Wright & Budin, 1997)
9. ISO 1087 Terminology (etc.)
- Nomenclature: System of terms which is elaborated
according to pre-established naming rules. (ISO
1087:1990)
- Object: Anything perceivable or conceivable. Note
- Objects may also be material (e.g. an engine, a
sheet of paper, a diamond), immaterial (e.g. a
conversion ratio, a project plan) or imagined
(e.g. a unicorn). (Adapted from ISO 1087-1:2000,
3.1.1)
10. ISO 1087 Terminology (etc.)
- Synonym: A word with the same meaning or nearly
the same meaning as another word in the same
language. (Longman Dictionary of English Language
and Culture, Longman Group UK Limited, 1992) Note
- Terminologists distinguish between real synonyms,
i.e. terms which can be substituted with each
other whatever the context, and the more common
quasi-synonyms, which can differ from one another
by context and sometimes by subject field. (Sager,
1990)
- Term: Designation of a defined concept in a
special language by a linguistic expression. Note
- A term may consist of one or more words or even
contain symbols. (ISO 1087:1990)
11. ISO 1087 Terminology (etc.)
- Terminological Dictionary (see dictionary and
vocabulary): Dictionary containing terminological
data from one or more specific subject fields.
Note - admitted term: technical dictionary. (ISO
1087:1990)
- Terminological Record: Structured collection of
terminological data relevant to one concept. (ISO
1087:1990)
- Terminological Database: Structured sets of
terminological records in an information
processing system. (ISO 1087:1990)
12. ISO 1087 Terminology (etc.)
- Terminology Work: Any activity concerned with the
systematization and representation of concepts or
with the presentation of terminologies on the
basis of established principles and methods. (ISO
1087:1990)
- Vocabulary (see terminology, dictionary,
glossary): Terminological dictionary containing
the terminology of a specific subject field or of
related subject fields and based on terminology
work. (ISO 1087:1990)
13. Summary: ISO 1087 Terminology
- Unused ISO 1087 Terms
  - Characteristic
  - Designation
  - Dictionary
  - Nomenclature
  - Object
  - PreferredTerm (TBD?)
  - Terminological Dictionary / technical dictionary
  - Terminological Record
  - Terminological Database
  - Terminology Work
  - Vocabulary
- ISO 1087 Terms Used
  - Concept
  - Definition
  - Term
- Used but not ISO 1087
  - Glossary
  - Synonym
  - RelatedTerm
- Additional Terms by Sall (next slide)
  - Name
  - Acronym
  - ExpandedAcronym
  - DefinitionSection
  - Source
  - Usage
14. Additional (Non-Standard) Terminology
- Glossary - change to Dictionary, Vocabulary,
Technical Dictionary or Terminology?
- Name - added only to allow Term to be a
container; could change Term to Entry and Name to
Term?
- Acronym - necessary option for technical terms
- ExpandedAcronym - ditto
- DefinitionSection - added simply as a repeatable
container to encompass all aspects pertaining to
a specific definition of a term
- Source - useful for traceability and credibility
- Usage - useful to have an optional example
sentence for a given definition (use in context)
15. XML Glossary Model Strawman
16. XML Example of One Term
ontology
semantic web
knowledge management
Defines the common words and concepts used to
describe and represent an area of knowledge, and
so standardizes the meanings. An ontology
includes classes in the domains of interest,
instances, relationships, properties and their
values, functions of and processes involving the
objects, and relevant constraints and rules.
Daconta, Obrst, Smith
An ontology can range from the simple notion of a
taxonomy to a thesaurus, to a conceptual model,
to a logical theory.
Daconta, Obrst, Smith
classification system
taxonomy
OWL
philosophy
(sometimes "Ontology") the metaphysical study of
the nature of being and existence
WordNet
Both the ontology and manner of human existence
are of concern to Existentialism.
metaphysics
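One plausible way to tag an entry like this with the element names from the summary slide is sketched below in Python. Which element wraps each value is an assumption (for example, treating "taxonomy" and "OWL" as RelatedTerm), the definitions are abbreviated, and `summarize_term` is a hypothetical helper; this is not the markup shown on the original slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical tagging of the "ontology" entry; element placement is assumed.
TERM_XML = """
<Term>
  <Name>ontology</Name>
  <DefinitionSection>
    <Concept>semantic web</Concept>
    <Concept>knowledge management</Concept>
    <Definition>Defines the common words and concepts used to describe
      and represent an area of knowledge, and so standardizes the
      meanings.</Definition>
    <Source>Daconta, Obrst, Smith</Source>
    <RelatedTerm>taxonomy</RelatedTerm>
    <RelatedTerm>OWL</RelatedTerm>
  </DefinitionSection>
  <DefinitionSection>
    <Concept>philosophy</Concept>
    <Definition>The metaphysical study of the nature of being and
      existence.</Definition>
    <Usage>Both the ontology and manner of human existence are of
      concern to Existentialism.</Usage>
    <RelatedTerm>metaphysics</RelatedTerm>
  </DefinitionSection>
</Term>
"""


def summarize_term(term_xml):
    """Return the term name and, per DefinitionSection, its first Concept
    mapped to the list of Source values found in that section."""
    term = ET.fromstring(term_xml)
    summary = {}
    for section in term.findall("DefinitionSection"):
        concept = section.findtext("Concept", default="(no concept)")
        summary[concept] = [source.text for source in section.findall("Source")]
    return term.findtext("Name"), summary


name, summary = summarize_term(TERM_XML)
print(name, summary)
```

The second DefinitionSection deliberately omits Source to show that optional elements can be absent without breaking the reader.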
17. XML Example: Client-Side XSLT (Firefox)
18. XML Example: Client-Side XSLT (Internet Explorer)
19. XML Example: XSLT Details
- DefinitionSection based on Concept
- CSS Styling
- Optional and Repeatable Elements
- New DefinitionSection based on 2nd Concept
- Auto-generated Search Links
20. Collaboration: Merging Instances
- Since a Glossary consists of one or more Terms, a
relatively simple XSLT can be created to merge
the Term elements for two or more XML instances.
- This means different authors (from the same or
different agencies) can work independently.
- Issue: What if the same Term is defined by
different authors? Automatically add each
definition, even though they may overlap or
conflict, or manually edit collisions (could
generate a conflict message)?
- Issue: Should the agency name be a Source or
another element (e.g., AgencySource)? An advantage
of a separate element is that custom XSLT could
extract or render terms on a per-agency basis, if
desired. Should there be an optional, repeatable
SourceLink element for a URL?
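The merge described on this slide would be done in XSLT; the Python below is only a sketch of the same logic under stated assumptions. It concatenates the Term elements of several instances, sorts them by Name, and reports names that occur more than once so that collisions can be reviewed instead of being silently resolved. The function name `merge_glossaries` and the sample agency fragments are hypothetical.

```python
import xml.etree.ElementTree as ET


def merge_glossaries(*instances):
    """Merge the Term elements of several Glossary instances into one Glossary.

    Returns the merged root element and a sorted list of casefolded term names
    that occur more than once (collisions are kept, not resolved).
    """
    merged = ET.Element("Glossary")
    seen = {}  # casefolded name -> number of Term elements carrying that name
    for xml_text in instances:
        for term in ET.fromstring(xml_text).findall("Term"):
            key = term.findtext("Name", default="").strip().casefold()
            seen[key] = seen.get(key, 0) + 1
            merged.append(term)
    # Reorder the merged terms alphabetically by Name, ignoring case.
    merged[:] = sorted(
        merged, key=lambda t: t.findtext("Name", default="").strip().casefold()
    )
    collisions = sorted(name for name, count in seen.items() if count > 1)
    return merged, collisions


agency_a = "<Glossary><Term><Name>taxonomy</Name></Term><Term><Name>Ontology</Name></Term></Glossary>"
agency_b = "<Glossary><Term><Name>ontology</Name></Term><Term><Name>DTD</Name></Term></Glossary>"
merged, collisions = merge_glossaries(agency_a, agency_b)
print([t.findtext("Name") for t in merged])
print(collisions)
```

Keeping both colliding entries matches the "automatically add each definition" option; the returned collision list is where a conflict message could be generated.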
21. Alternative: GlossXML
22. Alternative: XML Acronym Demystifier
23. Next Steps
- Determine interested agencies.
- Establish funding.
- Resolve terminology issues for the Glossary
model.
- Consider merging with, or replacement by,
GlossXML and/or XML Acronym Demystifier.
- Need to finalize DTD or XML Schema before
agencies start authoring.
- Revise initial XSLT to match final Glossary
model.
- Determine repository and submission mechanisms.
  - Could be a good use for CORE.gov?
  - Coordinate with Plans for Derived XML Registry
  Prototype?
- Write additional XSLT stylesheets for merging and
pulling agency-specific terms, etc.
- Develop XSL-FO stylesheets for PDF rendering of
Glossary.