Title: Semantic Extensions to Domain-Specific Markup Languages
Slide 1: Semantic Extensions to Domain-Specific Markup Languages
- Aparna Varde, Elke Rundensteiner, Murali Mani, Mohammed Maniruzzaman and Richard D. Sisson Jr.
- Worcester Polytechnic Institute (WPI)
- Worcester, Massachusetts, USA
Slide 2: Introduction
- XML, the eXtensible Markup Language: a widespread standard for storing and publishing data.
- Domain-specific markup languages are designed with XML tag sets.
- Standardization bodies extend these to include additional semantics.
- Aspects such as domain knowledge and XML constraints are important.
- Focus of paper: generic issues in extending markup languages.
Slide 3: Domain-specific markup language
- Medium of communication for potential users of the domain.
- Users: industries, consumers, universities, research organizations, publishers etc.
- Follows XML syntax.
- Encompasses the semantics of the domain.
- Examples:
  - MML: Medical Markup Language
  - MatML: Materials Science Markup Language
[Figure: Markup Language and its users: Industries, Publishers, Consumers, Research Organizations, Universities]
Slide 4: MML (Medical Markup Language)
- Creates standards for medical data to be stored and accessed worldwide.
- MML module contents include, e.g., basic clinic information and surgery record information.
- Used by primary care physicians, general surgeons etc.
- Specific information in sub-areas such as ophthalmology cannot be stored with these modules.
- Thus there is a need for more semantics in MML.
Slide 5: Motivation for extension to markup languages
- Analogous to the medical domain and ophthalmology, there are specifics in other domains.
- Why not define a new markup language for each aspect?
- Typically the generic language holds basic information that needs cross-referencing, e.g., basic surgical details in ophthalmology.
- Common information should not be stored twice.
- Advisable to extend the existing markup language with additional semantics.
Slide 6: Extending the Materials Science Markup Language, MatML
    <MatML_doc>
      <Material>
        <BulkDetails> </BulkDetails>
        <ComponentDetails> ... </ComponentDetails>
        ...
      </Material>
    </MatML_doc>
- MatML: Materials Science Markup Language.
- XML for materials property data.
- Heat Treating: controlled heating and cooling of materials to achieve desired mechanical and thermal properties.
- Need to include the semantics of Heat Treating in MatML.
- At WPI, a Heat Treating extension to MatML is proposed.
- Several issues, both domain-specific and XML-related, are crucial here.
Slide 7: General issues in extending any markup language
- Steps essential in markup language extension.
- Desired language features.
- XML schema constraints.
- Retrieval using XQuery.
Slide 8: Steps essential in markup language extension
- Understand domain semantics.
- Model the data.
- Conduct interviews.
- Define the ontology.
- Reiterate the ontology.
- Outline the initial schema.
- Revise the schema based on critical reviews.
Slide 9: 1. Understand domain semantics
- Acquire domain knowledge: terminology, processes, entities etc.
- This helps determine the essential tags to store data in the domain.
- Study the existing markup language in detail.
- This is to understand exactly where it needs extension.
Slide 10: 2. Model the data
- Build a data model after studying the domain.
- Use techniques such as Entity-Relationship diagrams.
- Thus represent domain entities, their properties and relationships.
[Figure: Subset of E-R Diagram for Heat Treating]
Slide 11: 3. Conduct interviews
- Needs of potential users are important.
- This helps determine entities and attributes in the extension.
- Users: industries, universities, research organizations, publishers etc.
- Domain experts can identify the needs of users.
- Hence, interview the domain experts.
Slide 12: 4. Define the ontology
- Ontology serves as the established lingo for the domain.
- Hence defining the ontology is important to proceed with design.
- Issues:
  - Synonyms: two or more words with the same meaning, e.g., in the financial domain, salary and income.
  - Homographs: one word with multiple meanings, e.g., share in the financial domain could refer to sharing of assets or shares in the stock market.
- Clarify such terms with reference to context through the ontology.
Slide 13: 5. Reiterate the ontology
- Once the ontology is established, it is useful to have another round of discussions with experts.
- Additional discussions with domain experts may lead to further clarifications.
- Example: remove existing entities or create new ones, based on terminology.
- Accordingly the ontology needs to be altered.
- Use this ontology for schema design.
[Figure: High-level ontology for Heat Treating]
Slide 14: 6. Outline the initial schema
- Schema provides structure, i.e., defines the grammar for the markup language.
- Once the data model and ontology are approved by domain experts, outline the initial schema.
- Adhere to the syntax of the original markup language so that the schema can be accommodated as an extension.
[Figure: Partial snapshot of schema for Heat Treating extension to MatML]
Slide 15: 7. Revise the schema based on critical reviews
- Initial schema serves as a medium of communication between designers and users.
- This is subject to further changes until domain experts are satisfied.
- Schema revision may involve several iterations.
- Some of these include discussions with standards bodies.
- For the proposed extension to be accepted as a worldwide standard, it must be approved by experts and standards bodies.
Slide 16: Desired language features
- Avoid redundancy.
- Make information non-ambiguous.
- Provide easy interpretability of data.
- Capture domain constraints in the schema.
Slide 17: 1. Avoid redundancy
- Markup language extension should be such that duplication of storage is avoided.
- Data stored in the original markup language should be cross-referenced in the extension.
- Example:
  - In the medical domain, there should be cross-referencing between basic clinic information in the original language and ophthalmological details in the extension.
- Schema should be structured accordingly.
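One possible way to express such a cross-reference in XML Schema is an ID/IDREF pair, sketched below. All element and attribute names here (MedicalRecords, BasicClinicInfo, OphthalmologyDetails, clinicInfoRef) are hypothetical illustrations and are not taken from MML; the slides do not say which mechanism the authors have in mind.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical root holding both original and extension records -->
  <xsd:element name="MedicalRecords">
    <xsd:complexType>
      <xsd:sequence>
        <!-- Stands in for data already stored in the original language -->
        <xsd:element name="BasicClinicInfo" maxOccurs="unbounded">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element name="PatientName" type="xsd:string"/>
            </xsd:sequence>
            <xsd:attribute name="id" type="xsd:ID" use="required"/>
          </xsd:complexType>
        </xsd:element>
        <!-- Extension record: points at the basic record instead of copying it -->
        <xsd:element name="OphthalmologyDetails" minOccurs="0" maxOccurs="unbounded">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element name="Prescription" type="xsd:string"/>
            </xsd:sequence>
            <xsd:attribute name="clinicInfoRef" type="xsd:IDREF" use="required"/>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

With this layout the patient's basic details are stored once, and each extension record carries only a reference to them.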
Slide 18: 2. Make information non-ambiguous
- Domain terminology, its semantics, and aspects such as synonyms / homographs are significant.
- The schema design should adhere to the ontology to avoid ambiguity.
- Annotations should be included within the schema to enhance clarity.
- Example:
  - For spectacle prescriptions in ophthalmology, include the meanings of the terms myope and hypermetrope in the schema as annotations.
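A minimal sketch of such annotations, using the standard xsd:annotation and xsd:documentation elements. The element name PatientType and the enumeration layout are assumptions made for illustration only.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical element; the name PatientType is an assumption -->
  <xsd:element name="PatientType">
    <xsd:simpleType>
      <xsd:restriction base="xsd:string">
        <xsd:enumeration value="myope">
          <xsd:annotation>
            <xsd:documentation>A near-sighted person.</xsd:documentation>
          </xsd:annotation>
        </xsd:enumeration>
        <xsd:enumeration value="hypermetrope">
          <xsd:annotation>
            <xsd:documentation>A far-sighted person.</xsd:documentation>
          </xsd:annotation>
        </xsd:enumeration>
      </xsd:restriction>
    </xsd:simpleType>
  </xsd:element>
</xsd:schema>
```

Annotations do not affect validation; they only document the intended meaning for readers and tools.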
Slide 19: 3. Provide easy interpretability of data
- Data is stored using markup language tags.
- Readers should be able to interpret this data without much reference to the literature.
- Thus the schema design should be organized accordingly.
- Example:
  - In science and engineering domains, experimental conditions should be stored close to results to enhance readability.
Slide 20: 4. Capture domain constraints in the schema
- Certain requirements imposed by the domain need to be captured in the schema.
- Done through the XML constraints feature.
- Some constraints:
  - Primary key: to uniquely identify an entity.
  - Choice: to declare mutually exclusive elements.
- Example: in the financial domain, a person could be either insolvent (bankrupt) or an asset-holder but not both.
Slide 21: XML schema constraints
- Sequence constraint.
- Disjunction constraint.
- Key constraint.
- Occurrence constraint.
Slide 22: 1. Sequence constraint
- To declare a list of elements in order.
- Enclose elements in <xsd:sequence> tags.
- Example:
  - In the Heat Treating extension, element QuenchConditions must occur before Results.
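The sequence constraint described above might look as follows. The parent element name Experiment and the xsd:string types are assumptions; only QuenchConditions and Results come from the slide.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical container element; the name Experiment is an assumption -->
  <xsd:element name="Experiment">
    <xsd:complexType>
      <xsd:sequence>
        <!-- Children must appear in exactly this order -->
        <xsd:element name="QuenchConditions" type="xsd:string"/>
        <xsd:element name="Results" type="xsd:string"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

An instance that lists Results before QuenchConditions would fail validation against this sketch.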
Slide 23: 2. Disjunction constraint
- To declare mutually exclusive elements, i.e., only one of them can exist.
- Enclose elements in <xsd:choice> tags.
- Example:
  - In Heat Treating, a part can be made by Casting OR Powder Metallurgy, not both.
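A sketch of the disjunction constraint for this example. The parent element name Part, the spelling PowderMetallurgy as a tag, and the xsd:string types are assumptions.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical container element; the name Part is an assumption -->
  <xsd:element name="Part">
    <xsd:complexType>
      <xsd:choice>
        <!-- Exactly one of these two children may appear -->
        <xsd:element name="Casting" type="xsd:string"/>
        <xsd:element name="PowderMetallurgy" type="xsd:string"/>
      </xsd:choice>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

With the default occurrence values on xsd:choice, a Part containing both children, or neither, is invalid.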
Slide 24: 3. Key constraint
- To declare an attribute to be a primary key, i.e., it must be unique and non-null.
- Indicate the attribute as type xsd:ID and its use as "required".
- Example:
  - In Heat Treating, the name of the cooling medium (quenchant) is crucial because the purpose of the experiments is to categorize the quenchants. Hence the quenchant name is a natural candidate for the key.
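A minimal sketch of a key declared through xsd:ID. The attribute name "name" and the optional Description child are assumptions added only to make the fragment complete.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="Quenchant">
    <xsd:complexType>
      <xsd:sequence>
        <!-- Hypothetical child element, included only for illustration -->
        <xsd:element name="Description" type="xsd:string" minOccurs="0"/>
      </xsd:sequence>
      <!-- xsd:ID makes the value unique in the document; use="required" makes it non-null -->
      <xsd:attribute name="name" type="xsd:ID" use="required"/>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

One caveat: xsd:ID values must be valid XML names, so a quenchant name containing spaces or starting with a digit would need to be adapted before it could serve as the key.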
Slide 25: 4. Occurrence constraint
- To declare the minimum and maximum permissible occurrences of an element.
- Indicate minOccurs="x" and maxOccurs="y", where x and y denote the minimum and maximum occurrences respectively.
- Value maxOccurs="unbounded" means no upper bound on the number of occurrences.
- Value minOccurs="0" means that the element need not be stored even once.
- Example:
  - In Heat Treating, Cooling Rate must be recorded at a minimum of 8 points in an experiment and there is no upper bound for it.
  - The maximum number of graphs stored per experiment is 3 and it is not necessary that at least one graph be stored.
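The two occurrence rules in this example could be sketched as below. The parent element name Experiment, the tag names CoolingRate and Graph, and the chosen types are assumptions; the bounds 8, unbounded, 0 and 3 come from the slide.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical container element; the name Experiment is an assumption -->
  <xsd:element name="Experiment">
    <xsd:complexType>
      <xsd:sequence>
        <!-- At least 8 cooling rate readings, no upper bound -->
        <xsd:element name="CoolingRate" type="xsd:decimal"
                     minOccurs="8" maxOccurs="unbounded"/>
        <!-- Graphs are optional, with at most 3 per experiment -->
        <xsd:element name="Graph" type="xsd:string"
                     minOccurs="0" maxOccurs="3"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

If minOccurs and maxOccurs are omitted, both default to 1.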
Slide 26: Retrieval using XQuery
- Encourage users to store data in a case-sensitive manner.
- Use tags to enhance querying efficiency.
Slide 27: 1. Encourage users to store data in a case-sensitive manner
- XQuery is case-sensitive.
- Hence it is useful to place emphasis on case when storing data using the markup language.
- This facilitates retrieval using XQuery.
Slide 28: 2. Use tags to enhance querying efficiency
- It is possible to anticipate a typical user query in a domain.
- Thus it is advisable to add a level of abstraction for faster retrieval of information.
- Example:
  - In Heat Treating, a user is likely to retrieve the name details of a quenchant without its property details.
  - Hence place tags <NameDetails> and <PropertyDetails> around quenchant information.
  - Thus the entire path of the quenchant need not be traversed for name details.
  - This enhances querying efficiency.
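As an illustration of this grouping, an instance fragment might be laid out as follows. Only Quenchant, NameDetails and PropertyDetails follow the slide; the inner tags (Name, Manufacturer, Viscosity) and the elided values are hypothetical placeholders.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Quenchant>
  <!-- Grouping tag: a name-only query can stop at this subtree -->
  <NameDetails>
    <Name>...</Name>
    <Manufacturer>...</Manufacturer>
  </NameDetails>
  <!-- Property data kept in its own subtree -->
  <PropertyDetails>
    <Viscosity>...</Viscosity>
  </PropertyDetails>
</Quenchant>
```

Under this layout, a path expression such as /Quenchant/NameDetails selects the name information directly, without visiting anything under PropertyDetails.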
Slide 29: Conclusions
- Aspects of extending domain-specific markup languages are discussed here.
- These include motivation for extension, steps in extension, language features, XML constraints and retrieval considerations.
- Extension to MatML proposed at CHTE, WPI to include Heat Treating semantics.
- Paper summarizes general issues in extending domain-specific markup languages.
Slide 30: Acknowledgments
- Database Systems Research Group in Department of Computer Science at WPI.
- Quenching Research Team in Department of Materials Science at WPI.
- Center for Heat Treating Excellence and its member companies.