Semantic Extensions to Domain-Specific Markup Languages


1
Semantic Extensions to Domain-Specific Markup
Languages
  • Aparna Varde, Elke Rundensteiner, Murali Mani,
    Mohammed Maniruzzaman and Richard D. Sisson Jr.
  • Worcester Polytechnic Institute (WPI)
  • Worcester, Massachusetts, USA

2
Introduction
  • XML, the eXtensible Markup Language: widespread
    standard for storing and publishing data.
  • Domain-specific markup languages designed with
    XML tag sets.
  • Standardization bodies extend these to include
    additional semantics.
  • Aspects such as domain knowledge and XML
    constraints are important.
  • Focus of paper: generic issues in extending
    markup languages.

3
Domain-specific markup language
  • Medium of communication for potential users of
    the domain.
  • Users: industries, consumers, universities,
    research organizations, publishers, etc.
  • Follows XML syntax.
  • Encompasses the semantics of the domain.
  • Examples:
  • MML: Medical Markup Language
  • MatML: Materials Science Markup Language

Figure: Markup Language linking Industries, Publishers, Consumers, Research Organizations and Universities.
4
MML: Medical Markup Language
  • Creates standards for medical data to be stored
    and accessed worldwide.
  • MML module contents include, e.g., basic clinic
    information and surgery record information.
  • Used by primary care physicians, general surgeons,
    etc.
  • Specific information in sub-areas such as
    ophthalmology cannot be stored with these
    modules.
  • Thus there is a need for more semantics in MML.

5
Motivation for extension to markup languages
  • Analogous to ophthalmology in the medical domain,
    other domains have their own specifics.
  • Why not define a new markup language for each
    aspect?
  • Typically the generic language holds basic
    information that needs cross-referencing, e.g.,
    basic surgical details in ophthalmology.
  • Common information should not be stored twice.
  • Advisable to extend existing markup language
    with additional semantics.

6
Extending the Materials Science Markup Language,
MatML
<MatML_doc>
  <Material>
    <BulkDetails> </BulkDetails>
    <ComponentDetails> ... </ComponentDetails>
    . . . .
  </Material>
</MatML_doc>
  • MatML: Materials Science Markup Language.
  • XML for materials property data.
  • Heat Treating: controlled heating and cooling of
    materials to achieve desired mechanical and
    thermal properties.
  • Need to include semantics of Heat Treating in
    MatML.
  • At WPI, a Heat Treating extension to MatML is
    proposed.
  • Several issues, domain-specific and XML-related,
    are crucial here.

7
General issues in extending any markup language
  • Steps essential in markup language extension.
  • Desired language features.
  • XML schema constraints.
  • Retrieval using XQuery.

8
Steps essential in markup language extension
  1. Understand domain semantics.
  2. Model the data.
  3. Conduct interviews.
  4. Define the ontology.
  5. Reiterate the ontology.
  6. Outline the initial schema.
  7. Revise the schema based on critical reviews.

9
1. Understand domain semantics
  • Acquire domain knowledge: terminology, processes,
    entities, etc.
  • This helps determine essential tags to store data
    in the domain.
  • Study existing markup language in detail.
  • This is to understand where exactly it needs
    extension.

10
2. Model the data
  • Build data model after studying domain.
  • Use techniques such as Entity-Relationship
    diagrams.
  • Thus represent domain entities, their properties
    and relationships.

Subset of E-R Diagram for Heat Treating
11
3. Conduct interviews
  • Needs of potential users are important.
  • This helps determine entities and attributes in
    extension.
  • Users: industries, universities, research
    organizations, publishers, etc.
  • Domain experts can identify needs of users.
  • Hence, interview the domain experts.

12
4. Define the ontology
  • Ontology serves as established lingo for the
    domain.
  • Hence defining ontology is important to proceed
    with design.
  • Issues:
  • Synonyms: two or more words with the same meaning,
    e.g., in the financial domain, salary and income.
  • Homographs: one word with multiple meanings,
    e.g., share in the financial domain could refer to
    sharing of assets or shares in the stock
    market.
  • Clarify such terms with reference to context
    through ontology.
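
The synonym and homograph issues above can be sketched as a small lookup table. This is only an illustration in Python, not the authors' ontology: the names SYNONYMS, HOMOGRAPHS and resolve, and the context labels, are assumptions made for the example.

```python
# Hypothetical mini-ontology built from the financial examples above.
# Synonyms: several words map to one canonical term.
SYNONYMS = {"salary": "income", "income": "income"}

# Homographs: one word whose meaning depends on the context.
HOMOGRAPHS = {
    "share": {
        "assets": "sharing of assets",
        "stock market": "shares in the stock market",
    }
}


def resolve(term, context=None):
    """Return the canonical meaning of a term, using context for homographs."""
    term = term.lower()
    if term in HOMOGRAPHS:
        meanings = HOMOGRAPHS[term]
        if context not in meanings:
            raise ValueError(f"'{term}' is ambiguous; give one of {sorted(meanings)}")
        return meanings[context]
    # Unknown terms are returned unchanged.
    return SYNONYMS.get(term, term)
```

Calling resolve("share") without a context raises an error on purpose, which mirrors the point that a homograph can only be clarified with reference to context.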

13
5. Reiterate the ontology
  • Once the ontology is established, it is useful to
    have another round of discussions with experts.
  • Additional discussions with domain experts may
    lead to further clarifications.
  • Example: remove existing entities or create new
    ones, based on terminology.
  • The ontology needs to be altered accordingly.
  • Use this ontology for schema design.

High-level ontology for Heat Treating
14
6. Outline the initial schema
  • Schema provides structure, i.e., defines grammar
    for the markup language.
  • Once data model and ontology are approved by
    domain experts, outline the initial schema.
  • Adhere to the syntax of the original markup
    language so that the schema can be accommodated
    as an extension.

Partial snapshot of schema for Heat Treating
extension to MatML.
15
7. Revise the schema based on critical reviews
  • Initial schema serves as medium of communication
    between designers and users.
  • This is subject to further changes until domain
    experts are satisfied.
  • Schema revision may involve several iterations.
  • Some of these include discussions with standards
    bodies.
  • For the proposed extension to be accepted as a
    worldwide standard, it must be approved by
    experts and standards bodies.

16
Desired language features
  • Avoid redundancy.
  • Make information non-ambiguous.
  • Provide easy interpretability of data.
  • Capture domain constraints in the schema.

17
1. Avoid redundancy
  • Markup language extension should be such that
    duplication of storage is avoided.
  • Data stored in the original markup language
    should be cross-referenced in the extension.
  • Example:
  • In the medical domain, there should be
    cross-referencing between basic clinic
    information in the original language and
    ophthalmological details in the extension.
  • Schema should be structured accordingly.
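
The cross-referencing idea above might look as follows. This is a sketch using Python's standard xml.etree.ElementTree; the element and attribute names (Records, BasicClinicInfo, OphthalmologyDetails, id, patientRef) and the sample values are invented for illustration and are not taken from MML.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: all tag and attribute names are assumptions.
DOC = """
<Records>
  <BasicClinicInfo id="p1"><PatientName>A. Example</PatientName></BasicClinicInfo>
  <OphthalmologyDetails patientRef="p1"><Prescription>myope</Prescription></OphthalmologyDetails>
</Records>
"""

root = ET.fromstring(DOC)

# Index the basic records once, by their id attribute.
basic_by_id = {el.get("id"): el for el in root.iter("BasicClinicInfo")}


def patient_name(extension_el):
    """Follow the cross-reference instead of storing the name a second time."""
    basic = basic_by_id[extension_el.get("patientRef")]
    return basic.findtext("PatientName")


details = root.find("OphthalmologyDetails")
name = patient_name(details)
```

The extension element carries only a reference, so the patient name is stored once, in the original language's record.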

18
2. Make information non-ambiguous
  • Domain terminology, its semantics, aspects such
    as synonyms / homographs are significant.
  • The schema design should adhere to the ontology
    to avoid ambiguity.
  • Annotations should be included within the schema
    to enhance clarity.
  • Example:
  • For spectacle prescriptions in ophthalmology,
    include the meanings of the terms myope and
    hypermetrope in the schema as annotations.

19
3. Provide easy interpretability of data
  • Data is stored using markup language tags.
  • Readers should be able to interpret this data
    without much reference to the literature.
  • Thus the schema design should be organized
    accordingly.
  • Example:
  • In science and engineering domains, experimental
    conditions should be stored close to results to
    enhance readability.

20
4. Capture domain constraints in the schema
  • Certain requirements imposed by the domain need
    to be captured in schema.
  • Done through the XML constraints feature.
  • Some constraints:
  • Primary key: to uniquely identify an entity.
  • Choice: to declare mutually exclusive elements.
  • Example: in the financial domain, a person could be
    either insolvent (bankrupt) or an asset-holder,
    but not both.

21
XML schema constraints
  • Sequence constraint.
  • Disjunction constraint.
  • Key constraint.
  • Occurrence constraint.

22
1. Sequence constraint
  • To declare a list of elements in order.
  • Enclose elements in <xsd:sequence> tags.
  • Example:
  • In Heat Treating extension, element
    QuenchConditions must occur before Results.
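
What the sequence constraint enforces can be sketched by checking the order by hand. The sketch below uses Python's standard xml.etree.ElementTree rather than a schema validator (the standard library has none); the parent element Experiment and the helper follows_sequence are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def follows_sequence(parent, expected_order):
    """True if the parent's child tags appear exactly in expected_order."""
    return [child.tag for child in parent] == list(expected_order)


# Order taken from the example above: QuenchConditions, then Results.
ORDER = ("QuenchConditions", "Results")

good = ET.fromstring("<Experiment><QuenchConditions/><Results/></Experiment>")
bad = ET.fromstring("<Experiment><Results/><QuenchConditions/></Experiment>")
```

Here follows_sequence(good, ORDER) holds and follows_sequence(bad, ORDER) does not, which is the distinction a validating parser would draw from the schema.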

23
2. Disjunction constraint
  • To declare mutually exclusive elements, i.e.,
    only one of them can exist.
  • Enclose elements in <xsd:choice> tags.
  • Example:
  • In Heat Treating, a part can be made by Casting
    OR Powder Metallurgy, not both.
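
The disjunction rule can be sketched the same way: exactly one of the alternatives may be present. The tag names Part, Casting and PowderMetallurgy and the helper satisfies_choice are assumptions for illustration; the actual tags of the Heat Treating extension are not given here.

```python
import xml.etree.ElementTree as ET

# Hypothetical tag names for the two mutually exclusive processes.
ALTERNATIVES = ("Casting", "PowderMetallurgy")


def satisfies_choice(parent, alternatives=ALTERNATIVES):
    """True if exactly one of the alternative elements is a child of parent."""
    present = [child.tag for child in parent if child.tag in alternatives]
    return len(present) == 1


cast_part = ET.fromstring("<Part><Casting/></Part>")
both = ET.fromstring("<Part><Casting/><PowderMetallurgy/></Part>")
neither = ET.fromstring("<Part/>")
```

Only cast_part passes the check; a part with both processes, or with neither, is rejected.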

24
3. Key Constraint
  • To declare an attribute to be a primary key,
    i.e., it must be unique and non-null.
  • Indicate the attribute as type "xsd:ID" and its
    use as "required".
  • Example:
  • In Heat Treating, the name of the cooling medium
    (quenchant) is crucial because the purpose of the
    experiments is to categorize the quenchants.
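
The two properties of a key, unique and non-null, can be checked by hand as below. This is a sketch, not a schema validator: the tags Quenchants and Quenchant, the attribute name, and the sample values are assumptions. Note also that values of type xsd:ID must be valid XML names, so real key values could not contain spaces.

```python
import xml.etree.ElementTree as ET

# Hypothetical quenchant list; names and values are for illustration only.
DOC = """
<Quenchants>
  <Quenchant name="Water"/>
  <Quenchant name="Oil"/>
</Quenchants>
"""


def key_violations(root, tag="Quenchant", key="name"):
    """Return a list of problems: missing or empty keys, and duplicates."""
    problems, seen = [], set()
    for el in root.iter(tag):
        value = el.get(key)
        if not value:
            problems.append("missing key")
        elif value in seen:
            problems.append(f"duplicate key: {value}")
        else:
            seen.add(value)
    return problems


violations = key_violations(ET.fromstring(DOC))
```

An empty result means every quenchant has a name and no name is repeated.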

25
4. Occurrence constraint
  • To declare minimum and maximum permissible
    occurrences of an element.
  • Indicate minOccurs="x" and maxOccurs="y",
    where x and y denote the minimum and maximum
    occurrences respectively.
  • Value maxOccurs="unbounded" means no upper
    bound on number of occurrences.
  • Value minOccurs="0" means that element need not
    be stored even once.
  • Example:
  • In Heat Treating, Cooling Rate must be recorded
    at a minimum of 8 points in an experiment and
    there is no upper bound for it. The maximum
    number of graphs stored per experiment is 3 and
    it is not necessary that at least one graph be
    stored.
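
The numbers in this example translate into a simple count check. The sketch below is hand-written Python rather than schema validation; the tags Experiment, CoolingRate and Graph and the helper occurrences_ok are assumptions, and None stands in for "unbounded".

```python
import xml.etree.ElementTree as ET


def occurrences_ok(parent, tag, min_occurs=1, max_occurs=1):
    """Check the child count against minOccurs/maxOccurs; None means unbounded."""
    count = len(parent.findall(tag))
    if count < min_occurs:
        return False
    return max_occurs is None or count <= max_occurs


# Hypothetical experiment with 8 cooling-rate points and 2 graphs.
experiment = ET.fromstring(
    "<Experiment>" + "<CoolingRate/>" * 8 + "<Graph/>" * 2 + "</Experiment>"
)

# Cooling Rate: at least 8 points, no upper bound.
cooling_ok = occurrences_ok(experiment, "CoolingRate", min_occurs=8, max_occurs=None)
# Graphs: optional, at most 3.
graphs_ok = occurrences_ok(experiment, "Graph", min_occurs=0, max_occurs=3)
```

With seven cooling-rate points or four graphs, the corresponding check would fail.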

26
Retrieval using XQuery
  1. Encourage users to store data in a case-sensitive
    manner.
  2. Use tags to enhance querying efficiency.

27
1. Encourage users to store data in a
case-sensitive manner
  • XQuery is case-sensitive.
  • Hence it is useful to place emphasis on case when
    storing data using markup language.
  • This facilitates retrieval using XQuery.
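
The effect of case on retrieval can be seen in a small experiment. The sketch uses Python's xml.etree.ElementTree as a stand-in for an XQuery processor, on the assumption that the behavior of interest is the same: element names, and by default string comparisons, are case-sensitive. The document and its values are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical data; tag names and values are assumptions.
doc = ET.fromstring(
    "<Quenchants><Quenchant><Name>Water</Name></Quenchant></Quenchants>"
)

# Tag names are matched case-sensitively.
by_exact_tag = doc.findall("Quenchant/Name")    # finds the element
by_wrong_case = doc.findall("quenchant/name")   # finds nothing

# Stored values are compared case-sensitively too.
exact_value = [n for n in doc.iter("Name") if n.text == "Water"]
wrong_value = [n for n in doc.iter("Name") if n.text == "water"]
```

A query written with the wrong case silently returns nothing, which is why consistent case at storage time matters.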

28
2. Use tags to enhance querying efficiency
  • It is possible to anticipate a typical user query
    in a domain.
  • Thus advisable to add a level of abstraction for
    faster retrieval of information.
  • Example:
  • In Heat Treating, a user is likely to retrieve
    name details of a quenchant without its property
    details.
  • Hence place tags <NameDetails> and
    <PropertyDetails> around quenchant information.
  • Thus entire path of quenchant need not be
    traversed for name details.
  • This enhances querying efficiency.
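
The wrapper tags let a query address the name information directly. The sketch below shows this with Python's xml.etree.ElementTree in place of XQuery; the child elements Name and Property and their values are assumptions, and the example illustrates only the shorter path, not a measured speed-up.

```python
import xml.etree.ElementTree as ET

# Hypothetical quenchant record grouped under the two wrapper tags.
DOC = """
<Quenchant>
  <NameDetails><Name>Water</Name></NameDetails>
  <PropertyDetails><Property>example value</Property></PropertyDetails>
</Quenchant>
"""

quenchant = ET.fromstring(DOC)

# Go straight to the NameDetails subtree; PropertyDetails is never visited.
name_details = quenchant.find("NameDetails")
names = [el.text for el in name_details.iter("Name")]
```

Only the NameDetails subtree is searched, so property data does not have to be traversed to answer a name query.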

29
Conclusions
  • Aspects of extending domain-specific markup
    languages discussed here.
  • These include motivation for extension, steps in
    extension, language features, XML constraints and
    retrieval considerations.
  • Extension to MatML proposed at CHTE, WPI to
    include Heat Treating semantics.
  • Paper summarizes general issues in extending
    domain-specific markup languages.

30
Acknowledgments
  • Database Systems Research Group in Department of
    Computer Science at WPI.
  • Quenching Research Team in Department of
    Materials Science at WPI.
  • Center for Heat Treating Excellence and its
    member companies.