Title: RELAX NG
1 RELAX NG
2 Caveat
- I did not have a RELAX NG validator when I wrote these slides. Therefore, if an example appears to be wrong, it probably is.
3 What is RELAX NG?
- RELAX NG is a schema language for XML
- It is an alternative to DTDs and XML Schemas
- It is based on two earlier schema languages, RELAX and TREX
- It is not a W3C standard, but it is an OASIS standard
  - OASIS is the Organization for the Advancement of Structured Information Standards
  - ebXML (Electronic Business using XML) is a joint effort of OASIS and UN/CEFACT (United Nations Centre for Trade Facilitation and Electronic Business)
  - OASIS maintains the widely used DocBook DTD for describing books, articles, and technical documents
- RELAX NG has also been adopted as an ISO/IEC standard (ISO/IEC 19757-2)
4 Design goals
- Simple and easy to learn
- Uses XML syntax
  - But there is also a concise (non-XML) syntax
- Does not change the information set of an XML document
  - That is, validation only checks the document. It does not add default attribute values or otherwise alter what an application sees, unlike DTDs and W3C XML Schema.
- Supports XML namespaces
- Treats attributes as uniformly with elements as far as possible
- Has unrestricted support for unordered content
- Has unrestricted support for mixed content
- Has a solid theoretical basis
- Can make use of a separate datatyping language (such as W3C XML Schema Datatypes)
5 RELAX NG tools
- Jing
  - An open source validator written in Java
- Sun's MSV
  - Another validator
- DTDinst
  - Translates from DTDs into RNG (RELAX NG) syntax or RNG compact syntax
- Trang
  - Translates RNG compact syntax into RNG syntax
  - Translates RNG or RNG compact syntax into DTDs
- Sun's RELAX NG Converter
  - Translates DTDs into RNG syntax (but not well)
  - Translates an XML Schema subset into RNG syntax (imperfectly)
6 Basic structure
- A RELAX NG specification is written in XML, so it obeys all XML rules
- The RELAX NG specification has one root element
- The document it describes also has one root element
- In the simplest case, the root element of the specification is <element>
- If the root element of your document is book, then the RELAX NG specification begins
    <element name="book" xmlns="http://relaxng.org/ns/structure/1.0">
- and ends
    </element>
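Here is a complete specification that puts the beginning and end together. It is only a sketch: the title element is a hypothetical addition, made so that the book has some content. It should accept a document such as <book><title>RELAX NG</title></book>.

```xml
<element name="book" xmlns="http://relaxng.org/ns/structure/1.0">
  <!-- hypothetical content: a book holding a single title -->
  <element name="title">
    <text/>
  </element>
</element>
```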
7 Data elements
- RELAX NG makes a clear separation between
  - the structure of a document (which it describes)
  - the datatypes used in the document (which it gets from somewhere else, such as from XML Schemas)
- For starters, we will use two patterns that are built into RELAX NG
  - <text/> matches plain character data, not containing other elements
  - <empty/> matches nothing at all: no text and no child elements
- Other datatypes, such as double, are not defined in RELAX NG itself
  - To inherit datatypes from XML Schemas, use datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes" as an attribute of the root element
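The sketch below puts both built-in patterns into one specification and places the datatypeLibrary attribute on the root element. The element names note, body, and read are hypothetical. The schema should accept <note><body>Call home</body><read/></note>.

```xml
<element name="note"
         xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <!-- <text/>: plain character data, e.g. <body>Call home</body> -->
  <element name="body">
    <text/>
  </element>
  <!-- <empty/>: no text and no child elements, e.g. <read/> -->
  <element name="read">
    <empty/>
  </element>
</element>
```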
8 Data types from XML Schemas
- Here are some of the predefined numeric types:
  xs:decimal, xs:positiveInteger, xs:byte, xs:negativeInteger, xs:short, xs:nonPositiveInteger, xs:int, xs:nonNegativeInteger, xs:long
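A schema refers to these types without the xs: prefix, through the type attribute of <data>, once the root element carries the XML Schema datatypeLibrary attribute. Here is a sketch with hypothetical order, quantity, and unitPrice elements. It should accept <order><quantity>3</quantity><unitPrice>9.95</unitPrice></order>.

```xml
<element name="order"
         xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <!-- "positiveInteger", not "xs:positiveInteger": the library URI says where it comes from -->
  <element name="quantity">
    <data type="positiveInteger"/>
  </element>
  <element name="unitPrice">
    <data type="decimal"/>
  </element>
</element>
```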
9 Predefined date and time types
- xs:date -- A date in the format CCYY-MM-DD, for example, 2002-11-05
- xs:time -- A time in the format hh:mm:ss (hours, minutes, seconds)
- xs:dateTime -- Format is CCYY-MM-DDThh:mm:ss
  - The T is part of the syntax
- Allowable restrictions on dates and times
  - enumeration, minInclusive, minExclusive, maxInclusive, maxExclusive, pattern, whiteSpace
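In RELAX NG, a restriction is written as a <param> child of <data>. As we understand the OASIS guidelines for using XML Schema datatypes with RELAX NG, enumeration and whiteSpace are not accepted as parameters; an enumeration is written with <choice> and <value> instead. Here is a sketch with a hypothetical examDate element. It should accept <examDate>2002-11-05</examDate>.

```xml
<element name="examDate"
         xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <!-- any xs:date from 2002-01-01 up to, but not including, 2003-01-01 -->
  <data type="date">
    <param name="minInclusive">2002-01-01</param>
    <param name="maxExclusive">2003-01-01</param>
  </data>
</element>
```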
10 Defining tags
- To define a tag (and specify its content), use
    <element name="myElement">
      <!-- Content goes here -->
    </element>
- Example: the DTD
    <!ELEMENT name (firstName, lastName)>
    <!ELEMENT firstName (#PCDATA)>
    <!ELEMENT lastName (#PCDATA)>
- translates to
    <element name="name">
      <element name="firstName"> <text/> </element>
      <element name="lastName"> <text/> </element>
    </element>
- Note: as in the DTD, the components must occur in order
11 RELAX NG describes patterns
- Your RELAX NG document specifies a pattern that matches your valid XML documents
- For example, the pattern
    <element name="name">
      <element name="firstName"> <text/> </element>
      <element name="lastName"> <text/> </element>
    </element>
- will match the XML
    <name>
      <firstName>David</firstName>
      <lastName>Matuszek</lastName>
    </name>
12 Easy tags
- <zeroOrMore> ... </zeroOrMore>
  - The enclosed content occurs zero or more times
- <oneOrMore> ... </oneOrMore>
  - The enclosed content occurs one or more times
- <optional> ... </optional>
  - The enclosed content occurs once or not at all
- <choice> ... </choice>
  - Exactly one of the enclosed patterns occurs
- <!-- This is an XML comment; it is not a container, and it may not contain two consecutive hyphens -->
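Here is a sketch that combines <optional> and <oneOrMore>. The book, subtitle, and author elements are hypothetical. The schema should accept a book with a title and one or more authors, with or without a subtitle. It should reject a book that has no author.

```xml
<element name="book" xmlns="http://relaxng.org/ns/structure/1.0">
  <element name="title"> <text/> </element>
  <!-- a subtitle may appear once, or not at all -->
  <optional>
    <element name="subtitle"> <text/> </element>
  </optional>
  <!-- at least one author is required -->
  <oneOrMore>
    <element name="author"> <text/> </element>
  </oneOrMore>
</element>
```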
13 Example
- A list of names and addresses:
    <element name="addressList">
      <zeroOrMore>
        <element name="name">
          <element name="firstName"> <text/> </element>
          <element name="lastName"> <text/> </element>
        </element>
        <element name="address">
          <choice>
            <element name="email"> <text/> </element>
            <element name="USPost"> <text/> </element>
          </choice>
        </element>
      </zeroOrMore>
    </element>
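The document below should match the addressList pattern. The <zeroOrMore> repeats the whole name-then-address pair, and each address uses one branch of the <choice>. The second entry (Jane Doe and her postal address) is hypothetical. An empty <addressList/> should also match.

```xml
<addressList>
  <name>
    <firstName>David</firstName>
    <lastName>Matuszek</lastName>
  </name>
  <address>
    <email>dave@acm.org</email>
  </address>
  <!-- hypothetical second entry, using the USPost alternative -->
  <name>
    <firstName>Jane</firstName>
    <lastName>Doe</lastName>
  </name>
  <address>
    <USPost>123 Main Street, Springfield</USPost>
  </address>
</addressList>
```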
14 Enumerations
- The <value>...</value> pattern matches a specified value
- Example:
    <element name="gender">
      <choice>
        <value>male</value>
        <value>female</value>
      </choice>
    </element>
- The contents of <value> are subject to whitespace normalization
  - Leading and trailing whitespace is removed
  - Internal sequences of whitespace characters are collapsed to a single blank
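The following candidate elements show how the gender pattern should treat input under the normalization rule. This listing is a set of separate examples, not a single document.

```xml
<!-- Matches: one of the listed values -->
<gender>male</gender>

<!-- Also matches: leading and trailing whitespace is removed before comparing -->
<gender>
   female
</gender>

<!-- Does not match: the comparison is case sensitive -->
<gender>Male</gender>
```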
15 More about data
- Remember: to inherit datatypes from XML Schemas, add this attribute to the root element:
    datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes"
- You can access the inherited types with the <data> tag, for instance, <data type="double"/>
- The <data> pattern must match the entire content of the enclosing tag, not just part of it
    <element name="illegalUse">  <!-- Don't do this! -->
      <data type="double"/>
      <element name="moreStuff"> <text/> </element>
    </element>
- If you don't specify a datatype library, RELAX NG defines the following built-in types for you (along with <text/> and <empty/>)
  - <data type="string"/>: no whitespace normalization is done
  - <data type="token"/>: whitespace is normalized (leading and trailing whitespace is removed, and internal runs are collapsed to a single blank)
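One legal way to attach extra information to a typed value is to keep <data> as the element's only content and put the extra information in an attribute. Attributes may sit alongside <data>, but child elements may not. Here is a sketch with a hypothetical price element and currency attribute. It should accept <price currency="USD">9.95</price>.

```xml
<element name="price"
         xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <!-- an attribute alongside <data> is fine -->
  <attribute name="currency"> <text/> </attribute>
  <!-- the element's content is exactly one double -->
  <data type="double"/>
</element>
```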
16 <group>
- <group>...</group> is used as "fat parentheses"
- Example:
    <choice>
      <element name="name"> <text/> </element>
      <group>
        <element name="firstName"> <text/> </element>
        <element name="lastName"> <text/> </element>
      </group>
    </choice>
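The <choice> in this example offers two alternatives. Inside some enclosing element (not shown), either alternative below should be accepted. A firstName on its own should be rejected.

```xml
<!-- First alternative: a single name element -->
<name>David Matuszek</name>

<!-- Second alternative: the grouped pair, firstName followed by lastName -->
<firstName>David</firstName>
<lastName>Matuszek</lastName>
```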
17 Attributes
- Attributes are defined practically the same way as elements
    <attribute name="attributeName">...</attribute>
- Example
    <element name="name">
      <attribute name="title"> <text/> </attribute>
      <element name="firstName"> <text/> </element>
      <element name="lastName"> <text/> </element>
    </element>
- Matches
    <name title="Dr.">
      <firstName>David</firstName>
      <lastName>Matuszek</lastName>
    </name>
18 More about attributes
- With attributes, as with elements, you can use <optional>, <choice>, and <group>
- It doesn't make sense to use <oneOrMore> or <zeroOrMore> with a named attribute, because an attribute can occur at most once on an element
- In keeping with the usual XML rules,
  - The order in which you list elements is significant
  - The order in which you list attributes is not significant
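Here is a sketch that applies <optional> and <choice> to attributes. The id and ref attributes are hypothetical. The schema should accept <name id="p1" title="Dr.">...</name>, and the attribute order does not matter. It should reject a name that carries both id and ref, or neither.

```xml
<element name="name" xmlns="http://relaxng.org/ns/structure/1.0">
  <!-- title may be present or absent -->
  <optional>
    <attribute name="title"> <text/> </attribute>
  </optional>
  <!-- exactly one of these two attributes must be present -->
  <choice>
    <attribute name="id"> <text/> </attribute>
    <attribute name="ref"> <text/> </attribute>
  </choice>
  <element name="firstName"> <text/> </element>
  <element name="lastName"> <text/> </element>
</element>
```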
19 Still more about attributes
- <attribute name="attributeName"> <text/> </attribute> can be (and usually is) abbreviated as <attribute name="attributeName"/>
- However, <element name="elementName"> <text/> </element> cannot be abbreviated as <element name="elementName"/>
  - If an element has no attributes and no content, you must use <empty/> explicitly
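The sketch below contrasts the two cases. The figure, img, and br elements are hypothetical. An element whose only content is an attribute needs no <empty/>. An element with no attributes and no content does. The schema should accept <figure><img src="logo.png"/><br/></figure>.

```xml
<element name="figure" xmlns="http://relaxng.org/ns/structure/1.0">
  <!-- an attribute but no content: the attribute pattern is enough -->
  <element name="img">
    <attribute name="src"/>
  </element>
  <!-- no attributes and no content: <empty/> must be written out -->
  <element name="br">
    <empty/>
  </element>
</element>
```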
20 <list>
- <list> pattern </list> matches a whitespace-separated list of tokens, and applies the pattern to those tokens
- Example:
    <!-- A floating-point number followed by one or more integers -->
    <element name="vector">
      <list>
        <data type="float"/>
        <oneOrMore>
          <data type="int"/>
        </oneOrMore>
      </list>
    </element>
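Assume the root element carries the XML Schema datatypeLibrary attribute, since float and int come from that library. The vector pattern should then treat these sample elements as described in the comments. They are separate examples, not one document.

```xml
<!-- Matches: one float, then three ints -->
<vector>3.5 1 2 3</vector>

<!-- Matches: the float token may itself look like an integer -->
<vector>1 2</vector>

<!-- Does not match: at least one int must follow the float -->
<vector>3.5</vector>

<!-- Does not match: 2.5 is not an int -->
<vector>1 2.5</vector>
```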
21 <interleave>
- <interleave> ... </interleave> allows the contained elements to occur in any order
- <interleave> is more sophisticated than you might expect
  - If a contained element can occur more than once, the various instances do not need to occur together
22 Interleave example
- Pattern
    <element name="contactInformation">
      <interleave>
        <zeroOrMore>
          <element name="phone"> <text/> </element>
        </zeroOrMore>
        <oneOrMore>
          <element name="email"> <text/> </element>
        </oneOrMore>
      </interleave>
    </element>
- Matches
    <contactInformation>
      <email>dave@acm.org</email>
      <phone>215-898-8122</phone>
      <email>matuszek@central.cis.upenn.edu</email>
    </contactInformation>
23 <mixed>
- <mixed> allows mixed content, that is, both text and patterns
- If pattern is a RELAX NG pattern, then <mixed> pattern </mixed> is shorthand for <interleave> <text/> pattern </interleave>
24 Example of <mixed>
- Pattern
    <element name="words">
      <mixed>
        <zeroOrMore>
          <choice>
            <element name="bold"> <text/> </element>
            <element name="italic"> <text/> </element>
          </choice>
        </zeroOrMore>
      </mixed>
    </element>
- Matches
    <words>This is <italic>not</italic> a <bold>great</bold> example, <italic>but</italic> it should suffice.</words>
25 The need for named patterns
- So far, we have defined elements exactly at the point that they can be used
- There is no equivalent of
    <!ELEMENT person (name)>
    <!ELEMENT name (firstName, lastName)>
    ...use person several places in the DTD...
- With the RELAX NG we have discussed so far, each time we want to include a person, we would need to explicitly define both person and name at that point:
    <element name="person">
      <element name="name">
        <element name="firstName"> <text/> </element>
        <element name="lastName"> <text/> </element>
      </element>
    </element>
- The <grammar> element solves this problem
26 Syntax of <grammar>
    <grammar xmlns="http://relaxng.org/ns/structure/1.0">
      <start>
        ...usual RELAX NG elements, which may include
        <ref name="DefinedName"/>
      </start>
      <!-- One or more of the following -->
      <define name="DefinedName">
        ...usual RELAX NG elements, attributes, groups, etc.
      </define>
    </grammar>
27 Use of <grammar>
- To write a <grammar>,
  - Make <grammar> the root element of your specification
    - Hence it should say xmlns="http://relaxng.org/ns/structure/1.0"
  - Use, as the <start> element, a pattern that matches the entire (valid) XML document
  - In each <define> element, write a pattern that you want to use in other places in the specification
  - Wherever you want to use a defined pattern, put <ref name="NameOfDefinedPattern"/>
- Note that defined patterns may be used in other definitions, not just in the <start> element
- Definitions may even be recursive, but
  - Each recursive reference must be nested inside an <element> (not an attribute), so that every level of recursion adds a level of element nesting
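Here is a sketch of a legal recursive definition, using hypothetical section and heading elements. The reference to SECTION appears inside <element name="section">, so each nested section sits one level deeper in the document. The grammar should accept <section><heading>A</heading><section><heading>B</heading></section></section>.

```xml
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start>
    <ref name="SECTION"/>
  </start>
  <define name="SECTION">
    <element name="section">
      <element name="heading"> <text/> </element>
      <!-- recursive reference, nested inside the section element -->
      <zeroOrMore>
        <ref name="SECTION"/>
      </zeroOrMore>
    </element>
  </define>
</grammar>
```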
28 Long example of <grammar>
- The DTD
    <!ELEMENT name (firstName, lastName)>
- can be written as
    <grammar xmlns="http://relaxng.org/ns/structure/1.0">
      <start>
        <ref name="Name"/>
      </start>
      <define name="Name">
        <element name="name">
          <element name="firstName"> <text/> </element>
          <ref name="LastName"/>
        </element>
      </define>
      <define name="LastName">
        <element name="lastName"> <text/> </element>
      </define>
    </grammar>
XML is case sensitive: note that the defined names (Name, LastName) are capitalized differently from the element names (name, lastName)
29 Common usage I
- A typical way to use RELAX NG is to use a <grammar> with just the root element in <start> and every element described by a <define>
    <grammar xmlns="http://relaxng.org/ns/structure/1.0">
      <start>
        <ref name="NOVEL"/>
      </start>
      <define name="NOVEL">
        <element name="novel">
          <ref name="TITLE"/>
          <ref name="AUTHOR"/>
          <oneOrMore>
            <ref name="CHAPTER"/>
          </oneOrMore>
        </element>
      </define>
      ...more...
30 Common usage II
      <define name="TITLE">
        <element name="title"> <text/> </element>
      </define>
      <define name="AUTHOR">
        <element name="author"> <text/> </element>
      </define>
      <define name="CHAPTER">
        <element name="chapter">
          <oneOrMore>
            <ref name="PARAGRAPH"/>
          </oneOrMore>
        </element>
      </define>
      <define name="PARAGRAPH">
        <element name="paragraph"> <text/> </element>
      </define>
    </grammar>
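Here is a small document that the NOVEL grammar should accept: a title, an author, and one or more chapters, each with one or more paragraphs. All of the text content is hypothetical. Leaving out the chapters, or putting the author before the title, should make it invalid.

```xml
<novel>
  <title>A Hypothetical Novel</title>
  <author>A. N. Author</author>
  <chapter>
    <paragraph>It was a dark and stormy night.</paragraph>
    <paragraph>The validator, at least, was working.</paragraph>
  </chapter>
  <chapter>
    <paragraph>Chapter two needs only one paragraph.</paragraph>
  </chapter>
</novel>
```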
31 Replacing DTDs
- With <grammar> and multiple <define>s, we can do essentially the same things as a DTD
- Advantages
  - RELAX NG is more expressive than a DTD: we can interleave elements, specify data types, allow specific data values, use namespaces, and control the mixing of data and patterns
  - RELAX NG is written in XML
  - RELAX NG is relatively easy to understand
- Disadvantages
  - RELAX NG is extremely verbose
    - But there is a compact syntax that is much shorter
  - RELAX NG is not (yet) nearly as well known as DTDs or XML Schemas
    - Hence there are fewer tools to work with it
    - This situation seems to be changing
32 The End
So by this maxim be impressed,
USE THE TOOLS THAT WORK THE BEST.
Do not yield your sovereign judgment,
To any sort of political fudgement.
The criterion of sound design
Should be, must be, your guideline.
And if you're designing documents,
Try RNG. We charge no rents.
--
John Cowan