HOW TO RELAX

$Id: howToRELAX.sdoc 1.9 2000/08/26 03:12:38 murata Exp $

MURATA Makoto

Part 1: RELAX Core

Part 1: RELAX Core

STEP 0: Introduction

$Id: step0.sdoc 1.10 2000/08/06 08:45:41 murata Exp $
STEP 0 gives a brief overview of RELAX and shows how to read this tutorial.

RELAX, brief overview

RELAX is a specification for describing XML-based languages. XHTML 1.0, for example, can be described in RELAX. A description written in RELAX is called a RELAX grammar.

Comparison with DTD

Compared with a traditional DTD (Document Type Definition), RELAX has new features as below:

A RELAX grammar can be written as an XML document.
RELAX borrows rich datatypes from XML Schema Part 2.
RELAX is namespace-aware.

The RELAX processor

The RELAX processorverifies XML documents against RELAX grammars. The input to the RELAX processors is an XML document and a RELAX grammar. To be precise, the RELAX processor does not directly handle XML documents and RELAX grammars, but rather receives the output of the XML processor which handles them.
The RELAX processor reports if the XML document is legitimate against the RELAX grammar. It may also report some other messages. The RELAX processor has no other outputs.

The organization of RELAX

RELAX consists of RELAX Core and RELAX Namespace. RELAX Core handles elements in a single namespace and their attributes. RELAX Core borrows datatypes from XML Schema Part 2. RELAX Namespace combines multiple modules so as to handle multiple namespaces. At present, this tutorial covers RELAX Core only.

About this tutorial

This tutorial is intended to be very easy to understand. All STEPs except this one use plenty of examples and provide concrete explanations.
STEPs 1 thru 10 are concerned with RELAX Core. You may stop at the end of any step and still have a reasonable understanding of RELAX. You can start to use RELAX immediately after reading STEP 1. If you read through STEP 6, you will know all the features shared by RELAX and XML DTDs. All RELAX processors are required to support these features. STEPs 7 thru 10 explain new features of RELAX. RELAX processors are not required to support these new features.
A little inaccuracy saves tons of explanation. STEPs 1 and 2 are sometimes inaccurate. Accurate explanations are provided by STEPs 7 and 8.

Summary

RELAX is very simple. If you are familiar with DTDs, you can be fluent in RELAX almost immediately. Even if you are not, you can easily master RELAX. Enjoy and RELAX!

STEP 1: Migration from XML DTD (without parameter entities)

$Id: step1.sdoc 1.13 2000/08/06 08:47:44 murata Exp $
STEP 1 covers basic features, which allow easy migration from DTDs. The DTD2RELAX converter uses these features only.

An example module

To provide an idea of RELAX, we will recapture a DTD as a RELAX module.
A DTD is shown below. The number attribute of title elements should be integers, but DTDs cannot represent this constraint.

<!ELEMENT doc (title, para*)> <!ELEMENT para (#PCDATA | em)*> <!ELEMENT title (#PCDATA | em)*> <!ELEMENT em (#PCDATA)> <!ATTLIST para class NMTOKEN #IMPLIED > <!ATTLIST title class NMTOKEN #IMPLIED number CDATA #REQUIRED >

Next, we show a RELAX module. The number attribute is specified as an integer.

Subsequent sections explain syntactical constructs appeared in this example.

The module element

A RELAX grammar is a combination of modules. If the number of namespaces is one and the grammar is not too large, a module provides a RELAX grammar. A module is represented by a module element.

The moduleVersion attribute shows the version of this module. In this example, it is "1.2".
The relaxCoreVersion attribute shows the version of RELAX Core. At present, it is always "1.0".
The targetNamespace attribute shows the namespace which this module is concerned with. In this example, it is "".
The namespace name for RELAX Core is "http://www.xml.gr.jp/xmlns/relaxCore".

The interface element

A module element begins with an interface element. A module has one interface element.

The export element

An interface element contains export element(s).

The label attribute of export elements specifies an element type that may become the root. More than one export may appear in an interface element.
The following example allows element type foo and bar as the root.

Element type declarations

Element type declarations (<!ELEMENT ...>) of XML are represented by elementRule elements. The role attribute of elementRule specifies an element type name. More than one elementRule may follow the interface element.

<elementRule role="element-type-name"> ...hedge model... </elementRule>

An elementRule element has a hedge model. A hedge is a sequence of elements (and their descendants) as well as character data. A hedge model is a constraint on permissible hedges.
A hedge model is either an element hedge model, datatype reference, or mixed hedge model.

Element hedge model

Element hedge modelsare represented by

empty,
none, ref, choice, sequence

elements and the occurs attribute. An element hedge model represents permissible sequences of child elements, which are possibly intervened by whitespace characters.

The empty element

emptyrepresents the empty sequence.
Consider an elementRule as below:

This elementRule implies that the content of a foo element is the empty sequence. A foo element can be a start tag followed by an end tag, or an empty-element tag.

<foo/>

Unlike EMPTY of XML, whitespace characters may intervene between start and end tags.

emptycan be used within choice and sequence. The motivation behind this extension will become clear in STEP 2. If you need exactly the same feature as EMPTY of XML, use the emptyString datatype (shown in STEP 3).
From now on, we assume that foo, foo1, and foo2 are declared by elementRules whose hedge models are empty.

The ref element

refreferences to an element type. For example, <ref label="foo"/> references to an element type foo.
Consider an elementRule as below:

This elementRule implies that the content of a bar element is a foo element. For example, the next bar element is legitimate against this elementRule.

Whitespace may appear before and after the foo element.

refcan have the occurs attribute. Permissible values are "*", "+", and "?" , which indicate "zero or more", "one or more", and "zero or one time", respectively.
An example of "?" as the occurs attribute is as below:

This elementRule implies that the content of a bar element is either a foo or empty.

Whitespace characters may appear before and after the foo element. Even when this bar is empty, it may have whitespace characters.

The choice element

choiceindicates a choice of the specified hedge models ("|" of XML 1.0). Subordinate elements of choice elements are element hedge models. choice can also have the occurs attribute.
An example of elementRule containing choice is shown below:

This elementRule indicates that the content of a bar element is one or more occurrences of either foo1 or foo2 elements.

The sequence element

sequenceis a sequence of the specified hedge models. ("," of XML 1.0). Subordinate elements of sequence are element hedge models. sequence can also have the occurs attribute.
An example of elementRule containing sequence is shown below:

This elementRule implies that the content of a bar element is either a sequence of a foo1 element and a foo2 element, or empty.

<bar/>

The none element

noneis an element hedge model, which does not match anything. none is unique to RELAX.

This elementRule implies that nothing is permitted as the content of bar elements. The motivation behind none will become clear in STEP 2.

Datatype reference

The type attribute of elementRule allows a content model that references to a datatype. Character strings in a document are compared with the specified datatype. Permissible datatypes are built-in datatypes of XML Schema Part 2, or datatypes unique to RELAX. Details of datatypes will be covered by STEP 3.
An example of elementRule containing type is shown below:

This elementRule indicates that the content of a bar element is a character string representing an integer.

Whitespace characters may not occur before or after the integer. For example, the following is not permitted.

Mixed hedge model

mixedsignificantly extends mixed content models (#PCDATA|a|b|...|z)* of XML.
A mixed element wraps an element hedge model. Recall that an element hedge model allows whitespace characters to intervene between elements. By wrapping it with mixed, any character is allowed to intervene.
(#PCDATA | foo1| foo2)*of XML can be captured as below:

The choice element in this mixed element matches zero or more occurrences of foo1 or foo2 elements. The mixed allows any character to intervene between these elements. Thus, this hedge model is equivalent to a (#PCDATA | foo1| foo2)* mixed content model of XML 1.0.
There are two ways to capture a (#PCDATA) content model. One is to reference to the datatype string by the type attribute. The other is to make an element hedge model that matches the empty sequence only and wrap it with mixed.

As a more advanced example, consider elementRule as below:

A sequence of <foo/> and <foo2/> matches ref in the mixed element. Thus, the following example is permitted by this elementRule.

<bar>Murata<foo1/>Makoto<foo2/>IUJ</bar>

As shown in the following example, CDATA sections and character references may appear.

<bar><![CDATA[Murata]]><foo1/>Makoto<foo2/>IUJ</bar>

Attribute-list declarations

Attribute-list declarations (<!ATTLIST ...>) of XML are captured by tag elements.

<tag name="element-type-name"> ...list of attribute declarations... </tag>

tagcan have attribute elements as subordinates.

attributedeclares an attribute. An example of attribute is shown below:

The value of the name attribute is the name of the declared attribute. In this example, it is age.
If the value of the required attribute is true, the attribute being declared is mandatory. If required is not specified, it is optional. Since required is specified in this example, the age attribute is mandatory.
The type attribute specifies a datatype name. If type is not specified, a datatype string (which allows any string) is assumed.
Consider an example of tag which contains this attribute element only.

The following start tag is permitted by this tag.

The following two start tags are not permitted. In the first example, the age attribute is omitted. In the second example, the value of age is not an integer.

<bar>

In DTD, you do not have to write an attribute-list declaration if an element type does not have any attributes. In RELAX, you must write an empty tag element even if there are no attributes. For example, if an element type bar does not have any attributes, you have to write a tag element as below:

Summary

If you have finished reading this STEP, you can immediately start to use RELAX. If you do not need further features, you do not have to read other STEPs. Enjoy and RELAX!

STEP 2: Migration from XML DTD (with parameter entities)

$Id: step2.sdoc 1.10 2000/11/01 13:41:12 murata Exp $
Often, you have to write the same thing many times. Features in STEP 2 allow you to create a description once and reference to it repeatedly. These features mimic parameter entities of XML.

Parameter entities used in content models

hedgeRuleallows you to write a hedge model once, name it, and reference to it repeatedly. In other words, hedgeRule mimics parameter entities referenced from content models in DTD.

Overview

The syntax of hedgeRule is shown below. foo is a name assigned to the hedge model of this hedgeRule.

<hedgeRule label="foo"> ...element content model... </hedgeRule>

To reference to such a hedgeRule, we write <hedgeRef label="foo"/>. This hedgeRef is replaced with the element hedge model specified in the hedgeRule.
In the following example, the hedge model of the elementRule for the element type doc references to a hedgeRule. This elementRule is borrowed from the module in the beginning of STEP 1, and the hedge model minus title is rewritten by a hedgeRule.

The reference to doc.body is expanded as below:

In this example, a hedgeRule is referenced from an elementRule. But a hedgeRule may reference to another hedgeRule.

Permissible hedge models

hedgeRulecan have element hedge models only. Datatype references or mixed hedge models are not permitted. For example, the following rules are not permitted.

If you want to use hedgeRef in conjunction with a mixed hedge model, you have to surround the hedgeRef with mixed in an elementRule element, rather than using the mixed element inside a hedgeRule element. An example is shown below. The mixed hedge model references to phrase, and phrase is described by a hedgeRule.

The occurs attribute

hedgeRefthat references to a parameter entity can have occurs, and an element hedge model specified in hedgeRule can also have occurs. In the following example, both have occurs.

If this example is recaptured in DTD, expansion of the parameter entity bar is obvious.

<!ENTITY % bar "(foo1, foo2)+">  <!ELEMENT foo (%bar;)*>  <!ELEMENT foo ((foo1, foo2)+)*>

The following shows expansion of the above example. Observe that a choice element containing a single child is introduced during expansion. This choice element inherits occurs="*" from the ref.

Occurrence order of hedgeRef and hedgeRule

Unlike parameter entities of DTD, hedgeRule does not have to precede ref that reference to it. For example, the following is not an error.

Illegal reference to itself

hedgeRulemay not reference to itself directly or indirectly. The follow is an error since the hedge model for bar references to bar itself.

In the following example, the hedge model for bar1 references to bar2 and the hedge model for bar2 references to bar1. Thus, there is an error.

Use of empty

empty, shown in STEP 1, is typically used in hedgeRule. An example is as below:

Users of this module can change the structure of section by customizing the description of frontMatter.

Use of none

none, shown in STEP 1, is also used in hedgeRule. An example is as below:

Users of this module can change the structure of block-class by customizing the description of local-block-class.

Parameter entities used in attribute-list declarations

attPoolallows you to declare attributes once and reference to the declarations repeatedly. In other words, attPool mimics parameter entities referenced from attribute-list declarations.

Overview

The syntax of attPool is shown below. foo is a name of a parameter entity.

<attPool role="foo"> ...attribute definitions... </attPool>

To reference to such an attPool, we write <ref role="foo"/> before attribute declarations. This ref is replaced with attribute declarations specified in the attPool.
In the following example, a tag for the element type title references to attPool. This tag is borrowed from the module in the beginning of STEP 1 and rewritten. The role attribute, which is common to many element types, is described by attPool named common.att.

This ref is expanded as below:

In this example, attPool is referenced from tag, but it can also be referenced from attPool.

Occurrence order of ref and attPool

Unlike parameter entities of DTD, attPool does not have to precede ref that reference to it. For example, the following is not an error.

Multiple ref elements

A single tag or attPool may contain more than one ref element. In the following example, an attPool element references to more than one ref element. Required attributes are grouped as common-req.att and optional attributes are grouped as common-opt.att. These two are referenced from the attPool element for common.att.

Illegal reference to itself

As in the case of hedgeRule, a direct or indirect reference to itself is an error. For example, the following is an error.

Summary

STEP 2 covers almost all features of XML DTD. Enjoy and RELAX!

STEP 3: Datatypes

$Id: step3.sdoc 1.10 2000/08/26 03:14:38 murata Exp $
STEP 3 introduces datatypes.

Datatypes of XML Schema Part 2

XML Schema Part 2introduces many built-in datatypes. They are designed so that other specifications can utilize them. RELAX borrows all these built-in datatypes.
Some of the built-in datatypes of XML Schema Part 2 are borrowed from XML DTD; the others are newly introduced. Those borrowed from XML DTD are shown as below:

NMTOKEN
NMTOKENS
ID
IDREF
IDREFS
ENTITY
ENTITIES
NOTATION

Next, built-in datatypes newly introduced by XML Schema Part 2 are shown below:

string
boolean
float
double
decimal
timeDuration
recurringDuration
binary
uriReference
QName
language
Name
NCName
integer
nonPositiveInteger
negativeInteger
long
int
short
byte
nonNegativeInteger
unsignedLong
unsignedInt
unsignedShort
unsignedByte
positiveInteger
timeInstant
time
timePeriod
date
month
year
century
recurringDate
recurringDay

In XML Schema Part 2, when users reference to these built-in datatypes, users can further specify constraints such as value ranges. The same thing applies to RELAX. However, unlike XML Schema Part 2, RELAX does not allow users to define their own datatypes.

Datatypes unique to RELAX

Datatypes unique to RELAX are none and emptyString.

none

noneis an empty datatype. No character strings belong to this datatype. RELAX uses none so as to prohibit attributes. In the following example, the class attribute is prohibited.

Thus, the following start tag is not permitted.

emptyString

emptyStringis a datatype that allows the empty string only. This datatype is compatible with EMPTY of DTD.

This elementRule allows the following two elements only. Whitespace characters may not occur between <em> and </em>.

<em/>

Additional constraints

Like XML Schema Part 2, RELAX allows users to specify additional constraints on datatypes. For example, users can specify integer and further specify a constraint "18 thru 65". The syntax for such additional constraints is the same as in XML Schema Part 2.

elementRule

To impose constraints on a datatype specified by elementRule, attach child elements to the elementRule.
In the following example, the hedge model for the element type age is a reference to integer. minInclusive and maxInclusive represent constraints on minimum and maximum values, respectively. Thus, permissible contents of age elements are character strings representing integers from 18 to 65.

A age element can contain string "20" as its content.

But string "11" is not allowed.

attribute

To impose constraints on a datatype specified by attribute, attach child elements to attribute.
In the following example, the sex attribute of employee is constrained to be either man or woman. Here, enumeration is a constraint which specifies a permissible value.

The sex attribute can have the string "man".

But it cannot contain the string "foo".

Summary

STEP 3 provides more than enough features to play with. Enjoy and RELAX!

STEP 4: Annotation

$Id: step4.sdoc 1.12 2000/11/01 13:43:29 murata Exp $
DTD documentation is highly important. Since DTDs merely define syntactical constructs, plenty of annotations in natural languages are required so as to explain the intended semantics of these constructs. Although XML comments are always available, such comments will be ignored by browsers which parse and then display RELAX modules.
STEP 4 provides features for annotating RELAX modules. Since they are represented by elements and attributes, browsers which parse RELAX modules can show annotations in a user-friendly manner.

The annotation element

The top-level element for annotations is the annotation element. annotation may occur in the following places.

at most once before the interface element
at most once in an export element
at most once as the eldest child of each elementRule
at most once as the eldest child of each hedgeRule
at most once as the eldest child of each tag
at most once as the eldest child of each attPool
at most once as the eldest child of each attribute
at most once as the child of each include
at most once as the eldest child of each element
at most once as the eldest child of each div

The elementRule element shown below has an annotation as its eldest child. The content of this annotation is omitted.

Child elements of an annotation element are documentation elements and appinfo elements.

The documentation element

documentationis an element for representing explanations in natural languages. Since RELAX Namespace is not available yet, documentation may contain text data only.
The following shows a documentation element added to the above example.

<elementRule role="para"> <annotation> <documentation>This is a paragraph.</documentation> </annotation> <mixed> <ref label="fnote" occurs="*"/> </mixed> </elementRule>

If a documentation element has the source attribute, the attribute value is a URI that references to an explanation. In this case, the content of documentation is not used. Browsers for modules typically use this URI to provide a link.

If a documentation element has the xml:lang attribute, the attribute value announces the natural language in which the content of the documentation is written.
In the next example, "en" is specified as the value of xml:lang.

<elementRule role="para"> <annotation> <documentation xml:lang="en">This is a paragraph.</documentation> </annotation> <mixed> <ref label="fnote" occurs="*"/> </mixed> </elementRule>

The appinfo element

Other than verifiers, which examine documents against RELAX modules, many programs might handle RELAX modules. For example, some programs may create a database schema from a module. appinfo provides hidden information for such programs. Since RELAX Namespace is not available yet, appinfo may contain text data only.

<elementRule role="foo" type="integer"> <annotation><appinfo>default:1</appinfo></annotation> </elementRule>

If an appinfo element has the source attribute, the attribute value is a URI that references to hidden information. In this case, the content of appinfo is not used.

The div element

We often would like to annotate a collection of elementRules, hedgeRules, tags, and attPools. The div element allows such an annotated group.
divelements may occur in module elements as siblings of elementRules, hedgeRules, tags, and attPools. div elements may further contain div elements. A div element may contain elementRules, hedgeRules, tags, attPools, and divs.
The following is a module shown in STEP 1. It is annotated after introducing div elements.

<module moduleVersion="1.2" relaxCoreVersion="1.0" targetNamespace="" xmlns="http://www.xml.gr.jp/xmlns/relaxCore"> <interface> <export label="doc"/> </interface> <div> <annotation> <documentation>The root node</documentation> </annotation> <elementRule role="doc"> <sequence> <ref label="title"/> <ref label="para" occurs="*"/> </sequence> </elementRule> <tag name="doc"/> </div> <div> <annotation> <documentation>Paragraphs</documentation> </annotation> <elementRule role="para"> <mixed> <ref label="em" occurs="*"/> </mixed> </elementRule> <tag name="para"> <attribute name="class" type="NMTOKEN"/> </tag> </div> <elementRule role="title"> <mixed> <ref label="em" occurs="*"/> </mixed> </elementRule> <tag name="title"> <attribute name="class" type="NMTOKEN"/> <attribute name="number" required="true" type="integer"/> </tag> <elementRule role="em" type="string"/> <tag name="em"/> </module>

Summary

STEP 4 makes it easy to document your module. Enjoy and RELAX!

STEP 5: Dividing large modules

$Id: step5.sdoc 1.7 2000/04/14 12:40:02 murata Exp $
Large modules are hard to maintain. STEP 5 introduces a mechanism for dividing a module into several pieces, which can be maintained easier.

Why divide modules?

Suppose that we rewrite a DTD containing 200 element types in RELAX. This size is fairly large, but is not uncommon. For each element type, RELAX needs an elementRule and a tag. If each elementRule and tag requires three lines, the total is 1200 lines. If we write extensive documentation, the total may become 3000 lines or even more. This size is too large to put in a single file.
Even DTD provides external parameter entities so as to divide large DTDs into modules and maintain each module independently. RELAX strongly requires some mechanism for dividing large modules.

The include element

In RELAX, a module can reference to another module by the include element. The include element is replaced with the content of the referenced module.
Let us examine an example of include. First, a module to be included is as below:

This module contains an elementRule and tag for the element type bar. The interface element is empty. Suppose that this module is stored in bar.rlx.
Next, a module which references to and includes this module is as below:

This module contains an elementRule and tag for the element type foo. The include at the end of this this module references to bar.rlx via the moduleLocation attribute.
The include element is replaced by the body of the referenced module, which the content of the module element except the interface element. In this example, replacement is done as below:

Non-empty interface elements

In the above example, the interface element of the referenced module is empty. Suppose that an export element is supplied in the interface element.

In this case, the children of the interface element in the referenced module are attached to the interface element in the referencing module. In this example, the result of replacement is as below:

Summary

STEP 5 makes it easy to maintain large modules. Enjoy and RELAX!

STEP 6: Default values, entities, and notations

$Id: step6.sdoc 1.9 2000/11/01 13:45:32 murata Exp $
Among the features of DTD, we have not covered default values, entities, and notations. STEP 6 is concerned with them.

Reasons that RELAX does not handle them

RELAX does not handle default values, entities, and notations. They are intentionally omitted from RELAX so that we can continue to use existing XML processors.
Suppose that RELAX introduced constructs for these features. For example, assume that RELAX had the default attribute which provides the default value of an attribute. Existing XML processors will not examine RELAX modules when they parse XML documents. Thus, they will not use default. The same thing applies to entities and notations: even if RELAX had constructs for declaring entities and notations, existing XML processors would not use them.
If we would like to introduce such features to RELAX, the only solution is to create RELAX-specific XML parsers. Those users who create and verify XML documents against RELAX grammars would certainly have to use such RELAX-specific XML parsers. Furthermore, those users who receive such XML documents would have to switch to RELAX-specific XML parsers. In our opinion, this is not very realistic.

Using DTD and RELAX together

Then, are we unable to use default values, entities, and notations? No, we can use these features if we use DTD and RELAX together.
The following is an XML document containing a DTD.

<?xml version="1.0" encoding="utf-8"?> <!DOCTYPE person [ <!ATTLIST person bloodType CDATA "A"> ]> <person/>

This document is verified against a RELAX module as below:

In this example, the DTD specifies the default value "A". XML processors do use this default. We can verify this XML document against the RELAX module without any problems. Verification is done as if "A" was specified as the attribute value.
Similarly, entities and notations can be described in DTD. First, we show an example of parsed entities.

<?xml version="1.0"?> <!DOCTYPE doc [ <!ENTITY foo "This is a pen"> ]> <doc> <para>&foo;</para> </doc>

This document is legitimate against the RELAX module as below:

Next, we show an example of unparsed entities and notations.

<?xml version="1.0"?> <!DOCTYPE doc [ <!NOTATION eps PUBLIC "-//ISBN 0-7923-9432-1::Graphic Notation//NOTATION Adobe Systems Encapsulated Postscript//EN"> <!ENTITY logo_eps SYSTEM "logo.eps" NDATA eps> <!ELEMENT doc EMPTY> <!ATTLIST doc logo ENTITY #IMPLIED> ]> <doc logo="logo_eps"/>

This document is legitimate against the following RELAX module.

Better leave them out

As we have seen in the previous section, we can use default values, entities, and notations by using DTD and RELAX together. Their use is not recommended, however.
Default values can be mimicked by application programs. We only have to hardcode "default values" in application programs and use them when attributes are absent. We can also write XSLT scripts so as to embed "default values" when attributes are absent.
Use links (especially, XLink) rather than external parsed entities or external unparsed entities. Links are much more appropriate for the WWW.
Internal parsed entities can be used without any problems, however. Some text data such as "<" can be best represented by internal parsed entities (e.g., <).
Unfortunately, default values, entities, and notations in DTD are not always processed as expected by casual users. This is because some XML processors do not fetch external DTD subsets or external parameter entities. However, all examples in this STEP use internal DTD subsets and thus are free from such unexpected results.

Summary

STEPs 1 thru 6 provide more than enough features for the migration from DTD to RELAX. As long as we use these features only, we can convert RELAX to DTD and vice versa without loss of information except for datatypes and facets. In the future, conversion between XML Schema should be possible. Enjoy and RELAX!

STEP 7: elementRule and hedgeRule, revisited

$Id: step7.sdoc 1.13 2000/08/26 08:37:11 murata Exp $
Until this step, we have covered those features of elementRule and hedgeRule which can be easily understood by DTD authors. Actually, RELAX has a much more generalized framework.

elementRule and labels

An elementRule element can have the label attribute. We first consider underlying requirements and then introduce this attribute.

Context-sensitive content models

We often would like to attach different content models to the same tag name, depending on the context. As an example, consider paragraphs in sections and those in tables. Permissible subordinates of these two types of paragraphs are slightly different; we might want to allow paragraphs in sections to contain footnotes, but might want to allow those in tables to contain text only.

<section> <para>This paragraph can contain footnotes <footnote>This is a footnote</footnote>.</para> </section>

<table> ... <para>This paragraph cannot contain a footnote <footnote>This is an illegal footnote</footnote>.</para> ... </table>

Thus, we would like to switch content models (shown below) depending on whether paragraphs occur in sections or tables.

<!ELEMENT para (#PCDATA|footnote)*>  <!ELEMENT para (#PCDATA)>

A good motivation for context-sensitive content models can be found in HTML. In HTML, an a element may not occur as a direct or indirect subordinate of another a element. The same situation is true of the form element, as well; a form element may not occur inside another form element.

<a href="foo"><span><a href="bar">dmy</a></span></a>

<form> ... <div> <form> ... </form> </div> ... </form>

In HTML, a elements can contain span elements. Since we would like to prohibit even indirect nesting, we have to allow those span elements outside of a elements to contain a elements and do not allow those in a elements to contain span elements. The same thing applies to form; we have to allow those div elements outside of form elements to contain form elements and do not allow those in form elements to contain form elements.

<!ELEMENT span (#PCDATA|a)*>  <!ELEMENT span (#PCDATA)>

However, the DTD formalism allows only one content model per tag name. Thus, we cannot use different content models for paragraphs in different contexts. span may have only one content model; we cannot switch content models depending on whether the span element appears in some a element. The same thing applies to div.
Historically, two approaches have been used to overcome this problem. One is to introduce different tag names for different contexts. The following example illustrates this approach. Paragraphs in sections have the tag name paraInSection, and those in tables have the tag name paraInTable.

<!ELEMENT paraInSection (#PCDATA|footnote)*> <!ELEMENT paraInTable (#PCDATA)>

<section> <paraInSection>This paragraph can contain footnotes <footnote>This is a footnote</footnote>.</paraInSection> </section>

<table> ... <paraInTable>This paragraph cannot contain a footnote.</paraInTable> ... </table>

This approach causes a flood of similar tag names: we have to duplicate tag name sets for common constructs such as paragraphs, footnotes, itemized lists, etc.
Instead of introducing more than one tag name for each construct, another approach creates a loose content model by merging different content models for different contexts. The following example illustrates this approach. Not only paragraphs in sections but those in tables are allowed to contain subordinate footnotes.

<!ELEMENT para (#PCDATA|footnote)*>

This approach causes loose validation. The following example validates against the above example.

<table> ... <para>This paragraph cannot contain a footnote <footnote>This is an illegal footnote</footnote>.</para> ... </table>

The label attribute of elementRule elements

For a single tag name to have different content models depending on contexts, RELAX introduces labels. A single tag name associated with different labels can have different hedge models.
An elementRule can have the label attribute. A form of elementRule is as below:

<elementRule role="name" label="label"> ...content model... </elementRule>

If the label attribute is omitted, the value of the role attribute is used. Thus, the following elementRules are equivalent.

<elementRule role="foo"> ...content model... </elementRule>

<elementRule role="foo" label="foo"> ...content model... </elementRule>

For paragraphs containing footnotes and paragraphs not containing footnotes, the following example uses different labels and thus different content models.

The first elementRule show that paragraphs of the paraWithFNotes label contain text and footnotes. The second elementRule show that paragraphs of the paraWithoutFNotes label contain text only.
In most cases, there is one to one correspondence between labels and tag names. In fact, in all examples until this STEP, a tag name has only one associated label. To address issues presented in the previous subsection, we have to associate more than one label with a single tag name.

The label attribute of ref elements

Next, we revisit the label attribute of ref elements. Values of this attribute are always labels. STEP 1 explained that values are element type names, but that explanation is a white lie. RELAX does not have element types. (To tell the truth, XML 1.0 does not define element types. Element type declarations are defined, but element types are never defined.)
Since paraWithFNotes and paraWithoutFNotes in the last example in the previous section are labels, they can be referenced by ref elements. Content models for sections reference to paraWithFNotes, while those for tables (to be precise, table cells) reference to paraWithoutFNotes.

Sharing labels

Multiple hedgeRule elements sharing the same label

More than one hedgeRule can specify the same label for the label attribute. In the following example, there are two hedgeRules for the blockElem label.

The following elementRule references to this blockElem.

On validation against RELAX grammars, hedgeRef are first expanded. We use this hedgeRef as an example to demonstrate such expansion.
Both of the hedgeRules describing the blockElem label have ref elements as hedge models. By grouping them with a choice element, we have the following.

The hedgeRef we intend to expand specifies * as the occurs attribute. We copy this attribute to the choice element.

Finally, we replace the hedgeRef with this choice element.

Let us summarize the procedure for expanding a hedgeRef element referencing to some label.

Locate all hedgeRules for this label.
Group hedge models of these hedgeRules with a choice element.
Copy the occurs attribute of the hedgeRef to this choice element.
Replace the hedgeRef with this choice element.

Since multiple hedgeRules are allowed to share a label, we are not forced to write a single hedgeRule. For example, if we would like to add numberedItemizedList as another sort of blockElem, we only have to add the following hedgeRule; we do not have to modify other hedgeRules.

Prohibition of label sharing by hedgeRule and elementRule

hedgeRuleand elementRule are prohibited from sharing a label. The following example is a syntax error.

Multiple elementRule elements sharing the same label

Multiple elementRules can specify the same label for the label attribute. Moreover, the attribute role of multiple elementRules may be identical.
In the following example, two elementRules specify section as the value of the role attribute. Since neither specify the label attribute, section is assumed as the value of this attribute.

In the case that multiple elementRules exist for a single label, at least one of them are required to hold.
Consider the following section element. The first elementRule and the second elementRule holds for this element. Thus, we can attach the section label.

The following section element contains a fig element, and thus only the second elementRule holds. Since one holding elementRule is sufficient, we can again attach the section label to this element.

Let us consider advantages of allowing more than one elementRule for a single label. Suppose that we already have a module and that we are going to modify this module so that more documents become legitimate.
In the traditional approach, we have to modify an existing elementRule. We cannot guarantee that what was legitimate is still legitimate after such modification.
In RELAX, we do not have to revise existing elementRules, but we only have to add more elementRules. In this approach, what was legitimate is guaranteed to be legitimate.
In the previous example, the initial plan was to allow only paras as contents of section. The first elementRule was written for this purpose. Later, the second elementRule was added so as to allow fig as contents of section. Since the first elementRule is still active, none of the then-legitimate documents has become illegitimate.

Summary

If you have struggled to create large DTDs, STEP 7 would probably look attractive. Weak points of DTD can be easily addressed in RELAX. Enjoy and RELAX!

STEP 8: tag and attPool, revisited

$Id: step8.sdoc 1.15 2000/08/26 14:03:04 murata Exp murata $
In STEP 2, tag was compared to an attribute-list declaration and attPool was compared to parameter entities describing attributes. Actually, RELAX has a much more generalized framework.

The role attribute of tag elements

On top of the name attribute, tag elements can have the role attribute. In this section, we first consider motivations for this extension, and then introduce this attribute.

Switching content models depending on attribute values

Often, we would like to attach different content models to the same tag name, depending on attribute values. For example, we might want to switch content models of val element, depending on the type attribute. If the attribute value is integer, the content model is a reference to the datatype integer. If it is string, the content model is a reference to the datatype string.

<val type="integer">10</val>  <val type="string">foo bar</val>  <val type="integer">foo bar</val>

Thus, we would like to switch content models (shown below) depending on whether the attribute value is integer or string.

<elementRule role="val" type="integer"/>  <elementRule role="val" type="string"/>

However, as long as we use features covered in STEPs 0 thru 7, we have to attach content models to tag names. Attribute values are not taken into consideration. Thus, no matter what the value of the type attribute is, the same elementRule is used.

Constraints represented by tag elements

On top of the name attribute, tag elements can have the role attribute. tag elements take the following form. While the name attribute specifies tag names, the role attribute specifies roles.

A tag element attaches a role to a collection of constraints on tag names and attributes. When a start tag (or empty-element tag) satisfies these constraints, this tag plays the specified role.
For example, consider a tag element as below:

This tag element specifies that the tag name be val and the type attribute have the value integer. If a start tag (or empty-element tag) satisfies this constraint, this tag plays the val-integer role.

In the following tag element, the constraint on the type attribute is that the attribute value be string and the role name is val-string.

The following start tag does not play the val-integer role, but plays the val-string role.

Attributes may occur even if they are not specified by tag elements. For example, the following start tag has an attribute unknown, which is not specified by the previous tag element. This start tag still plays the role val-string, but warning message will be issued.

How should we interpret those tag elements without the role attribute such as those in STEPs 1 thru STEP 7? When the role attribute is omitted, it is assumed to have the value of the name attribute. Thus, the following two tag elements are semantically identical.

The role attribute of elementRule elements

The role attribute of elementRule elements do not specify tag names, but rather specifies roles. Thus, we can switch hedge models for the same tag name, depending on attribute values.
If we use roles val-string and val-integer shown in the previous example, we can have two elementRules for start tags of the tag name val. An elementRule that references to the val-string role is concerned with start tags whose type attribute has the value string. An elementRule that references to the val-integer role is concerned with start tags whose type attribute has the value integer.

<tag name="val" role="val-integer"> <attribute name="type" type="NMTOKEN" required="true"> <enumeration value="integer"/> </attribute> </tag> <elementRule role="val-integer" label="val" type="integer"/>  <tag name="val" role="val-string"> <attribute name="type" type="NMTOKEN" required="true"> <enumeration value="string"/> </attribute> </tag> <elementRule role="val-string" label="val" type="string"/>

Note that two tag elements specify the tag name val and the type attribute. In RELAX, tag elements are not declarations, which may appear once and only once, but rather constraints, which may appear more than once.

Prohibition of references by ref elements

Roles referenced by ref elements may not be described by tag elements. If they are described, they must be described by attPool elements.
In the next example, a ref element references to the foo role, which is described by a tag element. This example is thus a syntax error.

The none datatype, revisited

STEP 3 introduced the none datatype. none is useful for switching content models depending on the presence or absence of an attribute.
For example, suppose that <div class="sec"> and <div> require different content models. A role for the former, say divSec, can be described as below:

How do we describe a role for <div>, say divWithoutClass? One might think that the following example would work.

However, this description allows divWithoutClass even for <div class="sec">. Although the "undeclared attribute" message is issued, this start tag is assumed to play both roles. ⁽¹⁾
To explicitly disallow the class attribute, we have to use the none datatype and write as below:

Since no character strings are permitted by the none datatype, any value specified for the class attribute will prevent the divWithoutClass role.

RELAX allows tags to play roles even if they have undeclared attributes. There are two reasons for this design. First, traditional XML processors continue validation even if they encounter undeclared attributes. Second, HTML allows undeclared attributes.

attPool elements

Unlike parameter entitles of DTDs, attPool elements are not expanded. tag elements and attPool elements are very similar and equally important in RELAX.

Constraints represented by attPool

We have observed that a tag element attaches a role to a collection of constraints on tag names and attributes. The only difference between attPool and tag is that attPool elements do not contain constraints on tag names. In other words, an attPool element attaches a role to a collection of constraints on attributes.
Consider the following attPool.

This attPool element specifies that the class attribute is specified, and its value is "informative" and attaches the info role to this constraint. There are no constraints on tag names. Because of this attPool, the following empty-element tag plays the info role.

Just like tag, attributes not specified by attPool may occur. For example, the following start tag plays the info role.

Prohibition of references by elementRule elements

Roles referenced by the role attribute of elementRule elements may not be described by attPool elements. If they are described, they must be described by tag elements.
The following elementRule describes the info role, which is described by an attPool element. Thus, this example is a syntax error.

Prohibition of role sharing by multiple tag or attPool elements

Multiple tag elements cannot share a single role.
In the following example, two tag elements share the bar role. Thus, this example is a syntax error.

In the next example, a role and tag name are both shared by two tag elements. This example is also a syntax error.

Even when the role attribute is omitted and the value of the name attribute is used, role sharing is prohibited. The two tag elements in the next example are identical to the two tag elements shown above. Thus, this example is also a syntax error.

In the following example, two attPool elements share the bar role. Thus, this example is a syntax error.

In this last example, a tag element and an attPool element share the bar role. Thus, this example is also a syntax error.

Summary

In STEPs 0 thru 7, we have assumed that a tag element declares a tag name and attributes. Actually, a tag element attaches a role to a collection of constraints on tag names and attributes. In examples in STEPs 1 thru 7, roles and tag names coincide, but they are not always identical. In most cases, there are one-to-one correspondences among labels, roles, and tag names. But this is not always the case.
The following table summarizes syntactical constructs that describe or reference to tag names, labels, or roles.

Syntactical constructs	tag names, labels, or roles
The `role` attribute of `elementRule`	references to roles described by `tag`
The `label` attribute of `elementRule`	description of labels
The `label` attribute of `hedgeRule`	description of labels
The `label` attribute of `ref`	reference to labels described by `elementRule`
The `label` attribute of `hedgeRef`	reference to labels described by `hedgeRule`
The `name` attribute of `tag`	description of tag names
The `role` attribute of `tag`	description of roles
The `role` attribute of `attPool`	description of roles
The `role` attribute of `ref`	reference to roles described by `attPool`

The following table summarizes whether tag names, labels, and roles occur in XML documents.

Types of names	In XML instances	In RELAX modules
tag names	occur	occur as part of clauses
roles	do not occur	occur in clauses (descriptions of and references to roles)
labels	do not occur	occur in production rules (descriptions of and references to labels)

In traditional DTDs, it has been impossible to switch content models depending on attribute values, but RELAX has made it possible. The only required extension is the role attribute. This demonstrates simplicity and descriptive power of RELAX. Enjoy and RELAX!

STEP 9: Hedge content model element

RELAX allows element elements as permissible hedge models. They are mere syntax sugar, and are expanded as ref, elementRule, and tag elements. In this section, we show motivation behind element elements and then present the mechanism.

Simulating programming languages and database languages

RELAX is an extension of DTDs, and is based on a grammatical data model. This model is very different from data models of programming languages and database systems. On the other hand, RELAX should be able to mimic declarations in programming languages and schemata in database languages.
In programming languages, we declare variables and attach datatypes to them. In the next example, variables x and y are declared and a datatype int is attached to them.

public class Point { int x; int y; }

When a variable x is declared in another class, it may have a different type. In the next example, a datatype float is attached to x of the class Foo.

public class Foo { float x; }

The element element

The element element is an element hedge model that specifies both a variable name and type name. An element element always has the name attribute and type attribute. Furthermore, it may have the occurs attribute.

Use of element elements allows tag and elementRule elements such as below:

A Point such that x=100 and y=200 can be represented by an XML document as below:

Expansion to ref, elementRule, and tag elements

The element element is merely syntax sugar. Each element element in a hedge model is replaced by a ref element, while an elementRule element and tag element are generated.
The elementRule in the previous subsection is duplicated below. Two element elements in this example have the type attribute. Let us consider how these element elements are expanded.

Each of the element elements is replaced by a ref element. Furthermore, an elementRule element and tag element are generated for each element element. As a hedge model, each elementRule has a reference to the datatype specified by the type attribute of the original element element.

When an element element has the occurs attribute, it is copied to the generated ref element. For example, suppose that the elements in the first elementRule specifies occurs="?" (see below).

The result of expansion is as below:

Expansion procedure

In this section, we summarize expansion of element elements.

Generating ref elements

A ref element is generated. As the value of its label attribute, we generate a label that does not conflict with any other label. If the element has the occurs attribute, it is copied to the generated ref element.

Generating elementRule elements

An elementRule element is generated. As the value of its role attribute, we generate a role that does not conflict with any other role. The value of the label attribute is the label generated together with the ref element. As the hedge model of this elementRule, the type attribute of the element element is copied.

Generating tag elements

A tag element is generated. Its role attribute specifies the role automatically generated together with the elementRule. The name attribute of the generated tag specifies the value of the name attribute of the original element element.

Summary

For users of programming languages and database languages, description by element elements probably look very natural and easy to understand. Enjoy and RELAX!

STEP 10: tag embedded in elementRule

$Id: step10.sdoc 1.8 2000/11/01 13:46:38 murata Exp $
In this section, we consider embedding of tag elements in elementRule elements.

Describing attributes and hedge models together

In STEPs 0 thru 9, attributes and tag names are separated from hedge models. Attributes and tag names are described by tag and attPool elements, while hedge models are described by elementRule and hedgeRule elements. An elementRule references to a tag via a role, and the tag may in turn reference to attPool elements.
When an elementRule and a tag is so closely related, it may be convenient to merge them into a single element rather than separating them.
As an example of elementRule-tag separation, we duplicate an example in STEP 8 below.

Suppose that roles val-integer and val-string are referenced from these two elementRule elements only. Rather than introducing two names val-integer and val-string for referencing, authors might want to directly embed tag elements within elementRule elements.

<elementRule label="val" type="integer"> <tag> <attribute name="type" type="NMTOKEN" required="true"> <enumeration value="integer"/> </attribute> </tag> </elementRule>  <elementRule label="val" type="string"> <tag> <attribute name="type" type="NMTOKEN" required="true"> <enumeration value="string"/> </attribute> </tag> </elementRule>

An advantage of this style is that roles do not need names. Before this rewrite, we needed names which are different from the tag names or labels. Omission of these names enhance readability.
Some people find it attractive to describe attributes and hedge models together. For example, points with the x-coordinate and y-coordinate can be represented in two alternative manners. The first example uses attributes, while the second uses elements. Their differences are minor and can be easily rewritten from each other.

An elementRule containing a tag may not have the role attribute. The label attribute is mandatory, instead.
An embedded tag may not have the role attribute. The name attribute is permitted, but it is not present in this example.

Handling of embedded tag elements

An embedded tag element is moved from the elementRule and placed as a sibling element. We show how the first example in this STEP is handled.

First, we generate a role that does not conflict with any other role. In this example, we generate role val$1.
Next, we move the embedded tag element from the elementRule and place as a sibling element. We then add the role attribute and specify the generated role as the attribute value.
Only when this tag element does not have the name attribute, we introduce this attribute. As the attribute value, we use the value of the label attribute of the elementRule element. In this example, we specify "val" as the value of the name attribute.
Finally, we add the role attribute to the elementRule and specify the role generated above.

Summary

To describe elements and attributes together, embedded tag elements provides concise and comprehensible description. Enjoy and RELAX!

HOW TO RELAX

Table of contents

Part 1: RELAX Core

STEP 0: Introduction

RELAX, brief overview

Comparison with DTD

The RELAX processor

The organization of RELAX

About this tutorial

Summary

STEP 1: Migration from XML DTD (without parameter entities)

An example module

The module element

The interface element

The export element

Element type declarations

Element hedge model

The empty element

The ref element

The choice element

The sequence element

The none element

Datatype reference

Mixed hedge model

Attribute-list declarations

Summary

STEP 2: Migration from XML DTD (with parameter entities)

Parameter entities used in content models

Overview

Permissible hedge models

The occurs attribute

Occurrence order of hedgeRef and hedgeRule

Illegal reference to itself

Use of empty

Use of none

Parameter entities used in attribute-list declarations

Overview

Occurrence order of ref and attPool

Multiple ref elements

Illegal reference to itself

Summary

STEP 3: Datatypes

Datatypes of XML Schema Part 2

Datatypes unique to RELAX

none

emptyString

Additional constraints

elementRule

attribute

Summary

STEP 4: Annotation

The annotation element

The documentation element

The appinfo element

The div element

Summary

STEP 5: Dividing large modules

Why divide modules?

The include element

Non-empty interface elements

Summary

STEP 6: Default values, entities, and notations

Reasons that RELAX does not handle them

Using DTD and RELAX together

Better leave them out

Summary

STEP 7: elementRule and hedgeRule, revisited

elementRule and labels

Context-sensitive content models

The label attribute of elementRule elements

The label attribute of ref elements

Sharing labels

Multiple hedgeRule elements sharing the same label

Prohibition of label sharing by hedgeRule and elementRule

Multiple elementRule elements sharing the same label

Summary

STEP 8: tag and attPool, revisited

The role attribute of tag elements

Switching content models depending on attribute values

Constraints represented by tag elements