
STEP 7: elementRule and hedgeRule, revisited

$Id: step7.sdoc 1.13 2000/08/26 08:37:11 murata Exp $

Up to this step, we have covered only those features of elementRule and hedgeRule that DTD authors can easily understand. In fact, RELAX has a much more general framework.

elementRule and labels

An elementRule element can have a label attribute. We first consider the requirements that motivate it and then introduce the attribute.

Context-sensitive content models

We often would like to attach different content models to the same tag name, depending on the context. As an example, consider paragraphs in sections and paragraphs in tables. The permissible subordinates of these two kinds of paragraphs differ slightly: we might want paragraphs in sections to contain footnotes, but paragraphs in tables to contain text only.

<!-- This example is legal. -->
<section>
  <para>This paragraph can contain footnotes <footnote>This 
        is a footnote</footnote>.</para>
</section>
<!-- This example is illegal. -->
<table>
  ...
  <para>This paragraph cannot contain a footnote <footnote>This 
        is an illegal footnote</footnote>.</para>
  ...
</table>

Thus, we would like to switch between the content models below, depending on whether a paragraph occurs in a section or in a table.

<!-- Case 1: subordinate to <section> elements. -->
<!ELEMENT para (#PCDATA|footnote)*>

<!-- Case 2: subordinate to <table> elements. -->
<!ELEMENT para (#PCDATA)>

A good motivation for context-sensitive content models can be found in HTML. In HTML, an a element may not occur as a direct or indirect subordinate of another a element. The same holds for the form element: a form element may not occur inside another form element.

<!-- This example is illegal. -->
<a href="foo"><span><a href="bar">dmy</a></span></a>
<!-- This example is also illegal. -->
<form>
  ...
  <div>
    <form>
      ...
    </form>
  </div>
  ...
</form>

In HTML, a elements can contain span elements. Since we would like to prohibit even indirect nesting, we have to allow span elements outside a elements to contain a elements, but forbid span elements inside a elements from containing a elements. The same applies to form: we have to allow div elements outside form elements to contain form elements, but forbid div elements inside form elements from containing form elements.

<!-- Case 1: not subordinate to <a> elements. -->
<!ELEMENT span (#PCDATA|a)*>

<!-- Case 2: subordinate to <a> elements. -->
<!ELEMENT span (#PCDATA)>

However, the DTD formalism allows only one content model per tag name. Thus, we cannot use different content models for paragraphs in different contexts. Likewise, span may have only one content model, so we cannot switch content models depending on whether a span element appears inside an a element. The same applies to div.

Historically, two approaches have been used to overcome this problem. One is to introduce different tag names for different contexts. The following example illustrates this approach. Paragraphs in sections have the tag name paraInSection, and those in tables have the tag name paraInTable.

<!ELEMENT paraInSection (#PCDATA|footnote)*>

<!ELEMENT paraInTable (#PCDATA)>
<!-- This example is legal. -->
<section>
  <paraInSection>This paragraph can contain footnotes <footnote>This 
        is a footnote</footnote>.</paraInSection>
</section>
<table>
  ...
  <paraInTable>This paragraph cannot contain a footnote.</paraInTable>
  ...
</table>

This approach leads to a flood of similar tag names: the tag names for common constructs such as paragraphs, footnotes, and itemized lists have to be duplicated for each context.

Instead of introducing more than one tag name for each construct, another approach creates a loose content model by merging the content models for different contexts. The following example illustrates this approach. Not only paragraphs in sections but also those in tables are allowed to contain footnotes.

<!ELEMENT para (#PCDATA|footnote)*>

This approach makes validation loose. The following example, which we intended to reject, is valid against the declaration above.

<!-- This example is illegal. -->
<table>
  ...
  <para>This paragraph cannot contain a footnote <footnote>This 
        is an illegal footnote</footnote>.</para>
  ...
</table>

The label attribute of elementRule elements

To let a single tag name have different content models in different contexts, RELAX introduces labels. A single tag name associated with different labels can have different hedge models.

An elementRule can have a label attribute, in the following form:

<elementRule role="name" label="label">
  ...content model...
</elementRule>

If the label attribute is omitted, the value of the role attribute is used. Thus, the following elementRules are equivalent.

<elementRule role="foo">
  ...content model...
</elementRule>
<elementRule role="foo" label="foo">
  ...content model...
</elementRule>

For paragraphs containing footnotes and paragraphs not containing footnotes, the following example uses different labels and thus different content models.

<elementRule role="para" label="paraWithFNotes">
  <mixed>
    <ref label="footnote" occurs="*"/>
  </mixed>
</elementRule>

<elementRule role="para" label="paraWithoutFNotes">
  <mixed>
    <empty/>
  </mixed>
</elementRule>

<tag name="para"/>

The first elementRule shows that paragraphs with the paraWithFNotes label contain text and footnotes. The second elementRule shows that paragraphs with the paraWithoutFNotes label contain text only.

In most cases, there is a one-to-one correspondence between labels and tag names. In fact, in all examples up to this STEP, each tag name has had only one associated label. To address the issues presented in the previous subsection, we have to associate more than one label with a single tag name.

The label attribute of ref elements

Next, we revisit the label attribute of ref elements. Values of this attribute are always labels. STEP 1 explained that values are element type names, but that explanation is a white lie. RELAX does not have element types. (To tell the truth, XML 1.0 does not define element types. Element type declarations are defined, but element types are never defined.)

Since paraWithFNotes and paraWithoutFNotes in the last example of the previous section are labels, they can be referenced by ref elements. Content models for sections reference paraWithFNotes, while those for tables (to be precise, table cells) reference paraWithoutFNotes.

<elementRule role="section">
  <ref label="paraWithFNotes" occurs="*"/>
</elementRule>

<elementRule role="cell">
  <ref label="paraWithoutFNotes" occurs="*"/>
</elementRule>
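To make this concrete, here is a minimal Python sketch (not the normative RELAX validation procedure) of how labels let one tag name carry two content models. The grammar table GRAMMAR and the function labels_of are hypothetical, and every content model is simplified to "text mixed with any number of the allowed child labels, in any order". Each element receives the set of labels whose rule holds for it, computed bottom-up from its children.

```python
import xml.etree.ElementTree as ET

# Hypothetical grammar table (not RELAX syntax): label -> (tag name, allowed child labels).
# Simplification: every content model means "text mixed with any number of the
# allowed child labels, in any order"; sequences and other occurs values are ignored.
GRAMMAR = {
    "paraWithFNotes":    ("para", {"footnote"}),
    "paraWithoutFNotes": ("para", set()),
    "footnote":          ("footnote", set()),
    "section":           ("section", {"paraWithFNotes"}),
    "cell":              ("cell", {"paraWithoutFNotes"}),
}


def labels_of(elem):
    """Return the set of labels that can be attached to elem, computed bottom-up."""
    child_label_sets = [labels_of(child) for child in elem]
    result = set()
    for label, (tag, allowed) in GRAMMAR.items():
        if elem.tag != tag:
            continue
        # The rule holds if every child can take at least one allowed label.
        if all(labels & allowed for labels in child_label_sets):
            result.add(label)
    return result


# A para with a footnote can only be paraWithFNotes, so it fits in a section...
print(labels_of(ET.fromstring('<section><para>a <footnote>b</footnote>.</para></section>')))  # {'section'}
# ...but not in a cell, so the cell receives no label (the document is invalid).
print(labels_of(ET.fromstring('<cell><para>a <footnote>b</footnote>.</para></cell>')))  # set()
```

A para containing text only can take both labels, so in this sketch it is accepted in a section as well as in a cell; only the footnote rules out paraWithoutFNotes.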

Sharing labels

Multiple hedgeRule elements sharing the same label

More than one hedgeRule can specify the same label for the label attribute. In the following example, there are two hedgeRules for the blockElem label.

<hedgeRule label="blockElem">
  <ref label="para"/>
</hedgeRule>

<hedgeRule label="blockElem">
  <ref label="itemizedList"/>
</hedgeRule>

The following elementRule references this blockElem label.

<elementRule role="doc">
  <sequence>
    <ref label="title"/>
    <hedgeRef label="blockElem" occurs="*"/>
  </sequence>
</elementRule>

When a document is validated against a RELAX grammar, hedgeRef elements are expanded first. We use this hedgeRef to demonstrate the expansion.

Both of the hedgeRules describing the blockElem label have ref elements as hedge models. By grouping them with a choice element, we have the following.

<choice>
  <ref label="para"/>
  <ref label="itemizedList"/>
</choice>

The hedgeRef we intend to expand specifies * as the occurs attribute. We copy this attribute to the choice element.

<choice occurs="*">
  <ref label="para"/>
  <ref label="itemizedList"/>
</choice>

Finally, we replace the hedgeRef with this choice element.

<elementRule role="doc">
  <sequence>
    <ref label="title"/>
    <choice occurs="*">
      <ref label="para"/>
      <ref label="itemizedList"/>
    </choice>
  </sequence>
</elementRule>

Let us summarize the procedure for expanding a hedgeRef element that references some label.

  1. Locate all hedgeRules for this label.
  2. Group hedge models of these hedgeRules with a choice element.
  3. Copy the occurs attribute of the hedgeRef to this choice element.
  4. Replace the hedgeRef with this choice element.
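The four steps above can be sketched in Python with the standard xml.etree.ElementTree module. This is an illustration under stated assumptions, not a complete RELAX processor: the function name expand_hedge_refs is hypothetical, namespaces are omitted, each hedgeRule is assumed to contain exactly one hedge model, and hedgeRefs nested inside the copied hedge models are not expanded again.

```python
import copy
import xml.etree.ElementTree as ET


def expand_hedge_refs(module):
    """Expand every hedgeRef in a RELAX module (an ElementTree Element) in place.

    Sketch assumptions: no XML namespaces, each hedgeRule contains exactly one
    hedge-model element, and the copied hedge models contain no further hedgeRefs.
    """
    # Step 1 (prepared once): collect the hedge models of all hedgeRules per label.
    models = {}
    for rule in module.iter("hedgeRule"):
        models.setdefault(rule.get("label"), []).append(rule[0])

    for parent in list(module.iter()):
        for i, child in enumerate(list(parent)):
            if child.tag != "hedgeRef":
                continue
            # Step 2: group the hedge models for this label with a choice element.
            choice = ET.Element("choice")
            for model in models[child.get("label")]:
                choice.append(copy.deepcopy(model))
            # Step 3: copy the occurs attribute of the hedgeRef, if present.
            if child.get("occurs") is not None:
                choice.set("occurs", child.get("occurs"))
            # Step 4: replace the hedgeRef with the choice element.
            choice.tail = child.tail
            parent[i] = choice
    return module


module = ET.fromstring("""
<module>
  <hedgeRule label="blockElem"><ref label="para"/></hedgeRule>
  <hedgeRule label="blockElem"><ref label="itemizedList"/></hedgeRule>
  <elementRule role="doc">
    <sequence>
      <ref label="title"/>
      <hedgeRef label="blockElem" occurs="*"/>
    </sequence>
  </elementRule>
</module>""")
expand_hedge_refs(module)
print(ET.tostring(module.find("elementRule"), encoding="unicode"))
```

Adding a third hedgeRule for numberedItemizedList to this module would simply add a third ref to the generated choice; neither the existing hedgeRules nor the elementRule for doc would need to change.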

Since multiple hedgeRules are allowed to share a label, we do not have to describe a label with a single hedgeRule. For example, if we would like to add numberedItemizedList as another kind of blockElem, we only have to add the following hedgeRule; we do not have to modify other hedgeRules.

<hedgeRule label="blockElem">
  <ref label="numberedItemizedList"/>
</hedgeRule>

Prohibition of label sharing by hedgeRule and elementRule

hedgeRule and elementRule are prohibited from sharing a label. The following example is a syntax error.

<hedgeRule label="foo">
  <ref label="bar"/>
</hedgeRule>

<elementRule role="foo" label="foo">
  <empty/>
</elementRule>
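A processor might detect this error by comparing the two sets of labels. The following Python sketch is a hypothetical check (the name shared_labels is illustrative, and namespaces are omitted); it also applies the rule that an elementRule without a label attribute uses the value of its role attribute as its label.

```python
import xml.etree.ElementTree as ET


def shared_labels(module):
    """Return labels used by both a hedgeRule and an elementRule.

    Such sharing is a syntax error in RELAX. Assumes no XML namespaces;
    an elementRule without a label attribute takes its role as its label.
    """
    hedge_labels = {rule.get("label") for rule in module.iter("hedgeRule")}
    element_labels = {rule.get("label", rule.get("role"))
                      for rule in module.iter("elementRule")}
    return hedge_labels & element_labels


bad = ET.fromstring("""
<module>
  <hedgeRule label="foo"><ref label="bar"/></hedgeRule>
  <elementRule role="foo" label="foo"><empty/></elementRule>
</module>""")
print(shared_labels(bad))  # {'foo'}
```

Omitting label="foo" from the elementRule would not help: the role value foo becomes the label, so the check still reports foo.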

Multiple elementRule elements sharing the same label

Multiple elementRules can specify the same label for the label attribute. Moreover, multiple elementRules may have the same value for the role attribute.

In the following example, two elementRules specify section as the value of the role attribute. Since neither specifies the label attribute, section is assumed as the value of this attribute.

<tag name="section"/>

<elementRule role="section">
  <ref label="para" occurs="*"/>
</elementRule>

<elementRule role="section">
  <choice occurs="*">
    <ref label="para"/>
    <ref label="fig"/>
  </choice>
</elementRule>

When multiple elementRules exist for a single label, an element can be given that label if at least one of them holds.

Consider the following section element. Both the first and the second elementRule hold for this element. Thus, we can attach the section label.

<section><para/></section>

The following section element contains a fig element, and thus only the second elementRule holds. Since one holding elementRule is sufficient, we can again attach the section label to this element.

<section><para/><fig/><para/></section>

Let us consider advantages of allowing more than one elementRule for a single label. Suppose that we already have a module and that we are going to modify this module so that more documents become legitimate.

In the traditional approach, we have to modify an existing elementRule, and there is no guarantee that what was legitimate remains legitimate after such a modification.

In RELAX, we do not have to revise existing elementRules; we only have to add more. With this approach, whatever was legitimate is guaranteed to remain legitimate.

In the previous example, the initial plan was to allow only paras as contents of section. The first elementRule was written for this purpose. Later, the second elementRule was added so as to allow fig as contents of section. Since the first elementRule is still active, none of the then-legitimate documents has become illegitimate.
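The rule that at least one elementRule must hold, and the guarantee that adding elementRules never invalidates a legitimate document, can be sketched in Python. In this hypothetical encoding (FIRST_RULE, SECOND_RULE, and label_applies are illustrative names), each elementRule's hedge model for the section label is written as a regular expression over child labels; text, attributes, and children that could take several labels are ignored.

```python
import re

# Hypothetical encoding: each elementRule for the section label is a regular
# expression over the child labels, each child label followed by a space.
FIRST_RULE = r"(para )*"           # <ref label="para" occurs="*"/>
SECOND_RULE = r"((para|fig) )*"    # <choice occurs="*"> para | fig </choice>


def label_applies(rules, child_labels):
    """True if at least one of the elementRules for a label holds."""
    text = "".join(label + " " for label in child_labels)
    return any(re.fullmatch(rule, text) is not None for rule in rules)


children = ["para", "fig", "para"]
print(label_applies([FIRST_RULE], children))               # False
print(label_applies([FIRST_RULE, SECOND_RULE], children))  # True
```

Because label_applies uses any, adding a rule to the list can only turn a False result into True, never the reverse; this is the property that keeps then-legitimate documents legitimate.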

Summary

If you have struggled to create large DTDs, you will probably find STEP 7 attractive. Weaknesses of DTDs can easily be addressed in RELAX. Enjoy and RELAX!