$Id: step7.sdoc 1.13 2000/08/26 08:37:11 murata Exp $
Until this step, we have covered those features of elementRule and hedgeRule which can be easily understood by DTD authors. Actually, RELAX has a much more generalized framework.
elementRule and labels
An elementRule element can have the label attribute. We first consider underlying requirements and then introduce this attribute.
We often would like to attach different content models to the same tag name, depending on the context. As an example, consider paragraphs in sections and those in tables. Permissible subordinates of these two types of paragraphs are slightly different; we might want to allow paragraphs in sections to contain footnotes, but might want to allow those in tables to contain text only.
<!-- This example is legal. -->
<section>
  <para>This paragraph can contain footnotes <footnote>This is a footnote</footnote>.</para>
</section>
<!-- This example is illegal. -->
<table>
  ...
  <para>This paragraph cannot contain a footnote <footnote>This is an illegal footnote</footnote>.</para>
  ...
</table>
Thus, we would like to switch content models (shown below) depending on whether paragraphs occur in sections or tables.
<!-- Case 1: subordinate to <section> elements. -->
<!ELEMENT para (#PCDATA|footnote)*>
<!-- Case 2: subordinate to <table> elements. -->
<!ELEMENT para (#PCDATA)>
A good motivation for context-sensitive content models can be found in HTML. In HTML, an a element may not occur as a direct or indirect subordinate of another a element. The same is true of the form element: a form element may not occur inside another form element.
<!-- This example is illegal. -->
<a href="foo"><span><a href="bar">dmy</a></span></a>
<!-- This example is also illegal. -->
<form>
  ...
  <div>
    <form> ... </form>
  </div>
  ...
</form>
In HTML, a elements can contain span elements. Since we would like to prohibit even indirect nesting, we have to allow those span elements outside of a elements to contain a elements, and disallow those inside a elements from containing a elements. The same thing applies to form: we have to allow those div elements outside of form elements to contain form elements, and disallow those inside form elements from containing form elements.
<!-- Case 1: subordinate to <a> elements. -->
<!ELEMENT span (#PCDATA)>
<!-- Case 2: not subordinate to <a> elements. -->
<!ELEMENT span (#PCDATA|a)*>
However, the DTD formalism allows only one content model per tag name. Thus, we cannot use different content models for paragraphs in different contexts. Likewise, span may have only one content model; we cannot switch content models depending on whether the span element appears in some a element. The same thing applies to div.
Historically, two approaches have been used to overcome this problem. One is to introduce different tag names for different contexts. The following example illustrates this approach. Paragraphs in sections have the tag name paraInSection, and those in tables have the tag name paraInTable.
<!ELEMENT paraInSection (#PCDATA|footnote)*>
<!ELEMENT paraInTable (#PCDATA)>
<!-- This example is legal. -->
<section>
  <paraInSection>This paragraph can contain footnotes <footnote>This is a footnote</footnote>.</paraInSection>
</section>
<table>
  ...
  <paraInTable>This paragraph cannot contain a footnote.</paraInTable>
  ...
</table>
This approach causes a flood of similar tag names: we have to duplicate tag name sets for common constructs such as paragraphs, footnotes, itemized lists, etc.
Instead of introducing more than one tag name for each construct, another approach creates a loose content model by merging the different content models for different contexts. The following example illustrates this approach. Not only paragraphs in sections but also those in tables are allowed to contain subordinate footnotes.
<!ELEMENT para (#PCDATA|footnote)*>
This approach causes loose validation. The following document, which ought to be illegal, is nevertheless valid against the above declaration.
<!-- This example is illegal. -->
<table>
  ...
  <para>This paragraph cannot contain a footnote <footnote>This is an illegal footnote</footnote>.</para>
  ...
</table>
label attribute of elementRule elements
For a single tag name to have different content models depending on contexts, RELAX introduces labels. A single tag name associated with different labels can have different hedge models.
An elementRule can have the label attribute. The general form of elementRule is as follows:
<elementRule role="name" label="label"> ...content model... </elementRule>
If the label attribute is omitted, the value of the role attribute is used. Thus, the following elementRules are equivalent.
<elementRule role="foo"> ...content model... </elementRule>
<elementRule role="foo" label="foo"> ...content model... </elementRule>
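The defaulting rule can be sketched in a few lines of Python. This is only an illustration, not part of RELAX itself: the function name effective_label, the label bar, and the wrapping module element (used here just to give the parser a single root) are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def effective_label(element_rule):
    """Return the label of an elementRule, falling back to its role."""
    return element_rule.get("label", element_rule.get("role"))


# Hypothetical fragment: the first two rules are the equivalent pair shown above.
MODULE = """
<module>
  <elementRule role="foo"><empty/></elementRule>
  <elementRule role="foo" label="foo"><empty/></elementRule>
  <elementRule role="foo" label="bar"><empty/></elementRule>
</module>
"""

labels = [effective_label(rule) for rule in ET.fromstring(MODULE).iter("elementRule")]
print(labels)  # ['foo', 'foo', 'bar']
```

The first two rules end up with the same label, while the third shows a label that differs from the role.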
For paragraphs containing footnotes and paragraphs not containing footnotes, the following example uses different labels and thus different content models.
<elementRule role="para" label="paraWithFNotes">
  <mixed>
    <ref label="footnote" occurs="*"/>
  </mixed>
</elementRule>
<elementRule role="para" label="paraWithoutFNotes">
  <mixed>
    <empty/>
  </mixed>
</elementRule>
<tag name="para"/>
The first elementRule shows that paragraphs of the paraWithFNotes label contain text and footnotes. The second elementRule shows that paragraphs of the paraWithoutFNotes label contain text only.
In most cases, there is a one-to-one correspondence between labels and tag names. In fact, in all examples up to this STEP, a tag name has only one associated label. To address the issues presented in the previous subsection, we have to associate more than one label with a single tag name.
label attribute of ref elements
Next, we revisit the label attribute of ref elements. Values of this attribute are always labels. STEP 1 explained that values are element type names, but that explanation is a white lie. RELAX does not have element types. (To tell the truth, XML 1.0 does not define element types. Element type declarations are defined, but element types are never defined.)
Since paraWithFNotes and paraWithoutFNotes in the last example in the previous section are labels, they can be referenced by ref elements. Content models for sections reference paraWithFNotes, while those for tables (to be precise, table cells) reference paraWithoutFNotes.
<elementRule role="section">
  <ref label="paraWithFNotes" occurs="*"/>
</elementRule>
<elementRule role="cell">
  <ref label="paraWithoutFNotes" occurs="*"/>
</elementRule>
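To see how labels make the content model of para depend on its context, the rules for section, cell, and the two kinds of para can be hand-translated into a small Python table and checked recursively. This is a rough sketch, not a RELAX processor: the RULES table, the helper names, and the content model assumed for footnote (text only) are illustrative assumptions, and only content models of the form "zero or more of these labels" are supported.

```python
import xml.etree.ElementTree as ET

# label -> (tag name, labels allowed as children any number of times, text allowed?)
# The "footnote" entry is an assumption; the tutorial does not give its content model.
RULES = {
    "section": ("section", {"paraWithFNotes"}, False),
    "cell": ("cell", {"paraWithoutFNotes"}, False),
    "paraWithFNotes": ("para", {"footnote"}, True),
    "paraWithoutFNotes": ("para", set(), True),
    "footnote": ("footnote", set(), True),
}


def has_text(elem):
    """True if elem has non-whitespace character data directly inside it."""
    chunks = [elem.text] + [child.tail for child in elem]
    return any(chunk and chunk.strip() for chunk in chunks)


def can_label(elem, label):
    """True if elem can be given this label under the simplified RULES table."""
    tag, child_labels, text_allowed = RULES[label]
    if elem.tag != tag:
        return False
    if has_text(elem) and not text_allowed:
        return False
    # Every child must be labelable with one of the labels this rule references.
    return all(any(can_label(child, cl) for cl in child_labels) for child in elem)


legal = ET.fromstring("<section><para>Text<footnote>A note</footnote>.</para></section>")
illegal = ET.fromstring("<cell><para>Text<footnote>A note</footnote>.</para></cell>")
print(can_label(legal, "section"), can_label(illegal, "cell"))  # True False
```

The same para element is accepted under section and rejected under cell, because the two parents reference different labels for the same tag name.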
hedgeRule elements sharing the same label
More than one hedgeRule can specify the same label for the label attribute. In the following example, there are two hedgeRules for the blockElem label.
<hedgeRule label="blockElem">
  <ref label="para"/>
</hedgeRule>
<hedgeRule label="blockElem">
  <ref label="itemizedList"/>
</hedgeRule>
The following elementRule references this blockElem label.
<elementRule role="doc">
  <sequence>
    <ref label="title"/>
    <hedgeRef label="blockElem" occurs="*"/>
  </sequence>
</elementRule>
During validation against a RELAX grammar, hedgeRef elements are expanded first. We use the hedgeRef above as an example to demonstrate such expansion.
Both of the hedgeRules describing the blockElem label have ref elements as hedge models. By grouping them with a choice element, we have the following.
<choice> <ref label="para"/> <ref label="itemizedList"/> </choice>
The hedgeRef we intend to expand specifies * as the occurs attribute. We copy this attribute to the choice element.
<choice occurs="*"> <ref label="para"/> <ref label="itemizedList"/> </choice>
Finally, we replace the hedgeRef with this choice element.
<elementRule role="doc">
  <sequence>
    <ref label="title"/>
    <choice occurs="*">
      <ref label="para"/>
      <ref label="itemizedList"/>
    </choice>
  </sequence>
</elementRule>
Let us summarize the procedure for expanding a hedgeRef element referencing some label.
1. Find all hedgeRules for this label.
2. Group the hedge models of these hedgeRules with a choice element.
3. Copy the occurs attribute of the hedgeRef to this choice element.
4. Replace the hedgeRef with this choice element.
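The expansion procedure can be sketched in Python with the standard xml.etree.ElementTree module. This is a minimal illustration under stated assumptions, not the behavior of any particular RELAX processor: the function name expand_hedge_refs and the wrapping module element are chosen for the example, namespaces are ignored, and hedge models are assumed not to contain hedgeRef elements themselves (the sketch makes a single pass).

```python
import copy
import xml.etree.ElementTree as ET


def expand_hedge_refs(module):
    """Replace each hedgeRef in place with a choice over the matching hedge models."""
    # Step 1: collect the hedge models of every hedgeRule, grouped by label.
    models = {}
    for rule in module.iter("hedgeRule"):
        models.setdefault(rule.get("label"), []).extend(list(rule))

    for parent in list(module.iter()):
        for index, child in enumerate(list(parent)):
            if child.tag != "hedgeRef":
                continue
            # Step 2: group copies of the hedge models with a choice element.
            choice = ET.Element("choice")
            choice.extend([copy.deepcopy(m) for m in models[child.get("label")]])
            # Step 3: copy the occurs attribute, if any, to the choice element.
            if child.get("occurs") is not None:
                choice.set("occurs", child.get("occurs"))
            choice.tail = child.tail
            # Step 4: replace the hedgeRef with the choice element.
            parent[index] = choice
    return module


MODULE = """
<module>
  <hedgeRule label="blockElem"><ref label="para"/></hedgeRule>
  <hedgeRule label="blockElem"><ref label="itemizedList"/></hedgeRule>
  <elementRule role="doc">
    <sequence>
      <ref label="title"/>
      <hedgeRef label="blockElem" occurs="*"/>
    </sequence>
  </elementRule>
</module>
"""

module = expand_hedge_refs(ET.fromstring(MODULE))
choice = module.find("elementRule/sequence/choice")
print(choice.get("occurs"), [ref.get("label") for ref in choice])
```

Run on the blockElem example, the sketch yields a choice element with occurs="*" containing the para and itemizedList references, which matches the expanded elementRule shown earlier.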
Since multiple hedgeRules are allowed to share a label, we are not forced to write a single hedgeRule. For example, if we would like to add numberedItemizedList as another sort of blockElem, we only have to add the following hedgeRule; we do not have to modify other hedgeRules.
<hedgeRule label="blockElem"> <ref label="numberedItemizedList"/> </hedgeRule>
hedgeRule and elementRule
hedgeRule and elementRule are prohibited from sharing a label. The following example is a syntax error.
<hedgeRule label="foo">
  <ref label="bar"/>
</hedgeRule>
<elementRule role="foo" label="foo">
  <empty/>
</elementRule>
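A check for this restriction might look like the following Python sketch. The function name shared_labels and the wrapping module element are assumptions made for illustration; the check also takes into account that an omitted label attribute defaults to the role.

```python
import xml.etree.ElementTree as ET


def shared_labels(module):
    """Return the labels used by both a hedgeRule and an elementRule."""
    hedge_labels = {rule.get("label") for rule in module.iter("hedgeRule")}
    # An elementRule without a label attribute takes its label from role.
    element_labels = {
        rule.get("label", rule.get("role")) for rule in module.iter("elementRule")
    }
    return hedge_labels & element_labels


MODULE = """
<module>
  <hedgeRule label="foo"><ref label="bar"/></hedgeRule>
  <elementRule role="foo" label="foo"><empty/></elementRule>
  <elementRule role="bar"><empty/></elementRule>
</module>
"""

conflicts = shared_labels(ET.fromstring(MODULE))
print(conflicts)  # {'foo'}
```

A non-empty result means the module contains the kind of error described above.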
elementRule elements sharing the same label
Multiple elementRules can specify the same label for the label attribute. Moreover, the role attribute of multiple elementRules may be identical.
In the following example, two elementRules specify section as the value of the role attribute. Since neither specifies the label attribute, section is assumed as the value of this attribute.
<tag name="section"/>
<elementRule role="section">
  <ref label="para" occurs="*"/>
</elementRule>
<elementRule role="section">
  <choice occurs="*">
    <ref label="para"/>
    <ref label="fig"/>
  </choice>
</elementRule>
When multiple elementRules exist for a single label, at least one of them is required to hold.
Consider the following section element. Both the first elementRule and the second elementRule hold for this element. Thus, we can attach the section label.
<section><para/></section>
The following section element contains a fig element, and thus only the second elementRule holds. Since one holding elementRule is sufficient, we can again attach the section label to this element.
<section><para/><fig/><para/></section>
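The "at least one elementRule holds" semantics can be illustrated with a small Python sketch. It is a simplification made for this example only: the two content models for section are hand-translated into regular expressions over the sequence of child tag names, and the para and fig children are not validated themselves. The names SECTION_RULES, holding_rules, and is_section are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hand-translated content models of the two elementRules for the section label.
# Each child is encoded as its tag name followed by a single space.
SECTION_RULES = [
    re.compile(r"(para )*"),        # first elementRule: para*
    re.compile(r"((para|fig) )*"),  # second elementRule: (para | fig)*
]


def holding_rules(section):
    """Return the 1-based numbers of the elementRules that hold for this element."""
    children = "".join(child.tag + " " for child in section)
    return [n for n, rule in enumerate(SECTION_RULES, start=1) if rule.fullmatch(children)]


def is_section(elem):
    """The section label can be attached if at least one elementRule holds."""
    return elem.tag == "section" and bool(holding_rules(elem))


first = ET.fromstring("<section><para/></section>")
second = ET.fromstring("<section><para/><fig/><para/></section>")
print(holding_rules(first), holding_rules(second))  # [1, 2] [2]
```

Both elements receive the section label, the first because both rules hold and the second because one holding rule is sufficient.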
Let us consider advantages of allowing more than one elementRule
for a single label. Suppose that we already have a module and that we are going to modify this module so that more documents become legitimate.
In the traditional approach, we have to modify an existing elementRule. We cannot guarantee that what was legitimate is still legitimate after such modification.
In RELAX, we do not have to revise existing elementRules; we only have to add more elementRules. In this approach, what was legitimate is guaranteed to remain legitimate.
In the previous example, the initial plan was to allow only paras as contents of section. The first elementRule was written for this purpose. Later, the second elementRule was added so as to allow fig as contents of section. Since the first elementRule is still active, none of the then-legitimate documents has become illegitimate.
If you have struggled to create large DTDs, STEP 7 will probably look attractive. Weak points of DTDs can be easily addressed in RELAX. Enjoy and RELAX!