First, we show a DTD. The number attribute of title elements should be an integer, but DTDs cannot represent this constraint.
<!ELEMENT doc (title, para*)>
<!ELEMENT para (#PCDATA | em)*>
<!ELEMENT title (#PCDATA | em)*>
<!ELEMENT em (#PCDATA)>
<!ATTLIST para
class NMTOKEN #IMPLIED
>
<!ATTLIST title
class NMTOKEN #IMPLIED
number CDATA #REQUIRED
>
|
Next, we show a RELAX module. The number attribute is specified as an integer.
<module
moduleVersion="1.2"
relaxCoreVersion="1.0"
targetNamespace=""
xmlns="http://www.xml.gr.jp/xmlns/relaxCore">
<interface>
<export label="doc"/>
</interface>
<elementRule role="doc">
<sequence>
<ref label="title"/>
<ref label="para" occurs="*"/>
</sequence>
</elementRule>
<elementRule role="para">
<mixed>
<ref label="em" occurs="*"/>
</mixed>
</elementRule>
<elementRule role="title">
<mixed>
<ref label="em" occurs="*"/>
</mixed>
</elementRule>
<elementRule role="em" type="string"/>
<tag name="doc"/>
<tag name="para">
<attribute name="class" type="NMTOKEN"/>
</tag>
<tag name="title">
<attribute name="class" type="NMTOKEN"/>
<attribute name="number" required="true" type="integer"/>
</tag>
<tag name="em"/>
</module>
|
The following sections explain the syntactic constructs that appear in this example.
A module is represented by a module element.
<module
moduleVersion="1.2"
relaxCoreVersion="1.0"
targetNamespace=""
xmlns="http://www.xml.gr.jp/xmlns/relaxCore">
...
</module>
|
The moduleVersion attribute shows the version of this module. In this example, it is "1.2".
The relaxCoreVersion attribute shows the version of RELAX Core. At present, it is always "1.0".
The targetNamespace attribute shows the namespace which this module is concerned with. In this example, it is "".
The namespace name for RELAX Core is "http://www.xml.gr.jp/xmlns/relaxCore".
A module element begins with an interface element. A module has one interface element.
<module
moduleVersion="1.2"
relaxCoreVersion="1.0"
targetNamespace=""
xmlns="http://www.xml.gr.jp/xmlns/relaxCore">
<interface>
...
</interface>
...
</module>
|
An interface element contains export elements.
<export label="foo"/> |
The label attribute of export elements specifies an element type that may become the root. More than one export may appear in an interface element.
The following example allows the element types foo and bar as the root.
<interface> <export label="foo"/> <export label="bar"/> </interface> |
Element type declarations (<!ELEMENT ...>) of XML are captured by elementRule elements. The role attribute of elementRule specifies an element type name. More than one elementRule may follow the interface element.
<elementRule role="element-type-name"> ...hedge model... </elementRule> |
An elementRule element has a hedge model. A hedge is a sequence of elements (and their descendants) as well as character data. A hedge model is a constraint on permissible hedges.
A hedge model is either an element hedge model, datatype reference, or mixed hedge model.
An element hedge model is constructed from the empty, none, ref, choice, and sequence elements and the occurs attribute. An element hedge model represents permissible sequences of child elements, possibly separated by whitespace characters.
empty represents the empty sequence. Consider an elementRule as below:
<elementRule role="foo"> <empty/> </elementRule> |
This elementRule implies that the content of a foo element is the empty sequence. A foo element can be a start tag followed by an end tag, or an empty-element tag.
<foo/> |
<foo></foo> |
Unlike EMPTY of XML, whitespace characters may intervene between start and end tags.
<foo> </foo> |
empty can be used within choice and sequence. The motivation behind this extension will become clear in STEP 2. If you need exactly the same feature as EMPTY of XML, use the emptyString datatype (shown in STEP 3).
From now on, we assume that foo, foo1, and foo2 are declared by elementRules whose hedge models are empty.
ref references an element type. For example, <ref label="foo"/> references the element type foo. Consider an elementRule as below:
<elementRule role="bar"> <ref label="foo"/> </elementRule> |
This elementRule implies that the content of a bar element is a foo element. For example, the next bar element is legitimate against this elementRule.
<bar><foo/></bar> |
Whitespace may appear before and after the foo element.
<bar> <foo/> </bar> |
ref can have the occurs attribute. Permissible values are "*", "+", and "?", which indicate "zero or more times", "one or more times", and "zero or one time", respectively.
An example of "?" as the occurs attribute is as below:
<elementRule role="bar"> <ref label="foo" occurs="?"/> </elementRule> |
This elementRule implies that the content of a bar element is either a foo or empty.
<bar><foo/></bar> |
<bar></bar> |
Whitespace characters may appear before and after the foo element. Even when this bar is empty, it may have whitespace characters.
<bar> <foo/> </bar> |
<bar> </bar> |
choice indicates a choice among the specified hedge models ("|" of XML 1.0). Subordinate elements of choice elements are element hedge models. choice can also have the occurs attribute. An elementRule containing choice is shown below:
<elementRule role="bar">
<choice occurs="+">
<ref label="foo1"/>
<ref label="foo2"/>
</choice>
</elementRule>
|
This elementRule indicates that the content of a bar element is one or more occurrences of either foo1 or foo2 elements.
<bar><foo2/></bar> |
<bar> <foo2/> </bar> |
<bar> <foo1/> <foo2/> <foo1/> </bar> |
sequence is a sequence of the specified hedge models ("," of XML 1.0). Subordinate elements of sequence are element hedge models. sequence can also have the occurs attribute. An elementRule containing sequence is shown below:
<elementRule role="bar">
<sequence occurs="?">
<ref label="foo1"/>
<ref label="foo2"/>
</sequence>
</elementRule>
|
This elementRule implies that the content of a bar element is either a sequence of a foo1 element and a foo2 element, or empty.
<bar><foo1/><foo2/></bar> |
<bar> <foo1/> <foo2/></bar> |
<bar/> |
<bar></bar> |
<bar> </bar> |
none is an element hedge model that does not match anything. none is unique to RELAX.
<elementRule role="bar"> <none/> </elementRule> |
This elementRule implies that nothing is permitted as the content of bar elements. The motivation behind none will become clear in STEP 2.
The type attribute of elementRule allows a content model that references a datatype. Character strings in a document are compared with the specified datatype. Permissible datatypes are the built-in datatypes of XML Schema Part 2 and datatypes unique to RELAX. Details of datatypes are covered in STEP 3. An elementRule containing type is shown below:
<elementRule role="bar" type="integer"/> |
This elementRule indicates that the content of a bar element is a character string representing an integer.
<bar>10</bar> |
Whitespace characters may not occur before or after the integer. For example, the following is not permitted.
<bar> 10 </bar> |
mixed significantly extends the mixed content models (#PCDATA|a|b|...|z)* of XML. A mixed element wraps an element hedge model. Recall that an element hedge model allows whitespace characters to intervene between elements. By wrapping it with mixed, any character is allowed to intervene. For example, the mixed content model (#PCDATA | foo1 | foo2)* of XML can be captured as below:
<elementRule role="bar">
<mixed>
<choice occurs="*">
<ref label="foo1"/>
<ref label="foo2"/>
</choice>
</mixed>
</elementRule>
|
The choice element in this mixed element matches zero or more occurrences of foo1 or foo2 elements. The mixed allows any character to intervene between these elements. Thus, this hedge model is equivalent to a (#PCDATA | foo1| foo2)* mixed content model of XML 1.0.
There are two ways to capture a (#PCDATA) content model. One is to reference the datatype string with the type attribute. The other is to make an element hedge model that matches the empty sequence only and wrap it with mixed.
<elementRule role="bar" type="string"/> |
<elementRule role="bar">
<mixed>
<empty/>
</mixed>
</elementRule>
|
As a more advanced example, consider elementRule as below:
<elementRule role="bar">
<mixed>
<sequence>
<ref label="foo1"/>
<ref label="foo2"/>
</sequence>
</mixed>
</elementRule>
|
A sequence of <foo1/> and <foo2/> matches the sequence in the mixed element, and mixed allows any characters to intervene. Thus, the following example is permitted by this elementRule.
<bar>Murata<foo1/>Makoto<foo2/>IUJ</bar> |
As shown in the following example, CDATA sections and character references may appear.
<bar><![CDATA[Murata]]><foo1/>Makoto<foo2/>IUJ</bar> |
Attribute-list declarations (<!ATTLIST ...>) of XML are captured by tag elements.
<tag name="element-type-name"> ...list of attribute declarations... </tag> |
tag can have attribute elements as subordinates.
<tag name="element-type-name"> <attribute ... /> <attribute ... /> </tag> |
attribute declares an attribute. An example of attribute is shown below:
<attribute name="age" required="true" type="integer"/> |
The value of the name attribute is the name of the declared attribute. In this example, it is age.
If the value of the required attribute is true, the attribute being declared is mandatory. If required is not specified, it is optional. Since required is specified in this example, the age attribute is mandatory.
The type attribute specifies a datatype name. If type is not specified, a datatype string (which allows any string) is assumed.
Consider an example of tag which contains this attribute element only.
<tag name="bar"> <attribute name="age" required="true" type="integer"/> </tag> |
The following start tag is permitted by this tag.
<bar age="39"> |
The following two start tags are not permitted. In the first example, the age attribute is omitted. In the second example, the value of age is not an integer.
<bar> |
<bar age="bu huo">
<!-- "bu huo" means forty years in Chinese. In Japan,
it is pronounced as "FUWAKU". -->
|
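To illustrate the two defaults described above, here is a minimal sketch of a declaration that omits both required and type. The element type name memo and the attribute name note are hypothetical and do not come from the module in STEP 1. Under the rules stated above, note would be optional and would accept any string.

```xml
<!-- Hypothetical declaration: neither required nor type is given,
     so note is optional and its datatype defaults to string. -->
<tag name="memo">
  <attribute name="note"/>
</tag>

<!-- Start tags that this declaration should permit: -->
<!--   <memo>                     (note omitted) -->
<!--   <memo note="any text 39">  (any string)   -->
```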
In DTD, you do not have to write an attribute-list declaration if an element type does not have any attributes. In RELAX, you must write an empty tag element even if there are no attributes. For example, if an element type bar does not have any attributes, you have to write a tag element as below:
<tag name="bar"/> |
hedgeRule allows you to write a hedge model once, name it, and reference it repeatedly. In other words, hedgeRule mimics parameter entities referenced from content models in DTD. The form of hedgeRule is shown below. foo is a name assigned to the hedge model of this hedgeRule.
<hedgeRule label="foo"> ...element content model... </hedgeRule> |
To reference such a hedgeRule, we write <hedgeRef label="foo"/>. This hedgeRef is replaced with the element hedge model specified in the hedgeRule.
In the following example, the hedge model of the elementRule for the element type doc references a hedgeRule. This elementRule is borrowed from the module at the beginning of STEP 1, and the part of the hedge model other than title is rewritten as a hedgeRule.
<hedgeRule label="doc.body">
<ref label="para" occurs="*"/>
</hedgeRule>
<elementRule role="doc">
<sequence>
<ref label="title"/>
<hedgeRef label="doc.body"/>
</sequence>
</elementRule>
|
The reference to doc.body is expanded as below:
<elementRule role="doc">
<sequence>
<ref label="title"/>
<ref label="para" occurs="*"/>
</sequence>
</elementRule>
|
In this example, a hedgeRule is referenced from an elementRule, but a hedgeRule may also reference another hedgeRule.
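As a sketch of one hedgeRule referencing another, the doc.body rule above could itself be written in terms of a second hedgeRule. The label doc.paras is a hypothetical name introduced only for this illustration.

```xml
<!-- doc.body delegates to doc.paras (hypothetical label). -->
<hedgeRule label="doc.body">
  <hedgeRef label="doc.paras"/>
</hedgeRule>

<hedgeRule label="doc.paras">
  <ref label="para" occurs="*"/>
</hedgeRule>

<!-- After both hedgeRefs are expanded, the hedge model for doc
     should again be equivalent to: title followed by zero or
     more para. -->
```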
hedgeRule can have element hedge models only. Datatype references and mixed hedge models are not permitted. For example, the following rules are not permitted.
<hedgeRule label="mixed.param">
<mixed>
<choice occurs="*">
<ref label="em"/>
<ref label="strong"/>
</choice>
</mixed>
</hedgeRule>
<hedgeRule label="string.param" type="string"/>
|
If you want to use hedgeRef in conjunction with a mixed hedge model, you have to surround the hedgeRef with mixed in an elementRule element, rather than using the mixed element inside a hedgeRule element. An example is shown below. The mixed hedge model references phrase, and phrase is described by a hedgeRule.
<hedgeRule label="phrase">
<choice>
<ref label="em"/>
<ref label="strong"/>
</choice>
</hedgeRule>
<elementRule role="p">
<mixed>
<hedgeRef label="phrase" occurs="*"/>
</mixed>
</elementRule>
|
A hedgeRef that references a hedgeRule can have occurs, and an element hedge model specified in a hedgeRule can also have occurs. In the following example, both have occurs.
<hedgeRule label="bar">
<sequence occurs="+" >
<ref label="foo1"/>
<ref label="foo2"/>
</sequence>
</hedgeRule>
<elementRule role="foo">
<hedgeRef label="bar" occurs="*"/>
</elementRule>
|
If this example is rewritten as a DTD, the expansion of the parameter entity bar is obvious.
<!ENTITY % bar "(foo1, foo2)+">
<!-- original -->
<!ELEMENT foo (%bar;)*>
<!-- expanded -->
<!ELEMENT foo ((foo1, foo2)+)*>
|
The following shows the expansion of the above RELAX example. Observe that a choice element containing a single child is introduced during expansion. This choice element inherits occurs="*" from the hedgeRef.
<elementRule role="foo">
<choice occurs="*">
<sequence occurs="+" >
<ref label="foo1"/>
<ref label="foo2"/>
</sequence>
</choice>
</elementRule>
|
A hedgeRule does not have to precede the hedgeRef elements that reference it. For example, the following is not an error.
<elementRule role="doc">
<sequence>
<ref label="title"/>
<hedgeRef label="doc.body"/>
</sequence>
</elementRule>
<hedgeRule label="doc.body">
<ref label="para" occurs="*"/>
</hedgeRule>
|
A hedgeRule may not reference itself directly or indirectly. The following is an error since the hedge model for bar references bar itself.
<hedgeRule label="bar">
<choice>
<ref label="title"/>
<hedgeRef label="bar" occurs="*"/>
</choice>
</hedgeRule>
|
In the following example, the hedge model for bar1 references to bar2 and the hedge model for bar2 references to bar1. Thus, there is an error.
<hedgeRule label="bar1">
<hedgeRef label="bar2" occurs="*"/>
</hedgeRule>
<hedgeRule label="bar2">
<choice>
<ref label="title"/>
<hedgeRef label="bar1"/>
</choice>
</hedgeRule>
|
empty, shown in STEP 1, is typically used in hedgeRule. An example is as below:
<hedgeRule label="frontMatter">
<empty/>
</hedgeRule>
<elementRule role="section">
<sequence>
<ref label="title"/>
<hedgeRef label="frontMatter"/>
<ref label="para" occurs="*"/>
</sequence>
</elementRule>
|
Users of this module can change the structure of section by customizing the description of frontMatter.
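As an illustration, one possible customization is sketched below. It assumes that the user replaces the description of frontMatter and that an element type abstract is declared elsewhere; abstract is a hypothetical name, not part of the module above.

```xml
<!-- Customized frontMatter: an optional abstract (hypothetical
     element type) instead of the empty sequence. -->
<hedgeRule label="frontMatter">
  <ref label="abstract" occurs="?"/>
</hedgeRule>

<!-- The elementRule for section is unchanged, but after expansion
     its content becomes: title, an optional abstract, then
     zero or more para. -->
```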
none, shown in STEP 1, is also used in hedgeRule. An example is as below:
<hedgeRule label="local-block-class">
<none/>
</hedgeRule>
<hedgeRule label="block-class">
<choice>
<ref label="para"/>
<ref label="fig"/>
<hedgeRef label="local-block-class"/>
</choice>
</hedgeRule>
|
Users of this module can change the structure of block-class by customizing the description of local-block-class.
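A similar sketch applies to none. Assuming that an element type table is declared elsewhere (table is a hypothetical name here), the user could replace the description of local-block-class as follows.

```xml
<!-- Customized local-block-class: table (hypothetical element type)
     instead of none. -->
<hedgeRule label="local-block-class">
  <ref label="table"/>
</hedgeRule>

<!-- block-class now matches para, fig, or table. With the original
     none, the third alternative never matched, so block-class
     matched para or fig only. -->
```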
attPool allows you to declare attributes once and reference the declarations repeatedly. In other words, attPool mimics parameter entities referenced from attribute-list declarations. The form of attPool is shown below. foo is the name assigned to this group of attribute declarations, much like the name of a parameter entity.
<attPool role="foo"> ...attribute definitions... </attPool> |
To reference such an attPool, we write <ref role="foo"/> before attribute declarations. This ref is replaced with the attribute declarations specified in the attPool.
In the following example, a tag for the element type title references an attPool. This tag is borrowed from the module at the beginning of STEP 1 and rewritten. The class attribute, which is common to many element types, is described by an attPool named common.att.
<attPool role="common.att">
<attribute name="class" type="NMTOKEN"/>
</attPool>
<tag name="title">
<ref role="common.att"/>
<attribute name="number" required="true" type="integer"/>
</tag>
|
This ref is expanded as below:
<tag name="title">
<attribute name="class" type="NMTOKEN"/>
<attribute name="number" required="true" type="integer"/>
</tag>
|
In this example, attPool is referenced from tag, but it can also be referenced from attPool.
An attPool does not have to precede the ref elements that reference it. For example, the following is not an error.
<tag name="title">
<ref role="common.att"/>
<attribute name="number" required="true" type="integer"/>
</tag>
<attPool role="common.att">
<attribute name="role" type="NMTOKEN"/>
</attPool>
|
A tag or attPool may contain more than one ref element. In the following example, an attPool element contains more than one ref element. Required attributes are grouped as common-req.att and optional attributes are grouped as common-opt.att. These two are referenced from the attPool element for common.att.
<attPool role="common.att">
<ref role="common-req.att"/>
<ref role="common-opt.att"/>
</attPool>
<attPool role="common-req.att">
<attribute name="role" type="NMTOKEN" required="true"/>
</attPool>
<attPool role="common-opt.att">
<attribute name="id" type="NMTOKEN"/>
</attPool>
|
As with hedgeRule, a direct or indirect reference to itself is an error. For example, the following is an error.
<attPool role="bar1">
<ref role="bar2"/>
<attribute name="id" type="NMTOKEN"/>
</attPool>
<attPool role="bar2">
<ref role="bar1"/>
</attPool>
|
RELAX has two datatypes of its own: none and emptyString.
none is an empty datatype. No character strings belong to this datatype. RELAX uses none to prohibit attributes. In the following example, the class attribute is prohibited.
<tag name="p"> <attribute name="class" type="none"/> </tag> |
Thus, the following start tag is not permitted.
<p class="foo"> |
emptyString is a datatype that allows the empty string only. This datatype is compatible with EMPTY of DTD.
<elementRule role="em" type="emptyString"/> |
This elementRule allows the following two elements only. Whitespace characters may not occur between <em> and </em>.
<em/> |
<em></em> |
Datatypes can be further constrained. For example, we may reference integer and further specify a constraint "18 thru 65". The syntax for such additional constraints is the same as in XML Schema Part 2.
To add constraints to an elementRule, attach child elements to the elementRule. In the following example, the hedge model for age is a reference to integer. minInclusive and maxInclusive represent constraints on minimum and maximum values, respectively. Thus, permissible contents of age elements are character strings representing integers from 18 to 65.
<elementRule role="age" type="integer"> <minInclusive value="18"/> <maxInclusive value="65"/> </elementRule> |
An age element can contain the string "20" as its content.
<age>20</age> |
But string "11" is not allowed.
<age>11</age> |
To add constraints to an attribute, attach child elements to the attribute element. In the following example, the sex attribute of employee is constrained to be either man or woman. Here, each enumeration element specifies one permissible value.
<tag name="employee">
<attribute name="sex" type="NMTOKEN">
<enumeration value="man"/>
<enumeration value="woman"/>
</attribute>
</tag>
|
The sex attribute can have the string "man".
<employee sex="man"/> |
But it cannot contain the string "foo".
<employee sex="foo"/> |
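The range constraints shown earlier for elementRule can be attached to an attribute in the same way. The following is a sketch under that reading; the age attribute on employee is a hypothetical addition to the example.

```xml
<tag name="employee">
  <!-- Hypothetical attribute: an integer from 18 to 65. -->
  <attribute name="age" type="integer">
    <minInclusive value="18"/>
    <maxInclusive value="65"/>
  </attribute>
</tag>

<!-- Expected to be permitted:     <employee age="20"/> -->
<!-- Expected not to be permitted: <employee age="11"/> -->
```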
Annotations are represented by the annotation element. annotation may occur in the following places.
interface element
export element
elementRule
hedgeRule
tag
attPool
attribute
include
element
div
The elementRule element shown below has an annotation as its first child. The content of this annotation is omitted.
<elementRule role="para">
<annotation> ... </annotation>
<mixed>
<ref label="fnote" occurs="*"/>
</mixed>
</elementRule>
|
Child elements of an annotation element are documentation elements and appinfo elements.
documentation is an element for representing explanations in natural languages. Since RELAX Namespace is not available yet, documentation may contain text data only. The following shows a documentation element added to the above example.
<elementRule role="para">
<annotation>
<documentation>This is a paragraph.</documentation>
</annotation>
<mixed>
<ref label="fnote" occurs="*"/>
</mixed>
</elementRule>
|
If a documentation element has the source attribute, the attribute value is a URI that references an explanation. In this case, the content of documentation is not used. Browsers for modules typically use this URI to provide a link.
<elementRule role="para">
<annotation>
<documentation source="http://www.xml.gr.jp/relax/"/>
</annotation>
<mixed>
<ref label="fnote" occurs="*"/>
</mixed>
</elementRule>
|
If a documentation element has the xml:lang attribute, the attribute value announces the natural language in which the content of the documentation is written.
In the next example, "en" is specified as the value of xml:lang.
<elementRule role="para">
<annotation>
<documentation xml:lang="en">This is a paragraph.</documentation>
</annotation>
<mixed>
<ref label="fnote" occurs="*"/>
</mixed>
</elementRule>
|
appinfo is an element for providing hidden information to programs that process modules. Since RELAX Namespace is not available yet, appinfo may contain text data only.
<elementRule role="foo" type="integer"> <annotation><appinfo>default:1</appinfo></annotation> </elementRule> |
If an appinfo element has the source attribute, the attribute value is a URI that references hidden information. In this case, the content of appinfo is not used.
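A sketch of the source form, modeled on the documentation example above. The URI is a placeholder chosen for illustration and does not point to a real resource.

```xml
<elementRule role="foo" type="integer">
  <annotation>
    <!-- The URI below is a placeholder, not a real resource. -->
    <appinfo source="http://www.example.com/foo-defaults"/>
  </annotation>
</elementRule>
```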
For large modules, it is convenient to annotate a group of elementRules, hedgeRules, tags, and attPools. The div element allows such an annotated group.
div elements may occur in module elements as siblings of elementRules, hedgeRules, tags, and attPools. div elements may further contain div elements. A div element may contain elementRules, hedgeRules, tags, attPools, and divs.
The following module is the one shown in STEP 1, rewritten with div elements.
<module
moduleVersion="1.2"
relaxCoreVersion="1.0"
targetNamespace=""
xmlns="http://www.xml.gr.jp/xmlns/relaxCore">
<interface>
<export label="doc"/>
</interface>
<div>
<annotation>
<documentation>The root node</documentation>
</annotation>
<elementRule role="doc">
<sequence>
<ref label="title"/>
<ref label="para" occurs="*"/>
</sequence>
</elementRule>
<tag name="doc"/>
</div>
<div>
<annotation>
<documentation>Paragraphs</documentation>
</annotation>
<elementRule role="para">
<mixed>
<ref label="em" occurs="*"/>
</mixed>
</elementRule>
<tag name="para">
<attribute name="class" type="NMTOKEN"/>
</tag>
</div>
<elementRule role="title">
<mixed>
<ref label="em" occurs="*"/>
</mixed>
</elementRule>
<tag name="title">
<attribute name="class" type="NMTOKEN"/>
<attribute name="number" required="true" type="integer"/>
</tag>
<elementRule role="em" type="string"/>
<tag name="em"/>
</module>
|
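The module above uses div elements at one level only. Since div elements may contain further div elements, the grouping could also be nested. The following fragment is a sketch of that; the grouping and the documentation texts are illustrative choices, not taken from the original module.

```xml
<div>
  <annotation>
    <documentation>Block-level constructs</documentation>
  </annotation>
  <!-- Inner div: everything about paragraphs. -->
  <div>
    <annotation>
      <documentation>Paragraphs</documentation>
    </annotation>
    <elementRule role="para">
      <mixed>
        <ref label="em" occurs="*"/>
      </mixed>
    </elementRule>
    <tag name="para">
      <attribute name="class" type="NMTOKEN"/>
    </tag>
  </div>
  <!-- Inner div: everything about titles. -->
  <div>
    <annotation>
      <documentation>Titles</documentation>
    </annotation>
    <elementRule role="title">
      <mixed>
        <ref label="em" occurs="*"/>
      </mixed>
    </elementRule>
    <tag name="title">
      <attribute name="class" type="NMTOKEN"/>
      <attribute name="number" required="true" type="integer"/>
    </tag>
  </div>
</div>
```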
Suppose that a module describes 200 element types, each of which requires an elementRule and a tag. If each elementRule and tag requires three lines, the total is 1200 lines. If we write extensive documentation, the total may become 3000 lines or even more. This size is too large to put in a single file.
To divide a module into several files, RELAX provides the include element. The include element is replaced with the content of the referenced module.
The following is an example of include. First, a module to be included is as below:
<module
moduleVersion="1.2"
relaxCoreVersion="1.0"
targetNamespace=""
xmlns="http://www.xml.gr.jp/xmlns/relaxCore">
<interface/>
<elementRule role="bar" type="emptyString"/>
<tag name="bar"/>
</module>
|
This module contains an elementRule and tag for the element type bar. The interface element is empty. Suppose that this module is stored in bar.rlx.
Next, a module which references and includes this module is as below:
<module
moduleVersion="1.2"
relaxCoreVersion="1.0"
targetNamespace=""
xmlns="http://www.xml.gr.jp/xmlns/relaxCore">
<interface>
<export label="foo"/>
</interface>
<elementRule role="foo">
<ref label="bar"/>
</elementRule>
<tag name="foo"/>
<include moduleLocation="bar.rlx" />
</module>
|
This module contains an elementRule and tag for the element type foo. The include at the end of this module references bar.rlx via the moduleLocation attribute.
The include element is replaced by the body of the referenced module, which is the content of the module element except the interface element. In this example, replacement is done as below:
<module
moduleVersion="1.2"
relaxCoreVersion="1.0"
targetNamespace=""
xmlns="http://www.xml.gr.jp/xmlns/relaxCore">
<interface>
<export label="foo"/>
</interface>
<elementRule role="foo">
<ref label="bar"/>
</elementRule>
<tag name="foo"/>
<elementRule role="bar" type="emptyString"/>
<tag name="bar"/>
</module>
|
In the previous example, the interface element of the referenced module is empty. Suppose instead that an export element is supplied in the interface element.
<module
moduleVersion="1.2"
relaxCoreVersion="1.0"
targetNamespace=""
xmlns="http://www.xml.gr.jp/xmlns/relaxCore">
<interface>
<export label="bar"/>
</interface>
<elementRule role="bar" type="emptyString"/>
<tag name="bar"/>
</module>
|
In this case, the children of the interface element in the referenced module are attached to the interface element in the referencing module. In this example, the result of replacement is as below:
<module
moduleVersion="1.2"
relaxCoreVersion="1.0"
targetNamespace=""
xmlns="http://www.xml.gr.jp/xmlns/relaxCore">
<interface>
<export label="foo"/>
<export label="bar"/>
</interface>
<elementRule role="foo">
<ref label="bar"/>
</elementRule>
<tag name="foo"/>
<elementRule role="bar" type="emptyString"/>
<tag name="bar"/>
</module>
|
RELAX does not have a default attribute which provides the default value of an attribute. Existing XML processors do not examine RELAX modules when they parse XML documents. Thus, they would not use default. The same thing applies to entities and notations: even if RELAX had constructs for declaring entities and notations, existing XML processors would not use them.
Instead, default values can be described in a DTD. In the following document, the internal DTD subset declares a default value for the bloodType attribute.
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE person [
<!ATTLIST person bloodType CDATA "A">
]>
<person/>
|
This document is verified against a RELAX module as below:
<module
moduleVersion="1.0"
relaxCoreVersion="1.0"
xmlns="http://www.xml.gr.jp/xmlns/relaxCore">
<interface>
<export label="person"/>
</interface>
<elementRule role="person">
<empty/>
</elementRule>
<tag name="person">
<attribute name="bloodType">
<enumeration value="O"/>
<enumeration value="A"/>
<enumeration value="B"/>
<enumeration value="AB"/>
</attribute>
</tag>
</module>
|
In this example, the DTD specifies the default value "A". XML processors do use this default, so we can verify this XML document against the RELAX module without any problems. Verification is done as if "A" were specified as the attribute value.
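In other words, after the XML processor has applied the default from the DTD, the verifier sees the person element as if it had been written as follows. This is only an illustration of the effective document, not something the author has to write.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Effective document after the default "A" has been supplied. -->
<person bloodType="A"/>
```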
Similarly, entities and notations can be described in DTD. First, we show an example of parsed entities.
<?xml version="1.0"?>
<!DOCTYPE doc [
<!ENTITY foo "This is a pen">
]>
<doc>
<para>&foo;</para>
</doc>
|
This document is legitimate against the RELAX module as below:
<module
moduleVersion="1.0"
relaxCoreVersion="1.0"
xmlns="http://www.xml.gr.jp/xmlns/relaxCore">
<interface>
<export label="doc"/>
</interface>
<elementRule role="doc">
<ref label="para" occurs="*"/>
</elementRule>
<elementRule role="para" type="string"/>
<tag name="doc"/>
<tag name="para"/>
</module>
|
Next, we show an example of unparsed entities and notations.
<?xml version="1.0"?>
<!DOCTYPE doc [
<!NOTATION eps PUBLIC "-//ISBN 0-7923-9432-1::Graphic Notation//NOTATION Adobe Systems Encapsulated Postscript//EN">
<!ENTITY logo_eps SYSTEM "logo.eps" NDATA eps>
<!ELEMENT doc EMPTY>
<!ATTLIST doc logo ENTITY #IMPLIED>
]>
<doc logo="logo_eps"/>
|
This document is legitimate against the following RELAX module.
<module
moduleVersion="1.0"
relaxCoreVersion="1.0"
xmlns="http://www.xml.gr.jp/xmlns/relaxCore">
<interface>
<export label="doc"/>
</interface>
<elementRule role="doc" type="emptyString"/>
<tag name="doc">
<attribute name="logo" type="ENTITY"/>
</tag>
</module>
|
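So far, we have covered the part of elementRule and hedgeRule which can be easily understood by DTD authors. Actually, RELAX has a much more generalized framework.
An elementRule element can have the label attribute. We first consider underlying requirements and then introduce this attribute.
Suppose that paragraphs in sections may contain footnotes, while paragraphs in tables may not.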
<!-- This example is legal. -->
<section>
<para>This paragraph can contain footnotes <footnote>This
is a footnote</footnote>.</para>
</section>
|
<!-- This example is illegal. -->
<table>
...
<para>This paragraph cannot contain a footnote <footnote>This
is an illegal footnote</footnote>.</para>
...
</table>
|
Thus, we would like to switch content models (shown below) depending on whether paragraphs occur in sections or tables.
<!-- Case 1: subordinate to <section> elements. -->
<!ELEMENT para (#PCDATA|footnote)*>
<!-- Case 2: subordinate to <table> elements. -->
<!ELEMENT para (#PCDATA)>
|
A good motivation for context-sensitive content models can be found in HTML. In HTML, an a element may not occur as a direct or indirect subordinate of another a element. The same situation is true of the form element, as well; a form element may not occur inside another form element.
<!-- This example is illegal. --> <a href="foo"><span><a href="bar">dmy</a></span></a> |
<!-- This example is also illegal. -->
<form>
...
<div>
<form>
...
</form>
</div>
...
</form>
|
In HTML, a elements can contain span elements. Since we would like to prohibit even indirect nesting, we have to allow those span elements outside of a elements to contain a elements and disallow those inside a elements from containing a elements. The same thing applies to form; we have to allow those div elements outside of form elements to contain form elements and disallow those inside form elements from containing form elements.
<!-- Case 1: not subordinate to <a> elements. -->
<!ELEMENT span (#PCDATA|a)*>
<!-- Case 2: subordinate to <a> elements. -->
<!ELEMENT span (#PCDATA)>
|
However, the DTD formalism allows only one content model per tag name. Thus, we cannot use different content models for paragraphs in different contexts. Likewise, span may have only one content model; we cannot switch content models depending on whether the span element appears in some a element. The same thing applies to div.
Historically, two approaches have been used to overcome this problem. One is to introduce different tag names for different contexts. The following example illustrates this approach. Paragraphs in sections have the tag name paraInSection, and those in tables have the tag name paraInTable.
<!ELEMENT paraInSection (#PCDATA|footnote)*> <!ELEMENT paraInTable (#PCDATA)> |
<!-- This example is legal. -->
<section>
<paraInSection>This paragraph can contain footnotes <footnote>This
is a footnote</footnote>.</paraInSection>
</section>
|
<table> ... <paraInTable>This paragraph cannot contain a footnote.</paraInTable> ... </table> |
This approach causes a flood of similar tag names: we have to duplicate tag name sets for common constructs such as paragraphs, footnotes, itemized lists, etc.
Instead of introducing more than one tag name for each construct, another approach creates a loose content model by merging different content models for different contexts. The following example illustrates this approach. Not only paragraphs in sections but those in tables are allowed to contain subordinate footnotes.
<!ELEMENT para (#PCDATA|footnote)*> |
This approach causes loose validation. The following example validates against the above content model even though it should be illegal.
<!-- This example is illegal. -->
<table>
...
<para>This paragraph cannot contain a footnote <footnote>This
is an illegal footnote</footnote>.</para>
...
</table>
|
elementRule can have the label attribute. A form of elementRule is as below:
<elementRule role="name" label="label"> ...content model... </elementRule> |
If the label attribute is omitted, the value of the role attribute is used. Thus, the following elementRules are equivalent.
<elementRule role="foo"> ...content model... </elementRule> |
<elementRule role="foo" label="foo"> ...content model... </elementRule> |
For paragraphs containing footnotes and paragraphs not containing footnotes, the following example uses different labels and thus different content models.
<elementRule role="para" label="paraWithFNotes">
<mixed>
<ref label="footnote" occurs="*"/>
</mixed>
</elementRule>
<elementRule role="para" label="paraWithoutFNotes">
<mixed>
<empty/>
</mixed>
</elementRule>
<tag name="para"/>
|
The first elementRule shows that paragraphs of the paraWithFNotes label contain text and footnotes. The second elementRule shows that paragraphs of the paraWithoutFNotes label contain text only.
In most cases, there is one to one correspondence between labels and tag names. In fact, in all examples until this STEP, a tag name has only one associated label. To address issues presented in the previous subsection, we have to associate more than one label with a single tag name.
Consider the label attribute of ref elements. Values of this attribute are always labels. STEP 1 explained that values are element type names, but that explanation is a white lie. RELAX does not have element types. (To tell the truth, XML 1.0 does not define element types. Element type declarations are defined, but element types are never defined.)
Since paraWithFNotes and paraWithoutFNotes in the last example in the previous section are labels, they can be referenced by ref elements. Content models for sections reference paraWithFNotes, while those for tables (to be precise, table cells) reference paraWithoutFNotes.
<elementRule role="section">
<ref label="paraWithFNotes" occurs="*"/>
</elementRule>
<elementRule role="cell">
<ref label="paraWithoutFNotes" occurs="*"/>
</elementRule>
|
More than one hedgeRule can specify the same label for the label attribute. In the following example, there are two hedgeRules for the blockElem label.
<hedgeRule label="blockElem">
<ref label="para"/>
</hedgeRule>
<hedgeRule label="blockElem">
<ref label="itemizedList"/>
</hedgeRule>
|
The following elementRule references this blockElem label.
<elementRule role="doc">
<sequence>
<ref label="title"/>
<hedgeRef label="blockElem" occurs="*"/>
</sequence>
</elementRule>
|
During validation against a RELAX grammar, hedgeRef elements are expanded first. We use this hedgeRef as an example to demonstrate such expansion.
Both of the hedgeRules describing the blockElem label have ref elements as hedge models. By grouping them with a choice element, we have the following.
<choice> <ref label="para"/> <ref label="itemizedList"/> </choice> |
The hedgeRef we intend to expand specifies * as the occurs attribute. We copy this attribute to the choice element.
<choice occurs="*"> <ref label="para"/> <ref label="itemizedList"/> </choice> |
Finally, we replace the hedgeRef with this choice element.
<elementRule role="doc">
<sequence>
<ref label="title"/>
<choice occurs="*">
<ref label="para"/>
<ref label="itemizedList"/>
</choice>
</sequence>
</elementRule>
|
Let us summarize the procedure for expanding a hedgeRef element that references some label.
1. Find all hedgeRules for this label.
2. Group the hedge models of these hedgeRules with a choice element.
3. Copy the occurs attribute of the hedgeRef to this choice element.
4. Replace the hedgeRef with this choice element.
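The four steps above can be sketched in a few lines of Python. This is only an illustration, not part of RELAX or of any RELAX processor: the function name expand_hedge_refs is hypothetical, the RELAX Core namespace is omitted for brevity, and the sketch assumes that no hedgeRule itself contains a hedgeRef.

```python
import copy
import xml.etree.ElementTree as ET

# Assumption: namespace declarations are left out to keep the sketch short.
MODULE = """
<module>
  <hedgeRule label="blockElem"><ref label="para"/></hedgeRule>
  <hedgeRule label="blockElem"><ref label="itemizedList"/></hedgeRule>
  <elementRule role="doc">
    <sequence>
      <ref label="title"/>
      <hedgeRef label="blockElem" occurs="*"/>
    </sequence>
  </elementRule>
</module>
"""


def expand_hedge_refs(module):
    """Replace every hedgeRef with a choice over the matching hedge models."""
    # Step 1: find all hedgeRules, grouping their hedge models by label.
    models = {}
    for rule in module.iter("hedgeRule"):
        models.setdefault(rule.get("label"), []).extend(list(rule))
    # Visit each parent so that a hedgeRef child can be replaced in place.
    for parent in list(module.iter()):
        for index, child in enumerate(list(parent)):
            if child.tag != "hedgeRef":
                continue
            # Step 2: group the hedge models with a choice element.
            choice = ET.Element("choice")
            choice.extend([copy.deepcopy(m) for m in models[child.get("label")]])
            # Step 3: copy the occurs attribute, if present.
            if child.get("occurs") is not None:
                choice.set("occurs", child.get("occurs"))
            # Step 4: replace the hedgeRef with the choice element.
            choice.tail = child.tail
            parent[index] = choice
    return module


module = expand_hedge_refs(ET.fromstring(MODULE))
sequence = module.find("elementRule/sequence")
print(ET.tostring(sequence, encoding="unicode"))
```

Running the sketch on the doc example prints a sequence whose second child is a choice with occurs="*" containing the two ref elements, as in the hand expansion shown earlier.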
Since multiple hedgeRules are allowed to share a label, we are not forced to write a single hedgeRule. For example, if we would like to add numberedItemizedList as another sort of blockElem, we only have to add the following hedgeRule; we do not have to modify other hedgeRules.
<hedgeRule label="blockElem"> <ref label="numberedItemizedList"/> </hedgeRule> |
A hedgeRule and an elementRule are prohibited from sharing a label. The following example is a syntax error.
<hedgeRule label="foo">
<ref label="bar"/>
</hedgeRule>
<elementRule role="foo" label="foo">
<empty/>
</elementRule>
|
Multiple elementRules can specify the same label for the label attribute. Moreover, the role attribute of multiple elementRules may be identical.
In the following example, two elementRules specify section as the value of the role attribute. Since neither specifies the label attribute, section is assumed as the value of this attribute.
<tag name="section"/>
<elementRule role="section">
<ref label="para" occurs="*"/>
</elementRule>
<elementRule role="section">
<choice occurs="*">
<ref label="para"/>
<ref label="fig"/>
</choice>
</elementRule>
|
When multiple elementRules exist for a single label, at least one of them is required to hold.
Consider the following section element. Both the first and the second elementRule hold for this element. Thus, we can attach the section label.
<section><para/></section> |
The following section element contains a fig element, and thus only the second elementRule holds. Since one holding elementRule is sufficient, we can again attach the section label to this element.
<section><para/><fig/><para/></section> |
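The "at least one elementRule must hold" rule can be illustrated with a small Python sketch. It rests on simplifying assumptions that are not part of RELAX: each content model is approximated by a regular expression over child tag names, child labels are taken to coincide with tag names, and the names SECTION_RULES, holding_rules, and has_label are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Each elementRule for the label "section" is approximated by a regular
# expression over the child tag names, each name followed by one space.
SECTION_RULES = [
    re.compile(r"(para )*"),       # first elementRule: para*
    re.compile(r"(para |fig )*"),  # second elementRule: (para | fig)*
]


def holding_rules(element, rules):
    """Return the indexes of the rules whose content model matches."""
    children = "".join(child.tag + " " for child in element)
    return [i for i, rule in enumerate(rules) if rule.fullmatch(children)]


def has_label(element, rules):
    """A label can be attached when at least one elementRule holds."""
    return bool(holding_rules(element, rules))


only_paras = ET.fromstring("<section><para/></section>")
with_fig = ET.fromstring("<section><para/><fig/><para/></section>")
with_table = ET.fromstring("<section><table/></section>")

print(holding_rules(only_paras, SECTION_RULES))  # both rules hold
print(holding_rules(with_fig, SECTION_RULES))    # only the second rule holds
print(has_label(with_table, SECTION_RULES))      # no rule holds
```

Adding a third pattern to SECTION_RULES can only enlarge the set of accepted section elements, which is the point of allowing several elementRules per label.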
Let us consider advantages of allowing more than one elementRule for a single label. Suppose that we already have a module and that we are going to modify this module so that more documents become legitimate.
In the traditional approach, we have to modify an existing elementRule. We cannot guarantee that what was legitimate is still legitimate after such modification.
In RELAX, we do not have to revise existing elementRules, but we only have to add more elementRules. In this approach, what was legitimate is guaranteed to be legitimate.
In the previous example, the initial plan was to allow only paras as contents of section. The first elementRule was written for this purpose. Later, the second elementRule was added so as to allow fig as contents of section. Since the first elementRule is still active, none of the then-legitimate documents has become illegitimate.
Earlier, tag was compared to an attribute-list declaration and attPool was compared to parameter entities describing attributes. Actually, RELAX has a much more generalized framework.
In addition to the name attribute, tag elements can have the role attribute. In this section, we first consider motivations for this extension, and then introduce this attribute.
Suppose that we want to switch the content model of the val element depending on the type attribute. If the attribute value is integer, the content model is a reference to the datatype integer. If it is string, the content model is a reference to the datatype string.
<!-- This is legal. -->
<val type="integer">10</val>
<!-- This is also legal. -->
<val type="string">foo bar</val>
<!-- This is illegal. -->
<val type="integer">foo bar</val>
|
Thus, we would like to switch content models (shown below) depending on whether the attribute value is integer or string.
<!-- Case 1: type="integer" -->
<elementRule role="val" type="integer"/>
<!-- Case 2: type="string" -->
<elementRule role="val" type="string"/>
|
However, as long as we use features covered in STEPs 0 thru 7, we have to attach content models to tag names. Attribute values are not taken into consideration. Thus, no matter what the value of the type attribute is, the same elementRule is used.
In addition to the name attribute, tag elements can have the role attribute. tag elements take the following form. While the name attribute specifies tag names, the role attribute specifies roles.
<tag name="tag-name" role="role-name"> ... </tag> |
A tag element attaches a role to a collection of constraints on tag names and attributes. When a start tag (or empty-element tag) satisfies these constraints, this tag plays the specified role.
For example, consider a tag element as below:
<tag name="val" role="val-integer">
<attribute name="type" type="NMTOKEN" required="true">
<enumeration value="integer"/>
</attribute>
</tag>
|
This tag element specifies that the tag name be val and the type attribute have the value integer. If a start tag (or empty-element tag) satisfies this constraint, this tag plays the val-integer role.
<val type="integer"> |
In the following tag element, the constraint on the type attribute is that the attribute value be string, and the role name is val-string.
<tag name="val" role="val-string">
<attribute name="type" type="NMTOKEN" required="true">
<enumeration value="string"/>
</attribute>
</tag>
|
The following start tag does not play the val-integer role, but plays the val-string role.
<val type="string"> |
Attributes may occur even if they are not specified by tag elements. For example, the following start tag has an attribute unknown, which is not specified by the previous tag element. This start tag still plays the role val-string, but a warning message will be issued.
<val type="string" unknown=""> |
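To make the notion of "playing a role" concrete, here is a minimal Python sketch. It is an illustration under stated assumptions rather than the behavior of any particular RELAX processor: tag elements are modeled as plain dictionaries, datatype checking (NMTOKEN) is skipped, and the names TAGS, plays, and roles_played are hypothetical.

```python
# Each dictionary stands for one tag element: a tag name, a role, and
# per-attribute constraints (required flag and enumerated values).
TAGS = [
    {"name": "val", "role": "val-integer",
     "attributes": {"type": {"required": True, "values": {"integer"}}}},
    {"name": "val", "role": "val-string",
     "attributes": {"type": {"required": True, "values": {"string"}}}},
]


def plays(start_name, start_attrs, tag):
    """Return (plays_role, undeclared_attributes) for one tag element."""
    if start_name != tag["name"]:
        return False, []
    for attr_name, rule in tag["attributes"].items():
        if attr_name not in start_attrs:
            if rule.get("required", False):
                return False, []
            continue
        values = rule.get("values")
        if values is not None and start_attrs[attr_name] not in values:
            return False, []
    # Undeclared attributes do not prevent the role; they only cause warnings.
    undeclared = [a for a in start_attrs if a not in tag["attributes"]]
    return True, undeclared


def roles_played(start_name, start_attrs, tags):
    """Map each role played by the start tag to its undeclared attributes."""
    result = {}
    for tag in tags:
        ok, undeclared = plays(start_name, start_attrs, tag)
        if ok:
            result[tag["role"]] = undeclared
    return result


print(roles_played("val", {"type": "integer"}, TAGS))
print(roles_played("val", {"type": "string", "unknown": ""}, TAGS))
```

The second call reports the val-string role together with the undeclared attribute unknown, mirroring the warning described above.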
How should we interpret tag elements without the role attribute, such as those in STEPs 1 thru 7? When the role attribute is omitted, it is assumed to have the value of the name attribute. Thus, the following two tag elements are semantically identical.
<tag name="foo">
<attribute name="bar" type="int"/>
</tag>
<tag name="foo" role="foo">
<attribute name="bar" type="int"/>
</tag>
|
The role attribute of elementRule elements does not specify tag names, but rather roles. Thus, we can switch hedge models for the same tag name, depending on attribute values.
Using the roles val-string and val-integer shown in the previous example, we can have two elementRules for start tags with the tag name val. An elementRule that references the val-string role is concerned with start tags whose type attribute has the value string. An elementRule that references the val-integer role is concerned with start tags whose type attribute has the value integer.
<!-- Case 1: type="integer" -->
<tag name="val" role="val-integer">
<attribute name="type" type="NMTOKEN" required="true">
<enumeration value="integer"/>
</attribute>
</tag>
<elementRule role="val-integer" label="val" type="integer"/>
<!-- Case 2: type="string" -->
<tag name="val" role="val-string">
<attribute name="type" type="NMTOKEN" required="true">
<enumeration value="string"/>
</attribute>
</tag>
<elementRule role="val-string" label="val" type="string"/>
|
Note that two tag elements specify the tag name val and the type attribute. In RELAX, tag elements are not declarations, which may appear once and only once, but rather constraints, which may appear more than once.
Roles referenced by ref elements may not be described by tag elements. If they are described, they must be described by attPool elements.
In the following example, the ref element references the foo role, which is described by a tag element. This example is thus a syntax error.
<tag name="foo"/>
<attPool role="bar">
<ref role="foo"/>
</attPool>
|
The none datatype is useful for switching content models depending on the presence or absence of an attribute.
Suppose that <div class="sec"> and <div> require different content models. A role for the former, say divSec, can be described as below:
<tag name="div" role="divSec">
<attribute name="class" type="string" required="true">
<enumeration value="sec"/>
</attribute>
</tag>
|
How do we describe a role for <div>, say divWithoutClass? One might think that the following example would work.
<tag name="div" role="divWithoutClass"/> |
However, this description allows divWithoutClass even for <div class="sec">. Although the "undeclared attribute" message is issued, this start tag is assumed to play both roles. (1)
To explicitly disallow the class attribute, we have to use the none datatype and write as below:
<tag name="div" role="divWithoutClass">
<attribute name="class" type="none"/>
</tag>
|
Since no character string is permitted by the none datatype, any value specified for the class attribute prevents the start tag from playing the divWithoutClass role.
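The effect of the none datatype can be checked with a short Python sketch. It is a simplified model, not RELAX itself: only attribute constraints are examined (the tag name is assumed to be div), the class attribute of divSec is assumed to be required, and the names plays_role, DIV_SEC, LOOSE, and STRICT are hypothetical.

```python
def plays_role(attrs, constraints):
    """Check attribute constraints only; the tag name is assumed to match."""
    for name, rule in constraints.items():
        if rule.get("type") == "none":
            # The none datatype permits no string, so any value fails.
            if name in attrs:
                return False
            continue
        if name not in attrs:
            if rule.get("required", False):
                return False
            continue
        if "values" in rule and attrs[name] not in rule["values"]:
            return False
    return True


# divSec: class must be present and equal to "sec" (assumed required).
DIV_SEC = {"class": {"type": "string", "required": True, "values": {"sec"}}}
# divWithoutClass written without any attribute constraint.
LOOSE = {}
# divWithoutClass written with the none datatype.
STRICT = {"class": {"type": "none"}}

with_class = {"class": "sec"}  # <div class="sec">
without_class = {}             # <div>

print(plays_role(with_class, LOOSE))      # the loose description accepts it
print(plays_role(with_class, STRICT))     # the none datatype rejects it
print(plays_role(without_class, STRICT))  # <div> still plays the role
```

With the loose description, <div class="sec"> plays both divSec and divWithoutClass; with the none datatype it plays divSec only.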
attPool elements are not expanded. tag elements and attPool elements are very similar and equally important in RELAX.
A tag element attaches a role to a collection of constraints on tag names and attributes. The only difference between attPool and tag is that attPool elements do not contain constraints on tag names. In other words, an attPool element attaches a role to a collection of constraints on attributes.
The following is an example of attPool.
<attPool role="info">
<attribute name="class" required="true">
<enumeration value="informative"/>
</attribute>
</attPool>
|
This attPool element requires that the class attribute be present with the value "informative", and attaches the info role to this constraint. There are no constraints on tag names. Because of this attPool, the following empty-element tag plays the info role.
<some class="informative"/> |
Just like tag, attributes not specified by attPool may occur. For example, the following start tag plays the info role.
<some class="informative" unknown=""/> |
Roles referenced by the role attribute of elementRule elements may not be described by attPool elements. If they are described, they must be described by tag elements.
In the following example, the elementRule references the info role, which is described by an attPool element. Thus, this example is a syntax error.
<attPool role="info"/>
<elementRule role="info" label="informative" type="emptyString"/>
|
Multiple tag elements cannot share a single role.
In the following example, two tag elements share the bar role. Thus, this example is a syntax error.
<tag name="foo1" role="bar">
<attribute name="a" type="string"/>
...
</tag>
<tag name="foo2" role="bar">
<attribute name="b" type="string"/>
...
</tag>
|
In the next example, a role and tag name are both shared by two tag elements. This example is also a syntax error.
<tag name="foo" role="foo">
<attribute name="a" type="string"/>
...
</tag>
<tag name="foo" role="foo">
<attribute name="b" type="string"/>
...
</tag>
|
Even when the role attribute is omitted and the value of the name attribute is used, role sharing is prohibited. The two tag elements in the next example are identical to the two tag elements shown above. Thus, this example is also a syntax error.
<tag name="foo">
<attribute name="a" type="string"/>
...
</tag>
<tag name="foo">
<attribute name="b" type="string"/>
...
</tag>
|
In the following example, two attPool elements share the bar role. Thus, this example is a syntax error.
<attPool role="bar">
<attribute name="a" type="string"/>
...
</attPool>
<attPool role="bar">
<attribute name="b" type="string"/>
...
</attPool>
|
In this last example, a tag element and an attPool element share the bar role. Thus, this example is also a syntax error.
<attPool role="bar">
<attribute name="a" type="string"/>
...
</attPool>
<tag role="bar" name="foo">
<attribute name="b" type="string"/>
...
</tag>
|
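The prohibition on role sharing lends itself to a mechanical check. The following Python sketch shows one way such a check might look; the function name shared_roles is hypothetical, the RELAX Core namespace is omitted, and only tag and attPool elements that are direct children of the module are considered.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Assumption: namespace declarations are left out to keep the sketch short.
MODULE = """
<module>
  <tag name="foo"/>
  <tag name="foo"/>
  <attPool role="bar"/>
  <tag role="bar" name="baz"/>
  <tag name="val" role="val-integer"/>
  <tag name="val" role="val-string"/>
</module>
"""


def shared_roles(module):
    """Return the roles described by more than one tag or attPool element."""
    roles = Counter()
    for clause in module:
        if clause.tag == "tag":
            # An omitted role attribute defaults to the value of name.
            roles[clause.get("role", clause.get("name"))] += 1
        elif clause.tag == "attPool":
            roles[clause.get("role")] += 1
    return sorted(role for role, count in roles.items() if count > 1)


print(shared_roles(ET.fromstring(MODULE)))
```

The two val tags are not reported: they share a tag name, which is allowed, but describe different roles.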
Earlier STEPs suggested that a tag element declares a tag name and attributes. Actually, a tag element attaches a role to a collection of constraints on tag names and attributes. In the examples in STEPs 1 thru 7, roles and tag names coincide, but they are not always identical. In most cases, there are one-to-one correspondences among labels, roles, and tag names. But this is not always the case.
| Syntactical constructs | tag names, labels, or roles |
| The role attribute of elementRule | references to roles described by tag |
| The label attribute of elementRule | description of labels |
| The label attribute of hedgeRule | description of labels |
| The label attribute of ref | reference to labels described by elementRule |
| The label attribute of hedgeRef | reference to labels described by hedgeRule |
| The name attribute of tag | description of tag names |
| The role attribute of tag | description of roles |
| The role attribute of attPool | description of roles |
| The role attribute of ref | reference to roles described by attPool |
| Types of names | In XML instances | In RELAX modules |
| tag names | occur | occur as part of clauses |
| roles | do not occur | occur in clauses (descriptions of and references to roles) |
| labels | do not occur | occur in production rules (descriptions of and references to labels) |
The features above all rest on the role attribute. This demonstrates the simplicity and descriptive power of RELAX. Enjoy and RELAX!
RELAX also provides element elements as permissible hedge models. They are mere syntactic sugar, and are expanded into ref, elementRule, and tag elements. In this section, we show the motivation behind element elements and then present the mechanism.
In the following class declaration, variables x and y are declared and a datatype int is attached to them.
public class Point {
int x;
int y;
}
|
When a variable x is declared in another class, it may have a different type. In the next example, a datatype float is attached to x of the class Foo.
public class Foo {
float x;
}
|
An element element is an element hedge model that specifies both a name and a type, just as a variable declaration specifies both a variable name and a type name. An element element always has the name attribute and the type attribute. Furthermore, it may have the occurs attribute.
<element name="tag-name" type="datatype-name"/> |
<element name="tag-name" type="datatype-name" occurs="*"/> |
Use of element elements allows tag and elementRule elements such as below:
<tag name="Point"/>
<elementRule role="Point">
<sequence>
<element name="x" type="integer"/>
<element name="y" type="integer"/>
</sequence>
</elementRule>
|
A Point such that x=100 and y=200 can be represented by an XML document as below:
<Point> <x>100</x> <y>200</y> </Point> |
The element element is merely syntactic sugar. Each element element in a hedge model is replaced by a ref element, while an elementRule element and a tag element are generated.
The elementRule in the previous subsection is duplicated below. Both element elements in this example have the type attribute. Let us consider how these element elements are expanded.
<elementRule role="Point">
<sequence>
<element name="x" type="integer"/>
<element name="y" type="integer"/>
</sequence>
</elementRule>
|
Each of the element elements is replaced by a ref element. Furthermore, an elementRule element and tag element are generated for each element element. As a hedge model, each elementRule has a reference to the datatype specified by the type attribute of the original element element.
<elementRule role="Point">
<sequence>
<ref label="Point$1"/>
<ref label="Point$2"/>
</sequence>
</elementRule>
<elementRule role="Point$1" label="Point$1" type="integer"/>
<tag role="Point$1" name="x"/>
<elementRule role="Point$2" label="Point$2" type="integer"/>
<tag role="Point$2" name="y"/>
|
When an element element has the occurs attribute, it is copied to the generated ref element. For example, suppose that the element elements in the first elementRule specify occurs="?" (see below).
<elementRule role="Point">
<sequence>
<element name="x" type="integer" occurs="?"/>
<element name="y" type="integer" occurs="?"/>
</sequence>
</elementRule>
|
The result of expansion is as below:
<elementRule role="Point">
<sequence>
<ref label="Point$1" occurs="?"/>
<ref label="Point$2" occurs="?"/>
</sequence>
</elementRule>
<elementRule role="Point$1" label="Point$1" type="integer"/>
<tag role="Point$1" name="x"/>
<elementRule role="Point$2" label="Point$2" type="integer"/>
<tag role="Point$2" name="y"/>
|
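The expansion illustrated above can also be sketched in Python. This is an illustration only, under stated assumptions: the function name expand_elements is hypothetical, the RELAX Core namespace is omitted, generated names of the form Point$1 are assumed not to clash with existing labels or roles, and every element element is assumed to carry a type attribute.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace declarations are left out to keep the sketch short.
MODULE = """
<module>
  <tag name="Point"/>
  <elementRule role="Point">
    <sequence>
      <element name="x" type="integer" occurs="?"/>
      <element name="y" type="integer"/>
    </sequence>
  </elementRule>
</module>
"""


def expand_elements(module):
    """Expand each element element into ref, elementRule, and tag."""
    for rule in list(module.findall("elementRule")):
        # The label defaults to the role when it is omitted.
        base = rule.get("label", rule.get("role"))
        counter = 0
        for parent in list(rule.iter()):
            for index, child in enumerate(list(parent)):
                if child.tag != "element":
                    continue
                counter += 1
                generated = f"{base}${counter}"
                # The ref takes the place of the element element.
                ref = ET.Element("ref", {"label": generated})
                if child.get("occurs") is not None:
                    ref.set("occurs", child.get("occurs"))
                ref.tail = child.tail
                parent[index] = ref
                # The generated elementRule references the datatype.
                module.append(ET.Element("elementRule", {
                    "role": generated,
                    "label": generated,
                    "type": child.get("type"),
                }))
                # The generated tag carries the original name.
                module.append(ET.Element("tag", {
                    "role": generated,
                    "name": child.get("name"),
                }))
    return module


module = expand_elements(ET.fromstring(MODULE))
print(ET.tostring(module, encoding="unicode"))
```

In the sample module only the first element element has occurs="?", so only the first generated ref receives that attribute.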
Let us summarize the procedure for expanding element elements.
1. A ref element is generated. As the value of its label attribute, we generate a label that does not conflict with any other label. If the element element has the occurs attribute, it is copied to the generated ref element.
2. An elementRule element is generated. As the value of its role attribute, we generate a role that does not conflict with any other role. The value of the label attribute is the label generated together with the ref element. As the hedge model of this elementRule, the type attribute of the element element is copied.
3. A tag element is generated. Its role attribute specifies the role generated together with the elementRule. Its name attribute specifies the value of the name attribute of the original element element.
element elements probably look very natural and easy to understand. Enjoy and RELAX!
RELAX also allows tag elements to be embedded in elementRule elements.
Constraints on tag names and attributes are described by tag and attPool elements, while hedge models are described by elementRule and hedgeRule elements. An elementRule references a tag via a role, and the tag may in turn reference attPool elements.
When an elementRule and a tag are so closely related, it may be convenient to merge them into a single element rather than separating them.
To illustrate the elementRule-tag separation, we duplicate an example from STEP 8 below.
<!-- Case 1: type="integer" -->
<tag name="val" role="val-integer">
<attribute name="type" type="NMTOKEN" required="true">
<enumeration value="integer"/>
</attribute>
</tag>
<elementRule role="val-integer" label="val" type="integer"/>
<!-- Case 2: type="string" -->
<tag name="val" role="val-string">
<attribute name="type" type="NMTOKEN" required="true">
<enumeration value="string"/>
</attribute>
</tag>
<elementRule role="val-string" label="val" type="string"/>
|
Suppose that roles val-integer and val-string are referenced from these two elementRule elements only. Rather than introducing two names val-integer and val-string for referencing, authors might want to directly embed tag elements within elementRule elements.
<!-- Case 1: type="integer" -->
<elementRule label="val" type="integer">
<tag>
<attribute name="type" type="NMTOKEN" required="true">
<enumeration value="integer"/>
</attribute>
</tag>
</elementRule>
<!-- Case 2: type="string" -->
<elementRule label="val" type="string">
<tag>
<attribute name="type" type="NMTOKEN" required="true">
<enumeration value="string"/>
</attribute>
</tag>
</elementRule>
|
An advantage of this style is that roles do not need names. Before this rewrite, we needed names that differ from the tag names and labels. Omitting these names enhances readability.
Some people find it attractive to describe attributes and hedge models together. For example, points with an x-coordinate and a y-coordinate can be represented in two alternative manners. The first example uses attributes, while the second uses elements. The differences are minor, and each form can easily be rewritten into the other.
<elementRule label="point" type="emptyString">
<tag>
<attribute name="x" type="integer"/>
<attribute name="y" type="integer"/>
</tag>
</elementRule>
|
<elementRule label="point">
<tag/>
<sequence>
<element name="x" type="integer"/>
<element name="y" type="integer"/>
</sequence>
</elementRule>
|
An elementRule containing a tag may not have the role attribute. Instead, the label attribute is mandatory.
An embedded tag may not have the role attribute. The name attribute is permitted, but it is not present in this example.
On expansion, an embedded tag element is moved out of the elementRule and placed as a sibling element. We show how the first example in this STEP is handled.
<elementRule label="val" type="integer">
<tag>
<attribute name="type" type="NMTOKEN" required="true">
<enumeration value="integer"/>
</attribute>
</tag>
</elementRule>
|
First, we generate a role that does not conflict with any other role. In this example, we generate role val$1.
Next, we move the embedded tag element out of the elementRule and place it as a sibling element. We then add the role attribute and specify the generated role as the attribute value.
We add the name attribute only when the tag element does not already have one. As the attribute value, we use the value of the label attribute of the elementRule element. In this example, we specify "val" as the value of the name attribute.
Finally, we add the role attribute to the elementRule and specify the role generated above.
<elementRule label="val" type="integer" role="val$1">
</elementRule>
<tag name="val" role="val$1">
<attribute name="type" type="NMTOKEN" required="true">
<enumeration value="integer"/>
</attribute>
</tag>
|
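The handling of embedded tag elements described above can be sketched in Python as follows. It is a rough illustration, not the algorithm of any actual RELAX processor: the function name hoist_embedded_tags is hypothetical, the RELAX Core namespace is omitted, and generated roles of the form val$1 are assumed not to conflict with existing roles.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace declarations are left out to keep the sketch short.
MODULE = """
<module>
  <elementRule label="val" type="integer">
    <tag>
      <attribute name="type" type="NMTOKEN" required="true">
        <enumeration value="integer"/>
      </attribute>
    </tag>
  </elementRule>
  <elementRule label="val" type="string">
    <tag>
      <attribute name="type" type="NMTOKEN" required="true">
        <enumeration value="string"/>
      </attribute>
    </tag>
  </elementRule>
</module>
"""


def hoist_embedded_tags(module):
    """Move each embedded tag out of its elementRule, as the next sibling."""
    counters = {}
    result = []
    for clause in list(module):
        result.append(clause)
        if clause.tag != "elementRule":
            continue
        tag = clause.find("tag")
        if tag is None:
            continue
        label = clause.get("label")
        # Generate a role; assumed not to clash with any existing role.
        counters[label] = counters.get(label, 0) + 1
        role = f"{label}${counters[label]}"
        # Move the tag out of the elementRule and give it the role.
        clause.remove(tag)
        tag.set("role", role)
        # Add the name attribute only when it is absent.
        if tag.get("name") is None:
            tag.set("name", label)
        # Finally, the elementRule references the generated role.
        clause.set("role", role)
        result.append(tag)
    module[:] = result
    return module


module = hoist_embedded_tags(ET.fromstring(MODULE))
print(ET.tostring(module, encoding="unicode"))
```

For the two val rules, the sketch produces the roles val$1 and val$2, each shared by one elementRule and the tag element that follows it.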
Embedding tag elements in elementRule elements provides a concise and comprehensible description. Enjoy and RELAX!