STEP 1: Migration from XML DTD (without parameter entities)

$Id: step1.sdoc 1.13 2000/08/06 08:47:44 murata Exp $

STEP 1 covers basic features, which allow easy migration from DTDs. The DTD2RELAX converter uses these features only.

An example module

To give a feel for RELAX, we rewrite a DTD as a RELAX module.

The DTD is shown below. The number attribute of title elements should be an integer, but DTDs cannot express this constraint.

<!ELEMENT doc   (title, para*)>

<!ELEMENT para  (#PCDATA | em)*>

<!ELEMENT title (#PCDATA | em)*>

<!ELEMENT em    (#PCDATA)>

<!ATTLIST para
  class   NMTOKEN #IMPLIED
>

<!ATTLIST title  
  class   NMTOKEN #IMPLIED
  number  CDATA   #REQUIRED
>

Next, we show a RELAX module. The number attribute is specified as an integer.

<module
      moduleVersion="1.2"
      relaxCoreVersion="1.0"
      targetNamespace=""
      xmlns="http://www.xml.gr.jp/xmlns/relaxCore">

  <interface>
    <export label="doc"/>
  </interface>

  <elementRule role="doc">
    <sequence>
      <ref label="title"/>
      <ref label="para" occurs="*"/>
    </sequence>
  </elementRule>
  
  <elementRule role="para">
    <mixed>
      <ref label="em" occurs="*"/>
    </mixed>
  </elementRule>
  
  <elementRule role="title">
    <mixed>
      <ref label="em" occurs="*"/>
    </mixed>
  </elementRule>

  <elementRule role="em" type="string"/>
  
  <tag name="doc"/>

  <tag name="para">
    <attribute name="class" type="NMTOKEN"/>
  </tag>
  
  <tag name="title">
    <attribute name="class" type="NMTOKEN"/>
    <attribute name="number" required="true" type="integer"/>
  </tag>

  <tag name="em"/>

</module>
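
As an illustration only, the following instance document (not part of the original example; its text content is made up) should be accepted by this module. The title carries the required integer-valued number attribute, and the para elements mix character data with em elements.

```xml
<doc>
  <title number="1">Migrating from <em>DTDs</em></title>
  <para class="intro">A RELAX module can state that <em>number</em> is an integer.</para>
  <para>A second paragraph, without the optional class attribute.</para>
</doc>
```

By contrast, a title with number="one" satisfies the DTD, which only requires CDATA, but would be rejected by the module because "one" is not an integer.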

The following sections explain the syntactic constructs that appear in this example.

The module element

A RELAX grammar is a combination of modules. If there is only one namespace and the grammar is not too large, a single module serves as the RELAX grammar. A module is represented by a module element.

<module
      moduleVersion="1.2"
      relaxCoreVersion="1.0"
      targetNamespace=""
      xmlns="http://www.xml.gr.jp/xmlns/relaxCore">
  ...
</module>

The moduleVersion attribute shows the version of this module. In this example, it is "1.2".

The relaxCoreVersion attribute shows the version of RELAX Core. At present, it is always "1.0".

The targetNamespace attribute shows the namespace which this module is concerned with. In this example, it is "".

The namespace name for RELAX Core is "http://www.xml.gr.jp/xmlns/relaxCore".
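
A hedged sketch of a module for a non-empty namespace follows. The namespace name http://www.example.com/ns/sample and the version "1.0" are placeholders chosen for illustration, and the sketch assumes that elements in instance documents must belong to the namespace named by targetNamespace.

```xml
<module
      moduleVersion="1.0"
      relaxCoreVersion="1.0"
      targetNamespace="http://www.example.com/ns/sample"
      xmlns="http://www.xml.gr.jp/xmlns/relaxCore">
  ...
</module>

<!-- Under that assumption, an instance document declares the same namespace: -->
<doc xmlns="http://www.example.com/ns/sample">
  ...
</doc>
```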

The interface element

A module element begins with an interface element. A module has one interface element.

<module
      moduleVersion="1.2"
      relaxCoreVersion="1.0"
      targetNamespace=""
      xmlns="http://www.xml.gr.jp/xmlns/relaxCore">

  <interface>
    ...
  </interface>
  ...
</module>

The export element

An interface element contains export element(s).

<export label="foo"/>

The label attribute of export elements specifies an element type that may become the root. More than one export may appear in an interface element.

The following example allows either foo or bar as the root element type.

<interface>
  <export label="foo"/>
  <export label="bar"/>
</interface>
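
To put this interface in context, here is a minimal sketch of a complete module around it. The content models chosen for foo and bar are assumptions made for illustration; the elementRule and tag elements it uses are explained in the sections that follow.

```xml
<module
      moduleVersion="1.0"
      relaxCoreVersion="1.0"
      targetNamespace=""
      xmlns="http://www.xml.gr.jp/xmlns/relaxCore">

  <interface>
    <export label="foo"/>
    <export label="bar"/>
  </interface>

  <!-- foo has empty content -->
  <elementRule role="foo">
    <empty/>
  </elementRule>

  <!-- bar contains zero or more foo elements -->
  <elementRule role="bar">
    <ref label="foo" occurs="*"/>
  </elementRule>

  <tag name="foo"/>
  <tag name="bar"/>

</module>
```

With this module, a document whose root is <foo/> and a document whose root is <bar><foo/></bar> would both be acceptable, since both element types are exported.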

Element type declarations

Element type declarations (<!ELEMENT ...>) of XML are represented by elementRule elements. The role attribute of elementRule specifies an element type name. More than one elementRule may follow the interface element.

<elementRule role="element-type-name">
  ...hedge model...
</elementRule>

An elementRule element has a hedge model. A hedge is a sequence of elements (and their descendants) as well as character data. A hedge model is a constraint on permissible hedges.

A hedge model is one of the following: an element hedge model, a datatype reference, or a mixed hedge model.

Element hedge model

Element hedge models are represented by the empty, none, ref, choice, and sequence elements together with the occurs attribute. An element hedge model describes the permissible sequences of child elements, which may be separated by whitespace characters.

The empty element

empty represents the empty sequence.

Consider an elementRule as below:

<elementRule role="foo">
  <empty/>
</elementRule>

This elementRule implies that the content of a foo element is the empty sequence. A foo element can be a start tag followed by an end tag, or an empty-element tag.

<foo/>
<foo></foo>

Unlike EMPTY in XML, whitespace characters may appear between the start and end tags.

<foo>  </foo>

empty can be used within choice and sequence. The motivation behind this extension will become clear in STEP 2. If you need exactly the same feature as EMPTY of XML, use the emptyString datatype (shown in STEP 3).

From now on, we assume that foo, foo1, and foo2 are declared by elementRules whose hedge models are empty.
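
Spelled out, that assumption corresponds to declarations like the following. The tag elements are included because, as explained under attribute-list declarations, RELAX requires one even when an element type has no attributes.

```xml
<elementRule role="foo">
  <empty/>
</elementRule>

<elementRule role="foo1">
  <empty/>
</elementRule>

<elementRule role="foo2">
  <empty/>
</elementRule>

<!-- One tag element per element type, even without attributes -->
<tag name="foo"/>
<tag name="foo1"/>
<tag name="foo2"/>
```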

The ref element

ref references an element type. For example, <ref label="foo"/> references the element type foo.

Consider an elementRule as below:

<elementRule role="bar">
  <ref label="foo"/>
</elementRule>

This elementRule implies that the content of a bar element is a foo element. For example, the following bar element is valid against this elementRule.

<bar><foo/></bar>

Whitespace may appear before and after the foo element.

<bar>
  <foo/>
</bar>

ref can have the occurs attribute. Permissible values are "*", "+", and "?", which indicate "zero or more", "one or more", and "zero or one" occurrences, respectively.

An example using "?" as the value of the occurs attribute is shown below:

<elementRule role="bar">
  <ref label="foo" occurs="?"/>
</elementRule>

This elementRule implies that the content of a bar element is either a foo element or empty.

<bar><foo/></bar>
<bar></bar>

Whitespace characters may appear before and after the foo element. Even when this bar is empty, it may have whitespace characters.

<bar>
  <foo/>
</bar>
<bar>
</bar>
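
The other two values follow the same pattern. The sketch below shows them as two alternative rules for bar (they are alternatives, not meant to appear together in one module); the instances listed in the comments are illustrative.

```xml
<!-- Alternative 1: zero or more foo elements -->
<elementRule role="bar">
  <ref label="foo" occurs="*"/>
</elementRule>
<!-- permitted:     <bar/>   <bar><foo/></bar>   <bar><foo/><foo/><foo/></bar> -->

<!-- Alternative 2: one or more foo elements -->
<elementRule role="bar">
  <ref label="foo" occurs="+"/>
</elementRule>
<!-- permitted:     <bar><foo/></bar>   <bar><foo/><foo/></bar> -->
<!-- not permitted: <bar/> -->
```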

The choice element

choice indicates a choice among the specified hedge models ("|" in XML 1.0). The subordinate elements of a choice element are element hedge models. choice can also have the occurs attribute.

An example of elementRule containing choice is shown below:

<elementRule role="bar">
  <choice occurs="+">
    <ref label="foo1"/>
    <ref label="foo2"/>
  </choice>
</elementRule>

This elementRule indicates that the content of a bar element is one or more occurrences of either foo1 or foo2 elements.

<bar><foo2/></bar>
<bar>
  <foo2/>
</bar>
<bar>
  <foo1/>
  <foo2/>
  <foo1/>
</bar>
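
For comparison, here is the same constraint written as a DTD element type declaration, together with instances that the elementRule above would reject. The rejected instances are illustrative and rely on foo being a different element type from foo1 and foo2.

```xml
<!ELEMENT bar (foo1 | foo2)+>

<!-- not permitted: occurs="+" requires at least one child element -->
<bar/>

<!-- not permitted: foo is not one of the choices -->
<bar><foo/></bar>
```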

The sequence element

sequence indicates a sequence of the specified hedge models ("," in XML 1.0). The subordinate elements of a sequence element are element hedge models. sequence can also have the occurs attribute.

An example of elementRule containing sequence is shown below:

<elementRule role="bar">
  <sequence occurs="?">
    <ref label="foo1"/>
    <ref label="foo2"/>
  </sequence>
</elementRule>

This elementRule implies that the content of a bar element is either a sequence of a foo1 element and a foo2 element, or empty.

<bar><foo1/><foo2/></bar>
<bar>
  <foo1/>
  <foo2/></bar>
<bar/>
<bar></bar>
<bar>
  </bar>
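
Because the subordinates of sequence and choice are themselves element hedge models, the two can be nested. As a sketch, the DTD content model (foo, (foo1 | foo2)*) might be written as follows; this particular content model is chosen for illustration only.

```xml
<!-- DTD: <!ELEMENT bar (foo, (foo1 | foo2)*)> -->
<elementRule role="bar">
  <sequence>
    <ref label="foo"/>
    <choice occurs="*">
      <ref label="foo1"/>
      <ref label="foo2"/>
    </choice>
  </sequence>
</elementRule>

<!-- permitted -->
<bar><foo/></bar>
<bar><foo/><foo2/><foo1/><foo2/></bar>

<!-- not permitted: the leading foo is missing -->
<bar><foo1/></bar>
```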

The none element

none is an element hedge model that does not match anything. none is unique to RELAX.

<elementRule role="bar">
  <none/>
</elementRule>

This elementRule implies that nothing is permitted as the content of bar elements. The motivation behind none will become clear in STEP 2.

Datatype reference

The type attribute of elementRule specifies a content model that references a datatype. Character strings in a document are checked against the specified datatype. Permissible datatypes are the built-in datatypes of XML Schema Part 2 and datatypes unique to RELAX. Details of datatypes are covered in STEP 3.

An example of elementRule containing type is shown below:

<elementRule role="bar" type="integer"/>

This elementRule indicates that the content of a bar element is a character string representing an integer.

<bar>10</bar>

Whitespace characters may not occur before or after the integer. For example, the following is not permitted.

<bar>
  10
</bar>
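
A few more instances may help to show where the boundary lies for the elementRule above. They are illustrative and assume the integer datatype of XML Schema Part 2, which accepts an optional sign followed by digits.

```xml
<!-- permitted: an optionally signed sequence of digits -->
<bar>-7</bar>

<!-- not permitted: not the lexical form of an integer -->
<bar>ten</bar>
<bar>1.5</bar>
```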

Mixed hedge model

mixed significantly extends the mixed content models (#PCDATA|a|b|...|z)* of XML.

A mixed element wraps an element hedge model. Recall that an element hedge model allows whitespace characters to intervene between elements. By wrapping it with mixed, any character is allowed to intervene.

(#PCDATA | foo1 | foo2)* of XML can be captured as below:

<elementRule role="bar">
  <mixed>
    <choice occurs="*">
      <ref label="foo1"/>
      <ref label="foo2"/>
    </choice>
  </mixed>
</elementRule>

The choice element in this mixed element matches zero or more occurrences of foo1 or foo2 elements. The mixed element allows any character to appear between these elements. Thus, this hedge model is equivalent to the (#PCDATA | foo1 | foo2)* mixed content model of XML 1.0.

There are two ways to capture a (#PCDATA) content model. One is to reference the datatype string with the type attribute. The other is to write an element hedge model that matches only the empty sequence and wrap it with mixed.

<elementRule role="bar" type="string"/>
<elementRule role="bar">
  <mixed>
    <empty/>
  </mixed>
</elementRule>

As a more advanced example, consider elementRule as below:

<elementRule role="bar">
  <mixed>
    <sequence>
      <ref label="foo1"/>
      <ref label="foo2"/>
    </sequence>
  </mixed>
</elementRule>

A sequence of <foo1/> and <foo2/> matches the sequence element inside the mixed element, and mixed allows character data to appear around them. Thus, the following example is permitted by this elementRule.

<bar>Murata<foo1/>Makoto<foo2/>IUJ</bar>

As shown in the following example, CDATA sections and character references may appear.

<bar><![CDATA[Murata]]><foo1/>Mako&#x74;&#x6F;<foo2/>IUJ</bar>
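
Note that mixed only relaxes what may appear between elements; the wrapped element hedge model must still be matched. As an illustration (these instances are not from the original examples), the elementRule with the sequence of foo1 and foo2 would reject the following:

```xml
<!-- not permitted: foo1 and foo2 are both required by the sequence -->
<bar>Murata Makoto</bar>

<!-- not permitted: foo2 precedes foo1 -->
<bar>Murata<foo2/>Makoto<foo1/>IUJ</bar>
```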

Attribute-list declarations

Attribute-list declarations (<!ATTLIST ...>) of XML are captured by tag elements.

<tag name="element-type-name">
  ...list of attribute declarations...
</tag>

tag can have attribute elements as subordinates.

<tag name="element-type-name">
  <attribute ... />
  <attribute ... />
</tag>

attribute declares an attribute. An example of attribute is shown below:

<attribute name="age" required="true" type="integer"/>

The value of the name attribute is the name of the declared attribute. In this example, it is age.

If the value of the required attribute is true, the attribute being declared is mandatory. If required is not specified, it is optional. Since required is specified in this example, the age attribute is mandatory.

The type attribute specifies a datatype name. If type is not specified, the datatype string (which allows any string) is assumed.

Consider an example of tag which contains this attribute element only.

<tag name="bar">
  <attribute name="age" required="true" type="integer"/>
</tag>

The following start tag is permitted by this tag.

<bar age="39">

The following two start tags are not permitted. In the first example, the age attribute is omitted. In the second example, the value of age is not an integer.

<bar>
<bar age="bu huo">
<!-- "bu huo" means forty years in Chinese.  In Japan, 
     it is pronounced as "FUWAKU". -->

In a DTD, you do not have to write an attribute-list declaration if an element type does not have any attributes. In RELAX, you must write an empty tag element even if there are no attributes. For example, if an element type bar does not have any attributes, you have to write a tag element as below:

<tag name="bar"/>
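
Putting the pieces together, the sketch below pairs an elementRule with a tag for a hypothetical element type person (the names person and nickname are invented for this illustration). It shows a mandatory typed attribute next to an optional attribute whose type defaults to string.

```xml
<elementRule role="person" type="string"/>

<tag name="person">
  <attribute name="age" required="true" type="integer"/>
  <attribute name="nickname"/>
</tag>

<!-- permitted -->
<person age="39">Murata Makoto</person>
<person age="39" nickname="mura">Murata Makoto</person>

<!-- not permitted: the mandatory age attribute is missing -->
<person nickname="mura">Murata Makoto</person>
```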

Summary

If you have finished reading this STEP, you can immediately start to use RELAX. If you do not need further features, you do not have to read other STEPs. Enjoy and RELAX!