Prototypical implementation of "Divide and Validate"

$Id: divideAndValidate.html 1.3 2001/03/11 12:30:52 murata Exp $

7 March, 2001

Murata Makoto [FAMILY Given]

Introduction

To illustrate validation in RELAX Namespace, I wrote a small Java program. Given a non-monolithic XML document, this program decomposes it into a collection of islands, each of which belongs to a single namespace.

Each island can be validated by the RELAX Core processor. Furthermore, processors for different schema languages (even DTDs!) can be applied to these islands.
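The idea of applying a different processor to each island can be sketched as a lookup keyed by namespace name. This is only an illustration: the class IslandDispatch, its methods, and the use of a plain predicate as a stand-in for a real schema processor are all assumptions, not part of the prototype.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch: one validator per namespace name.
class IslandDispatch {
    private final Map<String, Predicate<String>> validators = new HashMap<>();

    /** Registers the processor responsible for islands of the given namespace. */
    void register(String namespaceName, Predicate<String> validator) {
        validators.put(namespaceName, validator);
    }

    /** Validates one serialized island with the processor registered for its namespace. */
    boolean validate(String namespaceName, String islandXml) {
        Predicate<String> validator = validators.get(namespaceName);
        if (validator == null) {
            throw new IllegalStateException("no processor for " + namespaceName);
        }
        return validator.test(islandXml);
    }

    public static void main(String[] args) {
        IslandDispatch dispatch = new IslandDispatch();
        // Stand-in check; a real setup would call a schema processor here.
        dispatch.register("urn:document", xml -> xml.startsWith("<para"));
        System.out.println(dispatch.validate("urn:document", "<para>1st para</para>"));
    }
}
```

Because each island carries exactly one namespace, the lookup needs no knowledge of the other namespaces in the original document.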

Decomposition Algorithm

If an element e and its parent element e' belong to different namespaces, e is detached from e'. Instead of e, a dummy node is introduced as a child element of e'.

A dummy node belongs to the namespace "http://www.xml.gr.jp/xmlns/dummy". The attribute "namespaceName" of the dummy node indicates the namespace of e.
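The rule above can be sketched with the standard DOM API. This is a minimal illustration, not the prototype's code: the class name IslandSplitter and its methods are hypothetical, the sketch works on an in-memory tree, and it assumes that an island is reported once all of its descendants have been processed.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical DOM-based sketch of the decomposition rule.
class IslandSplitter {
    static final String DUMMY_NS = "http://www.xml.gr.jp/xmlns/dummy";

    /** Returns the root element of every island; the outermost island comes last. */
    static List<Element> decompose(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for getNamespaceURI()
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            List<Element> islands = new ArrayList<>();
            Element root = doc.getDocumentElement();
            split(root, islands);
            islands.add(root);
            return islands;
        } catch (Exception ex) {
            throw new IllegalArgumentException("cannot parse document", ex);
        }
    }

    private static void split(Element parent, List<Element> islands) {
        Node child = parent.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // remember before a possible detach
            if (child instanceof Element) {
                Element e = (Element) child;
                split(e, islands); // handle nested namespace boundaries first
                if (!Objects.equals(e.getNamespaceURI(), parent.getNamespaceURI())) {
                    // e and its parent e' differ: put a dummy in place of e
                    Element dummy = parent.getOwnerDocument().createElementNS(DUMMY_NS, "dummy");
                    String ns = e.getNamespaceURI();
                    dummy.setAttribute("namespaceName", ns == null ? "" : ns);
                    parent.replaceChild(dummy, e); // e is now detached from e'
                    islands.add(e);
                }
            }
            child = next;
        }
    }

    public static void main(String[] args) {
        String xml = "<doc:doc xmlns:doc='urn:document' xmlns:table='urn:table'>"
                + "<doc:para>this is a para</doc:para>"
                + "<table:table number='1'><table:row><table:cell>"
                + "<doc:para>1st para</doc:para><doc:para>2nd para</doc:para>"
                + "</table:cell></table:row></table:table></doc:doc>";
        for (Element island : decompose(xml)) {
            System.out.println(island.getNamespaceURI() + " " + island.getLocalName());
        }
    }
}
```

For the small input in main, the sketch reports four islands: the two nested paragraphs, the table, and finally the enclosing doc element. A streaming (SAX-based) implementation would avoid building the whole tree, at the cost of more bookkeeping.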

Example 1

Consider an XML document as below:

<doc:doc xmlns:doc="urn:document" xmlns:table="urn:table">
  <doc:para>this is a para</doc:para>
  <table:table number="1">
    <table:row>
      <table:cell>
        <doc:para>1st para</doc:para>
        <doc:para>2nd para</doc:para>
      </table:cell>
    </table:row>
  </table:table>
  <table:table number="2">
    <table:row>
      <table:cell>
        <doc:para>3rd para</doc:para>
        <doc:para>4th para</doc:para>
      </table:cell>
    </table:row>
  </table:table>
</doc:doc>

This non-monolithic document is decomposed into seven islands.

namespace$ java -cp "/crimson-1.1/jaxp.jar;/crimson-1.1/crimson.jar;/sax2/sax.jar;/relax/namespace/src/" org.iso_relax.dispatcher.TestDispatcher -n explain.xml

*Island start: urn:document
<para>1st para</para>
*Island end

*Island start: urn:document
<para>2nd para</para>
*Island end

*Island start: urn:table
<table number="1">
    <row>
      <cell>
        <dummy namespaceName="urn:document xmlns="http://www.xml.gr.jp/xmlns/dummy"></dummy>
        <dummy namespaceName="urn:document xmlns="http://www.xml.gr.jp/xmlns/dummy"></dummy>
      </cell>
    </row>
  </table>
*Island end

*Island start: urn:document
<para>3rd para</para>
*Island end

*Island start: urn:document
<para>4th para</para>
*Island end

*Island start: urn:table
<table number="2">
    <row>
      <cell>
        <dummy namespaceName="urn:document xmlns="http://www.xml.gr.jp/xmlns/dummy"></dummy>
        <dummy namespaceName="urn:document xmlns="http://www.xml.gr.jp/xmlns/dummy"></dummy>
      </cell>
    </row>
  </table>
*Island end

*Island start: urn:document
<doc>
  <para>this is a para</para>
  <dummy namespaceName="urn:table xmlns="http://www.xml.gr.jp/xmlns/dummy"></dummy>
  <dummy namespaceName="urn:table xmlns="http://www.xml.gr.jp/xmlns/dummy"></dummy>
</doc>
*Island end
namespace$ 

Example 2: RSS

The RSS specification uses namespaces heavily. The second example in its Section 7 uses six namespaces. That document is decomposed as follows:

java -cp "/crimson-1.1/jaxp.jar;/crimson-1.1/crimson.jar;/sax2/sax.jar;/relax/namespace/src/" org.iso_relax.dispatcher.TestDispatcher -n rssExample.xml

*Island start: http://purl.org/dc/elements/1.1/
<publisher>The O'Reilly Network</publisher>
*Island end

*Island start: http://purl.org/dc/elements/1.1/
<creator>Rael Dornfest (mailto:rael@oreilly.com)</creator>
*Island end

*Island start: http://purl.org/dc/elements/1.1/
<rights>Copyright © 2000 O'Reilly & Associates, Inc.</rights>
*Island end

*Island start: http://purl.org/dc/elements/1.1/
<date>2000-01-01T12:00+00:00</date>
*Island end

*Island start: http://purl.org/rss/1.0/modules/syndication/
<updatePeriod>hourly</updatePeriod>
*Island end

*Island start: http://purl.org/rss/1.0/modules/syndication/
<updateFrequency>2</updateFrequency>
*Island end

*Island start: http://purl.org/rss/1.0/modules/syndication/
<updateBase>2000-01-01T12:00+00:00</updateBase>
*Island end

*Island start: http://www.w3.org/1999/02/22-rdf-syntax-ns#
<Seq>
        <li resource="http://c.moreover.com/click/here.pl?r123"></li>
      </Seq>
*Island end

*Island start: http://purl.org/rss/1.0/
<channel {http://www.w3.org/1999/02/22-rdf-syntax-ns#}about="http://meerkat.oreillynet.com/?_fl=rss1.0">
    <title>Meerkat</title>
    <link>http://meerkat.oreillynet.com</link>
    <description>Meerkat: An Open Wire Service</description>
    <dummy namespaceName="http://purl.org/dc/elements/1.1/ xmlns="http://www.xml.gr.jp/xmlns/dummy"/>
    <dummy namespaceName="http://purl.org/dc/elements/1.1/ xmlns="http://www.xml.gr.jp/xmlns/dummy"/>
    <dummy namespaceName="http://purl.org/dc/elements/1.1/ xmlns="http://www.xml.gr.jp/xmlns/dummy"/>
    <dummy namespaceName="http://purl.org/dc/elements/1.1/ xmlns="http://www.xml.gr.jp/xmlns/dummy"/>
    <dummy namespaceName="http://purl.org/rss/1.0/modules/syndication/ xmlns="http://www.xml.gr.jp/xmlns/dummy"/>
    <dummy namespaceName="http://purl.org/rss/1.0/modules/syndication/ xmlns="http://www.xml.gr.jp/xmlns/dummy"/>
    <dummy namespaceName="http://purl.org/rss/1.0/modules/syndication/ xmlns="http://www.xml.gr.jp/xmlns/dummy"/>

    <image {http://www.w3.org/1999/02/22-rdf-syntax-ns#}resource="http://meerkat.oreillynet.com/icons/meerkat-powered.jpg"></image>

    <items>
      <dummy namespaceName="http://www.w3.org/1999/02/22-rdf-syntax-ns# xmlns="http://www.xml.gr.jp/xmlns/dummy"/>
    </items>

    <textinput {http://www.w3.org/1999/02/22-rdf-syntax-ns#}resource="http://meerkat.oreillynet.com"></textinput>

  </channel>
*Island end

*Island start: http://purl.org/rss/1.0/
<image {http://www.w3.org/1999/02/22-rdf-syntax-ns#}about="http://meerkat.oreillynet.com/icons/meerkat-powered.jpg">
    <title>Meerkat Powered!</title>
    <url>http://meerkat.oreillynet.com/icons/meerkat-powered.jpg</url>
    <link>http://meerkat.oreillynet.com</link>
  </image>
*Island end

*Island start: http://purl.org/dc/elements/1.1/
<description>
      XML is placing increasingly heavy loads on the existing technical
      infrastructure of the Internet.
    </description>
*Island end

*Island start: http://purl.org/dc/elements/1.1/
<publisher>The O'Reilly Network</publisher>
*Island end

*Island start: http://purl.org/dc/elements/1.1/
<creator>Simon St.Laurent (mailto:simonstl@simonstl.com)</creator>
*Island end

*Island start: http://purl.org/dc/elements/1.1/
<rights>Copyright © 2000 O'Reilly & Associates, Inc.</rights>
*Island end

*Island start: http://purl.org/dc/elements/1.1/
<subject>XML</subject>
*Island end

*Island start: http://purl.org/rss/1.0/modules/company/
<name>XML.com</name>
*Island end

*Island start: http://purl.org/rss/1.0/modules/company/
<market>NASDAQ</market>
*Island end

*Island start: http://purl.org/rss/1.0/modules/company/
<symbol>XML</symbol>
*Island end

*Island start: http://purl.org/rss/1.0/
<item {http://www.w3.org/1999/02/22-rdf-syntax-ns#}about="http://c.moreover.com/click/here.pl?r123">
    <title>XML: A Disruptive Technology</title> 
    <link>http://c.moreover.com/click/here.pl?r123</link>
    <dummy namespaceName="http://purl.org/dc/elements/1.1/ xmlns="http://www.xml.gr.jp/xmlns/dummy"/>
    <dummy namespaceName="http://purl.org/dc/elements/1.1/ xmlns="http://www.xml.gr.jp/xmlns/dummy"/>
    <dummy namespaceName="http://purl.org/dc/elements/1.1/ xmlns="http://www.xml.gr.jp/xmlns/dummy"/>
    <dummy namespaceName="http://purl.org/dc/elements/1.1/ xmlns="http://www.xml.gr.jp/xmlns/dummy"/>
    <dummy namespaceName="http://purl.org/dc/elements/1.1/ xmlns="http://www.xml.gr.jp/xmlns/dummy"/>
    <dummy namespaceName="http://purl.org/rss/1.0/modules/company/ xmlns="http://www.xml.gr.jp/xmlns/dummy"/>
    <dummy namespaceName="http://purl.org/rss/1.0/modules/company/ xmlns="http://www.xml.gr.jp/xmlns/dummy"/>
    <dummy namespaceName="http://purl.org/rss/1.0/modules/company/ xmlns="http://www.xml.gr.jp/xmlns/dummy"/>
  </item>
*Island end

*Island start: http://purl.org/rss/1.0/modules/textinput/
<function>search</function>
*Island end

*Island start: http://purl.org/rss/1.0/modules/textinput/
<inputType>regex</inputType>
*Island end

*Island start: http://purl.org/rss/1.0/
<textinput {http://www.w3.org/1999/02/22-rdf-syntax-ns#}about="http://meerkat.oreillynet.com">
    <title>Search Meerkat</title>
    <description>Search Meerkat's RSS Database...</description>
    <name>s</name>
    <link>http://meerkat.oreillynet.com/</link>
    <dummy namespaceName="http://purl.org/rss/1.0/modules/textinput/ xmlns="http://www.xml.gr.jp/xmlns/dummy"/>
    <dummy namespaceName="http://purl.org/rss/1.0/modules/textinput/ xmlns="http://www.xml.gr.jp/xmlns/dummy"/>
  </textinput>
*Island end

*Island start: http://www.w3.org/1999/02/22-rdf-syntax-ns#
<RDF> 

  <dummy namespaceName="http://purl.org/rss/1.0/ xmlns="http://www.xml.gr.jp/xmlns/dummy"/>

  <dummy namespaceName="http://purl.org/rss/1.0/ xmlns="http://www.xml.gr.jp/xmlns/dummy"/>

  <dummy namespaceName="http://purl.org/rss/1.0/ xmlns="http://www.xml.gr.jp/xmlns/dummy"/> 

  <dummy namespaceName="http://purl.org/rss/1.0/ xmlns="http://www.xml.gr.jp/xmlns/dummy"/>

</RDF>
*Island end