Desperately Seeking...Help for XML Schema

W3C XML Schema confronts its users with the Russian Doll, Salami Slice, and Chameleon schema designs discussed at www.xfront.com, the dizzying array of elements and attributes (and their complex interactions) found in the XML Schema specification, and the multitude of UML stereotypes at www.xmlmodeling.com. Without a non-XML syntax and a set of best practices and modeling techniques that are easy to use and understand, it will disappear into oblivion.

There's been a lot of interest in XML Schema, and well there should be. A lot of very smart people put the XML Schema spec together, and I'm sure it took an amazing amount of effort. But if I can be frank for a moment, XML Schema is rocket science. From noNamespaceSchemaLocation to block attributes and substitutionGroups, the XML Schema syntax is simply too complex. STOP the MADNESS and just say NO! While I fault the XML Namespaces Recommendation for many of the complexities haunting the XML Schema spec today, many of the wounds are self-inflicted. One (small) case in point is complexType inheritance.

complexType Inheritance Is Simply Broken
One of the main object-oriented features of XML Schema is complexType inheritance. Most XML Schema tutorials, including introductory ones like the XML Schema Primer, have examples that show off this feature, but complexType inheritance is broken. In addition to the syntax being overly complex (extension, restriction, simpleType, complexType, simpleContent, complexContent, block, abstract), it's not useful for type reusability.

Why? complexType inheritance has two major limitations:

  1. You can extend a base type definition only "to the right."
  2. You can extend the definition only with a sequence.

Here's an example that illustrates these problems:

Let's assume I have a complexType called XType.

<xsd:complexType name="XType">
  <xsd:sequence>
    <xsd:element name="a" type="xsd:string" />
    <xsd:element name="b" type="xsd:string" />
    <xsd:element name="c" type="xsd:string" />
  </xsd:sequence>
</xsd:complexType>

That is, XType has a content model of (a,b,c). For simplicity I'll use DTD syntax here to demonstrate the problem. (It's unfortunate that I need to resort to DTD syntax to discuss a problem with XML Schema...but I digress.)

Now let's say I wanted to reuse the parent type XType, and model several "subtypes" that correspond to the following content models:
1. (a,b,c),d
2. (a,b,c)|d
3. ((a,b,c),d)+
4. (a,b,c)*
5. ((a,b,c),d) | e
6. d, (a,b,c)
7. d, (a,b,c)+,e

Unfortunately, only No. 1 above can be modeled using complexType inheritance. That is, a child type can extend the parent's type only "to the right." Moreover, the child type can extend the parent type only with a sequence...that is, a comma between the parent and child definition. (Even No. 3 above, which uses a sequence, is invalid because of the grouping of the child's model with the parent's model.)
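
For illustration, here is roughly what No. 1 looks like as a derivation by extension. This is only a sketch: the type name XdType is made up for this example, and it assumes the same xsd prefix as XType above and a schema with no target namespace (so the base type can be referenced without a prefix).

```xml
<!-- Hypothetical subtype: the effective content model is (a,b,c),d -->
<xsd:complexType name="XdType">
  <xsd:complexContent>
    <xsd:extension base="XType">
      <!-- Whatever is declared here is appended after the parent's (a,b,c) -->
      <xsd:sequence>
        <xsd:element name="d" type="xsd:string" />
      </xsd:sequence>
    </xsd:extension>
  </xsd:complexContent>
</xsd:complexType>
```

Nothing in this syntax lets the child say that d should come before (a,b,c), or that d should be an alternative to it.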

Of course, we can support Nos. 1-7 above, and get all the reusability we want with XML Schema model groups. But the problem here is that I can't just throw away the complexType syntax and replace it with model group syntax. If I use model groups, I have to complicate my XML Schema document even more. This is probably why many tutorials on XML Schema avoid discussing model groups altogether, which is unfortunate, because model groups are the best way to get flexibility and reuse with XML Schema. Another restriction of complexType inheritance is that it doesn't support multiple inheritance. Model groups, however, can be used to coalesce any number of other model groups in reusable combinations.
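
As a rough sketch of the model group approach, the fragment below puts (a,b,c) into a named group and reuses it for No. 6 and No. 2. The names XGroup, DThenXType, and XOrDType are invented for this example, and the fragment assumes a schema with no target namespace so the group can be referenced without a prefix.

```xml
<!-- Reusable model group holding the (a,b,c) content model -->
<xsd:group name="XGroup">
  <xsd:sequence>
    <xsd:element name="a" type="xsd:string" />
    <xsd:element name="b" type="xsd:string" />
    <xsd:element name="c" type="xsd:string" />
  </xsd:sequence>
</xsd:group>

<!-- No. 6: d, (a,b,c) -->
<xsd:complexType name="DThenXType">
  <xsd:sequence>
    <xsd:element name="d" type="xsd:string" />
    <xsd:group ref="XGroup" />
  </xsd:sequence>
</xsd:complexType>

<!-- No. 2: (a,b,c) | d -->
<xsd:complexType name="XOrDType">
  <xsd:choice>
    <xsd:group ref="XGroup" />
    <xsd:element name="d" type="xsd:string" />
  </xsd:choice>
</xsd:complexType>
```

The other cases should follow the same pattern, with minOccurs and maxOccurs on the group reference or on an enclosing sequence covering the * and + variants.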

But wait a minute, you say, isn't complexType inheritance useful for other things besides reuse, like "polymorphism" or base type "restriction"?

Well, yes, that's true, but they're also broken.

To achieve base type restriction with XML Schema, we have to duplicate the entire content model of the parent type in the child's type definition. Yes, that's right, you have to copy and paste the parent's definition into the child!
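
Here is a sketch of what that looks like, assuming the XType definition from earlier and a schema with no target namespace. The type name RestrictedXType is invented, and narrowing c from xsd:string to xsd:token is just one example of a change that restriction permits.

```xml
<!-- Hypothetical restriction of XType: only the type of c changes -->
<xsd:complexType name="RestrictedXType">
  <xsd:complexContent>
    <xsd:restriction base="XType">
      <xsd:sequence>
        <!-- a and b are unchanged, but must be repeated anyway -->
        <xsd:element name="a" type="xsd:string" />
        <xsd:element name="b" type="xsd:string" />
        <!-- the one actual change: xsd:token is derived from xsd:string -->
        <xsd:element name="c" type="xsd:token" />
      </xsd:sequence>
    </xsd:restriction>
  </xsd:complexContent>
</xsd:complexType>
```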

Say whaaat?!!??

To achieve polymorphism with XML Schema, we have to place xsi:type attributes in our instance documents. This means that document authors and designers are forced to use attributes dictated by the W3C in their business vocabularies! It also forces much complexity on the programmers trying to process these documents. Doesn't this fly in the face of what XML is all about?
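
As an illustration, suppose a schema with no target namespace declares an element x of type XType, along with a derived type XdType that extends XType with a trailing d element. Both the element name x and the type name XdType are hypothetical. An instance that wants the derived content model would then look something like this:

```xml
<!-- x is declared as XType; xsi:type asks the validator to use XdType instead -->
<x xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xsi:type="XdType">
  <a>first</a>
  <b>second</b>
  <c>third</c>
  <d>only allowed because of xsi:type</d>
</x>
```

The namespace declaration and the xsi:type attribute are there purely for the schema processor; neither says anything about the business data.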

Desperately Seeking...
Easy-to-understand best practices and modeling techniques

While I greatly appreciate all the effort on best practices and XML modeling at Web sites like www.xfront.com, www.xmlmodeling.com, and www.xmlpatterns.com, I feel they miss the mark. What we need is a set of simplified XML Schema best practices and modeling ideas. We need best practices and modeling initiatives that rein in, rather than accentuate, the numerous complexities of the XML Schema specification.

Desperately Seeking...
Non-XML syntax for XML Schema

We also need a compact syntax for XML Schema. Let's take a look at an example, a sample XML instance document for a "course":

<course name="XML Schema, Distilled">
  <module name="Overview of Model Groups">
    <lesson name="Building a Model Group">
      <slide name="What is a Model Group">
        <p> Model Groups are quite <b>powerful</b>
        and <i>easy</i> to understand.
        </p>
      </slide>
      ...
    </lesson>
  </module>
</course>

And here's a complete compact schema document that can be used to ultimately validate the XML instance document above (using James Clark's compact syntax for RELAX NG):

start = course
course = element course { name, module+ }
module = element module { name, lesson+ }
lesson = element lesson { name, slide* }
slide = element slide { name, content }
name = attribute name { text }
content = ( contentTags )*
contentTags = element (p | b | i )
  { ( text | contentTags )* }

One thing to note is that the schema above is seven lines shorter than the equivalent DTD...and look, mom, no parameter entities!

Desperately Seeking...
Help for XML Schema

Hopefully, the smart XML Schema folks are working on a non-XML syntax for XML Schema. Perhaps James Clark's syntax can be leveraged here, without even exposing so-called "features" like complexType inheritance, concentrating on creating "easier to use and understand" XML Schemas. In addition, let's create a set of XML Schema best practices and modeling tools built with ease of use in mind. These should be engineered to hide, rather than accentuate, the complexities of XML Schema.

SIDEBAR
The World Wide Web Consortium (W3C)
The W3C was created to lead the Web to its full potential by developing common protocols that promote its evolution and ensure its interoperability. It is an international industry consortium jointly run by the MIT Laboratory for Computer Science (MIT LCS) in the U.S., the National Institute for Research in Computer Science and Control (INRIA) in France, and Keio University in Japan. Its services include a repository of information about the Web for developers and users, and various prototype and sample applications to demonstrate the use of new technology. To date, nearly 500 organizations are members of the Consortium. For more information see www.w3.org/.

More Stories By Tom Gaven

Tom Gaven lives in northern Virginia, and has developed and delivered training on many different technologies. He has authored over 30 courses, including Assembler, C, C++, Java, OS/2, and Windows. He also authored MindQ's Developer Training for Java program. In the last 2 years, he has been architecting and developing products with XML, XSLT, XML Schema, RELAX NG, Java, and Schematron. Tom is currently working on tools and courseware to make XML easier to use. See http://www.xmldistilled.com for more information.

Most Recent Comments
Nauman Malik 06/20/02 12:00:00 PM EDT

Tom, I finally found the time for a conscientious reading. As always, you make some sound points (I didn't realize that 'sequence' is a requirement of derivation by extension; will be checking that one myself), and that classic Gaven wit shows through in your writing (your slides have it too). Yes, XSD is a complex beast and much needs to be fixed in Version 1.0. Version 1.0, Tom. Version 1.0. Can we not get behind this great effort to help improve it? Because even you acknowledge that there is much good and useful in XML Schema.

The brevity of a non-XML syntax for a schema language is alluring but much is lost as well, the biggest loss being the ability to use an XML aware tool on the schema itself, to transform it, to render it, to intelligently document it. I'm not yet convinced that a regression to a non-XML syntax (at least in the direction the XML train is headed) is a good thing.

Presently, I'm content with using a simplified subset of XSD as a base and augmenting its limitations with alternate technologies such as XSLT, Schematron, etc. AND LETTING THE 'SMART' FOLKS AT THE XSD WORKING GROUP KNOW ABOUT IT.

Nevertheless, enjoyed the article and hope to read more from you in the future.

--Nauman--

Prashanth Rao 06/10/02 10:12:00 PM EDT

Reading this article makes me think that this is the beginning of a much larger critique of XML Schema. Tom, keep your illuminating articles flowing.
Good job!

Bruce Peat 06/09/02 02:50:00 PM EDT

Excellent article - insightful!