Can One Size Fit All?

Traditionally, APIs for processing XML have been categorized according to whether they're designed for processing entire XML documents loaded in memory, such as the W3C DOM, or for processing XML in a streaming, forward-only fashion, such as SAX. However, these divisions do not fully represent the various classes of APIs for processing XML.

In a recent article entitled "A Survey of APIs and Techniques for Processing XML," I describe six primary methodologies for processing XML.
1.  Push-model APIs such as SAX
2.  Pull-model APIs such as the .NET Framework's XmlReader class
3.  Tree-model APIs such as DOM
4.  Cursor-model APIs such as the .NET Framework's XPathNavigator class
5.  Object-XML mapping technologies such as the .NET Framework's XmlSerializer class
6.  XML-specific languages such as XQuery

This list highlights that the range of considerations when choosing an API or technique for processing XML extends beyond forward-only access over XML streams versus random access over XML documents stored in memory. Other considerations include whether the XML being processed is used to represent semi-structured documents versus rigidly structured data, whether the XML is considered to be strongly or weakly typed, and ease of use of the API.

The purpose of this article is to explore whether a single API could be designed that satisfies the various needs that warrant the existence of six different categories of technologies for processing XML.

Rigidly Structured Data and Semi-Structured Documents
One of the main reasons for XML's rise to prominence as the lingua franca for information interchange is that, unlike prior data interchange formats, XML can easily represent both rigidly structured tabular data (e.g., relational data or serialized objects) and semi-structured data (e.g., office documents). However, applications that utilize XML typically produce or consume XML that is primarily either rigidly structured data or semi-structured documents. Several defining characteristics distinguish the two usage patterns.

Software applications are usually the primary consumers of XML documents that represent rigidly structured data. The content of such documents is meant primarily for machine processing, while the markup that labels it is targeted at human readers. XML configuration files, log files, and relational database dumps are examples of rigidly structured data that are meant primarily for machine processing. The markup in these documents is mainly of use to human readers who are either editing or debugging an XML application. Such XML documents typically comprise elements and attributes where only the deepest subelements (the leaf nodes) contain character data. Although XML considers the order of elements to be significant, the order of sibling elements in such documents is often not important to the semantics of the document (e.g., the order of the rows in a database dump is often not significant). The following is an example of an XML document representing rigidly structured data:

<items>
  <compact-disc>
    <price>16.95</price>
    <artist>Nelly</artist>
    <title>Nellyville</title>
  </compact-disc>
  <compact-disc>
    <price>17.55</price>
    <artist>Baby D</artist>
    <title>Lil Chopper Toy</title>
  </compact-disc>
</items>

Human readers are usually the primary consumers of semi-structured XML documents. In this case, the XML markup helps software applications process the data. Web pages and business documents are examples of semi-structured documents that are meant primarily for human consumption. Their markup is mainly of use to programs that are processing or displaying the information within the documents. Such XML documents typically comprise elements and attributes where character data appears alongside subelements, and character data is not confined to the leaf nodes. The interleaving of character data with subelements is often described as mixed content. The order of elements in semi-structured documents is often significant (e.g., the order of chapter elements within a book element matters). Features such as entities, processing instructions, and comments are more likely to be used in semi-structured documents to aid authors and readers of the XML. The following is an example of a typical semi-structured XML document:

<p xmlns="http://www.w3.org/1999/xhtml"> If customer is not
available at the address then attempt to
leave package at one of the
following locations listed in order of
which should be attempted first
<ol>
<li>Next Door</li>
<li>Front Desk</li>
<li>On Doorstep</li>
</ol>
<b>Note</b> Remember to leave
a note detailing where to
pick up the package.
</p>

In reality, many uses of XML fall somewhere in the middle, where there is an island of rigid structure within a semi-structured document or an area with "open content" in a rigidly structured document. XML easily accommodates these scenarios because the two models are not mutually exclusive; a single document can combine both.

The Relationship Between Data Typing and XML Usage Patterns
The different XML usage patterns, semi-structured documents and rigidly structured data, typically have different requirements when it comes to accessing the content within XML documents as typed data.

Consumers of XML documents that contain rigidly structured data often want to consume the documents as strongly typed XML. Specifically, such applications tend to map the elements, attributes, and character data within the XML document to programming language primitives and data structures so that they can better perform operations on them. This mapping is usually done using either an XML Schema or a mapping language. Listing 1 is an example of a W3C XML Schema document that describes the strongly typed view of the XML document.
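As a rough illustration of this kind of mapping (not the article's Listing 1, and written by hand rather than driven by a schema), the compact-disc document from earlier can be loaded into typed Python objects. The CompactDisc class, the load_discs function, and the choice of Decimal for the price are assumptions made for the example.

```python
from dataclasses import dataclass
from decimal import Decimal
import xml.etree.ElementTree as ET

ITEMS_XML = """<items>
  <compact-disc>
    <price>16.95</price>
    <artist>Nelly</artist>
    <title>Nellyville</title>
  </compact-disc>
  <compact-disc>
    <price>17.55</price>
    <artist>Baby D</artist>
    <title>Lil Chopper Toy</title>
  </compact-disc>
</items>"""


@dataclass
class CompactDisc:
    """Hypothetical strongly typed view of one compact-disc element."""
    price: Decimal
    artist: str
    title: str


def load_discs(xml_text):
    """Map each compact-disc element to a CompactDisc object."""
    root = ET.fromstring(xml_text)
    return [
        CompactDisc(
            price=Decimal(cd.findtext("price")),  # character data -> decimal
            artist=cd.findtext("artist"),
            title=cd.findtext("title"),
        )
        for cd in root.findall("compact-disc")
    ]


discs = load_discs(ITEMS_XML)
total = sum(d.price for d in discs)  # arithmetic works because price is typed
print(discs[0].artist, total)
```

Once the data is typed, the consumer no longer needs to know that the prices arrived as character data inside XML elements.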

Consumers of semi-structured XML documents typically want to consume the documents as weakly typed or untyped content presented as an XML data model. In such cases XML APIs that emphasize an XML-centric data model, such as DOM and SAX, are used to process the document. An XML-centric view of such semi-structured documents is preferable to an object-centric view because such documents typically rely on features peculiar to XML, such as mixed content, processing instructions, and significant element order.
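To see why mixed content resists an object-centric view, consider a shortened version of the delivery-instructions paragraph (namespace omitted for brevity) read through Python's ElementTree, used here only as a convenient XML-centric API. The pieces list is an illustrative name; it records the interleaving of text and elements that a plain field-per-element object would lose.

```python
import xml.etree.ElementTree as ET

doc = (
    "<p>Leave package at one of:"
    "<ol><li>Next Door</li><li>Front Desk</li></ol>"
    "<b>Note</b> Remember to leave a note.</p>"
)
p = ET.fromstring(doc)

# Walk the children of <p> in order, recording text and elements as they occur.
pieces = [("text", p.text)]
for child in p:
    pieces.append(("element", child.tag))
    if child.tail:  # character data that follows the child element
        pieces.append(("text", child.tail))

for kind, value in pieces:
    print(kind, repr(value))
```

The order of the entries matters: moving the trailing text before the b element would change the meaning of the paragraph, which is exactly the property an XML-centric data model preserves.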

Choosing a Data Model
The first question to ask is whether it's feasible for an API that is meant to process both rigidly structured XML data and semi-structured XML documents to be based on the same XML data model. XML documents containing rigidly structured data typically consist of XML elements and attributes with character data. Semi-structured XML documents also typically consist of elements and attributes with character data but also utilize other aspects of XML, such as processing instructions, comments, CDATA sections, and entities. However, given that CDATA sections are just a syntactic shortcut around encoding certain types of text and entities are just placeholders for elements or character data, they actually do not have to be represented in the data model. XML documents that contain rigidly structured data as well as semi-structured documents may utilize XML namespaces, which should also be accounted for in the data model.
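The claim that CDATA sections and character escapes need not appear in the data model can be checked with a small experiment. The sketch below uses Python's ElementTree, which discards both distinctions; the note element and its text are made up for the example.

```python
import xml.etree.ElementTree as ET

# The same character data written two ways: once in a CDATA section,
# once with the predefined entity references.
with_cdata = ET.fromstring("<note><![CDATA[price < 20 & in stock]]></note>")
with_escapes = ET.fromstring("<note>price &lt; 20 &amp; in stock</note>")

same_text = with_cdata.text == with_escapes.text
print(same_text, repr(with_cdata.text))
```

Both documents yield the same text node, so an application working at the data-model level cannot tell (and does not need to know) which syntax the author used.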

There are already a number of existing abstractions for XML documents, including the XML DOM, the XML infoset, and the XPath 1.0 data model. Of these the XPath 1.0 data model best meets the requirements set forth in the previous paragraph. The XPath 1.0 data model provides a simple and consistent view of an XML document that is loosely coupled to the text-based nature of the XML 1.0 recommendation. The fact that the XPath 1.0 data model ignores certain aspects of the XML 1.0 recommendation makes it easier to map other domain models to the XPath 1.0 data model. For instance, information such as which quotation characters are used in an attribute or whether character data was directly entered or represented as an entity is not directly exposed in the XPath 1.0 data model. Thus, when exposing a relational database, file system, or in-memory object graph as XML it is easier to do so if the API doesn't require you to expose information that is only pertinent to XML text documents. Examples of such "virtual XML" views of relational and object-oriented data based on the XPath 1.0 data model are Microsoft's SQLXML and the ObjectXPathNavigator on MSDN, respectively.

There is one limitation of the XPath 1.0 data model that makes it less than ideal for use in representing rigidly structured data within XML documents: the lack of support for strong typing. The XQuery and XPath 2.0 data model is the next iteration of the XPath 1.0 data model. The data model is the XPath 1.0 data model with the addition that the data types associated with elements and attributes can be identified using an expanded name (i.e., the xs:QName type). The ability to identify the data type of nodes via the namespace URI and a local name (i.e., an expanded name) provides a loosely coupled mechanism for supporting W3C XML Schema data types and potentially any other type system in which individual types can be identified by an expanded name.
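A minimal sketch of this idea, assuming a simplified node with only a name, a string value, and an optional type annotation: the type is identified by an expanded name (namespace URI plus local name), and a lookup table turns the string value into a native value. TypedNode, CONVERTERS, and typed_value are hypothetical names, and only three W3C XML Schema types are covered.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional, Tuple

XS = "http://www.w3.org/2001/XMLSchema"
ExpandedName = Tuple[str, str]  # (namespace URI, local name)


@dataclass
class TypedNode:
    """Hypothetical node: XPath 1.0 style data plus a type annotation."""
    name: ExpandedName
    string_value: str
    type_name: Optional[ExpandedName] = None  # None means untyped


# Any type system whose types have expanded names can plug in here.
CONVERTERS = {
    (XS, "decimal"): Decimal,
    (XS, "integer"): int,
    (XS, "string"): str,
}


def typed_value(node):
    """Return the native value, falling back to the string value."""
    convert = CONVERTERS.get(node.type_name)
    return convert(node.string_value) if convert else node.string_value


price = TypedNode(("", "price"), "16.95", (XS, "decimal"))
artist = TypedNode(("", "artist"), "Nelly")  # untyped content
print(typed_value(price), typed_value(artist))
```

Because the node carries only a name for its type, the data model stays loosely coupled to any particular schema language.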

Thus, we have arrived at the XQuery/XPath 2.0 data model as the data model suitable for an API that is meant for processing both rigidly structured XML data and semi-structured XML documents.

A Single Model for Forward-Only Access to XML
In "A Survey of APIs and Techniques for Processing XML," I pointed out that pull-model APIs can handle streaming, forward-only access to XML just as well as push-model APIs can. Thus, both categories of XML APIs can be collapsed into a single one: streaming, forward-only access to XML.
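For comparison with the pull-model approach, here is what the push model looks like in practice. This is a sketch using Python's xml.sax module rather than any .NET API; the FirstDiscHandler class and its bookkeeping fields are assumptions made for the example. The parser drives the process and calls back into the handler, which must keep its own state to know where it is in the document.

```python
import xml.sax

ITEMS_XML = """<items>
  <compact-disc>
    <price>16.95</price><artist>Nelly</artist><title>Nellyville</title>
  </compact-disc>
  <compact-disc>
    <price>17.55</price><artist>Baby D</artist><title>Lil Chopper Toy</title>
  </compact-disc>
</items>"""


class FirstDiscHandler(xml.sax.ContentHandler):
    """Collects the artist and title of the first compact-disc element."""

    def __init__(self):
        super().__init__()
        self.result = {}
        self._disc_count = 0
        self._current = None  # name of the element being captured, if any
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "compact-disc":
            self._disc_count += 1
        elif self._disc_count == 1 and name in ("artist", "title"):
            self._current = name
            self._buffer = []

    def characters(self, content):
        if self._current is not None:
            self._buffer.append(content)  # text may arrive in several chunks

    def endElement(self, name):
        if name == self._current:
            self.result[name] = "".join(self._buffer)
            self._current = None


handler = FirstDiscHandler()
xml.sax.parseString(ITEMS_XML.encode("utf-8"), handler)
print(handler.result)
```

Note that the handler cannot tell the parser to stop after the first disc without raising an exception; the whole document is pushed through it regardless.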

Listing 2 is an example of using the pull-based XmlReader class in the .NET Framework to obtain the artist name and title of the first compact disc in an items element (Listings 2-7 can be found at www.sys-con/xml/sourcec.cfm).
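The listing itself is not reproduced here. As a stand-in, the following Python sketch performs the same task with the standard library's xml.dom.pulldom module, which plays a role loosely comparable to XmlReader: the application pulls events from the parser in a loop and stops when it has what it needs. The first_disc function name is an assumption.

```python
from xml.dom import pulldom

ITEMS_XML = """<items>
  <compact-disc>
    <price>16.95</price><artist>Nelly</artist><title>Nellyville</title>
  </compact-disc>
  <compact-disc>
    <price>17.55</price><artist>Baby D</artist><title>Lil Chopper Toy</title>
  </compact-disc>
</items>"""


def first_disc(xml_text):
    """Pull events until the first compact-disc element has been read."""
    result = {}
    events = pulldom.parseString(xml_text)
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName in ("artist", "title"):
            events.expandNode(node)  # read this element's content into the node
            result[node.tagName] = "".join(
                child.data
                for child in node.childNodes
                if child.nodeType == child.TEXT_NODE
            )
        elif event == pulldom.END_ELEMENT and node.tagName == "compact-disc":
            break  # the consumer decides when to stop reading
    return result


print(first_disc(ITEMS_XML))
```

Unlike a push-model handler, the loop here owns the control flow, so stopping early is an ordinary break.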

In the same article I pointed out that, on close inspection, a pull-model parser is a cursor that is restricted to moving forward and never back. Listing 3 is an example that utilizes the .NET Framework's XPathNavigator class to obtain the artist name and title of the first compact disc in an items element.

From the examples in Listings 2 and 3 it doesn't seem that there is much difference between accessing the contents of an XML document using a cursor-model API and using a pull-based API. But looks can be deceiving. In simple cases there is not much difference between the two, but the cursor-model code becomes slightly more difficult to write in complex cases.

The primary programming idiom when using a pull-based parser is to create a loop that continually reads from the XML document until the end of the document is reached and to act solely upon items of interest as they are seen. The same effect can be achieved using a traditional cursor-model API as shown in Listing 4.

The output of both DumpTree() methods when passed one of the XML fragments from earlier in the article is shown in Listing 5.

From Listing 5 it can be seen that a cursor-model API can be used to walk all the nodes in an XML document in document order in much the same way as a push- or pull-based API. However, in the example above, the code using the .NET Framework's cursor-based XPathNavigator class is more cumbersome than equivalent code using the XmlReader class. This has less to do with the nature of cursor-based APIs and more to do with the fact that the XPathNavigator class does not have helper methods that make it friendly towards traversing nodes in document order.

To make the .NET Framework's XPathNavigator more suitable as an API for pull-based processing of XML, you could introduce a base class called ForwardOnlyPathNavigator, which would possess only the forward-only access methods from the XPathNavigator and possibly an additional MoveToNextInDocumentOrder() method that would make it equivalent to most pull-based APIs. This would then unify the streaming, forward-only access model for XML documents with the cursor model.
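To make the proposal concrete, here is a rough Python sketch of such a forward-only cursor. It is not the .NET class described above: the ForwardOnlyCursor name, its method names, and the decision to visit only element nodes are assumptions, and it walks an in-memory ElementTree for simplicity where a real implementation would read from a stream.

```python
import xml.etree.ElementTree as ET


class ForwardOnlyCursor:
    """Hypothetical cursor that can only advance through element nodes."""

    def __init__(self, root):
        self.current = root
        self._ancestors = []  # (parent element, index of current in parent)

    @property
    def name(self):
        return self.current.tag

    def move_to_first_child(self):
        if len(self.current) == 0:
            return False
        self._ancestors.append((self.current, 0))
        self.current = self.current[0]
        return True

    def move_to_next(self):
        """Move to the next sibling element, if there is one."""
        if not self._ancestors:
            return False
        parent, index = self._ancestors[-1]
        if index + 1 >= len(parent):
            return False
        self._ancestors[-1] = (parent, index + 1)
        self.current = parent[index + 1]
        return True

    def move_to_next_in_document_order(self):
        """Advance one node in document order; False once exhausted."""
        if self.move_to_first_child():
            return True
        while self._ancestors:
            if self.move_to_next():
                return True
            # No sibling left: finish this parent and look for its successor.
            self.current = self._ancestors.pop()[0]
        return False


def names_in_document_order(root):
    cursor = ForwardOnlyCursor(root)
    names = [cursor.name]
    while cursor.move_to_next_in_document_order():  # same loop as a pull parser
        names.append(cursor.name)
    return names


root = ET.fromstring(
    "<items><compact-disc><artist/><title/></compact-disc>"
    "<compact-disc><artist/></compact-disc></items>"
)
print(names_in_document_order(root))
```

With the extra method, client code reduces to the familiar read-until-done loop of a pull-based API, while the finer-grained moves remain available when the caller wants to skip a subtree.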

A Single Model for Random Access to XML
In "A Survey of APIs and Techniques for Processing XML," I pointed out that cursor-model APIs could be used to traverse in-memory XML documents just as well as tree-model APIs. Cursor-model APIs have an added advantage over tree-model APIs in that an XML cursor need not require the heavyweight interface of a traditional tree-model API where every significant token in the underlying XML must map to an object.

Listing 6 is an example of using the XmlDocument class in the .NET Framework to obtain the artist name and title of the first compact disc in an items element.

Listing 6 shows a common idiom when accessing XML through a tree-model API: the nodes of interest are requested through a query mechanism, then processed as needed. A similar usage pattern is evident in cursor-based APIs, as Listing 7 shows using the .NET Framework's XPathNavigator class.

From these examples it can be seen that a cursor-model API is a satisfactory access mechanism for processing XML documents in memory.
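The query-then-process idiom can be sketched in Python with ElementTree, which stands in here for a tree-model API; this is not a translation of the article's listings. The whole document is loaded into memory, the node of interest is selected with a path expression, and its children are then read in whatever order the application likes.

```python
import xml.etree.ElementTree as ET

ITEMS_XML = """<items>
  <compact-disc>
    <price>16.95</price><artist>Nelly</artist><title>Nellyville</title>
  </compact-disc>
  <compact-disc>
    <price>17.55</price><artist>Baby D</artist><title>Lil Chopper Toy</title>
  </compact-disc>
</items>"""

root = ET.fromstring(ITEMS_XML)        # entire document held in memory
first = root.find("./compact-disc")    # query: first matching child element

# Random access: title is read before artist, regardless of document order.
title = first.findtext("title")
artist = first.findtext("artist")
print(artist, "-", title)
```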

Strongly Typed XML
A large number of consumers of rigidly structured XML data tend to prefer it to be strongly typed so it can be mapped to objects and data structures native to the programming environment. The growing popularity of Object-XML mapping technologies such as JAXB, the .NET Framework's XmlSerializer, and Castor is a testament to this trend. In such cases the consumer is usually not interested in the fact that the data is provided as XML as long as it can be mapped to objects and data structures in the target programming environment.

You would then be justified in expecting that such consumers would be uninterested in XML APIs. However, this is not the case. A number of people have seen the growing benefits of being able to access objects as XML infosets when necessary since it gives them access to a wide range of technologies for processing XML such as rich queries using XPath (this is the basis of the ObjectXPathNavigator described in an Extreme XML column on MSDN). In such cases, a cursor-model API that provides an XML view of an object graph turns out to be quite beneficial. This approach was taken by the aforementioned ObjectXPathNavigator as well as BEA's XML Beans technology.
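As a simplified illustration of querying objects as XML, the sketch below builds an XML view of a list of Python objects and runs a path query over it. Unlike ObjectXPathNavigator, which exposes a live cursor over the object graph, this version copies the objects into an ElementTree up front; the CompactDisc class and the as_xml function are assumptions made for the example.

```python
from dataclasses import dataclass, fields
import xml.etree.ElementTree as ET


@dataclass
class CompactDisc:
    price: str
    artist: str
    title: str


def as_xml(discs):
    """Build an XML view: one element per object, one child per field."""
    root = ET.Element("items")
    for disc in discs:
        cd = ET.SubElement(root, "compact-disc")
        for f in fields(disc):
            ET.SubElement(cd, f.name).text = str(getattr(disc, f.name))
    return root


catalog = [
    CompactDisc("16.95", "Nelly", "Nellyville"),
    CompactDisc("17.55", "Baby D", "Lil Chopper Toy"),
]

view = as_xml(catalog)
# A declarative query over what started life as plain objects.
titles = [t.text for t in view.findall("compact-disc[artist='Baby D']/title")]
print(titles)
```

The query replaces what would otherwise be a hand-written loop and comparison, which is the benefit the object-as-XML approach is after.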

In some cases this means the ability to nest cursors is important. For instance, many XML Schemas written using the W3C XML Schema Definition Language (XSD) use the wildcards (xs:any and xs:anyAttribute) to enable extensibility of the XML messages being sent. This often leads to some parts of the document being strongly typed while others are untyped. The XmlSerializer in the .NET Framework maps such untyped content to one or more instances of the XmlNode class. Given that an instance of XmlNode can itself provide a cursor over its contents, the cursor over an object that contains one or more XmlNode objects as fields or properties needs to know how to handle nested items that provide their own XML cursors.
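The following Python sketch shows the shape of the problem, under the assumption that a compact-disc element may carry extension elements the schema does not describe. Known children are mapped to typed fields, while anything else is kept as a raw XML node, much as XmlSerializer keeps XmlNode instances. The extensions field, the load function, and the urn:example:reviews namespace are all invented for the example.

```python
from dataclasses import dataclass, field
from typing import List
import xml.etree.ElementTree as ET

KNOWN = ("price", "artist", "title")


@dataclass
class CompactDisc:
    price: str
    artist: str
    title: str
    # Untyped wildcard content is retained as XML nodes, not mapped to fields.
    extensions: List[ET.Element] = field(default_factory=list)


def load(xml_text):
    elem = ET.fromstring(xml_text)
    known = {name: elem.findtext(name) for name in KNOWN}
    extras = [child for child in elem if child.tag not in KNOWN]
    return CompactDisc(extensions=extras, **known)


disc = load(
    "<compact-disc><price>16.95</price><artist>Nelly</artist>"
    "<title>Nellyville</title>"
    '<review xmlns="urn:example:reviews"><stars>4</stars></review>'
    "</compact-disc>"
)

# The typed part is read as an object; the untyped part is navigated as XML.
stars = disc.extensions[0].findtext("{urn:example:reviews}stars")
print(disc.artist, stars)
```

A cursor over the CompactDisc object would have to hand off to the XML node's own navigation when it reaches the extensions field, which is the nesting requirement described above.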

Conclusion
The purpose of this article was to explore whether it was possible for a single API to satisfy the major usage scenarios for consumers of XML documents. When I first started this article I assumed the answer was no, but once I actually started to investigate the issue I was surprised to find out that my initial impressions were mistaken. It is actually possible for a cursor-model API based on the XPath/XQuery data model to satisfy the needs of the users of several categories of XML data access technologies. The cursor-model API would be factored into forward-only and random-access versions to balance the needs of those who want streaming access to XML versus those processing entire XML documents in memory. A cursor-model API is also a nice complement to Object-XML mapping technologies because it enables users to transform the data within an XML document into primitives and data structures within their target programming language while retaining the ability to access the data as XML nodes.

The primary criticism of such an API is that it would be a compromise across widely differing usage scenarios and thus may not be optimized for the specifics of any given scenario. If such an API were intended to replace existing models for processing XML, it would have to be designed carefully to avoid two failure modes: offering too little functionality by settling for the lowest common denominator, and offering too much by trying to include all the functionality of existing API models, which would make it bloated and difficult to use.

It will be interesting to see where the future takes us.

Reference

  • Obasanjo, Dare. (2003). "A Survey of APIs and Techniques for Processing XML." www.xml.com/pub/a/2003/07/09/xmlapis.html

About the Author
Dare Obasanjo is a Program Manager at Microsoft where he works on the Contacts team. The Contacts team provides back-end support for Windows Live Messenger, Windows Live Spaces, Windows Live Expo, and related services. Obasanjo is also known for RSS Bandit, a popular .NET-based RSS reader.
