XML in Transit: Encoding Data
By: Simeon Simeonov
Dec. 21, 2000 12:00 AM
I just came back from the first face-to-face meeting of the W3C working group on XML Protocol (is it just me, or is the name somewhat odd-sounding?), and I'm wondering what topics to exclude from this column. Yes, that's right - exclude. Encoding data in XML is a difficult topic for many reasons. First, it's one of those technical subjects in which you need to look at lots of XML instance/schema/DTD snippets. Second, the devil is very much in the details and there are lots of them. Last but not least, there are as many ways to encode data in XML as there are data encoding needs. With this caveat, let's dive in. In keeping with the spirit of the column, we'll touch on the issues that are most relevant to XML protocols.
Have Protocol, Will Move Data
<SOAP-ENV:Envelope
There are many data transport scenarios and many possible data encoding styles that can be used with them. To put some structure to the discussion, think of the decision space as a choice tree. A choice tree has yes/no questions at its nodes and outcomes at its leaves (see Figure 1).
XML Data
There's a catch.... The problem has to do with a seldom-considered but important aspect of XML - the uniqueness rule for ID attributes. The values of attributes of type ID must be unique in an XML instance so that the elements with these attributes can be conveniently referred to using attributes of type IDREF (following code snippet). (For more information on the uses of ID/IDREF read "Eliminating Redundancy in XML Using ID/IDREF" [XML-J, Vol. 1, issue 4].)
<Target id="mainTarget"/>
If your data doesn't use ID attributes you can include it inline (textually) in the XML protocol message under a separate namespace. However, if you do use ID attributes you'll run the risk of violating the uniqueness rule. For example, in the following code both message elements have the same id. This makes the document invalid XML. And no, namespaces do not address the issue. In fact, the problems are so serious that nothing short of a change in the core XML specification and in most XML processing tools can change the status quo. Don't wait for this to happen.
<message id="msg-1">
There are two ways to work around the problem. If no one ever externally references specific IDs within the protocol message data, your XML protocol toolset can automatically rewrite the IDs and references to them as you include the XML inside the message (see code below). This will give you the benefits described above at the cost of some extra processing and a slight deterioration in readability due to the machine-generated IDs.
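To make the rewriting idea concrete, here is a minimal Python sketch. It isn't taken from any particular XML protocol toolset. It assumes the ID attribute is called id and that references live in a hypothetical ref attribute; a real tool would learn which attributes are of type ID and IDREF from the DTD or schema. The gen-1- prefix and the part, pointer, and body element names are made up for illustration.

```python
import copy
import xml.etree.ElementTree as ET


def rewrite_ids(payload, prefix, id_attr="id", ref_attr="ref"):
    """Return a copy of payload whose IDs and references carry a unique prefix.

    id_attr and ref_attr are assumptions: ElementTree does not read DTDs,
    so it cannot tell which attributes are really of type ID or IDREF.
    """
    clone = copy.deepcopy(payload)

    # First pass: give every ID a machine-generated replacement.
    mapping = {}
    for elem in clone.iter():
        old = elem.get(id_attr)
        if old is not None:
            mapping[old] = f"{prefix}{old}"
            elem.set(id_attr, mapping[old])

    # Second pass: repoint references that targeted a rewritten ID.
    for elem in clone.iter():
        ref = elem.get(ref_attr)
        if ref in mapping:
            elem.set(ref_attr, mapping[ref])
    return clone


# The protocol message and the data to embed both use the ID "msg-1".
envelope = ET.fromstring('<message id="msg-1"><body/></message>')
payload = ET.fromstring(
    '<message id="msg-1"><part id="p1"/><pointer ref="p1"/></message>'
)

envelope.find("body").append(rewrite_ids(payload, "gen-1-"))

# Collect every ID in the combined document to check for collisions.
ids = [e.get("id") for e in envelope.iter() if e.get("id") is not None]
```

After the rewrite every ID in the combined document is distinct, and the internal reference still points at the element it pointed at before. Anyone outside the message who knew the old IDs is out of luck, which is exactly the limitation noted above.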
<message id="msg-1">
However, if you can't do this, you'll have to include the XML as an opaque chunk of text inside your protocol message (see the following code). In this case we've escaped all pointy brackets, but we could have included the whole message in a CDATA section. The benefit of this approach is that it's easy and works for any XML content. But you don't get any of the benefits of XML either. You can't validate, query, or transform the data directly and you can't reference pieces of it from other parts of the message.
<message id="msg-1">
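The opaque-text approach can be sketched in a few lines of Python. The payload wrapper element and the cdata helper are hypothetical names of mine, not part of any protocol. Both the escaped form and the CDATA form hand the receiver the same string, which then has to be parsed as a separate document.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# The XML we want to carry; note it reuses the ID of the outer message.
inner = '<message id="msg-1"><text>Hello</text></message>'

# Option 1: escape &, < and > so the payload becomes plain character data.
escaped_doc = f'<message id="msg-1"><payload>{escape(inner)}</payload></message>'


def cdata(text):
    """Wrap text in a CDATA section (hypothetical helper).

    A literal "]]>" would end the section early, so it is split
    across two adjacent sections.
    """
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


# Option 2: wrap the payload in a CDATA section instead.
cdata_doc = f'<message id="msg-1"><payload>{cdata(inner)}</payload></message>'

recovered = []
for doc in (escaped_doc, cdata_doc):
    outer = ET.fromstring(doc)
    # The outer parser sees only text here, so the inner id never
    # collides with the outer one.
    payload_text = outer.find("payload").text
    # Getting at the data requires a second, separate parse.
    recovered.append(ET.fromstring(payload_text))
```

The second parse is the price of opacity: tools working on the outer message see a string, not a tree.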
Binary Data
You may want to consider using base-64 encoding even when you want to move some plain text as part of a message because XML's document-centric SGML origin led to several awkward restrictions on the textual content of XML instances. For example, an XML document can't include any control characters (ASCII codes 0-31) except tabs, carriage returns, and line feeds. This covers both the straight occurrences of the characters and their encoded form as character references (e.g., &#1;). (This caused me a lot of pain when I was creating WDDX; I still haven't gotten over it.) Further, literal carriage returns are always converted to line feeds by XML processors. It's important to keep in mind that not all characters you can put in a string variable in a programming language can be represented in XML documents.
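Here is a small Python sketch of these restrictions and of the base-64 way around them, using the standard library parser. The text element and its encoding attribute are hypothetical names chosen for the example; they don't come from any specification.

```python
import base64
import xml.etree.ElementTree as ET

# A perfectly legal Python string that XML cannot carry as-is.
raw = "bell\x07 then a carriage return\r\n"

# A control character is refused both as a literal and as a character reference.
rejected = []
for bad in ("<text>bell\x07</text>", "<text>bell&#7;</text>"):
    try:
        ET.fromstring(bad)
        rejected.append(False)
    except ET.ParseError:
        rejected.append(True)

# A literal carriage return survives parsing only as a line feed.
normalized = ET.fromstring("<text>a\r\nb</text>").text

# Base-64 turns the bytes into plain letters, digits, "+", "/" and "=",
# all of which are safe in element content.
encoded = base64.b64encode(raw.encode("utf-8")).decode("ascii")
doc = f"<text encoding='base64'>{encoded}</text>"

# The receiver reverses the steps and gets the exact original string back.
decoded = base64.b64decode(ET.fromstring(doc).text).decode("utf-8")
```

The cost is size (base-64 output is roughly a third larger than its input) and the fact that the text is no longer readable in the message.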
Abstract Data Models
All programming language and database data structures can be expressed as directed labeled graphs (DLGs). Therefore, if we have a good way to represent DLGs in XML, we have a generic mechanism for handling abstract data models. We need three things:
As with many things in the XML industry, several specifications address this space. XMI, described in the XML-J article "UML, MOF, and XMI" (Vol. 1, issue 3), offers one mechanism. SOAP defines its own set of encoding rules that are fairly detailed and rather complex. In fact, they take up about 50% of the volume of the specification. The other 50% covers the envelope framework, header/body structure, extensibility mechanisms, intermediaries, error handling, RPC conventions, and HTTP bindings. We won't go into the details; there are too many of them. Suffice to say, in many cases you'll never have to worry about the mechanics of the serialization/deserialization processes. The following code gives you a taste of how the instance data looks, while Listing 2 shows you a possible schema for the data. The instance data markup can appear inside both the headers and the body of a SOAP message.
<x:person>
As you can see, a lot is going on here. First, it's clear that the SOAP encoding model depends heavily on XML Schema. ID/IDREF attributes are used to handle multiple references to the same piece of data. The xsi:type attribute can be used to provide type information to the XML processor in the absence of a schema. For some types, notably sequences/arrays, you need to subclass predefined data types. In addition, array content information (SOAP-ENC:arrayType) must be stored in the instance data; pity the array structure syntax is not XML. Pretty much any data can be encoded; there are no limits on the types of objects that can be represented. The schema fragment could have been autogenerated by introspecting some Java classes, for example. There are also ways to encode data without having to worry about the schema at all, using self-describing element names.
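The multiple-reference idea is easier to see in miniature. The Python sketch below is not the SOAP encoding rules; it only illustrates the general technique of writing a shared node once and pointing at it afterward. The encode_graph function, the data wrapper element, the id and ref attribute names, and the person/home/work sample data are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def encode_graph(root_name, obj):
    """Serialize nested dicts; a dict reached twice is written once, then referenced."""
    seen = {}  # Python object identity -> XML id assigned on first visit

    def encode(name, value, parent):
        elem = ET.SubElement(parent, name)
        if isinstance(value, dict):
            key = id(value)
            if key in seen:
                # Second visit: emit a reference instead of a second copy.
                elem.set("ref", seen[key])
                return
            # Register before descending so that cycles terminate too.
            seen[key] = f"obj-{len(seen) + 1}"
            elem.set("id", seen[key])
            for child_name, child in value.items():
                encode(child_name, child, elem)
        else:
            # Leaf values are written as plain text.
            elem.text = str(value)

    holder = ET.Element("data")
    encode(root_name, obj, holder)
    return holder


# One address object shared by two fields of the same person.
address = {"city": "Boston"}
person = {"home": address, "work": address}

tree = encode_graph("person", person)
xml_text = ET.tostring(tree, encoding="unicode")
```

The city appears once in the output; the work element carries only a reference to the id assigned to home. A deserializer walking the document would resolve that reference back to the same in-memory object, which is what preserves the shape of the graph.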
Linking Data
There are two general mechanisms for doing this. The first one comes straight out of XML 1.0. It involves external entity references that allow content external to an XML document to be brought in during processing. Many people in the industry prefer pure markup approaches and therefore favor using explicit link elements that comply with the XLink specification. Both methods could work. Both require extensions to the existing XML protocol toolsets.
Of course, there are purely application-based methods for linking. You could pass a URI known to mean "get the actual content here." However, this approach doesn't scale to generic data-encoding mechanisms because it requires application-level knowledge.
External content can be kept on a separate server to be delivered on demand. It can also be packaged together with the protocol message in a MIME envelope. In this case the links to it should probably use the MIME unique-content IDs (CIDs) for identification purposes. Traditionally, SOAP has steered clear of anything having to do with MIME. On the other hand, the ebXML Transport/Routing and Packaging working group is looking very seriously at multipart MIME messages. This historic difference is understandable when we consider that SOAP grew out of RPC work and the ebXML folks are focused on business messaging where, for example, an auto insurance claim might carry along several accident pictures. MIME offers a mechanism to combine the XML protocol message with the external content in a single package.
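As a rough illustration of the MIME packaging option, the following Python sketch bundles an XML message and one picture into a multipart/related package and then resolves the link on the receiving side. It uses only the standard email package. The claim and picture elements, the href attribute, the content IDs, and the placeholder image bytes are all hypothetical; neither SOAP nor ebXML is being quoted here.

```python
import xml.etree.ElementTree as ET
from email import message_from_bytes
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Stand-in for real image data.
photo_bytes = b"placeholder bytes standing in for a JPEG"

# The XML message refers to the picture by content ID, not by value.
claim_xml = '<claim><picture href="cid:photo-1"/></claim>'

# Sender: one package, two parts, each labeled with a Content-ID.
package = MIMEMultipart("related")
xml_part = MIMEText(claim_xml, "xml", "utf-8")
xml_part.add_header("Content-ID", "<claim-message>")
image_part = MIMEImage(photo_bytes, _subtype="jpeg")
image_part.add_header("Content-ID", "<photo-1>")
package.attach(xml_part)
package.attach(image_part)
wire = package.as_bytes()

# Receiver: index the parts by Content-ID.
received = message_from_bytes(wire)
parts = {part["Content-ID"]: part for part in received.get_payload()}

# Parse the XML part and follow its cid: link to the attachment.
root = ET.fromstring(parts["<claim-message>"].get_payload(decode=True))
href = root.find("picture").get("href")
content_id = "<" + href[len("cid:"):] + ">"
attachment = parts[content_id].get_payload(decode=True)
```

The picture travels as its own part instead of being base-64 encoded inside the XML, and the message stays small and readable. The flip side is that the XML processor alone can't resolve the link; something has to understand the package.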
Choose Wisely
This space is evolving quite rapidly and the pending release of the XML Schema specification will add fuel to the fires of innovation. Fasten your seatbelts - there's little standardization in this space right now and there will be some turmoil before we emerge with sensible ways to approach the common data-encoding scenarios described here. Although there's lots more ground to cover on this subject, I think I should move on quickly to try to stay on top of innovation in the XML protocol space. In the next XML in Transit column I'll take a look at the Web Services Description Language (WSDL), another hallmark joint effort by Microsoft and IBM. Keep it coming, guys.