Making XML Ready for E-Business

This month I'm going to discuss a new technology being standardized within the World Wide Web Consortium (W3C) that will have a major impact on the way XML will be used in the next few years.

As many of you know, Document Type Definitions (DTDs) were part of the XML 1.0 specification and provided a way to define and constrain the structure of a document. By document here I'm not restricting us to books or articles. Rather I mean the full class of data that XML might represent - including application-to-application messages, program data and anything else that needs the structure that XML provides. DTDs can provide only some of the structure needed for e-business applications and that's why the forthcoming W3C XML Schema definition is so important.

The XML DTD specification is essentially a subset of the SGML DTD specification, which was standardized in 1986 under the leadership of Charles Goldfarb. SGML in turn grew out of the GML work started by Goldfarb and others at IBM in the 1960s. In some sense, then, XML is a member of at least the third generation of markup languages - which is why, even though it became a W3C Recommendation only in early 1998, we can state that it's a mature technology.

Shortcomings of DTDs
All this maturity aside, though, much (but not all) of the work involving SGML was for publishing applications. While XML DTDs allow you to define elements and attributes and specify how elements should be nested, you can't say anything about the form of the element content and very little about attribute content. If you're concerned only with publishing, this is adequate, but it's too simple if you're using XML as the syntax for portable application data.

Consider the following XML snippet:

<stockTransaction type="buy">
  <symbol>IBM</symbol>
  <quantity>1000</quantity>
  <limitPrice currency="US">120.00</limitPrice>
</stockTransaction>

If we want to create a visual rendition of this we can generate HTML and produce a table as shown in Table 1.

While it's important for this information to be correct, we're not checking the format of the values in any way, just displaying them. A DTD would allow us to specify that the only allowable values for the attribute type are "buy" and "sell," but it can't express that the stock symbol must be from a given collection, that the quantity must be a positive integer or that the limit price must be a decimal within a certain range of values. If an application is going to compute with these values or do database queries based on them, they need to be in the correct form, and it's up to every application program that processes the data to ensure its validity before using it. Thus every application programmer must write or borrow routines that check that the data is in the correct format.
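
To make that burden concrete, here is a minimal sketch in Python of the checks an application might have to hand-write for the snippet above. The function name check_transaction, the set of allowed symbols and the price range are all assumptions chosen for illustration; they don't come from any real trading system.

```python
import re
import xml.etree.ElementTree as ET
from decimal import Decimal, InvalidOperation

# Hypothetical values, chosen only for illustration.
ALLOWED_SYMBOLS = {"IBM", "XYZ"}
MIN_PRICE = Decimal("0.01")
MAX_PRICE = Decimal("10000.00")


def check_transaction(xml_text):
    """Return a list of problems found in a stockTransaction document."""
    problems = []
    root = ET.fromstring(xml_text)

    # A DTD could enforce this one; it cannot express the checks below.
    if root.get("type") not in ("buy", "sell"):
        problems.append("type must be 'buy' or 'sell'")

    symbol = root.findtext("symbol", default="").strip()
    if symbol not in ALLOWED_SYMBOLS:
        problems.append("unknown symbol: " + repr(symbol))

    quantity = root.findtext("quantity", default="").strip()
    if not (re.fullmatch(r"[0-9]+", quantity) and int(quantity) > 0):
        problems.append("quantity must be a positive integer")

    price_text = root.findtext("limitPrice", default="").strip()
    try:
        price = Decimal(price_text)
    except InvalidOperation:
        problems.append("limitPrice must be a decimal number")
    else:
        if not price.is_finite() or not (MIN_PRICE <= price <= MAX_PRICE):
            problems.append("limitPrice out of range")

    return problems
```

Every program that consumes the same data, in whatever language, would need its own copy of routines like these.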

Moving Beyond DTDs
The above examples are fairly straightforward. But what if we're expecting a product number in the form of "three capital letters, followed by two digits, followed by a hyphen, followed by four digits" - for example, FGY78-5427? This is the basic problem with DTDs: they force applications to implement data validity checks for a wide range of potential formats, an error-prone process and one that must be repeated for every programming language that might process the data. The XML parser checks that the document is well formed and perhaps does a structural validity check, but the application program does the rest.
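
As an illustration, the product number format described above could be checked in Python with a regular expression along these lines. This is only a sketch; the names PRODUCT_NUMBER and is_product_number are made up for the example.

```python
import re

# Three capital letters, two digits, a hyphen, four digits.
PRODUCT_NUMBER = re.compile(r"[A-Z]{3}[0-9]{2}-[0-9]{4}")


def is_product_number(text):
    """Return True if the whole string matches the product number format."""
    return PRODUCT_NUMBER.fullmatch(text) is not None
```

The check itself is short, but an equivalent has to be written, tested and kept in step in every language that touches the data.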

It makes more sense to move the format-checking routine into the parser so that applications can concentrate more on doing whatever they're supposed to do with the data. XML parsers are generic tools that work on XML data from any source. If there is common XML processing performed on data before an application does its special work, it's reasonable to consider moving that common function into the parser.

Of course, we could end up with some pretty big parsers if we don't build them in a modular way that allows us to choose the functionality we want. That's why, for example, IBM's XML4J parser (now the primary code-base for the Apache Xerces XML parser) offers a validating configuration along with a smaller nonvalidating version. Nevertheless, moving this functionality out of applications and into the parsers is a smart thing to do to get more reliable code.

Let's look at how we might constrain the quantity element in the transaction above. If we simply want to insist that it be a positive integer, we can express it this way in the XML Schema definition:

<element name="quantity" type="positiveInteger"/>

If we want to limit it to a maximum of 10,000 shares it takes a bit more work, but we can do it like this:

<element name="quantity">
  <simpleType base="positiveInteger">
    <maxExclusive value="10001"/>
  </simpleType>
</element>

Using the XML Schema
While you could write these by hand, XML tools that support schema creation will have user interfaces that simplify the way you define the format for the XML data. The draft XML Schema specification has a whole document devoted to datatypes (XML Schema Part 2: Datatypes). The Primer (XML Schema Part 0: Primer, by David C. Fallside of IBM), also part of the draft specification, provides an excellent introduction.

By the way, you probably noticed that XML Schema uses XML syntax. Schemas can thus be manipulated by standard XML tools such as editors and XSLT processors.
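
Because a schema is itself XML, even a general-purpose XML library can read one. The following sketch uses Python's standard xml.etree.ElementTree module to pull the constraints out of the quantity declaration shown earlier in this article, in the same draft syntax. The function name describe_element is invented for the example, and the fragment deliberately omits namespaces, as the article's snippets do.

```python
import xml.etree.ElementTree as ET

# The quantity declaration in the draft syntax used in this article. The
# empty maxExclusive element is self-closed so the fragment is well formed.
SCHEMA_FRAGMENT = """
<element name="quantity">
  <simpleType base="positiveInteger">
    <maxExclusive value="10001"/>
  </simpleType>
</element>
"""


def describe_element(schema_text):
    """Summarize an element declaration as a plain dictionary."""
    element = ET.fromstring(schema_text.strip())
    summary = {"name": element.get("name")}
    simple_type = element.find("simpleType")
    if simple_type is None:
        # Short form: <element name="..." type="..."/>
        summary["base"] = element.get("type")
        summary["facets"] = {}
    else:
        summary["base"] = simple_type.get("base")
        summary["facets"] = {
            facet.tag: facet.get("value") for facet in simple_type
        }
    return summary
```

Nothing here is schema-specific tooling; it's the same tree walking you would do on any other XML document.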

I've made the point that XML application programming becomes easier when datatype checking occurs in the parser. However, a subtler point is that the schema actually documents the data formats. There's no standard way of doing it in a DTD, though you can play tricks with attributes and comments.

The schema allows us to separate the processing logic between the parser and the application. If we later decide that we really want to restrict the quantity of stocks to be bought or sold to be less than 5,000, we can simply update the schema and all programs that use the new schema will inherit the change. Obviously this is easier than requiring changes to all the programs themselves.
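
The same separation can be imitated in miniature in Python. In this sketch, which is an analogy rather than real schema processing, a shared RULES table stands in for the schema, parse_quantity plays the parser's role, and order_cost is application code with no range checks of its own. All of these names are hypothetical.

```python
# Hypothetical stand-in for a schema: the rules live in one shared place.
RULES = {"quantity": {"min": 1, "max": 10000}}


def parse_quantity(text, rules=RULES):
    """Play the parser's role: convert the text and enforce the shared rules."""
    value = int(text)
    limits = rules["quantity"]
    if not (limits["min"] <= value <= limits["max"]):
        raise ValueError("quantity out of range: %d" % value)
    return value


def order_cost(quantity_text, price, rules=RULES):
    """Application logic: it relies on the checks made during parsing."""
    return parse_quantity(quantity_text, rules) * price
```

Tightening the limit to a maximum of 4,999 means changing only the rules table; order_cost then rejects an order for 6,000 shares without any change to its own code.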

The types used in XML Schema go beyond the simple integers and decimals in the foregoing example. The stockTransaction is an instantiation of a complex type. XML Schema provides sophisticated methods for reusing schemas and the simple and complex types within them. For example, we could create a new type by extending stockTransaction to include additional information such as the stock owner name, brokerage, brokerage ID number, date of transaction, settlement date and so on. (This kind of derivation should be familiar to C++ and Java programmers.) If the base stockTransaction gets changed, the new complex types created from it will automatically inherit the changes.

You can also restrict types so that only a subset of the possible values can be used in the XML data. We could, for example, create a special kind of stock transaction where the quantity must be between 100 and 1,000 and the limit price must be greater than $100.

These features in XML Schema have given XML enough power to represent real business data for transactions, application and session data, and database support. The work done by the W3C XML Schema working group has added an important component for making XML ready for e-business. XML is now a first-class language for representing portable data that is independent of the programming language, application and operating system used to create it.

Conclusion
Later this year you should start looking for tools that support XML Schema, but you should plan your migration strategy from DTDs now. For the basic parser technology I recommend you learn about and track the work being done on Xerces within the Apache organization at xml.apache.org. You can get the latest XML Schema specifications from the W3C at www.w3.org/tr.

About Dr. Robert S. Sutor
Dr. Bob Sutor is Director of Marketing for IBM's WebSphere Foundation Software as well as its Web services and SOA efforts. A 21-year veteran of IBM, Sutor has spent most of his career in IBM Research, specializing in symbolic computation and Internet publishing. In 1999 he moved to the IBM Software Group and focused on jump-starting industry use of XML. This led to positions on the Board of Directors of the OASIS standards group and the vice chairmanship of the ebXML effort, a joint OASIS/United Nations endeavor. Sutor then led IBM's industry standards and Web services strategy efforts. He currently leads IBM's marketing efforts around the WebSphere Application Server and enterprise modernization software. Sutor is a frequent speaker on WebSphere, Web services, and Service Oriented Architecture. He is widely cited in the press and was recently featured in interviews in the Harvard Business Review and InfoWorld.
