Getting Started with XML Data Interchange

As a software developer, you may be receiving requests for XML support, implementing XML support, or using a third-party package that supports (or will soon support) XML data interchange. Two good reasons to support XML are:

  1. Browser and development tool vendors have announced their XML plans in full or in part.
  2. XML promises to deliver better interoperability between applications and across all platforms - local and remote.
This article presents XML in the context of data interchange. XML is a text file format that was initially introduced as a document authoring tool, or a better HTML, but many developers are now deciding that it's also a great format for electronic data interchange.

XML data interchange is easy. In this article I'll demonstrate the addition of XML support to a preexisting application: a clone of the Minesweeper game from the Windows family of operating systems. I originally wrote the application while learning the Java Swing class libraries and grid layouts. I won't discuss those topics but will focus only on the addition of XML support. The application links with the IBM XML4J and the Sun JAXP 1.0 libraries, and assumes JDK 1.2 or higher.

Early adopters of any new technology expose themselves to risks. I'll identify these risks and offer a roadmap for minimizing your exposure. The approach relies on the tried-and-true object-oriented method of encapsulation. We can also minimize exposure by deciding which parts of XML to embrace and which to temporarily ignore. Let's look at the existing standards.

XML Data Standards and Parsers
XML is standards based. The document format itself is standardized, as are the programming libraries that read and write XML data. An XML text file can have a separate related file called a DTD (Document Type Definition) that tells an XML-enabled application what to look for when validating XML data. Since the DTD is controversial, alternative XML schema formats have been proposed. These were still under development at the time of this writing. Because this area is still in flux, you may want to consider DTD-less parsing.

XML is read and written with a parser. At this stage parsers come in two somewhat standardized flavors: DOM (Document Object Model) and SAX (Simple API for XML). In October 1998 the W3C published the DOM Level 1 specification. SAX was defined outside the W3C, mostly as a result of dissatisfaction with the DOM model. The consortium went as far as to define Java interfaces for the DOM. However, it didn't define the actual classes for parsing, just the Java interfaces. The following is from the Sun Web site:

"However, the Level 1 DOM specification is silent on the subject of how to input and output. It tells you how a DOM has to operate, but does not cover methods for reading or writing XML. As a result, you can't create a DOM from an existing XML file without going outside the DOM Level 1 specification."

A vendor could therefore provide an input method but no output method and still maintain DOM Level 1 compliance. Since the parser libraries for DOM are defined only in terms of required interfaces, you can't write the import statements at the top of your Java classes in a vendor-independent way, because every vendor has different classes that implement the standardized DOM Level 1 Java interfaces. The approach taken here, and demonstrated in the sample code, emphasizes encapsulation of any vendor's Java classes. This is appropriate, especially in light of the schema/DTD debate over the data format. Different vendors' parsers will support schemas at different times, or not at all. If you need an XML schema in the future, it would be nice if your application's code could minimize the impact of changing the parser library.
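One way to picture this kind of encapsulation is a small seam that hides the class that actually builds the DOM tree. The sketch below is only an illustration, not the article's sample code: the names DomLoader, JaxpDomLoader and DomLoaderDemo are hypothetical, and it assumes a JAXP-style DocumentBuilderFactory is available on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical seam: application code depends only on this interface and on
// the standard org.w3c.dom types, never on a vendor's parser class.
interface DomLoader {
    Document load(InputStream in) throws Exception;
}

// One implementation per parser library. This one goes through the JAXP
// factory; a loader built on a vendor-specific class would sit beside it.
class JaxpDomLoader implements DomLoader {
    @Override
    public Document load(InputStream in) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        return factory.newDocumentBuilder().parse(in);
    }
}

class DomLoaderDemo {
    // Parses a small XML string through the seam and returns the root element name.
    static String rootName(DomLoader loader, String xml) {
        try {
            InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
            return loader.load(in).getDocumentElement().getNodeName();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(rootName(new JaxpDomLoader(), "<Game><Items/></Game>"));
    }
}
```

Swapping parser libraries then means writing another DomLoader implementation; the callers of rootName() don't change.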

DOM and SAX Compared
Both the Sun and IBM libraries provide DOM and SAX parsing strategies. DOM parsers require the entire document to be "well formed" or parsable before it can be read. If any of the XML is bad, the DOM parser won't receive any data. This isn't true of SAX, which parses only a portion of the data at a time. SAX will raise an exception for bad data portions. If your data is large, consider using a SAX parser or chaining together smaller DOM trees with user-defined continuation markers in the data itself. The DOM model provides an in-memory tree representation of your data.

The SAX model triggers an end element event as it parses the XML and allows you to create your own data structures at that time. DOM was first and is more formally standardized by the W3C standards body. The SAX model, less rigorously standardized, is supported by many parser vendors. Be aware that any data read by a DOM parser can be read by SAX. The reverse, however, isn't always true. Remember, DOM parsing is all or nothing, while SAX implements error tolerance, throws an exception on a bad XML section and continues parsing. The sample code uses the DOM model mostly for ease of use. I like the in-memory tree for small applications. For developers concerned about program size and algorithm efficiency, further investigation of SAX is warranted.
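To make the contrast concrete, the following sketch counts elements of a given name twice: once by building a DOM tree and querying it, and once by reacting to SAX end-element events. It is an illustration under assumptions, not the sample application's code: the class name DomVersusSax and the tiny inline document are made up, and the JAXP factories are used to obtain both parsers.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class DomVersusSax {
    // DOM: the whole document is parsed into an in-memory tree, then queried.
    static int countWithDom(String xml, String tag) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName(tag).getLength();
        } catch (Exception e) {
            throw new IllegalStateException("DOM parse failed", e);
        }
    }

    // SAX: no tree is built; the handler is called back as each element ends.
    static int countWithSax(String xml, String tag) {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void endElement(String uri, String localName, String qName) {
                if (qName.equals(tag)) {
                    count[0]++;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("SAX parse failed", e);
        }
        return count[0];
    }

    public static void main(String[] args) {
        String xml = "<Items><Row Id='0'><Piece Id='0'/><Piece Id='1'/></Row></Items>";
        System.out.println(countWithDom(xml, "Piece") + " " + countWithSax(xml, "Piece"));
    }
}
```

The DOM version holds the full tree in memory until it is discarded, while the SAX version keeps only the running count, which is the trade-off that matters for large data.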

Getting Started
To build in flexible support for XML consider the following:

  • DTD-less parsing: If you're interoperating with another XML-enabled application, it may require a DTD, in which case DTD-less parsing may not be an option. If you are able to parse without a DTD, your data interchange strategy will have a greater choice of parsers, and memory may be conserved. DTD-less parsing in the data interchange context is analogous to writing to a database with no constraints defined: the drawback is that code has to be written to ensure that valid values are placed in and taken from the XML stream. However, if you do succeed with an initial DTD-less approach, you'll have more options if and when schema-enabled parsers become available.
  • Use as little of any vendor's library as is feasible: If you look at Figure 1 and Table 1, you can see that XML parsing libraries are large; IBM's has over 150 classes. There are considerable differences in the number of classes and interfaces each vendor provides in its parsing library, but the DOM parser is supported by the same 18 files in each library. If you code to the superset, changing libraries later will be cumbersome.

Class Framework Overview
A quick overview of the software layers may be helpful. If your organization is designing applications correctly, you'll find clear divisions between the persistence, business logic and presentation layers. Figure 2 provides an overview of the sample code's division of layers. The sample code had no preexisting persistence layer; it was added only for demonstration purposes in this article. The Observable model is used extensively. Any class implementing the Observer interface in Java can add itself as an Observer of a descendant of the Observable class. In our example code, GameLogic's constructor calls fGetterSetter.addObserver(this). Subsequently, when ParserWrapper calls setChanged() and notifyObservers(), the update() method of GameLogic is called. This Observer/Observable pattern is also used to communicate between the nonvisual layer (GameLogic) and the Java application's main frame.

Fundamentally, the object model subsumes the lower layers via "has-a" relationships, or containment. On the presentation layer the JFrame descendant has a GameLogic instance, which has a ParserWrapper instance. The GameLogic instance uses late binding to determine whether it should instantiate a Sun or an IBM parser object. You could choose a more Model-View-Controller-compliant object hierarchy, but the simple containment model is sufficient to demonstrate a vendor-neutral XML object hierarchy. The model is event-driven through the Observer/Observable technique of the JDK. Note: In this Internet-enabled age there'll most likely be a separate transport layer to the left of the persistence layer.
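The notification flow just described can be sketched in a few lines. This is a simplified stand-in, not the article's classes: TreeWalker plays the role of ParserWrapper, ElementListener plays the role of GameLogic, and both names are hypothetical. It uses java.util.Observable and java.util.Observer as the text describes; note that newer JDKs mark these types as deprecated, so expect compiler warnings there.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Observable;
import java.util.Observer;

// Stand-in for ParserWrapper: announces each element name it "visits".
@SuppressWarnings("deprecation")
class TreeWalker extends Observable {
    void walk(String... elementNames) {
        for (String name : elementNames) {
            setChanged();           // mark state as changed, or notifyObservers() does nothing
            notifyObservers(name);  // each observer's update() receives the element name
        }
    }
}

// Stand-in for GameLogic: contains the walker ("has-a") and observes it.
@SuppressWarnings("deprecation")
class ElementListener implements Observer {
    final List<String> seen = new ArrayList<>();

    ElementListener(TreeWalker walker) {
        walker.addObserver(this);
    }

    @Override
    public void update(Observable source, Object arg) {
        seen.add((String) arg);
    }
}

class ObserverDemo {
    public static void main(String[] args) {
        TreeWalker walker = new TreeWalker();
        ElementListener listener = new ElementListener(walker);
        walker.walk("Row", "Piece", "Piece");
        System.out.println(listener.seen);
    }
}
```

Because Observable is a class rather than an interface, the observed object must extend it, which is the constraint on existing class hierarchies that matters when choosing this pattern.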

The Data
The first focus of an XML project should be data modeling. If it occurs first, more teams can begin their detailed designs in the business logic and transport layers. If DTDs are used, it's the responsibility of the data modeler to provide an XML file sample and a corresponding DTD. In my simple DTD-less example I modeled an XML file using Microsoft's XML Notepad beta 1.5 (see Figure 3).

The game will save itself to an XML file. XML is relatively self-documenting. Define your elements as you see fit. To review the contents, open the saved game data with the XML Notepad. Note that at this time Microsoft's XML Notepad doesn't support DTDs. XML Notepad can be downloaded for free from http://msdn.microsoft.com/xml/NOTEPAD/download.asp. XML schemas and DTDs allow the data modeler to define the constraints of data without specifying the code, similar to constraints on a database. Reasons to use them will be compelling when matching standards and parsers are available. The development team with the most flexible implementation can be the first to use the new parsers.

Unless you're interoperating with another vendor, you have complete control over the data. A tool such as XML Notepad is ideal for modeling your data structures. XML files are fairly self-documenting. Figure 3 shows an XML file for a game that's been in progress for four seconds; it has 10 rows and columns. The Items element has Row elements with Piece elements that have attributes of Id (the column), ShownValue, UnderlyingValue and isPlayed. A DTD would be useful to limit isPlayed to [0,1] or ['T','F'].
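Without a DTD, the constraint on isPlayed has to live in code. A minimal sketch of such a check follows; the class name IsPlayedCodec is hypothetical, and the choice to accept both the 0/1 and T/F spellings while always writing 0/1 is an assumption made for illustration.

```java
// Hypothetical helper that enforces, in code, the constraint a DTD or schema
// would otherwise express for the isPlayed attribute.
class IsPlayedCodec {
    // Accepts "1"/"T" as true and "0"/"F" as false; rejects everything else.
    static boolean parseIsPlayed(String raw) {
        if (raw == null) {
            throw new IllegalArgumentException("isPlayed is missing");
        }
        switch (raw) {
            case "1":
            case "T":
                return true;
            case "0":
            case "F":
                return false;
            default:
                throw new IllegalArgumentException("Unexpected isPlayed value: " + raw);
        }
    }

    // Always writes the same spelling so saved files stay consistent.
    static String formatIsPlayed(boolean played) {
        return played ? "1" : "0";
    }

    public static void main(String[] args) {
        System.out.println(parseIsPlayed("T") + " " + formatIsPlayed(false));
    }
}
```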

The Class
For the sample code I've chosen a DOM parser. To begin parsing it's not necessary to understand all 150-plus Java classes and interfaces provided by IBM, nor is it necessary to understand the 80-plus classes provided by Sun. Most of what you need to know can be found in five central interfaces: Attr, Document, Node, NodeList and NamedNodeMap (see Figure 4). Since these are in the DOM package, they're found in both libraries. Unfortunately, we need to go outside the specification to implement the actual I/O and to find the classes that implement these interfaces. (It's beyond the scope of this article to explain the interactions of the 150-plus classes found in IBM's XML4J, but if you study the five interfaces in Figure 4 you'll have a good idea of how to get started with DOM. The methods of these core interfaces, which are the ones imported by ParserWrapper, have been named appropriately, so you should have a good idea of what they do.)

It's easy to get started with these interfaces. Listing 1 provides the code that reads and writes the gameTime parameter from and to an in-memory DOM tree, assuming that Node n is the parent node labeled Game in Figure 3.
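As a rough illustration of this kind of accessor (a sketch, not the article's Listing 1), the code below reads and writes gameTime using only Node and NamedNodeMap. It assumes gameTime is stored as an attribute of the Game element, which the text doesn't confirm; the class name GameTimeAccess and the parseRoot() helper are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class GameTimeAccess {
    // Reads gameTime from the attributes of node n (assumed to be the Game element).
    static int getGameTime(Node n) {
        NamedNodeMap attrs = n.getAttributes();
        Node attr = attrs.getNamedItem("gameTime");  // null if the attribute is absent
        return Integer.parseInt(attr.getNodeValue());
    }

    // Overwrites the existing gameTime attribute value in the in-memory tree.
    static void setGameTime(Node n, int seconds) {
        n.getAttributes().getNamedItem("gameTime").setNodeValue(Integer.toString(seconds));
    }

    // Hypothetical helper: parses an XML string and returns its root element as a Node.
    static Node parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Node game = parseRoot("<Game gameTime='4'><Items/></Game>");
        setGameTime(game, getGameTime(game) + 1);
        System.out.println(getGameTime(game));
    }
}
```

Note that both methods take a Node parameter, so any caller is tied to the DOM interfaces.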

The ParserWrapper Class provides the following:

  1. Retrieves any given Node in a one-step, straightforward manner.
  2. Allows user-defined keys to be added for the retrieval of any Node element on the tree.
  3. Encapsulates the Node interface.
  4. Implements primitive-type getters and setters that allow users to get and set typed values to the tree without type casting or translation.
  5. Provides abstract definitions of the required I/O classes so implementation may leverage vendor specifics if provided.
  6. Provides for SAX-like, end-element events. (This will aid encapsulation if the object ever implements SAX internally.)
  7. Maps any data structure. (Our example most closely resembles an array, but the class can just as easily map any arbitrary data structure.)

The problem with the setGameTime() and getGameTime() samples is that the interaction with the business logic layer requires a Node parameter, and Node is a DOM interface. This object framework should be flexible enough to conceivably switch to SAX later. In addition, the business logic shouldn't have any knowledge that XML is involved in the persistence layer.

The solution is to allow the ParserWrapper to call back to the GameLogic class as it walks the tree. This happens when the ParserWrapper.walkTree(..) method is called: it calls back for each Node on the tree as it's traversed. In this callback the GameLogic can name the Node (see the ParserWrapper method putCurrentNodeToHashTable(String strHashKey)) for later retrieval from a private hashtable in the ParserWrapper object using the getValue(String hashKey, String attrName) function call. At this point all XML knowledge is contained in the ParserWrapper class, and any attribute value can be accessed in a quick one-step fashion via user-defined hash keys.
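The keyed-lookup idea can be sketched as follows. This is a simplified illustration, not the ParserWrapper source: the class name KeyedPieces is hypothetical, the walk and the naming step are collapsed into one method rather than split across an Observer callback, and the JAXP factory is assumed for parsing. The key format mirrors the "Piece[row][col]" keys used by the sample code.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical wrapper: callers use string keys and never see a DOM type.
class KeyedPieces {
    private final Map<String, Element> nodesByKey = new HashMap<>();

    KeyedPieces(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            index(doc);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Walks every Row and its Piece children, storing each Piece under a key.
    private void index(Document doc) {
        NodeList rows = doc.getElementsByTagName("Row");
        for (int r = 0; r < rows.getLength(); r++) {
            Element row = (Element) rows.item(r);
            NodeList pieces = row.getElementsByTagName("Piece");
            for (int p = 0; p < pieces.getLength(); p++) {
                Element piece = (Element) pieces.item(p);
                String key = "Piece[" + row.getAttribute("Id") + "][" + piece.getAttribute("Id") + "]";
                nodesByKey.put(key, piece);
            }
        }
    }

    // One-step lookup of an attribute value; returns null for an unknown key.
    String getValue(String key, String attrName) {
        Element e = nodesByKey.get(key);
        return e == null ? null : e.getAttribute(attrName);
    }

    public static void main(String[] args) {
        String xml = "<Game><Items><Row Id='0'>"
                + "<Piece Id='0' isPlayed='0'/><Piece Id='1' isPlayed='1'/>"
                + "</Row></Items></Game>";
        System.out.println(new KeyedPieces(xml).getValue("Piece[0][1]", "isPlayed"));
    }
}
```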

The callback mechanisms described above are implemented with the JDK Observer/Observable design pattern. This could be a problem if you have a predefined persistence layer that's a descendant of any class other than java.lang.Object. If this is the case, consider implementing your framework with a Model-View-Controller design pattern.

During construction of the GameLogic object, the addObserver() call in the following code registers the game logic as a receiver of the SAX-like events.

public GameLogic(ParserWrapper AGetterSetter, String FileName)
{
    fGetterSetter = AGetterSetter;
    fGetterSetter.addObserver(this);
    // ParserWrapper will call my update() when it walks the tree
    fGetterSetter.OpenIt(FileName);
    fGetterSetter.walkTree("Game", "Items", true, false, false, false, false, true);
    this.setTotalMines(this.getTotalMines());
}
When walkTree() is called, since the GameLogic Object has added itself as an Observer, the ParserWrapper calls the update() method of the game logic:
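// Builds the user-defined key under which a Piece node is stored
private String getHashCodeKey(int I, int J) {
    return "Piece[" + I + "][" + J + "]";
}

public void update(Observable o, Object arg) {
    if (o.equals(fGetterSetter)) {
        if (arg.equals("Row")) {
            // Remember which row the following Piece elements belong to
            currRow = fGetterSetter.getCurrentValue("Id", 0);
        } else if (arg.equals("Piece")) {
            int iCol = fGetterSetter.getCurrentValue("Id", 0);
            // Name the current node so it can be retrieved later by key
            fGetterSetter.putCurrentNodeToHashTable(this.getHashCodeKey(currRow, iCol));
        }
    }
}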
When the game logic needs to know whether a piece has already been played, the value is accessed simply through the GameLogic object's public method:
public boolean getPlayed(int i, int j) {
    return fGetterSetter.getValue(this.getHashCodeKey(i, j), "isPlayed", true);
}

Selection Criteria
Parser selection criteria vary from application to application. On the Internet, program size is always an issue, and parsers can be large (500KB). In addition, application developers might opt to parse their XML streams with both SAX and DOM strategies in the same application. There are many issues associated with parser conformance to the XML specifications. A good place to learn about them is www.oasis-open.org/. OASIS publishes conformance tests on its Web site.

When schema support becomes available, development teams that have chosen a vendor-neutral path that maximizes flexibility will be rewarded. Development teams should accommodate change by double-checking the flexibility of their implementations. Essentially, development teams should ask: "What is our vendor neutrality strategy?"

An interesting site on the Web that promises to help developers with vendor neutrality and parser complexity can be found at www.jdom.org. JDOM believes simpler is better.

About the Sample Code
The code presented here isn't intended as a template for the next "killer" XML app, but there are some good design elements. It would be interesting to see a third vendor's DOM parser pushed into this object model. Alternatively, you could try using Sun and/or IBM's SAX model implementations and draw comparisons.

About Mark Wardell
Mark Wardell is a member of CGI Houston's Advanced Web Technology Practice. CGI is North America's fifth largest independent IT consulting firm.
