Advanced XML Processing with StAX in ColdFusion
A powerful, fast, and efficient alternative to other ways of XML parsing
By: Jim Collins
Jun. 20, 2006 03:45 PM
Adding support for XML processing to ColdFusion 6.0 was regarded as a major feature upgrade. With the switch to Java, ColdFusion could leverage existing Java libraries such as Jakarta Commons and add support for things like Web Services (Axis). However, binding itself to Java also bound ColdFusion to the limitations of the Java feature set.
Conventional CFXML
RSS, Atom, and even well-formed XHTML work quite well with ColdFusion's conventional XML handling. The RSS feed is in XML and contains various elements. Like other RSS feeds, its root element is "rss," with XmlText, XmlAttributes, and the channel as elements under the rss root. These elements are shown in Figure 1. The channel element contains the information that we're interested in: repeated "item" elements, one for each article Nic Tunney has written for CFDJ. Each item element contains the article title, a URL to the article, a publication date, and a description. The ColdFusion code to get this information is simple and is in Listing 1. It results in the output shown in Figure 2. Here we've used the DOM to extract the information we want. This information could also be extracted using XPath or an XSLT stylesheet.
So what's going on here? Dumping the class name of a ColdFusion XML document with <cfdump var="#MyXML.getClass().getName()#"/> will show that MyXML is a Java object of type org.apache.xerces.dom.DeferredDocumentImpl. ColdFusion has retrieved the entire RSS feed and created a DOM (Document Object Model) representation of it in memory. A complete guide to the structure of a ColdFusion XML object is also available online. This is the result we would expect, because ColdFusion uses the Xerces parser (part of the Apache XML Project) to do this, and here Xerces is working as a DOM parser.
What does this mean? The first method of parsing XML was DOM parsing. In this model the entire XML file (or feed) is read and a DOM object is created in memory. For this reason, the DOM is referred to as a "tree-based API": it creates an object resembling a tree in memory. Although this method is simple and straightforward, it has problems associated with it. This is an example of H.L. Mencken's observation that "For every problem there is a solution which is simple, clean, and wrong."
Reading in the entire document adds unnecessary processing overhead, especially if we're only interested in part of the document. To add insult to injury, the DOM object created by Xerces can be two to three times larger than the original XML document. Another issue with this approach is that the developer will frequently need a different data structure than the one the DOM provides. It's very inefficient to build a DOM tree, create a new data structure from it, and then discard the original. The DOM model also breaks down when the XML source is very large, and can even crash ColdFusion. Reading and creating the DOM in memory is also...very...slow.
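To make the tree-based model concrete, here is a minimal sketch in plain Java rather than CFML (the article's Listing 1 is not reproduced here). It uses the JDK's built-in DOM parser; the class name DomRssTitles, the method itemTitles, and the sample feed are assumptions made for illustration. Note that the whole document is parsed into memory before a single title can be read.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: illustrates DOM parsing, not ColdFusion's internals.
class DomRssTitles {

    // Builds the full DOM tree first, then walks it to collect item titles.
    static List<String> itemTitles(String rssXml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // The entire feed is read and materialized as a tree here.
            Document doc = builder.parse(new InputSource(new StringReader(rssXml)));

            List<String> titles = new ArrayList<>();
            NodeList items = doc.getElementsByTagName("item");
            for (int i = 0; i < items.getLength(); i++) {
                Element item = (Element) items.item(i);
                NodeList titleNodes = item.getElementsByTagName("title");
                if (titleNodes.getLength() > 0) {
                    titles.add(titleNodes.item(0).getTextContent());
                }
            }
            return titles;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse feed", e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample feed for demonstration only.
        String feed = "<rss version=\"2.0\"><channel><title>Feed</title>"
            + "<item><title>First</title><link>http://example.com/1</link></item>"
            + "<item><title>Second</title><link>http://example.com/2</link></item>"
            + "</channel></rss>";
        System.out.println(itemTitles(feed));
    }
}
```

Even though only the titles are wanted, every element, attribute, and text node in the feed becomes an object in the tree, which is where the memory and time costs described above come from.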
SAX
SAX has a very lightweight memory and processing footprint, and is very fast. Now, this is all well and good, but there are still some problems. For one, there is the callback issue. There's no way in ColdFusion to register a ColdFusion function as a callback method. The difficulty isn't limited to ColdFusion, though: Java developers also find callback methods counter-intuitive. It's like driving in reverse. SAX also still processes the entire document. There's a way to stop processing by throwing an error, but that's like stopping your car by ramming it into a crash barrier. It works, but it's probably not optimal. Another drawback is that the programmer must keep track of the current state of the document in code every time an XML document is processed.
SAX isn't completely useless to ColdFusion developers, however. For instance, ColdFusion itself probably uses SAX to validate XML documents. My primary reason for discussing SAX was to introduce some basic concepts that carry over to another model, StAX, which is very useful to the ColdFusion developer.
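The three SAX drawbacks just described (callbacks, hand-tracked state, and stopping by throwing an error) are easiest to see in code. The following is a rough Java sketch using the JDK's SAX parser; the class SaxFirstTitle, the handler, and the StopParsing exception are all hypothetical names invented for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example: finds the first <item><title> in a feed with SAX.
class SaxFirstTitle {

    // Marker exception whose only job is to abort the parse early.
    static class StopParsing extends SAXException {
        StopParsing() { super("stop requested by handler"); }
    }

    // The parser calls these methods; we never ask it for the next event.
    static class TitleHandler extends DefaultHandler {
        // State we must track ourselves, because SAX does not do it for us.
        boolean inItem = false;
        boolean inTitle = false;
        boolean done = false;
        final StringBuilder text = new StringBuilder();
        String firstTitle = null;

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) {
            if (qName.equals("item")) {
                inItem = true;
            } else if (inItem && qName.equals("title")) {
                inTitle = true;
                text.setLength(0);
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (inTitle) {
                text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String local, String qName) throws SAXException {
            if (inTitle && qName.equals("title")) {
                firstTitle = text.toString();
                done = true;
                throw new StopParsing(); // the "crash barrier"
            }
            if (qName.equals("item")) {
                inItem = false;
            }
        }
    }

    static String firstItemTitle(String xml) {
        TitleHandler handler = new TitleHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (SAXException e) {
            // Swallow the exception only if it is our own early stop.
            if (!handler.done) {
                throw new IllegalStateException("Parse failed", e);
            }
        } catch (Exception e) {
            throw new IllegalStateException("Parse failed", e);
        }
        return handler.firstTitle;
    }
}
```

Calling SaxFirstTitle.firstItemTitle(feed) returns the first item's title, or null if there is none. Notice how much of the handler is bookkeeping: three flags and a buffer just to pull out one value.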
StAX
One of the great things about StAX is that it can process an XML source of any size. And it's very, very fast. StAX is defined as a standard API, which each specific StAX parser then implements. The StAX standard API is defined by JSR 173, which can be found at www.jcp.org/en/jsr/detail?id=173. StAX is supported by a number of vendors such as Sun, Oracle, and BEA, each of which has released its own implementation. The Open Source community is also very active in StAX development, with Tatu Saloranta's Woodstox being the leader. Woodstox will be the implementation used in the examples that follow. Woodstox is available for download at http://woodstox.codehaus.org/.
These examples will use a CFC that I've developed called CFStAX. The purpose of this wrapper is to make it easy to work with StAX even if you're uncomfortable working with Java APIs, and to simplify some of the setup required. CFStAX is available at www.sourceforge.com/projects/cfsynergy. CFStAX uses Woodstox and was developed with the time and help of Tatu Saloranta, the author of Woodstox.
StAX offers two models for processing XML, a Stream model and an Event model. These are also referred to as the cursor-style API and the iterator-style API, respectively.
In the Stream model, the XML source is parsed using the XMLStreamReader object. The XMLStreamReader.next() method returns an integer value corresponding to the event type of the XML construct encountered, e.g., START_ELEMENT, CHARACTERS, or END_DOCUMENT. You can write a series of elseif statements to take various actions based on the event returned by XMLStreamReader.next(). The Stream model is the model most used in StAX, primarily because of its simplicity.
Using the Event model, an XMLEventReader delivers XMLEvent objects through its nextEvent() method (or next(), which it inherits from Iterator). Again, events of interest can be handled by a series of elseif statements. The XMLEventReader is particularly elegant for doing XML-to-XML transformations.
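As a rough illustration of the Stream (cursor) model, here is a sketch in plain Java against the standard javax.xml.stream API. It does not use CFStAX, and it runs on whichever StAX implementation is on the classpath (the JDK's own by default, or Woodstox if its jars are present). The class name StaxCursorTitles and the max parameter are assumptions for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example: pulls at most `max` item titles from a feed.
class StaxCursorTitles {

    static List<String> itemTitles(String xml, int max) {
        List<String> titles = new ArrayList<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
            boolean inItem = false;

            // We drive the parser: it advances only when we call next(),
            // so we can simply stop asking once we have enough titles.
            while (reader.hasNext() && titles.size() < max) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = reader.getLocalName();
                    if (name.equals("item")) {
                        inItem = true;
                    } else if (inItem && name.equals("title")) {
                        // Reads the text and leaves the cursor on </title>.
                        titles.add(reader.getElementText());
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    if (reader.getLocalName().equals("item")) {
                        inItem = false;
                    }
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not read feed", e);
        }
        return titles;
    }
}
```

The if/else-if chain mirrors the elseif approach described above. Unlike SAX, stopping early needs no exception: the loop condition ends the parse, and the rest of the document is never read.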
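There's a reason in both cases for using a series of elseif statements. A case statement is difficult to implement because of ColdFusion's lack of support for static variables: the event types are defined as static constants on the Java side, and they can't be referenced directly as case values. If we could write something like cfswitch(XMLStreamReader.getEventType()) with those constants as the cases, then we could use switch statements.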
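For comparison, this is what the switch-based form looks like in Java itself, where the event types are compile-time constants of XMLStreamConstants and can be used as case labels. It is only a sketch of the contrast, not something ColdFusion code can do directly; the class StaxEventCounts is a made-up name.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example: counts events by type using a switch statement.
class StaxEventCounts {

    // Returns {start elements, end elements, character events}.
    static int[] count(String xml) {
        int starts = 0;
        int ends = 0;
        int text = 0;
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                // The int returned by next() is matched against static constants.
                switch (reader.next()) {
                    case XMLStreamConstants.START_ELEMENT:
                        starts++;
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        ends++;
                        break;
                    case XMLStreamConstants.CHARACTERS:
                        text++;
                        break;
                    default:
                        break; // comments, processing instructions, end of document, etc.
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not read document", e);
        }
        return new int[] { starts, ends, text };
    }
}
```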
Examples
Once the jar files have been added to the classpath, the JRun service must be restarted. Now that we have set up our system, we can go through some StAX usage examples and see how powerful this technique is.
Reading XML with StAX
The Event model is similar but uses methods to determine the event properties. See Listing 3. Note that this Event model code can cause a problem in certain cases, throwing a QName error. This is a known problem and will be addressed in a future release of CFStAX.
Writing and merging XML are other frequently used functions. They are very easy to do using StAX but are beyond the scope of this article. Full examples are available in the documentation provided with CFStAX.
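Since the article's Listing 3 is not reproduced here, the following plain-Java sketch shows what "using methods to determine the event properties" means in the Event model. It uses the standard XMLEventReader API directly rather than CFStAX, and the class StaxEventLinks and its sample usage are assumptions for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical example: collects the <link> of every <item> in a feed.
class StaxEventLinks {

    static List<String> itemLinks(String xml) {
        List<String> links = new ArrayList<>();
        try {
            XMLEventReader events = XMLInputFactory.newInstance()
                .createXMLEventReader(new StringReader(xml));
            boolean inItem = false;

            while (events.hasNext()) {
                XMLEvent event = events.nextEvent();
                // Each event is an object that can describe itself.
                if (event.isStartElement()) {
                    String name = event.asStartElement().getName().getLocalPart();
                    if (name.equals("item")) {
                        inItem = true;
                    } else if (inItem && name.equals("link")) {
                        links.add(events.getElementText());
                    }
                } else if (event.isEndElement()
                        && event.asEndElement().getName().getLocalPart().equals("item")) {
                    inItem = false;
                }
            }
            events.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not read feed", e);
        }
        return links;
    }
}
```

Compared with the Stream model, the event type is discovered by asking the XMLEvent object (isStartElement(), asStartElement(), and so on) instead of comparing an integer. The element name arrives as a javax.xml.namespace.QName, from which getLocalPart() extracts the plain tag name.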
Conclusion
References
Processing XML with Java (complete book online). www.cafeconleche.org/books/xmljava/
Elliotte Rusty Harold. "An Introduction to StAX." September 17, 2003. www.xml.com/pub/a/2003/09/17/stax.html
Lara D'Abreo. "StAX: DOM Ease with SAX Efficiency." January 11, 2006. www.devx.com/Java/Article/30298