XML Pooling

XML is a popular data exchange standard. Paired with a platform-independent language such as Java to process it, XML, if applied well, can make data handling more efficient. Today, a great deal of the data in the industry flows in an XML format. Ironically, however, we haven't put as much effort into tuning XML processing as we've put into tuning Java. This has to change. It's to our benefit to focus on XML as much as we do on Java.

Java's performance can be enhanced by reducing the time it takes to garbage-collect objects. To do this, Java programmers use the concept of precreating pools of objects, threads, and connections to fine-tune the code and optimize performance. However, we're missing something by just focusing on Java. What about the data in applications? Why don't programmers try to tune that too? We can apply this same pooling concept to XML.

In this article I provide an example in which such a concept can be applied. I use a DTD parser that's available in the market and the DOM API (Document Object Model; Apache's Xerces implementation) to implement it. The idea is to start from a DTD, create an XML document from the DTD programmatically, then pool XML instances to be populated when data arrives.

Why XML?
First I'll discuss the logging application, an important part of any enterprise application. The specific functions of logs vary; for example, there are system, transaction, and security-related logs. Obviously the volume of logs generated can be quite large; however, they're not really helpful if they're plain text. Plain-text logs are hard for users to read and, more important, hard to search when you want to find a particular piece of information.

For these reasons, XML is perfect for generating log information. XML's advantage is that it's searchable and a search can return high-quality hits. We can structure the document so that the data is demarcated into separate subtrees under a common root node. This way you can use a system that doesn't have to hold the entire tree in memory. Separate components can be used to populate the appropriate subtrees.

In addition, XML allows us to attach different stylesheets to the log and provide information to various users in a variety of formats. You can put certain information on the Web as HTML or in a Word document format. XML also enables formatted information to be provided to users who want certain views of the logs (e.g., who logged in at a particular time to a particular application). This is supported because XML can be searched.
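As a rough illustration of what "searchable" means in practice, the sketch below answers the "who logged in to a particular application" question with the standard javax.xml.xpath API. The log layout (a log root, a security subtree, and login elements with user and application attributes) is an assumption made for this example and is not taken from the article's DTD.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class LogSearch {
    /** Returns the users who logged in to the given application (hypothetical log layout). */
    static List<String> usersOf(String logXml, String application) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Bind the application name as the XPath variable $app rather than concatenating it.
        xpath.setXPathVariableResolver(
                name -> "app".equals(name.getLocalPart()) ? application : null);
        try {
            NodeList hits = (NodeList) xpath.evaluate(
                    "/log/security/login[@application = $app]/@user",
                    new InputSource(new StringReader(logXml)),
                    XPathConstants.NODESET);
            List<String> users = new ArrayList<>();
            for (int i = 0; i < hits.getLength(); i++) {
                users.add(hits.item(i).getNodeValue()); // value of each matching user attribute
            }
            return users;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("could not search log", e);
        }
    }
}
```

A comparable query against a plain-text log would need custom parsing for every log format; here the structure of the document does the work.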

From Java Pooling to XML Pooling?
Pooling is a concept that's been around for some time now. It's used extensively in Java for object pooling to reduce object churn and, in turn, reduce the time spent on garbage collection. The concept is to precreate a sizable number of objects to avoid the overhead of creating them anew at runtime, thus increasing the application's performance. This matters most when an object is needed to process a request, since a pooled object is ready the moment the request arrives. If pooling is used to preload objects so that responses can be handled efficiently, why hasn't it been applied to XML processing? Is it applicable? Well, we'll see how.
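For readers who haven't written one, a minimal object pool might look like the sketch below. The class name ObjectPool and its methods are illustrative, not part of any library, and the pool is deliberately not thread-safe to keep the idea visible.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

/** Minimal object pool sketch; not thread-safe. */
class ObjectPool<T> {
    private final Deque<T> idle = new ArrayDeque<>();
    private final Supplier<T> factory;

    ObjectPool(Supplier<T> factory, int size) {
        this.factory = factory;
        for (int i = 0; i < size; i++) {
            idle.push(factory.get()); // pay the creation cost up front
        }
    }

    /** Hands out a pooled instance, or creates a new one if the pool is empty. */
    T acquire() {
        return idle.isEmpty() ? factory.get() : idle.pop();
    }

    /** Returns an instance to the pool so it can be reused. */
    void release(T obj) {
        idle.push(obj);
    }

    /** Number of instances currently waiting in the pool. */
    int available() {
        return idle.size();
    }
}
```

Typical use is new ObjectPool<>(StringBuilder::new, 10), followed by acquire() and release() around each unit of work. A production pool would also need synchronization and a policy for resetting returned objects.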

Loading a DTD
The idea is to emulate Java. In a Java application you have a class loader. When you want to use Java classes, you load them into memory and create instances of them. Consider a DTD you've written. If you could load the DTD into memory, instantiate it like a class, and create instances that are the actual XML documents, you could also apply pooling concepts to XML.

Let's go back to the logging example. You can define a standard DTD for logging (refer to the DTD provided on XML-J's Web site, www.xml-journal.com) that would include subtrees for transaction, system, and error logs. The DTD that's defined in the code is a simple one and doesn't contain any complex data structures. However, the concept can be extended to accommodate complex DTDs.

Once we have a DTD defined, we need to load it into memory using a DTD parser. There are DTD parsers available in the market that load the DTD and provide programmers with data structures from which to extract the DTD components. These data structures allow for the extraction of the elements, the content type, and the attributes. However, they don't tie into the standard DOM APIs.

The DTD parser breaks the format into a set of elements and attributes but doesn't retain the correlation between these pieces. One way to maintain the structure is to build a relation structure, a tree in memory, that defines the elements' relationships and attributes.
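One possible shape for such a relation tree is sketched below. ElementDef is a hypothetical class invented for this illustration; it does not use any DTD parser's API, and in a real system the tree would be filled from whatever the parser returns. The element and attribute names in loggingTree() are likewise assumptions modeled on the logging example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical node of the in-memory relation tree built from a parsed DTD. */
class ElementDef {
    final String name;
    final Map<String, String> attributeDefaults = new LinkedHashMap<>();
    final List<ElementDef> children = new ArrayList<>();

    ElementDef(String name) {
        this.name = name;
    }

    /** Records an attribute and its default value (empty string if none). */
    ElementDef attribute(String attrName, String defaultValue) {
        attributeDefaults.put(attrName, defaultValue);
        return this;
    }

    /** Records a parent-child relationship between two element types. */
    ElementDef child(ElementDef childDef) {
        children.add(childDef);
        return this;
    }

    /** Illustrative logging structure; all names here are assumptions. */
    static ElementDef loggingTree() {
        return new ElementDef("log")
                .attribute("sessionId", "")
                .child(new ElementDef("transaction"))
                .child(new ElementDef("system"))
                .child(new ElementDef("error"));
    }
}
```

Keeping children in a list preserves the order declared in the DTD, which matters when the tree is later walked to emit elements.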

Building the Template
Once the DTD is loaded into a structure in memory, we need to create instances of the DTD. The DOM API allows the creation and manipulation of XML documents. The tree can be navigated and each element-type object mapped to a DOM element. The attributes of each element can be added to the corresponding DOM element with default or empty values. The document object in DOM allows elements to be added to and deleted from the document through its API. Any implementation of a DOM Level 2-compliant API could be used to build the XML. The leaf nodes, which are the values of the XML elements, can be populated when the data arrives.
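A minimal sketch of building such an empty template with the standard JAXP and DOM APIs follows. The element and attribute names are assumptions based on the logging example, and the structure is hard-coded here for brevity rather than walked from a parsed DTD.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class LogTemplate {
    /** Builds an empty log document; element and attribute names are assumptions. */
    static Document build() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no DOM implementation available", e);
        }
        Element root = doc.createElement("log");
        root.setAttribute("sessionId", ""); // empty default, filled in when data arrives
        doc.appendChild(root);
        for (String section : new String[] {"transaction", "system", "error"}) {
            root.appendChild(doc.createElement(section)); // upper-level nodes known in advance
        }
        return doc;
    }
}
```

No text nodes are created at this stage; the leaf values are left for the moment the data arrives.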

This way, once the template XML document is created, we can precreate a pool of documents from it. This saves time because the documents already exist when the data arrives; all that remains is to fill in the values.
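The pooling step could be sketched as follows, assuming a template document is already available. DocumentPool is an illustrative name; copies are made with the standard importNode call, and the class is not thread-safe.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;

/** Pool of precreated copies of a template document; not thread-safe. */
class DocumentPool {
    private final Deque<Document> idle = new ArrayDeque<>();
    private final Document template;
    private final DocumentBuilder builder;

    DocumentPool(Document template, int size) {
        try {
            this.builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no DOM implementation available", e);
        }
        this.template = template;
        for (int i = 0; i < size; i++) {
            idle.push(copyOfTemplate()); // build the empty trees ahead of time
        }
    }

    /** Deep-copies the template's root element into a fresh document. */
    private Document copyOfTemplate() {
        Document copy = builder.newDocument();
        copy.appendChild(copy.importNode(template.getDocumentElement(), true));
        return copy;
    }

    /** Hands out a precreated document, or builds one if the pool has run dry. */
    Document acquire() {
        return idle.isEmpty() ? copyOfTemplate() : idle.pop();
    }

    /** Number of documents currently waiting in the pool. */
    int available() {
        return idle.size();
    }
}
```

Because a populated log document is normally stored rather than reused, this sketch has no release method; a background task that tops the pool up would be a natural extension.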

Populating the XML Document
Now when the data flows into the logging application it can be added to these precreated XML documents as text nodes. Since the text nodes are added to precreated documents, we save the time needed to build the tree, which helps performance. An XML document can also represent all the logs for a session. You can create Enterprise JavaBeans that maintain the state of the session in which the information is logged, so you can send the information to the bean and then store it in the XML.
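Populating a precreated document then comes down to a couple of DOM calls. The helper below is a sketch; the class name LogFiller and the idea of addressing a leaf by its element name are assumptions made for illustration.

```java
import org.w3c.dom.Document;
import org.w3c.dom.Node;

class LogFiller {
    /** Adds the value as a text node under the first precreated element with the given name. */
    static void fill(Document doc, String leafName, String value) {
        Node leaf = doc.getElementsByTagName(leafName).item(0);
        if (leaf == null) {
            throw new IllegalArgumentException("template has no <" + leafName + "> element");
        }
        leaf.appendChild(doc.createTextNode(value)); // only the text node is created at runtime
    }
}
```

The element lookup still costs a tree traversal; if that shows up in profiling, references to the leaf elements could be cached alongside each pooled document.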

There is, however, one glitch. What if the DTD has a choice of elements? How does one precreate these elements and then populate the data? The best way is to leave the decision until runtime. If the choice nodes are high up in the hierarchy (for example, second-level nodes), precreate subtrees for both alternatives; when one is used, the other can possibly be used in another XML instance. If they're lower-level nodes, build the tree at that point. There could be better ways to do this.
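One way to read this suggestion is to keep the alternatives of a choice as detached, precreated elements and attach whichever one the data calls for. The sketch below takes that reading; ChoiceSlot is a hypothetical class, and it only covers alternatives created by the same document as the parent they are attached to.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Holds precreated, detached elements for the alternatives of one DTD choice. */
class ChoiceSlot {
    private final Map<String, Element> spares = new LinkedHashMap<>();

    ChoiceSlot(Document doc, String... alternativeNames) {
        for (String name : alternativeNames) {
            spares.put(name, doc.createElement(name)); // created but not yet attached
        }
    }

    /** Attaches the alternative picked at runtime under the given parent. */
    Element choose(Element parent, String name) {
        Element chosen = spares.remove(name);
        if (chosen == null) {
            throw new IllegalArgumentException("unknown or already used alternative: " + name);
        }
        parent.appendChild(chosen);
        return chosen;
    }

    /** Names of the alternatives that are still unused. */
    Iterable<String> unused() {
        return spares.keySet();
    }
}
```

Moving a leftover alternative into a different document would additionally require importing or adopting the node into that document, which this sketch leaves out.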

Having resolved this, we can pick up the data, add text nodes, then add attributes to the elements. Thus we've cut down on the time needed to create the entire set of upper-level nodes, which we know will be present. Couple this with EJBs using stateful session beans, where the state is maintained across the entire life cycle of the logs (including the creation of the XML document), and you can achieve failover and fault tolerance in logging. In addition, the stateful session beans allow the logs to be collected based on sessions, so that the data for a session can be stored in one log. This makes it easier to collate information from various units based on sessions.

Storing the XML Document
The XML document can be serialized based on the type of storage mechanism that's used. In a relational database we could store it as a string containing the serialized XML. Alternatively, in an object database we can store the XML as a DOM object.
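For the relational case, turning a DOM tree into a string can be sketched with the identity transformer from the standard javax.xml.transform API. The class and method names below are illustrative, and how the resulting string is written to the database is left out.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Node;

class XmlSerializer {
    /** Serializes a DOM node (typically the whole document) to an XML string. */
    static String toXmlString(Node node) {
        try {
            // A transformer created without a stylesheet copies its input unchanged.
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(node), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("could not serialize XML", e);
        }
    }
}
```

The XML declaration is omitted here on the assumption that the string goes into a text column; keep it if the stored value must stand alone as a file.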

Advantages of Using XML
The advantage of using XML is that stored information, such as logs, can now be searched. XML also allows data to be displayed in multiple formats by simply attaching a stylesheet. You can display the logs as HTML files over the Net or include them in a report in Word or RTF format. This way we keep the display options open and can even format the logs as WML documents and send them to the user.
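Attaching a stylesheet can be sketched with the same standard transformation API. The stylesheet below, which renders error entries as an HTML list, and the log layout it expects are assumptions made purely for illustration.

```java
import java.io.StringWriter;
import javax.xml.transform.Source;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;

class LogRenderer {
    /** Hypothetical stylesheet: lists every error element under the log root as an HTML item. */
    static final String HTML_LIST =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:output method='html'/>"
            + "<xsl:template match='/log'>"
            + "<ul><xsl:for-each select='error'><li><xsl:value-of select='.'/></li></xsl:for-each></ul>"
            + "</xsl:template></xsl:stylesheet>";

    /** Applies the stylesheet to the log and returns the rendered text. */
    static String render(Source log, Source stylesheet) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                    .newTransformer(stylesheet)
                    .transform(log, new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("could not render log", e);
        }
    }
}
```

Swapping in a different stylesheet changes the output format without touching the stored log, which is the flexibility described above.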

References

  1. Wutka DTD Parser: www.wutka.com/dtdparser.html
  2. DOM API: www.w3.org/DOM/
  3. Xerces API for DOM 2.0: http://xml.apache.org/xerces-j/index.html