Preparing for Tomorrow - Today

Today many companies are evaluating the application of XML to their technology initiatives. For all its potential, XML has performance, scalability, and accessibility implications that need to be considered when developing an implementation strategy.

We all know that XML is the enabler for doing business over the Internet. Its self-descriptive nature simplifies the exchange of data between parties, making it a powerful standard for B2B communication. Yet while this ease of interaction has many benefits (faster transaction times, new channels, access to data never before attainable), it introduces a challenge that could actually inhibit XML's mass adoption. Namely, if XML is as successful as we expect it to be and the number of XML business transactions grows exponentially, can the currently installed B2B infrastructure support such unprecedented levels of activity?

This four-part series addresses this question by studying several system components likely to be impacted by the large volume of XML transactions generated in an automated commerce chain. The series will focus on key attributes that are critical for a successful B2B system and demonstrate that proper advance planning will ensure the scalability and performance of XML-based systems as XML transaction levels increase.

This first article explores what happens to a transaction when it's represented in XML and how it impacts the performance and scalability of B2B e-commerce systems. In addition, I'll identify the key attributes of a B2B-XML transaction and the impact of this additional context on performance and scalability. Understanding this cause-and-effect relationship is the first step in assessing the impact of using XML.

The Nature of XML
To discuss the performance and scalability of XML-based systems, it's important to understand the impact of converting a transaction to XML. You're probably familiar with many of the attributes of XML: it's self-describing, flexible, and actionable, making it the ideal foundation for B2B interoperability. However, each of its positive attributes (flexibility, extensibility, ease of use, and platform independence) comes at a price: transaction size, externally defined data structures, text representation of data, and text representation of attributes and qualifiers that establish a well-formed XML transaction. Each attribute contributes in a different way to the overall performance and scalability of the XML-based system.

Figure 1 illustrates a typical business transaction, a purchase order (P.O.) between Buyer 1 and Supplier A for 100 widgets. The P.O. transaction content is shown in a printed format, as well as in a more compact delimited format. Historically, both of these formats have been used to communicate transactions between business partners. But while they contain the information the business user needs, they lack the supporting information the user's application needs to process and act on the content of the transaction automatically. Typically, users rekey this information into their business application in order to process the transaction.
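To make the point concrete, here is a minimal Python sketch of what a receiving application has to do with a delimited record. The field layout, the P.O. number, and the date are hypothetical (Figure 1's exact values aren't reproduced here); the key assumption is that the meaning of each position is agreed out of band and hard-coded into the application.

```python
# Hypothetical field layout for a delimited P.O. record. The order is an
# out-of-band agreement between the partners; the record itself does not carry it.
PO_LAYOUT = ["po_number", "buyer", "supplier", "item", "quantity", "order_date"]


def parse_delimited(record: str, layout: list[str], sep: str = "|") -> dict[str, str]:
    """Split a delimited record and attach field names from the agreed layout."""
    values = record.split(sep)
    if len(values) != len(layout):
        raise ValueError(f"expected {len(layout)} fields, got {len(values)}")
    return dict(zip(layout, values))


# Hypothetical record for Buyer 1's order of 100 widgets from Supplier A.
record = "4500123|Buyer 1|Supplier A|Widget|100|2001-03-15"
po = parse_delimited(record, PO_LAYOUT)
print(po["supplier"], po["quantity"])  # Supplier A 100
```

If a partner adds or reorders a field, this code either fails the count check or silently mislabels values, and the quantity still arrives as the text "100" rather than as a number.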

Now let's look at the same transaction represented in XML. One advantage of XML is that it provides a description of the transaction's content that's separate from the actual transaction data. This document type definition (DTD) describes the elements the transaction can contain and how they relate to one another. In some cases the DTD is included within the body of the XML transaction; in others it's a separate file that the XML transaction only references. Listing 1 shows the same P.O. transaction in XML. Note that this transaction uses its own DTD, declared as <!DOCTYPE PURCHASEORDER SYSTEM>, which is referenced rather than included within the XML representation of this P.O.

While XML defines the alphabet (encoding) and grammar of a language, it doesn't provide any context, which is needed if a B2B conversation is to have any meaning. A wide range of standards bodies are pursuing parallel efforts to create dictionaries that provide this context for specific business communities. These efforts hold the promise of supplying the context that's lacking in the XML specification itself. One standards body, the Open Applications Group, Inc. (OAGI) (www.openapplications.org), defines its dictionary in terms of business object documents (BODs). Each BOD represents a specific function within the business process. When the data from a P.O. transaction is represented in the form of an OAGI BOD, the size of the transaction always increases. Listing 2 demonstrates this increase when the data from Listing 1 is encoded in the OAGI Process P.O. BOD.

These examples demonstrate that the OAGI version of the XML is more comprehensive in its content and context than any of the formats shown previously in Figure 1 and Listing 1. The P.O. BOD is flexible enough to handle variations in language, time zone, and units of measure. This additional specification allows more flexibility and enables the recipient of the transaction to act more precisely while executing the order. The OAGI BOD attributes include:

  • Greater flexibility: Note the date format and ability to specify time zones.
  • Extensibility: User- and partner-specific areas provide room for additional information.
  • Ease of use: The tags that provide business context and structure are well defined.
The question remains: Are these benefits worth the cost from a performance and scalability perspective?

It's Not the Size of the XML That Matters - It's What's in It That Counts
First and foremost, the growth in the size of the transaction when it's flexibly described using a specific XML business dialect, as in Listing 2, is significant. In Figure 1, the size of the transaction with formatting is 300B; compressing it into a delimited data interchange format reduces it further to 180B. The same transaction represented in XML without a business-oriented dialect would expand to more than three times its size (see Listing 1). When you add the flexibility offered by a business-oriented DTD, such as the OAGI BOD, the file becomes five to 10 times larger, as shown in Listing 2. In this example our 300B transaction grew to 2,992B, almost 10 times its original size, once the OAGI XML BOD was applied.
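The direction of this growth is easy to reproduce. The sketch below encodes a hypothetical P.O. (invented field names and values, not the contents of Figure 1 or the Listings) once as a pipe-delimited record and once as plain XML using Python's standard xml.etree.ElementTree, then compares byte counts. It models only the tag overhead of plain XML, not the extra structure a dialect such as the OAGI BOD adds.

```python
import xml.etree.ElementTree as ET

# Hypothetical P.O. fields; names and values are illustrative only.
fields = {
    "PONumber": "4500123",
    "Buyer": "Buyer 1",
    "Supplier": "Supplier A",
    "Item": "Widget",
    "Quantity": "100",
    "UnitPrice": "2.50",
    "OrderDate": "2001-03-15",
}

# Delimited form: values only; the meaning of each position is agreed out of band.
delimited = "|".join(fields.values())

# Plain XML form: every value is wrapped in a start tag and an end tag.
root = ET.Element("PURCHASEORDER")
for name, value in fields.items():
    ET.SubElement(root, name).text = value
xml_text = ET.tostring(root, encoding="unicode")

delimited_size = len(delimited.encode("utf-8"))
xml_size = len(xml_text.encode("utf-8"))
print(f"delimited: {delimited_size}B, XML: {xml_size}B, ratio: {xml_size / delimited_size:.1f}")
```

Even this bare-bones XML is several times larger than the delimited record, because field names are repeated in every start and end tag of every transaction; a business dialect's envelope, qualifiers, and context push the ratio further.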

If the source transaction grows by a factor of 10, the impact can be predicted on a linear basis. If the current B2B capabilities support 100 transactions per supplier per day, then 10 suppliers require 600MB of capacity for the XML transactions alone over the course of a year. Compare this to the 40MB required when the transaction is in a delimited form, and the effect of XML utilization on bandwidth and storage capacity becomes apparent. Note that these figures don't take into account the overhead associated with the indexing, filtering, and segmentation of the transaction for the purpose of retrieval, nor do they account for the size of the DTD, which is accessed with the transaction at the time of processing.
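The linear estimate can be written out as transaction size × transactions per supplier per day × suppliers × days. The article doesn't state how many processing days per year it assumes, so the sketch below treats that as an explicit parameter. With roughly 200 business days the formula gives about 598MB for XML and 36MB for delimited records, close to the quoted figures; a full 365-day year gives about 1.1GB and 66MB.

```python
def annual_bytes(bytes_per_tx: int, tx_per_supplier_per_day: int,
                 suppliers: int, days_per_year: int) -> int:
    """Linear capacity estimate: size x daily volume x suppliers x days."""
    return bytes_per_tx * tx_per_supplier_per_day * suppliers * days_per_year


MB = 1_000_000  # decimal megabytes

# days_per_year is an assumption; the text only says "over the course of a year".
for days in (200, 365):
    xml_mb = annual_bytes(2_992, 100, 10, days) / MB
    delimited_mb = annual_bytes(180, 100, 10, days) / MB
    print(f"{days} days: XML {xml_mb:.0f} MB, delimited {delimited_mb:.0f} MB")
# 200 days: XML 598 MB, delimited 36 MB
# 365 days: XML 1092 MB, delimited 66 MB
```

Because the model is linear, doubling the number of suppliers or the daily volume simply doubles both figures; the roughly 16:1 ratio between XML and delimited capacity stays fixed.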

All Tags Are Equal... But Some Are More Equal Than Others
Another key attribute of XML is its self-describing capability through the use of tags that surround the XML data elements. The DTD provides a data representation for any given transaction, and this representation may be included as part of the transaction. To properly parse the XML transaction, a system must first read the DTD, which tells the system what elements to expect and what relationships exist between the various elements. It's this information contained within the DTD that enables an application to act on the data contained in a given XML transaction. When processing text using XML, no additional processing is required for transforming or presenting the data. But when processing dates or numbers, it's an entirely different matter.

The application that receives an XML transaction must typically take some action based on a numeric calculation performed on the data contained in the transaction. Since numeric fields in XML are represented as text, the data must first be converted into a numeric representation before the application can perform its calculation.
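A minimal sketch of that conversion step, using only Python's standard library, follows. The element names and values are hypothetical; the point is that every number and date is character data until the application explicitly converts it (here with int, Decimal, and date.fromisoformat) before calculating.

```python
import xml.etree.ElementTree as ET
from datetime import date
from decimal import Decimal

# Hypothetical P.O. fragment; element names are illustrative, not from Listing 1 or 2.
PO_XML = """<PURCHASEORDER>
  <LINE><ITEM>Widget</ITEM><QUANTITY>100</QUANTITY><UNITPRICE>2.50</UNITPRICE></LINE>
  <ORDERDATE>2001-03-15</ORDERDATE>
</PURCHASEORDER>"""

root = ET.fromstring(PO_XML)
line = root.find("LINE")

# Text elements can be used as-is.
item = line.findtext("ITEM")

# Numbers and dates arrive as character data and must be converted first.
quantity = int(line.findtext("QUANTITY"))
unit_price = Decimal(line.findtext("UNITPRICE"))
order_date = date.fromisoformat(root.findtext("ORDERDATE"))

extended_price = quantity * unit_price
print(item, extended_price, order_date.isoformat())  # Widget 250.00 2001-03-15
```

Decimal is used for the price rather than float so that currency arithmetic stays exact; either way, the conversion is work the application repeats for every numeric field of every transaction.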

The performance implications of this XML implementation detail are more difficult to predict. In general, numeric calculations are more performance intensive than date calculations, and text calculations are the least intensive. To estimate the impact of this attribute, the logic applied to a given transaction must be broken down by the three types of elements it operates on: text, numeric, and date. The more calculations based on a number or date element, the greater the performance and scalability impact on the system.
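One rough way to apply this breakdown is to count the leaf elements a transaction carries by type and weight them. In the sketch below the weights (numeric 3, date 2, text 1) are hypothetical and only encode the ordering claimed above, not measured costs, and counting elements is a stand-in for counting the calculations the application actually performs on them.

```python
import xml.etree.ElementTree as ET
from datetime import date
from decimal import Decimal, InvalidOperation

# Hypothetical relative weights: they only encode the ordering numeric > date > text.
WEIGHTS = {"numeric": 3, "date": 2, "text": 1}


def classify(value: str) -> str:
    """Guess an element's type from its character data."""
    try:
        Decimal(value)
        return "numeric"
    except InvalidOperation:
        pass
    try:
        date.fromisoformat(value)
        return "date"
    except ValueError:
        return "text"


def estimate_cost(xml_text: str) -> tuple[dict[str, int], int]:
    """Count leaf elements by type and return (counts, weighted cost)."""
    counts = {kind: 0 for kind in WEIGHTS}
    for elem in ET.fromstring(xml_text).iter():
        text = (elem.text or "").strip()
        if len(elem) == 0 and text:  # only leaf elements that carry data
            counts[classify(text)] += 1
    cost = sum(WEIGHTS[kind] * n for kind, n in counts.items())
    return counts, cost


SAMPLE = (
    "<PO><BUYER>Buyer 1</BUYER><ITEM>Widget</ITEM><QUANTITY>100</QUANTITY>"
    "<UNITPRICE>2.50</UNITPRICE><ORDERDATE>2001-03-15</ORDERDATE></PO>"
)
print(estimate_cost(SAMPLE))  # ({'numeric': 2, 'date': 1, 'text': 2}, 10)
```

A relative score like this is only useful for comparing transaction types against each other; real weights would have to come from profiling the actual application.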

Validate Now or Validate Later... Either Way, You Must Pay
Finally, consider the use of qualifiers and required elements within the DTD and their impact on performance. One of the key strengths of XML is its flexibility. By defining elements as optional, it's possible to generalize transactions by type, so a transaction can be made applicable to many diverse situations. As a result of this flexibility, however, resolving these transactions requires more application logic, and this additional logic also influences system performance.

One way to improve performance is to embed the logic wherever possible in the DTD, as required elements or through qualifiers. While this reduces runtime processing, it also makes validating the transaction against the DTD more processor intensive, since there are additional constraints to test. This affects both performance and scalability. Either way, the cost is shifted rather than eliminated: a flexible DTD requires an extra step in the application logic, while a strict DTD moves that work into validation, after which the application no longer needs to perform it.
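The trade-off can be sketched in Python. The standard library doesn't validate against DTDs, so validate_strict below is a hand-coded stand-in for a DTD that declares ITEM, QUANTITY, and UOM as required; all element names and the EACH default are hypothetical. The strict path pays once, up front, and the function that runs after it needs no defensive branches; the flexible path accepts more documents but carries default-handling logic into every run.

```python
import xml.etree.ElementTree as ET

# Hand-coded stand-in for a DTD that declares these elements as required.
# (Python's standard library does not perform DTD validation.)
REQUIRED = ("ITEM", "QUANTITY", "UOM")


def validate_strict(root: ET.Element) -> None:
    """Validate now: reject the transaction up front if a required element is missing."""
    missing = [tag for tag in REQUIRED if root.find(tag) is None]
    if missing:
        raise ValueError("invalid transaction, missing: " + ", ".join(missing))


def process_validated(root: ET.Element) -> tuple[str, int, str]:
    """After validation the structure is guaranteed, so no defaults are needed."""
    return root.findtext("ITEM"), int(root.findtext("QUANTITY")), root.findtext("UOM")


def process_flexible(root: ET.Element) -> tuple[str, int, str]:
    """Validate later: optional elements are resolved by application logic on every run."""
    item = root.findtext("ITEM", default="UNKNOWN")
    quantity = int(root.findtext("QUANTITY", default="0"))
    uom = root.findtext("UOM", default="EACH")  # hypothetical default unit of measure
    return item, quantity, uom


loose = ET.fromstring("<PO><ITEM>Widget</ITEM><QUANTITY>100</QUANTITY></PO>")
print(process_flexible(loose))  # ('Widget', 100, 'EACH')

strict = ET.fromstring("<PO><ITEM>Widget</ITEM><QUANTITY>100</QUANTITY><UOM>BOX</UOM></PO>")
validate_strict(strict)
print(process_validated(strict))  # ('Widget', 100, 'BOX')
```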

Planning for Performance and Scalability
The performance impact of these attributes, while considerable, shouldn't be used as an argument to dismiss XML. On the contrary, by knowing the cause and effect of these attributes on a given XML system, it's possible to balance performance and scalability against the flow of transactions into the production environment. If a system requires flexibility, extensibility, ease of use, and platform independence, then XML is very appropriate. By knowing the impact of these key attributes on the performance of a specific system, it becomes apparent how to plan for maximum performance and scalability.

For example, consider the choice between a system that stores XML transactions and one that resolves transactions into a specific database schema. Given the performance implications of XML's key attributes, it's important to weigh the need to accommodate changes in the transactions, and variations among suppliers' transactions, against the need to resolve the transactional content into its constituent data types and relationships for application performance. If an XML system must accept a significant volume of transactions, and those transactions are consistent across suppliers, then the need for adaptability is secondary. In this case the system could be designed to use XML as an interoperability layer, and the transactions can be resolved into a database as they arrive.
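For the consistent-transactions case, a minimal sketch of resolving each XML P.O. into relational columns as it arrives might look like this. It uses Python's built-in sqlite3 and xml.etree.ElementTree; the table layout, the element names, and the choice to store prices as integer cents are all hypothetical. Once shredded, quantities and prices are stored as integers, so later calculations no longer pay the text-to-number conversion cost.

```python
import sqlite3
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical fixed layout shared by all suppliers; XML is only the interchange layer.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE po_line (
        po_number TEXT, supplier TEXT, item TEXT,
        quantity INTEGER, unit_price_cents INTEGER, order_date TEXT)"""
)


def shred(xml_text: str) -> None:
    """Resolve one incoming XML P.O. into a typed relational row as it arrives."""
    po = ET.fromstring(xml_text)
    row = (
        po.findtext("PONUMBER"),
        po.findtext("SUPPLIER"),
        po.findtext("ITEM"),
        int(po.findtext("QUANTITY")),
        int(Decimal(po.findtext("UNITPRICE")) * 100),  # store money as integer cents
        po.findtext("ORDERDATE"),
    )
    conn.execute("INSERT INTO po_line VALUES (?, ?, ?, ?, ?, ?)", row)


shred(
    "<PO><PONUMBER>4500123</PONUMBER><SUPPLIER>Supplier A</SUPPLIER><ITEM>Widget</ITEM>"
    "<QUANTITY>100</QUANTITY><UNITPRICE>2.50</UNITPRICE><ORDERDATE>2001-03-15</ORDERDATE></PO>"
)
total_cents = conn.execute("SELECT SUM(quantity * unit_price_cents) FROM po_line").fetchone()[0]
print(total_cents / 100)  # 250.0
```

The weakness is the one the next paragraph describes: any element not anticipated in the CREATE TABLE statement is simply dropped.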

Mapping directly into a database structure, however, isn't always appropriate. For example, if there are many different types of transactions spanning a long period of time and many systems, it becomes extremely difficult, if not impossible, to know in advance exactly which data elements will need to be stored or how those elements relate to one another. Without this advance knowledge, the database schema can't be designed to maximize performance.

If the situation is more dynamic, however, the decision may be different. Consider the need to provide dynamic analytics on a multitude of transaction types, across a wide span of time, and with various systems. In this instance the required flexibility will almost certainly demand that the data be stored in native XML. Because handling native XML requires more processing, the system must be designed with appropriate performance and scalability from the outset. Knowing which XML factors have an influence, such as the cost of numeric calculations or the growth in transaction size, allows the system to be architected so it can grow as needed.
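By contrast, a minimal sketch of the native-XML option keeps every transaction whole, whatever its type, and interprets structure only when a question is asked. Here sqlite3 serves merely as a generic text store (it is not an XML database), the ad hoc query uses the limited path syntax of Python's ElementTree findall, and the documents are hypothetical. Every query re-parses every stored document, which is exactly the kind of cost that has to be designed into the system's capacity.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Native-XML option: documents of any type are stored whole, as text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tx (id INTEGER PRIMARY KEY, received TEXT, doc TEXT)")
conn.executemany(
    "INSERT INTO tx (received, doc) VALUES (?, ?)",
    [
        ("2001-03-15", "<PO><SUPPLIER>Supplier A</SUPPLIER><QUANTITY>100</QUANTITY></PO>"),
        ("2001-03-16", "<INVOICE><SUPPLIER>Supplier A</SUPPLIER><AMOUNT>250.00</AMOUNT></INVOICE>"),
        ("2001-04-02", "<PO><SUPPLIER>Supplier B</SUPPLIER><QUANTITY>40</QUANTITY></PO>"),
    ],
)


def query(root_tag: str, path: str) -> list[tuple[str, str]]:
    """Ad hoc query: re-parse every stored document and apply an ElementTree path.

    root_tag="*" matches any transaction type.
    """
    results = []
    for received, doc in conn.execute("SELECT received, doc FROM tx ORDER BY id"):
        root = ET.fromstring(doc)
        if root_tag in ("*", root.tag):
            results.extend((received, elem.text) for elem in root.findall(path))
    return results


print(query("PO", "QUANTITY"))  # [('2001-03-15', '100'), ('2001-04-02', '40')]
print(query("*", "SUPPLIER"))
```

No schema had to anticipate the INVOICE type or the AMOUNT element; the price of that flexibility is paid in parsing on every query.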

Parting Thoughts
The performance impact of XML attributes shouldn't discourage programmers and system architects from using XML. On the contrary, by knowing the cause and effect between these attributes and a given XML system, it's possible to balance performance and scalability against the flow of transactions into one's environment. By understanding how XML is used in an environment, it becomes apparent how to plan for performance and scalability.

To maximize the performance and scalability of an XML system, you must balance the requirements of the enterprise with the role of XML. If the requirements dictate an adaptable dynamic solution where the evolution of the solution is unclear, then maintaining the transactions within the system as native XML will pay dividends that far outweigh the performance impact of doing so. If it's clear that the enterprise solution can be bound to a single data model that will evolve slowly and in a predictable way, then performance and scalability can be allowed to dictate the system implementation, and the role of XML becomes that of an integration layer that maps nonconforming transactions into the relational data model.

Given the turbulence and infancy of today's B2B landscape, I recommend focusing on the opportunity XML offers while fully understanding the impact of that decision from the performance and scalability perspective. As the adoption of XML grows and as XML tools and applications become more prevalent, the performance and scalability discussion will focus on specific implementation details.

Part 2 will focus on storage and retrieval issues associated with using XML. I'll discuss the scalability, performance, and context implications associated with storing XML in its native format versus resolving it to a database. If you'd like me to discuss some particular aspect of this topic, e-mail me at the address below.

Reference
The Open Applications Group, Inc. www.openapplications.org.
