
XML Compression and Its Role in SOA Performance

Dealing with the increased size of SOA payloads

Given the uncertainty and volatility of market conditions today, enterprises depend on cutting-edge technology to gain an edge over fierce competitors. At the same time, they try to extract more value from their existing IT investments. Mergers and acquisitions add to the mix of disparate applications and technologies, since each one brings in its own set of applications.

Better integration of these myriad applications, built on different technologies, clearly makes them more valuable. Using Service Oriented Architecture (SOA), enterprises can not only achieve better integration but also become future-ready, agile enterprises that respond swiftly to changes in business processes.

XML and Its Role in SOA
XML is emerging as the lingua franca of data representation and exchange across applications interacting in an SOA world. A close look at the standards stack for SOA (Figure 1) shows that XML is the foundation for Web Services standards such as XML Schema, SOAP, WSDL, and UDDI. These standards use XML-based representations to carry out information interchange between service providers and requestors in an SOA.

Beyond the core syntactic standards of SOA shown in Figure 1, semantics is another important dimension of communication between a service provider and a service consumer in an SOA infrastructure. Both sides must understand the contents of the messages in the same way, which leads to the requirement of semantic interoperability.

XML helps solve the semantic interoperability problems associated with working with different data formats in different applications across multiple platforms. Stakeholders in different vertical business domains have come together and defined shared XML-based vocabularies to address the semantic interoperability issue. (See http://xml.coverpages.org for a comprehensive list of such standardization efforts.) Using XML inherently brings ease of representation since it's text-based, flexible, and extensible. The platform- and language-independence of XML has made it SOA's mainstream representation format.

SOA Performance Challenge and XML Compression Solutions
While self-describing XML-based service descriptions and messages make data exchange in an SOA easier and lend reusability and extensibility, they also increase the size of the data significantly. This is because an XML message typically contains not only the data as text but also the format of the data: element and attribute names are repeated around every value. In some document-oriented vocabularies it also carries information about how the data is presented to the end user, like font, size, and style. The verbosity of text-based representation by itself also tends to increase the size of SOA payloads. So XML data representation increases not only data storage needs and data transfer times but also data parsing times, creating a performance challenge for an SOA.
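To make the size argument concrete, here is a minimal Python sketch that estimates how much of a message is markup rather than data. The payload and the function name `markup_overhead` are hypothetical and exist only for illustration; real SOAP messages also carry envelopes, namespaces, and attributes that this sketch ignores.

```python
import xml.etree.ElementTree as ET

# Hypothetical payload, invented for this illustration.
PAYLOAD = (
    "<order><customer><name>Ann</name><city>Oslo</city></customer>"
    "<item><sku>A1</sku><qty>2</qty></item></order>"
)

def markup_overhead(xml_text: str) -> float:
    """Return the fraction of characters that are markup rather than data."""
    root = ET.fromstring(xml_text)
    # itertext() yields every piece of character data in document order.
    data_chars = sum(len(text) for text in root.itertext())
    return 1 - data_chars / len(xml_text)

overhead = markup_overhead(PAYLOAD)
print(f"markup share of payload: {overhead:.0%}")
```

In this small payload only 10 of the 107 characters are data, so tags account for roughly nine-tenths of the message. Real ratios depend heavily on the vocabulary and on how long the values are.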

The following are the salient points driving the need for compressing XML documents in the context of SOA:

  1. Redundant data in XML documents, e.g., white space, similar node names.
  2. Text-based XML document sizes tend to be large.
  3. The need for an efficient way to store files based on XML.
  4. Large volumes of XML data sent over the network as SOAP payloads.

One category of industry solutions used to solve SOA performance management problems relies on the notion of XML compression. These solutions use compression techniques to reduce the size of the XML payloads carried in SOA messages and transfer the data in compressed format. However, compression and decompression add a cost at either end that has to be included when computing the overall cost.
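The trade-off described above can be measured directly. The following is a minimal sketch using Python's standard `gzip` module; the payload generator `build_payload` and the helper `measure` are hypothetical names, and the repetitive sample data is an assumption that flatters the compressor compared with real messages.

```python
import gzip
import time

def build_payload(rows: int) -> bytes:
    """Build a repetitive, hypothetical XML payload of the given row count."""
    parts = ["<orders>"]
    for i in range(rows):
        parts.append(f"<order><id>{i}</id><status>SHIPPED</status></order>")
    parts.append("</orders>")
    return "".join(parts).encode("utf-8")

def measure(payload: bytes) -> dict:
    """Compress and decompress once, recording sizes and elapsed times."""
    t0 = time.perf_counter()
    packed = gzip.compress(payload)
    t1 = time.perf_counter()
    restored = gzip.decompress(packed)
    t2 = time.perf_counter()
    return {
        "original_bytes": len(payload),
        "compressed_bytes": len(packed),
        "compress_seconds": t1 - t0,
        "decompress_seconds": t2 - t1,
        "round_trip_ok": restored == payload,
    }

result = measure(build_payload(1000))
print(result)
```

Whether compression pays off depends on comparing the bytes saved, at the available network bandwidth, against the two timing figures. On a fast local network the CPU cost may outweigh the transfer savings.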

While the issues related to data storage and data transfer times can be resolved to a significant level by using compression techniques, the problem related to the processing overhead can be solved using both software and hardware solutions. A variety of tools and methodologies are already on the market to overcome XML processing limitations. Some prominent categories of tools and technologies that help overcome the limitations associated with using XML are briefly mentioned here:

XML Hardware
Processing large XML documents consumes enormous amounts of CPU, memory, and network bandwidth. Traditionally, processors did general-purpose processing, but with the advent of XML and XML-based applications a new breed of custom acceleration processors is being developed. This specialized hardware, called XML accelerators, speeds up not only time-consuming tasks like XSL transformation and schema validation but also security-related features like encryption. XML accelerators are network devices that offload overtaxed servers by processing XML at wire speed.

Compact Representation
The key premise of this approach is to reduce the size of the message being carried by using a compact representation. The usual textual format of XML offers no way to determine where a data value ends without scanning for its closing delimiter, so the application has to examine every byte received, which costs time and hurts performance. A different approach is to represent XML in a binary format such as Abstract Syntax Notation One (ASN.1). This notation is associated with standardized encoding rules such as the Basic Encoding Rules (BER) and the Packed Encoding Rules (PER) and is useful for applications that have bandwidth restrictions. Because an encoding such as BER prefixes each value with its length, the receiver can skip directly to the next value, which reduces processing time and enhances performance.
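The benefit of length-prefixed values can be sketched in a few lines of Python. This is a simplified tag-length-value layout invented for illustration, with a one-byte tag and a fixed four-byte length. It is not a BER or PER implementation, and `encode_field` and `decode_fields` are hypothetical names.

```python
import struct

HEADER = ">BI"  # assumed layout: 1-byte tag, 4-byte big-endian length
HEADER_SIZE = struct.calcsize(HEADER)

def encode_field(tag: int, value: bytes) -> bytes:
    """Prefix the value with its tag and length."""
    return struct.pack(HEADER, tag, len(value)) + value

def decode_fields(buf: bytes) -> list:
    """Walk the buffer by jumping from header to header."""
    fields = []
    offset = 0
    while offset < len(buf):
        tag, length = struct.unpack_from(HEADER, buf, offset)
        offset += HEADER_SIZE
        # The length tells us where the value ends; no delimiter scan needed.
        fields.append((tag, buf[offset:offset + length]))
        offset += length
    return fields

message = encode_field(1, b"Ann") + encode_field(2, b"Oslo")
print(decode_fields(message))
```

The decoder never inspects the bytes of a value to find its end, which is the property the textual XML format lacks.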

XML Cache/Component Parsers
Repeatedly used XML data can be cached to reduce XML processing overhead. Similarly, specialized XML parsers can be used that cater to the specific needs of an SOA application.
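One simple way to cache repeatedly used XML is to memoize the parse step. The sketch below uses Python's `functools.lru_cache`; the function name `parse_cached` and the cache size are assumptions, and the approach only fits documents that callers treat as read-only, because every caller receives the same parsed tree.

```python
from functools import lru_cache
import xml.etree.ElementTree as ET

@lru_cache(maxsize=128)
def parse_cached(xml_text: str) -> ET.Element:
    """Parse the text once; later calls with identical text reuse the tree."""
    return ET.fromstring(xml_text)

# Hypothetical document that many requests refer to.
CONFIG = "<service><endpoint>orders</endpoint><timeout>30</timeout></service>"

first = parse_cached(CONFIG)
second = parse_cached(CONFIG)  # served from the cache, no second parse
print(first is second, first.findtext("timeout"))
```

Keying the cache on the full text is the simplest choice. A production cache would more likely key on a document identifier or hash and would need an invalidation policy.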

XML Software Compression
Since XML is text-based, general-purpose compressors such as gzip and bzip2 can be applied to it directly; gzip, for example, combines Lempel-Ziv matching with Huffman coding. These generic text compressors are effective and achieve very good compression ratios. They are designed for sequential data, however, and unlike normal text, XML data is tree-structured. XMILL from AT&T is a compression technique focused on XML. It regroups similar XML nodes and then uses conventional compressors such as gzip to compress the result of the regrouping. The salient features of gzip and XMILL compare as follows:

gzip
  • available in both Open Source and commercial implementations
  • provides a good compression rate
  • free from patented algorithms
  • knowledge of the document structure isn't needed

XMILL
  • better compression rate compared to gzip (by a factor of two)
  • separates structure from content
  • moderately faster than gzip
  • three types of compressors available:
    - atomic compressors for the basic data types
    - combined compressors
    - user-defined compressors
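The idea of separating structure from content and regrouping similar nodes can be sketched as follows. This is a loose illustration of the regrouping principle, not XMILL's actual algorithm or file format: it ignores attributes and mixed content, uses `zlib` as the back-end compressor, and all function names are hypothetical.

```python
import zlib
import xml.etree.ElementTree as ET
from collections import defaultdict

def split_structure_and_content(xml_text: str):
    """Return (structure tokens, values grouped by element name)."""
    root = ET.fromstring(xml_text)
    structure = []
    containers = defaultdict(list)

    def walk(elem):
        structure.append("<" + elem.tag)
        text = (elem.text or "").strip()
        if text:
            containers[elem.tag].append(text)  # group values of the same tag
            structure.append("#")              # placeholder for the value
        for child in elem:
            walk(child)
        structure.append(">")

    walk(root)
    return structure, dict(containers)

def compress_regrouped(xml_text: str) -> dict:
    """Compress the structure and each value container separately."""
    structure, containers = split_structure_and_content(xml_text)
    blobs = {"__structure__": zlib.compress(" ".join(structure).encode("utf-8"))}
    for tag, values in containers.items():
        blobs[tag] = zlib.compress("\n".join(values).encode("utf-8"))
    return blobs

SAMPLE = (
    "<orders><order><id>1</id><status>NEW</status></order>"
    "<order><id>2</id><status>SHIPPED</status></order></orders>"
)
blobs = compress_regrouped(SAMPLE)
print({name: len(blob) for name, blob in blobs.items()})
```

Grouping puts values of the same kind next to each other, which is what gives a conventional compressor more redundancy to exploit on large documents. On a tiny sample like this one the per-container overhead dominates, so no size gain should be expected.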

Binary XML Standards: XOP, MTOM & RRSHB
New schemas are being developed to solve the problem of exchanging large documents between the service provider and the consumer. These schemas address the problem of fitting binary data directly into an XML message.

MTOM describes how XML-binary Optimized Packaging (XOP) is layered into the SOAP HTTP transport binding. It uses XOP to let SOAP bindings speed up data transmission by selectively encoding portions of the XML message: binary content that would otherwise be base64-encoded inline is carried as raw bytes in a separate part. The trade-off is that MTOM uses a MIME package as opposed to plain XML, so it replaces the overhead of base64 encoding with the overhead of MIME processing.
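The cost that XOP avoids is easy to quantify, since base64 turns every 3 bytes of binary data into 4 characters of text. The short Python sketch below shows the inflation; the helper name `base64_overhead` is hypothetical, and the sketch says nothing about the MIME processing cost on the other side of the trade-off.

```python
import base64

def base64_overhead(n_bytes: int) -> float:
    """Return encoded size divided by raw size for an n-byte binary blob."""
    raw = bytes(n_bytes)  # the content does not affect the encoded length
    encoded = base64.b64encode(raw)
    return len(encoded) / len(raw)

# A hypothetical 3000-byte attachment embedded inline in a SOAP body.
ratio = base64_overhead(3000)
print(f"inline base64 size is {ratio:.2f}x the raw size")
```

For a 3000-byte attachment the encoded form is 4000 characters, an increase of about one third before any XML markup is added around it.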

Resource Representation SOAP Header Block (RRSHB) lets a SOAP message carry a representation of a Web resource, so that all the data needed to process the message travels with it. This is useful in cases where the receiver cannot easily retrieve the resource itself, for example because access to the resource is restricted or because fetching it separately would add network overhead.

SOA infrastructure relies heavily on XML to be the lingua franca, and effective SOA performance management requires efficient ways of handling XML. XML compression techniques can go a long way in handling the SOA performance challenge. Needless to say, specific application needs are very decisive in choosing a compression technique from the myriad of techniques mentioned in this article.


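References
  1. XMILL: http://sourceforge.net/projects/xmill
  2. gzip: www.gzip.org
  3. Datapower XML hardware: www.datapower.com/products/xa35.html
  4. Sarvega hardware: www.sarvega.com/xml-security-products.html
  5. SOAP Message Transmission Optimization Mechanism (MTOM): www.w3.org/TR/soap12-mtom/
  6. Resource Representation SOAP Header Block (RRSHB): www.w3.org/TR/soap12-rep/
  7. XML-binary Optimized Packaging (XOP): www.w3.org/TR/soap12-mtom/#XOP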

More Stories By Dr. Srinivas Padmanabhuni

Dr. Srinivas Padmanabhuni is a principal researcher with the Web Services Centre of Excellence in SETLabs, Infosys Technologies, and specializes in Web Services, service-oriented architecture, and grid technologies alongside pursuing interests in Semantic Web, intelligent agents, and enterprise architecture. He has authored several papers in international conferences. Dr. Padmanabhuni holds a PhD degree in computing science from University of Alberta, Edmonton, Canada.

More Stories By Akash Saurav Das

The authors are interning and/or working as part of the Web Services COE (Center of Excellence) for Infosys Technologies, a global IT consulting firm, and have substantial experience in publishing papers, presenting papers at conferences, and defining standards for SOA and Web services. The Web Services COE specializes in SOA, Web services, and other related technologies.
