By Jimmy Zhang
March 18, 2005 12:00 AM EST
SOAP is an XML-based data protocol standardized by the W3C to enable inter-application data exchange over the Internet. In a typical Web Services scenario, a SOAP message delivered via HTTP needs to be parsed before anything else can happen. The two popular SOAP processing methods, DOM and SAX/Pull, force application developers to choose between performance/memory efficiency and ease of use. VTD-XML is the latest open-source, "non-extractive" XML processing API written in Java that overcomes many problems of the status quo. The combination of its high performance, low memory footprint, random access, incremental update, and inherent persistence means this: with VTD-XML, application developers can finally unleash the power of SOAP to the fullest extent.
To many application developers, Web Services are usually synonymous with SOAP over HTTP. While HTTP (Hypertext Transfer Protocol) has been around for over a decade, the real excitement of Web Services lies in the use of SOAP (Simple Object Access Protocol). Effectively a subset of XML, SOAP possesses some unique attributes that set Web Services apart from previous distributed computing technologies, such as DCOM and CORBA. For one, SOAP is open and human readable, meaning that programming SOAP is simpler and easier to grasp. Equally important, the SOAP representation of data is loosely encoded. Applications communicating using SOAP are no longer restrained by the rigidity of a schema, which makes possible a true decoupling between application logic and the wire format of data.
Current SOAP Processing Overview
Due to its textual nature, a SOAP message must be parsed into machine-readable form before it can be understood by software applications. There are two types of SOAP processing models widely in use today: the tree-based DOM, which is easy to use but resource intensive, and the event-based SAX/Pull, which is efficient but gives up random access.
With current XML processing methods, then, it is difficult to get both high processing/memory efficiency and ease of use. But there is more to think about.
Right now, parsing SOAP messages, whether the application uses DOM or SAX, is pretty much inevitable, even if it is done repeatedly on the same messages. Wouldn't it be nice if there were a pre-parsed form of XML that is directly reusable without the overhead of parsing every time?
Also consider modifying the text content of the following XML file.
<color> red </color>
Using DOM, it would require at least the following three steps: build the DOM tree, navigate to and then update the text node, and write the updated structure back into XML. So no matter how trivial the modification is, there is a round-trip penalty of parsing the document and writing it back out. What if it is only a snippet buried within a big document? Wouldn't it be nice to be able to surgically remove the old content and insert the update "in place"?
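The three DOM steps can be sketched with the JAXP classes that ship with the JDK. This is only an illustration of the round trip, not code from the article's listings; the class name DomRoundTrip and the method replaceRootText are made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

// Hypothetical helper: shows the parse / update / serialize round trip that DOM requires.
final class DomRoundTrip {
    /** Replaces the text content of the root element and returns the re-serialized XML. */
    static String replaceRootText(String xml, String newText) {
        try {
            // Step 1: build the DOM tree from the whole document.
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            // Step 2: navigate to the node and update its text.
            doc.getDocumentElement().setTextContent(newText);
            // Step 3: write the entire updated structure back out as XML.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("DOM round trip failed", e);
        }
    }
}
```

Calling DomRoundTrip.replaceRootText("<color> red </color>", " blue ") parses and re-serializes the whole document even though only a few characters change, which is the penalty described above.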
VTD: A Simple Solution
Historically, the first step of text processing is usually to tokenize the input file into many little null-terminated strings. But there is another way to tokenize. Rather than extracting the token content out of the input, one can instead retain the original document intact in memory and use offsets and lengths to describe tokens. In other words, tokenization can be done "non-extractively." We can look at how this "non-extractive" tokenization approach works in practice and compare it with the traditional "extractive" view of tokens in the context of some common usage scenarios.
- String comparison: Under the traditional text-processing framework, C's "strcmp" function (in <string.h>) compares an "extractive" token against a known string. In the "non-extractive" approach, one can simply use C's "strncmp" function, also in <string.h>, passing the token's starting address and length.
- String to numerical data conversion: C's "atof" and "atoi" convert strings into numerical data types. One can introduce new functions or macros to convert "non-extractive" tokens into integers or floats. For example, a new "atof_ne" would have to take three inputs: the character pointer, the starting offset, and the length. Notice that the character pointer points at the memory buffer in which the entire document resides.
- Trim: To remove the leading and trailing white space of a "non-extractive" token, we only need to recompute its offset and length from their older values. Doing the same thing to extractive tokens usually involves creating new tokens.
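The three scenarios above are described in terms of C functions, but the same idea can be sketched in Java, the language of VTD-XML. The class and method names below are hypothetical and assume ASCII content; they are not part of the VTD-XML API.

```java
// Hypothetical helpers operating on (document buffer, offset, length) triples.
final class NonExtractiveTokens {
    /** Compares the token doc[offset, offset + length) with an ASCII string without copying it. */
    static boolean matches(byte[] doc, int offset, int length, String expected) {
        if (length != expected.length()) return false;
        for (int i = 0; i < length; i++) {
            if (doc[offset + i] != (byte) expected.charAt(i)) return false;
        }
        return true;
    }

    /** Converts a token of ASCII digits (optional leading '-') to an int; overflow is not checked. */
    static int parseInt(byte[] doc, int offset, int length) {
        if (length <= 0) throw new NumberFormatException("empty token");
        int i = offset;
        int end = offset + length;
        boolean negative = doc[i] == '-';
        if (negative) i++;
        if (i == end) throw new NumberFormatException("no digits");
        int value = 0;
        for (; i < end; i++) {
            int digit = doc[i] - '0';
            if (digit < 0 || digit > 9) throw new NumberFormatException("not a digit");
            value = value * 10 + digit;
        }
        return negative ? -value : value;
    }

    /** Trims white space by recomputing offset and length; returns {newOffset, newLength}. */
    static int[] trim(byte[] doc, int offset, int length) {
        int start = offset;
        int end = offset + length;
        while (start < end && isSpace(doc[start])) start++;
        while (end > start && isSpace(doc[end - 1])) end--;
        return new int[] { start, end - start };
    }

    private static boolean isSpace(byte b) {
        return b == ' ' || b == '\t' || b == '\n' || b == '\r';
    }
}
```

None of these helpers allocates a string: the token is always identified by where it sits in the original buffer.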
The above considerations have led to the design of a "non-extractive" token encoding specification called Virtual Token Descriptor (VTD). A VTD record is a 64-bit integer that encodes the length, the starting offset, the token type, and the nesting depth of a token in XML. For certain types of tokens, the length field further encodes the prefix length and qualified name length, since both share the identical offset.
One immediate benefit of VTD's non-extractive tokenization is that, because the document is kept intact, VTD allows applications to surgically insert and remove XML content in much the same way as manipulating a byte array. For example, removing or changing an attribute value or text content is the same as skipping the segment, marked by its offset and length, that contains the "unwanted" text. VTD also makes it possible to remove an entire element by simply skipping it according to its offset and length.
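To make the idea of a 64-bit record concrete, here is a minimal sketch of packing the four fields into a Java long. The field widths and bit positions (30-bit offset, 20-bit length, 8-bit depth, 4-bit type) are assumptions chosen for illustration; the authoritative layout is the one in the VTD specification on the project web site.

```java
// Illustrative only: field widths and positions are assumptions, not the VTD specification.
final class VtdRecordSketch {
    private static final int LENGTH_SHIFT = 32;
    private static final int DEPTH_SHIFT = 52;
    private static final int TYPE_SHIFT = 60;
    private static final long OFFSET_MASK = (1L << 30) - 1;
    private static final long LENGTH_MASK = (1L << 20) - 1;
    private static final long DEPTH_MASK = (1L << 8) - 1;
    private static final long TYPE_MASK = (1L << 4) - 1;

    /** Packs token type, nesting depth, length, and starting offset into one long. */
    static long pack(int type, int depth, int length, int offset) {
        if ((type & ~TYPE_MASK) != 0 || (depth & ~DEPTH_MASK) != 0
                || (length & ~LENGTH_MASK) != 0 || (offset & ~OFFSET_MASK) != 0) {
            throw new IllegalArgumentException("field out of range");
        }
        return ((long) type << TYPE_SHIFT)
                | ((long) depth << DEPTH_SHIFT)
                | ((long) length << LENGTH_SHIFT)
                | offset;
    }

    static int type(long record)   { return (int) ((record >>> TYPE_SHIFT) & TYPE_MASK); }
    static int depth(long record)  { return (int) ((record >>> DEPTH_SHIFT) & DEPTH_MASK); }
    static int length(long record) { return (int) ((record >>> LENGTH_SHIFT) & LENGTH_MASK); }
    static int offset(long record) { return (int) (record & OFFSET_MASK); }
}
```

Because every record is a primitive long of fixed width, a whole document's worth of tokens fits in a plain long array.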
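A minimal sketch of such a byte-level edit follows. It assumes the offset and length of the segment are already known (for instance, from a token record), and the class and method names are hypothetical rather than part of VTD-XML.

```java
// Hypothetical helper: edits a document as a byte array using a segment's offset and length.
final class InPlaceEdit {
    /** Returns a copy of doc in which the segment [offset, offset + length) is replaced. */
    static byte[] replaceSegment(byte[] doc, int offset, int length, byte[] replacement) {
        byte[] out = new byte[doc.length - length + replacement.length];
        // Copy everything before the segment.
        System.arraycopy(doc, 0, out, 0, offset);
        // Write the new content where the old segment started.
        System.arraycopy(replacement, 0, out, offset, replacement.length);
        // Copy everything after the segment, skipping the unwanted bytes.
        System.arraycopy(doc, offset + length, out, offset + replacement.length,
                doc.length - offset - length);
        return out;
    }

    /** Removes the segment [offset, offset + length), for example an entire element. */
    static byte[] removeSegment(byte[] doc, int offset, int length) {
        return replaceSegment(doc, offset, length, new byte[0]);
    }
}
```

For the earlier example, replacing the 3 bytes at offset 8 of "<color> red </color>" with "blue" yields "<color> blue </color>" without building or serializing a tree.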
Built on the concept of VTD, VTD-XML is the latest open source, "non-extractive," Java-based XML processing API, ideally suited for SOAP processing. Currently it supports only the five built-in entities (&amp;, &lt;, &gt;, &apos;, and &quot;). The latest VTD-XML is version 0.8, which can be downloaded from the project web site (http://vtd-xml.sf.net). Aside from keeping the XML file intact in memory and exclusively using VTD to describe tokens, VTD-XML also introduces the concept of location caches, which provide efficient random access. Unlike DOM, VTD-XML's notion of hierarchy consists exclusively of elements, which essentially correspond to the VTD records for starting tags. Resembling the index section of a book, location caches again make extensive use of 64-bit integers. The project web site (http://vtd-xml.sf.net) has an in-depth description of how VTD-XML achieves random access with location caches.
VTD-XML should exhibit the following characteristics when used in a Web Services project. First, it parses SOAP messages at a speed equal to, if not faster than, SAX with a NULL content handler. On a 1.5 GHz Athlon processor, VTD-XML processes SOAP messages at around 25 to 35 MB/sec. Second, unlike SAX, VTD-XML offers full random access because the entire parsed XML is resident in memory. Furthermore, if you are one of the developers who find DOM's node-based API verbose and difficult to use, you should find VTD-XML's API clean and easy to comprehend. VTD-XML's memory requirement is about 1.3x to 1.5x the size of the XML, with 1 being the document itself, as it is part of the internal representation of VTD-XML. In addition, incremental, dynamic update of the XML content is much more efficient than with either DOM or SAX.
Why does VTD-XML consume less memory than DOM? In many VM-based object-oriented programming languages, per-object allocation incurs a small amount of memory overhead. VTD records are immune to this overhead because they are not objects. VTD records are also constant in length and can be stored in large memory blocks, which are more efficient to allocate and garbage collect. For example, by allocating a large array for 4096 VTD records, one incurs the per-array overhead (16 bytes in JDK 1.4) only once across 4096 records, thus reducing the per-record overhead to very little.
More importantly, VTD's efficient memory usage has strong implications for its performance. DOM is slow in very large part because it is resource intensive. The spirit of VTD is this: one simply doesn't have to, and has every incentive not to, create string objects because they are slow to create. Even worse, they eventually need to be garbage collected. VTD-XML is able to achieve SAX's performance level because VTD uses significantly less memory than DOM, which leads to savings on both object creation and garbage collection.
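The arithmetic is simple: 16 bytes of array overhead spread over 4096 records is 16 / 4096 = 0.0039 bytes per 8-byte record, or roughly 0.05 percent. The following sketch stores records in fixed-size long arrays to show the storage pattern; the class name and chunk size are assumptions for illustration and do not describe VTD-XML's internal buffer classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical append-only store that keeps 64-bit records in large primitive arrays.
final class LongChunkBuffer {
    private static final int CHUNK_SIZE = 4096;
    private final List<long[]> chunks = new ArrayList<>();
    private int size;

    /** Appends a record, allocating one new array only every CHUNK_SIZE records. */
    void append(long record) {
        if (size == chunks.size() * CHUNK_SIZE) {
            chunks.add(new long[CHUNK_SIZE]);
        }
        chunks.get(size / CHUNK_SIZE)[size % CHUNK_SIZE] = record;
        size++;
    }

    long get(int index) {
        if (index < 0 || index >= size) throw new IndexOutOfBoundsException("index " + index);
        return chunks.get(index / CHUNK_SIZE)[index % CHUNK_SIZE];
    }

    int size() { return size; }

    int chunkCount() { return chunks.size(); }

    /** Array header bytes amortized over the records that share one array. */
    static double overheadBytesPerRecord(int arrayHeaderBytes, int recordsPerArray) {
        return (double) arrayHeaderBytes / recordsPerArray;
    }
}
```

Storing 5000 records this way allocates two arrays rather than 5000 objects, which is where the savings on allocation and garbage collection come from.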
At the top level, VTD-XML provides three essential classes: VTDGen, VTDNav, and AutoPilot.
- VTDGen parses the XML/SOAP messages into VTD records and location caches.
- VTDNav is a cursor-based API allowing for DOM-like random access of the XML structure.
- AutoPilot works with VTDNav and emulates the behavior of DOM's node iterator.
A Sample Project
To process SOAP with VTD-XML, the starting point is a memory buffer filled with the content of the XML/SOAP message. The sample message contains a purchase order in the body section of the SOAP envelope. For simplicity, the project assumes the message resides on disk. In real life, one is more likely to read the message off HTTP. (See Listing 1.)
At the top level, this project has a single main method that wraps all code in a single try-catch block, which takes care of the various exception conditions for IO operations, parsing, and navigation. (See Listing 2.)
The following code parses the SOAP message. It first allocates a byte array, and reads into it the byte content of the SOAP message. Then, it instantiates VTDGen and passes to it the byte array. Next, it calls VTDGen's member method "parse()" to generate the internal, parsed representation of the SOAP message. Notice that "parse()" accepts the Boolean value of "true" to indicate the parsing is namespace-aware. (See Figure 3.)
After parsing, the sample code obtains an instance of VTDNav and uses the namespace-aware "toElementNS()" to move the cursor to various positions in the element hierarchy. It then prints out the corresponding text values or selectively pulls out the XML fragment at the cursor position. (See Listing 4.)
The code above concerning VTDNav has several points worth mentioning.
- There is one and only one cursor available, which can be moved using "toElement()" or "toElementNS()." Those methods return a boolean indicating the status of the movement. If true, the cursor is repositioned; otherwise, the cursor does not move.
- Several member methods, such as "getAttrVal()" and "getText()", return an integer corresponding to the index value of the VTD record if there is one. -1 is returned if no such record is found.
- VTDNav compares strings against VTD records directly, avoiding the round trip of creating and deallocating string objects.
- VTDNav also converts VTD records to numerical data types directly, for the same purpose.
- There is a global stack available so one can save, then quickly restore, the cursor location.
- VTDNav also allows one to convert a VTD record into a string object. Use this carefully, for the reasons given in the third point above.
The code that composes the invoice is shown in Listing 6.
The Road Map and a Quick Recap
The other property of VTD-XML is that its internal representation is inherently persistent, making it possible to avoid parsing for repetitive read-only XML processing. This also makes possible an XML upgrade path that improves XML processing performance without losing human readability.
As readers can see, VTD-XML, the new, non-extractive, Java-based XML processing API based on VTD, offers a number of benefits not found in existing XML processing APIs. The most significant one is that it simultaneously offers high performance, low memory usage, and user-friendliness. It also introduces the notion of incremental update. As XML makes inroads into IT and becomes increasingly indispensable in our lives, VTD-XML should find its way into more places and hopefully enable exciting new XML applications.
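One way to picture this persistence: because the records are plain 64-bit integers whose offsets refer to an unchanged document, the document bytes and the records can be written out together and loaded back without parsing. The sketch below uses a made-up container layout (document length, document bytes, record count, records) purely for illustration; it is not a file format defined by VTD-XML.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical container: the layout is an assumption for this sketch only.
final class PersistedIndexSketch {
    final byte[] document;
    final long[] records;

    PersistedIndexSketch(byte[] document, long[] records) {
        this.document = document;
        this.records = records;
    }

    /** Serializes the intact document followed by its 64-bit records. */
    byte[] save() {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(document.length);
            out.write(document);
            out.writeInt(records.length);
            for (long record : records) out.writeLong(record);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Restores the document and records without re-parsing the XML. */
    static PersistedIndexSketch load(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            byte[] document = new byte[in.readInt()];
            in.readFully(document);
            long[] records = new long[in.readInt()];
            for (int i = 0; i < records.length; i++) records[i] = in.readLong();
            return new PersistedIndexSketch(document, records);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The XML text inside such a container stays byte-for-byte identical to the original, so it remains human readable.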