Index XML Documents with VTD-XML

How to turn the indexing capability on in your application

Traditionally, DOM- or SAX-based enterprise applications have to repeat CPU-intensive XML parsing every time they access the same documents. VTD-XML 2.0 introduces a simple, general-purpose XML index called VTD+XML (http://vtd-xml.sourceforge.net/persistence.html) that spares those applications the repeated parsing.

This article uses examples and the latest benchmark reports to show you how to get started with this index. It also discusses scenarios and use cases where you may find VTD+XML useful.
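
To make the repeated-parsing cost concrete, here is a minimal sketch of the traditional approach. It assumes the JDK's built-in DOM parser and XPath engine rather than VTD-XML; the class name DomBaseline and the helper readPrice are hypothetical, and the embedded document is the purchase order used in the examples later in this article.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

// Hypothetical baseline (JDK DOM + XPath, not VTD-XML):
// every lookup has to parse the whole document again.
class DomBaseline {
    static final String PO_XML =
        "<purchaseOrder orderDate=\"1999-10-21\">"
      + "<item partNum=\"872-AA\">"
      + "<productName>Lawnmower</productName>"
      + "<quantity>1</quantity>"
      + "<USPrice>148.95</USPrice>"
      + "</item></purchaseOrder>";

    // Parses the XML bytes into a DOM tree, then evaluates the XPath
    // expression and returns the string value of the first match.
    static String readPrice(byte[] xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml));
            return XPathFactory.newInstance()
                .newXPath()
                .evaluate("/purchaseOrder/item/USPrice/text()", doc);
        } catch (Exception e) {
            throw new IllegalStateException("could not read USPrice", e);
        }
    }

    public static void main(String[] args) {
        byte[] xml = PO_XML.getBytes(StandardCharsets.UTF_8);
        // Three read-only accesses cost three full parses.
        for (int i = 0; i < 3; i++) {
            System.out.println(" USPrice ==> " + readPrice(xml));
        }
    }
}
```

Each call to readPrice pays for a full parse before the XPath expression can run, even though the document never changes between calls. That repeated step is the work a persistent index is meant to remove.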

Avoid Repetitive XML Parsing with VTD-XML
As discussed in "Simplify XML processing with VTD-XML," one of the underlying assumptions in XML application development to date is that an XML document must be parsed before anything else can be done with it. In other words, the processing logic of an XML application can't start until parsing has finished. Frequently considered a threat to database performance, XML parsing is usually many times slower than other XML operations such as XPath evaluation. When applications make multiple read-only accesses to XML data that doesn't change very often, wouldn't it be nice to be able to eliminate the overhead of the associated repetitive parsing?

With the native XML indexing feature introduced in version 2.0 of VTD-XML, you can do precisely that. VTDGen, the class encapsulating various parsing routines, now adds "loadIndex(...)" and "writeIndex(...)." VTD-XML 2.0 also introduces two new exceptions: IndexWriteException and IndexReadException.

Let me put those new methods into action and show you how to turn on the indexing capability in your application. Consider the following XML document:

   <purchaseOrder orderDate="1999-10-21">
     <item partNum="872-AA">
       <productName>Lawnmower</productName>
       <quantity>1</quantity>
       <USPrice>148.95</USPrice>
     </item>
   </purchaseOrder>

Below is a simple pre-2.0 VTD-XML program named "printPrice.java" that prints the content of the element "USPrice." Notice that it parses the XML file and then uses XPath to select the target nodes.

import com.ximpleware.*;
import com.ximpleware.xpath.*;
public class printPrice{
   public static void main(String args[]){
     VTDGen vg = new VTDGen();
     try{
       // parse po.xml with namespace awareness turned on
       if (vg.parseFile("po.xml",true)){
         VTDNav vn = vg.getNav();
         AutoPilot ap = new AutoPilot(vn);
         ap.selectXPath("/purchaseOrder/item/USPrice/text()");
         int i=-1;
         // evalXPath() returns the index of each matching node, or -1 when done
         while((i=ap.evalXPath())!=-1){
           System.out.println(" USPrice ==> "+vn.toString(i));
         }
       }
     }catch(Exception e){
       // report the failure instead of silently swallowing it
       e.printStackTrace();
     }
   }
}

A few changes are needed to add VTD-XML's new indexing capability to the Java code above. First, you need to read in the XML document, parse it, and then write out the indexed version of the same XML document. From that point onward, your application can run XPath queries or other processing logic directly on top of the index, saving the CPU cycles of parsing the XML document again. The following code snippets (named "genIndex.java" and "accessIndex.java" respectively) show you how to generate and access the index. Notice that, when run one after the other, the two applications produce the same output as "printPrice.java."
The first application (genIndex.java) reads in "po.xml" and produces "po.vxl."

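import com.ximpleware.*;
import com.ximpleware.xpath.*;
public class genIndex{
   public static void main(String args[]){
     VTDGen vg = new VTDGen();
     try{
       // parse po.xml once, then persist the result as a VTD+XML index
       if (vg.parseFile("po.xml",true)){
         vg.writeIndex("po.vxl");
       }
     }catch(Exception e){
       // report the failure instead of silently swallowing it
       e.printStackTrace();
     }
   }
}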

The second application (accessIndex.java) loads "po.vxl" and queries the document with the XPath expression "/purchaseOrder/item/USPrice/text()."

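import com.ximpleware.*;
import com.ximpleware.xpath.*;
public class accessIndex{
   public static void main(String args[]){
     VTDGen vg = new VTDGen();
     try{
       // load the index directly; no call to parseFile() is needed
       VTDNav vn = vg.loadIndex("po.vxl");
       AutoPilot ap = new AutoPilot(vn);
       ap.selectXPath("/purchaseOrder/item/USPrice/text()");
       int i=-1;
       // evalXPath() returns the index of each matching node, or -1 when done
       while((i=ap.evalXPath())!=-1){
         System.out.println(" USPrice ==> "+vn.toString(i));
       }
     }catch(Exception e){
       // report the failure instead of silently swallowing it
       e.printStackTrace();
     }
   }
}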
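
Because the index is only useful while the XML behind it stays unchanged, an application may want to check whether "po.vxl" is still current before skipping the parse. The following is a minimal sketch of one way to do that, assuming that comparing file modification times is an acceptable staleness test. The class IndexFreshness and its method needsRebuild are hypothetical helpers built on java.nio.file and are not part of VTD-XML.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of VTD-XML: decides whether an index
// file has to be regenerated before an application relies on it.
class IndexFreshness {
    // Pure decision rule: rebuild when the index is missing or older
    // than the XML document it was generated from.
    static boolean needsRebuild(boolean indexExists,
                                long xmlModifiedMillis,
                                long indexModifiedMillis) {
        return !indexExists || indexModifiedMillis < xmlModifiedMillis;
    }

    // File-based variant that reads the timestamps from disk.
    static boolean needsRebuild(Path xml, Path index) {
        try {
            if (!Files.exists(index)) {
                return true;
            }
            return needsRebuild(true,
                Files.getLastModifiedTime(xml).toMillis(),
                Files.getLastModifiedTime(index).toMillis());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path xml = Paths.get("po.xml");
        Path index = Paths.get("po.vxl");
        if (!Files.exists(xml)) {
            System.out.println("po.xml not found");
            return;
        }
        System.out.println(needsRebuild(xml, index)
            ? "index missing or stale: run genIndex, then accessIndex"
            : "index is current: run accessIndex only");
    }
}
```

With a check like this in front, genIndex only has to run when the document has actually changed, and every other access can go straight to accessIndex. Modification times are a coarse signal; a content hash would be a stricter alternative.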


More Stories By Jimmy Zhang

Jimmy Zhang is a cofounder of XimpleWare, a provider of high-performance XML processing solutions. He has worked in the fields of electronic design automation and Voice over IP for a number of Silicon Valley high-tech companies. He holds both a BS and an MS from the department of EECS at U.C. Berkeley.
