XML Journal Feature: Transforming Large XML Documents, An Alternative to XSLT

With the evolution of XML, the XSL standard also became very popular for transforming XML data to XML, text, PDF, etc. However, XSLT transformation has some limitations. Today's XSLT processors hold the input data in memory as a DOM tree while the transformation takes place. That in-memory tree can be as much as ten times the size of the original data, so in practice the limit on data size for an XSLT conversion is just a few megabytes. As a result, XSLT can handle only moderately sized XML documents: because the full input must be available to the stylesheet, the entire DOM has to be in memory for any XSL transformation.

This major shortcoming of classical XSLT transformation may be solved with the schema-based transformation API discussed here. The method uses a stream-based approach that loads only part of the document into memory at a time as the transformation proceeds, so at any given point enough resources are available for the transformation to complete. Whereas classical XSL transformation can transform an XML file to any format (XML, text, EDI, EFT, PDF, etc.), this approach is restricted to XML-to-XML transformation.

XML Schema plays a pivotal role in this approach. An XML schema describes the data structure, hierarchy, and validation rules for an XML file. To transform a source XML document into another destination format, you describe the destination with an XML schema whose element definitions carry control attributes to aid the transformation, and use full XPath APIs to carry out the actual transformation. This provides a scalable, stream-based way to transform XML to XML that ideally can handle input of any size, which is impossible with today's XSLT transformers. The approach can be implemented successfully for XML transformation in the B2B domain, where a target schema is always present to validate the generated target XML.
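
The difference between the two memory models can be sketched in Python. This is an illustration only, assuming Python's standard xml.etree.ElementTree module rather than the article's saxTran API: names_via_dom builds the whole tree first, as an XSLT processor does, while the hypothetical helper stream_elements uses iterparse to hand over one completed element at a time and then detaches it from its parent so processed subtrees do not accumulate.

```python
import io
import xml.etree.ElementTree as ET
from typing import IO, Iterator, List


def names_via_dom(source: IO[bytes]) -> List[str]:
    # Classical model: the entire document becomes an in-memory tree first.
    tree = ET.parse(source)
    return [p.findtext("Last", default="") for p in tree.iter("Person")]


def stream_elements(source: IO[bytes], tag: str) -> Iterator[ET.Element]:
    # Streaming model (illustrative helper, not the article's API): yield each
    # completed <tag> element, then detach it so processed subtrees are freed.
    open_ancestors: List[ET.Element] = []
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            open_ancestors.append(elem)
            continue
        open_ancestors.pop()  # elem and all of its descendants are complete
        if elem.tag == tag:
            yield elem  # the consumer copies out what it needs here
            elem.clear()
            if open_ancestors:
                open_ancestors[-1].remove(elem)


def names_via_stream(source: IO[bytes]) -> List[str]:
    return [p.findtext("Last", default="") for p in stream_elements(source, "Person")]


if __name__ == "__main__":
    sample = (b"<OrgChart><Office><Department>"
              b"<Person><Last>Callaby</Last></Person>"
              b"<Person><Last>Smith</Last></Person>"
              b"</Department></Office></OrgChart>")
    print(names_via_dom(io.BytesIO(sample)))
    print(names_via_stream(io.BytesIO(sample)))
```

Both functions return the same names; the streaming version keeps roughly one matched subtree (plus empty ancestor shells) in memory instead of the whole document.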

The Problems Faced in Classical Transformation
Transforming a large XML DOM from one form to another with classical XML transformation raises many problems, because the full input DOM must be loaded into memory. For a very large input file, given limited system resources, even loading the file may fail. There is no published solution to date for this issue. There have been efforts to serialize the DOM to permanent storage to free up memory so the transformation can complete, but that introduces considerable I/O overhead. Even the simplest transformation of a large XML document takes a considerable amount of time this way, so it is not feasible for enterprise use.

This approach takes the complexities of transforming large XML files into account and offers a real-time, scalable solution to the whole problem. Because it is schema-based and follows a few basic rules, defined in the following sections, it avoids the memory bottleneck of loading the full input DOM.

The Approach in Detail
This is a schema-based approach to transforming a source XML document into a destination XML document. A schema is useful because the complete structure of the destination XML can be generated from the schema definition, and that bare structure can then be populated with the required values using attributes and annotations defined in the schema. The source XML is read in a streaming fashion, loading the nodes that match the schema definition; once a match occurs, all transformations for the target elements that depend on that XPath are performed to populate the skeleton. Once all the target nodes that depend on the loaded input node are populated, that node is removed from memory and the next chunk of data is loaded as a node for the next set of transformations, until the entire input document has been read.
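
The four steps described above (build the skeleton, stream, populate, release) can be sketched as follows. This is a minimal sketch, not the article's implementation: it assumes Python's xml.etree.ElementTree.iterparse as the streaming parser, and the RULE dictionary is a hypothetical stand-in for the control attributes that would normally be read from the target schema. For brevity the target tree is kept in memory; a production version would write it out incrementally.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the control attributes the article places in the
# target schema: which source node to stream, and which relative XPath fills
# each child of the generated target record.
RULE = {
    "stream": "Person",
    "target_root": "PersonsInfo",
    "target_container": "Persons",
    "target_record": "Person",
    "fields": {"First": "First", "Last": "Last", "Shares": "Shares"},
}


def transform(source, rule=RULE) -> ET.Element:
    # 1. Build the bare target skeleton before reading any input.
    root = ET.Element(rule["target_root"])
    container = ET.SubElement(root, rule["target_container"])
    open_ancestors = []
    # 2. Stream the source; only the current node's subtree is held in full.
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            open_ancestors.append(elem)
            continue
        open_ancestors.pop()
        if elem.tag != rule["stream"]:
            continue
        # 3. Populate every target field that depends on this source node.
        record = ET.SubElement(container, rule["target_record"])
        for target_name, xpath in rule["fields"].items():
            ET.SubElement(record, target_name).text = elem.findtext(xpath, default="")
        # 4. Release the source node before the next one is processed.
        elem.clear()
        if open_ancestors:
            open_ancestors[-1].remove(elem)
    return root


if __name__ == "__main__":
    sample = (b"<OrgChart><Office><Department>"
              b"<Person><First>Vernon</First><Last>Callaby</Last><Shares>1500</Shares></Person>"
              b"</Department></Office></OrgChart>")
    print(ET.tostring(transform(io.BytesIO(sample)), encoding="unicode"))
```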

One thing to note: the streaming of the input file, that is, which nodes are read from the input, depends entirely on the user and has to be declared in the schema definition as control attributes. Because the schema provides the structure of the final target XML, with special processing instructions embedded in the control attributes, the schema is queried a number of times to arrive at the correct structure of the target XML, which is then populated with data using the qualified attributes from the namespace xmlns:saxTran="http://oracle.schemaTransform/saxTran". Table 1 shows the first set of attributes needed to provide basic stream-based transformation functionality.

Many other attributes may be needed in the course of implementation, but for now these are the most crucial ones anticipated.
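
Since Table 1 is not reproduced here, the sketch below uses a hypothetical control attribute, saxTran:xpath, to show how namespace-qualified attributes could be read out of a target schema. Only the namespace URI comes from the article; the attribute name, the sample schema, and the control_attributes function are illustrative assumptions. ElementTree exposes a qualified attribute under its expanded {namespace}name key.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
# Namespace taken from the article; the attribute name used below is hypothetical.
SAXTRAN = "http://oracle.schemaTransform/saxTran"


def control_attributes(xsd_text: str, attr: str = "xpath") -> dict:
    """Map each declared element name to the value of its saxTran:<attr> attribute."""
    schema = ET.fromstring(xsd_text)
    rules = {}
    for decl in schema.iter(f"{{{XS}}}element"):
        value = decl.get(f"{{{SAXTRAN}}}{attr}")
        if value is not None:
            rules[decl.get("name")] = value
    return rules


# Illustrative target schema for PersonsInfo; saxTran:xpath is an assumed name.
TARGET_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:saxTran="http://oracle.schemaTransform/saxTran">
  <xs:element name="PersonsInfo">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Persons">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="Person" maxOccurs="unbounded"
                          saxTran:xpath="/OrgChart/Office/Department/Person"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <xs:element name="TotalPersons" type="xs:int"
                    saxTran:xpath="count(/OrgChart/Office/Department/Person)"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

if __name__ == "__main__":
    for name, xpath in control_attributes(TARGET_XSD).items():
        print(name, "->", xpath)
```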

Let's take a simple example to illustrate the approach in detail. In this example we use a simple OrgChart.xml that shows the organizational hierarchy of a company. Its basic structure is shown below:


<OrgChart>
  <Office>
    <Department>
      <Person>
        <First>Vernon</First>
        <Last>Callaby</Last>
        <Title>Office Manager</Title>
        <PhoneExt>582</PhoneExt>
        <EMail>[email protected]</EMail>
        <Shares>1500</Shares>
      </Person>
      ...
      <Person>
      </Person>
    </Department>
    <Department>
      ...
    </Department>
  </Office>
  ...
  <Office>
    ...
  </Office>
</OrgChart>

After the transformation, the resulting document should list all of the persons in all departments, along with some calculations (count, average, and sum) on fields of the Person element. To achieve this with classical XSL, a stylesheet was written; it can be found in the personinfo.xsl file. The basic structure of the document after the transformation is shown below:

<PersonsInfo>
  <Persons>
    <Person>
      <First>Vernon</First>
      <Last>Callaby</Last>
      <Title>Office Manager</Title>
      <PhoneExt>582</PhoneExt>
      <EMail>[email protected]</EMail>
      <Shares>1500</Shares>
    </Person>
    <Person>
      ...
    </Person>
  </Persons>
  <TotalPersons>20</TotalPersons>
  <AvgSharePerPerson>200.0</AvgSharePerPerson>
  <TotalSharesWithPersons>4000</TotalSharesWithPersons>
</PersonsInfo>
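
A streaming equivalent of the personinfo.xsl transformation might look like the sketch below. It assumes Python's xml.etree.ElementTree.iterparse rather than the article's schema-driven API, and org_chart_to_persons_info is a hypothetical name. Because the count and sum are updated as each Person node streams past, the aggregates need no second pass over the input; the average is computed at the end as TotalSharesWithPersons / TotalPersons (4000 / 20 = 200.0 in the output above).

```python
import io
import xml.etree.ElementTree as ET


def org_chart_to_persons_info(source) -> ET.Element:
    """Stream OrgChart XML and build PersonsInfo with running aggregates (sketch)."""
    info = ET.Element("PersonsInfo")
    persons = ET.SubElement(info, "Persons")
    count, total_shares = 0, 0
    open_ancestors = []
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            open_ancestors.append(elem)
            continue
        open_ancestors.pop()
        if elem.tag != "Person":
            continue
        # Copy the Person's child fields into the target, in source order.
        record = ET.SubElement(persons, "Person")
        for field in elem:
            ET.SubElement(record, field.tag).text = field.text
        # Update aggregates as each node streams past; a missing or empty
        # <Shares> counts as 0.
        count += 1
        total_shares += int(elem.findtext("Shares") or 0)
        # Release the processed source node.
        elem.clear()
        if open_ancestors:
            open_ancestors[-1].remove(elem)
    ET.SubElement(info, "TotalPersons").text = str(count)
    average = total_shares / count if count else 0.0
    ET.SubElement(info, "AvgSharePerPerson").text = str(average)
    ET.SubElement(info, "TotalSharesWithPersons").text = str(total_shares)
    return info


if __name__ == "__main__":
    sample = (b"<OrgChart><Office><Department>"
              b"<Person><First>Vernon</First><Shares>1500</Shares></Person>"
              b"<Person><First>X</First><Shares>500</Shares></Person>"
              b"</Department></Office></OrgChart>")
    print(ET.tostring(org_chart_to_persons_info(io.BytesIO(sample)), encoding="unicode"))
```

Empty Person elements, like the placeholder in OrgChart.xml above, are counted and contribute 0 shares.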

Indroniel Deb Roy works as a UI Architect for BlueCoat Systems. He has more than 10 years of development experience in J2EE and web application development. Previously he developed web applications for Oracle, Novell, Packeteer, Knova, and others. He has a passion for innovation, works with various Web 2.0 and J2EE technologies, and has recently started on smartphone and iPhone development.
