Welcome!

Industrial IoT Authors: Elizabeth White, Stackify Blog, Yeshim Deniz, SmartBear Blog, Liz McMillan

Related Topics: Industrial IoT

Industrial IoT: Article

Advanced ANSI SQL Native XML Integration-Part 1

Advanced ANSI SQL Native XML Integration-Part 1

This two-part article will change your view and understanding of standard SQL and its ability to integrate naturally and fully with native XML. The perceived problem with achieving full SQL-based integration of XML is that relational data is flat while XML data is hierarchical, producing a huge impediment to a seamless solution.

This belief has prevented a full integration solution, resulting in SQL vendors resorting to nonstandard SQL and external code, whose solutions fall far short of full XML integration. The usual method of integration used by SQL vendors is to shred or flatten the XML data in order to join it relationally with the relational data. This produces major efficiency problems for processing and memory utilization and ignores the hierarchical semantics in the XML data, which can be valuable.

This article demonstrates how standard ANSI SQL-92 can perform very sophisticated hierarchical processing by naturally raising the SQL level of processing to a hierarchical level. This allows SQL to integrate fully with native XML data at a full hierarchical processing level, even exceeding the nonprocedural hierarchical capabilities found in XML query languages while remaining ANSI SQL standard. This seamless XML integration is important because SQL developers require little or no training and SQL customers trust and feel comfortable with ANSI SQL.

Hierarchical Processing
A new and powerful capability that SQL hierarchical processing brings to XML is the nonprocedural hierarchical querying and processing of their multileg hierarchical structures. This means that the hierarchical semantics in the XML data and its structure are automatically used in the processing of the query; for example, the selecting of data in one leg of the hierarchical structure based on the value of data from another leg of the structure. Most XML query languages are based on XPath, which is primarily single path (leg) oriented and would have a difficult time correlating the semantics between two legs. But as this article will demonstrate, even this complex level of multi-leg hierarchical processing is automatically available in ANSI standard SQL syntax and semantics and can be seamlessly and directly applied to integrating native XML documents. In fact, these advanced hierarchical capabilities can perform complex XML operations such as hierarchical node promotion and fragment processing naturally and nonprocedurally in SQL.

Remarkably, the ANSI SQL capabilities that allow SQL to inherently perform full hierarchical processing come together and build on each other seamlessly, producing a truly unified and natural XML integration solution. To demonstrate this, several examples - each encompassing many aspects of this unified solution - will be used. In this way it's possible to demonstrate the big picture of SQL's complete hierarchical operation and full integration of native XML. The examples will show an unbroken progression from query start to finish of exactly how this complete and seamless XML integration is accomplished.

In Figure 1, an XML document and relational tables are joined into a larger hierarchical structure that is hierarchically processed automatically by the invoking ANSI standard SQL query. The relational tables are hierarchically modeled and joined hierarchically with the SQL modeled XML structure. The completed multi-leg structure then undergoes hierarchical data selection. The result can be returned in a relational row set whose flat result accurately reflects its hierarchically preserved data result, or it can be returned as a hierarchical XML document that, in addition to the hierarchically preserved data results, automatically reflects the query's hierarchical result structure. A hierarchical WYSIWYG display (not shown) is also possible for ad hoc interactive querying.

Hierarchical Data Modeling
Relational processing is controlled mainly by the relationships specified through the joining of tables, hence its name. Unfortunately, the standard INNER JOIN operation discards unmatched rows, creating a flat structure. With SQL-92 there is a LEFT OUTER JOIN (or just LEFT JOIN) operation that preserves unmatched data rows on the left side of the join operation. This operation is hierarchical in nature, preserving the left table over the right table because the left table rows can exist even if there are no matching right table rows, but the right table rows cannot exist without matching left table rows. The SQL-92 LEFT JOIN can string together any number of these joins to model and create a full hierarchical structure of any complexity. The LEFT JOIN has an ON clause that specifies the join criteria at each join operation. This enables the exact link points for each table (or node) in the structure to be unambiguously defined, allowing any hierarchical structure to be modeled and created. By linking to the same upper-level link point in multiple join operations, multiple hierarchical paths (legs) are modeled. This is demonstrated in the top right corner of Figure 1, where three relational tables are modeled as a multi-leg hierarchical structure using the LEFT JOIN operation.

Once a structure is modeled hierarchically by specifying its relationships as hierarchical using the LEFT JOIN, the relational engine will automatically process the structure hierarchically. The LEFT JOIN syntax enables the modeling of hierarchical structures, and the associated LEFT JOIN's semantics define the hierarchical operations to be carried out automatically by the relational engine. This will be described in more detail when we look at the relational engine and its Cartesian product processing.

Modeling relational tables is an example of modeling logical structures. This very same data modeling process can be applied to XML, which is a physical structure. In this case, the LEFT JOIN modeling must follow the physical hierarchical structure with the ON clause specifying the XML Element (or node) link points. In this way, the relational engine can process the XML hierarchical structure, properly reflecting its structure and semantics. This physical data modeling is shown in the SQL view in the upper left corner of the example in Figure 1. By performing hierarchical data modeling at the SQL view level, standard SQL has direct, full, and seamless access at the data item level for XML data values and utilization of the hierarchical semantics associated with the data.

SQL Views
Both of the data modeling examples at the top of Figure 1 demonstrate that the data modeling of different structures can be separated and stored in their own SQL view, offering significant hierarchical data abstraction, reuse, and ease of use. Since the ANSI SQL data modeling can model any kind of hierarchical structure uniformly in a standard fashion, the heterogeneous processing of different forms of hierarchical data structures becomes seamless and consistent.

These hierarchical SQL views can be created by hand, or they can be created automatically by software that translates directly from data definitions such as DTDs and schemas or even COBOL.

These heterogeneous combinations of hierarchical SQL views can be combined into larger hierarchical views or structures using the same LEFT JOIN data modeling process. When joining these hierarchical views hierarchically, the left structure is modeled hierarchically above the right structure and they are linked by the ON clause join criteria, which specifies the hierarchical link point node in each structure. As shown in the double box in the middle of Figure 1, this joining of hierarchical structures can be performed by the invoking SQL statement. In fact, the invoking SQL statement joining these two previously defined structures can be specified dynamically in an ad hoc fashion. This is a very powerful and flexible combination.

You may be wondering what is combining the metadata (semantics) associated with these separate complex multileg views into a unified virtual view. This happens automatically when these separate hierarchical views are naturally expanded during standard SQL processing, as shown directly below the double box in Figure 1. Remarkably, the separately expanded LEFT JOIN views expand into a single contiguous LEFT JOIN specification that fully models the virtual structure being accessed as shown in the Unified SQL View diagram to the right of the Expanded SQL.

The dashed boxes in the Unified SQL View diagram in Figure 1 represent unselected nodes of the structure that are not output. These unselected nodes can still be utilized in the query, as node R is in one of the ON clauses. The dashed lines in the Unified SQL View represent a path of nodes that do not require access. This is because if a node is not selected for output or on a path to a selected node, then it does not require access. This optimization is possible because of the way the LEFT JOIN hierarchically preserves data, which naturally models the semantics of hierarchical data structures. As such, this optimization operates heterogeneously over the Unified SQL View composed of relational and XML data.

This powerful hierarchical data access optimization also has a significant benefit for the LEFT JOIN data modeling views. Because unnecessary paths in these hierarchical data modeling SQL views can be determined and removed at query execution, each structure needs only a single global view definition, unlike standard INNER JOIN views. This is because INNER JOIN views must always access all tables defined, regardless of the data selected, in order to keep their view consistent. This precipitates the use of multiple INNER JOIN view definitions to be created for the same structure for specific uses. The LEFT JOIN single view ability further increases the data abstraction and ease of use for hierarchical processing. This is a seamless hierarchical optimization that is driven by the selected data determined at runtime, enabling maximum ad hoc optimization.

The expanded SQL containing the two LEFT JOIN views in Figure 1 may look a little unusual because of the way the views are nested after expansion. These expanded views insert LEFT JOIN and ON clause combinations, pushing the current matching ON clause to the right. This has the nice effect of stacking the current processing and placing the current structure definition in isolation by allocating a new working set (work area) for processing the embedded view. This processing is useful, since in some cases the processing of nested SQL views can be destructive to the portion of the structure already constructed. This operation and nested syntax construction is standard ANSI SQL and is transparent. It is not usually seen or used directly by the SQL developer, because this SQL processing happens automatically under the covers.

Relational Database Engine
With the SQL fully expanded, we can look at how and why the relational engine behaves hierarchically. The simple answer is because it is carrying out the hierarchical semantics specified by the expanded LEFT JOIN specification, which is modeling the virtual and unified data structure being processed. This process naturally follows the rules and principles governing hierarchical data structures such as data nodes can have only one parent and many different children; a parent data node can exist without its children; child data nodes can not exist without their parent; and on the more intriguing side is how sibling legs of the structure relate hierarchically when processing the query. In effect, SQL has been internally upgraded to a tree structure model (like XML) where the tables represent nodes in the structure.

Relational databases use a working set (relational row set) as working storage for processing the query. You could say that these are similar to internal temporary tables. They are flat with a fixed number of columns just like a table. Because they are fixed, it has been assumed that they could not support the processing of hierarchical data structures because hierarchical structures have multiple legs that can change in length dynamically from leg occurrence to occurrence because partial legs can exist. This happens with hierarchical structures because a parent data node can exist without its child node and so on. To represent a single data record occurrence (a root node and all of its descendent nodes) in a single row, the different legs are stored back to back. So you may assume with variable length legs that a consistent mapping of the data value locations would not be possible. But the LEFT JOIN operation cooperatively inserts null values to keep the column positions of each data value aligned. In this way, even the most complex multi-leg hierarchical structure is naturally mapped uniformly with standard relational processing. Such a Relational Working Set is shown at the bottom left side of Figure 1.

With the Relational Working Set example in Figure 1, the blank data value represents a null value that was inserted by the LEFT JOIN operation to represent missing data, preserving the structure and data. Notice how this correlates to the Hierarchical Working Data used for hierarchical engines located across from it on the opposite side of Figure 1. This same hierarchical data preserving action of the LEFT JOIN also preserves the hierarchically preserved data content correctly, otherwise the Relational Working Set row with the value "X2" would not be present.

When native XML or other hierarchical data is required, it is treated as remote data and a SQL request is sent to retrieve it. In this case, the SQL statement will be similar or identical to the LEFT JOIN view modeling the portion of the hierarchical structure being accessed. The XML document can be stored in the Internet or a column of a relational table. In our example in Figure 1, we show an XML Data Retrieval module at the bottom center that performs this operation. It retrieves the desired portion of the XML document and flattens the data into a row set (shown on its left) that represents the data defined in the requested LEFT JOIN operation (not shown). This is done dynamically so that different data requests for the same or different XML document can be processed and formatted as required for the specific query invocation. When returned to the relational engine, the row sets are joined into the relational working set at the location reserved for their hierarchical modeled portion of the unified view. This XML access is transparent to the relational engine.

Most interesting is how the relational Cartesian product engine processes complex hierarchical queries based on multiple legs; for example, selecting data from one leg of the hierarchical structure based on data from another leg of the hierarchical structure. This is the case with the invoking query example in Figure 1 inside the double box at the center of the diagram. It is selecting data from two legs of the structure using a WHERE clause to qualify (filter) the result based on data in one of the legs. It is selecting data for output based on node D values other than a "D1" value and requesting the output of the qualifying X, L, and D node values if their associated node D value qualifies. In the comparable Hierarchical Working Data diagram in the lower right side of Figure 1, you can see that the values "D2" and "D3" qualify and "D1" does not. We can also intuitively assume that the "X1" and "X2" ascendants on qualifying "D2" and "D3" path occurrences also qualify, but what about the "L1" and "L2" values on the sibling leg? Since they are both related by a common ancestor node data occurrence, "X1", they are both qualified and selected too. This is the same hierarchical WHERE processing decision logic performed by hierarchical processors.

The hierarchical query-processing logic described directly above is not trivial to carry out, so how does the relational engine perform this decision logic when it operates only on a row at a time? Either the relational engine selects a row or discards it; it does not put it aside and come back. It can do this because of the Cartesian product effect, which has produced all combinations of the data. The join relationships specified act as a data filter so that the result is a restricted Cartesian product that preserves only the meaningful and valid relationship combinations. In our case the relationships are all hierarchical. This allows the relational engine to process each row at a time and select the rows that qualify. This can be seen in the Relational Working Set at the bottom portion of the left side of Figure 1. The darkened rows are the rows that are discarded by the relational engine because they have a D1 value. Notice how both "L1" and "L2" values were selected a row at the time because of the Cartesian product effect. At this point, it has been established that hierarchical processing is a valid subset of relational processing.

Hierarchical Database Engine
With ANSI SQL hierarchical processing proven, it also establishes that the result of an ANSI SQL hierarchical syntax request is semantically correct for both relational and hierarchical processing. This can be seen at the bottom side portions of Figure 1 by comparing the Relational Working Set to the comparable Hierarchical Working Data. They have the same hierarchical data result except the row set has no way to externalize the structure, but the hierarchical structure does. This means that in the case of processing hierarchical structures, the relational engine can be seamlessly replaced by a hierarchical engine. This is shown in Figure 1 at the bottom with the box marked Relational or Hierarchical Engine. The relational engine uses a Relational Working Set shown directly to its left while a hierarchical engine uses the Hierarchical Working Data directly to its right. By using a hierarchical engine, the significant relational inefficiencies of processing and memory usage caused by the Cartesian product and its data explosions are avoided, and advanced XML hierarchical capabilities become possible. Another significant advantage occurs when the XML data is retrieved: it can be returned sdirectly as a hierarchical tree structure, which can be linked simply and very efficiently into the hierarchical structure being built as shown on the right of the XML Data Retrieval box at the bottom center of Figure 1.

Whether using a relational or hierarchical engine, the hierarchical result can be output as a Relational Result row set or as a hierarchical XML Structured Result document. This is shown at the bottom corners in Figure 1 where the different output formats are shown. Notice that the structure of the result data structures can be different from their Relational Working Set and Hierarchical Working Data structure because the unselected nodes such as the R node are removed because they are not selected for output. This final Result Structure is diagrammed at the bottom middle of Figure 1 directly under the Relational or Hierarchical Engine box and will be used as the default XML output structure. The XML data format style (i.e. Element or Attribute centric) is controlled by the FOR XML clause in the SQL query request. It can alternatively specify an output hierarchical SQL view to nonprocedurally specify specific output formats and transformations without having to introduce new procedural XML centric SQL syntax. In this example, FOR XML ELEMENT is demonstrated.

At the bottom of Figure 1, the Relational or Hierarchical Engines require knowledge of the hierarchical data structure in order to produce the XML Structured Result, perform hierarchical optimization, and for the creation and operation of the Hierarchical Working Data. This meta information is embedded in the Unified SQL View. Extracting this information is hindered because the Unified SQL View can be constructed dynamically, so it is not complete until execution. (Advanced Data Access Technologies, Inc., has developed the patented process that can dynamically determine the data structure being processed by examining the expanded Unified SQL View.)

Conclusion
Part 1 of this article has demonstrated how standard ANSI SQL can integrate fully, naturally, and seamlessly with XML by raising SQL processing naturally to a hierarchical level enabling relational data (including XML shredded data) to integrate at a hierarchical level directly with native XML. This was proven not only with examples, but by demonstrating at each stage of SQL processing how it works, from SQL syntax and semantics through the Cartesian product relational engine. It was also shown that the level of hierarchical support was significant, easily handling complex multi-legged structure queries intuitively. This allows SQL to fully utilize the hierarchical semantics in the data and the data structure. By operating at a hierarchical level the memory and processing efficiencies are greatly increased, and because SQL itself is performing the majority of the integration work, the XML support is very efficient and the footprint is very small. All of these features and capabilities support dynamic, ad hoc processing. This ad hoc processing includes powerful parameter-driven-query specification in the form of SELECT list specification and WHERE clause filtering that dynamically tailors and optimizes the most complex hierarchical query. Part 2 of this article will cover how standard SQL can seamlessly perform the more advanced XML capabilities such as node promotion, fragment processing, the handling of shared and duplicate elements, and multi-leg hierarchical data filtering. For more information see www.adatinc.com

More Stories By Michael M David

Michael M. David is founder and CTO of Advanced Data Access Technologies, Inc. He has been a staff scientist and lead XML architect for NCR/Teradata and their representative to the SQLX Group. He has researched, designed and developed commercial query languages for heterogeneous hierarchical and relational databases for over twenty years. He has authored the book "Advanced ANSI SQL Data Modeling and Structure Processing" Published by Artech House Publishers and many papers and articles on database topics. His research and findings have shown that Hierarchical Data Processing is a subset of Relational Processing and how to utilize this advanced inherent capability in ANSI SQL. Additionally, his research has shown that advanced multipath (LCA) processing is also naturally supported and performed automatically in ANSI SQL, and advanced hierarchical processing operations are also possible. These advanced capabilities can be performed and explained in the ANSI SQL Transparent XML Hierarchical Processor at his site at: www.adatinc.com/demo.html.

Comments (2)

Share your thoughts on this story.

Add your comment
You must be signed in to add a comment. Sign-in | Register

In accordance with our Comment Policy, we encourage comments that are on topic, relevant and to-the-point. We will remove comments that include profanity, personal attacks, racial slurs, threats of violence, or other inappropriate material that violates our Terms and Conditions, and will block users who make repeated violations. We ask all readers to expect diversity of opinion and to treat one another with dignity and respect.


IoT & Smart Cities Stories
As IoT continues to increase momentum, so does the associated risk. Secure Device Lifecycle Management (DLM) is ranked as one of the most important technology areas of IoT. Driving this trend is the realization that secure support for IoT devices provides companies the ability to deliver high-quality, reliable, secure offerings faster, create new revenue streams, and reduce support costs, all while building a competitive advantage in their markets. In this session, we will use customer use cases...
Bill Schmarzo, author of "Big Data: Understanding How Data Powers Big Business" and "Big Data MBA: Driving Business Strategies with Data Science," is responsible for setting the strategy and defining the Big Data service offerings and capabilities for EMC Global Services Big Data Practice. As the CTO for the Big Data Practice, he is responsible for working with organizations to help them identify where and how to start their big data journeys. He's written several white papers, is an avid blogge...
When talking IoT we often focus on the devices, the sensors, the hardware itself. The new smart appliances, the new smart or self-driving cars (which are amalgamations of many ‘things'). When we are looking at the world of IoT, we should take a step back, look at the big picture. What value are these devices providing. IoT is not about the devices, its about the data consumed and generated. The devices are tools, mechanisms, conduits. This paper discusses the considerations when dealing with the...
Business professionals no longer wonder if they'll migrate to the cloud; it's now a matter of when. The cloud environment has proved to be a major force in transitioning to an agile business model that enables quick decisions and fast implementation that solidify customer relationships. And when the cloud is combined with the power of cognitive computing, it drives innovation and transformation that achieves astounding competitive advantage.
With 10 simultaneous tracks, keynotes, general sessions and targeted breakout classes, @CloudEXPO and DXWorldEXPO are two of the most important technology events of the year. Since its launch over eight years ago, @CloudEXPO and DXWorldEXPO have presented a rock star faculty as well as showcased hundreds of sponsors and exhibitors! In this blog post, we provide 7 tips on how, as part of our world-class faculty, you can deliver one of the most popular sessions at our events. But before reading...
If a machine can invent, does this mean the end of the patent system as we know it? The patent system, both in the US and Europe, allows companies to protect their inventions and helps foster innovation. However, Artificial Intelligence (AI) could be set to disrupt the patent system as we know it. This talk will examine how AI may change the patent landscape in the years to come. Furthermore, ways in which companies can best protect their AI related inventions will be examined from both a US and...
Poor data quality and analytics drive down business value. In fact, Gartner estimated that the average financial impact of poor data quality on organizations is $9.7 million per year. But bad data is much more than a cost center. By eroding trust in information, analytics and the business decisions based on these, it is a serious impediment to digital transformation.
Digital Transformation: Preparing Cloud & IoT Security for the Age of Artificial Intelligence. As automation and artificial intelligence (AI) power solution development and delivery, many businesses need to build backend cloud capabilities. Well-poised organizations, marketing smart devices with AI and BlockChain capabilities prepare to refine compliance and regulatory capabilities in 2018. Volumes of health, financial, technical and privacy data, along with tightening compliance requirements by...
DXWorldEXPO LLC, the producer of the world's most influential technology conferences and trade shows has announced the 22nd International CloudEXPO | DXWorldEXPO "Early Bird Registration" is now open. Register for Full Conference "Gold Pass" ▸ Here (Expo Hall ▸ Here)
@DevOpsSummit at Cloud Expo, taking place November 12-13 in New York City, NY, is co-located with 22nd international CloudEXPO | first international DXWorldEXPO and will feature technical sessions from a rock star conference faculty and the leading industry players in the world. The widespread success of cloud computing is driving the DevOps revolution in enterprise IT. Now as never before, development teams must communicate and collaborate in a dynamic, 24/7/365 environment. There is no time t...