Advanced ANSI SQL Native XML Integration-Part 2 - Supporting advanced XML capabilities

Part 1 of this article demonstrated how standard ANSI SQL can integrate fully, naturally, and seamlessly with XML. This was accomplished by naturally raising SQL processing to a hierarchical level, enabling relational data (including XML-shredded data) to integrate at a full hierarchical level with native XML. Part 1 also covered hierarchical processing, the use of hierarchical semantics, and the hierarchical joining of hierarchical structures.

Part 2 will cover how standard SQL can naturally support more advanced XML capabilities such as node promotion, fragment processing, structure transformation, variable structures, and the handling of shared and duplicate elements.

Node Promotion and Fragment Processing
SQL's hierarchical processing capabilities do not stop with what was presented in Part 1. We will look at supporting XML's more advanced capabilities starting with node promotion and fragment processing. Node promotion occurs when a node in a hierarchical structure moves up past its parent and ascendants that have not been selected for output (this is controlled by projection in relational terms). This slicing out of nodes in the structure definition has the same effect as slicing them out when the data selected is transferred from the relational Working Set to the Result Set. This is shown at the bottom of Figure 1. This means that this standard XML hierarchical processing capability is also performed naturally in relational processing when a node (table or element) is not selected for output.

From what we saw in Part 1, the basic operation of SQL is to use the SELECT clause to specify the desired output data; the FROM clause to specify its input data; the LEFT JOIN to specify its data structure; and the WHERE clause to specify optional data filtering criteria. Using these simple and intuitive SQL language constructs, even more complex node promotion leading to fragment processing can be easily performed. The query in Figure 1 joins a structure fragment selected from the lower structure to the upper structure. The structure fragment is shown encircled in a dashed oval. Dashed boxes represent nodes that are not selected for output.
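As a concrete illustration of these four constructs, here is a minimal sketch using SQLite via Python's sqlite3 module. The two-level Dept-over-Emp structure and its data are hypothetical (not the figures from this article): the SELECT list names the output, FROM/LEFT JOIN model the hierarchy, and WHERE filters it.

```python
import sqlite3

# Hypothetical two-level structure: Dept is the root node, Emp its child.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Dept (DeptNo INTEGER PRIMARY KEY, DName TEXT);
CREATE TABLE Emp  (EmpNo INTEGER PRIMARY KEY, DeptNo INTEGER, EName TEXT);
INSERT INTO Dept VALUES (1, 'Sales'), (2, 'Research');
INSERT INTO Emp  VALUES (10, 1, 'Smith'), (11, 1, 'Jones'), (12, 2, 'Blake');
""")

# SELECT names the desired output data, FROM names the input, LEFT JOIN
# specifies the data structure (Dept over Emp), WHERE filters occurrences.
rows = con.execute("""
    SELECT D.DName, E.EName
    FROM Dept D
    LEFT JOIN Emp E ON E.DeptNo = D.DeptNo
    WHERE D.DName = 'Sales'
    ORDER BY E.EmpNo
""").fetchall()
print(rows)   # [('Sales', 'Smith'), ('Sales', 'Jones')]
```

The same four clauses carry all the structural information; no navigation logic is coded by the user.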

A fragment is a portion or grouping of nodes from a hierarchical structure that retains its basic hierarchical structure when unselected nodes are removed, enabling it to be manipulated as a unified structure. It can be embedded below the original root of the input structure and can be a loose collection of nodes like the structure fragment shown in Figure 1. It is defined by the nodes that are selected for output from its defined input structure. In this example, the fragment is located below root node D of its original input structure; nodes O and I are not selected for output and will not be included in the fragment. This causes nodes Y and J to be automatically promoted over nodes O and I, so they both naturally collect under the fragment root node F. In fact, the whole process of fragment creation and processing is made possible by the natural action of node promotion.

With the structure fragment identified and isolated by data selection, it can be naturally joined to the upper structure as shown in Figure 1. A lower-level fragment is usually linked at its root node, as in this example, where the link point lies below the root of the original input structure. Linking below or above the root of the fragment is also permitted because the root of the original lower structure (node D in this case) remains the defining input structure's root and still governs the entire input structure. This is particularly important for logical structures, which must be built from the relationships from the root down. However, these structure-defining nodes need not appear in the output structure. For example, in Figure 1 the root node D and the link point node P in the upper structure are not selected for output, but they are necessary for processing the query. Node E, on the other hand, was not selected and is not necessary to the query, so it can be optimized out, which is why it is not in the Working Set. The shaded columns in the Working Set are not selected for output, so they are not transferred to the Result Set shown at the bottom of Figure 1. This relational processing action naturally performs hierarchical node promotion, which in turn collectively supports fragments. These advanced procedures and concepts are performed automatically and are easily controlled by the data fields specified on the SQL SELECT list, which can also be specified dynamically.
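Node promotion driven purely by the SELECT list can be sketched with a hypothetical three-level Cust-over-Ord-over-Item structure (again SQLite via sqlite3, not the article's figures). The intermediate Ord node is needed to define the structure but is left off the SELECT list, so Item is promoted directly under Cust in the result:

```python
import sqlite3

# Hypothetical three-level structure: Cust over Ord over Item.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Cust (CustId INTEGER PRIMARY KEY, CName TEXT);
CREATE TABLE Ord  (OrdId INTEGER PRIMARY KEY, CustId INTEGER);
CREATE TABLE Item (ItemId INTEGER PRIMARY KEY, OrdId INTEGER, Descr TEXT);
INSERT INTO Cust VALUES (1, 'Acme');
INSERT INTO Ord  VALUES (100, 1), (101, 1);
INSERT INTO Item VALUES (7, 100, 'bolt'), (8, 101, 'nut');
""")

# Ord is still required in the structure definition (it links Cust to
# Item), but no Ord column appears on the SELECT list, so the Ord node
# is sliced out and Item is promoted directly under Cust in the result.
rows = con.execute("""
    SELECT C.CName, I.Descr
    FROM Cust C
    LEFT JOIN Ord  O ON O.CustId = C.CustId
    LEFT JOIN Item I ON I.OrdId  = O.OrdId
    ORDER BY I.ItemId
""").fetchall()
print(rows)   # [('Acme', 'bolt'), ('Acme', 'nut')]
```

Removing or adding Ord columns on the SELECT list is all it takes to slice the node out of, or back into, the output structure.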

The advantage of linking below the root is that it allows linking on the most appropriate or convenient link criteria, and the filtering of the lower-level structure matches the semantics of the link point. Whether you link to the lower-level structure in Figure 1 on a value in node D, F, O, or even J, the Result Structure shown would retain the same hierarchical structure for any of these link points, because root D is still the defining root, and the resulting data (not shown) would match the actual link point semantics.

In SQL, users don't need to know about the concept of fragments; they just select the data they're interested in and the fragments form naturally. The example in Figure 1 neatly generated a single fragment because there was a single fragment root node F, but this is not a requirement imposed on the user. Suppose node E had also been selected for output, creating two unrelated fragments. This presents no problem because root D is still the defining root that controls how all the fragments are structurally linked to the upper structure. In this case, fragment root E would be located in the Result Set between nodes B and F. This demonstrates the significant power, flexibility, and ease of use of SQL's nonprocedural hierarchical processing. While the underlying fragment-processing logic may seem complex, its specification is logical, naturally intuitive, and performed automatically in standard SQL.

Duplicate and Shared Element Support
Now let's look at how to process XML's duplicate and shared elements, which can occur in an XML data structure when accessed from standard SQL. Duplicate elements occur where the same (named) element type occurs in multiple locations in the XML data structure. Shared elements are created by the XML IDREF and ID attribute constructs, which create a logical pathway in the physical XML data structure (usually creating a network structure). Both of these unconventional structures are demonstrated in Figure 2 as the top two structure diagrams. The Addr node in these diagrams represents the duplicate and shared element. The dotted arrow in the XML shared element diagram in Figure 2 represents the logical IDREF pathway. The problem with these two structures is that they are both ambiguous for a nonprocedural hierarchical query language such as SQL because there is no single unambiguous access path to a specific Addr node location. What makes nonprocedural hierarchical query languages so powerful is that the hierarchical data structures they operate on are naturally unambiguous because they only have a single path to each node. In this way, the query can be specified unambiguously because each node in the structure has its own unique access path and specific semantics that can be utilized automatically to answer the query.

The Alias feature of SQL allows the duplicate and shared element structures shown in Figure 2 to be data modeled as unambiguous hierarchical structures by using the optional AS keyword to rename the elements (nodes). In this example, both of the ambiguous structures can use the same data modeling SQL to produce an unambiguous structure that maintains the original semantics of both input structures. This is possible because the semantics of the two input structures, despite their differing storage of the Addr node, are the same and produce the same result. With the unambiguously modeled structure shown in Figure 2, each specific Addr field can be referenced unambiguously by using its unique node name as a prefix to the field name. In this way each Addr field name reference has its own logical path with its own hierarchical semantics, allowing the full nonprocedural hierarchical power of SQL to be controlled with simple, intuitive queries. The use of node name prefixes can be avoided by using the SQL Alias feature on the SELECT list to rename the duplicate field names to unique names. The underlying XML access module's logic adapts transparently to the physical storage representation of the Addr element, whether it is shared or duplicated.
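The AS renaming can be sketched with a hypothetical duplicate Addr element reachable from two places in the structure, under Cust and under Emp (SQLite via sqlite3; table names and data are illustrative, not Figure 2's):

```python
import sqlite3

# Hypothetical duplicate element: the same Addr table (element type) is
# reached from two locations in the structure, under Cust and under Emp.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Cust (CustId INTEGER PRIMARY KEY, AddrId INTEGER);
CREATE TABLE Emp  (EmpId  INTEGER PRIMARY KEY, CustId INTEGER, AddrId INTEGER);
CREATE TABLE Addr (AddrId INTEGER PRIMARY KEY, City TEXT);
INSERT INTO Addr VALUES (1, 'Austin'), (2, 'Boston');
INSERT INTO Cust VALUES (10, 1);
INSERT INTO Emp  VALUES (20, 10, 2);
""")

# AS renames each Addr occurrence, giving every node a unique name and
# therefore a single unambiguous access path in the modeled hierarchy.
row = con.execute("""
    SELECT CustAddr.City, EmpAddr.City
    FROM Cust C
    LEFT JOIN Addr AS CustAddr ON CustAddr.AddrId = C.AddrId
    LEFT JOIN Emp  E           ON E.CustId        = C.CustId
    LEFT JOIN Addr AS EmpAddr  ON EmpAddr.AddrId  = E.AddrId
""").fetchone()
print(row)   # ('Austin', 'Boston')
```

Each renamed node (CustAddr, EmpAddr) can now be referenced with its own prefix and its own hierarchical semantics.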

Hierarchical Data Filtering
There are two levels of hierarchical data filtering supported in ANSI SQL: WHERE clause query-level data filtering and ON clause path-level data filtering. WHERE clause filtering is the more common of the two. It can affect the entire multi-leg hierarchical structure by filtering not only down the structure, but also up the structure from the filtered node points. ON clause data filtering, on the other hand, filters only downward from its filtered node point and is compatible with XPath data filtering. Both of these SQL data-filtering operations work consistently across ANSI SQL relational processing and follow standard hierarchical structure processing semantics.

Figure 3 demonstrates WHERE clause query-level data filtering. Notice how it also filters up the structure, affecting the ascendants of the explicitly filtered Dpnd node. In this example, "Dpnd2", the only Dpnd occurrence under the Emp occurrence "Emp2", is filtered out; this causes "Emp2" itself to be filtered out because its only Dpnd path occurrence is gone. The associated root node data occurrence "DeptX" remains qualified because there are other active paths leading to it from the qualified node occurrence "Dpnd1". Notice also that node occurrence "Proj1" is qualified because its path is still active and it is a descendant of the qualified node occurrence "DeptX". If no Dpnd node occurrences had been qualified, then no data from this structure occurrence would be output. This behavior is logical and intuitive for the hierarchical result you would expect with WHERE clause filtering and is how hierarchical processors operate. It is also the result of relational processing (shown in Figure 3) in the Relational Working Set, where the deleted row is filtered out, indicated by a darkened row. This is why WHERE clause filtering can affect the entire multi-leg structure occurrence, taking into account (correlating) the semantics between the data and relationships in its sibling legs. As Figure 3 shows, this complex hierarchical semantic processing occurs automatically thanks to the restricted Cartesian product, which produces the required combinations of hierarchically related data values, allowing data selection to be carried out a row at a time by the relational engine.

Figure 4 demonstrates ON clause path-level data filtering using the same filtering criteria as the WHERE clause query-level example. ON clause filtering is similar to WHERE clause filtering, but it filters only from the point on the path where it is used, downward to its dependents. Its power is fine-grained hierarchical data filtering isolated to a single node in the structure, operating just like XPath data filtering. It is also the result produced by relational processing, shown in the Relational Working Set in this example, where only the filtered Dpnd node occurrence "Dpnd2" is deleted. If there were other descendant nodes under the filtered node, they would also be filtered out. The reason the entire row was not filtered is that the higher-level structure side (left side) of the LEFT JOIN is always preserved, so filtering can only occur from the right structure side down the structure. WHERE clause filtering, on the other hand, is performed using INNER JOIN logic, which means that either side of the join operation (up or down the structure in this case) can cause nodes to be filtered out. Additionally, WHERE clause processing occurs after the entire row is built, so the row is either entirely filtered out or not filtered at all, while ON clause filtering occurs as the row is being built, allowing separate legs of the structure to be filtered out and their values replaced with Nulls to keep the alignment. This ON clause path filtering mirrors XPath's operation on XML and can be simulated for legacy database access when specified using SQL.
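The contrast between the two filtering levels can be sketched with a small hypothetical Emp-over-Dpnd structure (modeled loosely on the Figure 3 and Figure 4 discussion, using SQLite via sqlite3; names and data are illustrative):

```python
import sqlite3

# Hypothetical two-level Emp-over-Dpnd structure.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Emp  (EmpId INTEGER PRIMARY KEY, EName TEXT);
CREATE TABLE Dpnd (DpndId INTEGER PRIMARY KEY, EmpId INTEGER, DName TEXT);
INSERT INTO Emp  VALUES (1, 'Emp1'), (2, 'Emp2');
INSERT INTO Dpnd VALUES (10, 1, 'Dpnd1'), (20, 2, 'Dpnd2');
""")

# WHERE (query-level): filtering out Dpnd2 also filters its ascendant
# Emp2, because WHERE is applied after the whole row is built.
where_rows = con.execute("""
    SELECT E.EName, D.DName
    FROM Emp E LEFT JOIN Dpnd D ON D.EmpId = E.EmpId
    WHERE D.DName <> 'Dpnd2'
""").fetchall()
print(where_rows)   # [('Emp1', 'Dpnd1')]

# ON (path-level): only the Dpnd2 node is removed; Emp2 is preserved
# with a Null in the filtered leg, XPath-style.
on_rows = con.execute("""
    SELECT E.EName, D.DName
    FROM Emp E LEFT JOIN Dpnd D
      ON D.EmpId = E.EmpId AND D.DName <> 'Dpnd2'
    ORDER BY E.EmpId
""").fetchall()
print(on_rows)   # [('Emp1', 'Dpnd1'), ('Emp2', None)]
```

The only difference between the two queries is where the filtering predicate is placed, yet the hierarchical effect (whole-path removal versus single-node removal) is entirely different.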

Structure Transformation
By combining fragment processing with the SQL Alias feature used in duplicate and shared element processing (Figures 1 and 2), it's possible to easily perform powerful structure transformations. The Alias capability logically creates multiple copies of the same structure, so that different fragments can be specified from it, isolated, and then independently manipulated. The Alias feature is used to rename views, which enables duplicate input structures to be logically defined so that references to them can be made unambiguously (also useful for processing XML namespaces). Figure 5 demonstrates this by creating two separate and independent fragments from the StoreView view (encircled by dashed ovals). These fragments are then recombined into a different structure by rejoining them. This is a simple example of structure transformation; multiple structures can each have multiple fragments extracted, which can all be combined in any order, allowing fragments to be joined as needed. Structural transformations can also be stored in a SQL view for abstraction, easy reuse, and use in constructing larger structures.
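A minimal version of this can be sketched with a hypothetical StoreView (not the one in Figure 5): two aliased copies of the same view isolate two fragments, which are then rejoined with the original child (Cust) as the new root, inverting the Store-over-Cust hierarchy (SQLite via sqlite3):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Store (StoreId INTEGER PRIMARY KEY, SName TEXT);
CREATE TABLE Cust  (CustId INTEGER PRIMARY KEY, StoreId INTEGER, CName TEXT);
INSERT INTO Store VALUES (1, 'Main St');
INSERT INTO Cust  VALUES (10, 1, 'Acme'), (11, 1, 'Apex');
CREATE VIEW StoreView AS
    SELECT S.StoreId, S.SName, C.CustId, C.CName
    FROM Store S LEFT JOIN Cust C ON C.StoreId = S.StoreId;
""")

# Two aliases of the same view define two logical copies of the structure:
# CustFrag isolates the Cust fragment, StoreFrag the Store fragment.
# Rejoining them with Cust on the left inverts the original hierarchy.
rows = con.execute("""
    SELECT CustFrag.CName, StoreFrag.SName
    FROM StoreView AS CustFrag
    LEFT JOIN StoreView AS StoreFrag
      ON StoreFrag.StoreId = CustFrag.StoreId
     AND StoreFrag.CustId  = CustFrag.CustId
    ORDER BY CustFrag.CustId
""").fetchall()
print(rows)   # [('Acme', 'Main St'), ('Apex', 'Main St')]
```

Each customer now sits above its store in the output structure, even though the stored structure remains Store over Cust.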

Variable Structures
XML can define variable structures, which allow considerable variability of structure formats within a single definition of the structure. This means that from structure occurrence to occurrence, or even within a single structure occurrence of a record or document, the structure format can vary. While XML does not require it, a variable structure usually carries some piece of information in a higher-level node that indicates how a variable substructure is, or is to be, formed. With this information, and using the ON clause filtering described earlier, SQL data modeling (using LEFT JOINs) can control the building of each structure occurrence to define the appropriate substructures dynamically. An example is shown in Figure 6.

Figure 6 is a simple example of how the generation of the data structure can vary depending on the value of the field StoreType in the Store node. In this case only one area of the structure was affected, but there is no limit to the number of variations: any number of ON clauses can test structure indicator values to control how substructures are generated from their current data values. In fact, variable substructures can contain variable substructures. These tests can be coded to duplicate the rules specified in XML DTDs and schemas for varying element generation, which can become quite complex. The SQL for specifying a variable structure, as shown in Figure 6, defines how a logical (relational) structure is to be constructed in memory, or can be used to control the navigation of a physical (XML) structure as it is retrieved into memory; in either case, it controls how the variable output structure is generated.
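A sketch of indicator-driven variable substructures, with hypothetical Store/Video/Book tables (not Figure 6's exact structure), using SQLite via sqlite3. Each ON clause tests the StoreType indicator in the parent node, so the substructure built under each Store occurrence varies with its data value:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Store (StoreId INTEGER PRIMARY KEY, StoreType TEXT);
CREATE TABLE Video (VideoId INTEGER PRIMARY KEY, StoreId INTEGER, Title TEXT);
CREATE TABLE Book  (BookId  INTEGER PRIMARY KEY, StoreId INTEGER, Title TEXT);
INSERT INTO Store VALUES (1, 'video'), (2, 'book');
INSERT INTO Video VALUES (10, 1, 'Metropolis'), (11, 2, 'Stray Tape');
INSERT INTO Book  VALUES (20, 2, 'SQL Primer');
""")

# The StoreType test in each ON clause decides which substructure is
# built under each Store occurrence. Note that the 'Stray Tape' row is
# excluded for store 2 because its indicator says 'book', not 'video'.
rows = con.execute("""
    SELECT S.StoreId, S.StoreType, V.Title, B.Title
    FROM Store S
    LEFT JOIN Video V ON V.StoreId = S.StoreId AND S.StoreType = 'video'
    LEFT JOIN Book  B ON B.StoreId = S.StoreId AND S.StoreType = 'book'
    ORDER BY S.StoreId
""").fetchall()
print(rows)
# [(1, 'video', 'Metropolis', None), (2, 'book', None, 'SQL Primer')]
```

The structure of each output occurrence is decided row by row from the indicator value, which is exactly the variable-structure behavior described above.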

Comparison with XQuery and the SQL/XML Standard
XQuery's use in SQL requires learning another query language and programming the query logic into its FLWOR (For, Let, Where, Order by, Return) expression. The functionality of SQL's SELECT list is embedded in the FLWOR expression and controlled by its FOR loops. This gives XQuery considerable procedural processing power and control, but it also means that specifying additional output values is not trivial, requiring modification of the program logic. The FLWOR expression is also used to control or drive database access using XPath navigation, which requires the XQuery developer to know the data structure being processed. The XML output is constructed with XML templates, which requires that the developer know XML, and can include placing the templates in FLWOR FOR loops for additional control. XQuery uses functions to abstract and reuse program logic.

With the ANSI SQL native XML integration solution shown in this article, the SQL developer does not need to know XML or the data structure (once it is modeled in a SQL view) and does not need to specify the query logic or database navigation, even for processing the most complex multi-leg hierarchical structures. The naive user or developer can specify an ad hoc request or easily add an additional relational or XML data item to the SELECT list, and it will be retrieved and hierarchically formatted automatically, utilizing the hierarchical semantics in the data. The SQL hierarchical view offers the highest level of data abstraction and reuse, allowing the processing logic to be dynamically tailored to the runtime requirements. These capabilities do not mean that SQL is better than XQuery; in fact XQuery is more powerful, but there is the standard tradeoff: as the control offered by a computer language increases, its ease of use decreases. XQuery needs its additional programming control to handle the advanced text processing and complex transformations required in an XML environment. The full extent of these capabilities may be needed less often in the SQL environment, offset by the use of SQL's hierarchical capabilities.

The SQL/XML standards group has done an excellent job specifying and standardizing many useful mappings between XML and SQL that will be used by the standard SQL native XML integration described in this article. The SQL/XML standard also specifies XML-centric functions in SQL for producing XML documents from the standard flat relational result of an SQL query, which may include input from XML. The desired XML-formatted hierarchical structure is produced by nesting the SQL/XML standard's XMLElement function within itself to control the desired hierarchical structure. As with XQuery, the SQL/XML user must know XML and have knowledge of the input and output hierarchical data structures, and the XML-centric SQL functions require a programming change whenever a new data item is added to the SELECT list for output. By contrast, the ANSI SQL native XML integration solution described in this article is seamless and transparent, and produces a valid and accurate hierarchically processed result. It can automatically and dynamically publish XML documents without introducing XML-centric SQL functions and their limitations described above. This is made possible by seamlessly utilizing standard SQL's significant inherent hierarchical processing capability, described in Part 1 of this article.

Conclusion
This two-part article has demonstrated how standard ANSI SQL can integrate fully, naturally, and seamlessly with XML by raising SQL processing automatically to a hierarchical processing level, enabling relational data to integrate at a hierarchical level directly with native XML. This was proven not only with examples, but by demonstrating at each stage of SQL processing how it works, from SQL syntax and semantics through the Cartesian product relational engine described in Part 1 of this article. It was also shown that the level of hierarchical support was significant, handling complex multi-legged hierarchical queries intuitively, and joining hierarchical structures easily. This allows SQL to fully utilize the hierarchical semantics in the data and the data structure. By operating at a hierarchical level, the memory and processing efficiencies are greatly increased, and because SQL itself is performing the majority of the integration work, the XML support is very efficient and its footprint is very small, making it excellent for embedded use. All of these features and capabilities support dynamic processing. This ad hoc processing includes powerful parameter-driven query specification in the form of SQL SELECT list specification and WHERE clause filtering that dynamically tailors and optimizes the most complex stored hierarchical views.

In Part 2 of this article, advanced XML processing features that can also be performed by ANSI SQL's standard and inherent hierarchical processing capabilities were covered. These advanced capabilities include seamless support for shared and duplicate elements in the data structure using the SQL Alias capability; node promotion and structure fragment processing automatically controlled by the data fields selected on the SQL SELECT list; and structure transformations using a combination of the structure fragment and Alias capabilities. Since all the capabilities mentioned in this article inherently exist in ANSI SQL, they operate together in a seamless and unrestricted (orthogonal) fashion. This includes the multi-leg hierarchical data filtering, which automatically operates on the SQL modeled data structure, and the full unlimited use of SQL views. This SQL native XML integration is naturally standard because it seamlessly stays within ANSI SQL. It does not require the addition of SQL standardized XML-centric functions, producing a powerful and easy-to-use hierarchical ad hoc query language for XML and other hierarchical forms of data, including SQL hierarchically modeled relational data. For more information on all these topics and additional ANSI SQL-supported XML capabilities, visit www.adatinc.com.

More Stories By Michael M David

Michael M. David is founder and CTO of Advanced Data Access Technologies, Inc. He has been a staff scientist and lead XML architect for NCR/Teradata and their representative to the SQLX Group. He has researched, designed and developed commercial query languages for heterogeneous hierarchical and relational databases for over twenty years. He authored the book "Advanced ANSI SQL Data Modeling and Structure Processing," published by Artech House, as well as many papers and articles on database topics. His research and findings have shown that hierarchical data processing is a subset of relational processing and how to utilize this advanced inherent capability in ANSI SQL. Additionally, his research has shown that advanced multipath (LCA) processing is also naturally supported and performed automatically in ANSI SQL, and that further advanced hierarchical processing operations are possible. These advanced capabilities can be performed and explained in the ANSI SQL Transparent XML Hierarchical Processor at his site at: www.adatinc.com/demo.html.
