|
|
YOUR FEEDBACK
SOA World Conference
Virtualization Conference $200 Savings Expire May 16, 2008... – Register Today! Did you read today's front page stories & breaking news?
SYS-CON.TV |
TODAY'S TOP SOA & WEBSERVICES LINKS Content Management
Advanced ANSI SQL Native XML Integration-Part 2 - Supporting advanced XML capabilities
By: Michael David
Digg This!
Part 1 of this article demonstrated how standard ANSI SQL can integrate fully, naturally, and seamlessly with XML. This was accomplished by naturally raising SQL processing to a hierarchical level, enabling relational data (including XML-shredded data) to integrate at a full hierarchical level with native XML. Hierarchical processing and the utilization of the hierarchical semantics were also shown in Part 1, along with the hierarchical joining of hierarchical structures. Part 2 will cover how standard SQL can naturally support more advanced XML capabilities such as node promotion, fragment processing, structure transformation, variable structures, and the handling of shared and duplicate elements.
Node Promotion and Fragment Processing
From what we saw in Part 1, the basic operation of SQL is to use the SELECT clause to specify the desired output data; the FROM clause to specify its input data; the LEFT JOIN to specify its data structure; and the WHERE clause to specify optional data filtering criteria. Using these simple and intuitive SQL language constructs, even more complex node promotion leading to fragment processing can be easily performed. The query in Figure 1 joins a structure fragment selected from the lower structure to the upper structure. The structure fragment is shown encircled in a dashed oval. Dashed boxes represent nodes that are not selected for output. A fragment is a portion or grouping of nodes from a hierarchical structure that retains its basic hierarchical structure when unselected nodes are removed, enabling it to be manipulated as a unified structure. It can be embedded below the original root of the input structure and can be a loose collection of nodes like the structure fragment shown in Figure 1. It is defined by the nodes that are selected for output from its defined input structure. In this example, the fragment is located below the root node D of its original input structure, which contains nodes O and I, which are not selected for output and will not be included in the fragment. This causes nodes Y and J to be automatically promoted over nodes O and I so they both naturally collect under the fragment root node F. In fact, the whole process of fragment creation and processing is made possible by the natural action of node promotion. With the structure fragment identified and isolated by data selection, it can be naturally joined to the upper structure as shown in Figure 1. When linking to a lower-level fragment, it is usually to its root node as in this example, where the link point is below the root of the original input structure node. Linking below or above the root of the fragment is also permitted because the root of the original lower structure remains the defining input structure's root (node D in this case) because it still affects the entire input structure. This is particularly important to logical structures that must be built on the relationships from the root down. However, this doesn't mean that these structure-defining nodes need to be in the output structure. For example, in Figure 1 the root node D and the link point node P in the upper structure are not selected for output, but they are necessary for processing the query. Node E, on the other hand, was not selected and is not necessary to the query so it can be optimized out, which is why it is not in the Working Set. The shaded columns in the Working Set are not selected for output, so they are not transferred to the Result Set shown at the bottom of Figure 1. This relational processing action naturally performs hierarchical node promotion, which in turn collectively supports fragments. These advanced procedures and concepts are being performed automatically and are easily controlled by the data fields specified on the SQL Select list, which can also be specified dynamically. The advantage of linking below the root is that it allows linking using the most appropriate or convenient link criteria. The filtering of the lower-level structure matches the semantics of the link point, which makes sense semantically. Whether you want to link to the lower-level structure in Figure 1 based on a value in node D, F, O, or even J, the Result Structure shown would retain the same hierarchical structure for any of these link points because root D is still the defining root, and the resulting data (not shown) would match the actual link point semantics. In SQL, users don't need to know about the concept of fragments, they just select the data they're interested in and the fragments will naturally form. The example in Figure 1 neatly generated a single fragment because there was a single fragment root node F, but this is not a requirement imposed on the user. Suppose node E had also been selected for output, creating two unrelated fragments. This does not present a problem because root D is still the defining root that controls how all the fragments are structurally linked to the upper structure. In this case, root E would be located in the Result Set between nodes B and F. This demonstrates the significant power, flexibility, and ease of use of SQL's nonprocedural hierarchical processing. While the underlying fragment-processing logic may seem complex to perform, its specification is very logical, naturally intuitive, and is performed automatically in standard SQL.
Duplicate and Shared Element Support
The Alias feature of SQL allows the duplicate and shared element structures shown in Figure 2 to be data modeled as unambiguous hierarchical structures by using the optional AS keyword to rename the elements (nodes). In this example, both of the ambiguous structures can use the same data modeling SQL to produce an unambiguous structure that maintains the original semantics of both input structures. This is possible because the semantics of both input structures with differing storage use of the Addr node, are the same and produce the same result. With the unambiguously modeled structure shown in Figure 2, each specific Addr field can be unambiguously referenced by using its unique node name as a prefix to the field name. In this way each Addr field name reference has its own logical path with its own hierarchical semantics. This allows the full nonprocedural hierarchical power of SQL to be easily controlled with simple intuitive queries. Avoiding the use of node name prefixes can be accomplished by using the SQL Alias feature on the SELECT list to rename the duplicate field names to unique names. The underlying XML access module's logic will adapt transparently to the physical storage representation of the Addr element whether it is shared or duplicated.
Hierarchical Data Filtering Figure 3 demonstrates WHERE clause query-level data filtering. Notice how it also filters up the structure affecting the ascendants of the explicitly filtered Dpnd node. In this example, the only Dpnd node "Dpnd2" for the Emp node occurrence "Emp2" is filtered out, which causes "Emp2" to be filtered out because the "Dpnd2" path occurrence is filtered. The associated root node data occurrence "DeptX" is qualified because there are other active paths leading to it from the node occurrence "Dpnd1", which was qualified. Also notice that node occurrence "Proj1" is also qualified because its path is still active and is a descendent of the qualified node occurrence "DeptX". If no Dpnd node occurrences had been qualified, then no data from this structure occurrence would be output. This behavior is logical and intuitive for the hierarchical result you would expect with this hierarchical WHERE clause filtering and is how hierarchical processors operate. This is also the result of relational processing (shown in Figure 3) by the Relational Working Set in this example, where the deleted row is filtered out, indicated by a darkened row. This is why the WHERE clause filtering process can affect the entire multi-leg structure occurrence, taking into account (correlating) the semantics between the data and relationships in its sibling legs. As shown in Figure 3, this complex hierarchical semantic processing occurs automatically thanks to the restricted Cartesian product, producing the required combination of hierarchical related data values allowing data selection to be carried out a row at a time by the relational engine.
Figure 4 demonstrates ON clause path-level data filtering using the same filtering criteria used with the Where clause query-level data filtering. ON clause filtering is similar to WHERE clause data filtering, but only filters from the point on the path where it is used and downward to its dependents. Its power is its fine hierarchical data filtering that is isolated to a single node in the structure. This operates just like XPath data filtering. It is also the result produced from relational processing, shown by the Relational Working Set in this example, where only the filtered Dpnd node occurrence "Dpnd2" is deleted. If there were other descendant nodes under the filtered node, they would also be filtered out. The reason that the entire row was not filtered is that the higher-level structure side (left side) of the LEFT JOIN is always preserved, enabling filtering to only occur from its right structure side down the structure. The WHERE clause filtering, on the other hand, is performed using INNER JOIN logic, which means that either side of the join operation (up or down the structure in this case) can cause nodes to be filtered out. Additionally, WHERE clause processing occurs after the entire row is built, causing it to be entirely filtered out or not filtered at all, while ON clause filtering occurs as the row is being built, allowing separate legs of the structure to be filtered out replacing their values with Null values to keep the alignment. This ON clause path filtering mirrors XPath's operation on XML and can be simulated for legacy database access when specified using SQL.
Structure Transformation
Variable Structures
Figure 6 is a simple example of how the generation of the data structure can vary depending on the value of the field StoreType in the Store node. In this case, only one area of the structure was affected, but there's no limit to the number of variations that can be controlled by ON clauses testing for any number of structure indicator values to control how they are generated depending on their current data values. In fact, variable substructures can contain variable substructures. These tests can be coded to duplicate the rules specified in XML DTD and schemas for varying element generation, which can become quite complex. The SQL for specifying a variable structure, as shown in Figure 6, defines how a logical (relational) structure is to be constructed in memory, or can be used to control the navigation of a physical (XML) structure when being retrieved into memory; and in either case, it controls how the variable output structure is generated.
Comparison with XQuery and the SQL/XML Standard With the ANSI SQL native XML integration solution shown in this article, the SQL developer does not need to know XML or the data structure (once it is modeled in a SQL view) and does not need to specify the query logic or database navigation even for the processing of the most complex multi-leg hierarchical structure. The naive user or developer can specify an ad hoc request or easily add an additional relational or XML data item to the SELECT list and it will be retrieved and hierarchically formatted automatically, utilizing the hierarchical semantics in the data. The SQL hierarchical view offers the highest level of data abstraction and reuse, allowing the processing logic to be dynamically tailored to the runtime requirements. These capabilities do not mean that SQL is better than XQuery, in fact XQuery is more powerful, but there is the standard tradeoff - with the increase in control of a computer language, there is a decrease in ease of use. XQuery requires its additional programming control to handle advanced text processing and complex transformations, required in an XML environment. The full extent of these capabilities may be required less in the SQL environment offset by the use of SQL's hierarchical capabilities. The SQL/XML standards group has done an excellent job specifying and standardizing many useful mappings between XML and SQL that will be used by the standard SQL native XML integration described in this article. The SQL/XML standard also specifies XML-centric functions in SQL for producing XML documents from the standard flat relational data result of the SQL query, which may include input from XML. The desired XML-formatted hierarchical structure is produced by nesting the SQL/XML standard's XMLElement function within itself to control the desired hierarchical structure. As with XQuery, the SQL/XML standard user must know XML and have knowledge of the input and output hierarchical data structures. The XML-centric SQL functions require a programming addition when a new data item is added to the SELECT list for output. Alternatively, the ANSI SQL native XML integration solution described in this article is seamless and transparent, and produces a valid and accurate hierarchically processed result. It can automatically and dynamically publish XML documents without the introduction of XML-centric SQL functions and their limitations described above. This is made possible by seamlessly utilizing standard SQL's significant inherent hierarchical processing capability, described in Part 1 of this article.
Conclusion In Part 2 of this article, advanced XML processing features that can also be performed by ANSI SQL's standard and inherent hierarchical processing and capabilities were covered. These advanced capabilities include seamless support for shared and duplicate elements in the data structure using the SQL Alias capability; node promotion and structure fragment processing automatically controlled by what data fields are selected on the SQL Select list; and structure transformations using a combination of the structure fragment and Alias capability. Since all the capabilities mentioned in this article inherently exist in ANSI SQL, they operate together in a seamless and unrestricted (orthogonal) fashion. This includes the multi-leg hierarchical data filtering, which automatically operates on the SQL modeled data structure and the full unlimited use of SQL views. This SQL native XML integration operation is naturally standard because it seamlessly and naturally stays within ANSI SQL. It does not require the addition of SQL standardized XML-centric functions, producing a powerful and easy-to-use hierarchical ad hoc query language for XML and other hierarchical forms of data, including SQL hierarchically modeled relational data. For more information on all these topics and additional ANSI SQL- supported XML capabilities, visit www.adatinc.com. XML JOURNAL LATEST STORIES . . .
SUBSCRIBE TO THE WORLD'S MOST POWERFUL NEWSLETTERS SUBSCRIBE TO OUR RSS FEEDS & GET YOUR SYS-CON NEWS LIVE!
|
SYS-CON FEATURED WHITEPAPERS MOST READ THIS WEEK BREAKING XML NEWS
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||