Integrating Enterprise Information on Demand with XQuery, Part 1

Since the dawn of the database era more than three decades ago, enterprises have been amassing an ever-increasing volume of information - both current and historical - about their operations. For the past two of those three decades, the database world has struggled with the problem of somehow integrating information that natively resides in multiple database systems or other information sources (Landers and Rosenberg).

The IT world knows this problem today as the enterprise information integration (EII) problem: enterprise applications need to be able to easily access and combine information about a given business entity from a distributed and highly varied collection of information sources. Relevant sources include various relational database systems (RDBMSs); packaged applications from vendors such as Siebel, PeopleSoft, SAP, and others; "homegrown" proprietary systems; and an increasing number of data sources that are starting to speak XML, such as XML files and Web services.

During the past two decades, a number of research and commercial systems have been built in attempts to solve the EII problem. These systems have been known by a variety of names - heterogeneous distributed database systems, multi-database systems, federated database systems, data integration systems, and now enterprise information integration systems. But the problem itself has persisted, and it remains very real.

Solutions to the data integration problem involve choosing a common data model into which all the existing data sources are (virtually) mapped, then using a query language designed to work with that model to extract the desired data from the set of mapped data sources. Many data models and languages have been invented and/or tried over the years - including relational (SQL), functional, logical, object-oriented (ODMG/OQL), and semi-structured approaches - but each has fallen short. The two biggest impediments to their success have been the challenge of naturally mapping the data from all the sources of interest into the chosen model and the lack of industry consensus on an appropriate and acceptable model into which to map the data.

Fortunately, the XML age is upon us, and with it has come a set of technologies that are uniquely suited to solving the EII problem. Much as the simplicity of HTML and HTTP led to their rapid adoption, which in turn fueled the growth of the Internet, the simplicity of XML is driving its adoption as the generally accepted format for data interchange and application integration in the IT world today. As a consequence, the XML Schema standard is also gaining traction as the way to describe enterprise data for integration purposes. These trends are due to the simplicity and flexibility of XML - it is straightforward to express data from almost any enterprise data source in XML form without having to commit an "unnatural act." For similar reasons, Web services - based on XML, SOAP, and WSDL - are gaining traction as the way for applications to interact, either synchronously or asynchronously, for point-to-point communication (Curbera). It follows from these trends that a query language for XML, one capable of querying and reshaping XML data as well as invoking functions such as Web services, would provide an ideal foundation for solving the EII problem.

Enter XQuery, the emerging XML query language being produced by the W3C XML Query working group. In this article, we provide an introduction to XQuery and explain how it enables true enterprise information integration - allowing not just database data, but also information from applications, Web services, messages, XML files, and other data sources, to be integrated into coherent reusable views and then used to meet the query demands of enterprise applications.

XQuery: A Query Language for XML
The development of the SQL language for querying and manipulating relational data was a major force in ushering in the database age in the late 1970s and early 1980s. The goal of the W3C XML Query working group has been to design a similarly high-level, declarative query language for XML data.

Why not SQL?
It's natural to ask why SQL, or a SQL derivative, isn't the right solution to the problem of querying XML. The answer is that there are just too many differences between XML data and relational data to make SQL a good candidate for this task:

  • Relational tables are flat, whereas XML data tends to be hierarchically structured, often several levels deep.
  • Relational tables are highly uniform, while XML data tends to be more highly variable. Structural variations, typing variations, and missing data are more the norm than the exception with XML data.
  • Relational data is naturally unordered, while order often has an important meaning in XML data (particularly for document data!).
  • Tables have relatively static schemas that can be difficult to evolve, while XML Schemas tend to be more extensible, and the self-describing nature of XML blurs the data/meta-data distinction. Moreover, XML data may or may not have an associated schema, while relational data cannot exist in the absence of a schema.
  • Finally, in the XML world, textual information can be intermixed freely with structured (i.e., tagged) information.

As a result, the W3C has been designing a new query language tailored to the unique needs of manipulating XML data. The result of that work is the language now called XQuery. Although XQuery is a work in progress, it is nearing completion at the time of this writing, and it is likely to become an official W3C Recommendation in late 2003.

XQuery basics
At the heart of the semantics of XQuery, and also of XPath 2.0, lies the XQuery data model. Just as the relational model laid the foundation for SQL, the XQuery data model lays the foundation for XQuery. Because XML data is naturally ordered, the XQuery data model is based on the notion of ordered trees. Central to the data model is the notion of a sequence. XML queries consume and produce sequences that consist of atomic values (based on the primitive types of XML Schema) and/or of XML nodes (element, attribute, text, document, and so on).
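To make the notion of a sequence concrete, here is a minimal sketch written against the final XQuery 1.0 syntax. The values and the element names in the result are made up for illustration.

```xquery
(: A sequence may mix atomic values and nodes; item order is preserved. :)
let $mixed := (1, "two", 3.0, <item>four</item>)

(: Sequences never nest: (1, (2, 3), ()) is the same sequence as (1, 2, 3). :)
let $flat := (1, (2, 3), ())

return
  <result>
    <mixed-count>{ count($mixed) }</mixed-count>
    <flat-count>{ count($flat) }</flat-count>
    <second>{ $mixed[2] }</second>
  </result>
```

The two counts come out as 4 and 3: the inner parentheses and the empty sequence leave no trace in $flat, and $mixed[2] selects the string "two" by position.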

XQuery is a functional, side-effect-free language. As in many other functional languages, a program (a query in the case of XQuery) consists of a prolog and a body, where the body is an expression. The result of a given query is the result of evaluating its body expression in the environment defined by its prolog. Expressions in XQuery can be simple expressions like primitive constants (e.g., "John Doe" or 1.3), variables, arithmetic expressions, function calls, or path expressions (familiar to users of XPath). They can also be combined to form more interesting expressions via operators, functions, and syntactic constructs including FLWOR expressions (discussed shortly), typeswitch expressions, and node constructors.
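The split between prolog and body can be sketched as follows. This uses the declaration keywords of the final XQuery 1.0 Recommendation (working drafts spelled some of them differently), and the names $taxRate and local:with-tax are hypothetical.

```xquery
xquery version "1.0";

(: Prolog: declarations that set up the environment for the body. :)
declare variable $taxRate as xs:decimal := 0.08;

declare function local:with-tax($amount as xs:decimal) as xs:decimal {
  $amount * (1 + $taxRate)
};

(: Body: a single expression whose value is the result of the query. :)
local:with-tax(100.00)
```

The body here is just a function call, and the query evaluates to 108. Nothing in the prolog or the body modifies any state, which is what "side-effect-free" means in practice.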

The XQuery language is rich enough to support navigation within an XML input document, the combining of data from multiple XML inputs, and the generation of new XML structures from one or more XML inputs. To generate new XML structures, XQuery takes a JSP-like approach. A subset of the XML syntax itself is part of the XQuery language, enriched with XQuery expressions that are executed dynamically and replaced inside the XML structures with their results. One can switch between literal XML and query expressions via curly braces.
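A small sketch of this constructor syntax, with invented element names and values, shows literal XML and embedded expressions side by side:

```xquery
(: Literal XML is written as-is; anything inside { } is evaluated. :)
let $name := "John Doe"
let $points := (120, 45, 10)
return
  <CUSTOMER status="active">
    <NAME>{ $name }</NAME>
    <POINTS total="{ sum($points) }">
      { for $p in $points return <ENTRY>{ $p }</ENTRY> }
    </POINTS>
    <NOTE>Text outside braces is literal: $name</NOTE>
  </CUSTOMER>
```

The NAME element receives the value of $name, the total attribute receives 175, and the NOTE element keeps the characters "$name" unchanged because they are not enclosed in braces.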

From the standpoint of the EII problem, the most important expression in XQuery is the FLWOR (pronounced "flower") expression, which is roughly analogous to SELECT-FROM-WHERE-ORDER BY queries in SQL. The components of a FLWOR expression are:

  • A for clause that generates one or more value sequences, binding the values to query variables. The for clause in XQuery plays a role similar to the FROM clause in SQL.
  • A let clause that binds a temporary variable to the result of a query expression. The XQuery let clause is similar to support for temporary views in some dialects of SQL.
  • A where clause that contains Boolean predicates that restrict the for clause's variable bindings. The where clause in XQuery serves the same purpose as the WHERE clause in SQL.
  • An order by clause that contains a list of expressions that dictate the order of the FLWOR expression's XML output. XQuery's order by clause is directly analogous to SQL's ORDER BY clause.
  • A return clause that specifies the query's desired XML output. The XQuery return clause is analogous to the SELECT clause in SQL, but the structures that it can specify are much richer than those expressible in SQL. (For example, this is where XQuery's JSP-like XML node construction syntax can be found.)
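The five clauses can be seen together in the following sketch. The inline sample data and all element names are assumptions made for illustration. A real deployment would read from data sources rather than from literals.

```xquery
let $customers :=
  <CUSTOMERS>
    <CUSTOMER><CID>C1</CID><LAST_NAME>Smith</LAST_NAME><STATE>CA</STATE></CUSTOMER>
    <CUSTOMER><CID>C2</CID><LAST_NAME>Jones</LAST_NAME><STATE>CA</STATE></CUSTOMER>
    <CUSTOMER><CID>C3</CID><LAST_NAME>Adams</LAST_NAME><STATE>NY</STATE></CUSTOMER>
  </CUSTOMERS>
let $orders :=
  <ORDERS>
    <ORDER><CID>C1</CID><AMOUNT>250.00</AMOUNT></ORDER>
    <ORDER><CID>C1</CID><AMOUNT>99.50</AMOUNT></ORDER>
    <ORDER><CID>C2</CID><AMOUNT>40.00</AMOUNT></ORDER>
  </ORDERS>
return
  <CA_CUSTOMERS>{
    for $c in $customers/CUSTOMER              (: for: iterate, like FROM :)
    let $o := $orders/ORDER[CID = $c/CID]      (: let: bind a helper value :)
    where $c/STATE = "CA"                      (: where: filter the bindings :)
    order by $c/LAST_NAME                      (: order by: sort the output :)
    return                                     (: return: shape the result :)
      <CUSTOMER id="{ $c/CID }">
        { $c/LAST_NAME }
        <ORDER_COUNT>{ count($o) }</ORDER_COUNT>
      </CUSTOMER>
  }</CA_CUSTOMERS>
```

The result lists Jones (one order) before Smith (two orders) and omits Adams. Unlike a SQL SELECT list, the return clause builds a nested element for each customer.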

For data handling, XQuery has the richness of SQL and more - XQuery includes support for subqueries, union, intersection, difference, aggregate functions, sorting, existential and universal quantification, conditional expressions, user-defined functions (that may even be recursive), and static and dynamic typing, in addition to various constructs to support document manipulation (e.g., query primitives for order-related operations). The biggest thing that XQuery lacks relative to SQL today is support for updates; XQuery 1.0 is strictly a functional data access language, with update support being targeted for a later revision of the standard.
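A few of these features are sketched below on a made-up order document. The function local:depth and every element name are hypothetical and serve only to show a recursive function, an aggregate, both quantifiers, and a conditional.

```xquery
(: Recursive user-defined function: depth of the element tree rooted at $n. :)
declare function local:depth($n as element()) as xs:integer {
  if (empty($n/*)) then 1
  else 1 + max(for $child in $n/* return local:depth($child))
};

let $order :=
  <ORDER>
    <ITEM><SKU>TV-42</SKU><QTY>1</QTY><PRICE>899.00</PRICE></ITEM>
    <ITEM><SKU>HDMI-2M</SKU><QTY>3</QTY><PRICE>12.50</PRICE></ITEM>
  </ORDER>
return
  <SUMMARY>
    <TOTAL>{ sum(for $i in $order/ITEM return $i/QTY * $i/PRICE) }</TOTAL>
    <ANY_BULK>{ some $i in $order/ITEM satisfies $i/QTY > 2 }</ANY_BULK>
    <ALL_PRICED>{ every $i in $order/ITEM satisfies exists($i/PRICE) }</ALL_PRICED>
    <SIZE>{ if (count($order/ITEM) > 5) then "large" else "small" }</SIZE>
    <DEPTH>{ local:depth($order) }</DEPTH>
  </SUMMARY>
```

For this input the total is 936.5, both quantified expressions are true, the size is "small", and the depth is 3 (ORDER, ITEM, SKU).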

Using XQuery for Enterprise Information Integration
To show how XQuery can be applied to solve the EII problem, as well as illustrate the power of some of the main constructs of the language, let's consider a simple yet illustrative business scenario. A large consumer electronics retailer wants to organize its IT infrastructure to make its staff more productive and its business more effective. The retailer has both in-store customers and online customers, and it both sells and services home entertainment systems, computers, and other consumer electronic devices. To encourage customer loyalty, consumers receive reward points for their purchases.

The bottom layer of Figure 1 shows what the electronics retailer's IT infrastructure looks like today. Its customer relationship management (CRM) data, such as information about customers and credit cards, is stored in an RDBMS. Order management is handled through an ERP system (SAP), and as a result, order information is available via a J2EE-CA adapter developed to access the ERP system's API. The adapter API provides calls like getOpenOrders( ), which takes a customer ID as input and returns a list of that customer's open order information. Service data is also stored in an RDBMS, but in a different one than the customer data. Finally, the electronics retailer utilizes an external service for performing customer credit checks. The external credit service provides a getCreditRating( ) Web service call that takes a social security number - formatted differently than in the electronics retailer's RDBMS - and runs a credit check on the specified individual.

The electronics retailer's line-of-business managers have asked the company's IT department to create customer portals for three different sets of users. The three desired portals and their data provision requirements are:

  • An online customer self-service portal that will be directly accessed by customers via the Internet. This customer self-service portal should show the customer's profile information, registered credit cards, orders, and service cases, but it should not show the customer's credit rating information.
  • A credit approval portal that will only be accessed by credit approval personnel. This credit approval portal should show the customer's basic profile information, registered credit cards, and credit rating information.
  • An internal product service portal that will be used by clerks in the electronics retailer's service department. This service portal should show just the customer's basic profile information and service case information.

All three portals require information about the same core business entity - namely the customer. However, each line-of-business manager wants a different view of the customer. The electronics retailer's IT department wants a solution that will enable rapid development and provide high reusability of their initial data integration efforts as well as subsequent low maintenance. Fortunately, their data architects realized that they could achieve these goals by creating a single, integrated base view of the customer and then creating three application-specific views on top of the base view. This way, their data integration effort is spent on creating the base view, and the application-specific views are then easily created without concern for disparate data models, differing data source APIs, or other integration snafus. Later on, changes in underlying data source schemas can be dealt with by maintaining the base view; the application-specific views are shielded from most such changes.

With XQuery, the solution sketched above can be implemented by viewing the enterprise's different data sources all as virtual XML documents and functions. XQuery can stitch the distributed customer information together into a comprehensive, reusable base view. That is, the base view definition can be expressed using XQuery, respecting the hierarchical nature of the data, given appropriate default XML views of the enterprise's data sources. As indicated in Figure 1, the relational data sources can be exposed using simple default XML Schemas, and the other sources - SAP and the credit-checking Web service - can be exposed to XQuery as callable XQuery functions with appropriate signatures. In the middle of Figure 1, we see a sketch of the desired "single view of customer" - here, the goal is to make all data about customers available for easy querying from various applications. The developers of these applications then simply work against this unified view - which is an XML view of customers where each customer has some basic data, some credit rating information, an associated set of credit cards, a set of open orders (each with all of its line item details nested inside), and a set of service cases.

Listing 1 shows in full detail how XQuery can be used to define the desired single view of customer. The XQuery shown in the listing defines a single well-formed XML document with top-level element CUSTPROFILE. The outermost FLWOR expression uses the variable $Cust to "iterate" logically over all of the customers in the CRM database's CUST table. Its let-clause binds a second variable, $CredInfo, to the result of calling the credit Web service's method getCreditRating( ). Note that this call deals with the disparate social security number formats by reformatting the value being passed to the Web service. The top-level return-clause is where most of the action is, as this is where the desired result is defined and shaped. For each customer, the view will contain a CUST element with the basic customer data at the top level and a nested CREDITINFO element with the customer's credit rating from the Web service. It will have a CREDITCARDS element containing subelements for each of the customer's credit cards, computed via a correlated FLWOR subquery (much like a nested query in SQL), and the view query has similar subqueries for computing the sets of ORDERS and CASES for each customer. In the case of ORDERS, notice that the subquery's for-clause ranges over the result obtained by calling the getOpenOrders( ) method of the ERP application adapter. Like the Web services call, this method appears to the view definer as another callable XQuery function.
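The following is an illustrative sketch of a view with the shape just described. It is not Listing 1 itself. Only CUSTPROFILE, CUST, CREDITINFO, CREDITCARDS, ORDERS, CASES, $Cust, $CredInfo, getCreditRating, and getOpenOrders come from the description. The local: stub functions, the inline sample data, the remaining element names, and the assumption that the credit service wants the social security number without hyphens are placeholders that make the sketch self-contained.

```xquery
(: Stand-in for the credit-check Web service (assumed signature). :)
declare function local:getCreditRating($ssn as xs:string) as element(CREDITINFO) {
  <CREDITINFO><SSN>{ $ssn }</SSN><RATING>700</RATING></CREDITINFO>
};

(: Stand-in for the ERP adapter call (assumed signature). :)
declare function local:getOpenOrders($cid as xs:string) as element(ORDER)* {
  if ($cid = "C1")
  then <ORDER><OID>O-17</OID><LINE_ITEM><SKU>TV-42</SKU><QTY>1</QTY></LINE_ITEM></ORDER>
  else ()
};

(: Hypothetical default XML views of the two relational databases. :)
declare variable $crm :=
  <CRM>
    <CUST><CID>C1</CID><NAME>John Doe</NAME><SSN>123-45-6789</SSN></CUST>
    <CREDIT_CARD><CID>C1</CID><NUM>4111-XXXX</NUM></CREDIT_CARD>
  </CRM>;
declare variable $service :=
  <SERVICE><CASE><CID>C1</CID><STATUS>Open</STATUS></CASE></SERVICE>;

<CUSTPROFILE>{
  for $Cust in $crm/CUST
  (: Reformat the key on the way to the service: strip the hyphens. :)
  let $CredInfo := local:getCreditRating(translate($Cust/SSN, "-", ""))
  return
    <CUST>
      { $Cust/CID, $Cust/NAME }
      { $CredInfo }
      <CREDITCARDS>{
        for $cc in $crm/CREDIT_CARD
        where $cc/CID = $Cust/CID
        return $cc
      }</CREDITCARDS>
      <ORDERS>{
        for $o in local:getOpenOrders($Cust/CID)
        return $o
      }</ORDERS>
      <CASES>{
        for $case in $service/CASE
        where $case/CID = $Cust/CID
        return $case
      }</CASES>
    </CUST>
}</CUSTPROFILE>
```

Each nested FLWOR refers to $Cust from the enclosing expression, which is what makes the subqueries correlated. The function-backed sources and the table-backed sources are combined in the same way.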

As shown at the top of Figure 1, there are three different queries to be written against the base customer view. One is CustomerSelfServiceQuery, a parameterized query that, given a customer ID, returns the information that the customer is allowed to see through the customer self-service portal. This query returns everything known about the customer except for the CREDITINFO element. Another query is CreditPersonnelQuery, for use by the personnel who handle credit approval requests. This query also takes a customer ID and returns customer information; however, it omits ORDERS and CASES, as they are not relevant for credit department use. The third query in Figure 1, ServicePersonnelQuery, is for use by the service department. This query takes a customer ID and returns basic information about the customer plus the set of open service cases for the customer. Listing 2 shows how simple it is to write the third query given the centralized customer view provided by Listing 1.
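A query in the spirit of ServicePersonnelQuery might look like the sketch below. It is not Listing 2 itself. The variable $view is a hand-written stand-in for the output of the base customer view, and $custId, CASEID, and the sample values are assumptions. A deployed query would more likely take the customer ID as an external parameter than as a literal.

```xquery
(: Query parameter; fixed here so that the sketch is self-contained. :)
declare variable $custId as xs:string := "C1";

(: Stand-in for the result of the base customer view. :)
declare variable $view :=
  <CUSTPROFILE>
    <CUST>
      <CID>C1</CID><NAME>John Doe</NAME>
      <CREDITINFO><RATING>700</RATING></CREDITINFO>
      <CREDITCARDS/>
      <ORDERS/>
      <CASES>
        <CASE><CASEID>S-1</CASEID><STATUS>Open</STATUS></CASE>
        <CASE><CASEID>S-2</CASEID><STATUS>Closed</STATUS></CASE>
      </CASES>
    </CUST>
  </CUSTPROFILE>;

for $c in $view/CUST
where $c/CID = $custId
return
  <CUST>
    { $c/CID, $c/NAME }
    <CASES>{ $c/CASES/CASE[STATUS = "Open"] }</CASES>
  </CUST>
```

The query never mentions the CRM database, the service database, or how the two are joined. It navigates the view and keeps the parts that the service department needs.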

This example, while it uses very simple data sources and schemas for clarity, illustrates a number of important points about the benefits of an XQuery-based EII solution. One benefit is that the data integration problem for a given business entity only needs to be solved once, when defining the centralized view. It can then be leveraged across multiple applications, and the queries or further views for those applications are vastly simplified (as shown by Listing 2). Another benefit is that the use of XML and XQuery provides a very natural basis for defining centralized views of real enterprise data. They make it simple to capture the naturally hierarchical nature of the data, particularly for data that lives within applications (as opposed to just flat RDBMS tables). XQuery also provides the power to deal with complications like key mismatches, either by calling a function to transform a key, as is done in the Web service call in Listing 1, or by incorporating a key mapping table or service into the base view query.

It is important to mention that these benefits need not come at the expense of performance. When the XQuery-based EII system processes a query like the one in Listing 2, it expands the query's view reference inline (as RDBMSs have done for decades). The result is a query that involves only the base data sources, and in which predicates such as the customer ID parameter and the "Open" case status constant can be pushed all the way down to the appropriate data sources. Also, only those base sources that actually contain data needed for the query - the two RDBMSs in Listing 2's case, for example - will become involved in processing the query at runtime.
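To give a feel for what such an expansion produces, the sketch below shows a hand-unfolded form of a service-department query. It is an assumption about what the rewritten query could look like, not the output of any particular product, and the inline data and element names are placeholders.

```xquery
declare variable $custId as xs:string := "C1";

(: Hypothetical default XML views of the two relational databases. :)
declare variable $crm :=
  <CRM>
    <CUST><CID>C1</CID><NAME>John Doe</NAME><SSN>123-45-6789</SSN></CUST>
    <CUST><CID>C2</CID><NAME>Jane Roe</NAME><SSN>987-65-4321</SSN></CUST>
  </CRM>;
declare variable $service :=
  <SERVICE>
    <CASE><CID>C1</CID><STATUS>Open</STATUS></CASE>
    <CASE><CID>C1</CID><STATUS>Closed</STATUS></CASE>
  </SERVICE>;

(: After view expansion, the credit service and the ERP adapter no longer
   appear, because nothing in the result depends on them. :)
for $Cust in $crm/CUST
where $Cust/CID = $custId                    (: can be evaluated by the CRM database :)
return
  <CUST>
    { $Cust/CID, $Cust/NAME }
    <CASES>{
      for $case in $service/CASE
      where $case/CID = $Cust/CID
        and $case/STATUS = "Open"            (: can be evaluated by the service database :)
      return $case
    }</CASES>
  </CUST>
```

Both where clauses now sit directly on a relational source, so an engine is free to translate them into the WHERE clauses of the SQL it sends to each database.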

In Part 1 of this article we have introduced the EII problem, provided a brief overview of XQuery, and explained XQuery's role in solving the EII problem. In Part 2, we will complete the picture by talking about two related technologies, namely EAI and ETL, and explaining how they relate to EII and XQuery. We will also describe an EII customer scenario and explain how Liquid Data for WebLogic, an XQuery-based EII product from BEA, was used to tackle the data integration problems that this customer faced.


References
  • Landers, T., and Rosenberg, R., "An Overview of Multibase." Proceedings of the 2nd International Symposium on Distributed Data Bases, Berlin, Germany. North-Holland Publishing Co., September 1982.
  • Curbera, F., et al. "Unraveling the Web Services Web: An Introduction to SOAP, WSDL, and UDDI." IEEE Internet Computing 6(2), March-April 2002.
  • XQuery 1.0: www.w3.org/TR/xquery