Semantics and Context

One of the core tenets of XML is its extensibility and flexibility

Although XML defines each data element in a given transaction (the semantics), there's no mechanism to also communicate the business context. This represents the difference between reading XML and understanding the business impact of the transaction. Namespaces, numeric values, and time stamps all create some context when looking across transactions or business entities. In this article we'll discuss the difference between semantics and context and the challenges this difference creates relative to performance and scalability.

One of the core tenets of XML is its extensibility and flexibility. XML supports these tenets because it's self-describing and can be accompanied by a DTD that provides the data structure necessary for reading the content of the associated document. This is the capability that sits at the core of the XML "hype versus hope" debate.

In today's increasingly dynamic business world this self-describing capability provides hope against obsolescence. By providing a mechanism for maintaining systems that can communicate changes in semantics as part of the transmission of a document, XML provides a way to reduce the maintenance cost for solutions based on changing business requirements. One of the main justifications for introducing XML into a solution is that this adaptability enables the core solution to scale and perform despite changes in requirements or exceptions to processes.

However, this same capability contributes to the hype surrounding XML. Take the assumption that if an XML document is self-describing and the tags associated with the description are clear, then the system responsible for reading the document is able to understand the implications of the change. A common mistake is to assume that this description in the XML actually describes the changes. In reality the change is only implicit - it's impossible to connote the intent of the change, understand the business context that required the change to occur, or automatically derive how a new data element should be handled.

This article looks at a specific business example, differentiates the context from the semantics, and discusses the issues around these differences with respect to namespaces, numeric values, and date/time stamps. By separating the semantics from the context, one can consider these issues with respect to maintenance costs to improve system performance and scalability.

The Problem Definition
Let's look at the communications associated with placing a request for a proposal, beginning with an enterprise seeking to purchase specific products from its suppliers. In the center of Figure 1 is the Enterprise that Betty Buyer works for. On the right, left, and bottom of Enterprise are potential suppliers of the widgets that Betty Buyer wants to purchase. Betty must submit a Request for Quote (RFQ) to each supplier before she makes the purchase. Betty's Enterprise manages the RFQ through a homegrown system.

To reach each supplier, the RFQ must be sent via the Exchange, then on to the appropriate supplier. To accomplish this, Betty, through her RFQ system, must know the name of each supplier within her namespace as well as how it resolves that name to the namespace of the Exchange. While Betty may know Sam's company as "Best Supplier," the Exchange may use "Bsupplier, Inc.," or the supplier's D&B number. Indeed, Best Supplier could participate either directly with the Enterprise, through the Exchange on the right of Figure 1, or as a participant in Enterprise's Private Exchange. In this case, Betty's RFQ system may need to maintain three different names for Best Supplier.
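To make the naming problem concrete, here is a minimal sketch of the cross-reference Betty's RFQ system would have to maintain. The table layout, the channel labels, and the private-exchange identifier are assumptions made for illustration; only "Best Supplier" and "Bsupplier, Inc." come from the example above.

```python
# Hypothetical cross-reference: one supplier, three names, one per channel.
SUPPLIER_ALIASES = {
    "Best Supplier": {
        "direct": "Best Supplier",
        "exchange": "Bsupplier, Inc.",
        "private_exchange": "BEST-SUPPLIER-001",  # made-up identifier
    },
}


def resolve_supplier(local_name, channel):
    """Translate Betty's local supplier name into the name a channel expects."""
    try:
        return SUPPLIER_ALIASES[local_name][channel]
    except KeyError:
        # No mapping is a business problem, not a parsing problem:
        # the XML can be perfectly valid and still be unaddressable.
        raise LookupError(
            f"no mapping for {local_name!r} on channel {channel!r}"
        ) from None


if __name__ == "__main__":
    for channel in ("direct", "exchange", "private_exchange"):
        print(channel, "->", resolve_supplier("Best Supplier", channel))
```

Every new channel adds another column to this table for every supplier, which is where the maintenance cost comes from.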

To fully understand the contextual issues, let's look at the RFQ Betty sends through an intermediary like this Exchange. Betty wants to send the information in this RFQ to three of the potentially hundreds of registered suppliers. However, she doesn't want to duplicate the effort of creating the RFQ in the Exchange's system for every one of the many RFQs she creates. If the information is in XML, the Exchange can import and convert the RFQ into its system. But Betty still must address the RFQ to the appropriate suppliers as they're referenced in the Exchange's namespace. The Exchange has to resolve the business context - the product, terms, and delivery date communicated in the RFQ.

Since every participant in this scenario has a different software solution, XML is the ideal choice for communicating and translating the semantic information in the RFQ because XML is system independent. Betty Buyer creates an RFQ on Enterprise's RFQ system, which in turn creates an XML document that's similar to the example in Listing 1. Betty Buyer sends this document to all suppliers, then waits for a response from each. The self-describing nature of the XML RFQ enables each receiving supplier to map the tagged pairs of the RFQ data to its own internal system representation so the supplier can process the RFQ and respond in kind. But this isn't as simple as it sounds. Let's see what happens with the namespace, numeric, and date information, each of which carries contextual differences between the business entities.
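As an illustration of that mapping step, the sketch below parses a small RFQ and copies its tagged values into a supplier's internal record. The RFQ shown is a hypothetical stand-in, not the article's listing; the element names, the field map, and the sample values are all assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical RFQ document; element names and values are illustrative only.
RFQ_XML = """\
<RFQ>
  <Buyer>Enterprise</Buyer>
  <Supplier>Bsupplier, Inc.</Supplier>
  <Item>
    <PartNumber>WIDGET-1</PartNumber>
    <Quantity>10000</Quantity>
  </Item>
  <DeliveryDate>2001-06-01</DeliveryDate>
</RFQ>
"""

# One supplier's assumed mapping from RFQ paths to its internal field names.
FIELD_MAP = {
    "Buyer": "customer_name",
    "Item/PartNumber": "sku",
    "Item/Quantity": "qty_requested",
    "DeliveryDate": "need_by",
}


def map_rfq(xml_text):
    """Copy each mapped element's text into the supplier's own record layout."""
    root = ET.fromstring(xml_text)
    return {
        internal_name: root.findtext(path)
        for path, internal_name in FIELD_MAP.items()
    }


if __name__ == "__main__":
    print(map_rfq(RFQ_XML))
```

The mapping succeeds on the semantics alone, yet the resulting record still doesn't say whose part number "WIDGET-1" is, what unit the quantity is counted in, or which time zone the date assumes.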

Namespaces - What's in a Name?
Every system manages its own namespace. Names of companies, people, and items are all stored and referenced locally. When this is done by systems that must share information, the practice creates significant challenges. Let's consider how it impacts Betty Buyer's ability to buy from Sam Supplier.

To force a business context, the larger exchanges state that the transactions must be created in their environment, so all participants use the processes and business context enforced by each exchange. Consider how this works. An exchange maintains a centrally managed catalog that resolves part number, description, and pricing namespace issues. The exchange also maintains its own unique company namespace, document namespace, and all the business processes that allow two companies to coordinate the buying, selling, shipping, invoicing, and sometimes even the exchanging of funds.

But each exchange represents only a fraction of the entire market, so the practice of suppliers participating in multiple exchanges has become rather common. And despite the rapid growth in the number of exchanges that are currently available, many large enterprises believe they gain a competitive advantage by maintaining their own private exchange. A private exchange allows the enterprise to define and enforce its own unique business process and forces its suppliers to comply with that process.

While participating in multiple exchanges (both public and private) may seem to resolve the immediate business issue of gaining maximum market exposure and control of the business process, it actually defeats the purpose of having an intermediated marketplace. The intermediated marketplace was supposed to enable a wide range of suppliers to bid on an enterprise's RFQ, and the RFQ was supposed to be sent only once. If each enterprise participates in multiple marketplaces while also building its own private marketplace, then the only way to deliver on the promise of the intermediated marketplace is to enable all exchanges (both public and private) to communicate with each other. But this reintroduces the original namespace resolution issue - only an order of magnitude more complex.

One solution is to explicitly state that all B2B XML documents include data elements that are tied to a specific namespace. For example, a company may reference internal specifications by URL. In the earlier example, Betty's listing would need to tag part numbers as belonging to the Exchange's catalog so the receiving system could call that catalog. To handle this issue, a namespace specification that's based on the Uniform Resource Identifier (URI) standard has been suggested. While a URI can identify a link universally and unambiguously, its use complicates the processing of the XML document. A namespace URI is only an identifier, and a parser doesn't need to retrieve it; but when the receiving system has to follow a referenced link, for instance to call the Exchange's catalog, that call happens while the document is being processed. Following even a single Internet link from within an XML document can introduce significant delays; when multiple links are referenced, the processing challenge becomes even more problematic.
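A short sketch of tagging part numbers with a catalog namespace follows. The namespace URI, the element names, and the in-memory catalog are assumptions; example.com is a reserved documentation domain. Python's ElementTree treats the namespace URI as an opaque string, so any catalog lookup is a separate, explicit step, simulated here by a dictionary.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI for the Exchange's catalog. Nothing is fetched.
CATALOG_NS = "http://exchange.example.com/catalog"

DOC = f"""\
<RFQ xmlns:cat="{CATALOG_NS}">
  <cat:PartNumber>EX-4711</cat:PartNumber>
  <PartNumber>WIDGET-1</PartNumber>
</RFQ>
"""

# Stand-in for the Exchange's catalog service. In a real system each lookup
# would be a network call, and that is where the delay would be paid.
FAKE_CATALOG = {"EX-4711": "Widget, standard"}


def catalog_part_numbers(xml_text):
    """Return only the part numbers qualified with the catalog namespace."""
    root = ET.fromstring(xml_text)
    # ElementTree spells qualified names as "{uri}local".
    tag = f"{{{CATALOG_NS}}}PartNumber"
    return [el.text for el in root.iter(tag)]


def describe(part_numbers):
    """Resolve each part number against the (simulated) catalog."""
    return {pn: FAKE_CATALOG.get(pn, "unknown") for pn in part_numbers}


if __name__ == "__main__":
    print(describe(catalog_part_numbers(DOC)))
```

The unqualified `PartNumber` is ignored by design: without the namespace, the receiver has no way to tell whose catalog it belongs to.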

Products - What Do You Want?
The namespace issue isn't limited to company names. It becomes an issue for every object. At times this is made even more difficult by business practices. For example, it's not unusual for a supplier to offer the same product at different prices in different markets. One may be a spot market for inventory overstock, another may be an industry-specific market operated by a consortium, while an individual enterprise might have its own contract with a preferred supplier that guarantees 20% off list price.

But a supplier is unlikely to use the same part number across every exchange. This would make it too easy for buyers and competitors to track its pricing policies. Instead, in the case of our example, Best Supplier maintains a different catalog of products for each market. Some items use the same part number, while others don't. To make sure Betty is getting the best possible deal, she must submit the RFQ to all three markets. But to do this, she must also know what product number and description is used to reference this product within each market.

Numeric Values - How Much Do You Want?
A more basic contextual issue involves amounts. For example, let's say Betty Buyer has requested 10,000 widgets. If you look at Table 1, the Exchange catalog has one supplier selling widgets in bulk packs of 12. How does the conversion between these units occur? What's the mechanism for communicating optional units of measure in an XML transaction? If Betty is willing to buy an overage, she can get a better price. But none of this information is represented in the semantics of the XML document.
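The arithmetic behind that conversion is simple once both sides agree on it. The sketch below works through the numbers in the example; the function name is our own, and rounding up to whole packs is an assumption about how the supplier would fill the order.

```python
def packs_needed(requested_units, pack_size):
    """Return (packs, units_delivered, overage) when only whole packs are sold."""
    packs = -(-requested_units // pack_size)  # integer ceiling division
    delivered = packs * pack_size
    return packs, delivered, delivered - requested_units


if __name__ == "__main__":
    packs, delivered, overage = packs_needed(10_000, 12)
    print(packs, delivered, overage)  # 834 10008 8
```

Rounding down instead would give 833 packs, or 9,996 widgets, which leaves Betty four short. Which of the two she would accept is business context that the quantity element alone can't express.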

Again, there's a semantic approach to solving this contextual issue if standards are considered. Most specifications define units of measure as optional elements. Some even use attributes to communicate the amount that's in the document. However, these semantic capabilities are often tied to the accompanying business context. For suppliers who agree to use standards such as those of the Open Applications Group, Inc., or RosettaNet, the solution can leverage a standards-based approach in which the semantics map the specific contextual issues. In Listing 2 we've added a UOM section to the XML that will allow for communication of the specific semantics.
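To show how a unit of measure carried in the document might be consumed, here is a minimal sketch. It is not the article's listing: the `uom` attribute, the codes "EA" and "PK12", and the default for a missing unit are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed unit-of-measure codes: "EA" = each, "PK12" = pack of 12.
UNITS_PER_UOM = {"EA": 1, "PK12": 12}


def quantity_in_each(item_xml):
    """Normalize an item's quantity to single units using its uom attribute."""
    item = ET.fromstring(item_xml)
    qty = item.find("Quantity")
    # Treating a missing uom as "each" is itself a contextual assumption
    # that both trading partners would have to share.
    uom = qty.get("uom", "EA")
    return int(qty.text) * UNITS_PER_UOM[uom]


if __name__ == "__main__":
    print(quantity_in_each('<Item><Quantity uom="PK12">834</Quantity></Item>'))
    print(quantity_in_each("<Item><Quantity>10000</Quantity></Item>"))
```

An unrecognized code raises a `KeyError` here rather than guessing, since a silent default would hide exactly the kind of mismatch this section describes.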

Date and Time - When Do You Want It?
Finally, let's look at date and time as a function of the time zone you're operating in and the regional notation of the date format. The contractual issues associated with physical delivery create this challenge. For example, the suppliers responding to the RFQ may be communicating deliveries based on their time zone and region. Best Supplier is located on the West Coast, and their ability to deliver widgets, as stipulated in the RFQ, would require next-day shipping, so their shipping costs may be higher. By modifying the XML DTD to allow for separation of the elements of a date, we can address some of the issues associated with the date-specific semantics.
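The sketch below illustrates both halves of the problem: a date split into separate elements, and the same instant seen from two time zones. The element names, the sample date, and the choice of an East Coast buyer are assumptions; fixed UTC offsets are used to keep the example self-contained, so daylight saving time is ignored.

```python
from datetime import datetime, timedelta, timezone
import xml.etree.ElementTree as ET

# Fixed offsets (no daylight saving handling) keep the sketch dependency-free.
EASTERN = timezone(timedelta(hours=-5), "EST")
PACIFIC = timezone(timedelta(hours=-8), "PST")

# Hypothetical delivery date with each component in its own element.
DELIVERY_XML = """\
<DeliveryDate>
  <Year>2001</Year><Month>03</Month><Day>04</Day>
  <Hour>09</Hour><Minute>00</Minute>
</DeliveryDate>
"""


def parse_delivery(xml_text, tz):
    """Build a time-zone-aware datetime from separated date elements."""
    el = ET.fromstring(xml_text)
    names = ("Year", "Month", "Day", "Hour", "Minute")
    parts = [int(el.findtext(name)) for name in names]
    return datetime(*parts, tzinfo=tz)


if __name__ == "__main__":
    due = parse_delivery(DELIVERY_XML, EASTERN)
    print(due.astimezone(PACIFIC))  # 09:00 Eastern is 06:00 Pacific

    # The same string in regional notation is ambiguous without context.
    print(datetime.strptime("03/04/2001", "%m/%d/%Y").date())  # March 4
    print(datetime.strptime("03/04/2001", "%d/%m/%Y").date())  # April 3
```

Separating the elements removes the month/day ambiguity, but the time zone still has to be supplied from outside the document, which is why `tz` is a parameter rather than something read from the XML.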

However, anything beyond simple transformations requires a date data type. The solution for date and numerical issues will be much simpler when XML schemas arrive. The draft specification will allow for multiple data formats that in turn would allow the schema-based semantics to address these issues.

As you can see in each of these examples, although XML allows for easy resolution of the semantic differences between the business entities, the business context presents a greater challenge. Once the contextual issue is well understood, it becomes clear that XML can solve problems only within a finite domain, one identified by a shared context. For those who believed the hype generated by XML, this limitation is disappointing. For those wrestling with how to communicate with business partners, XML continues to deliver incredible business benefits:

  • Flexibility: Look at the date format and the ability to configure time zones.
  • Extensibility: Extend semantics to allow for the communication of context through optional elements and attributes.
  • Ease of use: Changes in context can be communicated through a revised DTD without breaking the overall solution.

Just because a document is self-describing and solutions can correctly read the document, it doesn't necessarily follow that the same system can understand the context. Many exchanges know full well the issues of business context in trying to create a single integrated catalog of products. Think of the product code for widgets and how that's represented to a supplier internally versus how it's presented to a buyer bidding on widgets through an exchange that represents hundreds of suppliers.

There are B2B specifications that allow for optional information for each of these capabilities. But the introduction of a new business context makes the entire solution more difficult to maintain. Hopefully, these standards will consolidate to a few, since adhering to multiple standards drastically reduces the efficiencies promised by adopting XML into your solution.

Certainly, one key is the completion of the specifications under review by the W3C. Consider the ability of XML schemas to differentiate between numbers, dates, and text - if combined with the ability for an XQuery to calculate and transform content, one could see a tool set built on these ratified specifications that would allow for solutions to translate between business contexts just as XML is used to transform B2B transactions today. Additionally, initiatives like Universal Description Discovery and Integration (UDDI), which focus on the creation of namespaces in the Internet as opposed to the proliferation of additional flavors of the same solution, present shared namespaces where context can be published and shared by business entities. If we're to avoid the cynicism and backlash from our business sponsors when the XML-hype bubble breaks, we need to drive our solutions to minimize the propagation of multiple standards and dialects.

The final article in this series will focus on parsers. As the processing engine for XML, transaction parsers are core to the use of XML. We'll look at the scalability of these engines and their ability to efficiently handle the emerging dialects of XML. We'll also summarize the series.

If you'd like to discuss a particular aspect of this or any other topic, e-mail me at [email protected]

Glossary

SE·MAN·TICS

  • Linguistics: The study or science of meaning in language forms.
  • Logic: The study of relationships between signs and symbols and what they represent. In this sense, also called semasiology.
  • Semantics: The meaning of a string in some language, as opposed to syntax, which describes how symbols may be combined independent of their meaning.

CON·TEXT

  • The part of a text or statement that surrounds a particular word or passage and determines its meaning.
  • The circumstances in which an event occurs; a setting.

NAME·SPACE

  • The unique definition of companies and people within a business entity.
