Welcome!

Industrial IoT Authors: Yeshim Deniz, Elizabeth White, Stackify Blog, SmartBear Blog, Liz McMillan

Related Topics: Industrial IoT

Industrial IoT: Article

Eliminating Redundancy in XML Using ID/IDREF

Eliminating Redundancy in XML Using ID/IDREF

XML can be thought of as the "universal serialization of data." It provides a flexible, open approach for modeling data and sharing messages among business partners (or systems) in a consistent manner. XML provides the ideal solution to messaging in a B2B e-commerce infrastructure since it enables a loosely coupled design that can significantly lower a partner's barrier to entry.

While most users of XML utilize its hierarchical nature to define data, this article discusses possible approaches for eliminating redundant data within XML messages by employing the features of ID and IDREF to define a more relational approach to defining the data.

Reporting Data Is Redundant
Developing a common vocabulary for a single enterprise or a large B2B marketplace can be a daunting task. With XML the challenge of modeling commonly utilized data models in an open manner can be resolved. It's based on a formal W3C Recommendation and has extensive support from both users and IT vendors, thereby providing a stable, future-proof foundation for business applications. The technical and business benefits of utilizing an open, standards-based XML strategy are immense. Whenever possible, an enterprise's existing data models and semantics should be utilized to construct the markup language.

There are, however, several issues that must be addressed prior to effectively representing an organization's data models in XML. Common reporting and aggregate data sets frequently contain repetitive data such as name and address. These redundancies are usually carried over into the resulting markup. This approach leads to repetitive data structures - the result of which is poorly designed markup that doesn't make use of reusable data models. XML's ID and IDREF tokenized attributes enable designers to leverage relational modeling concepts and avoid creating redundant data structures.

ID is a specially defined attribute that uniquely represents an instantiation of an element within an XML message. The value of an ID attribute is similar to a primary key value in a relational database table in that no two ID attributes (for a given element) can have the same value. IDREF is another special XML attribute that references a previously defined ID value. IDREFS is similar to IDREF, but can point to several previously defined ID values, each separated by a space. Listing 1 displays a simple example of how ID, IDREF and IDREFS are typically used in an XML document.

As illustrated in Listing 1, ID, IDREF and IDREFS can be used to establish relationships within the data. They can also be used to represent data structures without utilizing nested elements. This approach enables developers to model data that lacks definite structure (such as a result set from a user-defined query).

Based on these concepts, a set of redundant elements can be effectively modeled using ID and IDREF attributes. A sample document with redundant elements appears in Listing 2. As this listing clearly illustrates, the Key, Firstname and Lastname elements appear in several locations throughout the document.

Listing 3 illustrates how this data can be modeled in a more effective manner using ID, IDREF and IDREFS.

Note that the code in Listing 3 establishes a data structure (Keys) that can be reused elsewhere in the document. This design reflects the relational approach to defining and utilizing translation tables to avoid many-to-many relationships (as seen in Figure 1).

Processing XML Using Xpath and ID/IDREF
Today's XML parsers don't make use of the ID/IDREF functionality as defined in this article. However, the W3C XPath specification ( www.w3.org/TR/1999/REC-xpath-19991116) provides facilities for identifying and selecting nodes based on their IDs. Additionally, IDREFs can automatically be retrieved from identified nodes and supplied directly as input into the ID function.

The XPath specification defines a set of core functions that must be available to all implementations of XPath processors. One of these functions operates over an XML document to provide a node-set result based on the id function. The id function takes a single string parameter identifying the node that defines the unique ID. Using the example in Listing 1, we could use the following XPath statement to retrieve the supervisor for a particular employee:

id("456") returns the SUPERVISOR node for Debbie Hamel

The input to the ID function can also be a white space-separated list of strings that represent multiple IDs and the resulting node set would include all nodes that have IDs matching those defined in the list.

Conclusion
As you can see by the differences in Listings 2 and 3, using XML to define data hierarchically can produce highly redundant elements. By using the method described in this article, it's possible to mix hierarchical and relational techniques within the same document, which results in a more usable and reusable XML document. In addition, this document will contain clearly recognizable elements that can be employed to describe information relative to data both within and external to the XML document.

More Stories By JP Morgenthal

JP Morgenthal is a veteran IT solutions executive and Distinguished Engineer with CSC. He has been delivering IT services to business leaders for the past 30 years and is a recognized thought-leader in applying emerging technology for business growth and innovation. JP's strengths center around transformation and modernization leveraging next generation platforms and technologies. He has held technical executive roles in multiple businesses including: CTO, Chief Architect and Founder/CEO. Areas of expertise for JP include strategy, architecture, application development, infrastructure and operations, cloud computing, DevOps, and integration. JP is a published author with four trade publications with his most recent being “Cloud Computing: Assessing the Risks”. JP holds both a Masters and Bachelors of Science in Computer Science from Hofstra University.

More Stories By John Evdemon

John Evdemon, formerly coeditor-in-chief of XML-Journal, is an Architect with Microsoft's Architecture Strategy Team covering BPM, SOA and Internet Scale Computing. He is an XML and e-business expert, having served as CTO/Director of XML-Related Products for both a large integration platform vendor and a small XML-centric start-up. He has been working with XML since its early beginnings, is an Invited Expert with the W3C XML Core Syntax Working Group and has chaired several industry-specific XML initiatives.

Comments (2) View Comments

Share your thoughts on this story.

Add your comment
You must be signed in to add a comment. Sign-in | Register

In accordance with our Comment Policy, we encourage comments that are on topic, relevant and to-the-point. We will remove comments that include profanity, personal attacks, racial slurs, threats of violence, or other inappropriate material that violates our Terms and Conditions, and will block users who make repeated violations. We ask all readers to expect diversity of opinion and to treat one another with dignity and respect.


Most Recent Comments
Nayan 03/24/08 10:59:51 AM EDT

Incomplete article. Can't find the listings referred in the article.

Sanjit Pandey 11/19/07 03:06:57 PM EST

I cant find the examples (Listing 1, Listing 2...) mentioned in the article.

IoT & Smart Cities Stories
The Internet of Things will challenge the status quo of how IT and development organizations operate. Or will it? Certainly the fog layer of IoT requires special insights about data ontology, security and transactional integrity. But the developmental challenges are the same: People, Process and Platform and how we integrate our thinking to solve complicated problems. In his session at 19th Cloud Expo, Craig Sproule, CEO of Metavine, demonstrated how to move beyond today's coding paradigm and sh...
@DevOpsSummit at Cloud Expo, taking place November 12-13 in New York City, NY, is co-located with 22nd international CloudEXPO | first international DXWorldEXPO and will feature technical sessions from a rock star conference faculty and the leading industry players in the world. The widespread success of cloud computing is driving the DevOps revolution in enterprise IT. Now as never before, development teams must communicate and collaborate in a dynamic, 24/7/365 environment. There is no time t...
What are the new priorities for the connected business? First: businesses need to think differently about the types of connections they will need to make – these span well beyond the traditional app to app into more modern forms of integration including SaaS integrations, mobile integrations, APIs, device integration and Big Data integration. It’s important these are unified together vs. doing them all piecemeal. Second, these types of connections need to be simple to design, adapt and configure...
In his session at 21st Cloud Expo, Raju Shreewastava, founder of Big Data Trunk, provided a fun and simple way to introduce Machine Leaning to anyone and everyone. He solved a machine learning problem and demonstrated an easy way to be able to do machine learning without even coding. Raju Shreewastava is the founder of Big Data Trunk (www.BigDataTrunk.com), a Big Data Training and consulting firm with offices in the United States. He previously led the data warehouse/business intelligence and Bi...
Contextual Analytics of various threat data provides a deeper understanding of a given threat and enables identification of unknown threat vectors. In his session at @ThingsExpo, David Dufour, Head of Security Architecture, IoT, Webroot, Inc., discussed how through the use of Big Data analytics and deep data correlation across different threat types, it is possible to gain a better understanding of where, how and to what level of danger a malicious actor poses to an organization, and to determin...
To Really Work for Enterprises, MultiCloud Adoption Requires Far Better and Inclusive Cloud Monitoring and Cost Management … But How? Overwhelmingly, even as enterprises have adopted cloud computing and are expanding to multi-cloud computing, IT leaders remain concerned about how to monitor, manage and control costs across hybrid and multi-cloud deployments. It’s clear that traditional IT monitoring and management approaches, designed after all for on-premises data centers, are falling short in ...
We are seeing a major migration of enterprises applications to the cloud. As cloud and business use of real time applications accelerate, legacy networks are no longer able to architecturally support cloud adoption and deliver the performance and security required by highly distributed enterprises. These outdated solutions have become more costly and complicated to implement, install, manage, and maintain.SD-WAN offers unlimited capabilities for accessing the benefits of the cloud and Internet. ...
The deluge of IoT sensor data collected from connected devices and the powerful AI required to make that data actionable are giving rise to a hybrid ecosystem in which cloud, on-prem and edge processes become interweaved. Attendees will learn how emerging composable infrastructure solutions deliver the adaptive architecture needed to manage this new data reality. Machine learning algorithms can better anticipate data storms and automate resources to support surges, including fully scalable GPU-c...
The Founder of NostaLab and a member of the Google Health Advisory Board, John is a unique combination of strategic thinker, marketer and entrepreneur. His career was built on the "science of advertising" combining strategy, creativity and marketing for industry-leading results. Combined with his ability to communicate complicated scientific concepts in a way that consumers and scientists alike can appreciate, John is a sought-after speaker for conferences on the forefront of healthcare science,...
Machine learning has taken residence at our cities' cores and now we can finally have "smart cities." Cities are a collection of buildings made to provide the structure and safety necessary for people to function, create and survive. Buildings are a pool of ever-changing performance data from large automated systems such as heating and cooling to the people that live and work within them. Through machine learning, buildings can optimize performance, reduce costs, and improve occupant comfort by ...