YOUR FEEDBACK
Chris Keene's Prescription for Curing the Java Flu
Pedro wrote: "Adobe and Microsoft are doing a far better job making their ...
SOA World Conference
Virtualization Conference
$200 Savings Expire May 16, 2008... – Register Today!


2007 West
GOLD SPONSORS:
Active Endpoints
Your SOA Needs BPEL for Orchestration
BEA
Virtualized SOA: Adaptive Infrastructure for Demanding Applications
Nexaweb
Overcoming Bandwidth Challenges with Nexaweb
TIBCO
What is Service Virtualization?
SILVER SPONSORS:
WSO2
Using Web Services Technologies and FOSS Solutions
Click For 2007 East
Event Webcasts

2008 East
PLATINUM SPONSORS:
Appcelerator
Think Fast: Accelerate AJAX Development with Appcelerator
GOLD SPONSORS:
DreamFace Interactive
The Ultimate Framework for Creating Personalized Web 2.0 Mashups
ICEsoft
AJAX and Social Computing for the Enterprise
Kaazing
Enterprise Comet: Real–Time, Real–Time, or Real–Time Web 2.0?
Nexaweb
Now Playing: Desktop Apps in the Browser!
Sun
jMaki as an AJAX Mashup Framework
POWER PANELS:
The Business Value
of RIAs
What Lies Beyond AJAX?
KEYNOTES:
Douglas Crockford
Can We Fix the Web?
Anthony Franco
2008: The Year of the RIA
Click For 2007 Event Webcasts
SYS-CON.TV
TODAY'S TOP SOA & WEBSERVICES LINKS


Eliminating Redundancy in XML Using ID/IDREF

Digg This!

XML can be thought of as the "universal serialization of data." It provides a flexible, open approach for modeling data and sharing messages among business partners (or systems) in a consistent manner. XML provides the ideal solution to messaging in a B2B e-commerce infrastructure since it enables a loosely coupled design that can significantly lower a partner's barrier to entry.

While most users of XML utilize its hierarchical nature to define data, this article discusses possible approaches for eliminating redundant data within XML messages by employing the features of ID and IDREF to define a more relational approach to defining the data.

Reporting Data Is Redundant
Developing a common vocabulary for a single enterprise or a large B2B marketplace can be a daunting task. With XML the challenge of modeling commonly utilized data models in an open manner can be resolved. It's based on a formal W3C Recommendation and has extensive support from both users and IT vendors, thereby providing a stable, future-proof foundation for business applications. The technical and business benefits of utilizing an open, standards-based XML strategy are immense. Whenever possible, an enterprise's existing data models and semantics should be utilized to construct the markup language.

There are, however, several issues that must be addressed prior to effectively representing an organization's data models in XML. Common reporting and aggregate data sets frequently contain repetitive data such as name and address. These redundancies are usually carried over into the resulting markup. This approach leads to repetitive data structures - the result of which is poorly designed markup that doesn't make use of reusable data models. XML's ID and IDREF tokenized attributes enable designers to leverage relational modeling concepts and avoid creating redundant data structures.

ID is a specially defined attribute that uniquely represents an instantiation of an element within an XML message. The value of an ID attribute is similar to a primary key value in a relational database table in that no two ID attributes (for a given element) can have the same value. IDREF is another special XML attribute that references a previously defined ID value. IDREFS is similar to IDREF, but can point to several previously defined ID values, each separated by a space. Listing 1 displays a simple example of how ID, IDREF and IDREFS are typically used in an XML document.

As illustrated in Listing 1, ID, IDREF and IDREFS can be used to establish relationships within the data. They can also be used to represent data structures without utilizing nested elements. This approach enables developers to model data that lacks definite structure (such as a result set from a user-defined query).

Based on these concepts, a set of redundant elements can be effectively modeled using ID and IDREF attributes. A sample document with redundant elements appears in Listing 2. As this listing clearly illustrates, the Key, Firstname and Lastname elements appear in several locations throughout the document.

Listing 3 illustrates how this data can be modeled in a more effective manner using ID, IDREF and IDREFS.

Note that the code in Listing 3 establishes a data structure (Keys) that can be reused elsewhere in the document. This design reflects the relational approach to defining and utilizing translation tables to avoid many-to-many relationships (as seen in Figure 1).

Processing XML Using Xpath and ID/IDREF
Today's XML parsers don't make use of the ID/IDREF functionality as defined in this article. However, the W3C XPath specification ( www.w3.org/TR/1999/REC-xpath-19991116) provides facilities for identifying and selecting nodes based on their IDs. Additionally, IDREFs can automatically be retrieved from identified nodes and supplied directly as input into the ID function.

The XPath specification defines a set of core functions that must be available to all implementations of XPath processors. One of these functions operates over an XML document to provide a node-set result based on the id function. The id function takes a single string parameter identifying the node that defines the unique ID. Using the example in Listing 1, we could use the following XPath statement to retrieve the supervisor for a particular employee:

id("456") returns the SUPERVISOR node for Debbie Hamel
The input to the ID function can also be a white space-separated list of strings that represent multiple IDs and the resulting node set would include all nodes that have IDs matching those defined in the list.

Conclusion
As you can see by the differences in Listings 2 and 3, using XML to define data hierarchically can produce highly redundant elements. By using the method described in this article, it's possible to mix hierarchical and relational techniques within the same document, which results in a more usable and reusable XML document. In addition, this document will contain clearly recognizable elements that can be employed to describe information relative to data both within and external to the XML document.

About JP Morgenthal
JP Morgenthal, formerly coeditor-in-chief of XML-Journal and chief services architect at Software AG, is currently Managing Director of Ethink Systems, Inc. He has been writing and speaking about XML since 1997 and is an internationally prominent authority on XML, with more than 15 years of experience designing, developing, and analyzing software and technology.

About John Evdemon
John Evdemon, formerly coeditor-in-chief of XML-Journal, is an Architect with Microsoft's Architecture Strategy Team covering BPM, SOA and Internet Scale Computing. He is an XML and e-business expert, having served as CTO/Director of XML-Related Products for both a large integration platform vendor and a small XML-centric start-up. He has been working with XML since its early beginnings, is an Invited Expert with the W3C XML Core Syntax Working Group and has chaired several industry-specific XML initiatives.

Nayan wrote: Incomplete article. Can't find the listings referred in the article.
read & respond »
Sanjit Pandey wrote: I cant find the examples (Listing 1, Listing 2...) mentioned in the article.
read & respond »
XML JOURNAL LATEST STORIES . . .
EDI to XML: A Practical Approach
While EDI transactions account for most worldwide commercial activity, XML-based alternatives are beginning to gain traction. According to Forrester Research, stateful XML, stateless XML, and even flat file exchanges are all projected to grow at a faster rate than EDI over the next few
3rd International Virtualization Conference & Expo: Themes & Topics
From Application Virtualization to Xen, a round-up of the virtualization themes & topics being discussed in NYC June 23-24, 2008 by the world-class speaker faculty at the 3rd International Virtualization Conference & Expo being held by SYS-CON Events in The Roosevelt Hotel, in midtown
Red Hat Named "Platinum Sponsor" of Virtualization Conference & Expo
Red Hat is a trusted open source provider. Red Hat offers enterprise customers a long-term plan for building infrastructures on the quality and innovation of open source. Combining open source operating system platform, Red Hat Enterprise Linux, together with applications, management
JustSystems Contributes Key XBRL Rendering Technology to Financial Community
JustSystems announced that it is contributing intellectual property rights for its invention of eXtensible Business Reporting Language (XBRL) rendering technologies to XBRL International, the standards body responsible for the oversight of the XBRL specification. The invention, known a
JustSystems Launches Campaign for XBRL Success
JustSystems announced its campaign to help organizations adopt XBRL (eXtensible Business Reporting Language), the XML-based standard for communicating financial and business information. In related news, JustSystems also announced that it has contributed intellectual property rights of
SUBSCRIBE TO THE WORLD'S MOST POWERFUL NEWSLETTERS
SUBSCRIBE TO OUR RSS FEEDS & GET YOUR SYS-CON NEWS LIVE!
Click to Add our RSS Feeds to the Service of Your Choice:
Google Reader or Homepage Add to My Yahoo! Subscribe with Bloglines Subscribe in NewsGator Online
myFeedster Add to My AOL Subscribe in Rojo Add 'Hugg' to Newsburst from CNET News.com Kinja Digest View Additional SYS-CON Feeds
Publish Your Article! Please send it to editorial(at)sys-con.com!

Advertise on this site! Contact advertising(at)sys-con.com! 201 802-3021

SYS-CON FEATURED WHITEPAPERS


ADS BY GOOGLE
BREAKING XML NEWS
SAP Accelerates the Path to SOA for Customers
has led to customer requests for training and education involving SAP's proven design and de