By Adam Kolawa | October 23, 2005 05:00 PM EDT | Reads: 24,451
Since its inception, XML has at times been seen as a cure-all for every problem related to Web applications and integration projects. However, poorly written XML can slow down an integration project or, worse, cause it to collapse.
When developing integration systems such as Web services or any other business-to-business function, developers may encounter the following problems when writing XML:
- Non-verifiable code - XML is supposed to be easily validated by use of Document Type Definitions (DTDs) or schemas. Frequently, however, DTDs and schemas are themselves invalid, too complicated for XML documents to reference, or insufficient for most businesses. An XML file that does not reference a valid DTD or schema therefore cannot be guaranteed to be valid.
- Human-readable, yet ambiguous code - Although human readability can be seen as an advantage of XML, it can also be viewed as a problem. Human-readable code isn't always readable by humans. For example, an element that has a specific meaning to one developer may be of no use, or make no sense, to another developer. Also, human-readable does not necessarily mean machine-readable. If XML code is written strictly for machine consumption, then there is no reason for having code that makes sense to humans but has no meaning to a machine.
- Versioning problems - Maintaining multiple versions of a single document can be very difficult. Developers can either maintain a full version of the code to understand each XML format, or they can reference different DTDs for each format. Both options are possible but each requires a lot of time and effort.
- Vogue attitude toward XML - Many developers turn to XML simply because it is the popular language of the moment and they do not consider whether or not it is the right solution. More often than not, XML introduces more complexities than needed where a simpler text file would have sufficed.
- Chaos of standards - XML standards are still in development and are constantly shifting. Without any stability in XML standards, developers are forced to either keep up with the rapid changes, or fall behind.
Preventing the use of poorly written XML is more complicated than most developers realize. The key to successfully using XML in an integration project is first understanding the inefficiencies that may cause poorly written XML, and then applying a rule-based system that establishes policies that can be adhered to. This article will outline the many drawbacks of XML, and will address how a rule-based system can prevent the use of poorly written XML in integration projects.
Understanding XML
The Extensible Markup Language (XML) is a family of technologies that describe structured data. By using XML, companies can create common information formats and share this information on the World Wide Web. For example, a company can create an XML document to exchange information about its products over the Internet. For a simple example of an XML document, see Listing 1.
XML and Its Inefficiencies
Although the example XML document in Listing 1 appears to be written correctly, how can developers be completely sure that the code is valid and well-formed, is comprehensible to other developers, and adheres to specific standards? The answer to this question lies in a rule-based system that can establish team policies and practices to prevent poorly written XML.
The following sections will outline some of the inefficiencies that can lead to problematic XML, and will address how a rule-based system can prevent the use of poorly written XML in integration projects. After all, system performance is only as good as the data received and the instructions given. If errors are contained in the XML, it is more likely than not that the system will crash.
Validating XML
One of the main benefits of XML is that it provides mechanisms for verifying document validity. There are two basic mechanisms: DTD and XML Schema. When creating an XML document, developers can reference either of these mechanisms from within the document itself. The referenced DTD or schema specifies how the XML document must be structured: which elements and attributes it may contain, and the order in which the elements must appear.
Defining DTDs
The following is an example of a simple DTD that can be referenced by an XML document:
<!-- ProductList DTD -->
<!ELEMENT ProductList (Product)*>
<!ELEMENT Product (#PCDATA)>
<!ATTLIST Product
    color   (red|green|yellow|weird) #REQUIRED
    file    CDATA                    #REQUIRED
    id      CDATA                    #REQUIRED
    isFruit (true|false)             'true'>
To reference this DTD from an XML document, the following header can be added to the beginning of the XML document:
<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE ProductList PUBLIC "-//OnlineGrocer//ProductList//EN" "ProductList.dtd">
A DTD is a specification based on the rules of the Standard Generalized Markup Language (SGML) and provides basic verification of XML documents. DTDs provide mechanisms for expressing which elements are allowed and what the composition of each element can be. Legal attributes can be defined per element type, and legal attribute values can be defined per attribute.
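The constraints that the ProductList DTD above expresses can also be written out by hand. The following Python sketch does so using only the standard library, whose parsers check well-formedness but do not validate against a DTD. It is an illustration of what the DTD says, not a replacement for a validating parser: the sample document, its attribute values, and the function name check_product_list are hypothetical, and the check is partial (for example, it does not reject undeclared attributes).

```python
import xml.etree.ElementTree as ET

# Constraints transcribed by hand from the ProductList DTD above.
ALLOWED_COLORS = {"red", "green", "yellow", "weird"}
REQUIRED_ATTRS = ("color", "file", "id")
ALLOWED_IS_FRUIT = {"true", "false"}

# Hypothetical sample document; the second Product breaks two rules.
SAMPLE = """<ProductList>
  <Product color="red" file="apple.xml" id="1">Apple</Product>
  <Product color="blue" id="2" isFruit="false">Soap</Product>
</ProductList>"""


def check_product_list(xml_text):
    """Return a list of human-readable violations of the DTD's rules."""
    root = ET.fromstring(xml_text)  # raises ParseError if not well-formed
    problems = []
    if root.tag != "ProductList":
        problems.append(f"root element is {root.tag!r}, expected 'ProductList'")
    for index, child in enumerate(root, start=1):
        if child.tag != "Product":
            problems.append(f"child {index}: unexpected element {child.tag!r}")
            continue
        if len(child):
            # <!ELEMENT Product (#PCDATA)> allows text only, no child elements.
            problems.append(f"child {index}: Product may contain only text")
        for name in REQUIRED_ATTRS:
            if name not in child.attrib:
                problems.append(f"child {index}: missing required attribute {name!r}")
        color = child.get("color")
        if color is not None and color not in ALLOWED_COLORS:
            problems.append(f"child {index}: color {color!r} is not allowed")
        # isFruit defaults to 'true' when omitted, as declared in the DTD.
        if child.get("isFruit", "true") not in ALLOWED_IS_FRUIT:
            problems.append(f"child {index}: isFruit must be 'true' or 'false'")
    return problems


if __name__ == "__main__":
    for problem in check_product_list(SAMPLE):
        print(problem)
```

Note that the sample parses without error even though it breaks the DTD's rules: it is well-formed, but it would not be valid.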
Defining Schemas
For an example of a simple schema that can be referenced by an XML document, see Listing 2. To reference this schema from an XML document, the xsi:noNamespaceSchemaLocation attribute can be specified in the root element, as in the following header:
<ProductList xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="ProductList.xsd">
An XML schema, like a DTD, defines a set of legal elements, attributes, and attribute values. However, XML schemas provide more robust verification of XML documents. XML schemas are namespace-aware and also cover data types, data bounds, schema class inheritance, and context-sensitive data values, none of which are covered by DTDs.
Lack of DTD/Schema Enforcement
While validating against a DTD or schema can establish the validity of an XML document, nothing requires developers to reference a DTD or schema at all. In fact, developers need only follow simple syntax rules for an XML document to be "well-formed." However, a well-formed document is not necessarily a valid document. Without a reference to either a DTD or a schema, there is no way to verify whether the XML document is valid. Therefore, measures must be taken to ensure that XML documents do, in fact, reference a DTD or schema.
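One such measure can be sketched as a simple automated rule: reject any document that carries neither a DOCTYPE declaration nor an xsi schema hint like the headers shown earlier. The Python sketch below uses only the standard library; the function name references_dtd_or_schema and the three sample strings are hypothetical. It only checks that a reference is present; it does not fetch the DTD or schema or validate against it.

```python
from xml.dom import minidom

# Namespace of the xsi attributes used to point at an XML schema.
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"


def references_dtd_or_schema(xml_text):
    """Return True if the document declares a DOCTYPE or an xsi schema hint."""
    doc = minidom.parseString(xml_text)  # raises ExpatError if not well-formed
    if doc.doctype is not None:
        return True
    root = doc.documentElement
    return (root.hasAttributeNS(XSI_NS, "noNamespaceSchemaLocation")
            or root.hasAttributeNS(XSI_NS, "schemaLocation"))


# Hypothetical sample documents for the rule.
WITH_DTD = (
    '<!DOCTYPE ProductList PUBLIC "-//OnlineGrocer//ProductList//EN" '
    '"ProductList.dtd">\n<ProductList/>'
)
WITH_SCHEMA = (
    '<ProductList xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
    'xsi:noNamespaceSchemaLocation="ProductList.xsd"/>'
)
BARE = "<ProductList/>"

if __name__ == "__main__":
    for name, text in [("dtd", WITH_DTD), ("schema", WITH_SCHEMA), ("bare", BARE)]:
        print(name, references_dtd_or_schema(text))
```

All three samples are well-formed, so a non-validating parser accepts each of them; only the rule distinguishes the document that cannot be validated.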
Copyright © 2005 SYS-CON Media, Inc. — All Rights Reserved.
Adam Kolawa is the co-founder and CEO of Parasoft, a leading provider of solutions and services that deliver quality as a continuous process throughout the SDLC. In 1983, he came to the United States from Poland to pursue his PhD. In 1987, he and a group of fellow graduate students founded Parasoft to create value-added products that could significantly improve the software development process. Adam's years of experience with various software development processes have resulted in his unique insight into the high-tech industry and the uncanny ability to successfully identify technology trends. As a result, he has orchestrated the development of numerous successful commercial software products to meet growing industry needs to improve software quality, often before the trends have been widely accepted. Adam has been granted 10 patents for the technologies behind these innovative products.
Kolawa, co-author of Bulletproofing Web Applications (Hungry Minds 2001), has contributed to and written over 100 commentary pieces and technical articles for publications including The Wall Street Journal, Java Developer's Journal, SOA World Magazine, AJAXWorld Magazine; he has also authored numerous scientific papers on physics and parallel processing. His recent media engagements include CNN, CNBC, BBC, and NPR. Additionally he has presented on software quality, trends and development issues at various industry conferences. Kolawa holds a Ph.D. in theoretical physics from the California Institute of Technology. In 2001, Kolawa was awarded the Los Angeles Ernst & Young's Entrepreneur of the Year Award in the software category.