By Sean McGrath
December 28, 2001
It sounds so easy. First, get a bunch of people together who share a common need to interchange some type of data - say, invoices. Explain XML to them. Explain the significant technical benefits of having an industry standard schema for invoices.
Get technically minded individuals into a room with plenty of whiteboards and caffeine. Sometime later they'll emerge with a consensus model of what it is to be an "invoice" enshrined in some schema language (UML/XML Schema/DTD/RelaxNG/whatever).
Thereafter, all interested parties use the schema for data interchange and all is sweetness and light.
This makes 100% technical sense but it often doesn't work in the real world. The reasons it doesn't work have nothing to do with flavors of schema language or indeed flavors of markup language. It often doesn't work because of soft issues concerning people.
Let's start with Zipf's law and the Principle of Least Effort (http://cogsci.umn.edu/millennium/1109153206.html). Simply put, humans strive to do as little as possible to communicate. The language this article is written in, English, is literally riddled with structures that break the rules of English grammar - the schema - in the interests of quick and easy communication.
All human languages exhibit this phenomenon. Successful languages adapt to allow humans to cut corners in the interest of quicker communications. XML tag languages that don't offer similar functionality are asking for trouble. That trouble can manifest itself in a number of possible ways.
First, users may engage in "tag abuse" - using tags for purposes they weren't intended for because it makes their life easier.
Second, users may create point-to-point side agreements between themselves for simpler communications and convert their simple communications to the official schema only when forced to.
Third, users may lobby for "flexibility" that allows them to make local modifications to the industry standard schema, thus creating a family of similar languages that start as patois and grow into full-blown, mutually incompatible dialects.
Fourth, users may bring the standards initiative down. If this happens, everything from the coffee to the schema language can be blamed. Everything, that is, except noncompliance with Zipf's Principle of Least Effort.
Then there's the matter of organic schema growth and what I call the "tag bag syndrome."
Successful languages (and XML-based tag languages are no exception) need to be able to change and evolve over time. Otherwise they atrophy and eventually die. With human languages we just change grammatical structures and idioms without worrying about whether historical material fits the new forms. In other words, we aren't worried about backward compatibility.
With XML tag languages we're typically very worried about backward compatibility. When we need to modify a schema to cater to a new phenomenon, we cannot allow previously valid documents to become invalid. As a result, we loosen the constraints in the schema. Over time, the gradual loosening of constraints erodes the tight control over structure the schema designers put there in the first place. The result is a bag of tags - structures of the form "X can consist of any number of A or B or C or...." In DTD terminology these are known as repeating OR groups.
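The loosening process can be made concrete with a small sketch. DTD content models are essentially regular expressions over child element names, so the example below approximates three successive versions of a hypothetical invoice model with Python regular expressions. The element names, the models, and the helpers `is_valid` and `accepts_all` are illustrative assumptions, not part of any real invoice standard. Each revision keeps the previously valid documents valid, but the final repeating OR group also accepts structures the original designers meant to rule out.

```python
import re

# Hypothetical content models for an <invoice> element's children, written as
# regular expressions over child element names (each name followed by a space).
# This mirrors DTD content models, which are regular expressions over names.

# Version 1: tight model, like <!ELEMENT invoice (header, line+, total)>
V1_TIGHT = r"header (line )+total "

# Version 2: a careful extension adding an optional note, like
# <!ELEMENT invoice (header, note?, line+, total)>
V2_CAREFUL = r"header (note )?(line )+total "

# Version N: after many loosenings, a repeating OR group ("bag of tags"), like
# <!ELEMENT invoice (header | line | total | note)*>
VN_BAG = r"((header|line|total|note) )*"


def is_valid(model: str, children: list[str]) -> bool:
    """Return True if the sequence of child element names matches the model."""
    return re.fullmatch(model, "".join(name + " " for name in children)) is not None


def accepts_all(new_model: str, old_docs: list[list[str]]) -> bool:
    """Backward-compatibility check against a sample of previously valid documents.
    Passing on a sample is evidence, not proof, that no old document breaks."""
    return all(is_valid(new_model, doc) for doc in old_docs)


old_docs = [["header", "line", "total"], ["header", "line", "line", "total"]]
nonsense = ["total", "total", "line"]  # no header, and a total before any line

print(accepts_all(V1_TIGHT, old_docs))    # True: the original documents
print(accepts_all(V2_CAREFUL, old_docs))  # True: old documents stay valid
print(accepts_all(VN_BAG, old_docs))      # True: old documents stay valid
print(is_valid(V2_CAREFUL, nonsense))     # False: structure still enforced
print(is_valid(VN_BAG, nonsense))         # True: the bag accepts almost anything
```

The compatibility check only tests a sample of old documents, and the regex encoding ignores attributes and text content; it is meant to show how each loosening widens the set of accepted documents, not to replace a validating parser.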
This leads us nicely to the next soft issue: George Miller's law concerning the cognitive limits of human beings. If you present humans with a list of more than seven options to choose from at any one time, they'll start to feel overloaded and uncomfortable. When it comes to creating XML conforming to a "bag of tags schema," there can be an overwhelming feeling of drowning in a sea of options.
In the back office, the software developer is similarly overwhelmed because the large number of choices at each point in the schema translates into a programmer's worst nightmare, known as an exploding state space.
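A rough way to see the exploding state space is to count how many child sequences each content model accepts. The sketch below assumes hypothetical invoice elements (`header`, `line`, `total`, `note`) and approximates DTD content models as Python regular expressions over child element names; `count_valid` is an illustrative brute-force helper, not a schema validator.

```python
import re
from itertools import product

# Hypothetical child element names for an <invoice>; "note" stands in for a
# tag added in a later revision of the schema.
TAGS = ["header", "line", "total", "note"]

# Content models approximated as regexes over child names, each followed by a
# space. Tight: like <!ELEMENT invoice (header, line+, total)>.
TIGHT = r"header (line )+total "
# Bag of tags: like <!ELEMENT invoice (header | line | total | note)*>.
BAG = r"((header|line|total|note) )*"


def count_valid(model: str, tags: list[str], max_len: int) -> int:
    """Brute-force count of child sequences of length 0..max_len accepted by model."""
    count = 0
    for n in range(max_len + 1):
        for seq in product(tags, repeat=n):
            if re.fullmatch(model, "".join(t + " " for t in seq)):
                count += 1
    return count


for max_len in range(3, 7):
    tight = count_valid(TIGHT, TAGS, max_len)
    bag = count_valid(BAG, TAGS, max_len)
    print(f"up to {max_len} children: tight={tight}, bag={bag}")

# Expected output:
# up to 3 children: tight=1, bag=85
# up to 4 children: tight=2, bag=341
# up to 5 children: tight=3, bag=1365
# up to 6 children: tight=4, bag=5461
```

With at most six children, the tight model admits 4 sequences while the bag admits all 5,461 sequences over four tags. Each tag added to the bag raises the base of that exponential growth, and every accepted sequence is a case the processing software may have to handle.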
Schema creators ignore Miller's rule at their peril. Users and software developers alike can and will subvert the technically elegant designs in order to work within comfortable cognitive limits.
It's important to remember that the "group of people getting together to agree on a data interchange standard" phenomenon is not a new one. Stepping back just one generation, this went on wholesale in the SGML world - without much success. It wouldn't be good soft-issue tactics for me to mention any failed standards initiatives by name. You'll find plenty of evidence if you read through some history of the SGML years, especially the early-to-mid-'90s.
My contention is that the failed industry-standard schema initiatives of the past did not fail for technical reasons; they failed for human reasons. There is a rich lore of experience here that the new wave of XML schema designers could do worse than mine for valuable insights. Those who don't learn from the mistakes of the past truly are doomed to repeat them.
Apart from paying due regard to history, I think XML schema design needs to take a leaf out of the extreme programming book. Start with the customer (human), do the smallest thing that can possibly work, and start using it. Never lose sight of the human creating the XML content or the human writing software to process the content. Remember Miller's law. Remember Zipf's law. Allow the schema to grow/evolve organically over time. It will anyway, whether the schema designer likes it or not.
The only industry standard schemas still standing at the end of this decade will be the ones that address the soft issues.
Copyright © 2001 SYS-CON Media, Inc. — All Rights Reserved.
Sean McGrath is founder and CTO of Propylon, one of Ireland's fastest growing software companies. Headquartered in Dublin, with development centers in Sligo, Ireland and Mumbai, India, Propylon delivers what it terms "industrial strength XML" and XML consultancy services to its service and product partners in Europe and the United States.