XML Tips
Replace DTDs? Why?
By: Bob DuCharme
Of all the standards to accompany XML that are currently in progress at the W3C, few are more anxiously awaited than the Schema standard - the specification that provides an alternative to XML 1.0 DTDs as a way to describe a document's structure. But what's wrong with XML 1.0 DTDs? How many alternatives have been proposed, and by whom? Why didn't the W3C address these concerns in the original XML 1.0 specification instead of waiting until now? I'll answer those questions in this column, and in my next column we'll look at the current state of the W3C Schema Working Group's unfinished proposal.
What Can They Do?
What's wrong with these?
Weak Data Typing
This wasn't a big deal in the SGML world because nearly every application was a publishing application. With XML's popularity in e-commerce development, data values like quantities and especially prices become more important. Although XML 1.0 offers a few types that help constrain attribute values, classic types such as integers, real numbers, Booleans and dates aren't among the choices, and application developers need them for element content as well as attribute values.
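To make the gap concrete, here is a minimal Python sketch of the checking an application has to do for itself when the DTD can only say that an element holds character data. The order document, its element names and the `check_types` function are all hypothetical, invented for this illustration.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical order document. It is well-formed, and a DTD that declares
# <!ELEMENT price (#PCDATA)> would accept both items without complaint.
ORDER = """
<order date="2000-02-30">
  <item><quantity>3</quantity><price>19.95</price></item>
  <item><quantity>three</quantity><price>cheap</price></item>
</order>
"""

def check_types(xml_text):
    """Return the type errors that the application must find on its own."""
    errors = []
    root = ET.fromstring(xml_text)

    # A DTD can declare the attribute as CDATA but cannot say "this is a date".
    try:
        date.fromisoformat(root.get("date"))
    except ValueError:
        errors.append("order/@date is not a valid date")

    # Element content is just text to a DTD, so numeric checks happen here.
    for n, item in enumerate(root.findall("item"), start=1):
        for tag, convert in (("quantity", int), ("price", float)):
            text = item.findtext(tag)
            try:
                convert(text)
            except ValueError:
                errors.append(
                    f"item {n}: <{tag}> '{text}' is not a valid {convert.__name__}"
                )
    return errors

if __name__ == "__main__":
    for problem in check_types(ORDER):
        print(problem)
```

Every one of these checks is code that a schema language with real data types could replace with a declaration.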
Document Structure Not Stored in an XML Document
Since then, a revision to the SGML standard allows for legal SGML documents without the DTD declarations used to specify document structure - that is, to have what the XML world calls "well-formed documents." If an XML document with no DTD can still be a legal SGML document, then the primary reason for using SGML DTD syntax no longer applies.
Another argument against specifying DTD structure with XML elements was that it would be confusing to include elements that describe other elements right in there with the elements that they describe. As it turned out, no one does this anyway; schema documents are always kept separate from the documents they describe, and documents point to their schemas with a processing instruction, a namespace declaration or some other mechanism.
Using XML elements to describe document structures has several benefits. It makes these structures much easier to develop because you can use any XML editor to edit and manipulate them - and I mean any XML editor, even the lame ones that merely dump your document to a visual tree and then write that tree back out when you save your document. (Paragraphs of text like the ones you're reading here are very cumbersome to edit on such an editor, but a schema document is naturally treelike.)
Application development is also easier for documents whose structure is stored in a well-formed XML document, because applications have easier access to information about document structure. SAX and DOM, the two current XML API standards, offer very little to an application that wants to check DTD information such as an attribute's declared type or whether a particular element is optional. With document structure definitions stored in a DOM tree or triggering the same SAX events that the document's elements trigger, an application can find out all it wants about that structure.
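As a rough illustration of that last point, the Python sketch below reads a structure definition with the same parser an application would use for any other XML document. The vocabulary (`schema`, `elementType`, `child`, `attribute`) is an assumption made up for this example; it is not taken from any of the actual schema proposals.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema vocabulary, invented for this sketch. It is not
# XML-Data, DCD, SOX, DDML or the W3C draft.
SCHEMA = """
<schema>
  <elementType name="item">
    <child name="quantity" required="yes"/>
    <child name="note" required="no"/>
    <attribute name="sku" type="id"/>
    <attribute name="currency" type="string"/>
  </elementType>
</schema>
"""

def describe(schema_text, element_name):
    """Return (optional child names, attribute name -> declared type)."""
    root = ET.fromstring(schema_text)
    etype = root.find(f"elementType[@name='{element_name}']")
    if etype is None:
        raise KeyError(element_name)
    optional = [
        c.get("name") for c in etype.findall("child") if c.get("required") == "no"
    ]
    attr_types = {a.get("name"): a.get("type") for a in etype.findall("attribute")}
    return optional, attr_types

if __name__ == "__main__":
    optional, attr_types = describe(SCHEMA, "item")
    print("optional children:", optional)
    print("attribute types:", attr_types)
```

Nothing here needs a DTD-aware API: the questions "is this element optional?" and "what is this attribute's declared type?" become ordinary tree lookups.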
No Inheritance
One of the great features of the object-oriented world is the ability to define data structures as extensions of existing structures. With a well-designed hierarchy of object classes inheriting from each other, simple changes can affect as much or as little of this hierarchy as you wish. Developers with object-oriented experience appreciate XML's ability to define and manipulate complex data structures, but they know that specifying every detail of every data structure from the ground up isn't the most efficient way to develop a system. They want a way to base a new element type on an existing one.
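One way to picture what these developers are asking for is the Python sketch below, which resolves a made-up `extends` attribute so that a new element type picks up the children of an existing one. The attribute, the element names and the `children_of` function are assumptions for illustration only; none of this is DTD syntax or the syntax of a particular schema proposal.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary: "extends" names the element type to build on.
SCHEMA = """
<schema>
  <elementType name="address">
    <child name="street"/>
    <child name="city"/>
  </elementType>
  <elementType name="usAddress" extends="address">
    <child name="state"/>
    <child name="zip"/>
  </elementType>
</schema>
"""

def children_of(root, name):
    """Return the child element names of a type, inherited ones first."""
    etype = root.find(f"elementType[@name='{name}']")
    if etype is None:
        raise KeyError(name)
    base = etype.get("extends")
    # Walk up the chain of base types before adding this type's own children.
    inherited = children_of(root, base) if base else []
    return inherited + [c.get("name") for c in etype.findall("child")]

if __name__ == "__main__":
    root = ET.fromstring(SCHEMA)
    print(children_of(root, "usAddress"))
```

Adding a child to `address` would then show up in `usAddress` automatically, which is the kind of controlled ripple effect that a class hierarchy provides.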
Potential Messiness of Parameter Entities
To keep the design of complex DTDs modular and maintainable, internal parameter entities sometimes build on each other in multiple layers, leaving you with references to parameter entities that have parameter entity references themselves - and those may refer in turn to parameter entities that contain more parameter entity references. Because it's all implemented using string substitution, it can get messy quickly. Specialized data structures suited to each of these purposes would give developers more robust components to mix and match when building a document type's structure.
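The layering is easier to see when the substitution is spelled out. The Python sketch below expands a few hypothetical parameter entities the naive way, by replacing each `%name;` reference with its text. It is a deliberate simplification: the entity names are invented, and it does not follow the detailed rules a conforming XML parser applies when it expands parameter entities.

```python
import re

# Hypothetical internal parameter entities, each building on the next.
ENTITIES = {
    "inline": "#PCDATA | emph | %links;",
    "links": "xref | %external;",
    "external": "url | cite",
}

# Matches a parameter entity reference such as %inline;
PE_REF = re.compile(r"%([A-Za-z_][\w.-]*);")

def expand(text, entities, depth=0):
    """Replace %name; references by plain string substitution, recursively."""
    if depth > 20:
        raise RecursionError("parameter entities appear to be circular")

    def substitute(match):
        # The replacement text may itself contain more references.
        return expand(entities[match.group(1)], entities, depth + 1)

    return PE_REF.sub(substitute, text)

if __name__ == "__main__":
    declaration = "<!ELEMENT para (%inline;)*>"
    print(expand(declaration, ENTITIES))
```

With these three entities, the declaration expands to `<!ELEMENT para (#PCDATA | emph | xref | url | cite)*>`. To learn what `para` may contain, a reader has to chase three levels of references, and a stray parenthesis or separator in any one of them changes the final content model.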
Weak Self-Documentation Facilities
It's ironic that Java is better than XML at allowing automated documentation generation, for two reasons. First, a big factor in the popularity of SGML was the way it easily let developers create systems that automated the creation of print, Web, WinHelp and CD documentation. Second, the original idea for XML, like Java, came from Sun; it was Sun's Online Information Technology Architect Jon Bosak who put together the W3C Working Group that devised a simpler version of SGML that would work more easily over the Web.
Replacement Candidates
A group of eight authors, five of whom worked for Microsoft or DataChannel (a Redmond company that's done a lot of XML work with Microsoft), submitted the XML-Data proposal to the W3C on January 5, 1998, making it the only proposal to predate XML's ascent to Recommendation status. A simplified version of XML-Data known as XML-Data Reduced, or XDR, was submitted to the W3C on July 3, 1998. On Microsoft's Web site XDR is also known simply as "schemas," with no mention of its full name, greatly adding to the confusion over schemas. Just remember that when Microsoft literature describes the use of schemas with IE5 or BizTalk, it means XDR.
Microsoft, IBM and independent consultant Tim Bray submitted the Document Content Description (DCD) schema proposal on July 31, 1998. It expresses document structure using the XML-based Resource Description Framework (RDF). While neither Microsoft nor IBM has shown any interest in following up with DCD or even RDF since then, Object Design's (now eXcelon Corporation) eXcelon product still uses the DCD format to store its own schemas.
Before e-commerce software developer Commerce One acquired Veo Systems, developers at Veo submitted "Schema for Object-Oriented XML" (SOX) to the W3C on September 9, 1998. True to its full name, SOX makes mapping between element type declarations and object-oriented data structure definitions simpler and more straightforward than its predecessors do. The SOX proposal's frequent use of the term "electronic commerce" gives another clue about what kind of application development concerns drove its design.
Finally, the xml-dev mailing list that gave the world the Simple API for XML (SAX, the standard event-driven API to XML documents) also submitted the Document Definition Markup Language, or DDML (also known as "XSchema" and "XSD" along the way), on January 19, 1999.
Although no one ever implemented it, DDML indicated to the W3C where an important group of XML developers saw the priorities in schema language development. After receiving these proposals, the W3C took authors and editors from each of them and assembled a working group to put together their own schema proposal. After publishing a requirements document in February 1999, they released the first draft of their two-part proposal in May and the most recent in December. In my next article we'll take a look at some of the features in the W3C's proposal.