By PG Bartlett
August 2, 2005 05:00 PM EDT
In Part 1 of this series we discussed some of the key problems of capturing and sharing information and in Part 2 we looked at the critical components of a solution: modularization, automation, and XML.
In Part 3, we start getting technical - but in a nontechnical way. We examine the essential parts of building a solution: developing data models (expressed as either DTDs or Schemas), designing stylesheets, and integrating the various components of the solution.
Data Models? Why Would I Care About Data Models?
We're fond of history lessons in this column, so let's go back to the year 1900. At that time, the average worker in the United States earned about $500 a year and a bicycle cost $600. No wonder the men on old-fashioned bicycles wore a top hat and tails - you had to be rich to own one!
The reason for the high cost of bicycles is that they were handmade. They were built one at a time and required a tremendous amount of labor to handcraft the parts and painstakingly adjust them until they fit together. Repairs were expensive and time-consuming as well, since the parts had to be made and fitted by hand.
Twenty-five years later, a Model T from Ford cost $260 and the median annual income had risen to over $1,500. Within a single generation, manufactured transportation had gone from an impossible luxury to widely affordable.
Several manufacturing innovations were behind this remarkable change, including the moving assembly line, specialization of workers, and interchangeable parts. Later, automated machinery would replace the tedious, dangerous work that humans performed, further reducing costs, increasing quality, and expanding variety.
Applying these same principles - automation, specialization, and interchangeable parts - to the creation and sharing of information delivers the same kind of benefits. Whether you're planning to automate manufacturing or publishing, one of the keys is to design interchangeable parts that you can be confident will fit together easily when the time comes to assemble them.
That's where XML and data models come in. Whether a data model is expressed as a DTD (Document Type Definition) or as a Schema (a more recent invention than the DTD; see the article at www.arbortext.com/resources/xpn_june_03.html#tech for a comparison of the two), it does for documents what design drawings do for interchangeable parts.
The data model describes all of the parts of a document along with the rules for how those parts may be combined. When you follow these rules as you create documents, software programs can automatically manipulate those documents later. Data models serve as the foundation of XML-based applications: all of the functionality in an XML publishing system rests on the data model, and in most cases, if the data model changes, something else has to change as well.
If you look at a data model in its raw form, whether it's a DTD or a Schema, it looks scary. So we won't look. Instead, let's consider an abstract and highly simplified view in Figure 1.
You can see that a data model is like an organizational chart. The data model describes not only the parts (which we call "elements") of a document, such as chapter and section, but also the hierarchy (for example, a section always comes within a chapter).
You can also see that the data model prescribes the order of the elements. In the example above, the main parts of a Document are Foreword, Body, and Appendix, and they must come in that order.
Data models also prescribe how many of an element can appear, such as "exactly one," which you would want for the title of a chapter; "at least two," which you would want for the items in a list; or "any number," which you would want for paragraphs.
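For readers who want to see these three kinds of rules (hierarchy, order, and "how many") in action, here is a minimal sketch in Python. It is not a DTD or a Schema, and it is not how a real XML system validates documents; the RULES table, the check function, and the lowercase element names are all assumptions made up for illustration, loosely following the simplified view described above.

```python
import xml.etree.ElementTree as ET

# Hypothetical, highly simplified data model. For each element, list the
# child elements it may contain, in the required order, with (min, max)
# counts. A max of None means "any number".
RULES = {
    "document": [("foreword", 1, 1), ("body", 1, 1), ("appendix", 1, 1)],
    "body": [("chapter", 1, None)],
    "chapter": [("title", 1, 1), ("para", 0, None), ("list", 0, None)],
    "list": [("item", 2, None)],
}


def check(element):
    """Return a list of rule violations in element and its descendants."""
    rules = RULES.get(element.tag)
    if rules is None:
        # Elements without an entry (title, para, item, ...) are treated
        # as plain text in this sketch and are not checked further.
        return []
    errors = []
    children = list(element)
    pos = 0
    for name, low, high in rules:
        # Count the run of consecutive children with the expected name.
        count = 0
        while pos < len(children) and children[pos].tag == name:
            count += 1
            pos += 1
        if count < low:
            errors.append(
                f"<{element.tag}> needs at least {low} <{name}>, found {count}")
        elif high is not None and count > high:
            errors.append(
                f"<{element.tag}> allows at most {high} <{name}>, found {count}")
    if pos < len(children):
        errors.append(
            f"<{element.tag}>: unexpected or out-of-order <{children[pos].tag}>")
    for child in children:
        errors.extend(check(child))
    return errors


GOOD = ("<document><foreword/><body><chapter><title>Intro</title>"
        "<para>Text</para><list><item>a</item><item>b</item></list>"
        "</chapter></body><appendix/></document>")

# Appendix comes before body, and the list has only one item.
BAD = ("<document><foreword/><appendix/><body><chapter><title>T</title>"
       "<list><item>a</item></list></chapter></body></document>")

good_errors = check(ET.fromstring(GOOD))
bad_errors = check(ET.fromstring(BAD))
```

The first document follows every rule, so good_errors is empty. The second breaks the order rule (the appendix comes before the body) and the "at least two" rule (a list with a single item), so bad_errors contains a message for each problem. A real data model states the same kinds of rules declaratively, and a validating parser enforces them for you.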
Gee, this seems pretty easy, doesn't it? That's only because we left out a lot of detail. So far, we have shown only the organizational elements of a document, such as chapter and section. There are many reasons for capturing information in separate elements, such as:
- Organization (chapter and section) - We have already seen elements of this type, which prescribe the basic structure of the document. Documents may be books, articles, catalogs, datasheets, and so on, and each of these typically has some unique characteristics in its structure. For example, a book may have chapter elements while a catalog may have price elements.
- Formatting (emphasis) - Many elements exist only so that their content can be formatted differently from the surrounding text. For example, because emphasized words usually appear in italics, we designate an element for them such as emphasis. To provide a contrasting example, in virtually all cases we do not capture nouns or verbs as separate elements because we do not do anything different with them - they look the same as any other word in a sentence.
Even though XML separates content from formatting so that your information exists independent of the way it's presented, you must consider your formatting goals while you design your data model. Too many times, we have seen organizations finalize their data models only to find that their stylesheets become very complex and expensive to develop and maintain, or they have to spend more time and money to revise their data models later, or they fail to meet all of their formatting design objectives.
- Reuse (topic) - One of the most important benefits to be gained from an XML publishing system is the capability to reuse information in multiple documents. Approaches to reuse have varied widely, but one emerging best practice is to reuse information at a "topic" level rather than at a chapter or section level. Whether this works for you depends heavily on the specifics of your application, so this is a prime area for expert assistance.
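The formatting and reuse points above can be sketched together in a few lines of Python. This is only an illustration under stated assumptions: the topic content, the two style tables, and the LIBRARY, render, and publish names are all hypothetical, and a real publishing system would use proper stylesheets (and would escape special characters) rather than a lookup table like this one.

```python
import xml.etree.ElementTree as ET

# One topic, written once. Element names and content are illustrative.
LIBRARY = {
    "safety": ET.fromstring(
        "<topic><title>Safety</title>"
        "<para>Always <emphasis>unplug</emphasis> the unit first.</para>"
        "</topic>"
    ),
}

# Two hypothetical "stylesheets". Each maps an element name to the text
# placed before and after that element's content.
HTML_STYLE = {
    "title": ("<h2>", "</h2>"),
    "para": ("<p>", "</p>"),
    "emphasis": ("<i>", "</i>"),
}
TEXT_STYLE = {
    "title": ("", "\n"),
    "para": ("", "\n"),
    "emphasis": ("*", "*"),
}

# Two documents that reuse the same topic by reference, not by copying.
USER_GUIDE = ["safety"]
SERVICE_MANUAL = ["safety"]


def render(element, style):
    """Format an element and its descendants using the given style table."""
    before, after = style.get(element.tag, ("", ""))
    parts = [before, element.text or ""]
    for child in element:
        parts.append(render(child, style))
        parts.append(child.tail or "")  # text that follows the child element
    parts.append(after)
    return "".join(parts)


def publish(topic_ids, style):
    """Assemble a document from topic ids and format it with one style."""
    return "".join(render(LIBRARY[topic_id], style) for topic_id in topic_ids)


html_output = publish(USER_GUIDE, HTML_STYLE)
text_output = publish(SERVICE_MANUAL, TEXT_STYLE)
```

The content never says "italics"; it says only that a word is emphasized. Each style table decides what emphasis looks like, and both documents pick up any change made to the single stored topic.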
Copyright © 2005 SYS-CON Media, Inc. — All Rights Reserved.
Syndicated stories and blog feeds, all rights reserved by the author.
About PG Bartlett
PG Bartlett is vice president of product marketing at Arbortext, where he is responsible for corporate positioning, marketing strategy, and product direction. Bartlett joined Arbortext in 1994, bringing more than 18 years of experience in both technical and marketing positions at leading-edge high technology companies. He is a frequent presenter at major industry events and has been invited to speak and chair sessions at Comdex, Seybold Seminars, XML conferences, AIIM conferences, and others.