An Easy Introduction to XML Publishing - Part 3 of a Five-Part Series

Developing a new publishing system

In Part 1 of this series we discussed some of the key problems of capturing and sharing information and in Part 2 we looked at the critical components of a solution: modularization, automation, and XML.

In Part 3, we start getting technical - but in a nontechnical way. We examine the essential parts of building a solution: developing data models (either DTDs or Schemas), designing stylesheets, and integrating the various components of the solution.

Data Models? Why Would I Care About Data Models?
We're fond of history lessons in this column, so let's go back to the year 1900. At that time, the average worker in the United States earned about $500 a year and a bicycle cost $600. No wonder the men on old-fashioned bicycles wore a top hat and tails - you had to be rich to own one!

The reason for the high cost of bicycles is that they were handmade. They were built one at a time and required a tremendous amount of labor to handcraft the parts and painstakingly adjust them until they fit together. Repairs were expensive and time-consuming as well, since the parts had to be made and fitted by hand.

Twenty-five years later, a Model T from Ford cost $260 and median annual incomes had risen to over $1500. Within a single generation, manufactured transportation had gone from an impossible luxury to widely affordable.

Several manufacturing innovations were behind this remarkable change, including the moving assembly line, specialization of workers, and interchangeable parts. Later, automated machinery would replace the tedious, dangerous work that humans performed, further reducing costs, increasing quality, and expanding variety.

Applying these same principles - automation, specialization, and interchangeable parts - to the creation and sharing of information delivers the same kind of benefits. Whether you're planning to automate manufacturing or publishing, one of the keys is to design interchangeable parts that you can be confident will fit together easily when the time comes to assemble them.

That's where XML and data models come in. A data model can be based on a DTD (Document Type Definition) or on a Schema, which was invented more recently than DTDs (see the article at www.arbortext.com/resources/xpn_june_03.html#tech for a comparison between the two). Either way, the data model does for documents what design drawings do for interchangeable parts.

The data model describes all of the parts of a document along with the rules for how those parts may be combined. If you follow these rules when you create documents, software programs can automatically manipulate the documents later. Data models serve as the foundation of XML-based applications: all of the functionality in an XML publishing system rests on the data model. In most cases, if the data model changes, something built on it, such as a stylesheet, has to change as well.

If you look at a data model in its raw form, whether it's a DTD or a Schema, it looks scary. So we won't look. Instead, let's consider an abstract and highly simplified view in Figure 1.

You can see that a data model is like an organizational chart. The data model describes not only the parts (which we call "elements") of a document, such as chapter and section, but also the hierarchy (for example, a section always comes within a chapter).

You can also see that the data model prescribes the order of the elements. In the example in Figure 1, the main parts of a Document are Foreword, Body, and Appendix, and they must come in that order.

Data models also prescribe how many of an element can appear, such as "exactly one," which you would want for the title of a chapter; "at least two," which you would want for the items in a list; or "any number," which you would want for paragraphs.
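
For readers who would like to see these three kinds of rules (hierarchy, order, and how many) at work, here is a minimal sketch in Python. It is not a DTD or a Schema, and it is not how any real XML tool is implemented. The rule table DATA_MODEL, the function check, the lowercase element names, and the two sample documents are all our own assumptions, loosely following the parts named above.

```python
import xml.etree.ElementTree as ET

# Hypothetical, highly simplified data model (an assumption for illustration).
# For each parent element: the children it may contain, in the required order,
# as (name, minimum count, maximum count). None means "no upper limit".
DATA_MODEL = {
    "document": [("foreword", 1, 1), ("body", 1, 1), ("appendix", 1, 1)],
    "body": [("chapter", 1, None)],
    "chapter": [("title", 1, 1), ("para", 0, None),
                ("list", 0, None), ("section", 0, None)],
    "section": [("title", 1, 1), ("para", 0, None)],
    "list": [("item", 2, None)],  # "at least two" items in a list
}


def check(element):
    """Return a list of rule violations for element and its descendants."""
    problems = []
    children = list(element)
    rules = DATA_MODEL.get(element.tag)  # elements without rules are not checked
    if rules is not None:
        position = 0
        for name, minimum, maximum in rules:
            # Count the run of children with this name at the current position.
            count = 0
            while position < len(children) and children[position].tag == name:
                count += 1
                position += 1
            if count < minimum:
                problems.append(
                    f"<{element.tag}> needs at least {minimum} <{name}>, found {count}")
            elif maximum is not None and count > maximum:
                problems.append(
                    f"<{element.tag}> allows at most {maximum} <{name}>, found {count}")
        if position < len(children):
            # A child that is unknown here or that appears out of order.
            problems.append(
                f"<{element.tag}>: <{children[position].tag}> is not allowed here")
    for child in children:
        problems.extend(check(child))
    return problems


VALID = """
<document>
  <foreword>Hello</foreword>
  <body>
    <chapter>
      <title>One</title>
      <para>Some text.</para>
      <list><item>first</item><item>second</item></list>
    </chapter>
  </body>
  <appendix>Extra</appendix>
</document>
"""

# Appendix comes before body, and the list has only one item.
INVALID = """
<document>
  <foreword>Hello</foreword>
  <appendix>Extra</appendix>
  <body>
    <chapter><title>One</title><list><item>only</item></list></chapter>
  </body>
</document>
"""

valid_problems = check(ET.fromstring(VALID))
invalid_problems = check(ET.fromstring(INVALID))
for problem in invalid_problems:
    print(problem)
```

The first sample follows every rule, so check finds nothing to report. The second puts the appendix ahead of the body and gives the list a single item, so it is flagged. A real DTD or Schema states the same kinds of rules in its own notation, and a validating parser enforces them for you.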

Gee, this seems pretty easy, doesn't it? That's only because we left out a lot of detail. So far, we have shown only the organizational elements of a document, such as chapter and section. There are several reasons for capturing information in separate elements, such as:

  • Organization (chapter and section) - We have already seen elements of this type, which prescribe the basic structure of the document. Documents may be books, articles, catalogs, datasheets, and so on, and each of these typically has some unique characteristics in its structure. For example, a book may have chapter elements while a catalog may have price elements.
  • Formatting (emphasis) - Many elements exist only so that their content can be formatted differently from the text around it. For example, because emphasized words usually appear in italics, we designate an element for them such as emphasis. To provide a contrasting example, in virtually all cases we do not capture nouns or verbs as separate elements because we do not do anything different with them - they look the same as any other word in a sentence.

    Even though XML separates content from formatting so that your information exists independent of the way it's presented, you must consider your formatting goals while you design your data model. Too many times, we have seen organizations finalize their data models only to find that their stylesheets become very complex and expensive to develop and maintain, or they have to spend more time and money to revise their data models later, or they fail to meet all of their formatting design objectives.

  • Reuse (topic) - One of the most important benefits to be gained from an XML publishing system is the capability to reuse information in multiple documents. Approaches to reuse have varied widely, but one emerging best practice is to reuse information at a "topic" level rather than at a chapter or section level. Whether this works for you depends heavily on the specifics of your application, so this is a prime area for expert assistance.
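
To make topic-level reuse a little more concrete, here is a minimal sketch in Python of one way it could work: a topic is stored once, and each document contains only a placeholder that is swapped for a copy of the topic at assembly time. Everything in it is an assumption made for illustration, including the topicref placeholder element, its ref attribute, the TOPICS library, the assemble function, and the sample content. Real publishing systems have their own mechanisms.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical shared library of topics, keyed by an id we made up.
TOPICS = {
    "safety-warning": ET.fromstring(
        "<topic id='safety-warning'><title>Safety</title>"
        "<para>Unplug the unit <emphasis>before</emphasis> opening it.</para>"
        "</topic>"
    ),
}


def assemble(document_xml):
    """Replace each <topicref ref='...'/> placeholder with a copy of its topic."""
    root = ET.fromstring(document_xml)
    # Snapshot the elements first so the tree is not changed while we walk it.
    for parent in list(root.iter()):
        for index, child in enumerate(list(parent)):
            if child.tag == "topicref":
                topic = copy.deepcopy(TOPICS[child.get("ref")])
                topic.tail = child.tail  # keep the surrounding whitespace
                parent[index] = topic    # one-for-one swap, positions unchanged
    return root


# Two different kinds of document that reuse the same topic.
USER_GUIDE = (
    "<book><chapter><title>Setup</title>"
    "<topicref ref='safety-warning'/></chapter></book>"
)
DATASHEET = (
    "<datasheet><topicref ref='safety-warning'/><price>99</price></datasheet>"
)

guide = assemble(USER_GUIDE)
sheet = assemble(DATASHEET)
print(ET.tostring(guide, encoding="unicode"))
print(ET.tostring(sheet, encoding="unicode"))
```

Because both documents pull in the same topic, a correction made once in the library would show up in the book and in the datasheet the next time they are assembled.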

More Stories By PG Bartlett

PG Bartlett is vice president of product marketing at Arbortext, where he is responsible for corporate positioning, marketing strategy, and product direction. Bartlett joined Arbortext in 1994, bringing more than 18 years of experience in both technical and marketing positions at leading-edge high technology companies. He is a frequent presenter at major industry events and has been invited to speak and chair sessions at Comdex, Seybold Seminars, XML conferences, AIIM conferences, and others.
