Personalization - To enable the assembly of information for different audiences, you will want to use components for assembly, which can range in size from a cell in a table to a chapter. While some of those components may already exist as separate elements, you may have to create others explicitly for the purpose of including them in or omitting them from a specific rendition of the document.
Validation (partnum) - In some cases, you may want to validate the content of an element against a database or other rules. For example, you may want to ensure that a Social Security number is always in the form of 3 digits, a dash, 2 digits, a dash, and 4 digits. Schemas are better at data validation than DTDs, but in either case you will need to put the information to validate in a separate element.
Searching (definition) - One of the potential benefits of XML is to improve the searchability of information. For example, you could improve the relevance of a search by allowing users to search for service information only within service procedures instead of searching across all document types.
Translation (productname) - To simplify or automate translation to other languages, you may want to designate terms such as product name or company name that are translated through a database or some other method instead of being translated manually.
Behavior (link) - Cross-references and hyperlinks not only look different from other elements, but they also have specific behaviors associated with them. You could also implement more elaborate behavior for an element, such as making footnotes on Web pages pop up when the user's mouse hovers over them. In either case, you must specifically identify as elements those parts of a document to which you want to assign specific behaviors.
Other automation - For anything you need to automate, you may need to create separate elements so that you can write software to find and manipulate them.
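As a rough illustration of why validated content needs its own element, the sketch below checks the Social Security number format described above. The element name `ssn`, the sample document, and the function name `find_invalid_ssns` are assumptions made up for this example; a production system would more likely express the same rule in a Schema.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample document; the <ssn> element name is an assumption.
SAMPLE = """<application>
  <applicant>
    <name>Pat Example</name>
    <ssn>123-45-6789</ssn>
  </applicant>
  <applicant>
    <name>Lee Example</name>
    <ssn>12-345-6789</ssn>
  </applicant>
</application>"""

# 3 digits, dash, 2 digits, dash, 4 digits
SSN_PATTERN = re.compile(r"\d{3}-\d{2}-\d{4}")


def find_invalid_ssns(xml_text):
    """Return the text of every <ssn> element that does not match the pattern."""
    root = ET.fromstring(xml_text)
    invalid = []
    for element in root.iter("ssn"):
        value = (element.text or "").strip()
        if not SSN_PATTERN.fullmatch(value):
            invalid.append(value)
    return invalid


if __name__ == "__main__":
    print(find_invalid_ssns(SAMPLE))
```

Because the number sits in its own element, the software can find it directly instead of guessing where it appears inside a paragraph of text.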
Even though we started with the data model in this article, in reality you must consider your system architecture and specific choices of products as you create your data model to make sure you can accomplish all of your goals. Whole books have been written about the process of creating a data model. It takes days of concentrated discussion plus weeks of analysis to come up with a data model or a set of data models appropriate to your business. You will find more information about creating DTDs, including links to even more resources, at www.arbortext.com/resources/xpn_july_03.html#tech.
Once you create a description of your data model, encoding it as a DTD or Schema is the easy part. To create your data models in the first place, however, we recommend that you enlist the aid of an expert and plan to invest some time in it yourself. It's just as easy to overdo a data model and make it too complex as it is to omit elements that you'll wish for later.
Thank Goodness Data Modeling Is Over! What Comes Next?
There are several major areas of design and development in the creation of your XML publishing system. The following paragraphs highlight each of these areas.
Conversion - You will probably want to convert some of your existing word processing and desktop publishing files to XML. Because conversion tends to be expensive, you should convert only those documents that you're continuing to improve or that you plan to reuse or repurpose for new documents.
Conversion is expensive because it's complex: you're converting documents with a flat structure that are filled with inconsistencies to files with a hierarchical structure and absolute consistency.
Some documents are so complex and inconsistent that re-keying is the best approach. Arbortext partners such as Data Conversion Labs (www.dclab.com/) offer such services. On the other hand, your documents may be sufficiently consistent and simple that you can use software to convert automatically or semi-automatically, most likely requiring manual cleanup afterwards. Setting up this software requires creating a "map" between styles and tags. Arbortext, our partners, and other companies offer software and setup services, which the next article in this series will describe in more detail.
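To make the idea of a "map" between styles and tags concrete, here is a minimal sketch. The style names, element names, and the `convert` function are all hypothetical, and the sketch produces only a flat list of elements; real conversion software also has to infer the hierarchy, which is much of what makes conversion complex.

```python
import xml.etree.ElementTree as ET

# Hypothetical map from word-processor style names to XML element names.
STYLE_TO_TAG = {
    "Heading 1": "title",
    "Body Text": "para",
    "Warning": "warning",
}


def convert(paragraphs, root_tag="section"):
    """Convert (style, text) pairs to XML.

    Returns the root element plus the list of paragraphs whose style has no
    entry in the map and therefore needs manual cleanup.
    """
    root = ET.Element(root_tag)
    needs_cleanup = []
    for style, text in paragraphs:
        tag = STYLE_TO_TAG.get(style)
        if tag is None:
            needs_cleanup.append((style, text))
            continue
        ET.SubElement(root, tag).text = text
    return root, needs_cleanup


if __name__ == "__main__":
    flat = [
        ("Heading 1", "Replacing the filter"),
        ("Body Text", "Turn off the power first."),
        ("body text bold", "Do not skip this step."),  # inconsistent style name
    ]
    root, needs_cleanup = convert(flat)
    print(ET.tostring(root, encoding="unicode"))
    print(needs_cleanup)
```

The third paragraph uses a style name the map does not recognize, so it lands in the cleanup list. The more inconsistent the source documents, the longer that list grows.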
Stylesheets - Because an XML document contains no instructions for formatting its contents for viewing or printing, those instructions have to exist somewhere. That's where stylesheets come in. Stylesheets contain instructions for how to display or print each element (you can learn a lot more at www.arbortext.com/resources/xpn_sep_03.html#tech).
You'll need stylesheets both for editing and for publishing. Although you could ask your authors to create and edit XML without a stylesheet, they will put up stiff resistance: they have grown accustomed to seeing visual cues about the meaning of each element and will intensely dislike the loss of on-screen formatting. In other words, they'll want to see the contents of an emphasis tag displayed on the screen in italics instead of just seeing the emphasis tag.
Without a Stylesheet: This is <emphasis>very</emphasis> easy!
With a Stylesheet: This is very easy! (the word "very" appears in italics)
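The comparison above can be sketched in a few lines of code. This is only a toy model of what a stylesheet does, not how any particular editing or publishing product works: the `STYLESHEET` dictionary, the `para` element, and the `render` function are assumptions for illustration, and the output format chosen here is HTML.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical "stylesheet": XML element name -> HTML tag used for display.
STYLESHEET = {"para": "p", "emphasis": "i"}


def render(element, stylesheet):
    """Recursively render an element and its children as an HTML string."""
    html_tag = stylesheet.get(element.tag)
    parts = [escape(element.text or "")]
    for child in element:
        parts.append(render(child, stylesheet))
        parts.append(escape(child.tail or ""))  # text that follows the child
    body = "".join(parts)
    # Elements with no stylesheet entry are shown as plain text.
    return f"<{html_tag}>{body}</{html_tag}>" if html_tag else body


if __name__ == "__main__":
    source = "<para>This is <emphasis>very</emphasis> easy!</para>"
    print(render(ET.fromstring(source), STYLESHEET))
```

The XML source says only that "very" is emphasized; the decision to show emphasis in italics lives entirely in the stylesheet, so a different stylesheet could render the same source another way.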
You may need many different stylesheets. You will need one for each different XML editing tool that you use, for each type of document you produce, for each medium to which you publish that document, and for each format variation. For example, if you are producing both print and Web versions of a product catalog and a technical manual, and if you have to produce the print version on both 8 1/2 x 11 and A4 paper, you would need the following stylesheets:
- Editor stylesheet for catalog
- Editor stylesheet for technical manual
- Web stylesheet for catalog
- Web stylesheet for technical manual
- Print stylesheet for catalog for 8 1/2 x 11
- Print stylesheet for catalog for A4
- Print stylesheet for technical manual for 8 1/2 x 11
- Print stylesheet for technical manual for A4
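The list above follows mechanically from the combinations involved, which the following sketch enumerates. It assumes a single XML editing tool and a single Web format, as in the example; the function name `required_stylesheets` is made up for this illustration.

```python
from itertools import product

DOCUMENT_TYPES = ["catalog", "technical manual"]
PAPER_SIZES = ["8 1/2 x 11", "A4"]


def required_stylesheets(document_types, paper_sizes):
    """List the stylesheets needed for one editor, Web output, and print output."""
    names = []
    # One editor stylesheet per document type (single editing tool assumed).
    for doc in document_types:
        names.append(f"Editor stylesheet for {doc}")
    # One Web stylesheet per document type.
    for doc in document_types:
        names.append(f"Web stylesheet for {doc}")
    # One print stylesheet per document type and paper size.
    for doc, size in product(document_types, paper_sizes):
        names.append(f"Print stylesheet for {doc} for {size}")
    return names


if __name__ == "__main__":
    for name in required_stylesheets(DOCUMENT_TYPES, PAPER_SIZES):
        print("-", name)
```

Adding a third document type to `DOCUMENT_TYPES` raises the total from 8 to 12, which shows how quickly the number of stylesheets grows.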
About PG Bartlett
PG Bartlett is vice president of product marketing at Arbortext, where he is responsible for corporate positioning, marketing strategy, and product direction. Bartlett joined Arbortext in 1994, bringing more than 18 years of experience in both technical and marketing positions at leading-edge high technology companies. He is a frequent presenter at major industry events and has been invited to speak and chair sessions at Comdex, Seybold Seminars, XML conferences, AIIM conferences, and others.