

Content Management Part 2

The golden rule of a content management system is this: the day you take it to production is the day you start work on the next version of the system.

Content contributors will submit change requests for the document structures and input formats, publishers will ask for more metadata to enable more sophisticated delivery of content, and editors will ask for better workflow.

And you may discover major design flaws in the logic governing how the documents are supposed to interact. So you break open the definitions and trash the database. Or do you?

"Schema evolution? No need for it. I believe our users get it right first time," said the CEO of a leading supplier of content management software, at the Seybold Boston conference, April 2001.

Research and practical experience show that fault-tolerant XML and XML schema evolution are of critical importance to the successful development and management of any complex XML-based application, especially a content management system.

This article addresses document-centric content management systems, as used in corporate publishing, content syndication, and conventional publishing activities. Imagine the luxurious point at the beginning of the project when everything is still a clean sheet. It's day one. You've selected and purchased a content management system. It's still in the box.

Here's a list of the things that you now have to do:

  • Carry out an information-mapping exercise
  • Build the XML environment
  • Go from where you are now to where you want to be
     - Convert existing material
     - Rewrite as appropriate
     - Add new content
     - Go live and stay live!

    These steps become less critical the further along you get. The most important ones are the information-mapping exercise and the way you build the XML environment.

    At the beginning of the project, you have the power to build a system that allows you to evolve schemas and move with the changing requirements of your organization. Get the early stages wrong, and you'll find yourself locked into the kind of logic trap that there's no recovering from - version 37 of your DTDs and no room for any more changes.

    Carry Out the Information-Mapping Exercise
    In a previous article (XML-J, Vol. 2, issue 6), we explained that mapping the information in your organization means (1) charting what kinds of information the users of the system expect to receive and (2) determining how you intend to build that information at the authoring stage. An information map formally describes the following (see Figure 1):

    • Common information types
    • Reusable objects (textual or otherwise)
    • Architecture of the information system from an end-user's perspective
    • Architecture of the information system from a contributor's perspective
    • Fine-grained information topics (concept, example, illustration, table, task, purchase order, invoice, and so on)
    • Publication structures that knit topics together in an appropriate way
    • Usage patterns for typical end users
    • Procedures for contributors
    • Workflow
    The information map provides a blueprint for the whole system. A system builder or integrator should be able to build most of the content management system from the information map.

    Guiding Principles
    Your guiding principles should be to keep your documents small, to build a modular system, to reuse content and definitions as much as possible, and to create powerful metadata. Your content management system gives you the power to use single-source publishing techniques, so do that.

    Redundancy is dangerous. Think of your system as a highly tuned, normalized database, rather than as a place where documents are kept. Your application - the reason you're building this system in the first place - is like a well-built database application. The principles are the same. Data is enriched with metadata. Relationships are defined and enforced. Referential integrity constraints are factored in. Links are not allowed to die.

    Building the XML Environment
    From an XML point of view, the information-mapping process provides a starting point for DTDs and schemas, a definition of which objects are reused and which are not, style considerations, transformation considerations, workflow, and so on.

    The trick, however, is to implement the system using an iterative development process. This means building the system in a way that gives you round-trip access to your starting point - the map or model - rather than locking you into a cascading model from which you can't get back to the start. This is a complicated subject that shouldn't be treated lightly; I return to it later.

    User Interface
    From a content contributor's perspective, the user interface is equivalent to the visible part of any database application. It's in your power to design a very good user interface, so plan one at the information map stage. An example of this is to state that a content contributor should never have to explicitly choose a default value, or supply information that the system could obtain from profiles, or choose some other kind of variable.

    Another good example is to state that a fixed range of choices should always appear as a dropdown list if there are more than seven choices, and as a listbox otherwise. These criteria affect the way the whole application is built (see Figures 2 and 3).
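
    The rule above can be sketched as a small helper; the function name and widget labels are hypothetical:

```python
# Hypothetical sketch of the widget rule described above: a fixed range of
# choices renders as a dropdown when there are more than seven options,
# and as a listbox otherwise.
def choose_widget(choices):
    return "dropdown" if len(choices) > 7 else "listbox"
```

    Encoding such criteria once, as data, rather than hard-coding them per form keeps the interface consistent across the whole application.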

    In many cases you won't have the chance to influence the user interface without programming against the application programming interface (API) of the system you've purchased, which can quickly become prohibitively expensive. It's worth undertaking this analysis of your requirements before assembling the request for proposal (RFP) from the supplier of the content management system in the first place.

    Small Documents
    Small documents are concise topics with a discrete semantic value. For example, an illustration is something that works very well as a document type in its own right, and should therefore be defined as a separate document type. Storing the illustration as a separate, small topic allows you to easily reuse the illustration in another document, or endow it with functionality at runtime such as representing it as a thumbnail image that expands to a popup window when the user clicks on it.

    The alternative is to keep the illustration in the place where it is first used - in the surrounding body paragraphs of a large chapter, for example - and accept that you can't index it, reuse it elsewhere, or apply simple rules-based publishing to suppress it or convert it to a popup.
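
    As a sketch of this kind of reuse (element and attribute names are hypothetical, using Python's standard xml.etree), an illustration stored as its own small topic can be inlined into a chapter, or rendered as a thumbnail reference, by a simple rules-based publishing step:

```python
import copy
import xml.etree.ElementTree as ET

# A small topic stored once, reusable anywhere (names are hypothetical).
topics = {
    "fig-pump": ET.fromstring(
        '<illustration id="fig-pump"><graphic href="pump.svg"/></illustration>'
    )
}

def publish(chapter_xml, popup=False):
    chapter = ET.fromstring(chapter_xml)
    for ref in chapter.iter("topicref"):
        if popup:
            # Suppress the full graphic; emit a thumbnail link instead.
            ref.tag = "thumbnail"
        else:
            # Inline a deep copy of the reused topic.
            ref.append(copy.deepcopy(topics[ref.get("target")]))
    return ET.tostring(chapter, encoding="unicode")

doc = '<chapter><para>See the drawing.</para><topicref target="fig-pump"/></chapter>'
```

    The same stored topic thus serves every publication that references it, and the popup behavior is a publishing decision, not an authoring one.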

    Admittedly, you can write as many scripts and XSLT transformations as you like to achieve this functionality, and you can even hard-code publishing programs using the API of the content management system (if there is one), assuming that you have the time and skills in-house. But it's not a cost-effective approach. Maintenance is expensive, scalability is reduced, and bugs are more likely to occur.

    Remember, any content management system is only as versatile as the structures you impose on it. This means that the monolithic structures so typical of SGML environments (using such epic DTDs as Docbook) are not advisable. Your corporate database records are not efficient with hundreds of fields, and you should think of your documents in the same way. Build small building blocks and publish the larger things that you can construct with those building blocks.

    However, small topics increase the administration effort required of content contributors and publishers. You should also consider locking issues and transaction control when multiple authors are sharing material. On the other hand, small topics can significantly increase the quality, flexibility, and performance of the runtime system.

    It's advisable to find a sensible level of categorization when defining document types. Look for the commonality in your definitions and exploit it. Remember that every structure you define potentially requires specific naming when handling style, transformation, and other actions or properties.

    This increases the work required to go live, and decreases the maintainability of the system. Imagine, for example, that you're building a system for publishing a catalog for use in an online marketplace. Your catalog describes many different things, some of which are raw materials and some processed. You could choose to define separate document types for raw materials and processed materials, or you could define one for both and include an element or attribute that lets you differentiate between the two. The latter makes it easier to maintain.
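
    As a sketch of that second option (all names are hypothetical), a single document type covers both kinds of material, differentiated by an attribute:

```python
import xml.etree.ElementTree as ET

# One "material" document type with a state attribute, instead of two
# separate document types for raw and processed materials.
def material(name, state):
    el = ET.Element("material", state=state)  # state: "raw" or "processed"
    ET.SubElement(el, "name").text = name
    return el

iron_ore = material("Iron ore", "raw")
steel_sheet = material("Steel sheet", "processed")
```

    Stylesheets, queries, and workflow rules then have only one element name to handle, with the attribute available wherever the distinction matters.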

    Imagine that you need to make a change to a DTD. An element (A) no longer provides the scope you need, and should be replaced by a sequence of two other elements (B, C). There are various ways of implementing this.

    You could drop the element A and replace it wherever it has been used with the sequence (B, C). You could redefine A to contain the child content (B, C) and nothing else (that is, not have the content be mixed). If your entire content consists of document instances using that DTD, your whole system will probably break. If element A occurs only in one small document type in a set of many other document types, most of the system will survive the change.

    Conversely, if element A is used in many other DTDs, you have a problem. How do you know where element A has been used? How do you identify in every DTD where A is declared that it's a reused object definition from another source?

    Analyzing the impact of a potential change to a complex XML environment is currently not a scientific process. You can search through DTDs and schemas for names of objects that you know are going to change. You can make a change in a DTD or schema and parse all the derived document instances and see what breaks. You can look for matching patterns in stylesheets, XSLT files, XML document instances, and so on.
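
    A crude version of the first technique can be sketched as follows (the directory layout and element name are hypothetical); it reports which DTD files mention a given element, and on which lines:

```python
import pathlib
import re

def find_usages(dtd_dir, element):
    """Report which DTD files mention the element, and on which lines."""
    pattern = re.compile(r"\b%s\b" % re.escape(element))
    hits = {}
    for dtd in sorted(pathlib.Path(dtd_dir).glob("*.dtd")):
        lines = [n for n, line in enumerate(dtd.read_text().splitlines(), 1)
                 if pattern.search(line)]
        if lines:
            hits[dtd.name] = lines
    return hits
```

    This tells you where to look, not what will break; parsing the deployed document instances against the changed DTD remains the only reliable test.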

    If you have a simple system and a small deployed set of documents, you can probably afford to spend some time searching and replacing, and cranking the existing document instances back in line with the new version of reality. If you have a large, complex system, especially one that can't afford any downtime, you have a serious problem. The solution is usually to move up a version and leave some legacy alive under an older version.

    Content management systems are notoriously unable to support change. Take DTDs, for example. Most content management systems associate a document with a DTD that is stored elsewhere in the system without explicitly understanding the DTD. The document is fed to the authoring software together with its DTD, and parsed when it's next checked back in.

    Manipulating the DTD in the content management system is not common functionality. How can the system understand the change that you need to implement?

    As Figure 4 shows, maintainable content management environments should implement the cyclical, or spiral, process of development. True evolution round-trips through the modeling phase to build on the good and throw away the bad.

    Fault-Tolerant XML
    Consider the primary benefits (in an XML sense) of object-oriented (OO) programming: polymorphism, encapsulation, and inheritance. Change element A at the source, and all references or extensions to it should automatically inherit that change, thus allowing a designer to make a change to a source definition and regenerate a complete environment at the touch of a magic button. This, alas, is not the case in the current product offering in content management.

    Managing an XML environment requires an OO approach to setting it up in the first place. Using true OO techniques at the design level, element A (that is, object A) can exist in only one place. The design level should be an abstract, conceptual space where the rules that govern the use of the content management system are recorded. Structures modeled in the design level can reference object A, but must not copy in the original object and sever the link with its source.

    Such structures should be used for generating DTDs, schemas, and associated properties. The content management system shouldn't lose the link with the conceptual design level.

    Content management systems need to evolve. Unfortunately, changing the content model in any way for any schema usually means breaking all existing document instances. Fixing them can cost as much effort as the original implementation. This is because XML environments typically lock you into a linear process in which there is no intelligent modeling space in which your content models can be dealt with in a way that handles all the dependencies correctly.

    The moment you generate deployment files that use the definitions that you're expressing in XML (DTDs, stylesheets, Java classes, XSLT transformations, stored procedures, and so on), you're running the risk of locking yourself into just such a linear application development process. Certainly, if these deployment files can't be automatically generated, the process is already linear.

    As Figure 5 shows, linear development processes are expensive when you need to make changes. Ideally, systems evolve through cyclical design.

    A linear development process is the equivalent of what used to be known as the "cascading" or "waterfall" method of development, in which each phase hands over irrevocably to the next phase, and no return is possible. In other words, you design, then develop, then test, then deploy. If a fix is required, you start again.

    In a perfect system, a model-driven architecture allows you to use a cyclical development process to round-trip back to the design stage when change is needed, thus automating the regeneration of environments and the implementation of changes in the documents.

    Ideally, a designer should be able to make a change to an object in a conceptual design space, and then analyze the impact of that change to the other objects and to the deployed document instances. The designer should be able to automate the implementation of the change both in the way the environment has been built and in the deployed document instances themselves.

    Strictly speaking, the content management system should be able to create new DTDs on the fly according to a requirement from the outside world, and transform an entire database of content to fit the new DTD.
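
    A toy sketch of that kind of automated migration (the split rule here is hypothetical): when element A is redefined as the sequence (B, C), deployed instances are rewritten to match the new content model:

```python
import xml.etree.ElementTree as ET

# When element A becomes the sequence (B, C), rewrite existing documents
# to match the new content model. Old text content is carried into B.
def migrate(xml_text):
    root = ET.fromstring(xml_text)
    for parent in root.iter():
        for i, child in enumerate(list(parent)):
            if child.tag == "A":
                b, c = ET.Element("B"), ET.Element("C")
                b.text = child.text
                parent.remove(child)
                parent.insert(i, b)
                parent.insert(i + 1, c)
    return ET.tostring(root, encoding="unicode")
```

    In a real system the mapping from old to new content would itself come from the design level, not be hard-coded into a one-off script.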

    Barbadosoft was founded to provide the XML infrastructure software to answer these needs. Truly fault-tolerant and maintainable XML should be an essential part of any XML application that deals with sources and data or document instances.

    The Barbadosoft solution takes the form of an "XML virtual machine," or programmable infrastructure, that manages the sources and deployed objects of an XML application through the key phases of the application's life cycle: design, development, deployment, and maintenance. Barbadosoft provides XML object modeling, impact analysis, change management, and infinitely extensible property sets in a rich model of interdependencies and relationships.

    In the absence of such infrastructure, you can help yourself by carefully considering each design decision and trying to imagine the impact of a future change before building the environment. If the repercussions of a change in a fine-grained, small-topic architecture are potentially too great, use larger document types. It's a fine balance. Monolithic DTDs generally reduce maintainability, and smaller document types increase maintainability.

    Go from Where You Are Now to Where You Want to Be
    Most people have to integrate new content management systems with large sets of existing data. Often, the new system is designed to eventually replace the old way of doing things, which means that the content of the old way of working needs to be migrated or rewritten into the new system. The cost of getting your existing data into the new content management system depends on what format it's in. Of course, if you have no existing data or other forms of legacy material to convert, your system can go live very quickly.

    A significant number of new users of XML-based content management systems come from an SGML background. Conversion from SGML to XML is relatively straightforward in terms of the documents and DTDs. (See James Clark's sx tool at www.jclark.com/sp/sx.htm.) After all, most SGML without a DTD is nearly well-formed XML already.

    Generally, most SGML converts to XML very well. SGML to XML is a "down translation," however (going from more complicated to less complicated), which means that you might need to make some decisions if the conversion encounters functionality in the SGML that's not supported in XML. You'll have problems with more obscure SGML constructs such as HyTime, and if this daunts you I would advise seeking professional help.

    If your existing data is in any structured database format, you're lucky. Many database systems nowadays support an XML export function, which gets you most of the way there. If such a function doesn't exist, serializing the data and converting it to XML should be an easy task for even the most humble writer of Perl scripts (or the equivalent). Once you have XML, a relatively straightforward series of transformations using XSLT should convert the data into the structure you've built for the new content management system.
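
    As a minimal sketch of that serialization step (the table and column names are hypothetical, using SQLite and the Python standard library):

```python
import sqlite3
import xml.etree.ElementTree as ET

def rows_to_xml(conn, table):
    """Serialize every row of a table as <table><record><col>...</col></record></table>."""
    cur = conn.execute("SELECT * FROM %s" % table)
    cols = [d[0] for d in cur.description]
    root = ET.Element(table)
    for row in cur:
        rec = ET.SubElement(root, "record")
        for col, val in zip(cols, row):
            ET.SubElement(rec, col).text = str(val)
    return ET.tostring(root, encoding="unicode")
```

    From here, an XSLT transformation maps the generic record structure onto the document types defined in the information map.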

    Converting unstructured data such as Adobe FrameMaker or Microsoft Word files is very difficult. The "Save As XML" functions of desktop publishing and word-processing software are so unreliable they're dangerous. Unless you've been very strict in the use of templates, paragraph-naming conventions, and styles, attempting to derive structure from the data can be almost impossible. And even if you have been strict, the chances of being able to map old document styles one-to-one onto new XML document types are small.

    I discovered to my cost in a similar project (I had around 10,000 pages of FrameMaker files, roughly the same in WinHelp format, and a smattering of Word documents) that while it is possible to use various "Save As" formats, scripting languages, and other conversion tools, the time and rewriting required to make it work outweighed the advantages of the automation.

    In that example, the purpose of the project was to replace a traditional corporate technical publishing process with one driven by a content management system: do more with less, reduce costs, increase usability, and so on. The new way of authoring documents in the new system, coupled with the radically different way of accessing the information from a user's perspective, meant that almost all the older material deserved to be rewritten rather than blindly converted.

    Go Live and Stay Live!
    Whatever you do, don't try to do everything at once. Always keep the previous system running in parallel until everything has been tried and tested. If your content management system is complex, especially if it's playing a mission-critical role in a core aspect of your business, consider phasing in the introduction in a modular way.

    It's well worth trying to change at least one important DTD before going live, and seeing what happens. Your ability to future-proof the application will make or break the system when emergencies happen (such as being asked by Commerce One to move from xCBL 2.0 to xCBL 3.0, or 4.0). Write a set of procedures. Make sure that a delegate can understand and do the work.

    This brings me to my final piece of advice: document everything you do. Make sure that at the very least you have the following documents in place: the information map, the descriptions of the DTDs, and the procedures for keeping the system alive when you need to make changes. For change, you surely will.

    About the Author

    Jim Gabriel has authored tens of thousands of pages of technical documentation, ranging from entry-level tutorial material to programmers' reference manuals. He is literate in XML, SGML, and XSL, among others.

